Collecting original data with AI Task Builder Collections


This cookbook covers how to gather original data from participants using AI Task Builder Collections. Unlike Batches (where participants evaluate your existing data), Collections let participants submit their own content such as images, files, or text responses.

Overview

A Collection defines the pages and questions participants see. You attach it to a Prolific study, which handles recruitment, reward, and participant targeting. The end-to-end workflow is:

  1. Define a study template: reward, description, and participant targeting
  2. Create a collection: define pages, instructions, and content blocks
  3. Publish the study to start recruiting participants
  4. Monitor submissions via the CLI or app

Use cases

  • Image or file collection (e.g. photos, documents, recordings)
  • Open-ended text responses
  • Multi-page surveys that gather original participant data
  • Consent + submission workflows

Prerequisites

  • A Prolific researcher account with AI Task Builder Collections enabled
  • An API token (see API Fundamentals)
  • Your workspace ID, visible in the app URL: app.prolific.com/researcher/workspaces/<workspace_id>/home
  • A project ID within your workspace, required to bill against workspace funds (see note below)
  • The Prolific CLI installed (see step 1)

Workspace billing requires a project

Studies must be associated with a project to bill against workspace funds. A study created without a project field will default to your personal account balance. Find your project IDs with:

$ prolific project list -w <workspace_id>

Step-by-step guide

1. Install and authenticate the Prolific CLI

The Prolific CLI lets you create collections, publish studies, and view submissions without writing API calls directly.

$ gh repo clone prolific-oss/cli prolific-cli
$ cd prolific-cli && make build
$ mkdir -p ~/.local/bin && cp prolific ~/.local/bin/
$ cd .. && rm -rf prolific-cli

Configure your API token by creating ~/.config/prolific-oss/prolific.yaml:

prolific_token: <your_api_token>

Verify authentication:

$ prolific whoami

2. Define your study

Create a JSON template for your study. When you publish via collection publish, the CLI automatically injects data_collection_method and data_collection_id; you don’t need to set these manually.

{
  "name": "My Study",
  "description": "<p>Participant-facing description of your study.</p>",
  "project": "<project_id>",
  "reward": 50,
  "estimated_completion_time": 5,
  "total_available_places": 100,
  "prolific_id_option": "question",
  "completion_codes": [
    {
      "code": "MYCODE1",
      "code_type": "COMPLETED",
      "actions": [{ "action": "AUTOMATICALLY_APPROVE" }]
    }
  ],
  "device_compatibility": ["mobile", "desktop", "tablet"],
  "peripheral_requirements": [],
  "filters": [],
  "submissions_config": {
    "max_submissions_per_participant": 1
  }
}

Reward and pricing:

  • reward is in pence (GBP) or cents (USD)
  • Prolific’s minimum rate is £6 / $8 per hour
  • To calculate a reward that fits a total budget: reward_per_person = (budget / participants) / 1.4 (the 1.4 accounts for Prolific’s ~40% service fee on workspace-billed studies)
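The budget arithmetic above can be sketched in Python. This is a minimal illustration, not part of the Prolific CLI or API: the helper names are hypothetical, the 40% fee and the 600 pence per hour minimum are taken from the bullets above, and estimated_completion_time is assumed to be in minutes. Integer arithmetic is used so the result rounds down and the total never exceeds the budget.

```python
FEE_PERCENT = 40  # the ~40% service fee quoted above (the "1.4" in the formula)


def reward_per_person(budget, participants, fee_percent=FEE_PERCENT):
    """Largest per-participant reward (pence or cents) that fits the budget.

    Equivalent to (budget / participants) / 1.4, rounded down.
    """
    if participants <= 0:
        raise ValueError("participants must be positive")
    return (budget * 100) // (participants * (100 + fee_percent))


def hourly_rate(reward, estimated_minutes):
    """Implied hourly rate, in the same minor units as the reward."""
    return reward * 60 / estimated_minutes


def meets_minimum(reward, estimated_minutes, minimum_per_hour=600):
    """Check the reward against a minimum hourly rate (600 pence = £6)."""
    return hourly_rate(reward, estimated_minutes) >= minimum_per_hour


# Example: a budget of 7000 pence (£70) shared across 100 participants
# for a study estimated at 5 minutes.
reward = reward_per_person(7000, 100)
print(reward)                    # 50
print(hourly_rate(reward, 5))    # 600.0
print(meets_minimum(reward, 5))  # True
```

For a USD study, pass minimum_per_hour=800 to reflect the $8 per hour minimum.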

Participant targeting with filters:

You can restrict your study to participants matching specific criteria. For example, to target pet owners:

"filters": [
  {
    "filter_id": "pets",
    "selected_values": ["0", "1", "2", "3", "4", "5", "6", "7"]
  }
]
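If you generate study templates from a script, a filter entry like the one above could be built as follows. This is a sketch only: select_filter is a hypothetical helper, and what each value means is defined by the filter itself, not by this code. It shows that selected_values are written as strings, as in the example above.

```python
import json


def select_filter(filter_id, values):
    """Build one entry for the study template's "filters" list.

    Values are converted to strings to match the example above.
    """
    return {"filter_id": filter_id, "selected_values": [str(v) for v in values]}


# Hypothetical usage: add the pet-owner filter to a template dictionary.
template = {"name": "My Study", "filters": []}
template["filters"].append(select_filter("pets", range(8)))
print(json.dumps(template["filters"], indent=2))
```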

Browse available filters by querying the API:

$ curl -H "Authorization: Token <your_token>" https://api.prolific.com/api/v1/filters/

3. Define your collection

Create a YAML file describing your collection. A collection is made up of one or more pages (collection_items), each containing instructions and content blocks (page_items).

# collection.yaml
workspace_id: <your_workspace_id>
name: My Collection
description: A brief description of what participants will do.
task_details:
  task_name: Task title shown to participants
  task_introduction: "<p>Brief introduction shown before participants begin.</p>"
  task_steps: "<ol><li>Step one</li><li>Step two</li></ol>"
collection_items:
  - order: 0
    title: Page title
    page_items:
      - order: 0
        type: rich_text
        content: |
          Instructions shown to participants on this page.
        content_format: markdown # optional; defaults to html
      - order: 1
        type: file_upload
        description: Upload your file here
        accepted_file_types:
          - .jpg
          - .jpeg
          - .png
        max_file_size_mb: 10.0
        min_file_count: 1
        max_file_count: 1
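Before creating the collection, it can help to sanity-check the definition locally. The sketch below is a hypothetical pre-flight check, not something the CLI provides. It takes the definition as a Python dictionary (the structure a YAML parser would produce from the file above), confirms that every page item uses one of the types this guide lists, and checks that file count limits are not inverted. The server may enforce other rules that this sketch does not know about.

```python
SUPPORTED_TYPES = {
    # Instruction types (collect participant input)
    "free_text", "free_text_with_unit", "multiple_choice",
    "multiple_choice_with_free_text", "file_upload",
    # Content block types (display only)
    "rich_text", "image",
}


def check_collection(collection):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    pages = collection.get("collection_items", [])
    if not pages:
        problems.append("collection has no pages")
    for p, page in enumerate(pages):
        for i, item in enumerate(page.get("page_items", [])):
            where = f"page {p}, item {i}"
            kind = item.get("type")
            if kind not in SUPPORTED_TYPES:
                problems.append(f"{where}: unsupported type {kind!r}")
            if kind == "file_upload":
                low = item.get("min_file_count", 0)
                high = item.get("max_file_count", low)
                if low > high:
                    problems.append(f"{where}: min_file_count exceeds max_file_count")
    return problems


# The example collection above, expressed as a dictionary.
example = {
    "name": "My Collection",
    "collection_items": [
        {
            "order": 0,
            "title": "Page title",
            "page_items": [
                {"order": 0, "type": "rich_text", "content": "Instructions"},
                {"order": 1, "type": "file_upload",
                 "min_file_count": 1, "max_file_count": 1},
            ],
        }
    ],
}
print(check_collection(example))  # []
```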

Supported page item types:

Instruction types collect participant input:

Type | Description
free_text | Open text input from participant.
free_text_with_unit | Text input with a unit selector (e.g. kg / lbs).
multiple_choice | Single or multi-select options. Set answer_limit: 1 for single-select.
multiple_choice_with_free_text | Options with associated free-text fields.
file_upload | File submission with configurable type and size constraints.

Content block types display content to participants (no input collected):

Type | Description
rich_text | Display formatted content. Supports markdown or html via content_format (default: html).
image | Display an image. Requires url (HTTPS only) and alt_text.

Once you’re happy with your collection definition, create it:

$ prolific collection create -t collection.yaml

Save the ID from the response; you’ll need it in the next step.

4. Create a draft study

Link your collection to a study and create it in draft status for review before going live:

$ prolific collection publish <collection_id> -t study-template.json --draft

The -t study-template.json flag is required. Without it the command has no reward, description, or participant count to work with and will fail.

The --draft flag creates the study as UNPUBLISHED so you can review and adjust it before participants can see it. Save the Study ID from the output.

To update any fields on the draft (e.g. adjust the reward):

$ echo '{"reward": 60}' | prolific study update <study_id> -t -

All fields can be updated on a draft study. Once published, only certain fields (such as total_available_places) can be changed.

5. Preview and publish the study

Before going live, preview your study to check how it will appear to participants:

$ prolific collection publish <collection_id> -t study-template.json --preview

This returns a preview URL you can open in your browser to walk through the participant experience.

Do not use the Prolific app to review this study. The researcher UI is not yet configured to handle studies created via Collections and will crash on the review screen. Use the --preview flag above or the CLI instead.

When you’re ready to go live, transition the draft study to active:

$ prolific study transition <study_id> -a PUBLISH

Once successfully published, the study’s status will be ACTIVE and participants matching your filters will begin to see it.

Ensure your workspace has sufficient funds before publishing. A study created without a project will bill against your personal account (which may have a £0 / $0 balance). See the prerequisites note above.

6. Monitor submissions

View incoming submissions for your study:

$ prolific submission list -s <study_id>

This shows each submission’s participant ID, status, and time taken. Statuses include:

Status | Meaning
ACTIVE | Participant is currently in progress
AWAITING REVIEW | Completed, pending your approval
APPROVED | Approved and paid
RETURNED | Participant returned or timed out

You can also view submissions in the app at: https://app.prolific.com/researcher/studies/<study_id>

Retrieving submitted data

Submitted data is available via the export API. Exports are generated asynchronously and delivered as a presigned HTTPS URL pointing to a ZIP archive.

Request an export

$ curl -X POST \
    -H "Authorization: Token <your_token>" \
    https://api.prolific.com/api/v1/data-collection/collections/<collection_id>/export

If the export is ready immediately (e.g. a cached result), the response is 200 OK:

{
  "status": "complete",
  "url": "https://...",
  "expires_at": "2024-01-01T12:00:00.000Z"
}

If generation is still in progress, the response is 202 Accepted:

{
  "status": "generating",
  "export_id": "<export_id>"
}

Poll for completion

When you receive a 202, poll using the export_id until status is complete or failed:

$ curl -H "Authorization: Token <your_token>" \
    https://api.prolific.com/api/v1/data-collection/collections/<collection_id>/export/<export_id>

Once complete, the response includes a url field containing a presigned URL that is valid for 1 hour. If the URL has expired, poll the GET endpoint again to receive a refreshed one. If the status is failed, retry by sending the POST request again.
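The request-then-poll flow can be sketched in Python using only the standard library. The endpoints and status values come from this guide; the function names, the 5-second interval, and the 60-poll limit are illustrative assumptions, not documented defaults. The send parameter exists so the loop can be exercised without network access.

```python
import json
import time
import urllib.request

API_BASE = "https://api.prolific.com/api/v1/data-collection/collections"


def http_json(method, url, token):
    """Send one authenticated request and return the parsed JSON body."""
    request = urllib.request.Request(
        url, method=method, headers={"Authorization": f"Token {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_export(collection_id, token, send=http_json, interval=5.0, max_polls=60):
    """Request an export, poll until it finishes, and return the download URL."""
    export_url = f"{API_BASE}/{collection_id}/export"
    body = send("POST", export_url, token)
    export_id = body.get("export_id")  # present when generation is in progress
    polls = 0
    while True:
        if body["status"] == "complete":
            return body["url"]
        if body["status"] == "failed":
            raise RuntimeError("export failed; send the POST request again to retry")
        if polls >= max_polls:
            raise TimeoutError("export is still generating")
        time.sleep(interval)
        body = send("GET", f"{export_url}/{export_id}", token)
        polls += 1


# Hypothetical usage (performs real network requests):
# url = wait_for_export("<collection_id>", "<your_token>")
```

Because the presigned URL expires after 1 hour, download the archive soon after the function returns.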

Archive contents

The ZIP contains:

File | Description
responses.jsonl | One JSON record per submission, keyed by instruction ID
collection.json | Collection metadata and instruction definitions
README.md | Quick-start guide and pandas example
files/ | Uploaded files referenced by file_upload responses (if any)
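You can read the archive without unpacking it first. The following sketch uses Python's standard zipfile module and assumes that collection.json and responses.jsonl sit at the top level of the ZIP, which this guide implies but does not state outright. The read_export name is hypothetical.

```python
import io
import json
import zipfile


def read_export(archive):
    """Return (collection_metadata, response_records) from an export ZIP.

    `archive` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        collection = json.loads(zf.read("collection.json"))
        with zf.open("responses.jsonl") as raw:
            lines = io.TextIOWrapper(raw, encoding="utf-8")
            # One JSON record per non-empty line.
            records = [json.loads(line) for line in lines if line.strip()]
    return collection, records


# Hypothetical usage with a downloaded archive:
# collection, records = read_export("export.zip")
# print(len(records), "submissions")
```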

Each line in responses.jsonl has this shape:

{
  "submission_id": "...",
  "participant_id": "...",
  "collection_id": "...",
  "created_at": "2024-01-01T10:00:00.000Z",
  "responses": {
    "<instruction_id>": {
      "type": "free_text",
      "description": "Instruction prompt shown to the participant",
      "value": "Participant's answer"
    }
  }
}

The responses object is keyed by instruction ID. Use collection.json to map instruction IDs to their human-readable descriptions. The shape of each response entry depends on the instruction type:

Instruction type | Response fields
free_text | type, description, value
free_text_with_unit | type, description, value, unit
multiple_choice | type, description, values (array of strings)
multiple_choice_with_free_text | type, description, values (array of { option, explanation })
file_upload | type, description, files (array of { name, path })

For file_upload responses, the path field corresponds to the file path within the files/ directory of the ZIP archive.
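Because each instruction type stores its answer under different fields, a small dispatch function is one way to reduce every response entry to a single value. This is a sketch based only on the field names in the table above. The function names and the chosen output shapes (for example, joining value and unit into one string) are illustrative choices.

```python
def response_value(entry):
    """Reduce one response entry to a single value, based on its type."""
    kind = entry["type"]
    if kind == "free_text":
        return entry["value"]
    if kind == "free_text_with_unit":
        return f"{entry['value']} {entry['unit']}"
    if kind == "multiple_choice":
        return list(entry["values"])
    if kind == "multiple_choice_with_free_text":
        return {v["option"]: v["explanation"] for v in entry["values"]}
    if kind == "file_upload":
        # Paths are relative to the files/ directory of the archive.
        return [f["path"] for f in entry["files"]]
    raise ValueError(f"unknown instruction type: {kind!r}")


def flatten(record):
    """Turn one responses.jsonl record into a flat row keyed by instruction ID."""
    row = {
        "submission_id": record["submission_id"],
        "participant_id": record["participant_id"],
    }
    for instruction_id, entry in record["responses"].items():
        row[instruction_id] = response_value(entry)
    return row


# Illustrative record; the IDs and values are made up.
sample = {
    "submission_id": "sub-1",
    "participant_id": "p-1",
    "responses": {
        "q1": {"type": "free_text", "description": "Pet name", "value": "Rex"},
        "q2": {"type": "free_text_with_unit", "description": "Weight",
               "value": "12", "unit": "kg"},
        "q3": {"type": "file_upload", "description": "Photo",
               "files": [{"name": "rex.jpg", "path": "sub-1/rex.jpg"}]},
    },
}
print(flatten(sample))
```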

Working with the data (Python)

import pandas as pd
import json

# Load responses
df = pd.read_json('responses.jsonl', lines=True)

# Load instruction metadata for human-readable column names
with open('collection.json') as f:
    schema = json.load(f)
id_to_label = {i['id']: i['description'] for i in schema['instructions']}

# Rename response keys from instruction IDs to descriptions
responses = df['responses'].apply(
    lambda r: {id_to_label.get(k, k): v for k, v in r.items()}
)

Help & support

For questions and requests specific to using Prolific’s API, you can reach our support team through this form.