Collecting original data with AI Task Builder Collections

This cookbook covers how to gather original data from participants using AI Task Builder Collections. Unlike Batches (where participants evaluate your existing data), Collections let participants submit their own content such as images, files, or text responses.

Overview

A Collection defines the pages and questions participants see. You attach it to a Prolific study, which handles recruitment, reward, and participant targeting. The end-to-end workflow is:

Define a study template: reward, description, and participant targeting
Create a collection: define pages, instructions, and content blocks
Publish the study to start recruiting participants
Monitor submissions via the CLI or app

Use cases

Image or file collection (e.g. photos, documents, recordings)
Open-ended text responses
Multi-page surveys that gather original participant data
Consent + submission workflows

Prerequisites

A Prolific researcher account with AI Task Builder Collections enabled
An API token (see API Fundamentals)
Your workspace ID, visible in the app URL: app.prolific.com/researcher/workspaces/<workspace_id>/home
A project ID within your workspace, required to bill against workspace funds (see note below)
The Prolific CLI installed (see "Install and authenticate the Prolific CLI" below)

Workspace billing requires a project

Studies must be associated with a project to bill against workspace funds. A study created without a project field will default to your personal account balance. Find your project IDs with:

$ prolific project list -w <workspace_id>

Step-by-step guide

Install and authenticate the Prolific CLI

The Prolific CLI lets you create collections, publish studies, and view submissions without writing API calls directly.

$ gh repo clone prolific-oss/cli prolific-cli
$ cd prolific-cli && make build
$ mkdir -p ~/.local/bin && cp prolific ~/.local/bin/
$ cd .. && rm -rf prolific-cli

Configure your API token by creating ~/.config/prolific-oss/prolific.yaml:

prolific_token: <your_api_token>

Verify authentication:

$ prolific whoami

Define your study

Create a JSON template for your study. When you publish with prolific collection publish, the CLI automatically injects data_collection_method and data_collection_id, so you don’t need to set these manually.

{
  "name": "My Study",
  "description": "<p>Participant-facing description of your study.</p>",
  "project": "<project_id>",
  "reward": 50,
  "estimated_completion_time": 5,
  "total_available_places": 100,
  "prolific_id_option": "question",
  "completion_codes": [
    {
      "code": "MYCODE1",
      "code_type": "COMPLETED",
      "actions": [{ "action": "AUTOMATICALLY_APPROVE" }]
    }
  ],
  "device_compatibility": ["mobile", "desktop", "tablet"],
  "peripheral_requirements": [],
  "filters": [],
  "submissions_config": {
    "max_submissions_per_participant": 1
  }
}
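If you generate templates from a script, a minimal Python sketch along the following lines can produce the same JSON and catch a missing project ID before the study is created. The helper name build_study_template and its parameters are illustrative, not part of the CLI; the fields simply mirror the template above.

```python
import json


def build_study_template(name, description, project_id, reward, minutes, places, code):
    """Return a study template dict with the same fields as the JSON example above."""
    if not project_id:
        # Without a project, the study bills against the personal account balance.
        raise ValueError("project_id is required to bill against workspace funds")
    return {
        "name": name,
        "description": description,
        "project": project_id,
        "reward": reward,
        "estimated_completion_time": minutes,
        "total_available_places": places,
        "prolific_id_option": "question",
        "completion_codes": [
            {
                "code": code,
                "code_type": "COMPLETED",
                "actions": [{"action": "AUTOMATICALLY_APPROVE"}],
            }
        ],
        "device_compatibility": ["mobile", "desktop", "tablet"],
        "peripheral_requirements": [],
        "filters": [],
        "submissions_config": {"max_submissions_per_participant": 1},
    }


template = build_study_template(
    name="My Study",
    description="<p>Participant-facing description of your study.</p>",
    project_id="<project_id>",
    reward=50,
    minutes=5,
    places=100,
    code="MYCODE1",
)
# Serialise the result, e.g. to save as study-template.json for the -t flag.
print(json.dumps(template, indent=2))
```

Redirecting the printed output to study-template.json gives a file you can pass to the publish command later in this guide.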

Reward and pricing:

reward is in pence (GBP) or cents (USD)
Prolific’s minimum rate is £6 / $8 per hour
To calculate a reward that fits a total budget: reward_per_person = (budget / participants) / 1.4 (the 1.4 accounts for Prolific’s ~40% service fee on workspace-billed studies)
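As a worked example of the budget formula, the sketch below computes a per-participant reward and the hourly rate it implies. The function names and the sample budget are illustrative; the 1.4 multiplier and the £6 per hour minimum are taken from the notes above, and amounts are in pence.

```python
MIN_HOURLY_RATE_PENCE = 600  # £6 per hour, as stated above


def reward_for_budget(budget_pence: int, participants: int) -> int:
    """Largest whole-pence reward per participant that fits the total budget.

    Implements (budget / participants) / 1.4 with integer arithmetic
    (multiply by 10, divide by 14) to avoid floating-point rounding.
    """
    return (budget_pence * 10) // (participants * 14)


def hourly_rate(reward_pence: int, estimated_minutes: float) -> float:
    """Hourly rate in pence implied by a reward and an estimated completion time."""
    return reward_pence * 60 / estimated_minutes


reward = reward_for_budget(budget_pence=7000, participants=100)
rate = hourly_rate(reward, estimated_minutes=5)
print(reward)                         # 50
print(rate >= MIN_HOURLY_RATE_PENCE)  # True (600 pence per hour)
```

A £70 budget for 100 participants therefore supports a reward of 50 pence each, which for a 5-minute study sits exactly at the £6 per hour minimum.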

Participant targeting with filters:

You can restrict your study to participants matching specific criteria. For example, to target pet owners:

"filters": [
  {
    "filter_id": "pets",
    "selected_values": ["0", "1", "2", "3", "4", "5", "6", "7"]
  }
]

Browse available filters by querying the API:

$ curl -H "Authorization: Token <your_token>" https://api.prolific.com/api/v1/filters/
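The same request can be made from Python using only the standard library. This is a minimal sketch: the function names are illustrative, and because the shape of the response body is not described here, the parsed JSON is returned unchanged.

```python
import json
import urllib.request

FILTERS_URL = "https://api.prolific.com/api/v1/filters/"


def build_filters_request(token):
    """Build the GET request with the same Authorization header as the curl call."""
    return urllib.request.Request(
        FILTERS_URL, headers={"Authorization": f"Token {token}"}
    )


def fetch_filters(token):
    """Send the request and return the parsed JSON body as-is."""
    with urllib.request.urlopen(build_filters_request(token)) as response:
        return json.load(response)


# Usage (performs a network call, so it is left commented out):
# filters = fetch_filters("<your_token>")
```

Inspecting the returned data should show the available filter IDs and the values you can place in selected_values.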

Define your collection

Create a YAML file describing your collection. A collection is made up of one or more pages (collection_items), each containing instructions and content blocks (page_items).

# collection.yaml
workspace_id: <your_workspace_id>
name: My Collection
description: A brief description of what participants will do.
task_details:
  task_name: Task title shown to participants
  task_introduction: "<p>Brief introduction shown before participants begin.</p>"
  task_steps: "<ol><li>Step one</li><li>Step two</li></ol>"
collection_items:
  - order: 0
    title: Page title
    page_items:
      - order: 0
        type: rich_text
        content: |
          Instructions shown to participants on this page.
        content_format: markdown  # optional; defaults to html
      - order: 1
        type: file_upload
        description: Upload your file here
        accepted_file_types:
          - .jpg
          - .jpeg
          - .png
        max_file_size_mb: 10.0
        min_file_count: 1
        max_file_count: 1

Supported page item types:

Instruction types collect participant input:

Type	Description
`free_text`	Open text input from participant.
`free_text_with_unit`	Text input with a unit selector (e.g. kg / lbs).
`multiple_choice`	Single or multi-select options. Set `answer_limit: 1` for single-select.
`multiple_choice_with_free_text`	Options with associated free-text fields.
`file_upload`	File submission with configurable type and size constraints.

Content block types display content to participants (no input collected):

Type	Description
`rich_text`	Display formatted content. Supports `markdown` or `html` via `content_format` (default: `html`).
`image`	Display an image. Requires `url` (HTTPS only) and `alt_text`.

Once you’re happy with your collection definition, create it:

$ prolific collection create -t collection.yaml

Save the ID from the response; you’ll need it in the next step.

Create a draft study

Link your collection to a study and create it in draft status for review before going live:

$ prolific collection publish <collection_id> -t study-template.json --draft

The -t study-template.json flag is required. Without it the command has no reward, description, or participant count to work with and will fail.

The --draft flag creates the study as UNPUBLISHED so you can review and adjust it before participants can see it. Save the Study ID from the output.

To update any fields on the draft (e.g. adjust the reward):

$ echo '{"reward": 60}' | prolific study update <study_id> -t -

All fields can be updated on a draft study. Once published, only certain fields (such as total_available_places) can be changed.

Preview and publish the study

Before going live, preview your study to check how it will appear to participants:

$ prolific collection publish <collection_id> -t study-template.json --preview

This returns a preview URL you can open in your browser to walk through the participant experience.

Do not use the Prolific app to review this study. The researcher UI is not yet configured to handle studies created via Collections and will crash on the review screen. Use the --preview flag above or the CLI instead.

When you’re ready to go live, transition the draft study to active:

$ prolific study transition <study_id> -a PUBLISH

Once successfully published, the study’s status will be ACTIVE and participants matching your filters will begin to see it.

Ensure your workspace has sufficient funds before publishing. A study created without a project will bill against your personal account (which may have a £0 / $0 balance). See the prerequisites note above.

Monitor submissions

View incoming submissions for your study:

$ prolific submission list -s <study_id>

This shows each submission’s participant ID, status, and time taken. Statuses include:

Status	Meaning
`ACTIVE`	Participant is currently in progress
`AWAITING REVIEW`	Completed, pending your approval
`APPROVED`	Approved and paid
`RETURNED`	Participant returned or timed out

You can also view submissions in the app at: https://app.prolific.com/researcher/studies/<study_id>

Retrieving submitted data

Submitted data is available via the export API. Exports are generated asynchronously and delivered as a presigned HTTPS URL pointing to a ZIP archive.

Request an export

$ curl -X POST \
>   -H "Authorization: Token <your_token>" \
>   https://api.prolific.com/api/v1/data-collection/collections/<collection_id>/export

If the export is ready immediately (e.g. a cached result), the response is 200 OK:

{
  "status": "complete",
  "url": "https://...",
  "expires_at": "2024-01-01T12:00:00.000Z"
}

If generation is still in progress, the response is 202 Accepted:

{
  "status": "generating",
  "export_id": "<export_id>"
}

Poll for completion

When you receive a 202, poll using the export_id until status is complete or failed:

$ curl -H "Authorization: Token <your_token>" \
>   https://api.prolific.com/api/v1/data-collection/collections/<collection_id>/export/<export_id>

Once complete, the response includes a url field containing a presigned URL that is valid for 1 hour. If the URL has expired, poll the GET endpoint again to receive a refreshed one. If the status is failed, retry by sending the POST request again.
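The request-then-poll flow can be sketched in Python as follows. The function names, the polling interval, and the retry limit are assumptions made for illustration; the endpoints and status values are the ones described above. The call parameter exists so the HTTP layer can be swapped out, for example in tests.

```python
import json
import time
import urllib.request

BASE = "https://api.prolific.com/api/v1/data-collection/collections"


def _call(method, url, token):
    """Send one authenticated request and return the parsed JSON body."""
    request = urllib.request.Request(
        url, method=method, headers={"Authorization": f"Token {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_export(collection_id, token, call=_call, interval_s=5.0, max_polls=60):
    """Request an export, poll while it is generating, and return the download URL."""
    body = call("POST", f"{BASE}/{collection_id}/export", token)
    export_id = body.get("export_id")  # present on the "generating" response
    for _ in range(max_polls):
        if body["status"] != "generating":
            break
        time.sleep(interval_s)
        body = call("GET", f"{BASE}/{collection_id}/export/{export_id}", token)
    if body["status"] == "complete":
        return body["url"]
    if body["status"] == "failed":
        raise RuntimeError("export failed; send the POST request again to retry")
    raise TimeoutError("export still generating after max_polls attempts")


# Usage (performs network calls, so it is left commented out):
# url = wait_for_export("<collection_id>", "<your_token>")
```

Because the presigned URL is only valid for 1 hour, download the archive soon after the function returns.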

Archive contents

The ZIP contains:

File	Description
`responses.jsonl`	One JSON record per submission, keyed by instruction ID
`collection.json`	Collection metadata and instruction definitions
`README.md`	Quick-start guide and pandas example
`files/`	Uploaded files referenced by `file_upload` responses (if any)

Each line in responses.jsonl has this shape:

{
  "submission_id": "...",
  "participant_id": "...",
  "collection_id": "...",
  "created_at": "2024-01-01T10:00:00.000Z",
  "responses": {
    "<instruction_id>": {
      "type": "free_text",
      "description": "Instruction prompt shown to the participant",
      "value": "Participant's answer"
    }
  }
}

The responses object is keyed by instruction ID. Use collection.json to map instruction IDs to their human-readable descriptions. The shape of each response entry depends on the instruction type:

Instruction type	Response fields
`free_text`	`type`, `description`, `value`
`free_text_with_unit`	`type`, `description`, `value`, `unit`
`multiple_choice`	`type`, `description`, `values` (array of strings)
`multiple_choice_with_free_text`	`type`, `description`, `values` (array of `{ option, explanation }`)
`file_upload`	`type`, `description`, `files` (array of `{ name, path }`)

For file_upload responses, the path field corresponds to the file path within the files/ directory of the ZIP archive.
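Since each instruction type stores its answer under different fields, a small dispatcher can reduce every entry to a single value. The following is a sketch based only on the field table above; flatten_entry and flatten_record are hypothetical helpers, and the sample record is invented for illustration.

```python
def flatten_entry(entry):
    """Reduce one response entry to a plain value based on its instruction type."""
    kind = entry["type"]
    if kind == "free_text":
        return entry["value"]
    if kind == "free_text_with_unit":
        return f"{entry['value']} {entry['unit']}"
    if kind == "multiple_choice":
        return list(entry["values"])
    if kind == "multiple_choice_with_free_text":
        return {v["option"]: v["explanation"] for v in entry["values"]}
    if kind == "file_upload":
        # Paths are returned as given; they refer to files in the ZIP archive.
        return [f["path"] for f in entry["files"]]
    raise ValueError(f"unsupported instruction type: {kind!r}")


def flatten_record(record):
    """Turn one responses.jsonl record into a flat dict keyed by instruction ID."""
    row = {
        "submission_id": record["submission_id"],
        "participant_id": record["participant_id"],
    }
    for instruction_id, entry in record["responses"].items():
        row[instruction_id] = flatten_entry(entry)
    return row


sample = {
    "submission_id": "s1",
    "participant_id": "p1",
    "responses": {
        "q1": {"type": "free_text_with_unit", "description": "Weight",
               "value": "70", "unit": "kg"},
        "q2": {"type": "file_upload", "description": "Photo",
               "files": [{"name": "cat.png", "path": "p1/cat.png"}]},
    },
}
print(flatten_record(sample))
```

Applying flatten_record to each parsed line of responses.jsonl yields rows that load directly into a table or DataFrame.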

Working with the data (Python)

import pandas as pd
import json

# Load responses
df = pd.read_json('responses.jsonl', lines=True)

# Load instruction metadata for human-readable column names
with open('collection.json') as f:
    schema = json.load(f)
id_to_label = {i['id']: i['description'] for i in schema['instructions']}

# Rename response keys from instruction IDs to descriptions
responses = df['responses'].apply(
    lambda r: {id_to_label.get(k, k): v for k, v in r.items()}
)

Help & support

For questions and requests specific to using Prolific’s API, you can reach our support team through this form.