Working with Datasets

A dataset contains the data that participants will annotate in an AI Task Builder Batch. This page covers dataset creation, upload, and advanced configuration options.

For the complete batch workflow, see Working with Batches.

Creating a dataset

POST /api/v1/data-collection/datasets

{
  "name": "Product reviews Q4 2024",
  "workspace_id": "6278acb09062db3b35bcbeb0"
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | A name for your dataset |
| workspace_id | string | Yes | The ID of the Prolific workspace |

Response

{
  "id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
  "name": "Product reviews Q4 2024",
  "status": "UNINITIALISED"
}
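The request and response above could be handled from Python roughly as follows. This is a minimal sketch using only the standard library: the base URL and the `Token` authorization scheme are assumptions not stated on this page, and the helper names are hypothetical. The request is built but not sent, so you can inspect it first.

```python
import json
import urllib.request

# Assumption: the base URL and the "Token" authorization scheme are not stated
# on this page; substitute the values from your own API setup.
BASE_URL = "https://api.prolific.com"


def build_create_dataset_request(name, workspace_id, api_token):
    """Build (but do not send) the POST request that creates a dataset."""
    body = json.dumps({"name": name, "workspace_id": workspace_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/data-collection/datasets",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_token}",
            "Content-Type": "application/json",
        },
    )


def dataset_id_from_response(response_body):
    """Extract the dataset ID from a JSON response shaped like the example."""
    return json.loads(response_body)["id"]
```

To send the request, pass it to `urllib.request.urlopen` and feed the response body to `dataset_id_from_response`. The returned ID is the `{dataset_id}` used in the endpoints that follow.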

Uploading data

Upload your dataset as a CSV file using presigned URLs.

Step 1: Request a presigned URL

GET /api/v1/data-collection/datasets/{dataset_id}/upload-url/{filename}

For example:

GET /api/v1/data-collection/datasets/0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d/upload-url/reviews.csv
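If you build this path in code, it may help to percent-encode the filename so that spaces or other special characters do not break the URL. The sketch below assumes the endpoint accepts a percent-encoded filename, which this page does not confirm; `upload_url_path` is a hypothetical helper name.

```python
from urllib.parse import quote


def upload_url_path(dataset_id, filename):
    """Return the request path for the presigned upload URL endpoint.

    Assumption: the API accepts percent-encoded path segments.
    """
    return (
        "/api/v1/data-collection/datasets/"
        f"{quote(dataset_id, safe='')}/upload-url/{quote(filename, safe='')}"
    )
```

For the example dataset ID and `reviews.csv`, this returns the same path as shown above, since neither value contains characters that need encoding.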

Step 2: Upload to S3

Use the presigned URL from the response to upload your CSV file directly to S3.

curl -X PUT \
  -H "Content-Type: text/csv" \
  --data-binary @reviews.csv \
  "{presigned_url}"
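The same upload can be sketched in Python with the standard library. It mirrors the curl command: a PUT with a `text/csv` content type and the raw file bytes as the body. The function names are hypothetical, and the presigned URL is whatever Step 1 returned.

```python
import urllib.request


def build_upload_request(presigned_url, csv_bytes):
    """Build the PUT request that uploads CSV bytes to the presigned URL."""
    return urllib.request.Request(
        url=presigned_url,
        data=csv_bytes,
        method="PUT",
        headers={"Content-Type": "text/csv"},
    )


def upload_csv(presigned_url, csv_path):
    """Read the file as bytes, send the PUT request, and return the HTTP status."""
    with open(csv_path, "rb") as csv_file:
        request = build_upload_request(presigned_url, csv_file.read())
    with urllib.request.urlopen(request) as response:
        return response.status
```

Reading the file in binary mode keeps the bytes unchanged, which matches what `--data-binary` does in the curl command.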

CSV format

Your CSV should contain one row per datapoint. Each column, except those prefixed with META_ (see Metadata columns), is displayed to participants alongside the instructions.

id,review_text,product_name,rating
1,"Great product, exactly what I needed!",Widget Pro,5
2,"Arrived damaged, very disappointed",Widget Pro,1
3,"Works as expected, nothing special",Basic Widget,3
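Note that the review text in this example contains commas, so those fields are quoted. If you generate the file programmatically, a CSV library handles the quoting for you. The following is a minimal sketch using Python's `csv` module; `rows_to_csv` is a hypothetical helper, not part of the API.

```python
import csv
import io


def rows_to_csv(fieldnames, rows):
    """Serialize a list of dicts to CSV text, quoting fields only where needed."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(
    ["id", "review_text", "product_name", "rating"],
    [
        {
            "id": 1,
            "review_text": "Great product, exactly what I needed!",
            "product_name": "Widget Pro",
            "rating": 5,
        },
    ],
)
```

With the default quoting mode, only the `review_text` value is wrapped in double quotes, because it is the only field containing the delimiter.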

Metadata columns

Columns prefixed with META_ are not displayed to participants. Use these for internal data you need in your results but don’t want participants to see.

id,review_text,META_source,META_timestamp
1,"Great product!",amazon,2024-01-15T10:30:00Z
2,"Not worth it",trustpilot,2024-01-16T14:22:00Z

In this example, participants see only the id and review_text columns. The META_source and META_timestamp columns are included in your results but hidden during annotation.
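Before uploading, you may want to check which columns participants will see. The sketch below splits a CSV header by the META_ prefix. It assumes the prefix match is case-sensitive, which this page does not state, and `split_columns` is a hypothetical helper.

```python
import csv
import io

META_PREFIX = "META_"  # Assumption: matched case-sensitively.


def split_columns(csv_text):
    """Return (visible, hidden) column names based on the META_ prefix."""
    header = next(csv.reader(io.StringIO(csv_text)))
    visible = [name for name in header if not name.startswith(META_PREFIX)]
    hidden = [name for name in header if name.startswith(META_PREFIX)]
    return visible, hidden
```

Applied to the example above, the visible columns are `id` and `review_text`, and the hidden ones are `META_source` and `META_timestamp`.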

Custom task grouping

By default, tasks are grouped randomly when you set up a batch (using the tasks_per_group parameter). To define your own groupings, include a META_TASK_GROUP_ID column in your CSV.

Rows with the same META_TASK_GROUP_ID value will be grouped together into a single task group. Participants complete all tasks within a group in one submission.

id,review_text,product_name,META_TASK_GROUP_ID
1,"Great product!",Widget Pro,widget_pro_reviews
2,"Excellent quality",Widget Pro,widget_pro_reviews
3,"Not worth the price",Basic Widget,basic_widget_reviews
4,"Does the job",Basic Widget,basic_widget_reviews

In this example, tasks 1 and 2 are grouped together, as are tasks 3 and 4. A participant assigned to the widget_pro_reviews group will annotate both reviews in a single submission.

If your dataset includes META_TASK_GROUP_ID, these groupings take precedence over the tasks_per_group parameter during batch setup.
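To preview the groupings a file would produce before uploading it, you could collect rows by their META_TASK_GROUP_ID value locally. This is an illustrative sketch only: it approximates the grouping rule described above and does not call the API. `group_task_ids` is a hypothetical helper, and it assumes the CSV has an `id` column as in the examples.

```python
import csv
import io
from collections import defaultdict


def group_task_ids(csv_text):
    """Map each META_TASK_GROUP_ID value to the list of row ids that share it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["META_TASK_GROUP_ID"]].append(row["id"])
    return dict(groups)
```

For the example CSV, this yields two groups: `widget_pro_reviews` with rows 1 and 2, and `basic_widget_reviews` with rows 3 and 4.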

Dataset status

Poll the dataset endpoint to check processing status.

GET /api/v1/data-collection/datasets/{dataset_id}

| Status | Description |
| --- | --- |
| UNINITIALISED | Dataset created but no data uploaded |
| PROCESSING | Dataset is being processed |
| READY | Dataset is ready to be attached to a batch |
| ERROR | Something went wrong during processing |

Wait for the status to reach READY before creating a batch with this dataset.
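A polling loop for this could look like the sketch below. It takes a `fetch_status` callable that returns the current status string, so the HTTP details stay outside the loop. The polling interval, the timeout, and the function name are illustrative assumptions, not values from the API.

```python
import time


def wait_until_ready(fetch_status, interval_seconds=5.0, timeout_seconds=600.0,
                     sleep=time.sleep):
    """Poll until the dataset is READY; raise on ERROR or when the timeout passes.

    fetch_status: callable returning one of the status strings in the table above.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status == "READY":
            return status
        if status == "ERROR":
            raise RuntimeError("Dataset processing failed (status ERROR)")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Dataset still {status} after {timeout_seconds} seconds")
        sleep(interval_seconds)
```

In practice, `fetch_status` would send the GET request above and read the status from the response. That presumes the response carries a `status` field like the creation response does, which this page does not show explicitly.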