Exporting Batch Data

Once participants have completed your batch study, you can export all responses and uploaded files as a single ZIP archive.

Exports are generated asynchronously — the API responds immediately with a job ID, and the archive is built out-of-band. This keeps the request fast even for large batches with many uploaded files.

Workflow overview

1. **Request an export** by sending a POST request to the export endpoint. The API returns a job ID immediately.
2. **Poll for completion** by sending GET requests with your job ID until the status is `complete` or `failed`.
3. **Download the ZIP archive** from the presigned URL included in the `complete` response.
4. **Work with the exported data** — load `responses-by-submission.jsonl` or `responses-by-task.jsonl` for analysis, or extract files from the `files/` directory.

Using the Prolific CLI

The Prolific CLI handles the full request, poll, and download flow in a single command:

```shell
prolific batch export <batch-id>
```

By default the archive is saved to `<batch-id>-export-<YYYYMMDD-HHMMSS>.zip` in the current directory. Use `--output` to specify a path:

```shell
prolific batch export <batch-id> --output ./my-export.zip
```

Requires the `PROLIFIC_TOKEN` environment variable and researcher access to the batch’s workspace.

Requesting an export

```text
POST /api/v1/data-collection/batches/{batch_id}/export
```

No request body is required.
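A minimal request in Python might look like the following sketch, assuming token authentication via the `Authorization: Token` header (as used in the polling example later in this page); the helper names are illustrative, not part of any SDK:

```python
import requests

API_BASE = "https://api.prolific.com/api/v1"

def export_url(batch_id: str) -> str:
    """Build the export endpoint URL for a batch."""
    return f"{API_BASE}/data-collection/batches/{batch_id}/export"

def request_export(batch_id: str, token: str) -> dict:
    """POST with no body; returns the parsed JSON response (202 or 200)."""
    r = requests.post(export_url(batch_id), headers={"Authorization": f"Token {token}"})
    r.raise_for_status()
    return r.json()
```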

Responses

If a new export job is created, the API returns 202 Accepted:

```json
{
  "status": "generating",
  "export_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
```

If a recent export for this batch already exists, the API returns 200 OK with the download URL immediately — you can skip polling and go straight to the download step.

```json
{
  "status": "complete",
  "url": "https://...",
  "expires_at": "2026-03-20T10:30:00Z"
}
```

Export requests are idempotent. Re-sending POST for a batch that is already generating or has a complete export returns the existing job ID or download URL rather than triggering a new generation.
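Because of this idempotency, a client can handle both response shapes with one dispatcher over the documented fields. A hypothetical helper (plain Python, no SDK assumed):

```python
def handle_export_response(body: dict) -> tuple[str, str]:
    """Map a POST response body to the next action.

    Returns ("download", url) when a recent export already exists (200 OK),
    or ("poll", export_id) when a job is new or still generating (202 Accepted).
    """
    if body["status"] == "complete":
        return ("download", body["url"])
    return ("poll", body["export_id"])
```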

Polling for completion

Use the export_id from the POST response to check the status of your export job.

```text
GET /api/v1/data-collection/batches/{batch_id}/export/{export_id}
```

Poll at a reasonable interval (every 3–5 seconds) until the status changes.

| Status | Meaning | Next step |
| --- | --- | --- |
| `generating` | The archive is still being built | Continue polling |
| `complete` | The archive is ready — `url` and `expires_at` are included | Download the ZIP |
| `failed` | Generation failed or the archive was deleted | Retry by sending POST again |

Complete response

```json
{
  "status": "complete",
  "url": "https://...",
  "expires_at": "2026-03-20T10:30:00Z"
}
```

The url is a presigned HTTPS link valid for 24 hours. Re-poll the GET endpoint to receive a refreshed URL if it has expired.
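Downloading is a plain GET against the presigned URL (no `Authorization` header is needed, since the URL itself carries the signature), and `expires_at` can be checked before reusing a cached URL. A sketch with hypothetical helper names:

```python
from datetime import datetime, timezone

import requests

def url_expired(expires_at: str) -> bool:
    """True if an ISO-8601 timestamp such as "2026-03-20T10:30:00Z" is in the past."""
    ts = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    return datetime.now(timezone.utc) >= ts

def download_export(url: str, dest: str = "export.zip") -> str:
    """Stream the archive to disk in 1 MiB chunks to avoid buffering it in memory."""
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in r.iter_content(chunk_size=1 << 20):
                f.write(chunk)
    return dest
```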

Polling example

```python
import time

import requests

def poll_export(batch_id, export_id, token, timeout=600):
    headers = {"Authorization": f"Token {token}"}
    deadline = time.time() + timeout
    while time.time() < deadline:
        r = requests.get(
            f"https://api.prolific.com/api/v1/data-collection/batches/{batch_id}/export/{export_id}",
            headers=headers,
        )
        r.raise_for_status()
        data = r.json()
        if data["status"] == "complete":
            return data["url"]
        if data["status"] == "failed":
            raise RuntimeError(f"Export failed for batch {batch_id}")
        time.sleep(3)
    raise TimeoutError("Export did not complete within the timeout period")
```

Archive format

The downloaded ZIP contains the following structure:

```text
export.zip
├── responses-by-submission.jsonl
├── responses-by-task.jsonl
├── batch.json
├── README.md
└── files/
    └── {submission_id}_{instruction_id}_{index}.{ext}
```

The files/ directory is only present if participants uploaded files. The README.md inside the archive contains a quick-start guide and a pandas example.

The archive ships two JSONL files covering the same underlying data from different angles — choose whichever suits your analysis:

| File | One record per | Best for |
| --- | --- | --- |
| `responses-by-submission.jsonl` | Participant submission | Quality review, per-participant analysis |
| `responses-by-task.jsonl` | Task | Content-centric analysis, agreement metrics |

responses-by-submission.jsonl

Each line is a JSON object representing one participant’s submission, with all of their responses across every task they completed.

```json
{
  "submission_id": "sub-abc123",
  "participant_id": "part-def456",
  "batch_id": "0192a3b4-c5d6-7e8f-9a0b-1c2d3e4f5a6b",
  "created_at": "2026-03-19T10:30:00Z",
  "responses": {
    "task-id-1": {
      "0192a3b4-e7f8-7a0b-1c2d-3e4f5a6b7c8d": {
        "type": "free_text",
        "description": "Briefly describe the item shown",
        "value": "A red leather wallet with two card slots"
      },
      "0192a3b5-e3f4-7a5b-6c7d-9e0f1a2b3c4d": {
        "type": "file_upload",
        "description": "Upload a photo of the item",
        "files": [
          {
            "name": "photo.jpg",
            "path": "files/sub-abc123_0192a3b5-e3f4-7a5b-6c7d-9e0f1a2b3c4d_0.jpg"
          }
        ]
      }
    }
  }
}
```

responses-by-task.jsonl

Each line is a JSON object representing one task, with all participant responses for that task collected together.

```json
{
  "task_id": "task-xyz789",
  "batch_id": "0192a3b4-c5d6-7e8f-9a0b-1c2d3e4f5a6b",
  "responses": [
    {
      "submission_id": "sub-abc123",
      "participant_id": "part-def456",
      "created_at": "2026-03-19T10:30:00Z",
      "0192a3b4-e7f8-7a0b-1c2d-3e4f5a6b7c8d": {
        "type": "free_text",
        "description": "Briefly describe the item shown",
        "value": "A red leather wallet with two card slots"
      }
    },
    {
      "submission_id": "sub-ghi789",
      "participant_id": "part-jkl012",
      "created_at": "2026-03-19T11:15:00Z",
      "0192a3b4-e7f8-7a0b-1c2d-3e4f5a6b7c8d": {
        "type": "free_text",
        "description": "Briefly describe the item shown",
        "value": "Red bifold wallet, appears to be leather"
      }
    }
  ]
}
```

The path for file uploads is relative to the archive root, so it can be used directly after extraction.
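Because each line of `responses-by-task.jsonl` holds every response for one task, per-task aggregates fall out of a single pass over the file. For example, counting submissions per task with only the standard library (the helper name is illustrative):

```python
import json

def responses_per_task(path: str = "responses-by-task.jsonl") -> dict:
    """Count how many submissions responded to each task."""
    counts = {}
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            counts[record["task_id"]] = len(record["responses"])
    return counts
```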

Response value shapes

| Instruction type | Value fields |
| --- | --- |
| `free_text` | `value: string` |
| `free_text_with_unit` | `value: string`, `unit: string` |
| `multiple_choice` | `values: string[]` |
| `multiple_choice_with_free_text` | `values: { option: string, explanation: string }[]` |
| `file_upload` | `files: { name: string, path: string }[]` |

batch.json

Batch metadata and a list of all instructions, useful for mapping instruction IDs to their descriptions and types.

```json
{
  "batch_id": "0192a3b4-c5d6-7e8f-9a0b-1c2d3e4f5a6b",
  "name": "Product Image Annotation",
  "exported_at": "2026-03-19T10:30:00Z",
  "instructions": [
    {
      "id": "0192a3b4-e7f8-7a0b-1c2d-3e4f5a6b7c8d",
      "type": "free_text",
      "description": "Briefly describe the item shown"
    },
    {
      "id": "0192a3b5-e3f4-7a5b-6c7d-9e0f1a2b3c4d",
      "type": "file_upload",
      "description": "Upload a photo of the item"
    }
  ]
}
```
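Since response records are keyed by instruction ID, `batch.json` is handy for turning those IDs back into readable labels. One possible helper (the function name is an assumption, not part of any SDK):

```python
import json

def load_instruction_map(path: str = "batch.json") -> dict:
    """Map each instruction ID to its (type, description) pair."""
    with open(path) as f:
        batch = json.load(f)
    return {i["id"]: (i["type"], i["description"]) for i in batch["instructions"]}
```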

Working with the exported data

Load responses with pandas

```python
import pandas as pd

# Participant-centric view
by_submission = pd.read_json("responses-by-submission.jsonl", lines=True)

# Task-centric view (useful for inter-annotator agreement)
by_task = pd.read_json("responses-by-task.jsonl", lines=True)
```

Extract uploaded files

```python
import zipfile

with zipfile.ZipFile("export.zip") as z:
    z.extractall("export/")

# Files are at: export/files/{submission_id}_{instruction_id}_{index}.{ext}
```

The submission_id prefix in each filename lets you match files back to their submission record in responses-by-submission.jsonl.
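Given that naming scheme, the filename itself carries the join keys. A small parser, sketched under the assumption (consistent with the examples above) that the IDs themselves contain no underscores:

```python
def parse_export_filename(name: str) -> dict:
    """Split "{submission_id}_{instruction_id}_{index}.{ext}" into its parts.

    Splits from the right, so this assumes the IDs contain no underscores.
    """
    stem, ext = name.rsplit(".", 1)
    submission_id, instruction_id, index = stem.rsplit("_", 2)
    return {
        "submission_id": submission_id,
        "instruction_id": instruction_id,
        "index": int(index),
        "ext": ext,
    }
```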

Handle all response types

```python
for record in by_submission.itertuples():
    for task_id, task_responses in record.responses.items():
        for instruction_id, response in task_responses.items():
            match response["type"]:
                case "free_text" | "free_text_with_unit":
                    print(response["value"])
                case "multiple_choice":
                    print(response["values"])
                case "multiple_choice_with_free_text":
                    for v in response["values"]:
                        print(v["option"], v["explanation"])
                case "file_upload":
                    for f in response["files"]:
                        print(f["path"])
```

Notes

  • Presigned URL expiry: download URLs are valid for 24 hours. Re-poll GET to receive a refreshed URL.
  • Retry on failure: a failed export can be retried by sending POST again.
  • Active responses only: deleted and no_submission responses are excluded from the export.
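Putting these notes together, the request, poll, and download steps plus a retry on failure can be sketched as one driver with the HTTP calls passed in as callables; substitute your own client functions for `request_fn`, `poll_fn`, and `download_fn` (the names and structure here are illustrative, not a library API):

```python
import time

def export_batch(request_fn, poll_fn, download_fn, max_attempts=2, poll_interval=3):
    """Drive request -> poll -> download, re-sending POST once if generation fails.

    request_fn() and poll_fn() return parsed JSON bodies as documented above;
    download_fn(url) fetches the archive and returns its local path.
    """
    for _ in range(max_attempts):
        body = request_fn()
        if body["status"] == "complete":  # a recent export already exists (200 OK)
            return download_fn(body["url"])
        while True:
            body = poll_fn()
            if body["status"] == "complete":
                return download_fn(body["url"])
            if body["status"] == "failed":
                break  # fall through and re-send the POST
            time.sleep(poll_interval)
    raise RuntimeError("Export failed after retries")
```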
