Exporting Collection Data


Early Access

AI Task Builder Collections are an early-access feature that may be enabled on your workspace upon request.

To request access or contribute towards the feature’s roadmap, visit our help center at https://researcher-help.prolific.com/en/ and drop us a message in the chat. Your activation request will be reviewed by our team.

Note: This feature is under active development and you may encounter bugs.

Once participants have completed your collection study, you can export all responses and uploaded files as a single ZIP archive.

Exports are generated asynchronously — the API responds immediately with a job ID, and the archive is built out-of-band. This keeps the request fast even for large collections with many uploaded files.

Workflow overview

1. Request an export by sending a POST request to the export endpoint. The API returns a job ID immediately.

2. Poll for completion by sending GET requests with your job ID until the status is complete or failed.

3. Download the ZIP archive from the presigned URL included in the complete response.

4. Work with the exported data — load responses.jsonl for analysis or extract files from the files/ directory.

Using the Prolific CLI

The Prolific CLI handles the full request, poll, and download flow in a single command:

$ prolific collection export <collection-id>

By default the archive is saved to <collection-id>-export-<YYYYMMDD-HHMMSS>.zip in the current directory. Use --output to specify a path:

$ prolific collection export <collection-id> --output ./my-export.zip

Requires the PROLIFIC_TOKEN environment variable and researcher access to the collection’s workspace.

Requesting an export

POST /api/v1/data-collection/collections/{collection_id}/export

No request body is required.

Responses

If a new export job is created, the API returns 202 Accepted:

```json
{
  "status": "generating",
  "export_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
```

If a recent export for this collection already exists, the API returns 200 OK with the download URL immediately — you can skip polling and go straight to the download step.

```json
{
  "status": "complete",
  "url": "https://...",
  "expires_at": "2026-03-20T10:30:00Z"
}
```

Export requests are idempotent. Re-sending POST for a collection that is already generating or has a complete export returns the existing job ID or download URL rather than triggering a new generation.
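Because the POST can return either shape, client code needs one small branch. The sketch below shows that decision in isolation; `interpret_export_response` is a hypothetical helper name, and it only handles the two documented response bodies:

```python
def interpret_export_response(data: dict) -> dict:
    """Decide the next step from a POST /export response body (sketch only)."""
    if data["status"] == "complete":
        # A recent export already exists: skip polling, download directly.
        return {"action": "download", "url": data["url"]}
    # A new job was created (202 Accepted): poll using the returned export_id.
    return {"action": "poll", "export_id": data["export_id"]}
```

Feed it the parsed JSON from the POST response, then either download the URL or move on to the polling step with the export ID.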

Polling for completion

Use the export_id from the POST response to check the status of your export job.

GET /api/v1/data-collection/collections/{collection_id}/export/{export_id}

Poll at a reasonable interval (every 3–5 seconds) until the status changes.

| Status | Meaning | Next step |
| --- | --- | --- |
| generating | The archive is still being built | Continue polling |
| complete | The archive is ready — url and expires_at are included | Download the ZIP |
| failed | Generation failed or the archive was deleted | Retry by sending POST again |

Complete response

```json
{
  "status": "complete",
  "url": "https://...",
  "expires_at": "2026-03-20T10:30:00Z"
}
```

The url is a presigned HTTPS link valid for 24 hours. Re-poll the GET endpoint to receive a refreshed URL if it has expired.

Polling example

```python
import time
import requests

def poll_export(collection_id, export_id, token, timeout=600):
    headers = {"Authorization": f"Token {token}"}
    deadline = time.time() + timeout
    while time.time() < deadline:
        r = requests.get(
            f"https://api.prolific.com/api/v1/data-collection/collections/{collection_id}/export/{export_id}",
            headers=headers,
        )
        r.raise_for_status()
        data = r.json()
        if data["status"] == "complete":
            return data["url"]
        if data["status"] == "failed":
            raise RuntimeError(f"Export failed for collection {collection_id}")
        time.sleep(3)
    raise TimeoutError("Export did not complete within the timeout period")
```

Archive format

The downloaded ZIP contains the following structure:

collection-export-{collection_id}-{YYYYMMDDTHHMMSS}/
├── responses.jsonl
├── collection.json
├── README.md
└── files/
└── {submission_id}_{instruction_id}_{index}.{ext}

The files/ directory is only present if participants uploaded files. The README.md inside the archive contains a quick-start guide and a pandas example.

responses.jsonl

Each line is a JSON object representing one submission. Response values are keyed by instruction ID.

```json
{
  "submission_id": "sub-abc123",
  "participant_id": "part-def456",
  "collection_id": "0192a3b4-c5d6-7e8f-9a0b-1c2d3e4f5a6b",
  "created_at": "2026-03-19T10:30:00Z",
  "responses": {
    "0192a3b4-e7f8-7a0b-1c2d-3e4f5a6b7c8d": {
      "type": "free_text",
      "description": "Briefly describe the skin condition shown in your image",
      "value": "Small red patch on left forearm, slightly raised"
    },
    "0192a3b5-e3f4-7a5b-6c7d-9e0f1a2b3c4d": {
      "type": "file_upload",
      "description": "Upload a clear photo of the affected area",
      "files": [
        {
          "name": "photo.jpg",
          "path": "files/sub-abc123_0192a3b5-e3f4-7a5b-6c7d-9e0f1a2b3c4d_0.jpg"
        }
      ]
    }
  }
}
```

The path for file uploads is relative to the archive root, so it can be used directly after extraction.
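To collect those relative paths programmatically, you can walk one record's responses and join each path onto the directory the archive was extracted into. A sketch under the schema shown above (`uploaded_file_paths` is a hypothetical helper name):

```python
from pathlib import Path

def uploaded_file_paths(record: dict, archive_root: str = ".") -> list[Path]:
    # Resolve every file_upload "path" in one responses.jsonl record
    # against the extracted archive root.
    paths = []
    for response in record["responses"].values():
        if response["type"] == "file_upload":
            for f in response["files"]:
                paths.append(Path(archive_root) / f["path"])
    return paths
```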

Response value shapes

| Instruction type | Value fields |
| --- | --- |
| free_text | value: string |
| free_text_with_unit | value: string, unit: string |
| multiple_choice | values: string[] |
| multiple_choice_with_free_text | values: { option: string, explanation: string }[] |
| file_upload | files: { name: string, path: string }[] |

collection.json

Collection metadata and a list of all instructions, useful for mapping instruction IDs to their descriptions and types.

```json
{
  "collection_id": "0192a3b4-c5d6-7e8f-9a0b-1c2d3e4f5a6b",
  "name": "Skin Condition Image Collection",
  "exported_at": "2026-03-19T10:30:00Z",
  "instructions": [
    {
      "id": "0192a3b4-e7f8-7a0b-1c2d-3e4f5a6b7c8d",
      "type": "free_text",
      "description": "Briefly describe the skin condition shown in your image"
    },
    {
      "id": "0192a3b5-e3f4-7a5b-6c7d-9e0f1a2b3c4d",
      "type": "file_upload",
      "description": "Upload a clear photo of the affected area"
    }
  ]
}
```

Working with the exported data

Load responses with pandas

```python
import pandas as pd

df = pd.read_json("responses.jsonl", lines=True)
print(df.head())
```

Extract uploaded files

```python
import zipfile

with zipfile.ZipFile("export.zip") as z:
    z.extractall("export/")

# The ZIP contains a single top-level directory (see "Archive format"), so
# files land at:
# export/collection-export-{collection_id}-{timestamp}/files/{submission_id}_{instruction_id}_{index}.{ext}
```

The submission_id prefix in each filename lets you match files back to their submission record in responses.jsonl.
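Splitting the filename from the right makes that matching robust: instruction IDs are UUIDs with no underscores, so the last two underscores always separate the instruction ID and index. A sketch (`parse_upload_filename` is a hypothetical helper name):

```python
def parse_upload_filename(filename: str) -> dict:
    # Split "{submission_id}_{instruction_id}_{index}.{ext}" from the right,
    # so a submission ID containing underscores would still parse correctly.
    stem, ext = filename.rsplit(".", 1)
    submission_id, instruction_id, index = stem.rsplit("_", 2)
    return {
        "submission_id": submission_id,
        "instruction_id": instruction_id,
        "index": int(index),
        "ext": ext,
    }
```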

Handle all response types

```python
for record in df.itertuples():
    for instruction_id, response in record.responses.items():
        match response["type"]:
            case "free_text" | "free_text_with_unit":
                print(response["value"])
            case "multiple_choice":
                print(response["values"])
            case "multiple_choice_with_free_text":
                for v in response["values"]:
                    print(v["option"], v["explanation"])
            case "file_upload":
                for f in response["files"]:
                    print(f["path"])
```

Notes

  • Presigned URL expiry: download URLs are valid for 24 hours. Re-poll GET to receive a refreshed URL.
  • Retry on failure: a failed export can be retried by sending POST again.
  • Active responses only: deleted and no_submission responses are excluded from the export.

By using AI Task Builder, you agree to our AI Task Builder Terms.