# Uploading Data to a Dataset

This guide covers the V4 dataset upload flow: requesting a presigned URL, uploading a file, and polling the import job until processing completes.

For V3 (legacy CSV/ZIP) uploads, see [Working with Datasets](/api-reference/ai-task-builder/datasets).

## Overview

V4 uploads follow a three-step flow (the paths below are relative to `/api/v1/data-collection`):

1. Call `GET /datasets/{dataset_id}/upload-url/{filename}`. The service creates an import job and returns a presigned S3 URL and an `import_id`.
2. PUT the file directly to S3 using the presigned URL. The file never passes through the Prolific API.
3. Poll `GET /datasets/{dataset_id}/imports/{import_id}` until the import job reaches a terminal status.

## Step 1: Request a presigned URL

```http
GET /api/v1/data-collection/datasets/{dataset_id}/upload-url/{filename}
```

Example:

```bash
curl -H "Authorization: Token {api_token}" \
  "https://api.prolific.com/api/v1/data-collection/datasets/0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d/upload-url/reviews.jsonl"
```

Response:

```json
{
  "upload_url": "https://s3.amazonaws.com/raw-datasets/0192a3b5.../01935c2d.../reviews.jsonl?X-Amz-Signature=...",
  "http_method": "PUT",
  "content_type": "application/x-ndjson",
  "import_id": "01935c2d-1a2b-3c4d-5e6f-7a8b9c0d1e2f"
}
```

Save the `import_id` — you will need it to poll for status.
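A minimal Bash sketch of this step, assuming `curl` is installed and your API token is stored in a `PROLIFIC_API_TOKEN` environment variable (a hypothetical name). The helper names `request_upload_url` and `json_string` are illustrative. `json_string` is a naive `sed` extractor that only handles flat string fields and does not decode JSON escapes, so prefer `jq -r` when it is available.

```bash
# Sketch: request a presigned upload URL and capture the fields needed later.
# PROLIFIC_API_TOKEN is a hypothetical variable name for your API token.
API_BASE="https://api.prolific.com/api/v1/data-collection"

# Print a top-level string field from the JSON document on stdin.
# Naive: does not decode JSON escapes; prefer `jq -r` when jq is available.
json_string() {
  tr -d '\n' | sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# GET /datasets/{dataset_id}/upload-url/{filename} and print the raw response.
request_upload_url() {
  local dataset_id="$1" filename="$2"
  curl --fail --silent --show-error \
    -H "Authorization: Token ${PROLIFIC_API_TOKEN}" \
    "${API_BASE}/datasets/${dataset_id}/upload-url/${filename}"
}

# Example usage:
# response="$(request_upload_url "$DATASET_ID" reviews.jsonl)"
# upload_url="$(printf '%s' "$response" | json_string upload_url)"
# import_id="$(printf '%s' "$response" | json_string import_id)"
```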

## Step 2: Upload the file to S3

PUT the file directly to the presigned URL. The signature embedded in the URL authorizes the request, so do not send your Prolific `Authorization` header to S3.

Use the `content_type` value from the upload URL response as the `Content-Type` header — for example, `application/x-ndjson` for JSONL files.

```bash
curl -X PUT \
  -H "Content-Type: {content_type}" \
  --data-binary @reviews.jsonl \
  "{upload_url}"
```
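The same upload as a reusable Bash function that checks the HTTP status S3 returns. This is a sketch: `upload_file` is a hypothetical helper name, and treating any 2xx response as success is an assumption about how you want to handle the result.

```bash
# Sketch: PUT a file to a presigned S3 URL and fail loudly on a non-2xx reply.
# No Prolific Authorization header is sent; the URL's signature authorizes the request.
upload_file() {
  local upload_url="$1" file_path="$2" content_type="$3"
  local http_code
  # Discard the response body and capture only the HTTP status code
  http_code="$(curl --silent --show-error \
    --output /dev/null --write-out '%{http_code}' \
    -X PUT \
    -H "Content-Type: ${content_type}" \
    --data-binary "@${file_path}" \
    "$upload_url")" || return 1
  case "$http_code" in
    2??) return 0 ;;
    *)
      echo "S3 upload failed with HTTP ${http_code}" >&2
      return 1
      ;;
  esac
}

# Example usage:
# upload_file "$upload_url" reviews.jsonl application/x-ndjson
```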

## Step 3: Poll the import job

```http
GET /api/v1/data-collection/datasets/{dataset_id}/imports/{import_id}
```

Example:

```bash
curl -H "Authorization: Token {api_token}" \
  "https://api.prolific.com/api/v1/data-collection/datasets/0192a3b5.../imports/01935c2d..."
```

Poll every few seconds until the status is terminal (`complete`, `partial`, `failed`, or `pending_schema`).
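A polling loop might look like the following Bash sketch. `POLL_INTERVAL` (seconds between requests) and `MAX_POLLS` (a cap so the loop cannot run forever) are hypothetical settings rather than API parameters, and `PROLIFIC_API_TOKEN` is again an assumed variable name. The function prints the final response once the import reaches one of the four terminal statuses.

```bash
# Sketch: poll an import job until it reaches a terminal status.
# PROLIFIC_API_TOKEN, POLL_INTERVAL, and MAX_POLLS are hypothetical names.
API_BASE="https://api.prolific.com/api/v1/data-collection"
POLL_INTERVAL="${POLL_INTERVAL:-5}"   # seconds between requests
MAX_POLLS="${MAX_POLLS:-120}"         # give up after this many requests

get_import() {
  curl --fail --silent --show-error \
    -H "Authorization: Token ${PROLIFIC_API_TOKEN}" \
    "${API_BASE}/datasets/$1/imports/$2"
}

# Naive extraction of the top-level "status" string; `jq -r .status` is more robust.
import_status() {
  tr -d '\n' | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

poll_import() {
  local dataset_id="$1" import_id="$2" body status attempt
  for ((attempt = 1; attempt <= MAX_POLLS; attempt++)); do
    body="$(get_import "$dataset_id" "$import_id")" || return 1
    status="$(printf '%s' "$body" | import_status)"
    case "$status" in
      complete|partial|failed|pending_schema)
        # Terminal: print the full response so the caller can inspect it
        printf '%s\n' "$body"
        return 0
        ;;
    esac
    sleep "$POLL_INTERVAL"
  done
  echo "Import ${import_id} not terminal after ${MAX_POLLS} polls" >&2
  return 1
}

# Example usage:
# poll_import "$DATASET_ID" "$import_id" > import.json
```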

## Import job statuses

| Status           | Terminal | Description                               |
| ---------------- | -------- | ----------------------------------------- |
| `uninitialised`  | No       | Import job created; waiting for S3 upload |
| `queued`         | No       | File received; queued for extraction      |
| `processing`     | No       | Extraction in progress                    |
| `complete`       | Yes      | All records accepted                      |
| `partial`        | Yes      | Some records accepted, some rejected      |
| `failed`         | Yes      | Extraction failed entirely                |
| `pending_schema` | Yes      | Dataset has no schema; upload paused      |

## Handling outcomes

### Complete

All records were accepted. The dataset is ready to be used.

```json
{
  "import_id": "01935c2d-1a2b-3c4d-5e6f-7a8b9c0d1e2f",
  "status": "complete",
  "accepted_count": 1000
}
```

### Partial

Some records were accepted and some were rejected. The accepted records are available immediately — you do not need to re-upload the entire file. Review the `errors` array to understand which records were rejected and why.

```json
{
  "import_id": "01935c2d-1a2b-3c4d-5e6f-7a8b9c0d1e2f",
  "status": "partial",
  "accepted_count": 997,
  "rejected_count": 3,
  "errors": [
    { "record_index": 47, "field": "review_text", "reason": "Value exceeds maximum length" },
    { "record_index": 312, "field": null, "reason": "Record missing required field: product_name" },
    { "record_index": 891, "field": "image_url", "reason": "Value is not a valid URL" }
  ]
}
```

Fix the rejected records in a new file and upload it separately using the same three-step flow. The new records will be appended to the existing accepted records.
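To build that new file, you can copy the rejected records out of the original upload. The Bash sketch below assumes that `record_index` is the zero-based line number of the record in the uploaded JSONL file; this guide does not state the indexing convention, so confirm it before relying on the output. `rejected_indexes` and `extract_rejected` are hypothetical helper names.

```bash
# Sketch: copy rejected records out of the original JSONL upload for fixing.
# ASSUMPTION: record_index is the zero-based line number in the uploaded file.

# Print each record_index from an import response on stdin, one per line.
rejected_indexes() {
  grep -o '"record_index"[[:space:]]*:[[:space:]]*[0-9][0-9]*' | grep -o '[0-9][0-9]*$'
}

# Usage: extract_rejected <import-response.json> <original.jsonl>
extract_rejected() {
  local response_file="$1" jsonl_file="$2"
  # First input (stdin) lists the indexes; awk line numbers are 1-based, hence +1
  rejected_indexes < "$response_file" |
    awk 'NR == FNR { want[$1 + 1] = 1; next } (FNR in want)' - "$jsonl_file"
}

# Example usage:
# extract_rejected import.json reviews.jsonl > reviews-rejected.jsonl
```

Edit the records in the extracted file, then upload it with the three-step flow.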

### Failed

The file could not be parsed at all. Check the `reason` field and correct the file before re-uploading.

```json
{
  "import_id": "01935c2d-1a2b-3c4d-5e6f-7a8b9c0d1e2f",
  "status": "failed",
  "reason": "File could not be parsed as JSONL: invalid JSON on line 1"
}
```

### Pending schema

The dataset does not yet have a schema defined. Set a schema on the dataset and then re-upload the file.
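The outcomes above can be dispatched in one place. In this Bash sketch, `handle_outcome` (a hypothetical name) reads a terminal import response on stdin and reports the next step described in this guide. Its non-zero return codes are a hypothetical convention for scripts, and the `sed`-based field extraction is naive: it does not handle escaped quotes inside `reason`.

```bash
# Sketch: act on a terminal import response read from stdin.
# Return codes 1-3 are a hypothetical convention, not part of the API.
json_string() {
  tr -d '\n' | sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}
json_number() {
  tr -d '\n' | sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p"
}

handle_outcome() {
  local body status
  body="$(cat)"
  status="$(printf '%s' "$body" | json_string status)"
  case "$status" in
    complete)
      echo "Import complete: $(printf '%s' "$body" | json_number accepted_count) records accepted"
      ;;
    partial)
      echo "Import partial: $(printf '%s' "$body" | json_number accepted_count) accepted, $(printf '%s' "$body" | json_number rejected_count) rejected; review the errors array"
      ;;
    failed)
      echo "Import failed: $(printf '%s' "$body" | json_string reason)" >&2
      return 1
      ;;
    pending_schema)
      echo "Dataset has no schema: set one, then re-upload the file" >&2
      return 2
      ;;
    *)
      echo "Unexpected or non-terminal status: ${status:-<none>}" >&2
      return 3
      ;;
  esac
}
```

For example, `handle_outcome < import.json`, where `import.json` holds the response from the polling request.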

## Uploading multiple files

You can upload multiple files to the same dataset. Each upload creates a new import job with its own `import_id`. Uploads for the same dataset are processed one at a time in arrival order, so a second upload will wait in `queued` status while the first is still `processing`.

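```bash
# Request an upload URL for the first file, then PUT batch1.jsonl to its upload_url
curl -H "Authorization: Token {api_token}" \
  "https://api.prolific.com/api/v1/data-collection/datasets/{dataset_id}/upload-url/batch1.jsonl"
# → import_id: "aaa..."

# Request an upload URL for the second file (the first may still be processing)
curl -H "Authorization: Token {api_token}" \
  "https://api.prolific.com/api/v1/data-collection/datasets/{dataset_id}/upload-url/batch2.jsonl"
# → import_id: "bbb..."

# Poll both imports independently
curl -H "Authorization: Token {api_token}" \
  "https://api.prolific.com/api/v1/data-collection/datasets/{dataset_id}/imports/aaa..."
curl -H "Authorization: Token {api_token}" \
  "https://api.prolific.com/api/v1/data-collection/datasets/{dataset_id}/imports/bbb..."
```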

The `total_datapoint_count` field on the dataset reflects the cumulative count across all completed imports.
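Applied to several files, the three-step flow can be scripted as in the following Bash sketch, which uploads each file and then polls each import in turn. It assumes `PROLIFIC_API_TOKEN` holds your token, that filenames need no URL encoding, and that `application/x-ndjson` is an acceptable fallback when the response carries no `content_type`. `upload_all`, `POLL_INTERVAL`, and `MAX_POLLS` are hypothetical names. Because imports for a dataset run one at a time, later imports may report `queued` for a while.

```bash
# Sketch: upload several JSONL files to one dataset, then poll every import.
# PROLIFIC_API_TOKEN, POLL_INTERVAL, and MAX_POLLS are hypothetical names.
API_BASE="https://api.prolific.com/api/v1/data-collection"
POLL_INTERVAL="${POLL_INTERVAL:-5}"
MAX_POLLS="${MAX_POLLS:-120}"

# Naive extraction of a top-level string field from JSON on stdin.
json_string() {
  tr -d '\n' | sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

api_get() {
  curl --fail --silent --show-error \
    -H "Authorization: Token ${PROLIFIC_API_TOKEN}" "${API_BASE}$1"
}

upload_all() {
  local dataset_id="$1"; shift
  local file response upload_url content_type import_id body status attempt
  local -a import_ids=()

  # Steps 1 and 2 for each file: request a presigned URL, then PUT the file
  for file in "$@"; do
    response="$(api_get "/datasets/${dataset_id}/upload-url/$(basename "$file")")" || return 1
    upload_url="$(printf '%s' "$response" | json_string upload_url)"
    import_id="$(printf '%s' "$response" | json_string import_id)"
    content_type="$(printf '%s' "$response" | json_string content_type)"
    curl --fail --silent --show-error -X PUT \
      -H "Content-Type: ${content_type:-application/x-ndjson}" \
      --data-binary "@${file}" "$upload_url" > /dev/null || return 1
    import_ids+=("$import_id")
  done

  # Step 3 for each import: poll until terminal (or MAX_POLLS), print "<import_id> <status>"
  for import_id in "${import_ids[@]}"; do
    status=""
    for ((attempt = 1; attempt <= MAX_POLLS; attempt++)); do
      body="$(api_get "/datasets/${dataset_id}/imports/${import_id}")" || return 1
      status="$(printf '%s' "$body" | json_string status)"
      case "$status" in
        complete|partial|failed|pending_schema) break ;;
      esac
      sleep "$POLL_INTERVAL"
    done
    printf '%s %s\n' "$import_id" "$status"
  done
}

# Example usage:
# upload_all "$DATASET_ID" batch1.jsonl batch2.jsonl
```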