# Get a Dataset

GET https://api.prolific.com/api/v1/data-collection/datasets/{dataset_id}

Get a specific AI Task Builder dataset by its unique identifier.

For **V4 datasets**, the response includes the dataset `schema` (`null` if none has been set) and an `imports` array listing all import jobs for this dataset, most recent first.

For **V3 datasets**, the response includes `status`, `filename`, and `has_predetermined_grouping_id` instead.
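Because the response shape depends on `schema_version`, client code typically branches on it before reading version-specific fields. The following is a minimal sketch, assuming the response has already been decoded into a Python `dict`; `summarize_dataset` is a hypothetical helper, not part of any Prolific SDK. It normalizes `schema_version` with `str()` so the check works whether the value arrives as a string (as the schema enum declares) or as a number.

```python
from typing import Any


def summarize_dataset(dataset: dict[str, Any]) -> dict[str, Any]:
    """Return the version-specific fields of a decoded dataset response.

    Hypothetical helper: accepts schema_version as "3"/"4" or 3/4.
    """
    version = str(dataset["schema_version"])
    summary: dict[str, Any] = {"id": dataset["id"], "version": version}
    if version == "4":
        # V4: optional field schema plus import jobs (null or a list)
        schema = dataset.get("schema")
        summary["field_keys"] = sorted(schema["fields"]) if schema else []
        summary["import_statuses"] = [
            job["status"] for job in dataset.get("imports") or []
        ]
    elif version == "3":
        # V3: a single processing status and the uploaded filename
        summary["status"] = dataset.get("status")
        summary["filename"] = dataset.get("filename")
        summary["has_predetermined_grouping_id"] = dataset.get(
            "has_predetermined_grouping_id"
        )
    else:
        raise ValueError(f"unexpected schema_version: {version!r}")
    return summary
```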

Reference: https://docs.prolific.com/api-reference/ai-task-builder/get-task-builder-dataset

## OpenAPI Specification

```yaml
openapi: 3.1.0
info:
  title: Prolific API for Data Collectors
  version: 1.0.0
paths:
  /api/v1/data-collection/datasets/{dataset_id}:
    get:
      operationId: get-task-builder-dataset
      summary: Get a Dataset
      description: >-
        Get a specific AI Task Builder dataset by its unique identifier.


        For **V4 datasets**, the response includes the dataset `schema` (if one
        has been set) and an `imports` array listing all import jobs for this
        dataset.


        For **V3 datasets**, the response includes `status`, `filename`, and
        `has_predetermined_grouping_id` instead.
      tags:
        - subpackage_aiTaskBuilder
      parameters:
        - name: dataset_id
          in: path
          description: The unique identifier of the AI Task Builder dataset
          required: true
          schema:
            type: string
        - name: Authorization
          in: header
          description: >-
            The Prolific API uses API tokens to authenticate requests. You can
            create an API token directly from your settings.


            Your API token does not expire and carries full permissions, so keep
            it secure.


            If your token is leaked, delete it and create a new one directly in
            the app.


            In your requests, add an `Authorization` header with the value
            `Token <your token>`.
          required: true
          schema:
            type: string
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/AITaskBuilderDataset'
        '400':
          description: Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
        '404':
          description: Dataset not found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
servers:
  - url: https://api.prolific.com
    description: Production
components:
  schemas:
    AiTaskBuilderDatasetSchemaVersion:
      type: string
      enum:
        - '3'
        - '4'
      description: >-
        Internal dataset version. 3 = legacy CSV/ZIP; 4 = structured schema with
        JSONL/CSV import tracking.
      title: AiTaskBuilderDatasetSchemaVersion
    AiTaskBuilderDatasetStatus:
      type: string
      enum:
        - ERROR
        - PROCESSING
        - READY
        - UNINITIALISED
      description: >-
        Processing status. **V3 datasets only.** V4 datasets track status per
        import job via `imports`.
      title: AiTaskBuilderDatasetStatus
    DatasetSchemaFieldType:
      type: string
      enum:
        - text
        - image_url
        - metadata
        - task_group_id
      description: >-
        Field type:

        - `text` — a text value shown to participants. Can be referenced by
        `dataset_field` items in `batch_items`.

        - `image_url` — a URL pointing to an image shown to participants. Can be
        referenced by `dataset_field` items in `batch_items`.

        - `metadata` — an internal value included in exports but not shown to
        participants (equivalent to the `META_` prefix in V3 CSV datasets).

        - `task_group_id` — groups datapoints into task groups. Rows with the
        same `task_group_id` value are assigned to the same participant.
        Equivalent to `META_TASK_GROUP_ID` in V3 CSV datasets. At most one field
        per schema may have this type.
      title: DatasetSchemaFieldType
    DatasetSchemaField:
      type: object
      properties:
        type:
          $ref: '#/components/schemas/DatasetSchemaFieldType'
          description: >-
            Field type:

            - `text` — a text value shown to participants. Can be referenced by
            `dataset_field` items in `batch_items`.

            - `image_url` — a URL pointing to an image shown to participants.
            Can be referenced by `dataset_field` items in `batch_items`.

            - `metadata` — an internal value included in exports but not shown
            to participants (equivalent to the `META_` prefix in V3 CSV
            datasets).

            - `task_group_id` — groups datapoints into task groups. Rows with
            the same `task_group_id` value are assigned to the same participant.
            Equivalent to `META_TASK_GROUP_ID` in V3 CSV datasets. At most one
            field per schema may have this type.
        label:
          type: string
          description: >-
            Optional human-readable label for the field, used in the participant
            interface.
      required:
        - type
      description: Descriptor for a single field in a dataset schema.
      title: DatasetSchemaField
    DatasetSchema:
      type: object
      properties:
        strict:
          type: boolean
          description: >-
            When `true`, records that are missing any field defined in the
            schema are rejected during import. When `false`, missing fields are
            allowed and treated as absent.
        fields:
          type: object
          additionalProperties:
            $ref: '#/components/schemas/DatasetSchemaField'
          description: >-
            A map of field key to field descriptor. The key is used to reference
            the field in `batch_items` (`dataset_field` items) and in JSONL
            records.
      required:
        - strict
        - fields
      description: >-
        A researcher-defined schema that specifies the fields in a V4 dataset.
        Each field has a type that determines how it is rendered to participants
        and how it can be referenced in `batch_items`.


        Constraints:

        - Maximum 200 fields per schema.

        - Field keys: 1–128 characters.

        - Field labels: maximum 255 characters.
      title: DatasetSchema
    DatasetImportJobType:
      type: string
      enum:
        - file_upload
      description: The type of import. Currently always `file_upload`.
      title: DatasetImportJobType
    DatasetImportJobStatus:
      type: string
      enum:
        - uninitialised
        - queued
        - processing
        - complete
        - partial
        - failed
        - pending_schema
      description: Current status of the import job.
      title: DatasetImportJobStatus
    ImportJobError:
      type: object
      properties:
        record_index:
          type:
            - integer
            - 'null'
          description: Zero-based index of the rejected record in the uploaded file.
        field:
          type:
            - string
            - 'null'
          description: >-
            The schema field key that caused the rejection, or `null` if no
            specific field is implicated.

            The special value `_raw` indicates a whole-record parse failure
            (e.g. malformed JSON) rather than a field-level error.
        reason:
          type:
            - string
            - 'null'
          description: Human-readable description of why the record was rejected.
      description: A record-level validation error from a partial import.
      title: ImportJobError
    DatasetImportJob:
      type: object
      properties:
        dataset_id:
          type: string
          format: uuid
          description: The dataset this import job belongs to.
        import_id:
          type: string
          format: uuid
          description: The unique identifier of the import job.
        type:
          $ref: '#/components/schemas/DatasetImportJobType'
          description: The type of import. Currently always `file_upload`.
        filename:
          type:
            - string
            - 'null'
          description: The original filename supplied when the upload URL was requested.
        created_at:
          type: string
          format: date-time
          description: When the import job was created (ISO 8601, UTC).
        updated_at:
          type: string
          format: date-time
          description: When the import job was last updated (ISO 8601, UTC).
        status:
          $ref: '#/components/schemas/DatasetImportJobStatus'
          description: Current status of the import job.
        accepted_count:
          type: integer
          description: >-
            Number of records successfully ingested. Present when status is
            `complete` or `partial`.
        rejected_count:
          type: integer
          description: Number of records rejected. Present when status is `partial`.
        errors:
          type: array
          items:
            $ref: '#/components/schemas/ImportJobError'
          description: Record-level validation errors. Present when status is `partial`.
        reason:
          type: string
          description: Human-readable reason for failure. Present when status is `failed`.
      required:
        - dataset_id
        - import_id
        - type
        - created_at
        - updated_at
        - status
      description: >-
        Tracks the lifecycle of a single file upload to a V4 dataset.


        An import job is created when `GET
        /datasets/{dataset_id}/upload-url/{filename}` is called. It transitions
        through the following statuses:


        | Status | Meaning |

        |---|---|

        | `uninitialised` | Import job created; file not yet uploaded to S3 |

        | `queued` | File uploaded; queued for extraction |

        | `processing` | Extraction in progress |

        | `complete` | All records accepted |

        | `partial` | Some records accepted, some rejected — see `errors` |

        | `failed` | Extraction failed entirely — see `reason` |

        | `pending_schema` | Dataset has no schema; upload paused until schema
        is set |


        Poll `GET /datasets/{dataset_id}/imports/{import_id}` until a terminal
        status is reached (`complete`, `partial`, `failed`, or
        `pending_schema`).
      title: DatasetImportJob
    AITaskBuilderDataset:
      type: object
      properties:
        id:
          type: string
          format: uuid
        name:
          type: string
        created_at:
          type: string
          format: date-time
        created_by:
          type: string
        workspace_id:
          type: string
        total_datapoint_count:
          type: integer
        schema_version:
          $ref: '#/components/schemas/AiTaskBuilderDatasetSchemaVersion'
          description: >-
            Internal dataset version. 3 = legacy CSV/ZIP; 4 = structured schema
            with JSONL/CSV import tracking.
        status:
          $ref: '#/components/schemas/AiTaskBuilderDatasetStatus'
          description: >-
            Processing status. **V3 datasets only.** V4 datasets track status
            per import job via `imports`.
        filename:
          type:
            - string
            - 'null'
          description: Filename of the uploaded data file. **V3 datasets only.**
        has_predetermined_grouping_id:
          type:
            - boolean
            - 'null'
          description: >-
            Whether the dataset contains a `META_TASK_GROUP_ID` column. **V3
            datasets only.**
        schema:
          oneOf:
            - $ref: '#/components/schemas/DatasetSchema'
            - type: 'null'
          description: >-
            The researcher-defined field schema for this dataset. **V4 datasets
            only.**

            `null` if no schema has been set yet.
        imports:
          type:
            - array
            - 'null'
          items:
            $ref: '#/components/schemas/DatasetImportJob'
          description: >-
            Import jobs for this dataset, most recent first. **V4 datasets
            only.** `null` for V3 datasets.
      required:
        - id
        - name
        - created_at
        - created_by
        - workspace_id
        - total_datapoint_count
        - schema_version
      description: >-
        An AI Task Builder dataset. The shape of the response varies by
        `schema_version`:


        - **V3 datasets** include `status`, `filename`, and
        `has_predetermined_grouping_id`.

        - **V4 datasets** include `schema` (the researcher-defined field schema,
        or `null` if not yet set) and `imports` (the list of import jobs for
        this dataset). V4 datasets do not include `status` or `filename`.
      title: AITaskBuilderDataset
    ErrorDetailDetail2:
      type: object
      properties:
        any_field:
          type: array
          items:
            type: string
          description: >-
            Keyed by the name of a field with a validation error; the value is
            an array of error descriptions for that field.
      description: Validation errors keyed by field name
      title: ErrorDetailDetail2
    ErrorDetailDetail:
      oneOf:
        - type: string
        - type: array
          items:
            type: string
        - $ref: '#/components/schemas/ErrorDetailDetail2'
      description: Error detail
      title: ErrorDetailDetail
    ErrorDetail:
      type: object
      properties:
        status:
          type: integer
          description: HTTP status code
        error_code:
          type: integer
          description: Internal error code
        title:
          type: string
          description: Error title
        detail:
          $ref: '#/components/schemas/ErrorDetailDetail'
          description: Error detail
        additional_information:
          type: string
          description: Optional extra information
        traceback:
          type: string
          description: Optional debug information
        interactive:
          type: boolean
      required:
        - status
        - error_code
        - title
        - detail
      title: ErrorDetail
    Error:
      type: object
      properties:
        error:
          $ref: '#/components/schemas/ErrorDetail'
      required:
        - error
      title: Error
  securitySchemes:
    token:
      type: apiKey
      in: header
      name: Authorization
      description: >-
        The Prolific API uses API tokens to authenticate requests. You can create
        an API token directly from your settings.


        Your API token does not expire and carries full permissions, so keep it
        secure.


        If your token is leaked, delete it and create a new one directly in the
        app.


        In your requests, add an `Authorization` header with the value
        `Token <your token>`.

```
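The `DatasetImportJob` description in the specification says to poll the import job until it reaches a terminal status (`complete`, `partial`, `failed`, or `pending_schema`). The following is a minimal polling sketch under stated assumptions: `wait_for_import` is a hypothetical helper, and `fetch_job` is a caller-supplied function that performs one `GET /datasets/{dataset_id}/imports/{import_id}` request and returns the decoded job, since this page does not give that endpoint's full URL. The default interval and timeout are illustrative values, not documented limits.

```python
import time
from typing import Any, Callable

# Terminal statuses listed in the DatasetImportJob description
TERMINAL_STATUSES = {"complete", "partial", "failed", "pending_schema"}


def wait_for_import(
    fetch_job: Callable[[], dict[str, Any]],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Poll an import job until it reaches a terminal status.

    Hypothetical helper: `fetch_job` makes one status request and returns
    the decoded DatasetImportJob; `sleep` and `clock` are injectable so the
    loop can be tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(
                f"import {job.get('import_id')} still {job['status']!r}"
            )
        sleep(interval)
```

Because `pending_schema` counts as terminal, a caller that receives it presumably needs to set a schema on the dataset before the paused upload can continue.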

## Examples


**Request**

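This endpoint takes no request body. Pass the dataset ID in the path and your API token in the `Authorization` header:

```http
GET /api/v1/data-collection/datasets/0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d HTTP/1.1
Host: api.prolific.com
Authorization: Token <token>
```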

**Response** (V4 dataset)

```json
{
  "id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
  "name": "Customer Feedback Dataset",
  "created_at": "2024-01-15T09:30:00Z",
  "created_by": "65786062db3b35bcbeb07bcc",
  "workspace_id": "6278acb09062db3b35bcbeb0",
  "total_datapoint_count": 1498,
  "schema_version": "4",
  "schema": {
    "strict": true,
    "fields": {
      "feedback_text": {
        "type": "text"
      }
    }
  },
  "imports": [
    {
      "dataset_id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
      "import_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "type": "file_upload",
      "created_at": "2024-01-15T09:30:00Z",
      "updated_at": "2024-01-15T10:00:00Z",
      "status": "partial",
      "filename": "customer_feedback_jan2024.jsonl",
      "accepted_count": 1498,
      "rejected_count": 2,
      "errors": [
        {
          "record_index": 102,
          "field": "feedback_text",
          "reason": "Text field exceeds maximum length"
        },
        {
          "record_index": 457,
          "field": "_raw",
          "reason": "Malformed JSON in record"
        }
      ]
    }
  ]
}
```
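For a `partial` import, each entry in `errors` identifies a rejected record by `record_index` and, where possible, the schema `field` that caused the rejection; the special field value `_raw` marks a whole-record parse failure, and `null` means no specific field is implicated. The following is a minimal sketch that groups these errors from a decoded response like the one above; `rejected_records` and its bucket labels are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Any, Optional


def rejected_records(
    dataset: dict[str, Any],
) -> dict[str, list[tuple[Optional[int], str]]]:
    """Group record-level import errors by schema field.

    Whole-record parse failures (field == "_raw") and errors with no
    specific field (field is None) get their own labelled buckets.
    """
    grouped: dict[str, list[tuple[Optional[int], str]]] = defaultdict(list)
    # `imports` is null for V3 datasets, so fall back to an empty list
    for job in dataset.get("imports") or []:
        if job.get("status") != "partial":
            continue  # `errors` is only present on partial imports
        for err in job.get("errors", []):
            field = err.get("field")
            if field is None:
                key = "<no field>"
            elif field == "_raw":
                key = "<whole record>"
            else:
                key = field
            grouped[key].append((err.get("record_index"), err.get("reason") or ""))
    return dict(grouped)
```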

**SDK Code**

```python
import requests

dataset_id = "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d"
url = f"https://api.prolific.com/api/v1/data-collection/datasets/{dataset_id}"

# GET requests carry no body; authenticate with the Authorization header only
headers = {"Authorization": "Token <token>"}

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # raises requests.HTTPError on 400/404 responses

print(response.json())
```
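Error responses (`400`, `404`) use the `Error` schema, whose `detail` may be a string, an array of strings, or an object mapping field names to arrays of messages. The following is a minimal sketch that flattens any of these shapes into readable lines, assuming the body has already been decoded with `response.json()`; `error_messages` is a hypothetical helper name.

```python
from typing import Any


def error_messages(body: dict[str, Any]) -> list[str]:
    """Flatten an Error response body into human-readable lines."""
    error = body["error"]
    prefix = f"{error['status']} {error['title']}"
    detail = error["detail"]
    if isinstance(detail, str):
        return [f"{prefix}: {detail}"]
    if isinstance(detail, list):
        return [f"{prefix}: {item}" for item in detail]
    # Object form: field name -> list of error descriptions
    return [
        f"{prefix}: {field}: {message}"
        for field, messages in detail.items()
        for message in messages
    ]
```

With `requests`, this could be called as `error_messages(response.json())` when `response.ok` is false, instead of raising immediately.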

```javascript
// Top-level await requires an ES module (for example, a .mjs file)
const url = 'https://api.prolific.com/api/v1/data-collection/datasets/0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d';
// GET requests must not have a body; fetch() rejects one with a TypeError
const options = {
  method: 'GET',
  headers: {Authorization: 'Token <token>'}
};

try {
  const response = await fetch(url, options);
  const data = await response.json();
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${JSON.stringify(data)}`);
  }
  console.log(data);
} catch (error) {
  console.error(error);
}
```

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	url := "https://api.prolific.com/api/v1/data-collection/datasets/0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d"

	// GET requests carry no body, so pass nil
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Add("Authorization", "Token <token>")

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err) // res is nil here, so don't touch res.Body
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(res.Status)
	fmt.Println(string(body))
}
```

```ruby
require 'uri'
require 'net/http'

url = URI("https://api.prolific.com/api/v1/data-collection/datasets/0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d")

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true

# GET requests carry no body; authenticate with the Authorization header only
request = Net::HTTP::Get.new(url)
request["Authorization"] = 'Token <token>'

response = http.request(request)
puts response.code
puts response.read_body
```

```java
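import com.mashape.unirest.http.HttpResponse;
import com.mashape.unirest.http.Unirest;

public class GetDataset {
    public static void main(String[] args) throws Exception {
        // GET requests carry no body; authenticate with the Authorization header only
        HttpResponse<String> response = Unirest.get("https://api.prolific.com/api/v1/data-collection/datasets/0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d")
            .header("Authorization", "Token <token>")
            .asString();
        System.out.println(response.getStatus());
        System.out.println(response.getBody());

        // Release Unirest's background resources so the JVM can exit
        Unirest.shutdown();
    }
}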
```

```php
<?php
require_once('vendor/autoload.php');

$client = new \GuzzleHttp\Client();

// GET requests carry no body; authenticate with the Authorization header only
$response = $client->request('GET', 'https://api.prolific.com/api/v1/data-collection/datasets/0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d', [
  'headers' => [
    'Authorization' => 'Token <token>',
  ],
]);

echo $response->getBody();
```

```csharp
using System;
using RestSharp;

// Uses the pre-107 RestSharp API (Method.GET, IRestResponse)
var client = new RestClient("https://api.prolific.com/api/v1/data-collection/datasets/0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d");
var request = new RestRequest(Method.GET);
// GET requests carry no body; authenticate with the Authorization header only
request.AddHeader("Authorization", "Token <token>");
IRestResponse response = client.Execute(request);
Console.WriteLine(response.StatusCode);
Console.WriteLine(response.Content);
```

```swift
import Foundation

let url = URL(string: "https://api.prolific.com/api/v1/data-collection/datasets/0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d")!
var request = URLRequest(url: url, cachePolicy: .useProtocolCachePolicy, timeoutInterval: 10.0)
request.httpMethod = "GET"
// GET requests carry no body; authenticate with the Authorization header only
request.setValue("Token <token>", forHTTPHeaderField: "Authorization")

// Block until the request finishes so a command-line script does not exit early
let semaphore = DispatchSemaphore(value: 0)

let dataTask = URLSession.shared.dataTask(with: request) { data, response, error in
  defer { semaphore.signal() }
  if let error = error {
    print(error)
    return
  }
  if let httpResponse = response as? HTTPURLResponse {
    print(httpResponse.statusCode)
  }
  if let data = data, let body = String(data: data, encoding: .utf8) {
    print(body)
  }
}

dataTask.resume()
semaphore.wait()
```