Defining a Dataset Schema

A dataset schema lets you describe the structure of your data before uploading it. Schemas are a feature of V4 datasets and enable:

Named, typed fields that drive what participants see
`dataset_field` items in `batch_items` that pull dataset values directly into your task layout
Per-record validation during import (with strict mode)
Structured metadata and custom task grouping

Creating a dataset with a schema

Pass a `schema` object in the request body when creating a dataset.

$ POST /api/v1/data-collection/datasets

{
  "name": "AI response evaluation",
  "workspace_id": "6278acb09062db3b35bcbeb0",
  "schema": {
    "strict": true,
    "fields": {
      "prompt": { "type": "text", "label": "Prompt" },
      "response_a": { "type": "text", "label": "Response A" },
      "response_b": { "type": "text", "label": "Response B" },
      "category": { "type": "metadata" }
    }
  }
}
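
As a rough sketch, the request above could be assembled in Python using only the standard library. The host `https://api.example.com`, the `Authorization: Token ...` header format, and the helper name `build_create_dataset_request` are assumptions made for illustration; only the path and the body shape come from this page.

```python
import json
import urllib.request

# Assumed values: this page gives the path only, not the host or auth scheme.
BASE_URL = "https://api.example.com"  # hypothetical host
API_TOKEN = "YOUR_API_TOKEN"          # hypothetical credential


def build_create_dataset_request(name, workspace_id, fields, strict=True):
    """Build (but do not send) a POST request that creates a dataset with a schema."""
    body = {
        "name": name,
        "workspace_id": workspace_id,
        "schema": {"strict": strict, "fields": fields},
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/data-collection/datasets",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed header format; check how your API credentials are sent.
            "Authorization": f"Token {API_TOKEN}",
        },
        method="POST",
    )


request = build_create_dataset_request(
    name="AI response evaluation",
    workspace_id="6278acb09062db3b35bcbeb0",
    fields={
        "prompt": {"type": "text", "label": "Prompt"},
        "response_a": {"type": "text", "label": "Response A"},
        "response_b": {"type": "text", "label": "Response B"},
        "category": {"type": "metadata"},
    },
)
```

To send it, pass `request` to `urllib.request.urlopen` and decode the response with `json.load`. Building the request separately from sending it makes the payload easy to inspect or log first.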

Field types

Type	Description
`text`	A text value displayed to participants. Can be referenced by `dataset_field` items in `batch_items`.
`image_url`	A URL pointing to an image displayed to participants. Can be referenced by `dataset_field` items in `batch_items`.
`metadata`	An internal value included in exports but not shown to participants. Equivalent to the `META_` column prefix in V3 CSV datasets.
`task_group_id`	Groups datapoints into task groups. Rows with the same value are assigned to the same participant in one submission. At most one field per schema may have this type.
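
To preview how a `task_group_id` field would bucket rows before uploading, a small local sketch like the following may help. It assumes records are plain dictionaries keyed by schema field; the field name `conversation_id` and both helper names are hypothetical and do not appear in the API.

```python
from collections import defaultdict


def find_task_group_field(schema):
    """Return the key of the first task_group_id field, or None if there is none."""
    for key, spec in schema["fields"].items():
        if spec["type"] == "task_group_id":
            return key
    return None


def preview_task_groups(schema, records):
    """Group records by their task_group_id value, keeping input order."""
    group_key = find_task_group_field(schema)
    if group_key is None:
        return {}
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)
    return dict(groups)


# Hypothetical schema and rows, for illustration only.
schema = {
    "strict": True,
    "fields": {
        "prompt": {"type": "text", "label": "Prompt"},
        "conversation_id": {"type": "task_group_id"},
    },
}
records = [
    {"prompt": "First turn", "conversation_id": "conv-1"},
    {"prompt": "Second turn", "conversation_id": "conv-1"},
    {"prompt": "Unrelated prompt", "conversation_id": "conv-2"},
]
groups = preview_task_groups(schema, records)
```

Here the two `conv-1` rows land in one group, which corresponds to rows that would be shown to the same participant in one submission.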

Strict mode

The `strict` flag controls how records with missing fields are handled during import.

Mode	Behaviour
`"strict": true`	Records missing any schema field are rejected. Use this to enforce data completeness.
`"strict": false`	Records with missing fields are accepted. Missing field values are treated as absent.
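
The table above can be approximated locally to estimate which records an import would reject. This is only a sketch of the documented behaviour, not the server's implementation: it assumes records are dictionaries and that a field counts as missing only when its key is absent. The function name `partition_records` is hypothetical.

```python
def partition_records(schema, records):
    """Split records into (accepted, rejected) following the strict-mode table."""
    field_keys = set(schema["fields"])
    strict = schema["strict"]
    accepted, rejected = [], []
    for record in records:
        # Schema fields that this record does not provide at all.
        missing = sorted(field_keys - set(record))
        if strict and missing:
            rejected.append((record, missing))
        else:
            accepted.append(record)
    return accepted, rejected


schema = {
    "strict": True,
    "fields": {
        "prompt": {"type": "text", "label": "Prompt"},
        "response_a": {"type": "text", "label": "Response A"},
        "category": {"type": "metadata"},
    },
}
records = [
    {"prompt": "Hi", "response_a": "Hello", "category": "greeting"},
    {"prompt": "Bye", "response_a": "See you"},  # no "category" key
]

accepted, rejected = partition_records(schema, records)
lenient_accepted, lenient_rejected = partition_records(
    {**schema, "strict": False}, records
)
```

With `strict` set to `true`, the second record is rejected because `category` is missing; with `strict` set to `false`, both records are accepted.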

Schema constraints

Maximum 200 fields per schema.
Field keys: 1–128 characters.
Field labels: maximum 255 characters.
At most one field of type `task_group_id` per schema.
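
These limits are easy to check before sending a schema. The sketch below is a hypothetical client-side pre-check (the name `schema_problems` is not part of the API), and it measures lengths in Python characters, which may differ from how the server counts them.

```python
MAX_FIELDS = 200
MAX_KEY_LENGTH = 128
MAX_LABEL_LENGTH = 255
KNOWN_TYPES = {"text", "image_url", "metadata", "task_group_id"}


def schema_problems(schema):
    """Return a list of human-readable constraint violations (empty if none)."""
    problems = []
    fields = schema.get("fields", {})
    if len(fields) > MAX_FIELDS:
        problems.append(f"{len(fields)} fields exceeds the maximum of {MAX_FIELDS}")

    group_fields = []
    for key, spec in fields.items():
        if not 1 <= len(key) <= MAX_KEY_LENGTH:
            problems.append(f"field key {key!r} must be 1-{MAX_KEY_LENGTH} characters")
        if spec.get("type") not in KNOWN_TYPES:
            problems.append(f"field {key!r} has unknown type {spec.get('type')!r}")
        if len(spec.get("label", "")) > MAX_LABEL_LENGTH:
            problems.append(f"label of {key!r} exceeds {MAX_LABEL_LENGTH} characters")
        if spec.get("type") == "task_group_id":
            group_fields.append(key)

    if len(group_fields) > 1:
        problems.append(f"more than one task_group_id field: {group_fields}")
    return problems


valid_schema = {
    "strict": True,
    "fields": {
        "prompt": {"type": "text", "label": "Prompt"},
        "category": {"type": "metadata"},
    },
}
too_many_groups = {
    "strict": True,
    "fields": {
        "group_a": {"type": "task_group_id"},
        "group_b": {"type": "task_group_id"},
    },
}
```

Calling `schema_problems(valid_schema)` returns an empty list, while `schema_problems(too_many_groups)` reports the duplicate `task_group_id` fields.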

Referencing schema fields in the batch layout

Once a dataset with a schema is attached to a batch, you can use `dataset_field` items in `batch_items` to display dataset values to participants.

{
  "batch_items": [
    {
      "rows": [
        {
          "columns": [
            {
              "items": [
                { "type": "dataset_field", "field": "prompt" },
                { "type": "dataset_field", "field": "response_a" },
                { "type": "dataset_field", "field": "response_b" },
                {
                  "type": "multiple_choice",
                  "description": "Which response is more helpful?",
                  "answer_limit": 1,
                  "options": [
                    { "label": "Response A", "value": "a" },
                    { "label": "Response B", "value": "b" },
                    { "label": "Neither", "value": "neither" }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

Only fields of type `text` or `image_url` can be referenced by `dataset_field` items. `metadata` and `task_group_id` fields are not displayed to participants.
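
A layout can be checked against this rule before it is submitted. The following sketch walks the `batch_items` / `rows` / `columns` / `items` nesting shown above and flags `dataset_field` items that point at an undefined or non-displayable field. The helper names are hypothetical, and the check is a local convenience rather than the API's own validation.

```python
DISPLAYABLE_TYPES = {"text", "image_url"}


def iter_items(batch_items):
    """Yield every item nested under batch_items -> rows -> columns -> items."""
    for batch_item in batch_items:
        for row in batch_item.get("rows", []):
            for column in row.get("columns", []):
                yield from column.get("items", [])


def invalid_field_references(schema, batch_items):
    """Return messages for dataset_field items that the schema cannot satisfy."""
    fields = schema["fields"]
    problems = []
    for item in iter_items(batch_items):
        if item.get("type") != "dataset_field":
            continue  # other item types do not reference the schema
        key = item.get("field")
        if key not in fields:
            problems.append(f"{key!r} is not defined in the schema")
        elif fields[key]["type"] not in DISPLAYABLE_TYPES:
            problems.append(f"{key!r} has a type that cannot be displayed")
    return problems


schema = {
    "strict": True,
    "fields": {
        "prompt": {"type": "text", "label": "Prompt"},
        "category": {"type": "metadata"},
    },
}
batch_items = [
    {
        "rows": [
            {
                "columns": [
                    {
                        "items": [
                            {"type": "dataset_field", "field": "prompt"},
                            {"type": "dataset_field", "field": "category"},
                            {"type": "dataset_field", "field": "summary"},
                        ]
                    }
                ]
            }
        ]
    }
]
problems = invalid_field_references(schema, batch_items)
```

In this example `prompt` passes, `category` is flagged because it is a `metadata` field, and `summary` is flagged because the schema does not define it.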

Retrieving a dataset with its schema

$ GET /api/v1/data-collection/datasets/{dataset_id}

The response includes the current schema and all import jobs:

{
  "id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
  "name": "AI response evaluation",
  "schema_version": 4,
  "total_datapoint_count": 1000,
  "schema": {
    "strict": true,
    "fields": {
      "prompt": { "type": "text", "label": "Prompt" },
      "response_a": { "type": "text", "label": "Response A" },
      "response_b": { "type": "text", "label": "Response B" },
      "category": { "type": "metadata" }
    }
  },
  "imports": [
    {
      "import_id": "01935c2d-1a2b-3c4d-5e6f-7a8b9c0d1e2f",
      "status": "complete",
      "accepted_count": 1000
    }
  ]
}
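
Once the response has been decoded, the import jobs can be summarised in a few lines. This sketch assumes that any `status` other than `"complete"` means the job has not finished (only `"complete"` is shown on this page) and uses a hypothetical helper name, `summarize_imports`.

```python
import json


def summarize_imports(dataset):
    """Summarise the import jobs listed in a dataset response."""
    imports = dataset.get("imports", [])
    return {
        "total_datapoint_count": dataset.get("total_datapoint_count"),
        # Sum of accepted records across all listed import jobs.
        "accepted_count": sum(job.get("accepted_count", 0) for job in imports),
        # Assumption: anything other than "complete" is treated as unfinished.
        "unfinished_import_ids": [
            job["import_id"] for job in imports if job.get("status") != "complete"
        ],
    }


# Abbreviated copy of the sample response above.
response_text = """
{
  "id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
  "schema_version": 4,
  "total_datapoint_count": 1000,
  "imports": [
    {
      "import_id": "01935c2d-1a2b-3c4d-5e6f-7a8b9c0d1e2f",
      "status": "complete",
      "accepted_count": 1000
    }
  ]
}
"""
summary = summarize_imports(json.loads(response_text))
```

For the sample response, `summary` reports 1000 accepted records and no unfinished imports.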