Defining a Dataset Schema

A dataset schema lets you describe the structure of your data before uploading it. Schemas are a feature of V4 datasets and enable:

  • Named, typed fields that drive what participants see
  • dataset_field items in batch_items that pull dataset values directly into your task layout
  • Per-record validation during import (with strict mode)
  • Structured metadata and custom task grouping

Creating a dataset with a schema

Pass a schema object when creating a dataset.

POST /api/v1/data-collection/datasets
{
  "name": "AI response evaluation",
  "workspace_id": "6278acb09062db3b35bcbeb0",
  "schema": {
    "strict": true,
    "fields": {
      "prompt": { "type": "text", "label": "Prompt" },
      "response_a": { "type": "text", "label": "Response A" },
      "response_b": { "type": "text", "label": "Response B" },
      "category": { "type": "metadata" }
    }
  }
}
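
The request above can be assembled from code. The following is a minimal sketch using only the Python standard library; it builds the request without sending it. The base URL, the token placeholder, and the Authorization header format are assumptions that do not come from this page, and build_create_dataset_request is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: placeholder host; replace with the real API host for your account.
BASE_URL = "https://api.example.com"


def build_create_dataset_request(name, workspace_id, fields, strict=True,
                                 token="YOUR_API_TOKEN"):
    """Build (but do not send) the POST request that creates a dataset with a schema."""
    payload = {
        "name": name,
        "workspace_id": workspace_id,
        "schema": {"strict": strict, "fields": fields},
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/data-collection/datasets",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the authentication scheme is not described on this page.
            "Authorization": f"Token {token}",
        },
        method="POST",
    )


request = build_create_dataset_request(
    name="AI response evaluation",
    workspace_id="6278acb09062db3b35bcbeb0",
    fields={
        "prompt": {"type": "text", "label": "Prompt"},
        "response_a": {"type": "text", "label": "Response A"},
        "response_b": {"type": "text", "label": "Response B"},
        "category": {"type": "metadata"},
    },
)
# To send the request: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes it easy to inspect the JSON body before any network call is made.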

Field types

| Type | Description |
| --- | --- |
| text | A text value displayed to participants. Can be referenced by dataset_field items in batch_items. |
| image_url | A URL pointing to an image displayed to participants. Can be referenced by dataset_field items in batch_items. |
| metadata | An internal value included in exports but not shown to participants. Equivalent to the META_ column prefix in V3 CSV datasets. |
| task_group_id | Groups datapoints into task groups. Rows with the same value are assigned to the same participant in one submission. At most one field per schema may have this type. |
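
To preview how rows would be grouped by a task_group_id field before uploading, the grouping rule can be sketched locally. This is an illustration of the rule as described here, not the platform's implementation; the field name conversation_id and the sample records are hypothetical.

```python
from collections import defaultdict


def group_by_task_group(records, schema):
    """Group records by the value of the schema's task_group_id field."""
    # Find the (at most one) field declared with type task_group_id.
    group_field = next(
        (name for name, spec in schema["fields"].items()
         if spec["type"] == "task_group_id"),
        None,
    )
    if group_field is None:
        return {}  # the schema declares no grouping field
    groups = defaultdict(list)
    for record in records:
        groups[record[group_field]].append(record)
    return dict(groups)


schema = {
    "strict": True,
    "fields": {
        "prompt": {"type": "text", "label": "Prompt"},
        "conversation_id": {"type": "task_group_id"},  # hypothetical field name
    },
}
records = [
    {"prompt": "First turn", "conversation_id": "c1"},
    {"prompt": "Second turn", "conversation_id": "c1"},
    {"prompt": "Unrelated prompt", "conversation_id": "c2"},
]
groups = group_by_task_group(records, schema)
```

Here the two rows sharing the value c1 end up in one group, mirroring the rule that rows with the same value go to the same participant in one submission.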

Strict mode

The strict flag controls how missing fields are handled during import.

| Mode | Behaviour |
| --- | --- |
| "strict": true | Records missing any schema field are rejected. Use this to enforce data completeness. |
| "strict": false | Records with missing fields are accepted. Missing field values are treated as absent. |

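The effect of the strict flag can be approximated locally before import. The sketch below is a client-side pre-check, not the server's validation logic; it assumes a field counts as missing when its key is absent or its value is None, which this page does not specify. partition_records is a hypothetical helper name.

```python
def partition_records(records, schema):
    """Split records into accepted and rejected lists according to the strict flag."""
    field_names = list(schema["fields"])
    strict = schema["strict"]
    accepted, rejected = [], []
    for record in records:
        # Assumption: "missing" means the key is absent or the value is None.
        missing = [name for name in field_names if record.get(name) is None]
        if strict and missing:
            rejected.append((record, missing))
        else:
            accepted.append(record)
    return accepted, rejected


fields = {
    "prompt": {"type": "text", "label": "Prompt"},
    "response_a": {"type": "text", "label": "Response A"},
}
records = [
    {"prompt": "Explain recursion", "response_a": "A function that calls itself."},
    {"prompt": "Explain closures"},  # response_a is missing
]

strict_accepted, strict_rejected = partition_records(
    records, {"strict": True, "fields": fields})
lenient_accepted, lenient_rejected = partition_records(
    records, {"strict": False, "fields": fields})
```

With strict set to true the second record is rejected and its missing field names are reported; with strict set to false both records are accepted.
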
Schema constraints

  • Maximum 200 fields per schema.
  • Field keys: 1–128 characters.
  • Field labels: maximum 255 characters.
  • At most one field of type task_group_id per schema.
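
The constraints above can be checked before sending a schema. The following is a minimal sketch with a hypothetical helper, check_schema; it assumes lengths are counted in characters as Python's len counts them, which may differ from how the server counts.

```python
MAX_FIELDS = 200
MAX_KEY_LENGTH = 128
MAX_LABEL_LENGTH = 255
FIELD_TYPES = {"text", "image_url", "metadata", "task_group_id"}


def check_schema(schema):
    """Return a list of constraint violations; an empty list means none were found."""
    problems = []
    fields = schema.get("fields", {})
    if len(fields) > MAX_FIELDS:
        problems.append(f"too many fields: {len(fields)} > {MAX_FIELDS}")
    for key, spec in fields.items():
        if not 1 <= len(key) <= MAX_KEY_LENGTH:
            problems.append(f"field key {key!r} must be 1-{MAX_KEY_LENGTH} characters")
        if spec.get("type") not in FIELD_TYPES:
            problems.append(f"field {key!r} has unknown type {spec.get('type')!r}")
        label = spec.get("label")
        if label is not None and len(label) > MAX_LABEL_LENGTH:
            problems.append(f"label of {key!r} exceeds {MAX_LABEL_LENGTH} characters")
    group_fields = [k for k, s in fields.items() if s.get("type") == "task_group_id"]
    if len(group_fields) > 1:
        problems.append(f"more than one task_group_id field: {group_fields}")
    return problems


valid_schema = {
    "strict": True,
    "fields": {
        "prompt": {"type": "text", "label": "Prompt"},
        "category": {"type": "metadata"},
    },
}
problems = check_schema(valid_schema)
```

Returning a list of messages rather than raising on the first violation lets a caller report every problem in one pass.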

Referencing schema fields in the batch layout

Once a dataset with a schema is attached to a batch, you can use dataset_field items in batch_items to display dataset values to participants.

{
  "batch_items": [
    {
      "rows": [
        {
          "columns": [
            {
              "items": [
                { "type": "dataset_field", "field": "prompt" },
                { "type": "dataset_field", "field": "response_a" },
                { "type": "dataset_field", "field": "response_b" },
                {
                  "type": "multiple_choice",
                  "description": "Which response is more helpful?",
                  "answer_limit": 1,
                  "options": [
                    { "label": "Response A", "value": "a" },
                    { "label": "Response B", "value": "b" },
                    { "label": "Neither", "value": "neither" }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

Only fields of type text or image_url can be referenced by dataset_field items. metadata and task_group_id fields are not displayed to participants.
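
This rule can be checked against a layout before a batch is created. The sketch below walks the batch_items structure shown above (rows, columns, items) and reports dataset_field items that name an undefined field or a field that is not of type text or image_url. find_invalid_field_references is a hypothetical helper, and the wording of any server-side error is not known from this page.

```python
DISPLAYABLE_TYPES = {"text", "image_url"}


def find_invalid_field_references(batch_items, schema):
    """Return messages for dataset_field items that cannot be displayed."""
    fields = schema["fields"]
    problems = []
    for batch_item in batch_items:
        for row in batch_item.get("rows", []):
            for column in row.get("columns", []):
                for item in column.get("items", []):
                    if item.get("type") != "dataset_field":
                        continue  # other item types do not reference the schema
                    name = item.get("field")
                    if name not in fields:
                        problems.append(f"{name!r} is not defined in the schema")
                    elif fields[name]["type"] not in DISPLAYABLE_TYPES:
                        problems.append(
                            f"{name!r} has type {fields[name]['type']!r}, "
                            "which is not shown to participants"
                        )
    return problems


schema = {
    "strict": True,
    "fields": {
        "prompt": {"type": "text", "label": "Prompt"},
        "category": {"type": "metadata"},
    },
}
batch_items = [
    {"rows": [{"columns": [{"items": [
        {"type": "dataset_field", "field": "prompt"},
        {"type": "dataset_field", "field": "category"},  # metadata: not displayable
    ]}]}]}
]
problems = find_invalid_field_references(batch_items, schema)
```

In this example the reference to prompt passes, while the reference to category is reported because metadata fields are not displayed to participants.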

Retrieving a dataset with its schema

GET /api/v1/data-collection/datasets/{dataset_id}

The response includes the current schema and all import jobs:

{
  "id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
  "name": "AI response evaluation",
  "schema_version": 4,
  "total_datapoint_count": 1000,
  "schema": {
    "strict": true,
    "fields": {
      "prompt": { "type": "text", "label": "Prompt" },
      "response_a": { "type": "text", "label": "Response A" },
      "response_b": { "type": "text", "label": "Response B" },
      "category": { "type": "metadata" }
    }
  },
  "imports": [
    {
      "import_id": "01935c2d-1a2b-3c4d-5e6f-7a8b9c0d1e2f",
      "status": "complete",
      "accepted_count": 1000
    }
  ]
}
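
A response of this shape can be summarised with a few lines of code. The sketch below works on an already-parsed response dictionary; summarize_dataset is a hypothetical helper, and treating "complete" as the only finished status is an assumption, since other status values are not listed on this page.

```python
def summarize_dataset(dataset):
    """Summarise the schema fields and import jobs of a dataset response."""
    fields_by_type = {}
    for name, spec in dataset["schema"]["fields"].items():
        fields_by_type.setdefault(spec["type"], []).append(name)

    imports = dataset.get("imports", [])
    return {
        "fields_by_type": fields_by_type,
        "accepted_total": sum(job.get("accepted_count", 0) for job in imports),
        # Assumption: any status other than "complete" means the job is unfinished.
        "unfinished_imports": [
            job["import_id"] for job in imports if job["status"] != "complete"
        ],
    }


dataset = {
    "id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
    "name": "AI response evaluation",
    "schema_version": 4,
    "total_datapoint_count": 1000,
    "schema": {
        "strict": True,
        "fields": {
            "prompt": {"type": "text", "label": "Prompt"},
            "response_a": {"type": "text", "label": "Response A"},
            "response_b": {"type": "text", "label": "Response B"},
            "category": {"type": "metadata"},
        },
    },
    "imports": [
        {
            "import_id": "01935c2d-1a2b-3c4d-5e6f-7a8b9c0d1e2f",
            "status": "complete",
            "accepted_count": 1000,
        }
    ],
}
summary = summarize_dataset(dataset)
```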