Setting up a Batch from a Schema Dataset

This guide covers the additional configuration options available when setting up a batch that uses a V4 dataset (one with a schema).

For the general batch setup workflow, see Working with Batches.

How task grouping works with V4 datasets

When you set up a batch, the service creates tasks by pairing each datapoint with the batch instructions. Tasks are then organised into task groups, and each participant completes one task group per submission.

With V4 datasets you can control task grouping in two ways:

Option 1: Use a `task_group_id` field

Define a field of type task_group_id in your dataset schema. Datapoints with the same value in that field are grouped together into a single task group.

Schema:

{
  "strict": false,
  "fields": {
    "prompt": { "type": "text" },
    "response_a": { "type": "text" },
    "response_b": { "type": "text" },
    "session_id": { "type": "task_group_id" }
  }
}

Data (JSONL):

{"prompt": "Explain gravity", "response_a": "...", "response_b": "...", "session_id": "session_001"}
{"prompt": "What is AI?", "response_a": "...", "response_b": "...", "session_id": "session_001"}
{"prompt": "Describe photosynthesis", "response_a": "...", "response_b": "...", "session_id": "session_002"}

In this example, the first two datapoints share session_id = "session_001" and will be grouped together. A participant assigned to session_001 will annotate both prompts in a single submission.
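The grouping rule can be previewed locally before uploading a dataset. The following is a minimal sketch, not the service's implementation: `group_by_field` is a hypothetical helper, and the response fields are omitted from the sample records for brevity.

```python
from collections import defaultdict


def group_by_field(datapoints, field):
    """Group datapoints by the value of `field`, keeping input order.

    Hypothetical helper for local inspection only; it mirrors the rule
    "same value in the task_group_id field means same task group".
    """
    groups = defaultdict(list)
    for datapoint in datapoints:
        groups[datapoint[field]].append(datapoint)
    return dict(groups)


# Sample records shaped like the JSONL rows above (responses omitted).
datapoints = [
    {"prompt": "Explain gravity", "session_id": "session_001"},
    {"prompt": "What is AI?", "session_id": "session_001"},
    {"prompt": "Describe photosynthesis", "session_id": "session_002"},
]

task_groups = group_by_field(datapoints, "session_id")
for group_id, members in task_groups.items():
    print(group_id, [dp["prompt"] for dp in members])
```

Running this kind of check on the JSONL file before import makes it easy to spot groups that are larger or smaller than intended.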

Option 2: Use `tasks_per_group` (random grouping)

If your schema does not include a task_group_id field, tasks are grouped randomly at setup time using the tasks_per_group parameter.

$ POST /api/v1/data-collection/batches/{batch_id}/setup

{
  "tasks_per_group": 3
}

This assigns 3 randomly selected tasks to each task group.

If a task_group_id field is present in the dataset schema, its groupings take precedence and the tasks_per_group parameter is ignored.
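The precedence between the two options can be sketched as follows. This is an illustration under stated assumptions, not the service's algorithm: `find_group_field` and `plan_groups` are hypothetical names, and leaving a smaller final group when the task count is not a multiple of `tasks_per_group` is an assumption, since the guide does not say how a remainder is handled.

```python
import random


def find_group_field(schema):
    """Return the name of the first field typed task_group_id, or None."""
    for name, spec in schema.get("fields", {}).items():
        if spec.get("type") == "task_group_id":
            return name
    return None


def plan_groups(datapoints, schema, tasks_per_group, seed=None):
    """Return a list of task groups (each a list of datapoints)."""
    field = find_group_field(schema)
    if field is not None:
        # A task_group_id field wins; tasks_per_group is ignored.
        groups = {}
        for datapoint in datapoints:
            groups.setdefault(datapoint[field], []).append(datapoint)
        return list(groups.values())

    # No grouping field: shuffle, then cut into chunks of tasks_per_group.
    # Assumption: any remainder forms a smaller final group.
    shuffled = list(datapoints)
    random.Random(seed).shuffle(shuffled)
    return [
        shuffled[i:i + tasks_per_group]
        for i in range(0, len(shuffled), tasks_per_group)
    ]


plain_schema = {"fields": {"prompt": {"type": "text"}}}
rows = [{"prompt": f"Question {n}"} for n in range(7)]
random_groups = plan_groups(rows, plain_schema, tasks_per_group=3, seed=0)
print([len(group) for group in random_groups])  # [3, 3, 1]

grouped_schema = {
    "fields": {
        "prompt": {"type": "text"},
        "session_id": {"type": "task_group_id"},
    }
}
session_rows = [
    {"prompt": "Explain gravity", "session_id": "session_001"},
    {"prompt": "What is AI?", "session_id": "session_001"},
    {"prompt": "Describe photosynthesis", "session_id": "session_002"},
]
fixed_groups = plan_groups(session_rows, grouped_schema, tasks_per_group=3)
print([len(group) for group in fixed_groups])  # [2, 1]
```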

Using dataset fields in the batch layout

V4 datasets let you reference specific dataset fields directly in your batch layout using dataset_field items. This controls exactly where each field value appears on the participant screen.

$ POST /api/v1/data-collection/batches

{
  "name": "Response evaluation batch",
  "workspace_id": "6278acb09062db3b35bcbeb0",
  "dataset_id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
  "task_details": {
    "task_name": "Evaluate AI Responses",
    "task_introduction": "<p>Compare two AI responses and choose the better one.</p>",
    "task_steps": "<ol><li>Read the prompt</li><li>Read both responses</li><li>Select the better response</li></ol>"
  },
  "batch_items": [
    {
      "rows": [
        {
          "columns": [
            {
              "items": [
                { "type": "rich_text", "content": "<strong>Prompt</strong>" },
                { "type": "dataset_field", "field": "prompt" }
              ]
            }
          ]
        },
        {
          "columns": [
            {
              "items": [
                { "type": "rich_text", "content": "<strong>Response A</strong>" },
                { "type": "dataset_field", "field": "response_a" }
              ]
            },
            {
              "items": [
                { "type": "rich_text", "content": "<strong>Response B</strong>" },
                { "type": "dataset_field", "field": "response_b" }
              ]
            }
          ]
        },
        {
          "columns": [
            {
              "items": [
                {
                  "type": "multiple_choice",
                  "description": "Which response is more helpful?",
                  "answer_limit": 1,
                  "options": [
                    { "label": "Response A", "value": "a" },
                    { "label": "Response B", "value": "b" },
                    { "label": "Neither", "value": "neither" }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

Only fields of type text or image_url can be used as dataset_field items. Fields of type metadata or task_group_id cannot be displayed.
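A layout can be checked against this rule locally before the batch is created. The sketch below is a hypothetical pre-flight check, not part of the API: `find_invalid_dataset_fields` is an illustrative name, and it assumes the batch_items structure shown in the request above (rows, then columns, then items).

```python
DISPLAYABLE_TYPES = {"text", "image_url"}


def find_invalid_dataset_fields(schema, batch_items):
    """Return (field_name, reason) pairs for unusable dataset_field items."""
    fields = schema.get("fields", {})
    problems = []
    for page in batch_items:
        for row in page.get("rows", []):
            for column in row.get("columns", []):
                for item in column.get("items", []):
                    if item.get("type") != "dataset_field":
                        continue  # rich_text, multiple_choice, etc.
                    name = item.get("field")
                    if name not in fields:
                        problems.append((name, "not defined in schema"))
                        continue
                    field_type = fields[name].get("type")
                    if field_type not in DISPLAYABLE_TYPES:
                        problems.append(
                            (name, f"type {field_type} cannot be displayed")
                        )
    return problems


schema = {
    "fields": {
        "prompt": {"type": "text"},
        "session_id": {"type": "task_group_id"},
    }
}
batch_items = [
    {
        "rows": [
            {
                "columns": [
                    {
                        "items": [
                            {"type": "rich_text", "content": "<strong>Prompt</strong>"},
                            {"type": "dataset_field", "field": "prompt"},
                            {"type": "dataset_field", "field": "session_id"},
                        ]
                    }
                ]
            }
        ]
    }
]

problems = find_invalid_dataset_fields(schema, batch_items)
print(problems)
```

Here the `prompt` item passes, while the `session_id` item is reported because its type is task_group_id.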

Setting up the batch

The dataset must have at least one import with status complete before setup can be initiated.

$ POST /api/v1/data-collection/batches/{batch_id}/setup

{
  "tasks_per_group": 1
}

Monitor batch status until it reaches READY:

$ GET /api/v1/data-collection/batches/{batch_id}/status
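Monitoring can be automated with a simple polling loop. The sketch below is hedged on several points the guide does not specify: the base URL, authentication, and the shape of the status response are unknown, so the HTTP call is left to a caller-supplied `get_status` function that returns the status string. `wait_until_ready` is a hypothetical helper, and "PROCESSING" is a made-up intermediate status used only for the demonstration.

```python
import time


def wait_until_ready(get_status, interval_seconds=5.0, timeout_seconds=600.0,
                     sleep=time.sleep):
    """Call get_status() repeatedly until it returns "READY".

    get_status: callable returning the current batch status as a string.
                How it calls the status endpoint is up to the caller.
    Raises TimeoutError if READY is not reached within timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status == "READY":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch still {status!r} after {timeout_seconds}s")
        sleep(interval_seconds)


# Demonstration with a fake status source instead of a real HTTP call.
# "PROCESSING" is a placeholder, not a documented status value.
fake_statuses = iter(["PROCESSING", "PROCESSING", "READY"])
calls = []


def fake_get_status():
    status = next(fake_statuses)
    calls.append(status)
    return status


result = wait_until_ready(fake_get_status, sleep=lambda seconds: None)
print(result, len(calls))
```

Injecting `get_status` and `sleep` keeps the loop testable without network access; in real use, `get_status` would wrap the GET request above and extract the status from its response.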