Setting up a Batch from a Schema Dataset


This guide covers the additional configuration options available when setting up a batch that uses a V4 dataset (one with a schema).

For the general batch setup workflow, see Working with Batches.

How task grouping works with V4 datasets

When you set up a batch, the service creates tasks by pairing each datapoint with the batch instructions. Tasks are then organised into task groups — each participant completes one task group per submission.

With V4 datasets you can control task grouping in two ways:

Option 1: Use a task_group_id field

Define a field of type task_group_id in your dataset schema. Datapoints with the same value in that field are grouped together into a single task group.

Schema:

{
  "strict": false,
  "fields": {
    "prompt": { "type": "text" },
    "response_a": { "type": "text" },
    "response_b": { "type": "text" },
    "session_id": { "type": "task_group_id" }
  }
}

Data (JSONL):

{"prompt": "Explain gravity", "response_a": "...", "response_b": "...", "session_id": "session_001"}
{"prompt": "What is AI?", "response_a": "...", "response_b": "...", "session_id": "session_001"}
{"prompt": "Describe photosynthesis", "response_a": "...", "response_b": "...", "session_id": "session_002"}

In this example, the first two datapoints share session_id = "session_001" and will be grouped together. A participant assigned to session_001 will annotate both prompts in a single submission.
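The grouping rule above can be reproduced locally to preview which datapoints will land in the same task group before uploading. This is a minimal sketch, not the service's implementation: the helper name group_by_field is hypothetical, and it assumes the schema and JSONL shown in this section.

```python
import json
from collections import defaultdict


def group_by_field(jsonl_text, schema):
    """Group datapoints by the schema field declared as task_group_id."""
    # Find the field whose declared type is "task_group_id" (None if absent).
    group_field = next(
        (name for name, spec in schema["fields"].items()
         if spec.get("type") == "task_group_id"),
        None,
    )
    if group_field is None:
        raise ValueError("schema has no task_group_id field")

    groups = defaultdict(list)
    for line in jsonl_text.splitlines():
        if line.strip():  # skip blank lines
            datapoint = json.loads(line)
            groups[datapoint[group_field]].append(datapoint)
    return dict(groups)


schema = {
    "strict": False,
    "fields": {
        "prompt": {"type": "text"},
        "response_a": {"type": "text"},
        "response_b": {"type": "text"},
        "session_id": {"type": "task_group_id"},
    },
}

jsonl_text = "\n".join([
    '{"prompt": "Explain gravity", "response_a": "...", "response_b": "...", "session_id": "session_001"}',
    '{"prompt": "What is AI?", "response_a": "...", "response_b": "...", "session_id": "session_001"}',
    '{"prompt": "Describe photosynthesis", "response_a": "...", "response_b": "...", "session_id": "session_002"}',
])

groups = group_by_field(jsonl_text, schema)
for group_id, datapoints in groups.items():
    print(group_id, [d["prompt"] for d in datapoints])
```

Running a check like this before import makes it easy to spot a mistyped group value, which would otherwise silently create an extra task group.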

Option 2: Use tasks_per_group (random grouping)

If your schema does not include a task_group_id field, tasks are grouped randomly at setup time using the tasks_per_group parameter.

POST /api/v1/data-collection/batches/{batch_id}/setup
{
  "tasks_per_group": 3
}

This assigns 3 randomly selected tasks to each task group.

If a task_group_id field is present in the dataset schema, its groupings take precedence and the tasks_per_group parameter is ignored.
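The two options and the precedence rule between them can be illustrated with a short local simulation. This is a sketch under stated assumptions: make_task_groups is a hypothetical helper, the shuffle-then-chunk strategy is only one plausible way to group randomly, and this guide does not say how the service handles a remainder when the task count is not a multiple of tasks_per_group (here the last group is simply smaller).

```python
import random


def make_task_groups(datapoints, schema, tasks_per_group, seed=None):
    """Return a list of task groups (each a list of datapoints)."""
    group_field = next(
        (name for name, spec in schema["fields"].items()
         if spec.get("type") == "task_group_id"),
        None,
    )
    if group_field is not None:
        # Field-based grouping takes precedence; tasks_per_group is ignored.
        groups = {}
        for datapoint in datapoints:
            groups.setdefault(datapoint[group_field], []).append(datapoint)
        return list(groups.values())

    # No task_group_id field: shuffle a copy, then cut it into chunks.
    shuffled = list(datapoints)
    random.Random(seed).shuffle(shuffled)
    return [
        shuffled[i:i + tasks_per_group]
        for i in range(0, len(shuffled), tasks_per_group)
    ]


# Schema without a task_group_id field: random grouping applies.
plain_schema = {"fields": {"prompt": {"type": "text"}}}
plain_points = [{"prompt": f"prompt {i}"} for i in range(7)]
random_groups = make_task_groups(plain_points, plain_schema, tasks_per_group=3, seed=0)

# Schema with a task_group_id field: tasks_per_group has no effect.
grouped_schema = {
    "fields": {
        "prompt": {"type": "text"},
        "session_id": {"type": "task_group_id"},
    }
}
grouped_points = [
    {"prompt": "Explain gravity", "session_id": "session_001"},
    {"prompt": "What is AI?", "session_id": "session_001"},
    {"prompt": "Describe photosynthesis", "session_id": "session_002"},
]
field_groups = make_task_groups(grouped_points, grouped_schema, tasks_per_group=3)

print([len(g) for g in random_groups])
print([len(g) for g in field_groups])
```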

Using dataset fields in the batch layout

V4 datasets let you reference specific dataset fields directly in your batch layout using dataset_field items. This controls exactly where each field value appears on the participant screen.

POST /api/v1/data-collection/batches
{
  "name": "Response evaluation batch",
  "workspace_id": "6278acb09062db3b35bcbeb0",
  "dataset_id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
  "task_details": {
    "task_name": "Evaluate AI Responses",
    "task_introduction": "<p>Compare two AI responses and choose the better one.</p>",
    "task_steps": "<ol><li>Read the prompt</li><li>Read both responses</li><li>Select the better response</li></ol>"
  },
  "batch_items": [
    {
      "rows": [
        {
          "columns": [
            {
              "items": [
                { "type": "rich_text", "content": "<strong>Prompt</strong>" },
                { "type": "dataset_field", "field": "prompt" }
              ]
            }
          ]
        },
        {
          "columns": [
            {
              "items": [
                { "type": "rich_text", "content": "<strong>Response A</strong>" },
                { "type": "dataset_field", "field": "response_a" }
              ]
            },
            {
              "items": [
                { "type": "rich_text", "content": "<strong>Response B</strong>" },
                { "type": "dataset_field", "field": "response_b" }
              ]
            }
          ]
        },
        {
          "columns": [
            {
              "items": [
                {
                  "type": "multiple_choice",
                  "description": "Which response is more helpful?",
                  "answer_limit": 1,
                  "options": [
                    { "label": "Response A", "value": "a" },
                    { "label": "Response B", "value": "b" },
                    { "label": "Neither", "value": "neither" }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

Only fields of type text or image_url can be used as dataset_field items. metadata and task_group_id fields are not available for display.
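The display restriction above can be checked locally before the batch is created. The sketch below walks the batch_items structure (rows, columns, items) and reports any dataset_field item that points at a field whose type is not text or image_url. The helper names are hypothetical, and treating a field that is missing from the schema as a problem is an assumption of this sketch, not documented behaviour.

```python
DISPLAYABLE_TYPES = {"text", "image_url"}


def iter_items(batch_items):
    """Yield every layout item nested under batch_items -> rows -> columns."""
    for page in batch_items:
        for row in page.get("rows", []):
            for column in row.get("columns", []):
                yield from column.get("items", [])


def find_invalid_dataset_fields(batch_items, schema):
    """Return (field_name, reason) pairs for dataset_field items that cannot be shown."""
    problems = []
    for item in iter_items(batch_items):
        if item.get("type") != "dataset_field":
            continue
        name = item.get("field")
        spec = schema["fields"].get(name)
        if spec is None:
            # Assumption: a field absent from the schema is flagged.
            problems.append((name, "not defined in schema"))
        elif spec.get("type") not in DISPLAYABLE_TYPES:
            problems.append((name, f"type {spec['type']} cannot be displayed"))
    return problems


schema = {
    "fields": {
        "prompt": {"type": "text"},
        "session_id": {"type": "task_group_id"},
    }
}

batch_items = [
    {
        "rows": [
            {
                "columns": [
                    {
                        "items": [
                            {"type": "rich_text", "content": "<strong>Prompt</strong>"},
                            {"type": "dataset_field", "field": "prompt"},
                            # Invalid: task_group_id fields are not displayable.
                            {"type": "dataset_field", "field": "session_id"},
                        ]
                    }
                ]
            }
        ]
    }
]

problems = find_invalid_dataset_fields(batch_items, schema)
print(problems)
```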

Setting up the batch

The dataset must have at least one import with status complete before setup can be initiated.

POST /api/v1/data-collection/batches/{batch_id}/setup
{
  "tasks_per_group": 1
}

Monitor batch status until it reaches READY:

GET /api/v1/data-collection/batches/{batch_id}/status
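Monitoring the status usually means polling with a delay and a timeout. The sketch below shows one way to structure that loop. Because this guide does not specify the base URL, authentication, or the shape of the status response, the HTTP call is left to a caller-supplied function (fetch_status, hypothetical) that performs the GET request above and returns the status string. The "PENDING" value in the example is a placeholder, not a documented status.

```python
import time


def wait_until_ready(fetch_status, batch_id, interval_s=5.0, timeout_s=600.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status(batch_id) until it returns "READY" or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status(batch_id)
        if status == "READY":
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"batch {batch_id} not READY after {timeout_s}s (last status: {status})"
            )
        sleep(interval_s)


# Example with a fake status source standing in for the real GET request.
fake_statuses = iter(["PENDING", "PENDING", "READY"])  # "PENDING" is a placeholder
calls = []


def fake_fetch_status(batch_id):
    calls.append(batch_id)
    return next(fake_statuses)


result = wait_until_ready(
    fake_fetch_status,
    "batch_123",          # hypothetical batch ID
    interval_s=0,
    sleep=lambda _seconds: None,  # skip real waiting in the example
)
print(result, len(calls))
```

Injecting the fetch function keeps the loop independent of any particular HTTP client and makes the timeout behaviour easy to test.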