Setting up a Batch from a Schema Dataset
This guide covers the additional configuration options available when setting up a batch that uses a V4 dataset (one with a schema).
For the general batch setup workflow, see Working with Batches.
How task grouping works with V4 datasets
When you set up a batch, the service creates tasks by pairing each datapoint with the batch instructions. Tasks are then organised into task groups — each participant completes one task group per submission.
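The minimal Python sketch below shows how datapoints, tasks and task groups relate. The Task and TaskGroup classes and the make_tasks helper are hypothetical illustrations of the concepts above, not part of any client library. It also assumes a datapoint is a plain dictionary.

```python
from dataclasses import dataclass, field

# Hypothetical model of the concepts described above; these classes are
# illustrative only and are not part of any service SDK.

@dataclass
class Task:
    datapoint: dict      # one row from the dataset
    instructions: str    # the batch instructions shared by every task

@dataclass
class TaskGroup:
    # All tasks in a group are completed together in one submission.
    tasks: list[Task] = field(default_factory=list)

def make_tasks(datapoints: list[dict], instructions: str) -> list[Task]:
    """Pair every datapoint with the same batch instructions."""
    return [Task(datapoint=dp, instructions=instructions) for dp in datapoints]

datapoints = [{"prompt": "A"}, {"prompt": "B"}]
tasks = make_tasks(datapoints, "Rate the response.")
group = TaskGroup(tasks=tasks)
print(len(group.tasks))  # 2
```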
With V4 datasets you can control task grouping in two ways:
Option 1: Use a task_group_id field
Define a field of type task_group_id in your dataset schema. Datapoints with the same value in that field are grouped together into a single task group.
Schema:
Data (JSONL):
In this example, the first two datapoints share session_id = "session_001" and are grouped together, so a participant assigned the session_001 task group annotates both prompts in a single submission.
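The grouping rule amounts to a group-by on the task_group_id field. Here it is as a Python sketch. The field name session_id and the value session_001 come from the example above. The prompt text and the third datapoint are hypothetical, and group_by_field is an illustrative helper, not something the service exposes.

```python
from collections import defaultdict

# Hypothetical datapoints; only session_id / "session_001" come from the example.
datapoints = [
    {"session_id": "session_001", "prompt": "First prompt"},
    {"session_id": "session_001", "prompt": "Second prompt"},
    {"session_id": "session_002", "prompt": "Third prompt"},
]

def group_by_field(rows, group_field):
    """Put rows that share the same value in group_field into one task group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row)
    return dict(groups)

groups = group_by_field(datapoints, "session_id")
for group_id, rows in groups.items():
    print(group_id, [r["prompt"] for r in rows])
# session_001 ['First prompt', 'Second prompt']
# session_002 ['Third prompt']
```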
Option 2: Use tasks_per_group (random grouping)
If your schema does not include a task_group_id field, tasks are grouped randomly at setup time using the tasks_per_group parameter.
For example, setting tasks_per_group to 3 assigns three randomly selected tasks to each task group.
If the dataset schema includes a task_group_id field, the groupings it defines take precedence and the tasks_per_group parameter is ignored.
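The Python sketch below illustrates the precedence between the two options. It is an illustration built on assumptions, not the service's implementation:

- The schema is represented as a hypothetical dict mapping field names to types.
- At most one task_group_id field is expected.
- When the number of tasks is not a multiple of tasks_per_group, the sketch leaves a smaller final group. The text above does not say how leftovers are actually handled.

```python
import random

def build_task_groups(datapoints, schema_fields, tasks_per_group, seed=None):
    """Group datapoints, letting a task_group_id field override tasks_per_group."""
    # schema_fields: hypothetical {field_name: field_type} mapping.
    group_fields = [n for n, t in schema_fields.items() if t == "task_group_id"]
    if group_fields:
        key = group_fields[0]  # assumes a single task_group_id field
        groups = {}
        for dp in datapoints:
            groups.setdefault(dp[key], []).append(dp)
        return list(groups.values())  # tasks_per_group is ignored here
    # No task_group_id field: shuffle, then cut into chunks of tasks_per_group.
    shuffled = list(datapoints)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + tasks_per_group]
            for i in range(0, len(shuffled), tasks_per_group)]

datapoints = [{"prompt": f"Prompt {i}"} for i in range(6)]
groups = build_task_groups(datapoints, {"prompt": "text"}, tasks_per_group=3, seed=42)
print([len(g) for g in groups])  # [3, 3]
```

Passing a seed makes the random grouping reproducible, which is convenient when testing a setup locally.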
Using dataset fields in the batch layout
V4 datasets let you reference specific dataset fields directly in your batch layout using dataset_field items. This lets you control exactly where each field's value appears on the participant's screen.
Only fields of type text or image_url can be used as dataset_field items; fields of type metadata or task_group_id cannot be displayed.
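To catch layout mistakes before submitting a batch, you can check the dataset fields a layout references against the schema. This Python sketch assumes the schema is available as a hypothetical {field_name: field_type} dict and the layout's dataset_field references as a list of field names. The check_layout_fields helper is illustrative and is not part of the service.

```python
# Field types that, per the rule above, can be displayed via dataset_field items.
DISPLAYABLE_TYPES = {"text", "image_url"}

def check_layout_fields(schema_fields, layout_field_names):
    """Return a list of problems with the dataset fields a layout references."""
    problems = []
    for name in layout_field_names:
        ftype = schema_fields.get(name)
        if ftype is None:
            problems.append(f"{name}: not in schema")
        elif ftype not in DISPLAYABLE_TYPES:
            problems.append(f"{name}: type {ftype} cannot be displayed")
    return problems

# Hypothetical schema mixing displayable and non-displayable field types.
schema = {"prompt": "text", "image": "image_url",
          "source": "metadata", "session_id": "task_group_id"}
print(check_layout_fields(schema, ["prompt", "image", "source"]))
# ['source: type metadata cannot be displayed']
```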
Setting up the batch
The dataset must have at least one import with status complete before setup can be initiated.
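Before initiating setup, you can check this precondition on the client side. The Python sketch below assumes you have already fetched the dataset's imports as a list of dictionaries with a status key. That shape and the ready_for_setup helper are hypothetical. The comparison ignores case because this guide does not give the exact spelling of the status value.

```python
def ready_for_setup(imports):
    """True if at least one dataset import has status "complete"."""
    # imports: hypothetical list of dicts such as {"status": "complete"}.
    return any(str(imp.get("status", "")).lower() == "complete" for imp in imports)

print(ready_for_setup([{"status": "processing"}, {"status": "complete"}]))  # True
```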
Monitor batch status until it reaches READY:
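A minimal Python polling loop might look like the one below. It takes a get_status callable, so it assumes nothing about the actual API call you use to fetch batch status. The wait_until_ready helper, its default interval and timeout, and the ERROR failure status are hypothetical. Adjust them to the statuses the service actually reports.

```python
import time

def wait_until_ready(get_status, poll_interval=5.0, timeout=600.0,
                     failure_statuses=frozenset({"ERROR"})):
    """Poll get_status() until it returns "READY"; raise on failure or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "READY":
            return status
        if status in failure_statuses:  # hypothetical failure status name(s)
            raise RuntimeError(f"batch setup failed with status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch not READY after {timeout} s (last status: {status})")
        time.sleep(poll_interval)

# Usage with a fake status source; "PROCESSING" is a hypothetical interim status.
statuses = iter(["PROCESSING", "PROCESSING", "READY"])
print(wait_until_ready(lambda: next(statuses), poll_interval=0.01))  # READY
```

The deadline uses time.monotonic(), so the timeout stays correct even if the system clock changes while polling.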