# Setting up a Batch from a Schema Dataset

This guide covers the additional configuration options available when setting up a batch that uses a **V4 dataset** (one with a schema).

For the general batch setup workflow, see [Working with Batches](/api-reference/ai-task-builder/batches).

## How task grouping works with V4 datasets

When you set up a batch, the service creates tasks by pairing each datapoint with the batch instructions. Tasks are then organised into task groups, and each participant completes one task group per submission.

With V4 datasets you can control task grouping in two ways:

### Option 1: Use a `task_group_id` field

Define a field of type `task_group_id` in your dataset schema. Datapoints with the same value in that field are grouped together into a single task group.

**Schema:**

```json
{
  "strict": false,
  "fields": {
    "prompt": { "type": "text" },
    "response_a": { "type": "text" },
    "response_b": { "type": "text" },
    "session_id": { "type": "task_group_id" }
  }
}
```

**Data (JSONL):**

```jsonl
{"prompt": "Explain gravity", "response_a": "...", "response_b": "...", "session_id": "session_001"}
{"prompt": "What is AI?", "response_a": "...", "response_b": "...", "session_id": "session_001"}
{"prompt": "Describe photosynthesis", "response_a": "...", "response_b": "...", "session_id": "session_002"}
```

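In this example, the first two datapoints share `session_id = "session_001"` and will be grouped together. A participant assigned to `session_001` will annotate both prompts in a single submission. The third datapoint is the only one with `session_id = "session_002"`, so it forms a task group on its own.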
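To preview which datapoints would end up together before uploading, the grouping rule can be reproduced locally. The following is a minimal sketch, not the service's implementation: `find_group_field` and `group_datapoints` are hypothetical helper names, and the schema and data are copied from the example above.

```python
import json
from collections import defaultdict

# Schema and data copied from the example in this section.
SCHEMA = {
    "strict": False,
    "fields": {
        "prompt": {"type": "text"},
        "response_a": {"type": "text"},
        "response_b": {"type": "text"},
        "session_id": {"type": "task_group_id"},
    },
}

DATA = """
{"prompt": "Explain gravity", "response_a": "...", "response_b": "...", "session_id": "session_001"}
{"prompt": "What is AI?", "response_a": "...", "response_b": "...", "session_id": "session_001"}
{"prompt": "Describe photosynthesis", "response_a": "...", "response_b": "...", "session_id": "session_002"}
"""


def find_group_field(schema):
    """Return the name of the first field typed task_group_id, or None."""
    for name, spec in schema["fields"].items():
        if spec.get("type") == "task_group_id":
            return name
    return None


def group_datapoints(schema, jsonl_text):
    """Bucket JSONL datapoints by the value of the task_group_id field."""
    group_field = find_group_field(schema)
    if group_field is None:
        raise ValueError("schema has no task_group_id field")
    groups = defaultdict(list)
    for line in jsonl_text.splitlines():
        if line.strip():  # skip blank lines
            row = json.loads(line)
            groups[row[group_field]].append(row)
    return dict(groups)


groups = group_datapoints(SCHEMA, DATA)
for group_id, rows in groups.items():
    print(group_id, [row["prompt"] for row in rows])
```

Running a check like this on the full JSONL file is a quick way to spot groups that are unexpectedly large or that contain a single datapoint because of a typo in the grouping value.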

### Option 2: Use `tasks_per_group` (random grouping)

If your schema does not include a `task_group_id` field, tasks are grouped randomly at setup time using the `tasks_per_group` parameter.

```bash
POST /api/v1/data-collection/batches/{batch_id}/setup
```

```json
{
  "tasks_per_group": 3
}
```

This assigns 3 randomly selected tasks to each task group.

If the dataset schema includes a `task_group_id` field, those groupings take precedence and the `tasks_per_group` parameter is ignored.
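The two grouping modes and their precedence can be illustrated with a small local model. This is a sketch under assumptions: `grouping_mode` and `chunk_randomly` are hypothetical names, the shuffle-then-chunk approach is only one way to produce random groups, and this guide does not state how the service handles a task count that is not a multiple of `tasks_per_group` (here the last group is simply smaller).

```python
import random


def grouping_mode(schema):
    """Report which grouping rule applies to a schema, per the precedence above."""
    has_group_field = any(
        spec.get("type") == "task_group_id" for spec in schema["fields"].values()
    )
    return "task_group_id" if has_group_field else "tasks_per_group"


def chunk_randomly(task_ids, tasks_per_group, seed=None):
    """Shuffle task ids and cut them into groups of at most tasks_per_group.

    Assumption: any leftover tasks form a smaller final group.
    """
    if tasks_per_group < 1:
        raise ValueError("tasks_per_group must be at least 1")
    shuffled = list(task_ids)
    random.Random(seed).shuffle(shuffled)
    return [
        shuffled[start:start + tasks_per_group]
        for start in range(0, len(shuffled), tasks_per_group)
    ]


plain_schema = {"fields": {"prompt": {"type": "text"}}}
grouped_schema = {
    "fields": {
        "prompt": {"type": "text"},
        "session_id": {"type": "task_group_id"},
    }
}

print(grouping_mode(plain_schema))    # tasks_per_group
print(grouping_mode(grouped_schema))  # task_group_id
print([len(group) for group in chunk_randomly(range(7), 3, seed=0)])  # [3, 3, 1]
```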

## Using dataset fields in the batch layout

V4 datasets let you reference specific dataset fields directly in your batch layout using `dataset_field` items. This controls exactly where each field value appears on the participant screen.

```bash
POST /api/v1/data-collection/batches
```

```json
{
  "name": "Response evaluation batch",
  "workspace_id": "6278acb09062db3b35bcbeb0",
  "dataset_id": "0192a3b5-e8f9-7a0b-1c2d-3e4f5a6b7c8d",
  "task_details": {
    "task_name": "Evaluate AI Responses",
    "task_introduction": "<p>Compare two AI responses and choose the better one.</p>",
    "task_steps": "<ol><li>Read the prompt</li><li>Read both responses</li><li>Select the better response</li></ol>"
  },
  "batch_items": [
    {
      "rows": [
        {
          "columns": [
            {
              "items": [
                { "type": "rich_text", "content": "<strong>Prompt</strong>" },
                { "type": "dataset_field", "field": "prompt" }
              ]
            }
          ]
        },
        {
          "columns": [
            {
              "items": [
                { "type": "rich_text", "content": "<strong>Response A</strong>" },
                { "type": "dataset_field", "field": "response_a" }
              ]
            },
            {
              "items": [
                { "type": "rich_text", "content": "<strong>Response B</strong>" },
                { "type": "dataset_field", "field": "response_b" }
              ]
            }
          ]
        },
        {
          "columns": [
            {
              "items": [
                {
                  "type": "multiple_choice",
                  "description": "Which response is more helpful?",
                  "answer_limit": 1,
                  "options": [
                    { "label": "Response A", "value": "a" },
                    { "label": "Response B", "value": "b" },
                    { "label": "Neither", "value": "neither" }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
```

Only fields of type `text` or `image_url` can be used as `dataset_field` items. `metadata` and `task_group_id` fields are not available for display.
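A layout can be checked against the schema before the batch is created. The sketch below walks the `batch_items` structure shown above and reports any `dataset_field` item whose schema type is not `text` or `image_url`. The helper names are hypothetical, and flagging fields that are missing from the schema is a choice made for this sketch rather than documented API behaviour.

```python
DISPLAYABLE_TYPES = {"text", "image_url"}


def iter_items(batch_items):
    """Yield every item nested under batch_items -> rows -> columns -> items."""
    for page in batch_items:
        for row in page.get("rows", []):
            for column in row.get("columns", []):
                yield from column.get("items", [])


def check_dataset_fields(schema, batch_items):
    """Return a list of problems found in the layout's dataset_field items."""
    problems = []
    for item in iter_items(batch_items):
        if item.get("type") != "dataset_field":
            continue
        name = item.get("field")
        spec = schema["fields"].get(name)
        if spec is None:
            problems.append(f"{name}: not defined in the schema")
        elif spec.get("type") not in DISPLAYABLE_TYPES:
            problems.append(f"{name}: type {spec.get('type')} cannot be displayed")
    return problems


schema = {
    "fields": {
        "prompt": {"type": "text"},
        "session_id": {"type": "task_group_id"},
    }
}
layout = [
    {
        "rows": [
            {
                "columns": [
                    {
                        "items": [
                            {"type": "rich_text", "content": "<strong>Prompt</strong>"},
                            {"type": "dataset_field", "field": "prompt"},
                            {"type": "dataset_field", "field": "session_id"},
                        ]
                    }
                ]
            }
        ]
    }
]

print(check_dataset_fields(schema, layout))
# ['session_id: type task_group_id cannot be displayed']
```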

## Setting up the batch

The dataset must have at least one import with status `complete` before setup can be initiated.

```bash
POST /api/v1/data-collection/batches/{batch_id}/setup
```

```json
{
  "tasks_per_group": 1
}
```

Monitor batch status until it reaches `READY`:

```bash
GET /api/v1/data-collection/batches/{batch_id}/status
```
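Monitoring usually means polling the status endpoint at an interval with an upper time limit. The following is a minimal sketch of such a loop. It takes a caller-supplied `fetch_status` function instead of making HTTP calls itself, because the shape of the status response and the authentication scheme are not shown in this guide. The function name, the default interval and timeout, and the intermediate `"PROCESSING"` value in the demo are all assumptions made for illustration.

```python
import time


def wait_until_ready(
    fetch_status,
    interval_seconds=5.0,
    timeout_seconds=600.0,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Call fetch_status() repeatedly until it returns "READY".

    fetch_status is a caller-supplied function that returns the batch status
    as a string. Raises TimeoutError if READY is not seen in time.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status == "READY":
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"batch still {status!r} after {timeout_seconds} seconds"
            )
        sleep(interval_seconds)


# Demo with scripted statuses in place of real API calls.
# "PROCESSING" is a placeholder, not a documented status value.
scripted = iter(["PROCESSING", "PROCESSING", "READY"])
print(wait_until_ready(lambda: next(scripted), sleep=lambda _seconds: None))
```

In real use, `fetch_status` would wrap the `GET /api/v1/data-collection/batches/{batch_id}/status` request and extract the status string from the response body.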