Uploading Data to a Dataset
This guide covers the V4 dataset upload flow: requesting a presigned URL, uploading a file, and polling the import job until processing completes.
For V3 (legacy CSV/ZIP) uploads, see Working with Datasets.
Overview
V4 uploads follow a three-step flow:
1. Request a presigned URL.
2. Upload the file to S3 using that URL.
3. Poll the import job until processing completes.
Step 1: Request a presigned URL
Call GET /datasets/{dataset_id}/upload-url/{filename}. The service creates an import job and returns a presigned S3 URL, an import_id, and the content_type to use when uploading the file.
Save the import_id; you will need it to poll for status.
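The sketch below shows Step 1 using only the Python standard library. Several details are assumptions for illustration and should be checked against the actual API reference: the base URL, the bearer-token Authorization header, and the name of the response field that holds the presigned URL (`upload_url` here). The `import_id` and `content_type` fields come from this guide.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL; substitute your deployment's API host.
API_BASE = "https://api.example.com"


def upload_url_path(dataset_id: str, filename: str) -> str:
    """Build GET /datasets/{dataset_id}/upload-url/{filename}, escaping both path segments."""
    return "/datasets/{}/upload-url/{}".format(
        urllib.parse.quote(dataset_id, safe=""),
        urllib.parse.quote(filename, safe=""),
    )


def request_upload_url(dataset_id: str, filename: str, token: str) -> dict:
    """Ask the service to create an import job and return its presigned upload details."""
    req = urllib.request.Request(
        API_BASE + upload_url_path(dataset_id, filename),
        # Assumption: bearer-token auth; use whatever scheme your API actually requires.
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return {
        "upload_url": body["upload_url"],      # assumed field name for the presigned S3 URL
        "import_id": body["import_id"],        # keep this for polling in Step 3
        "content_type": body["content_type"],  # send as the Content-Type header in Step 2
    }
```

Usage: `info = request_upload_url("ds_123", "events.jsonl", token)`. The filename is escaped because it is a URL path segment. Without escaping, a space or slash in the name would produce a different URL.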
Step 2: Upload the file to S3
PUT the file directly to the presigned URL. No authentication headers are required for the S3 request.
Use the content_type value from the upload URL response as the Content-Type header — for example, application/x-ndjson for JSONL files.
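Step 2 can be sketched with the standard library as follows. The request carries only a Content-Type header, set from the upload URL response, and no Authorization header, because the presigned URL itself carries the credentials. The function names are illustrative. This sketch reads the whole file into memory, which is simplest but may not suit very large files.

```python
import urllib.request


def build_s3_put(presigned_url: str, body: bytes, content_type: str) -> urllib.request.Request:
    """Prepare a PUT of the raw file bytes to the presigned URL (no auth headers)."""
    return urllib.request.Request(
        presigned_url,
        data=body,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def upload_file(presigned_url: str, path: str, content_type: str) -> int:
    """Upload the file at `path`; urlopen raises HTTPError on a non-2xx reply."""
    with open(path, "rb") as f:
        body = f.read()
    with urllib.request.urlopen(build_s3_put(presigned_url, body, content_type)) as resp:
        return resp.status
```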
Step 3: Poll the import job
Using the import_id from Step 1, poll the import job every few seconds until its status is terminal: complete, partial, failed, or pending_schema.
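The polling loop can be sketched as below. This guide does not name the status endpoint, so the sketch takes a caller-supplied `fetch_job` function that returns the import job as a dict with a `status` key. The 3-second interval and 10-minute timeout are arbitrary defaults. Any status outside the terminal set, such as `queued`, is treated as still in progress.

```python
import time
from typing import Callable

# Terminal statuses listed in this guide; anything else means "keep polling".
TERMINAL_STATUSES = frozenset({"complete", "partial", "failed", "pending_schema"})


def wait_for_import(
    fetch_job: Callable[[], dict],
    interval: float = 3.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the import job reaches a terminal status, then return the job."""
    deadline = clock() + timeout
    while True:
        job = fetch_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"import still {job.get('status')!r} after {timeout} s")
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting. In normal use, pass just `fetch_job`.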
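Import job statuses
An import job waits in queued status while an earlier upload to the same dataset is still processing. The terminal statuses are complete, partial, failed, and pending_schema; each is described under Handling outcomes.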
Handling outcomes
Complete
All records were accepted. The dataset is ready to be used.
Partial
Some records were accepted and some were rejected. The accepted records are available immediately — you do not need to re-upload the entire file. Review the errors array to understand which records were rejected and why.
Put only the corrected records in a new file and upload it separately using the same three-step flow. The new records are appended to the records already accepted.
Failed
The file could not be parsed at all. Check the reason field and correct the file before re-uploading.
Pending schema
The dataset does not yet have a schema defined. Set a schema on the dataset and then re-upload the file.
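The four outcomes map naturally onto a dispatch on the terminal status. This is a minimal sketch. It reads only the `status`, `errors`, and `reason` fields mentioned in this guide and assumes nothing about the shape of individual `errors` entries beyond counting them. The function name and the returned messages are illustrative.

```python
def next_step(job: dict) -> str:
    """Return a human-readable next action for an import job in a terminal status."""
    status = job.get("status")
    if status == "complete":
        return "All records accepted; the dataset is ready to use."
    if status == "partial":
        rejected = len(job.get("errors") or [])
        return (f"{rejected} record(s) rejected; accepted records are already available. "
                "Upload the corrected records as a new file.")
    if status == "failed":
        return f"File could not be parsed ({job.get('reason', 'no reason given')}); fix it and re-upload."
    if status == "pending_schema":
        return "Dataset has no schema; set one, then re-upload the file."
    raise ValueError(f"not a terminal status: {status!r}")
```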
Uploading multiple files
You can upload multiple files to the same dataset. Each upload creates a new import job with its own import_id. Uploads for the same dataset are processed one at a time in arrival order, so a second upload will wait in queued status while the first is still processing.
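Each upload gets its own import_id, and the service queues uploads to a dataset in arrival order. A client can therefore start all uploads first and then wait on each job in that same order. In the sketch below, `start_upload` and `wait_for` are hypothetical caller-supplied functions. `start_upload` performs Steps 1 and 2 and returns the import_id; `wait_for` performs Step 3.

```python
from typing import Callable, Dict, Iterable, List


def upload_files(
    paths: Iterable[str],
    start_upload: Callable[[str], str],
    wait_for: Callable[[str], dict],
) -> Dict[str, dict]:
    """Upload files in order, then wait for each import job; returns {import_id: final job}."""
    # Each upload creates its own import job; later ones sit in "queued" until earlier ones finish.
    import_ids: List[str] = [start_upload(p) for p in paths]
    # Waiting in arrival order matches the order in which the service processes the jobs.
    return {import_id: wait_for(import_id) for import_id in import_ids}
```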
The total_datapoint_count field on the dataset reflects the cumulative count across all completed imports.