Data Generate creates synthetic or sample data from scratch based on a provided MCP (schema/config), producing records that conform to the defined structure and constraints. You describe the kind of data you need, and the system generates realistic conversations tailored to your specifications.
Two-Phase Generation Strategy
Generation uses a two-phase workflow to ensure quality:
- Phase 1: Pilot Optimization: Runs several iterations, generating small batches for validation and refinement. You can provide feedback to improve or customize the agentic generation system for your use case.
- Phase 2: Large-Scale Generation with Online Monitoring: Uses the optimized agentic generation system from Phase 1 to generate your complete dataset at scale.
Parameters
| Parameter | Required | Description |
|---|---|---|
| domain | Yes | The problem space or topic area (see Domain) |
| request | Yes | Description of data to generate |
| final_samples | Yes | Number of samples to generate |
| schema_file | One of | Path to a local schema file |
| mcp_server_url | One of | URL of an MCP server providing the schema |
| data_language | No | Language mix specification |
| reference_doc | No | Path to reference documentation |
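
The constraints in the parameter table can be checked before starting a run. The following is a minimal sketch, not part of the product: `validate_params` is a hypothetical helper, the parameter names come from the table, and the rule that `final_samples` is a positive integer is an assumption. The table does not say whether supplying both schema sources is allowed, so the sketch only requires at least one.

```python
# Hypothetical pre-flight check; parameter names are taken from the table above.
REQUIRED = ("domain", "request", "final_samples")
SCHEMA_SOURCES = ("schema_file", "mcp_server_url")


def validate_params(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the parameters look usable."""
    problems = [
        f"missing required parameter: {name}"
        for name in REQUIRED
        if params.get(name) in (None, "")
    ]
    # schema_file and mcp_server_url are marked "One of":
    # at least one schema source has to be present.
    if not any(params.get(name) for name in SCHEMA_SOURCES):
        problems.append("provide schema_file or mcp_server_url")
    # Assumption: final_samples is a positive integer count.
    samples = params.get("final_samples")
    if samples is not None and (not isinstance(samples, int) or samples <= 0):
        problems.append("final_samples should be a positive integer")
    return problems


# Illustrative values only; none of these come from a real configuration.
example = {
    "domain": "airline",
    "request": "multi-turn customer service dialogs",
    "final_samples": 20,
    "schema_file": "schema.json",
}
print(validate_params(example))  # []
```

Returning a list of problems instead of raising on the first one lets a caller report every missing parameter at once.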
You must provide either schema_file or mcp_server_url as the function schema source.
Phase 1: Pilot Optimization
After you confirm the configuration, Phase 1 begins. The system runs several iterations, each producing a small batch of samples for validation. The CLI displays a live progress tracker showing the current iteration, elapsed time, and workflow pipeline status.
Provide Feedback
During each Phase 1 iteration, the CLI opens a browser-based review interface displaying the generated samples. You can:
- Review each generated conversation
- Add per-sample feedback in the provided text areas
- Submit with no feedback if the samples look good

Phase 2: Large-Scale Generation with Online Monitoring
Phase 2 starts automatically once Phase 1 completes. The system uses the optimized generation strategy to produce your full dataset at the target scale, processing in batches. The CLI continues showing progress until the run finishes.
Completion
Upon completion, the sampling config file can be reused for future resample operations to skip Phase 1.
Resample: Quick Generation with Saved Config
If you are satisfied with a previous generation, use resample to skip Phase 1:
Output
After a run completes, results are saved under outputs/ as a new run directory, for example:
outputs/generated_data_<run_id>/
- generated_data.jsonl - The generated dataset in JSONL format
- datapoints/ - One JSON file per sample (expanded view of generated_data.jsonl)
- metadata.json - Run metadata (task type, parameters, primary files, timestamps)
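
Because the dataset is written as JSONL, it can be read with the standard library alone. The sketch below assumes only that each non-empty line of generated_data.jsonl is one JSON object; `load_jsonl` is a hypothetical helper name, and the fields inside each record are not specified here.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per non-empty line and return them as a list."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line so a truncated file is easy to locate.
                raise ValueError(f"{path}:{line_number}: invalid JSON ({exc.msg})") from exc
    return records
```

To use it, pass the path to the file inside your run directory, for example `load_jsonl("outputs/generated_data_<run_id>/generated_data.jsonl")` with `<run_id>` replaced by the actual run identifier.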
Example Output: Tau2-Bench Airline Domain
Below is an example of 20 multi-turn function-calling dialogs generated for the Tau2-Bench airline customer service domain. Each sample is a realistic conversation involving booking, modification, cancellation, refunds, and baggage handling, with tool calls to airline backend APIs.
Sample dialog excerpt
User: Hi, this is Emma Smith, user id emma_smith_9363. I have three things I need handled across my bookings: 1) Please cancel reservation E9TZTU with a full refund. 2) Change the flights for reservation 1YVFB9 to travel on 2024-05-28 instead of 2024-05-27 AND upgrade to business class. 3) Add 2 checked bags to reservation …
Assistant → get_user_details({"user_id": "emma_smith_9363"})
Tool response:
Assistant → get_reservation_details({"reservation_id": "E9TZTU"})
Tool response:
Assistant: I’ve pulled up your details. Let me walk through each request. For reservation E9TZTU — this is a basic economy round-trip flight from PHL to DTW. I need to let you know that basic economy reservations can only receive a refund as travel certificates, not a full cash refund …
- 20 samples, averaging 37 messages per dialog
- 4–20 tool calls per conversation across functions like get_user_details, get_reservation_details, cancel_reservation, update_reservation_flights, book_reservation, and more
- Covers diverse scenarios: cancellation, flight modification, baggage updates, refunds, payment handling, insurance, and membership-related requests
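
Summary figures like the ones above can be recomputed from a loaded dataset. This is a rough sketch under stated assumptions: it assumes each sample follows the OpenAI chat layout, with a `messages` list whose assistant entries may carry a `tool_calls` list. Those field names are inferred from the "OpenAI format" label and are not confirmed by this page, and `dialog_stats` is a hypothetical helper.

```python
from statistics import mean


def dialog_stats(dialogs):
    """Summarize message and tool-call counts for a list of dialog dicts."""
    if not dialogs:
        return {"samples": 0, "avg_messages": None,
                "min_tool_calls": None, "max_tool_calls": None}
    message_counts = []
    tool_call_counts = []
    for dialog in dialogs:
        # Assumed layout: {"messages": [{"role": ..., "tool_calls": [...]}, ...]}
        messages = dialog.get("messages", [])
        message_counts.append(len(messages))
        tool_call_counts.append(sum(
            len(message.get("tool_calls") or [])
            for message in messages
            if message.get("role") == "assistant"
        ))
    return {
        "samples": len(dialogs),
        "avg_messages": mean(message_counts),
        "min_tool_calls": min(tool_call_counts),
        "max_tool_calls": max(tool_call_counts),
    }
```

If the real records use different keys, only the two lookups inside the loop need to change.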
Download sample data
20-sample JSONL file (OpenAI format)
Browse dialogs interactively
Interactive HTML dialog viewer
Using /execute
You can also run data-generate non-interactively via /execute with a YAML config.
Prerequisites
- You have a YAML configuration file available.
- You provide a schema source (schema_file or mcp_server_url).