Datasets - Documentation

EigenData-CLI generates high-quality datasets for agent evaluation and training. Below is a catalog of off-the-shelf datasets spanning different domains and task complexities. Each comes with a free, ready-to-use demo sample; where a full production corpus is available, the dataset page describes the complete dataset and how to license it. Each dataset includes:

Environment — the simulated world state (MCP server snapshots, databases, or filesystems) that the agent operates in
Data — generated samples including intents, datapoints, evaluators, and reference payloads

Available Datasets

APEX Agent

Professional knowledge work across investment banking, law, and management consulting, synthesized from scratch and inspired by the APEX benchmark.

Personal Agent Bench

Long-horizon personal knowledge work on a simulated laptop: tax packets, federal returns, reimbursements, and subscription audits across an 8-app environment.

Tau2-Bench

Multi-turn, policy-grounded customer-service dialogs across airline, telecom, and retail, with tool use and machine-checkable success criteria.

Tau3-Bench

Hard, single-domain retail-banking dialogs with dynamically discoverable tools — the agent must search a knowledge base and unlock the right tool at runtime.

Enterprise Bench

Long-horizon agent tasks inside realistic simulated companies — operate the business or answer questions across up to ~40 connected SaaS systems sharing one world state.

WildClawBench

Agentic, tool-using tasks across six capability categories — from PDF parsing to code debugging to safety alignment — built on InternLM’s WildClawBench.

MCP-Atlas

Multi-step, multi-server tool-use tasks over a ~40-server MCP graph — each frozen with a claims-based reward and a replayable environment snapshot. Built on the MCP-Atlas benchmark.

MCPMark

Synthetic, agentic filesystem and GitHub tasks with deterministic Python verifiers: repo archaeology, cross-file joins, and stateful MCP actions, all runnable fully offline.

Toolathlon

Single-turn, tool-using tasks over a shared multi-application MCP workspace — 32 tool servers, 102 task families, 4,300 RL environments with deterministic grading.

Google Workspace

Everyday Google Workspace tasks — managing emails, calendars, sheets, and contacts across diverse personal and professional scenarios.

Download

The free demo samples are hosted on Hugging Face:

# Download everything
hf download jindidi/eigendata-demo-data --repo-type dataset

# Download a specific dataset
hf download jindidi/eigendata-demo-data --repo-type dataset --include "tau2_bench/*"
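
To keep a subset in a predictable folder instead of the default Hugging Face cache, the same command can be wrapped in a small helper. The sketch below is illustrative only: the function name `download_subset`, the `DRY_RUN` switch, and the `./eigendata-demo` destination are assumptions made for this example, and `tau2_bench` is the only subset folder name confirmed on this page. It relies on the `--local-dir` option of `hf download`.

```shell
# Sketch: download one demo subset into a local directory.
# REPO is taken from the commands above; everything else is illustrative.
REPO="jindidi/eigendata-demo-data"

download_subset() {
  subset="$1"                      # subset folder, e.g. tau2_bench
  dest="${2:-./eigendata-demo}"    # hypothetical local target directory

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it, to check the pattern first.
    echo "hf download $REPO --repo-type dataset --include \"$subset/*\" --local-dir $dest"
  else
    hf download "$REPO" --repo-type dataset --include "$subset/*" --local-dir "$dest"
  fi
}

# Preview the command without touching the network.
DRY_RUN=1 download_subset tau2_bench
```

Calling `download_subset tau2_bench` without `DRY_RUN=1` performs the actual download. Other subset folder names should be taken from the repository listing on Hugging Face rather than guessed.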

Browse on Hugging Face

View and download all demo samples

License

The demo samples are released under CC BY-NC-ND 4.0.

For demonstration and evaluation purposes only
No commercial use
No redistribution or derivative works
No use for model training

Full dataset corpora are available for commercial licensing, including model training — see each dataset’s page or contact support@eigenai.com.
