EigenData-CLI generates high-quality datasets for agent evaluation and training. Below is a catalog of off-the-shelf datasets spanning different domains and task complexities. Each comes with a free, ready-to-use demo sample; where a full production corpus is available, the dataset page describes the complete dataset and how to license it.

Each dataset includes:
  • Environment — the simulated world state (MCP server snapshots, databases, or filesystems) that the agent operates in
  • Data — generated samples including intents, datapoints, evaluators, and reference payloads

Available Datasets

APEX Agent

Professional knowledge work across investment banking, law, and management consulting, synthesized from scratch and inspired by the APEX benchmark.

Personal Agent Bench

Long-horizon personal knowledge work on a simulated laptop: tax packets, federal returns, reimbursements, and subscription audits across an 8-app environment.

Tau2-Bench

Multi-turn, policy-grounded customer-service dialogs across airline, telecom, and retail, with tool use and machine-checkable success criteria.

Tau3-Bench

Hard, single-domain retail-banking dialogs with dynamically discoverable tools — the agent must search a knowledge base and unlock the right tool at runtime.

Enterprise Bench

Long-horizon agent tasks inside realistic simulated companies — operate the business or answer questions across up to ~40 connected SaaS systems sharing one world state.

WildClawBench

Agentic, tool-using tasks across six capability categories — from PDF parsing to code debugging to safety alignment — built on InternLM’s WildClawBench.

MCP-Atlas

Multi-step, multi-server tool-use tasks over a ~40-server MCP graph — each frozen with a claims-based reward and a replayable environment snapshot. Built on the MCP-Atlas benchmark.

MCPMark

Synthetic, agentic filesystem + GitHub tasks with deterministic Python verifiers — repo archaeology, cross-file joins, and stateful MCP actions, runnable fully offline.

Google Workspace

Everyday Google Workspace tasks — managing emails, calendars, sheets, and contacts across diverse personal and professional scenarios.

Download

The free demo samples are hosted on Hugging Face:
# Download everything
hf download jindidi/eigendata-demo-data --repo-type dataset

# Download a specific dataset
hf download jindidi/eigendata-demo-data --repo-type dataset --include "tau2_bench/*"
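
If you fetch several datasets one at a time, the per-dataset command above can be wrapped in a small shell function. The sketch below is illustrative only: the function name `download_demo` and the default target directory `./eigendata-demo` are hypothetical. It also assumes that each dataset sits in its own top-level folder of the repository, as the `tau2_bench/*` pattern suggests. `--local-dir` is a standard `hf download` option that writes the files to the given directory rather than leaving them only in the Hugging Face cache.

```shell
# Hypothetical helper: download one dataset folder from the demo repo.
#   $1  dataset folder name in the repo (e.g. tau2_bench)
#   $2  target directory (optional; default is an assumed ./eigendata-demo)
download_demo() {
  subset="${1:?usage: download_demo <dataset-folder> [target-dir]}"
  dest="${2:-./eigendata-demo}"
  # The quoted pattern is passed to hf unexpanded, so the shell does not glob it.
  hf download jindidi/eigendata-demo-data \
    --repo-type dataset \
    --include "${subset}/*" \
    --local-dir "$dest"
}

# Example call (requires network access and the hf CLI):
#   download_demo tau2_bench ./data
```

Folder names other than `tau2_bench` are not listed on this page, so check the repository on Hugging Face for the exact names before calling the helper.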

Browse on Hugging Face

View and download all demo samples

License

The demo samples are released under CC BY-NC-ND 4.0.
  • For demonstration and evaluation purposes only
  • No commercial use
  • No redistribution or derivative works
  • No use for model training
Full dataset corpora are available for commercial licensing, including model training — see each dataset’s page or contact support@eigenai.com.