- Environment — the simulated world state (MCP server snapshots, databases, or filesystems) that the agent operates in
- Data — generated samples including intents, datapoints, evaluators, and reference payloads
Available Demos
Tau2-Bench
Customer service dialogs across airline, banking, and retail domains with multi-turn function calling.
Google Workspace
Everyday Google Workspace tasks — managing emails, calendars, sheets, and contacts across diverse personal and professional scenarios.
APEX Agent
Professional knowledge work across investment banking, law, and management consulting, synthesized from scratch and inspired by the APEX benchmark.
OpenClaw
Agentic coding and tool-use tasks across productivity, code intelligence, search, creative synthesis, and safety — built on WildClawBench.
Enterprise Database
Enterprise database operations with realistic schema and query scenarios. Coming soon.
Download
All demo datasets are hosted on Hugging Face, where you can browse and download them.
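For a scripted download, one possible approach is the `huggingface_hub` client, sketched below. This is only a sketch: the page does not name the repository, so `REPO_ID` is a hypothetical placeholder and `download_demo` is an illustrative helper, not part of any official tooling. It also assumes the demos are published as dataset repositories on the Hub.

```python
from pathlib import Path

# Hypothetical placeholder: this page does not state the repository name.
# Replace it with the real "<org>/<dataset-name>" shown on Hugging Face.
REPO_ID = "<org>/<dataset-name>"


def download_demo(repo_id: str, local_dir: str = "demo_data") -> Path:
    """Download a full snapshot of a dataset repository from the Hugging Face Hub."""
    if "<" in repo_id or ">" in repo_id:
        # Fail early instead of sending a placeholder to the Hub.
        raise ValueError("Replace the placeholder repo_id with the real repository name.")

    # Imported here so the module loads even if the package is missing.
    # Install with: pip install huggingface_hub
    from huggingface_hub import snapshot_download

    # repo_type="dataset" is an assumption about how the demos are published.
    path = snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        local_dir=local_dir,
    )
    return Path(path)
```

After replacing the placeholder, calling `download_demo("org/name")` should return the local directory that holds the downloaded files. The license terms below apply to anything fetched this way.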
License
This demo data is released under CC BY-NC-ND 4.0.
- For demonstration and evaluation purposes only
- No commercial use
- No redistribution or derivative works
- No use for model training