Generate Audio - Documentation

POST /api/v1/generate
Content-Type: multipart/form-data

Parameter support varies by model. Check the Model Library for model-specific compatibility.

Authentication

Send your API key in the Authorization header as a Bearer token.

Authorization: Bearer YOUR_API_KEY
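
As a minimal sketch of the same header from Python, the helper below builds (but does not send) a POST request with the key attached as a Bearer token. The function name `authorized_request` and the environment variable `EIGENAI_API_KEY` are hypothetical names chosen for this example; the host is taken from the curl examples on this page.

```python
import os
import urllib.request

# Host and path as used in the curl examples on this page.
API_URL = "https://api-web.eigenai.com/api/v1/generate"


def authorized_request(api_key: str, url: str = API_URL) -> urllib.request.Request:
    """Return a POST request carrying the key as a Bearer token (not yet sent)."""
    if not api_key:
        raise ValueError("API key is empty")
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


# EIGENAI_API_KEY is a hypothetical variable name, not one defined by the API.
request = authorized_request(os.environ.get("EIGENAI_API_KEY") or "YOUR_API_KEY")
```

Reading the key from the environment keeps it out of source control; the placeholder fallback is only there so the sketch runs without configuration.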

Audio Transcription (ASR)

Supported model: Whisper V3 Turbo (model=whisper_v3_turbo)

Parameters

Name	Type	Required	Description
`model`	`string`	Required	Must be `whisper_v3_turbo`.
`file`	`file`	Required	Audio file to transcribe (MP3, WAV, M4A, OGG, WebM).
`language`	`string`	Optional	Spoken language code (default `en`). Supports 99 languages.
`response_format`	`string`	Optional	Output format: `json` or `text`.

Example

curl -X POST https://api-web.eigenai.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "model=whisper_v3_turbo" \
  -F "file=@/path/to/audio.mp3" \
  -F "language=en" \
  -F "response_format=json"
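
The curl call above can be approximated in Python with only the standard library. This is a sketch under stated assumptions: `encode_multipart` and `build_transcription_request` are hypothetical helpers written for this example, and the response body is not parsed because its schema is not described here.

```python
import mimetypes
import urllib.request
import uuid

API_URL = "https://api-web.eigenai.com/api/v1/generate"


def encode_multipart(fields, files):
    """Encode text fields and files as multipart/form-data.

    fields: dict of name -> str
    files:  dict of name -> (filename, bytes)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    for name, (filename, data) in files.items():
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(api_key, filename, audio_bytes, language="en"):
    """Build (but do not send) a transcription request for whisper_v3_turbo."""
    body, content_type = encode_multipart(
        {"model": "whisper_v3_turbo", "language": language, "response_format": "json"},
        {"file": (filename, audio_bytes)},
    )
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
```

To send the request, read the audio with `open(path, "rb").read()`, pass the result to `build_transcription_request`, and hand the returned object to `urllib.request.urlopen`.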

Text-to-Speech (TTS)

Three models are available. All accept multipart/form-data and return a WAV audio file by default. For real-time streaming over WebSocket, see Stream Audio. To upload a voice reference for cloning, see Upload Voice Reference.

Higgs Audio V2.5 (`model=higgs2p5`)

Name	Type	Required	Description
`model`	`string`	Required	Must be `higgs2p5`.
`text`	`string`	Required	Text to convert to speech.
`voice`	`string`	Optional	Voice preset (e.g. `Linda`, `Jack`).
`voice_reference_file`	`file`	Optional	Audio file for voice cloning (WAV, MP3).
`voice_id`	`string`	Optional	Stored voice ID returned by Upload Voice Reference.
`voice_url`	`string`	Optional	External URL to a voice reference audio sample.
`voice_name`	`string`	Optional	Name of a saved voice from the voice library.
`voice_settings`	`string`	Optional	JSON string with voice settings. Supports `speed` (default `1.0`).
`sampling`	`string`	Optional	JSON string with sampling controls: `temperature` (default `1.0`), `top_p` (default `0.95`), `top_k` (default `50`).
`stream`	`boolean`	Optional	`false` = return WAV file (default); `true` = HTTP SSE streaming.
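
Because `voice_settings` and `sampling` are JSON strings carried inside form fields, the nested values have to be serialized before they are sent. A minimal sketch, assuming a hypothetical helper named `higgs_fields` that fills in the defaults listed in the table:

```python
import json


def higgs_fields(text, voice=None, speed=1.0, temperature=1.0, top_p=0.95, top_k=50):
    """Build form fields for model=higgs2p5; nested options become JSON strings."""
    fields = {
        "model": "higgs2p5",
        "text": text,
        "voice_settings": json.dumps({"speed": speed}),
        "sampling": json.dumps(
            {"temperature": temperature, "top_p": top_p, "top_k": top_k}
        ),
    }
    if voice is not None:
        fields["voice"] = voice
    return fields


fields = higgs_fields("Hello there.", voice="Linda", speed=1.1, temperature=0.7)
print(fields["sampling"])  # {"temperature": 0.7, "top_p": 0.95, "top_k": 50}
```

Every value in the resulting dictionary is a plain string, so it can be passed directly to any multipart encoder.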

ChatterBox Voice Twin (`model=chatterbox`)

Name	Type	Required	Description
`model`	`string`	Required	Must be `chatterbox`.
`text`	`string`	Required	Text to convert to speech (≤ 1,000 characters recommended).
`language_id`	`string`	Optional	Language code (e.g. `en`, `zh`, `es`, `ja`). Supports 23 languages. Default `en`.
`audio_prompt_file`	`file`	Optional	Voice reference clip for voice cloning (WAV/MP3/M4A/OGG, max 30s).
`voice_id`	`string`	Optional	Stored voice ID returned by Upload Voice Reference.
`preset_url`	`string`	Optional	URL to a voice preset audio sample.
`exaggeration`	`number`	Optional	Expressiveness: `0.0` = subtle, `0.5` = balanced, `1.0+` = highly animated (default `0.5`).
`temperature`	`number`	Optional	Sampling temperature (default `0.8`).
`diffusion_steps`	`number`	Optional	Quality vs. latency. Higher = better quality, slower (default `5`).
`max_tokens`	`integer`	Optional	Upper bound on generated tokens (default `3000`).
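`top_p`	`number`	Optional	Nucleus (top-p) sampling threshold (default `1.0`).
`min_p`	`number`	Optional	Min-p sampling floor: drops tokens whose probability falls below this fraction of the most likely token's probability (default `0.05`).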
`repetition_penalty`	`number`	Optional	Penalizes repeated tokens (default `1.2`).
`seed`	`integer`	Optional	Seed for reproducible generation (`null` = random).
`stream`	`boolean`	Optional	`false` = return WAV file (default); `true` = HTTP SSE streaming.
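
One way to stay within the recommended 1,000 characters per request is to split longer input at sentence boundaries and send one request per chunk. The sketch below is an illustration only: `chunk_text` and `chatterbox_fields` are hypothetical helpers, and reusing a single `seed` across chunks is an assumption about how to keep the output consistent, not documented behavior.

```python
import re


def chunk_text(text, limit=1000):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def chatterbox_fields(text, seed=None, exaggeration=0.5):
    """Build form fields for one chunk; form values are sent as strings."""
    fields = {"model": "chatterbox", "text": text, "exaggeration": str(exaggeration)}
    if seed is not None:
        fields["seed"] = str(seed)
    return fields


long_text = "This is a sentence. " * 120
requests_to_send = [chatterbox_fields(chunk, seed=42) for chunk in chunk_text(long_text)]
```

Each entry in `requests_to_send` is a separate set of form fields; the resulting audio files would then need to be concatenated by the caller.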

Qwen3 TTS (`model=qwen3-tts`)

Supports named speakers (CustomVoice mode) or voice cloning (Base mode). `voice` cannot be combined with `voice_id` or `voice_url`.

Name	Type	Required	Description
`model`	`string`	Required	Must be `qwen3-tts`.
`text`	`string`	Required	Text to synthesize.
`voice`	`string`	Optional	Named speaker for CustomVoice mode: `Vivian`, `Serena`, `Uncle_Fu`, `Dylan`, `Eric`, `Ryan`, `Aiden`, `Ono_Anna`, `Sohee`. Cannot be used with `voice_id` or `voice_url`.
`voice_id`	`string`	Optional	Stored voice ID for Base model (from Upload Voice Reference).
`voice_url`	`string`	Optional	External URL to voice reference audio (Base model).
`voice_settings`	`string`	Optional	JSON string with voice settings. Supports `speed` (default `1.0`).
`language`	`string`	Optional	`Auto`, `Chinese`, `English`, `French`, `German`, `Italian`, `Japanese`, `Korean`, `Portuguese`, `Russian`, `Spanish` (default `Auto`).
`instructions`	`string`	Optional	Style/emotion control (e.g. `"speak cheerfully"`).
`response_format`	`string`	Optional	Output format: `wav` (default), `pcm`, `mp3`, `flac`, `aac`, `opus`.
`stream`	`boolean`	Optional	`false` = return audio file (default); `true` = HTTP SSE streaming.
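
The constraint that `voice` cannot be combined with `voice_id` or `voice_url` is easy to check on the client before a request is sent. A minimal sketch, assuming a hypothetical helper named `qwen3_fields`; the speaker and format lists are copied from the table above, and whether the server rejects other values is not stated here.

```python
QWEN3_SPEAKERS = {
    "Vivian", "Serena", "Uncle_Fu", "Dylan", "Eric",
    "Ryan", "Aiden", "Ono_Anna", "Sohee",
}
QWEN3_FORMATS = {"wav", "pcm", "mp3", "flac", "aac", "opus"}


def qwen3_fields(text, voice=None, voice_id=None, voice_url=None,
                 language="Auto", response_format="wav", instructions=None):
    """Build form fields for model=qwen3-tts, rejecting conflicting voice options."""
    if voice is not None and (voice_id is not None or voice_url is not None):
        raise ValueError("voice cannot be combined with voice_id or voice_url")
    if voice is not None and voice not in QWEN3_SPEAKERS:
        raise ValueError(f"unknown named speaker: {voice}")
    if response_format not in QWEN3_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")

    fields = {
        "model": "qwen3-tts",
        "text": text,
        "language": language,
        "response_format": response_format,
    }
    # Only include optional fields that were actually provided.
    optional = {
        "voice": voice,
        "voice_id": voice_id,
        "voice_url": voice_url,
        "instructions": instructions,
    }
    fields.update({k: v for k, v in optional.items() if v is not None})
    return fields


custom_voice = qwen3_fields("Hi", voice="Ryan", response_format="mp3")
```

The sketch does not restrict passing `voice_id` and `voice_url` together, since the table does not say whether that combination is allowed.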

TTS Example

curl -X POST https://api-web.eigenai.com/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "model=higgs2p5" \
  -F "text=Hello, this is a test of the text-to-speech system." \
  -F "voice=Linda" \
  --output speech.wav
