The Deployments page lets you provision a dedicated GPU endpoint for any supported model. Once a deployment is Ready, you can call it using the OpenAI-compatible API.
Prerequisites
- An EigenAI account with available credits.
- An API key (see Authentication).
The deployments list
The Deployments page shows a table of all your deployments.
| Column | Description |
|---|---|
| Deployment name | Human-readable name and short ID (e.g., dep-6a6ee693c4). |
| Type | Deployment type. Currently llm for all text and vision-language models. |
| Status | Current state (see Deployment statuses below). |
| Actions | Sync status refreshes the status from the backend. Terminate shuts down the deployment and stops billing. |
Use the Search box to filter by name, model, or ID. Use the All types and All statuses dropdowns to narrow the list.
Click Refresh to reload all deployment statuses at once.
Deployment statuses
| Status | Description |
|---|---|
| Provisioning | The deployment is starting up. GPUs are being allocated. |
| Ready | The deployment is running and accepting API requests. |
| Terminated | The deployment has been shut down. |
| Failed | The deployment encountered an error and could not start. |
Create a deployment
Click Create Deployment to open the three-step wizard.
Step 1 — Select model
Choose the model you want to deploy. 15 models are available, organized by provider.
| Provider | Models |
|---|---|
| DeepSeek | DeepSeek R1, DeepSeek V3.1 Terminus, DeepSeek V3.2 |
| Qwen3 | Qwen3-VL 235B Thinking (FP8), Qwen3-VL 235B Instruct (FP8), Qwen3-VL 32B Instruct (FP8), Qwen3-VL 30B MoE Instruct (FP8), Qwen3-VL 8B Instruct (FP8), Qwen3 235B Thinking (FP8) |
| GLM | GLM 4.5V, GLM 4.6V 106B, GLM 4.6V 9B Flash |
| GPT-OSS | GPT-OSS 120B |
| Kimi | Kimi K2 Instruct, Kimi K2 Thinking |
Each model card shows:
- GPU minimum — the minimum number of GPUs required.
- Capability tags — e.g., Think (chain-of-thought reasoning), Tools (function calling), Fast (speculative decoding), Vision (image input), FP8 (quantized weights).
Use the provider filter tabs (All, DeepSeek, Qwen3, GLM, GPT-OSS, Kimi) to browse by family.
You can also deploy a fine-tuned model checkpoint directly from the Fine-tuning job detail page using the Deploy button on any epoch checkpoint.
Step 2 — Configuration
Configure the hardware and runtime options for your deployment.
| Field | Description |
|---|---|
| Display name | A human-readable name for this deployment. Auto-generated by default (e.g., deepseek-r1-chief-moth-207). |
| Hardware platform | GPU type. Currently NVIDIA H200 (141 GB, $5.99/GPU/hr). |
| GPU count | Number of GPUs per replica: 1, 2, 4, or 8. Some models require a minimum GPU count. |
| Replicas | Number of parallel instances for scaling: 1, 2, 3, or 4. |
| Reasoning parser | Enable structured chain-of-thought output. Recommended for reasoning models (e.g., DeepSeek R1). |
| Tool call parser | Enable function calling and tool use. |
Step 3 — Review & deploy
Review the deployment summary before launching.
| Field | Description |
|---|---|
| Name | Display name of the deployment. |
| Model | The model being deployed. |
| Hardware | GPU type selected. |
| GPUs per replica | GPU count chosen in step 2. |
| Replicas | Number of replicas chosen in step 2. |
| Reasoning parser | Whether reasoning output parsing is enabled. |
| Tool calling | Whether tool call parsing is enabled. |
| Estimated cost | Calculated as: GPUs per replica × replicas × price per GPU/hr. |
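The cost formula can be sketched in a few lines of Python; a minimal illustration, using the $5.99/GPU/hr NVIDIA H200 rate from the hardware table in step 2 as the default (other hardware platforms would use their own rate):

```python
def estimated_hourly_cost(gpus_per_replica: int, replicas: int,
                          price_per_gpu_hr: float = 5.99) -> float:
    """Estimated cost = GPUs per replica x replicas x price per GPU/hr.

    The default rate is the NVIDIA H200 price listed in step 2.
    """
    return gpus_per_replica * replicas * price_per_gpu_hr

# Example: 2 GPUs per replica x 2 replicas on H200
print(f"${estimated_hourly_cost(2, 2):.2f}/hr")  # → $23.96/hr
```

Note the estimate is per hour of wall-clock time while the deployment is running, not per request.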
Click Deploy to provision the endpoint. The deployment will enter Provisioning status.
View deployment details
Click any row in the deployments list to expand its details inline.
| Field | Description |
|---|---|
| Model name | The unique model identifier to use in API calls. |
| Base model | The base model or checkpoint used. |
| Accelerator | GPU type and count (e.g., H200:1). |
| Replicas | Number of running replicas. |
| Checkpoint | The specific model checkpoint deployed. |
Call a deployment
When a deployment is Ready, an OpenAI-Compatible Endpoint section appears with everything you need to make your first request.
Base URL
https://api-web.eigenai.com/api/deployment/v1
Model name
Each deployment gets a unique model name (e.g., qwen3-vl-8b-instruct-mushy-booby-167-dep-6a6ee693c4). Use this as the model field in your requests. Click the copy icon to copy it.
Authentication
Include your API key in the Authorization header. Click Get API Key to go to the API Keys page.
```shell
curl https://api-web.eigenai.com/api/deployment/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "YOUR_DEPLOYMENT_MODEL_NAME",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
Replace YOUR_DEPLOYMENT_MODEL_NAME with the model name shown in the deployment’s detail panel.
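The same request can be made from Python. A minimal sketch using only the standard library, assuming the endpoint follows the OpenAI chat-completions schema as described above (the API key and model name are placeholders):

```python
import json
import urllib.request

BASE_URL = "https://api-web.eigenai.com/api/deployment/v1"
API_KEY = "YOUR_API_KEY"  # placeholder: your key from the API Keys page
MODEL = "YOUR_DEPLOYMENT_MODEL_NAME"  # placeholder: from the detail panel

# Build the same chat-completions request as the curl example above.
payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Uncomment to send once the deployment is Ready:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, the official `openai` client should also work by pointing its `base_url` at the URL above, though that is not shown here.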
Terminate a deployment
Click Terminate in the Actions column to shut down a deployment. Billing stops once the status changes to Terminated. This action cannot be undone.