The Deployments page lets you provision a dedicated GPU endpoint for any supported model. Once a deployment is Ready, you can call it using the OpenAI-compatible API.
Prerequisites
- An EigenAI account with available credits.
- An API key (see Authentication).
The deployments list
The Deployments page shows a table of all your deployments.
| Column | Description |
|---|---|
| Deployment name | Human-readable name and short ID (e.g., dep-6a6ee693c4). |
| Type | Deployment type. Currently llm for all text and vision-language models. |
| Status | Current state (see Deployment statuses below). |
| Actions | Sync status refreshes the status from the backend. Terminate shuts down the deployment and stops billing. |
Use the Search box to filter by name, model, or ID. Use the All types and All statuses dropdowns to narrow the list.
Click Refresh to reload all deployment statuses at once.
Deployment statuses
| Status | Description |
|---|---|
| Provisioning | The deployment is starting up. GPUs are being allocated. |
| Ready | The deployment is running and accepting API requests. |
| Terminated | The deployment has been shut down. |
| Failed | The deployment encountered an error and could not start. |
Create a deployment
Click Create Deployment to open the three-step wizard.
Step 1 — Select model
Choose the model you want to deploy. 15 models are available, organized by provider.
| Provider | Models |
|---|---|
| DeepSeek | DeepSeek R1, DeepSeek V3.1 Terminus, DeepSeek V3.2 |
| Qwen3 | Qwen3-VL 235B Thinking (FP8), Qwen3-VL 235B Instruct (FP8), Qwen3-VL 32B Instruct (FP8), Qwen3-VL 30B MoE Instruct (FP8), Qwen3-VL 8B Instruct (FP8), Qwen3 235B Thinking (FP8) |
| GLM | GLM 4.5V, GLM 4.6V 106B, GLM 4.6V 9B Flash |
| GPT-OSS | GPT-OSS 120B |
| Kimi | Kimi K2 Instruct, Kimi K2 Thinking |
Each model card shows:
- GPU minimum — the minimum number of GPUs required.
- Capability tags — e.g., Think (chain-of-thought reasoning), Tools (function calling), Fast (speculative decoding), Vision (image input), FP8 (quantized weights).
Use the provider filter tabs (All, DeepSeek, Qwen3, GLM, GPT-OSS, Kimi) to browse by family.
You can also deploy a fine-tuned model checkpoint directly from the Fine-tuning job detail page using the Deploy button on any epoch checkpoint.
Step 2 — Configuration
Configure the hardware and runtime options for your deployment.
| Field | Description |
|---|---|
| Display name | A human-readable name for this deployment. Auto-generated by default (e.g., deepseek-r1-chief-moth-207). |
| Hardware platform | GPU type. Currently NVIDIA H200 (141 GB, $5.99/GPU/hr). |
| GPU count | Number of GPUs per replica: 1, 2, 4, or 8. Some models require a minimum GPU count. |
| Replicas | Number of parallel instances for scaling: 1, 2, 3, or 4. |
| Reasoning parser | Enable structured chain-of-thought output. Recommended for reasoning models (e.g., DeepSeek R1). |
| Tool call parser | Enable function calling and tool use. |
Step 3 — Review & deploy
Review the deployment summary before launching.
| Field | Description |
|---|---|
| Name | Display name of the deployment. |
| Model | The model being deployed. |
| Hardware | GPU type selected. |
| GPUs per replica | GPU count chosen in step 2. |
| Replicas | Number of replicas chosen in step 2. |
| Reasoning parser | Whether reasoning output parsing is enabled. |
| Tool calling | Whether tool call parsing is enabled. |
| Estimated cost | Calculated as: GPUs per replica × replicas × price per GPU/hr. |
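The cost formula can be sketched in a few lines of Python; a minimal illustration, using the $5.99/GPU/hr NVIDIA H200 rate from the hardware table in step 2 as the default (other hardware platforms would use their own rate):

```python
def estimated_hourly_cost(gpus_per_replica: int, replicas: int,
                          price_per_gpu_hr: float = 5.99) -> float:
    """Estimated cost = GPUs per replica x replicas x price per GPU/hr.

    The default rate is the NVIDIA H200 price listed in step 2.
    """
    return gpus_per_replica * replicas * price_per_gpu_hr

# Example: 2 GPUs per replica x 2 replicas on H200
print(f"${estimated_hourly_cost(2, 2):.2f}/hr")  # → $23.96/hr
```

Note the estimate is per hour of wall-clock time while the deployment is running, not per request.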
Click Deploy to provision the endpoint. The deployment will enter Provisioning status.
View deployment details
Click any row in the deployments list to expand its details inline.
| Field | Description |
|---|---|
| Model name | The unique model identifier to use in API calls. |
| Base model | The base model or checkpoint used. |
| Accelerator | GPU type and count (e.g., H200:1). |
| Replicas | Number of running replicas. |
| Checkpoint | The specific model checkpoint deployed. |
Call a deployment
When a deployment is Ready, an OpenAI-Compatible Endpoint section appears with everything you need to make your first request.
Base URL
https://api-web.eigenai.com/api/deployment/v1
Model name
Each deployment gets a unique model name (e.g., qwen3-vl-8b-instruct-mushy-booby-167-dep-6a6ee693c4). Use this as the model field in your requests. Click the copy icon to copy it.
Authentication
Include your API key in the Authorization header. Click Get API Key to go to the API Keys page.
```shell
curl https://api-web.eigenai.com/api/deployment/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "YOUR_DEPLOYMENT_MODEL_NAME",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
Replace YOUR_DEPLOYMENT_MODEL_NAME with the model name shown in the deployment’s detail panel.
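The same request can be made from Python. A minimal sketch using only the standard library, assuming the endpoint follows the OpenAI chat-completions schema as described above (the API key and model name are placeholders):

```python
import json
import urllib.request

BASE_URL = "https://api-web.eigenai.com/api/deployment/v1"
API_KEY = "YOUR_API_KEY"  # placeholder: your key from the API Keys page
MODEL = "YOUR_DEPLOYMENT_MODEL_NAME"  # placeholder: from the detail panel

# Build the same chat-completions request as the curl example above.
payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Uncomment to send once the deployment is Ready:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, the official `openai` client should also work by pointing its `base_url` at the URL above, though that is not shown here.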
Terminate a deployment
Click Terminate in the Actions column to shut down a deployment. Billing stops once the status changes to Terminated. This action cannot be undone.