The Deployments page lets you provision a dedicated GPU endpoint for any supported model. Once a deployment is Ready, you can call it using the OpenAI-compatible API.

Prerequisites

  • An EigenAI account with available credits.
  • An API key (see Authentication).

The deployments list

The Deployments page shows a table of all your deployments.
| Column | Description |
| --- | --- |
| Deployment name | Human-readable name and short ID (e.g., dep-6a6ee693c4). |
| Type | Deployment type. Currently llm for all text and vision-language models. |
| Status | Current state (see Deployment statuses below). |
| Actions | Sync status refreshes the status from the backend. Terminate shuts down the deployment and stops billing. |
Use the Search box to filter by name, model, or ID. Use the All types and All statuses dropdowns to narrow the list. Click Refresh to reload all deployment statuses at once.

Deployment statuses

| Status | Description |
| --- | --- |
| Provisioning | The deployment is starting up. GPUs are being allocated. |
| Ready | The deployment is running and accepting API requests. |
| Terminated | The deployment has been shut down. |
| Failed | The deployment encountered an error and could not start. |
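Only Provisioning is transient; Ready, Terminated, and Failed are all states a poller can stop on. A minimal polling sketch, assuming you supply your own `get_status` callable (a hypothetical stand-in for however you fetch the status, e.g., via your own tooling around Sync status):

```python
import time

# Terminal states: once reached, the status no longer changes on its own.
TERMINAL_STATUSES = {"Ready", "Terminated", "Failed"}

def is_terminal(status: str) -> bool:
    """Return True once a deployment has finished provisioning (or failed)."""
    return status in TERMINAL_STATUSES

def wait_for_deployment(get_status, interval_s: float = 10.0,
                        timeout_s: float = 1800.0) -> str:
    """Poll get_status() until the deployment leaves Provisioning.

    get_status is a caller-supplied callable returning one of the four
    status strings above. Raises TimeoutError if provisioning stalls.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if is_terminal(status):
            return status
        time.sleep(interval_s)
    raise TimeoutError("deployment still Provisioning after timeout")
```

Checking the returned value lets you distinguish a Ready endpoint from a Failed launch before sending traffic.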

Create a deployment

Click Create Deployment to open the 3-step wizard.

Step 1 — Select model

Choose the model you want to deploy. 15 models are available, organized by provider.
| Provider | Models |
| --- | --- |
| DeepSeek | DeepSeek R1, DeepSeek V3.1 Terminus, DeepSeek V3.2 |
| Qwen3 | Qwen3-VL 235B Thinking (FP8), Qwen3-VL 235B Instruct (FP8), Qwen3-VL 32B Instruct (FP8), Qwen3-VL 30B MoE Instruct (FP8), Qwen3-VL 8B Instruct (FP8), Qwen3 235B Thinking (FP8) |
| GLM | GLM 4.5V, GLM 4.6V 106B, GLM 4.6V 9B Flash |
| GPT-OSS | GPT-OSS 120B |
| Kimi | Kimi K2 Instruct, Kimi K2 Thinking |
Each model card shows:
  • GPU minimum — the minimum number of GPUs required.
  • Capability tags — e.g., Think (chain-of-thought reasoning), Tools (function calling), Fast (speculative decoding), Vision (image input), FP8 (quantized weights).
Use the provider filter tabs (All, DeepSeek, Qwen3, GLM, GPT-OSS, Kimi) to browse by family. You can also deploy a fine-tuned model checkpoint directly from the Fine-tuning job detail page using the Deploy button on any epoch checkpoint.

Step 2 — Configuration

Configure the hardware and runtime options for your deployment.
| Field | Description |
| --- | --- |
| Display name | A human-readable name for this deployment. Auto-generated by default (e.g., deepseek-r1-chief-moth-207). |
| Hardware platform | GPU type. Currently NVIDIA H200 (141 GB, $5.99/GPU/hr). |
| GPU count | Number of GPUs per replica: 1, 2, 4, or 8. Some models require a minimum GPU count. |
| Replicas | Number of parallel instances for scaling: 1, 2, 3, or 4. |
| Reasoning parser | Enable structured chain-of-thought output. Recommended for reasoning models (e.g., DeepSeek R1). |
| Tool call parser | Enable function calling and tool use. |
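The option space above is small and fixed: GPU count must be 1, 2, 4, or 8 (and at least the model's minimum), and replicas range from 1 to 4. A sketch of checking a configuration against those constraints before submitting the wizard (the model_gpu_min default is illustrative, not a real model catalog):

```python
# Allowed values, per the Configuration table above.
ALLOWED_GPU_COUNTS = {1, 2, 4, 8}
ALLOWED_REPLICAS = {1, 2, 3, 4}

def validate_config(gpu_count: int, replicas: int,
                    model_gpu_min: int = 1) -> list[str]:
    """Return a list of validation errors; an empty list means the config is valid."""
    errors = []
    if gpu_count not in ALLOWED_GPU_COUNTS:
        errors.append(f"GPU count must be one of {sorted(ALLOWED_GPU_COUNTS)}")
    elif gpu_count < model_gpu_min:
        errors.append(f"this model requires at least {model_gpu_min} GPU(s)")
    if replicas not in ALLOWED_REPLICAS:
        errors.append("replicas must be between 1 and 4")
    return errors
```

For example, a model whose card shows a GPU minimum of 4 would reject a 2-GPU configuration even though 2 is otherwise an allowed count.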

Step 3 — Review & deploy

Review the deployment summary before launching.
| Field | Description |
| --- | --- |
| Name | Display name of the deployment. |
| Model | The model being deployed. |
| Hardware | GPU type selected. |
| GPUs per replica | GPU count chosen in step 2. |
| Replicas | Number of replicas chosen in step 2. |
| Reasoning parser | Whether reasoning output parsing is enabled. |
| Tool calling | Whether tool call parsing is enabled. |
| Estimated cost | Calculated as: GPUs per replica × replicas × price per GPU/hr. |
Click Deploy to provision the endpoint. The deployment will enter Provisioning status.
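The estimated cost is a straight multiplication; for example, 2 GPUs per replica × 2 replicas on H200 at $5.99/GPU/hr comes to $23.96/hr. The arithmetic as a small sketch:

```python
# Per the Hardware platform field in step 2 (USD per GPU per hour).
H200_PRICE_PER_GPU_HR = 5.99

def estimated_cost_per_hour(gpus_per_replica: int, replicas: int,
                            price_per_gpu_hr: float = H200_PRICE_PER_GPU_HR) -> float:
    """Estimated cost = GPUs per replica * replicas * price per GPU/hr."""
    return gpus_per_replica * replicas * price_per_gpu_hr
```

Note the estimate scales with total GPUs, so doubling either the GPU count or the replica count doubles the hourly cost.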

View deployment details

Click any row in the deployments list to expand its details inline.
| Field | Description |
| --- | --- |
| Model name | The unique model identifier to use in API calls. |
| Base model | The base model or checkpoint used. |
| Accelerator | GPU type and count (e.g., H200:1). |
| Replicas | Number of running replicas. |
| Checkpoint | The specific model checkpoint deployed. |

Call a deployment

When a deployment is Ready, an OpenAI-Compatible Endpoint section appears with everything you need to make your first request.

Base URL

https://api-web.eigenai.com/api/deployment/v1

Model name

Each deployment gets a unique model name (e.g., qwen3-vl-8b-instruct-mushy-booby-167-dep-6a6ee693c4). Use this as the model field in your requests. Click the copy icon to copy it.

Authentication

Include your API key in the Authorization header. Click Get API Key to go to the API Keys page.
curl https://api-web.eigenai.com/api/deployment/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "YOUR_DEPLOYMENT_MODEL_NAME",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Replace YOUR_DEPLOYMENT_MODEL_NAME with the model name shown in the deployment’s detail panel.
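The same request can be built from Python. The sketch below only assembles the URL, headers, and JSON body that the curl command sends (it makes no network call), so you can pass the result to requests, httpx, or urllib; the placeholder values are the same ones used above:

```python
import json

BASE_URL = "https://api-web.eigenai.com/api/deployment/v1"

def build_chat_request(api_key: str, model: str, prompt: str):
    """Assemble (url, headers, body) for an OpenAI-compatible chat completion."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body
```

Because the endpoint is OpenAI-compatible, official OpenAI client libraries should also work when pointed at the base URL above with your deployment's model name as the model parameter.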

Terminate a deployment

Click Terminate in the Actions column to shut down a deployment. Billing stops once the status changes to Terminated. This action cannot be undone.