POST /api/v1/chat/completions
Content-Type: application/json
Authentication
Send your API key in the Authorization header as a Bearer token.
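A minimal sketch of building an authenticated request with the Python standard library. The base URL and API key below are placeholders, not real values; substitute your provider's endpoint and read the key from an environment variable in practice.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "sk-..."                    # placeholder; use an env var in practice

def build_request(payload: dict) -> urllib.request.Request:
    """Build a POST request with the API key sent as a Bearer token."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_request({"model": "gpt-oss-120b", "messages": []})
```

Sending the request is then a call to `urllib.request.urlopen(req)` (or any HTTP client of your choice).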
Parameters
Common
| Name | Type | Required | Description |
|---|---|---|---|
| model | string | Required | The model ID used to generate the response, like gpt-oss-120b. Find supported models in the Model Library. |
| messages | array | Required | A list of messages comprising the conversation so far. Depending on the model you use, different message types (modalities) are supported, like text, images, and video. |
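A minimal request body using only the two required parameters. The model ID is the example from the table above; the role/content message shape follows common chat-completions conventions.

```python
import json

# Minimal body: only the required "model" and "messages" parameters.
payload = {
    "model": "gpt-oss-120b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize nucleus sampling in one sentence."},
    ],
}

# Serialize for the application/json request body.
body = json.dumps(payload)
```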
Conditional
The following parameters are not supported by every model. Check the Model Library for model-specific compatibility.
Generation Controls
Common tuning knobs for output length and randomness (availability varies by model).
| Name | Type | Required | Description |
|---|---|---|---|
| temperature | number | Optional | What sampling temperature to use, between 0 and 2 (defaults to 1). Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both. |
| max_tokens | integer | Optional | The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API. |
| top_p | number | Optional | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Defaults to 1. We generally recommend altering this or temperature but not both. |
| top_k | integer | Optional | Top-k sampling filters the k most probable next tokens and redistributes the probability mass among only those k tokens. The value of k controls the number of candidates for the next token at each step during generation. Must be between 0 and 100. |
| min_p | number | Optional | Minimum probability threshold for token selection. Only tokens with probability >= min_p are considered. An alternative to top_p and top_k sampling. Must be between 0 and 1. |
| repetition_penalty | number | Optional | Applies a penalty to repeated tokens to discourage or encourage repetition. A value of 1.0 means no penalty, allowing free repetition. Values above 1.0 penalize repetition, reducing the likelihood of repeating tokens; values between 0.0 and 1.0 reward repetition, increasing the chance of repeated tokens. A value of 1.2 often gives a good balance. Must be between 0 and 100. |
| reasoning_effort | string | Optional | Constrains effort on reasoning for reasoning models. Currently supported values are none, minimal, low, medium, high, and xhigh. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response. Defaults to medium. |
| separate_reasoning | boolean | Optional | Emit structured reasoning separately from the final answer. |
| chat_template_kwargs.thinking | boolean | Optional | Set to true to request chain-of-thought output. |
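A sketch of a request body that tunes the generation controls above. Per the guidance in the table, temperature and top_p should not usually be set together, so only temperature is used here; remember that each knob's availability varies by model.

```python
# Request body exercising several generation controls from the table above.
payload = {
    "model": "gpt-oss-120b",
    "messages": [{"role": "user", "content": "Write a haiku about APIs."}],
    "temperature": 0.2,         # low -> more focused, deterministic output
    "max_tokens": 64,           # cap on generated tokens (cost control)
    "repetition_penalty": 1.2,  # > 1.0 discourages repeated tokens
    "reasoning_effort": "low",  # reasoning models only; defaults to "medium"
}

# Client-side sanity checks mirroring the documented ranges.
assert 0 <= payload["temperature"] <= 2
assert 0 < payload["repetition_penalty"] <= 100
```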
Streaming
Receive partial outputs incrementally.
| Name | Type | Required | Description |
|---|---|---|---|
| stream | boolean | Optional | Set to true to stream output via server-sent events (SSE). Defaults to false. |
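When stream is true, partial outputs arrive as SSE events. The sketch below parses `data:` lines into JSON chunks; the chunk shape (`choices[].delta.content`) and the `[DONE]` sentinel are assumed from common chat-completions streaming conventions and are not specified in this document.

```python
import json

def iter_chunks(lines):
    """Yield parsed JSON chunks from raw SSE lines ('data: {...}')."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # conventional end-of-stream sentinel
            break
        yield json.loads(data)

# Canned lines standing in for a live streaming response:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in iter_chunks(sample))
```

In a real client, the lines would come from iterating over the HTTP response body instead of a canned list.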
Vision Input
Some models accept mixed text + image inputs by using an array for content on a message.
| Name | Type | Required | Description |
|---|---|---|---|
| messages[].content | string\|array | Optional | For vision requests, content can be an array of parts like { type: "video_url" } and { type: "image_url" }. |
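A sketch of a vision-style message where content is an array of parts. The part types follow the table above; the exact nesting of the image part ({"image_url": {"url": ...}}) is an assumed convention, and the image URL is a placeholder.

```python
# Mixed text + image message: content is an array of typed parts.
payload = {
    "model": "gpt-oss-120b",  # example ID; check the Model Library for vision support
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                # Assumed part shape; the URL below is a placeholder.
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }
    ],
}
```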