API reference

Chat completions

Generate a model response for a conversation. POST a list of messages and the model returns the next assistant message. This is the primary endpoint and follows the OpenAI chat-completions schema, including streaming, tool calls, and structured outputs.

Create a chat completion

POST https://api.merius.ai/v1/chat/completions

Request
curl https://api.merius.ai/v1/chat/completions \
  -H "Authorization: Bearer $MERIUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-30b-a3b",
    "messages": [
      {"role": "system", "content": "You are concise."},
      {"role": "user", "content": "Name three primary colors."}
    ],
    "temperature": 0.7
  }'
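The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_chat_request` and `send` are hypothetical, the URL is copied from the curl example above, and the API key is a placeholder. Building the request is kept separate from sending it so the payload can be inspected without a network call.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
API_URL = "https://api.merius.ai/v1/chat/completions"


def build_chat_request(api_key, model, messages, **params):
    """Build (but do not send) a POST request for a chat completion."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


request = build_chat_request(
    "YOUR_API_KEY",  # placeholder; read the real key from MERIUS_API_KEY
    "qwen/qwen3-30b-a3b",
    [
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Name three primary colors."},
    ],
    temperature=0.7,
)
# response = send(request)  # uncomment to make the network call
```

Optional parameters are passed as keyword arguments and merged into the JSON body unchanged.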

Request parameters

The request body is JSON. model and messages are required; the rest are optional and share OpenAI’s semantics and defaults.

| Parameter | Type | Description |
| --- | --- | --- |
| model (required) | string | The model slug to call, e.g. qwen/qwen3-30b-a3b. See the Models page for the full list. |
| messages (required) | array | The conversation so far, as objects with a role (system, user, or assistant) and content. |
| temperature | number | Sampling temperature, 0–2. Higher is more random, lower is more focused. Default 1. |
| top_p | number | Nucleus sampling, 0–1. An alternative to temperature; adjust one, not both. Default 1. |
| max_tokens | integer | Upper bound on tokens generated in the completion. Defaults to the model’s remaining context. |
| stream | boolean | Stream the response as server-sent events. Default false. See Streaming. |
| stop | string or array | Up to four sequences where generation stops. The stop text is not included in the output. |
| presence_penalty | number | Between -2 and 2. Positive values push the model toward new topics. Default 0. |
| frequency_penalty | number | Between -2 and 2. Positive values discourage repeating the same tokens. Default 0. |
| seed | integer | Best-effort determinism: the same seed and parameters return a similar result where supported. |
| tools | array | Function definitions the model may call. See Function calling. |
| response_format | object | Request JSON or a JSON schema for structured outputs. See Structured outputs. |
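The documented ranges can be checked on the client before a request is sent. The sketch below is illustrative only: `validate_request` is a hypothetical helper, it covers only the required fields and numeric bounds listed in the table, and the server may apply different or additional checks.

```python
def validate_request(body):
    """Return a list of problems found in a chat-completion request body."""
    problems = []
    for name in ("model", "messages"):
        if name not in body:
            problems.append(f"{name} is required")

    # (low, high) bounds copied from the parameter table.
    ranges = {
        "temperature": (0, 2),
        "top_p": (0, 1),
        "presence_penalty": (-2, 2),
        "frequency_penalty": (-2, 2),
    }
    for name, (low, high) in ranges.items():
        value = body.get(name)
        if value is not None and not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")

    # stop may be a single string or a list of up to four strings.
    stop = body.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most four sequences")
    return problems
```

An empty list means the body passed these local checks; it does not guarantee the API will accept the request.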

The response object

A non-streaming request returns a single chat-completion object. The generated text is in choices[0].message.content; usage reports token counts for the call.

Response
{
  "id": "chatcmpl-9f3a…",
  "object": "chat.completion",
  "created": 1768000000,
  "model": "qwen/qwen3-30b-a3b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Red, blue, and yellow."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 18, "completion_tokens": 7, "total_tokens": 25}
}
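Reading the reply and token counts out of a decoded response might look like the following sketch. The `response` dict mirrors the sample above, and `read_completion` is a hypothetical helper name.

```python
# A decoded response with the same shape as the sample above.
response = {
    "id": "chatcmpl-9f3a…",
    "object": "chat.completion",
    "created": 1768000000,
    "model": "qwen/qwen3-30b-a3b",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Red, blue, and yellow."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 18, "completion_tokens": 7, "total_tokens": 25},
}


def read_completion(resp):
    """Return the first choice's text and the usage dict from a response."""
    text = resp["choices"][0]["message"]["content"]
    return text, resp["usage"]


text, usage = read_completion(response)
print(text)
print(usage["total_tokens"], "tokens used")
```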

Response fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier for the completion. |
| object | string | Always chat.completion (or chat.completion.chunk when streaming). |
| created | integer | Unix timestamp (seconds) of when the completion was created. |
| model | string | The model that produced the response. |
| choices | array | The generated choices. Each has an index, a message, and a finish_reason. |
| finish_reason | string | Why generation stopped: stop, length, tool_calls, or content_filter. |
| usage | object | Token counts: prompt_tokens, completion_tokens, and total_tokens. |
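Callers usually want to branch on finish_reason, for example to detect a reply cut short by max_tokens. The sketch below is a rough guide rather than a specification: the descriptions assume the values carry their usual OpenAI meanings, and `FINISH_REASONS`, `is_complete`, and `explain_finish` are hypothetical names.

```python
# Meanings assume OpenAI semantics; confirm them against this API's behavior.
FINISH_REASONS = {
    "stop": "the model finished naturally or hit a stop sequence",
    "length": "output was cut off by max_tokens or the context limit",
    "tool_calls": "the model is asking the caller to run a tool",
    "content_filter": "output was withheld or cut by a content filter",
}


def is_complete(choice):
    """True only when the choice ended with finish_reason 'stop'."""
    return choice["finish_reason"] == "stop"


def explain_finish(choice):
    """Return a human-readable explanation of a choice's finish_reason."""
    reason = choice["finish_reason"]
    return FINISH_REASONS.get(reason, f"unrecognized finish_reason: {reason}")
```

A choice that ends with length is a candidate for retrying with a larger max_tokens or for asking the model to continue.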

When you set stream: true, the response arrives as a series of chat.completion.chunk events instead of a single object, with each increment of text under choices[0].delta.content. See Streaming.
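Reassembling streamed text can be sketched as follows. The wire format is an assumption: the code expects OpenAI-style server-sent events, where each event is a `data:` line holding one JSON chunk and the stream ends with a `data: [DONE]` line. The Streaming page is the authority on the actual format, and `collect_stream_text` is a hypothetical helper.

```python
import json


def collect_stream_text(lines):
    """Join the delta.content pieces from an iterable of SSE lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style end-of-stream marker
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) == 0:
                # Some chunks carry no text, e.g. one that only sets the role.
                parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```

The function accepts any iterable of decoded text lines, so it works with a line-by-line read of the HTTP response body as well as with a list of captured lines.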