
Unified API (OpenAI compat)

TIE AI Gateway offers an OpenAI-compatible /v1/chat/completions endpoint, enabling integration with multiple AI providers using a single URL. This is a drop-in replacement for OpenAI, Cloudflare AI Gateway, or LiteLLM — change the base URL and it works.

All requests require a Bearer token in the Authorization header. See Ways to authenticate for how to get one.

| Caller | Method | Details |
| --- | --- | --- |
| Browser / mobile app | User tokens | Sign in via TIE Auth, pass the JWT |
| Backend service | Service accounts | Use AUTH_SECRET, optionally with X-On-Behalf-Of for user context |
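A minimal sketch of assembling headers for the two caller types above. The header names (Authorization, X-On-Behalf-Of) come from this page; the token values are placeholders:

```typescript
// Build request headers for either a user-token caller or a service-account caller.
type CallerAuth =
  | { kind: "user"; jwt: string }
  | { kind: "service"; authSecret: string; onBehalfOf?: string };

function buildHeaders(auth: CallerAuth): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
  };
  if (auth.kind === "user") {
    // Browser / mobile app: pass the JWT from TIE Auth.
    headers["Authorization"] = `Bearer ${auth.jwt}`;
  } else {
    // Backend service: AUTH_SECRET, optionally acting on behalf of a user.
    headers["Authorization"] = `Bearer ${auth.authSecret}`;
    if (auth.onBehalfOf) headers["X-On-Behalf-Of"] = auth.onBehalfOf;
  }
  return headers;
}
```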
POST /v1/chat/completions

Switch providers by changing the model parameter. For example: anthropic/claude-sonnet-4-5, openai/gpt-5.1, vertexai/gemini-2.5-flash.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | required | Model to use (e.g. anthropic/claude-sonnet-4-5, openai/gpt-5.1) |
| messages | array | required | Conversation messages in OpenAI format. Each message's content can be a plain string or an array of content parts (text and images). If the array includes a system or developer message, it replaces the agent's built-in prompt for that request (memory and personas are still injected). |
| tools | array | null | Tool definitions in OpenAI function calling format |
| stream | boolean | false | Enable SSE streaming |
| stream_options | object | null | Streaming options. Set {"include_usage": true} to receive a final chunk with token usage (only applies when stream is true). |
| temperature | float | null | Sampling temperature |
| max_tokens | integer | null | Maximum tokens to generate |
| top_p | float | null | Nucleus sampling parameter |
| stop | string/array | null | Stop sequences |
| thread_id | string | null | Conversation thread for state persistence (see Threads) |
| persona_id | string | null | Persona to use for this request. Falls back to the user's active persona if omitted. |
| metadata | object | null | Up to 16 key-value string pairs attached to this request. Persisted with the thread and returned in history. Echoed in the first SSE chunk when streaming. See Metadata. |
| skill_instructions | string | null | Caller-composed skill prompt. Injected after persona, before memory context. Purely additive — never replaces persona or base instructions. See Skill Instructions. |

If you want to replace the agent's built-in prompt with your own instructions, send a system or developer message in the messages array.

When TIE sees one of those roles, your message replaces the agent's base instructions for that request:

  • your system or developer message becomes the top-level instruction instead of the agent's built-in prompt
  • TIE still injects persistent memory context so the model knows what it remembers about the user
  • TIE still injects the active persona (if any)
  • thread history and turn recording still work normally
  • client-defined tools in the tools parameter still work

This is useful when you want full prompt control for a specific request while keeping TIE's memory and conversation features.

curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "openai/gpt-5.1",
    "messages": [
      {"role": "developer", "content": "You are a concise support assistant. Answer in 3 bullet points max."},
      {"role": "user", "content": "How do I rotate an API key?"}
    ]
  }'
The same endpoint works with the official OpenAI SDK by pointing baseURL at your TIE host:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.TOKEN, // your TIE bearer token
  baseURL: "https://your-tie-host/v1",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5",
  messages: [{ role: "user", content: "What is the capital of France?" }],
});

console.log(response.choices[0].message.content);
The response follows the OpenAI chat completion format:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1711234567,
  "model": "anthropic/claude-sonnet-4-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20,
    "cache_read_tokens": 0,
    "cache_creation_tokens": 0
  },
  "metadata": null
}

If metadata was included in the request, it is echoed back here.

Every time you call the AI, it counts the pieces of text (called tokens) that go in and out. You pay your provider for these tokens, so the usage object is how you know what just got spent.

Think of it like a taxi meter: prompt_tokens is how far you rode in, completion_tokens is how far the AI drove back, and total_tokens is the full fare.

| Field | What it means (in plain English) |
| --- | --- |
| prompt_tokens | How much you sent in (your messages + system prompt + memory + persona). Bigger question = bigger number. |
| completion_tokens | How much the AI wrote back. A one-word answer is tiny. A long essay is big. |
| total_tokens | Simply prompt_tokens + completion_tokens. The whole bill. |
| cache_read_tokens | Of your prompt_tokens, how many were reused from a cached copy the provider had already saved. These are WAY cheaper — Anthropic charges up to 10× less for cache reads. |
| cache_creation_tokens | How many tokens the provider saved into cache for next time. Slightly pricier than a normal token, but it makes future calls much cheaper. |
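To turn a usage object into a dollar amount, price each bucket separately, since cache reads bill at a different rate than fresh input. A sketch with placeholder per-million-token prices (the rates below are made up for illustration; substitute your provider's real price sheet):

```typescript
// Shape of the usage object returned by TIE (see the table above).
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  cache_read_tokens: number;
  cache_creation_tokens: number;
}

// Hypothetical prices in USD per million tokens -- replace with real rates.
const PRICES = {
  input: 3.0,       // uncached prompt tokens
  cacheRead: 0.3,   // cached prompt tokens (roughly 10x cheaper, per the note above)
  cacheWrite: 3.75, // cache_creation_tokens (slightly pricier than normal input)
  output: 15.0,     // completion tokens
};

function estimateCostUSD(u: Usage): number {
  // cache_read_tokens are a subset of prompt_tokens, so subtract them
  // before billing the remainder at the full input rate.
  const uncachedInput = u.prompt_tokens - u.cache_read_tokens;
  const per = (n: number, pricePerMillion: number) => (n * pricePerMillion) / 1e6;
  return (
    per(uncachedInput, PRICES.input) +
    per(u.cache_read_tokens, PRICES.cacheRead) +
    per(u.cache_creation_tokens, PRICES.cacheWrite) +
    per(u.completion_tokens, PRICES.output)
  );
}
```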

You can send images to vision-capable models by using the content parts format instead of a plain string for content. This follows the OpenAI Vision API format.

Each message's content can be either:

  • A string (text-only, the default)
  • An array of content parts mixing text and images
| Type | Fields | Description |
| --- | --- | --- |
| text | text | A text segment |
| image_url | image_url.url, image_url.detail | An image, either as a direct URL or base64 data URI |

The detail parameter controls image resolution processing:

| Value | Description |
| --- | --- |
| auto | Let the model decide (default) |
| low | Faster, lower resolution, fewer tokens |
| high | Higher resolution, more tokens |
With an image URL:
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What is in this image?" },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/photo.jpg",
              "detail": "auto"
            }
          }
        ]
      }
    ]
  }'
With a base64 data URI:
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image." },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,/9j/4AAQSkZJRg...",
              "detail": "low"
            }
          }
        ]
      }
    ]
  }'

Pass a thread_id to maintain conversation state across requests. TIE uses it to persist the message history so the LLM has context from previous turns.

  • If omitted, TIE auto-generates a UUID for the thread. Each request starts a new conversation.
  • If provided, TIE resumes the conversation from where it left off. Use any string (e.g. a UUID you generate client-side).

See Threads for the full API reference (list, rename, delete, history).
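A common pattern is generating the thread ID client-side once and attaching it to every turn. A sketch (the request shape follows the parameters table above; only thread_id handling is the point here):

```typescript
import { randomUUID } from "node:crypto";

interface ChatRequest {
  model: string;
  messages: { role: string; content: string }[];
  thread_id?: string;
}

// Create a conversation handle whose turns all share one thread_id,
// so TIE resumes the same thread on every request.
function newConversation(model: string): {
  threadId: string;
  turn: (text: string) => ChatRequest;
} {
  const threadId = randomUUID(); // any stable string works
  return {
    threadId,
    turn: (text) => ({
      model,
      messages: [{ role: "user", content: text }],
      thread_id: threadId,
    }),
  };
}
```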

When streaming (stream: true), token usage is available by setting stream_options.include_usage to true.

curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "stream": true,
    "stream_options": {"include_usage": true},
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

When enabled, TIE emits a final SSE chunk after the finish_reason chunk with an empty choices array and the accumulated usage:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711234567,"model":"anthropic/claude-sonnet-4-5","choices":[],"usage":{"prompt_tokens":9,"completion_tokens":12,"total_tokens":21,"cache_read_tokens":0,"cache_creation_tokens":0}}
data: [DONE]

The final usage chunk includes the same fields as the non-streaming response, including cache_read_tokens and cache_creation_tokens. Numbers are accumulated across every internal step (tool calls, memory lookups, etc.) so you get one clean bill for the whole request. See Understanding the usage object for what each field means.

If stream_options is omitted or include_usage is false, no usage chunk is emitted.
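Client-side, the usage chunk is easy to pick out: it is the only chunk with an empty choices array and a usage object. A parsing sketch over a buffered SSE body, assuming the chunk shapes shown above:

```typescript
interface StreamChunk {
  choices: unknown[];
  usage?: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
    cache_read_tokens: number;
    cache_creation_tokens: number;
  };
}

// Scan "data: ..." lines and return the accumulated usage, or null
// if no usage chunk was emitted (include_usage off or omitted).
function extractUsage(sseBody: string): StreamChunk["usage"] | null {
  for (const line of sseBody.split("\n")) {
    if (!line.startsWith("data: ") || line === "data: [DONE]") continue;
    const chunk: StreamChunk = JSON.parse(line.slice("data: ".length));
    if (chunk.choices.length === 0 && chunk.usage) return chunk.usage;
  }
  return null;
}
```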

TIE supports two types of tools:

  • Client tools — You define them in the tools parameter. TIE returns tool_calls for your app to execute locally and send results back.
  • Internal tools — memory_search and memory_write run server-side, invisible to the client. See Memory.
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "What tasks do I have today?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "list_tasks",
          "description": "List user tasks filtered by status",
          "parameters": {
            "type": "object",
            "properties": {
              "status": {
                "type": "string",
                "enum": ["INBOX", "NEXT_UP", "IN_PROGRESS", "WAITING"]
              }
            }
          }
        }
      }
    ]
  }'

When the model wants to call a tool, finish_reason is "tool_calls":

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_xyz789",
            "type": "function",
            "function": {
              "name": "list_tasks",
              "arguments": "{\"status\": \"INBOX\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

Execute the tool locally, then send the result back:

curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "What tasks do I have today?"},
      {"role": "assistant", "content": null, "tool_calls": [{"id": "call_xyz789", "type": "function", "function": {"name": "list_tasks", "arguments": "{\"status\": \"INBOX\"}"}}]},
      {"role": "tool", "tool_call_id": "call_xyz789", "content": "[{\"title\": \"Buy groceries\"}, {\"title\": \"Review PR #42\"}]"}
    ],
    "tools": [...]
  }'

The model may request multiple rounds of tool calls. Keep sending results until finish_reason is "stop".
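The loop above can be sketched as follows. callGateway stands in for your HTTP client calling /v1/chat/completions, and the handler map is your local tool implementations; both are illustrative, not part of TIE:

```typescript
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}
interface Message {
  role: string;
  content: string | null;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
}
interface Choice { message: Message; finish_reason: string }

type Gateway = (messages: Message[]) => Promise<Choice>;
type ToolHandlers = Record<string, (args: unknown) => Promise<string>>;

// Keep calling the gateway until it stops asking for tools.
async function runToolLoop(
  callGateway: Gateway,
  handlers: ToolHandlers,
  messages: Message[],
): Promise<Message> {
  for (;;) {
    const { message, finish_reason } = await callGateway(messages);
    messages.push(message);
    if (finish_reason !== "tool_calls") return message; // e.g. "stop"
    // Execute each requested tool locally and append a tool-result message.
    for (const call of message.tool_calls ?? []) {
      const result = await handlers[call.function.name](JSON.parse(call.function.arguments));
      messages.push({ role: "tool", tool_call_id: call.id, content: result });
    }
  }
}
```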

Attach arbitrary key-value string pairs to a request using the root-level metadata parameter. This is useful for storing client context (source app, user location, image URLs, UI hints) alongside the conversation without changing the message schema.

Constraints:

| Limit | Value |
| --- | --- |
| Max keys per request | 16 |
| Max key length | 64 characters |
| Max value length | 512 characters |
| Value type | string only |

Example:

curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "Describe this image"}
    ],
    "metadata": {
      "session_id": "8b94776f-31ae-49ac-9f06-3be137d69186",
      "platform": "ios",
      "app_version": "2.4.1"
    }
  }'

Metadata is echoed back in the response and persisted with the thread. When you retrieve thread history, metadata appears on the user message it was attached to.
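A client can check the constraints table above before sending, so a bad payload fails fast instead of round-tripping to the gateway. A validation sketch:

```typescript
// Returns a list of violations; an empty array means the metadata is within limits.
function validateMetadata(metadata: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const entries = Object.entries(metadata);
  if (entries.length > 16) errors.push(`too many keys: ${entries.length} > 16`);
  for (const [key, value] of entries) {
    if (key.length > 64) errors.push(`key exceeds 64 characters: "${key.slice(0, 20)}..."`);
    if (typeof value !== "string") {
      errors.push(`value for "${key}" must be a string`);
    } else if (value.length > 512) {
      errors.push(`value for "${key}" exceeds 512 characters`);
    }
  }
  return errors;
}
```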

When streaming, metadata is included in the first SSE chunk so clients can act on it immediately (e.g. render skill-specific UI before content arrives):

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711234567,"model":"anthropic/claude-sonnet-4-5","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}],"metadata":{"active_skill":"menu","super_skill":"keto"}}

Subsequent chunks do not contain metadata.

Use skill_instructions to inject caller-composed skill prompts into the LLM system prompt. This is how app backends send skill-specific behavior to TIE without TIE needing to know about the app's skill definitions.

TIE inserts the text at a fixed position in the prompt stack:

┌─────────────────────────┐
│ Agent base instructions │ ← TIE owns (e.g. Eliza personality, safety rules)
├─────────────────────────┤
│ Persona                 │ ← TIE's persona system
├─────────────────────────┤
│ skill_instructions      │ ← Your text goes here
├─────────────────────────┤
│ Memory context          │ ← TIE adds from the user's knowledge graph
├─────────────────────────┤
│ Conversation history    │ ← Messages
└─────────────────────────┘

Key behavior:

  • Additive — never replaces the agent's base instructions or persona
  • Optional — if omitted, behavior is unchanged (persona + memory as before)
  • TIE does not interpret, cache, or fetch skill_instructions — it only appends the text
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "What should I order from this Italian restaurant?"}
    ],
    "skill_instructions": "You are in Menu Analysis mode. Break down each menu item by calories, protein, carbs, and fat. Apply the Keto lens: flag items exceeding 20g net carbs.",
    "metadata": {
      "active_skill": "menu",
      "super_skill": "keto"
    }
  }'

In this example, the LLM sees Eliza's personality and safety rules, then the menu analysis instructions, then any memories about the user (e.g. allergies). The metadata is echoed back to the client but not shown to the LLM.

By default, requests use the chatbot agent. To use a different agent, pass the X-Agent-Id header:

curl -X POST https://your-tie-host/v1/chat/completions \
  -H "X-Agent-Id: research-assistant" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"model": "anthropic/claude-sonnet-4-5", "messages": [...]}'

If the request includes a system or developer message, that message replaces the agent's built-in prompt for that request. Memory and personas are still injected.

To scope threads to a specific application, pass the X-App-Id header. Threads sent with an app ID are isolated — they won't appear in other apps' thread lists, and can only be resumed, renamed, or deleted with the same X-App-Id.

curl -X POST https://your-tie-host/v1/chat/completions \
  -H "X-App-Id: vibrantly" \
  -H "X-Agent-Id: chatbot" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"model": "anthropic/claude-sonnet-4-5", "messages": [...], "thread_id": "my-thread"}'

When X-App-Id is omitted, threads are unscoped and visible to all apps. See Threads — Per-App Thread Isolation for details.

| Agent | Description | Memory | Safety |
| --- | --- | --- | --- |
| chatbot | General-purpose chatbot | shared | none |
| eliza | AI wellness coach — nutrition guidance, meal logging, supplement tracking | shared | none |
| research-assistant | Web search and calculator | shared | LlamaGuard |
| rag-assistant | Database search (RAG) | isolated | LlamaGuard |
| command-agent | Command execution | none | none |
| bg-task-agent | Background task processing | none | none |
  • Memory: shared — memories are accessible across all agents that use the shared pool. Memory: isolated — memories are scoped to that specific agent.
  • Safety: LlamaGuard — user inputs and agent outputs are checked against Meta's Llama Guard content safety classifier before processing. Requests flagged as unsafe are rejected.

Memory scope is configured per agent — clients cannot override it. Query GET /info to see all available agents on your instance.

Query GET /info to see which models are currently available on your instance.