Unified API (OpenAI compat)
TIE AI Gateway offers an OpenAI-compatible /v1/chat/completions endpoint, enabling integration with multiple AI providers through a single URL. It is a drop-in replacement for OpenAI, Cloudflare AI Gateway, or LiteLLM: change the base URL and existing clients work unchanged.
Authentication
All requests require a Bearer token in the Authorization header. See Ways to authenticate for how to get one.
| Caller | Method | Details |
|---|---|---|
| Browser / mobile app | User tokens | Sign in via TIE Auth, pass the JWT |
| Backend service | Service accounts | Use AUTH_SECRET, optionally with X-On-Behalf-Of for user context |
Endpoint URL
POST /v1/chat/completions

Switch providers by changing the model parameter. For example: anthropic/claude-sonnet-4-5, openai/gpt-5.1, vertexai/gemini-2.5-flash.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | required | Model to use (e.g. anthropic/claude-sonnet-4-5, openai/gpt-5.1) |
| messages | array | required | Conversation messages in OpenAI format. Each message's content can be a plain string or an array of content parts (text and images). If the array includes a system or developer message, it replaces the agent's built-in prompt for that request (memory and personas are still injected). |
| tools | array | null | Tool definitions in OpenAI function calling format |
| stream | boolean | false | Enable SSE streaming |
| stream_options | object | null | Streaming options. Set {"include_usage": true} to receive a final chunk with token usage (only applies when stream is true). |
| temperature | float | null | Sampling temperature |
| max_tokens | integer | null | Maximum tokens to generate |
| top_p | float | null | Nucleus sampling parameter |
| stop | string/array | null | Stop sequences |
| thread_id | string | null | Conversation thread for state persistence (see Threads) |
| persona_id | string | null | Persona to use for this request. Falls back to the user's active persona if omitted. |
| metadata | object | null | Up to 16 key-value string pairs attached to this request. Persisted with the thread and returned in history. Echoed in the first SSE chunk when streaming. See Metadata. |
| skill_instructions | string | null | Caller-composed skill prompt. Injected after the persona and before memory context. Purely additive; never replaces persona or base instructions. See Skill Instructions. |
Custom Prompt Mode
If you want to replace the agent's built-in prompt with your own instructions, send a system or developer message in the messages array.
When TIE sees one of those roles, your message replaces the agent's base instructions for that request:
- your system or developer message becomes the top-level instruction instead of the agent's built-in prompt
- TIE still injects persistent memory context so the model knows what it remembers about the user
- TIE still injects the active persona (if any)
- thread history and turn recording still work normally
- client-defined tools in the tools parameter still work
This is useful when you want full prompt control for a specific request while keeping TIE's memory and conversation features.
Example: caller-owned prompt
```bash
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "openai/gpt-5.1",
    "messages": [
      {"role": "developer", "content": "You are a concise support assistant. Answer in 3 bullet points max."},
      {"role": "user", "content": "How do I rotate an API key?"}
    ]
  }'
```

Examples
Response
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1711234567,
  "model": "anthropic/claude-sonnet-4-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20,
    "cache_read_tokens": 0,
    "cache_creation_tokens": 0
  },
  "metadata": null
}
```

If metadata was included in the request, it is echoed back here.
Understanding the usage object
Every time you call the AI, it counts the pieces of text (called tokens) that go in and out. You pay your provider for these tokens, so the usage object is how you know what just got spent.
Think of it like a taxi meter: prompt_tokens is how far you rode in, completion_tokens is how far the AI drove back, and total_tokens is the full fare.
| Field | What it means (in plain English) |
|---|---|
| prompt_tokens | How much you sent in (your messages + system prompt + memory + persona). Bigger question = bigger number. |
| completion_tokens | How much the AI wrote back. A one-word answer is tiny. A long essay is big. |
| total_tokens | Simply prompt_tokens + completion_tokens. The whole bill. |
| cache_read_tokens | Of your prompt_tokens, how many were reused from a cached copy the provider had already saved. These are far cheaper: Anthropic, for example, charges up to 10× less for cache reads. |
| cache_creation_tokens | How many tokens the provider saved into cache for next time. Slightly pricier than a normal token, but it makes future calls much cheaper. |
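As a rough illustration, these fields can be turned into a client-side cost estimate. The per-million-token prices below are placeholders, not TIE's or any provider's actual rates, and the assumption that cache writes are billed on top of the non-cached prompt tokens varies by provider:

```python
# Hypothetical USD prices per 1M tokens -- substitute your provider's real rates.
PRICES = {
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,    # cache reads billed at a steep discount
    "cache_write": 3.75,   # cache writes assumed slightly pricier than input
}

def estimate_cost(usage: dict) -> float:
    """Price one response's usage object, splitting cached reads
    (a subset of prompt_tokens) out of the fresh input."""
    cached = usage.get("cache_read_tokens", 0)
    fresh_input = usage["prompt_tokens"] - cached
    cache_write = usage.get("cache_creation_tokens", 0)
    cost = (
        fresh_input * PRICES["input"]
        + usage["completion_tokens"] * PRICES["output"]
        + cached * PRICES["cache_read"]
        + cache_write * PRICES["cache_write"]
    )
    return cost / 1_000_000

usage = {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20,
         "cache_read_tokens": 0, "cache_creation_tokens": 0}
print(f"${estimate_cost(usage):.6f}")  # → $0.000156
```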
Vision (Image Input)
You can send images to vision-capable models by using the content parts format instead of a plain string for content. This follows the OpenAI Vision API format.
Each message's content can be either:
- A string (text-only, the default)
- An array of content parts mixing text and images
Content part types
| Type | Fields | Description |
|---|---|---|
| text | text | A text segment |
| image_url | image_url.url, image_url.detail | An image, either as a direct URL or a base64 data URI |
The detail parameter controls image resolution processing:
| Value | Description |
|---|---|
| auto | Let the model decide (default) |
| low | Faster, lower resolution, fewer tokens |
| high | Higher resolution, more tokens |
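A content-parts message can be assembled with a small helper. `build_image_message` is a hypothetical client-side function, not part of TIE or any SDK:

```python
# Sketch: building a vision message in the content-parts format.
def build_image_message(text: str, image_url: str, detail: str = "auto") -> dict:
    """Return one user message mixing a text part and an image part.
    image_url may be a direct URL or a base64 data URI."""
    if detail not in ("auto", "low", "high"):
        raise ValueError(f"invalid detail: {detail!r}")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url",
             "image_url": {"url": image_url, "detail": detail}},
        ],
    }

msg = build_image_message("What is in this image?",
                          "https://example.com/photo.jpg")
```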
Sending an image URL
```bash
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What is in this image?" },
          { "type": "image_url", "image_url": { "url": "https://example.com/photo.jpg", "detail": "auto" } }
        ]
      }
    ]
  }'
```

Sending a base64-encoded image
```bash
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image." },
          { "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,/9j/4AAQSkZJRg...", "detail": "low" } }
        ]
      }
    ]
  }'
```

Threads
Pass a thread_id to maintain conversation state across requests. TIE uses it to persist the message history so the LLM has context from previous turns.
- If omitted, TIE auto-generates a UUID for the thread. Each request starts a new conversation.
- If provided, TIE resumes the conversation from where it left off. Use any string (e.g. a UUID you generate client-side).
See Threads for the full API reference (list, rename, delete, history).
Streaming Usage
Section titled “Streaming Usage”When streaming (stream: true), token usage is available by setting stream_options.include_usage to true.
```bash
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "stream": true,
    "stream_options": {"include_usage": true},
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

When enabled, TIE emits a final SSE chunk after the finish_reason chunk with an empty choices array and the accumulated usage:
```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711234567,"model":"anthropic/claude-sonnet-4-5","choices":[],"usage":{"prompt_tokens":9,"completion_tokens":12,"total_tokens":21,"cache_read_tokens":0,"cache_creation_tokens":0}}

data: [DONE]
```

The final usage chunk includes the same fields as the non-streaming response, including cache_read_tokens and cache_creation_tokens. Usage is accumulated across every internal step (tool calls, memory lookups, etc.), so you get one clean bill for the whole request. See Understanding the usage object for what each field means.
If stream_options is omitted or include_usage is false, no usage chunk is emitted.
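Client-side, the usage chunk can be picked out of the stream by looking for the chunk whose choices array is empty. A minimal sketch, where the `lines` iterable stands in for whatever your HTTP client yields per SSE line:

```python
import json

def extract_usage(lines):
    """Return the usage dict from the final SSE chunk, or None if
    stream_options.include_usage was not set on the request."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank separators
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # the usage chunk has an empty choices array and a usage field
        if not chunk.get("choices") and chunk.get("usage"):
            return chunk["usage"]
    return None

lines = [
    'data: {"choices":[{"index":0,"delta":{"content":"Hi"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    'data: {"choices":[],"usage":{"prompt_tokens":9,"completion_tokens":12,"total_tokens":21}}',
    "data: [DONE]",
]
usage = extract_usage(lines)
```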
Tool Calling
TIE supports two types of tools:
- Client tools: you define them in the tools parameter. TIE returns tool_calls for your app to execute locally and send results back.
- Internal tools: memory_search and memory_write run server-side, invisible to the client. See Memory.
Step 1: Send message with tools
```bash
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "What tasks do I have today?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "list_tasks",
          "description": "List user tasks filtered by status",
          "parameters": {
            "type": "object",
            "properties": {
              "status": {
                "type": "string",
                "enum": ["INBOX", "NEXT_UP", "IN_PROGRESS", "WAITING"]
              }
            }
          }
        }
      }
    ]
  }'
```

Step 2: Response with tool_calls
When the model wants to call a tool, finish_reason is "tool_calls":
```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_xyz789",
            "type": "function",
            "function": {
              "name": "list_tasks",
              "arguments": "{\"status\": \"INBOX\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
```

Step 3: Send tool results
Execute the tool locally, then send the result back:
```bash
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "What tasks do I have today?"},
      {"role": "assistant", "content": null, "tool_calls": [{"id": "call_xyz789", "type": "function", "function": {"name": "list_tasks", "arguments": "{\"status\": \"INBOX\"}"}}]},
      {"role": "tool", "tool_call_id": "call_xyz789", "content": "[{\"title\": \"Buy groceries\"}, {\"title\": \"Review PR #42\"}]"}
    ],
    "tools": [...]
  }'
```

The model may request multiple rounds of tool calls. Keep sending results until finish_reason is "stop".
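The send/execute/respond cycle above can be sketched as a client-side loop. `chat` is a stand-in for your HTTP call to /v1/chat/completions; here it is stubbed with canned replies so the sketch is self-contained:

```python
import json

def run_tool_loop(chat, messages, tools, execute, max_rounds=5):
    """Keep answering tool_calls until finish_reason is 'stop'.
    execute(name, args) runs one tool locally and returns its result."""
    for _ in range(max_rounds):
        choice = chat(messages, tools)["choices"][0]
        msg = choice["message"]
        messages.append(msg)
        if choice["finish_reason"] != "tool_calls":
            return msg["content"]
        for call in msg["tool_calls"]:
            args = json.loads(call["function"]["arguments"])
            result = execute(call["function"]["name"], args)
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    raise RuntimeError("tool loop did not converge")

# Stub provider: one round of tool calls, then a final answer.
replies = iter([
    {"choices": [{"message": {"role": "assistant", "content": None,
        "tool_calls": [{"id": "call_1", "type": "function",
            "function": {"name": "list_tasks",
                         "arguments": '{"status": "INBOX"}'}}]},
        "finish_reason": "tool_calls"}]},
    {"choices": [{"message": {"role": "assistant",
        "content": "You have 2 tasks."}, "finish_reason": "stop"}]},
])
answer = run_tool_loop(lambda m, t: next(replies),
                       [{"role": "user", "content": "What tasks do I have?"}],
                       tools=[], execute=lambda name, args: ["a", "b"])
```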
Metadata
Attach arbitrary key-value string pairs to a request using the root-level metadata parameter. This is useful for storing client context (source app, user location, image URLs, UI hints) alongside the conversation without changing the message schema.
Constraints:
| Limit | Value |
|---|---|
| Max keys per request | 16 |
| Max key length | 64 characters |
| Max value length | 512 characters |
| Value type | string only |
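These limits can be checked client-side before sending. `validate_metadata` is a hypothetical helper that mirrors the documented constraints:

```python
def validate_metadata(metadata: dict) -> None:
    """Raise ValueError if metadata violates the documented limits:
    max 16 keys, 64-char keys, 512-char string-only values."""
    if len(metadata) > 16:
        raise ValueError("metadata: at most 16 keys per request")
    for key, value in metadata.items():
        if len(key) > 64:
            raise ValueError(f"metadata key exceeds 64 chars: {key[:20]}...")
        if not isinstance(value, str):
            raise ValueError(f"metadata value for {key!r} must be a string")
        if len(value) > 512:
            raise ValueError(f"metadata value for {key!r} exceeds 512 chars")

validate_metadata({"session_id": "8b94776f", "platform": "ios"})  # passes
```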
Example:
```bash
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "Describe this image"}
    ],
    "metadata": {
      "session_id": "8b94776f-31ae-49ac-9f06-3be137d69186",
      "platform": "ios",
      "app_version": "2.4.1"
    }
  }'
```

Metadata is echoed back in the response and persisted with the thread. When you retrieve thread history, metadata appears on the user message it was attached to.
When streaming, metadata is included in the first SSE chunk so clients can act on it immediately (e.g. render skill-specific UI before content arrives):
```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711234567,"model":"anthropic/claude-sonnet-4-5","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}],"metadata":{"active_skill":"menu","super_skill":"keto"}}
```

Subsequent chunks do not contain metadata.
Skill Instructions
Use skill_instructions to inject caller-composed skill prompts into the LLM system prompt. This is how app backends send skill-specific behavior to TIE without TIE needing to know about the app's skill definitions.
TIE inserts the text at a fixed position in the prompt stack:
```
┌─────────────────────────┐
│ Agent base instructions │ ← TIE owns (e.g. Eliza personality, safety rules)
├─────────────────────────┤
│ Persona                 │ ← TIE's persona system
├─────────────────────────┤
│ skill_instructions      │ ← Your text goes here
├─────────────────────────┤
│ Memory context          │ ← TIE adds from the user's knowledge graph
├─────────────────────────┤
│ Conversation history    │ ← Messages
└─────────────────────────┘
```

Key behavior:
- Additive — never replaces the agent's base instructions or persona
- Optional — if omitted, behavior is unchanged (persona + memory as before)
- TIE does not interpret, cache, or fetch skill_instructions; it only appends the text
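As an illustration only, the stack ordering could be sketched like this; the actual composition happens server-side inside TIE, and these names are hypothetical:

```python
def compose_system_prompt(base, persona=None, skill_instructions=None,
                          memory_context=None):
    """Join prompt layers in TIE's documented order: base instructions,
    then persona, then skill_instructions, then memory context."""
    parts = [base]                        # agent base instructions (always present)
    if persona:
        parts.append(persona)             # active persona, if any
    if skill_instructions:
        parts.append(skill_instructions)  # caller-supplied, purely additive
    if memory_context:
        parts.append(memory_context)      # user's knowledge-graph memories
    return "\n\n".join(parts)
```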
Example
```bash
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "What should I order from this Italian restaurant?"}
    ],
    "skill_instructions": "You are in Menu Analysis mode. Break down each menu item by calories, protein, carbs, and fat. Apply the Keto lens: flag items exceeding 20g net carbs.",
    "metadata": {
      "active_skill": "menu",
      "super_skill": "keto"
    }
  }'
```

In this example, the LLM sees Eliza's personality and safety rules, then the menu analysis instructions, then any memories about the user (e.g. allergies). The metadata is echoed back to the client but not shown to the LLM.
Agent Selection
By default, requests use the chatbot agent. To use a different agent, pass the X-Agent-Id header:
```bash
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "X-Agent-Id: research-assistant" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"model": "anthropic/claude-sonnet-4-5", "messages": [...]}'
```

If the request includes a system or developer message, that message replaces the agent's built-in prompt for that request. Memory and personas are still injected.
Per-App Thread Isolation
To scope threads to a specific application, pass the X-App-Id header. Threads sent with an app ID are isolated: they won't appear in other apps' thread lists, and can only be resumed, renamed, or deleted with the same X-App-Id.
```bash
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "X-App-Id: vibrantly" \
  -H "X-Agent-Id: chatbot" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"model": "anthropic/claude-sonnet-4-5", "messages": [...], "thread_id": "my-thread"}'
```

When X-App-Id is omitted, threads are unscoped and visible to all apps. See Threads — Per-App Thread Isolation for details.
| Agent | Description | Memory | Safety |
|---|---|---|---|
| chatbot | General-purpose chatbot | shared | none |
| eliza | AI wellness coach: nutrition guidance, meal logging, supplement tracking | shared | none |
| research-assistant | Web search and calculator | shared | LlamaGuard |
| rag-assistant | Database search (RAG) | isolated | LlamaGuard |
| command-agent | Command execution | none | none |
| bg-task-agent | Background task processing | none | none |
- Memory: shared — memories are accessible across all agents that use the shared pool. Memory: isolated — memories are scoped to that specific agent.
- Safety: LlamaGuard — user inputs and agent outputs are checked against Meta's Llama Guard content safety classifier before processing. Requests flagged as unsafe are rejected.
Memory scope is configured per agent — clients cannot override it. Query GET /info to see all available agents on your instance.
Supported Providers
Query GET /info to see which models are currently available on your instance.