
Unified API (OpenAI compat)

TIE AI Gateway offers an OpenAI-compatible /v1/chat/completions endpoint, enabling integration with multiple AI providers using a single URL. This is a drop-in replacement for OpenAI, Cloudflare AI Gateway, or LiteLLM — change the base URL and it works.

All requests require a Bearer token in the Authorization header. See Ways to authenticate for how to get one.

| Caller | Method | Details |
| --- | --- | --- |
| Browser / mobile app | User tokens | Sign in via TIE Auth, pass the JWT |
| Backend service | Service accounts | Use AUTH_SECRET, optionally with X-On-Behalf-Of for user context |
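A minimal sketch of assembling headers for the two caller types above. The header names (Authorization, X-On-Behalf-Of) come from this page; the token values are placeholders:

```typescript
// Build request headers for either a user-token caller or a service-account caller.
type CallerAuth =
  | { kind: "user"; jwt: string }
  | { kind: "service"; authSecret: string; onBehalfOf?: string };

function buildHeaders(auth: CallerAuth): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
  };
  if (auth.kind === "user") {
    // Browser / mobile app: pass the JWT from TIE Auth.
    headers["Authorization"] = `Bearer ${auth.jwt}`;
  } else {
    // Backend service: AUTH_SECRET, optionally acting on behalf of a user.
    headers["Authorization"] = `Bearer ${auth.authSecret}`;
    if (auth.onBehalfOf) headers["X-On-Behalf-Of"] = auth.onBehalfOf;
  }
  return headers;
}
```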
POST /v1/chat/completions

Switch providers by changing the model parameter. For example: anthropic/claude-sonnet-4-5, openai/gpt-5.1, vertexai/gemini-2.5-flash.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | required | Model to use (e.g. anthropic/claude-sonnet-4-5, openai/gpt-5.1) |
| messages | array | required | Conversation messages in OpenAI format. Each message's content can be a plain string or an array of content parts (text and images). If the array includes a system or developer message, it replaces the agent's built-in prompt for that request (memory and personas are still injected). |
| tools | array | null | Tool definitions in OpenAI function calling format |
| stream | boolean | false | Enable SSE streaming |
| stream_options | object | null | Streaming options. Set {"include_usage": true} to receive a final chunk with token usage (only applies when stream is true). |
| temperature | float | null | Sampling temperature |
| max_tokens | integer | null | Maximum tokens to generate |
| top_p | float | null | Nucleus sampling parameter |
| stop | string/array | null | Stop sequences |
| thread_id | string | null | Conversation thread for state persistence (see Threads) |
| persona_id | string | null | Persona to use for this request. Falls back to the user's active persona if omitted. |
| metadata | object | null | Up to 16 key-value string pairs attached to this request. Persisted with the thread and returned in history. Echoed in the first SSE chunk when streaming. See Metadata. |
| skill_instructions | string | null | Caller-composed skill prompt. Injected after persona, before memory context. Purely additive — never replaces persona or base instructions. See Skill Instructions. |

If you want to replace the agent's built-in prompt with your own instructions, send a system or developer message in the messages array.

When TIE sees one of those roles, your message replaces the agent's base instructions for that request:

  • your system or developer message becomes the top-level instruction instead of the agent's built-in prompt
  • TIE still injects persistent memory context so the model knows what it remembers about the user
  • TIE still injects the active persona (if any)
  • thread history and turn recording still work normally
  • client-defined tools in the tools parameter still work

This is useful when you want full prompt control for a specific request while keeping TIE's memory and conversation features.

curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "openai/gpt-5.1",
    "messages": [
      {"role": "developer", "content": "You are a concise support assistant. Answer in 3 bullet points max."},
      {"role": "user", "content": "How do I rotate an API key?"}
    ]
  }'
The same endpoint works with the official OpenAI SDK by pointing baseURL at your TIE host:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.TOKEN, // your TIE bearer token
  baseURL: "https://your-tie-host/v1",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5",
  messages: [{ role: "user", content: "What is the capital of France?" }],
});

console.log(response.choices[0].message.content);
The response follows the OpenAI chat completion format:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1711234567,
  "model": "anthropic/claude-sonnet-4-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20,
    "cache_read_tokens": 0,
    "cache_creation_tokens": 0
  },
  "metadata": null
}

If metadata was included in the request, it is echoed back here.

Every time you call the AI, it counts the pieces of text (called tokens) that go in and out. You pay your provider for these tokens, so the usage object is how you know what just got spent.

Think of it like a taxi meter: prompt_tokens is how far you rode in, completion_tokens is how far the AI drove back, and total_tokens is the full fare.

| Field | What it means (in plain English) |
| --- | --- |
| prompt_tokens | How much you sent in (your messages + system prompt + memory + persona). Bigger question = bigger number. |
| completion_tokens | How much the AI wrote back. A one-word answer is tiny. A long essay is big. |
| total_tokens | Simply prompt_tokens + completion_tokens. The whole bill. |
| cache_read_tokens | Of your prompt_tokens, how many were reused from a cached copy the provider had already saved. These are WAY cheaper — Anthropic charges up to 10× less for cache reads. |
| cache_creation_tokens | How many tokens the provider saved into cache for next time. Slightly pricier than a normal token, but it makes future calls much cheaper. |
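To turn a usage object into a dollar amount, price each bucket separately, since cache reads bill at a different rate than fresh input. A sketch with placeholder per-million-token prices (the rates below are made up for illustration; substitute your provider's real price sheet):

```typescript
// Shape of the usage object returned by TIE (see the table above).
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  cache_read_tokens: number;
  cache_creation_tokens: number;
}

// Hypothetical prices in USD per million tokens -- replace with real rates.
const PRICES = {
  input: 3.0,       // uncached prompt tokens
  cacheRead: 0.3,   // cached prompt tokens (roughly 10x cheaper, per the note above)
  cacheWrite: 3.75, // cache_creation_tokens (slightly pricier than normal input)
  output: 15.0,     // completion tokens
};

function estimateCostUSD(u: Usage): number {
  // cache_read_tokens are a subset of prompt_tokens, so subtract them
  // before billing the remainder at the full input rate.
  const uncachedInput = u.prompt_tokens - u.cache_read_tokens;
  const per = (n: number, pricePerMillion: number) => (n * pricePerMillion) / 1e6;
  return (
    per(uncachedInput, PRICES.input) +
    per(u.cache_read_tokens, PRICES.cacheRead) +
    per(u.cache_creation_tokens, PRICES.cacheWrite) +
    per(u.completion_tokens, PRICES.output)
  );
}
```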

You can send images to vision-capable models by using the content parts format instead of a plain string for content. This follows the OpenAI Vision API format.

Each message's content can be either:

  • A string (text-only, the default)
  • An array of content parts mixing text and images
| Type | Fields | Description |
| --- | --- | --- |
| text | text | A text segment |
| image_url | image_url.url, image_url.detail | An image, either as a direct URL or base64 data URI |

The detail parameter controls image resolution processing:

| Value | Description |
| --- | --- |
| auto | Let the model decide (default) |
| low | Faster, lower resolution, fewer tokens |
| high | Higher resolution, more tokens |
With an image URL:
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What is in this image?" },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/photo.jpg",
              "detail": "auto"
            }
          }
        ]
      }
    ]
  }'
With a base64 data URI:
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image." },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,/9j/4AAQSkZJRg...",
              "detail": "low"
            }
          }
        ]
      }
    ]
  }'

Pass a thread_id to maintain conversation state across requests. TIE uses it to persist the message history so the LLM has context from previous turns.

  • If omitted, TIE auto-generates a UUID for the thread. Each request starts a new conversation.
  • If provided, TIE resumes the conversation from where it left off. Use any string (e.g. a UUID you generate client-side).

See Threads for the full API reference (list, rename, delete, history).
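A common pattern is generating the thread ID client-side once and attaching it to every turn. A sketch (the request shape follows the parameters table above; only thread_id handling is the point here):

```typescript
import { randomUUID } from "node:crypto";

interface ChatRequest {
  model: string;
  messages: { role: string; content: string }[];
  thread_id?: string;
}

// Create a conversation handle whose turns all share one thread_id,
// so TIE resumes the same thread on every request.
function newConversation(model: string): {
  threadId: string;
  turn: (text: string) => ChatRequest;
} {
  const threadId = randomUUID(); // any stable string works
  return {
    threadId,
    turn: (text) => ({
      model,
      messages: [{ role: "user", content: text }],
      thread_id: threadId,
    }),
  };
}
```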

When streaming (stream: true), token usage is available by setting stream_options.include_usage to true.

curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "stream": true,
    "stream_options": {"include_usage": true},
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

When enabled, TIE emits a final SSE chunk after the finish_reason chunk with an empty choices array and the accumulated usage:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711234567,"model":"anthropic/claude-sonnet-4-5","choices":[],"usage":{"prompt_tokens":9,"completion_tokens":12,"total_tokens":21,"cache_read_tokens":0,"cache_creation_tokens":0}}
data: [DONE]

The final usage chunk includes the same fields as the non-streaming response, including cache_read_tokens and cache_creation_tokens. Numbers are accumulated across every internal step (tool calls, memory lookups, etc.) so you get one clean bill for the whole request. See Understanding the usage object for what each field means.

If stream_options is omitted or include_usage is false, no usage chunk is emitted.
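Client-side, the usage chunk is easy to pick out: it is the only chunk with an empty choices array and a usage object. A parsing sketch over a buffered SSE body, assuming the chunk shapes shown above:

```typescript
interface StreamChunk {
  choices: unknown[];
  usage?: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
    cache_read_tokens: number;
    cache_creation_tokens: number;
  };
}

// Scan "data: ..." lines and return the accumulated usage, or null
// if no usage chunk was emitted (include_usage off or omitted).
function extractUsage(sseBody: string): StreamChunk["usage"] | null {
  for (const line of sseBody.split("\n")) {
    if (!line.startsWith("data: ") || line === "data: [DONE]") continue;
    const chunk: StreamChunk = JSON.parse(line.slice("data: ".length));
    if (chunk.choices.length === 0 && chunk.usage) return chunk.usage;
  }
  return null;
}
```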

TIE supports two types of tools:

  • Client tools — You define them in the tools parameter. TIE returns tool_calls for your app to execute locally and send results back.
  • Internal tools — memory_search and memory_write run server-side, invisible to the client. See Memory.
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "What tasks do I have today?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "list_tasks",
          "description": "List user tasks filtered by status",
          "parameters": {
            "type": "object",
            "properties": {
              "status": {
                "type": "string",
                "enum": ["INBOX", "NEXT_UP", "IN_PROGRESS", "WAITING"]
              }
            }
          }
        }
      }
    ]
  }'

When the model wants to call a tool, finish_reason is "tool_calls":

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_xyz789",
            "type": "function",
            "function": {
              "name": "list_tasks",
              "arguments": "{\"status\": \"INBOX\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

Execute the tool locally, then send the result back:

curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "What tasks do I have today?"},
      {"role": "assistant", "content": null, "tool_calls": [{"id": "call_xyz789", "type": "function", "function": {"name": "list_tasks", "arguments": "{\"status\": \"INBOX\"}"}}]},
      {"role": "tool", "tool_call_id": "call_xyz789", "content": "[{\"title\": \"Buy groceries\"}, {\"title\": \"Review PR #42\"}]"}
    ],
    "tools": [...]
  }'

The model may request multiple rounds of tool calls. Keep sending results until finish_reason is "stop".
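The loop above can be sketched as follows. callGateway stands in for your HTTP client calling /v1/chat/completions, and the handler map is your local tool implementations; both are illustrative, not part of TIE:

```typescript
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}
interface Message {
  role: string;
  content: string | null;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
}
interface Choice { message: Message; finish_reason: string }

type Gateway = (messages: Message[]) => Promise<Choice>;
type ToolHandlers = Record<string, (args: unknown) => Promise<string>>;

// Keep calling the gateway until it stops asking for tools.
async function runToolLoop(
  callGateway: Gateway,
  handlers: ToolHandlers,
  messages: Message[],
): Promise<Message> {
  for (;;) {
    const { message, finish_reason } = await callGateway(messages);
    messages.push(message);
    if (finish_reason !== "tool_calls") return message; // e.g. "stop"
    // Execute each requested tool locally and append a tool-result message.
    for (const call of message.tool_calls ?? []) {
      const result = await handlers[call.function.name](JSON.parse(call.function.arguments));
      messages.push({ role: "tool", tool_call_id: call.id, content: result });
    }
  }
}
```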

Attach arbitrary key-value string pairs to a request using the root-level metadata parameter. This is useful for storing client context (source app, user location, image URLs, UI hints) alongside the conversation without changing the message schema.

Constraints:

| Limit | Value |
| --- | --- |
| Max keys per request | 16 |
| Max key length | 64 characters |
| Max value length | 512 characters |
| Value type | string only |

Example:

curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "Describe this image"}
    ],
    "metadata": {
      "session_id": "8b94776f-31ae-49ac-9f06-3be137d69186",
      "platform": "ios",
      "app_version": "2.4.1"
    }
  }'

Metadata is echoed back in the response and persisted with the thread. When you retrieve thread history, metadata appears on the user message it was attached to.
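A client can check the constraints table above before sending, so a bad payload fails fast instead of round-tripping to the gateway. A validation sketch:

```typescript
// Returns a list of violations; an empty array means the metadata is within limits.
function validateMetadata(metadata: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const entries = Object.entries(metadata);
  if (entries.length > 16) errors.push(`too many keys: ${entries.length} > 16`);
  for (const [key, value] of entries) {
    if (key.length > 64) errors.push(`key exceeds 64 characters: "${key.slice(0, 20)}..."`);
    if (typeof value !== "string") {
      errors.push(`value for "${key}" must be a string`);
    } else if (value.length > 512) {
      errors.push(`value for "${key}" exceeds 512 characters`);
    }
  }
  return errors;
}
```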

When streaming, metadata is included in the first SSE chunk so clients can act on it immediately (e.g. render skill-specific UI before content arrives):

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711234567,"model":"anthropic/claude-sonnet-4-5","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}],"metadata":{"active_skill":"menu","super_skill":"keto"}}

Subsequent chunks do not contain metadata.

Use skill_instructions to inject caller-composed skill prompts into the LLM system prompt. This is how app backends send skill-specific behavior to TIE without TIE needing to know about the app's skill definitions.

TIE inserts the text at a fixed position in the prompt stack:

┌─────────────────────────┐
│ Agent base instructions │ ← TIE owns (e.g. Eliza personality, safety rules)
├─────────────────────────┤
│ Persona                 │ ← TIE's persona system
├─────────────────────────┤
│ skill_instructions      │ ← Your text goes here
├─────────────────────────┤
│ Memory context          │ ← TIE adds from the user's knowledge graph
├─────────────────────────┤
│ Conversation history    │ ← Messages
└─────────────────────────┘

Key behavior:

  • Additive — never replaces the agent's base instructions or persona
  • Optional — if omitted, behavior is unchanged (persona + memory as before)
  • TIE does not interpret, cache, or fetch skill_instructions — it only appends the text
curl -X POST https://your-tie-host/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "anthropic/claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "What should I order from this Italian restaurant?"}
    ],
    "skill_instructions": "You are in Menu Analysis mode. Break down each menu item by calories, protein, carbs, and fat. Apply the Keto lens: flag items exceeding 20g net carbs.",
    "metadata": {
      "active_skill": "menu",
      "super_skill": "keto"
    }
  }'

In this example, the LLM sees Eliza's personality and safety rules, then the menu analysis instructions, then any memories about the user (e.g. allergies). The metadata is echoed back to the client but not shown to the LLM.

By default, requests use the chatbot agent. To use a different agent, pass the X-Agent-Id header:

curl -X POST https://your-tie-host/v1/chat/completions \
  -H "X-Agent-Id: research-assistant" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"model": "anthropic/claude-sonnet-4-5", "messages": [...]}'

If the request includes a system or developer message, that message replaces the agent's built-in prompt for that request. Memory and personas are still injected.

To scope threads to a specific application, pass the X-App-Id header. Threads sent with an app ID are isolated — they won't appear in other apps' thread lists, and can only be resumed, renamed, or deleted with the same X-App-Id.

curl -X POST https://your-tie-host/v1/chat/completions \
  -H "X-App-Id: vibrantly" \
  -H "X-Agent-Id: chatbot" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"model": "anthropic/claude-sonnet-4-5", "messages": [...], "thread_id": "my-thread"}'

When X-App-Id is omitted, threads are unscoped and visible to all apps. See Threads — Per-App Thread Isolation for details.

| Agent | Description | Memory | Safety |
| --- | --- | --- | --- |
| chatbot | General-purpose chatbot | shared | none |
| eliza | AI wellness coach — nutrition guidance, meal logging, supplement tracking | shared | none |
| research-assistant | Web search and calculator | shared | LlamaGuard |
| rag-assistant | Database search (RAG) | isolated | LlamaGuard |
| command-agent | Command execution | none | none |
| bg-task-agent | Background task processing | none | none |
  • Memory: shared — memories are accessible across all agents that use the shared pool. Memory: isolated — memories are scoped to that specific agent.
  • Safety: LlamaGuard — user inputs and agent outputs are checked against Meta's Llama Guard content safety classifier before processing. Requests flagged as unsafe are rejected.

Memory scope is configured per agent — clients cannot override it. Query GET /info to see all available agents on your instance.

Query GET /info to see which models are currently available on your instance.