Cache token counts in /v1/chat/completions usage

The usage object on /v1/chat/completions now includes two extra fields:

```json
{
  "usage": {
    "prompt_tokens": 4532,
    "completion_tokens": 187,
    "total_tokens": 4719,
    "cache_read_tokens": 4200,
    "cache_creation_tokens": 0
  }
}
```
| Field | What it tells you |
| --- | --- |
| `cache_read_tokens` | How many of your `prompt_tokens` were reused from the provider's prompt cache. On Anthropic these are roughly 10× cheaper than regular input tokens. |
| `cache_creation_tokens` | How many tokens the provider wrote into its cache for next time. Slightly pricier than a normal input token, but pays for itself on the next hit. |
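As an illustrative sketch (the `cache_hit_rate` helper is an assumption for this example, not part of the API), you can derive a cache hit rate from any usage object; using `.get` with a default of `0` keeps it safe on providers that omit the fields:

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of prompt tokens that were served from the provider's cache."""
    prompt = usage.get("prompt_tokens", 0)
    if prompt == 0:
        return 0.0
    return usage.get("cache_read_tokens", 0) / prompt

# The usage object from the example above:
usage = {
    "prompt_tokens": 4532,
    "completion_tokens": 187,
    "total_tokens": 4719,
    "cache_read_tokens": 4200,
    "cache_creation_tokens": 0,
}
print(f"{cache_hit_rate(usage):.0%} of the prompt came from cache")
```

Because missing fields default to `0`, the same helper reports a 0% hit rate on providers that don't support caching, rather than raising a `KeyError`.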

Why this matters

Without these fields, you had no way to tell a 10,000-token cached conversation apart from a 10,000-token fresh conversation, so any bill you computed from prompt_tokens alone was the worst-case number. On long, repeat conversations that could overstate input costs by as much as 10× (Anthropic's standard input is $3.00/MTok vs cache reads at $0.30/MTok).

Now you can compute what you're actually being charged:

```
fresh_input = prompt_tokens - cache_read_tokens - cache_creation_tokens

real_cost = (fresh_input           × standard_rate)
          + (cache_read_tokens     × cache_read_rate)
          + (cache_creation_tokens × cache_write_rate)
          + (completion_tokens     × output_rate)
```
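A minimal sketch of that calculation in Python. The function name is an assumption, and the rates in the example call are illustrative per-million-token prices (the $3.00 and $0.30 figures come from the Anthropic comparison above; the cache-write and output rates are placeholders you should replace with your provider's price sheet):

```python
def real_cost(usage: dict, standard_rate: float, cache_read_rate: float,
              cache_write_rate: float, output_rate: float) -> float:
    """Compute the actual dollar cost of a call. Rates are $ per million tokens."""
    cache_read = usage.get("cache_read_tokens", 0)
    cache_write = usage.get("cache_creation_tokens", 0)
    # Tokens billed at the standard input rate: whatever wasn't read from
    # or written into the cache.
    fresh_input = usage["prompt_tokens"] - cache_read - cache_write
    total = (fresh_input * standard_rate
             + cache_read * cache_read_rate
             + cache_write * cache_write_rate
             + usage["completion_tokens"] * output_rate)
    return total / 1_000_000

usage = {"prompt_tokens": 4532, "completion_tokens": 187,
         "total_tokens": 4719, "cache_read_tokens": 4200,
         "cache_creation_tokens": 0}

# Illustrative rates ($/MTok): standard input, cache read, cache write, output.
cost = real_cost(usage, 3.00, 0.30, 3.75, 15.00)
```

For the example usage above, only 332 of the 4,532 prompt tokens are billed at the standard rate; the worst-case estimate that treats all prompt tokens as fresh input would be several times higher.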

Supported providers

| Provider | `cache_read_tokens` | `cache_creation_tokens` |
| --- | --- | --- |
| Anthropic (Claude) | Yes | Yes |
| OpenAI (GPT) | Yes | Always 0 (OpenAI doesn't report cache writes separately) |
| Vertex AI, Gemini, Groq, Mistral, etc. | Always 0 | Always 0 |

If a provider doesn't support prompt caching, both fields are 0. Nothing breaks — you just don't see savings where there are none to see.

Backward compatibility

  • Existing callers that ignore the new fields are unaffected — they're purely additive.
  • Both fields default to 0, so old clients that deserialize into strict schemas continue to work as long as they allow unknown or zero-default fields.
  • Streaming responses (stream_options.include_usage: true) include the new fields in the final usage chunk.
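For streaming callers, a sketch of pulling the usage out of the final chunk. With `stream_options.include_usage: true`, each chunk carries a `usage` key that is null until the stream's last chunk; the chunk dicts below are hand-built stand-ins for a real streamed response, not actual wire output:

```python
def usage_from_stream(chunks):
    """Return the usage object from a stream, or None if the stream had none."""
    usage = None
    for chunk in chunks:
        # All chunks except the final one carry "usage": null.
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
    return usage

# Hand-built stand-in for a streamed response:
chunks = [
    {"choices": [{"delta": {"content": "Hel"}}], "usage": None},
    {"choices": [{"delta": {"content": "lo"}}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 4532, "completion_tokens": 2,
                              "total_tokens": 4534, "cache_read_tokens": 4200,
                              "cache_creation_tokens": 0}},
]
final_usage = usage_from_stream(chunks)
```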

See the full field reference at Understanding the usage object.