Cache token counts in /v1/chat/completions usage

The usage object on /v1/chat/completions now includes two extra fields:

```json
{
  "usage": {
    "prompt_tokens": 4532,
    "completion_tokens": 187,
    "total_tokens": 4719,
    "cache_read_tokens": 4200,
    "cache_creation_tokens": 0
  }
}
```
| Field | What it tells you |
| --- | --- |
| `cache_read_tokens` | How many of your `prompt_tokens` were reused from the provider's prompt cache. On Anthropic these are roughly 10× cheaper than regular input tokens. |
| `cache_creation_tokens` | How many tokens the provider wrote into its cache for next time. Slightly pricier than a normal input token, but pays for itself on the next hit. |
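As an illustrative sketch (the `cache_hit_rate` helper is an assumption for this example, not part of the API), you can derive a cache hit rate from any usage object; using `.get` with a default of `0` keeps it safe on providers that omit the fields:

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of prompt tokens that were served from the provider's cache."""
    prompt = usage.get("prompt_tokens", 0)
    if prompt == 0:
        return 0.0
    return usage.get("cache_read_tokens", 0) / prompt

# The usage object from the example above:
usage = {
    "prompt_tokens": 4532,
    "completion_tokens": 187,
    "total_tokens": 4719,
    "cache_read_tokens": 4200,
    "cache_creation_tokens": 0,
}
print(f"{cache_hit_rate(usage):.0%} of the prompt came from cache")
```

Because missing fields default to `0`, the same helper reports a 0% hit rate on providers that don't support caching, rather than raising a `KeyError`.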

Why this matters

Without these fields, you had no way to tell a 10,000-token cached conversation apart from a 10,000-token fresh conversation, so any bill you computed from prompt_tokens alone was the worst-case number. On long, repeat conversations that could overstate input costs by as much as 10× (Anthropic's standard input is $3.00/MTok vs cache reads at $0.30/MTok).

Now you can compute what you're actually being charged:

```
fresh_input = prompt_tokens - cache_read_tokens - cache_creation_tokens

real_cost = (fresh_input           × standard_rate)
          + (cache_read_tokens     × cache_read_rate)
          + (cache_creation_tokens × cache_write_rate)
          + (completion_tokens     × output_rate)
```
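A minimal sketch of that calculation in Python. The function name is an assumption, and the rates in the example call are illustrative per-million-token prices (the $3.00 and $0.30 figures come from the Anthropic comparison above; the cache-write and output rates are placeholders you should replace with your provider's price sheet):

```python
def real_cost(usage: dict, standard_rate: float, cache_read_rate: float,
              cache_write_rate: float, output_rate: float) -> float:
    """Compute the actual dollar cost of a call. Rates are $ per million tokens."""
    cache_read = usage.get("cache_read_tokens", 0)
    cache_write = usage.get("cache_creation_tokens", 0)
    # Tokens billed at the standard input rate: whatever wasn't read from
    # or written into the cache.
    fresh_input = usage["prompt_tokens"] - cache_read - cache_write
    total = (fresh_input * standard_rate
             + cache_read * cache_read_rate
             + cache_write * cache_write_rate
             + usage["completion_tokens"] * output_rate)
    return total / 1_000_000

usage = {"prompt_tokens": 4532, "completion_tokens": 187,
         "total_tokens": 4719, "cache_read_tokens": 4200,
         "cache_creation_tokens": 0}

# Illustrative rates ($/MTok): standard input, cache read, cache write, output.
cost = real_cost(usage, 3.00, 0.30, 3.75, 15.00)
```

For the example usage above, only 332 of the 4,532 prompt tokens are billed at the standard rate; the worst-case estimate that treats all prompt tokens as fresh input would be several times higher.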

Supported providers

| Provider | `cache_read_tokens` | `cache_creation_tokens` |
| --- | --- | --- |
| Anthropic (Claude) | Yes | Yes |
| OpenAI (GPT) | Yes | Always 0 (OpenAI doesn't report cache writes separately) |
| Vertex AI, Gemini, Groq, Mistral, etc. | Always 0 | Always 0 |

If a provider doesn't support prompt caching, both fields are 0. Nothing breaks — you just don't see savings where there are none to see.

Backward compatibility

  • Existing callers that ignore the new fields are unaffected — they're purely additive.
  • Both fields default to 0, so old clients that deserialize into strict schemas continue to work as long as they allow unknown or zero-default fields.
  • Streaming responses (stream_options.include_usage: true) include the new fields in the final usage chunk.
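For streaming callers, a sketch of pulling the usage out of the final chunk. With `stream_options.include_usage: true`, each chunk carries a `usage` key that is null until the stream's last chunk; the chunk dicts below are hand-built stand-ins for a real streamed response, not actual wire output:

```python
def usage_from_stream(chunks):
    """Return the usage object from a stream, or None if the stream had none."""
    usage = None
    for chunk in chunks:
        # All chunks except the final one carry "usage": null.
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
    return usage

# Hand-built stand-in for a streamed response:
chunks = [
    {"choices": [{"delta": {"content": "Hel"}}], "usage": None},
    {"choices": [{"delta": {"content": "lo"}}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 4532, "completion_tokens": 2,
                              "total_tokens": 4534, "cache_read_tokens": 4200,
                              "cache_creation_tokens": 0}},
]
final_usage = usage_from_stream(chunks)
```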

See the full field reference at Understanding the usage object.