Cache token counts in /v1/chat/completions usage
The usage object on /v1/chat/completions now includes two extra fields:

    {
      "usage": {
        "prompt_tokens": 4532,
        "completion_tokens": 187,
        "total_tokens": 4719,
        "cache_read_tokens": 4200,
        "cache_creation_tokens": 0
      }
    }

| Field | What it tells you |
|---|---|
| cache_read_tokens | How many of your prompt_tokens were reused from the provider's prompt cache. On Anthropic these are roughly 10× cheaper than regular input tokens. |
| cache_creation_tokens | How many tokens the provider wrote into its cache for next time. Slightly pricier than a normal input token, but pays for itself on the next hit. |
Why this matters
Without these fields, you had no way to tell a 10,000-token cached conversation apart from a 10,000-token fresh conversation — so the bill you computed was always the worst-case number. That could overstate input costs by up to 67% on long, repeat conversations (Anthropic's standard input is $3.00/MTok vs cache reads at $0.30/MTok).
Now you can compute what you're actually being charged:

    fresh_input = prompt_tokens - cache_read_tokens - cache_creation_tokens

    real_cost = (fresh_input × standard_rate)
              + (cache_read_tokens × cache_read_rate)
              + (cache_creation_tokens × cache_write_rate)
              + (completion_tokens × output_rate)

Supported providers
| Provider | cache_read_tokens | cache_creation_tokens |
|---|---|---|
| Anthropic (Claude) | Yes | Yes |
| OpenAI (GPT) | Yes | Always 0 — OpenAI doesn't report cache writes separately |
| Vertex AI, Gemini, Groq, Mistral, etc. | Always 0 | Always 0 |
If a provider doesn't support prompt caching, both fields are 0. Nothing breaks — you just don't see savings where there are none to see.
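
As a rough illustration, the cost formula above could be written in Python along these lines. The `Rates` class and `real_cost` function are hypothetical names, not part of the API. The $3.00 and $0.30 rates are the Anthropic figures quoted earlier; the cache-write and output rates are placeholder assumptions that you should replace with your provider's current pricing. Missing cache fields are treated as 0, which matches what providers without prompt caching report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Prices in USD per million tokens, supplied by the caller."""
    standard: float     # regular (uncached) input tokens
    cache_read: float   # tokens read from the prompt cache
    cache_write: float  # tokens written into the prompt cache
    output: float       # completion tokens


def real_cost(usage: dict, rates: Rates) -> float:
    """Apply the cost formula to one usage object and return USD."""
    # Treat absent cache fields as 0, the same value non-caching providers send.
    cache_read = usage.get("cache_read_tokens", 0)
    cache_creation = usage.get("cache_creation_tokens", 0)
    fresh_input = usage["prompt_tokens"] - cache_read - cache_creation
    total = (
        fresh_input * rates.standard
        + cache_read * rates.cache_read
        + cache_creation * rates.cache_write
        + usage["completion_tokens"] * rates.output
    )
    return total / 1_000_000  # rates are per million tokens


usage = {
    "prompt_tokens": 4532,
    "completion_tokens": 187,
    "total_tokens": 4719,
    "cache_read_tokens": 4200,
    "cache_creation_tokens": 0,
}
# cache_write and output are illustrative assumptions, not documented values.
rates = Rates(standard=3.00, cache_read=0.30, cache_write=3.75, output=15.00)
print(f"${real_cost(usage, rates):.6f}")  # $0.005061
```

With the sample usage object from the top of this page, only 332 of the 4,532 prompt tokens are billed at the standard input rate.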
Backward compatibility
- Existing callers that ignore the new fields are unaffected — they're purely additive.
- Both fields default to 0, so old clients that deserialize into strict schemas continue to work as long as they allow unknown or zero-default fields.
- Streaming responses (stream_options.include_usage: true) include the new fields in the final usage chunk.
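
For streaming callers, one way to pick up the new fields is to keep the last usage object seen in the stream. The sketch below assumes an OpenAI-style server-sent-events body, where each event is a `data: <json>` line and the stream ends with `data: [DONE]`; that wire format is an assumption here, so check it against what your client actually receives. `final_usage` is a hypothetical helper name.

```python
import json
from typing import Iterable, Optional


def final_usage(sse_lines: Iterable[str]) -> Optional[dict]:
    """Return the last non-empty usage object found in an SSE stream."""
    usage = None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]  # later chunks overwrite earlier ones
    return usage


# Hand-written sample lines, not captured from a real response.
sample = [
    'data: {"choices": [{"delta": {"content": "Hi"}}], "usage": null}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 4532, '
    '"completion_tokens": 187, "total_tokens": 4719, '
    '"cache_read_tokens": 4200, "cache_creation_tokens": 0}}',
    "data: [DONE]",
]
usage = final_usage(sample)
print(usage["cache_read_tokens"])  # 4200
```

The helper returns None when no chunk carries usage, for example when stream_options.include_usage was not set.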
See the full field reference at Understanding the usage object.