← Blog

Prompt Caching: How Much Does It Actually Save?

Our system prompt is 1,800 tokens. We send it with every request. 12,000 requests per day. We were paying for those 1,800 tokens 12,000 times. Then we turned on prompt caching. Same prompt, 50% discount on cached tokens. Our input bill dropped 22%. Not huge — but enough to matter for a high-volume feature. Prompt caching: how much does it actually save? It depends on how much of your input repeats. For us, it was worth it.

OpenAI and Anthropic prompt caching: when it helps

Long system prompts. RAG contexts that repeat across requests. Conversation history you resend. If your repeated input is a big chunk of your total, caching can cut a meaningful slice. OpenAI prompt caching and Anthropic prompt caching both offer roughly 50% off on cached tokens. Our document analysis feature had a 3,000-token context that repeated across batches — caching saved us 35% on that feature. Our chat feature: highly variable prompts, little repetition. Caching saved almost nothing. We turned it off there. Reduce AI input cost where repetition is high.

Anthropic prompt caching vs OpenAI: implementation

Both providers have different APIs for caching. You need to use their caching endpoints, pass the right parameters, and handle cache invalidation when your prompts change. It adds complexity. For our high-volume document feature, the engineering cost was worth it. For low-volume or highly variable features, we skipped it. The savings wouldn't have justified the work. Run the numbers for your use case. Prompt caching savings only matter if the repeated input is a significant share of your bill.

What we learned

We saved 22% on our highest-volume feature. We saved 35% on document analysis. We saved nothing on chat. Prompt caching how much save depends entirely on your workload. If you have long, repeated contexts, it's worth testing. If every request is different, skip it. The 50% discount sounds great — but 50% of a small number is still small.

To estimate your baseline cost before caching, use our AI cost calculator. For cost by customer and feature, PerUnit gives you the breakdown.

Need cost per customer, not just totals?

PerUnit breaks down your AI spend by customer, feature, and pricing tier — so you know who to charge more, what to gate, and where to cut.

Get early access to PerUnit →