Context Window Cost Calculator

Long-context costs add up fast, and the per-request math surprises most teams the first time they run it. Enter your context size to see what one request costs on each major model, sorted cheapest first.

Last updated: April 2026

Your context

Input tokens (context size)

System prompt + context (e.g. RAG docs, conversation history). 100k tokens ≈ 75k words.

Output tokens per request

Requests per day

Cost per request and per month

Sorted by monthly cost. Input-heavy workloads favor cheaper input pricing.

Gemini 2.0 Flash (Google): $0.010/request, $30.60/mo
GPT-4o mini (OpenAI): $0.015/request, $45.90/mo
GPT-5.4 nano (OpenAI): $0.021/request, $61.88/mo
Claude Haiku 3 (Anthropic): $0.026/request, $76.88/mo
Gemini 2.5 Flash (Google): $0.031/request, $93.75/mo
GPT-4.1 mini (OpenAI): $0.041/request, $122.40/mo
GPT-5.4 mini (OpenAI): $0.077/request, $231.75/mo
Claude Haiku 4.5 (Anthropic): $0.103/request, $307.50/mo
o4-mini (OpenAI): $0.112/request, $336.60/mo
Gemini 2.5 Pro (Google): $0.130/request, $390.00/mo
GPT-4.1 (OpenAI): $0.204/request, $612.00/mo
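The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function names, the 30-day month, and the per-million-token prices in `PRICES` are all assumptions. The page does not state which inputs produced the listed figures, but 100,000 input tokens, 500 output tokens, and 100 requests per day are consistent with them at the assumed prices.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000
DAYS_PER_MONTH = 30  # assumption: a flat 30-day billing month


@dataclass(frozen=True)
class Price:
    """USD per million tokens (illustrative values, not authoritative)."""
    input_per_m: float
    output_per_m: float


# Assumed list prices; check each provider's pricing page before relying on them.
PRICES = {
    "GPT-4.1": Price(input_per_m=2.00, output_per_m=8.00),
    "Gemini 2.0 Flash": Price(input_per_m=0.10, output_per_m=0.40),
}


def cost_per_request(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Input and output tokens are billed separately, each at its own rate."""
    billed = input_tokens * price.input_per_m + output_tokens * price.output_per_m
    return billed / TOKENS_PER_MILLION


def cost_per_month(price: Price, input_tokens: int, output_tokens: int,
                   requests_per_day: int) -> float:
    """Scale the per-request cost by daily volume and days in the month."""
    per_request = cost_per_request(price, input_tokens, output_tokens)
    return per_request * requests_per_day * DAYS_PER_MONTH


def rank_by_monthly_cost(prices: dict, input_tokens: int, output_tokens: int,
                         requests_per_day: int) -> list:
    """Return (model, cost per request, cost per month) tuples, cheapest first."""
    rows = [
        (
            name,
            cost_per_request(p, input_tokens, output_tokens),
            cost_per_month(p, input_tokens, output_tokens, requests_per_day),
        )
        for name, p in prices.items()
    ]
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    for name, per_req, per_month in rank_by_monthly_cost(PRICES, 100_000, 500, 100):
        print(f"{name}: ${per_req:.3f}/request, ${per_month:.2f}/mo")
```

With those assumed inputs, the sketch gives $0.010/request and $30.60/mo for Gemini 2.0 Flash, and $0.204/request and $612.00/mo for GPT-4.1, matching the entries above. Swapping in your own token counts and request volume shows how quickly the input term dominates.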

Long-context features are usually used by a few power users.

The pattern we see most often: a small group of customers sends 80%+ of the long-context requests, and they're rarely on your top tier. PerUnit shows you who's sending the long prompts and which features they're hitting — so the gating and pricing decisions are obvious.

Get early access to PerUnit

Frequently asked questions

Why does long context cost so much more?
Because input is the bulk of the bill at long context lengths. A 100K-token prompt with a 500-token response has 200× more input tokens than output tokens, so even though input is priced lower per token, the volume swamps everything else. At 80K input + 500 output on a flagship model, input typically accounts for 90%+ of the per-request cost.

Does prompt caching reduce long-context costs?
A lot. OpenAI and Anthropic both offer explicit prompt caching; Gemini does an implicit version on Flash. For workloads where the same large context (system prompt, RAG corpus, conversation history) repeats across many requests, caching can drop the input portion of the bill by 50–90%. It only helps if the cacheable portion is genuinely repeated; one-off long-context requests get no benefit.

Which model is cheapest for 200K-token requests?
On input price alone, Gemini 2.0 Flash wins the cheap tier ($0.10/M input ≈ $0.020/request) and Gemini 2.5 Pro wins the flagship tier ($1.25/M input ≈ $0.250/request). Claude tops out at 200K context and is more expensive per token, but it is consistently strong on reasoning over long context. GPT-4.1 sits in the middle on both. Use the calculator above to plug in your output volume and see which one wins for your specific workload.

What's the largest context window available right now?
Gemini 2.5 Pro and Flash support 1M tokens across all tiers, with some configurations going to 2M. GPT-4.1 supports just over 1M. Claude Sonnet/Opus/Haiku 4.x cap at 200K. GPT-4o caps at 128K. Bigger isn't automatically better: long-context reasoning quality varies by model, and many models degrade past a certain depth even when they technically accept the input.
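The input-share and caching answers above can be checked with a rough sketch. The function names, the `cached_fraction` and `cache_discount` parameters, and the $2.00/$8.00 per-million-token prices are illustrative assumptions, not any provider's API or billing rules. The model also ignores cache-write surcharges and cache expiry, which some providers apply.

```python
TOKENS_PER_MILLION = 1_000_000


def input_share(input_tokens: int, output_tokens: int,
                input_per_m: float, output_per_m: float) -> float:
    """Fraction of one request's cost that comes from input tokens."""
    input_cost = input_tokens * input_per_m / TOKENS_PER_MILLION
    output_cost = output_tokens * output_per_m / TOKENS_PER_MILLION
    return input_cost / (input_cost + output_cost)


def cached_input_cost(input_tokens: int, input_per_m: float,
                      cached_fraction: float, cache_discount: float) -> float:
    """Input cost in USD when part of the prompt is served from cache.

    cached_fraction: share of input tokens that hit the cache (0 to 1).
    cache_discount: price reduction on cached tokens (0.9 means 90% cheaper).
    Simplification: no cache-write surcharge, no expiry.
    """
    if not 0 <= cached_fraction <= 1 or not 0 <= cache_discount <= 1:
        raise ValueError("cached_fraction and cache_discount must be in [0, 1]")
    full_cost = input_tokens * input_per_m / TOKENS_PER_MILLION
    return full_cost * (1 - cached_fraction * cache_discount)


if __name__ == "__main__":
    # Assumed flagship pricing: $2.00/M input, $8.00/M output.
    share = input_share(80_000, 500, 2.00, 8.00)
    uncached = cached_input_cost(100_000, 2.00, cached_fraction=0.0, cache_discount=0.9)
    cached = cached_input_cost(100_000, 2.00, cached_fraction=0.9, cache_discount=0.9)
    print(f"input share of request cost: {share:.1%}")
    print(f"input cost: ${uncached:.3f} uncached, ${cached:.3f} with caching")
```

Under these assumptions, input is about 97.6% of the request cost at 80K input + 500 output, and caching 90% of a 100K-token prompt at a 90% discount cuts the input cost from $0.200 to $0.038, an 81% reduction. With `cached_fraction=0.0`, the one-off case, there is no saving at all.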

