
Claude vs GPT vs Gemini for Long Context: Which Is Cheapest in 2026?

Our document analysis feature sends about 80,000 input tokens per request. Full doc, full conversation history, full prompt template. When we looked at the bill, input cost was 2.5× output cost — we were paying for every token of context we sent on every single call. The "which model is cheapest at long context" question stopped being academic.

Input pricing at the low end

For high-volume, simple long-context work — bulk summarisation, extraction, scoring — the cheap-tier numbers as of April 2026:

Gemini 2.0 Flash is $0.10/1M input. GPT-4o mini is $0.15/1M input. Claude Haiku 3 is $0.25/1M input. At 80K input tokens per request, that works out to $0.008 vs $0.012 vs $0.020 per request. Across a million requests a month, the spread is $8,000 vs $12,000 vs $20,000. We moved bulk summarisation onto Gemini Flash and that one feature's bill dropped about 40%.
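The arithmetic behind those figures is simple enough to rerun with your own volumes. A minimal sketch, using the three input prices quoted above; the function names are ours for illustration, and output tokens are deliberately left out:

```python
# Input prices in USD per 1M tokens, copied from the figures quoted above.
INPUT_PRICE_PER_MILLION = {
    "Gemini 2.0 Flash": 0.10,
    "GPT-4o mini": 0.15,
    "Claude Haiku 3": 0.25,
}


def input_cost_per_request(price_per_million: float, input_tokens: int) -> float:
    """USD spent on input tokens for a single request."""
    return price_per_million * input_tokens / 1_000_000


def monthly_input_cost(
    price_per_million: float, input_tokens: int, requests_per_month: int
) -> float:
    """USD spent on input tokens across a month of identical requests."""
    return input_cost_per_request(price_per_million, input_tokens) * requests_per_month


if __name__ == "__main__":
    for model, price in INPUT_PRICE_PER_MILLION.items():
        per_request = input_cost_per_request(price, 80_000)
        per_month = monthly_input_cost(price, 80_000, 1_000_000)
        print(f"{model}: ${per_request:.3f}/request, ${per_month:,.0f}/month")
```

Swap in your own token count and request volume to see how the spread changes; it scales linearly with both.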

Input pricing at the flagship tier

For nuanced analysis — contract review, multi-document Q&A, anything where the model has to actually reason over 100K+ tokens — the cheap tier doesn't hold up and you're back on a flagship. Gemini 2.5 Pro is $1.25/1M input. GPT-4.1 is $2.00/1M. Claude Sonnet 4.6 is $3.00/1M.

Gemini Pro wins on price. Claude has the most consistent reasoning over very long context in our tests, even though its absolute context window (200K) is smaller than Gemini's (1M–2M). GPT-4.1 sits in the middle on both. There isn't a clean "cheapest" answer at the flagship tier — it's a price-vs-quality call you have to make per workload.
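One way to frame the price-vs-quality call is to put a number on the premium: how much more each flagship costs per request than the cheapest one. A rough sketch, assuming the three input prices quoted above and a 100K-token request; the helper name is hypothetical:

```python
# Flagship input prices in USD per 1M tokens, as quoted above.
FLAGSHIP_INPUT_PRICE = {
    "Gemini 2.5 Pro": 1.25,
    "GPT-4.1": 2.00,
    "Claude Sonnet 4.6": 3.00,
}


def premium_over_cheapest(prices: dict[str, float], input_tokens: int) -> dict[str, float]:
    """Extra USD per request each model costs compared with the cheapest one."""
    costs = {model: price * input_tokens / 1_000_000 for model, price in prices.items()}
    floor = min(costs.values())
    return {model: cost - floor for model, cost in costs.items()}


if __name__ == "__main__":
    for model, extra in premium_over_cheapest(FLAGSHIP_INPUT_PRICE, 100_000).items():
        print(f"{model}: +${extra:.3f} per request over the cheapest option")
```

The question for each workload is then whether the better output is worth that premium per request, which depends on what a wrong answer costs you.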

Where we landed

We split it. Bulk document summarisation runs on Gemini 2.0 Flash: millions of docs a month, simple extraction, and a quality bar of "did it pull the right fields?" Contract analysis runs on Claude Sonnet 4.6: thousands of docs a month, and a quality bar of "would a lawyer trust this?" One wrong extraction in a legal doc cost us more than a quarter of token savings on that feature. The split dropped our long-context spend about 30% overall without any quality regression we could detect.
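In code, a split like this can be as small as a lookup table keyed by workload. The sketch below is illustrative only, not our production router: the workload keys, the model labels, and the `pick_model` helper are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str        # hypothetical label, not an official API model ID
    quality_bar: str  # the question the output has to pass


# One entry per workload, each pinned to the model that meets its quality bar.
ROUTES = {
    "bulk_summarisation": Route("gemini-flash", "did it pull the right fields?"),
    "contract_analysis": Route("claude-sonnet", "would a lawyer trust this?"),
}


def pick_model(workload: str) -> str:
    """Return the model configured for a workload, failing loudly if none is set."""
    try:
        return ROUTES[workload].model
    except KeyError:
        raise ValueError(f"no route configured for workload {workload!r}") from None
```

Failing on an unknown workload, instead of silently falling back to a default model, forces someone to decide the quality bar for each new feature.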

The lesson wasn't "Gemini is cheapest, switch everything." It was "long context is one workload type, and within it you have at least two quality bars. Match the model to the bar, not the bar to the model."

One thing the comparisons usually miss

Most cost comparisons quote the flat per-million-token rate. For long context specifically, two other things matter. First, prompt caching: sending the same large context repeatedly is much cheaper when the provider can reuse a cached prefix. Anthropic exposes this through explicit cache markers in the request, OpenAI applies it automatically to repeated prompt prefixes, and Gemini handles a version of it implicitly on Flash. For repeat-context workloads it can drop input cost 50–90%. Second, some Gemini Pro tiers charge a higher rate above 200K tokens, so check the specific pricing tier for your context size before assuming the headline number applies.
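Both effects are easy to model. The sketch below rests on simplifying assumptions: cached tokens are billed at a flat discount, cache-write surcharges and cache expiry are ignored, and the long-context rate applies to the whole prompt once it crosses the threshold. The `long_price` value in the example is a made-up placeholder, not a quoted price, so check your provider's pricing page for the real billing rules.

```python
def cached_input_cost(
    price_per_million: float,
    total_tokens: int,
    cached_tokens: int,
    cache_discount: float,
) -> float:
    """Input cost in USD when part of the prompt is served from cache.

    cache_discount is the fraction taken off cached tokens, e.g. 0.5 to 0.9
    for the 50-90% range mentioned above. Cache-write fees are ignored.
    """
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    if not 0 <= cache_discount <= 1:
        raise ValueError("cache_discount must be between 0 and 1")
    uncached_tokens = total_tokens - cached_tokens
    billed_tokens = uncached_tokens + cached_tokens * (1 - cache_discount)
    return price_per_million * billed_tokens / 1_000_000


def tiered_input_cost(
    total_tokens: int,
    base_price: float,
    long_price: float,
    threshold: int = 200_000,
) -> float:
    """Input cost in USD when prompts above `threshold` tokens use a higher rate.

    Assumption: the higher rate applies to the entire prompt, not only the
    tokens beyond the threshold.
    """
    rate = long_price if total_tokens > threshold else base_price
    return rate * total_tokens / 1_000_000


if __name__ == "__main__":
    # 80K-token request at $3.00/1M, with 70K tokens cached at a 90% discount.
    print(f"cached:   ${cached_input_cost(3.00, 80_000, 70_000, 0.9):.3f}")
    print(f"uncached: ${cached_input_cost(3.00, 80_000, 0, 0.9):.3f}")
    # long_price=2.50 is a hypothetical placeholder for the above-200K rate.
    print(f"250K prompt, tiered: ${tiered_input_cost(250_000, 1.25, 2.50):.4f}")
```

In the cached example, only 10K tokens are billed at full price and the 70K cached tokens at a tenth of it, so the request costs about a fifth of the uncached price.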

To plug your own token volumes into a side-by-side, the model comparison tool and the context window cost calculator cover the math. For cost by feature across providers — so you know which feature is the long-context spender — PerUnit fills in the picture.
