← Blog

Batch API: When the 50% Discount Is Worth It

We had a nightly job: process 50,000 documents, extract key points, store in our database. It ran at 2am. Users never saw it. We were paying full price for async work that could wait 24 hours. Then we switched to OpenAI's Batch API. 50% off. Same results. Next month we saved $1,800. When to use Batch API: any workload that doesn't need immediate response. The Batch API 50 percent discount is real — but the trade-off is latency.

OpenAI Batch API: when the 50% discount makes sense

Nightly report generation. Batch document processing. Bulk embeddings for our search index. Large-scale evaluation runs. Anything that doesn't need an immediate response. If you can wait hours, the 50% discount adds up. At $10k/month in compatible spend, you save $5k. We had about $3.6k in batch-suitable work. We saved $1.8k. OpenAI async API cost drops by half when you use Batch. The OpenAI Batch API is built for exactly this: high-volume, non-real-time workloads.

When Batch API doesn't work

User-facing chat. Real-time features. Anything where latency matters. Batch API isn't an option for those. Results can take up to 24 hours. You also need to structure your workload for batch: upload inputs, poll for completion, handle results. That's more engineering than a simple API call. When to use Batch API: only when your workload can tolerate the delay. For our nightly job, the refactor took a day. Worth it. For chat, it would have been pointless.

How we decided what to move

We listed every AI workload. Real-time? Stay on the standard API. Runs on a schedule or in the background? Batch candidate. We moved three workloads. Kept seven on real-time. The Batch API 50 percent discount only helps where you have compatible spend. We had about 20% of our total. The savings were meaningful. If you have more batch-suitable work, the OpenAI Batch API discount could save you thousands.

To estimate how much you could save, calculate your current cost for batch-suitable workloads with our AI cost calculator, then halve it. For cost by customer and feature across all workloads, PerUnit gives you the full picture.

Need cost per customer, not just totals?

PerUnit breaks down your AI spend by customer, feature, and pricing tier — so you know who to charge more, what to gate, and where to cut.

Get early access to PerUnit →