OpenAI Batch API: When Half-Price Is Worth the 24-Hour Wait
We had a nightly job that processed 50,000 documents, extracted key fields, and stored them. It ran at 2am. No user ever saw it live. We were paying full standard-API rates for work that could have waited until morning. We moved it to OpenAI's Batch API and the bill on that feature dropped by half. About $1,800/month saved on one cron job.
What Batch actually is
Batch is a separate OpenAI endpoint with two trade-offs: a flat 50% discount on input and output tokens, and a turnaround commitment of "within 24 hours" rather than seconds. You upload a JSONL file of requests and get back a JSONL file of responses whenever the job completes. No streaming. No interactive UX. Most jobs complete in well under an hour in our experience, but the SLA is 24h and you should plan for that.
The workloads that fit
Anything that runs on a schedule or in the background and doesn't need to be fresh by the second. Nightly report generation. Bulk document processing. Re-embedding a search index after a content update. Large-scale evaluation runs. Backfilling AI-generated metadata for existing rows in a database. We moved three of our workloads — the nightly doc job, weekly evaluation runs, and a monthly content-tagging refresh — to Batch and they all just ran cheaper.
The workloads that don't
Anything user-facing and synchronous. Chat. Real-time autocomplete. Anything where a person is waiting on the response. Even "near real-time" workflows with a 5-second budget aren't Batch candidates. A useful test: if you'd be embarrassed to tell a user "we'll have your answer by tomorrow," it's not for Batch.
The math, roughly
Batch is 50% off both input and output rates. So for every $10,000/month in batch-eligible spend you save $5,000. We had about $3,600 of compatible spend across our cron jobs and saved about $1,800. Most teams we talk to find 10–25% of their total OpenAI bill is genuinely batchable — almost always more than they guessed before they sat down to list the jobs.
What to do tomorrow
Open the file with all your AI calls in it. For each one, ask: "could this run overnight without anyone caring?" Mark the ones that can. Add up their current monthly cost. Halve it. That's your savings if you move them. Refactoring each job to use the Batch endpoint took us about a day per job — a small price for a recurring 50% discount on the workload it covers.
The AI cost calculator has a Batch API toggle that does the halving for you. To see which of your features and customers actually drive the batchable spend (vs the real-time spend) in the first place, that's what PerUnit is for.