OpenAI Costs Suddenly Spiked? Find the Cause in 10 Minutes
A spike has one of five causes: a loop bug, a model bump, context growth, lost cache hits, or traffic. The 10-minute diagnostic, in order, with math to confirm each.
A sudden OpenAI cost spike has one of five causes, in rough order of frequency: a retry or loop bug, a model bump, context growth, a broken cache prefix, or genuine traffic. The diagnosis runs through the usage page in about ten minutes: find the day it started, find the project or key, match the spike’s shape to its cause. If money is actively burning, revoke the key first and diagnose second.
Minutes 0-2: read the usage page
Open platform.openai.com/usage. Three reads, in order:
- Find the day. Switch to daily granularity. A spike has a start date; that date is your strongest clue.
- Find the model. Group by model. A new model name appearing in the mix, or an old one’s share jumping, points straight at a config change.
- Find the project. Group by project to see which app is spending. If all apps share one key, note that as post-incident homework: per-project keys are what make this step take ten seconds instead of an hour. Capping spend per app covers the setup.
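If the usage data is exported rather than read in the dashboard, the same three reads can be scripted. The sketch below assumes a hypothetical row format (dicts with `date`, `model`, `project`, and `cost_usd` keys) and a simple "more than double the median of earlier days" rule for the start date; neither is an OpenAI export format or an official threshold.

```python
from collections import defaultdict
from statistics import median

def spend_by(rows, key):
    """Sum cost_usd per distinct value of `key` ("date", "model", or "project")."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["cost_usd"]
    return dict(totals)

def first_spike_day(daily, factor=2.0):
    """Return the first date whose spend exceeds `factor` times the median
    of all earlier days, or None if no day qualifies."""
    dates = sorted(daily)  # ISO dates sort correctly as strings
    for i in range(1, len(dates)):
        baseline = median(daily[d] for d in dates[:i])
        if baseline > 0 and daily[dates[i]] > factor * baseline:
            return dates[i]
    return None

# Hypothetical rows for illustration only.
rows = [
    {"date": "2026-06-01", "model": "gpt-5", "project": "summarizer", "cost_usd": 40.0},
    {"date": "2026-06-02", "model": "gpt-5", "project": "summarizer", "cost_usd": 40.0},
    {"date": "2026-06-03", "model": "gpt-5", "project": "summarizer", "cost_usd": 40.0},
    {"date": "2026-06-04", "model": "gpt-5.5", "project": "summarizer", "cost_usd": 140.0},
]

daily = spend_by(rows, "date")
spike_day = first_spike_day(daily)
by_model = spend_by(rows, "model")
by_project = spend_by(rows, "project")
print(spike_day, by_model, by_project)
```

A new model name in `by_model` on the spike day is the same signal as a new name appearing in the dashboard's model grouping.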
Minutes 2-5: find the deploy
Take the start date to your deploy log, merged PRs, and cron schedule. You are looking for anything that shipped within a day of the spike: a dependency bump, a prompt refactor, a new retry wrapper, a feature flag, a model default. Costs do not spike on their own; something shipped, looped, or switched models. No deploy near the date usually means traffic or accumulating state, which the next step confirms.
Minutes 5-10: match the shape to the cause
| Spike shape | Likely cause | Confirm by |
|---|---|---|
| Vertical cliff, off-hours | Retry storm or loop bug | Request count by hour; error rate |
| Step up on a deploy day | Model bump or prompt change | Per-model usage split; config diff |
| Gradual ramp over weeks | Traffic or context growth | Tokens per request trend vs request count |
| Input-heavy jump | Cache hits lost | Cached vs uncached input token split |
| Output-heavy jump | Reasoning effort or max_tokens | Output-to-input ratio per model |
Two of these deserve their own articles: tokens-per-request creep is dissected in the bill diagnostic checklist, and the output-heavy case is usually reasoning tokens billing as output, where a three-line visible answer carries thousands of paid thinking tokens behind it.
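The "confirm by" column comes down to three ratios, which can be compared for a day before the spike and a day after it. This is a rough sketch over hypothetical daily aggregates; the field names are made up for illustration, and it assumes `input_tokens` already includes the cached portion.

```python
def shape_metrics(day):
    """Compute the three ratios used to match a spike's shape to a cause.

    `day` is a hypothetical daily aggregate with keys: requests,
    input_tokens (cached tokens included), cached_input_tokens, output_tokens.
    """
    total_in = day["input_tokens"]
    return {
        # Rising while request count stays flat suggests context growth.
        "tokens_per_request": (total_in + day["output_tokens"]) / day["requests"],
        # Rising suggests reasoning effort or max_tokens changed.
        "output_to_input": day["output_tokens"] / total_in,
        # Falling toward zero suggests the cache prefix broke.
        "cached_share": day["cached_input_tokens"] / total_in,
    }

# Illustrative numbers: same traffic, but cache hits disappear after a deploy.
before = {"requests": 2000, "input_tokens": 16_000_000,
          "cached_input_tokens": 12_000_000, "output_tokens": 2_000_000}
after = dict(before, cached_input_tokens=0)

m_before = shape_metrics(before)
m_after = shape_metrics(after)
print(m_before)
print(m_after)
```

Here only `cached_share` moves, which matches the "cache hits lost" row: request count and tokens per request are unchanged.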
The two most expensive one-liners
Both examples are worked with June 2026 prices, quoted in dollars per million tokens (current prices are at openai.com/api/pricing).
The model bump. A summarizer runs 2,000 jobs a day at 8,000 input and 1,000 output tokens per job. With token counts expressed in millions and a 30-day month:
GPT-5 (1.25/10): 0.008 x $1.25 + 0.001 x $10 = $0.02/job -> $40/day -> $1,200/mo
GPT-5.5 (5/30): 0.008 x $5 + 0.001 x $30 = $0.07/job -> $140/day -> $4,200/mo
One config line, 3.5x the bill at this token mix. The multiplier depends on the mix: input is 4x the price and output is 3x, so any workload lands somewhere between 3x and 4x.
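The same arithmetic as a small function, useful for plugging in a different token mix or job count. The prices are the article's June 2026 figures in dollars per million tokens; the function name and dictionary layout are illustrative choices, not anything from an SDK.

```python
# (input price, output price) in USD per million tokens, June 2026 figures from the text.
PRICES = {
    "gpt-5": (1.25, 10.0),
    "gpt-5.5": (5.0, 30.0),
}

def job_cost(model, input_tokens, output_tokens):
    """Cost in USD of one job on `model`, ignoring cached-input discounts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

JOBS_PER_DAY = 2000
DAYS_PER_MONTH = 30

old = job_cost("gpt-5", 8000, 1000)
new = job_cost("gpt-5.5", 8000, 1000)
multiplier = new / old

for name, cost in (("gpt-5", old), ("gpt-5.5", new)):
    monthly = cost * JOBS_PER_DAY * DAYS_PER_MONTH
    print(f"{name}: ${cost:.2f}/job, ${monthly:,.0f}/mo")
print(f"multiplier: {multiplier:.1f}x")
```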
The cache break. The same job with 6,000 of its 8,000 input tokens in a stable cached prefix bills input at $0.125 per million for the cached part:
cached: 0.002 x $1.25 + 0.006 x $0.125 + 0.001 x $10 = $0.01325/job
uncached: 0.008 x $1.25 + 0.001 x $10 = $0.02/job
A prompt refactor that reorders the prefix (a timestamp moved to the top is the classic) silently raises per-job cost by 51% with zero visible change in behavior. How prefix caching works, and how to structure prompts so it keeps working, is in OpenAI prompt caching explained.
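The cache-break math can be checked the same way. This sketch uses the article's GPT-5 figures ($1.25 input, $0.125 cached input, $10 output per million tokens) as defaults; the function name is hypothetical.

```python
def job_cost_with_cache(input_tokens, cached_tokens, output_tokens,
                        input_price=1.25, cached_price=0.125, output_price=10.0):
    """Cost in USD of one job when `cached_tokens` of the input hit the cache.

    Prices are USD per million tokens; defaults are the GPT-5 figures from the text.
    """
    uncached_tokens = input_tokens - cached_tokens
    total = (uncached_tokens * input_price
             + cached_tokens * cached_price
             + output_tokens * output_price)
    return total / 1_000_000

with_cache = job_cost_with_cache(8000, 6000, 1000)   # stable 6,000-token prefix
without_cache = job_cost_with_cache(8000, 0, 1000)   # prefix reordered, no hits
increase_pct = (without_cache / with_cache - 1) * 100

print(f"cached:   ${with_cache:.5f}/job")
print(f"uncached: ${without_cache:.5f}/job")
print(f"increase: {increase_pct:.0f}%")
```

The larger the cached share of the prompt, the bigger the jump when the prefix stops matching.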
Stop the bleed, then prevent the rerun
During the incident: revoke or rotate the spiking key, ship the fix, restore the key. Spend stops the moment the key dies, which is why key-per-app is incident response infrastructure, not bookkeeping.
After it: set monthly budgets and alert thresholds on every project, cap retries in code with exponential backoff and a max attempt count, and put hard per-app dollar caps on anything that runs unattended. The full containment stack, including caps that enforce at request time, is in how to cap OpenAI API spending.
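As a rough illustration of "cap retries in code", here is a minimal retry wrapper with exponential backoff, jitter, and a hard attempt limit. It is a generic sketch, not an OpenAI SDK feature: `call_with_backoff` and its parameters are made-up names, and it retries on any exception for brevity, whereas real code would usually retry only on transient errors such as rate limits or timeouts.

```python
import random
import time

def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn() and return its result, retrying on exceptions.

    Makes at most `max_attempts` calls in total. The wait before each retry
    doubles from `base_delay` up to `max_delay`, plus up to 50% random jitter.
    `sleep` is injectable so the wrapper can be tested without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # attempt budget spent: surface the error instead of looping
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))

# Usage: a call that fails twice, then succeeds on the third attempt.
attempts = []

def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = call_with_backoff(flaky, max_attempts=5, sleep=lambda seconds: None)
print(result, len(attempts))
```

The attempt limit is the part that protects the bill: without it, a permanently failing call retries forever, and each retry may still be billed.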
The structural version of “this can never happen again” is moving loop-prone bulk workloads onto flat-cost capacity, where the worst case is an exhausted usage window instead of an open-ended invoice. The calculator shows what that looks like for your current bill.
Frequently asked questions
Why did my OpenAI API costs suddenly spike?
Five causes account for nearly every spike: a retry or loop bug, a model change (someone bumped the default to a pricier tier), context growth in an agent or chat history, a prompt refactor that broke cached-input pricing, or genuine traffic growth. The usage page at platform.openai.com narrows it to one in about ten minutes.
How do I find which API key is spending the most?
Open the usage page on platform.openai.com and group spend by project. Each app should have its own project and key so the spike is attributable; if everything shares one key, that is the first thing to fix after the incident.
Can a model change alone triple my bill?
Yes. Moving a workload from GPT-5 ($1.25/$10 per million tokens) to GPT-5.5 ($5/$30) multiplies the same job's cost by between 3x and 4x depending on the input-output mix, and by 3.5x at an 8:1 input-to-output ratio. A one-line default-model change is one of the two most expensive diffs in production AI.
How do I stop a runaway OpenAI loop right now?
Revoke or rotate the key the loop is using; its requests fail immediately and the spend stops. Then fix the retry logic with capped backoff, and put a budget and per-app caps in place so the next incident stops itself.