How to Cap OpenAI API Spending Before It Caps You

Set OpenAI's budget alerts and monthly limits, then add hard per-app caps with scoped sub-keys. A runaway GPT-5.5 loop burns about $576 an hour; here is the containment stack.

You cap OpenAI API spending at two levels: OpenAI’s own monthly budgets with alert thresholds, set per organization and per project, and hard per-app dollar caps enforced at request time through a gateway key. The first level is built in and takes five minutes to configure. The second is what actually stops a runaway loop mid-flight instead of emailing you about it afterward.

How fast a runaway loop spends

The case for caps is arithmetic. Take a cron job with a broken retry: it re-sends a failed GPT-5.5 call in a tight loop, 2 calls per second, 10,000 input and 1,000 output tokens per call. At June 2026 rates ($5 input / $30 output per million tokens, live prices at openai.com/api/pricing):

cost per call = 0.01 x $5 + 0.001 x $30 = $0.05 + $0.03 = $0.08
2 calls/second = $0.16/second
Loop runs for     Cost
1 minute          $9.60
1 hour            $576
Overnight (8h)    $4,608
24 hours          $13,824

A runaway GPT-5.5 retry loop at two calls per second burns about $576 an hour. Overnight incidents like this are the standard origin story behind a suddenly high OpenAI bill.
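To rerun the table with your own traffic shape, the arithmetic above fits in a few lines. This is a minimal sketch: the rates and token counts are the example figures from this section, not live prices, and the function names are ours for illustration.

```python
# Example figures from this section; check current pricing before relying on them.
INPUT_USD_PER_MTOK = 5.00    # dollars per million input tokens
OUTPUT_USD_PER_MTOK = 30.00  # dollars per million output tokens


def cost_per_call(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at the rates above."""
    return (input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
            + output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK)


def runaway_cost(seconds: float, calls_per_second: float = 2.0,
                 input_tokens: int = 10_000, output_tokens: int = 1_000) -> float:
    """Dollar cost of a loop that runs unchecked for `seconds`."""
    return seconds * calls_per_second * cost_per_call(input_tokens, output_tokens)


if __name__ == "__main__":
    durations = [("1 minute", 60), ("1 hour", 3600),
                 ("Overnight (8h)", 8 * 3600), ("24 hours", 24 * 3600)]
    for label, seconds in durations:
        print(f"{label:<15} ${runaway_cost(seconds):,.2f}")
```

Swap in your own call rate and token counts to see what an unattended job could cost before anyone notices.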

What OpenAI’s own controls give you

Configure these first; they are free and live in the platform settings at platform.openai.com:

  • Monthly budget. A dollar amount per organization, and per project, after which requests fail for the month. Set it just above a normal month, not at your annual pain threshold.
  • Alert thresholds. Email notifications at amounts you choose below the budget. Useful as tripwires: an alert at 50% of budget on day 5 is a fire alarm.
  • Projects and per-project keys. Each app gets its own project, key, budget, and rate limits, so spend is attributable and an incident is contained to one project.
  • The usage page. Spend by day, model, and project. This is where you diagnose an incident; the diagnostic checklist walks through the full read.

Two honest limitations. Metering lags real time by a few minutes, so a hot loop can overshoot a budget slightly before enforcement lands. And budgets are monthly: a loop that burns half the monthly budget on day 2 was “within limits” the whole time. An alert tells you the money is going; a cap is what keeps it.
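One way to close the "within limits the whole time" gap is to compare spend against a pro-rata pace rather than the full monthly number. The sketch below assumes you already have month-to-date spend from the usage page or your own logs; the function names and the 2x threshold are illustrative choices, not an OpenAI feature.

```python
import calendar
from datetime import date


def pace_ratio(spent_usd: float, monthly_budget_usd: float, today: date) -> float:
    """Month-to-date spend divided by the linear pro-rata share of the budget.

    1.0 means exactly on pace; 3.0 means spending three times too fast.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    expected_so_far = monthly_budget_usd * today.day / days_in_month
    return spent_usd / expected_so_far


def is_fire_alarm(spent_usd: float, monthly_budget_usd: float, today: date,
                  threshold: float = 2.0) -> bool:
    """True when spend runs at `threshold` times the pro-rata pace or faster.

    The 2.0 default is an arbitrary illustration; tune it to your traffic.
    """
    return pace_ratio(spent_usd, monthly_budget_usd, today) >= threshold
```

Half of a $1,000 budget gone by day 5 of a 30-day month is a pace ratio of 3.0, which trips the check; the same $500 on day 15 is exactly on pace and does not.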

Per-app hard caps with scoped sub-keys

The gap in OpenAI’s controls is granularity and immediacy, and a gateway key fills it. With ProxyLLM, we issue scoped sub-keys under one account: one key per app, per client, or per environment, each with its own hard dollar budget enforced at request time. When a key hits its cap, that key stops; everything else keeps running.

What that looks like operationally:

  • A staging key capped at $20 a month, so a load test can never become an invoice.
  • One key per client with a cap matched to their retainer, so a single runaway workflow cannot eat the shared capacity.
  • A request log per key, so “what did app X spend this week” is a filter, not a spreadsheet.
  • Instant revoke, which is the kill switch during an incident.

Sub-keys work on our free Starter tier with your own OpenAI key as a passthrough lane: no inference markup, keys encrypted, logs and caps included. The cap enforcement happens before the request reaches OpenAI, which removes the metering-lag problem for the keys you scope.
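To make "enforced at request time" concrete, here is a toy in-memory version of the idea: check the key's remaining budget before forwarding, and record the actual cost afterward. It is a sketch of the general technique only, not ProxyLLM's implementation; `SubKey`, `KeyBlocked`, `guarded_call`, and the `send` callable are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable


class KeyBlocked(Exception):
    """Raised when a sub-key is revoked or would exceed its cap."""


@dataclass
class SubKey:
    name: str
    cap_usd: Decimal
    spent_usd: Decimal = Decimal("0")
    revoked: bool = False

    def authorize(self, worst_case_usd: Decimal) -> None:
        """Refuse the request up front if it could push the key past its cap."""
        if self.revoked:
            raise KeyBlocked(f"{self.name}: key revoked")
        if self.spent_usd + worst_case_usd > self.cap_usd:
            raise KeyBlocked(f"{self.name}: cap of ${self.cap_usd} reached")

    def record(self, actual_usd: Decimal) -> None:
        """Add the real cost of a completed request to the key's ledger."""
        self.spent_usd += actual_usd


def guarded_call(key: SubKey, worst_case_usd: Decimal,
                 send: Callable[[], Decimal]) -> Decimal:
    """Check the cap, then forward. `send` is a stand-in for the upstream
    request and is assumed to return the request's actual dollar cost."""
    key.authorize(worst_case_usd)  # raises before anything leaves the gateway
    actual_usd = send()
    key.record(actual_usd)
    return actual_usd
```

With a $20 cap and $0.08 calls, 250 requests go through and the next one is refused before it is sent. Decimal is used here so the ledger adds up exactly; a production gateway would also need shared, durable storage and handling for concurrent requests, which this sketch leaves out.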

The structural cap: flat-cost capacity

Budgets and caps limit damage on a meter. The stronger version is removing the meter from bulk workloads: a ChatGPT plan billed flat and run through Codex Hosted cannot bill more than the subscription costs. The plan’s usage windows become the limit, your API key stays connected as the overflow lane, and worst-case monthly spend becomes the plan price plus whatever overflow you allow, rather than an unbounded figure.

The full meter-vs-flat arithmetic lives in the API vs subscription cost comparison; capping is one of several levers ranked in how to reduce OpenAI API costs.

The containment checklist

  1. Set a monthly budget per organization and per project, just above normal.
  2. Add alert thresholds at 50% and 80% of each budget.
  3. Split apps onto separate projects and keys.
  4. Issue scoped sub-keys with hard per-app dollar caps for anything that loops, retries, or runs unattended.
  5. Cap retries in code: exponential backoff with a maximum attempt count, never a bare while-loop.
  6. Move bulk, loop-heavy workloads onto flat capacity so the worst case is a window, not a bill.
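
Step 5 is the one that lives in your own code. A minimal sketch of a bounded retry, assuming a generic callable `fn` that raises on failure; the helper name and default values are illustrative, and real code should catch only the errors that are worth retrying (timeouts, rate limits) rather than every exception.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 1.0, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying with exponential backoff, at most max_attempts tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:  # narrow this to retryable errors in real code
            if attempt == max_attempts:
                raise  # give up: the loop ends here instead of spending forever
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

The important property is the hard ceiling: a permanently failing call costs at most `max_attempts` requests, not two per second until morning.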

If you want the per-key caps and logs without changing anything else, our free tier sets up in a few minutes, and the calculator shows what the flat lane would absorb.

Frequently asked questions

Can you set a hard spending limit on the OpenAI API?

Yes. In the OpenAI platform settings you can set a monthly budget for the organization and for each project, with email alerts at thresholds you choose. Metering lags real time by a few minutes, so a fast loop can overshoot slightly before the budget bites, which is why per-app caps at the gateway are worth adding.

Does OpenAI stop requests when you hit your budget?

When the monthly budget is reached, API requests start failing for that organization or project. Alert thresholds below the budget only send email; they do not stop traffic. An alert tells you the money is going; a cap is what keeps it.

How do I limit OpenAI spending per app or per client?

Two layers work together: give each app its own OpenAI project and key so spend is attributable, then issue scoped sub-keys through a gateway with a hard dollar cap per key. ProxyLLM sub-keys enforce per-key budgets at request time and log every request per key, on the free tier with your own OpenAI key.

What is the fastest way to stop a runaway OpenAI loop?

Revoke or rotate the API key the loop is using; requests fail immediately. Then fix the retry logic, set a budget below your pain threshold, and split apps onto separate keys so the next incident is contained to one app instead of the whole account.
