Fallback Lanes: How Codex Hosted Survives Usage Limits
Codex Hosted orders your credentials into lanes: ChatGPT accounts first, your API key last. How failover works, what the request log shows, and how to size the stack.
A fallback lane is the credential Codex Hosted uses next when the one above it hits a usage limit. Lanes run in a fixed order: your primary ChatGPT account, then a second connected account if you have one, then your own OpenAI API key. When a plan window resets, traffic moves back up. A request never waits on a limit and never fails because of one; it runs on the next lane down, and the request log records which lane served it.
That is the summary. The rest of this page is the mechanics: what a lane is, exactly what happens at the moment a limit hits, how to read the per-lane log, and how to size the stack from observed traffic instead of guesses.
What a lane is
Every credential you connect becomes a lane, and there are two kinds.
Codex lanes are ChatGPT accounts. Each one signs in through OpenAI’s device-code flow into its own isolated container, and the work it serves bills to that account’s flat plan. Capacity arrives in rolling usage windows that OpenAI sets per plan; we deliberately avoid quoting window numbers because they drift, and OpenAI publishes current limits on its pricing page. Responses on these lanes arrive complete rather than streamed.
API-key lanes are your own OpenAI keys, passed through with no markup and encrypted at rest with AES-256-GCM. They are metered per token, have no usage window, and stream normally.
The two kinds fail in opposite ways. A subscription lane runs out on a schedule and comes back on a schedule. A metered lane never runs out and never stops charging. Ordering subscription lanes above the key produces the policy the whole product is built on: flat where possible, metered only when necessary.
The failover order
lane 1 Codex · account A flat window A primary
lane 2 Codex · account B flat window B optional
lane 3 OpenAI API key metered no window last resort
Every request enters at the highest lane that currently has capacity. When OpenAI signals that account A’s window is exhausted, the gateway marks lane 1 as cooling down, runs the triggering request on lane 2, and keeps routing there until lane 1’s window resets. If lane 2 exhausts too, traffic runs metered on lane 3.
Two details matter in production:
- The triggering request survives. The request that discovers an exhausted window is re-run on the next lane, not bounced back to your app as an error.
- Recovery is automatic. You never re-enable a lane after a reset. The gateway moves traffic back up to the cheapest available lane on its own.
How OpenAI’s windows behave, including the weekly components on some plans, is covered in Codex usage limits, explained.
What your app sees during failover
The same endpoint and the same response shape. Your code does not handle lanes; it sends OpenAI-format requests to one base URL and gets completions back.
The one behavioral difference between lanes is delivery. Codex lanes return complete responses; API-key lanes stream when a client asks for streaming. Backend jobs, agents, and scheduled pipelines rarely notice. If something in your stack genuinely needs token-by-token output, keep it on an API-key lane and let the bulk traffic ride the subscription lanes.
Reading the per-lane request log
Every entry in the request log names the lane that served it, alongside the API-equivalent value of the call. Three reads matter in your first week:
- Overflow share. The fraction of API-equivalent value served by the key lane. This is your sizing signal, priced in real dollars.
- Time-of-day clustering. Bursts that exhaust a window early leave a visible pattern: lane 1 in the morning, lane 3 by afternoon. That pattern argues for a second account more strongly than raw volume does.
- Recovery. Traffic returning to lane 1 after each reset confirms the stack is healthy.
Overflow shows up in the log the same day it happens, weeks before it shows up on an invoice.
Sizing the stack
Our planning estimates for what each plan absorbs per month: Plus roughly $700 of API-equivalent work, Pro 5x roughly $3,500, Pro 20x roughly $14,000. Estimates, never guarantees.
| Traffic profile | Lane stack | Fixed cost per month |
|---|---|---|
| Under ~$150 API spend | API key only (free tier) | $0, stay metered |
| ~$250 to $700, steady | Plus + API key | $20 + $129 |
| ~$700 to $3,500, mixed | Pro 5x + API key | $100 + $129 |
| ~$3,500 to $7,000, bursty | Pro 5x + Pro 5x + API key | $200 + $129 |
| Sustained heavy volume | Pro 20x + API key | $200 + $129 |
The bursty row deserves a note. Two Pro 5x windows cost the same $200 as one Pro 20x but fail independently, and independent windows rarely exhaust together. For spiky traffic, two medium windows often beat one large one even though the combined capacity estimate is lower.
The rule of thumb for adding a lane: when the log shows overflow regularly costing more than a plan’s subscription price at API rates, the plan pays for itself. $180 of key-lane overflow in a month is a $20 Plus account waiting to be connected. Strategies for running several accounts cleanly are in Codex with multiple accounts.
Rules that apply to every lane
Each connected ChatGPT account must be your own, with its own paid subscription, used for your own workloads. One account, one container, never pooled with anyone else’s, and we never see your password; sign-in happens between you and OpenAI. Programmatic Codex use is documented, intended functionality, and OpenAI has the final call over its services. The broader limit-day picture, including OpenAI’s on-demand credits as a manual relief valve, is in what happens when you hit your Codex limit.
Connect your lanes once with the setup guide, watch the log for a week, and size from what you see. If you have not mapped your current bill to a lane stack yet, the calculator does it in thirty seconds.
Frequently asked questions
What is a fallback lane in Codex Hosted?
A lane is one connected credential: a ChatGPT account running Codex in its own isolated container, or an OpenAI API key. Lanes form an ordered list. Requests run on the highest lane with capacity, and when a plan's usage window is exhausted, traffic moves to the next lane automatically.
What order do fallback lanes run in?
Subscription lanes first, metered last. Your primary ChatGPT account serves traffic until its window is exhausted, then a second connected account takes over if you have one, then your own OpenAI API key. When a subscription window resets, traffic moves back up the list.
Do requests fail when a Codex usage limit hits?
No. The gateway detects the exhausted window and re-runs the request on the next lane instead of returning an error. Your app sees a normal response, and the request log records which lane served it.
How do I see which lane served a request?
Every entry in the ProxyLLM request log names its lane, account A, account B, or API key, along with the API-equivalent value of the call. Filtering by lane shows exactly how much traffic overflowed in any period.
How many lanes do I need?
Start with one ChatGPT account plus an API key as the last lane. After a week, check the log: if overflow onto the key regularly costs more than a plan's subscription price at API rates, adding that plan as a lane pays for itself.