Why Agent Workloads Flip the API-vs-Subscription Math
One agent task is 5 to 50 model calls, and the context grows and is re-sent at every step. Worked loop math showing where per-token billing breaks and flat plan windows win.
A chat completion is one call; an agent task is 5 to 50. Per-token pricing bills every plan step, tool call, retry, and review pass, and each step re-sends the growing context as fresh input tokens. A subscription window does not care how many calls a task took. That single difference flips the API-vs-subscription math harder for agents than for any other workload.
Here is the multiplier, a worked loop with real numbers, and where the flat-window model honestly strains.
Where the multiplier comes from
An agent loop spends tokens in four ways a single completion never does:
- Step count. Plan, call a tool, read the result, decide, repeat. Five steps is a modest agent; thirty is unremarkable.
- Context regrowth. Step 9 re-sends steps 1 through 8 as input. Each call's input grows linearly with the step count, so total input tokens across a task grow roughly with the square of the number of steps, not linearly.
- Tool chatter. Search results, fetched pages, and API payloads enter the context and get re-sent on every subsequent call.
- Retries and review. Failed parses, self-checks, and critic passes are extra full-context calls.
The user sees a 150-word email; the meter sees more than eighty thousand tokens. The formula for estimating this for your own loops is in how to calculate AI agent costs.
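Here is a minimal sketch of the context-regrowth effect. It uses a deliberately simple assumption: every call starts from the same base prompt, and each step appends a fixed number of tokens. Call k then re-sends the base plus everything the k earlier steps added, so total input over n calls is n·base + d·n(n−1)/2. The function names and the 1,200/1,400 token figures are illustrative, not measurements.

```python
def total_input_tokens(steps: int, base: int, added_per_step: int) -> int:
    """Sum input tokens when call k (0-indexed) re-sends the base prompt
    plus the tokens appended by the k earlier steps."""
    return sum(base + k * added_per_step for k in range(steps))


def closed_form(steps: int, base: int, added_per_step: int) -> int:
    """Same total via the arithmetic series: n*base + d*n*(n-1)/2."""
    return steps * base + added_per_step * steps * (steps - 1) // 2


if __name__ == "__main__":
    # Hypothetical loop: 1,200-token base prompt, ~1,400 tokens appended per step.
    for n in (5, 10, 20, 40):
        print(f"{n:>2} steps -> {total_input_tokens(n, 1_200, 1_400):>9,} input tokens")
```

With these assumed numbers, 10 steps send 75,000 input tokens, in the same range as the worked loop below. Twenty steps send 290,000, so doubling the step count nearly quadruples the input.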
A worked agent loop
Task: research a company and draft an outreach email. Ten steps, context accumulating, on GPT-5.4 (OpenAI June 2026 list: $2.50/M input, $15/M output).
| Step | Input tokens | Output tokens |
|---|---|---|
| 1. Plan | 1,200 | 300 |
| 2. Search query | 1,800 | 150 |
| 3. Read results | 4,500 | 400 |
| 4. Fetch website | 5,200 | 150 |
| 5. Read page | 9,000 | 500 |
| 6. CRM lookup | 9,800 | 120 |
| 7. Synthesize notes | 11,000 | 700 |
| 8. Draft email | 12,200 | 450 |
| 9. Critic pass | 13,000 | 350 |
| 10. Final revision | 13,600 | 400 |
| Total | 81,300 | 3,520 |
Input: 81,300 × $2.50/M = $0.203
Output: 3,520 × $15/M = $0.053
Cost per task ≈ $0.26
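The same per-task arithmetic is written out below so you can swap in your own step log. The step list copies the table above. The per-million prices are the list figures quoted for this example. They are treated as parameters rather than constants, because your model and prices may differ.

```python
# Token counts per step, copied from the worked-loop table.
STEPS = [
    ("Plan", 1_200, 300),
    ("Search query", 1_800, 150),
    ("Read results", 4_500, 400),
    ("Fetch website", 5_200, 150),
    ("Read page", 9_000, 500),
    ("CRM lookup", 9_800, 120),
    ("Synthesize notes", 11_000, 700),
    ("Draft email", 12_200, 450),
    ("Critic pass", 13_000, 350),
    ("Final revision", 13_600, 400),
]

INPUT_PER_M = 2.50    # USD per million input tokens (list price quoted above)
OUTPUT_PER_M = 15.00  # USD per million output tokens


def task_cost(steps, input_per_m=INPUT_PER_M, output_per_m=OUTPUT_PER_M):
    """Return (input_tokens, output_tokens, cost_usd) for one agent task."""
    tin = sum(i for _, i, _ in steps)
    tout = sum(o for _, _, o in steps)
    cost = tin * input_per_m / 1_000_000 + tout * output_per_m / 1_000_000
    return tin, tout, cost


if __name__ == "__main__":
    tin, tout, cost = task_cost(STEPS)
    print(f"{tin:,} in, {tout:,} out, ${cost:.4f} per task")
```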
Twenty-six cents sounds harmless until you multiply by volume:
| Tasks/day | API cost/mo (30 days) | Subscription path (estimate) |
|---|---|---|
| 50 | ~$390 | Plus $20 + $129 = $149 (window ≈ $700 equiv.) |
| 300 | ~$2,300 | Pro 5x $100 + $129 = $229 (≈ $3,500 equiv.) |
| 1,000 | ~$7,700 | Pro 20x $200 + $129 = $329 (≈ $14,000 equiv.) |
Capacity figures are our planning estimates, never guarantees; OpenAI sets and adjusts the underlying limits. The 30-day always-on version of this table, modeled at three volumes, is in what a 24/7 AI agent actually costs.
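The price columns of the volume table can be reproduced with the sketch below. It assumes a 30-day month and uses the unrounded per-task cost. On that basis the 50-task row comes to about $384. The table's ~$390 uses the rounded $0.26 (1,500 tasks × $0.26). The sketch compares prices only. Whether a given window actually absorbs that volume is the separate capacity estimate in the third column.

```python
COST_PER_TASK = 0.25605  # USD, unrounded result of the worked loop
DAYS_PER_MONTH = 30      # assumption: 30-day month
HOSTING_FEE = 129        # flat fee from the subscription-path column

# Subscription paths from the table: tasks/day -> (plan label, plan price in USD).
PATHS = {50: ("Plus", 20), 300: ("Pro 5x", 100), 1_000: ("Pro 20x", 200)}


def monthly_api_cost(tasks_per_day, cost_per_task=COST_PER_TASK, days=DAYS_PER_MONTH):
    """Per-token API spend for a month at a steady daily task volume."""
    return tasks_per_day * cost_per_task * days


def subscription_cost(plan_price, hosting_fee=HOSTING_FEE):
    """Flat monthly cost of a plan plus the hosting fee."""
    return plan_price + hosting_fee


if __name__ == "__main__":
    for volume, (plan, price) in PATHS.items():
        api = monthly_api_cost(volume)
        sub = subscription_cost(price)
        print(f"{volume:>5}/day  API ${api:>8,.0f}  {plan} path ${sub}  ratio {api / sub:.1f}x")
```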
The meter taxes exactly what makes agents good
Retries, self-checks, and one more tool call are what turn a mediocre agent into a reliable one, and on per-token billing each of those quality moves costs real money. Teams respond predictably: trim the critic pass, cap the loop at five steps, cache aggressively, accept worse output to protect the budget.
On a meter, every retry is a cost decision; on a window, it is just a retry. When every additional call is free until the window runs out, you tune the agent for outcome quality instead of token thrift. That, more than the headline savings, is why agent builders move to subscription-backed lanes: windows price the task, meters price every step of it. MCP-style tool loops sharpen the same effect, which we cover in MCP tool loops and the cost of agency.
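We can put a number on "trim the critic pass" using the worked loop. Step 9 sends 13,000 input tokens and returns 350 output tokens. The sketch below prices that one full-context call. It assumes the same list prices and a 30-day month, and `call_cost` is an illustrative helper name.

```python
def call_cost(input_tokens, output_tokens, input_per_m=2.50, output_per_m=15.00):
    """USD cost of a single model call at per-million-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


critic = call_cost(13_000, 350)        # step 9 of the worked loop
per_task_share = critic / 0.25605      # fraction of the full task cost
monthly_at_1000 = critic * 1_000 * 30  # 1,000 tasks/day for 30 days

print(f"critic pass: ${critic:.4f} per task ({per_task_share:.0%} of the task)")
print(f"at 1,000 tasks/day: ${monthly_at_1000:,.0f}/month")
```

The critic pass works out to about 3.8 cents, roughly 15% of the task's cost. At 1,000 tasks a day that is about $1,130 a month, which makes it an obvious target once a team is watching the meter.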
Where the window model honestly strains
Three limits, stated plainly:
- Windows run out. A bursty fleet of agents can hit a plan’s rolling limits mid-run, which is why production setups keep a second connected account and an API key as ordered fallback lanes; we cover the failover mechanics separately.
- Responses arrive complete, not streamed. That is ideal for agents, which need the full answer before acting, and wrong for user-facing chat UIs, which should keep a streaming API-key lane.
- The model surface is whatever Codex serves. Embeddings and fine-tuned models stay on your own API key.
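Ordered fallback lanes come down to one pattern. You try each lane in priority order and move to the next one when a lane reports that its window is exhausted. This is a hypothetical sketch, not Codex Hosted's actual failover code. `WindowExhausted`, the lane names, and the simulated lane functions are placeholders for whatever your real clients raise and call.

```python
from typing import Callable, List, Sequence, Tuple


class WindowExhausted(Exception):
    """Hypothetical error a lane raises when its rolling usage window is spent."""


Lane = Tuple[str, Callable[[str], str]]


def run_with_fallback(prompt: str, lanes: Sequence[Lane]) -> Tuple[str, str]:
    """Send the prompt to each lane in priority order; return (lane_name, reply)
    from the first lane that does not report an exhausted window."""
    exhausted: List[str] = []
    for name, send in lanes:
        try:
            return name, send(prompt)
        except WindowExhausted:
            exhausted.append(name)  # note it and fall through to the next lane
    raise RuntimeError(f"all lanes exhausted: {exhausted}")


# Simulated lanes for the demo; real lanes would wrap your actual clients.
def _primary_account(prompt: str) -> str:
    raise WindowExhausted("primary window spent mid-run")


def _second_account(prompt: str) -> str:
    return f"second account handled: {prompt}"


def _api_key(prompt: str) -> str:
    return f"API key handled: {prompt}"


DEMO_LANES: List[Lane] = [
    ("primary-account", _primary_account),
    ("second-account", _second_account),
    ("api-key", _api_key),
]

if __name__ == "__main__":
    print(run_with_fallback("draft the outreach email", DEMO_LANES))
```

Only the window-exhausted signal triggers failover. Any other error propagates, so a malformed request is not quietly replayed on the paid API-key lane.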
Sizing a subscription for agent traffic
Measure before you size: instrument calls per task and tokens per call for a week, or run a representative day through the calculator against your actual API bill. Loops under roughly 100 tasks a day tend to fit a Plus window on our estimates; steady multi-hundred-task fleets belong on Pro tiers with a fallback lane configured.
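A week of instrumentation can be as simple as one record per model call, keyed by task. The sketch below is hypothetical. `UsageLog` and `suggest_plan` are illustrative names, and the tier thresholds are the planning estimates from the volume table (about $700, $3,500, and $14,000 of API-equivalent use per month), not published limits.

```python
from collections import defaultdict
from statistics import mean

# Planning estimates from the volume table: API-equivalent USD/month each
# window is assumed to absorb. Estimates, not guarantees.
WINDOW_EQUIV = [("Plus", 700), ("Pro 5x", 3_500), ("Pro 20x", 14_000)]


class UsageLog:
    """Hypothetical per-call log: one record per model call, grouped by task id."""

    def __init__(self, input_per_m=2.50, output_per_m=15.00):
        self.calls = defaultdict(list)  # task_id -> [(input_tokens, output_tokens)]
        self.input_per_m = input_per_m
        self.output_per_m = output_per_m

    def record(self, task_id, input_tokens, output_tokens):
        self.calls[task_id].append((input_tokens, output_tokens))

    def calls_per_task(self):
        # Requires at least one recorded task; mean() raises on empty data.
        return mean(len(c) for c in self.calls.values())

    def cost_per_task(self):
        costs = [
            sum(i * self.input_per_m + o * self.output_per_m for i, o in c) / 1_000_000
            for c in self.calls.values()
        ]
        return mean(costs)


def suggest_plan(tasks_per_day, cost_per_task, days=30):
    """Smallest tier whose estimated window covers the projected API-equivalent spend."""
    projected = tasks_per_day * cost_per_task * days
    for plan, equiv in WINDOW_EQUIV:
        if projected <= equiv:
            return plan, projected
    return "multiple accounts or API lane", projected
```

Feed `cost_per_task()` from a week of real logs into `suggest_plan` together with your daily task volume. Treat the result as a starting tier, and configure a fallback lane either way.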
Codex Hosted runs your agents on your own ChatGPT account’s windows, with complete responses, per-lane request logs, and a flat $129 with no markup. Checking the loop math above against your own bill takes about thirty seconds.
Frequently asked questions
Why are AI agents so expensive on the OpenAI API?
Because one task is many calls. An agent plans, calls tools, reads results, retries, and reviews, commonly 5 to 50 model calls per task, and each call re-sends the growing conversation as input tokens. The meter bills every step, so agent tasks cost 10x to 50x what a single completion suggests.
How many API calls does an AI agent make per task?
A simple tool-using agent typically makes 5 to 15 calls per task; research or multi-step coding agents commonly run 20 to 50. Each later call carries the accumulated context of earlier steps, so input tokens grow faster than the call count.
Is a ChatGPT subscription better than the API for agent workloads?
Above roughly $150 a month of API spend, usually yes. Subscription windows price the task instead of every step: loops, retries, and tool chatter consume window capacity rather than billing per token. Limits still exist, so production setups keep a second account or an API key as fallback.
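The "roughly $150" figure is the cheapest path in the table: a $20 Plus plan plus the $129 flat fee. The sketch below finds the break-even task volume for the worked loop. It assumes a 30-day month and the unrounded per-task cost, and the function name is illustrative.

```python
import math


def break_even_tasks_per_day(subscription_monthly, cost_per_task, days=30):
    """Daily task volume at which per-token API spend equals the flat monthly cost."""
    return subscription_monthly / (cost_per_task * days)


plus_path = 20 + 129
tasks = break_even_tasks_per_day(plus_path, 0.25605)
print(f"break-even ≈ {tasks:.1f} tasks/day; from {math.ceil(tasks)} tasks/day up, the flat path is cheaper")
```

For this loop the break-even point is about 19.4 tasks a day. From 20 tasks a day the flat path costs less, provided the window actually holds the load.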
Do AI agents need streaming responses?
Generally no. An agent cannot act on half an answer; it needs the complete response before the next step. That makes agents a natural fit for the Codex lane, which returns complete responses rather than streams. Keep a streaming API-key lane for user-facing chat surfaces.