Forecasting OpenAI Costs for Client Projects
A working estimation method for agencies: prototype a thin slice, sample real token counts, multiply by steps and retries, quote a range, then track it weekly.
To forecast OpenAI costs for a client project, measure instead of guessing: build a thin slice of the workflow, run 20 to 50 representative inputs through it, and read the real token counts from the API responses. Multiply by steps, calls, and monthly volume, add a 20 to 30 percent buffer for retries and growth, and quote the result as a range with written assumptions. Then track actual spend against the estimate weekly for the first month. That is the entire method; the rest of this article is the arithmetic.
Why there is no lookup table
“How do I estimate costs before I build it” is one of the most asked and least answered questions in agency communities, because the honest answer is workload-specific. Public calculators model single completions. Client projects are workflows: one task is four to ten model calls once you count classification, retrieval, drafting, and review, and agent loops multiply that further. How to calculate AI agent costs covers the loop math in depth; this article covers the estimation workflow that wraps around it.
The principle that makes forecasting tractable: an AI cost estimate is a measurement plus arithmetic, not a guess.
Step 1: sample real token counts
Build the thinnest slice of the workflow that produces real outputs, then run 20 to 50 representative inputs through it. Representative means pulled from the client: actual support tickets, actual documents, actual form submissions. Demo inputs run short and flatter the estimate.
Every OpenAI response carries exact token counts in its usage object. Log them per step:
resp = client.chat.completions.create(model="gpt-5-mini", messages=msgs)
u = resp.usage
log_step("draft", input_tokens=u.prompt_tokens, output_tokens=u.completion_tokens)
Keep the median and the 90th percentile for each step. Words-to-tokens rules of thumb (about 1.3 tokens per English word) are fine as a sanity check, but they miss system prompts, tool definitions, retrieved context, and reasoning tokens, which is exactly where estimates die.
Step 2: multiply out the workflow
The formula:
cost per run = sum over steps of: calls x (input tokens x input price
+ output tokens x output price)
monthly cost = runs per month x cost per run x (1 + buffer)
Worked example: a support triage agent for a client doing 6,000 tickets a month. Token counts are the sampled medians; prices are June 2026 list prices, GPT-5 Mini at $0.25 input / $2 output per million tokens and GPT-5 at $1.25 / $10 (current table in the pricing reference).
| Step | Model | Calls | Input/call | Output/call | Step cost |
|---|---|---|---|---|---|
| Classify | GPT-5 Mini | 1 | 1,800 | 120 | $0.0007 |
| Retrieve and summarize | GPT-5 Mini | 2 | 2,500 | 400 | $0.0029 |
| Draft reply | GPT-5 | 1 | 3,000 | 450 | $0.0083 |
| QA check | GPT-5 Mini | 1 | 2,200 | 150 | $0.0009 |
| Per ticket | 5 | ~$0.013 |
At 6,000 tickets: 6,000 x $0.013 = $78 a month before buffer.
Step 3: buffer for what the prototype hides
Add 20 percent and the estimate becomes about $94 a month. The buffer is not padding; it is the observed gap between prototype and production. It covers retries (failed JSON parses and timeouts rerun 5 to 15 percent of calls in typical production logs), context that grows as threads lengthen, and prompt edits during delivery.
Two refinements. Quote from the p90 sample, not the median: the average run is the demo, the p90 run is the customer. And if any step uses a reasoning model, take the high end of the buffer, because reasoning tokens bill as output and do not show up in the visible reply.
Step 4: put the quote in writing
A quote is an estimate plus its assumptions plus the conditions that reopen it. A template:
AI usage estimate: support triage agent
Assumptions: ~6,000 tickets/mo, 5 model calls per ticket,
GPT-5 Mini for routing and QA, GPT-5 for drafting.
Measured cost per ticket (32-run sample, median): $0.013
With 20% retry and growth buffer: $0.016
Estimated monthly usage: $80-140 at June 2026 list prices
Re-quote triggers: volume above 8,000 tickets/mo, a model
change, or two consecutive weeks 25% over estimate.
Billing structure: [included in retainer / pass-through / capped]
The billing structure line is its own decision. The five agency pricing models compare the options, and whether to pass costs through covers the contract mechanics.
Step 5: track actuals from day one
Serve each client through its own scoped key, then compare per-key spend to the estimate weekly for the first month. Clients increasingly ask for exact token spend in reporting; with a per-request log, that report is a filter, not a project.
The variance rule from the template matters most: two consecutive weeks at 25 percent over estimate means a conversation, not silent absorption. Estimates that nobody rechecks are how agencies drift into unprofitable accounts.
When the forecast sizes a plan instead of a bill
If the workload runs through Codex Hosted, the same estimate buys something different: it tells you which ChatGPT plan step to size, because workloads bill to the flat plan instead of the meter. As an estimate, a $20 Plus plan absorbs roughly $700 of API-equivalent work a month and a $100 plan roughly $3,500.
The honest math cuts both ways. Our $94 triage workload alone is cheaper on the meter than the $129 platform fee plus a $20 plan ($149). Run six clients like it and the picture flips: about $564 metered versus about $149 flat, inside the same estimated window. Forecast first, then decide which side of that line your book of clients sits on; the calculator does the comparison with your own numbers.
Frequently asked questions
How do I estimate OpenAI API costs for a client project before building it?
Build a thin prototype of the workflow, run 20 to 50 representative inputs through it, and read real token counts from the API's usage field. Multiply per-run cost by expected volume, add a 20 to 30 percent buffer for retries and growth, and quote the result as a range with written assumptions.
What buffer should I add to an AI cost estimate?
20 to 30 percent on top of measured per-run cost is a defensible starting point. The buffer covers retries, context growth, and prompt changes during delivery. Quote from your 90th-percentile run cost rather than the average, because production inputs run longer than test inputs.
How accurate are token estimates from a prototype?
Sampling 20 or more real inputs typically lands within about 30 percent of production cost, which is what the buffer absorbs. Estimating from word counts alone is far worse: it misses system prompts, tool definitions, retrieved context, and reasoning tokens that bill as output.
How do I track OpenAI costs per client after launch?
Give each client or app its own scoped key and review per-key spend weekly for the first month against the written estimate. If actual usage runs 25 percent over estimate for two consecutive weeks, re-quote with the client instead of absorbing the difference silently.