AI Agency Unit Economics: Where Margin Actually Leaks
Revenue per client is flat; cost per client is metered. The agentic margin concept, a quantified leak inventory (retries, context growth, model bumps, scope creep), and the math to plug it.
AI agency unit economics come down to two numbers: revenue per client and cost per client. Revenue per client is fixed by contract; cost per client is metered by the token, and it moves without telling you. The gap between the workload you quoted and the model bill you pay is where margin leaks, and the leaks have names: retries, scope creep, model bumps, context growth. This article quantifies each one, then shows what changes when the cost side stops being metered.
The problem statement (why OpenAI quietly becomes an agency’s largest vendor after payroll) is in the agency margin problem. This is the framework underneath it.
The baseline: one client, fully loaded
A $2,500 monthly retainer client, costed at kickoff:
| Line | Monthly | % of revenue |
|---|---|---|
| Model spend (measured in the prototype) | $340 | 13.6% |
| Tool share (platform, hosting, logs) | $45 | 1.8% |
| Delivery labor (8 hrs at $90) | $720 | 28.8% |
| Gross margin | $1,395 | 55.8% |
Labor is planned and tools are flat. The model line is the only one that drifts on its own, which is why it gets the rest of this article.
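The table's arithmetic fits in a few lines, which makes it easy to rerun when any input changes. A minimal sketch using the baseline figures above; the function and parameter names are illustrative, not taken from any particular tool.

```python
def gross_margin(revenue, model, tools, labor_hours, labor_rate):
    """Return (margin in dollars, margin as % of revenue) for one client-month."""
    cogs = model + tools + labor_hours * labor_rate
    margin = revenue - cogs
    return margin, round(100 * margin / revenue, 1)

# Baseline client from the table: $2,500 retainer, $340 model spend,
# $45 tool share, 8 delivery hours at $90.
margin, pct = gross_margin(revenue=2500, model=340, tools=45,
                           labor_hours=8, labor_rate=90)
print(margin, pct)  # 1395 55.8
```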
What is agentic margin?
You will increasingly hear operators and investors use the term “agentic margin”: what remains of a software-style gross margin once agent workloads have billed their inference. The term is emerging rather than standard, but it names something real. A single completion costs fractions of a cent; an agent task is 5 to 50 calls once planning, tool use, retries, and review passes are counted, so a margin computed from single completions understates inference cost by that same multiple and overstates profitability accordingly.
For agencies the term earns its keep as a discipline: compute margin per client after the loops, not before. The same arithmetic for SaaS products, where the tail of heavy users does the damage, is worked through in LLM cost per user.
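To make "after the loops, not before" concrete, here is a small sketch comparing margin on a single completion with margin on the full agent task. Every input is a hypothetical number chosen for illustration (price per task, cost per call, calls per task); only the 5-to-50 call range comes from the text.

```python
def margin_pct(price_per_task, cost_per_call, calls_per_task=1):
    """Gross margin on one task, as a percentage of its price."""
    cost = cost_per_call * calls_per_task
    return round(100 * (price_per_task - cost) / price_per_task, 1)

# Hypothetical inputs, not measurements.
PRICE_PER_TASK = 0.50   # what the client effectively pays per task
COST_PER_CALL = 0.004   # a fraction of a cent per completion
CALLS_PER_TASK = 25     # inside the 5-to-50 range

naive = margin_pct(PRICE_PER_TASK, COST_PER_CALL)                    # one completion
agentic = margin_pct(PRICE_PER_TASK, COST_PER_CALL, CALLS_PER_TASK)  # whole loop
print(naive, agentic)  # 99.2 80.0
```

With these assumed inputs the cost side grows 25x while the margin falls by about 19 points. The size of the gap depends on how large inference is relative to price.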
The leak inventory
Four leaks recur in nearly every agency P&L. Stacked on the baseline client:
| Leak | Mechanism | Applied to baseline | Running total |
|---|---|---|---|
| (quoted) | | | $340 |
| Retries | failed parses and timeouts rerun 5-15% of calls | +12% | $381 |
| Context growth | threads and RAG payloads lengthen; input +25% | +15% of total | $438 |
| Model bump | drafting step moved GPT-5 Mini to GPT-5 on quality | +$90 | $528 |
| Scope creep | a fourth workflow ships mid-retainer | +25% volume | $660 |
The client now costs $660 a month in model spend against a $340 quote: 94% over, none of it a bug, all of it normal delivery. That is 12.8 points of gross margin gone, and across 15 similar clients it is $4,800 a month, about $57,600 a year, invisible on any single invoice.
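The stacking is order-dependent, since each percentage applies to the running total rather than to the quote. A minimal sketch that reproduces the table and the book-level figures; the data layout and names are illustrative, and rounding to whole dollars at each step mirrors the table.

```python
REVENUE = 2500   # monthly retainer
QUOTED = 340     # model spend measured in the prototype
CLIENTS = 15     # similar clients in the book

# (name, kind, value): "pct" multiplies the running total, "abs" adds dollars.
LEAKS = [
    ("retries", "pct", 0.12),
    ("context growth", "pct", 0.15),
    ("model bump", "abs", 90),
    ("scope creep", "pct", 0.25),
]

def stack_leaks(quoted, leaks):
    """Apply each leak in order, rounding to whole dollars as the table does."""
    total = quoted
    steps = []
    for name, kind, value in leaks:
        total = total * (1 + value) if kind == "pct" else total + value
        total = round(total)
        steps.append((name, total))
    return steps

steps = stack_leaks(QUOTED, LEAKS)
drifted = steps[-1][1]                          # 660
leak = drifted - QUOTED                         # 320
overage_pct = round(100 * leak / QUOTED)        # 94
margin_points = round(100 * leak / REVENUE, 1)  # 12.8
monthly_book_leak = leak * CLIENTS              # 4800
print(steps)
print(overage_pct, margin_points, monthly_book_leak, monthly_book_leak * 12)
```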
The quoted workload is the only AI cost anyone is watching. The leaks happen everywhere else.
Two of the four have cheap detection. Retries show up in request logs as repeated calls; model bumps show up as a spend step the week after a deploy. The other two are slow drifts that only a written baseline catches, which is why the estimate from the forecasting workflow belongs in the contract file, not a Slack thread.
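Both cheap detections can be sketched against a request log export. The log schema here (`client_key`, `prompt_hash`) and the 20% step threshold are assumptions for illustration; a real log will have its own field names, and the threshold should be tuned to normal week-to-week noise.

```python
from collections import Counter

def retry_rate(log_rows):
    """Share of calls that repeat an earlier (client_key, prompt_hash) pair."""
    if not log_rows:
        return 0.0
    counts = Counter((r["client_key"], r["prompt_hash"]) for r in log_rows)
    repeats = sum(n - 1 for n in counts.values())
    return repeats / len(log_rows)

def spend_steps(weekly_spend, threshold=0.20):
    """Indexes of weeks whose spend rose more than `threshold` over the prior week."""
    return [
        i for i in range(1, len(weekly_spend))
        if weekly_spend[i - 1] > 0
        and weekly_spend[i] / weekly_spend[i - 1] - 1 > threshold
    ]

# Hypothetical log: ten calls, one of them a rerun of prompt "p3".
rows = [{"client_key": "acme", "prompt_hash": f"p{i}"} for i in range(9)]
rows.append({"client_key": "acme", "prompt_hash": "p3"})
print(retry_rate(rows))                 # 0.1
print(spend_steps([85, 86, 108, 109]))  # [2]
```

Matching on a prompt hash only catches exact reruns. Retries that modify the prompt would need a looser key, such as a task or trace identifier if the log carries one.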
Plugging the leaks
The mechanics that keep the gap honest:
- A scoped key per client. Per-client spend becomes a filter on the request log instead of a month-end reconstruction.
- A written estimate with re-quote triggers. Two consecutive weeks 25% over estimate reopens pricing. The trigger does the awkward conversation’s scheduling for you.
- Budget caps per key. A runaway loop stops at the cap, not at the invoice.
- A model-change rule. Any step upgraded to a pricier model gets its line re-measured the same week, since one step moving from Mini to GPT-5 multiplies that step’s token prices by 5 to 8x.
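The re-quote trigger is simple enough to automate. A minimal sketch, assuming weekly per-key spend is already available as a list; the function name, the $80 weekly estimate, and the sample weeks are hypothetical, while the 25% threshold and two-week run come from the rule above.

```python
def should_requote(weekly_actuals, weekly_estimate, over=0.25, run=2):
    """True once `run` consecutive weeks exceed the estimate by more than `over`."""
    limit = weekly_estimate * (1 + over)
    streak = 0
    for actual in weekly_actuals:
        # A week at or under the limit resets the streak.
        streak = streak + 1 if actual > limit else 0
        if streak >= run:
            return True
    return False

# Hypothetical client with an $80 weekly estimate, so the limit is $100.
print(should_requote([82, 104, 97, 101, 110], weekly_estimate=80))  # True
print(should_requote([82, 104, 97, 101, 95], weekly_estimate=80))   # False
```

A single spike does not fire the trigger, which keeps one bad week from forcing a pricing conversation.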
How the recovered margin is handled (billed, absorbed, passed through, or capped) is its own decision, covered in whether to pass costs through.
What flat capacity changes
Every leak above is a multiplication of a metered price. Run the same workloads through Codex Hosted and the multiplicand changes: workloads bill to a flat ChatGPT plan, so cost per client becomes a step function instead of a meter.
The leak-bloated book, both ways: 20 clients at the drifted $660 average is $13,200 a month on the meter. A Pro 20x plan plus our $129 fee is about $329 a month and absorbs an estimated $14,000 of API-equivalent work. The estimates are estimates, plan capacity arrives in usage windows, and a book that size wants fallback lanes configured. But the unit economics invert: retries and context growth consume window capacity rather than marginal dollars, and the worst case for a heavy month is a $100 plan step instead of a five-figure surprise.
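The comparison above reduces to a few lines of arithmetic. A rough sketch using only the figures in the text; the plan price is not stated directly and is inferred here as the "about $329" total minus the $129 fee, and the capacity figure is the text's own estimate rather than a guarantee.

```python
CLIENTS = 20
DRIFTED_SPEND = 660      # per client per month, after the leaks
SERVICE_FEE = 129        # monthly fee stated in the text
FLAT_TOTAL = 329         # "about $329 a month" per the text
PLAN_PRICE = FLAT_TOTAL - SERVICE_FEE  # inferred, not quoted directly
EST_CAPACITY = 14_000    # estimated API-equivalent dollars the plan absorbs

metered = CLIENTS * DRIFTED_SPEND   # bill on the meter
fits = metered <= EST_CAPACITY      # does the book fit the estimate?
headroom = EST_CAPACITY - metered   # slack before fallback lanes matter
print(metered, FLAT_TOTAL, fits, headroom)  # 13200 329 True 800
```

The $800 of estimated headroom is thin for a 20-client book, which is why the fallback lanes and per-key caps still matter.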
The leaks do not stop existing; they stop compounding into the P&L. Caps and per-client keys remain worth keeping, because a shared window is still a shared resource.
Run your own book through the calculator: it takes the monthly bill you have now and shows the plan-backed version next to it.
Frequently asked questions
What are healthy unit economics for an AI agency?
Working guidance from agency operators: keep model spend under roughly 15 percent of each client's revenue and total delivery COGS (labor plus tools plus models) under 50 percent, which leaves a gross margin above 50 percent. The model line deserves the most scrutiny because it is the only COGS line that moves on its own after you quote.
What is agentic margin?
An emerging industry term for what remains of a software-style gross margin once agent workloads have billed their inference. Agents run 5 to 50 model calls per task, so a margin computed from single completions overstates profitability; agentic margin is the figure after the loops, retries, and tool calls are counted.
Where do AI agencies lose the most margin?
Rarely on the workload they quoted. The leaks are retries (5 to 15 percent of calls in typical production logs), scope creep that adds workflows without re-quoting, model upgrades that multiply a single step's cost, and context growth that inflates input tokens over months. Stacked together they can nearly double model spend without any new deliverable.
How do I track AI cost per client?
Give each client a scoped API key and read per-key spend from the request log, weekly for a new account's first month and monthly after. Compare against the written estimate from kickoff and reopen pricing when actuals run 25 percent over for two consecutive weeks.