LLM Cost per User: The SaaS Math Nobody Shows

Cost per user = actions x calls x tokens x price. Worked example: $1.82 at the mean, $18 at p95 against $29 ARPU. Margin bands, and how flat capacity caps tail risk.

LLM cost per user is actions per user x calls per action x tokens per call x token price, and for a typical AI SaaS feature on GPT-5 it lands near $1.82 a month for the mean user: 6.3% of a $29 plan. The mean is not the risk. The p95 user costs $18, the p99 user costs more than their subscription, and AI pricing survives or dies on that tail, not on the average.

The formula and a worked baseline

Take an AI writing product at $29 a month. The mean user runs 40 generations a month; each generation makes 3 model calls (draft, refine, title) averaging 2,500 input and 1,200 output tokens. On GPT-5 at $1.25 input / $10 output per million tokens (June 2026, live prices at openai.com/api/pricing):

per call:       0.0025 x $1.25 + 0.0012 x $10 = $0.0151
per generation: 3 x $0.0151                   = $0.0454
per mean user:  40 x $0.0454                  = $1.82/month

That is the number most teams put in the deck: 6.3% of ARPU, healthy software margins. It is also the least informative number in the model.

The distribution is the business

Usage in AI products is heavily skewed. The same product, sliced by user segment:

SegmentGenerations/monthLLM cost/month% of $29 ARPU
Median user25$1.133.9%
Mean user40$1.826.3%
p95 user400$18.1563%
p99 user1,200$54.45188%

The mean user costs 6% of revenue; the p95 user costs 63%; the p99 user costs almost twice what they pay. With 10,000 users, the top 1% generates around $5,400 of monthly COGS against $2,900 of revenue from that same segment. Every “unlimited AI” plan quietly bets that this tail stays thin, and engaged products grow their own tails: your best users become your most expensive ones.

Margin bands worth aiming at

Working guidance from these unit economics:

  • Under 10% of ARPU at the mean: comfortable; AI is a feature cost.
  • 10 to 25%: a warning band; one model upgrade or engagement spike pushes you out of software margins.
  • Over 25%: reprice, cap, or re-architect; the AI is eating the gross margin that sales efficiency math assumes.

The cheapest lever is model assignment per path. Moving draft calls to GPT-5 Mini ($0.25/$2 per million) drops the per-call cost to about $0.003 and the mean user to roughly $0.36 a month: 1.3% of ARPU, with quality holding wherever your evals say it holds. Which tier suffices per task type is the subject of the cheapest OpenAI model that still does the job.

One more multiplier hides in roadmaps: agent features. The baseline assumes 3 calls per action; an agentic “do this for me” feature runs 15 to 50 calls per action, which moves the whole table by 5 to 15x. Price those features with the agent cost formula before they ship, not after the first invoice.

When flat capacity de-risks the pricing

Per-token COGS means your liability scales with engagement, which is the one thing you are trying to maximize. Flat-cost capacity puts a ceiling on it. A subscription-backed lane through Codex Hosted bills bulk workloads against a flat ChatGPT plan: Pro 5x plus our $129 fee is about $229 a month and absorbs an estimated $3,500 of API-equivalent work (an estimate from observed usage windows, not a guarantee). At this product’s mean usage, that is roughly 1,900 users’ worth of generation volume for a fixed $229.

The tail math changes character: a p99 user on the flat lane consumes window capacity, not marginal dollars, so their worst case is throttling into the API fallback lane rather than a COGS spike. Caps per sub-key still belong in the design, and bursty launch days still want the API lane as overflow. The honest survey of every fixed-cost path, including self-hosting and provider commitments, is in fixed-cost LLM inference.

The model to build this week

Pull 30 days of logs and compute four numbers: mean and median cost per user, p95, p99, and LLM COGS as a percent of MRR. Then price the tail: caps, tiers, or a flat lane. If your mean user costs under 10% of ARPU and your p99 user is capped, AI margins stop being a board-meeting surprise.

Your own volumes drop into the calculator in under a minute; it shows the metered bill next to the flat-capacity setup that would cover it.

Frequently asked questions

How do you calculate LLM cost per user?

Monthly LLM cost per user = actions per user per month x model calls per action x tokens per call x token price. For a typical AI writing product on GPT-5 (40 generations a month, 3 calls each, 2,500 input and 1,200 output tokens per call), that is about $1.82 per user at June 2026 prices.

What percentage of revenue should LLM costs be for a SaaS?

Keep mean LLM cost under 10% of ARPU for comfortable software margins, treat 10 to 25% as a warning band, and reprice or cap above that. The mean is not enough on its own: model the p95 and p99 users, because the tail is where AI pricing breaks.

How do you handle power users who cost more than they pay?

Three levers: usage caps or tiers in the pricing, cheaper models for high-volume paths, and a flat-cost capacity lane so heavy usage consumes window capacity instead of marginal dollars. Most products need the caps regardless; flat capacity decides whether a p99 user is a cost incident or a non-event.

Does flat-rate capacity make unlimited AI plans safe?

Safer, with limits. A subscription-backed lane caps what heavy users can cost you (the plan price), so 'unlimited' marketing stops being an open-ended liability. Plan capacity arrives in usage windows and the numbers are estimates, so per-key caps and an API fallback lane still belong in the design.

More on OpenAI costs
Codex Hosted · the main feature

Run your AI workloads on your ChatGPT subscription.

ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.