← Blog AI agency June 12, 2026

Should You Pass API Costs Through to Clients?

Pass through when usage is client-driven, absorb when it is small and stable, cap when clients need certainty. Failure modes, sample contract language, and the flat-capacity case.

Pass API costs through when usage is client-driven, variable, and big enough to matter; absorb them when they are small and stable relative to the retainer; cap them when the client needs a fixed number more than flexibility. Every structure works until its failure mode arrives, so the real decision is which failure you would rather manage. And if delivery runs on flat subscription-backed capacity instead of the meter, most of the question dissolves, because there is little cost left to pass through.

This article walks the three structures with numbers, the contract language each needs, and the flat-capacity exception. The wider menu, including value and hybrid pricing, is in the five agency pricing structures.

The three structures on one client

Take a client on a $2,500 monthly retainer whose workflows measure $400 of OpenAI usage in a normal month and $700 in campaign months. (How to measure that before you sign is covered in forecasting costs for client projects.)

Structure	The deal	Normal month	Campaign month	Usage risk sits with
Absorb	usage baked into the retainer	$2,500	$2,500	you
Pass through +25%	reimbursed at cost plus fee	$2,500 + $500	$2,500 + $875	client
Cap at $550	included up to the cap	$2,500	cap trips in week 3	shared

Same client, same workload, three different businesses.

Where absorbing fails

Absorption fails silently. The AI line starts at 16% of revenue, drifts to 28% in campaign months, and nobody renegotiates because nobody is watching a number that never appears on an invoice. One client doubling their volume converts your retainer into their compute budget. The agency margin problem is mostly this failure repeated across a client book.

Absorption survives only with measurement attached: a scoped key per client and a written re-quote trigger, such as two consecutive weeks 25% over estimate.

Where pass-through fails

Pass-through looks safest and fails three ways.

Bill shock lands on the client. The variance you escaped becomes their problem, and clients respond by bill-shopping, freezing usage mid-project, or disputing the spike. Pass-through billing does not remove usage risk; it converts it into trust risk with a one-month delay.

The markup gets scrutinized. A client who checks OpenAI’s pricing page will ask why your per-token rate is 25% higher. If the fee is disclosed and tied to something real (monitoring, retries, reporting), the conversation is fine. If it is discovered, it is not.

Your invoice tracks a vendor’s pricing page. A model deprecation or price change mid-quarter rewrites your billing basis, and you get to explain a vendor’s decision to every client at once.

Pass-through earns its keep when usage genuinely belongs to the client: their ad volume, their ticket count, their document pile. It needs a per-request report behind every invoice, which is why we surface API-equivalent value per request in the request log.

Where caps fail

A cap is a support conversation scheduled in advance. The client hits $550 in week three of a campaign month, and you choose between stalling their campaign and waiving the cap. A waived cap is absorption with extra steps.

Caps work when they are mechanical rather than social: alert at 80%, a pre-agreed overflow rate after 100%, enforced by a scoped sub-key with a budget rather than by someone noticing a dashboard. The failure mode never disappears, but it becomes a planned email instead of an emergency call.

Sample contract language

These samples are illustrative templates, not legal advice. Have a lawyer review anything you sign.

Pass-through: "Client will reimburse Agency for third-party AI model
usage incurred on Client's behalf, billed monthly at provider list
prices plus a 25% administration fee. Agency will provide a per-request
usage report with each invoice. Changes to provider list prices pass
through as of the provider's effective date."

Capped: "Monthly AI model usage up to $550 is included in the retainer.
Agency will notify Client when usage reaches 80% of the cap. Usage
beyond the cap requires written approval and bills at cost plus a 25%
administration fee."

Absorbed: "AI model usage required to deliver the Services is included
in the fee, provided monthly request volume remains within the scope
defined in Schedule A. Material changes in volume, scope, or provider
pricing reopen the fee for renegotiation."

The shared ingredient across all three is the reopener. Contracts fail at the edges, and the edge in AI work is always volume.

How flat capacity changes the answer

All three structures are workarounds for a metered input cost. Run the same workloads through Codex Hosted and they bill to a flat ChatGPT plan instead: our $129 fee plus a $100 Pro plan is about $229 a month, absorbing an estimated $3,500 of API-equivalent work. Estimates, not guarantees, and capacity arrives in usage windows.

The arithmetic on our sample client: six clients at $400 of measured usage is $2,400 a month on the meter, but sits inside one estimated plan window for $229 flat. The next client adds close to zero marginal cost until the window fills, and then the increment is a $100 plan step, not an open-ended bill.

That reshapes each structure:

Absorb stops being dangerous. Your worst case is a plan step, not an unbounded invoice, so baking usage into the retainer becomes the default rather than a gamble.
Pass-through runs out of material. You cannot honestly reimburse a step function per client. Agencies that keep a usage line on invoices reframe it as capacity allocation or value pricing, with the request log as the receipt.
Caps become hygiene. Per-client sub-key budgets stop protecting your P&L and start protecting the shared window from any one client’s runaway workflow.

The decision in one paragraph

Usage under 10% of the retainer and stable: absorb, with per-client measurement. Usage large and client-driven: pass through, disclosed markup, per-request report attached. Client needs invoice certainty: cap, with an 80% alert and a pre-agreed overflow rate. On flat capacity: absorb by default, cap sub-keys for hygiene, and price the value of the work rather than the meter behind it.

If you want to see which side of the line your book sits on, put your current monthly bill into the calculator; it shows the same workload priced against a plan.

Frequently asked questions

Should an AI agency pass API costs through to clients?

Pass them through when usage is client-driven, variable, and large enough to matter, roughly 10 percent of the retainer or more. Absorb them when they are small and stable, and cap them when the client values a fixed invoice over flexibility. Each structure has a specific failure mode, so pick the one whose failure you can manage.

What markup do agencies add to pass-through API costs?

Common patterns in agency communities run 20 to 50 percent on top of provider list prices, framed as an administration fee that covers monitoring, retries, and reporting. Disclose the markup in the contract. A markup the client discovers later costs more trust than it ever earned in margin.

What should a pass-through billing clause include?

Four things: the billing basis (provider list prices plus a stated fee), a reporting cadence with per-request detail, how provider price changes flow through, and the triggers that reopen pricing. Treat any template as a starting point rather than legal advice, and have a lawyer review the contract you actually sign.

How does flat-rate capacity change pass-through billing?

Subscription-backed capacity makes the marginal cost of a client workload close to zero until the next plan step, so there is little usage cost left to pass through. Agencies on flat capacity typically absorb the AI line, keep per-client budget caps for hygiene, and price on value instead of reimbursement.