MCP Tool Loops and the Cost of Agency
MCP hosts re-send every tool schema with every model call, and each tool result triggers another call. The worked math of tool-loop spend and what a flat lane changes.
MCP itself costs nothing; the model calls around it are the bill. Every tool result triggers another model call, and every one of those calls re-sends the schemas of every tool the host exposes. On per-token billing, that pair of multipliers makes MCP agents expensive in a way single completions never show. On a flat subscription-backed lane, the same loop consumes window capacity instead of generating line items.
Here is where the tokens actually go, a worked session, and the host config that moves the loop to a flat lane.
The two multipliers MCP adds
Schema overhead on every call. Tool definitions ride in the prompt, so the model can know what it may call. Connect three MCP servers exposing 25 tools at roughly 240 tokens of JSON schema each, and every model call in the session starts with about 6,000 input tokens of definitions. The model pays that tax whether it uses one tool or none. On per-token billing, every tool you expose costs money on every call, whether the model uses it or not.
Calls triggered by results. A tool invocation is never one model call. The model emits the call, the host runs the tool, and the result goes back to the model in a fresh request that carries everything before it. Ten tool invocations is at least twenty model calls, with context growing the whole way.
A worked MCP session
A research-and-file task in an MCP host with 25 tools exposed: 14 model calls, 7 tool invocations, context accumulating as results land. On GPT-5.4 (OpenAI’s June 2026 list: $2.50 per million input, $15 per million output):
| Component | Arithmetic | Tokens |
|---|---|---|
| Schema re-sends | 14 calls × ~6,000 | ~84,000 |
| Conversation and tool results | grows step over step | ~66,000 |
| Total input | ~150,000 | |
| Total output | plans, calls, final answer | ~4,000 |
Input: 150,000 × $2.50/M = $0.375
Output: 4,000 × $15/M = $0.060
Per task ≈ $0.44
More than half this task’s input tokens are tool schemas sent fourteen times. A 25-tool MCP setup can spend more on re-sent definitions than on the answers themselves. At volume:
| Tasks/day | API cost/mo (GPT-5.4) | Flat path |
|---|---|---|
| 30 | ~$396 | Plus $20 + ProxyLLM $129 = $149 |
| 150 | ~$1,980 | Pro 5x $100 + $129 = $229 |
| 500 | ~$6,600 | Pro 20x $200 + $129 = $329 |
Window capacities (roughly $700, $3,500, and $14,000 of API-equivalent work) are our planning estimates, never guarantees; the request log shows real consumption per lane.
The meter prices your tool surface
Here is the design pressure nobody states out loud: on per-token billing, the rational move is to prune tools, because every schema costs input tokens on every call. Teams strip useful tools from their servers to thin the prompt, and the agent gets less capable to make the bill smaller.
A flat lane removes that trade. When the loop bills to a subscription window, the marginal cost of exposing a complete tool surface is window capacity, not dollars per call, so you keep the tools that make the agent good. The same logic applies to retries and review passes, which is the broader argument in why agent workloads flip the API-vs-subscription math. The estimation formula for your own loops is in how to calculate AI agent costs.
Point the host at the flat lane
Any MCP host that takes OpenAI-compatible model settings can route its calls through the gateway. Where the host reads environment configuration:
{
"env": {
"OPENAI_BASE_URL": "https://api.proxyllm.ai/v1",
"OPENAI_API_KEY": "pk_live_your_proxyllm_key"
}
}
Behind the endpoint, OpenAI-model calls run through Codex Hosted on your own connected ChatGPT account; past a plan limit, calls fall back to a second account, then your own API key, until the window resets. Responses arrive complete rather than streamed, which suits tool loops exactly: the host needs the full model response before it can run the next tool anyway. If your agent is Codex itself rather than a custom host, the same pattern from the CLI side is in the codex exec cookbook.
Give each host its own scoped sub-key with a budget cap. Tool loops are the workload most likely to run long, and a cap turns “the agent looped overnight” from an invoice into a stopped key, with the request log showing exactly which calls got there.
The condensed setup lives on the MCP integration page. If you can estimate your tasks per day, the calculator maps the loop math above to a plan tier with your own numbers.
Frequently asked questions
Why do MCP agents use so many tokens?
Two multipliers stack. Every model call carries the schemas of every exposed tool as input tokens, and every tool result triggers another model call that re-sends the grown context. A host with 25 tools can spend 6,000 tokens per call on definitions alone, before the conversation itself.
How many model calls does an MCP task make?
Each tool invocation costs at least two model calls: one to decide and emit the call, one to read the result and continue. Tasks that touch several tools commonly run 10 to 30 model calls, with input tokens growing as results accumulate in context.
Do MCP tool definitions cost tokens on every request?
Yes. Tool schemas are part of the prompt, so they bill as input tokens on every call in the loop, whether or not the model uses the tool. Twenty-five tools at roughly 240 tokens each is about 6,000 input tokens per call, on every call.
How do I run MCP tool loops on a flat rate?
Point the MCP host's model settings at an OpenAI-compatible endpoint backed by a ChatGPT subscription, such as ProxyLLM. Set OPENAI_BASE_URL to https://api.proxyllm.ai/v1 with a scoped key, and the loop's calls bill to the flat plan instead of per token.