Reasoning Tokens: The Hidden Multiplier in Your OpenAI Bill

Reasoning tokens bill as output tokens at full price. A 150-token answer can bill 1,500 tokens, turning a $525 monthly estimate into $2,550. Here is how to see and control them.

Reasoning tokens are the tokens a model spends thinking before it writes the answer, and OpenAI bills them at the full output rate even though they never appear in the response. On reasoning models, a 150-token reply routinely bills 1,500 or more output tokens, which is why bills run 5 to 10 times what visible answer lengths predict. As of June 2026, output costs $30 per million tokens on GPT-5.5 and $10 per million on GPT-5 (live rates: OpenAI’s pricing page).

What reasoning tokens are

Reasoning models (the GPT-5 family with reasoning enabled, and o4-mini) generate an internal chain of work before answering: plans, intermediate steps, self-checks. OpenAI discards that text from the response but counts every token of it in completion_tokens. The model genuinely does more compute; the catch is only that the line item is invisible until you read the usage object.

Where they show up in your usage data

{
  "usage": {
    "prompt_tokens": 1200,
    "completion_tokens": 1500,
    "completion_tokens_details": {
      "reasoning_tokens": 1350
    }
  }
}

Visible answer: 150 tokens. Billed output: 1,500. The 1,350-token gap is reasoning, and at GPT-5.5’s output rate it is 90 percent of this request’s output cost.
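A minimal Python sketch of that subtraction, assuming the response has already been parsed into a dict shaped like the usage object above. The helper name split_output_tokens is illustrative and not part of any SDK.

```python
def split_output_tokens(usage: dict) -> dict:
    """Split billed output tokens into visible and reasoning parts."""
    billed = usage["completion_tokens"]
    # The details object may be absent (for example on non-reasoning models);
    # this sketch treats a missing value as zero reasoning tokens.
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    visible = billed - reasoning
    share = reasoning / billed if billed else 0.0
    return {
        "billed": billed,
        "reasoning": reasoning,
        "visible": visible,
        "reasoning_share": share,
    }


usage = {
    "prompt_tokens": 1200,
    "completion_tokens": 1500,
    "completion_tokens_details": {"reasoning_tokens": 1350},
}
print(split_output_tokens(usage))
```

For the sample usage object this reports 150 visible tokens and a reasoning share of 0.9.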

The worked example: a 10x line item

One request to GPT-5.5 ($5 input, $30 output per million tokens as of June 2026):

| Line | Tokens | Math | Cost |
| --- | --- | --- | --- |
| Input | 1,200 | 1,200 × $5 ÷ 1M | $0.0060 |
| Visible answer | 150 | 150 × $30 ÷ 1M | $0.0045 |
| Reasoning tokens | 1,350 | 1,350 × $30 ÷ 1M | $0.0405 |
| Billed total | | | $0.0510 |

A team estimating from visible lengths expects $0.0105 per request ($0.0060 input plus $0.0045 visible output). The real number is $0.051, almost five times as much, and the output line alone ($0.045) is ten times its estimate ($0.0045). At 50,000 requests a month that is the difference between a $525 forecast and a $2,550 bill.
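The same arithmetic as a short Python sketch, using the GPT-5.5 rates quoted in this article. The function and constant names are illustrative, and Decimal is used only to keep the dollar figures exact.

```python
from decimal import Decimal

# Dollars per million tokens, as quoted in this article (June 2026).
INPUT_RATE = Decimal("5")
OUTPUT_RATE = Decimal("30")
MILLION = Decimal("1000000")


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one request in dollars; output_tokens must include reasoning."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / MILLION


estimate = request_cost(1200, 150)        # visible answer only
billed = request_cost(1200, 150 + 1350)   # visible answer plus reasoning tokens

requests_per_month = 50_000
print(f"per request: ${estimate} estimated vs ${billed} billed")
print(
    f"per month: ${estimate * requests_per_month:,.2f} estimated "
    f"vs ${billed * requests_per_month:,.2f} billed"
)
```

The only difference between the two calls is whether reasoning tokens are included in the output count, which is the whole gap between forecast and bill.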

Reasoning tokens bill at the output rate even though they never appear in the response. That single fact explains more confused OpenAI invoices than anything else, which is why it sits at the top of the bill diagnostic checklist.

How much models reason

Reasoning volume tracks task difficulty and the reasoning effort setting, not answer length. A one-word answer to a hard question can carry thousands of reasoning tokens; a long answer to an easy question can carry almost none. That is why per-request costs vary wildly on the same route and why averages mislead: log the distribution.
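A small sketch of why the distribution matters more than the average, using Python's standard statistics module. The sample values are made up for illustration and do not come from real traffic.

```python
import statistics

# Hypothetical reasoning_tokens values logged for one route (not real data).
reasoning_samples = [0, 40, 64, 128, 128, 192, 256, 320, 1350, 4200]

mean = statistics.mean(reasoning_samples)
median = statistics.median(reasoning_samples)
# quantiles(n=10) returns nine cut points; the last one estimates the 90th percentile.
p90 = statistics.quantiles(reasoning_samples, n=10)[-1]

print(f"mean={mean:.0f} median={median:.0f} p90={p90:.0f} max={max(reasoning_samples)}")
```

In this made-up sample two hard requests pull the mean far above the median, so a budget built on either number alone would miss how the route actually behaves.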

The cheap reasoner changes the math. o4-mini reasons by default but costs $0.55 input and $2.20 output per million tokens. The same 1,200-in, 1,500-out request shape costs 1,200 × $0.55 ÷ 1M + 1,500 × $2.20 ÷ 1M = $0.00066 + $0.00330 = $0.00396, or about $0.004. Those 50,000 monthly requests cost about $198 instead of $2,550. When a task needs thinking but not flagship judgment, the cheap reasoner keeps 92 percent of the money.
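The comparison can be sketched in Python as a lookup over a rate table. The rates are the ones quoted in this article, the dictionary keys are plain labels rather than verified API model identifiers, and the sketch assumes both models produce the same token counts, which real traffic may not.

```python
# Dollars per million tokens, as quoted in this article (June 2026); check live pricing.
RATES = {
    "gpt-5.5": {"input": 5.00, "output": 30.00},
    "o4-mini": {"input": 0.55, "output": 2.20},
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int, requests: int) -> float:
    """Monthly dollars for a fixed request shape; output_tokens includes reasoning."""
    rate = RATES[model]
    per_request = (
        input_tokens * rate["input"] + output_tokens * rate["output"]
    ) / 1_000_000
    return per_request * requests


flagship = monthly_cost("gpt-5.5", 1200, 1500, 50_000)
cheap = monthly_cost("o4-mini", 1200, 1500, 50_000)
saved = 1 - cheap / flagship
print(f"gpt-5.5: ${flagship:,.0f}  o4-mini: ${cheap:,.0f}  saved: {saved:.0%}")
```

Swapping in your own logged token counts per route gives a quick first check on whether a route is worth moving.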

Five ways to control reasoning spend

  1. Match reasoning effort to the task. The effort parameter is the throttle; keep it low or minimal on extraction, formatting, and routing tasks, higher only where quality measurably improves.
  2. Route easy work away from reasoning entirely. GPT-5 Mini at $0.25/$2 handles the simple requests that make up the bulk of most traffic without thinking about it.
  3. Budget max_output_tokens (max_completion_tokens in Chat Completions) carefully. Reasoning counts against the cap. Set it too tight and you pay for the thinking, then get a truncated answer.
  4. Log the ratio. Alert when reasoning_tokens ÷ completion_tokens jumps on a route; a prompt change or model bump can triple reasoning volume silently.
  5. Reprice, then optimize. Before tuning prompts, check whether the route belongs on a different model. The per-model rates are in OpenAI API pricing explained.
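The ratio check from step 4 might look like the following Python sketch. The function names and the 0.10 threshold are assumptions for illustration; the threshold is absolute because the ratio is capped at 1, so a tripling of reasoning volume can show up as a modest-looking move.

```python
def reasoning_ratio(reasoning_tokens: int, completion_tokens: int) -> float:
    """Share of billed output tokens that went to reasoning."""
    return reasoning_tokens / completion_tokens if completion_tokens else 0.0


def should_alert(baseline_ratio: float, current_ratio: float, max_increase: float = 0.10) -> bool:
    """Flag a route whose reasoning share rose by more than max_increase (absolute)."""
    return current_ratio - baseline_ratio > max_increase


# Hypothetical route: 150 visible tokens per answer before and after a prompt change.
baseline = reasoning_ratio(450, 600)     # 450 reasoning tokens
current = reasoning_ratio(1350, 1500)    # reasoning tripled to 1,350 tokens
print(f"baseline={baseline:.2f} current={current:.2f} alert={should_alert(baseline, current)}")
```

Here the reasoning volume triples while the ratio only moves from 0.75 to 0.90, so it can help to log the raw reasoning_tokens count alongside the ratio.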

Reasoning is one line item among several; the ranked list of every cost lever is in how to reduce OpenAI API costs. If you want reasoning counts logged per request without building it yourself, our request log records them, and the calculator turns your real token mix into a monthly number.

Frequently asked questions

What are reasoning tokens in the OpenAI API?

Reasoning tokens are the internal thinking a reasoning model generates before writing its visible answer: plans, intermediate steps, checks. OpenAI discards that text from the response but counts every token of it in completion_tokens, so you pay for thinking you never see.

Are reasoning tokens billed as output tokens?

Yes, at the full output rate. On GPT-5.5 that is $30 per million tokens as of June 2026, and on GPT-5 it is $10. A request whose visible answer is 150 tokens can bill 1,500 output tokens once reasoning is counted.

How do I see how many reasoning tokens I am paying for?

Read usage.completion_tokens_details.reasoning_tokens in each API response. Visible answer length is completion_tokens minus reasoning_tokens. Logging that ratio per route is the fastest way to find where reasoning spend concentrates.

Can I turn reasoning tokens off?

You can shrink them. Set the reasoning effort parameter low for simple tasks, route easy work to small models, or use a cheap reasoner like o4-mini where thinking is needed but flagship quality is not. Reasoning volume follows task difficulty and effort settings, not answer length.
