The Cheapest OpenAI Model That Still Does the Job

GPT-5 Nano at $0.05/$0.40 per million tokens handles classification and extraction; Mini covers most production text. The decision table, with a worked 94x cost spread.

The cheapest OpenAI model is GPT-5 Nano: $0.05 per million input tokens and $0.40 per million output as of June 2026. The more useful answer is the cheapest model per task type: Nano for classification and extraction, GPT-5 Mini for most production text, GPT-5 and up only where judgment is the product. On an identical job, the spread between the cheapest and the most expensive current model is roughly 94x, so this choice outweighs every prompt-level optimization you will ever ship.

The price ladder

June 2026 API prices per million tokens (live numbers at openai.com/api/pricing):

Model        Input   Output   Sweet spot
GPT-5 Nano   $0.05   $0.40    Classification, extraction, routing, dedup
GPT-5 Mini   $0.25   $2.00    Summaries, support drafts, structured rewrites
o4-mini      $0.55   $2.20    Budget reasoning: math checks, code triage
GPT-5        $1.25   $10.00   Default for customer-facing generation
GPT-5.4      $2.50   $15.00   Harder reasoning, review passes
GPT-5.5      $5.00   $30.00   Agent planners, frontier-quality judgment

GPT-5 also offers cached input at $0.125 per million, which matters when a long system prompt repeats across calls. Per-model workload math for the top tier is worked through in "GPT-5.5 API cost: per-token prices and real workload math", and the full pricing mechanics (caching, batch, gotchas) live in "OpenAI API pricing explained".
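To see how much the cached rate moves the input bill, here is a minimal sketch using the two GPT-5 input prices quoted above. The function name, the hit_rate parameter, and the assumption that the whole repeated prefix bills at the cached rate on every hit are illustrative simplifications, not a description of OpenAI's exact caching rules.

```python
# Sketch of cached-input savings on GPT-5.
# Assumptions (not from the pricing page): the repeated prefix bills at the
# cached rate on every cache hit, and hit_rate is the fraction of calls that hit.
INPUT_PRICE = 1.25    # USD per million uncached input tokens (GPT-5)
CACHED_PRICE = 0.125  # USD per million cached input tokens (GPT-5)


def input_cost(calls, prefix_tokens, unique_tokens, hit_rate=1.0):
    """Return the input-side cost in USD when a shared prefix is cached on some calls."""
    # Prefix tokens served from cache.
    cached = calls * prefix_tokens * hit_rate
    # Prefix tokens that missed the cache, plus the per-call unique tokens.
    uncached = calls * prefix_tokens * (1.0 - hit_rate) + calls * unique_tokens
    return (cached * CACHED_PRICE + uncached * INPUT_PRICE) / 1_000_000
```

With a hypothetical 2,000-token system prompt and 500 unique tokens per call, 100,000 calls cost $312.50 of input with no cache hits and $87.50 if every call hits the cache.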

The same job at four prices

Classify 100,000 support tickets, 500 input and 20 output tokens each: 50M input, 2M output tokens total.

Model        Input cost   Output cost   Total job cost
GPT-5 Nano   $2.50        $0.80         $3.30
GPT-5 Mini   $12.50       $4.00         $16.50
GPT-5        $62.50       $20.00        $82.50
GPT-5.5      $250.00      $60.00        $310.00

Checking one row: 50M input x $0.05/1M = $2.50, plus 2M output x $0.40/1M = $0.80, so the whole job costs $3.30 on Nano. The spread between GPT-5 Nano and GPT-5.5 on this job is $310.00 / $3.30, roughly 94x. A team running tickets through GPT-5.5 “to be safe” pays an extra $306.70 per hundred thousand tickets for safety nobody measured.
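The same arithmetic can be scripted so you can swap in your own token counts. This is a minimal sketch: the dictionary keys are illustrative labels for the table rows, not verified API model identifiers, and job_cost is a hypothetical helper name.

```python
# Prices in USD per million tokens as (input, output), copied from the June 2026 table.
# Keys are illustrative labels, not verified API model identifiers.
PRICES = {
    "gpt-5-nano": (0.05, 0.40),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5": (1.25, 10.00),
    "gpt-5.5": (5.00, 30.00),
}


def job_cost(model, requests, in_tokens, out_tokens):
    """Return the metered cost in USD for a job of identically sized requests."""
    in_price, out_price = PRICES[model]
    total_in = requests * in_tokens
    total_out = requests * out_tokens
    return (total_in * in_price + total_out * out_price) / 1_000_000


if __name__ == "__main__":
    # The ticket job above: 100,000 requests, 500 input and 20 output tokens each.
    for model in PRICES:
        print(f"{model:12s} ${job_cost(model, 100_000, 500, 20):8.2f}")
```

Changing the token mix changes the spread: the ratio between GPT-5.5 and Nano is 100x on input and 75x on output, so output-heavy jobs sit closer to 75x.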

How to find your floor

The method beats the folklore: write an eval set of 100 to 200 real examples with expected outputs before touching model names. Then downgrade until the evals fail, and step back up one tier. Task-type starting points:

  • Nano: single-label decisions, JSON extraction from consistent formats, language detection, spam filtering. Wrong for anything open-ended.
  • Mini: summaries under a page, templated support replies, title generation, rewrites with clear instructions.
  • o4-mini: the budget reasoning slot. It thinks before answering, and those reasoning tokens bill as output, so its effective cost runs above the sticker for hard problems. Good for math validation and code-review triage.
  • GPT-5: the production default when output quality is customer-visible.
  • GPT-5.4 / GPT-5.5: planning steps in agents, final review passes, work where a wrong answer costs more than the model does.
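The downgrade-until-it-fails method can be sketched as a short loop. Everything named here is an assumption for illustration: the tier labels are not verified API identifiers, pass_rate stands in for whatever function runs your eval set against a model, and the 0.95 threshold is a placeholder you would set yourself.

```python
# Tiers ordered from most to least expensive. Labels are illustrative.
TIERS = ["gpt-5.5", "gpt-5.4", "gpt-5", "gpt-5-mini", "gpt-5-nano"]


def find_floor(tiers, pass_rate, threshold=0.95):
    """Walk down the tiers and return the last one that still passes the evals.

    pass_rate is a caller-supplied function: model label -> fraction of eval
    examples passed (0.0 to 1.0). Returns None if even the top tier fails.
    """
    floor = None
    for model in tiers:
        if pass_rate(model) < threshold:
            break  # evals failed: the previous tier is the floor
        floor = model
    return floor


if __name__ == "__main__":
    # Made-up pass rates, only to show the call shape.
    fake_scores = {"gpt-5.5": 0.99, "gpt-5.4": 0.99, "gpt-5": 0.98,
                   "gpt-5-mini": 0.96, "gpt-5-nano": 0.90}
    print(find_floor(TIERS, fake_scores.get))
```

The loop stops at the first failing tier instead of testing every model, which matches "step back up one tier" and avoids paying for eval runs on tiers you already know are too weak.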

Cascades beat single-model choices

Production systems rarely need one model; they need a cheap default and an escalation path. Route everything to Nano with a confidence check and escalate the uncertain cases to GPT-5. Assuming 15% of the tickets above escalate:

100,000 tickets through Nano            = $3.30
15,000 escalations through GPT-5
  (7.5M in x $1.25 + 0.3M out x $10)/1M = $12.38
total                                   = $15.68

That is 81% below the $82.50 all-GPT-5 bill, with GPT-5 quality reserved for the tickets Nano is unsure about. The confidence check can be as simple as asking Nano to emit a certainty field and escalating anything below a threshold.
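A minimal sketch of that check and of the cascade arithmetic follows. The JSON reply shape ({"label": ..., "certainty": ...}), the 0.8 threshold, and both function names are assumptions for illustration; the model call itself is left out, so this only covers what happens after Nano replies.

```python
import json

CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; tune it against your eval set


def route(nano_reply, threshold=CONFIDENCE_THRESHOLD):
    """Return (label, escalate) for one Nano reply.

    Assumes Nano was prompted to answer with JSON such as
    {"label": "billing", "certainty": 0.93}. Malformed replies escalate.
    """
    try:
        parsed = json.loads(nano_reply)
        label = parsed["label"]
        certainty = float(parsed["certainty"])
    except (ValueError, KeyError, TypeError):
        return None, True  # unparseable output is treated as uncertain
    if certainty >= threshold:
        return label, False
    return None, True


def cascade_cost(requests, in_tokens, out_tokens, escalation_rate,
                 cheap=(0.05, 0.40), strong=(1.25, 10.00)):
    """Return total USD cost: every request on the cheap model, a share re-run on the strong one.

    Prices are (input, output) USD per million tokens; defaults are Nano and GPT-5.
    """
    def cost(n, prices):
        return (n * in_tokens * prices[0] + n * out_tokens * prices[1]) / 1_000_000

    return cost(requests, cheap) + cost(requests * escalation_rate, strong)
```

Calling cascade_cost(100_000, 500, 20, 0.15) reproduces the figure above ($15.675 before rounding). A self-reported certainty field is a rough signal, so it is worth checking on your eval set that low-certainty replies really are the ones Nano gets wrong.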

Two levers that change the answer

Batch halves everything offline. The Batch API runs at a 50% discount for results within 24 hours. The Nano cascade above drops from $15.68 to about $7.84 if the whole job can wait overnight; the mechanics and fit are in "the Batch API: when 50% off is worth the wait".
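When only part of a workload can wait, the discount applies to that part alone. The sketch below assumes the 50% figure quoted above; batch_cost and batch_share are hypothetical names, with batch_share meaning the fraction of metered spend that can move to batch.

```python
BATCH_DISCOUNT = 0.50  # the 50% Batch API discount quoted above


def batch_cost(metered_cost, batch_share=1.0):
    """Return cost in USD when batch_share of the metered spend moves to the Batch API."""
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    return metered_cost * (1.0 - BATCH_DISCOUNT * batch_share)
```

For the $15.68 cascade, batch_cost(15.68) gives $7.84 when everything waits overnight, and batch_cost(15.68, 0.5) gives $11.76 when half the spend still needs live answers.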

Flat capacity makes the question moot for bulk work. Model-shaving exists because every token is metered. Work routed through a subscription-backed Codex lane bills against a flat ChatGPT plan instead, so the per-token spread stops mattering for whatever that lane serves. Two limits apply: the available models are whatever Codex exposes, and capacity comes as plan windows, which are estimates, not guarantees. When bulk jobs move off the meter, model choice goes back to being a quality decision instead of a budget one. The arithmetic is in the API vs subscription cost comparison.

Price your own workload both ways in the calculator; it takes token counts and shows the per-model meter cost next to the flat-lane setup.

Frequently asked questions

What is the cheapest OpenAI model in 2026?

GPT-5 Nano, at $0.05 per million input tokens and $0.40 per million output tokens as of June 2026. It handles classification, extraction, routing, and other narrow tasks well. GPT-5 Mini at $0.25/$2 is the cheapest model most teams can run for general production text work.

Is GPT-5 Nano good enough for production?

For narrow, well-specified tasks, yes: classification, entity extraction, deduplication, routing, and moderation pre-filters. It is the wrong choice for open-ended writing or multi-step reasoning. The reliable pattern is an eval suite: downgrade until your evals fail, then step back up one tier.

When is GPT-5.5 worth the price?

When the task carries judgment that cheaper models measurably fail: agent planning, hard debugging, high-stakes drafts. At $5/$30 per million tokens it costs 75x to 100x GPT-5 Nano depending on the input/output mix (about 94x on the classification job above), so it earns its keep as the escalation tier, not the default.

How much does the OpenAI Batch API save?

Batch runs at a 50% discount on both input and output tokens in exchange for results within 24 hours instead of seconds. For offline jobs like backfills, nightly classification, or bulk summarization, it halves whatever model price you chose.
