Routing

Not every prompt needs your premium model.

You are paying GPT-5.5 prices to rename a variable. Most prompts can be answered by a model that costs 30 to 80 times less. ProxyLLM picks the cheapest model that can handle each request, automatically.

Start routing See what you would save

$129/month SaaS. Bring your own model keys. No inference markup.

What you are paying for vs what you need.

A sample of prompts most teams send to the same expensive model. The first five do not need it.

Prompt

Routed to

Routed cost

Saved

"Rename a variable"

minimax

$0.0001

82x

"Add JSDoc to function"

gpt-4o-mini

$0.0004

31x

"Classify support ticket"

claude-haiku

$0.0003

31x

"Extract names from text"

gpt-4o-mini

$0.0005

29x

"Summarize a paragraph"

minimax

$0.0002

45x

"Refactor auth middleware"

gpt-5.5

$0.0820

Same prompts. Same outputs. The last one needed the big model. ProxyLLM knew the difference.

How the router picks a model.

No black box. Every routing decision is visible in the response.

Classify intent

A cheap classifier reads the prompt, returns a difficulty score, a task type, and a confidence.

Match your config

ProxyLLM looks up the model assigned to that score range. Visual editor or JSON, you decide.

Return the reason

The response includes which model ran it and why. Audit any routing decision in your logs.

response.json

{
  "choices": [{ "message": { "content": "..." }}],
  "x_proxyllm": {
    "routed_to": "gpt-4o-mini",
    "reason": "intent=format · difficulty=low · confidence=0.94",
    "cost_usd": 0.0004,
    "would_have_cost_usd": 0.0125
  }
}

$129/month · normal SaaS pricing

Routing pays for itself by lunch.

If routing saves you more than $129 a month, you come out ahead. Most teams clear that once model traffic is meaningful.

Start routing Run the math

Codex Hosted · the main feature

Run your AI workloads on your ChatGPT subscription.

ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing. Routing picks the model for each prompt. Codex Hosted makes the OpenAI lane a flat price.

Get Codex Hosted How it works

Questions on routing.

How does the router decide which model to use?

A cheap classifier call inspects intent and difficulty. It returns a score, and ProxyLLM matches it against your routing config. The chosen model plus the routing reason come back in every response so you can audit it.

What if the cheap model gets it wrong?

Set confidence thresholds in the routing config. Below a threshold, the request escalates to the next-tier model automatically. You can also force-route specific prompt templates to a specific model.

Do I have to rewrite my prompts?

No. Point your OpenAI SDK at the ProxyLLM base URL and routing turns on. Your prompts and your code stay exactly the same.

How much will I actually save?

Depends on your prompt mix. Teams sending mostly simple prompts (classification, extraction, formatting) see 60-90% off their inference bill. Teams with heavy refactor workloads see 20-40%. Try the calculator.