Silent cost leak

Stop losing money to uncached prompts.

Most teams enable prompt caching and assume it works. But a whitespace edit, a forgotten flag, or a timestamp in the wrong place means you pay full price for the same prompt again and again.

Start tracking your caching · How it works

Free, always. No card. No trial. Bring your own keys.

Three ways your cache quietly breaks.

None of them throw an error. None of them show up in logs. All of them inflate your bill.

Someone edited the system prompt.

A teammate adds one whitespace character to a system prompt to make logs cleaner. The hash changes. Every request that used to hit the cache now misses, and a week passes before anyone notices.
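To see why a single character matters, here is a minimal sketch that hashes a system prompt before and after a trailing space is added. SHA-256 is only a stand-in here: neither ProxyLLM's hashing scheme nor any provider's internal cache key is specified in this page, and the prompt text and the `prompt_hash` helper are made up for illustration.

```python
import hashlib


def prompt_hash(text: str) -> str:
    """Return a hex SHA-256 digest of the exact prompt bytes (illustrative only)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


original = "You are a helpful support agent."
edited = "You are a helpful support agent. "  # one trailing space added

print(prompt_hash(original)[:12])
print(prompt_hash(edited)[:12])
print("same hash:", prompt_hash(original) == prompt_hash(edited))
```

The two digests differ, which is the point: any exact-match scheme treats the edited prompt as a brand-new one.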

A config flag got flipped.

A new SDK version, a refactor, an env var rename. Caching silently turns off. The app keeps working. The bill keeps growing. You only catch it on billing day.

A prompt has a timestamp in it.

Someone interpolates new Date() or a UUID into the prompt without realizing it. Every request looks unique to the cache, and the hit rate can drop to 0%. No exception, no error, no log line.
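One way to catch this before it ships is to scan prompt templates for values that change on every request. The sketch below is an assumption-heavy example, not a ProxyLLM feature: the two regular expressions, the `find_volatile_values` helper, and the sample prompt are all hypothetical, and real prompts may need more patterns.

```python
import re

# Hypothetical patterns for values that change on every request.
VOLATILE_PATTERNS = {
    "uuid": re.compile(
        r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
        re.IGNORECASE,
    ),
    "iso_timestamp": re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(?::\d{2})?"),
}


def find_volatile_values(prompt: str) -> list[tuple[str, int]]:
    """Return (kind, character offset) pairs, sorted by offset."""
    hits = [
        (kind, match.start())
        for kind, pattern in VOLATILE_PATTERNS.items()
        for match in pattern.finditer(prompt)
    ]
    return sorted(hits, key=lambda hit: hit[1])


prompt = (
    "Current time: 2024-05-01T12:30:00Z\n"
    "You are a billing assistant. Answer briefly.\n"
    "Request id: 123e4567-e89b-12d3-a456-426614174000"
)
for kind, offset in find_volatile_values(prompt):
    print(f"{kind} at character {offset}")
```

The offset matters because provider prompt caches generally match on the prompt prefix. A volatile value near the start invalidates everything after it, while the same value placed at the end leaves the stable prefix cacheable.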

How it works

Every request gets a price tag.

ProxyLLM hashes every prompt server-side, watches what the provider reports as cached, and keeps that separate from repeat detection. Provider cache hit rate per model. Repeats over the last 30 days. Cost trend per day. All of it on every account.

Per-request USD cost, per-day rollup, per-model breakdown
Repeating prompt detection out of the box, with near-repeats flagged
30+ days of full request history
Alerts when hit rate drops below your threshold
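As a rough sketch of the hit-rate alert described above, the example below computes a token-weighted hit rate and compares it to a threshold. The token-weighted definition, the `RequestRecord` type, and the 40% threshold are assumptions for illustration; ProxyLLM may compute its figure differently. The token counts are borrowed from the sample request log on this page.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    model: str
    input_tokens: int
    cached_tokens: int  # input tokens the provider reported as served from cache


def hit_rate(records: list[RequestRecord]) -> float:
    """Share of input tokens reported as cached, from 0.0 to 1.0."""
    total = sum(r.input_tokens for r in records)
    if total == 0:
        return 0.0
    return sum(r.cached_tokens for r in records) / total


def should_alert(records: list[RequestRecord], threshold: float) -> bool:
    """True when the hit rate has fallen below the configured threshold."""
    return hit_rate(records) < threshold


records = [
    RequestRecord("gpt-4o-mini", 1204, 824),
    RequestRecord("claude-3.5-sonnet", 1204, 0),
    RequestRecord("gpt-4o-mini", 892, 0),
]
print(f"hit rate: {hit_rate(records):.1%}")
if should_alert(records, threshold=0.40):  # hypothetical threshold
    print("alert: hit rate below threshold")
```

Weighting by tokens rather than by request count keeps one large uncached prompt from hiding behind many small cached ones.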

Provider cache hit rate: 42.1% (last 7d)
Provider savings: $1.07 (when reported)
Repeats caught (this week)
Models tested (this week)

Request log · now

gpt-4o-mini · 1,204 in · 824 provider cached · $0.0003
claude-3.5-sonnet · 1,204 in · 0 provider cached · $0.0058
gpt-4o-mini (repeat) · 1,204 in · 1,204 provider cached · $0.0002
claude-3.5-sonnet (repeat) · 1,204 in · 1,204 provider cached · $0.0002
gpt-4o-mini · 892 in · 0 provider cached · $0.0008

Free on every account · 0% markup on inference

Cache visibility is the cheapest fix in your stack.

Sign up, drop in your keys, point your SDK at our base URL. Hit rate, savings, and repeat detection turn on automatically.

See your hit rate

Codex Hosted · the main feature

Run your AI workloads on your ChatGPT subscription.

ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing. Cache analytics shows what each request costs. Codex Hosted is how you stop paying per token.

Get Codex Hosted · How it works

Questions on cache analytics.

Is cache analytics included with membership?

Yes. Cache analytics is part of the ProxyLLM suite. One membership at $129/month covers gateway, routing, sub-keys, Blitz, schema outputs, analytics, and Codex Hosted.

Do you change my caching behavior?

No. ProxyLLM observes. Caching happens at the provider (OpenAI prompt caching, Anthropic prompt caching). We hash the prompt server-side, watch the response, and tell you the truth about what your provider actually cached.
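For a sense of what "watching the response" can look like, here is a small sketch that reads the cached-token count from a provider's usage payload. The field names follow the OpenAI and Anthropic API responses (`prompt_tokens_details.cached_tokens` and `cache_read_input_tokens`). The `cached_input_tokens` helper and the sample payloads are hypothetical and do not describe ProxyLLM's internals.

```python
def cached_input_tokens(provider: str, usage: dict) -> int:
    """Return the number of input tokens the provider reported as read from cache."""
    if provider == "openai":
        details = usage.get("prompt_tokens_details") or {}
        return int(details.get("cached_tokens") or 0)
    if provider == "anthropic":
        return int(usage.get("cache_read_input_tokens") or 0)
    raise ValueError(f"unknown provider: {provider}")


# Sample payloads with made-up numbers.
openai_usage = {
    "prompt_tokens": 1204,
    "completion_tokens": 50,
    "prompt_tokens_details": {"cached_tokens": 1024},
}
anthropic_usage = {
    "input_tokens": 1204,
    "output_tokens": 50,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
}

print(cached_input_tokens("openai", openai_usage))
print(cached_input_tokens("anthropic", anthropic_usage))
```

A missing or empty field is treated as zero cached tokens, so a request where caching silently turned off shows up as a miss rather than as an error.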

What counts as a repeat?

A full match on system + user message hash, within a configurable window (default 30 days). We also surface near-repeats so you can find prompt templates that almost match but blow the cache.
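The rule above can be sketched as follows, assuming SHA-256 for the exact match and `difflib.SequenceMatcher` for near-repeats. Both choices, the 0.95 similarity cutoff, and the `classify` helper are illustrative assumptions; the page does not say how ProxyLLM implements either check.

```python
import hashlib
from datetime import datetime, timedelta
from difflib import SequenceMatcher

WINDOW = timedelta(days=30)  # default window named in the text
NEAR_REPEAT_RATIO = 0.95     # hypothetical similarity cutoff


def message_hash(system: str, user: str) -> str:
    # A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
    return hashlib.sha256(f"{system}\x00{user}".encode("utf-8")).hexdigest()


def classify(system: str, user: str, now: datetime, history: list) -> str:
    """history holds (timestamp, system, user) tuples for earlier requests."""
    recent = [(s, u) for ts, s, u in history if now - ts <= WINDOW]
    target = message_hash(system, user)
    if any(message_hash(s, u) == target for s, u in recent):
        return "repeat"
    text = system + "\n" + user
    for s, u in recent:
        if SequenceMatcher(None, text, s + "\n" + u).ratio() >= NEAR_REPEAT_RATIO:
            return "near-repeat"
    return "new"


now = datetime(2024, 6, 1)
system = "You are a billing assistant."
user = "Summarize invoice 1001 for the customer."
history = [(datetime(2024, 5, 20), system, user)]

print(classify(system, user, now, history))        # exact match inside the window
print(classify(system + " ", user, now, history))  # one extra space in the system prompt
```

The second call is the interesting one: the prompt differs by a single space, so it fails the exact hash match but is still similar enough to be flagged as a template that almost matches.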

How far back can I look?

30+ days of full request history on every account. Per-request USD cost, per-day rollup, per-model breakdown.