Stop losing money to uncached prompts.
Most teams enable prompt caching and assume it works. But a whitespace edit, a forgotten flag, a timestamp in the wrong place means you're paying for the same prompt twice.
Free, always. No card. No trial. Bring your own keys.
Three ways your cache quietly breaks.
None of them throw an error. None of them show up in logs. All of them inflate your bill.
Someone edited the system prompt.
A teammate adds one whitespace character to a system prompt to make logs cleaner. Hash changes. Every request that used to hit cache misses for a week before anyone notices.
A config flag got flipped.
A new SDK version, a refactor, an env var rename. Caching silently turns off. The app keeps working. The bill keeps growing. You only catch it on billing day.
A prompt has a timestamp in it.
Someone interpolates new Date() or a UUID into the prompt without realizing it. Every request looks unique to the cache. Hit rate drops to 0%. No exception, no error, no log line.
Every request gets a price tag.
ProxyLLM hashes every prompt server-side, watches what the provider reports as cached, and keeps that separate from repeat detection. Provider cache hit rate per model. Repeats over the last 30 days. Cost trend per day. All of it on every account.
- Per-request USD cost, per-day rollup, per-model breakdown
- Repeating prompt detection out of the box, with near-repeats flagged
- 30+ days of full request history
- Alerts when hit rate drops below your threshold
Cache visibility is the cheapest fix in your stack.
Sign up, drop in your keys, point your SDK at our base URL. Hit rate, savings, and repeat detection turn on automatically.
Run your AI workloads on your ChatGPT subscription.
ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing. Cache analytics shows what each request costs. Codex Hosted is how you stop paying per token.
Questions on cache analytics.
Is cache analytics included with membership?
Yes. Cache analytics are part of the ProxyLLM suite. One membership at $129/month covers gateway, routing, sub-keys, Blitz, schema outputs, analytics, and Codex Hosted.
Do you change my caching behavior?
No. ProxyLLM observes. Caching happens at the provider (OpenAI prompt caching, Anthropic prompt caching). We hash the prompt server-side, watch the response, and tell you the truth about what your provider actually cached.
What counts as a repeat?
A full match on system + user message hash, within a configurable window (default 30 days). We also surface near-repeats so you can find prompt templates that almost match but blow the cache.
How far back can I look?
30+ days of full request history on every account. Per-request USD cost, per-day rollup, per-model breakdown.