Blitz

Stop losing customers to long AI response times.

Classify. Extract. Score. Summarize. Respond. Run one after another, five prompts that could finish at once take five times as long. Blitz fans them out in parallel with a single call.

Start with Blitz

Free on every account. 0% markup on inference. Bring your own keys.

Without vs. with ProxyLLM

Eight prompts of roughly 0.9 seconds each, run in a row, add up to 7.5 seconds of spinner. In parallel, the whole batch takes 0.94 seconds, the time of the slowest call.

Sequential: 7.50s
Total = sum of every individual call (about 0.9s each). User watches a spinner.

ProxyLLM Blitz: 0.94s
Total = slowest single call. User does not see a spinner.
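
The two totals come from simple arithmetic: sequential latency is the sum of the per-call latencies, and parallel latency is the maximum. A minimal sketch, assuming made-up per-call timings chosen to reproduce the 7.50s and 0.94s figures (the individual values are illustrative, not measurements):

```typescript
// Hypothetical per-call latencies in seconds. Only the two totals
// (7.50 and 0.94) come from the comparison above.
const latencies: number[] = [0.94, 0.94, 0.94, 0.94, 0.94, 0.94, 0.93, 0.93]

// Sequential: each call waits for the previous one, so the total is the sum.
function sequentialTotal(xs: number[]): number {
  return xs.reduce((acc, x) => acc + x, 0)
}

// Parallel: all calls start together, so the total is the slowest call.
function parallelTotal(xs: number[]): number {
  return Math.max(...xs)
}

console.log(sequentialTotal(latencies).toFixed(2)) // "7.50"
console.log(parallelTotal(latencies).toFixed(2)) // "0.94"
```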

One call. All results.

Drop the for-loop. Replace it with a Blitz request that takes an array of prompts.

before.ts

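import OpenAI from "openai"

const openai = new OpenAI() // reads OPENAI_API_KEY from the environment
const results = []
// Each iteration waits for the previous call, so latency adds up.
for (const prompt of prompts) {
  const r = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
  })
  results.push(r)
}
// 8 prompts · 7.5s total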

after.ts

const { results } = await proxyllm.blitz({
  model: "gpt-4o-mini",
  prompts,
  max_usd: 0.50,
})
// 8 prompts · 0.94s total
// rate-limit aware, partial-failure handled

What people use Blitz for.

Anywhere you have a for-loop around an LLM call, Blitz fits.

Classify, then extract

Two passes on the same input. Classify the message, then extract structured fields. Blitz runs both at once and returns when both are done.

Multi-aspect scoring

Score a piece of content on tone, accuracy, brand fit, and risk. Four separate calls, one Blitz request, all four results back at once.
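
As a rough illustration, the four scoring prompts could be built from one piece of content like this. The helper name `buildScoringPrompts` and the prompt wording are hypothetical, not part of ProxyLLM:

```typescript
// The four aspects named above.
const ASPECTS = ["tone", "accuracy", "brand fit", "risk"] as const
type Aspect = (typeof ASPECTS)[number]

interface ScoringJob {
  aspect: Aspect
  prompt: string
}

// Hypothetical helper: one prompt per aspect, all over the same content.
function buildScoringPrompts(content: string): ScoringJob[] {
  return ASPECTS.map((aspect) => ({
    aspect,
    prompt:
      `Score the following content for ${aspect} on a scale of 1 to 10. ` +
      `Reply with the number only.\n\n${content}`,
  }))
}

const jobs = buildScoringPrompts("Our new plan ships Friday.")
// Plain string array, in the same order as ASPECTS.
const scoringPrompts: string[] = jobs.map((j) => j.prompt)
console.log(scoringPrompts.length) // 4
```

An array like `scoringPrompts` is the kind of value the `prompts` field in after.ts expects. Keeping `aspect` next to each prompt makes it easy to match results back to what they score.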

Fan-out summarization

Summarize 30 documents. Sequential is a coffee break. Blitz finishes before you switch tabs.

A/B prompt evaluation

Same input, three prompt variants. Compare outputs side by side. Blitz gets you all three in the time of one.


Same tokens. Same bill. About 8x faster on an eight-prompt batch.

Blitz is free on every account. Sign up, drop in your keys, swap the for-loop for one call.


Codex Hosted · the main feature

Run your AI workloads on your ChatGPT subscription.

ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing. Blitz fans the calls out. Codex Hosted is the lane where they run at a flat price.

Get Codex Hosted · How it works

Questions on Blitz.

What if one of the prompts fails?

Blitz returns partial results with per-prompt status. The rest of your batch is unaffected. You decide whether to retry the failures or move on.
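
The Blitz response shape is not shown on this page, so the following is not its API. It is a general sketch of the same idea using the standard `Promise.allSettled`: every prompt gets its own status and one failure does not sink the batch. The `PromptResult` type, `runAll`, and `fakeCall` are hypothetical names.

```typescript
// Hypothetical per-prompt result: either an output or an error message.
type PromptResult =
  | { index: number; status: "ok"; output: string }
  | { index: number; status: "failed"; error: string }

// Run every prompt in parallel and report each outcome separately.
async function runAll(
  prompts: string[],
  call: (prompt: string) => Promise<string>,
): Promise<PromptResult[]> {
  // allSettled never rejects; it waits for every call to finish or fail.
  const settled = await Promise.allSettled(prompts.map((p) => call(p)))
  return settled.map((s, index): PromptResult =>
    s.status === "fulfilled"
      ? { index, status: "ok", output: s.value }
      : { index, status: "failed", error: String(s.reason) },
  )
}

// Stand-in for a real model call: fails for one specific prompt.
async function fakeCall(prompt: string): Promise<string> {
  if (prompt === "bad") throw new Error("provider error")
  return `echo: ${prompt}`
}

// Indexes of failed prompts, ready for a retry pass.
function failedIndexes(results: PromptResult[]): number[] {
  return results.filter((r) => r.status === "failed").map((r) => r.index)
}
```

Calling `runAll(["a", "bad", "c"], fakeCall)` resolves with two `ok` entries and one `failed` entry. The caller can then retry only the indexes returned by `failedIndexes`, or move on.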

Does it respect rate limits?

Yes. Blitz is rate-limit aware across providers. It will back off, queue, and reschedule based on the limits your keys have. You can also set a hard concurrency cap.
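
How Blitz schedules calls internally is not described here. As an illustration of what a hard concurrency cap means, here is a minimal sketch of a fixed-size worker pool. The helper name `mapWithLimit` is hypothetical, and the sketch leaves out back-off and rescheduling entirely.

```typescript
// Run fn over items with at most `limit` calls in flight at once.
// Results come back in input order.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length)
  let next = 0 // index of the next item nobody has picked up yet

  // Each worker pulls the next unclaimed item until none are left.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++
      results[i] = await fn(items[i], i)
    }
  }

  // Start at least one worker and never more than there are items.
  const workerCount = Math.max(1, Math.min(limit, items.length))
  // If any call rejects, this rejects too; wrap fn to capture errors instead.
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}
```

With `limit` set to 2 and five items, two calls start immediately and each finished call frees a slot for the next item, so no more than two are ever in flight.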

Can I set a cost ceiling?

Yes. Set a maximum spend in USD per Blitz call (the max_usd field in the after.ts example). ProxyLLM stops dispatching when the cap is hit and returns whatever finished.
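
The behavior described, stop dispatching at the cap and return what finished, can be sketched as follows. This is an illustration under assumptions, not ProxyLLM's implementation: `runWithBudget` is a hypothetical helper, and it assumes each call reports its own cost once it completes.

```typescript
interface Finished {
  index: number
  output: string
  costUsd: number
}

// Dispatch prompts until recorded spend reaches maxUsd, then stop picking
// up new ones. Calls already in flight still finish, so total spend can
// overshoot the cap by up to `concurrency` calls.
async function runWithBudget(
  prompts: string[],
  maxUsd: number,
  concurrency: number,
  call: (prompt: string) => Promise<{ output: string; costUsd: number }>,
): Promise<{ finished: Finished[]; skipped: number[]; spentUsd: number }> {
  const finished: Finished[] = []
  let spentUsd = 0
  let next = 0 // every index below `next` has been dispatched

  async function worker(): Promise<void> {
    while (next < prompts.length && spentUsd < maxUsd) {
      const index = next++
      const { output, costUsd } = await call(prompts[index])
      spentUsd += costUsd
      finished.push({ index, output, costUsd })
    }
  }

  await Promise.all(Array.from({ length: Math.max(1, concurrency) }, () => worker()))
  // Anything never dispatched is reported back as skipped.
  const skipped = prompts.map((_, i) => i).filter((i) => i >= next)
  return { finished, skipped, spentUsd }
}
```

With five prompts that cost $0.25 each, a $0.50 cap, and a concurrency of 1, two prompts finish and the remaining three come back as skipped.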

Does Blitz work with my Codex subscription?

Yes. Blitz distributes calls across whatever credentials you have configured. If your Codex container can absorb half the batch, it does. The rest falls back to API keys.
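
One way to picture the fallback is a priority-ordered split: each lane takes as many prompts as it can absorb and the remainder spills to the next. How Blitz actually measures capacity is not stated here, so the `Lane` type, `assignLanes`, and the capacity numbers below are hypothetical.

```typescript
// A credential "lane" and how many prompts it can absorb in this batch.
// Use Infinity for a catch-all fallback lane.
interface Lane {
  name: string
  capacity: number
}

// Assign prompt indexes to lanes in priority order.
function assignLanes(promptCount: number, lanes: Lane[]): Map<string, number[]> {
  const plan = new Map<string, number[]>()
  let next = 0 // first prompt index not yet assigned
  for (const lane of lanes) {
    const take = Math.max(0, Math.min(lane.capacity, promptCount - next))
    const start = next
    plan.set(lane.name, Array.from({ length: take }, (_, i) => start + i))
    next += take
  }
  return plan
}

// Eight prompts: the Codex lane absorbs half, API keys take the rest.
const plan = assignLanes(8, [
  { name: "codex", capacity: 4 },
  { name: "api-keys", capacity: Infinity },
])
console.log(plan.get("codex")) // [0, 1, 2, 3]
console.log(plan.get("api-keys")) // [4, 5, 6, 7]
```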