What Works with Codex Hosted (and What Doesn't)

If a tool accepts an OpenAI base URL, it works with Codex Hosted. The caveats, honestly: complete responses, Codex's model surface, and what stays on your API key.

Compatibility with Codex Hosted comes down to one rule: if a tool lets you set an OpenAI base URL, it works, because the endpoint speaks the standard OpenAI request and response shape. The honest caveats are three. The Codex lane returns complete responses rather than streams, the model list is whatever Codex currently serves, and API-platform features like embeddings and fine-tunes stay on your own key.

This page is the capability map: what runs, what runs with a caveat, and what should not move at all. If you want the feature definition first, what is Codex Hosted? covers it in two minutes.

The one-line compatibility test

If a tool accepts an OpenAI base URL and can wait for a complete response, it can run on a ChatGPT subscription. The swap is two values:

export OPENAI_BASE_URL="https://api.proxyllm.ai/v1"
export OPENAI_API_KEY="pllm_your_key_here"

Everything downstream of those variables, request format, response parsing, error handling, behaves like the OpenAI API it was written for. The five-minute walkthrough is in the setup guide.

The capability table

CapabilityOn the Codex laneWhere it runs instead
/v1/chat/completionsYes, standard payloadsn/a
Official OpenAI SDKs (Python, Node)Yes, base URL swapn/a
Automation platforms (n8n, Make, Zapier)Yes, custom base URL or HTTP modulen/a
Agent frameworks (LangChain, LlamaIndex)Yesn/a
Plain curl and HTTP clientsYesn/a
Streaming (stream: true)No, complete responses onlyAPI-key lane, which streams
Full API model catalogNo, Codex’s model surfaceYour OpenAI key
EmbeddingsNoYour OpenAI key
Fine-tuningNoYour OpenAI key
Parameters outside Codex’s surfaceNoYour OpenAI key

The left column is where the flat-plan economics live. The right column is why a good setup keeps a key configured even when the subscription does the bulk work.

Complete responses, not streams

The Codex lane returns the whole response at once. We say this everywhere because it is the one behavioral difference your code can observe, and pretending otherwise would cost you a debugging afternoon.

Most programmatic workloads never notice. Cron jobs, agents, pipelines, document processing, CI checks, and webhook handlers all consume the finished text; a stream would be buffered into a string anyway. What notices is a human watching a screen: a chat interface that renders tokens as they arrive will feel broken waiting on a complete payload.

The split that works in practice: bulk and background traffic on the Codex lane, interactive chat on an API-key lane. Both lanes live behind the same endpoint, and the request log shows which lane served each call.

The model surface

The endpoint serves what Codex serves. OpenAI decides that set and rotates it as models ship, so we refuse to hardcode a list that would be stale by autumn. Ask the endpoint instead:

curl -s "$OPENAI_BASE_URL/models" \
  -H "Authorization: Bearer $OPENAI_API_KEY"

If your workload depends on a specific API-only model, a tuned snapshot, or sampling parameters Codex does not expose, that traffic belongs on your key. Raw request and response anatomy, including error shapes, is covered in the curl walkthrough.

What stays on your API key

Three categories, stated plainly so nothing surprises you in production:

  1. Embeddings and fine-tunes. API-platform features with no Codex equivalent.
  2. Streaming interfaces. Anything where perceived latency is the product.
  3. Catalog and parameter edge cases. Models or options outside what Codex serves.

Your key can route through ProxyLLM as a passthrough lane with no markup, which keeps every request, flat or metered, in one log. That is also the fallback path when plan windows are exhausted.

App-by-app guidance

SDK codebases change two constructor arguments and are done. Automation platforms point their OpenAI credential’s base URL field, or an HTTP module, at the endpoint; the n8n version is worked through in using your ChatGPT subscription in n8n. Agent frameworks configure a custom endpoint once and inherit it everywhere, which matters because agent loops are where per-token billing hurts most. Chat products should split traffic as described above rather than forcing one lane to do both jobs.

Tool-specific configuration pages live in integrations.

The capability map is honest because the economics survive honesty: the workloads that fit the Codex lane are exactly the high-volume programmatic ones that make per-token bills painful. If that sounds like your bill, the calculator prices the move in thirty seconds.

Frequently asked questions

Does Codex Hosted work with any app that accepts an OpenAI base URL?

Yes. The endpoint speaks the standard OpenAI request and response shape at https://api.proxyllm.ai/v1, so official SDKs, automation platforms, agent frameworks, and plain HTTP clients work with a base URL and key swap, no code changes.

Does Codex Hosted support streaming responses?

The Codex lane returns complete responses, not token streams. API-key fallback lanes stream normally. Backend jobs, agents, and batch work rarely notice; interfaces that render token by token should run on a key lane.

Which models does Codex Hosted serve?

The models Codex itself currently serves, which OpenAI rotates over time. Query GET /models on the endpoint for the live list rather than relying on a published snapshot. The API's full catalog stays available through your own key.

Can I run embeddings or fine-tuning through Codex Hosted?

Not on the Codex lane. Embeddings, fine-tunes, and parameters outside Codex's surface are API-platform features and belong on your OpenAI key, either called directly or routed through ProxyLLM's passthrough lane with no markup.

Should a chat UI use Codex Hosted?

Only if it tolerates complete responses. The practical split is to run background and bulk work on the Codex lane and keep interactive, token-streamed chat on an API-key lane, both behind the same endpoint.

More on Codex Hosted
Codex Hosted · the main feature

Run your AI workloads on your ChatGPT subscription.

ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.