Model integration · Cohere

Cohere for RAG, behind one gateway.

Run command-r-plus and the rest of the Command family for retrieval-heavy work without provider-specific glue code. Requests run on your own provider key; ProxyLLM wraps the traffic with sub-keys, budget caps, and request logs.

$129/month SaaS. Bring your own model keys. No inference markup.

Three steps to connect.

01

Use your OpenRouter key

Cohere's Command family is available through OpenRouter today. Add your own OpenRouter key once; native Cohere key storage can come later without touching client code.

02

Keep one gateway

Send requests to https://api.proxyllm.ai/v1 with your ProxyLLM key. Your app never learns Cohere's native API shape.

03

Watch retrieval spend

RAG summarization burns tokens on context. Track volume per sub-key and set budget caps before a retrieval pipeline surprises you.

Cohere as a passthrough model.

Use cohere/-prefixed model names, such as cohere/command-r-plus, wherever your configured provider exposes them.

client.py
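from openai import OpenAI

# Point the standard OpenAI SDK at the ProxyLLM gateway.
client = OpenAI(
    base_url="https://api.proxyllm.ai/v1",
    api_key="pk_live_...",  # your ProxyLLM key, not a Cohere or OpenRouter key
)

# Cohere models are addressed with the cohere/ prefix.
r = client.chat.completions.create(
    model="cohere/command-r-plus",
    messages=[{"role": "user", "content": "Answer from these retrieved passages."}],
)
print(r.choices[0].message.content)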
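
The user message in client.py is a placeholder. In a real retrieval pipeline you would pack the retrieved passages into the prompt yourself. Here is a minimal sketch of one way to do that; the helper name build_rag_messages, the numbering scheme, and the system prompt wording are illustrative assumptions, not part of ProxyLLM or Cohere.

```python
from typing import Dict, List


def build_rag_messages(question: str, passages: List[str]) -> List[Dict[str, str]]:
    """Pack retrieved passages and a question into chat messages (hypothetical helper)."""
    # Number each passage so the model can refer to it as [1], [2], ...
    context = "\n\n".join(
        f"[{i}] {p.strip()}" for i, p in enumerate(passages, start=1)
    )
    system = (
        "Answer using only the numbered passages. "
        "If they do not contain the answer, say so."
    )
    user = f"Passages:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

The returned list can be passed as messages= to client.chat.completions.create. This assumes your configured provider accepts a system message for the model you pick; if it does not, fold the instruction into the user message.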

Codex Hosted · the main feature

Run your AI workloads on your ChatGPT subscription.

ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.

$129/month · normal SaaS pricing

Control retrieval spend.

RAG gets expensive quickly. ProxyLLM gives each pipeline a scoped sub-key, a budget cap, and a request log, with no markup on inference.
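
Budget caps are enforced at the gateway, but a pipeline can also keep its own running tally for early warning. The sketch below is one way to do that on the client side. The TokenBudget class, the pipeline names, and the token-based cap are all hypothetical; it also assumes responses carry the usual OpenAI-style usage fields (prompt_tokens, completion_tokens), which you should confirm for your provider.

```python
from collections import defaultdict
from typing import Dict


class TokenBudget:
    """Local per-pipeline token tally with a soft cap (hypothetical helper)."""

    def __init__(self, cap_tokens: int) -> None:
        self.cap_tokens = cap_tokens
        self.used: Dict[str, int] = defaultdict(int)

    def record(self, pipeline: str, prompt_tokens: int, completion_tokens: int) -> int:
        """Add one response's usage and return the pipeline's running total."""
        self.used[pipeline] += prompt_tokens + completion_tokens
        return self.used[pipeline]

    def remaining(self, pipeline: str) -> int:
        """Tokens left before the local cap; never negative."""
        return max(self.cap_tokens - self.used[pipeline], 0)

    def exceeded(self, pipeline: str) -> bool:
        """True once the pipeline has used its full local allowance."""
        return self.used[pipeline] >= self.cap_tokens
```

After each call you might run budget.record("support-rag", r.usage.prompt_tokens, r.usage.completion_tokens) and skip or shrink the next retrieval batch once budget.exceeded("support-rag") is true. This is a local mirror only; the gateway's own cap and request log remain the source of truth.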