Model integration · Replicate

Replicate for model breadth.

Route requests for chat-compatible Replicate text models through one endpoint on your own key. Specialized predictions stay on Replicate's native API until direct adapter support lands.

$129/month SaaS. Bring your own model keys. No inference markup.

Three steps to connect.

Use Replicate for model breadth

Replicate hosts community models and custom deployments. ProxyLLM covers the chat-compatible text side today; native Replicate API support is future work.

Unify text inference

Send chat-completion-compatible requests through https://api.proxyllm.ai/v1 on your own key so usage tracking and budget caps stay in ProxyLLM.

Keep media on the native API

Image, video, and other non-chat Replicate predictions stay on Replicate's own API until ProxyLLM adds a direct adapter.
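The split described in the steps above can be sketched as a small routing helper. This is an illustrative sketch only: the workload labels ("chat", "text"), the `Route` type, and `route_for` are hypothetical names, not part of ProxyLLM or Replicate. The gateway URL comes from this page, and the second URL is Replicate's public HTTP API base.

```python
from dataclasses import dataclass

# Gateway URL as given on this page.
PROXYLLM_BASE_URL = "https://api.proxyllm.ai/v1"
# Replicate's native HTTP API base, used for non-chat predictions.
REPLICATE_BASE_URL = "https://api.replicate.com/v1"

# Hypothetical workload labels; adapt them to however your app tags requests.
CHAT_KINDS = {"chat", "text"}


@dataclass(frozen=True)
class Route:
    base_url: str
    via_gateway: bool


def route_for(kind: str) -> Route:
    """Pick an endpoint: chat-compatible text goes through the gateway,
    everything else (image, video, ...) stays on Replicate's own API."""
    if kind.lower() in CHAT_KINDS:
        return Route(PROXYLLM_BASE_URL, via_gateway=True)
    return Route(REPLICATE_BASE_URL, via_gateway=False)
```

For example, `route_for("chat")` points at the gateway, while `route_for("image")` points at Replicate directly. Keeping the decision in one place should make it easier to move a workload across if a direct adapter ships.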

Unify the text side first.

Chat-compatible Replicate models sit behind the same OpenAI-compatible gateway on your key.

client.py

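from openai import OpenAI

# Point the OpenAI SDK at the ProxyLLM gateway instead of the default host.
client = OpenAI(
    base_url="https://api.proxyllm.ai/v1",
    api_key="pk_live_...",  # key elided here; substitute your own
)

# Same chat-completions call as for any OpenAI-compatible model;
# only the model ID names a Replicate-hosted model.
r = client.chat.completions.create(
    model="replicate/meta/meta-llama-3-70b-instruct",
    messages=[{"role": "user", "content": "Create a compact product FAQ."}],
)

# Print the assistant's reply text.
print(r.choices[0].message.content)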
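If you are not using the OpenAI SDK, the same call can be sketched as a plain HTTP request. This assumes the gateway follows the usual OpenAI-compatible conventions (a `/chat/completions` path under the base URL and a Bearer token header), which this page implies but does not spell out. `build_chat_request` is a hypothetical helper name, and the code only constructs the request without sending it.

```python
import json
import urllib.request

# Gateway URL as given on this page.
BASE_URL = "https://api.proxyllm.ai/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request.

    Assumption: the gateway accepts the standard OpenAI request shape.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The point of the sketch is that only the base URL and the key differ from a direct OpenAI-style call.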

Codex Hosted · the main feature

Run your AI workloads on your ChatGPT subscription.

ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.

$129/month · normal SaaS pricing

Track the text side.

Request logs, budget caps, and scoped sub-keys for Replicate text workloads, without pretending every Replicate API surface is identical.
