Replicate for model breadth.
Route chat-compatible Replicate text models through one endpoint with your own key. Specialized predictions stay on Replicate's native API until direct adapter support lands.
$129/month SaaS. Bring your own model keys. No inference markup.
Three steps to connect.
Use Replicate for model breadth
Replicate hosts community models and custom deployments. ProxyLLM covers the chat-compatible text side today; native Replicate API support is future work.
Unify text inference
Send chat-completion-compatible requests through https://api.proxyllm.ai/v1 on your own key so usage and budget caps stay in ProxyLLM.
Keep media on the native API
Image, video, and other non-chat Replicate predictions stay on Replicate's own API until ProxyLLM adds a direct adapter.
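One way to apply this split in application code is a small routing helper. The sketch below is illustrative only: `Backend`, `pick_backend`, and the task labels are hypothetical names, not part of ProxyLLM or Replicate. It takes only two things from this page: the gateway URL, and the rule that chat-compatible text goes through ProxyLLM while everything else stays on Replicate's native API.

```python
from dataclasses import dataclass

PROXYLLM_BASE_URL = "https://api.proxyllm.ai/v1"     # gateway URL from this page
REPLICATE_BASE_URL = "https://api.replicate.com/v1"  # Replicate's native HTTP API

# Hypothetical task labels, used only for this sketch.
CHAT_TASKS = {"chat", "text"}


@dataclass(frozen=True)
class Backend:
    name: str
    base_url: str


def pick_backend(task: str) -> Backend:
    """Send chat-compatible text to ProxyLLM and everything else to Replicate."""
    if task.lower() in CHAT_TASKS:
        return Backend("proxyllm", PROXYLLM_BASE_URL)
    return Backend("replicate-native", REPLICATE_BASE_URL)


if __name__ == "__main__":
    for task in ("chat", "image", "video"):
        backend = pick_backend(task)
        print(f"{task} -> {backend.name} ({backend.base_url})")
```

Keeping the decision in one function means only `CHAT_TASKS` needs to change if ProxyLLM later adds a direct adapter for other prediction types.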
Unify the text side first.
Chat-compatible Replicate models sit behind the same OpenAI-compatible gateway on your key.
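from openai import OpenAI

# Point the OpenAI SDK at the ProxyLLM gateway instead of api.openai.com.
client = OpenAI(
    base_url="https://api.proxyllm.ai/v1",
    api_key="pk_live_...",
)

# The model ID carries a "replicate/" prefix in this example.
r = client.chat.completions.create(
    model="replicate/meta/meta-llama-3-70b-instruct",
    messages=[{"role": "user", "content": "Create a compact product FAQ."}],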
)
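For stacks that do not use the OpenAI SDK, the same call can be sketched as a plain HTTP request. This is a minimal sketch under assumptions: the `/chat/completions` path and `Bearer` authorization header follow the usual OpenAI-compatible convention and are not confirmed by this page, and `build_chat_request` is a hypothetical helper name. The function only builds the request; nothing is sent.

```python
import json
import urllib.request

BASE_URL = "https://api.proxyllm.ai/v1"  # gateway URL from this page


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed OpenAI-compatible path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request(
        api_key="pk_live_...",  # placeholder key, as in the SDK example
        model="replicate/meta/meta-llama-3-70b-instruct",
        prompt="Create a compact product FAQ.",
    )
    print(req.get_method(), req.full_url)
    # With a real key, urllib.request.urlopen(req) would send the request.
```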
Track the text side.
Request logs, budget caps, and scoped sub-keys for Replicate text workloads, without pretending every Replicate API surface is identical.