Llama as your open-weights lane.
Call Llama from the same endpoint as your closed models, on your own OpenRouter key. ProxyLLM adds sub-keys, budget caps, and request logs around the traffic.
$129/month SaaS. Bring your own model keys. No inference markup.
Three steps to connect.
Add an OpenRouter key
Llama ships through OpenRouter today. A single OpenRouter key of your own, kept encrypted, covers Meta Llama models across the providers OpenRouter exposes.
Request meta-llama models
Point the OpenAI SDK's base URL at https://api.proxyllm.ai/v1 and pick a model name that starts with meta-llama/ for open-weights inference.
Hold Llama accountable
Give Llama workloads their own sub-keys and budget caps, then read cost per request in the logs to confirm the open-weights economics actually land.
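The cost check in the last step comes down to simple arithmetic. Here is a minimal sketch in Python, assuming you have per-request token counts and know your provider's per-million-token prices. The RequestUsage shape, the function names, and the prices are hypothetical placeholders for illustration, not ProxyLLM's log schema or real Llama pricing.

```python
from dataclasses import dataclass


@dataclass
class RequestUsage:
    """Token counts for one request (hypothetical shape, not a real log schema)."""
    prompt_tokens: int
    completion_tokens: int


def request_cost_usd(usage, input_price_per_m, output_price_per_m):
    """Cost of one request in USD, given per-million-token prices."""
    return (
        usage.prompt_tokens * input_price_per_m
        + usage.completion_tokens * output_price_per_m
    ) / 1_000_000


def remaining_budget_usd(cap_usd, usages, input_price_per_m, output_price_per_m):
    """Budget left under a cap after summing the cost of every request."""
    spent = sum(
        request_cost_usd(u, input_price_per_m, output_price_per_m) for u in usages
    )
    return cap_usd - spent


if __name__ == "__main__":
    # Placeholder prices in USD per million tokens; substitute your provider's rates.
    usages = [RequestUsage(1000, 500), RequestUsage(2000, 250)]
    for u in usages:
        print(u, request_cost_usd(u, 0.5, 1.0))
    print("remaining:", remaining_budget_usd(1.0, usages, 0.5, 1.0))
```

Comparing these figures with the cost per request shown in the logs is one way to see whether a Llama workload is staying inside its cap.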
Open weights, same client.
The meta-llama/ prefix passes Llama traffic through your OpenRouter key.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.proxyllm.ai/v1",
    api_key="pk_live_...",
)
r = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Summarize these notes."}],
)
Run your AI workloads on your ChatGPT subscription.
ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.
Open weights, accounted for.
Llama passes through on your own key with no inference markup. OpenAI-bound work can run through Codex Hosted on your flat ChatGPT subscription instead of per-token pricing.