Model integration · Meta Llama

Llama as your open-weights lane.

Call Llama from the same endpoint as your closed models, on your own OpenRouter key. ProxyLLM adds sub-keys, budget caps, and request logs around the traffic.


$129/month SaaS. Bring your own model keys. No inference markup.

Three steps to connect.

Add an OpenRouter key

Llama is served through OpenRouter today. A single OpenRouter key of your own, stored encrypted, covers Meta Llama models across the providers OpenRouter exposes.

Request meta-llama models

Point the OpenAI SDK's base URL at https://api.proxyllm.ai/v1 and pick a model name with the meta-llama/ prefix for open-weights inference.

Hold Llama accountable

Give Llama workloads their own sub-keys and budget caps, then read cost per request in the logs to confirm that open-weights inference actually costs what you expected.
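
A minimal sketch of that check, assuming you have copied request logs into simple records with a sub-key, a model name, and a cost in USD. The LogEntry fields, the helper names, and the sample sub-keys are hypothetical illustrations, not ProxyLLM's log schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogEntry:
    # Hypothetical record shape; field names are assumptions, not an export format.
    sub_key: str
    model: str
    cost_usd: float


def spend_by_sub_key(entries):
    """Return {sub_key: (requests, total_cost, mean_cost)} for meta-llama/ traffic only."""
    totals = defaultdict(lambda: [0, 0.0])
    for e in entries:
        if not e.model.startswith("meta-llama/"):
            continue  # ignore traffic from other lanes
        totals[e.sub_key][0] += 1
        totals[e.sub_key][1] += e.cost_usd
    return {k: (n, total, total / n) for k, (n, total) in totals.items()}


def over_budget(report, caps_usd):
    """List sub-keys whose total spend exceeds their cap; sub-keys without a cap are skipped."""
    return sorted(
        k for k, (_, total, _) in report.items()
        if k in caps_usd and total > caps_usd[k]
    )


# Made-up sample data for illustration only.
entries = [
    LogEntry("summaries", "meta-llama/llama-3.1-70b-instruct", 0.25),
    LogEntry("summaries", "meta-llama/llama-3.1-70b-instruct", 0.75),
    LogEntry("triage", "meta-llama/llama-3.1-70b-instruct", 0.5),
    LogEntry("triage", "other-vendor/some-model", 2.0),
]
report = spend_by_sub_key(entries)
flagged = over_budget(report, {"summaries": 0.5, "triage": 5.0})
```

Here report holds the request count, total spend, and mean cost per request for each sub-key, and flagged names the sub-keys that have passed their cap.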

Open-weights, same client.

A model name that starts with meta-llama/ sends the request through your OpenRouter key.

client.py

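from openai import OpenAI

# Point the OpenAI SDK at ProxyLLM instead of the default OpenAI endpoint.
client = OpenAI(
    base_url="https://api.proxyllm.ai/v1",
    api_key="pk_live_...",  # your ProxyLLM key
)

# The meta-llama/ prefix selects the Llama lane.
r = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Summarize these notes."}],
)
print(r.choices[0].message.content)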
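
If a workload should only ever use the Llama lane, a small client-side guard can catch a mistyped model name before the request is sent. This is a sketch under that assumption; llama_request is a hypothetical helper, not part of the OpenAI SDK or ProxyLLM.

```python
LLAMA_PREFIX = "meta-llama/"


def llama_request(model: str, prompt: str) -> dict:
    """Build chat-completion arguments, rejecting models outside the meta-llama/ lane."""
    if not model.startswith(LLAMA_PREFIX):
        raise ValueError(f"expected a {LLAMA_PREFIX} model, got {model!r}")
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


kwargs = llama_request("meta-llama/llama-3.1-70b-instruct", "Summarize these notes.")
# With the client from client.py: r = client.chat.completions.create(**kwargs)
```

The guard runs locally and makes no network call, so it is cheap to put in front of every request a Llama-only sub-key makes.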

Codex Hosted · the main feature

Run your AI workloads on your ChatGPT subscription.

ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.


$129/month · normal SaaS pricing

Open weights, accounted for.

Llama passes through on your own key with no inference markup. OpenAI-bound work can run through Codex Hosted on your flat ChatGPT subscription instead of per-token pricing.
