Model integration · Hugging Face

Hugging Face without key sprawl.

Try huggingface/-prefixed models through one gateway with your own key. Every experiment sits behind a scoped sub-key, a budget cap, and a shared request log.

$129/month SaaS. Bring your own model keys. No inference markup.

Three steps to connect.

Reach the open-model catalog

Hugging Face fronts a wide set of open models and inference providers. Use OpenAI-compatible access with your own key where available; a native Hugging Face adapter is future work.

Normalize the client

Send OpenAI-compatible chat requests through https://api.proxyllm.ai/v1 and keep provider choice out of your application code.

Budget the experiments

Give researchers and internal tools scoped sub-keys with caps so model experiments never become unrestricted upstream access.
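
To make the idea of a capped sub-key concrete, here is a minimal local sketch of the check a budget cap implies. It is an illustration only: `SubKeyBudget`, `fitsBudget`, and `recordSpend` are hypothetical names, not ProxyLLM's API, and the cap itself is enforced by the gateway rather than by code like this.

```typescript
// Hypothetical local model of a capped sub-key. These names are illustrative
// and are not ProxyLLM's API; amounts are integer cents to avoid float drift.
interface SubKeyBudget {
  label: string;      // e.g. which researcher or tool holds the key
  capCents: number;   // hard spend limit for the key
  spentCents: number; // spend recorded so far
}

// True when a request with the given estimated cost still fits under the cap.
function fitsBudget(budget: SubKeyBudget, estimatedCents: number): boolean {
  return budget.spentCents + estimatedCents <= budget.capCents;
}

// Returns an updated budget, or throws when the cap would be exceeded.
function recordSpend(budget: SubKeyBudget, costCents: number): SubKeyBudget {
  if (!fitsBudget(budget, costCents)) {
    throw new Error(`Sub-key "${budget.label}" would exceed its cap`);
  }
  return { ...budget, spentCents: budget.spentCents + costCents };
}

const research: SubKeyBudget = { label: "research-sandbox", capCents: 5000, spentCents: 4900 };
console.log(fitsBudget(research, 100)); // true: lands exactly on the cap
console.log(fitsBudget(research, 101)); // false: one cent over
```

The point of the sketch is the shape of the guarantee: once a key reaches its cap, further spend is refused, so an experiment cannot grow into open-ended upstream usage.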

Experiment behind a sub-key.

Call chat-compatible Hugging Face models where your provider setup exposes them.

client.ts

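import OpenAI from "openai";

// Point the OpenAI SDK at the gateway instead of the default OpenAI endpoint.
const client = new OpenAI({
  baseURL: "https://api.proxyllm.ai/v1",
  apiKey: "pk_live_...", // your gateway key; in practice, load it from the environment
});

// Model IDs for this integration carry the huggingface/ prefix.
const response = await client.chat.completions.create({
  model: "huggingface/meta-llama/llama-3.1-8b-instruct",
  messages: [{ role: "user", content: "Test this small-model prompt." }],
});

// Print the assistant's reply (content can be null, so fall back to "").
console.log(response.choices[0]?.message.content ?? "");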
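
One way to keep provider choice out of application code is to resolve the model identifier from configuration and build the request in a single place. The sketch below assumes a hypothetical `PROXY_MODEL` environment variable and hypothetical helper names (`resolveModel`, `buildChatRequest`); the `/chat/completions` path is the one the OpenAI SDK appends to the base URL for chat requests.

```typescript
// Sketch only: PROXY_MODEL, resolveModel, and buildChatRequest are
// hypothetical names chosen for illustration, not part of any SDK.
const GATEWAY_BASE_URL = "https://api.proxyllm.ai/v1";
const DEFAULT_MODEL = "huggingface/meta-llama/llama-3.1-8b-instruct";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  body: { model: string; messages: ChatMessage[] };
}

// Pick the model from configuration, falling back to a default when unset or blank.
function resolveModel(env: Record<string, string | undefined>): string {
  const configured = env["PROXY_MODEL"]?.trim();
  return configured ? configured : DEFAULT_MODEL;
}

// Build the request in one place so call sites never name a provider.
function buildChatRequest(
  env: Record<string, string | undefined>,
  prompt: string,
): ChatRequest {
  return {
    url: `${GATEWAY_BASE_URL}/chat/completions`,
    body: {
      model: resolveModel(env),
      messages: [{ role: "user", content: prompt }],
    },
  };
}

const request = buildChatRequest({}, "Test this small-model prompt.");
console.log(request.body.model); // the default model, since PROXY_MODEL is unset
```

In a Node process you would pass `process.env` as the first argument, so that switching models becomes a configuration change rather than a code change.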

Codex Hosted · the main feature

Run your AI workloads on your ChatGPT subscription.

ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.

$129/month · normal SaaS pricing

Let teams explore safely.

Spend limits and request logs per sub-key keep open-model experiments accountable. $129/month flat, no markup on inference.
