Hugging Face without key sprawl.
Try huggingface/ models through one gateway on your own key. Every experiment sits behind a scoped sub-key, a budget cap, and a shared request log.
$129/month SaaS. Bring your own model keys. No inference markup.
Three steps to connect.
Reach the open-model catalog
Hugging Face fronts a wide set of open models and inference providers. Where a provider exposes an OpenAI-compatible endpoint, use it with your own key; a native Hugging Face adapter is not yet available.
Normalize the client
Send compatible chat requests through https://api.proxyllm.ai/v1 and keep provider choice out of your application code.
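As a rough sketch of what "keeping provider choice out of application code" can look like, the snippet below builds the chat request from a small config object, so that switching models means changing configuration rather than call sites. The `GatewayConfig` type and `buildChatRequest` helper are hypothetical names introduced for illustration. The `/chat/completions` path and Bearer header are the standard OpenAI-compatible wire format that the OpenAI SDK uses against a custom base URL.

```typescript
// Hypothetical config shape: these names are illustrative, not part of ProxyLLM.
interface GatewayConfig {
  baseURL: string; // gateway endpoint
  apiKey: string;  // your gateway key, e.g. a scoped sub-key
  model: string;   // provider-prefixed model id, kept out of call sites
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds an OpenAI-compatible chat completion request without sending it.
function buildChatRequest(config: GatewayConfig, messages: ChatMessage[]): BuiltRequest {
  return {
    // Strip any trailing slash so the path joins cleanly.
    url: `${config.baseURL.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${config.apiKey}`,
      },
      body: JSON.stringify({ model: config.model, messages }),
    },
  };
}

const gatewayConfig: GatewayConfig = {
  baseURL: "https://api.proxyllm.ai/v1",
  apiKey: "pk_live_...",
  model: "huggingface/meta-llama/llama-3.1-8b-instruct",
};

const chatRequest = buildChatRequest(gatewayConfig, [
  { role: "user", content: "Test this small-model prompt." },
]);
// To send it: await fetch(chatRequest.url, chatRequest.init);
```

Only `gatewayConfig.model` needs to change to try a different model; the application code that builds and sends the request stays the same.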
Budget the experiments
Give researchers and internal tools scoped sub-keys with caps so model experiments never become unrestricted upstream access.
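To make the idea of a capped sub-key concrete, here is a minimal conceptual model of a budget check. The gateway is described as enforcing caps itself; this sketch only illustrates the logic of comparing running spend against a limit. The `SubKeyBudget` type and both functions are hypothetical and do not reflect ProxyLLM's actual data model or API.

```typescript
// Hypothetical shape, for illustration only; not ProxyLLM's data model.
interface SubKeyBudget {
  label: string;    // who or what the sub-key belongs to
  capUsd: number;   // spending limit for the sub-key
  spentUsd: number; // spend recorded so far
}

// True if the estimated cost still fits under the cap.
function canSpend(budget: SubKeyBudget, estimatedCostUsd: number): boolean {
  return budget.spentUsd + estimatedCostUsd <= budget.capUsd;
}

// Returns an updated budget, or throws if the cap would be exceeded.
function recordSpend(budget: SubKeyBudget, costUsd: number): SubKeyBudget {
  if (!canSpend(budget, costUsd)) {
    throw new Error(`Sub-key "${budget.label}" would exceed its cap of ${budget.capUsd} USD`);
  }
  return { ...budget, spentUsd: budget.spentUsd + costUsd };
}

const researchKey: SubKeyBudget = { label: "research-eval", capUsd: 50, spentUsd: 49.5 };
const afterSmallCall = recordSpend(researchKey, 0.25); // still within the cap
// recordSpend(afterSmallCall, 1) would throw, because 49.75 + 1 exceeds 50.
```

The point of the cap is that a runaway experiment fails at the sub-key boundary instead of drawing on the upstream provider key without limit.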
Experiment behind a sub-key.
Call chat-compatible Hugging Face models where your provider setup exposes them.
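import OpenAI from "openai";

// Point the OpenAI SDK at the gateway instead of the default OpenAI endpoint.
const client = new OpenAI({
  baseURL: "https://api.proxyllm.ai/v1",
  apiKey: "pk_live_...", // your gateway key
});

// Top-level await requires an ES module; otherwise wrap this in an async function.
const r = await client.chat.completions.create({
  model: "huggingface/meta-llama/llama-3.1-8b-instruct",
  messages: [{ role: "user", content: "Test this small-model prompt." }],
});
Run your AI workloads on your ChatGPT subscription.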
ProxyLLM runs OpenAI's Codex for you, signed in with your own ChatGPT account. Your apps call one OpenAI-compatible endpoint and the work bills to your flat plan instead of per-token API pricing.
Let teams explore safely.
Spend limits and request logs per sub-key keep open-model experiments accountable. $129/month flat, no markup on inference.