300+ AI models. One OpenAI-compatible URL.
ai.hoody.com runs on your bare metal. Swap the base URL in any OpenAI client and your code works against Claude, GPT, Gemini, Llama, or any of 15+ inference providers.
Containers authenticate with container-NAME tokens — no real API keys in your workloads. Revoke a container and its AI access dies with it.
# Before (OpenAI direct)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.openai.com/v1",
    api_key="sk-...",
)

# After (Hoody AI Gateway)
from openai import OpenAI

client = OpenAI(
    base_url="https://ai.hoody.com/api/v1",
    api_key="container-dev-env",  # the container's auto-issued token
)
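Once the base URL points at the gateway, the rest of the OpenAI SDK surface works unchanged: the model name selects the provider. A minimal sketch of a first call — the model identifier here is an assumption; the exact IDs live in the catalog docs:

# Same client as the "After" block above; the model name selects the provider.
response = client.chat.completions.create(
    model="claude-sonnet",  # assumed catalog name; check the docs for exact IDs
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(response.choices[0].message.content)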
15+ inference providers. One API.
Text generation, image generation, embeddings. Every major provider you would otherwise wire up separately.
Claude family — Opus, Sonnet, Haiku
GPT family — flagship and cost-optimized tiers
Gemini family via Vertex AI
Llama family — hosted via inference partners
Mistral + Mixtral open-weight models
DeepSeek — V3 and Coder family
Alibaba Qwen — 72B and QwQ reasoning
Cohere — Command R family plus embeddings
xAI Grok family — including vision tier
Perplexity — Sonar models with live-web grounding
Hosting platform for open-weight models
Optimized inference for open-weight models
Flux, SDXL, and specialized vision models
Stable Diffusion variants
300+ models total. Bring Your Own Provider (§5 below) extends this to direct keys and local/private endpoints. Full catalog in the docs.
No real API keys in your containers.
Every container gets a virtual key tied to its name: `container-NAME`. The token only works from the infrastructure it was issued in. Delete the container — the token is instantly useless.
Traditional AI integration
- Real sk-... API key stored in env
- Leaks live on in git history, logs, Slack
- Rotation requires a coordinated update across all workloads
- Revocation kills everything using that key
Hoody AI gateway
- Auto-issued container-NAME token at container create
- Token only works from inside the container
- Rotation = recreate the container with the same name
- Revoke the container = its key dies with it, nothing else
Safe for freelancer handoff, vibe-coded side-projects, and consumer SaaS. The gateway runs on your bare metal — Hoody never sees your prompts or responses. Zero-knowledge by architecture, not policy.
Intercept every AI request. Stack layers in order.
Route the gateway through hoody-exec to insert middleware before the provider call. Cache cheap. Inject context. Route to the right model. Gate destructive tool calls on human approval. Fan out to other agents. Log everything. The order is a pipeline — each layer runs before the next, and you pick which ones you need.
1 · Response cache
Hash the prompt; skip inference on a hit. Cheapest layer first.
2 · Context injection
Prepend system prompts from your knowledge base before the call.
3 · Cost routing
Easy prompts → cheaper models. Hard prompts → Claude. 40-70% savings documented.
4 · Tool-call tampering
Rewrite or block tool calls before they execute. Sandbox file writes.
5 · Human-in-the-loop
Stall high-stakes actions. Push a notification. Wait for approval.
6 · Agent cascade
Trigger another hoody-agent over HTTP. Multi-agent systems with no orchestrator.
7 · Audit log
Every request + response into SQLite for compliance and debugging.
Built-in rule engine covers the common MITM patterns with zero code. Drop down to custom hoody-exec scripts when a rule doesn't fit.
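The hoody-exec script contract isn't specified on this page, so the following is a minimal sketch under stated assumptions, not the actual script API: it assumes a script receives the outgoing request body as JSON on stdin and prints either a cached response or the (possibly rewritten) request on stdout, with the cache path and model identifiers as placeholders. It combines layer 1 (response cache) and layer 3 (cost routing):

# Hypothetical hoody-exec middleware combining layers 1 and 3.
# Assumed contract (not confirmed by this page): request JSON on stdin,
# cached response or rewritten request JSON on stdout.
import hashlib, json, sqlite3, sys

req = json.load(sys.stdin)

# Layer 1 -- response cache: hash the message list, skip inference on a hit.
key = hashlib.sha256(json.dumps(req["messages"], sort_keys=True).encode()).hexdigest()
db = sqlite3.connect("ai-cache.db")  # placeholder path
db.execute("CREATE TABLE IF NOT EXISTS cache (k TEXT PRIMARY KEY, v TEXT)")
row = db.execute("SELECT v FROM cache WHERE k = ?", (key,)).fetchone()
if row:
    print(row[0])  # cached response; the provider call is skipped
    sys.exit(0)

# Layer 3 -- cost routing: short prompts go to a cheaper model.
prompt_chars = sum(len(m.get("content") or "") for m in req["messages"])
req["model"] = "cheap-model" if prompt_chars < 500 else "flagship-model"  # placeholder IDs

print(json.dumps(req))  # forward the possibly rewritten request

Writing fresh responses back into the cache would happen on the response side of the pipeline, which this request-side sketch omits.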
Bring your own provider. Opt out of keyless.
Keyless auth is the default and the safe path. But you are never locked in: route straight to any OpenAI-compatible endpoint — local Ollama, Azure OpenAI, Hugging Face inference, an enterprise proxy, or direct provider keys — by setting env vars inside the container. This is the explicit opt-out of keyless, not a parallel mode.
# direct provider keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
# point at local Ollama
OPENAI_BASE_URL=http://localhost:11434/v1
# or Azure / enterprise proxy
OPENAI_BASE_URL=https://your-proxy.internal/v1
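The client side doesn't change shape when you opt out. The official OpenAI Python SDK falls back to the OPENAI_API_KEY and OPENAI_BASE_URL environment variables when the client is constructed without arguments, so the env vars above are the whole switch:

from openai import OpenAI

# No explicit arguments: the SDK reads OPENAI_API_KEY and OPENAI_BASE_URL
# from the environment, so the container's env decides whether this client
# talks to the gateway, local Ollama, or an enterprise proxy.
client = OpenAI()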
What you otherwise stitch together
LiteLLM, Portkey, OpenRouter, and direct provider accounts each solve a slice of this. The gateway covers the same ground in one surface that runs on your own hardware.
| Concern | Hoody AI Gateway | Commercial gateway / DIY |
|---|---|---|
| Where it runs | Your bare metal | Vendor cloud (LiteLLM SaaS, Portkey) |
| Container-scoped auth | container-NAME tokens | Shared API keys + RBAC |
| Bring your own provider | Any OpenAI-compatible endpoint | Mostly yes, varies by vendor |
| MITM rules + scripts | Built-in rule engine + hoody-exec scripts | Paid tier or external proxy |
| OpenAI-compatible | Yes | Yes (most alternatives) |
| Pricing | 5% markup over provider cost | Per-call fees + per-seat SaaS |
| Integrated with infra + wallet | Single wallet, single API surface | Separate billing and ops |
When your stack is already on LiteLLM or OpenRouter and you do not want to run a gateway, they remain the better fit. The Hoody AI Gateway earns its place when you want container-scoped auth, native MITM, and the two-balance wallet model: General Balance funds the server (Stripe/crypto/bank), AI Balance funds the gateway, with one-way General → AI transfers and one invoice covering both.
Claude in one base_url change.
Create a container, set base URL to https://ai.hoody.com/api/v1, use container-NAME as the bearer. Every OpenAI-compatible library already knows what to do.
See also — /platform/control-plane for token issuance and wallet, /platform/proxy for the URL layer underneath.