One endpoint. Per-tenant virtual keys. Hard budget caps. Edge prompt caching. Audit log. PII redaction. Model failover. Built on LiteLLM + Cloudflare AI Gateway. Run by Manny, the Intelligent IT assistant.
One LiteLLM team per tenant, one virtual key, one budget cap, one model allowlist. Tenants never see each other's traffic, keys, or spend. The master key only ever lives in GSM.
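A rough sketch of what per-tenant provisioning looks like against LiteLLM's proxy admin API. The endpoint paths and field names here are assumptions based on LiteLLM's documented admin routes (`/team/new`, `/key/generate`); verify them against your LiteLLM version before relying on them.

```python
def tenant_provisioning_payloads(tenant_id: str, monthly_budget_usd: float,
                                 allowed_models: list[str]) -> dict:
    """Build the two admin-API payloads for a new tenant:
    one team (budget + allowlist) and one virtual key bound to it."""
    team = {                                 # POST /team/new
        "team_alias": tenant_id,
        "max_budget": monthly_budget_usd,    # hard cap, enforced at the gateway
        "budget_duration": "30d",
        "models": allowed_models,            # per-tenant model allowlist
    }
    key = {                                  # POST /key/generate
        "team_id": tenant_id,
        "max_budget": monthly_budget_usd,    # a key can never outspend its team
        "models": allowed_models,
    }
    return {"team": team, "key": key}

payloads = tenant_provisioning_payloads(
    "acme-corp", 500.0, ["claude-sonnet-4", "gpt-4o"])
```

The tenant only ever holds the generated virtual key; the master key stays server-side.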
Caps and content blocks are enforced at the gateway, before any provider call. PII, PHI, and PCI data never reach a model. "Ignore previous instructions" never moves a budget.
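A minimal sketch of the gateway-side pre-call check: pattern-match PII, redact what's allowed through, and block outright on a hard policy hit, all before any provider is called. The patterns and the `block_on` policy are illustrative, not production-grade detection.

```python
import re

# Illustrative detectors only; real deployments use proper PII/PCI scanners.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def pre_call_check(prompt: str,
                   block_on: frozenset = frozenset({"card"})) -> dict:
    """Return the redacted prompt plus a decision the audit log can record."""
    hits = [name for name, pat in PII_PATTERNS.items() if pat.search(prompt)]
    if any(h in block_on for h in hits):
        # Hard block: the request never leaves the gateway.
        return {"decision": "block", "policy_hits": hits, "prompt": None}
    redacted = prompt
    for name, pat in PII_PATTERNS.items():
        redacted = pat.sub(f"<{name}-redacted>", redacted)
    return {"decision": "allow", "policy_hits": hits, "prompt": redacted}
```

The decision and policy hits feed straight into the audit row for the request.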
Every request gets an immutable audit row: tenant, user, model, tokens, cost, latency, decision, policy hit. Useful for SOC 2, HIPAA, and the "why did our bill spike" conversation.
Cloudflare AI Gateway sits in front for edge prompt caching (~30% hit rate on repeat workloads) and provider failover. When Anthropic stutters, traffic shifts to OpenAI inside the same policy.
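The failover behavior reduces to an ordered try-next loop inside one policy envelope. In this stack the real mechanism lives in Cloudflare AI Gateway / LiteLLM router configuration, so the provider names and call interface below are purely illustrative.

```python
from collections.abc import Callable

def call_with_failover(providers: list[tuple[str, Callable[[str], str]]],
                       prompt: str) -> tuple[str, str]:
    """Try providers in order; return (provider_name, response).

    The same budget caps, allowlists, and content policy have already
    been applied upstream, so failover never widens what a tenant can do.
    """
    last_err: Exception | None = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as err:  # production code would retry only transient errors
            last_err = err
    raise RuntimeError("all providers failed") from last_err
```

Because the policy check runs before this loop, a request blocked for one provider is blocked for all of them.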