The LLM gateway category started as a proxy problem. Developers did not want to maintain separate integrations for OpenAI, Anthropic, Cohere, and whatever model came next, so a unified API layer made sense. But as teams moved LLM features into production, the gateway accumulated more jobs: routing decisions, fallback logic, request logging, cost allocation, spend controls. Four tools now occupy different corners of this space, each with a different opinion about what the gateway's primary job actually is. The differences matter more than they look on a feature checklist.
LiteLLM
LiteLLM started as a Python library that maps 100+ LLM providers to the OpenAI SDK signature. You call completion() once, and LiteLLM handles translating that call for Anthropic, Cohere, Mistral, Bedrock, or whatever provider is on the other end. That translation layer is the core reason LiteLLM has strong adoption among teams running multi-provider strategies. You write your application once against the OpenAI interface and swap models or providers without touching application code.
The proxy server mode turns LiteLLM into a sidecar that your entire application routes through. This unlocks centralized key management (your application holds a LiteLLM virtual key, not raw provider credentials), request logging to a Postgres or SQLite database, and basic spend tracking by user or team. Rate limiting exists in both the open source and enterprise versions, but it operates at the key or team level rather than per end-customer.
One risk worth noting: in March 2026, a supply chain attack compromised LiteLLM's CI/CD pipeline and injected credential-stealing code into PyPI versions 1.82.7 and 1.82.8, documented by Datadog Security Labs and tracked as CVE-2026-33634 (CVSS 9.4). Teams with compliance requirements have used this incident as a reason to evaluate alternatives with a smaller attack surface.
LiteLLM's open source tier is free to self-host. The Enterprise version adds SSO, dedicated support, and more granular controls, but pricing requires a sales conversation. For teams that need multi-provider routing under one roof and are comfortable running infrastructure, LiteLLM is the most mature open source option. The tradeoff is operational overhead: you own the database, the deployment, and the uptime. That is fine if you have a platform team and a reason to stay on-prem. It is a real drag if you do not.
Portkey
Note: Palo Alto Networks announced its intent to acquire Portkey on April 30, 2026, pending close in Q4 FY2026. The product is actively developed but the long-term self-serve pricing and roadmap may shift post-acquisition.
Portkey positions itself as the production reliability layer for LLM applications. Its headline features are load balancing across providers, automatic fallbacks when a provider returns an error or times out, and request retries with configurable backoff. These are genuine production problems. OpenAI's API has had several high-profile outages, and a gateway that can silently fail over to Anthropic or Azure OpenAI before your users notice is worth something.
Beyond reliability, Portkey logs every request and response, calculates cost per request, and provides a dashboard for cost and latency trends. The logging is after-the-fact, which means it is observability rather than enforcement. You can configure budget alerts in Portkey, but an alert is not a block. If a runaway agent fires a thousand requests before you see the alert, the cost has already landed on your bill.
Portkey offers a free tier and paid plans for higher volume and team features (see portkey.ai/pricing). It is available as a hosted service or self-hosted via its open source gateway. The architecture sits closer to Helicone than to LiteLLM: it assumes you have settled on your provider set and want operational visibility and resilience on top, rather than a translation layer underneath.
Helicone
Note: Helicone was acquired by Mintlify on March 3, 2026 and is now in maintenance mode. New feature development has stopped. The product remains live but teams evaluating it for new projects should factor this in.
Helicone is primarily an observability tool. You route your OpenAI or Anthropic calls through Helicone's proxy, or use a header-based integration that requires no SDK changes, and Helicone logs everything: prompts, completions, token counts, cost, latency, and any user identifiers you attach at request time. The dashboard is well-designed for debugging and cost analysis. You can slice spend by user, by prompt template, or by custom properties your application sets on each request.
The open source version of Helicone can be self-hosted. The hosted service has a free tier and paid plans for higher volume, experiments, and eval integrations (see helicone.ai/pricing). The experiments feature lets you A/B test prompt variants against each other on real traffic, and the eval integrations connect to tools like Braintrust. If your primary problem is understanding what your LLM is doing and why it costs what it costs, Helicone is probably the fastest path to that answer.
Rate limiting exists but it is reactive. You can configure per-user rate limits, and Helicone will return a 429 once the limit has been crossed. That is different from blocking a call before it reaches the provider. If your primary problem is preventing an individual user or customer from spending more than a fixed dollar amount before the billing cycle closes, the reactive model means at least one over-limit call gets through before enforcement kicks in. For many use cases that is acceptable. For SaaS products with per-customer LLM budgets, it is not.
Comparison
| LiteLLM | Portkey | Helicone | noburn.dev | |
|---|---|---|---|---|
| Primary job | Multi-provider routing | Reliability + observability | Observability + debugging | Pre-flight spend enforcement |
| Enforcement model | Post-call rate limiting | Budget alerts (reactive) | Reactive 429 after limit crossed | Pre-call block before provider fires |
| Per-user metering | Key/team level | User-level logging | User-level logging | Per end-customer budget enforcement |
| Passthrough billing | No | No | No | Yes, via Stripe |
| Self-hosting | Yes (open source) | Yes (open source gateway) | Yes (open source) | Hosted only |
| Free tier | Free to self-host | Check portkey.ai | Check helicone.ai | 50k req/mo, 1 project |
| Paid pricing | Enterprise (contact sales) | Check portkey.ai | Check helicone.ai | $9/mo Early Bird, $49/mo Pro |
| OpenAI SDK | Yes | Yes | Yes | Yes |
| Anthropic SDK | Yes | Yes | Yes | Yes |
| LangChain / LangGraph | Yes | Yes | Partial | Yes |
| Vercel AI SDK | No | Partial | Partial | Yes |
| Multi-provider routing | Yes (100+ providers) | Yes | Limited | No |
| Prompt logging | Yes | Yes | Yes | No |
| Evals / experiments | No | Limited | Yes | No |
Verify current pricing and limits at each provider's website before making purchasing decisions.
What the category is still missing
Every tool in this list either measures cost after the call completes or blocks at a request-count ceiling. That distinction sounds minor until you are running a multi-tenant SaaS where each customer has a separate LLM budget. The current tools give you one of two things: a dashboard showing how much each customer spent last billing period, or a hard request-count limit that eventually returns a 429. Neither of those is "customer A has a $50/month LLM budget and I need their API calls to stop at $49."
Token-accurate pre-call estimation is hard because actual cost depends on output tokens, which are unknown until the call completes. Most gateways sidestep this by counting requests or measuring spend in arrears. Pre-flight enforcement requires estimating input token cost before dispatch and applying a conservative ceiling, then reconciling output tokens after the response. That is a different architecture than a logging proxy. None of the three major gateways has made it their primary design constraint, because their primary constraint is something else: provider translation, reliability, or debugging.
The Stripe passthrough gap is also real. You can build per-customer LLM billing yourself on top of Helicone's user-level cost data or LiteLLM's team spend tracking, but that means writing a billing service, handling Stripe webhooks, setting up products and meters, and reconciling usage monthly. For SaaS teams that want to charge customers for their LLM usage without building a billing layer from scratch, there is no off-the-shelf answer in the current gateway landscape.
FAQ
What is an LLM gateway and when do I actually need one?
An LLM gateway sits between your application code and the provider APIs. At minimum it gives you a single integration point so you stop scattering API keys and retry logic across every service that calls an LLM. A single internal tool with low volume probably does not need it. Any production SaaS application making LLM calls on behalf of users benefits from centralized routing, logging, and some form of spend visibility because the alternative is a surprise on your provider invoice and no clear path to isolating which feature or customer caused it.
What is the practical difference between LiteLLM and Portkey?
LiteLLM's core job is provider translation: it makes Anthropic, Cohere, Mistral, and 100 other providers look like the OpenAI API so your application code does not change when you switch models. Portkey's core job is production reliability: load balancing, automatic fallbacks, and observability for a provider set you have already decided on. They overlap significantly in the middle (both log requests, both do some routing), but LiteLLM is the right starting point if you have multi-provider requirements, and Portkey is the right starting point if you have reliability and observability requirements against a stable provider set.
Can any of these tools block LLM calls before they fire?
LiteLLM and Helicone can return a 429 once a rate limit has been reached, which stops subsequent calls but only after the limit was already crossed on a prior request. Pre-flight enforcement, where the gateway estimates token cost before dispatching the API call and blocks if the user is over budget, is not the primary design of any of the three. noburn.dev is built specifically around this model: it estimates input token cost client-side, checks against the user's remaining budget, and blocks the call before it reaches the provider.
Which tool is best for a multi-tenant SaaS with per-customer budgets?
It depends on what "best" means in your context. LiteLLM lets you assign virtual keys per customer and configure spend limits on those keys, but enforcement is after-the-fact and requires you to run and maintain the proxy. Helicone gives you clean per-user cost data but does not block on budget. If you need hard per-customer budget enforcement combined with the ability to bill customers for their LLM usage via Stripe, none of the three tools handles this end-to-end without significant custom work on top.
Is self-hosting an LLM gateway worth the operational cost?
For teams with a dedicated infrastructure function, self-hosting LiteLLM or the open source Helicone or Portkey gateways gives you data residency, no per-request fees at scale, and full control over the upgrade cycle. For smaller teams without infrastructure ownership, maintaining database backups, uptime monitoring, and version upgrades typically costs more in engineering time than the monthly fee for a hosted service at the request volumes they are running. The math changes somewhere around 5-10 million requests per month depending on the pricing tier, but below that threshold the operational overhead rarely justifies it.
If the gap you are trying to close is pre-call enforcement, specifically stopping LLM spend before it exceeds a per-user budget rather than measuring it after, noburn.dev is built for that use case. It estimates token cost client-side before the API call fires, blocks the request if the user is over budget, and includes Stripe passthrough so you can charge customers for their LLM usage without writing a billing layer yourself. The free tier covers 50k requests per month with no credit card required.