noburn.dev
← BlogJoin waitlist
billingper-user meteringSaaSLLMstripe

Per-User LLM Billing: The Gap Nobody Has Filled

OpenMeter went enterprise after Kong acquired it. Stripe LLM billing is still rolling out. For SaaS companies that need to meter each customer's AI usage, there's almost nothing self-serve.

nb
noburn.dev·2026-05-26

Most AI SaaS companies start with a flat subscription. You pay $49/mo and you get access to the AI features. Simple pricing, easy to reason about, fast to ship.

The flat subscription works until you look at usage across your customer base and find that the top 5% of users are consuming 60% of your AI costs. You're losing money on heavy users and leaving money on the table with light users. The economics don't work at scale.

The solution is per-user LLM billing — meter each customer's AI consumption and charge them proportionally. In theory, solved. In practice, as of mid-2026, the self-serve options are surprisingly thin. I've looked at every meaningful platform in this space and the conclusion is uncomfortable: the tools that could do this well either went enterprise or got acquired. What's left is either a DIY project or a sales call.


Why Flat Pricing Breaks for AI Products

Traditional SaaS products have reasonably predictable marginal costs per user. Hosting a project management app costs roughly the same whether the user creates 10 tasks or 1,000 tasks — the compute difference is negligible compared to the subscription revenue.

LLM APIs are different. The cost is directly proportional to usage. A heavy user processing long documents, running multi-step agents, or using expensive models can cost you 10x-50x more than a light user — while paying the same flat fee. This creates three distinct problems:

Margin erosion at scale. The first 100 users might have acceptable average costs. By user 10,000, the heavy users have found your product, and the average cost has drifted well above your model. You discover this when the cloud bill comes in and the unit economics no longer work.

No incentive alignment. Heavy users who consume expensive resources have no feedback signal that their usage has a cost. They make no tradeoff decisions. Light users subsidize them silently.

Inability to price-tier correctly. Without per-user metering, you can't offer tiered plans based on AI consumption. You can't create a "power user" tier for high-volume customers. Your pricing is either too cheap for heavy users or too expensive for light users.


What the Market Has for Per-User Metering

There have been three meaningful attempts at per-user AI metering as a self-serve product. All three have problems.

OpenMeter → Kong Metering & Billing

OpenMeter was the closest thing to a self-serve metering platform for AI usage. Its architecture was designed for real-time scalable metering across any dimension — requests, tokens, compute time, custom events.

OpenMeter was acquired by Kong and is now Kong Metering & Billing. The product still exists, but it's enterprise-grade and sales-gated. There is no public pricing, no self-serve signup. The indie/SMB entry point is gone.

This is a meaningful gap. OpenMeter was genuinely useful for the small team that needed metering infrastructure without the overhead of building it. Kong's acquisition moved the product upmarket and effectively removed it from the self-serve category. In my view, this was the single most damaging consolidation in the AI tooling space for small teams — not because OpenMeter was perfect, but because nothing comparable has filled the space it left.

Stripe LLM Token Billing

Stripe has launched an LLM token billing feature that auto-syncs token prices for OpenAI, Anthropic, and Google models. You set a markup percentage, and Stripe handles metering through its AI Gateway or self-reported usage events. The feature is documented in Stripe's billing docs and covers the invoicing layer.

The current limitations: the LLM token billing feature is still rolling out — not all accounts have access. Stripe's 0.7% billing fee on top of payment processing fees pushes total cost toward 1.5%, which compounds at scale. And critically, Stripe handles invoicing — it doesn't handle token counting, usage attribution to specific customers, or budget enforcement. You still need to build those layers yourself.

Portkey Enterprise

Portkey includes per-customer budget enforcement and metering in its Enterprise tier. The feature set is solid — virtual keys per customer, budget limits, usage reporting.

The limitation is pricing. Enterprise is custom and sales-gated. The self-serve tiers (Developer, Production at $49/mo — see portkey.ai/pricing) don't include per-user metering or enforcement. For solo founders and small teams, Enterprise pricing is typically $500+/mo with a sales call and a contract. For a full breakdown of Portkey's tier limitations, see Helicone vs Portkey in 2026.


What Per-User Billing Actually Requires

If you're building this yourself, the components are:

1. Usage attribution. Every LLM call must be attributed to a specific end-user. This means passing a customer identifier through your application layer and into the call metadata. This sounds simple but gets complicated quickly: what if one user triggers a background job that makes 20 calls? What if multiple users share a session?

2. Token counting and cost calculation. After each call, you need the exact token counts (input + output) and the model-specific price per token. Model pricing changes periodically — this is a maintenance surface. Cached token pricing is different from regular token pricing (OpenAI charges 50% for cached input). Reasoning tokens on models like o3 count differently from standard tokens.

3. A spend ledger. The total spend per customer needs to be stored somewhere, updated atomically after each call, and queryable for billing cycles. At low volume, this is a Postgres table. At high volume, you start worrying about write throughput under concurrent calls.

4. Budget enforcement. If you're capping users when they hit a limit, you need a check before each call — not after. The call can't fire first and be rejected after. This is the pre-flight enforcement problem, covered in detail in How to Set a Hard Budget Cap on LLM API Calls.

5. Invoice generation. At the end of the billing period, you need to generate an invoice that shows the customer what they were charged and why. This connects to Stripe, needs line items that map to LLM usage events, and needs to handle proration, overages, and billing period boundaries correctly.

6. A customer-facing dashboard. Your users want to know how much they've used. If they don't have visibility, they'll be surprised by charges and churn. This means building a spend view for each customer — current period, history, remaining budget.

Each of these is a distinct engineering project. The total build time for a complete per-user metering and billing stack is typically 3-6 weeks of focused engineering time, longer if you're doing it alongside feature work.


The Passthrough Billing Pattern

Passthrough billing is the specific model where:

  • You charge your customers for their LLM consumption
  • At roughly your cost (with a margin)
  • Instead of bundling it into a flat subscription

This is common in usage-based SaaS: Vercel bills for compute per request, Cloudflare bills for bandwidth per GB, Twilio bills per SMS. The cost passes through to the customer in proportion to their actual usage.

For AI SaaS, the pattern is: customer uses your AI features → you count their tokens → you charge them $X per 1k tokens (where $X covers your model cost plus your margin).

The appeal is clear — your gross margin per customer is predictable regardless of usage volume. The challenge is that implementing passthrough billing requires all the components listed above to work correctly, reliably, at scale.


What This Looks Like in Practice

A typical implementation flow for a founder adding per-user billing:

Week 1: Add user ID to every LLM call. Create a spend table in Postgres. Write a reconciliation job that runs after each call and updates the table.

Week 2: Build the pre-flight check. Write a function that queries the spend table before each call and rejects it if budget is exhausted. Test it under concurrent load to find the race conditions.

Week 3: Connect to Stripe. Create a billing meter, write the reporting event to Stripe on each call, handle webhook events for billing period resets.

Week 4: Build the customer-facing spend dashboard. Handle edge cases: what does a customer see in their first billing period? What happens when a billing period resets mid-session?

Week 5-6: Production hardening. The spend table now has 3 million rows. The pre-flight check is adding 40ms of latency. The Stripe webhook is occasionally delayed and creating reconciliation gaps.

The build is not technically complex — none of the individual components are hard. The complexity is the surface area. You're now maintaining metering infrastructure alongside your product.


The Remaining Gap

The self-serve per-user AI metering market is genuinely thin:

  • OpenMeter (the only dedicated platform) got acquired and went enterprise
  • Stripe LLM token billing is still rolling out with limited access
  • Portkey does it at enterprise pricing only
  • Helicone is in maintenance mode and never had enforcement anyway

For a founder who needs this today — who needs to bill their own customers for AI usage, enforce budgets per customer, and generate invoices automatically — the options are: build it yourself, or use a tool built specifically for this problem. After evaluating all three platform options against each other, I'd conclude that none of them work without either significant custom engineering or an enterprise contract. That's not a knock on the individual tools — it's an accurate description of where the market is right now.

The build-yourself path is the default. Most teams that need per-user AI metering build it in-house, discover the edge cases in production, and then maintain it as an indefinite operational burden.


Frequently Asked Questions

Does Stripe support LLM token billing natively?

Yes, Stripe has an LLM token billing feature that auto-syncs model pricing and handles invoicing. However, it handles the billing layer only — you still need to build token counting, customer attribution, and budget enforcement yourself. Access is not yet available to all accounts. Stripe also charges a 0.7% fee on billing volume on top of payment processing fees, which compounds at scale.

Can I use OpenMeter for per-user LLM metering?

OpenMeter was acquired by Kong and is now Kong Metering & Billing — enterprise-grade with custom pricing and no self-serve entry point. The OpenMeter product that was available to indie developers is no longer offered in its previous form.

What is passthrough billing for AI SaaS?

Passthrough billing is a pricing model where you charge your end-customers for your AI API costs in proportion to their actual usage, rather than bundling the cost into a flat subscription. You define a per-token or per-request rate (typically your cost plus a margin), meter each customer's consumption, and invoice them for it. This keeps gross margin stable regardless of how much any individual customer uses the AI features.

How long does it take to build per-user LLM metering in-house?

The core components — usage attribution, token counting, spend ledger, pre-flight enforcement, Stripe integration, and a customer spend dashboard — take roughly 3-6 weeks of focused engineering time. Ongoing maintenance (model pricing updates, new provider support, ledger scaling) is an additional operational surface that compounds over time.

What is the difference between metering and billing enforcement for LLM usage?

Metering counts and attributes usage — tracking how many tokens each customer consumed. Billing enforcement blocks calls when a spending threshold is crossed. You need both: metering alone tells you what customers spent but can't prevent overruns, and billing enforcement without metering can't calculate per-customer costs accurately. Pre-flight enforcement (covered in How to Set a Hard Budget Cap on LLM API Calls) requires accurate metering as its foundation.