GPT-4-level capability cost $30 per million tokens in 2023. In mid-2026, you can get comparable quality for under $3 — and the cheapest option from either provider sits at $0.10. That 300x compression at the top tier, and 150x at the bottom, has happened in under three years. Models that were the cheapest option at launch now sit in the middle of each provider's tier stack as newer, cheaper models land below them. The GPT-4.1 nano tier and the Claude Haiku line have pushed the floor substantially downward, while reasoning models at the top of each catalog have introduced a new cost ceiling that can surprise teams running agentic workloads.
This post covers token pricing across both providers' current model tiers as of mid-2026 and works through three realistic agent scenarios. All figures are based on confirmed pricing; no rows are flagged as unverified.
OpenAI API pricing
OpenAI's API catalog now spans three product lines: the GPT-4.1 series (their current general-purpose flagship), the GPT-4o multimodal series, and the o-series reasoning models. Pricing differs substantially across these lines, and picking the wrong one for your workload is a common source of unnecessary cost.
GPT-4.1 series
GPT-4.1 is OpenAI's recommended starting point for most text tasks. The nano tier is aimed at high-volume, latency-sensitive tasks where quality requirements are modest.
| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| GPT-4.1 | $2.00 | $8.00 |
| GPT-4.1 mini | $0.40 | $1.60 |
| GPT-4.1 nano | $0.10 | $0.40 |
GPT-4o series
GPT-4o handles vision and audio inputs alongside text. For text-only workloads, GPT-4.1 is cheaper at equivalent capability tiers.
| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o mini | $0.15 | $0.60 |
o-series reasoning models
The o-series trades latency for accuracy on multi-step reasoning. They are significantly more expensive, and they generate internal thinking tokens that count toward your bill even though they are not returned in the response.
| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| o3 | $10.00 | $40.00 |
| o4-mini | $1.10 | $4.40 |
Key limitations
OpenAI provides no pre-flight per-user budget enforcement. You can set a monthly spend cap for your organization account, and the usage API returns token counts after each call. There is no mechanism to reject a specific request before it fires because a particular user or project has crossed a threshold.
Anthropic API pricing
Anthropic's tier structure follows their Opus, Sonnet, and Haiku naming convention across Claude generations. Opus is the most capable and most expensive; Haiku is the cheapest and fastest; Sonnet sits between them. Each generation generally moves prices downward relative to the prior generation at equivalent capability.
Claude 3.5 and 3.7 series (legacy)
These remain widely deployed and are still available through the API, though the Claude 4 family is now the current generation.
| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| Claude 3 Opus | $15.00 | $75.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3.7 Sonnet | $3.00 | $15.00 |
| Claude 3.5 Haiku | $0.80 | $4.00 |
Claude 3.7 Sonnet introduced extended thinking, where the model generates scratchpad tokens before its final response. Those thinking tokens are billed at the standard input rate and can add substantially to cost per call. Most frameworks let you set a per-request cap on thinking tokens.
Claude 4.x series
Anthropic's Claude 4 family — Opus 4, Sonnet 4.5, and Haiku 4.5 — shipped in 2025 and is now the current recommended generation. Sonnet 4.5 and Haiku 4.5 land at the same price points as their 3.5 predecessors; Opus 4 carries the same price as Claude 3 Opus, reflecting its positioning as the top reasoning and accuracy tier.
| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 |
| Claude Sonnet 4.5 | $3.00 | $15.00 |
| Claude Haiku 4.5 | $0.80 | $4.00 |
Key limitations
Anthropic provides no pre-flight per-user budget enforcement at the API level. The Console allows workspace-level spend limits, and each API response includes token usage metadata. There is no API mechanism to reject an inbound request because a specific customer in your multi-tenant application has exhausted their allocation.
Ownership note
Amazon Web Services holds a significant minority stake in Anthropic and offers Claude models through Amazon Bedrock. Bedrock pricing differs from direct API pricing by region and model. This post covers direct API pricing only.
Side-by-side comparison
| Model | Provider | Input ($/1M) | Output ($/1M) | Notes |
|---|---|---|---|---|
| GPT-4.1 nano | OpenAI | $0.10 | $0.40 | Cheapest current tier |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | Multimodal |
| GPT-4.1 mini | OpenAI | $0.40 | $1.60 | |
| Claude Haiku 4.5 | Anthropic | $0.80 | $4.00 | |
| o4-mini | OpenAI | $1.10 | $4.40 | Reasoning model |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | |
| GPT-4o | OpenAI | $2.50 | $10.00 | Multimodal |
| Claude Sonnet 4.5 | Anthropic | $3.00 | $15.00 | |
| Claude 3.7 Sonnet | Anthropic | $3.00 | $15.00 | Extended thinking available |
| o3 | OpenAI | $10.00 | $40.00 | Reasoning model |
| Claude Opus 4 | Anthropic | $15.00 | $75.00 |
Worked examples for agent workloads
Token ratios matter more than headline prices. Agent workloads tend to be input-heavy because each call includes tool schemas, growing conversation history, and sometimes retrieval context. Output tokens are a smaller fraction of total cost than most cost estimates assume.
Example 1: Customer support with conversation history
Ten turns of conversation, where each turn includes the full prior history (average 2,000 tokens per turn) and produces a 200-token response. Total per session: approximately 22,000 input tokens and 2,000 output tokens.
| Model | Input cost | Output cost | Session total |
|---|---|---|---|
| GPT-4.1 nano | $0.0022 | $0.0008 | $0.0030 |
| GPT-4o mini | $0.0033 | $0.0012 | $0.0045 |
| Claude Haiku 4.5 | $0.0176 | $0.0080 | $0.0256 |
| GPT-4.1 | $0.0440 | $0.0160 | $0.0600 |
| Claude Sonnet 4.5 | $0.0660 | $0.0300 | $0.0960 |
GPT-4.1 nano is roughly 32 times cheaper per session than Claude Sonnet 4.5. At 10,000 sessions per day, that gap is $27/day vs $960/day.
Example 2: Document summarization
A 50-page document, approximately 75,000 input tokens, producing a 1,000-token summary.
| Model | Input cost | Output cost | Call total |
|---|---|---|---|
| GPT-4.1 nano | $0.0075 | $0.0004 | $0.0079 |
| Claude Haiku 4.5 | $0.0600 | $0.0040 | $0.0640 |
| GPT-4.1 | $0.1500 | $0.0080 | $0.1580 |
| Claude Sonnet 4.5 | $0.2250 | $0.0150 | $0.2400 |
This workload is almost entirely input-driven, so the input price per million tokens is the dominant variable. Prompt caching can substantially reduce this cost if your document shares a large static prefix across calls.
Example 3: Agentic loop with tool calls
A ReAct-style agent that makes eight tool calls before completing. Each call includes a growing context (average 3,000 tokens) and a short output (150 tokens). Total across eight iterations: approximately 24,000 input tokens and 1,200 output tokens.
| Model | Input cost | Output cost | Loop total |
|---|---|---|---|
| GPT-4.1 nano | $0.0024 | $0.0005 | $0.0029 |
| GPT-4o mini | $0.0036 | $0.0007 | $0.0043 |
| Claude Haiku 4.5 | $0.0192 | $0.0048 | $0.0240 |
| GPT-4.1 | $0.0480 | $0.0096 | $0.0576 |
| Claude Sonnet 4.5 | $0.0720 | $0.0180 | $0.0900 |
At 1,000 loops per day, GPT-4.1 nano costs $2.90/day and Claude Sonnet 4.5 costs $90/day. The more important variable for agent loops is context growth per step. If a loop starts at 1,000 tokens and adds 500 tokens per step, a 20-step loop costs roughly 3-4x what a flat-average estimate suggests. Build your cost models with a growth multiplier, not a fixed token count.
The enforcement gap
Both providers give you tools to see what you spent after a call completes. OpenAI returns token counts in every API response and exposes usage data in the dashboard. Anthropic does the same. Neither provider offers a way to reject a specific request before it fires because a given user in your application has crossed their allocated budget.
This works fine for single-tenant applications with predictable workloads. It becomes a real problem in three scenarios.
First: multi-tenant SaaS where each customer has a spending allocation. If one customer's agent loop sends 400 API calls in a minute, you see the damage in your dashboard after it happens. That customer's $50 monthly allocation is gone, and depending on your billing model, you may absorb the overage.
Second: agent loops where individual calls trigger cascading downstream calls. A planning agent might spawn sub-agents, each of which calls a retrieval system and then calls the LLM again. The cost of a single user request is not one API call but a tree of calls, and the total compounds with each branch. Without per-user pre-flight enforcement, any branch can run unchecked.
Third: development environments where engineers run agents during testing without realizing they are consuming production quota. Usage dashboards surface this at the end of the billing cycle, not in the moment.
The enforcement gap is not an observability problem. Post-hoc logging tells you what happened. Pre-flight enforcement determines whether it happens at all.
FAQ
Which is cheaper overall, OpenAI or Anthropic?
It depends on the tier. GPT-4.1 nano at $0.10/1M input is the cheapest option from either provider. In the mid-tier, GPT-4.1 mini ($0.40/1M input) is cheaper than Claude Haiku 4.5 ($0.80/1M input). At the top general-purpose tier, Claude Opus 4 ($15/1M input, $75/1M output) is more expensive than GPT-4.1 ($2/1M input, $8/1M output), though Opus serves a different use case. The o-series reasoning models sit above GPT-4.1 in price — however, o3 ($40/1M output) is significantly cheaper than Claude Opus 4 ($75/1M output) on output tokens, making it a more cost-effective choice for reasoning-heavy tasks.
Do prompt caching discounts change this comparison significantly?
Yes, especially for agent workloads with large static system prompts or tool schemas. OpenAI charges approximately $0.50/1M for cached input tokens on GPT-4.1, versus $2.00/1M uncached. Anthropic charges approximately $0.30/1M for cache reads on Claude 3.5 Sonnet, versus $3.00/1M uncached. If 70% of your input tokens are a fixed system prompt that caches reliably, your effective input cost drops dramatically. Verify current cache pricing on the respective pricing pages, as these rates have changed with model releases.
What about Batch API pricing for non-real-time workloads?
Both providers offer batch processing at roughly 50% off standard input and output prices. If your workload tolerates a 24-hour turnaround, document summarization and data extraction pipelines can cut their token cost in half using the batch endpoint. Real-time workloads, streaming responses, and agent loops with tool calls cannot use batch processing.
How do I estimate agent loop costs before running them?
Estimate your starting context size, your average output size per step, your average context growth per step, and your expected number of steps. Calculate the cost of each step using the cumulative context at that point, then sum. The most common modeling error is using the average context size across all steps rather than accounting for the fact that later steps are always more expensive than earlier ones.
Does model quality differ enough to justify the higher prices?
For structured extraction and retrieval-augmented generation tasks, the quality gap between mid-tier and frontier models has narrowed substantially over the past 18 months. For tasks involving multi-step reasoning over ambiguous information or long-horizon planning, the gap is real and measurable. The only reliable way to answer this for your specific use case is running your own evaluation on representative samples and checking whether the quality delta justifies the cost delta at your expected usage volume.
OpenAI vs Anthropic: Key Takeaways
Both providers have narrowed the quality gap at every tier — the cheapest models today handle workloads that required frontier models eighteen months ago. The pricing decision has become primarily a cost-engineering question: which model tier delivers acceptable quality at the token ratio your workload actually produces. The teams that win on margin control spend per user, not just spend per month — because aggregate dashboards hide the outliers that actually destroy unit economics.
Neither provider offers a mechanism to reject a call before it fires based on per-user budget state. noburn.dev fills that gap: it estimates token cost client-side, checks it against the per-user or per-project budget, and blocks the call before it reaches OpenAI or Anthropic. For multi-tenant SaaS, per-user metering and Stripe passthrough billing are included. Free tier at noburn.dev/docs.