aws bedrockazure openaienterprise llmcloud ai

AWS Bedrock vs Azure OpenAI: Enterprise LLM Cost and Compliance

Both platforms let enterprise teams run frontier models inside their cloud perimeter. The pricing models, compliance posture, and vendor lock-in implications look different under scrutiny.

nb

noburn.dev·2026-05-27

Both AWS Bedrock and Azure OpenAI let enterprise teams run frontier models without routing traffic through a third-party API endpoint. That matters to security teams who need data residency guarantees, compliance officers who need a signed BAA, and architects who want LLM traffic to stay inside an existing cloud perimeter. What looks like a simple "which cloud do you already use" decision turns out to involve meaningfully different pricing structures, model catalogs, and compliance postures.

The comparison is also a cost question. On-demand pricing from either platform is higher per token than direct API access in some configurations, and provisioned capacity models on both platforms commit you to spend whether or not your traffic materializes. Understanding how each platform meters usage, and where budget controls actually live, changes the calculus for any team running multi-tenant workloads or expecting variable demand.

AWS Bedrock

Bedrock is Amazon's managed inference service for foundation models. It aggregates models from Anthropic (Claude 3.5 Sonnet, Claude 3 Haiku, the Claude 3 family), Meta (Llama 3.x), Mistral, AI21, Cohere, Stability AI, and Amazon's own Titan family. Traffic stays inside your VPC via PrivateLink, so no model request crosses the public internet.

Pricing. Bedrock offers two modes: on-demand and provisioned throughput. On-demand bills per token with no commitment. As of mid-2025, Claude 3.5 Sonnet on Bedrock was $3.00 per million input tokens and $15.00 per million output tokens, matching Anthropic's direct API pricing. Provisioned throughput sells capacity in model units (MUs), committed for one or six months, and is appropriate only if your traffic is consistent enough to saturate that capacity. Batch inference, where you submit jobs that complete within 24 hours, runs at roughly half the on-demand rate for supported models. Check current Bedrock pricing at the AWS console before committing to any estimate, as rates change.

Compliance. Bedrock is HIPAA eligible, SOC 1/2/3 certified, ISO 27001 certified, PCI DSS compliant, and holds FedRAMP authorization for a subset of models. Amazon does not use your inference data to train models by default, and you can configure a model invocation logging pipeline that keeps audit logs inside your own S3 bucket.

Key limitations. Model availability lags the direct API. Anthropic typically releases new Claude versions on its own API weeks before they reach Bedrock. Fine-tuning Claude models is not available on Bedrock as of mid-2025. If your workload requires the absolute latest model revision or fine-tuning, you will need to account for this gap. Latency through Bedrock also runs slightly higher than direct Anthropic API calls because traffic routes through AWS infrastructure, not Anthropic's own endpoints.

Ownership. Amazon Web Services is a wholly owned subsidiary of Amazon. Anthropic maintains an independent relationship with Amazon (Amazon has invested in Anthropic), but AWS does not own Anthropic. The model serving infrastructure is Amazon's; the model weights are Anthropic's.

Azure OpenAI

Azure OpenAI is Microsoft's managed deployment of OpenAI models. It exposes GPT-4o, GPT-4 Turbo, GPT-4o mini, GPT-3.5 Turbo, DALL-E 3, Whisper, and text embedding models inside the Azure infrastructure. Azure adds regional deployments, private endpoints, Azure Active Directory authentication, and Microsoft's compliance envelope around the same models available at api.openai.com.

Pricing. Azure OpenAI has two billing paths: pay-as-you-go (standard deployment) and provisioned throughput units (PTU). Standard deployment bills per token at rates that match OpenAI's direct API pricing in most regions. As of mid-2025, GPT-4o standard was $2.50 per million input tokens and $10.00 per million output tokens. PTU pricing is a monthly or hourly reservation of throughput capacity, priced per PTU per hour, and the relationship between PTUs and actual token throughput depends on your request pattern. Microsoft publishes a PTU sizing calculator, but estimating accurately requires traffic profiling before you commit. Check current Azure OpenAI pricing in the Azure portal before planning.

Compliance. Azure OpenAI carries the full Azure compliance stack: HIPAA BAA, SOC 1/2/3, ISO 27001, ISO 27018, FedRAMP High, GDPR processing addendum, and CCPA. Data submitted to Azure OpenAI is not used to train OpenAI or Microsoft models. Azure supports data residency commitments in most regulated regions, and you can choose the deployment region to satisfy geographic data requirements. This is one of the stronger compliance postures available for GPT-family models.

Key limitations. Azure OpenAI serves only OpenAI models. If your architecture requires Claude, Llama, or any non-OpenAI model, Azure OpenAI is not the right surface. You would need to combine it with Azure Machine Learning managed endpoints or a separate Bedrock deployment. PTU provisioning requires forecasting traffic accurately in advance, and unused PTU capacity does not carry forward. Standard deployment throughput is subject to rate limits that can cause throttling in burst scenarios without PTU.

Ownership. Microsoft has a multibillion-dollar investment in OpenAI and is OpenAI's exclusive cloud provider. OpenAI is not a Microsoft subsidiary, but the commercial relationship is deep enough that the two roadmaps are tightly coupled. Changes to OpenAI's corporate structure or API strategy will affect Azure OpenAI availability and feature parity.

Comparison table

Dimension	AWS Bedrock	Azure OpenAI	noburn.dev
Model catalog	Multi-provider (Claude, Llama, Mistral, Titan, others)	OpenAI models only	Provider-agnostic enforcement layer
Pricing model	On-demand per token, provisioned throughput, batch	Pay-as-you-go per token, PTU reservation	Free to $49/mo flat; no per-token charge
HIPAA eligible	Yes	Yes (BAA available)	Not a data processor; sits client-side
FedRAMP	Yes (subset of models)	Yes (High)	N/A
Data residency	Region-scoped via PrivateLink	Region-scoped, Azure data boundary	N/A (no data leaves your runtime)
Pre-flight budget enforcement	No	No	Yes, blocks API calls before they fire
Per-user spend metering	No (account-level quotas only)	No (deployment-level limits only)	Yes, per-user limits
Passthrough billing to customers	No	No	Yes, via Stripe integration
Vendor lock-in	AWS ecosystem	Azure and OpenAI ecosystem	Works with OpenAI, Anthropic, LiteLLM, LangChain, LangGraph, Vercel AI SDK

The enforcement gap

Both platforms provide cost visibility after the fact. Bedrock routes all requests through CloudWatch, and you can build cost dashboards using Cost Explorer or third-party observability tools. Azure OpenAI surfaces token usage through Azure Monitor. Neither platform enforces spending limits at the individual API request level.

The gap becomes concrete in multi-tenant SaaS applications. If you expose an AI feature to end customers, Bedrock and Azure OpenAI meter usage at the account or deployment level, not per customer. A single heavy user can exhaust capacity or spike costs without any mechanism at the model provider layer to stop it before the cost is incurred. The same problem appears in internal tooling where different teams or projects share a deployment.

What the platforms offer as a substitute: Bedrock lets you set service quotas (requests per minute, tokens per minute), which are throughput limits, not budget limits. Azure OpenAI lets you cap tokens per minute on a deployment. Neither can say "this specific user has spent $X this month, block their next request." Both approaches prevent runaway throughput in aggregate but do nothing to enforce a per-user or per-project budget before a call executes.

A separate enforcement layer is required if you need that behavior. The options are building it yourself (estimate token count, check against a stored budget counter, gate the call) or using a purpose-built tool. Building it yourself is non-trivial because accurate token estimation requires the same tokenizer the model uses, the counter needs to be consistent under concurrent requests, and the whole path needs to handle Bedrock and Azure's respective SDKs without adding significant latency.

FAQ

Which is cheaper, AWS Bedrock or Azure OpenAI?

For GPT-family models, you cannot run them on Bedrock, so the comparison only applies to overlapping models. For Claude models, Bedrock on-demand pricing matches Anthropic's direct API as of mid-2025, so the cost difference is near zero unless you can utilize batch inference (roughly 50% discount for asynchronous jobs). For GPT-4o, Azure OpenAI matches OpenAI direct API pricing on standard deployments. The real cost difference emerges in provisioned capacity: Bedrock's model units and Azure's PTUs both require accurate traffic forecasting, and over-provisioning either adds fixed cost regardless of actual usage.

Which platform has stronger compliance coverage?

Both hold HIPAA, SOC 2, and ISO 27001 certifications. Azure OpenAI has FedRAMP High authorization across its model deployments, while Bedrock's FedRAMP coverage applies to a subset of models, not all providers. If you need FedRAMP High for a GPT-4o workload specifically, Azure OpenAI is the cleaner path. For HIPAA workloads using Claude, Bedrock is the primary option since Azure OpenAI does not carry Claude. Verify the specific model and region against each platform's compliance documentation before signing a BAA, as certifications can be region-specific.

Can I use Claude on Azure OpenAI?

No. Azure OpenAI serves only OpenAI models. Claude is available on AWS Bedrock and directly through the Anthropic API. If your compliance requirements mandate Azure infrastructure and you also need Claude, you would need to evaluate Azure Machine Learning managed online endpoints or accept a mixed-cloud architecture.

Which platform has less vendor lock-in?

Bedrock has broader model catalog lock-in risk but lower single-vendor lock-in risk. If Anthropic degrades on Bedrock, you can switch to a Bedrock Llama deployment without leaving AWS. Azure OpenAI's risk is more concentrated: the entire catalog is OpenAI models, so any disruption to the Microsoft-OpenAI relationship affects your full model catalog. On the infrastructure side, both platforms use proprietary SDK patterns for request signing and authentication that require code changes to move elsewhere. LiteLLM provides a translation layer that reduces migration cost for either platform.

How do I enforce per-user LLM budgets on Bedrock or Azure OpenAI?

Neither platform provides this natively. You implement it in your application layer: estimate the token count for a request before it fires (using a tokenizer library matched to the model family), check that estimate against a per-user budget counter stored in your database or cache, and reject the request if the user is over budget. The token estimation step is the tricky part because over-counting wastes budget and under-counting allows overspend. An enforcement layer like noburn.dev handles this without requiring you to implement and maintain the tokenizer and counter logic yourself.

Where noburn fits in this stack

noburn.dev does something neither Bedrock nor Azure OpenAI does: it estimates the token cost of an API call client-side, before the request fires, and blocks it if the user or project has exceeded its budget. This runs at the SDK level rather than as a proxy, which means it adds no network hop to your inference latency. For teams building on Bedrock with Claude or using Azure OpenAI with GPT-4o, noburn integrates through the OpenAI SDK, Anthropic SDK, LiteLLM, LangChain, and LangGraph, so it works with whichever surface your existing code uses. If you are billing customers for their LLM usage, noburn's per-user metering and Stripe passthrough billing let you charge each tenant accurately for what they consume rather than absorbing that cost or estimating it manually. The free tier covers 100 requests per month. Documentation and SDKs are at noburn.dev/docs.