Anthropic Claude API Pricing Tiers Explained

Anthropic prices the Claude API by model tier, and the right tier — combined with the batch and caching discounts — can change a production bill by a factor of five. This guide sets out the 2026 token rates for Opus, Sonnet and Haiku, the cost levers, the committed-spend benchmarks, and where AWS Bedrock changes the maths.

By AI Practice Lead

The Three Model Tiers

The Anthropic Claude API pricing tiers map directly to three model families, and matching the tier to the task is the first and largest cost decision you make. Claude Opus 4.8 is the flagship — the highest-capability model, priced accordingly — and suits complex reasoning, agentic workflows and high-stakes analysis. Claude Sonnet 4.6 is the balanced workhorse, strong enough for most production traffic at a fraction of Opus cost. Claude Haiku 4.5 is the fast, low-cost tier for high-volume classification, extraction and routing. Most enterprises overspend by routing everything to Opus; a tiered architecture that sends only the hardest requests to the flagship and the rest to Sonnet and Haiku is the single biggest lever before any discount.

Anthropic also exposes capability variants within the flagship tier. Fast Mode on Opus 4.8, for example, is priced at $10 input / $50 output per million tokens — a premium over the standard $5 / $25, but a sharp reduction from the $30 / $150 Fast Mode commanded on Opus 4.7. The takeaway is that the price of a given capability moves between model generations, sometimes steeply, so an architecture pinned to a specific variant should be revisited each release rather than treated as fixed. Build your routing logic to make the model choice a configuration value, not a hard-coded assumption, and you can capture each generation's price cut without re-engineering.

Token Rates at a Glance

Published per-million-token rates as of mid-2026. These are the standard synchronous rates; batch and caching adjustments follow below.

Model tierInput / 1M tokensOutput / 1M tokensBatch (50% off)
Claude Opus 4.8 (flagship)$5.00$25.00$2.50 / $12.50
Claude Sonnet 4.6$3.00$15.00$1.50 / $7.50
Claude Haiku 4.5$1.00$5.00$0.50 / $2.50

The gap between tiers is roughly 5×: Opus output at $25 per million tokens against Haiku at $5. Routing even a quarter of flagship traffic down to Sonnet or Haiku where quality allows typically cuts the model line of the bill by 20–40% — before batch, caching or any negotiated discount.

Batch, Caching and Context

Three platform features do the heavy lifting on cost. Batch processing is 50% cheaper across every model in exchange for asynchronous completion — ideal for evaluation runs, document enrichment and overnight pipelines. Prompt caching cuts cached input cost by 90% (an Opus cache hit costs $0.50 per million tokens), though the first request writes the cache at 1.25× input cost for the 5-minute TTL or 2.0× for the 1-hour TTL — so caching pays off only on genuinely repeated prefixes such as large system prompts or fixed document context. On a stable, high-reuse workload the two together routinely halve a production Claude bill. Finally, Opus 4.8, Opus 4.7 and Sonnet 4.6 support a 1M-token context window at the flat rate with no long-context surcharge, which removes a cost cliff that catches buyers on other platforms. The same discipline applies to OpenAI's stack — see our OpenAI API volume discounts guide for the equivalent batch and caching levers.

Committed-Spend Discount Benchmarks

Token list prices are the same for everyone, but Anthropic negotiates committed-spend discounts for enterprise volume. These are not published; the indicative 2026 benchmarks below come from deals across the market. The commitment is use-it-or-lose-it, so size it to demonstrated baseline volume, not optimistic forecasts.

Annual committed spendIndicative discount off list
$250K–$1M10–15%
$1M–$5M15–25%
$5M+25–40% (negotiated)

Because Anthropic recently decoupled Claude Enterprise seat fees from prepaid token discounts, the seat plan no longer carries a built-in API discount — committed spend is now the route to lower token rates. We compare that change against OpenAI's model in the Claude Enterprise vs ChatGPT Enterprise cost comparison, and the seat side specifically in the ChatGPT Enterprise seat licensing guide.

Rate Limits and Cost Control

Token rates determine unit cost; rate limits determine whether you can actually run the workload, and the two are negotiated together at enterprise scale. Anthropic governs API access through usage tiers with per-minute token and request ceilings, and for production traffic those ceilings — not price — are frequently the binding constraint. Raising them is often easier to secure than a deeper discount, because added headroom costs Anthropic nothing in margin, so treat rate-limit headroom as an explicit ask in any committed-spend conversation.

Cost control on the Claude API is mostly an engineering discipline rather than a contract term. Three habits matter most. Set hard monthly spend alerts per workspace so a runaway agent loop is caught in hours rather than at the invoice. Instrument token usage by model and by feature, so you can see whether Opus traffic that could run on Sonnet is quietly inflating the bill. And separate experimental from production keys, so prompt-engineering iteration does not consume the committed-spend pool you sized for live traffic. A team that measures consumption this way forecasts its commit accurately — and an accurate forecast is precisely what unlocks the deeper discount band rather than a padded one that ends in forfeited commitment.

Direct vs AWS Bedrock

The structural decision for large Claude workloads is whether to consume the API directly from Anthropic or through AWS Bedrock. Token rates are broadly comparable, but enterprises running material AWS spend often land better overall economics on Bedrock, because Claude usage draws down an existing AWS commitment and can attract Enterprise Discount Program-level treatment. Run both paths in parallel, price them against each other, and use each as leverage. Anthropic's hub and the AWS hub on our vendor intelligence pages set out the wider commercial picture, and the AI contract red flags white paper details the usage-commitment clauses to scrutinise on either path.

The fastest way to overpay for Claude is to commit at the wrong tier and the wrong volume. If you are sizing a commit above $1M a year, request a confidential briefing and we will model your tier mix, batch and caching savings, and the direct-versus-Bedrock split before you sign.

Common Questions

Anthropic Claude API Pricing Tiers: FAQ

What are the Claude API pricing tiers in 2026?
Anthropic prices the Claude API by model tier per million tokens. As of mid-2026, Claude Opus 4.8 costs $5 input / $25 output; Claude Sonnet 4.6 costs $3 / $15; and Claude Haiku 4.5 costs $1 / $5. Opus 4.8, Opus 4.7 and Sonnet 4.6 support a 1M-token context window at the flat rate with no long-context surcharge.
How much do batch processing and prompt caching save?
Batch processing is 50% cheaper across all models — Opus 4.8 batch is $2.50 / $12.50 per million tokens. Prompt caching cuts cached input cost by 90% (Opus cache hits cost $0.50 per million tokens), though the first request writes the cache at 1.25× input cost for the 5-minute TTL or 2.0× for the 1-hour TTL. Used together on a stable prompt, the two routinely halve a production bill.
Does Anthropic offer committed-spend discounts on the Claude API?
Yes, but they are negotiated and not published. Indicative 2026 benchmarks are roughly 10–15% off for a $250K–$1M annual commit, 15–25% for $1M–$5M, and 25–40% above $5M. The commitment is use-it-or-lose-it, so size it to demonstrated baseline volume rather than forecast peaks.
Should we buy Claude through Anthropic directly or AWS Bedrock?
It depends on your existing cloud commitment. Token prices are broadly comparable, but enterprises running material AWS spend often land better overall economics consuming Claude through AWS Bedrock, because the usage draws down an existing AWS commitment and can attract EDP-level discounting. Run both paths in parallel and use each as leverage on the other.

Don't Commit to Claude at the Wrong Tier

Our AI practice models tier mix, batch and caching savings, and committed-spend sizing for enterprise Claude deployments — direct or via Bedrock.

Request a Confidential Briefing AI Procurement Advisory

AI Procurement Intelligence

Monthly briefings on Claude, OpenAI and cloud AI pricing changes, committed-spend benchmarks, and contract tactics — from advisors who have been on both sides of the table.