Why do output tokens cost more than input tokens?

Generating text is more computationally expensive than reading it, so providers price output tokens three to ten times higher than input tokens. On Claude Sonnet 4.6, input is around $3 per million tokens and output around $15 — a 5x asymmetry. This means the ratio of output to input in your workload, not the headline input price, drives your real cost per request.

How much can caching and batch processing cut AI token costs?

A lot, and without any negotiation. Prompt caching reduces the cost of repeated input tokens by 75–90%: a system prompt billed at $3 per million tokens can drop to about $0.30 per million on cache hits. Asynchronous Batch API endpoints apply a flat 50% discount. Combined with model routing, these three mechanisms typically cut a production AI bill by 40–70% before any enterprise contract is signed.

Can enterprises negotiate token pricing below list rates?

Yes. Volume discounts and committed-use agreements typically reduce list token rates by 25–50% for predictable, high-volume workloads. Providers offer provisioned-throughput tiers that reserve capacity at a negotiated rate, and the largest enterprise deals are custom-quoted. Marketplace routes such as Amazon Bedrock may require annual commitments in the $10,000–$50,000 range to unlock volume pricing.

If token prices keep falling, why is our AI bill rising?

List prices fell roughly 80% between early 2025 and early 2026, but consumption grew faster. Reasoning models that emit far more output tokens, agentic workflows that chain many calls, and broader rollout all expand token volume. Falling unit prices are easily outrun by rising usage, which is why a token budget and routing strategy matter more than the headline price drop.

Pricing & Benchmarks · AI Procurement · 2026 · 7 min read · Updated May 2026

AI Token Pricing: Understanding and Negotiating Costs

AI token pricing looks simple — a few dollars per million tokens — and then the bill arrives an order of magnitude larger than the model card implied. This guide explains how token costs actually accrue, what the 2026 benchmarks are, and the levers that turn a runaway AI bill into a negotiated, predictable line item.

The Negotiation Experts Editorial Team · AI Procurement desk
Reviewed to our editorial standards

In This Article

How Token Pricing Actually Works
The 2026 Price Collapse — and Why Your Bill Rose
Model Pricing Benchmarks
The Free Levers: Caching, Batch, Routing
Negotiating an Enterprise Token Contract
Building a Defensible Token Budget

How Token Pricing Actually Works

AI token pricing charges separately for the tokens you send (input) and the tokens the model generates (output). A token is roughly four characters of English, so a thousand-word prompt is about 1,300 input tokens. The trap is the asymmetry: output tokens cost three to ten times more than input tokens, because generation is far more compute-intensive than reading. On Claude Sonnet 4.6, input runs around $3 per million tokens and output around $15 — a 5x gap.

That asymmetry, not the headline input price, drives your real cost per request. A summarisation task with a long document in and a short answer out is cheap. A reasoning or agentic task that emits thousands of output tokens per call — and chains several calls together — is expensive, even on a "cheap" model. Modelling the input-to-output ratio of your actual workload is the first step in any AI cost analysis, a discipline we apply across the pricing work in our enterprise software pricing benchmarks guide.
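The effect of the asymmetry is easiest to see with a small calculation. The sketch below uses the Sonnet-class rates quoted above ($3 input, $15 output per million tokens); the token counts for each workload and the function name `request_cost` are assumptions chosen purely for illustration.

```python
# Illustrative list rates in USD per million tokens (the figures quoted above).
INPUT_RATE_PER_M = 3.00
OUTPUT_RATE_PER_M = 15.00


def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = INPUT_RATE_PER_M,
                 output_rate: float = OUTPUT_RATE_PER_M) -> float:
    """Return the USD cost of one request at the given per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical workloads: the token counts are assumptions, not measurements.
summarisation = request_cost(input_tokens=20_000, output_tokens=500)
agentic_call = request_cost(input_tokens=4_000, output_tokens=8_000)
agentic_chain = 5 * agentic_call  # assume five chained calls per task

print(f"summarisation task: ${summarisation:.4f}")
print(f"single agentic call: ${agentic_call:.4f}")
print(f"five-call agentic task: ${agentic_chain:.4f}")
```

Under these assumptions the summarisation task sends five times as many input tokens as the agentic call yet costs about half as much, because the agentic call's output dominates its bill.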

The 2026 Price Collapse — and Why Your Bill Rose

Token list prices fell roughly 80% between early 2025 and early 2026. GPT-4o input alone dropped from $5.00 to $2.50 per million tokens, and budget models now sit below $0.15. Yet most enterprises report their AI spend rising over the same period. The reason is volume: reasoning models emit far more output tokens, agentic workflows multiply the number of calls, and broader rollout expands the user base faster than unit prices fall.

This is the same dynamic that makes consumption-based pricing dangerous without governance — a pattern we examine in pay-per-use vs subscription. Falling unit prices give a false sense of safety; the only durable control is a token budget tied to usage volume, not list rate.
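The arithmetic behind a rising bill on falling prices can be sketched in a few lines. The 80% price drop is the figure cited above; the eightfold volume growth is an assumed value for illustration, not a benchmark, and both function names are hypothetical.

```python
def bill_multiple(price_drop: float, volume_growth: float) -> float:
    """Return the new bill as a multiple of the old bill.

    price_drop: fractional fall in unit price (0.80 means prices fell 80%).
    volume_growth: token volume multiplier (8.0 means eight times the tokens).
    """
    return (1.0 - price_drop) * volume_growth


def break_even_growth(price_drop: float) -> float:
    """Return the volume multiplier at which the bill is unchanged."""
    return 1.0 / (1.0 - price_drop)


# Assumed scenario: unit prices fall 80% while token volume grows eightfold.
print(f"bill multiple: {bill_multiple(0.80, 8.0):.2f}x")
print(f"break-even volume growth: {break_even_growth(0.80):.1f}x")
```

With an 80% price fall, any volume growth beyond fivefold produces a larger bill than before, which is why the budget has to be tied to volume rather than to list rate.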

Model Pricing Benchmarks

The table below shows representative 2026 list rates per million tokens. Prices move quickly, so treat these as a benchmark band rather than a quote — but the tiering structure (budget, mid, premium) is stable across providers.

Tier / Model	Input (per 1M)	Output (per 1M)	Typical Use
Budget (Gemini Flash Lite, GPT Nano)	$0.08–$0.15	$0.28–$0.60	High-volume classification, routing
Mid (Claude Sonnet 4.6, GPT-5.4)	$2.50–$3.00	$15.00	Production reasoning, drafting
Premium (Claude Opus 4.5)	$5.00	$25.00	Complex analysis, agents
Frontier reasoning (GPT-5 class)	$15.00	$75.00	Hardest tasks only

Enterprises running a tiered routing architecture — roughly 70% of queries to a budget model, 20% to mid-tier, 10% to premium — achieve a median blended cost near $2.31 per million tokens. The single biggest cost mistake is sending every request to a frontier model "to be safe".
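A blended rate for a routed architecture can be estimated as a weighted average. The sketch below is one simple way to do it and is not an attempt to reproduce the $2.31 median: it takes the upper end of each band in the table, and it assumes that every query uses the same number of tokens and that 20% of tokens are output. Both assumptions, and the name `blended_rate`, are illustrative.

```python
# Illustrative list rates in USD per million tokens, taken from the table above
# (upper end of each band). Treat them as placeholders, not quotes.
TIER_RATES = {
    "budget": {"input": 0.15, "output": 0.60},
    "mid": {"input": 3.00, "output": 15.00},
    "premium": {"input": 5.00, "output": 25.00},
}


def blended_rate(traffic_share: dict, output_share: float = 0.20) -> float:
    """Return the blended USD cost per million tokens for a routing mix.

    traffic_share maps tier name -> share of traffic (shares must sum to 1).
    output_share is the assumed fraction of all tokens that are output tokens.
    """
    if abs(sum(traffic_share.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    blended = 0.0
    for tier, share in traffic_share.items():
        rates = TIER_RATES[tier]
        tier_rate = (rates["input"] * (1.0 - output_share)
                     + rates["output"] * output_share)
        blended += share * tier_rate
    return blended


routed = blended_rate({"budget": 0.70, "mid": 0.20, "premium": 0.10})
all_premium = blended_rate({"premium": 1.0})

print(f"70/20/10 routing: ${routed:.2f} per million tokens")
print(f"everything on premium: ${all_premium:.2f} per million tokens")
```

The result is sensitive to the assumed output share and to how many tokens each tier's queries actually consume, so a real estimate should use measured token counts per tier.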

The Free Levers: Caching, Batch, Routing

Three mechanisms cut token costs before any contract negotiation, and most enterprises leave all three on the table. First, prompt caching: repeated input, such as a long system prompt or a fixed knowledge base, is cached and billed at 75–90% below the standard input rate. A system prompt billed at $3 per million tokens drops to roughly $0.30 per million on cache hits. Second, the Batch API: asynchronous workloads that tolerate a delay get a flat 50% discount on every token. Third, model routing: directing each request to the cheapest model that can handle it, rather than defaulting to the most capable.

Together these typically reduce a production AI bill by 40–70% with no loss of quality and no vendor negotiation required. They should be exhausted before you ever ask a provider for a discount — both because they are free, and because they shrink the committed volume you will eventually negotiate around. The same "fix the consumption before you commit" logic governs cloud commitments, as covered in Enterprise Agreement vs pay-as-you-go.
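To see how caching and batch discounts combine on a monthly bill, consider the rough model below. It assumes a 90% cache discount and a 50% batch discount (the figures above), mid-tier rates of $3 and $15 per million tokens, and that batch traffic and cached interactive traffic are separate slices of the workload. Whether and how the two discounts stack on the same tokens varies by provider, so that separation is a simplifying assumption, as are the function name and the example volumes.

```python
def optimised_cost(input_tokens: float, output_tokens: float, *,
                   input_rate: float = 3.00, output_rate: float = 15.00,
                   cached_share: float = 0.0, cache_discount: float = 0.90,
                   batch_share: float = 0.0, batch_discount: float = 0.50) -> float:
    """Return the monthly USD cost after caching and batch discounts.

    cached_share: fraction of interactive input tokens served from cache.
    batch_share: fraction of the whole workload sent through a batch endpoint.
    Rates are USD per million tokens.
    """
    base = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    # Batch slice: flat discount on every token, no caching modelled.
    batch_cost = base * batch_share * (1.0 - batch_discount)
    # Interactive slice: only the cached part of the input is discounted.
    live = 1.0 - batch_share
    live_input = (input_tokens * live * input_rate
                  * (1.0 - cached_share * cache_discount)) / 1_000_000
    live_output = output_tokens * live * output_rate / 1_000_000
    return batch_cost + live_input + live_output


# Hypothetical month: 1B input tokens and 100M output tokens.
baseline = optimised_cost(1e9, 1e8)
optimised = optimised_cost(1e9, 1e8, cached_share=0.60, batch_share=0.30)

print(f"baseline:  ${baseline:,.0f}")
print(f"optimised: ${optimised:,.0f}")
print(f"saving:    {1 - optimised / baseline:.0%}")
```

Routing is deliberately left out of this sketch; it changes the rates themselves rather than applying a discount on top of them.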

Negotiating an Enterprise Token Contract

Once usage is optimised and a stable volume baseline is visible, list rates become negotiable. Volume discounts and committed-use agreements typically cut list token rates by 25–50% for predictable, high-volume workloads. Providers also offer provisioned-throughput tiers that reserve dedicated capacity at a negotiated rate — useful for latency-sensitive production systems — and the largest enterprise deals are custom-quoted entirely.

The marketplace route matters too. Accessing models through Amazon Bedrock or Azure AI Foundry lets you fold AI spend into an existing cloud commitment such as an AWS EDP or Azure MACC, sometimes unlocking better effective rates than a direct provider contract, though Bedrock volume pricing may require annual commitments in the $10,000–$50,000 range. The levers that move these deals are the familiar ones: a credible multi-model alternative, a committed annual volume, and timing aligned to the provider's fiscal year. We size and negotiate these agreements through our AI procurement advisory practice.

Building a Defensible Token Budget

The deliverable that protects an enterprise is a token budget: a model, per use case, of expected tokens per request, requests per day, and the routed model mix — converted into a monthly cost with a clear variance band. Without it, AI spend is governed by hope. With it, every optimisation and every negotiated discount can be measured against a baseline, and finance has a number to hold the programme to.
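A token budget of this kind can start as a very small model. The sketch below is one possible shape, not a prescribed template: the `UseCase` fields mirror the inputs listed above, the rates are the illustrative table figures, and the 30-day month, the ±20% variance band and the example use case are all assumptions.

```python
from dataclasses import dataclass
from typing import Mapping, Tuple

# Illustrative (input, output) list rates in USD per million tokens.
RATES = {"budget": (0.15, 0.60), "mid": (3.00, 15.00), "premium": (5.00, 25.00)}


@dataclass
class UseCase:
    name: str
    input_tokens_per_request: int
    output_tokens_per_request: int
    requests_per_day: int
    model_mix: Mapping[str, float]  # tier -> share of requests, summing to 1


def monthly_budget(use_case: UseCase, days: int = 30,
                   variance: float = 0.20) -> Tuple[float, float, float]:
    """Return (low, expected, high) monthly cost in USD for one use case."""
    if abs(sum(use_case.model_mix.values()) - 1.0) > 1e-9:
        raise ValueError("model mix shares must sum to 1")
    per_request = 0.0
    for tier, share in use_case.model_mix.items():
        input_rate, output_rate = RATES[tier]
        tier_cost = (use_case.input_tokens_per_request * input_rate
                     + use_case.output_tokens_per_request * output_rate) / 1_000_000
        per_request += share * tier_cost
    expected = per_request * use_case.requests_per_day * days
    return expected * (1.0 - variance), expected, expected * (1.0 + variance)


# Hypothetical use case with assumed volumes and routing mix.
support_bot = UseCase(
    name="support assistant",
    input_tokens_per_request=2_000,
    output_tokens_per_request=500,
    requests_per_day=10_000,
    model_mix={"budget": 0.70, "mid": 0.20, "premium": 0.10},
)

low, expected, high = monthly_budget(support_bot)
print(f"{support_bot.name}: ${expected:,.0f} per month "
      f"(band ${low:,.0f} to ${high:,.0f})")
```

Summing the expected figure across use cases gives the baseline against which optimisations and negotiated discounts can be measured.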

Treat the AI token contract like any other major procurement: benchmark before you negotiate, optimise consumption first, commit only your stable baseline, and keep a flexible tier for growth. To benchmark your token spend and model an enterprise AI contract before you commit, request a confidential briefing, or download our Price Benchmarking Report for current rate bands across vendors.
