Azure OpenAI Service Pricing and Enterprise Terms

Azure OpenAI is the enterprise gateway to GPT-class models — and the fastest-growing consumption meter in the Microsoft estate. Its cost turns on a single decision: pay per token, or reserve throughput. Get that decision and the data-handling terms right, and Azure OpenAI is a controlled line item; get them wrong, and it is an open-ended bill with no seat count to cap it.

By Microsoft Practice Lead

Two Pricing Models

Azure OpenAI is billed in two fundamentally different ways. Pay-as-you-go charges per token consumed: for GPT-4o, roughly $2.50 per million input tokens and $10 per million output tokens. Provisioned Throughput Units (PTUs) reserve dedicated capacity billed hourly, starting around $2,448 per month, and you pay for the deployed PTU count regardless of how many tokens you actually process. PTUs can deliver up to 70% savings over pay-as-you-go on sustained, high-volume workloads, with Azure Reservations adding a further discount for committed terms. This is the purest consumption layer in the wider Microsoft estate: there is no per-seat anchor, so the meter is the cost.

The two models suit opposite usage patterns. Pay-as-you-go is right for spiky, experimental or low-volume workloads where idle reserved capacity would be wasted. PTUs are right for steady, high-throughput production traffic where the discount on guaranteed capacity outweighs the cost of reserving it. Most enterprises end up running both — pay-as-you-go for development and bursts, PTUs for the production workloads that have proven their volume.
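As a rough illustration of the pay-as-you-go arithmetic, the sketch below prices one month of GPT-4o traffic at the indicative rates quoted above and compares the result with the indicative PTU entry price. The function name and the example token volumes are hypothetical, and actual rates vary by model, region and agreement.

```python
# Indicative GPT-4o rates quoted in the text (USD per million tokens).
INPUT_RATE_PER_M = 2.50
OUTPUT_RATE_PER_M = 10.00
# Indicative PTU entry price quoted in the text (USD per month).
PTU_ENTRY_PER_MONTH = 2448.00


def payg_monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Pay-as-you-go cost in USD for one month of token usage."""
    input_cost = (input_tokens / 1_000_000) * INPUT_RATE_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M
    return input_cost + output_cost


# Hypothetical workload: 40M input and 10M output tokens in a month.
cost = payg_monthly_cost(40_000_000, 10_000_000)
print(f"Pay-as-you-go: ${cost:,.2f}")
print(f"Below the PTU entry price: {cost < PTU_ENTRY_PER_MONTH}")
```

A low-volume workload like this one sits far below the reserved-capacity price, which is the pattern the text describes for development and experimental traffic.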

Model | Billing basis | Indicative cost | Best for
Pay-as-you-go (GPT-4o) | Per token | $2.50/$10 per M in/out | Spiky, experimental, low volume
Provisioned Throughput (PTU) | Reserved capacity, hourly | From ~$2,448/month | Steady, high-volume production
PTU + Azure Reservation | Committed term | Up to −70% vs PAYG | Proven, sustained workloads

The PTU Break-Even

The decision between the two models comes down to volume. The break-even between pay-as-you-go and PTUs sits at roughly 150–200 million tokens per month for GPT-4o-class models, though the exact figure shifts with the input/output mix and the effective PTU cost. Below that line, pay-as-you-go is cheaper and avoids paying for idle reserved capacity; above it, PTUs win and the saving widens with scale. Enterprise Azure OpenAI deployments commonly run between $5,000 and $50,000 per month, squarely in the range where the PTU question must be modelled against measured token volume, never guessed.
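One way to model the break-even is to divide the monthly cost of the reserved capacity by a blended per-token rate. The sketch below does that under stated assumptions: the per-token rates are the indicative GPT-4o figures from this article, while the function names, the monthly PTU cost passed in and the input share are hypothetical inputs to be replaced with your own measurements.

```python
def blended_rate_per_m(input_share: float,
                       input_rate: float = 2.50,
                       output_rate: float = 10.00) -> float:
    """Average USD per million tokens for a given share of input tokens."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_rate + (1.0 - input_share) * output_rate


def break_even_tokens_per_month(ptu_monthly_cost: float,
                                input_share: float) -> float:
    """Monthly token volume at which pay-as-you-go spend equals the PTU cost."""
    return ptu_monthly_cost / blended_rate_per_m(input_share) * 1_000_000


# Hypothetical scenarios: the same reserved cost under different traffic mixes.
for share in (0.5, 0.75, 0.9):
    tokens = break_even_tokens_per_month(2448.00, share)
    print(f"input share {share:.0%}: break-even near {tokens / 1e6:,.0f}M tokens/month")
```

The result depends heavily on the input/output mix and on the effective monthly PTU cost (deployment size, any reservation discount), which is why the break-even has to be modelled against measured volume rather than read off a single headline number.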

Committing to PTUs before you have measured real token volume is the classic Azure OpenAI mistake — you lock in reserved capacity that may sit half-idle, paying for throughput you never use. Run the workload on pay-as-you-go first, measure the steady-state token rate, then size PTUs to that floor. Pilot, measure, commit — in that order.
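The measure-then-size step can be sketched as picking a low percentile of the hourly token counts observed during the pay-as-you-go pilot and treating everything above it as burst traffic. The helper names, the 10th-percentile choice and the sample data below are all assumptions for illustration; converting the resulting token rate into a PTU count depends on model-specific throughput figures that are outside this sketch.

```python
import math


def steady_state_floor(hourly_tokens: list[int], percentile: float = 10.0) -> int:
    """Nearest-rank percentile of hourly token counts, used as the reserved floor."""
    if not hourly_tokens:
        raise ValueError("need at least one hourly measurement")
    ordered = sorted(hourly_tokens)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def burst_tokens(hourly_tokens: list[int], floor: int) -> int:
    """Tokens above the floor, which would stay on pay-as-you-go."""
    return sum(max(0, t - floor) for t in hourly_tokens)


# Hypothetical pilot measurements: tokens processed in each of ten hours.
pilot = [120_000, 150_000, 90_000, 400_000, 130_000,
         110_000, 95_000, 140_000, 125_000, 135_000]

floor = steady_state_floor(pilot)
print(f"Reserve for about {floor:,} tokens/hour")
print(f"Burst traffic left on pay-as-you-go: {burst_tokens(pilot, floor):,} tokens")
```

Sizing to a low percentile rather than the peak keeps the single 400,000-token hour from inflating the reservation; a higher percentile trades more idle capacity for less pay-as-you-go overflow.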

This is the same predictable-versus-variable trade-off that governs every consumption meter in the estate, from Sentinel ingestion tiers to Windows 365 capacity. Reserve to the measured floor, and keep the variable and experimental traffic on pay-as-you-go.

The Enterprise Terms That Matter

Price is only half the Azure OpenAI conversation; the data-handling terms are the other half, and for regulated organisations they are the reason Azure OpenAI exists. Under the service terms, your prompts and completions are not used to train the underlying OpenAI models, are not shared with other customers or with OpenAI, and stay within your Azure tenant and chosen region. That commitment is the single most important enterprise term — and it should be confirmed in writing against your specific regulatory, residency and sector requirements rather than assumed.

Three further terms deserve attention before any production rollout: regional and data-residency guarantees for where inference runs; the content-filtering and abuse-monitoring configuration, including whether human review of flagged content can be disabled for sensitive workloads; and the commercial commitment structure if PTUs are reserved. These terms sit alongside the broader AI-governance posture that connects to Microsoft Copilot and the wider AI estate, where the same data-handling and commitment questions recur.

Controlling Token Cost

Because Azure OpenAI is a meter, cost control is about volume and efficiency, not licence counts. Measure real token consumption before committing to PTUs; size reserved capacity to steady-state, not peak; keep pay-as-you-go for spiky and experimental traffic; and apply Azure Reservations only once volume is proven. Output efficiency is a direct lever — output tokens cost four times input tokens for GPT-4o, so trimming verbose responses and capping output length cuts the bill immediately. Model selection matters too: routing simple tasks to smaller, cheaper models rather than defaulting every call to the flagship is one of the largest available savings.
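The two efficiency levers can be put into numbers with a short sketch. The GPT-4o rates are the indicative figures from this article; the smaller-model rates, the workload volumes, the 25% output reduction and the 50% routing share are hypothetical placeholders, not published prices or measured results.

```python
# Indicative GPT-4o rates quoted in the text (USD per million tokens).
FLAGSHIP_INPUT_RATE = 2.50
FLAGSHIP_OUTPUT_RATE = 10.00
# Hypothetical rates for a smaller model; placeholders, not real prices.
SMALL_INPUT_RATE = 0.25
SMALL_OUTPUT_RATE = 1.00


def monthly_cost(input_m: float, output_m: float,
                 input_rate: float = FLAGSHIP_INPUT_RATE,
                 output_rate: float = FLAGSHIP_OUTPUT_RATE) -> float:
    """Cost in USD for token volumes given in millions of tokens."""
    return input_m * input_rate + output_m * output_rate


# Hypothetical baseline: 100M input and 40M output tokens, all on the flagship.
baseline = monthly_cost(100, 40)

# Lever 1: cap response length so output volume drops by 25%.
trimmed = monthly_cost(100, 40 * 0.75)

# Lever 2: route half of the traffic to the smaller model.
routed = monthly_cost(50, 20) + monthly_cost(50, 20, SMALL_INPUT_RATE, SMALL_OUTPUT_RATE)

print(f"Baseline:        ${baseline:,.2f}")
print(f"Trimmed output:  ${trimmed:,.2f}")
print(f"Routed traffic:  ${routed:,.2f}")
```

Because output tokens carry the higher rate, a cut in output volume moves the bill more than the same cut in input volume would.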

The same measure-then-commit discipline applies to adjacent AI-flavoured SKUs such as Microsoft Sustainability Manager: prove the use case and the volume before signing a commitment. Pilots are cheap; over-committed reserved capacity is not.

Negotiating Azure OpenAI

Azure OpenAI consumption flows through your Azure commitment, which means PTU reservations and token spend can be folded into the broader Azure commitment (MACC) that anchors a Microsoft negotiation — and AI commitments can be traded against other commercial terms across the estate. The levers are: pilot on pay-as-you-go, size PTUs to measured volume, fold the committed spend into the Azure agreement rather than buying at portal list, and negotiate the data-handling and residency terms explicitly. That benchmark-led approach is set out in the Microsoft Enterprise Agreement Guide and supported by the Microsoft vendor intelligence hub.

Before committing to PTUs or signing an AI clause, model the workload against the break-even and pressure-test the enterprise terms. To benchmark your Azure OpenAI spend and contract terms against current data, request a confidential briefing — AI consumption is the newest and least-governed line in the Microsoft estate, and the one most worth getting right from the start.

Common Questions

Azure OpenAI: FAQ

How is Azure OpenAI Service priced?
Azure OpenAI is billed two ways. Pay-as-you-go charges per token: GPT-4o, for example, is around $2.50 per million input tokens and $10 per million output tokens. Provisioned Throughput Units (PTUs) reserve dedicated capacity billed hourly, starting around $2,448 per month, regardless of how many tokens you actually process. PTUs can deliver up to 70% savings versus pay-as-you-go on sustained, high-volume workloads, with further discounts available through Azure Reservations.

When do PTUs become cheaper than pay-as-you-go?
The break-even between pay-as-you-go and Provisioned Throughput Units sits at roughly 150–200 million tokens per month for GPT-4o-class models. Below that, pay-as-you-go is cheaper and avoids paying for idle reserved capacity; above it, PTUs win and the gap widens with volume. Enterprise Azure OpenAI deployments commonly run between $5,000 and $50,000 per month, which is squarely in the range where the PTU decision has to be modelled rather than guessed.

Does Microsoft use our data to train Azure OpenAI models?
No. Under the Azure OpenAI Service terms, your prompts and completions are not used to train the underlying OpenAI models and are not shared with other customers or with OpenAI. Data stays within your Azure tenant and chosen region. This data-handling commitment is the single most important enterprise term and the main reason regulated organisations choose Azure OpenAI over the consumer API, but it should be confirmed in writing against your specific regulatory and residency requirements.

How do you control Azure OpenAI cost?
Treat it as a consumption meter, not a licence. Measure real token volume before committing to PTUs, size reserved capacity to steady-state demand rather than peak, keep pay-as-you-go for spiky or experimental workloads, and apply Azure Reservations once volume is proven. Prompt and output efficiency matters too: output tokens cost four times input tokens for GPT-4o, so reducing verbose responses directly cuts the bill. Pilot first, measure, then commit.

