Two Pricing Models
Azure OpenAI is billed two fundamentally different ways. Pay-as-you-go charges per token consumed — for GPT-4o, roughly $2.50 per million input tokens and $10 per million output tokens. Provisioned Throughput Units (PTUs) reserve dedicated capacity billed hourly, starting around $2,448 per month, and you pay for the deployed PTU count regardless of how many tokens you actually process. PTUs can deliver up to 70% savings over pay-as-you-go on sustained, high-volume workloads, with Azure Reservations adding further discount for committed terms. This is the purest consumption layer in the wider advanced Microsoft estate — there is no per-seat anchor, so the meter is the cost.
The two models suit opposite usage patterns. Pay-as-you-go is right for spiky, experimental or low-volume workloads where idle reserved capacity would be wasted. PTUs are right for steady, high-throughput production traffic where the discount on guaranteed capacity outweighs the cost of reserving it. Most enterprises end up running both — pay-as-you-go for development and bursts, PTUs for the production workloads that have proven their volume.
| Model | Billing basis | Indicative cost | Best for |
|---|---|---|---|
| Pay-as-you-go (GPT-4o) | Per token | $2.50/$10 per M in/out | Spiky, experimental, low volume |
| Provisioned Throughput (PTU) | Reserved capacity, hourly | From ~$2,448/month | Steady, high-volume production |
| PTU + Azure Reservation | Committed term | Up to −70% vs PAYG | Proven, sustained workloads |
The PTU Break-Even
The decision between the two models comes down to volume. The break-even between pay-as-you-go and PTUs sits at roughly 150–200 million tokens per month for GPT-4o-class models. Below that line, pay-as-you-go is cheaper and avoids paying for idle reserved capacity; above it, PTUs win and the saving widens with scale. Enterprise Azure OpenAI deployments commonly run between $5,000 and $50,000 per month — squarely in the range where the PTU question must be modelled against measured token volume, never guessed.
Committing to PTUs before you have measured real token volume is the classic Azure OpenAI mistake — you lock in reserved capacity that may sit half-idle, paying for throughput you never use. Run the workload on pay-as-you-go first, measure the steady-state token rate, then size PTUs to that floor. Pilot, measure, commit — in that order.
This is the same predictable-versus-variable trade-off that governs every consumption meter in the estate, from Sentinel ingestion tiers to Windows 365 capacity. Reserve to the measured floor, and keep the variable and experimental traffic on pay-as-you-go.
The Enterprise Terms That Matter
Price is only half the Azure OpenAI conversation; the data-handling terms are the other half, and for regulated organisations they are the reason Azure OpenAI exists. Under the service terms, your prompts and completions are not used to train the underlying OpenAI models, are not shared with other customers or with OpenAI, and stay within your Azure tenant and chosen region. That commitment is the single most important enterprise term — and it should be confirmed in writing against your specific regulatory, residency and sector requirements rather than assumed.
Three further terms deserve attention before any production rollout: regional and data-residency guarantees for where inference runs; the content-filtering and abuse-monitoring configuration, including whether human review of flagged content can be disabled for sensitive workloads; and the commercial commitment structure if PTUs are reserved. These terms sit alongside the broader AI-governance posture that connects to Microsoft Copilot and the wider AI estate, where the same data-handling and commitment questions recur.
Controlling Token Cost
Because Azure OpenAI is a meter, cost control is about volume and efficiency, not licence counts. Measure real token consumption before committing to PTUs; size reserved capacity to steady-state, not peak; keep pay-as-you-go for spiky and experimental traffic; and apply Azure Reservations only once volume is proven. Output efficiency is a direct lever — output tokens cost four times input tokens for GPT-4o, so trimming verbose responses and capping output length cuts the bill immediately. Model selection matters too: routing simple tasks to smaller, cheaper models rather than defaulting every call to the flagship is one of the largest available savings.
The same measure-then-commit discipline applies to adjacent AI-flavoured SKUs such as Microsoft Sustainability Manager: prove the use case and the volume before signing a commitment. Pilots are cheap; over-committed reserved capacity is not.
Negotiating Azure OpenAI
Azure OpenAI consumption flows through your Azure commitment, which means PTU reservations and token spend can be folded into the broader Azure commitment (MACC) that anchors a Microsoft negotiation — and AI commitments can be traded against other commercial terms across the estate. The levers are: pilot on pay-as-you-go, size PTUs to measured volume, fold the committed spend into the Azure agreement rather than buying at portal list, and negotiate the data-handling and residency terms explicitly. That benchmark-led approach is set out in the Microsoft Enterprise Agreement Guide and supported by the Microsoft vendor intelligence hub.
Before committing to PTUs or signing an AI clause, model the workload against the break-even and pressure-test the enterprise terms. To benchmark your Azure OpenAI spend and contract terms against current data, request a confidential briefing — AI consumption is the newest and least-governed line in the Microsoft estate, and the one most worth getting right from the start.