- Why AI SLAs Are Fundamentally Different
- Availability SLA: What 99.9% Really Means
- Latency and Response Time SLAs
- Model Versioning and Change Notification Provisions
- Output Quality and Accuracy Standards
- Financial Remedies That Actually Provide Leverage
- Additional Requirements for Regulated Industries
- AI SLA Reference Table: Minimums vs Best Practice
Why AI SLAs Are Fundamentally Different
Traditional enterprise software SLAs address a well-understood failure mode: the system is unavailable. When your ERP goes down, the impact is clear and measurable. The SLA framework — uptime percentage, response time, maximum incident duration, service credits — maps naturally to this failure mode.
AI models introduce a second, more insidious failure mode: the system is available, but the outputs have degraded. A language model can be fully accessible and processing requests while producing outputs that are materially less accurate, less relevant, or less aligned with your business requirements than the version you evaluated in procurement. This can happen for several reasons:
- The vendor updates the underlying model to improve average performance on benchmarks — but those improvements come with regressions on your specific use case
- The vendor modifies safety guardrails or content policies in ways that affect your legitimate business outputs
- Infrastructure changes alter inference behaviour in subtle ways not captured by standard availability monitoring
- Prompt caching, context handling, or tokenisation changes modify how the model processes your specific inputs
In our experience advising enterprises on AI contracts, model quality degradation following vendor updates is a more common operational issue than pure downtime — yet almost no standard AI vendor SLA addresses it. The negotiation challenge is to get contractual protections around both dimensions.
"An AI system can be 100% available and simultaneously delivering outputs that are 30% less accurate than what you purchased. Standard SLAs won't protect you from this. Your contract must."
Availability SLA: What 99.9% Really Means
Most AI vendors offer a 99.9% monthly availability commitment for enterprise tiers — which translates to approximately 43 minutes of permitted downtime per month. For production AI workloads embedded in customer-facing applications, this may be insufficient. For batch processing or internal productivity tools, it may be more than adequate. The first step is establishing what availability level your use case actually requires.
Measurement Methodology Matters
The availability percentage is less important than how it is measured. Vendor defaults typically exclude from downtime calculation: scheduled maintenance windows; degraded performance that doesn't meet a formal "unavailable" threshold; API timeouts that don't trigger the vendor's internal monitoring; and regional outages if the vendor considers other regions "available."
Enterprise contracts should specify: (1) availability measurement inclusive of scheduled maintenance unless you have pre-approved it; (2) a degraded performance threshold (e.g., API error rate exceeding 5% or P95 latency exceeding twice the SLA target counts as a partial outage); (3) separate availability tracking for real-time inference vs batch processing APIs; and (4) customer-visible monitoring dashboards rather than self-reported vendor metrics.
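The degraded-performance clause above can be made concrete. Below is an illustrative sketch of computing "effective availability" over monitoring windows, where a window breaching the example thresholds from the text (error rate above 5%, or P95 latency above twice the SLA target) counts as a partial outage. The 0.5 weighting for degraded windows is an assumption for illustration, not a standard term:

```python
from dataclasses import dataclass

@dataclass
class Window:
    error_rate: float      # fraction of failed API calls in this window
    p95_latency_ms: float  # observed P95 latency in this window
    available: bool        # vendor's binary up/down signal

P95_SLA_TARGET_MS = 2000.0   # example SLA latency target
DEGRADED_WEIGHT = 0.5        # assumption: degraded window = half an outage

def effective_availability(windows):
    """Availability where degraded windows count as partial outages."""
    downtime = 0.0
    for w in windows:
        if not w.available:
            downtime += 1.0
        elif w.error_rate > 0.05 or w.p95_latency_ms > 2 * P95_SLA_TARGET_MS:
            downtime += DEGRADED_WEIGHT
    return 1.0 - downtime / len(windows)
```

Under this formula, a month where the vendor reports full availability but 10% of windows are degraded yields 95% effective availability rather than 100%, which is exactly the gap the clause is designed to close.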
Service Credits
Standard service credit structures offer minimal deterrent — typically 10% of the monthly invoice for 99.5–99.9% availability, rising to 25–30% for availability below 99.0%. Enterprise negotiations should push for: automatic credit issuance triggered by measurement data without requiring the customer to file a claim; credit rates that escalate for extended outages (e.g., additional 10% per 4-hour increment beyond the first breach); and termination for cause rights for repeated breaches (three or more in a rolling 12-month period).
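A minimal sketch of the escalating credit schedule described above. The tier rates mirror the figures in the text; the 25% rate for the 99.0–99.5% band, the reading of "per 4-hour increment" as full increments of continuous outage, and the 100% cap are assumptions for illustration:

```python
def service_credit_pct(availability: float, outage_hours: float) -> float:
    """Credit as a percentage of the monthly invoice (illustrative)."""
    if availability >= 0.999:
        return 0.0
    if availability >= 0.995:
        base = 10.0
    elif availability >= 0.99:
        base = 25.0  # assumption: fills the 99.0-99.5% gap in the text
    else:
        base = 30.0
    # escalation: +10% per full 4-hour increment of outage (assumption)
    escalation = 10.0 * int(outage_hours // 4)
    return min(base + escalation, 100.0)  # cap at one month's fees
```

Automatic issuance means this calculation runs on the measurement data itself; the customer should never have to file a claim to receive the result.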
Latency and Response Time SLAs
For real-time AI applications — customer-facing assistants, real-time content generation, API-driven workflows — latency commitments are as commercially important as availability commitments. Standard AI vendor terms rarely include latency SLAs. They should.
| Use Case Category | Recommended P50 Target | Recommended P95 Target | Recommended P99 Target |
|---|---|---|---|
| Real-time customer chat / assistant | <800ms first token | <2,000ms first token | <5,000ms |
| API-driven workflow automation | <2,000ms full response | <5,000ms full response | <15,000ms |
| Document analysis / summarisation | <5,000ms | <15,000ms | <30,000ms |
| Batch processing (async) | N/A | Job completion within agreed window | N/A |
Latency targets should be measured end-to-end from the API call to the last token received, not from the vendor's internal processing start. For streaming responses, both first-token latency and throughput (tokens per second) should be defined.
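Client-side, end-to-end measurement of the two streaming metrics above can be sketched as follows. `stream` stands in for whatever token iterator the vendor SDK returns; the function itself makes no assumptions about the vendor API:

```python
import time

def measure_stream(stream):
    """Return (first_token_latency_s, tokens_per_second, total_s),
    measured end-to-end from the moment the call is made."""
    start = time.perf_counter()
    first_token_latency = None
    token_count = 0
    for _token in stream:
        if first_token_latency is None:
            first_token_latency = time.perf_counter() - start
        token_count += 1
    total = time.perf_counter() - start
    throughput = token_count / total if total > 0 else 0.0
    return first_token_latency, throughput, total
```

Collecting these per-request figures into P50/P95/P99 distributions gives you independent evidence against the vendor's self-reported numbers.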
Model Versioning and Change Notification Provisions
This is the most commercially significant SLA dimension for most AI enterprise deployments, and the one most consistently absent from vendor defaults.
Minimum 30-Day Change Notification
Your contract should require the vendor to provide minimum 30-day advance written notice before deploying any model update that may materially affect output quality, behaviour, or API compatibility. This notice should include: a description of the changes; the expected impact on output characteristics; and access to the new model version in a test environment prior to deployment to your production environment.
Model Stability Windows
Negotiate a model stability window — a contractual commitment that the model version deployed to your environment will remain unchanged for a minimum period (typically 90 days for standard enterprise agreements, 180 days for regulated deployments). After the stability window, updates can proceed with the notification requirements above.
Rollback Rights
Perhaps the most valuable protection — and the one vendors most consistently resist — is the right to request a rollback to a previous model version if a new version materially degrades your use case performance. "Material degradation" should be defined contractually: typically a 10% or greater decline in accuracy on your agreed benchmark test suite, or more than a 20% increase in P95 latency. Rollback rights with a defined duration (minimum 90 days) and a clear escalation process create meaningful operational protection.
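The contractual trigger above reduces to a simple, auditable check. A sketch, using the thresholds from the text (10% or greater accuracy decline, or more than 20% P95 latency increase); the metric names are illustrative:

```python
def material_degradation(baseline: dict, candidate: dict) -> bool:
    """True if the candidate model version meets the contractual
    'material degradation' definition versus the baseline."""
    accuracy_drop = (baseline["accuracy"] - candidate["accuracy"]) / baseline["accuracy"]
    latency_rise = (candidate["p95_ms"] - baseline["p95_ms"]) / baseline["p95_ms"]
    return accuracy_drop >= 0.10 or latency_rise > 0.20
```

Because the trigger is mechanical, neither party needs to argue about whether degradation is "material": the benchmark results either fire it or they do not.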
Version Pinning
Several major AI providers now offer API-level version pinning — the ability to specify a particular model version in API calls and receive guaranteed access to that version for a defined period. Where this is available, enterprise contracts should formalise the version pinning window and include a deprecation notification requirement (minimum 6 months before a pinned version is retired).

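Compliance with the 6-month deprecation-notice minimum is easy to verify mechanically. A sketch, treating six months as 183 days (a conservative approximation, and an assumption):

```python
from datetime import date, timedelta

MIN_NOTICE = timedelta(days=183)  # assumption: ~6 months, conservative

def notice_is_compliant(notice_date: date, retirement_date: date) -> bool:
    """True if the deprecation notice gives at least the contractual
    minimum lead time before the pinned version is retired."""
    return retirement_date - notice_date >= MIN_NOTICE
```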
Output Quality and Accuracy Standards
Defining contractual accuracy standards for AI is technically challenging — accuracy is use-case-dependent, and no vendor will accept open-ended accuracy warranties. However, a structured approach to output quality standards is achievable for well-defined use cases.
Benchmark Test Suite Approach
Develop a benchmark test suite during procurement — a representative sample of your actual production prompts and their expected outputs. This suite serves as the reference point for accuracy measurement. The contract specifies that the model, at contract signature, achieves a defined score on this benchmark, and that any model update must maintain performance within an agreed tolerance (e.g., ±5% on accuracy metrics).
If the vendor cannot commit to benchmark performance maintenance, the minimum acceptable position is a regression testing obligation: the vendor runs your benchmark suite against any candidate model update and provides the results before deployment, giving you the information needed to exercise rollback rights if warranted.
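The regression-testing obligation can be expressed as a per-metric tolerance check. A sketch using the ±5% example tolerance from the text; the metric names are illustrative:

```python
def regression_report(baseline: dict, candidate: dict, tolerance: float = 0.05):
    """Compare a candidate model's benchmark scores against the
    contract-signature baseline; return metrics outside tolerance."""
    failures = {}
    for metric, base_score in baseline.items():
        relative_change = (candidate[metric] - base_score) / base_score
        if abs(relative_change) > tolerance:
            failures[metric] = round(relative_change, 4)
    return failures  # empty dict => within tolerance
```

A non-empty report from the vendor's pre-deployment run is the factual basis for exercising the rollback rights discussed above.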
Financial Remedies That Actually Provide Leverage
Service credits — while important — rarely provide sufficient deterrent for AI performance breaches. The operational cost of an AI model degradation event far exceeds the credit value for most enterprises. Supplementary remedies that provide meaningful leverage include:
- Termination for cause: Three or more SLA breaches in a rolling 12-month period, or any single breach lasting more than 72 hours, triggers termination rights without penalty and without any obligation to pay out the remainder of the contract term
- Fee abatement: For sustained degradation events (longer than 7 days below agreed quality standards), a fee suspension for the duration of the degradation event — not a credit against future invoices
- Step-in rights: If the vendor fails to remediate a performance issue within 30 days, the right to engage a third party at the vendor's expense to provide equivalent capability
- Direct damages carve-out: For high-stakes regulated use cases (healthcare decisions, financial advice, credit decisions), negotiate a carve-out from the standard limitation of liability for AI output failures caused by vendor gross negligence
"Service credits tell vendors that SLA breaches are an acceptable business cost. Termination rights tell vendors that SLA breaches are an existential risk to the relationship. The latter creates the behaviour you want."
Additional Requirements for Regulated Industries
Enterprises operating in financial services, healthcare, defence, and other regulated sectors face additional SLA requirements beyond the commercial framework above.
Audit Trail Requirements
Regulated use cases require complete, immutable audit logs of AI inputs, outputs, model versions used, and any human review decisions. Your SLA should include a commitment to provide these logs in a defined format within 24 hours of any regulatory enquiry, with a minimum 7-year retention period.
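One common way to make audit logs tamper-evident is a hash chain: each entry includes the hash of the previous entry, so any retroactive alteration breaks verification. A minimal sketch; the record field names are illustrative, and a production system would also need durable storage and key-managed signing:

```python
import hashlib
import json

GENESIS = "0" * 64

def append_entry(log: list, record: dict) -> dict:
    """Append a record whose hash chains to the previous entry."""
    prev_hash = log[-1]["entry_hash"] if log else GENESIS
    body = {"prev_hash": prev_hash, **record}
    entry_hash = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    entry = {**body, "entry_hash": entry_hash}
    log.append(entry)
    return entry

def verify_chain(log: list) -> bool:
    """True if no entry has been altered or reordered."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if entry["prev_hash"] != prev or recomputed != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True
```

Storing the model version in every entry also gives you the evidence base for the versioning and rollback provisions discussed earlier.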
Human Override Requirements
For consequential automated decisions, the EU AI Act and sector-specific regulations require meaningful human oversight mechanisms. Your contract should specify: the vendor's obligations to provide explainability outputs sufficient for human review; the format and timeliness of those outputs; and the vendor's cooperation obligations if regulators require review of specific decisions.
Incident Response for Regulated AI Failures
An AI model producing systematically biased, inaccurate, or harmful outputs may constitute a regulatory incident requiring notification to supervisory authorities. Your contract should include: 24-hour vendor notification of any confirmed AI output failures affecting your regulated use cases; vendor cooperation with regulatory investigations; and specific contractual obligations for the vendor to support your incident response process.
AI SLA Reference Table: Minimums vs Best Practice
| SLA Dimension | Typical Vendor Default | Minimum Acceptable | Best Practice |
|---|---|---|---|
| Availability | 99.9% (excl. maintenance) | 99.9% incl. maintenance | 99.95% with degradation threshold |
| Latency (P95) | Not specified | Defined per use case type | P50 + P95 + P99 per endpoint type |
| Model change notice | Best efforts / none | 30 days written notice | 30 days + pre-production access |
| Model stability window | None | 90 days | 180 days for regulated use cases |
| Rollback rights | None | 90 days post-deployment | 180 days with defined benchmark trigger |
| Accuracy commitments | None | Benchmark regression testing | Contractual benchmark performance with tolerance |
| Service credits | 10–25% of monthly fee | Auto-issued, escalating scale | Plus termination for cause after 3 breaches |