SAP CEO Warns "Almost Right" AI Costs Enterprises
Serge Bulaev
SAP's CEO warned that "almost right" results from AI may not be good enough for important business tasks, particularly in finance and supply chains. AI mistakes can lead to manual rework, errors, and delays, and weak data makes them worse. Companies are now building systems where every AI decision can be checked, explained, and approved by humans when needed. They layer controls, keep detailed records for audits, and set rules to stop or review AI actions when risk gets too high. These steps may help businesses use AI safely while meeting auditor and regulator demands.

Following a warning from SAP's CEO about the hidden costs of "almost right" AI, enterprise leaders are rethinking their automation strategies. The claim that near-perfect AI is insufficient for critical business functions exposes a tension between probabilistic AI models and the deterministic needs of finance, supply chain, and compliance. That tension has pushed verifiable AI from theory to a boardroom priority. This article outlines the emerging patterns for building audit-ready, reliable AI systems.
Why "Almost Right" Creates Real Cost
The term "almost right" AI refers to probabilistic models that produce nearly accurate but not perfect results. In business, these small errors can cause significant financial and operational damage by triggering transaction failures, creating manual rework, delaying processes like financial closing, and generating exception backlogs that overwhelm teams.
The costs of "almost right" AI become tangible when automated agents interact with core business systems. Industry reports indicate these agents often fail to meet the strict validation criteria of enterprise applications. Each failure creates an exception, leading to a backlog of manual rework, delayed financial postings, and unplanned reconciliations. The problem is amplified in organizations with poor master data, where minor AI classification errors can cascade into major discrepancies across integrated financial ledgers.
From Explainability Veneer to Audit-Ready Design
Effective AI governance requires building auditability into the system architecture from the start, not as an afterthought. In response to the risks highlighted by SAP, enterprises are adopting a three-layer model to ensure transparency and accountability:
- Model Layer: Maintain versioned snapshots of models, their input data, and confidence scores.
- Interface Layer: Expose key data points like feature importance, uncertainty levels, and data lineage for human review.
- Oversight Layer: Capture structured human feedback and log the rationale behind every decision.
This layered design provides a complete, auditable record, allowing teams to reconstruct the AI's decision process, including the inputs, its reasoning, and the final human approval.
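One way to picture the three layers is as a single record type that keeps each layer's fields together. The sketch below is illustrative only: the class names, field names, and sample values are assumptions made for this example, not part of any SAP product or published schema.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ModelLayer:
    model_version: str        # identifier of the versioned model snapshot
    inputs: dict              # input data the model saw
    confidence: float         # model-reported confidence, 0.0 to 1.0


@dataclass(frozen=True)
class InterfaceLayer:
    feature_importance: dict  # feature name -> weight shown to the reviewer
    uncertainty: float        # uncertainty level shown to the reviewer
    data_lineage: list        # source systems or tables behind the inputs


@dataclass(frozen=True)
class OversightLayer:
    reviewer: str
    decision: str             # "accept", "modify", "escalate", or "reject"
    rationale: str            # why the reviewer decided this way
    decided_at: str           # ISO 8601 timestamp


@dataclass(frozen=True)
class DecisionRecord:
    model: ModelLayer
    interface: InterfaceLayer
    oversight: OversightLayer

    def reconstruct(self) -> dict:
        """Return the full decision trail as plain nested dictionaries."""
        return asdict(self)


# Hypothetical example values.
record = DecisionRecord(
    model=ModelLayer("invoice-classifier-v3", {"amount": 1250.0}, 0.91),
    interface=InterfaceLayer({"amount": 0.7, "vendor": 0.3}, 0.09, ["erp.invoices"]),
    oversight=OversightLayer(
        "j.doe", "accept", "Matches purchase order", "2025-01-15T10:30:00+00:00"
    ),
)
snapshot = record.reconstruct()
```

Freezing the dataclasses prevents fields from being reassigned after the record is created, which fits the goal of an audit trail, although the nested dictionaries themselves remain mutable in this sketch.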
Human-in-the-Loop Pattern That Scales
To manage AI risk at scale, companies are adopting structured human-in-the-loop (HITL) workflows. A typical pattern involves:
- AI proposes an action with a confidence score and an explanation.
- A reviewer uses a checklist to assess intent, data quality, and potential impact.
- An approver accepts, modifies, escalates, or rejects the proposal, documenting their reasoning.
- The system archives every step for future audits or incident analysis.
This risk-based approach enables efficiency by fast-tracking routine, high-confidence decisions while ensuring senior experts review high-impact or low-confidence proposals. For instance, many firms set financial thresholds to automatically trigger mandatory human oversight for higher-risk decisions.
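The risk-based routing described above can be sketched as a small function that picks a review queue from a proposal's amount and confidence. The threshold values and queue names here are placeholders chosen for illustration; real values would come from an organization's own risk policy.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    action: str
    amount: float       # financial value affected by the action
    confidence: float   # model-reported confidence, 0.0 to 1.0
    explanation: str


# Assumed policy values, not taken from any real deployment.
SENIOR_REVIEW_AMOUNT = 10_000.0
LOW_CONFIDENCE = 0.70
FAST_TRACK_CONFIDENCE = 0.95


def route(p: Proposal) -> str:
    """Pick a review queue from the proposal's financial impact and confidence."""
    # High-impact or low-confidence proposals always go to senior experts.
    if p.amount >= SENIOR_REVIEW_AMOUNT or p.confidence < LOW_CONFIDENCE:
        return "senior_review"
    # Routine, high-confidence proposals take the fast track.
    if p.confidence >= FAST_TRACK_CONFIDENCE:
        return "fast_track"
    return "standard_review"
```

Checking the financial threshold first means a large transaction is escalated even when the model is highly confident, which mirrors the mandatory-oversight rule in the text.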
Aligning KPIs with Governance
AI governance is ineffective if it only measures speed and volume. To ensure quality, organizations are pairing automation KPIs such as "touchless order rate" with quality-control metrics that set targets for acceptable error rates. When error rates exceed established thresholds, automated systems can throttle AI agents, rerouting tasks for manual review until the models are retrained or rules are adjusted to restore performance.
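A minimal sketch of such a throttle, assuming a rolling window of recent AI outcomes and a fixed error-rate target, might look like the following. The class name, window size, and thresholds are hypothetical.

```python
from collections import deque


class ErrorRateThrottle:
    """Reroute work to manual review when the recent AI error rate is too high."""

    def __init__(self, window: int = 100, max_error_rate: float = 0.02,
                 min_samples: int = 20):
        self.outcomes = deque(maxlen=window)  # True means the AI result was wrong
        self.max_error_rate = max_error_rate
        self.min_samples = min_samples        # avoid tripping on a tiny sample

    def record(self, was_error: bool) -> None:
        self.outcomes.append(was_error)

    @property
    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def destination(self) -> str:
        enough_data = len(self.outcomes) >= self.min_samples
        if enough_data and self.error_rate > self.max_error_rate:
            return "manual_review"
        return "ai_agent"

    def reset(self) -> None:
        """Call after retraining or rule changes to start a fresh window."""
        self.outcomes.clear()
```

Because no new AI outcomes are recorded while tasks go to manual review, the throttle in this sketch stays engaged until `reset()` is called, which corresponds to the retraining or rule adjustment step.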
Escalation Playbooks and Rollback Readiness
A surge in AI-generated exceptions can quickly overwhelm operational teams, highlighting a critical gap in preparedness: the lack of rehearsed rollback plans. To counter this, organizations are developing escalation playbooks that automatically pause AI processes when error rates spike. These plans include pre-tested scripts that allow a swift, safe reversion to established, deterministic rules without risking data loss.
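As an illustration, a playbook of this kind could be modeled as a switch between an AI handler and a deterministic fallback, with a rehearsal step that dry-runs the fallback on sample inputs. Everything named below (`EscalationPlaybook`, `deterministic_rule`, `ai_agent`, the order fields, the thresholds) is a stand-in invented for this sketch.

```python
from typing import Callable, List, Tuple

Handler = Callable[[dict], str]


def deterministic_rule(order: dict) -> str:
    # Hypothetical fixed rule: small orders pass, large orders are held.
    return "approve" if order["amount"] <= 1000 else "hold"


def ai_agent(order: dict) -> str:
    # Stand-in for a model call; a real agent would be probabilistic.
    return "approve"


class EscalationPlaybook:
    def __init__(self, ai: Handler, fallback: Handler, spike_threshold: float):
        self.ai = ai
        self.fallback = fallback
        self.spike_threshold = spike_threshold
        self.paused = False

    def observe_error_rate(self, rate: float) -> None:
        # Pause the AI path on a spike; it stays paused until resume() is called.
        if rate > self.spike_threshold:
            self.paused = True

    def resume(self) -> None:
        self.paused = False

    def handle(self, order: dict) -> Tuple[str, str]:
        """Return (path taken, decision) so the route is visible in logs."""
        if self.paused:
            return ("rules", self.fallback(order))
        return ("ai", self.ai(order))

    def rehearse(self, samples: List[dict]) -> List[Tuple[dict, str]]:
        """Dry-run the fallback and return the samples it could not handle."""
        failures = []
        for sample in samples:
            try:
                self.fallback(sample)
            except Exception as exc:
                failures.append((sample, repr(exc)))
        return failures
```

Running `rehearse` on representative inputs before an incident is one way to find records the deterministic rules cannot process, so the gap surfaces in a drill and not during a live rollback.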
Data Fitness as a First-Class Control
The quality of an AI's output is directly tied to the quality of its input data. Recognizing that poor master data amplifies "almost right" errors, companies are now treating data fitness as a primary control. This involves integrating data quality monitors that act as a preventative gate, automatically blocking AI recommendations if source data completeness falls below predefined thresholds. This step prevents flawed outputs from ever reaching critical transaction systems.
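A completeness gate of this kind can be sketched in a few lines. The required field names and the 98% threshold below are assumptions for the example; actual master data fields and targets will differ by organization.

```python
# Hypothetical required master data fields and completeness target.
REQUIRED_FIELDS = ("vendor_id", "currency", "cost_center")
MIN_COMPLETENESS = 0.98


def completeness(records: list, required: tuple = REQUIRED_FIELDS) -> float:
    """Share of required fields that are present and non-empty."""
    total = len(records) * len(required)
    if total == 0:
        return 0.0  # no data is treated as unfit data
    filled = sum(
        1 for r in records for f in required if r.get(f) not in (None, "")
    )
    return filled / total


def allow_ai_recommendations(records: list,
                             min_completeness: float = MIN_COMPLETENESS) -> bool:
    """Gate: only let AI recommendations through when the data is fit."""
    return completeness(records) >= min_completeness
```

Treating an empty batch as unfit is a deliberate choice in this sketch: the gate fails closed, so missing data blocks the AI path instead of silently passing.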
Evidence That Satisfies Regulators
Regulatory scrutiny of AI is intensifying, with auditors demanding to know who approved a decision, based on what information, and how the model is monitored for drift or bias. To meet these demands, enterprises must maintain immutable logs for every AI-assisted decision. These logs should capture the model version, input data, the explanation provided, the human reviewer's final decision, and a timestamp. Best practices also include rotating human reviewers and auditing for "rubber-stamping", where approvals are granted without real scrutiny, to keep oversight vigilant.
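One common way to make a decision log tamper-evident is to chain entries by hash, so that changing any past entry breaks every later link. The sketch below shows the idea with Python's standard `hashlib` and `json` modules; the `DecisionLog` class and its field names are assumptions for illustration. Note that hash chaining only makes tampering detectable. Genuine immutability would also need write-once storage or a similar control, which is outside this example.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


class DecisionLog:
    """Append-only log where each entry commits to the one before it."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(body: dict) -> str:
        # Canonical JSON so the same content always hashes the same way.
        payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, model_version: str, input_data: dict, explanation: str,
               human_decision: str, approver: str, timestamp: str = None) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        body = {
            "model_version": model_version,
            "input_data": input_data,          # must be JSON-serializable
            "explanation": explanation,
            "human_decision": human_decision,
            "approver": approver,
            "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
            "prev_hash": prev_hash,
        }
        entry = dict(body, hash=self._digest(body))
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check that the chain is unbroken."""
        prev = GENESIS_HASH
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

An auditor can call `verify()` on an exported log to confirm that no recorded decision, approver, or timestamp was altered after the fact.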
The SAP CEO's warning has put a spotlight on the real-world costs of near-miss AI decisions. In response, best practices are converging on a framework of end-to-end traceability, risk-based human oversight, and the alignment of automation goals with quality controls. By implementing these measures, enterprises can use AI while preserving the deterministic integrity that mission-critical systems require.