EY withdraws AI-generated report after GPTZero finds fake citations
Serge Bulaev
EY withdrew a report on loyalty rewards after GPTZero found that it may contain fake citations and AI-generated errors. GPTZero suggested that the report included made-up footnotes, wrong statistics, and mislabeled sources. EY said it is reviewing how the report was published, but has not announced any new rules yet. This incident highlights that using AI in professional work may lead to mistakes and extra costs for checking facts, and shows why more companies are using AI detection tools.

EY's withdrawal of an AI-generated report on loyalty rewards has intensified scrutiny of generative AI in professional services. The firm retracted the 28-page document, "Points of Attack: Uncovering Cyber Threats and Fraud in Loyalty Systems," after AI detection platform GPTZero published a detailed critique revealing AI hallucinations and fabricated statistics.
Investigators from GPTZero found that the report's headline claims, including substantial market-size figures for loyalty points and assertions about rising fraud, were backed by nonexistent or misattributed sources. One such fabrication was a reference to a McKinsey study that GPTZero could not find. In a statement to the Financial Times, EY confirmed it is "reviewing the circumstances" of the publication, highlighting the reputational risks for major firms adopting AI-assisted research.
What GPTZero found
EY withdrew its report after AI-detection tool GPTZero discovered it contained numerous errors characteristic of AI generation. These included fabricated statistics, fake footnotes citing non-existent industry reports, mislabeled sources, and phrasing patterns that strongly indicated the text was not written by a human expert.
GPTZero's public memo highlighted four categories of concern:
- Fabricated footnotes, including a non-existent industry benchmark on unused points.
- Statistical claims inconsistent with the cited material.
- Mislabeled primary sources, such as white papers published under different titles.
- Repetition of identical phrasing patterns that its detector flags as highly probable AI output.
A complete breakdown of the findings is available on the GPTZero investigations page, where the company details the "fake citations and inaccurate claims" and provides sentence-level highlights of likely AI-generated text.
A verification burden across the sector
The incident at EY highlights a growing challenge known as the "verification tax," a term popularized by Harvard Business Review. Verifying AI-generated content can introduce significant costs, potentially thousands of dollars per employee annually, as firms must check outputs for accuracy. This is especially critical in professional services, where even minor errors in data can lead to serious compliance issues.
The verification challenge is confirmed by industry reports on AI in professional services. While many professionals surveyed support daily generative AI use, a significant portion report that their firms don't track the technology's return on investment. A corporate CFO quoted in recent studies warned that AI which "sometimes gives answers that are... entirely incorrect" could ultimately "create more headaches than benefits."
Why detection tools are on the rise
The risks associated with unverified AI content have fueled rising demand for detection tools. Tools from companies like GPTZero, which reports high accuracy rates on recent AI models, are being integrated into workflows across the education, publishing, and legal sectors. While independent reviewers note that accuracy can vary on heavily edited text, these tools are becoming essential for triaging content before publication.
Governance questions inside consulting
For a global firm like EY, the currency at stake is reputation. This incident is not isolated; other consulting reports have shown similar signs of AI generation without sufficient human oversight. Experts now predict a shift in professional services, with staff spending more time validating AI outputs than drafting from scratch. This will necessitate greater investment in robust audit trails and human-in-the-loop approval systems.
While EY has stated it is "reviewing the circumstances," the firm has not announced specific changes to its internal controls or a timeline for a policy reassessment. The episode serves as a powerful case study on the risks of generative AI, demonstrating how quickly unverified, AI-generated errors can escalate from a minor detail to a major reputational crisis.
Why did Ernst & Young retract its recent report on loyalty programs?
EY pulled the document titled "Points of Attack: Uncovering Cyber Threats and Fraud in Loyalty Systems" after GPTZero investigators found hallucinated statistics, fabricated citations, and at least one fake footnote pointing to a non-existent McKinsey report.
- The firm told the Financial Times it was "reviewing the circumstances that led to this article's publication."
Which specific claims in the report were flagged?
GPTZero's public investigation highlighted several key claims that could not be verified:
| Claim in EY report | Status |
|---|---|
| Global loyalty points market valued in the billions | No matching source found |
| Significant percentage of points go unused annually | Unsupported by cited literature |
| Fraud attacks up significantly in recent years | Statistic misaligned with the reference provided |
Each figure was accompanied by citations that either did not exist or were misattributed.
How did GPTZero uncover the errors so quickly?
The platform ran sentence-level AI-detection scans, then cross-checked every footnote against academic and industry databases.
- The tool's reported high accuracy on modern LLM output, together with its explainable highlights, helped reviewers zero in on suspicious passages.
- GPTZero published its findings within 48 hours, prompting immediate removal of the document by EY.
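The footnote cross-check described above can be illustrated with a minimal sketch. It is not GPTZero's actual method, which the article does not describe in detail. It assumes a reviewer already holds a list of sources confirmed to exist; the `Citation` type, the `unverified` function, and the placeholder publisher and title strings are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One footnote as extracted from a report (hypothetical structure)."""
    footnote: int
    publisher: str
    title: str


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation and whitespace so near-identical strings match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def unverified(citations, known_sources):
    """Return the citations whose (publisher, title) pair is absent from known_sources.

    known_sources is an iterable of (publisher, title) tuples that a reviewer
    has already confirmed to exist; it stands in for a real database lookup.
    """
    index = {(normalize(p), normalize(t)) for p, t in known_sources}
    return [
        c for c in citations
        if (normalize(c.publisher), normalize(c.title)) not in index
    ]


# Placeholder data for illustration only, not taken from the EY report.
known = [("Example Institute", "Loyalty Fraud Trends")]
footnotes = [
    Citation(1, "Example Institute", "Loyalty fraud trends."),
    Citation(2, "Example Institute", "Unused Points Benchmark"),
]
flagged = unverified(footnotes, known)
```

Exact matching after normalization is deliberately strict: a flagged footnote is only a candidate for human review, since a source published under a slightly different title would also fail to match.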
What does this mean for AI usage in big-consulting research?
The episode is a wake-up call for the Big Four. Recent industry surveys show many professional-services firms don't formally track AI ROI, while a significant portion still allow staff to use generative tools without systematic vetting.
- The same studies quote a CFO warning: "If clients rely on it as 100% accurate, it could create more headaches than benefits."
How can firms prevent similar retractions?
Leading practices emerging in the industry include:
- Human-in-the-loop review for all client-facing insights
- Real-time citation verification using tools like GPTZero or Originality.AI
- Immutable audit trails that log who generated, edited, and approved each paragraph
- Governance playbooks that define which use-cases are low- versus high-risk
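The audit-trail practice in the list above can be sketched as a hash-chained log, one common way to make a record tamper-evident. This is an illustrative assumption, not a description of any firm's system: the function names `append_entry` and `verify`, the field names, and the sample actors are hypothetical. Strictly, a hash chain detects changes rather than preventing them.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(log: list, actor: str, action: str, paragraph_id: str) -> dict:
    """Append a record of who did what to which paragraph, chained to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    body = {
        "actor": actor,
        "action": action,  # e.g. "generated", "edited", "approved"
        "paragraph_id": paragraph_id,
        "prev_hash": prev_hash,
    }
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Return True only if every entry's hash and back-link are intact."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical usage: one paragraph generated by a tool, then edited and approved.
trail = []
append_entry(trail, "llm-assistant", "generated", "p-12")
append_entry(trail, "analyst", "edited", "p-12")
append_entry(trail, "partner", "approved", "p-12")
```

Because each entry's hash covers the previous entry's hash, altering any earlier record makes `verify` fail for the whole chain, which is what lets a reviewer trust the recorded sequence of generation, editing, and approval.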
Industry voices from Harvard Business Review emphasize that time saved on drafting is often erased by a "verification tax," so budgeting for oversight is now essential.