In mid-October 2025, OpenAI’s GPT-5 math claims ignited a firestorm after company executives announced the AI had solved ten open problems in mathematics. What was initially hailed as a monumental leap in AI reasoning quickly unraveled into a cautionary tale about the perils of hype without rigorous verification, becoming a case study in the gap between AI capabilities and corporate marketing.
From Bold Claim to Swift Retraction
In October 2025, OpenAI executives claimed on social media that GPT-5 had solved ten open mathematical problems. However, experts swiftly revealed that the AI had merely rediscovered existing proofs from published literature, prompting a public retraction and widespread criticism of the company’s verification process.
The controversy began with posts on X from VP Kevin Weil and researcher Sébastien Bubeck on October 17, asserting that GPT-5 had generated novel proofs for ten open Erdős problems. The celebration was short-lived. Within hours, mathematician Thomas Bloom, who curates the Erdős Problems database, clarified that the solutions were already present in published research. His rebuttal went viral following a detailed TechCrunch report on October 19. Criticism from industry leaders mounted, with Google DeepMind CEO Demis Hassabis labeling the claims “embarrassing.” Between October 19 and 21, OpenAI deleted the original posts. An internal memo, later quoted by ImaginePro, admitted GPT-5 provided “valuable literature review, not discovery,” ending the 72-hour saga.
Expert Scrutiny Reveals a Literature Review, Not Discovery
The backlash wasn’t confined to social media; it was driven by rigorous peer scrutiny. Mathematicians dissected the GPT-5 outputs, concluding the model performed advanced text retrieval, not genuine mathematical reasoning. Their findings were summarized and circulated widely:
- Eight of the “new” proofs were from articles published before 2019.
- The remaining two solutions came from obscure conference proceedings.
- All eleven “partial results” were found in publicly available graduate theses.
- None of the proofs demonstrated complexity beyond an undergraduate level.
This collective analysis reinforced the consensus that while large language models are powerful tools for literature search, they still struggle to produce original, abstract proofs.
The Scientific Cost of Bypassing Peer Review
The mathematical community’s pushback stemmed from a core scientific principle: claims require proof. Announcing a breakthrough without undergoing peer review erodes the trust that underpins scientific progress. The GPT-5 incident compounded existing worries about AI reliability, as it came on the heels of studies showing AI tools often cite retracted scientific papers without any warning. For example, a September 2025 analysis mentioned on Jim Sellmeijer’s blog found that some AI research assistants referenced discredited studies in 12% of medical-related queries. The controversy intensified calls for building robust, model-level validation pipelines to ensure AI-generated information is trustworthy.
Lessons Learned: The Impact on Future AI Announcements
This episode highlights the intense competitive pressures among frontier AI labs like OpenAI, Google DeepMind, and Anthropic. The race for breakthroughs that attract investment and top talent can incentivize premature announcements on social media before claims are fully substantiated. In response, OpenAI has reportedly instituted an internal “proof-audit” checklist for scientific claims, requiring review by independent mathematicians before any public statements are made. Concurrently, the wider industry is seeing startups integrate tools like the Retraction Watch and OpenAlex databases to help AI models flag unreliable sources automatically. While AI hype is unlikely to vanish, the GPT-5 misstep has reinforced the need for transparency, independent verification, and cautious communication.
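As a concrete illustration of what such an automated check might look like, the sketch below queries the public OpenAlex API for a cited DOI and inspects the work’s `is_retracted` flag, which OpenAlex populates in part from Retraction Watch data. This is a minimal sketch under those assumptions, not the pipeline of any particular startup; the helper names and the example DOI are illustrative.

```python
"""Minimal retraction check against the OpenAlex API (a sketch,
not any specific vendor's pipeline)."""
import requests

# OpenAlex resolves works by DOI when the DOI URL is used as the ID.
OPENALEX_WORKS_URL = "https://api.openalex.org/works/https://doi.org/{doi}"


def is_retracted(doi: str) -> bool:
    """Return True if OpenAlex marks the work behind `doi` as retracted."""
    resp = requests.get(OPENALEX_WORKS_URL.format(doi=doi), timeout=10)
    resp.raise_for_status()
    work = resp.json()
    # OpenAlex work records expose a boolean `is_retracted` field,
    # sourced partly from the Retraction Watch dataset.
    return bool(work.get("is_retracted", False))


def flag_unreliable_sources(dois: list[str]) -> list[str]:
    """Hypothetical helper: filter a model's cited DOIs to retracted ones."""
    return [doi for doi in dois if is_retracted(doi)]


if __name__ == "__main__":
    # Placeholder DOI purely for illustration.
    print(flag_unreliable_sources(["10.1234/example.doi"]))
```

A real deployment would add caching and rate limiting, and would run the check before a citation ever reaches the user, but the core lookup is no more complicated than this.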
What exactly was OpenAI’s claim about GPT-5’s math abilities?
Between October 17 and 19, 2025, OpenAI executives publicly stated that GPT-5 had “solved 10 previously unsolved Erdős problems,” implying the AI had generated novel mathematical proofs. This was presented as a significant breakthrough in the model’s reasoning capabilities.
Why was the math claim retracted so quickly?
The claim was debunked within hours by mathematician Thomas Bloom. He clarified that the problems were only “unsolved” in his personal database (ErdosProblems.com), not in the broader mathematical community. GPT-5 had simply located existing, published solutions that he had not yet cataloged.
How did the AI and math communities react?
The reaction was swift and critical. Google DeepMind’s CEO called the incident “embarrassing,” and other prominent figures, including AI researcher Yann LeCun and mathematician Terence Tao, criticized the lack of due diligence. The event was widely reported and became a prominent example of premature AI hype.
What are the implications for trust in AI?
Incidents like this can erode public and professional trust in AI announcements. They underscore the urgent need for transparency and automated validation, especially since studies show AI tools can unknowingly reference retracted or discredited scientific papers, raising concerns about their reliability.
How might this change OpenAI’s future announcements?
While OpenAI hasn’t announced a formal policy change, the company quickly shifted to more measured internal communications. Analysts predict that frontier AI labs will adopt more stringent internal verification processes before publicizing scientific achievements, particularly with major model releases anticipated in late 2025.