DeepMind AlphaProof Nexus Proves 44 Math Conjectures with AI

Serge Bulaev

DeepMind's AlphaProof Nexus is an AI system that may help prove difficult math problems by creating formal, computer-checkable proofs. The system appears to have solved 9 open Erdős problems and proved 44 conjectures from a well-known math database, but experts suggest more work is needed for widespread use. Mathematicians say the tool may help them understand problems better and find mistakes, rather than just give answers. The workflow keeps the computer as the final checker, which might limit mistakes from the AI. Early signs suggest researchers may start using such tools more often, but lowering the effort needed to formalize new problems could be important for adoption.

DeepMind AlphaProof Nexus, an AI system that combines large language models with the Lean 4 theorem prover, is generating formal, computer-checkable proofs for difficult math problems. The system produces verifiable proofs for research-level mathematics, with mathematicians downloading Lean source files from its GitHub repository for mechanical inspection.

Google DeepMind's team reports that its agent resolved 9 open Erdős problems and proved 44 conjectures from the On-Line Encyclopedia of Integer Sequences (OEIS). All proofs are delivered as Lean source files, so every inference can be re-checked mechanically by Lean instead of resting on human peer review alone. This may signal a shift toward verification-first workflows in which AI suggestions are accepted only after a proof assistant certifies them.
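
To make "computer-checkable" concrete, here is a minimal Lean 4 sketch. It is a toy illustration, far simpler than any research problem and not taken from the AlphaProof Nexus repository; the theorem name `add_comm_example` is hypothetical.

```lean
-- A toy statement: addition on natural numbers is commutative.
-- `Nat.add_comm` is a lemma from Lean's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete instance that Lean accepts by computation.
example : 2 + 3 = 5 := rfl

-- By contrast, a false claim such as `example : 2 + 2 = 5 := rfl`
-- is rejected by Lean, which is what makes the check trustworthy.
```

The point of the format is that acceptance does not depend on who, or what, wrote the proof: the file either checks or it does not.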

Research-Level Benchmarks Replace Contest Puzzles

Where earlier AI provers were evaluated mainly on contest-level puzzles, AlphaProof Nexus is designed to attack open mathematical problems, and its output is a formal, computer-verifiable proof rather than an informal argument.

The May 2026 preprint, "Advancing Mathematics Research with AI-Driven Formal Proof Search," states that Nexus "performs the first large-scale evaluation on open problems," a significant step up in difficulty from prior Olympiad or Mizar benchmarks. The best-performing agent in the paper used an evolutionary search, and each successful Erdős proof cost only a few hundred dollars in compute. Exploratory deployments now span optimization, algebraic geometry, and quantum optics.

A Research Partner, Not an Autonomous Solver

Community reaction has been positive but cautious. While press coverage highlights the headline counts, mathematician collaborators emphasize that the tool "enhanced their understanding of a problem" and surfaced misformalizations, cases where the formal statement did not capture the intended problem. This suggests researchers value Nexus as a partner that exposes hidden assumptions rather than as a black-box answer machine.

By keeping the proof assistant as the final arbiter, the workflow limits false positives. The main impacts reported include:
- Reusable Lean scripts for reproducibility and auditing.
- Formal checking to eliminate prose-level hallucinations.
- Agentic loops that reduce the need for task-specific training.
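
The "final arbiter" idea can be sketched as a propose-and-verify loop. The Python below is a toy under stated assumptions, not DeepMind's method: the "problem" is finding a nontrivial factorization of an integer, `verify` stands in for the proof assistant, and `mutate`, `heuristic_score`, and `search` are hypothetical names for a very simple evolutionary search. What it illustrates is that the heuristic only steers the search, while nothing is returned unless the exact checker accepts it.

```python
import random
from typing import Optional

Candidate = tuple[int, int]


def verify(n: int, cand: Candidate) -> bool:
    """Trusted checker (stand-in for the proof assistant): exact test only."""
    a, b = cand
    return a > 1 and b > 1 and a * b == n


def heuristic_score(n: int, cand: Candidate) -> int:
    """Untrusted guidance: closer products score higher, but never count as success."""
    a, b = cand
    return -abs(a * b - n)


def mutate(rng: random.Random, cand: Candidate, n: int) -> Candidate:
    """Randomly nudge one coordinate, clamped to the range [2, n - 1]."""
    a, b = cand
    step = rng.randint(-3, 3)
    if rng.random() < 0.5:
        a = min(max(a + step, 2), n - 1)
    else:
        b = min(max(b + step, 2), n - 1)
    return (a, b)


def search(n: int, rng: random.Random, population_size: int = 20,
           generations: int = 500) -> Optional[Candidate]:
    """Evolve candidates; return one only if the trusted checker accepts it."""
    population = [(rng.randint(2, n - 1), rng.randint(2, n - 1))
                  for _ in range(population_size)]
    for _ in range(generations):
        for cand in population:
            if verify(n, cand):  # the checker, not the heuristic, decides
                return cand
        # Keep the better-scoring half and refill with mutated copies.
        population.sort(key=lambda c: heuristic_score(n, c), reverse=True)
        survivors = population[: population_size // 2]
        population = survivors + [mutate(rng, c, n) for c in survivors]
    return None  # giving up is allowed; returning an unverified answer is not


if __name__ == "__main__":
    result = search(91, random.Random(0))
    print(result if result is not None else "no verified candidate found")
```

The search may fail to find anything within its budget, but by construction it cannot report a false positive, which is the property the workflow above relies on.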

Early Signals of Cultural Change

Experts believe widespread adoption depends on lowering the formalization overhead for new problems. As Google's AI-for-Math initiative adds AlphaProof Nexus to its shared tools, an emerging norm may see formal verification accompany conjecture exploration rather than conclude it. Ongoing discussions focus on expanding open-problem datasets to measure steady progress beyond headline numbers, though compute remains a bottleneck for harder statements.


What exactly is AlphaProof Nexus and how does it differ from earlier AI math provers?

AlphaProof Nexus is an LLM-aided formal-proof agent that pairs large language models with the Lean 4 proof assistant.
Unlike earlier systems that stopped at olympiad drills, the new agent tackles open research problems: it proposes candidate proofs and Lean's kernel checks every step, so the final output is machine-certifiable.
DeepMind's May 2026 preprint calls this "the first large-scale evaluation on open problems," moving the field from classroom benchmarks to live mathematics.

Which long-standing problems has the system settled?

According to the study, AlphaProof Nexus resolved 9 of the 353 open Erdős problems and proved 44 of the 492 OEIS conjectures it evaluated.
The reported cost per successful Erdős proof averaged a few hundred dollars of cloud compute, which suggests that formal proof search can cost less than many wet-lab experiments.
All successful proofs are published in the accompanying GitHub repository so anyone can replay or audit them inside Lean.
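
For scale, the counts above can be turned into success rates. The percentages in this short Python sketch are derived here from the quoted figures and are not numbers reported by the source; `success_rate` is a hypothetical helper.

```python
# Figures quoted in the article: (solved, evaluated) per benchmark.
results = {
    "Erdős problems": (9, 353),
    "OEIS conjectures": (44, 492),
}


def success_rate(solved: int, evaluated: int) -> float:
    """Return the solved share as a percentage."""
    return 100.0 * solved / evaluated


for name, (solved, evaluated) in results.items():
    rate = success_rate(solved, evaluated)
    print(f"{name}: {solved}/{evaluated} = {rate:.1f}%")
# Erdős problems: 9/353 = 2.5%
# OEIS conjectures: 44/492 = 8.9%
```

Most evaluated problems therefore remained unsolved, which is consistent with the cautious framing elsewhere in the piece.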

Why do mathematicians see this as more than a novelty?

Collaborators quoted in the arXiv paper say the agent "enhanced their understanding" and helped catch misformalizations, evidence that the tool works as a research partner rather than a black-box oracle.
Because Lean checks every step, invalid inferences of the kind that plague prose-only models are rejected before a proof is accepted. What still needs human review is whether the formal statement matches the intended problem.
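
A small Lean 4 sketch shows how a statement can check and still say something other than what was meant. This is a textbook-style illustration chosen for this article, not an example from the paper: subtraction on `Nat` truncates at zero, so a claim that is true over the integers changes meaning when written over the natural numbers.

```lean
-- Intended claim: (n - 1) + 1 = n for every n.
-- Over Nat, 0 - 1 evaluates to 0, so at n = 0 the left side is 1, not 0.
example : (0 - 1 + 1 : Nat) = 1 := rfl

-- Over Nat the claim needs an explicit hypothesis to hold.
example (n : Nat) (h : 1 ≤ n) : n - 1 + 1 = n :=
  Nat.sub_add_cancel h

-- Over Int it holds as intended, with no side condition.
example (n : Int) : n - 1 + 1 = n :=
  Int.sub_add_cancel n 1
```

All three examples are accepted by Lean. Which one corresponds to the mathematician's intent is a question the checker cannot answer, and that is the kind of gap a misformalization review is meant to close.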

Where else is DeepMind stress-testing the agent?

Beyond Erdős and OEIS, the team ran exploratory pilots in optimization, graph theory, algebraic geometry, additive combinatorics and quantum optics, suggesting the workflow generalizes across fields that rely on rigorous lemmas.
The paper frames this as early evidence that formal-verification loops can speed up discovery wherever exact statements matter.

What are the current failure modes and next steps?

Researchers warn that the agent can hallucinate helper lemmas when the prompt drifts, and success still hinges on how cleanly a question can be formalized in Lean.
DeepMind sees cheaper search loops, broader problem libraries and tighter integration with day-to-day mathematician workflows as the critical next targets for 2026 and beyond.