Google DeepMind

Sources#

Advancing Mathematics Research with AI-Driven Formal Proof Search

Summary#

Google DeepMind is Google's AI research lab. In this corpus it appears as the lab behind AI-Driven Formal Proof Search: the team (George Tsoukalas, Anton Kovsharov, Sergey Shirobokov, Swarat Chaudhuri, Pushmeet Kohli et al.) that built AlphaProof Nexus and ran the first large-scale evaluation of LLM-aided formal proof search on open research mathematics (arXiv 2605.22763). DeepMind also makes the Gemini model family used throughout (Gemini 3.1 Pro as prover, Gemini 3.0 Flash as rater), the earlier AlphaProof olympiad theorem prover, and AlphaEvolve, whose evolutionary design inspired Evolutionary Proof Search.

Role in the corpus#

DeepMind is the third frontier-lab "voice" in the wiki, alongside Anthropic and OpenAI (Symphony / Agent Harness Engineering), and the one that opens the AI-for-mathematics domain. Its contribution is as much methodological as mathematical. The paper finds that simple agentic loops increasingly rival DeepMind's own bespoke trained systems (Agentic Loops Overtake Bespoke Systems). That is a candid, self-undercutting result: a lab that built specialized RL provers reports that a plain LLM loop is catching up.

Systems and models referenced#

Gemini 3.1 Pro / 3.0 Flash / 3.1 Flash-Lite — the LLM backbone; Pro for proving, Flash for rating; the smaller variants solved no problems (capability is sharply scale-gated — Scale-Dependent Prompt Sensitivity).
AlphaProof — DeepMind's RL-trained olympiad-level Lean prover; used inside Nexus as a focused subgoal tool (and the system behind earlier IMO results).
AlphaEvolve — the evolutionary-coding system whose population/diversity approach Evolutionary Proof Search adapts; also helped formulate the bipartite graph-reconstruction variants in the paper.
Formal Conjectures repo — DeepMind's open-source Lean formalizations of Erdős problems, the benchmark for the Erdős runs.
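
The division of labor listed above (a prover that proposes candidates, a rater that ranks them, a sound verifier that decides acceptance, and a population carried across rounds) can be illustrated with a toy loop. This is a minimal sketch under loud assumptions, not the paper's algorithm or any DeepMind API: `evolutionary_search`, `propose_toy`, `rate_toy`, and `verify_toy` are hypothetical names, the "proof" is just an integer, and plain Python functions stand in for the Gemini prover, the Gemini rater, and the Lean checker.

```python
from typing import Callable, Iterable, List, Optional

TARGET = 1369  # toy "theorem": find an integer x with x * x == TARGET


def propose_toy(parent: int) -> List[int]:
    """Stand-in for the prover model: derive new candidates from a parent."""
    return [parent - 1, parent + 1, parent + 5]


def rate_toy(candidate: int) -> int:
    """Stand-in for the rater model: a heuristic score, higher is better."""
    return -abs(candidate * candidate - TARGET)


def verify_toy(candidate: int) -> bool:
    """Stand-in for the sound verifier: accepts only exact solutions."""
    return candidate * candidate == TARGET


def evolutionary_search(
    seeds: Iterable[int],
    propose: Callable[[int], List[int]],
    rate: Callable[[int], int],
    verify: Callable[[int], bool],
    population_size: int = 4,
    max_generations: int = 50,
) -> Optional[int]:
    """Keep a small population, expand it, and retain the best-rated members."""
    population = list(seeds)
    for _ in range(max_generations):
        # Only the verifier can declare success; the rater merely guides search.
        for candidate in population:
            if verify(candidate):
                return candidate
        children = [child for parent in population for child in propose(parent)]
        # Deduplicate so the retained population is not filled with copies.
        pool = set(population) | set(children)
        population = sorted(pool, key=rate, reverse=True)[:population_size]
    # Final check on the last population before giving up.
    return next((c for c in population if verify(c)), None)


result = evolutionary_search([0, 10], propose_toy, rate_toy, verify_toy)
```

The point of the sketch is the separation of roles: the rating function may be noisy or wrong without affecting soundness, because a candidate is only returned after the verifier accepts it.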

Connections#

AI-Driven Formal Proof Search — the paradigm DeepMind demonstrated at research scale
AlphaProof Nexus — its framework
Lean — the proof assistant it drives with Gemini
Evolutionary Proof Search — adapts DeepMind's AlphaEvolve
Agentic Loops Overtake Bespoke Systems — DeepMind's self-undercutting finding about its own bespoke systems
Anthropic — peer frontier lab; the two anchor different domains in the corpus (alignment/coding vs. mathematics)
Scale-Dependent Prompt Sensitivity — Gemini-model scale gating mirrors the broader model-capability-threshold theme

Open Questions#

DeepMind reports its bespoke systems being caught by simple loops. Does the lab's comparative advantage move from systems to models + verifiers + benchmarks (mathlib, Formal Conjectures)?
The paper opens AI-for-math. Which domain with a sound verifier will DeepMind target next?
