LEAP: The system that helped LLMs solve all Putnam 2025 competition problems

Google has introduced LEAP (LLM-in-Lean Environment Agentic Prover), an agentic framework for automated theorem proving in the Lean language. The system uses general-purpose LLMs, breaks complex mathematical problems down into hierarchical subtasks, and relies on feedback from the Lean compiler for recursive error correction.

What Happened

In testing on the Putnam 2025 competition, LEAP solved all 12 problems. For comparison, specialized models such as Goedel-Prover-V2 and powerful general-purpose models such as Gemini 3.1 Pro did not solve a single problem from the set.

Context

LEAP employs an agentic planning approach via an AND-OR DAG structure for task decomposition. Instead of simply generating text, the system works in tandem with the Lean compiler, which serves as a feedback environment. This allows the model to receive verified error information and recursively correct its actions to construct a correct formal proof.
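To make the AND-OR idea concrete, here is a minimal Python sketch of how such a structure can be evaluated. It is an illustration under assumptions, not LEAP's implementation: the Node class, the is_solved function, and the lemma names are all hypothetical. An OR node stands for a goal that is solved if any one of its alternative plans works, an AND node stands for a plan that needs every one of its subgoals, and a leaf stands for a subgoal that the verifier has either accepted or not. Because the structure is a DAG rather than a tree, a lemma can be shared between plans, so results are cached by name.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node in an AND-OR DAG (not LEAP's actual data model)."""
    name: str
    kind: str  # "AND", "OR", or "LEAF"
    children: list["Node"] = field(default_factory=list)
    proved: bool = False  # only meaningful for leaves


def is_solved(node, cache=None):
    """Return True if the goal at `node` is solved.

    AND nodes need all children solved, OR nodes need at least one.
    The cache ensures a subgoal shared by several plans is evaluated once.
    """
    if cache is None:
        cache = {}
    if node.name in cache:
        return cache[node.name]
    if node.kind == "LEAF":
        result = node.proved
    elif node.kind == "AND":
        result = all(is_solved(child, cache) for child in node.children)
    elif node.kind == "OR":
        result = any(is_solved(child, cache) for child in node.children)
    else:
        raise ValueError(f"unknown node kind: {node.kind}")
    cache[node.name] = result
    return result


# Toy example: two alternative plans that share lemma_a.
lemma_a = Node("lemma_a", "LEAF", proved=True)
lemma_b = Node("lemma_b", "LEAF", proved=False)
lemma_c = Node("lemma_c", "LEAF", proved=True)
plan_1 = Node("plan_1", "AND", [lemma_a, lemma_b])  # blocked by lemma_b
plan_2 = Node("plan_2", "AND", [lemma_a, lemma_c])  # every subgoal proved
goal = Node("goal", "OR", [plan_1, plan_2])

print(is_solved(goal))
```

In this toy example the top-level goal counts as solved because the second plan succeeds, even though the first plan is blocked by an unproved lemma.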

Why It Matters for the Industry

The development of LEAP marks a transition from narrowly specialized "provers" to agentic systems based on general-purpose LLMs. This paves the way for scalable formal mathematical reasoning and the creation of the "LLM + Formal Verifier Agent" pattern, which can be applied in other fields requiring absolute precision and verifiable reasoning.
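The "LLM + Formal Verifier Agent" pattern can be sketched in a few lines of Python. This is a generic illustration, not LEAP's code: prove_with_feedback, toy_propose, and toy_check are hypothetical names, the two toy functions are stand-ins for a real LLM call and a real Lean invocation, and the error string is made up for the example. The point is only the control flow: the generator proposes a candidate, the verifier either accepts it or returns an error, and the error is fed into the next attempt.

```python
from typing import Callable, Optional


def prove_with_feedback(
    goal: str,
    propose: Callable[[str, Optional[str]], str],
    check: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Optional[str]:
    """Generate-verify-repair loop.

    `propose(goal, last_error)` returns a candidate proof.
    `check(candidate)` returns None on success or an error message.
    Returns the first accepted candidate, or None if attempts run out.
    """
    error = None
    for _ in range(max_attempts):
        candidate = propose(goal, error)
        error = check(candidate)
        if error is None:
            return candidate
    return None


def toy_propose(goal: str, last_error: Optional[str]) -> str:
    # Stand-in for an LLM: changes tactic once it has seen an error.
    return "by simp" if last_error is None else "by omega"


def toy_check(candidate: str) -> Optional[str]:
    # Stand-in for a formal verifier: accepts exactly one candidate.
    # The error text is invented for this sketch.
    return None if candidate == "by omega" else "error: unsolved goals"


print(prove_with_feedback("toy goal", toy_propose, toy_check))
```

Keeping the verifier behind a plain callable is what would let the same loop be reused in other domains, for example with a type checker or a test suite in place of a proof assistant.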

Why It Matters for Users

For users, this is a significant step toward AI whose reasoning can be checked rather than taken on trust. Unlike standard chatbots, which are prone to hallucinations, LEAP-based systems produce proofs whose correctness is verified mechanically by the Lean compiler.
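As a small illustration of what machine-checked means here, the following Lean 4 snippet states and proves two elementary facts about natural numbers. It is not taken from LEAP's output. The theorem names are made up for this example, and it assumes only Lean's core library. If either proof were wrong, Lean would reject the file with an error instead of accepting it.

```lean
-- Lean 4, core library only (Mathlib is not assumed).
-- A one-step proof that reuses an existing library lemma.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A proof split into intermediate steps. Each `have` is a small
-- subgoal that Lean checks before it can be used in the final step.
theorem add_swap_example (a b c : Nat) : a + b + c = c + (b + a) := by
  have h1 : a + b = b + a := Nat.add_comm a b
  have h2 : b + a + c = c + (b + a) := Nat.add_comm (b + a) c
  rw [h1, h2]
```

The second proof mirrors, at a very small scale, the idea of breaking a goal into subgoals that are each verified independently.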

What Is Not Yet Known / Limitations

Despite the impressive results, the technology is still at the research stage. Further analysis is needed of the operational complexity of integrating such systems into real-world workflows.

Author

Look at AI, Editorial Staff