Major agents robotics DeepSeek

Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement

Jui-Hui Chung, Ziyang Cai, Zihao Li, Qishuo Yin, Rohit Agarwal, Simon Park

Published: Jun 4, 2026 — 17:54 UTC

Problem
This paper addresses the inefficiencies in existing formal theorem proving methodologies, particularly those relying on recursive lemma decomposition, which can lead to looping on dead-end strategies. The authors propose Goedel-Architect, an agentic framework designed to streamline the theorem proving process in Lean 4. This work is presented as a preprint and has not undergone peer review, indicating that the findings should be interpreted with caution.

Method
Goedel-Architect employs a two-step approach: blueprint generation and refinement. The blueprint is a dependency graph that organizes definitions and lemmas leading to the main theorem. Initially, the framework generates this blueprint, which can be guided by a natural language proof to enhance its relevance. The core architecture utilizes the open-weight DeepSeek-V4-Flash model (284B-A13B) as its backbone. The Lean prover component operates in parallel to close open lemma nodes using the dependencies outlined in the blueprint. If a lemma fails, the framework refines the global blueprint, iteratively improving the theorem proving process. This method contrasts with traditional approaches by focusing on dependency management rather than recursive decomposition.

Results
Goedel-Architect achieves impressive performance metrics, attaining a 99.2% pass rate on the MiniF2F-test and a 75.6% pass rate on PutnamBench. When leveraging a natural language proof to seed the initial blueprint, the framework successfully closes all remaining problems in the MiniF2F-test, achieving a perfect score of 100%. Additionally, it improves the PutnamBench score to 88.8% (597 out of 672 problems solved) and demonstrates capability on competitive problems, solving 4 out of 6 on IMO 2025, 11 out of 12 on Putnam 2025, and 3 out of 6 on USAMO 2026. Notably, this performance is achieved at a cost that is up to 500 times lower than comparable open-source theorem proving pipelines.

Limitations
The authors acknowledge that the framework’s reliance on natural language proof seeding may limit its applicability in scenarios where such guidance is unavailable. Additionally, while the results are promising, the lack of peer review means that the robustness of the approach and its generalizability across diverse theorem proving tasks remain to be validated. The performance on more complex or less structured problems has not been thoroughly evaluated, which could reveal further limitations.

Why it matters
The introduction of Goedel-Architect has significant implications for the field of formal theorem proving, particularly in enhancing efficiency and accessibility. By reducing the computational cost and improving the success rate of theorem proving tasks, this framework could facilitate broader adoption of formal methods in software verification and mathematical proof automation. The work sets a new benchmark for open-source theorem proving pipelines, as discussed in arXiv cs.AI. This advancement may inspire further research into hybrid approaches that integrate natural language processing with formal verification techniques.

By Callan Zhang · Jun 4, 2026 · Editorial standards →

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.AI