The expanding Lean 4 ecosystem poses challenges for navigating its vast libraries. This paper introduces LeanExplore, a search engine for Lean 4 declarations. LeanExplore enables users to semantically search for statements, both formally and informally, across select Lean 4 packages (including Batteries, Init, Lean, Mathlib, PhysLean, and Std). This search capability is powered by a hybrid ranking strategy, integrating scores from a multi-source semantic embedding model (capturing conceptual meaning from formal Lean code, docstrings, AI-generated informal translations, and declaration titles), BM25+ for keyword-based lexical relevance, and a PageRank-based score reflecting declaration importance and interconnectedness. The search engine is accessible via a dedicated website (https://www.leanexplore.com/) and a Python API (https://github.com/justincasher/lean-explore). Furthermore, the database can be downloaded, allowing users to self-host the service. LeanExplore integrates easily with LLMs via the model context protocol (MCP), enabling users to chat with an AI assistant about Lean declarations or utilize the search engine for building theorem-proving agents. This work details Lea
We present Lean Refactor, a plug-and-play retrieval-augmented agentic framework for multi-objective, controllable, and version-robust refactoring of Lean proofs. LLM-generated proofs are notoriously correct-but-verbose and brittle across library versions, yet existing refactoring works overlook three practical challenges: 1) Lean refactoring is natively multi-objective (proof length, compilation cost, and version compatibility are often in tension); 2) Lean repositories have fragile compatibility, whereas LLM releases are unaware of Lean/Mathlib versions; 3) Training-based pipelines require repeated fine-tuning with each new LLM release, scaling neither with model churn nor with Lean's release cycle. Lean Refactor steers a frozen agentic LLM with retrievals from a curated database of multi-objective refactoring strategies, each densely annotated with metadata such as supported Lean/Mathlib versions and expected compilation-cost reduction. Experiments show over $70\%$ token-level compression on competition benchmarks, over $20\%$ on research repositories, and up to $60\%$ compilation-time reduction, outperforming prior work and Claude Code. Version-filtered retrieval further improve
We present PBLean, a method for importing VeriPB pseudo-Boolean (PB) proof certificates into Lean 4. Key to our approach is reflection: a Boolean checker function whose soundness is fully proved in Lean and executed as compiled native code. Our method scales to proofs with tens of thousands of steps that would exhaust memory under explicit proof-term construction. Our checker supports all VeriPB kernel rules, including cutting-plane derivations, proof-by-contradiction subproofs, and redundance-based reasoning for symmetry breaking. In contrast to external verified checkers that produce verdicts, our integration yields Lean theorems that can serve as composable lemmas in larger formal developments. To derive theorems about the original combinatorial problems rather than about PB constraints alone, we support verified encodings. This closes the trust gap between solver output and problem semantics since the constraint translation and its correctness proof are both formalized in Lean. We demonstrate the approach on various combinatorial problems.
Datalog is a lightweight logic programming language, based on the logic of Horn clauses. Lean, on the other hand, is a proof assistant system and language based on the Calculus of Inductive Constructions (CIC). Datalog is more constrained and less expressive than Lean but has a long history of established deduction algorithms. Writing definitions and queries in the Datalog fragment of Lean would be more succinct and understandable than writing them in Lean itself. This paper outlines the design and implementation of a shallow embedding of Datalog as a Domain Specific Language (DSL) on top of Lean. Bidirectional interoperability between the Datalog DSL and Lean is a primary goal of this design. In addition to rules and facts, backward chaining queries are automatically translated into theorems with tactic-based proofs. The paper also includes three simple examples of how the DSL can be used.
We present Lean Finder, a semantic search engine for Lean and mathlib that understands and aligns with the intents of mathematicians. Progress in formal theorem proving is often hindered by the difficulty of locating relevant theorems and the steep learning curve of the Lean 4 language, making advancement slow and labor-intensive. Existing Lean search engines, though helpful, rely primarily on informalizations (natural language translation of the formal statements), while largely overlooking the mismatch with real-world user queries. In contrast, we propose a user-centered semantic search tailored to the needs of mathematicians. Our approach begins by analyzing and clustering the semantics of public Lean discussions, then fine-tuning text embeddings on synthesized queries that emulate user intents. We further align Lean Finder with mathematicians' preferences using diverse feedback signals, encoding it with a rich awareness of their goals from multiple perspectives. Evaluations on real-world queries, informalized statements, and proof states demonstrate that our Lean Finder achieves over $30\%$ relative improvement compared to previous search engines and GPT-4o. In addition, Lean F
We introduce the Kimina Lean Server, an open-source project designed as a high-performance verifier for reinforcement learning pipelines. Built on top of the Lean REPL (Read-Eval-Print Loop) maintained by the Lean FRO, our server combines server-side parallelism by managing multiple Lean processes in parallel with a Least Recently Used (LRU) caching mechanism that reuses Lean imports across requests. On the client side, a lightweight Python package enables submitting proof batches and receiving Lean feedback, including extracted tactics and tactic states. Together, these features enable a scalable workflow for large-scale verification and data extraction. In our experiments, the Kimina Lean Server outperforms previous Lean interaction tools, achieving a 1.5 to 2 times speedup in verification time. Moreover, its improved efficiency has enabled its use in the large-scale training of state-of-the-art models such as Kimina-Prover. We hope that our open-source project will support the neural theorem proving community and accelerate future progress by enabling efficient large-scale verification and proof data extraction.
Applying Gröbner basis theory to concrete problems in Lean 4 remains difficult since the current formalization of multivariate polynomials is based on a non-computable representation and is therefore not suitable for efficient symbolic computation. As a result, computing Gröbner bases directly inside Lean is impractical for realistic examples. To address this issue, we propose a certificate-based approach that combines external computer algebra systems, such as SageMath or SymPy, with formal verification in Lean 4. Our approach uses a computable representation of multivariate polynomials in Lean to import and verify externally generated Gröbner basis computations. The external solver carries out the main algebraic computations, while the returned results are verified inside Lean. Based on this method, we develop automated tactics that transfer polynomial data between Lean and the external system and certify the returned results. These tactics support tasks such as remainder verification, Gröbner basis checking, ideal equality, and ideal or radical membership. This work provides a practical way to integrate external symbolic computation into Lean 4 while preserving the reliability o
We present the design and implementation of the Small Scale Reflection proof methodology and tactic language (a.k.a. SSR) for the Lean 4 proof assistant. Like its Coq predecessor SSReflect, our Lean 4 implementation, dubbed LeanSSR, provides powerful rewriting principles and means for effective management of hypotheses in the proof context. Unlike SSReflect for Coq, LeanSSR does not require explicit switching between the logical and symbolic representation of a goal, allowing for even more concise proof scripts that seamlessly combine deduction steps with proofs by computation. In this paper, we first provide a gentle introduction to the principles of structuring mechanised proofs using LeanSSR. Next, we show how the native support for metaprogramming in Lean 4 makes it possible to develop LeanSSR entirely within the proof assistant, greatly improving the overall experience of both tactic implementers and proof engineers. Finally, we demonstrate the utility of LeanSSR by conducting two case studies: (a) porting a collection of Coq lemmas about sequences from the widely used Mathematical Components library and (b) reimplementing proofs in the finite set library of Lean's mathlib4. B
Formal theorem-proving benchmarks enable mechanically verifiable evaluation of mathematical reasoning in large language models. However, existing benchmarks mainly focus on Olympiad-style problems and algebraic domains, leaving computational and applied mathematics underrepresented. We introduce CAM-Bench, a Lean 4 theorem-proving benchmark of 1,000 Lean proof targets in computational and applied mathematics, with coverage spanning optimization, numerical linear algebra, and numerical analysis. These problems are adapted from textbook exercises and often depend on locally introduced definitions, notation, algorithms, and elementary results. To construct CAM-Bench, we develop a dependency-recovery pipeline that reconstructs the local textbook context needed to state each problem faithfully. It then normalizes each problem into a standalone informal theorem and translates it into a Lean target. We validate the resulting formal problems through Lean compilation and semantic review, checking both formal correctness and semantic alignment with the original exercises. For each problem, we release the raw exercise, recovered context, normalized informal theorem, and final Lean target. CAM
We present a benchmark for evaluating AI models and agents on real-world formal software verification tasks. We first scrape 11,039 property-based tests (PBTs) from real-world Python repositories, then automatically translate 2,772 of them (25%) into 9,415 Lean 4 specifications with sorry placeholders (about 3 formalizations/PBT; we retain multiple attempts when none dominates on quality metrics). Translating PBTs into Lean specifications is challenging: it requires modeling Python semantics in Lean, inferring the logical property encoded in an imperative PBT, and handling the inherent difficulties of dependently-typed programming in a seldom-used language. We describe a three-agent LLM pipeline for transpiling PBTs into Lean specifications, evaluate coverage and quality metrics, and provide baselines for proof generation using several automated and model based approaches. All code (scraper and agents) and data (PBTs and Lean specifications) are open source. Our benchmark aims to drive progress on the underexplored problem of AI-assisted formal verification of real-world software, which is of increasing interest as AI produces more and more of the world's code.
The On-Line Encyclopedia of Integer Sequences (OEIS) is a web-accessible database cataloging interesting integer sequences and associated theorems. With more than 12,000 citations, the OEIS is one of the most highly cited resources in all of theoretical mathematics. In this paper, we present Sequencelib, a project to formalize the mathematics contained within the OEIS using the Lean programming language. Sequencelib includes a library of Lean formalizations of OEIS sequences as well as metaprogramming tools for programmatically attaching OEIS metadata to Lean definitions and deriving theorems about their values. Further, we describe OEIS-LT, a highly scalable Lean server that exposes these tools via a low-latency API. Finally, using OEIS-LT and prior work of Gauthier, et al., we describe a computational pipeline that formalized more than 25,000 sequences from the OEIS and proved more than 1.6 million theorems about their values. Our method makes use of a transpiler, available in OEIS-LT, that is capable of translating a subset of Standard ML to Lean, together with a set of performance improvement transformations and proofs of correctness.
Lean is an increasingly popular proof assistant based on dependent type theory. Despite its success, it still lacks important automation features present in more seasoned proof assistants, such as the Sledgehammer tactic in Isabelle/HOL. A key aspect of Sledgehammer is the use of proof-producing SMT solvers to prove a translated proof goal and the reconstruction of the resulting proof into valid justifications for the original goal. We present Lean-SMT, a tactic providing this functionality in Lean. We detail how the tactic converts Lean goals into SMT problems and, more importantly, how it reconstructs SMT proofs into native Lean proofs. We evaluate the tactic on established benchmarks used to evaluate Sledgehammer's SMT integration, with promising results. We also evaluate Lean-SMT as a standalone proof checker for proofs of SMT-LIB problems. We show that Lean-SMT offers a smaller trusted core without sacrificing too much performance.
We introduce CSLib, an open-source framework for proving computer-science-related theorems and writing formally verified code in the Lean proof assistant. CSLib aims to be for computer science what Lean's Mathlib is for mathematics. Mathlib has been tremendously impactful: it is a key reason for Lean's popularity within the mathematics research community, and it has also played a critical role in the training of AI systems for mathematical reasoning. However, the base of computer science knowledge in Lean is currently quite limited. CSLib will vastly enhance this knowledge base and provide infrastructure for using this knowledge in real-world verification projects. By doing so, CSLib will (1) enable the broad use of Lean in computer science education and research, and (2) facilitate the manual and AI-aided engineering of large-scale formally verified systems.
Neural theorem proving combines large language models (LLMs) with proof assistants such as Lean, where the correctness of formal proofs can be rigorously verified, leaving no room for hallucination. With existing neural theorem provers pretrained on a fixed collection of data and offering valuable suggestions at times, it is challenging for them to continually prove novel theorems in a fully autonomous mode, where human insights may be critical. In this paper, we explore LLMs as copilots that assist humans in proving theorems. We introduce Lean Copilot, a general framework for running LLM inference natively in Lean. It enables programmers to build various LLM-based proof automation tools that integrate seamlessly into the workflow of Lean users. Lean users can use our pretrained models or bring their own ones that run either locally (with or without GPUs) or on the cloud. Using Lean Copilot, we build LLM-based tools that suggest proof steps, complete proof goals, and select relevant premises. Experimental results on the Mathematics in Lean textbook demonstrate the effectiveness of our method compared to existing rule-based proof automation in Lean (aesop). When assisting humans, Le
Proof automation is crucial to large-scale formal mathematics and software/hardware verification projects in ITPs. Sophisticated tools called hammers have been developed to provide general-purpose proof automation in ITPs such as Coq and Isabelle, leveraging the power of ATPs. An important component of a hammer is the translation algorithm from the ITP's logical system to the ATP's logical system. In this paper, we propose a novel translation algorithm for ITPs based on dependent type theory. The algorithm is implemented in Lean 4 under the name Lean-auto. When combined with ATPs, Lean-auto provides general-purpose, ATP-based proof automation in Lean 4 for the first time. Soundness of the main translation procedure is guaranteed, and experimental results suggest that our algorithm is sufficiently complete to automate the proof of many problems that arise in practical uses of Lean 4. We also find that Lean-auto solves more problems than existing tools on Lean 4's math library Mathlib4.
This study aims to observe if the theorem prover Lean positively influences students' understanding of mathematical proving. To this end, we perform a pilot study concerning freshmen students at the University of Zurich (UZH). While doing so, we apply certain teaching methods and gather data from the volunteer students enrolled in the ``Foundations of Mathematics'' course. After eleven weeks of study covering some exercise questions implemented with Lean, we measure Lean students' performances in proving mathematical statements, compared to other students who are not engaged with Lean. For this measurement, we interview five Lean and four Non-Lean students and we analyze the scores of all students in the final exam. Finally, we check significance by performing a $t$-test for independent samples and the Mann-Whitney $U$-test.
In this paper we present a new "external checker" for the Lean theorem prover, written in Lean itself. This is the first complete typechecker for Lean 4 other than the reference implementation in C++ used by Lean itself, and our new checker is competitive with the original, running between 20% and 50% slower and usable to verify all of Lean's mathlib library, forming an additional step in Lean's aim to self-host the full elaborator and compiler. Moreover, because the checker is written in a language which admits formal verification, it is possible to state and prove properties about the kernel itself, and we report on progress to formalize the Lean type theory abstractly and prove some theorems about it. Finally, we combine these to get a proof of correctness of parts of the kernel. We plan to use this project to help justify any future changes to the kernel and type theory and ensure unsoundness does not sneak in through either the abstract theory or implementation bugs. The verification is already paying off, as one soundness bug has been spotted and fixed as a result of this work.
We present a formal verification of Wolstenholme's theorem -- $\binom{2p}{p} \equiv 2 \pmod{p^3}$ for prime $p \geq 5$ -- in Lean~4 with Mathlib. The proof proceeds by expanding the shifted factorial product $\prod_{k=1}^{p-1}(p+k)$ to second order in $p$, identifying the quadratic coefficient as the second elementary symmetric product, and showing its divisibility by $p$ via power sum vanishing in $\mathbb{Z}/p\mathbb{Z}$. The formalization comprises nine lemmas across approximately 800 lines of Lean, with zero \texttt{sorry} declarations. To our knowledge, this is the first formal verification of Wolstenholme's theorem in Lean~4. The proof was discovered through a collaboration between a relational analogy engine for theorem proving and human-directed formalization.
Lean R&D has been used at PUC-Rio to foster industry-academia collaboration in innovation projects across multiple sectors. This industrial experience paper describes recent experiences and evaluation results from applying Lean R&D in partnership with Petrobras in the oil and gas sector and Americanas in retail. The findings highlight Lean R&D's effectiveness in transforming ideas into meaningful business outcomes. Based on responses from 57 participants - including team members, managers, and sponsors - the assessment indicates that stakeholders find the structured phases of Lean R&D well-suited to innovation projects and endorse the approach. Although acknowledging that successful collaboration relies on various factors, this industrial experience positions Lean R&D as a promising framework for industry-academia projects focused on achieving rapid, impactful results for industry partners.
The rapid evolution of autonomous, agentic artificial intelligence within financial services has introduced an existential architectural crisis: large language models (LLMs) are probabilistic, non-deterministic systems operating in domains that demand absolute, mathematically verifiable compliance guarantees. Existing guardrail solutions -- including NVIDIA NeMo Guardrails and Guardrails AI -- rely on probabilistic classifiers and syntactic validators that are fundamentally inadequate for enforcing complex multi-variable regulatory constraints mandated by the SEC, FINRA, and OCC. This paper presents the Lean-Agent Protocol, a formal-verification-based AI guardrail platform that leverages the Aristotle neural-symbolic model developed by Harmonic AI to auto-formalize institutional policies into Lean 4 code. Every proposed agentic action is treated as a mathematical conjecture: execution is permitted if and only if the Lean 4 kernel proves that the action satisfies pre-compiled regulatory axioms. This architecture provides cryptographic-level compliance certainty at microsecond latency, directly satisfying SEC Rule 15c3-5, OCC Bulletin 2011-12, FINRA Rule 3110, and CFPB explainability