DRIFT: Decompose, Retrieve, Illustrate, then Formalize Theorems
2510.10815v2
cs.AI, cs.CL, cs.IR, cs.SC
2025-10-15
Авторы:
Meiru Zhang, Philipp Borchert, Milan Gritta, Gerasimos Lampouras
Abstract
Automating the formalization of mathematical statements for theorem proving
remains a major challenge for Large Language Models (LLMs). LLMs struggle to
identify and utilize the prerequisite mathematical knowledge and its
corresponding formal representation in languages like Lean. Current
retrieval-augmented autoformalization methods query external libraries using
the informal statement directly, but overlook a fundamental limitation:
informal mathematical statements are often complex and offer limited context on
the underlying math concepts. To address this, we introduce DRIFT, a novel
framework that enables LLMs to decompose informal mathematical statements into
smaller, more tractable ''sub-components''. This facilitates targeted retrieval
of premises from mathematical libraries such as Mathlib. Additionally, DRIFT
retrieves illustrative theorems to help models use premises more effectively in
formalization tasks. We evaluate DRIFT across diverse benchmarks (ProofNet,
ConNF, and MiniF2F-test) and find that it consistently improves premise
retrieval, nearly doubling the F1 score compared to the DPR baseline on
ProofNet. Notably, DRIFT demonstrates strong performance on the
out-of-distribution ConNF benchmark, with BEq+@10 improvements of 37.14% and
42.25% using GPT-4.1 and DeepSeek-V3.1, respectively. Our analysis shows that
retrieval effectiveness in mathematical autoformalization depends heavily on
model-specific knowledge boundaries, highlighting the need for adaptive
retrieval strategies aligned with each model's capabilities.