Are Language Models Efficient Reasoners? A Perspective from Logic Programming
2510.25626v1
cs.CL, cs.AI, cs.LG, cs.LO
2025-10-31
Авторы:
Andreas Opedal, Yanick Zengaffinen, Haruki Shirakami, Clemente Pasti, Mrinmaya Sachan, Abulhair Saparov, Ryan Cotterell, Bernhard Schölkopf
Abstract
Modern language models (LMs) exhibit strong deductive reasoning capabilities,
yet standard evaluations emphasize correctness while overlooking a key aspect
of human-like reasoning: efficiency. In real-world reasoning scenarios, much of
the available information is irrelevant, and effective deductive inference
requires identifying and ignoring such distractions. We propose a framework for
assessing LM reasoning efficiency through the lens of logic programming,
introducing a simple method to align proofs written in natural language -- as
generated by an LM -- with shortest proofs found by executing the logic
program. Efficiency is quantified by measuring how well a model avoids
unnecessary inference. Empirically, we construct a dataset of math word
problems injected with various number of irrelevant axioms that vary in
semantic overlap with the goal theorem. We find that current LMs show marked
accuracy declines under such conditions -- even with minimal, domain-consistent
distractions -- and the proofs they generate frequently exhibit detours through
irrelevant inferences.