\texttt{ReMind}: Understanding Deductive Code Reasoning in LLMs
2511.00488v1
cs.PL, cs.CL
2025-11-06
Authors:
Jun Gao, Yun Peng, Xiaoxue Ren
Abstract
Large Language Models (LLMs) have achieved remarkable progress in
code-related tasks. Despite this progress, empirical evidence shows that
they still struggle with \emph{deductive code reasoning}, the ability to reason
about how a program executes. While prior studies have recognized this
limitation, its underlying causes remain largely unexplored. In this paper,
we begin by presenting a comprehensive empirical study that reveals three key
challenges undermining deductive code reasoning: (1) an intrinsic gap between
generation and reasoning abilities, (2) a consistent bias towards code sources,
and (3) weak zero-shot generalization on complex benchmarks. In light of these
challenges, we propose \texttt{ReMind}, a multi-agent framework composed of
\texttt{Mutator}, \texttt{Executor}, and \texttt{Inspector}. The
\texttt{Mutator} generates code variants to mitigate bias towards code sources,
the \texttt{Executor} traces variable states step by step to expose
inconsistencies, and the \texttt{Inspector} identifies problematic reasoning
steps and provides control-flow refinement to bridge the intrinsic reasoning
gap. Through their coordinated collaboration, \texttt{ReMind} systematically
identifies and refines reasoning flaws, achieving strong performance and
enabling robust zero-shot generalization. Extensive experiments on two
benchmarks with five LLMs show that \texttt{ReMind} outperforms baseline
approaches in deductive code reasoning.
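To make "tracing variable states step by step to expose inconsistencies" concrete, here is a minimal sketch. It is not ReMind's implementation. It uses Python's `sys.settrace` to record the ground-truth local variables of a toy function before each executed line, and then it finds the first step at which a predicted trace diverges from that record. The names `trace_states`, `first_divergence`, and `running_max` are hypothetical. ReMind's `Executor` and `Inspector` are LLM agents, and the abstract does not say that they compare against a real execution. This sketch assumes such access purely for illustration.

```python
import copy
import sys
from typing import Any, Callable, Optional

# Hypothetical trace format: a list of (relative line number, local-variable snapshot) steps.
Trace = list[tuple[int, dict[str, Any]]]


def trace_states(func: Callable[..., Any], *args: Any) -> Trace:
    """Run func(*args) and record its local variables just before each line executes."""
    states: Trace = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # ignore frames belonging to any other function
        if event == "line":
            # Line number relative to the `def` line; deep-copy so that later
            # mutation of a list or dict cannot rewrite earlier snapshots.
            rel_line = frame.f_lineno - code.co_firstlineno
            states.append((rel_line, copy.deepcopy(dict(frame.f_locals))))
        return tracer  # keep receiving line events for this frame

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)  # always restore normal (untraced) execution
    return states


def first_divergence(predicted: Trace, actual: Trace) -> Optional[int]:
    """Return the index of the first step where predicted differs from actual, else None."""
    for i, (p, a) in enumerate(zip(predicted, actual)):
        if p != a:
            return i
    if len(predicted) != len(actual):
        return min(len(predicted), len(actual))  # one trace stops early
    return None


def running_max(xs: list[int]) -> int:
    """Toy program whose execution we reason about."""
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best


if __name__ == "__main__":
    actual = trace_states(running_max, [3, 1, 4])
    for step, (line, local_vars) in enumerate(actual):
        print(step, line, local_vars)
    # Simulate a reasoner that mis-tracks `best` at step 3.
    predicted = list(actual)
    line, local_vars = predicted[3]
    predicted[3] = (line, {**local_vars, "best": 99})
    print("first divergent step:", first_divergence(predicted, actual))
```

In this sketch the "predicted" trace is produced by corrupting a single step. In an LLM-based setting it would instead be the model's own step-by-step account. The divergence index then marks the kind of problematic reasoning step that the abstract attributes to the `Inspector`.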