Assessing LLM Reasoning Steps via Principal Knowledge Grounding
2511.00879v1
cs.CL, cs.AI, cs.LG
2025-11-06
Авторы:
Hyeon Hwang, Yewon Cho, Chanwoong Yoon, Yein Park, Minju Song, Kyungjae Lee, Gangwoo Kim, Jaewoo Kang
Abstract
Step-by-step reasoning has become a standard approach for large language
models (LLMs) to tackle complex tasks. While this paradigm has proven
effective, it raises a fundamental question: How can we verify that an LLM's
reasoning is accurately grounded in knowledge? To address this question, we
introduce a novel evaluation suite that systematically assesses the knowledge
grounding of intermediate reasoning. Our framework comprises three key
components. (1) Principal Knowledge Collection, a large-scale repository of
atomic knowledge essential for reasoning. Based on the collection, we propose
(2) knowledge-grounded evaluation metrics designed to measure how well models
recall and apply prerequisite knowledge in reasoning. These metrics are
computed by our (3) evaluator LLM, a lightweight model optimized for
cost-effective and reliable metric computation. Our evaluation suite
demonstrates remarkable effectiveness in identifying missing or misapplied
knowledge elements, providing crucial insights for uncovering fundamental
reasoning deficiencies in LLMs. Beyond evaluation, we demonstrate how these
metrics can be integrated into preference optimization, showcasing further
applications of knowledge-grounded evaluation.
Ссылки и действия
Дополнительные ресурсы: