Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation
2510.02249v1
cs.CL, cs.AI, cs.LG
2025-10-04
Authors:
Tianyi Jiang, Yi Bin, Yujuan Ding, Kainian Zhu, Fei Ma, Jingkuan Song, Heng Tao Shen
Abstract
Large Language Models (LLMs) have demonstrated remarkable reasoning abilities
on complex problems using long Chain-of-Thought (CoT) reasoning. However, they
often suffer from overthinking, that is, generating unnecessarily lengthy
reasoning steps for simpler problems. This issue may degrade the efficiency of
the models and make it difficult for them to adapt their reasoning depth to the
complexity of problems. To address this, we introduce a novel metric, Token
Entropy Cumulative Average (TECA), which measures the extent of exploration
throughout the reasoning process. We further propose a novel reasoning paradigm
-- Explore Briefly, Then Decide -- with an associated Cumulative Entropy
Regulation (CER) mechanism. This paradigm leverages TECA to help the model
dynamically determine the optimal point to conclude its thought process and
provide a final answer, thus achieving efficient reasoning. Experimental
results across diverse mathematical benchmarks show that our approach
substantially mitigates overthinking without sacrificing problem-solving
ability. With our thinking paradigm, the average response length decreases by
up to 71% on simpler datasets, demonstrating the effectiveness of our method in
creating a more efficient and adaptive reasoning process.
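
The abstract names TECA but does not give its formula. As a minimal sketch, assuming TECA at step t is the running mean of the Shannon entropies of the next-token distributions for tokens 1..t, it could be computed as follows. The function names `token_entropy` and `teca` and the toy distributions are hypothetical and are not taken from the paper.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    # Zero-probability entries contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def teca(distributions):
    """Cumulative average of token entropies, one value per generated token.

    Assumption: TECA_t = (1 / t) * sum of H_i for i = 1..t, where H_i is the
    entropy of the model's distribution when generating token i.
    """
    values = []
    total = 0.0
    for step, probs in enumerate(distributions, start=1):
        total += token_entropy(probs)
        values.append(total / step)
    return values


# Toy example: an uncertain step, a fully confident step, a two-way choice.
dists = [
    [0.25, 0.25, 0.25, 0.25],
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
]
curve = teca(dists)
```

Under this reading, a curve that stays high suggests continued exploration, while a curve that flattens or falls suggests the model has settled on an answer. How the paper's CER mechanism actually uses this signal is not described in the abstract.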