DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains

2510.27419v1 cs.AI, cs.CL 2025-11-04

Authors:

Tian Liang, Wenxiang Jiao, Zhiwei He, Jiahao Xu, Haitao Mi, Dong Yu

Abstract

Large Reasoning Models (LRMs) have demonstrated impressive capabilities but suffer from cognitive inefficiencies like "overthinking" simple problems and "underthinking" complex ones. While existing methods that use supervised fine-tuning (SFT) or reinforcement learning (RL) with token-length rewards can improve efficiency, they often do so at the cost of accuracy. This paper introduces DeepCompress, a novel framework that simultaneously enhances both the accuracy and efficiency of LRMs. We challenge the prevailing approach of consistently favoring shorter reasoning paths, showing that longer responses can contain a broader range of correct solutions for difficult problems. DeepCompress employs an adaptive length reward mechanism that dynamically classifies problems as "Simple" or "Hard" in real time based on the model's evolving capability. It encourages shorter, more efficient reasoning for "Simple" problems while promoting longer, more exploratory thought chains for "Hard" problems. This dual-reward strategy enables the model to autonomously adjust its Chain-of-Thought (CoT) length, compressing reasoning for well-mastered problems and extending it for those it finds challenging. Experimental results on challenging mathematical benchmarks show that DeepCompress consistently outperforms baseline methods, achieving superior accuracy while significantly improving token efficiency.
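The abstract does not give the reward formula, so the following is only a rough sketch of how a dual length reward of this kind might look, not the authors' implementation. It assumes that several responses are sampled per problem, that a problem counts as "Simple" when its pass rate is at least the running batch pass rate, and that the length term is a standardized length whose sign flips between the two classes. The names `classify`, `length_rewards`, `dual_rewards`, and the `weight` parameter are hypothetical.

```python
from statistics import mean, pstdev


def classify(group_pass_rate: float, batch_pass_rate: float) -> str:
    """Assumed rule: a problem is 'Simple' if the model solves it at least
    as often as it solves the average problem in the current batch."""
    return "Simple" if group_pass_rate >= batch_pass_rate else "Hard"


def length_rewards(lengths: list[int], label: str, weight: float = 0.2) -> list[float]:
    """Assumed length term: standardized response length, scaled by `weight`.

    'Simple' problems reward shorter-than-average responses;
    'Hard' problems reward longer-than-average responses.
    """
    mu = mean(lengths)
    sigma = pstdev(lengths) or 1.0  # avoid division by zero when all lengths match
    sign = -1.0 if label == "Simple" else 1.0
    return [sign * weight * (n - mu) / sigma for n in lengths]


def dual_rewards(
    correct: list[int],
    lengths: list[int],
    batch_pass_rate: float,
    weight: float = 0.2,
) -> tuple[str, list[float]]:
    """Combine a 0/1 correctness reward with the sign-switched length term."""
    group_pass_rate = sum(correct) / len(correct)
    label = classify(group_pass_rate, batch_pass_rate)
    bonus = length_rewards(lengths, label, weight)
    return label, [float(c) + b for c, b in zip(correct, bonus)]


if __name__ == "__main__":
    # Three sampled responses to one problem: correctness flags and token counts.
    print(dual_rewards([1, 1, 1], [100, 200, 300], batch_pass_rate=0.5))
    print(dual_rewards([0, 0, 1], [100, 200, 300], batch_pass_rate=0.5))
```

Because the threshold in this sketch is the batch pass rate, the same problem can move from "Hard" to "Simple" as training progresses, which is one way to read the abstract's "based on the model's evolving capability".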

Related articles

Ontology Learning with LLMs: A Benchmark Study on Axiom Identification

To Err Is Human: Systematic Quantification of Errors in Published AI Papers via ...

On the Computability of Artificial General Intelligence

Algorithmic Thinking Theory

From Atomic to Composite: Reinforcement Learning Enables Generalization in Compl...
