Efficient Reasoning via Thought-Training and Thought-Free Inference
2511.03408v1
cs.CL, I.2.7
2025-11-07
Авторы:
Canhui Wu, Qiong Cao, Chao Xue, Wei Xi, Xiaodong He
Abstract
Recent advances in large language models (LLMs) have leveraged explicit
Chain-of-Thought (CoT) prompting to improve reasoning accuracy. However, most
existing methods primarily compress verbose reasoning outputs. These
Long-to-Short transformations aim to improve efficiency, but still rely on
explicit reasoning during inference. In this work, we introduce \textbf{3TF}
(\textbf{T}hought-\textbf{T}raining and \textbf{T}hought-\textbf{F}ree
inference), a framework for efficient reasoning that takes a Short-to-Long
perspective. We first train a hybrid model that can operate in both reasoning
and non-reasoning modes, and then further train it on CoT-annotated data to
internalize structured reasoning, while enforcing concise, thought-free outputs
at inference time using the no-reasoning mode. Unlike compression-based
approaches, 3TF improves the reasoning quality of non-reasoning outputs,
enabling models to perform rich internal reasoning implicitly while keeping
external outputs short. Empirically, 3TF-trained models obtain large
improvements on reasoning benchmarks under thought-free inference,
demonstrating that high quality reasoning can be learned and executed
implicitly without explicit step-by-step generation.
Ссылки и действия
Дополнительные ресурсы: