Demystifying Hybrid Thinking: Can LLMs Truly Switch Between Think and No-Think?
2510.12680v1
cs.LG, cs.AI, cs.CL
2025-10-16
Авторы:
Shouren Wang, Wang Yang, Xianxuan Long, Qifan Wang, Vipin Chaudhary, Xiaotian Han
Abstract
Hybrid thinking enables LLMs to switch between reasoning and direct
answering, offering a balance between efficiency and reasoning capability. Yet
our experiments reveal that current hybrid thinking LLMs only achieve partial
mode separation: reasoning behaviors often leak into the no-think mode. To
understand and mitigate this, we analyze the factors influencing
controllability and identify four that matter most: (1) larger data scale, (2)
using think and no-think answers from different questions rather than the same
question, (3) a moderate increase in no-think data number, and (4) a two-phase
strategy that first trains reasoning ability and then applies hybrid think
training. Building on these findings, we propose a practical recipe that,
compared to standard training, can maintain accuracy in both modes while
significantly reducing no-think output length (from $1085$ to $585$ on MATH500)
and occurrences of reasoning-supportive tokens such as ``\texttt{wait}'' (from
$5917$ to $522$ on MATH500). Our findings highlight the limitations of current
hybrid thinking and offer directions for strengthening its controllability.
Ссылки и действия
Дополнительные ресурсы: