Rethinking Reasoning: A Survey on Reasoning-based Backdoors in LLMs
2510.07697v1
cs.CR, cs.AI
2025-10-11
Authors:
Man Hu, Xinyi Wu, Zuofeng Suo, Jinbo Feng, Linghui Meng, Yanhao Jia, Anh Tuan Luu, Shuai Zhao
Abstract
With the rise of advanced reasoning capabilities, large language models
(LLMs) are receiving increasing attention. However, although reasoning improves
LLMs' performance on downstream tasks, it also introduces new security risks,
as adversaries can exploit these capabilities to conduct backdoor attacks.
Existing surveys on backdoor attacks and reasoning security offer comprehensive
overviews but lack in-depth analysis of backdoor attacks and defenses targeting
LLMs' reasoning abilities. In this paper, we take the first step toward
providing a comprehensive review of reasoning-based backdoor attacks in LLMs by
analyzing their underlying mechanisms, methodological frameworks, and
unresolved challenges. Specifically, we introduce a new taxonomy that offers a
unified perspective for summarizing existing approaches, categorizing
reasoning-based backdoor attacks as associative, passive, or active. We also
present defense strategies against such attacks and discuss current challenges
alongside potential directions for future research. This work offers a novel
perspective, paving the way for further exploration of secure and trustworthy
LLM communities.