Signature in Code Backdoor Detection, how far are we?
2510.13992v1
cs.SE, cs.LG
2025-10-19
Авторы:
Quoc Hung Le, Thanh Le-Cong, Bach Le, Bowen Xu
Abstract
As Large Language Models (LLMs) become increasingly integrated into software
development workflows, they also become prime targets for adversarial attacks.
Among these, backdoor attacks are a significant threat, allowing attackers to
manipulate model outputs through hidden triggers embedded in training data.
Detecting such backdoors remains a challenge, and one promising approach is the
use of Spectral Signature defense methods that identify poisoned data by
analyzing feature representations through eigenvectors. While some prior works
have explored Spectral Signatures for backdoor detection in neural networks,
recent studies suggest that these methods may not be optimally effective for
code models. In this paper, we revisit the applicability of Spectral
Signature-based defenses in the context of backdoor attacks on code models. We
systematically evaluate their effectiveness under various attack scenarios and
defense configurations, analyzing their strengths and limitations. We found
that the widely used setting of Spectral Signature in code backdoor detection
is often suboptimal. Hence, we explored the impact of different settings of the
key factors. We discovered a new proxy metric that can more accurately estimate
the actual performance of Spectral Signature without model retraining after the
defense.
Ссылки и действия
Дополнительные ресурсы: