Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data
2509.23041v1
cs.CR, cs.AI, cs.CL
2025-10-01
Authors:
Zi Liang, Qingqing Ye, Xuan Liu, Yanyun Wang, Jianliang Xu, Haibo Hu
Abstract
Synthetic data refers to artificial samples generated by models. While it has
been validated to significantly enhance the performance of large language
models (LLMs) during training and has been widely adopted in LLM development,
the potential security risks it may introduce remain uninvestigated. This paper
systematically evaluates the resilience of the synthetic-data-integrated training
paradigm for LLMs against mainstream poisoning and backdoor attacks. We reveal
that such a paradigm exhibits strong resistance to existing attacks, primarily
because the poisoning data and the queries used to generate synthetic samples
follow different distributions. To enhance the effectiveness of
these attacks and further investigate the security risks introduced by
synthetic data, we introduce a novel and universal attack framework, namely,
Virus Infection Attack (VIA), which enables the propagation of current attacks
through synthetic data even under purely clean queries. Inspired by the
principles of virus design in cybersecurity, VIA conceals the poisoning payload
within a protective "shell" and strategically searches for optimal hijacking
points in benign samples to maximize the likelihood of generating malicious
content. Extensive experiments on both data poisoning and backdoor attacks show
that VIA significantly increases the presence of poisoning content in synthetic
data and correspondingly raises the attack success rate (ASR) on downstream
models to levels comparable to those observed in the poisoned upstream models.