RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation
2510.11195v1
cs.CR, cs.AI
2025-10-15
Авторы:
Vasilije Stambolic, Aritra Dhar, Lukas Cavigelli
Abstract
Retrieval-Augmented Generation (RAG) increases the reliability and
trustworthiness of the LLM response and reduces hallucination by eliminating
the need for model retraining. It does so by adding external data into the
LLM's context. We develop a new class of black-box attack, RAG-Pull, that
inserts hidden UTF characters into queries or external code repositories,
redirecting retrieval toward malicious code, thereby breaking the models'
safety alignment. We observe that query and code perturbations alone can shift
retrieval toward attacker-controlled snippets, while combined query-and-target
perturbations achieve near-perfect success. Once retrieved, these snippets
introduce exploitable vulnerabilities such as remote code execution and SQL
injection. RAG-Pull's minimal perturbations can alter the model's safety
alignment and increase preference towards unsafe code, therefore opening up a
new class of attacks on LLMs.
Ссылки и действия
Дополнительные ресурсы: