Eyes-on-Me: Scalable RAG Poisoning through Transferable Attention-Steering Attractors
2510.00586v1
cs.LG, cs.CL, cs.CR
2025-10-04
Authors:
Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang, Yun-Nung Chen
Abstract
Existing data poisoning attacks on retrieval-augmented generation (RAG)
systems scale poorly because they require costly optimization of poisoned
documents for each target phrase. We introduce Eyes-on-Me, a modular attack
that decomposes an adversarial document into reusable Attention Attractors and
Focus Regions. Attractors are optimized to direct attention to the Focus
Region. Attackers can then insert semantic baits for the retriever or malicious
instructions for the generator, adapting to new targets at near-zero cost. This
is achieved by steering a small subset of attention heads that we empirically
identify as strongly correlated with attack success. Across 18 end-to-end RAG
settings (3 datasets $\times$ 2 retrievers $\times$ 3 generators), Eyes-on-Me
raises average attack success rates from 21.9 to 57.8 (+35.9 points,
2.6$\times$ over prior work). A single optimized attractor transfers to unseen
black-box retrievers and generators without retraining. Our findings establish
a scalable paradigm for RAG data poisoning and show that modular, reusable
components pose a practical threat to modern AI systems. They also reveal a
strong link between attention concentration and model outputs, informing
interpretability research.
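The headline figures in the abstract can be checked with a few lines of arithmetic. The sketch below enumerates the evaluation grid and recomputes the reported gain. The dataset, retriever, and generator labels are placeholders, since the abstract gives only their counts; the two success rates are taken directly from the text.

```python
from itertools import product

# Placeholder labels: the abstract states only how many of each were used,
# not which ones, so these names are hypothetical.
datasets = ["dataset_1", "dataset_2", "dataset_3"]
retrievers = ["retriever_1", "retriever_2"]
generators = ["generator_1", "generator_2", "generator_3"]

# Every (dataset, retriever, generator) combination is one end-to-end setting.
settings = list(product(datasets, retrievers, generators))
num_settings = len(settings)

# Average attack success rates as reported in the abstract.
baseline_asr = 21.9  # prior work
attack_asr = 57.8    # Eyes-on-Me

gain_points = round(attack_asr - baseline_asr, 1)  # absolute gain in points
ratio = round(attack_asr / baseline_asr, 1)        # relative improvement

print(f"{num_settings} settings, +{gain_points} points, {ratio}x over prior work")
```

The absolute gain and the ratio are different measures of the same improvement: the first is a difference of averages, the second a quotient, which is why the abstract reports both.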