Token-Level Precise Attack on RAG: Searching for the Best Alternatives to Mislead Generation

2508.03110v1 cs.CL 2025-08-09

Authors:

Zizhong Li, Haopeng Zhang, Jiawei Zhang

Summary

Malicious passages injected into the external knowledge base of a Retrieval-Augmented Generation (RAG) system can be retrieved and used to manipulate the model's output, which undermines the quality and safety of systems built on RAG. The proposed Token-level Precise Attack on the RAG (TPARAG) exploits this vulnerability. It uses a lightweight white-box LLM as the attacker to generate malicious passages and iteratively optimize them at the token level. The attack targets both white-box and black-box RAG systems and is designed to succeed at both stages of the pipeline: the passage must first be retrieved, and it must then mislead generation. Experiments on open-domain QA datasets show that TPARAG outperforms existing approaches in both retrieval-stage and end-to-end attack effectiveness. These findings expose critical vulnerabilities in RAG pipelines and underline the need for stronger defenses.

Abstract

While large language models (LLMs) have achieved remarkable success in providing trustworthy responses for knowledge-intensive tasks, they still face critical limitations such as hallucinations and outdated knowledge. To address these issues, the retrieval-augmented generation (RAG) framework enhances LLMs with access to external knowledge via a retriever, enabling more accurate and real-time outputs about the latest events. However, this integration brings new security vulnerabilities: the risk that malicious content in the external database can be retrieved and used to manipulate model outputs. Although prior work has explored attacks on RAG systems, existing approaches either rely heavily on access to the retriever or fail to jointly consider both retrieval and generation stages, limiting their effectiveness, particularly in black-box scenarios. To overcome these limitations, we propose Token-level Precise Attack on the RAG (TPARAG), a novel framework that targets both white-box and black-box RAG systems. TPARAG leverages a lightweight white-box LLM as an attacker to generate and iteratively optimize malicious passages at the token level, ensuring both retrievability and high attack success in generation. Extensive experiments on open-domain QA datasets demonstrate that TPARAG consistently outperforms previous approaches in retrieval-stage and end-to-end attack effectiveness. These results further reveal critical vulnerabilities in RAG pipelines and offer new insights into improving their robustness.
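
The abstract describes the core loop only at a high level: an attacker proposes a passage, then refines it token by token so that it is both retrieved and effective at steering generation. The Python below is a minimal toy sketch of that idea under stated assumptions. It is not the authors' TPARAG implementation. A bag-of-words cosine similarity stands in for the retriever. A hypothetical token-overlap score stands in for the attacker LLM's estimate of generation-stage success. A greedy single-token substitution search over a small fixed vocabulary stands in for the paper's optimization procedure, which this page does not describe. All function names and the weight `alpha` are hypothetical.

```python
import math
from collections import Counter


def cosine_bow(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a toy stand-in for a dense retriever."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def proxy_generation_score(passage: str, target_answer: str) -> float:
    """Hypothetical proxy for generation-stage success: the fraction of passage
    tokens that belong to the target (wrong) answer. A real attack would ask an
    LLM for the likelihood of target_answer instead."""
    tokens = passage.lower().split()
    target = set(target_answer.lower().split())
    return sum(t in target for t in tokens) / len(tokens) if tokens else 0.0


def combined_score(passage: str, query: str, target_answer: str,
                   alpha: float = 0.5) -> float:
    """Weighted sum of retrievability and the proxy generation score."""
    return (alpha * cosine_bow(passage, query)
            + (1 - alpha) * proxy_generation_score(passage, target_answer))


def greedy_token_search(passage, query, target_answer, vocab,
                        rounds=3, alpha=0.5):
    """Greedy single-token substitution: at each position, keep a candidate
    token only if it strictly raises the combined score."""
    tokens = passage.split()
    best = combined_score(" ".join(tokens), query, target_answer, alpha)
    for _ in range(rounds):
        improved = False
        for i in range(len(tokens)):
            for cand in vocab:
                trial = tokens[:i] + [cand] + tokens[i + 1:]
                score = combined_score(" ".join(trial), query, target_answer, alpha)
                if score > best:
                    best, tokens, improved = score, trial, True
        if not improved:  # local optimum: no single substitution helps
            break
    return " ".join(tokens), best


def top_k(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Toy retrieval stage: rank passages by cosine_bow and keep the top k."""
    return sorted(corpus, key=lambda p: cosine_bow(p, query), reverse=True)[:k]


if __name__ == "__main__":
    query = "who wrote hamlet"
    clean = ["hamlet is a tragedy by william shakespeare"]
    seed = "the play was written long ago"
    vocab = ["who", "wrote", "hamlet", "marlowe", "the"]
    passage, score = greedy_token_search(seed, query, "marlowe", vocab)
    print(passage, round(score, 3))
    print("retrieved:", top_k(query, clean + [passage], k=1))
```

The weight `alpha` makes the trade-off explicit. At `alpha=1` the search pursues only retrievability, and at `alpha=0` it pursues only the proxy generation score. This loosely mirrors the abstract's argument for jointly considering the retrieval and generation stages.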

Related articles

Nexus: Higher-Order Attention Mechanisms in Transformers

On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral

ClusterFusion: Hybrid Clustering with Embedding Guidance and LLM Adaptation

SQuARE: Structured Query & Adaptive Retrieval Engine For Tabular Formats

RapidUn: Influence-Driven Parameter Reweighting for Efficient Large Language Mod...
