RAGRank: Using PageRank to Counter Poisoning in CTI LLM Pipelines
2510.20768v1
cs.CR, cs.AI, cs.IR
2025-10-25
Авторы:
Austin Jia, Avaneesh Ramesh, Zain Shamsi, Daniel Zhang, Alex Liu
Abstract
Retrieval-Augmented Generation (RAG) has emerged as the dominant
architectural pattern to operationalize Large Language Model (LLM) usage in
Cyber Threat Intelligence (CTI) systems. However, this design is susceptible to
poisoning attacks, and previously proposed defenses can fail for CTI contexts
as cyber threat information is often completely new for emerging attacks, and
sophisticated threat actors can mimic legitimate formats, terminology, and
stylistic conventions. To address this issue, we propose that the robustness of
modern RAG defenses can be accelerated by applying source credibility
algorithms on corpora, using PageRank as an example. In our experiments, we
demonstrate quantitatively that our algorithm applies a lower authority score
to malicious documents while promoting trusted content, using the standardized
MS MARCO dataset. We also demonstrate proof-of-concept performance of our
algorithm on CTI documents and feeds.
Ссылки и действия
Дополнительные ресурсы: