Exploiting Web Search Tools of AI Agents for Data Exfiltration
2510.09093v1
cs.CR, cs.CL, 68T50, 68T0, F.2.2; I.2.7; K.6.5
2025-10-14
Авторы:
Dennis Rall, Bernhard Bauer, Mohit Mittal, Thomas Fraunholz
Abstract
Large language models (LLMs) are now routinely used to autonomously execute
complex tasks, from natural language processing to dynamic workflows like web
searches. The usage of tool-calling and Retrieval Augmented Generation (RAG)
allows LLMs to process and retrieve sensitive corporate data, amplifying both
their functionality and vulnerability to abuse. As LLMs increasingly interact
with external data sources, indirect prompt injection emerges as a critical and
evolving attack vector, enabling adversaries to exploit models through
manipulated inputs. Through a systematic evaluation of indirect prompt
injection attacks across diverse models, we analyze how susceptible current
LLMs are to such attacks, which parameters, including model size and
manufacturer, specific implementations, shape their vulnerability, and which
attack methods remain most effective. Our results reveal that even well-known
attack patterns continue to succeed, exposing persistent weaknesses in model
defenses. To address these vulnerabilities, we emphasize the need for
strengthened training procedures to enhance inherent resilience, a centralized
database of known attack vectors to enable proactive defense, and a unified
testing framework to ensure continuous security validation. These steps are
essential to push developers toward integrating security into the core design
of LLMs, as our findings show that current models still fail to mitigate
long-standing threats.