ToolTweak: An Attack on Tool Selection in LLM-based Agents
2510.02554v1
cs.CR, cs.AI
2025-10-07
Authors:
Jonathan Sneh, Ruomei Yan, Jialin Yu, Philip Torr, Yarin Gal, Sunando Sengupta, Eric Sommerlade, Alasdair Paren, Adel Bibi
Abstract
As LLMs increasingly power agents that interact with external tools, tool use
has become an essential mechanism for extending their capabilities. These
agents typically select tools from growing databases or marketplaces to solve
user tasks, creating implicit competition among tool providers and developers
for visibility and usage. In this paper, we show that this selection process
harbors a critical vulnerability: by iteratively manipulating tool names and
descriptions, adversaries can systematically bias agents toward selecting
specific tools, gaining an unfair advantage over equally capable alternatives. We
present ToolTweak, a lightweight automatic attack that increases selection
rates from a baseline of around 20% to as high as 81%, with strong
transferability between open-source and closed-source models. Beyond individual
tools, we show that such attacks cause distributional shifts in tool usage,
revealing risks to fairness, competition, and security in emerging tool
ecosystems. To mitigate these risks, we evaluate two defenses: paraphrasing and
perplexity filtering, which reduce bias and lead agents to select functionally
similar tools more equally. All code will be open-sourced upon acceptance.