SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents
2509.23694v2
cs.AI, cs.CL, cs.CR
2025-10-02
Авторы:
Jianshuo Dong, Sheng Guo, Hao Wang, Zhuotao Liu, Tianwei Zhang, Ke Xu, Minlie Huang, Han Qiu
Abstract
Search agents connect LLMs to the Internet, enabling access to broader and
more up-to-date information. However, unreliable search results may also pose
safety threats to end users, establishing a new threat surface. In this work,
we conduct two in-the-wild experiments to demonstrate both the prevalence of
low-quality search results and their potential to misguide agent behaviors. To
counter this threat, we introduce an automated red-teaming framework that is
systematic, scalable, and cost-efficient, enabling lightweight and harmless
safety assessments of search agents. Building on this framework, we construct
the SafeSearch benchmark, which includes 300 test cases covering five
categories of risks (e.g., misinformation and indirect prompt injection). Using
this benchmark, we evaluate three representative search agent scaffolds,
covering search workflow, tool-calling, and deep research, across 7 proprietary
and 8 open-source backend LLMs. Our results reveal substantial vulnerabilities
of LLM-based search agents: when exposed to unreliable websites, the highest
ASR reached 90.5% for GPT-4.1-mini under a search workflow setting. Moreover,
our analysis highlights the limited effectiveness of common defense practices,
such as reminder prompting. This emphasizes the value of our framework in
promoting transparency for safer agent development. Our codebase and test cases
are publicly available: https://github.com/jianshuod/SafeSearch.
Ссылки и действия
Дополнительные ресурсы: