MAGPIE: A benchmark for Multi-AGent contextual PrIvacy Evaluation
2510.15186v1
cs.CR, cs.CL
2025-10-21
Авторы:
Gurusha Juneja, Jayanth Naga Sai Pasupulati, Alon Albalak, Wenyue Hua, William Yang Wang
Abstract
A core challenge for autonomous LLM agents in collaborative settings is
balancing robust privacy understanding and preservation alongside task
efficacy. Existing privacy benchmarks only focus on simplistic, single-turn
interactions where private information can be trivially omitted without
affecting task outcomes. In this paper, we introduce MAGPIE (Multi-AGent
contextual PrIvacy Evaluation), a novel benchmark of 200 high-stakes tasks
designed to evaluate privacy understanding and preservation in multi-agent
collaborative, non-adversarial scenarios. MAGPIE integrates private information
as essential for task resolution, forcing agents to balance effective
collaboration with strategic information control. Our evaluation reveals that
state-of-the-art agents, including GPT-5 and Gemini 2.5-Pro, exhibit
significant privacy leakage, with Gemini 2.5-Pro leaking up to 50.7% and GPT-5
up to 35.1% of the sensitive information even when explicitly instructed not
to. Moreover, these agents struggle to achieve consensus or task completion and
often resort to undesirable behaviors such as manipulation and power-seeking
(e.g., Gemini 2.5-Pro demonstrating manipulation in 38.2% of the cases). These
findings underscore that current LLM agents lack robust privacy understanding
and are not yet adequately aligned to simultaneously preserve privacy and
maintain effective collaboration in complex environments.
Ссылки и действия
Дополнительные ресурсы: