GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments?
2510.20333v1
cs.CR, cs.AI
2025-10-25
Authors:
Chiyu Chen, Xinhao Song, Yunkai Chai, Yang Yao, Haodong Zhao, Lijun Li, Jie Li, Yan Teng, Gongshen Liu, Yingchun Wang
Abstract
Vision-Language Models (VLMs) are increasingly deployed as autonomous agents
to navigate mobile graphical user interfaces (GUIs). Operating in dynamic
on-device ecosystems, which include notifications, pop-ups, and inter-app
interactions, exposes them to a unique and underexplored threat vector:
environmental injection. Unlike prompt-based attacks that manipulate textual
instructions, environmental injection corrupts an agent's visual perception by
inserting adversarial UI elements (for example, deceptive overlays or spoofed
notifications) directly into the GUI. This bypasses textual safeguards and can
derail execution, causing privacy leakage, financial loss, or irreversible
device compromise. To systematically evaluate this threat, we introduce
GhostEI-Bench, the first benchmark for assessing mobile agents under
environmental injection attacks within dynamic, executable environments. Moving
beyond static image-based assessments, GhostEI-Bench injects adversarial events
into realistic application workflows inside fully operational Android emulators
and evaluates performance across critical risk scenarios. We further propose a
judge-LLM protocol that conducts fine-grained failure analysis by reviewing the
agent's action trajectory alongside the corresponding screenshot sequence,
pinpointing failure in perception, recognition, or reasoning. Comprehensive
experiments on state-of-the-art agents reveal pronounced vulnerability to
deceptive environmental cues: current models systematically fail to perceive
and reason about manipulated UIs. GhostEI-Bench provides a framework for
quantifying and mitigating this emerging threat, paving the way toward more
robust and secure embodied agents.