Cross-Modal Content Optimization for Steering Web Agent Preferences
2510.03612v1
cs.AI, cs.CR
2025-10-08
Авторы:
Tanqiu Jiang, Min Bai, Nikolaos Pappas, Yanjun Qi, Sandesh Swamy
Abstract
Vision-language model (VLM)-based web agents increasingly power high-stakes
selection tasks like content recommendation or product ranking by combining
multimodal perception with preference reasoning. Recent studies reveal that
these agents are vulnerable against attackers who can bias selection outcomes
through preference manipulations using adversarial pop-ups, image
perturbations, or content tweaks. Existing work, however, either assumes strong
white-box access, with limited single-modal perturbations, or uses impractical
settings. In this paper, we demonstrate, for the first time, that joint
exploitation of visual and textual channels yields significantly more powerful
preference manipulations under realistic attacker capabilities. We introduce
Cross-Modal Preference Steering (CPS) that jointly optimizes imperceptible
modifications to an item's visual and natural language descriptions, exploiting
CLIP-transferable image perturbations and RLHF-induced linguistic biases to
steer agent decisions. In contrast to prior studies that assume gradient
access, or control over webpages, or agent memory, we adopt a realistic
black-box threat setup: a non-privileged adversary can edit only their own
listing's images and textual metadata, with no insight into the agent's model
internals. We evaluate CPS on agents powered by state-of-the-art proprietary
and open source VLMs including GPT-4.1, Qwen-2.5VL and Pixtral-Large on both
movie selection and e-commerce tasks. Our results show that CPS is
significantly more effective than leading baseline methods. For instance, our
results show that CPS consistently outperforms baselines across all models
while maintaining 70% lower detection rates, demonstrating both effectiveness
and stealth. These findings highlight an urgent need for robust defenses as
agentic systems play an increasingly consequential role in society.
Ссылки и действия
Дополнительные ресурсы: