Model-agnostic Adversarial Attack and Defense for Vision-Language-Action Models
2510.13237v1
cs.CV, cs.LG
2025-10-17
Авторы:
Haochuan Xu, Yun Sing Koh, Shuhuai Huang, Zirun Zhou, Di Wang, Jun Sakuma, Jingfeng Zhang
Abstract
Vision-Language-Action (VLA) models have achieved revolutionary progress in
robot learning, enabling robots to execute complex physical robot tasks from
natural language instructions. Despite this progress, their adversarial
robustness remains underexplored. In this work, we propose both adversarial
patch attack and corresponding defense strategies for VLA models. We first
introduce the Embedding Disruption Patch Attack (EDPA), a model-agnostic
adversarial attack that generates patches directly placeable within the
camera's view. In comparison to prior methods, EDPA can be readily applied to
different VLA models without requiring prior knowledge of the model
architecture, or the controlled robotic manipulator. EDPA constructs these
patches by (i) disrupting the semantic alignment between visual and textual
latent representations, and (ii) maximizing the discrepancy of latent
representations between adversarial and corresponding clean visual inputs.
Through the optimization of these objectives, EDPA distorts the VLA's
interpretation of visual information, causing the model to repeatedly generate
incorrect actions and ultimately result in failure to complete the given
robotic task. To counter this, we propose an adversarial fine-tuning scheme for
the visual encoder, in which the encoder is optimized to produce similar latent
representations for both clean and adversarially perturbed visual inputs.
Extensive evaluations on the widely recognized LIBERO robotic simulation
benchmark demonstrate that EDPA substantially increases the task failure rate
of cutting-edge VLA models, while our proposed defense effectively mitigates
this degradation. The codebase is accessible via the homepage at
https://edpa-attack.github.io/.
Ссылки и действия
Дополнительные ресурсы: