Goal-oriented Backdoor Attack against Vision-Language-Action Models via Physical Objects
2510.09269v1
cs.CR, cs.CV, cs.LG
2025-10-14
Авторы:
Zirun Zhou, Zhengyang Xiao, Haochuan Xu, Jing Sun, Di Wang, Jingfeng Zhang
Abstract
Recent advances in vision-language-action (VLA) models have greatly improved
embodied AI, enabling robots to follow natural language instructions and
perform diverse tasks. However, their reliance on uncurated training datasets
raises serious security concerns. Existing backdoor attacks on VLAs mostly
assume white-box access and result in task failures instead of enforcing
specific actions. In this work, we reveal a more practical threat: attackers
can manipulate VLAs by simply injecting physical objects as triggers into the
training dataset. We propose goal-oriented backdoor attacks (GoBA), where the
VLA behaves normally in the absence of physical triggers but executes
predefined and goal-oriented actions in the presence of physical triggers.
Specifically, based on a popular VLA benchmark LIBERO, we introduce BadLIBERO
that incorporates diverse physical triggers and goal-oriented backdoor actions.
In addition, we propose a three-level evaluation that categorizes the victim
VLA's actions under GoBA into three states: nothing to do, try to do, and
success to do. Experiments show that GoBA enables the victim VLA to
successfully achieve the backdoor goal in 97 percentage of inputs when the
physical trigger is present, while causing zero performance degradation on
clean inputs. Finally, by investigating factors related to GoBA, we find that
the action trajectory and trigger color significantly influence attack
performance, while trigger size has surprisingly little effect. The code and
BadLIBERO dataset are accessible via the project page at
https://goba-attack.github.io/.
Ссылки и действия
Дополнительные ресурсы: