Bridging Embodiment Gaps: Deploying Vision-Language-Action Models on Soft Robots
2510.17369v1
cs.RO, cs.AI, cs.LG
2025-10-22
Авторы:
Haochen Su, Cristian Meo, Francesco Stella, Andrea Peirone, Kai Junge, Josie Hughes
Abstract
Robotic systems are increasingly expected to operate in human-centered,
unstructured environments where safety, adaptability, and generalization are
essential. Vision-Language-Action (VLA) models have been proposed as a language
guided generalized control framework for real robots. However, their deployment
has been limited to conventional serial link manipulators. Coupled by their
rigidity and unpredictability of learning based control, the ability to safely
interact with the environment is missing yet critical. In this work, we present
the deployment of a VLA model on a soft continuum manipulator to demonstrate
autonomous safe human-robot interaction. We present a structured finetuning and
deployment pipeline evaluating two state-of-the-art VLA models (OpenVLA-OFT and
$\pi_0$) across representative manipulation tasks, and show while
out-of-the-box policies fail due to embodiment mismatch, through targeted
finetuning the soft robot performs equally to the rigid counterpart. Our
findings highlight the necessity of finetuning for bridging embodiment gaps,
and demonstrate that coupling VLA models with soft robots enables safe and
flexible embodied AI in human-shared environments.
Ссылки и действия
Дополнительные ресурсы: