From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance
2510.14952v1
cs.RO, cs.CV
2025-10-18
Авторы:
Zhe Li, Cheng Chi, Yangyang Wei, Boan Zhu, Yibo Peng, Tao Huang, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang, Chang Xu
Abstract
Natural language offers a natural interface for humanoid robots, but existing
language-guided humanoid locomotion pipelines remain cumbersome and unreliable.
They typically decode human motion, retarget it to robot morphology, and then
track it with a physics-based controller. However, this multi-stage process is
prone to cumulative errors, introduces high latency, and yields weak coupling
between semantics and control. These limitations call for a more direct pathway
from language to action, one that eliminates fragile intermediate stages.
Therefore, we present RoboGhost, a retargeting-free framework that directly
conditions humanoid policies on language-grounded motion latents. By bypassing
explicit motion decoding and retargeting, RoboGhost enables a diffusion-based
policy to denoise executable actions directly from noise, preserving semantic
intent and supporting fast, reactive control. A hybrid causal
transformer-diffusion motion generator further ensures long-horizon consistency
while maintaining stability and diversity, yielding rich latent representations
for precise humanoid behavior. Extensive experiments demonstrate that RoboGhost
substantially reduces deployment latency, improves success rates and tracking
accuracy, and produces smooth, semantically aligned locomotion on real
humanoids. Beyond text, the framework naturally extends to other modalities
such as images, audio, and music, providing a general foundation for
vision-language-action humanoid systems.
Ссылки и действия
Дополнительные ресурсы: