Trajectory Conditioned Cross-embodiment Skill Transfer
2510.07773v1
cs.RO, cs.AI
2025-10-11
Авторы:
YuHang Tang, Yixuan Lou, Pengfei Han, Haoming Song, Xinyi Ye, Dong Wang, Bin Zhao
Abstract
Learning manipulation skills from human demonstration videos presents a
promising yet challenging problem, primarily due to the significant embodiment
gap between human body and robot manipulators. Existing methods rely on paired
datasets or hand-crafted rewards, which limit scalability and generalization.
We propose TrajSkill, a framework for Trajectory Conditioned Cross-embodiment
Skill Transfer, enabling robots to acquire manipulation skills directly from
human demonstration videos. Our key insight is to represent human motions as
sparse optical flow trajectories, which serve as embodiment-agnostic motion
cues by removing morphological variations while preserving essential dynamics.
Conditioned on these trajectories together with visual and textual inputs,
TrajSkill jointly synthesizes temporally consistent robot manipulation videos
and translates them into executable actions, thereby achieving cross-embodiment
skill transfer. Extensive experiments are conducted, and the results on
simulation data (MetaWorld) show that TrajSkill reduces FVD by 39.6\% and KVD
by 36.6\% compared with the state-of-the-art, and improves cross-embodiment
success rate by up to 16.7\%. Real-robot experiments in kitchen manipulation
tasks further validate the effectiveness of our approach, demonstrating
practical human-to-robot skill transfer across embodiments.
Ссылки и действия
Дополнительные ресурсы: