MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation
2510.18316v1
cs.RO, cs.AI, cs.LG
2025-10-23
Authors:
Chengshu Li, Mengdi Xu, Arpit Bahety, Hang Yin, Yunfan Jiang, Huang Huang, Josiah Wong, Sujay Garlanka, Cem Gokmen, Ruohan Zhang, Weiyu Liu, Jiajun Wu, Roberto Martín-Martín, Li Fei-Fei
Abstract
Imitation learning from large-scale, diverse human demonstrations has proven
effective for training robots, but collecting such data is costly and
time-consuming. This challenge is amplified for multi-step bimanual mobile
manipulation, where humans must teleoperate both a mobile base and two
high-degree-of-freedom arms. Prior automated data generation frameworks have
addressed static bimanual manipulation by augmenting a few human demonstrations
in simulation, but they fall short for mobile settings due to two key
challenges: (1) determining base placement to ensure reachability, and (2)
positioning the camera to provide sufficient visibility for visuomotor
policies. To address these issues, we introduce MoMaGen, which formulates data
generation as a constrained optimization problem that enforces hard constraints
(e.g., reachability) while balancing soft constraints (e.g., visibility during
navigation). This formulation generalizes prior approaches and provides a
principled foundation for future methods. We evaluate MoMaGen on four
multi-step bimanual mobile manipulation tasks and show that it generates
significantly more diverse datasets than existing methods. Leveraging this
diversity, MoMaGen can train successful imitation learning policies from a
single source demonstration, and these policies can be fine-tuned with as few
as 40 real-world demonstrations to achieve deployment on physical robotic
hardware. More details are available at our project page: momagen.github.io.
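
The hard/soft split described in the abstract can be illustrated with a toy base-placement sampler. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 0.9 m reach radius, the sampling box, and the heading-based visibility cost are all hypothetical stand-ins. Candidates that violate the hard constraint (reachability) are rejected outright, and among the feasible ones the sampler keeps the pose with the lowest soft cost (how far the target is from the robot's heading).

```python
import math
import random


def reachable(base_xy, target_xy, max_reach=0.9):
    """Hard constraint (toy): the target must lie within max_reach metres of the base."""
    return math.dist(base_xy, target_xy) <= max_reach


def visibility_cost(base_xy, base_yaw, target_xy):
    """Soft constraint (toy): absolute angle between the base heading and the target bearing."""
    bearing = math.atan2(target_xy[1] - base_xy[1], target_xy[0] - base_xy[0])
    # Wrap the angular difference into [-pi, pi) before taking its magnitude.
    diff = (bearing - base_yaw + math.pi) % (2 * math.pi) - math.pi
    return abs(diff)


def sample_base_pose(target_xy, n_samples=200, seed=0):
    """Return (cost, (x, y, yaw)) for the best feasible sample, or None if none is feasible."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        # Hypothetical sampling region: a 3 m x 3 m box centred on the target.
        x = target_xy[0] + rng.uniform(-1.5, 1.5)
        y = target_xy[1] + rng.uniform(-1.5, 1.5)
        yaw = rng.uniform(-math.pi, math.pi)
        if not reachable((x, y), target_xy):
            continue  # hard constraint violated: discard the candidate
        cost = visibility_cost((x, y), yaw, target_xy)
        if best is None or cost < best[0]:
            best = (cost, (x, y, yaw))  # soft constraint: keep the lowest cost
    return best


result = sample_base_pose((2.0, 1.0))
```

Treating reachability as a filter and visibility as a cost mirrors the distinction the abstract draws: a hard constraint can never be traded away, while a soft constraint is only optimized as far as the feasible set allows.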