PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System
2510.11072v1
cs.RO, cs.AI, cs.LG, cs.SY, eess.SY
2025-10-15
Authors:
Huayi Wang, Wentao Zhang, Runyi Yu, Tao Huang, Junli Ren, Feiyu Jia, Zirui Wang, Xiaojie Niu, Xiao Chen, Jiahe Chen, Qifeng Chen, Jingbo Wang, Jiangmiao Pang
Abstract
Deploying humanoid robots to interact with real-world environments--such as
carrying objects or sitting on chairs--requires generalizable, lifelike motions
and robust scene perception. Although prior approaches have advanced each
capability individually, combining them in a unified system remains an open
challenge. In this work, we present a physical-world humanoid-scene interaction
system, PhysHSI, that enables humanoids to autonomously perform diverse
interaction tasks while maintaining natural and lifelike behaviors. PhysHSI
comprises a simulation training pipeline and a real-world deployment system. In
simulation, we adopt adversarial motion prior-based policy learning to imitate
natural humanoid-scene interaction data across diverse scenarios, achieving
both generalization and lifelike behaviors. For real-world deployment, we
introduce a coarse-to-fine object localization module that combines LiDAR and
camera inputs to provide continuous and robust scene perception. We validate
PhysHSI on four representative interactive tasks--box carrying, sitting, lying,
and standing up--in both simulation and real-world settings, demonstrating
consistently high success rates, strong generalization across diverse task
goals, and natural motion patterns.
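
The abstract does not state how the adversarial motion prior enters the training objective. As a rough illustration of the general technique only, the sketch below uses the least-squares style reward from the original AMP formulation (Peng et al., 2021), in which a discriminator scores state transitions and the policy is rewarded for transitions that resemble the reference motion data. The function names, the equal task/style weighting, and the use of NumPy are assumptions made for this example and are not taken from PhysHSI.

```python
import numpy as np


def amp_style_reward(disc_output):
    """Style reward from a least-squares discriminator score.

    Follows the form used in the original AMP paper:
        r = max(0, 1 - 0.25 * (d - 1)^2)
    where d is the discriminator output for a state transition
    (d close to 1 means "looks like reference motion").
    """
    d = np.asarray(disc_output, dtype=float)
    return np.maximum(0.0, 1.0 - 0.25 * (d - 1.0) ** 2)


def combined_reward(task_reward, disc_output, w_task=0.5, w_style=0.5):
    """Weighted sum of a task reward and the style reward.

    The weights are hypothetical placeholders, not values from the paper.
    """
    return w_task * np.asarray(task_reward, dtype=float) + w_style * amp_style_reward(disc_output)
```

A transition the discriminator scores at 1 receives the maximum style reward of 1, and the reward falls to 0 once the score drops to -1 or below, so the policy is pushed toward motions the discriminator cannot distinguish from the reference data while the task term still drives goal completion.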