Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation
2510.11689v1
cs.RO, cs.AI
2025-10-15
Authors:
Maggie Wang, Stephen Tian, Aiden Swann, Ola Shorinwa, Jiajun Wu, Mac Schwager
Abstract
Learning robotic manipulation policies directly in the real world can be
expensive and time-consuming. While reinforcement learning (RL) policies
trained in simulation present a scalable alternative, effective sim-to-real
transfer remains challenging, particularly for tasks that require precise
dynamics. To address this, we propose Phys2Real, a real-to-sim-to-real RL
pipeline that combines vision-language model (VLM)-inferred physical parameter
estimates with interactive adaptation through uncertainty-aware fusion. Our
approach consists of three core components: (1) high-fidelity geometric
reconstruction with 3D Gaussian splatting, (2) VLM-inferred prior distributions
over physical parameters, and (3) online physical parameter estimation from
interaction data. Phys2Real conditions policies on interpretable physical
parameters, refining VLM predictions with online estimates via ensemble-based
uncertainty quantification. On planar pushing tasks of a T-block with varying
center of mass (CoM) and a hammer with an off-center mass distribution,
Phys2Real achieves substantial improvements over a domain randomization
baseline: 100% vs 79% success rate for the bottom-weighted T-block, 57% vs 23%
for the challenging top-weighted T-block, and 15% faster average task completion
for hammer pushing. Ablation studies indicate that the combination of VLM and
interaction information is essential for success. Project website:
https://phys2real.github.io/
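The abstract does not state the exact fusion rule, so the following is only a minimal sketch of one common way to combine a prior with an online estimate under uncertainty: inverse-variance (product-of-Gaussians) weighting. It assumes the VLM prior and the online estimate can each be summarized as a Gaussian over a scalar parameter (for example, a CoM offset), and that the online variance comes from the spread of an ensemble's predictions. The function names `ensemble_estimate` and `fuse_gaussian` and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean, pvariance


def ensemble_estimate(predictions, min_var=1e-12):
    """Summarize ensemble predictions of one parameter as (mean, variance).

    Assumption: disagreement between ensemble members is used as the
    uncertainty of the online estimate. The variance is floored at
    min_var so it can safely be inverted later.
    """
    mu = mean(predictions)
    var = max(pvariance(predictions, mu), min_var)
    return mu, var


def fuse_gaussian(mu_prior, var_prior, mu_online, var_online):
    """Inverse-variance fusion of two Gaussian estimates.

    The estimate with the smaller variance receives the larger weight,
    and the fused variance is never larger than either input variance.
    """
    w_prior = 1.0 / var_prior
    w_online = 1.0 / var_online
    var_fused = 1.0 / (w_prior + w_online)
    mu_fused = var_fused * (w_prior * mu_prior + w_online * mu_online)
    return mu_fused, var_fused


if __name__ == "__main__":
    # Hypothetical VLM prior over a CoM offset (metres): mean and variance.
    vlm_mu, vlm_var = 0.00, 0.03 ** 2
    # Hypothetical per-member predictions from an online estimator ensemble.
    online_mu, online_var = ensemble_estimate([0.021, 0.018, 0.025, 0.020])
    print(fuse_gaussian(vlm_mu, vlm_var, online_mu, online_var))
```

Under this rule, a wide VLM prior dominates early, when the ensemble members disagree, and the online estimate takes over as interaction data makes the ensemble more confident.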