Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators

2510.04354v1 cs.RO, cs.AI, cs.SY, eess.SY 2025-10-08

Authors:

Apurva Badithela, David Snyder, Lihan Zha, Joseph Mikhail, Matthew O'Kelly, Anushri Dixit, Anirudha Majumdar

Abstract

Rapid progress in imitation learning, foundation models, and large-scale datasets has led to robot manipulation policies that generalize to a wide range of tasks and environments. However, rigorous evaluation of these policies remains a challenge. In practice, robot policies are often evaluated on a small number of hardware trials without any statistical assurances. We present SureSim, a framework that augments large-scale simulation with relatively small-scale real-world testing to provide reliable inferences on the real-world performance of a policy. Our key idea is to formalize the problem of combining real and simulation evaluations as a prediction-powered inference problem, in which a small number of paired real and simulation evaluations are used to rectify bias in large-scale simulation. We then leverage non-asymptotic mean estimation algorithms to provide confidence intervals on mean policy performance. Using physics-based simulation, we evaluate both diffusion policy and multi-task fine-tuned \(\pi_0\) on a joint distribution of objects and initial conditions, and find that our approach saves over \(20\)-\(25\%\) of hardware evaluation effort to achieve similar bounds on policy performance.
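
The bias-rectification idea in the abstract can be illustrated with the standard prediction-powered point estimate of a mean: average the outcomes of the large simulation-only set, then add the average real-minus-simulation gap measured on the small paired set. The sketch below is only that point estimate, written under stated assumptions. The function name `ppi_mean` and its arguments are hypothetical, outcomes are assumed to be success indicators or scores in [0, 1], and the simulation-only trials are assumed to be drawn separately from the paired ones. It does not reproduce the non-asymptotic confidence intervals that SureSim provides.

```python
import numpy as np


def ppi_mean(real_paired, sim_paired, sim_extra):
    """Prediction-powered point estimate of mean real-world performance.

    real_paired: outcomes of the n hardware trials.
    sim_paired:  simulation outcomes for the same n conditions.
    sim_extra:   outcomes of N additional simulation-only trials.
    (All names are illustrative assumptions, not from the paper.)
    """
    real_paired = np.asarray(real_paired, dtype=float)
    sim_paired = np.asarray(sim_paired, dtype=float)
    sim_extra = np.asarray(sim_extra, dtype=float)
    if real_paired.shape != sim_paired.shape:
        raise ValueError("paired real and simulation outcomes must align")

    # Rectifier: average sim-to-real gap measured on the paired trials.
    rectifier = np.mean(real_paired - sim_paired)
    # Large-scale simulation mean, corrected by the rectifier.
    return float(np.mean(sim_extra) + rectifier)


if __name__ == "__main__":
    real = [1, 0, 1, 1]                     # 4 hardware trials
    sim_same = [1, 1, 1, 1]                 # simulator on the same 4 conditions
    sim_only = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]  # 10 extra simulation trials
    print(ppi_mean(real, sim_same, sim_only))
```

In this toy input the simulator is optimistic on one of the four paired conditions, so the rectifier pulls the simulation-only mean downward. If the simulator matched hardware exactly on the paired trials, the rectifier would be zero and the estimate would reduce to the simulation mean.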


Related articles

Integrating Legal and Logical Specifications in Perception, Prediction, and Plan...

Aircraft Collision Avoidance Systems: Technological Challenges and Solutions on ...

Ego-Vision World Model for Humanoid Contact Planning

PhysiAgent: An Embodied Agent Framework in Physical World

TranTac: Leveraging Transient Tactile Signals for Contact-Rich Robotic Manipulat...
