EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities
2510.27545v1
cs.RO, cs.AI
2025-11-04
Authors:
Travis Davies, Yiqi Huang, Alexi Gladstone, Yunxin Liu, Xiang Chen, Heng Ji, Huxian Liu, Luhui Hu
Abstract
Implicit policies parameterized by generative models, such as Diffusion
Policy, have become the standard for policy learning and Vision-Language-Action
(VLA) models in robotics. However, these approaches often suffer from high
computational cost, exposure bias, and unstable inference dynamics, which lead
to divergence under distribution shifts. Energy-Based Models (EBMs) address
these issues by learning energy landscapes end-to-end and modeling equilibrium
dynamics, offering improved robustness and reduced exposure bias. Yet, policies
parameterized by EBMs have historically struggled to scale effectively. Recent
work on Energy-Based Transformers (EBTs) demonstrates the scalability of EBMs
to high-dimensional spaces, but their potential for solving core challenges in
physically embodied models remains underexplored. We introduce a new
energy-based architecture, EBT-Policy, that solves core issues in robotic and
real-world settings. Across simulated and real-world tasks, EBT-Policy
consistently outperforms diffusion-based policies, while requiring less
training and inference computation. Remarkably, on some tasks it converges
within just two inference steps, a 50x reduction compared to Diffusion Policy's
100 steps. Moreover, EBT-Policy exhibits emergent capabilities not seen in prior
models, such as zero-shot recovery from failed action sequences using only
behavior cloning and without explicit retry training. By leveraging its scalar
energy for uncertainty-aware inference and dynamic compute allocation,
EBT-Policy offers a promising path toward robust, generalizable robot behavior
under distribution shifts.
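
As a rough illustration of what energy-based inference with dynamic compute allocation can look like, the sketch below refines an action by gradient descent on a scalar energy and stops as soon as the gradient is small. It is a minimal toy under stated assumptions, not the EBT-Policy implementation: the quadratic `toy_energy`, the function `minimize_energy`, and the parameters `step_size`, `max_steps`, and `tol` are all hypothetical names chosen for this example, and a learned model would replace the analytic energy and gradient.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def toy_energy(action: Vector, target: Vector) -> float:
    """Hypothetical stand-in for a learned energy: low when action is near target."""
    return 0.5 * sum((a - t) ** 2 for a, t in zip(action, target))


def toy_energy_grad(action: Vector, target: Vector) -> Vector:
    """Analytic gradient of toy_energy with respect to the action."""
    return [a - t for a, t in zip(action, target)]


def minimize_energy(
    energy: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    init_action: Vector,
    step_size: float = 0.5,   # assumed value, not from the paper
    max_steps: int = 100,     # upper bound on inference steps
    tol: float = 1e-3,        # stop once the gradient norm falls below this
) -> Tuple[Vector, float, int]:
    """Descend the energy landscape, spending only as many steps as needed."""
    action = list(init_action)
    steps = 0
    while steps < max_steps:
        g = grad(action)
        grad_norm = sum(x * x for x in g) ** 0.5
        if grad_norm < tol:
            break  # converged early: no further compute is spent
        action = [a - step_size * x for a, x in zip(action, g)]
        steps += 1
    # The final scalar energy can serve as a crude confidence signal:
    # a high value suggests the refined action is still a poor fit.
    return action, energy(action), steps


target = [1.0, -2.0]
action, final_energy, steps_used = minimize_energy(
    lambda a: toy_energy(a, target),
    lambda a: toy_energy_grad(a, target),
    init_action=[0.0, 0.0],
)
print(action, final_energy, steps_used)
```

The early-exit check is what makes the step count adapt to the input: an easy case stops after a few updates, while a hard one runs up to `max_steps`. Returning the final energy alongside the action is one simple way to expose an uncertainty signal to the caller.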