Towards Robust Zero-Shot Reinforcement Learning
2510.15382v1
cs.LG, cs.AI, cs.RO
2025-10-21
Authors:
Kexin Zheng, Lauriane Teyssier, Yinan Zheng, Yu Luo, Xiayuan Zhan
Abstract
The recent development of zero-shot reinforcement learning (RL) has opened a
new avenue for learning pre-trained generalist policies that can adapt to
arbitrary new tasks in a zero-shot manner. While the popular Forward-Backward
representations (FB) and related methods have shown promise in zero-shot RL, we
empirically found that their modeling lacks expressivity and that extrapolation
errors caused by out-of-distribution (OOD) actions during offline learning
sometimes lead to biased representations, ultimately resulting in suboptimal
performance. To address these issues, we propose Behavior-REgularizEd Zero-shot
RL with Expressivity enhancement (BREEZE), an upgraded FB-based framework that
simultaneously enhances learning stability, policy extraction capability, and
representation learning quality. BREEZE introduces behavioral regularization in
zero-shot RL policy learning, transforming policy optimization into a stable
in-sample learning paradigm. Additionally, BREEZE extracts the policy using a
task-conditioned diffusion model, enabling the generation of high-quality and
multimodal action distributions in zero-shot RL settings. Moreover, BREEZE
employs expressive attention-based architectures for representation modeling to
capture the complex relationships within the environment dynamics. Extensive
experiments on ExORL and D4RL Kitchen demonstrate that BREEZE achieves the best
or near-best performance while exhibiting superior robustness compared to
prior offline zero-shot RL methods. The official implementation is available
at: https://github.com/Whiterrrrr/BREEZE.