Oracle-Guided Masked Contrastive Reinforcement Learning for Visuomotor Policies
2510.05692v1
cs.RO, cs.LG
2025-10-09
Authors:
Yuhang Zhang, Jiaping Xiao, Chao Yan, Mir Feroskhan
Abstract
A prevailing approach for learning visuomotor policies is to employ
reinforcement learning to map high-dimensional visual observations directly to
action commands. However, the combination of high-dimensional visual inputs and
agile maneuver outputs leads to long-standing challenges, including low sample
efficiency and significant sim-to-real gaps. To address these issues, we
propose Oracle-Guided Masked Contrastive Reinforcement Learning (OMC-RL), a
novel framework designed to improve the sample efficiency and asymptotic
performance of visuomotor policy learning. OMC-RL explicitly decouples the
learning process into two stages: an upstream representation learning stage and
a downstream policy learning stage. In the upstream stage, a masked Transformer
module is trained with temporal modeling and contrastive learning to extract
temporally-aware and task-relevant representations from sequential visual
inputs. After training, the learned encoder is frozen and used to extract
visual representations from consecutive frames, while the Transformer module is
discarded. In the downstream stage, an oracle teacher policy with privileged
access to global state information supervises the agent during early training,
providing informative guidance that accelerates policy learning. This
guidance is gradually reduced to allow independent exploration as training
progresses. Extensive experiments in simulated and real-world environments
demonstrate that OMC-RL achieves superior sample efficiency and asymptotic
policy performance, while also improving generalization across diverse and
perceptually complex scenarios.
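
The gradually reduced oracle guidance in the downstream stage can be illustrated with a small sketch. The abstract does not state the decay schedule or the form of the loss, so the linear annealing, the additive imitation term, and the names `guidance_weight`, `combined_loss`, `decay_steps`, and `w0` below are all assumptions made for illustration, not the authors' implementation.

```python
def guidance_weight(step: int, decay_steps: int, w0: float = 1.0) -> float:
    """Hypothetical schedule: linearly anneal the oracle-guidance weight from w0 to 0."""
    if decay_steps <= 0:
        raise ValueError("decay_steps must be positive")
    # Clamp progress to [0, 1] so the weight stays at 0 once decay_steps is reached.
    frac = min(max(step, 0) / decay_steps, 1.0)
    return w0 * (1.0 - frac)


def combined_loss(rl_loss: float, imitation_loss: float,
                  step: int, decay_steps: int) -> float:
    """Assumed objective: RL loss plus an oracle-imitation term with a decaying weight."""
    return rl_loss + guidance_weight(step, decay_steps) * imitation_loss
```

Under these assumptions, the imitation term dominates early training, when the agent benefits most from the oracle's privileged state information, and vanishes after `decay_steps`, leaving the agent to explore using the pure RL objective.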