Pretraining in Actor-Critic Reinforcement Learning for Robot Motion Control
2510.12363v1
cs.RO, cs.LG
2025-10-16
Authors:
Jiale Fan, Andrei Cramariuc, Tifanny Portela, Marco Hutter
Abstract
The pretraining-finetuning paradigm has facilitated numerous transformative
advancements in artificial intelligence research in recent years. However, in
the domain of reinforcement learning (RL) for robot motion control, individual
skills are often learned from scratch despite the high likelihood that some
generalizable knowledge is shared across all task-specific policies belonging
to a single robot embodiment. This work aims to define a paradigm for
pretraining neural network models that encapsulate such knowledge and can
subsequently serve as a basis for warm-starting the RL process in classic
actor-critic algorithms, such as Proximal Policy Optimization (PPO). We begin
with a task-agnostic exploration-based data collection algorithm to gather
diverse, dynamic transition data, which is then used to train a Proprioceptive
Inverse Dynamics Model (PIDM) through supervised learning. The pretrained
weights are loaded into both the actor and critic networks to warm-start the
policy optimization of actual tasks. We systematically validate our proposed
method on seven distinct robot motion control tasks, showing significant
benefits of this initialization strategy. On average, our approach improves
sample efficiency by 40.1% and task performance by 7.5% compared to
random initialization. We further present key ablation studies and empirical
analyses that shed light on the mechanisms behind the effectiveness of our
method.
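
The warm-start step described in the abstract (pretrained weights loaded into both the actor and the critic) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which layers are transferred, so the sketch assumes a shared hidden "trunk" that is copied while each network keeps its own freshly initialized output head. All names (`init_layer`, `make_network`, `warm_start`) and dimensions are hypothetical, and a randomly initialized network stands in for a trained PIDM.

```python
import copy
import random


def init_layer(n_in, n_out, rng):
    """Return a randomly initialized dense layer: weight rows plus zero biases."""
    scale = 1.0 / n_in ** 0.5
    return {
        "W": [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)],
        "b": [0.0] * n_out,
    }


def make_network(n_in, n_hidden, n_out, rng):
    """Two-layer network split into a trunk (assumed transferable) and a head."""
    return {
        "trunk": init_layer(n_in, n_hidden, rng),
        "head": init_layer(n_hidden, n_out, rng),
    }


def warm_start(network, pretrained):
    """Copy the pretrained trunk into `network`; the head is left untouched.

    Assumption: only the trunk is shared between the pretrained model and the
    actor/critic. The trunk shapes must match for the copy to make sense.
    """
    src, dst = pretrained["trunk"], network["trunk"]
    if len(src["W"]) != len(dst["W"]) or len(src["W"][0]) != len(dst["W"][0]):
        raise ValueError("trunk shapes differ; cannot load pretrained weights")
    network["trunk"] = copy.deepcopy(src)
    return network


rng = random.Random(0)
OBS_DIM, HIDDEN_DIM, ACT_DIM = 8, 16, 4  # hypothetical sizes

# Stand-in for a PIDM trained by supervised learning on exploration data.
pidm = make_network(OBS_DIM, HIDDEN_DIM, ACT_DIM, rng)

# Actor outputs actions, critic outputs a scalar value; both reuse the trunk.
actor = warm_start(make_network(OBS_DIM, HIDDEN_DIM, ACT_DIM, rng), pidm)
critic = warm_start(make_network(OBS_DIM, HIDDEN_DIM, 1, rng), pidm)
```

The trunk is deep-copied so that later policy-gradient updates to the actor or critic would not modify the pretrained model or each other. Whether the actual method shares all layers, a subset, or handles the differing output sizes another way is not stated in the abstract.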