Zero-Shot Policy Transfer in Reinforcement Learning using Buckingham's Pi Theorem
2510.08768v1
cs.LG, cs.RO
2025-10-14
Authors:
Francisco Pascoa, Ian Lalonde, Alexandre Girard
Abstract
Reinforcement learning (RL) policies often fail to generalize to new robots,
tasks, or environments with different physical parameters, a challenge that
limits their real-world applicability. This paper presents a simple, zero-shot
transfer method based on Buckingham's Pi Theorem to address this limitation.
The method adapts a pre-trained policy to new system contexts by scaling its
inputs (observations) and outputs (actions) through a dimensionless space,
requiring no retraining. The approach is evaluated against a naive transfer
baseline across three environments of increasing complexity: a simulated
pendulum, a physical pendulum for sim-to-real validation, and the
high-dimensional HalfCheetah. Results demonstrate that the scaled transfer
exhibits no loss of performance on dynamically similar contexts. Furthermore,
on non-similar contexts, the scaled policy consistently outperforms the naive
transfer, significantly expanding the volume of contexts where the original
policy remains effective. These findings demonstrate that dimensional analysis
provides a powerful and practical tool to enhance the robustness and
generalization of RL policies.
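
As an illustration of scaling observations and actions through a dimensionless space, here is a minimal sketch for a torque-controlled pendulum. It is not the authors' implementation. The context variables (mass, length, gravity), the scale factors sqrt(l/g) and m*g*l, and all names (`PendulumContext`, `scaled_transfer`, `toy_policy`) are assumptions chosen for the example, and the hand-written linear policy only stands in for a trained one.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PendulumContext:
    """Hypothetical physical context of a pendulum (assumed parameters)."""
    mass: float     # kg
    length: float   # m
    gravity: float  # m/s^2

    @property
    def time_scale(self) -> float:
        # Characteristic time sqrt(l / g), in seconds.
        return math.sqrt(self.length / self.gravity)

    @property
    def torque_scale(self) -> float:
        # Characteristic torque m * g * l, in N*m.
        return self.mass * self.gravity * self.length


def scaled_transfer(policy, source: PendulumContext, target: PendulumContext):
    """Wrap a policy trained in `source` so it can be queried in `target`.

    Observations go target units -> dimensionless -> source units;
    actions go source units -> dimensionless -> target units.
    """
    def target_policy(theta: float, theta_dot: float) -> float:
        # The angle is already dimensionless; only the velocity is rescaled.
        theta_dot_star = theta_dot * target.time_scale
        theta_dot_src = theta_dot_star / source.time_scale
        # Query the unchanged source policy in its own units.
        tau_src = policy(theta, theta_dot_src)
        # Map the torque back through the dimensionless space.
        tau_star = tau_src / source.torque_scale
        return tau_star * target.torque_scale

    return target_policy


def toy_policy(theta: float, theta_dot: float) -> float:
    # Placeholder for a pre-trained policy: a simple linear feedback law.
    return -2.0 * theta - 0.5 * theta_dot


source = PendulumContext(mass=1.0, length=1.0, gravity=9.81)
target = PendulumContext(mass=2.0, length=4.0, gravity=9.81)
transferred = scaled_transfer(toy_policy, source, target)
print(transferred(0.1, 1.0))
```

In this sketch the source policy is never modified; only the wrapper changes with the context. When the source and target contexts are identical, the wrapper reduces to the original policy.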