Policy Transfer Ensures Fast Learning for Continuous-Time LQR with Entropy Regularization
2510.15165v1
cs.LG, math.OC
2025-10-21
Авторы:
Xin Guo, Zijiu Lyu
Abstract
Reinforcement Learning (RL) enables agents to learn optimal decision-making
strategies through interaction with an environment, yet training from scratch
on complex tasks can be highly inefficient. Transfer learning (TL), widely
successful in large language models (LLMs), offers a promising direction for
enhancing RL efficiency by leveraging pre-trained models.
This paper investigates policy transfer, a TL approach that initializes
learning in a target RL task using a policy from a related source task, in the
context of continuous-time linear quadratic regulators (LQRs) with entropy
regularization. We provide the first theoretical proof of policy transfer for
continuous-time RL, proving that a policy optimal for one LQR serves as a
near-optimal initialization for closely related LQRs, while preserving the
original algorithm's convergence rate. Furthermore, we introduce a novel policy
learning algorithm for continuous-time LQRs that achieves global linear and
local super-linear convergence. Our results demonstrate both theoretical
guarantees and algorithmic benefits of transfer learning in continuous-time RL,
addressing a gap in existing literature and extending prior work from discrete
to continuous time settings.
As a byproduct of our analysis, we derive the stability of a class of
continuous-time score-based diffusion models via their connection with LQRs.
Ссылки и действия
Дополнительные ресурсы: