Global Convergence of Policy Gradient for Entropy Regularized Linear-Quadratic Control with multiplicative noise
2510.02896v1
eess.SY, cs.AI, cs.SY, 37N35, 49N10
2025-10-07
Authors:
Gabriel Diaz, Lucky Li, Wenhao Zhang
Abstract
Reinforcement Learning (RL) has emerged as a powerful framework for
sequential decision-making in dynamic environments, particularly when system
parameters are unknown. This paper investigates RL-based control for
entropy-regularized Linear Quadratic control (LQC) problems with multiplicative
noise over an infinite time horizon. First, we adapt the Regularized Policy
Gradient (RPG) algorithm to stochastic optimal control settings, proving that
despite the non-convexity of the problem, RPG converges globally under
conditions of gradient domination and near-smoothness. Second, based on a
zero-order optimization approach, we introduce a novel model-free RL algorithm:
Sample-Based Regularized Policy Gradient (SB-RPG). SB-RPG operates without
knowledge of system parameters yet still retains strong theoretical guarantees
of global convergence. Our model leverages entropy regularization to accelerate
convergence and address the exploration versus exploitation trade-off inherent
in RL. Numerical simulations validate the theoretical results and demonstrate
the efficacy of SB-RPG in environments with unknown parameters.
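
To make the zero-order (derivative-free) idea concrete, the following is a minimal illustrative sketch and not the SB-RPG algorithm from the paper. It assumes a deterministic scalar system with a linear policy u = -k x, and it omits the multiplicative noise, the entropy regularization, and the randomized policies that the paper treats. The names `rollout_cost`, `zeroth_order_grad`, and `train`, and all numeric values, are hypothetical. The sketch only shows how a policy gain can be improved from observed costs alone, without using the system parameters inside the update rule.

```python
import math
import random


def rollout_cost(k, a=1.0, b=1.0, q=1.0, r=1.0, x0=1.0, horizon=200):
    """Truncated quadratic cost of the policy u = -k * x on x' = a*x + b*u.

    The parameters a, b, q, r are assumed toy values; the learner below
    only ever sees the returned cost, never these parameters.
    """
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost


def zeroth_order_grad(f, theta, radius, rng):
    """Two-point gradient estimate of f at theta using function values only."""
    d = len(theta)
    # Sample a direction uniformly on the unit sphere.
    direction = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in direction))
    direction = [v / norm for v in direction]
    plus = f([t + radius * v for t, v in zip(theta, direction)])
    minus = f([t - radius * v for t, v in zip(theta, direction)])
    scale = d * (plus - minus) / (2.0 * radius)
    return [scale * v for v in direction]


def train(k0=0.5, step=0.05, radius=0.01, iters=300, seed=0):
    """Gradient descent on the gain, driven by the zero-order estimate."""
    rng = random.Random(seed)
    theta = [k0]  # k0 must be stabilizing, i.e. |a - b*k0| < 1
    cost = lambda th: rollout_cost(th[0])
    for _ in range(iters):
        grad = zeroth_order_grad(cost, theta, radius, rng)
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta[0]


k_final = train()
```

In this toy setting the learned gain can be compared with the gain obtained from the scalar Riccati equation, which for a = b = q = r = 1 equals (sqrt(5) - 1) / 2. Such a check is only available here because the toy parameters are known; it is not part of the model-free procedure.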