An Empirical Study of Lagrangian Methods in Safe Reinforcement Learning
2510.17564v1
cs.LG, cs.AI, cs.RO, cs.SY, eess.SY
2025-10-22
Авторы:
Lindsay Spoor, Álvaro Serra-Gómez, Aske Plaat, Thomas Moerland
Abstract
In safety-critical domains such as robotics, navigation and power systems,
constrained optimization problems arise where maximizing performance must be
carefully balanced with associated constraints. Safe reinforcement learning
provides a framework to address these challenges, with Lagrangian methods being
a popular choice. However, the effectiveness of Lagrangian methods crucially
depends on the choice of the Lagrange multiplier $\lambda$, which governs the
trade-off between return and constraint cost. A common approach is to update
the multiplier automatically during training. Although this is standard in
practice, there remains limited empirical evidence on the robustness of an
automated update and its influence on overall performance. Therefore, we
analyze (i) optimality and (ii) stability of Lagrange multipliers in safe
reinforcement learning across a range of tasks. We provide $\lambda$-profiles
that give a complete visualization of the trade-off between return and
constraint cost of the optimization problem. These profiles show the highly
sensitive nature of $\lambda$ and moreover confirm the lack of general
intuition for choosing the optimal value $\lambda^*$. Our findings additionally
show that automated multiplier updates are able to recover and sometimes even
exceed the optimal performance found at $\lambda^*$ due to the vast difference
in their learning trajectories. Furthermore, we show that automated multiplier
updates exhibit oscillatory behavior during training, which can be mitigated
through PID-controlled updates. However, this method requires careful tuning to
achieve consistently better performance across tasks. This highlights the need
for further research on stabilizing Lagrangian methods in safe reinforcement
learning. The code used to reproduce our results can be found at
https://github.com/lindsayspoor/Lagrangian_SafeRL.