R2L: Reliable Reinforcement Learning: Guaranteed Return & Reliable Policies in Reinforcement Learning
2510.18074v1
cs.LG, cs.AI, math.OC
2025-10-23
Authors:
Nadir Farhi
Abstract
In this work, we address the problem of determining reliable policies in
reinforcement learning (RL), with a focus on optimization under uncertainty and
the need for performance guarantees. While classical RL algorithms aim to
maximize the expected return, many real-world applications (such as routing,
resource allocation, or sequential decision-making under risk) require
strategies that ensure not only high average performance but also a guaranteed
probability of success. To this end, we propose a novel formulation in which
the objective is to maximize the probability that the cumulative return exceeds
a prescribed threshold. We demonstrate that this reliable RL problem can be
reformulated, via a state-augmented representation, into a standard RL problem,
thereby allowing the use of existing RL and deep RL algorithms without the need
for entirely new algorithmic frameworks. Theoretical results establish the
equivalence of the two formulations and show that reliable strategies can be
derived by appropriately adapting well-known methods such as Q-learning or
Dueling Double DQN. To illustrate the practical relevance of the approach, we
consider the problem of reliable routing, where the goal is not to minimize the
expected travel time but rather to maximize the probability of reaching the
destination within a given time budget. Numerical experiments confirm that the
proposed formulation leads to policies that effectively balance efficiency and
reliability, highlighting the potential of reliable RL for applications in
stochastic and safety-critical environments.
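
The state-augmented reformulation described in the abstract can be illustrated on a toy routing problem. The sketch below is not the paper's implementation. It assumes that the augmented state is the pair (node, remaining time budget) and that the only reward is 1 on reaching the destination within budget, so that the expected return of a policy equals its probability of success. For determinism it uses exact backward recursion in place of Q-learning or Dueling Double DQN. The network `EDGES` and all function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical toy network (an assumption, not from the paper):
# EDGES[node][action] = (next_node, [(travel_time, probability), ...])
EDGES = {
    "A": {
        "direct": ("D", [(4, 0.5), (10, 0.5)]),  # mean time 7, but risky
        "via_B": ("B", [(3, 1.0)]),              # mean total time 8, no risk
    },
    "B": {
        "to_D": ("D", [(5, 1.0)]),
    },
}
DEST = "D"


@lru_cache(maxsize=None)
def success_prob(node, budget):
    """Optimal value of the augmented state (node, budget): the maximum
    probability of reaching DEST before the remaining budget runs out."""
    if budget < 0:
        return 0.0  # budget exceeded: failure, no reward
    if node == DEST:
        return 1.0  # arrived within budget: terminal reward 1
    return max(
        (q_value(node, budget, action) for action in EDGES.get(node, {})),
        default=0.0,
    )


def q_value(node, budget, action):
    """Action value in the augmented MDP: expected success probability
    after taking `action`, with the budget reduced by the travel time."""
    next_node, outcomes = EDGES[node][action]
    return sum(p * success_prob(next_node, budget - t) for t, p in outcomes)


def best_action(node, budget):
    """Greedy (reliability-maximizing) action for an augmented state."""
    return max(EDGES[node], key=lambda a: q_value(node, budget, a))


def expected_time(node):
    """Classical objective for comparison: minimum expected travel time.
    The recursion terminates because the toy network is acyclic."""
    if node == DEST:
        return 0.0
    return min(
        sum(p * (t + expected_time(nxt)) for t, p in outcomes)
        for nxt, outcomes in EDGES[node].values()
    )


if __name__ == "__main__":
    print("min expected time from A:", expected_time("A"))
    for budget in (5, 8):
        print(budget, best_action("A", budget), success_prob("A", budget))
```

In this toy network the direct link minimizes expected travel time, yet with a budget of 8 the reliable choice is the detour through B, while with a budget of 5 only the direct link has any chance of success. The optimal action therefore depends on the remaining budget, which is why the budget is carried in the state.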