Online Optimization for Offline Safe Reinforcement Learning
2510.22027v1
cs.LG, cs.AI, stat.ML
2025-10-29
Authors:
Yassine Chemingui, Aryan Deshwal, Alan Fern, Thanh Nguyen-Tang, Janardhan Rao Doppa
Abstract
We study the problem of Offline Safe Reinforcement Learning (OSRL), where the
goal is to learn a reward-maximizing policy from fixed data under a cumulative
cost constraint. We propose a novel OSRL approach that frames the problem as a
minimax objective and solves it by combining offline RL with online
optimization algorithms. We prove the approximate optimality of this approach
when integrated with an approximate offline RL oracle and no-regret online
optimization. We also present a practical approximation that can be combined
with any offline RL algorithm, eliminating the need for offline policy
evaluation. Empirical results on the DSRL benchmark demonstrate that our method
reliably enforces safety constraints under stringent cost budgets, while
achieving high rewards. The code is available at
https://github.com/yassineCh/O3SRL.
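
As an illustration of the minimax idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a Lagrangian-style objective (reward minus a multiplier times cost), uses projected online gradient ascent as one possible no-regret update for the multiplier, and replaces the offline RL oracle with a hypothetical stub that picks from three made-up candidate policies. The names `CANDIDATES`, `toy_oracle`, and `run_primal_dual`, and all numeric values, are assumptions made for this example.

```python
from typing import Tuple

# Hypothetical candidate policies: (name, estimated reward, estimated cost).
# In practice these estimates would come from an offline RL algorithm.
CANDIDATES = [("A", 10.0, 8.0), ("B", 6.0, 2.0), ("C", 3.0, 0.5)]


def toy_oracle(lam: float) -> Tuple[str, float, float]:
    """Stand-in for an offline RL oracle: maximize reward - lam * cost."""
    return max(CANDIDATES, key=lambda p: p[1] - lam * p[2])


def run_primal_dual(budget: float, steps: int = 2000, eta: float = 0.05,
                    lam_max: float = 10.0) -> Tuple[float, float]:
    """Alternate oracle calls with online gradient ascent on the multiplier.

    Returns the average reward and average cost of the uniform mixture
    over all policies selected during the run.
    """
    lam = 0.0
    total_reward = 0.0
    total_cost = 0.0
    for _ in range(steps):
        _, reward, cost = toy_oracle(lam)
        total_reward += reward
        total_cost += cost
        # Raise lam when the cost exceeds the budget, lower it otherwise,
        # then project back onto [0, lam_max].
        lam = min(lam_max, max(0.0, lam + eta * (cost - budget)))
    return total_reward / steps, total_cost / steps
```

Calling `run_primal_dual(budget=3.0)` on this toy problem returns the average reward and cost of a mixture of the selected policies; the multiplier oscillates around the value at which the oracle switches between policies "A" and "B", so the averaged cost stays close to the budget.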