Online Optimization for Offline Safe Reinforcement Learning

2510.22027v1 cs.LG, cs.AI, stat.ML 2025-10-29

Авторы:

Yassine Chemingui, Aryan Deshwal, Alan Fern, Thanh Nguyen-Tang, Janardhan Rao Doppa

Abstract

We study the problem of Offline Safe Reinforcement Learning (OSRL), where the goal is to learn a reward-maximizing policy from fixed data under a cumulative cost constraint. We propose a novel OSRL approach that frames the problem as a minimax objective and solves it by combining offline RL with online optimization algorithms. We prove the approximate optimality of this approach when integrated with an approximate offline RL oracle and no-regret online optimization. We also present a practical approximation that can be combined with any offline RL algorithm, eliminating the need for offline policy evaluation. Empirical results on the DSRL benchmark demonstrate that our method reliably enforces safety constraints under stringent cost budgets, while achieving high rewards. The code is available at https://github.com/yassineCh/O3SRL.

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

Online Optimization for Offline Safe Reinforcement Learning

Авторы:

Abstract

Ссылки и действия

Связанные статьи

Single-Round Scalable Analytic Federated Learning

Does Flatness imply Generalization for Logistic Loss in Univariate Two-Layer ReL...

Multi-view diffusion geometry using intertwined diffusion trajectories

A Diffusion Model Framework for Maximum Entropy Reinforcement Learning

Beyond Additivity: Sparse Isotonic Shapley Regression toward Nonlinear Explainab...

Навигация