Boundary-to-Region Supervision for Offline Safe Reinforcement Learning
2509.25727v1
cs.LG, cs.AI, cs.RO
2025-10-02
Authors:
Huikang Su, Dengyun Peng, Zifeng Zhuang, YuHan Liu, Qiguang Chen, Donglin Wang, Qinghe Liu
Abstract
Offline safe reinforcement learning aims to learn policies that satisfy
predefined safety constraints from static datasets. Existing
sequence-model-based methods condition action generation on symmetric input
tokens for return-to-go and cost-to-go, neglecting their intrinsic asymmetry:
return-to-go (RTG) serves as a flexible performance target, while cost-to-go
(CTG) should represent a rigid safety boundary. This symmetric conditioning
leads to unreliable constraint satisfaction, especially when encountering
out-of-distribution cost trajectories. To address this, we propose
Boundary-to-Region (B2R), a framework that enables asymmetric conditioning
through cost signal realignment. B2R redefines CTG as a boundary constraint
under a fixed safety budget, unifying the cost distribution of all feasible
trajectories while preserving reward structures. Combined with rotary
positional embeddings, it enhances exploration within the safe region.
Experimental results show that B2R satisfies safety constraints in 35 out of 38
safety-critical tasks while achieving superior reward performance over baseline
methods. This work highlights the limitations of symmetric token conditioning
and establishes a new theoretical and practical approach for applying sequence
models to safe RL. Our code is available at https://github.com/HuikangSu/B2R.
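
To make the RTG/CTG asymmetry concrete, the sketch below computes both conditioning signals for a toy trajectory and then applies one possible reading of "cost signal realignment". It is an illustrative assumption, not the authors' implementation: the helper names `to_go` and `boundary_ctg`, the toy rewards and costs, and the rule "start every feasible trajectory at the full budget and subtract the cost already incurred" are hypothetical and are not taken from the B2R repository.

```python
from itertools import accumulate


def to_go(values):
    """Suffix sums: result[t] = values[t] + values[t+1] + ... + values[-1].

    Applied to rewards this gives return-to-go (RTG);
    applied to costs it gives the conventional cost-to-go (CTG).
    """
    return list(accumulate(reversed(values)))[::-1]


def boundary_ctg(costs, budget):
    """Hypothetical realignment (an assumption, not the paper's code).

    Every feasible trajectory starts from the same fixed budget, and the
    token at step t is the budget remaining before the step-t cost is paid.
    Trajectories whose total cost exceeds the budget return None.
    """
    if sum(costs) > budget:
        return None
    spent_before = list(accumulate(costs, initial=0.0))[:-1]
    return [budget - spent for spent in spent_before]


rewards = [1.0, 0.5, 2.0]
costs = [0.0, 1.0, 0.0]

rtg = to_go(rewards)                        # [3.5, 2.5, 2.0]
ctg = to_go(costs)                          # [1.0, 1.0, 0.0]
aligned = boundary_ctg(costs, budget=5.0)   # [5.0, 5.0, 4.0]
```

Under this assumed rule, two feasible trajectories with different total costs share the same initial cost token (the budget), while their RTG tokens remain unchanged, which is one way to read "unifying the cost distribution of all feasible trajectories while preserving reward structures".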