Expressive Value Learning for Scalable Offline Reinforcement Learning

2510.08218v1 cs.LG, cs.AI, I.2.6 2025-10-11
Authors:

Nicolas Espinosa-Dice, Kiante Brantley, Wen Sun

Abstract

Reinforcement learning (RL) is a powerful paradigm for learning to make sequences of decisions. However, RL has yet to be fully leveraged in robotics, principally due to its lack of scalability. Offline RL offers a promising avenue by training agents on large, diverse datasets, avoiding the costly real-world interactions of online RL. Scaling offline RL to increasingly complex datasets requires expressive generative models such as diffusion and flow matching. However, existing methods typically depend on either backpropagation through time (BPTT), which is computationally prohibitive, or policy distillation, which introduces compounding errors and limits scalability to larger base policies. In this paper, we consider the question of how to develop a scalable offline RL approach without relying on distillation or backpropagation through time. We introduce Expressive Value Learning for Offline Reinforcement Learning (EVOR): a scalable offline RL approach that integrates both expressive policies and expressive value functions. EVOR learns an optimal, regularized Q-function via flow matching during training. At inference time, EVOR performs policy extraction via rejection sampling against the expressive value function, enabling efficient optimization, regularization, and compute-scalable search without retraining. Empirically, we show that EVOR outperforms baselines on a diverse set of offline RL tasks, demonstrating the benefit of integrating expressive value learning into offline RL.
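The inference-time extraction step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses softmax-weighted resampling over a batch of candidates as a simple stand-in for rejection sampling, and `base_policy_sample`, `q_value`, `extract_action`, the Gaussian base policy, and the quadratic Q-function are all hypothetical placeholders chosen only so the example runs on its own.

```python
import numpy as np

def base_policy_sample(state, n, rng):
    # Hypothetical stand-in for an expressive base policy (e.g. a flow or
    # diffusion model): here just Gaussian noise centred on the state.
    return state + rng.normal(size=(n, state.shape[0]))

def q_value(state, actions):
    # Hypothetical toy Q-function: larger when the action is near -state.
    target = -state
    return -np.sum((actions - target) ** 2, axis=1)

def extract_action(state, n_candidates=64, temperature=1.0, rng=None):
    """Pick one action from base-policy candidates, favouring high Q-values."""
    rng = np.random.default_rng() if rng is None else rng
    candidates = base_policy_sample(state, n_candidates, rng)
    q = q_value(state, candidates)
    if temperature == 0.0:
        # Greedy limit: return the highest-valued candidate (best-of-N).
        return candidates[np.argmax(q)]
    # Softmax weights over candidates; subtract the max for numerical stability.
    weights = np.exp((q - q.max()) / temperature)
    weights /= weights.sum()
    return candidates[rng.choice(n_candidates, p=weights)]

state = np.array([0.5, -1.0])
action = extract_action(state, n_candidates=128, temperature=0.5,
                        rng=np.random.default_rng(0))
```

In this sketch, raising `n_candidates` spends more inference compute on search, while `temperature` controls how far the selected action may drift from the base policy's samples. Neither setting requires retraining the stand-in models.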
