Realizable Abstractions: Near-Optimal Hierarchical Reinforcement Learning

arXiv:2512.04958v1 [cs.LG, cs.AI], 2025-12-05

Authors:

Roberto Cipollone, Luca Iocchi, Matteo Leonetti

Abstract

The main focus of Hierarchical Reinforcement Learning (HRL) is studying how large Markov Decision Processes (MDPs) can be more efficiently solved when addressed in a modular way, by combining partial solutions computed for smaller subtasks. Despite their very intuitive role for learning, most notions of MDP abstractions proposed in the HRL literature have limited expressive power or do not possess formal efficiency guarantees. This work addresses these fundamental issues by defining Realizable Abstractions, a new relation between generic low-level MDPs and their associated high-level decision processes. The notion we propose avoids non-Markovianity issues and has desirable near-optimality guarantees. Indeed, we show that any abstract policy for Realizable Abstractions can be translated into near-optimal policies for the low-level MDP, through a suitable composition of options. As demonstrated in the paper, these options can be expressed as solutions of specific constrained MDPs. Based on these findings, we propose RARL, a new HRL algorithm that returns compositional and near-optimal low-level policies, taking advantage of the Realizable Abstraction given as input. We show that RARL is Probably Approximately Correct, it converges in a polynomial number of samples, and it is robust to inaccuracies in the abstraction.
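The abstract states that an abstract policy is executed in the low-level MDP through a composition of options. The sketch below only illustrates that general mechanism, using the standard options framework (an option is a low-level policy plus a termination condition) on a hypothetical toy problem. The 9-state chain, the block abstraction, and the hand-coded "exit right" options are all assumptions made for illustration. This is not RARL, and it does not compute options as solutions of constrained MDPs, as the paper does.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical toy setup (not from the paper): a deterministic 1D chain of
# low-level states 0..N_STATES-1, grouped into abstract blocks of BLOCK states.
N_STATES = 9
BLOCK = 3
GOAL = N_STATES - 1


def abstraction(s: int) -> int:
    """Map a low-level state to its abstract (block) state."""
    return s // BLOCK


def step(s: int, a: int) -> int:
    """Deterministic low-level transition; a is -1 (left) or +1 (right)."""
    return min(max(s + a, 0), N_STATES - 1)


@dataclass
class Option:
    """A simplified option: a low-level policy and a termination condition."""
    policy: Callable[[int], int]        # low-level action in each state
    terminates: Callable[[int], bool]   # True when the option should stop


def make_exit_right(block: int) -> Option:
    # Hand-coded (assumed) option: move right until leaving `block`
    # or reaching the goal.
    return Option(
        policy=lambda s: +1,
        terminates=lambda s: abstraction(s) != block or s == GOAL,
    )


# Abstract policy: one option chosen for each abstract state.
abstract_policy: Dict[int, Option] = {
    b: make_exit_right(b) for b in range(abstraction(GOAL) + 1)
}


def run(s0: int, max_steps: int = 100) -> List[int]:
    """Execute the abstract policy in the low-level chain via its options."""
    s, trajectory = s0, [s0]
    while s != GOAL and len(trajectory) <= max_steps:
        option = abstract_policy[abstraction(s)]
        # Follow the selected option until its termination condition holds.
        while len(trajectory) <= max_steps:
            s = step(s, option.policy(s))
            trajectory.append(s)
            if option.terminates(s):
                break
    return trajectory


print(run(0))  # → [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Each abstract decision here is a choice of option, and control returns to the abstract policy only when the option terminates in a new abstract state. The paper's contribution concerns when such compositions are near-optimal, which this toy example does not attempt to show.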
