On the hardness of RL with Lookahead
2510.19372v1
stat.ML, cs.LG
2025-10-24
Авторы:
Corentin Pla, Hugo Richard, Marc Abeille, Nadav Merlis, Vianney Perchet
Abstract
We study reinforcement learning (RL) with transition look-ahead, where the
agent may observe which states would be visited upon playing any sequence of
$\ell$ actions before deciding its course of action. While such predictive
information can drastically improve the achievable performance, we show that
using this information optimally comes at a potentially prohibitive
computational cost. Specifically, we prove that optimal planning with one-step
look-ahead ($\ell=1$) can be solved in polynomial time through a novel linear
programming formulation. In contrast, for $\ell \geq 2$, the problem becomes
NP-hard. Our results delineate a precise boundary between tractable and
intractable cases for the problem of planning with transition look-ahead in
reinforcement learning.
Ссылки и действия
Дополнительные ресурсы: