Reinforcement Learning with Action-Triggered Observations
2510.02149v1
cs.LG, math.OC, stat.ML, 68T05 (Primary), 62L05, 68W27 (Secondary)
2025-10-04
Авторы:
Alexander Ryabchenko, Wenlong Mou
Abstract
We study reinforcement learning problems where state observations are
stochastically triggered by actions, a constraint common in many real-world
applications. This framework is formulated as Action-Triggered Sporadically
Traceable Markov Decision Processes (ATST-MDPs), where each action has a
specified probability of triggering a state observation. We derive tailored
Bellman optimality equations for this framework and introduce the
action-sequence learning paradigm in which agents commit to executing a
sequence of actions until the next observation arrives. Under the linear MDP
assumption, value-functions are shown to admit linear representations in an
induced action-sequence feature map. Leveraging this structure, we propose
off-policy estimators with statistical error guarantees for such feature maps
and introduce ST-LSVI-UCB, a variant of LSVI-UCB adapted for action-triggered
settings. ST-LSVI-UCB achieves regret $\widetilde
O(\sqrt{Kd^3(1-\gamma)^{-3}})$, where $K$ is the number of episodes, $d$ the
feature dimension, and $\gamma$ the discount factor (per-step episode
non-termination probability). Crucially, this work establishes the theoretical
foundation for learning with sporadic, action-triggered observations while
demonstrating that efficient learning remains feasible under such observation
constraints.