Lipschitz Bandits with Stochastic Delayed Feedback
2510.00309v1
cs.LG, stat.ML
2025-10-04
Authors:
Zhongxuan Liu, Yue Kang, Thomas C. M. Lee
Abstract
The Lipschitz bandit problem extends stochastic bandits to a continuous
action set defined over a metric space, where the expected reward function
satisfies a Lipschitz condition. In this work, we introduce the problem of
Lipschitz bandits in the presence of stochastic delayed feedback, where the
rewards are not observed immediately but after a random delay. We consider both
bounded and unbounded stochastic delays, and design algorithms that attain
sublinear regret guarantees in each setting. For bounded delays, we propose a
delay-aware zooming algorithm that retains the optimal performance of the
delay-free setting up to an additional term that scales with the maximal delay
$\tau_{\max}$. For unbounded delays, we propose a novel phased learning
strategy that accumulates reliable feedback over carefully scheduled intervals,
and establish a regret lower bound showing that our method is nearly optimal up
to logarithmic factors. Finally, we present experimental results to demonstrate
the efficiency of our algorithms under various delay scenarios.
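For readers new to the setting, the standard formulation behind the abstract can be sketched as follows. The notation here ($\mathcal{X}$, $\mathcal{D}$, $\mu$, $x_t$, $y_t$, $d_t$, $R(T)$) is assumed for illustration and may differ from the paper's own. In particular, treating the reward of round $t$ as observed at round $t + d_t$ is one common convention for delayed feedback, not necessarily the authors' exact model.

```latex
% Assumed notation: (\mathcal{X}, \mathcal{D}) is the metric action space,
% \mu the expected reward function, x_t the action played at round t.

% Lipschitz condition on the expected reward:
\[
  |\mu(x) - \mu(x')| \le \mathcal{D}(x, x')
  \qquad \text{for all } x, x' \in \mathcal{X}.
\]

% Delayed feedback (assumed convention): the reward y_t of round t
% becomes available only at round t + d_t, where d_t is a random delay.
% In the bounded-delay case, d_t \le \tau_{\max}.

% Cumulative regret over T rounds:
\[
  R(T) = \sum_{t=1}^{T} \bigl( \mu^{*} - \mu(x_t) \bigr),
  \qquad \mu^{*} = \sup_{x \in \mathcal{X}} \mu(x).
\]
```

Under this formulation, a sublinear regret guarantee means that $R(T)/T \to 0$ as $T \to \infty$, so the average per-round loss relative to the best action vanishes.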