Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms
2510.25811v1
stat.ML, cs.LG, math.ST, stat.TH
2025-11-01
Authors:
William Réveillard, Richard Combes
Abstract
We consider a stochastic multi-armed bandit problem with i.i.d. rewards where
the expected reward function is multimodal with at most m modes. We propose the
first known computationally tractable algorithm for computing the solution to
the Graves-Lai optimization problem, which in turn enables the implementation
of asymptotically optimal algorithms for this bandit problem. The code for the
proposed algorithms is publicly available at
https://github.com/wilrev/MultimodalBandits
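
To make the "at most m modes" condition concrete, the following minimal sketch counts the modes of a vector of expected rewards. It rests on assumptions not stated in the abstract: the arms are arranged on a line, and a mode is a strict local maximum with respect to its immediate neighbours. The names `count_modes` and `has_at_most_m_modes` are hypothetical and are not taken from the authors' repository.

```python
from typing import Sequence


def count_modes(mu: Sequence[float]) -> int:
    """Count strict local maxima of mu, assuming arms lie on a line.

    Assumption: arm k is a mode if its expected reward is strictly larger
    than that of each existing neighbour (k - 1 and k + 1). Plateaus of
    equal values are therefore not counted as modes.
    """
    n = len(mu)
    modes = 0
    for k in range(n):
        higher_than_left = k == 0 or mu[k] > mu[k - 1]
        higher_than_right = k == n - 1 or mu[k] > mu[k + 1]
        if higher_than_left and higher_than_right:
            modes += 1
    return modes


def has_at_most_m_modes(mu: Sequence[float], m: int) -> bool:
    """Check the structural constraint: the reward vector has at most m modes."""
    return count_modes(mu) <= m
```

For example, under these assumptions the reward vector `[0.1, 0.5, 0.2, 0.7, 0.3]` has local maxima at the second and fourth arms, so it satisfies the constraint for m = 2 but not for m = 1.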