Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms

2510.25811v1 stat.ML, cs.LG, math.ST, stat.TH 2025-11-01

Authors:

William Réveillard, Richard Combes

Abstract

We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most m modes. We propose the first known computationally tractable algorithm for computing the solution to the Graves-Lai optimization problem, which in turn enables the implementation of asymptotically optimal algorithms for this bandit problem. The code for the proposed algorithms is publicly available at https://github.com/wilrev/MultimodalBandits
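The abstract states the problem's structural assumption: the expected reward function has at most m modes. The following Python sketch makes that condition concrete by counting the modes of a vector of expected rewards. It rests on illustrative assumptions that may not match the paper:

- the arms lie on a line, so arm k neighbours arms k-1 and k+1;
- a mode is an arm whose mean strictly exceeds the means of its neighbours.

The paper's general setting and its tie-handling convention may differ. The function names count_modes and has_at_most_m_modes are hypothetical and are not taken from the authors' repository.

```python
from typing import Sequence


def count_modes(mu: Sequence[float]) -> int:
    """Count strict local maxima of expected rewards indexed along a line.

    Illustrative assumption: arms form a path graph (arm k neighbours k-1
    and k+1), and a mode is an arm whose mean is strictly larger than the
    mean of every neighbour. Ties (plateaus) therefore produce no mode.
    """
    n = len(mu)
    modes = 0
    for k in range(n):
        # Boundary arms only have one neighbour to beat.
        beats_left = k == 0 or mu[k] > mu[k - 1]
        beats_right = k == n - 1 or mu[k] > mu[k + 1]
        if beats_left and beats_right:
            modes += 1
    return modes


def has_at_most_m_modes(mu: Sequence[float], m: int) -> bool:
    """Check the 'at most m modes' condition under the assumptions above."""
    return count_modes(mu) <= m


if __name__ == "__main__":
    mu = [0.1, 0.5, 0.3, 0.2, 0.6, 0.4]  # local peaks at arms 1 and 4
    print(count_modes(mu), has_at_most_m_modes(mu, 2))
```

Under the strict convention used here, a constant reward vector of length two or more has no modes at all. Any real implementation must therefore fix a tie-breaking rule explicitly.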
