Optimal Arm Elimination Algorithms for Combinatorial Bandits
2510.23992v1
cs.LG, cs.IT, math.IT, stat.ML
2025-10-30
Authors:
Yuxiao Wen, Yanjun Han, Zhengyuan Zhou
Abstract
Combinatorial bandits extend the classical bandit framework to settings where
the learner selects multiple arms in each round, motivated by applications such
as online recommendation and assortment optimization. While extensions of upper
confidence bound (UCB) algorithms arise naturally in this context, adapting arm
elimination methods has proved more challenging. We introduce a novel
elimination scheme that partitions arms into three categories (confirmed,
active, and eliminated), and incorporates explicit exploration to update these
sets. We demonstrate the efficacy of our algorithm in two settings: the
combinatorial multi-armed bandit with general graph feedback, and the
combinatorial linear contextual bandit. In both cases, our approach achieves
near-optimal regret, whereas UCB-based methods can provably fail due to
insufficient explicit exploration. Matching lower bounds are also provided.