Oracle-Efficient Combinatorial Semi-Bandits
2510.21431v1
stat.ML, cs.LG
2025-10-28
Authors:
Jung-hun Kim, Milan Vojnović, Min-hwan Oh
Abstract
We study the combinatorial semi-bandit problem where an agent selects a
subset of base arms and receives individual feedback. While this generalizes
the classical multi-armed bandit and has broad applicability, its scalability
is limited by the high cost of combinatorial optimization, requiring oracle
queries at every round. To tackle this, we propose oracle-efficient frameworks
that significantly reduce oracle calls while maintaining tight regret
guarantees. For the worst-case linear reward setting, our algorithms achieve
$\tilde{O}(\sqrt{T})$ regret using only $O(\log\log T)$ oracle queries. We also
propose covariance-adaptive algorithms that leverage noise structure for
improved regret, and extend our approach to general (non-linear) rewards.
Overall, our methods reduce oracle usage from linear to (doubly) logarithmic in
time, with strong theoretical guarantees.
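
The abstract does not say how the oracle calls are scheduled. As a rough illustration of why a doubly logarithmic number of calls can be enough, the sketch below builds a batch grid of the kind often used in batched bandit analyses, with endpoints t_k = ceil(T^(1 - 2^(-k))), and calls the oracle once per batch. The grid, the function name `batch_endpoints`, and the one-call-per-batch rule are assumptions made for illustration, not the authors' algorithm.

```python
import math


def batch_endpoints(T: int) -> list[int]:
    """Hypothetical batch grid: t_k = ceil(T ** (1 - 2 ** -k)), capped at T.

    Assumption for illustration only; the abstract does not specify a grid.
    """
    if T < 4:
        return [T]  # horizon too short for the grid to matter
    # About log2(log2(T)) grid points bring the last endpoint within a factor 2 of T.
    num_points = math.ceil(math.log2(math.log2(T))) + 1
    ends: list[int] = []
    for k in range(1, num_points + 1):
        t = min(T, math.ceil(T ** (1 - 2.0 ** -k)))
        if not ends or t > ends[-1]:  # keep endpoints strictly increasing
            ends.append(t)
    if ends[-1] != T:
        ends.append(T)  # final batch runs to the horizon
    return ends


if __name__ == "__main__":
    for horizon in (10**3, 10**6, 10**9):
        ends = batch_endpoints(horizon)
        # One oracle call at the start of each batch, so calls == number of batches.
        print(horizon, len(ends), ends)
```

Under this grid the number of batches, and so the number of oracle calls, grows like log log T, while a scheme that queries the oracle every round would make T calls.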