Provable Anytime Ensemble Sampling Algorithms in Nonlinear Contextual Bandits
2510.10730v1
cs.LG, cs.AI, stat.ML
2025-10-16
Authors:
Jiazheng Sun, Weixin Wang, Pan Xu
Abstract
We provide a unified algorithmic framework for ensemble sampling in nonlinear
contextual bandits and instantiate it, with corresponding regret bounds, in the
two most common nonlinear contextual bandit settings: Generalized Linear
Ensemble Sampling (\texttt{GLM-ES}) for generalized linear bandits and Neural
Ensemble Sampling (\texttt{Neural-ES}) for neural contextual bandits. Both methods maintain
multiple estimators for the reward model parameters via maximum likelihood
estimation on randomly perturbed data. We prove high-probability frequentist
regret bounds of $\mathcal{O}(d^{3/2} \sqrt{T} + d^{9/2})$ for \texttt{GLM-ES}
and $\mathcal{O}(\widetilde{d} \sqrt{T})$ for \texttt{Neural-ES}, where $d$ is
the dimension of feature vectors, $\widetilde{d}$ is the effective dimension of
a neural tangent kernel matrix, and $T$ is the number of rounds. These regret
bounds match the state-of-the-art results for randomized exploration algorithms
in nonlinear contextual bandit settings. In the theoretical analysis, we
introduce techniques that address challenges specific to nonlinear models.
On the practical side, we remove the assumption of a known, fixed time horizon
by developing anytime versions of our algorithms, which apply when $T$ is not
known in advance. Finally, we
empirically evaluate \texttt{GLM-ES}, \texttt{Neural-ES}, and their anytime
variants, demonstrating strong performance. Overall, our results establish
ensemble sampling as a provable and practical randomized exploration approach
for nonlinear contextual bandits.
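The core idea described above, maintaining several estimators fitted by maximum likelihood on randomly perturbed data and acting greedily with respect to one of them chosen at random, can be illustrated with a minimal sketch for a logistic bandit (one instance of a generalized linear bandit). This sketch assumes a fixed finite arm set, a logistic link, Gaussian perturbations of each observed reward and of each member's regularization center, and a fixed number of warm-started gradient-descent steps per refit. All names (`fit_perturbed_mle`, `glm_ensemble_sampling`, `sigma`, `lam`, `m`) are hypothetical; the perturbation distributions, regularization, and ensemble size used by \texttt{GLM-ES} may differ, and this is not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic link mu(z) = 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_perturbed_mle(xs, ys, center, lam=1.0, iters=200, theta=None):
    """Regularized logistic MLE on (possibly perturbed) targets via gradient descent.

    Minimizes  sum_i [log(1 + exp(x_i . theta)) - y_i * (x_i . theta)]
               + (lam / 2) * ||theta - center||^2,
    which stays convex even when perturbed targets y_i fall outside [0, 1].
    """
    d = len(center)
    theta = list(center) if theta is None else list(theta)  # optional warm start
    # Step 1/L with L = lam + 0.25 * sum ||x_i||^2, an upper bound on the Hessian norm.
    step = 1.0 / (lam + 0.25 * sum(dot(x, x) for x in xs))
    for _ in range(iters):
        grad = [lam * (t - c) for t, c in zip(theta, center)]
        for x, y in zip(xs, ys):
            r = sigmoid(dot(x, theta)) - y
            for k in range(d):
                grad[k] += r * x[k]
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta


def glm_ensemble_sampling(arms, reward_fn, T, m=5, lam=1.0, sigma=0.5, seed=0):
    """Hypothetical sketch: m-member ensemble sampling for a logistic bandit."""
    rng = random.Random(seed)
    d = len(arms[0])
    # Each member gets its own random regularization center (a perturbed prior).
    centers = [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(m)]
    thetas = [list(c) for c in centers]
    xs = []                            # shared feature history
    targets = [[] for _ in range(m)]   # member-specific perturbed rewards
    chosen = []
    for _ in range(T):
        j = rng.randrange(m)  # pick one ensemble member uniformly at random
        # The link is increasing, so maximizing x . theta maximizes the mean reward.
        a = max(range(len(arms)), key=lambda i: dot(arms[i], thetas[j]))
        y = reward_fn(arms[a], rng)
        xs.append(arms[a])
        chosen.append(a)
        for k in range(m):
            # Same data for every member, but independent reward noise per member.
            targets[k].append(y + rng.gauss(0.0, sigma))
            thetas[k] = fit_perturbed_mle(xs, targets[k], centers[k], lam, theta=thetas[k])
    return chosen, thetas


# Usage: three arms in d = 2 with a hypothetical true parameter.
true_theta = [1.0, -1.0]
arms = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]


def bernoulli_reward(x, rng):
    # Bernoulli reward with mean sigmoid(x . true_theta).
    return 1.0 if rng.random() < sigmoid(dot(x, true_theta)) else 0.0


chosen, thetas = glm_ensemble_sampling(arms, bernoulli_reward, T=30)
```

The random choice of a member each round is what drives exploration: members disagree because they were fitted to differently perturbed data, and that disagreement shrinks as observations accumulate. Refitting every member from scratch each round is the simplest choice here; incremental or online updates would be cheaper in practice.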