Exploration via Feature Perturbation in Contextual Bandits
2510.17390v1
cs.LG, stat.ML
2025-10-22
Авторы:
Seouh-won Yi, Min-hwan Oh
Abstract
We propose feature perturbation, a simple yet powerful technique that injects
randomness directly into feature inputs, instead of randomizing unknown
parameters or adding noise to rewards. Remarkably, this algorithm achieves
$\tilde{\mathcal{O}}(d\sqrt{T})$ worst-case regret bound for generalized linear
bandits, while avoiding the $\tilde{\mathcal{O}}(d^{3/2}\sqrt{T})$ regret
typical of existing randomized bandit algorithms. Because our algorithm eschews
parameter sampling, it is both computationally efficient and naturally extends
to non-parametric or neural network models. We verify these advantages through
empirical evaluations, demonstrating that feature perturbation not only
surpasses existing methods but also unifies strong practical performance with
best-known theoretical guarantees.
Ссылки и действия
Дополнительные ресурсы: