The Adaptivity Barrier in Batched Nonparametric Bandits: Sharp Characterization of the Price of Unknown Margin
2511.03708v1
math.ST, cs.LG, stat.ML, stat.TH
2025-11-07
Authors:
Rong Jiang, Cong Ma
Abstract
We study batched nonparametric contextual bandits under a margin condition
when the margin parameter $\alpha$ is unknown. To capture the statistical price
of this ignorance, we introduce the regret inflation criterion, defined as the
ratio between the regret of an adaptive algorithm and that of an oracle knowing
$\alpha$. We show that the optimal regret inflation grows polynomially with the
horizon $T$, with exponent precisely given by the value of a convex
optimization problem involving the dimension, smoothness, and batch budget.
Moreover, the minimizers of this optimization problem directly prescribe the
batch allocation and exploration strategy of a rate-optimal algorithm. Building
on this principle, we develop RoBIN (RObust batched algorithm with adaptive
BINning), which achieves the optimal regret inflation up to logarithmic
factors. These results reveal a new adaptivity barrier: under batching,
adaptation to an unknown margin parameter inevitably incurs a polynomial
penalty, sharply characterized by a variational problem. Remarkably, this
barrier vanishes when the number of batches exceeds $\log \log T$; with only a
doubly logarithmic number of updates, one can recover the oracle regret rate up
to polylogarithmic factors.
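
The regret inflation criterion described above can be written schematically as follows. This is only a sketch: the symbols $\pi$, $R_T(\pi;\alpha)$, $R_T^{\star}(\alpha)$, $\psi$, $d$, $\beta$, and $M$ are notation assumed here for illustration, and taking the supremum over $\alpha$ is one plausible reading of the criterion, not a definition quoted from the paper.

$$
\mathrm{RI}_T(\pi) \;=\; \sup_{\alpha} \frac{R_T(\pi;\alpha)}{R_T^{\star}(\alpha)},
\qquad
\inf_{\pi} \mathrm{RI}_T(\pi) \;\asymp\; T^{\psi(d,\beta,M)} \quad \text{(up to logarithmic factors)},
$$

Here $R_T(\pi;\alpha)$ stands for the regret over horizon $T$ of an algorithm $\pi$ that does not know $\alpha$, $R_T^{\star}(\alpha)$ for the regret of an oracle that knows $\alpha$, and $\psi(d,\beta,M)$ for the value of the convex optimization problem in the dimension $d$, smoothness $\beta$, and batch budget $M$. Under this reading, the final claim of the abstract says that the exponent $\psi$ drops to zero once $M$ exceeds $\log \log T$.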