Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime
2510.26303v1
cs.LG, cs.AI, math.OC, stat.ML
2025-11-01
Authors:
Beomhan Baek, Minhak Song, Chulhee Yun
Abstract
Adam [Kingma and Ba, 2015] is the de facto optimizer in deep learning, yet
its theoretical understanding remains limited. Prior analyses show that Adam
favors solutions aligned with $\ell_\infty$-geometry, but these results are
restricted to the full-batch regime. In this work, we study the implicit bias
of incremental Adam (using one sample per step) for logistic regression on
linearly separable data, and we show that its bias can deviate from the
full-batch behavior. To illustrate this, we construct a class of structured
datasets where incremental Adam provably converges to the $\ell_2$-max-margin
classifier, in contrast to the $\ell_\infty$-max-margin bias of full-batch
Adam. For general datasets, we develop a proxy algorithm that captures the
limiting behavior of incremental Adam as $\beta_2 \to 1$, and we characterize
its convergence direction via a data-dependent dual fixed-point formulation.
Finally, we prove that, unlike Adam, Signum [Bernstein et al., 2018] converges
to the $\ell_\infty$-max-margin classifier for any batch size, provided its
momentum parameter $\beta$ is sufficiently close to 1. Overall, our results
highlight that the implicit bias of Adam crucially depends on both the batching
scheme and the dataset, whereas the bias of Signum is invariant to the batch
size.
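
To make the setting concrete, the following is a minimal sketch of the two update rules named in the abstract, applied with one sample per step to logistic regression. It is not the authors' code and does not reproduce their results. Several details are assumptions made for illustration: Adam is written without bias correction and with a small stabilizer `eps`, samples are visited in cyclic order, and the helper names (`logistic_grad`, `adam_step`, `signum_step`, `run_incremental`), the hyperparameters, and the two-point toy dataset `Z` are all hypothetical.

```python
import numpy as np


def logistic_grad(w, z):
    """Gradient of log(1 + exp(-<w, z>)) w.r.t. w, where z = y * x."""
    t = np.dot(w, z)
    # 1 / (1 + exp(t)), computed stably via logaddexp
    s = np.exp(-np.logaddexp(0.0, t))
    return -s * z


def adam_step(w, m, v, g, lr=0.01, beta1=0.9, beta2=0.99, eps=1e-12):
    """One Adam update (assumption: no bias correction, small eps)."""
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    w = w - lr * m / (np.sqrt(v) + eps)
    return w, m, v


def signum_step(w, m, g, lr=0.01, beta=0.9):
    """One Signum update: step along the sign of the momentum buffer."""
    m = beta * m + (1.0 - beta) * g
    w = w - lr * np.sign(m)
    return w, m


def run_incremental(Z, steps, rule):
    """Per-sample training: one row of Z per step, visited cyclically (assumption)."""
    n, d = Z.shape
    w, m, v = np.zeros(d), np.zeros(d), np.zeros(d)
    for t in range(steps):
        g = logistic_grad(w, Z[t % n])
        if rule == "adam":
            w, m, v = adam_step(w, m, v, g)
        else:
            w, m = signum_step(w, m, g)
    return w


# Hypothetical toy data: rows are z_i = y_i * x_i, separable by construction.
Z = np.array([[2.0, 1.0], [1.0, 3.0]])
w_adam = run_incremental(Z, 1000, "adam")
w_signum = run_incremental(Z, 1000, "signum")

# Compare the normalized directions reached by the two rules.
print("incremental Adam direction:  ", w_adam / np.linalg.norm(w_adam))
print("incremental Signum direction:", w_signum / np.linalg.norm(w_signum))
```

On this toy dataset every entry of `Z` is positive, so every per-sample gradient is negative in both coordinates and each Signum step moves both coordinates of `w` by exactly `lr`. The direction Adam reaches depends on the per-coordinate second-moment estimate `v`, which is the quantity that the batching scheme affects. Replacing `Z` with other separable data and increasing `steps` is a simple way to explore how the two directions differ.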