High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
2511.03952v1
stat.ML, cs.LG
2025-11-08
Авторы:
Aukosh Jagannath, Taj Jones-McCormick, Varnan Sarangian
Abstract
We develop a high-dimensional scaling limit for Stochastic Gradient Descent
with Polyak Momentum (SGD-M) and adaptive step-sizes. This provides a framework
to rigourously compare online SGD with some of its popular variants. We show
that the scaling limits of SGD-M coincide with those of online SGD after an
appropriate time rescaling and a specific choice of step-size. However, if the
step-size is kept the same between the two algorithms, SGD-M will amplify
high-dimensional effects, potentially degrading performance relative to online
SGD. We demonstrate our framework on two popular learning problems: Spiked
Tensor PCA and Single Index Models. In both cases, we also examine online SGD
with an adaptive step-size based on normalized gradients. In the
high-dimensional regime, this algorithm yields multiple benefits: its dynamics
admit fixed points closer to the population minimum and widens the range of
admissible step-sizes for which the iterates converge to such solutions. These
examples provide a rigorous account, aligning with empirical motivation, of how
early preconditioners can stabilize and improve dynamics in settings where
online SGD fails.
Ссылки и действия
Дополнительные ресурсы: