Exact Dynamics of Multi-class Stochastic Gradient Descent
2510.14074v1
stat.ML, cs.LG, math.OC, math.PR, 60H30
2025-10-19
Authors:
Elizabeth Collins-Woodfin, Inbar Seroussi
Abstract
We develop a framework for analyzing the training and learning rate dynamics
on a variety of high-dimensional optimization problems trained using one-pass
stochastic gradient descent (SGD) with data generated from multiple anisotropic
classes. We give exact expressions for a large class of functions of the
limiting dynamics, including the risk and the overlap with the true signal, in
terms of a deterministic solution to a system of ODEs. We extend the existing
theory of high-dimensional SGD dynamics to Gaussian-mixture data and a large
(growing with the parameter size) number of classes. We then investigate in
detail the effect of the anisotropic structure of the covariance of the data in
the problems of binary logistic regression and least squares loss. We study
three cases: isotropic covariances, data covariance matrices with a large
fraction of zero eigenvalues (denoted as the zero-one model), and covariance
matrices with spectra following a power-law distribution. We show that there
exists a structural phase transition. In particular, we demonstrate that, for
the zero-one model and the power-law model with sufficiently large power, SGD
tends to align more closely with values of the class mean that are projected
onto the "clean directions" (i.e., directions of smaller variance). This is
supported by both numerical simulations and analytical studies, which show the
exact asymptotic behavior of the loss in the high-dimensional limit.
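
To make the zero-one setting concrete, the following is a minimal toy sketch, not the authors' code or their ODE system. It runs one-pass SGD for binary logistic regression on a balanced two-class Gaussian mixture with means ±mu and a diagonal zero-one covariance. The function names, the step-size scaling `lr / d`, the mean normalization, and all default parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def zero_one_spectrum(d, clean_fraction):
    """Diagonal covariance spectrum: 0 on 'clean' coordinates, 1 on the rest."""
    eig = np.ones(d)
    eig[: int(clean_fraction * d)] = 0.0
    return eig


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is zero."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def one_pass_sgd(d=200, steps=2000, lr=0.5, clean_fraction=0.5, seed=0):
    """One-pass SGD on logistic loss; each sample is drawn once and discarded."""
    rng = np.random.default_rng(seed)
    eig = zero_one_spectrum(d, clean_fraction)
    clean = eig == 0.0
    # Assumed normalization: class mean with norm of order one.
    mu = rng.standard_normal(d) / np.sqrt(d)
    w = np.zeros(d)
    for _ in range(steps):
        y = rng.integers(0, 2)  # label in {0, 1}
        # Sample x ~ N(+mu, diag(eig)) for y = 1 and N(-mu, diag(eig)) for y = 0.
        x = (2 * y - 1) * mu + np.sqrt(eig) * rng.standard_normal(d)
        p = 1.0 / (1.0 + np.exp(-(x @ w)))  # predicted probability of y = 1
        # Logistic-loss gradient step; the lr / d scaling is an assumption.
        w -= (lr / d) * (p - y) * x
    return {
        "overlap": float(w @ mu),
        "cos_clean": cosine(w[clean], mu[clean]),
        "cos_noisy": cosine(w[~clean], mu[~clean]),
    }


if __name__ == "__main__":
    print(one_pass_sgd())
```

In this toy setup every sample equals ±mu exactly on the clean coordinates, so the iterate restricted to those coordinates stays a positive multiple of the projected mean, while the noisy coordinates also accumulate gradient noise. The sketch only illustrates the notion of alignment with the mean projected onto clean directions; it does not reproduce the paper's phase transition or its asymptotic formulas.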