Low Rank Gradients and Where to Find Them
2510.01303v1
cs.LG, cs.AI, stat.ML
2025-10-04
Авторы:
Rishi Sonthalia, Michael Murray, Guido Montúfar
Abstract
This paper investigates low-rank structure in the gradients of the training
loss for two-layer neural networks while relaxing the usual isotropy
assumptions on the training data and parameters. We consider a spiked data
model in which the bulk can be anisotropic and ill-conditioned, we do not
require independent data and weight matrices and we also analyze both the
mean-field and neural-tangent-kernel scalings. We show that the gradient with
respect to the input weights is approximately low rank and is dominated by two
rank-one terms: one aligned with the bulk data-residue , and another aligned
with the rank one spike in the input data. We characterize how properties of
the training data, the scaling regime and the activation function govern the
balance between these two components. Additionally, we also demonstrate that
standard regularizers, such as weight decay, input noise and Jacobian
penalties, also selectively modulate these components. Experiments on synthetic
and real data corroborate our theoretical predictions.
Ссылки и действия
Дополнительные ресурсы: