Neural Collapse under Gradient Flow on Shallow ReLU Networks for Orthogonally Separable Data
2510.21078v1
cs.LG, math.OC
2025-10-28
Authors:
Hancheng Min, Zhihui Zhu, René Vidal
Abstract
Among many mysteries behind the success of deep networks lies the exceptional
discriminative power of their learned representations as manifested by the
intriguing Neural Collapse (NC) phenomenon, where simple feature structures
emerge at the last layer of a trained neural network. Prior work on the
theoretical understanding of NC has focused on analyzing the optimization
landscape of matrix-factorization-like problems by considering the last-layer
features as unconstrained free optimization variables and showing that their
global minima exhibit NC. In this paper, we show that gradient flow on a
two-layer ReLU network for classifying orthogonally separable data provably
exhibits NC, thereby advancing prior results in two ways: First, we relax the
assumption of unconstrained features, showing the effect of data structure and
nonlinear activations on NC characterizations. Second, we reveal the role of
the implicit bias of the training dynamics in facilitating the emergence of NC.
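To make two notions from the abstract concrete, the following is a minimal NumPy sketch, not the paper's code. It assumes a common definition of orthogonal separability (same-class inputs have positive inner products, different-class inputs have non-positive ones) and uses a simplified within-class variability ratio as a proxy for one aspect of NC; the paper's exact definitions and metrics may differ. The function names `is_orthogonally_separable` and `within_class_variability` are hypothetical.

```python
import numpy as np


def is_orthogonally_separable(X, y):
    """Check an assumed definition of orthogonal separability.

    X: (n, d) array of inputs, y: (n,) array of integer labels.
    Assumption: x_i . x_j > 0 when y_i == y_j, and x_i . x_j <= 0 otherwise.
    """
    gram = X @ X.T                       # all pairwise inner products
    same = y[:, None] == y[None, :]      # mask of same-class pairs
    return bool(np.all(gram[same] > 0) and np.all(gram[~same] <= 0))


def within_class_variability(H, y):
    """Simplified collapse proxy: within-class scatter / between-class scatter.

    H: (n, p) array of last-layer features. A value near 0 means the
    features of each class have concentrated around their class mean.
    """
    global_mean = H.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(y):
        Hc = H[y == c]
        class_mean = Hc.mean(axis=0)
        within += np.sum((Hc - class_mean) ** 2)
        between += len(Hc) * np.sum((class_mean - global_mean) ** 2)
    return within / between


# Toy two-class data (illustrative values, not from the paper).
X = np.array([[1.0, 0.5], [0.5, 1.0], [-1.0, -0.5], [-0.5, -1.0]])
y = np.array([0, 0, 1, 1])
separable = is_orthogonally_separable(X, y)

# Perfectly collapsed toy features: every sample equals its class mean.
H_collapsed = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
ratio = within_class_variability(H_collapsed, y)
```

In an experiment one would take `H` to be the hidden-layer activations of the trained two-layer ReLU network and track the ratio over training; here the features are hand-constructed only to show what the collapsed case looks like.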