Global Dynamics of Heavy-Tailed SGDs in Nonconvex Loss Landscape: Characterization and Control

2510.20905v1 cs.LG, math.PR 2025-10-28

Authors:

Xingyu Wang, Chang-Han Rhee

Abstract

Stochastic gradient descent (SGD) and its variants enable modern artificial intelligence. However, theoretical understanding lags far behind their empirical success. It is widely believed that SGD has a curious ability to avoid sharp local minima in the loss landscape, which are associated with poor generalization. To unravel this mystery and further enhance this capability of SGD, it is imperative to go beyond the traditional local convergence analysis and obtain a comprehensive understanding of SGD's global dynamics. In this paper, we develop technical machinery based on the recent large deviations and metastability analysis in Wang and Rhee (2023) and obtain a sharp characterization of the global dynamics of heavy-tailed SGD. In particular, we reveal a fascinating phenomenon in deep learning: by injecting and then truncating heavy-tailed noise during the training phase, SGD can almost completely avoid sharp minima and achieve better generalization performance on test data. Simulation and deep learning experiments confirm our theoretical prediction that heavy-tailed SGD with gradient clipping finds local minima with flatter geometry and achieves better generalization performance.
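To make "injecting and then truncating heavy-tailed noise" concrete, the following is a minimal sketch of one possible update rule, not the authors' implementation. It assumes the step is formed as the negative gradient plus symmetric Pareto-type noise, scaled by a learning rate, and then rescaled so that its Euclidean norm never exceeds a clipping threshold. The names `truncate`, `heavy_tailed_noise`, `clipped_sgd_step`, and the parameters `eta`, `b`, and `alpha` are all hypothetical, and the quadratic loss is a toy stand-in for a real training objective.

```python
import math
import random


def truncate(v, b):
    """Rescale v so its Euclidean norm is at most b, keeping its direction."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm <= b:
        return list(v)
    return [c * (b / norm) for c in v]


def heavy_tailed_noise(rng, dim, alpha):
    """Symmetric noise with Pareto-type tails of index alpha (assumed form)."""
    return [rng.choice((-1.0, 1.0)) * rng.paretovariate(alpha) for _ in range(dim)]


def clipped_sgd_step(x, grad_fn, eta, b, alpha, rng):
    """One step: inject heavy-tailed noise, then truncate the whole step to norm b."""
    grad = grad_fn(x)
    noise = heavy_tailed_noise(rng, len(x), alpha)
    raw_step = [-eta * g + eta * z for g, z in zip(grad, noise)]
    step = truncate(raw_step, b)
    return [xi + si for xi, si in zip(x, step)]


def grad_quadratic(x):
    """Gradient of the toy loss 0.5 * ||x||^2 (illustrative assumption)."""
    return list(x)


rng = random.Random(0)
x = [2.0, -1.0]
max_step = 0.0
for _ in range(1000):
    x_next = clipped_sgd_step(x, grad_quadratic, eta=0.05, b=0.5, alpha=1.2, rng=rng)
    step_norm = math.sqrt(sum((a - c) ** 2 for a, c in zip(x_next, x)))
    max_step = max(max_step, step_norm)
    x = x_next
```

In this sketch the truncation bounds every move by `b`, so a single large noise draw cannot carry the iterate arbitrarily far. Which minima the iterates then favor is the subject of the paper's analysis and is not demonstrated by this toy loop.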


Related articles

Towards Continuous-Time Approximations for Stochastic Gradient Descent without R...

Covering-Space Normalizing Flows: Approximating Pushforwards on Lens Spaces

Resolving Node Identifiability in Graph Neural Processes via Laplacian Spectral ...

Rethinking Nonlinearity: Trainable Gaussian Mixture Modules for Modern Neural Ar...

Deep Learning for Markov Chains: Lyapunov Functions, Poisson's Equation, and Sta...
