ODE approximation for the Adam algorithm: General and overparametrized setting
2511.04622v1
math.OC, cs.LG, math.PR
2025-11-08
Authors:
Steffen Dereich, Arnulf Jentzen, Sebastian Kassing
Abstract
The Adam optimizer is arguably the most popular optimization method in
deep learning today. In this article we develop an ODE-based method to
study the Adam optimizer in a fast-slow scaling regime. For fixed momentum
parameters and vanishing step-sizes, we show that the Adam algorithm is an
asymptotic pseudo-trajectory of the flow of a particular vector field, which is
referred to as the Adam vector field. Leveraging properties of asymptotic
pseudo-trajectories, we establish convergence results for the Adam algorithm.
In particular, in a very general setting we show that if the Adam algorithm
converges, then the limit must be a zero of the Adam vector field, rather than
a local minimizer or critical point of the objective function.
In contrast, in the overparametrized empirical risk minimization setting, the
Adam algorithm is able to locally find the set of minima. Specifically, we show
that in a neighborhood of the global minima, the objective function serves as a
Lyapunov function for the flow induced by the Adam vector field. As a
consequence, if the Adam algorithm enters a neighborhood of the global minima
infinitely often, it converges to the set of global minima.
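
To make the setting concrete, here is a minimal Python sketch. It is not the authors' construction. It assumes the standard bias-corrected Adam recursion with fixed momentum parameters beta1, beta2 and a vanishing step-size schedule gamma_n = gamma0 / n^rho. The schedule, the toy objective, and all names (adam, loss, grad, gamma0, rho) are illustrative assumptions, and the exact Adam variant analyzed in the paper may differ. The toy objective has two parameters and a single data point, so it is overparametrized and its global minima form a whole line rather than a single point.

```python
import math

# Toy overparametrized least-squares problem (an assumption for illustration):
# one data point, two parameters, so every theta on the line
# theta[0] + 2*theta[1] = 1 is a global minimum with loss 0.
def loss(theta):
    r = theta[0] + 2.0 * theta[1] - 1.0
    return 0.5 * r * r

def grad(theta):
    r = theta[0] + 2.0 * theta[1] - 1.0
    return [r, 2.0 * r]

def adam(grad_fn, theta0, n_steps, gamma0=0.1, rho=0.5,
         beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard bias-corrected Adam with fixed beta1, beta2 and
    vanishing step sizes gamma_n = gamma0 / n**rho (assumed schedule)."""
    theta = list(theta0)
    m = [0.0] * len(theta)  # first-moment (momentum) estimate
    v = [0.0] * len(theta)  # second-moment estimate
    for n in range(1, n_steps + 1):
        g = grad_fn(theta)
        gamma_n = gamma0 / n ** rho  # step size tends to zero as n grows
        for i in range(len(theta)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            m_hat = m[i] / (1.0 - beta1 ** n)  # bias correction
            v_hat = v[i] / (1.0 - beta2 ** n)
            theta[i] -= gamma_n * m_hat / (math.sqrt(v_hat) + eps)
    return theta

theta_final = adam(grad, [0.0, 0.0], n_steps=5000)
print("final parameters:", theta_final)
print("final loss:", loss(theta_final))
```

In this deterministic toy run the iterates start near the line of minimizers and the loss is driven toward zero, which is the kind of local behavior the overparametrized result describes. The sketch only illustrates the algorithm and the scaling regime; it does not compute the Adam vector field, whose definition is given in the paper itself.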