On the Theory of Continual Learning with Gradient Descent for Neural Networks
2510.05573v1
stat.ML, cs.IT, cs.LG, math.IT
2025-10-12
Authors:
Hossein Taheri, Avishek Ghosh, Arya Mazumdar
Abstract
Continual learning, the ability of a model to adapt to an ongoing sequence of
tasks without forgetting the earlier ones, is a central goal of artificial
intelligence. To shed light on its underlying mechanisms, we analyze the
limitations of continual learning in a tractable yet representative setting. In
particular, we study one-hidden-layer quadratic neural networks trained by
gradient descent on an XOR cluster dataset with Gaussian noise, where different
tasks correspond to different clusters with orthogonal means. We derive bounds
on the rate of forgetting at training and test time in terms of the number of
iterations, the sample size, the number of tasks, and the hidden-layer size.
These bounds reveal how each of these problem parameters affects the rate of
forgetting. Numerical experiments across diverse setups confirm our results and
demonstrate their validity beyond the analyzed settings.
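To make the setting concrete, here is a minimal NumPy sketch of sequential training. The abstract does not give the exact model, data distribution, or loss, so every detail below is an assumption chosen for illustration and is not the paper's construction. The network is f(x) = sum_j a_j (w_j . x)^2 with a fixed second layer a_j = ±1/m, and only W is trained. The loss is the logistic loss. In each task, label +1 is drawn around ±mu_pos and label -1 around ±mu_neg, and all task means are mutually orthonormal. All function names are hypothetical. Forgetting on the first task can be read off as the change in its training loss between the first and last entries of the returned history.

```python
import numpy as np

def orthonormal_means(rng, d, num_tasks):
    # Draw 2*num_tasks mutually orthonormal vectors (assumes d >= 2*num_tasks).
    q, _ = np.linalg.qr(rng.standard_normal((d, 2 * num_tasks)))
    return q.T  # shape (2*num_tasks, d)

def xor_task(rng, mu_pos, mu_neg, n, sigma):
    # Assumed XOR cluster data: y=+1 around ±mu_pos, y=-1 around ±mu_neg,
    # plus isotropic Gaussian noise with standard deviation sigma.
    y = rng.choice([-1.0, 1.0], size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    centers = np.where(y[:, None] > 0, mu_pos, mu_neg) * signs[:, None]
    x = centers + sigma * rng.standard_normal((n, mu_pos.shape[0]))
    return x, y

def forward(W, a, x):
    # Quadratic one-hidden-layer network: f(x) = sum_j a_j (w_j . x)^2.
    return ((x @ W.T) ** 2) @ a

def logistic_loss(W, a, x, y):
    # Mean of log(1 + exp(-y f(x))), computed stably with logaddexp.
    return float(np.mean(np.logaddexp(0.0, -y * forward(W, a, x))))

def train_gd(W, a, x, y, lr, iters):
    # Full-batch gradient descent on W only (second layer a stays fixed).
    for _ in range(iters):
        z = x @ W.T                                   # pre-activations, (n, m)
        margin = y * ((z ** 2) @ a)
        # dLoss/df_i = -y_i * sigmoid(-margin_i) / n, written stably.
        g = -y * np.exp(-np.logaddexp(0.0, margin)) / len(y)
        # df_i/dw_j = 2 a_j (w_j . x_i) x_i
        grad = 2.0 * ((g[:, None] * z * a[None, :]).T @ x)
        W = W - lr * grad
    return W

def continual_forgetting(d=100, m=20, n=200, num_tasks=3, sigma=0.05,
                         lr=1.0, iters=300, seed=0):
    rng = np.random.default_rng(seed)
    mus = orthonormal_means(rng, d, num_tasks)
    a = rng.choice([-1.0, 1.0], size=m) / m        # fixed output weights (assumed)
    W = 0.01 * rng.standard_normal((m, d))         # small random init (assumed)
    tasks = [xor_task(rng, mus[2 * t], mus[2 * t + 1], n, sigma)
             for t in range(num_tasks)]
    history = []
    for t, (x, y) in enumerate(tasks):
        W = train_gd(W, a, x, y, lr, iters)
        # Training loss on every task seen so far, recorded after task t.
        history.append([logistic_loss(W, a, *tasks[s]) for s in range(t + 1)])
    return history
```

For example, `h = continual_forgetting()` followed by `h[-1][0] - h[0][0]` gives one crude measure of train-time forgetting on the first task under these assumptions. Varying `n`, `num_tasks`, `iters`, or `m` mirrors the parameters named in the abstract, but this sketch makes no claim about how its numbers compare with the paper's bounds.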