On the Theory of Continual Learning with Gradient Descent for Neural Networks

arXiv:2510.05573v1 | stat.ML, cs.IT, cs.LG, math.IT | 2025-10-12

Authors:

Hossein Taheri, Avishek Ghosh, Arya Mazumdar

Abstract

Continual learning, the ability of a model to adapt to an ongoing sequence of tasks without forgetting earlier ones, is a central goal of artificial intelligence. To shed light on its underlying mechanisms, we analyze the limitations of continual learning in a tractable yet representative setting. Specifically, we study one-hidden-layer quadratic neural networks trained by gradient descent on an XOR cluster dataset with Gaussian noise, where different tasks correspond to different clusters with orthogonal means. We derive bounds on the rate of forgetting at training and test time in terms of the number of iterations, the sample size, the number of tasks, and the hidden-layer size. These bounds reveal notable phenomena concerning the role of each problem parameter in the rate of forgetting. Numerical experiments across diverse setups confirm our results and indicate that they remain valid beyond the analyzed settings.
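
The setting described in the abstract can be illustrated with a small simulation. The sketch below is a hypothetical, minimal instance and not the authors' code or their exact data model. It assumes that task k places its label +1 clusters at ±R·e_{2k} and its label -1 clusters at ±R·e_{2k+1} (so means of different tasks are orthogonal), that the network is f(x) = Σ_j a_j (w_j·x)^2 with fixed second-layer weights a_j = ±1/m and only W trained, that training uses full-batch gradient descent on the logistic loss, and that forgetting of a task is measured as the increase in its test error after all later tasks have been trained. The function names (`make_task`, `gd_step`, `continual_run`) and all parameter values are illustrative choices.

```python
import math
import random


def make_task(k, n, d, radius, sigma, rng):
    """Sample n points of task k (assumed parametrization, for illustration only).

    Label +1: clusters at +/- radius * e_{2k}; label -1: clusters at
    +/- radius * e_{2k+1}; every point gets isotropic Gaussian noise.
    """
    xs, ys = [], []
    for _ in range(n):
        axis = 2 * k + rng.randrange(2)   # which of the task's two directions
        sign = rng.choice((-1.0, 1.0))    # which of the two antipodal clusters
        x = [sigma * rng.gauss(0.0, 1.0) for _ in range(d)]
        x[axis] += sign * radius
        xs.append(x)
        ys.append(1.0 if axis == 2 * k else -1.0)
    return xs, ys


def forward(W, a, x):
    """Quadratic network f(x) = sum_j a_j * (w_j . x)^2."""
    return sum(aj * sum(wi * xi for wi, xi in zip(wj, x)) ** 2 for wj, aj in zip(W, a))


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gd_step(W, a, xs, ys, lr):
    """One full-batch gradient step on the mean logistic loss log(1 + exp(-y f(x)))."""
    n, d = len(xs), len(W[0])
    grads = [[0.0] * d for _ in W]
    for x, y in zip(xs, ys):
        projs = [sum(wi * xi for wi, xi in zip(wj, x)) for wj in W]
        f = sum(aj * p * p for aj, p in zip(a, projs))
        s = sigmoid(-y * f)  # equals -d loss / d(y f)
        for j in range(len(W)):
            # d f / d w_j = 2 a_j (w_j . x) x
            coef = -y * s * a[j] * 2.0 * projs[j] / n
            row = grads[j]
            for i in range(d):
                row[i] += coef * x[i]
    for j in range(len(W)):
        for i in range(d):
            W[j][i] -= lr * grads[j][i]


def error(W, a, xs, ys):
    """Fraction of points misclassified by sign(f(x))."""
    return sum(1 for x, y in zip(xs, ys) if y * forward(W, a, x) <= 0) / len(xs)


def continual_run(num_tasks=2, n=80, n_test=200, d=10, m=10, radius=3.0,
                  sigma=0.3, lr=0.5, steps=60, seed=0):
    """Train on tasks 0..num_tasks-1 in sequence and track test error on every task."""
    assert d >= 2 * num_tasks, "need two orthogonal directions per task"
    rng = random.Random(seed)
    a = [1.0 / m if j % 2 == 0 else -1.0 / m for j in range(m)]  # fixed output weights
    W = [[0.1 * rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]  # small init
    train = [make_task(k, n, d, radius, sigma, rng) for k in range(num_tasks)]
    test = [make_task(k, n_test, d, radius, sigma, rng) for k in range(num_tasks)]
    history = []  # history[t][k]: test error on task k after finishing task t
    for t in range(num_tasks):
        xs, ys = train[t]
        for _ in range(steps):
            gd_step(W, a, xs, ys, lr)
        history.append([error(W, a, *test[k]) for k in range(num_tasks)])
    # forgetting of task k: final error minus error right after learning task k
    forgetting = [history[-1][k] - history[k][k] for k in range(num_tasks - 1)]
    return history, forgetting


if __name__ == "__main__":
    history, forgetting = continual_run()
    print("test error per task after each stage:", history)
    print("forgetting:", forgetting)
```

Varying `steps`, `n`, `num_tasks`, and `m` in `continual_run` corresponds to the four quantities the abstract names (iterations, sample size, number of tasks, hidden-layer size); the sketch only lets one observe forgetting empirically and makes no claim about the rates derived in the paper.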
