Provable Unlearning with Gradient Ascent on Two-Layer ReLU Neural Networks
2510.14844v1
cs.LG, cs.CR, cs.NE, stat.ML
2025-10-18
Authors:
Odelia Melamed, Gilad Yehudai, Gal Vardi
Abstract
Machine unlearning aims to remove specific data from trained models,
addressing growing privacy and ethical concerns. We provide a theoretical
analysis of a simple and widely used method, gradient ascent, used to reverse
the influence of a specific data point without retraining from scratch.
Leveraging the implicit bias of gradient descent towards solutions that satisfy
the Karush-Kuhn-Tucker (KKT) conditions of a margin maximization problem, we
quantify the quality of the unlearned model by evaluating how well it satisfies
these conditions w.r.t. the retained data. To formalize this idea, we propose a
new success criterion, termed $(\epsilon, \delta, \tau)$-successful
unlearning, and show that, for both linear models and two-layer neural networks
with high-dimensional data, a properly scaled gradient-ascent step satisfies
this criterion and yields a model that closely approximates the retrained
solution on the retained data. We also show that gradient ascent performs
successful unlearning while still preserving generalization in a synthetic
Gaussian-mixture setting.
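
To make the idea of a single gradient-ascent unlearning step concrete, here is a minimal sketch for a linear classifier. It assumes a logistic loss and labels in {-1, +1}. The function names and the `step_size` parameter are hypothetical and chosen only for illustration. The abstract does not state how the step should be scaled, so the value used here is an arbitrary placeholder and not the scaling analyzed in the paper.

```python
import numpy as np


def logistic_loss_grad(w, x, y):
    """Gradient of log(1 + exp(-y * <w, x>)) with respect to w.

    Assumes y is +1 or -1 (an assumption of this sketch).
    """
    margin = y * np.dot(w, x)
    return -y * x / (1.0 + np.exp(margin))


def gradient_ascent_unlearn(w, x_forget, y_forget, step_size):
    """Take one gradient-ascent step on the loss of the point to forget.

    step_size is a hypothetical free parameter; the abstract only says the
    step must be "properly scaled" and gives no formula.
    """
    return w + step_size * logistic_loss_grad(w, x_forget, y_forget)


# Illustrative values only, not data from the paper.
w_trained = np.array([1.0, -2.0, 0.5])
x_forget = np.array([0.5, -1.0, 2.0])
y_forget = 1.0

w_unlearned = gradient_ascent_unlearn(w_trained, x_forget, y_forget, step_size=0.1)
```

Ascending the loss of the forgotten point moves the weights against the direction that fits it, so the model's margin on that point shrinks. Whether the result also approximately satisfies the KKT conditions with respect to the retained data is what the paper's success criterion measures, and this sketch does not check it.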