Label Smoothing Improves Gradient Ascent in LLM Unlearning
2510.22376v1
cs.LG, cs.CL
2025-10-29
Авторы:
Zirui Pang, Hao Zheng, Zhijie Deng, Ling Li, Zixin Zhong, Jiaheng Wei
Abstract
LLM unlearning has emerged as a promising approach, aiming to enable models
to forget hazardous/undesired knowledge at low cost while preserving as much
model utility as possible. Among existing techniques, the most straightforward
method is performing Gradient Ascent (GA) w.r.t. the forget data, thereby
forcing the model to unlearn the forget dataset. However, GA suffers from
severe instability, as it drives updates in a divergent direction, often
resulting in drastically degraded model utility. To address this issue, we
propose Smoothed Gradient Ascent (SGA). SGA combines the forget data with
multiple constructed normal data through a tunable smoothing rate. Intuitively,
this extends GA from learning solely on the forget data to jointly learning
across both forget and normal data, enabling more stable unlearning while
better preserving model utility. Theoretically, we provide the theoretical
guidance on the selection of the optimal smoothing rate. Empirically, we
evaluate SGA on three benchmarks: TOFU, Harry Potter, and MUSE-NEWS.
Experimental results demonstrate that SGA consistently outperforms the original
Gradient Ascent (GA) method across all metrics and achieves top-2 performance
among all baseline methods on several key metrics.
Ссылки и действия
Дополнительные ресурсы: