More Than Memory Savings: Zeroth-Order Optimization Mitigates Forgetting in Continual Learning
2510.21019v1
cs.LG, cs.CV
2025-10-28
Authors:
Wanhao Yu, Zheng Wang, Shuteng Niu, Sen Lin, Li Yang
Abstract
Zeroth-order (ZO) optimization has gained attention as a memory-efficient
alternative to first-order (FO) methods, particularly in settings where
gradient computation is expensive or even impractical. In this work, we look
beyond memory efficiency and investigate ZO optimization for continual learning
(CL) as a novel approach to address the plasticity-stability-efficiency
trilemma. Through theoretical analysis and empirical evidence, we show that ZO
optimization naturally leads to flatter loss landscapes, which in turn reduce
forgetting in CL. However, this stability comes at the cost of plasticity: due to
its imprecise gradient estimates and slower convergence, ZO optimization tends
to be less effective than FO in acquiring new task-specific knowledge,
particularly under constrained training budgets. To better understand this
trade-off, we conduct a holistic evaluation of ZO optimization applied to
various existing CL methods. Our findings reveal that ZO optimization enhances
stability but often undermines plasticity, particularly when used with
learnable classifiers. Motivated by this insight, we propose ZO-FC, a simple
but effective approach that applies ZO optimization to a single adapter-based
PEFT module with an FO-optimized classifier. This design leverages the stability
benefits of ZO while preserving the adaptability of FO updates with negligible
memory overhead. Experiments demonstrate that ZO-FC achieves an effective
balance between stability and plasticity, offering a practical and
memory-efficient solution for on-device CL.
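
To make the ZO/FO split concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a generic two-point randomized gradient estimator with Gaussian perturbations, a toy "adapter" that rescales input features elementwise, and a linear classifier with a squared-error loss. The names `zo_gradient` and `hybrid_step`, the smoothing parameter `mu`, and the learning rate are all hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import numpy as np


def zo_gradient(loss_fn, params, rng, mu=1e-3, num_samples=1):
    """Two-point randomized zeroth-order gradient estimate (illustrative helper).

    Only forward evaluations of loss_fn are used; no backpropagation is needed.
    """
    grad = np.zeros_like(params)
    for _ in range(num_samples):
        u = rng.standard_normal(params.shape)  # random search direction
        delta = loss_fn(params + mu * u) - loss_fn(params - mu * u)
        grad += (delta / (2.0 * mu)) * u  # finite-difference slope along u
    return grad / num_samples


def hybrid_step(adapter, classifier, x, y, rng, lr=0.05):
    """One update: ZO estimate for the toy adapter, exact FO gradient for the classifier."""

    def loss_of_adapter(a):
        # Toy adapter (assumption): elementwise feature scaling.
        pred = (x * a) @ classifier
        return 0.5 * np.mean((pred - y) ** 2)

    feats = x * adapter
    residual = feats @ classifier - y
    fo_grad_classifier = feats.T @ residual / len(y)  # analytic gradient
    zo_grad_adapter = zo_gradient(loss_of_adapter, adapter, rng)
    return adapter - lr * zo_grad_adapter, classifier - lr * fo_grad_classifier
```

In this sketch the adapter update needs only two extra forward passes per sampled direction, which is where the memory saving of ZO methods comes from, while the classifier keeps an exact gradient. Averaging over more directions (`num_samples`) reduces the variance of the estimate at the price of more forward passes.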