Differentiable Sparsity via $D$-Gating: Simple and Versatile Structured Penalization
2509.23898v2
cs.LG, stat.ML
2025-10-01
Authors:
Chris Kolb, Laetitia Frost, Bernd Bischl, David Rügamer
Abstract
Structured sparsity regularization offers a principled way to compact neural
networks, but its non-differentiability breaks compatibility with conventional
stochastic gradient descent and requires either specialized optimizers or
additional post-hoc pruning without formal guarantees. In this work, we propose
$D$-Gating, a fully differentiable structured overparameterization that splits
each group of weights into a primary weight vector and multiple scalar gating
factors. We prove that any local minimum under $D$-Gating is also a local
minimum of the non-smooth structured $L_{2,2/D}$-penalized objective, and further show
that the $D$-Gating objective converges at least exponentially fast to the
$L_{2,2/D}$-regularized loss in the gradient flow limit. Together, our results
show that $D$-Gating is theoretically equivalent to solving the original group
sparsity problem, yet induces distinct learning dynamics that evolve from a
non-sparse regime into sparse optimization. We validate our theory across
vision, language, and tabular tasks, where $D$-Gating consistently delivers
strong performance-sparsity tradeoffs and outperforms both direct optimization
of structured penalties and conventional pruning baselines.
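
The link between the smooth gated parameterization and the non-smooth $L_{2,2/D}$ penalty can be illustrated for a single weight group. The sketch below is not the authors' implementation. It assumes a group weight of the form w = v * g_1 * ... * g_{D-1}, with a plain squared-L2 penalty on v and on each gate; the function names and the constant factor D are choices made for this illustration, since the abstract does not give the exact scaling. Under these assumptions, the AM-GM inequality gives ||v||^2 + sum_d g_d^2 >= D * ||w||_2^(2/D), with equality when ||v|| and all |g_d| are equal.

```python
import numpy as np


def effective_weight(v, gates):
    """Collapse one group: primary vector times the product of its scalar gates."""
    return v * np.prod(gates)


def l2_penalty(v, gates):
    """Smooth squared-L2 penalty on all factors of one group (assumed form)."""
    return float(np.sum(v ** 2) + np.sum(gates ** 2))


def balanced_factors(w, D):
    """Factorize w into v and D-1 gates whose magnitudes all equal ||w||^(1/D)."""
    norm = np.linalg.norm(w)
    if norm == 0.0:
        return np.zeros_like(w), np.zeros(D - 1)
    scale = norm ** (1.0 / D)
    v = w / norm * scale            # ||v|| == scale
    gates = np.full(D - 1, scale)   # each gate == scale
    return v, gates


def group_quasi_norm(w, D):
    """Non-smooth group penalty D * ||w||_2^(2/D) for one group."""
    return float(D * np.linalg.norm(w) ** (2.0 / D))


if __name__ == "__main__":
    w = np.array([3.0, 4.0])
    D = 3
    v, gates = balanced_factors(w, D)
    # The balanced factorization reproduces w and attains the lower bound.
    print(np.allclose(effective_weight(v, gates), w))
    print(l2_penalty(v, gates), group_quasi_norm(w, D))
    # An unbalanced factorization of the same w pays a larger smooth penalty.
    c = 2.0
    gates_unbalanced = gates.copy()
    gates_unbalanced[0] /= c
    print(l2_penalty(c * v, gates_unbalanced))
```

For D = 2 the bound reduces to 2 * ||w||_2, a scaled group lasso penalty. Larger D gives a non-convex penalty with exponent 2/D < 1.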