High-Dimensional Learning Dynamics of Quantized Models with Straight-Through Estimator
2510.10693v1
stat.ML, cond-mat.dis-nn, cs.AI, cs.LG, math.ST, stat.TH
2025-10-16
Authors:
Yuma Ichikawa, Shuhei Kashiwamura, Ayaka Sakata
Abstract
Quantized neural network training optimizes a discrete, non-differentiable
objective. The straight-through estimator (STE) enables backpropagation through
surrogate gradients and is widely used. While previous studies have primarily
focused on the properties of surrogate gradients and their convergence, the
influence of quantization hyperparameters, such as bit width and quantization
range, on learning dynamics remains largely unexplored. We theoretically show
that in the high-dimensional limit, STE dynamics converge to a deterministic
ordinary differential equation. This characterization reveals that STE training
exhibits a plateau followed by a sharp drop in generalization error, with plateau length
depending on the quantization range. A fixed-point analysis quantifies the
asymptotic deviation from the unquantized linear model. We also extend
analytical techniques for stochastic gradient descent to nonlinear
transformations of weights and inputs.
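
To make the setting concrete, the following is a minimal sketch of online SGD with an STE for a linear student whose weights are quantized. It is not the authors' implementation. The uniform quantizer, the identity surrogate gradient, the 1/sqrt(n) scaling, the Gaussian teacher, and all names (`quantize`, `ste_sgd_step`, `gen_error`, `bits`, `w_range`) are assumptions chosen for illustration; the abstract specifies only that bit width and quantization range are the hyperparameters of interest.

```python
import random


def quantize(w, bits, w_range):
    """Clip w to [-w_range, w_range] and round to one of 2**bits uniform levels."""
    step = 2.0 * w_range / (2 ** bits - 1)
    clipped = max(-w_range, min(w_range, w))
    return round((clipped + w_range) / step) * step - w_range


def ste_sgd_step(w, x, y, lr, bits, w_range):
    """One online SGD step on squared loss for a linear model with quantized weights.

    Assumed STE variant: the derivative of quantize() with respect to the
    latent weight is replaced by 1 (identity surrogate).
    """
    scale = len(w) ** 0.5
    wq = [quantize(wi, bits, w_range) for wi in w]
    err = sum(a * b for a, b in zip(wq, x)) / scale - y
    # The gradient with respect to wq is applied directly to the latent weights w.
    return [wi - lr * err * xi / scale for wi, xi in zip(w, x)]


def gen_error(w, teacher, bits, w_range):
    """Mean squared prediction gap for i.i.d. standard normal inputs (no 1/2 factor)."""
    n = len(w)
    return sum((quantize(a, bits, w_range) - b) ** 2 for a, b in zip(w, teacher)) / n


if __name__ == "__main__":
    random.seed(0)
    n, bits, w_range, lr = 50, 3, 2.0, 0.2  # hypothetical settings
    teacher = [random.gauss(0.0, 1.0) for _ in range(n)]
    w = [0.0] * n
    for t in range(2001):
        if t % 500 == 0:
            print(t, gen_error(w, teacher, bits, w_range))
        x = [random.gauss(0.0, 1.0) for _ in range(n)]
        y = sum(a * b for a, b in zip(teacher, x)) / n ** 0.5
        w = ste_sgd_step(w, x, y, lr, bits, w_range)
```

Varying `bits` and `w_range` in this sketch is one way to explore how the error trajectory changes with the quantization hyperparameters. It is a finite-size simulation only and does not reproduce the deterministic ODE or the fixed-point analysis described in the abstract.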