Learning under Quantization for High-Dimensional Linear Regression
2510.18259v1
stat.ML, cs.AI, cs.LG
2025-10-23
Authors:
Dechen Zhang, Junwei Su, Difan Zou
Abstract
The use of low-bit quantization has emerged as an indispensable technique for
enabling the efficient training of large-scale models. Despite its widespread
empirical success, a rigorous theoretical understanding of its impact on
learning performance remains notably absent, even in the simplest linear
regression setting. We present the first systematic theoretical study of this
fundamental question, analyzing finite-step stochastic gradient descent (SGD)
for high-dimensional linear regression under a comprehensive range of
quantization targets: data, labels, parameters, activations, and gradients. Our
novel analytical framework establishes precise algorithm-dependent and
data-dependent excess risk bounds that characterize how each type of quantization
affects learning: parameter, activation, and gradient quantization amplify
noise during training; data quantization distorts the data spectrum; and data
and label quantization introduce additional approximation and quantized error.
Crucially, we prove that for multiplicative quantization (with input-dependent
quantization step), this spectral distortion can be eliminated, and for
additive quantization (with constant quantization step), a beneficial scaling
effect with batch size emerges. Furthermore, for common polynomial-decay data
spectra, we quantitatively compare the risks of multiplicative and additive
quantization, drawing a parallel to the comparison between FP and integer
quantization methods. Our theory provides a powerful lens to characterize how
quantization shapes the learning dynamics of optimization algorithms, paving
the way to further explore learning theory under practical hardware
constraints.
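
The distinction the abstract draws between additive quantization (constant step) and multiplicative quantization (input-dependent step) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the stochastic-rounding quantizers, the function names (quantize_additive, quantize_multiplicative, quantized_sgd, excess_risk), the step sizes, the learning rate, and the choice to quantize only the input data in a one-pass, batch-size-1 SGD that returns the last iterate. The paper's precise quantizer definitions and SGD variant may differ.

```python
import numpy as np


def quantize_additive(x, step, rng):
    """Stochastic rounding onto a uniform grid with constant spacing `step`
    (integer-like quantization; an illustrative assumption)."""
    scaled = np.asarray(x, dtype=float) / step
    low = np.floor(scaled)
    # Round up with probability equal to the fractional part, so E[q(x)] = x.
    up = rng.random(scaled.shape) < (scaled - low)
    return (low + up) * step


def quantize_multiplicative(x, rel_step, rng):
    """Stochastic rounding whose step scales with |x|
    (floating-point-like quantization; an illustrative assumption)."""
    x = np.asarray(x, dtype=float)
    # frexp writes |x| = m * 2**e with m in [0.5, 1), so 2**(e - 1) <= |x|.
    _, exponent = np.frexp(x)
    step = np.ldexp(rel_step, exponent - 1)  # rel_step * 2**(e - 1)
    scaled = x / step
    low = np.floor(scaled)
    up = rng.random(scaled.shape) < (scaled - low)
    return (low + up) * step


def quantized_sgd(X, y, lr, quantizer, rng):
    """One-pass, batch-size-1 SGD for least squares on quantized features.
    Only the data are quantized here; labels, parameters, activations and
    gradients are left in full precision."""
    w = np.zeros(X.shape[1])
    for x_i, y_i in zip(X, y):
        xq = quantizer(x_i, rng)
        w -= lr * (xq @ w - y_i) * xq
    return w


def excess_risk(w, w_star, eigs):
    """Excess risk (w - w*)^T H (w - w*) for a diagonal covariance H."""
    diff = w - w_star
    return float(np.sum(eigs * diff ** 2))


# Synthetic data with a polynomial-decay spectrum lambda_i = i^(-2).
rng = np.random.default_rng(0)
n, d = 2000, 20
eigs = np.arange(1, d + 1, dtype=float) ** -2.0
X = rng.normal(size=(n, d)) * np.sqrt(eigs)
w_star = np.ones(d)
y = X @ w_star + 0.1 * rng.normal(size=n)

w_add = quantized_sgd(X, y, 0.1, lambda v, r: quantize_additive(v, 0.25, r), rng)
w_mul = quantized_sgd(X, y, 0.1, lambda v, r: quantize_multiplicative(v, 0.125, r), rng)

print("additive data quantization, excess risk:", excess_risk(w_add, w_star, eigs))
print("multiplicative data quantization, excess risk:", excess_risk(w_mul, w_star, eigs))
```

In this sketch the additive quantizer injects noise of the same scale into every coordinate regardless of its magnitude, while the multiplicative quantizer's noise shrinks with the coordinate, which is one informal way to picture why the two schemes interact differently with a decaying data spectrum. The printed numbers depend on the arbitrary settings above and should not be read as a reproduction of the paper's results.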