Learning under Quantization for High-Dimensional Linear Regression

2510.18259v1 stat.ML, cs.AI, cs.LG 2025-10-23

Authors:

Dechen Zhang, Junwei Su, Difan Zou

Abstract

The use of low-bit quantization has emerged as an indispensable technique for enabling the efficient training of large-scale models. Despite its widespread empirical success, a rigorous theoretical understanding of its impact on learning performance remains notably absent, even in the simplest linear regression setting. We present the first systematic theoretical study of this fundamental question, analyzing finite-step stochastic gradient descent (SGD) for high-dimensional linear regression under a comprehensive range of quantization targets: data, labels, parameters, activations, and gradients. Our novel analytical framework establishes precise algorithm-dependent and data-dependent excess risk bounds that characterize how different quantization targets affect learning: parameter, activation, and gradient quantization amplify noise during training; data quantization distorts the data spectrum; and data and label quantization introduce additional approximation and quantization error. Crucially, we prove that for multiplicative quantization (with an input-dependent quantization step), this spectral distortion can be eliminated, and for additive quantization (with a constant quantization step), a beneficial scaling effect with batch size emerges. Furthermore, for common polynomial-decay data spectra, we quantitatively compare the risks of multiplicative and additive quantization, drawing a parallel to the comparison between floating-point (FP) and integer quantization methods. Our theory provides a powerful lens to characterize how quantization shapes the learning dynamics of optimization algorithms, paving the way to further explore learning theory under practical hardware constraints.
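
To make the setting in the abstract concrete, the following is a minimal sketch of mini-batch SGD for linear regression in which data, labels, parameters, activations, and gradients each pass through a quantizer. It is not the paper's algorithm or notation: the function names, the use of unbiased stochastic rounding, the power-of-two step rule for the multiplicative quantizer, the last-iterate output, and all hyperparameters are assumptions chosen only for illustration.

```python
import numpy as np

def quantize_additive(x, step, rng):
    """Stochastic rounding onto a grid with constant spacing `step`
    (an integer-like quantizer; assumed form, not taken from the paper)."""
    x = np.asarray(x, dtype=float)
    scaled = x / step
    low = np.floor(scaled)
    # Round up with probability equal to the fractional part, so E[q(x)] = x.
    up = rng.random(x.shape) < (scaled - low)
    return (low + up) * step

def quantize_multiplicative(x, rel_step, rng):
    """Stochastic rounding whose step grows with |x|
    (a floating-point-like quantizer; assumed form, not taken from the paper)."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(x)
    safe = np.where(mag > 0, mag, 1.0)  # placeholder magnitude for exact zeros
    # Input-dependent step: rel_step times the largest power of two <= |x|.
    step = rel_step * 2.0 ** np.floor(np.log2(safe))
    scaled = x / step
    low = np.floor(scaled)
    up = rng.random(x.shape) < (scaled - low)
    return np.where(mag > 0, (low + up) * step, 0.0)

def quantized_sgd(X, y, lr, batch_size, n_steps, quantizer, rng):
    """Last-iterate mini-batch SGD with every quantity quantized (hypothetical helper)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_steps):
        idx = rng.integers(0, n, size=batch_size)
        Xq = quantizer(X[idx])                 # data quantization
        yq = quantizer(y[idx])                 # label quantization
        wq = quantizer(w)                      # parameter quantization
        pred = quantizer(Xq @ wq)              # activation quantization
        grad = quantizer(Xq.T @ (pred - yq) / batch_size)  # gradient quantization
        w = w - lr * grad
    return w

# Small synthetic demo (all sizes and constants are illustrative assumptions).
rng = np.random.default_rng(0)
d = 5
w_star = np.linspace(1.0, -1.0, d)
X = rng.normal(size=(2000, d))
y = X @ w_star + 0.01 * rng.normal(size=2000)

w_add = quantized_sgd(X, y, 0.05, 16, 2000,
                      lambda v: quantize_additive(v, 2.0 ** -8, rng), rng)
w_mul = quantized_sgd(X, y, 0.05, 16, 2000,
                      lambda v: quantize_multiplicative(v, 2.0 ** -8, rng), rng)
print("additive quantization error:      ", np.linalg.norm(w_add - w_star))
print("multiplicative quantization error:", np.linalg.norm(w_mul - w_star))
```

Coarsening the step (for example `2.0 ** -3`) or changing `batch_size` in this toy setup gives a rough feel for the noise amplification and batch-size effects the abstract describes, though it does not reproduce the paper's bounds.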
