Beyond Pixels: Efficient Dataset Distillation via Sparse Gaussian Representation
2509.26219v1
cs.CV, cs.AI, I.2.0; I.4.2; I.4.10
2025-10-02
Authors:
Chenyang Jiang, Zhengcen Li, Hang Zhao, Qiben Shan, Shaocong Wu, Jingyong Su
Abstract
Dataset distillation has emerged as a promising paradigm that synthesizes
compact, informative datasets capable of retaining the knowledge of large-scale
counterparts, thereby addressing the substantial computational and storage
burdens of modern model training. Conventional approaches typically rely on
dense pixel-level representations, which introduce redundancy and are difficult
to scale up. In this work, we propose GSDD, a novel and efficient sparse
representation for dataset distillation based on 2D Gaussians. Instead of
representing all pixels equally, GSDD encodes critical discriminative
information in a distilled image using only a small number of Gaussian
primitives. Because each image needs fewer parameters, this sparse representation can improve dataset diversity under
the same storage budget, enhancing coverage of difficult samples and boosting
distillation performance. To ensure both efficiency and scalability, we adapt
CUDA-based splatting operators for parallel inference and training, enabling
high-quality rendering with minimal computational and memory overhead. Our
method is simple yet effective, broadly applicable to different distillation
pipelines, and highly scalable. Experiments show that GSDD achieves
state-of-the-art performance on CIFAR-10, CIFAR-100, and ImageNet subsets,
while keeping encoding and decoding costs low. Our code is
available at https://github.com/j-cyoung/GSDatasetDistillation.
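
To make the idea of a Gaussian-based image representation concrete, the following is a minimal NumPy sketch of rendering a small image from a handful of 2D Gaussian primitives. It is not the authors' implementation, which uses CUDA-based splatting operators. The function name `render_gaussians`, the per-Gaussian parameterization (mean, two scales, rotation angle, RGB color, opacity) and the simple additive blending are all assumptions chosen for illustration.

```python
import numpy as np

def render_gaussians(means, scales, thetas, colors, opacities, height, width):
    """Render an (height, width, 3) image by additively splatting 2D Gaussians.

    Hypothetical parameterization (an assumption, not taken from the paper):
    means (N, 2) in [0, 1], scales (N, 2), thetas (N,), colors (N, 3),
    opacities (N,).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    # Pixel centres in normalised [0, 1] coordinates, stored as (x, y).
    grid = np.stack([(xs + 0.5) / width, (ys + 0.5) / height], axis=-1)
    image = np.zeros((height, width, 3), dtype=np.float64)
    for mu, s, th, c, a in zip(means, scales, thetas, colors, opacities):
        cos_t, sin_t = np.cos(th), np.sin(th)
        rot = np.array([[cos_t, -sin_t], [sin_t, cos_t]])
        # Covariance is R diag(s^2) R^T, so its inverse is R diag(1/s^2) R^T.
        inv_cov = rot @ np.diag(1.0 / s**2) @ rot.T
        d = grid - mu  # offsets from the Gaussian centre, shape (H, W, 2)
        maha = np.einsum("hwi,ij,hwj->hw", d, inv_cov, d)
        weight = a * np.exp(-0.5 * maha)
        image += weight[..., None] * c  # additive blending (an assumption)
    return np.clip(image, 0.0, 1.0)

# Example: 64 random Gaussians rendered to a CIFAR-sized 32x32 image.
rng = np.random.default_rng(0)
n = 64
img = render_gaussians(
    means=rng.uniform(0.0, 1.0, (n, 2)),
    scales=rng.uniform(0.03, 0.15, (n, 2)),
    thetas=rng.uniform(0.0, np.pi, n),
    colors=rng.uniform(0.0, 1.0, (n, 3)),
    opacities=rng.uniform(0.2, 0.8, n),
    height=32,
    width=32,
)

# Storage comparison under the assumed 9-float parameterization.
floats_per_gaussian = 2 + 2 + 1 + 3 + 1
gaussian_floats = n * floats_per_gaussian
pixel_floats = 32 * 32 * 3
```

Under this assumed parameterization, 64 Gaussians occupy 576 floats, compared with 3072 floats for a dense 32x32 RGB image, which illustrates how a sparse representation could fit more distilled images into the same storage budget. The actual parameter count per primitive in GSDD may differ.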