Nonparametric Data Attribution for Diffusion Models
2510.14269v1
cs.LG, stat.ML
2025-10-19
Авторы:
Yutian Zhao, Chao Du, Xiaosen Zheng, Tianyu Pang, Min Lin
Abstract
Data attribution for generative models seeks to quantify the influence of
individual training examples on model outputs. Existing methods for diffusion
models typically require access to model gradients or retraining, limiting
their applicability in proprietary or large-scale settings. We propose a
nonparametric attribution method that operates entirely on data, measuring
influence via patch-level similarity between generated and training images. Our
approach is grounded in the analytical form of the optimal score function and
naturally extends to multiscale representations, while remaining
computationally efficient through convolution-based acceleration. In addition
to producing spatially interpretable attributions, our framework uncovers
patterns that reflect intrinsic relationships between training data and
outputs, independent of any specific model. Experiments demonstrate that our
method achieves strong attribution performance, closely matching gradient-based
approaches and substantially outperforming existing nonparametric baselines.
Code is available at https://github.com/sail-sg/NDA.
Ссылки и действия
Дополнительные ресурсы: