Carré du champ flow matching: better quality-generalisation tradeoff in generative models
2510.05930v1
cs.LG, cs.AI, math.DG
2025-10-09
Authors:
Jacob Bamberger, Iolo Jones, Dennis Duncan, Michael M. Bronstein, Pierre Vandergheynst, Adam Gosztolai
Abstract
Deep generative models often face a fundamental tradeoff: high sample quality
can come at the cost of memorisation, where the model reproduces training data
rather than generalising across the underlying data geometry. We introduce
Carré du champ flow matching (CDC-FM), a generalisation of flow matching
(FM) that improves the quality-generalisation tradeoff by regularising the
probability path with geometry-aware noise. Our method replaces the
homogeneous, isotropic noise in FM with a spatially varying, anisotropic
Gaussian noise whose covariance captures the local geometry of the latent data
manifold. We prove that this geometric noise can be optimally estimated from
the data and is scalable to large data. Further, we provide an extensive
experimental evaluation on diverse datasets (synthetic manifolds, point clouds,
single-cell genomics, animal motion capture, and images) as well as various
neural network architectures (MLPs, CNNs, and transformers). We demonstrate
that CDC-FM consistently offers a better quality-generalisation tradeoff. We
observe significant improvements over standard FM in data-scarce regimes and in
highly non-uniformly sampled datasets, which are often encountered in AI for
science applications. Our work provides a mathematical framework for studying
the interplay between data geometry, generalisation and memorisation in
generative models, as well as a robust and scalable algorithm that can be
readily integrated into existing flow matching pipelines.
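
To make the idea of spatially varying, anisotropic noise concrete, here is a minimal NumPy sketch. It is not the estimator from the paper: as a simple stand-in, it assumes the local geometry around each data point can be approximated by the covariance of its k nearest neighbours. The function names (`local_covariances`, `sample_anisotropic`), the neighbourhood size `k`, and the jitter `eps` are all hypothetical choices made for illustration.

```python
import numpy as np


def local_covariances(X, k=8, eps=1e-6):
    """Per-point covariance of the k nearest neighbours.

    Assumption: a kNN covariance is used here as a stand-in for a
    geometry-aware covariance; it is not the paper's estimator.
    """
    n, d = X.shape
    # Pairwise squared Euclidean distances, shape (n, n).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Indices of the k nearest neighbours, skipping the point itself.
    idx = np.argsort(sq, axis=1)[:, 1:k + 1]
    nbrs = X[idx]  # shape (n, k, d)
    centred = nbrs - nbrs.mean(axis=1, keepdims=True)
    cov = np.einsum("nki,nkj->nij", centred, centred) / k
    # Small isotropic jitter keeps every covariance positive definite.
    return cov + eps * np.eye(d)


def sample_anisotropic(X, cov, rng):
    """Draw one sample x_i + L_i z_i per point, where L_i L_i^T = cov_i."""
    L = np.linalg.cholesky(cov)  # batched Cholesky, shape (n, d, d)
    z = rng.standard_normal(X.shape)
    return X + np.einsum("nij,nj->ni", L, z)


# Toy data: points on the unit circle, a 1-D manifold embedded in 2-D.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)

cov = local_covariances(X)
samples = sample_anisotropic(X, cov, rng)
```

On this toy circle the estimated covariances are strongly elongated along the tangent direction, so the noisy samples spread along the curve rather than away from it. Isotropic noise of comparable scale would instead perturb each point equally in every direction.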