Improving Clinical Dataset Condensation with Mode Connectivity-based Trajectory Surrogates
2510.05805v1
cs.LG, cs.CV, cs.DB
2025-10-09
Авторы:
Pafue Christy Nganjimi, Andrew Soltan, Danielle Belgrave, Lei Clifton, David A. Clifton, Anshul Thakur
Abstract
Dataset condensation (DC) enables the creation of compact, privacy-preserving
synthetic datasets that can match the utility of real patient records,
supporting democratised access to highly regulated clinical data for developing
downstream clinical models. State-of-the-art DC methods supervise synthetic
data by aligning the training dynamics of models trained on real and those
trained on synthetic data, typically using full stochastic gradient descent
(SGD) trajectories as alignment targets; however, these trajectories are often
noisy, high-curvature, and storage-intensive, leading to unstable gradients,
slow convergence, and substantial memory overhead. We address these limitations
by replacing full SGD trajectories with smooth, low-loss parametric surrogates,
specifically quadratic B\'ezier curves that connect the initial and final model
states from real training trajectories. These mode-connected paths provide
noise-free, low-curvature supervision signals that stabilise gradients,
accelerate convergence, and eliminate the need for dense trajectory storage. We
theoretically justify B\'ezier-mode connections as effective surrogates for SGD
paths and empirically show that the proposed method outperforms
state-of-the-art condensation approaches across five clinical datasets,
yielding condensed datasets that enable clinically effective model development.
Ссылки и действия
Дополнительные ресурсы: