Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models
2511.02986v1
stat.ML, cs.LG, q-bio.GN
2025-11-08
Авторы:
Giovanni Palla, Sudarshan Babu, Payam Dibaeinia, James D. Pearce, Donghui Li, Aly A. Khan, Theofanis Karaletsos, Jakub M. Tomczak
Abstract
Computational modeling of single-cell gene expression is crucial for
understanding cellular processes, but generating realistic expression profiles
remains a major challenge. This difficulty arises from the count nature of gene
expression data and complex latent dependencies among genes. Existing
generative models often impose artificial gene orderings or rely on shallow
neural network architectures. We introduce a scalable latent diffusion model
for single-cell gene expression data, which we refer to as scLDM, that respects
the fundamental exchangeability property of the data. Our VAE uses fixed-size
latent variables leveraging a unified Multi-head Cross-Attention Block (MCAB)
architecture, which serves dual roles: permutation-invariant pooling in the
encoder and permutation-equivariant unpooling in the decoder. We enhance this
framework by replacing the Gaussian prior with a latent diffusion model using
Diffusion Transformers and linear interpolants, enabling high-quality
generation with multi-conditional classifier-free guidance. We show its
superior performance in a variety of experiments for both observational and
perturbational single-cell data, as well as downstream tasks like cell-level
classification.