Multimodal Datasets with Controllable Mutual Information
2510.21686v1
stat.ML, cs.LG
2025-10-28
Authors:
Raheem Karim Hashmani, Garrett W. Merz, Helen Qu, Mariel Pettee, Kyle Cranmer
Abstract
We introduce a framework for generating highly multimodal datasets with
explicitly calculable mutual information between modalities. This enables the
construction of benchmark datasets that provide a novel testbed for systematic
studies of mutual information estimators and multimodal self-supervised
learning techniques. Our framework constructs realistic datasets with known
mutual information using a flow-based generative model and a structured causal
framework for generating correlated latent variables.
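
To illustrate the core idea in the abstract, here is a minimal toy sketch in Python. It is not the authors' implementation. It rests on two standard facts: two unit-variance Gaussian variables with correlation rho have mutual information -0.5 * log(1 - rho^2) nats, and mutual information is unchanged when each variable is passed through its own invertible map. The function names (`gaussian_mi`, `rho_for_mi`, `sample_modalities`) and the elementwise maps standing in for a flow-based model are all assumptions made for illustration.

```python
import numpy as np

def gaussian_mi(rho: float) -> float:
    """Closed-form MI (in nats) of two unit-variance Gaussians with correlation rho."""
    return -0.5 * np.log(1.0 - rho**2)

def rho_for_mi(mi: float) -> float:
    """Invert the closed form: the correlation that yields a target MI (in nats)."""
    return float(np.sqrt(1.0 - np.exp(-2.0 * mi)))

def sample_modalities(mi: float, n: int, seed: int = 0):
    """Draw n paired samples whose latent MI equals `mi` by construction.

    Hypothetical stand-in for a flow: each latent goes through its own
    strictly increasing (hence invertible) map, which leaves MI unchanged.
    """
    rng = np.random.default_rng(seed)
    rho = rho_for_mi(mi)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    x = np.tanh(z[:, 0])         # "modality 1": invertible map of latent 1
    y = z[:, 1] ** 3 + z[:, 1]   # "modality 2": invertible map of latent 2
    return x, y, z

# Example: request exactly 1 nat of mutual information between the modalities.
x, y, z = sample_modalities(mi=1.0, n=50_000)
```

Because the ground-truth value is fixed before sampling, an estimator's output on `(x, y)` can be compared directly against the requested `mi`. A realistic benchmark would replace the scalar maps with a trained high-dimensional invertible model.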