Multimodal Datasets with Controllable Mutual Information

2510.21686v1 stat.ML, cs.LG 2025-10-28
Авторы:

Raheem Karim Hashmani, Garrett W. Merz, Helen Qu, Mariel Pettee, Kyle Cranmer

Abstract

We introduce a framework for generating highly multimodal datasets with explicitly calculable mutual information between modalities. This enables the construction of benchmark datasets that provide a novel testbed for systematic studies of mutual information estimators and multimodal self-supervised learning techniques. Our framework constructs realistic datasets with known mutual information using a flow-based generative model and a structured causal framework for generating correlated latent variables.

Ссылки и действия