InfMasking: Unleashing Synergistic Information by Contrastive Multimodal Interactions
2509.25270v1
cs.LG, cs.AI, cs.CV
2025-10-02
Authors:
Liangjian Wen, Qun Dai, Jianzhuang Liu, Jiangtao Zheng, Yong Dai, Dongkai Wang, Zhao Kang, Jun Wang, Zenglin Xu, Jiang Duan
Abstract
In multimodal representation learning, synergistic interactions between
modalities not only provide complementary information but also create unique
outcomes through specific interaction patterns that no single modality could
achieve alone. Existing methods may struggle to effectively capture the full
spectrum of synergistic information, leading to suboptimal performance in tasks
where such interactions are critical. This is particularly problematic because
synergistic information constitutes the fundamental value proposition of
multimodal representation. To address this challenge, we introduce InfMasking,
a contrastive synergistic information extraction method designed to enhance
synergistic information through an Infinite Masking strategy.
InfMasking stochastically occludes most features from each modality during
fusion, preserving only partial information to create representations with
varied synergistic patterns. Unmasked fused representations are then aligned
with masked ones through mutual information maximization to encode
comprehensive synergistic information. This infinite masking strategy enables
capturing richer interactions by exposing the model to diverse partial modality
combinations during training. As computing mutual information estimates with
infinite masking is computationally prohibitive, we derive an InfMasking loss
to approximate this calculation. Through controlled experiments, we demonstrate
that InfMasking effectively enhances synergistic information between
modalities. In evaluations on large-scale real-world datasets, InfMasking
achieves state-of-the-art performance across seven benchmarks. Code is released
at https://github.com/brightest66/InfMasking.
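
The masking-and-alignment idea described in the abstract can be illustrated with a small finite-sample sketch. This is not the authors' InfMasking loss, which approximates the infinite-mask limit and is not specified here. As a stand-in, it averages a standard InfoNCE objective over a handful of random masks. The fusion function (concatenation followed by a linear map), the names `random_mask`, `fuse`, `info_nce`, and `masked_alignment_loss`, and the parameters `keep_ratio`, `num_masks`, and `temperature` are all assumptions made for illustration.

```python
import numpy as np

def random_mask(x, keep_ratio, rng):
    """Zero out features independently, keeping each with probability keep_ratio."""
    keep = rng.random(x.shape) < keep_ratio
    return x * keep

def fuse(xa, xb, w):
    """Toy fusion (assumption): concatenate two modalities, project, L2-normalize."""
    z = np.concatenate([xa, xb], axis=1) @ w
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)

def info_nce(z, z_masked, temperature=0.1):
    """InfoNCE loss: row i of z is the positive for row i of z_masked."""
    logits = z @ z_masked.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def masked_alignment_loss(xa, xb, w, num_masks=4, keep_ratio=0.2, seed=0):
    """Average InfoNCE between the unmasked fusion and several masked fusions."""
    rng = np.random.default_rng(seed)
    z_full = fuse(xa, xb, w)
    losses = []
    for _ in range(num_masks):
        # A low keep_ratio occludes most features of each modality.
        z_m = fuse(random_mask(xa, keep_ratio, rng),
                   random_mask(xb, keep_ratio, rng), w)
        losses.append(info_nce(z_full, z_m))
    return float(np.mean(losses))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    xa = rng.normal(size=(8, 16))   # modality A features (synthetic)
    xb = rng.normal(size=(8, 16))   # modality B features (synthetic)
    w = rng.normal(size=(32, 4))    # toy fusion weights
    print(masked_alignment_loss(xa, xb, w))
```

Minimizing InfoNCE maximizes a lower bound on the mutual information between the two views, which is why it serves as the alignment term in this sketch. Increasing `num_masks` exposes the fusion to more partial modality combinations at a proportional compute cost, which is the trade-off the abstract says the derived loss is meant to avoid.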