MCE: Towards a General Framework for Handling Missing Modalities under Imbalanced Missing Rates

2510.10534v1 cs.CV, cs.LG, cs.MM 2025-10-15

Authors:

Binyu Zhao, Wei Zhang, Zhaonian Zou

Abstract

Multi-modal learning has made significant advances across diverse pattern recognition applications. However, handling missing modalities, especially under imbalanced missing rates, remains a major challenge. This imbalance triggers a vicious cycle: modalities with higher missing rates receive fewer updates, leading to inconsistent learning progress and representational degradation that further diminishes their contribution. Existing methods typically focus on global dataset-level balancing, often overlooking critical sample-level variations in modality utility and the underlying issue of degraded feature quality. We propose Modality Capability Enhancement (MCE) to tackle these limitations. MCE includes two synergistic components: i) Learning Capability Enhancement (LCE), which introduces multi-level factors to dynamically balance modality-specific learning progress, and ii) Representation Capability Enhancement (RCE), which improves feature semantics and robustness through subset prediction and cross-modal completion tasks. Comprehensive evaluations on four multi-modal benchmarks show that MCE consistently outperforms state-of-the-art methods under various missing configurations. The journal preprint version is now available at https://doi.org/10.1016/j.patcog.2025.112591. Our code is available at https://github.com/byzhaoAI/MCE.
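The "vicious cycle" described in the abstract starts from a simple counting effect: a modality that is missing more often is present in fewer training samples, so its encoder receives fewer updates. The following is a minimal sketch of that effect only, not the authors' MCE method. The modality names, missing rates, and helper functions are hypothetical and chosen for illustration.

```python
import random


def simulate_missing_masks(num_samples, missing_rates, seed=0):
    """Draw a per-sample availability mask for each modality.

    missing_rates maps a (hypothetical) modality name to the probability
    that the modality is absent from a given sample.
    """
    rng = random.Random(seed)
    masks = []
    for _ in range(num_samples):
        mask = {m: rng.random() >= rate for m, rate in missing_rates.items()}
        if not any(mask.values()):
            # Assumption: every sample keeps at least one modality.
            mask[rng.choice(sorted(missing_rates))] = True
        masks.append(mask)
    return masks


def count_updates(masks):
    """Count how many samples could update each modality's encoder."""
    counts = {}
    for mask in masks:
        for modality, present in mask.items():
            counts[modality] = counts.get(modality, 0) + int(present)
    return counts


# Illustrative imbalanced missing rates (assumed values, not from the paper).
missing_rates = {"image": 0.1, "text": 0.8}
masks = simulate_missing_masks(1000, missing_rates)
update_counts = count_updates(masks)
print(update_counts)
```

Under these assumed rates, the "text" modality is present in far fewer samples than "image", which is the imbalance in learning progress that the abstract says LCE is designed to counteract.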


