Quantifying Modality Contributions via Disentangling Multimodal Representations

arXiv:2511.19470v1 | cs.LG, cs.AI, cs.CL | 2025-11-26

Authors:

Padegal Amit, Omkar Mahesh Kashyap, Namitha Rayasam, Nidhi Shekhar, Surabhi Narayan

Abstract

Quantifying modality contributions in multimodal models remains a challenge, as existing approaches conflate the notion of contribution itself. Prior work relies on accuracy-based approaches, interpreting performance drops after removing a modality as indicative of its influence. However, such outcome-driven metrics fail to distinguish whether a modality is inherently informative or whether its value arises only through interaction with other modalities. This distinction is particularly important in cross-attention architectures, where modalities influence each other's representations. In this work, we propose a framework based on Partial Information Decomposition (PID) that quantifies modality contributions by decomposing predictive information in internal embeddings into unique, redundant, and synergistic components. To enable scalable, inference-only analysis, we develop an algorithm based on the Iterative Proportional Fitting Procedure (IPFP) that computes layer and dataset-level contributions without retraining. This provides a principled, representation-level view of multimodal behavior, offering clearer and more interpretable insights than outcome-based metrics.
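The abstract names the Iterative Proportional Fitting Procedure (IPFP) as the basis of the algorithm but does not say how it is applied to the PID computation. As a rough illustration only, the sketch below shows the textbook two-marginal form of IPFP: a strictly positive matrix is rescaled row by row and column by column until its marginals match given targets. The function name `ipfp`, its parameters, and the toy numbers are assumptions made for this example; the sketch does not compute unique, redundant, or synergistic information and is not the authors' implementation.

```python
import numpy as np


def ipfp(seed, row_target, col_target, max_iters=1000, tol=1e-10):
    """Rescale a strictly positive matrix until its row and column sums match the targets.

    Hypothetical helper for illustration; assumes both targets have the same total.
    """
    q = np.asarray(seed, dtype=float).copy()
    row_target = np.asarray(row_target, dtype=float)
    col_target = np.asarray(col_target, dtype=float)
    for _ in range(max_iters):
        # Row step: scale each row so the row sums equal row_target.
        q *= (row_target / q.sum(axis=1))[:, None]
        # Column step: scale each column so the column sums equal col_target.
        q *= (col_target / q.sum(axis=0))[None, :]
        # Columns match exactly after the column step, so only the rows are checked.
        if np.abs(q.sum(axis=1) - row_target).max() < tol:
            break
    return q


# Toy joint table (assumed values) fitted to assumed target marginals.
seed = np.array([[1.0, 2.0], [3.0, 4.0]])
fitted = ipfp(seed, row_target=[0.4, 0.6], col_target=[0.5, 0.5])
```

Each step multiplies whole rows or whole columns by a constant, so the fitted table keeps the interaction structure (the cross-product ratios) of the seed while taking on the prescribed marginals. That property is the usual reason IPFP is chosen when a distribution must satisfy marginal constraints.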

Related papers

CARL: Critical Action Focused Reinforcement Learning for Multi-Step Agent

Multi-LLM Collaboration for Medication Recommendation

Network of Theseus (like the ship)

SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning

Mode-Conditioning Unlocks Superior Test-Time Scaling
