Beyond Simple Fusion: Adaptive Gated Fusion for Robust Multimodal Sentiment Analysis
2510.01677v1
cs.LG, cs.CV
2025-10-04
Авторы:
Han Wu, Yanming Sun, Yunhe Yang, Derek F. Wong
Abstract
Multimodal sentiment analysis (MSA) leverages information fusion from diverse
modalities (e.g., text, audio, visual) to enhance sentiment prediction.
However, simple fusion techniques often fail to account for variations in
modality quality, such as those that are noisy, missing, or semantically
conflicting. This oversight leads to suboptimal performance, especially in
discerning subtle emotional nuances. To mitigate this limitation, we introduce
a simple yet efficient \textbf{A}daptive \textbf{G}ated \textbf{F}usion
\textbf{N}etwork that adaptively adjusts feature weights via a dual gate fusion
mechanism based on information entropy and modality importance. This mechanism
mitigates the influence of noisy modalities and prioritizes informative cues
following unimodal encoding and cross-modal interaction. Experiments on
CMU-MOSI and CMU-MOSEI show that AGFN significantly outperforms strong
baselines in accuracy, effectively discerning subtle emotions with robust
performance. Visualization analysis of feature representations demonstrates
that AGFN enhances generalization by learning from a broader feature
distribution, achieved by reducing the correlation between feature location and
prediction error, thereby decreasing reliance on specific locations and
creating more robust multimodal feature representations.
Ссылки и действия
Дополнительные ресурсы: