Structurally Refined Graph Transformer for Multimodal Recommendation
2511.00584v1
cs.IR, cs.CL
2025-11-06
Авторы:
Ke Shi, Yan Zhang, Miao Zhang, Lifan Chen, Jiali Yi, Kui Xiao, Xiaoju Hou, Zhifei Li
Abstract
Multimodal recommendation systems utilize various types of information,
including images and text, to enhance the effectiveness of recommendations. The
key challenge is predicting user purchasing behavior from the available data.
Current recommendation models prioritize extracting multimodal information
while neglecting the distinction between redundant and valuable data. They also
rely heavily on a single semantic framework (e.g., local or global semantics),
resulting in an incomplete or biased representation of user preferences,
particularly those less expressed in prior interactions. Furthermore, these
approaches fail to capture the complex interactions between users and items,
limiting the model's ability to meet diverse users. To address these
challenges, we present SRGFormer, a structurally optimized multimodal
recommendation model. By modifying the transformer for better integration into
our model, we capture the overall behavior patterns of users. Then, we enhance
structural information by embedding multimodal information into a hypergraph
structure to aid in learning the local structures between users and items.
Meanwhile, applying self-supervised tasks to user-item collaborative signals
enhances the integration of multimodal information, thereby revealing the
representational features inherent to the data's modality. Extensive
experiments on three public datasets reveal that SRGFormer surpasses previous
benchmark models, achieving an average performance improvement of 4.47 percent
on the Sports dataset. The code is publicly available online.
Ссылки и действия
Дополнительные ресурсы: