A Novel Framework for Multi-Modal Protein Representation Learning
2510.23273v1
cs.LG, cs.AI, q-bio.QM
2025-10-29
Authors:
Runjie Zheng, Zhen Wang, Anjie Qiao, Jiancong Xie, Jiahua Rao, Yuedong Yang
Abstract
Accurate protein function prediction requires integrating heterogeneous
intrinsic signals (e.g., sequence and structure) with noisy extrinsic contexts
(e.g., protein-protein interactions and GO term annotations). However, two key
challenges hinder effective fusion: (i) cross-modal distributional mismatch
among embeddings produced by pre-trained intrinsic encoders, and (ii) noisy
relational graphs of extrinsic data that degrade GNN-based information
aggregation. We propose Diffused and Aligned Multi-modal Protein Embedding
(DAMPE), a unified framework that addresses these through two core mechanisms.
First, we propose Optimal Transport (OT)-based representation alignment that
establishes correspondence between intrinsic embedding spaces of different
modalities, effectively mitigating cross-modal heterogeneity. Second, we
develop a Conditional Graph Generation (CGG)-based information fusion method,
where a condition encoder fuses the aligned intrinsic embeddings to provide
informative cues for graph reconstruction. Meanwhile, our theoretical analysis
implies that the CGG objective drives this condition encoder to absorb
graph-aware knowledge into its produced protein representations. Empirically,
DAMPE outperforms or matches state-of-the-art methods such as DPFunc on
standard GO benchmarks, achieving AUPR gains of 0.002-0.013 pp and Fmax gains
of 0.004-0.007 pp. Ablation studies further show that OT-based alignment
contributes 0.043-0.064 pp AUPR, while CGG-based fusion adds 0.005-0.111 pp
Fmax. Overall, DAMPE offers a scalable and theoretically grounded approach for
robust multi-modal protein representation learning, substantially enhancing
protein function prediction.
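
The abstract does not specify how the OT-based alignment is computed. As a rough illustration only, the sketch below uses generic entropy-regularized optimal transport (Sinkhorn iterations) to map one set of embeddings onto another. It is not DAMPE's actual procedure. The names `sinkhorn_plan`, `align`, `seq_emb`, and `struct_emb` are hypothetical, the embeddings are random placeholders, and both modalities are assumed to share the same dimensionality so that a squared Euclidean cost is defined.

```python
import numpy as np


def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropy-regularized OT plan between two uniformly weighted point sets."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform mass on source points
    b = np.full(m, 1.0 / m)  # uniform mass on target points
    K = np.exp(-cost / reg)  # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)      # rescale to match row marginals
        v = b / (K.T @ u)    # rescale to match column marginals
    return u[:, None] * K * v[None, :]


def align(source, target, reg=0.1):
    """Map source embeddings into the target space via barycentric projection."""
    # Squared Euclidean cost, normalized to [0, 1] for numerical stability.
    cost = ((source[:, None, :] - target[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()
    plan = sinkhorn_plan(cost, reg)
    # Each source point becomes a plan-weighted average of target points.
    aligned = (plan @ target) / plan.sum(axis=1, keepdims=True)
    return aligned, plan


# Placeholder data standing in for sequence and structure embeddings (assumption).
rng = np.random.default_rng(0)
seq_emb = rng.normal(size=(8, 4))
struct_emb = rng.normal(loc=2.0, size=(8, 4))
aligned, plan = align(seq_emb, struct_emb)
```

Because each aligned vector is a convex combination of target embeddings, the projected points lie inside the target distribution's range, which is one simple way to reduce a distributional mismatch between two embedding spaces.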