ProtoMol: Enhancing Molecular Property Prediction via Prototype-Guided Multimodal Learning
2510.16824v1
cs.LG, q-bio.MN
2025-10-22
Авторы:
Yingxu Wang, Kunyu Zhang, Jiaxin Huang, Nan Yin, Siwei Liu, Eran Segal
Abstract
Multimodal molecular representation learning, which jointly models molecular
graphs and their textual descriptions, enhances predictive accuracy and
interpretability by enabling more robust and reliable predictions of drug
toxicity, bioactivity, and physicochemical properties through the integration
of structural and semantic information. However, existing multimodal methods
suffer from two key limitations: (1) they typically perform cross-modal
interaction only at the final encoder layer, thus overlooking hierarchical
semantic dependencies; (2) they lack a unified prototype space for robust
alignment between modalities. To address these limitations, we propose
ProtoMol, a prototype-guided multimodal framework that enables fine-grained
integration and consistent semantic alignment between molecular graphs and
textual descriptions. ProtoMol incorporates dual-branch hierarchical encoders,
utilizing Graph Neural Networks to process structured molecular graphs and
Transformers to encode unstructured texts, resulting in comprehensive
layer-wise representations. Then, ProtoMol introduces a layer-wise
bidirectional cross-modal attention mechanism that progressively aligns
semantic features across layers. Furthermore, a shared prototype space with
learnable, class-specific anchors is constructed to guide both modalities
toward coherent and discriminative representations. Extensive experiments on
multiple benchmark datasets demonstrate that ProtoMol consistently outperforms
state-of-the-art baselines across a variety of molecular property prediction
tasks.
Ссылки и действия
Дополнительные ресурсы: