Small Lesions-aware Bidirectional Multimodal Multiscale Fusion Network for Lung Disease Classification

2508.04205v1 cs.CV 2025-08-09

Authors:

Jianxun Yu, Ruiquan Ge, Zhipeng Wang, Cheng Yang, Chenyu Lin, Xianjun Fu, Jikui Liu, Ahmed Elazab, Changmiao Wang

Summary

Medical disease diagnosis often lacks accuracy in detecting small lesions, which leads to diagnostic errors. This work proposes the Multimodal Multiscale Cross-Attention Fusion Network (MMCAF-Net), a network that combines a multimodal approach with multiscale cross-attention to improve diagnostic accuracy. The model uses a feature pyramid structure and a 3D multiscale attention module to efficiently extract lesion features from medical images. To address cross-modal alignment, MMCAF-Net includes a cross-attention module that handles the heterogeneity of the data and enables more effective fusion of multimodal features. The model was evaluated on the Lung-PET-CT-Dx dataset and showed a significant performance improvement over current methods. This approach could become a valuable addition to the tools available for improving diagnostic accuracy in clinical practice.
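The idea of a 3D multiscale attention module can be illustrated with a small, dependency-free sketch. This is not the MMCAF-Net implementation (that lives in the repository linked in the abstract); it assumes fixed cube-shaped mean filters of sizes 1, 3 and 5 in place of learned 3D convolutions, and the names `box_mean_3d` and `multiscale_attention_3d` are hypothetical. Each voxel is reweighted by a sigmoid of its summed responses across scales, so a small bright region is kept and emphasized while empty background stays at zero.

```python
import math


# Hypothetical helper: a fixed k x k x k mean filter stands in for a learned
# 3D convolution. Voxels outside the volume are treated as zero (zero padding).
def box_mean_3d(vol, k):
    d, h, w = len(vol), len(vol[0]), len(vol[0][0])
    r = k // 2
    out = [[[0.0] * w for _ in range(h)] for _ in range(d)]
    for z in range(d):
        for y in range(h):
            for x in range(w):
                total = 0.0
                for zz in range(z - r, z + r + 1):
                    for yy in range(y - r, y + r + 1):
                        for xx in range(x - r, x + r + 1):
                            if 0 <= zz < d and 0 <= yy < h and 0 <= xx < w:
                                total += vol[zz][yy][xx]
                out[z][y][x] = total / k ** 3
    return out


def multiscale_attention_3d(vol, kernel_sizes=(1, 3, 5)):
    """Hypothetical sketch: gate each voxel by a sigmoid of its summed
    responses at several receptive-field sizes, then reweight the input."""
    maps = [box_mean_3d(vol, k) for k in kernel_sizes]
    d, h, w = len(vol), len(vol[0]), len(vol[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(d)]
    for z in range(d):
        for y in range(h):
            for x in range(w):
                score = sum(m[z][y][x] for m in maps)
                gate = 1.0 / (1.0 + math.exp(-score))
                out[z][y][x] = vol[z][y][x] * gate
    return out


# Usage: a 5 x 5 x 5 volume with one bright voxel, standing in for a tiny lesion.
vol = [[[0.0] * 5 for _ in range(5)] for _ in range(5)]
vol[2][2][2] = 1.0
att = multiscale_attention_3d(vol)
print(round(att[2][2][2], 3))
```

In a trained network the filters and the final mixing would be learned, so the gate could favor whichever scale best separates lesion from background rather than simply summing fixed averages.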

Abstract

The diagnosis of medical diseases faces challenges such as the misdiagnosis of small lesions. Deep learning, particularly multimodal approaches, has shown great potential in the field of medical disease diagnosis. However, the differences in dimensionality between medical imaging and electronic health record data present challenges for effective alignment and fusion. To address these issues, we propose the Multimodal Multiscale Cross-Attention Fusion Network (MMCAF-Net). This model employs a feature pyramid structure combined with an efficient 3D multi-scale convolutional attention module to extract lesion-specific features from 3D medical images. To further enhance multimodal data integration, MMCAF-Net incorporates a multi-scale cross-attention module, which resolves dimensional inconsistencies, enabling more effective feature fusion. We evaluated MMCAF-Net on the Lung-PET-CT-Dx dataset, and the results showed a significant improvement in diagnostic accuracy, surpassing current state-of-the-art methods. The code is available at https://github.com/yjx1234/MMCAF-Net
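The cross-attention fusion described in the abstract can be sketched under stated assumptions: every level of the image feature pyramid is assumed to be already projected to the same channel count as the EHR embedding, learned query/key/value projections are omitted (identity), and only the EHR-to-image direction is shown. `fuse_ehr_with_image` and the toy numbers are hypothetical; the actual multi-scale cross-attention module is in the authors' repository. Flattening the 3D feature maps into token lists is what lets a single EHR vector attend over image features whose spatial sizes differ from level to level.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """Scaled dot-product attention of a single query over a token sequence."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    attended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return attended, weights


def fuse_ehr_with_image(ehr_vec, pyramid_levels):
    """Hypothetical fusion step: flatten every pyramid level into one token list
    (assumes all levels already have len(ehr_vec) channels), let the EHR
    embedding attend over it, and concatenate the result with the EHR vector."""
    tokens = [tok for level in pyramid_levels for tok in level]
    attended, weights = cross_attention(ehr_vec, tokens, tokens)
    return list(ehr_vec) + attended, weights


# Usage: 2-channel features; a fine level with two tokens, a coarse level with one.
ehr = [1.0, 0.0]
levels = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]]]
fused, weights = fuse_ehr_with_image(ehr, levels)
print(len(fused), [round(w, 3) for w in weights])
```

Because the softmax runs over the concatenated tokens from all levels, fine and coarse tokens compete for the same attention weights; the token most similar to the EHR query (here the first one) receives the largest share.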
