U-DFA: A Unified DINOv2-Unet with Dual Fusion Attention for Multi-Dataset Medical Segmentation
2510.00585v1
eess.IV, cs.AI, cs.CV
2025-10-04
Authors:
Zulkaif Sajjad, Furqan Shaukat, Junaid Mir
Abstract
Accurate medical image segmentation plays a crucial role in diagnosis
and is one of the most essential tasks in the diagnostic pipeline. CNN-based
models, despite their extensive use, are limited by local receptive fields and
fail to capture global context. Hybrid approaches that combine CNNs with
transformers attempt to bridge this gap but fail to fuse local and global
features effectively. Vision-language models (VLMs) and foundation models have
recently been adapted for downstream medical imaging tasks; however, they
suffer from an inherent domain gap and high computational cost. To this
end, we propose U-DFA, a unified DINOv2-Unet encoder-decoder architecture that
integrates a novel Local-Global Fusion Adapter (LGFA) to enhance segmentation
performance. LGFA modules inject spatial features from a CNN-based Spatial
Pattern Adapter (SPA) module into frozen DINOv2 blocks at multiple stages,
enabling effective fusion of high-level semantic and spatial features. Our
method achieves state-of-the-art performance on the Synapse and ACDC datasets
with only 33% of the trainable model parameters. These results demonstrate
that U-DFA is a robust and scalable framework for medical image segmentation
across multiple modalities.
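The abstract describes the LGFA/SPA fusion only at a high level. The NumPy sketch below illustrates the general pattern it names, where a frozen transformer path receives spatial features from a small trainable CNN branch at selected stages. Everything in it is an assumption for illustration and not the authors' implementation: the class and function names (`ToyUDFAEncoder`, `Linear`, `patch_pool`) are hypothetical, the DINOv2 blocks are replaced by random frozen residual linear maps, the SPA stand-in is a single 1x1 convolution (a per-pixel linear map), and the fusion is a gated additive injection with a zero-initialised gate rather than the paper's attention-based design.

```python
import numpy as np


def patch_pool(feat, patch):
    """Average-pool an (H, W, C) feature map into (num_patches, C) tokens."""
    h, w, c = feat.shape
    gh, gw = h // patch, w // patch
    grid = feat[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch, c)
    return grid.mean(axis=(1, 3)).reshape(gh * gw, c)


class Linear:
    """Dense weight matrix with a trainable flag (hypothetical stand-in for a layer)."""

    def __init__(self, rng, n_in, n_out, trainable, scale=None):
        scale = 1.0 / np.sqrt(n_in) if scale is None else scale
        self.w = rng.standard_normal((n_in, n_out)) * scale
        self.trainable = trainable

    def __call__(self, x):
        return x @ self.w


class ToyUDFAEncoder:
    """Hypothetical sketch: frozen ViT-style blocks plus a trainable CNN branch injected at some stages."""

    def __init__(self, c_img=3, dim=32, c_spa=16, depth=4, inject_at=(1, 3), patch=4, seed=0):
        rng = np.random.default_rng(seed)
        self.patch = patch
        self.inject_at = set(inject_at)
        # Frozen path (stand-in for pretrained DINOv2): patch embedding + residual blocks.
        self.embed = Linear(rng, c_img, dim, trainable=False)
        self.blocks = [Linear(rng, dim, dim, trainable=False) for _ in range(depth)]
        # Trainable spatial branch (SPA stand-in): a 1x1 convolution, i.e. a per-pixel linear map.
        self.spa = Linear(rng, c_img, c_spa, trainable=True)
        # Trainable fusion adapters (LGFA stand-in): projection + zero-initialised gate per stage.
        self.fuse = {i: Linear(rng, c_spa, dim, trainable=True, scale=0.02) for i in inject_at}
        self.gates = {i: np.zeros(dim) for i in inject_at}

    def __call__(self, image):
        # Global path: patch tokens from the frozen embedding.
        tokens = self.embed(patch_pool(image, self.patch))
        # Local path: ReLU(1x1 conv) features pooled onto the same token grid.
        spatial = patch_pool(np.maximum(self.spa(image), 0.0), self.patch)
        for i, block in enumerate(self.blocks):
            tokens = tokens + np.tanh(block(tokens))  # frozen residual block
            if i in self.inject_at:
                # Gated injection: with gate == 0 the frozen path is left unchanged.
                tokens = tokens + self.gates[i] * self.fuse[i](spatial)
        return tokens

    def param_counts(self):
        """Return (trainable, frozen) parameter counts for this toy model."""
        layers = [self.embed, self.spa, *self.blocks, *self.fuse.values()]
        trainable = sum(layer.w.size for layer in layers if layer.trainable)
        trainable += sum(g.size for g in self.gates.values())
        frozen = sum(layer.w.size for layer in layers if not layer.trainable)
        return trainable, frozen


model = ToyUDFAEncoder()
image = np.random.default_rng(1).random((16, 16, 3))
print(model(image).shape)  # (16, 32): 4x4 patch tokens, 32 channels
trainable, frozen = model.param_counts()
print(f"{trainable / (trainable + frozen):.2f}")  # 0.21 for this toy configuration
```

Zero-initialising the gates is one common way to make such an adapter start as an identity on top of the frozen backbone, so training only has to learn how much local detail to add. The trainable fraction printed here reflects only this toy configuration and says nothing about the 33% figure reported for U-DFA.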