Foundation Models in Medical Image Analysis: A Systematic Review and Meta-Analysis
2510.16973v1
cs.CV, cs.AI, physics.med-ph
2025-10-22
Авторы:
Praveenbalaji Rajendran, Mojtaba Safari, Wenfeng He, Mingzhe Hu, Shansong Wang, Jun Zhou, Xiaofeng Yang
Abstract
Recent advancements in artificial intelligence (AI), particularly foundation
models (FMs), have revolutionized medical image analysis, demonstrating strong
zero- and few-shot performance across diverse medical imaging tasks, from
segmentation to report generation. Unlike traditional task-specific AI models,
FMs leverage large corpora of labeled and unlabeled multimodal datasets to
learn generalized representations that can be adapted to various downstream
clinical applications with minimal fine-tuning. However, despite the rapid
proliferation of FM research in medical imaging, the field remains fragmented,
lacking a unified synthesis that systematically maps the evolution of
architectures, training paradigms, and clinical applications across modalities.
To address this gap, this review article provides a comprehensive and
structured analysis of FMs in medical image analysis. We systematically
categorize studies into vision-only and vision-language FMs based on their
architectural foundations, training strategies, and downstream clinical tasks.
Additionally, a quantitative meta-analysis of the studies was conducted to
characterize temporal trends in dataset utilization and application domains. We
also critically discuss persistent challenges, including domain adaptation,
efficient fine-tuning, computational constraints, and interpretability along
with emerging solutions such as federated learning, knowledge distillation, and
advanced prompting. Finally, we identify key future research directions aimed
at enhancing the robustness, explainability, and clinical integration of FMs,
thereby accelerating their translation into real-world medical practice.