MedSapiens: Taking a Pose to Rethink Medical Imaging Landmark Detection
2511.04255v1
cs.CV, cs.AI, cs.LG
2025-11-08
Авторы:
Marawan Elbatel, Anbang Wang, Keyuan Liu, Kaouther Mouheb, Enrique Almar-Munoz, Lizhuo Lin, Yanqi Yang, Karim Lekadir, Xiaomeng Li
Abstract
This paper does not introduce a novel architecture; instead, it revisits a
fundamental yet overlooked baseline: adapting human-centric foundation models
for anatomical landmark detection in medical imaging. While landmark detection
has traditionally relied on domain-specific models, the emergence of
large-scale pre-trained vision models presents new opportunities. In this
study, we investigate the adaptation of Sapiens, a human-centric foundation
model designed for pose estimation, to medical imaging through multi-dataset
pretraining, establishing a new state of the art across multiple datasets. Our
proposed model, MedSapiens, demonstrates that human-centric foundation models,
inherently optimized for spatial pose localization, provide strong priors for
anatomical landmark detection, yet this potential has remained largely
untapped. We benchmark MedSapiens against existing state-of-the-art models,
achieving up to 5.26% improvement over generalist models and up to 21.81%
improvement over specialist models in the average success detection rate (SDR).
To further assess MedSapiens adaptability to novel downstream tasks with few
annotations, we evaluate its performance in limited-data settings, achieving
2.69% improvement over the few-shot state of the art in SDR. Code and model
weights are available at https://github.com/xmed-lab/MedSapiens .
Ссылки и действия
Дополнительные ресурсы: