Multimodal Foundation Models for Early Disease Detection
2510.01899v1
cs.LG, cs.AI, cs.HC
2025-10-04
Authors:
Md Talha Mohsin, Ismail Abdulrashid
Abstract
Healthcare generates diverse streams of data, including electronic health
records (EHR), medical imaging, genetics, and ongoing monitoring from wearable
devices. Traditional diagnostic models frequently analyze these sources in
isolation, which constrains their capacity to identify cross-modal correlations
essential for early disease diagnosis. We present a multimodal
foundation model that consolidates diverse patient data through an
attention-based transformer framework. Dedicated encoders first project each
modality into a shared latent space; the resulting representations are then
fused using multi-head attention and residual normalization. The architecture
is designed for multi-task pretraining, so that it can be adapted to new
diseases and datasets with little additional effort. We outline an experimental
strategy that uses benchmark
datasets in oncology, cardiology, and neurology to evaluate early detection
tasks. Beyond technical performance, the framework includes data governance and
model management tools intended to improve transparency, reliability, and
clinical interpretability. The proposed method is a step toward a single
foundation model for precision diagnostics, which could improve predictive
accuracy and support clinical decision-making.
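
The fusion step described in the abstract (per-modality encoders, a shared latent space, multi-head attention, and a residual connection followed by normalization) can be illustrated with a small standard-library sketch. This is not the authors' implementation: the latent width, head count, feature sizes, linear encoders, mean pooling, and all function names are assumptions chosen for illustration, and random matrices stand in for learned weights.

```python
import math
import random

random.seed(0)
D = 8       # shared latent width (assumed for illustration)
HEADS = 2   # number of attention heads (assumed for illustration)

def rand_matrix(rows, cols):
    """Random weights standing in for learned parameters."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (near) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multi_head_attention(tokens, Wq, Wk, Wv, Wo):
    """Self-attention across modality tokens; returns outputs and weights."""
    hd = D // HEADS
    Q = [matvec(Wq, t) for t in tokens]
    K = [matvec(Wk, t) for t in tokens]
    V = [matvec(Wv, t) for t in tokens]
    n = len(tokens)
    outputs, weights = [], []
    for i in range(n):
        concat, row_weights = [], []
        for h in range(HEADS):
            s = slice(h * hd, (h + 1) * hd)
            # Scaled dot-product scores of token i against every token j.
            scores = [sum(a * b for a, b in zip(Q[i][s], K[j][s])) / math.sqrt(hd)
                      for j in range(n)]
            w = softmax(scores)
            row_weights.append(w)
            # Weighted sum of this head's value slices.
            concat += [sum(w[j] * V[j][s][k] for j in range(n)) for k in range(hd)]
        outputs.append(matvec(Wo, concat))
        weights.append(row_weights)
    return outputs, weights

def fuse(patient, encoders, attn):
    """Encode each modality, attend across modalities, add residual, normalize."""
    names = sorted(patient)
    # Modality-specific (here: linear) encoders map into the shared latent space.
    tokens = [matvec(encoders[name], patient[name]) for name in names]
    attended, weights = multi_head_attention(tokens, *attn)
    # Residual connection followed by layer normalization.
    fused = [layer_norm([t + a for t, a in zip(tok, att)])
             for tok, att in zip(tokens, attended)]
    # Mean-pool the modality tokens into one patient embedding (an assumption).
    pooled = [sum(col) / len(fused) for col in zip(*fused)]
    return pooled, weights

# Hypothetical feature sizes for the four data streams named in the abstract.
input_dims = {"ehr": 5, "imaging": 6, "genetics": 4, "wearable": 3}
encoders = {name: rand_matrix(D, dim) for name, dim in input_dims.items()}
attn = tuple(rand_matrix(D, D) for _ in range(4))  # Wq, Wk, Wv, Wo
patient = {name: [random.random() for _ in range(dim)]
           for name, dim in input_dims.items()}

embedding, weights = fuse(patient, encoders, attn)
```

Here `embedding` is a single vector of width `D` that a downstream task head could consume, and `weights[i][h]` holds how strongly modality `i` attends to each modality in head `h`. Attention weights of this kind are one possible handle for the interpretability goals the abstract mentions.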