BiPETE: A Bi-Positional Embedding Transformer Encoder for Risk Assessment of Alcohol and Substance Use Disorder with Electronic Health Records
2511.04998v1
cs.LG, cs.AI, q-bio.QM
2025-11-11
Authors:
Daniel S. Lee, Mayra S. Haedo-Cruz, Chen Jiang, Oshin Miranda, LiRong Wang
Abstract
Transformer-based deep learning models have shown promise for disease risk
prediction using electronic health records (EHRs), but modeling temporal
dependencies remains a key challenge due to irregular visit intervals and lack
of uniform structure. We propose a Bi-Positional Embedding Transformer Encoder
(BiPETE) for single-disease prediction, which integrates rotary positional
embeddings to encode relative visit timing and sinusoidal embeddings to
preserve visit order. Without relying on large-scale pretraining, BiPETE is
trained on EHR data from two mental health cohorts, depressive disorder and
post-traumatic stress disorder (PTSD), to predict the risk of alcohol and
substance use disorders (ASUD). BiPETE outperforms baseline models, improving
the area under the precision-recall curve (AUPRC) by 34% and 50% in the
depression and PTSD cohorts, respectively. An ablation study further confirms
the effectiveness of the dual positional encoding strategy. We apply the
Integrated Gradients method to interpret model predictions, identifying key
clinical features associated with ASUD risk and protection, such as abnormal
inflammatory, hematologic, and metabolic markers, as well as specific
medications and comorbidities. These attributions clarify how the model
assesses risk and offer clues for mitigating it. In summary, our study presents
a practical, interpretable framework for disease risk prediction from EHR data
that achieves strong performance without large-scale pretraining.
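
The dual positional encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two embeddings are combined inside the encoder, so the function names, the use of days as the time unit, the base of 10000, and the toy query/key vectors below are all assumptions made for illustration. The sketch shows only the two ingredients: a sinusoidal embedding indexed by visit order, and a rotary rotation indexed by visit time, whose attention score depends only on the time gap between two visits.

```python
import math

def sinusoidal_embedding(position, dim):
    """Standard sinusoidal embedding for an integer position (here: visit order)."""
    emb = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        emb.append(math.sin(position * freq))
        emb.append(math.cos(position * freq))
    return emb

def rotary_rotate(vec, t, base=10000.0):
    """Rotate each consecutive pair of vec by an angle proportional to time t.

    Assumption: t is the visit time in days; dim must be even.
    """
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = t / (base ** (i / dim))
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

dim = 8
# Hypothetical query/key vectors for two visits (illustrative values only).
q = [0.1 * (j + 1) for j in range(dim)]
k = [0.05 * (dim - j) for j in range(dim)]

# Two visit pairs with the same 42-day gap but different absolute dates.
score_a = dot(rotary_rotate(q, 45), rotary_rotate(k, 3))
score_b = dot(rotary_rotate(q, 442), rotary_rotate(k, 400))

# Visit-order embedding for the third visit in a sequence (index 2).
order_emb = sinusoidal_embedding(2, dim)
```

In this sketch `score_a` and `score_b` agree up to floating-point error, which is the sense in which a rotary embedding encodes relative rather than absolute timing, while `order_emb` depends only on the visit's index in the sequence and not on the irregular gaps between visits.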