Assessing the Real-World Utility of Explainable AI for Arousal Diagnostics: An Application-Grounded User Study

2510.21389v1 cs.LG, cs.AI, cs.HC 2025-10-28

Authors:

Stefan Kraft, Andreas Theissler, Vera Wienhausen-Wilke, Gjergji Kasneci, Hendrik Lensch

Abstract

Artificial intelligence (AI) systems increasingly match or surpass human experts in biomedical signal interpretation. However, their effective integration into clinical practice requires more than high predictive accuracy. Clinicians must discern when and why to trust algorithmic recommendations. This work presents an application-grounded user study with eight professional sleep medicine practitioners, who score nocturnal arousal events in polysomnographic data under three conditions: (i) manual scoring, (ii) black-box (BB) AI assistance, and (iii) transparent white-box (WB) AI assistance. Assistance is provided either from the start of scoring or as a post-hoc quality-control (QC) review. We systematically evaluate how the type and timing of assistance influence event-level and clinically most relevant count-based performance, time requirements, and user experience. When evaluated against the clinical standard used to train the AI, both AI and human-AI teams significantly outperform unaided experts, with collaboration also reducing inter-rater variability. Notably, transparent AI assistance applied as a targeted QC step yields median event-level performance improvements of approximately 30% over black-box assistance, and QC timing further enhances count-based outcomes. While WB and QC approaches increase the time required for scoring, start-time assistance is faster and preferred by most participants. Participants overwhelmingly favor transparency, with seven out of eight expressing willingness to adopt the system with minor or no modifications. In summary, strategically timed transparent AI assistance effectively balances accuracy and clinical efficiency, providing a promising pathway toward trustworthy AI integration and user acceptance in clinical workflows.
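The abstract distinguishes event-level from count-based performance without defining either. The following is a minimal sketch of how the two views could differ, not the authors' evaluation code. It assumes events are (start, end) intervals in seconds, that a scored event matches a reference event if the two overlap at all, and that count-based error is the absolute difference in event counts. The names `event_level_f1` and `count_error`, the matching rule, and the example data are all hypothetical.

```python
from typing import List, Tuple

Event = Tuple[float, float]  # (start_s, end_s) of one scored arousal (assumed format)


def overlaps(a: Event, b: Event) -> bool:
    """Return True if the two intervals share any time."""
    return a[0] < b[1] and b[0] < a[1]


def event_level_f1(scored: List[Event], reference: List[Event]) -> float:
    """F1 after greedy one-to-one matching by overlap (an assumed matching rule)."""
    unmatched = list(reference)
    tp = 0
    for ev in scored:
        for ref in unmatched:
            if overlaps(ev, ref):
                unmatched.remove(ref)  # each reference event is matched at most once
                tp += 1
                break
    fp = len(scored) - tp      # scored events with no reference counterpart
    fn = len(reference) - tp   # reference events that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def count_error(scored: List[Event], reference: List[Event]) -> int:
    """Absolute difference in the number of events (an assumed count-based measure)."""
    return abs(len(scored) - len(reference))


# Made-up example: one scored event is misplaced, but the totals agree.
reference = [(10.0, 15.0), (40.0, 46.0), (90.0, 95.0)]
scored = [(11.0, 14.0), (60.0, 64.0), (91.0, 96.0)]

f1 = event_level_f1(scored, reference)
err = count_error(scored, reference)
print(f"event-level F1 = {f1:.3f}, count error = {err}")
```

In this made-up example the count error is zero even though one event is misplaced, so the two measures can rank the same scoring differently. That may be why the study reports them separately.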
