Learning to reason about rare diseases through retrieval-augmented agents

2511.04720v1 cs.CL, cs.AI 2025-11-11

Authors:

Ha Young Kim, Jun Li, Ana Beatriz Solana, Carolin M. Pirkl, Benedikt Wiestler, Julia A. Schnabel, Cosmin I. Bercea

Abstract

Rare diseases represent the long tail of medical imaging, where AI models often fail due to the scarcity of representative training data. In clinical workflows, radiologists frequently consult case reports and literature when confronted with unfamiliar findings. Following this line of reasoning, we introduce RADAR (Retrieval Augmented Diagnostic Reasoning Agents), an agentic system for rare disease detection in brain MRI. Our approach gives AI agents access to external medical knowledge: case reports and literature are embedded with sentence transformers and indexed with FAISS to enable efficient similarity search. The agent retrieves clinically relevant evidence to guide diagnostic decision-making on unseen diseases, without the need for additional training. Designed as a model-agnostic reasoning module, RADAR can be seamlessly integrated with diverse large language models, consistently improving their rare pathology recognition and interpretability. On the NOVA dataset, comprising 280 distinct rare diseases, RADAR achieves up to a 10.2% performance gain, with the strongest improvements observed for open-source models such as DeepSeek. Beyond accuracy, the retrieved examples provide interpretable, literature-grounded explanations, highlighting retrieval-augmented reasoning as a powerful paradigm for low-prevalence conditions in medical imaging.
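The retrieval loop described in the abstract (embed case reports and literature, index them, retrieve the most similar entries for a new case, and hand that evidence to a language model) can be illustrated with a small sketch. This is a hypothetical, dependency-free sketch, not the RADAR implementation. The sentence transformer is replaced by a toy hashed bag-of-words embedder (`toy_embed`), and FAISS is replaced by brute-force inner-product search over L2-normalized vectors, which is the exact computation a flat inner-product FAISS index performs and equals cosine similarity on normalized vectors. The names `FlatIPIndex`, `build_prompt`, `diagnose`, the prompt wording, and the `llm` callable are all assumptions made for illustration.

```python
import math
import re
import zlib
from typing import Callable, List, Sequence, Tuple

DIM = 1024  # hypothetical embedding size for the toy embedder


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for a sentence transformer: a hashed bag of words,
    L2-normalized so that the inner product of two embeddings is their cosine similarity."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0  # stable hash bucket per token
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


class FlatIPIndex:
    """Exact (brute-force) inner-product search over stored embeddings.
    Mirrors what a flat inner-product index computes, without using FAISS itself."""

    def __init__(self) -> None:
        self.vectors: List[List[float]] = []
        self.texts: List[str] = []

    def add(self, texts: Sequence[str]) -> None:
        for t in texts:
            self.texts.append(t)
            self.vectors.append(toy_embed(t))

    def search(self, query: str, k: int) -> List[Tuple[float, str]]:
        q = toy_embed(query)
        scored = [(sum(a * b for a, b in zip(q, v)), t) for v, t in zip(self.vectors, self.texts)]
        scored.sort(key=lambda s: s[0], reverse=True)  # highest similarity first
        return scored[:k]


def build_prompt(findings: str, evidence: Sequence[Tuple[float, str]]) -> str:
    """Assemble the findings and ranked evidence into one prompt (hypothetical wording)."""
    lines = ["Imaging findings:", findings, "", "Retrieved case reports and literature:"]
    for rank, (score, text) in enumerate(evidence, start=1):
        lines.append(f"[{rank}] (similarity {score:.2f}) {text}")
    lines.append("")
    lines.append("Using the evidence above, state the most likely diagnosis and cite the supporting entries.")
    return "\n".join(lines)


def diagnose(findings: str, index: FlatIPIndex, llm: Callable[[str], str], k: int = 3) -> str:
    """Retrieve the top-k entries and query the model; any prompt -> reply function works as `llm`."""
    evidence = index.search(findings, k)
    return llm(build_prompt(findings, evidence))


if __name__ == "__main__":
    # Hypothetical toy corpus standing in for indexed case reports and literature.
    corpus = [
        "Case report: Rasmussen encephalitis with unilateral hemispheric atrophy",
        "Case report: X-linked adrenoleukodystrophy with posterior white matter T2 hyperintensity",
        "Review: Creutzfeldt-Jakob disease showing cortical ribboning on diffusion-weighted imaging",
    ]
    index = FlatIPIndex()
    index.add(corpus)
    # A stub "model" that echoes its prompt, so the retrieved evidence is visible.
    print(diagnose("cortical ribboning on diffusion imaging", index, llm=lambda p: p, k=2))
```

Because `diagnose` only requires a function from a prompt string to a reply, a different language model can be swapped in without touching the retrieval code, which is one way to read the abstract's "model-agnostic" description. In a real system, `toy_embed` and `FlatIPIndex` would be replaced by an actual sentence-transformer encoder and a FAISS index.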

Related articles

Mitigating Self-Preference by Authorship Obfuscation

Dynamic Alignment for Collective Agency: Toward a Scalable Self-Improving Framew...

Grounded Multilingual Medical Reasoning for Question Answering with Large Langua...

Faithfulness metric fusion: Improving the evaluation of LLM trustworthiness acro...

Retrieving Semantically Similar Decisions under Noisy Institutional Labels: Robu...
