Interpretable RNA-Seq Clustering with an LLM-Based Agentic Evidence-Grounded Framework
2510.16082v1
q-bio.QM, cs.AI, cs.LG
2025-10-22
Authors:
Elias Hossain, Mehrdad Shoeibi, Ivan Garibay, Niloofar Yousefi
Abstract
We propose CITE V.1, an agentic, evidence-grounded framework that leverages
Large Language Models (LLMs) to provide transparent and reproducible
interpretations of RNA-seq clusters. Unlike existing enrichment-based
approaches that reduce results to broad statistical associations and LLM-only
models that risk unsupported claims or fabricated citations, CITE V.1
transforms cluster interpretation by producing biologically coherent
explanations explicitly anchored in the biomedical literature. The framework
orchestrates three specialized agents: a Retriever that gathers domain
knowledge from PubMed and UniProt, an Interpreter that formulates functional
hypotheses, and Critics that evaluate claims, enforce evidence grounding, and
qualify uncertainty through confidence and reliability indicators. Applied to
Salmonella enterica RNA-seq data, CITE V.1 generated biologically meaningful
insights supported by the literature, while an LLM-only Gemini baseline
frequently produced speculative results with false citations. By moving RNA-seq
analysis from surface-level enrichment to auditable, interpretable, and
evidence-based hypothesis generation, CITE V.1 advances the transparency and
reliability of AI in biomedicine.
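The abstract names three roles (Retriever, Interpreter, Critics) without giving implementation details. The following is a minimal sketch of how such a pipeline could be wired together, not the authors' implementation. Every name in it (Evidence, Claim, retrieve, interpret, critique, TOY_CORPUS, the DOC-* identifiers and the geneA/geneB/geneC labels) is a hypothetical placeholder. The retriever searches a small in-memory corpus instead of PubMed or UniProt, the interpreter is a stub standing in for an LLM call, and the confidence score is simply the fraction of cited records that were actually retrieved.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    source_id: str      # hypothetical record identifier (placeholder, not a real PMID)
    genes: frozenset    # gene labels mentioned by the record
    text: str           # toy snippet standing in for an abstract


@dataclass
class Claim:
    statement: str
    cited_ids: list
    confidence: float = 0.0
    flagged_ids: list = field(default_factory=list)


# Toy stand-in for literature and protein databases (assumption: no network access).
TOY_CORPUS = [
    Evidence("DOC-1", frozenset({"geneA", "geneB"}), "toy text about geneA and geneB"),
    Evidence("DOC-2", frozenset({"geneC"}), "toy text about geneC"),
]


def retrieve(cluster_genes, corpus):
    """Retriever: keep records that mention at least one gene in the cluster."""
    wanted = set(cluster_genes)
    return [ev for ev in corpus if ev.genes & wanted]


def interpret(cluster_genes, evidence):
    """Interpreter: stub for an LLM call; cites every retrieved record."""
    statement = "Cluster genes " + ", ".join(sorted(cluster_genes)) + " may share a function."
    return Claim(statement=statement, cited_ids=[ev.source_id for ev in evidence])


def critique(claim, evidence):
    """Critic: flag citations absent from the retrieved evidence and score the claim."""
    known = {ev.source_id for ev in evidence}
    claim.flagged_ids = [cid for cid in claim.cited_ids if cid not in known]
    supported = len(claim.cited_ids) - len(claim.flagged_ids)
    # Confidence here is just the supported fraction; an uncited claim scores 0.
    claim.confidence = supported / len(claim.cited_ids) if claim.cited_ids else 0.0
    return claim


if __name__ == "__main__":
    genes = ["geneA", "geneC"]
    evidence = retrieve(genes, TOY_CORPUS)
    grounded = critique(interpret(genes, evidence), evidence)
    print(grounded.statement, grounded.confidence, grounded.flagged_ids)

    # A claim that cites a record the retriever never returned gets flagged.
    ungrounded = critique(Claim("Unsupported statement.", ["DOC-1", "DOC-999"]), evidence)
    print(ungrounded.confidence, ungrounded.flagged_ids)
```

In this sketch the critic only checks that each citation exists in the retrieved set. Judging whether a cited record actually supports the statement would need a further step that is not shown here.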