GAPMAP: Mapping Scientific Knowledge Gaps in Biomedical Literature Using Large Language Models
2510.25055v1
cs.CL, cs.AI, cs.LG
2025-10-31
Авторы:
Nourah M Salem, Elizabeth White, Michael Bada, Lawrence Hunter
Abstract
Scientific progress is driven by the deliberate articulation of what remains
unknown. This study investigates the ability of large language models (LLMs) to
identify research knowledge gaps in the biomedical literature. We define two
categories of knowledge gaps: explicit gaps, clear declarations of missing
knowledge; and implicit gaps, context-inferred missing knowledge. While prior
work has focused mainly on explicit gap detection, we extend this line of
research by addressing the novel task of inferring implicit gaps. We conducted
two experiments on almost 1500 documents across four datasets, including a
manually annotated corpus of biomedical articles. We benchmarked both
closed-weight models (from OpenAI) and open-weight models (Llama and Gemma 2)
under paragraph-level and full-paper settings. To address the reasoning of
implicit gaps inference, we introduce \textbf{\small TABI}, a Toulmin-Abductive
Bucketed Inference scheme that structures reasoning and buckets inferred
conclusion candidates for validation. Our results highlight the robust
capability of LLMs in identifying both explicit and implicit knowledge gaps.
This is true for both open- and closed-weight models, with larger variants
often performing better. This suggests a strong ability of LLMs for
systematically identifying candidate knowledge gaps, which can support
early-stage research formulation, policymakers, and funding decisions. We also
report observed failure modes and outline directions for robust deployment,
including domain adaptation, human-in-the-loop verification, and benchmarking
across open- and closed-weight models.
Ссылки и действия
Дополнительные ресурсы: