PluriHop: Exhaustive, Recall-Sensitive QA over Distractor-Rich Corpora

2510.14377v1 cs.CL, cs.IR, cs.LG 2025-10-18

Authors:

Mykolas Sveistrys, Richard Kunert

Abstract

Recent advances in large language models (LLMs) and retrieval-augmented generation (RAG) have enabled progress on question answering (QA) when the relevant evidence lies in one passage (single-hop) or several passages (multi-hop). Yet many realistic questions about recurring report data (medical records, compliance filings, maintenance logs) require aggregation across all documents, with no clear stopping point for retrieval and high sensitivity to even one missed passage. We term these pluri-hop questions and formalize them by three criteria: recall sensitivity, exhaustiveness, and exactness. To study this setting, we introduce PluriHopWIND, a diagnostic multilingual dataset of 48 pluri-hop questions built from 191 real-world wind industry reports in German and English. We show that PluriHopWIND is 8-40% more repetitive than other common datasets and thus has a higher density of distractor documents, better reflecting the practical challenges of recurring report corpora. We test a traditional RAG pipeline as well as graph-based and multimodal variants, and find that none of the tested approaches exceeds 40% statement-wise F1 score. Motivated by this, we propose PluriHopRAG, a RAG architecture that follows a "check all documents individually, filter cheaply" approach: it (i) decomposes queries into document-level subquestions and (ii) uses a cross-encoder filter to discard irrelevant documents before costly LLM reasoning. We find that PluriHopRAG achieves relative F1 score improvements of 18-52%, depending on the base LLM. Despite its modest size, PluriHopWIND exposes the limitations of current QA systems on repetitive, distractor-rich corpora. PluriHopRAG's performance highlights the value of exhaustive retrieval and early filtering as a powerful alternative to top-k methods.
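The "check all documents individually, filter cheaply" idea can be illustrated with a minimal sketch. This is not the authors' implementation: `exhaustive_filtered_qa`, `toy_score`, and `toy_read` are hypothetical names, a word-overlap score stands in for the cross-encoder, and the decomposition and reading steps (LLM calls in PluriHopRAG) are replaced by trivial stubs. The final aggregation of per-document answers into one response is also left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Document:
    doc_id: str
    text: str


# Hypothetical interfaces, not the paper's API:
Decompose = Callable[[str], str]      # query -> document-level subquestion (an LLM call in practice)
Score = Callable[[str, str], float]   # (subquestion, text) -> relevance (a cross-encoder in practice)
Read = Callable[[str, str], str]      # (subquestion, text) -> answer (the costly LLM call)


def exhaustive_filtered_qa(query: str, corpus: List[Document], decompose: Decompose,
                           score: Score, read: Read, threshold: float) -> Dict[str, str]:
    """Pose the same subquestion to every document; only filter survivors reach the reader."""
    subq = decompose(query)
    answers: Dict[str, str] = {}
    for doc in corpus:  # no top-k cutoff: every document is scored
        if score(subq, doc.text) < threshold:
            continue  # cheap rejection, no reader call spent
        answers[doc.doc_id] = read(subq, doc.text)
    return answers


def toy_score(subq: str, text: str) -> float:
    # Word-overlap stand-in for a cross-encoder: fraction of subquestion words found in the text.
    q, t = set(subq.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


llm_calls: List[str] = []


def toy_read(subq: str, text: str) -> str:
    # Stand-in for the LLM reader; records each call so the filtering savings are visible.
    llm_calls.append(text)
    return text


corpus = [
    Document("r1", "gearbox failure reported on turbine 7"),
    Document("r2", "routine inspection with no findings"),
    Document("r3", "gearbox failure on turbine 2"),
]
# decompose=lambda q: q is an identity stand-in for LLM query decomposition.
result = exhaustive_filtered_qa("gearbox failure", corpus, decompose=lambda q: q,
                                score=toy_score, read=toy_read, threshold=0.5)
print(result)
print(len(llm_calls), "reader calls for", len(corpus), "documents")
```

In this toy run, r1 and r3 pass the 0.5 threshold while r2 is rejected, so the expensive reader runs twice rather than three times. Unlike a top-k retriever, every document is scored before being discarded, which is what makes the approach suited to recall-sensitive questions.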
