Auto-ARGUE: LLM-Based Report Generation Evaluation

2509.26184v2 cs.IR, cs.AI, cs.CL 2025-10-02

Authors:

William Walden, Marc Mason, Orion Weller, Laura Dietz, Hannah Recknor, Bryan Li, Gabrielle Kaili-May Liu, Yu Hou, James Mayfield, Eugene Yang

Abstract

Generation of long-form, citation-backed reports is a primary use case for retrieval-augmented generation (RAG) systems. While open-source evaluation tools exist for various RAG tasks, ones tailored to report generation are lacking. Accordingly, we introduce Auto-ARGUE, a robust LLM-based implementation of the recent ARGUE framework for report generation evaluation. We present an analysis of Auto-ARGUE on the report generation pilot task from the TREC 2024 NeuCLIR track, showing good system-level correlations with human judgments. We further release a web app for visualization of Auto-ARGUE outputs.


