FaStFACT: Faster, Stronger Long-Form Factuality Evaluations in LLMs
2510.12839v1
cs.CL, cs.AI, cs.CE, cs.CY
2025-10-17
Авторы:
Yingjia Wan, Haochen Tan, Xiao Zhu, Xinyu Zhou, Zhiwei Li, Qingsong Lv, Changxuan Sun, Jiaqi Zeng, Yi Xu, Jianqiao Lu, Yinhong Liu, Zhijiang Guo
Abstract
Evaluating the factuality of long-form generations from Large Language Models
(LLMs) remains challenging due to accuracy issues and costly human assessment.
Prior efforts attempt this by decomposing text into claims, searching for
evidence, and verifying claims, but suffer from critical drawbacks: (1)
inefficiency due to complex pipeline components unsuitable for long LLM
outputs, and (2) ineffectiveness stemming from inaccurate claim sets and
insufficient evidence collection of one-line snippets.
To address these limitations, we propose \name, a fast and strong evaluation
framework that achieves the highest alignment with human evaluation and
efficiency among existing baselines. \name first employs chunk-level claim
extraction integrated with confidence-based pre-verification, significantly
reducing the cost of web searching and inference calling while ensuring
reliability. For searching and verification, it collects document-level
evidence from crawled webpages and selectively retrieves it during
verification, addressing the evidence insufficiency problem in previous
pipelines.
Extensive experiments based on an aggregated and manually annotated benchmark
demonstrate the reliability of \name in both efficiently and effectively
evaluating the factuality of long-form LLM generations. Code and benchmark data
is available at https://github.com/Yingjia-Wan/FastFact.