VersionRAG: Version-Aware Retrieval-Augmented Generation for Evolving Documents
2510.08109v1
cs.IR, cs.AI, cs.CL
2025-10-11
Authors:
Daniel Huwiler, Kurt Stockinger, Jonathan Fürst
Abstract
Retrieval-Augmented Generation (RAG) systems fail when documents evolve
through versioning, a ubiquitous characteristic of technical documentation.
Existing approaches achieve only 58-64% accuracy on version-sensitive
questions, retrieving semantically similar content without temporal validity
checks. We present VersionRAG, a version-aware RAG framework that explicitly
models document evolution through a hierarchical graph structure capturing
version sequences, content boundaries, and changes between document states.
During retrieval, VersionRAG routes queries through specialized paths based on
intent classification, enabling precise version-aware filtering and change
tracking. On our VersionQA benchmark (100 manually curated questions across 34
versioned technical documents), VersionRAG achieves 90% accuracy, outperforming
naive RAG (58%) and GraphRAG (64%). VersionRAG reaches 60% accuracy on implicit
change detection where baselines fail (0-10%), demonstrating its ability to
track undocumented modifications. Additionally, VersionRAG requires 97% fewer
tokens during indexing than GraphRAG, making it practical for large-scale
deployment. Our work establishes versioned document QA as a distinct task and
provides both a solution and benchmark for future research.
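
To make the idea of intent-based routing with version-aware filtering concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the intent classes, the classifier, or the ranking function, so everything below is a hypothetical stand-in: the `Chunk` type, the three intent labels, the keyword-based `classify_intent`, and the lexical-overlap `score`. The hierarchical graph is also reduced to a flat list of version-tagged chunks.

```python
import re
from dataclasses import dataclass

VERSION_RE = r"\b\d+(?:\.\d+)+\b"  # matches strings such as "2.0" or "1.4.2"


@dataclass(frozen=True)
class Chunk:
    """A piece of document text tagged with the version it belongs to (hypothetical type)."""
    document: str
    version: tuple  # e.g. (2, 0), so versions compare in release order
    text: str


def parse_version(s):
    """Turn "1.4.2" into (1, 4, 2)."""
    return tuple(int(part) for part in s.split("."))


def extract_versions(query):
    """Collect every version number mentioned in the query."""
    return [parse_version(v) for v in re.findall(VERSION_RE, query)]


def classify_intent(query):
    """Toy keyword classifier; an assumption standing in for a real intent model."""
    q = query.lower()
    if re.search(r"\b(changed|changes|added|removed|differ\w*)\b", q):
        return "change"
    if re.search(VERSION_RE, q):
        return "version_specific"
    return "general"


def score(query, chunk):
    """Toy lexical-overlap score; an assumption standing in for embedding similarity."""
    q_words = set(re.findall(r"[a-z]+", query.lower()))
    c_words = set(re.findall(r"[a-z]+", chunk.text.lower()))
    return len(q_words & c_words)


def retrieve(query, chunks, k=2):
    """Route the query by intent, filter chunks by version, then rank the survivors."""
    intent = classify_intent(query)
    versions = extract_versions(query)
    if intent == "version_specific":
        # Keep only chunks from the versions named in the query.
        candidates = [c for c in chunks if c.version in versions]
    elif intent == "change" and len(versions) >= 2:
        # Keep chunks from every version inside the requested range.
        low, high = min(versions), max(versions)
        candidates = [c for c in chunks if low <= c.version <= high]
    elif intent == "general":
        # With no version named, fall back to the latest version only.
        latest = max(c.version for c in chunks)
        candidates = [c for c in chunks if c.version == latest]
    else:
        candidates = list(chunks)
    ranked = sorted(candidates, key=lambda c: score(query, c), reverse=True)
    return intent, ranked[:k]
```

Calling `retrieve("What is the timeout default in 2.0?", chunks)` on a list of `Chunk` objects would classify the query as version-specific and discard chunks from every other version before ranking. That filtering step is what the abstract refers to as a temporal validity check, which similarity-only retrieval skips.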