Ground-Truth Subgraphs for Better Training and Evaluation of Knowledge Graph Augmented LLMs
2511.04473v1
cs.LG, cs.AI, cs.CL, cs.IR
2025-11-08
Авторы:
Alberto Cattaneo, Carlo Luschi, Daniel Justus
Abstract
Retrieval of information from graph-structured knowledge bases represents a
promising direction for improving the factuality of LLMs. While various
solutions have been proposed, a comparison of methods is difficult due to the
lack of challenging QA datasets with ground-truth targets for graph retrieval.
We present SynthKGQA, a framework for generating high-quality synthetic
Knowledge Graph Question Answering datasets from any Knowledge Graph, providing
the full set of ground-truth facts in the KG to reason over each question. We
show how, in addition to enabling more informative benchmarking of KG
retrievers, the data produced with SynthKGQA also allows us to train better
models. We apply SynthKGQA to Wikidata to generate GTSQA, a new dataset
designed to test zero-shot generalization abilities of KG retrievers with
respect to unseen graph structures and relation types, and benchmark popular
solutions for KG-augmented LLMs on it.