RAGSmith: A Framework for Finding the Optimal Composition of Retrieval-Augmented Generation Methods Across Datasets
2511.01386v1
cs.CL, cs.AI, cs.IR, H.3.3; I.2.7
2025-11-06
Авторы:
Muhammed Yusuf Kartal, Suha Kagan Kose, Korhan Sevinç, Burak Aktas
Abstract
Retrieval-Augmented Generation (RAG) quality depends on many interacting
choices across retrieval, ranking, augmentation, prompting, and generation, so
optimizing modules in isolation is brittle. We introduce RAGSmith, a modular
framework that treats RAG design as an end-to-end architecture search over nine
technique families and 46{,}080 feasible pipeline configurations. A genetic
search optimizes a scalar objective that jointly aggregates retrieval metrics
(recall@k, mAP, nDCG, MRR) and generation metrics (LLM-Judge and semantic
similarity). We evaluate on six Wikipedia-derived domains (Mathematics, Law,
Finance, Medicine, Defense Industry, Computer Science), each with 100 questions
spanning factual, interpretation, and long-answer types. RAGSmith finds
configurations that consistently outperform naive RAG baseline by +3.8\% on
average (range +1.2\% to +6.9\% across domains), with gains up to +12.5\% in
retrieval and +7.5\% in generation. The search typically explores $\approx
0.2\%$ of the space ($\sim 100$ candidates) and discovers a robust backbone --
vector retrieval plus post-generation reflection/revision -- augmented by
domain-dependent choices in expansion, reranking, augmentation, and prompt
reordering; passage compression is never selected. Improvement magnitude
correlates with question type, with larger gains on factual/long-answer mixes
than interpretation-heavy sets. These results provide practical, domain-aware
guidance for assembling effective RAG systems and demonstrate the utility of
evolutionary search for full-pipeline optimization.