Rethinking On-policy Optimization for Query Augmentation
2510.17139v1
cs.CL, cs.IR
2025-10-22
Авторы:
Zhichao Xu, Shengyao Zhuang, Xueguang Ma, Bingsen Chen, Yijun Tian, Fengran Mo, Jie Cao, Vivek Srikumar
Abstract
Recent advances in large language models (LLMs) have led to a surge of
interest in query augmentation for information retrieval (IR). Two main
approaches have emerged. The first prompts LLMs to generate answers or
pseudo-documents that serve as new queries, relying purely on the model's
parametric knowledge or contextual information. The second applies
reinforcement learning (RL) to fine-tune LLMs for query rewriting, directly
optimizing retrieval metrics. While having respective advantages and
limitations, the two approaches have not been compared under consistent
experimental conditions. In this work, we present the first systematic
comparison of prompting-based and RL-based query augmentation across diverse
benchmarks, including evidence-seeking, ad hoc, and tool retrieval. Our key
finding is that simple, training-free query augmentation often performs on par
with, or even surpasses, more expensive RL-based counterparts, especially when
using powerful LLMs. Motivated by this discovery, we introduce a novel hybrid
method, On-policy Pseudo-document Query Expansion (OPQE), which, instead of
rewriting a query, the LLM policy learns to generate a pseudo-document that
maximizes retrieval performance, thus merging the flexibility and generative
structure of prompting with the targeted optimization of RL. We show OPQE
outperforms both standalone prompting and RL-based rewriting, demonstrating
that a synergistic approach yields the best results. Our implementation is made
available to facilitate reproducibility.
Ссылки и действия
Дополнительные ресурсы: