Beyond Single Embeddings: Capturing Diverse Targets with Multi-Query Retrieval
2511.02770v1
cs.CL, cs.IR
2025-11-06
Авторы:
Hung-Ting Chen, Xiang Liu, Shauli Ravfogel, Eunsol Choi
Abstract
Most text retrievers generate \emph{one} query vector to retrieve relevant
documents. Yet, the conditional distribution of relevant documents for the
query may be multimodal, e.g., representing different interpretations of the
query. We first quantify the limitations of existing retrievers. All retrievers
we evaluate struggle more as the distance between target document embeddings
grows. To address this limitation, we develop a new retriever architecture,
\emph{A}utoregressive \emph{M}ulti-\emph{E}mbedding \emph{R}etriever (AMER).
Our model autoregressively generates multiple query vectors, and all the
predicted query vectors are used to retrieve documents from the corpus. We show
that on the synthetic vectorized data, the proposed method could capture
multiple target distributions perfectly, showing 4x better performance than
single embedding model. We also fine-tune our model on real-world multi-answer
retrieval datasets and evaluate in-domain. AMER presents 4 and 21\% relative
gains over single-embedding baselines on two datasets we evaluate on.
Furthermore, we consistently observe larger gains on the subset of dataset
where the embeddings of the target documents are less similar to each other. We
demonstrate the potential of using a multi-query vector retriever and open up a
new direction for future work.
Ссылки и действия
Дополнительные ресурсы: