Mixture of Experts Approaches in Dense Retrieval Tasks
2510.15683v1
cs.IR, cs.AI, I.2.4; I.2.7
2025-10-21
Authors:
Effrosyni Sokli, Pranav Kasela, Georgios Peikos, Gabriella Pasi
Abstract
Dense Retrieval Models (DRMs) are a prominent development in Information
Retrieval (IR). A key challenge with these neural Transformer-based models is
that they often struggle to generalize beyond the specific tasks and domains
they were trained on. To address this challenge, prior research in IR
incorporated the Mixture-of-Experts (MoE) framework within each Transformer
layer of a DRM, which, though effective, added a substantial number of
parameters. In this paper, we propose a more efficient design, which
introduces a single MoE block (SB-MoE) after the final Transformer layer. To
assess the retrieval effectiveness of SB-MoE, we perform an empirical
evaluation across three IR tasks. Our experiments involve two evaluation
setups, aiming to assess both in-domain effectiveness and the model's zero-shot
generalizability. In the first setup, we fine-tune SB-MoE with four different
underlying DRMs on seven IR benchmarks and evaluate them on their respective
test sets. In the second setup, we fine-tune SB-MoE on MSMARCO and perform
zero-shot evaluation on thirteen BEIR datasets. Additionally, we perform
further experiments to analyze the model's sensitivity to its hyperparameters
(i.e., the number of employed and activated experts) and investigate how varying
them affects SB-MoE's performance. The results show that SB-MoE
is particularly effective for DRMs with lightweight base models, such as
TinyBERT and BERT-Small, consistently exceeding standard model fine-tuning
across benchmarks. For DRMs with more parameters, such as BERT-Base and
Contriever, our model requires a larger number of training samples to achieve
improved retrieval performance. Our code is available online at:
https://github.com/FaySokli/SB-MoE.
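
The abstract describes SB-MoE only at a high level: one MoE block placed after the final Transformer layer, with a configurable number of employed and activated experts. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. The linear softmax gate, the single-layer ReLU experts, the top-k renormalization, and all names (`SingleMoEBlock`, `route`, `forward`) are assumptions made for illustration; the linked repository is the reference for the actual design.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SingleMoEBlock:
    """Hypothetical single MoE block applied to an encoder's final embedding."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        if not 1 <= top_k <= num_experts:
            raise ValueError("top_k must be between 1 and num_experts")
        rng = random.Random(seed)
        self.top_k = top_k
        # Gate: one score per employed expert (assumed linear gating).
        self.gate = random_matrix(num_experts, dim, rng)
        # Each expert: a dim -> dim linear map followed by ReLU (assumed shape).
        self.experts = [random_matrix(dim, dim, rng) for _ in range(num_experts)]

    def route(self, embedding):
        """Return (expert_index, weight) pairs for the top_k activated experts."""
        probs = softmax(matvec(self.gate, embedding))
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        chosen = ranked[: self.top_k]
        # Renormalize so the activated experts' weights sum to 1.
        norm = sum(probs[i] for i in chosen)
        return [(i, probs[i] / norm) for i in chosen]

    def forward(self, embedding):
        """Weighted sum of the activated experts' outputs."""
        output = [0.0] * len(embedding)
        for index, weight in self.route(embedding):
            hidden = [max(0.0, h) for h in matvec(self.experts[index], embedding)]
            output = [o + weight * h for o, h in zip(output, hidden)]
        return output


# Usage: 6 employed experts, 2 activated per input (illustrative values only).
block = SingleMoEBlock(dim=8, num_experts=6, top_k=2)
embedding = [0.5, -0.2, 0.1, 0.9, -0.7, 0.3, 0.0, 0.4]  # stand-in for a DRM output
routing = block.route(embedding)
refined = block.forward(embedding)
```

In this sketch only the gate and the experts add parameters on top of the base encoder, which is the efficiency argument the abstract makes against placing an MoE block inside every Transformer layer. The `num_experts` and `top_k` arguments correspond to the employed and activated expert counts whose effect the paper analyzes.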