Simple Projection Variants Improve ColBERT Performance
2510.12327v1
cs.IR, cs.AI, cs.CL
2025-10-16
Авторы:
Benjamin Clavié, Sean Lee, Rikiya Takehi, Aamir Shakir, Makoto P. Kato
Abstract
Multi-vector dense retrieval methods like ColBERT systematically use a
single-layer linear projection to reduce the dimensionality of individual
vectors. In this study, we explore the implications of the MaxSim operator on
the gradient flows of the training of multi-vector models and show that such a
simple linear projection has inherent, if non-critical, limitations in this
setting. We then discuss the theoretical improvements that could result from
replacing this single-layer projection with well-studied alternative
feedforward linear networks (FFN), such as deeper, non-linear FFN blocks, GLU
blocks, and skip-connections, could alleviate these limitations. Through the
design and systematic evaluation of alternate projection blocks, we show that
better-designed final projections positively impact the downstream performance
of ColBERT models. We highlight that many projection variants outperform the
original linear projections, with the best-performing variants increasing
average performance on a range of retrieval benchmarks across domains by over 2
NDCG@10 points. We then conduct further exploration on the individual
parameters of these projections block in order to understand what drives this
empirical performance, highlighting the particular importance of upscaled
intermediate projections and residual connections. As part of these ablation
studies, we show that numerous suboptimal projection variants still outperform
the traditional single-layer projection across multiple benchmarks, confirming
our hypothesis. Finally, we observe that this effect is consistent across
random seeds, further confirming that replacing the linear layer of ColBERT
models is a robust, drop-in upgrade.
Ссылки и действия
Дополнительные ресурсы: