DocPruner: A Storage-Efficient Framework for Multi-Vector Visual Document Retrieval via Adaptive Patch-Level Embedding Pruning
2509.23883v1
cs.CL, cs.IR
2025-10-01
Авторы:
Yibo Yan, Guangwei Xu, Xin Zou, Shuliang Liu, James Kwok, Xuming Hu
Abstract
Visual Document Retrieval (VDR), the task of retrieving visually-rich
document pages using queries that combine visual and textual cues, is crucial
for numerous real-world applications. Recent state-of-the-art methods leverage
Large Vision-Language Models (LVLMs) in a multi-vector paradigm, representing
each document as patch-level embeddings to capture fine-grained details. While
highly effective, this approach introduces a critical challenge: prohibitive
storage overhead, as storing hundreds of vectors per page makes large-scale
deployment costly and impractical. To address this, we introduce DocPruner, the
first framework to employ adaptive patch-level embedding pruning for VDR to
effectively reduce the storage overhead. DocPruner leverages the intra-document
patch attention distribution to dynamically identify and discard redundant
embeddings for each document. This adaptive mechanism enables a significant
50-60% reduction in storage for leading multi-vector VDR models with negligible
degradation in document retrieval performance. Extensive experiments across
more than ten representative datasets validate that DocPruner offers a robust,
flexible, and effective solution for building storage-efficient, large-scale
VDR systems.
Ссылки и действия
Дополнительные ресурсы: