Cost-Aware Retrieval-Augmentation Reasoning Models with Adaptive Retrieval Depth
2510.15719v1
cs.CL, cs.IR
2025-10-21
Авторы:
Helia Hashemi, Victor Rühle, Saravan Rajmohan
Abstract
Reasoning models have gained significant attention due to their strong
performance, particularly when enhanced with retrieval augmentation. However,
these models often incur high computational costs, as both retrieval and
reasoning tokens contribute substantially to the overall resource usage. In
this work, we make the following contributions: (1) we propose a
retrieval-augmented reasoning model that dynamically adjusts the length of the
retrieved document list based on the query and retrieval results; (2) we
develop a cost-aware advantage function for training of efficient
retrieval-augmented reasoning models through reinforcement learning; and (3) we
explore both memory- and latency-bound implementations of the proposed
cost-aware framework for both proximal and group relative policy optimization
algorithms. We evaluate our approach on seven public question answering
datasets and demonstrate significant efficiency gains, without compromising
effectiveness. In fact, we observed that the model latency decreases by ~16-20%
across datasets, while its effectiveness increases by ~5% on average, in terms
of exact match.
Ссылки и действия
Дополнительные ресурсы: