LimRank: Less is More for Reasoning-Intensive Information Reranking

2510.23544v1 cs.CL, cs.IR 2025-10-29

Authors:

Tingyu Song, Yilun Zhao, Siyue Zhang, Chen Zhao, Arman Cohan

Abstract

Existing approaches typically rely on large-scale fine-tuning to adapt LLMs for information reranking tasks, which is computationally expensive. In this work, we demonstrate that modern LLMs can be effectively adapted using only minimal, high-quality supervision. To enable this, we design LIMRANK-SYNTHESIZER, a reusable and open-source pipeline for generating diverse, challenging, and realistic reranking examples. Using this synthetic data, we fine-tune our reranker model, LIMRANK. We evaluate LIMRANK on two challenging benchmarks, i.e., BRIGHT for reasoning-intensive retrieval and FollowIR for instruction-following retrieval. Our experiments demonstrate that LIMRANK achieves competitive performance, while being trained on less than 5% of the data typically used in prior work. Further ablation studies demonstrate the effectiveness of LIMRANK-SYNTHESIZER and the strong generalization capabilities of LIMRANK across downstream tasks, including scientific literature search and retrieval-augmented generation for knowledge-intensive problem solving.

Related articles

MMAG: Mixed Memory-Augmented Generation for Large Language Models Applications

AR-Med: Automated Relevance Enhancement in Medical Search via LLM-Driven Informa...

Mitigating the Threshold Priming Effect in Large Language Model-Based Relevance ...

Towards Unification of Hallucination Detection and Fact Verification for Large L...
