TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework
2511.05385v1
cs.IR, cs.AI
2025-11-11
Authors:
Chao Zhang, Yuhao Wang, Derong Xu, Haoxin Zhang, Yuanjie Lyu, Yuhao Chen, Shuochen Liu, Tong Xu, Xiangyu Zhao, Yan Gao, Yao Hu, Enhong Chen
Abstract
Retrieval-Augmented Generation (RAG) uses external knowledge to improve the
reliability of Large Language Models (LLMs). For flexibility, agentic RAG employs
autonomous, multi-round retrieval and reasoning to resolve queries. Although
recent agentic RAG methods have improved via reinforcement learning, they often incur
substantial token overhead from search and reasoning processes. This trade-off
prioritizes accuracy over efficiency. To address this issue, this work proposes
TeaRAG, a token-efficient agentic RAG framework capable of compressing both
retrieval content and reasoning steps. 1) First, the retrieved content is
compressed by augmenting chunk-based semantic retrieval with graph retrieval
over concise triplets. A knowledge association graph is then built from
semantic similarity and co-occurrence. Finally, Personalized PageRank is
leveraged to highlight key knowledge within this graph, reducing the number of
tokens per retrieval. 2) Second, to reduce reasoning steps, Iterative
Process-aware Direct Preference Optimization (IP-DPO) is proposed.
Specifically, our reward function evaluates knowledge sufficiency via a
knowledge matching mechanism, while penalizing excessive reasoning steps. This
design can produce high-quality preference-pair datasets, supporting iterative
DPO to improve reasoning conciseness. Across six datasets, TeaRAG improves the
average Exact Match by 4% and 2% while reducing output tokens by 61% and 59% on
Llama3-8B-Instruct and Qwen2.5-14B-Instruct, respectively. Code is available at
https://github.com/Applied-Machine-Learning-Lab/TeaRAG.
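
The Personalized PageRank step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how edges are weighted, which nodes seed the restart distribution, or the damping factor, so everything below is an assumption for illustration only: the node names, the edge weights (standing in for semantic similarity and co-occurrence), the `alpha` value, and the helper names `personalized_pagerank` and `top_k` are hypothetical and are not taken from the TeaRAG code.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, alpha=0.85, iters=50):
    """Power-iteration Personalized PageRank on an undirected weighted graph.

    edges: iterable of (u, v, weight) tuples (weights are assumed here to
           combine similarity and co-occurrence; the abstract gives no formula)
    seeds: dict mapping node -> restart weight (e.g. query-matched nodes)
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    nodes = set(adj) | set(seeds)
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}

    rank = dict(restart)
    for _ in range(iters):
        # Every node first receives its share of the restart mass.
        nxt = {n: (1.0 - alpha) * restart[n] for n in nodes}
        for u in nodes:
            neighbors = adj.get(u, {})
            out = sum(neighbors.values())
            if out == 0.0:
                # Dangling node: send its mass back to the restart distribution.
                for n in nodes:
                    nxt[n] += alpha * rank[u] * restart[n]
                continue
            for v, w in neighbors.items():
                nxt[v] += alpha * rank[u] * w / out
        rank = nxt
    return rank


def top_k(scores, k):
    """Return the k highest-scoring nodes, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy graph: one query entity, two triplets, one chunk.
edges = [
    ("query_entity", "triplet_1", 1.0),
    ("triplet_1", "chunk_1", 0.5),
    ("chunk_1", "triplet_2", 0.5),
]
scores = personalized_pagerank(edges, seeds={"query_entity": 1.0})
selected = top_k(scores, 2)
```

In this sketch, only the top-scoring nodes would be passed to the model as context, which is one plausible way that ranking could reduce the number of tokens per retrieval.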