xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning
2510.08439v1
cs.LG, cs.AI, cs.CL
2025-10-11
Авторы:
Cheng Qian, Zuxin Liu, Shirley Kokane, Akshara Prabhakar, Jielin Qiu, Haolin Chen, Zhiwei Liu, Heng Ji, Weiran Yao, Shelby Heinecke, Silvio Savarese, Caiming Xiong, Huan Wang
Abstract
Modern LLM deployments confront a widening cost-performance spectrum: premium
models deliver strong reasoning but are expensive, while lightweight models are
economical yet brittle on complex tasks. Static escalation rules and keyword
heuristics under-utilize this spectrum and fail to adapt across task types. We
present xRouter, a tool-calling-based routing system in which a learned router
can either answer directly or invoke one or more external models. The router is
trained end-to-end with reinforcement learning using an explicit, cost-aware
reward that encodes cost-performance trade-offs, eliminating the need for
hand-engineered routing rules. Our implementation encompasses the full
reinforcement learning framework, including reward and cost accounting, as well
as the deployment and evaluation pipelines. Across diverse benchmarks, xRouter
achieves strong cost-performance trade-offs (e.g., substantial cost reductions
at comparable task completion rates), and provides empirical insights into what
reliably helps learned routing and what does not, ranging from model
trainability to the difficulty of eliciting sophisticated orchestration
behaviors in small open models. We hope these findings and our open
implementation will serve as a practical substrate for advancing learned,
cost-aware LLM orchestration.
Ссылки и действия
Дополнительные ресурсы: