MARS: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning
2510.04935v1
cs.AI, cs.CL, cs.LG
2025-10-08
Авторы:
Guoxin Chen, Zile Qiao, Wenqing Wang, Donglei Yu, Xuanzhong Chen, Hao Sun, Minpeng Liao, Kai Fan, Yong Jiang, Penguin Xie, Wayne Xin Zhao, Ruihua Song, Fei Huang
Abstract
Large Reasoning Models (LRMs) often exhibit a tendency for overanalysis in
simple tasks, where the models excessively utilize System 2-type, deliberate
reasoning, leading to inefficient token generation. Furthermore, these models
face challenges in adapting their reasoning capabilities to rapidly changing
environments due to the static nature of their pretraining data. To address
these issues, advancing Large Language Models (LLMs) for complex reasoning
tasks requires innovative approaches that bridge intuitive and deliberate
cognitive processes, akin to human cognition's dual-system dynamic. This paper
introduces a Multi-Agent System for Deep ReSearch (MARS) enabling seamless
integration of System 1's fast, intuitive thinking with System 2's deliberate
reasoning within LLMs. MARS strategically integrates multiple external tools,
such as Google Search, Google Scholar, and Python Interpreter, to access
up-to-date information and execute complex computations, while creating a
specialized division of labor where System 1 efficiently processes and
summarizes high-volume external information, providing distilled insights that
expand System 2's reasoning context without overwhelming its capacity.
Furthermore, we propose a multi-agent reinforcement learning framework
extending Group Relative Policy Optimization to simultaneously optimize both
systems with multi-turn tool interactions, bin-packing optimization, and sample
balancing strategies that enhance collaborative efficiency. Extensive
experiments demonstrate MARS achieves substantial improvements of 3.86% on the
challenging Humanity's Last Exam (HLE) benchmark and an average gain of 8.9%
across 7 knowledge-intensive tasks, validating the effectiveness of our
dual-system paradigm for complex reasoning in dynamic information environments.
Ссылки и действия
Дополнительные ресурсы: