Agent-GSPO: Communication-Efficient Multi-Agent Systems via Group Sequence Policy Optimization
2510.22477v1
cs.MA, cs.AI
2025-10-29
Authors:
Yijia Fan, Jusheng Zhang, Jing Yang, Keze Wang
Abstract
To combat the prohibitive communication costs of "free-for-all" multi-agent
systems (MAS), we introduce Agent-GSPO, a framework that directly
optimizes for token economy using sequence-level reinforcement learning.
Agent-GSPO leverages the stable and memory-efficient Group Sequence Policy
Optimization (GSPO) algorithm to train agents on a communication-aware reward
that explicitly penalizes verbosity. Across seven reasoning benchmarks,
Agent-GSPO not only achieves new state-of-the-art performance but does so with
a fraction of the token consumption of existing methods. By fostering emergent
strategies like "strategic silence," our approach provides a practical
blueprint for developing scalable and economically viable multi-agent systems.
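The abstract describes the training signal only at a high level. Below is a minimal Python sketch, assuming a group of sampled responses per prompt, of how a verbosity-penalized reward could feed GSPO's sequence-level clipped objective. The linear token penalty, its coefficient `lam`, the clipping value `clip_eps`, and all function names are hypothetical illustrations, not the authors' implementation; the length-normalized sequence ratio and within-group standardized advantage are modeled on the original GSPO formulation.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def communication_aware_reward(correct: bool, num_tokens: int, lam: float = 1e-3) -> float:
    """Hypothetical reward: task success minus a linear per-token verbosity penalty.

    `lam` is an assumed coefficient, not a value reported for Agent-GSPO.
    """
    return (1.0 if correct else 0.0) - lam * num_tokens


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Group-relative advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def sequence_ratio(logp_new: Sequence[float], logp_old: Sequence[float]) -> float:
    """Length-normalized sequence-level importance ratio:
    exp((1/|y|) * sum_t [log pi_new(y_t) - log pi_old(y_t)])."""
    n = len(logp_new)
    return math.exp(sum(a - b for a, b in zip(logp_new, logp_old)) / n)


def gspo_objective(group_logp_new, group_logp_old, rewards, clip_eps: float = 0.2) -> float:
    """Clipped sequence-level surrogate averaged over one group (to be maximized).

    One ratio is clipped per response rather than per token; clip_eps is illustrative.
    """
    advs = group_advantages(rewards)
    total = 0.0
    for lp_new, lp_old, adv in zip(group_logp_new, group_logp_old, advs):
        s = sequence_ratio(lp_new, lp_old)
        s_clipped = min(max(s, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(s * adv, s_clipped * adv)
    return total / len(rewards)


if __name__ == "__main__":
    # Three hypothetical sampled responses for one prompt: (correct?, tokens used).
    samples = [(True, 40), (True, 400), (False, 120)]
    rewards = [communication_aware_reward(c, n) for c, n in samples]
    print("rewards:", rewards)
    print("advantages:", group_advantages(rewards))
```

In this toy group both correct answers beat the incorrect one, but the concise correct response earns roughly 0.96 against 0.6 for the verbose one, so its advantage is largest and the update shifts probability toward shorter correct messages. Clipping one length-normalized ratio per sequence, instead of one ratio per token, is what makes the objective sequence-level.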