Agent-GSPO: Communication-Efficient Multi-Agent Systems via Group Sequence Policy Optimization

2510.22477v1 cs.MA, cs.AI 2025-10-29

Authors:

Yijia Fan, Jusheng Zhang, Jing Yang, Keze Wang

Abstract

To combat the prohibitive communication costs of "free-for-all" multi-agent systems (MAS), we introduce Agent-GSPO, a framework that directly optimizes for token economy using sequence-level reinforcement learning. Agent-GSPO leverages the stable and memory-efficient Group Sequence Policy Optimization (GSPO) algorithm to train agents on a communication-aware reward that explicitly penalizes verbosity. Across seven reasoning benchmarks, Agent-GSPO not only achieves new state-of-the-art performance but does so with a fraction of the token consumption of existing methods. By fostering emergent strategies like "strategic silence," our approach provides a practical blueprint for developing scalable and economically viable multi-agent systems.
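
As a rough illustration of the two ideas named in the abstract, a reward that penalizes verbosity and a sequence-level policy update, the sketch below may help. It is not the authors' implementation. The linear token penalty, the `token_penalty` coefficient, and all function names are assumptions made for illustration, since the abstract does not give the reward's exact form. The length-normalized sequence ratio, the group-normalized advantages, and the clipping follow the general shape of GSPO as commonly described, with population standard deviation chosen arbitrarily here.

```python
import math
from statistics import mean, pstdev


def communication_aware_reward(task_score, num_tokens, token_penalty=0.001):
    """Hypothetical reward: task score minus a linear penalty on tokens emitted.

    The linear form and the coefficient are assumptions, not taken from the paper.
    """
    return task_score - token_penalty * num_tokens


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses sampled for the same query."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an arbitrary choice for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def sequence_ratio(logp_new, logp_old):
    """Length-normalized sequence-level importance ratio.

    Takes per-token log-probabilities of the same response under the new and
    old policies and returns exp of their mean difference.
    """
    n = len(logp_new)
    return math.exp((sum(logp_new) - sum(logp_old)) / n)


def clipped_objective(ratio, advantage, clip_eps=0.2):
    """Clipped surrogate term for one whole response (sequence-level clipping)."""
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Two correct answers of different length and one wrong answer (made-up numbers).
    rewards = [
        communication_aware_reward(1.0, 50),
        communication_aware_reward(1.0, 400),
        communication_aware_reward(0.0, 120),
    ]
    print(group_advantages(rewards))
```

Under these assumptions, the shorter of two equally correct responses receives the larger advantage, which is the kind of pressure that could favor terse messages or staying silent.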

Related papers

Strategic Self-Improvement for Competitive Agents in AI Labour Markets

AsymPuzl: An Asymmetric Puzzle for multi-agent cooperation

EZYer: A simulacrum of high school with generative agent

Beyond Single-Agent Safety: A Taxonomy of Risks in LLM-to-LLM Interactions

AgentODRL: A Large Language Model-based Multi-agent System for ODRL Generation
