OPTAGENT: Optimizing Multi-Agent LLM Interactions Through Verbal Reinforcement Learning for Enhanced Reasoning
2510.18032v1
cs.AI, cs.MA
2025-10-23
Authors:
Zhenyu Bi, Meng Lu, Yang Li, Swastik Roy, Weijie Guan, Morteza Ziyadi, Xuan Wang
Abstract
Large Language Models (LLMs) have shown remarkable reasoning capabilities in
mathematical and scientific tasks. To enhance complex reasoning, multi-agent
systems have been proposed to harness the collective intelligence of LLM
agents. However, existing collaboration structures are either predefined or
rely on majority voting or round-table debates, which can suppress correct but
less dominant agent contributions. Recent approaches model multi-agent systems
as graph networks but optimize purely for agent performance, neglecting the
quality of interactions. We hypothesize that effective agent communication is
crucial for multi-agent reasoning and that debating quality plays a significant
role. To address this, we propose OPTAGENT, a multi-agent verbal reinforcement
learning algorithm that dynamically constructs and refines multi-agent
collaboration structures. Our method defines action spaces and a feedback
mechanism that evaluates communication robustness and coherence throughout the
debate. The final decision is achieved through a majority vote over all the
agents. We assess OPTAGENT on various reasoning tasks, including mathematical
reasoning, creative writing, scientific reasoning, and numerical sorting.
Results demonstrate that our approach significantly outperforms single-agent
prompting methods and state-of-the-art multi-agent frameworks on diverse tasks.
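The abstract states that the final decision comes from a majority vote over all agents. A minimal sketch of that aggregation step is shown here for illustration only. The function name `majority_vote`, the representation of agent answers as strings, and the tie-breaking rule (first answer seen wins) are assumptions, not details given in the abstract.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among the agents.

    Assumption: ties are broken in favor of the answer that appears first.
    The abstract does not say how ties are handled.
    """
    if not answers:
        raise ValueError("need at least one agent answer")
    counts = Counter(answers)
    # most_common lists equal counts in first-encountered order (Python 3.7+)
    return counts.most_common(1)[0][0]


# Hypothetical final answers from three agents after the debate
agent_answers = ["42", "41", "42"]
final_answer = majority_vote(agent_answers)
```

In practice the answers would likely need normalizing (whitespace, number formatting) before counting, which this sketch leaves out.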