Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains?
2510.11184v1
cs.LG, cs.CL
2025-10-15
Authors:
Zhengyu Chen, Jinluan Yang, Teng Xiao, Ruochen Zhou, Luan Zhang, Xiangyu Xi, Xiaowei Shi, Wei Wang, Jinggang Wang
Abstract
Recent advances in large language models (LLMs) have demonstrated remarkable
capabilities in reasoning and tool utilization. However, the generalization of
tool-augmented reinforcement learning (RL) across diverse domains remains
underexplored. In this work, we investigate the cross-domain generalization of
an LLM agent equipped with a code interpreter tool, which is exclusively
trained on mathematical problem-solving tasks. Despite the restricted training
domain, we evaluate the agent's performance across several distinct reasoning
domains. The results reveal that RL-based tool usage learned from mathematical
tasks can be effectively transferred to complex tasks in other domains,
yielding strong task performance and high token efficiency. To facilitate this
cross-domain transfer, we propose a Tool Generalization Reinforcement Learning
(TGRL) framework designed to promote domain-agnostic learning and skill
migration, encompassing: (i) a standardized tool interface that abstracts
domain-specific nuances through consistent formatting and explicit termination,
fostering transferable invocation patterns; (ii) a dual-component reward system
that decomposes rewards to incentivize generalizable behaviors like tool
efficiency and reasoning abstraction, ensuring alignment and robustness across
domain shifts; and (iii) an XML-based prompt template that separates thinking,
tool calls, and responses to encourage modular, domain-invariant planning and
coherent multi-turn interactions. Extensive experiments across diverse
benchmarks validate our approach, achieving state-of-the-art performance and
highlighting the cross-domain potential of Tool RL for LLM reasoning.
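
The three TGRL components named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no tag names, reward terms, or weights, so the tags (`think`, `tool_call`, `response`), the split into an outcome term and a tool-efficiency term, the call budget, and the weights below are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical tag names: the abstract only says the template separates
# thinking, tool calls, and responses.
TAGS = ("think", "tool_call", "response")


@dataclass
class Turn:
    think: str
    tool_call: Optional[str]
    response: Optional[str]


def parse_turn(text: str) -> Optional[Turn]:
    """Parse one model turn; return None if the turn is malformed."""
    found = {}
    for tag in TAGS:
        matches = re.findall(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        if len(matches) > 1:
            return None  # a section appears more than once
        found[tag] = matches[0].strip() if matches else None
    if found["think"] is None:
        return None  # every turn must contain reasoning
    # Exactly one of: a tool call (continue) or a response (explicit termination).
    if (found["tool_call"] is None) == (found["response"] is None):
        return None
    return Turn(found["think"], found["tool_call"], found["response"])


def episode_reward(
    outputs: List[str],
    reference: str,
    max_tool_calls: int = 3,   # assumed budget, not from the paper
    w_outcome: float = 1.0,    # assumed weight
    w_behavior: float = 0.5,   # assumed weight
) -> float:
    """Two-part reward: answer correctness plus a tool-efficiency term."""
    turns = [parse_turn(o) for o in outputs]
    if not turns or any(t is None for t in turns):
        return 0.0  # malformed episodes earn nothing
    final = turns[-1]
    if final.response is None:
        return 0.0  # the episode never terminated explicitly
    outcome = 1.0 if final.response == reference.strip() else 0.0
    n_calls = sum(t.tool_call is not None for t in turns)
    behavior = 1.0 if n_calls <= max_tool_calls else 0.0
    return w_outcome * outcome + w_behavior * behavior


if __name__ == "__main__":
    episode = [
        "<think>Compute the sum.</think><tool_call>print(2 + 2)</tool_call>",
        "<think>The interpreter returned 4.</think><response>4</response>",
    ]
    print(episode_reward(episode, "4"))
```

Keeping the parser independent of task content is what makes the interface domain-agnostic in this sketch: the same format check and reward decomposition apply whether the tool call solves a math problem or a task from another domain.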