RoboGPT-R1: Enhancing Robot Planning with Reinforcement Learning
2510.14828v1
cs.AI, cs.RO
2025-10-18
Авторы:
Jinrui Liu, Bingyan Nie, Boyu Li, Yaran Chen, Yuze Wang, Shunsen He, Haoran Li
Abstract
Improving the reasoning capabilities of embodied agents is crucial for robots
to complete complex human instructions in long-view manipulation tasks
successfully. Despite the success of large language models and vision language
models based on Supervised Fine-Tuning (SFT) in planning tasks, they continue
facing challenges in performing long-horizon manipulation tasks in complex
real-world environments, owing to their restricted common sense and reasoning
capabilities. Considering that aligning general-purpose vision language models
to robotic planning tasks via supervised fine-tuning suffers from poor
generalization and insufficient physical understanding, we propose RoboGPT-R1,
a two-stage fine-tuning framework for embodied planning. In this framework,
supervised training acquires foundational knowledge through expert sequences,
followed by RL to address the model's shortcomings in visual-spatial
understanding and reasoning. To achieve physical understanding and action
sequence consistency in multi-step reasoning tasks, we design a rule-based
reward function that simultaneously considers long-horizon performance and
action constraint in the environment. The reasoning model, trained on
Qwen2.5-VL-3B, significantly outperforms the larger-scale model, GPT-4o-mini,
by 21.33% and surpasses other work trained on Qwen2.5-VL-7B by 20.33% on the
EmbodiedBench benchmark.
Ссылки и действия
Дополнительные ресурсы: