Agentic Reinforcement Learning for Real-World Code Repair
2510.22075v1
cs.LG, cs.AI, cs.CL
2025-10-29
Авторы:
Siyu Zhu, Anastasiya Karpovich, Albert Chen, Jessica Koscheka, Shailesh Jannu, Di Wen, Yuqing Zhu, Rohit Jain, Alborz Geramifard
Abstract
We tackle the challenge of training reliable code-fixing agents in real
repositories, where complex builds and shifting dependencies make evaluation
unstable. We developed a verifiable pipeline with success defined as post-fix
build validation and improved reproducibility across ~1K real issues by pinning
dependencies and disabling automatic upgrades. Building on this, we introduced
a scalable simplified pipeline for large-scale reinforcement learning (RL).
Using this setup, we supervised fine-tuned Qwen3-32B in the full pipeline and
applied RL on top of the SFT model in the simplified environment. The SFT model
distilled from GPT-4.1 trajectories performs on par while being 56x smaller,
and RL added 7-20% absolute gains under matched train-test conditions.
"Thinking mode" was on par or worse in our experiments. Both SFT and RL models
failed to generalize across environments, highlighting the importance of
matching train-test environments for building reliable real-world code-fixing
agents.
Ссылки и действия
Дополнительные ресурсы: