Advancing Multi-agent Traffic Simulation via R1-Style Reinforcement Fine-Tuning
2509.23993v1
cs.CV, cs.RO
2025-10-01
Authors:
Muleilan Pei, Shaoshuai Shi, Shaojie Shen
Abstract
Scalable and realistic simulation of multi-agent traffic behavior is critical
for advancing autonomous driving technologies. Although existing data-driven
simulators have made significant strides in this domain, they predominantly
rely on supervised learning to align simulated distributions with real-world
driving scenarios. A persistent challenge, however, lies in the distributional
shift that arises between training and testing, which often undermines model
generalization in unseen environments. To address this limitation, we propose
SMART-R1, a novel R1-style reinforcement fine-tuning paradigm tailored for
next-token prediction models to better align agent behavior with human
preferences and evaluation metrics. Our approach introduces a metric-oriented
policy optimization algorithm to improve distribution alignment and an
iterative "SFT-RFT-SFT" training strategy that alternates between Supervised
Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT) to maximize performance
gains. Extensive experiments on the large-scale Waymo Open Motion Dataset
(WOMD) validate the effectiveness of this simple yet powerful R1-style training
framework in enhancing foundation models. Results on the Waymo Open Sim
Agents Challenge (WOSAC) show that SMART-R1 achieves state-of-the-art
performance with an overall realism meta score of 0.7858, ranking first on the
leaderboard at the time of submission.