Laminar: A Scalable Asynchronous RL Post-Training Framework
2510.12633v1
cs.LG, cs.AI, cs.DC
2025-10-16
Authors:
Guangming Sheng, Yuxuan Tong, Borui Wan, Wang Zhang, Chaobo Jia, Xibin Wu, Yuqi Wu, Xiang Li, Chi Zhang, Yanghua Peng, Haibin Lin, Xin Liu, Chuan Wu
Abstract
Reinforcement learning (RL) post-training for Large Language Models (LLMs) is
now scaling to large clusters and running for extended durations to enhance
model reasoning performance. However, the scalability of existing RL frameworks
is limited, as extreme long-tail skewness in RL trajectory generation causes
severe GPU underutilization. Current asynchronous RL systems attempt to
mitigate this, but they rely on global weight synchronization between the actor
and all rollouts, which creates a rigid model update schedule. This global
synchronization is ill-suited for the highly skewed and evolving distribution
of trajectory generation latency in RL training, crippling training efficiency.
Our key insight is that efficient scaling requires breaking this lockstep
through trajectory-level asynchrony, which generates and consumes each
trajectory independently. We propose Laminar, a scalable and robust RL
post-training system built on a fully decoupled architecture. First, we replace
global updates with a tier of relay workers acting as a distributed parameter
service. This enables asynchronous and fine-grained weight synchronization,
allowing rollouts to pull the latest weights at any time without stalling the
actor's training loop. Second, a dynamic repack mechanism consolidates
long-tail trajectories onto a few dedicated rollouts, maximizing generation
throughput. The fully decoupled design also isolates failures, ensuring
robustness for long-running jobs. Our evaluation on a 1024-GPU cluster shows
that Laminar achieves up to 5.48$\times$ training throughput speedup over
state-of-the-art systems, while reducing model convergence time.
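
The two mechanisms named in the abstract can be illustrated with a toy, single-process sketch. This is not Laminar's implementation: the `Relay`, `Rollout`, and `repack` names, the `capacity` parameter, and the string trajectory IDs are all hypothetical, and the sketch ignores everything that makes the real problem hard (network transfer, generation state migration, fault handling). It only shows the shape of the idea: the actor publishes versioned weights to a relay and moves on, each rollout pulls whenever it is ready, and leftover long-tail trajectories are consolidated onto as few rollouts as capacity allows.

```python
import threading


class Relay:
    """Hypothetical relay worker that holds the newest published weights."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._weights = {}

    def publish(self, version, weights):
        # Called by the actor after a training step. The actor only hands
        # over a snapshot; it never waits for any rollout to pick it up.
        with self._lock:
            if version > self._version:
                self._version = version
                self._weights = dict(weights)

    def pull(self):
        # Called by a rollout whenever it is ready for fresh weights.
        with self._lock:
            return self._version, dict(self._weights)


class Rollout:
    """Hypothetical rollout worker tracking its weight version and workload."""

    def __init__(self, name, relay):
        self.name = name
        self.relay = relay
        self.version = 0
        self.in_flight = []  # IDs of trajectories still being generated

    def refresh(self):
        # Each rollout refreshes on its own schedule, so different rollouts
        # may legitimately run different weight versions at the same time.
        self.version, _ = self.relay.pull()


def repack(rollouts, capacity):
    """Consolidate in-flight trajectories onto as few rollouts as possible.

    Returns the rollouts that end up empty and can start new work.
    """
    pending = [t for r in rollouts for t in r.in_flight]
    if len(pending) > capacity * len(rollouts):
        raise ValueError("not enough capacity to hold all trajectories")
    freed = []
    for r in rollouts:
        r.in_flight, pending = pending[:capacity], pending[capacity:]
        if not r.in_flight:
            freed.append(r)
    return freed


relay = Relay()
rollouts = [Rollout(f"r{i}", relay) for i in range(4)]

relay.publish(1, {"w": 0.1})
rollouts[0].refresh()          # r0 picks up version 1
relay.publish(2, {"w": 0.2})
rollouts[1].refresh()          # r1 picks up version 2; r0 is not interrupted

# A few long-tail trajectories are scattered across the rollouts.
rollouts[0].in_flight = ["t0"]
rollouts[1].in_flight = ["t1", "t2"]
rollouts[3].in_flight = ["t3"]
freed = repack(rollouts, capacity=4)

print([(r.name, r.version) for r in rollouts])
print([r.name for r in freed])
```

In this toy run the four stragglers end up on `r0`, leaving the other three rollouts free to pull the latest weights and start new trajectories, which is the intuition behind trajectory-level asynchrony as the abstract describes it.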