Laminar: A Scalable Asynchronous RL Post-Training Framework

2510.12633v1 cs.LG, cs.AI, cs.DC 2025-10-16

Authors:

Guangming Sheng, Yuxuan Tong, Borui Wan, Wang Zhang, Chaobo Jia, Xibin Wu, Yuqi Wu, Xiang Li, Chi Zhang, Yanghua Peng, Haibin Lin, Xin Liu, Chuan Wu

Abstract

Reinforcement learning (RL) post-training for Large Language Models (LLMs) is now scaling to large clusters and running for extended durations to enhance model reasoning performance. However, the scalability of existing RL frameworks is limited, as extreme long-tail skewness in RL trajectory generation causes severe GPU underutilization. Current asynchronous RL systems attempt to mitigate this, but they rely on global weight synchronization between the actor and all rollouts, which creates a rigid model update schedule. This global synchronization is ill-suited for the highly skewed and evolving distribution of trajectory generation latency in RL training, crippling training efficiency. Our key insight is that efficient scaling requires breaking this lockstep through trajectory-level asynchrony, which generates and consumes each trajectory independently. We propose Laminar, a scalable and robust RL post-training system built on a fully decoupled architecture. First, we replace global updates with a tier of relay workers acting as a distributed parameter service. This enables asynchronous and fine-grained weight synchronization, allowing rollouts to pull the latest weight anytime without stalling the actor's training loop. Second, a dynamic repack mechanism consolidates long-tail trajectories onto a few dedicated rollouts, maximizing generation throughput. The fully decoupled design also isolates failures, ensuring robustness for long-running jobs. Our evaluation on a 1024-GPU cluster shows that Laminar achieves up to 5.48$\times$ training throughput speedup over state-of-the-art systems, while reducing model convergence time.
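The two mechanisms named in the abstract can be illustrated with a toy sketch. It is not Laminar's implementation: the `Relay` class, the `repack` function, the per-rollout `capacity` parameter, and the greedy packing rule are all hypothetical names and assumptions chosen for illustration. The sketch only shows the shape of the idea, in which the actor publishes weights without waiting for rollouts, rollouts pull the newest version whenever they are ready, and leftover long-tail trajectories are consolidated onto as few rollouts as possible.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass, field


@dataclass
class Relay:
    """Hypothetical stand-in for a relay worker holding the newest weights."""

    _version: int = 0
    _weights: dict = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def publish(self, weights: dict) -> int:
        # Called by the actor after a model update. It returns as soon as
        # the copy is stored and never waits for any rollout.
        with self._lock:
            self._version += 1
            self._weights = dict(weights)
            return self._version

    def pull(self) -> tuple[int, dict]:
        # Called by a rollout whenever it is ready for newer weights.
        with self._lock:
            return self._version, dict(self._weights)


def repack(in_flight: dict[str, int], capacity: int) -> dict[str, int]:
    """Consolidate unfinished trajectories onto as few rollouts as possible.

    in_flight maps a rollout id to its number of unfinished trajectories.
    Assumed greedy rule: fill the most heavily loaded rollouts first.
    """
    total = sum(in_flight.values())
    if total > capacity * len(in_flight):
        raise ValueError("not enough rollout capacity for all trajectories")
    order = sorted(in_flight, key=lambda rid: (-in_flight[rid], rid))
    packed = {}
    for rid in order:
        take = min(capacity, total)
        packed[rid] = take
        total -= take
    return packed
```

In this sketch, a rollout that ends up with zero trajectories after `repack` is free to call `pull()` and start a fresh batch with newer weights, while the few packed rollouts finish the long tail.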


