LiveThinking: Enabling Real-Time Efficient Reasoning for AI-Powered Livestreaming via Reinforcement Learning
2510.07685v1
cs.LG, cs.CL
2025-10-11
Authors:
Yuhan Sun, Zhiwei Huang, Wanqing Cui, Shaopan Xiong, Yazhi Guo, Meiguang Jin, Junfeng Ma
Abstract
In AI-powered e-commerce livestreaming, digital avatars require real-time
responses to drive engagement, a task for which high-latency Large Reasoning
Models (LRMs) are ill-suited. We introduce LiveThinking, a practical two-stage
optimization framework to bridge this gap. First, we address computational cost
by distilling a 670B teacher LRM into a lightweight 30B Mixture-of-Experts
(MoE) model (3B active) using Rejection Sampling Fine-Tuning (RFT). This
reduces deployment overhead but preserves the teacher's verbose reasoning,
which keeps response latency high. To solve this, our second stage employs reinforcement learning
with Group Relative Policy Optimization (GRPO) to compress the model's
reasoning path, guided by a multi-objective reward function balancing
correctness, helpfulness, and brevity. LiveThinking achieves a 30-fold
reduction in computational cost, enabling sub-second latency. In real-world
application on Taobao Live, it improved response correctness by 3.3% and
helpfulness by 21.8%. Tested by hundreds of thousands of viewers, our system
led to a statistically significant increase in Gross Merchandise Volume (GMV),
demonstrating its effectiveness in enhancing user experience and commercial
performance in live, interactive settings.
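
To make the second stage more concrete, the following is a minimal sketch of how a multi-objective reward could feed GRPO's group-relative advantage. The abstract does not give the reward's form, so the weighted sum, the weights, the linear brevity term, and the `token_budget` parameter are all illustrative assumptions, not the authors' implementation. The only part taken from GRPO itself is the normalization of each sampled response's reward against the mean and standard deviation of its group.

```python
from statistics import mean, pstdev


def combined_reward(correct, helpful, n_reasoning_tokens,
                    w_correct=1.0, w_helpful=1.0, w_brief=0.5,
                    token_budget=256):
    """Hypothetical scalar reward balancing correctness, helpfulness, brevity.

    The weights and the linear brevity term are assumptions for illustration;
    the abstract only states that the three objectives are balanced.
    """
    # Brevity falls linearly from 1 to 0 as reasoning length approaches the budget.
    brevity = max(0.0, 1.0 - n_reasoning_tokens / token_budget)
    return w_correct * correct + w_helpful * helpful + w_brief * brevity


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of responses sampled for the same viewer question (made-up scores).
group = [
    {"correct": 1.0, "helpful": 0.8, "tokens": 400},  # right but verbose
    {"correct": 1.0, "helpful": 0.8, "tokens": 64},   # right and concise
    {"correct": 0.0, "helpful": 0.2, "tokens": 32},   # short but wrong
]
rewards = [combined_reward(g["correct"], g["helpful"], g["tokens"]) for g in group]
advantages = group_relative_advantages(rewards)
```

Under these assumed weights, the concise correct response receives the largest advantage, the verbose correct response a smaller one, and the short but wrong response a negative one. This is the kind of pressure that would shorten the reasoning path without rewarding brevity at the expense of correctness.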