Hierarchical LoRA MoE for Efficient CTR Model Scaling
2510.10432v1
cs.LG, cs.AI, cs.IR
2025-10-16
Authors:
Zhichen Zeng, Mengyue Hang, Xiaolong Liu, Xiaoyi Liu, Xiao Lin, Ruizhong Qiu, Tianxin Wei, Zhining Liu, Siyang Yuan, Chaofei Yang, Yiqun Liu, Hang Yin, Jiyan Yang, Hanghang Tong
Abstract
Deep models have driven significant advances in click-through rate (CTR)
prediction. While vertical scaling via layer stacking improves model
expressiveness, the layer-by-layer sequential computation poses challenges to
efficient scaling. Conversely, horizontal scaling through Mixture of Experts
(MoE) achieves efficient scaling by activating a small subset of experts in
parallel, but flat MoE layers may struggle to capture the hierarchical
structure inherent in recommendation tasks. To push the Return-On-Investment
(ROI) boundary, we explore the complementary strengths of both directions and
propose HiLoMoE, a hierarchical LoRA MoE framework that enables holistic
scaling in a parameter-efficient manner. Specifically, HiLoMoE employs
lightweight rank-1 experts for parameter-efficient horizontal scaling, and
stacks multiple MoE layers with hierarchical routing to enable combinatorially
diverse expert compositions. Unlike conventional stacking, HiLoMoE routes based
on prior layer scores rather than outputs, allowing all layers to execute in
parallel. A principled three-stage training framework ensures stable
optimization and expert diversity. Experiments on four public datasets show
that HiLoMoE achieves a better performance-efficiency tradeoff, with an
average AUC improvement of 0.20\% and an 18.5\% reduction in FLOPs compared
to the non-MoE baseline.
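
The two mechanisms named in the abstract, rank-1 experts and routing on prior-layer scores rather than outputs, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the softmax routers, the top-k renormalization, the shared dense weight, and every name below (`hierarchical_scores`, `rank1_moe_layer`, `w_in`, `w_hier`) are assumptions made for illustration. The sketch shows only that the gates for every layer can be computed up front from the input, and it does not attempt to reproduce how the paper parallelizes the layer computation itself or its three-stage training.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def top_k_gate(scores, k):
    """Keep the k largest scores per row, zero the rest, renormalize to sum 1."""
    idx = np.argsort(scores, axis=-1)[:, -k:]
    mask = np.zeros_like(scores)
    np.put_along_axis(mask, idx, 1.0, axis=-1)
    gated = scores * mask
    return gated / gated.sum(axis=-1, keepdims=True)

def hierarchical_scores(x, w_in, w_hier):
    """Routing scores for every layer (assumed form).

    Layer 0 routes on the input x; each later layer routes on the previous
    layer's scores, so no layer output is needed to compute any gate.
    """
    scores = [softmax(x @ w_in)]
    for w in w_hier:
        scores.append(softmax(scores[-1] @ w))
    return scores

def rank1_moe_layer(h, base_w, a, b, gate):
    """Shared dense weight plus a gated sum of rank-1 updates outer(a_e, b_e)."""
    proj = h @ a.T                        # (batch, n_experts): a_e . h per expert
    return h @ base_w + (gate * proj) @ b

# Toy sizes, chosen arbitrarily for the illustration.
rng = np.random.default_rng(0)
batch, d, n_experts, n_layers, k = 4, 8, 6, 3, 2

x = rng.normal(size=(batch, d))
w_in = rng.normal(size=(d, n_experts))
w_hier = [rng.normal(size=(n_experts, n_experts)) for _ in range(n_layers - 1)]

# Gates for all layers are available before any expert computation runs.
gates = [top_k_gate(s, k) for s in hierarchical_scores(x, w_in, w_hier)]

# One rank-1 MoE layer applied with the first layer's gates.
base_w = rng.normal(size=(d, d)) / np.sqrt(d)
a = rng.normal(size=(n_experts, d))
b = rng.normal(size=(n_experts, d)) / np.sqrt(d)
out = rank1_moe_layer(x, base_w, a, b, gates[0])
print(out.shape, [g.shape for g in gates])
```

Each rank-1 expert adds only 2d parameters on top of the shared d x d weight, which is the source of the parameter efficiency in this toy setting. With k active experts per layer over L layers, the number of possible expert combinations grows multiplicatively with depth, which is one way to read the abstract's "combinatorially diverse expert compositions".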