MTmixAtt: Integrating Mixture-of-Experts with Multi-Mix Attention for Large-Scale Recommendation
2510.15286v1
cs.IR, cs.AI
2025-10-21
Авторы:
Xianyang Qi, Yuan Tian, Zhaoyu Hu, Zhirui Kuai, Chang Liu, Hongxiang Lin, Lei Wang
Abstract
Industrial recommender systems critically depend on high-quality ranking
models. However, traditional pipelines still rely on manual feature engineering
and scenario-specific architectures, which hinder cross-scenario transfer and
large-scale deployment. To address these challenges, we propose
\textbf{MTmixAtt}, a unified Mixture-of-Experts (MoE) architecture with
Multi-Mix Attention, designed for large-scale recommendation tasks. MTmixAtt
integrates two key components. The \textbf{AutoToken} module automatically
clusters heterogeneous features into semantically coherent tokens, removing the
need for human-defined feature groups. The \textbf{MTmixAttBlock} module
enables efficient token interaction via a learnable mixing matrix, shared dense
experts, and scenario-aware sparse experts, capturing both global patterns and
scenario-specific behaviors within a single framework. Extensive experiments on
the industrial TRec dataset from Meituan demonstrate that MTmixAtt consistently
outperforms state-of-the-art baselines including Transformer-based models,
WuKong, HiFormer, MLP-Mixer, and RankMixer. At comparable parameter scales,
MTmixAtt achieves superior CTR and CTCVR metrics; scaling to MTmixAtt-1B yields
further monotonic gains. Large-scale online A/B tests validate the real-world
impact: in the \textit{Homepage} scenario, MTmixAtt increases Payment PV by
\textbf{+3.62\%} and Actual Payment GTV by \textbf{+2.54\%}. Overall, MTmixAtt
provides a unified and scalable solution for modeling arbitrary heterogeneous
features across scenarios, significantly improving both user experience and
commercial outcomes.
Ссылки и действия
Дополнительные ресурсы: