Think before Recommendation: Autonomous Reasoning-enhanced Recommender
2510.23077v1
cs.IR, cs.AI
2025-10-29
Авторы:
Xiaoyu Kong, Junguang Jiang, Bin Liu, Ziru Xu, Han Zhu, Jian Xu, Bo Zheng, Jiancan Wu, Xiang Wang
Abstract
The core task of recommender systems is to learn user preferences from
historical user-item interactions. With the rapid development of large language
models (LLMs), recent research has explored leveraging the reasoning
capabilities of LLMs to enhance rating prediction tasks. However, existing
distillation-based methods suffer from limitations such as the teacher model's
insufficient recommendation capability, costly and static supervision, and
superficial transfer of reasoning ability. To address these issues, this paper
proposes RecZero, a reinforcement learning (RL)-based recommendation paradigm
that abandons the traditional multi-model and multi-stage distillation
approach. Instead, RecZero trains a single LLM through pure RL to autonomously
develop reasoning capabilities for rating prediction. RecZero consists of two
key components: (1) "Think-before-Recommendation" prompt construction, which
employs a structured reasoning template to guide the model in step-wise
analysis of user interests, item features, and user-item compatibility; and (2)
rule-based reward modeling, which adopts group relative policy optimization
(GRPO) to compute rewards for reasoning trajectories and optimize the LLM.
Additionally, the paper explores a hybrid paradigm, RecOne, which combines
supervised fine-tuning with RL, initializing the model with cold-start
reasoning samples and further optimizing it with RL. Experimental results
demonstrate that RecZero and RecOne significantly outperform existing baseline
methods on multiple benchmark datasets, validating the superiority of the RL
paradigm in achieving autonomous reasoning-enhanced recommender systems.
Ссылки и действия
Дополнительные ресурсы: