MTRec: Learning to Align with User Preferences via Mental Reward Models
2509.22807v1
cs.IR, cs.AI
2025-10-01
Authors:
Mengchen Zhao, Yifan Gao, Yaqing Hou, Xiangyang Li, Pengjie Gu, Zhenhua Dong, Ruiming Tang, Yi Cai
Abstract
Recommendation models are predominantly trained using implicit user feedback,
since explicit feedback is often costly to obtain. However, implicit feedback,
such as clicks, does not always reflect users' real preferences. For example, a
user might click on a news article because of its attractive headline, but end
up feeling uncomfortable after reading the content. In the absence of explicit
feedback, such erroneous implicit signals may severely mislead recommender
systems. In this paper, we propose MTRec, a novel sequential recommendation
framework designed to align with real user preferences by uncovering users'
internal satisfaction with recommended items. Specifically, we introduce a mental
reward model to quantify user satisfaction and propose a distributional inverse
reinforcement learning approach to learn it. The learned mental reward model is
then used to guide recommendation models to better align with users' real
preferences. Our experiments show that MTRec brings significant improvements to
a variety of recommendation models. We also deploy MTRec on an industrial short
video platform and observe a 7 percent increase in average user viewing time.