ORBIT -- Open Recommendation Benchmark for Reproducible Research with Hidden Tests
2510.26095v1
cs.IR, cs.CL
2025-11-01
Авторы:
Jingyuan He, Jiongnan Liu, Vishan Vishesh Oberoi, Bolin Wu, Mahima Jagadeesh Patel, Kangrui Mao, Chuning Shi, I-Ta Lee, Arnold Overwijk, Chenyan Xiong
Abstract
Recommender systems are among the most impactful AI applications, interacting
with billions of users every day, guiding them to relevant products, services,
or information tailored to their preferences. However, the research and
development of recommender systems are hindered by existing datasets that fail
to capture realistic user behaviors and inconsistent evaluation settings that
lead to ambiguous conclusions. This paper introduces the Open Recommendation
Benchmark for Reproducible Research with HIdden Tests (ORBIT), a unified
benchmark for consistent and realistic evaluation of recommendation models.
ORBIT offers a standardized evaluation framework of public datasets with
reproducible splits and transparent settings for its public leaderboard.
Additionally, ORBIT introduces a new webpage recommendation task, ClueWeb-Reco,
featuring web browsing sequences from 87 million public, high-quality webpages.
ClueWeb-Reco is a synthetic dataset derived from real, user-consented, and
privacy-guaranteed browsing data. It aligns with modern recommendation
scenarios and is reserved as the hidden test part of our leaderboard to
challenge recommendation models' generalization ability. ORBIT measures 12
representative recommendation models on its public benchmark and introduces a
prompted LLM baseline on the ClueWeb-Reco hidden test. Our benchmark results
reflect general improvements of recommender systems on the public datasets,
with variable individual performances. The results on the hidden test reveal
the limitations of existing approaches in large-scale webpage recommendation
and highlight the potential for improvements with LLM integrations. ORBIT
benchmark, leaderboard, and codebase are available at
https://www.open-reco-bench.ai.
Ссылки и действия
Дополнительные ресурсы: