mR3: Multilingual Rubric-Agnostic Reward Reasoning Models
2510.01146v1
cs.CL, cs.AI, cs.LG
2025-10-04
Authors:
David Anugraha, Shou-Yi Hung, Zilu Tang, Annie En-Shiun Lee, Derry Tanti Wijaya, Genta Indra Winata
Abstract
Large Language Model (LLM) judges have been widely adopted for automatic
evaluation in English and shown to be effective. However, their
performance does not generalize well to non-English settings, and it remains
unclear what constitutes effective multilingual training for such judges. In
this paper, we introduce mR3, a massively multilingual, rubric-agnostic reward
reasoning model trained on 72 languages, achieving the broadest language
coverage in reward modeling to date. We present a comprehensive study of data
and curriculum selection for training to identify effective strategies and data
sources for building high-quality reward models, including the integration of
target-language reasoning datasets. Our approach attains state-of-the-art
performance on multilingual reward model benchmarks, surpassing much larger
models (e.g., GPT-OSS-120B) while being up to 9x smaller, and its effectiveness
is further confirmed through extensive ablation studies. Our models, data, and
code are available as open source at https://github.com/rubricreward/mr3.