Relative-Based Scaling Law for Neural Language Models
2510.20387v1
cs.LG, cs.AI, cs.CL
2025-10-25
Авторы:
Baoqing Yue, Jinyuan Zhou, Zixi Wei, Jingtao Zhan, Qingyao Ai, Yiqun Liu
Abstract
Scaling laws aim to accurately predict model performance across different
scales. Existing scaling-law studies almost exclusively rely on cross-entropy
as the evaluation metric. However, cross-entropy provides only a partial view
of performance: it measures the absolute probability assigned to the correct
token, but ignores the relative ordering between correct and incorrect tokens.
Yet, relative ordering is crucial for language models, such as in
greedy-sampling scenario. To address this limitation, we investigate scaling
from the perspective of relative ordering. We first propose the Relative-Based
Probability (RBP) metric, which quantifies the probability that the correct
token is ranked among the top predictions. Building on this metric, we
establish the Relative-Based Scaling Law, which characterizes how RBP improves
with increasing model size. Through extensive experiments on four datasets and
four model families spanning five orders of magnitude, we demonstrate the
robustness and accuracy of this law. Finally, we illustrate the broad
application of this law with two examples, namely providing a deeper
explanation of emergence phenomena and facilitating finding fundamental
theories of scaling laws. In summary, the Relative-Based Scaling Law
complements the cross-entropy perspective and contributes to a more complete
understanding of scaling large language models. Thus, it offers valuable
insights for both practical development and theoretical exploration.
Ссылки и действия
Дополнительные ресурсы: