Best-of-Majority: Minimax-Optimal Strategy for Pass@$k$ Inference Scaling
2510.03199v1
cs.LG, stat.ML
2025-10-07
Авторы:
Qiwei Di, Kaixuan Ji, Xuheng Li, Heyang Zhao, Quanquan Gu
Abstract
LLM inference often generates a batch of candidates for a prompt and selects
one via strategies like majority voting or Best-of- N (BoN). For difficult
tasks, this single-shot selection often underperforms. Consequently,
evaluations commonly report Pass@$k$: the agent may submit up to $k$ responses,
and only the best of them is used when computing regret. Motivated by this, we
study inference scaling in the more general Pass@$k$ inference setting, and
prove that neither majority voting nor BoN exhibits the desirable scaling with
$k$ and the sampling budget $N$. Combining the advantages of majority voting
and BoN, we propose a new inference strategy called Best-of-Majority (BoM),
with a pivotal step that restricts the candidates to the responses with high
frequency in the $N$ samples before selecting the top-$k$ rewards. We prove
that when the sampling budget is $N=\tilde\Omega(C^*)$, the regret of BoM is
$O(\epsilon_{\mathrm{opt}}+\sqrt{\epsilon_{\mathrm{RM}}^2C^*/k})$, where $C^*$
is the coverage coefficient, $\epsilon_{\mathrm{RM}}$ is the estimation error
of the reward model, and $\epsilon_{\mathrm{opt}}$ is the estimation error of
reward at the optimal response. We further establish a matching lower bound,
certifying that our algorithm is minimax optimal. Beyond optimality, BoM has a
key advantage: unlike majority voting and BoN, its performance does not degrade
when increasing $N$. Experimental results of inference on math problems show
BoM outperforming both majority voting and BoN.
Ссылки и действия
Дополнительные ресурсы: