EAGER: Entropy-Aware GEneRation for Adaptive Inference-Time Scaling
2510.11170v1
cs.LG, cs.AI, cs.CL
2025-10-15
Авторы:
Daniel Scalena, Leonidas Zotos, Elisabetta Fersini, Malvina Nissim, Ahmet Üstün
Abstract
With the rise of reasoning language models and test-time scaling methods as a
paradigm for improving model performance, substantial computation is often
required to generate multiple candidate sequences from the same prompt. This
enables exploration of different reasoning paths toward the correct solution,
however, allocates the same compute budget for each prompt. Grounded on the
assumption that different prompts carry different degrees of complexity, and
thus different computation needs, we propose EAGer, a training-free generation
method that leverages model uncertainty through token-wise entropy distribution
to reduce redundant computation and concurrently improve overall performance.
EAGer allows branching to multiple reasoning paths only in the presence of
high-entropy tokens, and then reallocates the saved compute budget to the
instances where exploration of alternative paths is most needed. We find that
across multiple open-source models on complex reasoning benchmarks such as AIME
2025, EAGer can reallocate the budget without accessing target labels,
achieving the best efficiency-performance trade-off in terms of reasoning
length and Pass@k. When target labels are accessible, EAGer generates up to 65%
fewer tokens (hence saving compute) and achieves up to 37% improvement in
Pass@k compared to the Full Parallel Sampling.
Ссылки и действия
Дополнительные ресурсы: