Authors:
Marius Dragoi, Ioana Pintilie, Florin Gogianu, Florin Brad
Abstract:
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a
powerful paradigm to improve Large Language Models on reasoning tasks such as
coding, math, or logic. To assess the reasoning boundary (the fraction of
problems a model can solve), researchers often report Pass@k at large sampling
budgets. Recent results reveal a crossover phenomenon: while RLVR models
outperform the base model at small k values, the base model usually outperforms
them when sampling a very large number of compl...
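
For readers unfamiliar with the metric, the following is a minimal sketch of how Pass@k is commonly estimated and how a crossover can arise. It assumes the widely used unbiased estimator 1 - C(n-c, k) / C(n, k), where n completions are sampled per problem and c of them are correct; the abstract does not say which estimator the authors use. The function names `pass_at_k` and `mean_pass_at_k` and the per-problem counts are hypothetical, synthetic illustrations, not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct,
    given c correct completions among n sampled (assumed estimator:
    1 - C(n - c, k) / C(n, k))."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(counts: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in counts) / len(counts)


# Synthetic, made-up counts (NOT data from the paper), 100 samples per problem:
# a "base-like" model that solves every problem, but rarely ...
base_like = [(100, 1)] * 4
# ... and a "tuned-like" model that solves half the problems, but every time.
tuned_like = [(100, 100)] * 2 + [(100, 0)] * 2

for k in (1, 10, 100):
    print(k, mean_pass_at_k(base_like, k), mean_pass_at_k(tuned_like, k))
```

In this toy setup the tuned-like model has the higher score at k = 1, while the base-like model has the higher score at k = 100, which is the shape of the crossover the abstract describes.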