📊 Digest statistics
Total digests: 34022 Added today: 82
Last updated: today
📄 Accelerating Training Speed of Tiny Recursive Models via Curriculum Guided Adaptive Recursion
2025-11-15
Authors:
Kaleem Ullah Qasim, Jiashu Zhang
Annotation:
Recursive reasoning models achieve remarkable performance on complex reasoning tasks through iterative refinement, enabling tiny networks to match large language models thousands of times their size. However, training remains computationally expensive, with prior work reporting approximately 36 GPU-hours per dataset, which limits broader adoption and research. We propose CGAR, a novel training methodology that applies curriculum learning to architectural depth rather than traditional data ordering. CGAR ...
Authors:
Houjun Liu, Shikhar Murty, Christopher D. Manning, Róbert Csordás
Annotation:
Current approaches for scaling inference-time compute in transformers rely on
training them to emit explicit chain-of-thought tokens before producing an
answer. While these methods are powerful, they are limited because they cannot
be applied during pretraining and are limited to only serially-generated,
natural-language verbalization to scale inference-time compute. In this work,
we propose Thoughtbubbles, a transformer variant that natively performs
parallel adaptive computation in latent spac...