📊 Статистика дайджестов
Всего дайджестов: 34022 Добавлено сегодня: 0
Последнее обновление: сегодня
Авторы:
Lingcheng Kong, Jiateng Wei, Hanzhang Shen, Huan Wang
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
GPU kernel generation by LLMs has recently experienced rapid development,
leveraging test-time scaling and reinforcement learning techniques. However, a
key challenge for kernel generation is the scarcity of high-quality data, as
most high-quality kernels are proprietary and not open-source. This challenge
prevents us from leveraging supervised fine-tuning to align LLMs to the kernel
generation task. To address this challenge, we develop a pipeline that
generates and curates high-quality CUDA ke...
📄 On Structured State-Space Duality
2025-10-08Авторы:
Jerry Yao-Chieh Hu, Xiwen Zhang, Weimin Wu, Han Liu
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Structured State-Space Duality (SSD) [Dao & Gu, ICML 2024] is an equivalence
between a simple Structured State-Space Model (SSM) and a masked attention
mechanism. In particular, a state-space model with a scalar-times-identity
state matrix is equivalent to a masked self-attention with a $1$-semiseparable
causal mask. Consequently, the same sequence transformation (model) has two
algorithmic realizations: as a linear-time $O(T)$ recurrence or as a
quadratic-time $O(T^2)$ attention. In this note, ...