ConCuR: Conciseness Makes State-of-the-Art Kernel Generation
2510.07356v1
cs.LG, cs.CL, cs.CV, stat.ML
2025-10-11
Авторы:
Lingcheng Kong, Jiateng Wei, Hanzhang Shen, Huan Wang
Abstract
GPU kernel generation by LLMs has recently experienced rapid development,
leveraging test-time scaling and reinforcement learning techniques. However, a
key challenge for kernel generation is the scarcity of high-quality data, as
most high-quality kernels are proprietary and not open-source. This challenge
prevents us from leveraging supervised fine-tuning to align LLMs to the kernel
generation task. To address this challenge, we develop a pipeline that
generates and curates high-quality CUDA kernels with reasoning traces,
motivated by a critical observation that concise yet informative reasoning
traces result in robust generation of high-performance kernels. Using this
pipeline, we construct our dataset ConCuR and introduce our model KernelCoder,
which is the first model trained on a curated dataset consisting of PyTorch,
reasoning, and CUDA kernel pairs, to our knowledge. In the KernelBench setup,
our model achieves significant improvements over the existing top-performing
model, QwQ-32B, and outperforms all open-source models fine-tuned for kernel
generation, as well as frontier models such as DeepSeek-V3.1-Think and
Claude-4-sonnet. Finally, we show that the average reasoning length can serve
as a metric to assess the difficulty of kernel generation tasks. The
observations, metrics, and our data collection and curation pipeline can help
obtain better data in the kernel generation task in the future.