From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph
2510.19873v1
cs.LG, cs.AI, cs.PL
2025-10-25
Авторы:
Junfeng Gong, Zhiyi Wei, Junying Chen, Cheng Liu, Huawei Li
Abstract
Despite significant evolution of CUDA programming and domain-specific
libraries, effectively utilizing GPUs with massively parallel engines remains
difficult. Large language models (LLMs) show strong potential in generating
optimized CUDA code from sequential code. However, using LLMs in practice faces
two major challenges: cloud-based APIs pose risks of code leakage, and local
deployment is often computationally expensive and inefficient. These drawbacks
have spurred interest in small language models (SLMs), which are more
lightweight and privacy-friendly. Encouragingly, recent studies show that SLMs
can achieve performance comparable to LLMs on specific tasks. While SLMs can
match LLMs on domain-specific tasks, their limited reasoning abilities lead to
suboptimal performance in complex CUDA generation according to our experiments.
To bridge this gap, we propose ReGraphT, a training-free, retrieval-augmented
generation framework that transfers LLM-level reasoning to smaller models.
ReGraphT organizes CUDA optimization trajectories into a structured reasoning
graph, modeling the combined CUDA optimizations as state transitions, and
leverages Monte Carlo Graph Search (MCGS) for efficient exploration. We also
present a CUDA-specific benchmark with difficulty tiers defined by reasoning
complexity to evaluate models more comprehensively. Experiments show that
ReGraphT outperforms HPC-specific fine-tuned models and other
retrieval-augmented approaches, achieving an average 2.33X speedup on CUDAEval
and ParEval. When paired with DeepSeek-Coder-V2-Lite-Instruct and
Qwen2.5-Coder-7B-Instruct, ReGraphT enables SLMs to approach LLM-level
performance without the associated privacy risks or excessive computing
overhead.
Ссылки и действия
Дополнительные ресурсы: