LoRAFusion: Efficient LoRA Fine-Tuning for LLMs
2510.00206v1
cs.LG, cs.AI, cs.DC
2025-10-05
Authors:
Zhanda Zhu, Qidong Su, Yaoyao Ding, Kevin Song, Shang Wang, Gennady Pekhimenko
Abstract
Low-Rank Adaptation (LoRA) has become the leading Parameter-Efficient
Fine-Tuning (PEFT) method for Large Language Models (LLMs), as it significantly
reduces GPU memory usage while maintaining competitive fine-tuned model quality
on downstream tasks. Despite these benefits, we identify two key inefficiencies
in existing LoRA fine-tuning systems. First, they incur substantial runtime
overhead due to redundant memory accesses on large activation tensors. Second,
they miss the opportunity to concurrently fine-tune multiple independent LoRA
adapters that share the same base model on the same set of GPUs. This leads to
missed performance gains such as reduced pipeline bubbles, better communication
overlap, and improved GPU load balance.
To address these issues, we introduce LoRAFusion, an efficient LoRA
fine-tuning system for LLMs. At the kernel level, we propose a graph-splitting
method that fuses memory-bound operations. This design eliminates unnecessary
memory accesses and preserves the performance of compute-bound GEMMs without
incurring the cost of recomputation or synchronization. At the scheduling
level, LoRAFusion introduces an adaptive batching algorithm for multi-job
fine-tuning. It first splits LoRA adapters into groups to intentionally stagger
batch execution across jobs, and then solves a bin-packing problem within each
group to generate balanced, dependency-aware microbatches. LoRAFusion achieves
up to $1.96\times$ ($1.47\times$ on average) end-to-end speedup compared to
Megatron-LM, and up to $1.46\times$ ($1.29\times$ on average) improvement over
mLoRA, the state-of-the-art multi-LoRA fine-tuning system. Our fused kernel
achieves up to $1.39\times$ ($1.27\times$ on average) kernel performance
improvement and can directly serve as a plug-and-play replacement in existing
LoRA systems. We open-source LoRAFusion at
https://github.com/CentML/lorafusion.
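
The bin-packing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the solver, so the code below substitutes a generic first-fit-decreasing heuristic; it is not LoRAFusion's algorithm. The function name `pack_microbatches`, the `(adapter_id, token_count)` sample format, and the `capacity` token budget are all hypothetical. The sketch does not model adapter grouping, staggering, or cross-job dependencies.

```python
from typing import List, Tuple

Sample = Tuple[str, int]  # (adapter_id, token_count); hypothetical format


def pack_microbatches(samples: List[Sample], capacity: int) -> List[List[Sample]]:
    """Pack samples into microbatches with first-fit decreasing.

    Assumption: each microbatch holds at most `capacity` tokens, and
    samples from different adapters may share a microbatch.
    """
    bins: List[List[Sample]] = []
    loads: List[int] = []  # running token count of each microbatch

    # Place the longest samples first; this usually reduces the bin count.
    for sample in sorted(samples, key=lambda s: s[1], reverse=True):
        tokens = sample[1]
        if tokens > capacity:
            raise ValueError(f"sample with {tokens} tokens exceeds capacity")
        for i, load in enumerate(loads):
            if load + tokens <= capacity:
                bins[i].append(sample)
                loads[i] += tokens
                break
        else:
            # No existing microbatch has room, so open a new one.
            bins.append([sample])
            loads.append(tokens)
    return bins


if __name__ == "__main__":
    jobs = [("a", 700), ("b", 300), ("a", 500), ("b", 500), ("a", 200)]
    for microbatch in pack_microbatches(jobs, capacity=1000):
        print(sum(t for _, t in microbatch), microbatch)
```

Packing by token count rather than by sample count is what evens out the per-microbatch work when sequence lengths vary across jobs, which is the load-balance concern the abstract raises.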