Utility-Diversity Aware Online Batch Selection for LLM Supervised Fine-tuning
2510.16882v1
cs.LG, cs.AI, cs.CL
2025-10-22
Авторы:
Heming Zou, Yixiu Mao, Yun Qu, Qi Wang, Xiangyang Ji
Abstract
Supervised fine-tuning (SFT) is a commonly used technique to adapt large
language models (LLMs) to downstream tasks. In practice, SFT on a full dataset
is computationally expensive and sometimes suffers from overfitting or bias
amplification. This facilitates the rise of data curation in SFT, which
prioritizes the most valuable data to optimze. This work studies the online
batch selection family that dynamically scores and filters samples during the
training process. However, existing popular methods often (i) rely merely on
the utility of data to select a subset while neglecting other crucial factors
like diversity, (ii) rely on external resources such as reference models or
validation sets, and (iii) incur extra training time over full-dataset
training. To address these limitations, this work develops \textbf{UDS
(Utility-Diversity Sampling)}, a framework for efficient online batch selection
in SFT. UDS leverages the nuclear norm of the logits matrix to capture both
data utility and intra-sample diversity, while estimating inter-sample
diversity through efficient low-dimensional embedding comparisons with a
lightweight memory buffer of historical samples. Such a design eliminates the
need for external resources and unnecessary backpropagation, securing
computational efficiency. Experiments on multiple benchmarks demonstrate that
UDS consistently outperforms state-of-the-art online batch selection methods
under varying data budgets, and significantly reduces training time compared to
full-dataset fine-tuning. Code is available at https://github.com/gfyddha/UDS.
Ссылки и действия
Дополнительные ресурсы: