Anchored Supervised Fine-Tuning
2509.23753v1
cs.LG, cs.CL
2025-10-01
Authors:
He Zhu, Junyou Su, Peng Lai, Ren Ma, Wenjia Zhang, Linyi Yang, Guanhua Chen
Abstract
Post-training of large language models involves a fundamental trade-off
between supervised fine-tuning (SFT), which efficiently mimics demonstrations
but tends to memorize, and reinforcement learning (RL), which achieves better
generalization at higher computational cost. Dynamic Fine-Tuning (DFT) recently
emerged as a promising middle ground, reweighting SFT objectives with token
probabilities and achieving improvements in certain reasoning domains, though
it exhibits instability in other tasks. We provide an analysis of DFT through
the reward-weighted regression (RWR) framework, revealing that it corresponds
to a specific auxiliary distribution choice that yields provably tighter RL
bounds than standard SFT. However, our analysis also uncovers a critical
limitation: this construction lacks distributional anchoring, leading to
progressive drift that undermines training stability. To address this, we
propose Anchored Supervised Fine-Tuning (ASFT), which augments DFT's
reweighting with lightweight KL regularization to preserve tightness while
ensuring stability. Empirically, ASFT consistently outperforms both SFT and DFT
across mathematical reasoning, medical knowledge grounding, and code
generation, achieving substantial improvements with minimal computational
overhead. Our RWR framework provides a systematic lens for understanding
post-training methods and demonstrates that principled theoretical analysis
leads to both stronger guarantees and practical gains.
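
The three objectives contrasted in the abstract can be sketched per token as follows. This is a minimal illustration, not the authors' implementation: the abstract only says that DFT reweights the SFT objective with token probabilities and that ASFT adds lightweight KL regularization. The function names, the KL direction (current policy against a reference), the choice of reference distribution, and the coefficient `kl_coef` are all assumptions made here for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sft_token_loss(probs, target):
    # Standard SFT: negative log-likelihood of the demonstrated token.
    return -math.log(probs[target])

def dft_token_loss(probs, target):
    # DFT-style reweighting: the NLL is scaled by the token's own probability.
    # In real training this weight would be treated as a constant
    # (stop-gradient); here only the forward value is computed.
    return probs[target] * sft_token_loss(probs, target)

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions over the same vocabulary.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def asft_token_loss(probs, ref_probs, target, kl_coef=0.1):
    # ASFT-style objective (assumed form): DFT term plus a KL anchor that
    # penalizes drift away from a reference distribution.
    # kl_coef and the KL direction are hypothetical choices.
    return dft_token_loss(probs, target) + kl_coef * kl_divergence(probs, ref_probs)

if __name__ == "__main__":
    policy = softmax([2.0, 0.5, -1.0])
    reference = softmax([1.0, 1.0, 0.0])
    print("SFT :", sft_token_loss(policy, 0))
    print("DFT :", dft_token_loss(policy, 0))
    print("ASFT:", asft_token_loss(policy, reference, 0))
```

When the policy equals the reference, the KL term vanishes and the assumed ASFT loss reduces to the DFT loss; the anchor only contributes once the policy starts to drift.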