Find Your Optimal Teacher: Personalized Data Synthesis via Router-Guided Multi-Teacher Distillation
2510.10925v1
cs.LG, cs.CL
2025-10-15
Authors:
Hengyuan Zhang, Shiping Yang, Xiao Liang, Chenming Shang, Yuxuan Jiang, Chaofan Tao, Jing Xiong, Hayden Kwok-Hay So, Ruobing Xie, Angel X. Chang, Ngai Wong
Abstract
Training student models on synthetic data generated by strong teacher models
is a promising way to distill the capabilities of teachers. However, recent
studies show that stronger models are not always optimal teachers, revealing a
mismatch between teacher outputs and student learnability. To address this
issue, we propose PerSyn (Personalized data Synthesis), a novel synthesis
strategy that operates under a new "Route then Generate" paradigm to create
data tailored to each student model, enabling it to learn more effectively.
Specifically, PerSyn first assigns each prompt to its optimal teacher via a
query-level router that jointly considers student learnability and teacher
response quality. Each teacher then synthesizes data only for its assigned
prompts, making the process more efficient than the conventional "Generate
then Select" paradigm, where all teachers must generate parallel responses for
the entire prompt set before constructing the final dataset. Extensive
experiments across different model families and scales demonstrate that PerSyn
consistently achieves superior or comparable performance to all baselines in
instruction tuning and math reasoning settings. Further analysis verifies the
effectiveness of PerSyn and offers extra insights to propel future research.
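
The "Route then Generate" flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the router combines its two signals, so the weighted sum, the `alpha` parameter, and the names `route_prompts`, `synthesize`, `quality`, `learnability`, and `generate` are all hypothetical. The scorers are assumed to return predicted scores for a (teacher, prompt) pair without generating a response first.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer type: (teacher_name, prompt) -> predicted score.
Scorer = Callable[[str, str], float]


def route_prompts(
    prompts: List[str],
    teachers: List[str],
    quality: Scorer,
    learnability: Scorer,
    alpha: float = 0.5,
) -> Dict[str, List[str]]:
    """Assign each prompt to the teacher with the highest combined score.

    The weighted sum below is an assumption made for illustration only.
    """
    assignment: Dict[str, List[str]] = defaultdict(list)
    for prompt in prompts:
        best = max(
            teachers,
            key=lambda t: alpha * learnability(t, prompt)
            + (1 - alpha) * quality(t, prompt),
        )
        assignment[best].append(prompt)
    return dict(assignment)


def synthesize(
    assignment: Dict[str, List[str]],
    generate: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Each teacher generates only for its assigned prompts."""
    return [
        (prompt, generate(teacher, prompt))
        for teacher, assigned in assignment.items()
        for prompt in assigned
    ]
```

Under this sketch, the number of generation calls equals the number of prompts, whereas a "Generate then Select" pipeline would call every teacher on every prompt before selecting.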