Does Weak-to-strong Generalization Happen under Spurious Correlations?
2509.24005v1
cs.LG, stat.ML
2025-10-01
Authors:
Chenruo Liu, Yijun Dong, Qi Lei
Abstract
We initiate a unified theoretical and algorithmic study of a key problem in
weak-to-strong (W2S) generalization: when fine-tuning a strong pre-trained
student with pseudolabels from a weaker teacher on a downstream task with
spurious correlations, does W2S happen, and how can it be improved when it fails? We
consider two sources of spurious correlations caused by group imbalance: (i) a
weak teacher fine-tuned on group-imbalanced labeled data with a minority group
of fraction $\eta_\ell$, and (ii) a group-imbalanced unlabeled set
pseudolabeled by the teacher with a minority group of fraction $\eta_u$.
Theoretically, a precise characterization of W2S gain at the proportional
asymptotic limit shows that W2S always happens with sufficient pseudolabels
when $\eta_u = \eta_\ell$ but may fail when $\eta_u \ne \eta_\ell$, where W2S
gain diminishes as $(\eta_u - \eta_\ell)^2$ increases. Our theory is
corroborated by extensive experiments on various spurious correlation
benchmarks and teacher-student pairs. To boost W2S performance when it fails,
we further propose a simple, effective algorithmic remedy that retrains the
strong student on its high-confidence data subset after W2S fine-tuning. Our
algorithm is group-label-free and achieves consistent, substantial improvements
over vanilla W2S fine-tuning.
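
As a rough illustration of the retraining step described above, the following minimal sketch selects a high-confidence subset from the student's predictions after W2S fine-tuning. It is not the authors' implementation: the confidence measure (maximum predicted class probability), the fixed `keep_frac` selection rule, the use of the student's own predictions as retraining labels, and the helper names `select_high_confidence` and `build_retraining_set` are all assumptions made for illustration. Note that no group labels are used anywhere.

```python
from typing import List, Sequence, Tuple


def select_high_confidence(probs: Sequence[Sequence[float]], keep_frac: float) -> List[int]:
    """Return the indices (in ascending order) of the most confident predictions.

    Assumption: confidence is the maximum predicted class probability, and the
    top `keep_frac` fraction of examples is kept. The abstract does not specify
    either choice.
    """
    if not 0.0 < keep_frac <= 1.0:
        raise ValueError("keep_frac must be in (0, 1]")
    confidences = [max(p) for p in probs]
    n_keep = max(1, int(round(keep_frac * len(confidences))))
    # Rank examples from most to least confident, then keep the top n_keep.
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    return sorted(ranked[:n_keep])


def build_retraining_set(
    inputs: Sequence[object],
    probs: Sequence[Sequence[float]],
    keep_frac: float,
) -> Tuple[List[object], List[int]]:
    """Build the (inputs, labels) subset on which the student would be retrained.

    Assumption: the retraining labels are the student's own argmax predictions.
    """
    keep = select_high_confidence(probs, keep_frac)
    labels = [max(range(len(probs[i])), key=lambda c: probs[i][c]) for i in keep]
    return [inputs[i] for i in keep], labels


# Toy usage: class probabilities predicted by the W2S fine-tuned student.
student_probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.5, 0.5]]
subset_inputs, subset_labels = build_retraining_set(["a", "b", "c", "d"], student_probs, 0.5)
print(subset_inputs, subset_labels)
```

In this sketch the retraining itself is left out: the returned subset would be passed to whatever fine-tuning routine was used for the student in the first place.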