PTPP-Aware Adaptation Scaling Laws: Predicting Domain-Adaptation Performance at Unseen Pre-Training Budgets
2510.23198v1
cs.LG, cs.AI, cs.CL
2025-10-29
Авторы:
Etienne Goffinet, Shane Bergsma, Avraham Sheinin, Natalia Vassilieva, Shaheer Muhammad, Preslav Nakov, Gurpreet Gosal
Abstract
Continual pre-training (CPT) for domain adaptation must balance target-domain
gains with stability on the base domain. Existing CPT scaling laws typically
assume a fixed pre-training budget, which limits their ability to forecast
adaptation outcomes for models trained at different tokens-per-parameter
(PTPP). We present \emph{PTPP-aware} adaptation scaling laws that make the
pre-training budget an explicit variable, enabling accurate \emph{prediction}
of adaptation loss at unseen \ptpp. On a multilingual setup (English/Arabic
$\rightarrow$ French), PTPP-aware formulations trained on early stages
(\ptpp{}=\{15,31\}) predict target loss at \ptpp{}=279 and outperform a
PTPP-agnostic \dcpt{} transfer baseline on metrics (Huber-on-log,
MAE$_\mathrm{rel}$, calibration slope); full diagnostics (RMSE, MAPE) are in
the appendix. Beyond forecasting, we show a practical use case: planning replay
ratios and adaptation token budgets that satisfy target and forgetting
constraints under compute limits.
Ссылки и действия
Дополнительные ресурсы: