Scaling Up Temporal Domain Generalization via Temporal Experts Averaging
2509.26045v1
cs.LG, cs.CL, cs.CV
2025-10-02
Authors:
Aoming Liu, Kevin Miller, Venkatesh Saligrama, Kate Saenko, Boqing Gong, Ser-Nam Lim, Bryan A. Plummer
Abstract
Temporal Domain Generalization (TDG) aims to generalize across temporal
distribution shifts, e.g., lexical change over time. Prior work often addresses
this by predicting future model weights. However, full model prediction is
prohibitively expensive for even reasonably sized models. Thus, recent methods
only predict the classifier layer, limiting generalization by failing to adjust
other model components. To address this, we propose Temporal Experts Averaging
(TEA), a novel and scalable TDG framework that updates the entire model using
weight averaging to maximize generalization potential while minimizing
computational costs. Our theoretical analysis guides us to two steps that
enhance generalization to future domains. First, we create expert models with
functional diversity yet parameter similarity by fine-tuning a domain-agnostic
base model on individual temporal domains while constraining weight changes.
Second, we optimize the bias-variance tradeoff through adaptive averaging
coefficients derived from modeling temporal weight trajectories in a principal
component subspace. Experts' contributions are based on their projected
proximity to future domains. Extensive experiments across 7 TDG benchmarks, 5
models, and 2 TDG settings show TEA outperforms prior TDG methods by up to 69%
while being up to 60x more efficient.
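
The second step described above (averaging coefficients derived from weight trajectories in a principal component subspace) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `average_experts`, the linear extrapolation of the trajectory, the softmax over negative distances, and the `temperature` parameter are all assumptions chosen for illustration; the abstract does not specify how the trajectory is modeled or how proximity is turned into coefficients.

```python
import numpy as np

def average_experts(expert_weights, n_components=2, temperature=1.0):
    """Average flattened expert weights with proximity-based coefficients.

    expert_weights: (T, D) array, one flattened weight vector per temporal
    domain, ordered by time (hypothetical input format; needs T >= 2).
    """
    W = np.asarray(expert_weights, dtype=float)
    T = W.shape[0]

    # PCA via SVD on the centered weights; rows of vt are principal directions.
    centered = W - W.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    k = min(n_components, vt.shape[0])
    Z = centered @ vt[:k].T  # (T, k) trajectory in the subspace

    # Assumption: fit a straight line per component and extrapolate one step.
    t = np.arange(T)
    slope, intercept = np.polyfit(t, Z, deg=1)  # each has shape (k,)
    z_future = slope * T + intercept

    # Assumption: softmax over negative distance to the projected future point,
    # so experts closer to the extrapolated position get larger coefficients.
    dist = np.linalg.norm(Z - z_future, axis=1)
    logits = -dist / temperature
    logits -= logits.max()  # numerical stability
    coeffs = np.exp(logits)
    coeffs /= coeffs.sum()

    # Weighted average over experts gives a single weight vector.
    return coeffs @ W, coeffs
```

Called on a `(T, D)` array of flattened expert weights, the function returns one averaged weight vector and the coefficients used. When the weights drift steadily in one direction, the most recent experts sit closest to the extrapolated point and receive the largest coefficients.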