High-Rate Mixout: Revisiting Mixout for Robust Domain Generalization
2510.06955v1
cs.LG, cs.CV
2025-10-10
Авторы:
Masih Aminbeidokhti, Heitor Rapela Medeiros, Eric Granger, Marco Pedersoli
Abstract
Ensembling fine-tuned models initialized from powerful pre-trained weights is
a common strategy to improve robustness under distribution shifts, but it comes
with substantial computational costs due to the need to train and store
multiple models. Dropout offers a lightweight alternative by simulating
ensembles through random neuron deactivation; however, when applied to
pre-trained models, it tends to over-regularize and disrupt critical
representations necessary for generalization. In this work, we investigate
Mixout, a stochastic regularization technique that provides an alternative to
Dropout for domain generalization. Rather than deactivating neurons, Mixout
mitigates overfitting by probabilistically swapping a subset of fine-tuned
weights with their pre-trained counterparts during training, thereby
maintaining a balance between adaptation and retention of prior knowledge. Our
study reveals that achieving strong performance with Mixout on domain
generalization benchmarks requires a notably high masking probability of 0.9
for ViTs and 0.8 for ResNets. While this may seem like a simple adjustment, it
yields two key advantages for domain generalization: (1) higher masking rates
more strongly penalize deviations from the pre-trained parameters, promoting
better generalization to unseen domains; and (2) high-rate masking
substantially reduces computational overhead, cutting gradient computation by
up to 45% and gradient memory usage by up to 90%. Experiments across five
domain generalization benchmarks, PACS, VLCS, OfficeHome, TerraIncognita, and
DomainNet, using ResNet and ViT architectures, show that our approach,
High-rate Mixout, achieves out-of-domain accuracy comparable to ensemble-based
methods while significantly reducing training costs.
Ссылки и действия
Дополнительные ресурсы: