On Measuring Localization of Shortcuts in Deep Networks
2510.26560v1
cs.LG, stat.ML
2025-11-01
Authors:
Nikita Tsoy, Nikola Konstantinov
Abstract
Shortcuts, spurious rules that perform well during training but fail to
generalize, present a major challenge to the reliability of deep networks
(Geirhos et al., 2020). However, the impact of shortcuts on feature
representations remains understudied, obstructing the design of principled
shortcut-mitigation methods. To overcome this limitation, we investigate the
layer-wise localization of shortcuts in deep models. Our novel experiment
design uses counterfactual training on clean and skewed datasets to quantify
the layer-wise contribution to the accuracy degradation caused by a
shortcut-inducing skew. We employ our design to study shortcuts on the
CIFAR-10, Waterbirds, and
CelebA datasets across VGG, ResNet, DeiT, and ConvNeXt architectures. We find
that shortcut learning is not localized in specific layers but distributed
throughout the network. Different network parts play different roles in this
process: shallow layers predominantly encode spurious features, while deeper
layers predominantly forget core features that are predictive on clean data. We
also analyze the differences in localization and describe its principal axes of
variation. Finally, our analysis of layer-wise shortcut-mitigation strategies
suggests that designing general methods is hard, which supports dataset- and
architecture-specific approaches instead.