A Function Centric Perspective On Flat and Sharp Minima
2510.12451v1
cs.LG, cs.AI, cs.CV
2025-10-16
Authors:
Israel Mason-Williams, Gabryel Mason-Williams, Helen Yannakoudakis
Abstract
Flat minima are widely believed to correlate with improved generalisation in
deep neural networks. However, this connection has proven more nuanced in
recent studies, with both theoretical counterexamples and empirical exceptions
emerging in the literature. In this paper, we revisit the role of sharpness in
model performance, proposing that sharpness is better understood as a
function-dependent property rather than a reliable indicator of poor
generalisation. We conduct extensive empirical studies, from single-objective
optimisation to modern image classification tasks, showing that sharper minima
often emerge when models are regularised (e.g., via SAM, weight decay, or data
augmentation), and that these sharp minima can coincide with better
generalisation, calibration, robustness, and functional consistency. Across a
range of models and datasets, we find that baselines without regularisation
tend to converge to flatter minima yet often perform worse across all safety
metrics. Our findings demonstrate that function complexity, rather than
flatness alone, governs the geometry of solutions, and that sharper minima can
reflect more appropriate inductive biases (especially under regularisation),
calling for a function-centric reappraisal of loss landscape geometry.