From Memorization to Reasoning in the Spectrum of Loss Curvature

2510.24256v1 cs.CL, cs.LG 2025-10-30

Авторы:

Jack Merullo, Srihita Vatsavaya, Lucius Bushnaq, Owen Lewis

Abstract

We characterize how memorization is represented in transformer models and show that it can be disentangled in the weights of both language models (LMs) and vision transformers (ViTs) using a decomposition based on the loss landscape curvature. This insight is based on prior theoretical and empirical work showing that the curvature for memorized training points is much sharper than non memorized, meaning ordering weight components from high to low curvature can reveal a distinction without explicit labels. This motivates a weight editing procedure that suppresses far more recitation of untargeted memorized data more effectively than a recent unlearning method (BalancedSubnet), while maintaining lower perplexity. Since the basis of curvature has a natural interpretation for shared structure in model weights, we analyze the editing procedure extensively on its effect on downstream tasks in LMs, and find that fact retrieval and arithmetic are specifically and consistently negatively affected, even though open book fact retrieval and general logical reasoning is conserved. We posit these tasks rely heavily on specialized directions in weight space rather than general purpose mechanisms, regardless of whether those individual datapoints are memorized. We support this by showing a correspondence between task data's activation strength with low curvature components that we edit out, and the drop in task performance after the edit. Our work enhances the understanding of memorization in neural networks with practical applications towards removing it, and provides evidence for idiosyncratic, narrowly-used structures involved in solving tasks like math and fact retrieval.

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

From Memorization to Reasoning in the Spectrum of Loss Curvature

Авторы:

Abstract

Ссылки и действия

Связанные статьи

A Preliminary Study on the Promises and Challenges of Native Top-$k$ Sparse Atte...

Computational Linguistics Meets Libyan Dialect: A Study on Dialect Identificatio...

Sarcasm Detection on Reddit Using Classical Machine Learning and Feature Enginee...

Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling

Enhancing Job Matching: Occupation, Skill and Qualification Linking with the ESC...

Навигация