Unravelling the Mechanisms of Manipulating Numbers in Language Models

2510.26285v1 cs.CL, cs.AI, cs.LG, cs.NE 2025-11-01

Authors:

Michal Štefánik, Timothee Mickus, Marek Kadlčík, Bertram Højer, Michal Spiegel, Raúl Vázquez, Aman Sinha, Josef Kuchař, Philipp Mondorf

Abstract

Recent work has shown that different large language models (LLMs) converge to similar and accurate input embedding representations for numbers. These findings conflict with the documented propensity of LLMs to produce erroneous outputs when dealing with numeric information. In this work, we aim to explain this conflict by exploring how language models manipulate numbers and quantifying the lower bounds of accuracy of these mechanisms. We find that, despite surfacing errors, different language models learn interchangeable representations of numbers that are systematic, highly accurate, and universal across their hidden states and the types of input contexts. This allows us to create universal probes for each LLM and to trace information (including the causes of output errors) to specific layers. Our results establish a fundamental understanding of how pre-trained LLMs manipulate numbers and outline the potential of more accurate probing techniques for targeted refinements of LLM architectures.

Related articles

Elastic Architecture Search for Efficient Language Models

Understanding Textual Emotion Through Emoji Prediction
