A multiscale analysis of mean-field transformers in the moderate interaction regime
2509.25040v1
cs.LG, math.PR, stat.ML
2025-10-01
Authors:
Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi
Abstract
In this paper, we study the evolution of tokens through the depth of
encoder-only transformer models at inference time by modeling them as a system
of particles interacting in a mean-field way and studying the corresponding
dynamics. More specifically, we consider this problem in the moderate
interaction regime, where the number $N$ of tokens is large and the inverse
temperature parameter $\beta$ of the model scales together with $N$. In this
regime, the dynamics of the system displays a multiscale behavior: a fast
phase, where the token empirical measure collapses on a low-dimensional space,
an intermediate phase, where the measure further collapses into clusters, and a
slow one, where such clusters sequentially merge into a single one. We provide
a rigorous characterization of the limiting dynamics in each of these phases
and prove convergence in the above-mentioned limit, illustrating our results
with simulations.
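
The abstract does not state the governing equations, so the following is only a minimal sketch of the kind of interacting-particle system it describes, not the authors' exact model. It assumes tokens constrained to the unit sphere, identity query, key and value matrices, softmax attention with inverse temperature `beta`, and an explicit Euler discretization in depth. The names `attention_step` and `simulate`, and all default parameter values, are hypothetical choices made for illustration.

```python
import numpy as np


def attention_step(x, beta, dt):
    """One explicit Euler step of softmax self-attention dynamics.

    Assumptions (not taken from the abstract): tokens live on the unit
    sphere and the query/key/value matrices are the identity.
    x has shape (N, d), one row per token.
    """
    scores = beta * (x @ x.T)                       # scaled pairwise inner products
    scores -= scores.max(axis=1, keepdims=True)     # stabilise the exponentials
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    drift = weights @ x                             # attention-weighted token average
    # Remove the radial component so each token moves tangentially to the sphere.
    drift -= np.sum(drift * x, axis=1, keepdims=True) * x
    x_new = x + dt * drift
    # Renormalise to correct the drift off the sphere caused by the Euler step.
    return x_new / np.linalg.norm(x_new, axis=1, keepdims=True)


def simulate(n_tokens=64, dim=3, beta=8.0, dt=0.05, n_steps=400, seed=0):
    """Evolve randomly initialised tokens through n_steps layers."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_tokens, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)   # uniform points on the sphere
    for _ in range(n_steps):
        x = attention_step(x, beta, dt)
    return x


if __name__ == "__main__":
    tokens = simulate()
    # Pairwise inner products close to 1 indicate tokens in the same cluster.
    print(np.round(tokens @ tokens.T, 2)[:5, :5])
```

In this sketch, increasing `beta` together with `n_tokens` is one way to probe the moderate interaction regime numerically; the specific scaling relation between the two is left to the paper.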