A multiscale analysis of mean-field transformers in the moderate interaction regime

2509.25040v1 cs.LG, math.PR, stat.ML 2025-10-01
Authors:

Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi

Abstract

In this paper, we study the evolution of tokens through the depth of encoder-only transformer models at inference time by modeling them as a system of particles interacting in a mean-field way and studying the corresponding dynamics. More specifically, we consider this problem in the moderate interaction regime, where the number $N$ of tokens is large and the inverse temperature parameter $\beta$ of the model scales together with $N$. In this regime, the dynamics of the system displays multiscale behavior: a fast phase, where the token empirical measure collapses onto a low-dimensional space; an intermediate phase, where the measure further collapses into clusters; and a slow phase, where these clusters sequentially merge into a single one. We provide a rigorous characterization of the limiting dynamics in each of these phases, prove convergence in the above-mentioned limit, and illustrate our results with simulations.
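The particle picture described in the abstract can be illustrated with a minimal simulation sketch. The abstract does not state the model equations or how $\beta$ scales with $N$, so everything below is an assumption: tokens are taken to be unit vectors evolving under softmax self-attention dynamics on the sphere with identity query, key, and value matrices (a simplification commonly used in the mean-field transformer literature), and `beta` is left as a free parameter. The names `attention_step` and `simulate` are hypothetical and do not come from the paper.

```python
import numpy as np

def attention_step(x, beta, dt):
    """One explicit Euler step of assumed softmax self-attention dynamics.

    x    : (N, d) array of unit-norm tokens (particles)
    beta : inverse temperature of the softmax
    dt   : step size, playing the role of a depth increment
    """
    logits = beta * (x @ x.T)
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the exponentials
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    drift = weights @ x  # attention-weighted average of the tokens
    # Remove the radial component so each token moves tangentially to the sphere.
    drift -= np.sum(drift * x, axis=1, keepdims=True) * x
    x_new = x + dt * drift
    # Renormalize to correct the drift off the sphere caused by the Euler step.
    return x_new / np.linalg.norm(x_new, axis=1, keepdims=True)

def simulate(n_tokens=64, dim=3, beta=4.0, dt=0.05, n_steps=400, seed=0):
    """Evolve random tokens and record the mean entry of the Gram matrix."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_tokens, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    alignment = []
    for _ in range(n_steps):
        alignment.append(float(np.mean(x @ x.T)))  # includes the diagonal
        x = attention_step(x, beta, dt)
    return x, alignment

if __name__ == "__main__":
    tokens, alignment = simulate()
    print(alignment[0], alignment[-1])
```

Tracking the recorded alignment across steps, for several values of `beta` and `n_tokens`, is one simple way to look for the separation between phases that the abstract describes. This sketch makes no attempt to reproduce the paper's scaling regime or its simulations.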
