Dimension-Free Minimax Rates for Learning Pairwise Interactions in Attention-Style Models

2510.11789v1 stat.ML, cs.LG, math.PR, math.ST, stat.TH 2025-10-16

Authors:

Shai Zucker, Xiong Wang, Fei Lu, Inbar Seroussi

Abstract

We study the convergence rate of learning pairwise interactions in single-layer attention-style models, where tokens interact through a weight matrix and a non-linear activation function. We prove that the minimax rate is $M^{-\frac{2\beta}{2\beta+1}}$, where $M$ is the sample size. This rate depends only on the smoothness $\beta$ of the activation and, crucially, is independent of the token count, the ambient dimension, and the rank of the weight matrix. These results highlight a fundamental dimension-free statistical efficiency of attention-style nonlocal models, even when the weight matrix and the activation are not separately identifiable, and they provide a theoretical understanding of the attention mechanism and its training.
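To make the scaling in the stated rate concrete, here is a minimal sketch that evaluates $M^{-\frac{2\beta}{2\beta+1}}$ with all constants omitted, so it shows only how the error scales, not its actual magnitude. The function names are hypothetical and are not taken from the paper. Note that the functions take no argument for token count, ambient dimension, or rank, which mirrors the dimension-free claim.

```python
import math

def rate_exponent(beta: float) -> float:
    """Exponent 2*beta / (2*beta + 1) in the minimax rate M^(-exponent)."""
    if beta <= 0:
        raise ValueError("smoothness beta must be positive")
    return 2.0 * beta / (2.0 * beta + 1.0)

def minimax_rate(M: int, beta: float) -> float:
    """Rate M^(-2*beta/(2*beta+1)) up to constants (hypothetical helper).

    Depends only on the sample size M and the activation smoothness beta;
    no dimension, token-count, or rank parameter appears.
    """
    if M < 1:
        raise ValueError("sample size M must be at least 1")
    return M ** (-rate_exponent(beta))

def sample_multiplier(beta: float, error_reduction: float) -> float:
    """Factor by which M must grow to shrink the rate by `error_reduction`.

    From (k*M)^(-e) = M^(-e) / error_reduction, we get k = error_reduction^(1/e).
    """
    return error_reduction ** (1.0 / rate_exponent(beta))

if __name__ == "__main__":
    for beta in (0.5, 1.0, 2.0, 10.0):
        print(f"beta={beta:5.1f}  exponent={rate_exponent(beta):.4f}  "
              f"rate(M=1000)={minimax_rate(1000, beta):.3e}  "
              f"samples x{sample_multiplier(beta, 2.0):.3f} to halve error")
```

For example, with $\beta = 1$ the exponent is $2/3$, so $M = 1000$ gives a rate of about $0.01$, and halving it requires roughly $2^{3/2} \approx 2.83$ times more samples. As $\beta$ grows, the exponent approaches $1$, the parametric rate.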
