The Curved Spacetime of Transformer Architectures
2511.03060v1
cs.LG, cs.CL, math.DG
2025-11-07
Авторы:
Riccardo Di Sipio, Jairo Diaz-Rodriguez, Luis Serrano
Abstract
We present a geometric framework for understanding Transformer-based language
models, drawing an explicit analogy to General Relativity. Queries and keys
induce an effective metric on representation space, and attention acts as a
discrete connection that implements parallel transport of value vectors across
tokens. Stacked layers provide discrete time-slices through which token
representations evolve on this curved manifold, while backpropagation plays the
role of a least-action principle that shapes loss-minimizing trajectories in
parameter space. If this analogy is correct, token embeddings should not
traverse straight paths in feature space; instead, their layer-wise steps
should bend and reorient as interactions mediated by embedding space curvature.
To test this prediction, we design experiments that expose both the presence
and the consequences of curvature: (i) we visualize a curvature landscape for a
full paragraph, revealing how local turning angles vary across tokens and
layers; (ii) we show through simulations that excess counts of sharp/flat
angles and longer length-to-chord ratios are not explainable by dimensionality
or chance; and (iii) inspired by Einstein's eclipse experiment, we probe
deflection under controlled context edits, demonstrating measurable,
meaning-consistent bends in embedding trajectories that confirm
attention-induced curvature.
Ссылки и действия
Дополнительные ресурсы: