📄 Normalization in Attention Dynamics
2025-10-29
Authors:
Nikita Karagodin, Shu Ge, Yury Polyanskiy, Philippe Rigollet
Abstract:
We study the effect of normalization schemes on token representations in deep
transformers. Modeling their evolution as interacting particles on the sphere,
we show that normalization acts as a form of speed regulation. This perspective
enables a unified analysis of several schemes -- including Post-LN, Pre-LN,
Mix-LN, Peri-LN, nGPT, and LN-Scaling -- revealing how they influence
clustering dynamics and representation collapse. Our framework clarifies how
different schemes shape token representa...
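
To make the "interacting particles on the sphere" picture concrete, here is a minimal sketch of generic softmax-attention dynamics on the unit sphere. It is an illustration under stated assumptions, not the paper's equations: the parameters `beta` and `dt`, the explicit Euler discretization, and the `speed` hook are all hypothetical. The hook only rescales each particle's velocity, which is one way to read "normalization acts as a form of speed regulation"; the actual speed factors for Post-LN, Pre-LN, and the other schemes are not reproduced here.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def attention_step(points, beta=1.0, dt=0.1, speed=lambda i, drift: 1.0):
    """One explicit Euler step of softmax-attention dynamics on the unit sphere.

    `speed` is a hypothetical hook standing in for a normalization scheme:
    it rescales each particle's velocity and does not change its direction.
    """
    new_points = []
    for i, x in enumerate(points):
        # Softmax attention weights from inner products with every particle.
        scores = [beta * sum(a * b for a, b in zip(x, y)) for y in points]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Attention output: weighted average of all particles.
        drift = [sum(w * y[k] for w, y in zip(weights, points)) / total
                 for k in range(len(x))]
        # Project onto the tangent space at x so the motion stays on the sphere.
        radial = sum(a * b for a, b in zip(x, drift))
        tangent = [d - radial * a for d, a in zip(drift, x)]
        step = dt * speed(i, drift)
        # Retract back to the sphere to undo the Euler discretization error.
        new_points.append(normalize([a + step * t for a, t in zip(x, tangent)]))
    return new_points


def min_pairwise_cosine(points):
    """Smallest cosine similarity over all pairs; 1.0 means full collapse."""
    return min(sum(a * b for a, b in zip(p, q))
               for i, p in enumerate(points) for q in points[i + 1:])


rng = random.Random(0)
tokens = [normalize([rng.gauss(0, 1) for _ in range(3)]) for _ in range(8)]
before = min_pairwise_cosine(tokens)
for _ in range(2000):
    tokens = attention_step(tokens)
after = min_pairwise_cosine(tokens)
```

Comparing `before` and `after` gives a rough measure of how far the tokens have clustered. Passing a different `speed` function, for example one that shrinks with depth, changes how quickly the particles move but not the direction in which each one is pulled.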