📊 Digest statistics
Total digests: 34022 Added today: 82
Last updated: today
Authors:
Anand Srinivasan, Jean-Jacques Slotine
Russian summary not found
Annotation:
Recently, the vanishing-step-size limit of the Sinkhorn algorithm at finite
regularization parameter $\varepsilon$ was shown to be a mirror descent in the
space of probability measures. We give $L^2$ contraction criteria in two
time-dependent metrics induced by the mirror Hessian, which reduce to the
coercivity of certain conditional expectation operators. We then give an exact
identity for the entropy production rate of the Sinkhorn flow, which was
previously known only to be nonpositive. Exami...
📄 Rethinking Nonlinearity: Trainable Gaussian Mixture Modules for Modern Neural Architectures
2025-10-12
Authors:
Weiguo Lu, Gangnan Yuan, Hong-kun Zhang, Shangyang Li
Russian summary not found
Annotation:
Neural networks in general, from MLPs and CNNs to attention-based
Transformers, are constructed from layers of linear combinations followed by
nonlinear operations such as ReLU, Sigmoid, or Softmax. Despite their strength,
these conventional designs are often limited in introducing non-linearity by
the choice of activation functions. In this work, we introduce Gaussian
Mixture-Inspired Nonlinear Modules (GMNM), a new class of differentiable
modules that draw on the universal density approximatio...
Authors:
Debsurya De, Dmitriy Kunisky
Russian summary not found
Annotation:
Recent work has generalized several results concerning the well-understood
spiked Wigner matrix model of a low-rank signal matrix corrupted by additive
i.i.d. Gaussian noise to the inhomogeneous case, where the noise has a variance
profile. In particular, for the special case where the variance profile has a
block structure, a series of results identified an effective spectral algorithm
for detecting and estimating the signal, identified the threshold signal
strength required for that algorithm ...
Authors:
Tassilo Schwarz, Cai Dieball, Constantin Kogler, Kevin Lam, Renaud Lambiotte, Arnaud Doucet, Aljaž Godec, George Deligiannidis
Russian summary not found
Annotation:
Diffusion models are central to generative modeling and have been adapted to
graphs by diffusing adjacency matrix representations. The challenge of having
up to $n!$ such representations for graphs with $n$ nodes is only partially
mitigated by using permutation-equivariant learning architectures. Despite
their computational efficiency, existing graph diffusion models struggle to
distinguish certain graph families, unless graph data are augmented with ad hoc
features. This shortcoming stems from ...
📄 Gaussian Equivalence for Self-Attention: Asymptotic Spectral Analysis of Attention Matrix
2025-10-10
Authors:
Tomohiro Hayase, Benoît Collins, Ryo Karakida
Russian summary not found
Annotation:
Self-attention layers have become fundamental building blocks of modern deep
neural networks, yet their theoretical understanding remains limited,
particularly from the perspective of random matrix theory. In this work, we
provide a rigorous analysis of the singular value spectrum of the attention
matrix and establish the first Gaussian equivalence result for attention. In a
natural regime where the inverse temperature remains of constant order, we show
that the singular value distribution of th...
Authors:
Satoshi Hayakawa, Yuhta Takida, Masaaki Imaizumi, Hiromi Wakaki, Yuki Mitsufuji
Russian summary not found
Annotation:
Masked diffusion models have shown promising performance in generating
high-quality samples in a wide range of domains, but accelerating their
sampling process remains relatively underexplored. To investigate efficient
samplers for masked diffusion, this paper theoretically analyzes the MaskGIT
sampler for image modeling, revealing its implicit temperature sampling
mechanism. Through this analysis, we introduce the "moment sampler," an
asymptotically equivalent but more tractable and interpretab...
Authors:
Nabarun Deb
Russian summary not found
Annotation:
In this paper, we study fluctuations of conditionally centered statistics of
the form $$N^{-1/2}\sum_{i=1}^N
c_i(g(\sigma_i)-\mathbb{E}_N[g(\sigma_i)|\sigma_j,j\neq i])$$ where
$(\sigma_1,\ldots ,\sigma_N)$ are sampled from a dependent random field, and
$g$ is some bounded function. Our first main result shows that under weak
smoothness assumptions on the conditional means (which cover both sparse and
dense interactions), the above statistic converges to a Gaussian \emph{scale
mixture} with a ra...
Authors:
Lucas Morisset, Adrien Hardy, Alain Durmus
Russian summary not found
Annotation:
This paper addresses the problem of inverse covariance (also known as
precision matrix) estimation in high-dimensional settings. Specifically, we
focus on two classes of estimators: linear shrinkage estimators with a target
proportional to the identity matrix, and estimators derived from data
augmentation (DA). Here, DA refers to the common practice of enriching a
dataset with artificial samples--typically generated via a generative model or
through random transformations of the original data--p...
Authors:
Eloy Mosig, Andrea Agazzi, Dario Trevisan
Russian summary not found
Annotation:
In this paper, we study the quantitative convergence of shallow neural
networks trained via gradient descent to their associated Gaussian processes in
the infinite-width limit.
While previous work has established qualitative convergence under broad
settings, precise, finite-width estimates remain limited, particularly during
training.
We provide explicit upper bounds on the quadratic Wasserstein distance
between the network output and its Gaussian approximation at any training time
$t \ge 0$...
Authors:
Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi
Russian summary not found
Annotation:
In this paper, we study the evolution of tokens through the depth of
encoder-only transformer models at inference time by modeling them as a system
of particles interacting in a mean-field way and studying the corresponding
dynamics. More specifically, we consider this problem in the moderate
interaction regime, where the number $N$ of tokens is large and the inverse
temperature parameter $\beta$ of the model scales together with $N$. In this
regime, the dynamics of the system displays a multisc...
Showing 21-30 of 43 entries