📊 Digest statistics

Total digests: 34022
Added today: 82

Last updated: today

📄 Contraction and entropy production in continuous-time Sinkhorn dynamics

2025-10-16

Authors:

Anand Srinivasan, Jean-Jacques Slotine

Abstract:

Recently, the vanishing-step-size limit of the Sinkhorn algorithm at finite regularization parameter $\varepsilon$ was shown to be a mirror descent in the space of probability measures. We give $L^2$ contraction criteria in two time-dependent metrics induced by the mirror Hessian, which reduce to the coercivity of certain conditional expectation operators. We then give an exact identity for the entropy production rate of the Sinkhorn flow, which was previously known only to be nonpositive. Exami...

ID: 2510.12639v1 stat.ML, cs.LG, math.PR

arXiv PDF

📄 Rethinking Nonlinearity: Trainable Gaussian Mixture Modules for Modern Neural Architectures

2025-10-12

Authors:

Weiguo Lu, Gangnan Yuan, Hong-kun Zhang, Shangyang Li

Abstract:

Neural networks in general, from MLPs and CNNs to attention-based Transformers, are constructed from layers of linear combinations followed by nonlinear operations such as ReLU, Sigmoid, or Softmax. Despite their strength, these conventional designs are often limited in introducing non-linearity by the choice of activation functions. In this work, we introduce Gaussian Mixture-Inspired Nonlinear Modules (GMNM), a new class of differentiable modules that draw on the universal density approximatio...

ID: 2510.06660v1 cs.LG, math.PR

arXiv PDF

📄 Computational and statistical lower bounds for low-rank estimation under general inhomogeneous noise

2025-10-11

Authors:

Debsurya De, Dmitriy Kunisky

Abstract:

Recent work has generalized several results concerning the well-understood spiked Wigner matrix model of a low-rank signal matrix corrupted by additive i.i.d. Gaussian noise to the inhomogeneous case, where the noise has a variance profile. In particular, for the special case where the variance profile has a block structure, a series of results identified an effective spectral algorithm for detecting and estimating the signal, identified the threshold signal strength required for that algorithm ...

ID: 2510.08541v1 math.ST, cs.DS, cs.LG, math.PR, stat.TH

arXiv PDF

📄 Permutation-Invariant Spectral Learning via Dyson Diffusion

2025-10-11

Authors:

Tassilo Schwarz, Cai Dieball, Constantin Kogler, Kevin Lam, Renaud Lambiotte, Arnaud Doucet, Aljaž Godec, George Deligiannidis

Abstract:

Diffusion models are central to generative modeling and have been adapted to graphs by diffusing adjacency matrix representations. The challenge of having up to $n!$ such representations for graphs with $n$ nodes is only partially mitigated by using permutation-equivariant learning architectures. Despite their computational efficiency, existing graph diffusion models struggle to distinguish certain graph families, unless graph data are augmented with ad hoc features. This shortcoming stems from ...

ID: 2510.08535v1 stat.ML, cs.LG, math.PR

arXiv PDF

📄 Gaussian Equivalence for Self-Attention: Asymptotic Spectral Analysis of Attention Matrix

2025-10-10

Authors:

Tomohiro Hayase, Benoît Collins, Ryo Karakida

Abstract:

Self-attention layers have become fundamental building blocks of modern deep neural networks, yet their theoretical understanding remains limited, particularly from the perspective of random matrix theory. In this work, we provide a rigorous analysis of the singular value spectrum of the attention matrix and establish the first Gaussian equivalence result for attention. In a natural regime where the inverse temperature remains of constant order, we show that the singular value distribution of th...

ID: 2510.06685v1 stat.ML, cs.LG, math.PR

arXiv PDF

📄 Demystifying MaskGIT Sampler and Beyond: Adaptive Order Selection in Masked Diffusion

2025-10-08

Authors:

Satoshi Hayakawa, Yuhta Takida, Masaaki Imaizumi, Hiromi Wakaki, Yuki Mitsufuji

Abstract:

Masked diffusion models have shown promising performance in generating high-quality samples in a wide range of domains, but accelerating their sampling process remains relatively underexplored. To investigate efficient samplers for masked diffusion, this paper theoretically analyzes the MaskGIT sampler for image modeling, revealing its implicit temperature sampling mechanism. Through this analysis, we introduce the "moment sampler," an asymptotically equivalent but more tractable and interpretab...

ID: 2510.04525v1 cs.LG, math.PR, stat.ML

arXiv PDF

📄 Pivotal CLTs for Pseudolikelihood via Conditional Centering in Dependent Random Fields

2025-10-08

Authors:

Nabarun Deb

Abstract:

In this paper, we study fluctuations of conditionally centered statistics of the form $$N^{-1/2}\sum_{i=1}^N c_i(g(\sigma_i)-\mathbb{E}_N[g(\sigma_i)|\sigma_j,j\neq i])$$ where $(\sigma_1,\ldots ,\sigma_N)$ are sampled from a dependent random field, and $g$ is some bounded function. Our first main result shows that under weak smoothness assumptions on the conditional means (which cover both sparse and dense interactions), the above statistic converges to a Gaussian \emph{scale mixture} with a ra...

ID: 2510.04972v1 math.ST, cs.LG, math.PR, stat.TH, 82B20, 82B26

arXiv PDF

📄 Non-Asymptotic Analysis of Data Augmentation for Precision Matrix Estimation

2025-10-04

Authors:

Lucas Morisset, Adrien Hardy, Alain Durmus

Abstract:

This paper addresses the problem of inverse covariance (also known as precision matrix) estimation in high-dimensional settings. Specifically, we focus on two classes of estimators: linear shrinkage estimators with a target proportional to the identity matrix, and estimators derived from data augmentation (DA). Here, DA refers to the common practice of enriching a dataset with artificial samples--typically generated via a generative model or through random transformations of the original data--p...

ID: 2510.02119v1 stat.ML, cs.LG, math.PR, math.ST, stat.TH

arXiv PDF

📄 Quantitative convergence of trained single layer neural networks to Gaussian processes

2025-10-01

Authors:

Eloy Mosig, Andrea Agazzi, Dario Trevisan

Abstract:

In this paper, we study the quantitative convergence of shallow neural networks trained via gradient descent to their associated Gaussian processes in the infinite-width limit. While previous work has established qualitative convergence under broad settings, precise, finite-width estimates remain limited, particularly during training. We provide explicit upper bounds on the quadratic Wasserstein distance between the network output and its Gaussian approximation at any training time $t \ge 0$...

ID: 2509.24544v1 stat.ML, cs.LG, math.PR

arXiv PDF

📄 A multiscale analysis of mean-field transformers in the moderate interaction regime

2025-10-01

Authors:

Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi

Abstract:

In this paper, we study the evolution of tokens through the depth of encoder-only transformer models at inference time by modeling them as a system of particles interacting in a mean-field way and studying the corresponding dynamics. More specifically, we consider this problem in the moderate interaction regime, where the number $N$ of tokens is large and the inverse temperature parameter $\beta$ of the model scales together with $N$. In this regime, the dynamics of the system displays a multisc...

ID: 2509.25040v1 cs.LG, math.PR, stat.ML

arXiv PDF

Showing 21-30 of 43 entries