📊 Digest statistics

Total digests: 34022. Added today: 0

Last updated: today

📄 Accelerated stochastic first-order method for convex optimization under heavy-tailed noise

2025-10-15

Authors:

Chuan He, Zhaosong Lu

Annotation:

We study convex composite optimization problems, where the objective function is given by the sum of a prox-friendly function and a convex function whose subgradients are estimated under heavy-tailed noise. Existing work often employs gradient clipping or normalization techniques in stochastic first-order methods to address heavy-tailed noise. In this paper, we demonstrate that a vanilla stochastic algorithm -- without additional modifications such as clipping or normalization -- can achieve opt...

ID: 2510.11676v1 math.OC, cs.AI, cs.LG, stat.ML, 49M05, 49M37, 90C25, 90C30

arXiv PDF

📄 How Scale Breaks "Normalized Stress" and KL Divergence: Rethinking Quality Metrics

2025-10-14

Authors:

Kiran Smelser, Kaviru Gunaratne, Jacob Miller, Stephen Kobourov

Annotation:

Complex, high-dimensional data is ubiquitous across many scientific disciplines, including machine learning, biology, and the social sciences. One of the primary methods of visualizing these datasets is with two-dimensional scatter plots that visually capture some properties of the data. Because visually determining the accuracy of these plots is challenging, researchers often use quality metrics to measure the projection's accuracy and faithfulness to the original data. One of the most commonly...

ID: 2510.08660v1 cs.LG, stat.ML

arXiv PDF
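The title of the entry above says that scale affects normalized stress. The abstract shown here is truncated, so the following is only a minimal sketch under an assumption: it uses the commonly cited definition, the sum of squared differences between high-dimensional and projected pairwise distances divided by the sum of squared high-dimensional distances, which may differ from the authors' exact formulation. The point sets `high` and `low` and the helper names are hypothetical toy values chosen for illustration.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distances for all unordered pairs, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def normalized_stress(high, low):
    """Assumed textbook form: sum((d_high - d_low)^2) / sum(d_high^2)."""
    d_high = pairwise_distances(high)
    d_low = pairwise_distances(low)
    numerator = sum((dh - dl) ** 2 for dh, dl in zip(d_high, d_low))
    denominator = sum(dh ** 2 for dh in d_high)
    return numerator / denominator


def scale(points, alpha):
    """Uniformly rescale a layout; the picture looks the same, only larger or smaller."""
    return [tuple(alpha * c for c in p) for p in points]


# Hypothetical toy data: four points in 3-D and one 2-D layout of them.
high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
low = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# The same layout evaluated at three uniform scales.
stress_by_scale = {
    alpha: normalized_stress(high, scale(low, alpha)) for alpha in (0.5, 1.0, 2.0)
}
```

Under this definition, `stress_by_scale` holds three different values for what is visually the same scatter plot, which is one way to see why a metric of this form can be sensitive to the scale of the projection.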

📄 Characterizing the Multiclass Learnability of Forgiving 0-1 Loss Functions

2025-10-14

Authors:

Jacob Trauger, Tyson Trauger, Ambuj Tewari

Annotation:

In this paper we will give a characterization of the learnability of forgiving 0-1 loss functions in the finite label multiclass setting. To do this, we create a new combinatorial dimension that is based off of the Natarajan Dimension and we show that a hypothesis class is learnable in our setting if and only if this Generalized Natarajan Dimension is finite. We also show a connection to learning with set-valued feedback. Through our results we show that the learnability of a set learning proble...

ID: 2510.08382v2 cs.LG, stat.ML

arXiv PDF

📄 Spatial Deconfounder: Interference-Aware Deconfounding for Spatial Causal Inference

2025-10-14

Authors:

Ayush Khot, Miruna Oprescu, Maresa Schröder, Ai Kagawa, Xihaier Luo

Annotation:

Causal inference in spatial domains faces two intertwined challenges: (1) unmeasured spatial factors, such as weather, air pollution, or mobility, that confound treatment and outcome, and (2) interference from nearby treatments that violate standard no-interference assumptions. While existing methods typically address one by assuming away the other, we show they are deeply connected: interference reveals structure in the latent confounder. Leveraging this insight, we propose the Spatial Deconfou...

ID: 2510.08762v1 cs.LG, stat.ML

arXiv PDF

📄 Reliability Sensitivity with Response Gradient

2025-10-14

Authors:

Siu-Kui Au, Zi-Jun Cao

Annotation:

Engineering risk is concerned with the likelihood of failure and the scenarios when it occurs. The sensitivity of failure probability to change in system parameters is relevant to risk-informed decision making. Computing sensitivity is at least one level more difficult than the probability itself, which is already challenged by a large number of input random variables, rare events and implicit nonlinear 'black-box' response. Finite difference with Monte Carlo probability estimates is spurious, r...

ID: 2510.09315v1 stat.ME, cs.LG, stat.ML

arXiv PDF

📄 Structured Output Regularization: a framework for few-shot transfer learning

2025-10-14

Authors:

Nicolas Ewen, Jairo Diaz-Rodriguez, Kelly Ramsay

Annotation:

Traditional transfer learning typically reuses large pre-trained networks by freezing some of their weights and adding task-specific layers. While this approach is computationally efficient, it limits the model's ability to adapt to domain-specific features and can still lead to overfitting with very limited data. To address these limitations, we propose Structured Output Regularization (SOR), a simple yet effective framework that freezes the internal network structures (e.g., convolutional filt...

ID: 2510.08728v1 cs.CV, cs.LG, stat.ML

arXiv PDF

📄 Nearly Instance-Optimal Parameter Recovery from Many Trajectories via Hellinger Localization

2025-10-12

Authors:

Eliot Shekhtman, Yichen Zhou, Ingvar Ziemann, Nikolai Matni, Stephen Tu

Annotation:

Learning from temporally-correlated data is a core facet of modern machine learning. Yet our understanding of sequential learning remains incomplete, particularly in the multi-trajectory setting where data consists of many independent realizations of a time-indexed stochastic process. This important regime both reflects modern training pipelines such as for large foundation models, and offers the potential for learning without the typical mixing assumptions made in the single-trajectory case. Ho...

ID: 2510.06434v1 cs.LG, stat.ML

arXiv PDF

📄 Wide Neural Networks as a Baseline for the Computational No-Coincidence Conjecture

2025-10-12

Authors:

John Dunbar, Scott Aaronson

Annotation:

We establish that randomly initialized neural networks, with large width and a natural choice of hyperparameters, have nearly independent outputs exactly when their activation function is nonlinear with zero mean under the Gaussian measure: $\mathbb{E}_{z \sim \mathcal{N}(0,1)}[\sigma(z)]=0$. For example, this includes ReLU and GeLU with an additive shift, as well as tanh, but not ReLU or GeLU by themselves. Because of their nearly independent outputs, we propose neural networks with zero-mean a...

ID: 2510.06527v1 cs.LG, stat.ML

arXiv PDF
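The zero-mean condition quoted in the abstract above, $\mathbb{E}_{z \sim \mathcal{N}(0,1)}[\sigma(z)]=0$, can be checked numerically for the activations it names. The sketch below is illustrative only and not taken from the paper: the helper `gaussian_mean`, the integration range, and the grid size are assumptions, and the shift applied to ReLU is the closed-form value $1/\sqrt{2\pi}$ of its Gaussian mean.

```python
import math


def gaussian_mean(f, lo=-10.0, hi=10.0, n=20_000):
    """Approximate E[f(z)] for z ~ N(0, 1) with the midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * h  # midpoint of the i-th cell
        total += f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def relu(z):
    return max(z, 0.0)


# Closed form of E[ReLU(z)] under the standard Gaussian.
RELU_MEAN = 1.0 / math.sqrt(2.0 * math.pi)


def shifted_relu(z):
    """ReLU with an additive shift chosen so that its Gaussian mean is zero."""
    return relu(z) - RELU_MEAN


means = {
    "relu": gaussian_mean(relu),
    "shifted_relu": gaussian_mean(shifted_relu),
    "tanh": gaussian_mean(math.tanh),
}
```

In `means`, plain ReLU comes out close to `RELU_MEAN`, which is nonzero, while the shifted ReLU and tanh come out numerically close to zero, matching the abstract's statement that ReLU qualifies only with an additive shift.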

📄 The Effect of Attention Head Count on Transformer Approximation

2025-10-12

Authors:

Penghao Yu, Haotian Jiang, Zeyu Bao, Ruoxi Yu, Qianxiao Li

Annotation:

Transformer has become the dominant architecture for sequence modeling, yet a detailed understanding of how its structural parameters influence expressive power remains limited. In this work, we study the approximation properties of transformers, with particular emphasis on the role of the number of attention heads. Our analysis begins with the introduction of a generalized $D$-retrieval task, which we prove to be dense in the space of continuous functions, thereby providing the basis for our th...

ID: 2510.06662v1 cs.LG, stat.ML

arXiv PDF

📄 metabeta -- A fast neural model for Bayesian mixed-effects regression

2025-10-11

Authors:

Alex Kipnis, Marcel Binz, Eric Schulz

Annotation:

Hierarchical data with multiple observations per group is ubiquitous in empirical sciences and is often analyzed using mixed-effects regression. In such models, Bayesian inference gives an estimate of uncertainty but is analytically intractable and requires costly approximation using Markov Chain Monte Carlo (MCMC) methods. Neural posterior estimation shifts the bulk of computation from inference time to pre-training time, amortizing over simulated datasets with known ground truth targets. We pr...

ID: 2510.07473v1 cs.LG, stat.ML, 62J05, 62F15, 68T07, I.2.6; G.3

arXiv PDF
