📄 Accelerated stochastic first-order method for convex optimization under heavy-tailed noise
2025-10-15
Authors:
Chuan He, Zhaosong Lu
Abstract:
We study convex composite optimization problems, where the objective function
is given by the sum of a prox-friendly function and a convex function whose
subgradients are estimated under heavy-tailed noise. Existing work often
employs gradient clipping or normalization techniques in stochastic first-order
methods to address heavy-tailed noise. In this paper, we demonstrate that a
vanilla stochastic algorithm -- without additional modifications such as
clipping or normalization -- can achieve opt...
Authors:
Kiran Smelser, Kaviru Gunaratne, Jacob Miller, Stephen Kobourov
Abstract:
Complex, high-dimensional data is ubiquitous across many scientific
disciplines, including machine learning, biology, and the social sciences. One
of the primary methods of visualizing these datasets is with two-dimensional
scatter plots that visually capture some properties of the data. Because
visually determining the accuracy of these plots is challenging, researchers
often use quality metrics to measure the projection's accuracy and faithfulness
to the original data. One of the most commonly...
Authors:
Jacob Trauger, Tyson Trauger, Ambuj Tewari
Abstract:
In this paper we will give a characterization of the learnability of
forgiving 0-1 loss functions in the finite label multiclass setting. To do
this, we create a new combinatorial dimension that is based off of the
Natarajan Dimension and we show that a hypothesis class is learnable in our
setting if and only if this Generalized Natarajan Dimension is finite. We also
show a connection to learning with set-valued feedback. Through our results we
show that the learnability of a set learning proble...
Authors:
Ayush Khot, Miruna Oprescu, Maresa Schröder, Ai Kagawa, Xihaier Luo
Abstract:
Causal inference in spatial domains faces two intertwined challenges: (1)
unmeasured spatial factors, such as weather, air pollution, or mobility, that
confound treatment and outcome, and (2) interference from nearby treatments
that violate standard no-interference assumptions. While existing methods
typically address one by assuming away the other, we show they are deeply
connected: interference reveals structure in the latent confounder. Leveraging
this insight, we propose the Spatial Deconfou...
Authors:
Siu-Kui Au, Zi-Jun Cao
Abstract:
Engineering risk is concerned with the likelihood of failure and the
scenarios when it occurs. The sensitivity of failure probability to change in
system parameters is relevant to risk-informed decision making. Computing
sensitivity is at least one level more difficult than the probability itself,
which is already challenged by a large number of input random variables, rare
events and implicit nonlinear `black-box' response. Finite difference with
Monte Carlo probability estimates is spurious, r...
Authors:
Nicolas Ewen, Jairo Diaz-Rodriguez, Kelly Ramsay
Abstract:
Traditional transfer learning typically reuses large pre-trained networks by
freezing some of their weights and adding task-specific layers. While this
approach is computationally efficient, it limits the model's ability to adapt
to domain-specific features and can still lead to overfitting with very limited
data. To address these limitations, we propose Structured Output Regularization
(SOR), a simple yet effective framework that freezes the internal network
structures (e.g., convolutional filt...
📄 Nearly Instance-Optimal Parameter Recovery from Many Trajectories via Hellinger Localization
2025-10-12
Authors:
Eliot Shekhtman, Yichen Zhou, Ingvar Ziemann, Nikolai Matni, Stephen Tu
Abstract:
Learning from temporally-correlated data is a core facet of modern machine
learning. Yet our understanding of sequential learning remains incomplete,
particularly in the multi-trajectory setting where data consists of many
independent realizations of a time-indexed stochastic process. This important
regime both reflects modern training pipelines such as for large foundation
models, and offers the potential for learning without the typical mixing
assumptions made in the single-trajectory case. Ho...
Authors:
John Dunbar, Scott Aaronson
Abstract:
We establish that randomly initialized neural networks, with large width and
a natural choice of hyperparameters, have nearly independent outputs exactly
when their activation function is nonlinear with zero mean under the Gaussian
measure: $\mathbb{E}_{z \sim \mathcal{N}(0,1)}[\sigma(z)]=0$. For example, this
includes ReLU and GeLU with an additive shift, as well as tanh, but not ReLU or
GeLU by themselves. Because of their nearly independent outputs, we propose
neural networks with zero-mean a...
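
The zero-mean condition quoted in this abstract can be checked numerically. The sketch below is not taken from the paper: the helper names (`gaussian_mean`, `relu`, `gelu`) are hypothetical, and it assumes that the "additive shift" means subtracting the activation's own Gaussian mean, which the truncated abstract does not spell out. It relies on two standard facts: $\mathbb{E}[\mathrm{ReLU}(z)] = 1/\sqrt{2\pi}$ and $\mathbb{E}[\mathrm{GeLU}(z)] = 1/(2\sqrt{\pi})$ for $z \sim \mathcal{N}(0,1)$, while tanh is odd and therefore already has zero mean.

```python
import math


def gaussian_mean(f, lo=-10.0, hi=10.0, n=40_000):
    """Approximate E[f(z)] for z ~ N(0, 1) with the midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * h
        total += f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def relu(z):
    return max(z, 0.0)


def gelu(z):
    # Exact GeLU: z * Phi(z), with Phi the standard normal CDF.
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


# Closed-form Gaussian means of the unshifted activations.
RELU_MEAN = 1.0 / math.sqrt(2.0 * math.pi)
GELU_MEAN = 1.0 / (2.0 * math.sqrt(math.pi))

if __name__ == "__main__":
    # Assumed shift: subtract the mean so the activation is centered.
    activations = {
        "relu": relu,
        "relu (shifted)": lambda z: relu(z) - RELU_MEAN,
        "gelu": gelu,
        "gelu (shifted)": lambda z: gelu(z) - GELU_MEAN,
        "tanh": math.tanh,
    }
    for name, f in activations.items():
        print(f"{name:>15}: E[sigma(z)] ~ {gaussian_mean(f):+.6f}")
```

Under these assumptions, the shifted ReLU, the shifted GeLU, and tanh satisfy the condition up to integration error, while plain ReLU and GeLU do not.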
Authors:
Penghao Yu, Haotian Jiang, Zeyu Bao, Ruoxi Yu, Qianxiao Li
Abstract:
Transformer has become the dominant architecture for sequence modeling, yet a
detailed understanding of how its structural parameters influence expressive
power remains limited. In this work, we study the approximation properties of
transformers, with particular emphasis on the role of the number of attention
heads. Our analysis begins with the introduction of a generalized $D$-retrieval
task, which we prove to be dense in the space of continuous functions, thereby
providing the basis for our th...
Authors:
Alex Kipnis, Marcel Binz, Eric Schulz
Abstract:
Hierarchical data with multiple observations per group is ubiquitous in
empirical sciences and is often analyzed using mixed-effects regression. In
such models, Bayesian inference gives an estimate of uncertainty but is
analytically intractable and requires costly approximation using Markov Chain
Monte Carlo (MCMC) methods. Neural posterior estimation shifts the bulk of
computation from inference time to pre-training time, amortizing over simulated
datasets with known ground truth targets. We pr...