📊 Статистика дайджестов
Всего дайджестов: 34022 Добавлено сегодня: 0
Последнее обновление: сегодня
Авторы:
Xinwen Zhang, Hongchang Gao
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
The recently introduced optimizer, Muon, has gained increasing attention due
to its superior performance across a wide range of applications. However, its
effectiveness in federated learning remains unexplored. To address this gap,
this paper investigates the performance of Muon in the federated learning
setting. Specifically, we propose a new algorithm, FedMuon, and establish its
convergence rate for nonconvex problems. Our theoretical analysis reveals
multiple favorable properties of FedMuon. ...
📄 Probing Geometry of Next Token Prediction Using Cumulant Expansion of the Softmax Entropy
2025-10-08Авторы:
Karthik Viswanathan, Sang Eon Park
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
We introduce a cumulant-expansion framework for quantifying how large
language models (LLMs) internalize higher-order statistical structure during
next-token prediction. By treating the softmax entropy of each layer's logit
distribution as a perturbation around its "center" distribution, we derive
closed-form cumulant observables that isolate successively higher-order
correlations. Empirically, we track these cumulants in GPT-2 and Pythia models
on Pile-10K prompts. (i) Structured prompts exhibi...
📄 Arithmetic-Mean $μ$P for Modern Architectures: A Unified Learning-Rate Scale for CNNs and ResNets
2025-10-08Авторы:
Haosong Zhang, Shenxi Wu, Yichi Zhang, Wei Lin
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Choosing an appropriate learning rate remains a key challenge in scaling
depth of modern deep networks. The classical maximal update parameterization
($\mu$P) enforces a fixed per-layer update magnitude, which is well suited to
homogeneous multilayer perceptrons (MLPs) but becomes ill-posed in
heterogeneous architectures where residual accumulation and convolutions
introduce imbalance across layers. We introduce Arithmetic-Mean $\mu$P
(AM-$\mu$P), which constrains not each individual layer but t...
Авторы:
Kaosar Uddin
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
We present spd-metrics-id, a Python package for computing distances and
divergences between symmetric positive-definite (SPD) matrices. Unlike
traditional toolkits that focus on specific applications, spd-metrics-id
provides a unified, extensible, and reproducible framework for SPD distance
computation. The package supports a wide variety of geometry-aware metrics,
including Alpha-z Bures-Wasserstein, Alpha-Procrustes, affine-invariant
Riemannian, log-Euclidean, and others, and is accessible bot...
📄 Domain Generalization: A Tale of Two ERMs
2025-10-08Авторы:
Yilun Zhu, Naihao Deng, Naichen Shi, Aditya Gangrade, Clayton Scott
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Domain generalization (DG) is the problem of generalizing from several
distributions (or domains), for which labeled training data are available, to a
new test domain for which no labeled data is available. A common finding in the
DG literature is that it is difficult to outperform empirical risk minimization
(ERM) on the pooled training data.
In this work, we argue that this finding has primarily been reported for
datasets satisfying a \emph{covariate shift} assumption. When the dataset
satis...
📄 Graph-based Tabular Deep Learning Should Learn Feature Interactions, Not Just Make Predictions
2025-10-08Авторы:
Elias Dubbeldam, Reza Mohammadi, Marit Schoonhoven, S. Ilker Birbil
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Despite recent progress, deep learning methods for tabular data still
struggle to compete with traditional tree-based models. A key challenge lies in
modeling complex, dataset-specific feature interactions that are central to
tabular data. Graph-based tabular deep learning (GTDL) methods aim to address
this by representing features and their interactions as graphs. However,
existing methods predominantly optimize predictive accuracy, neglecting
accurate modeling of the graph structure. This posi...
Авторы:
Kaito Takanami, Takashi Takahashi, Yoshiyuki Kabashima
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
In-context learning (ICL) is a key building block of modern large language
models, yet its theoretical mechanisms remain poorly understood. It is
particularly mysterious how ICL operates in real-world applications where tasks
have a common structure. In this work, we address this problem by analyzing a
linear attention model trained on low-rank regression tasks. Within this
setting, we precisely characterize the distribution of predictions and the
generalization error in the high-dimensional lim...
📄 Closed-Form Last Layer Optimization
2025-10-08Авторы:
Alexandre Galashov, Nathaël Da Costa, Liyuan Xu, Philipp Hennig, Arthur Gretton
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Neural networks are typically optimized with variants of stochastic gradient
descent. Under a squared loss, however, the optimal solution to the linear last
layer weights is known in closed-form. We propose to leverage this during
optimization, treating the last layer as a function of the backbone parameters,
and optimizing solely for these parameters. We show this is equivalent to
alternating between gradient descent steps on the backbone and closed-form
updates on the last layer. We adapt the ...
Авторы:
Adel Javanmard, Baharan Mirzasoleiman, Vahab Mirrokni
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Test-time scaling improves the reasoning capabilities of large language
models (LLMs) by allocating extra compute to generate longer Chains-of-Thoughts
(CoTs). This enables models to tackle more complex problem by breaking them
down into additional steps, backtracking, and correcting mistakes. Despite its
strong performance--demonstrated by OpenAI's o1 and DeepSeek R1, the conditions
in the training data under which long CoTs emerge, and when such long CoTs
improve the performance, remain unclea...
📄 The Hidden Game Problem
2025-10-08Авторы:
Gon Buzaglo, Noah Golowich, Elad Hazan
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
This paper investigates a class of games with large strategy spaces,
motivated by challenges in AI alignment and language games. We introduce the
hidden game problem, where for each player, an unknown subset of strategies
consistently yields higher rewards compared to the rest. The central question
is whether efficient regret minimization algorithms can be designed to discover
and exploit such hidden structures, leading to equilibrium in these subgames
while maintaining rationality in general. W...
Показано 231 -
240
из 385 записей