📊 Digest statistics

Total digests: 34022. Added today: 82.

Last updated: today

📄 Diagonalizing the Softmax: Hadamard Initialization for Tractable Cross-Entropy Dynamics

2025-12-05

Authors:

Connall Garrod, Jonathan P. Keating, Christos Thrampoulidis

Annotation:

Cross-entropy (CE) training loss dominates deep learning practice, yet existing theory often relies on simplifications, either replacing it with squared loss or restricting to convex models, that miss essential behavior. CE and squared loss generate fundamentally different dynamics, and convex linear models cannot capture the complexities of non-convex optimization. We provide an in-depth characterization of multi-class CE optimization dynamics beyond the convex regime by analyzing a canonical t...

ID: 2512.04006v1 cs.LG, math.OC, stat.ML

📄 Convergence for Discrete Parameter Updates

2025-12-05

Authors:

Paul Wilson, Fabio Zanasi, George Constantinides

Annotation:

Modern deep learning models require immense computational resources, motivating research into low-precision training. Quantised training addresses this by representing training components in low-bit integers, but typically relies on discretising real-valued updates. We introduce an alternative approach where the update rule itself is discrete, avoiding the quantisation of continuous updates by design. We establish convergence guarantees for a general class of such discrete schemes, and present a...

ID: 2512.04051v1 cs.LG, math.OC

📄 When do spectral gradient updates help in deep learning?

2025-12-05

Authors:

Damek Davis, Dmitriy Drusvyatskiy

Annotation:

Spectral gradient methods, such as the recently popularized Muon optimizer, are a promising alternative to standard Euclidean gradient descent for training deep neural networks and transformers, but it is still unclear in which regimes they are expected to perform better. We propose a simple layerwise condition that predicts when a spectral update yields a larger decrease in the loss than a Euclidean gradient step. This condition compares, for each parameter block, the squared nuclear-to-Frobeni...

ID: 2512.04299v1 cs.LG, math.OC, stat.ML

📄 The Geometry of Intelligence: Deterministic Functional Topology as a Foundation for Real-World Perception

2025-12-05

Authors:

Eduardo Di Santi

Annotation:

Real-world physical processes do not generate arbitrary variability: their signals concentrate on compact and low-variability subsets of functional space. This geometric structure enables rapid generalization from a few examples in both biological and artificial systems. This work develops a deterministic functional-topological framework in which the set of valid realizations of a physical phenomenon forms a compact perceptual manifold with stable invariants and a finite Hausdorff radius. We s...

ID: 2512.05089v1 cs.LG, math.OC

📄 The Silence that Speaks: Neural Estimation via Communication Gaps

2025-12-04

Authors:

Shubham Aggarwal, Dipankar Maity, Tamer Başar

Annotation:

Accurate remote state estimation is a fundamental component of many autonomous and networked dynamical systems, where multiple decision-making agents interact and communicate over shared, bandwidth-constrained channels. These communication constraints introduce an additional layer of complexity, namely, the decision of when to communicate. This results in a fundamental trade-off between estimation accuracy and communication resource usage. Traditional extensions of classical estimation algorithm...

ID: 2512.01056v1 eess.SY, cs.LG, math.OC

📄 High-dimensional Mean-Field Games by Particle-based Flow Matching

2025-12-04

Authors:

Jiajia Yu, Junghwan Lee, Yao Xie, Xiuyuan Cheng

Annotation:

Mean-field games (MFGs) study the Nash equilibrium of systems with a continuum of interacting agents, which can be formulated as the fixed-point of optimal control problems. They provide a unified framework for a variety of applications, including optimal transport (OT) and generative models. Despite their broad applicability, solving high-dimensional MFGs remains a significant challenge due to fundamental computational and analytical obstacles. In this work, we propose a particle-based deep Flo...

ID: 2512.01172v1 stat.ML, cs.LG, math.OC

📄 Beyond Scaffold: A Unified Spatio-Temporal Gradient Tracking Method

2025-12-04

Authors:

Yan Huang, Jinming Xu, Jiming Chen, Karl Henrik Johansson

Annotation:

In distributed and federated learning algorithms, communication overhead is often reduced by performing multiple local updates between communication rounds. However, due to data heterogeneity across nodes and the local gradient noise within each node, this strategy can lead to the drift of local models away from the global optimum. To address this issue, we revisit the well-known federated learning method Scaffold (Karimireddy et al., 2020) under a gradient tracking perspective, and propose a un...

ID: 2512.01732v1 cs.LG, math.OC

📄 Verifying Closed-Loop Contractivity of Learning-Based Controllers via Partitioning

2025-12-04

Authors:

Alexander Davydov

Annotation:

We address the problem of verifying closed-loop contraction in nonlinear control systems whose controller and contraction metric are both parameterized by neural networks. By leveraging interval analysis and interval bound propagation, we derive a tractable and scalable sufficient condition for closed-loop contractivity that reduces to checking that the dominant eigenvalue of a symmetric Metzler matrix is nonpositive. We combine this sufficient condition with a domain partitioning strategy to in...

ID: 2512.02262v1 eess.SY, cs.LG, math.OC

📄 Risk-Sensitive Q-Learning in Continuous Time with Application to Dynamic Portfolio Selection

2025-12-04

Authors:

Chuhan Xie

Annotation:

This paper studies the problem of risk-sensitive reinforcement learning (RSRL) in continuous time, where the environment is characterized by a controllable stochastic differential equation (SDE) and the objective is a potentially nonlinear functional of cumulative rewards. We prove that when the functional is an optimized certainty equivalent (OCE), the optimal policy is Markovian with respect to an augmented environment. We also propose CT-RS-q, a risk-sensitive q-learning algorithm ba...

ID: 2512.02386v1 cs.LG, math.OC

📄 Generative modeling using evolved quantum Boltzmann machines

2025-12-04

Authors:

Mark M. Wilde

Annotation:

Born-rule generative modeling, a central task in quantum machine learning, seeks to learn probability distributions that can be efficiently sampled by measuring complex quantum states. One hope is for quantum models to efficiently capture probability distributions that are difficult to learn and simulate by classical means alone. Quantum Boltzmann machines were proposed about one decade ago for this purpose, yet efficient training methods have remained elusive. In this paper, I overcome this obs...

ID: 2512.02721v1 quant-ph, cs.LG, math.OC

Showing 1 - 10 of 157 entries