📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 Diagonalizing the Softmax: Hadamard Initialization for Tractable Cross-Entropy Dynamics

2025-12-05

Авторы:

Connall Garrod, Jonathan P. Keating, Christos Thrampoulidis

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Cross-entropy (CE) training loss dominates deep learning practice, yet existing theory often relies on simplifications, either replacing it with squared loss or restricting to convex models, that miss essential behavior. CE and squared loss generate fundamentally different dynamics, and convex linear models cannot capture the complexities of non-convex optimization. We provide an in-depth characterization of multi-class CE optimization dynamics beyond the convex regime by analyzing a canonical t...

ID: 2512.04006v1 cs.LG, math.OC, stat.ML

arXiv PDF

📄 When do spectral gradient updates help in deep learning?

2025-12-05

Авторы:

Damek Davis, Dmitriy Drusvyatskiy

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Spectral gradient methods, such as the recently popularized Muon optimizer, are a promising alternative to standard Euclidean gradient descent for training deep neural networks and transformers, but it is still unclear in which regimes they are expected to perform better. We propose a simple layerwise condition that predicts when a spectral update yields a larger decrease in the loss than a Euclidean gradient step. This condition compares, for each parameter block, the squared nuclear-to-Frobeni...

ID: 2512.04299v1 cs.LG, math.OC, stat.ML

arXiv PDF

📄 Lower Complexity Bounds for Nonconvex-Strongly-Convex Bilevel Optimization with First-Order Oracles

2025-11-27

Авторы:

Kaiyi Ji

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Although upper bound guarantees for bilevel optimization have been widely studied, progress on lower bounds has been limited due to the complexity of the bilevel structure. In this work, we focus on the smooth nonconvex-strongly-convex setting and develop new hard instances that yield nontrivial lower bounds under deterministic and stochastic first-order oracle models. In the deterministic case, we prove that any first-order zero-respecting algorithm requires at least $Ω(κ^{3/2}ε^{-2})$ oracle c...

ID: 2511.19656v2 cs.LG, math.OC, stat.ML

arXiv PDF

📄 Adaptivity and Universality: Problem-dependent Universal Regret for Online Convex Optimization

2025-11-27

Авторы:

Peng Zhao, Yu-Hu Yan, Hang Yu, Zhi-Hua Zhou

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Universal online learning aims to achieve optimal regret guarantees without requiring prior knowledge of the curvature of online functions. Existing methods have established minimax-optimal regret bounds for universal online learning, where a single algorithm can simultaneously attain $\mathcal{O}(\sqrt{T})$ regret for convex functions, $\mathcal{O}(d \log T)$ for exp-concave functions, and $\mathcal{O}(\log T)$ for strongly convex functions, where $T$ is the number of rounds and $d$ is the dime...

ID: 2511.19937v1 cs.LG, math.OC, stat.ML

arXiv PDF

📄 A Best-of-Both-Worlds Proof for Tsallis-INF without Fenchel Conjugates

2025-11-18

Авторы:

Wei-Cheng Lee, Francesco Orabona

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

In this short note, we present a simple derivation of the best-of-both-world guarantee for the Tsallis-INF multi-armed bandit algorithm from J. Zimmert and Y. Seldin. Tsallis-INF: An optimal algorithm for stochastic and adversarial bandits. Journal of Machine Learning Research, 22(28):1-49, 2021. URL https://jmlr.csail.mit.edu/papers/volume22/19-753/19-753.pdf. In particular, the proof uses modern tools from online convex optimization and avoid the use of conjugate functions. Also, we do not opt...

ID: 2511.11211v1 cs.LG, math.OC, stat.ML

arXiv PDF

📄 Non-Asymptotic Optimization and Generalization Bounds for Stochastic Gauss-Newton in Overparameterized Models

2025-11-08

Авторы:

Semih Cayci

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

An important question in deep learning is how higher-order optimization methods affect generalization. In this work, we analyze a stochastic Gauss-Newton (SGN) method with Levenberg-Marquardt damping and mini-batch sampling for training overparameterized deep neural networks with smooth activations in a regression setting. Our theoretical contributions are twofold. First, we establish finite-time convergence bounds via a variable-metric analysis in parameter space, with explicit dependencies on ...

ID: 2511.03972v1 cs.LG, math.OC, stat.ML

arXiv PDF

📄 Variance-Aware Feel-Good Thompson Sampling for Contextual Bandits

2025-11-06

Авторы:

Xuheng Li, Quanquan Gu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Variance-dependent regret bounds have received increasing attention in recent studies on contextual bandits. However, most of these studies are focused on upper confidence bound (UCB)-based bandit algorithms, while sampling based bandit algorithms such as Thompson sampling are still understudied. The only exception is the LinVDTS algorithm (Xu et al., 2023), which is limited to linear reward function and its regret bound is not optimal with respect to the model dimension. In this paper, we prese...

ID: 2511.02123v1 cs.LG, math.OC, stat.ML

arXiv PDF

📄 ConMeZO: Adaptive Descent-Direction Sampling for Gradient-Free Finetuning of Large Language Models

2025-11-06

Авторы:

Lejs Deen Behric, Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Zeroth-order or derivative-free optimization (MeZO) is an attractive strategy for finetuning large language models (LLMs) because it eliminates the memory overhead of backpropagation. However, it converges slowly due to the inherent curse of dimensionality when searching for descent directions in the high-dimensional parameter space of billion-scale LLMs. We propose ConMeZO, a novel zeroth-order optimizer that accelerates convergence by adaptive directional sampling. Instead of drawing the direc...

ID: 2511.02757v1 cs.LG, math.OC, stat.ML

arXiv PDF

📄 Rethinking PCA Through Duality

2025-10-23

Авторы:

Jan Quan, Johan Suykens, Panagiotis Patrinos

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Motivated by the recently shown connection between self-attention and (kernel) principal component analysis (PCA), we revisit the fundamentals of PCA. Using the difference-of-convex (DC) framework, we present several novel formulations and provide new theoretical insights. In particular, we show the kernelizability and out-of-sample applicability for a PCA-like family of problems. Moreover, we uncover that simultaneous iteration, which is connected to the classical QR algorithm, is an instance o...

ID: 2510.18130v1 cs.LG, math.OC, stat.ML

arXiv PDF

📄 Enhancing Fractional Gradient Descent with Learned Optimizers

2025-10-23

Авторы:

Jan Sobotka, Petr Šimánek, Pavel Kordík

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Fractional Gradient Descent (FGD) offers a novel and promising way to accelerate optimization by incorporating fractional calculus into machine learning. Although FGD has shown encouraging initial results across various optimization tasks, it faces significant challenges with convergence behavior and hyperparameter selection. Moreover, the impact of its hyperparameters is not fully understood, and scheduling them is particularly difficult in non-convex settings such as neural network training. T...

ID: 2510.18783v1 cs.LG, math.OC, stat.ML

arXiv PDF

Показано 1 - 10 из 34 записей