📊 Статистика дайджестов
Всего дайджестов: 34022 Добавлено сегодня: 0
Последнее обновление: сегодня
📄 Frugality in second-order optimization: floating-point approximations for Newton's method
2025-11-25Авторы:
Giuseppe Carrino, Elena Loli Piccolomini, Elisa Riccietti, Theo Mary
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Minimizing loss functions is central to machine-learning training. Although first-order methods dominate practical applications, higher-order techniques such as Newton's method can deliver greater accuracy and faster convergence, yet are often avoided due to their computational cost. This work analyzes the impact of finite-precision arithmetic on Newton steps and establishes a convergence theorem for mixed-precision Newton optimizers, including "quasi" and "inexact" variants. The theorem provide...
Авторы:
Fares Fourati, Mohamed-Slim Alouini, Vaneet Aggarwal
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
We propose ECPv2, a scalable and theoretically grounded algorithm for global optimization of Lipschitz-continuous functions with unknown Lipschitz constants. Building on the Every Call is Precious (ECP) framework, which ensures that each accepted function evaluation is potentially informative, ECPv2 addresses key limitations of ECP, including high computational cost and overly conservative early behavior. ECPv2 introduces three innovations: (i) an adaptive lower bound to avoid vacuous acceptance...
Авторы:
Abdelouahed Ben Mhamed, Assia Kamal-Idrissi, Amal El Fallah Seghrouchni
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Branch-and-Bound (B\&B) is the dominant exact solution method for Mixed Integer Linear Programs (MILP), yet its exponential time complexity poses significant challenges for large-scale instances. The growing capabilities of machine learning have spurred efforts to improve B\&B by learning data-driven branching policies. However, most existing approaches rely on Imitation Learning (IL), which tends to overfit to expert demonstrations and struggles to generalize to structurally diverse or unseen i...
Авторы:
Matteo Francobaldi, Michele Lombardi, Andrea Lodi
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Artificial Intelligence systems are increasingly deployed in settings where ensuring robustness, fairness, or domain-specific properties is essential for regulation compliance and alignment with human values. However, especially on Neural Networks, property enforcement is very challenging, and existing methods are limited to specific constraints or local properties (defined around datapoints), or fail to provide full guarantees. We tackle these limitations by extending SMiLE, a recently proposed...
Авторы:
Yu Huang, Zixin Wen, Aarti Singh, Yuejie Chi, Yuxin Chen
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
The ability to reason lies at the core of artificial intelligence (AI), and challenging problems usually call for deeper and longer reasoning to tackle. A crucial question about AI reasoning is whether models can extrapolate learned reasoning patterns to solve harder tasks with longer chain-of-thought (CoT). In this work, we present a theoretical analysis of transformers learning on synthetic state-tracking tasks with gradient descent. We mathematically prove how the algebraic structure of state...
Авторы:
Ipsita Ghosh, Ethan Nguyen, Christian Kümmerle
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Parameter-efficient training, based on low-rank optimization, has become a
highly successful tool for fine-tuning large deep-learning models. However,
these methods fail at low-rank pre-training tasks where maintaining the
low-rank structure and the objective remains a challenging task. We propose the
Quadratic Reweighted Rank Regularizer dubbed Q3R, which leads to a novel
low-rank inducing training strategy inspired by the iteratively reweighted
least squares (IRLS) framework. Q3R is based on a...
Авторы:
Fengxu Li, Stephanie M. Carpenter, Matthew P. Buman, Yonatan Mintz
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
A common challenge for decision makers is selecting actions whose rewards are
unknown and evolve over time based on prior policies. For instance, repeated
use may reduce an action's effectiveness (habituation), while inactivity may
restore it (recovery). These nonstationarities are captured by the Reducing or
Gaining Unknown Efficacy (ROGUE) bandit framework, which models real-world
settings such as behavioral health interventions. While existing algorithms can
compute sublinear regret policies ...
📄 Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime
2025-11-04Авторы:
Beomhan Baek, Minhak Song, Chulhee Yun
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Adam [Kingma and Ba, 2015] is the de facto optimizer in deep learning, yet
its theoretical understanding remains limited. Prior analyses show that Adam
favors solutions aligned with $\ell_\infty$-geometry, but these results are
restricted to the full-batch regime. In this work, we study the implicit bias
of incremental Adam (using one sample per step) for logistic regression on
linearly separable data, and we show that its bias can deviate from the
full-batch behavior. To illustrate this, we con...
Авторы:
Tong Zhao, Jiacheng Li, Yuanchang Zhou, Guangming Tan, Weile Jia
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Finding lower and better-generalizing minima is crucial for deep learning.
However, most existing optimizers stop searching the parameter space once they
reach a local minimum. Given the complex geometric properties of the loss
landscape, it is difficult to guarantee that such a point is the lowest or
provides the best generalization. To address this, we propose an adaptor "E"
for gradient-based optimizers. The adapted optimizer tends to continue
exploring along landscape valleys (areas with low...
📄 Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime
2025-11-01Авторы:
Beomhan Baek, Minhak Song, Chulhee Yun
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Adam [Kingma and Ba, 2015] is the de facto optimizer in deep learning, yet
its theoretical understanding remains limited. Prior analyses show that Adam
favors solutions aligned with $\ell_\infty$-geometry, but these results are
restricted to the full-batch regime. In this work, we study the implicit bias
of incremental Adam (using one sample per step) for logistic regression on
linearly separable data, and we show that its bias can deviate from the
full-batch behavior. To illustrate this, we con...
Показано 1 -
10
из 34 записей