📊 Digest statistics
Total digests: 34022. Added today: 0
Last updated: today
Authors:
Tomas Hrycej, Bernhard Bermeitinger, Massimo Pavone, Götz-Henrik Wiegand, Siegfried Handschuh
Annotation:
The key task of machine learning is to minimize the loss function that
measures the model fit to the training data. The numerical methods to do this
efficiently depend on the properties of the loss function. The most decisive
among these properties is the convexity or non-convexity of the loss function.
The fact that the loss function can have, and frequently has, non-convex
regions has led to a widespread commitment to non-convex methods such as Adam.
However, a local minimum implies that, in s...
Authors:
Mostafa Ameli, Van Anh Le, Sulthana Shams, Alexander Skabardonis
Annotation:
The traffic assignment problem is essential for traffic flow analysis,
traditionally solved using mathematical programs under the Equilibrium
principle. These methods become computationally prohibitive for large-scale
networks due to non-linear growth in complexity with the number of OD pairs.
This study introduces a novel data-driven approach using deep neural networks,
specifically leveraging the Transformer architecture, to predict equilibrium
path flows directly. By focusing on path-level tr...
📄 On the Optimal Construction of Unbiased Gradient Estimators for Zeroth-Order Optimization
2025-10-25
Authors:
Shaocong Ma, Heng Huang
Annotation:
Zeroth-order optimization (ZOO) is an important framework for stochastic
optimization when gradients are unavailable or expensive to compute. A
potential limitation of existing ZOO methods is the bias inherent in most
gradient estimators unless the perturbation stepsize vanishes. In this paper,
we overcome this biasedness issue by proposing a novel family of unbiased
gradient estimators based solely on function evaluations. By reformulating
directional derivatives as a telescoping series and sam...
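
The bias mentioned in this abstract can be seen in a minimal sketch. It uses an ordinary forward-difference estimate, not the unbiased estimators the paper proposes (the abstract is truncated before describing them). The test function `cubic` and the helper name `forward_difference` are illustrative assumptions.

```python
def forward_difference(f, x, mu):
    """Slope estimate from two function evaluations with perturbation stepsize mu."""
    return (f(x + mu) - f(x)) / mu


def cubic(x):
    # Hypothetical test function; its true derivative is 3 * x**2.
    return x ** 3


x0 = 1.0
true_slope = 3 * x0 ** 2

# For this function at x0 = 1 the bias equals 3*mu + mu**2,
# so it only disappears as the stepsize mu goes to zero.
for mu in (0.5, 0.1, 0.01):
    estimate = forward_difference(cubic, x0, mu)
    print(f"mu={mu}: estimate={estimate:.4f}, bias={estimate - true_slope:.4f}")
```

The estimate is exact only in the limit of a vanishing stepsize, which is the limitation the abstract points to.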
📄 Robust Reinforcement Learning in Finance: Modeling Market Impact with Elliptic Uncertainty Sets
2025-10-25
Authors:
Shaocong Ma, Heng Huang
Annotation:
In financial applications, reinforcement learning (RL) agents are commonly
trained on historical data, where their actions do not influence prices.
However, during deployment, these agents trade in live markets where their own
transactions can shift asset prices, a phenomenon known as market impact. This
mismatch between training and deployment environments can significantly degrade
performance. Traditional robust RL approaches address this model
misspecification by optimizing the worst-case per...
Authors:
Shaocong Ma, Heng Huang
Annotation:
In this paper, we explore the two-point zeroth-order gradient estimator and
identify the distribution of random perturbations that minimizes the
estimator's asymptotic variance as the perturbation stepsize tends to zero. We
formulate it as a constrained functional optimization problem over the space of
perturbation distributions. Our findings reveal that such desired perturbations
can align directionally with the true gradient, instead of maintaining a fixed
length. While existing research has l...
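
For orientation, a two-point zeroth-order gradient estimator can be sketched as follows. This is a generic version that assumes standard Gaussian perturbations, not the variance-minimizing distribution the paper derives. The objective `half_squared_norm`, the sample count, and the seed are illustrative choices.

```python
import random


def two_point_estimate(f, x, u, mu):
    """Central two-point estimate: slope of f along u, scaled back onto u."""
    x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
    x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
    slope = (f(x_plus) - f(x_minus)) / (2 * mu)
    return [slope * ui for ui in u]


def averaged_estimate(f, x, mu=1e-3, samples=20000, seed=0):
    """Average many single-perturbation estimates to reduce their variance."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(samples):
        # Assumption: perturbation directions are drawn from a standard Gaussian.
        u = [rng.gauss(0.0, 1.0) for _ in x]
        g = two_point_estimate(f, x, u, mu)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / samples for t in total]


def half_squared_norm(x):
    # Hypothetical objective; its true gradient at x is x itself.
    return 0.5 * sum(xi * xi for xi in x)


x0 = [1.0, -2.0, 0.5]
estimate = averaged_estimate(half_squared_norm, x0)
print(estimate)  # should lie close to x0, up to Monte Carlo noise
```

Each single estimate is noisy, and the noise depends on how the perturbation `u` is distributed. That distribution is the design choice this abstract studies.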
Authors:
Nadir Farhi
Annotation:
In this work, we address the problem of determining reliable policies in
reinforcement learning (RL), with a focus on optimization under uncertainty and
the need for performance guarantees. While classical RL algorithms aim at
maximizing the expected return, many real-world applications - such as routing,
resource allocation, or sequential decision-making under risk - require
strategies that ensure not only high average performance but also a guaranteed
probability of success. To this end, we pr...
📄 Unbiased Gradient Low-Rank Projection
2025-10-22
Authors:
Rui Pan, Yang Luo, Yuxing Liu, Yang You, Tong Zhang
Annotation:
Memory-efficient optimization is critical for training increasingly large
language models (LLMs). A popular strategy involves gradient low-rank
projection, storing only the projected optimizer states, with GaLore being a
representative example. However, a significant drawback of many such methods is
their lack of convergence guarantees, as various low-rank projection approaches
introduce inherent biases relative to the original optimization algorithms,
which contribute to performance gaps compar...
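
A toy sketch of gradient low-rank projection, assuming a hand-picked rank-1 orthonormal basis. It is neither GaLore nor the paper's unbiased method, and real methods choose the basis differently. The helper `project_low_rank` and the example matrices are hypothetical; the point is only that the reconstructed gradient differs from the full one.

```python
def project_low_rank(grad, basis):
    """Project grad (m x n) onto the span of the orthonormal columns of basis (m x r).

    Returns the compact r x n matrix (what a memory-efficient optimizer would
    keep state for) and its m x n reconstruction.
    """
    m, n, r = len(grad), len(grad[0]), len(basis[0])
    compact = [[sum(basis[i][k] * grad[i][j] for i in range(m)) for j in range(n)]
               for k in range(r)]
    restored = [[sum(basis[i][k] * compact[k][j] for k in range(r)) for j in range(n)]
                for i in range(m)]
    return compact, restored


grad = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
s = 2 ** -0.5
basis = [[s], [s], [0.0]]  # assumed basis: one unit-norm column, so rank r = 1

compact, restored = project_low_rank(grad, basis)

# Whatever lies outside the span of the basis is lost; this residual is the
# kind of deviation from the full gradient that the abstract calls bias.
residual = [[g - p for g, p in zip(g_row, p_row)] for g_row, p_row in zip(grad, restored)]
print(compact)
print(residual)
```

Only the 1 x 2 compact matrix would need optimizer state here, compared with 3 x 2 for the full gradient.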
📄 Self-Certifying Primal-Dual Optimization Proxies for Large-Scale Batch Economic Dispatch
2025-10-21
Authors:
Michael Klamkin, Mathieu Tanneau, Pascal Van Hentenryck
Annotation:
Recent research has shown that optimization proxies can be trained to high
fidelity, achieving average optimality gaps under 1% for large-scale problems.
However, worst-case analyses show that there exist in-distribution queries that
result in orders of magnitude higher optimality gap, making it difficult to
trust the predictions in practice. This paper aims at striking a balance
between classical solvers and optimization proxies in order to enable
trustworthy deployments with interpretable spee...
Authors:
Alexandru Meterez, Depen Morwani, Jingfeng Wu, Costin-Andrei Oncescu, Cengiz Pehlevan, Sham Kakade
Annotation:
Increasing the batch size during training -- a "batch ramp" -- is a
promising strategy to accelerate large language model pretraining. While for
SGD, doubling the batch size can be equivalent to halving the learning rate,
the optimal strategy for adaptive optimizers like Adam is less clear. As a
result, any batch-ramp scheduling, if used at all, is typically tuned
heuristically. This work develops a principled framework for batch-size
scheduling and introduces Seesaw: whenever a standard sched...
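
The SGD statement in this abstract can be illustrated with a small sketch. It assumes the common heuristic that SGD noise scales with the ratio of learning rate to batch size. This is not the Seesaw schedule, whose rule is cut off in the abstract; `sgd_noise_scale` and `ramp_instead_of_decay` are hypothetical helper names.

```python
def sgd_noise_scale(learning_rate, batch_size):
    """Heuristic proxy for SGD gradient noise (assumed proportional to lr / batch)."""
    return learning_rate / batch_size


def ramp_instead_of_decay(learning_rate, batch_size, decay_steps):
    """Replace each would-be learning-rate halving with a batch-size doubling."""
    schedule = []
    for _ in range(decay_steps):
        batch_size *= 2
        schedule.append((learning_rate, batch_size))
    return schedule


# Halving the learning rate and doubling the batch size give the same proxy value.
halved_lr = sgd_noise_scale(0.05, 256)
doubled_batch = sgd_noise_scale(0.1, 512)
print(halved_lr, doubled_batch)

print(ramp_instead_of_decay(0.1, 256, decay_steps=3))
```

Under this assumption the two schedules behave alike for SGD. The abstract notes that the right rule for adaptive optimizers such as Adam is less clear.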
Authors:
Jiayuan Sheng, Hanyang Zhao, Haoxian Chen, David D. Yao, Wenpin Tang
Annotation:
Reinforcement Learning from Human Feedback (RLHF) is increasingly used to
fine-tune diffusion models, but a key challenge arises from the mismatch
between stochastic samplers used during training and deterministic samplers
used during inference. In practice, models are fine-tuned using stochastic SDE
samplers to encourage exploration, while inference typically relies on
deterministic ODE samplers for efficiency and stability. This discrepancy
induces a reward gap, raising concerns about whether ...
Showing 11-20 of 34 entries