📊 Статистика дайджестов
Всего дайджестов: 34022 Добавлено сегодня: 0
Последнее обновление: сегодня
Авторы:
Markus Krimmel, Philip Hartout, Karsten Borgwardt, Dexiong Chen
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Existing methods for evaluating graph generative models primarily rely on
Maximum Mean Discrepancy (MMD) metrics based on graph descriptors. While these
metrics can rank generative models, they do not provide an absolute measure of
performance. Their values are also highly sensitive to extrinsic parameters,
namely kernel and descriptor parametrization, making them incomparable across
different graph descriptors. We introduce PolyGraph Discrepancy (PGD), a new
evaluation framework that addresses ...
Авторы:
Zhao Song, Shenghao Xie, Samson Zhou
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
This paper studies the computational challenges of large-scale
attention-based models in artificial intelligence by utilizing importance
sampling methods in the streaming setting. Inspired by the classical definition
of the $\ell_2$ sampler and the recent progress of the attention scheme in
Large Language Models (LLMs), we propose the definition of the attention
sampler. Our approach significantly reduces the computational burden of
traditional attention mechanisms. We analyze the effectiveness ...
📄 Group Policy Gradient
2025-10-08Авторы:
Junhua Chen, Zixi Zhang, Hantao Zhong, Rika Antonova
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
We introduce Group Policy Gradient (GPG), a family of critic-free
policy-gradient estimators for general MDPs. Inspired by the success of GRPO's
approach in Reinforcement Learning from Human Feedback (RLHF), GPG replaces a
learned value function with a group-based Monte Carlo advantage estimator,
removing the memory, compute, and hyperparameter costs of training a critic
while preserving PPO's clipped-objective structure. We prove the consistency of
the GPG estimator, analyze the bias-variance t...
Авторы:
Ali Azizpour, Reza Ramezanpour, Ashutosh Sabharwal, Santiago Segarra
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Real-world graph datasets often consist of mixtures of populations, where
graphs are generated from multiple distinct underlying distributions. However,
modern representation learning approaches, such as graph contrastive learning
(GCL) and augmentation methods like Mixup, typically overlook this mixture
structure. In this work, we propose a unified framework that explicitly models
data as a mixture of underlying probabilistic graph generative models
represented by graphons. To characterize thes...
Авторы:
Qianxin Yi, Shao-Bo Lin, Jun Fan, Yao Wang
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Reinforcement learning (RL) has been widely applied to sequential decision
making, where interpretability and performance are both critical for practical
adoption. Current approaches typically focus on performance and rely on post
hoc explanations to account for interpretability. Different from these
approaches, we focus on designing an interpretability-oriented yet
performance-enhanced RL approach. Specifically, we propose a spectral based
linear RL method that extends the ridge regression-base...
📄 Allocation of Parameters in Transformers
2025-10-08Авторы:
Ruoxi Yu, Haotian Jiang, Jingpu Cheng, Penghao Yu, Qianxiao Li, Zhong Li
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Transformers have achieved remarkable successes across a wide range of
applications, yet the theoretical foundation of their model efficiency remains
underexplored. In this work, we investigate how the model parameters -- mainly
attention heads and head dimensions -- should be allocated across layers to
balance expressivity and efficiency. We first provide mathematical analysis on
the role of early layers in information extraction from an approximation
perspective, with a theoretical characteriz...
📄 Robust Batched Bandits
2025-10-08Авторы:
Yunwen Guo, Yunlun Shu, Gongyi Zhuo, Tianyu Wang
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
The batched multi-armed bandit (MAB) problem, in which rewards are collected
in batches, is crucial for applications such as clinical trials. Existing
research predominantly assumes light-tailed reward distributions, yet many
real-world scenarios, including clinical outcomes, exhibit heavy-tailed
characteristics. This paper bridges this gap by proposing robust batched bandit
algorithms designed for heavy-tailed rewards, within both finite-arm and
Lipschitz-continuous settings. We reveal a surpri...
Авторы:
Philipp Becker, Niklas Freymuth, Serge Thilges, Fabian Otto, Gerhard Neumann
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
On-policy Reinforcement Learning (RL) with PPO-like clip objectives has
become the standard choice for reward-based fine-tuning of large language
models (LLMs). Although recent work has explored improved estimators of
advantages and normalization, the clipping mechanism itself has remained
untouched. Originally introduced as a proxy for principled KL-based trust
regions, clipping is a crude approximation that often causes unstable updates
and suboptimal performance. We replace the clip objective...
📄 Technical note on Sequential Test-Time Adaptation via Martingale-Driven Fisher Prompting
2025-10-08Авторы:
Behraj Khan, Tahir Qasim Syed
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
We present a theoretical framework for M-FISHER, a method for sequential
distribution shift detection and stable adaptation in streaming data. For
detection, we construct an exponential martingale from non-conformity scores
and apply Ville's inequality to obtain time-uniform guarantees on false alarm
control, ensuring statistical validity at any stopping time. Under sustained
shifts, we further bound the expected detection delay as
$\mathcal{O}(\log(1/\delta)/\Gamma)$, where $\Gamma$ reflects th...
Авторы:
Behraj Khan, Tahir Qasim Syed
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
When training data are fragmented across batches or federated-learned across
different geographic locations, trained models manifest performance
degradation. That degradation partly owes to covariate shift induced by data
having been fragmented across time and space and producing dissimilar empirical
training distributions. Each fragment's distribution is slightly different to a
hypothetical unfragmented training distribution of covariates, and to the
single validation distribution. To address t...
Показано 221 -
230
из 385 записей