📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 0

Последнее обновление: сегодня

📄 PolyGraph Discrepancy: a classifier-based metric for graph generation

2025-10-09

Авторы:

Markus Krimmel, Philip Hartout, Karsten Borgwardt, Dexiong Chen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Existing methods for evaluating graph generative models primarily rely on Maximum Mean Discrepancy (MMD) metrics based on graph descriptors. While these metrics can rank generative models, they do not provide an absolute measure of performance. Their values are also highly sensitive to extrinsic parameters, namely kernel and descriptor parametrization, making them incomparable across different graph descriptors. We introduce PolyGraph Discrepancy (PGD), a new evaluation framework that addresses ...

ID: 2510.06122v1 cs.LG, stat.ML

arXiv PDF

📄 Towards Sampling Data Structures for Tensor Products in Turnstile Streams

2025-10-08

Авторы:

Zhao Song, Shenghao Xie, Samson Zhou

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

This paper studies the computational challenges of large-scale attention-based models in artificial intelligence by utilizing importance sampling methods in the streaming setting. Inspired by the classical definition of the $\ell_2$ sampler and the recent progress of the attention scheme in Large Language Models (LLMs), we propose the definition of the attention sampler. Our approach significantly reduces the computational burden of traditional attention mechanisms. We analyze the effectiveness ...

ID: 2510.03678v1 cs.LG, stat.ML

arXiv PDF

📄 Group Policy Gradient

2025-10-08

Авторы:

Junhua Chen, Zixi Zhang, Hantao Zhong, Rika Antonova

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We introduce Group Policy Gradient (GPG), a family of critic-free policy-gradient estimators for general MDPs. Inspired by the success of GRPO's approach in Reinforcement Learning from Human Feedback (RLHF), GPG replaces a learned value function with a group-based Monte Carlo advantage estimator, removing the memory, compute, and hyperparameter costs of training a critic while preserving PPO's clipped-objective structure. We prove the consistency of the GPG estimator, analyze the bias-variance t...

ID: 2510.03679v1 cs.LG, stat.ML

arXiv PDF

📄 From Moments to Models: Graphon Mixture-Aware Mixup and Contrastive Learning

2025-10-08

Авторы:

Ali Azizpour, Reza Ramezanpour, Ashutosh Sabharwal, Santiago Segarra

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Real-world graph datasets often consist of mixtures of populations, where graphs are generated from multiple distinct underlying distributions. However, modern representation learning approaches, such as graph contrastive learning (GCL) and augmentation methods like Mixup, typically overlook this mixture structure. In this work, we propose a unified framework that explicitly models data as a mixture of underlying probabilistic graph generative models represented by graphons. To characterize thes...

ID: 2510.03690v1 cs.LG, stat.ML

arXiv PDF

📄 Balancing Interpretability and Performance in Reinforcement Learning: An Adaptive Spectral Based Linear Approach

2025-10-08

Авторы:

Qianxin Yi, Shao-Bo Lin, Jun Fan, Yao Wang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Reinforcement learning (RL) has been widely applied to sequential decision making, where interpretability and performance are both critical for practical adoption. Current approaches typically focus on performance and rely on post hoc explanations to account for interpretability. Different from these approaches, we focus on designing an interpretability-oriented yet performance-enhanced RL approach. Specifically, we propose a spectral based linear RL method that extends the ridge regression-base...

ID: 2510.03722v1 cs.LG, stat.ML

arXiv PDF

📄 Allocation of Parameters in Transformers

2025-10-08

Авторы:

Ruoxi Yu, Haotian Jiang, Jingpu Cheng, Penghao Yu, Qianxiao Li, Zhong Li

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Transformers have achieved remarkable successes across a wide range of applications, yet the theoretical foundation of their model efficiency remains underexplored. In this work, we investigate how the model parameters -- mainly attention heads and head dimensions -- should be allocated across layers to balance expressivity and efficiency. We first provide mathematical analysis on the role of early layers in information extraction from an approximation perspective, with a theoretical characteriz...

ID: 2510.03784v1 cs.LG, stat.ML

arXiv PDF

📄 Robust Batched Bandits

2025-10-08

Авторы:

Yunwen Guo, Yunlun Shu, Gongyi Zhuo, Tianyu Wang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The batched multi-armed bandit (MAB) problem, in which rewards are collected in batches, is crucial for applications such as clinical trials. Existing research predominantly assumes light-tailed reward distributions, yet many real-world scenarios, including clinical outcomes, exhibit heavy-tailed characteristics. This paper bridges this gap by proposing robust batched bandit algorithms designed for heavy-tailed rewards, within both finite-arm and Lipschitz-continuous settings. We reveal a surpri...

ID: 2510.03798v1 cs.LG, stat.ML

arXiv PDF

📄 TROLL: Trust Regions improve Reinforcement Learning for Large Language Models

2025-10-08

Авторы:

Philipp Becker, Niklas Freymuth, Serge Thilges, Fabian Otto, Gerhard Neumann

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

On-policy Reinforcement Learning (RL) with PPO-like clip objectives has become the standard choice for reward-based fine-tuning of large language models (LLMs). Although recent work has explored improved estimators of advantages and normalization, the clipping mechanism itself has remained untouched. Originally introduced as a proxy for principled KL-based trust regions, clipping is a crude approximation that often causes unstable updates and suboptimal performance. We replace the clip objective...

ID: 2510.03817v1 cs.LG, stat.ML

arXiv PDF

📄 Technical note on Sequential Test-Time Adaptation via Martingale-Driven Fisher Prompting

2025-10-08

Авторы:

Behraj Khan, Tahir Qasim Syed

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We present a theoretical framework for M-FISHER, a method for sequential distribution shift detection and stable adaptation in streaming data. For detection, we construct an exponential martingale from non-conformity scores and apply Ville's inequality to obtain time-uniform guarantees on false alarm control, ensuring statistical validity at any stopping time. Under sustained shifts, we further bound the expected detection delay as $\mathcal{O}(\log(1/\delta)/\Gamma)$, where $\Gamma$ reflects th...

ID: 2510.03839v1 cs.LG, stat.ML

arXiv PDF

📄 Technical note on Fisher Information for Robust Federated Cross-Validation

2025-10-08

Авторы:

Behraj Khan, Tahir Qasim Syed

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

When training data are fragmented across batches or federated-learned across different geographic locations, trained models manifest performance degradation. That degradation partly owes to covariate shift induced by data having been fragmented across time and space and producing dissimilar empirical training distributions. Each fragment's distribution is slightly different to a hypothetical unfragmented training distribution of covariates, and to the single validation distribution. To address t...

ID: 2510.03838v1 cs.LG, stat.ML

arXiv PDF

Показано 221 - 230 из 385 записей