📊 Статистика дайджестов

Всего дайджестов: 35039 Добавлено сегодня: 432

Последнее обновление: сегодня

📄 Orion-Bix: Bi-Axial Attention for Tabular In-Context Learning

2025-12-02

Авторы:

Mohamed Bouadi, Pratinav Seth, Aditya Tanna, Vinay Kumar Sankarapu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Tabular data drive most real-world machine learning applications, yet building general-purpose models for them remains difficult. Mixed numeric and categorical fields, weak feature structure, and limited labeled data make scaling and generalization challenging. To this end, we introduce Orion-Bix, a tabular foundation model that combines biaxial attention with meta-learned in-context reasoning for few-shot tabular learning. Its encoder alternates standard, grouped, hierarchical, and relational a...

ID: 2512.00181v1 cs.LG, cs.AI, stat.ML

arXiv PDF

📄 An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines

2025-12-02

Авторы:

Jianhai Su, Jinzhu Luo, Qi Zhang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We take the novel perspective of incorporating offline RL algorithms as subroutines of tabula rasa online RL. This is feasible because an online learning agent can repurpose its historical interactions as offline dataset. We formalize this idea into a framework that accommodates several variants of offline RL incorporation such as final policy recommendation and online fine-tuning. We further introduce convenient techniques to improve its effectiveness in enhancing online learning efficiency. Ou...

ID: 2512.00383v1 cs.LG, cs.AI, stat.ML

arXiv PDF

📄 ESPO: Entropy Importance Sampling Policy Optimization

2025-12-02

Авторы:

Yuepeng Sheng, Yuwei Huang, Shuman Liu, Haibo Zhang, Anxiang Zeng

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large language model (LLM) reinforcement learning has increasingly relied on group-based policy optimization frameworks, such as GRPO and GSPO, to achieve stable fine-tuning at scale. However, a fundamental trade-off persists between optimization granularity and training stability. While GSPO improves robustness via sequence-level optimization, its monolithic treatment of sequences introduces severe inefficiencies: its conservative clipping mechanism indiscriminately discards valid training samp...

ID: 2512.00499v1 cs.LG, cs.AI, stat.ML

arXiv PDF

📄 List Replicable Reinforcement Learning

2025-12-02

Авторы:

Bohan Zhang, Michael Chen, A. Pavan, N. V. Vinodchandran, Lin F. Yang, Ruosong Wang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Replicability is a fundamental challenge in reinforcement learning (RL), as RL algorithms are empirically observed to be unstable and sensitive to variations in training conditions. To formally address this issue, we study \emph{list replicability} in the Probably Approximately Correct (PAC) RL framework, where an algorithm must return a near-optimal policy that lies in a \emph{small list} of policies across different runs, with high probability. The size of this list defines the \emph{list comp...

ID: 2512.00553v1 cs.LG, cs.AI, stat.ML

arXiv PDF

📄 Solving Diffusion Inverse Problems with Restart Posterior Sampling

2025-11-27

Авторы:

Bilal Ahmed, Joseph G. Makin

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Inverse problems are fundamental to science and engineering, where the goal is to infer an underlying signal or state from incomplete or noisy measurements. Recent approaches employ diffusion models as powerful implicit priors for such problems, owing to their ability to capture complex data distributions. However, existing diffusion-based methods for inverse problems often rely on strong approximations of the posterior distribution, require computationally expensive gradient backpropagation thr...

ID: 2511.20705v1 cs.LG, cs.AI, stat.ML

arXiv PDF

📄 Selecting Belief-State Approximations in Simulators with Latent States

2025-11-27

Авторы:

Nan Jiang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

State resetting is a fundamental but often overlooked capability of simulators. It supports sample-based planning by allowing resets to previously encountered simulation states, and enables calibration of simulators using real data by resetting to states observed in real-system traces. While often taken for granted, state resetting in complex simulators can be nontrivial: when the simulator comes with latent variables (states), state resetting requires sampling from the posterior over the latent...

ID: 2511.20870v1 cs.LG, cs.AI, stat.ML

arXiv PDF

📄 Breaking the Safety-Capability Tradeoff: Reinforcement Learning with Verifiable Rewards Maintains Safety Guardrails in LLMs

2025-11-27

Авторы:

Dongkyu Derek Cho, Huan Song, Arijit Ghosh Chowdhury, Haotian An, Yawei Wang, Rohit Thekkanal, Negin Sokhandan, Sharlina Keshava, Hannah Marlowe

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Fine-tuning large language models (LLMs) for downstream tasks typically exhibit a fundamental safety-capability tradeoff, where improving task performance degrades safety alignment even on benign datasets. This degradation persists across standard approaches including supervised finetuning (SFT) and reinforcement learning from human feedback (RLHF). While reinforcement learning with verifiable rewards (RLVR) has emerged as a promising alternative that optimizes models on objectively measurable t...

ID: 2511.21050v1 cs.LG, cs.AI, stat.ML

arXiv PDF

📄 Statistically-Guided Dual-Domain Meta-Learning with Adaptive Multi-Prototype Aggregation for Distributed Fiber Optic Sensing

2025-11-26

Авторы:

Yifan He, Haodong Zhang, Qiuheng Song, Lin Lei, Zhenxuan Zeng, Haoyang He, Hongyan Wu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Distributed Fiber Optic Sensing (DFOS) has shown strong potential in perimeter security due to its capability of monitoring vibration events across long distances with fine spatial resolution. However, practical DFOS systems face three critical challenges: (1) signal patterns of the same activity vary drastically under different fiber deployment types (e.g., underground, wall-mounted), causing domain shift; (2) labeled data in new deployment scenarios is often scarce or entirely unavailable, lim...

ID: 2511.17902v1 cs.LG, cs.AI, stat.ML

arXiv PDF

📄 Local Entropy Search over Descent Sequences for Bayesian Optimization

2025-11-26

Авторы:

David Stenger, Armin Lindicke, Alexander von Rohr, Sebastian Trimpe

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Searching large and complex design spaces for a global optimum can be infeasible and unnecessary. A practical alternative is to iteratively refine the neighborhood of an initial design using local optimization methods such as gradient descent. We propose local entropy search (LES), a Bayesian optimization paradigm that explicitly targets the solutions reachable by the descent sequences of iterative optimizers. The algorithm propagates the posterior belief over the objective through the optimizer...

ID: 2511.19241v1 cs.LG, cs.AI, stat.ML

arXiv PDF

📄 Predicting partially observable dynamical systems via diffusion models with a multiscale inference scheme

2025-11-26

Авторы:

Rudy Morel, Francesco Pio Ramunno, Jeff Shen, Alberto Bietti, Kyunghyun Cho, Miles Cranmer, Siavash Golkar, Olexandr Gugnin, Geraud Krawezik, Tanya Marwah, Michael McCabe, Lucas Meyer, Payel Mukhopadhyay, Ruben Ohana, Liam Parker, Helen Qu, François Rozet, K. D. Leka, François Lanusse, David Fouhey, Shirley Ho

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Conditional diffusion models provide a natural framework for probabilistic prediction of dynamical systems and have been successfully applied to fluid dynamics and weather prediction. However, in many settings, the available information at a given time represents only a small fraction of what is needed to predict future states, either due to measurement uncertainty or because only a small fraction of the state can be observed. This is true for example in solar physics, where we can observe the S...

ID: 2511.19390v1 cs.LG, astro-ph.SR, cs.AI, stat.ML

arXiv PDF

Показано 11 - 20 из 124 записей