📊 Digest statistics
Total digests: 34022 Added today: 82
Last updated: today
Authors:
Lingcheng Kong, Jiateng Wei, Hanzhang Shen, Huan Wang
Russian summary not found
Available fields: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
GPU kernel generation by LLMs has recently experienced rapid development,
leveraging test-time scaling and reinforcement learning techniques. However, a
key challenge for kernel generation is the scarcity of high-quality data, as
most high-quality kernels are proprietary and not open-source. This challenge
prevents us from leveraging supervised fine-tuning to align LLMs to the kernel
generation task. To address this challenge, we develop a pipeline that
generates and curates high-quality CUDA ke...
📄 Decoding Partial Differential Equations: Cross-Modal Adaptation of Decoder-only Models to PDEs
2025-10-09
Authors:
Paloma García-de-Herreros, Philipp Slusallek, Dietrich Klakow, Vagrant Gautam
Annotation:
Large language models have shown great success on natural language tasks in
recent years, but they have also shown great promise when adapted to new
modalities, e.g., for scientific machine learning tasks. Even though
decoder-only models are more popular within NLP and scale exceedingly well at
generating natural language, most proposed approaches for cross-modal
adaptation focus on encoder-only models, raising the question of how model
architecture affects these approaches. In this paper, we th...
Authors:
Zichong Li, Liming Liu, Chen Liang, Weizhu Chen, Tuo Zhao
Annotation:
The choice of optimizer significantly impacts the training efficiency and
computational costs of large language models (LLMs). Recently, the Muon
optimizer has demonstrated promising results by orthogonalizing parameter
updates, improving optimization geometry through better conditioning. Despite
Muon's emergence as a candidate successor to Adam, the potential for jointly
leveraging their strengths has not been systematically explored. In this work,
we bridge this gap by proposing NorMuon (Neuro...
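The orthogonalization of parameter updates mentioned in this abstract can be illustrated with a minimal sketch. It uses the classic cubic Newton-Schulz iteration, which is an assumption chosen here for illustration and not necessarily the iteration used by Muon or NorMuon. The helper names (`matmul`, `transpose`, `orthogonalize`) and the step count are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def orthogonalize(update, steps=30):
    """Approximate the nearest orthogonal matrix to a full-rank `update`.

    Illustrative assumption: cubic Newton-Schulz, X <- 1.5*X - 0.5*X*X^T*X.
    """
    norm = math.sqrt(sum(v * v for row in update for v in row))
    if norm == 0.0:
        return [list(row) for row in update]  # nothing to orthogonalize
    # Scaling by the Frobenius norm puts every singular value in (0, 1],
    # which lies inside the convergence region of the iteration.
    x = [[v / norm for v in row] for row in update]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxt_x[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x

q = orthogonalize([[3.0, 1.0], [0.5, 2.0]])
qqt = matmul(q, transpose(q))  # should be close to the identity matrix
```

Each step pushes every singular value of the matrix toward 1 while leaving the singular vectors unchanged, so the result keeps the directions of the update but discards its scale.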
Authors:
Xueyan Li, Guinan Su, Mrinmaya Sachan, Jonas Geiping
Annotation:
Large Language Models (LLMs) are increasingly applied to complex tasks that
require extended reasoning. In such settings, models often benefit from diverse
chains-of-thought to arrive at multiple candidate solutions. This requires two
competing objectives: to inject enough stochasticity to explore multiple
reasoning chains, and to ensure sufficient accuracy and quality in each path.
Existing works pursue the first objective by increasing exploration at highly
uncertain steps with higher temperat...
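The heuristic this abstract attributes to existing work, raising the temperature at highly uncertain steps, might look like the sketch below. Measuring uncertainty by the entropy of the next-token distribution, the function names, and the values of `low`, `high`, and `threshold` are all assumptions made for illustration, not details taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities at the given temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def adaptive_temperature(logits, low=0.7, high=1.3, threshold=1.0):
    """Return `high` when the temperature-1 distribution is uncertain, else `low`."""
    return high if entropy(softmax(logits)) > threshold else low

confident_step = [10.0, 0.0, 0.0, 0.0]   # one token dominates
uncertain_step = [0.0, 0.0, 0.0, 0.0]    # uniform over four tokens
t_confident = adaptive_temperature(confident_step)
t_uncertain = adaptive_temperature(uncertain_step)
```

A sampler would then draw the next token from `softmax(logits, temperature)` with the chosen temperature, exploring more only where the model is unsure.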
Authors:
Nyal Patel, Matthieu Bou, Arjun Jagota, Satyapriya Krishna, Sonali Parbhoo
Annotation:
Reinforcement Learning from Human Feedback (RLHF) aligns Large Language
Models (LLMs) with human preferences, yet the underlying reward signals they
internalize remain hidden, posing a critical challenge for interpretability and
safety. Existing approaches attempt to extract these latent incentives using
Inverse Reinforcement Learning (IRL), but treat all preference pairs equally,
often overlooking the most informative signals: those examples the extracted
reward model misclassifies or assigns n...
Authors:
Matthieu Bou, Nyal Patel, Arjun Jagota, Satyapriya Krishna, Sonali Parbhoo
Annotation:
The objectives that Large Language Models (LLMs) implicitly optimize remain
dangerously opaque, making trustworthy alignment and auditing a grand
challenge. While Inverse Reinforcement Learning (IRL) can infer reward
functions from behaviour, existing approaches either produce a single,
overconfident reward estimate or fail to address the fundamental ambiguity of
the task (non-identifiability). This paper introduces a principled auditing
framework that re-frames reward inference from a simple es...
Authors:
Prateek Humane, Paolo Cudrano, Daniel Z. Kaplan, Matteo Matteucci, Supriyo Chakraborty, Irina Rish
Annotation:
Fine-tuning large language models (LLMs) on chain-of-thought (CoT) data shows
that a small amount of high-quality data can outperform massive datasets. Yet,
what constitutes "quality" remains ill-defined. Existing reasoning methods rely
on indirect heuristics such as problem difficulty or trace length, while
instruction-tuning has explored a broader range of automated selection
strategies, but rarely in the context of reasoning. We propose to define
reasoning data quality using influence functio...
📄 Studying the Korean Word-Chain Game with RLVR: Mitigating Reward Conflicts via Curriculum Learning
2025-10-08
Authors:
Donghwan Rho
Annotation:
Reinforcement learning with verifiable rewards (RLVR) is a promising approach
for training large language models (LLMs) with stronger reasoning abilities. It
has also been applied to a variety of logic puzzles. In this work, we study the
Korean word-chain game using RLVR. We show that rule-derived rewards can
naturally conflict, and demonstrate through experiments that a
curriculum-learning scheme mitigates these conflicts. Our findings motivate
further studies of puzzle tasks in diverse languag...
Authors:
Jairo Diaz-Rodriguez, Mumin Jia
Annotation:
Kernel change-point detection (KCPD) has become a widely used tool for
identifying structural changes in complex data. While existing theory
establishes consistency under independence assumptions, real-world sequential
data such as text exhibits strong dependencies. We establish new guarantees for
KCPD under $m$-dependent data: specifically, we prove consistency in the number
of detected change points and weak consistency in their locations under mild
additional assumptions. We perform an LLM-ba...
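For readers unfamiliar with kernel change-point detection, the sketch below shows the standard kernel least-squares cost on a toy one-dimensional sequence. It is a simplified illustration limited to a single change point and an RBF kernel; the function names, the `gamma` value, and the data are assumptions, and the paper's own procedure for selecting the number of change points is not reproduced here.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-gamma * (x - y) ** 2)

def segment_cost(xs, a, b, kernel=rbf):
    """Within-segment scatter of xs[a:b] in the kernel feature space:
    sum_i k(x_i, x_i) - (1 / (b - a)) * sum_{i,j} k(x_i, x_j)."""
    seg = xs[a:b]
    diag = sum(kernel(x, x) for x in seg)
    gram_sum = sum(kernel(x, y) for x in seg for y in seg)
    return diag - gram_sum / len(seg)

def best_single_change_point(xs, kernel=rbf):
    """Return the split index t that minimizes cost(0, t) + cost(t, n)."""
    n = len(xs)
    return min(range(1, n),
               key=lambda t: segment_cost(xs, 0, t, kernel)
               + segment_cost(xs, t, n, kernel))

# Toy sequence whose mean shifts from about 0 to about 5 at index 4.
xs = [0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 5.05]
t_hat = best_single_change_point(xs)
```

With several change points, the same segment cost is typically minimized over all segmentations together with a penalty on the number of segments.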
Authors:
Fatmazohra Rezkellah, Ramzi Dakhmouche
Annotation:
With the increasing adoption of Large Language Models (LLMs), more
customization is needed to ensure privacy-preserving and safe generation. We
address this objective from two critical aspects: unlearning of sensitive
information and robustness to jail-breaking attacks. We investigate various
constrained optimization formulations that address both aspects in a
unified manner, by finding the smallest possible interventions on LLM
weights that either make a given vocabulary set unreachable ...
Showing 101-110 of 233 entries