📊 Статистика дайджестов
Всего дайджестов: 34022 Добавлено сегодня: 82
Последнее обновление: сегодня
Авторы:
Zhichao Wang
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
I propose \textbf{G}roup-relative \textbf{I}mplicit \textbf{F}ine
\textbf{T}uning (GIFT), a novel reinforcement learning framework for aligning
LLMs. Instead of directly maximizing cumulative rewards like PPO or GRPO, GIFT
minimizes the discrepancy between implicit and explicit reward models. It
combines three key ideas: (1) the online multi-response generation and
normalization of GRPO, (2) the implicit reward formulation of DPO, and (3) the
implicit-explicit reward alignment principle of UNA. ...
Авторы:
Xinqi Li, Yiqun Liu, Shan Jiang, Enrong Zheng, Huaijin Zheng, Wenhao Dai, Haodong Deng, Dianhai Yu, Yanjun Ma
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
We introduce GraphNet, a dataset of 2.7K real-world deep learning
computational graphs with rich metadata, spanning six major task categories
across multiple deep learning frameworks. To evaluate tensor compiler
performance on these samples, we propose the benchmark metric Speedup Score
S(t), which jointly considers runtime speedup and execution correctness under
tunable tolerance levels, offering a reliable measure of general optimization
capability. Furthermore, we extend S(t) to the Error-awa...
Авторы:
Armin Gerami, Ramani Duraiswami
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
The original softmax-based attention mechanism (regular attention) in the
extremely successful Transformer architecture computes attention between $N$
tokens, each embedded in a $D$-dimensional head, with a time complexity of
$O(N^2D)$. Given the success of Transformers, improving their runtime during
both training and inference is a popular research area. One such approach is
the introduction of the linear attention (LA) mechanisms, which offers a linear
time complexity of $O(ND^2)$ and have de...
Авторы:
Iskander Azangulov, Teodora Pandeva, Niranjani Prasad, Javier Zazo, Sushrut Karmalkar
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Masked diffusion models (MDMs) offer a compelling alternative to
autoregressive models (ARMs) for discrete text generation because they enable
parallel token sampling, rather than sequential, left-to-right generation. This
means potentially much faster inference. However, effective parallel sampling
faces two competing requirements: (i) simultaneously updated tokens must be
conditionally independent, and (ii) updates should prioritise high-confidence
predictions. These goals conflict because hig...
Авторы:
T. Tony Cai, Xiang Li, Qi Long, Weijie J. Su, Garrett G. Wen
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Text watermarking plays a crucial role in ensuring the traceability and
accountability of large language model (LLM) outputs and mitigating misuse.
While promising, most existing methods assume perfect pseudorandomness. In
practice, repetition in generated text induces collisions that create
structured dependence, compromising Type I error control and invalidating
standard analyses.
We introduce a statistical framework that captures this structure through a
hierarchical two-layer partition. At...
📄 Edit Less, Achieve More: Dynamic Sparse Neuron Masking for Lifelong Knowledge Editing in LLMs
2025-10-29Авторы:
Jinzhe Liu, Junshu Sun, Shufan Shen, Chenxue Yang, Shuhui Wang
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Lifelong knowledge editing enables continuous, precise updates to outdated
knowledge in large language models (LLMs) without computationally expensive
full retraining. However, existing methods often accumulate errors throughout
the editing process, causing a gradual decline in both editing accuracy and
generalization. To tackle this problem, we propose Neuron-Specific Masked
Knowledge Editing (NMKE), a novel fine-grained editing framework that combines
neuron-level attribution with dynamic spar...
📄 The Lossy Horizon: Error-Bounded Predictive Coding for Lossy Text Compression (Episode I)
2025-10-29Авторы:
Nnamdi Aghanya, Jun Li, Kewei Wang
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Large Language Models (LLMs) can achieve near-optimal lossless compression by
acting as powerful probability models. We investigate their use in the lossy
domain, where reconstruction fidelity is traded for higher compression ratios.
This paper introduces Error-Bounded Predictive Coding (EPC), a lossy text codec
that leverages a Masked Language Model (MLM) as a decompressor. Instead of
storing a subset of original tokens, EPC allows the model to predict masked
content and stores minimal, rank-ba...
Авторы:
Jiazheng Li, Andreas Damianou, J Rosser, José Luis Redondo García, Konstantina Palla
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Chain-of-thought (CoT) traces promise transparency for reasoning language
models, but prior work shows they are not always faithful reflections of
internal computation. This raises challenges for oversight: practitioners may
misinterpret decorative reasoning as genuine. We introduce Concept Walk, a
general framework for tracing how a model's internal stance evolves with
respect to a concept direction during reasoning. Unlike surface text, Concept
Walk operates in activation space, projecting eac...
Авторы:
Zirui Pang, Hao Zheng, Zhijie Deng, Ling Li, Zixin Zhong, Jiaheng Wei
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
LLM unlearning has emerged as a promising approach, aiming to enable models
to forget hazardous/undesired knowledge at low cost while preserving as much
model utility as possible. Among existing techniques, the most straightforward
method is performing Gradient Ascent (GA) w.r.t. the forget data, thereby
forcing the model to unlearn the forget dataset. However, GA suffers from
severe instability, as it drives updates in a divergent direction, often
resulting in drastically degraded model utility...
Авторы:
Omar Naim, Krish Sharma, Nicholas Asher
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
In this paper we introduce Tale, Task-Aware Layer Elimination, an
inference-time algorithm that prunes entire transformer layers in an LLM by
directly optimizing task-specific validation performance. We evaluate TALE on 9
tasks and 5 models, including LLaMA 3.1 8B, Qwen 2.5 7B, Qwen 2.5 0.5B, Mistral
7B, and Lucie 7B, under both zero-shot and few-shot settings. Unlike prior
approaches, TALE requires no retraining and consistently improves accuracy
while reducing computational cost across all ben...
Показано 51 -
60
из 233 записей