📊 Статистика дайджестов
Всего дайджестов: 34022 Добавлено сегодня: 82
Последнее обновление: сегодня
Авторы:
Saeed Najafi, Alona Fyshe
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Aligning Large Language Models (LLMs) with human preferences is crucial, but
standard methods like Reinforcement Learning from Human Feedback (RLHF) are
often complex and unstable. In this work, we propose a new, simpler approach
that recasts alignment through the lens of Maximum Marginal Likelihood (MML)
estimation. Our new MML based Preference Optimization (MMPO) maximizes the
marginal log-likelihood of a preferred text output, using the preference pair
as samples for approximation, and forgoe...
Авторы:
Zidong Liu, Zhuoyan Xu, Zhenmei Shi, Yingyu Liang
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Composing basic skills from simple tasks to accomplish composite tasks is
crucial for modern intelligent systems. We investigate the in-context
composition ability of language models to perform composite tasks that combine
basic skills demonstrated in in-context examples. This is more challenging than
the standard setting, where skills and their composition can be learned in
training. We conduct systematic experiments on various representative
open-source language models, utilizing linguistic an...
Авторы:
Di Zhang, Xun Wu, Shaohan Huang, Yaru Hao, Li Dong, Zewen Chi, Zhifang Sui, Furu Wei
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Recent advances in reinforcement learning (RL) have substantially improved
the training of large-scale language models, leading to significant gains in
generation quality and reasoning ability. However, most existing research
focuses on dense models, while RL training for Mixture-of-Experts (MoE)
architectures remains underexplored. To address the instability commonly
observed in MoE training, we propose a novel router-aware approach to optimize
importance sampling (IS) weights in off-policy RL....
Авторы:
Siddharth Sahay, Radhika Agarwal
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
This paper presents an end-to-end multilingual translation pipeline that
integrates a custom U-Net for text detection, the Tesseract engine for text
recognition, and a from-scratch sequence-to-sequence (Seq2Seq) Transformer for
Neural Machine Translation (NMT). Our approach first utilizes a U-Net model,
trained on a synthetic dataset , to accurately segment and detect text regions
from an image. These detected regions are then processed by Tesseract to
extract the source text. This extracted tex...
📄 Leverage Unlearning to Sanitize LLMs
2025-10-28Авторы:
Antoine Boutet, Lucas Magnana
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Pre-trained large language models (LLMs) are becoming useful for various
tasks. To improve their performance on certain tasks, it is necessary to
fine-tune them on specific data corpora (e.g., medical reports, business data).
These specialized data corpora may contain sensitive data (e.g., personal or
confidential data) that will be memorized by the model and likely to be
regurgitated during its subsequent use. This memorization of sensitive
information by the model poses a significant privacy o...
Авторы:
Zihao Fu, Ryan Brown, Shun Shao, Kai Rawal, Eoin Delaney, Chris Russell
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Text-to-image diffusion models, such as Stable Diffusion, have demonstrated
remarkable capabilities in generating high-quality and diverse images from
natural language prompts. However, recent studies reveal that these models
often replicate and amplify societal biases, particularly along demographic
attributes like gender and race. In this paper, we introduce FairImagen
(https://github.com/fuzihaofzh/FairImagen), a post-hoc debiasing framework that
operates on prompt embeddings to mitigate such...
Авторы:
Dian Yu, Yulai Zhao, Kishan Panaganti, Linfeng Song, Haitao Mi, Dong Yu
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
We propose Reinforcement Learning with Explicit Human Values (RLEV), a method
that aligns Large Language Model (LLM) optimization directly with quantifiable
human value signals. While Reinforcement Learning with Verifiable Rewards
(RLVR) effectively trains models in objective domains using binary correctness
rewards, it overlooks that not all tasks are equally significant. RLEV extends
this framework by incorporating human-defined value signals directly into the
reward function. Using exam-style...
Авторы:
Ziqian Zhong, Aditi Raghunathan, Nicholas Carlini
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
The tendency to find and exploit "shortcuts" to complete tasks poses
significant risks for reliable assessment and deployment of large language
models (LLMs). For example, an LLM agent with access to unit tests may delete
failing tests rather than fix the underlying bug. Such behavior undermines both
the validity of benchmark results and the reliability of real-world LLM coding
assistant deployments.
To quantify, study, and mitigate such behavior, we introduce ImpossibleBench,
a benchmark fram...
📄 BadGraph: A Backdoor Attack Against Latent Diffusion Model for Text-Guided Graph Generation
2025-10-25Авторы:
Liang Ye, Shengqin Chen, Jiazhu Dai
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
The rapid progress of graph generation has raised new security concerns,
particularly regarding backdoor vulnerabilities. While prior work has explored
backdoor attacks in image diffusion and unconditional graph generation,
conditional, especially text-guided graph generation remains largely
unexamined. This paper proposes BadGraph, a backdoor attack method targeting
latent diffusion models for text-guided graph generation. BadGraph leverages
textual triggers to poison training data, covertly im...
📄 LLM Unlearning with LLM Beliefs
2025-10-24Авторы:
Kemou Li, Qizhou Wang, Yue Wang, Fengpeng Li, Jun Liu, Bo Han, Jiantao Zhou
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Large language models trained on vast corpora inherently risk memorizing
sensitive or harmful content, which may later resurface in their outputs.
Prevailing unlearning methods generally rely on gradient ascent and its
variants to lower the probability of specific target responses. However, we
find that this strategy induces a critical side effect: probability mass is
redistributed into high-likelihood regions, often corresponding to semantically
related rephrasings of the targets. We refer to t...
Показано 61 -
70
из 233 записей