📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 0

Последнее обновление: сегодня

📄 HLPD: Aligning LLMs to Human Language Preference for Machine-Revised Text Detection

2025-11-15

Авторы:

Fangqi Dai, Xingjian Jiang, Zizhuang Deng

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

To prevent misinformation and social issues arising from trustworthy-looking content generated by LLMs, it is crucial to develop efficient and reliable methods for identifying the source of texts. Previous approaches have demonstrated exceptional performance in detecting texts fully generated by LLMs. However, these methods struggle when confronting more advanced LLM output or text with adversarial multi-task machine revision, especially in the black-box setting, where the generating model is un...

ID: 2511.06942v2 cs.CL, cs.CR

arXiv PDF

📄 EnchTable: Unified Safety Alignment Transfer in Fine-tuned Large Language Models

2025-11-15

Авторы:

Jialin Wu, Kecen Li, Zhicong Huang, Xinfeng Li, Xiaofeng Wang, Cheng Hong

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Many machine learning models are fine-tuned from large language models (LLMs) to achieve high performance in specialized domains like code generation, biomedical analysis, and mathematical problem solving. However, this fine-tuning process often introduces a critical vulnerability: the systematic degradation of safety alignment, undermining ethical guidelines and increasing the risk of harmful outputs. Addressing this challenge, we introduce EnchTable, a novel framework designed to transfer and ...

ID: 2511.09880v1 cs.CL, cs.CR

arXiv PDF

📄 AdaptDel: Adaptable Deletion Rate Randomized Smoothing for Certified Robustness

2025-11-15

Авторы:

Zhuoqun Huang, Neil G. Marchant, Olga Ohrimenko, Benjamin I. P. Rubinstein

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We consider the problem of certified robustness for sequence classification against edit distance perturbations. Naturally occurring inputs of varying lengths (e.g., sentences in natural language processing tasks) present a challenge to current methods that employ fixed-rate deletion mechanisms and lead to suboptimal performance. To this end, we introduce AdaptDel methods with adaptable deletion rates that dynamically adjust based on input properties. We extend the theoretical framework of rando...

ID: 2511.09316v1 cs.CL, cs.CR, cs.LG

arXiv PDF

📄 How Different Tokenization Algorithms Impact LLMs and Transformer Models for Binary Code Analysis

2025-11-08

Авторы:

Ahmed Mostafa, Raisul Arefin Nahid, Samuel Mulder

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Tokenization is fundamental in assembly code analysis, impacting intrinsic characteristics like vocabulary size, semantic coverage, and extrinsic performance in downstream tasks. Despite its significance, tokenization in the context of assembly code remains an underexplored area. This study aims to address this gap by evaluating the intrinsic properties of Natural Language Processing (NLP) tokenization models and parameter choices, such as vocabulary size. We explore preprocessing customization ...

ID: 2511.03825v1 cs.AI, cs.CL, cs.CR, cs.LG

arXiv PDF

📄 "Give a Positive Review Only": An Early Investigation Into In-Paper Prompt Injection Attacks and Defenses for AI Reviewers

2025-11-06

Авторы:

Qin Zhou, Zhexin Zhang, Zhi Li, Limin Sun

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

With the rapid advancement of AI models, their deployment across diverse tasks has become increasingly widespread. A notable emerging application is leveraging AI models to assist in reviewing scientific papers. However, recent reports have revealed that some papers contain hidden, injected prompts designed to manipulate AI reviewers into providing overly favorable evaluations. In this work, we present an early systematic investigation into this emerging threat. We propose two classes of attacks...

ID: 2511.01287v1 cs.CL, cs.CR

arXiv PDF

📄 Optimal Detection for Language Watermarks with Pseudorandom Collision

2025-10-29

Авторы:

T. Tony Cai, Xiang Li, Qi Long, Weijie J. Su, Garrett G. Wen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Text watermarking plays a crucial role in ensuring the traceability and accountability of large language model (LLM) outputs and mitigating misuse. While promising, most existing methods assume perfect pseudorandomness. In practice, repetition in generated text induces collisions that create structured dependence, compromising Type I error control and invalidating standard analyses. We introduce a statistical framework that captures this structure through a hierarchical two-layer partition. At...

ID: 2510.22007v1 cs.LG, cs.CL, cs.CR, math.ST, stat.ML, stat.TH

arXiv PDF

📄 Power to the Clients: Federated Learning in a Dictatorship Setting

2025-10-29

Авторы:

Mohammadsajad Alipour, Mohammad Mohammadi Amiri

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Federated learning (FL) has emerged as a promising paradigm for decentralized model training, enabling multiple clients to collaboratively learn a shared model without exchanging their local data. However, the decentralized nature of FL also introduces vulnerabilities, as malicious clients can compromise or manipulate the training process. In this work, we introduce dictator clients, a novel, well-defined, and analytically tractable class of malicious participants capable of entirely erasing the...

ID: 2510.22149v1 cs.LG, cs.AI, cs.CL, cs.CR, cs.CV, cs.DC

arXiv PDF

📄 A Neuro-Symbolic Multi-Agent Approach to Legal-Cybersecurity Knowledge Integration

2025-10-29

Авторы:

Chiara Bonfanti, Alessandro Druetto, Cataldo Basile, Tharindu Ranasinghe, Marcos Zampieri

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The growing intersection of cybersecurity and law creates a complex information space where traditional legal research tools struggle to deal with nuanced connections between cases, statutes, and technical vulnerabilities. This knowledge divide hinders collaboration between legal experts and cybersecurity professionals. To address this important gap, this work provides a first step towards intelligent systems capable of navigating the increasingly intricate cyber-legal domain. We demonstrate pro...

ID: 2510.23443v1 cs.AI, cs.CL, cs.CR, cs.MA

arXiv PDF

📄 LLMs can hide text in other text of the same length

2025-10-27

Авторы:

Antonio Norelli, Michael Bronstein

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

A meaningful text can be hidden inside another, completely different yet still coherent and plausible, text of the same length. For example, a tweet containing a harsh political critique could be embedded in a tweet that celebrates the same political leader, or an ordinary product review could conceal a secret manuscript. This uncanny state of affairs is now possible thanks to Large Language Models, and in this paper we present a simple and efficient protocol to achieve it. We show that even mod...

ID: 2510.20075v2 cs.AI, cs.CL, cs.CR, cs.LG

arXiv PDF

📄 LLMs can hide text in other text of the same length.ipynb

2025-10-25

Авторы:

Antonio Norelli, Michael Bronstein

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

ID: 2510.20075v1 cs.AI, cs.CL, cs.CR, cs.LG

arXiv PDF

Показано 11 - 20 из 60 записей