📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning

2025-12-03

Авторы:

Boyu Zhu, Xiaofei Wen, Wenjie Jacky Mo, Tinghui Zhu, Yanan Xie, Peng Qi, Muhao Chen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Omni-modal Large Language Models (OLLMs) that process text, images, videos, and audio introduce new challenges for safety and value guardrails in human-AI interaction. Prior guardrail research largely targets unimodal settings and typically frames safeguarding as binary classification, which limits robustness across diverse modalities and tasks. To address this gap, we propose OmniGuard, the first family of omni-modal guardrails that performs safeguarding across all modalities with deliberate re...

ID: 2512.02306v1 cs.AI, cs.CL, cs.CR, cs.CV, cs.LG

arXiv PDF

📄 Scaling Patterns in Adversarial Alignment: Evidence from Multi-LLM Jailbreak Experiments

2025-11-19

Авторы:

Samuel Nathanson, Rebecca Williams, Cynthia Matuszek

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large language models (LLMs) increasingly operate in multi-agent and safety-critical settings, raising open questions about how their vulnerabilities scale when models interact adversarially. This study examines whether larger models can systematically jailbreak smaller ones - eliciting harmful or restricted behavior despite alignment safeguards. Using standardized adversarial tasks from JailbreakBench, we simulate over 6,000 multi-turn attacker-target exchanges across major LLM families and sca...

ID: 2511.13788v1 cs.LG, cs.AI, cs.CL, cs.CR, cs.MA

arXiv PDF

📄 Forgetting-MarI: LLM Unlearning via Marginal Information Regularization

2025-11-18

Авторы:

Shizhou Xu, Yuan Ni, Stefan Broecker, Thomas Strohmer

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

As AI models are trained on ever-expanding datasets, the ability to remove the influence of specific data from trained models has become essential for privacy protection and regulatory compliance. Unlearning addresses this challenge by selectively removing parametric knowledge from the trained models without retraining from scratch, which is critical for resource-intensive models such as Large Language Models (LLMs). Existing unlearning methods often degrade model performance by removing more in...

ID: 2511.11914v1 cs.AI, cs.CL, cs.CR, cs.IT, cs.LG

arXiv PDF

📄 SQuaD: The Software Quality Dataset

2025-11-17

Авторы:

Mikel Robredo, Matteo Esposito, Davide Taibi, Rafael Peñaloza, Valentina Lenarduzzi

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Software quality research increasingly relies on large-scale datasets that measure both the product and process aspects of software systems. However, existing resources often focus on limited dimensions, such as code smells, technical debt, or refactoring activity, thereby restricting comprehensive analyses across time and quality dimensions. To address this gap, we present the Software Quality Dataset (SQuaD), a multi-dimensional, time-aware collection of software quality metrics extracted from...

ID: 2511.11265v1 cs.SE, cs.AI, cs.CL, cs.CR, cs.IR

arXiv PDF

📄 How Different Tokenization Algorithms Impact LLMs and Transformer Models for Binary Code Analysis

2025-11-08

Авторы:

Ahmed Mostafa, Raisul Arefin Nahid, Samuel Mulder

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Tokenization is fundamental in assembly code analysis, impacting intrinsic characteristics like vocabulary size, semantic coverage, and extrinsic performance in downstream tasks. Despite its significance, tokenization in the context of assembly code remains an underexplored area. This study aims to address this gap by evaluating the intrinsic properties of Natural Language Processing (NLP) tokenization models and parameter choices, such as vocabulary size. We explore preprocessing customization ...

ID: 2511.03825v1 cs.AI, cs.CL, cs.CR, cs.LG

arXiv PDF

📄 Power to the Clients: Federated Learning in a Dictatorship Setting

2025-10-29

Авторы:

Mohammadsajad Alipour, Mohammad Mohammadi Amiri

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Federated learning (FL) has emerged as a promising paradigm for decentralized model training, enabling multiple clients to collaboratively learn a shared model without exchanging their local data. However, the decentralized nature of FL also introduces vulnerabilities, as malicious clients can compromise or manipulate the training process. In this work, we introduce dictator clients, a novel, well-defined, and analytically tractable class of malicious participants capable of entirely erasing the...

ID: 2510.22149v1 cs.LG, cs.AI, cs.CL, cs.CR, cs.CV, cs.DC

arXiv PDF

📄 A Neuro-Symbolic Multi-Agent Approach to Legal-Cybersecurity Knowledge Integration

2025-10-29

Авторы:

Chiara Bonfanti, Alessandro Druetto, Cataldo Basile, Tharindu Ranasinghe, Marcos Zampieri

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The growing intersection of cybersecurity and law creates a complex information space where traditional legal research tools struggle to deal with nuanced connections between cases, statutes, and technical vulnerabilities. This knowledge divide hinders collaboration between legal experts and cybersecurity professionals. To address this important gap, this work provides a first step towards intelligent systems capable of navigating the increasingly intricate cyber-legal domain. We demonstrate pro...

ID: 2510.23443v1 cs.AI, cs.CL, cs.CR, cs.MA

arXiv PDF

📄 LLMs can hide text in other text of the same length

2025-10-27

Авторы:

Antonio Norelli, Michael Bronstein

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

A meaningful text can be hidden inside another, completely different yet still coherent and plausible, text of the same length. For example, a tweet containing a harsh political critique could be embedded in a tweet that celebrates the same political leader, or an ordinary product review could conceal a secret manuscript. This uncanny state of affairs is now possible thanks to Large Language Models, and in this paper we present a simple and efficient protocol to achieve it. We show that even mod...

ID: 2510.20075v2 cs.AI, cs.CL, cs.CR, cs.LG

arXiv PDF

📄 LLMs can hide text in other text of the same length.ipynb

2025-10-25

Авторы:

Antonio Norelli, Michael Bronstein

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

ID: 2510.20075v1 cs.AI, cs.CL, cs.CR, cs.LG

arXiv PDF

📄 Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies

2025-10-18

Авторы:

Mason Nakamura, Abhinav Kumar, Saaduddin Mahmud, Sahar Abdelnabi, Shlomo Zilberstein, Eugene Bagdasarian

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

A multi-agent system (MAS) powered by large language models (LLMs) can automate tedious user tasks such as meeting scheduling that requires inter-agent collaboration. LLMs enable nuanced protocols that account for unstructured private data, user constraints, and preferences. However, this design introduces new risks, including misalignment and attacks by malicious parties that compromise agents or steal user data. In this paper, we propose the Terrarium framework for fine-grained study on safety...

ID: 2510.14312v1 cs.AI, cs.CL, cs.CR, I.2.7; I.2.11

arXiv PDF

Показано 1 - 10 из 23 записей