Authors:
Yining Lu, Wenyi Tang, Max Johnson, Taeho Jung, Meng Jiang
Annotation:
Existing retrieval-augmented generation (RAG) systems typically use a centralized architecture, causing a high cost of data collection, integration, and management, as well as privacy concerns. There is a great need for a decentralized RAG system that enables foundation models to utilize information directly from data owners who maintain full control over their sources. However, decentralization brings a challenge: the numerous independent data sources vary significantly in reliability, which ca...
Authors:
Amr Gomaa, Ahmed Salem, Sahar Abdelnabi
Annotation:
As language models evolve into autonomous agents that act and communicate on
behalf of users, ensuring safety in multi-agent ecosystems becomes a central
challenge. Interactions between personal assistants and external service
providers expose a core tension between utility and protection: effective
collaboration requires information sharing, yet every exchange creates new
attack surfaces. We introduce ConVerse, a dynamic benchmark for evaluating
privacy and security risks in agent-agent interac...
Authors:
Hongwei Yao, Yun Xia, Shuo Shao, Haoran Shi, Tong Qiao, Cong Wang
Annotation:
Large language models (LLMs) increasingly employ guardrails to enforce
ethical, legal, and application-specific constraints on their outputs. While
effective at mitigating harmful responses, these guardrails introduce a new
class of vulnerabilities by exposing observable decision patterns. In this
work, we present the first study of black-box LLM guardrail reverse-engineering
attacks. We propose Guardrail Reverse-engineering Attack (GRA), a reinforcement
learning-based framework that leverages g...
Authors:
Yize Liu, Yunyun Hou, Aina Sui
Annotation:
Large Language Models (LLMs) have been widely deployed across various
applications, yet their potential security and ethical risks have raised
increasing concerns. Existing research employs red teaming evaluations,
utilizing multi-turn jailbreaks to identify potential vulnerabilities in LLMs.
However, these approaches often lack exploration of successful dialogue
trajectories within the attack space, and they tend to overlook the
considerable overhead associated with the attack process. To addre...
Authors:
Soufiane Essahli, Oussama Sarsar, Imane Fouad, Anas Motii, Ahmed Bentajer
Annotation:
Social platforms distribute information at unprecedented speed, which in turn
accelerates the spread of misinformation and threatens public discourse. We
present FakeZero, a fully client-side, cross-platform browser extension that
flags unreliable posts on Facebook and X (formerly Twitter) while the user
scrolls. All computation, DOM scraping, tokenisation, Transformer inference,
and UI rendering run locally through the Chromium messaging API, so no personal
data leaves the device. FakeZero emplo...
Authors:
Haohua Duan, Liyao Xiang, Xin Zhang
Annotation:
Watermarking schemes for large language models (LLMs) have been proposed to
identify the source of the generated text, mitigating the potential threats
that emerge from model theft. However, current watermarking solutions hardly
resolve the trust issue: non-public watermark detection cannot prove that it
conducts the detection faithfully. We observe that this is attributable to the
secret key typically used in watermark detection -- it cannot be public, or
the adversary may launch removal attacks ...
Authors:
Hasan Akgul, Mari Eplik, Javier Rojas, Aina Binti Abdullah, Pieter van der Merwe
Annotation:
ZK-SenseLM is a secure and auditable wireless sensing framework that pairs a
large-model encoder for Wi-Fi channel state information (and optionally mmWave
radar or RFID) with a policy-grounded decision layer and end-to-end
zero-knowledge proofs of inference. The encoder uses masked spectral
pretraining with phase-consistency regularization, plus a light cross-modal
alignment that ties RF features to compact, human-interpretable policy tokens.
To reduce unsafe actions under distribution shift, w...
Authors:
Hiromu Takahashi, Shotaro Ishihara
Annotation:
We propose Fast-MIA (https://github.com/Nikkei/fast-mia), a Python library
for efficiently evaluating membership inference attacks (MIA) against Large
Language Models (LLMs). MIA against LLMs has emerged as a crucial challenge due
to growing concerns over copyright, security, and data privacy, and has
attracted increasing research attention. However, the progress of this research
is significantly hindered by two main obstacles: (1) the high computational
cost of inference in LLMs, and (2) the la...
Authors:
Zheng-Xin Yong, Stephen H. Bach
Annotation:
We discover a novel and surprising phenomenon of unintentional misalignment
in reasoning language models (RLMs), which we call self-jailbreaking.
Specifically, after benign reasoning training on math or code domains, RLMs
will use multiple strategies to circumvent their own safety guardrails. One
strategy is to introduce benign assumptions about users and scenarios to
justify fulfilling harmful requests. For instance, an RLM reasons that harmful
requests like "outline a strategy for stealing cu...
Authors:
Adetayo Adebimpe, Helmut Neukirchen, Thomas Welsh
Annotation:
Honeypots are decoy systems used for gathering valuable threat intelligence
or diverting attackers away from production systems. Maximising attacker
engagement is essential to their utility. However, research has highlighted that
context awareness, such as the ability to respond to new attack types, systems
and attacker agents, is necessary to increase engagement. Large Language Models
(LLMs) have been shown to be one approach to increasing context awareness but suffer
from several challenges includin...