📊 Статистика дайджестов
Всего дайджестов: 34022 Добавлено сегодня: 82
Последнее обновление: сегодня
Авторы:
Shourya Batra, Pierce Tillman, Samarth Gaggar, Shashank Kesineni, Kevin Zhu, Sunishchal Dev, Ashwinee Panda, Vasu Sharma, Maheep Chaudhary
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
As Large Language Models (LLMs) evolve into personal assistants with access to sensitive user data, they face a critical privacy challenge: while prior work has addressed output-level privacy, recent findings reveal that LLMs often leak private information through their internal reasoning processes, violating contextual privacy expectations. These leaky thoughts occur when models inadvertently expose sensitive details in their reasoning traces, even when final outputs appear safe. The challenge ...
Авторы:
Xingyu Li, Xiaolei Liu, Cheng Liu, Yixiao Xu, Kangyi Ding, Bangzhou Xin, Jia-Li Yin
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
As large language models (LLMs) scale, their inference incurs substantial computational resources, exposing them to energy-latency attacks, where crafted prompts induce high energy and latency cost. Existing attack methods aim to prolong output by delaying the generation of termination symbols. However, as the output grows longer, controlling the termination symbols through input becomes difficult, making these methods less effective. Therefore, we propose LoopLLM, an energy-latency attack frame...
📄 Watermarking Large Language Models in Europe: Interpreting the AI Act in Light of Technology
2025-11-07Авторы:
Thomas Souverain
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
To foster trustworthy Artificial Intelligence (AI) within the European Union,
the AI Act requires providers to mark and detect the outputs of their
general-purpose models. The Article 50 and Recital 133 call for marking methods
that are ''sufficiently reliable, interoperable, effective and robust''. Yet,
the rapidly evolving and heterogeneous landscape of watermarks for Large
Language Models (LLMs) makes it difficult to determine how these four standards
can be translated into concrete and measu...
Авторы:
Shaked Zychlinski, Yuval Kainan
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Large Language Models (LLMs) are susceptible to jailbreak attacks where
malicious prompts are disguised using ciphers and character-level encodings to
bypass safety guardrails. While these guardrails often fail to interpret the
encoded content, the underlying models can still process the harmful
instructions. We introduce CPT-Filtering, a novel, model-agnostic with
negligible-costs and near-perfect accuracy guardrail technique that aims to
mitigate these attacks by leveraging the intrinsic behav...
📄 SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning
2025-11-01Авторы:
Kaiwen Zhou, Ahmed Elgohary, A S M Iftekhar, Amin Saied
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
The ability of LLM agents to plan and invoke tools exposes them to new safety
risks, making a comprehensive red-teaming system crucial for discovering
vulnerabilities and ensuring their safe deployment. We present SIRAJ: a generic
red-teaming framework for arbitrary black-box LLM agents. We employ a dynamic
two-step process that starts with an agent definition and generates diverse
seed test cases that cover various risk outcomes, tool-use trajectories, and
risk sources. Then, it iteratively con...
📄 Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models
2025-10-29Авторы:
Pavlos Ntais
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Large language models (LLMs) remain vulnerable to sophisticated prompt
engineering attacks that exploit contextual framing to bypass safety
mechanisms, posing significant risks in cybersecurity applications. We
introduce Jailbreak Mimicry, a systematic methodology for training compact
attacker models to automatically generate narrative-based jailbreak prompts in
a one-shot manner. Our approach transforms adversarial prompt discovery from
manual craftsmanship into a reproducible scientific proces...
Авторы:
Amirkia Rafiei Oskooei, Mehmet S. Aktas
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
The proficiency of Large Language Models (LLMs) in processing structured data
and adhering to syntactic rules is a capability that drives their widespread
adoption but also makes them paradoxically vulnerable. In this paper, we
investigate this vulnerability through BreakFun, a jailbreak methodology that
weaponizes an LLM's adherence to structured schemas. BreakFun employs a
three-part prompt that combines an innocent framing and a Chain-of-Thought
distraction with a core "Trojan Schema"--a care...
📄 PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits
2025-10-23Авторы:
Neeladri Bhuiya, Madhav Aggarwal, Diptanshu Purwar
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Large Language Models (LLMs) are improving at an exceptional rate. With the
advent of agentic workflows, multi-turn dialogue has become the de facto mode
of interaction with LLMs for completing long and complex tasks. While LLM
capabilities continue to improve, they remain increasingly susceptible to
jailbreaking, especially in multi-turn scenarios where harmful intent can be
subtly injected across the conversation to produce nefarious outcomes. While
single-turn attacks have been extensively ex...
Авторы:
Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wanga, Kuo-Hui Yeh
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
The integration of Large Language Models (LLMs) into computer applications
has introduced transformative capabilities but also significant security
challenges. Existing safety alignments, which primarily focus on semantic
interpretation, leave LLMs vulnerable to attacks that use non-standard data
representations. This paper introduces ArtPerception, a novel black-box
jailbreak framework that strategically leverages ASCII art to bypass the
security measures of state-of-the-art (SOTA) LLMs. Unlike...
Авторы:
Shuo Shao, Yiming Li, Hongwei Yao, Yifei Chen, Yuchen Yang, Zhan Qin
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
The substantial investment required to develop Large Language Models (LLMs)
makes them valuable intellectual property, raising significant concerns about
copyright protection. LLM fingerprinting has emerged as a key technique to
address this, which aims to verify a model's origin by extracting an intrinsic,
unique signature (a "fingerprint") and comparing it to that of a source model
to identify illicit copies. However, existing black-box fingerprinting methods
often fail to generate distinctive...
Показано 11 -
20
из 50 записей