📄 Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models
2025-10-29
Authors: Pavlos Ntais
Abstract:
Large language models (LLMs) remain vulnerable to sophisticated prompt
engineering attacks that exploit contextual framing to bypass safety
mechanisms, posing significant risks in cybersecurity applications. We
introduce Jailbreak Mimicry, a systematic methodology for training compact
attacker models to automatically generate narrative-based jailbreak prompts in
a one-shot manner. Our approach transforms adversarial prompt discovery from
manual craftsmanship into a reproducible scientific proces...