📊 Статистика дайджестов
Всего дайджестов: 34022 Добавлено сегодня: 82
Последнее обновление: сегодня
📄 Scaling Patterns in Adversarial Alignment: Evidence from Multi-LLM Jailbreak Experiments
2025-11-19Авторы:
Samuel Nathanson, Rebecca Williams, Cynthia Matuszek
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Large language models (LLMs) increasingly operate in multi-agent and safety-critical settings, raising open questions about how their vulnerabilities scale when models interact adversarially. This study examines whether larger models can systematically jailbreak smaller ones - eliciting harmful or restricted behavior despite alignment safeguards. Using standardized adversarial tasks from JailbreakBench, we simulate over 6,000 multi-turn attacker-target exchanges across major LLM families and sca...
Авторы:
Chiara Bonfanti, Alessandro Druetto, Cataldo Basile, Tharindu Ranasinghe, Marcos Zampieri
Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
The growing intersection of cybersecurity and law creates a complex
information space where traditional legal research tools struggle to deal with
nuanced connections between cases, statutes, and technical vulnerabilities.
This knowledge divide hinders collaboration between legal experts and
cybersecurity professionals. To address this important gap, this work provides
a first step towards intelligent systems capable of navigating the increasingly
intricate cyber-legal domain. We demonstrate pro...