📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models

2025-11-06

Авторы:

Aashray Reddy, Andrew Zagula, Nicholas Saban

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large Language Models (LLMs) remain vulnerable to jailbreaking attacks where adversarial prompts elicit harmful outputs, yet most evaluations focus on single-turn interactions while real-world attacks unfold through adaptive multi-turn conversations. We present AutoAdv, a training-free framework for automated multi-turn jailbreaking that achieves up to 95% attack success rate on Llama-3.1-8B within six turns a 24 percent improvement over single turn baselines. AutoAdv uniquely combines three ada...

ID: 2511.02376v1 cs.CL, cs.AI, cs.CR, cs.LG

arXiv PDF

📄 Security Risk of Misalignment between Text and Image in Multi-modal Model

2025-11-01

Авторы:

Xiaosen Wang, Zhijin Ge, Shaokang Wang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Despite the notable advancements and versatility of multi-modal diffusion models, such as text-to-image models, their susceptibility to adversarial inputs remains underexplored. Contrary to expectations, our investigations reveal that the alignment between textual and Image modalities in existing diffusion models is inadequate. This misalignment presents significant risks, especially in the generation of inappropriate or Not-Safe-For-Work (NSFW) content. To this end, we propose a novel attack ca...

ID: 2510.26105v1 cs.CV, cs.AI, cs.CR

arXiv PDF

📄 RefleXGen:The unexamined code is not worth using

2025-10-30

Авторы:

Bin Wang, Hui Li, AoFan Liu, BoTao Yang, Ao Yang, YiLu Zhong, Weixiang Huang, Yanping Zhang, Runhuai Huang, Weimin Zeng

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Security in code generation remains a pivotal challenge when applying large language models (LLMs). This paper introduces RefleXGen, an innovative method that significantly enhances code security by integrating Retrieval-Augmented Generation (RAG) techniques with guided self-reflection mechanisms inherent in LLMs. Unlike traditional approaches that rely on fine-tuning LLMs or developing specialized secure code datasets - processes that can be resource-intensive - RefleXGen iteratively optimizes ...

ID: 2510.23674v1 cs.SE, cs.AI, cs.CR

arXiv PDF

📄 SafeVision: Efficient Image Guardrail with Robust Policy Adherence and Explainability

2025-10-30

Авторы:

Peiyang Xu, Minzhou Pan, Zhaorun Chen, Shuang Yang, Chaowei Xiao, Bo Li

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

With the rapid proliferation of digital media, the need for efficient and transparent safeguards against unsafe content is more critical than ever. Traditional image guardrail models, constrained by predefined categories, often misclassify content due to their pure feature-based learning without semantic reasoning. Moreover, these models struggle to adapt to emerging threats, requiring costly retraining for new threats. To address these limitations, we introduce SafeVision, a novel image guardra...

ID: 2510.23960v1 cs.CV, cs.AI, cs.CR

arXiv PDF

📄 LLMLogAnalyzer: A Clustering-Based Log Analysis Chatbot using Large Language Models

2025-10-30

Авторы:

Peng Cai, Reza Ryan, Nickson M. Karie

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

System logs are a cornerstone of cybersecurity, supporting proactive breach prevention and post-incident investigations. However, analyzing vast amounts of diverse log data remains significantly challenging, as high costs, lack of in-house expertise, and time constraints make even basic analysis difficult for many organizations. This study introduces LLMLogAnalyzer, a clustering-based log analysis chatbot that leverages Large Language Models (LLMs) and Machine Learning (ML) algorithms to simplif...

ID: 2510.24031v1 cs.AI, cs.CR, H.3.3, I.2.7, I.5.3, I.2.5,

arXiv PDF

📄 Quantum-Resistant Networks Using Post-Quantum Cryptography

2025-10-30

Авторы:

Xin Jin, Nitish Kumar Chandra, Mohadeseh Azari, Kaushik P. Seshadreesan, Junyu Liu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Quantum networks rely on both quantum and classical channels for coordinated operation. Current architectures employ entanglement distribution and key exchange over quantum channels but often assume that classical communication is sufficiently secure. In practice, classical channels protected by traditional cryptography remain vulnerable to quantum adversaries, since large-scale quantum computers could break widely used public-key schemes and reduce the effective security of symmetric cryptograp...

ID: 2510.24534v1 quant-ph, cs.AI, cs.CR

arXiv PDF

📄 Ask What Your Country Can Do For You: Towards a Public Red Teaming Model

2025-10-25

Авторы:

Wm. Matthew Kennedy, Cigdem Patlak, Jayraj Dave, Blake Chambers, Aayush Dhanotiya, Darshini Ramiah, Reva Schwartz, Jack Hagen, Akash Kundu, Mouni Pendharkar, Liam Baisley, Theodora Skeadas, Rumman Chowdhury

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

AI systems have the potential to produce both benefits and harms, but without rigorous and ongoing adversarial evaluation, AI actors will struggle to assess the breadth and magnitude of the AI risk surface. Researchers from the field of systems design have developed several effective sociotechnical AI evaluation and red teaming techniques targeting bias, hate speech, mis/disinformation, and other documented harm classes. However, as increasingly sophisticated AI systems are released into high-st...

ID: 2510.20061v1 cs.CY, cs.AI, cs.CR

arXiv PDF

📄 Transferable Black-Box One-Shot Forging of Watermarks via Image Preference Models

2025-10-25

Авторы:

Tomáš Souček, Sylvestre-Alvise Rebuffi, Pierre Fernandez, Nikola Jovanović, Hady Elsahar, Valeriu Lacatusu, Tuan Tran, Alexandre Mourachko

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Recent years have seen a surge in interest in digital content watermarking techniques, driven by the proliferation of generative models and increased legal pressure. With an ever-growing percentage of AI-generated content available online, watermarking plays an increasingly important role in ensuring content authenticity and attribution at scale. There have been many works assessing the robustness of watermarking to removal attacks, yet, watermark forging, the scenario when a watermark is stolen...

ID: 2510.20468v1 cs.LG, cs.AI, cs.CR, cs.CV

arXiv PDF

📄 Black Box Absorption: LLMs Undermining Innovative Ideas

2025-10-25

Авторы:

Wenjun Cao

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large Language Models are increasingly adopted as critical tools for accelerating innovation. This paper identifies and formalizes a systemic risk inherent in this paradigm: \textbf{Black Box Absorption}. We define this as the process by which the opaque internal architectures of LLM platforms, often operated by large-scale service providers, can internalize, generalize, and repurpose novel concepts contributed by users during interaction. This mechanism threatens to undermine the foundational p...

ID: 2510.20612v1 cs.CY, cs.AI, cs.CR, cs.LG, econ.GN, q-fin.EC

arXiv PDF

📄 Data Unlearning Beyond Uniform Forgetting via Diffusion Time and Frequency Selection

2025-10-23

Авторы:

Jinseong Park, Mijung Park

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Data unlearning aims to remove the influence of specific training samples from a trained model without requiring full retraining. Unlike concept unlearning, data unlearning in diffusion models remains underexplored and often suffers from quality degradation or incomplete forgetting. To address this, we first observe that most existing methods attempt to unlearn the samples at all diffusion time steps equally, leading to poor-quality generation. We argue that forgetting occurs disproportionately ...

ID: 2510.17917v1 cs.LG, cs.AI, cs.CR

arXiv PDF

Показано 41 - 50 из 162 записей