📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 Password-Activated Shutdown Protocols for Misaligned Frontier Agents

2025-12-04

Авторы:

Kai Williams, Rohan Subramani, Francis Rhys Ward

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Frontier AI developers may fail to align or control highly-capable AI agents. In many cases, it could be useful to have emergency shutdown mechanisms which effectively prevent misaligned agents from carrying out harmful actions in the world. We introduce password-activated shutdown protocols (PAS protocols) -- methods for designing frontier agents to implement a safe shutdown protocol when given a password. We motivate PAS protocols by describing intuitive use-cases in which they mitigate risks ...

ID: 2512.03089v1 cs.CR, cs.AI, cs.CY, cs.LG

arXiv PDF

📄 A Taxonomy of Pix Fraud in Brazil: Attack Methodologies, AI-Driven Amplification, and Defensive Strategies

2025-11-27

Авторы:

Glener Lanes Pizzolato, Brenda Medeiros Lopes, Claudio Schepke, Diego Kreutz

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

This work presents a review of attack methodologies targeting Pix, the instant payment system launched by the Central Bank of Brazil in 2020. The study aims to identify and classify the main types of fraud affecting users and financial institutions, highlighting the evolution and increasing sophistication of these techniques. The methodology combines a structured literature review with exploratory interviews conducted with professionals from the banking sector. The results show that fraud scheme...

ID: 2511.20902v1 cs.CR, cs.AI, cs.CY

arXiv PDF

📄 Future-Back Threat Modeling: A Foresight-Driven Security Framework

2025-11-22

Авторы:

Vu Van Than

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Traditional threat modeling remains reactive-focused on known TTPs and past incident data, while threat prediction and forecasting frameworks are often disconnected from operational or architectural artifacts. This creates a fundamental weakness: the most serious cyber threats often do not arise from what is known, but from what is assumed, overlooked, or not yet conceived, and frequently originate from the future, such as artificial intelligence, information warfare, and supply chain attacks, w...

ID: 2511.16088v1 cs.CR, cs.AI, cs.CY

arXiv PDF

📄 Can AI Models be Jailbroken to Phish Elderly Victims? An End-to-End Evaluation

2025-11-18

Авторы:

Fred Heiding, Simon Lermen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We present an end-to-end demonstration of how attackers can exploit AI safety failures to harm vulnerable populations: from jailbreaking LLMs to generate phishing content, to deploying those messages against real targets, to successfully compromising elderly victims. We systematically evaluated safety guardrails across six frontier LLMs spanning four attack categories, revealing critical failures where several models exhibited near-complete susceptibility to certain attack vectors. In a human va...

ID: 2511.11759v1 cs.CR, cs.AI, cs.CY

arXiv PDF

📄 Watermarking Discrete Diffusion Language Models

2025-11-06

Авторы:

Avi Bagchi, Akhil Bhimaraju, Moulik Choraria, Daniel Alabi, Lav R. Varshney

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Watermarking has emerged as a promising technique to track AI-generated content and differentiate it from authentic human creations. While prior work extensively studies watermarking for autoregressive large language models (LLMs) and image diffusion models, none address discrete diffusion language models, which are becoming popular due to their high inference throughput. In this paper, we introduce the first watermarking method for discrete diffusion models by applying the distribution-preservi...

ID: 2511.02083v1 cs.CR, cs.AI, cs.CY

arXiv PDF

📄 Covert Surveillance in Smart Devices: A SCOUR Framework Analysis of Youth Privacy Implications

2025-10-30

Авторы:

Austin Shouli, Yulia Bobkova, Ajay Kumar Shrestha

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

This paper investigates how smart devices covertly capture private conversations and discusses in more in-depth the implications of this for youth privacy. Using a structured review guided by the PRISMA methodology, the analysis focuses on privacy concerns, data capture methods, data storage and sharing practices, and proposed technical mitigations. To structure and synthesize findings, we introduce the SCOUR framework, encompassing Surveillance mechanisms, Consent and awareness, Operational dat...

ID: 2510.24072v1 cs.CR, cs.AI, cs.CY

arXiv PDF

📄 BadScientist: Can a Research Agent Write Convincing but Unsound Papers that Fool LLM Reviewers?

2025-10-23

Авторы:

Fengqing Jiang, Yichen Feng, Yuetai Li, Luyao Niu, Basel Alomair, Radha Poovendran

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The convergence of LLM-powered research assistants and AI-based peer review systems creates a critical vulnerability: fully automated publication loops where AI-generated research is evaluated by AI reviewers without human oversight. We investigate this through \textbf{BadScientist}, a framework that evaluates whether fabrication-oriented paper generation agents can deceive multi-model LLM review systems. Our generator employs presentation-manipulation strategies requiring no real experiments. W...

ID: 2510.18003v1 cs.CR, cs.AI, cs.CY

arXiv PDF

📄 DSSmoothing: Toward Certified Dataset Ownership Verification for Pre-trained Language Models via Dual-Space Smoothing

2025-10-21

Авторы:

Ting Qiao, Xing Liu, Wenke Huang, Jianbin Li, Zhaoxin Fan, Yiming Li

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large web-scale datasets have driven the rapid advancement of pre-trained language models (PLMs), but unauthorized data usage has raised serious copyright concerns. Existing dataset ownership verification (DOV) methods typically assume that watermarks remain stable during inference; however, this assumption often fails under natural noise and adversary-crafted perturbations. We propose the first certified dataset ownership verification method for PLMs based on dual-space smoothing (i.e., DSSmoot...

ID: 2510.15303v1 cs.CR, cs.AI, cs.CY

arXiv PDF

📄 Auditing Pay-Per-Token in Large Language Models

2025-10-09

Авторы:

Ander Artola Velasco, Stratis Tsirtsis, Manuel Gomez-Rodriguez

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Millions of users rely on a market of cloud-based services to obtain access to state-of-the-art large language models. However, it has been very recently shown that the de facto pay-per-token pricing mechanism used by providers creates a financial incentive for them to strategize and misreport the (number of) tokens a model used to generate an output. In this paper, we develop an auditing framework based on martingale theory that enables a trusted third-party auditor who sequentially queries a p...

ID: 2510.05181v1 cs.CR, cs.AI, cs.CY

arXiv PDF

📄 Secure Multi-Modal Data Fusion in Federated Digital Health Systems via MCP

2025-10-04

Авторы:

Aueaphum Aueawatthanaphisut

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Secure and interoperable integration of heterogeneous medical data remains a grand challenge in digital health. Current federated learning (FL) frameworks offer privacy-preserving model training but lack standardized mechanisms to orchestrate multi-modal data fusion across distributed and resource-constrained environments. This study introduces a novel framework that leverages the Model Context Protocol (MCP) as an interoperability layer for secure, cross-agent communication in multi-modal feder...

ID: 2510.01780v1 cs.CR, cs.AI, cs.CY, cs.LG

arXiv PDF

Показано 1 - 10 из 13 записей