📊 Digest statistics
Total digests: 34022
Added today: 82
Last updated: today
Authors:
Kai Williams, Rohan Subramani, Francis Rhys Ward
Annotation:
Frontier AI developers may fail to align or control highly-capable AI agents. In many cases, it could be useful to have emergency shutdown mechanisms which effectively prevent misaligned agents from carrying out harmful actions in the world. We introduce password-activated shutdown protocols (PAS protocols) -- methods for designing frontier agents to implement a safe shutdown protocol when given a password. We motivate PAS protocols by describing intuitive use-cases in which they mitigate risks ...
Authors:
Glener Lanes Pizzolato, Brenda Medeiros Lopes, Claudio Schepke, Diego Kreutz
Annotation:
This work presents a review of attack methodologies targeting Pix, the instant payment system launched by the Central Bank of Brazil in 2020. The study aims to identify and classify the main types of fraud affecting users and financial institutions, highlighting the evolution and increasing sophistication of these techniques. The methodology combines a structured literature review with exploratory interviews conducted with professionals from the banking sector. The results show that fraud scheme...
Authors:
Vu Van Than
Annotation:
Traditional threat modeling remains reactive, focused on known TTPs and past incident data, while threat prediction and forecasting frameworks are often disconnected from operational or architectural artifacts. This creates a fundamental weakness: the most serious cyber threats often do not arise from what is known, but from what is assumed, overlooked, or not yet conceived, and frequently originate from the future, such as artificial intelligence, information warfare, and supply chain attacks, w...
Authors:
Fred Heiding, Simon Lermen
Annotation:
We present an end-to-end demonstration of how attackers can exploit AI safety failures to harm vulnerable populations: from jailbreaking LLMs to generate phishing content, to deploying those messages against real targets, to successfully compromising elderly victims. We systematically evaluated safety guardrails across six frontier LLMs spanning four attack categories, revealing critical failures where several models exhibited near-complete susceptibility to certain attack vectors. In a human va...
Authors:
Avi Bagchi, Akhil Bhimaraju, Moulik Choraria, Daniel Alabi, Lav R. Varshney
Annotation:
Watermarking has emerged as a promising technique to track AI-generated
content and differentiate it from authentic human creations. While prior work
extensively studies watermarking for autoregressive large language models
(LLMs) and image diffusion models, none address discrete diffusion language
models, which are becoming popular due to their high inference throughput. In
this paper, we introduce the first watermarking method for discrete diffusion
models by applying the distribution-preservi...
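The abstract is cut off before it names the watermarking rule, so the following only illustrates one well-known distribution-preserving scheme, the Gumbel-max (exponential-race) watermark, and is not necessarily the authors' construction. It assumes the model exposes a full probability vector for the position being unmasked and that pseudorandom draws are seeded from a secret key plus the token position; `prng_uniforms`, `gumbel_max_sample`, and `detection_score` are hypothetical names. Choosing the argmax of r_i^(1/p_i) over uniform draws r selects token i with probability p_i, so sampling keeps the model's distribution, while anyone holding the key can check whether the chosen tokens landed on unusually large r values.

```python
import hashlib
import math
import random


def prng_uniforms(key: str, position: int, vocab_size: int) -> list[float]:
    """Hypothetical seeding scheme: one uniform draw in (0, 1) per vocabulary
    entry, derived deterministically from a secret key and the position."""
    digest = hashlib.sha256(f"{key}:{position}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    # random() returns values in [0, 1); clamp away from 0 so log(r) is finite
    return [max(rng.random(), 1e-12) for _ in range(vocab_size)]


def gumbel_max_sample(probs: list[float], uniforms: list[float]) -> int:
    """Pick argmax_i r_i ** (1 / p_i), computed as argmax_i log(r_i) / p_i.
    Over random r this returns index i with probability p_i."""
    best_index, best_score = -1, -math.inf
    for i, (p, r) in enumerate(zip(probs, uniforms)):
        if p <= 0.0:
            continue  # zero-probability tokens can never be chosen
        score = math.log(r) / p
        if score > best_score:
            best_index, best_score = i, score
    return best_index


def detection_score(tokens: list[int], key: str, vocab_size: int) -> float:
    """Sum of -log(1 - r) over the draws of the observed tokens. Each term is
    roughly Exp(1) (mean 1) for text produced without the key, and larger on
    average when the text was watermarked."""
    total = 0.0
    for position, token in enumerate(tokens):
        r = prng_uniforms(key, position, vocab_size)[token]
        total += -math.log(1.0 - r)
    return total
```

For unwatermarked text the score stays near the number of tokens, while watermarked text scores noticeably higher, so a threshold on the score acts as the detector. Seeding by position alone is a simplification chosen for this sketch, motivated by the fact that a discrete diffusion model may fill positions in any order rather than left to right.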
📄 Covert Surveillance in Smart Devices: A SCOUR Framework Analysis of Youth Privacy Implications
2025-10-30
Authors:
Austin Shouli, Yulia Bobkova, Ajay Kumar Shrestha
Annotation:
This paper investigates how smart devices covertly capture private
conversations and discusses in more depth the implications of this for youth
privacy. Using a structured review guided by the PRISMA methodology, the
analysis focuses on privacy concerns, data capture methods, data storage and
sharing practices, and proposed technical mitigations. To structure and
synthesize findings, we introduce the SCOUR framework, encompassing
Surveillance mechanisms, Consent and awareness, Operational dat...
📄 BadScientist: Can a Research Agent Write Convincing but Unsound Papers that Fool LLM Reviewers?
2025-10-23
Authors:
Fengqing Jiang, Yichen Feng, Yuetai Li, Luyao Niu, Basel Alomair, Radha Poovendran
Annotation:
The convergence of LLM-powered research assistants and AI-based peer review
systems creates a critical vulnerability: fully automated publication loops
where AI-generated research is evaluated by AI reviewers without human
oversight. We investigate this through BadScientist, a framework that
evaluates whether fabrication-oriented paper generation agents can deceive
multi-model LLM review systems. Our generator employs presentation-manipulation
strategies requiring no real experiments. W...
Authors:
Ting Qiao, Xing Liu, Wenke Huang, Jianbin Li, Zhaoxin Fan, Yiming Li
Annotation:
Large web-scale datasets have driven the rapid advancement of pre-trained
language models (PLMs), but unauthorized data usage has raised serious
copyright concerns. Existing dataset ownership verification (DOV) methods
typically assume that watermarks remain stable during inference; however, this
assumption often fails under natural noise and adversary-crafted perturbations.
We propose the first certified dataset ownership verification method for PLMs
based on dual-space smoothing (i.e., DSSmoot...
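The abstract breaks off before describing its dual-space construction, so the sketch below shows only the generic idea that certified smoothing methods build on: Gaussian randomized smoothing in the style of Cohen et al., applied to a plain numeric feature vector rather than to text or a PLM. The base classifier, `certify`, and its parameters are hypothetical, and a Hoeffding bound stands in for the exact Clopper-Pearson interval so that only the standard library is needed. The smoothed prediction is the base classifier's majority vote under noise; if a lower confidence bound p on that vote exceeds 1/2, then, with probability at least 1 - alpha over the sampling, the prediction cannot change under any L2 perturbation smaller than sigma * Phi^{-1}(p).

```python
import math
import random
from collections import Counter
from statistics import NormalDist
from typing import Callable, Optional, Sequence

Classifier = Callable[[list[float]], int]


def _noisy_votes(f: Classifier, x: Sequence[float], sigma: float,
                 num: int, rng: random.Random) -> Counter:
    """Count the base classifier's labels on `num` Gaussian-perturbed copies of x."""
    votes: Counter = Counter()
    for _ in range(num):
        votes[f([xi + rng.gauss(0.0, sigma) for xi in x])] += 1
    return votes


def certify(f: Classifier, x: Sequence[float], sigma: float, n0: int = 100,
            n: int = 1000, alpha: float = 0.001,
            seed: int = 0) -> tuple[Optional[int], float]:
    """Return (label, certified L2 radius), or (None, 0.0) to abstain."""
    rng = random.Random(seed)
    # Stage 1: guess the top label from a small, separate sample
    guess = _noisy_votes(f, x, sigma, n0, rng).most_common(1)[0][0]
    # Stage 2: lower-bound P(f(x + noise) == guess) with Hoeffding's inequality
    count = _noisy_votes(f, x, sigma, n, rng)[guess]
    p_lower = count / n - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0  # vote too close to call: abstain
    return guess, sigma * NormalDist().inv_cdf(p_lower)
```

Using separate samples for guessing the label and for bounding its probability keeps the confidence bound valid; the certified radius is conservative and shrinks as sigma shrinks or as fewer noisy samples are drawn.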
Authors:
Ander Artola Velasco, Stratis Tsirtsis, Manuel Gomez-Rodriguez
Annotation:
Millions of users rely on a market of cloud-based services to obtain access
to state-of-the-art large language models. However, it has been very recently
shown that the de facto pay-per-token pricing mechanism used by providers
creates a financial incentive for them to strategize and misreport the (number
of) tokens a model used to generate an output. In this paper, we develop an
auditing framework based on martingale theory that enables a trusted
third-party auditor who sequentially queries a p...
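The abstract stops before describing the auditor's test statistic, so the following only illustrates the martingale reasoning such an audit can rest on, using a standard betting-style test and Ville's inequality; it is not the authors' procedure. It assumes, hypothetically, that each query yields a score x_t in [-1, 1] whose conditional mean is at most 0 when the provider reports token counts honestly; constructing such a score is the substantive part of the paper and is not reproduced here. The auditor's wealth K_t = (1 + lambda * x_1) * ... * (1 + lambda * x_t) is then a nonnegative supermartingale under honest reporting, so by Ville's inequality it ever reaches 1/alpha with probability at most alpha. The auditor can therefore flag the provider the first time that happens, with a false-alarm rate of at most alpha regardless of how many queries are made. `sequential_audit` is a hypothetical name.

```python
from typing import Iterable, Optional


def sequential_audit(scores: Iterable[float], alpha: float = 0.05,
                     bet: float = 0.5) -> Optional[int]:
    """Return the 1-based query index at which honest reporting is rejected,
    or None if the wealth process never reaches 1 / alpha.

    `scores` are hypothetical per-query statistics in [-1, 1] with conditional
    mean <= 0 under honest reporting; `bet` is a fixed betting fraction in (0, 1).
    """
    if not 0.0 < bet < 1.0:
        raise ValueError("bet must lie in (0, 1)")
    wealth = 1.0  # a test supermartingale starts at 1
    for t, x in enumerate(scores, start=1):
        if not -1.0 <= x <= 1.0:
            raise ValueError("scores must lie in [-1, 1]")
        wealth *= 1.0 + bet * x  # stays positive because bet < 1
        if wealth >= 1.0 / alpha:  # Ville: P(ever crossing) <= alpha under H0
            return t
    return None
```

For example, a stream of scores that stays at 0.5 multiplies the wealth by 1.25 per query and crosses 1/0.05 = 20 at the 14th query (1.25^13 is about 18.2, 1.25^14 about 22.7), whereas scores alternating between 0.5 and -0.5 shrink the wealth by a factor of 0.9375 every two queries and never trigger. A fixed bet keeps the sketch short; adapting the betting fraction to past scores is a common refinement.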
Authors:
Aueaphum Aueawatthanaphisut
Annotation:
Secure and interoperable integration of heterogeneous medical data remains a
grand challenge in digital health. Current federated learning (FL) frameworks
offer privacy-preserving model training but lack standardized mechanisms to
orchestrate multi-modal data fusion across distributed and resource-constrained
environments. This study introduces a novel framework that leverages the Model
Context Protocol (MCP) as an interoperability layer for secure, cross-agent
communication in multi-modal feder...
Showing 1-10 of 13 records