📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 GEO-Detective: Unveiling Location Privacy Risks in Images with LLM Agents

2025-12-02

Авторы:

Xinyu Zhang, Yixin Wu, Boyang Zhang, Chenhao Lin, Chao Shen, Michael Backes, Yang Zhang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Images shared on social media often expose geographic cues. While early geolocation methods required expert effort and lacked generalization, the rise of Large Vision Language Models (LVLMs) now enables accurate geolocation even for ordinary users. However, existing approaches are not optimized for this task. To explore the full potential and associated privacy risks, we present Geo-Detective, an agent that mimics human reasoning and tool use for image geolocation inference. It follows a procedu...

ID: 2511.22441v1 cs.CR, cs.AI, cs.CV, cs.LG

arXiv PDF

📄 SPQR: A Standardized Benchmark for Modern Safety Alignment Methods in Text-to-Image Diffusion Models

2025-11-26

Авторы:

Mohammed Talha Alam, Nada Saadi, Fahad Shamshad, Nils Lukas, Karthik Nandakumar, Fahkri Karray, Samuele Poppi

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Text-to-image diffusion models can emit copyrighted, unsafe, or private content. Safety alignment aims to suppress specific concepts, yet evaluations seldom test whether safety persists under benign downstream fine-tuning routinely applied after deployment (e.g., LoRA personalization, style/domain adapters). We study the stability of current safety methods under benign fine-tuning and observe frequent breakdowns. As true safety alignment must withstand even benign post-deployment adaptations, we...

ID: 2511.19558v1 cs.CR, cs.AI, cs.CV, cs.LG

arXiv PDF

📄 BackWeak: Backdooring Knowledge Distillation Simply with Weak Triggers and Fine-tuning

2025-11-18

Авторы:

Shanmin Wang, Dongdong Zhao

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Knowledge Distillation (KD) is essential for compressing large models, yet relying on pre-trained "teacher" models downloaded from third-party repositories introduces serious security risks -- most notably backdoor attacks. Existing KD backdoor methods are typically complex and computationally intensive: they employ surrogate student models and simulated distillation to guarantee transferability, and they construct triggers in a way similar to universal adversarial perturbations (UAPs), which be...

ID: 2511.12046v1 cs.CR, cs.AI, cs.CV, cs.LG

arXiv PDF

📄 EditTrack: Detecting and Attributing AI-assisted Image Editing

2025-10-04

Авторы:

Zhengyuan Jiang, Yuyang Zhang, Moyang Guo, Neil Zhenqiang Gong

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

In this work, we formulate and study the problem of image-editing detection and attribution: given a base image and a suspicious image, detection seeks to determine whether the suspicious image was derived from the base image using an AI editing model, while attribution further identifies the specific editing model responsible. Existing methods for detecting and attributing AI-generated images are insufficient for this problem, as they focus on determining whether an image was AI-generated/edite...

ID: 2510.01173v1 cs.CR, cs.AI, cs.CV, cs.LG

arXiv PDF

📄 Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack

2025-10-01

Авторы:

Yukun Chen, Boheng Li, Yu Yuan, Leyi Qi, Yiming Li, Tianwei Zhang, Zhan Qin, Kui Ren

#### Контекст Knowledge distillation (KD) является ключевым методом для развертывания глубоких нейронных сетей (DNN) на устройствах с ограниченными ресурсами. Он предполагает передачу знаний от высокоэффективных, но ресурсоёмких "учительских" моделей к компактным, но производительным "ученическим" моделям. Этот подход позволяет обеспечить высокую производительность моделей на устройствах, где производительность и энергоэффективность являются критичными факторами. Несмотря на популярность и полезность этого метода, он не без недостатков. Одним из возможных рисков является то, что учительские модели могут быть заражены скрытыми backdoor-атаками, которые могут быть переданы студенческим моделям через процесс KD. Эта проблема становится особенно критичной, если учительские модели получены из третьих сторон, где невозможно гарантировать их безопасность. Эта работа рассматривает новый и критический вид такой атаки, названный **distillation-conditional backdoor attack (DCBA)**, который имеет уникальные характеристики и значительный потенциал для загруженных устройств. #### Метод Для реализации DCBA мы предлагаем метод, основанный на **bilevel optimization**. Этот подход позволяет имитировать процесс KD, оптимизировав студенческую модель внутренним уровнем, а затем использовать выходы этой модели для оптимизации учителя, чтобы внедрить зараженный триггер. Мы вводим **SCAR (Simple Conditional Attack with Reverse-mode)**, который обеспечивает эффективную инъекцию backdoor-атаки в учительскую модель при помощи явного задания условий. Наша инъекция триггера основывается на алгоритме **implicit differentiation**, что позволяет нам обеспечить точность и эффективность при решении этой сложной задачи. Ключевые отличительные черты нашего подхода заключаются в том, что он не требует изменений в данных или допущений о модели, что делает его универсальным и опасным в различных условиях. #### Результаты Мы проводим опыты на эталонных датасетах, таких как CIFAR-10 и ImageNet, используя различные модели, такие как VGG, ResNet и MobileNet. Мы также используем различные KD-техники, включая fit-tuning и attention-based distillation. Результаты показывают, что метод SCAR выполняет успешную инъекцию backdoor-атаки в ученические модели даже при очистке данных и незаметности для существующих методов обнаружения бэкдоров. Кроме того, наши результаты показывают, что SCAR может выполнить успешную атаку с высокой инъекционной стойкостью, даже при соблюдении формальных процедур обнаружения backdoor-атак. Эти результаты обнаруживают серьезную уязвимость в процессе KD, которая была до этого незамечена. #### Значимость Наша работа выделяет новую и критическую уязвимость в широко используемом KD-процессе.

Annotation:

Knowledge distillation (KD) is a vital technique for deploying deep neural networks (DNNs) on resource-constrained devices by transferring knowledge from large teacher models to lightweight student models. While teacher models from third-party platforms may undergo security verification (\eg, backdoor detection), we uncover a novel and critical threat: distillation-conditional backdoor attacks (DCBAs). DCBA injects dormant and undetectable backdoors into teacher models, which become activated in...

ID: 2509.23871v1 cs.CR, cs.AI, cs.CV, cs.LG

arXiv PDF

📄 Of-SemWat: High-payload text embedding for semantic watermarking of AI-generated images with arbitrary size

2025-10-01

Авторы:

Benedetta Tondi, Andrea Costanzo, Mauro Barni

## Контекст В последние годы стало всё более популярным использование генераторов изображений на основе искусственного интеллекта (AI-генераторов) для создания изображений, основанных на текстовых описаниях. Однако эти технологии иногда используются незаконно, чтобы создавать спам, де DEEPFAKE-контент или враньё. Для борьбы с этим проблемой необходимо мотивированное применение, которое позволит обнаруживать искусственные изменения в изображениях, генерируемых AI. В этом контексте появилась методика "Of-SemWat" (Отечественный Семантический Метадатный Метод), нацеленная на решение проблемы доказательства авторства изображений и судебного доказательства, когда манипуляции с ними были выполнены с помощью AI. Этот метод предлагает возможность встраивать в картинки семантические метаданные, описывающие образ, который может соответствовать входному текстовому промоутору. ## Метод Метод Of-SemWat заключается в создании высокополевого объёмного водяного знака, который может быть встроен в любого размера изображения. Основная идея заключается в использовании традиционных систем водяных знаков, в том числе ортогональных и турбокодов, чтобы обеспечить высокую устойчивость. Для улучшения интергральности водяного знака в изображение используется техника частотного внедрения и маскирования, которая позволяет минимизировать заметность водяного знака в графическом представлении. Работа выполняется на базе нейросетевой архитектуры, модифицированной для обработки больших размеров изображений. В процессе внедрения метаданных водяного знака в картинку становится главным фактором маскирование, чтобы оптимизировать незаметность. Это делается с использованием частотной модели, которая позволяет водяному знаку сохраняться в графической структуре изображения. ## Результаты Проведенные эксперименты показали, что Of-SemWat достаточно высокой степени устойчивости к широкому спектру видов процессов обработки изображений, включая сжатие, изменение разрешения, шумоподавление и различные виды фильтров. Более того, даже после применения AI-инпейтинга, водяный знак может быть восстановлен, что позволяет определить, были ли внесены изменения в изображение. Таким образом, Of-SemWat позволяет не только верифицировать целостность изображения, но и отслеживать изменения, внесённые AI-генератором, в соответствии с входным текстом. ## Значимость Of-SemWat открывает широкие перспективы в области защиты интеллектуальной собственности, модернизации методов доказательства прав на цифровый контент и противодействия AI-мошенничеству. Этот метод может

Annotation:

We propose a high-payload image watermarking method for textual embedding, where a semantic description of the image - which may also correspond to the input text prompt-, is embedded inside the image. In order to be able to robustly embed high payloads in large-scale images - such as those produced by modern AI generators - the proposed approach builds upon a traditional watermarking scheme that exploits orthogonal and turbo codes for improved robustness, and integrates frequency-domain embeddi...

ID: 2509.24823v1 cs.CR, cs.AI, cs.CV, cs.LG

arXiv PDF

📄 Signal-Based Malware Classification Using 1D CNNs

2025-09-10

Авторы:

Jack Wilkie, Hanan Hindy, Ivan Andonovic, Christos Tachtatzis, Robert Atkinson

## Контекст Современные угрозы в сфере кибербезопасности, такие как малвирь, требуют эффективных методов идентификации и классификации. Одним из ключевых вызовов является обход традиционных методов статического анализа, которые могут быть обойдены с помощью различных оболочек и обфускации. Динамический анализ, хотя и показывает высокую точность, требует больших ресурсов, что не допускает массового развертывания. Ранее проводились исследования, применяющие методы компьютерного зрения к 2D-изображениям, созданным из бинарных файлов. Однако этот подход приводит к значительной потере информации, включая зашумление и введение зависимостей между пикселями, которые не существуют в начальных данных. ## Метод В данном исследовании предлагается новый подход к классификации малвирьа, основанный на преобразовании бинарных файлов в одномерные сигналы. Этот метод устраняет необходимость использования 2D-изображений, сохраняя большую часть оригинальной информации. Бинарные файлы конвертируются в 1D-сигналы без ненужных преобразований, используя формат вещественных чисел, что позволяет избежать зашумления и сохранить точность. Для классификации были использованы 1D-конvolutional neural networks (1D-CNNs), адаптированные из 2D-архитектур, таких как ResNet, с добавлением squeeze-and-excitation слоев для улучшения осознанности и эффективности. ## Результаты Использовав MalNet dataset, были проведены эксперименты для классификации на уровнях бинарный, тип и семейство. 1D-CNNs показали высокую точность, достигнув F1-метрик 0.874, 0.503 и 0.507 соответственно. Эти результаты опережают предыдущие решения, основанные на 2D-изображениях. Особенно выдающимися были результаты при классификации на уровне бинарный и тип, где 1D-подход показал значительное превосходство. ## Значимость Предложенный подход имеет широкие возможности применения в сфере безопасности информационных технологий. Он позволяет более эффективно обнаруживать и классифицировать новые виды малвирьа, даже с использованием обфускации. Благодаря использованию 1D-сигналов, данный метод экономит ресурсы и повышает точность. Его можно применять в системах мониторинга, антивирусной защите и анализа бинарных файлов. ## Выводы Результаты этого исследования указывают на то, что использование 1D-сигналов для классификации малвирьа является более эффективным, чем традиционные 2D-подходы. Будущие исследования будут сфокусированы на расширении этой техники для работы с более сложными данными и улучш

Annotation:

Malware classification is a contemporary and ongoing challenge in cyber-security: modern obfuscation techniques are able to evade traditional static analysis, while dynamic analysis is too resource intensive to be deployed at a large scale. One prominent line of research addresses these limitations by converting malware binaries into 2D images by heuristically reshaping them into a 2D grid before resizing using Lanczos resampling. These images can then be classified based on their textural infor...

ID: 2509.06548v2 cs.CR, cs.AI, cs.CV, cs.LG, I.2.6; K.6.5

arXiv PDF

📄 Signal-Based Malware Classification Using 1D CNNs

2025-09-10

Авторы:

Jack Wilkie, Hanan Hindy, Ivan Andonovic, Christos Tachtatzis, Robert Atkinson

## Контекст Modern malware detection faces significant challenges due to the use of advanced obfuscation techniques, which can bypass traditional static analysis methods. Dynamic analysis, while effective, is resource-intensive and impractical for large-scale deployment. To address these issues, existing research transforms malware binaries into 2D images by reshaping their data into a grid format and resizing it using Lanczos resampling. These images are then analyzed using computer vision techniques, enabling detection of obfuscated malware more effectively than static analysis. However, this approach introduces significant information loss due to quantization noise and the artificial introduction of 2D dependencies, which do not exist in the original binary data. This limitation reduces the classification performance of downstream models. This study proposes a novel approach that converts malware binaries into 1D signals, eliminating the need for heuristic reshaping and avoiding quantization noise by storing data in a floating-point format. ## Метод The proposed methodology focuses on converting malware binaries into 1D signals, leveraging their inherent structure and minimizing information loss. Unlike traditional 2D image-based approaches, this method preserves the original signal's integrity by avoiding heuristic reshaping and quantization noise. The signals are processed using a bespoke 1D convolutional neural network (1D CNN) based on the ResNet architecture. The network incorporates squeeze-and-excitation layers to enhance feature representation and classification accuracy. The model was evaluated on the MalNet dataset, a comprehensive dataset for malware classification, to assess its performance across binary, type, and family-level classification tasks. This approach represents a significant departure from conventional methods, offering improved classification accuracy and robustness. ## Результаты The experiments demonstrated the efficacy of the 1D signal-based approach in malware classification. The bespoke 1D CNN achieved state-of-the-art performance on the MalNet dataset, with F1 scores of 0.874 for binary classification, 0.503 for type-level classification, and 0.507 for family-level classification. These results outperform existing 2D CNN models when applied to the same dataset, highlighting the superiority of the proposed signal-based methodology. The floating-point representation of signals eliminates quantization noise, ensuring that the models receive more accurate and complete data for analysis. This improvement in signal fidelity directly translates to better classification performance, paving the way for more effective malware detection systems. ## Значимость The proposed 1D signal-based approach offers several advantages over traditional 2D image-based methods. By avoiding heuristic reshaping and quantization noise, it preserves the integrity of the original malware data, leading to more accurate classification. The method is computationally efficient, making it suitable for large-scale deployment in real-world cybersecurity systems. Its applications extend beyond malware classification, as the signal-based modality can be applied to other domains requiring robust signal processing. The potential impact of this work includes enhanced malware detection capabilities, improved system security, and reduced resource consumption in large-scale deployment scenarios. ## Выводы The study demonstrates the effectiveness of converting malware binaries into 1D signals for classification using 1D CNNs. The bespoke 1D CNN architecture, based on ResNet and squeeze-and-excitation layers, achieves state-of-the-art performance on the MalNet dataset, outperforming existing 2D CNN models. This approach eliminates the limitations of traditional 2D image-based methods, offering superior classification accuracy and robustness. Future research directions include exploring advanced signal processing techniques to further enhance signal fidelity and investigating the applicability of the proposed methodology to other cybersecurity and signal processing tasks.

Annotation:

ID: 2509.06548v1 cs.CR, cs.AI, cs.CV, cs.LG, I.2.6; K.6.5

arXiv PDF