DSSmoothing: Toward Certified Dataset Ownership Verification for Pre-trained Language Models via Dual-Space Smoothing

arXiv:2510.15303v1 [cs.CR, cs.AI, cs.CY] 2025-10-21

Authors:

Ting Qiao, Xing Liu, Wenke Huang, Jianbin Li, Zhaoxin Fan, Yiming Li

Abstract

Large web-scale datasets have driven the rapid advancement of pre-trained language models (PLMs), but unauthorized data usage has raised serious copyright concerns. Existing dataset ownership verification (DOV) methods typically assume that watermarks remain stable during inference; however, this assumption often fails under natural noise and adversary-crafted perturbations. We propose the first certified dataset ownership verification method for PLMs based on dual-space smoothing (i.e., DSSmoothing). To address the challenges of text discreteness and semantic sensitivity, DSSmoothing introduces continuous perturbations in the embedding space to capture semantic robustness and applies controlled token reordering in the permutation space to capture sequential robustness. DSSmoothing consists of two stages: in the first stage, triggers are collaboratively embedded in both spaces to generate norm-constrained and robust watermarked datasets; in the second stage, randomized smoothing is applied in both spaces during verification to compute the watermark robustness (WR) of suspicious models and statistically compare it with the principal probability (PP) values of a set of benign models. Theoretically, DSSmoothing provides provable robustness guarantees for dataset ownership verification by ensuring that WR consistently exceeds PP under bounded dual-space perturbations. Extensive experiments on multiple representative web datasets demonstrate that DSSmoothing achieves stable and reliable verification performance and exhibits robustness against potential adaptive attacks.
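
The verification stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a model is a callable that maps a list of token embedding vectors to a label, that embedding-space smoothing means adding Gaussian noise, and that permutation-space smoothing means a bounded number of adjacent-token swaps. The names `perturb`, `smoothed_target_rate`, and `flags_suspicious`, the parameters `sigma`, `max_swaps`, and `margin`, and the simple threshold rule are all hypothetical. The paper's exact definitions of WR and PP, its statistical test, and its certified bounds are not reproduced here.

```python
import random


def perturb(embeddings, sigma, max_swaps, rng):
    """Return a perturbed copy of a token-embedding sequence.

    Embedding space: add i.i.d. Gaussian noise (std sigma) to every coordinate.
    Permutation space: apply max_swaps random adjacent-token swaps.
    Both choices are assumptions made for illustration.
    """
    noisy = [[x + rng.gauss(0.0, sigma) for x in vec] for vec in embeddings]
    for _ in range(max_swaps):
        if len(noisy) < 2:
            break
        i = rng.randrange(len(noisy) - 1)
        noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy


def smoothed_target_rate(model, embeddings, target, sigma=0.1,
                         max_swaps=1, n=500, seed=0):
    """Monte Carlo estimate of how often `model` outputs `target`
    when the input is randomly perturbed in both spaces."""
    rng = random.Random(seed)
    hits = sum(
        model(perturb(embeddings, sigma, max_swaps, rng)) == target
        for _ in range(n)
    )
    return hits / n


def flags_suspicious(suspect_rate, benign_rates, margin=0.05):
    """Hypothetical decision rule: flag the suspect model if its smoothed
    rate exceeds every benign model's rate by more than `margin`."""
    return suspect_rate > max(benign_rates) + margin


# Toy stand-ins for real models (assumptions, for demonstration only).
def toy_watermarked_model(embeddings):
    mean_first = sum(vec[0] for vec in embeddings) / len(embeddings)
    return 1 if mean_first > 0.5 else 0


def toy_benign_model(embeddings):
    return 0


trigger_sample = [[1.0, 0.0] for _ in range(4)]
suspect_rate = smoothed_target_rate(toy_watermarked_model, trigger_sample, target=1)
benign_rates = [smoothed_target_rate(toy_benign_model, trigger_sample, target=1)]
print(flags_suspicious(suspect_rate, benign_rates))
```

In this toy setting the watermarked stand-in keeps predicting the target label under both kinds of perturbation, while the benign stand-in never does, so the suspect model is flagged. A real verification would replace the threshold rule with a proper statistical test over several benign models.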
