Provable Watermarking for Data Poisoning Attacks
2510.09210v1
cs.CR, cs.LG
2025-10-14
Авторы:
Yifan Zhu, Lijia Yu, Xiao-Shan Gao
Abstract
In recent years, data poisoning attacks have been increasingly designed to
appear harmless and even beneficial, often with the intention of verifying
dataset ownership or safeguarding private data from unauthorized use. However,
these developments have the potential to cause misunderstandings and conflicts,
as data poisoning has traditionally been regarded as a security threat to
machine learning systems. To address this issue, it is imperative for harmless
poisoning generators to claim ownership of their generated datasets, enabling
users to identify potential poisoning to prevent misuse. In this paper, we
propose the deployment of watermarking schemes as a solution to this challenge.
We introduce two provable and practical watermarking approaches for data
poisoning: {\em post-poisoning watermarking} and {\em poisoning-concurrent
watermarking}. Our analyses demonstrate that when the watermarking length is
$\Theta(\sqrt{d}/\epsilon_w)$ for post-poisoning watermarking, and falls within
the range of $\Theta(1/\epsilon_w^2)$ to $O(\sqrt{d}/\epsilon_p)$ for
poisoning-concurrent watermarking, the watermarked poisoning dataset provably
ensures both watermarking detectability and poisoning utility, certifying the
practicality of watermarking under data poisoning attacks. We validate our
theoretical findings through experiments on several attacks, models, and
datasets.
Ссылки и действия
Дополнительные ресурсы: