Transferable Black-Box One-Shot Forging of Watermarks via Image Preference Models
2510.20468v1
cs.LG, cs.AI, cs.CR, cs.CV
2025-10-25
Авторы:
Tomáš Souček, Sylvestre-Alvise Rebuffi, Pierre Fernandez, Nikola Jovanović, Hady Elsahar, Valeriu Lacatusu, Tuan Tran, Alexandre Mourachko
Abstract
Recent years have seen a surge in interest in digital content watermarking
techniques, driven by the proliferation of generative models and increased
legal pressure. With an ever-growing percentage of AI-generated content
available online, watermarking plays an increasingly important role in ensuring
content authenticity and attribution at scale. There have been many works
assessing the robustness of watermarking to removal attacks, yet, watermark
forging, the scenario when a watermark is stolen from genuine content and
applied to malicious content, remains underexplored. In this work, we
investigate watermark forging in the context of widely used post-hoc image
watermarking. Our contributions are as follows. First, we introduce a
preference model to assess whether an image is watermarked. The model is
trained using a ranking loss on purely procedurally generated images without
any need for real watermarks. Second, we demonstrate the model's capability to
remove and forge watermarks by optimizing the input image through
backpropagation. This technique requires only a single watermarked image and
works without knowledge of the watermarking model, making our attack much
simpler and more practical than attacks introduced in related work. Third, we
evaluate our proposed method on a variety of post-hoc image watermarking
models, demonstrating that our approach can effectively forge watermarks,
questioning the security of current watermarking approaches. Our code and
further resources are publicly available.