PVMark: Enabling Public Verifiability for LLM Watermarking Schemes

2510.26274v1 cs.CR, cs.CL, cs.LG 2025-11-01

Authors:

Haohua Duan, Liyao Xiang, Xin Zhang

Abstract

Watermarking schemes for large language models (LLMs) have been proposed to identify the source of generated text, mitigating the potential threats arising from model theft. However, current watermarking solutions hardly resolve the trust issue: a non-public watermark detector cannot prove that it carried out the detection faithfully. We observe that this is attributable to the secret key used in most watermark detection: the key cannot be public, or an adversary holding it may launch removal attacks; nor can it be private, or the watermark detection is opaque to the public. To resolve this dilemma, we propose PVMark, a plugin based on zero-knowledge proofs (ZKPs) that makes the watermark detection process publicly verifiable by third parties without disclosing any secret key. PVMark hinges on a proof of "correct execution" of watermark detection, on which a set of ZKP constraints is built, covering mapping, random number generation, comparison, and summation. We implement multiple variants of PVMark in Python, Rust, and Circom, covering combinations of three watermarking schemes, three hash functions, and four ZKP protocols, to show that our approach works under a variety of circumstances. Experimental results show that PVMark efficiently enables public verifiability for state-of-the-art LLM watermarking schemes without compromising watermarking performance, making it promising for practical deployment.
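To make the four operations named in the abstract concrete, the following is a minimal sketch of a plain (non-ZKP) keyed detector. It assumes a generic green-list style scheme in which each token is marked "green" by a keyed hash of the previous token and itself; the abstract does not say which schemes PVMark covers, so the scheme, the HMAC-SHA256 choice, and every name and constant here (`prf`, `is_green`, `detect`, `sample_green_sequence`, `VOCAB_SIZE`, `GAMMA`, the threshold) are illustrative assumptions, not the authors' implementation. No proof system is involved: the sketch only shows the kind of computation whose correct execution a proof would attest to.

```python
import hashlib
import hmac
import math

VOCAB_SIZE = 50_000  # hypothetical vocabulary size
GAMMA = 0.5          # hypothetical fraction of tokens treated as "green"


def prf(key: bytes, prev_token: int, token: int) -> int:
    """Mapping + random number generation (assumed form):
    map a (previous token, token) pair to a keyed 64-bit pseudorandom value."""
    msg = prev_token.to_bytes(4, "big") + token.to_bytes(4, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def is_green(key: bytes, prev_token: int, token: int, gamma: float = GAMMA) -> bool:
    """Comparison: a token is green when its pseudorandom value falls below gamma * 2^64."""
    return prf(key, prev_token, token) < int(gamma * 2**64)


def detect(key: bytes, tokens: list[int], gamma: float = GAMMA, z_threshold: float = 4.0):
    """Summation: count green tokens and compare a one-sided z-score to a threshold."""
    n = len(tokens) - 1  # the first token has no predecessor, so it is not scored
    if n <= 0:
        raise ValueError("need at least two tokens")
    green = sum(is_green(key, p, t, gamma) for p, t in zip(tokens, tokens[1:]))
    z = (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
    return green, z, z > z_threshold


def sample_green_sequence(key: bytes, length: int) -> list[int]:
    """Toy stand-in for a watermarked generator: always emit the first green token id."""
    tokens = [0]
    while len(tokens) < length:
        prev = tokens[-1]
        tokens.append(next(t for t in range(VOCAB_SIZE) if is_green(key, prev, t)))
    return tokens
```

Calling `detect(key, sample_green_sequence(key, 65))` scores 64 token pairs that are green by construction, so the detector flags the sequence. The secret `key` enters every step, which is the source of the dilemma the abstract describes: whoever runs `detect` must hold the key, and without a proof an outside party has to take the reported result on trust.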

Related articles

Securing Large Language Models (LLMs) from Prompt Injection Attacks

Steganographic Backdoor Attacks in NLP: Ultra-Low Poisoning and Defense Evasion

Bits Leaked per Query: Information-Theoretic Bounds on Adversarial Attacks again...

Differentially Private Synthetic Text Generation for Retrieval-Augmented Generat...

SBFA: Single Sneaky Bit Flip Attack to Break Large Language Models
