Reading Between the Lines: Towards Reliable Black-box LLM Fingerprinting via Zeroth-order Gradient Estimation
2510.06605v1
cs.CR, cs.AI, cs.CL
2025-10-10
Авторы:
Shuo Shao, Yiming Li, Hongwei Yao, Yifei Chen, Yuchen Yang, Zhan Qin
Abstract
The substantial investment required to develop Large Language Models (LLMs)
makes them valuable intellectual property, raising significant concerns about
copyright protection. LLM fingerprinting has emerged as a key technique to
address this, which aims to verify a model's origin by extracting an intrinsic,
unique signature (a "fingerprint") and comparing it to that of a source model
to identify illicit copies. However, existing black-box fingerprinting methods
often fail to generate distinctive LLM fingerprints. This ineffectiveness
arises because black-box methods typically rely on model outputs, which lose
critical information about the model's unique parameters due to the usage of
non-linear functions. To address this, we first leverage Fisher Information
Theory to formally demonstrate that the gradient of the model's input is a more
informative feature for fingerprinting than the output. Based on this insight,
we propose ZeroPrint, a novel method that approximates these information-rich
gradients in a black-box setting using zeroth-order estimation. ZeroPrint
overcomes the challenge of applying this to discrete text by simulating input
perturbations via semantic-preserving word substitutions. This operation allows
ZeroPrint to estimate the model's Jacobian matrix as a unique fingerprint.
Experiments on the standard benchmark show ZeroPrint achieves a
state-of-the-art effectiveness and robustness, significantly outperforming
existing black-box methods.
Ссылки и действия
Дополнительные ресурсы: