Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption
2510.18333v1
cs.CR, cs.CL
2025-10-23
Authors:
Yepeng Liu, Xuandong Zhao, Dawn Song, Gregory W. Wornell, Yuheng Bu
Abstract
Despite progress in watermarking algorithms for large language models (LLMs),
real-world deployment remains limited. We argue that this gap stems from
misaligned incentives among LLM providers, platforms, and end users, which
manifest as four key barriers: competitive risk, detection-tool governance,
robustness concerns, and attribution issues. We revisit three classes of
watermarking through this lens. Model watermarking naturally aligns with
LLM provider interests, yet faces new challenges in open-source ecosystems.
LLM text watermarking offers modest provider benefit when framed solely
as an anti-misuse tool, but can gain traction in narrowly scoped settings such
as dataset de-contamination or user-controlled provenance. In-context
watermarking (ICW) is tailored for trusted parties, such as conference
organizers or educators, who embed hidden watermarking instructions into
documents. If a dishonest reviewer or student submits this text to an LLM, the
output carries a detectable watermark indicating misuse. This setup aligns
incentives: users experience no quality loss, trusted parties gain a detection
tool, and LLM providers remain neutral by simply following watermark
instructions. We advocate for a broader exploration of incentive-aligned
methods, with ICW as an example, in domains where trusted parties need reliable
tools to detect misuse. More broadly, we distill design principles for
incentive-aligned, domain-specific watermarking and outline future research
directions. Our position is that the practical adoption of LLM watermarking
requires aligning stakeholder incentives in targeted application domains and
fostering active community engagement.