Majority Bit-Aware Watermarking For Large Language Models

2508.03829v1 cs.CL, cs.CR 2025-08-09

Authors:

Jiahao Xu, Rui Hu, Zikai Zhang

Summary

Deploying LLMs in the real world creates a risk of their misuse for generating harmful or misleading content. Watermarking techniques have been proposed as a way to verify the origin of generated text and trace misuse. Despite prior work on multi-bit watermarking, existing schemes suffer from a trade-off between text quality and decoding accuracy. This work proposes MajorMark, a method based on majority bit-aware encoding. It allows a larger and more flexible preferred token set, preserving text quality without sacrificing decoding accuracy. MajorMark$^+$ partitions the message into blocks that are encoded and decoded independently, which further improves both the quality of the watermarked text and the decoding accuracy. Experiments show that MajorMark and MajorMark$^+$ significantly improve decoding accuracy while maintaining high text quality, outperforming prior multi-bit watermarking methods.

Abstract

The growing deployment of Large Language Models (LLMs) in real-world applications has raised concerns about their potential misuse in generating harmful or deceptive content. To address this issue, watermarking techniques have emerged as a promising solution by embedding identifiable binary messages into generated text for origin verification and misuse tracing. While recent efforts have explored multi-bit watermarking schemes capable of embedding rich information such as user identifiers, they typically suffer from the fundamental trade-off between text quality and decoding accuracy: to ensure reliable message decoding, they have to restrict the size of preferred token sets during encoding, yet such restrictions reduce the quality of the generated content. In this work, we propose MajorMark, a novel watermarking method that improves this trade-off through majority bit-aware encoding. MajorMark selects preferred token sets based on the majority bit of the message, enabling a larger and more flexible sampling of tokens. In contrast to prior methods that rely on token frequency analysis for decoding, MajorMark employs a clustering-based decoding strategy, which maintains high decoding accuracy even when the preferred token set is large, thus preserving both content quality and decoding accuracy. We further introduce MajorMark$^+$, which partitions the message into multiple blocks to independently encode and deterministically decode each block, thereby further enhancing the quality of watermarked text and improving decoding accuracy. Extensive experiments on state-of-the-art LLMs demonstrate that our methods significantly enhance both decoding accuracy and text generation quality, outperforming prior multi-bit watermarking baselines.
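The two message-level quantities the abstract refers to, the majority bit of a message and the partition of a message into blocks, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `majority_bit` and `split_into_blocks`, the tie-breaking rule, and the contiguous near-equal block layout are all assumptions made for illustration. How the preferred token set is derived from the majority bit, and how the clustering-based decoder works, are not specified in the abstract and are deliberately left out.

```python
from typing import List


def majority_bit(bits: List[int]) -> int:
    """Return the most frequent bit in the message.

    Assumption: ties are broken in favour of 1. The abstract does not
    say how ties are handled.
    """
    if not bits:
        raise ValueError("message must contain at least one bit")
    ones = sum(bits)
    return 1 if 2 * ones >= len(bits) else 0


def split_into_blocks(bits: List[int], num_blocks: int) -> List[List[int]]:
    """Split the message into contiguous blocks of near-equal length.

    Assumption: blocks are contiguous and the first blocks absorb any
    remainder. The abstract only states that the message is partitioned.
    """
    if not 1 <= num_blocks <= len(bits):
        raise ValueError("num_blocks must be between 1 and len(bits)")
    base, extra = divmod(len(bits), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        size = base + (1 if i < extra else 0)
        blocks.append(bits[start:start + size])
        start += size
    return blocks


# Hypothetical 8-bit message, e.g. a short user identifier.
message = [1, 1, 0, 1, 0, 0, 0, 0]
blocks = split_into_blocks(message, 2)

print(majority_bit(message))               # majority bit of the whole message
print([majority_bit(b) for b in blocks])   # majority bit of each block
```

In this toy message the whole-message majority bit differs from the majority bit of the first block, which shows why handling each block independently, as MajorMark$^+$ is described as doing, gives each block its own majority bit to encode against.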
