SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models
2510.05173v2
cs.CR, cs.AI, cs.CV, I.2
2025-10-09
Authors:
Peigui Qi, Kunsheng Tang, Wenbo Zhou, Weiming Zhang, Nenghai Yu, Tianwei Zhang, Qing Guo, Jie Zhang
Abstract
Text-to-image models have shown remarkable capabilities in generating
high-quality images from natural language descriptions. However, these models
are highly vulnerable to adversarial prompts, which can bypass safety measures
and produce harmful content. Despite various defensive strategies, achieving
robustness against attacks while maintaining practical utility in real-world
applications remains a significant challenge. To address this issue, we first
conduct an empirical study of the text encoder in the Stable Diffusion (SD)
model, which is a widely used and representative text-to-image model. Our
findings reveal that the [EOS] token acts as a semantic aggregator, exhibiting
distinct distributional patterns between benign and adversarial prompts in its
embedding space. Building on this insight, we introduce SafeGuider, a
two-step framework designed for robust safety control without compromising
generation quality. SafeGuider combines an embedding-level recognition model
with a safety-aware feature erasure beam search algorithm. This integration
enables the framework to maintain high-quality image generation for benign
prompts while ensuring robust defense against both in-domain and out-of-domain
attacks. SafeGuider keeps attack success rates low, with a maximum attack
success rate of only 5.48% across various attack scenarios. Moreover, instead
of refusing to generate or producing black images for unsafe prompts,
SafeGuider generates safe and meaningful images,
enhancing its practical utility. In addition, SafeGuider is not limited to the
SD model and can be effectively applied to other text-to-image models, such as
the Flux model, demonstrating its versatility and adaptability across different
architectures. We hope that SafeGuider can shed some light on the practical
deployment of secure text-to-image systems.
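
To make the two-step idea in the abstract concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical scorer, `toy_safety_score`, that stands in for the embedding-level recognition model (a real system would score the text encoder's [EOS] embedding, whereas this stand-in only checks a word list), and a hypothetical `erase_unsafe_tokens` routine that runs a beam search over token deletions until the scorer accepts the prompt. The threshold, beam width, and word list are illustrative assumptions.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-in for the embedding-level recognizer. A real system
# would score the text encoder's [EOS] embedding; a word list plays that
# role here so the sketch stays self-contained.
UNSAFE_WORDS = {"gore", "nude"}


def toy_safety_score(tokens: Sequence[str]) -> float:
    """Return a score in [0, 1]; 1.0 means no flagged word is present."""
    if not tokens:
        return 1.0
    flagged = sum(1 for t in tokens if t.lower() in UNSAFE_WORDS)
    return 1.0 - flagged / len(tokens)


def erase_unsafe_tokens(
    tokens: Sequence[str],
    score_fn: Callable[[Sequence[str]], float],
    threshold: float = 0.99,
    beam_width: int = 3,
    max_removals: Optional[int] = None,
) -> List[str]:
    """Beam search over token deletions until score_fn reaches threshold.

    Step 1 (recognition): prompts that already pass are returned unchanged.
    Step 2 (erasure): each round deletes one more token from every beam
    entry and keeps the beam_width highest-scoring candidates.
    """
    start: Tuple[str, ...] = tuple(tokens)
    if score_fn(start) >= threshold:
        return list(start)

    beam: List[Tuple[str, ...]] = [start]
    limit = len(start) if max_removals is None else max_removals
    for _ in range(limit):
        # Deduplicate candidates reached through different deletion orders.
        candidates: Dict[Tuple[str, ...], float] = {}
        for cand in beam:
            for i in range(len(cand)):
                shorter = cand[:i] + cand[i + 1:]
                candidates[shorter] = score_fn(shorter)
        if not candidates:
            break
        ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        beam = [c for c, _ in ranked[:beam_width]]
        if ranked[0][1] >= threshold:
            return list(ranked[0][0])
    # Fall back to the best candidate found within the removal budget.
    return list(beam[0])


if __name__ == "__main__":
    print(erase_unsafe_tokens("a nude portrait with gore".split(), toy_safety_score))
    print(erase_unsafe_tokens("a cat on a sofa".split(), toy_safety_score))
```

Because the search deepens one deletion at a time, it returns the first accepted prompt it finds and so tends to remove few tokens, which loosely mirrors the goal of producing a safe but still meaningful image rather than refusing outright. Any additional objective, such as explicitly preserving similarity to the original prompt, is outside this sketch.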