Text-to-Image Models Leave Identifiable Signatures: Implications for Leaderboard Security

arXiv:2510.06525v1 [cs.LG, cs.CR] 2025-10-12

Authors:

Ali Naseh, Anshuman Suri, Yuefeng Peng, Harsh Chaudhari, Alina Oprea, Amir Houmansadr

Abstract

Generative AI leaderboards are central to evaluating model capabilities, but remain vulnerable to manipulation. Among key adversarial objectives is rank manipulation, where an attacker must first deanonymize the models behind displayed outputs -- a threat previously demonstrated and explored for large language models (LLMs). We show that this problem can be even more severe for text-to-image leaderboards, where deanonymization is markedly easier. Using over 150,000 generated images from 280 prompts and 19 diverse models spanning multiple organizations, architectures, and sizes, we demonstrate that simple real-time classification in CLIP embedding space identifies the generating model with high accuracy, even without prompt control or historical data. We further introduce a prompt-level separability metric and identify prompts that enable near-perfect deanonymization. Our results indicate that rank manipulation in text-to-image leaderboards is easier than previously recognized, underscoring the need for stronger defenses.
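The abstract does not say which classifier is used in CLIP embedding space, so the following is only a minimal sketch under stated assumptions: a nearest-centroid rule with cosine similarity, applied to toy 3-dimensional vectors that stand in for real CLIP image embeddings. The model names, the vectors, and the helper functions (`normalize`, `fit_centroids`, `predict`) are all hypothetical and are not taken from the paper.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def centroid(vectors):
    """Mean of unit-normalized vectors, re-normalized to unit length."""
    unit = [normalize(v) for v in vectors]
    dim = len(unit[0])
    mean = [sum(u[i] for u in unit) / len(unit) for i in range(dim)]
    return normalize(mean)

def fit_centroids(reference):
    """Build one centroid per candidate model from its reference embeddings."""
    return {model: centroid(vecs) for model, vecs in reference.items()}

def predict(centroids, embedding):
    """Return the model whose centroid is most cosine-similar to the query."""
    q = normalize(embedding)
    return max(
        centroids,
        key=lambda m: sum(a * b for a, b in zip(centroids[m], q)),
    )

# Assumption: toy stand-ins for embeddings of images that an attacker
# generated from each candidate model using the same prompt.
reference = {
    "model_a": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "model_b": [[0.1, 0.9, 0.0], [0.2, 0.8, 0.1]],
    "model_c": [[0.0, 0.1, 0.9], [0.1, 0.2, 0.8]],
}
centroids = fit_centroids(reference)

# Embedding of an anonymous leaderboard image (also a toy vector).
guess = predict(centroids, [0.7, 0.3, 0.1])
print(guess)
```

In a realistic setting the vectors would come from a CLIP image encoder and have hundreds of dimensions; the attribution step itself would stay this simple, which is consistent with the abstract's claim that a simple classifier suffices.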



