Text-to-Image Models Leave Identifiable Signatures: Implications for Leaderboard Security
2510.06525v1
cs.LG, cs.CR
2025-10-12
Авторы:
Ali Naseh, Anshuman Suri, Yuefeng Peng, Harsh Chaudhari, Alina Oprea, Amir Houmansadr
Abstract
Generative AI leaderboards are central to evaluating model capabilities, but
remain vulnerable to manipulation. Among key adversarial objectives is rank
manipulation, where an attacker must first deanonymize the models behind
displayed outputs -- a threat previously demonstrated and explored for large
language models (LLMs). We show that this problem can be even more severe for
text-to-image leaderboards, where deanonymization is markedly easier. Using
over 150,000 generated images from 280 prompts and 19 diverse models spanning
multiple organizations, architectures, and sizes, we demonstrate that simple
real-time classification in CLIP embedding space identifies the generating
model with high accuracy, even without prompt control or historical data. We
further introduce a prompt-level separability metric and identify prompts that
enable near-perfect deanonymization. Our results indicate that rank
manipulation in text-to-image leaderboards is easier than previously
recognized, underscoring the need for stronger defenses.
Ссылки и действия
Дополнительные ресурсы: