What Do AI-Generated Images Want?
2510.20350v2
cs.CY, cs.AI
2025-10-28
Authors:
Amanda Wasielewski
Abstract
W.J.T. Mitchell's influential essay 'What do pictures want?' shifts the
theoretical focus away from the interpretative act of understanding pictures
and from the motivations of the humans who create them to the possibility that
the picture itself is an entity with agency and wants. In this article, I
reframe Mitchell's question in light of contemporary AI image generation tools
to ask: what do AI-generated images want? Drawing on art-historical discourse
on the nature of abstraction, I argue that AI-generated images want specificity
and concreteness because they are fundamentally abstract. Multimodal
text-to-image models, which are the primary subject of this article, are based
on the premise that text and image are interchangeable or exchangeable tokens
and that there is a commensurability between them, at least as represented
mathematically in data. The user pipeline that sees textual input become visual
output, however, obscures this representational regress and makes it seem like
one form transforms into the other -- as if by magic.