IRIS: Intrinsic Reward Image Synthesis

arXiv:2509.25562v1 [cs.AI, cs.CL, cs.CV, cs.LG], 2025-10-02

Authors:

Yihang Chen, Yuanhao Ban, Yunqi Hong, Cho-Jui Hsieh

Abstract

Despite the success of Reinforcement Learning from Human Feedback (RLHF) in language reasoning, its application to autoregressive Text-to-Image (T2I) generation is often constrained by the limited availability of human preference data. This paper explores how an autoregressive T2I model can learn from internal signals without relying on external rewards or labeled data. Contrary to recent findings in text generation, we show that maximizing self-uncertainty, rather than self-certainty, improves image generation. We observe that this is because autoregressive T2I models with low uncertainty tend to generate simple and uniform images, which are less aligned with human preferences. Based on these observations, we propose IRIS (Intrinsic Reward Image Synthesis), the first framework to improve autoregressive T2I models with reinforcement learning using only an intrinsic reward. Empirical results demonstrate that applying IRIS to autoregressive T2I models achieves performance that is competitive with or superior to that obtained with external rewards.
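
To make the idea of an intrinsic, uncertainty-based reward concrete, here is a minimal sketch in Python. The abstract does not give the exact definition of self-uncertainty, so this example assumes one plausible proxy: the mean Shannon entropy of the model's next-token distributions over the generated image tokens. The function names (`softmax`, `token_entropy`, `self_uncertainty_reward`) and the toy logits are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the vocabulary at one token position."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the next-token distribution."""
    probs = softmax(logits)
    # Skip zero-probability entries, since 0 * log(0) is taken to be 0.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def self_uncertainty_reward(step_logits: Sequence[Sequence[float]]) -> float:
    """ASSUMED proxy for an intrinsic reward: mean entropy over all
    generated image tokens. Higher values mean a less certain model."""
    if not step_logits:
        raise ValueError("need logits for at least one generated token")
    return sum(token_entropy(row) for row in step_logits) / len(step_logits)


# Toy logits for two sampled images, three tokens each, vocabulary of size 4.
confident_sample = [[9.0, 0.0, 0.0, 0.0]] * 3  # sharply peaked distributions
uncertain_sample = [[1.0, 0.8, 0.9, 1.1]] * 3  # nearly flat distributions

reward_confident = self_uncertainty_reward(confident_sample)
reward_uncertain = self_uncertainty_reward(uncertain_sample)
```

Under this assumed proxy, the sample drawn from nearly flat distributions receives the larger reward, which matches the abstract's direction of maximizing self-uncertainty rather than self-certainty. A reinforcement learning update would then favor such samples; the specific update rule is not stated in the abstract and is left out here.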
