IE-Critic-R1: Advancing the Explanatory Measurement of Text-Driven Image Editing for Human Perception Alignment

2511.18055v1 cs.CV, cs.AI, cs.CL 2025-11-25

Авторы:

Bowen Qu, Shangkun Sun, Xiaoyu Liang, Wei Gao

Abstract

Recent advances in text-driven image editing have been significant, yet the task of accurately evaluating these edited images continues to pose a considerable challenge. Different from the assessment of text-driven image generation, text-driven image editing is characterized by simultaneously conditioning on both text and a source image. The edited images often retain an intrinsic connection to the original image, which dynamically change with the semantics of the text. However, previous methods tend to solely focus on text-image alignment or have not well aligned with human perception. In this work, we introduce the Text-driven Image Editing Benchmark suite (IE-Bench) to enhance the assessment of text-driven edited images. IE-Bench includes a database contains diverse source images, various editing prompts and the corresponding edited results from different editing methods, and nearly 4,000 samples with corresponding Mean Opinion Scores (MOS) provided by 15 human subjects. Furthermore, we introduce IE-Critic-R1, which, benefiting from Reinforcement Learning from Verifiable Rewards (RLVR), provides more comprehensive and explainable quality assessment for text-driven image editing that aligns with human perception. Extensive experiments demonstrate IE-Critic-R1's superior subjective-alignments on the text-driven image editing task compared with previous metrics. Related data and codes are available to the public.

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

IE-Critic-R1: Advancing the Explanatory Measurement of Text-Driven Image Editing for Human Perception Alignment

Авторы:

Abstract

Ссылки и действия

Связанные статьи

Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Traini...

NAS-LoRA: Empowering Parameter-Efficient Fine-Tuning for Visual Foundation Model...

Generative Adversarial Gumbel MCTS for Abstract Visual Composition Generation

StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Stream...

ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcem...

Навигация