Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
2510.19808v1
cs.CV, cs.CL, cs.LG
2025-10-24
Авторы:
Yusu Qian, Eli Bocek-Rivele, Liangchen Song, Jialing Tong, Yinfei Yang, Jiasen Lu, Wenze Hu, Zhe Gan
Abstract
Recent advances in multimodal models have demonstrated remarkable text-guided
image editing capabilities, with systems like GPT-4o and Nano-Banana setting
new benchmarks. However, the research community's progress remains constrained
by the absence of large-scale, high-quality, and openly accessible datasets
built from real images. We introduce Pico-Banana-400K, a comprehensive
400K-image dataset for instruction-based image editing. Our dataset is
constructed by leveraging Nano-Banana to generate diverse edit pairs from real
photographs in the OpenImages collection. What distinguishes Pico-Banana-400K
from previous synthetic datasets is our systematic approach to quality and
diversity. We employ a fine-grained image editing taxonomy to ensure
comprehensive coverage of edit types while maintaining precise content
preservation and instruction faithfulness through MLLM-based quality scoring
and careful curation. Beyond single turn editing, Pico-Banana-400K enables
research into complex editing scenarios. The dataset includes three specialized
subsets: (1) a 72K-example multi-turn collection for studying sequential
editing, reasoning, and planning across consecutive modifications; (2) a
56K-example preference subset for alignment research and reward model training;
and (3) paired long-short editing instructions for developing instruction
rewriting and summarization capabilities. By providing this large-scale,
high-quality, and task-rich resource, Pico-Banana-400K establishes a robust
foundation for training and benchmarking the next generation of text-guided
image editing models.
Ссылки и действия
Дополнительные ресурсы: