
🍌 Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

Pico-Banana-400K is a large-scale dataset of ~400K text–image–edit triplets designed to advance research in text-guided image editing.
Each example contains:

  • an original image (from Open Images),
  • a human-like edit instruction, and
  • the edited result generated by Nano-Banana and verified by Gemini-2.5-Pro.

The dataset spans 35 edit operations across 8 semantic categories, covering diverse transformations that range from low-level color adjustments to high-level object, scene, and stylistic edits.


🧩 Key Features

| Feature | Description |
| --- | --- |
| Total Samples | ~257K single-turn text–image–edit triplets for SFT; ~56K single-turn text–image (positive)–image (negative) edits for preference learning; ~72K multi-turn text–image–edit sequences for multi-turn applications |
| Source | Open Images |
| Edit Operations | 35 across 8 semantic categories |
| Categories | Pixel & Photometric, Object-Level, Scene Composition, Stylistic, Text & Symbol, Human-Centric, Scale & Perspective, Spatial/Layout |
| Image Resolution | 512–1024 px |
| Prompt Generator | Gemini-2.5-Flash |
| Editing Model | Nano-Banana |
| Self-Evaluation | Automated judging pipeline using Gemini-2.5-Pro for edit quality |

🏗️ Dataset Construction

Pico-Banana-400K is built using a two-stage multimodal generation pipeline:

  1. Instruction Generation
    Each Open Images sample is passed to Gemini-2.5-Flash, which writes concise, natural-language editing instructions grounded in visible content. We also provide short instructions summarized by Qwen-2.5-Instruct-7B. Example:
    { "instruction": "Change the red car to blue." }
  2. Editing + Self-Evaluation
    The Nano-Banana model performs the edit, and Gemini-2.5-Pro then automatically evaluates the result using a structured quality prompt that measures:
      • Instruction Compliance (40%)
      • Editing Realism (25%)
      • Preservation Balance (20%)
      • Technical Quality (15%)
    Only edits scoring above a strict threshold (~0.7) are labeled as successful, forming the main dataset; the remaining ~56K are retained as failure cases for robustness and preference learning.
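
The four weighted criteria above suggest a weighted-sum aggregate compared against the ~0.7 threshold. The following is a minimal sketch under that assumption, not the pipeline's actual scoring code: the README does not state how per-criterion scores are scaled or combined, so the [0, 1] score range, the weighted sum, and the key names are all hypothetical.

```python
# Weights as listed in the README; the criterion key names are hypothetical.
WEIGHTS = {
    "instruction_compliance": 0.40,
    "editing_realism": 0.25,
    "preservation_balance": 0.20,
    "technical_quality": 0.15,
}
THRESHOLD = 0.7  # approximate cutoff quoted in the README


def overall_score(scores: dict[str, float]) -> float:
    """Weighted sum of per-criterion scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def is_successful(scores: dict[str, float], threshold: float = THRESHOLD) -> bool:
    """True when the weighted score is strictly above the threshold."""
    return overall_score(scores) > threshold


# Made-up scores for illustration only.
example = {
    "instruction_compliance": 0.9,
    "editing_realism": 0.8,
    "preservation_balance": 0.6,
    "technical_quality": 0.7,
}
print(round(overall_score(example), 3), is_successful(example))
```

Because Instruction Compliance carries the largest weight, an edit that ignores the instruction is unlikely to pass under this scheme even if it is realistic and technically clean.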

📊 Dataset Statistics

Pico-Banana-400K contains ~400K image editing examples, covering a wide visual and semantic range drawn from real-world imagery.


🧭 Category Distribution

| Category | Description | Percentage |
| --- | --- | --- |
| Object-Level Semantic | Add, remove, replace, or relocate objects | 35% |
| Scene Composition & Multi-Subject | Contextual and environmental transformations | 20% |
| Human-Centric | Edits involving clothing, expression, or appearance | 18% |
| Stylistic | Domain and artistic style transfer | 10% |
| Text & Symbol | Edits involving visible text, signs, or symbols | 8% |
| Pixel & Photometric | Brightness, contrast, and tonal adjustments | 5% |
| Scale & Perspective | Zoom, viewpoint, or framing changes | 2% |
| Spatial / Layout | Outpainting, composition, or canvas extension | 2% |

📂 Data Composition

  • Single-Turn SFT samples (successful edits): ~257K
  • Single-Turn Preference samples (failure cases): ~56K
  • Multi-Turn SFT samples (successful cases): ~72K
  • Gemini-generated instructions: concise, natural, and image-aware
  • Edit coverage: 35 edit types across 8 semantic categories
  • Image diversity: includes humans, objects, text-rich scenes, and more, all drawn from Open Images

🖼️ Visualization

Below are representative examples from different categories:

| Category | Example |
| --- | --- |
| Object-Level | “Replace the red apple with a green one.” |
| Scene Composition | “Add sunlight streaming through the window.” |
| Human-Centric | “Change the person’s expression to smiling.” |
| Text & Symbol | “Uppercase the text on the billboard.” |
| Stylistic | “Convert the image to a Van Gogh painting style.” |

Pico-Banana-400K provides both breadth (diverse edit operations) and depth (quality-controlled multimodal supervision), making it a strong foundation for training and evaluating text-guided image editing models.

🧠 Applications

Pico-Banana-400K serves as a versatile resource for advancing controllable and instruction-aware image editing.
Beyond single-step editing, the dataset enables multi-turn, conversational editing and reward-based training paradigms.
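
For reward-based training, each preference sample pairs a successful edit with a failed one for the same instruction and source image. The sketch below shows one way such a pair could be represented and flattened into a prompt/chosen/rejected record. The class, field names, and file paths are hypothetical and do not reflect the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One preference example; all field names are hypothetical."""

    instruction: str
    source_image: str    # path or URL of the original image
    chosen_image: str    # edit judged successful
    rejected_image: str  # edit judged a failure


def to_training_record(pair: PreferencePair) -> dict:
    """Flatten a pair into a prompt/chosen/rejected record."""
    return {
        "prompt": {"image": pair.source_image, "text": pair.instruction},
        "chosen": pair.chosen_image,
        "rejected": pair.rejected_image,
    }


# Made-up paths for illustration only.
pair = PreferencePair(
    instruction="Change the red car to blue.",
    source_image="source/0001.jpg",
    chosen_image="edits/0001_ok.png",
    rejected_image="edits/0001_fail.png",
)
record = to_training_record(pair)
print(record["prompt"]["text"])
```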

📦 Dataset Download Guide

The Pico-Banana-400K dataset is hosted on Apple’s public CDN.
You can download each component (single-turn, multi-turn, and preference data) using the provided manifest files.


🖼️ 1. Single-Turn Edited Images

Manifest files: sft link and preference link

🖼️ 2. Multi-Turn Edited Images

Manifest file: multi-turn link

🖼️ 3. Source Images

URLs for downloading the source images are provided alongside the edit instructions in sft link, preference link, and multi-turn link. If you hit Flickr's rate limit while downloading images, you can either request a higher rate limit from Flickr or follow the steps below.

Alternatively, download the packed files train_0.tar.gz and train_1.tar.gz from Open Images, then map the extracted images to the URLs we provide. We also provide sample mapping code here. Due to legal requirements, we cannot provide the source image files directly.

    # Install awscli if you don't have it (https://aws.amazon.com/cli/)

    # Download Open Images packed files
    aws s3 --no-sign-request --endpoint-url https://s3.amazonaws.com cp s3://open-images-dataset/tar/train_0.tar.gz .
    aws s3 --no-sign-request --endpoint-url https://s3.amazonaws.com cp s3://open-images-dataset/tar/train_1.tar.gz .

    # Create folder for extracted images
    mkdir openimage_source_images

    # Extract the tar files
    tar -xvzf train_0.tar.gz -C openimage_source_images
    tar -xvzf train_1.tar.gz -C openimage_source_images

    # Download metadata CSV (ImageID ↔ OriginalURL mapping)
    wget https://storage.googleapis.com/openimages/2018_04/train/train-images-boxable-with-rotation.csv

    # Map urls to local paths
    python map_openimage_url_to_local.py  # please modify variable is_multi_turn and file paths as needed
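
To illustrate the mapping step, the sketch below builds a URL-to-local-path lookup from the metadata CSV. It is not the repository's map_openimage_url_to_local.py. The function name is hypothetical, and it assumes the CSV exposes ImageID and OriginalURL columns and that each extracted image is stored as <ImageID>.jpg directly under the extraction folder.

```python
import csv
import io
from pathlib import Path


def build_url_to_path(csv_file, image_root: Path) -> dict[str, Path]:
    """Map each OriginalURL to the expected local path of its extracted image."""
    mapping = {}
    for row in csv.DictReader(csv_file):
        # Assumed layout: <image_root>/<ImageID>.jpg. Adjust this if the tar
        # files unpack into subfolders such as train_0/.
        mapping[row["OriginalURL"]] = image_root / f"{row['ImageID']}.jpg"
    return mapping


# Tiny in-memory stand-in for the metadata CSV (made-up rows).
sample_csv = io.StringIO(
    "ImageID,OriginalURL\n"
    "0001abc,https://example.com/a.jpg\n"
    "0002def,https://example.com/b.jpg\n"
)
url_to_path = build_url_to_path(sample_csv, Path("openimage_source_images"))
print(url_to_path["https://example.com/a.jpg"])
```

With the real metadata, pass a file handle opened with `open("train-images-boxable-with-rotation.csv", newline="", encoding="utf-8")` in place of the in-memory sample, then look up each source URL from the manifests in the resulting dictionary.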

🧩 License

Pico-Banana-400K is released under the Creative Commons Attribution–NonCommercial–NoDerivatives (CC BY-NC-ND 4.0) license.

  • ✅ Free for research and non-commercial use
  • ❌ Commercial use and derivative redistribution are not permitted
  • 🖼️ Source images follow the Open Images (CC BY 2.0) license

By using this dataset, you agree to comply with the terms of both licenses.

📘 Citation

If you use 🍌 Pico-Banana-400K in your research, please cite it as follows:

    @misc{qian2025picobanana400klargescaledatasettextguided,
      title={Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing},
      author={Yusu Qian and Eli Bocek-Rivele and Liangchen Song and Jialing Tong and Yinfei Yang and Jiasen Lu and Wenze Hu and Zhe Gan},
      year={2025},
      eprint={2510.19808},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.19808},
    }
