
FDFO: Finite Difference Flow Optimization

Teaser image

This repository contains the official reference implementation of FDFO, a method for fine-tuning flow-based diffusion models using finite difference gradient estimation. We fine-tune Stable Diffusion 3.5 Medium using reward signals from VLM-based scoring and/or PickScore.
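The core idea, estimating gradients of a black-box reward by finite differences rather than backpropagating through the sampler, can be illustrated on a toy problem. This sketch is not the paper's estimator (FDFO perturbs flow-model parameters and scores with VLM/PickScore rewards); the quadratic `reward`, step sizes, and iteration count below are invented purely for illustration:

```python
import numpy as np

def reward(theta: np.ndarray) -> float:
    # Toy reward: peaked at theta = (1, -2). Stand-in for a black-box
    # scorer such as PickScore, which provides no analytic gradient.
    return -float(np.sum((theta - np.array([1.0, -2.0])) ** 2))

def fd_gradient(f, theta: np.ndarray, eps: float = 1e-4) -> np.ndarray:
    # Central finite differences: perturb each coordinate by +/- eps
    # and difference the reward evaluations. Costs 2 * dim evaluations.
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        grad[i] = (f(theta + e) - f(theta - e)) / (2.0 * eps)
    return grad

theta = np.zeros(2)
for _ in range(200):                 # plain gradient ascent on the estimate
    theta += 0.1 * fd_gradient(reward, theta)
print(theta)  # approaches [1, -2]
```

The appeal for RL post-training is that the reward only ever needs to be evaluated, never differentiated, at the cost of extra forward evaluations per step.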

Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models
David McAllister, Miika Aittala, Tero Karras, Janne Hellsten, Angjoo Kanazawa, Timo Aila, Samuli Laine
https://arxiv.org/abs/2603.12893

Requirements

  • GPUs: At least 4 NVIDIA GPUs, each with at least 48 GB of memory (tested on A6000, A100, and H200)
  • Python: 3.11

Environment setup

```shell
# Create conda environment
conda create -n fdfo python=3.11
conda activate fdfo

# Install dependencies
pip install torch torchvision torchaudio
pip install transformers accelerate diffusers numpy pillow scipy sentencepiece wandb tyro peft dreamsim

# Log in to HuggingFace (required for SD 3.5)
hf auth login

# Download PickScore prompts (required for training)
mkdir -p prompt_sets
curl -o prompt_sets/pickscore_train.txt https://raw.githubusercontent.com/yifan123/flow_grpo/9745241cf1101407e9b518dd4f10498092150522/dataset/pickscore/train.txt
```

We have tested the code with specific package versions listed here.

Pre-trained checkpoints

We provide the full set of pretrained checkpoints for the 5 representative training runs shown in the paper, hosted on HuggingFace under nvidia/finite-difference-flow-optimization.

To generate a set of images for a given checkpoint, run:

```shell
export CHECKPOINT=https://huggingface.co/nvidia/finite-difference-flow-optimization/tree/main/fdfo-combined-reward-no-cfg/epoch-0000100
python generate.py --checkpoint $CHECKPOINT --out out.jpg
```

This should take a couple of minutes and produce a result like this one.

Training

Sanity check using a single GPU

We recommend first launching a short training run on a single GPU to ensure that the required dependencies are installed and HuggingFace models are in the cache:

python train.py --num-epochs 5 --total-pairs-per-epoch 16

This should take about 10–20 minutes and complete without errors.

Reproducing our results using torchrun multi-GPU

To reproduce the results from our paper, launch the training script using at least 4 GPUs with the default options:

torchrun --nproc_per_node=8 train.py

We have verified that the training script produces correct results with 4, 8, 16, 24, and 32 GPUs, and that it requires no more than 48 GB of VRAM in these cases. Results are expected to be reasonably good by around 100 epochs, which takes about 12 hours to reach using 4 H200 GPUs, or 1.5 hours using 32 H200 GPUs. By default, training continues all the way to 1000 epochs to illustrate the tradeoffs associated with extended RL training.

Outputs

  • Checkpoints: Saved to runs/<run_id>-<run_name>/checkpoints/
  • Logging: W&B logging is available via the --log-wandb flag

Reward configuration

FDFO supports multiple reward signals. Select a preset via --reward-preset:

  • pickscore - PickScore only
  • vlm_alignment - VLM prompt alignment only
  • combined (default) - VLM prompt alignment + PickScore

For example:

torchrun --nproc_per_node=8 train.py --reward-preset pickscore
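How the combined preset merges the two signals is not documented here. Purely as an illustrative assumption (the `combine_rewards` function, equal weights, and batch values below are all made up), one common approach is to z-score each reward over the batch before averaging, so that neither signal's scale dominates:

```python
import numpy as np

def combine_rewards(pickscore: np.ndarray, vlm: np.ndarray) -> np.ndarray:
    # Hypothetical combination: standardize each reward across the batch,
    # then take an equal-weight average.
    def z(x: np.ndarray) -> np.ndarray:
        return (x - x.mean()) / (x.std() + 1e-8)
    return 0.5 * z(pickscore) + 0.5 * z(vlm)

batch_pick = np.array([22.1, 22.6, 21.9, 22.4])   # made-up PickScore values
batch_vlm  = np.array([55.0, 70.0, 60.0, 65.0])   # made-up VLM scores
print(combine_rewards(batch_pick, batch_vlm))
```

A zero-mean combined reward is also convenient for RL updates, since it acts as a per-batch baseline.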

Evaluation

Evaluate a given checkpoint using various metrics:

```shell
export CHECKPOINT=https://huggingface.co/nvidia/finite-difference-flow-optimization/tree/main/fdfo-combined-reward-no-cfg/epoch-0000100

# Training-time rewards
# 8min on 8xH200
# pickscore   22.4704
# vlm_prompt  61.7239
torchrun --nproc_per_node=8 metrics.py --checkpoint $CHECKPOINT \
    --prompt-set pickscore_train --num-prompts 4096 --num-repeats 1 --metrics pickscore vlm_alignment

# External control metrics
# 22min on 8xH200
# clip_h14            35.7743
# clip_l14            28.1932
# dreamsim_diversity  0.4550
# hpsv2               28.2949
torchrun --nproc_per_node=8 metrics.py --checkpoint $CHECKPOINT \
    --prompt-set hpdv2 --num-prompts 3200 --num-repeats 4 --metrics hpsv2 clip_h14 clip_l14 dreamsim_diversity
```

Available metrics

  • CLIP-based: pickscore, hpsv2, clip_h14, clip_l14
  • VLM-based: vlm_alignment (prompt adherence), vlm_quality (professional quality), vlm_photo (photorealism)
  • Diversity: dreamsim_diversity (requires --num-repeats >= 2)
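dreamsim_diversity needs repeated generations per prompt because diversity is defined over pairs of samples. As a hedged illustration only (DreamSim is a learned perceptual distance; here plain cosine distance on toy feature vectors stands in for it), a mean-pairwise-distance aggregation looks like this:

```python
import numpy as np
from itertools import combinations

def pairwise_diversity(embeddings: np.ndarray) -> float:
    # embeddings: (num_repeats, dim) features for repeated generations of
    # one prompt. Diversity here is the mean pairwise cosine distance --
    # an illustrative stand-in for the learned DreamSim distance.
    assert embeddings.shape[0] >= 2, "need --num-repeats >= 2"
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    dists = [1.0 - float(unit[i] @ unit[j])
             for i, j in combinations(range(len(unit)), 2)]
    return float(np.mean(dists))

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))       # 4 repeats, toy 8-dim features
print(pairwise_diversity(feats))
```

With a single repeat there are no pairs to compare, which is why the metric requires --num-repeats >= 2.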

License

The FDFO source code and pretrained checkpoints are licensed under the NVIDIA Source Code License v1 (Non-Commercial). Copyright © 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

The Stable Diffusion 3.5 Medium Model is licensed under the Stability AI Community License. Copyright © Stability AI Ltd. All rights reserved. Powered by Stability AI.

Development

This is a research reference implementation and is treated as a one-time code drop. As such, we do not accept outside code contributions in the form of pull requests.
