ShortFT: Diffusion Model Alignment via Shortcut-based Fine-Tuning
Beihang University · ICCV 2025
ShortFT aligns diffusion models with reward functions by enabling end-to-end backpropagation of the reward gradient through the denoising chain.
Our method is highly effective for both text-image alignment and overall image quality (top). It also generalizes across diverse reward functions, substantially improving alignment performance (bottom).
ShortFT
The core of the proposed method is Shortcut-based Fine-Tuning (ShortFT), which uses a trajectory-preserving few-step diffusion model as a shortcut (blue arrow) to enable direct end-to-end backpropagation through the diffusion sampling process. The parameters of the pre-trained diffusion model are then fine-tuned to align it with the reward function.
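A minimal PyTorch-style sketch of this idea, assuming a hypothetical few-step shortcut sampler (shortcut_model), its decode helper, and a differentiable reward_fn; all names and signatures are illustrative and not the authors' released API:

    import torch

    def shortft_step(shortcut_model, reward_fn, prompts, optimizer,
                     num_steps=4):
        """One hypothetical ShortFT update: sample with a few-step
        shortcut model, keep the autograd graph, and backpropagate
        the reward end-to-end through the sampling chain."""
        # Start from pure noise; gradients will flow back through
        # every denoising step.
        x = torch.randn(len(prompts), 4, 64, 64, device="cuda")
        for t in range(num_steps):
            # Few-step denoising: the chain is short enough to
            # backpropagate through directly, without truncation.
            x = shortcut_model(x, t, prompts)
        images = shortcut_model.decode(x)          # latent -> RGB (assumed helper)
        loss = -reward_fn(images, prompts).mean()  # maximize the reward
        optimizer.zero_grad()
        loss.backward()                            # end-to-end gradient via the shortcut
        optimizer.step()
        return loss.item()

In this sketch the optimizer would hold the pre-trained diffusion model's parameters, so the reward gradient obtained through the shortcut directly fine-tunes the base model.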
Results
Each image is generated with the same text prompt and random seed for all methods, and all methods are trained with the same computational cost. Our method outperforms existing methods in both text-image alignment and image quality.
Citation
@inproceedings{guo2025shortft,
  title     = {ShortFT: Diffusion Model Alignment via Shortcut-based Fine-Tuning},
  author    = {Guo, Xiefan and Cui, Miaomiao and Bo, Liefeng and Huang, Di},
  booktitle = {ICCV},
  year      = {2025}
}
Related Work
1. Clark et al., "Directly Fine-Tuning Diffusion Models on Differentiable Rewards", ICLR 2024
2. Wu et al., "Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models", ECCV 2024
3. Xu et al., "ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation", NeurIPS 2023
4. Prabhudesai et al., "Aligning Text-to-Image Diffusion Models with Reward Backpropagation", arXiv 2023