InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

Beihang University · Shanghai AI Laboratory

Example results (left: Stable Diffusion, right: ours).


"A squirrel with a hat."

"A rabbit and a turtle."

"A red backpack and a black suitcase."

InitNO

Our investigation explores a range of random noise configurations and their influence on the generated results. Notably, when different noises are fed into Stable Diffusion under identical text prompts, there are marked discrepancies in how well the generated images align with the given text. Unsuccessful cases are delineated by gray contours, while successful instances are indicated by yellow contours. This observation underscores the pivotal role of the initial noise in determining the success of the generation process.
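
A small sketch of the experiment described above, using the Hugging Face diffusers API: the same prompt is rendered with different random seeds, so only the initial noise changes between images. The model id, seeds, and file names are illustrative choices, not values taken from the paper.

import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint (any SD 1.x checkpoint would do here).
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "A rabbit and a turtle."
for seed in [0, 1, 2, 3]:
    # Fixing the generator seed fixes the initial noise fed to the sampler.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"seed_{seed}.png")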


Based on this insight, we divide the initial noise space into valid and invalid regions. We introduce Initial Noise Optimization (InitNO), indicated by the orange arrows, which guides any initial noise into the valid region, thereby synthesizing high-fidelity results (orange contours) that precisely match the given prompt. The same location corresponds to the same random seed.
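
A minimal sketch of this idea: gradient-based optimization of the initial noise against a differentiable score that judges whether a candidate noise lies in the valid region. The score function, the mu/sigma parameterization, and the KL weight below are illustrative assumptions to make the loop concrete; they are not the authors' implementation.

import torch

def noise_score(latent: torch.Tensor) -> torch.Tensor:
    """Hypothetical placeholder: lower values mean the noise is more likely to
    yield an image faithful to the prompt (e.g. computed from the diffusion
    model's attention maps). Not part of this sketch."""
    raise NotImplementedError

def optimize_initial_noise(shape=(1, 4, 64, 64), steps=50, lr=1e-2, device="cuda"):
    # Parameterize the noise as mu + sigma * eps so the optimized noise can be
    # kept close to the standard Gaussian the diffusion model was trained on.
    mu = torch.zeros(shape, device=device, requires_grad=True)
    log_sigma = torch.zeros(shape, device=device, requires_grad=True)
    eps = torch.randn(shape, device=device)  # noise drawn from the original random seed
    opt = torch.optim.Adam([mu, log_sigma], lr=lr)

    for _ in range(steps):
        sigma = log_sigma.exp()
        latent = mu + sigma * eps
        # Alignment term plus a KL penalty that discourages drifting away from
        # N(0, I); the 0.1 weighting is an assumed value, not taken from the paper.
        kl = 0.5 * (mu.pow(2) + sigma.pow(2) - 1.0 - 2.0 * log_sigma).mean()
        loss = noise_score(latent) + 0.1 * kl
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        return mu + log_sigma.exp() * eps  # optimized noise, handed to the sampler

The returned tensor replaces the original initial latent when running the standard sampling loop, so the rest of the generation pipeline is left untouched.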

Results

Qualitative comparison. Each image is generated with the same text prompt and random seed for all methods. The subject tokens are underlined. Our method shows excellent alignment with the text prompts while maintaining a high level of realism.


Results with Complex Text Prompts

Qualitative comparison with complex text prompts. Each image is generated with the same text prompt and random seed for all methods. The subject tokens are underlined.


Citation

@inproceedings{guo2024initno,
    title     = {InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization},
    author    = {Guo, Xiefan and Liu, Jinlin and Cui, Miaomiao and Li, Jiankai and Yang, Hongyu and Huang, Di},
    booktitle = {CVPR},
    year      = {2024}
}

Related Work

1. Chefer et al., "Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models", SIGGRAPH 2023
2. Esser et al., "Taming Transformers for High-Resolution Image Synthesis", CVPR 2021