InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
¹Beihang University   ²Shanghai AI Laboratory
Teaser figure. Example prompts: "A squirrel with a hat.", "A rabbit and a turtle.", "A red backpack and a black suitcase."
InitNO. We investigate how different random noise configurations influence the generated results. When different initial noises are fed into Stable Diffusion (SD) under identical text prompts, there are marked discrepancies in how well the generated images align with the given text. Unsuccessful cases are delineated by gray contours, while successful ones are delineated by yellow contours. This observation underscores the pivotal role of the initial noise in determining the success of the generation process.
Based on this insight, we divide the initial noise space into valid and invalid regions. Our Initial Noise Optimization (InitNO), indicated by the orange arrows, guides any initial noise into the valid region, thereby synthesizing high-fidelity results (orange contours) that precisely match the given prompt. Images at the same position share the same random seed.
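To make the idea concrete, below is a minimal PyTorch sketch of such a noise-optimization loop. It is not the released implementation: extract_token_attention is a hypothetical, toy differentiable surrogate for reading per-token cross-attention scores from SD's UNet, and the final re-standardization only roughly approximates the distribution constraint described in the paper.

import torch

def extract_token_attention(latent, token_ids):
    # Toy differentiable surrogate: one "attention score" per subject token,
    # computed from strided slices of the latent passed through a sigmoid.
    # A real implementation would run a denoising step with the prompt and
    # read the UNet's cross-attention maps for each subject token.
    flat = latent.flatten()
    n = len(token_ids)
    proj = torch.stack([flat[i::n].mean() for i in range(n)])
    return torch.sigmoid(proj)

def optimize_initial_noise(latent, token_ids, steps=100, lr=0.1, target=0.9):
    # Gradient-based search that nudges the initial noise until the weakest
    # subject token's score exceeds `target`, i.e. into the "valid region".
    latent = latent.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        scores = extract_token_attention(latent, token_ids)
        loss = torch.clamp(target - scores.min(), min=0.0)  # hinge on the weakest token
        if loss.item() == 0.0:
            break
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Re-standardize so the optimized tensor still resembles unit Gaussian noise
    # (the paper constrains the noise distribution more carefully than this).
    with torch.no_grad():
        latent = (latent - latent.mean()) / (latent.std() + 1e-8)
    return latent.detach()

if __name__ == "__main__":
    noise = torch.randn(1, 4, 64, 64)                 # SD latent-space noise shape
    valid_noise = optimize_initial_noise(noise, token_ids=[2, 5])
    print(valid_noise.shape, float(valid_noise.mean()), float(valid_noise.std()))

The optimized tensor can then be used in place of the random initial latent in a standard SD sampling loop; the sketch only illustrates the overall optimize-then-denoise structure.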
Results
Qualitative comparison. Each image is generated with the same text prompt and random seed across all methods. Subject tokens are underlined. Our method shows excellent alignment with the text prompts while maintaining a high level of realism.
Results with Complex Text Prompts
Qualitative comparison with complex text prompts. Each image is generated with the same text prompt and random seed across all methods. Subject tokens are underlined.
Citation
@inproceedings{guo2024initno,
  title     = {InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization},
  author    = {Guo, Xiefan and Liu, Jinlin and Cui, Miaomiao and Li, Jiankai and Yang, Hongyu and Huang, Di},
  booktitle = {CVPR},
  year      = {2024}
}
Related Work
1. Chefer et al., "Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models", SIGGRAPH 2023
2. Esser et al., "Taming Transformers for High-Resolution Image Synthesis", CVPR 2021