Quiz – text-to-image workflow in ComfyUI

Drag and drop the items into the correct order.

The image is saved to your local storage
The CLIP model encodes the text prompts into embeddings
The random latent image is denoised, conditioned on the prompt embeddings
The VAE decoder converts the latent image into a pixel image
The checkpoint model is loaded