Are you confused about a term in Stable Diffusion? You are not alone, and we are here to help. This page has all the key terms you need to know in Stable Diffusion. You will also find links to in-depth articles.
Search a keyword on this page with Ctrl+F (Windows) or Cmd+F (Mac).
An AI upscaler is an AI model that enlarges an image while adding details.
An ancestral sampler adds noise to the image at each sampling step. Ancestral samplers are stochastic because the sampling outcome has some randomness to it. They usually have a standalone letter “a” in their name, e.g., Euler a.
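The re-noising that makes ancestral samplers stochastic can be sketched in a few lines. This is a toy scalar example with a made-up denoiser and noise schedule, not the exact Euler a update used in Stable Diffusion:

```python
import random

def ancestral_step(x, denoise, sigma, rng):
    """One toy ancestral step: move toward the denoised estimate,
    then re-inject fresh noise. The re-injected noise is what makes
    the outcome random."""
    x = denoise(x)                         # deterministic denoising move
    x = x + sigma * rng.gauss(0, 1)        # fresh noise at each step
    return x

def sample(seed, steps=10):
    rng = random.Random(seed)
    x = rng.gauss(0, 1)                    # start from pure noise
    sigmas = [1.0 - i / steps for i in range(1, steps + 1)]  # toy schedule ending at 0
    for s in sigmas:
        x = ancestral_step(x, lambda v: 0.9 * v, s, rng)
    return x
```

Because fresh noise is drawn every step, two runs with different seeds produce different results even with identical settings.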
AnimateDiff is a text-to-video method for Stable Diffusion. It uses a motion module to steer a Stable Diffusion model to generate a video as a sequence of frames with coherent motion.
Anything v3 is a celebrated anime-style Stable Diffusion model. It is a Stable Diffusion v1.5 model.
AUTOMATIC1111 is a popular open-source, community-developed user interface for Stable Diffusion. AUTOMATIC1111 is the username of the developer who started the project. The official project name is Stable Diffusion Web UI.
Civitai is a website for sharing Stable Diffusion models. Compared to Hugging Face, Civitai specializes in Stable Diffusion models. You can see many user-generated images there.
ControlNet is a neural network that controls image generation by adding extra conditions. You can use it to control human poses and image compositions. It is a major breakthrough in Stable Diffusion.
Denoising Diffusion Implicit Models (DDIM) is one of the first samplers for solving diffusion models.
Denoising strength controls how much the image changes in the img2img process. A value of 0 keeps the original image unchanged; a value of 1 ignores it completely.
Diffusion is an AI image-generation technique starting with a random image and gradually denoising it to a clear image. It is inspired by the Langevin dynamics formulation of the diffusion process in Physics. See: How Stable Diffusion works.
A Dreambooth model is a checkpoint model fine-tuned with the Dreambooth method to inject a custom subject or style. It needs a trigger keyword in the prompt to invoke the injected subject or style.
EMA stands for Exponential Moving Average. In a Stable Diffusion model, it is a running average of the model weights over recent training steps. Instead of the weights from the last training step alone, a checkpoint model often uses the EMA weights to improve stability.
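As a sketch, EMA keeps a running average that leans mostly on the old average, with a small contribution from each new value. The decay and weight values below are made-up toy numbers:

```python
def ema_update(avg, value, decay=0.999):
    """Exponential moving average: the new average is mostly the old
    average, nudged slightly toward the latest value."""
    return decay * avg + (1 - decay) * value

# Toy run: the EMA of noisy weights is smoother than the raw values.
weights = [1.0, 3.0, 2.0, 4.0, 2.5]
avg = weights[0]
for w in weights[1:]:
    avg = ema_update(avg, w, decay=0.9)
```

The high decay means a single unusual training step barely moves the averaged weights, which is exactly the stabilizing effect mentioned above.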
An embedding is the product of textual inversion. It is a small file for modifying an image. You apply embedding by putting in the associated keyword in the prompt or negative prompt.
The Euler method is the simplest sampling method for solving a diffusion model.
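For intuition, here is Euler's method applied to an ordinary differential equation with a known answer; diffusion samplers solve a similar ODE over noise levels. This is a generic numerical sketch, not sampler code:

```python
import math

def euler(f, y0, t0, t1, steps):
    """Euler's method: step along dy/dt = f(t, y) using only the
    slope at the current point — one evaluation (one noise
    prediction, in the diffusion analogy) per step."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = y + h * f(t, y)
        t = t + h
    return y

# Solve dy/dt = -y with y(0) = 1; the exact answer at t = 1 is e^-1 ≈ 0.3679.
approx = euler(lambda t, y: -y, 1.0, 0.0, 1.0, 100)
```

More steps shrink the error, which mirrors why more sampling steps give a cleaner image.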
Fooocus is a Stable Diffusion software designed for simplicity. It centers the user experience on prompting and image generation. It’s free and open source.
Heun’s method is a sampling method. It is an improvement to Euler’s method. But it needs to predict noise twice in each step, so it is twice as slow as Euler.
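The predict-then-correct structure, with its two slope evaluations per step, looks like this on a toy ODE dy/dt = -y (an illustration of the method, not the sampler's exact code):

```python
import math

def heun(f, y0, t0, t1, steps):
    """Heun's method: predict with an Euler step, then correct by
    averaging the slopes at both ends of the step. Two evaluations
    of f per step — hence roughly twice Euler's cost."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)                  # slope at the start
        k2 = f(t + h, y + h * k1)     # slope at the predicted endpoint
        y = y + h * (k1 + k2) / 2     # corrected step with the average slope
        t = t + h
    return y

# Exact answer at t = 1 is e^-1 ≈ 0.3679; Heun gets close in only 10 steps.
approx = heun(lambda t, y: -y, 1.0, 0.0, 1.0, 10)
```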
Hugging Face is a website that hosts a large amount of AI models. In addition, they develop tools to help run and host the models. Compared to Civitai, Hugging Face covers all AI models, not just Stable Diffusion.
A hypernetwork is a small neural network that modifies the cross-attention module of the U-Net noise predictor. Like LoRAs and embeddings, hypernetworks are small model files used to modify a checkpoint model.
Karras noise schedule
The Karras noise schedule is a way of spacing the noise levels during sampling, proposed in Elucidating the Design Space of Diffusion-Based Generative Models (Karras et al., 2022). It uses smaller noise steps near the end of sampling, which tends to improve image quality. Samplers with “Karras” in their names use this schedule.
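The schedule interpolates between the maximum and minimum noise levels in rho-th-root space, so the gaps between noise levels shrink toward the end. A sketch of the formula; the sigma range and rho below are illustrative defaults, not Stable Diffusion's actual values:

```python
def karras_sigmas(n, sigma_min=0.1, sigma_max=10.0, rho=7.0):
    """Noise levels in the style of the Karras et al. (2022)
    schedule: interpolate linearly in sigma^(1/rho) space, then
    raise back to the rho-th power. Larger rho concentrates more
    steps at low noise."""
    sigmas = []
    for i in range(n):
        t = i / (n - 1)
        s = (sigma_max ** (1 / rho)
             + t * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
        sigmas.append(s)
    return sigmas

sigmas = karras_sigmas(10)  # descends from sigma_max to sigma_min
```

Note how the first few gaps are large and the final gaps are tiny: the sampler takes careful small steps exactly where fine detail is formed.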
K-diffusion or k-samplers refer to the sampling methods implemented in Katherine Crowson’s k-diffusion GitHub repository.
Latent diffusion refers to a diffusion process that runs in the latent space instead of the pixel space. For example, Stable Diffusion is a latent diffusion model. See: How Stable Diffusion works.
A Latent Diffusion Model (LDM) is an AI model that performs diffusion in the latent space. See: How Stable Diffusion works.
The Linear Multistep (LMS) method is a method for solving ordinary differential equations. It improves accuracy by cleverly reusing values from previous time steps. It is one of the available sampling methods in AUTOMATIC1111.
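The idea of reusing earlier steps can be sketched with the classic two-step Adams–Bashforth formula on a toy ODE dy/dt = -y. This illustrates the multistep idea only, not AUTOMATIC1111's exact LMS implementation:

```python
import math

def lms2(f, y0, t0, t1, steps):
    """Two-step linear multistep (Adams–Bashforth): reuse the slope
    from the previous step instead of evaluating f a second time,
    gaining accuracy over Euler at the same cost per step."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    f_prev = f(t, y)
    y = y + h * f_prev                 # bootstrap the first step with Euler
    t += h
    for _ in range(steps - 1):
        f_curr = f(t, y)
        y = y + h * (1.5 * f_curr - 0.5 * f_prev)  # blends current and previous slopes
        f_prev = f_curr
        t += h
    return y

# Exact answer at t = 1 is e^-1 ≈ 0.3679.
approx = lms2(lambda t, y: -y, 1.0, 0.0, 1.0, 10)
```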
Low-Rank Adaptation (LoRA) is a method to modify a checkpoint model with a small file, also called a LoRA. LoRAs are used to change the style of, or add a special effect to, a checkpoint model. See also: How to train LoRA.
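The "low-rank" part is why LoRA files are small: instead of storing a full update to a weight matrix, a LoRA stores two thin matrices whose product is the update. A toy sketch with made-up 2×2 weights, not Stable Diffusion's actual weight shapes:

```python
def matmul(A, B):
    """Naive matrix multiply for small lists-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_apply(W, A, B, scale=1.0):
    """LoRA in miniature: the weight update is the product of a thin
    d×r matrix A and an r×d matrix B (rank r much smaller than d),
    applied as W' = W + scale * A @ B."""
    delta = matmul(A, B)
    return [[w + scale * d for w, d in zip(wr, dr)]
            for wr, dr in zip(W, delta)]

# 2x2 base weights updated by a rank-1 LoRA (A is 2x1, B is 1x2):
# only 4 numbers are stored instead of a full 4-entry update matrix.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0], [2.0]]
B = [[0.5, 0.5]]
W2 = lora_apply(W, A, B)
```

At realistic layer sizes (thousands by thousands) the saving is dramatic, which is why LoRA files are megabytes while checkpoints are gigabytes.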
A negative embedding is an embedding intended to be used in the negative prompt.
Regional prompter is an extension that allows you to specify different prompts for different regions of the image.
A sampling method or a sampler is a method to denoise an image in Stable Diffusion. It may affect the rendering speed and may have a subtle effect on the final image.
Sampling steps is the number of steps into which a sampler discretizes the denoising process. A higher number of steps produces a higher-quality result but takes longer. Set it to at least 20.
SDXL is an abbreviation of Stable Diffusion XL.
Stable Diffusion is a text-to-image AI model that generates images from natural language inputs. It is a latent diffusion model with a frozen language encoder.
Stable Diffusion v1.4
Stable Diffusion v1.4 is the first official release of the Stable Diffusion model. It was released in August 2022. The default image size is 512×512 pixels.
Stable Diffusion v1.5
Stable Diffusion v1.5 was an improvement to v1.4. Although it was not obvious what the improvement was, people moved on to using v1.5 anyway. The default image size is 512×512 pixels.
Stable Diffusion v2
Stable Diffusion v2 is a larger version of the v1 models. The default image size is 768×768. The model follows the prompt more literally, making it harder to prompt. There are two versions of v2 models: v2 and v2.1.
The v2 models are now mostly forgotten. Not many people use them.
Stable Diffusion XL
Stable Diffusion XL is the latest Stable Diffusion model. It produces higher quality and larger images than the Stable Diffusion v1.5 model.
Textual inversion is a method to inject a custom subject or style into a checkpoint model. It creates a new keyword to exert the effect. The outcome of textual inversion is called an embedding. It is a small file.
Text-to-image refers to creating an image from a text prompt.
VAE (Variational AutoEncoder) is a neural network that converts the image between the pixel and latent space.
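For a sense of scale, Stable Diffusion v1's VAE downsamples each spatial dimension by a factor of 8 and encodes 4 latent channels, so a 512×512 image becomes a 64×64×4 latent. The helper below just does that arithmetic:

```python
def latent_shape(width, height, factor=8, channels=4):
    """Shape of the latent tensor for a given pixel image size,
    using Stable Diffusion v1's 8x spatial compression and 4
    latent channels."""
    return (width // factor, height // factor, channels)

shape = latent_shape(512, 512)  # (64, 64, 4)
```

Running diffusion on the much smaller latent tensor is what makes latent diffusion fast enough for consumer GPUs.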
U-Net is the neural network responsible for predicting noise in each sampling step. It is the most important part of a Stable Diffusion model. Fine-tuning methods like LoRA and hypernetworks work by modifying it.
UniPC (Unified Predictor-Corrector) is a newer sampler released in 2023. Inspired by ODE solvers’ predictor-corrector method, it can generate high-quality images in 5-10 steps.
An upscaler enlarges an image.