Stable Diffusion Glossary

Confused about a term in Stable Diffusion? You are not alone, and we are here to help. This page has all the key terms you need to know in Stable Diffusion. You will also find links to in-depth articles.

Search a keyword on this page with Ctrl+F (Windows) or Cmd+F (Mac).


4x-UltraSharp

4x-UltraSharp is an AI upscaler that produces sharp images. It is popular among Stable Diffusion users.

AI upscaler

An AI upscaler is an AI model that enlarges an image while adding details.

Ancestral sampler

An ancestral sampler adds noise to the image at each sampling step. They are stochastic samplers because the sampling outcome has some randomness to it. They usually have a standalone letter “a” in their name. E.g. Euler a.
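As a sketch of the idea (loosely following the k-diffusion formulation of Euler a; the variable names and values here are illustrative, not the exact implementation):

```python
import numpy as np

def euler_ancestral_step(x, sigma, sigma_next, denoised, rng):
    """One Euler-ancestral step: move toward the denoised image,
    then re-inject fresh noise so the noise level matches sigma_next."""
    # Split sigma_next into a deterministic part and a fresh-noise part.
    sigma_up = min(sigma_next,
                   (sigma_next**2 * (sigma**2 - sigma_next**2) / sigma**2) ** 0.5)
    sigma_down = (sigma_next**2 - sigma_up**2) ** 0.5
    d = (x - denoised) / sigma            # estimated derivative dx/dsigma
    x = x + d * (sigma_down - sigma)      # deterministic Euler step
    return x + rng.standard_normal(x.shape) * sigma_up  # the "ancestral" noise
```

Because fresh noise is added at every step except the last (where sigma_next is 0), running the same prompt twice with different random seeds gives different images.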


AnimateDiff

AnimateDiff is a text-to-video method for Stable Diffusion. It uses a motion control model to influence a Stable Diffusion model to generate a video as a sequence of images with motion.

Anything v3

Anything v3 is a celebrated anime-style Stable Diffusion model. It is a Stable Diffusion v1.5 model.


AUTOMATIC1111

AUTOMATIC1111 is a popular open-source, community-developed user interface for Stable Diffusion. AUTOMATIC1111 is the name of the user who started the project. The official project name is Stable Diffusion Web UI.


Civitai

Civitai is a website that hosts a large number of Stable Diffusion models. You can use the AUTOMATIC1111 extension Civitai Helper to facilitate the download.

Compared to Hugging Face, Civitai specializes in Stable Diffusion models. You can see many user-generated images there.

CFG scale

The Classifier-Free Guidance (CFG) scale controls how much the prompt should be followed in txt2img and img2img.
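Under the hood, the model is run twice per step, with and without the prompt, and the two noise predictions are blended. A minimal sketch with toy values:

```python
import numpy as np

def cfg_combine(noise_cond, noise_uncond, cfg_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward (and past) the prompt-conditioned one."""
    return noise_uncond + cfg_scale * (noise_cond - noise_uncond)

# Toy 1-value "noise predictions": the prompt pulls the prediction toward 1.0.
cond = np.array([1.0])
uncond = np.array([0.2])
# Scale 1 reproduces the conditional prediction; scale 7 pushes much
# further in the direction the prompt suggests.
```

A scale of 0 ignores the prompt entirely, while very high values follow it so aggressively that images can become over-saturated or distorted.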

Checkpoint model

A checkpoint model is a more precise name for a Stable Diffusion model. It is used to distinguish it from LoRA, textual inversion, and LyCORIS files.


ComfyUI

ComfyUI is a node-based user interface for Stable Diffusion. It is popular among advanced Stable Diffusion users. See installation guide.


ControlNet

ControlNet is a neural network that controls image generation by adding extra conditions. You can use it to control human poses and image compositions. It is a major breakthrough in Stable Diffusion.


DDIM

Denoising Diffusion Implicit Models (DDIM) is one of the first samplers for solving diffusion models.


Deforum

Deforum is a tool for generating videos with Stable Diffusion.

Denoiser/Noise predictor

The denoiser is at the heart of the Stable Diffusion model. It predicts the noise in the image at each sampling step. The sampling method then subtracts it from the image. See how Stable Diffusion works.

Denoising strength

Denoising strength controls how much the image should change in the img2img process. Its value ranges from 0 to 1. 0 means no change. 1 means the input image is completely changed.
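In AUTOMATIC1111-style img2img, denoising strength effectively sets how far into the noise schedule the input image is pushed, which determines how many sampling steps actually run. A rough sketch (an approximation, not the exact implementation):

```python
def img2img_steps(total_steps, denoising_strength):
    """Approximate number of sampling steps img2img runs.

    Strength 0 runs no steps (the image is returned unchanged);
    strength 1 runs all of them (the input is replaced by noise,
    just like txt2img)."""
    return min(total_steps, round(total_steps * denoising_strength))
```

So at 20 sampling steps and strength 0.75, only about 15 denoising steps are applied to the input image.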


Diffusion

Diffusion is an AI image-generation technique that starts with a random noise image and gradually denoises it into a clear image. It is inspired by the Langevin dynamics formulation of the diffusion process in Physics. See: How Stable Diffusion works.

DPM solver

Diffusion Probabilistic Model Solvers (DPM-Solvers) belong to a family of newly developed solvers for diffusion models. See the Sampler article for more details.


Dreambooth

Dreambooth is a training technique to modify a checkpoint model. Needing as few as 5 images, you can use it to inject a person or a style into a model.

A Dreambooth model needs a trigger keyword in the prompt to trigger the injected subject or style.


EMA

EMA stands for Exponential Moving Average. In a Stable Diffusion model, it is the average of the weights over the last training steps. Instead of the weights from the final training step, a checkpoint model often uses the EMA weights to improve stability.
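The update rule itself is simple. A sketch (the decay value here is illustrative; real training typically uses a decay very close to 1, such as 0.999):

```python
def ema_update(ema, weights, decay=0.999):
    """Blend the running average with the latest training weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]

# With decay 0.5 the average moves halfway to the new weights each step:
ema = [0.0]
for _ in range(3):
    ema = ema_update(ema, [1.0], decay=0.5)
# ema is now [0.875]: it approaches 1.0 without jumping there at once.
```

A high decay means each individual training step barely moves the EMA weights, which smooths out the noise in the training process.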


Embedding

An embedding is the product of textual inversion. It is a small file that modifies images with a custom subject or style. You apply an embedding by putting the associated keyword in the prompt or negative prompt.

In Stable Diffusion, embedding is an encoded version of the prompt. It is used in the cross-attention layers of the denoiser to influence the AI image.


Extension

An extension extends the functionality of AUTOMATIC1111 WebUI. For example, ControlNet is implemented through an extension.


Euler

The Euler method is the simplest sampling method for solving a diffusion model.
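In the k-diffusion convention, a single Euler step looks like this (a sketch; `denoised` stands for the model's prediction of the clean image):

```python
def euler_step(x, sigma, sigma_next, denoised):
    """One Euler step: estimate the derivative dx/dsigma from the
    model's denoised prediction, then step linearly to the next sigma."""
    d = (x - denoised) / sigma
    return x + d * (sigma_next - sigma)
```

Stepping all the way to sigma_next = 0 lands exactly on the denoised prediction, which is why the final sampling step produces a clean image.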

Face ID

Face ID is an IP-Adapter model that uses InsightFace to extract accurate facial features. The model then uses it as conditioning to generate images with highly accurate custom faces.


Fooocus

Fooocus is a Stable Diffusion software designed for simplicity. It centers the user experience on prompting and image generation. It’s free and open source.


Heun

Heun’s method is a sampling method that improves on Euler’s method. But it needs to predict the noise twice in each step, so it is twice as slow as Euler.
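A sketch of one Heun step, where `denoise(x, sigma)` stands for a call to the noise-prediction model (hence the two model calls per step):

```python
def heun_step(x, sigma, sigma_next, denoise):
    """Euler predictor plus trapezoidal corrector (Heun's method)."""
    d = (x - denoise(x, sigma)) / sigma
    x_pred = x + d * (sigma_next - sigma)          # Euler predictor
    if sigma_next == 0:
        return x_pred                              # slope undefined at sigma 0
    d2 = (x_pred - denoise(x_pred, sigma_next)) / sigma_next
    return x + (d + d2) / 2 * (sigma_next - sigma) # average the two slopes
```

Averaging the slope at the start and at the predicted end of the step gives a more accurate result per step than Euler, at double the cost.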

Hugging Face

Hugging Face is a website that hosts a large amount of AI models. In addition, they develop tools to help run and host the models. Compared to Civitai, Hugging Face covers all AI models, not just Stable Diffusion.


Hypernetwork

A hypernetwork is a small neural network that modifies the cross-attention module of the U-Net noise predictor. Similar to LoRAs and embeddings, they are small model files used for modifying a checkpoint model.


InstantID

InstantID is a model that uses ControlNet and IP-adapter to copy and stylize a face image.


IP-adapter

The Image Prompt adapter (IP-adapter) is a method to control image generation using an image as a prompt. It is used to generate a similar image.

Karras noise schedule

Karras is a noise schedule proposed by Karras et al. in a paper that studied a unified framework for denoising images in diffusion AI models.
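The schedule interpolates between the maximum and minimum noise levels in sigma^(1/rho) space, which concentrates sampling steps at low noise levels where detail is formed. A sketch with illustrative sigma values:

```python
import numpy as np

def karras_sigmas(n, sigma_min=0.1, sigma_max=10.0, rho=7.0):
    """Noise levels following the schedule of Karras et al. (2022)."""
    ramp = np.linspace(0, 1, n)
    min_inv = sigma_min ** (1 / rho)
    max_inv = sigma_max ** (1 / rho)
    return (max_inv + ramp * (min_inv - max_inv)) ** rho

sigmas = karras_sigmas(10)
# Decreases from sigma_max to sigma_min, with smaller gaps at low noise.
```

With rho = 7, most of the sigma values cluster near sigma_min, so the sampler spends more of its steps refining the nearly-clean image.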


K-diffusion

K-diffusion or K-samplers refer to the sampling methods implemented in Katherine Crowson’s k-diffusion GitHub repository.

Latent diffusion

Latent diffusion refers to a diffusion process in the latent space. For example, Stable Diffusion is a latent diffusion model. See how Stable Diffusion works.


LCM

Latent Consistency Model (LCM) is a new class of Stable Diffusion model trained to generate images in a single step.

Normally, each checkpoint model needs to be trained with the LCM method. LCM LoRA is a LoRA trained with the LCM method. The LoRA can be used with any checkpoint model to speed up generation.


LDM

A Latent Diffusion Model (LDM) is an AI model that performs diffusion in the latent space. See how Stable Diffusion works.


LMS

The Linear Multistep (LMS) method is a method for solving ordinary differential equations. It aims at improving accuracy by cleverly reusing the values of previous time steps. It is one of the available sampling methods in AUTOMATIC1111.
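The simplest example of the idea is the two-step Adams-Bashforth update, which reuses the derivative computed in the previous step instead of calling the model a second time (an illustrative sketch, not AUTOMATIC1111's exact implementation):

```python
def ab2_step(x, h, f_now, f_prev):
    """Two-step Adams-Bashforth: extrapolate using both the current
    and the previous derivative values."""
    return x + h * (1.5 * f_now - 0.5 * f_prev)

# With a constant derivative of 1.0, each step of size 0.1 advances
# x by exactly 0.1, just like Euler, but at no extra model cost when
# the derivative is changing.
```

Because the previous derivative is cached rather than recomputed, multistep methods gain accuracy per step without the doubled cost of methods like Heun.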


LoRA

Low-Rank Adaptation (LoRA) is a method to modify a checkpoint model with a small file called a LoRA. LoRAs are used to modify the style or add a special effect to a checkpoint model. See also: How to train LoRA.
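Mathematically, a LoRA stores two small matrices whose product is added to a frozen weight matrix, which is why the file is so much smaller than a checkpoint. A numpy sketch (the sizes and alpha value are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                       # layer width 8, LoRA rank 2
W = rng.standard_normal((d, d))   # frozen checkpoint weight (d*d numbers)
A = rng.standard_normal((r, d))   # trainable down-projection
B = np.zeros((d, r))              # up-projection, initialized to zero
alpha = 4.0

def merge_lora(W, B, A, alpha, r, weight=1.0):
    """Merge a LoRA into a weight matrix: W' = W + weight*(alpha/r)*B@A."""
    return W + weight * (alpha / r) * (B @ A)

# With B still at its zero initialization, the LoRA has no effect yet:
W_merged = merge_lora(W, B, A, alpha, r)
```

The two factors store only 2*d*r numbers instead of d*d, and the `weight` multiplier is what LoRA strength syntax in prompts adjusts.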


LyCORIS

LyCORIS is an improvement to LoRA. It can change more parts of the checkpoint model and therefore change it more. You can train a LyCORIS the same way as a LoRA. See training LoRA.


ModelScope

ModelScope is a text-to-video diffusion model. It generates a short video clip from a text input.

Negative embedding

A negative embedding is an embedding intended to be used in the negative prompt.

Negative Prompt

A negative prompt is a text input to a text-to-image AI model describing what you do NOT want to see in the image. See also: How does negative prompt work.

Noise schedule

The noise schedule dictates how much noise the latent image should have at a sampling step. It is the expected level of noise the sampler tries to get to.


Prompt

A prompt is a text input to a text-to-image AI model describing what you want to see in the image.

Prompt schedule

The prompt schedule is the prompt used at a given sampling step. Stable Diffusion allows the prompt to be different at each sampling step.

Regional prompter

Regional prompter is an extension that allows you to specify different prompts for different regions of the image.

Sampling Method/Sampler

A sampling method or a sampler is a method to denoise an image in Stable Diffusion. It may affect the rendering speed and may have a subtle effect on the final image.

Sampling steps

Sampling steps is the number of steps into which the sampler discretizes the denoising process. A higher number of steps produces a higher-quality result but takes longer. Set it to at least 20.


SD.Next

SD.Next is a free and open-source Stable Diffusion software that you can install locally on your machine. It is derived from AUTOMATIC1111. Many AUTOMATIC1111 extensions can be used with SD.Next.


SDXL

SDXL is an abbreviation of Stable Diffusion XL. It is a Stable Diffusion model with a native resolution of 1024×1024, four times as many pixels as Stable Diffusion v1.5’s 512×512.

SDXL Turbo

SDXL Turbo is an SDXL model trained with the Turbo training method. It can reduce image generation time by about 3x.

Stable Diffusion

Stable Diffusion is a text-to-image AI model that generates images from natural language inputs. It is a latent diffusion model with a frozen language encoder.

Stable Diffusion v1.4

Stable Diffusion v1.4 is the first official release of the Stable Diffusion model. It was released in August 2022. The default image size is 512×512 pixels.

Stable Diffusion v1.5

Stable Diffusion v1.5 was an improvement to v1.4. Although it was not obvious what the improvement was, people moved on to using v1.5 anyway. The default image size is 512×512 pixels.

Stable Diffusion v2

Stable Diffusion v2 is a larger version of the v1 models. The default image size is 768×768. The model follows the prompt more literally, making it harder to prompt. There are two versions of v2 models: v2 and v2.1.

The v2 models are now mostly forgotten. Not many people use them.

Stable Diffusion XL

Stable Diffusion XL is the latest Stable Diffusion model. It produces higher quality and larger images than the Stable Diffusion v1.5 model.

Stable Zero123

Stable Zero123 is a Stable Diffusion model that generates novel views or 3D models of an object.

Textual inversion

Textual inversion is a method to inject a custom subject or style into a checkpoint model. It creates a new keyword to exert the effect. The outcome of textual inversion is called an embedding. It is a small file.

Since an embedding does not modify the checkpoint model, its effect is smaller than Dreambooth, LoRA, and LyCORIS.

Text-to-image (txt2img)

Text-to-image refers to creating an image from a text prompt.

Trigger keyword

The trigger keyword is the keyword used in Dreambooth training. You need to use the trigger keyword in the prompt with a checkpoint model modified using Dreambooth.


VAE

VAE (Variational AutoEncoder) is a neural network that converts the image between the pixel and latent space.
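For Stable Diffusion v1, the VAE downsamples each spatial dimension by a factor of 8 and produces 4 latent channels, so a 512×512 image becomes a much smaller 4×64×64 latent:

```python
def latent_shape(height, width, channels=4, factor=8):
    """Latent tensor shape for Stable Diffusion's VAE
    (v1 models: 4 channels, 8x spatial downsampling)."""
    return (channels, height // factor, width // factor)

print(latent_shape(512, 512))    # (4, 64, 64)
print(latent_shape(1024, 1024))  # SDXL's native size -> (4, 128, 128)
```

Diffusion runs on this small latent tensor rather than on the full pixel grid, which is what makes Stable Diffusion fast enough for consumer hardware.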


U-Net

U-Net is a neural network responsible for predicting noise in each sampling step. It is the most important part of a Stable Diffusion model. Finetuning methods like LoRA and hypernetwork aim at modifying it.

See How Stable diffusion works.


UniPC

UniPC (Unified Predictor-Corrector) is a newer sampler released in 2023. Inspired by ODE solvers’ predictor-corrector method, it can generate high-quality images in 5-10 steps.


Upscaler

An upscaler enlarges an image.
