Are you confused about a term in Stable Diffusion? You are not alone, and we are here to help. This page has all the key terms you need to know in Stable Diffusion. You will also find links to in-depth articles.
Search a keyword on this page with Ctrl+F (Windows) or Cmd+F (Mac).
An AI upscaler is an AI model that enlarges an image while adding details.
An ancestral sampler adds noise to the image at each sampling step. Ancestral samplers are stochastic because the sampling outcome has some randomness to it. They usually have a standalone letter “a” in their name, e.g., Euler a.
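The re-noising that makes ancestral samplers stochastic can be sketched in a few lines. This is a toy scalar example with a made-up denoiser and noise schedule, not the exact Euler a update used in Stable Diffusion:

```python
import random

def ancestral_step(x, denoise, sigma, rng):
    """One toy ancestral step: move toward the denoised estimate,
    then re-inject fresh noise. The re-injected noise is what makes
    the outcome random."""
    x = denoise(x)                         # deterministic denoising move
    x = x + sigma * rng.gauss(0, 1)        # fresh noise at each step
    return x

def sample(seed, steps=10):
    rng = random.Random(seed)
    x = rng.gauss(0, 1)                    # start from pure noise
    sigmas = [1.0 - i / steps for i in range(1, steps + 1)]  # toy schedule ending at 0
    for s in sigmas:
        x = ancestral_step(x, lambda v: 0.9 * v, s, rng)
    return x
```

Because fresh noise is drawn every step, two runs with different seeds produce different results even with identical settings.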
AnimateDiff is a text-to-video method for Stable Diffusion. It uses a motion module to steer a Stable Diffusion model to generate a video as a sequence of frames with coherent motion.
Anything v3 is a celebrated anime-style Stable Diffusion model. It is a Stable Diffusion v1.5 model.
AUTOMATIC1111 is a popular open-source, community-developed user interface for Stable Diffusion. AUTOMATIC1111 is the username of the developer who started the project. The official project name is Stable Diffusion Web UI.
Civitai is a website for sharing Stable Diffusion models. Compared to Hugging Face, Civitai specializes in Stable Diffusion models. You can see many user-generated images there.
ControlNet is a neural network that controls image generation by adding extra conditions. You can use it to control human poses and image compositions. It is a major breakthrough in Stable Diffusion.
Denoising Diffusion Implicit Models (DDIM) is one of the first samplers for solving diffusion models.
Denoising strength controls how much the image changes in the img2img process. A value of 0 keeps the original image unchanged; a value of 1 ignores it completely.
Diffusion is an AI image-generation technique starting with a random image and gradually denoising it to a clear image. It is inspired by the Langevin dynamics formulation of the diffusion process in Physics. See: How Stable Diffusion works.
A Dreambooth model is a checkpoint model fine-tuned with the Dreambooth method to inject a custom subject or style. It needs a trigger keyword in the prompt to invoke the injected subject or style.
EMA stands for Exponential Moving Average. In a Stable Diffusion model, it is a running average of the model weights over recent training steps. Instead of the weights from the last training step alone, a checkpoint model often uses the EMA weights to improve stability.
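As a sketch, EMA keeps a running average that leans mostly on the old average, with a small contribution from each new value. The decay and weight values below are made-up toy numbers:

```python
def ema_update(avg, value, decay=0.999):
    """Exponential moving average: the new average is mostly the old
    average, nudged slightly toward the latest value."""
    return decay * avg + (1 - decay) * value

# Toy run: the EMA of noisy weights is smoother than the raw values.
weights = [1.0, 3.0, 2.0, 4.0, 2.5]
avg = weights[0]
for w in weights[1:]:
    avg = ema_update(avg, w, decay=0.9)
```

The high decay means a single unusual training step barely moves the averaged weights, which is exactly the stabilizing effect mentioned above.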
An embedding is the product of textual inversion. It is a small file for modifying an image. You apply embedding by putting in the associated keyword in the prompt or negative prompt.
The Euler method is the simplest sampling method for solving a diffusion model.
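For intuition, here is Euler's method applied to an ordinary differential equation with a known answer; diffusion samplers solve a similar ODE over noise levels. This is a generic numerical sketch, not sampler code:

```python
import math

def euler(f, y0, t0, t1, steps):
    """Euler's method: step along dy/dt = f(t, y) using only the
    slope at the current point — one evaluation (one noise
    prediction, in the diffusion analogy) per step."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = y + h * f(t, y)
        t = t + h
    return y

# Solve dy/dt = -y with y(0) = 1; the exact answer at t = 1 is e^-1 ≈ 0.3679.
approx = euler(lambda t, y: -y, 1.0, 0.0, 1.0, 100)
```

More steps shrink the error, which mirrors why more sampling steps give a cleaner image.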
Fooocus is a Stable Diffusion software designed for simplicity. It centers the user experience on prompting and image generation. It’s free and open source.
Heun’s method is a sampling method. It is an improvement to Euler’s method. But it needs to predict noise twice in each step, so it is twice as slow as Euler.
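The predict-then-correct structure, with its two slope evaluations per step, looks like this on a toy ODE dy/dt = -y (an illustration of the method, not the sampler's exact code):

```python
import math

def heun(f, y0, t0, t1, steps):
    """Heun's method: predict with an Euler step, then correct by
    averaging the slopes at both ends of the step. Two evaluations
    of f per step — hence roughly twice Euler's cost."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)                  # slope at the start
        k2 = f(t + h, y + h * k1)     # slope at the predicted endpoint
        y = y + h * (k1 + k2) / 2     # corrected step with the average slope
        t = t + h
    return y

# Exact answer at t = 1 is e^-1 ≈ 0.3679; Heun gets close in only 10 steps.
approx = heun(lambda t, y: -y, 1.0, 0.0, 1.0, 10)
```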
Hugging Face is a website that hosts a large amount of AI models. In addition, they develop tools to help run and host the models. Compared to Civitai, Hugging Face covers all AI models, not just Stable Diffusion.
A hypernetwork is a small neural network that modifies the cross-attention module of the U-Net noise predictor. Like LoRAs and embeddings, hypernetworks are small model files used to modify a checkpoint model.
Karras noise schedule
The Karras noise schedule is a way of spacing the noise levels during sampling, proposed in Elucidating the Design Space of Diffusion-Based Generative Models (Karras et al., 2022). It uses smaller noise steps near the end of sampling, which tends to improve image quality. Samplers with “Karras” in their names use this schedule.
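The schedule interpolates between the maximum and minimum noise levels in rho-th-root space, so the gaps between noise levels shrink toward the end. A sketch of the formula; the sigma range and rho below are illustrative defaults, not Stable Diffusion's actual values:

```python
def karras_sigmas(n, sigma_min=0.1, sigma_max=10.0, rho=7.0):
    """Noise levels in the style of the Karras et al. (2022)
    schedule: interpolate linearly in sigma^(1/rho) space, then
    raise back to the rho-th power. Larger rho concentrates more
    steps at low noise."""
    sigmas = []
    for i in range(n):
        t = i / (n - 1)
        s = (sigma_max ** (1 / rho)
             + t * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
        sigmas.append(s)
    return sigmas

sigmas = karras_sigmas(10)  # descends from sigma_max to sigma_min
```

Note how the first few gaps are large and the final gaps are tiny: the sampler takes careful small steps exactly where fine detail is formed.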
K-diffusion or k-samplers refer to the sampling methods implemented in Katherine Crowson’s k-diffusion GitHub repository.
Latent diffusion refers to a diffusion process that runs in the latent space instead of the pixel space. For example, Stable Diffusion is a latent diffusion model. See: How Stable Diffusion works.
A Latent Diffusion Model (LDM) is an AI model that performs diffusion in the latent space. See: How Stable Diffusion works.
The Linear Multistep (LMS) method is a method for solving ordinary differential equations. It improves accuracy by cleverly reusing values from previous time steps. It is one of the available sampling methods in AUTOMATIC1111.
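The idea of reusing earlier steps can be sketched with the classic two-step Adams–Bashforth formula on a toy ODE dy/dt = -y. This illustrates the multistep idea only, not AUTOMATIC1111's exact LMS implementation:

```python
import math

def lms2(f, y0, t0, t1, steps):
    """Two-step linear multistep (Adams–Bashforth): reuse the slope
    from the previous step instead of evaluating f a second time,
    gaining accuracy over Euler at the same cost per step."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    f_prev = f(t, y)
    y = y + h * f_prev                 # bootstrap the first step with Euler
    t += h
    for _ in range(steps - 1):
        f_curr = f(t, y)
        y = y + h * (1.5 * f_curr - 0.5 * f_prev)  # blends current and previous slopes
        f_prev = f_curr
        t += h
    return y

# Exact answer at t = 1 is e^-1 ≈ 0.3679.
approx = lms2(lambda t, y: -y, 1.0, 0.0, 1.0, 10)
```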
Low-Rank Adaptation (LoRA) is a method to modify a checkpoint model with a small file, also called a LoRA. LoRAs are used to change the style of, or add a special effect to, a checkpoint model. See also: How to train LoRA.
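The "low-rank" part is why LoRA files are small: instead of storing a full update to a weight matrix, a LoRA stores two thin matrices whose product is the update. A toy sketch with made-up 2×2 weights, not Stable Diffusion's actual weight shapes:

```python
def matmul(A, B):
    """Naive matrix multiply for small lists-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_apply(W, A, B, scale=1.0):
    """LoRA in miniature: the weight update is the product of a thin
    d×r matrix A and an r×d matrix B (rank r much smaller than d),
    applied as W' = W + scale * A @ B."""
    delta = matmul(A, B)
    return [[w + scale * d for w, d in zip(wr, dr)]
            for wr, dr in zip(W, delta)]

# 2x2 base weights updated by a rank-1 LoRA (A is 2x1, B is 1x2):
# only 4 numbers are stored instead of a full 4-entry update matrix.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0], [2.0]]
B = [[0.5, 0.5]]
W2 = lora_apply(W, A, B)
```

At realistic layer sizes (thousands by thousands) the saving is dramatic, which is why LoRA files are megabytes while checkpoints are gigabytes.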
A negative embedding is an embedding intended to be used in the negative prompt.
Regional prompter is an extension that allows you to specify different prompts for different regions of the image.
A sampling method or a sampler is a method to denoise an image in Stable Diffusion. It may affect the rendering speed and may have a subtle effect on the final image.
Sampling steps is the number of steps into which a sampler discretizes the denoising process. A higher number of steps produces a higher-quality result but takes longer. Set it to at least 20.
SDXL is an abbreviation of Stable Diffusion XL.
Stable Diffusion is a text-to-image AI model that generates images from natural language inputs. It is a latent diffusion model with a frozen language encoder.
Stable Diffusion v1.4
Stable Diffusion v1.4 is the first official release of the Stable Diffusion model. It was released in August 2022. The default image size is 512×512 pixels.
Stable Diffusion v1.5
Stable Diffusion v1.5 was an improvement to v1.4. Although it was not obvious what the improvement was, people moved on to using v1.5 anyway. The default image size is 512×512 pixels.
Stable Diffusion v2
Stable Diffusion v2 is a larger version of the v1 models. The default image size is 768×768. The model follows the prompt more literally, making it harder to prompt. There are two versions of v2 models: v2 and v2.1.
The v2 models are now mostly forgotten. Not many people use them.
Stable Diffusion XL
Stable Diffusion XL is the latest Stable Diffusion model. It produces higher quality and larger images than the Stable Diffusion v1.5 model.
Textual inversion is a method to inject a custom subject or style into a checkpoint model. It creates a new keyword to exert the effect. The outcome of textual inversion is called an embedding. It is a small file.
Text-to-image refers to creating an image from a text prompt.
VAE (Variational AutoEncoder) is a neural network that converts the image between the pixel and latent space.
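For a sense of scale, Stable Diffusion v1's VAE downsamples each spatial dimension by a factor of 8 and encodes 4 latent channels, so a 512×512 image becomes a 64×64×4 latent. The helper below just does that arithmetic:

```python
def latent_shape(width, height, factor=8, channels=4):
    """Shape of the latent tensor for a given pixel image size,
    using Stable Diffusion v1's 8x spatial compression and 4
    latent channels."""
    return (width // factor, height // factor, channels)

shape = latent_shape(512, 512)  # (64, 64, 4)
```

Running diffusion on the much smaller latent tensor is what makes latent diffusion fast enough for consumer GPUs.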
U-Net is the neural network responsible for predicting noise in each sampling step. It is the most important part of a Stable Diffusion model. Fine-tuning methods like LoRA and hypernetworks work by modifying it.
UniPC (Unified Predictor-Corrector) is a newer sampler released in 2023. Inspired by ODE solvers’ predictor-corrector method, it can generate high-quality images in 5-10 steps.
An upscaler enlarges an image.