Beginner’s Guide to Stable Diffusion Models


Models, sometimes called checkpoint files, are pre-trained Stable Diffusion weights intended for generating general or a particular genre of images.

What images a model can generate depends on the data used to train it. A model won’t be able to generate images of cats if there were no cats in the training data. Likewise, if you train a model only with cat images, it will only generate cats.

We will introduce what models are, cover some common ones (v1.4, v1.5, F222, Anything V3, Open Journey v4), and show how to install, use, and merge them.

This is part 4 of the beginner’s guide series.
Read part 1: Absolute beginner’s guide.
Read part 2: Prompt building.
Read part 3: Inpainting.

Fine-tuned models

What is fine-tuning?

Fine-tuning is a common technique in machine learning. It takes a model trained on a broad dataset and trains it further on a narrower dataset.

A fine-tuned model will be biased toward generating images similar to your dataset while maintaining the versatility of the original model.

Why do people make them?

Stable Diffusion is great but not good at everything. For example, it can and will generate anime-style images with the keyword “anime” in the prompt. But it could be difficult to generate images of a sub-genre of anime. Instead of tinkering with the prompt, you can fine-tune the model with images of that sub-genre.

How are they made?

Two main fine-tuning methods are (1) Additional training and (2) Dreambooth. They both start with a base model like Stable Diffusion v1.4 or v1.5.

Additional training is achieved by training a base model with an additional dataset you are interested in. For example, you can train Stable Diffusion v1.5 with an additional dataset of vintage cars to bias the aesthetic of cars towards the sub-genre.

Dreambooth, initially developed by Google, is a technique to inject custom subjects into text-to-image models. It works with as few as 3-5 custom images. You can take a few pictures of yourself and use Dreambooth to put yourself into the model. A model trained with Dreambooth requires a special keyword to condition the model.

There’s another less popular fine-tuning technique called textual inversion (sometimes called embedding). The goal is similar to Dreambooth: Inject a custom subject into the model with only a few examples. A new keyword is created specifically for the new object. Only the text embedding network is fine-tuned while keeping the rest of the model unchanged. In layman’s terms, it’s like using existing words to describe a new concept.
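As a toy sketch of the idea (not the real Stable Diffusion code), here is textual inversion in miniature: the existing embedding table is frozen, and only the single vector for the new keyword is optimized. All names and the target are made up for illustration.

```python
import numpy as np

# Toy sketch of textual inversion: the model's word embeddings are frozen,
# and only one new vector (for the new keyword) is trained.
vocab_size, dim = 5, 4
rng = np.random.default_rng(0)
frozen_table = rng.normal(size=(vocab_size, dim))  # existing embeddings stay fixed
new_token = np.zeros(dim)                          # the only trainable parameter

# Pretend this direction in embedding space reproduces the new concept.
target = frozen_table[:3].mean(axis=0)

# Plain gradient descent on squared error, touching only new_token.
for _ in range(200):
    grad = 2 * (new_token - target)
    new_token -= 0.1 * grad

print(np.allclose(new_token, target, atol=1e-3))  # True
```

The frozen table never changes; the new keyword simply learns to point at a spot in the embedding space the model already understands, which is why the technique is described as using existing words to describe a new concept.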

Models

There are two groups of models: v1 and v2. I will cover the v1 models in this section and the v2 models in the next section.

There are thousands of fine-tuned Stable Diffusion models. The number is increasing every day. Below is a list of models that can be used for general purposes.

Stable Diffusion v1.4

v1.4 image

Model Page

Download link

Released in August 2022 by Stability AI, the v1.4 model is considered the first publicly available Stable Diffusion model.

You can treat v1.4 as a general-purpose model. Most of the time, it is enough to use it as is unless you are really picky about certain styles.

Stable Diffusion v1.5

v1.5 image.

Model Page

Download link

v1.5 was released in October 2022 by Runway ML, a partner of Stability AI. The model is based on v1.2 with further training.

The model page does not mention what the improvements are. It produces slightly different results compared to v1.4, but it is unclear if they are better.

Like v1.4, you can treat v1.5 as a general-purpose model.

In my experience, v1.5 is a fine choice as the initial model and can be used interchangeably with v1.4.

F222


Download link

F222 was originally trained for generating nudes, but people found it helpful for generating beautiful female portraits with correct body part relations. Interestingly, it’s also quite good at generating aesthetically pleasing clothing.

F222 is good for portraits, but it has a high tendency to generate nudes. Include wardrobe terms like “dress” and “jeans” in the prompt to suppress them.

Find more realistic photo-style models in this post.

Anything V3

Anything v3 model.

Model Page

Download Link

Anything V3 is a special-purpose model trained to produce high-quality anime-style images. You can use danbooru tags (like 1girl, white hair) in the text prompt.

It’s useful for casting celebrities into anime style, which can then be blended seamlessly with illustrative elements.

One drawback (at least to me) is that it produces females with disproportionate body shapes. I like to tone it down by merging it with F222.

Open Journey

Open Journey model.

Model Page

Download link

Open Journey is a model fine-tuned with images generated by Midjourney v4. It has a different aesthetic and is a good general-purpose model.

Triggering keyword: mdjrny-v4 style

Model comparison

Here’s a comparison of these models with the same prompt and seed. All but Anything V3 generate realistic images, each with a different aesthetic.

Compare commonly used models.
Images generated with the same seed and steps.

Best models

There are thousands of Stable Diffusion models available. Many of them are special-purpose models designed to generate a particular style. Where should you start?

Here are some of the best models I keep going back to:

DreamShaper

Dreamshaper model

The Dreamshaper model is fine-tuned for a portrait illustration style that sits between photorealistic and computer graphics. It’s easy to use, and you will like it if you like this style.

Model page

Download link

Deliberate v2

Deliberate v2 is another must-have model (so many!) that renders realistic illustrations. The results can be surprisingly good. Whenever you have a good prompt, switch to this model and see what you get!

Download link

Realistic Vision v2

Realistic Vision v2 is for generating anything realistic. Learn more about generating realistic people.

Model Download link

ChilloutMix

Model Page

ChilloutMix is a special model for generating photo-quality Asian females. It is like the Asian counterpart of F222. Use it with the Korean embedding ulzzang-6500-v1 to generate K-pop-style girls.

Like F222, it sometimes generates nudes. Suppress them with wardrobe terms like “dress” and “jeans” in the prompt, and with “nude” in the negative prompt.

Protogen v2.2 (Anime)

Protogen v2.2 is classy. It generates illustration and anime-style images with good taste.

Protogen v2.2 model page

Download link

GhostMix

Prompt:
beautiful face, long hair, sci-fi girl, mechanical limbs, (machine made joints:1.2), impressionist, highly detailed, an extremely delicate and beautiful, side view, cinematic light,solo,full body,(blood vessels connected to tubes),(mechanical vertebra attaching to back),((mechanical cervial attaching to neck)),(sitting),expressionless,(wires and cables attaching to neck:1.2),(wires and cables on head:1.2)(character focus),science fiction,white background, extreme detailed,colorful,highest detailed

Negative Prompt:
NSFW,monochrome, zombie,overexposure, watermark,text,bad anatomy, distorted, oversized head, ugly, huge eyes, text, logo,(blurry:2.0), bad-artist, cartoon,Scribbles,Low quality,Low rated,Mediocre,3D rendering,Screenshot,Software,UI,watermark,signature

GhostMix is trained in the style of Ghost in the Shell, a classic 1990s anime. You will find it useful for generating cyborgs and robots.

Download link

Waifu-diffusion

Download link

Waifu Diffusion is a Japanese anime-style model.

Inkpunk Diffusion

Inkpunk diffusion

Model Page

Download link

Inkpunk Diffusion is a Dreambooth-trained model with a very distinct illustration style.

Use keyword: nvinkpunk

Finding more models

You can find more models on Hugging Face.

Civitai is another great resource to search for models.

v2 models

Sample 2.1 image.

Stability AI released version 2, a new series of models. So far, the 2.0 and 2.1 models have been released. The main changes in the v2 models are:

  • In addition to 512×512 pixels, a higher resolution version 768×768 pixels is available.
  • You can no longer generate explicit content because pornographic material was removed from the training data.

You may assume that everyone has moved on to the v2 models. However, the Stable Diffusion community found that images looked worse with the 2.0 model. People also had difficulty using power keywords like celebrity names and artist names.

The 2.1 model has partially addressed these issues. The images look better out of the box, and it’s easier to generate artistic styles.

As of now, most people have not completely moved on to the 2.1 model. Many use it occasionally but spend most of their time with the v1 models.

If you decide to try out the v2 models, be sure to check out these tips to avoid some common frustrations.

SDXL model

The SDXL model is an upgrade to the celebrated v1.5 model and the forgotten v2 models. Early testing results are very promising. The latest publicly available version is SDXL 0.9. Unlike the Beta version, you can download and run SDXL 0.9 locally.

The benefits of using the SDXL model are:

  • Higher native resolution – 1024 px compared to 512 px for v1.5
  • Higher image quality (compared to the v1.5 base model)
  • Capable of generating legible text
  • Easy to generate darker images

How to install and use a model

These instructions are only applicable to v1 models. See the instructions for v2.0 and v2.1.

To install a model in the AUTOMATIC1111 GUI, download the checkpoint (.ckpt) file and place it in the following folder:

stable-diffusion-webui/models/Stable-diffusion/
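For example, from the command line this might look like the snippet below. The folder path is the one above; the download URL is a placeholder, not a real link.

```shell
# Place downloaded checkpoints in the webui's model folder (path from the article).
MODEL_DIR="stable-diffusion-webui/models/Stable-diffusion"
mkdir -p "$MODEL_DIR"

# Substitute the model's real download link here (placeholder URL, commented out):
# wget -P "$MODEL_DIR" "https://example.com/some-model.ckpt"

ls "$MODEL_DIR"
```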

Press the reload button next to the checkpoint dropdown box.

You should see the checkpoint file you just added available for selection. Select the new checkpoint file to use the model.

Alternatively, you can press the “iPod” button under Generate.

The model panel will appear. Select the Checkpoints tab and choose a model.

If you are new to AUTOMATIC1111 GUI, some models are preloaded in the Colab notebook included in the Quick Start Guide.

See the SDXL article for using the SDXL model.

Merging two models

Settings for merging two models.

To merge two models using AUTOMATIC1111 GUI, go to the Checkpoint Merger tab and select the two models you want to merge in Primary model (A) and Secondary model (B).

Adjust the multiplier (M) to set the relative weight of the two models. Setting it to 0.5 merges the two models with equal importance.

After pressing Run, the new merged model will be available for use.
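Under the hood, a weighted-sum merge is just an interpolation of the two checkpoints’ weights, key by key. Here is a minimal sketch of the idea, with small numpy arrays standing in for the real tensors (the function name and sample weights are made up for illustration):

```python
import numpy as np

def merge(a, b, m=0.5):
    """Weighted-sum merge: (1 - m) * A + m * B for every weight both models share."""
    return {k: (1 - m) * a[k] + m * b[k] for k in a if k in b}

# Toy "checkpoints": dicts mapping weight names to arrays.
model_a = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
model_b = {"w": np.array([3.0, 4.0]), "b": np.array([2.0])}

merged = merge(model_a, model_b, m=0.5)
print(merged["w"])  # [2. 3.]
print(merged["b"])  # [1.]
```

With m = 0.5 every merged weight sits exactly halfway between the two models, which is why the merged model’s style lands between the styles of A and B.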

Example of a merged model

Here are sample images from merging F222 and Anything V3 with equal weight (0.5):

Compare F222, Anything V3 and Merged (50% each)

The merged model sits between the realistic F222 and the anime Anything V3 styles. It is a very good model for generating illustration art with human figures.

Model variants

On a model download page, you may see several variants of the model.

  • Pruned
  • Full
  • EMA-only
  • FP16
  • FP32
  • .pt
  • .safetensor

This is confusing! Which one should you download?

Pruned, full, EMA-only models

Some Stable Diffusion checkpoint models consist of two sets of weights: (1) The weights after the last training step, and (2) the average weights over the last few training steps called EMA (exponential moving average).

If you are only interested in using the model, you only need the EMA-only model. These are the weights you actually use when you use the model. They are sometimes called pruned models.

You will only need the full model (i.e., a checkpoint file containing both sets of weights) if you want to fine-tune the model with additional training.

So, download the pruned or EMA-only model if you simply want to use it to generate images. This saves you some disk space. Trust me, your hard drive will fill up very soon!

fp16/fp32 models

FP stands for floating point. It is a computer’s way of storing decimal numbers. Here the decimal numbers are the model weights. FP16 takes 16 bits per number and is called half precision. FP32 takes 32 bits and is called full precision.

For deep learning models (such as Stable Diffusion), the training data is pretty noisy. You rarely need full precision when you use the model. The extra precision just stores noise!

So, download the FP16 models if available. They are about half as big. This saves you a few GB!

Safetensor models

The original PyTorch model format is .pt. The downside of this format is that it is not secure: someone can pack malicious code into it, and the code can run on your machine when you load the model.

Safetensors is an improved version of the .pt format. It stores the same weights, but it will not execute any code.

So, download the safetensors version whenever it is available. If not, make sure you download the .pt files from a trustworthy source.
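The danger is inherent to Python’s pickle format, which .pt files are based on. This harmless stdlib demo shows how merely loading a pickle can call an arbitrary function; here it only calls str.upper, but a malicious file could call something like os.system instead.

```python
import pickle

class Payload:
    # __reduce__ tells pickle to call a function of our choosing at load time.
    # A real attack would pick a dangerous function instead of str.upper.
    def __reduce__(self):
        return (str.upper, ("code ran at load time",))

data = pickle.dumps(Payload())

# Just *loading* the file executes the chosen call.
print(pickle.loads(data))  # CODE RAN AT LOAD TIME
```

Safetensors avoids this class of problem entirely because its format contains only raw tensor data and metadata, with no executable instructions.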

Other model types

Four main types of files can be called “models”. Let’s clarify them so you know what people are talking about.

  • Checkpoint models: These are the real Stable Diffusion models. They contain all you need to generate an image. No additional files are required. They are large, typically 2 – 7 GB. They are the subject of this article.
  • Textual inversions: Also called embeddings. They are small files defining new keywords to generate new objects or styles. They are small, typically 10 – 100 KB. You must use them with a checkpoint model.
  • LoRA models: They are small patch files to checkpoint models for modifying styles. They are typically 10-200 MB. You must use them with a checkpoint model.
  • Hypernetworks: They are additional network modules added to checkpoint models. They are typically 5 – 300 MB. You must use them with a checkpoint model.

Summary

In this article, I have introduced what Stable Diffusion models are, how they are made, a few common ones, and how to merge them. Using models can make your life easier when you have a specific style in mind.




By Andrew

Andrew is an experienced engineer with a specialization in Machine Learning and Artificial Intelligence. He is passionate about programming, art, photography, and education. He possesses a Ph.D. in engineering.

37 comments

  1. I read the four “Absolute beginner” posts and I have a question: what is a good choice for the next tutorial to read?

  2. Of course it was not necessary. But it was appreciated. Don’t body shame. There might be women with large breasts reading this. They should feel as welcome and celebrated as anyone else.

    Please try to be more kind and considerate of others in the future. Thank you. Be well.

  3. For some models, when I click on the link, I don’t see a download button. How do I download these models? For example: inkpunk diffusion.

  4. Was it really necessary to use images of women with extensive cleavage to illustrate your post? Does this indicate you think your market is mostly men seeking images of women with exaggerated breasts?

    1. OP did not say anything about women with extensive cleavage in the article. Does this mean that you are assuming the gender of a person just by looking how big their breasts are?

  5. hello i have problem

    C:\stable-diffusion\stable-diffusion1\stable-diffusion-webui>git pull
    error: Your local changes to the following files would be overwritten by merge:
    javascript/hints.js
    requirements.txt
    Please commit your changes or stash them before you merge.
    Aborting
    Updating 5ab7f213..89f9faa6
    venv “C:\stable-diffusion\stable-diffusion1\stable-diffusion-webui\venv\Scripts\Python.exe”
    Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
    Commit hash: 5ab7f213bec2f816f9c5644becb32eb72c8ffb89
    Installing requirements
    Launching Web UI with arguments:
    No module ‘xformers’. Proceeding without it.
    ControlNet v1.1.191
    ControlNet v1.1.191
    Error loading script: img2img.py
    Traceback (most recent call last):
    File “C:\stable-diffusion\stable-diffusion1\stable-diffusion-webui\modules\scripts.py”, line 256, in load_scripts
    script_module = script_loading.load_module(scriptfile.path)
    File “C:\stable-diffusion\stable-diffusion1\stable-diffusion-webui\modules\script_loading.py”, line 11, in load_module
    module_spec.loader.exec_module(module)
    File “”, line 883, in exec_module
    File “”, line 241, in _call_with_frames_removed
    File “C:\stable-diffusion\stable-diffusion1\stable-diffusion-webui\scripts\img2img.py”, line 16, in
    from imwatermark import WatermarkEncoder
    ModuleNotFoundError: No module named ‘imwatermark’

    Error loading script: txt2img.py
    Traceback (most recent call last):
    File “C:\stable-diffusion\stable-diffusion1\stable-diffusion-webui\modules\scripts.py”, line 256, in load_scripts
    script_module = script_loading.load_module(scriptfile.path)
    File “C:\stable-diffusion\stable-diffusion1\stable-diffusion-webui\modules\script_loading.py”, line 11, in load_module
    module_spec.loader.exec_module(module)
    File “”, line 883, in exec_module
    File “”, line 241, in _call_with_frames_removed
    File “C:\stable-diffusion\stable-diffusion1\stable-diffusion-webui\scripts\txt2img.py”, line 14, in
    from imwatermark import WatermarkEncoder
    ModuleNotFoundError: No module named ‘imwatermark’

    Loading weights [c0d1994c73] from C:\stable-diffusion\stable-diffusion1\stable-diffusion-webui\models\Stable-diffusion\realisticVisionV20_v20.safetensors
    Creating model from config: C:\stable-diffusion\stable-diffusion1\stable-diffusion-webui\configs\v1-inference.yaml
    LatentDiffusion: Running in eps-prediction mode
    DiffusionWrapper has 859.52 M params.
    Applying cross attention optimization (Doggettx).
    Textual inversion embeddings loaded(0):
    Model loaded in 8.0s (load weights from disk: 0.3s, create model: 0.5s, apply weights to model: 3.9s, apply half(): 0.7s, move model to device: 0.7s, load textual inversion embeddings: 1.9s).
    Traceback (most recent call last):
    File “C:\stable-diffusion\stable-diffusion1\stable-diffusion-webui\launch.py”, line 353, in
    start()
    File “C:\stable-diffusion\stable-diffusion1\stable-diffusion-webui\launch.py”, line 348, in start
    webui.webui()
    File “C:\stable-diffusion\stable-diffusion1\stable-diffusion-webui\webui.py”, line 302, in webui
    shared.demo = modules.ui.create_ui()
    File “C:\stable-diffusion\stable-diffusion1\stable-diffusion-webui\modules\ui.py”, line 461, in create_ui
    modules.scripts.scripts_txt2img.initialize_scripts(is_img2img=False)
    File “C:\stable-diffusion\stable-diffusion1\stable-diffusion-webui\modules\scripts.py”, line 298, in initialize_scripts
    script = script_class()
    File “C:\stable-diffusion\stable-diffusion1\stable-diffusion-webui\scripts\controlnet.py”, line 209, in __init__
    self.preprocessor = global_state.cache_preprocessors(global_state.cn_preprocessor_modules)
    File “C:\stable-diffusion\stable-diffusion1\stable-diffusion-webui\scripts\global_state.py”, line 22, in cache_preprocessors
    CACHE_SIZE = shared.cmd_opts.controlnet_preprocessor_cache_size
    AttributeError: ‘Namespace’ object has no attribute ‘controlnet_preprocessor_cache_size’
    Press any key to continue . . .

  6. Excuse me, sir. I want to add a new model and VAE, like GhostMix. I have already placed the model and VAE files into “stable-diffusion-webui/models/Stable-diffusion/” and “stable-diffusion-webui/models/VAE/”, but they don’t show up in the web UI.

    1. Did you try the refresh button? It should come up after a restart.
      Also check that you downloaded the file correctly. Model files should be at least 2 GB.

  7. How do you know the trigger word for a specific model? Also, how do you know who is in the model file?

  8. I downloaded F222 and it arrived as a zip file. When I unzip it, do I just use the file called ‘data’ and do I need to rename it?

  9. I’m currently stuck at this point; it’s not downloading.

    Already up to date.
    venv “C:\Users\User\stable-diffusion-webui\venv\Scripts\Python.exe”
    Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
    Commit hash: 22bcc7be428c94e9408f589966c2040187245d81
    Installing requirements for Web UI
    Launching Web UI with arguments: –xformers
    Loading weights [9e2c6ceff3] from C:\Users\User\stable-diffusion-webui\models\Stable-diffusion\f222.ckpt
    Creating model from config: C:\Users\User\stable-diffusion-webui\configs\v1-inference.yaml
    LatentDiffusion: Running in eps-prediction mode
    DiffusionWrapper has 859.52 M params.

  10. Thank you so much, by the way.

    Like a good little geek, I’ve looked at Reddit and such for help. But people throw all of these terms around, that I have no idea what means. Figured it would all make sense eventually, but finding someone to break it all down is a godsend.

  11. Do you have to merge the files? (it all starts getting really big) So say I have small niche LORA file do I have to merge it with a big checkpoint file to use it? Or can I say something in a prompt? Other methods?

  12. When I try to merge models it says: Error merging checkpoints: unhashable type: ‘list’

    am I doing something wrong? I’m trying to merge a dreambooth face I trained, with a model I downloaded online called “midjourneyPapercut_v1.ckpt”

    1. I’m not familiar with the papercut model. Perhaps you can systematically figure out whether it is an issue with the setup or the models.
      1. Merge v1.4 and the Anything V3 model with your setup. It should work.
      2. Merge the dreambooth model with v1.4. If it doesn’t work, the issue is with the dreambooth model.
      3. Do the same for papercut.

  13. Have you tried Dreambooth on a non-SD base model (not 1.4 or 1.5), using f222 as the base instead? My finding is that you can’t get more than 5% of your likeness into one of these already fine-tuned models. I would like to hear about others’ experience.

    1. No, I haven’t tried it. I think models fine-tuned with additional training are less stable since the fine-tuning samples lack diversity. Dreambooth and textual inversion were designed to solve this issue.
