How to run Stable Diffusion 3 locally

You can now run the Stable Diffusion 3 Medium model locally on your machine. At the time of writing, you can use ComfyUI to run SD 3 Medium.


Table of Contents

Software
System requirements
Model
License concern
ComfyUI
Comparisons
Conclusion

Software

We will use ComfyUI, a free AI image and video generator. You can use it on Windows, Mac, or Google Colab.

Think Diffusion provides an online ComfyUI service. They offer an extra 20% credit to our readers.

Read the ComfyUI beginner’s guide if you are new to ComfyUI. See the Quick Start Guide if you are new to AI images and videos.

Take the ComfyUI course to learn how to use ComfyUI step by step.

System requirements

You need a GPU with 12 GB of VRAM to use the full SD3 Medium model. A smaller variant of SD3 Medium (without T5XXL) requires 8 GB of VRAM.

Model

The Stable Diffusion 3 Medium model is the so-called 2B model (i.e., 2 billion parameters).
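As a rough, back-of-the-envelope check of what 2 billion parameters means for memory, assume the weights are stored in 16-bit precision (2 bytes per parameter; this is an assumption, not something stated by Stability):

```latex
\text{weight memory} \approx 2\times10^{9}\ \text{parameters} \times 2\ \tfrac{\text{bytes}}{\text{parameter}} = 4\times10^{9}\ \text{bytes} \approx 3.7\ \text{GiB}
```

This counts the diffusion model's weights only. The downloads in Step 2 are larger because, as their names suggest, they also bundle text encoders (CLIP, plus T5XXL in the full version), and generating an image needs extra memory on top of the weights.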

The SD 3 Medium model is different from the model accessible through the Stable Diffusion 3 API, which is likely the 8B Large model. The SD3 Medium model described in this article is less capable.

See an overview of Stable Diffusion 3 if you are unfamiliar with the model.

License concern

The SD3 model is free for non-commercial use. There are some concerns about what constitutes commercial use. Does using an image generated with the SD3 model in a book you sell count as commercial use?

Disclaimer: I’m not a lawyer. Below is my read of the license file.

The license limits the usage of the model and its “derivative works”.

You may not use the Software Products or Derivative Works to enable third parties to use the Software Products or Derivative Works as part of your hosted service or via your APIs,

So, what are the derivative works? The license clarifies that it does NOT include the images generated by the model.

For clarity, Derivative Works do not include the output of any Model.

In the context of Stable Diffusion, Derivative works mean fine-tuned models.

The license explicitly forbids you from hosting an image-generation service without obtaining a commercial license from them. It should be OK to use the images generated by the model in any way you want as long as you comply with their Acceptable Use Policy.

Stability’s subscription page is less clear. The name Creator License seems to suggest that you should buy the license ($20 per month) if you are an artist or a social media creator.

Stability should clarify the license issue ASAP, given that it relies so heavily on the user community for its success.

ComfyUI

Step 1: Update ComfyUI

The easiest way to update ComfyUI is to use ComfyUI Manager.

Select Manager > Update ComfyUI.

Step 2: Download SD3 model

Download the SD3 model.

SD 3 Medium: 10.1 GB file, needs 12 GB VRAM (alternative download link)
SD 3 Medium without T5XXL: 5.6 GB file, needs 8 GB VRAM (alternative download link)

Put it in ComfyUI > models > checkpoints.
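If the model does not show up in the Load Checkpoint node later, a quick way to confirm the file landed in the right folder is to list that folder from Python. This is a minimal sketch: `list_checkpoints` is a hypothetical helper, and where your ComfyUI folder lives depends on how you installed it.

```python
from pathlib import Path


def list_checkpoints(comfyui_root):
    """Return the checkpoint file names found in ComfyUI > models > checkpoints.

    `comfyui_root` is the top-level folder of your ComfyUI install
    (a placeholder; adjust it to your own setup).
    """
    ckpt_dir = Path(comfyui_root) / "models" / "checkpoints"
    if not ckpt_dir.is_dir():
        raise FileNotFoundError(f"No checkpoints folder at {ckpt_dir}")
    # Keep only model files; .safetensors is the usual format, .ckpt is older.
    return sorted(
        p.name
        for p in ckpt_dir.iterdir()
        if p.is_file() and p.suffix in {".safetensors", ".ckpt"}
    )
```

For example, `print(list_checkpoints("ComfyUI"))` run from the folder that contains your install should list the SD3 file. If the file is there but missing from the node, refreshing ComfyUI (or restarting it) makes it rescan the folder.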

Step 3: Load the workflow

Download the workflow JSON file below and drop it in ComfyUI.

Download

Step 4: Select a model and generate an image

In the Load Checkpoint node, select:

stableDiffusion3SD3_sd3MediumInclT5XXL for the full model. (12 GB VRAM)
stableDiffusion3SD3_sd3MediumInclClips for the model without T5XXL. (8 GB VRAM)
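If you script your runs, the choice above comes down to a VRAM threshold. The sketch below encodes the 12 GB / 8 GB figures from this article and writes the chosen name into the Load Checkpoint node of a workflow saved in ComfyUI's API format (a different JSON layout from the UI workflow file in Step 3). `pick_checkpoint` and `set_checkpoint` are hypothetical helpers, and the `.safetensors` extension on the names is an assumption; match it to the file you actually downloaded.

```python
# Checkpoint names as listed above; the ".safetensors" extension is assumed.
FULL_MODEL = "stableDiffusion3SD3_sd3MediumInclT5XXL.safetensors"
NO_T5_MODEL = "stableDiffusion3SD3_sd3MediumInclClips.safetensors"


def pick_checkpoint(vram_gb):
    """Choose an SD3 Medium variant using the VRAM figures in this article.

    Returns (checkpoint_name, meets_stated_requirement).
    """
    if vram_gb >= 12:
        return FULL_MODEL, True
    # Below 12 GB, fall back to the variant without T5XXL (8 GB stated).
    # Under 8 GB it might still run if ComfyUI offloads models, but that
    # is below the stated requirement, so flag it.
    return NO_T5_MODEL, vram_gb >= 8


def set_checkpoint(workflow, ckpt_name):
    """Set ckpt_name on every Load Checkpoint node of an API-format workflow.

    In the API format each node is a dict with "class_type" and "inputs";
    the Load Checkpoint node's class_type is CheckpointLoaderSimple.
    Returns the number of nodes updated.
    """
    count = 0
    for node in workflow.values():
        if node.get("class_type") == "CheckpointLoaderSimple":
            node["inputs"]["ckpt_name"] = ckpt_name
            count += 1
    return count
```

Reading the card's actual VRAM is left out on purpose; pass in the number you see in your GPU's specs.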

Click Queue Prompt to generate an image.
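Queue Prompt can also be triggered from a script. ComfyUI runs a local web server (by default at 127.0.0.1:8188) that accepts workflows on its `/prompt` endpoint. Below is a minimal sketch using only the Python standard library, assuming the workflow was exported with Save (API Format) (in some ComfyUI versions you first need to enable the dev mode option in the settings). The helper names are hypothetical.

```python
import json
import urllib.request
import uuid


def build_queue_request(workflow, host="127.0.0.1:8188"):
    """Build a POST request that queues an API-format workflow.

    This roughly mirrors what the Queue Prompt button does. Change `host`
    if you launched ComfyUI on a different address or port.
    """
    body = json.dumps({"prompt": workflow, "client_id": str(uuid.uuid4())})
    return urllib.request.Request(
        f"http://{host}/prompt",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_prompt(workflow, host="127.0.0.1:8188"):
    """Send the request to a running ComfyUI and return its JSON reply."""
    with urllib.request.urlopen(build_queue_request(workflow, host)) as resp:
        return json.loads(resp.read())
```

With ComfyUI running, `queue_prompt(json.load(open("workflow_api.json")))` (the file name is a placeholder) queues one generation; the reply should include a `prompt_id` for the queued job, and the image lands in ComfyUI's output folder as usual.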

Image sizes

Here is a list of aspect ratios and the recommended image sizes (the ratios are approximate):
1:1 – 1024 x 1024
5:4 – 1152 x 896
3:2 – 1216 x 832
16:9 – 1344 x 768
21:9 – 1536 x 640
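All five sizes are multiples of 64 and close to one megapixel (1024 × 1024 = 1,048,576 pixels). If you need an aspect ratio that is not in the list, a reasonable approach is to pick the closest listed size. A small hypothetical helper that does this matching:

```python
import math

# Recommended SD3 Medium sizes from the list above, as (width, height).
SIZES = {
    "1:1": (1024, 1024),
    "5:4": (1152, 896),
    "3:2": (1216, 832),
    "16:9": (1344, 768),
    "21:9": (1536, 640),
}


def nearest_size(width, height):
    """Return the listed (width, height) whose aspect ratio is closest.

    Portrait requests (height > width) are matched against the landscape
    list and the result is swapped. Ratios are compared on a log scale so
    the distance reflects relative, not absolute, differences.
    """
    portrait = height > width
    target = math.log(max(width, height) / min(width, height))
    w, h = min(
        SIZES.values(),
        key=lambda wh: abs(math.log(wh[0] / wh[1]) - target),
    )
    return (h, w) if portrait else (w, h)
```

For example, a 1920 × 1080 request maps to 1344 × 768, and a 1080 × 1920 portrait request maps to 768 × 1344.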

Comparisons

Here’s a first look at the models’ performance.

Text generation

Generating legible text is a big improvement in the Stable Diffusion 3 API model. Let’s see if the locally run SD 3 Medium performs equally well.

Prompt:

The words “Stable Diffusion 3 Medium” made with fire and lava. dimly lit background with rocks

Negative Prompt:

disfigured, deformed, ugly

Stable Diffusion 3 Medium:

Stable Diffusion 3 Medium without T5XXL:

Stable Diffusion 3 API:

Unfortunately, the SD 3 Medium model did not generate text as well as the Stable Diffusion 3 API model, which is likely the Large 8B model.

Controlling poses

Stable Diffusion 3 Medium has issues with human anatomy. See the following comparison between SD3 Medium, SDXL, and SD3 API (Large).

Prompt:

Photo of a woman sitting on a chair with both hands above her head, white background

Negative prompt:

disfigured, deformed, ugly, detailed face

Stable Diffusion 3 Medium:

Stable Diffusion 3 Medium without T5XXL:

SDXL:

Stable Diffusion 3 API (Large):

Overall, Stable Diffusion 3 Medium is worse than SDXL at generating correct human poses.

However, the SD3 Medium model is not too bad at generating fingers! This is a nice surprise.

Prompt:

Photo of a woman showing her palm, new york city background

Prompt adherence

Let’s test whether the model can accurately follow the prompt. I will use the following prompt:

Still life painting of a skull above a book, with an orange on the right and an apple on the left

SD3 Medium: 2 out of 3 are correct

SD3 Medium (without T5XXL): 1 out of 3 is correct

SDXL: None is correct

Stable Diffusion 3 API (Large): All are correct

Here’s another example of excellent prompt adherence.

a man and woman are standing together gains a brick wall. The left side of the brick wall is red, right side is gold. the woman is wearing a t-shirt with a panda motif, she has a long skirt with birds on it, the man is wearing a silver suit, he has spiky red hair

I’m pleasantly surprised that SD 3 Medium follows the prompt well and outperforms SDXL. At least something is moving in the right direction!

Conclusion

SD 3 Medium excels at following the prompt closely, which is a big improvement over the SDXL model. While its text generation and human anatomy are a bit disappointing, these defects can likely be corrected by further fine-tuning or by using the SD 3 Large model.

15 comments

Lucas says:

June 20, 2024 at 5:05 pm

I tried it locally and can confirm that a 3060 12GB can handle it, but after running many tests, I thought I’d just downloaded 20GB of junk. Human anatomy is subpar in the 2B model, and it struggles with many other tasks. I noticed some Pony XL finetunes are really good and not complicated to use or train LoRAs on; maybe it’s something you might want to cover.

1. Andrew says:
  
  June 21, 2024 at 8:22 am
  
  I’ve been avoiding Pony because of the subject, but it looks like it can generate more than ponies. Let me take a look.
  
Trekeyus says:

June 19, 2024 at 3:12 pm

Does InvokeAI support it yet?
I use InvokeAI on my 2070 Super and mostly use SDXL. If I remember correctly, the 2070 Super only has 8 GB of VRAM.

1. Andrew says:
  
  June 20, 2024 at 8:04 am
  
  Not yet.
  
Bo says:

June 18, 2024 at 3:36 pm

What about ComfyAI being canned, and SD3 canceled due to various trust issues?

1. Andrew says:
  
  June 19, 2024 at 8:51 am
  
  Not sure what you meant by ComfyAI (ComfyUI?) being canned.
  
  Regarding CivitAI banning SD3, the creator license clearly doesn’t work for many companies, including CivitAI. It costs money to host models and let people download them for free. They will need to get a better deal to make the business work. I will leave my other unpopular opinions to the newsletter.
  
Txmac says:

June 17, 2024 at 8:33 pm

What do we think about running it on an M3 Mac?

Alvaro says:

June 17, 2024 at 7:32 pm

I tested SD3 on my humble laptop with an RTX 4060 (8GB VRAM) and it blows out due to an OutOfMemoryError 🙁

1. Andrew says:
  
  June 17, 2024 at 7:51 pm
  
  Try 1024×1024 with the smaller-size model.
  
Rafael Bravin says:

June 17, 2024 at 10:17 am

Can an Nvidia 2060 with 6GB handle the smaller model? Is it worth trying?

1. Andrew says:
  
  June 17, 2024 at 1:51 pm
  
  It may work if Comfy is smart enough to offload the model before loading the VAE. It’s worth a try.
  
Hello Charlie says:

June 17, 2024 at 6:39 am

I haven’t tested it out, but I read that Ruined Fooocus currently has support for SD3.

1. Andrew says:
  
  June 17, 2024 at 9:03 am
  
  Yeah, it seems to have partial support for it. Another option is Stable Swarm UI https://github.com/Stability-AI/StableSwarmUI
  
Humulin says:

June 16, 2024 at 10:27 am

In step 2, you got the information about the model sizes reversed.
Thanks for the valuable info.

1. Andrew says:
  
  June 16, 2024 at 8:36 pm
  
  Good catch! Corrected.