A VAE is a partial update to a Stable Diffusion 1.4 or 1.5 model that improves the rendering of eyes. I will explain what a VAE is, what you can expect from it, where to get one, and how to install and use it.
What is VAE?
VAE stands for variational autoencoder. It is the part of the model that encodes images into a smaller latent space and decodes them back, so that the diffusion computation can run faster.
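To give a feel for the numbers: in Stable Diffusion v1, the VAE downsamples each spatial dimension by a factor of 8 and produces a 4-channel latent, so a 512×512 RGB image becomes a 64×64×4 latent. The sketch below is a toy shape calculation in NumPy, not the real network:

```python
import numpy as np

# Toy stand-in for the VAE: the real encoder/decoder are neural networks.
# Stable Diffusion v1 VAEs downsample each spatial dimension by 8 and
# produce a 4-channel latent.
DOWNSCALE = 8
LATENT_CHANNELS = 4

def encode_shape(h, w, c=3):
    """Shape of the latent produced for an h x w x c image."""
    return (h // DOWNSCALE, w // DOWNSCALE, LATENT_CHANNELS)

image = np.zeros((512, 512, 3))           # a 512x512 RGB image
latent_shape = encode_shape(*image.shape)
print(latent_shape)                       # (64, 64, 4)

# The latent holds far fewer values than the image, which is why
# diffusion in latent space is much cheaper than in pixel space.
compression = image.size / np.prod(latent_shape)
print(compression)                        # 48.0
```

Because the diffusion process runs entirely in this smaller space, only the final decode step touches full-resolution pixels, which is where an improved VAE makes its difference.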
Do I need a VAE?
You don’t need to install a VAE file to run Stable Diffusion. Any model you use, be it v1, v2 or a custom one, already comes with a default VAE.
When people talk about downloading and using a VAE, they mean an improved version of it: the model trainer further fine-tuned the VAE part of the model with additional data. Instead of releasing a whole new model, which is a big file, they release only the small part that was updated.
What is the effect of using VAE?
Usually the effect is pretty small. An improved VAE decodes images from the latent space more accurately, so fine details are better recovered. That helps with rendering eyes and text, where every fine detail matters.
Stability AI released two fine-tuned variants of the VAE decoder, EMA and MSE. EMA (Exponential Moving Average) was released with exponentially averaged weights from fine-tuning, while MSE (Mean Squared Error) was trained further with extra emphasis on the mean-squared-error reconstruction loss.
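To make the MSE name concrete: the metric is simply the mean squared difference between the original image and the VAE’s reconstruction, and lower is better. Here is a minimal NumPy illustration of that metric (an illustration only, not Stability’s training code):

```python
import numpy as np

def mse(original, reconstruction):
    """Mean squared error between an image and its VAE reconstruction.
    Lower values mean the decoder recovered the pixels more faithfully."""
    original = np.asarray(original, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    return np.mean((original - reconstruction) ** 2)

# A perfect reconstruction scores 0; any pixel deviation raises the score.
img = np.full((4, 4, 3), 0.5)
print(mse(img, img))            # 0.0
print(mse(img, img + 0.1))      # approximately 0.01
```

A decoder trained to minimize this error tends to average out small deviations, which is consistent with the smoother look of the MSE variant described below.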
See their comparison reproduced below.
Which one should you use? Stability’s assessment with 256×256 images is that EMA produces sharper images while MSE’s images are smoother. (That matches my own testing.)
In my own testing of Stable Diffusion v1.4 and v1.5 with 512×512 images, I see good improvements to the rendering of eyes in some images, especially when the faces are small. I didn’t see any improvement to text rendering, but I don’t think many people use Stable Diffusion for that purpose anyway.
In no case did the new VAEs perform worse: they either improve the image or do nothing.
Below is a comparison between the original, EMA and MSE VAEs using the Stable Diffusion v1.5 model (the prompt can be found here). Enlarge the images to compare the difference.
Improvements to text generation are not as clear (I added “holding a sign said Stable Diffusion” to the prompt):
You can also use these VAEs with a custom model. I tested with some anime models but didn’t see any improvements. I encourage you to do your own test.
As a final note, the EMA and MSE VAEs are compatible with Stable Diffusion v2.0. You can use them, but the effect is minimal: v2.0 is already very good at rendering eyes. Perhaps the improvement has already been incorporated into the model.
Should I use a VAE?
You don’t need to use a VAE if you are happy with the results you are getting, for example if you are already using a face restoration tool like CodeFormer to fix eyes.
You should use a VAE if you are in the camp of taking all little improvements you can get. You only need to go through the trouble of setting it up once. After that the art creation workflow stays the same.
How to use VAE?
VAEs are ready to use in the Colab Notebook included in the Quick Start Guide.
Currently there are two improved versions of VAE released by Stability. Below are direct download links.
These install instructions apply to the AUTOMATIC1111 GUI. Place the downloaded VAE files in the models/VAE directory under your stable-diffusion-webui installation.
For Linux and Mac OS
For your convenience, running the commands below in the stable-diffusion-webui directory on Linux or Mac OS downloads both VAE files into place.
```
wget https://huggingface.co/stabilityai/sd-vae-ft-ema-original/resolve/main/vae-ft-ema-560000-ema-pruned.ckpt -O models/VAE/vae-ft-ema-560000-ema-pruned.ckpt
wget https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.ckpt -O models/VAE/vae-ft-mse-840000-ema-pruned.ckpt
```
To use a VAE in the AUTOMATIC1111 GUI, go to the Settings tab and find the section called SD VAE (use Ctrl+F if you cannot find it). In the dropdown menu, select the VAE file you want to use.
Press the big red Apply Settings button at the top. You should see the message

Settings: sd_vae applied

in the Settings tab when loading succeeds.
Other options in the dropdown menu are:
- None: Use the original VAE that comes with the model.
- Auto: see this post for the exact behavior. I don’t recommend Auto for beginners, since it makes it easy to lose track of which VAE is in use.
We have gone through how to use the two improved VAE decoders released by Stability AI. They provide a small but noticeable improvement to the rendering of eyes, and you can decide whether to use them.
As for me, I use them because I haven’t seen a case where they harm my images. I hope this article helps!