How to use AI upscaler to improve image details


AI image upscalers like ESRGAN are indispensable tools for improving the quality of images generated by Stable Diffusion. In fact, they are so commonly used that many Stable Diffusion GUIs have built-in support for them.

Here, we will learn about what image upscalers are, how they work, and how to use them.

Why do we need an image upscaler?

The default image size of Stable Diffusion v1 is 512×512 pixels. This is pretty low by today’s standards. Let’s take the iPhone 12 as an example. Its camera produces 12 MP images, that is, 4,032 × 3,024 pixels. Its screen displays 2,532 × 1,170 pixels, so an unscaled Stable Diffusion image would need to be enlarged to fill it and would look low quality.

To complicate matters, a complex scene generated by Stable Diffusion is often not as sharp as it should be. Stable Diffusion often struggles with fine details.

Why can’t we use a traditional upscaler?

You can, but the result won’t be as good.

Traditional algorithms for resizing images, such as nearest neighbor interpolation and Lanczos interpolation, use only the pixel values of the image itself. They enlarge the canvas and fill in the new pixels by performing mathematical operations on those values. If the image itself is corrupted or distorted, these algorithms have no way to fill in the missing information accurately.

How does an AI upscaler work?

In contrast, AI upscalers are models trained with massive amounts of data.

Good-quality images are first artificially corrupted to emulate real-world degradation. The degraded images are then reduced to a smaller size. A neural network model is then trained to recover the original images.

A massive amount of prior knowledge is embedded into the model, so it is capable of filling in the missing information. This is similar to how humans don’t need to study a person’s face in great detail to remember it. We mainly pay attention to a few key features.

Below is a comparison of a traditional upscaler (Lanczos) and an AI upscaler (R-ESRGAN). Because of the knowledge embedded in the AI upscaler, it can upscale the image and recover the details simultaneously.

Compare image recovery between Lanczos (traditional upscaler) and R-ESRGAN (AI upscaler)

How to use an AI upscaler for Stable Diffusion?

We will go through how to use an AI upscaler with the AUTOMATIC1111 GUI for Stable Diffusion.

See my Quick Start Guide for setting up AUTOMATIC1111 GUI.

Go to the Extras tab (I know the name is confusing), and select Single Image.

Upload the image you want to upscale to the source canvas.

Set the Resize factor. Many AI upscalers enlarge images 4 times by default, so 4 is a fine choice. Set it to a lower value, like 2, if you don’t want the image to be that big.

If your image is 512×512 pixels, resizing 2x gives 1024×1024 pixels, and 4x gives 2048×2048 pixels.
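The arithmetic is simple enough to check in a few lines. The helper names below are hypothetical; `factor_to_cover` is an extra convenience that picks the smallest whole-number factor needed to cover a target size such as the iPhone 12 screen mentioned earlier.

```python
import math


def upscaled_size(width, height, factor):
    """Output dimensions after resizing by the given factor."""
    return (width * factor, height * factor)


def factor_to_cover(width, height, target_width, target_height):
    """Smallest whole-number factor whose output covers the target size."""
    return math.ceil(max(target_width / width, target_height / height))


print(upscaled_size(512, 512, 2))  # (1024, 1024)
print(upscaled_size(512, 512, 4))  # (2048, 2048)
print(factor_to_cover(512, 512, 2532, 1170))  # 5
```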

Select R-ESRGAN 4x+, an AI upscaler that works for most images.

Press Generate to start upscaling.

When it is done, the upscaled image will appear in the output window on the right. Right-click on the image to save.
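If you have many images, the same steps can be scripted. The sketch below assumes the web UI was started with the `--api` flag and is reachable at the default local address. The endpoint and field names reflect my understanding of the AUTOMATIC1111 API, so check the `/docs` page of your running instance before relying on them. The function names are made up for this example.

```python
import base64
import json
import urllib.request

# Assumed default address of a local web UI started with the --api flag.
API_URL = "http://127.0.0.1:7860/sdapi/v1/extra-single-image"


def build_upscale_payload(image_bytes, resize_factor=4, upscaler="R-ESRGAN 4x+"):
    """Build a request body that mirrors the Extras tab settings."""
    return {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "upscaling_resize": resize_factor,  # the Resize factor
        "upscaler_1": upscaler,  # name as shown in the GUI dropdown
    }


def upscale_file(input_path, output_path, resize_factor=4, api_url=API_URL):
    """Send one image to the web UI and save the upscaled result."""
    with open(input_path, "rb") as f:
        payload = build_upscale_payload(f.read(), resize_factor)

    request = urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)

    # The response is assumed to carry the upscaled image as base64 text.
    with open(output_path, "wb") as f:
        f.write(base64.b64decode(result["image"]))
```

With the web UI running, a call such as `upscale_file("input.png", "upscaled.png", resize_factor=2)` would correspond to uploading the image, setting Resize to 2, and pressing Generate.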

AI upscaler options

I will go through a few notable options.

LDSR

The Latent Diffusion Super Resolution (LDSR) upscaler was initially released along with Stable Diffusion 1.4. It is a latent diffusion model trained to perform upscaling tasks.

Although it delivers superior quality, it is extremely slow. I don’t recommend it.

ESRGAN 4x

Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN) is an upscaling network that won the 2018 Perceptual Image Restoration and Manipulation challenge. It is an enhancement of the earlier SRGAN model.

It tends to retain fine details and produce crisp and sharp images.

R-ESRGAN 4x

Real-ESRGAN (R-ESRGAN) is an enhancement of ESRGAN and can restore a variety of real-world images. It models various degrees of distortion from the camera lens and digital compression.

Compared to ESRGAN, it tends to produce smoother images.

R-ESRGAN performs best with realistic photo images.

Other Options

For other options, check out the good comparison in this post.

R-ESRGAN is a good choice for photographs or realistic paintings. Anime images require upscalers specifically trained on anime.

Visit the Upscaler model database to download other upscalers.

Installing a new upscaler

To install a new upscaler in the AUTOMATIC1111 GUI, download a model from the upscaler model database and put it in the folder

stable-diffusion-webui/models/ESRGAN
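If you prefer to script the copy step, a small helper like the one below would do it. The function name and the example file name are hypothetical, and `webui_dir` should point to wherever you installed the web UI.

```python
import shutil
from pathlib import Path


def install_upscaler(model_file, webui_dir="stable-diffusion-webui"):
    """Copy a downloaded upscaler model into the web UI's ESRGAN folder."""
    target_dir = Path(webui_dir) / "models" / "ESRGAN"
    # Create the folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 returns the path of the new file inside target_dir.
    return Path(shutil.copy2(model_file, target_dir))
```

For example, `install_upscaler("Downloads/example_upscaler.pth")` would place the file at `stable-diffusion-webui/models/ESRGAN/example_upscaler.pth`.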

Restart the GUI. Your upscaler should now be available for selection. Below is what you should see after installing the Universal Upscaler V2.

Example of upscaled images

Below is an example of a complex scene upscaled using R-ESRGAN. Enlarge and switch between them to observe the difference. Compare them on computer and cell phone screens to see the difference.


