Beginner’s guide to inpainting (step-by-step examples)


No matter how good your prompt and model are, it is rare to get a perfect image in one shot.

Inpainting is an indispensable way to fix small defects. In this post, I will go through a few basic examples to use inpainting for fixing defects.

If you are new to AI images, you may want to read the beginner’s guide first.

This is part 3 of the beginner’s guide series.
Read part 1: Absolute beginner’s guide.
Read part 2: Prompt building.
Read part 4: Models.

Image model and GUI

We will use Stable Diffusion AI and AUTOMATIC1111 GUI. See my quick start guide for setting up in Google’s cloud server.

Basic inpainting settings

In this section, I will show you step-by-step how to use inpainting to fix small defects.

I will use an original image from the Lonely Palace prompt:

[emma watson: amber heard: 0.5], (long hair:0.5), headLeaf, wearing stola, vast roman palace, large window, medieval renaissance palace, ((large room)), 4k, arstation, intricate, elegant, highly detailed

(Detailed settings can be found here.)
Original image

It’s a fine image, but I would like to fix the following issues:

  • The face looks unnatural.
  • The right arm is missing.

Use an inpainting model (optional)

Did you know there is a Stable Diffusion model trained specifically for inpainting? You can use it if you want the best result. But usually, it’s fine to use the same model you generated the image with.

To install the v1.5 inpainting model, download the model checkpoint file and put it in the folder

stable-diffusion-webui/models/Stable-diffusion

In AUTOMATIC1111, press the refresh icon next to the checkpoint selection dropdown at the top left. Select sd-v1-5-inpainting.ckpt to enable the model.
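If you prefer to script the download, here is a minimal sketch using the huggingface_hub library. It assumes the checkpoint is still hosted in the runwayml/stable-diffusion-inpainting repository under this filename; adjust the repo and path if they have moved.

```python
# Sketch: download the v1.5 inpainting checkpoint into the
# AUTOMATIC1111 model folder. Assumes the file is still hosted in the
# runwayml/stable-diffusion-inpainting repo on Hugging Face.
import shutil
from huggingface_hub import hf_hub_download

ckpt = hf_hub_download(
    repo_id="runwayml/stable-diffusion-inpainting",
    filename="sd-v1-5-inpainting.ckpt",
)
shutil.copy(ckpt, "stable-diffusion-webui/models/Stable-diffusion/")
```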

Creating an inpaint mask

In the AUTOMATIC1111 GUI, select the img2img tab and then the Inpaint sub-tab. Upload the image to the inpainting canvas.

inpainting canvas

We will inpaint both the right arm and the face at the same time. Use the paintbrush tool to create a mask. This is the area where you want Stable Diffusion to regenerate the image.

Create a mask using the paintbrush tool.
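You can also prepare the mask outside the GUI. Here is a minimal sketch with Pillow; white marks the area to regenerate, black is kept, and the two regions below are made-up placeholders for the face and the missing right arm.

```python
# Sketch: build a black-and-white inpaint mask with Pillow.
# White pixels mark the region Stable Diffusion will regenerate;
# the coordinates below are hypothetical placeholders.
from PIL import Image, ImageDraw

mask = Image.new("L", (704, 512), 0)            # black = keep
draw = ImageDraw.Draw(mask)
draw.ellipse((300, 60, 420, 200), fill=255)     # face region (example)
draw.rectangle((430, 220, 600, 480), fill=255)  # right-arm region (example)
mask.save("mask.png")
```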

Settings for inpainting

Prompt

You can reuse the original prompt for fixing defects. This works like generating multiple images, but only in a particular area.

Image size

The image size needs to match the original image (704 x 512 in this case).

Face restoration

If you are inpainting faces, you can turn on restore faces. You will also need to select the face restoration model to use in the Settings tab. CodeFormer is a good one.

Be aware that this option can produce unnatural-looking faces. It may also generate something inconsistent with the style of the model.

Mask content

The next important setting is Mask Content.

Select original if you want the result to be guided by the color and shape of the original content. Original is often used when inpainting faces because the general shape and anatomy are OK; we just want the result to look a bit different.

In most cases, you will use Original and change denoising strength to achieve different effects.

You can use latent noise or latent nothing if you want to regenerate something completely different from the original, for example removing a limb or hiding a hand. These options initialize the masked area with something other than the original image, so the result can depart from it completely.

Denoising strength

Denoising strength controls how much the result deviates from the original image. Nothing will change when you set it to 0. You will get an unrelated inpainting when you set it to 1.

0.75 is usually a good starting point. Decrease if you want to change less.

Batch size

Make sure to generate a few images at a time so that you can choose the best ones. Set the seed to -1 so that every image is different.

  • Prompt: (same as original)
  • Sampling steps: 20
  • Seed: -1
  • Image size: 704 x 512
  • Face restoration: CodeFormer
  • Sampling method: Euler a
  • Model: Stable Diffusion v1.5 inpainting
  • Masked content: latent noise or latent nothing
  • Inpaint at full resolution: On
  • Denoising strength: 0.75
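If you run the web UI with the --api flag, the same settings can be submitted programmatically. Below is a minimal sketch against the /sdapi/v1/img2img endpoint; the field names follow the API as exposed by recent versions of AUTOMATIC1111 (check http://127.0.0.1:7860/docs on your instance), and original.png / mask.png are placeholder filenames.

```python
# Sketch: the settings list above expressed as a call to AUTOMATIC1111's
# img2img API (web UI started with the --api flag).
import base64
import requests

def b64(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

payload = {
    "prompt": "[emma watson: amber heard: 0.5], (long hair:0.5), ...",  # same as the original prompt
    "init_images": [b64("original.png")],  # the image to fix
    "mask": b64("mask.png"),               # white = area to regenerate
    "steps": 20,
    "seed": -1,                            # -1 = random seed per image
    "width": 704,
    "height": 512,
    "restore_faces": True,                 # CodeFormer selected in Settings
    "sampler_name": "Euler a",
    "inpainting_fill": 2,                  # 2 = latent noise (3 = latent nothing)
    "inpaint_full_res": True,              # inpaint at full resolution
    "denoising_strength": 0.75,
    "batch_size": 4,                       # generate a few candidates
}

resp = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
for i, img in enumerate(resp.json()["images"]):
    with open(f"inpainted_{i}.png", "wb") as f:
        f.write(base64.b64decode(img))
```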

Inpainting results

Below are some of the inpainted images.

One more round of inpainting

I like the last one, but there’s an extra hand under the newly inpainted arm. Follow similar steps: upload this image and create a mask. Set masked content to latent noise to generate something completely different.

The hand under the arm is removed with the second round of inpainting:

Use inpainting to remove the extra hand under the arm.

And this is my final image.

A side-by-side comparison

Left: original. Right: inpainted 2 times.

Inpainting is an iterative process. You can apply it as many times as you want to refine an image.

See this post for another more extreme example of inpainting.

See the tutorial for removing extra limbs with inpainting.

Adding new objects

Sometimes you want to add something new to the image.

Let’s try adding a hand fan to the picture.

First, upload the image to the inpainting canvas and create a mask around the chest and right arm.

Add the prompt “holding a hand fan” to the beginning of the original prompt. The prompt for inpainting is:

(holding a hand fan: 1.2), [emma watson: amber heard: 0.5], (long hair:0.5), headLeaf, wearing stola, vast roman palace, large window, medieval renaissance palace, ((large room)), 4k, arstation, intricate, elegant, highly detailed

Adding new objects to the original prompt ensures consistency in style. You can adjust the keyword weight (1.2 above) to make the fan show up.

Set masked content to latent noise.

Adjust denoising strength and CFG scale to fine-tune the inpainted images.
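In API terms, this step only changes a few fields. Here is a sketch reusing the payload and endpoint from the earlier example; the denoising strength and CFG scale below are plausible starting points, not the exact values behind the final image.

```python
# Sketch: the hand-fan step as a delta on the earlier API payload.
# Reuses `payload` and `requests` from the previous sketch.
payload.update({
    "prompt": "(holding a hand fan: 1.2), [emma watson: amber heard: 0.5], "
              "(long hair:0.5), headLeaf, wearing stola, vast roman palace, "
              "large window, medieval renaissance palace, ((large room)), "
              "4k, arstation, intricate, elegant, highly detailed",
    "inpainting_fill": 2,        # latent noise
    "denoising_strength": 0.75,  # fine-tune to taste
    "cfg_scale": 7,              # fine-tune to taste
})
resp = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
```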

After some experimentation, our mission is accomplished:

Adding a hand fan with inpainting.

Explanation of inpainting parameters

Denoising strength

Denoising strength controls how much respect the final image should pay to the original content. Setting it to 0 changes nothing. Setting it to 1 gives you an unrelated image.

Set to a low value if you want small change and a high value if you want big change.

The effect of changing the denoising strength.

CFG scale

Similar to usage in text-to-image, the Classifier Free Guidance scale is a parameter to control how much the model should respect your prompt.

  • 1 – Mostly ignore your prompt.
  • 3 – Be more creative.
  • 7 – A good balance between following the prompt and freedom.
  • 15 – Adhere more to the prompt.
  • 30 – Strictly follow the prompt.

Masked content

Masked content controls how the masked area is initialized.

  • Fill: Initialize with a highly blurred version of the original image.
  • Original: Unmodified.
  • Latent noise: The masked area is initialized with fill, and random noise is added to the latent space.
  • Latent nothing: Like latent noise, except no noise is added to the latent space.

Below is the initial masked content before any sampling steps. This gives you some idea of what each option does.

Masked content.
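You can reproduce this kind of comparison yourself by running each masked-content mode with a single sampling step, which shows the initialization before it converges. A sketch reusing the payload from the earlier API example (same endpoint and field-name assumptions as before):

```python
# Sketch: visualize how each masked-content mode initializes the masked
# area by running the img2img API with a single sampling step.
# Reuses `payload` from the earlier sketch.
import base64
import requests

fills = {0: "fill", 1: "original", 2: "latent_noise", 3: "latent_nothing"}
for code, name in fills.items():
    payload.update({"inpainting_fill": code, "steps": 1, "batch_size": 1})
    resp = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
    with open(f"init_{name}.png", "wb") as f:
        f.write(base64.b64decode(resp.json()["images"][0]))
```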

Tips for inpainting

Successful inpainting requires patience and skill. Here are some takeaways for using inpainting:

  • Inpaint one small area at a time.
  • Keeping masked content at Original and adjusting the denoising strength works 90% of the time.
  • Play with masked content to see which one works best.
  • If nothing works well within AUTOMATIC1111’s settings, use photo editing software like Photoshop or GIMP to paint the area of interest with the rough shape and color you want (see the sketch below). Upload that image and inpaint with masked content set to Original.
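As a stand-in for the manual painting step in the last tip, the sketch below stamps a rough shape and color onto the image with Pillow; the coordinates and color are made-up placeholders for whatever you would actually paint by hand.

```python
# Sketch: paint a rough shape/color into the image before inpainting
# with masked content set to "Original". Coordinates and color are
# hypothetical; in practice you would paint this in an image editor.
from PIL import Image, ImageDraw

img = Image.open("original.png").convert("RGB")
draw = ImageDraw.Draw(img)
draw.ellipse((450, 250, 560, 330), fill=(210, 170, 130))  # rough skin-toned blob
img.save("rough_guide.png")
```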

Check out the Stable Diffusion Course for a step-by-step guided course.

Or continue to part 4 below.

This is part 3 of the beginner’s guide series.
Read part 1: Absolute beginner’s guide.
Read part 2: Prompt building.
Read part 4: Models.


By Andrew

Andrew is an experienced engineer with a specialization in Machine Learning and Artificial Intelligence. He is passionate about programming, art, photography, and education. He has a Ph.D. in engineering.

17 comments

  1. Hello, thanks for the tutorial. I just loaded your sample image, masked it using the brush, and added a prompt. Then the error below happened:
    NansException: A tensor with all NaNs was produced in Unet. This could be either because there’s not enough precision to represent the picture, or because your video card does not support half type. Try setting the “Upcast cross attention layer to float32” option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.

  2. Any idea why, when I select inpainting from the dropdown list, it never loads? It counts up in seconds but entirely locks my PC, to the point that after 5 minutes or more I reboot. I get the same when selecting the refiner, too; I let that count to over 3000 seconds before hard rebooting. Thanks in advance 🙂

  3. I really like this website. It provides deep insight with simple explanations. My favorite chapters are basic prompting, ControlNet, and regional prompter. Without Andrew and team I would not have understood it.
    Now it’s my turn to share something, especially about inpainting. I think we need to consider combining inpainting with ControlNet, so we can tell SD what the inpainted part should look like. Based on my experience, lineart is a good choice. We just need to draw some white line segments or curves and upload it to ControlNet.
    I made an example of inpainting a bad hand here:
    https://powerpointopenposeeditor.wordpress.com/2023/07/03/chapter-8-correcting-band-hand-with-inpainting-and-lineart/

  4. No matter what I do, I just get a blurred area if I select “latent noise”. Original + restore faces does nothing to the final image. I don’t understand what’s going on.

    1. I typically select original. It sounds like you didn’t set the denoising strength high enough. Set it to 0.75 as a starting point and adjust accordingly.

  5. You said to select latent noise for removing the hand. How is that supposed to work? Latent noise just added lots of weird pixelated blue dots in the masked area on top of the extra hand, and that was it. It just makes the whole image look worse than before. This tutorial needs to explain more about what to do if you get oddly colorful pixels in place of the extra hand when you select latent noise.

    1. Hi, the “oddly colorful pixels” for latent noise were for illustration purposes only. They were obtained by setting the sampling steps to 1. In practice, you set it to a higher value like 25, so that the random colorful pixels converge to a nice image.

      Alternatively, you can use “original” but increase denoising strength.

      1. Hi Andrew,

        Thanks for your clarification. I followed your instructions and this example, and it didn’t remove the extra hand at all. It just added more pixels on top of it. I tried both latent noise and original, and it doesn’t make any difference. I can’t see how you achieved this in two steps when I tried doing it 135 times and it got worse and worse (basically, the AI got dumber and dumber every time I repeated the step, in my impression).

        I am lost. Maybe it’s worthwhile to proofread this tutorial, because I feel that there is a missing step or two? Thanks for your help/clarification.

        1. Hi Peter, the method should work in the majority of cases, and I am happy to revise the post to make it clearer. If you don’t mind, could you send me an image and prompt that don’t work, so I understand where the pain point is? [email protected]
