How to use ControlNet with SDXL model

ControlNet SDXL usage guide

Stable Diffusion XL (SDXL) is a brand-new model with unprecedented performance. Because of its larger size, the base model itself can generate a wide range of diverse styles.

What’s better? You can now use ControlNet with the SDXL model!

Note: This tutorial is for using ControlNet with the SDXL model. I won’t repeat the basic usage of ControlNet here. See the ControlNet guide for the basic ControlNet usage with the v1 models.

This guide covers

  • Installing ControlNet for SDXL model.
  • Copying outlines with the Canny Control models.
  • Copying depth information with the depth Control models.
  • Coloring a black and white image with a recolor model.
  • Sharpening a blurry image with the blur control model.
  • Copying an image’s content and style with the Image Prompt Adapter (IP-adapter) model.

Software

AUTOMATIC1111 Web-UI is a free and popular Stable Diffusion software. You can use this GUI on Windows, Mac, or Google Colab.

Check out the Quick Start Guide if you are new to Stable Diffusion.

Installing ControlNet for Stable Diffusion XL on Google Colab

If you use our Stable Diffusion Colab Notebook, select to download the SDXL 1.0 model and ControlNet. That’s it!

Installing ControlNet for Stable Diffusion XL on Windows or Mac

Step 1: Update AUTOMATIC1111

AUTOMATIC1111 WebUI must be version 1.6.0 or higher to use ControlNet for SDXL. You can update the WebUI by running the following commands in PowerShell (Windows) or the Terminal app (Mac).

cd stable-diffusion-webui
git pull

Delete the venv folder and restart WebUI.

Step 2: Install or update ControlNet

You need the latest ControlNet extension to use ControlNet with the SDXL model.

Read the following section if you don’t have ControlNet installed.

Skip to the Update ControlNet section if you already have the ControlNet extension installed but need to update it.

Installing ControlNet

To install the ControlNet extension in AUTOMATIC1111 Stable Diffusion WebUI:

  1. Start AUTOMATIC1111 WebUI normally.
  2. Navigate to the Extensions page.
  3. Click the Install from URL tab.
  4. Enter the following URL in the URL for extension’s git repository field.

https://github.com/Mikubill/sd-webui-controlnet

  5. Wait for the confirmation message that the installation is complete.
  6. Restart AUTOMATIC1111.

You don’t need to update ControlNet after installation. You can skip the next section.
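Alternatively, if you prefer the command line, you can install the extension by cloning its repository into the extensions folder (equivalent to the steps above, assuming the default stable-diffusion-webui folder layout), then restart the WebUI:

cd stable-diffusion-webui/extensions
git clone https://github.com/Mikubill/sd-webui-controlnet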

Updating ControlNet

You need ControlNet version 1.1.400 or higher to use ControlNet with the SDXL model.

To update the ControlNet extension:

  1. In AUTOMATIC1111, go to the Extensions page.
  2. In the Installed tab, click Check for updates.
  3. Click Apply and restart UI. (Note: this will update ALL extensions, which may be necessary to make ControlNet work.)

If AUTOMATIC1111 fails to start, delete the venv folder and start the WebUI.

Step 3: Download the SDXL control models

You can download the ControlNet models for SDXL at the following link.

Download ControlNet Models for SDXL

You can put the models in either of these folders:

stable-diffusion-webui\extensions\sd-webui-controlnet\models

stable-diffusion-webui\models\ControlNet

There are many models. Which ones should you download? You can hold off for now. I will give some guidance when we discuss what each model does.

VRAM settings

If your GPU card has 8 GB to 16 GB VRAM, use the command line flag --medvram-sdxl. You can edit webui-user.bat as

@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--medvram-sdxl --xformers

call webui.bat

If your GPU card has less than 8 GB VRAM, use this instead.

@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--lowvram --xformers

call webui.bat

Canny models

Use the Canny ControlNet to copy the composition of an image.

The Canny preprocessor detects edges in the control image. The Canny control model then conditions the denoising process to generate images with those edges.
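If you want to see what the preprocessor produces outside the WebUI, here is a minimal sketch using OpenCV's Canny edge detector. The file name and thresholds are illustrative; the WebUI exposes the thresholds as sliders in the ControlNet panel.

import cv2

# Load the control image in grayscale and detect edges.
img = cv2.imread("control_image.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 100, 200)          # low/high thresholds, illustrative values
cv2.imwrite("canny_control.png", edges)   # white edges on a black background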

The first problem: There are so many Canny Control models to choose from. Which one should you pick?

  • diffusers_xl_canny_full
  • diffusers_xl_canny_mid
  • diffusers_xl_canny_small
  • kohya_controllllite_xl_canny_anime
  • kohya_controllllite_xl_canny
  • sai_xl_canny_128lora
  • sai_xl_canny_256lora
  • t2i-adapter_xl_canny
  • t2i-adapter_diffusers_xl_canny

Let’s use ControlNet Canny to steal the composition of the following image for a watercolor drawing.

Txt2img Settings

  • Model: SDXL Base 1.0
  • Refiner: None
  • Width: 1216
  • Height: 832
  • CFG Scale: 5
  • Steps: 20
  • Sampler: DPM++ 2M Karras
  • Prompt:

Watercolor painting a young man sitting . Vibrant, beautiful, painterly, detailed, textural, artistic

  • Negative prompt

anime, photorealistic, 35mm film, deformed, glitch, low contrast, noisy

This is the style this prompt produces WITHOUT ControlNet:

The best Control Model should copy the composition without changing the style.

ControlNet settings:

  • Enabled: Yes
  • Pixel Perfect: Yes
  • Preprocessor: Canny
  • Model: Various
  • Control Mode: Balanced
  • Resize Mode: Crop and Resize
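For reference, here is a rough standalone equivalent of these settings using the diffusers library (my assumption; the tutorial itself only uses the AUTOMATIC1111 GUI). The checkpoint IDs are the Hugging Face versions of the diffusers_xl_canny models, and the Control Weight maps to controlnet_conditioning_scale.

import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

# SDXL base model plus the "full" diffusers Canny ControlNet (Hugging Face IDs).
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

canny_image = load_image("canny_control.png")  # edge map from the Canny preprocessor

image = pipe(
    prompt="Watercolor painting a young man sitting . Vibrant, beautiful, painterly, detailed, textural, artistic",
    negative_prompt="anime, photorealistic, 35mm film, deformed, glitch, low contrast, noisy",
    image=canny_image,
    width=1216,
    height=832,
    num_inference_steps=20,
    guidance_scale=5,                    # CFG Scale
    controlnet_conditioning_scale=0.25,  # Control Weight; a low value preserves the style
).images[0]
image.save("watercolor_canny.png")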

Diffusers Canny control models

  • diffusers_xl_canny_full
  • diffusers_xl_canny_mid
  • diffusers_xl_canny_small

Download the models here.

The diffusers XL control model comes in 3 sizes: full, mid, and small. What’s the difference?

The control weight is set to 0.25 when generating these images. Reducing the control weight and the CFG scale helps to generate the correct style.

The smaller models have a weaker controlling effect. A higher control weight can compensate for it, but don't set it too high, or the image may look flat.

Control Weight: 0.25. (diffusers_xl_canny_small)
Control Weight: 0.5. (diffusers_xl_canny_small)
Control Weight: 1. (diffusers_xl_canny_small)

Kohya Canny control models

  • kohya_controllllite_xl_canny_anime
  • kohya_controllllite_xl_canny

Download the models here.

The advantage of the Kohya control model is its small size. We are talking about under 50 MB!

The anime variant is trained on anime images. It is good for anime or painting styles.

Unlike the diffusers Canny models, the control weight can't be set too low, or there will be no effect. Around 0.75 to 1.0 is about right.

There’s a narrow range of control weight that works. So experimenting is important.

Kohya Controllllite XL Canny (control weight 0.75)

The color is more vibrant with the Kohya Canny Anime model.

Kohya Controllllite XL Canny Anime (control weight 1.0)

Stability AI Canny Control-LoRA Model

  • sai_xl_canny_128lora
  • sai_xl_canny_256lora

Download the models here.

The file sizes of these Control-LoRA are pretty reasonable: about 400 MB and 800 MB.

A control weight of around 0.75 seems to be the sweet spot.

The 128- and 256-rank LoRAs perform very similarly. Use the 128 variant if you want to conserve space.

T2I adapter

  • t2i-adapter_xl_canny
  • t2i-adapter_diffusers_xl_canny

Download the models here.

The T2I adapter versions of the Canny control model run pretty fast. But I am not impressed with the images they produce.

Comparison

Impact on style

Applying a ControlNet model should not change the style of the image.

Among all Canny control models tested, the diffusers_xl Control models produce a style closest to the original.

Kohya's controllllite models change the style slightly. It is not a problem if you like the new look.

Speed

Stability AI's Control-LoRAs run the slowest.

Kohya’s controllllite and t2i-adapter models run the fastest.

But overall, the speeds are not terribly different.

Canny Model                           Rendering time (4 images)   File size
diffusers_xl_canny_full               18.4 sec                    2,500 MB
diffusers_xl_canny_mid                16.9 sec                    545 MB
diffusers_xl_canny_small              15.9 sec                    320 MB
kohya_controllllite_xl_canny          15.7 sec                    46 MB
kohya_controllllite_xl_canny_anime    15.4 sec                    46 MB
sai_xl_canny_128lora                  19.0 sec                    396 MB
sai_xl_canny_256lora                  19.7 sec                    774 MB
t2i-adapter_diffusers_xl_canny        14.5 sec                    158 MB
t2i-adapter_xl_canny                  15.8 sec                    155 MB

Rendering time on an RTX 4090 and file size.

Size

Although diffusers_xl_canny_full works quite well, it is, unfortunately, the largest. (2.5 GB!)

kohya_controllllite control models are really small. They performed very well, given their small size.

Recommendations for Canny SDXL

Use diffusers_xl_canny_full if you are okay with its large size and lower speed.

Use kohya_controllllite_xl_canny if you need a small and faster model and can accept a slight change in style.

Use sai_xl_canny_128lora for a reasonable file size while changing the style less.

The control weight parameter is critical to generating good images. Most models need it to be lower than 1.

Depth models

Use the ControlNet Depth model to copy the composition of an image. The usage is similar to Canny but the result is different.

Here are the depth models we are going to study.

  • diffusers_xl_depth_full
  • diffusers_xl_depth_mid
  • diffusers_xl_depth_small
  • kohya_controllllite_xl_depth_anime
  • kohya_controllllite_xl_depth
  • sai_xl_depth_128lora
  • sai_xl_depth_256lora
  • sargezt_xl_depth
  • sargezt_xl_depth_faid_vidit
  • sargezt_xl_depth_zeed
  • t2i-adapter_diffusers_xl_depth_midas
  • t2i-adapter_diffusers_xl_depth_zoe

Download the models here.

A depth control model uses a depth map (like the one shown below) to condition a Stable Diffusion model to generate an image that follows the depth information.

A depth map can be extracted from an image using a preprocessor or created from scratch.

The depth map above was extracted from an image of a woman sitting, using the depth_leres preprocessor.
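As a rough standalone equivalent, you can extract a depth map with the transformers depth-estimation pipeline (an assumption on my part; the WebUI's depth_leres preprocessor uses the LeReS estimator instead, so the maps will differ).

from PIL import Image
from transformers import pipeline

# The pipeline's default depth model is a MiDaS-style estimator, not LeReS.
depth_estimator = pipeline("depth-estimation")
result = depth_estimator(Image.open("woman_sitting.png"))  # illustrative file name
result["depth"].save("depth_map.png")  # grayscale depth map for the ControlNet unit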

I will use the following ControlNet settings in this section.

  • Enabled: Yes
  • Pixel Perfect: Yes
  • Preprocessor: Depth_leres
  • Model: Various
  • Control Mode: Balanced
  • Resize Mode: Crop and Resize
  • Control weight: adjusted according to model

Diffusers depth model

  • diffusers_xl_depth_full
  • diffusers_xl_depth_mid
  • diffusers_xl_depth_small

Download the models here.

All diffusers depth models perform well and follow the depth map closely.

Kohya’s depth model

  • kohya_controllllite_xl_depth_anime
  • kohya_controllllite_xl_depth

Download the models here.

The kohya_controllllite depth control models follow the depth map well, but they tend to add an anime style.

You can suppress the anime style (to some extent) of the Kohya depth model by adding the following terms to the negative prompt.

painting, anime, cartoon

Stability AI’s depth Control-LoRA

  • sai_xl_depth_128lora
  • sai_xl_depth_256lora

Download the models here.

For the most part, these two Control-LoRAs work very similarly, even at the same settings. There's only a slight color difference between the two.

SAI control-lora 128
SAI control-Lora 256

They work well across a wide range of control weights, so these LoRAs work out of the box without much tweaking. Use a larger control weight to follow the depth map exactly. Use a lower value to follow it loosely.

Sargezt’s depth model

Original model page:

Download the models here.

I couldn’t get any of these to work. Let me know if you are able to because some of them can do interesting things, according to the original repository.

This is what I got from the first depth model.

T2I adapter depth

  • t2i-adapter_diffusers_xl_depth_midas
  • t2i-adapter_diffusers_xl_depth_zoe

Download the models here.

The T2I depth adapters are doing a decent job of generating images that follow the depth information. Overall, the images are decent, with minimal changes to the style.

These T2I adapters are good choices if you are looking for small models.

t2i-adapter_diffusers_xl_depth_midas
t2i-adapter_diffusers_xl_depth_zoe

Recommendation for SDXL Depth

diffusers_xl_depth, sai_xl_depth, and t2i-adapter_diffusers_xl_depth models perform well despite their size differences. All are safe choices.

Recolor models

Use the recolor models to color a black-and-white photo.

  • sai_xl_recolor_128lora
  • sai_xl_recolor_256lora

Download the models here.

To use the recoloring model, navigate to the txt2img page in AUTOMATIC1111. Use the following settings in the Generation tab.

  • Model: SDXL Base 1.0
  • Refiner: None
  • Width: 1216
  • Height: 832
  • CFG Scale: 7
  • Steps: 30
  • Sampler: DPM++ 2M Karras
  • Prompt:

cinematic photo . 35mm photograph, film, bokeh, professional, 4k, highly detailed

  • Negative prompt

drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, ugly

In the ControlNet section:

  • Image Canvas: Upload the black-and-white photo you wish to recolor
  • Pixel Perfect: Yes
  • Preprocessor: recolor_intensity
  • Model: sai_xl_recolor_128lora
  • Control Mode: Balanced
  • Resize Mode: Crop and Resize
  • Control weight: 1.0

Below is the original image used for testing.

Below are the recolored images, using the preprocessors recolor_intensity and recolor_luminance, and the 128 and 256 variants of the Control-Lora.

The luminance preprocessor produces brighter images closer to the original one. In this test, the 128 LoRA produces fewer coloring artifacts, but this could be specific to this image and prompt, so I won't draw any conclusions from it.

They definitely work better than I expected.
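For intuition, the two preprocessors differ mainly in how they reduce RGB to a single channel. The sketch below is my reading of that difference (a plain channel average versus a perceptually weighted sum); the exact formulas inside the extension may differ.

import numpy as np
from PIL import Image

rgb = np.asarray(Image.open("photo.png").convert("RGB"), dtype=np.float32)
intensity = rgb.mean(axis=-1)                        # equal weight per channel
luminance = rgb @ np.array([0.299, 0.587, 0.114])    # Rec. 601 perceptual weights
Image.fromarray(luminance.astype(np.uint8), mode="L").save("luminance_gray.png")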

Pick a good prompt for recoloring

I used a photographic prompt above. Do you really need a prompt to recolor? The answer is yes. The recoloring doesn’t work correctly without a good prompt.

Recoloring WITHOUT a prompt.
Recoloring WITH a photographic style prompt.

Not all prompts work for this image. I used some SDXL style prompts below for recoloring. You can get drastically different performance.

To conclude, you need to find a prompt matching your picture’s style for recoloring.

Recommendations for SDXL Recolor

Both the 128 and 256 Recolor Control-Lora work well.

Use the recolor_luminance preprocessor because it produces a brighter image matching human perception.

Be careful in crafting the prompt and the negative prompt. It can have a big effect on recoloring. Use these SDXL style prompts as your starting point.

You don’t need to use a refiner.

Blur models

Use the Blur model to recover a blurry image.

  • kohya_controllllite_xl_blur_anime
  • kohya_controllllite_xl_blur

Download the models here.

Let’s try to recover this blurred image.

Alternatively, you can use the blur_gaussian preprocessor to blur a clear image for testing.
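You can also create such a test image outside the WebUI. Here is a quick Pillow sketch that applies an equivalent Gaussian blur (the file names and radius are illustrative).

from PIL import Image, ImageFilter

sharp = Image.open("sharp_photo.png")                       # illustrative file name
blurred = sharp.filter(ImageFilter.GaussianBlur(radius=8))  # blur strength
blurred.save("blurred_test.png")                            # feed this to ControlNet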

Of course, some image details are lost in the blur, so you should not expect to recover the same image. Instead, let's see how well the model invents a sharp image that makes sense.

Blur

Txt2img settings:

  • Model: SDXL Base 1.0
  • Refiner: None
  • Width: 1216
  • Height: 832
  • CFG Scale: 7
  • Steps: 30
  • Sampler: DPM++ 2M Karras
  • Prompt:

cinematic photo . 35mm photograph, film, bokeh, professional, 4k, highly detailed

  • Negative prompt

drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, ugly

ControlNet Settings:

  • Image Canvas: Upload the blurred image
  • Pixel Perfect: Yes
  • Preprocessor: None
  • Model: kohya_controllllite_xl_blur
  • Control Mode: Balanced
  • Resize Mode: Crop and Resize
  • Control weight: 1.0

The images do look sharp, but the faces lack detail. There are also a lot of irrelevant objects added in the background.

Let’s turn on the refiner to polish the images a bit.

  • Refiner: sd_xl_refiner_1.0
  • Switch at: 0.6

The images do look more detailed and natural. But the backgrounds are wrong.

Write a good prompt

You can write a prompt to describe the image to improve the accuracy. Or you can use the Clip interrogator.

On the img2img page, upload the blurred image to the image canvas. Then click Interrogate CLIP. You will get a prompt for the image. It may not be fully correct, so do some editing yourself. Remove all the style keywords. I get this:

a woman leaning against a stone wall next to a stone staircase with steps leading up to it and a stone wall behind her

Add this prompt to the original prompt and try again. Now, you get something much closer to the original image.

Use a realistic model

You can also use a fine-tuned model to enhance a style. The image below is generated with the realvisxlV20 SDXL model.

Blur anime

Let’s see if we can do something interesting by switching to the blur-anime model and an anime prompt.

Prompt:

anime artwork . anime style, key visual, vibrant, studio anime, highly detailed

Negative prompt:

photo, deformed, black and white, realism, disfigured, low contrast

We got some anime images, as expected. But the staircase in the background is not recovered correctly. You can get something much closer to the original image by using the same technique used in the previous section to revise the prompt.

IP-adapter

  • ip-adapter_xl

Download the models here.

The Image Prompt Adapter (IP-adapter) lets you use an image prompt like MidJourney. Let’s use the original example from the ControlNet extension to illustrate what it does.

Image to be used for image Prompt. (Image from the ControlNet Extension)
  • Model: SDXL Base 1.0
  • Refiner: SDXL Refiner 1.0
  • Width: 832
  • Height: 1216
  • CFG Scale: 7
  • Steps: 30
  • Sampler: DPM++ 2M Karras
  • Prompt:

Female Warrior, Digital Art, High Quality, Armor

  • Negative prompt

anime, cartoon, bad, low quality

In the ControlNet section:

  • Image Canvas: Upload an image for image prompt
  • Pixel Perfect: Yes
  • Preprocessor: ip-adapter_clip_sdxl
  • Model: ip-adapter_xl
  • Control Mode: Balanced
  • Resize Mode: Crop and Resize
  • Control weight: 1.0

With ControlNet ON:

With ControlNet OFF:

You can also use IP-Adapter in inpainting, but it has not worked well for me. I won’t go through it here.
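If you work with the diffusers library instead of the WebUI, a rough sketch of the same idea uses its IP-Adapter support (my assumption; the checkpoint names come from the IP-Adapter authors' h94/IP-Adapter repository on Hugging Face, and the file names are illustrative).

import torch
from diffusers import StableDiffusionXLPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load the SDXL IP-Adapter weights and set how strongly the image prompt is followed.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
pipe.set_ip_adapter_scale(1.0)  # roughly the Control weight above

image = pipe(
    prompt="Female Warrior, Digital Art, High Quality, Armor",
    negative_prompt="anime, cartoon, bad, low quality",
    ip_adapter_image=load_image("image_prompt.png"),  # the reference image
    num_inference_steps=30,
    guidance_scale=7,
).images[0]
image.save("ip_adapter_result.png")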

Copy a picture with IP-Adapter

You can use the IP-Adapter to copy a picture and generate more. Let's use this photo as an example.

Step 1: Come up with a prompt that describes the picture.

You can use the

  • Interrogate CLIP function on the img2img page, or
  • The Clip Interrogator extension. This is preferred because you can use the language model in SDXL. (ViT-g-14/laion2b_s12b_b42k)

Here’s what I got from the second method.

arafed woman with long purple hair and a black top, eva elfie, gray haired, 8k)), sienna, medium close up, light purple, lacey, vivid and detailed, vintage glow, chunky, cute looking, titanium, mid tone, hd elegant, multi – coloured, wavy, hd 16k, blond, stacks

Step 2: Set txt2img parameters

  • Model: SDXL Base 1.0
  • Refiner: SDXL Refiner 1.0
  • Width: 896
  • Height: 1152
  • CFG Scale: 7
  • Steps: 30
  • Sampler: DPM++ 2M Karras
  • Prompt: As above.

Step 3: Set ControlNet parameters

  • Image Canvas: Upload the reference image for the image prompt.
  • Pixel Perfect: Yes
  • Preprocessor: ip-adapter_clip_sdxl
  • Model: ip-adapter_xl
  • Control Mode: Balanced
  • Resize Mode: Crop and Resize
  • Control weight: 1.0

Step 4: Press Generate

Here’s what I got.

OpenPose

The OpenPose ControlNet model copies a human pose but not the outfit, background, or anything else.

There are quite a few OpenPose models available, but the best-performing one is xinsir's OpenPose ControlNet model. You can download it here. Put it in the models > ControlNet folder and rename it to diffusion_xl_openpose.safetensors.
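Here is a hedged sketch of fetching and renaming the checkpoint with the huggingface_hub library; the repository ID and file name are my assumption of the checkpoint referred to above, and the destination path assumes the default folder layout.

import shutil
from huggingface_hub import hf_hub_download

# Download xinsir's OpenPose ControlNet and copy it to where the WebUI looks for models.
path = hf_hub_download("xinsir/controlnet-openpose-sdxl-1.0", "diffusion_pytorch_model.safetensors")
shutil.copy(path, "stable-diffusion-webui/models/ControlNet/diffusion_xl_openpose.safetensors")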

Settings:

  • Preprocessor: openpose
  • Model: diffusion_xl_openpose.safetensors
  • Control weight: 1

Below is an example of the generated images.

Good reads

[Major Update] sd-webui-controlnet 1.1.400 – Official writeup of SDXL ControlNet models for WebUI.

Stabilityai/control-lora – An overview of Stability AI’s Control LoRA models.

kohya-ss/controlnet-lllite – Model Card of ControlNet-LLLite.

tencent-ailab/IP-Adapter – GitHub page of the Image Prompt adapter.


By Andrew

Andrew is an experienced engineer with a specialization in Machine Learning and Artificial Intelligence. He is passionate about programming, art, photography, and education. He has a Ph.D. in engineering.

41 comments

  1. Hi Andrew
    In “Use a realistic model”
    Should the desired model be used in the txt2img settings or in the checkpoint of the refiner section?

  2. Hi Andrew,

    Thank you so much for all your hard work! It is really appreciated.

    I am having trouble with controlnet. I have followed all the instructions to a t – checked for updates – reinstalled. Everytime I attempt to use controlnet this long error message starting with *** Error running process: /Users/Berg/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/controlnet.py

    And ending by
    RuntimeError: You have not selected any ControlNet Model

    Do you have any idea what the issue could be

  3. Hi, Andrew,
    Thank you for this great guide. Super helpful.

    Been having problems running some Preprocessors in ControlNet (a1111 both on SDXL and 1.5) and hoping you can help me.
    I’ve tried updating, deleting and reinstalling, I’ve tried running SD a1111 in different browsers, redownloading the models, tried etc., and I just keep getting the following error:


    urllib.error.URLError:

    Anything come to mind that could help me solve this?

    Like I mentioned, only a few preprocessors work.
    For example, for “Depth,” only the “depth_zoe” works, all others give the error. For “OpenPose”, none of the preprocessors work, “Canny” on the other hand, they all work.
    This happens both in SDXL and in 1.5 models.

    Any ideas would be great. Thank you in advance.

      1. Hi, Andrew
        Thank you for taking the time to respond.
        I actually was able to figure it out with the help of Chat GPT!
        Looking at the entire read-out in the Command Line there was actual paths to the missing Annotators for each of the Preprocessors.
        So, while I’m not actually sure what caused the underlying problem, I was able to download the missing files manually, and it seems to be working.
        Thank you again!

  4. I’ve used Sarge’s main v1e depth map ControlNet (the first link) with good results.
    It requires a different depth map format than usual. Did you use the right one?
    It uses the inferno colour map, basically RGB as a single 24-bit value with 0 (black) being the ‘nearest’ value and 8*8*8 (white) being the farthest.

    I haven’t tried the others, but they look like they use greyscale, again with 0 being the near value and black being the far value, unlike the others which use 1 as the near value.

    1. Replying to myself: actually, SargeZT/controlnet-sd-xl-1.0-depth-faid-vidit uses an interesting colour map that repeats, I think. It’s hard to tell what goes on in the middle, and it almost looks like a normal map combined with depth. Guess we’ll never know, now that SargeZT has passed away.

  5. Thanks for the info. I’ve shifted to XL just today and I was wondering if controlnet got supported. Do you have any info if/when the Reference tab will be available for XL models?

      1. Oh sorry, yes I mean the Reference model in controlnet. When I try to use it, the generated image change slightly compared to without reference (same parameters) , but i’m pretty sure it’s not working as it should since there is not the slightest hint of the referenced image in it. Just some random smudges/noise.

          1. But don’t they have different scope? I thought IP adapter was to keep the same face only? (kind of) At least that’s what I got from the official samples provided

          2. There are a few IP adapters.

            IP adapter and IP adapter plus copies the whole picture. So it is similar to reference.

            IP adapter face and face id copies the face only.

  6. Hello, Andrew, I am wondering if it is possible to obtain a vector image in .svc format from a photograph, using Control Net. If it’s not possible, does Stable Diffusion offer me any other tool to do it?
    Thank you very much!

  7. OpenPose cannot be used in the XL model that makes me very frustrated. As a result, I can only switch to the 1.5 version of the Stable Diffusion when I want to use ControlNet. And the OpenPose in the XL model has many different versions, but there is no one that can be used.

  8. Thanks for step by step guide really useful.
    I am puzzled why none of the images using the canny model looked like the model. The pose is correct but the male model’s face looks drastically different. The older version of Controlnet would have reproduced a similar facial feature if you used the canny model. Is there something I am missing?

  9. Hi. I would switch to ControlNet-XS for SDXL. It’s much better and way smaller. ControlNet-XS does not copy the SDXL model internals but is a new and slimmer design that focuses on its task, i.e., only controlling the image generation process.

  10. Hi Andrew,
    thank you for the sdxl adds!
    Do you think would be possible to have implemented on Colab notebook the Infinite zoom?
    We are experiencing some issue using it with sdxl uploading from url

  11. Hi Andrew, thanks for showing some paths in the jungle. In my experience t2i-adapter_xl_openpose and t2i-adapter_diffusers_xl_openpose work with ComfyUI; however, both support body pose only, and not hand or face keypoints. thibaud_xl_openpose also runs in ComfyUI and recognizes hand and face keypoints, but it is extremely slow.
    What’s still missing is lineart. There is t2i-adapter_diffusers_xl_lineart, which unfortunately creates lots of sketches and, yes, linearts.

  12. Has anyone tested with deforum in hybrid video mode?
    For me it gave inconsistent results, as if I wasn’t using any controlnet

  13. Hi,
    What about models for QRCodes for instance?
    I imagine they all have to be “ported” to SDXL.
    If I try to use a model for SD1.5 with SDXL, WebUI crashes.

      1. Thanks a lot, it works 🙂
        Seems refiner must be turned off in the txt2img, but it can still be used via img2img… Only that right now if I use the refiner I lose some of the effect of the controlNet. I guess I’ll have to play to find correct setting.
