Perturbed Attention Guidance

Perturbed Attention Guidance is a simple modification to the sampling process to enhance your Stable Diffusion images.

I will cover:

  • What Perturbed Attention Guidance is.
  • How to use it in ComfyUI and AUTOMATIC1111.
  • Comparison of settings.

Software

AUTOMATIC1111

We will use AUTOMATIC1111, a popular and free Stable Diffusion software. Check out the installation guides for Windows, Mac, or Google Colab.

Check out the AUTOMATIC1111 Guide if you are new to AUTOMATIC1111.

ComfyUI

We will use ComfyUI in this section. It is an alternative to AUTOMATIC1111.

Read the ComfyUI installation guide and ComfyUI beginner’s guide if you are new to ComfyUI.

Take the Stable Diffusion Courses to learn ComfyUI and AUTOMATIC1111 step-by-step.

What is Perturbed Attention Guidance?

Perturbed Attention Guidance (PAG) is a change in the sampling process to enhance the image quality. You can use this technique in SD 1.5 and SDXL models.

You can read the research article Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance by Donghoon Ahn and his coworkers.

Attentions in U-Nets

Stable Diffusion 1.5 and SDXL models use a deep neural network called U-Net to denoise the image during sampling. The U-Net contains many attention operations, which come in two types:

  1. Cross-attention between the prompt and the latent image.
  2. Self-attention within the latent image.

The above applies to both the positive and negative latent images controlled by the positive and the negative prompts, respectively. The negative prompt is optional, but using it improves image quality.

The negative latent image is also called the unconditioned latent image because, originally, there was no negative prompt! The diffusion process steers away from a random, unconditioned image.

The negative prompt is a later invention that hacks the unconditioned latent image by injecting a prompt so that it steers away from the concepts in the negative prompt.

Perturbed Attention Guidance (PAG)

PAG only modifies the diffusion of the unconditioned latent image, corresponding to the one specified by the negative prompt.

It also modifies only one small step: The self-attention operation of the middle block of the U-Net.

The authors argue that the unconditioned latent image is slow to form due to a lack of guidance (when the negative prompt is not used).

Instead of performing self-attention to determine which parts of the unconditioned latent image are important, PAG simply treats the whole image as equally important. (The paper does this by replacing the self-attention map with an identity matrix.)
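To make the idea concrete, here is a minimal pure-Python sketch of one self-attention step with an optional perturbation. It assumes the formulation in the paper, where the perturbed branch swaps the softmax attention map for an identity matrix. The function names and the `perturb` flag are made up for this illustration and are not taken from ComfyUI or A1111.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(q, k, v, perturb=False):
    """Toy self-attention over n tokens (hypothetical helper).

    With perturb=True the attention map is an identity matrix, so each
    token only sees its own value, which is the assumed PAG perturbation.
    """
    n = len(q)
    if perturb:
        attn = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    else:
        scale = math.sqrt(len(q[0]))
        scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
        attn = [softmax(row) for row in scores]
    return matmul(attn, v)


# Two tokens with two channels each.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]

normal = self_attention(q, k, v)                   # tokens mix information
perturbed = self_attention(q, k, v, perturb=True)  # tokens stay isolated
print(normal)
print(perturbed)
```

With the identity map, the output is just the value matrix itself: the tokens no longer exchange information, which is what degrades the structure of the perturbed prediction.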

In practice, as implemented in ComfyUI and A1111, PAG doesn’t replace classifier-free guidance (CFG). Instead, both are used. The PAG diffusion direction is added to that of CFG and controlled by an independent scale factor analogous to the CFG scale.

The diffusion step is a combination of CFG and PAG.

Mathematically, the total guidance during sampling is:

Total guidance = CFG scale + PAG scale

That’s why the default setting is a CFG scale of 4 and PAG scale of 3, summing up to 7, a widely used CFG value.
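As a rough sketch of how the two directions could be combined at each step, the snippet below adds a PAG term on top of the usual CFG formula. It assumes the PAG direction is the difference between the conditional prediction and the perturbed prediction, scaled by the PAG scale. The function and argument names are hypothetical, and real implementations operate on tensors rather than Python lists.

```python
def combine_guidance(uncond, cond, perturbed, cfg_scale, pag_scale):
    """Combine CFG and PAG element-wise (illustrative sketch only).

    uncond    -- noise prediction for the unconditioned/negative prompt
    cond      -- noise prediction for the positive prompt
    perturbed -- noise prediction with perturbed self-attention (assumed)
    """
    combined = []
    for u, c, p in zip(uncond, cond, perturbed):
        cfg_term = u + cfg_scale * (c - u)  # standard classifier-free guidance
        pag_term = pag_scale * (c - p)      # push away from the perturbed result
        combined.append(cfg_term + pag_term)
    return combined


# One-element "images" keep the arithmetic easy to follow.
with_pag = combine_guidance([0.0], [1.0], [0.5], cfg_scale=4.0, pag_scale=3.0)
without_pag = combine_guidance([0.0], [1.0], [0.5], cfg_scale=4.0, pag_scale=0.0)
print(with_pag, without_pag)
```

Setting the PAG scale to 0 removes the second term, so the result falls back to plain CFG.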

Use PAG on ComfyUI

ComfyUI has native support for the Perturbed Attention Guidance node. To use it, you must update ComfyUI, which you can do easily with ComfyUI Manager.

Click Manager > Update ComfyUI, then restart ComfyUI.

Add the PerturbedAttentionGuidance node between the Model and KSampler nodes.
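If you drive ComfyUI with API-format (JSON) workflows, the wiring might look like the fragment below, written as a Python dict. Treat it as a sketch: the node IDs and the checkpoint file name are placeholders, the `model` and `scale` input names should be checked against your ComfyUI version, and the prompt, latent, and decode nodes are left out, so this is not a complete workflow.

```python
import json

# Partial API-format workflow (assumed layout): checkpoint -> PAG -> KSampler.
workflow = {
    "1": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "your_model.safetensors"},  # placeholder file name
    },
    "2": {
        "class_type": "PerturbedAttentionGuidance",
        "inputs": {
            "model": ["1", 0],  # MODEL output of the checkpoint loader
            "scale": 3.0,       # PAG scale
        },
    },
    "3": {
        "class_type": "KSampler",
        "inputs": {
            "model": ["2", 0],  # sample with the PAG-patched model
            "cfg": 4.0,         # CFG scale
            # positive, negative, latent_image, seed, steps, and the
            # remaining sampler inputs are omitted in this fragment
        },
    },
}

print(json.dumps(workflow, indent=2))
```

The key point is that the KSampler takes its model from the PAG node rather than directly from the checkpoint loader.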

Or download the PAG txt2img workflow below.

The following workflow compares images with and without PAG using the same seed and image size.

Use PAG on AUTOMATIC1111

You can use Perturbed Attention Guidance with AUTOMATIC1111. You will need to install the Incantations extension.

Installing the Incantations extension

To install an extension in AUTOMATIC1111 Stable Diffusion WebUI:

  1. Start AUTOMATIC1111 Web-UI normally.
  2. Navigate to the Extensions page.
  3. Click the Install from URL tab.
  4. Enter the following URL in the URL for extension’s git repository field:

     https://github.com/v0xie/sd-webui-incantations

  5. Click the Install button.
  6. Wait for the confirmation message that the installation is complete.
  7. Restart AUTOMATIC1111.

Using PAG

To use Perturbed Attention Guidance, expand the Incantations section on the txt2img page.

Check the Active box.

Set PAG Scale to 3.

This setting works for SD 1.5 and SDXL models.

Enter a prompt and hit Generate to create an image.

PAG settings

I will use the following prompt and the Juggernaut XL v7 model.

realistic anime half body dark and gritty cinematic lighting vibrant and Final Fantasy, goth, dark angel, dynamic pose, japanese, asymmetrical goth fashion, sorcerer’s stronghold, silver hair, dimly lit, empty hall

PAG Scale

I will use the default CFG setting of 4.

Setting the PAG scale to 0 turns it off. So PAG 0 is the reference image without PAG.

The sweet spot is a PAG scale between 1 and 3. It’s a matter of choosing how saturated you want the images to be.

Setting it to higher than 3 over-saturates the image, an effect similar to setting a high CFG scale.

Overall, I think it’s an improvement (for this CFG setting).

Fixing total guidance

The comparison above is not entirely fair because each image has a different total guidance (CFG scale + PAG scale). You can expect a similar result of higher contrast by changing the CFG scale alone!

So, let’s fix the total guidance to 7 and see if PAG is really doing anything better.

A low PAG value (1-3) indeed improves the image quality. We also see that PAG provides stronger guidance than CFG, as the image is fried at the PAG scale of 7.
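The settings for this kind of comparison are easy to enumerate. The sketch below lists (CFG, PAG) pairs that sum to a fixed total. The helper name and the step size of 1 are assumptions for illustration and not necessarily the exact grid used for the images.

```python
def fixed_total_pairs(total=7, step=1):
    """Return (cfg_scale, pag_scale) pairs whose sum is always `total`."""
    pairs = []
    pag = 0
    while pag <= total:
        pairs.append((total - pag, pag))  # trade CFG for PAG one step at a time
        pag += step
    return pairs


for cfg_scale, pag_scale in fixed_total_pairs():
    print(f"CFG {cfg_scale} + PAG {pag_scale}")
```

The first pair (CFG 7, PAG 0) is the reference without PAG, and the last pair (CFG 0, PAG 7) relies on PAG alone.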

Negative prompts

A missing piece of the research article is the negative prompt.

Even without PAG, we can get higher image quality by replacing the unconditioned latent image with one conditioned on the negative prompt.

How does PAG fare when negative prompts are used? Let’s find out.

Let’s add this negative prompt:

disfigured, ugly, deformed, low quality, beginner

The left column is with PAG 0 and CFG 7, while the right column is PAG 3 and CFG 4.

With negative prompts, using PAG still seems to be better.

By Andrew

Andrew is an experienced engineer with a specialization in Machine Learning and Artificial Intelligence. He is passionate about programming, art, photography, and education. He has a Ph.D. in engineering.

12 comments

  1. After reading your guide, I’ve seen some success with PAG, such that I always have it at least 3-4 in my prompts (with some special exceptions where I *want* low guidance, like all the bogus beach pictures that now cycle through my desktop).

    But I don’t have the faintest clue what it does. I’ve used your settings (CFG 4, PAG 3), and copied the bit about 10 and 15 for low/high noise interval, but if you have the time to expand the article, I’d love to know how to use those settings. Your experiment suggests PAG 3 is about as high as the PAG should go before your pictures come out frazzled, but it seems like such a novel guidance method there should be more to it than “don’t set it too high.”

    Great coverage of the settings you were tweaking, though; they give you a very good idea of where you need to be to get the best results.

    1. They just claim PAG is better than CFG. I tried to address some issues in their comparison. I have already listed the usage guidance in the article. It seems that PAG increases contrast more than CFG, so you can use a low PAG value in addition to CFG.

  2. I noticed there is a CFG Scheduler setting. How does that work? Do I just set the CFG to 7 and then it automatically adjusts the CFG and PAG (linear, cosine, etc.) with each step?

  3. anyone getting error executing traceback?
    AttributeError: ‘CFGDenoiserParams’ object has no attribute ‘denoiser’

    using SD 1.5, this error pops up on every step. Comparing PAG off and on, there looks to be no effect on the image generation.

  4. Hello, for SDXL hires.fix 2k x 2k with cfg greater than 0,5 i get double faces and weird stuff, what can i do with that?

  5. I have installed the Incantations extension and even after a full restart it does not show up in the UI. Is there some other setting or required extension that I have missed?

  6. From your experiments, any idea what the sweet spot is for CFG/PAG with SDXL Lightning models? My images are coming out fried at 4/3.

    1. Lightning models use a very low CFG, so you are actually using the equivalent of CFG 7, which will come out fried. Try CFG and PAG with a maximum of 1.5 each, and only move them downwards from that.
