Align Your Steps: How-to guide and review

Align Your Steps (AYS) is a change to the sampling process, proposed by a team at NVIDIA, that solves the reverse diffusion equation more accurately. The paper claims that with this simple change, you can generate high-quality images in as few as 10 steps.

See a comparison from the research article below.

Sample images from the Align Your Steps paper.

In this article, I will cover:

  • What Align Your Steps is trying to do
  • How to use it in ComfyUI
  • My independent tests against standard alternatives

Software

You can use Align Your Steps in ComfyUI today. Support in AUTOMATIC1111 is coming. I will update this article when it is available.

Read the ComfyUI installation guide and ComfyUI beginner’s guide if you are new to ComfyUI.

Consider taking the ComfyUI course if you want to learn ComfyUI step-by-step.

What is Align Your Steps?

Align Your Steps is not a new model but a change to the sampling process. More precisely, it is a change to the noise schedule. You can use Align Your Steps with any model.

To understand what Align Your Steps is, you first need to understand the sampling process. I will explain it briefly in the following sections. You can find a more in-depth explanation here.

Sampling process

Diffusion models generate samples from noise. This process is called reverse diffusion, and generating a sample amounts to solving a reverse diffusion equation. Because the equation has no analytical solution, we cannot solve it exactly on a computer. Instead, we break it into discrete steps and solve it one step at a time.

This process is called discretization.

Discretization error

The diffusion process is continuous. When you set the number of sampling steps, you break the process into a fixed number of discrete steps to solve it. This introduces discretization error.

Solving it this way is usually the only option, but the answer will differ from the exact solution.

Normally, discretization error is small if you set the number of steps high enough, like 30 steps.
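To make discretization error concrete, here is a minimal toy sketch (an illustration, not code from the paper or from any sampler). It assumes a one-dimensional "dataset" drawn from a Gaussian N(0, 0.5²), for which the ideal denoiser D(x; σ) has a closed form, and the EDM-style probability-flow ODE dx/dσ = (x - D(x; σ)) / σ. Because this toy ODE has an exact solution, we can measure how far plain Euler steps land from it. The names `S_DATA`, `euler_sample`, and `discretization_error` are hypothetical, and the noise range 0.0292 to 14.6146 is assumed to mimic Stable Diffusion 1.5.

```python
import numpy as np

S_DATA = 0.5                            # assumed std of the toy 1-D Gaussian data
SIGMA_MAX, SIGMA_MIN = 14.6146, 0.0292  # assumed SD 1.5-like noise range


def denoiser(x, sigma):
    # Ideal denoiser D(x; sigma) for data ~ N(0, S_DATA^2); closed form for this toy only.
    return x * S_DATA**2 / (S_DATA**2 + sigma**2)


def euler_sample(x, sigmas):
    # Euler steps on the probability-flow ODE dx/dsigma = (x - D(x; sigma)) / sigma.
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        slope = (x - denoiser(x, sigma)) / sigma
        x = x + (sigma_next - sigma) * slope
    return x


def exact_solution(x, sigma_from, sigma_to):
    # Analytical solution of the toy ODE: x scales with sqrt(S_DATA^2 + sigma^2).
    return x * np.sqrt((S_DATA**2 + sigma_to**2) / (S_DATA**2 + sigma_from**2))


def discretization_error(n_steps, x_start=SIGMA_MAX):
    # x_start = SIGMA_MAX * 1.0, i.e. one unit-noise draw scaled to the maximum noise level.
    # Uniform (linear) noise schedule: n_steps steps need n_steps + 1 noise levels.
    sigmas = np.linspace(SIGMA_MAX, SIGMA_MIN, n_steps + 1)
    approx = euler_sample(x_start, sigmas)
    return abs(approx - exact_solution(x_start, SIGMA_MAX, SIGMA_MIN))


for n in (3, 10, 30, 100):
    print(f"{n:>3} steps: error = {discretization_error(n):.4f}")
```

The printed error shrinks as the step count grows. The uniform schedule here is deliberately naive, so the error stays noticeable even at 30 steps in this toy.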

Below is an example of sampling with the Stable Diffusion 1.5 model in 15 discrete steps using the Euler sampler.

Noise schedule

You need to define the expected noise level at each step. The set of noise levels across all steps is called the noise schedule.

The difference in noise level between two steps is the step size.

Of course, you can space the steps uniformly so that the noise decreases linearly, but this doesn’t produce high-quality images. You get better images by taking larger steps at the beginning and smaller steps near the end, as in the popular Karras noise schedule.

The smaller steps at the end improve fine details.
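The two kinds of schedule can be sketched with NumPy. The Karras formula below follows Karras et al. (2022) with their default rho = 7: it interpolates linearly between σ_max^(1/ρ) and σ_min^(1/ρ), then raises the result back to the power ρ. The function names are hypothetical, the noise range is assumed to be SD 1.5-like, and unlike k-diffusion's implementation this sketch does not append a final σ = 0.

```python
import numpy as np

SIGMA_MAX, SIGMA_MIN = 14.6146, 0.0292  # assumed SD 1.5-like noise range


def uniform_sigmas(n, sigma_min, sigma_max):
    # Noise decreases linearly, so every step has the same size.
    return np.linspace(sigma_max, sigma_min, n)


def karras_sigmas(n, sigma_min, sigma_max, rho=7.0):
    # Linear ramp in sigma^(1/rho) space, then raised back to the power rho.
    ramp = np.linspace(0.0, 1.0, n)
    max_inv_rho = sigma_max ** (1.0 / rho)
    min_inv_rho = sigma_min ** (1.0 / rho)
    return (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho


# 11 noise levels = 10 sampling steps.
for name, sigmas in (("uniform", uniform_sigmas(11, SIGMA_MIN, SIGMA_MAX)),
                     ("karras", karras_sigmas(11, SIGMA_MIN, SIGMA_MAX))):
    step_sizes = -np.diff(sigmas)  # positive: how much noise each step removes
    print(f"{name:>7} sigmas:", np.round(sigmas, 3))
    print(f"{name:>7} steps: ", np.round(step_sizes, 3))
```

The uniform step sizes are all equal, while the Karras step sizes shrink steadily toward the end of sampling, where fine details are refined.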

Align Your Steps

How do you come up with the noise schedule? Most are based on empirical results.

What Align Your Steps does is propose a systematic method to create a noise schedule that minimizes the discretization error. It answers the following question:

If you can only do N sampling steps, what is the optimal noise schedule that minimizes the discretization error?

Finding the optimal noise schedule requires minimizing the Kullback-Leibler divergence upper bound (KLUB) of each sampling step, which bounds the discretization error.

In practice, implementations such as ComfyUI and Hugging Face’s diffusers hard-code the optimized noise schedule for 10 steps. If you use more than 10 steps, the schedule is interpolated on a log-linear scale to get the new noise levels.
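The log-linear interpolation can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not a copy of ComfyUI's or diffusers' code, and `loglinear_interp` is a hypothetical name. The 11 values are assumed to be the 10-step SD 1.5 AYS noise levels hard-coded in ComfyUI; verify them against the ComfyUI source before relying on them.

```python
import numpy as np

# Assumed: the 10-step SD 1.5 AYS noise levels (11 values) as hard-coded in ComfyUI.
# Verify against the ComfyUI source before relying on them.
AYS_SD15 = np.array([14.6146412293, 6.4745760956, 3.8636745985, 2.6946151520,
                     1.8841921177, 1.3943805092, 0.9642583904, 0.6523686016,
                     0.3977456272, 0.1515232662, 0.0291671582])


def loglinear_interp(sigmas, num_levels):
    # Place the known levels on an even 0..1 grid, interpolate log(sigma) linearly,
    # then exponentiate to get num_levels noise levels.
    xs = np.linspace(0.0, 1.0, len(sigmas))
    new_xs = np.linspace(0.0, 1.0, num_levels)
    return np.exp(np.interp(new_xs, xs, np.log(sigmas)))


# 20 sampling steps need 21 noise levels.
sigmas_20 = loglinear_interp(AYS_SD15, 21)
print(np.round(sigmas_20, 4))
```

Every other value of the 21-level result reproduces the original 10-step schedule, and each new level sits at the geometric mean of its two neighbors, because linear interpolation in log space halfway between two points gives sqrt(a * b).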

Below is a comparison between Align Your Steps and Karras. You can see that Align Your Steps takes even more steps at low noise levels at the expense of larger initial noise steps.

Compare the noise schedule of Karras and AYS.
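A numerical version of this comparison can be sketched as follows. It assumes the same SD 1.5 AYS values as hard-coded in ComfyUI (verify before reuse) and builds an 11-level Karras schedule (rho = 7) spanning the same σ_max and σ_min, so both describe 10 steps over the same noise range. Matching the endpoints this way is a simplification for the sketch; real Karras implementations take σ_min and σ_max from the model. It prints every step size so you can check where each schedule spends its steps.

```python
import numpy as np

# Assumed: SD 1.5 AYS noise levels for 10 steps, as hard-coded in ComfyUI.
AYS_SD15 = np.array([14.6146412293, 6.4745760956, 3.8636745985, 2.6946151520,
                     1.8841921177, 1.3943805092, 0.9642583904, 0.6523686016,
                     0.3977456272, 0.1515232662, 0.0291671582])


def karras_sigmas(n, sigma_min, sigma_max, rho=7.0):
    # Karras et al. (2022): linear ramp in sigma^(1/rho) space, raised back to rho.
    ramp = np.linspace(0.0, 1.0, n)
    max_inv_rho = sigma_max ** (1.0 / rho)
    min_inv_rho = sigma_min ** (1.0 / rho)
    return (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho


# Same number of levels and same endpoints as the AYS schedule.
KARRAS = karras_sigmas(len(AYS_SD15), AYS_SD15[-1], AYS_SD15[0])

for name, sigmas in (("Karras", KARRAS), ("AYS", AYS_SD15)):
    steps = -np.diff(sigmas)  # noise removed by each of the 10 steps
    print(f"{name:>6}:", " ".join(f"{s:6.3f}" for s in steps))
```

With these values, the first AYS step removes noticeably more noise than the first Karras step, in line with the larger initial steps described above.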

The optimal noise schedule depends on the training data, which is another way of saying that it depends on the model. In practice, you can use a single noise schedule for all SD 1.5 models and another for all SDXL models.

How to use Align Your Steps in ComfyUI

ComfyUI has native support for Align Your Steps. After a software update, you will be able to use it.

Updating ComfyUI

To update ComfyUI, click the Manager button (you only have it after installing ComfyUI Manager) > Update ComfyUI. Then restart ComfyUI completely.

Workflows

Here’s the ComfyUI workflow for using Align Your Steps with SDXL.

Here’s the ComfyUI workflow for using Align Your Steps with Stable Diffusion 1.5.

A review of Align Your Steps

How much benefit do you really get from using Align Your Steps?

The paper mostly uses 10 sampling steps and compares against the time-uniform schedule. Neither condition is common in practice.

I will test the new schedule against the popular Karras schedule and with higher numbers of sampling steps.

Close-up images

Let’s start with close-up portrait images, which are less likely to go wrong. Here are images from the Juggernaut XL v7 model with Align Your Steps.

Prompt:

photo portrait of a beautiful 25 year old girl dancer

Negative prompt:

(worst quality, low quality), deformed, distorted, disfigured, doll, poorly drawn, bad anatomy, wrong anatomy, nsfw, ugly

The quality looks good even in 10 steps.

Here are the images using the Karras noise schedule for reference. They are not bad either.

Conclusion: 10 steps of Align Your Steps is competent for close-up images, but the standard Karras scheduler yields similar results.

Full body images

Full body images are more challenging to get right because they require both good global consistency and good fine details.

Prompt:

full body photo portrait of a beautiful 25 year old girl dancer

Negative prompt:

(worst quality, low quality), deformed, distorted, disfigured, doll, poorly drawn, bad anatomy, wrong anatomy, nsfw, ugly

Neither 10 nor 20 steps is sufficient for Align Your Steps to converge. The 10-step image has obvious artifacts.

The images below use the same model and sampler but with the Karras noise schedule. The 10-step image likewise has obvious defects, but in different ways. Interestingly, Karras seems to converge globally at 20 steps.

Conclusion: For more complex images like full body images, you need at least 20 steps.

Testing the prompts in the paper

The paper compared the EDM (time-continuous), time-uniform, and Align Your Steps schedules with several prompts. The issue is that the first two schedules are not commonly used.

In this section, I will retest the prompts with Karras and Align Your Steps with 10 sampling steps.

Babel tower

Spiderbaby

Conclusion: There is little difference in perceptual quality between the two noise schedules.

Conclusion

Align Your Steps is a competent noise schedule that yields good-quality images. However, the difference from other popular schedules such as Karras is not that big.

You can generate good images in 10 steps in some conditions. But if you can wait, you should use 20 steps or higher.

A potential advantage of Align Your Steps is that it spends more steps at low noise levels. This improves fine details at the expense of the accuracy of the global composition.

By Andrew

Andrew is an experienced engineer with a specialization in Machine Learning and Artificial Intelligence. He is passionate about programming, art, photography, and education. He has a Ph.D. in engineering.
