How does a negative prompt work?


The negative prompt is an additional way to nudge Stable Diffusion to give you what you want. Unlike inpainting, which requires drawing a mask, you can use a negative prompt with all the convenience of text input. In fact, some images can only be generated by using negative prompts.

In this article, we will review a simple example of using a negative prompt. Then, you will learn how a negative prompt works in Stable Diffusion.

This is the first part of a two-part series on using negative prompts. See the second part, How to use negative prompts, for guidelines on building good negative prompts.

A simple example

Positive prompt only

Let’s try generating some images of men. That’s right. We are going into uncharted territory here… I am using Stable Diffusion v1.5 with the prompt:

Portrait photo of a man.

Prompt: Portrait photo of a man.

OK, we got what we expected. No surprise. However, these men look a bit too serious. Let’s lighten them up by removing their mustaches with the prompt:

Portrait photo of a man without mustache.

Prompt: Portrait photo of a man without a mustache.

We have a problem here. The mustaches are even more prominent! What’s going on? The likely culprit is that cross-attention fails to associate “without” with “mustache”, so Stable Diffusion understood the prompt as “man” and “mustache”. That’s why you see both of them.

Positive and negative prompts

So what can we do to generate men without a mustache? Is this something Stable Diffusion cannot do? The answer is to use a negative prompt. If we use the prompt

Portrait photo of a man.

together with the negative prompt

mustache

we can finally generate some men without a mustache! You will get similar results with v2 models.

Images generated with negative prompt.
Prompt: Portrait photo of a man.
Negative prompt: mustache.

This example demonstrates the principle of using negative prompts:

If you see something you don’t want, put it in the negative prompt.

How does a negative prompt work?

Recall that in text-to-image conditioning, the prompt is converted to embedding vectors, which are in turn fed to the U-Net noise predictor. Well, that’s not the whole story. (Sorry, this has happened so many times…) There are actually two sets of embedding vectors: one for the positive prompt and one for the negative prompt.

The positive and negative prompts are on equal footing. Each is encoded as a sequence of 77 tokens, and you can use either one with or without the other.
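As a toy illustration of this equal footing, here is a Python sketch that pads the positive prompt, the negative prompt, and the empty prompt (used when no negative prompt is given) to the same 77-token length before they would reach the text encoder. The vocabulary, the special-token ids and the encode helper are hypothetical stand-ins, not the real CLIP tokenizer.

```python
# Toy sketch: every prompt ends up as a fixed-length token sequence.
# TOY_VOCAB, BOS/EOS/PAD and encode() are hypothetical placeholders,
# not the real CLIP tokenizer used by Stable Diffusion.
MAX_TOKENS = 77          # context length of the SD v1 text encoder
BOS, EOS, PAD = 1, 2, 0  # hypothetical special-token ids

TOY_VOCAB = {"portrait": 10, "photo": 11, "of": 12, "a": 13, "man": 14, "mustache": 15}


def encode(prompt: str) -> list[int]:
    """Map words to toy ids (99 = unknown), add BOS/EOS, then pad to MAX_TOKENS."""
    words = prompt.lower().replace(".", "").split()
    ids = [TOY_VOCAB.get(w, 99) for w in words]
    ids = [BOS] + ids[: MAX_TOKENS - 2] + [EOS]  # truncate overly long prompts
    return ids + [PAD] * (MAX_TOKENS - len(ids))


positive = encode("Portrait photo of a man.")
negative = encode("mustache")
empty = encode("")  # stands in for "no negative prompt"
print(len(positive), len(negative), len(empty))
```

Because all three sequences have the same shape, a sampler can feed any of them through the same text encoder and U-Net, which is what lets the negative prompt slot into the place of the empty one.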

The negative prompt is implemented in the sampler, the algorithm responsible for carrying out the reverse diffusion. To understand how a negative prompt works, we first need to understand how sampling works without one.

Sampling without negative prompt

In each sampling step, Stable Diffusion first denoises the image a little with conditional sampling, guided by the text prompt. The sampler then denoises the same image a little with unconditional sampling, which is totally unguided, as if you had not used a text prompt at all. Note that unconditional sampling still diffuses toward a decent image, like the basketball or wineglass below, but it could be anything. The step actually taken is based on the difference between the conditional and unconditional results: it moves toward the former and away from the latter. This process is repeated for the number of sampling steps.

Sampling steps in Stable Diffusion WITHOUT negative prompt.
Without a negative prompt, a diffusion step is a step toward the prompt and away from random images.
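In Stable Diffusion, this conditional-versus-unconditional step is known as classifier-free guidance (CFG). Below is a minimal Python sketch of the guidance formula, assuming the noise predictions are plain lists of floats rather than latent tensors; the numbers are made up, and cfg_scale plays the role of the CFG scale setting.

```python
# Minimal sketch of classifier-free guidance. In Stable Diffusion the two
# predictions come from running the U-Net twice per step; here they are
# hypothetical lists of floats.


def guided_noise(cond: list[float], uncond: list[float], cfg_scale: float) -> list[float]:
    """Start from the unconditional prediction and move toward the
    conditional one by cfg_scale times their difference."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


eps_cond = [0.5, -0.2, 0.1]   # made-up prediction for "Portrait photo of a man."
eps_uncond = [0.3, 0.0, 0.1]  # made-up prediction for the empty prompt
guided = guided_noise(eps_cond, eps_uncond, cfg_scale=7.5)
print(guided)
```

With cfg_scale = 1 the result is just the conditional prediction; larger values push the step further away from the unconditional one.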

Sampling with negative prompt

The negative prompt is implemented by hijacking the unconditional sampling. Instead of an empty prompt, which leads to random images, the sampler uses the negative prompt in that step.

Sampling steps in Stable Diffusion WITH negative prompt.
When using a negative prompt, a diffusion step is a step towards the positive prompt and away from the negative prompt.

Technically, a positive prompt steers the diffusion toward the images associated with it, while a negative prompt steers the diffusion away from the images associated with it. Note that the diffusion in Stable Diffusion happens in latent space, not image space. The figures above are drawn in image space for illustration purposes only. See this great write-up if you are interested in how it is implemented at the code level.
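The hijacking can be sketched with a toy sampler, under heavy assumptions: the “latent” is a 2-D point, toy_unet is a hypothetical stand-in for the U-Net that predicts the offset to a made-up concept center, and the update is a simplified step rather than a real sampler such as Euler or DDIM. The only difference between the two runs is which prompt the second branch of the guidance formula receives.

```python
# Toy sketch: a negative prompt replaces the empty prompt in the
# "unconditional" branch of classifier-free guidance. CONCEPT_CENTERS,
# toy_unet and the update rule are all hypothetical simplifications.

CONCEPT_CENTERS = {
    "": (0.0, 0.0),                         # empty prompt: no preference
    "portrait photo of a man": (1.0, 0.0),
    "mustache": (1.0, 1.0),                 # mustache lives along the 2nd axis
}


def toy_unet(x: tuple[float, float], prompt: str) -> tuple[float, float]:
    """Predict the 'noise' as the offset from x to the prompt's center."""
    cx, cy = CONCEPT_CENTERS[prompt]
    return (x[0] - cx, x[1] - cy)


def sample(prompt: str, negative_prompt: str = "", cfg_scale: float = 7.5,
           steps: int = 200, step_size: float = 0.05) -> tuple[float, float]:
    x = (0.0, 0.0)  # fixed starting point so the two runs are comparable
    for _ in range(steps):
        eps_pos = toy_unet(x, prompt)
        # Without a negative prompt this branch sees "", i.e. it is unconditional.
        eps_neg = toy_unet(x, negative_prompt)
        eps = tuple(n + cfg_scale * (p - n) for p, n in zip(eps_pos, eps_neg))
        x = tuple(xi - step_size * e for xi, e in zip(x, eps))
    return x


plain = sample("portrait photo of a man")
no_mustache = sample("portrait photo of a man", negative_prompt="mustache")
print(plain, no_mustache)
```

Without a negative prompt, the second coordinate stays at 0; with mustache as the negative prompt, it is pushed to negative values, away from the mustache center at y = 1. The large first coordinate of plain comes from the CFG scale amplifying the step, an exaggeration of this toy.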

Sampling space

Let’s consider the following illustration of the sampling space. When we use the prompt “Portrait photo of a man”, Stable Diffusion samples from the whole latent space of images of men, with and without a mustache, so you should get images of both.

Space of all images of men.

When the negative prompt “mustache” is added, the “Men with mustache” region of that space is excluded. Effectively, we are sampling only from images of men without a mustache.
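The same picture can be put in probability terms. Classifier-free guidance with scale s follows the score of a distribution proportional to p(x | positive)^s × p(x | negative)^(1 - s), so images that the negative prompt favors are down-weighted. Here is a toy Python sketch over a four-item “image space”; every probability in it is made up for illustration.

```python
# Toy sketch of the "sampling space" view. The image space and all
# probabilities are hypothetical. The guided distribution is proportional
# to p_pos(x) ** s * p_neg(x) ** (1 - s).


def guided_distribution(p_pos: dict[str, float], p_neg: dict[str, float],
                        cfg_scale: float) -> dict[str, float]:
    weights = {x: p_pos[x] ** cfg_scale * p_neg[x] ** (1 - cfg_scale) for x in p_pos}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}  # normalize to sum to 1


# How likely each "image" is under each prompt (made-up numbers).
p_man = {"man with mustache": 0.50, "man without mustache": 0.48,
         "basketball": 0.01, "wineglass": 0.01}
p_empty = {x: 0.25 for x in p_man}  # empty prompt: no preference
p_mustache = {"man with mustache": 0.60, "man without mustache": 0.10,
              "basketball": 0.15, "wineglass": 0.15}

plain = guided_distribution(p_man, p_empty, cfg_scale=7.5)
no_mustache = guided_distribution(p_man, p_mustache, cfg_scale=7.5)
print({x: round(p, 3) for x, p in plain.items()})
print({x: round(p, 3) for x, p in no_mustache.items()})
```

With the empty prompt, both kinds of men keep substantial probability; with mustache as the negative prompt, almost all of the mass moves to men without a mustache. In this view the “Men with mustache” region is not removed outright, just made very unlikely.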

Summary

I hope this article gives you a good overview of the negative prompt and how it works.

A negative prompt removes objects or styles in a way that may not be possible by tinkering with a positive prompt alone. It works by hijacking the unconditional sampling in each sampling step. The diffusion steers away from what’s described in the negative prompt.

Head to the second part, How to use negative prompts, to learn how to use them.


By Andrew

Andrew is an experienced software engineer with a specialization in Machine Learning and Artificial Intelligence. He is passionate about programming, art, and education. He has a doctorate degree in engineering.
