An image prompt lets you use an image as part of the prompt to influence the output image’s composition, style, and colors. In this post, you will learn how to use image prompts in the Stable Diffusion AI image generator.
You will need to have ControlNet installed to follow this tutorial.
Install IP-adapter models
Before using the IP adapters in ControlNet, download the IP-adapter models for the v1.5 model.
Put them in ControlNet’s model folder.
stable-diffusion-webui > extensions > sd-webui-controlnet > models
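The placement step above can be sketched as a short shell snippet. This assumes a default AUTOMATIC1111 install in your home directory; the weight filenames are assumptions based on the model names shown later in the ControlNet dropdown, so adjust paths and names to your setup.

```shell
# Assumed default install location; change A1111_DIR to match your setup.
A1111_DIR="$HOME/stable-diffusion-webui"
CN_MODELS="$A1111_DIR/extensions/sd-webui-controlnet/models"
mkdir -p "$CN_MODELS"

# Move the downloaded IP-adapter weights into place
# (filenames are assumptions; use the names of the files you downloaded).
# mv ~/Downloads/ip-adapter_sd15.pth      "$CN_MODELS/"
# mv ~/Downloads/ip-adapter_sd15_plus.pth "$CN_MODELS/"
```

After restarting the web UI, the models should appear in the ControlNet Model dropdown.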
What is an image prompt?
An image prompt is an image input to a Stable Diffusion model. It is an additional input to the text prompt. Both the text prompt and the image prompt influence the AI image generation through conditioning.
You can use the image prompt with Stable Diffusion through the IP-adapter (Image Prompt adapter), a neural network described in IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models by Hu Ye and coworkers.
Similar to ControlNet, the IP-adapter does not modify a Stable Diffusion model. It influences a model by conditioning. You can use it with any Stable Diffusion model.
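In rough terms, the paper achieves this with decoupled cross-attention: the text prompt and the image prompt each get their own cross-attention layers, and their outputs are summed. A paraphrased sketch of the idea (notation simplified from the paper):

```latex
% K, V come from the text features; K', V' from the image features.
% \lambda scales the image term and corresponds to the Control Weight.
Z^{\text{new}} = \operatorname{Softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V
  \;+\; \lambda \cdot \operatorname{Softmax}\!\left(\frac{Q (K')^{\top}}{\sqrt{d}}\right) V'
```

Because only these added attention layers are trained, the base Stable Diffusion weights stay untouched, which is why the adapter works with any v1.5-compatible model.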
Using Image Prompt
You can use the Image Prompt on the txt2img page of AUTOMATIC1111.
Let’s first generate an image without an image prompt, then add one to see its effect.
Let’s use the following settings on txt2img page to generate an image.
- Model: Realistic Vision v5.1
photo of a ino woman in a race car with black hair and a black pilot outfit, morning time, desert
- Negative prompt:
disfigured, ugly, bad, immature, cartoon, anime, 3d, painting, b&w, 2d, 3d, illustration, sketch, nsfw, nude
- Sampling Method: DPM++ 2M Karras
- Size: 512×768
- CFG Scale: 7
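If you prefer to script this, AUTOMATIC1111 exposes the same settings through its web API when the UI is launched with the `--api` flag. A minimal sketch of the request body (field names follow the A1111 txt2img API; the local address is the default assumption):

```python
import json

# txt2img settings from above, expressed as an A1111 API payload.
# POST this JSON to http://127.0.0.1:7860/sdapi/v1/txt2img
# (requires starting the web UI with the --api flag).
payload = {
    "prompt": "photo of a ino woman in a race car with black hair "
              "and a black pilot outfit, morning time, desert",
    "negative_prompt": "disfigured, ugly, bad, immature, cartoon, anime, "
                       "3d, painting, b&w, 2d, 3d, illustration, sketch, "
                       "nsfw, nude",
    "sampler_name": "DPM++ 2M Karras",
    "width": 512,
    "height": 768,
    "cfg_scale": 7,
}

print(json.dumps(payload, indent=2))
```

The checkpoint model (Realistic Vision v5.1 here) is selected separately, e.g. in the UI or via the options endpoint.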
We get the following image.
Adding image prompt
Let’s add an image prompt by enabling the IP-adapter model in the ControlNet extension.
We will use the following image as the image prompt.
On the txt2img page, scroll down to the ControlNet section.
Upload this image to the image Canvas.
Enter the following ControlNet settings:
- Enable: Yes
- Pixel Perfect: No
- Control Type: IP-Adapter
- Preprocessor: ip-adapter_clip_sd15
- Model: ip-adapter_sd15
- Control Weight: 0.5
- Starting Control Step: 0
- Ending Control Step: 1
- Control Mode: Balanced
- Resize mode: Crop and Resize
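For API users, the ControlNet settings above map onto a unit attached under `alwayson_scripts` in the txt2img payload. A sketch, assuming the sd-webui-controlnet API field names; the image path is a hypothetical placeholder:

```python
import base64

# Hypothetical reference image; in practice, base64-encode your file:
# image_b64 = base64.b64encode(open("image_prompt.png", "rb").read()).decode()
image_b64 = "<base64-encoded reference image>"

# ControlNet unit mirroring the settings above. Merge this dict into the
# txt2img payload sent to /sdapi/v1/txt2img.
controlnet_args = {
    "alwayson_scripts": {
        "controlnet": {
            "args": [{
                "enabled": True,
                "pixel_perfect": False,
                "module": "ip-adapter_clip_sd15",  # the preprocessor
                "model": "ip-adapter_sd15",
                "weight": 0.5,                     # Control Weight
                "guidance_start": 0.0,             # Starting Control Step
                "guidance_end": 1.0,               # Ending Control Step
                "control_mode": "Balanced",
                "resize_mode": "Crop and Resize",
                "image": image_b64,
            }]
        }
    }
}
```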
Press Generate. You get an image that is influenced by both the text prompt (a woman in a pilot outfit) and the image prompt (a woman looking sideways with a horizon in the background).
Adjusting the effect of the image prompt
The effect of the image prompt can be controlled by adjusting the Control weight of the IP-adapter.
In this example, using a weight higher than 0.5 overwhelms the text prompt. So, it is important to set the weight to an appropriate value so that both the text prompt and the image prompt have effects.
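A convenient way to find a good weight is to sweep it with a fixed seed, so that only the IP-adapter's influence varies between images. A sketch that builds one API request body per weight (the prompt and field names here are illustrative; each payload would be POSTed to the txt2img endpoint as above):

```python
# Build one txt2img payload per Control Weight to compare the
# IP-adapter's influence. The seed is fixed so only the weight varies.
def payload_for_weight(weight: float) -> dict:
    return {
        "prompt": "photo of a woman in a pilot outfit",  # illustrative
        "seed": 1234,
        "alwayson_scripts": {
            "controlnet": {
                "args": [{
                    "enabled": True,
                    "module": "ip-adapter_clip_sd15",
                    "model": "ip-adapter_sd15",
                    "weight": weight,
                }]
            }
        },
    }

payloads = [payload_for_weight(w) for w in (0.3, 0.5, 0.7, 1.0)]
```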
There are two IP-adapter models available: the standard model we have just used, and the plus model.
To use the plus model, select ip-adapter_sd15_plus in ControlNet > model.
The plus model is very strong. It tends to copy the image prompt faithfully. See the result below.
But it has some good use cases that I will go through in a moment.
Using image prompt with SDXL model
You can use the IP-adapter with an SDXL model. The changes you need to make are:
- Checkpoint model: Select an SDXL model.
- Image size: 832×1216
- ControlNet Preprocessor: ip-adapter_clip_sdxl
- ControlNet model: ip-adapter_xl
Here’s the image without using the image prompt.
Here are the images with IP-adapter XL at various control weights. The effect is similar to the standard 1.5 model.
Use cases for IP-adapter
Generate images with the same style
With the IP-adapter, you can fix the color and composition to achieve a reasonably consistent style.
Control weight: 0.6
Ending control step: 0.3
We get images with a similar composition.
Reproducing an image
You may not know how to reproduce an image with a text prompt alone. For example, the image below cannot be recreated with the prompt “cat chair”.
Using the prompt “cat chair”, you get:
But using IP-adapter with the reference image, you get:
The effect is stronger with the Plus model: