This beginner’s guide is for anyone with zero experience with Stable Diffusion or other AI image generators.
I will give an overview of Stable Diffusion, what it can do for you, and some tips for using it.
This is part 1 of the beginner’s guide series.
Read part 2: Prompt building.
Read part 3: Inpainting.
Read part 4: Models.
Contents
- What is Stable Diffusion?
- What kind of images can I generate with Stable Diffusion?
- Sign me up! How to start generating images?
- How to build a good prompt?
- Rules of thumb for building good prompts
- What are those parameters, and should I change them?
- How many images should I generate?
- What is image-to-image?
- Common ways to fix defects in images
- What are custom models?
- Negative prompts
- How to control image composition?
- Next Step
What is Stable Diffusion?
Stable Diffusion is an AI model that generates images from text input. Let’s say you want to generate images of a gingerbread house; you would use a prompt like:
gingerbread house, diorama, in focus, white background, toast, crunch cereal
The AI model would generate images that match the prompt:



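(An aside for the technically curious: under the hood this is just a model you can call from code. Below is a minimal sketch using the Hugging Face diffusers library; it assumes you have diffusers, transformers, and torch installed and a CUDA GPU available, and the model identifier is a public Hugging Face repo name. The online generators and GUIs covered later do all of this for you.)

```python
# Minimal text-to-image sketch with the diffusers library (assumed setup: GPU + diffusers installed).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "gingerbread house, diorama, in focus, white background, toast, crunch cereal"
image = pipe(prompt).images[0]  # run the diffusion process and take the first image
image.save("gingerbread.png")
```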
There are similar text-to-image generation services like DALL·E and Midjourney. Why Stable Diffusion? The advantages of Stable Diffusion are:
- Open-source: Many enthusiasts have created free and powerful tools.
- Designed for low-power computers: It’s free or cheap to run.
What kind of images can I generate with Stable Diffusion?
The sky is the limit. Here are some examples.
Anime style



Photo-realistic



Landscape



Fantasy



Artistic styles



Sign me up! How to start generating images?
Online generator
For absolute beginners, I recommend using a free online generator. Go to one of the sites in the list, put in the example prompt above, and you are in business!
Advanced GUI
The downside of free online generators is that the functionalities are pretty limited.
You can use a more advanced GUI (Graphical User Interface) if you outgrow them. I use AUTOMATIC1111, a powerful and popular choice. See the Quick Start Guide for setting up in the Google Colab cloud server.
Running on your PC is also a good option if you have a decent NVIDIA GPU with at least 4GB VRAM. See install guides for Windows and Mac.
Why use an advanced GUI? A whole array of tools is at your disposal:
- Advanced prompting techniques
- Regenerate a small part of an image with Inpainting
- Generate images based on an input image (Image-to-image)
- Edit an image by giving it a text instruction.
How to build a good prompt?
There’s a lot to learn about crafting a good prompt. But the basics are to describe your subject in as much detail as possible and to include powerful keywords that define the style.
Using a prompt generator is a great way to learn a step-by-step process and important keywords. It is essential for beginners to learn a set of powerful keywords and their expected effects. This is like learning vocabulary for a new language. You can also find a short list of keywords and notes here.
A shortcut to generating high-quality images is to reuse existing prompts. Head to the prompt collection, pick an image you like, and steal the prompt! The downside is that you may not understand why it generates high-quality images. Read the notes and change the prompt to see the effect.
Alternatively, use image collection sites like Playground AI. Pick an image you like and remix the prompt. But finding a high-quality prompt there can be like finding a needle in a haystack.
Treat the prompt as a starting point. Modify it to suit your needs.
Rules of thumb for building good prompts
Two rules: (1) Be detailed and specific, and (2) use powerful keywords.
Be detailed and specific
Although AI advances in leaps and bounds, Stable Diffusion still cannot read your mind. You need to describe your image in as much detail as possible.
Let’s say you want to generate a picture of a woman in a street scene. A simplistic prompt
a woman on street
gives you an image like this:

Well, you may not have wanted to generate a grandma, but this technically matches your prompt. You cannot blame Stable Diffusion…
So instead, you should write more.
a young lady, brown eyes, highlights in hair, smile, wearing stylish business casual attire, sitting outside, quiet city street, rim lighting

See the drastic difference. So work on your prompt-building skills!
Use powerful keywords
Some keywords are more powerful than others. Examples are
- Celebrity names (e.g. Emma Watson)
- Artist names (e.g. van Gogh)
- Art medium (e.g. illustration, painting, photograph)
Using them carefully can steer the image in the direction you want.
You can learn more about prompt building and example keywords in the basics of building prompts.
Want to cheat? Like doing homework, you can use ChatGPT to generate prompts!
What are those parameters, and should I change them?
Most online generators allow you to change a limited set of parameters. Below are some important ones:
- Image size: The size of the output image. The standard size is 512×512 pixels. Changing it to a portrait or landscape size can have a big impact on the image. For example, use a portrait size to generate a full-body image.
- Sampling steps: Use at least 20 steps. Increase it if you see a blurry image.
- CFG scale: The typical value is 7. Increase it if you want the image to follow the prompt more closely.
- Seed value: -1 means a random seed, so every run generates a different image. Specify a value if you want to reproduce the same image.
See recommendations for other settings.
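If you script Stable Diffusion with the diffusers library instead of a web UI, these parameters map to plain function arguments. A sketch, assuming the same v1.5 pipeline as above (the argument names are the library’s, not the web UI’s):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A fixed seed reproduces the same image; omit the generator for a random seed (the GUI's -1).
generator = torch.Generator("cuda").manual_seed(42)

image = pipe(
    "a young lady, brown eyes, highlights in hair, smile, quiet city street, rim lighting",
    width=512, height=768,    # portrait size for a full-body image
    num_inference_steps=25,   # sampling steps
    guidance_scale=7.0,       # CFG scale
    generator=generator,      # seed
).images[0]
image.save("portrait.png")
```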

How many images should I generate?
You should always generate multiple images when testing a prompt.
I generate 2-4 images at a time when making big changes to the prompt, so that I can speed up the search. I would generate 4 at a time when making small changes to increase the chance of seeing something usable.
Some prompts only work half of the time or less. So don’t write off a prompt based on one image.
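With the diffusers library, generating several candidates per prompt takes one extra argument. A short sketch, assuming `pipe` is the pipeline loaded earlier:

```python
# Generate 4 images for the same prompt in one call.
images = pipe(
    "a young lady, brown eyes, smile, quiet city street, rim lighting",
    num_images_per_prompt=4,
).images
for i, img in enumerate(images):
    img.save(f"candidate_{i}.png")
```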
What is image-to-image?

Image-to-image (or img2img for short) takes (1) an image and (2) a prompt as an input. You can guide the image generation not just with the prompt but also with the image.
In fact, you can see text-to-image as a particular case of image-to-image: It is simply image-to-image with an input image of random noise.
Img2img is an under-appreciated technique. See how you can make professional drawings and cartoonize photos with img2img!
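For the scripting-minded, here is a sketch of image-to-image with diffusers’ img2img pipeline; the input file name and prompt are placeholders:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("input_sketch.png").convert("RGB").resize((512, 512))
image = pipe(
    prompt="a majestic dragon, detailed digital art",
    image=init_image,
    strength=0.6,  # 0-1: how far the output may drift from the input image
).images[0]
image.save("img2img.png")
```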
Common ways to fix defects in images
When you see stunning AI images shared on social media, there’s a good chance they have undergone a series of post-processing steps. We will go over some of them in this section.
Face Restoration

It’s well known in the AI artist community that Stable Diffusion is not good at generating faces. Very often, the faces generated have artifacts.
We often use image AI models trained for restoring faces, for example CodeFormer, which the AUTOMATIC1111 GUI has built-in support for. See how to turn it on.
Did you know there’s an updated VAE for the v1.4 and v1.5 models that fixes eyes? Check out how to install a VAE.
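If you use the diffusers library, one way to apply that update is to swap in the fine-tuned VAE released by Stability AI. A sketch (the model names are public Hugging Face repo names; this is an assumption about your setup, not the only way to install a VAE):

```python
import torch
from diffusers import StableDiffusionPipeline, AutoencoderKL

# Load the fine-tuned VAE and attach it to a v1.5 pipeline.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
).to("cuda")
```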
Fixing small artifacts with inpainting
It is difficult to get the image you want on the first try. A better approach is to generate an image with good composition. Then repair the defects with inpainting.
Below is an example of an image before and after inpainting. Using the original prompt for inpainting works 90% of the time.

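In code, inpainting takes the prompt plus a black-and-white mask marking the region to regenerate. A sketch with diffusers’ inpainting pipeline (the file names are placeholders):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("original.png").convert("RGB")
mask = Image.open("mask.png").convert("RGB")  # white = area to regenerate

fixed = pipe(
    prompt="a young lady, brown eyes, smile, quiet city street, rim lighting",
    image=image,
    mask_image=mask,
).images[0]
fixed.save("inpainted.png")
```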
There are other techniques to fix things. Read more about fixing common issues.
What are custom models?
The official models released by Stability AI and their partners are called base models. Some examples of base models are Stable Diffusion 1.4, 1.5, 2.0, and 2.1.
Custom models are trained from the base models. Currently, most of the models are trained from v1.4 or v1.5. They are trained with additional data for generating images of particular styles or objects.
The sky is the limit when it comes to custom models. They can be anime style, Disney style, or even the style of another AI. You name it.
Below is a comparison of 5 different models.

It is also easy to merge two models to create a style in between.
Which model should I use?
Stick with the base models if you are starting out. There is plenty to learn and play with to keep you busy for months.
The two main groups of base models are v1 and v2. v1 models are 1.4 and 1.5. v2 models are 2.0 and 2.1.
You may think you should start with the newer v2 models, but people are still trying to figure out how to use them. Images from v2 are not necessarily better than v1’s.
I recommend using the v1.5 model if you are new to Stable Diffusion.
How to train a new model?
An advantage of using Stable Diffusion is that you have total control of the model. You can create your own model with a unique style if you want. There are two main ways to train models: (1) Dreambooth and (2) embedding.
Dreambooth is considered more powerful because it fine-tunes the weights of the whole model. Embeddings leave the model untouched but learn new keywords to describe the new subject or style.
You can experiment with the Colab notebook in the dreambooth article.
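Training itself is beyond this guide, but here is a sketch of how you would use the results with diffusers. A Dreambooth run produces a whole model folder; an embedding (textual inversion) run produces a small file you load on top of a base model. The paths and the trigger token below are hypothetical placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

# (1) Dreambooth result: a full fine-tuned model, loaded like any other model folder.
dreambooth_pipe = StableDiffusionPipeline.from_pretrained(
    "./my-dreambooth-model", torch_dtype=torch.float16
).to("cuda")

# (2) Embedding result: a small learned token loaded on top of an unchanged base model.
base_pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
base_pipe.load_textual_inversion("./my-style-embedding", token="<my-style>")

image = base_pipe("a castle in <my-style> style").images[0]
image.save("custom_style.png")
```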
Negative prompts
You put what you want to see in the prompt, and what you don’t want to see in the negative prompt. Not all Stable Diffusion services support negative prompts. But they are valuable for v1 models and a must for v2 models. It doesn’t hurt for a beginner to use a universal negative prompt. Read more about negative prompts.
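In the diffusers library, the negative prompt is simply another argument. A sketch, assuming `pipe` is a loaded Stable Diffusion pipeline as in the earlier examples (the keywords are just a common example, not a definitive list):

```python
image = pipe(
    prompt="portrait photo of a young woman, city street, rim lighting",
    negative_prompt="deformed, disfigured, ugly, blurry, extra limbs, watermark",
).images[0]
image.save("with_negative_prompt.png")
```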
How to control image composition?
Stable Diffusion technology is rapidly improving, and there are now a few ways to control image composition.
Image-to-image
You can ask Stable Diffusion to roughly follow an input image when generating a new one. It’s called image-to-image. Below is an example of using an input image of an eagle to generate a dragon. The composition of the output image follows the input.


ControlNet
ControlNet similarly uses an input image to direct the output. But it can extract specific information, for example, human poses. Below is an example of using ControlNet to copy a human pose from the input image.


In addition to human poses, ControlNet can extract other information such as outlines.
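For reference, here is a sketch of ControlNet with diffusers, using the community OpenPose ControlNet for v1.5. It assumes the pose map has already been extracted from the reference photo (the web UIs, or tools like the controlnet_aux package, can do the extraction for you):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

pose_map = Image.open("pose.png")  # stick-figure pose extracted from the input photo
image = pipe("an astronaut dancing on the moon", image=pose_map).images[0]
image.save("controlnet.png")
```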
Depth-to-image
Depth-to-image is another way to control composition through an input image. It can detect the foreground and the background of the input image. The output image will follow the same foreground and background. Below is an example.


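A sketch of depth-to-image with diffusers, using the Stable Diffusion v2 depth model (the input photo path is a placeholder; the pipeline estimates the depth map for you):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
).to("cuda")

photo = Image.open("photo.png").convert("RGB")
image = pipe(
    prompt="a fantasy castle, same composition as the photo",
    image=photo,
    strength=0.8,  # how much the output may deviate while keeping the depth layout
).images[0]
image.save("depth2img.png")
```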
Next Step
So you have completed the first tutorial of the Beginner’s Guide! Check out the rest of them.
This is part 1 of the beginner’s guide series.
Read part 2: Prompt building.
Read part 3: Inpainting.
Read part 4: Models.