Midjourney vs Stable Diffusion: Which one should you pick?

Midjourney is a web service that generates stunning AI images from text. It’s similar to Stable Diffusion, but there are some differences: Midjourney can only be used online, and you have to pay for it. So, is Midjourney worth paying for? And how does it differ from Stable Diffusion? Let’s find out.

Midjourney vs Stable Diffusion – Feature comparison

You will find a detailed comparison between Stable Diffusion and Midjourney in this section. Unlike Midjourney, Stable Diffusion can be used in multiple ways. I will confine my analysis to AUTOMATIC1111, a popular GUI for Stable Diffusion.

Like Midjourney, AUTOMATIC1111 can be used as a web service (e.g. on Google Colab). You can also run it locally on a Windows PC or a Mac. New to Stable Diffusion? Check out the Quick Start Guide.

You will see image comparisons throughout the article. I tweaked the prompts and selected models in each case to optimize the images. So they are not direct comparisons of the same prompts, but rather attempts to generate similar pictures in various styles.

Here’s the summary of the comparison.

Feature                          Stable Diffusion (AUTOMATIC1111)           Midjourney
Image customization              High                                       Low
Ease of getting started          Low                                        Medium
Ease of generating good images   Low                                        High
Inpainting                       Yes                                        No
Outpainting                      Yes                                        No
Aspect ratio                     Yes                                        Yes
Model variants                   ~1,000s                                    ~10s
Negative prompt                  Yes                                        Yes
Variation from a generation      Yes                                        Yes
Control composition and pose     Yes                                        No
License                          Permissive (depends on the model used)     Restrictive (depends on the paid tier)
Make your own model              Yes                                        No
Cost                             Free                                       $10-$60 per month
Model                            Open-sourced                               Proprietary
Content filter                   No                                         Yes
Style                            Varies                                     Realistic illustration, artistic
Upscaler                         Yes                                        Yes
Image prompt                     No                                         Yes
Image-to-image                   Yes                                        No
Prompt word limit                No limit                                   Unclear

Image customization

There are more ways to customize an image in Stable Diffusion: you can change the image size, how closely the prompt should be followed (the CFG scale), the number of images generated, the seed value, the sampler, and so on. The options are fewer in Midjourney: you can change the aspect ratio and the seed, and choose whether to stop early.
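
As a rough illustration, here is how those knobs map onto the open-source diffusers library (not AUTOMATIC1111 itself, but the same underlying parameters; the model name and values are just examples):

    # A minimal sketch of Stable Diffusion's customization knobs in diffusers.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    generator = torch.Generator("cuda").manual_seed(42)  # the seed value
    images = pipe(
        prompt="a mechanical dove, intricate details",
        negative_prompt="blurry, low quality",   # keywords to steer away from
        width=768, height=512,                   # image size
        guidance_scale=7.5,                      # how closely to follow the prompt
        num_inference_steps=30,                  # sampling steps
        num_images_per_prompt=4,                 # number of images generated
        generator=generator,
    ).images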

Verdict: Stable Diffusion wins.

Ease of getting started

AUTOMATIC1111 is a bit hard to install. After you get it up and running, you will still need to find and install models to get the styles you want.

Midjourney is not as user-friendly as it should be, mainly because of their choice of using Discord as an interface. But it’s still ten times easier to get started.

Pro tip: Want to hide other people’s generations? Create a new private server and invite the Midjourney bot, and you can generate images in peace.

Verdict: Midjourney wins.

Ease of generating good images

Midjourney is well known for making it surprisingly easy to generate artistic images with a lot of fine detail. You don’t need to work very hard to generate good images. In fact, it will often ignore part of your prompt and deliver surprisingly aesthetic images.

A Stable Diffusion user needs to put more work into building a good prompt and experimenting with models to generate an image of similar quality.

Verdict: Midjourney wins.

Prompt

Both Stable Diffusion and Midjourney support prompts and negative prompts. Both can add weight to any keyword in a prompt. You can do slightly more prompt tricks with AUTOMATIC1111, such as blending two keywords.
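
For example, AUTOMATIC1111’s prompt syntax lets you weight and blend keywords directly in the prompt (the keywords below are illustrative):

    a (mechanical:1.3) dove           # weight "mechanical" 1.3x
    a [dove|sparrow] in flight        # alternate the two keywords every sampling step
    a [dove:sparrow:0.5] in flight    # switch from "dove" to "sparrow" halfway through sampling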

Verdict: Tie.

Model varieties

Stable Diffusion is an open-source model. People have made models of different styles. There are currently more than a thousand models available for download. Each model can be further modified with LoRA models, embedding models, and hypernetworks. The end result is there are more models than you have time to try.

Midjourney’s models are limited in comparison. They offer v1 to v5 models, plus a few special models like niji, test, testp and HD. There is also a stylize parameter that controls how artistic the image is. But the overall offering is dwarfed by Stable Diffusion’s.

Verdict: Stable Diffusion wins.

Image editing

You can use Stable Diffusion to edit a generated image in many ways. This includes regenerating only part of an image with inpainting, and extending an image through outpainting. You can also simply tell Stable Diffusion what you want to change using the instruct-pix2pix model.
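
As a sketch of what inpainting looks like programmatically (using diffusers and the standard v1.5 inpainting checkpoint; file names are placeholders):

    # Regenerate only the masked part of an image (inpainting).
    import torch
    from diffusers import StableDiffusionInpaintPipeline
    from PIL import Image

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    init = Image.open("dove.png").convert("RGB")   # the image to edit
    mask = Image.open("mask.png").convert("RGB")   # white = region to regenerate
    result = pipe(prompt="a golden beak", image=init, mask_image=mask).images[0]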

Sadly, you cannot edit an image with Midjourney.

Verdict: Stable Diffusion wins.

Style

Midjourney v4 produces images with a realistic illustration style by default. It can also generate other art styles when prompted correctly. Realistic photos are possible with the v5 model.

Stable Diffusion can generate a broader range of styles, ranging from realistic photos to abstract art, thanks to the passionate community and the ease of training new models. Users can remix models with embeddings, LoRAs, or hypernetworks. It can produce surprising effects and is fun to play with.
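
Remixing is nearly a one-liner in code terms. A sketch of loading a community LoRA on top of a base model in diffusers (the LoRA file name is hypothetical):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    pipe.load_lora_weights(".", weight_name="my_style_lora.safetensors")  # hypothetical LoRA file
    image = pipe(
        "portrait of a woman, intricate details",
        cross_attention_kwargs={"scale": 0.8},  # how strongly the LoRA modifies the base model
    ).images[0]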

Verdict: Stable Diffusion wins.

Variation from a generation

Both can generate slight variations of a generated image. In Midjourney, you press the V buttons under the images. In AUTOMATIC1111, you use the variation seed option.
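
Under the hood, a variation is made by mixing a little noise from a second seed into the original seed’s noise. A simplified sketch in diffusers (AUTOMATIC1111 actually interpolates spherically; a linear blend is shown here for brevity):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    shape = (1, pipe.unet.config.in_channels, 64, 64)  # latent shape for a 512x512 image
    base = torch.randn(shape, generator=torch.Generator("cuda").manual_seed(42),
                       device="cuda", dtype=torch.float16)
    vari = torch.randn(shape, generator=torch.Generator("cuda").manual_seed(43),
                       device="cuda", dtype=torch.float16)

    strength = 0.2  # 0 = same image, 1 = entirely new image
    latents = (1 - strength) * base + strength * vari
    latents = latents / latents.std()  # restore unit variance so the sampler behaves

    image = pipe("a mechanical dove", latents=latents).images[0]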

Verdict: Tie.

Control composition and pose

You can control composition and pose in Stable Diffusion in multiple ways: image-to-image, depth-to-image, instruct-pix2pix, and ControlNet. In Midjourney, the closest option is an image prompt, which acts like a text prompt to steer image generation.
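
A sketch of copying a pose with ControlNet in diffusers (the openpose variant; the pose image would normally be extracted from a reference photo):

    import torch
    from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
    from PIL import Image

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")

    pose = Image.open("pose.png")  # a pose skeleton image
    image = pipe("a knight in shining armor", image=pose).images[0]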

Verdict: Stable Diffusion wins.

Cost

Using Stable Diffusion with AUTOMATIC1111 can be free using your own computer. In contrast, using Midjourney would set you back at least $10 a month.

Verdict: Stable Diffusion wins.

License

Many people are unaware that the ownership of the images you generate using Midjourney depends on your paid tier. You own nothing if you are not a paid subscriber. You have more rights if you pay more. In any case, Midjourney can use your images without asking you first. See their terms of service.

In contrast, Stable Diffusion claims no right to the images you generate. You are allowed to distribute and further train the model and even sell it. However, models further fine-tuned by others may have additional restrictions. So be sure to read the license and terms of use when you use a new model.

Verdict: Stable Diffusion wins.

Content Filter

There is a content filter in the original Stable Diffusion v1 software, but the community quickly shared a version with the filter disabled. So in practice, there’s no content filter in the v1 models. v2 is trickier because NSFW content was removed from the training images: it cannot generate explicit content by design. In contrast, generating explicit images is off limits in Midjourney. It is blocked even at the prompt level, and you can get banned if you try.

Verdict: Stable Diffusion wins.

Making your own models

Perhaps the biggest appeal of Stable Diffusion is the possibility of making your own models. If you don’t like the images you see, you can always train your own model. You can use dreambooth, textual inversion, LoRA, hypernetwork, or simply run additional rounds of training with your own images. Unfortunately, you cannot do that with Midjourney.
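
Training itself is too involved for a short snippet, but the output of, say, a textual inversion run is just a small embedding file that plugs straight back into the pipeline. A sketch with a hypothetical embedding file and trigger token:

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # "my-style.pt" and "<my-style>" are hypothetical; use your own trained embedding.
    pipe.load_textual_inversion("./my-style.pt", token="<my-style>")
    image = pipe("a castle in <my-style>").images[0]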

Verdict: Stable Diffusion wins.

Upscaler

Both Stable Diffusion and Midjourney have upscalers. AUTOMATIC1111 offers more choices and parameters, and you can easily install additional upscalers.
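
For example, here is a sketch of 4x upscaling with diffusers’ upscale pipeline (one of many options; AUTOMATIC1111 also ships ESRGAN-family upscalers):

    import torch
    from diffusers import StableDiffusionUpscalePipeline
    from PIL import Image

    pipe = StableDiffusionUpscalePipeline.from_pretrained(
        "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
    ).to("cuda")

    low_res = Image.open("dove_small.png").convert("RGB")  # large inputs need a lot of VRAM
    upscaled = pipe(prompt="a mechanical dove, sharp details", image=low_res).images[0]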

Verdict: Stable Diffusion wins.

Image Prompt

You can use an image as a prompt together with a text prompt in Midjourney. It generates a combination of the content of the image prompt and the text prompt. That’s not the same as image-to-image in Stable Diffusion, where the input image acts as an initial image but is not used in conditioning. The closest thing Stable Diffusion will have is Stable Diffusion Reimagine, which uses an input image as conditioning in place of the text prompt.

Verdict: Midjourney wins.

Image-to-image

Currently, Midjourney offers no image-to-image functionality, a method for diffusion models to generate images based on another image. This is unsurprising since the earlier versions of Midjourney may not be diffusion models.
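
For reference, here is what image-to-image looks like in diffusers (file names are placeholders; strength controls how much of the input image is repainted):

    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    init = Image.open("sketch.png").convert("RGB").resize((512, 512))
    image = pipe("a mechanical dove, detailed illustration",
                 image=init, strength=0.6).images[0]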

Verdict: Stable Diffusion wins.

Prompt limit

Midjourney used to state in their user guide that there was a limit of about 60 words for a prompt, but they have since removed that statement. AUTOMATIC1111, on the other hand, now supports unlimited prompt length.

Verdict: Not clear.

Is Midjourney using Stable Diffusion?

Midjourney says the v5 model is not Stable Diffusion, and that’s all they have said. However, the improvements in v5 look suspiciously similar to Stable Diffusion v2: the prompt needs to be more literal and specific, and people are getting five fingers… Could Midjourney share some components of Stable Diffusion v2, like the OpenCLIP text embedding? It certainly makes sense to use a diffusion model because of the lower running costs.

Is Midjourney better than Stable Diffusion?

I don’t want to give a diplomatic answer, but it really depends on what you are looking for.

Midjourney has its own unique style – high contrast, good lighting, and realistic illustration. It’s super easy to create images with crazy amounts of detail. You can get good images without trying very hard.

On the other hand, Stable Diffusion can also create similar or better images, but it requires a bit more know-how. So, if you’re up for a challenge and want to dive deep into the technical side of things, then Stable Diffusion is the perfect fit for you.

How does Midjourney differ from Stable Diffusion?

You can read the first section for a point-by-point comparison. The main difference lies in the operating model and the users they cater to.

Midjourney chose a proprietary business model. They take care of the model development, training, tweaking and the user interface. Everything is meant to be simple and work out of the box: you tell the model what you want, and you get it.

Stable Diffusion is software that embraces an open-source ecosystem. The model’s code and training data are available for everyone to access. You can build on it and fine-tune the model to achieve exactly what you want. And guess what? People have already done that! There are thousands of models that have been publicly created and shared by users just like you.

But that’s not all. New and amazing tools are being created every week, and it never ceases to amaze me how creative people can be when given the opportunity.

Generating a Midjourney image in Stable Diffusion

Recreating a Midjourney image in Stable Diffusion is tricky but possible. I use the following workflow.

  1. Use the same prompt to see what you get. You can start with the v1.5 base model. The result is usually very different.
  2. Adjust the keywords of the prompt. You will likely find that Midjourney ignores some keywords and takes the liberty of adding others. I usually look at the keywords in the prompt generator to see how to achieve the same effect.
  3. You will likely want to add a negative prompt (The universal one is usually fine).
  4. You will definitely need to add some lighting keywords. Pay attention to the contrast and luminosity. Choose the lighting keywords that can achieve a similar effect.
  5. Since Midjourney images are on the darker side, you may want to add a LoRA like epi_noiseoffset.
  6. Finally, experiment with different models and tweak the prompt.

And use ControlNet if you want to copy the composition.
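
Putting the steps together, the end state typically looks something like this in AUTOMATIC1111 (the keywords and the LoRA weight are illustrative, not a recipe):

    Prompt: a mechanical dove, intricate details, dramatic lighting, high contrast,
            cinematic, <lora:epi_noiseoffset:0.8>
    Negative prompt: deformed, disfigured, blurry, low quality, watermark, text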

I will write another article to detail the process step-by-step. Stay tuned!

Which one should I use?

Midjourney and Stable Diffusion both have a large user base. They have their strengths and weaknesses.

Midjourney is for you if

  • You want to generate stunning images without a deep learning curve.
  • You are busy and cannot afford the time to set up and learn the models.
  • You like the Midjourney styles.
  • You are looking for an out-of-box AI image solution.
  • You don’t mind paying a subscription fee.
  • You are ok with their terms of use.

Stable Diffusion is for you if

  • You want a completely free solution.
  • You want to run everything locally.
  • You are tech-savvy.
  • You like tinkering with your setup, trying out model combinations, and using new tools.
  • You need the image-editing capability.
  • You prefer open-source tools.
  • You want more control over your images.

I hope this article helps you understand the differences between Midjourney and Stable Diffusion and decide which one to use. If you can afford the time and resources, you should try out both. You will likely find both have their place in your workflow. I use both of them and am often fascinated by the challenge of reproducing one’s images with the other.

By Andrew

Andrew is an experienced engineer with a specialization in Machine Learning and Artificial Intelligence. He is passionate about programming, art, photography, and education. He has a Ph.D. in engineering.

26 comments

  1. I’ve been using SD for about two months now. It is fascinating that in such a short time I was able to learn things like training models with dreambooth (which pushed me to buy a larger HDD) and then training my first LoRA, reaching exceptionally good results and also saving a lot of space, getting better results than my previously trained checkpoints early on.
    Something worth mentioning is that you need good hardware to run SD locally. This may cost more than a Midjourney subscription, but it pays off because you have limitless freedom to render and train your own models. I upgraded my PC almost entirely: a new processor, a graphics card with at least 12GB of VRAM, and 4x more RAM, and now I can do everything from generating larger batches of images to training LoRAs and even dreambooth locally in a reasonable amount of time. Of course, I can use my more powerful graphics card in Blender as well, software I’ve been using since at least 2016. Great for making poses for ControlNet.
    Thanks a lot for writing this site. If not for the community, I’d have given up. For anyone starting now, don’t give up! For everyone else who managed to get everything set up and running, happy diffusing!

    Lucas, from Brazil.

  2. Dear Sir,

    I use your code on a Mac M1 Pro 2021 (without a GPU). When I run it, I get 2 errors:

    Launching Web UI with arguments: --skip-torch-cuda-test --upcast-sampling --no-half-vae --use-cpu interrogate
    no module ‘xformers’. Processing without…
    no module ‘xformers’. Processing without…
    No module ‘xformers’. Proceeding without it.
    Warning: caught exception ‘Torch not compiled with CUDA enabled’, memory monitor disabled

    RuntimeError: MPS backend out of memory (MPS allocated: 9.93 GB, other allocations: 2.03 GB, max allowed: 18.13 GB). Tried to allocate 7.43 GB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

    I tried installing Torch and TensorFlow to activate the (virtual) GPU and confirmed a GPU is present, but I get the same error. This is my webui-user.bat file:

    git pull

    @echo off

    set PYTHON=
    set GIT=
    set VENV_DIR=
    set COMMANDLINE_ARGS=--autolaunch --skip-version-check --precision full --no-half --skip-torch-cuda-test --listen --no-half --precision full --port 9999 --disable-safe-unpickle --deepdanbooru

    call webui.bat

    Pls help me, many thanks

  3. I use MJ because it is easy, but I want to move to Stable Diffusion because I do want to dive deeper. Wish me luck! I hope it works out. So thankful for all the resources here.

  4. We experimented with the Tower of Babel (Genesis 11) as a text prompt. Stable Diffusion seemed to recognize the scene and came up with some images similar to Brueghel’s famous Tower of Babel paintings, while Midjourney gave some very naturalistic images of people building (but did less with the tower motif). Any explanation for how Stable Diffusion recognized the Tower of Babel scene (similarly for a Last Supper prompt, with some similarities to Da Vinci’s Last Supper, but not with a Paradise scene) and Midjourney didn’t?

    1. Good question! It comes down to two parts for Stable Diffusion:
      (1) The language model (OpenAI’s CLIP, which SD v1 uses) – the prompt is mapped to a space closer to these images. That has something to do with how OpenAI trained the word embeddings and is proprietary.
      (2) The training data – the training data further reinforces and fine-tunes the output toward those images. There is likely something similar in the dataset.

      I cannot speak for MJ because their models are proprietary. It is likely that they fine-tune the models based on what their customers like to see/generate. It is not uncommon to see MJ completely ignore part of the prompt to produce good-looking images. It’s a trade-off between ease of use and accuracy. (i.e. people don’t need to put a lot of effort into writing prompts to get good images.)

  5. If you right-click and save the images, the interface will show you their names, which include the prompts lmao. You can probably also see the name using F12.

  6. Since these diffusion models appeared it has been clear there are two kinds of people, those that want to share and help others so that they can create the best pictures possible with the new technology. They offer tips on prompt engineering and can even help you achieve a style that you want.
    Then there’s others that just showoff what they can produce but will keep their “secret sauce” hidden, never sharing prompts, gathering as much from others but never giving back, showing what’s possible to create without a hint of how to do so.
    And this page is from the latter, there are similar pages on the internet but those ones include the prompts that produced the images, remember that you wouldn’t have been able to produce these images if others didn’t share their prompts with you. There’s something wrong when the world would be a worse place if everyone were like you, and nobody shared.

  7. Midjourney is too simple to be compared to Stable Diffusion. Midjourney does not have a large number of models. For Midjourney, you cannot train your model. Midjourney works through discord on a remote server, so you can’t use plugins, or write and use your own code. You cannot create pattern textures in Midjourney. Midjourney is a toy to play around with, generate cool pictures and that’s it.

  8. I refuse to use software that treats me like a child. “NSFW” is a disgusting Orwellian concept. If Puritans want to drag everyone to the New Dark Ages they’ll have to speak to me because I’m not going to play along.

    The erotica industry, which is disgustingly labeled “porn” generates billions of dollars and satisfies a lot of people. Not safe for work, indeed. Do the people in that industry not work?

    It’s time for adults to say no to Big Brother and Big Sister with their “everything, including David, is pornography” and their “not safe for work” euphemisms for radical censorship of basic things like the healthy naked body everyone is born with.

  9. Being new to this whole AI art thing, I find it interesting for someone with experience to compare different programs and include the results, presumably using the same input prompts.

    1. I should have said I tried generating similar images in MJ and SD. I tweaked the prompt and other parameters because using the same prompt may not be the best comparison of what each can achieve.

    1. Hi, Midjourney offers an image prompt, but it is not the same as image-to-image in Stable Diffusion.

      To be precise, image-to-image is called SDEdit in the machine learning literature. It uses the input image as the blueprint to generate the output image. The overall color and composition closely match the input image.

      Midjourney doesn’t tell us the details, but it is likely using the image as extra conditioning alongside the text prompt. It is similar to Stable Diffusion Reimagine or ControlNet. The output image has elements of the input image, but the spatial composition will not be followed.

      Hope this clarifies the two features.

  10. Thank you for this really helpful, useful, and informative article, it answered a number of questions I had about Midjourney and Stable Diffusion.

  11. Articles like these are so cringe, because it shows the author has no understanding of the origins of these tools or how different they really are. As if they are some sort of competitors to one another or 2 unique things in the same field meant for the same purpose. Just sad really.

    1. What a pointless comment; it adds nothing of value. You aren’t saying anything, you’re not explaining your position, you’re dropping some vague criticism without content to shield you from pesky replies. Just sad really.

    2. I have been following AI image processing for a few years now (remember deep dreaming?) and I consider this article an excellent summary for someone who just wants to dabble in the stuff.
      There is so much that can be said about “the origins” that it requires a whole other article and audience.
      Please, move on and be salty somewhere else, Twitter or something.

    3. Quite the opposite. The author actually showed quite a lot of understanding of the underlying mechanisms of those tools… well, in the case of diffusion models at least, because Midjourney is a proprietary tool with no visibility into how it works underneath. I find this article very informative and well balanced. It shows the advantages and disadvantages of both, and also comments on when you could choose MJ and when SD.
      And yes, they are competitors. Both are tools in the exact same field. Both are used for image generation, mostly for the same purposes.
