SDXL vs Flux1.dev models comparison


SDXL and Flux1.dev are two popular local AI image models. Both have a relatively high native resolution of 1024×1024. Currently, more resources are available for SDXL, such as model training tools and ControlNet models, but the resources for the Flux model will likely catch up.

Should you move on to the newest Flux model and delete the SDXL models? In this article, I will compare the strengths and weaknesses of the SDXL and Flux1.dev models.

Software used

I used Stable Diffusion Forge WebUI to generate the images for comparison. You can use it on Windows, Mac, or Google Colab.

The following checkpoint models are used for generating the images in this article.

Generation speed

Below are the rendering times for generating four 1024×1024 images using the Euler sampler with 20 steps on an Nvidia 4090 GPU.

  • SDXL: 13 secs.
  • Flux1.dev: 57 secs.

The Flux1.dev model takes about 4.4 times as long to generate an image (57 secs. vs. 13 secs. for the same batch of four).
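The ratio follows directly from the two timings above. Here is a small sketch of the arithmetic. The variable names are my own, and the per-image figures assume the four images in a batch take equal time.

```python
# Timings reported above: seconds to render a batch of four 1024x1024 images.
batch_seconds = {"SDXL": 13, "Flux1.dev": 57}
images_per_batch = 4

# Per-image time, assuming each image in the batch takes the same time.
per_image = {name: secs / images_per_batch for name, secs in batch_seconds.items()}

# How many times longer Flux1.dev takes than SDXL.
slowdown = batch_seconds["Flux1.dev"] / batch_seconds["SDXL"]

for name, secs in per_image.items():
    print(f"{name}: {secs:.2f} s per image")
print(f"Flux1.dev / SDXL: {slowdown:.1f}x")
```

That works out to roughly 3.25 secs. per image for SDXL and 14.25 secs. per image for Flux1.dev.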

Text rendering

Generating legible text has long been a challenge for AI models. Would the Flux model do better? Let’s compare the models’ ability to render text. I will use the prompt below.

a portrait photo of a 25-year old beautiful woman, busy street street, smiling, holding a sign “SDXL vs Flux”

SDXL:

Flux:

The Flux model is clearly the winner in text rendering. It generates the text correctly in each image.

Prompt adherence

Prompt adherence is the ability of the model to follow the prompt closely.

I will challenge the models in the following areas:

  1. Controlling poses
  2. Object composition

Controlling poses

Most people use AI models to generate… people. Let’s test the models’ ability to render correct poses.

Photo of a woman with pink hair raising her left hand above her head. Stand with one leg on a hardwood floor.

SDXL:

Flux:

Flux is the clear winner here. Unlike the SDXL model, it renders the specified pose mostly correctly. Its only mistakes were minor: it sometimes mixed up the left and right hands.

Object composition

The object composition test checks how well the model follows the object placement described in the prompt.

Prompt:

Still life painting of a skull above a book, with an orange on the right and an apple on the left

SDXL:

Flux:

Again, Flux is a clear winner, rendering the composition correctly in all images. The SDXL model struggles to render the correct composition.

Hands

Rendering hands has long been a weakness in Stable Diffusion AI image models. Would the Flux model do better? Let’s find out.

photo of open palms, detailed fingers, beach, sea

SDXL:

Flux:

Flux generates better hands! I hope we don’t need to use hand fixers anymore.

Faces

This is a rather common task: generating a close-up of a face.

photo of a 85 year old Syrian man, detailed face, eyes, lips, nose, hair, realistic skin tone, freckles, skin texture

SDXL:

Flux:

Both SDXL and Flux models can generate realistic portrait images, although they generate different default styles.

Styles

Next, I am going to try out some prompts from the SDXL style reference. I won’t use negative prompts for either model since the Flux model doesn’t support them.
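If you want to script a like-for-like test, the same settings can be sent to both models with the negative prompt left empty. Below is a minimal sketch that only builds the request body and does not send it. The field names follow the AUTOMATIC1111-style `/sdapi/v1/txt2img` API, which I am assuming Forge exposes when launched with `--api`. The helper name `build_txt2img_payload` is hypothetical, and reusing the sampler, steps, and batch size from the speed test for the style tests is also an assumption.

```python
import json


def build_txt2img_payload(prompt: str, batch_size: int = 4) -> dict:
    """Build a request body (hypothetical helper) with identical settings for both models."""
    return {
        "prompt": prompt,
        "negative_prompt": "",    # left empty for both models
        "sampler_name": "Euler",  # settings stated for the speed test (assumed reused here)
        "steps": 20,
        "width": 1024,
        "height": 1024,
        "batch_size": batch_size,
    }


# One of the style prompts used in this article.
expressionist = (
    "expressionist woman. raw, emotional, dynamic, distortion for emotional "
    "effect, vibrant, use of unusual colors, detailed"
)

payload = build_txt2img_payload(expressionist)
print(json.dumps(payload, indent=2))
```

You would switch the checkpoint in the WebUI between runs and send the same body each time, so that only the model changes.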

Expressionist style

expressionist woman. raw, emotional, dynamic, distortion for emotional effect, vibrant, use of unusual colors, detailed

SDXL:

Flux:

I would say the SDXL model is more accurate in generating this style. The Flux images are too realistic and polished.

Pixel art

pixel art of a dragon. low-res, blocky, pixel art style, 8-bit graphics, pixelated, 90s video game

SDXL:

Flux:

Well, I think the SDXL model generates more accurate images of pixelated graphics. The images from Flux are a bit too smooth and polished.

Ad poster

advertising poster style sneaker. Professional, modern, product-focused, commercial, eye-catching, highly detailed

SDXL:

Flux:

I can’t say which model performs better. The two models generate quite different styles. The SDXL model has a more consistent style, whereas the Flux model renders diverse styles.

Conclusions

The Flux model is worth the wait for most of the images I tested here. It renders human poses, text, and object compositions better.

Both Flux and SDXL models are competent in rendering detailed faces.

However, you may still want to use the SDXL model in specific areas for its more accurate artistic styles.
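For a quick scorecard, the verdicts from the sections above can be tallied as below. This is only a sketch: the labels are my shorthand, "tie" marks tests with no clear winner, and counting each test equally is a simplification rather than a ranking method.

```python
from collections import Counter

# Winner per test, as judged in the sections above ("tie" = no clear winner).
results = {
    "text rendering": "Flux",
    "poses": "Flux",
    "object composition": "Flux",
    "hands": "Flux",
    "faces": "tie",
    "expressionist style": "SDXL",
    "pixel art": "SDXL",
    "ad poster": "tie",
}

# Count how many tests each outcome won.
tally = Counter(results.values())
for outcome, wins in tally.items():
    print(f"{outcome}: {wins}")
```

Flux takes four of the eight tests, SDXL takes the two artistic style tests, and two are ties.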

By Andrew

Andrew is an experienced engineer with a specialization in Machine Learning and Artificial Intelligence. He is passionate about programming, art, photography, and education. He has a Ph.D. in engineering.

8 comments

  1. I can say, without any doubt, that Flux renders better. I worked with shoe design using 3D software for years and these footwear images are, to say the least, impressive.
