Lumina Image 2.0 is an open-source AI model that generates images from text descriptions. It excels at artistic styles and prompt adherence. In this tutorial, I will cover:
- What Lumina Image 2.0 is.
- How to use Lumina Image 2.0 in ComfyUI.
- Image comparison between Lumina, Flux, and SDXL.
Software
We will use ComfyUI, an alternative to AUTOMATIC1111. You can use it on Windows, Mac, or Google Colab. If you prefer using a ComfyUI service, Think Diffusion offers our readers an extra 20% credit.
Read the ComfyUI beginner’s guide if you are new to ComfyUI. See the Quick Start Guide if you are new to AI images and videos.
Take the ComfyUI course to learn how to use ComfyUI step-by-step.
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/image-40-1024x585.png)
What is Lumina Image 2.0?
Lumina Image 2.0 is a 2.6B-parameter model that generates images at 1024×1024 resolution. It uses the Gemma-2-2B text encoder to process natural language prompts and the FLUX-VAE-16CH variational autoencoder for image compression.
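To put these numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes two things the model description above does not state: that weights are stored in 16-bit precision (2 bytes each), and that the VAE compresses each spatial dimension by a factor of 8. Treat the results as estimates only.

```python
# Back-of-the-envelope numbers for Lumina Image 2.0.
# From the article: 2.6B parameters, 1024x1024 output, 16-channel FLUX VAE.
# Assumptions (not from the article): 2 bytes per weight (fp16/bf16) and an
# 8x spatial downsampling factor in the VAE.

PARAMS = 2.6e9
BYTES_PER_WEIGHT = 2        # assumed 16-bit storage
IMAGE_SIZE = 1024
VAE_DOWNSAMPLE = 8          # assumed spatial compression factor
LATENT_CHANNELS = 16


def weight_size_gb(params: float, bytes_per_weight: int) -> float:
    """Approximate size of the weights in gigabytes (10**9 bytes)."""
    return params * bytes_per_weight / 1e9


def latent_shape(image_size: int, downsample: int, channels: int) -> tuple[int, int, int]:
    """Shape (channels, height, width) of the latent for a square image."""
    side = image_size // downsample
    return (channels, side, side)


print(weight_size_gb(PARAMS, BYTES_PER_WEIGHT))
print(latent_shape(IMAGE_SIZE, VAE_DOWNSAMPLE, LATENT_CHANNELS))
```

Under these assumptions, the 2.6B parameters take roughly 5.2 GB, and a 1024×1024 image corresponds to a 16×128×128 latent.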
Run Lumina Image 2.0 on ComfyUI
Step 1: Download the Lumina model
Download the Lumina 2.0 checkpoint model. Put the model file in the folder ComfyUI > models > checkpoints.
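If you want to confirm the file landed in the right place, a small script like the one below can list what is in the checkpoints folder. This is only a sketch: it assumes the checkpoint uses the `.safetensors` extension, and the `ComfyUI` root path is a placeholder for wherever your installation lives.

```python
from pathlib import Path


def find_checkpoints(comfyui_root: str) -> list[str]:
    """Return the names of .safetensors files in <root>/models/checkpoints, sorted."""
    folder = Path(comfyui_root) / "models" / "checkpoints"
    if not folder.is_dir():
        return []  # folder missing: wrong root path or incomplete install
    return sorted(p.name for p in folder.glob("*.safetensors"))


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point it at your own installation folder.
    for name in find_checkpoints("ComfyUI"):
        print(name)
```

If the Lumina file does not appear in the output, check that it was not saved to a different models subfolder.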
Step 2: Update ComfyUI
The easiest way to update ComfyUI is through the ComfyUI Manager. Click Manager > Update All.
Reload the ComfyUI page after the update.
Step 3: Load the Lumina 2.0 workflow
Download the Lumina Image 2.0 JSON workflow below.
Drag and drop it onto the ComfyUI window.
Step 4: Run the workflow
Revise the prompt.
Run the workflow by clicking the Queue button.
![queue button comfyui](https://stable-diffusion-art.com/wp-content/uploads/2024/01/image-276-1024x180.png)
You should get the image below.
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/image-39.png)
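Clicking the Queue button can also be scripted, which is handy when generating several images per prompt. The sketch below posts a workflow to the `/prompt` endpoint of a local ComfyUI server. It makes several assumptions: the server runs at the default local address `http://127.0.0.1:8188`, the workflow has been exported in ComfyUI's API format (the regular workflow JSON you drop into the UI uses a different layout), and the sampler node stores its seed as a plain integer under the key `seed`. The helper names are my own, not part of ComfyUI.

```python
import json
import random
import urllib.request


def randomize_seeds(workflow: dict) -> dict:
    """Return a copy of an API-format workflow with a fresh integer seed
    on every node whose inputs contain an integer "seed"."""
    updated = json.loads(json.dumps(workflow))  # deep copy via JSON round trip
    for node in updated.values():
        inputs = node.get("inputs", {})
        # Skip seeds wired in from another node (those are lists, not ints).
        if isinstance(inputs.get("seed"), int):
            inputs["seed"] = random.randint(0, 2**32 - 1)
    return updated


def build_queue_request(
    workflow: dict, server: str = "http://127.0.0.1:8188"
) -> urllib.request.Request:
    """Build the POST request that queues a workflow on a ComfyUI server."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )


def queue_workflow_file(path: str) -> str:
    """Load an API-format workflow from disk, reseed it, and queue it.

    Requires a running ComfyUI server; returns the raw server response.
    """
    with open(path, encoding="utf-8") as f:
        workflow = json.load(f)
    request = build_queue_request(randomize_seeds(workflow))
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `queue_workflow_file` three times on the same exported file would queue three images with different seeds, similar to clicking Queue three times in the UI.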
Comparing Lumina Image 2.0 with Flux.1 Dev and SDXL
With a native resolution of 1024×1024, the output of the Lumina Image 2.0 model is directly comparable to that of the Flux.1 Dev and SDXL models. Let’s compare them to study its performance.
Realistic images with text rendering
The first test is generating realistic images with text, which the Flux model is very good at. I will use the following prompt (the text on the sign varies).
a portrait photo of a 25-year old beautiful woman, busy street street, smiling, holding a sign “XXXX”
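Since only the sign text changes between runs, the prompt can be treated as a template. The snippet below is a small illustration of that idea. The prompt is copied verbatim from above, and `build_prompt` is a helper name I made up for this example.

```python
# Prompt copied verbatim from the article; only the sign text is swapped in.
PROMPT_TEMPLATE = (
    "a portrait photo of a 25-year old beautiful woman, busy street street, "
    "smiling, holding a sign “{sign_text}”"
)


def build_prompt(sign_text: str) -> str:
    """Fill the sign text into the fixed test prompt."""
    return PROMPT_TEMPLATE.format(sign_text=sign_text)


# The two sign texts used in the comparison below.
for text in ("Lumina Image 2.0", "SDXL vs Flux"):
    print(build_prompt(text))
```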
Lumina (Text: Lumina Image 2.0):
![Lumina image 2.0](https://stable-diffusion-art.com/wp-content/uploads/2025/02/image-35.png)
![Lumina image 2.0](https://stable-diffusion-art.com/wp-content/uploads/2025/02/image-34.png)
![Lumina image 2.0](https://stable-diffusion-art.com/wp-content/uploads/2025/02/image-33.png)
SDXL (Text: SDXL vs Flux):
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-32.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-33.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-34.png)
Flux.1 Dev (Text: SDXL vs Flux):
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-31.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-30.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-29.png)
The Lumina model lags behind the Flux model in text rendering and realistic image generation. The rendered text is incorrect, and the images are too polished to look realistic. Its quality is slightly worse than the SDXL model’s. Interestingly, Lumina generates almost the same image each time, which suggests a lack of diversity in the training images.
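The "almost the same image" observation above is a visual judgment. If you want a rough number for it, one crude option is the average pixel difference between every pair of images in a batch. The sketch below is not how the comparison in this article was made. It assumes each image has already been loaded as a flat sequence of grayscale values (0 to 255) of equal length, using whatever image library you prefer.

```python
from itertools import combinations
from typing import Sequence


def mean_abs_difference(pixels_a: Sequence[int], pixels_b: Sequence[int]) -> float:
    """Mean absolute per-pixel difference between two same-sized images."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must have the same number of pixels")
    if not pixels_a:
        raise ValueError("images must not be empty")
    total = sum(abs(a - b) for a, b in zip(pixels_a, pixels_b))
    return total / len(pixels_a)


def average_pairwise_difference(images: Sequence[Sequence[int]]) -> float:
    """Average of mean_abs_difference over all pairs of images in a batch."""
    pairs = list(combinations(images, 2))
    if not pairs:
        raise ValueError("need at least two images")
    return sum(mean_abs_difference(a, b) for a, b in pairs) / len(pairs)
```

A value near 0 means the images in the batch are nearly identical pixel by pixel. This measure is sensitive to small shifts and says nothing about semantic similarity, so use it only as a quick sanity check.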
Prompt adherence
Prompt adherence is the ability of the model to follow the prompt closely.
I will challenge the models in the following areas:
- Controlling poses
- Object compositions
Controlling poses
Most people use AI models to generate… people. Let’s test the models’ ability to render correct poses.
Photo of a woman with pink hair raising her left hand above her head. Stand with one leg on a hardwood floor.
Lumina:
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/image-36.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/image-37.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/image-38.png)
SDXL:
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-40.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-39.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-38.png)
Flux.1 Dev:
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-37.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-36.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-35.png)
Again, Flux generates the most accurate images. Lumina’s images are slightly more correct than SDXL’s, but one leg is curiously missing…
Object composition
The object composition test checks how well the model follows the object placement described in the prompt.
Prompt:
Still life painting of a skull above a book, with an orange on the right and an apple on the left
Lumina:
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00014.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00015.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00013.png)
SDXL:
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-43.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-42.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-41.png)
Flux.1 Dev:
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-46.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-45.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-44.png)
Lumina’s object composition is pretty good, only slightly worse than Flux. It is better than SDXL.
Hands
Rendering hands has long been a weakness in Stable Diffusion AI image models. Flux is a significant improvement. Would Lumina do better?
photo of open palms, detailed fingers, beach, sea
Lumina:
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00018.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00019.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00017.png)
SDXL:
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-52.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-51.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-50.png)
Flux:
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-49.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-48.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-47.png)
While not as good as Flux, Lumina generates better hands than SDXL.
Faces
This is a common task: generating a close-up of a face.
photo of a 85 year old Syrian man, detailed face, eyes, lips, nose, hair, realistic skin tone, freckles, skin texture
Lumina:
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00021.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00022.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00023.png)
SDXL:
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-55.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-54.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-53.png)
Flux:
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-58.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-57.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-56.png)
Unfortunately, Lumina produces the least realistic images of the face.
Styles
I will use prompts from the SDXL style reference to test Lumina’s ability to generate styles.
Expressionist style
expressionist woman. raw, emotional, dynamic, distortion for emotional effect, vibrant, use of unusual colors, detailed
Lumina:
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00025.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00026.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00027.png)
SDXL:
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-64.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-63.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-62.png)
Flux:
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-61.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-60.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-59.png)
The art style from Lumina is surprisingly decent! The interpretation of the prompt is different from the SDXL model, but both are correct.
Pixel art
pixel art of a dragon. low-res, blocky, pixel art style, 8-bit graphics, pixelated, 90s video game
Lumina:
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00031.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00030.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00029.png)
SDXL:
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-67.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-66.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-65.png)
Flux:
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-70.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-69.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-68.png)
Another accurate art style from Lumina! It’s even more precise than SDXL (I said low res). Flux is only good at realistic styles.
Ad Poster
advertising poster style sneaker. Professional, modern, product-focused, commercial, eye-catching, highly detailed
Lumina:
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00040.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00038.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2025/02/ComfyUI_temp_fxcfe_00037.png)
SDXL:
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-76.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-75.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-74.png)
Flux:
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-73.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-72.png)
![](https://stable-diffusion-art.com/wp-content/uploads/2024/10/image-71.png)
Lumina produces decent poster images, but the excessive text ruins them.
Conclusions
Lumina is good at generating artistic styles and following the prompt. The base model generates subpar realistic images, but, as in the case of Stable Diffusion 1.5, it can likely be improved with fine-tuning.
Reference
- ComfyUI Blog: Lumina Image 2.0 Native Support in ComfyUI
- GitHub page: Alpha-VLLM/Lumina-Image-2.0
- ComfyUI’s official example workflow
Do you know if Lora models developed for Flux are compatible with the model?
Not compatible.