Qwen Image User Guide

Qwen Image is a new open-source text-to-image model developed by Alibaba’s Qwen team. It is quickly winning over AI creators. Unlike many closed models, Qwen Image is both flexible and accessible, making it a strong alternative to Stable Diffusion and Flux. You can run it locally in ComfyUI and customize the workflow.

In this article, I will cover:

An overview of the Qwen Image model
Sample images
Qwen Image on Google Colab
Qwen Image on ComfyUI (Standard)
Qwen Image on ComfyUI (Fast)

Table of Contents

The Qwen Image model
- Pros of Qwen Image
- Cons of Qwen Image
Software
Sample images
ComfyUI Colab Notebook
Qwen Image workflow (Fast)
Qwen Image workflow (Standard)

The Qwen Image model

So why should you consider using Qwen Image for your creative projects?

Qwen Image has 20 billion parameters and uses an MMDiT (Multimodal Diffusion Transformer) architecture. Its design goals are (1) complex, multilingual text rendering and (2) strong alignment between prompts and generated images.

Pros of Qwen Image

Open-source: Freely available with transparent weights.
High image quality: Sharp, consistent, and diverse images.
Good at following prompts: Strong alignment between text input and images with multilingual support.

Cons of Qwen Image

Large model size: You need plenty of disk space and a capable GPU.
Smaller ecosystem: Compared with Stable Diffusion and Flux, essential tools such as ControlNet are less mature.
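
To put the model size in perspective, here is a rough back-of-the-envelope estimate of the weight size for a 20-billion-parameter model at different precisions. It is only a sketch: it counts the diffusion model weights alone and ignores the text encoder, the VAE, and the memory used during generation, so actual file sizes and VRAM usage will differ.

```python
# Rough weight-size estimate: parameter count times bytes per parameter.
PARAMS = 20_000_000_000  # 20 billion parameters, as stated above

BYTES_PER_PARAM = {
    "fp8": 1,   # 8-bit float, e.g. an fp8_e4m3fn checkpoint
    "bf16": 2,  # 16-bit float
    "fp32": 4,  # 32-bit float
}


def weight_size_gb(params: int, bytes_per_param: int) -> float:
    """Return the size of the weights in gigabytes (1 GB = 10**9 bytes)."""
    return params * bytes_per_param / 1e9


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: ~{weight_size_gb(PARAMS, nbytes):.0f} GB")
```

Under these assumptions, the fp8 weights come to about 20 GB and the 16-bit weights to about 40 GB, which is why this guide uses the fp8 checkpoint.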

Qwen Image also supports image editing and LoRA, but I will cover these features in upcoming tutorials.

Software

We will use ComfyUI, a free AI image and video generator. You can use it on Windows, Mac, or Google Colab.

Think Diffusion provides an online ComfyUI service. They offer an extra 20% credit to our readers.

Read the ComfyUI beginner’s guide if you are new to ComfyUI. See the Quick Start Guide if you are new to AI images and videos.

Take the ComfyUI course to learn how to use ComfyUI step by step.

Sample images

Here are some images generated by the Qwen Image 8-step Lightning workflow (15 seconds on an RTX 4090). I used the same prompts as in the Wan 2.2 model post so that you can compare.

A medieval knight in ornate silver armor, polished and gleaming under radiant sunlight, riding a gigantic shimmering koi fish flying through the sky. The koi’s scales glow iridescent orange and gold, casting reflections across drifting lotus ponds suspended in the clouds. Red and golden paper lanterns float gently, their light flickering against a backdrop of endless pastel skies. Epic cinematic composition, ultra-detailed textures, dreamlike atmosphere, fantasy realism, 8K, volumetric lighting, high dynamic range.

A lively street soccer scene in a colorful urban neighborhood with pale blue and yellow buildings in mediterranean background. A young Iranian boy wearing a bright yellow and green soccer jersey with navy shorts, is skillfully juggling a soccer ball mid-air, his face full of concentration and energy. He wears black sneakers and white socks. Behind him, pastel-colored buildings with balconies, murals, and some graffiti. Parked cars line the sides of the street, but the road is open and vibrant with play. The atmosphere is dynamic and joyful, capturing the essence of childhood street soccer in a Latin or Southern European city. The lighting is warm and natural, suggesting late afternoon or early evening, with a cinematic depth of field that emphasizes the main boy in action while softly blurring the background buildings.

Portrait photograph of a young woman lying on her stomach on a tropical beach, wearing a white crochet bikini, gold bracelets and rings, and a delicate necklace, long brown hair loose over her shoulders. She rests on her forearms with legs bent upward, eyes looking at the viewer smile. The sand is light and fine. Some sand on her stomach. Turquoise waves roll gently in the background under a bright blue sky with scattered clouds. Midday sunlight, soft shadows, warm tones, high detail, sharp focus, natural skin texture, vibrant colors, shallow depth of field, bokeh, professional beach photography, shot on a 50mm lens, cinematic composition. It’s windy. realistic skin texture.

A lone samurai in crimson lacquered armor battling a towering origami dragon inside an ancient library. The dragon’s folded paper wings crackle with energy, scattering thousands of glowing calligraphy symbols into the air. Books fly like birds from towering bookshelves, pages tearing into stormy paper clouds that swirl above. Dramatic chiaroscuro lighting, ultra-detailed Japanese fantasy, cinematic still, hyperrealistic textures, 8K, intricate atmosphere.

ComfyUI Colab Notebook

If you don’t have a powerful GPU card, you can still run the Qwen Image model on Google Colab, Google’s cloud computing platform, with my ComfyUI notebook.

With the notebook, you don’t need to download the models as instructed in the workflow sections below. Just select the Qwen_Image model before running the notebook.

Qwen Image workflow (Fast)

This workflow uses the Qwen Image Lightning LoRA to speed up image generation. Applying the LoRA makes the base model behave like a distilled version that needs only 8 sampling steps.
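
If you are wondering what applying a LoRA means, here is a minimal sketch of the general LoRA idea on a tiny matrix: the base weights W are adjusted by a low-rank update, W' = W + scale * (B @ A). This is a generic illustration, not ComfyUI’s or the Lightning LoRA’s actual code. The matrices, the function names, and the scale value are all made up for the example.

```python
# Generic LoRA merge on a tiny matrix: W' = W + scale * (B @ A).
# Illustration of the general technique only; not ComfyUI's implementation.

Matrix = list[list[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Multiply two matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def apply_lora(w: Matrix, b: Matrix, a: Matrix, scale: float = 1.0) -> Matrix:
    """Add the low-rank update scale * (B @ A) to the base weights W."""
    delta = matmul(b, a)
    return [
        [wv + scale * dv for wv, dv in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Hypothetical 2x2 base weights with a rank-1 update.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]  # shape 2x1
A = [[0.5, 0.5]]    # shape 1x2

print(apply_lora(W, B, A, scale=1.0))
```

The point of the low-rank form is that B and A together are much smaller than W, so a LoRA file stays small compared with the full model. A scale of 0 leaves the base weights unchanged.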

Step 0: Update ComfyUI

Before loading the workflow, make sure your ComfyUI is up to date. The easiest way to do this is to use ComfyUI Manager.

Click the Manager button on the top toolbar.

Select Update ComfyUI.

Restart ComfyUI.

Step 1: Install models

After loading the workflow JSON file, ComfyUI should prompt you to download the missing model files.

Here are the models you need to download:

Download qwen_image_fp8_e4m3fn.safetensors and put it in ComfyUI > models > diffusion_models.
Download qwen_2.5_vl_7b_fp8_scaled.safetensors and put it in ComfyUI > models > text_encoders.
Download qwen_image_vae.safetensors and put it in ComfyUI > models > vae.
Download Qwen-Image-Lightning-8steps-V2.0.safetensors and put it in ComfyUI > models > loras.
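
If you want to double-check that the files landed in the right folders, a small script like the one below can help. It is a sketch that uses the file names and folders listed above. The ComfyUI install path and the function name find_missing_models are assumptions, so adjust the path to match your setup.

```python
from pathlib import Path

# File names and subfolders as listed in the steps above.
EXPECTED_FILES = {
    "diffusion_models": "qwen_image_fp8_e4m3fn.safetensors",
    "text_encoders": "qwen_2.5_vl_7b_fp8_scaled.safetensors",
    "vae": "qwen_image_vae.safetensors",
    "loras": "Qwen-Image-Lightning-8steps-V2.0.safetensors",
}


def find_missing_models(comfyui_root: Path) -> list[Path]:
    """Return the expected model paths that do not exist under comfyui_root."""
    models_dir = comfyui_root / "models"
    expected = [models_dir / folder / name for folder, name in EXPECTED_FILES.items()]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Assumption: ComfyUI lives in ./ComfyUI; change this to your install path.
    missing = find_missing_models(Path("ComfyUI"))
    for path in missing:
        print(f"Missing: {path}")
    if not missing:
        print("All four model files are in place.")
```

Run it from the folder that contains your ComfyUI directory. Any file it reports as missing is either not downloaded yet or sitting in the wrong subfolder.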