Turn an image into a video with Wan 2.2 local model


Wan 2.2 is a local video model that can turn text or images into videos. In this article, I will focus on the popular image-to-video function. The new Wan 2.2 models are surprisingly capable at generating motion and camera movements (though not necessarily at letting you control them).

This article will cover how to install and run the following image-to-video models:

  1. Wan 2.2 14B model – higher quality but needs more VRAM and takes longer.
  2. Wan 2.2 5B model – slightly lower quality, but quicker and needs less VRAM.
Wan 2.2 14B Image-to-video
Wan 2.2 5B Image-to-video

Software needed

We will use ComfyUI, a free AI image and video generator. You can use it on Windows, Mac, or Google Colab.

Think Diffusion provides an online ComfyUI service. They offer an extra 20% credit to our readers.

Read the ComfyUI beginner’s guide if you are new to ComfyUI. See the Quick Start Guide if you are new to AI images and videos.

Take the ComfyUI course to learn how to use ComfyUI step by step.

Wan 2.2 model

Wan 2.2 is a multimodal AI model from Wan AI, a highly anticipated update to Wan 2.1. It uses a new “Mixture of Experts” architecture, in which different expert sub-models handle different levels of noise during video creation. The result is cleaner, sharper videos.
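
If the architecture sounds abstract, here is a conceptual sketch (my illustration, not Wan’s actual code) of the idea: one expert handles the early, high-noise denoising steps, and a second expert takes over for the late, low-noise steps.

```python
# Conceptual illustration only -- not Wan's actual code. Two "experts"
# split the denoising schedule by noise level; the step count and the
# switch point are made-up values for illustration.
def generate_video_latent(latent, high_noise_expert, low_noise_expert,
                          steps=20, switch_at=10):
    for step in range(steps):
        # Early steps carry the most noise, so the high-noise expert runs
        # first; the low-noise expert refines the nearly clean latent.
        expert = high_noise_expert if step < switch_at else low_noise_expert
        latent = expert.denoise(latent, step)
    return latent
```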

Main features:

  • Movie‑quality visuals: You can adjust lighting, color, and framing with a prompt.
  • Smooth, complex motion: A clear improvement in motion generation.
  • Accurate scene understanding: It can generate scenes with many objects and details.

You can create videos from text or images.

The models and the code are open-source under the permissive Apache 2.0 license, making them available for commercial use.

Image-to-video with the Wan 2.2 14B model

This workflow turns an input image into a video. It requires 20 GB of VRAM and takes about 1 hour 20 minutes on an RTX 4090 GPU.

Step 0: Update ComfyUI

Before loading the workflow, make sure your ComfyUI is up to date. The easiest way to do this is to use ComfyUI Manager.

Click the Manager button on the top toolbar.

Select Update ComfyUI.

Restart ComfyUI.

Step 1: Load the workflow

Download the workflow below.

Step 2: Download models

After loading the workflow JSON file, ComfyUI should prompt you to download the missing model files.

Here is what you need to download:
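
If ComfyUI does not prompt you, or you prefer a script, here is a minimal download sketch using the huggingface_hub Python library. The repo id and file names below are my assumptions based on common ComfyUI repackaging conventions; verify them against the workflow’s missing-model dialog or the Hugging Face pages in the Useful links section.

```python
# A minimal download sketch using huggingface_hub. The repo id and file
# names are assumptions -- verify them before running.
from pathlib import Path
from shutil import copy2
from huggingface_hub import hf_hub_download

COMFYUI_DIR = Path("ComfyUI")  # adjust to your ComfyUI install path

# (models subfolder, file path inside the Hugging Face repo) -- assumed names
FILES = [
    ("diffusion_models", "split_files/diffusion_models/wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors"),
    ("diffusion_models", "split_files/diffusion_models/wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors"),
    ("text_encoders", "split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors"),
    ("vae", "split_files/vae/wan_2.1_vae.safetensors"),
]

for subdir, filename in FILES:
    cached = hf_hub_download(
        repo_id="Comfy-Org/Wan_2.2_ComfyUI_Repackaged",  # assumed repo
        filename=filename,
    )
    dest = COMFYUI_DIR / "models" / subdir
    dest.mkdir(parents=True, exist_ok=True)
    copy2(cached, dest / Path(filename).name)  # place it where ComfyUI looks
```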

Step 3: Upload an image for the start frame

Upload an image you wish to use as the first frame of the video.

You can use the test image below.

Step 4: Revise the prompt

Describe your video in the prompt. Use keywords to direct the camera.
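
For example, something like: “A woman walks along the beach at sunset. The camera slowly zooms out to reveal the coastline.” I test camera keywords such as zooming and panning later in this article.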

Step 5: Generate the video

Click the Run button to run the workflow.
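
If you would rather script the generation than click Run, ComfyUI also exposes an HTTP API. The sketch below assumes ComfyUI is running locally on the default port (8188) and that you exported the workflow in API format (Workflow > Export (API)); the JSON file name is hypothetical.

```python
# A minimal sketch of queuing a workflow via ComfyUI's HTTP API instead of
# clicking Run. Assumes a local ComfyUI on the default port and a workflow
# exported in API format; the file name is hypothetical.
import json
import urllib.request

with open("wan22_i2v_14b_api.json") as f:
    workflow = json.load(f)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",  # default ComfyUI address
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # includes the prompt_id of the queued job
```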

Image-to-video with the Wan 2.2 5B model

This workflow turns an input image into a video. Unlike the 14B model, this smaller model requires only 8 GB of VRAM and takes about 6 minutes on an RTX 4090 GPU.

Step 1: Download the workflow

Download the workflow below and drop it into ComfyUI to load.

Step 2: Install models

Here are the model files you need to install:
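
If you want to script the downloads, the hf_hub_download sketch from the 14B section works here too; swap in the 5B files. The original weights live at Wan-AI/Wan2.2-TI2V-5B on Hugging Face; repackaged ComfyUI file names such as wan2.2_ti2v_5B_fp8_scaled.safetensors and wan2.2_vae.safetensors are my assumptions to verify.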

Step 3: Upload an image for the start frame

Upload an image you wish to use as the first frame of the video.

You can use the test image below.

Step 4: Revise the prompt

Describe your video in the prompt. Use keywords to direct the camera.

Step 5: Generate the video

Click the Run button to run the workflow.

Testing camera keywords on Wan 2.2

The Wan AI team touted a significant improvement in the model’s ability to follow camera instructions. Let’s put it to the test. I will stick with the 5B model because of its shorter generation time.

Zoom

Zooming out from a character works pretty well:

Zooming out.

Sometimes, it confuses zooming in and out.

Zoom in.

The solution is to generate a few more videos and pick the best one.
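
Generating a few candidates is easy to script with the same HTTP API from the 14B section. In the sketch below, the node id (“3”) and the seed input name (“noise_seed”) are placeholders; open your exported API-format JSON to find the actual sampler node and its seed field.

```python
# Queue the same workflow several times with different seeds, then pick the
# best video by eye. The node id "3" and input name "noise_seed" are
# placeholders -- check your exported API-format JSON for the real ones.
import json
import urllib.request

with open("wan22_i2v_5b_api.json") as f:  # hypothetical file name
    workflow = json.load(f)

for seed in (42, 123, 999, 2024):
    workflow["3"]["inputs"]["noise_seed"] = seed  # placeholder node/input
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(seed, json.load(resp)["prompt_id"])
```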

But it is unclear whether the zoom works when the character is doing something at the same time.

Zooming out and showing a giant egg.

Pan

Panning right works pretty well:

Pan right.

Panning left still produces a video that pans right (and zooms out).

Pan left.

I think what happens is that, given the initial image, panning right is more natural than panning left. The model produces the most likely video rather than following the prompt.

Useful links

Wan2.2 Day-0 Support in ComfyUI – Press release from ComfyUI

Wan-Video/Wan2.2 – GitHub repository hosting the source code

Wan-AI/Wan2.2-I2V-A14B · Hugging Face – Model weights of the 14B I2V model

Wan-AI/Wan2.2-TI2V-5B · Hugging Face – Model weights of the 5B TI2V model

Wan2.2 Video Generation – ComfyUI’s official documentation


By Andrew

Andrew is an experienced software engineer with a specialization in Machine Learning and Artificial Intelligence. He is passionate about programming, art, and education. He has a doctorate degree in engineering.

4 comments

  1. Thanks, Andrew, the quality looks great in your examples and I look forward to trying this. I see Comfy also makes Wan2.2 available through an API.

    You know what I’m going to ask 🙂 please could you put the links to upload the models into the Colab notebook.

    1. Yes +1 on this please. I figure by the time I’m done downloading the models and uploading them into drive the Colab notebook will be updated before then lol.

      1. Joe, there are ways to use Colab to copy models, Loras etc directly to the relevant sub-folder in your AI_PICS folder stack which avoids the slow upload problem. Andrew recently offered to develop a notebook to do this (sorry that I can’t find the post) and I used Gemini to write code for me to use in my own notebooks.
        But to use large models like Wan2.2, it is much (!) easier to be able to tick a box in Andrew’s Comfy notebook and have Colab manage it all which avoids using one’s personal storage allocation for the files. I always seem to be coming up against my storage limit.
