Video from text with Wan 2.2 local model

Published Categorized as Tutorial Tagged , , No Comments on Video from text with Wan 2.2 local model

Wan 2.2 is a high-quality video AI model you can run locally on your computer. In this tutorial, I will cover:

  • Using the high-quality Wan 2.2 14B video model
  • Using the fast Wan 2.2 5B video model
  • Prompt tips for Wan 2.2 models

Software needed

We will use ComfyUI, a free AI image and video generator. You can use it on Windows, Mac, or Google Colab

Think Diffusion provides an online ComfyUI service. They offer an extra 20% credit to our readers.

Read the ComfyUI beginner’s guide if you are new to ComfyUI. See the Quick Start Guide if you are new to AI images and videos.

Take the ComfyUI course to learn how to use ComfyUI step by step.

ComfyUI Colab Notebook

If you use my ComfyUI Colab notebook, you don’t need to download the model as instructed below. Select the Wan_2_2 model before running the notebook.

Text-to-video with the Wan 2.2 14B model

This workflow generates a video based on a text prompt. It requires 20 GB VRAM and takes about 50 minutes on an RTX4090 GPU card.

Step 0: Update ComfyUI

Before loading the workflow, make sure your ComfyUI is up to date. The easiest way to do this is to use ComfyUI Manager.

Click the Manager button on the top toolbar.

Select Update ComfyUI.

Restart ComfyUI.

Step 1: Load the workflow

Download the workflow below.

Step 2: Download models

After loading the workflow JSON file, ComfyUI should prompt you to download the missing model files. Here are what you need to download:

Step 3: Revise the prompt

Describe your video in the prompt. Use keywords to direct the camera.

Step 4: Generate the video

Click the Run button to run the workflow.

Text-to-video with the Wan 2.2 5B model

This workflow turns an input image into a video. Unlike the 14B model, this smaller 5B model requires only 8 GB VRAM and takes about 6 minutes on an RTX4090 GPU card.

Step 1: Download the workflow

Download the workflow below and drop it into ComfyUI to load.

Step 2: Install models

Here are the model files you need to install:

Step 3: Revise the prompt

Describe your video in the prompt. Use keywords to direct the camera.

Step 4: Generate the video

Click the Run button to run the workflow.

Prompt tips for Wan 2.2

Compared to the image-to-video workflow, it is much easier to direct Wan 2.2 videos in the text-to-video mode. Due to generation time,

Camera motion

Useful keywords are:

  • Zoom in/out
  • Lens trembling
  • Pan left/right
  • Tilt up/down
  • Dolly in/out
  • Orbital arc
  • Crash zoom
  • Track upward/downward

Zoom in

Zoom in is easier to use for adding camera movement.

Prompt:

Camera zooms in from the vast volcanic crater, the lens trembling with the heat as glowing embers drift upward into the smoky sky. At the edge of the roaring molten pit sits a office desk, weathered and scorched by searing winds. At this desk, a detective leans forward, sweat beading on his forehead, as he writes tirelessly, pen scratching furiously against crumpled sheets of paper. His coat flutters in the acrid breeze, and the glowing red light of the lava casts ominous shadows across his furrowed brow. Ash falls like snow around him, each flake punctuating his determination. The detective’s hand grasps a journal, every line and margin filled with cryptic clues and hastily drawn schematics. Occasionally, he pauses to stare into the churning fire below, lost in thought, before dipping his pen back into the inkwell, fingers smudged with charcoal dust. The camera’s focus shifts from his intense eyes to the swirling patterns of molten rock behind him, then back to the rhythmic motion of his hand, creating a dance. The scene crackles with tension: danger lurks in every thundering eruption, yet the detective remains undeterred, driven by a pursuit of truth in the heart of the smoldering inferno.

Zoom out

The model responds to zoom out better when you first describe a close-up scene. For example:

Extreme close-up on a painter’s fine brush etching vibrant color on a masterpiece. Zoom out gradually to reveal the beatuiful and trendy female artist in a dimly lit dated art studio.

Crash Zoom

Crash Zoom is a filming technique that the camera zooms in quickly.

Wide angle view of a confident young blonde woman in a sunlit urban plaza, standing atop a low wall. Crash zoom suddenly into her determined face as she flips her hair.

Track upward

Use “Track upward” to move the camera up.

close-up on the hero’s armored boot pressing into a rusted seam. Track upward along his blue-plated leg as he steadies himself, revealing his chest armor and youthful face.

Composite video

We have used this technique in a previous prompt example. The trick is to describe one scene, followed by a different one connected with a camera motion. For example:

Opening shot is a close-up on a delicate, antique music box’s spinning ballerina figure, its metal edges tarnished by time.

Zoom out Pull back to reveal an elderly woman’s trembling hands winding the key, then further to show her seated alone in a dusty attic illuminated by a single shaft of sunlight through a cobwebbed window.

Scene quality

Modify the scene quality to match what your message. The model supports concepts like

  • anamorphic lens flare
  • HDR high contrast
  • 35mm film grain
  • Soft focus

35mm film grain

Prompt:

A lone traveler walking along a misty forest path at dawn. Camera cranes down from treetops to meet the traveler’s head height, then tracking shot behind them as they move forward. Soft backlight through fog, ethereal color palette, delicate 35mm film grain for a dreamy, storybook mood.

Silhouette photography

Silhouette photography, place a lone guitarist standing knee-deep in surf, facing the ocean at dusk. Close in on the guitar’s headstock and the player’s bowed head in silhouette. Slowly zoom out to reveal crashing waves, a pastel sky, and seabirds tracing the horizon.

Animation styles

The model supports animation styles:

  • 3D animation style
  • Disney cute character style
  • Japanese anime style

Prompt:

3D animation disney cute character style, in a theater with red drapes and golden trim, a magician in a midnight-blue tailcoat and top hat stands under a spotlight. He reaches into a silver cylinder and produces a surreal pipeapple, exhaling smoke that swirls like colorful ribbons.

Fantasy content

Like Stable Diffusion, the Wan 2.2 model is quite good at generating imaginary scenes. Let your imagination runs wild and Wan 2.2 will help you visualize it!

close up shoot, Under a vast azure sky, a red-haired woman was smiling and laughing joyfully. Her long, curly tresses dance in the breeze. A large-brimmed straw hat, slightly drooping at the edges, crowns her head. On a rural path blanketed in golden hay, expansive fields and a pristine blue horizon form the backdrop. With hands aloft, she wields a blue garden hose from which a cascade of colorful little birds erupts, scattering like fireworks in the air. The blossoms, diverse in hue and shape, gleam with a gentle luster under the sun’s rays.

You can also find some good prompts for imaginary scenes on the Wan 2.2 website.

Additional tips

  • Character size: Like text-to-image models, the Wan 2.1 text-to-video model does not perform well with small details. You should try to create characters that are large enough so that their faces are covered by many pixels.
  • Camera control: Compared to the image-to-video mode, the video in text-to-video mode has more freedom to create motions. Stick with text-to-video with you want to have create specific camera motion.

Useful links

Wan2.2 Day-0 Support in ComfyUI â€“ Release press from ComfyUI

Wan-Video/Wan2.2 â€“ Github page hosting the source code

Wan-AI/Wan2.2-T2V-A14B · Hugging Face – Model weights of the 14B T2V model

Wan-AI/Wan2.2-TI2V-5B · Hugging Face â€“ Model weights of the 5B T2V model

Wan2.2 Video Generation â€“ ComfyUI’s official documentation

Andrew

By Andrew

Andrew is an experienced software engineer with a specialization in Machine Learning and Artificial Intelligence. He is passionate about programming, art, and education. He has a doctorate degree in engineering.

Leave a comment

Your email address will not be published. Required fields are marked *