How to speed up Wan 2.1 Video with TeaCache and Sage Attention


Wan 2.1 Video is a state-of-the-art AI model that you can use locally on your PC. However, generating a high-quality 720p video takes time, and refining a video over multiple generations takes even longer.

This fast Wan 2.1 workflow uses TeaCache and Sage Attention to reduce generation time by about 30%. It will help you iterate through multiple videos with significant time savings.

Software

We will use ComfyUI, an alternative to AUTOMATIC1111. You can use it on Windows, Mac, or Google Colab. If you prefer using a ComfyUI service, Think Diffusion offers our readers an extra 20% credit.

Read the ComfyUI beginner’s guide if you are new to ComfyUI. See the Quick Start Guide if you are new to AI images and videos.

Take the ComfyUI course to learn how to use ComfyUI step by step.

How does the speed-up work?

This workflow uses two speed-up techniques: TeaCache and Sage Attention.

TeaCache

TeaCache takes advantage of the observation that some neural network blocks don’t do much during sampling. Researchers have recognized that diffusion models generate image outlines in the initial sampling steps and fill in details in the late steps.

Diffusion models generate the image outline in the initial steps and details in the late steps. (Image: Chen et al.)

TeaCache intelligently determines when to reuse cached results during sampling. It uses the cached output when the current input is similar to the one that produced the cache, and only recomputes the cache when the input becomes substantially different. You can control how often the cache is recomputed with a threshold value.
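The idea can be sketched in a few lines of Python. This is a simplified illustration of the caching logic, not the actual TeaCache implementation; the block function, the relative-change heuristic, and the threshold value here are stand-ins.

import torch

class SimpleTeaCache:
    # Toy illustration of TeaCache-style caching; not the real implementation.

    def __init__(self, threshold=0.2):
        self.threshold = threshold          # larger threshold = more reuse = faster, less accurate
        self.prev_input = None              # input seen when the cache was last refreshed
        self.cached_output = None
        self.accumulated_change = 0.0

    def __call__(self, block, x):
        if self.prev_input is not None:
            # Relative change between the current input and the one that produced the cache
            change = ((x - self.prev_input).abs().mean() / self.prev_input.abs().mean()).item()
            self.accumulated_change += change
            if self.accumulated_change < self.threshold:
                return self.cached_output   # input barely changed: reuse the cached output

        # Input changed enough: recompute the block and refresh the cache
        out = block(x)
        self.prev_input = x
        self.cached_output = out
        self.accumulated_change = 0.0
        return out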

See also: TeaCache: 2x speed up in ComfyUI

Sage Attention

Sage Attention speeds up transformer attention operations by quantizing the computation. Instead of full precision, it uses lower precision (like 8-bit or 4-bit) in the key parts of the attention operation. It can speed up many AI models with nearly lossless accuracy.
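For reference, the sageattention Python package provides a near drop-in replacement for PyTorch's scaled dot-product attention. The snippet below shows the general usage with illustrative tensor shapes; in this workflow the KJNodes patch applies it for you, so you don't need to call it directly.

import torch
from sageattention import sageattn    # pip install sageattention

# Illustrative shapes: (batch, heads, sequence length, head dim); fp16 tensors on the GPU
q = torch.randn(1, 16, 4096, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 16, 4096, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 16, 4096, 64, dtype=torch.float16, device="cuda")

# Quantized attention used in place of torch.nn.functional.scaled_dot_product_attention
out = sageattn(q, k, v, is_causal=False)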

Google Colab

If you use my ComfyUI Colab notebook, select the following before running the notebook.

  • WAN_2_1 video models
  • WAN_2_1 custom nodes
  • VideoHelperSuite custom nodes

Fast Wan 2.1 TeaCache and Sage Attention workflow

This fast Wan 2.1 workflow uses KJNodes' Sage Attention and TeaCache nodes. It is ~30% faster than the standard Wan 2.1 workflow.

The two speed-up nodes are placed between the Load Diffusion Model and the KSampler node.

Step 1: Update ComfyUI

Before loading the workflow, make sure your ComfyUI is up to date. The easiest way to do this is to use ComfyUI Manager.

Click the Manager button on the top toolbar.

Select Update ComfyUI.

comfyui manager - update comfyui

Restart ComfyUI.
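Alternatively, if you installed ComfyUI manually with git rather than using the portable version, you can update it from a terminal inside the ComfyUI folder:

git pull
python -m pip install -r requirements.txt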

Step 2: Download model files

Download the diffusion model wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors and put it in ComfyUI > models > diffusion_models.

Download the text encoder model umt5_xxl_fp8_e4m3fn_scaled.safetensors and put it in ComfyUI > models > text_encoders.

Download the CLIP vision model clip_vision_h.safetensors and put it in ComfyUI > models > clip_vision.

Download the Wan VAE model wan_2.1_vae.safetensors and put it in ComfyUI > models > vae.
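If you prefer to script the downloads, the sketch below uses the huggingface_hub package. It assumes the files are mirrored in the Comfy-Org repackaged Wan 2.1 repository on Hugging Face; adjust the repo ID, file paths, and COMFYUI_DIR to match where you actually download from and where ComfyUI lives on your machine.

from pathlib import Path
import shutil
from huggingface_hub import hf_hub_download   # pip install huggingface_hub

COMFYUI_DIR = Path("ComfyUI")   # adjust to your ComfyUI folder

# (repo_id, file path in the repo, destination subfolder) -- repo ID and paths are assumptions
files = [
    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
     "split_files/diffusion_models/wan2.1_i2v_720p_14B_fp8_e4m3fn.safetensors",
     "models/diffusion_models"),
    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
     "split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
     "models/text_encoders"),
    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
     "split_files/clip_vision/clip_vision_h.safetensors",
     "models/clip_vision"),
    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
     "split_files/vae/wan_2.1_vae.safetensors",
     "models/vae"),
]

for repo_id, filename, subfolder in files:
    cached_path = hf_hub_download(repo_id=repo_id, filename=filename)
    dest = COMFYUI_DIR / subfolder / Path(filename).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(cached_path, dest)       # copy from the Hugging Face cache into ComfyUI
    print(f"Saved {dest}")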

Step 3: Load the fast Wan 2.1 workflow

Download the workflow JSON file below and drop it into ComfyUI to load it.

Step 4: Install missing nodes

If you see red blocks, you don't have the custom nodes that this workflow needs.

Click Manager > Install missing custom nodes and install the missing nodes.

Restart ComfyUI.
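If the Manager cannot install the node pack, you can also install KJNodes manually. The commands below assume a git install of ComfyUI; the repository is kijai/ComfyUI-KJNodes on GitHub.

cd ComfyUI/custom_nodes
git clone https://github.com/kijai/ComfyUI-KJNodes
python -m pip install -r ComfyUI-KJNodes/requirements.txt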

Step 5: Install Triton and Sage Attention

The Sage Attention node requires the Triton and SageAttention Python packages, which do not come with KJNodes.

For Windows users, navigate to the Python folder of your ComfyUI.

For the Windows portable version, it is ComfyUI_windows_portable > python_embeded.

Enter cmd in the address bar and press Enter.

You should see the command prompt.

Enter the following command to install Triton.

python -m pip install triton-windows

Enter the following command to install SageAttention.

python -m pip install sageattention
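To verify that both packages installed into ComfyUI's Python, you can run a quick import check from the same command prompt; it should print OK without errors.

python -c "import triton, sageattention; print('OK')"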

Step 6: Set the initial image

Upload an image you wish to use as the video’s initial frame. You can download my test image for testing.

Step 7: Revise the prompt

Revise the positive prompt to describe the video you want to generate.

Don’t forget to add motion keywords, e.g. Running.

Step 8: Generate the video

Click the Queue button to run the workflow.

queue button comfyui

You should get this video.


By Andrew

Andrew is an experienced software engineer with a specialization in Machine Learning and Artificial Intelligence. He is passionate about programming, art, and education. He has a doctorate degree in engineering.

15 comments

  1. Such a nice workflow, I like it very much. It started right away, all nodes known.
    But I wish my external linux server would support it. It has a Tesla V100 with 32 GB VRAM. The problem comes from triton or sageattention, it reads:

    PassManager::run failed

    Is there any hope to fix that? Tesla V100 too old?

  2. Thanks, Andrew, this looks like it will be a significant improvement. But on Colab, Comfy isn’t finding the sage attention module.

      1. Thanks, Andrew, that got it working and I was able to generate one 720×1280 video on L4 with your workflow at length 33 – a 2 sec video took 17 mins which is a lot quicker than before. But subsequently, I kept getting out of memory errors on the KSampler. I could get round this by reducing the length eg to 25 or 1 sec (generation time 11 mins) and the quality looks very good.

        With the 480 model, the memory problems didn’t recur and a 480×848 video of length 25 took 7 mins and a length 49 video, 3 secs, took 10 mins. The quality was much worse than the 720 model, however, and subjectively worse than the equivalent without TeaCache.
        Are there any other settings I can tweak to get longer 720 videos using Colab?

  1. Actually, I have the same OOM issue with an A100 as well. I thought it was working but now it's not; I must have been using the original workflow.

        1. It seems some nodes have memory leak. You can try using the buttons next to the Manager button to unload the model and node caches.

          1. Thanks, Andrew, that helped as did reducing the resolution to 720×720 with the 720 model, following a suggestion you made in the comments to your Hunyuan TeaCache post.

            I was able to generate a 4 sec video in 17 mins. It also worked at 4:3, 720×960, but it took much longer – 30 mins for 3 secs.

            Interestingly, the frame rate seemed natural at the initial resolution but when I increased it, the video was in slowmo, as also happened with Hunyuan. Your suggestion there about increasing the frame rate fixed that – in this case from 16 to 21 fps.

          2. I gave that a shot but still got the OOM error on L4. Works fine with the A100 though.
