CogVideoX 5B: High quality local video generator

Updated December 29, 2024. By Andrew. Categorized as Tutorial. Tagged img2vid, Model, txt2vid, Video. 26 Comments.

CogVideoX is a state-of-the-art AI video generator similar to Kling, except you can generate videos locally on your PC. In this article, you will learn how to use CogVideoX in ComfyUI.

Table of Contents

Software
What is CogVideoX?
- Model architecture
- Models available
How to use CogVideo in ComfyUI

Software

We will use ComfyUI, a free AI image and video generator. You can use it on Windows, Mac, or Google Colab.

Think Diffusion provides an online ComfyUI service. They offer an extra 20% credit to our readers.

Read the ComfyUI beginner’s guide if you are new to ComfyUI. See the Quick Start Guide if you are new to AI images and videos.

Take the ComfyUI course to learn how to use ComfyUI step by step.

What is CogVideoX?

CogVideoX is a significant advancement in text-to-video generation. Building on the success of text-to-image models like Stable Diffusion, it is designed specifically to generate coherent, high-quality videos from text prompts.

Model architecture

Here are some notable model design features.

- CogVideo uses the large T5 text encoder to convert the text prompt into embeddings, similar to Stable Diffusion 3 and Flux AI.
- In Stable Diffusion, a VAE compresses an image to and from the latent space. CogVideo generalizes this idea and uses a 3D causal VAE to compress a video into the latent space.
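
To make the 3D causal VAE idea concrete, here is a minimal sketch of the latent shape arithmetic. The numbers are assumptions that this article does not state: a 4x compression in time, 8x in height and width, 16 latent channels, and a 49-frame clip at 480x720. Check them against the model's published configuration before relying on them. "Causal" here means the first frame is encoded on its own, and the remaining frames are grouped four at a time.

```python
def latent_shape(frames, height, width, t_factor=4, s_factor=8, channels=16):
    """Return (latent_frames, channels, latent_height, latent_width).

    All default factors are assumptions for illustration, not values
    taken from this article.
    """
    if (frames - 1) % t_factor != 0:
        raise ValueError("frames must have the form t_factor * k + 1")
    if height % s_factor or width % s_factor:
        raise ValueError("height and width must be multiples of s_factor")
    # Causal layout: the first frame maps to one latent frame by itself,
    # then every t_factor frames after it map to one more latent frame.
    latent_frames = (frames - 1) // t_factor + 1
    return (latent_frames, channels, height // s_factor, width // s_factor)


def compression_ratio(frames, height, width, **kwargs):
    """Ratio of RGB pixel values to latent values for one clip."""
    t, c, h, w = latent_shape(frames, height, width, **kwargs)
    return (frames * height * width * 3) / (t * c * h * w)


# Assumed clip size: 49 frames at 480x720.
print(latent_shape(49, 480, 720))
print(round(compression_ratio(49, 480, 720), 1))
```

Under these assumptions, the diffusion model works on a tensor that holds far fewer values than the raw video, which is what makes local generation feasible at all.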

Models available

CogVideo models with 2B and 5B parameters are available. For higher-quality videos, we will use the 5B version in this tutorial.

In a dimly lit bar, purplish light bathes the face of a mature man, his eyes blinking thoughtfully as he ponders in close-up, the background artfully blurred to focus on his introspective expression, the ambiance of the bar a mere suggestion of shadows and soft lighting.

How to use CogVideo in ComfyUI

This workflow was tested on an RTX 4090 GPU. Generating a video takes about 15 minutes and uses at most 16GB of VRAM.

Step 1: Load the CogVideo workflow

Download the workflow JSON file below and drop it into ComfyUI.

Download
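
If you want to see which nodes a workflow needs before loading it, you can inspect the JSON file directly. The sketch below is a rough illustration, not part of the workflow. It assumes the two common ComfyUI layouts: the UI export, which has a "nodes" list where each entry has a "type", and the API export, which maps node ids to objects with a "class_type". The sample data is a tiny stand-in, not the real workflow file.

```python
import json
from collections import Counter


def node_types(workflow):
    """Count node types in a ComfyUI workflow dict (UI or API layout)."""
    if isinstance(workflow.get("nodes"), list):
        # UI export: {"nodes": [{"id": 1, "type": "..."}, ...]}
        names = [node.get("type") for node in workflow["nodes"]]
    else:
        # API export: {"1": {"class_type": "...", "inputs": {...}}, ...}
        names = [
            node.get("class_type")
            for node in workflow.values()
            if isinstance(node, dict)
        ]
    return Counter(name for name in names if name)


# Stand-in data using node names that appear in ComfyUI error messages.
sample = json.loads(
    '{"nodes": [{"id": 1, "type": "CogVideoSampler"},'
    ' {"id": 2, "type": "CogVideoDecode"}]}'
)
for name, count in sorted(node_types(sample).items()):
    print(f"{count} x {name}")
```

To run it on the downloaded file, replace the stand-in with `json.load(open(path))`, where `path` points to wherever you saved the workflow.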

Step 2: Install missing nodes

You will need the ComfyUI Manager for this step. Follow the link for instructions to install ComfyUI Manager.

Click Manager on the sidebar. Click Install missing custom nodes.

Install the ComfyUI CogVideoX Wrapper.

Restart ComfyUI.

Refresh the ComfyUI page.

Step 3: Download the T5 text encoder

Download the T5 text encoder using the link below. Put it in ComfyUI > models > clip.

t5xxl_fp8_e4m3fn.safetensors
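
A misplaced text encoder is a common cause of "value not in list" errors, so it can help to verify the file location with a short script. This is a minimal sketch. The `ComfyUI` root folder name is an assumption, so adjust it to your install location.

```python
from pathlib import Path

T5_FILENAME = "t5xxl_fp8_e4m3fn.safetensors"


def find_text_encoder(comfyui_root, filename=T5_FILENAME):
    """Return the encoder path if it exists in models/clip, else None."""
    candidate = Path(comfyui_root) / "models" / "clip" / filename
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    root = Path("ComfyUI")  # assumption: change to your ComfyUI folder
    found = find_text_encoder(root)
    if found:
        print(f"Found text encoder: {found}")
    else:
        print(f"Missing: expected {T5_FILENAME} in {root / 'models' / 'clip'}")
```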

Step 4: Generate a video

Press Queue Prompt to generate a video.

The first time you run it, ComfyUI automatically downloads the 5B CogVideo model. The download takes a while, and it may look as if nothing is happening. You can confirm progress by watching the folder models > CogVideos grow in size.

After the download is complete, it will start the video generation.
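
If you prefer to check download progress from a script rather than a file manager, the sketch below polls the folder size. The folder path follows the name given in this article and the `ComfyUI` root is an assumption. Both may differ on your install, and the helper names are only illustrative.

```python
import time
from pathlib import Path


def folder_size_bytes(folder):
    """Sum the sizes of all regular files under folder (0 if it is missing)."""
    root = Path(folder)
    if not root.is_dir():
        return 0
    total = 0
    for path in root.rglob("*"):
        try:
            if path.is_file():
                total += path.stat().st_size
        except OSError:
            pass  # a file can be renamed or removed mid-download
    return total


def format_gib(num_bytes):
    """Format a byte count as gibibytes with two decimals."""
    return f"{num_bytes / 1024**3:.2f} GiB"


def watch(folder, polls=3, interval=5.0):
    """Yield the folder size once per poll, sleeping between polls."""
    for i in range(polls):
        yield folder_size_bytes(folder)
        if i < polls - 1:
            time.sleep(interval)


if __name__ == "__main__":
    # Assumed location: adjust the root and folder name to your install.
    model_dir = Path("ComfyUI") / "models" / "CogVideos"
    for size in watch(model_dir, polls=2, interval=0.5):
        print(format_gib(size))
```

A growing number between polls means the download is still running. Increase `polls` and `interval` to watch over a longer period.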

By Andrew

Andrew is an experienced software engineer with a specialization in Machine Learning and Artificial Intelligence. He is passionate about programming, art, and education. He has a doctorate degree in engineering.


26 comments

Barbara Parkman says:

April 13, 2025 at 3:09 pm

Hi Andrew,
Thanks for this tutorial. Can you use CogVideo to generate a video longer than 5 seconds? Thanks.
Zaffer
(member)

1. Andrew says:
  
  April 14, 2025 at 7:48 am
  
  Yes, technically, but it is outside of what this model is trained to do.
  
  1. Barbara Parkman says:
    
    April 14, 2025 at 10:35 am
    
    Thanks for getting back to me Andrew. Is there a model that is free and will run locally and generate longer videos?
    
angelo says:

February 28, 2025 at 8:55 am

Does this work offline?

1. Andrew says:
  
  March 1, 2025 at 8:17 am
  
  yes
  
Satvikar says:

November 26, 2024 at 2:10 pm

Does it work in CPU with 16GB RAM?

1. Andrew says:
  
  November 27, 2024 at 2:44 pm
  
  No.
  
Steve says:

November 20, 2024 at 2:04 pm

Any idea what this error means?

Failed to validate prompt for output 33:
* CogVideoDecode 11:
- Exception when validating inner node: tuple index out of range
Output will be ignored
invalid prompt: {'type': 'prompt_outputs_failed_validation', 'message': 'Prompt outputs failed validation', 'details': '', 'extra_info': {}}

1. Duck says:
  
  November 20, 2024 at 9:46 pm
  
  Same error for me, looking for a fix.
  
2. Mykhailo Kapush says:
  
  November 21, 2024 at 6:56 am
  
  same, please help
  
3. Andrew says:
  
  November 21, 2024 at 7:09 am
  
  Fixed.
  
  1. Krisy says:
    
    November 22, 2024 at 12:38 pm
    
    same, please help
    
    1. Andrew says:
      
      November 23, 2024 at 7:25 am
      
      It should be working. Make sure the nodes and ComfyUI are all up to date. The JSON file was updated a few days ago (version v3).
      
Arthur Machado says:

November 16, 2024 at 6:29 pm

Hi there, got the following error, any idea please? Thank you!!!

Prompt outputs failed validation
DownloadAndLoadCogVideoModel:
- Value not in list: fp8_transformer: 'False' not in ['disabled', 'enabled', 'fastmode']
CogVideoSampler:
- Value 49.0 bigger than max of 1.0: denoise_strength
- Value not in list: scheduler: 'DPM' not in ['DPM++', 'Euler', 'Euler A', 'PNDM', 'DDIM', 'CogVideoXDDIM', 'CogVideoXDPMScheduler', 'SASolverScheduler', 'UniPCMultistepScheduler', 'HeunDiscreteScheduler', 'DEISMultistepScheduler', 'LCMScheduler']

1. Andrew says:
  
  November 18, 2024 at 4:47 pm
  
  Fixed.
  
Lucia Ricciardelli says:

October 11, 2024 at 4:11 pm

I have tried downloading CogVideo in ComfyUI but I’m unable to complete it because of the following error:
Prompt outputs failed validation
CheckpointLoaderSimple:
- Value not in list: ckpt_name: 'v1-5-pruned-emaonly.ckpt' not in []

Any suggestion on how to fix this problem?

1. Andrew says:
  
  October 12, 2024 at 9:38 am
  
  Download here: https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/v1-5-pruned-emaonly.safetensors
  
  Put it in comfyui > models > checkpoints.
  
  1. Lucia Ricciardelli says:
    
    October 12, 2024 at 2:41 pm
    
    Thank you for the input, Andrew. Now I’m getting a new error message. Here it is:
    CogVideoSampler
    Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: c10::Half instead.
    Any idea on how to fix this problem?
    Thanks again.
    
    1. Andrew says:
      
      October 14, 2024 at 7:56 am
      
      Not quite sure about this one. You can try deleting the cogvideo5B folder in models>cogvideo and rerunning.
      
Eugene says:

September 19, 2024 at 4:27 am

Andrew, you can try to use DualClipLoader(Flux) node instead of ClipLoader node(SD3).

1. Andrew says:
  
  September 19, 2024 at 8:09 am
  
  Thanks!
  
AlexSmith says:

September 17, 2024 at 8:06 pm

Highly dependent on prompt and skill – and *every* video generator out there is hit or miss as far as what works.

I’ve used every text-to-vid, img-to-vid and vid-to-vid tool out there that I can find on Github and otherwise and I get mostly crap from all of them, with a couple good ones here and there once you stumble on something that works well.

CogVideoX is actually pretty phenomenal, even when compared to the more well-known of the bunch like Kling and Runway.
Not saying it’s better, it isn’t – but is capable of giving you stuff that is nearly on-par – and given the fact we have other AI tools to clean things up, it’s pretty amazing really.

And don’t forget – this is the worst it’s going to get.

1. Andrew says:
  
  September 19, 2024 at 8:07 am
  
  Yeah I think it is pretty good for a *local* generator.
  
Eugene says:

September 17, 2024 at 9:34 am

Hi Andrew,
Is it “max of 16GB VRAM”, or is the requirement 16GB and above?

1. Andrew says:
  
  September 19, 2024 at 7:56 am
  
  It means the usage is below 16GB VRAM. So a 16GB card would work.
  
Danny says:

September 16, 2024 at 3:08 pm

“””””””””””high quality””””””””””” videos. Right…
