Local AI video has come a long way since the first local video model was released. The quality is much higher, and you can now control the video generation with both an image and a prompt.
In this tutorial, I will show you how to set up and run CogVideoX 5B, a state-of-the-art video model, in image-to-video mode. The image you supply becomes the first frame, and the prompt describes the video.
Software
We will use ComfyUI, an alternative to AUTOMATIC1111.
Read the ComfyUI installation guide and ComfyUI beginner’s guide if you are new to ComfyUI.
Take the ComfyUI course to learn ComfyUI step-by-step.
CogVideoX Image-to-video workflow
Step 0: Update ComfyUI
Before loading the workflow, make sure your ComfyUI is up to date. The easiest way to do this is to use ComfyUI Manager.
Click the Manager button on the top toolbar.
Select Update ComfyUI.
Restart ComfyUI.
Step 1: Load the workflow
Download the CogVideoX 5B Image-to-video workflow below.
Drag and drop the JSON file onto the ComfyUI window.
Step 2: Install missing nodes
Click Manager > Install Missing Custom Nodes.
Install the nodes that are missing.
Restart ComfyUI.
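If you want to see which node types a workflow uses before installing anything, you can inspect the JSON file directly. The sketch below is not part of ComfyUI or ComfyUI Manager. It assumes the workflow is saved in the regular UI format, with a top-level "nodes" list whose entries each carry a "type" field. The function names are hypothetical.

```python
import json
from collections import Counter


def node_types(workflow: dict) -> Counter:
    """Count node types in a workflow dict.

    Assumption: UI-format workflow with a top-level "nodes" list,
    where each node is a dict with a "type" key.
    """
    nodes = workflow.get("nodes", [])
    return Counter(
        node["type"]
        for node in nodes
        if isinstance(node, dict) and "type" in node
    )


def node_types_from_file(path: str) -> Counter:
    """Load a workflow JSON file and count its node types."""
    with open(path, encoding="utf-8") as f:
        return node_types(json.load(f))
```

Calling node_types_from_file on the downloaded workflow file returns a counter of node type names. Any type that ComfyUI shows as missing after loading is one that a custom node pack has to provide.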
Step 3: Install the text encoder model
Download the t5xxl_fp8_e4m3fn text encoder model.
Put the model file in the folder ComfyUI > models > clip.
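A quick way to confirm the file landed in the right place is to look for it from a script. This is a minimal sketch, assuming your ComfyUI folder has the standard models/clip layout described above. The function name is hypothetical, and the check matches any file whose name starts with t5xxl_fp8_e4m3fn because the exact file extension is not stated here.

```python
from pathlib import Path


def find_text_encoder(comfy_root, stem="t5xxl_fp8_e4m3fn"):
    """Return files in <comfy_root>/models/clip whose names start with `stem`.

    Returns an empty list if the folder does not exist or nothing matches.
    """
    clip_dir = Path(comfy_root) / "models" / "clip"
    if not clip_dir.is_dir():
        return []
    return sorted(
        p for p in clip_dir.iterdir()
        if p.is_file() and p.name.startswith(stem)
    )
```

For example, find_text_encoder("ComfyUI") returns an empty list if the text encoder is missing or was saved to a different folder.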
Step 4: Upload the first frame’s image
Download the image below.
Upload it to the Load Image node.
Step 5: Run the workflow
Press Queue Prompt to generate a video.
Running the workflow for the first time takes a while because the CogVideo Image-to-Video model has to be downloaded first.
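Pressing Queue Prompt is all this tutorial needs, but the same action can be scripted through ComfyUI's HTTP API. The sketch below only builds the request. It assumes ComfyUI is running locally on its default address, http://127.0.0.1:8188, and that the workflow has been exported in API format, which is a different JSON layout from the file you drag into the window. The function name is hypothetical.

```python
import json
import urllib.request


def build_queue_request(api_workflow: dict,
                        server: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    """Build a POST request that queues an API-format workflow.

    Assumptions: `server` is a local ComfyUI instance on its default port,
    and `api_workflow` is a workflow exported in API format.
    """
    payload = json.dumps({"prompt": api_workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the returned request to urllib.request.urlopen while ComfyUI is running.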
Note
If you upload your own image, change the prompt to match it.
Have fun making local AI videos.