How to run Hunyuan Image-to-video on ComfyUI

The Hunyuan Video model has been a huge hit in the open-source AI community. It can generate high-quality videos from text, direct a video using a reference image, and be customized with LoRAs.

The one thing it lacked was an image-to-video function like the one in LTX Video. The good news is that the Hunyuan Image-to-Video model is now available! Read on to learn the model details and follow a step-by-step guide to using it.

Software

We will use ComfyUI, an alternative to AUTOMATIC1111. You can use it on Windows, Mac, or Google Colab. If you prefer using a ComfyUI service, Think Diffusion offers our readers an extra 20% credit.

Read the ComfyUI beginner’s guide if you are new to ComfyUI. See the Quick Start Guide if you are new to AI images and videos.

Take the ComfyUI course to learn how to use ComfyUI step-by-step.

Model details

The Hunyuan image-to-video model has the following features.

  1. Latent concatenation: A Multimodal Large Language Model (MLLM) extracts semantic tokens from the input image, which are then concatenated with the video latents (see the sketch below). This ensures the model faithfully uses the information from the input image during video generation.
  2. Multimodal full attention: The text, image, and video tokens interact through a full-attention mechanism.
  3. Synergy of modalities: The interaction of these modalities enhances visual fidelity and helps the model interpret the inputs effectively.
Model architecture. (Image: Hunyuan Image-to-Video)
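
To make these ideas concrete, here is a minimal PyTorch sketch of latent concatenation followed by full attention. Every name, shape, and module below is an illustrative placeholder, not the actual HunyuanVideo-I2V code.

import torch
import torch.nn as nn

# Illustrative token counts and hidden size (placeholders, not the real model's).
batch, n_img, n_txt, n_vid, dim = 1, 256, 77, 1024, 768

image_semantics = torch.randn(batch, n_img, dim)  # semantic tokens from the MLLM image encoder
text_tokens = torch.randn(batch, n_txt, dim)      # tokens from the text encoder
video_latents = torch.randn(batch, n_vid, dim)    # noisy video latents being denoised

# 1. Latent concatenation: image and text tokens are joined with the video latents.
tokens = torch.cat([image_semantics, text_tokens, video_latents], dim=1)

# 2. Multimodal full attention: every token attends to every other token,
#    so the image, text, and video modalities interact in a single attention pass.
attention = nn.MultiheadAttention(embed_dim=dim, num_heads=12, batch_first=True)
out, _ = attention(tokens, tokens, tokens)

# Only the video-latent positions are read out to predict the denoised video.
video_out = out[:, n_img + n_txt:]
print(video_out.shape)  # torch.Size([1, 1024, 768])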

Hunyuan image-to-video workflow

This workflow uses an input image as the initial frame and generates an MP4 video.

It takes about 8 minutes to generate a 720p (1280×720 pixels) video on my RTX 4090 (24 GB VRAM).

Step 0: Update ComfyUI

Before loading the workflow, make sure your ComfyUI is up to date. The easiest way to do this is with ComfyUI Manager.

Click the Manager button on the top toolbar.

Select Update ComfyUI.

Restart ComfyUI.

Step 1: Download models

You may already have some of these models if you have installed the Hunyuan Video text-to-video model.

Download hunyuan_video_image_to_video_720p_bf16.safetensors and put it in ComfyUI > models > diffusion_models.

Download clip_l.safetensors and llava_llama3_fp8_scaled.safetensors. Put them in ComfyUI > models > text_encoders.

Download hunyuan_video_vae_bf16.safetensors and put it in ComfyUI > models > vae.

Download llava_llama3_vision.safetensors and put it in ComfyUI > models > clip_vision.
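
If you prefer to script these downloads, here is a hedged sketch using the huggingface_hub library. The repo id Comfy-Org/HunyuanVideo_repackaged and the split_files folder layout are my assumptions about where these repackaged files live; verify the paths on Hugging Face before running, then copy each file into the indicated ComfyUI folder.

# A sketch of scripted downloads with huggingface_hub.
# Assumption: the files live in the Comfy-Org repackaged repo with this layout.
from huggingface_hub import hf_hub_download

REPO_ID = "Comfy-Org/HunyuanVideo_repackaged"  # assumed repo id

# remote path in the repo -> target subfolder under ComfyUI/models/
FILES = {
    "split_files/diffusion_models/hunyuan_video_image_to_video_720p_bf16.safetensors": "diffusion_models",
    "split_files/text_encoders/clip_l.safetensors": "text_encoders",
    "split_files/text_encoders/llava_llama3_fp8_scaled.safetensors": "text_encoders",
    "split_files/vae/hunyuan_video_vae_bf16.safetensors": "vae",
    "split_files/clip_vision/llava_llama3_vision.safetensors": "clip_vision",
}

for remote_path, target in FILES.items():
    local_path = hf_hub_download(repo_id=REPO_ID, filename=remote_path)
    print(f"{local_path} -> copy to ComfyUI/models/{target}/")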

Step 2: Load workflow

Download the Hunyuan video workflow JSON file below.

Drag and drop it into ComfyUI.

Step 3: Install missing nodes

If you see red blocks, you are missing custom nodes that this workflow needs.

Click Manager > Install missing custom nodes and install the missing nodes.

Restart ComfyUI.

Step 4: Upload the input image

Upload an image you wish to use as the video's initial frame. You can download my test image if you need one.

Step 5: Revise prompt

Revise the prompt to describe what you want to generate.

Step 6: Generate a video

Click the Queue button to generate the video.

Tip: Change the noise_seed value to generate a different video.

Reference

ComfyUI Blog: Hunyuan Image2Video: Day-1 Support in ComfyUI!

tencent/HunyuanVideo-I2V · Hugging Face

By Andrew

Andrew is an experienced software engineer with a specialization in Machine Learning and Artificial Intelligence. He is passionate about programming, art, and education. He has a doctorate degree in engineering.
