The Hunyuan Video model has been a huge hit in the open-source AI community. It generates high-quality videos from text, can be directed with a reference image, and can be customized with LoRAs.
The only thing it was missing was an image-to-video function like the one in LTX Video. The good news is that the Hunyuan Image-to-Video model is now available! Read on to learn the model details and follow a step-by-step guide to using it.
Software
We will use ComfyUI, an alternative to AUTOMATIC1111. You can use it on Windows, Mac, or Google Colab. If you prefer using a ComfyUI service, Think Diffusion offers our readers an extra 20% credit.
Read the ComfyUI beginner’s guide if you are new to ComfyUI. See the Quick Start Guide if you are new to AI images and videos.
Take the ComfyUI course to learn how to use ComfyUI step-by-step.
Model details
The Hunyuan image-to-video model has the following features.
- Latent concatenation: A Multimodal Large Language Model (MLLM) extracts semantic tokens from the input image, which are then concatenated with the video latents. This ensures the model faithfully uses the information from the input image during video generation.
- Multimodal full attention: The text, image, and video tokens interact through a full-attention mechanism.
- Synergy of modalities: The interaction of these modalities enhances visual fidelity and helps the model interpret the inputs effectively (see the sketch below).

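To make the design concrete, here is a minimal, illustrative PyTorch sketch of the idea described above: image-derived semantic tokens are concatenated with the video latent tokens, and text, image, and video tokens all attend to each other in a single full-attention block. This is not the actual HunyuanVideo-I2V code; the module, tensor shapes, and token counts are assumptions for illustration only.

```python
# Illustrative sketch only -- not the actual HunyuanVideo-I2V implementation.
# It shows concatenating image-derived tokens with video latents and running
# full attention over text, image, and video tokens together.
import torch
import torch.nn as nn

class ToyMultimodalBlock(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, image_tokens, video_tokens):
        # Full attention: every token attends to every other token,
        # regardless of which modality it came from.
        tokens = torch.cat([text_tokens, image_tokens, video_tokens], dim=1)
        attended, _ = self.attn(tokens, tokens, tokens)
        return self.norm(tokens + attended)

# Dummy inputs (batch=1, hypothetical token counts, dim=512)
text = torch.randn(1, 77, 512)     # prompt tokens from the text encoder
image = torch.randn(1, 256, 512)   # semantic tokens extracted from the input image
video = torch.randn(1, 1024, 512)  # noisy video latent tokens

out = ToyMultimodalBlock()(text, image, video)
print(out.shape)  # torch.Size([1, 1357, 512])
```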
Hunyuan image-to-video workflow
This workflow uses an input image as the initial frame and generates an MP4 video.
It takes about 8 mins to generate a 720p (1280×720 pixels) video on my RTX 4090 (24 GB VRAM).

Step 0: Update ComfyUI
Before loading the workflow, make sure your ComfyUI is up-to-date. The easiest way to do this is to use ComfyUI Manager.
Click the Manager button on the top toolbar.

Select Update ComfyUI.

Restart ComfyUI.
Step 1: Download models
You may already have some of these models if you have installed the Hunyuan Video text-to-video model.
Download hunyuan_video_image_to_video_720p_bf16.safetensors and put it in ComfyUI > models > diffusion_models.
Download clip_l.safetensors and llava_llama3_fp8_scaled.safetensors. Put them in ComfyUI > models > text_encoders.
Download hunyuan_video_vae_bf16.safetensors and put it in ComfyUI > models > vae.
Download llava_llama3_vision.safetensors and put it in ComfyUI > models > clip_vision.
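If you want to confirm everything landed in the right place before loading the workflow, the short Python sketch below checks the folders listed above. It assumes ComfyUI is in the current directory; adjust COMFYUI_DIR to your installation path.

```python
# Sanity check that the downloaded model files are in the expected ComfyUI folders.
from pathlib import Path

COMFYUI_DIR = Path("ComfyUI")  # assumption: ComfyUI lives in the current directory

expected = {
    "models/diffusion_models": ["hunyuan_video_image_to_video_720p_bf16.safetensors"],
    "models/text_encoders": ["clip_l.safetensors", "llava_llama3_fp8_scaled.safetensors"],
    "models/vae": ["hunyuan_video_vae_bf16.safetensors"],
    "models/clip_vision": ["llava_llama3_vision.safetensors"],
}

for folder, files in expected.items():
    for name in files:
        path = COMFYUI_DIR / folder / name
        status = "OK" if path.exists() else "MISSING"
        print(f"{status:7s} {path}")
```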
Step 2: Load workflow
Download the Hunyuan video workflow JSON file below.
Drop it into ComfyUI.
Step 3: Install missing nodes
If you see red blocks, you are missing custom nodes that this workflow needs.
Click Manager > Install missing custom nodes and install the missing nodes.
Restart ComfyUI.
Step 4: Upload the input image
Upload an image you wish to use as the video’s initial frame. You can download my test image to try it out.

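Optionally, you can pre-crop the image to match the output resolution so the first frame isn’t stretched. Below is a minimal Pillow sketch; the input/output filenames and the 1280×720 target are assumptions, so match them to your own files and the resolution set in your workflow.

```python
# Optional: center-crop and resize the input image to the 720p target
# so its aspect ratio matches the generated video.
from PIL import Image

TARGET_W, TARGET_H = 1280, 720  # assumption: match your workflow's resolution

img = Image.open("input.jpg").convert("RGB")

# Scale so the image covers the target, then center-crop the excess.
scale = max(TARGET_W / img.width, TARGET_H / img.height)
resized = img.resize((round(img.width * scale), round(img.height * scale)), Image.LANCZOS)
left = (resized.width - TARGET_W) // 2
top = (resized.height - TARGET_H) // 2
resized.crop((left, top, left + TARGET_W, top + TARGET_H)).save("input_720p.png")
```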
Step 5: Revise prompt
Revise the prompt to describe the video you want to generate.

Step 6: Generate a video
Click the Queue button to generate the video.

Tip: Change the noise_seed value to generate a different video.

Reference
ComfyUI Blog: Hunyuan Image2Video: Day-1 Support in ComfyUI!
tencent/HunyuanVideo-I2V · Hugging Face