Fast Local Video: LTX Video

LTX Video is a fast, local video AI model full of potential. This Diffusion Transformer (DiT) video model generates videos from text alone or from text plus an input image. The model is small, with only 2 billion parameters. As a result, it requires only 6 GB of VRAM and generates a 4-second video in about 20 seconds on an RTX 4090. The quality is surprisingly good for its size.
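
ComfyUI (covered below) is the easiest way to run it, but the model is also scriptable: recent releases of Hugging Face's diffusers library ship an LTXPipeline for it. Here is a minimal text-to-video sketch with an illustrative prompt and settings, assuming a CUDA GPU and a diffusers version with LTX-Video support:

    # Minimal text-to-video sketch using diffusers' LTXPipeline.
    # Assumes a recent diffusers release with LTX-Video support and a CUDA GPU.
    import torch
    from diffusers import LTXPipeline
    from diffusers.utils import export_to_video

    pipe = LTXPipeline.from_pretrained(
        "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
    ).to("cuda")

    prompt = "A red fox trots through fresh snow at dawn, low tracking shot."  # illustrative

    video = pipe(
        prompt=prompt,
        width=768,
        height=512,
        num_frames=97,  # roughly 4 seconds at 24 fps
        num_inference_steps=30,
    ).frames[0]

    export_to_video(video, "ltx_output.mp4", fps=24)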

In this article, I will show you how to set up and run LTX Video in ComfyUI in both text-to-video and image-to-video modes.

Software

We will use ComfyUI, an alternative to AUTOMATIC1111. You can use it on Windows, Mac, or Google Colab. If you prefer using a ComfyUI service, Think Diffusion offers our readers an extra 20% credit.

Read the ComfyUI beginner’s guide if you are new to ComfyUI. See the Quick Start Guide if you are new to AI images and videos.

Take the ComfyUI course to learn how to use ComfyUI step-by-step.

LTX Text-to-video

This text-to-video LTX-Video workflow turns a text prompt into an mp4 video. The default video size is 768×512.

Step 0: Update ComfyUI

Before loading the workflow, make sure your ComfyUI is up-to-date. The easiest way to do this is to use ComfyUI Manager.

Click the Manager button on the top toolbar.

Select Update ComfyUI.

Restart ComfyUI.
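
If you installed ComfyUI manually by cloning its repository, running git pull inside the ComfyUI folder and restarting achieves the same update.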

Step 1: Download models

Download the LTX checkpoint model and put it in ComfyUI > models > checkpoints.

Download the T5XXL text encoder and put it in ComfyUI > models > clip.

Note: If you use my ComfyUI Colab Notebook, you don’t need to download the models. Simply select LTX before launching the notebook.
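
If you would rather fetch the files from a script, here is a sketch using the huggingface_hub library. The repository IDs and filenames are assumptions based on where these files are commonly hosted; verify them against the download links above for the version you want:

    # Sketch: download the models with huggingface_hub instead of a browser.
    # Repo IDs and filenames are assumptions -- check the model pages first.
    from huggingface_hub import hf_hub_download

    # LTX-Video checkpoint -> ComfyUI/models/checkpoints
    hf_hub_download(
        repo_id="Lightricks/LTX-Video",
        filename="ltx-video-2b-v0.9.safetensors",
        local_dir="ComfyUI/models/checkpoints",
    )

    # T5-XXL text encoder -> ComfyUI/models/clip
    hf_hub_download(
        repo_id="comfyanonymous/flux_text_encoders",
        filename="t5xxl_fp16.safetensors",
        local_dir="ComfyUI/models/clip",
    )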

Step 2: Load workflow

Download the following workflow JSON file and drop it into the ComfyUI window.

Step 3: Install missing custom nodes

You may not have all the custom nodes in this workflow. If you see red boxes, click Manager > Install missing custom nodes.

Restart ComfyUI.

Step 4: Revise the prompt

Write a description of the video you want to generate.
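
For example: "A slow pan across a foggy pine forest at sunrise, light rays filtering through the trees, smooth cinematic camera motion." As the tips at the end of this article note, LTX Video favors long, detailed prompts.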

Step 5: Generate a video

Click Queue to generate a video.
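
Clicking Queue is all you need in the UI. If you later want to queue renders from a script, ComfyUI also exposes a local HTTP API. A sketch, assuming a default local install on port 8188 and a workflow exported with Save (API Format); the filename is a placeholder:

    # Sketch: queue a workflow through ComfyUI's local HTTP API.
    # Assumes ComfyUI is running on 127.0.0.1:8188 and the workflow was
    # exported via "Save (API Format)" (enable dev mode in the settings).
    import json
    import urllib.request

    with open("ltx_text_to_video_api.json") as f:  # placeholder filename
        workflow = json.load(f)

    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))  # includes the prompt_id of the queued job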

LTX Image-to-video

An image-to-video workflow takes an input image and a text prompt to generate a video.

However, if you use ComfyUI’s default image-to-video LTX Video workflow, you will find that the generated videos lack motion. Discussions on Reddit attributed this to the model being trained on videos, whose frames carry motion blur, so input images that are a bit blurry work best. I tested two methods suggested in the thread: (1) compress the image into a single-frame video to reduce its quality, and (2) slightly blur the input image. The first method works better and is what I used in the following workflow.
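
That workflow handles the compression step for you. If you want to experiment with degrading the input image yourself outside ComfyUI, here is a sketch of both ideas using Pillow; the quality and blur values are illustrative:

    # Sketch of the two degradation ideas using Pillow; values are illustrative.
    from io import BytesIO

    from PIL import Image, ImageFilter

    img = Image.open("input.png").convert("RGB")  # placeholder filename

    # Method 1: round-trip through heavy JPEG compression, mimicking the
    # quality loss of encoding the image as a single video frame.
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=30)
    buf.seek(0)
    Image.open(buf).save("input_compressed.png")

    # Method 2: apply a slight Gaussian blur.
    img.filter(ImageFilter.GaussianBlur(radius=1)).save("input_blurred.png")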

Step 1: Download models

Follow the text-to-video section above to download the models.

Step 2: Load workflow

Download the image-to-video workflow below and drop it into the ComfyUI window.

Step 3: Upload an input

Upload an input image to the Load Image node. You can use the image below.

Step 4: Revise prompt

Write a prompt that matches the input image plus some motion.
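
For example, for a portrait photo: "A woman with long brown hair smiles softly and turns her head toward the camera, a light breeze moving her hair."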

Step 5: Generate video

Click Queue to generate a video.
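
Like the text-to-video mode, this can also be scripted with diffusers through its LTXImageToVideoPipeline. A minimal sketch with illustrative settings; the file names are placeholders:

    # Minimal image-to-video sketch using diffusers' LTXImageToVideoPipeline.
    # Assumes a recent diffusers release with LTX-Video support and a CUDA GPU.
    import torch
    from diffusers import LTXImageToVideoPipeline
    from diffusers.utils import export_to_video, load_image

    pipe = LTXImageToVideoPipeline.from_pretrained(
        "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
    ).to("cuda")

    image = load_image("input.png")  # placeholder input image
    prompt = "A woman smiles and turns her head, hair moving in a light breeze."

    video = pipe(
        image=image,
        prompt=prompt,
        width=768,
        height=512,
        num_frames=97,
        num_inference_steps=30,
    ).frames[0]

    export_to_video(video, "ltx_i2v.mp4", fps=24)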

Tips for using the LTX-Video model

  • Long prompts work better for LTX Video. Write more to describe the scene and the camera motion.
  • The seed value matters. If a video doesn’t look good, try a different seed value.
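
In ComfyUI, the seed is a widget on the sampler (or noise) node. If you script the model with diffusers as sketched earlier, the seed maps to a torch.Generator; the value 42 below is arbitrary:

    # Sketch: fixing the seed in a scripted diffusers run.
    import torch

    # pipe and prompt are from the text-to-video sketch near the top.
    generator = torch.Generator(device="cuda").manual_seed(42)
    video = pipe(prompt=prompt, generator=generator).frames[0]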

By Andrew

Andrew is an experienced engineer with a specialization in Machine Learning and Artificial Intelligence. He is passionate about programming, art, photography, and education. He has a Ph.D. in engineering.
