FramePack: long AI video with low VRAM

Published Categorized as Tutorial Tagged , 6 Comments on FramePack: long AI video with low VRAM

Framepack is a video generation method that consumes low VRAM (6 GB) regardless of the video length. It supports image-to-video, turning an image into a video with text instructions.

In this tutorial, I will talk about:

  • An introduction to FramePack.
  • How to use FramePack in Windows.

5-second FramePack video:

10-second FramePack video:

What is FramePack?

FramePack predicts the next frame based on the previous frames in the video. It uses a fixed context length in the transformer regardless of the video length. This overcomes the drawback of many Video generators (such as Wan 2.1Hunyuan, and LTX Video) that limit video length due to available memory. For FramePack, it takes the same amount of VRAM to generate a 1-second and a 1-minute video.

Importantly, FramePack is plug-and-play: It works with existing video models by finetuning some layers and some code changes in sampling.

Frame packing

When predicting the next frame of a video, not all frames are equally important. The key insight of FramePack is to downsample the context of the frames based on how long ago the frame was taken. The further away it is, the more the frame is downsampled.

The context of a frame is downsampled according to how recent it is.

Anti-drifting sampling

However, errors (called drift) accumulate when we predict the next frames and then use them for further predictions.

The inverted anti-drifting sampling used in the software generates the video in reverse order. Each generation is anchored on the high-quality initial frame.

Video model

The demo software applies FramePack to the Hunyuan Video model.

Installing FramePack on Windows

FramePack is an open-source software. You can use it for free on your local machine with an NVidia video card RTX 3000 series or above.

Step 1: Install 7-Zip

You need the 7-zip software to uncompress FramePack’s zip file.

Download 7-zip on this page or use this direct download link.

Double-click to run the downloaded exe file. Click Install to install 7-zip on your PC.

Step 2: Download FramePack

Go to FramePack’s Github page. Click the Download link under Windows.

Step 3: Uncompress FramePack

Right-click on the downloaded file, select Show More Options > 7-Zip > Extract to “framepack_cu126_torch26\”.

When it is done, you should see a new folder framepack_cu126_torch26. You can move the folder to another location you like.

Step 4: Update Framepack

In the folder famepack_cu126_torch26, double-click the file update.bat to update Framepack.

Step 5: Run FramePack

In the folder famepack_cu126_torch26, double-click the file run.bat to start Framepack.

It will take some time to start when you run it for the first time because it needs to download ~30GB of model files.

Using FramePack

FramePack generates a short video clip using the input image as the initial frame and the text description of the video.

In this section, we will use the video generation settings recommended by FramePack.

Step 1: Upload the initial image

Upload the following image to FramePack’s Image canvas.

Step 2: Enter the prompt

Enter the following prompt in FramePack.

a crochet doll dancing on a desk

Step 3: Revise video settings

Revise the generation settings.

  • Teacache: speeds up video generation but may introduce artifacts in small details, such as fingers.
  • Seed: Different value generates different videos.
  • Steps: The number of diffusion steps. Keep the default setting.
  • Distilled CFG Scale: The CFG scale controls how closely the prompt should be followed. Keep the default setting.
  • GPU inference preserved memory (GB): Increase this value if you run into an out-of-memory error.

Step 4: Generate a video

Click Start Generation to generate a video.

It will generate the end of the video first, and extend to the beginning. On the console, you will see several progress bars before a video is generated.

It takes about 10 minutes on an RTX 4090 GPU card to generate a 5-second video.

Longer videos with Framepack

Unlike Wan 2.1, Hunyuan, and LTX Video, Framepack uses the same amount of VRAM regardless of the video’s length. That means you can generate a minute-long video using 6 GB VRAM!

See this 10-second video:

Reference

GitHub – lllyasviel/FramePack: Lets make video diffusion practical!

FramePack (Blog post)

Andrew

By Andrew

Andrew is an experienced software engineer with a specialization in Machine Learning and Artificial Intelligence. He is passionate about programming, art, and education. He has a doctorate degree in engineering.

6 comments

  1. Another humble request here for a notebook since this is a Windows-only solution — would be much appreciated. Some of us Mac-based folks would have a beefed-up Windows system if we could.

  2. This looks like this will be extremely useful, Andrew. Hope you can put it into a Colab notebook at some point.
    The 5 sec video has a bug and won’t play

      1. Great, thanks, Andrew. If you have to choose a model to go with it, I think Wan2.1 would be my choice 🙂

Leave a comment

Your email address will not be published. Required fields are marked *