FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention (NeurIPS 2024)

Paper | Project Page

This repository is the official implementation of FreeLong.

FreeLong can generate 512-frame long videos with high consistency and fidelity without the need for additional training.

Yu Lu, Yuanzhi Liang, Linchao Zhu and Yi Yang

📖 Overview

Figure: overall structure of FreeLong.

We propose FreeLong, a straightforward and training-free approach to extend an existing short video diffusion model for consistent long video generation.

🔥 Updates

  • [10/2024] We release the code of the FreeLong implementation on LaVie and VideoCrafter2.
  • [09/2024] FreeLong is accepted to NeurIPS 2024.
  • [06/2024] Project page and paper are available.

📃 Usage

In this repository, we utilize LaVie as a case study to illustrate the integration of FreeLong into existing text-to-video inference pipelines.

Within attention.py, we define the freelong_temp_attn function inside the BasicTransformerBlock class. This function performs two-stream temporal attention and merges the resulting global and local features, as sketched below.
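
As a rough, self-contained sketch of the two-stream idea (not the repository's exact code: the function name, window_size, and the sliding-window mask construction here are illustrative assumptions), the two attention streams can be computed as:

import torch
import torch.nn.functional as F

def two_stream_temporal_attn(q, k, v, window_size=16):
    """Compute a global and a windowed local temporal attention stream.

    q, k, v: (batch * spatial, num_frames, dim) temporal attention inputs.
    """
    num_frames = q.shape[1]

    # Global stream: every frame attends to every other frame.
    global_out = F.scaled_dot_product_attention(q, k, v)

    # Local stream: each frame attends only within a short temporal window,
    # matching the length the short-video model was trained on.
    idx = torch.arange(num_frames, device=q.device)
    local_mask = (idx[None, :] - idx[:, None]).abs() < window_size
    local_out = F.scaled_dot_product_attention(q, k, v, attn_mask=local_mask)

    return global_out, local_out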

Additionally, in freelong_utils.py, we provide the necessary code for frequency filtering and mixing.
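
The frequency mixing step can likewise be sketched with torch.fft (a minimal version assuming a simple box low-pass filter over the temporal axis; the actual filter shape, cutoff, and normalization in freelong_utils.py may differ):

import torch

def freq_blend(global_feat, local_feat, cutoff=0.25):
    """Keep the low temporal frequencies of the global stream and the
    high temporal frequencies of the local stream (temporal axis = dim 1)."""
    num_frames = global_feat.shape[1]

    g_fft = torch.fft.fftshift(torch.fft.fft(global_feat.float(), dim=1), dim=1)
    l_fft = torch.fft.fftshift(torch.fft.fft(local_feat.float(), dim=1), dim=1)

    # Box low-pass filter centered on the zero frequency.
    freqs = torch.linspace(-0.5, 0.5, num_frames, device=global_feat.device)
    lpf = (freqs.abs() <= cutoff).float().view(1, num_frames, 1)

    blended = g_fft * lpf + l_fft * (1.0 - lpf)
    out = torch.fft.ifft(torch.fft.ifftshift(blended, dim=1), dim=1)
    return out.real

Under these assumptions, a blended output would then be obtained as freq_blend(*two_stream_temporal_attn(q, k, v)).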

For guidance on incorporating FreeLong into other video diffusion models, please refer to the aforementioned scripts.

🔨 Quick Start

1. Clone Repo

git clone https://github.com/aniki-ly/FreeLong
cd FreeLong
cd examples/LaVie

2. Prepare Environment

conda env create -f environment.yml
conda activate lavie

3. Download Checkpoints

Download the pre-trained LaVie models, Stable Diffusion 1.4, and stable-diffusion-x4-upscaler to ./pretrained_models. You should then see the following layout:

├── pretrained_models
│   ├── lavie_base.pt
│   ├── lavie_interpolation.pt
│   ├── lavie_vsr.pt
│   ├── stable-diffusion-v1-4
│   │   ├── ...
└── └── stable-diffusion-x4-upscaler
        ├── ...

4. Inference with FreeLong

After downloading the checkpoints, run the following command to generate long videos with FreeLong. The generation results are then saved to the res folder.

cd freelong
python pipelines/sample.py --config configs/sample_freelong.yaml

The video_length field in the config controls the length of the generated long video and defaults to 128. If you modify this parameter, you should also modify the length of local_masks in attention.py accordingly.

You can change the text prompts in the config file, and tune the frequency filter parameters for better results.
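
For orientation, the relevant part of the config might look like the excerpt below; this is a hypothetical sketch, so check configs/sample_freelong.yaml for the actual field names and schema:

# hypothetical excerpt; see configs/sample_freelong.yaml for the real schema
video_length: 128   # number of frames to generate; keep in sync with
                    # the local_masks length in attention.py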

🖼️ Generation Results

Please refer to our project page for more visual comparisons.

Acknowledgements

The code is built upon LaVie and VideoCrafter2, with additional references to code from FreeInit and FreeNoise. We thank all the contributors for their efforts in open-sourcing these projects.

🖋️ Citation

If you find our repo useful for your research, please consider citing our paper:

@article{lu2024freelong,
  title={FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention},
  author={Lu, Yu and Liang, Yuanzhi and Zhu, Linchao and Yang, Yi},
  journal={arXiv preprint arXiv:2407.19918},
  year={2024}
}
