This repository is the official implementation of FreeLong.
FreeLong can generate 512-frame long videos with high consistency and fidelity without the need for additional training.
Yu Lu, Yuanzhi Liang, Linchao Zhu and Yi Yang
We propose FreeLong, a straightforward and training-free approach to extend an existing short video diffusion model for consistent long video generation.
- [10/2024] We release the FreeLong implementation for LaVie and VideoCrafter2.
- [9/2024] FreeLong is accepted by NeurIPS 2024.
- [6/2024] Project page and paper available.
In this repository, we utilize LaVie as a case study to illustrate the integration of FreeLong into existing text-to-video inference pipelines.
Within `attention.py`, we define the `freelong_temp_attn` function inside the `BasicTransformerBlock` class. This function performs the two-stream attention and merges the global and local features. Additionally, `freelong_utils.py` provides the code for frequency filtering and mixing.
For guidance on incorporating FreeLong into other video diffusion models, please refer to the aforementioned scripts.
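As a rough illustration of the idea (not the exact repository code), the sketch below blends the low temporal frequencies of a global attention stream with the high temporal frequencies of a local attention stream; the function and parameter names (`blend_global_local`, `cutoff_ratio`) are hypothetical, and the actual implementation lives in `attention.py` and `freelong_utils.py`.

```python
# Minimal sketch of SpectralBlend-style feature mixing (hypothetical names;
# see attention.py / freelong_utils.py for the actual implementation).
import torch
import torch.fft as fft


def blend_global_local(global_feat: torch.Tensor,
                       local_feat: torch.Tensor,
                       cutoff_ratio: float = 0.25) -> torch.Tensor:
    """Mix low temporal frequencies of the global stream with high temporal
    frequencies of the local stream.

    Both tensors are assumed to have shape (batch, frames, channels).
    """
    n_frames = global_feat.shape[1]

    # FFT along the temporal (frame) dimension.
    global_freq = fft.fft(global_feat, dim=1)
    local_freq = fft.fft(local_feat, dim=1)

    # Low-pass mask: keep roughly the lowest `cutoff_ratio` fraction of the band.
    freqs = fft.fftfreq(n_frames, device=global_feat.device).abs()
    low_pass = (freqs <= cutoff_ratio / 2).view(1, n_frames, 1)

    # Low frequencies from the global stream, high frequencies from the local stream.
    mixed_freq = torch.where(low_pass, global_freq, local_freq)

    # Back to the temporal domain; discard the tiny imaginary residue.
    return fft.ifft(mixed_freq, dim=1).real
```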
```bash
git clone https://github.com/aniki-ly/FreeLong
cd FreeLong
cd examples/LaVie
conda env create -f environment.yml
conda activate lavie
```
Download the pre-trained LaVie models, Stable Diffusion 1.4, and stable-diffusion-x4-upscaler to `./pretrained_models`. You should then see the following layout:
```
├── pretrained_models
│   ├── lavie_base.pt
│   ├── lavie_interpolation.pt
│   ├── lavie_vsr.pt
│   ├── stable-diffusion-v1-4
│   │   ├── ...
│   └── stable-diffusion-x4-upscaler
│       ├── ...
```
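If you do not already have the two Stable Diffusion checkpoints locally, one way to fetch them is via `huggingface_hub` (a sketch; the LaVie weights such as `lavie_base.pt` must still be obtained by following the LaVie instructions):

```python
# Sketch: fetch the two Stable Diffusion checkpoints into ./pretrained_models.
# The LaVie weights (lavie_base.pt, ...) are downloaded separately, following
# the LaVie repository instructions.
from huggingface_hub import snapshot_download

snapshot_download("CompVis/stable-diffusion-v1-4",
                  local_dir="./pretrained_models/stable-diffusion-v1-4")
snapshot_download("stabilityai/stable-diffusion-x4-upscaler",
                  local_dir="./pretrained_models/stable-diffusion-x4-upscaler")
```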
After downloading the base model, run the following command to generate long videos with FreeLong. The generated results are saved to the `res` folder.
```bash
cd freelong
python pipelines/sample.py --config configs/sample_freelong.yaml
```
Here, `video_length` in the config controls the length of the generated long video and defaults to 128. If you modify this parameter, you should also adjust the length of `local_masks` in `attention.py` accordingly (a rough sketch of such a mask is shown below).
You can change the text prompts in the config file and tune the frequency filter parameters for better results.
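For reference, a temporal mask of the kind `local_masks` refers to can be built roughly as below. This is a hypothetical sketch assuming a blocked window of `window_size` frames; the actual construction in `attention.py` may differ.

```python
# Sketch: a binary temporal mask that restricts attention to a local window of
# frames. Its size depends on video_length, which is why changing video_length
# in the config also requires updating local_masks in attention.py.
import torch


def build_local_mask(video_length: int, window_size: int = 16) -> torch.Tensor:
    """Return a (video_length, video_length) boolean mask where frame i may
    only attend to frames in the same block of `window_size` frames."""
    idx = torch.arange(video_length)
    return (idx.unsqueeze(0) // window_size) == (idx.unsqueeze(1) // window_size)


# e.g. for the default config: build_local_mask(128, window_size=16)
```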
Please refer to our project page for more visual comparisons.
The code is built upon LaVie and VideoCrafter2, with additional references to code from FreeInit and FreeNoise. We thank all the contributors for their efforts in open-sourcing these projects.
If you find our repo useful for your research, please consider citing our paper:
```bibtex
@article{lu2024freelong,
  title={FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention},
  author={Lu, Yu and Liang, Yuanzhi and Zhu, Linchao and Yang, Yi},
  journal={arXiv preprint arXiv:2407.19918},
  year={2024}
}
```