Stars
Enjoy the magic of Diffusion models!
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Official repository for our work on micro-budget training of large-scale diffusion models.
[ICLR 2025] FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
Official PyTorch Implementation for "VidToMe: Video Token Merging for Zero-Shot Video Editing" (CVPR 2024)
Official repository of In-Context LoRA for Diffusion Transformers
[ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022
FastVideo is a lightweight framework for accelerating large video diffusion models.
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Inference-time scaling of diffusion-based image and video generation models.
[CVPR 2025] A benchmark for evaluating video generative models in generating short stories
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
Explore the Multimodal “Aha Moment” on 2B Model
Awesome Reasoning LLM Tutorial/Survey/Guide
A general framework for inference-time scaling and steering of diffusion models with arbitrary rewards.
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening
Inference-Time Alignment in Protein Diffusion Models
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
Wan: Open and Advanced Large-Scale Video Generative Models
R1-onevision, a visual language model capable of deep CoT reasoning.
[CVPR 2024] LAMP: Learn a Motion Pattern for Few-Shot Based Video Generation
SkyReels V1: The first and most advanced open-source human-centric video foundation model
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
Solve Visual Understanding with Reinforced VLMs