
Update attention.py #116

Open · wants to merge 1 commit into main

Conversation

shirubei

Adding support for cards that aren't Ampere architecture.
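
For context, a minimal sketch of the kind of fallback such a change implies (illustrative only, not the exact diff): FlashAttention 2 requires Ampere (SM 8.0+), so older cards can route through PyTorch's built-in scaled_dot_product_attention instead.

```python
import torch

try:
    from flash_attn import flash_attn_func  # FlashAttention 2 API
    FLASH_ATTN_2_AVAILABLE = True
except ImportError:
    FLASH_ATTN_2_AVAILABLE = False

def attention(q, k, v, dropout_p=0.0, causal=False):
    # q/k/v: [B, L, num_heads, head_dim]
    major, _ = torch.cuda.get_device_capability()
    if FLASH_ATTN_2_AVAILABLE and major >= 8:
        # flash_attn_func consumes [B, L, H, D] directly
        return flash_attn_func(q, k, v, dropout_p=dropout_p, causal=causal)
    # Pre-Ampere fallback: SDPA's math/mem-efficient kernels run on any
    # CUDA GPU, but expect [B, H, L, D].
    out = torch.nn.functional.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2),
        dropout_p=dropout_p, is_causal=causal)
    return out.transpose(1, 2).contiguous()
```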
@splendiz

splendiz commented Mar 1, 2025

Thanks shirubei, I tried the code with my 8x 2080 Ti.
However, I got the error "TypeError: attention() got an unexpected keyword argument 'version'" on each of the graphics cards. The errors result in "torch.distributed.elastic.multiprocessing.errors.ChildFailedError".
Any solutions?
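
One plausible cause, since the multi-GPU path reportedly still passes a FlashAttention version argument (see the later mention of xdit_context_parallel.py): the patched attention() dropped that keyword. A guessed-at compatibility shim, not the PR's actual code:

```python
# Hypothetical: accept and ignore the legacy `version` keyword so that
# distributed callers keep working; the non-Ampere path has only one
# implementation, so the argument carries no information there.
def attention(q, k, v, *args, version=None, **kwargs):
    del version  # upstream used it only to pick a FlashAttention generation
    return fallback_attention(q, k, v, *args, **kwargs)  # placeholder name
```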

@jimbojd72

python generate.py --task t2v-1.3B --size '832*480' --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model True --t5_cpu --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
[2025-03-01 23:16:36,760] INFO: Generation job args: Namespace(task='t2v-1.3B', size='832*480', frame_num=81, ckpt_dir='./Wan2.1-T2V-1.3B', offload_model=True, ulysses_size=1, ring_size=1, t5_fsdp=False, t5_cpu=True, dit_fsdp=False, save_file=None, prompt='Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.', use_prompt_extend=False, prompt_extend_method='local_qwen', prompt_extend_model=None, prompt_extend_target_lang='ch', base_seed=9018256661711622796, image=None, sample_solver='unipc', sample_steps=50, sample_shift=8.0, sample_guide_scale=6.0)
[2025-03-01 23:16:36,760] INFO: Generation model config: {'__name__': 'Config: Wan T2V 1.3B', 't5_model': 'umt5_xxl', 't5_dtype': torch.bfloat16, 'text_len': 512, 'param_dtype': torch.bfloat16, 'num_train_timesteps': 1000, 'sample_fps': 16, 'sample_neg_prompt': '色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走', 't5_checkpoint': 'models_t5_umt5-xxl-enc-bf16.pth', 't5_tokenizer': 'google/umt5-xxl', 'vae_checkpoint': 'Wan2.1_VAE.pth', 'vae_stride': (4, 8, 8), 'patch_size': (1, 2, 2), 'dim': 1536, 'ffn_dim': 8960, 'freq_dim': 256, 'num_heads': 12, 'num_layers': 30, 'window_size': (-1, -1), 'qk_norm': True, 'cross_attn_norm': True, 'eps': 1e-06}
[2025-03-01 23:16:36,760] INFO: Input prompt: Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.
[2025-03-01 23:16:36,760] INFO: Creating WanT2V pipeline.
[2025-03-01 23:17:19,381] INFO: loading ./Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth
[2025-03-01 23:17:32,059] INFO: loading ./Wan2.1-T2V-1.3B/Wan2.1_VAE.pth
[2025-03-01 23:17:32,481] INFO: Creating WanModel from ./Wan2.1-T2V-1.3B
[2025-03-01 23:17:35,481] INFO: Generating video ...
  0%|          | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/data/AI/Wan2.1/generate.py", line 411, in <module>
    generate(args)
    ~~~~~~~~^^^^^^
  File "/mnt/data/AI/Wan2.1/generate.py", line 313, in generate
    video = wan_t2v.generate(
        args.prompt,
    ...<6 lines>...
        seed=args.base_seed,
        offload_model=args.offload_model)
  File "/mnt/data/AI/Wan2.1/wan/text2video.py", line 236, in generate
    noise_pred_cond = self.model(
                      ~~~~~~~~~~^
        latent_model_input, t=timestep, **arg_c)[0]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/AI/Wan2.1/wan/modules/model.py", line 564, in forward
    x = block(x, **kwargs)
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/AI/Wan2.1/wan/modules/model.py", line 298, in forward
    y = self.self_attn(
        self.norm1(x).float() * (1 + e[1]) + e[0], seq_lens, grid_sizes,
        freqs)
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/AI/Wan2.1/wan/modules/model.py", line 148, in forward
    k=rope_apply(k, grid_sizes, freqs),
      ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
  File "/mnt/data/AI/Wan2.1/wan/modules/model.py", line 67, in rope_apply
    return torch.stack(output).float()
           ~~~~~~~~~~~~~~~~~~~~~~~~~^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 192.00 MiB. GPU 0 has a total capacity of 10.57 GiB of which 244.06 MiB is free. Including non-PyTorch memory, this process has 8.11 GiB memory in use. Of the allocated memory 7.52 GiB is allocated by PyTorch, and 421.31 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

I got further with your branch but now getting OOM issues on my VRAM. Good try though
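
The traceback's own suggestion is worth trying before anything else; the allocator option must be in the environment before CUDA initializes. A minimal sketch (the env-var name comes straight from the error message):

```python
import os

# Must run before torch initializes CUDA, e.g. at the very top of
# generate.py, or exported in the shell before launching.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch  # imported only after the allocator config is in place
```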

@splendiz

splendiz commented Mar 2, 2025


> [jimbojd72's command, log, and OOM traceback quoted above]

Same here; on a single 2080 Ti I got OOM as well. Multiple GPUs not working. :(

@shirubei
Author

shirubei commented Mar 2, 2025

> [jimbojd72's command, log, and OOM traceback quoted above]

You can go with --frame_num 17, or even less such as 13 or 9, to test the modification.

@shirubei
Author

shirubei commented Mar 2, 2025

@splendiz @jimbojd72
I ran the command below on a 2080 Ti 22GB:

python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model False --frame_num 13 --sample_shift 8 --sample_guide_scale 6 --prompt "Ultra-wide angle, night, a young girl in a red dress walks towards the camera from a distance on a noisy street. On both sides of the road, there are continuous shops with soft lighting."

[screenshot: GPU memory usage]

@shirubei
Author

shirubei commented Mar 2, 2025

Finally, the video file was created.
[screenshot: generation finished]

t2v-1.3B_1_Ultra-wide_angle._night._a_young_girl_in_a_red_dre_20250303_001358.mp4

@jimbojd72

Getting the same error with your new prompt. I don't know enough yet to debug on my side (this is my first time trying a model on my Arch box with a 2080 Ti).

My point here is that I should not be a blocker for merging just because it doesn't work on my setup.

python generate.py --task t2v-1.3B --size '832*480' --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model False --frame_num 13 --sample_shift 8 --sample_guide_scale 6 --prompt "Ultra-wide angle, night, a young girl in a red dress walks towards the camera from a distance on a noisy street. On both sides of the road, there are continuous shops with soft lighting."
[2025-03-02 11:51:36,833] INFO: Generation job args: Namespace(task='t2v-1.3B', size='832*480', frame_num=13, ckpt_dir='./Wan2.1-T2V-1.3B', offload_model=False, ulysses_size=1, ring_size=1, t5_fsdp=False, t5_cpu=False, dit_fsdp=False, save_file=None, prompt='Ultra-wide angle, night, a young girl in a red dress walks towards the camera from a distance on a noisy street. On both sides of the road, there are continuous shops with soft lighting.', use_prompt_extend=False, prompt_extend_method='local_qwen', prompt_extend_model=None, prompt_extend_target_lang='ch', base_seed=1674726556309183157, image=None, sample_solver='unipc', sample_steps=50, sample_shift=8.0, sample_guide_scale=6.0)
[2025-03-02 11:51:36,833] INFO: Generation model config: {'__name__': 'Config: Wan T2V 1.3B', 't5_model': 'umt5_xxl', 't5_dtype': torch.bfloat16, 'text_len': 512, 'param_dtype': torch.bfloat16, 'num_train_timesteps': 1000, 'sample_fps': 16, 'sample_neg_prompt': '色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走', 't5_checkpoint': 'models_t5_umt5-xxl-enc-bf16.pth', 't5_tokenizer': 'google/umt5-xxl', 'vae_checkpoint': 'Wan2.1_VAE.pth', 'vae_stride': (4, 8, 8), 'patch_size': (1, 2, 2), 'dim': 1536, 'ffn_dim': 8960, 'freq_dim': 256, 'num_heads': 12, 'num_layers': 30, 'window_size': (-1, -1), 'qk_norm': True, 'cross_attn_norm': True, 'eps': 1e-06}
[2025-03-02 11:51:36,833] INFO: Input prompt: Ultra-wide angle, night, a young girl in a red dress walks towards the camera from a distance on a noisy street. On both sides of the road, there are continuous shops with soft lighting.
[2025-03-02 11:51:36,833] INFO: Creating WanT2V pipeline.
[2025-03-02 11:52:28,223] INFO: loading ./Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth
[2025-03-02 11:52:36,409] INFO: loading ./Wan2.1-T2V-1.3B/Wan2.1_VAE.pth
[2025-03-02 11:52:36,689] INFO: Creating WanModel from ./Wan2.1-T2V-1.3B
[2025-03-02 11:52:39,375] INFO: Generating video ...
Traceback (most recent call last):
  File "/mnt/data/AI/Wan2.1/generate.py", line 411, in <module>
    generate(args)
    ~~~~~~~~^^^^^^
  File "/mnt/data/AI/Wan2.1/generate.py", line 313, in generate
    video = wan_t2v.generate(
        args.prompt,
    ...<6 lines>...
        seed=args.base_seed,
        offload_model=args.offload_model)
  File "/mnt/data/AI/Wan2.1/wan/text2video.py", line 171, in generate
    self.text_encoder.model.to(self.device)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1343, in to
    return self._apply(convert)
           ~~~~~~~~~~~^^^^^^^^^
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
    ~~~~~~~~~~~~~^^^^
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 930, in _apply
    param_applied = fn(param)
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1329, in convert
    return t.to(
           ~~~~^
        device,
        ^^^^^^^
        dtype if t.is_floating_point() or t.is_complex() else None,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        non_blocking,
        ^^^^^^^^^^^^^
    )
    ^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.96 GiB. GPU 0 has a total capacity of 10.57 GiB of which 2.01 GiB is free. Including non-PyTorch memory, this process has 6.28 GiB memory in use. Of the allocated memory 5.78 GiB is allocated by PyTorch, and 335.21 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

@shirubei
Author

shirubei commented Mar 3, 2025

@jimbojd72
Does your 2080 Ti come with 22GB of memory?
From the screenshot I posted, you can see that frame_num=13 without t5_cpu consumes 21.6 GB of graphics memory plus 2 GB of shared memory (borrowed from ordinary RAM).
So if you own an 11GB 2080 Ti, I suggest testing the code with --frame_num 5. It may also be better to add the --t5_cpu parameter.

Thank you.
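
For intuition, the frame_num/VRAM link can be made concrete from the config printed in the logs above (vae_stride=(4, 8, 8), patch_size=(1, 2, 2)): self-attention cost grows with the DiT token count, which scales almost linearly with frames. A rough estimate, assuming the usual (frames-1)/stride+1 temporal packing:

```python
# Token count fed to the DiT for an 832x480 clip; strides and patch
# sizes are from the 1.3B config in the logs, packing formula assumed.
def dit_tokens(frame_num, height=480, width=832,
               vae_stride=(4, 8, 8), patch_size=(1, 2, 2)):
    t = (frame_num - 1) // vae_stride[0] + 1
    h = height // vae_stride[1] // patch_size[1]
    w = width // vae_stride[2] // patch_size[2]
    return t * h * w

print(dit_tokens(81))  # 21 * 30 * 52 = 32760 tokens
print(dit_tokens(13))  # 4 * 30 * 52  = 6240 tokens, roughly 5x fewer
```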

@jimbojd72

11GB

Now I understand the link between frame_num and VRAM 🤦🏻. Thanks for clarifying that for me.

It did work afterward even though the video seems too short!

python generate.py --task t2v-1.3B --size '832*480' --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model True --t5_cpu --sample_shift 8 --sample_guide_scale 6 --frame_num 5 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
[2025-03-02 23:15:51,906] INFO: Generation job args: Namespace(task='t2v-1.3B', size='832*480', frame_num=5, ckpt_dir='./Wan2.1-T2V-1.3B', offload_model=True, ulysses_size=1, ring_size=1, t5_fsdp=False, t5_cpu=True, dit_fsdp=False, save_file=None, prompt='Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.', use_prompt_extend=False, prompt_extend_method='local_qwen', prompt_extend_model=None, prompt_extend_target_lang='ch', base_seed=5029517465784891079, image=None, sample_solver='unipc', sample_steps=50, sample_shift=8.0, sample_guide_scale=6.0)
[2025-03-02 23:15:51,906] INFO: Generation model config: {'__name__': 'Config: Wan T2V 1.3B', 't5_model': 'umt5_xxl', 't5_dtype': torch.bfloat16, 'text_len': 512, 'param_dtype': torch.bfloat16, 'num_train_timesteps': 1000, 'sample_fps': 16, 'sample_neg_prompt': '色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走', 't5_checkpoint': 'models_t5_umt5-xxl-enc-bf16.pth', 't5_tokenizer': 'google/umt5-xxl', 'vae_checkpoint': 'Wan2.1_VAE.pth', 'vae_stride': (4, 8, 8), 'patch_size': (1, 2, 2), 'dim': 1536, 'ffn_dim': 8960, 'freq_dim': 256, 'num_heads': 12, 'num_layers': 30, 'window_size': (-1, -1), 'qk_norm': True, 'cross_attn_norm': True, 'eps': 1e-06}
[2025-03-02 23:15:51,906] INFO: Input prompt: Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.
[2025-03-02 23:15:51,906] INFO: Creating WanT2V pipeline.
[2025-03-02 23:16:35,034] INFO: loading ./Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth
[2025-03-02 23:16:44,671] INFO: loading ./Wan2.1-T2V-1.3B/Wan2.1_VAE.pth
[2025-03-02 23:16:44,968] INFO: Creating WanModel from ./Wan2.1-T2V-1.3B
[2025-03-02 23:16:47,722] INFO: Generating video ...
100%|██████████| 50/50 [02:45<00:00,  3.30s/it]
[2025-03-02 23:24:00,482] INFO: Saving generated video to t2v-1.3B_832*480_1_1_Two_anthropomorphic_cats_in_comfy_boxing_gear_and__20250302_232400.mp4
[2025-03-02 23:24:00,961] INFO: Finished.
t2v-1.3B_832.480_1_1_Two_anthropomorphic_cats_in_comfy_boxing_gear_and__20250302_232400.mp4

@qianzhouyi2

I have tested this. It needs frame_num under about 12 for a 2080 Ti 22G to run smoothly. It took about 5 minutes to generate an 832*480 video:
python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --frame_num 12

@splendiz

splendiz commented Mar 6, 2025

Single GPU may work, but how about multiple GPUs? Has anyone tested the code?

@Mapraw

Mapraw commented Mar 9, 2025

Hello, I tried to use shirubei's code in Kaggle. It still raises an Ampere architecture error. Then I uninstalled flash-attn, and it shows:

File "/kaggle/working/Wan2.1/wan/modules/attention.py", line 112, in flash_attention
    assert FLASH_ATTN_2_AVAILABLE
AssertionError

What else do I have to fix, other than using shirubei's code above?

I saw someone talking about adding something in xdit_context_parallel.py.

Thanks!

@3dluvr

3dluvr commented Mar 14, 2025

Has anyone had luck running FSDP as well with the provided multi-GPU example?

On my 2x 3090 I don't see real data parallelism happening. Both cards fill up to almost their maximum VRAM (after ~80GB of RAM was used), but only one card does inference while the other idles. I thought this was supposed to shard the sampling across the multiple GPUs?

@johndpope

@3dluvr - did you set dit_fsdp=True and use_usp=True? DDP will only help if you're generating 2 videos at the same time.
With my 3090 I have some code that separates out the T5 encoder into a preprocessing step (T5 takes 12GB of VRAM): https://github.com/johndpope/OmniHuman-1-hack/blob/main/seaweed_apt/generate.py

It would probably be worthwhile to look into whether a smaller Google T5 encoder could be used (though the Wan weights are probably all somewhat hardcoded to the larger model).
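
That preprocessing idea, sketched loosely (the text_encoder interface only approximates wan/modules/t5.py; treat the names as assumptions):

```python
import gc
import torch

def precompute_text_embeddings(text_encoder, prompts, device="cuda"):
    """Encode all prompts once, then free the ~12 GB UMT5 encoder
    before the DiT is loaded."""
    text_encoder.model.to(device)
    with torch.no_grad():
        contexts = [text_encoder([p], device) for p in prompts]
    # Drop the encoder entirely so its VRAM is available for sampling.
    text_encoder.model.to("cpu")
    del text_encoder
    gc.collect()
    torch.cuda.empty_cache()
    return contexts
```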

@3dluvr

3dluvr commented Mar 17, 2025

@johndpope Yeah, and I managed to get DDP working after trying it on Linux. There it appears to just work and shards the sampling across my two 3090s. Both cards had their VRAM filled and both were utilized at 100% at times (it would go up and down). Sadly, I ran out of VRAM when the video was supposed to be saved, which is weird. Maybe the VAE threw it into OOM.

I can only conclude that DDP just doesn't work on Windows (yet), so the next thing I'll try is WSL2 and the whole setup there.

@skoroneos

I had the same issue with Flash-Attention not working on the very old K80s I am using for testing.
I had to modify your code a bit to handle that, but it now works and OOMs when the video generation is over, which I think is a separate issue (VAE OOM).
Thanks for all the work!


Snip

(wanvideo) stelios@ml2:/storage/Wan2.1$ export PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.83,max_split_size_mb:48
(wanvideo) stelios@ml2:/storage/Wan2.1$ python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model False --frame_num 5 --t5_cpu --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
[2025-03-24 18:53:56,797] INFO: Generation job args: Namespace(task='t2v-1.3B', size='832*480', frame_num=5, ckpt_dir='./Wan2.1-T2V-1.3B', offload_model=False, ulysses_size=1, ring_size=1, t5_fsdp=False, t5_cpu=True, dit_fsdp=False, save_file=None, prompt='Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.', use_prompt_extend=False, prompt_extend_method='local_qwen', prompt_extend_model=None, prompt_extend_target_lang='zh', base_seed=1403399102435925943, image=None, sample_solver='unipc', sample_steps=50, sample_shift=8.0, sample_guide_scale=6.0)
[2025-03-24 18:53:56,797] INFO: Generation model config: {'__name__': 'Config: Wan T2V 1.3B', 't5_model': 'umt5_xxl', 't5_dtype': torch.float32, 'text_len': 512, 'param_dtype': torch.float32, 'num_train_timesteps': 1000, 'sample_fps': 16, 'sample_neg_prompt': '色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走', 't5_checkpoint': 'models_t5_umt5-xxl-enc-bf16.pth', 't5_tokenizer': 'google/umt5-xxl', 'vae_checkpoint': 'Wan2.1_VAE.pth', 'vae_stride': (4, 8, 8), 'patch_size': (1, 2, 2), 'dim': 1536, 'ffn_dim': 8960, 'freq_dim': 256, 'num_heads': 12, 'num_layers': 30, 'window_size': (-1, -1), 'qk_norm': True, 'cross_attn_norm': True, 'eps': 1e-06}
[2025-03-24 18:53:56,797] INFO: Input prompt: Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.
[2025-03-24 18:53:56,797] INFO: Creating WanT2V pipeline.
[2025-03-24 18:54:52,522] INFO: loading ./Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth
[2025-03-24 18:54:58,869] INFO: loading ./Wan2.1-T2V-1.3B/Wan2.1_VAE.pth
[2025-03-24 18:54:59,134] INFO: Creating WanModel from ./Wan2.1-T2V-1.3B
[2025-03-24 18:55:00,165] INFO: Generating video ...
100%|██████████| 50/50 [13:50<00:00, 16.61s/it]
Traceback (most recent call last):
  File "/storage/Wan2.1/generate.py", line 412, in <module>
    generate(args)
  File "/storage/Wan2.1/generate.py", line 314, in generate
    video = wan_t2v.generate(
  File "/storage/Wan2.1/wan/text2video.py", line 257, in generate
    videos = self.vae.decode(x0)
  File "/storage/Wan2.1/wan/modules/vae.py", line 659, in decode
    return [
  File "/storage/Wan2.1/wan/modules/vae.py", line 660, in <listcomp>
    self.model.decode(u.unsqueeze(0),
  File "/storage/Wan2.1/wan/modules/vae.py", line 557, in decode
    out = self.decoder(
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/storage/Wan2.1/wan/modules/vae.py", line 451, in forward
    x = layer(x, feat_cache, feat_idx)
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/storage/Wan2.1/wan/modules/vae.py", line 215, in forward
    x = layer(x, feat_cache[idx])
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/storage/Wan2.1/wan/modules/vae.py", line 36, in forward
    return super().forward(x)
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 608, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 603, in _conv_forward
    return F.conv3d(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.86 GiB. GPU 0 has a total capacity of 11.17 GiB of which 2.85 GiB is free. Including non-PyTorch memory, this process has 8.32 GiB memory in use. Of the allocated memory 7.62 GiB is allocated by PyTorch, and 296.38 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
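
Since both this report and 3dluvr's hit OOM only after sampling finishes, one untested workaround is to free the DiT and run the VAE decode on CPU. A rough sketch; the attribute names are taken from the traceback paths, and the Wan VAE wrapper may need further tweaks to accept CPU tensors:

```python
import gc
import torch

def decode_on_cpu(wan_t2v, x0):
    """Hypothetical: drop DiT weights from VRAM, then decode the latents
    on CPU. Slow, but trades the failing conv3d allocation for system RAM."""
    wan_t2v.model.cpu()                  # free the DiT weights first
    gc.collect()
    torch.cuda.empty_cache()
    wan_t2v.vae.model.to("cpu")          # VAE module, per the traceback
    return wan_t2v.vae.decode([u.cpu() for u in x0])
```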
