Update attention.py #116
base: main
Conversation
Adding support for cards that aren't Ampere architecture
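The fix under discussion can be sketched roughly like this (illustrative names only, not the PR's actual diff): flash_attn requires SM80+ (Ampere and newer), so on a 2080 Ti (SM75) the call in wan/modules/attention.py has to fall back to something like PyTorch's scaled_dot_product_attention. A minimal sketch of the backend choice, assuming that fallback strategy:

```python
# Hedged sketch; pick_attention_backend is a hypothetical name,
# not a function from the Wan2.1 repo.
import importlib.util

def pick_attention_backend():
    """Use flash_attn only when it is installed AND the GPU is Ampere (SM 8.0)
    or newer; otherwise fall back to torch's scaled_dot_product_attention."""
    if importlib.util.find_spec("flash_attn") is None:
        return "sdpa"
    try:
        import torch
        major, _ = torch.cuda.get_device_capability()
        return "flash" if major >= 8 else "sdpa"
    except Exception:
        # No visible CUDA device (or torch missing): flash kernels can't run anyway.
        return "sdpa"

print(pick_attention_backend())
```

Uninstalling flash-attn alone is not enough (as a later comment in this thread shows), because the repo's code still calls into the flash_attention path; the dispatch itself has to branch.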
Thanks shirubei, I tried the code with my 8*2080 Ti setup.
I got further with your branch, but now I'm getting VRAM OOM issues. Good try though.
Same here: on a single 2080 Ti I got OOM as well, and multiple GPUs aren't working. :(
You can go with --frame_num 17, or even less such as 13 or 9, to test the modification.
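For context on why small --frame_num values fit in less VRAM: the VAE compresses 4x in time and 8x in each spatial dimension (the vae_stride (4, 8, 8) visible in the config dump later in this thread), and frame_num is expected to have the form 4n + 1 (hence the suggested 17, 13, 9). A rough sketch of the latent size, with an illustrative helper name:

```python
# Hedged sketch: latent_shape is a hypothetical helper, not repo code.
def latent_shape(frame_num, height=480, width=832):
    """Latent video shape assuming vae_stride (4, 8, 8) and frame_num = 4n + 1."""
    assert (frame_num - 1) % 4 == 0, "frame_num should be 4n + 1"
    t = (frame_num - 1) // 4 + 1          # 4x temporal compression
    return (t, height // 8, width // 8)   # 8x spatial compression

print(latent_shape(17))  # → (5, 60, 104)
print(latent_shape(81))  # → (21, 60, 104)
```

So cutting frame_num from the default 81 down to 17 shrinks the temporal extent of the latents (and of every intermediate activation) by roughly 4x.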
@splendiz @jimbojd72 python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir
Getting the same error with your new prompt. I don't know enough yet to debug on my side (this is my first time trying a model on my Arch box with a 2080 Ti). My point here is that I should not be a blocker for merging if it doesn't work on my setup.
@jimbojd72 Thank you.
Now I understand the link between frame_num and VRAM 🤦🏻. Thanks for clarifying that for me. It did work afterward, even though the video seems too short!
t2v-1.3B_832.480_1_1_Two_anthropomorphic_cats_in_comfy_boxing_gear_and__20250302_232400.mp4
I have tested this. It needs frame_num under about 12 to make sure a 2080 Ti 22G runs smoothly. It took 5 min to generate an 832*480 video.
Single GPU may work, but how about multiple GPUs? Has anyone tested the code?
Hello, I tried to use shirubei's code on Kaggle, and it still raises an Ampere architecture error. Then I uninstalled flash-attn, and it fails at File "/kaggle/working/Wan2.1/wan/modules/attention.py", line 112, in flash_attention. What else do I have to fix, other than using shirubei's code above? I saw someone talking about adding something in xdit_context_parallel.py. Thanks!
I would like to know whether anyone has had luck running FSDP with the provided multi-GPU example. On my 2x 3090 I don't see real DP happening: both cards load up to almost max VRAM (after ~80 GB of RAM was used), but only one card does inference while the other idles. I thought this was supposed to shard the sampling across the multiple GPUs?
@3dluvr - did you set dit_fsdp=True and use_usp=True? DDP will only help if you're generating 2 videos at the same time. It would probably be worthwhile looking into whether a smaller Google T5 encoder could be used (though the Wan weights are probably all more or less hardcoded to the larger model).
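On the multi-GPU flags: here is a hedged sketch of how the sizes that appear in this thread's logs (ulysses_size, ring_size, dit_fsdp) presumably fit together. Sequence parallelism (USP) splits a single sample across processes, so the product of the two sizes is assumed to match the torchrun world size; the helper name and exact constraint are illustrative, not taken from the repo.

```python
# Hypothetical helper (not Wan2.1 API) illustrating the assumed constraint
# that one sample is sharded across all launched processes.
def check_parallel_config(world_size, ulysses_size=1, ring_size=1):
    assert ulysses_size * ring_size == world_size, (
        "ulysses_size * ring_size is assumed to match the torchrun world size")
    # FSDP sharding and USP only make sense with more than one process.
    return {"dit_fsdp": world_size > 1, "use_usp": world_size > 1}

print(check_parallel_config(2, ulysses_size=2))  # → {'dit_fsdp': True, 'use_usp': True}
```

Under this assumption, a 2x 3090 run would be launched with torchrun --nproc_per_node=2 and ulysses_size (or ring_size) set to 2, rather than relying on plain DDP.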
@johndpope Yeah, and I managed to get DDP working after trying it on Linux. There it appears to just work and shards the sampling across the two 3090s I have. Both cards had their VRAM filled up and both were being utilized at 100% at times (it would go up and down). Sadly, I ran out of VRAM when the video was supposed to be saved, which is weird; maybe the VAE threw it into OOM. I can only conclude that DDP just doesn't work on Windows (yet), so the next thing I'll try is WSL2 and setting the whole thing up there.
I had the same issue with Flash-Attention not working on a very old K80 I am using for testing. Snip:

```
(wanvideo) stelios@ml2:/storage/Wan2.1$ export PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.83,max_split_size_mb:48
(wanvideo) stelios@ml2:/storage/Wan2.1$ python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model False --frame_num 5 --t5_cpu --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
[2025-03-24 18:53:56,797] INFO: Generation job args: Namespace(task='t2v-1.3B', size='832*480', frame_num=5, ckpt_dir='./Wan2.1-T2V-1.3B', offload_model=False, ulysses_size=1, ring_size=1, t5_fsdp=False, t5_cpu=True, dit_fsdp=False, save_file=None, prompt='Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.', use_prompt_extend=False, prompt_extend_method='local_qwen', prompt_extend_model=None, prompt_extend_target_lang='zh', base_seed=1403399102435925943, image=None, sample_solver='unipc', sample_steps=50, sample_shift=8.0, sample_guide_scale=6.0)
[2025-03-24 18:53:56,797] INFO: Generation model config: {'name': 'Config: Wan T2V 1.3B', 't5_model': 'umt5_xxl', 't5_dtype': torch.float32, 'text_len': 512, 'param_dtype': torch.float32, 'num_train_timesteps': 1000, 'sample_fps': 16, 'sample_neg_prompt': '色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走', 't5_checkpoint': 'models_t5_umt5-xxl-enc-bf16.pth', 't5_tokenizer': 'google/umt5-xxl', 'vae_checkpoint': 'Wan2.1_VAE.pth', 'vae_stride': (4, 8, 8), 'patch_size': (1, 2, 2), 'dim': 1536, 'ffn_dim': 8960, 'freq_dim': 256, 'num_heads': 12, 'num_layers': 30, 'window_size': (-1, -1), 'qk_norm': True, 'cross_attn_norm': True, 'eps': 1e-06}
[2025-03-24 18:53:56,797] INFO: Input prompt: Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.
[2025-03-24 18:53:56,797] INFO: Creating WanT2V pipeline.
[2025-03-24 18:54:52,522] INFO: loading ./Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth
[2025-03-24 18:54:58,869] INFO: loading ./Wan2.1-T2V-1.3B/Wan2.1_VAE.pth
[2025-03-24 18:54:59,134] INFO: Creating WanModel from ./Wan2.1-T2V-1.3B
[2025-03-24 18:55:00,165] INFO: Generating video ...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [13:50<00:00, 16.61s/it]
Traceback (most recent call last):
  File "/storage/Wan2.1/generate.py", line 412, in
  File "/storage/Wan2.1/generate.py", line 314, in generate
  File "/storage/Wan2.1/wan/text2video.py", line 257, in generate
  File "/storage/Wan2.1/wan/modules/vae.py", line 659, in decode
  File "/storage/Wan2.1/wan/modules/vae.py", line 660, in
  File "/storage/Wan2.1/wan/modules/vae.py", line 557, in decode
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
  File "/storage/Wan2.1/wan/modules/vae.py", line 451, in forward
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
  File "/storage/Wan2.1/wan/modules/vae.py", line 215, in forward
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
  File "/storage/Wan2.1/wan/modules/vae.py", line 36, in forward
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 608, in forward
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 603, in _conv_forward
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.86 GiB. GPU 0 has a total capacity of 11.17 GiB of which 2.85 GiB is free. Including non-PyTorch memory, this process has 8.32 GiB memory in use. Of the allocated memory 7.62 GiB is allocated by PyTorch, and 296.38 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
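The traceback above shows sampling finishing (50/50 steps) and the OOM only hitting inside wan/modules/vae.py during decode. One generic mitigation, sketched here with a hypothetical helper (not the repo's API), is to decode the latent video in small temporal chunks so the decoder's peak activation memory stays bounded:

```python
# Hedged sketch: decode_in_chunks is an illustrative helper, not Wan2.1 code.
def decode_in_chunks(decode_fn, latent_frames, chunk=2):
    """Decode `latent_frames` a few at a time and concatenate the results.
    `latent_frames` is a plain list here, standing in for latent tensors."""
    out = []
    for i in range(0, len(latent_frames), chunk):
        out.extend(decode_fn(latent_frames[i:i + chunk]))
    return out

# Toy stand-in decoder: each latent frame "decodes" to 4 output frames,
# mirroring the VAE's 4x temporal compression.
frames = decode_in_chunks(lambda xs: [f for x in xs for f in [x] * 4],
                          list(range(5)))
print(len(frames))  # → 20
```

Caveat: a real VAE decoder has a temporal receptive field, so naive chunking can produce artifacts at chunk borders; overlapping chunks, or simply leaving --offload_model at its default so other weights vacate VRAM before decode, may be safer starting points.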