Update attention.py #116
base: main
Conversation
Adding support for cards that aren't Ampere architecture
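The fix under discussion can be sketched roughly like this (illustrative names only, not the PR's actual diff): flash_attn requires SM80+ (Ampere and newer), so on a 2080 Ti (SM75) the call in wan/modules/attention.py has to fall back to something like PyTorch's scaled_dot_product_attention. A minimal sketch of the backend choice, assuming that fallback strategy:

```python
# Hedged sketch; pick_attention_backend is a hypothetical name,
# not a function from the Wan2.1 repo.
import importlib.util

def pick_attention_backend():
    """Use flash_attn only when it is installed AND the GPU is Ampere (SM 8.0)
    or newer; otherwise fall back to torch's scaled_dot_product_attention."""
    if importlib.util.find_spec("flash_attn") is None:
        return "sdpa"
    try:
        import torch
        major, _ = torch.cuda.get_device_capability()
        return "flash" if major >= 8 else "sdpa"
    except Exception:
        # No visible CUDA device (or torch missing): flash kernels can't run anyway.
        return "sdpa"

print(pick_attention_backend())
```

Uninstalling flash-attn alone is not enough (as a later comment in this thread shows), because the repo's code still calls into the flash_attention path; the dispatch itself has to branch.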
Thanks shirubei, I tried the code with my 8*2080 Ti setup.
I got further with your branch, but now I'm getting VRAM OOM issues. Good try though.
Same here: on a single 2080 Ti I got OOM as well, and multiple GPUs aren't working. :(
You can go with --frame_num 17, or even less such as 13 or 9, to test the modification.
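For context on why small --frame_num values fit in less VRAM: the VAE compresses 4x in time and 8x in each spatial dimension (the vae_stride (4, 8, 8) visible in the config dump later in this thread), and frame_num is expected to have the form 4n + 1 (hence the suggested 17, 13, 9). A rough sketch of the latent size, with an illustrative helper name:

```python
# Hedged sketch: latent_shape is a hypothetical helper, not repo code.
def latent_shape(frame_num, height=480, width=832):
    """Latent video shape assuming vae_stride (4, 8, 8) and frame_num = 4n + 1."""
    assert (frame_num - 1) % 4 == 0, "frame_num should be 4n + 1"
    t = (frame_num - 1) // 4 + 1          # 4x temporal compression
    return (t, height // 8, width // 8)   # 8x spatial compression

print(latent_shape(17))  # → (5, 60, 104)
print(latent_shape(81))  # → (21, 60, 104)
```

So cutting frame_num from the default 81 down to 17 shrinks the temporal extent of the latents (and of every intermediate activation) by roughly 4x.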
@splendiz @jimbojd72 python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir
Getting the same error with your new prompt. I don't know enough yet to debug on my side (this is my first time trying a model on my Arch box with a 2080 Ti). My point here is that I should not be a blocker for merging if it doesn't work on my setup.
@jimbojd72 Thank you.
Now I understand the link between frame_num and VRAM 🤦🏻. Thanks for clarifying that for me. It did work afterward, even though the video seems too short!
t2v-1.3B_832.480_1_1_Two_anthropomorphic_cats_in_comfy_boxing_gear_and__20250302_232400.mp4
I have tested this. It needs frame_num under about 12 to make sure a 2080 Ti 22G runs smoothly. It took 5 min to generate an 832*480 video.
Single GPU may work, but how about multiple GPUs? Has anyone tested the code?
Hello, I tried to use shirubei's code on Kaggle, and it still raises an Ampere architecture error. Then I uninstalled flash-attn, and it fails at File "/kaggle/working/Wan2.1/wan/modules/attention.py", line 112, in flash_attention. What else do I have to fix, other than using shirubei's code above? I saw someone talking about adding something in xdit_context_parallel.py. Thanks!
I would like to know whether anyone has had luck running FSDP with the provided multi-GPU example. On my 2x 3090 I don't see real DP happening: both cards load up to almost max VRAM (after ~80 GB of RAM was used), but only one card does inference while the other idles. I thought this was supposed to shard the sampling across the multiple GPUs?
@3dluvr - did you set dit_fsdp=True and use_usp=True? DDP will only help if you're generating 2 videos at the same time. It would probably be worthwhile looking into whether a smaller Google T5 encoder could be used (though the Wan weights are probably all more or less hardcoded to the larger model).
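On the multi-GPU flags: here is a hedged sketch of how the sizes that appear in this thread's logs (ulysses_size, ring_size, dit_fsdp) presumably fit together. Sequence parallelism (USP) splits a single sample across processes, so the product of the two sizes is assumed to match the torchrun world size; the helper name and exact constraint are illustrative, not taken from the repo.

```python
# Hypothetical helper (not Wan2.1 API) illustrating the assumed constraint
# that one sample is sharded across all launched processes.
def check_parallel_config(world_size, ulysses_size=1, ring_size=1):
    assert ulysses_size * ring_size == world_size, (
        "ulysses_size * ring_size is assumed to match the torchrun world size")
    # FSDP sharding and USP only make sense with more than one process.
    return {"dit_fsdp": world_size > 1, "use_usp": world_size > 1}

print(check_parallel_config(2, ulysses_size=2))  # → {'dit_fsdp': True, 'use_usp': True}
```

Under this assumption, a 2x 3090 run would be launched with torchrun --nproc_per_node=2 and ulysses_size (or ring_size) set to 2, rather than relying on plain DDP.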
@johndpope Yeah, and I managed to get DDP working after trying it on Linux. There it appears to just work and shards the sampling across the two 3090s I have. Both cards had their VRAM filled up and both were being utilized at 100% at times (it would go up and down). Sadly, I ran out of VRAM when the video was supposed to be saved, which is weird; maybe the VAE threw it into OOM. I can only conclude that DDP just doesn't work on Windows (yet), so the next thing I'll try is WSL2 and setting the whole thing up there.
I had the same issue with Flash-Attention not working on a very old K80 I am using for testing. Snip:

```
(wanvideo) stelios@ml2:/storage/Wan2.1$ export PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.83,max_split_size_mb:48
(wanvideo) stelios@ml2:/storage/Wan2.1$ python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model False --frame_num 5 --t5_cpu --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
[2025-03-24 18:53:56,797] INFO: Generation job args: Namespace(task='t2v-1.3B', size='832*480', frame_num=5, ckpt_dir='./Wan2.1-T2V-1.3B', offload_model=False, ulysses_size=1, ring_size=1, t5_fsdp=False, t5_cpu=True, dit_fsdp=False, save_file=None, prompt='Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.', use_prompt_extend=False, prompt_extend_method='local_qwen', prompt_extend_model=None, prompt_extend_target_lang='zh', base_seed=1403399102435925943, image=None, sample_solver='unipc', sample_steps=50, sample_shift=8.0, sample_guide_scale=6.0)
[2025-03-24 18:53:56,797] INFO: Generation model config: {'name': 'Config: Wan T2V 1.3B', 't5_model': 'umt5_xxl', 't5_dtype': torch.float32, 'text_len': 512, 'param_dtype': torch.float32, 'num_train_timesteps': 1000, 'sample_fps': 16, 'sample_neg_prompt': '色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走', 't5_checkpoint': 'models_t5_umt5-xxl-enc-bf16.pth', 't5_tokenizer': 'google/umt5-xxl', 'vae_checkpoint': 'Wan2.1_VAE.pth', 'vae_stride': (4, 8, 8), 'patch_size': (1, 2, 2), 'dim': 1536, 'ffn_dim': 8960, 'freq_dim': 256, 'num_heads': 12, 'num_layers': 30, 'window_size': (-1, -1), 'qk_norm': True, 'cross_attn_norm': True, 'eps': 1e-06}
[2025-03-24 18:53:56,797] INFO: Input prompt: Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.
[2025-03-24 18:53:56,797] INFO: Creating WanT2V pipeline.
[2025-03-24 18:54:52,522] INFO: loading ./Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth
[2025-03-24 18:54:58,869] INFO: loading ./Wan2.1-T2V-1.3B/Wan2.1_VAE.pth
[2025-03-24 18:54:59,134] INFO: Creating WanModel from ./Wan2.1-T2V-1.3B
[2025-03-24 18:55:00,165] INFO: Generating video ...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [13:50<00:00, 16.61s/it]
Traceback (most recent call last):
  File "/storage/Wan2.1/generate.py", line 412, in
  File "/storage/Wan2.1/generate.py", line 314, in generate
  File "/storage/Wan2.1/wan/text2video.py", line 257, in generate
  File "/storage/Wan2.1/wan/modules/vae.py", line 659, in decode
  File "/storage/Wan2.1/wan/modules/vae.py", line 660, in
  File "/storage/Wan2.1/wan/modules/vae.py", line 557, in decode
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
  File "/storage/Wan2.1/wan/modules/vae.py", line 451, in forward
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
  File "/storage/Wan2.1/wan/modules/vae.py", line 215, in forward
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
  File "/storage/Wan2.1/wan/modules/vae.py", line 36, in forward
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 608, in forward
  File "/home/stelios/anaconda3/envs/wanvideo/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 603, in _conv_forward
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.86 GiB. GPU 0 has a total capacity of 11.17 GiB of which 2.85 GiB is free. Including non-PyTorch memory, this process has 8.32 GiB memory in use. Of the allocated memory 7.62 GiB is allocated by PyTorch, and 296.38 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
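The traceback above shows sampling finishing (50/50 steps) and the OOM only hitting inside wan/modules/vae.py during decode. One generic mitigation, sketched here with a hypothetical helper (not the repo's API), is to decode the latent video in small temporal chunks so the decoder's peak activation memory stays bounded:

```python
# Hedged sketch: decode_in_chunks is an illustrative helper, not Wan2.1 code.
def decode_in_chunks(decode_fn, latent_frames, chunk=2):
    """Decode `latent_frames` a few at a time and concatenate the results.
    `latent_frames` is a plain list here, standing in for latent tensors."""
    out = []
    for i in range(0, len(latent_frames), chunk):
        out.extend(decode_fn(latent_frames[i:i + chunk]))
    return out

# Toy stand-in decoder: each latent frame "decodes" to 4 output frames,
# mirroring the VAE's 4x temporal compression.
frames = decode_in_chunks(lambda xs: [f for x in xs for f in [x] * 4],
                          list(range(5)))
print(len(frames))  # → 20
```

Caveat: a real VAE decoder has a temporal receptive field, so naive chunking can produce artifacts at chunk borders; overlapping chunks, or simply leaving --offload_model at its default so other weights vacate VRAM before decode, may be safer starting points.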