[FEATURE]: Expert Parallel for qwen/deepseek #6180

Guodanding · 2025-01-12T14:32:39Z

Describe the feature

Hello, are there any existing implementations of expert parallelism (EP) for the newer MoE models, such as Qwen and DeepSeek?

shiyongde · 2025-02-19T01:52:27Z

Need FP8 training for DeepSeek-MoE.

ver217 · 2025-02-20T04:03:06Z

EP for DeepSeek V3 is implemented; see our latest blog.

ver217 · 2025-02-20T04:05:12Z

Re: "Need FP8 training for DeepSeek-MoE."

The FP8 GEMM kernel released in DeepSeek's GitHub repo is currently sometimes less efficient than the BF16 GEMM provided by cuBLAS. We will release the blockwise FP8 training feature once we resolve the efficiency issue.

Guodanding added the "enhancement" (New feature or request) label on Jan 12, 2025
