[FEATURE]: Expert Parallel for qwen/deepseek #6180

Open
Guodanding opened this issue Jan 12, 2025 · 4 comments
Labels
enhancement New feature or request

Comments

@Guodanding

Describe the feature

Hello, are there any existing implementations of expert parallelism for the newer MoE models, such as Qwen and DeepSeek?

Guodanding added the enhancement (New feature or request) label on Jan 12, 2025
@shiyongde

Need FP8 training for DeepSeek-MoE.


@ver217
Member

ver217 commented Feb 20, 2025

EP for DeepSeek V3 is implemented; see our latest blog.
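
For reference, below is a minimal sketch of how expert parallelism might be enabled in ColossalAI. The plugin and argument names (MoeHybridParallelPlugin, ep_size, zero_stage, precision) and the example checkpoint are assumptions based on the public API at the time of writing, not something stated in this thread; check the DeepSeek example in the repo and the blog post for the exact, current usage. The script would be launched with torchrun on the target number of GPUs.

```python
# Hedged sketch: enabling expert parallelism (EP) for a MoE model in ColossalAI.
# Plugin/argument names below are assumptions; consult the repo's DeepSeek/Qwen
# examples for the exact signature in your release.
import torch
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import MoeHybridParallelPlugin
from transformers import AutoModelForCausalLM

colossalai.launch_from_torch()  # older releases may require a config argument

plugin = MoeHybridParallelPlugin(
    tp_size=1,       # tensor-parallel degree
    pp_size=1,       # pipeline-parallel degree
    ep_size=8,       # expert-parallel degree: experts sharded across 8 ranks
    zero_stage=1,    # ZeRO stage for the dense (non-expert) parameters
    precision="bf16",
)
booster = Booster(plugin=plugin)

# Example MoE checkpoint (assumption); substitute your own Qwen-MoE / DeepSeek model.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V3", trust_remote_code=True, torch_dtype=torch.bfloat16
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
criterion = lambda outputs, batch: outputs.loss

# boost() wraps the model/optimizer so each rank holds only 1/ep_size of the
# experts; tokens are routed to their experts via all-to-all communication.
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)
```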

@ver217
Member

ver217 commented Feb 20, 2025

Need FP8 training for DeepSeek-MoE.

The FP8 GEMM kernel released in DeepSeek's GitHub repo is currently sometimes less efficient than the BF16 GEMM provided by cuBLAS. We will release the blockwise FP8 training feature once we resolve the efficiency issue.
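
As a rough way to check that claim on a given GPU, the sketch below times the BF16 cuBLAS path through torch.matmul; the FP8 side of the comparison would reuse the same timing loop with a blockwise FP8 kernel (e.g. DeepSeek's DeepGEMM), whose call is omitted here because its API is not covered in this thread. Shapes and iteration counts are illustrative only.

```python
import torch

def time_gemm(m=4096, n=4096, k=4096, iters=50, dtype=torch.bfloat16):
    """Average per-iteration latency and TFLOPS of an (m,k) x (k,n) matmul."""
    a = torch.randn(m, k, device="cuda", dtype=dtype)
    b = torch.randn(k, n, device="cuda", dtype=dtype)
    for _ in range(10):              # warm-up so cuBLAS heuristics settle
        torch.matmul(a, b)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        torch.matmul(a, b)
    end.record()
    torch.cuda.synchronize()
    ms = start.elapsed_time(end) / iters
    tflops = 2 * m * n * k / (ms * 1e-3) / 1e12  # 2*m*n*k FLOPs per GEMM
    return ms, tflops

if __name__ == "__main__":
    ms, tflops = time_gemm()
    print(f"BF16 GEMM (cuBLAS via torch.matmul): {ms:.3f} ms, {tflops:.1f} TFLOPS")
```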
