Is DeepGEMM directly applicable to backward in training? #10

Open
YouJiacheng opened this issue Feb 26, 2025 · 4 comments

Comments

@YouJiacheng

The backward of a GEMM is two GEMMs, but I wonder whether I need to take special care of the range of the gradients?
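For reference, the two GEMMs are the data gradient (DGRAD) and the weight gradient (WGRAD). A minimal PyTorch sketch, assuming a plain `Y = X @ W` layer (names here are illustrative, not DeepGEMM's API):

```python
import torch

# Forward: Y = X @ W, with X of shape (m, k) and W of shape (k, n).
m, k, n = 256, 512, 384
X = torch.randn(m, k, requires_grad=True)
W = torch.randn(k, n, requires_grad=True)
Y = X @ W

# Backward: given the incoming gradient dY, the two GEMMs are:
dY = torch.randn(m, n)
dX = dY @ W.T   # DGRAD: gradient w.r.t. the activations, shape (m, k)
dW = X.T @ dY   # WGRAD: gradient w.r.t. the weights, shape (k, n)

# Sanity check against autograd.
Y.backward(dY)
assert torch.allclose(dX, X.grad, atol=1e-4)
assert torch.allclose(dW, W.grad, atol=1e-4)
```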

@YouJiacheng
Author

Oh, so we must write a quantization kernel that produces the correct lhs_scales and rhs_scales.

@zhipeng93

We need an fp8 GEMM with 128x1 LHS scaling and 1x128 RHS scaling?
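If I read the scale layout right, 1x128 means one scale per contiguous group of 128 elements along the inner (K) dimension, and 128x1 is the same thing on the transposed operand. A rough PyTorch sketch of such groupwise quantization (my own helper, not DeepGEMM's quantization kernel):

```python
import torch

FP8_MAX = 448.0  # largest finite value of float8_e4m3fn

def quantize_1x128(x: torch.Tensor):
    """Groupwise FP8 quantization: one scale per 1x128 tile along the last dim.

    Assumes x.shape[-1] is a multiple of 128; returns the FP8 tensor plus a
    scale tensor with one entry per group.
    """
    *lead, k = x.shape
    groups = x.reshape(*lead, k // 128, 128)
    scales = groups.abs().amax(dim=-1).clamp(min=1e-4) / FP8_MAX
    q = (groups / scales.unsqueeze(-1)).to(torch.float8_e4m3fn)
    return q.reshape(*lead, k), scales

# 128x1 scaling on the other operand is the same idea applied to its transpose.
x = torch.randn(64, 512)
x_fp8, x_scales = quantize_1x128(x)   # x_scales has shape (64, 4)
```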

@LyricZhao
Collaborator

We provide this library mainly for inference, so it only supports DGRAD, not WGRAD.

In my understanding, WGRAD support needs more than a GEMM kernel: it also needs some utility fused kernels (e.g. transposing fused with casting, fused with SwiGLU, fused with the MoE layout). We want this library to be clean, so we didn't open-source them.

We may release the WGRAD kernel later; we will discuss it internally :)
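(For context, a rough sketch in plain PyTorch of the unfused WGRAD path that such fused utility kernels would replace; `wgrad_reference` is an illustrative name, not part of DeepGEMM:)

```python
import torch

def wgrad_reference(x: torch.Tensor, dy: torch.Tensor) -> torch.Tensor:
    """Unfused WGRAD reference: dW = X^T @ dY.

    X is laid out (m, k) for the forward pass, so WGRAD needs an explicit
    transpose to (k, m); in an FP8 pipeline that transpose also forces a fresh
    cast/quantization along the new inner dimension, which is why a fused
    transpose-and-cast kernel is wanted on top of the GEMM itself.
    """
    xt = x.T.contiguous()   # extra global-memory pass a fused kernel would hide
    # (A real pipeline would quantize xt and dy to FP8 with groupwise scales
    #  here and call the FP8 GEMM; this reference stays in float32 for clarity.)
    return xt @ dy          # (k, m) @ (m, n) -> (k, n)
```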

@YouJiacheng
Author

Thank you Chenggang!
Yup, I forgot that WGRAD needs to transpose matrices.
