[WIP] CUDA backend #1983
base: main
Conversation
I want to add ROCm support based on your CUDA pull request. Would that be OK with you?
Awesome progress so far @zcbenz!! I'm wondering what the best way is to get this incorporated into MLX. I can think of a couple of options: keep it on a separate branch until it's more complete, or merge it into main early and incrementally.
I kind of prefer the latter... but I'm open to suggestions.
@radudiaconu0 Of course I'm OK with it! Before you begin, you might want to decide first how the ROCm backend will live alongside the CUDA backend. I'm not familiar with ROCm, but I have seen two patterns in projects that support both backends: keeping a separate copy of the kernels for each backend, or generating the HIP code from the CUDA code (e.g. with hipify).
Another thing to note is that this PR is bound to see heavy changes in the coming weeks; I'm still experimenting with what the best interface for integration is.
Awesome progress indeed! Just chiming in regarding the best way to incorporate this. IMHO merging often is the way to go (option 2, basically). Combined with running CUDA tests in CI, it will be the easiest to live with (since we'll know when we break it, even if we don't use it). Otherwise the CUDA branch would have to be constantly rebased on top of main, which could be annoying.
I would try to make a separate HIP folder, or use hipify on your CUDA code to port it to ROCm/HIP.
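For reference, ROCm ships hipify tools that translate CUDA sources to HIP source-by-source; a typical invocation (the file name here is just a hypothetical example) looks like:
$ hipify-perl unary.cu > unary.hip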
I find myself repeatedly refactoring the code when porting new kernels. I think I still need to implement a few more primitives before the backbone code stabilizes, probably a few more weeks of experimenting. Once the code is ready for review, I can split this PR into a backbone PR and a few small PRs for each primitive. Future work would then be submitted as incremental PRs.
In CUDA the kernel parameters' size must be known at compile time, i.e. we can't pass dynamically sized shapes/strides via constant memory the way the Metal kernels do. I'm currently passing shapes/strides to kernels via fixed-size arrays.
Sounds great! As long as we can change it by setting one number somewhere I think that's perfectly fine. |
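To illustrate the approach being discussed, here is a minimal sketch of passing dynamic shapes/strides through a fixed-size kernel parameter; the Dims struct, MAX_NDIM constant, and kernel below are hypothetical illustrations, not the PR's actual code. MAX_NDIM is the "one number" that would be bumped to support arrays with more dimensions.
// Hypothetical sketch: CUDA kernel parameters must have a
// compile-time-known size, so the dynamic shape/strides are copied
// into a fixed-capacity struct that is passed by value.
#include <cuda_runtime.h>
#include <cstdint>

constexpr int MAX_NDIM = 8; // compile-time cap on dimensions

struct Dims {
  int ndim;                  // actual number of dimensions in use
  int shape[MAX_NDIM];       // sizes; only the first ndim entries are valid
  int64_t strides[MAX_NDIM]; // element strides; same convention
};

// Example kernel: copy a strided input into a contiguous output by
// decoding each flat index into per-dimension coordinates.
__global__ void copy_strided(const float* in, float* out, Dims dims,
                             size_t size) {
  size_t idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx >= size) return;
  size_t rem = idx;
  int64_t offset = 0;
  for (int d = dims.ndim - 1; d >= 0; --d) {
    offset += (rem % dims.shape[d]) * dims.strides[d];
    rem /= dims.shape[d];
  }
  out[idx] = in[offset];
}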
This PR is an ongoing effort to add a CUDA backend to MLX. Very little works right now, but you can already run the tutorial example.
To build and test:
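A minimal sequence, assuming the standard CMake flow with the same flags as the development command below:
$ cmake . -Bbuild -DMLX_BUILD_CUDA=ON -DMLX_BUILD_EXAMPLES=ON
$ cmake --build build -j
$ ./build/examples/cpp/tutorial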
For development I usually use:
$ cmake . -Bbuild -DMLX_BUILD_CUDA=ON -DMLX_BUILD_EXAMPLES=ON -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_CUDA_COMPILER_LAUNCHER=ccache -DCMAKE_BUILD_TYPE=Debug -GNinja
This has only been tested on Ubuntu 22.04 with CUDA 11.6; other environments should work in theory, but they have not been tested.
This PR is not updated frequently; if anyone is interested in the real-time development, please check my forked repo.
There are mainly two reasons for a CUDA backend: CUDA supports unified memory, which matches MLX's unified-memory programming model, and NVIDIA hardware is widely used, so MLX code could run where that hardware is deployed.
This work is sponsored by Apple.