Request: NUMA support #660
With the release of Deepseek, those of us running on CPU with EPYC chips would get a noticeable performance benefit from this. I've been testing locally using `numactl`. Having this feature available as a flag in kobold.cpp, similar to llama.cpp, would be quite helpful.
Mind if I ask what command you are using to run numactl? I've been testing different numactl combinations but haven't seen much improvement.
Sure. For context, the hardware I'm running is a single-socket EPYC 7763; this command may not work as well on multi-socket systems (the Infinity Fabric link between sockets becomes a different bottleneck at that point):
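(The exact command did not survive extraction. As a hypothetical sketch of the kind of invocation being discussed, assuming llama.cpp's `llama-cli` binary with placeholder model and thread values, it might look like:)

```sh
# Hypothetical reconstruction -- the original command was not preserved.
# Bind both execution and memory allocation to NUMA node 0:
numactl --cpunodebind=0 --membind=0 ./llama-cli -m model.gguf -t 32

# Or interleave memory pages across all nodes (useful when each CCD
# is exposed as its own NUMA node):
numactl --interleave=all ./llama-cli -m model.gguf -t 64
```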
Check your BIOS settings to make sure each CCD is being treated as a separate NUMA node.
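(To confirm the BIOS change took effect, standard Linux tooling can show the resulting topology; node counts depend on your NPS/CCD settings:)

```sh
# List NUMA nodes with their CPUs and memory; with one node per CCD
# you should see several nodes rather than a single node 0.
numactl --hardware

# Quick summary:
lscpu | grep -i numa
```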
I've been fussing with NUMA stuff on a big dual-socket Intel Xeon 6980P with upstream llama.cpp; if anyone has a similar rig or tips, I'm all ears: ggml-org#12088. Thanks!
If it's just a small change and someone can PR support for it, I wouldn't mind adding it as an option. However, I have literally no way to test NUMA myself, and I don't have any AMD GPU or AMD CPU either, so I would be adding it blind.
I think it should be a relatively small change? Digging through the code a bit, llama.cpp implements all of the heavy lifting in the ggml backend (`ggml_numa_init`), so kobold.cpp would mostly need to expose the flag. I'll tinker with this over the weekend and submit a PR when I get something working.
Could you please add NUMA support to the application?
Essentially, this is just the `--numa` option from llama.cpp.
It allows models to run on multiprocessor (Linux-based) systems with significant acceleration.
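(For reference, upstream llama.cpp's `--numa` flag takes three modes; a kobold.cpp flag would presumably mirror these. The model path and node binding below are illustrative:)

```sh
# distribute: spread execution evenly over all NUMA nodes
./llama-cli -m model.gguf --numa distribute

# isolate: only spawn threads on CPUs of the node the process starts on
./llama-cli -m model.gguf --numa isolate

# numactl: honor the CPU map provided by an outer numactl invocation
numactl --cpunodebind=0 ./llama-cli -m model.gguf --numa numactl
```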