
Request: NUMA support #660

Open
frz121 opened this issue Feb 4, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@frz121

frz121 commented Feb 4, 2024

Could you please add NUMA support to the application?
Essentially, this is just the "--numa" option from llama.cpp.
It allows models to run on multiprocessor (Linux-based) systems with a significant speedup.

@LostRuins added the enhancement (New feature or request) label Feb 8, 2024
@rogerfachini

With the release of DeepSeek, those of us running on CPU with EPYC chips would see a noticeable performance benefit from this. I've been testing locally using numactl with koboldcpp to force NUMA-aware scheduling (discussed in the llama.cpp repo here: ggml-org#1437).

Having this feature available as a flag in koboldcpp, similar to llama.cpp, would be quite helpful.

@jasonsi1993

> With the release of DeepSeek, those of us running on CPU with EPYC chips would see a noticeable performance benefit from this. I've been testing locally using numactl with koboldcpp to force NUMA-aware scheduling (discussed in the llama.cpp repo here: ggml-org#1437).
>
> Having this feature available as a flag in koboldcpp, similar to llama.cpp, would be quite helpful.

Mind if I ask what command you are using to run numactl? I've been testing different numactl combinations but haven't seen much improvement.

@rogerfachini

> Mind if I ask what command you are using to run numactl? I've been testing different numactl combinations but haven't seen much improvement.

Sure. For context, the hardware I'm running is a single-socket EPYC 7763; this command may not work as well on multi-socket systems (the Infinity Fabric link between sockets becomes a different bottleneck at that point):

numactl --interleave=all koboldcpp --model ...

Check your BIOS settings to make sure each CCD is being treated as a separate NUMA node.

@ubergarm

I'm fussing with NUMA stuff on a big dual-socket Intel Xeon 6980P with upstream llama.cpp; if anyone has a similar rig or tips, I'm all ears: ggml-org#12088

Thanks!

@LostRuins
Owner

If it's just a small change and someone can PR support for it, I wouldn't mind adding it as an option. However, I have literally no way to test NUMA myself, and I don't have any AMD GPU or CPU either, so I would be adding it blind.

@rogerfachini

I think it should be a relatively small change? Digging through the code a bit, llama.cpp implements all of the heavy lifting in the llama_numa_init(strategy) function, which should be called after llama_backend_init(). The scope appears to be adding that NUMA init call and wiring it up to a new CLI arg.

I'll tinker with this over the weekend, and submit a PR when I get something working.
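For reference, here is a minimal sketch of what that wiring might look like, assuming llama.cpp's current C API (llama_backend_init(), llama_numa_init(), and the GGML_NUMA_STRATEGY_* values from ggml.h). The "--numa <mode>" flag parsing and the startup helper are purely illustrative, not actual koboldcpp code:

// Hypothetical sketch only -- not actual koboldcpp code. Assumes the llama.cpp C API:
// llama_backend_init(), llama_numa_init(), and the ggml_numa_strategy enum from ggml.h.
#include <cstring>
#include "llama.h"   // also pulls in ggml.h, which declares the NUMA strategy enum

// Map a hypothetical "--numa <mode>" CLI value onto llama.cpp's NUMA strategies.
static ggml_numa_strategy parse_numa_flag(const char * value) {
    if (value == nullptr)                      return GGML_NUMA_STRATEGY_DISABLED;
    if (std::strcmp(value, "distribute") == 0) return GGML_NUMA_STRATEGY_DISTRIBUTE; // spread work across all nodes
    if (std::strcmp(value, "isolate") == 0)    return GGML_NUMA_STRATEGY_ISOLATE;    // stay on the node the process started on
    if (std::strcmp(value, "numactl") == 0)    return GGML_NUMA_STRATEGY_NUMACTL;    // defer to an externally applied numactl policy
    return GGML_NUMA_STRATEGY_DISABLED;
}

// Called once at startup: init the backend first, then apply the NUMA strategy
// before any model is loaded.
void init_backend_with_numa(const char * numa_flag_value) {
    llama_backend_init();
    llama_numa_init(parse_numa_flag(numa_flag_value));
}

The mode names mirror the strategies llama.cpp exposes for its own --numa flag; as I understand it, the "numactl" mode just tells the backend to respect an externally applied policy such as the numactl --interleave=all command above.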
