
Request: NUMA support #660

Open
frz121 opened this issue Feb 4, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@frz121

frz121 commented Feb 4, 2024

Could you please add NUMA support to the application?
Essentially, this is just the "--numa" option from llama.cpp.
It allows models to run on multiprocessor (Linux-based) systems with a significant speedup.

@LostRuins added the enhancement (New feature or request) label Feb 8, 2024
@rogerfachini

With the release of DeepSeek, those of us running on CPU with EPYC chips would see a noticeable performance benefit from this. I've been testing locally using numactl with koboldcpp to force NUMA-aware scheduling (discussed in the llama.cpp repo here: ggml-org#1437).

Having this feature available as a flag in koboldcpp, similar to llama.cpp, would be quite helpful.

@jasonsi1993

> With the release of DeepSeek, those of us running on CPU with EPYC chips would see a noticeable performance benefit from this. I've been testing locally using numactl with koboldcpp to force NUMA-aware scheduling (discussed in the llama.cpp repo here: ggml-org#1437).
>
> Having this feature available as a flag in koboldcpp, similar to llama.cpp, would be quite helpful.

Mind if I ask what command you are using to run numactl? I've been testing different numactl combinations but haven't seen much improvement.

@rogerfachini

> Mind if I ask what command you are using to run numactl? I've been testing different numactl combinations but haven't seen much improvement.

Sure. For context, the hardware I'm running is a single-socket EPYC 7763; this command may not work as well on multi-socket systems (the Infinity Fabric link between sockets becomes a different bottleneck at that point):

numactl --interleave=all koboldcpp --model ...

Check your BIOS settings to make sure each CCD is being treated as a separate NUMA node.

@ubergarm

I'm fussing with NUMA stuff on a big dual-socket Intel Xeon 6980P with upstream llama.cpp; if anyone has a similar rig or tips, I'm all ears: ggml-org#12088

Thanks!

@LostRuins
Owner

If it's just a small change and someone can PR support for it, I wouldn't mind adding it as an option. However, I have literally no way to test NUMA myself, and I don't have any AMD GPU or CPU either, so I would be adding it blind.

@rogerfachini

I think it should be a relatively small change? Digging through the code a bit, llama.cpp implements all of the heavy lifting in the llama_numa_init(strategy) function, which should be called after llama_backend_init(). The scope appears to be adding that NUMA init call and wiring it up to a new CLI arg.

I'll tinker with this over the weekend, and submit a PR when I get something working.
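For reference, here is a minimal sketch of what that wiring might look like, assuming llama.cpp's current C API (llama_backend_init(), llama_numa_init(), and the GGML_NUMA_STRATEGY_* values from ggml.h). The "--numa <mode>" flag parsing and the startup helper are purely illustrative, not actual koboldcpp code:

// Hypothetical sketch only -- not actual koboldcpp code. Assumes the llama.cpp C API:
// llama_backend_init(), llama_numa_init(), and the ggml_numa_strategy enum from ggml.h.
#include <cstring>
#include "llama.h"   // also pulls in ggml.h, which declares the NUMA strategy enum

// Map a hypothetical "--numa <mode>" CLI value onto llama.cpp's NUMA strategies.
static ggml_numa_strategy parse_numa_flag(const char * value) {
    if (value == nullptr)                      return GGML_NUMA_STRATEGY_DISABLED;
    if (std::strcmp(value, "distribute") == 0) return GGML_NUMA_STRATEGY_DISTRIBUTE; // spread work across all nodes
    if (std::strcmp(value, "isolate") == 0)    return GGML_NUMA_STRATEGY_ISOLATE;    // stay on the node the process started on
    if (std::strcmp(value, "numactl") == 0)    return GGML_NUMA_STRATEGY_NUMACTL;    // defer to an externally applied numactl policy
    return GGML_NUMA_STRATEGY_DISABLED;
}

// Called once at startup: init the backend first, then apply the NUMA strategy
// before any model is loaded.
void init_backend_with_numa(const char * numa_flag_value) {
    llama_backend_init();
    llama_numa_init(parse_numa_flag(numa_flag_value));
}

The mode names mirror the strategies llama.cpp exposes for its own --numa flag; as I understand it, the "numactl" mode just tells the backend to respect an externally applied policy such as the numactl --interleave=all command above.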
