tool-call: Phi-4 support #12288
Hey @jpohhhh, thanks a lot for preparing this! The official template from Microsoft is quite disappointing tbh, and while your changes work around some/most of its limitations, we might need a bit more / might be worth going full jinja (see below)
The "sins" of their template are:
Despite these issues, I seem to be getting good outputs w/ the generic handling on master:

cmake -B build -DLLAMA_CURL=1 && cmake --build build --config Release -j -t llama-server
export LLAMA_SERVER_BIN_PATH=$PWD/build/bin/llama-server
export LLAMA_CACHE=${LLAMA_CACHE:-$HOME/Library/Caches/llama.cpp}
./scripts/tool_bench.py run --n 10 --temp -1 --temp 0 --temp 1 --test-calc-result --model "Phi 4 mini instruct Q4_K_M" --output phi4_master.jsonl --hf bartowski/microsoft_Phi-4-mini-instruct-GGUF
./scripts/tool_bench.py plot phi4_master.jsonl

This is just a smoke test / not a proper benchmark (trying to get BFCL running, see here), but I'm getting less success w/ your branch. I sent you Telosnex#1 which fixes a couple of issues. I think we should also remove much of the code in

Proposed template (still unclear how to provide `<|tool_response|>` and if `<|tag|>` should be involved)
- Add system message if needed (per template requirement)
- Add tools to system message (req'd by template)
- Parse output:
  -- add tool calls to the response when there is valid JSON between <|tool_call|> and <|/tool_call|>
  -- content outside of tool_call tags is added to the text portion of the response
  -- if there is no valid JSON, the entire content is added to the text portion of the response
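The output-parsing rules in this commit message can be sketched roughly as follows. This is a minimal Python illustration, not the C++ code from the PR: the function name `parse_phi4_output` and the returned dict shape are hypothetical, and accepting either a single JSON object or a JSON list inside the tags is an assumption.

```python
import json

TOOL_CALL_OPEN = "<|tool_call|>"
TOOL_CALL_CLOSE = "<|/tool_call|>"


def parse_phi4_output(output: str) -> dict:
    """Split raw model output into plain text and parsed tool calls.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    text_parts = []
    tool_calls = []
    pos = 0
    while True:
        start = output.find(TOOL_CALL_OPEN, pos)
        if start == -1:
            break
        end = output.find(TOOL_CALL_CLOSE, start + len(TOOL_CALL_OPEN))
        if end == -1:
            # Unclosed tag: leave the remainder in the text portion.
            break
        payload = output[start + len(TOOL_CALL_OPEN):end]
        try:
            parsed = json.loads(payload)
        except json.JSONDecodeError:
            # No valid JSON: the entire content goes to the text portion.
            return {"content": output, "tool_calls": []}
        # Text before the tool_call block belongs to the text portion.
        text_parts.append(output[pos:start])
        # Assumption: the payload is either one call object or a list of calls.
        tool_calls.extend(parsed if isinstance(parsed, list) else [parsed])
        pos = end + len(TOOL_CALL_CLOSE)
    # Text after the last tool_call block also belongs to the text portion.
    text_parts.append(output[pos:])
    return {"content": "".join(text_parts).strip(), "tool_calls": tool_calls}
```

Falling back to plain text on a JSON error keeps malformed calls visible to the client instead of silently dropping them.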
Fixes for phi-4 support
I made a mess while merging in Olivier's work, so it ended up merged into one commit in this branch. In this commit, I undo changes that wouldn't have been intended in this commit (ex. server.cpp
via https://huggingface.co/microsoft/Phi-4-mini-instruct/blob/main/added_tokens.json

{
  "<|/tool_call|>": 200026,
  "<|/tool|>": 200024,
  "<|assistant|>": 200019,
  "<|end|>": 200020,
  "<|system|>": 200022,
  "<|tag|>": 200028,
  "<|tool_call|>": 200025,
  "<|tool_response|>": 200027,
  "<|tool|>": 200023,
  "<|user|>": 200021
}

FWIW tool_response seems to be a role, via https://github.com/kinfey/Phi-3CookBook/blob/main/md/02.Application/07.FunctionCalling/Phi4/Multiagents/Phi_4_mini_multiagent.ipynb
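One quick way to sanity-check the "tool_response is a role" reading is to see which tokens in the list above have a matching closing form. The sketch below copies the mapping from added_tokens.json as quoted here; the helper name `closing_form` is hypothetical. It only inspects the token list and says nothing about how the model was actually trained.

```python
# Token ids as quoted from added_tokens.json in this thread.
ADDED_TOKENS = {
    "<|/tool_call|>": 200026,
    "<|/tool|>": 200024,
    "<|assistant|>": 200019,
    "<|end|>": 200020,
    "<|system|>": 200022,
    "<|tag|>": 200028,
    "<|tool_call|>": 200025,
    "<|tool_response|>": 200027,
    "<|tool|>": 200023,
    "<|user|>": 200021,
}


def closing_form(token: str) -> str:
    """Map an opening token such as '<|tool|>' to '<|/tool|>'."""
    return "<|/" + token[2:]


openers = [t for t in ADDED_TOKENS if not t.startswith("<|/")]
# Tokens that wrap a span and therefore ship with a closing counterpart.
paired = {t for t in openers if closing_form(t) in ADDED_TOKENS}
# Tokens without a closer, i.e. role-like or standalone markers.
unpaired = {t for t in openers if closing_form(t) not in ADDED_TOKENS}

print("paired:", sorted(paired))
print("unpaired:", sorted(unpaired))
```

Only `<|tool|>` and `<|tool_call|>` come with closers; `<|tool_response|>` sits with `<|user|>`, `<|assistant|>`, and `<|system|>` in the unpaired group.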
Phi-4 tool calling's docs include the HuggingFace pages for Phi-4[^1] and Jupyter notebooks at [^2]. [^1] only has <|tool_call|>, and the notebook at [^3] is the only resource I've found that demonstrates tool responses. It looks like it's used as a sort of role -- the notebook doesn't show it explicitly, but the message format it uses defines tool_response as a role in a structure with text content, identical to the user/assistant messages in the same notebook. Given that, and since it also explains another small mystery to me (why isn't <|/tool_response|> in the reserved tokens?), I'll apply that here.

[^1] https://huggingface.co/microsoft/Phi-4-mini-instruct
[^2] https://github.com/microsoft/PhiCookBook/tree/main/md/02.Application/07.FunctionCalling/Phi4
[^3] https://github.com/microsoft/PhiCookBook/blob/main/md/02.Application/07.FunctionCalling/Phi4/Multiagents/Phi_4_mini_multiagent.ipynb
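To make the "tool_response as a role" idea concrete, here is a rough Python sketch of how a conversation might be rendered. It is not the official template: the `<|role|>content<|end|>` layout, the placement of tools inside `<|tool|>...<|/tool|>` in the system message, and the function name `render_phi4_prompt` are all assumptions pieced together from the token list and the commit notes in this thread.

```python
import json


def render_phi4_prompt(messages, tools=None, add_generation_prompt=True):
    """Render messages into a Phi-4-style prompt string (hypothetical layout)."""
    msgs = [dict(m) for m in messages]
    # Add a system message if tools are given and none exists yet.
    if tools and (not msgs or msgs[0]["role"] != "system"):
        msgs.insert(0, {"role": "system", "content": ""})
    parts = []
    for i, m in enumerate(msgs):
        body = m["content"]
        if tools and i == 0:
            # Assumption: tools are serialized as JSON inside the system message.
            body += "<|tool|>" + json.dumps(tools) + "<|/tool|>"
        # Every role, including tool_response, opens with <|role|> and is
        # closed by <|end|>; no <|/tool_response|> token is needed.
        parts.append(f"<|{m['role']}|>{body}<|end|>")
    if add_generation_prompt:
        parts.append("<|assistant|>")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant",
     "content": '<|tool_call|>[{"name": "add", "arguments": {"a": 2, "b": 2}}]<|/tool_call|>'},
    {"role": "tool_response", "content": "4"},
]
print(render_phi4_prompt(conversation, tools=[{"name": "add"}]))
```

Treating tool_response like any other role keeps the renderer uniform, which matches the observation that the token list has no closing token for it.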
(seems it added reviewers automatically; can't figure out how to remove)

re: template / params_init

That'll work great, it makes sense to be picky / correct about that on the client end, and generically, use the built-in template as provided by the GGUF. It's a good mental model too, I've been struggling with a Gemma 3 GGUF template issue re: absolutely requiring alternating user/system only messages.

re: code

I'm afraid I made a bit of a mess while merging, I think it all came out right. I added 3 more commits on top of that:
for posterity, re: tool responses: Phi-4 tool calling's docs include the HuggingFace pages for Phi-4[^1] and Jupyter notebooks at [^2]. [^1] only has <|tool_call|>, and the notebook at [^3] is the only resource I've found that demonstrates tool responses. It looks like it's used as a sort of role -- the notebook doesn't show it explicitly, but the message format it uses defines tool_response as a role in a structure with text content, identical to the user/assistant messages in the same notebook. Given that, and since it also explains another small mystery to me (why isn't <|/tool_response|> in the reserved tokens?), I'll apply that here.

[^1] https://huggingface.co/microsoft/Phi-4-mini-instruct
Make sure to read the contributing guidelines before submitting a PR
(cc @ochafik)