`tool-call`: Phi-4 support #12288

jpohhhh · 2025-03-09T17:52:45Z

- Add system message if needed (per template requirement)
- Add tools to system message (req'd by template)
- Parse output:
  - add tool calls to the response when there is valid JSON between <|tool_call|> and <|/tool_call|>
  - content outside of tool_call tags is added to the text portion of the response
  - if there is no valid JSON, the entire content is added to the text portion of the response
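A minimal Python sketch of the parsing rules listed above, for illustration only (the actual change lives in common/chat.cpp and is C++). The function name `parse_phi4_output` and the returned dict shape are hypothetical. Two assumptions are made here: a single invalid JSON payload sends the entire output to the text portion, and a payload may be either one call object or a list of calls.

```python
import json
import re

# Tag spelling follows the model's added tokens: <|tool_call|> ... <|/tool_call|>
TOOL_CALL_RE = re.compile(r"<\|tool_call\|>(.*?)<\|/tool_call\|>", re.DOTALL)


def parse_phi4_output(raw: str) -> dict:
    """Split raw model output into plain text and parsed tool calls."""
    tool_calls = []
    text_parts = []
    cursor = 0
    for match in TOOL_CALL_RE.finditer(raw):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            # Assumption: any invalid payload means the whole output is text.
            return {"content": raw, "tool_calls": []}
        # Keep the text that sits before this tool_call block.
        text_parts.append(raw[cursor:match.start()])
        cursor = match.end()
        # Assumption: accept a single call object or a list of calls.
        tool_calls.extend(payload if isinstance(payload, list) else [payload])
    # Keep whatever text follows the last tool_call block.
    text_parts.append(raw[cursor:])
    return {"content": "".join(text_parts).strip(), "tool_calls": tool_calls}
```

Output with no tags at all falls through the loop untouched and comes back as plain text with an empty `tool_calls` list.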


common/chat.cpp

ochafik · 2025-03-09T23:40:24Z

Hey @jpohhhh, thanks a lot for preparing this!

The official template from Microsoft is quite disappointing tbh, and while your changes work around some/most of its limitations, we might need a bit more / might be worth going full jinja (see below)

Show original template

{%- for message in messages -%}
  {%- if message['role'] == 'system' and 'tools' in message and message['tools'] is not none -%}
    {{- '<|' + message['role'] + '|>' + message['content'] + '<|tool|>' + message['tools'] + '<|/tool|>' + '<|end|>' -}}
  {%- else -%}
    {{- '<|' + message['role'] + '|>' + message['content'] + '<|end|>' -}}
  {%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
  {{- '<|assistant|>' -}}
{%- else -%}
  {{- eos_token -}}
{%- endif -%}

The "sins" of their template are:

- It expects tools from the system message (instead of as a global variable, as in most templates). Minja works around this w/ a polyfill that prints the JSON array of tools. Unfortunately that's w/o the expected <|tool|>...<|/tool|> wrapper; the "tools in message" behaviour should be handled in https://github.com/google/minja
- It does not print tool calls (this is worked around by Minja + the generic mode, but without the <|tool_call|> syntax)
- It prints tool call results (messages such as {"role": "tool", "name": "foo", "content": "42"}) as <|tool|>42<|end|>... which would probably conflict with the tool description wrapping mechanism above. Unclear what the proper way to inject tool results is (did you find any documentation btw?), but it surely involves <|tool_response|> (see Phi-4-mini-instruct's added tokens; possibly also <|tag|>). Note that Minja doesn't polyfill this but the generic handler does.
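To make the second and third points concrete, here is a rough pure-Python mirror of the original template's logic (no Minja polyfills, no generic handler). The function name `render_original` and the default `eos_token` value are assumptions made for the sketch; the branching follows the template shown above.

```python
def render_original(messages, add_generation_prompt=True, eos_token="<|endoftext|>"):
    """Mimic the official Phi-4-mini template; eos_token default is a placeholder."""
    out = []
    for m in messages:
        if m["role"] == "system" and m.get("tools") is not None:
            # Tools are only printed when attached to the system message.
            out.append("<|" + m["role"] + "|>" + m["content"]
                       + "<|tool|>" + m["tools"] + "<|/tool|>" + "<|end|>")
        else:
            # Every other message is printed as <|role|>content<|end|>;
            # any tool_calls field is silently ignored.
            out.append("<|" + m["role"] + "|>" + m["content"] + "<|end|>")
    out.append("<|assistant|>" if add_generation_prompt else eos_token)
    return "".join(out)


messages = [
    {"role": "system", "content": "Be brief.", "tools": '[{"name": "foo"}]'},
    {"role": "user", "content": "What is foo?"},
    {"role": "assistant", "content": "", "tool_calls": [{"name": "foo", "arguments": {}}]},
    {"role": "tool", "name": "foo", "content": "42"},
]
prompt = render_original(messages)
```

In the resulting prompt the assistant's tool call is gone, and the tool result appears as `<|tool|>42<|end|>`, reusing the same `<|tool|>` token that opens the tool description block.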

Despite these issues, I seem to be getting good outputs w/ the generic handling on master:

cmake -B build -DLLAMA_CURL=1 && cmake --build build --config Release -j -t llama-server

export LLAMA_SERVER_BIN_PATH=$PWD/build/bin/llama-server
export LLAMA_CACHE=${LLAMA_CACHE:-$HOME/Library/Caches/llama.cpp}

./scripts/tool_bench.py run --n 10 --temp -1 --temp 0 --temp 1 --test-calc-result --model "Phi 4 mini instruct Q4_K_M" --output phi4_master.jsonl --hf bartowski/microsoft_Phi-4-mini-instruct-GGUF
./scripts/tool_bench.py plot phi4_master.jsonl

This is just a smoke test / not a proper benchmark (trying to get BFCL running, see here), but I'm getting less success w/ your branch.

I sent you Telosnex#1 which fixes a couple of issues.

I think we should also remove much of the code in common_chat_params_init_phi_4 and heavily suggest users use a better template instead (if someone at Microsoft is reading, please update your template haha!). We could even throw an exception w/ instructions to use the "right" template when we detect a bad template.
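One way the "throw an exception when we detect a bad template" idea could look, sketched in Python rather than the C++ of common/chat.cpp. Everything here is hypothetical: the function name, the exception type, and especially the heuristic (a template counts as usable only if it mentions both `<|tool_call|>` and `<|tool_response|>`). The real detection logic, if adopted, might differ.

```python
class UnsupportedTemplateError(ValueError):
    """Raised when a chat template cannot express Phi-4 tool calls."""


# Assumed heuristic: a usable template must mention both markers.
REQUIRED_MARKERS = ("<|tool_call|>", "<|tool_response|>")


def check_phi4_template(template_source: str) -> None:
    """Raise with instructions if the template lacks the tool-call markers."""
    missing = [m for m in REQUIRED_MARKERS if m not in template_source]
    if missing:
        raise UnsupportedTemplateError(
            "Chat template does not handle " + ", ".join(missing)
            + "; please pass a template override that prints tool calls "
            "and tool responses."
        )
```

A plain substring check is cheap and easy to explain in the error message, at the cost of accepting templates that mention the markers without using them correctly.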

Show proposed template (still unclear how to provide `<|tool_response|>` and if `<|tag|>` should be involved)

{%- if messages[0]["role"] == "system" %}
    {%- set system_message = messages[0]["content"] %}
{% elif tools is defined -%}
    {%- set system_message = "You are a helpful assistant with access to tools." -%}
{% else %}
    {%- set system_message = "" -%}
{%- endif %}
{%- if tools is defined -%}
    {%- set system_message = system_message + '<|tool|>' + (tools | tojson) + '<|/tool|>' -%}
    {%- if '<|tool_call|>' not in system_message -%}
        {%- set system_message = system_message + "\nTo use a tool, respond in this format: <|tool_call|>{\"name\": \"foo\", \"arguments\": {\"a\": 1}}<|/tool_call|>" %}
    {%- endif %}
{%- endif %}
{%- if system_message is defined -%}
    {{- '<|system|>' + system_message + '<|end|>' -}}
{%- endif -%}
{%- for message in messages -%}
    {%- if message['role'] == 'tool' -%}
        {{- '<|tool_response|>' + (message['content'] | tojson) + '<|/tool_response|>' -}}
    {%- elif message['role'] != 'system' -%}
        {{- '<|' + message['role'] + '|>' -}}
        {%- if message.content -%}
            {{- message['content'] -}}
        {%- endif -%}  
        {%- for tool_call in message.tool_calls -%}
            {{- '<|tool_call|>' + (tool_call | tojson) + '<|/tool_call|>' -}}
        {%- endfor -%}
        {{- '<|end|>' -}}
    {%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
   {{- '<|assistant|>' -}}
{%- else -%}
   {{- eos_token -}}
{%- endif -%}


Fixes for phi-4 support

I made a mess while merging in Olivier's work, so it ended up merged into one commit in this branch. In this commit, I undo changes that wouldn't have been intended in this commit (ex. server.cpp

via https://huggingface.co/microsoft/Phi-4-mini-instruct/blob/main/added_tokens.json

{
  "<|/tool_call|>": 200026,
  "<|/tool|>": 200024,
  "<|assistant|>": 200019,
  "<|end|>": 200020,
  "<|system|>": 200022,
  "<|tag|>": 200028,
  "<|tool_call|>": 200025,
  "<|tool_response|>": 200027,
  "<|tool|>": 200023,
  "<|user|>": 200021
}

FWIW tool_response seems to be a role, via https://github.com/kinfey/Phi-3CookBook/blob/main/md/02.Application/07.FunctionCalling/Phi4/Multiagents/Phi_4_mini_multiagent.ipynb
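A small sketch of how the token list above can be used to catch misspelled tags such as `</|tool_call|>`. The token table is copied from the added_tokens.json quoted in this commit message; the helper name `find_unknown_tokens` and the regular expression are assumptions for illustration, not part of the PR.

```python
import re

# Copied from the added_tokens.json quoted above.
ADDED_TOKENS = {
    "<|/tool_call|>": 200026,
    "<|/tool|>": 200024,
    "<|assistant|>": 200019,
    "<|end|>": 200020,
    "<|system|>": 200022,
    "<|tag|>": 200028,
    "<|tool_call|>": 200025,
    "<|tool_response|>": 200027,
    "<|tool|>": 200023,
    "<|user|>": 200021,
}

# Loose pattern for tag-like strings: '<', one or two of '/' or '|',
# a lowercase name, an optional '|', then '>'.
TAG_RE = re.compile(r"<[/|]{1,2}[a-z_]+\|?>")


def find_unknown_tokens(text: str) -> list:
    """Return sorted tag-like strings containing '|' that are not real tokens."""
    candidates = {t for t in TAG_RE.findall(text) if "|" in t}
    return sorted(t for t in candidates if t not in ADDED_TOKENS)
```

Running this over a template or a parser's string literals flags variants like `<|/tool_call>` and `</|tool_call|>` while leaving `<|/tool_call|>` alone.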


jpohhhh · 2025-03-15T05:16:22Z

(seems it added reviewers automatically; can't figure out how to remove)

re: template / params_init
I see, lmk if anything sounds off here: I'm building a Flutter GUI downstream of llama.cpp via github.com/telosnex/fllama. I usually steal bits and pieces from examples to hack together my libraries. Looking at server.cpp, I see I could just pass a string to common_chat_templates_init (a la chat_templates = common_chat_templates_init(model, params_base.chat_template); in server.cpp) when I'm using a Phi-4 model.

That'll work great. It makes sense to be picky / correct about that on the client end and, in general, to use the built-in template as provided by the GGUF. It's a good mental model too; I've been struggling with a Gemma 3 GGUF template issue re: absolutely requiring alternating user/system only messages.

re: code
Thank you very much for your work.

I'm afraid I made a bit of a mess while merging, but I think it all came out right.

I added 3 more commits on top of that:

- Revert changes to server (it was debris from the way I accidentally flattened the commits in your PR)
- Token consistency: make sure tokens are consistent with the actual model tokens (cross-checked against the reserved tokens list in their HF repo; I had used <|/tool_call|> in some places and </|tool_call|> in others, but it's the first, <|/tool_call|>)
- Tool responses as role: docs on the tool calls are a bit sparse, and there isn't anything beyond the HF model card and Jupyter notebooks in a repo, but good news: a couple more tool call notebooks were published earlier this week, and they also demonstrate <|tool_response|>. It is treated like a role. This commit tweaks the jinja to do the same (<|tool_response|> coupled to <|end|>)
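As a rough illustration of the "tool response as role" idea in the last item, the Python sketch below renders a tool result the same way as any other role, opened by `<|tool_response|>` and closed by `<|end|>`. The helper name and the mapping from the OpenAI-style `tool` role to `tool_response` are assumptions; the actual change is in the Jinja template.

```python
def render_message(message: dict) -> str:
    """Render one chat message as <|role|>content<|end|>."""
    # Assumption: OpenAI-style "tool" messages map onto the tool_response role.
    role = "tool_response" if message["role"] == "tool" else message["role"]
    return "<|" + role + "|>" + message["content"] + "<|end|>"


conversation = [
    {"role": "user", "content": "What is foo?"},
    {"role": "tool", "name": "foo", "content": "42"},
]
rendered = "".join(render_message(m) for m in conversation)
```

Closing with `<|end|>` avoids relying on a `<|/tool_response|>` closing tag, which does not appear in the model's added tokens.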

jpohhhh · 2025-03-15T05:17:25Z

for posterity, re: tool responses:

Phi-4 tool calling's docs include the HuggingFace pages for Phi-4[^1^] and Jupyter notebooks at [^2^]. [^1^] only has <tool_call>, and the notebook at [^3^] is the only resource I've found that demonstrates tool responses.

It looks like it's used as a sort of role. The notebook doesn't show this 100% directly, but the message format it uses defines tool_response as a role in a structure with text content, identical to the user/assistant messages in the same notebook.

Given that, and since it also explains another small mystery to me (why isn't <|/tool_response|> in the reserved tokens?), I'll apply that here.

[^1^] https://huggingface.co/microsoft/Phi-4-mini-instruct
[^2^] https://github.com/microsoft/PhiCookBook/tree/main/md/02.Application/07.FunctionCalling/Phi4
[^3^] https://github.com/microsoft/PhiCookBook/blob/main/md/02.Application/07.FunctionCalling/Phi4/Multiagents/Phi_4_mini_multiagent.ipynb

github-actions bot added the testing Everything test related label Mar 9, 2025

ngxson requested a review from ochafik March 9, 2025 20:39


ochafik mentioned this pull request Mar 9, 2025

Fixes for phi-4 support Telosnex/llama.cpp#1

Merged

ochafik mentioned this pull request Mar 10, 2025

Misc. bug: Missing <think> tag in response (DeepSeek R1) #11861

Open

jpohhhh requested a review from ngxson as a code owner March 14, 2025 22:10

github-actions bot added documentation Improvements or additions to documentation examples server labels Mar 14, 2025

jpohhhh requested a review from JohannesGaessler as a code owner March 14, 2025 22:22

jpohhhh force-pushed the phi4_tools branch from d689d21 to 275df71 Compare March 14, 2025 22:32

jpohhhh added 3 commits March 14, 2025 23:11

Merge pull request #1 from ochafik/Telosnex_phi4_tools_template

eae5d97

Fixes for phi-4 support

Revert some bits

32d32ef


jpohhhh force-pushed the phi4_tools branch from 275df71 to 32d32ef Compare March 15, 2025 03:32

jpohhhh added 2 commits March 15, 2025 00:02
