
Revamp template capabilities + inject tools in system message when not supported by template #33

Merged (31 commits, Jan 29, 2025)

Conversation


@ochafik ochafik commented Jan 28, 2025

This only affects the chat-template.hpp experimental header (used to incubate ggml-org/llama.cpp#9639), not core minja.hpp.

  • Refreshed template capabilities detection code in C++ and Python, now kept in lockstep by tests (with some ground-truth expectations in the new test-capabilities.cpp)
  • More granular fallbacks: some templates don't inject tool definitions into the prompt but do support tool calls, e.g. DeepSeek R1
  • Fixed detection of parallel tool calls, etc.
  • Skip fewer tests from the Python goldens (more work needed to align fallback mechanisms 1:1)
  • More extensible API

All templates can now be used the same way, as if they supported every feature and required no template-specific adjustments from the user of Minja:

  struct chat_template_caps {
      bool supports_tools = false;
      bool supports_tool_calls = false;
      bool supports_tool_responses = false;
      bool supports_system_role = false;
      bool supports_parallel_tool_calls = false;
      bool supports_tool_call_id = false;
      // meta-llama/Llama-3.1-8B-Instruct expects arguments to be an object.
      // Most other templates (and OpenAI's API) expect the arguments object to be stringified.
      bool requires_object_arguments = false;
      // CohereForAI/c4ai-command-r-plus simple variant
      bool requires_non_null_content = false;
      // MiniMaxAI/MiniMax-Text-01 special
      bool requires_typed_content = false;
  };

This makes it possible to use deepseek-ai/DeepSeek-R1-Distill-Llama-8B with tools without having to pass a different system message than with most other models.

Possible follow-ups:

  • allow controlling which capabilities are emulated (e.g. to disable tools inlining)
  • pass a template to control how the tools are inlined, etc?
Capability matrix per template (columns: requires object arguments · requires typed content · supports parallel tool calls · supports system role · supports tool call id · supports tool calls · supports tool responses · supports tools):
mistralai-Mistral-Large-Instruct-2407 ⚠️
mistralai-Mistral-Nemo-Instruct-2407 ⚠️
NousResearch-Hermes-2-Pro-Mistral-7B-tool_use
NousResearch-Hermes-2-Pro-Llama-3-8B-tool_use
meetkai-functionary-medium-v3.2
NousResearch-Hermes-3-Llama-3.1-70B-tool_use
CohereForAI-c4ai-command-r-plus-tool_use ⚠️
Qwen-QwQ-32B-Preview ⚠️
Qwen-Qwen2.5-Math-7B-Instruct ⚠️
Qwen-Qwen2.5-7B-Instruct ⚠️
deepseek-ai-DeepSeek-R1-Distill-Qwen-7B
deepseek-ai-DeepSeek-R1-Distill-Qwen-32B
deepseek-ai-DeepSeek-R1-Distill-Llama-8B
deepseek-ai-DeepSeek-V2.5
meta-llama-Llama-3.2-3B-Instruct ⚠️
nvidia-Llama-3.1-Nemotron-70B-Instruct-HF ⚠️
meta-llama-Llama-3.3-70B-Instruct ⚠️
meta-llama-Meta-Llama-3.1-8B-Instruct ⚠️
meta-llama-Llama-3.1-8B-Instruct ⚠️
mistralai-Mistral-7B-Instruct-v0.2
databricks-dbrx-instruct
microsoft-Phi-3.5-vision-instruct
openchat-openchat-3.5-0106
bofenghuang-vigogne-2-70b-chat
microsoft-Phi-3-small-8k-instruct
mattshumer-Reflection-Llama-3.1-70B
teknium-OpenHermes-2.5-Mistral-7B
mistralai-Mixtral-8x7B-Instruct-v0.1
NousResearch-Hermes-2-Pro-Mistral-7B-default
Qwen-Qwen2-VL-7B-Instruct
CohereForAI-c4ai-command-r-plus-rag
mlabonne-AlphaMonarch-7B
microsoft-Phi-3-mini-4k-instruct
microsoft-Phi-3.5-mini-instruct
NousResearch-Hermes-3-Llama-3.1-70B-default
NousResearch-Hermes-2-Pro-Llama-3-8B-default
deepseek-ai-DeepSeek-Coder-V2-Instruct
CohereForAI-c4ai-command-r-plus-default
deepseek-ai-deepseek-coder-33b-instruct
mistralai-Mistral-Large-Instruct-2411
Qwen-Qwen2-7B-Instruct
indischepartij-MiniCPM-3B-OpenHermes-2.5-v2
abacusai-Fewshot-Metamath-OrcaVicuna-Mistral
TheBloke-FusionNet_34Bx2_MoE-AWQ
MiniMaxAI-MiniMax-Text-01 ⚠️
microsoft-Phi-3-medium-4k-instruct
google-gemma-7b-it
NexaAIDev-Octopus-v2
google-gemma-2-2b-it
OrionStarAI-Orion-14B-Chat

ochafik added a commit to ochafik/llama.cpp that referenced this pull request Jan 28, 2025
@ochafik ochafik merged commit f21e3e8 into main Jan 29, 2025
11 checks passed
@ochafik ochafik deleted the tools-system branch January 29, 2025 20:25
@ochafik ochafik restored the tools-system branch February 4, 2025 00:38