Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: Tesla T4, compute capability 7.5, VMM: yes
version: 4790 (438a839)
Operating systems
Linux
GGML backends
CUDA
Hardware
Tesla T4
Models
GGUF deepseek-r1:14b, downloaded from Ollama
Problem description & steps to reproduce
When llama-cpp-deepseek-r1.jinja is used as the chat template, the content streamed back is missing the opening <think> tag, while the closing </think> tag is still present. Removing the flag --chat-template-file /root/git/llama.cpp/models/templates/llama-cpp-deepseek-r1.jinja makes the problem go away.
First Bad Commit
No response
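
For reference, here is a minimal reproduction sketch against llama-server's OpenAI-compatible streaming endpoint. It assumes the server was started with the --chat-template-file flag above and is listening on the default http://127.0.0.1:8080; the prompt is an arbitrary example.

```python
# Reproduction sketch: accumulate the streamed delta content and check
# whether the opening <think> tag ever arrives.
import json
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "What is 2 + 2?"}],
        "stream": True,
    },
    stream=True,
)
resp.raise_for_status()

content = ""
for line in resp.iter_lines():
    # The streaming endpoint emits server-sent events: "data: {json}" lines,
    # terminated by "data: [DONE]".
    if not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    chunk = json.loads(payload)
    content += chunk["choices"][0].get("delta", {}).get("content") or ""

print("<think> present: ", "<think>" in content)   # False when the bug triggers
print("</think> present:", "</think>" in content)  # True
```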
Relevant log output

FYI, this is currently halfway between working as intended (by the DeepSeek R1 template authors: both the latest official DeepSeek template and our llama-cpp-deepseek-r1.jinja add a <think> tag to the end of the prompt, and QwQ has now followed suit, a very annoying new "convention") and something we're actively fixing (the non-streamed case should work already, and I'm working on the streaming case as part of the tool-call streaming support).
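
Until the streaming fix lands, one client-side workaround (a sketch only, not something llama.cpp provides) is to re-insert the opening tag when the accumulated streamed content contains </think> without a matching <think>:

```python
def restore_think_tag(content: str) -> str:
    # Workaround sketch: templates like llama-cpp-deepseek-r1.jinja append
    # <think> to the end of the prompt, so the model's streamed output begins
    # mid-reasoning. If a closing tag appears with no opening one, prepend it.
    if "</think>" in content and "<think>" not in content:
        return "<think>" + content
    return content
```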