Add chat template support #917
Conversation
Reviewer's Guide by Sourcery

This PR adds support for chat templates to `ramalama run`.

Sequence diagram for adding a new snapshot with chat templates

```mermaid
sequenceDiagram
    participant MS as ModelStore
    participant OR as OllamaRepository
    participant HF as HuggingfaceRepository
    participant GGUF as GGUFInfoParser
    MS->OR: new_snapshot(model_tag, snapshot_hash, snapshot_files)
    OR->MS: _prepare_new_snapshot(model_tag, snapshot_hash, snapshot_files)
    MS->MS: Creates ref file and snapshot directory
    OR->MS: _download_snapshot_files(snapshot_hash, snapshot_files)
    loop For each file in snapshot_files
        MS->OR: get_blob_file_path(file.hash)
        OR->OR: download(dest_path, snapshot_dir)
        OR->MS: get_snapshot_file_path(snapshot_hash, file.name)
        MS->MS: Create symlink
    end
    OR->MS: _ensure_chat_template(model_tag, snapshot_hash, snapshot_files)
    alt Chat template file specified
        MS->MS: get_blob_file_path(file.hash)
        MS->MS: Reads chat template file
        opt Is Go template
            MS->go2jinja: go_to_jinja(chat_template)
            go2jinja->MS: jinja_template
            MS->MS: update_snapshot(model_tag, snapshot_hash, files)
        end
    else No chat template file specified
        MS->MS: get_blob_file_path(model_file.hash)
        MS->GGUF: is_model_gguf(model_file_path)
        alt Is GGUF model
            MS->GGUF: parse(model_file_path)
            GGUF->MS: info
            MS->MS: info.get_chat_template()
            opt Is Go template
                MS->go2jinja: go_to_jinja(tmpl)
                go2jinja->MS: jinja_template
                MS->MS: update_snapshot(model_tag, snapshot_hash, files)
            end
        end
    end
```
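To make the flow concrete, here is a minimal, self-contained sketch of the decision logic shown in the diagram. It is an illustration of the approach, not the actual ModelStore code; the injected callables stand in for the ModelStore, GGUFInfoParser and go2jinja pieces.

```python
# Sketch of the chat-template resolution flow in the diagram above (illustrative only).
from dataclasses import dataclass
from enum import Enum


class SnapshotFileType(Enum):
    Model = 1
    ChatTemplate = 2
    Other = 3


@dataclass
class SnapshotFile:
    name: str
    hash: str
    type: SnapshotFileType


def resolve_chat_template(snapshot_files, read_blob, extract_gguf_template, go_to_jinja):
    """Return a Jinja chat template string, or None if none can be found.

    read_blob(hash) -> str, extract_gguf_template(hash) -> str | None and
    go_to_jinja(template) -> str are injected stand-ins for the ModelStore,
    GGUFInfoParser and go2jinja functionality in the diagram.
    """
    chat_file = next((f for f in snapshot_files if f.type == SnapshotFileType.ChatTemplate), None)
    if chat_file is not None:
        # Preference is given to an explicitly provided/downloaded chat template file.
        template = read_blob(chat_file.hash)
    else:
        # Otherwise, fall back to extracting the template from the GGUF model file.
        model_file = next((f for f in snapshot_files if f.type == SnapshotFileType.Model), None)
        template = extract_gguf_template(model_file.hash) if model_file else None

    if template and "{{ ." in template:  # crude Go-template heuristic, for this sketch only
        template = go_to_jinja(template)
    return template
```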
Sequence diagram for updating a snapshot with chat templates

```mermaid
sequenceDiagram
    participant MS as ModelStore
    MS->MS: update_snapshot(model_tag, snapshot_hash, new_snapshot_files)
    alt Directory setup does not exist
        MS->MS: return False
    else Ref file does not exist
        MS->MS: return False
    else
        MS->MS: get_ref_file(model_tag)
        MS->MS: Updates ref_file.filenames, model_name and chat_template_name
        MS->MS: Writes ref_file.serialize() to file
        MS->MS: _download_snapshot_files(snapshot_hash, new_snapshot_files)
        loop For each file in new_snapshot_files
            MS->MS: get_blob_file_path(file.hash)
            MS->MS: download(dest_path, snapshot_dir)
            MS->MS: get_snapshot_file_path(snapshot_hash, file.name)
            MS->MS: Create symlink
        end
        MS->MS: return True
    end
```
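Read as code, the update path above amounts to roughly the following; the store helpers used here are hypothetical names for illustration, not the real ModelStore API, and SnapshotFileType refers to the toy enum from the sketch above.

```python
# Illustrative sketch of update_snapshot() as described by the diagram above.
def update_snapshot(store, model_tag, snapshot_hash, new_snapshot_files) -> bool:
    if not store.directory_setup_exists():      # hypothetical helper
        return False
    if not store.ref_file_exists(model_tag):    # hypothetical helper
        return False

    ref_file = store.get_ref_file(model_tag)
    # Record the new files, including the model and chat template names.
    for f in new_snapshot_files:
        ref_file.filenames.append(f.name)
        if f.type == SnapshotFileType.Model:
            ref_file.model_name = f.name
        elif f.type == SnapshotFileType.ChatTemplate:
            ref_file.chat_template_name = f.name
    with open(ref_file.path, "w") as out:
        out.write(ref_file.serialize())

    store._download_snapshot_files(snapshot_hash, new_snapshot_files)
    return True
```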
Updated class diagram for RefFile

```mermaid
classDiagram
    class RefFile {
        - hash: str
        - filenames: list[str]
        - model_name: str
        - chat_template_name: str
        - _path: str
        + from_path(path: str) : RefFile
        + serialize() : str
        + path : str
    }
    note for RefFile "SEP, MODEL_SUFFIX and CHAT_TEMPLATE_SUFFIX are class constants"
```
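As a rough Python rendering of the shape above: only the field and constant names come from the diagram, while the on-disk format (one entry per line plus suffix markers) is an assumption made here for illustration.

```python
# Toy RefFile matching the class diagram; the serialization format is assumed.
class RefFile:
    SEP = "\n"
    MODEL_SUFFIX = "*model"
    CHAT_TEMPLATE_SUFFIX = "*chat-template"

    def __init__(self, path: str = ""):
        self.hash: str = ""
        self.filenames: list[str] = []
        self.model_name: str = ""
        self.chat_template_name: str = ""
        self._path = path

    @property
    def path(self) -> str:
        return self._path

    def serialize(self) -> str:
        entries = [self.hash, *self.filenames]
        entries.append(self.model_name + RefFile.MODEL_SUFFIX)
        entries.append(self.chat_template_name + RefFile.CHAT_TEMPLATE_SUFFIX)
        return RefFile.SEP.join(entries)

    @staticmethod
    def from_path(path: str) -> "RefFile":
        ref = RefFile(path)
        with open(path) as f:
            entries = f.read().split(RefFile.SEP)
        ref.hash = entries[0]
        for entry in entries[1:]:
            if entry.endswith(RefFile.MODEL_SUFFIX):
                ref.model_name = entry.removesuffix(RefFile.MODEL_SUFFIX)
            elif entry.endswith(RefFile.CHAT_TEMPLATE_SUFFIX):
                ref.chat_template_name = entry.removesuffix(RefFile.CHAT_TEMPLATE_SUFFIX)
            else:
                ref.filenames.append(entry)
        return ref
```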
Class diagram for GGUFModelInfo

```mermaid
classDiagram
    class GGUFModelInfo {
        - Metadata: dict
        - Tensors: list
        - LittleEndian: bool
        + get_chat_template() : str
        + serialize(json: bool, all: bool) : str
    }
```
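For context, the interesting part is get_chat_template(); a minimal sketch, assuming the template sits under the conventional tokenizer.chat_template GGUF metadata key (the real class is in ramalama/model_inspect.py and may differ):

```python
# Minimal GGUFModelInfo sketch; assumes the chat template is stored under the
# conventional "tokenizer.chat_template" GGUF metadata key.
class GGUFModelInfo:
    def __init__(self, metadata: dict, tensors: list, little_endian: bool):
        self.Metadata = metadata
        self.Tensors = tensors
        self.LittleEndian = little_endian

    def get_chat_template(self) -> str:
        return self.Metadata.get("tokenizer.chat_template", "")
```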
@ericcurtin Could we rebuild a new version of the ramalama container image?

@rhatdan has the build infrastructure set up to do it, so it would be less effort to wait for him for a new release. But if you build a container image locally, you can just pass that to RamaLama for development purposes.
Force-pushed from 66ad4bf to d756093
```python
if file.type == SnapshotFileType.ChatTemplate:
    return
if file.type == SnapshotFileType.Model:
    model_file = file
```
Should we break here? Can there be multiple SnapshotFileType.Model files? If yes, then we only see the last one.
The ModelStore would allow multiple models at the moment, and huggingface allows that as well (e.g. mradermacher/SmolLM-135M-GGUF). However, this would be invalid as input for ramalama. When the refs file is serialized, the current approach is to use the last seen model file - the same applies to the chat template file. So I think it's probably worth adding some kind of validation for this (i.e. only one model and one chat template in the list of files) and raising an Exception. WDYT?
Yes I think we should throw an error, since RamaLama has no way to know if it chose the right one.
Added a validation function, incl. a few unit test cases.
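For reference, the kind of validation being discussed could look like the following minimal sketch; the actual function and error messages added in the PR may differ (SnapshotFileType is the enum introduced by this PR).

```python
# Illustrative validation: reject snapshots that contain more than one model
# file or more than one chat template file.
def validate_snapshot_files(snapshot_files):
    model_files = [f for f in snapshot_files if f.type == SnapshotFileType.Model]
    chat_template_files = [f for f in snapshot_files if f.type == SnapshotFileType.ChatTemplate]
    if len(model_files) > 1:
        raise ValueError(f"Only one model file per snapshot is supported, got {len(model_files)}")
    if len(chat_template_files) > 1:
        raise ValueError(f"Only one chat template file per snapshot is supported, got {len(chat_template_files)}")
```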
We will do a release on Monday or Sunday.
So, when I run the following and access the OpenAI endpoint, I still get jinja errors on tool calls, as this is foundational to jinja but not yet jinja, right?

```shell
$ gh pr checkout 917
$ python3 -m venv .venv && source .venv/bin/activate && pip install -e . && python bin/ramalama serve qwen2.5:3b
```
P.S. If I make this change:

```diff
--- a/ramalama/model.py
+++ b/ramalama/model.py
@@ -586,6 +586,7 @@ class Model(ModelBase):
         else:
             exec_args = [
                 "llama-server",
+                "--jinja",
                 "--port",
                 args.port,
                 "-m",
```

ggml-org/llama.cpp#12279 has details on failures in general with hf qwen2.5, and the error of semantic-kernel dotnet, which also applies here.
Force-pushed from d756093 to c1474e7
@codefromthecrypt By using

```shell
$ ramalama inspect qwen2.5:3b | grep chat_template
```

you get basically the jinja template required by that model. This PR is part of enabling RamaLama to detect and use the chat templates provided by platforms such as ollama, as well as extracting this information from .gguf models. However, this will take a bit longer. So I think it's best to use ramalama with your changes for the presentation.
Long-term we want llama-server to just default to jinja without any manual intervention and fall back to other techniques; this needs upstream llama.cpp work.
Thanks for the advice, folks! P.S. Please applaud @ochafik for the work upstream on llama.cpp! ggml-org/llama.cpp#12279
Force-pushed from 7f5f8c2 to b2d9aa3
Also added the feature to convert ollama's Go Templates to Jinja ones, so those should work as well. For example, when running:

```shell
# with chat template support
$ ramalama --use-model-store run --pull=never ollama://granite-code
🦭 > write a hello world program in python
print("hello world")
🦭 >

# without the chat template support
$ ramalama run --pull=never ollama://granite-code
🦭 > write a hello world program in python
terminate called after throwing an instance of 'std::runtime_error'
  what(): this custom template is not supported
```

I had to use `--use-model-store`. It downloads the go template from ollama and then converts it to jinja, passing the latter to llama-run.

```shell
$ cat chat_template
{{ if .Suffix }}<fim_prefix> {{ .Prompt }}<fim_suffix> {{ .Suffix }}<fim_middle>
{{- else if .Messages }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if eq .Role "user" }}Question:
{{ .Content }}
...

$ cat chat_template_converted
{% if suffix %}<fim_prefix> {{ prompt }}<fim_suffix> {{ suffix }}<fim_middle>
{%- elif messages %}
{%- for m in messages %}
{%- set last = ((messages)[loop.index0:])|length==1 %}
{%- if m["role"]=="user" %}Question:
{{ m["content"] }}
...
```

@rhatdan @ericcurtin PTAL, I think the PR is ready for review.
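As a quick sanity check (not part of the PR), a shortened, closed-off excerpt of the converted template can be rendered with jinja2 to confirm the prompt layout; the excerpt below is a stand-in, not the full converted granite-code template.

```python
# Render a simplified excerpt of the converted template with jinja2.
from jinja2 import Environment

excerpt = (
    "{% if suffix %}<fim_prefix> {{ prompt }}<fim_suffix> {{ suffix }}<fim_middle>"
    "{%- elif messages %}"
    "{%- for m in messages %}"
    '{%- if m["role"] == "user" %}Question:\n{{ m["content"] }}{% endif %}'
    "{%- endfor %}"
    "{% endif %}"
)

rendered = Environment().from_string(excerpt).render(
    messages=[{"role": "user", "content": "write a hello world program in python"}]
)
print(rendered)  # prints "Question:" followed by the user message on the next line
```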
Hey @engelmi - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding a method to `RefFile` to check if a file exists in the snapshot directory.
- The `go2jinja` dependency should be added to `install-requirements` in the Makefile.
Here's what I looked at during the review
- 🟡 General issues: 2 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Force-pushed from 36060fd to 7198cb8
Tested this branch on macOS, problem:
Did you use the current version? I pushed some updates recently since the CI was failing due to this.
LGTM, other than the issue of raising an error on multiple templates.
Force-pushed from 7198cb8 to 311a4d3
Just refreshed the branch, still occurring.
We have this problem again on macOS /usr/bin/python3. It would be nice to set up CI to check this. It's an easy fix every time: just populate a variable first. If we remember to only use variables in f-strings, it will stop happening.
This error is new to me: why does this cause an issue on macOS? And yes, it would be great to have the CI properly check for this on macOS - same for the install.sh.
I can't test this on my system or via the CI, so it's quite hard to fix this. Could you help me here? @ericcurtin
Edit: Removed the directory and put the copied
Signed-off-by: Michael Engel <[email protected]>
Signed-off-by: Michael Engel <[email protected]>
Signed-off-by: Michael Engel <[email protected]>
Usually, the chat templates for gguf models are written as jinja templates. Ollama, however, uses Go Templates specific to ollama. In order to use the proper templates for models pulled from ollama, the chat templates are converted to jinja ones and passed to llama-run. Signed-off-by: Michael Engel <[email protected]>
Signed-off-by: Michael Engel <[email protected]>
Older versions of python3 don't have PEP 701.
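For context, a tiny example of the pattern: before Python 3.12 (PEP 701), an f-string cannot reuse its own quote character inside the embedded expression, so populating a variable first avoids the SyntaxError.

```python
data = {"chat_template": "..."}

# SyntaxError on Python < 3.12 (nested double quotes inside the f-string):
#   print(f"template: {data["chat_template"]}")

# Works on all versions: populate a variable first, then use it in the f-string.
chat_template = data["chat_template"]
print(f"template: {chat_template}")
```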
Force-pushed from 311a4d3 to b751eef
It's working now.
Pull Request Overview
This PR adds support for chat templates to "ramalama run" by enabling users to provide, download, or extract chat template files and automatically convert Go templates to Jinja templates when necessary. Key changes include:
- Introducing a new SnapshotFileType for chat templates and updating related file creation and validation functions.
- Adding a get_chat_template method to model inspection and updating container mount options to support chat templates.
- Incorporating go2jinja conversion support and extending test coverage for snapshot file validations.
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
| --- | --- |
| test/unit/test_model_store.py | Tests now include cases for chat template validation. |
| ramalama/model_inspect.py | Added get_chat_template method for chat template extraction support. |
| ramalama/url.py | Updated SnapshotFile instantiation with explicit file types. |
| ramalama/common.py | Added a constant for the chat template mount point. |
| ramalama/ollama.py | Updated SnapshotFile creation and path handling to include chat template files. |
| ramalama/model_store.py | Introduced SnapshotFileType enum, LocalSnapshotFile, and enhanced snapshot logic to handle chat templates including conversion. |
| ramalama/model.py | Updated command building for container mounts to include chat template file. |
| ramalama/huggingface.py | Integrated SnapshotFileType in file creation for consistency. |
Comments suppressed due to low confidence (1)
ramalama/model_store.py:407
- [nitpick] The inner variable 'file' shadows the outer loop variable from the 'for file in snapshot_files' construct, which may lead to confusion. Consider renaming it (e.g., to 'template_file').
```python
with open(chat_template_file_path, "r") as file:
```
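The suggested rename would look roughly like this (illustrative only; the surrounding code is not shown here):

```python
# Avoid shadowing the outer "for file in snapshot_files" loop variable.
with open(chat_template_file_path, "r") as template_file:
    chat_template = template_file.read()
```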
This PR enables the (automatic) use of the chat template file via `ramalama run` by passing the chat template file to `llama-run`. The chat template can either be provided/downloaded directly or extracted from the GGUF model and stored - preference is given to the provided chat template file.

TODO:
Summary by Sourcery
This PR adds support for chat templates to `ramalama run`. It enables the automatic use of chat template files by passing them to `llama-run`. The chat template can be provided directly, downloaded, or extracted from the GGUF model. It also includes a conversion tool to convert Go Templates (used by Ollama) to Jinja templates.

New Features: