
Temporary fix for chat template issue with multimodal inference w/ in-process vLLM engine #1486

Merged
nikg4 merged 7 commits into main from nikg4/vllm-infer-err-v2 on Feb 26, 2025

Conversation

nikg4
Collaborator

@nikg4 nikg4 commented Feb 26, 2025

Description

-- Use HF-supplied chat templates for VLMs in the vLLM in-process inference engine. This change will be reverted once we have a more general solution (which requires more effort).
-- Fixes the error for Llama and Qwen. Not effective for LLAVA (no chat template) or Phi3 (text-only template).
-- Add a new helper function get_hf_chat_template() (a hedged sketch is shown below).
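
For illustration, here is a minimal sketch of what a `get_hf_chat_template()` helper could look like. The signature, fallback order, and error handling are assumptions for illustration, not the code merged in this PR:

```python
# Hypothetical sketch of a get_hf_chat_template() helper, not the PR's exact code.
from typing import Optional

from transformers import AutoProcessor, AutoTokenizer


def get_hf_chat_template(model_name: str, *, trust_remote_code: bool = False) -> Optional[str]:
    """Return the HF-supplied chat template for `model_name`, or None if it has none."""
    # Multimodal models often attach the template to the processor,
    # so check there first, then fall back to the tokenizer.
    try:
        processor = AutoProcessor.from_pretrained(
            model_name, trust_remote_code=trust_remote_code
        )
        template = getattr(processor, "chat_template", None)
        if template:
            return template
    except Exception:
        pass  # Not every model ships a processor; fall through to the tokenizer.

    tokenizer = AutoTokenizer.from_pretrained(
        model_name, trust_remote_code=trust_remote_code
    )
    return getattr(tokenizer, "chat_template", None)
```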

Related issues

Towards OPE-1090

Before submitting

  • This PR only changes documentation. (You can ignore the following checks in that case)
  • Did you read the contributor guideline Pull Request guidelines?
  • Did you link the issue(s) related to this PR in the section above?
  • Did you add / update tests where needed?

Reviewers

At least one review from a member of oumi-ai/oumi-staff is required.

@nikg4 nikg4 requested a review from optas February 26, 2025 19:49
@nikg4 nikg4 changed the title [WIP] Temporary fix for chat template issue with VLM inference w/ in-process vLLM engine [WIP] Temporary fix for chat template issue with multimodal inference w/ in-process vLLM engine Feb 26, 2025
@nikg4 nikg4 requested review from taenin and oelachqar February 26, 2025 20:37
@nikg4 nikg4 marked this pull request as ready for review February 26, 2025 20:37
@nikg4 nikg4 changed the title [WIP] Temporary fix for chat template issue with multimodal inference w/ in-process vLLM engine Temporary fix for chat template issue with multimodal inference w/ in-process vLLM engine Feb 26, 2025
Collaborator

@taenin taenin left a comment


I just checked out your branch on a VM with an A100 and ran this command:

oumi infer -c oumi://configs/recipes/vision/llava_7b/inference/vllm_infer.yaml -i --image="https://oumi.ai/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fmatthew.3389a3a6.png&w=640&q=75"

I still get an error:

Enter your input prompt: What is this photo of?
INFO 02-26 20:52:24 chat_utils.py:332] Detected the chat template content format to be 'openai'. You can set `--chat-template-content-format` to override this.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/opt/conda/envs/oumi/bin/oumi", line 8, in <module>
[rank0]:     sys.exit(run())
[rank0]:              ^^^^^
[rank0]:   File "/home/matthew/.local/lib/python3.11/site-packages/oumi/cli/main.py", line 123, in run
[rank0]:     return app()
[rank0]:            ^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/typer/main.py", line 340, in __call__
[rank0]:     raise e
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/typer/main.py", line 323, in __call__
[rank0]:     return get_command(self)(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/click/core.py", line 1161, in __call__
[rank0]:     return self.main(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/typer/core.py", line 743, in main
[rank0]:     return _main(
[rank0]:            ^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/typer/core.py", line 198, in _main
[rank0]:     rv = self.invoke(ctx)
[rank0]:          ^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/click/core.py", line 1697, in invoke
[rank0]:     return _process_result(sub_ctx.command.invoke(sub_ctx))
[rank0]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/click/core.py", line 1443, in invoke
[rank0]:     return ctx.invoke(self.callback, **ctx.params)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/click/core.py", line 788, in invoke
[rank0]:     return __callback(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/typer/main.py", line 698, in wrapper
[rank0]:     return callback(**use_params)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/matthew/.local/lib/python3.11/site-packages/oumi/cli/infer.py", line 143, in infer
[rank0]:     return oumi_infer_interactive(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/matthew/.local/lib/python3.11/site-packages/oumi/__init__.py", line 135, in infer_interactive
[rank0]:     return oumi.infer.infer_interactive(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/matthew/.local/lib/python3.11/site-packages/oumi/infer.py", line 58, in infer_interactive
[rank0]:     model_response = infer(
[rank0]:                      ^^^^^^
[rank0]:   File "/home/matthew/.local/lib/python3.11/site-packages/oumi/infer.py", line 138, in infer
[rank0]:     generations = inference_engine.infer(
[rank0]:                   ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/matthew/.local/lib/python3.11/site-packages/oumi/core/inference/base_inference_engine.py", line 89, in infer
[rank0]:     return self.infer_online(input, inference_config)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/matthew/.local/lib/python3.11/site-packages/oumi/inference/vllm_inference_engine.py", line 366, in infer_online
[rank0]:     return self._infer(input, inference_config)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/matthew/.local/lib/python3.11/site-packages/oumi/inference/vllm_inference_engine.py", line 314, in _infer
[rank0]:     chat_responses = self._llm.chat(
[rank0]:                      ^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 725, in chat
[rank0]:     prompt_data = apply_hf_chat_template(
[rank0]:                   ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/vllm/entrypoints/chat_utils.py", line 978, in apply_hf_chat_template
[rank0]:     return tokenizer.apply_chat_template(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1687, in apply_chat_template
[rank0]:     rendered_chat = compiled_template.render(
[rank0]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/jinja2/environment.py", line 1295, in render
[rank0]:     self.environment.handle_exception()
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/jinja2/environment.py", line 942, in handle_exception
[rank0]:     raise rewrite_traceback_stack(source=source)
[rank0]:   File "<template>", line 1, in top-level template code
[rank0]: jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content'

@nikg4
Collaborator Author

nikg4 commented Feb 26, 2025

> I just checked out your branch on a VM with an A100 and ran this command: oumi infer -c oumi://configs/recipes/vision/llava_7b/inference/vllm_infer.yaml -i --image="…"
> I still get an error: jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content'

LLAVA has no chat template published on HF, so the change doesn't fix the problem for LLAVA. It should work for Qwen and Llama (a minimal usage sketch is below).

This isn't perfect, but it's better than the current state, which is completely broken.
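
For context, a minimal sketch of how an HF-supplied chat template can be passed to vLLM's in-process engine. The model name, image URL, and message shape are illustrative assumptions, not the PR's exact wiring:

```python
# Illustrative sketch only; the model name, image URL, and message content are
# assumptions, not values from this PR.
from transformers import AutoProcessor
from vllm import LLM, SamplingParams

model_name = "Qwen/Qwen2-VL-7B-Instruct"  # assumed example of a VLM that publishes a chat template
llm = LLM(model=model_name)

# Fetch the HF-supplied chat template (None if the model doesn't publish one);
# this is roughly what the get_hf_chat_template() helper above encapsulates.
chat_template = getattr(AutoProcessor.from_pretrained(model_name), "chat_template", None)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
            {"type": "text", "text": "What is this photo of?"},
        ],
    }
]

outputs = llm.chat(
    messages,
    SamplingParams(max_tokens=128),
    # When chat_template is None, vLLM falls back to the tokenizer's own
    # template, which is exactly what fails for models that don't ship one.
    chat_template=chat_template,
)
print(outputs[0].outputs[0].text)
```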

@taenin
Collaborator

taenin commented Feb 26, 2025

> /configs/recipes/vision/llava_7b/inference/vllm_infer.yaml

Got it. Should we remove /configs/recipes/vision/llava_7b/inference/vllm_infer.yaml for now as well then?

@nikg4
Collaborator Author

nikg4 commented Feb 26, 2025

> Got it. Should we remove /configs/recipes/vision/llava_7b/inference/vllm_infer.yaml for now as well then?

It's referenced in various docs, so it'd be a lot of busy work to delete it and then re-add it. Let me do a proper fix later this week (hopefully), once I'm done with GRPO.

@nikg4 nikg4 merged commit 7a3913c into main Feb 26, 2025
2 checks passed
@nikg4 nikg4 deleted the nikg4/vllm-infer-err-v2 branch February 26, 2025 21:15