
Temporary fix for chat template issue with multimodal inference w/ in-process vLLM engine #1486

Merged
nikg4 merged 7 commits into main from nikg4/vllm-infer-err-v2 on Feb 26, 2025

Conversation

nikg4
Collaborator

@nikg4 nikg4 commented Feb 26, 2025

Description

-- Use HF-supplied chat templates for VLMs in the vLLM in-process inference engine. This change will be reverted once we have a more general solution (which requires more effort).
-- Fixes the error for Llama and Qwen. Not effective for LLAVA (no chat template) or Phi3 (text-only template).
-- Add a new helper function get_hf_chat_template() (a hedged sketch is shown below).
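
For illustration, here is a minimal sketch of what a `get_hf_chat_template()` helper could look like. The signature, fallback order, and error handling are assumptions for illustration, not the code merged in this PR:

```python
# Hypothetical sketch of a get_hf_chat_template() helper, not the PR's exact code.
from typing import Optional

from transformers import AutoProcessor, AutoTokenizer


def get_hf_chat_template(model_name: str, *, trust_remote_code: bool = False) -> Optional[str]:
    """Return the HF-supplied chat template for `model_name`, or None if it has none."""
    # Multimodal models often attach the template to the processor,
    # so check there first, then fall back to the tokenizer.
    try:
        processor = AutoProcessor.from_pretrained(
            model_name, trust_remote_code=trust_remote_code
        )
        template = getattr(processor, "chat_template", None)
        if template:
            return template
    except Exception:
        pass  # Not every model ships a processor; fall through to the tokenizer.

    tokenizer = AutoTokenizer.from_pretrained(
        model_name, trust_remote_code=trust_remote_code
    )
    return getattr(tokenizer, "chat_template", None)
```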

Related issues

Towards OPE-1090

Before submitting

  • This PR only changes documentation. (You can ignore the following checks in that case)
  • Did you read the contributor guideline Pull Request guidelines?
  • Did you link the issue(s) related to this PR in the section above?
  • Did you add / update tests where needed?

Reviewers

At least one review from a member of oumi-ai/oumi-staff is required.

@nikg4 nikg4 requested a review from optas February 26, 2025 19:49
@nikg4 nikg4 changed the title [WIP] Temporary fix for chat template issue with VLM inference w/ in-process vLLM engine [WIP] Temporary fix for chat template issue with multimodal inference w/ in-process vLLM engine Feb 26, 2025
@nikg4 nikg4 requested review from taenin and oelachqar February 26, 2025 20:37
@nikg4 nikg4 marked this pull request as ready for review February 26, 2025 20:37
@nikg4 nikg4 changed the title [WIP] Temporary fix for chat template issue with multimodal inference w/ in-process vLLM engine Temporary fix for chat template issue with multimodal inference w/ in-process vLLM engine Feb 26, 2025
Collaborator

@taenin taenin left a comment


I just checked out your branch on a VM with an A100 and ran this command:

oumi infer -c oumi://configs/recipes/vision/llava_7b/inference/vllm_infer.yaml -i --image="https://oumi.ai/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fmatthew.3389a3a6.png&w=640&q=75"

I still get an error:

Enter your input prompt: What is this photo of?
INFO 02-26 20:52:24 chat_utils.py:332] Detected the chat template content format to be 'openai'. You can set `--chat-template-content-format` to override this.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/opt/conda/envs/oumi/bin/oumi", line 8, in <module>
[rank0]:     sys.exit(run())
[rank0]:              ^^^^^
[rank0]:   File "/home/matthew/.local/lib/python3.11/site-packages/oumi/cli/main.py", line 123, in run
[rank0]:     return app()
[rank0]:            ^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/typer/main.py", line 340, in __call__
[rank0]:     raise e
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/typer/main.py", line 323, in __call__
[rank0]:     return get_command(self)(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/click/core.py", line 1161, in __call__
[rank0]:     return self.main(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/typer/core.py", line 743, in main
[rank0]:     return _main(
[rank0]:            ^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/typer/core.py", line 198, in _main
[rank0]:     rv = self.invoke(ctx)
[rank0]:          ^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/click/core.py", line 1697, in invoke
[rank0]:     return _process_result(sub_ctx.command.invoke(sub_ctx))
[rank0]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/click/core.py", line 1443, in invoke
[rank0]:     return ctx.invoke(self.callback, **ctx.params)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/click/core.py", line 788, in invoke
[rank0]:     return __callback(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/typer/main.py", line 698, in wrapper
[rank0]:     return callback(**use_params)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/matthew/.local/lib/python3.11/site-packages/oumi/cli/infer.py", line 143, in infer
[rank0]:     return oumi_infer_interactive(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/matthew/.local/lib/python3.11/site-packages/oumi/__init__.py", line 135, in infer_interactive
[rank0]:     return oumi.infer.infer_interactive(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/matthew/.local/lib/python3.11/site-packages/oumi/infer.py", line 58, in infer_interactive
[rank0]:     model_response = infer(
[rank0]:                      ^^^^^^
[rank0]:   File "/home/matthew/.local/lib/python3.11/site-packages/oumi/infer.py", line 138, in infer
[rank0]:     generations = inference_engine.infer(
[rank0]:                   ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/matthew/.local/lib/python3.11/site-packages/oumi/core/inference/base_inference_engine.py", line 89, in infer
[rank0]:     return self.infer_online(input, inference_config)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/matthew/.local/lib/python3.11/site-packages/oumi/inference/vllm_inference_engine.py", line 366, in infer_online
[rank0]:     return self._infer(input, inference_config)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/matthew/.local/lib/python3.11/site-packages/oumi/inference/vllm_inference_engine.py", line 314, in _infer
[rank0]:     chat_responses = self._llm.chat(
[rank0]:                      ^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 725, in chat
[rank0]:     prompt_data = apply_hf_chat_template(
[rank0]:                   ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/vllm/entrypoints/chat_utils.py", line 978, in apply_hf_chat_template
[rank0]:     return tokenizer.apply_chat_template(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1687, in apply_chat_template
[rank0]:     rendered_chat = compiled_template.render(
[rank0]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/jinja2/environment.py", line 1295, in render
[rank0]:     self.environment.handle_exception()
[rank0]:   File "/opt/conda/envs/oumi/lib/python3.11/site-packages/jinja2/environment.py", line 942, in handle_exception
[rank0]:     raise rewrite_traceback_stack(source=source)
[rank0]:   File "<template>", line 1, in top-level template code
[rank0]: jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content'

@nikg4
Collaborator Author

nikg4 commented Feb 26, 2025

> I just checked out your branch on a VM with an A100 and ran this command: oumi infer -c oumi://configs/recipes/vision/llava_7b/inference/vllm_infer.yaml -i --image="…"
> I still get an error: jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content'

LLAVA has no chat template published on HF, so the change doesn't fix the problem for LLAVA. It should work for Qwen and Llama (a minimal usage sketch is below).

This isn't perfect, but it's better than the current state, which is completely broken.
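
For context, a minimal sketch of how an HF-supplied chat template can be passed to vLLM's in-process engine. The model name, image URL, and message shape are illustrative assumptions, not the PR's exact wiring:

```python
# Illustrative sketch only; the model name, image URL, and message content are
# assumptions, not values from this PR.
from transformers import AutoProcessor
from vllm import LLM, SamplingParams

model_name = "Qwen/Qwen2-VL-7B-Instruct"  # assumed example of a VLM that publishes a chat template
llm = LLM(model=model_name)

# Fetch the HF-supplied chat template (None if the model doesn't publish one);
# this is roughly what the get_hf_chat_template() helper above encapsulates.
chat_template = getattr(AutoProcessor.from_pretrained(model_name), "chat_template", None)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
            {"type": "text", "text": "What is this photo of?"},
        ],
    }
]

outputs = llm.chat(
    messages,
    SamplingParams(max_tokens=128),
    # When chat_template is None, vLLM falls back to the tokenizer's own
    # template, which is exactly what fails for models that don't ship one.
    chat_template=chat_template,
)
print(outputs[0].outputs[0].text)
```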

@taenin
Collaborator

taenin commented Feb 26, 2025

> /configs/recipes/vision/llava_7b/inference/vllm_infer.yaml

Got it. Should we remove /configs/recipes/vision/llava_7b/inference/vllm_infer.yaml for now as well then?

@nikg4
Collaborator Author

nikg4 commented Feb 26, 2025

> Got it. Should we remove /configs/recipes/vision/llava_7b/inference/vllm_infer.yaml for now as well then?

It's referenced in various docs, so it'd be a lot of busy work to delete it and then re-add it. Let me do a proper fix later this week (hopefully), once I'm done with GRPO.

@nikg4 nikg4 merged commit 7a3913c into main Feb 26, 2025
2 checks passed
@nikg4 nikg4 deleted the nikg4/vllm-infer-err-v2 branch February 26, 2025 21:15