How to solve warning: The context length of the model is too short to hold the multi-modal embeddings in the worst case #738
Comments
I've met the same problem; the number of tokens reserved by the new version is larger than in the previous one.
Hi, thanks for your interest in the Qwen model! This warning appears during the vLLM profile_run. In the original code, we added +1 to the video's …
+1, same problem
This issue has been fixed in the latest version of vLLM. You can try updating it.
@vefalun If your issue has been resolved, please close the issue.
When I run vLLM based on the code example in the README on 8×A100 GPUs, the following warning occurs:

(VllmWorkerProcess pid=427033) WARNING 02-08 11:44:42 profiling.py:187] The context length (128000) of the model is too short to hold the multi-modal embeddings in the worst case (131072 tokens in total, out of which {'image': 16384, 'video': 114688} are reserved for multi-modal embeddings). This may cause certain multi-modal inputs to fail during inference, even when the input text is short. To avoid this, you should increase max_model_len, reduce max_num_seqs, and/or reduce mm_counts.

However, I couldn't find the `max_model_len`, `max_num_seqs`, and `mm_counts` settings in the `config.json` file. How should I adjust them to avoid this warning? Thank you very much!
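Note that these options are not part of the Hugging Face `config.json`; they are vLLM engine arguments, set when constructing the engine (or, for the OpenAI-compatible server, passed as flags such as `--max-model-len`, `--max-num-seqs`, and `--limit-mm-per-prompt` in recent vLLM versions). In particular, `limit_mm_per_prompt` is what controls the per-modality limits the warning calls `mm_counts`. Below is a minimal sketch using the offline `LLM` API after upgrading vLLM (`pip install -U vllm`); the model name and all numeric values are illustrative assumptions, not recommended settings.

```python
from vllm import LLM, SamplingParams

# Sketch only: adapt the model name and values to your own setup.
llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct",             # assumed; use the checkpoint from the README
    tensor_parallel_size=8,                        # e.g. 8x A100 as in the report above
    max_num_seqs=4,                                # fewer concurrent sequences -> smaller worst case
    limit_mm_per_prompt={"image": 4, "video": 0},  # cap multi-modal items per prompt; setting
                                                   # video=0 removes the large video reservation
                                                   # if you only send images
)

params = SamplingParams(temperature=0.01, max_tokens=512)
outputs = llm.generate(["Describe the image."], params)
print(outputs[0].outputs[0].text)
```

Lowering `max_num_seqs` and the per-prompt multi-modal limits shrinks the worst-case token budget the profiler reserves, which is usually enough to silence this warning without touching `max_model_len`.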