
Mistral 7B v0.3 Support - Resolves part of #2 #28

Open · wants to merge 1 commit into base: v0.1.0
Conversation

davidjpyu

As we discussed earlier, there were concurrent Mistral implementations on both sides. After comparing them, there is no meaningful difference in mistral.py, mistral_layer.py, or templates.py, so only auto_model.py is updated here; it has been tested and produces correct output.

Although everything works as expected on the default GPU, one thing to note: generate.py runs fine with DEVICE = "cuda:0" on line 20, but fails when another GPU is selected, e.g. changing that line to DEVICE = "cuda:1". The failure occurs at cache.py line 79:

```python
hidden_states = flashinfer.single_prefill_with_kv_cache(
```

with the message:

```
File "/home/zhuominc/anaconda3/envs/junpuy_test/lib/python3.10/site-packages/flashinfer/prefill.py", line 186, in single_prefill_with_kv_cache
  packed_custom_mask = packbits(
File "/home/zhuominc/anaconda3/envs/junpuy_test/lib/python3.10/site-packages/flashinfer/quantization.py", line 65, in packbits
  return _kernels.packbits(x, bitorder)
RuntimeError: PackBits failed with error code an illegal memory access was encountered
```

I'm not sure whether this is expected behavior or something we need to track down.
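For what it's worth, an "illegal memory access" that only appears on a non-default GPU is often a device-mismatch symptom: some tensor (e.g. the KV cache or the mask) is still allocated on cuda:0 while the kernel launches on cuda:1. One thing worth trying is wrapping the call site in `with torch.cuda.device(1):` so the kernel launches on the same device as the inputs. Below is a minimal, framework-free sketch of a consistency check that could be dropped in before the flashinfer call to confirm or rule this out; `assert_same_device` and `FakeTensor` are hypothetical names, and the only assumption is that inputs expose a `.device` attribute the way torch tensors do:

```python
def assert_same_device(*tensors):
    """Raise if the inputs do not all live on the same device.

    Works on anything with a `.device` attribute (torch tensors included).
    Returns the shared device string on success.
    """
    devices = {str(t.device) for t in tensors}
    if len(devices) > 1:
        raise RuntimeError(f"device mismatch: {sorted(devices)}")
    return devices.pop()


class FakeTensor:
    """Minimal stand-in for a tensor, used here only to demo the check."""
    def __init__(self, device):
        self.device = device


# Simulate the suspected failure mode: q/k moved to cuda:1, but one
# buffer accidentally left on the default GPU.
q = FakeTensor("cuda:1")
k = FakeTensor("cuda:1")
v = FakeTensor("cuda:0")  # stray tensor on cuda:0

try:
    assert_same_device(q, k, v)
except RuntimeError as e:
    print(e)  # reports which devices were mixed
```

If the check trips right before the `single_prefill_with_kv_cache` call, the fix is likely to move the offending buffer (or set the active device) rather than anything flashinfer-side; if everything is already on cuda:1, this hypothesis can be discarded.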
