Mistral 7B v0.3 Support - Resolves part of #2 #28
As we discussed earlier, Mistral was implemented concurrently on both sides. After comparing the two, there is no meaningful difference in mistral.py, mistral_layer.py, or templates.py, so only auto_model.py is updated here; the result has been tested and produces correct output.
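For context, the auto_model.py change amounts to routing the Mistral architecture to the existing Mistral classes. The snippet below is only a minimal sketch of that kind of registry-based dispatch; the names (`load_auto_model`, `_MODEL_REGISTRY`, the module paths) are assumptions for illustration, not this repository's actual API.

```python
# Minimal sketch of architecture-based dispatch in an auto_model-style loader.
# All names here are illustrative assumptions, not the repository's actual API.
import importlib
from transformers import AutoConfig

# Hypothetical mapping from HF architecture names to local implementations.
_MODEL_REGISTRY = {
    "LlamaForCausalLM": "models.llama.LlamaForCausalLM",
    "MistralForCausalLM": "models.mistral.MistralForCausalLM",  # added for Mistral 7B v0.3
}

def load_auto_model(model_name_or_path: str):
    """Pick the local model class based on the checkpoint's architecture field."""
    config = AutoConfig.from_pretrained(model_name_or_path)
    arch = config.architectures[0]
    if arch not in _MODEL_REGISTRY:
        raise ValueError(f"Unsupported architecture: {arch}")
    module_path, _, class_name = _MODEL_REGISTRY[arch].rpartition(".")
    model_cls = getattr(importlib.import_module(module_path), class_name)
    return model_cls.from_pretrained(model_name_or_path)
```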
Although everything works with the default setup, one thing to note: generate.py runs correctly with `DEVICE = "cuda:0"` on line 20, but fails when that line is changed to another GPU, e.g. `DEVICE = "cuda:1"`. The failure occurs in cache.py at line 79, `hidden_states = flashinfer.single_prefill_with_kv_cache(`, with the message:
```
  File "/home/zhuominc/anaconda3/envs/junpuy_test/lib/python3.10/site-packages/flashinfer/prefill.py", line 186, in single_prefill_with_kv_cache
    packed_custom_mask = packbits(
  File "/home/zhuominc/anaconda3/envs/junpuy_test/lib/python3.10/site-packages/flashinfer/quantization.py", line 65, in packbits
    return _kernels.packbits(x, bitorder)
RuntimeError: PackBits failed with error code an illegal memory access was encountered
```
I'm not sure whether this is expected or something we still need to figure out.
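One possible explanation, offered only as a guess: custom CUDA kernels such as flashinfer's `packbits` typically launch on the process's current CUDA device, so if the current device is still cuda:0 while the tensors live on cuda:1, an illegal memory access is a plausible outcome. A simple check would be to make the selected GPU the current device before any flashinfer call. The snippet below is a sketch along those lines with arbitrary tensor shapes, not a verified fix for this repository.

```python
# Sketch: check whether the cuda:1 crash is a current-device mismatch.
# Shapes and head counts are arbitrary; only the device handling matters here.
import torch
import flashinfer

DEVICE = "cuda:1"  # the GPU that currently triggers the crash

# Make DEVICE the current CUDA device so extension kernels launch on the
# same GPU that holds the tensors.
torch.cuda.set_device(torch.device(DEVICE))

q = torch.randn(128, 32, 128, dtype=torch.float16, device=DEVICE)
k = torch.randn(256, 32, 128, dtype=torch.float16, device=DEVICE)
v = torch.randn(256, 32, 128, dtype=torch.float16, device=DEVICE)

# If the failure is a device mismatch, this call should now succeed on cuda:1.
out = flashinfer.single_prefill_with_kv_cache(q, k, v, causal=True)
print(out.shape)
```

If this runs on cuda:1, wrapping the flashinfer calls in cache.py with `torch.cuda.set_device` (or a `with torch.cuda.device(DEVICE):` block) may be enough; if it still crashes, the problem likely lies elsewhere, e.g. in how the custom mask is built.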