Mistral 7B v0.3 Support - Resolves part of #2 #28
As we discussed earlier, Mistral was implemented concurrently on both sides. After comparing the two, there is no meaningful difference in mistral.py, mistral_layer.py, or templates.py, so only auto_model.py is updated here; the result has been tested and produces correct output.
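For context, the auto_model.py change amounts to routing the Mistral architecture to the existing Mistral classes. The snippet below is only a minimal sketch of that kind of registry-based dispatch; the names (`load_auto_model`, `_MODEL_REGISTRY`, the module paths) are assumptions for illustration, not this repository's actual API.

```python
# Minimal sketch of architecture-based dispatch in an auto_model-style loader.
# All names here are illustrative assumptions, not the repository's actual API.
import importlib
from transformers import AutoConfig

# Hypothetical mapping from HF architecture names to local implementations.
_MODEL_REGISTRY = {
    "LlamaForCausalLM": "models.llama.LlamaForCausalLM",
    "MistralForCausalLM": "models.mistral.MistralForCausalLM",  # added for Mistral 7B v0.3
}

def load_auto_model(model_name_or_path: str):
    """Pick the local model class based on the checkpoint's architecture field."""
    config = AutoConfig.from_pretrained(model_name_or_path)
    arch = config.architectures[0]
    if arch not in _MODEL_REGISTRY:
        raise ValueError(f"Unsupported architecture: {arch}")
    module_path, _, class_name = _MODEL_REGISTRY[arch].rpartition(".")
    model_cls = getattr(importlib.import_module(module_path), class_name)
    return model_cls.from_pretrained(model_name_or_path)
```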
Although everything works with the default setup, one thing to note: generate.py runs correctly with `DEVICE = "cuda:0"` on line 20, but fails when that line is changed to another GPU, e.g. `DEVICE = "cuda:1"`. The failure occurs in cache.py at line 79, `hidden_states = flashinfer.single_prefill_with_kv_cache(`, with the message:
```
  File "/home/zhuominc/anaconda3/envs/junpuy_test/lib/python3.10/site-packages/flashinfer/prefill.py", line 186, in single_prefill_with_kv_cache
    packed_custom_mask = packbits(
  File "/home/zhuominc/anaconda3/envs/junpuy_test/lib/python3.10/site-packages/flashinfer/quantization.py", line 65, in packbits
    return _kernels.packbits(x, bitorder)
RuntimeError: PackBits failed with error code an illegal memory access was encountered
```
I'm not sure whether this is expected or something we still need to figure out.
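One possible explanation, offered only as a guess: custom CUDA kernels such as flashinfer's `packbits` typically launch on the process's current CUDA device, so if the current device is still cuda:0 while the tensors live on cuda:1, an illegal memory access is a plausible outcome. A simple check would be to make the selected GPU the current device before any flashinfer call. The snippet below is a sketch along those lines with arbitrary tensor shapes, not a verified fix for this repository.

```python
# Sketch: check whether the cuda:1 crash is a current-device mismatch.
# Shapes and head counts are arbitrary; only the device handling matters here.
import torch
import flashinfer

DEVICE = "cuda:1"  # the GPU that currently triggers the crash

# Make DEVICE the current CUDA device so extension kernels launch on the
# same GPU that holds the tensors.
torch.cuda.set_device(torch.device(DEVICE))

q = torch.randn(128, 32, 128, dtype=torch.float16, device=DEVICE)
k = torch.randn(256, 32, 128, dtype=torch.float16, device=DEVICE)
v = torch.randn(256, 32, 128, dtype=torch.float16, device=DEVICE)

# If the failure is a device mismatch, this call should now succeed on cuda:1.
out = flashinfer.single_prefill_with_kv_cache(q, k, v, causal=True)
print(out.shape)
```

If this runs on cuda:1, wrapping the flashinfer calls in cache.py with `torch.cuda.set_device` (or a `with torch.cuda.device(DEVICE):` block) may be enough; if it still crashes, the problem likely lies elsewhere, e.g. in how the custom mask is built.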