
INT-FlashAttention: Enabling Flash Attention for INT8 Quantization

This repository is the official implementation of INT-FlashAttention.

About this repository

  • flash_atten_*.py contains the main Triton implementations of the different FlashAttention algorithms.
  • benchmark.py contains the performance benchmark for the algorithms above.
  • configs.py contains the configurations you may need to adjust for Triton autotuning (a minimal sketch follows this list).
  • csrc contains the CUDA implementation of our algorithm. Refer to the official FlashAttention repository for instructions on compiling it.
  • More details can be found in the folders.
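For reference, a Triton autotune configuration of the kind configs.py holds usually looks like the sketch below. The block sizes, stage counts, and the variable name autotune_configs are illustrative assumptions, not the repository's actual values.

```python
# Minimal sketch of a Triton autotune configuration list
# (assumed values, not the ones shipped in configs.py).
import triton

# Candidate launch configurations that @triton.autotune sweeps over;
# each entry fixes the tile sizes plus pipeline stages and warp count.
autotune_configs = [
    triton.Config({"BLOCK_M": 64,  "BLOCK_N": 64},  num_stages=3, num_warps=4),
    triton.Config({"BLOCK_M": 128, "BLOCK_N": 64},  num_stages=3, num_warps=8),
    triton.Config({"BLOCK_M": 128, "BLOCK_N": 128}, num_stages=4, num_warps=8),
]
```

A list like this is typically passed to the `@triton.autotune(configs=..., key=[...])` decorator on the attention kernel, so widening or narrowing it trades tuning time against the size of the search space.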
