This repository is the official implementation of INT-FlashAttention.
- `flash_atten_*.py` contains the main code for the different FlashAttention algorithms in their Triton implementations.
- `benchmark.py` contains the performance benchmarks for the algorithms above.
- `configs.py` contains the configs you may need to adjust for the Triton autotune; see the sketch below.
- `csrc` contains our algorithm implemented in CUDA. To compile it, refer to the official FlashAttention repository.
- More details can be found in the folders.
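
For orientation, Triton autotune configs are usually expressed as a list of `triton.Config` objects passed to the `@triton.autotune` decorator. The following is a minimal sketch of what the tuning space in `configs.py` might look like; the block sizes, `num_stages`/`num_warps` values, kernel name, and autotune `key` are illustrative assumptions, not the repository's actual settings.

```python
# Hypothetical sketch of a Triton autotune config space; values are
# illustrative assumptions, not the repository's actual settings.
import triton
import triton.language as tl

configs = [
    triton.Config({"BLOCK_M": bm, "BLOCK_N": bn}, num_stages=s, num_warps=w)
    for bm in [64, 128]
    for bn in [32, 64]
    for s in [3, 4]
    for w in [4, 8]
]

@triton.autotune(configs=configs, key=["N_CTX", "HEAD_DIM"])
@triton.jit
def _attn_fwd_kernel(Q, K, V, Out,
                     N_CTX: tl.constexpr, HEAD_DIM: tl.constexpr,
                     BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    # Kernel body omitted; autotune benchmarks every config and caches
    # the fastest one per distinct (N_CTX, HEAD_DIM) pair.
    pass
```

Narrowing the config list to shapes you actually run keeps the autotune warm-up cost low, which is typically why these values are exposed for adjustment.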