This folder contains implementations of efficient model architecture designs with improved accuracy, faster inference speed, and better resource utilization, making them especially suitable for edge devices, such as those used in autonomous vehicle applications.
DEST employs a GPU-friendly, simplified attention block design, reducing model size and computation by over 80% while improving both accuracy and speed, validated on depth estimation and semantic segmentation tasks. For more details about the method, check out our spotlighted paper published at the 2022 CVPR Workshop on Transformers for Vision.
Convolutional Self-Attention uniquely identifies one-to-many feature relationships using only convolutions and simple tensor manipulations, enabling seamless operation in TensorRT’s restricted mode and making it ideal for safety-critical autonomous vehicle applications. Please refer to our blog post for more details.
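One way to picture "one-to-many relationships from convolutions and simple tensor manipulations" is neighborhood attention built from spatial shifts (slicing plus zero-padding) and elementwise products, with no matmul or gather ops. The sketch below is illustrative only and is not the repository's implementation; the function names, the shift-based formulation, and the softmax normalization are all assumptions.

```python
import numpy as np

def shift(x, dy, dx):
    """Shift an (H, W, C) feature map spatially, zero-padding the border.
    Slicing + padding stands in for the 'simple tensor manipulations'
    mentioned above (illustrative only)."""
    H, W, _ = x.shape
    out = np.zeros_like(x)
    ys = slice(max(dy, 0), H + min(dy, 0))
    xs = slice(max(dx, 0), W + min(dx, 0))
    yd = slice(max(-dy, 0), H + min(-dy, 0))
    xd = slice(max(-dx, 0), W + min(-dx, 0))
    out[yd, xd] = x[ys, xs]
    return out

def conv_self_attention(q, k, v, radius=1):
    """Hypothetical neighborhood attention: score each spatial offset by a
    channel-summed elementwise product of q and the shifted k, normalize the
    scores across offsets, and take the matching weighted sum of shifted v."""
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
                        for dx in range(-radius, radius + 1)]
    # (num_offsets, H, W) scores via elementwise multiply + channel reduction
    scores = np.stack([(q * shift(k, dy, dx)).sum(-1) for dy, dx in offsets])
    scores = np.exp(scores - scores.max(0))   # numerically stable softmax
    weights = scores / scores.sum(0)
    return sum(weights[i][..., None] * shift(v, dy, dx)
               for i, (dy, dx) in enumerate(offsets))
```

In a real network the queries, keys, and values would come from 1x1 convolutions and the shifts could be folded into fixed depthwise convolutions, keeping the whole block within a restricted operator set.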
ReduceFormer simplifies transformer architectures for vision tasks by using reduction and element-wise multiplication, enhancing inference performance and making it ideal for edge devices and high-throughput cloud computing. For more details about ReduceFormer, please refer to our spotlighted paper published at the 2024 CVPR Workshop on Transformers for Vision.
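To make "reduction and element-wise multiplication" concrete, here is a minimal linear-attention-style sketch in which the quadratic query-key similarity matrix is replaced by a single global reduction over tokens followed by an elementwise product. This is an assumption about the general flavor of the technique, not ReduceFormer's actual block; all names below are illustrative.

```python
import numpy as np

def reduction_attention(q, k, v):
    """Hypothetical attention-like mixing from reductions and elementwise
    products only. q, k, v: (N, C) token features. Instead of the N x N
    score matrix of standard attention, k gates v, a sum over all tokens
    collapses the sequence to one (C,) context vector, and q modulates
    that context elementwise."""
    k = np.exp(k - k.max(axis=0))       # positive gating over tokens
    context = (k * v).sum(axis=0)       # (C,) global reduction
    norm = k.sum(axis=0)                # (C,) normalizer
    return q * (context / norm)         # broadcast elementwise multiply
```

Because the reduction is computed once and reused for every query token, the cost is linear in the number of tokens, which is what makes this style of block attractive for high-resolution inputs on constrained hardware.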
Swin-Free uses size-varying windows across stages, instead of shifting windows, to achieve cross-connection among local windows. With this simple design change, Swin-Free runs faster than the Swin Transformer at inference while achieving better accuracy. For details, please refer to Swin-Free: Achieving Better Cross-Window Attention and Efficiency with Size-varying Window.
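The size-varying idea can be sketched with a standard non-overlapping window partition whose window size changes from stage to stage: tokens that fall in separate windows at one size share a window at a larger size, giving cross-window connection without the shift operation. The particular sizes below are illustrative, not taken from the paper.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows.
    Returns (num_windows, ws*ws, C); H and W must be divisible by ws."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

# Instead of shifting a fixed-size window between consecutive blocks,
# vary the window size across stages (hypothetical sizes for an 8x8 map):
feat = np.arange(8 * 8 * 4, dtype=np.float32).reshape(8, 8, 4)
for ws in (2, 4, 8):
    windows = window_partition(feat, ws)
    print(ws, windows.shape)
```

Dropping the shift also removes the cyclic roll and attention-masking bookkeeping at inference time, which is one intuition for why this variant can run faster.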