I. Introduction
Vision Transformers (ViTs) have proven highly effective across a range of computer vision tasks, including image classification, segmentation, and object detection. ViT [1] was the first work to apply the transformer encoder to image classification, achieving strong accuracy. However, as ViT models grow larger, they become increasingly difficult to deploy on edge and mobile devices, where both accuracy and efficiency are crucial. High-efficiency on-device inference can be pursued along several dimensions, such as reducing computational cost, model size, and memory footprint. DeiT [2] and Swin [3] have made ViTs more efficient, but their heavy computation and memory-access requirements still hinder deployment in edge applications. Lightweight and efficient ViT models have therefore become a recent research trend.