
BTPA: Hardware-Software Co-Design for Bitwise Based Transformer with Parallelized Accelerator


Abstract:

In recent years, the Transformer algorithm has achieved outstanding results in many areas. However, the Transformer relies on the self-attention mechanism, which requires a large amount of computation with quadratic complexity and substantial memory resources, hindering its adoption on edge devices. Most existing studies reduce algorithmic complexity by selecting a subset of elements, rather than all elements, to participate in the attention calculation, but they do not explicitly consider the efficiency of deploying their methods on edge devices, where floating-point matrix multiplication is costly. This paper proposes a software-hardware co-designed self-attention module that adopts bitwise operations to replace traditional floating-point matrix multiplication while retaining the attention mechanism's ability to capture complex long-range information dependencies, reducing both algorithmic complexity and memory consumption. Meanwhile, we design a dedicated acceleration operator on a Xilinx ZCU104 FPGA. Experimental results show that the proposed operator achieves a speedup of more than 1300× over the traditional self-attention operator, with only 0.8% performance loss on the CIFAR image classification task.
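The abstract does not give the paper's exact bitwise formulation, but the substitution it describes, replacing floating-point dot products with bitwise arithmetic, is commonly realized via XOR/popcount on sign-binarized vectors. The following is a generic illustrative sketch of that idea (all function names are hypothetical), not the authors' implementation:

```python
import numpy as np

def binarize(x):
    # Encode the sign of each element as a bit: 1 for x >= 0, 0 for x < 0.
    return (x >= 0).astype(np.uint8)

def bitwise_dot(a_bits, b_bits):
    # For sign vectors in {+1, -1} encoded as bits {1, 0}, the dot product
    # equals n - 2 * popcount(a XOR b): matching signs contribute +1,
    # mismatched signs contribute -1. No floating-point multiply is needed.
    n = a_bits.size
    mismatches = np.count_nonzero(a_bits ^ b_bits)
    return n - 2 * mismatches

# Toy check: the bitwise result matches the float dot product of the signs.
rng = np.random.default_rng(0)
q = rng.standard_normal(64)
k = rng.standard_normal(64)
assert bitwise_dot(binarize(q), binarize(k)) == int(np.dot(np.sign(q), np.sign(k)))
```

On an FPGA, the XOR and popcount map directly onto LUTs and carry chains, which is what makes this class of substitution attractive for the kind of dedicated operator the paper describes.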
Date of Conference: 10-12 May 2024
Date Added to IEEE Xplore: 19 July 2024
Conference Location: NYC, NY, USA
