Abstract:
Long Short-Term Memory (LSTM), a type of recurrent neural network, has recently been widely employed in real-time applications such as speech recognition, word segmentation, and machine translation. While existing works demonstrate that LSTM can be efficiently deployed on cloud platforms, the high communication latency between cloud and edge drastically reduces its efficiency. Efficient LSTM accelerators at the edge are therefore in high demand. The limited resources of edge devices and the heterogeneous operations in LSTM (e.g., LSTM gates) make the accelerator design challenging. It seems straightforward to implement each operation as a specific hardware kernel. However, the data dependency among gates leads to significant running stalls in existing heterogeneous-kernel accelerators, resulting in low parallelism and low resource utilization. To overcome these challenges, this work proposes a novel generic LSTM accelerator design for Field-Programmable Gate Array (FPGA) and Application-Specific Integrated Circuit (ASIC) platforms, in which two fundamental computing patterns (i.e., element-wise multiplication and addition) are incorporated in a unified computing kernel to execute the operations of all LSTM gates simultaneously. The running stalls caused by heterogeneous kernels are thus eliminated, achieving full parallelism in LSTM. The proposed technique and architecture are validated on a Xilinx PYNQ-Z1 FPGA, where the design fully utilizes the available resources, achieving 10x faster inference and a 15.2x improvement in computing power efficiency compared with the state-of-the-art LSTM accelerator.
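To illustrate why the gate computations can share one kernel, the following is a minimal software sketch of a standard LSTM cell step in which the pre-activations of all four gates come from a single pass of multiplications and additions over one stacked weight matrix. It is not the accelerator design from the paper, which the abstract does not detail. The function names, the gate ordering (input, forget, candidate, output), and the stacked weight layout are assumptions chosen for illustration.

```python
import math


def sigmoid(v):
    """Logistic function used by the input, forget, and output gates."""
    return 1.0 / (1.0 + math.exp(-v))


def fused_gate_preactivations(W, b, x, h_prev):
    """Compute the pre-activations of all four gates in one pass.

    W is a hypothetical stacked weight matrix with 4*H rows and D+H columns,
    b has 4*H entries. Only multiply and add operations are used here.
    """
    xh = list(x) + list(h_prev)  # concatenate input and previous hidden state
    out = []
    for row, bias in zip(W, b):
        acc = bias
        for w, v in zip(row, xh):
            acc += w * v  # multiply-accumulate
        out.append(acc)
    return out


def lstm_step(W, b, x, h_prev, c_prev):
    """One standard LSTM cell step; assumed gate order is i, f, g, o."""
    H = len(h_prev)
    z = fused_gate_preactivations(W, b, x, h_prev)
    i = [sigmoid(v) for v in z[0:H]]            # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]        # forget gate
    g = [math.tanh(v) for v in z[2 * H:3 * H]]  # candidate cell values
    o = [sigmoid(v) for v in z[3 * H:4 * H]]    # output gate
    # The state update again uses only element-wise multiply and add.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

In this sketch, no gate waits on another gate's result before its pre-activation is computed; only the final state update depends on all four, which is the data dependency the abstract refers to.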
Date of Conference: 18-21 October 2020
Date Added to IEEE Xplore: 21 December 2020