
Achieving Full Parallelism in LSTM via a Unified Accelerator Design



Abstract:

Recently, Long Short-Term Memory (LSTM), a type of recurrent neural network, has been widely employed in real-time applications such as speech recognition, word segmentation, and machine translation. While existing works demonstrate that LSTM can be efficiently deployed on cloud platforms, the high communication latency between the cloud and the edge drastically reduces its efficiency. Therefore, efficient LSTM accelerators at the edge are in high demand. The limited resources of edge devices and the heterogeneous operations in LSTM (e.g., the LSTM gates) pose challenges for LSTM accelerator design. It seems straightforward to implement each operation as a dedicated hardware kernel. However, the data dependencies among gates cause significant running stalls in existing heterogeneous-kernel accelerators, resulting in low parallelism and low resource utilization. To overcome these challenges, this work proposes a novel generic LSTM accelerator design for Field-Programmable Gate Array (FPGA) and Application-Specific Integrated Circuit (ASIC) platforms, in which two fundamental computing patterns (i.e., element-wise multiplication and addition) are incorporated into a unified computing kernel that executes the operations of all LSTM gates simultaneously. This eliminates the running stalls caused by heterogeneous kernels and achieves full parallelism in LSTM. The proposed technique and architecture are validated on a Xilinx PYNQ-Z1 FPGA, where the design fully utilizes the available resources and achieves 10x faster inference and a 15.2x improvement in computing power efficiency compared with the state-of-the-art LSTM accelerator.
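The abstract's central observation, that every LSTM gate reduces to element-wise multiplication and addition, can be illustrated in software. The sketch below is a hypothetical illustration under the standard LSTM equations (no peephole connections); it is not the paper's hardware kernel. It stacks the weights of all four gates (in an assumed order i, f, g, o) into one matrix so that a single multiply-accumulate pass, `mac_kernel`, produces every gate's pre-activation at once, instead of running one separate kernel per gate. The cell and hidden-state updates then use only element-wise multiply and add (plus the activation functions). The names `mac_kernel` and `lstm_step` and the gate ordering are our own assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mac_kernel(weights: Matrix, x: Vector, bias: Vector) -> Vector:
    """Unified multiply-accumulate pass: y[r] = bias[r] + sum_k weights[r][k] * x[k]."""
    out = []
    for row, b in zip(weights, bias):
        acc = b
        for w, v in zip(row, x):
            acc += w * v  # multiplication followed by addition, the only pattern used here
        out.append(acc)
    return out


def lstm_step(W: Matrix, b: Vector, x_t: Vector,
              h_prev: Vector, c_prev: Vector) -> Tuple[Vector, Vector]:
    """One step of a standard LSTM cell (hypothetical sketch, no peepholes).

    W has 4*H rows (gates stacked as i, f, g, o) and len(x_t) + H columns.
    """
    H = len(h_prev)
    assert len(W) == 4 * H and len(b) == 4 * H
    # Concatenate input and previous hidden state so all gates share one operand
    # vector; one MAC pass then yields the pre-activations of all four gates.
    z = mac_kernel(W, x_t + h_prev, b)
    i = [sigmoid(v) for v in z[0:H]]
    f = [sigmoid(v) for v in z[H:2 * H]]
    g = [math.tanh(v) for v in z[2 * H:3 * H]]
    o = [sigmoid(v) for v in z[3 * H:4 * H]]
    # Cell and hidden updates: element-wise multiply and add only (plus tanh).
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


if __name__ == "__main__":
    H, X = 2, 1
    W = [[0.0] * (X + H) for _ in range(4 * H)]  # all-zero weights: every gate = sigmoid(0) or tanh(0)
    b = [0.0] * (4 * H)
    h, c = lstm_step(W, b, [1.0], [0.0, 0.0], [1.0, -2.0])
    print(c)  # [0.5, -1.0]
```

With all-zero weights, the input, forget, and output gates evaluate to 0.5 and the candidate g to 0, so the new cell state is simply half the previous one. Because the four gates in this sketch share one operand vector and one accumulation loop, no gate waits on another kernel's output within a time step; only the cell update depends on the gate results.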

Published in: 2020 IEEE 38th International Conference on Computer Design (ICCD)

Date of Conference: 18-21 October 2020

Date Added to IEEE Xplore: 21 December 2020

DOI: 10.1109/ICCD50377.2020.00086

Conference Location: Hartford, CT, USA
