Abstract:
This work presents an efficient hardware accelerator design for deep residual learning algorithms, which have shown superior image recognition accuracy (>90% top-5 accuracy on the ImageNet database). The acceleration strategy has two key objectives: (1) maximize resource utilization and minimize data movement, and (2) employ scalable and reusable computing primitives to optimize physical design under hardware constraints. Furthermore, we present techniques for efficient integration and communication of these primitives in deep residual convolutional neural networks (CNNs) that exhibit complex, non-uniform layer connections. The proposed hardware accelerator efficiently implements the state-of-the-art ResNet-50 and ResNet-152 algorithms on an Arria-10 FPGA, achieving throughputs of 285.1 and 315.5 GOPS and latencies of 27.2 and 71.7 ms, respectively.
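As a rough sanity check on the reported figures, throughput multiplied by latency gives the number of operations implied per inference. The sketch below assumes one image is processed at a time (batch size 1), so that the reported GOPS holds over the whole reported latency. The abstract does not state this, so treat the result as an approximation. The function name `implied_gop_per_image` is hypothetical.

```python
# Hypothetical consistency check on the abstract's numbers.
# Assumption (not stated in the abstract): batch size 1, so that
# throughput (GOPS) * latency (s) approximates the operations per image.
results = {
    "ResNet-50": (285.1, 27.2),   # (throughput in GOPS, latency in ms)
    "ResNet-152": (315.5, 71.7),
}


def implied_gop_per_image(gops: float, latency_ms: float) -> float:
    """Giga-operations implied per inference: GOPS multiplied by seconds."""
    return gops * latency_ms / 1000.0


if __name__ == "__main__":
    for name, (gops, latency_ms) in results.items():
        gop = implied_gop_per_image(gops, latency_ms)
        print(f"{name}: {gop:.2f} GOP per image")
```

Under the batch-1 assumption this yields about 7.75 GOP per image for ResNet-50 and about 22.62 GOP for ResNet-152. These figures can be compared against published per-model operation counts to see whether the throughput and latency numbers are mutually consistent.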
Date of Conference: 28-31 May 2017
Date Added to IEEE Xplore: 28 September 2017
Electronic ISSN: 2379-447X