Improving Inference Latency and Energy of DNNs through Wireless Enabled Multi-Chip-Module-based Architectures and Model Parameters Compression


Abstract:

Performance and energy figures of Deep Neural Network (DNN) accelerators are profoundly affected by the communication and memory sub-system. In this paper, we make the case for a state-of-the-art multi-chip-module-based architecture for DNN inference acceleration. We propose a hybrid wired/wireless network-in-package interconnection fabric and a compression technique that drastically improve communication efficiency and reduce memory and communication traffic, with a consequent improvement of performance and energy metrics. We assess the inference performance and energy improvements versus accuracy degradation for different CNNs, showing that reductions of up to 77% in inference latency and 68% in inference energy can be obtained while keeping the accuracy degradation below 5% with respect to the original uncompressed CNN.
Date of Conference: 24-25 September 2020
Date Added to IEEE Xplore: 02 November 2020
Conference Location: Hamburg, Germany

I. Introduction

Many deep neural network (DNN) accelerators share common architectural features, such as a large array of specialized processing elements (PEs) interconnected through a specialized Network-on-Chip (NoC). Although different accelerators target specific sectors, including mobile, automotive, and datacenter, the current trend is the design of scalable architectures that can serve a broad computing spectrum, spanning from mobile IoT to large-scale data centers, through multi-chip-module (MCM) based designs [1].
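The abstract reports latency and energy gains obtained by compressing model parameters before they traverse the memory and interconnect fabric. The paper's specific compression scheme is not detailed in this excerpt; the sketch below shows one generic form of parameter compression, uniform per-tensor 8-bit weight quantization, purely to illustrate how compression shrinks the traffic a link must carry. The function names and the NumPy implementation are assumptions for illustration, not the authors' method.

```python
# Minimal sketch (illustrative only, not the paper's technique):
# uniform symmetric int8 quantization of a layer's float32 weights,
# one generic way to cut the bytes moved across a memory/interconnect link.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights onto signed 8-bit integers with a per-tensor scale."""
    qmax = 127                                       # int8 range is [-128, 127]
    scale = float(np.max(np.abs(weights))) / qmax    # per-tensor scale factor
    q = np.clip(np.round(weights / scale), -128, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction on the consumer side of the link."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(256, 256)).astype(np.float32)  # toy layer weights

q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# float32 -> int8 moves 4x fewer bytes at the cost of a small reconstruction error.
print(f"bytes moved: {w.nbytes} -> {q.nbytes} ({w.nbytes / q.nbytes:.0f}x less traffic)")
print(f"mean abs reconstruction error: {np.abs(w - w_hat).mean():.6f}")
```

In a setup like the one the abstract describes, the quantized tensor is what crosses the network-in-package, and the accuracy cost of the reconstruction error is what must stay within the reported 5% degradation budget.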

