
Memory and Computation Coordinated Mapping of DNNs onto Complex Heterogeneous SoC



Abstract:

DNN models are now pervasively used across applications. Meanwhile, computing hardware has shifted towards heterogeneous systems composed of various accelerators. The intertwined complexity of DNN models and hardware makes mapping DNN models onto such systems challenging. Existing mapping frameworks suffer from inefficiencies due to underutilization of computation and bandwidth in heterogeneous SoCs. In this paper, we propose COMB, a mapping framework that coordinates the memory, computation, and data transfer overhead of heterogeneous accelerators to improve latency and energy efficiency through two optimizations: dataflow grouping and accelerator mapping. Dataflow grouping maps multiple independent DNN layers to the same accelerator at the same time to spatially share hardware resources; accelerator mapping finds an optimized placement of the layer groups onto accelerators to reduce data transfer overhead. Together, these two optimizations create a huge design space for heterogeneous DNN mapping. To explore the space efficiently, we present a hybrid scheduling algorithm that combines a greedy algorithm with a genetic algorithm. In evaluation, COMB achieves 1.28× and 1.37× latency speedup over MAGMA and H2H, respectively, and reduces energy consumption by 22.7% and 29.2% compared to MAGMA and H2H.
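The abstract describes a hybrid scheduler that seeds a genetic search with a greedy solution. The paper's actual cost model and chromosome encoding are not reproduced here, so the following is only a minimal, self-contained Python toy of that general idea: all layer costs, population sizes, and GA hyperparameters are invented, and fitness is simply the makespan of the busiest accelerator.

```python
import random

# Hypothetical setup: COST[l][a] is the latency of layer l on accelerator a
# in a toy 3-accelerator SoC (all numbers invented for illustration).
COST = [
    [4, 9, 7],
    [8, 3, 6],
    [5, 5, 2],
    [7, 4, 9],
    [3, 8, 6],
]
NUM_ACCELS = 3

def makespan(assignment):
    """Latency of a mapping: the busiest accelerator bounds the schedule."""
    load = [0] * NUM_ACCELS
    for layer, accel in enumerate(assignment):
        load[accel] += COST[layer][accel]
    return max(load)

def greedy_seed():
    """Greedy pass: put each layer on its individually cheapest accelerator."""
    return [min(range(NUM_ACCELS), key=lambda a: COST[l][a])
            for l in range(len(COST))]

def hybrid_schedule(pop_size=20, generations=50, seed=0):
    """Genetic refinement of the greedy seed (toy one-point crossover GA)."""
    rng = random.Random(seed)
    base = greedy_seed()
    # Population: the greedy seed plus randomly perturbed copies of it.
    pop = [base] + [
        [a if rng.random() > 0.3 else rng.randrange(NUM_ACCELS) for a in base]
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=makespan)                     # lower makespan = fitter
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(COST))
            child = p1[:cut] + p2[cut:]            # one-point crossover
            if rng.random() < 0.2:                 # point mutation
                child[rng.randrange(len(COST))] = rng.randrange(NUM_ACCELS)
            children.append(child)
        pop = survivors + children
    return min(pop, key=makespan)

best = hybrid_schedule()
print(best, makespan(best))
```

The greedy seed alone balances per-layer cost but ignores accelerator load; the genetic phase trades a worse per-layer placement for a better overall makespan, which mirrors (very loosely) why a hybrid explores the grouping-plus-mapping space better than greedy alone.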
Date of Conference: 09-13 July 2023
Date Added to IEEE Xplore: 15 September 2023
Conference Location: San Francisco, CA, USA



I. Introduction

Advances in multi-modality and multi-task learning have produced heterogeneous DNN model architectures composed of multiple different types of computations (e.g., convolution layers, fully connected layers, and transformer layers), as shown in Figure 1(a). These heterogeneous layers prefer different hardware accelerator designs for high performance [1]–[3]. To provide low latency and high energy efficiency, various heterogeneous SoC designs [1], [4] have been proposed recently. A heterogeneous SoC is normally composed of multiple different types of accelerators connected to each other spatially through a network-on-chip (NoC), as shown in Figure 1(b). The major features of such a heterogeneous SoC can be summarized in two aspects. For computation, the accelerators in the heterogeneous system employ different dataflows [2] (e.g., weight-stationary, output-stationary) and different hardware resources to accelerate different workloads. For memory, each accelerator has a local scratchpad memory that stores a tile of input/output data for computation. The scratchpad is limited in size, so the accelerators have to communicate with each other through the NoC to fetch the required data. Accelerators placed close to each other incur less delay (fewer NoC hops) for data transmission than those placed farther apart.
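The hop-distance observation above can be captured with a very simple cost model. This is not COMB's model; it is a minimal sketch assuming a 2D-mesh NoC with Manhattan-distance routing, and the per-hop latency and link bandwidth constants are made up for illustration.

```python
# Hypothetical 2x2 mesh of accelerators; each tile is addressed by (row, col).
# Transfer delay is modeled as hop latency plus a bandwidth-limited
# serialization term (both constants are invented).
PER_HOP_CYCLES = 2      # assumed router/link traversal cost per hop
BYTES_PER_CYCLE = 32    # assumed NoC link bandwidth

def hops(src, dst):
    """Manhattan distance between two tiles on the mesh."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

def transfer_cycles(src, dst, num_bytes):
    """Toy delay for moving num_bytes from src tile to dst tile."""
    return hops(src, dst) * PER_HOP_CYCLES + num_bytes // BYTES_PER_CYCLE

# A neighbouring accelerator (1 hop) vs. the far corner (2 hops):
print(transfer_cycles((0, 0), (0, 1), 4096))  # → 130
print(transfer_cycles((0, 0), (1, 1), 4096))  # → 132
```

Even in this toy form, the model shows why accelerator placement matters: for a fixed tile size, the only lever left to a mapper is the hop count, which is exactly what placing communicating layer groups on adjacent accelerators reduces.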
