
An In-DRAM Neural Network Processing Engine



Abstract:

Many advanced neural network inference engines are bound by the available memory bandwidth. The conventional approach to address this issue is to employ high-bandwidth memory devices or to adopt data compression techniques (reduced precision, sparse weight matrices). Alternatively, an emerging approach to bridge the memory-computation gap and to exploit extreme data parallelism is Processing in Memory (PIM). The close proximity of the computation units to the memory cells reduces the amount of external data transactions and increases the overall energy efficiency of the memory system. In this work, we present a novel PIM-based Binary Weighted Network (BWN) inference accelerator design that is in line with the commodity Dynamic Random Access Memory (DRAM) design and process. In order to exploit data parallelism and minimize energy, the proposed architecture integrates the basic BWN computation units at the output of the Primary Sense Amplifiers (PSAs) and the rest of the substantial logic near the Secondary Sense Amplifiers (SSAs). The power and area values are obtained at sub-array (SA) level using exhaustive circuit-level simulations and full-custom layout. The proposed architecture incurs an area overhead of 25% compared to a commodity 8 Gb DRAM and delivers a throughput of 63.59 FPS (frames per second) for AlexNet. We also demonstrate that our architecture is extremely energy efficient, achieving 7.25× higher FPS/W compared to previous works.
Date of Conference: 26-29 May 2019
Date Added to IEEE Xplore: 01 May 2019
Print ISBN: 978-1-7281-0397-6
Print ISSN: 2158-1525
Conference Location: Sapporo, Japan

I. Introduction

Convolutional Neural Networks (CNNs) have revolutionized the fields of image recognition and classification. Over the years, a wide variety of CNN accelerators [1]–[3] have been implemented to achieve higher performance and lower power consumption than general-purpose computing. An emerging concept in this context is Processing-in-Memory (PIM) [4]. The aim of a PIM architecture is to eliminate the memory-bound (energy and bandwidth) issues associated with CNN data and parameter accesses [1]. It bridges the memory-computation gap by integrating the CNN computation logic into the memory devices. The tight coupling of memory and computational logic enables massive data parallelism (performance increase) and minimal data-movement cost, which results in much higher energy efficiency. Because of these advantages, PIM has been an active research domain in recent years. In this work, we focus on integrating a Binary Weighted Network (BWN) CNN inference engine into a commodity DRAM architecture. This enables the implementation of CNNs with a larger memory footprint and provides higher data-access bandwidth, low latency, and low power consumption.

The prior art most closely related to our work is XNOR-POP [5], DrAcc [6], and DRISA [7]. XNOR-POP is based on the incorrect premise that the number of SSAs equals the DRAM page size. In commodity DRAMs, the number of SSAs in a bank is limited to the I/O data width times the burst length (e.g., 64 for a ×8 interface with a burst length of 8). Thus, the solution proposed in XNOR-POP is contrary to the standard DRAM architecture. DRISA and DrAcc adopt the concept of concurrently activating multiple rows to realize the basic logic operations (AND, OR, NOT), which was first proposed in Ambit [8]. The major drawbacks of this concept are its sensitivity to process variation in the DRAM SA and the high energy consumption of the numerous multi-row activations. The logic-operation failure rate in Ambit is as high as 26% for a process variation of 25%. DrAcc extends Ambit by introducing a new special row inside the sub-array (SA) for shift operations. However, DrAcc neither addresses this issue of Ambit, which affects CNN inference, nor discusses the resilience of CNNs to such failures. DRISA addresses this issue by restructuring and redesigning the sub-array, departing from the highly optimized commodity DRAM sub-array design. DRISA also proposes an alternative solution, called 1T1C-adder, that employs the commodity DRAM sub-array. Nevertheless, the authors of DRISA highlight that the 1T1C-adder is not feasible in a DRAM technology from the energy/area perspective, due to the large adder trees integrated near the PSA in each SA.
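To make the BWN arithmetic concrete, the following minimal Python sketch (an illustration added for this summary, not code from the paper; the function name bwn_dot and the per-filter scale alpha are assumptions in the style of XNOR-Net-like binary-weight networks) shows why binary weights remove the need for multipliers: with weights constrained to {-1, +1}, each product in the dot product reduces to a sign-selected add or subtract, followed by a single scaling.

    import numpy as np

    def bwn_dot(activations, binary_weights, alpha):
        # Weights are in {-1, +1}, so each multiply-accumulate
        # becomes a sign-selected add/subtract; no multiplier needed.
        acc = 0.0
        for x, w in zip(activations, binary_weights):
            acc += x if w > 0 else -x
        return alpha * acc  # one per-filter scaling at the end

    x = np.array([0.2, -1.3, 0.7, 0.4])   # toy input activations
    w = np.array([+1, -1, -1, +1])        # binary weights
    print(bwn_dot(x, w, alpha=0.5))       # 0.5 * (0.2 + 1.3 - 0.7 + 0.4) = 0.6

This add/subtract-and-accumulate pattern suggests why the per-PSA computation units of such a design can remain compact.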
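The multi-row activation that DRISA and DrAcc inherit from Ambit can likewise be captured by a small functional model (a simplification assuming ideal sense amplifiers and no process variation, which is exactly the non-ideality the failure-rate figure above quantifies): activating three rows simultaneously makes each bitline resolve to the bitwise majority of the three cell values, and pinning one row to all zeros or all ones yields AND or OR, respectively.

    def triple_row_activate(a, b, c):
        # Idealized charge-sharing result of activating three rows:
        # each bitline settles to the majority of the three cells.
        return (a & b) | (b & c) | (a & c)

    a, b = 0b1100, 0b1010
    # AND: fix the control row to all zeros -> majority(a, b, 0) = a & b
    assert triple_row_activate(a, b, 0b0000) == (a & b)
    # OR: fix the control row to all ones -> majority(a, b, 1) = a | b
    assert triple_row_activate(a, b, 0b1111) == (a | b)
    print(bin(a & b), bin(a | b))  # 0b1000 0b1110

Each such operation requires copying operands into designated rows and issuing multi-row activations, which is the source of the energy cost and variation sensitivity discussed above.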
