I. Introduction
Convolutional Neural Networks (CNNs) have revolutionized image recognition and classification. Over the years, a wide variety of CNN accelerators [1]–[3] have been implemented to achieve higher performance and lower power consumption than general-purpose computing. An emerging concept in this area is Processing-in-Memory (PIM) [4]. PIM architectures aim to eliminate the memory-bound (energy and bandwidth) issues associated with accessing CNN data and parameters [1]. PIM bridges the memory-computation gap by integrating the CNN computation logic into the memory devices. The tight coupling of memory and computational logic enables massive data parallelism (higher performance) and minimal data-movement cost, which results in much higher energy efficiency. Because of these advantages, PIM has been an active research domain in recent years. In this work, we focus on integrating a Binary Weight Network (BWN) CNN inference engine into a commodity DRAM architecture. This enables the implementation of CNNs with larger memory footprints and provides higher data-access bandwidth, low latency, and low power consumption.
The prior works most closely related to ours are XNOR-POP [5], DrAcc [6], and DRISA [7]. XNOR-POP is based on the incorrect premise that the number of secondary sense amplifiers (SSAs) equals the DRAM page size. In commodity DRAMs, the number of SSAs in a bank is limited to the I/O data width × burst length. Thus, the solution proposed in XNOR-POP is incompatible with standard DRAM architecture. DRISA and DrAcc adopt the concept of concurrently activating multiple rows to realize basic logic operations (AND, OR, NOT), which was first proposed in Ambit [8]. The major drawbacks of this concept are its sensitivity to process variation in the DRAM sub-array (SA) and the high energy consumption caused by numerous multi-row activations. In Ambit, the logic-operation failure rate due to process variation is as high as 26% for a process variation of 25%.
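To make the multi-row activation idea concrete, the sketch below models Ambit-style triple-row activation at the logic level only. When three rows are activated together, each sense amplifier settles to the bitwise majority of the three cells it sees; presetting the third row to all zeros turns majority into AND, and presetting it to all ones turns it into OR. Rows are represented as Python integers used as bit vectors, and the function names (`maj3`, `row_and`, `row_or`, `row_not`) are hypothetical. This is an idealized model under stated assumptions: it ignores charge sharing, sense-amplifier offset, and process variation, which are precisely the effects that cause the failures discussed above, and it represents NOT simply as a bitwise complement rather than modeling the circuit Ambit uses to obtain it.

```python
def maj3(a: int, b: int, c: int) -> int:
    """Bitwise majority of three rows (ints used as bit vectors).

    Idealized model of triple-row activation: every bit position resolves
    to 1 if at least two of the three activated cells hold 1.
    """
    return (a & b) | (b & c) | (a & c)


def row_and(a: int, b: int) -> int:
    # Third row preset to all zeros: MAJ(A, B, 0) == A AND B.
    return maj3(a, b, 0)


def row_or(a: int, b: int, width: int) -> int:
    # Third row preset to all ones: MAJ(A, B, 1) == A OR B.
    ones = (1 << width) - 1
    return maj3(a, b, ones)


def row_not(a: int, width: int) -> int:
    # Modeled as a plain bitwise complement masked to the row width;
    # the in-DRAM circuit that produces the inverted value is not modeled.
    return ~a & ((1 << width) - 1)


if __name__ == "__main__":
    a, b, width = 0b1100, 0b1010, 4
    print(f"AND: {row_and(a, b):04b}")         # AND: 1000
    print(f"OR:  {row_or(a, b, width):04b}")   # OR:  1110
    print(f"NOT: {row_not(a, width):04b}")     # NOT: 0011
```

Because every result depends on the majority vote resolving correctly, a single mis-sensed bit line flips an output bit, which is why process variation translates directly into logic-operation failures in this scheme.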
DrAcc extends Ambit by introducing a special row inside the sub-array (SA) for shift operations. However, DrAcc neither addresses this reliability issue of Ambit, which affects CNN inference, nor discusses the resilience of CNNs to such failures. DRISA addresses the issue by restructuring and redesigning the sub-array, departing from the highly optimized commodity DRAM sub-array design. DRISA also proposes an alternative solution, the 1T1C-adder, which employs the commodity DRAM sub-array. However, the authors of DRISA point out that the 1T1C-adder is not feasible in DRAM technology from an energy/area perspective because of the large adder trees integrated near the primary sense amplifiers (PSAs) in each SA.