A 4nm 6163-TOPS/W/b 4790-TOPS/mm²/b SRAM Based Digital-Computing-in-Memory Macro Supporting Bit-Width Flexibility and Simultaneous MAC and Weight Update | IEEE Conference Publication | IEEE Xplore



Abstract:

The computational load for accurate AI workloads is moving from large server clusters to edge devices, enabling richer and more personalized AI applications. Compute-in-memory (CIM) is beneficial for edge-AI workloads, specifically MAC-intensive ones. However, achieving better power-performance-area (PPA) together with high accuracy is a major challenge for practical CIM implementations. Recent work examined tradeoffs between MAC throughput, energy efficiency, and accuracy for analog-based CIM [1–3]. On the other hand, digital CIMs (DCIM), which use small, distributed SRAM banks and a customized MAC unit, have demonstrated massively parallel computation with no accuracy loss and a higher PPA with technology scaling [4]. In this paper, we introduce a 4-nm SRAM-based DCIM macro that handles variable 8/12b-integer weights and 8/12/16b-integer inputs in a single macro. The proposed 8-transistor 2b OAI (or-and-invert) cell achieves an 11% smaller combined bit cell and multiplier area and supports ultra-low-voltage operation down to 0.32V. Furthermore, the signed-extended carry-look-ahead adder (signed-CLA) and an adder-tree pipeline are introduced to boost throughput. Figure 7.4.1 shows the implementation of the bit cell structure and a neural network accuracy comparison with various bit precisions. Since we targeted concurrent write and MAC operations, ping-ponging between weight updates and MAC operations, the array needs to have an even number of rows: a classical approach is to use two 12T bit cells and a 2-input NOR. The 12T cell supports simultaneous read and write operations, as its read and write ports are independent. The 2-input NOR is used for bitwise multiplication of input activations (XIN) and weights (W). In the proposed SRAM-based DCIM macro, by contrast, two 8T cells and an OAI are used. In the proposed bitcell topology, the 8T bitcells act as memory data storage and row selection for the write operation.
The OAI performs row selection and bitwise mu...
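
As an illustration of the bitwise multiplication and row selection described in the abstract, the following is a minimal behavioral sketch in Python, not the authors' circuit. It relies on several assumptions that the abstract does not state: the gates receive inverted signals, the input is gated per row so that only the selected row sees XIN, the OAI is a 2-2 OR-AND-invert, and the MAC is evaluated one bit plane at a time with a plain sum standing in for the adder tree. All function names (`mul_nor`, `mul_oai`, `dcim_mac`) are hypothetical.

```python
def nor(a: int, b: int) -> int:
    """2-input NOR on single bits (0 or 1)."""
    return 1 - (a | b)


def oai22(a: int, b: int, c: int, d: int) -> int:
    """OR-AND-invert on single bits: NOT((a OR b) AND (c OR d))."""
    return 1 - ((a | b) & (c | d))


def mul_nor(x: int, w: int) -> int:
    """1-bit product x AND w, formed as the NOR of the inverted operands."""
    return nor(1 - x, 1 - w)


def mul_oai(x: int, row: int, w0: int, w1: int) -> int:
    """1-bit product of x with the weight bit stored in the selected row.

    Assumed wiring (not from the paper): the input is gated per row, so only
    the selected row sees x, and the OAI receives inverted signals.
    """
    x0 = x & (1 - row)  # input as seen by row 0
    x1 = x & row        # input as seen by row 1
    return oai22(1 - x0, 1 - w0, 1 - x1, 1 - w1)


def dcim_mac(inputs, row0, row1, row, x_bits, w_bits):
    """Signed dot product of inputs with the selected weight row.

    Uses only 1-bit products, a per-bit-plane sum (standing in for the adder
    tree), and shift-accumulate. Operands are two's complement, so the MSB
    of each operand carries a negative weight.
    """
    acc = 0
    for i in range(x_bits):  # input bit position, LSB first
        x_sign = -1 if i == x_bits - 1 else 1
        for j in range(w_bits):  # weight bit position, LSB first
            w_sign = -1 if j == w_bits - 1 else 1
            partial = sum(
                mul_oai((x >> i) & 1, row, (w0 >> j) & 1, (w1 >> j) & 1)
                for x, w0, w1 in zip(inputs, row0, row1)
            )
            acc += x_sign * w_sign * (partial << (i + j))
    return acc


if __name__ == "__main__":
    xin = [3, -5, 127, -128]        # 8b signed inputs
    w_active = [7, -8, 100, -2048]  # 12b signed weights used for the MAC
    w_shadow = [0, 0, 0, 0]         # other row, free to be rewritten
    expected = sum(x * w for x, w in zip(xin, w_active))
    assert dcim_mac(xin, w_active, w_shadow, 0, 8, 12) == expected
    w_shadow = [1, 2, 3, 4]         # a weight update on the idle row
    assert dcim_mac(xin, w_active, w_shadow, 0, 8, 12) == expected
```

In this model, rewriting the unselected row leaves the MAC result unchanged, which is the behavior the ping-pong scheme depends on. Changing `x_bits` and `w_bits` covers the 8/12/16b input and 8/12b weight combinations named in the abstract.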
Date of Conference: 19-23 February 2023
Date Added to IEEE Xplore: 23 March 2023
Conference Location: San Francisco, CA, USA
