
A Hybrid SRAM/RRAM In-Memory Computing Architecture Based on a Reconfigurable SRAM Sense Amplifier


Abstract:

In this paper, a hybrid memory architecture based on a new array of SRAM and resistive random-access memory (RRAM) cells is proposed to perform in-memory computing by implementing all basic two-input Boolean functions. The SRAM array can be configured as a dual-purpose element. It can be used as an SRAM array in memory mode to keep data for high-performance application requirements. It can also be configured as a sense amplifier (SA-SRAM) for reading the contents of RRAMs and performing the in-memory computation. The circuits are designed using independent-gate FinFETs (IG-FinFETs), whose channel is controlled by two independent gates, increasing the design’s flexibility. Our results indicate that the write energy consumption and combined word line margin (CWLM) of the proposed SA-SRAM cells improve by 50% and 20%, respectively, compared with the conventional 8T SRAM. Moreover, by benefiting from the combination of SRAM and RRAM cells in the proposed architecture, the energy consumption of our design in application areas such as image processing is much lower than that of the well-known in-memory architectures it is compared with. In addition, to address security concerns, we propose a polymorphic circuit primitive to prevent reverse engineering or integrated circuit (IC) counterfeiting. The proposed polymorphic circuit also adds computational capability to the proposed hybrid memory architecture for accomplishing complex logic operations.
Published in: IEEE Access ( Volume: 11)
Page(s): 72159 - 72171
Date of Publication: 12 July 2023
Electronic ISSN: 2169-3536

SECTION I.

Introduction

With the increasing need for high-performance processing in applications such as artificial intelligence, neural networks, search engines, biological systems, and image processing, the von Neumann architecture faces a critical challenge known as the memory wall [1], [2]. Transferring large amounts of data back and forth between a separate processor and memory leads to severe energy consumption, high latency, and I/O congestion [3], [4]. One of the most promising approaches for alleviating these challenges is in-memory computing (IMC), which provides computing capability inside the memory [5], [6]. IMC can perform simple computational tasks to reduce memory-processor data transfers [7]. In other words, the memory cells carry out normal read/write operations and also perform simple logical calculations within the memory to bypass the von Neumann bottleneck [8]. Various studies have examined IMC platforms based on volatile and nonvolatile memories. Dynamic random-access memory (DRAM) and static random-access memory (SRAM) are used for volatile in-memory computing designs, and emerging technologies such as magnetic RAM (MRAM) and RRAM are used in the nonvolatile area [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20]. The advantages of volatile memories are their widespread usage and their fast read/write operations. Moreover, these memories can implement short-term memory in neuromorphic computing [21]. However, the high power consumption of DRAMs and the large area of SRAMs are drawbacks of these designs [17]. In contrast, low power dissipation, dense integration, and nonvolatility are the significant advantages of emerging technologies for the IMC paradigm. On the other hand, these memories suffer from high read/write delay and reliability issues due to low endurance [8].

In [9], an 8T SRAM cell for in-memory computing capable of performing simple Boolean computation was proposed. Although the latency is reduced because of the separated read/write operations and in-memory Boolean logic calculations, write power consumption and reliability concerns in half-selected cells are significant issues in this design. Moreover, this design is not suitable for performing complex Boolean logic functions.

In [18], an 8T SRAM cell with three ports for four operands to perform complex Boolean functions was proposed. In one cycle, this design can perform three Boolean operations (XOR/XNOR, AND/NAND, and OR/NOR). However, the static noise margin (SNM) of the SRAM cells is considerably reduced at the nanoscale. Moreover, the area overhead and volatility are the critical bottlenecks of this design.

In [19], nonvolatile hybrid MTJ/CMOS logic gates capable of performing Boolean logic functions for the in-memory computation were proposed. Although this design shows acceptable performance compared to benchmarks, the delay in performing Boolean computation is considerably higher than in CMOS-based designs. Moreover, this design is not suitable for applications that need real-time computations.

To address this, [21] proposed a spintronic/CMOS memory, including volatile DRAM and nonvolatile MRAM cells, to cope with the requirements of real-time and non-real-time applications. However, this design only covers some Boolean logic functions and is unsuitable for complex calculations. Moreover, this design’s write energy and delay are considerably higher than those of CMOS designs.

None of these designs has benefited from the fast read/write of the volatile DRAM (or SRAM) designs and the low power consumption and dense integration of nonvolatile emerging technologies (RRAM/MRAM). Using a single circuit to implement multiple functions reduces chip area and forms a primary barrier to security issues like reverse engineering or IC counterfeiting. These features are provided by polymorphic circuits [22]. This reconfigurable circuit utilizes controllable factors to implement various functions. For example, signals such as voltage and temperature can configure a polymorphic circuit to perform multiple functions [19]. The placement of such circuits in a memory unit offers the ability to increase operational capacity and also enhances its security.

This paper proposes a novel hybrid SRAM/RRAM-based in-memory computing architecture capable of performing all Boolean logic functions. The SRAM array is built from the proposed reconfigurable 11T SRAM cell, which can read data from the RRAM-based main memory in sense-amplifier mode and perform all Boolean logic functions, in addition to storing data in SRAM mode. Moreover, the proposed SRAM cell has a high static noise margin and is free of the half-select issue, which is critical in memory design.

To the best of our knowledge, this is the first time that SRAM and RRAM memory arrays are combined to handle the needs of real-time and non-real-time applications. The proposed hybrid architecture benefits from the RRAM-based memory for non-real-time applications and from the reconfigurable SRAM array, which can be used for real-time application requirements. Moreover, this design is a great candidate for implementing neuromorphic computing. This is because, in neuromorphic computing, the memory architecture is inherited from a biological memory that needs short-term and long-term memory elements [21]. These can be modeled by the SRAM (short-term) and RRAM (long-term) memory arrays in the proposed architecture.

The rest of the paper is organized as follows: Section II reviews the fundamentals of RRAM. Section III describes the proposed SRAM cell. The proposed reconfigurable SRAM cell is described in Section IV. Section V describes the proposed reconfigurable SRAM array and the hybrid in-memory processing architecture. The proposed hybrid architecture is evaluated and compared with its state-of-the-art counterparts in Section VI, and finally, Section VII concludes the paper.

SECTION II.

Resistive Random-Access Memory (RRAM)

RRAM is a two-terminal device comprising an oxide layer sandwiched between two metal layers. By applying a voltage across its terminals, RRAM’s resistance can change between the low-resistance state (LRS) and the high-resistance state (HRS). Despite RRAM being a memory element, its influence has extended beyond memory design to logic circuits and computing systems. Furthermore, the use of RRAM in next-generation nonvolatile memories has been touted due to its near-zero leakage power consumption, low read/write voltage, fast switching speed, and excellent scalability [23]. Fig. 1 shows the physical structure of the RRAM device.

FIGURE 1. Structure of the RRAM device.

Applying different voltages to the top (TE) and bottom (BE) metal electrodes can form or rupture oxygen vacancies inside the oxide layer (OL), which constitute a conductive filament (CF). In addition, the RRAM resistance is determined by the gap distance (g) between the TE and the apex of the CF [24].

The I-V characteristic of an RRAM can be expressed as:\begin{equation*} I = I_{0}\, e^{-g/g_{0}} \sinh\left(\frac{V}{V_{0}}\right) \tag{1}\end{equation*}

where $I_{0}$, $g_{0}$, and $V_{0}$ are fitting parameters. When a positive (negative) voltage is applied to the RRAM, oxygen vacancies are generated (recombined), which leads to growth (dissolution) of the CF.

A positive voltage should be applied to the RRAM to switch it from HRS to LRS (SET). Conversely, switching from LRS to HRS (RESET) is achieved using a negative voltage [25]. It is worth mentioning that the RRAM device can be fabricated on top of the transistors [26].
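The compact model in (1) can be explored numerically with a few lines of Python. This is only a sketch: the default values of `i0`, `g0`, and `v0` below are placeholders chosen for illustration and are not the fitting parameters used in the paper.

```python
import math

def rram_current(v, g, i0=1e-3, g0=0.25e-9, v0=0.25):
    """Evaluate equation (1): I = I0 * exp(-g / g0) * sinh(V / V0).

    v  : voltage across the device (V)
    g  : gap between the top electrode and the filament tip (m)
    i0, g0, v0 : fitting parameters (placeholder values, NOT from the paper)
    """
    return i0 * math.exp(-g / g0) * math.sinh(v / v0)

# A wider gap gives a smaller current at the same voltage, i.e. a
# higher resistance (HRS-like behavior); a narrow gap is LRS-like.
i_narrow_gap = rram_current(0.2, g=0.2e-9)
i_wide_gap = rram_current(0.2, g=1.7e-9)
print(i_narrow_gap > i_wide_gap)
```

Because the current depends exponentially on the gap, small changes in `g` translate into large resistance changes, which is what separates the two resistance states.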

SECTION III.

Proposed SRAM Cell in the Memory Mode

This section proposes a half-select-free 11T SRAM cell (HF11T) using independent-gate FinFETs. As shown in Fig. 2, two back-to-back inverters ((M1, M2) and (M3, M4)) keep the data in the storage nodes. Two feedback-cutting pMOS transistors (M5 and M6) are placed between the cross-coupled inverters to eliminate the conflict between the pull-up and access devices during the write operation. To write ‘0’ or ‘1’ into the proposed cell, a ‘0’ is transferred from the ground through the shared transistor M11 and through M7 or M8 to the corresponding storage node, Q or QB, respectively. Moreover, in the read operation, transistors M9 and M10 provide a decoupled read path through the read bit-line (RBL).

FIGURE 2. Proposed HF11T SRAM cell.

To describe the functionality of the proposed design, each operational mode is discussed separately in detail.

A. Hold State

During the hold state, the WWLA, WWLB, and WL control signals are forced to ‘0’, turning the M7, M8, and M11 transistors OFF. Moreover, as the front and back gates of M5 and M6 are completely active, the required feedback for maintaining data in the cell is established.

B. Read Operation

To perform the read operation, the RBL signal is pre-charged to $V_{\mathrm{DD}}$, and then the RWL signal is asserted to ‘1’. Then, depending on the data kept at the QB node, RBL is either discharged through M9 and M10 to the ground or remains at its initial value, and the read operation is performed.

C. Write Operation

The write operation of the proposed SRAM cell is performed by asserting WL together with the WWLA or WWLB signal. In this case, to write ‘0’ or ‘1’ into the cell, a path from the ground node to the Q or QB node must be established. For instance, suppose Q and QB store ‘1’ and ‘0’, respectively. To write ‘0’ into the Q node, WL and WWLA are asserted to ‘1’. Therefore, M7 and M11 become completely activated, forcing Q to ‘0’. Meanwhile, by asserting WL and WWLA, M5 turns off, temporarily eliminating the feedback path. Hence, ‘0’ is written into the floating Q node without contention. Simultaneously, M6 remains partially active because its front gate is connected to WWLB, which is held at ‘0’. Therefore, M6 can still pass the strong ‘1’ from the left inverter output to the gate of the right inverter (QB), as shown in Fig. 2. Consequently, when WWLA and WL are returned to ‘0’, the new data is retained in the cell. Similarly, writing ‘0’ into node QB is executed by asserting the WWLB and WL signals to ‘1’.

D. Half-Selection Issue

One of the most important considerations in the design of an SRAM cell is avoiding unwanted writes to the cells neighboring the cell of interest during a write or read, known as the half-select issue. This issue is entirely addressed in our proposed cell. To write ‘0’ into an arbitrary cell, that cell’s WL and WWLA signals should be activated. These signals also reach the unselected cells in the row and column of the desired cell. In any cell of the same row, where only the WL signal is activated, transistors M5 and M6 partially turn ON, and the feedback path remains active as a barrier against noise. In any cell of the same column, where only the WWLA signal is activated, the M5 transistor is still ON, and the feedback paths are active. Therefore, the floating problem existing in the cells proposed in [27], [28], and [29] is solved, and the contention causing unnecessary power dissipation in cells like [9] and [30] is eliminated.
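The hold, read, write, and half-select behavior described in this section can be summarized in a purely logic-level model. The sketch below is an illustration only: the class name `HF11TCell` and its methods are hypothetical, analog effects and timing are ignored, and rejecting a simultaneous WWLA/WWLB assertion is an assumption rather than something stated in the text.

```python
class HF11TCell:
    """Logic-level sketch of the HF11T cell (no timing, no analog behavior)."""

    def __init__(self, q=0):
        self.q = q
        self.qb = 1 - q

    def apply_write_signals(self, wl, wwla, wwlb):
        """Apply one combination of the row (WL) and column (WWLA/WWLB) signals."""
        if wl and wwla and wwlb:
            # Assumption: the two column signals are never asserted together.
            raise ValueError("WWLA and WWLB must not be asserted together")
        if wl and wwla:
            # Fully selected: '0' is forced onto Q, QB follows to '1'.
            self.q, self.qb = 0, 1
        elif wl and wwlb:
            # Fully selected: '0' is forced onto QB, Q follows to '1'.
            self.q, self.qb = 1, 0
        # Otherwise the cell is in hold or half-selected (only WL, or only
        # WWLA/WWLB): the feedback path stays active and the data is kept.

    def read(self, rwl):
        """Return the RBL level after a read; RBL starts pre-charged to '1'."""
        if rwl and self.qb == 1:
            return 0  # RBL discharges through the read stack
        return 1      # RBL keeps its pre-charged value


cell = HF11TCell(q=1)
cell.apply_write_signals(wl=1, wwla=0, wwlb=0)  # row half-select: no change
cell.apply_write_signals(wl=0, wwla=1, wwlb=0)  # column half-select: no change
print(cell.q, cell.qb)
```

Only the cell that sees both its row and column signals changes state, which is the property the half-select discussion relies on.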

SECTION IV.

Proposed Reconfigurable SRAM Cell for In-Memory Computing

To meet the in-memory computing requirements of application areas such as optical character recognition (OCR), the proposed SRAM cell is redesigned so that it can act either as a sense amplifier for reading data from a nonvolatile memory or as a volatile memory element (SRAM mode). Accordingly, as shown in Fig. 3, the back gates of the pull-up transistors (M1 and M3) are connected to the PC control signal. An extra n-type transistor (MG) is placed between M2/M4 and ground to float the storage nodes and pre-charge them, preparing the cell for sensing the data stored in the nonvolatile memory. Meanwhile, the MRR and MRL transistors give the pre-charged nodes access to the nonvolatile memory for reading its data.

FIGURE 3. Proposed reconfigurable SRAM cell.

The proposed cell can be configured as either a fast memory element (SRAM mode) to perform a normal read operation in real-time applications by disabling the PC and Read-EN control signals or a sense amplifier (sense amplifier mode) for in-memory computing by enabling PC followed by Read-EN.

The sense amplifier mode starts by enabling the PC control signal, which enables the back gates of the pull-up transistors for pre-charging the storage nodes and disables the MG transistor to eliminate the path to the ground. Then, by asserting the Read-EN signal, the storage nodes (Q and QB) are connected to the memory element (RRAM) and the reference resistance, respectively, as shown in Fig. 4. Depending on the difference between the resistance of the RRAM cell and the reference resistance, the path with the lower resistance discharges faster. Then, by disabling the PC control signal, the cross-coupled inverters are activated, and the node on the lower-resistance path is tied to the ground. After the PC signal is disabled, the Read-EN signal is disabled, and the desired data is kept in the cell.
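The sensing sequence above amounts to a race between two discharging nodes. The following sketch models each node as a simple RC discharge and latches whichever node is lower; it is an idealization, not the circuit itself. The HRS and LRS values are the ones quoted later in the evaluation, while the reference resistance, node capacitance, and sensing time are assumed values picked only to make the example concrete.

```python
import math

LRS_OHMS = 10e3   # low-resistance state used in the evaluation
HRS_OHMS = 1e6    # high-resistance state used in the evaluation

def sense(r_cell, r_ref=100e3, c_node=1e-15, v_pre=0.7, t_sense=50e-12):
    """Return the latched (Q, QB) pair after one sense-amplifier cycle.

    r_ref, c_node, and t_sense are illustrative assumptions.
    Q discharges through the RRAM cell, QB through the reference.
    """
    v_q = v_pre * math.exp(-t_sense / (r_cell * c_node))
    v_qb = v_pre * math.exp(-t_sense / (r_ref * c_node))
    # When PC is released, the cross-coupled inverters pull the lower
    # node to ground and the other node to the supply.
    return (0, 1) if v_q < v_qb else (1, 0)

print(sense(LRS_OHMS))  # Q side discharges faster than the reference
print(sense(HRS_OHMS))  # reference side discharges faster than Q
```

Placing the reference between the two resistance states is what lets a single comparison distinguish them.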

FIGURE 4. Proposed sense amplifier for reading data from RRAM.

SECTION V.

Proposed Process In-Memory Architecture

This section presents a new hybrid memory architecture, including an RRAM-based main memory array (MMA), a reconfigurable SRAM array (SA-SRAM), and polymorphic logic units (PMU), as shown in Fig. 5. The MMA row (MMRD) and source line (SLD) decoders select cells for reading, writing, or computing operations. The SA-SRAM unit can operate in SRAM mode, be used as a sense amplifier to read the MMA data, and then switch back to SRAM mode to retain the data that was read.

FIGURE 5. The proposed hybrid memory architecture.

It is worth mentioning that the SA-SRAM cells can perform in-memory computing besides the storage mode. Switching between the SRAM and sense amplifier modes is done by the mode decoder (MD) unit, which provides the PC signals. In the meantime, the read-enable decoder (RED) contributes to the communication between the MMA and SA-SRAM units. In the SRAM mode, the column (SCD) and row (SRD) decoders are utilized for reading from and writing to the SA-SRAM cells. The SA-SRAM output can be maintained in the latch unit presented in the sense amplifier unit (SAU) to provide proper inputs for PMUs. PMUs act as security-processing blocks to increase the computation capability in this architecture, designed to handle real-time and non-real-time applications. Each operational mode is discussed in detail in the following subsections to clarify the function of the array architecture.

A. MMA Memory Mode

In MMA, the data stored in a memory cell is represented by the resistance of RRAM. To write a bit into a memory cell, the corresponding source line (SL) and bit-line (BL) are connected to the respective voltages to modify the stored data. For example, to write ‘0’ into a cell, the corresponding SL should be grounded, and BL should be connected to the write voltage (2V). Then, the RRAM cell will be reset by enabling the corresponding DWL signal. On the other hand, for writing ‘1’ into a cell, BL should be grounded, and SL should be connected to the write voltage (2V). Then, the RRAM cell will be set by asserting the DWL signal.

The read process is accomplished by connecting SL to the ground and pre-charging BL and the reference bit line (ReBL) to 0.7V. ReBL is connected to the reference resistance through a transistor connected to the L1 signal, as shown in Fig. 5. For instance, when the word line of the desired cell and L1 are asserted, BL and ReBL are discharged at different rates depending on the resistance of the corresponding cell. Since an SA-SRAM is configured in the sense amplifier mode, it acts as a comparator. To this end, the discharge voltage in BL is compared with the discharge voltage in the ReBL for a certain period, and the desired data is detected and stored.
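The line conditions for the MMA write and read operations can be collected in a small lookup table. This is a sketch for orientation only: the function name `mma_bias` and the dictionary layout are hypothetical, while the voltages are the ones given in the text.

```python
WRITE_V = 2.0  # write voltage stated in the text (V)
READ_V = 0.7   # pre-charge level of BL and ReBL during a read (V)

def mma_bias(operation):
    """Return the SL/BL conditions for 'write0', 'write1', or 'read'."""
    table = {
        # Writing '0' resets the RRAM: SL grounded, BL at the write voltage.
        "write0": {"SL": 0.0, "BL": WRITE_V, "effect": "RESET"},
        # Writing '1' sets the RRAM: BL grounded, SL at the write voltage.
        "write1": {"SL": WRITE_V, "BL": 0.0, "effect": "SET"},
        # Reading: SL grounded, BL and ReBL pre-charged, then compared.
        "read": {"SL": 0.0, "BL": READ_V, "ReBL": READ_V,
                 "effect": "compare BL with ReBL"},
    }
    return table[operation]

print(mma_bias("write0"))
```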

B. MMA Computing Mode

To enable IMC in MMA, the SLs are connected to the ground, the BLs are pre-charged to 0.7V, and the word lines of the two desired rows are activated to start the computing process. With the activation of the two desired rows, two RRAM cells are connected in parallel in each column, and according to the data of each cell, the BL voltage during discharge settles at one of three levels. As shown in Fig. 6, if the data on both cells are “00” (low-low), “01” or “10” (low-high), or “11” (high-high), the voltage on BL will be low (LV), medium (MV), or high (HV), respectively. To further clarify the functionality of IMC in MMA, each logic operation is explained in detail below.

FIGURE 6. Proposed in-memory computing structure in MMA.

1) AND/NAND Logic

To obtain the AND/NAND logic, the reference voltage must be placed between the MV and HV levels. At the same time as the word lines of the desired cells are activated and BL and ReBL are pre-charged, the L2 signal is activated (connecting the AND/NAND reference resistance), and the SA-SRAM detects the desired data by comparing the voltages on BL and ReBL. Finally, the appropriate output result is stored in the memory.

2) OR/NOR Logic

The OR/NOR logic, like AND/NAND, is obtained by setting the appropriate reference resistance that places the ReBL voltage between LV and MV during IMC. The reference resistance in this operation is connected by activating the L3 signal. In this operation, the desired output is generated based on the different voltages on BL and ReBL according to the resistance of their paths.
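The reference placement for the two operations can be checked with a logic-level sketch. The BL levels are abstracted to the ranks LV < MV < HV as described for Fig. 6, and the references are represented by half-integer ranks between them; these numeric ranks and the function name `mma_compute` are illustrative assumptions, not circuit values.

```python
# BL level reached for each pair of stored bits, as described for Fig. 6.
BL_LEVEL = {(0, 0): "LV", (0, 1): "MV", (1, 0): "MV", (1, 1): "HV"}
RANK = {"LV": 0, "MV": 1, "HV": 2}

# Abstract position of the ReBL voltage for each operation:
# AND/NAND (L2 active) sits between MV and HV,
# OR/NOR (L3 active) sits between LV and MV.
REFERENCE_RANK = {"AND": 1.5, "OR": 0.5}

def mma_compute(a, b, operation):
    """Return (result, complement) for 'AND' or 'OR' on two stored bits."""
    bl_rank = RANK[BL_LEVEL[(a, b)]]
    result = 1 if bl_rank > REFERENCE_RANK[operation] else 0
    # The sense amplifier latches complementary nodes, so the inverted
    # function (NAND or NOR) is available at the same time.
    return result, 1 - result

for a in (0, 1):
    for b in (0, 1):
        print(a, b, mma_compute(a, b, "AND"), mma_compute(a, b, "OR"))
```

Moving only the reference, and leaving the array access unchanged, is what switches the same comparison between the AND/NAND and OR/NOR functions.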

C. SA-SRAM Memory Mode

As discussed in Section III, SA-SRAM operates in the SRAM mode by disabling the active-low PC signal (PC=‘1’). In summary, the write operation is accomplished by selecting the required signals (WWLA, WWLB, and WL) for the desired cell, activated by the SRD and SCD decoders. In the read operation, RBL discharges or remains at its pre-charged value based on the data stored in the cell.

D. SA-SRAM Computing Mode

The SA-SRAM cell provides in-memory computations in the SRAM mode, benefiting from the isolated read path due to the decoupled read ports. As illustrated in Fig. 7, the isolated read mechanism makes it possible to perform the OR/NOR, AND/NAND, and XOR/XNOR functions within the SA-SRAM array. In the following, how each of these operations is performed is described.

FIGURE 7. Proposed SRAM-mode IMC structure.

1) AND/NAND Operation

In the AND/NAND operation, the output is ‘1’/‘0’ only if both inputs are ‘1’. By activating two RWLs of the selected cells connected to RBL, RBL remains at its high value only if the Q(QB) nodes of the two activated cells in the same column contain ‘1’ (‘0’) values. Otherwise, the RBL is discharged to the ground, indicating the ‘0’ output. As shown in Fig. 7, a high-skewed inverter (INV1) acts as a sense amplifier for each column gated by the corresponding RBL for fast detection of the output at a certain period. Placing the high-skewed inverter causes the NAND operation to be executed, so a subsequent unskewed inverter (INV2) is needed to perform the AND operation.

2) OR/NOR Operation

As mentioned in the AND/NAND operation, when the RWLs of the cells of interest are activated simultaneously, based on the data stored in these cells, the selected RBL can be expected to discharge to the ground or not.

If two desired cells contain “00”, “01”, or “10”, RBL starts to discharge. However, the discharge rate in the “00” state is much faster than in the “01”/“10” states due to different discharge path resistances. As shown in Fig. 7, placing a low-skewed inverter (INV3) with a switching threshold between the LV and MV regions implements the NOR function. In addition, a cascaded inverter (INV4) is needed to realize the OR function.

3) XOR/XNOR Operation

The XOR operation can be performed straightforwardly by NORing the AND (INV2) and NOR (INV3) outputs.
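The three SRAM-mode operations can be verified together at the logic level. In the sketch below, the RBL is abstracted to three levels at the sampling instant (HV when no cell discharges it, MV when one does, LV when both do); the label HV for the undischarged case and the function names are assumptions made for illustration, and the skewed inverters are reduced to threshold tests.

```python
def rbl_level(q1, q2):
    """RBL level when the RWLs of two cells in one column are activated."""
    # A cell discharges RBL when its QB node holds '1', i.e. when Q is '0'.
    discharging_cells = (1 - q1) + (1 - q2)
    return ("HV", "MV", "LV")[discharging_cells]

def sram_mode_imc(q1, q2):
    """Logic-level view of INV1..INV4 and the XOR stage of Fig. 7."""
    level = rbl_level(q1, q2)
    nand_out = 0 if level == "HV" else 1  # INV1: output low only if RBL stays high
    and_out = 1 - nand_out                # INV2
    nor_out = 1 if level == "LV" else 0   # INV3: threshold between LV and MV
    or_out = 1 - nor_out                  # INV4
    xor_out = 1 - (and_out | nor_out)     # NOR of the INV2 and INV3 outputs
    return {"AND": and_out, "NAND": nand_out, "OR": or_out,
            "NOR": nor_out, "XOR": xor_out, "XNOR": 1 - xor_out}

for q1 in (0, 1):
    for q2 in (0, 1):
        print(q1, q2, sram_mode_imc(q1, q2))
```

The XOR stage works because the AND output is high only for “11” and the NOR output only for “00”, so their NOR is high exactly for the two mixed cases.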

E. Polymorphic Unit (PMU)

A polymorphic gate is a reconfigurable component that can perform various logic functions non-conventionally, boosting processing capacity. Polymorphic structures are attractive because of their flexibility in implementing multiple functions and their ability to prevent security risks such as penetration and reverse engineering. For this purpose, some control signals are provided as inputs to the polymorphic unit in addition to the primary input data. Different logic configurations can be selected for input processing by changing these control signals.

We propose a new PMU design based on the IG-FinFET technology, as shown in Fig. 8. This structure can be configured as a dual-purpose design that performs different simple logic functions (logic mode) or operates as a full adder (full-adder mode). Equations (2) and (3) give $C_{OUT}$ (majority) and SUM in terms of the inputs:\begin{align*} C_{OUT} &= (A + B)\cdot(A + C)\cdot(B + C) \tag{2}\\ {\textit{SUM}} &= (\overline{C_{OUT}} + C)\cdot(\overline{C_{OUT}} + B)\cdot(\overline{C_{OUT}} + A)\cdot(A + B + C) \tag{3}\end{align*}

FIGURE 8. Proposed polymorphic logic design (PMU).

This structure is inherently a full adder with four outputs, where signals A and B are intended as input data signals, and C can be either data or a control signal. In the following, we discuss the operation of each mode separately.

1) Logic Mode

Signal C acts as a control signal in this mode. If we set C to ‘0’ in the proposed PMU, the majority part ($C_{OUT}/\overline{C_{OUT}}$ outputs) acts as a two-input AND/NAND, and the SUM part ($\mathit{SUM}/\overline{\mathit{SUM}}$ outputs) realizes the two-input XOR/XNOR operations. On the other hand, if we set the C signal to ‘1’, the majority part performs OR/NOR operations, and the SUM unit performs XNOR/XOR.

One of the data signals (A or B) should also be treated as a control signal to implement the INV logic. For example, if C and A are set to “01” or “10”, the output of the SUM part gives the inverse of B. Overall, seven different Boolean logic functions can be implemented with the proper setting of the control signals.
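The full-adder relations and the logic-mode behavior can be confirmed with a short truth-table check. The sketch assumes that the $C_{OUT}$ terms inside the SUM product are complemented, since that is the reading under which the expression reproduces a full-adder sum; the function names `pmu`, `logic_mode`, and `invert` are hypothetical.

```python
from itertools import product

def pmu(a, b, c):
    """Return (C_OUT, SUM) from the product-of-sums expressions (2) and (3)."""
    cout = (a | b) & (a | c) & (b | c)  # majority of A, B, C
    ncout = 1 - cout                    # complement of C_OUT
    total = (ncout | c) & (ncout | b) & (ncout | a) & (a | b | c)
    return cout, total

# Full-adder check: 2*C_OUT + SUM must equal A + B + C for all inputs.
for a, b, c in product((0, 1), repeat=3):
    cout, total = pmu(a, b, c)
    assert 2 * cout + total == a + b + c

def logic_mode(a, b, c):
    """Name the functions realized on (A, B) when C is used as a control."""
    cout, total = pmu(a, b, c)
    names = ("AND", "XOR") if c == 0 else ("OR", "XNOR")
    return {names[0]: cout, names[1]: total}

def invert(b):
    """INV via the SUM output with C='0' and A='1' (the "01" setting)."""
    _, total = pmu(1, b, 0)
    return total

print(logic_mode(1, 0, 0), logic_mode(1, 0, 1), invert(1))
```

The complementary outputs of the unit supply NAND, NOR, and the swapped XOR/XNOR pair without further logic, which is how the count of seven functions including INV is reached.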

2) Full-Adder Mode

As the proposed structure is inherently a full adder, the full adder outputs are generated in one cycle if all three inputs are considered data. Suppose the majority output of each polymorphic unit is used as the input C of the next polymorphic unit. In that case, these blocks will form a ripple carry adder (RCA) for more complex calculations.

As demonstrated in Fig. 9, the C input of each block must be driven in different ways to change the configuration mode. Therefore, it is necessary to embed a multiplexer on the C input of each block. If the selector of this multiplexer is set to ‘1’, the logic controller signal is connected to the C input of each block to perform the desired logic operations. On the other hand, if the selector is set to ‘0’, the majority output of the previous block is connected to the current block, which makes the RCA available for processing.

FIGURE 9. RCA structure based on proposed polymorphic logic.
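The multiplexed chain can be sketched as follows. The selector convention (‘1’ for the logic controller, ‘0’ for the carry chain) follows the text; the function names, the LSB-first bit ordering, and the carry-in of the first block are assumptions, and the SUM expression is taken with complemented $C_{OUT}$ terms so that each block behaves as a full adder.

```python
def pmu(a, b, c):
    """One polymorphic block: returns (C_OUT, SUM)."""
    cout = (a | b) & (a | c) & (b | c)
    ncout = 1 - cout
    total = (ncout | a) & (ncout | b) & (ncout | c) & (a | b | c)
    return cout, total

def pmu_chain(a_bits, b_bits, select, control=0, carry_in=0):
    """Run a chain of PMU blocks over two bit vectors (LSB first).

    select=1: every C input receives the logic controller signal `control`.
    select=0: every C input receives the previous block's majority output,
              forming a ripple carry adder (carry_in feeds the first block).
    Returns (sum_bits, cout_bits), both LSB first.
    """
    sum_bits, cout_bits = [], []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        c = control if select == 1 else carry
        cout, total = pmu(a, b, c)
        sum_bits.append(total)
        cout_bits.append(cout)
        carry = cout
    return sum_bits, cout_bits

# 5 + 3 with four blocks in RCA mode (bit lists are LSB first).
sums, couts = pmu_chain([1, 0, 1, 0], [1, 1, 0, 0], select=0)
print(sums, couts[-1])
# Same blocks in logic mode with C='0': bitwise XOR and AND.
print(pmu_chain([1, 0, 1, 0], [1, 1, 0, 0], select=1, control=0))
```

Sharing the same blocks between the two modes is the point of the multiplexer: only the source of the C input changes.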

SECTION VI.

Performance Evaluation

In this section, the performance of the proposed SA-SRAM in the SRAM and logic modes is evaluated. The proposed in-memory architecture and polymorphic logic design are also evaluated and compared with other designs.

A. SRAM Mode

The proposed SA-SRAM cell is simulated in HSPICE using the IG-FinFET model [31], [32] at a nominal $V_{\mathrm{DD}}$ of 0.7V. Some critical parameters of the model are listed in Table 1.

TABLE 1 Important IG-FinFET Device Characteristics

To provide a comparative analysis, comparisons are made with the state-of-the-art SRAM cells, including conventional 8T [9], LP10T [30], ST9T [27], BF12T [28], SEHF11T [29], BP8T [33] and 8+T [34]. To have a fair comparison, the compared cells have also been optimized and simulated using the same FinFET technology.

1) Hold Static Noise Margin (HSNM)

HSNM is the maximum noise voltage level an SRAM cell can endure without data flipping during the hold state. HSNM is calculated by measuring the side length of the largest square that can be fitted within the smaller lobe of the butterfly curve in the hold state. As shown in Fig. 10, the proposed cell has the highest HSNM compared to other SRAM cells due to its symmetric structure, power-gated M1 and M3, and stacked MGL, which enhance the voltage transfer characteristic (VTC) curve.

FIGURE 10. HSNM and RSNM of the proposed and compared cells.

2) Read Static Noise Margin (RSNM)

Like HSNM, RSNM is obtained by finding the side length of the largest square that fits inside the smaller lobe of the butterfly curve in the read state. As shown in Fig. 10, the RSNM of the proposed cell is as high as its HSNM due to the decoupled read path. Read operations in some structures, such as LP10T and ST9T, are conducted by discharging BL through the pull-down network of their cells. Accordingly, during the read operation in these cells, the VTC curves undergo unfavorable changes because voltage division between the access and pull-down transistors degrades RSNM.

3) Combined Word Line Margin (CWLM)

CWLM is the difference between $V_{\mathrm{DD}}$ and the word line (WL) voltage when the WL signal is swept (for the n-type access transistors) from GND to $V_{\mathrm{DD}}$ in the DC analysis. In this work, CWLM is considered the most suitable definition among the various definitions described in [35] to evaluate and compare the writability of SRAM cells. According to the simulation results shown in Fig. 11, the proposed cell has the highest CWLM owing to the separated read and write operations, the pseudo-differential write operation, and the temporary disabling of the feedback of the cross-coupled inverters.

FIGURE 11. CWLM of the proposed and compared cells.

4) Half-Select SNM

The proposed structure is half-select-free due to the decoupled row and column control signals used in writing data to and reading data from the proposed cell. Fig. 12 shows the HSNM of the half-selected SRAM cell. It can be observed that the half-select SNM of the proposed cell is 44% higher than that of BF12T, which is in second place.

FIGURE 12. Half-select SNM of the proposed and compared cells.

5) Read Delay

The read delay time (RDT) is the time from when RWL reaches $V_{\mathrm{DD}}/2$ until the voltage of BL discharges (charges up) to 50% of $V_{\mathrm{DD}}$, depending on the design configuration.

According to the simulation results given in Fig. 13, the delay of the proposed structure is approximately equal to the 8T, LP10T, SEHF11T, and 8+T structures due to the same reading path with two transistors.

FIGURE 13. Comparison of the reading and writing delay of the proposed cell and compared cells.

FIGURE 14. Comparison of the reading and writing energies of the proposed cell and compared cells.

6) Write Delay

The write delay time (WDT) is calculated as the time from when the WL signal reaches $V_{\mathrm{DD}}/2$ until the voltage of the ‘1’ (‘0’) storing node discharges (charges up) to 10% (90%) of $V_{\mathrm{DD}}$, depending on the configuration of the designs.

As shown in Fig. 13, the 8T, BP8T, and 8+T structures have the lowest write delay because of their differential write mechanism. Since the proposed cell utilizes a pseudo-differential write mechanism in the write operation, the time required to write on the cell is much less than in competitive structures ST9T and BF12T.
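The read and write delay definitions above can be applied to sampled waveforms as follows. This is a minimal sketch assuming piecewise-linear waveforms given as Python lists; the function names are hypothetical, and the sample data below is synthetic, not simulation output from the paper.

```python
def crossing_time(times, volts, level, rising):
    """First time the waveform crosses `level`, by linear interpolation."""
    for i in range(len(times) - 1):
        v0, v1 = volts[i], volts[i + 1]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            fraction = (level - v0) / (v1 - v0)
            return times[i] + fraction * (times[i + 1] - times[i])
    raise ValueError("waveform never crosses the requested level")

def read_delay(times, rwl, rbl, vdd=0.7):
    """RWL rising through VDD/2 to RBL falling through 50% of VDD."""
    start = crossing_time(times, rwl, vdd / 2, rising=True)
    stop = crossing_time(times, rbl, vdd / 2, rising=False)
    return stop - start

def write_delay(times, wl, node, vdd=0.7, node_falls=True):
    """WL rising through VDD/2 to the storage node reaching 10% (falling)
    or 90% (rising) of VDD."""
    start = crossing_time(times, wl, vdd / 2, rising=True)
    target = 0.1 * vdd if node_falls else 0.9 * vdd
    stop = crossing_time(times, node, target, rising=not node_falls)
    return stop - start

# Synthetic example (arbitrary time units).
t = [0, 1, 2, 3, 4]
print(read_delay(t, [0.0, 0.7, 0.7, 0.7, 0.7], [0.7, 0.7, 0.7, 0.0, 0.0]))
```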

7) Energy Consumption

As there is always a trade-off between delay and power consumption in VLSI circuits, PDP, the product of delay and power, is a suitable metric for comparing cells’ performance in reading and writing modes. The simulation results show that the proposed design has nearly the same PDP in read operation as the conventional 8T, BP8T, 8+T, and SEHF11T. Also, the proposed cell has the lowest write PDP compared to the SRAM cell structures mentioned.

8) Area

The area of the proposed reconfigurable SRAM sense amplifier is also compared to the well-known SRAM cells in Fig. 15. According to the results, the area of the proposed cell lies in the middle of the range of the other designs. However, it should be considered that the reliability and half-select issues, which are essential in IMC, are solved in the proposed design. In contrast, the 8T, BP8T, and 8+T SRAM cells suffer from these problems.

FIGURE 15. Comparison of the area of the proposed cell and compared cells.

Moreover, it is worth mentioning that the main application of the proposed reconfigurable SRAM sense amplifier is to read data from the RRAM-based main memory array. However, in high-performance applications, it can be turned into an SRAM cell with high reliability, which is critical in IMC.

9) Process Variation Evaluation

To estimate the robustness and reliability of the proposed cell, Monte Carlo simulations are performed in HSPICE to explore the effect of process variations. The channel length (\text{L}_{\text {g}} ) and silicon (fin) thickness (\text{T}_{\text {si}} ) are the critical parameters; both are modeled by a Gaussian distribution with a variation of 7% [36], [37]. The simulation results indicate no failure in any operational mode of the proposed cell under process variations. The normalized yield, calculated by dividing the mean value by sigma, is given in Table 2. As can be seen, the yield of the proposed design in HSNM and RSNM is sufficient for the reliability requirements, and its CWLM provides the highest yield among the compared cells.
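The normalized yield (mean divided by sigma) can be sketched as a short post-processing step over Monte Carlo samples. This is a minimal sketch under stated assumptions: the function name is hypothetical, the sample values below are placeholders rather than results from the paper, and the use of the sample (n-1) standard deviation is an assumption, since the text does not say which estimator is used.

```python
import statistics
from typing import Sequence


def normalized_yield(samples: Sequence[float]) -> float:
    """Return mu / sigma for a set of Monte Carlo samples of one metric."""
    mu = statistics.mean(samples)
    # Sample (n-1) standard deviation; the estimator choice is an assumption.
    sigma = statistics.stdev(samples)
    if sigma == 0:
        raise ValueError("sigma is zero; normalized yield is undefined")
    return mu / sigma


# Placeholder noise-margin samples in mV (hypothetical, not from the paper).
hypothetical_margin_mv = [298.0, 305.0, 301.0, 296.0, 303.0]
print(normalized_yield(hypothetical_margin_mv))
```

A larger value means the metric sits more standard deviations away from zero, which is why a higher normalized yield is read as better robustness.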

TABLE 2 The Normalized Yield of the Proposed Design and Compared Cells in Different Operations

B. Main Memory Evaluation

The simulation results for the in-memory computing implementation in the main memory are shown in Fig. 16. In addition to the functional evaluation of the cell, its validity and reliability have been verified by running Monte Carlo simulations that account for process variation. Besides the critical parameters of the FinFETs, the V0, I0, and \gamma _{0} parameters of the RRAMs are varied so that the RRAM resistance deviates by ±20% from its nominal value [25], [38]. According to the results, no failure occurred under process variations; consequently, the in-memory computing process can be implemented with high reliability in the main memory.

FIGURE 16. - Simulation results of AND/NAND and OR/NOR of the main memory in the presence of process variation.

Note that as the number of operations performed in the RRAM increases, read disturb may become a critical concern in practice. To address this issue, the HRS of the RRAM needs to be set to a high resistance [39]. In the proposed design, setting the HRS to 1 \text{M}\Omega and the LRS to 10 \text{k}\Omega ensures that read disturb does not affect the functionality of the computing mode.

C. PMU Evaluation

We evaluate the proposed PMU design from different perspectives to demonstrate its efficiency and performance as a security-processing gate for replacing conventional designs.

1) Complexity Factor Analysis

As shown in Table 3, four polymorphic logic designs are compared with the proposed PMU using criteria such as the transistor count (TC), the number of implementable functions (FC), and the complexity factor (TC/FC), which is calculated by dividing the number of transistors by the number of implementable functions. The PMU can accomplish seven Boolean logic functions (OR/NOR, AND/NAND, XOR/XNOR, INV) using just 20 transistors. By realizing seven different Boolean functions, the proposed design achieves the lowest complexity factor among the compared structures.
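As a worked instance of the complexity factor for the proposed PMU, using only the transistor count (20) and function count (7) stated above; the symbol $\mathrm{CF}$ is shorthand introduced here:

```latex
\mathrm{CF}_{\mathrm{PMU}} = \frac{\mathrm{TC}}{\mathrm{FC}} = \frac{20}{7} \approx 2.86
```

That is, the PMU spends fewer than three transistors per implementable function.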

TABLE 3 Comparison of Complexity Factors of Different Structures
TABLE 4 Average Performance Evaluation of the Implemented Filters Using Different in-Memory Architecture Designs

2) Process Variation Analysis

To evaluate the functionality of the proposed PMU in the presence of process variation, Monte Carlo simulations have been performed. No failure occurred in the outputs of the proposed design, as shown in Fig. 17.

FIGURE 17. - The transient response of different functions in the proposed PMU in the presence of process variation.

D. Architecture Evaluation

To evaluate the proposed architecture in real-world applications, OCR preprocessing using min/max filters, a median filter, and a neural network are implemented. The median filter requires a 3\times 3 sliding window that collects the data and selects the median value among them.

This can be done by sorting the data in rows, then in columns, and finally sorting the sub-diameter (anti-diagonal) of the 3\times 3 sliding window. The median is the fifth value, located at the center of the sorted 3\times 3 sliding window (Fig. 18) [3]. Moreover, after sorting, the min/max filter is implemented by replacing the center of the 3\times 3 sliding window with the minimum/maximum value. The building block for sorting is an 8-bit comparator, which can be built from the basic logic gates implemented in the proposed architecture.
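The row, column, and sub-diameter sorting steps can be sketched in software as follows. This is a minimal behavioural sketch, not the in-memory implementation: the function names are hypothetical, Python's built-in `sorted` stands in for the comparator-based sorting, and the sub-diameter is taken to be the anti-diagonal of the window.

```python
from typing import List, Tuple

Window = List[List[int]]


def sort_rows_then_columns(window: Window) -> Window:
    """Steps (a) and (b): sort each row, then sort each column."""
    rows = [sorted(r) for r in window]
    cols = [sorted(c) for c in zip(*rows)]
    # Transpose back so the result is indexed as grid[row][col].
    return [list(r) for r in zip(*cols)]


def median_3x3(window: Window) -> int:
    """Steps (c) and (d): sort the anti-diagonal and take its middle value."""
    g = sort_rows_then_columns(window)
    anti_diagonal = sorted([g[2][0], g[1][1], g[0][2]])
    return anti_diagonal[1]


def min_max_3x3(window: Window) -> Tuple[int, int]:
    """After steps (a) and (b), the corners hold the minimum and maximum."""
    g = sort_rows_then_columns(window)
    return g[0][0], g[2][2]
```

After the row and column passes, the anti-diagonal holds the largest of the row minima, the median of the row medians, and the smallest of the row maxima; the median of those three values is the median of all nine, which is why only three more elements need to be sorted.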

FIGURE 18. - Finding median value in $3\times 3$ sliding window (a) row sorting (b) column sorting (c) sub-diameter sorting (d) median value (fifth data).

In the proposed architecture, in the first cycle, the data stored in each row of the MMA are sorted using an 8-bit magnitude comparator implemented by the MMA in computing mode. The sorted data of each row are then stored in the reconfigurable SRAM array (Fig. 5). This is done by configuring each reconfigurable SRAM row as a sense amplifier that reads the computed data from the MMA and stores the value inside itself. In the second cycle, all columns of the 3\times 3 sliding window stored in the reconfigurable SRAM array are sorted using the computing mode of the reconfigurable SRAM array, and the results are again stored in the reconfigurable SRAM array. This considerably decreases the power consumption and computing time of the median filter calculation, because writing the sorted data back into the RRAM-based MMA consumes much more power than storing them in the SRAM array.
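An 8-bit magnitude comparator of the kind used for sorting can be sketched at the gate level using only AND, OR, XOR, and inversion. This is an illustrative sketch under assumptions: the MSB-first ripple structure and the function names are choices made here, and the paper's actual comparator circuit may be organized differently.

```python
from typing import Tuple


def greater_than_8bit(a: int, b: int) -> bool:
    """Return True if a > b, scanning bits from MSB to LSB with basic gates."""
    gt = 0  # becomes 1 once a higher-order bit decides a > b
    eq = 1  # stays 1 while all higher-order bits are equal
    for i in range(7, -1, -1):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        gt = gt | (eq & ai & (bi ^ 1))   # OR, AND, INV
        eq = eq & ((ai ^ bi) ^ 1)        # AND, XNOR
    return bool(gt)


def compare_exchange(a: int, b: int) -> Tuple[int, int]:
    """Return (smaller, larger); the basic step of a sorting pass."""
    return (b, a) if greater_than_8bit(a, b) else (a, b)
```

Sorting the three values of one row then takes three compare-exchange steps, for example on positions (0, 1), (1, 2), and (0, 1).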

Fig. 19 and Fig. 20 show the functionality of the proposed architecture when implementing these filters. To evaluate the proposed architecture in a more practical application, a neural network (NN) is implemented (Fig. 21) [48]. The energy consumption of the proposed architecture is first extracted from circuit-level HSPICE simulations. The NN is then modeled in MATLAB, and the extracted data are used to evaluate the overall energy consumption of the entire network.
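The two-step evaluation flow (per-operation energies from circuit simulation, then a network-level model) can be sketched as a simple weighted sum. This is a hedged illustration only: the original flow uses MATLAB, the function name is hypothetical, and the operation names and numbers in the usage example are placeholders rather than values from the paper.

```python
from typing import Dict


def total_energy(op_counts: Dict[str, int],
                 energy_per_op: Dict[str, float]) -> float:
    """Sum count * energy over all operation types used by the network."""
    missing = set(op_counts) - set(energy_per_op)
    if missing:
        raise KeyError(f"no energy value for operations: {sorted(missing)}")
    return sum(n * energy_per_op[op] for op, n in op_counts.items())


# Placeholder inputs (hypothetical): operation counts for one inference and
# per-operation energies that would come from circuit-level simulation.
hypothetical_counts = {"xnor": 1000, "rram_read": 50}
hypothetical_energy_fj = {"xnor": 1.0, "rram_read": 10.0}
print(total_energy(hypothetical_counts, hypothetical_energy_fj))
```

The model makes explicit why reducing the number of reads from the RRAM array lowers the total: each avoided read removes one of the more expensive terms from the sum.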

FIGURE 19. - Median filter implementation to remove the salt-and-pepper noise (a) original image (b) denoised image.

FIGURE 20. - Preprocessing in OCR (a) original image (b) Max filter (removing noise and erosion), (c) Min Filter (dilation).

FIGURE 21. - The BNN accelerator architecture.

As shown in Table 5, the energy consumption of the proposed method is lower than that of the other structures for two reasons: the proposed reconfigurable SRAM sense amplifier eliminates the extra data reads from the MMA, and the reconfigurable SRAM itself has a low computing energy consumption.

TABLE 5 Comparing Energy Consumptions of the BNNs Implemented Using Different in-Memory Architectures

SECTION VII.

Conclusion

In-memory processing is a promising paradigm that improves throughput and energy consumption, especially in data-intensive applications. To this end, we have proposed a hybrid architecture that combines a new IG-FinFET-based reconfigurable SRAM array with a nonvolatile RRAM array, with in-memory computing capability in both structures. The proposed reconfigurable SRAM array can be configured as a sense amplifier to read data from the RRAM memory, or as an SRAM cell that supports in-memory computation in addition to data storage. Moreover, the proposed reconfigurable SRAM cell has a high static noise margin and is free of the half-select issue, which is essential for memory design. The simulation results indicate that, owing to the half-select-free feature of the proposed cell, its half-select static noise margin is equal to its HSNM, which is a remarkable advantage in the in-memory computation process. Moreover, improvements of 50% in write energy consumption and 20% in CWLM have been achieved compared to the 8T SRAM cell. In addition, because the sense amplifier feature is built into the proposed SRAM cell, the reconfigurable SRAM array can directly read and store raw or processed data from the RRAM unit. In the proposed hybrid in-memory architecture, AND/NAND and OR/NOR operations can be performed in the RRAM main memory, and all two-input Boolean functions can be calculated in the reconfigurable SRAM array. Furthermore, we have embedded a new polymorphic structure inside the hybrid memory architecture as a security-computing unit to enhance the computation capability inside the memory.
