Introduction
With the increasing need for high-performance processing in applications such as artificial intelligence, neural networks, search engines, biological systems, and image processing, the von Neumann architecture faces a critical challenge known as the memory wall [1], [2]. Transferring large amounts of data back and forth between a separate processor and memory leads to severe energy consumption, high latency, and I/O congestion [3], [4]. One of the most promising approaches for alleviating these challenges is in-memory computing (IMC), which provides computing capability inside the memory [5], [6]. IMC can perform simple computational tasks to reduce memory-processor data transfers [7]. In other words, the memory cells carry out normal read/write operations and also perform simple logical calculations within the memory to bypass the von Neumann bottleneck [8]. Various studies have investigated IMC platforms based on volatile and nonvolatile memories. Dynamic random-access memory (DRAM) and static random-access memory (SRAM) are used for volatile in-memory computing designs, and emerging technologies such as magnetic RAM (MRAM) and resistive RAM (RRAM) are used in the nonvolatile area [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20]. The advantages of volatile memories are their widespread usage and their fast read/write operations. Moreover, these memories can implement short-term memory in neuromorphic computing [21]. However, the high power consumption of DRAMs and the large area of SRAMs may limit these designs [17]. In contrast, low power dissipation, dense integration, and nonvolatility are the main advantages of emerging technologies for the IMC paradigm. On the other hand, these memories suffer from high read/write delay and reliability issues due to low endurance [8].
In [9], an 8T SRAM cell for in-memory computing capable of performing simple Boolean computation was proposed. Although the latency is reduced because of the separated read/write operations and in-memory Boolean logic calculations, write power consumption and reliability concerns in half-selected cells are significant issues in this design. Moreover, this design is not suitable for performing complex Boolean logic functions.
In [18], an 8T SRAM cell with three ports for four operands to perform complex Boolean functions was proposed. In one cycle, this design can perform three Boolean operations (XOR/XNOR, AND/NAND, and OR/NOR). However, the static noise margin (SNM) of the SRAM cells is considerably reduced at the nanoscale. Moreover, the area overhead and volatility are the critical bottlenecks of this design.
In [19], nonvolatile hybrid MTJ/CMOS logic gates capable of performing Boolean logic functions for in-memory computation were proposed. Although this design shows acceptable performance compared to benchmarks, the delay in performing Boolean computation is considerably higher than in CMOS-based designs. Moreover, this design is not suitable for applications that need real-time computations.
To address this, [21] proposed a spintronic/CMOS memory, including volatile DRAM and nonvolatile MRAM cells, to cope with the requirements of real-time and non-real-time applications. However, this design only covers some Boolean logic functions and is unsuitable for complex calculations. Moreover, the write energy and delay of this design are considerably higher than those of CMOS designs.
None of these designs combines the fast read/write of volatile DRAM (or SRAM) designs with the low power consumption and dense integration of nonvolatile emerging technologies (RRAM/MRAM). In addition, using a single circuit to implement multiple functions reduces chip area and forms a primary barrier against security threats such as reverse engineering or IC counterfeiting. These features are provided by polymorphic circuits [22]. Such a reconfigurable circuit uses controllable factors to implement various functions. For example, signals such as voltage and temperature can configure a polymorphic circuit to perform multiple functions [19]. Placing such circuits in a memory unit increases its operational capacity and also enhances its security.
This paper proposes a novel hybrid SRAM/RRAM-based in-memory computing architecture capable of performing all Boolean logic functions. The SRAM array is built from the proposed reconfigurable 11T SRAM cell, which can read data from the RRAM-based main memory in the sense-amplifier mode and, in the SRAM mode, can perform all Boolean logic functions in addition to storing data. Moreover, the proposed SRAM cell has a high static noise margin and is free of the half-select issue, both of which are critical in memory design.
To the best of our knowledge, this is the first time that SRAM and RRAM memory arrays have been combined to meet the needs of both real-time and non-real-time applications. The proposed hybrid architecture uses the RRAM-based memory for non-real-time applications and the reconfigurable SRAM array for real-time applications. Moreover, this design is a strong candidate for implementing neuromorphic computing, because the memory architecture in neuromorphic computing is inspired by biological memory, which needs short-term and long-term memory elements [21]. These can be modeled by the SRAM (short-term) and RRAM (long-term) memory arrays in the proposed architecture.
The rest of the paper is organized as follows: Section II reviews the fundamentals of RRAM. Section III describes the proposed SRAM cell. Section IV describes the proposed reconfigurable SRAM array. Section V discusses the proposed reconfigurable SRAM array and the hybrid process in-memory architecture. The proposed hybrid architecture is evaluated and compared with its state-of-the-art counterparts in Section VI, and finally, Section VII concludes the paper.
Resistive Random-Access Memory (RRAM)
RRAM is a two-terminal device comprising an oxide layer sandwiched between two metal layers. By applying a voltage across its terminals, RRAM’s resistance can change between the low-resistance state (LRS) and the high-resistance state (HRS). Despite RRAM being a memory element, its influence has extended beyond memory design to logic circuits and computing systems. Furthermore, the use of RRAM in next-generation nonvolatile memories has been touted due to its near-zero leakage power consumption, low read/write voltage, fast switching speed, and excellent scalability [23]. Fig. 1 shows the physical structure of the RRAM device.
Applying different voltages to the top (TE) and bottom (BE) metal electrodes can form or rupture a path of oxygen vacancies inside the oxide layer (OL), which acts as a conductive filament (CF). In addition, the RRAM resistance is determined by the gap distance (g) between TE and the apex of the CF [24].
The I-V characteristic of an RRAM can be expressed as:\begin{equation*} I = I_{0}\, e^{-g/g_{0}} \sinh\left(\frac{V}{V_{0}}\right) \tag{1}\end{equation*}
A positive voltage should be applied to the RRAM to switch it from HRS to LRS (SET). Conversely, switching from LRS to HRS (RESET) is achieved by applying a negative voltage [25]. It is worth mentioning that the RRAM device can be fabricated on top of the transistors [26].
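As a rough numerical illustration of Eq. (1), the following Python sketch evaluates the current for a narrow and a wide gap. The parameter values (`i0`, `g0`, `v0`) and the two gap distances are placeholders chosen only for illustration; they are not taken from this paper or from a calibrated device model.

```python
import math

def rram_current(v, g, i0=1e-3, g0=0.25e-9, v0=0.25):
    """Eq. (1): I = I0 * exp(-g / g0) * sinh(V / V0).

    v: voltage across the device (V); g: gap between TE and the filament tip (m).
    i0, g0, v0 are fitting parameters; the defaults are illustrative guesses.
    """
    return i0 * math.exp(-g / g0) * math.sinh(v / v0)

# A narrow gap stands for the LRS, a wide gap for the HRS (assumed values).
G_LRS, G_HRS = 0.2e-9, 1.7e-9
V_TEST = 0.7  # assumed test voltage

i_lrs = rram_current(V_TEST, G_LRS)
i_hrs = rram_current(V_TEST, G_HRS)
print(f"I_LRS = {i_lrs:.3e} A, I_HRS = {i_hrs:.3e} A, ratio = {i_lrs / i_hrs:.1f}")
```

The exponential dependence on g is what separates the two resistance states: a small change in the gap produces a large change in current at the same voltage.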
Proposed SRAM Cell in the Memory Mode
This section proposes a half-select-free 11T SRAM cell (HF11T) using independent-gate FinFETs. As shown in Fig. 2, two back-to-back inverters ((M1, M2) and (M3, M4)) keep the data in the storage nodes. Two feedback-cutting pMOS transistors (M5 and M6) are placed between the cross-coupled inverters to eliminate the conflict between the pull-up and access devices during the write operation. To write data into the proposed cell, a ‘0’ is transferred from the ground through the shared transistor M11 and either M7 or M8 to the corresponding storage node, Q or QB, respectively. Moreover, in the read operation, the M9 and M10 transistors provide a decoupled read path through the read bit-line (RBL).
To describe the functionality of the proposed design, each operational mode is discussed separately in detail.
A. Hold State
During the hold state, the WWLA, WWLB, and WL control signals are forced to ‘0’, turning the M7, M8, and M11 transistors OFF. Moreover, as the front and back gates of M5 and M6 are completely active, the required feedback for maintaining data in the cell is established.
B. Read Operation
To perform the read operation, the RBL signal is pre-charged to
C. Write Operation
The write operation of the proposed SRAM cell is performed by asserting WL together with the WWLA or WWLB signal. To write data into the cell, a path from the ground node to the Q or QB node must be established. For instance, suppose Q and QB store ‘1’ and ‘0’, respectively. To write ‘0’ to the Q node, WL and WWLA are asserted to ‘1’. Therefore, M7 and M11 become completely activated, forcing Q to ‘0’. Meanwhile, asserting WL and WWLA turns M5 off, temporarily eliminating the feedback path. Hence, ‘0’ is written to the floating Q node without contention. Simultaneously, M6 remains partially active because its front gate is connected to WWLB, which is held at ‘0’. Therefore, M6 can still pass the strong ‘1’ from the left inverter output to the gate of the right inverter (QB), as shown in Fig. 2. Consequently, when WWLA and WL return to ‘0’, the new data is retained in the cell. Similarly, writing ‘0’ to node QB is executed by asserting the WWLB and WL signals to ‘1’.
D. Half-Selection Issue
One of the most important considerations in the design of an SRAM cell is avoiding unwanted writes to the cells that share a row or column with the cell being written or read, known as the half-select issue. This issue is entirely addressed in our proposed cell. To write ‘0’ to an arbitrary cell, that cell’s WL and WWLA signals should be activated. These signals also reach the unselected cells in the row and column of the desired cell. In any row cell with only the WL signal activated, transistors M5 and M6 partially turn ON, and the feedback path remains active as a barrier against noise. In any column cell with only the WWLA signal activated, the M5 transistor is still ON, and the feedback paths are active. Therefore, the floating problem existing in the cells proposed in [27], [28], and [29] is solved, and the contention causing unnecessary power dissipation in cells like [9] and [30] is eliminated.
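The hold, write, and half-select cases above can be summarized with a small behavioral sketch in Python. It abstracts each independent-gate pMOS as fully ON, partially ON, or OFF depending on how many of its two gates are low, and it assumes, based on the description above, that M5 is controlled by WWLA and WL and M6 by WWLB and WL. The function names are hypothetical and the model ignores all analog effects.

```python
def ig_pmos_state(front_gate, back_gate):
    """Conduction of an independent-gate pMOS: each low gate enables one channel."""
    low_gates = (front_gate == 0) + (back_gate == 0)
    return ("OFF", "PARTIAL", "ON")[low_gates]

def feedback_states(wl, wwla, wwlb):
    """States of the feedback transistors M5 and M6.

    Gate wiring assumed from the text: M5 is driven by WWLA and WL,
    M6 by WWLB and WL.
    """
    return {"M5": ig_pmos_state(wwla, wl), "M6": ig_pmos_state(wwlb, wl)}

cases = {
    "hold":                 (0, 0, 0),
    "write '0' to Q":       (1, 1, 0),
    "row half-selected":    (1, 0, 0),
    "column half-selected": (0, 1, 0),
}
for name, (wl, wwla, wwlb) in cases.items():
    print(f"{name:22s} {feedback_states(wl, wwla, wwlb)}")
```

In this abstraction the feedback path is cut (M5 OFF) only when both WL and WWLA are asserted, that is, only in the selected cell; every half-selected cell keeps at least a partially conducting feedback device.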
Proposed Reconfigurable SRAM Cell for In-Memory Computing
To meet the in-memory computing requirements of application areas such as optical character recognition (OCR), the proposed SRAM cell is redesigned so that it can act either as a sense amplifier for reading data from a nonvolatile memory or as a volatile memory element (SRAM mode). Accordingly, as shown in Fig. 3, the back gates of the pull-up transistors (M1 and M3) are connected to the PC control signal. An extra n-type transistor (MG) is placed between M2/M4 and ground to float the storage nodes and pre-charge them to prepare the cell for sensing the data stored in the nonvolatile memory. Meanwhile, the MRR and MRL transistors assess the pre-charged nodes for reading data from the nonvolatile memory.
The proposed cell can be configured as either a fast memory element (SRAM mode) to perform a normal read operation in real-time applications by disabling the PC and Read-EN control signals or a sense amplifier (sense amplifier mode) for in-memory computing by enabling PC followed by Read-EN.
The sense amplifier mode starts by enabling the PC control signal, which enables the back gates of the pull-up transistors to pre-charge the storage nodes and disables the MG transistor to eliminate the path to the ground. Then, by asserting the Read-EN signal, the storage nodes (Q and QB) are connected to the memory element (RRAM) and the reference resistance, respectively, as shown in Fig. 4. Depending on the difference between the resistance of the RRAM cell and the reference resistance, the path with the lower resistance discharges faster. Then, when the PC control signal is disabled, the cross-coupled inverters are activated, and the node on the lower-resistance path is pulled to the ground. After the PC signal is disabled, the Read-EN signal is disabled, and the desired data is kept in the cell.
Proposed Process In-Memory Architecture
This section presents a new hybrid memory architecture, including an RRAM-based main memory array (MMA), a reconfigurable SRAM array (SA-SRAM), and polymorphic logic units (PMU), as shown in Fig. 5. The MMA row (MMRD) and source line (SLD) decoders are used to select cells for reading, writing, or computing operations. The SA-SRAM unit can be configured in SRAM mode and used as a sense amplifier to read the MMA data and then switch back to SRAM mode to maintain the reading data.
It is worth mentioning that the SA-SRAM cells can perform in-memory computing besides the storage mode. Switching between the SRAM and sense amplifier modes is done by the mode decoder (MD) unit, which provides the PC signals. In the meantime, the read-enable decoder (RED) contributes to the communication between the MMA and SA-SRAM units. In the SRAM mode, the column (SCD) and row (SRD) decoders are utilized for reading from and writing to the SA-SRAM cells. The SA-SRAM output can be maintained in the latch unit presented in the sense amplifier unit (SAU) to provide proper inputs for PMUs. PMUs act as security-processing blocks to increase the computation capability in this architecture, designed to handle real-time and non-real-time applications. Each operational mode is discussed in detail in the following subsections to clarify the function of the array architecture.
A. MMA Memory Mode
In MMA, the data stored in a memory cell is represented by the resistance of RRAM. To write a bit into a memory cell, the corresponding source line (SL) and bit-line (BL) are connected to the respective voltages to modify the stored data. For example, to write ‘0’ into a cell, the corresponding SL should be grounded, and BL should be connected to the write voltage (2V). Then, the RRAM cell will be reset by enabling the corresponding DWL signal. On the other hand, for writing ‘1’ into a cell, BL should be grounded, and SL should be connected to the write voltage (2V). Then, the RRAM cell will be set by asserting the DWL signal.
The read process is accomplished by connecting SL to the ground and pre-charging BL and the reference bit line (ReBL) to 0.7V. ReBL is connected to the reference resistance through a transistor connected to the L1 signal, as shown in Fig. 5. For instance, when the word line of the desired cell and L1 are asserted, BL and ReBL are discharged at different rates depending on the resistance of the corresponding cell. Since an SA-SRAM is configured in the sense amplifier mode, it acts as a comparator. To this end, the discharge voltage in BL is compared with the discharge voltage in the ReBL for a certain period, and the desired data is detected and stored.
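The comparison between BL and ReBL can be illustrated with a first-order RC discharge model in Python. This is only a sketch: the line capacitance, the sensing instant, and the three resistance values are assumptions chosen for illustration, and the access transistors are ignored. Only the 0.7 V pre-charge level comes from the text.

```python
import math

def line_voltage(t, r_path, c_line=20e-15, v_pre=0.7):
    """Voltage of a pre-charged line discharging to ground through r_path."""
    return v_pre * math.exp(-t / (r_path * c_line))

def cell_resistance_is_low(r_cell, r_ref, t_sense=1e-9):
    """True if BL has dropped below ReBL at the sensing instant."""
    return line_voltage(t_sense, r_cell) < line_voltage(t_sense, r_ref)

# Illustrative resistances (assumptions), with the reference between the two states.
R_LRS, R_HRS, R_REF = 10e3, 1e6, 100e3

for label, r in (("LRS", R_LRS), ("HRS", R_HRS)):
    v_bl = line_voltage(1e-9, r)
    v_rebl = line_voltage(1e-9, R_REF)
    print(f"{label}: BL = {v_bl:.3f} V, ReBL = {v_rebl:.3f} V, "
          f"low-resistance cell detected: {cell_resistance_is_low(r, R_REF)}")
```

Because the reference resistance sits between the two cell states, the sign of the BL/ReBL difference at the sensing instant is enough for the comparator to resolve the stored state.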
B. MMA Computing Mode
To enable IMC in MMA, SLs are connected to the ground, the BLs are pre-charged to 0.7V, and the word lines of the two desired rows are activated to start the computing process. With the activation of the two desired rows, two RRAM cells become parallel in each column, and according to the data of each cell, the discharge voltage of BL can be stable at three voltage levels. As shown in Fig. 6, if the data on both cells are “00” (low-low), “01” or “10” (low-high), or “11” (high-high), the voltage on BL will be low (LV), medium (MV), or high (HV), respectively. To further clarify the functionality of IMC in MMA, the structure is explained in detail.
1) AND/NAND Logic
To obtain the AND/NAND logic, the reference voltage must be placed between the MV and HV regions. When the word lines of the desired cells are activated and BL and ReBL are pre-charged, the L2 signal is activated at the same time to connect the AND/NAND reference resistance, and SA-SRAM detects the desired data by comparing the voltages on BL and ReBL. Finally, the output result is stored in the memory.
2) OR/NOR Logic
The OR/NOR logic, like AND/NAND, is obtained by setting the appropriate reference resistance that places the ReBL voltage between LV and MV during IMC. The reference resistance in this operation is connected by activating the L3 signal. In this operation, the desired output is generated based on the different voltages on BL and ReBL according to the resistance of their paths.
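The two reference placements can be checked with a short Python sketch. The normalized voltage levels and reference values below are assumptions; only their ordering (LV < OR reference < MV < AND reference < HV) follows the text. The sketch also assumes that the comparator provides the result and its complement, which is how the AND/NAND and OR/NOR pairs are represented here.

```python
# Normalized BL levels for two parallel cells (values assumed, ordering from the text).
BL_LEVEL = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.8}  # LV, MV, MV, HV

REF_AND = 0.65  # between MV and HV, selected by L2
REF_OR = 0.35   # between LV and MV, selected by L3

def mma_compute(a, b, v_ref):
    """Return (out, out_bar): out is 1 when BL stays above the reference."""
    out = int(BL_LEVEL[(a, b)] > v_ref)
    return out, 1 - out

for a in (0, 1):
    for b in (0, 1):
        and_, nand = mma_compute(a, b, REF_AND)
        or_, nor = mma_compute(a, b, REF_OR)
        print(a, b, "AND", and_, "NAND", nand, "OR", or_, "NOR", nor)
```

The same array and comparator therefore produce either function; only the reference selected by L2 or L3 changes.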
C. SA-SRAM Memory Mode
As discussed in Section III, SA-SRAM operates in the SRAM mode by disabling the active-low PC signal (PC=‘1’). In summary, the write operation is accomplished by selecting the required signals (WWLA, WWLB, and WL) for the desired cell, activated by the SRD and SCD decoders. In the read operation, RBL discharges or remains at its pre-charged value based on the data stored in the cell.
D. SA-SRAM Computing Mode
The SA-SRAM cell provides in-memory computations in the SRAM mode, benefiting from the isolated read path due to the decoupled read ports. As illustrated in Fig. 7, the isolated read mechanism makes it possible to perform the OR/NOR, AND/NAND, and XOR/XNOR functions within the SA-SRAM array. In the following, how each of these operations is performed is described.
1) AND/NAND Operation
In the AND/NAND operation, the output is ‘1’/‘0’ only if both inputs are ‘1’. By activating two RWLs of the selected cells connected to RBL, RBL remains at its high value only if the Q(QB) nodes of the two activated cells in the same column contain ‘1’ (‘0’) values. Otherwise, the RBL is discharged to the ground, indicating the ‘0’ output. As shown in Fig. 7, a high-skewed inverter (INV1) acts as a sense amplifier for each column gated by the corresponding RBL for fast detection of the output at a certain period. Placing the high-skewed inverter causes the NAND operation to be executed, so a subsequent unskewed inverter (INV2) is needed to perform the AND operation.
2) OR/NOR Operation
As mentioned in the AND/NAND operation, when the RWLs of the cells of interest are activated simultaneously, based on the data stored in these cells, the selected RBL can be expected to discharge to the ground or not.
If two desired cells contain “00”, “01”, or “10”, RBL starts to discharge. However, the discharge rate in the “00” state is much faster than in the “01”/“10” states due to different discharge path resistances. As shown in Fig. 7, placing a low-skewed inverter (INV3) with a switching threshold between the LV and MV regions implements the NOR function. In addition, a cascaded inverter (INV4) is needed to realize the OR function.
3) XOR/XNOR Operation
The XOR operation can be performed straightforwardly by NORing the AND (INV2) and NOR (INV3) outputs.
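The inverter chain described in this subsection can be verified at the Boolean level with a few lines of Python. The sketch models only the logic outcome of each skewed inverter, not the discharge timing, and it assumes XNOR is obtained by inverting the XOR output. The function name is hypothetical.

```python
def sa_sram_logic(q1, q2):
    """Boolean outputs for two cells read together on one RBL.

    q1, q2 are the stored bits. RBL stays high only for "11"; the "00"
    case discharges fastest, which the low-skewed inverter detects.
    """
    rbl_stays_high = q1 == 1 and q2 == 1
    nand = int(not rbl_stays_high)   # INV1 (high-skewed) output
    and_ = 1 - nand                  # INV2
    nor = int(q1 == 0 and q2 == 0)   # INV3 (low-skewed) output
    or_ = 1 - nor                    # INV4
    xor = int(not (and_ or nor))     # NOR of the AND and NOR outputs
    return {"AND": and_, "NAND": nand, "OR": or_, "NOR": nor,
            "XOR": xor, "XNOR": 1 - xor}

for q1 in (0, 1):
    for q2 in (0, 1):
        print(q1, q2, sa_sram_logic(q1, q2))
```

NORing the AND and NOR outputs works because those two signals are high exactly for the "11" and "00" cases, so their NOR is high only when the inputs differ.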
E. Polymorphic Unit (PMU)
A polymorphic gate is a reconfigurable component that can perform various logic functions in a non-conventional way, boosting processing capacity. Polymorphic structures are attractive because of their flexibility in implementing multiple functions and their ability to mitigate security risks such as penetration and reverse engineering. For this purpose, the polymorphic unit takes some control signals as inputs in addition to the primary input data. Different logic configurations can be selected for input processing by changing these control signals.
We propose a new PMU design based on IG-FinFET technology, as shown in Fig. 8. This structure can be configured as a dual-purpose design that performs different simple logic functions (logic mode) or acts as a full adder (full-adder mode). Equations (2) and (3) give the outputs of this structure in terms of its inputs A, B, and C: \begin{align*} C_{OUT} &= (A + B)\cdot(A + C)\cdot(B + C) \tag{2}\\ \textit{SUM} &= (\overline{C_{OUT}} + C)\cdot(\overline{C_{OUT}} + B)\cdot(\overline{C_{OUT}} + A)\cdot(A + B + C) \tag{3}\end{align*}
This structure is inherently a full adder with four outputs, where signals A and B are intended as input data signals, and C can be either data or a control signal. In the following, we discuss the operation of each mode separately.
1) Logic Mode
Signal C acts as a control signal in this mode. If we set C to ‘0’ in the proposed PMU, the majority part (
To implement the INV logic, one of the data signals (A or B) should also be treated as a control signal. For example, if C and A are set to “01” or “10”, the SUM output gives the inverse of B. Overall, seven different Boolean logic functions can be implemented with the proper setting of the control signals.
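The logic mode can be checked with a short Python sketch that evaluates Eqs. (2) and (3) with C held constant. It assumes that the C_OUT terms inside Eq. (3) are complemented, which is what a full-adder sum requires, and that complemented outputs are available for NAND and NOR. The function names are hypothetical.

```python
def pmu(a, b, c):
    """Eqs. (2) and (3) in product-of-sums form; returns (c_out, s)."""
    c_out = (a | b) & (a | c) & (b | c)
    n = 1 - c_out  # complement of C_OUT (assumed in Eq. (3))
    s = (n | c) & (n | b) & (n | a) & (a | b | c)
    return c_out, s

def pmu_logic(a, b, c):
    """Functions seen at the outputs when C is used as a control signal."""
    c_out, s = pmu(a, b, c)
    if c == 0:
        return {"AND": c_out, "NAND": 1 - c_out, "XOR": s}
    return {"OR": c_out, "NOR": 1 - c_out, "XNOR": s}

for a in (0, 1):
    for b in (0, 1):
        print(a, b, pmu_logic(a, b, 0), pmu_logic(a, b, 1))

# INV: with (C, A) = (0, 1) or (1, 0) the SUM output is the inverse of B.
print([pmu(1, b, 0)[1] for b in (0, 1)], [pmu(0, b, 1)[1] for b in (0, 1)])
```

With C = ‘0’ the majority output reduces to AND and the sum to XOR; with C = ‘1’ they become OR and XNOR. Together with the complemented outputs and INV, this accounts for the seven functions.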
2) Full-Adder Mode
As the proposed structure is inherently a full adder, the full adder outputs are generated in one cycle if all three inputs are considered data. Suppose the majority output of each polymorphic unit is used as the input C of the next polymorphic unit. In that case, these blocks will form a ripple carry adder (RCA) for more complex calculations.
As demonstrated in Fig. 9, the C input of each block must be driven in different ways to change the configuration mode. Therefore, a multiplexer must be embedded on the input C of each block. If the selector of this multiplexer switches to ‘1’, the logic controller signal is connected to input C of each cell to perform the desired logic operations. On the other hand, if the selector switches to ‘0’, the majority output of the previous block is connected to input C of the current block, which makes the RCA available for processing.
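A behavioral Python sketch of the multiplexed chain is given below. It models each PMU only by its majority and sum outputs, and it assumes that the first block receives the control value as its carry-in in RCA mode. The 8-bit width and all function names are illustrative choices, not details from the design.

```python
def pmu_block(a, b, c):
    """Behavioral PMU: majority (carry) and sum outputs."""
    c_out = (a & b) | (a & c) | (b & c)
    return c_out, a ^ b ^ c

def pmu_chain(a_bits, b_bits, select, control=0):
    """Run a chain of PMU blocks, least significant bit first.

    select = 1: every block takes the shared logic control signal on C.
    select = 0: every block takes the previous block's majority output on C
                (the first block takes `control` as its carry-in, an assumption).
    """
    outputs, prev_carry = [], control
    for a, b in zip(a_bits, b_bits):
        c_in = control if select == 1 else prev_carry
        prev_carry, s = pmu_block(a, b, c_in)
        outputs.append((prev_carry, s))
    return outputs

def to_bits(value, width):
    """Little-endian bit list of a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]

def add(x, y, width=8):
    """Add two unsigned integers with the chain in RCA mode (select = 0)."""
    out = pmu_chain(to_bits(x, width), to_bits(y, width), select=0)
    total = sum(s << i for i, (_, s) in enumerate(out))
    return total + (out[-1][0] << width)

print(add(100, 55))
print(pmu_chain([0, 1, 0, 1], [0, 0, 1, 1], select=1, control=0))
```

Switching `select` is the only change needed to move between bitwise logic on all blocks and multi-bit addition, which mirrors the role of the multiplexer on each C input.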
Performance Evaluation
In this section, the performance of the proposed SA-SRAM in the SRAM and logic modes is evaluated. The proposed in-memory architecture and polymorphic logic design are also evaluated and compared with other designs.
A. SRAM Mode
The proposed SA-SRAM cell is simulated in HSPICE using the IG-FinFET model [31], [32] at a nominal
To provide a comparative analysis, comparisons are made with the state-of-the-art SRAM cells, including conventional 8T [9], LP10T [30], ST9T [27], BF12T [28], SEHF11T [29], BP8T [33] and 8+T [34]. To have a fair comparison, the compared cells have also been optimized and simulated using the same FinFET technology.
1) Hold Static Noise Margin (HSNM)
HSNM is the maximum noise voltage level an SRAM cell can endure without data flipping during the hold state. HSNM is calculated by measuring the side length of the largest square that can be inscribed within the smaller lobe of the butterfly curve in the hold state. As shown in Fig. 10, the proposed cell has the highest HSNM compared to other SRAM cells due to its symmetric structure, power-gated M1 and M3, and stacked MGL, enhancing the voltage transfer characteristic (VTC) curve.
2) Read Static Noise Margin (RSNM)
Like HSNM, RSNM is obtained from the side length of the largest square that fits inside the smaller lobe of the butterfly curve in the read state. As shown in Fig. 10, the RSNM of the proposed cell is as high as its HSNM due to the decoupled read path. Read operations in some structures, such as LP10T and ST9T, are conducted by discharging BL through the pull-down network of their cells. Accordingly, during the read operation in these cells, the VTC curves undergo unfavorable changes because voltage division between the access and pull-down transistors degrades RSNM.
3) Combined Word Line Margin (CWLM)
CWLM is the difference between the
4) Half-Select SNM
The proposed structure is half-select-free due to the decoupled row and column control signals used for writing data to and reading data from the proposed cell. Fig. 12 shows the HSNM of the half-selected SRAM cell. The half-select SNM of the proposed cell is 44% higher than that of BF12T, which is in second place.
5) Read Delay
The read delay time (RDT) is the time when RWL reaches
According to the simulation results given in Fig. 13, the delay of the proposed structure is approximately equal to the 8T, LP10T, SEHF11T, and 8+T structures due to the same reading path with two transistors.
Comparison of the reading and writing delay of the proposed cell and compared cells.
Comparison of the reading and writing energies of the proposed cell and compared cells.
6) Write Delay
The write delay time (WDT) is calculated as the time when the WL signal reaches
As shown in Fig. 13, the 8T, BP8T, and 8+T structures have the lowest write delay because of their differential write mechanism. Since the proposed cell utilizes a pseudo-differential write mechanism in the write operation, the time required to write on the cell is much less than in competitive structures ST9T and BF12T.
7) Energy Consumption
As there is always a trade-off between delay and power consumption in VLSI circuits, PDP, the product of delay and power, is a suitable metric for comparing cells’ performance in reading and writing modes. The simulation results show that the proposed design has nearly the same PDP in read operation as the conventional 8T, BP8T, 8+T, and SEHF11T. Also, the proposed cell has the lowest write PDP compared to the SRAM cell structures mentioned.
8) Area
The area of the proposed reconfigurable SRAM sense amplifier is also compared to the well-known SRAM cells in Fig. 15. According to the results, the area of the proposed cell falls in the middle of the other designs. However, it should be considered that the reliability and half-select issues, which are essential in IMC, are solved in the proposed design. In contrast, the 8T, BP8T, and 8+T SRAM cells suffer from these problems.
Moreover, it is worth mentioning that the main application of the proposed reconfigurable SRAM sense amplifier is to read data from the RRAM-based main memory array. However, in high-performance applications, it can be turned into an SRAM cell with high reliability, which is critical in IMC.
9) Process Variation Evaluation
To estimate the robustness and reliability of our proposed cell, Monte-Carlo simulations are performed in HSPICE to explore the effect of process variations. The channel length (
B. Main Memory Evaluation
The simulation and evaluation of the in-memory computing implementation in the main memory are shown in Fig. 16. In this simulation, in addition to the functional evaluation of the cell, the validity and reliability of the cell have also been assessed by running Monte Carlo simulations that account for process variation. In addition to the critical parameters of FinFETs, for RRAMs, the V0, I0, and
Simulation results of AND/NAND and OR/NOR of the main memory in the presence of process variation.
It is worth mentioning that by increasing the number of operations in RRAM, read disturb may become a critical concern in practice. To address this issue, the HRS of the RRAM needs to be set to a high resistance [39]. In the proposed design, by setting the HRS to 1
C. PMU Evaluation
We evaluate the proposed PMU design from different perspectives to demonstrate its efficiency and performance as a security-processing gate for replacing conventional designs.
1) Complexity Factor Analysis
As shown in Table 3, four polymorphic logic designs are compared with the proposed PMU using criteria such as the transistor count (TC), the number of implementable functions (FC), and the complexity factor (TC/FC), which is calculated by dividing the number of transistors by the number of implementable functions. The PMU can realize seven Boolean logic functions (OR/NOR, AND/NAND, XOR/XNOR, INV) using just 20 transistors. As a result, the proposed design has the lowest complexity factor among the compared structures.
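For the proposed PMU, the complexity factor follows directly from the two figures quoted above (20 transistors, seven functions):\begin{equation*} \frac{TC}{FC} = \frac{20}{7} \approx 2.86 \end{equation*}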
2) Process Variation Analysis
To evaluate the functionality of the proposed PMU unit in the presence of process variation, Monte Carlo simulations have been performed, and no failure occurred in the proposed design’s outputs, as shown in Fig. 17.
The transient response of different functions in the proposed PMU in the presence of process variation.
D. Architecture Evaluation
A preprocessing in OCR using min/max filters, a median filter, and a neural network is implemented to evaluate the proposed architecture in real-world applications. To implement the median filter, a
This can be done by sorting data in rows, then in columns, and finally sorting the sub-diameter in the
Finding median value in
In the proposed architecture, in the first cycle, the data stored in each row of MMA is sorted using an 8-bit magnitude comparator implemented using MMA in the computing mode. Then, the sorted data in each row are stored in the reconfigurable SRAM array (Fig. 5). This can be done by configuring each reconfigurable SRAM row to the sense amplifier to read the computed data from MMA and store the value inside itself. In the second cycle, all columns of the
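The row, column, and diagonal sorting idea can be sketched in Python for a 3×3 window. The window size is an assumption made for this illustration, and the software `sorted` calls stand in for the in-memory magnitude comparisons; the sketch shows the algorithmic idea only, not the mapping onto MMA and the SRAM array.

```python
def median_3x3(window):
    """Median of a 3x3 window via row sort, column sort, anti-diagonal sort."""
    rows = [sorted(r) for r in window]       # sort each row
    cols = [sorted(c) for c in zip(*rows)]   # sort each column of the result
    grid = [list(r) for r in zip(*cols)]     # back to row-major order
    anti_diagonal = sorted(grid[i][2 - i] for i in range(3))
    return anti_diagonal[1]

def median_filter(image):
    """Apply the 3x3 median to interior pixels; border pixels are copied."""
    out = [row[:] for row in image]
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            win = [image[y + dy][x - 1:x + 2] for dy in (-1, 0, 1)]
            out[y][x] = median_3x3(win)
    return out

noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],   # isolated "salt" pixel
    [10, 10, 0, 10],     # isolated "pepper" pixel
    [10, 10, 10, 10],
]
for row in median_filter(noisy):
    print(row)
```

After the row and column sorts, three entries are guaranteed to lie below the median and three above it, so only the three anti-diagonal entries need a final comparison. This is why three sorting passes are sufficient for this window size.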
Fig. 19 and Fig. 20 show the functionality of the proposed architecture by implementing these filters. To evaluate the proposed architecture in a more practical application, a neural network (NN) is implemented (Fig. 21) [48]. To this end, the energy consumption of the proposed architecture is extracted from the circuit-level simulations conducted using HSPICE. Then, by modeling the NN using MATLAB and the obtained data, the overall energy consumption of the entire network is evaluated.
Median filter implementation to remove the salt-and-pepper noise (a) original image (b) denoised image.
Preprocessing in OCR (a) original image (b) Max filter (removing noise and erosion), (c) Min Filter (dilation).
As shown in Table 5, the energy consumption of the proposed method is lower than that of the other structures. This is because the proposed reconfigurable SRAM sense amplifier eliminates extra data reads from MMA and because the proposed reconfigurable SRAM itself has low computing energy consumption.
Conclusion
In-memory processing is a promising paradigm that improves throughput and energy consumption, especially in data-intensive applications. To this end, we have proposed a hybrid architecture that combines a new IG-FinFET-based chained reconfigurable SRAM array with a nonvolatile RRAM array and provides in-memory computing capability in both structures. The proposed reconfigurable SRAM array can be configured as a sense amplifier to read data from the RRAM memory and can also be configured as an SRAM cell that performs in-memory computation in addition to data storage. Moreover, the proposed reconfigurable SRAM cell has a high static noise margin and is free of the half-select issue, both of which are essential for memory design. The simulation results indicate that, owing to the half-select-free feature of the proposed cell, its half-select static noise margin is equal to its HSNM, which is a remarkable advantage in the in-memory computation process. Moreover, 50% and 20% improvements in the write energy consumption and CWLM, respectively, have been achieved compared to the 8T SRAM cell. In addition, because the sense amplifier feature is added to the proposed SRAM cell, the reconfigurable SRAM array can directly read and store raw or processed data from the RRAM unit. In the proposed hybrid in-memory architecture, AND/NAND and OR/NOR operations can be performed in the RRAM main memory, and all two-input Boolean functions can be calculated in the reconfigurable SRAM array. Furthermore, we have embedded a new polymorphic structure inside the hybrid memory architecture as a security-computing unit to enhance the computation capability inside the memory.