1F-1T Array: Current Limiting Transistor Cascoded FeFET Memory Array for Variation Tolerant Vector-Matrix Multiplication Operation

This letter proposes a memory cell, denoted 1F-1T, consisting of a ferroelectric field-effect transistor (FeFET) cascaded with a current-limiting transistor (T). The transistor reduces the impact of drain current ($I_{d}$) variations by limiting the on-state current of the FeFET. Experimental data from our 28 nm high-k metal-gate (HKMG) FeFETs are used to calibrate the device model and simulate the memory arrays. The simulations show that the 1F-1T memory array significantly reduces the bit-line (BL) current ($I_{BL}$) variation and improves the accuracy of vector-matrix multiplication. A system-level in-memory computing simulation with 1F-1T synapses achieves an inference accuracy of 97.6% on the MNIST handwritten digits with a multilayer perceptron (MLP) neural network.

Among the many emerging nonvolatile memories (eNVMs), the FeFET is a promising candidate owing to its field-driven operation, low power consumption, fast switching, high on-current ($I_{ON}$) to off-current ($I_{OFF}$) ratio ($I_{ON}/I_{OFF}$), excellent linearity and bidirectional programmability, good endurance, and compatibility with Complementary Metal-Oxide Semiconductor (CMOS) technology [18], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38]. However, the primary obstacle to implementing FeFET-based computing systems is the inherent stochasticity of FeFET devices. One of its main causes is the polycrystalline nature of HfO$_2$-based ferroelectric thin films. Charge traps at the ferroelectric/interlayer interface and within the ferroelectric film can lead to an asymmetric conductance response and large device-to-device variations [39], [40], [41]. Great efforts have been made to minimize the effects of such non-idealities at both the device and the circuit level [42], [43], [44], [45], [46], [47]. Device-to-device variations in FeFET drain current can severely degrade the performance of a synaptic core, causing significant losses in training and inference accuracy. Our previous work showed that a resistor connected in series with the FeFET drain reduces variations by limiting the current in the low-threshold-voltage (LVT) state [21]. However, implementing such resistors in a standard CMOS process complicates the macro design because of their large area. Even polysilicon resistors, which offer the highest resistance density, suffer from larger mismatch and require 100 µm² to achieve 1 MΩ. In this work, we demonstrate that the 1F-1T structure, in which a transistor rather than a resistor limits the FeFET current, overcomes this issue, and we evaluate this architecture using 4 × 4 arrays.
We show that this configuration prevents variations from accumulating in the bit-line current ($I_{BL}$) and also reduces the impact of voltage swings on the word lines (WL), bit lines (BL), and select lines (SL). We benchmark the system-level performance of an in-memory computing circuit with a 1F-1T synaptic array, and our simulations indicate a significant improvement in inference accuracy over a network using 1F synapses.

A. Characterization of 28 nm HKMG FeFET
The experiment began with fabricating the crossbar arrays and memory cell structures on 300 mm wafers using the 28 nm HKMG technology at GlobalFoundries. Fig. 1 shows the schematic of an HKMG FeFET and the corresponding transmission electron microscopy (TEM) image of a minimum-feature-size FeFET at this technology node. When a voltage pulse is applied to the gate terminal, the dipoles of the ferroelectric layer align with the polarity of the pulse, altering the surface charge density of the channel and, consequently, the threshold voltage ($V_{th}$). A positive gate pulse programs an n-type FeFET into the low-$V_{th}$ (LVT) state, and a negative pulse programs it into the high-$V_{th}$ (HVT) state. The devices were programmed and erased by applying 500 ns gate pulses of 4.5 V and −5 V, respectively, with the drain and source terminals grounded. Before the READ-WRITE operations, the devices were woken up with 100 cycles of 4.5 V and −5 V pulses of 500 ns.
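The programming protocol above can be summarized as a list of gate pulses. The sketch below is illustrative only: it assumes that one wake-up cycle means one 4.5 V program pulse followed by one −5 V erase pulse, and the names `Pulse`, `wake_up`, and `prepare_and_write` are hypothetical, not part of any measurement-instrument API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Pulse:
    """A single gate pulse; drain and source are held at 0 V."""
    amplitude_v: float
    width_ns: float


PROGRAM = Pulse(4.5, 500.0)  # positive gate pulse -> LVT state
ERASE = Pulse(-5.0, 500.0)   # negative gate pulse -> HVT state


def wake_up(cycles: int = 100) -> list[Pulse]:
    """Wake-up cycling; assumes each cycle is one program pulse, then one erase pulse."""
    return [PROGRAM, ERASE] * cycles


def prepare_and_write(target_state: str, cycles: int = 100) -> list[Pulse]:
    """Wake-up pulses followed by the single pulse that sets the target state."""
    if target_state == "LVT":
        final = PROGRAM
    elif target_state == "HVT":
        final = ERASE
    else:
        raise ValueError(f"unknown state: {target_state!r}")
    return wake_up(cycles) + [final]


if __name__ == "__main__":
    seq = prepare_and_write("HVT")
    print(len(seq), seq[-1])
```

Keeping the pulses as immutable records makes it easy to check a sequence (count, amplitudes, widths) before it is sent to a pulse generator.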

B. Design of 28 nm HKMG Based Memory Array
We assessed the effects of FeFET variability using 4 × 4 memory arrays, investigating the two architectures shown in Fig. 2. The program and erase conditions of the single devices were adopted to simulate the array characteristics. Fig. 2(a) shows the architecture with one FeFET per synaptic cell (1F architecture). Here, the word lines (WL) are connected to the gate terminals of the FeFETs in a row, and each cell is accessed by controlling the corresponding WL and select line (SL). The bit-line (BL) current ($I_{BL}$) defines the state of the cell. Fig. 2(b) shows our proposed 1F-1T synaptic array, in which each synaptic cell contains an additional transistor that acts as a current limiter. A bias voltage applied to the gate terminal of this transistor accesses a specific cell, and the FeFET conductance is controlled by tuning the WL and SL voltages. Fig. 3 shows the layouts of the 4 × 4 1F and 1F-1T memory arrays, which occupy 5580 λ² and 7688 λ², respectively. Compared with the pseudo-crossbar 1F-1T memory array [16], our 1F-1T array provides a 60% area advantage, because in our configuration the drain of the cascaded transistor and the source of the FeFET share the same diffusion region. Table I summarizes the operating voltages used to simulate the 1F and 1F-1T arrays. The row-wise write operation was conducted by applying 4.5 V at the word line (WL) and 1.8 V at …, as listed in Table I for the write, read, and inhibition operations in the arrays.
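The reported layout areas allow a quick estimate of what the extra transistor costs relative to the 1F array. The sketch below simply divides each whole-array area evenly over the 16 cells, which is an approximation since it ignores any edge or routing overhead; it does not reproduce the 60% comparison with [16], whose area is not given in this section.

```python
# Whole-array layout areas of the 4 x 4 arrays (Fig. 3), in units of lambda^2.
AREA_1F = 5580
AREA_1F1T = 7688
CELLS = 4 * 4

# Naive per-cell figures: spread the array area evenly over the 16 cells.
per_cell_1f = AREA_1F / CELLS        # 348.75 lambda^2 per cell
per_cell_1f1t = AREA_1F1T / CELLS    # 480.5 lambda^2 per cell

# Relative area overhead of adding the cascaded transistor to every cell.
overhead = AREA_1F1T / AREA_1F - 1

print(f"1F: {per_cell_1f} λ²/cell, 1F-1T: {per_cell_1f1t} λ²/cell, overhead {overhead:.1%}")
# prints: 1F: 348.75 λ²/cell, 1F-1T: 480.5 λ²/cell, overhead 37.8%
```

The roughly 38% overhead relative to the 1F array is the price paid for the current limiter in every cell.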
The current sense amplifier (CSA) reads the different levels of bit-line current. Fig. 4 depicts a typical 4-stage current-mode sense amplifier [48], [49]. $I_{BL}$ and $I_{REF}$ denote the bit-line current of the memory array and the reference current, respectively, and 'Out' denotes the logic output voltage of the differential comparator. M2 mirrors the bit-line current into the reference branches, i.e., the drain sides of M3, M4, M5, and M6, and current-to-voltage conversion takes place at each reference node. The output node Out rises to logic high ($V_{DD}$) if $I_{BL}$ exceeds $I_{REF}$; otherwise, it remains at logic low (0).
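At the behavioral level, the 4-stage CSA acts like a small flash converter: each stage compares the mirrored $I_{BL}$ with its own reference and outputs $V_{DD}$ or 0, and the number of high outputs gives the current level. The sketch below ignores mirror gain, offsets, and timing; the per-cell current `I_CELL` and the reference placement halfway between ideal levels are hypothetical choices, not values from Table I or Fig. 4.

```python
from __future__ import annotations

from typing import Sequence


def sense_bitline(i_bl: float, i_refs: Sequence[float], vdd: float = 1.0) -> list[float]:
    """Behavioral multi-stage current-mode sense amplifier.

    Each stage drives its output to vdd if I_BL > I_REF of that stage, else to 0.
    """
    return [vdd if i_bl > i_ref else 0.0 for i_ref in i_refs]


def decode_level(outputs: Sequence[float], vdd: float = 1.0) -> int:
    """Count the high stage outputs (thermometer code -> current level)."""
    return sum(1 for v in outputs if v >= vdd / 2)


# Hypothetical: each activated LVT cell contributes I_CELL, so the ideal levels
# are k * I_CELL (k = 0..4); references sit halfway between adjacent levels.
I_CELL = 1.0  # uA, assumed per-cell ON current
I_REFS = [(k - 0.5) * I_CELL for k in range(1, 5)]  # 0.5, 1.5, 2.5, 3.5 uA

if __name__ == "__main__":
    for i_bl in (0.1, 2.1, 3.9):
        out = sense_bitline(i_bl, I_REFS)
        print(i_bl, out, decode_level(out))
```

Decoding is only unambiguous if every $I_{BL}$ level stays between its two neighboring references, which is why non-overlapping current distributions matter for the MAC readout.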

C. Neural Network Simulation
Finally, the impact of variations on system-level performance, especially for in-memory computing applications, was evaluated with the NeuroSim platform [50]. Experimentally calibrated $I_{d}$ values, together with their variation statistics, were used to emulate the synaptic weights of a multilayer perceptron (MLP) neural network (NN), and the MNIST dataset was used to benchmark the MLP. The network architecture is illustrated in Fig. 5. This work considers offline training, in which the synaptic weights are pre-trained in software and then encoded into the circuitry. This approach is more energy-efficient but less noise-tolerant, because the synaptic weights cannot be updated on the fly during training. After offline training, a single-shot programming pulse was used to write each synaptic weight into the hardware as a FeFET channel conductance. The synaptic weights were normalized between a minimum value $W_{min} = -1$ and a maximum value $W_{max} = 1$; the FeFET $I_{OFF}$ was mapped to $W_{min}$ and $I_{ON}$ to $W_{max}$. The FeFET-based synaptic core, shown in Fig. 5, performs the VMM operation, and its output is digitized directly by a current-to-digital converter.
Fig. 6(a) shows the READ operation, conducted two seconds after WRITE by applying a voltage ramp with a step size of 100 mV, and Fig. 6(b) shows the probability density function (PDF) of the READ current. The device-to-device variation of the drain current is larger in the LVT state than in the HVT state. The measured READ-WRITE characteristics of a single FeFET were used to calibrate a comprehensive FeFET model [58], which was then used to evaluate the memory-array characteristics. The mean and standard deviation of $V_{th}$ in the HVT and LVT states were used for the statistical variation analysis. Fig. 6(c) shows the bit-line currents of the different array columns as a function of the word-line voltage ($V_{WL}$). $I_{BL}$ increases with the number of activated cells in a column, demonstrating the multiply-and-accumulate (MAC) operation. However, as Fig. 6(d) shows, the $I_{BL}$ PDFs begin to overlap as the number of activated cells in a column increases, so adjacent MAC results can no longer be distinguished, which makes the operation unreliable for in-memory computing.
To circumvent this issue, we propose an alternative synaptic cell (1F-1T) consisting of a ferroelectric memory transistor cascaded with a logic transistor. Since the variations mainly affect the LVT state, the problem can be addressed efficiently by limiting the ON current. The logic transistor acts as a current limiter: the FeFET switches the state of the cell, while the gate-source voltage of the cascaded transistor sets the cell's ON current. The 1F-1T arrangement thereby makes the ON-state current largely independent of the FeFET $V_{th}$ variation and considerably reduces its spread. The 1F-1T structure was modeled by connecting the FeFET model with the BSIM-4 model [58], [59]. The variation statistics obtained from the experimental data were used to evaluate the impact of the current-limiting transistor on the $I_{BL}$ variation through Monte Carlo (MC) simulation with 50 iterations. Variation in the logic transistors was neglected, on the assumption that it is insignificant compared with that of the FeFETs. Fig. 7(a) shows the transfer characteristics of the proposed cell under the same pulsing scheme as the single-FeFET cell; the ratio of the LVT to HVT drain currents is lower than in the 1F cell. Next, the 4 × 4 array with 1F-1T cells was simulated. Fig. 7(b) shows that the bit-line current levels of the MAC operation no longer overlap, so integrating a current-limiting transistor with each FeFET significantly reduces the $I_{BL}$ variation. Likewise, the $I_{BL}$ PDFs of the 1F-1T array in Fig. 7(c), with four cells in each column, are distinct and non-overlapping among the different columns.
A transient analysis of the sensed output voltage of the 4-stage CSA in Fig. 4 was carried out, including the device-to-device variations of the FeFETs in the 1F-1T memory array. Once $I_{BL}$ exceeds $I_{REF}$, the output voltage (Out) of the differential comparator goes high; otherwise, it remains low, as shown in Fig. 8(b). Thus, despite the device-to-device variation of the FeFETs, the sense amplifier correctly detects the different bit-line current levels of the 1F-1T array, because the cascaded transistor limits $I_{BL}$ and keeps the current levels non-overlapping.
Fig. 9(a) shows the inference accuracy achieved with offline training. The 1F-1T synapses show a clear accuracy advantage over the 1F-based synapses, reaching 97.6% compared with the 98.99% software benchmark. The total leakage power is almost the same for both subarrays because both cells have the same OFF current ($I_{OFF}$). However, the total read energy of the 1F-1T-based synaptic core is lower than that of the 1F-based core because of the lower cell ON current: as shown in Fig. 9(b), the total read energies of the 1F- and 1F-1T-based cores are 450 µJ and 17.6 µJ, respectively. The MLP with the proposed 1F-1T synaptic core therefore offers excellent immunity to device variations, high accuracy, and low energy consumption. Finally, Table II compares 1F-1T in detail with other emerging memories investigated for in-memory computing applications. As the table shows, the other emerging memories offer several advantages.
However, they all suffer from device-to-device variability, which undermines the VMM operation. In contrast, the proposed 1F-1T-based in-memory computing architecture reduces the bit-line current variation even when the FeFET devices are scaled down.
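The mechanism behind Figs. 6(d) and 7(b) can be illustrated with a toy Monte Carlo run. This is a minimal sketch under loudly simplified assumptions: a square-law FeFET current with Gaussian LVT $V_{th}$ variation stands in for the calibrated FeFET model [58] and BSIM-4 [59], the cascaded transistor is idealized as a hard current cap (`np.minimum`), logic-transistor variation is neglected as in the text, and HVT/OFF-state currents are ignored. Every numeric parameter is hypothetical; only the 50 MC iterations and four cells per column follow the text.

```python
from __future__ import annotations

import numpy as np

# All values below are hypothetical placeholders, not calibrated parameters.
K_FE = 10.0          # uA/V^2, square-law prefactor of an LVT FeFET
V_READ = 1.2         # V, read voltage on the word line
VTH_LVT_MEAN = 0.5   # V, mean LVT threshold voltage
VTH_LVT_SIGMA = 0.1  # V, device-to-device sigma of the LVT threshold
I_LIMIT = 1.0        # uA, current cap set by the cascaded transistor's V_GS
N_MC = 50            # Monte Carlo iterations, as in the text
ROWS = 4             # cells per column in the 4 x 4 array


def fefet_lvt_current(vth: np.ndarray) -> np.ndarray:
    """Simplified square-law drain current of an LVT FeFET, in uA."""
    overdrive = np.clip(V_READ - vth, 0.0, None)
    return K_FE * overdrive**2


def bitline_currents(n_active: int, limited: bool, rng: np.random.Generator) -> np.ndarray:
    """I_BL samples for a column with n_active activated LVT cells."""
    vth = rng.normal(VTH_LVT_MEAN, VTH_LVT_SIGMA, size=(N_MC, n_active))
    i_cell = fefet_lvt_current(vth)
    if limited:
        # Idealized 1F-1T cell: the series transistor caps the cell current.
        i_cell = np.minimum(i_cell, I_LIMIT)
    # Cell currents add on the shared bit line (the accumulate step of MAC).
    return i_cell.sum(axis=1)


def levels(limited: bool, seed: int = 0) -> dict[int, np.ndarray]:
    """I_BL samples for 1..ROWS activated cells, for 1F or idealized 1F-1T."""
    rng = np.random.default_rng(seed)
    return {n: bitline_currents(n, limited, rng) for n in range(1, ROWS + 1)}


def levels_overlap(lv: dict[int, np.ndarray]) -> bool:
    """True if any two adjacent I_BL levels share part of their sampled range."""
    return any(lv[n].max() >= lv[n + 1].min() for n in range(1, ROWS))


if __name__ == "__main__":
    for label, limited in (("1F", False), ("1F-1T", True)):
        lv = levels(limited)
        stds = ", ".join(f"n={n}: std={lv[n].std():.3f} uA" for n in lv)
        print(f"{label}: {stds}; overlap={levels_overlap(lv)}")
```

With the cap well below the typical LVT current, almost every 1F-1T cell contributes exactly `I_LIMIT`, so the levels collapse onto multiples of the cap, whereas the uncapped 1F spread grows with the number of activated cells. This also shows the cost noted for Fig. 7(a): the cap lowers the cell ON current and hence the ON/OFF ratio.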

IV. CONCLUSION
In this work, we have proposed a novel 1F-1T memory cell for in-memory computing applications. Its operation was evaluated using characterization data from 28 nm HKMG FeFETs. The cascaded transistor acts as a current limiter and reduces the bit-line current variation compared with the 1F cell-based memory array. We have shown that, despite FeFET $V_{th}$ variation, the current sense amplifier can accurately digitize the different bit-line current levels of the 1F-1T memory array. We have also demonstrated the impact of device variation on system-level performance in the inference of an MLP-NN. The array-level simulations indicate that a crossbar array built from such analog weight cells improves system-level performance for in-memory computing hardware.