28 nm HKMG-Based Current Limited FeFET Crossbar-Array for Inference Application

This article reports a novel ferroelectric field-effect transistor (FeFET)-based crossbar array cascaded with an external resistor. The external resistor is shunted with the column of the FeFET array, where it acts as a current limiter and reduces the impact of variations in drain current (<inline-formula> <tex-math notation="LaTeX">${I}_{d}$ </tex-math></inline-formula>), especially in the low threshold voltage (LVT) state. We have designed crossbar arrays of size <inline-formula> <tex-math notation="LaTeX">$8\times8$ </tex-math></inline-formula> and performed multiply-and-accumulate (MAC) operations. Furthermore, we have evaluated the performance of the current-limited FeFET crossbar array in system-level applications by neuromorphic simulation of the resistor-shunted FeFET crossbar array. The crossbar array achieved software-comparable inference accuracy (~97%) on the Modified National Institute of Standards and Technology (MNIST) dataset with a multilayer perceptron (MLP) neural network, whereas the crossbar arrays built solely with FeFETs failed to learn, yielding only 9.8% accuracy.

(AI). Researchers' interest in AI was rekindled by the advent of AlexNet and the availability of the ImageNet dataset [2], [3]. Colossal computing resources enabled users to create much larger convolutional neural networks (CNNs), which could perform more complex tasks that had not been possible before.
Amid such developments in the computing world, another evolution was brewing in the world of electronics. The rapid advancement toward deeply scaled and dense technology nodes made edge devices more widely available to users, which has recently increased the real-time data generated by internet search engines, social media, Internet of Things (IoT) devices, and smart devices manyfold. The need for real-time processing of this enormous amount of data produced by end-user devices has mandated a change in the computing system. The latency and massive computing power required by the conventional Von-Neumann computing architecture for processing such an enormous amount of real-time data make it ill-suited for such purposes. It is worth noting that data processing centers equipped with high-performance graphics processing units (GPUs) or tensor processing units (TPUs) can run real-time data processing with much lower latency, but their power-hungry nature makes them inappropriate for end-user applications. Therefore, the need for low-power and fast real-time data processing has directed researchers toward an alternative route beyond the standard Von-Neumann architecture [4], [5], [6], [7], [8].
The scientific community is pursuing non-Von-Neumann architectures by implementing deep neural networks (DNNs) or spiking neural networks (SNNs) for data-centric computing with higher energy efficiency and lower latency. Simultaneously, research on emerging non-volatile memory (eNVM) has accelerated the implementation of in-memory computing (IMC) architectures. Among many potential candidates such as resistive random access memory (RRAM), magnetic random access memory (MRAM), and phase change memories (PCMs), hafnium oxide (HfO$_2$)-based ferroelectric memories are of great interest to the scientific community. This interest can be attributed to the CMOS compatibility and scalability of HfO$_2$, which facilitate very large-scale integration (VLSI) of ferroelectric memories with advanced CMOS processes. The compatibility of ferroelectric memory with 28-nm high-k-metal-gate (HKMG) technology, FinFETs, and thin-film technology has further accelerated the system-level integration of ferroelectric memories [9], [10], [11], [12], [13], [14], [15].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
However, the primary issue in the system-level integration of ferroelectric memories is the increasing variability with scaling. The polycrystalline nature of HfO$_2$ and its intrinsic defect sites, which act as charge-trapping sites, create a severe issue in deeply scaled ferroelectric field-effect transistors (FeFETs). The charge-trapping sites may capture electrons or holes from the channel side (CS) or gate side (GS), leading to errors in the program-erase (WRITE) operation. Fundamentally, ferroelectricity in HfO$_2$ is a crystal-structure-dependent property: the non-centrosymmetric Pca2$_1$ orthorhombic phase is responsible for it. Therefore, harnessing the ferroelectric orthorhombic phase is essential for the stable operation of ferroelectric memory. There have been several attempts from a semiconductor process perspective to stabilize the ferroelectric orthorhombic phase. Despite the adoption of several stabilization processes, atomic-layer-deposited (ALD) or physical-vapor-deposited (PVD) HfO$_2$ films still show variability in significantly scaled devices [16], [17], [18], [19], [20], [21].
In our previous work, we demonstrated a very effective way to reduce the device variations in the READ-WRITE operation of 28-nm HKMG FeFET devices by shunting a resistor with the drain terminal (1F-1R) [22], [23]. We observed that the large variation in the drain current ($I_d$) is mostly associated with the low threshold voltage (LVT)-state current, which creates erroneous output during analog-to-digital conversion after accumulation. Therefore, we suggested building a 1F-1R structure to compensate for the variation in the $I_d$ of the LVT state. We observed that, with a sufficiently high resistance (2 MΩ), the $I_d$ variation was strongly suppressed. Furthermore, the $I_d$ variation originating from the random distribution of ferroelectric domains is reduced by the large operation window of $V_{GS}$.
This work, an extension of our previous work, focuses on evaluating the performance of a 1F-1R structure-based crossbar array for neuromorphic computing applications. We have designed 8 × 8 crossbar arrays with FeFETs in which each column is terminated with a resistive element. The resistive element acts as a current limiter for the crossbar array and reduces the variation of the bitline (BL) current ($I_{BL}$). Finally, we have evaluated the performance of the memory array as a synaptic core. The system-level evaluation demonstrates ∼97% accuracy for the inference application.

II. EXPERIMENTS
The device considered in this work was fabricated at GlobalFoundries using their 28-nm HKMG technology on 300-mm wafers. The FeFETs were fabricated by integrating an 8-nm silicon-doped hafnium oxide (HfO$_2$)-based ferroelectric layer with a ∼1-nm silicon dioxide (SiO$_2$) interfacial layer in the gate stack of a regular metal-oxide-semiconductor field-effect transistor (MOSFET). Fig. 1 shows the schematic illustration and the transmission electron microscopy (TEM) image of the FeFETs under consideration. The fabricated devices were programmed (WRITE) to non-overlapping binary states using 500-ns pulses at the gate terminal. Before conducting READ-WRITE operations, the FeFETs were cycled by 50 consecutive wake-up pulses. Each wake-up pulse consists of one 4.5-V pulse followed by a −5-V pulse of 500 ns. The source, drain, and bulk terminals were biased at 0 V during the WRITE operation. A non-disturbing direct current (dc) sweep with a step size of 100 mV was applied at the gate terminal for the READ operation while maintaining 100 mV at the drain terminal and 0 V at the source and bulk. The WRITE pulse applied at the gate terminal aligns the electric dipoles in the ferroelectric layer according to its polarity, which changes the surface charge density of the semiconductor layer, the conductance of the channel ($G_{ch}$), and the threshold voltage ($V_{th}$). For an n-type FeFET, a positive pulse at the gate terminal sets the device to the LVT state, and a negative pulse sets it to the high threshold voltage (HVT) state.
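The wake-up and READ conditions described above can be written down as a small pulse schedule. The following Python sketch is only an illustration: the `Pulse` record and the helper names are hypothetical, both wake-up pulses are assumed to be 500 ns wide, and the sweep limits are placeholders because the gate-voltage range of the READ sweep is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pulse:
    """One rectangular pulse on a device terminal (hypothetical record)."""
    terminal: str
    voltage_v: float
    width_s: float


def wake_up_sequence(cycles=50, v_pos=4.5, v_neg=-5.0, width_s=500e-9):
    """Alternate positive and negative gate pulses for the given number of cycles."""
    sequence = []
    for _ in range(cycles):
        sequence.append(Pulse("gate", v_pos, width_s))
        sequence.append(Pulse("gate", v_neg, width_s))
    return sequence


def read_sweep(v_start, v_stop, step_v=0.1):
    """Gate voltages of a dc READ sweep with a fixed step (100 mV by default)."""
    n_steps = int(round((v_stop - v_start) / step_v))
    return [round(v_start + i * step_v, 6) for i in range(n_steps + 1)]


if __name__ == "__main__":
    pulses = wake_up_sequence()
    print(len(pulses), "wake-up pulses, first:", pulses[0])
    # Placeholder sweep limits; the actual READ range is not given in the text.
    print(read_sweep(-0.5, 1.5)[:5])
```

Keeping the schedule as data rather than as instrument calls makes it easy to check the pulse count and amplitudes before anything is sent to a measurement system.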
The characterization of single devices was followed by the characterization of 8 × 8 arrays. The layout, optical image of the array, and schematic representation of the mini-array are shown in Fig. 2(a) and (b), respectively. The gate terminals of the FeFETs are connected row-wise to a single word line (WL). The drains and sources are connected column-wise to the bitline (BL) and source line (SL), respectively. The WL receives the inputs for the READ-WRITE operation. The BL is connected to the current limiter. The arrays were programmed row-wise through direct access via the WLs. The SLs and BLs connected along the columns allow the read operation to be performed along the column. The transistors denoted by $M_{SL}$ and $M_{BL}$ are used as inhibit-mode switches. The FeFETs were characterized using a PXI-Express system from National Instruments. The contacts of the memory array were controlled by the pin parametric measurement unit (PPMU) of the NI PXIe-6570 and the source measure unit (SMU) of the NI PXIe-4143. SMUs were used for conducting program-erase operations, while the SLs and BLs were biased at 0 V. The devices were allowed to de-trap for 2 s after programming. The read operation was conducted by a slowly varying voltage ramp with a step size of 100 mV at the WL, while keeping the BL and SL biased at 100 mV and 0 V, respectively. The bulk was also kept at 0 V. The read operation takes approximately 1 ms to complete.
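The column-wise READ described above is what makes the array usable for MAC operations: each BL collects the currents of the cells on its column whose WL is activated. A minimal Python sketch of this idealized accumulation is given below. It assumes binary inputs and binary stored states, treats every cell as a fixed current source, and ignores line resistance and any nonlinearity introduced by the current limiter. The function name and the default cell currents are illustrative placeholders, not measured values.

```python
def bitline_currents(inputs, states, i_lvt=100e-9, i_hvt=1e-9):
    """Idealized column currents of a crossbar.

    inputs: one 0/1 value per word line (1 = row is read).
    states: rows x columns matrix of stored bits (1 = LVT, 0 = HVT).
    i_lvt, i_hvt: assumed per-cell read currents in amperes (placeholders).
    """
    n_cols = len(states[0])
    currents = [0.0] * n_cols
    for x, row in zip(inputs, states):
        if not x:
            continue  # an unselected row contributes no current in this model
        for col, bit in enumerate(row):
            currents[col] += i_lvt if bit else i_hvt
    return currents


if __name__ == "__main__":
    size = 8
    # Hypothetical pattern: only the diagonal cells are in the LVT state.
    states = [[1 if r == c else 0 for c in range(size)] for r in range(size)]
    inputs = [1] * size
    for col, current in enumerate(bitline_currents(inputs, states)):
        print(f"BL{col}: {current * 1e9:.1f} nA")
```

In this simplified picture, the accumulated levels stay distinguishable only if the spread of the per-cell LVT current is small compared with the spacing between adjacent sums, which is the motivation for limiting the current.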
The row-wise WRITE operation was carried out by applying a 4.5-V pulse of 500 ns at the WL, while the complete array was erased by applying a 5-V pulse of 40 μs to the bulk. Fig. 2(d) and (e) show the biasing schemes for the row-wise WRITE and bit-wise READ operations. The SL was connected to the resistive current limiter. The BLs were biased to 0.1 V through the wire named V PRG BL . N SL and N BL transistors

III. RESULTS AND DISCUSSION
A. Device Characterization
Fig. 3(a) shows the program-erase scheme used in this work. Fig. 3(b) and (d) display the two-level READ-WRITE operations for the 1F cell and the 1F-1R cell, respectively. The ratio of the LVT-state current ($I_{d}^{LVT}$) to the HVT-state current ($I_{d}^{HVT}$) is reduced for 1F-1R synapses. However, a high $I_{d}^{LVT}$ also increases the variation in $I_{d}^{LVT}$: the slightest variation in the $V_{th}$ of any programmed state induces a significant variation in the cell current of that state. Therefore, even though the $I_{d}^{LVT}$ of the 1F-1R structure is reduced, embedding the current limiter significantly reduces the standard deviation of $I_{d}^{LVT}$ [Fig. 3(c) and (e)]. Further, the variability originating from the WL due to the random variation in ferroelectric domains is suppressed by a large operation window of the WRITE pulse. The drain was held at a constant 0 V during the WRITE operation to ensure low static power consumption.
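The trade-off between a lower LVT-state current and a tighter current distribution can be illustrated with a toy model. The Python sketch below assumes that the limiter behaves as a fixed resistance in series with the FeFET channel along the read path and that the LVT-state channel resistance follows a lognormal distribution. Both the circuit simplification and the numbers (a 20-kΩ median channel resistance and a spread of 0.5) are assumptions for illustration; only the 2-MΩ limiter value and the 100-mV read bias are taken from the text.

```python
import random
import statistics


def read_current(v_read, r_channel, r_limiter=0.0):
    """Current through a channel resistance in series with a limiter resistance."""
    return v_read / (r_channel + r_limiter)


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (dimensionless spread)."""
    return statistics.pstdev(values) / statistics.fmean(values)


def simulate(n_devices=1000, r_limiter=2e6, v_read=0.1, seed=0):
    """Return the relative current spread without and with the limiter."""
    rng = random.Random(seed)
    # Hypothetical LVT channel resistances: 20 kOhm median, lognormal sigma 0.5.
    r_channel = [20e3 * rng.lognormvariate(0.0, 0.5) for _ in range(n_devices)]
    bare = [read_current(v_read, r) for r in r_channel]
    limited = [read_current(v_read, r, r_limiter) for r in r_channel]
    return coefficient_of_variation(bare), coefficient_of_variation(limited)


if __name__ == "__main__":
    cv_bare, cv_limited = simulate()
    print(f"relative spread, 1F   : {cv_bare:.3f}")
    print(f"relative spread, 1F-1R: {cv_limited:.4f}")
```

Under these assumptions the limiter dominates the total resistance, so the read current becomes nearly independent of the channel resistance, at the cost of a much smaller absolute LVT-state current.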
The READ-WRITE characterization was followed by endurance and retention characterization. The operation of front-end-of-line (FEoL) FeFETs with silicon channels is limited by a WRITE endurance of up to $10^{5}$ cycles. This limitation on WRITE endurance makes online training of the neural network (NN) impractical. The WRITE-endurance characteristics are shown in Fig. 4(a): we observed a memory window (MW) with stable behavior up to $10^{3}$ cycles, followed by increasing degradation, which led to a full closure of the MW after $5 \times 10^{5}$ cycles. Fig. 4(b) shows stable data retention characteristics up to $10^{4}$ s at 85 °C.
[...] state was lower than 1 nA. This ensures leakage-free MAC operation. For statistical modeling, the MAC operation was performed over 20 segments across 300-mm wafers; it remained stable across all 20 segments of the crossbar array, with non-overlapping variation in $I_{BL}$.

C. Applications to In-Memory-Computing
We have performed a system-level simulation of handwritten digit recognition on the "Modified National Institute of Standards and Technology (MNIST)" dataset to quantify the efficacy of the current-limited FeFET-based crossbar array as a synaptic core in multilayer perceptron (MLP)-based NNs [24], [25]. The experimentally obtained device-to-device variation and retention degradation of $I_{BL}$ have been modeled for the NN simulation. The architecture and layers of the NN are illustrated in Fig. 6(a).
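One common way to bring measured device-to-device variation into an NN simulation is to perturb the ideal synaptic weights before inference. The Python sketch below assumes a multiplicative Gaussian perturbation with a relative spread `sigma_rel`. This noise model, the helper names, and the numbers are illustrative assumptions and not necessarily the model used in this work.

```python
import random


def perturb_weights(weights, sigma_rel, rng):
    """Return a copy of the weight matrix with multiplicative Gaussian noise."""
    return [[w * (1.0 + rng.gauss(0.0, sigma_rel)) for w in row] for row in weights]


def dense_relu(x, weights, bias):
    """One fully connected layer followed by ReLU; weights[j] feeds output j."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


if __name__ == "__main__":
    rng = random.Random(0)
    ideal = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # hypothetical trained weights
    bias = [0.0, 0.1]
    x = [1.0, 0.5, 0.25]
    noisy = perturb_weights(ideal, sigma_rel=0.05, rng=rng)
    print("ideal :", dense_relu(x, ideal, bias))
    print("noisy :", dense_relu(x, noisy, bias))
```

Repeating the inference over many perturbed weight sets gives a distribution of accuracies, which indicates how much variation a trained network tolerates.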
The NN was trained offline to meet the endurance criteria of the FeFET synapses, and the inference operation was conducted after the offline training was complete. Online training or retraining of the NN puts an excessive load on the hardware in terms of energy. Therefore, data retention capability becomes important for carrying out inference without frequent retraining. The data retention measured up to $10^{4}$ s at 85 °C has been extrapolated to estimate the data retention up to $10^{8}$ s. The extrapolated data retention was used to gauge the inference accuracy for the MNIST dataset with the MLP NN over $10^{8}$ s. The MLP-based NN achieves an inference accuracy of over 97% initially and maintains an inference accuracy above 95% for $10^{8}$ s without being retrained.
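The extrapolation step can be illustrated with a simple fit. The Python sketch below assumes that the retained quantity (for example, a normalized bitline current or memory window) decays linearly in log10(time), fits a least-squares line to points measured up to 10^4 s, and evaluates it at 10^8 s. The log-linear form, the function names, and the data points are assumptions made for illustration; the fitting procedure actually used is not specified here.

```python
import math


def fit_log_linear(times_s, values):
    """Least-squares fit of values = a + b * log10(t); returns (a, b)."""
    xs = [math.log10(t) for t in times_s]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def extrapolate(times_s, values, target_s):
    """Evaluate the fitted log-linear trend at target_s."""
    intercept, slope = fit_log_linear(times_s, values)
    return intercept + slope * math.log10(target_s)


if __name__ == "__main__":
    # Hypothetical normalized retention data measured up to 1e4 s.
    times = [1e0, 1e1, 1e2, 1e3, 1e4]
    values = [1.00, 0.985, 0.97, 0.96, 0.94]
    print(f"estimated value at 1e8 s: {extrapolate(times, values, 1e8):.3f}")
```

Because the target lies four decades beyond the last measured point, the estimate is sensitive to the assumed functional form and should be read as a projection rather than a measurement.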

IV. CONCLUSION
In this work, we demonstrate a high-performance current-limited FeFET-based crossbar array. The externally shunted resistor acts as a current limiter and reduces the variation in $I_{BL}$. The MAC operation conducted on 20 crossbar arrays demonstrates the stability of the operation. The high on-state resistance and low variation ensure error-free MAC operation using current-limited FeFET arrays. Table I lists the comparison of this work with other state-of-the-art works.