NOR-Type 3-D Synapse Array Architecture Based on Charge-Trap Flash Memory

In this work, we propose a three-dimensional (3-D) channel-stacked array architecture based on charge-trap flash (CTF) memory for artificial neural network accelerators. The proposed synapse array architecture is a promising solution for efficiently implementing a large artificial neural network on a hardware chip of limited size. We design a full array architecture that includes a stacked-layer selection circuit. In addition, we investigate the synaptic characteristics of the CTF device using technology computer-aided design (TCAD) simulation. We demonstrate the feasibility of the synapse array for neural network accelerators through a system-level MATLAB simulation with the Modified National Institute of Standards and Technology (MNIST) database.

In this paper, a channel-stacked 3-D synapse array based on CTF memory devices is proposed to implement a high-density synapse array. Compared with our previous array based on the gate-stacked structure [14], the proposed array architecture has advantages in terms of unwanted interference between stacked layers and process cost. More details are discussed in the next section.
We demonstrate the feasibility of the CTF memory-based synapse array and its neuromorphic system application through device-level and system-level simulations. TCAD simulation is used to characterize the CTF memory device operation, and MATLAB simulation is used to evaluate a pattern recognition application with the MNIST database by implementing a fully connected single-layer neural network. The fabrication process flow of the 3-D channel-stacked synapse array with CTF memory devices is also introduced.
Fig. 1(a) illustrates an equivalent circuit diagram of the proposed synapse array architecture. The proposed 3-D synapse array takes the form of conventional planar NOR flash memory arrays stacked vertically. A layer selection circuit is added to operate each stacked layer selectively. The detailed configuration and operation method are discussed later. Fig. 1(b) shows a unit synapse structure in which two CTF memory devices are connected in series. In each CTF device, the respective conductance (G+ or G−) is stored in a charge-storage node (silicon nitride layer). The synaptic weight of each synapse is represented by the difference between the two conductances: w = G+ − G−. When an input voltage (x) is applied to the input line (IL), the drain currents of the output lines OL(+) and OL(−) flow into a neuron circuit. These drain currents are determined by x · G+ and x · G−, respectively. Hot electron injection (HEI) and hot hole injection (HHI) can be used to modulate the conductance of each CTF memory device. Fig. 2(a) shows a top view of the proposed full array architecture. The output of each neuron is defined as the difference between the output line currents [16]. The specific structure of the synapse array is depicted in Fig. 2(b). Word lines (WLs) and ILs are arranged alternately in parallel at the top of the synapse array structure. OLs are configured for each stacked layer and are connected to the layer selection circuit, as can be seen in Fig. 2(c).
OL_1,n(+) of each synapse layer and OL_1(+) of the neuron circuit are connected through a layer selection transistor (LST). The gates of the LSTs are connected by a layer selection line (LSL). By selectively applying a turn-on voltage (V_PASS) to the LSLs, only the selected synaptic layer is connected to the neuron circuit. The specific operation bias schemes for layer selection are shown in Table 2. Note that the number of LSTs connected to one OL_1(+) (or OL_1(−)) of each neuron circuit is equal to the number of stacked layers. In addition, because the LSTs are arranged in a direction parallel to each OL, the effective unit cell area (45 F^2) does not increase even if the number of stacked layers increases.
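The differential read-out and layer selection described above can be sketched numerically. The following is a minimal illustration, assuming ideal devices whose drain current is exactly x · G and ideal LSTs that pass either all or none of a layer's current; the function name `layer_output` and the array shapes are hypothetical and are not taken from the paper.

```python
import numpy as np

def layer_output(x, g_plus, g_minus, selected_layer):
    """Neuron output currents for one selected stacked layer (idealized).

    x              : input voltages on the ILs, shape (n_inputs,)
    g_plus, g_minus: G+ and G- conductances, shape (n_layers, n_inputs, n_neurons)
    selected_layer : index of the layer whose LSL receives the turn-on voltage
    """
    n_layers = g_plus.shape[0]
    # One-hot LSL state: only the LSTs of the selected layer are turned on.
    lsl_on = np.zeros(n_layers)
    lsl_on[selected_layer] = 1.0
    # OL currents of every layer: sum over inputs of x * G.
    i_plus = np.einsum("i,lin->ln", x, g_plus)
    i_minus = np.einsum("i,lin->ln", x, g_minus)
    # Only the selected layer's OL currents reach the neuron circuit.
    i_ol_plus = lsl_on @ i_plus
    i_ol_minus = lsl_on @ i_minus
    # Neuron output: difference between the two output line currents.
    return i_ol_plus - i_ol_minus

# Toy example: 4 stacked layers, 5 inputs, 3 neurons (arbitrary units).
rng = np.random.default_rng(0)
gp = rng.random((4, 5, 3))
gm = rng.random((4, 5, 3))
x = rng.random(5)
out = layer_output(x, gp, gm, selected_layer=2)
print(out)
```

In this idealized picture the result equals x multiplied by the weight matrix w = G+ − G− of the selected layer alone, regardless of what is stored in the other layers.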

II. 3-D CHANNEL-STACKED SYNAPSE ARRAY ARCHITECTURE
In our previous work [14], a gate-stacked 3-D synapse array architecture was proposed. In this work, by contrast, the array architecture is of the channel-stacked type. In the design of a 3-D stacked array architecture, unwanted electrical interference between stacked layers must be carefully considered. In the gate-stacked type, capacitive coupling between the stacked word lines (WLs) can cause the gate voltage of an unselected layer to drift. In the worst case, memory devices in the unselected layer may be turned on as the electrical potential of the unselected WLs is boosted. To avoid this inherent risk, an additional peripheral circuit that applies voltages to the unselected layers is required [11], which increases the area of the peripheral circuitry. In the channel-stacked type, on the other hand, the layer selection circuit ensures that only the current of the selected layer flows to the neuron circuit, even if electrical coupling occurs between stacked channels. Therefore, unwanted interference between the stacked layers causes neither vector-matrix multiplication (VMM) calculation errors nor degradation of system accuracy. In addition, the proposed synapse array architecture requires fewer process steps than the previous one. The gate-stacked version of our previous work needs two metal layers (source lines and drain lines) on top of the main synapse array, whereas the channel-stacked version requires only one metal layer (ILs and WLs are formed within the same metal layer). The operation method of the proposed 3-D synapse array architecture is summarized in Table 3. HEI or HHI is used for the learning process. The synaptic weight of the synapse device is modulated by changing the conductance through hot carrier injection. The inference process is performed by applying V_WL,inference and V_IL,inference to the WL and IL, respectively.

A. DEVICE MODELS AND MODEL PARAMETERS
Synopsys Sentaurus TCAD device simulation was used to demonstrate the synaptic operation of the proposed synaptic device. Several models were used to describe the basic transistor characteristics. Shockley-Read-Hall (SRH) recombination and band-to-band tunneling (BTBT) models were used to describe recombination in the channel. The spherical harmonics expansion (SHE) model was used together with the PhuMob, Enormal, and HighFieldSaturation mobility models [36]. Specific parameters of the charge-trapping layer, such as trap concentration and trap energy, are summarized in Table 4. The Fiegna model [36] was used to calculate the charge injected into the charge-trapping layer by hot carrier injection during HEI or HHI [37]. The amount of injected charge in the Fiegna model is formulated in (1), where I_g is the gate current, χ and A are constants determined by fitting experimental data, P_ins is the probability of scattering in the image-force potential well, F_eff is the effective electric field, and E_B is the semiconductor-insulator barrier height. We used the default parameter values provided by the Sentaurus TCAD simulator for the mobility and recombination models, and modified the fitting parameters of the Fiegna model to adjust the HEI and HHI characteristics. The modified parameters are listed in Table 5. We verified the simulation model by comparing the simulated data with measured data from the reference CTF device [21]. As shown in Fig. 3, the simulated HEI and HHI speeds approximately follow the measured ones, which supports the validity of our simulation models.

B. SYNAPTIC DEVICE CHARACTERISTICS
The basic structure of the device used in the simulation is shown in Fig. 4. As summarized in Table 6, each CTF device has a gate dielectric stack consisting of 3 nm-thick silicon oxide (tunneling dielectric), 6 nm-thick silicon nitride (charge-trapping layer), 6 nm-thick silicon oxide (the 1st blocking dielectric), and an additional 3 nm-thick aluminum oxide (the 2nd blocking dielectric). Fig. 5 shows the I-V characteristics of the TCAD-simulated single CTF synaptic device. The I-V curve is extracted after each HEI pulse; a total of 64 pulses are applied to inject electrons into the charge-trapping layer. With the threshold voltage defined by the constant-current method as the gate voltage at which the current is 1 µA, the simulated device shows a threshold voltage of 260 mV and a memory window of ∼1.9 V. Fig. 6 shows the simulation results of HEI and HHI. Applying successive pulses changes the conductance gradually. The pulse magnitudes are listed in Table 3, and the pulse width is 50 ns for HEI and 100 ns for HHI. The dotted line shows the ideal linear behavior of a synapse device, and the circles are the simulation results. Both HEI and HHI show nonlinear characteristics. Interestingly, however, HEI shows more linear behavior than HHI. This can be interpreted as follows.
The most important factor determining the amount of HEI is the horizontal electric field in the channel region. Electrons locally trapped in the drain region by HEI do not have a significant effect on the horizontal electric field. As a consequence, even if the concentration of trapped electrons increases as HEI pulses are continuously applied, the amount of HEI per pulse does not change significantly. However, the most important factor determining the amount of HHI is the vertical electric field between the gate and the drain. Holes locally trapped in the drain region by HHI reduce the vertical electric field. Consequently, as HHI pulses are continuously applied and the concentration of trapped holes increases, the amount of HHI per pulse gradually decreases.
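The contrast between a nearly constant change per pulse (HEI) and a gradually shrinking change per pulse (HHI) can be illustrated with a simple phenomenological curve. The sketch below assumes an exponential-saturation form that is often used to describe nonlinear synaptic devices; it is not the TCAD model of this work, and the function name, the nonlinearity parameter `a`, and the conductance end points are hypothetical values chosen only for illustration.

```python
import numpy as np

def conductance_after_pulses(n, n_max, g_start, g_end, a):
    """Conductance after n identical pulses (phenomenological assumption).

    a = 0 gives an ideal linear update; larger a makes each successive
    pulse change the conductance less than the previous one.
    """
    n = np.asarray(n, dtype=float)
    if a == 0:
        frac = n / n_max
    else:
        frac = (1.0 - np.exp(-a * n / n_max)) / (1.0 - np.exp(-a))
    return g_start + (g_end - g_start) * frac

pulses = np.arange(65)  # 0..64 pulses
# HEI-like case: conductance decreases, weak nonlinearity (assumed a = 0.5).
g_near_linear = conductance_after_pulses(pulses, 64, 60e-6, 15e-6, a=0.5)
# HHI-like case: conductance increases, strong nonlinearity (assumed a = 5).
g_saturating = conductance_after_pulses(pulses, 64, 15e-6, 60e-6, a=5.0)

# Magnitude of the conductance change produced by each pulse.
step_near_linear = np.abs(np.diff(g_near_linear))
step_saturating = np.abs(np.diff(g_saturating))
print("first/last step ratio, weak nonlinearity:  ", step_near_linear[0] / step_near_linear[-1])
print("first/last step ratio, strong nonlinearity:", step_saturating[0] / step_saturating[-1])
```

A first-to-last step ratio close to 1 corresponds to the behavior attributed to HEI above, while a large ratio corresponds to the self-limiting behavior attributed to HHI.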

C. POTENTIATION AND DEPRESSION METHOD
The synaptic weight update by potentiation (increasing the synaptic weight) or depression (decreasing the synaptic weight) is the heart of the learning process in a neuromorphic system. In this work, we tested two methods of performing potentiation and depression. The first is a bidirectional update method, in which both HEI and HHI are used for conductance modulation. As expressed in (2), to increase w, HHI pulses are applied to the G+ CTF and HEI pulses to the G− CTF simultaneously. Conversely, to decrease w, HEI pulses are applied to the G+ CTF and HHI pulses to the G− CTF simultaneously.
The second is a unidirectional update method, in which only one injection mechanism (HEI or HHI) is used for the conductance modulation of a single CTF device.
Equation (3) describes the unidirectional update method using HEI.
In this method, HEI pulses are applied to the G− CTF for potentiation and to the G+ CTF for depression. During each operation, no pulse is applied to the OL of the other CTF device (the G+ CTF during potentiation and the G− CTF during depression).
The unidirectional update method using HHI is described in (4).
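The three update schemes can be summarized in a short sketch. It assumes, consistently with the rules stated above, that an HEI pulse lowers the conductance of a device and an HHI pulse raises it, and it uses a fixed hypothetical change `DG` per pulse with no saturation. The HHI-only rule is inferred by symmetry with the HEI-only rule, since only the latter is spelled out in the text. All names (`Synapse`, `apply_update`, `DG`) are illustrative.

```python
from dataclasses import dataclass

DG = 1.0  # hypothetical conductance change per pulse (arbitrary units)

@dataclass
class Synapse:
    g_plus: float
    g_minus: float

    @property
    def weight(self) -> float:
        # Synaptic weight is the difference of the two conductances.
        return self.g_plus - self.g_minus

def hei(g: float) -> float:
    """One HEI pulse: trapped electrons lower the conductance (assumed)."""
    return g - DG

def hhi(g: float) -> float:
    """One HHI pulse: trapped holes raise the conductance (assumed)."""
    return g + DG

def apply_update(syn: Synapse, potentiate: bool, method: str) -> Synapse:
    """Return the synapse after one potentiation or depression operation."""
    gp, gm = syn.g_plus, syn.g_minus
    if method == "bidirectional":
        # Both devices are pulsed simultaneously with opposite mechanisms.
        gp, gm = (hhi(gp), hei(gm)) if potentiate else (hei(gp), hhi(gm))
    elif method == "hei":
        # HEI only: pulse G- to potentiate, G+ to depress.
        if potentiate:
            gm = hei(gm)
        else:
            gp = hei(gp)
    elif method == "hhi":
        # HHI only (inferred by symmetry): pulse G+ to potentiate, G- to depress.
        if potentiate:
            gp = hhi(gp)
        else:
            gm = hhi(gm)
    else:
        raise ValueError(f"unknown method: {method}")
    return Synapse(gp, gm)

s0 = Synapse(g_plus=10.0, g_minus=10.0)
for m in ("bidirectional", "hei", "hhi"):
    up = apply_update(s0, True, m)
    down = apply_update(s0, False, m)
    print(m, up.weight, down.weight)
```

In the unidirectional schemes each device only ever moves in one direction, so a device never receives both HEI and HHI pulses during learning.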

D. MNIST PATTERN RECOGNITION SIMULATION
A system-level simulation of MNIST pattern recognition was performed to demonstrate the feasibility of the proposed synapse array for neuromorphic applications. The binary MNIST image set consists of handwritten digits in 10 classes ('0' to '9'). Each image has 784 (= 28 × 28) pixels, and each pixel has a value of '1' or '0'. The image set comprises 60000 training images and 10000 test images. MNIST pattern recognition was performed using a single-layer artificial neural network with 784 input neurons and 10 output neurons.
There are two types of learning methods in neuromorphic systems. The first is off-chip learning, in which learning is performed outside the neuromorphic chip. After external learning is completed in software, the conductance of each synapse device is adjusted to a target level through an iterative program-verify scheme. The second is on-chip learning, in which learning is performed within the neuromorphic chip. Program (potentiation or depression) pulses are applied to each synapse device, and the number of pulses to be applied is determined by the output current of the corresponding neuron circuit. In this work, we conducted an on-chip learning simulation. We assumed that the neuron circuit performs a softmax activation function and that a program pulse is applied for every input image according to a conventional stochastic gradient descent algorithm. Fig. 7 shows the simulated recognition accuracy as a function of the number of trained samples. The accuracy baseline (88.18 %), shown in Fig. 7 as a reference, is the simulation result for an ideal CTF memory. The unidirectional update methods exhibit better accuracy than the bidirectional update method. In particular, the unidirectional update method using HEI, which has better linearity, showed slightly better recognition accuracy than the unidirectional update method using HHI.
However, with the bidirectional update method, the accuracy remains stuck at around 50% even with more training. In a unidirectional update method, only HEI or only HHI occurs in a given CTF device. In a bidirectional update method, by contrast, both HEI and HHI pulses can be applied to the same CTF device during the learning process. If HHI pulses are applied in the conductance range of 15∼60 μS, an abrupt conductance change occurs, as can be seen in Fig. 6, and fine weight modulation during learning fails. Therefore, the bidirectional update method yields much lower learning accuracy than the unidirectional update methods.
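The on-chip learning procedure with the HEI-only update can be sketched as a toy simulation. This is not the MATLAB simulation of this work: it uses small synthetic binary patterns instead of MNIST, an ideal linear conductance step, and at most one pulse per device per image. The sizes, the normalized conductance range, the step `STEP`, and the pulse threshold `THRESH` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PIX, N_CLS = 64, 4                  # toy sizes (the paper uses 784 inputs, 10 outputs)
G_MIN, G_MAX, STEP = 0.0, 1.0, 0.02   # assumed normalized conductance range and step per pulse
THRESH = 0.1                          # assumed error magnitude required to fire a pulse

# Synthetic binary class prototypes; samples are prototypes with 5 % of pixels flipped.
prototypes = rng.integers(0, 2, size=(N_CLS, N_PIX))

def make_samples(n):
    labels = rng.integers(0, N_CLS, size=n)
    flips = rng.random((n, N_PIX)) < 0.05
    images = np.where(flips, 1 - prototypes[labels], prototypes[labels])
    return images.astype(float), labels

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(x, g_plus, g_minus):
    # Neuron input is the differential current x @ (G+ - G-).
    return softmax(x @ (g_plus - g_minus))

def train(images, labels):
    # HEI can only lower conductance, so both devices start at G_MAX (w = 0).
    g_plus = np.full((N_PIX, N_CLS), G_MAX)
    g_minus = np.full((N_PIX, N_CLS), G_MAX)
    for x, t in zip(images, labels):
        err = np.eye(N_CLS)[t] - predict(x, g_plus, g_minus)
        grad = np.outer(x, err)  # desired weight change for this image
        # Potentiation: one HEI pulse on G-. Depression: one HEI pulse on G+.
        g_minus = np.where(grad > THRESH, np.maximum(g_minus - STEP, G_MIN), g_minus)
        g_plus = np.where(grad < -THRESH, np.maximum(g_plus - STEP, G_MIN), g_plus)
    return g_plus, g_minus

def accuracy(images, labels, g_plus, g_minus):
    preds = np.array([np.argmax(predict(x, g_plus, g_minus)) for x in images])
    return float(np.mean(preds == labels))

train_x, train_y = make_samples(400)
test_x, test_y = make_samples(200)
g_plus, g_minus = train(train_x, train_y)
test_acc = accuracy(test_x, test_y, g_plus, g_minus)
print("toy test accuracy:", test_acc)
```

Replacing the linear step with a strongly nonlinear or abrupt conductance change would be one way to explore, in this toy setting, why the update linearity discussed above matters for learning.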
In addition to the nonlinearity of conductance tuning, the synaptic weight precision (the number of multi-level conductance states) has a great influence on the accuracy of neuromorphic systems [25]. The multi-level conductance states as a function of the number of HEI pulses are depicted in Fig. 8. Fig. 9 shows the best recognition results for various numbers of multi-level conductance states when the unidirectional update method with HEI is adopted.
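The effect of a limited number of conductance states can be explored with a simple quantizer. The sketch below assumes uniformly spaced levels between a minimum and a maximum conductance, which is an idealization: the states obtained from successive HEI pulses are not necessarily evenly spaced. The function name and the numerical range are hypothetical.

```python
import numpy as np

def quantize_conductance(g, n_levels, g_min, g_max):
    """Snap conductances to the nearest of n_levels uniformly spaced states."""
    g = np.clip(np.asarray(g, dtype=float), g_min, g_max)
    step = (g_max - g_min) / (n_levels - 1)
    return g_min + np.round((g - g_min) / step) * step

# Example: the same target conductances stored with different precision.
rng = np.random.default_rng(1)
g_target = rng.uniform(15e-6, 60e-6, size=1000)  # illustrative range only
for n_levels in (2, 8, 64):
    g_q = quantize_conductance(g_target, n_levels, 15e-6, 60e-6)
    print(n_levels, "levels, max error:", np.max(np.abs(g_q - g_target)))
```

With uniform levels the worst-case storage error is half the level spacing, so it shrinks as the number of states grows.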

IV. FABRICATION METHOD
The fabrication process flow of the 3-D channel-stacked synapse array based on CTF memory devices is shown in Fig. 10. Here, we assume four layers in the stack, but the number of stacked layers can be expanded. When the number of stacked layers increases by one, one row of LSTs is added to the layer selection circuit in Fig. 2(a). Note that even if the number of stacked layers increases, the area of the synapse array region does not increase. The specific method of each step is summarized as follows.
Fig. 10(a): First, a multi-layer stack of alternating silicon oxide and polysilicon is formed on a silicon substrate, followed by deposition of Si3N4. The polysilicon will be used as the channel of the CTF memory devices. The top Si3N4 layer will be used as a hard mask for etching and as a chemical-mechanical polishing (CMP) stopper.
Fig. 10(b): After the multi-layer stack is formed, it is patterned by photolithography and dry etching. Then, the gate dielectric materials are deposited.
Fig. 10(c): N+-doped poly-Si deposition and CMP are performed.
Fig. 10(d): Gate photolithography and poly-Si etching are carried out to form the gates. Then, n-type ion implantation is carried out to form the source and drain regions.
Fig. 10(e): Oxide deposition and planarization are carried out to form the inter-layer dielectric (ILD).
Fig. 10(f): Photolithography and etching are carried out to form the OL trench region and the IL contact hole region. Then, selective wet etching of poly-Si is carried out. The region where poly-Si is removed will be filled with tungsten in the subsequent contact process.

V. CONCLUSION
In this paper, we have proposed a 3-D channel-stacked synapse array architecture. The synapse device is composed of a pair of CTF memory devices, which have good CMOS compatibility and excellent reliability. The synapse device characteristics were demonstrated using TCAD device simulation. In addition, we investigated the conductance update methods through an MNIST pattern recognition simulation. Compared with our previous array based on the gate-stacked structure, the proposed array architecture has the advantage of preventing abnormal operation caused by unwanted interference between stacked layers. The proposed 3-D channel-stacked synapse array architecture is a promising technology for high-density neuromorphic systems.