Double-Gated Asymmetric Floating-Gate-Based Synaptic Device for Effective Performance Enhancement Through Online Learning

In this paper, we propose a floating-gate-based synaptic transistor with two independent control gates that implement both offline and online learning. Unlike previously reported double-gated synaptic transistors, the proposed device is capable of online learning without facing a fan-out problem. The basic operation of the device was verified, and a program/erase (P/E) scheme based on Fowler-Nordheim tunneling is suggested for the multi-conductance utilization of the synaptic device. With the proposed P/E scheme, an offline-trained single-layered hardware-based spiking neural network (SNN) was simulated for MNIST classification, resulting in 87.37% classification accuracy with 10% conductance variation. To alleviate this performance degradation, an online learning method is applied to the offline-trained SNN by reusing 3,000 training images. The effectiveness of the proposed method is also verified in the presence of synaptic weight variance. As a result, up to 86.89% of the performance degradation is alleviated.


I. INTRODUCTION
The associate editor coordinating the review of this manuscript and approving it for publication was Cristian Zambelli.
Neuromorphic systems are rising candidates for next-generation computing systems due to their massively parallel data processing capability and minimal power consumption [1]-[8]. Various researchers have implemented neuromorphic systems using their own methods and have performed machine learning tasks such as pattern recognition and image denoising [9]-[13]. Neuromorphic systems consist of neuron circuits and synaptic devices, and their implementations differ depending on the specific combination of circuits and devices. The most widely used neuron circuits include integrate-and-fire (IF) neurons, a simplified model of a biological neuron that integrates current in a membrane capacitor and generates an action potential when the membrane voltage exceeds the threshold [14]-[19]. IF neurons receive and transmit signals in various forms, such as left-justified encoding or Poisson encoding [20]-[22]. The behavior of IF neurons has been proven to be equivalent to the rectified linear unit (ReLU) activation function of non-spiking neural networks (non-SNNs), which makes offline learning possible by transferring weights calculated on an external computer.
Candidates for synaptic devices include flash memory as well as emerging memory devices such as resistive random-access memory (RRAM), phase change random access memory (PCRAM), and ferroelectric tunnel junctions (FTJ) [23]-[30]. Both gradual switching devices and abrupt switching devices are used as synapses. Although it is more effective to use gradual switching synaptic devices, which represent a continuous synaptic weight in a single memory cell, it is also possible to implement one synaptic weight with multiple single-bit devices [31]-[33]. Several studies have implemented neuromorphic systems with widely used flash memory as synaptic devices [34]-[37]. Such studies include fabricating a hardware-based neural network that conducts vector-matrix multiplication in a NOR flash array, and implementing a binarized neural network by conducting an XNOR operation on NAND flash [38], [39].
However, conventional flash arrays have a downside: they require complex external controllers to implement online learning. For this reason, some researchers devised a double-gated synaptic transistor to easily implement online learning such as spike timing-dependent plasticity [15], [40]. Such double-gated synaptic transistors can realize online learning and lifelong learning, which are the major advantages of hardware-based neuromorphic systems. However, they induce problems in system operation. Conventional double-gated synaptic devices use the output pulse of the presynaptic neuron as a current source to prevent drain current from flowing when only the teaching signal is given. This method forces the presynaptic neuron to drive an excessive amount of current, causing a fan-out problem [37]-[42].
For example, transmitting a spike to one thousand synaptic devices that operate at 1 µA each requires the presynaptic neuron to drive 1 mA of current.
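The fan-out arithmetic in this example can be checked with a short sketch. The function name and the simplification that every synapse draws the same current are illustrative assumptions, not part of the original work.

```python
def presynaptic_drive_current(num_synapses: int, current_per_synapse_a: float) -> float:
    """Total current (in amperes) a presynaptic neuron must source when its
    output spike acts as the current source for every connected synapse.
    Assumes, for illustration only, that all synapses draw the same current."""
    return num_synapses * current_per_synapse_a


# Example from the text: 1,000 synapses at 1 uA each.
total_a = presynaptic_drive_current(1000, 1e-6)
print(f"{total_a * 1e3:.3f} mA")
```

The required drive current grows linearly with the number of connected synapses, which is why sourcing the drain current from the presynaptic neuron scales poorly.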
Therefore, in this research, we propose a double-gated synaptic transistor capable of online learning without facing a fan-out problem. We handle the fan-out problem by connecting the drain to a voltage source. We also prevent drain current from flowing during the teaching signal pulse by adopting asymmetrically shaped gates. In the following sections, we analyze the basic operations of the proposed device and the results for neuromorphic system operation when the device is used as a synapse. Moreover, a new learning technique that utilizes both online and offline learning is proposed to minimize the performance degradation of the neuromorphic system.
(L_SO). The gate-to-S/D overlap length (L_ov) is 0.36 µm for the drain region and 0.16 µm for the source region, which ensures effective channel control. The thicknesses of the tunneling oxide (T_ox) and the gate dielectric oxide are fixed to 5 nm and 15 nm, respectively. Doped poly-silicon is used for gate 1 and gate 2 for effective program/erase and online learning. The doping concentration is 5 × 10^20 cm^−3 of phosphorus in the S/D region and 1 × 10^17 cm^−3 of boron in the silicon box, and a graded doping profile is adopted to account for realistic fabrication conditions. All of the structural parameters are specified in Table 1.

B. SIMULATION CONDITION
We analyzed the device characteristics through simulation. To increase the accuracy of the simulation, we carefully calibrated all simulation conditions using measurement data from previous studies. Device simulation was conducted using Synopsys Sentaurus 3D technology computer-aided design (TCAD) simulation. Referring to [47], all of the physics models and parameters are calibrated to reproduce the actual program/erase (PGM/ERS) behavior of a memory device and the operation of a complementary metal-oxide-semiconductor (CMOS) device (Fig. 2). Bandgap narrowing and Shockley-Read-Hall (SRH) recombination models are used, and mobility is modeled using the Philips and Lombardi models. Quantum potential and Fermi-Dirac models are included to account for density gradient quantization and carrier density. The hydrodynamic model is adopted for carrier transport. The simulation results obtained with the above physics fit well with the data from [48]. To simulate the memory characteristics, we calibrated the electron tunneling mass of the 5 nm tunneling oxide using [49]-[51] as references. The electron tunneling mass is tuned to 0.35 m_0, which gives a well-matched threshold voltage (V_th) shift.

C. DEVICE OPERATION AND PGM/ERS CHARACTERISTICS
Synaptic transistors require two or more gates to perform online learning. Online learning is implemented using the potential difference between the input signal and the feedback signal. However, a malfunction may occur if a channel is formed by the feedback signal. Therefore, in conventional double-gated devices, V_in and V_dd are connected to ensure that drain current flows only when an input signal is given, as shown in Fig. 3. However, in this case, the output spikes of presynaptic neurons operate as current sources, inducing extreme fan-out problems. In neuromorphic systems, this problem becomes even more serious as numerous synaptic devices are connected in parallel, since R_L becomes far lower than R_out (Fig. 4).
FIGURE 2. Simulated transfer curves of parameters calibrated using measurement data. (a) Parameter and physics calibration process data from [44]. (b) Memory characteristic calibration process data from [45]-[47].
In the proposed device, the roles of gate 1 and gate 2 are different. The coupling ratio between gate 1 and the floating gate is larger than that between gate 2 and the floating gate. Therefore, gate 1 dominantly controls the on/off operation of the device. Due to the different coupling ratios, the voltage across the silicon dioxide between gate 2 and the floating gate is larger than that between gate 1 and the floating gate. Therefore, gate 2 becomes the charge source for programming the floating gate by Fowler-Nordheim (FN) tunneling. One important characteristic of the proposed device is that drain current does not flow in response to a feedback signal even if the drain is connected to a voltage source. The reason is the existence of an area under gate 1 that maintains a source-side energy barrier regardless of the voltage applied to gate 2 (Fig. 3). This can be verified in Fig. 5, which shows the energy band diagrams of the channel for different gate 2 voltages. This characteristic plays an important role in preventing a malfunction in online learning, where signals are given to both gate 1 and gate 2. Without it, the neuron would fire at an excessive rate when a large feedback signal is given to gate 2.
FIGURE 5. Energy band diagrams along the source-channel-drain direction with respect to different gate 2 biases. 0 V is applied to gate 1 and 2 V is applied to the drain. Note that the source-side energy barrier remains constant regardless of changes in the gate 2 bias.
The PGM/ERS operation of the proposed device utilizes FN tunneling across the silicon oxide between gate 2 and the floating gate. Before turning to online learning, we first analyze the basic PGM/ERS characteristics needed to use the proposed device as a multi-conductance synaptic device for offline learning. Fig. 6(a) presents the bias conditions of PGM/ERS for offline learning. Rectangular pulses are applied to gate 2 of the device, and gate 1 and the drain are biased at 0 V during the learning process. For PGM, individual pulses with a magnitude of −6.1 V are applied to gate 2 for 10 µs each. When the PGM pulse is applied to gate 2, most of the voltage drops across the tunneling oxide between gate 2 and the floating gate, since the coupling ratio between gate 1 and the floating gate is larger than that between gate 2 and the floating gate. For this reason, FN tunneling of electrons occurs between gate 2 and the floating gate, changing the charge stored in the floating gate. It can be seen that the amount of charge decreases as the pulses are applied. However, as the PGM pulse is applied repeatedly, the charge stored in the floating gate lowers its potential, which degrades the PGM efficiency. Therefore, the rate of change of the charge decreases as the pulse number increases.
Transfer curves were verified for each of the PGM/ERS states. Gate 2 and the drain are biased at 0 V and 2 V, respectively, and gate 1 is used as the control gate as mentioned above. During gradual PGM, as the number of PGM pulses increases, the reduced charge in the floating gate causes the threshold voltage (V_th) of the device to increase. In contrast, with repeatedly applied ERS pulses, V_th decreases due to the accumulated charge in the floating gate. This results in memory windows (MW) of 0.65 V and 0.4 V for PGM and ERS, respectively. The conductance values are extracted in each state at a read voltage of 1.5 V, which ensures stable operation of the proposed synaptic device. The conductance decreases with the PGM states. However, the conductance change rate becomes smaller because the sufficiently discharged floating gate suppresses the tunneling of additional electrons. For the ERS states, the conductance shows the opposite trend, but the conductance change rate diminishes in the same way due to the sufficiently charged floating gate. This matches the trend of the floating-gate charge illustrated in Fig. 6(c).

D. SYSTEM LEVEL SIMULATION OF SYNAPTIC DEVICE
To analyze the system-level performance of the proposed device, an offline-trained hardware-based SNN is simulated. A single-layered hardware-based SNN is trained to classify the Modified National Institute of Standards and Technology (MNIST) dataset, for which the ideal non-SNN exhibits 92.06% classification accuracy. Fig. 8(a) presents a schematic of a synaptic array using the proposed double-gated device. The V_fb lines carry the feedback signals for online learning. One synaptic weight is represented by a pair of synaptic devices, denoted by G+ and G−, respectively [52]-[55]. G+ injects current into the membrane capacitor of a postsynaptic neuron and is responsible for the positive part of a weight. G−, on the other hand, withdraws current from the membrane capacitor and is responsible for the negative part of a weight.
Weight values are calculated externally from an optimized non-spiking neural network using stochastic gradient descent with a mean square error loss, and are transferred into our SNN after weight quantization. For a pair of synaptic devices representing a positive weight, G− is programmed to its lowest conductance. For a pair representing a negative weight, G+ is programmed to its lowest conductance.
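The weight-transfer rule described above can be sketched as follows. This is a minimal illustration, assuming evenly spaced conductance states in arbitrary units and scaling by the largest weight magnitude; the function name, `g_min`, `g_max`, and the default of 16 levels (taken from the conclusion) are choices made for this example, and a real device would use its measured, non-uniform conductance states.

```python
import numpy as np


def weights_to_conductance_pairs(weights, g_min=0.0, g_max=1.0, levels=16):
    """Map signed weights to (G+, G-) conductance pairs.

    Assumptions (not from the paper): weights are scaled by their largest
    magnitude, and the `levels` conductance states are evenly spaced
    between g_min and g_max in arbitrary units.
    """
    w = np.asarray(weights, dtype=float)
    scale = np.max(np.abs(w))
    if scale == 0.0:
        scale = 1.0  # all-zero weights: avoid division by zero
    # Quantize |w| / scale to one of `levels` integer steps (0 .. levels-1).
    steps = np.rint(np.abs(w) / scale * (levels - 1))
    g_active = g_min + steps / (levels - 1) * (g_max - g_min)
    # Positive weight: G+ carries the magnitude, G- stays at its lowest state.
    # Negative weight: G- carries the magnitude, G+ stays at its lowest state.
    g_plus = np.where(w > 0, g_active, g_min)
    g_minus = np.where(w < 0, g_active, g_min)
    return g_plus, g_minus


g_plus, g_minus = weights_to_conductance_pairs([0.5, -1.0, 0.0, 0.25])
effective = g_plus - g_minus  # effective weight seen by the postsynaptic neuron
```

In every pair, at least one device sits at the lowest conductance, so the effective weight is simply the difference G+ − G−.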
As illustrated in Fig. 8(b), the pixel values of input images from MNIST are represented using Poisson-encoded spike trains, where the spiking rate is proportional to the input pixel value. A maximum of 255 spikes is used for a single pixel since the MNIST images are 8-bit grayscale. Using the synaptic device as described above yielded a classification accuracy of 91.64%.
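A rate-coded input of this kind can be sketched as below. The sketch assumes 255 discrete time steps and one Bernoulli draw per step, a common discrete-time approximation of a Poisson process; the function name and this generator are assumptions for illustration, not the exact encoder used in the simulation.

```python
import numpy as np


def encode_image_as_spikes(pixels, num_steps=255, rng=None):
    """Rate-code 8-bit pixel values as random spike trains.

    Each time step emits a spike with probability pixel / 255, so the
    expected spike count is proportional to the pixel value and can never
    exceed `num_steps`. This Bernoulli-per-step scheme is an assumption,
    not the paper's stated generator.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(pixels, dtype=float) / 255.0
    # Result shape: (num_steps, *pixels.shape), entries 0 (no spike) or 1.
    return (rng.random((num_steps,) + p.shape) < p).astype(np.uint8)


spikes = encode_image_as_spikes(np.array([0, 64, 255]), rng=np.random.default_rng(0))
counts = spikes.sum(axis=0)  # spike count per pixel
```

A black pixel (0) never spikes and a white pixel (255) spikes at every step, while intermediate values produce a random count whose mean scales with the pixel value.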

III. ONLINE LEARNING
A. IMPLEMENTATION OF ONLINE LEARNING ON DEVICE
In this section, we maximize the performance of a neuromorphic system by utilizing both offline and online learning. Offline learning is effective in that it adopts optimal synaptic weight values computed on an external computer. However, it is prone to performance degradation from problems including variance, limited fan-in, and overflow [56]-[58]. In other words, the computed weights are optimal for the ideal neural network but suboptimal for hardware-based neural networks. Therefore, in this paper, we apply online learning to a neuromorphic system that has first been trained offline, in order to maximize performance. Similar to previously researched double-gated synaptic devices [14], [15], the proposed device utilizes the overlap of the input and teaching signals for online learning. Neither the teaching signal nor the input signal alone provides enough voltage to change the electron density in the floating gate. Therefore, the changes in floating-gate charge and threshold voltage under the online programming scheme must be analyzed. As presented in Fig. 9(a), a bipolar input pulse with an amplitude of 1.5 V and a duration of 600 ns is applied to gate 1 for reading and current summation. The feedback pulse is applied to gate 2 for online depression and potentiation, and its voltage and duration differ depending on the target polarity and magnitude of the conductance change.
In the case of online depression, a −5.9 V feedback pulse is applied to gate 2, which makes the maximum potential difference between gate 2 and gate 1 reach −7.4 V. This is a sufficient potential difference for FN tunneling of electrons from gate 2 to the floating gate. For online potentiation, the input pulse applied to gate 1 is the same as for online depression, and the feedback pulse is a positive 5.1 V pulse, opposite in polarity to that of online depression. In this case, the maximum potential difference between gate 2 and gate 1 becomes 6.6 V, which causes FN tunneling of electrons from the floating gate to gate 2. The resulting charge change is less than 10% of the charge change in offline learning. The amount of V_th shift differs depending on the conductance of the synaptic device before receiving a signal, and the conductance changes (ΔS/S) for online depression and potentiation are presented in Fig. 9(b).
VOLUME 8, 2020
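The quoted maxima follow from simple voltage arithmetic, sketched below. The sketch treats both maxima as the difference between the gate 2 and gate 1 voltages and assumes that depression overlaps the positive phase of the bipolar input pulse while potentiation overlaps the negative phase; both points are inferences from the quoted numbers, and the names are illustrative.

```python
def gate2_minus_gate1(v_feedback, v_input):
    """Potential difference between gate 2 (feedback) and gate 1 (input), in volts."""
    return v_feedback - v_input


V_IN = 1.5           # amplitude of the bipolar input pulse on gate 1 (V)
V_FB_DEPRESS = -5.9  # feedback pulse for online depression (V)
V_FB_POTENT = 5.1    # feedback pulse for online potentiation (V)

# Assumption: depression overlaps the positive input phase,
# potentiation overlaps the negative input phase.
cases = {
    "depression, feedback only": gate2_minus_gate1(V_FB_DEPRESS, 0.0),
    "depression, overlap": gate2_minus_gate1(V_FB_DEPRESS, +V_IN),
    "potentiation, feedback only": gate2_minus_gate1(V_FB_POTENT, 0.0),
    "potentiation, overlap": gate2_minus_gate1(V_FB_POTENT, -V_IN),
}
for name, dv in cases.items():
    print(f"{name}: {dv:+.1f} V")
```

Only the overlap cases reach the −7.4 V and 6.6 V levels quoted in the text; a feedback pulse on its own stays at −5.9 V or 5.1 V.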

B. SYSTEM-LEVEL SIMULATION FOR ONLINE LEARNING
In actual implementations of hardware neural networks, performance degradation occurs due to device conductance variation. The resulting performance gap between the ideal non-SNN and the hardware-based SNN can be mitigated by fine conductance modulation, which can be achieved by applying online learning to the offline-trained SNN of Fig. 8. By applying a teaching signal with a certain rule to the double-gated synaptic device, we can emulate gradient descent in updating the synaptic weight. First, consider a single-layered perceptron, where X is the input vector, Y is the output vector, W is the weight matrix, and L is the loss function. Minimizing the mean square error L with gradient descent yields a weight update Δw_ij proportional to (y_j − t_j)x_i, where x_i is the i-th input, y_j is the j-th output, and t_j is the corresponding target. To emulate this update on hardware, we update the conductances of G+ and G− with teaching signals and input signals. Since decreasing G− by ΔG yields the same result as increasing G+ by ΔG, we use only online depression during online learning, which reduces the overall current level and power consumption. For weights connected to the output neuron corresponding to the label, we decrease G− since the weight must increase, and for the others, we decrease G+. According to Fig. 8(b), a conductance change of the synaptic transistor occurs when a presynaptic spike overlaps with the teaching signal. Therefore, with rate-coded data, Δw_ij is proportional to the expected number of input spikes overlapping with the teaching signal. If a teaching signal with a duty proportional to y_j − t_j is given to gate 2, Δw_ij becomes proportional to (y_j − t_j)x_i, successfully emulating a weight update by gradient descent. A step-by-step flowchart for online learning is presented in Fig. 10(a). During the online learning procedure, we further reduce the training loss by reusing training images.
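For reference, one common form of the single-layer model and its gradient-descent update is sketched below. The purely linear output, the factor of 1/2 in the loss, the target components t_j, and the learning rate η are assumptions chosen to be consistent with the proportionality between the weight update and (y_j − t_j)x_i stated in the text; the exact expressions used by the authors may differ.

```latex
\begin{align}
  y_j &= \sum_i w_{ij}\, x_i, \\
  L &= \frac{1}{2} \sum_j \left( y_j - t_j \right)^2, \\
  \frac{\partial L}{\partial w_{ij}} &= \left( y_j - t_j \right) x_i, \\
  \Delta w_{ij} &= -\eta \, \frac{\partial L}{\partial w_{ij}}
                 = -\eta \left( y_j - t_j \right) x_i .
\end{align}
```

Under these assumptions, the magnitude of the update scales with the output error and the input activity, and its sign opposes the error: an output neuron that fires less than its target (y_j < t_j) has its incoming weights increased.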
If the hardware SNN operates as desired, only the output neuron corresponding to the target class should fire at the target rate, and the others should remain silent. Therefore, if a neuron fires less than the target firing rate, the synaptic weights connected to it should increase, and if it fires more, those weights should decrease. An example of the node voltages is presented in Fig. 10(b)-(d). As illustrated in Fig. 10(d), a weight change occurs when V_pre − V_fb reaches its maximum. Since the duration of the teaching signal V_fb is proportional to the difference between the number of target output spikes and the actual number of output spikes, the larger the output error, the larger the conductance change.
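The update rule described above can be sketched at the behavioral level as follows. This is a simplified illustration only: it treats each conductance change as linear in the overlap between input activity and teaching-pulse duration, ignores the state dependence of the conductance change, and uses hypothetical parameters (`step`, `g_min`) and a hypothetical function name.

```python
import numpy as np


def online_depression_step(g_plus, g_minus, x_rate, y_spikes, t_spikes,
                           step=1e-3, g_min=0.0):
    """One depression-only online update for a single-layer network.

    g_plus, g_minus : (n_in, n_out) conductance arrays, arbitrary units
    x_rate          : (n_in,) input spike rates in [0, 1]
    y_spikes        : (n_out,) observed output spike counts
    t_spikes        : (n_out,) target output spike counts
    `step` and `g_min` are hypothetical parameters, not values from the paper.
    """
    error = np.asarray(y_spikes, dtype=float) - np.asarray(t_spikes, dtype=float)
    # Expected overlap of input spikes with a teaching pulse whose duration
    # is proportional to |error|: proportional to x_i * |y_j - t_j|.
    delta = step * np.outer(np.asarray(x_rate, dtype=float), np.abs(error))
    # Fired too little (error < 0): raise the weight by depressing G-.
    # Fired too much  (error > 0): lower the weight by depressing G+.
    new_minus = np.where(error < 0, g_minus - delta, g_minus)
    new_plus = np.where(error > 0, g_plus - delta, g_plus)
    return np.maximum(new_plus, g_min), np.maximum(new_minus, g_min)


g_p = np.full((2, 2), 0.5)
g_m = np.full((2, 2), 0.5)
new_p, new_m = online_depression_step(g_p, g_m, x_rate=[1.0, 0.0],
                                      y_spikes=[2, 9], t_spikes=[10, 0])
```

In this example, the first output neuron fires too little, so only G− of its active input is depressed; the second fires too much, so only G+ is depressed. Synapses whose input is silent are left unchanged.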
Conductance variation in nonvolatile memory is inevitable: there is always a mismatch between the target conductance and the programmed conductance. To verify the effect of conductance variation on SNN performance, we assumed an ideal synaptic device with continuous conductance states and applied a log-normal distribution of conductance variation. As the variance increases, the system accuracy decreases since the synaptic weights are no longer at their local optimum. By applying the online learning method, we can bring the synaptic weights closer to the local optimum, which improves the performance and recovers up to 87% of the performance degradation. The proposed online learning method shows significant performance enhancement, and this additional online learning can be performed without a fan-out problem. The classification accuracy of the hardware-based SNN before and after applying online learning with respect to weight variance is presented in Fig. 11.
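The variation model and the recovery metric can be sketched as follows. Both functions are illustrative assumptions: treating `sigma` as the standard deviation of ln(G) is one possible parametrization of the log-normal variation, and the recovered fraction is one plausible reading of how much performance degradation is alleviated.

```python
import numpy as np


def apply_lognormal_variation(conductance, sigma, rng=None):
    """Perturb target conductances with multiplicative log-normal noise.

    Each value is multiplied by exp(sigma * z) with z ~ N(0, 1). Treating
    `sigma` as the standard deviation of ln(G) is an assumption; the exact
    parametrization of the variation is not given in this section.
    """
    rng = np.random.default_rng() if rng is None else rng
    g = np.asarray(conductance, dtype=float)
    return g * rng.lognormal(mean=0.0, sigma=sigma, size=g.shape)


def degradation_recovered(acc_ideal, acc_degraded, acc_after):
    """Fraction of the accuracy loss recovered after online learning
    (hypothetical definition used for illustration)."""
    return (acc_after - acc_degraded) / (acc_ideal - acc_degraded)


target = np.full((4, 3), 0.5)
programmed = apply_lognormal_variation(target, sigma=0.1, rng=np.random.default_rng(0))
```

Multiplicative noise keeps every conductance positive, which is one reason a log-normal model is a natural choice for programmed conductance states.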

IV. CONCLUSION
In this paper, we analyzed the device characteristics and the system-level operation of a floating-gate-based synaptic device with two control gates. The proposed device is programmed to 16 different conductance levels by applying pulses to one of the two control gates, making it a qualified synaptic device candidate for hardware-based spiking neural networks. Through an effective online learning method, the performance of the hardware neural network is maximized and variation immunity is achieved. The proposed synaptic transistor and its training strategy enable efficient lifelong learning in a neuromorphic system.

ACKNOWLEDGMENT
(Donghyun Ryu and Tae-Hyung Kim contributed equally to this work.)