Low-Power Binary Neuron Circuit With Adjustable Threshold for Binary Neural Networks Using NAND Flash Memory

Recent studies have demonstrated that binary neural networks (BNNs) can achieve satisfactory inference accuracy on representative image datasets. A BNN performs XNOR and bit-counting operations instead of high-precision vector-matrix multiplication (VMM), which significantly reduces memory storage. In this work, an analog bit-counting scheme is proposed to decrease the burden of neuron circuits with a synaptic architecture utilizing NAND flash memory. A novel binary neuron circuit with a double-gate positive feedback (PF) device is demonstrated to replace the sense amplifier, adder, and comparator, thereby reducing both the burden on the complementary metal-oxide-semiconductor (CMOS) circuits and the power consumption. By using the double-gate PF device, the threshold voltage of the neuron circuits can be adaptively matched to the threshold value in the algorithms, eliminating the accuracy degradation introduced by process variation. Thanks to the super-steep subthreshold swing (SS) of the PF device, the proposed neuron circuit has an off-state current of 1 pA, a 10^5-fold improvement over a neuron circuit with a conventional metal-oxide-semiconductor field-effect transistor (MOSFET). A system simulation of a hardware-based BNN shows that the low-variance conductance distribution (8.4%) of the synaptic device and the adjustable threshold of the neuron circuit implement a highly efficient BNN with high inference accuracy.


I. INTRODUCTION
Recently, neuromorphic computing inspired by brain architecture has gained much interest because of its extremely low-power and massively parallel operation [1], [2]. In the von Neumann architecture, vector-matrix multiplication (VMM) consumes enormous energy because of the memory wall problem: data must move between the memory and the arithmetic units. Neuromorphic computing, in contrast, resolves this problem by computing the VMM with a nonvolatile memory array in a single pulse step, overcoming the von Neumann bottleneck. To implement neuromorphic computing with nonvolatile memory, researchers have proposed implementing the analog conductance of synaptic devices [3], [4]. However, it is challenging to implement an accurate analog conductance state in a memory device because of the non-ideal analog conductance characteristics of the memory device [4], [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Junxiu Liu.
Recently, researchers have demonstrated that BNNs can achieve inference accuracy comparable to that of high-precision neural networks on various datasets, such as MNIST, CIFAR-10, and ImageNet [6]-[8]. A BNN dramatically reduces memory storage and computing resources by binarizing the activations and weights [6]-[14]. Instead of a high-precision analog state, it requires only a binary state of the memory device, which provides a practical way to implement a hardware neural network system [8], [14]. In neuromorphic systems, the 2T2R structure (two select transistors with two RRAMs) has mainly been studied as a binary synapse [8], [14]. Recent high-performance deep neural network (DNN) algorithms, however, typically demand a large parameter size, and NAND flash memory is a promising candidate for synaptic devices to meet this requirement. NAND flash memory offers ultra-high bit density for ample data storage and a low fabrication cost per bit, and it is a mature technology [15]-[17]. In previous research, we reported neuromorphic systems utilizing NAND flash memory as a multilevel synapse for on-chip learning [18] and as a binary synaptic device for a BNN with a digital bit-counting scheme [19].
First, in this study, we propose an analog bit-counting scheme with a synaptic architecture utilizing NAND flash memory. The proposed scheme replaces the digital sense amplifier, adder, and digital comparator with a binary neuron circuit, significantly reducing the CMOS overhead in the neuron circuits compared to a digital bit-counting scheme. In the ideal case, a one-bit current sense amplifier (CSA) can serve as a neuron circuit that produces a binary neuron output. However, it may cause considerable degradation of the inference accuracy, because process variation can make the threshold of the binary neuron circuit differ from the threshold value in the algorithms [8], [20]. In a previous study, an ADC-like multi-level sense amplifier (MLSA) was employed instead of a one-bit CSA to minimize the accuracy degradation [8]. However, the ADC-like MLSA requires a large CMOS overhead.
Second, for the first time, we propose a low-power binary neuron circuit with a PF device that has an adaptive threshold to resolve the above problem. Note that the proposed binary neuron circuit serves as a low-power comparator with an adaptive-threshold function, which differs from conventional integrate-and-fire neuron circuits. We demonstrate that the threshold voltage of the neuron circuits can be adaptively changed by the gate bias or by program/erase pulses. Therefore, the proposed neuron circuit can eliminate the accuracy degradation introduced by process variation without any CMOS overhead. In addition, the PF device, which is based on a gated thyristor, has a super-steep subthreshold swing (SS) [21]-[23]. Finally, we show that the proposed neuron circuit with the PF device significantly reduces the off-state current of the neuron circuit compared to a neuron circuit with a conventional MOSFET, thanks to the super-steep SS of the PF device.

II. BINARY SYNAPTIC ARCHITECTURE BASED ON NAND FLASH MEMORY
Fig. 1 (a) and (b) show a synaptic string array architecture consisting of 2T2S synaptic strings in a digital bit-counting scheme and the operating voltage scheme, respectively [19]. The 2T2S synaptic string, which consists of two input transistors and two NAND strings, is capable of the XNOR operation. Two input voltages (V_in1, V_in2) are applied to the gates of the two input transistors, which are shared by all synaptic devices in one synaptic string; the number of input transistors is therefore significantly smaller than in the 2T2R scheme of a previous work [8]. A synaptic device consists of two NAND cells whose complementary states define the synaptic weight. As shown in Fig. 1 (b), a weight of +1 is defined as the state in which the right cell has a high threshold voltage (V_t,high) and the left cell has a low threshold voltage (V_t,low). In contrast, a weight of −1 is defined as the reverse state of the two NAND cells.
In addition, the complementary state of the two input voltages (V_in1, V_in2) defines the input value. The states (V_on, V_off) and (V_off, V_on) represent input values of +1 and −1, respectively, as shown in Fig. 1 (b). With this scheme, the string current (I_S), which represents the XNOR output, is determined by the combination of the complementary input voltages and the states of the two adjacent NAND flash cells. The fixed reference current (I_REF) of the sense amplifier is set to a value between the on-current (I_on) and the off-current (I_off) of the NAND flash cells. The current sense amplifier compares I_REF with the string current I_S, which is the sum of the currents of the two NAND cells (I_C1, I_C2), to generate the XNOR output.
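The XNOR behavior of a single 2T2S string can be sketched in a few lines of Python. This is a minimal behavioral model, not the authors' circuit simulation. It assumes that V_in1 gates the string containing the left cell and V_in2 gates the string containing the right cell, which the text does not state explicitly. The current values are hypothetical placeholders chosen only so that I_REF lies between I_off and I_on.

```python
# Hypothetical current levels (assumptions, not values from the paper).
I_ON = 1e-6                      # on-current of a low-Vt cell (A)
I_OFF = 1e-9                     # off-current of a high-Vt cell (A)
I_REF = (I_ON * I_OFF) ** 0.5    # any reference between I_OFF and I_ON works


def string_current(x, w):
    """Return I_S = I_C1 + I_C2 for input x and weight w (each +1 or -1)."""
    # Input +1 -> (V_on, V_off); input -1 -> (V_off, V_on).
    # Assumption: V_in1 gates the left string, V_in2 gates the right string.
    left_gate_on = (x == +1)
    right_gate_on = (x == -1)
    # Weight +1 -> left cell low Vt (conducts), right cell high Vt.
    i_left_cell = I_ON if w == +1 else I_OFF
    i_right_cell = I_OFF if w == +1 else I_ON
    i_c1 = i_left_cell if left_gate_on else 0.0
    i_c2 = i_right_cell if right_gate_on else 0.0
    return i_c1 + i_c2


def xnor_output(x, w):
    """Sense-amplifier decision: +1 if I_S exceeds I_REF, otherwise -1."""
    return +1 if string_current(x, w) > I_REF else -1


# Truth table over all input/weight combinations.
TRUTH_TABLE = {(x, w): xnor_output(x, w) for x in (+1, -1) for w in (+1, -1)}
```

In this model the output is +1 exactly when the input and the weight agree, which is the XNOR of the two binary values in the ±1 encoding.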
The word-line (WL) decoder applies the read bias (V_read) to the selected WL and the pass bias (V_PASS) to the unselected WLs. The input vector switch matrix applies input pulses to the input transistors. The adder sums the XNOR outputs, and the summed result goes through a binary comparator to produce a binary output. When V_read is applied to the WLs sequentially along the synaptic string, the outputs of the post-synaptic neurons are generated sequentially. Thus, the output of the kth neuron in the post-synaptic neuron layer is generated when V_read is applied to the kth WL. Because multipliers are not required, the power consumption is greatly reduced.
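The digital bit-counting path described above (XNOR, adder, binary comparator, one word line per post-synaptic neuron) can be sketched as follows. The function names, the list-of-lists weight layout, and the tie rule (a sum equal to the threshold gives +1) are assumptions made for illustration only.

```python
def neuron_outputs(inputs, weights, threshold=0):
    """Binary outputs of the post-synaptic layer, one per word line.

    inputs  : list of +1/-1 values, one per synaptic string
    weights : weights[k][j] is the +1/-1 weight on word line k, string j
    """
    outputs = []
    for wl_weights in weights:                 # V_read moves to the next WL
        xnor = [x * w for x, w in zip(inputs, wl_weights)]  # +1 when equal
        bit_sum = sum(xnor)                    # adder
        outputs.append(+1 if bit_sum >= threshold else -1)  # comparator
    return outputs


def on_count(bit_sum, n_strings):
    """Number of +1 XNOR outputs that produces a given weighted sum."""
    return (bit_sum + n_strings) // 2


example = neuron_outputs(
    [1, -1, 1, 1],
    [[1, -1, 1, 1], [-1, 1, -1, -1], [1, 1, 1, -1]],
)
```

With 256 strings, `on_count(0, 256)` gives 128, so a weighted sum of 0 corresponds to 128 strings producing a +1 XNOR output.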
We propose an analog bit-counting scheme with a 2T2S synaptic string by using a binary neuron circuit. When multiple string currents are summed, I_on dominates the total current (I_T), because the on/off resistance ratio of the NAND cells is sufficiently large [19]. For example, when the number of synaptic strings in the array is 256, a weighted sum of 0 corresponds to an I_T of 128 I_on. When I_T is smaller than 128 I_on, the binary neuron circuit generates an output of −1, which means that there are more XNOR outputs of −1 than of +1 in a row. As shown in Fig. 2, the binary neuron circuit in this scheme replaces the digital sense amplifiers, adder, and comparator, which significantly decreases the power consumption and the burden of the circuits compared to the digital scheme in Fig. 1. Furthermore, the neuron circuit can be reused for all neurons in the neuron layer, which increases the integration density compared to the previous work [8].
On the other hand, process variation in the neuron circuit can reduce the inference accuracy in the proposed analog bit-counting scheme. In principle, a binary CSA can be used as a neuron to generate a binary output. However, process variation results in an intrinsic offset of the CSA, so the threshold of the neuron can differ from the target value in the algorithms. The sensing pass rate worsens as the array grows and the total current (I_T) from the synaptic array increases [20], which causes significant accuracy degradation. In a previous study, an ADC-like multi-level sense amplifier (MLSA) was employed instead of a one-bit CSA to minimize the accuracy degradation [8]. However, the ADC-like MLSA requires an immense CMOS overhead. We propose a low-power binary neuron circuit with a double-gate PF device that adaptively changes the threshold voltage of the neuron circuit, which significantly reduces the accuracy degradation without the CMOS overhead.
Fig. 3 (a) and (b) show the 3-D schematic and top views of the fabricated PF device, which has a double-gate floating-body structure. From the right in Fig. 3 (b), the PF device has a cathode (n+ region), a gated region (p-channel), an ungated region (n-channel), and an anode (p+ region). The n-channel and p-channel, each with a doping concentration of ∼1 × 10^18 cm^−3, serve as the hole and electron injection barriers, respectively. An O/N/O stack with thicknesses of 2/4.2/9 nm is formed between the p-channel and the gate to store charge in the nitride (N) layer. Fig. 4 shows a TEM image of the fabricated PF device. The n+ poly-Si double gates (G1, G2) are defined on the left and right sides of the Si3N4 spacer.
The simulation is executed at V_G1 values of 2 V and −1 V, which correspond to the turn-on and turn-off states, respectively, at a fixed V_G2 of 0 V. In the turn-off state, the electron injection barrier (V_p) and the hole injection barrier (V_n) impede the movement of electrons and holes, respectively, as shown in Fig. 5. When V_G1 increases from −1 V to 2 V, V_p decreases, which results in the injection of electrons from the n+ region into the n region. This decreases V_n, which results in the injection of holes from the p+ region into the p region; V_p decreases further, and electrons flow into the n region again. When this PF process occurs, the device turns on rapidly with a steep SS.
Fig. 6 (a) shows the anode current (I_A) versus G1 voltage (V_G1) curves measured in the fabricated PF device for different values of V_G2. As V_G2 increases, the threshold voltage (V_th) decreases because the electron injection barrier (V_p) effectively decreases. Fig. 6 (b) shows the I_A-V_G1 curves for different values of the anode bias (V_A). The built-in potential (V_bi) of the p-n junction impedes the current flow when V_A is small. When V_A is larger than V_bi, I_on increases significantly.
As V_A increases further, carriers are generated in the reverse-biased p-n junction and accumulate in the n region and the p region. As a result, V_p and V_n decrease as V_A increases, resulting in a decrease of V_th. Fig. 7 shows the change of V_th with V_G2 at a fixed V_A of 1 V: the threshold voltage can be modulated by changing the bias applied to the second gate (G2). Therefore, the threshold voltage of the neuron circuits can be adaptively controlled, which significantly reduces the accuracy degradation introduced by process variation.
Fig. 8 (a), (b), and (c) show the I_A-V_G1 curves of the fabricated PF device for different numbers of pulses when the device is programmed with a V_PGM of 7, 7.5, and 8 V, respectively. As the number of V_PGM pulses applied to G2 increases, more electrons are trapped in the charge trap layer of the PF device, which increases the concentration of holes in the p region, and the V_th of the PF device increases gradually. Fig. 8 (d) shows the change of V_th with the number of pulses for different V_PGM amplitudes. At the same number of pulses, V_th increases with the pulse amplitude because more electrons are trapped in the charge trap layer. Therefore, the V_th of the neuron circuit can be changed by controlling the amplitude and the number of V_PGM pulses applied to the double-gate PF device.
Fig. 9 shows a schematic diagram of the proposed binary neuron circuit using the double-gate PF device. The neuron circuit consists of the double-gate PF device, two inverters, and one nMOSFET. The supply voltage (VDD) is 1.2 V. As the current from the synaptic array increases, the membrane voltage (V_m) increases. When V_m exceeds the V_th of the neuron circuit, the on-current flows in the neuron circuit and the output voltage (V_out) becomes VDD, which can be regarded as a binary output of +1.
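The text states that V_th rises with the number and amplitude of V_PGM pulses but does not describe a tuning procedure. As one possible illustration, the sketch below pairs a behavioral model of the neuron decision with a simple program-and-verify loop, a technique borrowed from common flash-memory practice rather than taken from this work. The constant shift per pulse is a hypothetical simplification; the measured V_th change per pulse depends on the pulse amplitude and is not modeled here. Voltages are in millivolts so that the arithmetic stays exact.

```python
VDD_MV = 1200  # supply voltage of 1.2 V from the text, in millivolts


def neuron_output(v_m_mv, v_th_mv):
    """V_out becomes VDD when the membrane voltage exceeds V_th, else 0."""
    return VDD_MV if v_m_mv > v_th_mv else 0


def program_to_target(v_th_start_mv, v_th_target_mv,
                      step_per_pulse_mv=20, max_pulses=100):
    """Apply program pulses until V_th reaches the target (verify each step).

    step_per_pulse_mv is a hypothetical constant V_th shift per V_PGM pulse.
    Returns the final V_th and the number of pulses applied.
    """
    v_th, pulses = v_th_start_mv, 0
    while v_th < v_th_target_mv and pulses < max_pulses:
        v_th += step_per_pulse_mv   # one V_PGM pulse on G2
        pulses += 1                 # followed by a verify read of V_th
    return v_th, pulses


tuned_v_th, n_pulses = program_to_target(500, 700)
```

Under these assumptions, raising V_th from 500 mV to 700 mV takes 10 pulses, after which a membrane voltage of 550 mV no longer fires the neuron while 750 mV does.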
V_out is then initialized to 0 V by applying the reset pulse (V_r) to the reset MOSFET (M_reset). Fig. 10 (a) and (b) show the transient waveforms of the neuron circuit for different membrane voltages (V_m) and for different V_th values of the neuron circuit, respectively. In Fig. 10 (a), the V_th of the double-gate PF device is fixed at 0.7 V by controlling V_G2 or by the program/erase operation, and V_m increases from V_m4 to V_m1. When V_m1 exceeds the fixed V_th of 0.7 V, V_out1 becomes VDD. In Fig. 10 (b), V_m is fixed at 0.55 V (V_m1), and V_th decreases from V_th1 to V_th3. Because V_th3 is lower than V_m1, V_out becomes VDD (V_out at V_th3).
VOLUME 8, 2020
Fig. 11 (a) and (b) show schematic diagrams of neuron circuits using the double-gate PF device and a conventional MOSFET, respectively, for a comparison of the power consumption in the off-state (V_out = 0 V) of the neuron circuit. Fig. 11 (c) and (d) show the membrane voltage (V_m) and the current of the neuron device in the neuron circuits using the double-gate PF device and the conventional MOSFET, respectively. When V_m is lower than V_th, the PF device, with its steep switching characteristic, shows a very low I_off (∼1 pA) during the read operation of the synaptic arrays, as shown in Fig. 11 (c). In the neuron circuit using the conventional MOSFET, on the other hand, the subthreshold current (∼100 nA) of the nMOSFET flows during the read operation of the synaptic arrays, as shown in Fig. 11 (d). Therefore, during the off-state of the neuron circuit, the neuron circuit with the PF device consumes significantly less power than the neuron circuit with a conventional MOSFET.
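The off-state figures quoted above can be checked with a short calculation. The sketch assumes that the off-state current is drawn from the 1.2 V supply, so that the static power is simply I_off × VDD; the text reports only the currents, so the power values are an illustrative estimate rather than reported results.

```python
VDD = 1.2               # supply voltage from the text (V)
I_OFF_PF = 1e-12        # off-state current with the PF device, ~1 pA (A)
I_OFF_MOSFET = 100e-9   # subthreshold current with the MOSFET, ~100 nA (A)


def static_power(i_off, vdd=VDD):
    """Static power under the assumption P = I_off * VDD."""
    return i_off * vdd


OFF_CURRENT_RATIO = I_OFF_MOSFET / I_OFF_PF     # about 1e5
P_OFF_PF = static_power(I_OFF_PF)               # about 1.2 pW
P_OFF_MOSFET = static_power(I_OFF_MOSFET)       # about 120 nW
```

The ratio of about 10^5 between the two off-state currents follows directly from 100 nA / 1 pA.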

III. BINARY NEURON CIRCUIT WITH THE PF DEVICE
The effect of synaptic device variation on the inference accuracy of a hardware-based BNN is investigated. Fig. 12 (a) and (b) show the inference accuracy as a function of the weight variation of the synaptic devices for the MNIST and CIFAR-10 images, respectively. Weight variation occurs when the weights obtained in off-chip training are transferred to the synaptic devices. The variation of the conductance of the NAND cells is assumed to follow a Gaussian distribution [24]. Device variation is more detrimental to the convolutional neural network classifying CIFAR-10 than to the multi-layer neural network classifying MNIST. Note that little decrease in accuracy is observed when the sigma (σ_w) of the synaptic weight variation is within about 40%. As noted in previous work [19], when V_PGM is 16 V, the sigma over mean (σ/µ) of the conductance of the NAND cells in an array is about 8.4%. Therefore, a BNN using NAND flash cells as synaptic devices is very robust to device variation.
The effect of V_th variation in the neuron circuits on the inference accuracy is also investigated. Fig. 13 (a) and (b) show the effect of V_th variation on the inference accuracy for the MNIST and CIFAR-10 datasets, respectively. The threshold voltage variation (σ_th) of the neuron circuits is assumed to follow a Gaussian distribution. The classification accuracy for the MNIST and CIFAR-10 datasets decreases significantly once the sigma (σ_th) of the threshold voltage in the neuron circuits reaches ∼60% and ∼50%, respectively. In particular, the classification accuracy for the CIFAR-10 dataset decreases more severely as σ_th increases above ∼50%. The threshold voltage transferred to the binary neuron circuit can differ from the threshold value in the algorithm because of process variation or variation in the transfer process.
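To illustrate why a mismatched neuron threshold changes decisions, the following sketch counts how many of the possible bit-count values would be decided differently when the neuron threshold is offset from its ideal value. It is a toy model, not the system simulation reported here: it assumes a 256-string array with an ideal threshold of 128 on-strings, treats the offset as a fraction of that ideal threshold, and uses the hypothetical function name `flipped_decisions`.

```python
N_STRINGS = 256
IDEAL_THRESHOLD = N_STRINGS / 2   # 128 on-strings <-> weighted sum of 0


def flipped_decisions(relative_offset):
    """Count bit-count values whose binary output changes under an offset.

    relative_offset is the threshold error as a fraction of the ideal
    threshold (assumption), e.g. 0.1 for a threshold that is 10% too high.
    """
    actual_threshold = IDEAL_THRESHOLD * (1.0 + relative_offset)
    flipped = 0
    for n_on in range(N_STRINGS + 1):
        ideal_out = n_on >= IDEAL_THRESHOLD     # +1 at or above threshold
        actual_out = n_on >= actual_threshold
        flipped += ideal_out != actual_out
    return flipped
```

In this model a +10% offset flips the output for 13 bit-count values (128 to 140), a −10% offset flips 12 (116 to 127), and a matched threshold flips none. Because weighted sums in a trained network tend to lie near the threshold, errors in this band are the ones that matter.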
By using the double-gate PF device, V_th can be matched to the threshold value in the algorithms by controlling the gate bias or the program/erase pulses, as shown in Figs. 6 and 8. Therefore, the proposed binary neuron circuit, which consists of 6 transistors, can eliminate the accuracy degradation without the CMOS overhead of the ADC-like MLSA [8]. A comparison of the inference accuracy within 40% sigma (σ_w, σ_th) in Figs. 12 and 13 shows that the V_th variation in the neuron circuits has a greater effect on the inference accuracy of the BNN than the weight variation in the synaptic arrays. Therefore, the proposed neuron circuit with the PF device, which can control V_th accurately, improves the inference accuracy while reducing the power consumption.
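One plausible reading of this comparison, offered as an illustrative toy model rather than the authors' analysis, is that independent cell-to-cell variation averages out when many on-currents are summed, whereas a threshold error acts on the summed current directly. The sketch below assumes independent Gaussian on-currents with unit mean and relative sigma `sigma_w`, neglects the off-currents, and uses hypothetical function names.

```python
import math
import random

N_ON = 128  # on-strings at the decision boundary of a 256-string array


def relative_spread_from_weights(sigma_w, n_on=N_ON):
    """Relative sigma of I_T for n_on independent on-currents (analytic)."""
    return sigma_w / math.sqrt(n_on)


def simulated_relative_spread(sigma_w, n_on=N_ON, trials=2000, seed=0):
    """Monte Carlo estimate of the same quantity with a fixed seed."""
    rng = random.Random(seed)
    totals = [sum(rng.gauss(1.0, sigma_w) for _ in range(n_on))
              for _ in range(trials)]
    mean = sum(totals) / trials
    variance = sum((t - mean) ** 2 for t in totals) / (trials - 1)
    return math.sqrt(variance) / mean


analytic = relative_spread_from_weights(0.084)   # sigma/mu of 8.4% from [19]
simulated = simulated_relative_spread(0.084)
```

Under these assumptions a per-cell sigma of 8.4% gives a relative spread in I_T below 1%, and even a 40% per-cell sigma gives only about 3.5%, while a 40% threshold sigma shifts the decision point by 40% of its value.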

IV. CONCLUSION
An analog bit-counting scheme has been proposed that, compared to the digital bit-counting scheme, decreases the burden of neuron circuits in a binary neural network with a synaptic architecture utilizing NAND flash memory. A novel binary neuron circuit with a double-gate positive feedback (PF) device was proposed to replace the sense amplifier, adder, and comparator, thereby decreasing the power consumption and the burden on the CMOS circuits. The proposed neuron circuit, which consists of 6 transistors including the double-gate PF device, eliminates the accuracy degradation without the additional CMOS overhead of a multilevel sense amplifier. Up to 40% sigma (σ_w, σ_th), the V_th variation of the neuron circuits was more detrimental to the inference accuracy than the weight variation of the synaptic devices. We demonstrated that, by controlling the gate bias or the program/erase pulses of the double-gate PF device, the threshold voltage of the neuron circuits can be adaptively matched to the threshold value in the algorithms. Thanks to the super-steep SS of the PF device, the proposed neuron circuit significantly reduces the off-state current (∼1 pA) compared to the neuron circuit with a conventional MOSFET (I_off ∼100 nA). To accommodate the vast number of parameters and the large network sizes required by recent neural networks, high-density NAND flash memory and the proposed neuron circuit are promising candidates for a neuromorphic system. The practical realization of hardware neural networks consisting of NAND flash memory and neuron circuits still needs to be demonstrated and requires further study. The proposed binary neuron circuit, combined with synaptic devices utilizing NAND flash memory, shows the feasibility of energy-efficient and high-density neuromorphic hardware with high inference accuracy.