A Fast Weight Transfer Method for Real-Time Online Learning in RRAM-Based Neuromorphic System

In this work, a synaptic weight transfer method for a neuromorphic system based on resistive-switching random-access memory (RRAM) is proposed and validated. To implement the on-chip trainable neuromorphic system which utilizes large-scale hardware synapse units, a fast and reliable write scheme needs to be established. Based on the experimental results, it is confirmed that the gradual set and full reset operation is the most suitable operation scheme for fast programming due to the fundamental reliability characteristics of the resistive-switching memory cell. Also, the superiority of this programming method using the proposed RRAM compact model is demonstrated. In addition, a one weight/one synaptic device structure is newly adopted for realizing high-density synapse arrays by using a nonnegative weight constraint in supervised learning. Finally, the pattern recognition accuracies obtained at the software and hardware levels are compared.


I. INTRODUCTION
Numerous studies have been conducted in academia and industry to imitate the cognitive abilities of the human brain to learn, remember, infer, and forget in an incredibly energy-efficient and natural way [1]. AlphaGo once again opened a door to a new stage of artificial intelligence by introducing an elaborately and systematically trained artificial neural network (ANN) model [2]. The latest deep neural network concept and its learning algorithm deserve high regard not only for their learning and inference ability but also for their reproducibility. Recently, Intel announced a spike-event-based neuromorphic chip called Loihi 2, which doubles the synapse density and operates 5,000 times faster than the biological neuron [3]. Furthermore, Samsung, which is one of the leading companies in the semiconductor memory business, presented the concept of transferring the structure and function of a human brain into a high-density memory chip [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Yiming Tang.
A schematic diagram of a biological neural network is shown in Fig. 1(a). Biological neurons operate on the integrate-and-fire mechanism and transmit weighted signals through the synapse region. These neurons, together with the synaptic connections and their long-/short-term plasticity, are known to play the most important role in the learning and memory functions of a human brain, and various studies that implement those functionalities in electronic systems have been reported [5]-[11]. The conceptual structure of a simple ANN (Fig. 1(b)) is deeply inspired by the biological neural network, and this structure has been the basis of most of the well-known neural networks [12]. Various nonvolatile memory array structures can be considered to realize the connectivity and synaptic plasticity of neural networks at the hardware level, but the cross-point array structure adopting the two-terminal resistive-switching random-access memory (RRAM) has been the most actively discussed.
For ultrahigh-density applications, a 3D and vertically stacked structure that is similar to the recent 3D NAND Flash memory may also be considered as shown in Fig. 1(c) [13]. Multilevel conductivity states and long-/short-term memory characteristics that can be realized within a highly scalable cell structure have made RRAM favored by many researchers. However, the switching operation based on the soft breakdown of a switching layer and the read operation that relies on direct charge flow through the switching layer remain a concern for this memory device, resulting in reliability issues such as poor uniformity and limited endurance [14]-[17]. Although several approaches involving switching layer engineering, pulse operation schemes, and unit memory structures (1R, 1S1R, 1T1R, etc.) have been proposed and investigated to improve the reliability, they still do not meet industrial requirements. At the same time, a thorough study on the write method of a synapse array that considers device characteristics such as multilevel states and reliability, as well as the memory architecture, is required for a high-density and high-speed neuromorphic system. A fully parallel write method in a resistive synapse array can be a suitable option due to its speed and parallelism, but it has some limitations [18]. First, the states of the preneuron and postneuron nodes must be checked before each write operation when performing a quantitative weight update, which complicates the write operation itself. In addition, it is difficult to apply techniques such as the incremental-step-pulse programming scheme and the verify/inhibit operation because the method uses pulses with a fixed voltage amplitude and assumes that the conductivity change is proportional to the overlapped pulse width.
In this work, a fast write method for hardware transfer of ideally trained synaptic weights from an ANN is proposed and validated. For the validation, an RRAM compact model is adopted into a cross-point array and adjusted to fit the device characteristics. For RRAM cells showing voltage-dependent switching characteristics, a write sequence utilizing gradual set and full reset (GSFR) is proposed and verified through SPICE simulations. Furthermore, a one weight/one synaptic device (1W1S) implementation is adopted using a nonnegative weight constraint in software training to realize a high-density synapse array. Finally, the pattern recognition accuracy of the multilevel conductance synaptic array is compared with that of a software-based ANN.
FIGURE 2. (a) Typical I-V curves of a multilevel resistive switching memory. From the initial HRS, a memory cell switches to LRS and shows different conduction characteristics based on the materials and resistance levels. (b) In pulse operation, the conductivity is increased by consecutive set pulses and is decreased by consecutive reset pulses. (c) Conductive filaments generally consist of atomic vacancy regions or electron traps. The gap between two electrodes is effectively shortened by set pulses and is ruptured by reset pulses.

II. SPICE COMPACT MODEL FOR MULTILEVEL RRAM
A. RRAM CELL CHARACTERISTICS
An RRAM typically shows the bipolar switching phenomenon, which is the transition of the cell resistance state from the high-resistance state (HRS) to the low-resistance state (LRS) and vice versa under the opposite voltage polarity (Fig. 2(a)). The switching voltage may vary depending on the switching layer material, its thickness, and the combination with the electrode materials [19], [20]. Some devices exhibit gradual set or reset switching over a specific voltage range, and this characteristic can be used to obtain multilevel resistance states using pulse width/amplitude modulation to store multibit data. Each resistance state has its own conduction mechanism and exhibits different I-V behavior depending on the switching material and the presence of the conducting filament. The conductivity of the RRAM cell can be gradually modulated by applying appropriate pulses [21]. In studies on memory-based neuromorphic systems, the phenomena of conductivity increase and decrease are commonly called long-term potentiation and depression, respectively, named after the biological synaptic plasticity characteristics. In phase A in Fig. 2(b), the conductivity is increased uniformly by positive set pulses, and in phase B, the amount of change per pulse begins to decrease due to degradation of the switching efficiency. By adopting the incremental-pulse scheme, the phase-A-like behavior can be extended, but only up to a limit. Phase C represents the initial stage of the conductivity decrease by reset pulses. Similarly, when entering phase D, the amount of change also begins to decrease. It is known that the conductivity change behavior by an identical pulse can be asymmetric in potentiation and depression, and various studies to modulate the characteristics at the device level have been reported [11]-[23]. The shape of the filament corresponding to each resistance state (states 0-7) can be described as shown in Fig. 2(c).
All traps or vacancy regions involved in the conduction expand further under a certain condition, eventually connecting both electrodes. Also, under a certain condition, the connected dense filament regions can be ruptured, which is known to be mainly related to Joule heating [15]- [17].
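The four phases described for Fig. 2(b) can be mimicked with a very small behavioral model. The sketch below is only illustrative: the update rule, the step size `dg`, the saturation factor `k`, and the normalized conductance range are all assumptions chosen to show a constant step (phases A and C) followed by shrinking steps near the limits (phases B and D). They are not fitted to the device in this work.

```python
def potentiate(g, g_max=1.0, dg=0.1, k=0.5):
    """One set pulse on a normalized conductance g (assumed toy rule).

    The step is the constant dg while far from g_max (phase A) and
    shrinks to k * (g_max - g) once that term is smaller (phase B).
    """
    return g + min(dg, k * (g_max - g))


def depress(g, g_min=0.0, dg=0.1, k=0.5):
    """One reset pulse: constant step first (phase C), then shrinking (phase D)."""
    return g - min(dg, k * (g - g_min))


def pulse_train(g0, n_pulses, update):
    """Apply the same pulse n_pulses times and return the conductance trace."""
    trace = [g0]
    for _ in range(n_pulses):
        trace.append(update(trace[-1]))
    return trace


if __name__ == "__main__":
    up = pulse_train(0.0, 20, potentiate)
    down = pulse_train(up[-1], 20, depress)
    print([round(g, 3) for g in up])
    print([round(g, 3) for g in down])
```

Giving `potentiate` and `depress` different `dg` or `k` values is one simple way to represent the asymmetry between potentiation and depression mentioned above.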
An RRAM device is considered one of the plausible candidates for a synaptic device in a neuromorphic hardware system because of its scalability and simple process. Gradual resistance changes of RRAM cells can be used to simply express the connection strength of biological synapses in hardware, and it is also necessary to implement an on-chip learning method such as spike-timing-dependent plasticity (STDP) in a unit device [24]. At the same time, low energy consumption in an RRAM-based synaptic device is necessary because a large number of synaptic devices operate simultaneously and in parallel even if an ideal 1W1S structure is configured. A few studies have been reported that address this problem by adding a thin dielectric film to suppress the operating current [25], [26]. Similarly, the RRAM device used in this work suppresses the operating current to the µA level by using a thin SiO2 layer as a tunnel barrier. Detailed explanations of the process flow and the operating principles can be found in our previous work [27].

B. RRAM CELL MODELING FOR SPICE SIMULATION
It is necessary to implement the switching and conduction characteristics of RRAM cells at the circuit and system levels for the development and verification process. Several studies have been conducted and reported to realize the general characteristics of RRAM cells [28]. A SPICE compact model was also proposed, and it has been successfully applied to various RRAM devices in a cross-point array [19]. Fig. 3(a) represents the RRAM cell structure. In general, the switching layer of a filamentary-type RRAM cell is divided into a filament region and the remaining region, and the filament region is connected or broken depending on its resistance state. It has been reported that the filament region may be an oxygen vacancy for metal oxide materials or an electron trap for silicon nitride materials, which plays a major role in charge conduction. Fig. 3(b) shows the proposed RRAM cell model for SPICE circuit simulation. Linear and nonlinear resistors are used to realize the conduction behavior of the memory device. Also, several voltage-dependent switches are used to obtain the voltage-dependent switching characteristics. The I-V characteristics of our device and the model are represented together in Fig. 3(c). The parameters are carefully adjusted to implement the voltage-dependent switching characteristics in the range of -6.0 to 6.0 V, which play an important role in the write scheme proposed in this work. For example, the values of a nonlinear resistor R_G and the voltage-controlled switches S_1-S_n are determined to describe more accurately the conduction behavior through the rupture and the connection of conduction paths. Also, a nonlinear resistor R_F is placed in series and adjusted to describe the LRS conduction behavior after the initial forming process. Although it is necessary to determine a linear resistance R_C considering the magnitude and variation of the contact and line resistances in the actual array configuration, it was not considered in this study.
In addition, capacitive elements such as the cell (C_C) and gap (C_G) capacitances need to be carefully determined, but they are beyond the scope of this study and do not affect the experimental results. Table 1 summarizes the circuit components and parameters and their values in the model.

III. FAST WRITE SCHEME FOR REAL-TIME LEARNING OF HARDWARE NEURAL NETWORK
Each layer of a neural network trained at the software level may contain a lot of continuous or quantized weight values. To ensure the software-level inference accuracy, it is necessary to accurately transfer the trained weight matrix into the hardware synapse array while suppressing nonideal phenomena such as voltage drop by line resistances, read disturb, and leakage current. In this part, a write method that transfers the weight matrix into synapse arrays for high density and high-speed applications is proposed and verified.
To obtain multilevel resistance states and reach the target state, the following schemes can be considered: gradual set and full reset (GSFR), full set and gradual reset (FSGR), and gradual set and gradual reset (GSGR). Although the GSGR method, which takes full advantage of the RRAM's bidirectional gradual switching characteristics, seems the most attractive, it needs to be reconsidered from two perspectives. First, negative voltages that are repeatedly applied can cause serious reliability problems. For example, Fig. 4(a) shows that reset switching failure can easily occur in pulse operation under repeatedly applied negative voltage pulses. Distributions of conductance change in the gradual set and reset operations are shown in Fig. 4(b). The gradual set operation is confirmed to be more reliable and uniform: the standard deviation of the conductance change per pulse is as small as 20% of that of the gradual reset operation. DC cycles in Fig. 4(c) also show that the reset operation has more difficulty preventing switching failure than the set operation, in which failure can be easily managed by limiting the current flow using current compliance (I_C). In fact, reset operations are prone to breakdown because the RRAM device must withstand high current and temperature under negative bias conditions in order to trigger reset switching without a current limit. On the other hand, set failure, which can be defined as failure of proper switching from HRS to LRS, has been considered a relatively minor issue and is rarely reported, and it never occurred in our experiment. In addition, the high-resistance tail of the set-state distribution, known to be caused by this set failure [29], is thought to be mostly resolved using the current compliance and the incremental-pulse scheme proposed in this study. Second, the bidirectional gradual switching operation may lengthen the write time by complicating the write scheme and may also increase the burden on the peripheral circuitry by requiring a bidirectional sense-out operation. Recently, GSGR-based write methods have been proposed [30], [31]. Like the method proposed in this work, they adopt iterative loops of the program-verify operation and the incremental-step-pulse technique to reach the target conductivity of gradual RRAM. The main difference is that those methods decide between gradual set and gradual reset before each write loop by comparing the current cell state with the target state. Such a decision process before each write loop and a bidirectional switching process may not be beneficial in terms of overall speed and device reliability.
VOLUME 10, 2022
FIGURE 6. Voltage and current waveforms of (a) Ni/SiNx/p+-Si and (b) Ni/SiNx/SiO2/p+-Si RRAM cells when applying the incremental step pulse scheme in two different cycles. By inserting a thin SiO2 layer, highly reliable gradual switching characteristics can be achieved and current overshoot can be suppressed by the device itself.
Pulse operation and its effect on device conductivity are shown in Fig. 5(a)-(d). With a positive set pulse, the conductivity level gradually increases (Fig. 5(a) and (b)). In contrast, with a negative reset pulse, the conductivity level abruptly decreases (Fig. 5(c) and (d)). As described in Fig. 2(b), there is a fundamental difference in the pattern of the conductivity change due to positive-/negative-pulse application, especially in the initial stage. Using these asymmetric switching behaviors, the concept of a high-speed write method is proposed and validated. The slight differences between the measurement and simulation data shown in Fig. 5(b) and (d) have little effect on the overall speed of the write method because the method combines the incremental-pulse scheme, multiple set/read loops, and verify/inhibit techniques. The voltage and current waveforms showing the effect of the inserted thin (∼1.5 nm) oxide layer on the switching characteristics are shown in Fig. 6. As incremental step pulses from 4.2 to 4.6 V are applied in two different cycles, abrupt switching occurs and a high current of several mA flows in the absence of SiO2, which may seriously affect the device reliability. On the other hand, when SiO2 is inserted, reliable gradual switching that guarantees multilevel resistance states occurs consistently, and the operating current is reduced by more than a factor of 10 due to the SiO2 tunnel barrier. The reduction in operating current is important because it directly reduces the energy consumption and increases the maximum number of synaptic devices that can operate in parallel and simultaneously.
A schematic diagram of an RRAM cross-point array and the description of our cell model are represented in Fig. 7(a). The device characteristics can be implemented using SPICE subcircuits, and this model can be simply embedded in the netlist of the cross-point array. The concept of the WL-by-WL write scheme is described in Fig. 7(b). V_WL is applied to the selected WL, and GND or V_IH is applied to each BL to determine whether its cell is written or inhibited. In addition, 1/2V_WL is applied to the unselected WLs, thereby suppressing the conductivity change (write disturb) of the cells in the unselected WLs. The voltage waveforms in Fig. 7(c) are sequences that perform fast multilevel write operations. As the first step, all RRAM cells in the selected WL are prepared to have the lowest conductivity using the full reset operation. Then, a small set voltage V_SET is applied to obtain a relatively low conductivity state, and the amplitude is increased by V_STEP for each subsequent pulse. A read voltage V_RD is applied between switching pulses to sense the conductivity state, and a cell is inhibited once its target conductivity is reached. The inhibit voltage V_IH is chosen with a margin such that (V_SET − V_IH) does not cause set switching and (1/2V_SET − V_IH) does not cause reset switching, as summarized in the following equations. Table 2 summarizes the parameters used in the circuit simulations.
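The write sequence described above (full reset, then incremental set pulses with a read/verify step and per-cell inhibit) can be sketched behaviorally as follows. This is a minimal illustration, not the SPICE model used in this work: the toy cell in `set_pulse`, the threshold `v_th_mv`, the conversion `mv_per_level`, and the threshold arguments of `inhibit_margin_ok` are hypothetical. Only the 4.0 to 6.2 V sweep of V_WL is taken from the text, and the step of 200 mV is an assumption. Voltages are kept in integer millivolts to avoid rounding effects.

```python
def set_pulse(level, v_mv, v_th_mv=3800, mv_per_level=100):
    """Toy voltage-dependent set (assumed): the level reached depends on the
    pulse amplitude, and a set pulse never lowers the conductance level."""
    return max(level, (v_mv - v_th_mv) // mv_per_level)


def gsfr_write(targets, v_set_mv=4000, v_step_mv=200, v_max_mv=6200):
    """Program the cells of one selected WL; return (final levels, set pulses used)."""
    cells = [0] * len(targets)                      # full reset: lowest state
    inhibited = [c >= t for c, t in zip(cells, targets)]
    pulses = 0
    v_mv = v_set_mv
    while not all(inhibited) and v_mv <= v_max_mv:
        for i in range(len(cells)):
            if not inhibited[i]:                    # BL at GND: cell sees V_WL
                cells[i] = set_pulse(cells[i], v_mv)
        pulses += 1
        for i in range(len(cells)):                 # read/verify with V_RD
            if cells[i] >= targets[i]:
                inhibited[i] = True                 # BL held at V_IH from now on
        v_mv += v_step_mv                           # incremental step pulse
    return cells, pulses


def inhibit_margin_ok(v_wl_mv, v_ih_mv, v_set_th_mv, v_rst_th_mv):
    """Check the two stated margin conditions against assumed thresholds."""
    no_set_on_inhibited = (v_wl_mv - v_ih_mv) < v_set_th_mv
    no_reset_on_unselected = (v_wl_mv / 2 - v_ih_mv) > -v_rst_th_mv
    return no_set_on_inhibited and no_reset_on_unselected


if __name__ == "__main__":
    print(gsfr_write([0, 2, 6, 12]))
    print(inhibit_margin_ok(6200, 3000, 3800, 2000))
```

Because every cell on the word line shares the same pulse train, the number of pulses is set by the cell with the highest target, while cells with lower targets simply drop out early through the inhibit voltage.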

IV. RESULTS AND DISCUSSION
The voltage waveforms of the BL and selected/unselected WL are shown in Fig. 8(a). V_WL from 4.0 to 6.2 V was applied to the selected WL, and 1/2V_WL was applied to the unselected WLs. The BL was grounded at the beginning of the cycle, and V_IH was applied to the BL when the cell conductivity became equal to or greater than the target conductivity. Fig. 8(b) and (c) show the current waveforms and conductivity state of each cell in the selected WL. After the full reset operation, all cells in the WL are prepared to have the lowest conductivity state (state 0). In addition, it is confirmed that the conductivity of each cell is increased for each cycle, and when the target conductivity is reached, the state is maintained by the inhibit operation. According to the results, the full reset operation consumes 87.76 pJ and the gradual set operations consume 30.58 pJ to 2.52 nJ. Because the energy consumption of the write operation increases with the cell conductance, it is important to achieve precise control of the multilevel states in the low-conductivity region. More specifically, the insertion of a thin oxide layer can be an appropriate option to reduce the energy consumption, as confirmed in Fig. 6. It should be noted that unexpected or nonideal phenomena such as variations from the target conductivity and initial conductivity change after the write operation were not considered in the circuit simulation. The total time required for the overall write operation of the entire array when adopting the proposed sequence and the GSFR/FSGR operation can be expressed as follows: In these equations, n_WL is the number of word lines, n_state is the number of conductivity states, and α is the number of pulses required to increase the conductivity by one state. Fig. 9 shows the total time required to implement 8, 16, and 32 conductivity states in a 784 × 10 hardware synapse array.
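One plausible form of the total-write-time expressions, written with the symbols defined above, is t_total,GSFR = n_WL[(t_RST + t_RD) + α(n_state − 1)(t_SET + t_RD)] and t_total,FSGR = n_WL[(t_SET + t_RD) + α(n_state − 1)(t_RST + t_RD)]. This is a reconstruction under assumptions, not necessarily the authors' exact formula: it assumes every write pulse is followed by a read pulse of duration t_RD, and that t_RD equals t_SET = 1 µs. With α = 1 and those values, the sketch below reproduces the percentages quoted in the discussion of Fig. 9. The choice `n_wl=784` is also an assumption and does not affect the ratios.

```python
def t_total_gsfr(n_wl, n_state, t_set, t_rst, t_rd, alpha=1):
    """Assumed GSFR write time: one full reset, then gradual set pulses,
    each pulse followed by a read."""
    return n_wl * ((t_rst + t_rd) + alpha * (n_state - 1) * (t_set + t_rd))


def t_total_fsgr(n_wl, n_state, t_set, t_rst, t_rd, alpha=1):
    """Assumed FSGR write time: one full set, then gradual reset pulses,
    each pulse followed by a read."""
    return n_wl * ((t_set + t_rd) + alpha * (n_state - 1) * (t_rst + t_rd))


def percent_more(a, b):
    """How much larger a is than b, in percent."""
    return 100.0 * (a - b) / b


if __name__ == "__main__":
    n_wl, t_set, t_rd = 784, 1, 1          # times in microseconds (t_rd assumed)
    for t_rst in (2, 3, 4, 5):
        for n_state in (8, 16, 32):
            gsfr = t_total_gsfr(n_wl, n_state, t_set, t_rst, t_rd)
            fsgr = t_total_fsgr(n_wl, n_state, t_set, t_rst, t_rd)
            print(t_rst, n_state, gsfr, fsgr, round(percent_more(fsgr, gsfr), 1))
```

In this form the reset pulse duration enters the GSFR time only once per word line but enters the FSGR time once per state, which is why a longer reset pulse penalizes FSGR much more strongly.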
In this work, it is assumed that only one pulse is required to move between states (α = 1). It is confirmed that t_total,GSFR is smaller than t_total,FSGR, with the latter being 35.3% to 176.5% larger in the case of t_SET = 1 µs and t_RST = 2, 3, 4, 5 µs. In addition, t_total,GSFR increases by 80.0% to 282.4% when the number of states increases from 8 to 16 and 32. Therefore, under conditions that require a longer reset pulse compared to the set pulse [21], the GSFR operation is always superior in terms of reducing the total write time. Implementing multilevel states is fundamentally different from implementing a binary state, and always requires a longer write time. It should be noted that this comparison is limited to the performance of the WL-by-WL write method using gradual switching characteristics. Also, it is advantageous to reduce the number of states where possible, provided that degradation in performance metrics such as inference accuracy is minimized. The inference accuracy of a hardware synapse array after weight transfer from an ANN was obtained by SPICE circuit simulations and compared to those of different software cases. The weight distributions of a one-layer ANN before and after supervised training are represented in Fig. 10(a). When the network is trained for 50 epochs without any constraints, most of the weight values are distributed in the range of −2 to 2. Fig. 10(b) shows the weight distributions before and after supervised learning, and after quantization, when there is an initial/in-training nonnegative weight constraint. The nonnegative weight constraint is set before the training process starts and is applied each time the weight values are adjusted. With the nonnegative constraint, the initial and final weight values are all zero or greater. After training, the weight values can be quantized to suit the hardware implementation. The red bars in Fig. 10(b) indicate the position and distribution of the quantized weights.
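The nonnegative constraint and the post-training quantization described above can be illustrated with a short sketch. The details here are assumptions: the constraint is implemented by clipping negative weights to zero after each update, and the quantizer uses uniformly spaced levels between zero and the largest weight. The text does not specify either choice, and the helper names are hypothetical.

```python
def project_nonnegative(weights):
    """Apply the nonnegative constraint by clipping negative weights to zero (assumed)."""
    return [max(0.0, w) for w in weights]


def constrained_sgd_step(weights, grads, lr):
    """One gradient step followed immediately by the nonnegative projection."""
    updated = [w - lr * g for w, g in zip(weights, grads)]
    return project_nonnegative(updated)


def quantize_uniform(weights, n_levels):
    """Map nonnegative weights to integer levels 0 .. n_levels - 1.

    Levels are uniformly spaced between 0 and max(weights) (assumed scheme).
    Returns the level indices and the weight step between adjacent levels.
    """
    w_max = max(weights)
    if w_max <= 0.0:
        return [0] * len(weights), 0.0
    step = w_max / (n_levels - 1)
    levels = [min(n_levels - 1, round(w / step)) for w in weights]
    return levels, step


if __name__ == "__main__":
    w = constrained_sgd_step([0.125, 0.5], [1.0, -1.0], lr=0.25)
    print(w)
    print(quantize_uniform([0.0, 0.5, 1.0, 1.5], n_levels=4))
```

Because all weights are nonnegative, each quantized level can be stored in a single device, which is the basis of the 1W1S arrangement. A signed weight would normally need a pair of devices or a reference column.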
In Fig. 10(c), Case1 shows the pattern recognition accuracy when the Modified National Institute of Standards and Technology (MNIST) dataset is used at an ideal software level. It can be seen that the nonnegative weight constraint has little effect on the accuracy (Case2). According to the results in Fig. 10(c), negative weight values are not always essential for problems such as simple pattern recognition if more than 32 weight levels are available. This result is also consistent with the results of a recently reported study [11]. At the same time, weight quantization can affect the accuracy depending on the number of states (Case3−Case5). In particular, when the number of states is smaller than 16, the accuracy decreases to less than 90% (Case5). From the relationship between weight quantization and inference accuracy, it can be seen that a weight resolution of 16 levels (4 bits) or higher is required to achieve acceptable accuracy (e.g., 90%). This would be a fundamentally necessary feature for synaptic weights to learn or classify the characteristics of image patterns with analog levels. These synaptic properties related to quantization are generally consistent with those reported in the literature, although they may be slightly affected by the neural network structure, the number of parameters, and the complexity of the image dataset [32], [33]. The weight values from software training can be converted into the conductivity of synaptic devices. It is assumed that the conductivity ranges from 0.02 to 25.2 µS and can be quantized to more than eight levels, as shown in Fig. 3(c). Case6 shows that the accuracy of the inference operation performed at the hardware level is equivalent to that at the software level. A subtle difference in the accuracies between software and hardware is related to the size of the test set.
In our experiment, 10,000 test images in the MNIST dataset were used to check the accuracy of the software, and 500 test images were used in the circuit simulation to check the accuracy of the hardware. From the results in Figs. 9 and 10, it can be concluded that there is a trade-off between the inference accuracy and the total write time, with both mainly determined by the number of conductivity states, i.e., n state .
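Two small calculations help make this discussion concrete: converting a quantized weight level into a device conductance, and estimating how much an accuracy figure can fluctuate with the test-set size. The sketch below is illustrative only. The conductance limits are the 0.02 and 25.2 µS quoted above, but the linear, equally spaced mapping is an assumption, as are the binomial standard-error estimate and the example accuracy of 0.90.

```python
import math

G_MIN_US = 0.02   # lowest conductance quoted in the text (µS)
G_MAX_US = 25.2   # highest conductance quoted in the text (µS)


def level_to_conductance_us(level, n_levels):
    """Map level 0 .. n_levels - 1 to a conductance in µS (assumed linear spacing)."""
    if not 0 <= level < n_levels:
        raise ValueError("level out of range")
    return G_MIN_US + (G_MAX_US - G_MIN_US) * level / (n_levels - 1)


def bits_for_levels(n_levels):
    """Number of bits needed to index n_levels states."""
    return (n_levels - 1).bit_length()


def accuracy_std_error(accuracy, n_images):
    """Binomial standard error of an accuracy measured on n_images test images."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_images)


if __name__ == "__main__":
    print([round(level_to_conductance_us(k, 8), 3) for k in range(8)])
    print(bits_for_levels(16))
    for n in (10_000, 500):
        print(n, round(100 * accuracy_std_error(0.90, n), 2), "% (1 sigma)")
```

Under these assumptions, an accuracy near 90% measured on 500 images carries a one-sigma uncertainty more than four times larger than the same accuracy measured on 10,000 images, so a small software/hardware gap is within what the smaller test set alone could produce.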

V. CONCLUSION
In this work, a fast and reliable write scheme that fully utilizes the GSFR operation of RRAM is proposed and validated.
To realize the voltage-dependent resistance switching behavior of a CMOS-compatible SiNx-based RRAM cell, a SPICE compact model is introduced and adjusted based on the device characteristics. Considering that set pulses generally require a shorter time than reset pulses and that reset failure is a more likely and harder-to-handle phenomenon, it is confirmed that the GSFR-based WL-by-WL scheme is superior in terms of speed and reliability compared to the FSGR-based operation. Finally, the inference accuracy of the synaptic memory array with the nonnegative constraint and weight quantization is quantitatively compared with that of an ideal software algorithm.