Self-Recovery Tolerance Latch Design Based on the Radiation Mechanism

Digital latches are becoming more sensitive to Multiple-Node Upsets (MNUs) due to lower supply voltages and higher device integration density. Hardening techniques based on multiple C-Elements (CEs) have been used to build MNU tolerance latches (a CE acts as an inverter if its inputs are equal to each other; otherwise its output is held in a high-impedance state). However, these techniques bring significant hardware overheads. Based on the radiation mechanism, this paper proposes an MNU tolerance latch with self-recovery properties that prevents Single Node Upset (SNU) and MNU propagation in the feedback loops and provides the smallest overheads in terms of power, delay, rising time $t_r$, falling time $t_f$ and the Power-Delay-Area-Product (PDAP) metric compared with the existing MNU tolerance latches.


I. INTRODUCTION
The scaling trend of nanoscale CMOS process technology is dramatically leading to the reduction of supply voltage and the increase of integration density (the proximity of the nodes), making digital latches more susceptible to Multiple-Node Upsets (MNUs) caused by charge sharing [1].
The associate editor coordinating the review of this manuscript and approving it for publication was Cristian Zambelli.
Similar to Single Node Upsets (SNUs), MNUs are also regarded as nondestructive errors. However, tolerating MNUs generally introduces larger overheads because more node pairs must be dealt with. To achieve MNU tolerance in latches, some Radiation-Hardening-by-Design (RHBD) techniques tolerate an MNU either by using the Triple Path Dual Interlocked Storage Cell (TPDICE) (see Fig. 1(a)) without restoring the flipped values [2], or by using multiple C-Elements (CEs) (see Fig. 1(b)). For example, the latch of [3] is designed as per the masking property of the TPDICE cell, but this property can float the output value of the CE instance and cause large short-circuit power consumption. As illustrated in Fig. 1(a), if the adjacent nodes A and NA are upset simultaneously, nodes B and NC are also affected and change their values; P3 and N6 are opened, so two short paths are generated (shown as the dotted lines), leading to large short-circuit power consumption. To avoid these issues, a traditional and simple approach is to interlock multiple CE instances to restore the flipped nodes. The main advantage of the CE is its low complexity: 1. if the inputs are equal to each other, the output value is the complement of the input value; 2. if the inputs are different, the output node enters a high-impedance state. This makes it easier for designers to propose a tolerance latch with self-recovery properties by using the interlocked approach. For example, the CEs can be directly interlocked to restore the flipped nodes [4] (Fig. 2(a)). Alternatively, in the latch of [5] (Fig. 2(b)), nine CEs are first interlocked to form an SNU tolerance module, and such modules are then further interlocked to design an MNU tolerance latch with self-recovery properties.
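The two CE rules listed above can be written down as a small behavioral model. The following is only an illustrative sketch, not a transistor-level description: the function name `c_element` and the `prev_out` argument are assumptions introduced here, and the model assumes that a high-impedance output simply retains the value it had before the inputs diverged.

```python
from typing import Optional


def c_element(a: int, b: int, prev_out: Optional[int]) -> Optional[int]:
    """Behavioral model of a 2-input inverting C-element (CE).

    a, b      -- logic values (0 or 1) on the two inputs
    prev_out  -- value previously stored on the output node (None if unknown)
    """
    if a == b:
        # Rule 1: equal inputs, the CE behaves like an inverter.
        return 1 - a
    # Rule 2: different inputs, the output is high impedance.
    # Assumption of this sketch: the floating node keeps its old value.
    return prev_out


# Both inputs are 0, so the output is driven to 1.
out = c_element(0, 0, None)
# One input is upset to 1: the output floats and still reads 1.
out_after_upset = c_element(1, 0, out)
```

In this model a single flipped input never changes the output, which is the filtering property that the interlocked CE latches rely on.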
Although the latches in [4] and [5] restore the flipped nodes depending on the correct nodes of the unaffected CEs (the advantage of self-recovery is that it mitigates the errors and recovers the function of a system to maintain the highest possible performance [6]), this technique obviously brings significant layout area and power overheads because a larger number of nodes and devices are used for tolerating MNUs.
In order to reduce the overheads, this paper proposes a novel MNU tolerance (MT) latch with self-recovery properties based on the polarity of the voltage pulse [1], [7]. The use of this principle can significantly reduce the number of extra transistors and nodes compared with the hardening approach based on the CEs. The evaluation results of the hardware overheads show that the proposed latch has a reasonable amount of hardware overheads (dynamic power consumption, area, propagation delay, rising time $t_r$, falling time $t_f$); when the comprehensive Power-Delay-Area-Product (PDAP) metric is used, it offers the minimum PDAP.
The rest of this paper is organized as follows. In Section II, the proposed latch is described, and the hardening principle is explained. In Section III, the behavior of the proposed latch is simulated, and the overheads are compared with different hardened latches. In Section IV, the conclusion is given.

II. PROPOSED MNU TOLERANCE LATCH
A. PROPOSED LATCH
The proposed MT latch is given in Fig. 3; it is based on the polarity of the radiation-induced voltage pulse. This hardening principle helps to minimize the number of devices, so the latch comprises 42 devices (an inverter is used to generate CLKN). Transistors N1 ∼ N12 and P1 ∼ P16 compose the main sequential part with the self-recovery properties; N13 ∼ N15 and P17 ∼ P19 compose a 2-input CE that acts as a filter to prevent SNU and MNU propagation, so a transient glitch is generated on node Q only when Q itself is struck by a particle, or when the node pair S3-S7 is simultaneously flipped by charge sharing. N1, N2, N5 ∼ N8, N11 and N12 prevent positive charge from being deposited and collected in the nodes S1, S2, S5 and S6. This means that when S1 = S5 = 1 and S2 = S6 = 0, even if these nodes are affected by a radiation particle, only S1 and S5 can be changed to 0; the voltages of S2 and S6 are lowered, but their latched values remain 0. Similarly, when S1 = S5 = 0 and S2 = S6 = 1, only S2 and S6 can be changed to 0 while S1 and S5 keep their latched values when a strike event occurs.
As can be seen from Fig. 3, when CLK = 1, the proposed MT latch operates in the propagation mode and the value of the input D is transmitted to the output node Q; when CLK = 0, the proposed latch operates in the hold mode (it keeps the latched values through the built-in feedback loops):
• when CLK = 1 and D = 1, S1, S4, S5 and S8 are forced to 1, so S2, S3, S6 and S7 are driven to 0 through the feedback loops. P19 and N13 are closed, and the voltage of the output node Q is equal to the voltage of the input D, i.e., Q = D = 1;
• when the CLK signal goes low (CLK = 0), all access devices (N16 ∼ N20) are turned off. Because the feedback loops of the proposed latch provide stable values on the internal nodes S1 ∼ S8, the value 1 is kept on node Q (P19 and N13 are on) until the CLK signal goes high again (CLK = 1).
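The two operating modes can be summarized with a purely logical model. This is a minimal sketch under stated assumptions: the class name `MTLatchModel` is hypothetical, the node values for D = 0 are assumed to be the complement of the D = 1 case described above, and Q is modeled as the inverse of S3 (equal to S7) because the output CE inverts its two agreeing inputs. No transistor behavior is modeled.

```python
HIGH_WHEN_D1 = ("S1", "S4", "S5", "S8")  # nodes forced to 1 when D = 1
LOW_WHEN_D1 = ("S2", "S3", "S6", "S7")   # nodes driven to 0 when D = 1


class MTLatchModel:
    """Logic-level sketch of the propagation and hold modes."""

    def __init__(self):
        self.nodes = {}   # internal node values S1..S8
        self.q = None     # output value (unknown until first write)

    def step(self, clk: int, d: int):
        if clk == 1:
            # Propagation mode: D is written into the internal nodes.
            for name in HIGH_WHEN_D1:
                self.nodes[name] = d
            for name in LOW_WHEN_D1:
                self.nodes[name] = 1 - d
            # Output CE: S3 and S7 agree, so Q is their complement.
            self.q = 1 - self.nodes["S3"]
        # Hold mode (clk == 0): access devices are off, nothing changes.
        return self.q


latch = MTLatchModel()
q_written = latch.step(clk=1, d=1)   # transparent: Q follows D
q_held = latch.step(clk=0, d=0)      # hold: D is ignored
```

After the write, changing D while CLK = 0 has no effect on Q, matching the hold-mode description.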

B. TOLERANCE ANALYSIS
The upset mechanism [7] indicates that the sensitive nodes are the drains/sources of the off devices and the polarity of radiation voltage pulse is related to the type of the struck device, i.e., a positive pulse will be generated (an amount of positive charge is collected) only when a PMOS is the struck device, while a negative pulse will be generated (an amount of negative charge is collected) only when an NMOS is the struck device. This means that when 1 is latched (Q = 1), S2 (the source of N6) and S6 (the source of N12) only collect the negative charge which lowers the voltage levels; however, the latched values are not affected (S2 = S6 = 0), so they are not the sensitive nodes. As per the same analysis, when 0 is latched (Q = 0), S1 (the source of N5) and S5 (the source of N11) nodes are not the sensitive nodes. Hence, the number of sensitive nodes (14) for the proposed MT latch is fewer than that of the latched nodes (16).
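The polarity argument above can be checked with a few lines of code. This sketch encodes only the stated rule (a struck PMOS produces a positive pulse, a struck NMOS a negative pulse) and the node counts quoted in the text; the function names and the per-node device lists are illustrative assumptions, not data extracted from the schematic.

```python
LATCHED_NODE_COUNT = 16  # number of latched nodes quoted in the text


def pulse_polarity(struck_device: str) -> int:
    """Return +1 for a positive pulse (PMOS) or -1 for a negative pulse (NMOS)."""
    if struck_device == "PMOS":
        return +1
    if struck_device == "NMOS":
        return -1
    raise ValueError(f"unknown device type: {struck_device}")


def can_flip(stored_value: int, off_devices: list) -> bool:
    """A stored 0 needs a positive pulse to flip; a stored 1 needs a negative one."""
    needed = +1 if stored_value == 0 else -1
    return any(pulse_polarity(dev) == needed for dev in off_devices)


def insensitive_nodes(q: int) -> tuple:
    """Nodes named as non-sensitive in the text for each latched value."""
    return ("S2", "S6") if q == 1 else ("S1", "S5")


# S2 stores 0 and is only exposed through NMOS devices when Q = 1.
s2_can_flip = can_flip(0, ["NMOS"])
sensitive_count = LATCHED_NODE_COUNT - len(insensitive_nodes(1))
```

With two nodes excluded for either latched value, the count of sensitive nodes comes out as 14, in line with the figure given above.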
To simplify the analysis, we assume that 1 is latched (CLK = 0), i.e., Q = 1, S1 = S4 = S5 = S8 = 1 and S2 = S3 = S6 = S7 = 0. Because the sequential part of the proposed MT latch is comprised of two identical storage cells, we only focus on the SNU and MNU self-recovery of one cell in the following analysis (the same applies to the other one).
SNU takes place on S1: this SNU can close N2 and N9, and open P8 and P9 at the same time. However, P3, P7 and N5 are on because S3, S2 and S4 are the unchanged nodes. Therefore, S1 can be restored to 1 again by transmitting the current through P3, P7 and N5.
SNU takes place on S3: this SNU can open N6, and close P3 and P6. Because N3 is on (S5 = 1), the collected charge can be discharged, so S3 can be restored.
SNU takes place on S4: this SNU can close N5, and open P5 and P4. Because S6 and S3 latch the correct values, S4 can be restored to 1 by transmitting the current through P2 and P6.
SNU takes place on F3, F5 or F6: the upset cannot disturb the latched values because F3, F5 or F6 is a high-impedance node. This means that the collected charge cannot disturb other nodes.
SNU takes place on Q: it becomes a transient glitch that can be mitigated because P17, P18 and P19 are open, so Q is also a self-recovery node.
MNU takes place on S1-S3, S1-S4 or S3-S4: these simultaneous upsets can be mitigated because the feedback loops are not destroyed, i.e., S5 ∼ S8 still latch the correct values. For example, an MNU occurring on S3-S4 can open N6, P5 and P4, and close P3, P6 and N5, while S1 and S5 ∼ S8 are not affected and the collected charge of S3 can be discharged through the N3 device. Thus, S3 can be restored, opening P6 again. Subsequently, S4 can also be restored by charging through P2 and P6.

MNU takes place on F3-F5, F3-F6 or F5-F6: these MNUs cannot be transmitted in the feedback loops because these nodes are high-impedance nodes. As a result, these MNUs have no impact on the latched values.
MNU takes place on a latched node (S1, S3 or S4) and one high-impedance node (F3, F5 or F6): these types of MNU can be mitigated as two SNUs (occurring on a latched node and one high-impedance node, respectively), so the nodes can be restored (F3, F5 or F6 keeps the high-impedance state).
MNU takes place on one node (S1, S3, S4, F3, F5 or F6) and Q: as per the above analysis, these types of MNU can be mitigated as two SNUs. Therefore, they also have no impact on the latched values.
The self-recovery analysis for MNUs that span the two storage cells is as follows (again assuming that Q latches 1):
MNU takes place on S1-S5: this MNU can close N2, N9, N8 and N3, and open P8, P9, P16 and P1, while P3, P7, N5, P11, P15 and N11 are on because their driving nodes still maintain the correct values. As a result, S1-S5 can be recovered.
MNU takes place on S1-S7: these simultaneous upsets can close N2, N9, P11 and P14, and open P8, P9 and N12. Because P3, P7 and N5 are open due to the correct latched values of S3, S2 and S4, S1 can be charged and restored to 1, opening N9 again. Subsequently, S7 is also restored.
MNU takes place on S3-S7: N6 and N12 are opened, and P3, P6, P11 and P14 are closed. However, the latched values of S1 and S5 keep N3 and N9 open, so that the upset nodes S3 and S7 can be restored.

III. SIMULATION AND COMPARISON
A. SIMULATION
The behavior of the proposed latch is simulated with SPICE tools (Cadence) using TSMC 65nm technology (VDD = 1.2V). To present the self-recovery properties as clearly as possible, the simulations focus on the self-recovery response of Q and S1 ∼ S8. SNUs and MNUs are modeled by dual double-exponential current sources [8], because this model is more accurate than the single double-exponential current source (it can simultaneously model the current peaks and plateaus).
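To make the injection model concrete, the sketch below evaluates a dual double-exponential pulse as the sum of a fast "prompt" component (the peak) and a slower "hold" component (the plateau), and integrates it to check the injected charge. All time constants and the 6 fC + 6 fC split are hypothetical values chosen here so that the total matches the 12 fC SNU charge mentioned in this section; they are not the parameters of [8] or of the authors' simulations.

```python
import math


def double_exp(t, q, tau_rise, tau_fall):
    """Double-exponential current (A) at time t (s) carrying total charge q (C)."""
    if t < 0.0:
        return 0.0
    scale = q / (tau_fall - tau_rise)
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))


def dual_double_exp(t, q_prompt, q_hold, prompt_taus, hold_taus):
    """Sum of a prompt (peak) and a hold (plateau) double-exponential source."""
    return (double_exp(t, q_prompt, *prompt_taus)
            + double_exp(t, q_hold, *hold_taus))


def injected_charge(current, t_end, dt):
    """Trapezoidal integral of current(t) from 0 to t_end."""
    steps = int(round(t_end / dt))
    total = 0.5 * (current(0.0) + current(steps * dt))
    for k in range(1, steps):
        total += current(k * dt)
    return total * dt


# Hypothetical (rise, fall) time constants in seconds.
PROMPT_TAUS = (5e-12, 50e-12)
HOLD_TAUS = (50e-12, 300e-12)


def snu_pulse(t):
    # Hypothetical split: 6 fC in the peak and 6 fC in the plateau.
    return dual_double_exp(t, 6e-15, 6e-15, PROMPT_TAUS, HOLD_TAUS)


total = injected_charge(snu_pulse, t_end=5e-9, dt=1e-13)
```

Each component is normalized so that its time integral equals its charge argument, which makes it straightforward to set the injected charge independently of the pulse shape.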
The behavior simulations of the proposed MT latch are shown in Fig. 4, where two periodic square signals drive the inputs D and CLK. As can be seen, when CLK = 1, the proposed latch operates in the propagation mode and the value of the input D is transmitted to the output node Q; when CLK = 0, the proposed latch operates in the hold mode, i.e., the value of the output is maintained. Fig. 4 also shows that the proposed MT latch is a self-recovery latch, where at least 12fC (24fC) of charge is injected for each SNU (MNU):
Scenario 1: an SNU occurs on an internal node S1, S3 ∼ S5, S7 or S8. For example, node S8 is flipped at 55ns, but the latched value is recovered because the other nodes still hold the correct values, which opens P10 and P14 and closes N10. Consequently, the output node Q keeps the latched value.
Scenario 2: an SNU occurs on Q. It is induced at 295ns and the value of node Q is flipped. However, the pair S3-S7 is not changed, so transistors P17 ∼ P19 remain open. Consequently, the output also keeps the latched value.
MNU scenarios: the other upsets are MNUs that occur on all possible latched node pairs. For example:
• an MNU is induced at 60ns on the node pair S1-S3. However, the other nodes still hold the correct values, so N3 is on and node S3 is first recovered to 0; then, because P3, P7 and N5 are on, node S1 is also recovered;
• an MNU occurs on the node pair S3-S5 at 175ns and their values are flipped. However, because S6 ∼ S7 keep their latched values, P11, P15 and N11 are still on. Therefore, node S5 is recovered first; then, N3 is opened, so the latched value of node S3 is also recovered to 0;
• an MNU occurs on the output Q and an internal node (e.g., S8) at 290ns, but their values can be restored. The main reason is that the flipped node Q does not affect the internal nodes, so nodes S1 ∼ S7 still hold the latched values. Consequently, the flipped nodes S8 and Q are restored to the correct values.

B. COST COMPARISON
To present a fair comparison in TSMC 65nm technology, the smallest size (120nm/60nm) is used in the compared latches [2]-[5], [9]-[15] as long as the behavior is correct, while typical instances use the classical aspect ratios; for example, in a transmission gate, the PMOS transistor size is 240nm/60nm and the NMOS transistor size is 240nm/60nm. However, some hardened latches (e.g., the latch in [9]) require larger sizes to achieve MNU tolerance protection, so larger sizes are used for some of their devices. The proposed latch does not strongly depend on the transistor sizes to implement SEU tolerance, so most transistors can use the smallest size (120nm/60nm). The layout of the proposed latch is shown in Fig. 5, where P1 ∼ P16, N1, N2, N5 ∼ N8, N11 and N12 are 120nm/60nm, while N3, N4, N9 and N10 use a larger size (e.g., 360nm/60nm) to compensate for the effects of threshold loss. Besides, the relative transistor positions in the proposed latch do not affect its reliability because it can correct all upsets at the circuit level. Finally, all latches are compared under the same simulation conditions. Table 1 reports the device and node comparison results; the classical latch (CL) circuit [15] shown in Fig. 6 is also included. This table shows that the hardened latches in [2], [9], [13] and [14] employ fewer devices and nodes, but cannot implement MNU self-recovery tolerance (masking tolerance latches generally need fewer devices and nodes). The latches in [4] and [5] use interlocked CE instances to implement MNU self-recovery tolerance, so they require more devices and nodes than the proposed MT latch. Interestingly, the proposed MT latch has more nodes than sensitive nodes, because the hardening principle reduces the number of sensitive nodes.
Table 2 gives the results of the hardware overheads, critical charge ($Q_{crit}$) and linear energy transfer threshold ($LET_{th}$) [16]. It can be seen that the latch in [5] needs 70 devices to restore SNUs and MNUs due to the interlocked structure, and thus has the largest area. The latches in [2] and [3] feature a smaller area than the proposed latch due to their masking properties. The latch in [13] has the largest dynamic power consumption, because it uses the isolation technique to prevent SNU and MNU propagation, resulting in larger leakage power consumption. Note that although the proposed latch has a threshold loss, its power consumption is comparable to that of the latches in [2] and [14] because the stacked design effectively saves power. The delay, rising time $t_r$ and falling time $t_f$ of the proposed latch are the smallest among the hardened latches, because it has smaller node capacitance and a shorter critical path. In addition, the proposed latch has moderate $Q_{crit}$ and $LET_{th}$ values due to the smallest transistor size and the lower feedback driving capability (the threshold loss decreases the driving capability); however, even if a larger charge (or LET) is deposited during charge collection and sharing, the proposed latch can still restore the erroneous nodes (see Fig. 4), so the moderate $Q_{crit}$ ($LET_{th}$) value has no impact on the robustness.
Here, in order to evaluate the relationship between SNU tolerance performance and overheads, the following metrics are used:
FIGURE 6. A classical latch (CL) circuit [15].
The obtained results are given in Fig. 7; as expected, the CL latch has the smallest $\alpha_1$ and $\alpha_3$ values, because no hardening design is employed. Among all MNU tolerance designs, the proposed design has moderate $\alpha_1$ and $\alpha_3$ values, but features the smallest $\alpha_2$ value. Fig. 8 shows the PDAP of each latch; the proposed MT latch obtains the minimum PDAP among the hardened latches. Fig. 9 shows the $\alpha$ value of each latch, and it is clear that the proposed tolerance latch has the smallest $\alpha$ value, indicating a remarkable performance for the SNU tolerance.
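As its name indicates, the PDAP combines three overheads into a single figure of merit by multiplying power, delay and area. The short sketch below shows how such a ranking could be computed; the latch names and all numbers are hypothetical placeholders in arbitrary units, not values from Table 2.

```python
def pdap(power: float, delay: float, area: float) -> float:
    """Power-Delay-Area-Product: smaller is better."""
    return power * delay * area


# Hypothetical (power, delay, area) triples in arbitrary units.
candidates = {
    "latch_a": (1.0, 20.0, 10.0),
    "latch_b": (1.5, 10.0, 12.0),
}

scores = {name: pdap(*values) for name, values in candidates.items()}
best = min(scores, key=scores.get)
```

In this made-up example `latch_b` consumes more power and area than `latch_a` but still has the lower PDAP because of its shorter delay, which illustrates why a product metric can rank designs differently from any single overhead.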

C. MULTIPLE-NODE UPSET TOLERANCE COMPARISON IN CIRCUIT LEVEL
To quantify the MNU tolerance capability of various latches, Fig. 10 plots the comparison curves of the charge collected in a charge-sharing event, in which the primary node collects most of the charge, while the secondary node collects the remaining charge (for ease of presentation, only the curves of the CL latch and the latches in [2], [5] and [13] are plotted). This comparison curve is an effective and accurate criterion to verify fault tolerance capability at the circuit level, and it has been verified in prior studies [17], [18]. The larger the area of a curve, the better the latch tolerates an MNU. Fig. 10 shows that the curve of the classical latch intersects both the X and Y axes; this means that it cannot tolerate an SNU on any single node because no hardening approach is used to improve the tolerance performance. The curves of the other latches, except for the latch in [13], do not intersect the X and Y axes, implying that these robust latches, including the proposed latch, can tolerate any SNU and MNU. In addition, this figure also verifies that the MNU tolerance latch in [13] is only an SNU tolerance latch (its output is not considered as a sensitive node) because the corresponding curve intersects the X axis. This indicates that if the output Q is a struck node, only a small amount of deposited charge is needed to change the latched value, so an SNU occurs.

D. COMPREHENSIVE COMPARISON
A single metric is not suitable for overhead evaluation because the applications are quite complicated. For example, circuits in aerospace missions (radiation applications) are required to have smaller area, lower power consumption as well as higher reliability, while modern Internet of Things (IoT) applications favor smaller area and lower power consumption over reliability [19]-[22]. To evaluate the hardware overheads of each latch cell, some comprehensive metrics are utilized [23], [24]:
Fig. 11 shows the $\beta_1$, $\beta_2$ and $\beta_3$ comparison results of the various latches. From this figure, we can see that the $\beta_1$ values of the latches in [2] and [12]-[14] are smaller than that of the proposed latch, because they use fewer devices. However, the hardened latches of [2], [13] and [14] only perform MNU masking tolerance; although the latch of [12] can implement MNU self-recovery tolerance, its $\beta_2$ value is larger. The $\beta_2$ and $\beta_3$ values of the proposed latch are the smallest among all the latches, because it has a short delay and low power consumption. Fig. 12 gives the $\beta$ value of each latch. As can be seen, the proposed MT latch has the smallest $\beta$ value among the hardened latches (a smaller $\beta$ means that the hardened latch uses fewer hardware overheads to provide better MNU tolerance protection). Thus, these results show that the proposed latch has a remarkable performance for the MNU tolerance.

E. PROCESS VARIATIONS
In nanoscale IC technologies, process variations can result in functional or parametric failures, because device parameters such as oxide thickness, channel length and channel width suffer large fluctuations. Therefore, the impact of the process variations should be carefully considered. In this section, process corner, Process, Voltage and Temperature (PVT) and Monte Carlo (MC) simulations are used to verify the impact of these variations. In the process corner simulation, four corners (i.e., SS, FF, SF and FS) are chosen; for the MC simulation with 3-sigma evaluation, the number of simulations is 3000 [18], [20]; for the PVT simulation, the extreme combinations are considered. The results are given in Fig. 13; they show that SNUs and MNUs do not destroy the function of the proposed MT latch and the output Q maintains the latched value. In the corner analysis, the performance parameters in terms of power, delay, $t_r$, $t_f$ and $Q_{crit}$ are shown in Table 3. Tables 2 and 3 show that these parameters fluctuate little in the SS, SF and FS corners, while in the FF corner the four parameters delay, $t_r$, $t_f$ and $Q_{crit}$ are the best and the power parameter is the worst. Figs. 14 and 15 present the fluctuations of power, delay, $t_r$ and $t_f$ over 3000 MC simulations (the abscissa is the number of the MC evaluation), and Table 4 gives the statistical parameters of the various latches in terms of mean, variance and deviation. As per these results, it can be concluded that the power and timing parameters of the proposed latch have low sensitivity to the variations. The evaluation of the variations demonstrates that although these parameters of the proposed latch are slightly affected by the variations, the tolerance performance is not disturbed, so the variations have no impact on the robustness against SNUs and MNUs.
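The statistical parameters reported for the MC runs (mean, variance and deviation) can be computed from the per-run samples as sketched below. The samples here are synthetic: they are drawn from a hypothetical Gaussian delay distribution (20 ps mean, 1 ps standard deviation) with a fixed seed, purely to show the post-processing; they do not reproduce Table 4.

```python
import random
import statistics

N_RUNS = 3000  # number of MC simulations, as stated in the text


def mc_summary(samples):
    """Return (mean, sample variance, sample standard deviation)."""
    mean = statistics.fmean(samples)
    variance = statistics.variance(samples)
    deviation = statistics.stdev(samples)
    return mean, variance, deviation


def fraction_within_3_sigma(samples, mean, deviation):
    """Share of samples inside mean +/- 3 * deviation."""
    inside = sum(1 for s in samples if abs(s - mean) <= 3.0 * deviation)
    return inside / len(samples)


# Synthetic delay samples in ps (hypothetical distribution, fixed seed).
rng = random.Random(0)
delays = [rng.gauss(20.0, 1.0) for _ in range(N_RUNS)]

mean, variance, deviation = mc_summary(delays)
coverage = fraction_within_3_sigma(delays, mean, deviation)
```

A small deviation relative to the mean, together with a 3-sigma coverage close to one, is what "low sensitivity to the variations" would look like in such a summary.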

F. RELIABILITY
Reliability is the ability of a circuit to perform its function within a certain period under stated conditions [6]. Thus, the reliability with respect to soft errors is generally the self-recovery ability of the circuit; it can be calculated by using the following equation [6]:
where λ is the Soft Error Rate (SER) occurring within a time interval, which can be obtained by the mathematical model of [25] and [26]. In this work, the neutron flux ($0.00565\ \mathrm{n \cdot cm^{-2} \cdot s^{-1}}$ at sea level in New York City) is used to calculate the SER (λ) value [25], [26]. The self-recovery reliability of each latch is shown in Fig. 16, in which only the impact of soft errors is considered. It can be seen from this figure that the proposed latch provides a higher reliability than the remaining latches; this means that the proposed latch can operate normally for a longer time in radiation environments.
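The reliability equation itself is not reproduced in this text. As an illustration only, the sketch below assumes the widely used constant-failure-rate model $R(t) = e^{-\lambda t}$, which may or may not be the exact form used in [6]; the two λ values are hypothetical and serve only to show how a lower SER translates into higher reliability over time.

```python
import math


def reliability(ser: float, t: float) -> float:
    """Assumed exponential model R(t) = exp(-ser * t).

    ser -- soft error rate (lambda), in errors per unit time
    t   -- operating time, in the same time unit
    """
    return math.exp(-ser * t)


# Hypothetical SER values for two latches (errors per hour).
ser_low = 1e-4
ser_high = 5e-4

hours = 1000.0
r_low = reliability(ser_low, hours)
r_high = reliability(ser_high, hours)
```

Under this assumed model, the latch with the smaller λ keeps a higher probability of operating correctly at any given time, which is the qualitative trend described for Fig. 16.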
Temporal degradation effects such as Negative Bias Temperature Instability (NBTI), Positive Bias Temperature Instability (PBTI) and Hot Carrier Injection (HCI) can degrade the performance of a circuit, because they degrade the threshold voltage of a transistor during circuit operation [27]-[29]. Hence, the combined degradation effects should also be considered for long-lifetime applications. The models in [30]-[32] can accurately estimate the transistor threshold voltage shift ($\Delta V_{th}$) during BTI-induced aging, while the model in [29] can accurately estimate $\Delta V_{th}$ during HCI-induced aging. Reference [29] also gives a model to estimate $\Delta V_{th}$ under the combined effects of BTI and HCI. For the models of BTI and HCI, readers are advised to consult previous publications [27]-[33]. Once $\Delta V_{th}$ is obtained, the critical charge of the proposed latch under the combined effects of BTI and HCI can be determined by circuit-level measurement.
As per the models of [29]-[32], the self-recovery reliability of the proposed latch under the combined effects of BTI and HCI is calculated and shown in Fig. 17. These temporal degradation effects have only a small impact on the tolerance capability of the proposed MT latch; the main reason is that, like the TPDICE cell [2], the proposed scheme depends on its tolerance structure to provide protection, so the degradation of the threshold voltage due to BTI and HCI has little impact on the tolerance reliability.

IV. CONCLUSION
This paper uses the polarity of the radiation-induced voltage pulse to propose a novel MNU self-recovery latch. It can restore the erroneous nodes and provide the correct latched values under an SEU. The behavior of the proposed latch is simulated by using the TSMC 65nm CMOS process. Because the hardening principle helps reduce the number of devices, the latch has fewer hardware overheads (when the comprehensive PDAP metric is used, it shows the minimum PDAP value among all MNU tolerance designs). Monte Carlo (MC) and Process, Voltage and Temperature (PVT) simulations have been performed to evaluate the impact of process variations. The evaluation results show that the process variations have no impact on the robustness, although the delay, $t_f$, $t_r$ and power of the proposed design fluctuate slightly. Although temporal degradation effects such as Bias Temperature Instability (BTI) and Hot Carrier Injection (HCI) cause a transistor threshold voltage shift ($\Delta V_{th}$) that reduces the critical charge, the evaluation of the self-recovery reliability shows that they have only a small impact on it.
This work has demonstrated that this hardening approach can provide excellent tolerance protection against SNU and MNU, thus it can be also used as a novel hardening idea to propose other tolerance circuits.