Robust Offset-Cancellation Sensing-Circuit-Based Spin-Transfer-Torque Nonvolatile Flip-Flop

Nonvolatile flip-flop (NV-FF) based systems can be effectively implemented in battery-limited internet of things (IoT) applications owing to their zero standby power consumption and instant ON/OFF characteristics. Among various NV-FFs, spin-transfer-torque magnetic tunnel junction based NV-FFs are the most applicable because of their nonvolatility, high endurance, complementary metal-oxide-semiconductor (CMOS) compatibility, scalability, and rapid sensing and write speeds resulting from their low resistance. However, their sensing margin for the restore operation degrades as technology scales down, because process variation increases and the supply voltage decreases. This paper proposes a novel NV-FF that offers significantly higher offset tolerance than state-of-the-art NV-FFs and reduces the read current (Iread) that causes read disturbance. Monte Carlo HSPICE simulation results based on industry-compatible 65-nm model parameters show that, compared with state-of-the-art NV-FFs, the proposed NV-FF achieves a three-order improvement in the restore yield and a two-order reduction in Iread.


I. INTRODUCTION
Recently, nonvolatile flip-flop (NV-FF)-based systems (NV-Systems) have been regarded as a potential substitute for conventional volatile systems [1]-[10]. This is mainly because the NV-System enables 1) zero standby power consumption, by switching OFF the power in standby mode; 2) instant-ON from power-down conditions, which improves the user experience and saves power; 3) instant-OFF to the standby mode, which saves power and eliminates the need for external nonvolatile memory; and 4) protection against sudden power failures, which improves reliability and saves cost and area because large tantalum capacitors are not required.
Among a variety of NV-FF implementations, NV-FFs that employ spin-transfer-torque magnetic tunnel junctions (STT-MTJs) are the most applicable because of their nonvolatility, high endurance, long retention times, complementary metal-oxide-semiconductor (CMOS) compatibility, scalability, and lack of area overhead, since the MTJs are stacked above the MOS transistors. The STT-MTJ based NV-FF has four operation modes. In the normal FF mode, it acts as a conventional volatile FF. In the backup mode, it stores the computing data in the STT-MTJs. In the standby mode, the power of the NV-System is completely switched OFF to achieve zero standby power. In the restore mode, it restores the stored data from the STT-MTJs to the FF core. Note that the design of the NV-FF should not degrade the performance of the normal FF mode, because normal FF mode operation is predominant in the NV-System, whereas restore and backup mode operations occur infrequently in internet of things (IoT) applications.
The associate editor coordinating the review of this manuscript and approving it for publication was Yong Chen.
Although STT-MTJ based NV-FFs offer the abovementioned advantages, their sensing margin for the restore operation degrades as technology scales down, because process variation increases and the supply voltage (V DD) decreases. The degradation is exacerbated when V DD is reduced further, to the near-threshold voltage region, for ultralow-power IoT applications. The conventionally used merged latch and sensing circuit (MLS) structure (see Fig. 1(a)) typically relies on increasing transistor sizes to tolerate process variation, but this increases the parasitic capacitance of the slave latch and thus degrades the performance of the normal FF mode [3], [5], [10].
To overcome this problem, Ryu et al. [3] proposed a separated latch and sensing circuit (SLS) structure, as shown in Fig. 1(b). By adding a separate sensing circuit at the cost of area overhead, the sensing circuit can be optimized without degrading the slave latch operation, which improves both the sensing margin of the restore mode and the performance of the normal FF mode. Song et al. [10] proposed an offset-cancellation sensing-circuit (OCSC)-based NV-FF, which is designed to be insensitive to the offset voltage caused by process variation. Employing the offset-cancellation technique in the sensing circuit improved the sensing margin, which saved restore-mode energy and reduced area. However, the abovementioned NV-FFs remain highly sensitive to process variation.
This paper presents a novel NV-FF, based on the SLS structure, with a significantly higher offset tolerance than the state-of-the-art NV-FF [10]. The remainder of this paper is organized as follows. Section II describes the proposed NV-FF. Section III presents the simulation results and comparison, followed by the conclusions in Section IV.

II. PROPOSED NV-FF
To independently optimize the sensing circuit and flip-flop core, the proposed NV-FF is based on the SLS structure. With respect to the sensing circuit, the only differences between the proposed NV-FF and the state-of-the-art NV-FF [10] are 1) the inclusion of two NMOSs (NL2 and NR2), whose gates are driven by a supply voltage (V DDH) higher than the normal V DD, and 2) reverse-connected MTJ A and MTJ B [11]. However, these differences significantly improve the effectiveness of offset cancellation and reduce the read disturbance in the restore mode. The requirement for V DDH may add design complexity if an extra power rail is needed. This paper therefore presents a solution that generates the V DDH signal without an extra power rail by using a simple charge pump, which is described later.
VOLUME 8, 2020
The restore mode operation of the proposed NV-FF consists of the following four phases: precharge, offset-cancelling, re-precharge, and comparison. In the precharge phase (phase 1, P1), the gate voltages of NL and NR reach V DD. Thus, the two capacitors (C SA_L, C SA_R) are precharged to V DD. In the offset-cancelling phase (phase 2, P2), because NL and NR are diode-connected, C SA_L and C SA_R discharge from the precharged voltage V DD and automatically stop discharging when the voltage reaches the threshold voltage (V th) of NL and NR. Thus, in P2, C SA_L and C SA_R reflect the V th of NL and NR, respectively, which results in the offset cancellation. It should be noted that without NL2 and NR2, efficient offset cancellation cannot be achieved, because the V OUT and V OUTB nodes are connected via MTJ A and MTJ B in P2. By adding NL2 and NR2 to isolate the V OUT and V OUTB nodes from each other in P2, the proposed NV-FF significantly improves the efficiency of the offset cancellation. In the re-precharge phase (phase 3, P3), V OUT and V OUTB are precharged to GND, whereas C SA_L and C SA_R capture the V th values of NL and NR, respectively. In the comparison phase (phase 4, P4), the data stored in the MTJs are first sensed through the difference in resistance between MTJ A (R MTJ_A) and MTJ B (R MTJ_B), and then amplified by the positive feedback of the cross-coupled NL and NR. If R MTJ_A is smaller than R MTJ_B, V OUTB is charged more rapidly than V OUT. Because V OUTB (V OUT) is connected to the left (right) plate of C SA_R (C SA_L), which forms the cross-coupled NMOS structure, the voltage difference between V OUT and V OUTB is amplified to nearly the rail-to-rail voltage (V OUTB ∼ V DD, V OUT ∼ GND). More details on the operation are presented in [10], because the operation of the proposed NV-FF is the same as that of the state-of-the-art NV-FF, with the exception of the inclusion of switches NL2 and NR2.
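As a reading aid, the four-phase sequence can be sketched as a simple schedule. The sketch below assumes that the phases run back to back from t = 0 with equal durations (the 20 ns default matches the phase time used in the simulations of Section III); the helper name and the start-time convention are illustrative assumptions and are not part of the circuit.

```python
# Illustrative sketch only: maps a time point to the active restore-mode phase,
# ASSUMING four equal, back-to-back phases that start at t = 0.
PHASES = (
    "precharge (P1)",
    "offset-cancelling (P2)",
    "re-precharge (P3)",
    "comparison (P4)",
)


def restore_phase(t_ns: float, phase_time_ns: float = 20.0) -> str:
    """Return the phase active at time t_ns (hypothetical helper)."""
    if t_ns < 0 or t_ns >= len(PHASES) * phase_time_ns:
        raise ValueError("time is outside the restore window")
    return PHASES[int(t_ns // phase_time_ns)]


for t in (5.0, 25.0, 45.0, 65.0):
    print(f"t = {t:4.1f} ns -> {restore_phase(t)}")
```

With a different phase time, for example 1.5 ns per phase, the same helper can be called as `restore_phase(t, phase_time_ns=1.5)`.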
For a correct restore operation, both the offset cancellation of NL and NR and the difference in resistance between R MTJ_A + R NL2 and R MTJ_B + R NR2 are critical, where R NL2 (R NR2) is the effective resistance of NL2 (NR2). To achieve insensitivity to the V th mismatch between NL2 and NR2, the gates of NL2 and NR2 are driven by V DDH. Because the drain current of an NMOS is proportional to (V GS - V th)², where V GS is the gate-source voltage, increasing V GS suppresses the effect of the V th mismatch between NL2 and NR2 on the restore mode operation.
Fig. 3 shows the write operation of the STT-MTJ [12]-[15]. The magnetization direction of the pinned layer is fixed, whereas that of the free layer can be changed by a write current (I write) larger than the critical switching current (I C). Low resistance (R L) is written by flowing I write from the top electrode (TE) to the bottom electrode (BE). Likewise, high resistance (R H) is written by flowing I write from the BE to the TE. However, the read current (I read), which is the current flowing through MTJ A and MTJ B during the restore mode, can cause an unintentional write operation. This is known as read disturbance [16]. Fig. 4 shows the read disturbance scenarios in the restore mode for normal-connected MTJs and reverse-connected MTJs. As mentioned earlier, if R MTJ_A and R MTJ_B are R L and R H, V OUTB and V OUT become nearly V DD and GND, respectively. In the case of normal-connected MTJs (Fig. 4(a)), because I read flows from the TE to the BE, only MTJ B, which is in the R H state, can suffer a read disturbance. To avoid such a read disturbance, I read should be sufficiently lower than I C. However, because V OUT is almost GND, a high I read (∼ (V DD - GND)/R H) is inevitable. In the case of reverse-connected MTJs (Fig. 4(b)), because I read flows from the BE to the TE, only MTJ A, which is in the R L state, can suffer a read disturbance. Because V OUTB is almost V DD, I read is significantly reduced (∼ (V DD - V DD)/R L), which prevents read disturbance in the restore mode.
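The two first-order read-current estimates above can be compared numerically. The following is a minimal sketch rather than a reproduction of the simulations: the MTJ resistance, the TMR, and the residual voltage between an output node and its rail are hypothetical values chosen only for illustration, and the series resistance of the access transistors is ignored.

```python
# Illustrative sketch only: first-order (Ohm's-law) estimate of the read
# current through the MTJ that is at risk of read disturbance.
# R_L, TMR, and V_RESIDUAL are HYPOTHETICAL values, not taken from the paper.


def read_current(v_dd: float, v_node: float, r_mtj: float) -> float:
    """Current (A) through an MTJ connected between V_DD and an output node."""
    return (v_dd - v_node) / r_mtj


V_DD = 0.8         # supply voltage used in the simulations (V)
R_L = 5e3          # hypothetical low-resistance state (ohm)
TMR = 100.0        # hypothetical TMR (%), with TMR = (R_H - R_L) / R_L * 100
R_H = R_L * (1.0 + TMR / 100.0)
V_RESIDUAL = 0.02  # hypothetical gap between an output node and its rail (V)

# Normal connection: the MTJ at risk is MTJ B (R_H); its node V_OUT is near GND.
i_normal = read_current(V_DD, V_RESIDUAL, R_H)
# Reverse connection: the MTJ at risk is MTJ A (R_L); its node V_OUTB is near V_DD.
i_reverse = read_current(V_DD, V_DD - V_RESIDUAL, R_L)

print(f"normal-connected : {i_normal * 1e6:.1f} uA")
print(f"reverse-connected: {i_reverse * 1e6:.1f} uA")
```

Under these assumptions the disturbing current depends on how far the relevant output node sits from V DD, which is why the reverse connection benefits directly from a more rail-to-rail V OUTB.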
The backup mode operation of the proposed NV-FF has two scenarios, which are determined by the computing data Q of the normal FF mode (refer to Fig. 2). When Q is 1 (0) and the write-enable (WE) signal is asserted, the write current flows from PWR (PWL) through MTJ B (MTJ A), NR2 (NL2), NL2 (NR2), and MTJ A (MTJ B) to NWL (NWR). Then, MTJ A and MTJ B are written as R H (R L) and R L (R H), respectively. Fig. 5 illustrates the operation when Q is 1.

III. SIMULATION RESULTS AND DISCUSSION
Fig. 6(a) presents the restore mode transient responses when there was no V th mismatch between NL and NR, and Fig. 6(b) presents the responses when there was a V th mismatch of 200 mV (V th_NR = V th_NL + 200 mV). In the simulation, the following industry-compatible 65-nm model parameters and the MTJ model based on [17] were used: V DD = 0.8 V,
where W indicates the width of the corresponding transistor, as denoted by the subscript. A minimum length of 0.06 µm was used for all the transistors. All other transistors that were not specified had the minimum width of 0.21 µm. Fig. 6(b) clearly reveals the advantage of the proposed NV-FF: the 200 mV V th mismatch in NR was entirely reflected in C SA_R, so the restore operation passed, whereas the state-of-the-art NV-FF reflected almost none of the mismatch and therefore failed.
Fig. 7(a)-(c) present the sensing pass/fail simulation results in the restore mode with respect to (a) the NL/NR V th mismatch; (b) the tunnel magneto-resistance ratio (TMR), which is defined as (R H - R L)/R L × 100, when the NL/NR V th mismatch is 50 mV; and (c) the NL2/NR2 V th mismatch. Fig. 7(a) reveals that the offset tolerance of the proposed NV-FF was higher than that of the state-of-the-art NV-FF by a factor of 5. Owing to the improved offset tolerance, as can be seen from Fig. 7(b), the proposed NV-FF can restore correctly at a TMR of 20%, whereas the state-of-the-art NV-FF can restore correctly only at a TMR greater than 70%. Fig. 7(c) reveals that the proposed NV-FF became insensitive to the V th mismatch between NL2 and NR2 as V DDH increased, as mentioned previously. To ensure an effective restore operation under the severe condition of a V th mismatch of 200 mV, V DDH was set to 1.5 V. However, V DDH can be adjusted according to the target restore yield and the reliability of the transistor.
Fig. 8 presents the restore yield of the proposed NV-FF and the state-of-the-art NV-FF with respect to the capacitance of C SA (= C SA_L = C SA_R). The restore yield, which is defined as the ratio of passing restore operations to the total number of simulations and is expressed in sigma, was obtained from a large number of Monte Carlo simulations. The target restore yield of the NV-FF was set to 4σ to guarantee a 96.88% yield when 1000 NV-FFs are assumed [10].
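The relation between a yield expressed in sigma and a per-cell error rate can be checked with a few lines of code. The sketch below assumes that the sigma value denotes a one-sided Gaussian tail and that the NV-FFs fail independently; both are assumptions made here because they are consistent with the quoted 96.88% figure, and the function names are illustrative.

```python
import math


def error_rate(sigma: float) -> float:
    """One-sided Gaussian tail probability Q(sigma) (assumed yield convention)."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))


def array_yield(sigma: float, n_cells: int) -> float:
    """Probability that all n_cells independent NV-FFs restore correctly."""
    return (1.0 - error_rate(sigma)) ** n_cells


# Target used in the text: 4 sigma per NV-FF, 1000 NV-FFs.
print(f"error rate at 4 sigma : {error_rate(4.0):.3e}")
print(f"yield of 1000 NV-FFs  : {array_yield(4.0, 1000) * 100:.2f} %")
```

The same `error_rate` helper can be used to convert any other sigma value reported in this section into a per-cell restore error rate under the same assumptions.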
Because C SA is used both for the offset cancellation of NL and NR and for the positive feedback operation of the cross-coupled NL and NR in the comparison phase, a larger C SA improves the restore yield. The proposed NV-FF achieved the target restore yield of 4σ at C SA = 10 fF, whereas the state-of-the-art NV-FF did not achieve the target restore yield even at C SA = 20 fF because of its inefficient offset cancellation. Increasing W NL and W NR improved the restore yield; however, the yield decreased beyond C SA = 10 fF because of the limited restore time when each phase time is set to 20 ns.
Fig. 9 shows the restore yield of the proposed NV-FF with respect to W NL2 (= W NR2) and V DDH. For this simulation, C SA = 20 fF was used. Because the effect of the V th mismatch between NL2 and NR2 on the restore mode operation is suppressed as the width of NL2 and NR2 or V DDH increases, the restore yield increases with W NL2 (= W NR2) and V DDH. When V DDH = 1.5 V is used, W NL2 = W NR2 = 1.0 µm is sufficient to achieve the target restore yield of 4σ. When V DDH = 1.3 V is used, W NL2 = W NR2 = 2.0 µm is required.
T. Na: Robust Offset-Cancellation Sensing-Circuit-Based Spin-Transfer-Torque NV-FF
Fig. 10 shows the restore yield of the proposed NV-FF and the state-of-the-art NV-FF with respect to W NL (= W NR). For this simulation, V DDH = 1.5 V was used. Unlike the state-of-the-art NV-FF, whose restore yield increases monotonically with W NL and W NR, the proposed NV-FF has an optimal W NL and W NR. When W NL and W NR are extremely small (e.g., 0.25 µm), the effective resistances of NL and NR become very high. Because this reduces the pull-down drive strength of NL and NR, the voltage difference (ΔV) between V OUT and V OUTB becomes extremely small, as shown in Fig. 11(a), which degrades the restore yield.
When W NL and W NR are extremely large (e.g., 10 µm) with the same value of C SA, the capacitive coupling between V OUTB (V OUT) and V G_NR (V G_NL) decreases, because the coupling ratio depends on C SA_R / (C SA_R + C G_NR) and C SA_L / (C SA_L + C G_NL), where C G_NR and C G_NL are the gate capacitances of NR and NL, respectively. This weakens the positive feedback of the cross-coupled NMOS, as shown in Fig. 11(b), and the restore yield is again degraded. To improve the restore yield at extremely large values of W NL and W NR, C SA should also be increased. In the proposed NV-FF, however, large W NL and W NR values are not necessary because of the superior offset tolerance. Thus, a much smaller C SA can be used to achieve the target restore yield than in the state-of-the-art NV-FF. The optimal range of W NL and W NR for the proposed NV-FF is from 0.5 µm to 1 µm, as shown in Fig. 10. Because the proposed NV-FF allows smaller W NL, W NR, and C SA than the state-of-the-art NV-FF, and W NL2 = W NR2 = 1 µm is sufficient to achieve the same restore yield, the proposed NV-FF is expected to have a much smaller area than the state-of-the-art NV-FF.
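The dependence of the coupling ratio on transistor width can be illustrated with a short calculation. This is a sketch under stated assumptions: the gate capacitance is modeled as proportional to the width, and the per-micrometer value is hypothetical and was not taken from the 65-nm model; only the ratio expression itself comes from the text.

```python
# Illustrative sketch only: capacitive coupling ratio C_SA / (C_SA + C_G)
# as the width of the cross-coupled NMOS grows.
# C_GATE_PER_UM is a HYPOTHETICAL gate capacitance per micrometer of width.


def coupling_ratio(c_sa: float, c_gate: float) -> float:
    """Fraction of an output-node swing that is coupled onto the gate node."""
    return c_sa / (c_sa + c_gate)


C_SA = 10e-15          # sampling capacitor value from the text (F)
C_GATE_PER_UM = 1e-15  # hypothetical gate capacitance per um of width (F/um)

for w_um in (0.25, 0.5, 1.0, 10.0):
    c_gate = C_GATE_PER_UM * w_um
    print(f"W = {w_um:5.2f} um -> coupling ratio = {coupling_ratio(C_SA, c_gate):.3f}")
```

With these assumed values, the ratio stays close to 1 for sub-micrometer widths and drops to one half when the gate capacitance equals C SA, which matches the qualitative trend described for very large W NL and W NR.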
Both the state-of-the-art and proposed NV-FFs have a cross-coupled NMOS structure. For this reason, V OUT and V OUTB cannot reach the full rail voltages (V DD or GND) after the positive feedback of the restore mode finishes. Because the non-rail-to-rail voltages of V OUT and V OUTB generate a static I read, they can cause a read disturbance. When MTJ A = R L and MTJ B = R H, MTJ B (MTJ A) of the state-of-the-art NV-FF (proposed NV-FF) is susceptible to read disturbance, as described in Section II and Fig. 4. Fig. 12 shows the I read that causes a read disturbance (proposed NV-FF: I read flowing through MTJ A; state-of-the-art NV-FF: I read flowing through MTJ B) with respect to the NL/NR V th mismatch. Fig. 12 clearly shows that the proposed NV-FF reduces this I read by almost two orders of magnitude compared to the state-of-the-art NV-FF, which prevents read disturbance.
In the case of 65-nm process technology, the nominal V DD is 1.2 V and the maximum allowable V DD is 1.32 V, which is the nominal V DD multiplied by 1.1. Thus, to prevent gate oxide breakdown, a V DDH of 1.5 V is not allowed. There are two approaches to this problem. The first approach is to reduce V DDH by increasing the width of NL2 and NR2. As shown in Fig. 9, a V DDH of 1.3 V can be used by increasing the width of NL2 and NR2 to 2 µm. The second approach is to reduce V DDH by replacing the typical V th NMOSs NL2 and NR2 with low V th NMOSs, because this reduces the effect of V th variation in (V GS - V th)². It should be noted that because typical and low V th NMOSs use the same gate oxide thickness, their maximum allowable voltage is the same, 1.32 V. Fig. 13 shows the restore yield of the proposed NV-FF with respect to W NL2 (= W NR2) and V DDH when NL2 and NR2 are low V th NMOSs. The figure shows that a V DDH of 1.3 V can be used without increasing the width of NL2 and NR2.
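The overdrive argument behind the V DDH and low V th choices can be illustrated with the square-law relation quoted above. The sketch below is a simplification: it assumes the ideal square law, takes V GS equal to the gate voltage (source at GND), and uses hypothetical threshold voltages and a hypothetical 50 mV mismatch, so the printed numbers are not simulation results.

```python
# Illustrative sketch only: relative drain-current change caused by a V_th
# shift under the ideal square law I_D ~ (V_GS - V_th)^2.
# All threshold voltages and the mismatch value are HYPOTHETICAL.


def current_ratio(v_gs: float, v_th: float, delta_v_th: float) -> float:
    """Return I_D(v_th + delta_v_th) / I_D(v_th) for a square-law NMOS."""
    v_ov = v_gs - v_th  # gate overdrive of the nominal device
    if v_ov <= 0 or v_ov - delta_v_th <= 0:
        raise ValueError("device must stay on for the square law to apply")
    return ((v_ov - delta_v_th) / v_ov) ** 2


DELTA_V_TH = 0.05  # hypothetical V_th mismatch (V)
cases = [
    ("typical V_th, gate at V_DD  = 0.8 V", 0.8, 0.45),
    ("typical V_th, gate at V_DDH = 1.5 V", 1.5, 0.45),
    ("low V_th,     gate at V_DDH = 1.3 V", 1.3, 0.30),
]
for label, v_gs, v_th in cases:
    loss = 1.0 - current_ratio(v_gs, v_th, DELTA_V_TH)
    print(f"{label}: current changes by {loss * 100:.1f} %")
```

For a small mismatch the relative change is approximately 2·ΔV th / (V GS - V th), so either raising the gate voltage or lowering V th enlarges the overdrive and makes the same mismatch matter less.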
To minimize the area overhead, the second approach, which uses low V th NMOSs for NL2 and NR2, is selected. Because the restore yields with typical V th NL2 and NR2 at V DDH = 1.5 V and with low V th NL2 and NR2 at V DDH = 1.3 V are almost the same, all the previous results obtained using typical V th NL2 and NR2 with V DDH = 1.5 V can be interpreted as results obtained using low V th NL2 and NR2 with V DDH = 1.3 V.
Fig. 14 shows the restore yield of the proposed and state-of-the-art NV-FFs with respect to V DD. For this simulation, low V th NMOSs are used for NL2 and NR2 with a V DDH of 1.3 V. The figure shows that the state-of-the-art NV-FF requires a V DD of 1.1 V to achieve the target restore yield of 4σ, whereas the proposed NV-FF requires a V DD of only 0.8 V.
Fig. 15 shows the restore yield of the proposed and state-of-the-art NV-FFs with respect to the restore time. For simplicity, the simulation uses equal phase times for P1, P2, P3, and P4. As expected, the restore yields of both NV-FFs increase as the restore time increases. Note that the proposed NV-FF achieves the target restore yield at a restore time of 6 ns, whereas the restore yield of the state-of-the-art NV-FF saturates at 2.7σ at a restore time of 40 ns.
Figs. 16 and 17 show the restore yield of the proposed and state-of-the-art NV-FFs with respect to temperature and process corner, respectively. The process corners are specified by two letters describing the NMOS and PMOS; S, T, and F stand for slow, typical, and fast corners, respectively. These results clearly verify that the proposed NV-FF outperforms the state-of-the-art NV-FF in terms of offset tolerance regardless of the temperature and the process corner. It should be noted that when the temperature falls below 0 °C or the process corner is SS, the V th of the switches (drawn as switch symbols in Fig. 2) increases significantly, so the switches are not fully turned on, which results in weak positive feedback and a degradation in the restore yield. This can be overcome simply by replacing the typical V th switches with low V th switches, as shown in Figs. 16 and 17.
So far, a V DDH power rail has been assumed for the proposed NV-FF. However, unless a V DDH power rail already exists in the target IoT application to operate other blocks, using a dual-rail supply only for the NV-FF is not attractive, because it may cause significant design complexity and power/area overhead. Fig. 18 shows a charge pump based solution that is capable of generating the V DDH signal (P1,4_CP) without an extra V DDH power rail. The operation of the charge pump is simple. When the signal P1,4, which swings between GND and V DD, is low, the voltages of P1,4_CP and BOT_CP (V P1,4_CP and V BOT_CP) are GND, and the voltage of TOP_CP (V TOP_CP) is V DD. When the P1,4 signal changes from low (GND) to high (V DD), V BOT_CP becomes V DD, and then, through capacitive coupling via C CP, V TOP_CP and V P1,4_CP rise above V DD. The increment of V P1,4_CP depends on the value of C CP. Note that the bodies of the two PMOSs, PCP1 and PCP2, are connected to the node TOP_CP, and a high V th PMOS is used for PCP1 to maximize the increment of V P1,4_CP.
Fig. 19(a) shows the transient response of V P1,4_CP according to the value of C CP. As expected, V P1,4_CP increases as C CP increases. The figure shows that a C CP of 6 fF is sufficient to achieve the target value of 1.3 V. The V th variation of PCP1 can cause significant variation in V P1,4_CP. Fig. 19(b), which presents 1000 Monte Carlo simulations with a minimum-sized PCP1 (width = 0.21 µm, length = 0.06 µm), clearly shows that the V P1,4_CP variation increases as time elapses and reaches almost 140 mV at time = 70 ns.
However, because the positive feedback occurs early in P4 (60.3 to 60.5 ns), and once the positive feedback starts V OUTB and V OUT are amplified correctly regardless of the value of V P1,4_CP, the proposed NV-FF is rarely affected by the variation of PCP1. Fig. 20 shows the restore yield of the proposed NV-FF with the charge pump according to the value of C CP. It also clearly shows that a C CP of 6 fF, which corresponds to a pMOSCAP size of 6.0 µm/0.1 µm (W/L), is sufficient to achieve the target restore yield of 4σ. It also clearly verifies that the proposed NV-FF demonstrates a significantly better offset tolerance, resulting in a much higher restore yield.
Table 1 summarizes and compares the performance of the proposed and state-of-the-art NV-FFs. Both NV-FFs use the SLS structure to optimize the sensing circuit without degrading the slave latch operation. The proposed NV-FF achieves a restore yield of 4.3σ at a restore time of 6 ns, whereas the restore yield of the state-of-the-art NV-FF saturates at 2.7σ at a restore time of 40 ns. The restore yield values of 2.7σ and 4.3σ correspond to restore error rates of 3.5 × 10⁻³ and 8.5 × 10⁻⁶, respectively. Thus, by using the proposed NV-FF, a three-order improvement in the restore yield can be achieved. When the same restore time of 6 ns is applied, the restore energy of the proposed NV-FF is slightly lower than that of the state-of-the-art NV-FF, even when the energy consumed by the charge pump is included. This is due to the better offset cancellation, which brings V OUT and V OUTB closer to rail-to-rail levels (as shown in Fig. 6) and thereby reduces the static current in P4. Although [11], which is the first paper to propose the reverse-connected MTJ structure for embedded STT magnetic random access memory applications, does not mention area overhead, the reverse-connected MTJ may cause area overhead because one or two extra vias are required.
However, this area overhead is negligible relative to the NV-FF area. The I read that causes read disturbance in the state-of-the-art NV-FF is 60.1 µA. The proposed NV-FF, however, significantly reduces this I read to 0.98 µA by employing the reverse-connected MTJ structure. Although the write current of the proposed NV-FF decreases from 58.5 µA to 53.9 µA because of the addition of NL2 and NR2, it is still sufficient when a write time of more than 10 ns is considered [5]. The clock-to-Q (C-Q) delays of both NV-FFs are the same as that of the transmission-gate-based master-slave FF because of the SLS structure.
Even though this paper focuses on the dynamic offset cancellation technique (offset capture and cancel), the offset voltage can also be alleviated by employing offset calibration techniques [18]- [20].

IV. CONCLUSION
This paper proposes a novel NV-FF with a significantly improved offset-tolerance characteristic due to the inclusion of two NMOSs (NL2 and NR2), to achieve isolation between the V OUT and V OUTB nodes in the offset-cancelling phase. To achieve insensitivity to the V th mismatch between NL2 and NR2, the gates of NL2 and NR2 were driven by V DDH . To avoid the overhead caused by the use of dual-rail supply, a charge pump based solution is proposed. In addition, the reverse-connected MTJ structure is used to reduce the read disturbance. The simulation results verify the effectiveness of the proposed NV-FF, which demonstrated a three-order improvement in the restore yield and a two-order reduction in the I read . Hence, the proposed NV-FF can be employed for the realization of zero standby power and instant ON/OFF NV-FF systems.