High-Performance and Area-Efficient Ferroelectric FET-Based Nonvolatile Flip-Flops

Recently, nonvolatile systems with nonvolatile flip-flops (NVFFs) have gained prominence for their energy efficiency in energy-harvesting devices and battery-operated Internet of Things applications. They are normally-off instantly-on, and thus, can save energy effectively owing to their zero standby power consumption. An NVFF stores the computing state in nonvolatile memories (NVMs) when the power is off. A ferroelectric field-effect transistor (FeFET) is one of the most promising NVMs owing to its high $\text{I}_{\mathrm {on}}/\text{I}_{\mathrm {off}}$ ratio and low write power. Three FeFET-based NVFFs (previous FeFET-out NVFF-1/-2 and FeFET-in NVFF) were recently proposed to improve the area, power, and speed; however, they still have their own problems. Previous FeFET-out NVFF-1 has large area overhead and previous FeFET-out NVFF-2 does not properly perform restore operation. Previous FeFET-in NVFF has a long clock-to-Q delay and high operating energy. This paper introduces two novel FeFET-based NVFFs (proposed FeFET-out and -in NVFFs). Proposed FeFET-out NVFF reduces the large area overhead of previous FeFET-out NVFF-1 and corrects the malfunction in the restore operation of previous FeFET-out NVFF-2. Proposed FeFET-in NVFF achieves a better clock-to-Q delay, operating energy, and area than the previous FeFET-in NVFF. Monte Carlo simulations based on an industry-compatible 10-nm FinFET model are performed for a comparative analysis. Proposed FeFET-out NVFF achieves 17.6% smaller area with slightly higher (6.3%) operating energy and only 0.8% slower clock-to-Q delay than previous FeFET-out NVFF-1. Proposed FeFET-in NVFF achieves 18.9% shorter clock-to-Q and 3.0% smaller operating energy with 8.7% smaller area than the previous FeFET-in NVFF.


I. INTRODUCTION
In recent years, Internet of Things (IoT) applications have been extensively adopted for many electronic devices. They are growing rapidly with the improvement in 5G wireless technology [1], and reports from Ericsson [2] predict that about 28 billion smart devices will be connected to the internet across the world by 2021. Most of them are battery-operated devices and have normally-off and instantly-on life patterns. These devices spend most of their lifetime in the standby mode; therefore, reducing the leakage power has become important. However, with the scaling down of technology, a large amount of energy is wasted by the leakage current. Because the threshold voltage has been lowered along with the nominal voltage (V DD ), the subthreshold leakage current in turned-off transistors has become a major issue.
The associate editor coordinating the review of this manuscript and approving it for publication was Kim-Kwang Raymond Choo.
Power gating is a commonly used circuit technique owing to its significant effect on reducing the leakage current by shutting down the power supply. However, the conventional volatile system, as displayed in Fig. 1(a), cannot eliminate the leakage current in the standby mode because it loses the computing states in the pipelines if the power is fully shut down [3]- [5]. Therefore, significant overheads of energy and latency exist in the wake-up process to restart the pipeline from a cold state.
In recent years, nonvolatile systems, as shown in Fig. 1(b), have become increasingly important because they can eliminate the leakage current in the standby mode while storing data. With the advent of data retentive nonvolatile flip-flops (NVFFs), the states in the flip-flops are stored into embedded nonvolatile memories (NVMs) before power shut down and restored before the system restarts computing. NVFFs eliminate the wake-up process and the leakage current in the standby mode, leading to high energy efficiency for battery-operated normally-off devices. Moreover, batteryless energy-harvesting devices can receive significant assistance from state backup and restore because they frequently lose their computing progress in case of abrupt power outages with intermittent energy harvesting sources, such as photovoltaics, vibrations, and radio frequency (RF) [6]- [9]. In general, an NVM can store 1-bit data with two distinguishable resistance states. The two resistance states are called the low-resistance state (LRS) and the high-resistance state (HRS), which denote the on-state and off-state, respectively. The data can be restored by sensing the currents flowing through low and high resistances, I on and I off .
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Various NVFFs have been proposed with different promising NVMs, such as spin-transfer torque magnetic tunnel junctions (STT-MTJs) [10]- [17] and resistive random access memory (ReRAM) [18]- [21]. However, they still have severe challenges owing to the following limitations of the device characteristics. They require large read circuits to distinguish the low ratio of I on and I off (I on /I off ). Moreover, they are two-terminal devices that share the same path in the read and write operations. Thus, the write circuit needs additional logic gates to differentiate the write and read operations, which incurs area overhead. In addition, they consume static current to switch their resistance state, resulting in high energy consumption in the write operation.
A ferroelectric field-effect transistor (FeFET), another promising NVM, can resolve these challenges of NVFFs. An FeFET has a high I on /I off ratio similar to a metal-oxide-semiconductor field-effect transistor (MOSFET). Thus, the read circuit is simple with a few transistors. An FeFET is an FET with an additional ferroelectric layer in the MOS structure that has three terminals, as in a MOSFET. The states of an FeFET can be switched by the gate-to-source voltage (V GS ) and read by the drain-to-source current (I DS ). There are several advantages of designing NVFFs using FeFETs owing to their voltage-driven write operations. First, the write and read paths are independent of each other; therefore, an additional write circuit is not required. Second, since the state of an FeFET is switched by V GS , only the dynamic power is consumed for capacitor charging and charge trapping, resulting in a significantly low backup energy in the order of fJ [22]- [26]. Third, the backup signal can be embedded in an external voltage modulation unit [25], [26] to mitigate the routing cost and the control unit overheads. Finally, we can choose whether the data are stored every clock cycle or in a controlled manner simply by tuning the operating voltage. In addition, an FeFET is CMOS compatible with good scalability and can be fabricated easily with existing fabrication facilities.
Various FeFET-based NVFFs have been proposed [22]- [26] based on conventional positive edge-triggered transmission gate flip-flops (TGFFs). A TGFF is the most basic edge-triggered flip-flop composed of master and slave latches. Most of the previous structures integrate one or two FeFETs into the slave latch of the TGFF and restore the state with V DD recovery. They can be classified by the position of the FeFET. An FeFET is generally located out of the logic path (FeFET-out) [22]- [26]. In this case, V DD can be sufficiently lowered for low-power operations at the cost of the performance. However, since an FeFET has an I on /I off ratio similar to that of a conventional MOSFET, it can be a part of the logic path (FeFET-in) to reduce the number of transistors [23], [26]. In this case, although V DD cannot be lowered sufficiently, it may achieve a compact design with a reduced number of transistors.
We have reviewed three most recent structures of FeFET-based NVFFs in [25], [26]. The circuit in [25] placed two FeFETs out of the logic path (previous FeFET-out NVFF-1), in parallel with the slave latch. The opposite resistance states were stored in the two FeFETs and restored with a differential structure. Although previous FeFET-out NVFF-1 achieves a low energy and small delay overhead, it has a large additional circuitry with four transistors and FeFET mismatch problems. In [26], comparatively more compact structures were proposed with only three additional transistors with both FeFET-out (previous FeFET-out NVFF-2) and FeFET-in (previous FeFET-in NVFF) structures. However, each structure has severe issues, such as using a control signal as the power source and malfunction in the restore operation.
In this paper, we propose two compact designs of FeFET-based NVFFs with both FeFET-out and FeFET-in structures. The main contributions in this paper are as follows:
• Analysis and categorization of the most recent previous FeFET-based NVFFs in terms of 1) the read schemes depending on the number of FeFETs and 2) the locations of the FeFETs in the circuits.
• An FeFET-out NVFF is proposed. The proposed FeFET-out NVFF corrects the restore error of previous FeFET-out NVFF-2 in [26] and improves the clock-to-Q delay while maintaining three additional transistors.
• An FeFET-in NVFF is proposed. The proposed FeFET-in NVFF achieves an improved clock-to-Q delay with only two additional transistors, the least number of additional transistors thus far.

II. FeFET DEVICE
An FeFET has attracted interest in both nonvolatile memory [22]- [30] and logic [31], [32] designs owing to its capability of steep switching, tunable hysteresis behavior, good scalability, and low-power operation. In this section, we introduce the basics, read schemes for FeFET as memory, and simulation setup.

A. FeFET STRUCTURE AND CHARACTERISTICS
An FeFET can operate as an NVM with hysteresis behavior according to V GS , given a sufficient thickness of the FE layer (T FE ) [22]- [30]. A reported FeFET I DS -V GS hysteresis curve with two opposite polarization states shows a large I on /I off ratio of over $10^{6}$ in [33]. The polarization state of the FE layer plays a critical role in setting the LRS and HRS of an FeFET. When a sufficiently large |V GS | exceeding the coercive voltage (V C ) is applied to an FeFET, the polarization state of the FE layer switches within dozens of picoseconds. The switching operation of two polarization states of an FeFET is shown in Fig. 2(b). In an nMOS-based FeFET (n-FeFET), when a positive V GS exceeding V C is applied, the polarization of the FE layer points to the channel. The positive polarization state assists the electrons in the substrate to form a channel, leading to the formation of LRS. Conversely, when a negative V GS smaller than −V C is applied, the polarization of the FE layer switches to the opposite direction, pointing to the gate metal. The negative polarization state prevents the electrons from forming a channel, leading to the formation of HRS. The same concept can be directly extended to a pMOS-based FeFET (p-FeFET) [22].
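The switching rule described above can be sketched as a small behavioral model. This is only an idealized illustration, not the SPICE model used in the paper: the class name `NFeFET`, the initial state, and the abrupt threshold at exactly ±V C are assumptions, and switching time and partial polarization are ignored.

```python
from dataclasses import dataclass

LRS = "LRS"  # low-resistance state (on)
HRS = "HRS"  # high-resistance state (off)


@dataclass
class NFeFET:
    """Idealized n-FeFET (hypothetical model): state flips only if |V_GS| > V_C."""
    v_c: float          # coercive voltage in volts
    state: str = HRS    # assumed initial state

    def apply_vgs(self, v_gs: float) -> str:
        if v_gs > self.v_c:        # positive polarization, channel forms
            self.state = LRS
        elif v_gs < -self.v_c:     # negative polarization, channel suppressed
            self.state = HRS
        return self.state          # otherwise the previous state is retained


fet = NFeFET(v_c=0.5)
# Sweep a few gate voltages and record the state after each one.
trace = [fet.apply_vgs(v) for v in (0.3, 0.9, 0.0, -0.3, -0.9)]
print(trace)
```

The state changes only at 0.9 V and −0.9 V in this sweep; the intermediate voltages leave it unchanged, which is the hysteresis that lets the device act as a memory.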

B. FeFET MODEL SETUP
In this study, we performed HSPICE simulations based on a 10-nm FinFET predictive technology model [34] fitted to the I DS versus V GS (I-V) characteristic of an Intel 10-nm FinFET [35]. Figs. 3(a) and (b) respectively show the polarization versus V GS (P-V) and I-V hysteresis curves of the FeFET with T FE = 8 nm whose V C is around 0.1 V. The I-V curve in dotted line is simulated from the SPICE model in [26] calibrated with a time-domain-based Landau-Khalatnikov (LK) equation [36]. The characteristics of the experimental FeFET device were properly reflected by extracting the LK equation parameters from the P-V curve, which was measured experimentally using the ferroelectric material Hf$_{0.7}$Zr$_{0.3}$O$_{2}$ [37]. The I-V curve in red line is simulated from the fitted FeFET model used in this study to mimic the I-V curve in dotted line. However, V C of 0.1 V is insufficient for the ''controlled'' backup discussed in Sections III and IV because V DD has to be lower than V C to prevent unintentional polarization switching. By increasing T FE , the gate-to-source voltage required to switch the polarization state of the FE layer, V C , increases [22]- [26]. Therefore, V C of the FeFET model (0.1 V) is intentionally increased to 0.5 V [25], [26], assuming that an FE layer thicker than 8 nm is employed, in order to verify the ''controlled'' backup scheme in simulation.

C. FeFET READ SCHEMES
NVFFs with other emerging NVMs such as STT-MTJs require a large read circuit to differentiate two opposite resistance (high and low) states because the difference between the high and low resistances is small. To improve the sensing margin, they employ two NVMs with a differential structure. In contrast, FeFET-based NVFFs have small read circuits owing to high I on /I off ratio of FeFETs, as shown in Fig. 4. The read schemes can be categorized by the number of FeFETs and the structure of the circuits into differential and single-ended read schemes.
The read circuits in Figs. 4(a) and (b) with two FeFETs utilize a differential read scheme in which the data are restored based on the current difference in two pull-down paths. The pull-down paths are composed of two FeFETs with opposite polarization states [23]- [25]. In this case, either node OUT or OUT_b is pulled down to ground (GND), and the other node is pulled up to V DD by pMOS or cross-coupled inverters. A differential read scheme requires a large read circuit; however, it can improve the sensing margin even in low V DD .
The read circuits in Figs. 4(c) and (d) with a single FeFET utilize a single-ended read scheme in which the data are restored based on a resistive voltage divider [22], [26]. The read circuit in [22] is composed of one large-sized p-FeFET (FP1) and one minimum-sized nMOS (MN1), as shown in Fig. 4(c). If FP1 is in the HRS, MN1 pulls down node OUT to GND. If FP1 is in the LRS, both FP1 and MN1 are on and node OUT becomes close to V DD by the stronger pull-up path. The other read circuit is composed of a minimum-sized pMOS (MP1), a large-sized n-FeFET (FN1), and an nMOS (MN1), as shown in Fig. 4(d) [26]. In this case, if FN1 is in the HRS, node OUT is pulled up to V DD . If FN1 is in the LRS, node OUT becomes close to GND owing to the strong pull-down path. Using a single-ended read scheme, the read circuit can be made compact with a few transistors.
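The resistive-divider view of the single-ended read in Fig. 4(d) can be illustrated numerically. The sketch below is a static approximation with hypothetical on-resistances (none of these values come from the paper); only the I on /I off ratio of about 10^6 and the 0.7 V supply are taken from the text.

```python
def divider_out(v_dd, r_up, r_down):
    """Static output voltage of a pull-up/pull-down resistive divider."""
    return v_dd * r_down / (r_up + r_down)


V_DD = 0.7                      # volts, supply used for the per-cycle scheme
R_MP1 = 50e3                    # hypothetical on-resistance of minimum-sized MP1
R_MN1 = 5e3                     # hypothetical on-resistance of MN1
R_FN1_LRS = 5e3                 # hypothetical LRS resistance of large-sized FN1
R_FN1_HRS = R_FN1_LRS * 1e6     # HRS scaled by an I_on/I_off ratio of ~10^6

# Pull-down path is FN1 in series with MN1; pull-up path is MP1.
out_lrs = divider_out(V_DD, R_MP1, R_FN1_LRS + R_MN1)
out_hrs = divider_out(V_DD, R_MP1, R_FN1_HRS + R_MN1)
print(f"OUT with FN1 in LRS: {out_lrs:.3f} V")
print(f"OUT with FN1 in HRS: {out_hrs:.3f} V")
```

With a weak pull-up and a strong pull-down, node OUT sits near GND in the LRS and near V DD in the HRS, which is why device sizing matters for the single-ended scheme.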
Because an FeFET has an outstanding I on /I off ratio comparable to a MOSFET, it can be a good choice to exploit a single-ended read scheme to reduce area overhead. Accordingly, we utilize a single-ended read scheme in our proposed structures.

III. PREVIOUS FeFET-BASED NONVOLATILE FLIP-FLOPS
There are four operational modes in an NVFF: normal, backup, standby, and restore modes. In the normal mode, it works as an edge-triggered conventional D flip-flop. In the backup mode, the state in the flip-flop is stored in the embedded NVM. In the standby mode, the power supply is fully shut down for achieving zero standby leakage power. In the restore mode, the power supply returns from 0 V to V DD, and the data saved in the NVM are restored to resume the normal operation.
The voltage-driven write operation of an FeFET device enables two backup schemes for FeFET-based NVFFs: ''per-cycle'' and ''controlled'' backup. The backup scheme is decided by the voltage level of V DD and V C . The ''per-cycle'' backup operation occurs when V DD is higher than V C . The polarization state of an FeFET in the slave latch switches at every clock cycle; thus, a separate backup mode is not required. Energy-harvesting devices frequently lose their computing states because of an unreliable power supply, and thus, can benefit from this ''per-cycle'' backup. There is also no routing cost for the backup signal line. However, this backup scheme has a limitation that the minimum V DD depends on V C [26]. In addition, issues about the endurance and aging effects of ferroelectric materials have been reported [38], [39], i.e., the endurance is worsened by switching the polarization state every clock cycle.
If V DD is lower than V C , we can control the data backup with a separate backup mode with an increased V DD exceeding V C . Because a ''controlled'' backup operation needs dynamic V DD , the backup signal is embedded in the supply voltage line to increase V DD beyond V C , which provides sufficient V GS to switch the polarization states of the FE layers. Because such dynamic V DD has already been extensively used in many digital circuits, NVFFs can reuse a V DD modulation unit without introducing other overheads [25], [26]. Thus, the routing cost for the backup signal line can also be reduced. With a ''controlled'' backup scheme, V DD can be lower than with a ''per-cycle'' backup scheme. Thus, it enables low V DD normal mode operation for low power consumption. However, an abrupt power outage in energy-harvesting devices will result in the loss of the computing state in this backup scheme.
Fig. 5 shows the schematic of the slave latch of the most recent FeFET-based NVFFs in [25], [26]. We briefly discuss a few different characteristics of previous FeFET-based NVFFs [...] the FeFET has to act as a transistor in the logic path. This suggests that the polarization state of the FeFET has to switch according to the gate input voltage because current cannot flow through the FeFET in the HRS, and vice versa in the LRS. The polarization state switching in every gate voltage transition results in per-cycle data backup owing to the remnant polarization state of the FeFET. In this section, we review the three latest FeFET-based NVFFs in [25], [26] in terms of their structures, overall operations, and issues.

A. PREVIOUS FeFET-OUT NVFF-1
Fig. 5(a) shows the structure of previous FeFET-out NVFF-1 in [25]. In the normal mode, the restore (RSTR) signal is grounded, and thus, two pull-down transistors, T1 and T2, are turned off. The two FeFETs, T3 and T4, do not affect nodes Q and QN because their source terminals are floating.
It is noticeable that the opposite gate-to-source voltages (V GS,T3 = V Q − V QN and V GS,T4 = V QN − V Q ) are applied to the two FeFETs such that they have opposite polarization states through the backup operation. In the ''per-cycle'' backup scheme with V DD > V C , because |V GS | of both T3 and T4 exceeds V C , the state of the slave latch is stored in both the FeFETs at every clock cycle. Therefore, a separate backup mode is not required. In the ''controlled'' backup scheme with V DD < V C , we can control the data backup with a separate backup mode with increased V DD exceeding V C .
In the restore mode, the data are restored with the differential read scheme described in Section II-C. However, previous FeFET-out NVFF-1 has a large area overhead with four additional transistors and is sensitive to the process variations of the FeFETs related to V C or I on /I off ratio. The latter can deteriorate the complete switch of the polarization states or restore yield. If the two FeFETs have different V C s, they will not safely switch to the opposite polarization states. Moreover, a mismatch in I on and I off can adversely affect the restore yield because the restore operation is based on the current difference.

B. PREVIOUS FeFET-OUT NVFF-2
Fig. 5(b) shows the structure of previous FeFET-out NVFF-2 in [26], which uses a single-ended read scheme with one FeFET, and it has three additional transistors compared to conventional TGFFs. In the normal mode, the RSTR signal is grounded so that M1 is turned off and M2 is turned on, and node Z is inverted by M4 and M3 according to node X. Thus, previous FeFET-out NVFF-2 operates as a conventional TGFF.
This structure can also utilize both ''per-cycle'' and ''controlled'' backup schemes. V DD is set higher or lower than V C depending on the backup scheme. In a ''per-cycle'' backup scheme, V DD is higher than V C , and thus, V GS applied to the FeFET by the voltage difference between nodes X and Z switches the polarization state per clock cycle according to the data in the slave latch. In a ''controlled'' backup scheme, V DD is lower than V C , and thus, the backup operation is performed by ramping up V DD over V C in a separate backup mode in a controlled manner.
Previous FeFET-out NVFF-2 has a critical defect in the restore operation. In the restore mode, the RSTR signal is set to 1; therefore, M1 is turned on and M2 is turned off. When the FeFET is in the LRS, node Z successfully becomes close to the GND with a stronger pull-down path as V DD recovers from 0 V. However, when the FeFET is in the HRS, as shown in Fig. 6(a), M3 and M4 along with the INVF form a cross-coupled inverter, resulting in metastability. The restore failure from metastability can be seen in the timing diagram in Fig. 6(b). In this regard, previous FeFET-out NVFF-2 cannot be used as an NVFF.

C. PREVIOUS FeFET-IN NVFF
Fig. 5(c) presents the structure of the previous FeFET-in NVFF in [26], which has three additional transistors compared to conventional TGFFs. It can utilize only a ''per-cycle'' backup scheme; therefore, V DD must be higher than V C to switch the polarization state of the FeFET. In the normal mode, the RSTR signal is 0 and the RSTR_b signal is 1; thus, the INVA works as an inverter and M1 is turned off. When X = 0, M4 is turned on [...]
The previous FeFET-in NVFF cannot exploit a ''controlled'' backup scheme because the FeFET is located in the logic path, and thus, it must turn on/off depending on the gate input data. The polarization state of the FE layer determines the on/off state of the FeFET, and thus, V GS should always be greater than V C . Therefore, V DD should be higher than V C and the data in the slave latch (Q) is stored in the FeFET every clock cycle. If V DD is lower than V C , the FeFET-in NVFF cannot work as a flip-flop. For example, if node X becomes 1, node Z has to be driven to 0. However, when the FeFET is in the HRS, current cannot flow through the FeFET, causing node Z to float.
The restore operation is based on a single-ended read scheme with a strong pull-down path composed of an appropriately sized FeFET and M1. The previous FeFET-in NVFF is compact with only three transistors; however, it has the following issues. The first issue is the clock-to-Q delay degradation for the 0 → 1 transition. This is because the 0 → 1 transition occurs after the polarization switching of the FE layer (from the HRS to the LRS). This worsens the normal mode speed and can limit the maximum clock frequency. The second problem is that the signal line is used as the power supply. Because the control signal, RSTR_b, is used as the supply line for INVAs in every NVFF, like in [22], driving the RSTR signal can consume more time and power in the restore mode. In addition, unexpected problems can occur in supplying sufficiently robust power using the control unit.
Table 1 summarizes the comparison of important metrics for the three recent FeFET-based NVFFs. Each of them has its demerits in terms of many additional transistors, the error in the restore operation, the power supply with control signals, and the degradation in the clock-to-Q delay. Because the speed of an NVFF is also crucial, the clock-to-Q delay overhead in the normal mode is recommended to stay in a satisfactory region (∼5%).

IV. PROPOSED FeFET-BASED NONVOLATILE FLIP-FLOPS
In this section, we propose two FeFET-based NVFFs having FeFET-out and FeFET-in structures employing the conventional TGFF as the base structure. The additional circuitries in both the structures have a negligible impact on the normal mode operation because they are located inside the feedback loop of the slave latch to minimize the load capacitance on the critical path of the clock-to-Q delay. The proposed FeFET-based NVFFs can perform the restore operation without any failure, do not use control signals as the power supply, and have small clock-to-Q delay overheads in the normal mode. Furthermore, the proposed FeFET-based NVFFs have only three or fewer additional transistors.
Fig. 7 shows the timing diagrams of the proposed FeFET-based NVFFs with two backup schemes: (a) the ''per-cycle'' and (b) the ''controlled'' backup schemes. In the normal mode, input D is captured to output Q at the rising edge of the clock (CLK) signal. The restore (RE) signal is activated to V DD only in the restore mode.
In the ''per-cycle'' backup scheme shown in Fig. 7(a), the polarization state of FeFET follows the Q value so that the data in the slave latch is stored in FeFET at every clock cycle in the normal mode. In this case, we set V DD = 0.7 V and V C = 0.1 V. Because V DD is higher than V C , sufficient V GS of FeFET can stably switch the polarization state of the FE layer. The final polarization state of FeFET is settled to the HRS according to the last Q value before going into the standby mode. In the restore mode, the Q value is restored to 1 according to the polarization state of FeFET.
In the ''controlled'' backup scheme shown in Fig. 7(b), the polarization state of FeFET does not switch in the normal mode. In this case, we set V DD = 0.3 V and V C = 0.5 V. Because V DD is lower than V C , V GS of FeFET is insufficient to switch the polarization state of the FE layer. The backup operation is performed before going into the standby mode by increasing V DD to the backup voltage, V BKP (0.9 V) over V C (0.5 V); therefore, the polarization state of FeFET switches according to the last Q value. The final polarization state of FeFET is settled to the HRS in this example. In the restore mode, the Q value is restored to 1 according to the polarization state of FeFET. The notable difference between ''per-cycle'' and ''controlled'' backup schemes is that the FeFET state switches at every clock cycle or at the separate backup mode, as shown in circled parts in Figs. 7(a) and (b).
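The relation between V DD , V C , and V BKP that separates the two backup schemes can be written down as a short check. This is a minimal sketch: the function names are hypothetical, and treating V DD equal to V C as ''controlled'' is an assumption, since the text only discusses the strictly higher and strictly lower cases.

```python
def backup_scheme(v_dd, v_c):
    """Classify the backup scheme from the normal-mode supply and coercive voltage.

    Assumption: the boundary case v_dd == v_c is classified as "controlled".
    """
    return "per-cycle" if v_dd > v_c else "controlled"


def backup_is_possible(v_dd, v_c, v_bkp=None):
    """Per-cycle backup needs no extra step; controlled backup needs V_BKP > V_C."""
    if backup_scheme(v_dd, v_c) == "per-cycle":
        return True
    return v_bkp is not None and v_bkp > v_c


# Operating points quoted for Fig. 7(a) and Fig. 7(b).
print(backup_scheme(0.7, 0.1))                  # per-cycle
print(backup_scheme(0.3, 0.5))                  # controlled
print(backup_is_possible(0.3, 0.5, v_bkp=0.9))  # True
```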

A. PROPOSED FeFET-OUT NVFF

1) NORMAL & BACKUP OPERATION
In the normal mode, the restore (RE) signal is set to 0 (RE = 0 and RE_b = 1); therefore, MN2 and MP2 are turned off. Consequently, node n3 is driven to the inverted value of node n2 by MP1 and MN1 such that the NVFF operates as a conventional TGFF. The operating voltage, V DD , can be chosen depending on the backup scheme.
In the ''per-cycle'' backup scheme with V DD (0.7 V) > V C (0.1 V), the polarization state of the FE layer has to switch or remain every clock cycle according to the data of the slave latch, Q, as shown in Fig. 7(a).
1) When Q = 0, n2 = 1, and n3 = 0, the positive gate-to-source voltage of the FeFET (V GS = V n2n3 = V DD ) provides sufficient voltage that exceeds V C (0.1 V). This switches the FE layer to the positive polarization state (LRS).
2) When Q = 1, n2 = 0, and n3 = 1, the negative gate-to-source voltage of the FeFET (V GS = V n2n3 = −V DD ) switches the FE layer to the negative polarization state (HRS).
The initial charge at node n4 does not affect the final resistance state of the FeFET. If Q changes from 1 to 0 (n2/n3 change from 0/1 to 1/0), the FeFET has to switch from the HRS to the LRS. At first, the polarization state of the FE layer close to node n3 partially switches to positive (polarity to channel) with V n2n3 = V DD , so that the electron channel is formed in the substrate of the FeFET. Thus, a current can flow through the FeFET, which makes node n4 become 0. Then, the FeFET fully switches to the LRS with V n2n3 = V n2n4 = V DD . If Q changes from 0 to 1 (n2/n3 change from 1/0 to 0/1), the FeFET has to switch from the LRS to the HRS. Node n4 is pulled up to 1 following node n3 since a current can flow through the FeFET in the LRS. Then, the FeFET switches to the HRS with V n2n3 = V n2n4 = −V DD . Therefore, the polarization state can be switched by a single gate-to-source voltage and a floating node on the other terminal, the same as in the previous FeFET-out NVFF-1 in Fig. 5(a).
In the ''controlled'' backup scheme with V DD (0.3 V) < V C (0.5 V), the state backup operation occurs in a separate backup mode because the gate-to-source voltage of the FeFET (|V GS | = V DD ) does not provide sufficient voltage to switch the polarization of the FE layer in the normal mode. The timing diagram in Fig. 7(b) shows an example of storing a high state Q. In the backup mode, V DD increases from 0.3 V to V BKP (0.9 V) over V C (0.5 V) for a certain period to safely switch or retain the polarization state of the FE layer.

2) RESTORE OPERATION
The number of fins for the nMOS and the pMOS are set to one and two, respectively, while MN2 and FeFET have four fins for the successful restore operation based on the single-ended read scheme. The RE signal is set to 1 (RE = 1 and RE_b = 0) to turn on MN2 and MP2. The CLK signal remains 0 to isolate the slave latch from the master latch by TG1 and form a feedback loop in the slave latch by TG2. The restore operation occurs when V DD ramps up from the GND.
1) If the FE layer is in the positive polarization state (FeFET = LRS), node n3 is connected to both V DD (by MP2) and GND (by the FeFET and MN2). At the beginning of the V DD rise, all nodes are at 0 V, so all the pMOSs are turned on. When V DD starts to ramp up, all nodes except node n3 start to be charged up by the pMOSs. The current flowing into node n3 is immediately pulled down to GND by the strong FeFET and MN2, so node n3 stays close to 0. Therefore, node n2 is charged up to V DD by an inverter and MN1 is turned on. Finally, MN1 also helps node n3 to stay close to 0.
2) If the FE layer is in the negative polarization state (FeFET = HRS), the FeFET detaches node n3 from the GND and MP2 charges node n3. When V DD recovery finishes, node n3 is charged to 1, resulting in n1 = 1, n2 = 0, and Q = 1.
Thus, the final state of the slave latch before power off is successfully restored depending on the polarization state of the FeFET.
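The backup and restore rules of the proposed FeFET-out NVFF can be checked together as a logic-level round trip. The sketch below abstracts away all analog behavior (sizing, ramp rates, node n4), uses hypothetical function names, and assumes the node relations stated in the text: n3 follows Q, n2 is its inverse, and V GS of the FeFET equals V n2 − V n3 .

```python
V_C = 0.5  # coercive voltage assumed for the "controlled" scheme (volts)


def backup(q, v_supply, prev_state):
    """Return the FeFET state after the slave latch holds Q at the given supply."""
    n2, n3 = 1 - q, q                  # n2 is the inverse of Q, n3 follows Q
    v_gs = (n2 - n3) * v_supply        # V_GS = V_n2 - V_n3 = +/- supply voltage
    if v_gs > V_C:
        return "LRS"                   # positive polarization stores Q = 0
    if v_gs < -V_C:
        return "HRS"                   # negative polarization stores Q = 1
    return prev_state                  # |V_GS| <= V_C: polarization is retained


def restore(state):
    """Single-ended read on V_DD recovery: LRS holds n3 low, HRS lets MP2 charge it."""
    n3 = 0 if state == "LRS" else 1
    n2 = 1 - n3
    return 1 - n2                      # Q is the inverse of n2


for q in (0, 1):
    stored = backup(q, v_supply=0.9, prev_state="LRS")   # V_BKP = 0.9 V
    print(q, stored, restore(stored))
```

At the normal-mode supply of 0.3 V the same `backup` call returns `prev_state` unchanged, which mirrors the statement that the FeFET does not switch during normal operation in the ''controlled'' scheme.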
The supply line has a relatively large capacitive load that requires time to fully recover the operating voltage. Thus, the actual restore time is almost the same as the V DD recovery time because the intrinsic restore time of the proposed FeFET-out NVFF is less than 100 ps.

B. PROPOSED FeFET-IN NVFF
The proposed FeFET-in NVFF with only two additional transistors (MN1 and FeFET) is shown in Fig. 9, in which the FeFET is placed in the logic path as a part of the pull-down path of the inverter feedback loop in the slave latch.

1) NORMAL & BACKUP OPERATION
In the normal mode, the operating voltage, V DD (0.7 V), should remain above V C (0.1 V) of the FeFET, same as in the previous FeFET-in NVFF, because the FeFET has to turn on/off (the polarization state has to switch) depending on V GS . The RE signal is set to 0; therefore, MN2 is turned off during the normal mode. The proposed FeFET-in NVFF exhibits a much faster clock-to-Q delay than the previous FeFET-in NVFF, which involves the polarization switching time in the clock-to-Q delay. The clock-to-Q delay of the proposed NVFF does not involve the polarization switching time because the data transition at node n1 directly passes through the two inverters to node Q, same as a conventional TGFF.
The backup operation occurs when CLK = 1 (TG2 is turned off).
1) When Q = 0 and n2 = 1, MP1 is turned off and MN1 is turned on. Node n4 is discharged to 0 by MN1, and V GS of the FeFET becomes V DD (V GS = V n2n4 = V DD ). Thus, the FE layer switches to the positive polarization state (FeFET = LRS), and node n3 becomes 0 by the pull-down path, the FeFET, and MN2. The initial charge at node n3 does not affect the polarization switching of the FeFET because MP1 and TG2 are turned-off.
2) When Q = 1 and n2 = 0, MP1 is turned on and MN1 is turned off. In this case, MP1 drives node n3 to 1 immediately without considering the FeFET state because the pull-down path is cut-off by MN1. Consequently, the value in node n2 is successfully inverted to node n3 (latching Q). Subsequently, the polarization of the FE layer is switched to the negative polarization state (FeFET = HRS) by the negative V GS (V GS = V n2n3 = −V DD ). The initial charge at node n4 does not affect the polarization switching of the FeFET because MN1 and MN2 are turned-off.

2) RESTORE OPERATION
The restore operation is similar to that of the proposed FeFET-out NVFF. V DD ramps up from 0 V, the RE signal is asserted to 1, and the CLK signal remains 0 to isolate the slave latch from the master latch by TG1 and form a feedback loop in the slave latch by TG2. Thus, MN1 is turned on and tightly clamps node n4 to GND. At the beginning of the V DD rise, all the pMOSs are turned on because all the nodes are initially 0. As V DD recovers, the final voltage of node n3 is determined by the polarization state of the FeFET.
1) If the FE layer is in the positive polarization state (FeFET = LRS), both MP1 and the FeFET drive node n3. Because the pull-down path, which consists of the FeFET and MN2, is designed to be stronger than the pull-up path (MP1), node n3 is pulled down close to GND. At the beginning of the restore operation, the FeFET and MN2 tightly clamp node n3 to GND; therefore, the current from MP1 flows directly to GND, and node n3 stays at 0 V. As V DD ramps up, node n2 is charged close to V DD by MP2, and thus, MP1 is turned off. Therefore, node n3 safely settles at 0 V. The voltage of node n3 passes through a transmission gate (TG2), and node n1 also becomes close to GND. Thus, node n1 pulls node n2 up to V DD through an inverter, and Q becomes 0.
2) If the FE layer is in the negative polarization state (FeFET = HRS), only MP1 drives node n3 to V DD followed by node n1 through the transmission gate. Thus, node n1 pulls down node n2 to 0 and Q becomes 1.
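The two restore cases can likewise be traced node by node. The sketch below is a logic-level illustration under the assumptions stated in the text (RE = 1, CLK = 0, pull-down path stronger than MP1). The function name `restore_state` is hypothetical.

```python
def restore_state(fefet: str) -> dict:
    """Logic-level outcome of the restore operation (RE = 1, CLK = 0).

    Hypothetical helper: it encodes only the final node values described
    in the text, not the V_DD ramp or any analog contention.
    """
    if fefet == "LRS":
        n3 = 0   # FeFET + MN2 pull-down overpowers the MP1 pull-up
    elif fefet == "HRS":
        n3 = 1   # only MP1 drives n3, so it follows V_DD
    else:
        raise ValueError("fefet must be 'LRS' or 'HRS'")
    n1 = n3      # TG2 passes n3 to n1
    n2 = 1 - n1  # n1 drives n2 through an inverter
    q = 1 - n2   # Q is the complement of n2
    return {"n3": n3, "n1": n1, "n2": n2, "Q": q}

for state in ("LRS", "HRS"):
    print(state, restore_state(state))
```

Read together with the backup description, an LRS state (written when Q = 0) restores Q = 0, and an HRS state (written when Q = 1) restores Q = 1.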

V. ANALYSIS AND COMPARISON
In this section, we present the evaluation of the performance of the proposed FeFET-based NVFFs in terms of the layout area, clock-to-Q delay, and energy consumption.
We run 1K Monte Carlo HSPICE simulations to consider variations in A VTH , 1.07 mV/µm and 1.23 mV/µm for the nMOS and the pMOS, respectively [40]. The simulations are performed at the cold (−40 °C), room (25 °C), and hot (120 °C) temperature corners to consider the effect of temperature variations on the clock-to-Q delays. The minimum gate length of the transistors is 18 nm [35], and 0.7 V and 0.3 V are used for V DD according to the backup schemes. The rise and fall times of the CLK and other signals are taken to be 10 ps each, and 100 ps for V DD owing to its relatively large capacitive load. A 2-fF capacitor is added at the output node to mimic the output capacitance [25], [26]. Because results obtained under different simulation conditions cannot be compared fairly, we analyze and compare the performance metrics, such as the area, delay, and energy, under the same simulation setup and using the FeFET device model described in Section II. The performance of the previous FeFET-out NVFF-2 is not measured because it cannot operate properly, as discussed in Section III.

A. LAYOUT AREA
Flip-flops are the most basic sequential elements, and hundreds of thousands of them are used throughout processor circuits. In this regard, the area of the NVFF is an important metric. The size of the circuit is generally proportional to the number of additional transistors; however, the comparison of the actual layouts is more important. Fig. 10 shows the layouts of the proposed FeFET-based NVFFs following the λ-based rule [41], as in [22] and [26]. The additional circuitries of the FeFET-out and FeFET-in NVFFs lead to 19% and 15% area overheads relative to the conventional TGFF. The area overhead is significantly reduced compared to those of the recent FeFET-based NVFFs in [25], [26], which show 40% and 23.7% area overheads, as displayed in Fig. 11. The proposed FeFET-out and FeFET-in NVFFs reduce the layout areas by 17.6% and 8.7% compared to the previous FeFET-out and FeFET-in NVFFs, respectively.

B. CLOCK-TO-Q DELAY
Clock-to-Q delay is an important figure-of-merit because it adversely affects the sequencing overhead in the entire computing process. For example, an increase in the clock-to-Q delay increases the CLK period, resulting in performance degradation. Fig. 12 shows the clock-to-Q delays of the previous and proposed FeFET-based NVFFs at different temperature conditions. The clock-to-Q delays of the NVFFs are inevitably longer than that of the conventional TGFF because of the additional circuitry for the nonvolatile behavior. The performance comparison of the FeFET-based NVFFs is based on the simulation results at the room temperature corner (25 °C).
Previous FeFET-out NVFF-1 has FeFETs out of the logic path; therefore, the extra load capacitances of the additional circuitry degrade the performance. It has two gate (T3, T4) and two diffusion (T3, T4) capacitances in the critical path at nodes Q and QN. The structure employs the minimum transistor sizes for the two FeFETs, T3 and T4, owing to its differential structure, resulting in slight clock-to-Q delay degradation, a 59.73-ps clock-to-Q delay (+4.3% compared to that of the TGFF).
The previous FeFET-in NVFF has the worst clock-to-Q delay of 71.14 ps (+24.3% compared to the TGFF) owing to the polarization switching time of the FE layer. We measured the delay with a calibrated FeFET. Moreover, we added the ideal case of the polarization switching time (∼10 ps) for an FeFET with T FE = 8 nm and the kinetic coefficient of the LK equation, ρ = 0.25 Ω·cm, which determines the polarization switching speed [26]. Because the polarization switching time directly increases the clock-to-Q delay, the performance varies with the physical characteristics of the FeFET. The previous FeFET-in NVFF has the worst clock-to-Q delay among the FeFET-based NVFFs, even under the ideal condition (the shortest polarization switching time). Moreover, the dependence of the clock-to-Q delay on the polarization switching time of the FeFET is a significant drawback because variations in the physical characteristics of FeFETs can worsen the performance of the NVFF. In contrast, the clock-to-Q delay of the proposed FeFET-in NVFF does not include the polarization switching time; therefore, it has a better clock-to-Q delay than the previous FeFET-in NVFF under all the conditions.
The proposed FeFET-out NVFF has a clock-to-Q delay of 60.21 ps (+5.2% compared to the TGFF) with only four gate capacitances (FeFET) in the critical path. It has a similar delay compared to previous FeFET-out NVFF-1, with only 0.8% degradation in the clock-to-Q delay. Moreover, the proposed structure can perform the restore operation without any problems, unlike previous FeFET-out NVFF-2, while maintaining three additional transistors.
The proposed FeFET-in NVFF achieves a clock-to-Q delay of 59.82 ps (+4.5% compared to the TGFF), which is reduced by 18.9% versus the previous FeFET-in NVFF, and it also has four gate capacitances (FeFET) in the critical path. Unlike the previous structure, the polarization switching time does not affect the clock-to-Q delay because it is integrated with the inverter in the feedback loop of the slave latch. Furthermore, the proposed FeFET-in NVFF exhibits a better clock-to-Q delay than the previous FeFET-in NVFF, regardless of the polarization switching time.
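The delay figures quoted in this subsection can be cross-checked with simple arithmetic. The sketch below is a hedged sanity check: the TGFF delay is not stated in this section, so the script back-computes the value implied by each (delay, overhead) pair. The names `delays_ps` and `implied_tgff_delay` are hypothetical.

```python
# (clock-to-Q delay in ps, quoted overhead vs. the TGFF in %), from the text
delays_ps = {
    "previous FeFET-out NVFF-1": (59.73, 4.3),
    "previous FeFET-in NVFF": (71.14, 24.3),
    "proposed FeFET-out NVFF": (60.21, 5.2),
    "proposed FeFET-in NVFF": (59.82, 4.5),
}

def implied_tgff_delay(delay_ps: float, overhead_pct: float) -> float:
    """TGFF delay implied by a delay and its quoted percentage overhead."""
    return delay_ps / (1.0 + overhead_pct / 100.0)

implied = {name: implied_tgff_delay(d, p) for name, (d, p) in delays_ps.items()}
for name, value in implied.items():
    print(f"{name}: implied TGFF delay = {value:.2f} ps")

# Degradation of the proposed FeFET-out NVFF relative to previous NVFF-1
out_degradation_pct = 100.0 * (60.21 - 59.73) / 59.73
print(f"FeFET-out degradation: {out_degradation_pct:.1f}%")
```

All four pairs imply a TGFF delay between 57.2 ps and 57.3 ps, so the quoted overheads are mutually consistent to within rounding.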
The proposed FeFET-in NVFF integrates an FeFET in the feedback loop inverter of the slave latch. Thus, the polarization of the FE layer should be switched depending on the Q value during CLK = 1, particularly when Q = 0 (n1 = 0 and n2 = 1), as discussed in Section IV-B. If the CLK becomes 0 before the polarization state becomes the LRS, node n3 is not driven to GND strongly because the polarization state of the FE layer has not settled in the LRS. Thus, Q = 0 is not safely latched in the slave latch. However, this is not problematic because the polarization switching time of ∼10 ps is shorter than the CLK period. Because the proposed FeFET-in NVFF has sufficient time to wait for the polarization switching, it is much more tolerant than the previous FeFET-in NVFF of a longer polarization switching time caused by variations in the FeFET physical characteristics.
The restore time does not have a significant impact on the performance because the restore operation is completed within the V DD rise time. In a practical case, the wakeup time of the state of charge is generally on the order of several microseconds [42], [43], and V DD takes a long time to rise owing to the very high capacitance. The measured worst intrinsic restore times of the previous and proposed NVFFs are much shorter than 100 ps, and thus, the data are restored in a single phase with V DD rising for all the NVFFs.
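The timing condition described above can be written as a one-line check. The following sketch is an illustration only: the text compares the switching time with the CLK period, whereas this helper uses the stricter assumption that switching must finish within the CLK = 1 phase. The function name `backup_window_ok`, the 50% duty cycle, and the example clock periods are hypothetical.

```python
def backup_window_ok(t_switch_ps: float, clk_period_ps: float,
                     duty_high: float = 0.5) -> bool:
    """Return True if polarization switching fits within the CLK = 1 phase.

    Assumption of this sketch: the FE layer must settle before CLK falls,
    so the available window is clk_period_ps * duty_high.
    """
    if not 0.0 < duty_high < 1.0:
        raise ValueError("duty_high must be between 0 and 1")
    return t_switch_ps < clk_period_ps * duty_high

# ~10 ps switching time from the text; clock periods are hypothetical examples
print(backup_window_ok(10.0, 1000.0))  # 1-ns clock period
print(backup_window_ok(10.0, 15.0))    # unrealistically short period
```

Under this assumption, a slower FeFET only shrinks the timing margin of the backup and does not add to the clock-to-Q delay.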

C. COMPARATIVE ANALYSIS
Previous FeFET-out NVFF-1, the previous FeFET-in NVFF, the proposed FeFET-out NVFF, and the proposed FeFET-in NVFF consume 2.23 fJ, 2.43 fJ, 2.37 fJ, and 2.36 fJ in the normal mode and 2.35 fJ, 1.93 fJ, 2.5 fJ, and 1.9 fJ in the restore mode, respectively. The proposed FeFET-out NVFF consumes a slightly higher operating energy (6.3%) than previous FeFET-out NVFF-1, and the proposed FeFET-in NVFF consumes 3% less operating energy than the previous FeFET-in NVFF. The proposed FeFET-out NVFF consumes slightly more energy in the restore mode (6.4%) than previous FeFET-out NVFF-1 because a static current flows through MP2, the FeFET, and MN2. However, the restore operation occurs only once during the wakeup process; therefore, the restore energy, which is on the order of fJ, is a negligible portion of the overall energy consumption.
Table 2 summarizes the comparative analysis of the previous NVFFs with promising NVMs and the proposed FeFET-based NVFFs. In comparison to the recent FeFET-based NVFFs, the proposed ones achieve similar or better clock-to-Q delay and operating energy while preserving and even reducing the number of additional transistors. In addition, the issues in the previous studies, such as the usage of control signals as supply lines and the restore failure, do not occur in the proposed NVFFs. To summarize our contributions, the proposed FeFET-based NVFFs are superior to the previous ones for the following reasons:
• The proposed FeFET-based NVFFs have the smallest layout areas with only three and two additional transistors compared to the conventional TGFF while achieving comparable or better delay and energy than the previous FeFET-based NVFFs.
• The proposed FeFET-based NVFFs can support abrupt power outages of energy-harvesting devices by a ''per-cycle'' backup scheme. Particularly, the proposed FeFET-out NVFF can utilize both ''per-cycle'' and ''controlled'' backup schemes and achieve low-power operation with lowered V DD .
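The energy percentages quoted in this subsection follow directly from the listed energies. The sketch below reproduces them as a hedged cross-check. The dictionary keys and the helper `pct_change` are hypothetical names, and the restore-mode comparison is taken against previous FeFET-out NVFF-1, which is the pairing that yields the quoted 6.4%.

```python
# Energies in fJ, copied from the text
normal_fj = {"prev_out_1": 2.23, "prev_in": 2.43, "prop_out": 2.37, "prop_in": 2.36}
restore_fj = {"prev_out_1": 2.35, "prev_in": 1.93, "prop_out": 2.5, "prop_in": 1.9}

def pct_change(new: float, ref: float) -> float:
    """Signed percentage change of `new` relative to `ref`."""
    return 100.0 * (new - ref) / ref

out_normal = pct_change(normal_fj["prop_out"], normal_fj["prev_out_1"])
in_normal = pct_change(normal_fj["prop_in"], normal_fj["prev_in"])
out_restore = pct_change(restore_fj["prop_out"], restore_fj["prev_out_1"])

print(f"FeFET-out, normal mode:  {out_normal:+.1f}%")
print(f"FeFET-in,  normal mode:  {in_normal:+.1f}%")
print(f"FeFET-out, restore mode: {out_restore:+.1f}%")
```

A positive value means the proposed design consumes more energy than its reference, and a negative value means it consumes less.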

VI. CONCLUSION
In this paper, we proposed two FeFET-based NVFFs, FeFET-out and FeFET-in NVFFs. The proposed FeFET-out NVFF reduces the large area overhead of previous FeFET-out NVFF-1 considerably and also eliminates the malfunction occurring in the restore operation of previous FeFET-out NVFF-2. The proposed FeFET-in NVFF exhibits the minimum area overhead reported thus far, with only two additional transistors and even improves the clock-to-Q delay and the operating energy. Both the proposed FeFET-based NVFFs achieve comparable or better delay and energy behavior compared to the previous ones. The proposed high-performance, low-area-overhead, and low-power NVFFs can be utilized for IoT applications and energy-harvesting devices with FeFET-circuit integration.

ACKNOWLEDGMENT
The EDA Tool was supported by the IC Design Education Center.