A Low-Spur Fractional-N PLL Based on a Time-Mode Arithmetic Unit

This article introduces a low-jitter, low-spur fractional-N phase-locked loop (PLL) that adopts a new concept, a time-mode arithmetic unit (TAU), for phase error extraction. The TAU is a time-signal processor that calculates the weighted sum of input time offsets. It processes two inputs, the period of a digitally controlled oscillator (DCO) and the instantaneous time offset between the DCO and reference clock edges, and extracts the DCO phase error by calculating their weighted sum. The prototype, implemented in 40-nm CMOS, achieves 182-fs rms jitter with 3.5-mW power consumption. In a near-integer channel, the worst-case fractional spur is below −59 dBc. Under considerable supply or temperature variations, the worst-case spur remains below −51.7 dBc without any background calibration.

near-zero input so that its gain can be enlarged to suppress the noise contribution of subsequent loop blocks.
In applying the narrow-range phase-detection concept to fractional-N PLLs, digital-to-time converters (DTCs) have been widely explored [8], [9], [10], [11], [12], [13]. Fig. 1(a) illustrates a conceptual PLL (similar to that in [14]) that relies on a DTC to cancel the instantaneous time offset from the significant (here, falling) edge of the reference clock (FREF) to that of the variable oscillator clock (CKV). This time offset spans from zero to the CKV period $T_{\mathrm{CKV}}$ (i.e., the "time base") and is predicted by $\varphi_{R,\mathrm{frac}} \in [0, 1)$, the fractional part of the accumulated frequency control word (FCW) [15]. Based on this prediction, the DTC launches a delayed FREF, FREF_dly, which is substantially aligned with the relevant CKV edge, thereby narrowing the input range of the PD.
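As a minimal sketch of the prediction described above, the snippet below accumulates a hypothetical FCW of 10.25 once per reference cycle and derives $\varphi_{R,\mathrm{frac}}$ together with the nominal DTC delay $(1-\varphi_{R,\mathrm{frac}})\cdot T_{\mathrm{CKV}}$. The function name, the zero initial phase, and the normalization $T_{\mathrm{CKV}} = 1$ are assumptions for illustration, not details of any specific design.

```python
from fractions import Fraction
import math


def dtc_targets(fcw, t_ckv, n_cycles):
    """Per-cycle phi_R,frac and the nominal DTC delay (1 - phi_R,frac) * T_CKV.

    Sketch assumption: the reference phase accumulates FCW once per FREF
    cycle, starting from zero; Fraction keeps the accumulator exact.
    """
    acc = Fraction(0)
    targets = []
    for _ in range(n_cycles):
        acc += fcw
        phi = acc - math.floor(acc)          # fractional part, in [0, 1)
        targets.append((phi, (1 - phi) * t_ckv))
    return targets


# Hypothetical FCW = 10.25 (= 41/4), T_CKV normalized to 1
for phi, delay in dtc_targets(Fraction(41, 4), Fraction(1), 4):
    print(phi, delay)
# 1/4 3/4
# 1/2 1/2
# 3/4 1/4
# 0 1
```

The delay sequence repeats every four reference cycles because the fractional part of this FCW is 1/4.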
This DTC-based solution is highly effective in improving PN. However, it can introduce high fractional spurs at the PLL output because the DTC delay can easily depart from its nominal expected value of $(1-\varphi_{R,\mathrm{frac}})\cdot T_{\mathrm{CKV}}$. Such a mismatch stems from the underlying principle of DTCs: they delay input edges based on the circuit's nominal intrinsic latency, e.g., the propagation delay of the elements in a delay-chain-based DTC [16]. This is markedly different from conventional digital-to-analog converters (DACs), which generate signals by scaling a stable and accurate base, e.g., a bandgap reference voltage. Given the sensitivity of the circuit's intrinsic latency to process, voltage, and temperature (PVT) variations [17], [18], [19], extra effort is required to track and protect the DTC's transfer function (i.e., from $\varphi_{R,\mathrm{frac}}$ to DTC delay) so as to prevent the associated PLL spurs from arising. Tasca et al. [8], Liu et al. [20], and Un et al. [21] track the drift of the DTC transfer gain in the background. Wu et al. [7] and Markulic et al. [22] protect the DTC delay from supply variations with dedicated low-dropout regulators (LDOs), so as to alleviate any memory effects in the DTC's transfer function. Markulic et al. [22] further use a complementary dummy DTC to reduce the time-varying supply perturbations caused by the main DTC. These countermeasures, however, have only limited capability in suppressing DTC-related spurs. When an extremely low spurious level is desired, the DTC codes might need to be modulated to smear the spurs into the noise floor [22], [23], [24]. These extra efforts complicate the design of the overall PLL system and degrade its power efficiency.
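To illustrate why a DTC gain error appears as spurs rather than as random noise, the sketch below assumes (hypothetically) that the DTC delivers $(1+\epsilon)$ times its nominal delay. The residue left at the PD is then $\epsilon\cdot(1-\varphi_{R,\mathrm{frac}})\cdot T_{\mathrm{CKV}}$, a deterministic sawtooth that repeats with the period of the fractional accumulator. With an assumed FCW fractional part of 1/16, its DFT energy lands only on bins that are multiples of $f_{\mathrm{REF}}/16$.

```python
import cmath
import math
from fractions import Fraction


def dtc_residual(fcw_frac, gain_error, n):
    """Timing residue (in units of T_CKV) left by a DTC with a gain error.

    Assumed model: ideal delay (1 - phi) * T_CKV; the real DTC delivers
    (1 + gain_error) times that, so the PD sees gain_error * (1 - phi).
    """
    return [gain_error * float(1 - (k * fcw_frac) % 1) for k in range(1, n + 1)]


def dft_bin(x, m):
    """Magnitude of the m-th DFT bin of x (plain O(N) sum, no window)."""
    n = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * m * k / n) for k, v in enumerate(x)))


# Hypothetical case: fractional FCW part 1/16, 2% DTC gain error, 64 cycles
res = dtc_residual(Fraction(1, 16), 0.02, 64)
```

Bin 4 of this 64-point record corresponds to $f_{\mathrm{REF}}/16$; bins such as 3 or 5 are empty (up to rounding), which is the signature of a periodic, spur-generating error.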
Instead of relying on the circuit propagation delay, Wu et al. [17], Chen et al. [19], and Liao and Dai [25] cancel the instantaneous fractional-N time offset in the voltage domain. A conceptual example emulating [17] is presented in Fig. 1(b). The time offset between FREF and its subsequent CKV edge, $t_S$, is converted into a voltage $V_S$ along a charging curve. The PLL cancels $V_S$ with its prediction $V_P$ to extract the phase error information in the voltage domain ($V_e$). Accurate error extraction here requires a charging curve of constant slope, since the voltage prediction assumes a linear time-to-voltage conversion. This dependence is also problematic because the slope is generated by (dis)charging a capacitor through a current source, which raises two issues: 1) it requires a stable current reference, which is costly; and 2) it invokes a tradeoff among noise, linearity, and power. The linearity of the (dis)charging slope is degraded by the finite output impedance of the current source (i.e., an MOS transistor). Therefore, circuit-level techniques, such as cascoding [12], [26], seem mandatory. Nevertheless, they consume significant voltage headroom, thereby degrading the noise. To combat the noise, the associated node capacitance and current must be enlarged. However, a larger current implies not only higher power but also a wider current-source transistor, which lowers the impedance and possibly launches a new round of tradeoffs.
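The linearity argument can be quantified with a simple model, assumed here for illustration: an ideal current $I$ in parallel with a finite output resistance $R_o$ charges $C$, so $V(t) = I R_o\,(1-e^{-t/(R_o C)})$. The sketch measures the peak deviation from the straight line through the ramp's endpoints, i.e., the error that remains after an ideal gain calibration. For $R_o C \gg t_{\mathrm{full}}$, this peak is approximately $t_{\mathrm{full}}/(8R_oC)$ of full scale, so halving the output resistance roughly doubles the nonlinearity. The component values used below are hypothetical.

```python
import math


def charging_inl(i_src, c, r_out, t_full, n=1000):
    """Peak nonlinearity of a current-source charging ramp, as a fraction of full scale.

    Assumed model: ideal current i_src with finite output resistance r_out
    in parallel with c, so V(t) = i_src * r_out * (1 - exp(-t / (r_out * c))).
    The reference is the straight line through the ramp's endpoints.
    """
    tau = r_out * c

    def v(t):
        return i_src * r_out * (1 - math.exp(-t / tau))

    v_fs = v(t_full)
    worst = 0.0
    for k in range(n + 1):
        t = t_full * k / n
        worst = max(worst, abs(v(t) - v_fs * t / t_full))
    return worst / v_fs


# Hypothetical: 100 uA, 1 pF, 1 ns full-scale ramp, two output resistances
print(charging_inl(100e-6, 1e-12, 10e3, 1e-9))    # about 1.25% of full scale
print(charging_inl(100e-6, 1e-12, 100e3, 1e-9))   # about 0.125% of full scale
```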
The dilemmas of these two methods stem from their dependence on PVT-sensitive physical parameters, i.e., the intrinsic circuit latency of the DTC and the (dis)charging slope in the voltage-domain cancellation. Mathematically, the "golden" base for fractional-N time-offset cancellation is $T_{\mathrm{CKV}}$, since the time offset is predicted by $(1-\varphi_{R,\mathrm{frac}})\cdot T_{\mathrm{CKV}}$. In terms of implementation, $T_{\mathrm{CKV}}$ is also accurate and stable, since it is intrinsically tracked by the PLL. Therefore, we propose a new time-offset cancellation method that adopts $T_{\mathrm{CKV}}$ as its base, which can be considered analogous to the aforementioned reference voltage in a DAC.¹ The proposed method employs a time-mode arithmetic unit (TAU) processor [29] that takes timestamp offsets as inputs and outputs their weighted sum, also in the time domain. Within each PLL cycle, the TAU takes both the timestamps defining $T_{\mathrm{CKV}}$ and those defining $t_S$, i.e., the offset between the oscillator and reference clock edges. Then, the weighted sum of their offsets is calculated to extract the desired information (i.e., the time error $t_E$ input to the PD). With such a "golden" time base, the TAU-based method can exhibit high linearity and built-in resilience to supply and temperature variations. This simplifies the overall PLL system design and helps to suppress the generated spurs. As an extra bonus, the TAU can advantageously amplify the desired time residue, thereby suppressing the noise contributions of subsequent loop blocks.

Fig. 2 shows a conceptual diagram of the proposed fractional-N PLL. For the DCO to track the reference phase, the proposed TAU extracts the time error $t_E$ between the FREF and CKV timestamps. This $t_E$ is quantized by the time-to-digital converter (TDC) and fed to the digital loop filter for DCO phase error correction.

¹ It is interesting to note that Narayanan et al. [27] phase-interpolate new edges from a quadrature RF clock source to substantially cancel the time offset, just as a DTC does. That method can also be regarded as utilizing a "golden" time base of $T_{\mathrm{CKV}}/4$. However, cascading many stages of phase interpolators to achieve fine resolution can be quite bulky and power-hungry. A similar approach in [28] uses eight internal phases from an RF oscillator followed by a ÷4 divider to reduce the time offset by a factor of 8. The quantization is coarser (i.e., 3 bits, yielding a $T_{\mathrm{CKV}}/8$ "golden" time base), but the analog interpolators are avoided.

A. Conceptual Architecture
Generally, $t_E$ "hides" within $t_S$, the instantaneous "raw" time offset between FREF and the first subsequent CKV falling edge, whose theoretical prediction is $(1-\varphi_{R,\mathrm{frac}})\cdot T_{\mathrm{CKV}}$. Therefore, extracting $t_E$ requires canceling $t_S$ with its prediction. In the proposed system, the TAU samples $t_S$ and $T_{\mathrm{CKV}}$ and then calculates their weighted sum to extract $t_E$. To further suppress the TDC quantization noise, the TAU also time-amplifies the extracted error by $G_{\mathrm{TA}}$ before feeding it to the TDC. Thus, the TAU's output can be described as

$$t_{\mathrm{out}} = G_{\mathrm{TA}}\left[(1-\varphi_{R,\mathrm{frac}})\cdot T_{\mathrm{CKV}} - t_S\right]. \tag{1}$$

More abstractly, if $T_{\mathrm{CKV}}$ and $t_S$ are viewed as general inputs, and $G_{\mathrm{TA}}$ and $\varphi_{R,\mathrm{frac}}$ are treated as setting their weights, the TAU's function can be generalized as producing the weighted sum of its inputs

$$t_{\mathrm{out}} = \sum_{i=1}^{n} w_i \cdot t_i \tag{2}$$

where $t_i$ is the $i$th input time offset, $w_i$ is the weight applied to $t_i$, $n$ is the total number of inputs, and $t_{\mathrm{out}}$ is the output time offset. Note that $t_i$ and $t_{\mathrm{out}}$ are generally defined as time offsets between arbitrary edges. To realize this conceptual PLL system, we first realize this generalized TAU and then program it to calculate the result required by (1).
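Equations (1) and (2) describe the same operation at two levels of abstraction. The sketch below (function names are hypothetical) implements (2) directly and then casts (1) as a two-input instance, with weight $G_{\mathrm{TA}}(1-\varphi_{R,\mathrm{frac}})$ on $T_{\mathrm{CKV}}$ and $-G_{\mathrm{TA}}$ on $t_S$. The numbers in the usage note ($T_{\mathrm{CKV}} = 200$ ps, $G_{\mathrm{TA}} = 8$) are illustrative assumptions only.

```python
def tau_weighted_sum(times, weights):
    """Generalized TAU of (2): t_out = sum_i w_i * t_i (all times in the same unit)."""
    if len(times) != len(weights):
        raise ValueError("need exactly one weight per input")
    return sum(w * t for w, t in zip(weights, times))


def extract_error(t_ckv, t_s, phi_frac, g_ta):
    """(1) as a two-input instance of (2): inputs (T_CKV, t_S),
    weights (G_TA * (1 - phi_frac), -G_TA)."""
    return tau_weighted_sum([t_ckv, t_s], [g_ta * (1 - phi_frac), -g_ta])
```

For example, with $\varphi_{R,\mathrm{frac}} = 0.25$ the prediction is 150 ps; a sampled $t_S$ of 150.5 ps leaves a residue of $-0.5$ ps, which `extract_error(200.0, 150.5, 0.25, 8)` amplifies to $-4.0$ ps.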

B. Evolution From Time Register to TAU
The starting point for implementing the TAU is a time register (TR), which takes pulsewidths as inputs, holds them, and then outputs their sum in a complementary form [30]. Fig. 3(a) illustrates how these functions are achieved with a simplified RC model of the TR [31]. Before a new execution cycle, capacitor $C$ is charged to an initial voltage $V_{\mathrm{init}}$ by closing the charging switch SWC. After SWC is opened, the TR processes the active-low pulses on the discharge switch SWD by storing their pulsewidths as voltage drops on capacitor $C$. For example, during the first pulse, switch SWD is closed to discharge capacitor $C$ through resistor $R$. After $t_1$, the duration of the first pulse, the voltage on the capacitor drops to $V_1 = V_{\mathrm{init}}\,e^{-t_1/RC}$. Similarly, after a second pulse of width $t_2$, it becomes $V_2 = V_1\,e^{-t_2/RC} = V_{\mathrm{init}}\,e^{-(t_1+t_2)/RC}$. The new input time $t_2$ is thus internally summed with the prestored $t_1$ and recorded as $V_{\mathrm{init}} - V_2$. The TR can continue to process more inputs as long as $V_C$ is higher than $V_{\mathrm{th}}$, the threshold voltage of the level-crossing comparator (slicer). Assuming that the TR has processed $n$ pulses in total, the final $V_C$ becomes

$$V_C = V_{\mathrm{init}}\cdot\exp\left(-\frac{\sum_{i=1}^{n} t_i}{RC}\right) \tag{3}$$

where $t_i$ is the width of the $i$th pulse. To read out the recorded time, SWD is pulled down to discharge $V_C$ below $V_{\mathrm{th}}$, thereby asserting the comparator output CMP. The delay between the last SWD falling edge and the CMP falling edge reflects the processed result, which is an offset (the time needed to discharge $V_C$ continuously from $V_{\mathrm{init}}$ to $V_{\mathrm{th}}$) minus the sum of all time inputs

$$t_{\mathrm{out}} = RC\cdot\ln\frac{V_{\mathrm{init}}}{V_{\mathrm{th}}} - \sum_{i=1}^{n} t_i. \tag{4}$$

A quick comparison between (4) and (2) reveals a crucial limitation of the TR: the weight of each $t_i$ is fixed ($-1$ in (4)) instead of being an arbitrary $w_i$. The weighted TR (WTR) shown in Fig. 3(b) overcomes this limitation by replacing the fixed resistor $R$ and capacitor $C$ with variable ones, $R_V$ and $C_V$. With this change, the WTR acquires a new degree of freedom, the variable RC time constant $\tau = R_V\cdot C_V$, which sets each pulse's discharge speed and hence the resulting voltage drop on $V_C$.
Accordingly, the WTR's final output becomes

$$t_{\mathrm{out}} = \tau_{\mathrm{out}}\cdot\ln\frac{V_{\mathrm{init}}}{V_{\mathrm{th}}} - \sum_{i=1}^{n}\frac{\tau_{\mathrm{out}}}{\tau_i}\cdot t_i \tag{5}$$

where $\tau_i$ is the RC time constant for $t_i$, and $\tau_{\mathrm{out}}$ is the RC time constant for the final output discharge. Here, an arbitrary weight, $w_i = \tau_{\mathrm{out}}/\tau_i$, is effectively applied to $t_i$. Although the WTR achieves the weighted sum $\sum_{i=1}^{n}(\tau_{\mathrm{out}}/\tau_i)\cdot t_i$, the offset term $\tau_{\mathrm{out}}\cdot\ln(V_{\mathrm{init}}/V_{\mathrm{th}})$ in its output raises undesired issues. This term makes the WTR sensitive to voltages, i.e., $V_{\mathrm{init}}$ and $V_{\mathrm{th}}$, and to physical parameters, e.g., $\tau_{\mathrm{out}}$, which can ultimately lead to severe PVT susceptibility. This term is advantageously canceled in the differential WTR (DWTR) configuration shown in Fig. 3(c). There, two identical WTRs operate in parallel and share common resistive and capacitive tuning terminals, RT and CT. Hence, the same RC time constant $\tau_i$ is applied to their $i$th input pair (i.e., $t_{i,P}$ and $t_{i,N}$). Non-shared pins of the two WTRs are distinguished by the subscripts P and N. The outputs of the two individual WTRs follow the same rule as (5). When these outputs are combined differentially, the PVT-sensitive offset terms cancel each other:

$$t_{\mathrm{out},N} - t_{\mathrm{out},P} = \sum_{i=1}^{n}\frac{\tau_{\mathrm{out}}}{\tau_i}\cdot\left(t_{i,P} - t_{i,N}\right). \tag{6}$$

Nevertheless, the differential inputs and output required by the DWTR are inconvenient to use: they are pulsewidth differences ($t_{i,P} - t_{i,N}$ and $t_{\mathrm{out},N} - t_{\mathrm{out},P}$) rather than the time differences defined in (2). Therefore, their form is redefined. For the output, we simply impose the constraint that the last falling edges on SWD_P and SWD_N must be launched simultaneously. Then, the differential output $t_{\mathrm{out}}$ is reinterpreted as the time offset between CMP_P and CMP_N, which equals $t_{\mathrm{out},N} - t_{\mathrm{out},P}$ [see Fig. 3(c)]. For the input form conversion, the proposed TAU employs a phase/frequency detector (PFD). As shown in Fig. 3(d), the PFD bridges the gap between the overall TAU input, i.e., the time difference between the TIN_P and TIN_N falling edges, and the DWTR input, i.e., the width difference of the pulse-pair on SWD_P and SWD_N.
To do so, the PFD first pulls down SWD_P and SWD_N at the TIN_P and TIN_N falling edges, respectively. Once both SWDs are low, the PFD resets itself and pulls them up simultaneously. In this way, the PFD converts the input time difference into a pulsewidth difference. However, during TAU output processing, the SWDs must stay LOW to keep discharging the WTRs until the falling edges of both CMPs are asserted. At this point, the PFD must not return the SWDs to HIGH because this would disrupt the output process. Therefore, when READ = 0 triggers the final output, it also blocks the PFD's reset (the second mode of the PFD) and, thus, the SWD recovery.
The output of the proposed TAU is

$$t_{\mathrm{out}} = \sum_{i=1}^{n}\frac{\tau_{\mathrm{out}}}{\tau_i}\cdot t_i \tag{7}$$

where $t_i$ is the input time difference between the $i$th pair of TIN_P/N falling edges, and $t_{\mathrm{out}}$ is the output time offset between CMP_P/N. Note that $t_i$ here can be either positive or negative depending on which TIN edge leads. The TAU calculates the weighted sum of all inputs, whose weights can be manipulated by tuning the corresponding RC time constants ($\tau_{\mathrm{out}}$ and the $\tau_i$'s). Therefore, the TAU definition in Section II-A is satisfied. However, one may still question the equivalence between (7) and (2), since the weights are positive-only in the former ($\tau_{\mathrm{out}}/\tau_i$) but can also be negative in the latter ($w_i$). This limitation is addressed by transferring the weight's sign to its associated input $t_i$, whose polarity is determined by which TIN edge leads [see TIN_P/N in Fig. 3(d)]. In our implementation, described later, a negative weight is achieved by deliberately swapping the leading and lagging falling edges in the corresponding active-low SWD pulse-pair.
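The arithmetic of (3) to (7) can be checked with a small behavioral model. The sketch below is an idealized model, not the circuit: it assumes ideal switches, a PFD with zero reset delay, and identical $V_{\mathrm{init}}$ and $V_{\mathrm{th}}$ in both halves, and it takes $t_i$ as the TIN_N falling time minus the TIN_P falling time, so a positive $t_i$ means TIN_P leads. All function names are hypothetical.

```python
import math


def wtr_readout(pulse_widths, taus, tau_out, v_init, v_th):
    """Single WTR (a plain TR when every tau equals tau_out).

    Each stored pulse t_i discharges the capacitor with time constant tau_i;
    the read-out is the time the final discharge (time constant tau_out)
    needs to bring V_C down to V_th, i.e., (5).
    """
    v_c = v_init
    for t, tau in zip(pulse_widths, taus):
        v_c *= math.exp(-t / tau)
    if v_c <= v_th:
        raise ValueError("V_C reached V_th before read-out (TR overflow)")
    return tau_out * math.log(v_c / v_th)


def pfd_pulse_pair(t_fall_p, t_fall_n):
    """Idealized PFD: each SWD falls with its TIN and both rise together at
    the later edge (zero reset delay). Returns (width_P, width_N)."""
    t_reset = max(t_fall_p, t_fall_n)
    return t_reset - t_fall_p, t_reset - t_fall_n


def tau_output(edge_pairs, taus, tau_out, v_init, v_th):
    """Differential TAU: t_out,N - t_out,P (CMP_N fall minus CMP_P fall),
    assuming the last SWD_P/SWD_N falling edges are launched together."""
    widths = [pfd_pulse_pair(tp, tn) for tp, tn in edge_pairs]
    t_p = wtr_readout([w[0] for w in widths], taus, tau_out, v_init, v_th)
    t_n = wtr_readout([w[1] for w in widths], taus, tau_out, v_init, v_th)
    return t_n - t_p


# Times in ns (hypothetical): t_1 = +0.05 (TIN_P leads), t_2 = -0.03 (TIN_N leads)
pairs = [(0.0, 0.05), (0.03, 0.0)]
print(tau_output(pairs, [2.0, 1.0], 4.0, 1.0, 0.3))   # about -0.02 = 2*0.05 + 4*(-0.03)
print(tau_output(pairs, [2.0, 1.0], 4.0, 0.8, 0.5))   # same result: offset term cancels
```

Changing $V_{\mathrm{init}}$ or $V_{\mathrm{th}}$ moves each single-ended read-out, but the differential result stays at the weighted sum, and a negative term is obtained purely by letting TIN_N fall first.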

C. RC Tuning in the WTR
To further detail the weight control in (7) via $\tau_{\mathrm{out}}/\tau_i$, Fig. 4 shows how the variable resistance and capacitance of the conceptual WTR in Fig. 3(b) are implemented. The variable resistor is implemented with a switched-resistor (SR) bank of parallel unit resistors $R_U$; the RT code determines the number of actively discharging $R_U$'s (8 in total). Meanwhile, the variable capacitor is realized with a fixed capacitor $C_0$ and a switched-capacitor (SC) bank of parallel unit capacitors $C_U$, whose active count is set by the CT code. Therefore, the RC time constant can be controlled as

$$\tau = \frac{R_U}{\mathrm{RT}}\cdot\left(C_0 + \mathrm{CT}\cdot C_U\right). \tag{8}$$

Note that, during a complete TAU execution cycle (from reset to output), increasing CT would engage new $V_{\mathrm{init}}$-precharged capacitor units, which would lead to charge sharing and thus erroneously increase $V_C$. Therefore, CT is constrained to stay constant or decrease while the TAU inputs are processed (see Fig. 5).
The RC tuning of the WTR is introduced here to pave the way for the TAU control-flow design in Section II-D. Other details are deferred to Section III-E.
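A small model of (8) makes the two tuning knobs and the CT constraint explicit. It is a sketch: the unit values are hypothetical, RT = 0 is assumed to mean that no unit resistor discharges (an infinite $\tau$, i.e., zero weight), and the bank sizes default to the 8 SR units and 223 SC units quoted in Section III-E.

```python
import math


def rc_tau(rt, ct, r_u, c_0, c_u, n_sr=8, n_sc=223):
    """WTR time constant per (8): tau = (R_U / RT) * (C_0 + CT * C_U).

    rt: active unit resistors (0..n_sr). In this sketch rt = 0 is taken to
        mean no discharge path, i.e., tau = inf and zero weight.
    ct: connected unit capacitors (0..n_sc).
    """
    if not (0 <= rt <= n_sr and 0 <= ct <= n_sc):
        raise ValueError("RT/CT code out of range")
    if rt == 0:
        return math.inf
    return (r_u / rt) * (c_0 + ct * c_u)


def ct_sequence_ok(ct_codes):
    """True if CT never increases within one execution cycle. Raising CT
    would connect V_init-precharged units and cause charge sharing."""
    return all(later <= earlier for earlier, later in zip(ct_codes, ct_codes[1:]))


# Hypothetical units: R_U = 8 kOhm, C_0 = 1 pF, C_U = 1 fF
print(rc_tau(8, 0, 8e3, 1e-12, 1e-15))   # about 1 ns
print(ct_sequence_ok([223, 146, 0, 0]))  # True: CT only decreases
```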

D. TAU Control Flow Within the Proposed PLL
The role of the TAU in the proposed PLL system stems from (1). It was then abstracted as computing the weighted sum of its time inputs, which generalizes the TAU functionality as (7). To program the TAU to execute (1), we designed a dedicated control flow ensuring that the TAU receives $T_{\mathrm{CKV}}$ and $t_S$ [i.e., the time inputs of (1)], assigns proper weights to them, and outputs the weighted sum.
According to Fig. 5, the TAU processes four time-domain inputs in a single execution cycle. By tuning the RT and CT control pins, different RC time constants ($\tau$'s) can be assigned to each input. According to (7), the resulting output is

$$t_{\mathrm{out}} = \frac{\tau_A}{\tau_1}T_{\mathrm{CKV}} + \frac{\tau_A}{\tau_2}T_{\mathrm{CKV}} - \frac{\tau_A}{\tau_3}T_{\mathrm{CKV}} - \frac{\tau_A}{\tau_S}t_S \tag{9}$$

where $\tau_1$, $\tau_2$, and $\tau_3$ are the RC time constants during the first to third discharges, while $\tau_S$ and $\tau_A$ are those during the $t_S$ sampling and the final output, respectively. The minus signs result from the swapped leading and lagging falling edges in the corresponding SWD pulse-pairs, as discussed in Section II-B. Replacing the $\tau$ symbols with their respective components in (8) expresses $t_{\mathrm{out}}$ in terms of the tuning codes, as in (10), where $N_C$ is the CT code during the first discharge and $N_R$ is the RT code during the third discharge.

To explain the correlation between this output and the functional requirement in (1), the TAU execution cycle is divided into a reset state and three functional states: pre-discharge, snapshot, and time amplification (TA). Each of them realizes one term or coefficient in (1). The execution cycle starts with the reset state, in which SWC closes the relevant switches in the WTRs to charge all the capacitors (CT = max) to $V_{\mathrm{init}}$. Then, the non-critical (i.e., rising) FREF edge opens the SWC switches and triggers the pre-discharge state, in which the TAU calculates and stores the $t_S$ prediction term, $(1-\varphi_{R,\mathrm{frac}})\cdot T_{\mathrm{CKV}}$. The prediction is realized as the weighted sum of three $T_{\mathrm{CKV}}$'s, which are generated by sampling the CKV period and appear as the width differences of the active-low SWD pulse-pairs. During the first SWD pulse-pair, the capacitive tuning code $N_C$ (on CT) finely scales $T_{\mathrm{CKV}}$. During the third, the resistive tuning code $N_R$ (on RT) scales $T_{\mathrm{CKV}}$ coarsely. The difference between these two scaled inputs realizes the $(1-\varphi_{R,\mathrm{frac}})\cdot T_{\mathrm{CKV}}$ term, with $N_R$ and $N_C$ derived from $\varphi_{R,\mathrm{frac}}$ according to (11). Here, $N_R$ ranges from 0 to 7, yielding a resolution of 1/8 in $\varphi_{R,\mathrm{frac}}$ tuning. Consequently, the $N_C$ term needs to cover only the tuning range of 0 to 1/8. Within such a narrow range, the nonlinearity in the mapping between $N_C$ and $\varphi_{R,\mathrm{frac}}$ is insignificant and simple to compensate for.

One may notice that (1) does not reflect the influence of the second discharge. In fact, this discharge introduces an extra offset of $3/8\cdot T_{\mathrm{CKV}}$ for metastability mitigation, to be discussed in Section III-B1. After these three discharges, the TAU enters the snapshot state, in which the WTRs directly subtract the sampled $t_S$ from the prestored prediction. This realizes the $-t_S$ term in (1). As a result, only the desired residue (substantially reflecting the DCO PN in the phase-locked state) remains in the TAU. Finally, in the TA state, the TAU outputs this residue as the time offset between CMP_P and CMP_N ($t_{\mathrm{out}}$). During this process, the residue is also time-amplified by

$$G_{\mathrm{TA}} = \frac{\tau_A}{\tau_S}. \tag{12}$$

This gain factor corresponds to $G_{\mathrm{TA}}$ in (1) and is realized by manipulating the ratio between $\tau_A$ and $\tau_S$, more specifically, the RT codes during the TA and snapshot states. After generating the output, the TAU returns to the reset state, awaiting the next cycle.
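The control flow above can be exercised end to end with an idealized model of one execution cycle. Everything specific in this sketch is a hypothetical assumption rather than the actual design: $C_0/C_U = 1024$, a snapshot and TA capacitance of $C_0$ alone (CT dropped to 0 after the first discharge, which respects the non-increasing CT rule), RT = 8 during the snapshot, a second-discharge weight of exactly 3/8, and the sign pattern of (9). The rounding encoder is only a stand-in for the RC encoder and its pre-distortion. Sweeping $\varphi_{R,\mathrm{frac}}$ with $t_S$ set to its nominal value shows that the output collapses to a small quantization residue.

```python
import math

C0_OVER_CU = 1024.0   # hypothetical C_0 / C_U ratio (about 10-bit fine step)


def rc_encoder(phi):
    """Hypothetical RC encoder: coarse RT code N_R in 1/8 steps plus a fine
    CT code N_C chosen so that C_0 / (C_0 + N_C*C_U) ~= 1 - phi_fine."""
    n_r = math.floor(8 * phi)                        # 0..7
    phi_fine = phi - n_r / 8                         # in [0, 1/8)
    n_c = round(C0_OVER_CU * (1 / (1 - phi_fine) - 1))
    return n_r, n_c


def tau_cycle(phi, t_s, t_ckv, rt_ta=1):
    """Idealized TAU output for one execution cycle, following (9).

    Weights are relative to the snapshot time constant tau_S (assumed
    RT = 8 with only C_0 connected):
      1st discharge  +C_0/(C_0 + N_C*C_U)   fine term, RT = 8, CT = N_C
      2nd discharge  +3/8                    metastability offset
      3rd discharge  -N_R/8                  coarse term, RT = N_R, CT = 0
      snapshot       -t_S
    G_TA = tau_A / tau_S = 8 / rt_ta (same capacitance, fewer resistors).
    """
    n_r, n_c = rc_encoder(phi)
    w1 = 1 / (1 + n_c / C0_OVER_CU)
    prediction = (w1 + 3 / 8 - n_r / 8) * t_ckv
    g_ta = 8 / rt_ta
    return g_ta * (prediction - t_s)


# Hypothetical T_CKV = 200 ps; nominal t_S = (11/8 - phi) * T_CKV (see Section III-B1)
for phi in (0.0, 0.3, 0.7):
    print(phi, rc_encoder(phi), tau_cycle(phi, (11 / 8 - phi) * 200.0, 200.0))
```

In this model, the residue stays within half a fine step times $G_{\mathrm{TA}}$ (below 1 ps here), and $N_C$ never exceeds about 146, leaving headroom in a 223-unit SC bank.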

A. TAU Sub-System Overview
Fig. 6(a) illustrates the implemented TAU together with the auxiliary circuits that control its behavior in each state defined in Section II-D. The PFD is actually realized as a more complex tri-mode PFD to support the three distinct functional states: pre-discharge, snapshot, and TA. The TAU is alternately controlled by global and local finite-state machines (FSMs). Fig. 6(b) shows the active FSM in each TAU state, indicated by RST_all, PDIS_done, and TA_en. In the pre-discharge state, the local FSM is active. It interacts with the tri-mode PFD (through START and READY) to generate the first three inputs for the WTRs (pulse-pairs on SWD_P and SWD_N). Meanwhile, the local FSM adjusts the weight for each input (through RT, CT, and SIGN), whose $\varphi_{R,\mathrm{frac}}$-dependent weight codes, $N_R$ and $N_C$, are calculated by the RC encoder according to (11). Once the TAU has processed the first three inputs, the local FSM terminates the pre-discharge state and, through PDIS_done = 1, hands control of the TAU in the remaining states to the global FSM.
In the snapshot state, the global FSM captures $t_S$ and transfers it to the TAU via CKRG_P and CKRG_N. To mitigate potential metastability in the $t_S$ sampling (see Section III-B1), an anti-alignment delay (between FREF and FREF') is added. In the TA state, the global FSM directs the local FSM to apply the proper RT for $G_{\mathrm{TA}}$ and prepares the TAU for the final output, both by setting TA_en = 0. While waiting for the TAU output, the global FSM also launches CKU, the master clock of the overall PLL. After the TAU output is quantized by the subsequent TDC (indicated by the TDC_done falling edge), the global FSM resets the overall TAU subsystem with RST_all = 0. When this global reset is removed (RST_all = 1, at the FREF rising edge), the local FSM is activated again, starting the next execution cycle.

B. Implementation of the Global FSM
1) Differential Snapshot Circuit: In the snapshot state, the global FSM conveys the $t_S$ information to the TAU via CKRG_P and CKRG_N. Inside the global FSM, $t_S$ is sampled by the differential snapshot circuit. As shown in Fig. 7, it contains two similar single-ended paths, modified from [14]. The P-path captures the first CKV falling edge after FREF'. To achieve this, FREF' first deactivates the reset on the main flip-flop (FREF' = 0) and releases CK1, the gated CKV. Once CKV falls, the main flip-flop asserts CKRG_P. On the N-path, CKRG_N is asserted at the FREF falling edge (since PDIS_done = 1 in the snapshot state). Ideally, the propagation delays of these two paths cancel, so the time offset between the CKRGs equals that between FREF and CKV, which is $t_S$. One may also notice CKR_en, the gating signal of the CKRGs, in the differential snapshot circuit. It is scheduled by the global FSM (see Fig. 9) for two purposes. First, in the TA state, it launches concurrent rising edges on the CKRGs to trigger the TAU output. Second, in the pre-discharge and reset states, it blocks activity on the CKRGs to avoid interfering with the tri-mode PFD.
The differential snapshot circuit can sample $t_S$ accurately only if its N- and P-path propagation delays cancel properly. In reality, however, flip-flop metastability may corrupt this condition and thus distort the sampled $t_S$. For example, in the P-path, the flip-flop's CK1-to-Q delay can increase dramatically when the reset removal (FREF' falling) is close to the subsequent critical clock edge (CK1 falling). This occurs with a certain probability (determined by the flip-flop's metastability window) in fractional-N mode, because the time offset between the FREF and CKV edges (and, by extension, the offset between FREF' and CK1) is distributed uniformly between 0 and $T_{\mathrm{CKV}}$. In contrast, the N-path is free from this issue since its reset, the inverse of PDIS_done, can be guaranteed to settle sufficiently earlier than CK2 (or FREF). Consequently, the P-path delay variation appears in the time offset between CKRG_P and CKRG_N, adding uncertainty to the sampled $t_S$.
To avoid this flip-flop metastability issue, we add a conditional anti-alignment delay, either 0 or $T_{\mathrm{CKV}}/2$, between FREF and FREF' according to the $t_S$ prediction [i.e., $(1-\varphi_{R,\mathrm{frac}})\cdot T_{\mathrm{CKV}}$]. Consequently, the FREF' falling edge can be kept sufficiently far from its neighboring CKV (strictly speaking, CK1) falling edge, so flip-flop metastability does not occur. However, this variable delay changes the sampled $t_S$. For example, when FREF is close to its first subsequent CKV edge (i.e., a small $t_S$ prediction), FREF' is delayed by $T_{\mathrm{CKV}}/2$ to extend the separation. As a result, the second CKV edge after FREF, instead of the first one, is captured by the snapshot circuit, and $T_{\mathrm{CKV}}$ is added to the sampled $t_S$. In contrast, the sampled $t_S$ is intact when its prediction is nominally large. This yields a non-monotonic mapping from $\varphi_{R,\mathrm{frac}}$ to the sampled $t_S$, which complicates TAU control. To alleviate this, we add the $3/8\cdot T_{\mathrm{CKV}}$ offset during the second discharge in the pre-discharge state (see Fig. 5). Since a type-II PLL always keeps a zero-mean input to the loop filter, this offset ultimately appears in the expected $t_S$, and the delay-selection logic is changed accordingly (see Fig. 7). This maintains a monotonic mapping between $\varphi_{R,\mathrm{frac}}$ and the nominal sampled $t_S$,

$$t_S = \left(\frac{11}{8} - \varphi_{R,\mathrm{frac}}\right)\cdot T_{\mathrm{CKV}}, \tag{13}$$

thus avoiding any complicated top-level time-offset control.
To provide a better overview of this metastability-mitigation mechanism, four boundary cases are examined in Fig. 8. From (a) to (d), these cases are arranged in order of increasing $t_S$ (hence, decreasing $\varphi_{R,\mathrm{frac}}$). In (a), FREF' is relatively close to the subsequent CKV edge. As $t_S$ increases, FREF' moves away from the subsequent CKV edge but approaches the preceding CKV edge until (b), right before the anti-alignment delay changes (controlled by SelDelay). At the moment SelDelay switches from 0 to 1 [see (c), where $\varphi_{R,\mathrm{frac}} = 0.5$], FREF' is shifted by $T_{\mathrm{CKV}}/2$ and is thus close to the subsequent CKV edge again, just as in (a). Then, as $t_S$ increases, FREF' gradually moves away from the subsequent CKV edge and toward the preceding CKV edge until $t_S$ reaches its maximum in (d), repeating the trend from (a) to (b).
There are two critical timing separations in these boundary cases. The first is the minimum separation between FREF' and the subsequent CKV edge [see the light blue shaded areas in (a) and (c)]. Increasing this separation helps mitigate the linearity degradation due to metastability. The second is the minimum separation between FREF' and the preceding CKV edge [see the light red shaded areas in (b) and (d)]. This separation is essential to keep FREF' from crossing the preceding CKV edge, which would cause the snapshot circuit to capture a wrong $t_S$. Its exact value is therefore less critical, as long as it stays positive.
Interestingly, the sum of these two critical separations equals $T_{\mathrm{CKV}}/2$, because the first is set by the intentional offset for metastability mitigation (i.e., $3T_{\mathrm{CKV}}/8$ in our case), and the second is set by $T_{\mathrm{CKV}}/2$ minus this intentional offset. It might seem optimal to allocate $T_{\mathrm{CKV}}/2$ equally to these two separations, i.e., $T_{\mathrm{CKV}}/4$ each. However, because an insufficient separation between FREF' and the subsequent CKV edge causes the linearity issue, we prefer to assign more margin to it.
Although adding the offset of $3T_{\mathrm{CKV}}/8$ alleviates the metastability issue, it shifts the range of $t_S$ from $(0, T_{\mathrm{CKV}}]$ to $(3T_{\mathrm{CKV}}/8, 11T_{\mathrm{CKV}}/8]$, thereby increasing the maximum $t_S$ to $11T_{\mathrm{CKV}}/8$. To handle the increased $t_S$, the WTRs must adopt a larger $R_0C_0$ (see Section III-E), but this slows the discharge slew rate and degrades the noise performance (see Section V-C). This constitutes a tradeoff between linearity (which may be degraded by metastability) and noise. However, more advanced technology nodes will suffer less from this tradeoff because their flip-flops are faster, with a narrower metastability window [32].
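The boundary-case analysis of Fig. 8 can be reproduced with an idealized timing model. The sketch below assumes a locked, noiseless loop in which the $3T_{\mathrm{CKV}}/8$ offset has already shifted the CKV edges, a SelDelay rule of "delay FREF' by $T_{\mathrm{CKV}}/2$ when $\varphi_{R,\mathrm{frac}} \le 0.5$" (matching the switching point in Fig. 8(c)), and a P-path that captures the first CKV falling edge after FREF'. These are modeling assumptions, not the actual Fig. 7 logic. Sweeping $\varphi_{R,\mathrm{frac}}$ confirms the monotonic mapping (13), the $(3T_{\mathrm{CKV}}/8, 11T_{\mathrm{CKV}}/8]$ range, and minimum separations of $3T_{\mathrm{CKV}}/8$ and $T_{\mathrm{CKV}}/8$ that add up to $T_{\mathrm{CKV}}/2$.

```python
import math


def snapshot(phi, t_ckv=1.0):
    """Idealized differential snapshot with the anti-alignment delay.

    Assumptions (sketch only): locked and noiseless loop; CKV falling edges
    at r + k*T_CKV after FREF with r = ((11/8 - phi) mod 1) * T_CKV, i.e.,
    the 3/8*T_CKV offset is already absorbed by the loop; SelDelay = 1
    (FREF' = FREF + T_CKV/2) when phi <= 0.5; the P-path captures the first
    CKV falling edge strictly after FREF'.
    Returns (sampled t_S, FREF'-to-next-edge gap, previous-edge-to-FREF' gap).
    """
    r = ((11 / 8 - phi) % 1) * t_ckv
    d = t_ckv / 2 if phi <= 0.5 else 0.0      # anti-alignment delay (SelDelay)
    k = math.floor((d - r) / t_ckv) + 1       # index of first edge after FREF'
    edge = r + k * t_ckv
    return edge, edge - d, d - (edge - t_ckv)


phis = [k / 256 for k in range(256)]
results = [snapshot(p) for p in phis]
print(min(x[1] for x in results), min(x[2] for x in results))   # 0.375 0.125
```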
2) Time Amplification Control and Global Reset: Fig. 9 shows the overall global FSM, emphasizing the TA control logic and the global reset. The core of the TA control logic is a shift-register chain whose outputs, ST[2:0], serve as a state variable that schedules the TA-related actions. In state ST[2:0] = 3'b001, the global FSM notifies the local FSM to adjust RT for $G_{\mathrm{TA}}$, switches the tri-mode PFD to TA mode, and prepares the WTR comparator for the final output. All these actions are performed by pulling down TA_en. When ST[2:0] = 3'b011, the tri-mode PFD is triggered for the final output by the rising CKR_en, which launches CKRG_P = 1 and CKRG_N = 1 in the differential snapshot circuit. The shift-register chain is clocked by a gated CKV, CKTA. It is activated after $t_S$ is sampled (indicated by the CKR rising edge) and deactivated after the TAU output is triggered (ST[2:0] = 3'b111). The TA logic also launches the master clock for the PLL digital part (CKU) after triggering the TAU output. This helps protect the critical events (e.g., sampling $t_S$ and launching the final TAU output) from potential interference due to digital activity.
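The state sequence 3'b001, 3'b011, 3'b111 is what a thermometer-style shift register produces when a '1' is shifted in on every clock edge. The sketch below assumes exactly that behavior (it is an illustration, not the Fig. 9 netlist); the action strings paraphrase the text, and launching CKU at 3'b111 is a simplification of "after triggering the TAU output".

```python
def ta_schedule(max_edges=8):
    """Idealized ST[2:0] walk of the TA shift-register chain.

    Sketch assumptions: each CKTA edge shifts a '1' into ST[0]; CKTA is
    gated off once ST[2:0] = 3'b111.
    """
    actions = {
        0b001: "pull TA_en low: RT for G_TA, PFD to TA mode, arm slicer",
        0b011: "raise CKR_en: CKRG_P = CKRG_N = 1 triggers TAU output",
        0b111: "gate off CKTA, launch CKU",
    }
    st, log = 0b000, []
    for _ in range(max_edges):
        st = ((st << 1) | 1) & 0b111              # shift a '1' into ST[0]
        log.append((format(st, "03b"), actions.get(st, "")))
        if st == 0b111:                           # chain deactivated
            break
    return log


for state, action in ta_schedule():
    print(state, action)
```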
Once the TAU output has been quantized (indicated by TDC_done = 0 from the TDC), the global FSM asserts a global reset (RST_all = 0). As a result, the TAU enters the reset state, waiting for the next TAU execution cycle (triggered by the FREF rising edge).

C. Implementation of the Tri-Mode PFD
Fig. 10(a) shows details of the tri-mode PFD, whose three modes pair up with the three functional states of the TAU. These modes are switched according to the TAU state indicators RST_all, PDIS_done, and TA_en (see Fig. 6).
PFD Mode 1 is active in the pre-discharge state. The PFD core is then driven by the dedicated clock-gating block, which releases the gated CKV clocks on CKVG_P and CKVG_N successively, one CKV cycle apart (when READY = 0). Once the CKVGs are released, the PFD core launches an active-low pulse-pair on SWD_P and SWD_N whose width difference is $T_{\mathrm{CKV}}$. Fig. 10(b) illustrates a single SWD pulse-pair generation cycle. Once a cycle is triggered (START falling, event marker 1), flip-flop Q2 removes the reset on the output flip-flops Q1 and Q3 (RST = 0, 2), clears the PFD idle flag (READY = 0, 3), and enables the CKV gating block to release the CKVGs successively (4.1 and 4.2). At the CKVGs' rising edges, the corresponding SWDs fall (5.1 and 5.2). Once both SWDs are LOW, they are reset (6) to HIGH simultaneously (7). Consequently, the PFD outputs an active-low pulse-pair on the SWDs. Meanwhile, the SWD reset (6) also raises the PFD idle indicator (READY = 1, 7*), which the local FSM (see Fig. 12) checks to decide whether to start the next pulse-pair generation cycle (through START = 0, 8). In addition, as mentioned in Section II-B, the TAU needs to swap the leading and lagging falling edges in the generated SWD pulse-pair when a negative weight is required. The SIGN signal (from the local FSM) controls this polarity by determining which CKVG is released first. One may ask whether the output flip-flops Q1 and Q3 can be disturbed by activity on CKRG_P and CKRG_N in PFD Mode 1. According to Fig. 7, this cannot happen, since the CKRGs are blocked by CKR_en = 0 in the pre-discharge state.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Fig. 8. Boundary cases of the metastability mitigation mechanism that prevents insufficient separation between FREF' and the subsequent CKV edge [corresponding to CK1 in Fig. 7(a)].
After the pre-discharge, the CKVGs are frozen at LOW by PDIS_done = 1. Then, the tri-mode PFD is driven by the CKRGs and behaves the same as the dual-mode PFD in the conceptual TAU of Fig. 3(d). Detailed waveforms are illustrated in Fig. 10(b). In PFD Mode 2 (paired with the snapshot state), the PFD converts the time difference between the CKRGs into the width difference of the SWD pulse-pair. In PFD Mode 3 (corresponding to the TA state), the reset of the output flip-flops Q1 and Q3, RST, is initially disabled [by TA_en = 0; note that RST_all = 1 and TA_done = 1 at this moment, and R(eset) has a higher priority than S(et) in flip-flop Q2]. Consequently, the SWDs remain LOW (2) after being triggered by the CKRGs (1). The LOW-level SWDs keep discharging the WTRs. As soon as both WTRs have produced their outputs, a feedback signal [3, TA_done = CMP_P + CMP_N = 0 (logical OR), as shown in Fig. 6(a) (upper right)] enables the reset (RST = 1, 4) so that the SWDs return to HIGH (5) and stop discharging the WTRs.

D. Implementation of the Local FSM
In the pre-discharge state, the local FSM controls the tri-mode PFD to generate the first three SWD pulse-pairs and applies the proper weights to the WTRs. Each pulse-pair is generated through a self-timed interaction between the local FSM and the tri-mode PFD, emulating an asynchronous SAR ADC [33]. Fig. 11 shows the detailed single pulse-pair generation (SPPG) logic. Two prerequisites are needed to activate the SPPG logic: the global reset must be released (RST_all = 1), and the preceding SPPG logic (if any) must have completed (STATE_n−1 = 1). Once the tri-mode PFD becomes idle (READY = 1; step 1), the SPPG cycle starts by raising its state indicator (STATE_n = 1; step 2). Then, a trigger pulse is generated on TRIG_n (step 3) to notify the tri-mode PFD to launch an SWD pulse-pair through START (step 4), which sums the TRIG_n pulses from all the SPPG units in Fig. 12. Once the pulse-pair has been generated, the tri-mode PFD sets the idle flag again (READY = 1; step 5), possibly starting the next SPPG cycle (step 6). Fig. 12 sketches the overall local FSM, which cascades three SPPG units and sums their trigger pulses, START = Σ_{i=0}^{3} TRIG_i, to launch the SWD pulse-pairs consecutively. The corresponding timing diagram for a complete TAU execution cycle is shown in Fig. 13. After being activated by the global reset removal (RST_all = 1), the local FSM opens the TAU's charging switch (SWC = 0) and triggers the tri-mode PFD (through the first START falling edge) to generate the first SWD pulse-pair. After that, the SPPGs interact with the tri-mode PFD (through START and READY) to launch the remaining two SWD pulse-pairs, as shown in Fig. 11. Once this is done (indicated by the third READY rising edge), the TAU transitions from the pre-discharge state to the snapshot state (PDIS_done = 1), and the tri-mode PFD changes its mode accordingly. Then, at the local FSM's further request for pulse-pair generation (the fourth START falling edge), the tri-mode PFD merely removes its output reset, i.e., RST falls in Fig. 10(a), readying itself for processing t_S in the snapshot state.
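The self-timed handshake can be made concrete with a small behavioral model. The following Python sketch reflects our reading of Figs. 11-13, not the authors' RTL: TRIG_0 is issued at global-reset removal and launches the first pulse-pair, SPPG units 1 and 2 launch pulse-pairs 2 and 3, and SPPG unit 3 issues the fourth START falling edge, which only releases the PFD output reset. The names TriModePFD and run_local_fsm are hypothetical, and timing, the RT/CT/SIGN weight control, and the replica delay lines are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class TriModePFD:
    """Hypothetical behavioral model of the tri-mode PFD side of the handshake."""
    pulse_pairs: int = 0          # SWD pulse-pairs launched so far
    ready: bool = True            # READY: idle flag observed by the SPPG units
    snapshot_armed: bool = False  # True once the output reset has been removed
    log: list = field(default_factory=list)

    def on_start_falling(self) -> None:
        self.ready = False
        if self.pulse_pairs < 3:
            # Pre-discharge state: launch one SWD pulse-pair, then go idle again.
            self.pulse_pairs += 1
            self.log.append(f"SWD pulse-pair {self.pulse_pairs}")
            self.ready = True     # READY rises once the pair is done (step 5)
        else:
            # Snapshot state (PDIS_done = 1): only the output reset is removed.
            self.snapshot_armed = True
            self.log.append("RST released; waiting for t_S")


def run_local_fsm(pfd: TriModePFD, n_sppg: int = 3) -> list:
    """Cascade of SPPG units; returns the STATE_0..STATE_n flags."""
    rst_all = True                # global reset released (RST_all = 1)
    pfd.on_start_falling()        # TRIG_0: first START falling edge
    state = [True]                # STATE_0 stands for the TRIG_0 stage
    for n in range(1, n_sppg + 1):
        # Prerequisites: RST_all = 1, STATE_{n-1} = 1, and READY = 1 (step 1).
        if not (rst_all and state[n - 1] and pfd.ready):
            break
        state.append(True)        # raise STATE_n (step 2)
        pfd.on_start_falling()    # TRIG_n drives a START falling edge (steps 3, 4)
    return state


pfd = TriModePFD()
states = run_local_fsm(pfd)       # -> [True, True, True, True]
```

After the run, pfd.log lists three pulse-pairs followed by the reset release, mirroring the four START falling edges in Fig. 13.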
The weight of each WTR discharge is controlled by the corresponding combinational logic in the local FSM (see Fig. 12), which translates the RC encoder outputs (N_C and N_R) into the weight-control sequences (on RT, CT, and SIGN) according to the SWD pulse-pair index (STATE[3:1]) and certain TAU state indicators (TA_en and the inverted RST_all, i.e., SWC). Note that the delay lines in the local FSM and the SPPGs are realized with replica logic gates and routing of the corresponding weight-control paths to emulate their propagation delay. These delays therefore ensure that each discharge is triggered (by a START falling edge) only after the weight-control signals have settled.

E. Implementation of the WTR
Fig. 4 shows the implemented WTR. The switched SR and SC units adopt dummy switches, which roughly compensate for the charge injection and clock feedthrough of their main switches to minimize the degradation of the TAU's arithmetic accuracy. Finer compensation is performed by a piecewise pre-distortion in the RC encoder (see Section VI-C). Since the overall TAU targets 10-bit accuracy, the WTR uses eight SR units and 223 SC units to realize the upper 3 bits and the lower 7 bits, respectively. The over-designed 223 SC units provide enough redundancy for pre-distortion (or calibration).
A question might arise as to why the bottom plates of the SC units here are connected to power (V_DD) instead of ground. This avoids a situation in which the bottom-plate voltages of the disconnected SC units fall below ground after the discharge, which could occur if the bottom plates were initially connected to ground. Such a negative voltage would reverse the polarity across their switches, causing charge leakage and thus degrading the TAU's arithmetic accuracy.
The slicing comparator is modified from the threshold-crossing detector (TCD) in [34]. As shown in Fig. 14, the implemented slicer mainly consists of a gated inverter (PM2 and NM1) and a dynamic inverter (PM3, NM3, and NM4). The slicer is enabled (by TA_en = 0) right before the final discharge of the WTR to avoid unnecessary power consumption due to possible crowbar current (since V_C can be close to the threshold of the first-stage inverter before the final discharge). Once the slicer output is asserted (CMP = 0), the first-stage inverter is immediately gated off to save power. Capacitors C_1 and C_2 help to suppress the output jitter [34]. The crossing-detection threshold of this slicer, V_th, is dominated by that of the first-stage inverter, which drifts with PVT variations. Fortunately, the differential arrangement cancels the part of the V_th drift common to both paths. V_th mismatch between the differential paths mainly causes a constant output offset, which is automatically compensated by the loop dynamics of a type-II PLL.
Considering the constraint in Section II-B, which states that V_C should remain higher than V_th after the (W)TR processes all the time inputs, one may wonder how to properly choose V_init, V_th, and the R and C values of the WTR to satisfy it. From the circuit perspective, these four physical parameters determine the upper limit of the discharge duration that a WTR can handle, t_lim. From the system perspective, the time-processing details in Fig. 5 determine the maximum discharge duration the TAU must handle, t_max. As long as t_max < t_lim, V_C never falls below V_th after all the inputs are processed, which constrains the four physical parameters of the WTR. Next, we calculate t_lim and t_max separately. Note that, in the analysis below, all discharge durations are referred to their equivalents in the snapshot state, i.e., the durations that produce the same V_C drop when discharging C_0 through R_0/8. This is because the primary goal of the TAU is to cancel t_S, which is processed in the snapshot state. t_lim is determined by discharging C_0 from V_init to V_th through R_0/8:

$$t_{\mathrm{lim}} = \frac{R_0}{8}\,C_0 \ln\frac{V_{\mathrm{init}}}{V_{\mathrm{th}}}$$

To analyze t_max, Fig. 15 depicts the equivalent discharge time of the DWTRs. Each SWD pulse-pair contains a differential component t_diff and a common-mode component t_cm. The former is the explicit time input to be processed, i.e., T_CKV or t_S, depending on the state of the TAU; the latter results from the PFD reset delay. The influences of these two components should be considered separately. Since the time signals on the P and N paths cancel out, the maximum accumulated duration in the differential mode, max(t_acc,diff), can be estimated by inspecting the P-path alone, and it is obtained at N_C = 0. For the common-mode discharge, the maximum accumulated duration, max(t_acc,cm), is reached at N_C = 0 and N_R = 7. Summing max(t_acc,diff) and max(t_acc,cm) yields t_max.
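The t_lim constraint and the weighted-sum behavior of a WTR both follow from ideal exponential RC discharge, which the Python sketch below models. It assumes ideal switches, discharge toward ground, and that each input t_i discharges C_0 with time constant τ_i before a final discharge with τ_out until V_C crosses V_th; under these assumptions the final discharge time equals τ_out·ln(V_init/V_th) minus the weighted sum of (τ_out/τ_i)·t_i. The function names and the numeric values in the usage lines are hypothetical.

```python
import math


def wtr_final_discharge(v_init, v_th, tau_out, inputs):
    """Ideal WTR: time for the final discharge to bring V_C down to V_th.

    inputs: (t_i, tau_i) pairs; each input discharges C_0 for t_i with
    time constant tau_i. Raises if V_C has already reached V_th, i.e., the
    accumulated input duration exceeded what the WTR can handle (t_lim).
    """
    v_c = v_init
    for t_i, tau_i in inputs:
        v_c *= math.exp(-t_i / tau_i)      # exponential RC discharge
    if v_c <= v_th:
        raise ValueError("V_C reached V_th before the final discharge")
    return tau_out * math.log(v_c / v_th)


def weighted_sum_form(v_init, v_th, tau_out, inputs):
    """Same quantity written as an offset minus a weighted sum of the inputs."""
    offset = tau_out * math.log(v_init / v_th)
    return offset - sum(tau_out / tau_i * t_i for t_i, tau_i in inputs)


def t_lim(r0, c0, v_init, v_th):
    """Longest snapshot-equivalent discharge: C_0 through R_0/8, V_init -> V_th."""
    return (r0 / 8) * c0 * math.log(v_init / v_th)


# Hypothetical values: V_init = 1 V, V_th = 0.5 V, tau = 1 ns.
inputs = [(100e-12, 1e-9), (50e-12, 2e-9)]
t_out = wtr_final_discharge(1.0, 0.5, 1e-9, inputs)
```

In a differential pair of such WTRs with matched τ_out, subtracting the P and N outputs removes the common offset term, leaving only the difference of the weighted sums.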
By substituting (14)-(16) into t_max < t_lim, the minimum required product R_0 × C_0 can be constrained as

F. Implementation of the RC Encoder
The RC encoder assists the local FSM with weight control by mapping φ_R,frac to N_C and N_R, which are, respectively, the CT code in the first discharge and the RT code in the third discharge (see Fig. 5). According to (11), the mapping from φ_R,frac to N_R is linear. Since N_R is responsible for coarse tuning, it is simply obtained by truncation. Then, N_C handles the residue phase φ_CT. The accurate mapping from φ_CT to N_C is nonlinear and rather complex, but it can be approximated with a Taylor series because φ_CT is merely a small residue (<1/8) after the coarse tuning; in this approximation, the dominant nonlinearity is handled by the φ_CT² term, and higher-order errors are compensated by o(φ_CT). Fig. 16 illustrates the implemented RC encoder. The path from φ_R,frac to N_R reflects (18). Equation (20) is realized by the path from φ_CT to N_C, where a sparse lookup table (LUT) stores the higher-order error o(φ_CT), and E(C_0/C_U) estimates the fabricated capacitance ratio C_0/C_U.
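The coarse/fine split performed by the RC encoder can be sketched in a few lines. This is a minimal Python sketch assuming that the truncation produces N_R = floor(8·φ_R,frac) and that the residue is φ_CT = φ_R,frac − N_R/8; the paper's own equations may use a different but equivalent linear map (e.g., in terms of 1 − φ_R,frac). The nonlinear φ_CT → N_C mapping, with its quadratic term, sparse LUT, and E(C_0/C_U) scaling, is deliberately not reproduced. The function name split_phase is hypothetical.

```python
import math


def split_phase(phi_r_frac: float) -> tuple[int, float]:
    """Hypothetical coarse/fine split of phi_R,frac in [0, 1).

    N_R: 3-bit truncation floor(8 * phi), in 0..7 (coarse, resistive tuning).
    phi_CT: residue phi - N_R / 8, in [0, 1/8) (fine, capacitive tuning).
    """
    if not 0.0 <= phi_r_frac < 1.0:
        raise ValueError("phi_R,frac must lie in [0, 1)")
    n_r = math.floor(8 * phi_r_frac)   # coarse code
    phi_ct = phi_r_frac - n_r / 8      # small residue left for fine tuning
    return n_r, phi_ct


n_r, phi_ct = split_phase(0.7)         # N_R = 5, phi_CT is about 0.075
```

Because the residue is bounded by 1/8, a low-order Taylor expansion of the fine mapping is sufficient, which is the property the RC encoder exploits.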

IV. IMPLEMENTED PLL
The proposed TAU sub-system is incorporated into the fractional-N PLL shown in Fig. 17. The TAU extracts the time error t_E, mainly due to the DCO PN, by canceling t_S with its prediction. Unlike the DTC-based or voltage-domain methods, which cancel t_S with a fixed time resolution, the TAU has a fixed phase resolution of 2π/1024, as it scales the carrier period T_CKV with 10-bit accuracy. The TAU output is quantized by a 4-bit differential TDC whose overall architecture is quite similar to that in [11]. However, the sub-TDC of each differential path is replaced by the vernier counterpart in [34] to achieve a fine resolution of 1.9 ps. Given the TAU's time-amplification gain G_TA = 8, the equivalent TDC quantization resolution is finer than 240 fs (1.9 ps/8 ≈ 0.24 ps) and is thus negligible for the PLL in-band PN. In parallel with the TAU-based phase-error tracking path, a counter path assists frequency (re)locking; it can be turned off to save power once the PLL is locked. Similar to [6], [20], and [35], the counter path can be "instantaneously" woken up when a range detector in the TDC detects that the PLL has become unlocked. The DCO is implemented using an LC tank and a complementary cross-coupled pair as in [36]. It covers the oscillation frequency range from 2.6 to 4.1 GHz. Frequency tuning is achieved by SC banks, with the finest frequency resolution varying from 70 to 290 kHz, depending on the oscillation frequency. To reduce its PN contribution, the frequency resolution is dithered by a ΔΣ modulator (DSM) operating at 1/8 of the DCO's resonant frequency.
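The DSM dithers the SC-bank code so that its time average realizes a frequency step finer than the 70 to 290 kHz unit resolution. The Python sketch below uses a first-order accumulator-based ΔΣ modulator, chosen for clarity; the order and word length of the DSM actually used in this design are not stated here, so the 8-bit width, the input word, and the function name dsm1_carries are hypothetical.

```python
def dsm1_carries(word: int, width: int, n_cycles: int) -> list[int]:
    """First-order (accumulator/carry) delta-sigma modulator.

    Accumulates a `width`-bit fractional word every cycle; each overflow
    (carry) adds one SC unit for that cycle, so the average increment
    equals word / 2**width.
    """
    if not 0 <= word < (1 << width):
        raise ValueError("word must fit in `width` bits")
    mask = (1 << width) - 1
    acc, carries = 0, []
    for _ in range(n_cycles):
        acc += word
        carries.append(acc >> width)   # 1 on overflow, else 0
        acc &= mask                    # keep the fractional residue
    return carries


# Hypothetical example: fractional word 77/256 on a 70-kHz SC-bank step.
bits = dsm1_carries(word=77, width=8, n_cycles=256)
avg_step_hz = 70e3 * sum(bits) / len(bits)   # equals 70 kHz * 77/256
```

The instantaneous code toggles between adjacent integers, while the quantization error is pushed toward high frequencies, where the PLL loop and the DCO's own integration attenuate it.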
V. NOISE/JITTER ANALYSIS
As the TAU adopts the DWTRs to perform time-domain signal processing, all the noise generated within the TAU sub-system is eventually reflected in the differential output as timing variance. The noise sources fall into two categories: time-domain noise, which constitutes the SWD jitter and is added to the WTRs along with the time-domain inputs, and voltage noise, which originates inside the WTRs. Each noise type has a distinct transfer function to the TAU output.

A. Time-Domain Noise
Fig. 18 depicts the time-domain noise, which appears as jitter on the SWD edges. During the pre-discharge and snapshot states, the jitter belonging to the same SWD pulse-pair is lumped into a pulsewidth-difference variance, σ_PP, which is further distinguished as σ_PP,P and σ_PP,S in the pre-discharge and snapshot states, respectively. The σ_PP terms are injected into the DWTRs "riding" on top of their time-domain inputs and finally appear at the TAU output together with the processed inputs. Therefore, the TAU's signal-processing function (10) also applies to σ_PP. Moreover, consider two facts: σ_PP,P and σ_PP,S are added to T_CKV and t_S, respectively, and the factor of 8 in (10) results from the time-amplification gain G_TA = 8 [see (12)]. Consequently, we obtain the code-dependent TAU output variance resulting from the time-domain noise
Fig. 19. Jitter contributors of an SWD pulse-pair. Note that only half of the PFD core is shown here, so σ_PFD consists of σ_fall and σ_rise contributions on both paths, yielding σ_PFD² = 2σ_fall² + 2σ_rise².
The N_C- and N_R-related coefficients represent φ_R,frac [see (11)], which is uniformly distributed between 0 and 1 in fractional-N channels; thus, they can be averaged accordingly. This yields the average TAU output variance

B. Circuit-Level Contributors of Time-Domain Noise
Up until now, σ_PP has been treated as a top-level composite noise. In this section, we break it down into circuit-level contributors so that σ_TD,out² can be estimated by combining the simulated jitter of each sub-circuit. According to Fig. 19, three physical mechanisms contribute to σ_PP. The first is the original edge source that triggers the SWD falling edges, i.e., CKV or FREF. Its edges determine the SWD pulsewidth difference, so the edge source adds its jitter σ_src to σ_PP. The second contributor is a conceptual edge sampler, which samples the time information from the edge source and transfers it to the tri-mode PFD core. For example, in the snapshot state, it represents the differential snapshot circuit (see Fig. 7), which samples t_S from CKV and FREF. To realize the required functions, the edge samplers block the unwanted edges and pass the desired ones, smearing the desired edges during propagation. Consequently, the edge sampler adds its jitter σ_samp to the SWD falling edges. The last contributor is σ_PFD, i.e., the width-difference variance of the SWD pulse-pair due to the tri-mode PFD core, which launches the pulse-pair and contributes noise to both the SWD falling and rising edges. Since the PFD reset logic is common to the differential paths, its noise contribution cancels in the final pulsewidth difference [37]. Therefore, only the output flip-flops degrade σ_PFD. Finally, σ_PP is broken down into these three contributions, where the factor 2 indicates that the edge jitter adds to both SWD paths.
For σ_PP,P, i.e., σ_PP in the pre-discharge state, the edge source is the CKV clock with jitter σ_CKV, and the edge sampler is the CKV gating block in the tri-mode PFD with jitter σ_CKVG; σ_PP,P is detailed in (24). For σ_PP,S, i.e., σ_PP in the snapshot state, the edge source comprises both the CKV and FREF clocks, and the edge sampler is the differential snapshot circuit with jitter σ_snap on either path; the σ_PP,S breakdown is given in (25). The coefficient of σ_CKV² is 1 since the CKV clock launches only one SWD falling edge. Although FREF triggers the other SWD falling edge, its jitter is ignored here for convenience, since it is usually treated as reference noise in PLL systems. Substituting (24) and (25) into (22) yields the time-domain output variance in terms of these sub-circuit jitters. Note that σ_CKV includes only the jitter of the DCO buffer, since the intrinsic DCO PN has the nature of wander (i.e., accumulated jitter) [38] and must therefore be accounted for separately.

C. Voltage Noise
In the TA state, the DWTRs convert their internal voltages into a time difference at the output. As such, any internal noise voltage manifests as a time-difference variance σ_VD,out². Two types of noise voltage dominate σ_VD,out²: the kT/C noise on the fixed capacitor C_0 and the noise voltage of the first stage of the slicing comparator (see Fig. 14). For either WTR, the output jitter due to the kT/C noise is estimated as

$$\sigma_{kT/C} \approx \frac{1}{k_{\mathrm{th1}}}\sqrt{\frac{kT}{C_0}}$$

where k is Boltzmann's constant, T is the absolute temperature, and k_th1 is the slope of the C_0 discharge curve when it crosses V_th1, the threshold voltage of the first stage of the crossing comparator. With the windowed integral theory in [39], the first stage of the crossing comparator approximately degrades the WTR output jitter by an amount set by g_m,eq, the equivalent transconductance of the PM2 and NM1 combination; C_1, the load capacitance of PM2 and NM1; γ, the excess noise factor; and V_th2, the threshold voltage of the second stage of the crossing comparator. Consequently, the TAU's output variance resulting from the voltage-domain noise is roughly twice the sum of these two per-WTR variances, where the factor 2 accounts for the differential operation.
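To get a feel for the magnitude of the kT/C term, the Python sketch below evaluates it assuming the usual voltage-noise-over-slope form, σ ≈ sqrt(kT/C_0)/k_th1, and doubles the variance for the differential pair. The capacitor value and slope are hypothetical, not the paper's design values, and the comparator contribution from [39] is not modeled; the function name is ours.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K (exact SI value)


def ktc_jitter_s(c0_farad: float, slope_v_per_s: float, temp_k: float = 300.0) -> float:
    """RMS timing jitter of one WTR from kT/C noise: sqrt(kT/C_0) / k_th1."""
    v_noise_rms = math.sqrt(BOLTZMANN * temp_k / c0_farad)
    return v_noise_rms / slope_v_per_s


# Hypothetical values: C_0 = 1 pF, discharge slope k_th1 = 0.5 V/ns, T = 300 K.
sigma_one_wtr = ktc_jitter_s(1e-12, 0.5e9)       # about 129 fs
sigma_diff = math.sqrt(2) * sigma_one_wtr        # variance doubles for P and N paths
```

The sketch makes the design lever visible: a larger C_0 or a steeper crossing slope reduces this contribution, at the cost of area or of the discharge range discussed in Section III.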

D. TAU's Input-Referred Noise and Its Contribution to PLL's Phase Noise
Summing σ_TD,out² and σ_VD,out² gives σ_TAU,out², the overall time-difference variance at the TAU output. However, we prefer to use the input-referred jitter for the PLL PN analysis, especially referred to the FREF side, e.g., [15] and [40]. According to (10) and (12), the transfer gain from the FREF-related input, i.e., t_S, to the TAU output is G_TA = 8. Therefore, σ_TAU,out² is divided by G_TA² to derive the TAU's input-referred jitter: σ_TAU,in² ≈ 3.6σ_CKV² + 2.6σ_CKVG² + 2σ_snap² + 2.3σ² … Since thermal noise dominates σ_TAU,in², its spectrum can be assumed to be uniformly spread over the reference frequency range f_REF. According to Staszewski and Balsara [15], this jitter power spectral density can be normalized to the PN spectrum by multiplying it by (2π f_CKV)², where f_CKV is the PLL output frequency. After being shaped by the closed-loop transfer function of the PLL, H_cl(f), σ_TAU,in contributes to the overall PLL PN by

$$\mathcal{L}_{\mathrm{TAU}}(f) = \frac{(2\pi f_{\mathrm{CKV}})^2\,\sigma_{\mathrm{TAU,in}}^2}{f_{\mathrm{REF}}}\,\lvert H_{\mathrm{cl}}(f)\rvert^2$$
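The normalization L(f) = (2π f_CKV)²·σ_TAU,in²/f_REF·|H_cl(f)|² can be evaluated numerically. The Python sketch below refers the output jitter to the input by dividing by G_TA = 8, and uses the 402-fs input-referred jitter, the 2668.2-MHz carrier, and the 40-MHz reference quoted in Section VII, with |H_cl| = 1 to obtain the in-band level. It ignores the spectral-domain amendment mentioned there, so it is an approximation; the function name is hypothetical.

```python
import math


def tau_pn_dbc_per_hz(sigma_out_s, f_ckv_hz, f_ref_hz, g_ta=8.0, h_cl_mag=1.0):
    """PN contribution of the TAU (dBc/Hz) for a white input-referred jitter.

    sigma_out_s: rms time difference at the TAU output; dividing by G_TA
    refers it to the t_S (FREF-side) input.
    """
    sigma_in = sigma_out_s / g_ta
    psd = (2 * math.pi * f_ckv_hz) ** 2 * sigma_in ** 2 / f_ref_hz
    return 10 * math.log10(psd * h_cl_mag ** 2)


# 402 fs input-referred, i.e., 8 x 402 fs at the TAU output; in-band (|H_cl| = 1).
level = tau_pn_dbc_per_hz(8 * 402e-15, 2668.2e6, 40e6)   # roughly -119.4 dBc/Hz
```

Because the level scales with 20·log10 of the jitter, halving σ_TAU,in would lower this in-band contribution by about 6 dB.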

A. INL Characterization and Degradation Mechanism
Generally, the nonlinearity of a typical mixed-signal circuit (e.g., a DAC or DTC) is characterized by an integral nonlinearity (INL), representing the deviation between the practical and ideal outputs across the input range. However, this definition is not directly applicable to the TAU, as it handles multiple time-domain and digital inputs. Nevertheless, if the scope is narrowed down to the time-offset cancellation in a type-II PLL system, the TAU's INL can be well defined. Consider the corresponding behavior of the TAU described in (1). t_S is the time offset to be canceled, so it can be regarded as an ideal target, equivalent to the ideal output of a DTC. (1 − φ_R,frac)·T_CKV is the term generated to cancel t_S and can thus be treated as the counterpart of the actual DTC output. Therefore, the cancellation residue t_E reflects the TAU's nonidealities.
A conceptual testbench to measure the TAU's INL is illustrated in Fig. 20(a). Two phase-locked clocks, CKV and FREF, and the digital control target φ_R,frac are input to the TAU sub-system [similar to Fig. 6(a)], emulating the TAU inputs in the proposed PLL. With this arrangement, the TAU receives a stable time base T_CKV, a sequence of incremental t_S ramps, and the corresponding φ_R,frac, which scales T_CKV to accurately cancel t_S. In the ideal case with no analog impairments, the cancellation residue t_E would reflect only the TAU's quantization error (QE), which can be precisely estimated from the RC encoder structure in Fig. 16. If the TAU's nonlinearity is included, t_E additionally reflects the INL. Therefore, the TAU's INL versus φ_R,frac can be estimated from the difference between t_E and the QE, expressed relative to T_CKV and scaled by 2^10 (before this scaling the unit is 1, i.e., the full range of T_CKV is characterized as 0 to 1). Fig. 20(b) sketches a conceptually expected INL curve of the TAU. It exhibits a piecewise-linear shape due to the TAU's coarse-fine tuning strategy. The eight segments coincide with the 3-b coarse resistive tuning. The vertical offset of each segment results from the nonidealities of the SR bank units, e.g., charge injection, clock feedthrough, and unit mismatch. The characteristic inside each segment is mainly correlated with the fine capacitive tuning. For example, the slope of each segment results from the C_0/C_U estimation error in (20) and the charge injection of the SC-bank units. Since the fine tuning is determined only by N_C during the first discharge (see Fig. 5), independently of the subsequent coarse-tuning behavior, the slopes of all the segments are almost identical.
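The INL extraction can be sketched as follows, assuming INL(φ_R,frac) = 2^10·(t_E − QE)/T_CKV with no end-point or best-fit correction applied. To keep the example self-checking, the Python code injects a hypothetical piecewise-linear error (one offset per N_R segment plus a common slope, mimicking the shape of Fig. 20(b)) and recovers it; the offsets, slope, T_CKV, and the zero QE are illustrative assumptions, and the function names are ours.

```python
def inl_lsb(t_e, t_qe, t_ckv, bits=10):
    """INL in LSB: residue minus quantization error, relative to T_CKV, times 2**bits."""
    return [(e - q) / t_ckv * 2**bits for e, q in zip(t_e, t_qe)]


def lsb_to_percent(lsb, bits=10):
    """Express an INL value in LSB as a percentage of the full range T_CKV."""
    return 100.0 * lsb / 2**bits


# Hypothetical piecewise-linear nonideality: one offset per N_R segment (LSB)
# plus a common slope of SLOPE LSB across each segment.
OFFSETS = [0.0, 0.6, -0.4, 1.1, -0.8, 0.3, 0.9, -0.5]
SLOPE = 0.5
T_CKV = 375e-12                     # hypothetical carrier period (s)

phis = [k / 1024 for k in range(1024)]
injected = [OFFSETS[int(8 * p)] + SLOPE * (8 * p - int(8 * p)) for p in phis]
t_qe = [0.0] * len(phis)            # ideal QE taken as zero for illustration
t_e = [q + x * T_CKV / 2**10 for q, x in zip(t_qe, injected)]
inl = inl_lsb(t_e, t_qe, T_CKV)
peak_inl = max(abs(v) for v in inl)
```

With this normalization, an INL of 1.7 LSB corresponds to lsb_to_percent(1.7) ≈ 0.17% of the full range, the conversion used in the next subsection.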
One may wonder how the INL curve changes in the face of a mismatch between the DWTRs. In fact, the overall piecewise-linear feature remains similar to that in Fig. 20(b), but the offset and slope of each segment change. This can be analyzed by inspecting each term in (5), which describes the WTR function. First, the offset term τ_out·ln(V_init/V_th) is supposed to cancel out in the differential output. As for each of the weighted terms, τ_out/τ_i·t_i, mismatches in the corresponding discharge RC time constants τ_out and τ_i introduce errors in the scaling of t_i. Here, the mismatch of the SR units dominates that of the RC time constants, since the capacitive mismatch can be addressed by properly sizing the SC units [7]. The detailed effects of this scaling error are case-dependent. For example, if the scaling error occurs in the fine-tuning discharge (see Fig. 21, middle), it changes the slopes of all segments by the same amount, because this discharge adopts a fixed SR configuration (RT = 8) and the corresponding mismatch introduces a fixed gain error into all the target scaling factors. In contrast, if the scaling error occurs in the coarse-tuning discharge (see Fig. 21, right), it randomly offsets each segment, since the error due to mismatch is N_R-dependent.

B. Simulated INL
Fig. 22 shows the INL curve of the TAU extracted from postlayout simulations. Under a 1-V supply (the nominal supply of the transistors used in the implemented TAU), the INL is 1.7 LSB, corresponding to 0.17% of the full range. This is better than the DTC INL of 0.4% in [42] but worse than the 0.09% in [41] (both from simulations). The TAU's INL is mainly degraded by the offsets between the coarse-tuning segments, reflecting the contribution from the charge injection of the SR units. The INL could be improved to 0.5 LSB if the relative offsets were removed by calibration.
The INL under a 1.1-V supply is also shown in Fig. 22. The slope of each segment increases significantly, degrading the INL to 2 LSB. The increased slope can be attributed to the nonlinear parasitic capacitance, which varies with the supply and thus introduces more error into the estimated capacitance ratio in the RC encoder, E(C_0/C_U). After E(C_0/C_U) is adjusted, the slopes are essentially corrected, and the INL drops to 1.2 LSB, i.e., 0.12% of the full range, the same as the DTC INL under 1.1 V in [41].
One may question the advantage of the TAU given its apparent lack of superiority in INL over best-in-class DTCs, such as [41]. In fact, the INLs presented so far were simulated under ideal constant-supply conditions and reflect only the "static" nonlinearity. In practice, the DTC delay is easily disturbed by instantaneous supply fluctuations and thus suffers from a certain "dynamic" nonlinearity. For this reason, Wu et al. [7], Markulic et al. [22], and Santiccioli et al. [43] report significant efforts on stabilizing the supply.
This supply-related nonlinearity issue is examined with a 10-b virtual DTC example emulating the resolution-drift behavior reported in [41], where the DTC resolution changes (becomes finer) by 14% when the supply increases from 1 to 1.1 V. Therefore, if the estimated DTC gain K_DTC is not adjusted accordingly, the DTC output delay exhibits an error that is linearly proportional to the expected value. Fig. 23(a) shows the trend lines of the expected delay error of this reference DTC under 1- and 1.1-V supplies, with the expected K_DTC (used for converting the expected delay to the DTC control word) frozen at the mean value of these two cases. The two trend lines are characterized with a testbench similar to Fig. 20(a), so they converge to 0 at φ_R,frac = 1, corresponding to an expected delay of 0, and reach their maximum amplitude at φ_R,frac = 0. One may doubt the relevance of freezing the estimated K_DTC, since a background calibration can constantly track the K_DTC drift. However, such a calibration might be too slow to respond to fast supply disturbances. Fig. 23(a) also shows a case with such a fast supply ripple, which fluctuates sinusoidally between 1 and 1.1 V in synchronism with φ_R,frac. The corresponding delay error of the virtual DTC oscillates between the two aforementioned trend lines, and the peak-to-peak error can reach 140 LSB.
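A rough Python model of the virtual-DTC experiment illustrates why a frozen gain estimate fails under fast supply ripple. Its assumptions are ours: the DTC resolution varies linearly with supply and is 14% finer at 1.1 V than at 1.0 V, the frozen estimate is the mean of the two resolutions, and the ripple completes four periods per φ_R,frac sweep. Under these assumptions, the ripple case yields a peak-to-peak error of roughly 135 LSB, the same order as the 140 LSB quoted above; the exact figure depends on how the drift and the frozen K_DTC are defined and on the ripple phase. The function name is hypothetical.

```python
import math


def dtc_error_lsb(phi_r_frac, supply_v, drift=0.14, bits=10):
    """Delay error (LSB) of a virtual DTC whose gain estimate is frozen.

    Resolution is modeled relative to its 1.0-V value, linear in supply,
    and `drift` finer at 1.1 V; the frozen estimate is the 1.0/1.1-V mean.
    """
    r = 1.0 - drift * (supply_v - 1.0) / 0.1
    r_frozen = 1.0 - drift / 2
    expected_lsb = (1.0 - phi_r_frac) * 2**bits   # expected delay: (1 - phi) * T_CKV
    return expected_lsb * (r / r_frozen - 1.0)


N = 256
phis = [k / N for k in range(N)]
trend_1v0 = [dtc_error_lsb(p, 1.0) for p in phis]
trend_1v1 = [dtc_error_lsb(p, 1.1) for p in phis]
# Sinusoidal ripple between 1.0 and 1.1 V, synchronized with the phi sweep.
ripple = [dtc_error_lsb(p, 1.05 + 0.05 * math.sin(2 * math.pi * 4 * p)) for p in phis]
pk_pk = max(ripple) - min(ripple)
```

Note that each trend line is proportional to 1 − φ_R,frac, so the error is largest where the expected delay is longest, in line with the trend lines of Fig. 23(a).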
For comparison, the t_S cancellation error of the TAU is simulated under the same supply-ripple condition. According to Fig. 23(b), the peak-to-peak error is merely ~8 LSB. This benefits from the operating principle of scaling the "golden" time base and indicates that the TAU offers stronger immunity to aggressors and better "dynamic" linearity than the DTC. One may wonder why the cancellation error of the TAU under supply ripple exceeds the boundaries set by the INL curves under stable supplies (i.e., 1 and 1.1 V). This stems from our specific WTR implementation, in which the bottom plates of the SC units are connected to V_DD (see Fig. 4). The supply ripple affects the internal voltage of the WTRs (i.e., V_C) through the conducting SC units and the parasitic switch capacitance, ultimately degrading the INL.

C. INL Calibration
According to Fig. 20(b), the INL of the TAU is dominated by the coarse-tuning offsets and the fine-tuning slope, which are correlated with N_R and φ_CT in Fig. 16, respectively. To combat the INL degradation from these two sources, a piecewise calibration emulating [44] supplements the RC encoder. The calibration operates while the PLL is locked by observing the TDC output, D_TDC. As shown in Fig. 24, the calibration consists of two parallel paths: one pre-distorts the offset correlated with each possible N_R value, and the other combats the slope related to φ_CT. Fig. 24(a) details the offset calibration. The offset related to each N_R value affects D_TDC (read in the subsequent FREF cycle) and can thus be estimated by accumulating the corresponding D_TDC, similar to [22]. Here, μ_RT is a constant controlling the accumulation speed. By subtracting the estimated offsets, OS0 to OS7, from the fine-tuning path, the effects of the coarse-tuning offsets are compensated. Prior to the subtraction, the estimated offsets are rounded to the same resolution as φ_CT by a ΔΣ modulator, so that their fine resolution is not masked by the quantization error of the fine-tuning path. Meanwhile, a constant positive phase φ_const is added along with the rounded offset to prevent underflow of the fine-tuning path due to a potentially negative input. Similar to the 3T_CKV/8 offset for metastability mitigation, the extra φ_const merely shifts the t_S range without causing functional issues. While the calibration is running, the offset registers keep updating until the average D_TDC corresponding to each N_R becomes zero. This indicates that the offsets have been well compensated and have become invisible to the PLL. Fig. 24(b) depicts the fine-tuning slope calibration, which detects the slope error by correlating D_TDC with the fine-tuning target φ_CT (i.e., accumulating their product), similar to the LMS calibration of K_DTC in [8]. Here, μ_CT is a constant controlling the accumulation speed. The correlation output N_URT is used to correct the capacitance ratio C_0/C_U, which significantly influences the fine-tuning slope. Instead of directly updating the estimate E(C_0/C_U), which may require a long word length and increased hardware cost, we tune the physical ratio C_0/C_U: the nominal fixed capacitor C_0 is split into a "real" fixed C_0 and an SC bank with unit capacitance C_U. N_URT is dithered by a ΔΣ modulator before adjusting the number of active C_U units, tuning the "real" capacitance ratio C_0/C_U until the slope error vanishes.
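The two calibration paths can be illustrated with a toy Python model. Each simulated FREF cycle draws φ_R,frac uniformly (as a large fractional FCW would), the "TDC" reads the uncompensated residue in LSB, the offset path accumulates D_TDC into OS[N_R] with step μ_RT, and the slope path accumulates D_TDC·φ_CT with step μ_CT (an LMS correlation). This is a sketch under stated assumptions: the N_R/φ_CT split by 3-bit truncation, the step sizes, and the true offsets and slope are hypothetical, and the PLL loop dynamics, the one-cycle TDC latency, the TDC quantization, the ΔΣ rounding, and φ_const are all omitted. The function name is ours.

```python
import random


def run_piecewise_calibration(true_offsets, true_slope, n_iter=20000,
                              mu_rt=0.05, mu_ct=2.0, seed=1):
    """Toy model of the two-path piecewise INL calibration (units: LSB).

    offset path: OS[N_R]   += mu_RT * D_TDC
    slope path : slope_est += mu_CT * D_TDC * phi_CT   (LMS correlation)
    """
    rng = random.Random(seed)
    os_est = [0.0] * len(true_offsets)
    slope_est = 0.0
    for _ in range(n_iter):
        phi = rng.random()                 # phi_R,frac spread over [0, 1)
        n_r = int(8 * phi)                 # coarse code (3-bit truncation)
        phi_ct = phi - n_r / 8             # fine residue in [0, 1/8)
        # Residue seen by the TDC after the current compensation.
        d_tdc = (true_offsets[n_r] - os_est[n_r]) + (true_slope - slope_est) * phi_ct
        os_est[n_r] += mu_rt * d_tdc
        slope_est += mu_ct * d_tdc * phi_ct
    return os_est, slope_est


true_os = [0.0, 0.6, -0.4, 1.1, -0.8, 0.3, 0.9, -0.5]   # hypothetical, LSB
os_est, slope_est = run_piecewise_calibration(true_os, true_slope=8.0)
```

Because φ_R,frac visits every segment and every position within a segment, the offset and slope contributions are separable and both estimates converge. If φ_R,frac were instead held nearly constant, as in a near-integer channel, the model could no longer tell the offset of the visited segment from the slope, which is the mutual interference that motivates running the calibration in the foreground.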
Since both calibration paths rely on the same D_TDC, they are likely to interfere with each other when the PLL operates in a near-integer channel, where both N_R and φ_CT change very slowly. This is because it is difficult to distinguish the D_TDC contributions of the offsets and the slope in the absence of a t_S dithering mechanism in the overall PLL system, such as that provided by a multi-modulus divider dithered by a high-order ΔΣ modulator. To minimize such mutual interference, the calibration works in the foreground: it is performed only under well-behaved conditions, such as at specific large fractional FCWs. After the calibration is done, the results are frozen and used for nearby channels. The absence of background calibration does not significantly degrade the TAU's performance, since the TAU is insensitive to voltage and temperature variations.

VII. MEASUREMENT RESULTS
The proposed PLL is fabricated in 40-nm CMOS and occupies an active area of 0.31 mm² [excluding output drivers and debugging SRAMs; see Fig. 25(a)]. With a 40-MHz reference clock, it synthesizes 2.6-4.1 GHz. Fig. 25(b) shows its power breakdown at 2668.2 MHz. The overall PLL consumes 3.48 mW, dominated by the DCO and its buffer, which consume 2.3 mW from a 1.1-V supply. All other blocks are supplied with 1.0 V. The time-mode part (e.g., TAU, TDC, and the clock divider for DCO dithering) and the digital logic consume 0.65 and 0.52 mW, respectively. Fig. 26(a) shows the measured PN at 2668.2 MHz. The integrated rms jitter (integrated from 10 kHz to 40 MHz, including all spurs) is 182 fs, almost identical to that in the nearby integer-N channel (177 fs at 2640 MHz). With a total power consumption of 3.48 mW, this PLL achieves a jitter-power FoM [49] of −249.4 dB. Fig. 26(b) compares the measured PN with its s-domain prediction, showing close agreement at offset frequencies above 50 kHz. In this s-domain model, the input-referred jitter of the TAU is 402 fs, estimated by simulating the jitter of each sub-circuit and combining the contributors via (30). The corresponding PN contribution is obtained with an amended version of (31) that combines the sub-block noise in the spectral domain. The noise contribution of each sub-circuit is also listed in Fig. 26(b). Fig. 26(c) shows the integrated rms jitter across frequencies with the same fractional FCW as at 2668.2 MHz, i.e., FCW_frac ≈ 0.7. The measured jitter degrades as the frequency increases. We suspect that the dramatic degradation between 3300 and 3800 MHz is attributable to the nearby inductors in this SoC and to the unoptimized implementation of the DCO SC tuning banks, which support wideband direct phase modulation [36].
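The quoted FoM can be checked with the widely used jitter-power figure of merit, FoM = 10·log10[(σ_t/1 s)²·(P/1 mW)]. We assume this is the definition of [49]; with 182 fs and 3.48 mW it reproduces the −249.4 dB stated above. The function name is ours.

```python
import math


def jitter_power_fom_db(rms_jitter_s: float, power_w: float) -> float:
    """FoM = 10*log10((sigma_t / 1 s)**2 * (P / 1 mW)); more negative is better."""
    return 10 * math.log10(rms_jitter_s ** 2 * (power_w / 1e-3))


fom = jitter_power_fom_db(182e-15, 3.48e-3)   # about -249.4 dB
```

Since jitter enters squared, a 10% jitter reduction improves the FoM by about 0.9 dB, whereas the same relative power reduction improves it by only about 0.46 dB.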
To demonstrate the TAU's advantage in suppressing fractional spurs, the PLL output spectrum is measured in a near-integer channel of 2680.01 MHz (FCW ≈ 67.00025). According to Fig. 27(a), the worst-case fractional spur is −44.67 dBc. Note that this is measured before any TAU calibration, e.g., of the global gain or the INL. This compares favorably with the worst-case fractional spurs reported for DTC-based PLLs that adopt only gain calibration without further DTC linearity enhancement, e.g., −37 dBc in [20] and −42 dBc in [8]. Our fundamental design choice, adopting T_CKV (the PLL carrier period) as the basis for the time-offset cancellation, is thus validated: this "golden" base automatically scales the global gain of the TAU transfer function, avoiding any need for the corresponding calibration.
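The spur locations follow directly from the FCW: in a near-integer channel, fractional spurs appear at multiples of the distance between the FCW and the nearest integer, scaled by f_REF. The helper below (its name is ours) computes these offsets; for FCW ≈ 67.00025 and f_REF = 40 MHz, the spacing is 10 kHz, so the fifth fractional spur lies at 50 kHz, consistent with the −60.74-dBc spur at 50 kHz reported for Fig. 27(b).

```python
import math


def fractional_spur_offsets_hz(fcw: float, f_ref_hz: float, n_spurs: int = 5) -> list[float]:
    """Offsets of the first n fractional spurs from the carrier.

    The spur spacing is the distance of FCW from the nearest integer,
    multiplied by the reference frequency.
    """
    frac = fcw - math.floor(fcw)
    spacing = min(frac, 1.0 - frac) * f_ref_hz
    return [k * spacing for k in range(1, n_spurs + 1)]


spur_offsets = fractional_spur_offsets_hz(67.00025, 40e6)   # 10, 20, ..., 50 kHz
```

Moving the FCW closer to an integer compresses the spur spacing, pushing more spurs inside the loop bandwidth, which is why near-integer channels are the stress case for this measurement.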
The fractional spurs in Fig. 27(a) are dominated by the TAU's INL, chiefly due to the coarse-tuning nonideality and the gain error in fine tuning. After the INL is compensated with the piecewise calibration, the worst-case fractional spur becomes −60.74 dBc at 50 kHz, the fifth fractional spur in Fig. 27(b). In this scenario, the integrated rms jitter is 236 fs. The worst-case fractional spur levels and integrated rms jitter are swept across the fractional channels close to 2680 MHz. As shown in Fig. 27(e), all the spur levels are below −59 dBc.
Since the TAU uses as its time basis T_CKV, which is constantly tracked by the PLL, the TAU-based PLL is expected to be inherently resilient to environmental changes, i.e., supply and temperature drifts. To verify this, we froze the TAU's INL calibration setting and then measured the spur levels under environmental changes. From Fig. 27(b) to (c), the TAU's supply was increased from 1.0 to 1.1 V, and the worst spur was −54.37 dBc. From Fig. 27(c) to (d), the ambient temperature was increased from 19 °C to 85 °C, and the worst spur level remained below −51.7 dBc. These are noteworthy improvements over DTC-based counterparts, which would generate substantial spurs if their transfer-function drift could not be compensated. For example, Chen et al. [41] reported a 14% DTC resolution drift when the supply increased from 1.0 to 1.1 V, and, as measured in [20], a 10% DTC gain error can cause an in-band fractional spur higher than −30 dBc. Table I summarizes and compares the performance of the proposed PLL with state-of-the-art fractional-N PLLs. This work achieves a competitive spur level below −59 dBc and a state-of-the-art tradeoff between jitter and power, i.e., an FoM of −249.4 dB under a low-power constraint.

VIII. CONCLUSION
This article introduces a fractional-N PLL based on the proposed TAU, which extracts the phase error by calculating a weighted sum of its time-domain inputs derived from timestamps of the reference and DCO clocks. The prototype PLL demonstrates low spur levels that are robust under supply and temperature drift. This spurious performance benefits from the phase-error-extraction strategy of scaling the "golden" time base, i.e., the DCO period, to cancel the PD input, which automatically corrects the TAU's transfer function. This methodology-level improvement indicates the potential of exploring this new phase-detection category for low-spur clock generation.