Mismatch Analysis of DTCs With an Improved BIST-TDC in 28-nm CMOS

Nonlinearity of a digital-to-time converter (DTC) is pivotal to spur performance in DTC-based all-digital phase-locked loops (ADPLLs). In this paper, we characterize and analyze the mismatch of cascaded-delay-unit DTCs. Through an improved built-in self-test (BIST) time-to-digital converter (TDC) assisted by a phase/frequency detector (PFD), a measurement system with sub-half-ps accuracy is constructed to conduct the characterization. The DTCs are fabricated in 28-nm CMOS; their transfer functions are measured, and mismatches are compared against Monte-Carlo simulation results. The integral nonlinearity (INL) results are compared against each other and converted to the in-band fractional spur level that would result if the DTC were deployed in the ADPLL. The BIST-TDC system thus characterizes the on-chip delays without expensive equipment or complex setup. The effectiveness of adding a PFD into the $\Delta\Sigma$ loop is validated. The entire BIST system consumes 0.6 mW and uses a system self-calibration algorithm to tackle the analog blocks' nonlinearities.


I. INTRODUCTION
ALL-DIGITAL phase-locked loops (ADPLLs) offer extensive reconfigurability and require a small area for their (digital) loop filter in scaled CMOS [1]-[6]. A digital-to-time converter (DTC) is a critical block in fractional-N ADPLLs. The conventional phase-detection circuitry, made up of a phase/frequency detector (PFD) and a charge pump, operates in the voltage/current domain and suffers from a degraded dynamic range due to the reduced voltage headroom and stronger channel-length modulation in current mirrors. In contrast, time-domain quantization benefits from the steeper transition edges in advanced technology nodes [2]. Consequently, time-domain converters promise faster phase detection and lower power consumption at lower supply voltages [6]-[9].
As shown in Fig. 1(a), the DTC, whose mismatch is the main investigation target of this work, is placed in the reference signal path in front of the time-to-digital converter (TDC) [5]. The DTC's nonlinearity is directly reflected in the ADPLL output spectrum, producing in-band fractional spurs which, by definition, cannot be suppressed by the loop filter. Thus, DTC architectures with better linearity, such as those using a constant-slope technique [8], [10], are being explored. Since the DTC works periodically and is controlled by the accumulated fractional control word, $\mathrm{FCW}_{\mathrm{frac}}$, its nonlinearity and the in-band spur level have a mathematical relationship. We confine the study of this relationship to the ADPLL architecture shown in Fig. 1(a). We assume the nonlinearity forms a pure sinusoidal shape covering the whole period of the digitally controlled oscillator (DCO), $T_{\mathrm{DCO}}$, with an amplitude of $A_{\mathrm{nonl}}$, where $A_{\mathrm{nonl}}$ is normalized to one oscillator period. Then, the spur level can be expressed as [11]:
$$L = 20 \log_{10}\left(\pi A_{\mathrm{nonl}}\right) \quad (1)$$
That relationship is plotted in Fig. 1(b). It can be observed that if a −40 dBc in-band fractional spur is desired, the nonlinearity amplitude should satisfy $A_{\mathrm{nonl}} < 0.32\%$. Note that in this specific example, INL (normalized to LSB) and the nonlinearity amplitude are related by where DTC unit stage delay [3], [6], [12]. Their nonlinearity is dominated by device mismatches, which can be optimized through proper device sizing and symmetrical layout. In practice, they can support the ≤0.3% peak INL harmonic amplitude to guarantee the <−40 dBc in-band fractional spurs [3], [5], [12]. A phase rotation technique can be further applied to reduce the DTC's nonlinearity influence by restricting the effective DTC range to a smaller portion of $T_{\mathrm{DCO}}$ [7]. Two delay-cascading DTC architectures adopted in the previous ADPLLs [6], [12] are analyzed in this work. Although they were fabricated with different transistor sizes and even in different technologies, the measurement results are normalized and compared.
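As a quick numerical check of (1), the following Python sketch evaluates the spur level for a given normalized nonlinearity amplitude and inverts the formula to find the amplitude that meets a spur target. The function names are illustrative and are not taken from the paper.

```python
import math


def spur_level_dbc(a_nonl):
    """In-band fractional spur level from Eq. (1): L = 20*log10(pi * A_nonl).

    a_nonl is the sinusoidal nonlinearity amplitude, normalized to one DCO period.
    """
    return 20.0 * math.log10(math.pi * a_nonl)


def max_amplitude_for_spur(target_dbc):
    """Invert Eq. (1): the largest A_nonl that still meets the spur target."""
    return 10.0 ** (target_dbc / 20.0) / math.pi


if __name__ == "__main__":
    print(f"A_nonl = 0.32%  -> {spur_level_dbc(0.0032):.2f} dBc")
    print(f"-40 dBc target  -> A_nonl = {100 * max_amplitude_for_spur(-40.0):.3f}%")
```

An amplitude of 0.32% evaluates to just under −40 dBc in magnitude (about −39.95 dBc), and a −40 dBc target corresponds to a limit of about 0.318%, in line with the bound quoted in the text.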
To perform an on-chip delay characterization, namely measuring the DTC transfer function, TDCs are widely used [13]-[22]. In general, the "measurement system" should have a precision one order of magnitude finer than the resolution of the DTC under test. In our case, the TDC resolution should be at the level of 1 ps, or even finer. This basic requirement excludes flash TDCs, whose resolution is at the gate-delay level [13]. Vernier TDCs [14], [15] can achieve sub-gate resolution, but they suffer from large mismatches between the fast and slow paths, which requires a non-trivial calibration for each delay stage. ADC-based TDCs can provide the desired resolution with reasonable linearity [16], [17]. However, the covered delay range is limited. Increasing the range while keeping the same resolution will inevitably sacrifice the linearity, which still needs to be calibrated. The $\Delta\Sigma$ or noise-shaping TDCs [18]-[28] can relax the range-linearity trade-off. Unfortunately, their gain changes over PVT variations, which is normally calibrated by an on-chip PLL. In conclusion, the conventional TDC architectures, when employed to measure the DTC transfer function, require complex and expensive lab equipment or extra on-chip circuitry to calibrate the gain and mismatch of the TDC itself.
To overcome the aforementioned drawbacks, we have previously proposed to wrap the DTC in a loop of low hardware complexity, creating a first-order $\Delta\Sigma$ TDC [1], [29]. A system self-calibration algorithm was utilized to correct non-ideal analog effects. To verify the DTC mismatch analysis with a highly accurate measurement, this work improves the precision of the BIST $\Delta\Sigma$-TDC by reducing the charge pump noise by means of an additional PFD. The system noise contributions are analyzed to verify the proposed technique.
In addition to the targeted ADPLL application, the proposed BIST-TDC can also be used for characterizing the gain of other TDCs or to replace the digital delay-locked loop in outphasing transmitters [30].
This paper is organized as follows. Mismatches of three popular DTC architectures are analyzed in Section II. Section III describes the improved BIST $\Delta\Sigma$-TDC. Measurement results are presented and discussed in Section IV. Section V concludes the paper.

II. ANALYSIS OF DTC MISMATCHES
In addition to the two separate DTC architectures suitable for an ADPLL, a third DTC architecture with around 100 fs resolution is also implemented. It explores the vernier concept applied to the DTC design by means of MOS capacitance matching. Every DTC is made up of 32 cascaded delay units in a chain, targeting 5-bit performance. Mismatches between delay units are investigated in this section.
Shown in Fig. 2, the first DTC was originally proposed in [6] and deployed in the first-ever sub-1 mW ultra-low-power ADPLL. The ADPLL architecture is similar to that in Fig. 1(a), yielding the same in-band fractional spur sensitivity to the DTC nonlinearity. The top part of Fig. 2 shows the DTC block diagram. Extra delay units are placed at each side of the delay chain. Acting as dummies, they are intended to keep the loading of the first and last cells similar to that of the other delay units. Every delay unit needs two delay control codes: EK for selecting the clock feeder ($M_5$-$M_9$) and EN for enabling the delay element ($M_{10}$-$M_{17}$). The input signal first goes through only one clock feeder, selected by EK. Then, it propagates through the remaining delay stages towards the output port. Therefore, the delay elements after the selected clock feeder are enabled through EN = 0, while the delay elements placed in front of the selected clock feeder are disabled through EN = 1. The disabled delay elements present a high output impedance to the acting transition edges. Consequently, they do not affect the transition timing of the input's critical edge. Extracted from the measurement results [1], [6], this DTC's transfer function of delay versus digital control word (DCW) manifests non-monotonicity. Considering its delay-cascading structure, it is unexpected to observe that choosing the shorter path may yield a longer delay. For example, the signal going through the clock feeder selected by EK30 can reach the OUT node earlier than the signal going through the path selected by EK31. This phenomenon has not been clearly explained in the previous publications but will be addressed in this section with the help of dedicated Monte-Carlo simulations.
As discussed in [1], when the rising edge of the input signal (FREF) is the critical edge to be delayed, the clock feeder inverts it into a falling edge, which then propagates through the delay elements. Therefore, $M_9$ should be large enough to suppress the transistor noise. However, its gate capacitance is driven by the input signal, putting more pressure on the input buffer's driving capability. The devices/wires marked in red are driven by FREF, which calls for huge input buffers and a symmetrical clock-tree distribution for the FREF node. Even so, the rise time of FREF is still quite long. It was inferred in [1] that after the rising edge reaches the gate of $M_{16}$, $M_{17}$ may not be fully enabled due to its huge size, which was intentionally chosen to reduce the discharging on-resistance. This was an initial guess at the cause of the non-monotonicity. Monte-Carlo simulations of the whole DTC chain are very time-consuming and do not make it easy to spot the main mismatch source. To remedy that, the Monte-Carlo simulations in this work focus on a selected delay unit. Figure 3 provides quantitative mismatch information with 'local-only' variations and reveals the root cause, which has not been identified before. Note that the delay is also affected by the input signal's transition characteristics and the output loading. The same driver and loading are placed in the testbench of the individually characterized delay unit.
Figure 3 demonstrates the delay variations for the selected clock feeder, its corresponding delay element, and the whole delay unit. The active clock feeder contributes the majority of the mismatch. The standard deviation of the clock feeder's delay is up to 0.52 LSB. By comparison, the delay element contributes merely σ = 0.12 LSB. Furthermore, the delay element always presents a positive delay, so its mismatch does not intrinsically cause any non-monotonicity. In contrast, two neighboring clock feeders with a delay difference larger than one LSB end up with non-monotonicity, which is very likely to happen based on Fig. 3(a). It should be pointed out that the propagation time of the clock feeder is about 9× the DTC resolution, i.e., one order of magnitude higher. This DTC architecture demands that the clock feeder have better mismatch performance than the delay element. However, small sizes of $M_{5,6,7}$ are favored to reduce the input driver's loading; enlarging them would further increase the input signal's rise time. This design trade-off is hard to balance. Replacing the transmission gate with an AND gate can relax this trade-off by reducing the parasitic loading for the input driver.
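To get a feel for how likely such non-monotonicity is, the sketch below estimates the probability that the delays of two neighboring clock feeders differ by more than one LSB in the harmful direction. It is a simplified model and not the simulation behind Fig. 3: it assumes independent, zero-mean Gaussian clock-feeder delays with σ = 0.52 LSB, ignores the delay-element mismatch, and uses a hypothetical function name.

```python
import math


def prob_non_monotonic(sigma_feeder_lsb, step_lsb=1.0):
    """P(neighbouring clock-feeder delay difference < -step_lsb).

    Assumption: the two feeder delays are independent N(0, sigma^2), so their
    difference is N(0, 2*sigma^2). A code step turns negative when that
    difference cancels the nominal 1-LSB step.
    """
    sigma_diff = math.sqrt(2.0) * sigma_feeder_lsb
    # Gaussian tail probability: Q(x) = 0.5 * erfc(x / sqrt(2))
    return 0.5 * math.erfc(step_lsb / (sigma_diff * math.sqrt(2.0)))


if __name__ == "__main__":
    p = prob_non_monotonic(0.52)
    print(f"per-step probability: {p:.3f}")
    # Chance that at least one of 31 code steps is non-monotonic, treating the
    # steps as independent (a further simplification: adjacent steps share a feeder).
    print(f"at least one in 31 steps: {1 - (1 - p) ** 31:.2f}")
```

Under these assumptions the per-step probability is just under 10%, so a 32-unit chain is very likely to contain at least one non-monotonic step, whereas the same calculation with σ = 0.12 LSB gives a negligible probability.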

B. DTC #2: Variable-Resistance Delay Line
Variable delay of the second DTC, shown in Fig. 4, is controlled by selecting an on-resistance in one discharging path. To align with the digital control interface, $M_6$ can be either enabled or disabled. $M_5$ is always on, with its gate voltage fixed to $V_{\mathrm{DD}}$. $M_5$ and $M_6$ are connected in parallel in the first inverter's discharging path. The impedance seen from the $M_2$ source to ground is determined by the control signal EK. The on-resistance of $M_5$ dominates when $M_6$ is disabled. When EK = 1, the parallel on-resistance of $M_5$ and $M_6$ is significantly lower than when EK = 0. Extra delay units are inserted at the input and output ports of the DTC to reshape the transition edges and mimic the loading environment for the first (EK0) and last (EK31) core delay units.
The discharging time of the first inverter's output node, namely the second inverter's input, differs between digital control codes. It can affect the second inverter's output transition time and shape, which will disturb the following delay unit. Thus, even though a given number of delay stages could be enabled at arbitrary positions within this delay chain, the implemented selection always starts from the front ones. With the same input transition edges for the active cells, the delay mismatch is limited to the devices. Considering one delay unit, the capacitances, including the parasitics from the gates of $M_{3,4}$, the drains of $M_{1,2,5,6}$, the source of $M_2$ and the interconnecting wires, must be discharged until the threshold voltage of the second inverter is reached. The transistors' capacitance and the parasitic capacitance can be optimized through a symmetrical layout. The on-resistance of $M_{2,5,6}$ is another source of mismatch, which also relies on layout optimization. Larger transistors, which burn more power, can be traded for better mismatch performance under the same resolution requirement. The Monte-Carlo simulation results shown in Fig. 5 demonstrate a much better standard deviation compared to DTC #1: one standard deviation is only 0.09 LSB in the schematic-level simulations.

C. DTC #3: "Vernier" DTC
Variable delay of the third DTC is realized through adjusting its capacitive loading difference. Its schematic is shown in Fig. 6. The MOS capacitors ($M_5$ and $M_6$) are placed at the first and second inverters' output nodes. Two extra inverters are cascaded to isolate the variable capacitance loading from other delay units. Generally, the MOS capacitance experiences a large variation during the transition from the strong inversion region to the depletion region. $M_5$ is an NMOS and $M_6$ is a PMOS device. The resolution is determined by the difference between these two capacitances, on a principle similar to that of a vernier TDC, but using only a single path. This architecture is certainly not practical because the sub-ps resolution depends heavily on the matching. That capacitance difference is even smaller than the parasitic capacitance. However, it is interesting to discover the extent of variations the measurement results can produce.
The MOS capacitance of $M_5$/$M_6$ against the gate voltage is plotted in Fig. 7(a) for different source/drain voltages. It is apparent that both NMOS and PMOS capacitors exhibit the largest and smallest gate capacitances when they are in the strong inversion and depletion regions, respectively. Since the gate voltage is bounded within 0 to 1 V, the accumulation region does not arise.
Taking the rising edge as the critical one, while initially only considering $M_5$, the discharging process experienced by $M_5$ corresponds to two different gate capacitance trajectories, as Fig. 7(a) shows. The integrated influence when the gate voltage changes from 1 V to $V_{\mathrm{th},M3}$ yields two different discharging times under the scenarios of $V_{\mathrm{SD}}$ = 0 V and $V_{\mathrm{SD}}$ = 1 V. $V_{\mathrm{th},M3}$ denotes the threshold voltage of the PMOS $M_3$, which is around half of $V_{\mathrm{DD}}$ in this case. The averaged capacitance of $M_5$ when $V_{\mathrm{SD}}$ = 0 V is much larger than that when $V_{\mathrm{SD}}$ = 1 V. In other words, EK = 0 can increase the unit delay. For $M_6$, the critical edge turns into a rising edge. The integrated influence when its gate voltage increases from 0 V to $V_{\mathrm{th},M8}$ affects the unit stage delay, where $V_{\mathrm{th},M8}$ represents the threshold voltage of $M_8$. By comparison, the averaged capacitance of $M_6$ when $V_{\mathrm{SD}}$ = 0 V is relatively flat and much smaller than when $V_{\mathrm{SD}}$ = 1 V. Therefore, EK = 0 will reduce the unit delay if only $M_6$ is considered. These two MOS capacitors have opposite influences on the unit delay, yielding a finer resolution when they are combined. However, as Fig. 7(b) shows, one sigma of the delay variation is as large as 4.11 LSB. This architecture is therefore not very practical. On the other hand, the mismatches are amplified, which makes characterization and comparison easier.

III. IMPROVED FIRST-ORDER TDC
Following up on our previous work in [1], a PFD is inserted in front of the charge pump, as presented in the top-level architecture in Fig. 8(a). The blocks labeled in red highlight this work's implementation contributions: a PFD for the system-level improvement and DTCs for the block-level analysis.
The BIST-TDC is made up of the DTCs under test, a charge pump (CP), a clocked comparator, and the digital control logic. The timing diagram is illustrated in Fig. 8(b). The system needs one external input clock $S_{\mathrm{in}}$ with period $T_i$ to normalize the DTC delay. Synthesized and auto-placed-and-routed dividers generate three low-frequency clocks, $S_1$, $S_0$, and CK. The frequency of each generated clock is one-quarter of that of the external input clock. $S_1$ has a 50% duty cycle while $S_0$ has a 25% duty cycle. Their falling edges are aligned. The comparator is driven by CK, whose rising edge leads $S_1$ by one $T_i$. The loop works in such a way that the top-plate voltage of the integration capacitor $C_{\mathrm{int}}$, namely $V_{\mathrm{cap}}$, toggles around the reference voltage $V_{\mathrm{ref}}$, which is connected to the negative input of the comparator. The probability of '1' appearing in the comparator's '0/1' bit-stream maps the delay under test.
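The statement that the density of '1's in the bit-stream maps the delay can be illustrated with a generic first-order $\Delta\Sigma$ loop. The sketch below is a textbook behavioral model and not the circuit of Fig. 8: the input `x` stands in for the normalized delay, the accumulator plays the role of the capacitor voltage relative to the reference, and all names are assumptions made for illustration.

```python
def ones_density(x, n_cycles=10_000):
    """Fraction of '1' decisions of an ideal first-order delta-sigma loop.

    x is a constant input in [0, 1] (a stand-in for the normalized delay).
    Each cycle the comparator decides on the sign of the accumulated error,
    then the integrator adds the input and subtracts the fed-back decision.
    """
    acc = 0.0   # integrator state (plays the role of V_cap - V_ref)
    ones = 0
    for _ in range(n_cycles):
        bit = 1 if acc >= 0.0 else 0   # clocked comparator
        acc += x - bit                 # charge by x, discharge by the decision
        ones += bit
    return ones / n_cycles


if __name__ == "__main__":
    for x in (0.1, 0.3, 0.75):
        print(f"input {x:.2f} -> density of ones {ones_density(x):.4f}")
```

Because the integrator state stays bounded, the density of ones converges to the input with an error that shrinks in proportion to the number of cycles, which is why long bit-streams give fine resolution in this idealized, noise-free model.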

A. Operational Principle
Originally, $S_0$ directly controls the charge switch and $S_d$ controls the discharge switch. The duty cycle of $S_0$ is constantly 25%, while the pulsewidth of $S_d$ ranges from 0% to 50%. When the loop parameters are properly selected, $V_{\mathrm{cap}}$ can stably toggle around $V_{\mathrm{ref}}$, which is set here to 0.5 V. A more detailed mathematical explanation of the BIST-TDC working scheme can be found in [1].
It can be noticed that when $S_0$ and $S_d$ are connected to the charge pump directly, there is a time window lasting $T_i$ in every cycle during which the charging and discharging paths are enabled at the same time. From the system point of view, those currents are wasted because the charging and discharging overlap time is fixed and contains no DTC delay information. Moreover, the current noise contributions from the current sources $I_c$ and $I_d$ are added onto $V_{\mathrm{cap}}$ during this unproductive time. With the help of the PFD, the charging and discharging signals become $S_{\mathrm{up}}$ and $S_{\mathrm{dn}}$. Therefore, shortened pulses are applied to the charge pump without affecting the delay information under test. On the other hand, the added PFD introduces extra jitter. However, this can be neglected compared to the optimized charge-pump noise.

B. System Self-Calibration
The system calibration scheme is shown in Fig. 9. The motivation is to remove the influence of non-ideal effects on the DTC transfer function. The major sources are the charging/discharging current mismatch of the charge pump and the comparator offset. In the system calibration mode, the DTC under test is bypassed from the measurement path. A digital calibration block is inserted between $S_0$ and the MUX to help generate an equivalent delay by omitting some pulses from $S_0$ in response to the N and M inputs. For example, when N = 1 and M = 25, one pulse is omitted every 25 $S_0$ pulses. In this way, the equivalent delay equals $0.0385\,T_i$, or $\frac{N}{M+N} T_i$ in general. The calibration block provides the equivalent delay with high precision because all the edges of $S_0$ and $S_1$ are triggered by the rising edges of $S_{\mathrm{in}}$. Note that the accuracy of the BIST-TDC in this work is assumed by the top-level mathematical constructs and has not been independently verified through direct (although extremely complicated and laborious) measurements, such as [10], [31].
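The equivalent calibration delay can be checked numerically with the short sketch below. The function name is illustrative, and the 5 ns value for $T_i$ is an assumption that corresponds to the 200 MHz input clock described in the experimental setup.

```python
from fractions import Fraction


def equivalent_delay(n, m, t_i):
    """Equivalent calibration delay N/(M+N) * T_i, as stated in the text."""
    return float(Fraction(n, m + n)) * t_i


if __name__ == "__main__":
    T_I = 5e-9  # assumption: period of a 200 MHz input clock
    ratio = float(Fraction(1, 25 + 1))
    delay_ps = equivalent_delay(1, 25, T_I) * 1e12
    print(f"N=1, M=25 -> {ratio:.4f} T_i = {delay_ps:.1f} ps")
```

For N = 1 and M = 25 the ratio is 1/26, i.e. about 0.0385, which at a 5 ns input period corresponds to roughly 192 ps.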

C. Noise Sources
In the calibration mode, the noise derives from the external high-speed clock, the PFD, the charge pump, and the comparator's thermal and flicker noise. In this improved version, the comparator is identical to that in [1]. Thus, only the noise of the PFD and the CP is investigated. Fig. 10 compares the noise with and without the PFD. Ignoring the external clock jitter, the transition edges of $S_0$ and $S_1$ are clean. The PFD applies timing variations to $S_{\mathrm{up}}$ and $S_{\mathrm{dn}}$. In reality, $S_{\mathrm{dn}}$ still has a very short pulse, with its pulsewidth determined by the PFD reset path delay. Therefore, the discharging path also suffers from the PFD jitter. Regarding the noise contributed by the charge pump, the PFD significantly reduces the noise window, making the charge pump noise contribution less than half of that in the case without the PFD.
To quantify the charge pump's noise influence [32], [33], its output current noise simulation results are shown in Fig. 11(b), next to the current-steering charge pump schematic. Though the charge pump's power consumption cannot take advantage of adding a PFD, the same architecture is adopted for the noise comparison with the previous work [1]. $S_{\mathrm{up}}$ is connected to the UPP node and $S_{\mathrm{dn}}$ is connected to the DWP node. UPN and DWN are the inverted signals of UPP and DWP, respectively. The simulation results reveal that the charging current noise is larger than the discharging current noise because the current source $M_5$ encounters the double current mirroring from the external bias current source $I_{\mathrm{CP}}$. The flicker noise corner is close to 1 MHz. With the loop clocked at 50 MHz, the integrated current noise amplitude from 1 Hz to 25 MHz is 84 nA for the discharging path and 122 nA for the charging path. We assume that the two noise sources are uncorrelated and take 8 pF for the integration capacitor. Without the PFD and the delay under test being zero ( = 0 in Fig. 10), the maximal disturbance on $V_{\mathrm{ref}}$ can be calculated as:
The PFD schematic is shown in Fig. 12(a). A system reset signal is inserted into the PFD reset path with the OR gate. Extra delays are added in the reset path to avoid the well-known dead-zone issue. The simulated rms jitter spectrum and the corresponding phase noise are presented in Fig. 12(b) and (c), respectively. The flicker noise corner is around 100 kHz. The falling edge suffers from larger jitter due to the reset path. Circuits inside the PFD are selected from the standard cell library without optimizing for the jitter performance. Nevertheless, the added jitter is marginal [34], [35]. Integrating from 1 Hz to 25 MHz, the falling edge jitter is 311 fs and the rising edge jitter is 205 fs.
Assuming for simplicity that the jitter of rising and falling edges is uncorrelated, their influence on $V_{\mathrm{ref}}$ can be calculated as:
$$\frac{25\,\mu\mathrm{A} \times 311\,\mathrm{fs} + 25\,\mu\mathrm{A} \times 205\,\mathrm{fs}}{8\,\mathrm{pF}} = 1.6\,\mu\mathrm{V} \quad (4)$$
Compared to (3), the PFD-induced jitter is deeply buried by the charge pump noise. Thus, it is worthwhile to introduce such a block into the BIST-TDC to improve the measurement precision.
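The arithmetic of (4) can be reproduced with the sketch below. The function and parameter names are illustrative; the 25 μA charge pump current, the edge jitter values, and the 8 pF capacitor are the numbers appearing in (4) and the surrounding text.

```python
def vref_disturbance(i_cp, jitter_fall, jitter_rise, c_int):
    """Voltage disturbance caused by PFD edge jitter, following Eq. (4).

    Each jittered edge moves a charge of i_cp * jitter onto c_int; Eq. (4)
    adds the two contributions linearly.
    """
    return (i_cp * jitter_fall + i_cp * jitter_rise) / c_int


if __name__ == "__main__":
    dv = vref_disturbance(i_cp=25e-6, jitter_fall=311e-15, jitter_rise=205e-15, c_int=8e-12)
    print(f"disturbance = {dv * 1e6:.2f} uV")
```

The result is about 1.61 μV, matching the rounded 1.6 μV in (4).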

D. Noise Improvement
To verify the above analysis, a behavioral model has been prepared. Simulations reveal that the loop can effectively suppress the white noise. For example, even though the external clock noise is modeled as 5 ps, the measurement precision can still be as fine as sub-100 fs if the noise is white. With the PFD, the charge pump's noise contribution is reduced, as shown in Fig. 13. However, the first-order loop suffers more from the flicker noise. Apart from the DTC under test, the flicker noise of the charge pump and the comparator dominates the flicker noise contributions of the analog blocks. To optimize the system noise performance, a longer channel length should be selected for the charge pump devices, especially for the current mirror.

IV. MEASUREMENT RESULTS
The chip is fabricated in 28 nm LP CMOS. Its micrograph is shown in Fig. 14, together with the layout view. It occupies 810 μm × 640 μm of silicon area. The length of the bonding wire is around 1-1.5 mm. Because the delay information is calculated on-chip and stored in flip-flops, and the input clock's thermal noise is filtered out by the loop, the package has limited influence on the measurement precision.

A. Experimental Setup
The measurement setup is illustrated in Fig. 15. A 200 MHz sine wave or square wave clock signal feeds into the chip. After a division by four, a 50 MHz clock is generated driving the BIST-TDC loop. The measured delay information is exported off-chip through a 1 MHz SPI interface for reporting purposes. All tests are completed automatically within several hours after setting up the measurement. The whole system power consumption is around 600 μW, similar to the one without the PFD [1]. It is limited by the charge pump's current-steering structure, even though a shorter charging/discharging time is realized.

B. DTC Transfer Function and Mismatch
After processing the measured raw data, the DTC transfer function can be calculated. The DNLs are derived from the transfer function. Three DTCs are measured in order.
The transfer function of the first DTC is shown in Fig. 16(a). The system calibration adjusts the final delay information from the black curve to the blue curve. The delay range is 528 ps and one LSB equals 17 ps. It is not surprising to observe the DNL exceeding the 1-LSB boundary, which is indicated by the dashed black line. The worst-case DNL is measured to be as large as 2.0 LSB. The standard deviation (σ) of a single-unit stage delay, 0.56 LSB, as shown in Fig. 3, is adopted to model the potential DNL performance. The DNL calculation is repeated 1000× assuming a Gaussian distribution. The simulated results are shown in grey in Fig. 16(b). It can be observed that the measured data matches the simulations quite well. The simulated results indicate that the largest possible DNL can be even larger than 3 LSB. The calculated INL based on the best-fit straight line is shown in Fig. 16(c).
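The repeated DNL simulation described above can be sketched as follows. This is a simplified stand-in for the authors' procedure, under the assumption that every unit delay is an independent Gaussian sample with a nominal value of 1 LSB and σ = 0.56 LSB; the function names and the seed are hypothetical.

```python
import random
import statistics


def simulate_dnl(n_units=32, sigma_lsb=0.56, rng=None):
    """One Monte-Carlo DNL curve (in LSB) for a chain of n_units delay units.

    Assumption: every unit delay is an independent Gaussian sample with a
    nominal value of 1 LSB and standard deviation sigma_lsb.
    """
    rng = rng or random.Random()
    steps = [rng.gauss(1.0, sigma_lsb) for _ in range(n_units)]
    lsb = statistics.fmean(steps)   # the actual average step defines 1 LSB
    return [step / lsb - 1.0 for step in steps]


def worst_case_dnl(n_runs=1000, seed=0, **kwargs):
    """Largest |DNL| of each run, for n_runs independent chains."""
    rng = random.Random(seed)
    return [max(abs(d) for d in simulate_dnl(rng=rng, **kwargs)) for _ in range(n_runs)]


if __name__ == "__main__":
    worst = worst_case_dnl()
    print(f"median worst-case |DNL|: {statistics.median(worst):.2f} LSB")
    print(f"largest worst-case |DNL|: {max(worst):.2f} LSB")
```

Normalizing by the average step mirrors a best-fit LSB, so each simulated DNL curve sums to zero. With σ this large, some sampled steps are negative, which is the non-monotonic behavior discussed in Section II.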
To investigate how large an in-band fractional spur this DTC can induce, an FFT is applied to the INL to obtain the amplitudes of its harmonic components before applying (1) and (2). One more assumption is made here: the whole DTC range exactly covers one DCO period. The INL amplitudes for each harmonic are shown in Fig. 16(d). The 13th harmonic has the highest amplitude, 0.61 LSB, corresponding to an in-band spur level of −24.5 dBc. This value is close to the one reported in [6]. In BLE applications, a fractional spur located outside the loop bandwidth can be suppressed by the loop. For high-performance ADPLLs, this DTC mismatch can be calibrated. As mentioned in Section II, the mismatch performance can be improved by replacing $M_5$, $M_6$, $M_7$ in Fig. 2 with an AND gate, as well as by enlarging the channel width of $M_9$. This architecture enjoys a potentially small fixed offset delay but suffers mismatches from two sub-blocks: the delay element and the clock feeder.
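The INL-to-spur conversion described above can be sketched as follows. The INL data here is synthetic, and the normalization $A_{\mathrm{nonl}}$ = (harmonic amplitude in LSB)/32 is an assumption based on the stated condition that the 32-step DTC range covers exactly one DCO period; it reproduces the −24.5 dBc quoted above for a 0.61 LSB harmonic. Function names are illustrative.

```python
import cmath
import math


def harmonic_amplitudes(inl_lsb):
    """Single-sided DFT amplitudes (in LSB) of an INL curve, harmonics 1..N/2-1."""
    n = len(inl_lsb)
    amps = {}
    for k in range(1, n // 2):
        x_k = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(inl_lsb))
        amps[k] = 2.0 * abs(x_k) / n
    return amps


def spur_from_inl_harmonic(amp_lsb, n_codes=32):
    """Spur level in dBc via Eq. (1), with A_nonl = amp_lsb / n_codes.

    Assumption: the n_codes steps of the DTC cover exactly one DCO period, so
    dividing by n_codes normalizes the amplitude to that period.
    """
    return 20.0 * math.log10(math.pi * amp_lsb / n_codes)


if __name__ == "__main__":
    # Synthetic INL: a single 13th-harmonic tone of 0.61 LSB (illustrative data only).
    inl = [0.61 * math.sin(2 * math.pi * 13 * i / 32) for i in range(32)]
    amps = harmonic_amplitudes(inl)
    k_max = max(amps, key=amps.get)
    print(f"dominant harmonic: {k_max}, amplitude: {amps[k_max]:.2f} LSB")
    print(f"spur level: {spur_from_inl_harmonic(amps[k_max]):.1f} dBc")
```

A direct DFT is used instead of an FFT library call to keep the sketch dependency-free; for a 32-point INL curve the cost is negligible.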
The measured performance of the second DTC is shown in Fig. 17. The delay range is 391 ps with 12 ps resolution. Its linearity is much better than that of the first one. The measured worst-case DNL is only 0.25 LSB, which matches the 1000× repeated simulations represented by the grey curves. Its INL is shown in Fig. 17(c). Possibly owing to a gradient of mismatch or doping, this INL curve is not 'friendly' to the fractional spurs. As Fig. 17(d) shows, the first harmonic component dominates. The highest harmonic amplitude of 0.28 LSB corresponds to an in-band fractional spur of −31.22 dBc. This value can be improved to <−40 dBc through larger device sizes, at the cost of more power [12]. The gradient effects on the fractional spurs can be suppressed by enabling the delay units in a sequence of, e.g., 1, N, 2, N − 1, ..., rather than 1, 2, ..., N, where N is the number of delay units. Although this DTC has the best mismatch performance, its architecture is still sensitive to supply noise and substrate noise, and it suffers from PVT variations. To retain a stable performance, an LDO, a deep N-well and proper guard rings should be adopted.
Fig. 18 reports the measurement results of the third DTC. This clearly impractical DTC structure gives a highly nonlinear transfer function. It can only be observed that the delay tends to decline as the DCW increases. After least-squares fitting, one LSB is 89 fs. Although a 'fine' resolution is achieved, this architecture results in a DNL of 47.9 LSB. Additionally, the measured DNL is much larger than the simulated data, which is based on 4.1 LSB as 1σ for the unit delay. This indicates that, for the matching of two different types of MOS capacitors, the standard deviation can be much larger than the simulation results suggest. That is also reasonable considering that a capacitance difference of hundreds of aF or one fF can be easily disturbed by the neighboring routing and dummy metal filling. Removing either $M_5$ or $M_6$ in Fig. 6, or controlling $M_6$ with the inverted version of EK, could turn this DTC into a more practical architecture.
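Returning to the enabling sequence suggested above for the second DTC (1, N, 2, N − 1, ...), the following sketch generates that order for an arbitrary number of delay units. The function name is hypothetical, and the sketch only produces the sequence; it does not model the resulting INL.

```python
def interleaved_order(n_units):
    """Enable order 1, N, 2, N-1, ... for n_units delay units (1-based indices)."""
    order = []
    lo, hi = 1, n_units
    while lo <= hi:
        order.append(lo)
        if lo != hi:          # avoid listing the middle unit twice when N is odd
            order.append(hi)
        lo += 1
        hi -= 1
    return order


if __name__ == "__main__":
    print(interleaved_order(8))   # [1, 8, 2, 7, 3, 6, 4, 5]
```

Pairing units from opposite ends of the chain means each added pair contributes roughly the same total delay under a linear gradient, which is the intuition behind the suggested ordering.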
The nonlinearity performance of the three DTCs is summarized in Table I. Table II compares the proposed DTC measurement system with the state of the art found in the literature. The off-chip DTC measurement methods in [10], [31] offer a 'golden reference' for measuring the delay difference, although they cannot measure the absolute delay of a DTC. Therefore, it appears that only the on-chip TDC arrangements are capable of characterizing a DTC under test with a fixed offset.

C. Measurement Precision
As discussed previously, the PFD helps reduce the influence of the charge pump noise. The system is therefore expected to provide better precision under the same conditions. Fig. 19 shows the histogram of the measured DTCs, together with the system self-calibration. The system self-calibration yields σ = 0.47 ps, a bit better than the value previously reported in [1]. This is likely due to the input clock now being fed into the synthesized digital block directly. The divided clocks at 50 MHz are heavily affected by other digital cells, especially some digital standard cells adopting the minimum allowed size, which introduce significant flicker noise into the digital power supply. The comparator's flicker noise is another major low-frequency noise source, which is to be optimized by enlarging the size of the input pair.

V. CONCLUSION
This work characterizes and analyzes the mismatches of three delay-cascading DTCs, which have recently become very popular in digital PLLs. Through an improved built-in self-test (BIST)-TDC arrangement with an added PFD, the measurement results demonstrate a sub-half-ps precision in the system self-calibration mode. The DTCs are fabricated in 28-nm CMOS; their transfer functions are measured, and mismatches are compared against Monte-Carlo simulation results. The integral-nonlinearity (INL) information is translated to the in-band fractional spur level of digital PLLs. Noise contributions within the first-order loop are analyzed, confirming the effectiveness of adding a PFD into the loop. The whole BIST system consumes 0.6 mW and uses a system self-calibration algorithm to tackle the nonlinearities of the analog blocks.