# A PLL Technique: Charge-Steering Sampling

Weichen Tao, Graduate Student Member, IEEE, Yuhao Yang, Graduate Student Member, IEEE, Robert Bogdan Staszewski, Fellow, IEEE, and Yizhe Hu, Member, IEEE

Abstract—This article introduces a charge-steering sampling (CSS) technique for time-error detection (TD), an equivalent of phase detection (PD), in phase-locked loops (PLLs). The CSS mechanism presets the input capacitors of a successive approximation register (SAR) analog-to-digital converter (ADC) to $V_{\rm DD}$ and subsequently discharges them during a reference-triggered pulse through a pseudo-differential MOS pair directly driven by the oscillator. The resulting differential-mode (DM) charge residue, proportional to the time error, is digitized by the ADC to support all-digital PLL (ADPLL) operation. The proposed technique simultaneously achieves high TD gain for low jitter, excellent oscillator isolation for reduced reference spurs, and multi-bit digital TD output for fast locking, fully leveraging the capabilities of advanced CMOS technology. A digital loop filter (DLF) featuring a dead zone (DZ) in the integral path is introduced to mitigate potential conflicts with the proportional path. To accommodate the short oscillator period $T_{\rm osc}$ at millimeter-wave (mm-wave) frequencies, we propose extending the CSS pulsewidth to $1.5\,T_{\rm osc}$. In addition, a damped-sine waveform model for the CSS current is developed, providing deeper insight into the high TD gain. A comprehensive noise analysis of the CSS is conducted using a multirate timestamp model, identifying the contributions to the output phase noise (PN). Fabricated in 22-nm CMOS, the 18.8-23.3-GHz CSS-ADPLL prototype achieves 63-fs rms jitter, -52.4-dBc reference spur, and a figure of merit (FoM) of -254 dB, while consuming 9.95-mW total power, with only 1.3 mW allocated to the loop. For an initial frequency error of 200 MHz, the system achieves a locking time of 0.61 $\mu$s, benefiting from the combined effects of a counter-based frequency-locked loop (FLL) (0.27 $\mu$s) and the multi-bit digital output of the CSS-ADPLL (0.34 $\mu$s).

Index Terms—All-digital phase-locked loop (ADPLL), charge-steering sampling (CSS), low jitter, millimeter wave (mm-wave), multirate timestamp, phase detection (PD), pseudo-differential pair (diff-pair), sub-sampling, time-error detection (TD).

### I. INTRODUCTION

Received 26 December 2024; revised 13 March 2025 and 19 April 2025; accepted 23 April 2025. This article was approved by Associate Editor Jaehyouk Choi. This work was supported in part by the National Natural Science Foundation of China under Grant 62374156 and in part by the National Key Research and Development Program of China under Grant 2019YFB2204601. (Corresponding author: Yizhe Hu.)

Weichen Tao, Yuhao Yang, and Yizhe Hu are with the School of Microelectronics, University of Science and Technology of China, Hefei 230026, China (e-mail: huyz@ustc.edu.cn).

Robert Bogdan Staszewski is with the School of Electrical and Electronic Engineering, University College Dublin, Dublin 4, D04 V1W8 Ireland, and also with the Department of Measurement and Electronics, AGH University of Krakow, 30-059 Kraków, Poland.

Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSSC.2025.3566271.

Digital Object Identifier 10.1109/JSSC.2025.3566271

Fig. 1. RMS jitter requirements for (a) the sampling clock of an ADC with input near the Nyquist frequency, assuming a 3-dB SNR penalty in high-speed PAM-4 systems, and (b) the LO in 5G/6G communications.

Emerging high-speed communication standards for both wireline and wireless systems are driving the demand for local oscillators (LOs) with ultra-low jitter, often well below 100 fs. In high-speed wireline applications, advanced modulation formats such as PAM-4 often rely on analog-to-digital converters (ADCs) operating at tens of gigahertz [1], [2]. The relationship between sampling clock jitter and the resulting *m*-dB signal-to-noise ratio (SNR) penalty for an *M*-bit ADC with input near the Nyquist frequency can be derived as follows [3]:

$$J_{\rm rms} = \sqrt{\frac{10^{m/10} - 1}{3\pi^2 f_{\rm s}^2 2^{2M - 1}}} \tag{1}$$

where  $f_s$  is the ADC sampling clock frequency. For instance, a 7-bit ADC operating at 56 GHz for 112-Gb/s PAM-4 signaling requires the sampling clock jitter to remain below 40 fs under a 3-dB SNR penalty. As illustrated in Fig. 1(a), the PLL rms jitter must even be below 10 fs for wireline transceivers operating at rates exceeding 224 Gb/s (e.g., 448 Gb/s).
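As a quick numerical check of (1), the following minimal sketch evaluates the jitter bound for the 7-bit, 56-GHz, 3-dB example quoted above. The function name and argument names are illustrative choices, not part of the original work.

```python
import math

def adc_clock_jitter_rms(m_db, n_bits, f_s):
    """RMS jitter bound (seconds) from Eq. (1).

    m_db   : tolerated SNR penalty in dB
    n_bits : ADC resolution M
    f_s    : sampling clock frequency in Hz (input assumed near Nyquist)
    """
    numerator = 10 ** (m_db / 10) - 1
    denominator = 3 * math.pi ** 2 * f_s ** 2 * 2 ** (2 * n_bits - 1)
    return math.sqrt(numerator / denominator)

# Example from the text: 7-bit ADC at 56 GHz with a 3-dB SNR penalty.
j = adc_clock_jitter_rms(3.0, 7, 56e9)
print(f"jitter bound: {j * 1e15:.1f} fs")  # roughly 36 fs, i.e., below 40 fs
```

Doubling $f_{\rm s}$ at fixed resolution halves the bound, which is why the requirement tightens quickly at higher data rates.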

In wireless systems [4], the relationship between the rms jitter and the corresponding error vector magnitude,  $EVM_{LO}$ , can be expressed as follows [5], [6]:

$$J_{\rm rms} = \frac{\sqrt{10^{\rm EVM_{LO}/10}}}{2\pi f_0} \tag{2}$$

where  $f_0$  is the carrier frequency. For example, in a 256 QAM modulation scheme, the corresponding LO's rms jitter can be set to 80 fs at 28 GHz (i.e., EVM<sub>LO</sub> = -37 dB), which is 8 dB<sup>1</sup> lower than the total EVM specification of -29 dB, as illustrated in Fig. 1(b).
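The numbers in this example can be reproduced with a short sketch of (2) together with the budget ratio given in footnote 1. The helper names below are illustrative assumptions.

```python
import math

def lo_jitter_rms(evm_lo_db, f0):
    """RMS jitter (seconds) from Eq. (2) for a given EVM_LO in dB and carrier f0 in Hz."""
    return math.sqrt(10 ** (evm_lo_db / 10)) / (2 * math.pi * f0)

def evm_budget_share(evm_lo_db, evm_total_db):
    """Fraction of the total EVM power budget taken by the LO (footnote 1)."""
    return (10 ** (evm_lo_db / 20)) ** 2 / (10 ** (evm_total_db / 20)) ** 2

jitter = lo_jitter_rms(-37.0, 28e9)
share = evm_budget_share(-37.0, -29.0)
print(f"LO jitter: {jitter * 1e15:.1f} fs")   # about 80 fs
print(f"EVM budget share: {share * 100:.0f}%")  # about 16%
```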

To achieve low jitter, frequency synthesizers typically rely on either a high-gain phase detector or an injection-locking (IL) technique [7], [8] (or employ charge-sharing locking (CSL), a generalized form of IL [6], [9]). The former can be implemented through: 1) sub-sampling (SS)<sup>2</sup> a sinusoidal waveform of the oscillator with a sharp slope [11], [12], [13], [14], [15], [16], [17], [18], [19], [20]; 2) sampling a reference-rate waveform with additional phase-detection (PD) gain boosting techniques [21], [22], [23], [24]; or 3) a bang-bang (BB) operation [25], [26], [27], [28], [29], [30], which enforces a continuous toggle (i.e., non-zero output) of a 1-bit phase detector. Subsampling-based analog PLLs, due to directly sampling the oscillator voltage, often require a buffer between the oscillator and sampler to enhance the isolation and reduce reference spurs. Recently, several isolating SS-PD architectures have been proposed, including "isolated SS-PD" [14], active-mixer-based PD [31], and SS-PD with a functionally reused voltage-controlled oscillator (VCO) buffer [32]. However, they require an analog loop filter of a large area, making them unsuitable for deeply scaled CMOS technologies. On the other hand, digital PLLs based on a BB operation can achieve low jitter with compact digital loop filters (DLFs), but they typically suffer from slow locking due to their 1-bit PD output. This limitation can be mitigated by introducing an auxiliary path with an additional BB-PD, as demonstrated in [25]. Furthermore, IL and CSL techniques face challenges at millimeter-wave (mm-wave) frequencies, primarily due to the limitation imposed by the minimum achievable pulsewidth.

To avoid *directly* sampling the oscillator voltage, an early PD concept based on the overlapping area between differential sinusoidal oscillator waveforms and a reference-triggered pulse was introduced in [33] for use in a frequency-tracking loop (FTL)<sup>3</sup> within an injection-locked synthesizer. This concept was later combined with a pulse-switched differential pair (diff-pair) with RC loading and applied in an analog PLL, termed "charge-sampling" [34]. Charge-sampling PD offers high gain with low jitter and ensures good isolation between the sampler and oscillator, thereby restraining the spurs. However, it relies on a diff-pair operating as a current source, a conventional operational transconductance amplifier (OTA) for V/I conversion after the charge sampling, and a bulky analog loop filter for I/V conversion, making it unsuitable for advanced CMOS technologies. Moreover, its PD-gain analysis assumes a purely sinusoidal waveform current, which, while convenient, is impractical as it neglects nonlinearities, particularly when applied to short-channel devices.

In this article, we propose a new PLL technique of charge-steering sampling (CSS)<sup>4</sup> [36]. During a reference-triggered pulse, two preset capacitors are discharged through a pseudo-diff-pair that is directly driven by the oscillator,<sup>5</sup> promoting high-gain TD<sup>6</sup> and excellent isolation from the oscillator. By merging the preset capacitance with a successive approximation register (SAR) ADC, we implement a CSS-based all-digital phase-locked loop (ADPLL) that simultaneously achieves low jitter, low spurs, and fast locking. To address the pulsewidth limitations in mm-wave synthesizers, a 1.5× oscillator-period discharging method is proposed, along with a new analytical model for the CSS operation that characterizes the TD gain. Furthermore, a comprehensive phase noise (PN) analysis of the CSS-ADPLL is conducted using the multirate timestamp modeling technique in both the z-domain and behavioral time domain, with particular emphasis on the impact of the dead zone (DZ) [6] in the DLF, supported by experimental results.

The remainder of this article is organized as follows. Section II introduces the basic concept of CSS and its operating principles. It also includes the quantitative analysis of the CSS's TD gain, featuring the proposed time-damped sinusoidal waveform model for the charge-steering current. Section III presents the CSS-based ADPLL, highlighting the integration of CSS with SAR ADC, while its PN mechanisms are analyzed in Section IV. Section V discusses the circuit implementation of key building blocks, followed by experimental results in Section VI.

### II. CONCEPT OF CSS

## A. Basic Operation

Fig. 2(a) illustrates the concept of the proposed TD in the CSS scheme. It consists of two pairs of switches ($S_1$ and $S_2$) manipulating the charge on two sampling capacitors $C_s$. The actual charge steering is carried out by a pseudo-diff-pair, $M_{1/2}$, driven by the oscillator's differential output signal $V_{\rm osc}$ with a period of $T_{\rm osc}$. The common-mode (CM) voltage of $V_{\rm osc}$ (i.e., $V_{\rm osc,cm}$) serves as the biasing voltage for $M_{1/2}$. The detection of the time error ($\Delta t_{\rm err}$) between the reference pulse $clk\_css$ (triggered by the falling edges of the reference clock ref with a period of $T_{\rm ref}$, where $T_{\rm ref} = NT_{\rm osc}$ and N is the PLL's frequency multiplication ratio) and $V_{\rm osc}$ involves two steps: 1) charge preset and 2) CSS. As shown in Fig. 2(b), during the high level of ref, $S_1$ turns on while $S_2$ remains off, and so the $C_s$ capacitors are preset to $V_{\rm DD}$ through $S_1$. Subsequently, $S_1$ turns off and $clk\_css$ briefly connects the two $C_s$ capacitors to the pseudo-diff-pair via $S_2$ for the CSS operation [see Fig. 2(c), where $R_0$ represents the equivalent resistance seen at $V_{\text{err}+}$ (or $V_{\text{err}-}$)].

<sup>1</sup>It accounts for 16% of the total EVM budget, calculated as $(10^{\mathrm{EVM_{LO}}/20})^2/(10^{\mathrm{EVM}/20})^2$.

<sup>2</sup>It should be noted that the so-called "not multiplied by $N^2$" for PD/CP noise in SS-PLLs (compared with charge pump PLLs) due to the absence of a divider is a long-standing myth in the PLL community, as clarified in [6] and [10].

<sup>3</sup>Typically, an FTL is employed with IL or CSL techniques (see [6]) to enhance process, voltage, and temperature (PVT) robustness, while a frequency-locked loop (FLL) in a PLL is used to bring the oscillator frequency close to the target value. Unlike the FLL, which is disabled after completing the frequency acquisition (with control handed over to the integral path of the PLL), the FTL operates continuously alongside the IL.

<sup>4</sup>It might be interesting to note that, in the art of analog design, a "charge-steering" technique [35] has been employed for small-signal amplification with low power consumption. It consists of two preset capacitors, a diff-pair, and a tail capacitor.

<sup>5</sup>A modified version of this CSS-ADPLL was reported in [37] in which the roles of the reference and oscillator edges are swapped.

<sup>6</sup>TD is equivalent to PD in the phase domain. The term "PD" is not best suited for subsampling techniques, as these involve comparisons between waveforms of different frequencies, making the phase definition in radians less consistent. Therefore, we adopt here the concept of TD.

Fig. 2. (a) Proposed CSS technique and its timing diagram. Operating principles: (b) charge preset and (c) CSS.

Fig. 3. Operating principle of the CSS: (a) $\Delta t_{\rm err} > 0$, (b) $\Delta t_{\rm err} = 0$, and (c) $\Delta t_{\rm err} < 0$, when $\tau_{\rm pulse} = 0.5\,T_{\rm osc}$. (d) $K_{\rm TD}$ degradation when $\tau_{\rm pulse} = 0.8\,T_{\rm osc}$. (e) $\tau_{\rm pulse} = 1.5\,T_{\rm osc}$ for mm-wave applications with small $T_{\rm osc}$. The middle point of $clk\_css$ is taken as reference, where $t=0$. The peak-to-peak value of $clk\_css$ and $V_{\rm osc\pm}$ is equal to $V_{\rm DD}$ (e.g., 0.8 V in 22-nm CMOS).

The corresponding waveforms of the CSS operation are shown in Fig. 3. The area overlap between $clk\_css$ and $V_{\rm osc}$ (= $V_{\rm osc+} - V_{\rm osc-}$) represents the net charge steered through $M_1$ (red shaded area) or $M_2$ (blue shaded area). Consequently, the differential-mode (DM) charge residue on $C_{\rm s}$, $\Delta V_{\rm err} = V_{\rm err+} - V_{\rm err-}$, corresponds to the detected time error (i.e., $\Delta t_{\rm err}$) between $clk\_css$ and $V_{\rm osc}$. If the zero-crossing point of $V_{\rm osc}$ lags, aligns with, or leads $clk\_css$, it results in a positive, zero, or negative $\Delta V_{\rm err}$, as shown in Fig. 3(a)–(c), respectively.

## B. TD Gain of the CSS $(K_{TD})$

The TD gain  $K_{\rm TD}$  of the CSS is defined as follows:

$$K_{\rm TD} = \frac{\Delta V_{\rm err}}{\Delta t_{\rm err}} \tag{3}$$

which reaches its maximum value when the pulsewidth of $clk\_css$, $\tau_{\rm pulse}$, equals $0.5\,T_{\rm osc}$. However, at mm-wave frequencies, $T_{\rm osc}$ becomes so short that achieving $\tau_{\rm pulse}=0.5\,T_{\rm osc}$ is challenging, even in advanced CMOS technologies. A wider pulse then degrades $K_{\rm TD}$, as shown in Fig. 3(d). As a remedy, we propose setting the pulsewidth for the CSS operation at around $1.5\,T_{\rm osc}$ or $2.5\,T_{\rm osc}$ to maintain high $K_{\rm TD}$ [see Fig. 3(e)]. It is evident that the CM voltage of $V_{\rm err\pm}$, $V_{\rm err,cm}$, in Fig. 3(e) is lower than that in Fig. 3(a) due to the longer $\tau_{\rm pulse}$ for discharging. To avoid an excessively low $V_{\rm err,cm}$, it should be carefully managed by properly sizing $C_{\rm s}$, $S_{\rm 2}$, and the pseudo-diff-pair. Note that this approach would not be feasible with subharmonic IL or CSL techniques. Moreover, unlike conventional subsampling, the CSS operation does not directly sample the slope of the oscillating waveform. It thus obtains good isolation between the TD and oscillator and maintains low reference spurs even without an isolating buffer.

Fig. 4. Simulated and calculated differential CSS current $\Delta I_{\rm css}$ based on (4) under different conditions. (a) $\tau_{\rm pulse}=0.5\,T_{\rm osc}$ and $C_{\rm s}=200$ fF. (b) $\tau_{\rm pulse}=1.5\,T_{\rm osc}$ and $C_{\rm s}=100$ fF. (c) $\tau_{\rm pulse}=1.5\,T_{\rm osc}$ and $C_{\rm s}=20$ fF. Parameters: $T_{\rm osc}=50$ ps and $\Delta t_{\rm err}=0$. $W_{\rm S1}=6~\mu{\rm m}$, $W_{\rm S2}=5~\mu{\rm m}$, $W_{1/2}=5~\mu{\rm m}$, and $L=30~{\rm nm}$.

To quantitatively analyze  $K_{TD}$  in the proposed CSS, we employ the model shown in Fig. 2(c), in which  $S_1$  is implemented using pMOS transistors with an inverter, while S<sub>2</sub> and pseudo-diff-pair are realized with nMOS transistors, all with a minimum length in a 22-nm CMOS. With the sharp slope of  $V_{\text{osc}\pm}$ ,  $M_{1/2}$  transitions quickly from the cutoff to the triode region, spending minimal time in the saturation region. As a result, during CSS, the "steering" current  $I_{css\pm}$  is strongly influenced by both  $V_{GS}$  and  $V_{DS}$  of  $M_{1/2}$  (also considering the short-channel effects in advanced CMOS). Since  $V_{GS}$  follows a sinusoidal waveform and  $V_{DS}$  exhibits a declining trend [see Fig. 3(a)], we propose modeling the differential steering current  $\Delta I_{\rm css}$  (=  $(I_{\rm css+} - I_{\rm css-})/2$ ) as a damped sine waveform. When  $V_{\rm osc\pm} = V_{\rm osc,cm} \pm V_0 \sin{(\omega_{\rm osc}(t - \Delta t_{\rm err}))}$ ,  $\Delta I_{\rm css}$  can be expressed as follows (with the midpoint of clk\_css taken as the t = 0 reference point and  $V_{\text{osc,cm}} = V_0 = V_{\text{DD}}/2$ ):

$$\Delta I_{\rm css}(t) = V_0 \cdot G_{\rm m} e^{-(t + \tau_{\rm pulse}/2)/R_0 C_{\rm s}} \times \left[ \sin \omega_{\rm osc}(t - \Delta t_{\rm err}) + a_3 \sin 3\omega_{\rm osc}(t - \Delta t_{\rm err}) \right] \tag{4}$$

where $-\tau_{\text{pulse}}/2 \leq t \leq \tau_{\text{pulse}}/2$. $G_{\text{m}}$ is the equivalent large-signal transconductance of $M_{1/2}$, while the damping factor $e^{-(t+\tau_{\text{pulse}}/2)/R_0C_s}$ models the reduction of $G_{\text{m}}$ caused by the decline of $V_{\text{err,cm}}$ over time. $a_3$ models the odd-order nonlinearity of $M_{1/2}$, flattening the peaks and bottoms of $\Delta I_{\text{css}}$ (i.e., $0 < a_3 < 1$) [38]. Fig. 4(a) and (b) shows the post-layout simulated and fit curves of $\Delta I_{\text{css}}$ based on (4) under $\tau_{\text{pulse}} = 0.5 \, T_{\text{osc}}$ or $1.5 \, T_{\text{osc}}$ with a reasonably large $C_{\text{s}}$, thus demonstrating the effectiveness of the postulated formula (4). However, when $C_s$ is excessively small (or $M_{1/2}$ is excessively large), this would result in a rapid charge loss, causing $|\Delta I_{\rm css}|$ to quickly decay to 0 [see Fig. 4(c)], which spoils the behavior predicted by (4).

 $K_{\rm TD}$  in the CSS operation can be derived as follows:

$$K_{\rm TD} = \frac{\Delta V_{\rm err}}{\Delta t_{\rm err}} = \left(-\frac{2}{C_{\rm s}} \int_{-\tau_{\rm pulse}/2}^{\tau_{\rm pulse}/2} \Delta I_{\rm css}(t)\, dt\right) / \Delta t_{\rm err} \tag{5}$$

in which  $|\Delta t_{\rm err}| < T_{\rm osc}/2$  ensures that  $K_{\rm TD}$  remains monotonic. The analytical result of  $K_{\rm TD}$  based on (4) is complex and can be solved and visualized using mathematical calculation software. However, for simplicity and to gain intuitive understanding,  $a_3$  could be omitted. On the other hand, if  $C_{\rm s}$  is sufficiently large, the damping factor can be neglected within  $\tau_{\rm pulse}$ . Correspondingly, the simplified  $\Delta I_{\rm css}$  (i.e.,  $\Delta I_{\rm css,simpl}$ ) and  $K_{\rm TD}$  (i.e.,  $K_{\rm TD,simpl}$ ) are given by

$$\Delta I_{\rm css,simpl} = V_0 \cdot G_{\rm m} \sin \omega_{\rm osc} (t - \Delta t_{\rm err}) \tag{6}$$

and

$$K_{\text{TD,simpl}} \approx \frac{4G_{\text{m}}V_0}{C_{\text{s}}} \cdot \frac{\sin \omega_{\text{osc}} \Delta t_{\text{err}}}{\omega_{\text{osc}} \Delta t_{\text{err}}} \cdot \sin \left(\frac{\omega_{\text{osc}} \tau_{\text{pulse}}}{2}\right). \tag{7}$$
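The step from (5) and (6) to (7) can be checked numerically. The sketch below integrates the pure-sine current of (6) over the pulse with a midpoint rule and compares the result with the closed form of (7). The value $G_{\rm m} = 1.5625$ mS is a hypothetical number, back-calculated so that $4G_{\rm m}V_0/C_{\rm s} = 25$ GV/s with $V_0 = 0.4$ V and $C_{\rm s} = 100$ fF; it is not a value reported in this article.

```python
import math

def k_td_numeric(gm, v0, c_s, t_osc, tau_pulse, dt_err, steps=20000):
    """K_TD from Eq. (5), using the pure-sine current of Eq. (6).

    The integral over [-tau_pulse/2, +tau_pulse/2] uses the midpoint rule.
    """
    w = 2 * math.pi / t_osc
    h = tau_pulse / steps
    integral = 0.0
    for i in range(steps):
        t = -tau_pulse / 2 + (i + 0.5) * h
        integral += v0 * gm * math.sin(w * (t - dt_err)) * h
    dv_err = -2.0 / c_s * integral
    return dv_err / dt_err

def k_td_closed_form(gm, v0, c_s, t_osc, tau_pulse, dt_err):
    """K_TD,simpl from Eq. (7)."""
    w = 2 * math.pi / t_osc
    x = w * dt_err
    return 4 * gm * v0 / c_s * math.sin(x) / x * math.sin(w * tau_pulse / 2)

# Hypothetical parameter set (G_m is an assumed, back-calculated value).
GM, V0, CS, T_OSC = 1.5625e-3, 0.4, 100e-15, 50e-12
args = (GM, V0, CS, T_OSC, 0.5 * T_OSC, 1e-12)
print(k_td_numeric(*args) / 1e9, k_td_closed_form(*args) / 1e9)  # both in GV/s
```

Both values agree to within the integration error. Evaluating the same expressions at $\tau_{\rm pulse} = 1.5\,T_{\rm osc}$ gives the same magnitude with the opposite sign, since $\sin(\omega_{\rm osc}\tau_{\rm pulse}/2) = -1$ there.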

## C. Optimization of CSS TD Gain

Fig. 5 shows the post-layout simulated and calculated  $K_{\rm TD}$  as functions of the pulsewidth  $\tau_{\rm pulse}$ , sampling capacitor  $C_{\rm s}$ , and time error  $\Delta t_{\rm err}$ , based on the proposed  $\Delta I_{\rm css}$ 's dampedsine model in (4) and the simplified pure-sine model in (6). Clearly,  $K_{\rm TD}$  calculated from (4) shows much better agreement with the simulation results, significantly outperforming the simplified model based on (6).

1)  $K_{TD}$  Versus  $\tau_{pulse}$ : As illustrated in Fig. 5(a),  $K_{TD}$  exhibits two peaks at  $\tau_{pulse} = 0.5 \, T_{\rm osc}$  and  $1.5 \, T_{\rm osc}$ , as discussed in Section II-B. Although the latter is slightly smaller than the former, it is well suited for the small  $T_{\rm osc}$  in mm-wave oscillators. The peak difference is attributed to a larger decline in  $G_{\rm m}e^{-(t+\tau_{\rm pulse}/2)/R_0C_{\rm s}}$  when  $\tau_{\rm pulse} = 1.5 \, T_{\rm osc}$ . On the other hand, the 3rd-harmonic current due to  $a_3$  beneficially flattens the peaks of  $K_{\rm TD}$ , reducing its sensitivity to  $\tau_{\rm pulse}$ . Fig. 5(a) shows  $K_{\rm TD}$  remains around 25 GV/s as  $\tau_{\rm pulse}/T_{\rm osc}$  varies from 1.25 to 1.75. Consequently, for a digitally controlled oscillator (DCO) with a tuning range (TR) from  $f_{\rm osc,min}$  to  $f_{\rm osc,max}$ , a simple option is to set  $\tau_{\rm pulse}$  to  $1.5/[(f_{\rm osc,min}+f_{\rm osc,max})/2]$  or to occasionally adjust  $\tau_{\rm pulse}$  for different frequency ranges.

<sup>7</sup>Higher odd-order nonlinearities are neglected for simplicity without compromising accuracy, while even-order nonlinearities are omitted due to the differential operation.



Fig. 5. Simulated and calculated TD gain  $K_{\text{TD}}$  versus (a)  $\tau_{\text{pulse}}/T_{\text{osc}}$ , (b)  $C_{\text{s}}$ , and (c)  $\Delta t_{\text{err}}/T_{\text{osc}}$ , based on  $\Delta I_{\text{css}}$ 's damped-sine model in (4) and pure-sine model in (6). Parameters:  $T_{\text{osc}} = 50$  ps,  $W_{1/2} = 5$   $\mu$ m, and  $L_{1/2} = 30$  nm.

2) $K_{TD}$ Versus $C_s$: When $C_s$ is sufficiently large (e.g., >180 fF), both models exhibit similar accuracy, showing that $K_{TD}$ is inversely proportional to $C_s$. However, the proposed model based on (4) demonstrates higher accuracy in the range of 80 fF < $C_s$ < 180 fF, where the damping effect becomes significant. On the other hand, considering an excessively small $C_s$, the charge on both $C_s$ capacitors (or on one of them) is nearly depleted before the sampling pulse completes, leading to a significant reduction in $K_{TD}$. Therefore, there exists an optimized $C_s$ for the highest CSS TD gain, as shown in Fig. 5(b). Similarly, the size of the pseudo-diff-pair should be optimized for achieving the maximum $K_{TD}$, as an excessively large $M_{1/2}$ causes the same ill condition as with a small $C_s$.

3) $K_{TD}$ Versus $\Delta t_{err}$: As per Fig. 5(c), $K_{TD}$ exhibits a sinc-like functional behavior with respect to $\Delta t_{err}$, while the 3rd-harmonic current modeled by $a_3$ flattens the peaks of $K_{TD}$. For an integer-N or DTC-based fractional-N operation [39], where $\Delta t_{err} \approx 0$, $K_{TD}$ achieves its maximum value of approximately $4G_{\rm m}V_0/C_{\rm s}$, e.g., 25 GV/s.

4) $K_{TD}$ Versus $V_{osc,cm}$: The relationship between $K_{TD}$ and the biasing voltage of $M_{1/2}$, i.e., $V_{osc,cm}$ in Fig. 3, is illustrated in Fig. 6. The results demonstrate robustness against variations in $V_{osc,cm}$ from 0.2 to 0.6 V, with an optimum biasing at 0.4 V (i.e., $V_{DD}/2$). An ac coupling circuit with additional biasing for $M_{1/2}$ can be introduced between $V_{osc\pm}$ and $M_{1/2}$ if necessary.
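The fixed-pulsewidth option described under "$K_{TD}$ Versus $\tau_{pulse}$" can be illustrated with the 18.8-23.3-GHz tuning range quoted in the abstract. The sketch below sets $\tau_{\rm pulse}$ from the mid-band frequency and reports the resulting $\tau_{\rm pulse}/T_{\rm osc}$ at both band edges; the function names are illustrative.

```python
def pulse_width_for_range(f_min, f_max, ratio=1.5):
    """tau_pulse = ratio / [(f_min + f_max) / 2], as suggested in the text."""
    f_center = (f_min + f_max) / 2
    return ratio / f_center

def normalized_pulse_width(tau_pulse, f_osc):
    """tau_pulse / T_osc for a given oscillator frequency."""
    return tau_pulse * f_osc

F_MIN, F_MAX = 18.8e9, 23.3e9  # tuning range quoted in the abstract
tau = pulse_width_for_range(F_MIN, F_MAX)
low_edge = normalized_pulse_width(tau, F_MIN)
high_edge = normalized_pulse_width(tau, F_MAX)
print(f"tau_pulse = {tau * 1e12:.1f} ps")
print(f"tau/T_osc spans {low_edge:.2f} to {high_edge:.2f}")
```

Under these assumptions the ratio stays inside the 1.25 to 1.75 window over which Fig. 5(a) is said to show a nearly flat $K_{\rm TD}$, so a single fixed pulsewidth appears sufficient for this range.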

## D. Mismatch Analysis in CSS

Mismatches in the $C_{\rm s}$ capacitors, switches, and pseudo-diff-pair introduce only a time-error offset, defined as a value of $\Delta t_{\rm err}$ (i.e., $\Delta t_{\rm err0}$) that results in $\Delta V_{\rm err}=0$. In addition, waveform asymmetry between $V_{\rm osc+}$ and $V_{\rm osc-}$ also contributes to $\Delta t_{\rm err0}$, though this effect is typically negligible with careful layout.

Fig. 7 presents the Monte Carlo simulation of the time-error offset  $\Delta t_{\rm err0}$  and  $K_{\rm TD}$ , accounting for all mismatches in the CSS technique. The mean  $K_{\rm TD}$  is 25.62 GV/s with a standard deviation of 1.034 GV/s, which has a negligible impact on the loop bandwidth and PN.

<sup>8</sup>The quick charge preset in the proposed CSS prevents any mismatch from causing output ripples. This contrasts with the conventional charge sampling [34], where mismatches between the two sampling resistors ( $R_{\rm D}$ ) and the sampling capacitors lead to output ripples due to the slow reset with  $R_{\rm D}=100~{\rm k}\Omega$ , ultimately worsening the reference spur [39] in PLLs.



Fig. 6. Simulated $K_{\text{TD}}$ versus $V_{\text{osc,cm}}$.



Fig. 7. Monte Carlo simulations of (a) time-error offset  $\Delta t_{err0}$  and (b)  $K_{TD}$ , considering mismatches in the  $C_s$  capacitors, switches, and pseudo-diff-pair.

### III. CSS-ADPLL

## A. Architecture of CSS-ADPLL

The overall architecture of the CSS-ADPLL is shown in Fig. 8. It consists of a programmable pulse generator, the proposed CSS-based time-error detector (CSS-TD) integrated with a 6-bit SAR ADC, a DLF, and a DCO, whose resonating waveform is connected to the CSS-TD. A separate FLL, based on a counter scheme, is used to tune the coarse bank of the DCO via  $D_{\rm coarse}$ , bringing  $f_{\rm osc}$  close to  $Nf_{\rm ref}$  (where  $f_{\rm osc}=1/T_{\rm osc}$  and  $f_{\rm ref}=1/T_{\rm ref}$ ), just before the ADPLL fine-tunes the DCO via the fine bank using  $D_{\rm fine}$ .

## B. Time-to-Digital Conversion (TDC) Based on CSS and SAR ADC

Generally, achieving low jitter in an ADPLL necessitates a high-resolution TDC for  $\Delta t_{\rm err}$  to surpass the bottleneck imposed by a single inverter delay in advanced CMOS nodes (e.g.,  $\sim$ 10 ps in 22 nm).

By integrating the total input capacitance of an M-bit SAR ADC into the sampling capacitor $C_s$ of the CSS technique, a high-resolution TDC scheme is proposed, as illustrated in Fig. 9(a). A single $C_s$ comprises an (M-1)-bit single-ended capacitive digital-to-analog converter (CDAC) (with total capacitance $C_{\rm sar}$) and a dummy capacitance ($C_{\rm dmy}$, including all parasitic capacitance) specifically optimized to enhance $K_{\rm TD}$, i.e., $C_s = C_{\rm sar} + C_{\rm dmy}$. The SAR ADC resolution (V/bit) can be derived as $\Delta V_{\rm adc} = (C_{\rm sar}/C_s) \cdot V_{\rm ref,adc}/2^{M-1}$ with an input range of $\pm (C_{\rm sar}/C_s) \cdot V_{\rm ref,adc}$, where $V_{\rm ref,adc}$ is the reference voltage for the SAR ADC (often reused as $V_{\rm DD}$ for simplicity). $S_1$ is implemented using properly sized pMOS transistors ($M_{5/6}$) to fully precharge $C_s$ to $V_{\rm DD}$, with its control signal $clk\_rst$ generated from ref and $clk\_sar$ using a NAND gate. Meanwhile, $S_2$ can be implemented using either pMOS or nMOS transistors (i.e., $M_{3/4}$).

Fig. 8. Architecture of the implemented CSS-ADPLL with a counter-based FLL.

Fig. 9. (a) Schematic of the TDC scheme based on CSS and SAR ADC, (b) midtread encoder, and (c) timing diagram. Note: $C_s = C_{sar} + C_{dmy}$.

It should be noted that the switching operation of $M_{3/4}$ can inject glitches on $V_{\rm osc\pm}$ through the gate-drain capacitance ($C_{\rm gd}$) of $M_{1/2}$. This CM disturbance affects $V_{\rm osc\pm}$, altering the nonlinear parasitic capacitance of the oscillator's $-G_{\rm m}$ component, which in turn leads to frequency modulation (FM)-induced reference spurs [39]. Fortunately, this effect is significantly smaller than with the direct sampling of $V_{\rm osc\pm}$. To further mitigate this issue, $M_{1/2}$ and $M_{3/4}$ can be properly sized to minimize glitches, or an oscillator buffer can be added to provide additional isolation.

After the charge preset and CSS,  $\Delta V_{\rm err}$  is digitized by the M-bit SAR ADC, producing an unsigned output  $D_{\rm out}$  (range:  $[2^M-1:0]$ ), triggered by the falling edges of  $clk\_sar$ . The  $clk\_sar$  signal is generated from ref using an inverter-based delay chain, ensuring that its falling edges occur after those of  $clk\_css$ . A midtread quantizing encoder is used to convert  $D_{\rm out}$  into a signed  $D_{\rm err}$  as  $D_{\rm err} = D_{\rm out} - 2^{M-1}$  for subsequent ADPLL operation. The equivalent TDC resolution  $\Delta t_{\rm tdc}$  is derived as follows:

$$\Delta t_{\rm tdc} = \frac{\Delta V_{\rm adc}}{K_{\rm TD}}. \tag{8}$$

Based on the analysis in Section II-C,  $C_{\rm s}$  of the SAR ADC is chosen as 100 fF (for maximum  $K_{\rm TD}=25$  GV/s when  $\tau_{\rm pulse}\approx 1.5T_{\rm osc}$ ), comprising  $C_{\rm sar}=20$  fF with  $C_{\rm dmy}=80$  fF. This corresponds to the SAR ADC range of  $\pm 160$  mV and resolution of  $\Delta V_{\rm adc}\approx 5$  mV/bit ( $C_{\rm u}\approx 0.625$  fF/bit), assuming  $V_{\rm ref,adc}=800$  mV. Consequently, it achieves a fine  $\Delta t_{\rm tdc}=200$  fs/bit.
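The design numbers above follow directly from the SAR ADC resolution formula and (8). The sketch below reproduces them and also shows the midtread mapping $D_{\rm err} = D_{\rm out} - 2^{M-1}$; function names are illustrative, and the unit capacitance is computed here as $C_{\rm sar}/2^{M-1}$, which is an assumption consistent with the quoted 0.625 fF.

```python
def sar_lsb(c_sar, c_s, v_ref, m_bits):
    """SAR ADC resolution in V/bit: (C_sar / C_s) * V_ref / 2^(M-1)."""
    return (c_sar / c_s) * v_ref / 2 ** (m_bits - 1)

def sar_half_range(c_sar, c_s, v_ref):
    """One-sided input range in volts: (C_sar / C_s) * V_ref."""
    return (c_sar / c_s) * v_ref

def tdc_resolution(dv_adc, k_td):
    """Equivalent TDC resolution in s/bit from Eq. (8)."""
    return dv_adc / k_td

def midtread_encode(d_out, m_bits):
    """Convert the unsigned ADC code to the signed error D_err."""
    return d_out - 2 ** (m_bits - 1)

M = 6
C_SAR, C_DMY = 20e-15, 80e-15
C_S = C_SAR + C_DMY
dv = sar_lsb(C_SAR, C_S, 0.8, M)
print(f"range  = +/-{sar_half_range(C_SAR, C_S, 0.8) * 1e3:.0f} mV")
print(f"LSB    = {dv * 1e3:.1f} mV/bit")
print(f"C_u    = {C_SAR / 2 ** (M - 1) * 1e15:.3f} fF")  # assumed C_sar / 2^(M-1)
print(f"dt_tdc = {tdc_resolution(dv, 25e9) * 1e15:.0f} fs/bit")
print(midtread_encode(0, M), midtread_encode(32, M), midtread_encode(63, M))
```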

## C. Design of DLF

The DLF comprises the proportional  $(\gamma)$  and integral  $(\rho)$  paths, along with a controlled DZ.

1) Proportional Path: The proportional path is intended for correcting instantaneous phase errors caused by PN. The coefficient $\gamma$ serves as the "TDC-to-DCO code scaling" factor. It can be configured to values such as $2^1$, $2^0$, $2^{-1}$, and so on by applying arithmetic left-bit shifting (i.e., <<<), no shifting, or right-bit shifting (i.e., >>>), respectively, to fine-tune the loop bandwidth (BW). However, right-bit shifting (e.g., >>>1 or >>>2) reduces the detection resolution by discarding lower bits. This is mathematically equivalent to setting $\gamma=1$ but results in a coarser TDC step of $2\Delta t_{\rm tdc}$ or $4\Delta t_{\rm tdc}$, which reduces the BW while increasing the quantization noise. $K_{\Delta T, \rm dco}$ ($\approx T_{\rm osc} \cdot K_{\rm dco}/f_{\rm osc}$) is the DCO gain in the time domain (unit: s/bit), while $K_{\rm dco}$ represents the DCO gain in the frequency domain (unit: Hz/bit). The loop BW can be further optimized by adjusting $K_{\Delta T, \rm dco}$.

<sup>9</sup>A midrise encoder can also be used, typically with $\gamma = 1$, where the quantization noise is suppressed by BB effects, rendering it independent of $K_{\rm TD}$ [37]. However, its effective TDC resolution ($\Delta t_{\rm tdc}$) is not well defined, as it depends on the standard deviation of $\Delta t_{\rm err}$, $\sigma_{\Delta t, \rm err}$. Therefore, when $K_{\rm TD}$ is high, a midtread encoder is preferred to ensure a more stable and predictable resolution for jitter optimization [see Fig. 23(a)].

Fig. 10. (a) Implementation of the DZ in Verilog and (b) its visualization with an example where $DZ\_PAR = 1$.

Per the analysis in [10], the BW of a wideband digital PLL depends on the "timestamp correction factor," which characterizes the strength of correction applied to  $\Delta t_{\rm err}$ , expressed as follows:

$$\alpha = \gamma \frac{K_{\Delta T, \text{dco}}/T_{\text{osc}}}{\Delta t_{\text{tdc}}/T_{\text{ref}}} \le 1. \tag{9}$$

The terms  $\Delta t_{\rm tdc}/T_{\rm ref}$  and  $K_{\Delta T,{\rm dco}}/T_{\rm osc}$  ( $\approx K_{\rm dco}/f_{\rm osc}$ ) represent the normalized TDC gain and DCO gain (units: ppm/bit), respectively. Consequently, the loop BW can be estimated approximately as  $\alpha/2\pi \cdot f_{\rm ref}$ , or more precisely determined from [10, Fig. 4(b)], especially at higher  $\alpha$ .
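To make (9) concrete, the following sketch computes $\alpha$ and the first-order BW estimate $\alpha/2\pi \cdot f_{\rm ref}$. Only $\Delta t_{\rm tdc} = 200$ fs/bit is taken from the text; the reference frequency (100 MHz), the DCO gain (200 kHz/bit), and the 20-GHz operating point are hypothetical values chosen purely for illustration.

```python
import math

def correction_factor(gamma, k_dco, f_osc, dt_tdc, f_ref):
    """Timestamp correction factor alpha from Eq. (9).

    Uses K_dT,dco / T_osc ~= K_dco / f_osc and dt_tdc / T_ref = dt_tdc * f_ref.
    """
    norm_dco_gain = k_dco / f_osc
    norm_tdc_step = dt_tdc * f_ref
    return gamma * norm_dco_gain / norm_tdc_step

def loop_bw_estimate(alpha, f_ref):
    """Approximate loop bandwidth in Hz: alpha / (2*pi) * f_ref."""
    return alpha / (2 * math.pi) * f_ref

# Hypothetical operating point (f_ref, K_dco, and f_osc are assumptions).
F_REF, F_OSC, K_DCO = 100e6, 20e9, 200e3
alpha = correction_factor(1.0, K_DCO, F_OSC, 200e-15, F_REF)
print(f"alpha = {alpha:.2f}")
print(f"BW   ~= {loop_bw_estimate(alpha, F_REF) / 1e6:.2f} MHz")
```

Halving $\gamma$ halves $\alpha$ in this estimate. As noted in the text, the linear BW estimate becomes less accurate at higher $\alpha$.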

2) Integral Path With the DZ: The integral path, incorporating the DZ, detects and minimizes the frequency error between $f_{\rm osc}$ and $Nf_{\rm ref}$ by accumulating the phase error. The integral coefficient $\rho$ (implemented via arithmetic right-bit shifting, i.e., >>>) is set significantly smaller than $\gamma$ (e.g., $\rho/\gamma \le 2^{-4} < 1/10$; see Section VI for details) to ensure loop stability and control the convergence speed of the integral path. The DZ prevents excessive corrections to the DCO, an issue<sup>10</sup> that cannot be resolved by merely reducing $\rho$.

Fig. 10(a) illustrates the implementation of the DZ in Verilog. It discards the lower bits of  $D_{\rm err}$  using an arithmetic right-bit shift operation with a positive variable DZ\_PAR (i.e.,  $D_{\rm err}>>>$  DZ\_PAR). For a negative  $D_{\rm err}$ , a corrective increment (+1) is applied after<sup>11</sup> the arithmetic right-bit shift. The visualization of the DZ, with an example where DZ\_PAR = 1, is presented in Fig. 10(b).
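The DZ operation described above can be mimicked in a few lines of Python. This is only a behavioral sketch of the text, not the authors' Verilog: the function name `dead_zone` and the `pre_increment` switch (covering the footnoted variant in which the +1 is applied before the shift) are illustrative assumptions.

```python
def dead_zone(d_err: int, dz_par: int, pre_increment: bool = False) -> int:
    """Dead zone as described in the text.

    Default: arithmetic right shift by dz_par, then +1 if d_err was negative.
    pre_increment=True: add the +1 before the shift instead (footnoted variant).

    Python's >> on int is an arithmetic (floor) shift, so it plays the role
    of Verilog's >>> applied to a signed value.
    """
    if d_err >= 0:
        return d_err >> dz_par
    if pre_increment:
        return (d_err + 1) >> dz_par
    return (d_err >> dz_par) + 1

# Example with DZ_PAR = 1: small errors of either sign map to zero.
for d in range(-4, 5):
    print(d, dead_zone(d, 1))
```

With DZ\_PAR = 4, every input from -15 to +15 maps to zero in this sketch, so small toggling of $D_{\rm err}$ leaves the integral path idle.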

With a DZ, if  $D_{\rm err}$  toggles within a small range (e.g.,  $\pm 1$ ) mainly due to thermal PN, the integral path remains inactive, preventing conflicts with the proportional path. However, the presence of frequency error (e.g., due to temperature variations) as well as frequency fluctuations/wander (e.g., due to the DCO's flicker PN) can eventually drive  $\Delta t_{\rm err}$  beyond the DZ threshold, activating the integral path to track the frequency variations and suppress flicker PN. Consequently, the PLL enhances robustness and maintains a certain degree of type-II filtering for the DCO's flicker PN [40], [41], [42], [43]. Generally, larger rms jitter of  $\Delta t_{\rm err}$  (e.g., due to poor reference PN) necessitates a larger DZ. However, an excessively large DZ can degrade the integral path's ability to track frequency variations and suppress the DCO's flicker PN.

# IV. PN ANALYSIS OF CSS-ADPLL

## A. Multirate Timestamp Model of CSS-ADPLL

Due to the very high  $K_{\rm TD}$  that leads to a drastic bandwidth expansion in CSS-ADPLL, we adopt a multirate timestamp model incorporating two z-variables [6], [10] for PN and jitter analysis. As shown in Fig. 11,<sup>12</sup>  $t_{\rm ref}[n]$  and  $t_{\rm osc}[k]$  represent the timestamps of the reference and oscillator, respectively. The downsampler  $(\downarrow N)$  bridges the timestamps from the high (i.e.,  $f_{\rm osc}$ ) to the low (i.e.,  $f_{\rm ref}$ ) sampling-rate domain, while the upsampler  $(\uparrow N)$  and zero-order hold (ZOH) perform the reverse operation. Correspondingly, two z-variables are employed to execute the z-transform for the two-rate timestamps as follows:

$$z_{\text{ref}} = e^{j2\pi \Delta f/f_{\text{ref}}}, \text{ and } z_{\text{osc}} = e^{j2\pi \Delta f/f_{\text{osc}}}. \tag{10}$$

The ZOH is represented as  $(1-z_{\rm ref}^{-1})/(1-z_{\rm osc}^{-1})$ , where  $z_{\rm ref}^{-1}$  and  $z_{\rm osc}^{-1}$  denote a delay of one reference cycle and one oscillator cycle, respectively (with  $z_{\rm ref}^{-1}=z_{\rm osc}^{-N}$ , timewise). Furthermore,  $z_{\rm osc}^{-L}$ , where  $L=\lceil N/2\rceil+1$ , represents a loop delay equivalent to half the reference cycle. This delay arises, in our specific case, from the detection of  $\Delta t_{\rm err}$  at the falling edges of the reference clock and the tuning of the DCO at its rising edges.

Based on Fig. 11, the output PN of the CSS-ADPLL is derived as follows:

$$\begin{aligned}
\mathcal{L}_{\text{out}}(z_{\text{osc}}) \approx{} & \left| \frac{1}{N} \frac{H_{\text{corr}}}{1 + H_{\text{corr}}/N} \right|^{2} N^{2} \left(\mathcal{L}_{\text{ref}}(z_{\text{ref}}) + \mathcal{L}_{\text{TDC}}(z_{\text{ref}})\right) \\
& + \left| 1 - \frac{1}{N} \frac{H_{\text{corr}}}{1 + H_{\text{corr}}/N} \right|^{2} \mathcal{L}_{\text{osc}}(z_{\text{osc}})
\end{aligned} \tag{11}$$

 $<sup>^{10}</sup>$ For instance, without a DZ and even with a small  $\rho=2^{-9}$  in Fig. 8, if the accumulator output reaches approximately 511, 1023, and so on, the integral path may unnecessarily adjust by  $D_{\rm err}=\pm1$ , conflicting with the proportional path. This effect arises from the quantization introduced by the right-bit shifting, leading to excessive corrections.

 $<sup>^{11}\</sup>mathrm{The}$  corrective increment (+1) can also be applied before the arithmetic right-bit shift negative  $D_{\mathrm{err}}.$  This does not result in a noticeable difference in PLL time-domain behavioral simulation.

 $<sup>^{12}</sup>$  This model can also be extended to analyze the charge-domain fractional- N ADPLL based on CSS [37] by replacing  $\Delta t_{\rm tdc}$  and  $\sigma_{\Delta q}$  with their midrise encoder counterparts. The capacitive DAC quantization noise is added after the  $K_{\rm TD}$  stage.



Fig. 11. Multirate timestamp modeling [10] of CSS-ADPLL, supporting wideband PN analysis.

where  $H_{\text{corr}}$  is the transfer function of the feedforward path, expressed as follows:

$$H_{\text{corr}} = \frac{K_{\text{TD}}}{\Delta V_{\text{adc}}} H_{\text{DLF}} K_{\Delta T, \text{dco}} \frac{1 - z_{\text{ref}}^{-1}}{1 - z_{\text{osc}}^{-1}} \frac{1}{1 - z_{\text{osc}}^{-1}} z_{\text{osc}}^{-L}. \tag{12}$$

The transfer function of the DLF is given by

$$H_{\rm DLF} = \gamma + \frac{\rho'}{1 - z_{\rm ref}^{-1}}.\tag{13}$$

When the DZ is enabled, the effective integral coefficient  $\rho'$  can be approximately estimated as  $\rho \cdot 2^{-\mathrm{DZ\_PAR}}$ , or treated as effectively 0, in which case the PLL degenerates into a type-I PLL in the PN analysis.
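To make (11)-(13) concrete, the following sketch evaluates $H_{\text{corr}}$ and the two closed-loop transfer functions at a given offset $\Delta f$. It rests on stated assumptions rather than on the authors' scripts: the detector gain $K_{\text{TD}}/\Delta V_{\text{adc}}$ is taken as $1/\Delta t_{\rm tdc}$ (bit/s), $K_{\Delta T,\text{dco}}$ is taken as $T_{\rm osc}\cdot 10$ ppm/bit, and the numerical values ($f_{\rm ref}=250$ MHz, $f_{\rm osc}=20$ GHz, $\gamma=2^{-1}$, $\Delta t_{\rm tdc}=200$ fs/bit) are those quoted elsewhere in this article. All names are illustrative.

```python
import cmath
import math

F_REF = 250e6                    # reference frequency (Hz)
F_OSC = 20e9                     # oscillator frequency (Hz)
N = round(F_OSC / F_REF)         # multiplication ratio (80)
L_DELAY = math.ceil(N / 2) + 1   # loop delay in oscillator cycles
GAMMA = 0.5                      # proportional coefficient
DT_TDC = 200e-15                 # s/bit; assumes K_TD / dV_adc = 1 / DT_TDC
K_DT_DCO = 10e-6 / F_OSC         # s/bit; assumes T_osc * (K_dco / f_osc)

def h_corr(df, rho_eff=0.0):
    """Feedforward transfer function per (12), with H_DLF from (13)."""
    z_ref_inv = cmath.exp(-2j * math.pi * df / F_REF)
    z_osc_inv = cmath.exp(-2j * math.pi * df / F_OSC)
    h_dlf = GAMMA + rho_eff / (1 - z_ref_inv)
    zoh = (1 - z_ref_inv) / (1 - z_osc_inv)
    integrator = 1 / (1 - z_osc_inv)
    delay = z_osc_inv ** L_DELAY
    return (1 / DT_TDC) * h_dlf * K_DT_DCO * zoh * integrator * delay

def closed_loop(df, rho_eff=0.0):
    """Return (low-pass, high-pass) closed-loop terms used in (11)."""
    h = h_corr(df, rho_eff)
    lowpass = (h / N) / (1 + h / N)
    return lowpass, 1 - lowpass

for df in (1e4, 1e5, 1e6, 1e7):
    lp, hp = closed_loop(df)
    print(f"{df:9.0f} Hz  |LP| = {abs(lp):.4f}  |HP| = {abs(hp):.4f}")
```

In (11), the reference and TDC noise are weighted by $N^2$ times the squared low-pass magnitude, and the oscillator noise by the squared high-pass magnitude. Passing `rho_eff=2**-13` instead of 0 gives the type-II variant discussed in the text.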

In addition,  $\mathcal{L}_{ref}$  refers to the total reference PN (i.e.,  $\Delta t_{ref}[n]$  in the time domain), which includes both the intrinsic reference PN and the contributions from the on-chip reference path (e.g., PN originating from the  $clk\_css$  pulse generator,  $\mathcal{L}_{pulse}$ ).  $\mathcal{L}_{osc}$  corresponds to the PN of the oscillator (i.e.,  $\Delta t_{osc}[k]$  in time domain).

## B. PN Analysis of CSS-Based TDC Affecting $\Delta t_{err}$

Specifically,  $\mathcal{L}_{TDC}$  in (11) represents the internal PN of the CSS-based TDC that affects  $\Delta t_{err}[n]$ .<sup>13</sup> It includes noise contributions from the CSS ( $V_{n,css}[n]$ ), the SAR ADC comparator ( $V_{n,cmp}[n]$ ), and the quantization error of the SAR ADC ( $\Delta q[n]$ ).

1) Noise From CSS: The CSS noise is characterized as a sampled differential voltage noise,  $V_{n,css}[n]$ , on  $V_{err+} - V_{err-}$ , after each CSS cycle. It originates from the current noise in  $I_{css\pm}$ , which is injected into  $C_s$  during  $\tau_{pulse}$  [see Fig. 2(c)].

Assume that the average current noise power during CSS in one branch of  $I_{\text{err}+}$  or  $I_{\text{err}-}$  is  $\overline{I_{\text{n,css}}^2}$  (unit: A<sup>2</sup>/Hz), accounting for variations in  $V_{\text{GS}}$  and  $V_{\text{DS}}$  of  $M_{1/2}$ . Accordingly, the spectrum of  $V_{\text{n,css}}[n]$  can be derived as follows:

$$\overline{V_{\text{n,css}}^2} = \frac{2 \cdot \overline{I_{\text{n,css}}^2} \tau_{\text{pulse}}^2}{C_s^2} \tag{14}$$

where the coefficient "2" arises because  $\overline{I_{n,css}^2}$  in  $I_{err+}$  and  $I_{err-}$  represents two independent noise sources.  $\overline{V_{n,css}^2}$  can be accurately simulated by Cadence PNOISE analysis with the "sampled jitter" noise type.<sup>14</sup> It is configured to periodically  $(T_{\rm ref})$  observe the differential voltage noise on  $V_{\rm err+} - V_{\rm err-}$ , triggered after the falling edge of  $\tau_{\rm pulse}$ . When normalized to the PN affecting  $\Delta t_{\rm err}[n]$ , it is expressed as follows:

$$\mathcal{L}_{\text{css}}(\Delta f) = \left(\frac{2\pi}{T_{\text{ref}}}\right)^2 \cdot \frac{\overline{V_{\text{n,css}}^2/2}}{K_{\text{TD}}^2} \quad \propto \quad \frac{\overline{I_{\text{n,css}}^2}}{G_{\text{m}}^2} \tag{15}$$

where  $-f_{\rm ref}/2 < \Delta f < f_{\rm ref}/2$ .  $\overline{V_{\rm n,css}^2}$  is represented as a one-sided spectrum in the "sampled jitter" analysis of PNOISE and must be divided by 2 to convert it into a two-sided spectrum for single-sideband (SSB) PN calculation. Fig. 12(a) presents the simulated  $\mathcal{L}_{\rm css}$ , which is significantly suppressed due to the high  $K_{\rm TD}$ . To further reduce thermal noise in  $\mathcal{L}_{\rm css}$ ,  $G_{\rm m}$  can be increased at the cost of higher power consumption (with a corresponding increase in  $C_{\rm s}$  to avoid the ill-conditioned case that degrades  $K_{\rm TD}$ ). It should be noted that increasing  $C_{\rm s}$  by itself cannot suppress  $\mathcal{L}_{\rm css}$ , as shown in (15) and Fig. 12(b).<sup>15</sup> In addition, the flicker noise in  $\mathcal{L}_{\rm css}$ , which cannot be suppressed by merely increasing  $G_{\rm m}$ , can be reduced by increasing the area of  $M_{1/2}$  (e.g., increasing  $L_{1/2}$ ).

2) Comparator Noise in SAR ADC: The differential input-referred sampled noise of the SAR ADC comparator,  $\overline{V_{n,\text{cmp}}^2}$ ,<sup>16</sup> can also be suppressed by a high  $K_{\text{TD}}$ , as it is normalized to the PN affecting  $\Delta t_{\text{err}}[n]$ :

$$\mathcal{L}_{cmp}(\Delta f) = \left(\frac{2\pi}{T_{ref}}\right)^2 \cdot \frac{\overline{V_{n,cmp}^2/2}}{K_{TD}^2} \tag{16}$$

where  $-f_{\rm ref}/2 < \Delta f < f_{\rm ref}/2$ , as shown in Fig. 12(c).  $\overline{V_{\rm n,cmp}^2}$  is represented as a one-sided spectrum in "sampled jitter" analysis of PNOISE.
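The normalization shared by (15) and (16) can be written as a small helper. This is a sketch only: the function name and the numerical inputs in the example call are hypothetical placeholders (the simulated noise densities and $K_{\rm TD}$ are not restated here), chosen just to show the units and the $1/K_{\rm TD}^2$ scaling.

```python
import math

def pn_from_sampled_noise(v2_one_sided, k_td, t_ref):
    """SSB PN in dBc/Hz from a sampled voltage noise, following (15)/(16).

    v2_one_sided : one-sided sampled voltage noise density (V^2/Hz)
    k_td         : time-error detection gain (V/s)
    t_ref        : reference period (s)
    """
    two_sided = v2_one_sided / 2  # one-sided -> two-sided spectrum
    pn_linear = (2 * math.pi / t_ref) ** 2 * two_sided / k_td ** 2
    return 10 * math.log10(pn_linear)

# Hypothetical inputs, for illustration only (not values from this work).
base = pn_from_sampled_noise(v2_one_sided=1e-16, k_td=1e10, t_ref=4e-9)
doubled = pn_from_sampled_noise(v2_one_sided=1e-16, k_td=2e10, t_ref=4e-9)
print(f"PN drops by {base - doubled:.2f} dB when K_TD doubles")
```

The example only demonstrates the scaling: doubling $K_{\rm TD}$ lowers the normalized PN by about 6 dB for the same voltage noise.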

<sup>13</sup>Per [10], the noise contributions added to  $t_{\rm ref}[n]$ ,  $\Delta t_{\rm err}[n]$ , and  $t_{\rm out,down}[n]$  undergo the same transfer function to the output.

<sup>14</sup>It should be noted that PNOISE using sampled jitter analysis for voltage and current noise provides a one-sided spectrum, whereas PNOISE for PN simulation presents an SSB PN spectrum. Thus, to calculate (SSB) PN, all one-sided noise spectra must first be converted to two-sided spectra.

<sup>15</sup>This is because both  $\overline{V_{\rm n,css}^2}$  and  $K_{\rm TD}^2$  are proportional to  $1/C_{\rm s}^2$ .

<sup>16</sup>$\overline{V_{n,\text{cmp}}^2}$  is simulated based on PNOISE with "sampled jitter" noise type. Given a fixed differential input offset (e.g., 0.1 mV), the differential output voltage noise of the comparator is observed periodically every  $T_{\rm ref}$ , triggered when the differential output reaches a specific voltage (e.g., 100 mV) during the comparison. This noise is then normalized by the voltage gain (e.g.,  $100\,{\rm mV}/0.1\,{\rm mV}$ ) to obtain  $\overline{V_{n,\text{cmp}}^2}$ .

Fig. 12. Simulated and normalized PN contributions from (a) CSS current noise (i.e.,  $\mathcal{L}_{\text{css}}$ ), (b) its variation with  $C_{\text{s}}$  at 100-kHz offset, and (c) input equivalent noise of the comparator in the SAR ADC (i.e.,  $\mathcal{L}_{\text{cmp}}$ ), both simulated using PNOISE with the "sampled jitter" noise type ( $|\Delta f| < f_{\text{ref}}/2$ ).

3) Quantization Noise:  $\Delta q[n]$  is the detector's quantization error and so  $|\Delta q[n]| \le 0.5$  bit. Considering  $\Delta t_{\rm err}[n]$  is fairly uniformly distributed during phase lock, the standard deviation of  $\Delta q[n]$ ,  $\sigma_{\Delta q}$ , is around  $1/\sqrt{12}$  bit<sub>rms</sub>. Normalizing  $\sigma_{\Delta q}$  into the PN affecting  $\Delta t_{\rm err}[n]$ , we get

$$\mathcal{L}_{\Delta q}(\Delta f) = \frac{(2\pi \cdot \sigma_{\Delta q} \Delta t_{\text{tdc}} / T_{\text{ref}})^2}{f_{\text{ref}}} \tag{17}$$

where  $-f_{\rm ref}/2 < \Delta f < f_{\rm ref}/2$ . With  $f_{\rm ref}=250$  MHz and  $\Delta t_{\rm tdc}=200$  fs/bit,  $\mathcal{L}_{\Delta q}=-164.83$  dBc/Hz.
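The quoted value can be reproduced directly from (17). The short sketch below does so with $\sigma_{\Delta q} = 1/\sqrt{12}$ bit; the function name is illustrative.

```python
import math

def quantization_pn_dbc(dt_tdc, f_ref, sigma_bits=1 / math.sqrt(12)):
    """Quantization-noise PN in dBc/Hz, following (17).

    dt_tdc     : TDC step (s/bit)
    f_ref      : reference frequency (Hz)
    sigma_bits : rms quantization error in bits (uniform assumption)
    """
    phase_rms = 2 * math.pi * sigma_bits * dt_tdc * f_ref  # rad rms, dt/T_ref = dt*f_ref
    return 10 * math.log10(phase_rms ** 2 / f_ref)

print(round(quantization_pn_dbc(200e-15, 250e6), 2))  # -164.83
```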

Based on the above analysis, we obtain  $\mathcal{L}_{TDC} = \mathcal{L}_{css} + \mathcal{L}_{cmp} + \mathcal{L}_{\Delta q}$ .

## C. PN Analysis of Pulse Gen. ( $\mathcal{L}_{pulse}$ , Part of $\mathcal{L}_{ref}$ )

The schematic of pulse generator for  $clk\_css$  is shown in Fig. 8. The programmable delay spans from 52 to 230 ps with a 7-bit control, enabling flexible generation of  $\sim 1.5~T_{\rm osc}$  pulses.

The PN analysis of the pulse in CSS is not straightforward, as it is influenced by the uncorrelated contributions of both the rising and falling edges. Assuming that the corresponding rms jitters of these edges are  $\sigma_{\rm edge,rise}$  and  $\sigma_{\rm edge,fall}$ , respectively, the jitter of the  $clk\_css$  is determined by its midpoint, i.e., the average of the two edge timing errors. Because the edges are uncorrelated, the variance of the midpoint is  $\sigma_{\rm edge,rise}^2/4 + \sigma_{\rm edge,fall}^2/4$ . Consequently, the PN of the  $clk\_css$  in CSS can be expressed as follows:

$$\begin{aligned}
\mathcal{L}_{\text{pulse}}(\Delta f) &= \frac{(2\pi/T_{\text{ref}})^2 \left(\sigma_{\text{edge,rise}}^2 / 4 + \sigma_{\text{edge,fall}}^2 / 4\right)}{f_{\text{ref}}} \\
&= \frac{1}{4} \mathcal{L}_{\text{edge,rise}}(\Delta f) + \frac{1}{4} \mathcal{L}_{\text{edge,fall}}(\Delta f)
\end{aligned} \tag{18}$$

where  $\mathcal{L}_{edge,rise}$  and  $\mathcal{L}_{edge,fall}$  are the PN contributions from the rising edges and falling edges, respectively. They can be simulated using PNOISE with the "sampled jitter" noise type,<sup>17</sup> which directly presents an SSB PN spectrum. At a 100-kHz offset, based on the simulated  $\mathcal{L}_{edge,rise} = -157$  dBc/Hz and  $\mathcal{L}_{edge,fall} = -155$  dBc/Hz, the calculated  $\mathcal{L}_{pulse}$  is -159 dBc/Hz, which can be safely neglected in this ADPLL system.
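The combination in (18) is a power sum in linear units, which the following sketch carries out for the two simulated edge values; the function name is illustrative.

```python
import math

def pulse_pn_dbc(l_rise_dbc, l_fall_dbc):
    """Combine rising- and falling-edge PN (dBc/Hz) per (18)."""
    linear = 0.25 * 10 ** (l_rise_dbc / 10) + 0.25 * 10 ** (l_fall_dbc / 10)
    return 10 * math.log10(linear)

print(round(pulse_pn_dbc(-157, -155), 1))  # -158.9
```

For two equal edge contributions the result sits 3 dB below either edge, since each is weighted by 1/4.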

## D. PN Contributions From Various Noise Sources

Fig. 13(a) and (b) illustrate the breakdown of PN contributions from various loop-filtered noise sources, calculated using (11) for  $\rho'=0$  and  $\rho'=\rho\cdot 2^{-\mathrm{DZ\_PAR}}=2^{-13}$ , respectively. In this calculation,  $\mathcal{L}_{\mathrm{css}}$ ,  $\mathcal{L}_{\mathrm{cmp}}$ , and  $\mathcal{L}_{\Delta q}$  are derived from simulations and analytical modeling, while  $\mathcal{L}_{\mathrm{ref}}$  (including simulated noise from the reference buffer and pulse generator) and  $\mathcal{L}_{\mathrm{osc}}$  are obtained from both simulations and measurements. In addition,  $\gamma$  is set to 1/2 to reduce the loop BW for improved suppression of reference PN. Although this results in a coarser TDC step of  $2\Delta t_{\mathrm{tdc}}$ , the impact remains negligible due to the inherently high  $K_{\mathrm{TD}}$ .  $K_{\Delta T,\mathrm{dco}}/T_{\mathrm{osc}}$  (approximately  $K_{\mathrm{dco}}/f_{\mathrm{osc}}$ ) is set to 10 ppm/bit, optimized based on loop BW, which influences both the jitter performance and the locking range (i.e.,  $|f_{LR}|$ ), as shown in Fig. 14(a).

Fig. 13. PN contributions from different loop-filtered noise sources calculated using (11) with (a)  $\rho'=0$  and (b)  $\rho'=\rho\cdot 2^{-\rm DZ\_PAR}=2^{-13}.$  Parameters:  $\gamma=2^{-1},~\rho=2^{-9},~{\rm and}~{\rm DZ\_PAR}=4.~K_{\rm dco}/f_{\rm osc}=10$  ppm/bit with  $f_{\rm osc}=20~{\rm GHz}.$ 

<sup>17</sup>It should be noted that the PN of the pulse should not be simulated by PNOISE with the "time average" noise type. This setting calculates the noise power averaged over all time points within the periodic steady-state (PSS) period, rather than focusing on the pulse edges, which are most critical for pulse PN analysis. The "time average" option is more suitable for noise figure (NF) evaluation of LNAs or for PN analysis of oscillators.

The PN contributions from the CSS, comparator, and quantization error are fully suppressed due to the high  $K_{TD}$ , as shown in Fig. 13. Estimating the impact of the DZ on filtering the DCO's in-band PN in z-domain analysis is nontrivial. For instance, with  $\rho' = 2^{-13}$ , the suppression of the DCO's in-band PN is more significant than with  $\rho' = 0$ . However, the two configurations differ only minimally in overall jitter performance within our ADPLL, although  $\rho' = 0$  shows slightly better agreement with the measured results in the 1-10-kHz offset frequency range. The PLL's PN is predominantly determined by the loop-filtered reference and the on-chip buffer PN, with only slight degradation in the in-band region due to flicker noise contributions from other sources. The calculated PN closely matches the measured PN (see Section VI for details), demonstrating the effectiveness of the proposed model shown in Fig. 11.

## E. Numerical Verification in Time Domain

To further evaluate the influence of the DZ on the CSS-ADPLL, we conducted simulations using a time-domain behavioral model implemented in Verilog-AMS within Cadence Spectre AMS Designer. The ADPLL output timestamps (see the Appendix for DCO timestamp modeling in a sub-50-fs ADPLL) are recorded and post-processed in MATLAB to extract the PN [10], [39], as illustrated in Fig. 14(b). Both thermal and flicker PN are modeled in the time domain (see [38]) for the combined reference, CSS, and comparator (i.e.,  $\mathcal{L}_{ref} + \mathcal{L}_{css} + \mathcal{L}_{cmp}$ ), as well as for the free-running DCO (i.e.,  $\mathcal{L}_{osc}$ ).

If the DZ is disabled (i.e., DZ\_PAR = 0 in Fig. 10), PN peaking may occur, degrading jitter performance even with a small  $\rho=2^{-9}$ . This phenomenon results from overcorrection by the integral path and can only be observed in time-domain behavioral simulations, as it is not captured by either z-domain or s-domain analysis.

By enabling the DZ, the CSS-ADPLL effectively degenerates into a type-I PLL once the frequency error is sufficiently minimized by the integral path. The close agreement between the analytical predictions and behavioral simulations validates the accuracy and effectiveness of the proposed modeling approach.

# V. CIRCUIT IMPLEMENTATION OF OTHER BLOCKS

## A. SAR ADC and Timing

A sample-and-hold block with bootstrap switches in the conventional SAR ADC [45] is replaced by the proposed CSS. The comparator adopts a two-stage dynamic structure [44] for noise reduction, as shown in Fig. 15. It is important to ensure that the CM voltage of  $V_{\rm err,\pm}$ ,  $V_{\rm err,cm}$ , remains within the input CM voltage range of the comparator in the SAR ADC after the CSS. Given the relatively low  $V_{\rm err,cm}$  when  $\tau_{\rm pulse} = 1.5 \, T_{\rm osc}$ , a pMOS diff-pair is preferred for the comparator to maintain proper operating conditions.

Fig. 14. (a) Simulated rms jitter and locking range  $|f_{LR}|$  versus  $K_{dco}/f_{osc}$ . (b) Time-domain behavioral simulation with DZ disabled (i.e., DZ\_PAR = 0) and enabled (i.e., DZ\_PAR = 4). Both the reference and oscillator are modeled with flicker and thermal PN in the time domain. Parameters:  $\gamma=2^{-1}$  and  $\rho=2^{-9}$ .

Fig. 15. Schematic of the comparator proposed in [44] that achieves threefold noise improvement.

For the SAR ADC specifications in the CSS-ADPLL, the ADC resolution ( $\Delta V_{\rm adc}$ ) and comparator noise ( $\overline{V_{\rm n,cmp}^2}$ ) primarily impact the PLL jitter, as previously discussed. Since  $\Delta V_{\rm err}$  remains small during an integer-N operation, there are no stringent requirements for the ADC linearity. The post-layout simulated effective number of bits (ENOB) is 5.81 bit, with a signal-to-noise and distortion ratio (SNDR) of 36.72 dB.
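The ENOB and SNDR figures quoted above are linked by the standard sine-wave relation ENOB = (SNDR - 1.76 dB)/6.02 dB. The sketch below applies that textbook formula as a cross-check; it is not taken from the authors' simulation setup.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR (dB), standard sine-wave relation."""
    return (sndr_db - 1.76) / 6.02

print(round(enob_from_sndr(36.72), 2))  # 5.81
```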

The detailed schematic of the timing controller (SAR logic) and its complete timing diagram are shown in Fig. 16(a) and (b), respectively (see also Figs. 9 and 15 for their connections). The asynchronous timing of the SAR ADC follows the conventional approach in [45]. Once all bit transitions are complete, the read-out clock,  $clk\_rout$ , captures all bits into the output registers as unsigned  $D_{out}$ .



Fig. 16. Timing controller (SAR logic). (a) Schematic and (b) timing diagram.



Fig. 17. Two-core DCO implementation. (a) Schematic and (b) layout.

## B. DCO With Tuning Banks

The schematic of the DCO is illustrated in Fig. 17(a). It is a two-core complementary DCO utilizing a distributed  $G_{\rm m}$  topology for direct mm-wave frequency generation with low PN. The layout of the DCO is shown in Fig. 17(b). The two cores are placed at the center of the inductor coil to achieve a compact layout, with the power supply and ground connections extending from the left and right sides of the oscillator coil. This configuration naturally forms a tail inductor structure, which optimizes the flicker noise characteristics of the DCO [40].

The DCO incorporates an 8-bit coarse-tuning switched-capacitor (sw-cap) bank and an 8-bit fine-tuning sw-cap bank. As shown in Fig. 17(b), the coarse-tuning sw-cap bank consists of two 8-bit sub-coarse-tuning sw-cap banks located on the upper and lower sides of the DCO, which tune simultaneously, while the fine-tuning bank is located only on the lower side of the DCO for finer frequency tuning resolution. The structure of the coarse-tuning capacitor bank is similar to that in [43], employing reverse-biasing resistors to minimize parasitics in the OFF state. The unit capacitance of the coarse-tuning capacitor bank ( $\Delta C_{\text{coarse,unit}}$ ) is 3 fF/LSB, providing an overall TR of 18.8–23.3 GHz (21.9%). The corresponding frequency coarse-tuning step ( $\Delta f_{\text{coarse,unit}}$ ) is 17.64 MHz. Consequently, the maximum frequency error  $f_{\text{osc}} - N f_{\text{ref}}$  after the FLL tunes the coarse bank is  $\pm \Delta f_{\text{coarse,unit}}/2$  (i.e.,  $\pm 8.82$  MHz), which remains well within the simulated PLL locking range  $f_{\text{LR}}$  of  $\pm 24$  MHz. The quality factors of the coarse sw-cap bank in the ON and OFF states are 24 and 66 at 20 GHz, respectively.

Fig. 18. (a) Schematic and (b) layout of the fine sw-cap bank unit, along with two floorplan implementations of the bank: (c) central-symmetric distribution and (d) interdigitated distribution. (e) INL comparison of the two floorplans.

The fine-tuning capacitor-bank unit ( $\Delta C_{\rm fine,unit}$ ) is designed with a step of 24 aF/bit, providing a frequency tuning resolution of about 200 kHz/bit (i.e.,  $K_{\rm dco}/f_{\rm osc}\approx 10$  ppm/bit), as analyzed in Section IV-D. The overlap ratio between the one-step jump of the coarse bank and the fine-bank TR is given by  $1-\Delta C_{\rm coarse,unit}/((2^8-1)\times\Delta C_{\rm fine,unit})$  (e.g., 51.2%), which should be sufficiently large to ensure seamless switching from the coarse bank to the fine bank.
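The tuning-bank numbers above can be cross-checked with a few lines of arithmetic. The sketch uses the rounded values quoted in the text (3 fF, 24 aF, 17.64 MHz, 200 kHz, 20 GHz, $\pm 24$ MHz); the variable names are illustrative. With these rounded inputs the overlap formula evaluates to roughly 51%, close to the example figure quoted in the text.

```python
DC_COARSE = 3e-15     # coarse unit capacitance (F/LSB)
DC_FINE = 24e-18      # fine unit capacitance (F/bit)
FINE_BITS = 8         # fine-bank resolution
DF_COARSE = 17.64e6   # coarse frequency step (Hz)
DF_FINE = 200e3       # fine frequency step (Hz/bit)
F_OSC = 20e9          # nominal oscillator frequency (Hz)
F_LR = 24e6           # simulated locking range, one-sided (Hz)

# Worst-case residual error after the FLL settles the coarse bank.
half_step = DF_COARSE / 2

# Normalized fine DCO gain in ppm/bit.
kdco_ppm = DF_FINE / F_OSC * 1e6

# Overlap between one coarse step and the full fine-bank range.
overlap = 1 - DC_COARSE / ((2 ** FINE_BITS - 1) * DC_FINE)

print(f"residual error  = +/-{half_step / 1e6:.2f} MHz (limit +/-{F_LR / 1e6:.0f} MHz)")
print(f"K_dco / f_osc   = {kdco_ppm:.1f} ppm/bit")
print(f"overlap ratio   = {overlap * 100:.1f} %")
```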

Implementing the fine-tuning sw-cap bank with such a 24-aF/bit resolution step is not straightforward. As shown in Fig. 18(a), the tiny step is achieved by selectively shorting  $C_2$  in the series combination of  $C_1$  and  $C_2$ . Since the capacitance of the fine bank is significantly smaller than that of the coarse bank, the drain and source nodes of the nMOS switch are pulled down to the ground using large resistors, eliminating the need for any reverse-biasing setup and further simplifying the layout. The physical implementation of the fine sw-cap bank unit is illustrated in Fig. 18(b), with post-layout simulated quality factors exceeding 200 in both ON and OFF states. Capacitors  $C_1$  and  $C_2$ , along with their ground fence (for improved isolation and linearity), are implemented using a customized metal-oxide-metal (MOM) capacitor structure composed of "Metal-1" (M1)-M6 layers. To optimize the area utilization, poly resistors  $R_{\text{poly}}$  are embedded within the capacitor structure. Thick M7 is used for connections with  $V_{\rm osc\pm}$  to ensure a high quality factor. Two floorplan candidates based on central-symmetric and interdigitated distributions are illustrated in Fig. 18(c) and (d), respectively. In both layouts, each pair of adjacent units is arranged in a mirror-symmetric manner, ensuring that the adjacent plates of  $C_2$  maintain the same polarity to minimize parasitics. As shown in Fig. 18(e), the central-symmetric distribution achieves a significantly lower peak-to-peak integral non-linearity (INL) (0.2 LSB) compared with the interdigitated distribution, making it a more suitable choice for precision tuning.

Fig. 19. Simulated locking behavior of BB operation and the CSS-ADPLL for various SAR ADC output resolutions under a 5-MHz frequency offset.

## C. Frequency and Phase Locking

The FLL in Fig. 8 includes a full-custom counter (CNT), a synthesized finite state machine (FSM) that implements a binary-search algorithm, similar to that in [46], and a retimer to synchronize their clock domains. Once the FLL process is completed, the CSS-ADPLL is immediately enabled. The multi-bit digital output of the TD significantly enhances the locking speed. According to the transient simulation in Fig. 19, a 6-bit digital output achieves locking in less than  $0.5~\mu s$  under a 5-MHz frequency offset, whereas the BB operation (1-bit output) takes up to  $100~\mu s$ . This demonstrates the effectiveness of a multi-bit digital output PD in accelerating the locking process. The locking curve above 0 MHz reflects the phase re-alignment behavior. Therefore, further reduction in locking time should focus on optimizing the initial phase alignment between the reference and the oscillator.

# VI. EXPERIMENTAL RESULTS

The proposed 18.8–23.3-GHz CSS-ADPLL is fabricated in 22-nm CMOS, occupying an active area of 0.044 mm<sup>2</sup>, as shown in Fig. 20. With all modules powered by 0.8 V, the total power consumption is 9.95 mW, primarily dominated by the DCO, which consumes 8.65 mW. As shown in Fig. 13, the DCO's PN at 1-MHz offset from 20 GHz is -102.1 dBc/Hz, with a  $1/f^3$  corner of 120 kHz.

Fig. 20. Chip micrograph and power breakdown of its building blocks.

Fig. 21. Measured frequency acquisition  $(0.27\,\mu\text{s})$  and phase locking  $(0.34\,\mu\text{s})$  behavior of the FLL and CSS-ADPLL, respectively, under a 200-MHz initial frequency error.

The frequency and phase locking time are measured using the R&S FSW85 in its transient analysis mode, as shown in Fig. 21. For a 200-MHz frequency error between the DCO's initial frequency and the target frequency, the total locking time is 0.61  $\mu$ s, consisting of 0.27  $\mu$ s for the FLL to control the coarse bank for frequency acquisition and 0.34  $\mu$ s for the CSS-ADPLL to control the fine bank for frequency fine-tuning and phase locking. This demonstrates the fast-locking capability of the proposed CSS-ADPLL with the FLL.

The PN of the PLL is measured using the Keysight E5052B signal source analyzer and E5053A downconverter. The reference source is R&S SMA-100B with a B711(N) option.<sup>18</sup> With the DZ enabled in the DLF, Fig. 22(a) and (d) show a measured rms jitter of 63 fs with a reference spur of -52.4 dBc at 19 GHz, while Fig. 22(b) and (e) show 68.6 fs with -51.9 dBc at 23 GHz. Fig. 22(c) and (f) show that the jitter and spur performance consistently remain around 65 fs and -52 dBc across the TR. Specifically, Fig. 22(c) further compares the jitter performance with a one-time  $\tau_{\text{pulse}}$  setup arrangement and an arrangement with two optimized  $\tau_{\text{pulse}}$  values for the low- and high-frequency ranges, validating the effectiveness of the  $\tau_{\text{pulse}}$  setup scheme discussed in Section II-C.

<sup>18</sup>The chip has been re-measured for jitter and spur performance using this equipment as the reference source, which offers lower phase noise compared to the crystal-based reference used for our preceding conference paper [36].

Fig. 22. Measured results. (a) PN at 19 GHz, (b) PN at 23 GHz, (c) rms jitter over the TR with a one-time  $\tau_{pulse}$  setup at 21 GHz as well as two setups for  $\tau_{pulse}$  optimized separately for the low- and high-frequency ranges, (d) reference spur at 19 GHz, (e) reference spur at 23 GHz, and (f) reference spur over the TR with a one-time  $\tau_{pulse}$  setup. The DLF is configured as:  $\gamma = 2^{-1}$ ,  $\rho = 2^{-9}$ , and DZ\_PAR = 4.

Fig. 23. Measured results at 20 GHz. (a) RMS jitter versus  $\gamma$ , (b) rms jitter versus  $\rho$  with  $\gamma=1$  or  $2^{-1}$ , (c) rms jitter versus DZ\_PAR with  $\gamma=1$  or  $2^{-1}$ , (d) PN plots for different DZ\_PAR values, (e) rms jitter versus DCO supply variations with DZ\_PAR = 4 or with the integral path disabled (e.g., DZ\_PAR  $\geq$  5), and (f) rms jitter versus  $\tau_{\text{pulse}}$  (with  $\tau_{\text{pulse}}$  estimated from simulation).

To fully characterize the DLF with the DZ, Fig. 23 presents the jitter measurement for different values of  $\gamma$ , DZ\_PAR, and  $\rho$ . At 20 GHz, with optimized DZ\_PAR = 4 and  $\rho$  =  $2^{-9}$ , sweeping  $\gamma$  from 2 to  $2^{-3}$  results in the lowest jitter of 67.3 fs at  $\gamma = 2^{-1}$ , as shown in Fig. 23(a).

With the DZ disabled (i.e., DZ\_PAR = 0 in Fig. 10) and  $\gamma = 1$  or  $2^{-1}$ , the jitter decreases as  $\rho$  is reduced and saturates at approximately 100 fs when  $\rho/\gamma \leq 2^{-4}$ , as shown in Fig. 23(b). This experimentally confirms that  $\rho/\gamma \leq 2^{-4} < 1/10$  serves as a practical rule of thumb for balancing the proportional and integral paths. However, the results also demonstrate that, without the DZ, merely reducing  $\rho$  is insufficient to suppress overcorrection from the integral path, ultimately leading to increased rms jitter, as analyzed in Section III-C.

|                                        | This work                    | Hu,<br>JSSC'22<br>[6]         | Wang,<br>JSSC'23<br>[18] | Wang,<br>JSSC'24<br>[19] | Du,<br>VLSI'21<br>[22]          | Zhao,<br>JSSC'23<br>[24]          | Gong,<br>JSSC'22<br>[34]          | Dolt,<br>JSSC'24<br>[47] | Lim,<br>JSSC'22<br>[20] | Li,<br>JSSC'24<br>[32] |
|----------------------------------------|------------------------------|-------------------------------|--------------------------|--------------------------|---------------------------------|-----------------------------------|-----------------------------------|--------------------------|-------------------------|------------------------|
| Technology (nm)                        | 22 CMOS                      | 28 CMOS                       | 40 CMOS                  | 40 CMOS                  | 28 CMOS                         | 28 CMOS                           | 40 CMOS                           | 22 CMOS                  | 65 CMOS                 | 28 CMOS                |
| Architecture                           | Charge-<br>Steering<br>ADPLL | Charge-<br>Sharing<br>Locking | SS-PLL                   | Dual-Path<br>SSPLL       | Reference-<br>sampling<br>ADPLL | Double-<br>Sampling<br>Analog PLL | Charge-<br>Sampling<br>Analog PLL | CP-PLL                   | Digital-<br>SSPLL       | SS-PLL                 |
| Output Freq.<br>(GHz)                  | 18.7-23.3<br>(21.9%)         | 21.7-26.5<br>(19.3%)          | 7.9-14.3<br>(28.8%)      | 20-24<br>(18.2%)         | 24-31<br>(25.5%)                | ~20<br>(2.25%)                    | 9.6-12<br>(22.2%)                 | 15-22<br>(37.8%)         | 12-14.5<br>(18.9%)      | 23.2-26<br>(11.4%)     |
| Ref. Freq. (MHz)                       | 250                          | 250                           | 100                      | 250                      | 50                              | 250                               | 100                               | 500                      | 50                      | 100                    |
| RMS Jitter (fs)<br>Integrated<br>Range | 63<br>(10k-40M)              | 75.9<br>(10k-30M)             | 77<br>(1k-30M)           | 61.2<br>(1k-100M)        | 199<br>(10k-30M)                | 20.9<br>(10k-40M)                 | 48.6<br>(1k-100M)                 | 121<br>(1k-100M)         | 83<br>(1k-100M)         | 48.3<br>(10k-100M)     |
| Ref. Spur (dBc)                        | -52.2                        | -45                           | -54                      | -44                      | -65                             | -66                               | -77.3                             | -64.1                    | -75                     | -66                    |
| Norm. Ref.<br>Spur** (dBc)             | -52.2                        | -47.4                         | -49.7                    | -47.4                    | -68.2                           | -66                               | -72.3                             | -65.4                    | -71.6                   | -68.2                  |
| Power (mW)                             | 9.95                         | 16.5                          | 14.1                     | 13.35                    | 11.55                           | 12                                | 5                                 | 55.7                     | 7.7                     | 19.1                   |
| FoM* (dB)                              | -254                         | -250.2                        | -250.5                   | -253                     | -243.3                          | -262.8                            | -259.2                            | -240.9                   | -252.8                  | -253.5                 |
| Active Area<br>(mm²)                   | 0.044                        | 0.5                           | 0.18                     | 0.057                    | 0.3                             | 0.06                              | 0.13                              | 0.17                     | 0.23                    | 0.065                  |
| <sup>&</sup>FoM<sub>A</sub> (dB)       | -267.6                       | -253.2                        | -257.9                   | -265.4                   | -248.5                          | -275                              | -268.1                            | -248.6                   | -259.2                  | -265.4                 |

TABLE I Comparison With State-of-the-Art Integer-N mm-Wave/RF PLLs

To further investigate the effect of the DZ, DZ\_PAR is swept from 0 to 4 for two cases,  $\gamma=1$  and  $\gamma=2^{-1}$, as illustrated in Fig. 23(c). In both cases, the jitter decreases as DZ\_PAR increases and saturates beyond DZ\_PAR = 2, demonstrating the effectiveness of the DZ. Fig. 23(d) shows the measured PN for different DZ\_PAR values: without the DZ, PN peaking occurs and significantly worsens the jitter, as discussed in Section IV-E.
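The role of the dead zone can be illustrated with a minimal behavioral sketch of a proportional-integral loop filter whose integral path ignores small errors. This is an assumption-laden illustration rather than the DLF of Section IV-E: the class name, the gains `kp` and `ki`, and the interpretation of `dead_zone` as a simple magnitude threshold on the TD output code are all hypothetical.

```python
class DeadZonePIFilter:
    """Hypothetical PI loop filter with a dead zone in the integral path."""

    def __init__(self, kp, ki, dead_zone):
        self.kp = kp                # proportional gain (assumed value)
        self.ki = ki                # integral gain (assumed value)
        self.dead_zone = dead_zone  # assumed threshold on |error|, in TD LSBs
        self.integrator = 0.0

    def update(self, error):
        # The integral path only accumulates errors outside the dead zone,
        # so small errors are handled by the proportional path alone.
        if abs(error) > self.dead_zone:
            self.integrator += self.ki * error
        return self.kp * error + self.integrator


dlf = DeadZonePIFilter(kp=1.0, ki=0.125, dead_zone=2)
small = dlf.update(1)   # inside the dead zone: integrator stays at 0
large = dlf.update(4)   # outside the dead zone: integrator moves to 0.5
```

With a zero-width dead zone, both paths react to every error sample, which is the situation in which the text reports PN peaking.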

With the largest DZ, i.e., DZ\_PAR = 4, Fig. 23(e) shows that the jitter performance remains stable as the DCO supply is swept from 0.75 to 0.85 V, which alters the DCO's intrinsic frequency. However, when the integral path is fully turned off, the PLL becomes prone to unlocking under small variations in the DCO supply. Fig. 23(f) presents the measured jitter versus  $\tau_{pulse}$  (estimated by simulation), demonstrating that the jitter is not highly sensitive to  $\tau_{pulse}$  around  $\tau_{pulse}/T_{osc}=1.5$ , as analyzed in Section II-C.

Compared with the prior art in Table I, our prototype achieves a figure of merit (FoM) of -254 dB, which is remarkable for digital PLLs in the >20-GHz range.
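As a quick arithmetic check, the FoM definitions given in the footnotes of Table I can be evaluated for this work's reported jitter (63 fs), power (9.95 mW), and active area (0.044 mm²). The function names below are illustrative only.

```python
import math


def fom_db(jitter_s, power_mw):
    """FoM = 20*log10(Jitter/1 s) + 10*log10(Power/1 mW), per Table I."""
    return 20 * math.log10(jitter_s / 1.0) + 10 * math.log10(power_mw / 1.0)


def fom_area_db(fom, area_mm2):
    """FoM_A = FoM + 10*log10(Area/1 mm^2), per Table I."""
    return fom + 10 * math.log10(area_mm2 / 1.0)


fom = fom_db(63e-15, 9.95)
fom_a = fom_area_db(fom, 0.044)
print(f"FoM = {fom:.1f} dB, FoM_A = {fom_a:.1f} dB")
# prints "FoM = -254.0 dB, FoM_A = -267.6 dB"
```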

#### VII. CONCLUSION

This work presents a charge-domain ADPLL leveraging the proposed charge-steering sampling technique. By effectively integrating the charge-steering sampler into an SAR ADC, a multi-bit fine-resolution TDC is realized, enabling the ADPLL to achieve low jitter, low spurs, and fast locking. To accommodate the short-period oscillating waveform characteristic of mm-wave frequencies, a 1.5× oscillator-period sampling pulse scheme is introduced, extending the applicability of CSS to high-frequency domains. A digital loop filter incorporating a dead zone is implemented to mitigate conflicts between the proportional and integral paths, further improving jitter performance. On the theoretical front, a damped-sine waveform model for the CSS current is developed, providing a comprehensive explanation for its high time-detection gain, even in deeply scaled short-channel CMOS devices. The noise mechanisms associated with the CSS are systematically analyzed through a multirate timestamp model, offering detailed insights into their impact on the ADPLL's PN performance. As a newly developed phase-detection mechanism, CSS demonstrates significant potential for broader exploration across diverse applications and oscillator topologies.

#### APPENDIX

#### BEHAVIORAL MODELING OF THE DCO IN VERILOG-AMS FOR SUB-50-FS ADPLL

Recording the DCO's output timestamps for PN and spur plotting in MATLAB has proven to be an effective methodology for analyzing ADPLLs and novel frequency synthesizers [6], [10], [38], [39]. However, when modeling an ADPLL targeting sub-50-fs jitter performance, the conventional Verilog-based modeling approach described in [38] encounters significant delay resolution limitations, which compromise the accuracy required at such low jitter levels.

As shown in Fig. 24(a), even if the resolution of osc\_period using the real data type is sufficiently high, the delay (#) in out remains constrained to 1-fs resolution. Specifically, because each falling edge delay (or rising edge delay) depends on the previously accumulated rising-edge delay (or falling-edge delay), all sub-1-fs resolution errors accumulate progressively over time. This leads to unacceptable inaccuracies when modeling an ADPLL targeting sub-50-fs jitter performance.

Notes to Table I: <sup>\*</sup>FoM = 20log<sub>10</sub>(Jitter/1s) + 10log<sub>10</sub>(Power/1mW); <sup>\*\*</sup>Spur normalized to 20 GHz; <sup>&</sup>FoM<sub>A</sub> = FoM + 10log<sub>10</sub>(Area/1mm<sup>2</sup>)

To overcome the "1-fs" resolution limitation, we propose a DCO edge modeling approach based on the absolute timestamp. As shown in Fig. 24(b), the falling-edge delay is obtained by differentiating two timestamps, resulting in only a one-time accumulation of sub-1-fs errors due to (#). This method significantly enhances the accuracy of falling-edge modeling. Meanwhile, the rising-edge delay remains dependent on the preceding falling edge, thereby accumulating twice the sub-1-fs error. Nevertheless, the resulting error accumulation is considerably smaller than that in Fig. 24(a), making the proposed approach markedly more accurate for high-precision ADPLL modeling.

[Fig. 24(b) code fragments recovered from the figure: out = #(osc\_timestamp\_fall - osc\_timestamp\_rise) 1'b0; osc\_timestamp\_fall = osc\_timestamp\_fall + osc\_period;]

Fig. 24. (a) Conventional DCO modeling in Verilog-AMS from [38], which suffers from the accumulation of sub-1-fs resolution errors. (b) Proposed DCO modeling, where one-edge delay (e.g., falling edge delay) is based on differentiating timestamps, significantly reducing sub-1-fs resolution error accumulation.
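The difference between the two edge-scheduling schemes can be reproduced numerically outside the simulator. The Python sketch below is a simplified stand-in for the Verilog-AMS models, under the assumption that every `#` delay is rounded to the nearest 1 fs and that the oscillator period is constant; the function names and the 50000.4-fs example period are hypothetical.

```python
def conventional_rising_edges(period_fs, n_cycles):
    """Chained delays: each half-period '#' delay is rounded to 1 fs."""
    now = 0  # simulator time in integer femtoseconds
    edges = []
    for _ in range(n_cycles):
        edges.append(now)             # rising edge
        now += round(period_fs / 2)   # rounded delay to the falling edge
        now += round(period_fs / 2)   # rounded delay to the next rising edge
    return edges


def timestamp_rising_edges(period_fs, n_cycles):
    """Falling edge scheduled from an absolute real-valued timestamp."""
    now = 0
    t_fall = period_fs / 2            # absolute falling-edge timestamp (real)
    edges = []
    for _ in range(n_cycles):
        edges.append(now)             # rising edge
        now += round(t_fall - now)    # delay from the timestamp difference
        t_fall += period_fs           # advance the absolute timestamp
        now += round(period_fs / 2)   # rising edge still chained to the fall
    return edges


def worst_error_fs(edges, period_fs):
    """Largest deviation of the rising edges from the ideal k * period."""
    return max(abs(t - k * period_fs) for k, t in enumerate(edges))


period = 50000.4  # fs, i.e. roughly a 20-GHz oscillator (assumed example)
err_conventional = worst_error_fs(conventional_rising_edges(period, 1000), period)
err_timestamp = worst_error_fs(timestamp_rising_edges(period, 1000), period)
```

In this example the chained-delay model drifts by about 0.4 fs per cycle, so its error grows to hundreds of femtoseconds after 1000 cycles, whereas the timestamp-based model stays within 1 fs of the ideal edge positions.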

#### REFERENCES

- [1] D. Pfaff et al., "A 224 Gb/s 3 pJ/bit 40 dB insertion loss transceiver in 3-nm FinFET CMOS," *IEEE J. Solid-State Circuits*, vol. 60, no. 1, pp. 9–22, Jan. 2025.
- [2] J. Im et al., "6.1 A 112Gb/s PAM-4 long-reach wireline transceiver using a 36-way time-interleaved SAR-ADC and inverter-based RX analog front-end in 7nm FinFET," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 116–118.
- [3] B. Razavi, "Jitter-power trade-offs in PLLs," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 68, no. 4, pp. 1381–1387, Apr. 2021.
- [4] T. Micallef, I. Hussain, and K. Wu, "Multifunction transceiver for data communication, radar sensing and power transfer," *Electromagn. Sci.*, vol. 3, no. 2, pp. 1–22, Jun. 2025.
- [5] Base Station (BS) Radio Transmission and Reception, document TS 38.104, V18.6.0, 3GPP, 2024.
- [6] Y. Hu et al., "A charge-sharing locking technique with a general phase noise theory of injection locking," *IEEE J. Solid-State Circuits*, vol. 57, no. 2, pp. 518–534, Feb. 2022.
- [7] Z. Zhang et al., "An 18–23 GHz 57.4-fs RMS jitter –253.5-dB FoM sub-harmonically injection-locked all-digital PLL with single-ended injection technique and ILFD aided adaptive injection timing alignment technique," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 66, no. 10, pp. 3733–3746, Oct. 2019.
- [8] H. Choi and S. Cho, "19.1 a 7.5GHz subharmonic injection-locked clock multiplier with a 62.5MHz reference, -259.7dB FoMJ, and -56.6dBc reference spur," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. 67, Feb. 2024, pp. 348–350.
- [9] S. Kumar et al., "A ping-pong charge-sharing locking PLL with implicit reference doubling and simultaneous frequency/duty-cycle calibrations," *IEEE J. Solid-State Circuits*, vol. 60, no. 4, pp. 1368–1383, Apr. 2025.
- [10] Y. Hu, T. Siriburanon, and R. B. Staszewski, "Multirate timestamp modeling for ultralow-jitter frequency synthesis: A tutorial," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 69, no. 7, pp. 3030–3036, Jul. 2022.
- [11] X. Gao, E. A. M. Klumperink, M. Bohsali, and B. Nauta, "A low noise sub-sampling PLL in which divider noise is eliminated and PD/CP noise is not multiplied by N<sup>2</sup>," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3253–3263, Dec. 2009.
- [12] Z. Yang, Z. Xu, M. Osada, and T. Iizuka, "A 10-GHz inductorless cascaded PLL with zero-ISF subsampling phase detector achieving -63-dBc reference spur, 175-fs RMS jitter and -240-dB FOMjitter," in *Proc. IEEE Symp. VLSI Technol. Circuits (VLSI Technol. Circuits)*, Jun. 2022, pp. 10-11.

- [13] T. Siriburanon et al., "A 2.2 GHz –242 dB-FOM 4.2 mW ADC-PLL using digital sub-sampling architecture," *IEEE J. Solid-State Circuits*, vol. 51, no. 6, pp. 1385–1397, Jun. 2016.
- [14] Z. Yang, Y. Chen, S. Yang, P.-I. Mak, and R. P. Martins, "16.8 a 25.4-to-29.5GHz 10.2mW isolated sub-sampling PLL achieving -252.9dB jitter-power FoM and -63dBc reference spur," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 270–272.
- [15] Z. Zhang, G. Zhu, and C. P. Yue, "A 0.65-V 12–16-GHz sub-sampling PLL with 56.4-fs<sub>rms</sub> integrated jitter and -256.4-dB FoM," *IEEE J. Solid-State Circuits*, vol. 55, no. 6, pp. 1665–1683, Jun. 2020.
- [16] J. Kim et al., "16.2 a 76fs<sub>rms</sub> jitter and –40dBc integrated-phase-noise 28-to-31GHz frequency synthesizer based on digital sub-sampling PLL using optimally spaced voltage comparators and background loop-gain optimization," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 258–260.
- [17] J. Kim, Y. Jo, H. Park, T. Seong, Y. Lim, and J. Choi, "A 12.8–15.0-GHz low-jitter fractional-N subsampling PLL using a voltage-domain quantization-error cancellation," *IEEE J. Solid-State Circuits*, vol. 59, no. 2, pp. 424–434, Feb. 2024.
- [18] Y. Wang et al., "Analysis and design of a dual-mode VCO with inherent mode compensation enabling a 7.9–14.3-GHz 85-fs-rms jitter PLL," *IEEE J. Solid-State Circuits*, vol. 58, no. 8, pp. 2252–2266, Aug. 2023.
- [19] L. Wang, Z. Liu, R. Ma, and C. P. Yue, "A compact 20–24-GHz sub-sampling PLL with charge-domain bandwidth control scheme," *IEEE J. Solid-State Circuits*, vol. 60, no. 3, pp. 768–784, Mar. 2025.
- [20] Y. Lim, J. Kim, Y. Jo, J. Bang, and J. Choi, "A wide-lock-in-range and low-jitter 12–14.5 GHz SSPLL using a low-power frequency-disturbance-detecting and correcting loop," *IEEE J. Solid-State Circuits*, vol. 57, no. 2, pp. 480–491, Feb. 2022.
- [21] J. Du, T. Siriburanon, Y. Hu, V. Govindaraj, and R. B. Staszewski, "A reference-waveform oversampling technique in a fractional-N ADPLL," *IEEE J. Solid-State Circuits*, vol. 56, no. 11, pp. 3445–3457, Nov. 2021.
- [22] J. Du et al., "A 24–31 GHz reference oversampling ADPLL achieving FoMjitter-N of -269.3 dB," in *Proc. Symp. VLSI Circuits*, Jun. 2021, pp. 1–2.
- [23] W. Wu et al., "A 14-nm ultra-low jitter fractional-N PLL using a DTC range reduction technique and a reconfigurable dual-core VCO," *IEEE J. Solid-State Circuits*, vol. 56, no. 12, pp. 3756–3767, Dec. 2021.
- [24] Y. Zhao, M. Forghani, and B. Razavi, "A 20-GHz PLL with 20.9-fs random jitter," *IEEE J. Solid-State Circuits*, vol. 58, no. 6, pp. 1597–1609, Jun. 2023.
- [25] S. M. Dartizio et al., "A fractional-N bang-bang PLL based on type-II gear shifting and adaptive frequency switching achieving 68.6 fsrms-total-integrated-jitter and 1.56 μs-locking-time," *IEEE J. Solid-State Circuits*, vol. 57, no. 12, pp. 3538–3551, Dec. 2022.
- [26] S. M. Dartizio et al., "A low-spur and low-jitter fractional-N digital PLL based on an inverse-constant-slope DTC and FCW subtractive dithering," *IEEE J. Solid-State Circuits*, vol. 58, no. 12, pp. 3320–3337, Dec. 2023.
- [27] M. Rossoni et al., "10.1 an 8.75GHz fractional-N digital PLL with a reverse-concavity variable-slope DTC achieving 57.3fs<sub>rms</sub> integrated jitter and -252.4dB FoM," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2024, pp. 188–190.
- [28] L. Feng, W. Rhee, and Z. Wang, "A DTC-free fractional-N BBPLL with FIR-embedded injection-locked-oscillator-based phase-domain lowpass filter," *IEEE J. Solid-State Circuits*, vol. 59, no. 3, pp. 728–739, Mar. 2024.
- [29] A. Santiccioli et al., "A 66-fs-rms jitter 12.8-to-15.2-GHz fractional-N bang-bang PLL with digital frequency-error recovery for fast locking," *IEEE J. Solid-State Circuits*, vol. 55, no. 12, pp. 3349–3361, Dec. 2020.
- [30] S. M. Dartizio et al., "A 12.9-to-15.1-GHz digital PLL based on a bang-bang phase detector with adaptively optimized noise shaping," *IEEE J. Solid-State Circuits*, vol. 57, no. 6, pp. 1723–1735, Jun. 2022.
- [31] D. Lee and P. P. Mercier, "A sub-mW 2.4-GHz active-mixer-adopted sub-sampling PLL achieving an FoM of -256 dB," *IEEE J. Solid-State Circuits*, vol. 55, no. 6, pp. 1542–1552, Jun. 2020.
- [32] H. Li, T. Xu, X. Meng, J. Yin, R. P. Martins, and P.-I. Mak, "A 23.2-to-26-GHz low-jitter fast-locking sub-sampling PLL based on a function-reused VCO-buffer and a Type-I FLL with rapid phase alignment," *IEEE J. Solid-State Circuits*, vol. 59, no. 12, pp. 3952–3965, Dec. 2024.
- [33] S. Yoo, S. Choi, J. Kim, H. Yoon, Y. Lee, and J. Choi, "A low-integrated-phase-noise 27–30-GHz injection-locked frequency multiplier with an ultra-low-power frequency-tracking loop for mm-Wave-band 5G transceivers," *IEEE J. Solid-State Circuits*, vol. 53, no. 2, pp. 375–388, Feb. 2018.

- [34] J. Gong, E. Charbon, F. Sebastiano, and M. Babaie, "A low-jitter and low-spur charge-sampling PLL," *IEEE J. Solid-State Circuits*, vol. 57, no. 2, pp. 492–504, Feb. 2022.
- [35] B. Razavi, "Charge steering: A low-power design paradigm," in *Proc. IEEE Custom Integr. Circuits Conf.*, Sep. 2013, pp. 1–8.
- [36] W. Tao, W. Zhao, R. B. Staszewski, F. Lin, and Y. Hu, "An 18.8-to-23.3 GHz ADPLL based on charge-steering-sampling technique achieving 75.9 fs RMS jitter and -252 dB FoM," in *Proc. IEEE Symp. VLSI Technol. Circuits (VLSI Technol. Circuits)*, Jun. 2023, pp. 1–2.
- [37] W. Tao, Y. Liu, Y. Yang, R. B. Staszewski, F. Lin, and Y. Hu, "A compact 21–25 GHz charge-domain fractional-N ADPLL with 168 fs total RMS jitter," in *Proc. IEEE Eur. Solid-State Electron. Res. Conf. (ESSERC)*, Sep. 2024, pp. 693–696.
- [38] Y. Hu, "Flicker noise upconversion and reduction mechanisms in RF/millimeter-wave oscillators for 5G communications," Ph.D. dissertation, School Elect. Electron. Eng., Univ. College Dublin, Dublin, Ireland, 2019. [Online]. Available: http://hdl.handle.net/10197/11459
- [39] Y. Hu, W. Tao, and R. B. Staszewski, "Nonlinearity-induced spur analysis in fractional-N synthesizers with ΔΣ quantization cancellation," *IEEE Open J. Solid-State Circuits Soc.*, vol. 4, pp. 226–237, 2024.
- [40] Y. Hu, T. Siriburanon, and R. B. Staszewski, "A low-flicker-noise 30-GHz class-F23 oscillator in 28-nm CMOS using implicit resonance and explicit common-mode return path," *IEEE J. Solid-State Circuits*, vol. 53, no. 7, pp. 1977–1987, Jul. 2018.
- [41] Y. Hu, T. Siriburanon, and R. B. Staszewski, "Intuitive understanding of flicker noise reduction via narrowing of conduction angle in voltage-biased oscillators," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 66, no. 12, pp. 1962–1966, Dec. 2019.
- [42] Y. Hu, T. Siriburanon, and R. B. Staszewski, "Oscillator flicker phase noise: A tutorial," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 68, no. 2, pp. 538–544, Feb. 2021.
- [43] X. Chen, Y. Hu, T. Siriburanon, J. Du, R. B. Staszewski, and A. Zhu, "A 30-GHz class-F quadrature DCO using phase shifts between drain-gate-source for low flicker phase noise and I/Q exactness," *IEEE J. Solid-State Circuits*, vol. 58, no. 7, pp. 1945–1958, Jul. 2023.
- [44] M. Miyahara, Y. Asada, D. Paik, and A. Matsuzawa, "A low-noise self-calibrating dynamic comparator for high-speed ADCs," in *Proc. IEEE Asian Solid-State Circuits Conf.*, Nov. 2008, pp. 269–272.
- [45] C.-C. Liu, S.-J. Chang, G.-Y. Huang, and Y.-Z. Lin, "A 10-bit 50-MS/s SAR ADC with a monotonic capacitor switching procedure," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 731–740, Apr. 2010.
- [46] L. Xu, K. Stadius, and J. Ryynanen, "An all-digital PLL frequency synthesizer with an improved phase digitization approach and an optimized frequency calibration technique," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 59, no. 11, pp. 2481–2494, Nov. 2012.
- [47] D. Dolt and S. Palermo, "A radiation-hardened 15–22-GHz frequency synthesizer in 22-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 59, no. 9, pp. 2870–2883, Sep. 2024.



**Robert Bogdan Staszewski** (Fellow, IEEE) was born in Bialystok, Poland. He received the B.Sc. (summa cum laude), M.Sc., and Ph.D. degrees in electrical engineering from The University of Texas at Dallas, Richardson, TX, USA, in 1991, 1992, and 2002, respectively.

From 1991 to 1995, he was with Alcatel Network Systems, Richardson, involved in SONET cross-connect systems for fiber optics communications. He joined Texas Instruments Inc., Dallas, TX, USA, in 1995, where he was an Elected Distinguished Member of Technical Staff (limited to 2% of technical staff). From 1995 to 1999, he was engaged in advanced CMOS read channel development for hard disk drives. In 1999, he co-started the Digital RF Processor (DRP) Group, Texas Instruments Inc., with a mission to invent new digitally intensive approaches to traditional RF functions for integrated radios in deeply scaled CMOS technology. He was appointed as a CTO of the DRP Group from 2007 to 2009. In 2009, he joined the Delft University of Technology, Delft, The Netherlands, where he currently holds a guest appointment as a Full Professor (Antoni van Leeuwenhoek Hoogleraar). Since 2014, he has been a Full Professor with the University College Dublin (UCD), Dublin, Ireland. He is also a Guest Full Professor with the AGH University of Krakow, Kraków, Poland. In addition, he is a Co-Founder of a startup company, Equal1 Labs, with design centers located in Silicon Valley, Dublin, Delft, and Timişoara, Romania, aiming to produce single-chip CMOS quantum computers. He has authored or co-authored six books, 11 book chapters, 170 journal and 230 conference publications, and holds 220 issued U.S. patents. His research interests include nanoscale CMOS architectures and circuits for frequency synthesizers, transmitters and receivers, and quantum computers.

Dr. Staszewski was a recipient of the 2012 IEEE Circuits and Systems Industrial Pioneer Award. He also serves on the Technical Program Committee of the IEEE European Solid-State Circuits Conference (ESSCIRC).



Weichen Tao (Graduate Student Member, IEEE) was born in Yichang, Hubei, China, in 1998. He received the B.Sc. degree in applied physics from the School of the Gifted Young, University of Science and Technology of China, Hefei, China, in 2019, where he is currently pursuing the Ph.D. degree in microelectronics.

His research interests include analog-/mixed-signal IC design, focusing on low-jitter frequency synthesizers for millimeter-wave (mm-wave) applications.



Yuhao Yang (Graduate Student Member, IEEE) was born in Guangde, Anhui, China, in 2001. He received the B.Sc. degree in applied physics from the University of Science and Technology of China (USTC), Hefei, China, in 2022, where he is currently pursuing the Ph.D. degree in microelectronics.

His research interests include high-performance all-digital phase-locked loops (ADPLLs) and digital transmitters.



**Yizhe Hu** (Member, IEEE) was born in Chenzhou, Hunan, China. He received the B.Sc. degree (summa cum laude) in microelectronics from Harbin Institute of Technology, Harbin, China, in 2013, and the Ph.D. degree in microelectronics from the University College Dublin, Dublin, Ireland, in 2019.

From 2013 to 2014, he was a Post-Graduate Researcher with Fudan University, Shanghai, China, focusing on radio-frequency integrated circuits design. He consulted part-time for the PLL Group, HiSilicon, Huawei Technologies, Shenzhen, China, from 2016 to 2017, designing 16-nm digitally controlled oscillators and all-digital phase-locked loops. From 2018 to 2022, he consulted part-time for the Mixed-Signal Design Department, TSMC, Hsinchu, Taiwan, on a new type of phase-locked loop (PLL) design in 5-nm CMOS. From 2019 to 2020, he was a Post-Doctoral Researcher with Prof. Staszewski's Group, University College Dublin. From 2020 to 2022, he served as a Principal Investigator with the Microelectronic Circuits Centre Ireland, Dublin. Since 2022, he has been a Professor with the University of Science and Technology of China, Hefei, China. His research interests include digital-RF/mm-wave integrated circuits and architectures for wireless/wireline communications.

Prof. Hu serves as a reviewer for the IEEE JOURNAL OF SOLID-STATE CIRCUITS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: REGULAR PAPERS/IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, and IEEE Transactions on Microwave Theory and Techniques.