Millimeter-Wave All-Digital Phase-Locked Loop Using Reference Waveform Oversampling Techniques


Abstract:

This article proposes an mm-wave fractional-N all-digital phase-locked loop (ADPLL) employing a reference-waveform oversampling (ROS) phase detector (PD) that increases its effective rate four times, consequently improving jitter at lower power consumption while using a low-frequency reference of 50 MHz. The passive oversampling PD utilizes a zero-forcing technique for voltage-domain presetting and compensation for both the fractional phase and reference spurs induced by imperfections in the reference waveform and reference-waveform oversampling PD (ROS-PD). The ROS-PD eliminates the conventional power-hungry low-noise buffer for the reference input and reduces the PD noise by increasing the loop correction speed. This promotes low jitter and high efficiency in advanced mm-wave PLLs without relying on the increase of the reference clock frequency to several hundred MHz. The imperfections in the reference waveform and ROS-PD, i.e., harmonic distortion, differential path mismatches, and other nonideality factors, can be programmably compensated by the proposed digital manifold calibration scheme, resulting in low reference spurs. A class-F3 oscillator is used to generate a ~10-GHz signal for the feedback divider along with its third harmonic for the harmonic extractor to generate the ~30-GHz output. The proposed ADPLL is implemented in TSMC 28-nm LP CMOS. The prototype generates a 24–31-GHz output carrier with rms jitter of 237 fs while consuming only 12 mW. This corresponds to a state-of-the-art ADPLL $\mathrm{FoM}_{\text{jitter-N}}$ of −269 dB in a fractional-N mode. Using a comprehensive digital calibration, the reference spurious tones can be reduced from −33 to −65 dBc.
Page(s): 212 - 225
Date of Publication: 07 November 2024
Electronic ISSN: 2644-1349

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.
SECTION I.

Introduction

To cope with the continual increase of data rates in wireless communication systems with complex modulation schemes (e.g., 5G NR), millimeter-wave (mmW) signal generation with low integrated phase noise (PN) or rms jitter is of paramount importance [1]. As an example, an error vector magnitude (EVM) typically less than −30 dBc is required, which means that the rms jitter of the PLL must be lower than 200 fs for a 30-GHz carrier [2], [3]. To address such stringent requirements, PLLs play a crucial role in suppressing the oscillator PN at lower frequency offsets to meet the required performance. However, a larger PLL bandwidth is necessary to compensate for the insufficient PN of a typical mmW oscillator. This, in turn, necessitates a high reference clock frequency.

Under the optimal loop bandwidth, the in-band PN (IBPN) of the PLLs is typically dominated by the phase detector (PD) noise and the associated noise in the reference path (i.e., crystal oscillator (XO) and its low-noise reference buffer [4], [5]). Traditional low-jitter analog charge-pump (CP) PLLs rely heavily on the careful design of a high-power CP with accurate up/down current matching and bulky analog loop filters, which become increasingly difficult in advanced CMOS nodes (e.g., finer than 28 nm) [6], [7], [8]. To solve this problem, all-digital PLLs (ADPLLs) are often employed as they are amenable to CMOS scaling [9], [10]. Moreover, digital assistance is used to maintain performance across PVT variations [11]. However, the rms jitter performance of ADPLLs is usually limited by the resolution and linearity of the time-to-digital converter (TDC), which result in poor IBPN and in-band fractional spurs (IBFSs), respectively [12], [13]. To solve this issue, various phase-detection techniques, e.g., digital-to-time converter (DTC)-assisted with short-range high-resolution PD, have been proposed [14], [15], [16]. For example, a subsampling PD (SS-PD) takes advantage of the high slew rate of a signal from the oscillator feedback path by sampling it with the reference frequency $(f_{\textrm {ref}})$ clock with sharp rising edges [17], [18], [19]. This results in a high PD gain with extremely high resolution, but due to the limited linear slope of the input waveform, the PLL could easily lose lock in the face of a perturbation to the oscillator [20]. Thus, an extra frequency-locked loop (FLL) is required to ensure robust locking [21].

Alternatively, a reference-sampling PLL (RS-PLL) uses a frequency-divided signal from the oscillator feedback path to sample the reference sinusoidal XO waveform directly [22]. Owing to the large linear range and limited slew rate of a typical low-cost XO ($f_{\textrm {ref}} \le 50$ MHz), the RS-PLL offers a large locking range and superb locking robustness, but it requires techniques to resolve the issue of low time-to-voltage conversion gain [23]. To break the tradeoff between the achievable PLL bandwidth and $f_{\textrm {ref}}$, various reference multiplication techniques have been proposed. However, they require extra circuitry and complicated calibration, which increase power and area [see Fig. 1(a)] [24], [25], [26]. To solve this problem more efficiently, reference oversampling techniques have been proposed. They increase the PD rate well beyond $f_{\textrm {ref}}$ by locking not only at the zero-crossing points of the reference waveform but at points spread throughout the whole reference period [27], [28], [29], [30].

FIGURE 1. Simplified diagrams of (a) conventional ADPLL with a reference quadrupler and (b) proposed mm-wave ADPLL employing a $4\times $ reference-oversampling PD and class-F3 oscillator with harmonic extractor.

In this article, we propose a low-power low-noise reference waveform oversampling ADPLL (ROS-ADPLL) for mmW frequency generation [see Fig. 1(b)] [31], [32]. To suppress the spurious tones that may arise from distortions in the input XO waveform and associated nonidealities, a calibration technique is proposed. By $4\times $ oversampling the reference waveform, the loop bandwidth can increase from 3 to 10 MHz, which can reduce the IBPN by 6 dB, ultimately achieving 200-fs jitter when constrained by an $f_{\textrm {ref}} = 50$-MHz reference input. Section II presents the proposed $4\times $ reference oversampling PD and the architecture of the ROS-ADPLL, with consideration of nonlinearities in the sampling path, the proposed calibration scheme, and the fractional compensation methods. Section III introduces the circuit implementation of the oscillator and the harmonic extractor, dividers, and clock generator (CG). The measurements and conclusions are presented in the subsequent sections.

SECTION II.

Proposed Oversampling ADPLL with Calibration of Reference Spurs

In this study, a reference-waveform oversampling PD (ROS-PD) is employed to directly sample the waveform of a standard 50-MHz crystal oscillator, eliminating the need for power-intensive reference buffers/slicers and taking advantage of the wide monotonic PD range. By utilizing bottom-plate oversampling with a zero-forcing technique [32], the PD achieves low-noise characteristics while increasing the loop rate to $4\times f_{\textrm {ref}}$, thereby mitigating the IBPN limitations. The system-level diagram is depicted in Fig. 2. The reference waveform $(S_{\textrm {REF}})$ is directly $4\times $-oversampled by the divided oscillator clock ${\mathrm{CK}_{\mathrm{OS}}}$ through the cross-coupled bottom-plate sampling switches. The proposed ROS-PD utilizes a single-path detector assisted by two DACs clocked by ${\mathrm{CK}_{\mathrm{OS}}}$. Thus, the DACs can preset the differential dc voltage to support fractional-N operation while yielding excellent power efficiency. A lookup table (LUT) translates the accumulated fractional phase into a sinusoidal mapping that adjusts the DACs to compensate for the fractional residues in the voltage domain. The sampled signal ${\mathrm{PD}_{\mathrm{OUT}}}$ is amplified by two gain stages (GM1, GM2) and subsequently digitized by the SAR-ADC. The digital implementation facilitates precise phase-error calculation and straightforward calibration of any offsets or mismatches in the PD. A programmable proportional–integral (PI) control block serves as a loop filter, tuning the class-F3 DCO, which drives the third-harmonic extractor (H3E) to produce the ~30-GHz output. The DCO supplies the fundamental ~10-GHz component to the $\div 4$ frequency divider, multimodulus divider (MMDIV), and CG that generates the clocks for the ROS-PD and digital blocks.

FIGURE 2. Top-level diagram of the proposed ROS fractional-N ADPLL with manifold reference spur calibration.

A. Bottom-Plate Oversampling Phase Detectors with Zero-Forcing Technique

Since the PD gain, $K_{\textrm {ROS}}$ , of the proposed architecture is quite low, it appears rather challenging to realize the needed phase digitization that is high resolution yet low power. The conventional approach in [22] and [33] employs a top-plate sampling mechanism [see Fig. 3(a)]. That method tracks the reference waveform voltage $(S_{\textrm {REF}})$ during the high phase of the oscillator feedback clock ${\mathrm{CK}_{\mathrm{OS}}}$ and holds the voltage on sampling capacitor $C_{S}$ during the low phase of ${\mathrm{CK}_{\mathrm{OS}}}$ . In the $4\times $ ROS-PD configuration, the top-plate sampler offers four locking points with a phase increment of $2\pi /4$ during an integer-N operation. However, in a fractional-N mode, the sampled voltage continuously follows $S_{\textrm {REF}}$ . Consequently, the subsequent quantizer would require a wide ~1-V full-range input voltage to accommodate such varying voltages, necessitating an ADC with extremely high resolution to achieve low IBPN. Such a high-resolution ADC would typically consume high power, especially at a conversion speed of hundreds of MHz. To address these challenges, Fig. 3(b) shows the proposed bottom-plate sampling method assisted by a voltage zero-forcing technique. This approach effectively reduces the required input range by eliminating the expected dc offset ${\pm }A\sin {(\pi /4)}$ at each locking point by means of presetting the input node to $V_{P}$ /$V_{N}$ during the preset phase $\overline {\mathrm {CK_{OS}}}$ . Additionally, a fine-resolution/small-range CDAC at the top plate can compensate for the fractional voltage residues. This configuration ensures that the voltage on the top plate of $C_{S}$ accurately represents the bias-free phase error, proportional to $K_{\textrm {ROS}}\cdot \Delta \phi $ . By reducing the required input range, subsequent circuitry can be designed with much relaxed specifications, achieving low-noise performance while minimizing power consumption. 
Unfortunately, there are various nonlinearities associated with the reference path, i.e., harmonic distortions and differential mismatches of the input reference waveform, charge injections from sampling switches, etc. (see Fig. 4). These cause periodic error patterns that will result in reference spurious tones [see Fig. 3(c)].

FIGURE 3. Reference oversampling schemes: (a) conventional top-plate sampling, (b) proposed bottom-plate sampling with voltage zero-forcing, and (c) effect of imperfections from the input harmonic distortion and reference biases where $\overline {\mathrm {CK_{OS}}}$ is the preset phase.

FIGURE 4. Summary of possible imperfections in the proposed differential $4\times $ oversampling PD with voltage zero-forcing and cross-coupled differential configuration.

Fig. 5 shows the operational principle of the reference-oversampling PD with zero forcing. The reference waveform is sampled using bootstrapped switches [34] and two bottom-plate sampling capacitors $C_{\textrm {SP/SN}}$. The top plates of $C_{\textrm {SP/SN}}$ are preset by the reference bias voltages $V_{P}$/$V_{N}$, which are ~0.85 V/0.15 V (i.e., $0.5\pm 0.5\sin {(\pi /4)}$ V), aligning with the $\sin (\pi /4)$ positions of the differential reference sinusoidal input with an amplitude of 0.5 V and spanning from 0 to 1 V. Any residues are compensated further by the CDACs during $\phi _{3}$. The bottom plates are preset to 0.5 V, regulated by the threshold voltage $V_{\textrm {CM}}$ of the self-biasing circuitry, which is a replica of the inverter-based amplifier $A_{1}$. After sampling the reference signal, the voltage information is stored on both plates of $C_{\textrm {S}}$. The digital block controls the CDAC code to compensate for an instantaneous voltage residue arising from the sampling of $S_{\textrm {REF}}$ with the accumulated fractional phase. The amplifier $A_{1}$ is designed for low input-referred noise (IRN), while $A_{2}$ provides high driving ability to amplify and charge up the input capacitor of the SAR-ADC [23].
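As a quick numerical check of the preset levels quoted above, the following Python sketch evaluates $0.5\pm 0.5\sin (\pi /4)$ V and the resulting preset voltage across each sampling capacitor. The function name `preset_levels` is introduced here purely for illustration; the amplitude and common-mode values are the ones stated in the text.

```python
import math

A = 0.5      # reference amplitude in volts (from the text)
V_CM = 0.5   # bottom-plate preset / common-mode voltage in volts (from the text)

def preset_levels(amplitude=A, v_cm=V_CM):
    """Return (V_P, V_N, preset voltage across C_S) for locking points at sin(pi/4)."""
    offset = amplitude * math.sin(math.pi / 4)
    return v_cm + offset, v_cm - offset, offset

v_p, v_n, v_cap = preset_levels()
print(f"V_P = {v_p:.3f} V, V_N = {v_n:.3f} V, preset across C_S = {v_cap:.3f} V")
```

The computed levels (about 0.854 V and 0.146 V, with about 0.354 V across each capacitor) agree with the rounded 0.85 V, 0.15 V, and 0.35 V values used in this section.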

FIGURE 5. Operational principle of the implemented reference-oversampling PD: (a) preset phase ($\phi _{1}$), (b) tracking phase ($\phi _{2}$), (c) amplify/calibration phase ($\phi _{3}$), and (d) complete waveforms of all phases.

The operation of the ROS-PD consists of three phases, as shown in Fig. 5(d). In phase $\phi _{1}$, the top plates of $C_{\textrm {SP}}$/$C_{\textrm {SN}}$ are preset to $V_{P}$ (0.85 V) and $V_{N}$ (0.15 V), respectively, while the bottom-plate nodes, $V_{\textrm {BP}}$/$V_{\textrm {BN}}$, are preset to $V_{\textrm {CM}}$ of 0.5 V. After the presetting phase, the voltage across the top and bottom plates of $C_{S}$ is 0.35 V. In phase $\phi _{2}$ [see Fig. 5(b)], a short ~0.7-ns pulse enables the switch to track the incoming $S_{\textrm {REF}}$. This charges the capacitance on $V_{\textrm {TP/TN}}$, mainly established by the CDAC's input. While maintaining the conserved charge on $C_{\textrm {SP/SN}}$, the voltage at node $V_{\textrm {BP/BN}}$ tracks the $S_{\textrm {REF}}$ trajectory with the previously developed offset of 0.35 V. Concurrently, $A_{1}$ preamplifies the voltage at $V_{\textrm {BP/BN}}$ with a gain of 15. Entering phase $\phi _{3}$ stops the input tracking, thus establishing the sampled voltage. Furthermore, to make an adjustment for the fractional-N operation, the digital control switches the CDAC code to change the load capacitance seen at node $V_{\textrm {TP/TN}}$, so that the voltage at node $V_{\textrm {TP/TN}}$ (and $V_{\textrm {BP/BN}}$) can be programmed to remove the voltage residue seen in the fractional mode. Concurrently, the second-stage amplifier, GM2, starts driving the 6-bit SAR-ADC and amplifies the sampled voltage onto the input CDAC of the SAR-ADC. This phase is reserved for a 3-ns duration. Note that, in the proposed approach, the low time-to-voltage gain $K_{\textrm {ROS}}$ requires amplifying the sampled voltage signal before the digitization. However, a single-stage amplifier with high gain would introduce substantial input parasitic capacitance in the preamplifier stage GM1, leading to a decrease in signal amplitude due to capacitive division between $C_{\mathrm {SP}}$/$C_{\mathrm {SN}}$ and the parasitic capacitance. To ensure accurate voltage transfer during bottom-plate sampling, the input parasitic capacitance of GM1 must be minimized, which restricts the gain that can be achieved in the first stage. Moreover, the input node of GM1 contains voltage information from $S_{\textrm {DCO}}$ with a low transfer gain through the $S_{\textrm {REF}}$ slope, which places strict demands on noise performance, requiring GM1 to have low IRN.

To avoid long charge/discharge times and to improve power efficiency, a cross-coupled sampling technique is proposed. The circuit is composed of four bootstrapped switches with two sampling sets (i.e., $S_{\textrm {1P/1N}}$ and $S_{\textrm {2P/2N}}$), as shown in Fig. 5(d). Therefore, instead of sampling the positive/negative $S_{\textrm {REF}}$ input with a dedicated sampling capacitor $C_{\textrm {SP}}$/$C_{\textrm {SN}}$, four bootstrapped switches are adopted so that the sampling capacitors can be swapped between the differential input waveforms. In the first two sampling phases, $C_{\textrm {SP}}$ samples the positive reference signal $S_{\textrm {REFP}}$ through the $S_{2P}$ switch, while $S_{2N}$ is kept open [see Fig. 5(b)]. During the next two sampling phases, the $S_{2P}$ switch is disabled while $C_{\textrm {SP}}$ samples the negative reference signal $S_{\textrm {REFN}}$ using $S_{2N}$. Exploiting the proposed cross-coupled differential sampling operation, $C_{\textrm {SP}}$ always samples the reference signal in the region near 0.85 V. Likewise, $C_{\textrm {SN}}$ always samples the reference signal near 0.15 V. When the loop is locked, after the sampling phase $\phi _{2}$, the sampled voltage difference caused by the PN should be a small value of $K_{\textrm {ROS}}\cdot \sigma _{\textrm {jitter}}$. Additionally, due to the strict requirements on the timing (~1 ns) of the preset phase $(\phi _{1})$ and calibration phase $(\phi _{3})$, in this prototype, $V_{P}$/$V_{N}$ are provided via separate interconnect pads. This relaxes the linearity and power consumption requirements for the DAC while meeting the timing constraints.
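The cross-coupled swapping described above can be illustrated with a small behavioral sketch in Python. It evaluates an ideal differential sinusoid (0.5-V amplitude around 0.5 V, 50 MHz, as stated in the text) at the four locking points and routes it to the two capacitors with the swap after the second state. The function names and the use of the single-ended waveform slope as a stand-in for $K_{\textrm {ROS}}$ are assumptions made for illustration, not the authors' definitions.

```python
import math

A, V_CM, F_REF = 0.5, 0.5, 50e6  # amplitude (V), common mode (V), reference frequency (Hz)

def s_refp(phase):
    return V_CM + A * math.sin(phase)

def s_refn(phase):
    return V_CM - A * math.sin(phase)

def cross_coupled_samples():
    """Return (state, voltage seen by C_SP, voltage seen by C_SN) at the four locking points."""
    rows = []
    for state in range(1, 5):
        phase = math.pi / 4 + (state - 1) * math.pi / 2
        if state <= 2:
            # States 1-2: C_SP samples S_REFP, C_SN samples S_REFN
            rows.append((state, s_refp(phase), s_refn(phase)))
        else:
            # States 3-4: the capacitors are swapped between the differential inputs
            rows.append((state, s_refn(phase), s_refp(phase)))
    return rows

def slope_at_lock():
    """Magnitude of the single-ended slope |dS_REF/dt| at a locking point, in V/s.

    Used here only as an assumed stand-in for the time-to-voltage gain K_ROS.
    """
    return A * 2 * math.pi * F_REF * math.cos(math.pi / 4)

for state, v_sp, v_sn in cross_coupled_samples():
    print(f"State {state}: C_SP sees {v_sp:.3f} V, C_SN sees {v_sn:.3f} V")
print(f"slope at lock: {slope_at_lock() * 1e-9:.3f} mV/ps")
```

In this idealized model, $C_{\textrm {SP}}$ sees the same level near 0.85 V in all four states and $C_{\textrm {SN}}$ the level near 0.15 V, so neither capacitor has to be recharged across the full swing. Under the same assumption, sub-picosecond jitter maps to a sampled voltage on the order of tens of microvolts, which is consistent with the need for a low-noise preamplifier.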

B. Problem of Reference Spurious Tones and the Proposed Compensation Techniques

The $4\times $ ROS-PD could generate several signatures of periodic patterns that cause reference spurious tones, as summarized in Fig. 6. The differential offset, amplitude mismatch, and the $V_{P}$/$V_{N}$ gain mismatches can cause $+\Delta _{\textrm {offset}}$/$-\Delta _{\textrm {offset}}$ on the positive/negative paths. Note that ${\mathrm {E}}_{\textrm {SREFP}}$ and ${\mathrm {E}}_{\textrm {SREFN}}$ are the single-ended errors for the positive and negative inputs ($S_{\textrm {REFP}}$ and $S_{\textrm {REFN}}$), respectively. As a result, due to the differential signaling of the implemented PD and the cross-coupled sampling, the output errors can be derived as ${\mathrm{E}}_{\textrm {SREFP}}-{\mathrm{E}}_{\textrm {SREFN}}$ and ${\mathrm{E}}_{\textrm {SREFN}}-{\mathrm{E}}_{\textrm {SREFP}}$ for States 1–2 and States 3–4, respectively. Hence, this translates into a normalized [1, 1, −1, −1] error pattern ${\mathrm {ERR}}_{\textrm {offset}}$ at the output over the four sampling states in one reference period. Similarly, the error caused by unbalanced edges of the sinusoidal waveform $(\Delta _{\textrm {edge}})$ generates a [1, −1, −1, 1] pattern ${\mathrm {ERR}}_{\textrm {edge}}$, and the differential phase mismatch between $S_{\textrm {REFP}}$ and $S_{\textrm {REFN}}$ $(\Delta _{\textrm {phi}})$ contributes a [−1, 1, −1, 1] pattern ${\mathrm {ERR}}_{\textrm {phi}}$.

FIGURE 6. Periodic error patterns caused by (a) differential dc offset, (b) unbalanced slopes of input reference waveform, and (c) input differential offset.

The above errors are first coarsely calibrated by the preprogrammed states shown in Fig. 7(a). It can be observed from the equivalent circuit in Fig. 7(b) that the compensation voltages (${\mathrm {V}}_{\textrm {ECP/ECN}}$) can be used to compensate the periodic errors at nodes $V_{\textrm {TP/TN}}$, respectively. These voltages are obtained from the CDACs, which are controlled by predefined patterns. The gain factors ERR1–3 adjust the voltage level of the compensation patterns for the differential dc offset, unbalanced slopes of the input waveform, and input differential offset, respectively, as shown in Fig. 7(c). It should be pointed out that the proposed reference spur calibration shows a significant reduction in the level of reference spurs in both the integer and fractional modes. Finally, the proposed calibration technique for the reference spur reduction is shown in Fig. 8. With this prototype being merely a proof of concept, the LMS calibration of ERR1, ERR2, and ERR3 has not been included. Instead, the gains are set manually and then fine-tuned during lab measurements to reach the proper settings. The compensation values ERR1–3 can be estimated from: 1) $\mathrm {ERR}_{1} = \mathrm {DC}_{\textrm {offset}}/V_{\textrm {res}}$; 2) $\mathrm {ERR}_{2} = A \cdot 10^{\mathrm {THD2}/20}/V_{\textrm {res}}$; and 3) $\mathrm {ERR}_{3} = (A\sqrt {2}/4)\,(\sin (\Delta \phi) + \cos (\Delta \phi) - 1)/(2V_{\textrm {res}})$, where $V_{\textrm {res}}$ is the CDAC resolution, $A$ is the amplitude of the input XO waveform, THD2 is the second-harmonic content in the XO waveform, and $\Delta \phi $ is the phase offset.
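The three estimates above can be evaluated directly. The Python sketch below is a minimal illustration that plugs in the impairment values reported later in this section (4.3-mV dc offset, −40-dBc second harmonic, 3.6° phase offset) together with the 0.5-V amplitude. The CDAC resolution $V_{\textrm {res}}$ is not stated in the text, so the 1-mV value used here is purely a hypothetical placeholder, as is the function name `err_gains`.

```python
import math

def err_gains(dc_offset, amplitude, thd2_dbc, dphi_deg, v_res):
    """Estimate ERR1-3 (in CDAC LSBs) from the three expressions given in the text."""
    dphi = math.radians(dphi_deg)
    err1 = dc_offset / v_res
    err2 = amplitude * 10 ** (thd2_dbc / 20) / v_res
    err3 = (amplitude * math.sqrt(2) / 4) * (math.sin(dphi) + math.cos(dphi) - 1) / (2 * v_res)
    return err1, err2, err3

V_RES = 1e-3  # hypothetical CDAC step in volts; NOT taken from the paper

err1, err2, err3 = err_gains(dc_offset=4.3e-3, amplitude=0.5,
                             thd2_dbc=-40.0, dphi_deg=3.6, v_res=V_RES)
print(f"ERR1 = {err1:.2f} LSB, ERR2 = {err2:.2f} LSB, ERR3 = {err3:.2f} LSB")
```

With this placeholder step size, all three gains come out at a few LSBs, which suggests why manual setting followed by lab fine-tuning is workable for a proof of concept. The actual values depend on the real CDAC resolution.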

FIGURE 7. (a) Conceptual timing diagram of the proposed error-pattern compensation, and simplified diagrams of the differential ROS-PD with (b) calibration using compensated voltage (${\mathrm{V}}_{\textrm {EP}}$ and ${\mathrm{V}}_{\textrm {EN}}$) and with (c) DAC-based error pattern correction.

FIGURE 8. Proposed differential ROS-PD with the DAC-assisted reference spurious suppression.

From our detailed investigation, these nonideal signatures originate from: 1) a dc offset, mainly caused by mismatches between the two paths of sampling bootstrapped switches (4.3 mV from post-layout extraction); 2) a phase offset from the off-chip balun that provides the differential crystal waveforms (3.6°); and 3) harmonic distortion of the XO waveform (second-harmonic content of −40 dBc). The proposed ADPLL was found to be more sensitive to the dc offset, which should be less than 5 mV to acquire lock. The compensation DACs are beneficially merged in functionality with those for the fractional errors (described in Section II-C), as indicated in Fig. 2. Moreover, because each of the aforementioned four-point error sequences sums to zero, a 4-tap moving average (MA) filter can directly remove these errors in the digital domain, as shown in Fig. 9. Although the MA filter reduces the reference spur level, it also reduces the maximum achievable bandwidth of the ROS-PLL from 40 to 30 MHz. This does not affect the rms jitter of the implemented PLL since the optimum bandwidth of this PLL is lower than 1 MHz.
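The zero-sum property that makes the 4-tap MA filter effective can be checked with a few lines of Python. The sketch below builds a PD output sequence as a constant phase error plus an arbitrary weighted mix of the three normalized patterns from Section II-B and passes it through a 4-tap boxcar average. The weights and the helper names are illustrative assumptions only.

```python
PATTERNS = {
    "offset": [1, 1, -1, -1],   # ERR_offset
    "edge":   [1, -1, -1, 1],   # ERR_edge
    "phi":    [-1, 1, -1, 1],   # ERR_phi
}

def moving_average(x, taps=4):
    """Causal boxcar average; outputs start once the filter window is full."""
    return [sum(x[i - taps + 1:i + 1]) / taps for i in range(taps - 1, len(x))]

def pd_output(gains, phase_error=0.0, periods=8):
    """Phase error plus weighted periodic error patterns, four samples per reference period."""
    seq = []
    for _ in range(periods):
        for k in range(4):
            seq.append(phase_error + sum(g * PATTERNS[name][k] for name, g in gains.items()))
    return seq

# Illustrative (made-up) pattern weights and phase error, in arbitrary units
raw = pd_output({"offset": 0.30, "edge": -0.20, "phi": 0.10}, phase_error=0.05)
filtered = moving_average(raw)
print("raw spread:     ", max(raw) - min(raw))
print("filtered spread:", max(filtered) - min(filtered))
```

Any four consecutive samples contain each entry of every pattern exactly once, so the periodic terms cancel and only the underlying phase error remains, at the cost of the extra filter delay that lowers the maximum achievable loop bandwidth.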

FIGURE 9. Fine reference spurious corrections of each periodic error pattern, (a) corresponding waveforms and (b) diagram of moving average filter (MA).

The proposed error-pattern compensation carried out in the mixed-signal domain is crucial to ensure the PLL locking robustness as it adjusts the level of the sampled voltage to its operating range. Note that the calibration for these three error patterns reuses the existing CDAC; thus, there is no extra cost in area or power consumption in the analog/mixed-signal domain. However, due to the limited CDAC resolution, the reference spurs may need to be further reduced using the digital 4-tap MA filter.

C. Fractional Compensation

Fig. 10(a) shows an ideal reference waveform $S_{\textrm {REF}}(t) = \sin (\phi)$ sampled by the feedback clock with a phase increment of $2\pi /M$ (i.e., timing increment of $T_{\textrm {ref}}/M$), in which $M = 4$. This corresponds to an integer-N operation with ${\mathrm {FCW}}_{F}=0$. The sampling moments are labeled as ①–④, and the initial ($t = 0$) sampled voltage at ① is $\sin (\pi /4)$. When ${\mathrm {FCW}}_{F} > 0$, the next expected sampling instant is advanced by $\Delta t_{F}$ relative to the ideal sampling instant $T_{\textrm {ref}}/4$, which can be derived as [30]
\begin{equation*} \Delta t_{F}=\frac {T_{\textrm {ref}}}{4}-\frac {\mathrm {FCW}_{I}}{4}T_{\textrm {osc}}=\frac {\mathrm {FCW}_{F}}{4}T_{\textrm {osc}} \tag {1}\end{equation*}
where $T_{\textrm {ref}}=(\mathrm {FCW}_{I}+\mathrm {FCW}_{F})T_{\textrm {osc}}$. For ${\mathrm {FCW}}_{F}$ of 0.5, $\Delta t_{F}=0.125T_{\textrm {osc}}$. This $\Delta t_{F}$ time increment accumulates over time. After two reference periods, the accumulated time difference of $8\Delta t_{F}$ is equal to the DCO period, $T_{\textrm {osc}}$. In this prototype, the feedback sampling edge is then adjusted by delaying it by one oscillator cycle, by momentarily toggling the divide ratio of MMDIV from $N_{\textrm {DIV}}$ to $N_{\textrm {DIV}} + 1$, where $N_{\textrm {DIV}}$ is defined as the floor of $\mathrm {FCW}_{I}/M$. This resets the accumulated time difference back to zero, thus significantly relaxing the required range of the CDACs used to compensate for the fractional voltage residue. It can be observed in Fig. 10(a) that ① and ③ may experience a movement of the sampled voltage toward the zero-crossing point when ${\mathrm {FCW}}_{F}$ is 0.5, while ② and ④ move toward the peak/trough.
Therefore, when the loop is settled and locked into these corresponding points, two sinusoidal compensation patterns in Fig. 10(b) can be derived as [30]
\begin{align*} D_{\textrm {out1}}=& \frac {A\cdot 2^{N_{\textrm {CDAC}}}}{V_{\textrm {range1}}}\left ( \sin \left ( \frac {\pi }{4}+D_{\textrm {in}}\cdot \Delta \theta \right ) -\sin \left ( \frac {\pi }{4} \right ) \right ) \\ D_{\textrm {out2}}=& \frac {A\cdot 2^{N_{\textrm {CDAC}}}}{V_{\textrm {range2}}}\left ( \sin \left ( \frac {\pi }{4} \right ) -\sin \left ( \frac {\pi }{4}-D_{\textrm {in}}\cdot \Delta \theta \right ) \right ) \tag {2}\end{align*}
where $A$ is the estimated amplitude of $S_{\textrm {REF}}(t)$, while $V_{\textrm {range1}}$ and $V_{\textrm {range2}}$ represent the output ranges of the two CDACs. $N_{\textrm {CDAC}}$ is the number of binary bits of the CDACs (here, 9). $D_{\textrm {in}}$ is the input code of the accumulated ${\mathrm {FCW}}_{F}$, while $D_{\textrm {out1,2}}$ are the corresponding SIN-LUT1,2 output codes. $\Delta \theta $ is the unit fractional phase error accumulated in a quarter of the reference period $T_{\textrm {ref}}/4$ [30]
\begin{equation*} \Delta \theta =\frac {\pi }{2}\cdot \frac {1}{\mathrm {FCW}_{I}/\mathrm {FCW}_{F}+1}. \tag {3}\end{equation*}
As shown in Fig. 10(c), the fractional compensation codes are chosen according to whether the incoming waveform has a positive or negative slope. LUT1 stores the fixed ROM-like pattern of the reference waveform from the $\pi /4$ phase point toward the direction of the peak, while LUT2 stores the waveform information from the $\pi /4$ point toward the direction of the zero-crossing. Since the differential reference sinusoidal input and cross-connected switches have been adopted to maintain the sampling level of each path (see Fig. 5), the sampling P path $(V_{\mathrm {BP}})$ samples only the upper half of the waveform, and the sampling N path $(V_{\mathrm {BN}})$ samples only the lower half of the waveform, when approaching or achieving lock. In other words, State 1 (phase $\pi /4$) and State 3 (phase $5\pi /4$) exhibit the same trend of the sampled waveform, and likewise State 2 (phase $3\pi /4$) and State 4 (phase $7\pi /4$).
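Equations (1)–(3) and the divider toggling can be tied together in a short Python sketch. It is only a behavioral illustration under stated assumptions: the operating point ($\mathrm {FCW}_{I} = 200$, $\mathrm {FCW}_{F} = 0.5$), the 0.1-V CDAC ranges, and all function names are hypothetical, and a simple first-order accumulator stands in for whatever modulator drives the MMDIV in the actual design.

```python
import math

M = 4  # oversampling ratio (from the text)

def delta_t_frac(fcw_f, t_osc):
    """Eq. (1): time advance accumulated per quarter reference period."""
    return fcw_f / M * t_osc

def delta_theta(fcw_i, fcw_f):
    """Eq. (3), written as (pi/2)*FCW_F/(FCW_I+FCW_F) so that FCW_F = 0 is well defined."""
    return (math.pi / 2) * fcw_f / (fcw_i + fcw_f)

def sin_lut_codes(d_in, dtheta, amplitude, v_range1, v_range2, n_cdac=9):
    """Eq. (2): codes toward the peak (LUT1) and toward the zero crossing (LUT2)."""
    full_scale = amplitude * 2 ** n_cdac
    ref = math.sin(math.pi / 4)
    d_out1 = full_scale / v_range1 * (math.sin(math.pi / 4 + d_in * dtheta) - ref)
    d_out2 = full_scale / v_range2 * (ref - math.sin(math.pi / 4 - d_in * dtheta))
    return round(d_out1), round(d_out2)

def divider_sequence(fcw_i, fcw_f, samples):
    """First-order accumulator (assumed): divide by N_DIV + 1 once the advance reaches T_osc."""
    n_div = fcw_i // M          # N_DIV = floor(FCW_I / M)
    acc, seq = 0.0, []
    for _ in range(samples):
        acc += fcw_f / M        # accumulated advance in units of T_osc
        if acc >= 1.0:
            acc -= 1.0          # swallow one oscillator cycle, resetting the residue
            seq.append(n_div + 1)
        else:
            seq.append(n_div)
    return seq

# Hypothetical operating point; none of these numbers are taken from the paper
dtheta = delta_theta(200, 0.5)
print("advance per sample (T_osc):", delta_t_frac(0.5, 1.0))
print("LUT codes at D_in = 4:", sin_lut_codes(4, dtheta, 0.5, 0.1, 0.1))
print("divide ratios over two reference periods:", divider_sequence(200, 0.5, 8))
```

For $\mathrm {FCW}_{F} = 0.5$, the accumulator reaches one full oscillator period on the eighth sample, i.e., after two reference periods, matching the $8\Delta t_{F} = T_{\textrm {osc}}$ statement in the text. The LUT2 code is never smaller than the LUT1 code for the same input, because the sinusoid is steeper toward the zero crossing than toward the peak.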

FIGURE 10. (a) Timing diagram of the incoming reference waveform from the crystal oscillator at frequency $f_{\text {ref}}$ while the sampling clock runs at the down-divided DCO frequency of $f_{\text {osc}} = (\mathrm {FCW}_{I} + 0.5) f_{\text {ref}}$, (b) implemented digital sine-based predistortion in ROS-ADPLL for residue compensation when operating in fractional-N mode, and (c) simplified diagram of the fractional compensation in digital domain.

To keep the fractional spurs low, the resolution of the CDAC (as well as its linearity) and of the LUTs should be carefully considered. Based on our system simulations, to keep the fractional spurs lower than −52 dBc, the required CDAC resolution should be 11 bits with the size of LUTs larger than 10 bit $\times $ 10 bit. However, in this prototype, due to the limited resolution of the CDAC and LUTs (i.e., 9 bit, and 8 bit $\times $ 8 bit, respectively), the estimated worst-case and typical fractional spurs in the proposed PLL with ~300-kHz bandwidth and 50-MHz reference clock are −29 and −42 dBc when the fractional codes (${\mathrm {FCW}}_{F}$) are 1/28 and 1/24, respectively. Note that the above fractional spur is estimated at ~30 GHz, which is equivalent to ~−50 dBc for a 2.2-GHz carrier [30]. On the other hand, from post-layout extraction, the INL and DNL of the CDAC are within ±1 LSB, so the CDAC linearity is not the main limitation for fractional spurs. Alternatively, the fractional spur can also be lowered if a higher-order MASH were to be applied, but at the cost of a wider required linear range of the CDAC. In this work, the occupied areas of the 8 bit $\times $ 8 bit SIN-LUT1 and SIN-LUT2 are $8\times 10~\mu \mathrm {m}^{2}$ each. From the top-level diagram in Fig. 2, it can be observed that the outputs from the LUTs can be scaled by the gain $K_{\textrm {CDAC}}$, which can either be set manually or calibrated through an LMS algorithm. After that, the codes are combined with the output from the reference spur calibration. Fig. 11(a) shows the convergence of the CDAC gain using an LMS algorithm. It can be observed that the gain settles to the correct value, with smaller magnitudes of phase errors shown in Fig. 11(b), resulting in better fractional spur performance shown in Fig. 11(c).
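The idea of calibrating a single gain such as $K_{\textrm {CDAC}}$ with an LMS loop can be sketched generically in Python. This is a textbook LMS toy model, not the authors' implementation: the residue model, the step size `mu`, the stand-in LUT sequence, and the "true" gain of 1.15 are all assumptions chosen for illustration.

```python
def lms_gain_calibration(k_true=1.15, k_init=1.0, mu=0.1, steps=2000):
    """Toy LMS loop: adapt k_est until the residue correlated with the LUT output vanishes."""
    k_est = k_init
    history = []
    for n in range(steps):
        x = (0.37 * n) % 1.0            # stand-in for the normalized LUT output
        residue = (k_true - k_est) * x  # uncancelled fractional residue seen by the PD
        k_est += mu * residue * x       # LMS update: step proportional to error * input
        history.append(k_est)
    return k_est, history

k_final, trace = lms_gain_calibration()
print(f"estimated gain after calibration: {k_final:.4f}")
```

In this noise-free model, the estimate moves monotonically toward the assumed true gain because each update shrinks the remaining gain error by a factor of $(1-\mu x^{2})$. A real loop would additionally contend with PD noise and quantization, which set the trade-off between convergence speed and residual gain ripple.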

FIGURE 11. - (a) Convergence of the CDAC gain using the LMS algorithm, (b) corresponding phase error, and (c) PN profile with an exaggerated loop bandwidth at 30-GHz output frequency with (orange) and without (blue) the LMS calibration of CDAC gain.

In this prototype, the three periodic error patterns (described in Section II-B) cause high reference spurs in the integer mode but have a negligible impact on the fractional spurious tones in the fractional mode when THD2 is less than −55 dBc. To verify this effect, a behavioral simulation of the proposed system has been performed under the following assumptions: 1) dc offset of 4.3 mV; 2) phase offset of 3.6°; and 3) edge imbalance from the XO’s THD2 of −55 dBc. The result shows that the worst-case IBFS estimated at a 30-GHz output frequency is −34.7 dBc, which is similar to the case with no error compensations applied. Note that if a third-harmonic content of −55 dBc from the XO is also taken into account, the worst-case IBFS would be −29.9 dBc. When THD2 is increased to the −50-dBc level, the IBFS degrades by 4 dB when the CDAC resolution is 9 bits. By increasing the CDAC resolution to 11 bits, the degradation of the IBFS can be reduced to 2 dB.

SECTION III.

Other Circuit Implementations

A. DCO and Third-Harmonic Extractor

The sampling clocks for the PD are generated from the down-divided DCO output by means of the frequency divider chain [see Fig. 1(b)]. A ~30-GHz divider would consume high power and it might require an additional bulky passive inductor. Therefore, we chose a class-F3 DCO operating at ~10 GHz but which also inherently generates a third harmonic by properly setting the primary and secondary tanks, as shown in Fig. 12(a) [35]. The gate node voltages ($V_{\textrm {GP}}$ , $V_{\textrm {GN}}$ ) exhibit a large amplitude of the ~10-GHz fundamental waveform, which is fed into the frequency divider chain. On the other hand, the drain nodes (${\mathrm{V}}_{\textrm {DP}}$ , ${\mathrm{V}}_{\textrm {DN}}$ ) are ac-coupled to the subsequent H3E [see Fig. 12(b)]. This keeps the MMD input frequency at ~10 GHz, thus consuming much less power and area as an inductor-less structure.

FIGURE 12. - Schematics of (a) 10-GHz class-F3 oscillator with (b) third-harmonic extractor (H3E) and 50-$\Omega $ output driver.

The DCO frequency can be tuned using coarse, medium, and fine banks. The resolution of its fine bank is 65 kHz, which corresponds to a quantization PN level of −137.5 dBc/Hz at 10-MHz offset [36]. This is lower than the PN level of the actual oscillator, i.e., −130 dBc/Hz at 10-MHz offset for a 10-GHz carrier frequency. The LC-tank at the drains of $M_{3,4}$ exhibits a high differential impedance at ~30 GHz to amplify the third-harmonic component at ${\mathrm{V}}_{\textrm {DP}}$ and ${\mathrm{V}}_{\textrm {DN}}$ , while a common-source tank is implemented by $L_{T}$ and $C_{\textrm {CT}}$ . This tank provides a high impedance at the ~10-GHz fundamental, preventing the ~10-GHz current flow. As a result, the fundamental component is significantly attenuated [37]. The second-stage nMOS-based buffer further amplifies the ~30-GHz signal and drives the 50-$\Omega $ measurement equipment.
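
The relation between the fine-bank resolution and the quoted PN level can be checked with the commonly used expression for DCO frequency-quantization noise, $\mathcal{L}(\Delta f) = \frac{1}{12}\left(\frac{\Delta f_{\textrm{res}}}{\Delta f}\right)^{2}\frac{1}{f_{s}}\,\mathrm{sinc}^{2}(\Delta f/f_{s})$. The sketch below assumes that this is the expression behind the quoted figure and that the DCO tuning word is updated at $f_{s} = 200$ MHz (the $4\times$ oversampled rate); both are assumptions rather than statements from the text.

```python
import math


def dco_quantization_noise_dbc_hz(f_res_hz, offset_hz, f_update_hz):
    """Phase noise (dBc/Hz) due to finite DCO frequency resolution.

    Uniform-quantization model with a zero-order hold at the update rate.
    f_update_hz = 200 MHz used below is an assumption (4 x 50-MHz reference).
    """
    x = offset_hz / f_update_hz
    sinc = math.sin(math.pi * x) / (math.pi * x)
    noise = (1.0 / 12.0) * (f_res_hz / offset_hz) ** 2 / f_update_hz * sinc ** 2
    return 10.0 * math.log10(noise)


level = dco_quantization_noise_dbc_hz(f_res_hz=65e3, offset_hz=10e6,
                                      f_update_hz=200e6)
print(f"quantization PN at 10-MHz offset: {level:.1f} dBc/Hz")
```

Under these assumptions the result is within about 0.1 dB of the −137.5 dBc/Hz quoted above, several dB below the oscillator's own PN of −130 dBc/Hz at the same offset.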

B. Dividers and Clock Generation

As indicated earlier in Fig. 2, the PD sampling clocks at $f_{\textrm {sample}} \sim 4 \times f_{\textrm {ref}}$ are generated from the DCO carrier at $f_{\textrm {osc}}$ by two cascaded divide-by-2 frequency dividers followed by an MMDIV. Fig. 13(a) shows the schematic of the first-stage divide-by-2 frequency divider, which receives the differential 10-GHz signal from the MOS gates of the class-F3 oscillator. The digital logic of the MMDIV controls: 1) the desired feedback rate; 2) the sampling edge realignment for the fractional-N operation; and 3) the pulse generators for the ROS-PD. Targeting a feedback rate of $f_{\textrm {sample}}=4 \times f_{\textrm {ref}}$ (~200 MHz), the 4-bit MMDIV consists of three $\div 2$ /3 cells (DIV2/3) and one MMDIV control block (MMD CTL), as shown in Fig. 13(b). As described in Section II-A, the proposed ROS-PD requires three operational phases for voltage presetting $(\phi _{1})$ , sampling $(\phi _{2})$ , and amplification $(\phi _{3})$ . Instead of controlling the width of the sampling pulse with a capacitively loaded inverter chain, which would suffer from excessive power consumption and susceptibility to PVT variations, we propose to use the associated internal edges of the MMDIV. By utilizing two internal clocks, $S_{\textrm {V1}}$ and $S_{\textrm {V2}}$ , with different duty cycles [38], the required phases can be generated with simple combinational logic, as shown in Fig. 13(b). Note that the MSB of the modulus, C$\langle 3\rangle $ , is fixed to 1 so that $S_{\textrm {V2}}$ and $S_{\textrm {V3}}$ hold an overlapping relationship. Thus, the required clock edges support the divide ratios $N_{\textrm {DIV}}$ of C$\langle 3$ :$0\rangle =8\ldots 15$ , which is suitable for the desired feedback rate.
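
The frequency plan of the feedback path can be sketched numerically. The helper names below are hypothetical; the sketch only combines figures stated in the text (third-harmonic output, two divide-by-2 stages, a 50-MHz reference oversampled four times, and an MMDIV range of 8 to 15) to show that the required MMDIV ratio stays inside the supported range over the 24–31-GHz output band.

```python
def mmdiv_ratio(f_out_hz, f_ref_hz=50e6, oversampling=4,
                harmonic=3, prescaler=4):
    """Average MMDIV ratio needed to sample at oversampling * f_ref."""
    f_dco = f_out_hz / harmonic          # ~10-GHz fundamental of the DCO
    f_mmdiv_in = f_dco / prescaler       # after two cascaded divide-by-2 stages
    f_sample = oversampling * f_ref_hz   # ~200-MHz PD sampling rate
    return f_mmdiv_in / f_sample


def supported_ratios(n_cells=3):
    """Ratios of a chain of n divide-by-2/3 cells: 2**n ... 2**(n+1) - 1."""
    return list(range(2 ** n_cells, 2 ** (n_cells + 1)))


def total_multiplication(n_div, oversampling=4, harmonic=3, prescaler=4):
    """Overall f_out / f_ref for a given MMDIV ratio."""
    return harmonic * prescaler * n_div * oversampling


ratios = supported_ratios(3)
for f_out in (24e9, 28.8e9, 31e9):
    n = mmdiv_ratio(f_out)
    print(f"f_out = {f_out / 1e9:5.1f} GHz -> N_DIV = {n:.3f}, "
          f"in range: {ratios[0] <= n <= ratios[-1]}")
```

For the 28.8-GHz integer-N case the sketch gives $N_{\textrm{DIV}} = 12$ and an overall multiplication of $3 \times 4 \times 12 \times 4 = 576$. Non-integer averages, as in the fractional-N mode, would have to be realized by changing the modulus over time.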

FIGURE 13. - Schematics of (a) first-stage $\div 2$ frequency divider, and (b) MMDIV chain and sampling clock generation.

SECTION IV.

Experimental Results

The proposed mmW $4\times $ ROS ADPLL was implemented in TSMC 28-nm LP CMOS. Figs. 14 and 15 show the chip micrograph and power breakdown, respectively. The oscillator and the third-harmonic extractor occupy 0.24 mm$^{2}$, which is the majority of the ADPLL’s total active area of ~0.3 mm$^{2}$.

FIGURE 14. - Die micrograph of the proposed mm-wave reference-oversampling ADPLL.

FIGURE 15. - Power breakdown of the proposed ADPLL.

The high-frequency building blocks, i.e., the DCO and its third-harmonic extractor, consume 4.5 and 5 mW, respectively, which together account for about 80% of the total power consumption. On the other hand, the first-stage divider and the MMDIV with clock generation consume 1.3 and 0.21 mW, respectively. The proposed ROS-PD, which includes bootstrapped switches, low-noise amplifiers GM1 and GM2, CDAC buffers, and other associated passive switches, consumes 290 $\mu $W while operating at 200 MHz. The total power consumption of the ADPLL is only 11.9 mW. Note that the power consumption of the low-noise reference buffer, which we have avoided, is usually not included in the total power of other works in Table 1 [39], [40], [41], [42], [43], [44], [45]. These reference buffers usually consume more than a milliwatt of power, which could contribute more than 10% of the total power consumption of this PLL [46], [47]. To benchmark the PLL/ADPLL performance, the figure-of-merit $({\mathrm {FOM_{jitter}}})$ can be computed as [5]\begin{equation*} {\mathrm {FOM_{jitter}}}=20\log \left ({{ \frac {\sigma }{\textrm {1 s}} }}\right )+10\log \left ({{ \frac {P_{\textrm {DC}}}{\textrm {1 mW}} }}\right ) \tag {4}\end{equation*} where $\sigma $ is the rms jitter of the PLL, and $P_{\textrm {DC}}$ is its power consumption. To further normalize it with the PLL’s multiplication ratio $N = f_{\textrm {out}}/f_{\textrm {ref}}$, the extended figure of merit $({\textrm {FOM}}_{{\textrm {jitter}}-N})$ can be computed as\begin{equation*} {\textrm {FOM}}_{{\textrm {jitter}}-N}={\mathrm {FOM_{jitter}}}-10\log \left ({{ \frac {f_{\textrm {out}}}{f_{\textrm {ref}}} }}\right ). \tag {5}\end{equation*}
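
As a numerical cross-check, the two figures of merit can be evaluated from the reported jitter, power, and multiplication ratio. The sketch below uses $20\log_{10}(\sigma/1\,\textrm{s}) + 10\log_{10}(P_{\textrm{DC}}/1\,\textrm{mW})$ for the jitter FoM and subtracts $10\log_{10}(f_{\textrm{out}}/f_{\textrm{ref}})$ for the N-normalized FoM, which is the form that reproduces the values reported in this section. The function names are illustrative only.

```python
import math


def fom_jitter_db(sigma_s, p_dc_w):
    """Jitter-power FoM in dB: 20*log10(sigma / 1 s) + 10*log10(P / 1 mW)."""
    return 20.0 * math.log10(sigma_s / 1.0) + 10.0 * math.log10(p_dc_w / 1e-3)


def fom_jitter_n_db(sigma_s, p_dc_w, f_out_hz, f_ref_hz):
    """FoM further normalized to the multiplication ratio N = f_out / f_ref."""
    n = f_out_hz / f_ref_hz
    return fom_jitter_db(sigma_s, p_dc_w) - 10.0 * math.log10(n)


P_DC = 11.9e-3              # total power in W
F_OUT, F_REF = 28.8e9, 50e6  # N = 576

typ = fom_jitter_db(237e-15, P_DC)      # measured typical rms jitter
worst = fom_jitter_db(350e-15, P_DC)    # simulated worst-case rms jitter
typ_n = fom_jitter_n_db(237e-15, P_DC, F_OUT, F_REF)
worst_n = fom_jitter_n_db(350e-15, P_DC, F_OUT, F_REF)

print(f"FoM_jitter   typ/worst: {typ:.1f} / {worst:.1f} dB")
print(f"FoM_jitter-N typ/worst: {typ_n:.1f} / {worst_n:.1f} dB")
```

With 237 fs and 350 fs at 11.9 mW and $N = 576$, this gives values within 0.1 dB of the reported −241.7 and −238.4 dB for the jitter FoM, and −269.3 and −266 dB for the N-normalized FoM.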

TABLE 1 Comparison table With State-of-the-Art Mm-Wave Fractional-N PLLs

In the integer-N operation, the proposed $4\times $ ROS ADPLL is measured with a reference frequency of $f_{\textrm {ref}} = 50$ MHz provided by a standard low-cost 50-MHz XO (Crystek CVSS-945). Fig. 17(a) shows the PN plot at 28.8 GHz measured with an R&S FSW Signal and Spectrum Analyzer. In the integer-N operation, the IBPN is as low as −95 dBc/Hz with an integrated rms jitter of 199 fs from 10 kHz to 30 MHz. As shown in Fig. 16, thanks to the proposed calibration technique, the associated reference spurs are reduced to −65 dBc. For the fractional-N operation, the measured PN at the carrier of ~28.809275 GHz is shown in Fig. 17(b). From our system investigation, due to the limited resolution of the CDAC, the IBPN in the fractional mode is degraded to ~ −92 dBc/Hz. This results in a smaller optimal bandwidth and a degradation of the rms jitter to 237 fs when compared to the integer-N operation.
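
The link between an in-band PN level and the integrated rms jitter can be illustrated with a simple two-region PN model. The profile below (flat in-band PN up to a corner frequency, then a −20 dB/decade roll-off) and the 1-MHz corner are assumptions made for illustration; they are not the measured profile of Fig. 17.

```python
import math


def rms_jitter_s(ibpn_dbc_hz, corner_hz, f_lo_hz, f_hi_hz, f_carrier_hz):
    """RMS jitter from a hypothetical flat-plus-rolloff SSB phase-noise model.

    L(f) = L0 for f <= corner, and L0 * (corner / f)**2 for f > corner.
    Requires f_lo_hz < corner_hz < f_hi_hz.
    """
    l0 = 10.0 ** (ibpn_dbc_hz / 10.0)
    flat = l0 * (corner_hz - f_lo_hz)                              # in-band area
    rolloff = l0 * corner_hz ** 2 * (1.0 / corner_hz - 1.0 / f_hi_hz)
    phase_var = 2.0 * (flat + rolloff)                             # both sidebands, rad^2
    return math.sqrt(phase_var) / (2.0 * math.pi * f_carrier_hz)


sigma = rms_jitter_s(ibpn_dbc_hz=-95.0, corner_hz=1e6,
                     f_lo_hz=10e3, f_hi_hz=30e6, f_carrier_hz=28.8e9)
print(f"rms jitter of the assumed profile: {sigma * 1e15:.0f} fs")
```

For an IBPN of −95 dBc/Hz at 28.8 GHz integrated from 10 kHz to 30 MHz, this idealized profile yields a jitter on the order of 200 fs, the same order as the measured integer-N value. The agreement depends on the assumed corner frequency.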

FIGURE 16. - Measured spectrum at 28.8 GHz (a) without and (b) with the proposed reference spur calibration.

FIGURE 17. - Measured PN in (a) integer-N operation, and (b) fractional-N operation.

The typical measured fractional spur is −40 dBc (see Fig. 18). Based on simulations, the worst-case IBFS is estimated to be −29 dBc when ${\mathrm {FCW}}_{F} = 2^{-9}$ is chosen. Note that the fractional spur above is estimated at ~30 GHz, which is equivalent to ~ −50 dBc with a 2.2-GHz carrier [30]. The proposed ROS-ADPLL achieves an rms jitter of 237 fs in a typical case. With the worst-case fractional spurs (${\mathrm {FCW}}_{F} = 2^{-9}$), the simulations indicate an rms jitter of 350 fs. For a fair comparison with other state-of-the-art fractional-N mmW ADPLLs, the estimated worst-case rms jitter and the corresponding FoMs are also included in Table 1. Thanks to the ROS-PD and H3E techniques, our system achieves an FoM of −241.7 dB and −238.4 dB with a large N of 576 for the typical and the worst case, respectively.
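
The carrier-frequency normalization of the spur level mentioned above can be sketched with the usual $20\log_{10}$ scaling of spur levels under ideal frequency division or multiplication. That this is the scaling used for the quoted equivalence is an assumption, and the function name is illustrative.

```python
import math


def scale_spur_dbc(spur_dbc, f_from_hz, f_to_hz):
    """Refer a spur level from one carrier frequency to another.

    Assumes ideal frequency scaling, so the spur changes by
    20*log10(f_to / f_from) dB.
    """
    return spur_dbc + 20.0 * math.log10(f_to_hz / f_from_hz)


equiv = scale_spur_dbc(spur_dbc=-29.0, f_from_hz=30e9, f_to_hz=2.2e9)
print(f"-29 dBc at 30 GHz corresponds to {equiv:.1f} dBc at 2.2 GHz")
```

Under this assumption, the −29-dBc worst-case spur at 30 GHz maps to about −52 dBc at 2.2 GHz, in line with the approximately −50 dBc quoted in the text.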

FIGURE 18. - Measured spectrum in a fractional-N operation.

The proposed techniques make it possible to break through the FoM-versus-N barrier shown in Fig. 20 for the landscape of state-of-the-art PLLs with large N. Compared with other >10 GHz digital PLLs in Table 1, this work achieves an excellent FoM while consuming lower power. The large value of N brings the ${\mathrm {FoM}}_{\textrm {jitter-N}}$ to a record of −269.3 dB and −266 dB in the typical and the worst cases, respectively, which is remarkable among single-stage PLLs. From Fig. 20, it can be observed that for high N (i.e., > 500), techniques that use a frequency multiplier as a cascaded stage can help to achieve a good FoM closer to the trend lines for the integer-N [40], [48] and fractional-N cases [24].

FIGURE 19. - Measured relocking response of the proposed ADPLL for frequency jumps of (a) 25 MHz and (b) 62 MHz.

FIGURE 20. - Landscape of state-of-the-art RF/mm-wave PLLs where fractional-N PLLs are shown in black, integer-N PLLs are shown in gray, and PLLs with an output frequency multiplier are shown with $\diamond $. (Note that # refers to measured result with typical performance, and & refers to the estimated worst-case performance.)

Thanks to the wide monotonic PD range offered by the proposed architecture [23], [30], the detector eliminates the need for an FLL, which would be necessary in subsampling PLLs to ensure locking robustness and fast settling. Based on our system-level simulations, when the initial sampling point is not located at an ideal locked point (i.e., $\pi /4$ ), the proposed ROS-PLL can still acquire lock but needs a longer settling time, e.g., 30 $\mu $s in the worst case, when the initial sampling point is at the peak of the reference waveform $(\pi /2)$ . Moreover, the large PLL lock range is verified by the transient measurement results shown in Fig. 19. With a 1-MHz loop bandwidth, the proposed ADPLL settles within 10 $\mu $s to a new frequency 25 MHz away. When the frequency deviation from the desired locked frequency is increased to ~62 MHz, the settling time is 15 $\mu $s. The measured relocking behavior demonstrates the locking robustness of the proposed system.

SECTION V.

Conclusion

The proposed mmW fractional-N ADPLL employing the $4\times $ reference-waveform oversampling (ROS) PD achieves low jitter while providing a wide monotonic PD range without any need for additional frequency detection. Adopting mostly passive circuitry, the cross-coupled bottom-plate sampling technique oversamples the reference waveform using a voltage zero-forcing technique, which relaxes the requirements on the subsequent phase digitization and its driving ability. Unlike traditional DTC-based ADPLLs, the fractional-N operation is achieved with the help of an embedded CDAC, which corrects voltage residues for fractional compensation directly in the voltage/charge domain. Moreover, the same CDAC can be reused to compensate for possible nonlinearity in the reference path, thereby reducing the reference spurious tones.

ACKNOWLEDGMENT

The authors would like to thank the TSMC University Shuttle for the chip fabrication and Dr. Hsieh-Hung Hsieh (TSMC) for help with the tape-out.
