Design Methodologies for Low-Jitter CMOS Clock Distribution | IEEE Journals & Magazine | IEEE Xplore

Design Methodologies for Low-Jitter CMOS Clock Distribution


Abstract:


Clock jitter negatively affects the performance of sampling circuits such as high-speed wireline transceivers and data converters. With CMOS buffers being increasingly used for the distribution of precise clocks in advanced technologies, it is important to understand their limitations and explore design tradeoffs. This tutorial provides quantitative analyses of the main sources of jitter in CMOS clock distribution: power supply induced jitter, jitter generation, and jitter amplification. Minimizing the number of buffers along the clock distribution network while still maintaining fast rise-fall times and ensuring proper settling of all clock waveforms will minimize the impact of all jitter sources. Following these guidelines can simultaneously reduce power supply noise sensitivity and power consumption of the clock distribution circuits. These conclusions are backed up by simulation and measurement results of two 16-nm FinFET clock distribution networks.
Page(s): 94 - 103
Date of Publication: 05 October 2021
Electronic ISSN: 2644-1349


CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.
SECTION I.

Introduction

In many applications, a clock from a single source, usually a phase-locked loop (PLL), is distributed across a chip as shown in Fig. 1. Clock jitter degrades the performance of sampling circuits that use the clock including high-speed wireline transceivers and data converters. With the continuous demand for faster and more precise sampling, jitter targets are becoming harder to meet [1]. Whereas there has been much research effort to reduce PLL jitter [2]–​[4], there has been much less on the distribution of low-jitter clocks whose design is also critical.

FIGURE 1. CMOS clock distribution for high-speed sampling circuits.

Various clock distribution architectures have been reported in the literature, each with its distinct advantages and disadvantages, as shown in Table 1. This tutorial paper focuses on CMOS clock distribution, by which we mean the distribution of full-swing clocks (from the supply voltage to ground) using CMOS inverters as buffers with multiple tap points along each interconnect as shown in Fig. 1. Current-mode logic (CML) circuits may be used, but they deliver a reduced clock swing and therefore require additional circuits before they can interface sampling circuitry. CML buffers also consume static power, making them power-inefficient especially when operating at lower speeds. Furthermore, CML buffers require bias circuitry and (typically) passive resistors which complicates their floorplanning. By comparison, CMOS clock distribution delivers a full-swing clock with fast rise-fall times using simple, scaling-friendly buffer circuits whose power consumption scales with clock frequency, which has made them predominant in broadband applications such as time-interleaved data converters [5] and high-speed wireline transceivers [6], [7]. However, CML buffers inherently offer better power supply rejection than CMOS buffers, thus, power supply induced jitter of CMOS inverters will be analyzed in detail.

TABLE 1. Qualitative Comparison of Clock Distribution Architectures

Resonant clocking and distributed oscillators have excellent jitter performance and power efficiency [8]–​[10]. However, they occupy a relatively large area and have limited frequency range. Although they are not the focus of this tutorial, they are of great interest for applications where the sampling frequency does not change, there is enough area for inductors, and the lowest possible jitter is essential. Section II discusses jitter generated in a CMOS inverter from two sources: power supply noise and transistor thermal noise. Guidelines are presented for minimizing their effects in CMOS clock distribution networks. Section III describes jitter amplification in a CMOS inverter and design guidelines are provided. Section IV presents the design and measurement of two prototype 2-mm long CMOS clock distribution networks designed in 16-nm FinFET to illustrate the principles of the paper.

SECTION II.

Jitter Generation

A. Power Supply Induced Jitter (PSIJ)

Power supply induced jitter (PSIJ) is caused by voltage fluctuations in the power supply due to transient currents from surrounding circuit blocks. These transient currents pass through a power distribution network (PDN) made up of parasitic resistors, capacitors and inductors, across which a transient noise voltage is developed. The amplitude and frequency of the supply noise are determined by the frequency response of the PDN impedance and the spectra of the underlying circuits’ supply currents [11]. A typical PDN resonates at tens to hundreds of MHz, and may result in supply noise amplitudes of tens of mV. Supply noise modulates the output swing and the transition threshold of CMOS inverters, affecting their rise-fall times. This changes the inverters’ delay, imparting jitter on the buffered clock.

To formulate this behavior, consider a CMOS inverter with a nominal delay t_{d} operating under a nominal supply V_{DD} . When subject to a noisy supply voltage V_{DD}^{\prime } , the delay changes by \Delta t_{d} . Provided that \Delta V_{DD} = V_{DD}^{\prime } - V_{DD} is sufficiently small, \Delta t_{d} / t_{d} is proportional to \Delta V_{DD} / V_{DD} , where the proportionality constant K is the PSIJ sensitivity [12]–[15], \begin{equation*} \frac {\Delta t_{d}}{t_{d}} = K \frac {\Delta V_{DD}}{V_{DD}} \tag{1}\end{equation*}

An analytical expression for K is obtained using a few simplifying assumptions in [16]. Consider the case illustrated in Fig. 2. The inverter input is assumed to be a step function and the delay is measured with respect to V_{DD}/2 at the output. During the output transition, one transistor is in saturation and the other is off. Using the square-law model, the nominal delay for a falling output edge transition is \begin{equation*} t_{d} = \frac {Q}{I} = \frac {C_{L} V_{DD}/2}{\frac {1}{2}\mu C_{ox}\frac {W}{L}\left ({V_{DD}-V_{th}}\right)^{2}} \tag{2}\end{equation*}

The derivative of (2) with respect to V_{DD} becomes \begin{align*} \frac {dt_{d}}{dV_{DD}}=&\frac {C_{L}/2}{\frac {1}{2}\mu C_{ox}\frac {W}{L}\left ({V_{DD}-V_{th}}\right)^{2}} \left ({1-\frac {2V_{DD}}{V_{DD}-V_{th}}}\right) \\=&\frac {t_{d}}{V_{DD}}\left ({1-\frac {2V_{DD}}{V_{DD}-V_{th}}}\right) \tag{3}\end{align*}

FIGURE 2. The step response of a CMOS inverter in the presence of supply noise.

For small \Delta V_{DD} , \Delta t_{d} = \frac {dt_{d}}{dV_{DD}}\Delta V_{DD} . Substituting (3) into this expression and rearranging the terms, we have \begin{equation*} \frac {\Delta t_{d}}{t_{d}} = -\frac {V_{DD}+V_{th}}{V_{DD}-V_{th}}\frac {\Delta V_{DD}}{V_{DD}} \tag{4}\end{equation*}

Finally, comparing (1) and (4), we see \begin{equation*} K = -\frac {V_{DD}+V_{th}}{V_{DD}-V_{th}} \tag{5}\end{equation*}


Equation (1) provides important design insights. Naturally, anything that reduces \Delta V_{DD} (for example, supply regulation and decoupling capacitors) will reduce overall PSIJ. Reducing the nominal inverter delay t_{d} also reduces PSIJ. Finally, we note from (5) that using transistors with the lowest possible threshold voltage V_{th} reduces the magnitude of K and thus reduces PSIJ.
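As a rough numerical illustration of (1) and (5), the following Python sketch evaluates the square-law sensitivity and the resulting delay variation. The function names and the 10 ps nominal delay are assumptions made only for this example; the supply and threshold voltages are the ones quoted in the Fig. 3 caption.

```python
def psij_sensitivity(vdd, vth):
    """Square-law PSIJ sensitivity K from Eq. (5)."""
    if vdd <= vth:
        raise ValueError("the square-law expression needs vdd > vth")
    return -(vdd + vth) / (vdd - vth)


def delay_variation(t_d, k, delta_vdd, vdd):
    """Delay change from Eq. (1): delta_t_d = K * t_d * delta_vdd / vdd."""
    return k * t_d * delta_vdd / vdd


if __name__ == "__main__":
    vdd, vth = 1.0, 0.37   # volts, values quoted in the Fig. 3 caption
    t_d = 10e-12           # assumed nominal delay (illustrative only)
    k = psij_sensitivity(vdd, vth)
    print(f"square-law K = {k:.3f}")
    for dv in (0.01, 0.02, 0.04):  # supply deviation in volts
        dt = delay_variation(t_d, k, dv, vdd)
        print(f"dVDD = {dv * 1e3:.0f} mV -> dt_d = {dt * 1e15:.0f} fs")
```

The negative sign of K reflects that a higher supply shortens the delay. Lowering vth in this sketch shrinks the magnitude of K, in line with the design insight above.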

The following simulation example confirms the linear relationship predicted by Equation (1) and characterizes K for a 65-nm CMOS technology. In Fig. 3(a), a 10 GHz clock is passed through a chain of inverters. The device widths are swept from 8 to 20 fingers, where each finger of the PMOS and NMOS devices is 500n/60n and 200n/60n , respectively. The load capacitances, C_{L} , are swept from 6 fF to 10 fF and a 40 mVpp sinusoidal supply noise at a frequency of 200 MHz is superimposed on V_{DD} . The delay across INV3, t_{d} , and its peak-to-peak delay variation, \Delta t_{d} , are plotted in Fig. 3(b) [17]. A linear relation between t_{d} and \Delta t_{d} is observed, where the PSIJ sensitivity K \approx 1.45 . The preceding derivations rely on the simplified square-law MOSFET model, which does not strictly apply in nanoscale CMOS technologies. Moreover, in scaled technologies, both V_{DD} and V_{th} are decreased, and these two changes have opposing effects on K . Higher-order effects further complicate the analytical model. Thus, in practice, for a given advanced process technology, simulations are necessary to characterize K accurately.

FIGURE 3. (a) Testbench for PSIJ sensitivity simulation in 65-nm CMOS. (b) Peak-to-peak delay variation \Delta t_{d} versus nominal delay t_{d} across INV3 where V_{DD} = 1 V and V_{th} = 0.37 V (obtained from simulator DC operating point) in the presence of 40 mVpp sinusoidal supply noise at 200 MHz for different device widths and values of C_{L} .

B. Random Jitter Generation (RJ)

Random jitter (RJ) is generated by transistor thermal noise. In a CMOS inverter, RJ appears in two phases of the output waveform. When the inverter output is not transitioning, the on transistor (operating in triode) contributes kT/C noise voltage and introduces a random offset voltage on the output signal. The resulting jitter is obtained by dividing the noise voltage by the output transition slope I_{D}/C_{L} = V_{DD}/t_{r} , where t_{r} is the rise-fall time of the output transition.\begin{equation*} \sigma _{j1}^{2} = \frac {\sigma _{vn1}^{2}}{\left ({I_{D}/C_{L}}\right)^{2}} = \frac {kTC_{L}}{I_{D}^{2}} = \frac {kT}{I_{D}V_{DD}}t_{r} \tag{6}\end{equation*}


During an output transition, white noise current with a power spectral density (PSD) of 4kT\gamma g_{m} is added to the charging or discharging current. The noise voltage incurred at the V_{DD}/2 crossing can be found by integrating the noise current over the time interval t_{r}/2 and dividing by the load capacitance C_{L} . This calculation was performed in [18] and the following expression for the noise voltage is obtained, \begin{equation*} \sigma _{vn2}^{2} = 4kT\gamma g_{m}\frac {t_{r}}{4}\frac {1}{C_{L}^{2}} = \frac {kT\gamma g_{m} t_{r}}{C_{L}^{2}} \tag{7}\end{equation*}

The resulting jitter variance can be obtained by dividing (7) by the square of the output transition slope, as shown. \begin{equation*} \sigma _{j2}^{2} = \frac {\sigma _{vn2}^{2}}{\left ({I_{D}/C_{L}}\right)^{2}} = \frac {kT\gamma g_{m} t_{r}}{I_{D}^{2}} = \frac {2kT\gamma }{I_{D}\left ({V_{DD}-V_{th}}\right)} t_{r} \tag{8}\end{equation*}

In (8), the last equality assumes the square-law model and V_{GS} = V_{DD} . From (6) and (8), it is observed that both RJ variance components are directly proportional to t_{r}/I_{D} , which indicates that RJ can be decreased by decreasing the rise-fall time and by increasing the charging and discharging current.
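The two variance expressions (6) and (8) can be evaluated with a short Python sketch. The function names, the default \gamma = 1 , the 300 K temperature, and the example operating point are all assumptions for illustration. Adding the two variances to form a single rms value also assumes the two contributions are uncorrelated, which the text does not state explicitly.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def rj_settled_variance(i_d, vdd, t_r, temp=300.0):
    """Eq. (6): jitter variance from kT/C noise sampled on the settled output."""
    return K_B * temp * t_r / (i_d * vdd)


def rj_transition_variance(i_d, vdd, vth, t_r, gamma=1.0, temp=300.0):
    """Eq. (8): jitter variance from channel noise integrated during the edge."""
    return 2.0 * K_B * temp * gamma * t_r / (i_d * (vdd - vth))


def rj_rms(i_d, vdd, vth, t_r, gamma=1.0, temp=300.0):
    """Combined rms jitter, assuming the two contributions are uncorrelated."""
    total = rj_settled_variance(i_d, vdd, t_r, temp)
    total += rj_transition_variance(i_d, vdd, vth, t_r, gamma, temp)
    return math.sqrt(total)


if __name__ == "__main__":
    # Hypothetical operating point: 1 mA drive current, 20 ps rise-fall time.
    sigma = rj_rms(i_d=1e-3, vdd=1.0, vth=0.37, t_r=20e-12)
    print(f"RJ per stage ~ {sigma * 1e15:.1f} fs rms")
```

Because both variances scale with t_{r}/I_{D} , halving the rise-fall time or doubling the current in this sketch reduces the rms jitter by a factor of \sqrt{2} .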

The preceding analysis is verified through simulation using a testbench similar to Fig. 3(a) with a clean supply, V_{DD} , and ground, V_{SS} , and with device noise turned on for all devices. The RJ variance per stage for different numbers of fingers, nf, is plotted in Fig. 4(a) [17] with zero explicit load capacitance, C_{L} = 0 , while keeping the fan-out of the buffer chain equal to 1 to keep the rise-fall time constant. As nf decreases, smaller charge and discharge currents are supplied to the load capacitors and the RJ variance increases, as expected. The RJ variance per stage for different rise-fall times is plotted in Fig. 4(b) [17], where nf is fixed to maintain the same charging and discharging current and C_{L} is varied to change the rise-fall time. As the rise-fall time increases, the slope of the buffer output transitions decreases and the RJ variance increases, as expected.

FIGURE 4. (a) RJ variance per buffer in 65-nm CMOS for different buffer sizes (nf) with C_{L} = 0 while maintaining fan-out of 1 and (b) for different rise-fall times with nf fixed and different C_{L} .

C. Jitter Generation in Global Clock Distribution

Both PSIJ and RJ accumulate along a buffer chain and are therefore minimized by reducing the number of buffer stages. Assuming n stages with identical clock buffers, each experiencing identical power supply noise (a conservative assumption), the amount of PSIJ at the end of the chain will be n times that of a single stage. On the other hand, since RJ generated in each stage is uncorrelated, the RJrms at the end of the chain will be \sqrt {n} times that of a single stage.
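The two accumulation rules can be written down directly. The Python sketch below uses hypothetical function names. The third function is an added generalization, not taken from the text: it assumes every pair of stages shares the same correlation coefficient rho and reduces to the two limiting cases above for rho = 1 and rho = 0.

```python
import math


def accumulated_psij(per_stage_pp, n):
    """Fully correlated supply noise: PSIJ grows linearly with n."""
    return n * per_stage_pp


def accumulated_rj(per_stage_rms, n):
    """Uncorrelated thermal noise: rms RJ grows with sqrt(n)."""
    return math.sqrt(n) * per_stage_rms


def accumulated_rms(per_stage_rms, n, rho):
    """Assumed model: n equal stages with pairwise correlation rho."""
    return per_stage_rms * math.sqrt(n + n * (n - 1) * rho)


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, accumulated_psij(1.0, n), round(accumulated_rj(1.0, n), 3))
```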

Next, we consider the proper choice of n . Assume n buffers are connected in series, driving an interconnect of length l per stage, and a total distance nl . While decreasing n reduces jitter accumulation, it degrades the rise-fall times at the tapping points, resulting in larger jitter generation per stage. Fig. 5 shows the model of a CMOS inverter driving an interconnect of length l , terminated by the input capacitance of the next stage, C_{in} . The interconnect is typically implemented in the top layers of metal, whose series resistance is low, with a metal shield to mitigate electromagnetic interference. The inverter launches an incident rising/falling edge along the interconnect that is reflected at the far end, terminated by C_{in} . The resulting clock waveform at a point x from the inverter is the superposition of the incident and reflected edges at that point. The amplitude of the incident edge depends on the output impedance of the inverter driver, R_{drv} , and the characteristic impedance of the interconnect Z_{0} , \begin{equation*} V_{inc}\left ({x = 0}\right) = V_{in} \frac {Z_{0}}{Z_{0} + R_{drv}} \tag{9}\end{equation*}

The incident edge amplitude increases when Z_{0} increases and decreases when R_{drv} increases. For simplicity, R_{drv} is assumed to be independent of the input amplitude V_{in} .

FIGURE 5. Model of CMOS inverter driving an interconnect of length l and terminated by C_{in} .

At each point x , the incident edge amplitude is attenuated by a factor of e^{-\alpha x} where \alpha is the interconnect attenuation factor [19]. The reflected edge is attenuated by a factor of e^{-\alpha (2l - x)} . The signal across C_{in} experiences exponential settling with a time constant \tau = Z_{0} C_{in} [19]. Assuming \tau \ll t_{r} , where t_{r} is the rise-fall time of the incident edge, the effect of \tau will be small and the reflected edge at point x will have an amplitude \begin{equation*} V_{ref}\left ({x}\right) = V_{inc}\left ({x = 0}\right)e^{-\alpha \left ({2l - x}\right)} \tag{10}\end{equation*}


For a propagation velocity, v_{p} , the time between the moment the incident edge reaches point x and the moment the reflected edge returns to the same point and superimposes with the incident edge is \begin{equation*} t_{rt} = 2\left ({l - x}\right)/v_{p}.\tag{11}\end{equation*}
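Equations (9) to (11) can be combined into a small Python sketch that reports the incident amplitude, the reflected amplitude, and the round-trip delay at a tap point. The function names and all numerical values (a 50 \Omega line, a lossless interconnect, and a propagation velocity of 1.5e8 m/s) are assumptions chosen for illustration, not values from the paper.

```python
import math


def incident_amplitude(v_in, z0, r_drv):
    """Eq. (9): amplitude of the edge launched at x = 0."""
    return v_in * z0 / (z0 + r_drv)


def incident_at(v_inc0, alpha, x):
    """Incident edge at tap x after attenuation by exp(-alpha * x)."""
    return v_inc0 * math.exp(-alpha * x)


def reflected_at(v_inc0, alpha, length, x):
    """Eq. (10): reflected edge at tap x, assuming tau << t_r."""
    return v_inc0 * math.exp(-alpha * (2.0 * length - x))


def round_trip_delay(length, x, v_p):
    """Eq. (11): time between incident and reflected edge at tap x."""
    return 2.0 * (length - x) / v_p


if __name__ == "__main__":
    v_inc0 = incident_amplitude(v_in=0.8, z0=50.0, r_drv=50.0)
    length, x = 1e-3, 0.2e-3      # metres, hypothetical stage and tap
    t_rt = round_trip_delay(length, x, v_p=1.5e8)
    print(f"incident {incident_at(v_inc0, 0.0, x):.2f} V, "
          f"reflected {reflected_at(v_inc0, 0.0, length, x):.2f} V, "
          f"t_rt = {t_rt * 1e12:.1f} ps")
```

With R_{drv} = Z_{0} the launched edge is half of V_{in} , so a tap only reaches the full swing once the reflected edge arrives t_{rt} later.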


D. Design for Low Jitter Generation

We have seen that reducing the number of buffers while maintaining fast rise-fall times will minimize both PSIJ and RJ. Design guidelines to ensure this, including how to size the buffer transistors, are outlined below.

Several overlapping considerations impact buffer sizing since it affects both R_{drv} and C_{in} , which in turn changes Z_{0} , rise-fall settling time constant \tau , and the round-trip delay time t_{rt} . First, it is undesirable to have R_{drv} \gg Z_{0} . In this case, the incident edge amplitude is much smaller than V_{in} = V_{DD} , and multiple reflections are required for the clock edge to settle. Second, choosing R_{drv} \ll Z_{0} is also undesirable because it results in overshoots and ringing in the clock waveform due to negative reflections. Thus, in a typical design R_{drv} \approx Z_{0} . The time constant \tau = Z_{0} C_{in} \approx R_{drv} C_{in} must then be limited to obtain fast rise-fall times.

Recall that jitter generation increases with the number of buffer stages, n . As long as fast rise-fall times can be maintained, fewer buffer stages are desirable for clock distribution over a fixed distance.

The distance between buffers, l , is limited by a desire to prevent transmission line effects from increasing rise-fall times. At any point x where t_{rt} is comparable to t_{r}/2 , the rise-fall time suffers, and the jitter generation of local buffers at the tapping point increases. Fig. 6 [17] shows the simulated clock waveforms at various points x in 16-nm FinFET. At x = 0.2 mm, the clock waveform rises halfway due to the incident edge and stalls until the reflected wave arrives. Fig. 7 [17] shows the potentially detrimental effect of slow reflections on the 50% to 90% rise time t_{r,50-90\%} by showing its variation versus distance x for different values of l . Note that the clock always has larger t_{r,50-90\%} at the near end due to the late arrival of reflections. As l increases, this effect becomes more pronounced.

FIGURE 6. Global clock waveforms in 16-nm FinFET at different distances x .

FIGURE 7. Global clock t_{r,50-90\%} versus distance x in 16-nm CMOS for different interconnect length per stage l .

E. Jitter Generation Case Study

Simulations in 16-nm FinFET are presented for a 10 GHz clock propagated across 2-mm of clock distribution and tapped by ten local buffers spaced 200-\mu \text{m} apart. The number of global clock buffers, n , is varied from 2 to 10 with each buffer sized to have R_{drv} slightly smaller than Z_{0} to maintain a sharp edge at the far end without significant overshoot. The global clock buffers use minimum gate lengths, with 4 fins per finger (nfin = 4) and 36 fingers (nf = 36). Local buffers use minimum length with nfin = 4 and nf = 4 as a compromise between low RJ generation and low capacitive loading on the interconnect. The transmission line model used in the simulations is based upon the transmission line implemented on the testchip and described in Section IV.

1) PSIJ Simulation

Supply noise with 40 mVpp amplitude is applied to all buffers and the PSIJpp at the output of each local clock buffer is plotted in Fig. 8(a) [17] with n varied from 2 to 10. When n =2 or 3, the correspondingly large l results in higher PSIJ at the near end of each interconnect because of late-arriving reflections. Interestingly, the worst-case PSIJ for small n is better than for large n because PSIJ is introduced by fewer global buffers and there is less overall PSIJ accumulation.

FIGURE 8. (a) PSIJpp simulation in 16-nm FinFET at the output of local clock buffers at different points along a 2-mm global clock distribution with different number of global buffers, n , sized with nfin = 4 and nf = 36. (b) The worst-case PSIJpp among all local clocks in the 2-mm clock distribution with different buffer sizes.

Next, the global buffer size is varied from nf = 28 to 36. In each case, the same input clock waveform is maintained at the input to the global buffer chain by appropriately sizing the pre-drivers of the first stage. The worst-case PSIJpp among all tapping points is plotted versus n in Fig. 8(b) [17], showing that it reduces with fewer and larger global buffers. This simulation assumes all buffers experience the same supply noise, which may not be true for a 2-mm clock distribution in practice. When supply noise is less correlated among buffers, the accumulated PSIJ will be less than n times that of a single stage.

2) RJ Generation Simulation

Unlike PSIJ, the RJ for each buffer is uncorrelated so it accumulates proportional to \sqrt {n} . Fig. 9(a) [17] shows the RJrms at different tapping points. For n=2 , l is large and as a result, large RJ is observed at the near end (node n1) where late-arriving reflections are more severe. As n increases, l decreases and RJ is reduced.

FIGURE 9. (a) Simulated RJrms variation in 16-nm FinFET for local clocks at different tapping points along a 2-mm clock distribution where each global buffer is sized with nfin = 4 and nf = 36 for both NMOS and PMOS transistors. (b) The worst-case RJrms among all tapping points in the 2-mm clock distribution for different buffer sizes.

The worst-case RJrms among the local clock buffer outputs is plotted versus n in Fig. 9(b) [17] with the global buffer size varied from nf = 28 to 36. In this simulation, n = 5 results in the lowest RJ. Smaller n and larger l suffer from longer t_{r,50-90\%} due to late-arriving reflections. Therefore, large RJ is observed for n = 2 and n = 3 . Increasing n above 5 improves rise-fall times slightly due to decreased l , but the benefits are countered by the increase in RJ accumulation (in proportion to \sqrt {n} ) and thus larger RJ is observed for n = 10 and n = 20 .

SECTION III.

Jitter Amplification

Jitter amplification occurs when the output rms jitter is larger than the input rms jitter in the absence of supply and device noise. It arises due to imperfect settling of clock waveforms between consecutive rising and falling edges. Several models have been developed to characterize this effect. In this tutorial, we focus on the jitter impulse response (JIR) and jitter transfer function (JTF) [12], [20].

The JIR is obtained by measuring the output absolute jitter sequence, h_{k} , resulting from an impulse of jitter at the input, whose magnitude is a small fraction of the clock period. If the input jitter to the buffer chain is white, the ratio of output to input rms jitter can be calculated from the JIR coefficients [12] as follows, \begin{equation*} A_{tot} = \sqrt {\sum _{i} h_{i}^{2}}\tag{12}\end{equation*}


More generally, jitter amplification depends on the input phase noise spectrum. The JTF, obtained by taking the Fourier transform of the JIR, represents jitter amplification as a function of jitter frequency, by which the input phase noise is shaped [21]. For colored input phase noise S_{\phi }(f) , the jitter amplification factor, A_{tot} , of a buffer chain, where the JTF for each stage is H_{j}(f) , is given by \begin{equation*} A_{tot} = \sqrt {\frac {\int {}{}{S_{\phi }\left ({f}\right) \left |{ \prod _{j} H_{j}\left ({f}\right) }\right |^{2} df}}{\int {}{}{S_{\phi }\left ({f}\right) df}}}.\tag{13}\end{equation*}
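A discretized version of (12) and (13) can be sketched in Python as follows. This is only an illustration under stated assumptions: the JTF is taken as the discrete Fourier transform of the JIR on a uniform grid of jitter frequencies, the phase noise spectrum is supplied as samples on that same grid, and the JIR coefficients in the example are hypothetical. The function names are not from the paper.

```python
import cmath
import math


def a_tot_white(jir):
    """Eq. (12): rms jitter gain for white input jitter."""
    return math.sqrt(sum(h * h for h in jir))


def jtf(jir, n_points):
    """JTF sampled at n_points jitter frequencies (DFT of the JIR)."""
    return [
        sum(h * cmath.exp(-2j * math.pi * k * i / n_points)
            for i, h in enumerate(jir))
        for k in range(n_points)
    ]


def a_tot_colored(stage_jirs, s_phi):
    """Discretized Eq. (13) for a chain of stages and sampled spectrum s_phi."""
    n_points = len(s_phi)
    chain = [1.0 + 0.0j] * n_points
    for jir in stage_jirs:
        chain = [c * h for c, h in zip(chain, jtf(jir, n_points))]
    weighted = sum(s * abs(c) ** 2 for s, c in zip(s_phi, chain))
    return math.sqrt(weighted / sum(s_phi))


if __name__ == "__main__":
    jir = [1.0, -0.3, 0.09, -0.027]          # hypothetical coefficients
    flat = [1.0] * 16                        # white input spectrum
    low_only = [1.0] + [0.0] * 15            # all noise at the lowest bin
    print(a_tot_white(jir))
    print(a_tot_colored([jir], flat))
    print(a_tot_colored([jir], low_only))
```

For a single stage and a flat spectrum, the discretized (13) reproduces (12) by Parseval's theorem, provided the frequency grid has at least as many points as the JIR has coefficients. Concentrating the input noise at low jitter frequencies gives a smaller result for this example JIR.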


Fig. 10 illustrates jitter amplification in a clock buffer driving an RC load. A clock is applied with and without a 5 ps jitter impulse on one clock edge. By subtracting the output V_{DD}/2 crossing times with the jitter impulse at the input from the V_{DD}/2 crossing times without the jitter impulse, the JIR is obtained.
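The extraction step can be sketched in Python as below. The function name and the example crossing times are hypothetical. The sign convention (perturbed minus reference) and the normalization by the impulse size are assumptions of this sketch, chosen so that a delayed edge gives a positive coefficient.

```python
def jitter_impulse_response(t_with, t_without, impulse):
    """Normalized JIR from two lists of output crossing times (seconds)."""
    if len(t_with) != len(t_without):
        raise ValueError("crossing-time lists must have equal length")
    return [(tw - t0) / impulse for tw, t0 in zip(t_with, t_without)]


if __name__ == "__main__":
    half_period = 50e-12                      # 10 GHz clock
    impulse = 5e-12                           # 5 ps input jitter impulse
    t_without = [k * half_period for k in range(4)]
    # Hypothetical edge displacements with alternating polarity.
    shifts = [5.0e-12, -1.5e-12, 0.45e-12, -0.135e-12]
    t_with = [t + s for t, s in zip(t_without, shifts)]
    print(jitter_impulse_response(t_with, t_without, impulse))
```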

FIGURE 10. Jitter amplification in a clock buffer driving an RC load.

A. Design for Low Jitter Amplification

Jitter amplification typically occurs in two scenarios. First, it arises when the clock waveforms exhibit RC-settling characteristics with a settling time constant, \tau , comparable to half of the clock period, T_{clk}/2 . For example, if the interconnect itself is too resistive, or a clock buffer is sized too small for its capacitive load, then R_{drv} \gg Z_{0} and the buffer forms a time constant with its load comparable to T_{clk}/2 . Such a situation is depicted in Fig. 10, where the delayed falling edge does not completely settle, so the subsequent rising edge is advanced, and so on. The alternating polarity of the h_{i} coefficients is indicative of a highpass JTF and A_{tot}>1 .

Second, it arises when t_{rt} is comparable to T_{clk}/2 . Even when the clock path has low series resistance and the buffers are sized appropriately, so that R_{drv} \approx Z_{0} , slow settling can still occur as shown in Fig. 6 at the near end, resulting in jitter amplification.

The time required for a clock waveform to settle is roughly equal to the sum of the incident edge’s transition time, t_{r} , and the round-trip delay, t_{rt} , of the reflected wave, \begin{equation*} t_{s} = t_{r} + t_{rt} \tag{14}\end{equation*}


The worst-case settling time is experienced at the near end, where the round-trip delay is longest.

To avoid jitter amplification, we require t_{s} < T_{clk}/2 . This in turn places an upper limit on (l-x) in (11), which can be written in terms of the inductance and capacitance per unit length of the interconnect, L_{w} and C_{w} , and the capacitance per unit length due to the input of local buffers tapping onto the transmission line, C_{L} .\begin{equation*} t_{rt} = 2\frac {l-x}{v_{p}} = 2\left ({l-x}\right)\sqrt {L_{w}\left ({C_{w} + C_{L}}\right)} \tag{15}\end{equation*}

To ensure t_{s} < T_{clk}/2 using equations (14) and (15), the interconnect length and tapping points, x , are limited by \begin{equation*} l-x < \frac {1}{2} \frac {T_{clk}/2 - t_{r}}{\sqrt {L_{w}\left ({C_{w} + C_{L}}\right)}}.\tag{16}\end{equation*}
This limitation becomes more stringent with lower T_{clk} and larger local buffers (greater C_{L} ).
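The bound in (16) is easy to evaluate numerically. In the Python sketch below, the function name and every parameter value (a 20 ps edge, 0.5 nH/mm of line inductance, 0.1 pF/mm of line capacitance, and 0.05 pF/mm of tap loading) are assumptions chosen only to show the trend, not values from the testchip.

```python
import math


def max_distance_to_far_end(t_clk, t_r, l_w, c_w, c_l):
    """Eq. (16): upper limit on (l - x) so that t_s < T_clk / 2.

    l_w is in H/m, c_w and c_l are in F/m, times are in seconds.
    Returns 0.0 when the edge alone already exceeds half a period.
    """
    budget = t_clk / 2.0 - t_r
    if budget <= 0.0:
        return 0.0
    return 0.5 * budget / math.sqrt(l_w * (c_w + c_l))


if __name__ == "__main__":
    l_w, c_w, c_l = 5e-7, 1e-10, 5e-11    # hypothetical per-metre values
    for f_clk in (10e9, 14e9):
        bound = max_distance_to_far_end(1.0 / f_clk, 20e-12, l_w, c_w, c_l)
        print(f"f_clk = {f_clk / 1e9:.0f} GHz -> l - x < {bound * 1e3:.2f} mm")
```

With these placeholder values the allowed distance shrinks as the clock frequency rises, since the settling budget T_{clk}/2 - t_{r} shrinks while the line delay per unit length stays fixed.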

B. Jitter Amplification Simulation

The worst-case jitter amplification ratio, A_{tot} , among all tapping points is plotted versus the interconnect length per stage, l , in Fig. 11 [17] for f_{clk}=10 GHz and 14 GHz. Global buffers are sized with nfin = 4 and nf = 32 and local buffers uniformly spaced 200-\mu \text{m} apart are sized with nfin = 4 and nf = 4. The clock path is not RC limited and the buffer R_{drv} \approx Z_{0} . For both clock frequencies, it is evident that jitter amplification is negligible up to some critical interconnect length, beyond which A_{tot} increases dramatically.

FIGURE 11. Worst-case jitter amplification among all tapping points versus l in 16-nm FinFET with white input jitter, global buffers sized nfin = 4 and nf = 32 and local buffers spaced 200-\mu \text{m} apart and sized nfin = 4 and nf = 4.

When f_{clk} is small, the available settling time T_{clk}/2 is large compared to the input clock transition time t_{r} . Thus, jitter amplification occurs only for relatively large l (and correspondingly large t_{rt} ). At higher f_{clk} where T_{clk}/2 is comparable to t_{r} , a slight increase in l may cause significantly increased settling time. The fact that the critical interconnect length is smaller for f_{clk} = 14 GHz than for 10 GHz, is consistent with (16).

SECTION IV.

Measurements

A 1-mm by 1-mm testchip was fabricated in 16-nm FinFET technology. Within the testchip are two 2-mm clock distribution networks consisting of 5 and 10 global buffers, respectively. Both are tapped by local buffers, sized nfin = 4 and nf = 4, spaced 400-\mu \text{m} apart with two additional fan-out-of-2 buffers placed after each one. These additional buffers operate under the same supply and, thus, contribute to overall jitter and power consumption. Only the local clock buffer outputs at 400-\mu \text{m} , 1200-\mu \text{m} and 2000-\mu \text{m} are passed to a shared MUX for measurement. Global buffers are sized with nfin = 4 and nf = 36 for the 5-buffer clock distribution and the corresponding interconnect length per stage is l = 400 -\mu \text{m} . For the 10-buffer clock distribution, global buffers are sized with nfin = 4 and nf = 32 and l = 200 -\mu \text{m} . Both clock distributions use coplanar waveguide (CPW) transmission lines with the signal line placed on Metal 11 with a width of 0.5-\mu \text{m} . The distance between the signal and ground on Metal 11 is 4.5-\mu \text{m} . A metal shield is placed underneath the signal line on Metal 9 to minimize electromagnetic interference. Both clock distributions operate under a 0.8 V supply, where the measured power consumption of the 5-buffer clock distribution is 4.96 mW and the measured power consumption of the 10-buffer clock distribution is 6.32 mW for a nominal clock frequency of 10 GHz. Fig. 12(a) and Fig. 12(b) show the die photograph and layout of the global clock distribution, respectively.

FIGURE 12. (a) Testchip die photograph and (b) layout plot.

A. PSIJ

An on-board probing method named “spy hole” was used in [22] to measure injected supply noise in a flip-chip packaged testchip. This method requires an additional pair of power and ground BGA balls to be routed to the board for probing. The disadvantage of this method is that only low frequency noise can be measured due to the combination of the decoupling capacitances and high-impedance probe. To improve the bandwidth of the spy hole in this work, one on-chip I/O pad for the noisy supply is routed to a dedicated power BGA, as shown in Fig. 13. It does not get connected to the power plane in the package substrate or the on-board decoupling capacitors, resulting in smaller decoupling capacitances and less noise filtering than [22].

FIGURE 13. - PSIJ measurement setup with spy hole and high-frequency buffered path.

Supply noise is generated by modulating the gate voltage of noise-injection transistors. An AC-coupled buffered path allows high-frequency supply noise to be observed off-chip. The buffer has a bandwidth of 1 GHz. Its gain is calibrated by applying sinusoidal supply noise at 100 MHz, a frequency between the lower cutoff frequency of the buffer’s AC coupling and the spy hole bandwidth, and comparing the noise measured via the two paths.

A 45.1 mVpp sinusoidal supply noise at 300 MHz is generated for the 5-buffer clock distribution and a 46.1 mVpp sinusoidal supply noise at 300 MHz is generated for the 10-buffer clock distribution. Fig. 14 plots the added PSIJpp at the output of the local buffers for a nominal clock frequency of 10 GHz, obtained by subtracting the jitter measured without supply noise injection from the jitter measured with supply noise injection. The amount of added PSIJ accumulates linearly along the buffer chain, as expected.
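The subtraction described above can be sketched as follows. This is a minimal illustration, not the authors' post-processing: the function names and the example readings are hypothetical, and dividing by the injected peak-to-peak amplitude to obtain a figure in fs/mV is an assumption, since the text does not spell out the normalization.

```python
def added_psij_pp(jitter_with_fs: float, jitter_without_fs: float) -> float:
    """Peak-to-peak jitter (fs) attributed to the injected supply noise."""
    return jitter_with_fs - jitter_without_fs


def psij_sensitivity(jitter_with_fs: float, jitter_without_fs: float,
                     noise_mvpp: float) -> float:
    """Added PSIJpp per mV of peak-to-peak supply noise (fs/mV).

    Assumption: sensitivity is the added jitter divided by the injected
    peak-to-peak noise amplitude.
    """
    if noise_mvpp <= 0:
        raise ValueError("noise amplitude must be positive")
    return added_psij_pp(jitter_with_fs, jitter_without_fs) / noise_mvpp


# Hypothetical readings at one tapping point (illustrative, not measured data).
example_with_fs = 2500.0      # peak-to-peak jitter with noise injection
example_without_fs = 245.0    # peak-to-peak jitter with injection disabled
example_noise_mvpp = 45.1     # injected amplitude quoted for the 5-buffer path

example_sensitivity = psij_sensitivity(
    example_with_fs, example_without_fs, example_noise_mvpp
)
print(f"{example_sensitivity:.1f} fs/mV")
```

Subtracting the baseline removes jitter that is present regardless of the injected noise, so the remainder can be attributed to the supply disturbance.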

FIGURE 14. - PSIJpp measured for a 5-buffer and 10-buffer clock distribution at three different nodes.

Fig. 8(b) shows that the 10-buffer clock distribution, corresponding to the design point nf = 32 and n = 10, has a worst-case PSIJpp of 85 fs/mV, while the 5-buffer clock distribution, corresponding to the design point nf = 36 and n = 5, has a worst-case PSIJpp of 51 fs/mV. The measured worst-case PSIJpp values at the output of the local buffer located 2000 μm from the near end are 46.1 fs/mV and 91.3 fs/mV for the 5-buffer and 10-buffer clock distributions, respectively. Both values are consistent with the simulation results, especially considering the impact of secondary details of the prototype not captured in the simulation testbench. Furthermore, the PSIJpp along the entire 5-buffer clock distribution is lower than that of the 10-buffer clock distribution, which supports the proposed design guidelines.
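To make the agreement between simulation and measurement concrete, the short sketch below computes the relative deviation of each measured worst-case PSIJpp from its simulated value, using only the four figures quoted above. The dictionary and function names are illustrative.

```python
# Worst-case PSIJpp in fs/mV, as quoted in the text.
simulated = {"5-buffer": 51.0, "10-buffer": 85.0}
measured = {"5-buffer": 46.1, "10-buffer": 91.3}


def relative_deviation(measured_value: float, simulated_value: float) -> float:
    """Fractional deviation of a measurement from its simulated value."""
    return (measured_value - simulated_value) / simulated_value


deviations = {
    name: relative_deviation(measured[name], simulated[name])
    for name in simulated
}

for name, deviation in deviations.items():
    print(f"{name}: measured is {deviation:+.1%} relative to simulation")
```

With these inputs, the 5-buffer measurement falls about 9.6% below its simulated value and the 10-buffer measurement about 7.4% above it.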

B. RJ Generation

The simulated RJ generation in this scenario due to intrinsic device noise is approximately 45 fsrms in each clock path. Measurements were performed using an input clock source with 200 fsrms of RJ. Though a small increase in RJ is observed in both paths, there is no noticeable RJ accumulation among the tapping points along each path. Since the input clock jitter is independent and significantly larger, the observed increase in RJ mainly comes from jitter amplification along the path from the clock source to the scope. Therefore, RJ generation within the two CMOS clock paths could not be clearly observed in measurements and is unlikely to limit overall clocking performance.
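A short worked calculation shows why 45 fsrms of generated RJ is hard to resolve against a 200 fsrms source. It assumes the two contributions are independent and therefore add in quadrature, and it ignores jitter amplification and instrument noise. The symbols σ_in, σ_gen, and σ_out are notation introduced here.

```latex
\sigma_{out} = \sqrt{\sigma_{in}^{2} + \sigma_{gen}^{2}}
             = \sqrt{(200\ \text{fs})^{2} + (45\ \text{fs})^{2}}
             = \sqrt{42025\ \text{fs}^{2}}
             = 205\ \text{fs}
```

Under these assumptions, the generated RJ raises the total by only 5 fsrms, or 2.5% of the input jitter.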

C. Jitter Amplification

With supply noise injection disabled, the RJrms averaged over rising and falling edges at the 400-μm and 2000-μm tapping points was measured for input clock frequencies from 10 GHz to 14 GHz. No noticeable jitter amplification was observed over this range. This is expected since the interconnect length per stage, l, is 400 μm and 200 μm for the 5-buffer and 10-buffer clock distributions, respectively, both of which are well within the limits predicted in Fig. 11.
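For reference, the conclusion identifies the available settling time as T_{clk}/2. The arithmetic below evaluates it at the two ends of the measured frequency range, where f_clk is notation introduced here for the clock frequency.

```latex
\frac{T_{clk}}{2} = \frac{1}{2 f_{clk}}: \qquad
\frac{1}{2 \times 10\ \text{GHz}} = 50\ \text{ps}, \qquad
\frac{1}{2 \times 14\ \text{GHz}} \approx 35.7\ \text{ps}
```

The settling window therefore shrinks by roughly 29% across the measured range.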

D. Summary

Table 2 compares the performance of the 5-buffer and 10-buffer clock distributions to a CML clock distribution reported in [15] and a resonant clock distribution reported in [23].

TABLE 2 Comparison of Clock Distribution Works

SECTION V.

Conclusion

In this tutorial, we presented quantitative analyses of the major jitter sources in CMOS clock distribution: power supply induced jitter (PSIJ), random jitter (RJ) generation, and jitter amplification. PSIJ is reduced by using fewer and larger buffers with correspondingly longer interconnect length per stage, l. To reduce RJ, the number of buffers should be large enough to avoid excessive degradation of rise-fall times but small enough to avoid accumulation of PSIJ and RJ. Jitter amplification is closely tied to the available settling time, T_{clk}/2, and only becomes significant when l exceeds a certain threshold.

In summary, designers should strive to minimize the number of buffers along a CMOS clock distribution path, particularly when power supply noise is of concern, while maintaining the sharpest possible rise-fall times and complete settling of all clock waveforms tapped for use by high-speed sampling circuits. Minimizing the number of buffers, subject to this critical constraint, can simultaneously reduce power supply noise sensitivity and power consumption, presenting a rare “win-win” scenario for designers.
