Design Methodologies for Low-Jitter CMOS Clock Distribution

Abstract:

Clock jitter negatively affects the performance of sampling circuits such as high-speed wireline transceivers and data converters. With CMOS buffers being increasingly used for the distribution of precise clocks in advanced technologies, it is important to understand their limitations and explore design tradeoffs. This tutorial provides quantitative analyses of the main sources of jitter in CMOS clock distribution: power supply induced jitter, jitter generation, and jitter amplification. Minimizing the number of buffers along the clock distribution network while still maintaining fast rise-fall times and ensuring proper settling of all clock waveforms will minimize the impact of all jitter sources. Following these guidelines can simultaneously reduce power supply noise sensitivity and power consumption of the clock distribution circuits. These conclusions are backed up by simulation and measurement results of two 16-nm FinFET clock distribution networks.

Published in: IEEE Open Journal of the Solid-State Circuits Society ( Volume: 1)

Page(s): 94 - 103

Date of Publication: 05 October 2021

Electronic ISSN: 2644-1349

DOI: 10.1109/OJSSCS.2021.3117930

CC BY: IEEE is not the copyright holder of this material. License terms are available at https://creativecommons.org/licenses/by/4.0/.

SECTION I.

Introduction

In many applications, a clock from a single source, usually a phase-locked loop (PLL), is distributed across a chip as shown in Fig. 1. Clock jitter degrades the performance of the sampling circuits that use the clock, including high-speed wireline transceivers and data converters. With the continuous demand for faster and more precise sampling, jitter targets are becoming harder to meet [1]. Whereas there has been much research effort to reduce PLL jitter [2]–[4], there has been much less on the distribution of low-jitter clocks, whose design is also critical.

FIGURE 1.

CMOS clock distribution for high-speed sampling circuits.


Various clock distribution architectures have been reported in the literature, each with distinct advantages and disadvantages, as summarized in Table 1. This tutorial focuses on CMOS clock distribution, by which we mean the distribution of full-swing clocks (from the supply voltage to ground) using CMOS inverters as buffers, with multiple tap points along each interconnect as shown in Fig. 1. Current-mode logic (CML) circuits may be used instead, but they deliver a reduced clock swing and therefore require additional circuits before they can interface with sampling circuitry. CML buffers also consume static power, which makes them power-inefficient, especially at lower operating speeds. Furthermore, CML buffers require bias circuitry and (typically) passive resistors, which complicates their floorplanning. By comparison, CMOS clock distribution delivers a full-swing clock with fast rise-fall times using simple, scaling-friendly buffer circuits whose power consumption scales with clock frequency. These properties have made CMOS clock distribution predominant in broadband applications such as time-interleaved data converters [5] and high-speed wireline transceivers [6], [7]. However, CML buffers inherently offer better power supply rejection than CMOS buffers; therefore, the power supply induced jitter of CMOS inverters is analyzed in detail in this tutorial.

TABLE 1 Qualitative Comparison of Clock Distribution Architectures

Resonant clocking and distributed oscillators offer excellent jitter performance and power efficiency [8]–[10]. However, they occupy a relatively large area and have a limited frequency range. Although they are not the focus of this tutorial, they are of great interest for applications where the sampling frequency does not change, there is enough area for inductors, and the lowest possible jitter is essential. Section II discusses jitter generated in a CMOS inverter by two sources, power supply noise and transistor thermal noise, and presents guidelines for minimizing their effects in CMOS clock distribution networks. Section III describes jitter amplification in a CMOS inverter and provides design guidelines. Section IV presents the design and measurement of two prototype 2-mm-long CMOS clock distribution networks in 16-nm FinFET that illustrate the principles of the paper.

SECTION II.

Jitter Generation

A. Power Supply Induced Jitter (PSIJ)

Power supply induced jitter (PSIJ) is caused by voltage fluctuations on the power supply due to transient currents drawn by surrounding circuit blocks. These transient currents pass through a power distribution network (PDN) made up of parasitic resistances, capacitances, and inductances, across which a transient noise voltage develops. The amplitude and frequency of the supply noise are determined by the frequency response of the PDN impedance and the spectra of the underlying circuits’ supply currents [11]. A typical PDN resonates at tens to hundreds of MHz and may produce supply noise amplitudes of tens of mV. Supply noise modulates the output swing and the transition threshold of CMOS inverters, affecting their rise-fall times. This changes the inverters’ delay, imparting jitter on the buffered clock.

To formulate this behavior, consider a CMOS inverter with a nominal delay $t_{d}$ operating under a nominal supply $V_{DD}$ . When subject to a noisy supply voltage $V_{DD}^{\prime }$ , the delay changes by $\Delta t_{d}$ . Provided that $\Delta V_{DD} = V_{DD}^{\prime } - V_{DD}$ is sufficiently small, $\Delta t_{d} / t_{d}$ is proportional to $\Delta V_{DD} / V_{DD}$ , with a proportionality constant $K$ called the PSIJ sensitivity [12]–[15],

$\begin{equation*} \frac {\Delta t_{d}}{t_{d}} = K \frac {\Delta V_{DD}}{V_{DD}} \tag{1}\end{equation*}$

An analytical expression for $K$ is obtained using a few simplifying assumptions in [16]. Consider the case illustrated in Fig. 2. The inverter input is assumed to be a step function and the delay is measured with respect to $V_{DD}/2$ at the output. During the output transition, one transistor is in saturation and the other is off. Using the square-law model, the nominal delay for a falling output edge transition is

$\begin{equation*} t_{d} = \frac {Q}{I} = \frac {C_{L} V_{DD}/2}{\frac {1}{2}\mu C_{ox}\frac {W}{L}\left ({V_{DD}-V_{th}}\right)^{2}} \tag{2}\end{equation*}$

The derivative of (2) with respect to $V_{DD}$ is

$\begin{align*} \frac {dt_{d}}{dV_{DD}}=&\frac {C_{L}/2}{\frac {1}{2}\mu C_{ox}\frac {W}{L}\left ({V_{DD}-V_{th}}\right)^{2}} \left ({1-\frac {2V_{DD}}{V_{DD}-V_{th}}}\right) \\[-2pt]=&\frac {t_{d}}{V_{DD}}\left ({1-\frac {2V_{DD}}{V_{DD}-V_{th}}}\right) \tag{3}\end{align*}$

FIGURE 2.

The step response of a CMOS inverter in the presence of supply noise.


For small $\Delta V_{DD}$ , $\Delta t_{d} \approx \frac {dt_{d}}{dV_{DD}}\Delta V_{DD}$ . Substituting (3) into this expression and rearranging the terms, we obtain

$\begin{equation*} \frac {\Delta t_{d}}{t_{d}} = -\frac {V_{DD}+V_{th}}{V_{DD}-V_{th}}\frac {\Delta V_{DD}}{V_{DD}} \tag{4}\end{equation*}$

Finally, comparing (1) and (4), we see

$\begin{equation*} K = -\frac {V_{DD}+V_{th}}{V_{DD}-V_{th}} \tag{5}\end{equation*}$

Equations (1) and (5) provide important design insights. Naturally, anything that reduces $\Delta V_{DD}$ (supply regulation and decoupling capacitors) reduces overall PSIJ. Reducing the nominal inverter delay $t_{d}$ also reduces PSIJ. Finally, using transistors with the lowest possible threshold voltage $V_{th}$ reduces the magnitude of $K$ and thus reduces PSIJ. The negative sign of $K$ simply indicates that the delay decreases when the supply voltage rises.
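As a quick numerical check of (3) through (5), the sketch below differentiates the square-law delay (2) by central differences and compares the result with the closed-form $K$. It is a minimal sketch assuming the idealized model of [16]; the helper names and the unit values of $C_{L}$ and $\beta = \mu C_{ox} W/L$ are illustrative, and both cancel in $K$.

```python
def square_law_delay(vdd, vth, c_load=1.0, beta=1.0):
    """Nominal falling-edge delay from (2); beta stands for mu*Cox*W/L."""
    return (c_load * vdd / 2) / (0.5 * beta * (vdd - vth) ** 2)


def k_analytic(vdd, vth):
    """PSIJ sensitivity K from (5)."""
    return -(vdd + vth) / (vdd - vth)


def k_numeric(vdd, vth, dv=1e-6):
    """K = (dt_d/dV_DD) * V_DD / t_d, using a central difference on (2)."""
    slope = (square_law_delay(vdd + dv, vth) - square_law_delay(vdd - dv, vth)) / (2 * dv)
    return slope * vdd / square_law_delay(vdd, vth)


if __name__ == "__main__":
    # Operating point of the 65-nm example: V_DD = 1 V, V_th = 0.37 V
    print(k_analytic(1.0, 0.37), k_numeric(1.0, 0.37))
    # Lower threshold voltage, smaller |K|
    print(k_analytic(1.0, 0.2))
```

Sweeping `vth` downward in `k_analytic` shows the magnitude of $K$ shrinking, which is the threshold-voltage guideline stated above.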

The following simulation example confirms the linear relationship predicted by (1) and characterizes $K$ for a 65-nm CMOS technology. In Fig. 3(a), a 10 GHz clock is passed through a chain of inverters. The device widths are swept from 8 to 20 fingers, where each PMOS and NMOS finger has a W/L of $500n/60n$ and $200n/60n$ , respectively. The load capacitances, $C_{L}$ , are swept from 6 fF to 10 fF, and a 40 mV_pp sinusoidal supply noise at 200 MHz is superimposed on $V_{DD}$ . The delay across INV3, $t_{d}$ , and its peak-to-peak delay variation, $\Delta t_{d}$ , are plotted in Fig. 3(b) [17]. A linear relation between $t_{d}$ and $\Delta t_{d}$ is observed, corresponding to a PSIJ sensitivity $|K| \approx 1.45$ . For comparison, (5) with $V_{DD} = 1$ V and $V_{th} = 0.37$ V predicts $|K| \approx 2.17$ . This discrepancy is expected: the preceding derivation relies on the simplified square-law MOSFET model, which does not strictly apply in nanoscale CMOS technologies. Moreover, in scaled technologies both $V_{DD}$ and $V_{th}$ decrease, which have opposing effects on $K$ , and higher-order effects further complicate the analytical model. Thus, in practice, simulations are necessary to characterize $K$ accurately for a given advanced process technology.
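Extracting $K$ from data such as Fig. 3(b) amounts to fitting a line through the origin to $\Delta t_{d}$ versus $t_{d}$ and rescaling the slope by $\Delta V_{DD}/V_{DD}$ according to (1). The sketch below is a minimal least-squares version; the function name is hypothetical, and the delay values in the example are synthetic (generated from $|K| = 1.45$), not the simulated data of Fig. 3(b).

```python
def fit_psij_sensitivity(delays, delay_variations, dv_pp, vdd):
    """Least-squares slope of delta_td versus td through the origin, rescaled to |K| via (1)."""
    num = sum(t * dt for t, dt in zip(delays, delay_variations))
    den = sum(t * t for t in delays)
    slope = num / den                    # slope = |K| * dv_pp / vdd
    return slope / (dv_pp / vdd)


if __name__ == "__main__":
    vdd, dv_pp = 1.0, 0.040                          # 1 V supply, 40 mV_pp noise
    td = [8e-12, 10e-12, 12e-12, 14e-12]             # hypothetical nominal delays (s)
    dtd = [1.45 * (dv_pp / vdd) * t for t in td]     # synthetic, noise-free variations
    print(fit_psij_sensitivity(td, dtd, dv_pp, vdd))
```

Forcing the fit through the origin reflects the form of (1), in which $\Delta t_{d}$ vanishes as $t_{d}$ goes to zero.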

FIGURE 3.

(a) Testbench for PSIJ sensitivity simulation in 65-nm CMOS. (b) Peak-to-peak delay variation $\Delta t_{d}$ versus nominal delay $t_{d}$ across INV3 where $V_{DD} = 1$ V and $V_{th} = 0.37$ V (obtained from simulator DC operating point) in the presence of 40 mV_pp sinusoidal supply noise at 200 MHz for different device widths and values of $C_{L}$ .


B. Random Jitter Generation (RJ)

Random jitter (RJ) is generated by transistor thermal noise. In a CMOS inverter, RJ arises in two phases of the output waveform. When the inverter output is not transitioning, the on transistor (operating in triode) contributes a $kT/C$ noise voltage, which appears as a random offset voltage on the output. The resulting jitter is obtained by dividing the noise voltage by the output transition slope $I_{D}/C_{L} = V_{DD}/t_{r}$ , where $t_{r}$ is the rise-fall time of the output transition.

$\begin{equation*} \sigma _{j1}^{2} = \frac {\sigma _{vn1}^{2}}{\left ({I_{D}/C_{L}}\right)^{2}} = \frac {kTC_{L}}{I_{D}^{2}} = \frac {kT}{I_{D}V_{DD}}t_{r} \tag{6}\end{equation*}$

During an output transition, white noise current with a power spectral density (PSD) of $4kT\gamma g_{m}$ is added to the charging or discharging current. The noise voltage at the $V_{DD}/2$ crossing can be found by integrating the noise current over the time interval $t_{r}/2$ and dividing by the load capacitance $C_{L}$ . Following the calculation in [18], the noise voltage is

$\begin{equation*} \sigma _{vn2}^{2} = 4kT\gamma g_{m}\frac {t_{r}}{4}\frac {1}{C_{L}^{2}} = \frac {kT\gamma g_{m} t_{r}}{C_{L}^{2}} \tag{7}\end{equation*}$

The resulting jitter is obtained by dividing (7) by the square of the output transition slope:

$\begin{equation*} \sigma _{j2}^{2} = \frac {\sigma _{vn2}^{2}}{\left ({I_{D}/C_{L}}\right)^{2}} = \frac {kT\gamma g_{m} t_{r}}{I_{D}^{2}} = \frac {2kT\gamma }{I_{D}\left ({V_{DD}-V_{th}}\right)} t_{r} \tag{8}\end{equation*}$

In (8), the last equality assumes the square-law model and $V_{GS} = V_{DD}$ . From (6) and (8), both RJ variance components are directly proportional to $t_{r}/I_{D}$ , which indicates that RJ can be decreased by shortening the rise-fall time and by increasing the charging and discharging current.
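Equations (6) and (8) can be evaluated directly to see how per-stage RJ scales with $t_{r}$ and $I_{D}$. The sketch below assumes the two contributions are uncorrelated, so their variances add; the bias current, supply, threshold, $\gamma$, and rise times used in the example are hypothetical round numbers, not values from the testbench.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K


def rj_variance_static(i_d, vdd, t_r, temp=300.0):
    """Eq. (6): kT/C noise of the on transistor divided by the output slope (s^2)."""
    return K_BOLTZMANN * temp * t_r / (i_d * vdd)


def rj_variance_transition(i_d, vdd, vth, t_r, gamma=1.0, temp=300.0):
    """Eq. (8): channel noise over t_r/2, square-law g_m with V_GS = V_DD (s^2)."""
    return 2 * K_BOLTZMANN * temp * gamma * t_r / (i_d * (vdd - vth))


def rj_rms_per_stage(i_d, vdd, vth, t_r, gamma=1.0, temp=300.0):
    """Total per-stage RJ in seconds rms, assuming uncorrelated contributions."""
    return math.sqrt(rj_variance_static(i_d, vdd, t_r, temp)
                     + rj_variance_transition(i_d, vdd, vth, t_r, gamma, temp))


if __name__ == "__main__":
    # Hypothetical operating point: 1 mA current, 0.8 V supply, 0.3 V threshold
    for t_r in (5e-12, 10e-12, 20e-12):
        sigma = rj_rms_per_stage(1e-3, 0.8, 0.3, t_r)
        print(f"t_r = {t_r * 1e12:.0f} ps: RJ = {sigma * 1e15:.1f} fs_rms")
```

Doubling `t_r` raises the rms jitter by a factor of $\sqrt{2}$, and doubling `i_d` lowers it by the same factor, consistent with the $t_{r}/I_{D}$ dependence.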

The preceding analysis is verified through simulation using a testbench similar to Fig. 3(a), with a clean supply, $V_{DD}$ , and ground, $V_{SS}$ , and with device noise turned on for all devices. The RJ variance per stage for different numbers of fingers, nf, is plotted in Fig. 4(a) [17] with zero explicit load capacitance, $C_{L} = 0$ ; the fan-out of the buffer chain is kept equal to 1 so that the rise-fall time stays constant. As nf decreases, smaller charging and discharging currents are supplied to the load capacitances and the RJ variance increases, as expected. The RJ variance per stage for different rise-fall times is plotted in Fig. 4(b) [17], where nf is fixed to maintain the same charging and discharging current and $C_{L}$ is varied to change the rise-fall time. As the rise-fall time increases, the slope of the buffer output transitions decreases and the RJ variance increases, as expected.

FIGURE 4.

(a) RJ variance per buffer in 65-nm CMOS for different buffer sizes (nf) with $C_{L} = 0$ while maintaining fan-out of 1 and (b) for different rise-fall times with nf fixed and different $C_{L}$ .


C. Jitter Generation in Global Clock Distribution

Both PSIJ and RJ accumulate along a buffer chain and are therefore minimized by reducing the number of buffer stages. Assuming $n$ stages with identical clock buffers, each experiencing identical power supply noise (a conservative assumption), the amount of PSIJ at the end of the chain will be $n$ times that of a single stage. On the other hand, since RJ generated in each stage is uncorrelated, the RJ_rms at the end of the chain will be $\sqrt {n}$ times that of a single stage.
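The two accumulation laws can be illustrated with a short Monte Carlo sketch: the PSIJ term is fully correlated across stages and adds linearly, while independent Gaussian RJ draws add in quadrature. The per-stage jitter values in the example are hypothetical inputs, and the Gaussian RJ model is an assumption.

```python
import random
import statistics


def accumulated_jitter(n_stages, psij_pp_stage, rj_rms_stage, trials=20000, seed=1):
    """Return (PSIJ_pp, RJ_rms) at the end of an n-stage buffer chain.

    PSIJ: every stage sees the same supply noise, so the per-stage delay
    modulations are fully correlated and their peak-to-peak values add.
    RJ: each stage adds an independent zero-mean Gaussian timing error.
    """
    rng = random.Random(seed)
    psij_pp = n_stages * psij_pp_stage
    totals = [sum(rng.gauss(0.0, rj_rms_stage) for _ in range(n_stages))
              for _ in range(trials)]
    return psij_pp, statistics.pstdev(totals)


if __name__ == "__main__":
    # Hypothetical per-stage values in fs: 50 fs_pp PSIJ and 10 fs_rms RJ
    for n in (2, 5, 10):
        psij, rj = accumulated_jitter(n, 50.0, 10.0)
        print(f"n = {n:2d}: PSIJ = {psij:.0f} fs_pp, RJ = {rj:.1f} fs_rms")
```

The RJ column tracks $\sqrt{n}$ times the per-stage value to within Monte Carlo error, while the PSIJ column grows as $n$.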

Next, we consider the proper choice of $n$ . Assume $n$ buffers are connected in series, each driving an interconnect of length $l$ , for a total distance $nl$ . While decreasing $n$ reduces jitter accumulation, it degrades the rise-fall times at the tapping points, resulting in larger jitter generation per stage. Fig. 5 shows the model of a CMOS inverter driving an interconnect of length $l$ , terminated by the input capacitance of the next stage, $C_{in}$ . The interconnect is typically implemented in the top metal layers, whose series resistance is low, with a metal shield to mitigate electromagnetic interference. The inverter launches an incident rising or falling edge along the interconnect, which is reflected at the far end by the capacitive termination $C_{in}$ . The resulting clock waveform at a point $x$ from the inverter is the superposition of the incident and reflected edges at that point. The amplitude of the incident edge depends on the output impedance of the inverter driver, $R_{drv}$ , and the characteristic impedance of the interconnect, $Z_{0}$ ,

$\begin{equation*} V_{inc}\left ({x = 0}\right) = V_{in} \frac {Z_{0}}{Z_{0} + R_{drv}} \tag{9}\end{equation*}$

The incident edge amplitude increases when $Z_{0}$ increases and decreases when $R_{drv}$ increases. For simplicity, $R_{drv}$ is assumed to be independent of the input amplitude $V_{in}$ .

FIGURE 5.

Model of CMOS inverter driving an interconnect of length $l$ and terminated by $C_{in}$ .


At each point $x$ , the incident edge amplitude is attenuated by a factor of $e^{-\alpha x}$ where $\alpha$ is the interconnect attenuation factor [19]. The reflected edge is attenuated by a factor of $e^{-\alpha (2l - x)}$ . The signal across $C_{in}$ experiences exponential settling with a time constant $\tau = Z_{0} C_{in}$ [19]. Assuming $\tau \ll t_{r}$ , where $t_{r}$ is the rise-fall time of the incident edge, the effect of $\tau$ will be small and the reflected edge at point $x$ will have an amplitude

$\begin{equation*} V_{ref}\left ({x}\right) = V_{inc}\left ({x = 0}\right)e^{-\alpha \left ({2l - x}\right)} \tag{10}\end{equation*}$

For a propagation velocity, $v_{p}$ , the time between the moment the incident edge reaches point $x$ and the moment the reflected edge returns to the same point and superimposes with the incident edge is

$\begin{equation*} t_{rt} = 2\left ({l - x}\right)/v_{p}.\tag{11}\end{equation*}$
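Equations (9) through (11) translate directly into code. The sketch below assumes a constant $R_{drv}$, a per-length attenuation `alpha` in 1/m, and $\tau \ll t_{r}$ so that $C_{in}$ reflects like an open circuit; all numerical values in the example are hypothetical.

```python
import math


def incident_amplitude(v_in, z0, r_drv):
    """Eq. (9): divider between the driver output resistance and Z0."""
    return v_in * z0 / (z0 + r_drv)


def reflected_amplitude(v_in, z0, r_drv, alpha, l, x):
    """Eq. (10): reflected edge at x, assuming Z0*C_in << t_r (C_in acts as an open)."""
    return incident_amplitude(v_in, z0, r_drv) * math.exp(-alpha * (2 * l - x))


def round_trip_delay(l, x, v_p):
    """Eq. (11): time between incident and reflected edges at position x."""
    return 2 * (l - x) / v_p


if __name__ == "__main__":
    # Hypothetical values: 0.8 V swing, matched 50-ohm driver, 400 um stage, v_p = 1e8 m/s
    v_in, z0, r_drv, alpha, l, v_p = 0.8, 50.0, 50.0, 0.0, 400e-6, 1e8
    for x in (0.0, 200e-6, 400e-6):
        v_inc_x = incident_amplitude(v_in, z0, r_drv) * math.exp(-alpha * x)
        v_ref_x = reflected_amplitude(v_in, z0, r_drv, alpha, l, x)
        t_rt = round_trip_delay(l, x, v_p)
        print(f"x = {x * 1e6:.0f} um: V_inc = {v_inc_x:.3f} V, "
              f"V_ref = {v_ref_x:.3f} V, t_rt = {t_rt * 1e12:.1f} ps")
```

With a matched driver and no loss, the incident and reflected edges each carry half the swing, and $t_{rt}$ is largest at the near end ($x = 0$).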

D. Design for Low Jitter Generation

We have seen that reducing the number of buffers while maintaining fast rise-fall times will minimize both PSIJ and RJ. Design guidelines to ensure this, including how to size the buffer transistors, are outlined below.

Buffer sizing involves several interacting considerations because it affects both $R_{drv}$ and $C_{in}$ , which in turn change the effective $Z_{0}$ , the rise-fall settling time constant $\tau$ , and the round-trip delay $t_{rt}$ . First, it is undesirable to have $R_{drv} \gg Z_{0}$ : the incident edge amplitude is then much smaller than $V_{in} = V_{DD}$ , and multiple reflections are required for the clock edge to settle. Second, choosing $R_{drv} \ll Z_{0}$ is also undesirable because negative reflections from the driver cause overshoot and ringing in the clock waveform. Thus, in a typical design, $R_{drv} \approx Z_{0}$ . The time constant $\tau = Z_{0} C_{in} \approx R_{drv} C_{in}$ must then be limited to obtain fast rise-fall times.
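The sizing tradeoff can be visualized with a lossless bounce-diagram sketch of the far-end voltage. It uses the standard source reflection coefficient $\Gamma_{s} = (R_{drv} - Z_{0})/(R_{drv} + Z_{0})$, which is not written out in the text above, treats $C_{in}$ as an open circuit ($\Gamma_{L} = 1$, valid when $\tau \ll t_{r}$), and assumes a constant $R_{drv}$ and $\alpha = 0$. The function name is illustrative.

```python
def far_end_steps(v_in, z0, r_drv, n_round_trips=6):
    """Far-end voltage after each wave arrival on a lossless, open-ended line.

    Assumes tau = Z0*C_in << t_r so C_in reflects like an open (Gamma_L = +1),
    a constant driver resistance, and no attenuation (alpha = 0).
    """
    v_inc = v_in * z0 / (z0 + r_drv)          # eq. (9): launched step
    gamma_s = (r_drv - z0) / (r_drv + z0)     # reflection coefficient at the driver
    gamma_l = 1.0                             # open-circuit far end
    levels, v_far, wave = [], 0.0, v_inc
    for _ in range(n_round_trips):
        v_far += wave * (1 + gamma_l)         # arriving wave plus its reflection
        levels.append(v_far)
        wave *= gamma_l * gamma_s             # amplitude of the next arriving wave
    return levels


if __name__ == "__main__":
    # Hypothetical: 1 V swing on a 50-ohm line, driver weaker, matched, and stronger
    for r_drv in (200.0, 50.0, 12.5):
        steps = far_end_steps(1.0, 50.0, r_drv)
        print(f"R_drv = {r_drv:5.1f} ohm:", [round(v, 3) for v in steps])
```

With `r_drv` much larger than `z0`, the first level stays well below the full swing and the waveform climbs in a staircase over several round trips; with `r_drv` much smaller than `z0`, the first arrival overshoots and the levels alternate around the final value; with `r_drv == z0`, the line settles on the first arrival.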

Recall that jitter generation increases with the number of buffer stages, $n$ . As long as fast rise-fall times can be maintained, fewer buffer stages are desirable for clock distribution over a fixed distance.

The distance between buffers, $l$ , is limited by the need to prevent transmission line effects from increasing rise-fall times. At any point $x$ where $t_{rt}$ is comparable to $t_{r}/2$ , the rise-fall time suffers and the jitter generation of local buffers at that tapping point increases. Fig. 6 [17] shows the simulated clock waveforms at various points $x$ in 16-nm FinFET. At $x = 0.2$ mm, the clock waveform rises halfway due to the incident edge and stalls until the reflected wave arrives. Fig. 7 [17] shows the potentially detrimental effect of slow reflections on the 50% to 90% rise time, $t_{r,50-90\%}$ , by plotting it versus distance $x$ for different values of $l$ . Note that the clock always has a larger $t_{r,50-90\%}$ at the near end due to the late arrival of reflections.¹ As $l$ increases, this effect becomes more pronounced.
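The near-end slowdown can be reproduced qualitatively by superimposing two linear ramps at each tap: the incident edge arriving at $x/v_{p}$ and the reflected edge arriving $t_{rt}$ later, with amplitudes from (9) and (10). This is a simplified sketch, not the simulation behind Figs. 6 and 7; the linear-ramp edge shape, the 1 fs time step, and the numerical values in the example are assumptions.

```python
import math


def ramp(t, t_start, t_r):
    """Unit linear ramp that starts at t_start and lasts t_r."""
    return min(1.0, max(0.0, (t - t_start) / t_r))


def rise_time_50_90(x, l, v_p, t_r, v_inc, alpha=0.0, dt=1e-15):
    """50%-90% rise time at tap position x, built from incident + reflected ramps.

    Assumes tau = Z0*C_in << t_r so the reflected edge is a delayed copy of the
    incident edge scaled per (10); both edges are idealized as linear ramps.
    """
    a_inc = v_inc * math.exp(-alpha * x)
    a_ref = v_inc * math.exp(-alpha * (2 * l - x))
    t_inc = x / v_p                   # incident edge reaches x
    t_ref = (2 * l - x) / v_p         # reflected edge reaches x, i.e. t_inc + t_rt
    final = a_inc + a_ref

    def v(t):
        return a_inc * ramp(t, t_inc, t_r) + a_ref * ramp(t, t_ref, t_r)

    def crossing(fraction):
        # First time sample at which the waveform reaches fraction * final value
        k = 0
        while v(t_inc + k * dt) < fraction * final:
            k += 1
        return t_inc + k * dt

    return crossing(0.9) - crossing(0.5)


if __name__ == "__main__":
    l, v_p, t_r = 400e-6, 1e8, 10e-12   # hypothetical: 400 um stage, 1e8 m/s, 10 ps edges
    for x_um in (0, 100, 200, 300, 400):
        tr = rise_time_50_90(x_um * 1e-6, l, v_p, t_r, v_inc=0.4)
        print(f"x = {x_um:3d} um: t_r,50-90% = {tr * 1e12:.2f} ps")
```

In this idealized model the far end sees a single full-swing ramp, while at the near end the reflected half of the swing arrives a full $t_{rt}$ later and stretches $t_{r,50-90\%}$; longer `l` widens the gap.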

FIGURE 6.

Global clock waveforms in 16-nm FinFET at different distances $x$ .


FIGURE 7.

Global clock $\text{t}_{r,50-90\%}$ versus distance $x$ in 16-nm CMOS for different interconnect length per stage $l$ .


E. Jitter Generation Case Study

Simulations in 16-nm FinFET are presented for a 10 GHz clock propagated along a 2-mm clock distribution and tapped by ten local buffers spaced 200 $\mu \text{m}$ apart. The number of global clock buffers, $n$ , is varied from 2 to 10, with each buffer sized to have $R_{drv}$ slightly smaller than $Z_{0}$ to maintain a sharp edge at the far end without significant overshoot. The global clock buffers use minimum gate length, with 4 fins per finger (nfin = 4) and 36 fingers (nf = 36). Local buffers use minimum length with nfin = 4 and nf = 4 as a compromise between low RJ generation and low capacitive loading on the interconnect. The transmission line model used in the simulations is based on the transmission line implemented on the testchip, which is described in Section IV.

1) PSIJ Simulation

Supply noise with 40 mV_pp amplitude is applied to all buffers and the PSIJ_pp at the output of each local clock buffer is plotted in Fig. 8(a) [17] with $n$ varied from 2 to 10. When $n =2$ or 3, the correspondingly large $l$ results in higher PSIJ at the near end of each interconnect because of late-arriving reflections. Interestingly, the worst-case PSIJ for small $n$ is better than for large $n$ because PSIJ is introduced by fewer global buffers and there is less overall PSIJ accumulation.

FIGURE 8.

(a) PSIJ_pp simulation in 16-nm FinFET at the output of local clock buffers at different points along a 2-mm global clock distribution with different number of global buffers, $n$ , sized with nfin = 4 and nf = 36. (b) The worst-case PSIJ_pp among all local clocks in the 2-mm clock distribution with different buffer sizes.


Next, the global buffer size is varied from nf = 28 to 36. In each case, the same clock waveform is maintained at the input to the global buffer chain by appropriately sizing the pre-drivers of the first stage. The worst-case PSIJ_pp among all tapping points is plotted versus $n$ in Fig. 8(b) [17], showing that it decreases with fewer and larger global buffers. This simulation assumes that all buffers experience the same supply noise, which may not hold for a 2-mm clock distribution in practice. When supply noise is less correlated among buffers, the accumulated PSIJ will be less than $n$ times that of a single stage.

2) RJ Generation Simulation

Unlike PSIJ, the RJ contributed by each buffer is uncorrelated, so it accumulates in proportion to $\sqrt {n}$ . Fig. 9(a) [17] shows the RJ_rms at different tapping points. For $n=2$ , $l$ is large, and as a result, large RJ is observed at the near end (node n1), where late-arriving reflections are more severe. As $n$ increases, $l$ decreases and RJ is reduced.

FIGURE 9.

(a) Simulated RJ_rms variation in 16-nm FinFET for local clocks at different tapping points along a 2-mm clock distribution where each global buffer is sized with nfin = 4 and nf = 36 for both NMOS and PMOS transistors. (b) The worst-case RJ_rms among all tapping points in the 2-mm clock distribution for different buffer sizes.


The worst-case RJ_rms among the local clock buffer outputs is plotted versus $n$ in Fig. 9(b) [17] with the global buffer size varied from nf = 28 to 36. In this simulation, $n = 5$ results in the lowest RJ. Smaller $n$ (larger $l$ ) suffers from longer $t_{r,50-90\%}$ due to late-arriving reflections; therefore, large RJ is observed for $n = 2$ and $n = 3$ . Increasing $n$ above 5 improves rise-fall times slightly due to the decreased $l$ , but this benefit is outweighed by the increased RJ accumulation (in proportion to $\sqrt {n}$ ), and thus larger RJ is observed for $n = 10$ and $n = 20$ .

SECTION III.

Jitter Amplification

Jitter amplification occurs when the output rms jitter is larger than the input rms jitter in the absence of supply and device noise. It arises from imperfect settling of clock waveforms between consecutive rising and falling edges. Several models have been developed to characterize this effect; this tutorial focuses on the jitter impulse response (JIR) and the jitter transfer function (JTF) [12], [20].

The JIR is obtained by measuring the output absolute jitter sequence, $h_{i}$ , resulting from an impulse of jitter at the input whose magnitude is a small fraction of the clock period, with the sequence normalized by the impulse magnitude. If the input jitter to the buffer chain is white, the ratio of output to input rms jitter can be calculated from the JIR coefficients [12] as follows,

$\begin{equation*} A_{tot} = \sqrt {\sum _{i} h_{i}^{2}}\tag{12}\end{equation*}$
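Equation (12) can be checked by filtering white input jitter through a JIR and comparing rms values. The sketch below assumes Gaussian input jitter and treats the output jitter as the convolution of the input jitter sequence with the JIR; the three-tap JIR in the example is hypothetical, chosen only to have the alternating signs typical of jitter amplification.

```python
import math
import random


def a_tot_from_jir(jir):
    """Eq. (12): output/input rms jitter ratio for white input jitter."""
    return math.sqrt(sum(h * h for h in jir))


def a_tot_monte_carlo(jir, n_edges=100000, seed=7):
    """Filter white Gaussian input jitter through the JIR and compare rms values."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_edges)]
    # Output jitter on edge k is the convolution sum_i h_i * x_(k-i)
    y = [sum(h * x[k - i] for i, h in enumerate(jir))
         for k in range(len(jir) - 1, n_edges)]

    def rms(seq):
        return math.sqrt(sum(v * v for v in seq) / len(seq))

    return rms(y) / rms(x)


if __name__ == "__main__":
    jir = [1.2, -0.5, 0.2]  # hypothetical JIR with alternating signs
    print(a_tot_from_jir(jir), a_tot_monte_carlo(jir))
```

The two estimates agree to within Monte Carlo error, which is the content of (12) for white input jitter.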

More generally, jitter amplification depends on the input phase noise spectrum. The JTF, obtained by taking the Fourier transform of the JIR, represents jitter amplification as a function of jitter frequency and describes how the input phase noise is shaped [21]. For colored input phase noise $S_{\phi }(f)$ , the jitter amplification factor, $A_{tot}$ , of a buffer chain whose $j$ th stage has JTF $H_{j}(f)$ is given by

$\begin{equation*} A_{tot} = \sqrt {\frac {\int S_{\phi }\left ({f}\right) \left |{ \prod _{j} H_{j}\left ({f}\right) }\right |^{2} df}{\int S_{\phi }\left ({f}\right) df}}.\tag{13}\end{equation*}$
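Equation (13) can be evaluated numerically by taking the discrete-time Fourier transform of each stage's JIR, with frequency normalized to the JIR sample rate (one sample per clock edge here) so that 0.5 corresponds to the Nyquist frequency. The sketch below uses midpoint integration over $0 \le f \le 0.5$; for a white $S_{\phi}$ it reduces to (12). The JIR and the colored input spectrum in the example are hypothetical.

```python
import cmath
import math


def jtf(jir, f_norm):
    """DTFT of a JIR; f_norm is in cycles per JIR sample (0.5 = Nyquist)."""
    return sum(h * cmath.exp(-2j * math.pi * f_norm * i) for i, h in enumerate(jir))


def a_tot_colored(stage_jirs, s_phi, n_points=2048):
    """Eq. (13) by midpoint integration over 0 < f_norm < 0.5.

    stage_jirs holds one JIR per buffer stage; the cascade JTF is their product.
    """
    num = den = 0.0
    for k in range(n_points):
        f = (k + 0.5) / (2 * n_points)
        gain = 1.0
        for jir in stage_jirs:
            gain *= abs(jtf(jir, f)) ** 2
        num += s_phi(f) * gain
        den += s_phi(f)
    return math.sqrt(num / den)


if __name__ == "__main__":
    jir = [1.2, -0.5, 0.2]  # hypothetical single-stage JIR

    def white(f):
        return 1.0

    def low_freq(f):
        return 1.0 / (f + 0.01) ** 2  # hypothetical low-frequency-dominated spectrum

    for label, spectrum in (("white", white), ("low-frequency", low_freq)):
        print(label, a_tot_colored([jir], spectrum), a_tot_colored([jir, jir], spectrum))
```

Because an alternating-sign JIR is highpass, a spectrum concentrated at low frequencies sees less amplification than white jitter does, and cascading stages multiplies the JTFs as in (13).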

Fig. 10 illustrates jitter amplification in a clock buffer driving an RC load. A clock is applied with and without a 5 ps jitter impulse on one clock edge.² The JIR is obtained by subtracting the output $V_{DD}/2$ crossing times without the jitter impulse from the corresponding crossing times with the jitter impulse at the input.
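The JIR extraction of Fig. 10 can be mimicked with a behavioral model in which the buffer is an ideal switch charging its RC load toward $V_{DD}$ or ground, so each output edge settles exponentially with time constant $\tau$. This is a sketch under that first-order assumption (no slew limiting, no device noise): input edges are spaced $T_{clk}/2$ apart, one edge is delayed by a small `delta`, and the output crossing shifts are divided by `delta`. Function names and parameter values are illustrative.

```python
import math


def rc_crossings(edge_times, tau, vdd=1.0):
    """V_DD/2 crossing times of a first-order RC response to a full-swing square wave.

    Even-indexed edges drive the output toward vdd, odd-indexed edges toward 0.
    """
    v = 0.0
    crossings = []
    for k, t_edge in enumerate(edge_times):
        target = vdd if k % 2 == 0 else 0.0
        ratio = (v - target) / (0.5 * vdd - target)
        if ratio <= 1.0:
            raise ValueError("output never crossed V_DD/2: settling too slow")
        crossings.append(t_edge + tau * math.log(ratio))
        if k + 1 < len(edge_times):
            # Exponential settling toward the target until the next input edge
            span = edge_times[k + 1] - t_edge
            v = target + (v - target) * math.exp(-span / tau)
    return crossings


def jitter_impulse_response(tau, period=1.0, delta=1e-4, n_edges=60, k0=30, n_taps=6):
    """Delay input edge k0 by delta and record the normalized output crossing shifts."""
    nominal = [k * period / 2 for k in range(n_edges)]
    kicked = list(nominal)
    kicked[k0] += delta
    t_ref = rc_crossings(nominal, tau)
    t_kick = rc_crossings(kicked, tau)
    return [(t_kick[k0 + i] - t_ref[k0 + i]) / delta for i in range(n_taps)]


if __name__ == "__main__":
    for tau in (0.025, 0.25):  # time constants in units of the clock period
        h = jitter_impulse_response(tau)
        a_tot = math.sqrt(sum(x * x for x in h))
        print(f"tau = {tau} T_clk: h = {[round(x, 3) for x in h]}, A_tot = {a_tot:.3f}")
```

With $\tau = T_{clk}/40$ the response is essentially a single unit tap, whereas with $\tau = T_{clk}/4$ the first tap exceeds 1 and the next is negative, reproducing the alternating-sign pattern described for Fig. 10 and giving $A_{tot} > 1$.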

FIGURE 10.

Jitter amplification in a clock buffer driving an RC load.


A. Design for Low Jitter Amplification

Jitter amplification typically occurs in two scenarios. First, it arises when the clock waveforms exhibit RC settling with a time constant, $\tau$ , comparable to half of the clock period, $T_{clk}/2$ . For example, if the interconnect itself is too resistive, or if a clock buffer is sized too small for its capacitive load ( $R_{drv} \gg Z_{0}$ ), the resulting time constant can become comparable to $T_{clk}/2$ . Fig. 10 depicts this situation: the delayed falling edge does not settle completely, so the subsequent rising edge is advanced, and so on. The alternating signs of the $h_{i}$ indicate a highpass JTF and $A_{tot}>1$ .

Second, it arises when $t_{rt}$ is comparable to $T_{clk}/2$ . Even when the clock path has low series resistance and the buffers are sized appropriately, so that $R_{drv} \approx Z_{0}$ , slow settling can still occur at the near end, as shown in Fig. 6, resulting in jitter amplification.

The time required for a clock waveform to settle is roughly equal to the sum of the incident edge’s transition time, $t_{r}$ , and the round-trip delay, $t_{rt}$ , of the reflected wave,

$\begin{equation*} t_{s} = t_{r} + t_{rt} \tag{14}\end{equation*}$

The worst-case settling time occurs at the near end, where the round-trip delay is longest.

To avoid jitter amplification, we require $t_{s} < T_{clk}/2$ . This in turn places an upper limit on $(l-x)$ in (11). The propagation velocity $v_{p}$ can be written in terms of the inductance and capacitance per unit length of the interconnect, $L_{w}$ and $C_{w}$ , and the capacitance per unit length due to the inputs of the local buffers tapping onto the transmission line, $C_{L}$ .

$\begin{equation*} t_{rt} = 2\frac {l-x}{v_{p}} = 2\left ({l-x}\right)\sqrt {L_{w}\left ({C_{w} + C_{L}}\right)} \tag{15}\end{equation*}$

Using (14) and (15), ensuring $t_{s} < T_{clk}/2$ limits the interconnect length and tapping points, $x$ , to

$\begin{equation*} l-x < \frac {1}{2} \frac {T_{clk}/2 - t_{r}}{\sqrt {L_{w}\left ({C_{w} + C_{L}}\right)}}.\tag{16}\end{equation*}$

This limitation becomes more stringent with lower $T_{clk}$ and larger local buffers (greater $C_{L}$ ).
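Bound (16) is simple to tabulate for candidate clock frequencies. The sketch below assumes per-unit-length values in SI units; the inductance, wire capacitance, and rise time in the example are hypothetical round numbers chosen only to show the trend, not the parameters of the 16-nm testchip line. The argument `c_tap` plays the role of $C_{L}$ in (15).

```python
import math


def max_unsettled_length(f_clk, t_r, l_w, c_w, c_tap=0.0):
    """Upper bound on (l - x) from eq. (16), in meters.

    l_w, c_w: interconnect inductance and capacitance per unit length (H/m, F/m)
    c_tap:    tap-point capacitance per unit length, i.e. C_L in (15) (F/m)
    Returns 0.0 when t_r alone already exceeds T_clk/2.
    """
    half_period = 0.5 / f_clk
    delay_per_length = math.sqrt(l_w * (c_w + c_tap))   # 1/v_p in s/m
    return max(0.0, (half_period - t_r) / (2 * delay_per_length))


if __name__ == "__main__":
    # Hypothetical line: 0.5 nH/mm, 0.2 pF/mm, 20 ps incident rise time
    for f in (10e9, 14e9):
        l_max = max_unsettled_length(f, 20e-12, 0.5e-6, 2e-10)
        print(f"f_clk = {f / 1e9:.0f} GHz: l - x < {l_max * 1e3:.2f} mm")
```

Raising `f_clk` or adding tap capacitance through `c_tap` shortens the allowed $l - x$, in line with the stated dependence on $T_{clk}$ and $C_{L}$.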

B. Jitter Amplification Simulation

The worst-case jitter amplification ratio, $A_{tot}$ , among all tapping points is plotted versus the interconnect length per stage, $l$ , in Fig. 11 [17] for $f_{clk}=10$ GHz and 14 GHz. Global buffers are sized with nfin = 4 and nf = 32, and local buffers uniformly spaced 200 $\mu \text{m}$ apart are sized with nfin = 4 and nf = 4. The clock path is not RC limited and the buffer $R_{drv} \approx Z_{0}$ . For both clock frequencies, jitter amplification is negligible ( $A_{tot} \approx 1$ ) up to a critical interconnect length, beyond which $A_{tot}$ increases dramatically.

FIGURE 11.

Worst-case jitter amplification among all tapping points versus $l$ in 16-nm FinFET with white input jitter, global buffers sized nfin = 4 and nf = 32, and local buffers spaced 200 $\mu \text{m}$ apart and sized nfin = 4 and nf = 4.


When $f_{clk}$ is low, the available settling time $T_{clk}/2$ is large compared to the input clock transition time $t_{r}$ . Thus, jitter amplification occurs only for relatively large $l$ (and correspondingly large $t_{rt}$ ). At higher $f_{clk}$ , where $T_{clk}/2$ is comparable to $t_{r}$ , a slight increase in $l$ may significantly increase the settling time. The critical interconnect length is smaller for $f_{clk} = 14$ GHz than for 10 GHz, consistent with (16).

SECTION IV.

Measurements

A 1-mm by 1-mm testchip was fabricated in 16-nm FinFET technology. It contains two 2-mm clock distribution networks consisting of 5 and 10 global buffers, respectively. Both are tapped by local buffers (nfin = 4, nf = 4) spaced 400 $\mu \text{m}$ apart, each followed by two additional fan-out-of-2 buffers. These additional buffers operate from the same supply and thus contribute to the overall jitter and power consumption. Only the local clock buffer outputs at 400 $\mu \text{m}$ , 1200 $\mu \text{m}$ , and 2000 $\mu \text{m}$ are passed to a shared MUX for measurement. Global buffers are sized with nfin = 4 and nf = 36 for the 5-buffer clock distribution, with a corresponding interconnect length per stage of $l = 400$ $\mu \text{m}$ . For the 10-buffer clock distribution, global buffers are sized with nfin = 4 and nf = 32, and $l = 200$ $\mu \text{m}$ . Both clock distributions use coplanar waveguide (CPW) transmission lines with the signal line on Metal 11 and a signal width of 0.5 $\mu \text{m}$ . The spacing between the signal and ground lines on Metal 11 is 4.5 $\mu \text{m}$ . A metal shield is placed underneath the signal line on Metal 9 to minimize electromagnetic interference. Both clock distributions operate from a 0.8 V supply; at a nominal clock frequency of 10 GHz, the measured power consumption is 4.96 mW for the 5-buffer clock distribution and 6.32 mW for the 10-buffer clock distribution. Fig. 12(a) and Fig. 12(b) show the die photograph and the layout of the global clock distribution, respectively.
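For context, the measured power figures can be converted into an effective switched capacitance $C_{\mathrm{eff}} = P/(V_{DD}^{2} f_{clk})$. This assumes, as a simplification, that all of the measured power is dynamic $CV^{2}f$ switching power (leakage and short-circuit current are ignored), and the result lumps together the global, local, and fan-out buffers and the interconnect. The helper name is illustrative.

```python
def effective_switched_capacitance(power_w, vdd, f_clk):
    """C_eff = P / (V_DD^2 * f_clk), assuming purely dynamic CV^2f power (farads)."""
    return power_w / (vdd ** 2 * f_clk)


if __name__ == "__main__":
    # Measured values quoted above: 0.8 V supply, 10 GHz clock
    for name, power in (("5-buffer", 4.96e-3), ("10-buffer", 6.32e-3)):
        c_eff = effective_switched_capacitance(power, 0.8, 10e9)
        print(f"{name}: C_eff = {c_eff * 1e12:.3f} pF")
```

Under this assumption, the 10-buffer network switches roughly 0.21 pF more effective capacitance per cycle than the 5-buffer network at the same frequency.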

FIGURE 12.

(a) Testchip die photograph and (b) layout plot.


A. PSIJ

An on-board probing method named “spy hole” was used in [22] to measure injected supply noise in a flip-chip packaged testchip. This method requires an additional pair of power and ground BGA balls to be routed to the board for probing. Its disadvantage is that only low-frequency noise can be measured because of the combination of the decoupling capacitances and the high-impedance probe. To improve the bandwidth of the spy hole in this work, one on-chip I/O pad for the noisy supply is routed to a dedicated power BGA ball, as shown in Fig. 13. This ball is not connected to the power plane in the package substrate or to the on-board decoupling capacitors, resulting in smaller decoupling capacitance and less noise filtering than in [22].

FIGURE 13.

PSIJ measurement setup with spy hole and high-frequency buffered path.


Supply noise is generated by modulating the gate voltage of noise-injection transistors. An AC-coupled buffered path allows high-frequency supply noise to be observed off-chip. The buffer has a bandwidth of 1 GHz and the gain calibration of the buffer is performed by applying sinusoidal supply noise at 100 MHz (between the lower cutoff frequency of the buffer’s AC-coupling and the spy hole bandwidth) and comparing the noise measured via the two paths.

Sinusoidal supply noise at 300 MHz is injected with an amplitude of 45.1 mV_pp for the 5-buffer clock distribution and 46.1 mV_pp for the 10-buffer clock distribution. The added PSIJ_pp at the output of the local buffers, obtained by subtracting the peak-to-peak jitter measured without supply noise injection from that measured with injection, is plotted in Fig. 14 for a nominal clock frequency of 10 GHz. The added PSIJ accumulates linearly along the buffer chain, as expected.

FIGURE 14.

PSIJ_pp measured for a 5-buffer and 10-buffer clock distribution at three different nodes.


Fig. 8(b) shows that the 10-buffer clock distribution, corresponding to the design point nf = 32 and $n = 10$ , has a simulated worst-case PSIJ_pp of 85 fs/mV, while the 5-buffer clock distribution, corresponding to the design point nf = 36 and $n = 5$ , has a simulated worst-case PSIJ_pp of 51 fs/mV. The measured worst-case PSIJ_pp values, at the output of the local buffer located 2000 $\mu \text{m}$ from the near end, are 46.1 fs/mV and 91.3 fs/mV for the 5-buffer and 10-buffer clock distributions, respectively. Both values are consistent with the simulation results, considering the secondary details of the prototype that are not captured in the simulation testbench. Furthermore, the PSIJ_pp along the entire 5-buffer clock distribution is lower than that of the 10-buffer clock distribution, which supports the proposed design guidelines.

B. RJ Generation

The simulated RJ generation in this scenario, due to intrinsic device noise, is approximately 45 fs_rms in each clock path. Measurements were performed using an input clock source with 200 fs_rms of RJ. Although a small increase in RJ is observed in both paths, there is no noticeable RJ accumulation among the tapping points along each path. Since the input clock jitter is independent and significantly larger, the observed increase in RJ comes mainly from jitter amplification along the path from the clock source to the scope. Therefore, RJ generation within the two CMOS clock paths could not be clearly observed in measurements and is unlikely to limit overall clocking performance.
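Because the on-chip RJ is independent of the source jitter, the two add in root-sum-square fashion, which helps explain why a 45 fs_rms contribution is hard to resolve on top of a 200 fs_rms source. The helper below is a minimal sketch of that combination; its name is illustrative.

```python
import math


def rss(*components):
    """Root-sum-square combination of uncorrelated rms jitter terms."""
    return math.sqrt(sum(c * c for c in components))


# Source RJ and simulated on-chip RJ from the measurement discussion, in fs_rms
total = rss(200.0, 45.0)   # -> 205.0
```

The total is only about 2.5% above the source jitter alone.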

C. Jitter Amplification

With supply noise injection disabled, the RJ_rms averaged over rising and falling edges at the 400 $\mu \text{m}$ and 2000 $\mu \text{m}$ tapping points is measured for input clock frequencies from 10 GHz to 14 GHz. No noticeable jitter amplification is observed over this range. This is expected, since the interconnect lengths per stage, $l$ , of 400 $\mu \text{m}$ and 200 $\mu \text{m}$ for the 5-buffer and 10-buffer clock distributions, respectively, are well within the limits predicted in Fig. 11.

D. Summary

Table 2 compares the performance of the 5-buffer and 10-buffer clock distributions to a CML clock distribution reported in [15] and a resonant clock distribution reported in [23].

TABLE 2 Comparison of Clock Distribution Works

SECTION V.

Conclusion

In this tutorial, we presented quantitative analyses of the major jitter sources in CMOS clock distribution: power supply induced jitter (PSIJ), random jitter (RJ) generation, and jitter amplification. PSIJ is reduced by using fewer and larger buffers with a correspondingly longer interconnect length per stage, $l$ . To minimize RJ, the number of buffers should be large enough to avoid excessive degradation of rise-fall times but small enough to avoid excessive accumulation of PSIJ and RJ. Jitter amplification is closely tied to the available settling time, $T_{clk}/2$ , and becomes significant only when $l$ exceeds a critical length, which shrinks as the clock frequency increases.

In summary, designers should strive to minimize the number of buffers along a CMOS clock distribution path, particularly when power supply noise is of concern, while maintaining the sharpest possible rise-fall times and complete settling of all clock waveforms tapped for use by high-speed sampling circuits. Minimizing the number of buffers, subject to this critical constraint, can simultaneously reduce power supply noise sensitivity and power consumption, presenting a rare “win-win” scenario for designers.
