A 20-μs Turn-On Time, 24-kHz Resolution, 1.5–100-MHz Digitally Programmable Temperature-Compensated Clock Generator

A clock generator using a fast-locking frequency-locked loop (FLL)-based RC oscillator and delta-sigma fractional dividers (FDIVs) to generate programmable temperature-insensitive output frequencies is presented. Successive approximation register (SAR) logic is used to speed up the locking of the FLL, and truncation error cancellation (TEC) is performed in the FDIVs to reduce delta-sigma-induced jitter. A prototype clock generator fabricated in a 65-nm CMOS process generates output clocks in the range of 1.5–100 MHz with a resolution of 24 kHz, a peak-to-peak period jitter of 140 ps, and an inaccuracy of 6.8 ppm/°C, and can be turned on within 20 μs.

off rapidly, severely limiting the ability to employ system-level power-reduction strategies such as power cycling.
Closed-loop frequency-locked loop (FLL)-based oscillators achieve excellent accuracy at high output frequencies by eliminating the need for a high-speed comparator [19]. The schematic of a conventional FLL-based oscillator is shown in Fig. 1.
dominates, many compensation schemes for implementing a temperature-insensitive resistor have been proposed, and excellent frequency accuracies are achieved using them [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18]. However, the FLL-based architecture suffers from two drawbacks not previously addressed. First, the bandwidth of the feedback loop is typically very low, on the order of 1 kHz, resulting in a long settling time that severely limits the efficacy of the duty-cycled operation used in low-power devices. Second, these oscillators can only provide a single fixed-frequency output, and replicating them to generate multiple outputs at digitally programmable frequencies incurs a hefty area penalty.
This article presents a fast-startup, temperature-stable digital FLL-based oscillator and low-jitter open-loop fractional dividers (FDIVs) that can provide programmable clock outputs. Fabricated in a 65-nm CMOS process, the prototype timing device can generate clocks from about 1.5 to 100 MHz with a frequency inaccuracy of 6.8 ppm/°C and a resolution of 24 kHz.
The rest of the article is organized as follows. Section II presents the proposed architecture. Section III shows the details of the FDIV. Circuit implementation of key building blocks is described in Section IV. Experimental results from the prototype chips are presented in Section V. Key contributions of this article are summarized in Section VI.

II. PROPOSED ARCHITECTURE
The simplified block diagram of the proposed clock generator is shown in Fig. 2 [20]. It comprises a digital FLL-based fast-startup temperature-compensated oscillator (TCO), a low-jitter FDIV, and a digital PLL used for calibrating the FDIV. We first describe the TCO; the FDIV is described in Section III.
The TCO employs the RC network depicted in Fig. 3(a), consisting of resistors R_N and R_P and a capacitor C, connected between the supply rail, V_DD, and ground. To understand how this RC network can be used to build an oscillator with excellent intrinsic temperature stability, consider the RC network's step responses at two temperatures, T_1 °C and T_2 °C, illustrated in Fig. 3(b). When the input Φ_RC is high (Φ_RC = 1), switch S_R is closed, resetting the capacitor to ground and the output to a reset voltage, V_RST. Due to the temperature dependence of the resistors, the output voltage is reset to V_RST1 and V_RST2 at the two temperatures, respectively. When a negative step is applied (Φ_RC = 1 → 0), S_R opens, causing the RC network's output voltage, V_RC, to approach V_DD exponentially from V_RST. The time taken by the output voltage to reach a voltage V_RC0, denoted by T_RC, follows from this exponential step response. Note that the temperature dependence of the resistors makes T_RC also a function of temperature, resulting, in general, in two different charging times, T_RC1 and T_RC2, at T_1 °C and T_2 °C, as shown in Fig. 3(b). We assume that the TC of the capacitor is negligible, a reasonable assumption in practice, and treat the capacitance as a constant when calculating T_RC.
Interestingly, by appropriately choosing the resistors and the capacitor in the RC network, V_RC can be made to reach V_ZTC after T_ZTC seconds, independent of temperature [1]. In other words, T_RC evaluated at V_RC0 = V_ZTC equals T_ZTC at both temperatures. This is illustrated in Fig. 3(b), where the step responses at the two temperatures, T_1 °C and T_2 °C, intersect at coordinates (T_ZTC, V_ZTC). ZTC stands for zero-TC and captures this temperature-independent behavior (at least to the first order). In our 65-nm CMOS process, choosing C = 9 pF, R_P (unsalicided P+ diffusion resistor) = 80 kΩ, and R_N (unsalicided N+ poly resistor) = 55 kΩ resulted in T_ZTC = 1.06 μs. Having identified a stable time base in the form of T_ZTC, we now describe how to build a temperature-insensitive frequency reference using it.
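The zero-TC crossing can be written out as a short worked derivation. This is only a sketch under the assumption that the network behaves as a first-order system; the symbol τ(T) for its temperature-dependent time constant, and τ_1 and τ_2 for its values at the two temperatures, are introduced here for illustration and do not come from the original equations.

```latex
% Assumed first-order step response after S_R opens (\tau is an illustrative symbol)
V_{RC}(t, T) = V_{DD} - \bigl(V_{DD} - V_{RST}(T)\bigr)\, e^{-t/\tau(T)}

% Time needed to reach V_{RC0}, obtained by solving the response above for t
T_{RC}(T) = \tau(T)\, \ln\frac{V_{DD} - V_{RST}(T)}{V_{DD} - V_{RC0}}

% Intersection of the responses at the two temperatures (subscripts 1 and 2)
T_{ZTC} = \frac{\ln\dfrac{V_{DD} - V_{RST1}}{V_{DD} - V_{RST2}}}{\dfrac{1}{\tau_1} - \dfrac{1}{\tau_2}},
\qquad
V_{ZTC} = V_{DD} - \bigl(V_{DD} - V_{RST1}\bigr)\, e^{-T_{ZTC}/\tau_1}
```

Under this assumption, a positive T_ZTC exists only when the response with the larger time constant also starts from the higher reset voltage, which is the balance the choice of resistors has to strike.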
The schematic of the digital FLL that generates a stable frequency reference by locking the period of a ring oscillator to T_ZTC is shown in Fig. 4. It consists of the RC network, a clocked comparator, a digital accumulator, a digitally controlled ring oscillator (DCRO), a frequency divider, and a phase generator. The frequency of the DCRO's output, CK_OUT, is divided by N_DIV, and three clock phases, Φ_RC, Φ_COMP, and Φ_ACC, are generated from it such that the positive edge of Φ_COMP is separated from the negative edge of Φ_RC by N_DIV/2 output clock periods (T_OUT · N_DIV/2). The factor of 2 in the expression appears because only half the divided clock period is allocated for RC charging.
As described earlier, when Φ_RC transitions from high to low, the RC network's output voltage, V_RC, starts settling toward V_DD. The clocked comparator samples V_RC on the rising edge of Φ_COMP and compares it with a reference voltage V_REF that is set equal to V_ZTC. If the sampled voltage is less than V_ZTC, the comparator outputs a logic one, indicating that Φ_COMP arrived early, as depicted in Fig. 5. In other words, the DCRO's period (T_OUT) is smaller than the target period, so the DCRO is running fast. Similarly, if the comparator outputs a logic-low signal, the DCRO is running slow. The comparator output is integrated by a digital accumulator and used to tune the DCRO's frequency. In the steady state, the average of the comparator output is zero, and the input voltage sampled by the comparator is V_ZTC, so that T_OUT · N_DIV/2 = T_ZTC. The output period (T_OUT) and frequency (F_OUT) are therefore

$$T_{OUT} = \frac{2\,T_{ZTC}}{N_{DIV}}, \qquad F_{OUT} = \frac{N_{DIV}}{2\,T_{ZTC}}.$$

Thus, by making T_ZTC independent of temperature as described earlier, F_OUT becomes insensitive to temperature variations. However, in practice, the comparator's offset voltage severely degrades the TCO's stability. Denoting the input-referred offset voltage of the comparator by V_OS, the comparator effectively compares V_RC against a reference shifted by V_OS, so both T_ZTC and its temperature sensitivity become functions of V_OS. Because V_OS has a strong temperature dependence, T_ZTC and consequently F_OUT also become highly sensitive to temperature. We address this drawback by compensating the comparator's offset.

A. Offset-Compensated Comparator
The schematic of the offset-compensated comparator (OCC) is shown in Fig. 6. It is composed of a preamplifier, a strong-arm latch-based flip-flop (SAFF), and a pair of offset-storage capacitors (C_OS) connected to the output of the preamplifier. The SR latch is disconnected from the SA latch during the SA latch's amplification and regeneration phases to prevent kickback. The SAFF's offset is suppressed by a coarse offset cancellation scheme in which its inputs (V_IP, V_IN) are shorted together, the circuit is clocked, and the decisions are accumulated in a register and used to control the amount of capacitance connected to the drains of the input NMOS pair [21]. The unequal capacitance added at the drain nodes compensates for the SAFF offset. Thirty-one thermometer-coded unit capacitors (1.4 fF each) were added on each side to achieve an offset cancellation range of ±40 mV and a resolution of 1.3 mV. The preamplifier's gain (A_PRE) further attenuates the SAFF's offset, but the preamplifier itself introduces offset, which becomes the dominant source of the comparator offset. So, the output offset cancellation scheme depicted in Fig. 6(a) is used to mitigate it [22]. The operation of the OCC is controlled by two nonoverlapping clock phases, Φ_A and Φ_B. When Φ_A is high and Φ_B is low, the input voltages V_REF and V_RC are tracked onto the sampling capacitors (C_S), the preamplifier inputs are shorted to V_REF, and the amplified offset (A_PRE · V_OS) is stored on C_OS. When Φ_A and Φ_B are both low, C_S holds the sampled V_RC, and C_OS holds the amplified offset voltage of the preamplifier. As soon as Φ_B goes high, the preamplifier amplifies the difference between the sampled voltages. At the rising edge of Φ_LCH, the comparator produces an output based on the amplified input voltage difference minus the stored preamplifier offset voltage. The timing diagram can be found in Fig. 13.
The simulated preamplifier gain is 45 V/V, and the sampling and offset-storage capacitors were chosen to be 990 fF and 588 fF, respectively. Monte Carlo simulations (200 iterations) were performed to quantify the impact of the comparator offset on frequency error and the effectiveness of the offset-cancellation scheme in mitigating it. The results depicted in Fig. 7 show that, without offset compensation, the mean and standard deviation of the peak-to-peak frequency variation (F_OUT,P-P) over a temperature range of −40 °C to 80 °C are 23,518 ppm and 10,910 ppm, respectively. The preamplifier's offset compensation reduces the mean variation by about 20 times to 1222 ppm, and the standard deviation improves to 1044 ppm. The coarse cancellation of the SAFF's offset further reduces the mean and standard deviation to 510 ppm and 760 ppm, respectively.

B. Settling Time Improvement
Like conventional TCOs, the proposed TCO suffers from a long settling time. The main reason for this drawback is the combination of a low update rate and a small update step size. As depicted in Fig. 8, the output frequency starts from the DCRO's free-running frequency F_FR and slews toward the desired output frequency F_DES in steps of the DCRO's LSB (8.6 kHz in the prototype), resulting in a very long settling time. The locking process can be sped up by increasing the update step size, but this increases the dithering jitter because the comparator's nonlinearity causes the TCO's steady state to be a bounded limit cycle. To overcome this issue, we employ the successive approximation register (SAR) algorithm [23] to quickly bring the output frequency to the desired frequency (see Fig. 8). After the initial warm-up phase, the MSB of the K-bit SAR register is set to 1, placing the DCRO frequency in the middle of its tuning range. In the next update cycle, the second-most significant bit is set to 1, and the MSB is set based on the result of the first comparison. This binary search continues until all K bits in the SAR are tested, which takes only K updates compared to at most 2^K updates in the conventional case. At the end of the SAR search, control is handed back to the accumulator mode of operation.
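The difference between LSB-by-LSB slewing and the SAR search can be illustrated with a small behavioral model. This is a minimal sketch, not the prototype's logic: the linear, monotonic tuning curve, the starting code of zero, `F_MIN_HZ`, and the choice K = 12 (borrowed from the 12-bit DCRO control word) are assumptions made for illustration; only the 8.6-kHz LSB is taken from the text.

```python
# Behavioral sketch: F_MIN_HZ and the linear tuning curve are hypothetical.
K_BITS = 12               # assumed SAR width (12-bit DCRO control word)
LSB_HZ = 8_600            # DCRO gain reported for the prototype
F_MIN_HZ = 100_000_000    # hypothetical frequency at code 0


def dcro_freq(code):
    """Idealized DCRO: frequency rises linearly with the control code."""
    return F_MIN_HZ + code * LSB_HZ


def runs_fast(code, target_hz):
    """Comparator decision: True when the DCRO is faster than the target."""
    return dcro_freq(code) > target_hz


def sar_search(target_hz):
    """Binary search from the MSB down; one loop update per bit."""
    code, updates = 0, 0
    for bit in reversed(range(K_BITS)):
        trial = code | (1 << bit)
        updates += 1
        if not runs_fast(trial, target_hz):
            code = trial  # keep the bit when the trial frequency is not too high
    return code, updates


def linear_search(target_hz):
    """Accumulator-style slewing: one LSB per update until the decision flips."""
    code, updates = 0, 0
    max_code = (1 << K_BITS) - 1
    while code < max_code and not runs_fast(code, target_hz):
        code += 1
        updates += 1
    return code, updates


target = dcro_freq(3000) + LSB_HZ // 2
sar_code, sar_updates = sar_search(target)
lin_code, lin_updates = linear_search(target)
print(sar_code, sar_updates, lin_code, lin_updates)
```

In this model the two searches end within one LSB of each other, but the SAR search needs K updates while the slewing search needs a number of updates proportional to the distance from the starting code.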

III. FRACTIONAL DIVIDER
The FDIV schematic is shown in Fig. 9. It comprises an edge combiner (EC), a multimodulus divider (MMD) controlled by a ΔΣ modulator, a digital-to-time converter (DTC), and a digital PLL for calibrating the DTC. A 17-bit digital control word (DCW) sets the division ratio. Of the 17 bits, 7 bits (D_INT) set the integer division ratio, and the remaining 10 bits (D_FRAC) set the fractional division ratio. D_FRAC is truncated to 1 bit using a first-order ΔΣ modulator, and the resulting output is added to D_INT and used to control the MMD [24]. The shaped truncation error of the ΔΣ modulator appears as jitter in the FDIV's output clock. Note that in a typical fractional-N PLL, where the FDIV is commonly used, the loop filter suppresses the truncation error and significantly reduces its impact on the PLL's output jitter. However, in the absence of such filtering, as in an open-loop FDIV, the truncation-error-induced jitter is prohibitively large (about one input clock period, T_IN), severely limiting the FDIV's usability.
We employ two mitigation techniques to reduce the jitter. First, the FDIV's input clock frequency is increased by three times, thereby reducing the ΔΣ modulator-induced jitter by a factor of 3 to about T_IN/3. To this end, the outputs of the three inverters in the DCRO (Φ_1, Φ_2, and Φ_3) are edge-combined to generate a clock, CK_3X, at three times the DCRO output frequency, as shown in Fig. 10. This frequency multiplication technique is susceptible to mismatches between the three inverters because the phase-spacing errors they cause appear directly as jitter at the FDIV output. The jitter performance can be improved by upsizing the logic gates used for edge combining and by minimizing the routing mismatches in the layout. Note that the ΔΣ modulator is clocked at the FDIV's output frequency, so the proposed frequency multiplication only incurs a minor power penalty in the first couple of stages of the MMD due to the increased input frequency.
The EC reduces the aforementioned ΔΣ modulator-induced output jitter to T_IN/3, but this is still prohibitively large in most applications. So, a second technique, truncation error cancellation (TEC), is employed to reduce the jitter further [25] (see Fig. 9). The TEC uses a DTC at the output of the MMD to add the right amount of delay and cancel the jitter caused by the ΔΣ modulator's truncation error. To this end, the truncation error is extracted by taking the difference between the ΔΣ modulator's input and output and is then accumulated, to account for the implicit integration due to frequency-to-phase conversion in the MMD, before being used to control the DTC [25]. For this approach to be practical, the DTC gain must be tuned precisely such that the range of the DTC equals one input clock period, a requirement that is difficult to guarantee in practice due to the DTC's sensitivity to process, voltage, and temperature (PVT) variations. Therefore, the DTC gain is calibrated using a least-mean-square (LMS) algorithm running in the background [25]. However, unlike in [25], the error signal (ERR) needed for LMS adaptation is unavailable in the open-loop FDIV scenario. So, a separate digital PLL, implemented using a bang-bang phase detector (PD), a digital loop filter (DLF), and a digitally controlled oscillator (DCO), is used to generate ERR, as depicted in Fig. 9. The LMS loop correlates the accumulated truncation error (D_DTC) with the ERR signal and generates the DTC gain calibration code, D_GC.
The DTC's schematic and gain-controlling circuitry are shown in Fig. 11. The DTC is implemented using a current-starved inverter. Its gain is tuned by starving the inverter of charging current using voltage V_GC, generated from D_GC using a digital-to-analog converter (DAC). The DTC delay is tuned by controlling the inverter's load capacitance with D_DTC. The digital PLL was designed to occupy a small area and to turn on rapidly.
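The truncation of D_FRAC to a 1-bit stream can be sketched with an overflowing accumulator, one common way to realize a first-order delta-sigma modulator. The accumulator form and the function name `first_order_dsm` are assumptions made for illustration; the text only states that a first-order modulator is used. The example values D_INT = 4 and D_FRAC = 10 match the setting used in the measurements.

```python
def first_order_dsm(d_frac, n_bits, n_cycles):
    """Truncate an n_bits fraction to a 1-bit stream with an overflowing accumulator."""
    mask = (1 << n_bits) - 1
    acc, bits = 0, []
    for _ in range(n_cycles):
        acc += d_frac
        bits.append(acc >> n_bits)  # the carry-out is the 1-bit output
        acc &= mask                 # the residue carries the error forward
    return bits


D_INT, D_FRAC, FRAC_BITS = 4, 10, 10
bits = first_order_dsm(D_FRAC, FRAC_BITS, 1 << FRAC_BITS)
ratios = [D_INT + b for b in bits]  # instantaneous MMD division ratios
average_ratio = sum(ratios) / len(ratios)
print(sorted(set(ratios)), average_ratio)
```

The divider only ever divides by D_INT or D_INT + 1, yet the average ratio equals D_INT + D_FRAC/2^10. The edge-by-edge deviation from that average is the truncation-error-induced jitter discussed above.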
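The effect of the DTC on period jitter, and the reason its gain must be calibrated, can be seen in an idealized edge-time model. This is a sketch under stated assumptions: the modulator is modeled as an overflowing accumulator, the DTC as an ideal delay proportional to the accumulator residue, and `dtc_gain` is a hypothetical normalized gain (1 means the DTC range equals exactly one input period). Noise, mismatch, and the LMS loop itself are not modeled.

```python
from fractions import Fraction


def fdiv_edges(d_int, d_frac, n_bits, n_edges, dtc_gain=None):
    """Output edge times, in units of the input clock period T_IN."""
    mask = (1 << n_bits) - 1
    acc, t, edges = 0, Fraction(0), []
    for _ in range(n_edges):
        acc += d_frac
        carry = acc >> n_bits
        acc &= mask
        t += d_int + carry  # MMD counts D_INT or D_INT + 1 input periods
        # The MMD edge is early by acc / 2**n_bits input periods; the DTC
        # delays it by the same amount, scaled by its (possibly wrong) gain.
        delay = Fraction(0) if dtc_gain is None else dtc_gain * Fraction(acc, mask + 1)
        edges.append(t + delay)
    return edges


def pp_period_jitter(edges):
    """Peak-to-peak spread of the output periods."""
    periods = [b - a for a, b in zip(edges, edges[1:])]
    return max(periods) - min(periods)


T_IN_PS = 2500  # one period of the 400-MHz FDIV input clock
jitter_off = pp_period_jitter(fdiv_edges(4, 10, 10, 2048))
jitter_ideal = pp_period_jitter(fdiv_edges(4, 10, 10, 2048, Fraction(1)))
jitter_10pct = pp_period_jitter(fdiv_edges(4, 10, 10, 2048, Fraction(9, 10)))
print(float(jitter_off * T_IN_PS), float(jitter_ideal * T_IN_PS), float(jitter_10pct * T_IN_PS))
```

In this idealized model the residual peak-to-peak jitter scales with the DTC gain error, which is why an uncalibrated gain that drifts with PVT leaves a visible fraction of T_IN at the output.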

IV. CIRCUIT IMPLEMENTATION
The detailed schematic of the proposed clock generator is shown in Fig. 12, and its timing diagram is shown in Fig. 13. The phase generator produces all the clocks shown in the figure from the divided feedback clock. On the falling edge of Φ_RC, the RC network's output voltage V_RC starts charging exponentially toward V_DD, and the voltage across the sampling capacitor C_S tracks V_RC until the falling edge of Φ_A. During this phase, the preamplifier's offset voltage is stored on capacitors C_OS. The offset cancellation of the SAFF is also carried out during this period. On the rising edge of Φ_B, the voltage held on C_S is amplified by the preamplifier and sampled at the subsequent rising edge of Φ_LCH. The SAR logic and accumulator use the comparator's decision to update the DCRO control word D_SAR. The frequency of the DCRO needs to settle before the next RC charging starts, a requirement easily met in practice.
The circuit implementation of some critical blocks in the loop is described in Sections IV-A through IV-C.

A. RC Network
In the prototype manufactured in a 65-nm CMOS process, the RC network is implemented using C = 9 pF, R_P (unsalicided P+ diffusion resistor) = 80 kΩ, and R_N (unsalicided N+ poly resistor) = 55 kΩ. The design considerations for choosing R_P, R_N, and C are as follows. The temperature dependence of the resistors is described by

$$R_P = R_{P,0}\left(1 + \Delta T \cdot tc_{1,P} + \Delta T^2 \cdot tc_{2,P}\right), \qquad R_N = R_{N,0}\left(1 + \Delta T \cdot tc_{1,N} + \Delta T^2 \cdot tc_{2,N}\right),$$

where ΔT is the deviation from the temperature at which the nominal resistances R_P,0 and R_N,0 are defined. From these expressions and the expression of the reference time constant shown in (1), it is easy to show that scaling the nominal resistances R_N,0 and R_P,0 by a factor K scales T_RC by the same factor. Therefore, the normalized spread of T_RC across temperature is independent of the scaling of the nominal resistance. In other words, when an oscillator is locked to the time constant T_RC, the temperature stability of its frequency is determined only by the TCs and the ratio between R_N,0 and R_P,0.
The design procedure used to size the RC network is described next. For a given set of TCs, the ratio between R_N,0 and R_P,0 is swept, and the temperature stability of T_RC is calculated. At each setting, the reference voltage V_RC0 is set to V_ZTC, which is found by sweeping V_RC0 and finding the value that minimizes the temperature dependence of T_RC. After fixing the resistor ratio that achieves a T_RC with minimum temperature dependence, the capacitance and absolute resistance are chosen. Note that T_RC also scales linearly with the capacitance, so the capacitor does not affect the frequency stability. However, its value should be much larger than the sampling capacitor connected to the output of the RC network so that the first-order settling response assumption is valid. To satisfy this condition, a 9-pF capacitor was selected. The resistance is then chosen to achieve a T_RC of around 1 μs to provide ample time for operations such as voltage amplification, SA latch decision, DCRO control word generation, and DCRO settling.
Global resistor variation across corners does not affect the temperature stability because it does not change the resistance ratio. Local resistor variation changes the ratio and therefore degrades the temperature stability. However, according to simulation, even with a 20% skew in resistance, the peak-to-peak inaccuracy of the reference time constant across temperature is limited to 200 ppm.
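The inner step of this procedure, sweeping V_RC0 to locate V_ZTC, can be sketched numerically. The sketch assumes a first-order step response and uses hypothetical reset voltages and time constants at a cold and a hot temperature; `V_DD`, `COLD`, `HOT`, and the function names are illustrative and are not values from the prototype.

```python
import math

V_DD = 1.0  # supply voltage in volts (hypothetical)

# Hypothetical (reset voltage [V], time constant [s]) pairs at two temperatures.
COLD = (0.40, 1.20e-6)
HOT = (0.42, 1.24e-6)


def t_rc(v_rc0, v_rst, tau):
    """Time for a first-order response starting at v_rst to reach v_rc0."""
    return tau * math.log((V_DD - v_rst) / (V_DD - v_rc0))


def find_v_ztc(cold, hot, steps=20000):
    """Sweep V_RC0 and return the value with the smallest T_RC mismatch."""
    v_lo = max(cold[0], hot[0])  # the threshold must lie above both reset voltages
    best_v, best_err = None, float("inf")
    for i in range(1, steps):
        v = v_lo + (V_DD - v_lo) * i / steps
        err = abs(t_rc(v, *cold) - t_rc(v, *hot))
        if err < best_err:
            best_v, best_err = v, err
    return best_v


v_ztc = find_v_ztc(COLD, HOT)
t_ztc = t_rc(v_ztc, *COLD)
print(f"V_ZTC = {v_ztc:.4f} V, T_ZTC = {t_ztc * 1e6:.3f} us")
```

A full design sweep would wrap this search in an outer loop over the resistor ratio and evaluate the residual spread of T_RC over the whole temperature range rather than at two points.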

B. Reference Voltage Generation
The reference voltage of the comparator, V_REF, is generated using a 15-bit second-order ΔΣ DAC. The ΔΣ modulator truncates the 15-bit digital input D_REF to 1 bit, which is converted to a voltage using a simple 1-bit resistor DAC and filtered using a second-order RC low-pass filter. A two-point trimming method is used to determine V_ZTC in the form of the digital control word (DCW) D_ZTC. D_REF was swept at −40 °C and 85 °C, the corresponding output frequencies were recorded, and D_ZTC was found at the intersection point of the two frequency datasets, as shown in Fig. 14.
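The intersection step of the two-point trim can be sketched as a search for the sign change between the two recorded curves, followed by linear interpolation. The sweep data below are synthetic and purely illustrative, and `find_d_ztc` is a hypothetical helper name; the measured curves in Fig. 14 need not be linear.

```python
def find_d_ztc(codes, f_cold, f_hot):
    """Return the D_REF code at which the two frequency sweeps intersect."""
    diff = [c - h for c, h in zip(f_cold, f_hot)]
    for i in range(len(codes) - 1):
        if diff[i] == 0:
            return float(codes[i])
        if diff[i] * diff[i + 1] < 0:
            # Linear interpolation between the two codes that bracket the crossing.
            frac = diff[i] / (diff[i] - diff[i + 1])
            return codes[i] + frac * (codes[i + 1] - codes[i])
    if diff[-1] == 0:
        return float(codes[-1])
    raise ValueError("the two sweeps do not intersect in the swept range")


# Synthetic sweeps in Hz (hypothetical): both fall with code and cross at 16530.
codes = list(range(16000, 17001, 100))
f_cold = [133_000_000 - 400 * (d - 16530) for d in codes]
f_hot = [133_000_000 - 300 * (d - 16530) for d in codes]

d_ztc = find_d_ztc(codes, f_cold, f_hot)
print(round(d_ztc))
```

The interpolated value would then be rounded to the nearest code before being stored as D_ZTC.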

C. Digitally Controlled Ring Oscillator
The DCRO schematic is shown in Fig. 15. It consists of a DAC and a three-stage CMOS inverter-based voltage-controlled ring oscillator (VCRO). The five most significant bits of the 12-bit input (D_SAR) are thermometer-coded and used to control 31 unit-cell current sources. The remaining seven least significant bits are truncated to 1 bit using a ΔΣ modulator, and the resulting output controls one unit cell. The current from the 32 unit cells is summed and filtered using a first-order RC filter in the current mirror formed by devices M_1 and M_2. The filtered current is converted to the control voltage, V_CTRL, using resistor R_DAC. The RO frequency is tuned with V_CTRL by varying the inverter's load capacitance, which is implemented using an MOS varactor. The simulated DCRO tuning range and gain are 37 MHz and 8.6 kHz/LSB, respectively.
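The split of the 12-bit word between the thermometer-coded cells and the dithered cell can be sketched as follows. The modulator order for the seven LSBs is not stated in the text, so a first-order accumulator is assumed here; the function names are illustrative.

```python
def split_dcro_code(d_sar):
    """Split a 12-bit code into a 31-cell thermometer word and 7 fractional bits."""
    msb = d_sar >> 7    # upper 5 bits: number of always-on unit cells (0..31)
    lsb = d_sar & 0x7F  # lower 7 bits: fraction of one more unit cell
    thermometer = [1] * msb + [0] * (31 - msb)
    return thermometer, lsb


def active_cells(d_sar, n_cycles):
    """Number of enabled unit cells per cycle, with the 32nd cell dithered."""
    thermometer, lsb = split_dcro_code(d_sar)
    acc, counts = 0, []
    for _ in range(n_cycles):
        acc += lsb
        dither_bit = acc >> 7  # assumed first-order modulator carry-out
        acc &= 0x7F
        counts.append(sum(thermometer) + dither_bit)
    return counts


counts = active_cells(3000, 128)
print(min(counts), max(counts), sum(counts) / 128)
```

Averaged over 128 cycles, the number of active cells equals D_SAR/128, so the low-pass filter in the current mirror recovers the full 12-bit resolution from 32 unit cells.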

V. MEASUREMENT RESULTS
A prototype clock generator was fabricated in a 65-nm CMOS process and packaged in a plastic QFN package. The die micrograph is shown in Fig. 16. The active area is 0.15 mm². The total power consumption of the TCO is 547 μW, and its breakdown is shown in Fig. 17(a). The clock drivers consume 64% of the power; they are implemented using inverters operating at 1.8 V to drive the high-voltage (low-leakage) switches used in the RC branch and the OCC. The driver power can be significantly reduced by better transistor sizing in the driver. Transistor-level simulations indicate that 99% of the clock-driver power is consumed by the dc current in the first few stages of the inverter chain, which operates in the 1.8-V domain with an input clock in the 1-V domain. Halving the size of the first inverter halves the power at the expense of an increase of about 30 ps in the driver output's delay. Since the drivers are used for the sampling switches in both the RC branch and the OCC, the extra 30-ps delay is common to Φ_RC, Φ_A, and Φ_B, resulting in a negligible change in the output frequency and its temperature stability. The power consumption of the FDIV, shown in Fig. 17(b), is 464 μW at a 100-MHz output frequency, of which the DTC consumes 244 μW and the digital blocks (MMD, ΔΣ modulators, and TEC logic) consume the remainder. The power of the FDIV reported in [20] is inaccurate because it included the power consumed by test circuitry used for monitoring purposes. The PLL consumes 1.2 mW, but duty cycling will reduce its contribution to the total power consumption in the mission mode.
Fig. 18 shows the measured temperature stability of ten TCO samples. Each sample is trimmed at two temperatures (−40 °C and 85 °C), and the digital code D_REF that generates the optimal V_ZTC is written into an on-chip register. The worst-case inaccuracy is 6.8 ppm/°C, illustrating the effectiveness of the proposed techniques in mitigating temperature sensitivities.
The measured settling behavior during a power-on event, shown in Fig. 19, indicates that the output frequency settles in less than 20 μs, compared to 3000 μs when the SAR logic is disabled. The initial 7.5 μs is allocated for the DCRO and other circuits to warm up, and the SAR search takes 12.5 μs.
The TCO's peak-to-peak and rms period jitter at a 133-MHz output frequency are 28 ps and 3.5 ps, respectively (see Fig. 20). Fig. 21 shows that the measured Allan deviation for a 1-s stride is 4 ppm. The performance of the FDIV is presented next.
The measured FDIV output frequency versus the integer division control word (D_INT), shown in Fig. 22, indicates that an output frequency range of 1.5–100 MHz is achieved. A sweep of the fractional division control word (D_FRAC) shows that the worst-case resolution is 24 kHz (see the inset of Fig. 22). The peak-to-peak period jitter when D_INT = 4 and D_FRAC = 10 (F_OUT ≈ 100 MHz) is 2500 ps when TEC is disabled. This large jitter is expected and equals one period of the 400-MHz FDIV input clock, as shown in Fig. 23. When TEC is turned on, the peak-to-peak period jitter reduces to 140 ps across the entire temperature range.
When the DTC calibration code is fixed at its room-temperature value, the jitter degrades to 470 ps at −40 °C, indicating the need for background calibration (see Fig. 24). The period jitter along the signal path is illustrated in Fig. 25. The peak-to-peak period jitter increases from 28.7 ps at the TCO output to 140 ps at the FDIV output. The degradation is caused by both the deterministic jitter and the random jitter introduced along the path. The deterministic jitter is mainly caused by the systematic and random mismatch between the three phases in the ring oscillator and the EC. In the prototype, the three inverter stages are placed sequentially, as in the placement shown in Fig. 15. Therefore, the three inverter stages have different surroundings, and the routings between the three stages are unequal, causing a systematic mismatch. The mismatch caused by the random variation of the device parameters can be significant as well. The minimum-sized gates in the EC of the prototype can suffer from substantial mismatches, which can be reduced by upsizing the transistors. Apart from the deterministic jitter, the random jitter introduced by the DTC's thermal noise is another major source of jitter degradation, which can be significantly reduced by increasing the charging current and proportionally upsizing the capacitor.
The performance of the TCO is summarized and compared to state-of-the-art RC oscillators in Table I. The proposed TCO's temperature stability is comparable to the state of the art. In addition, it achieves fast startup and provides digitally programmable output frequencies, two attributes not present in any of the reported RC-oscillator-based clock generators.
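The reported numbers can be cross-checked with a short calculation. It assumes that the division ratio is N = D_INT + D_FRAC/2^10, as implied by the 7-bit integer and 10-bit fractional control words, and that the FDIV input clock is exactly 400 MHz.

```latex
% Output frequency at D_INT = 4, D_FRAC = 10 (assumed ratio N = D_INT + D_FRAC / 2^10)
F_{OUT} = \frac{F_{IN}}{D_{INT} + D_{FRAC}/2^{10}}
        = \frac{400\ \text{MHz}}{4 + 10/1024} \approx 99.76\ \text{MHz}

% Frequency step per LSB of D_FRAC, largest at the smallest ratio (N \approx 4)
\left|\Delta F_{OUT}\right| \approx \frac{F_{IN}}{2^{10}\, N^{2}}
        = \frac{400\ \text{MHz}}{1024 \times 16} \approx 24.4\ \text{kHz}

% Input clock period, which bounds the jitter with TEC disabled
T_{IN} = \frac{1}{400\ \text{MHz}} = 2500\ \text{ps}
```

Under these assumptions, the step size near 100 MHz is consistent with the measured worst-case resolution of 24 kHz, and T_IN matches the 2500-ps jitter observed with TEC disabled.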

VI. CONCLUSION
A new clock generator capable of providing multiple temperature-compensated outputs with digitally programmable frequencies is presented. It locks the period of a ring oscillator to the temperature-insensitive time constant of an RC network and generates a temperature-stable output. The settling time of this locking loop is significantly reduced, without sacrificing jitter performance, by using a binary search algorithm implemented with digital SAR logic. Multiple low-jitter outputs are generated using FDIVs in which the delta-sigma-induced jitter is reduced by canceling the modulator's truncation error with a calibrated DTC placed at the output of the divider. A prototype clock generator fabricated in a 65-nm CMOS process generates output clocks in the range of 1.5–100 MHz with a resolution of 24 kHz, a peak-to-peak period jitter of 140 ps, and an inaccuracy of 6.8 ppm/°C, and can be turned on within 20 μs, making it an attractive alternative for clock generators in low-power microcontrollers.