Ultra-Low Power 32kHz Crystal Oscillators: Fundamentals and Design Techniques

One of the challenges to the proliferation of the Internet of Things (IoT) is ultra-low power circuit design. Wireless nodes common in IoT applications use sleep timers to synchronize with each other and enable heavy duty cycling of power-hungry communication blocks to reduce average power. 32kHz crystal oscillators remain the most popular choice for sleep timers thanks to their frequency stability, simplicity, and low cost. Because sleep timers must be always on, their power consumption must be low compared to the average power of wireless nodes. Meanwhile, 32kHz crystal oscillators must operate reliably under process, voltage, and temperature variations and exhibit good long-term stability, which makes circuit design challenging given their ultra-low power operation. This paper reviews the state of the art in ultra-low power 32kHz crystal oscillators. Fundamentals of crystal oscillators are introduced and analyzed from the perspective of power and frequency stability. Based on these fundamentals and analyses, existing design techniques of 32kHz crystal oscillators are discussed, highlighting the evolution of architectures in ultra-low power 32kHz crystal oscillators. Finally, research directions related to 32kHz crystal oscillators are introduced.


I. INTRODUCTION
THE ACCURACY of synchronization in time plays a vital role in modern society, for example in the global positioning system (GPS), manufacturing, and stock markets. One second is currently defined as being equal to the time duration of 9192631770 periods of the radiation corresponding to the transition between the two hyperfine levels of the fundamental unperturbed ground state of the caesium-133 atom [1]. Before this "atomic" second, sundials, water clocks, mechanical clocks, quartz crystal oscillator clocks, and ephemeris time (the orbital motion of the Earth around the Sun) were used for timekeeping or as frequency references. From sticks in the ground to atomic clocks, the fundamental elements in clocks or frequency references do not change: 1) a frequency-defining mechanism; 2) a readout interface; and 3) an energy source and driver to sustain the frequency-defining operation (see Fig. 1). While clocks are often studied with the aim of improving their accuracy, demand for low-cost and portable clocks has also fueled efforts to reduce their dimensions and power demands. One critical starting point in creating low-cost and portable clocks is the invention of the quartz crystal oscillator by Walter Guyton Cady in 1921 [2]. A crystal oscillator (XO) is defined as an electric oscillator circuit that uses the mechanical resonance of a vibrating crystal of piezoelectric material to generate an electrical signal of a constant frequency [3], [4]. Referring to Fig. 1, in an XO, the mechanical resonance of a vibrating quartz crystal serves as the frequency-defining mechanism, and electrical circuits provide energy to sustain the mechanical resonance and generate an electrical signal as the output clock. The properties of a quartz crystal that make it highly suitable for frequency references are its great mechanical and chemical stability, which assure a precise and reliable resonance frequency.
In addition, the quartz crystal exhibits an extremely high quality factor (Q) on the order of tens of thousands thanks to its extremely low elastic hysteresis. This means it only dissipates a very small fraction of its stored energy in each period of vibration [5], [6]. However, it is the piezoelectric effect and its converse effect that really make XOs stand out as ideal candidates for low-cost and low-power frequency references. Thanks to these two effects, quartz crystals can directly interface with electric circuits that benefit from decades of process scaling (i.e., Moore's law). The first quartz wristwatch was presented in 1967 [7]; it arrived 46 years after the invention of the quartz crystal oscillator largely because it required integrated circuits of small dimensions and low power consumption, which were not previously feasible. 32kHz crystals were chosen for quartz watches in the early 1970s because of their compatibility with existing electronic circuits, small dimensions, and low power consumption [8].
Since then, 32kHz XOs have remained the most popular choice for low-cost and low-power real-time clocks. The emerging Internet of Things (IoT) relies on wireless nodes that incorporate sensing, computing, and communication capabilities. Because the available batteries and harvested energy are limited due to small form factors, power-hungry communication blocks (RX/TX) in these wireless nodes must be heavily duty cycled and synchronized to reduce average power while achieving reliable data transmission and reception, as shown in Fig. 2 [9]. This duty cycling and synchronization scheme is enabled by an always-on sleep timer in each wireless node, and 32kHz XOs are a natural choice for these sleep timers owing to the properties discussed above. The average power of a wireless node can range from <10nW to µW, which may be much lower than the power consumption of a conventional wristwatch (<10µW). Because the 32kHz XO in the sleep timer must be continuously operating, its power must be minimized so that it does not dominate the average power. Though the crystal is stable compared to on-chip passive components in CMOS processes, the drive circuit for sustaining the oscillation from the energy source and the read-out circuit in Fig. 1 must work reliably under process, voltage, and temperature (PVT) variations. Such reliable circuit operation under PVT variations is challenging considering the concurrent requirement of ultra-low power consumption. Furthermore, depending on the long-term stability of the sleep timers, the duty-cycled period, T_Duty, may vary, and RF blocks in the wireless node must wake up early (guard band shown in red in Fig. 2) to accommodate this inaccuracy [10]. The minimum duration of this guard band is set by the frequency variations at an averaging window of T_Duty, which are evaluated with the Allan deviation at T_Duty.
For example, assume there are two XO designs (XO1 and XO2) and that, at an averaging window of 100 s, the Allan deviation of XO1 is 1/5 the Allan deviation of XO2. For a duty-cycled edge device with T_Duty = 100 s as shown in Fig. 2, the required guard band with XO1 as the sleep timer can be 1/5 of the guard band with XO2 as the sleep timer. This means the power overhead due to the guard band is reduced to 1/5 thanks to the better long-term frequency stability, or lower Allan deviation, of XO1. In summary, IoT applications introduce new challenges for 32kHz XO design in terms of ultra-low power, PVT variation tolerance, and long-term frequency stability.
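The guard-band example can be sketched numerically. The snippet below is a minimal illustration, assuming the timing error accumulated over one sleep interval is roughly the Allan deviation at T_Duty multiplied by T_Duty (a rule of thumb, not a formula given in the text); the Allan deviation value for XO2 and the function name `guard_band_s` are hypothetical.

```python
def guard_band_s(allan_dev: float, t_duty_s: float) -> float:
    """Assumed model: timing error over one sleep interval ~ sigma_y(T_Duty) * T_Duty."""
    return allan_dev * t_duty_s


T_DUTY_S = 100.0            # duty-cycle period from the example in the text
ADEV_XO2 = 10e-9            # hypothetical Allan deviation of XO2 at 100 s
ADEV_XO1 = ADEV_XO2 / 5.0   # XO1 is 5x better, as in the text

gb_xo1 = guard_band_s(ADEV_XO1, T_DUTY_S)
gb_xo2 = guard_band_s(ADEV_XO2, T_DUTY_S)

print(f"XO1 guard band: {gb_xo1 * 1e6:.2f} us")
print(f"XO2 guard band: {gb_xo2 * 1e6:.2f} us")
print(f"ratio XO1/XO2:  {gb_xo1 / gb_xo2:.2f}")
```

Under this assumed linear model the guard band, and hence the radio-on overhead it causes, scales directly with the Allan deviation, which is the 1/5 relationship described above.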
In the past ten years, ultra-low power 32kHz XO performance has greatly improved [9]-[19]. Fig. 3 presents the power consumption and Allan deviation floor of nW XOs published from 2012 to 2020. Reference [17] from ISSCC 2012 presented the first 32 kHz XO that consumed less than 10nW; now the lowest power consumption for an XO is close to 0.5nW. The lowest Allan deviation floor has been reduced from >10ppb to 2ppb. Given that XO designs before 2012 consumed significantly more than tens of nanowatts, it is reasonable to ask how these works achieve nW or sub-nW power consumption, i.e., what design techniques led to this dramatic power reduction? Further, what are the fundamental limitations on minimum achievable power? This work seeks to provide a review of state-of-the-art nW 32kHz XO designs, including the answers to these questions. This paper is organized as follows. Section II discusses fundamentals of 32kHz crystal oscillators including the resonator and circuit functions. Section III discusses XO design considerations as they relate to energy, noise, and PVT variations. Section IV introduces existing design techniques in ultra-low power 32kHz XOs. Section V summarizes the performance of state-of-the-art nW XOs. Research directions related to 32kHz XOs are presented in Section VI. Finally, Section VII concludes the paper.

II. FUNDAMENTALS OF 32KHZ CRYSTAL OSCILLATORS
XO frequency is defined by the crystal resonator. Referring to Fig. 1, the circuit in an XO interfaces with the crystal to sustain the oscillation and generate the output clock. To design the XO circuit, it is necessary to understand the characteristics of the crystal resonator and the required functions of the XO circuit.

A. ELECTRICAL MODEL OF THE ELECTROMECHANICAL CRYSTAL
The mechanical resonance can be modeled with a series RLC branch (R_S, L_S, and C_S) as shown in Fig. 4. C_O models the capacitance due to electrodes and package, and C_P models extra load capacitors and parasitic capacitance at the two nodes of the crystal. One advantage of a crystal resonator is its high Q. From the perspective of power consumption or energy, Q can be understood as 2π times the ratio between the energy stored in the resonator and the energy dissipated in one cycle [5]:
$$Q = 2\pi \frac{E_{Stored}}{E_{Dissipated\ per\ cycle}} \tag{1}$$
Other high Q resonators such as steel tuning forks [20] were also used to make "electronic watches". However, the oscillator circuit cannot directly interface with the steel tuning fork; it sustains the vibration of the steel tuning fork by driving electromagnets. In the case of crystal resonators, thanks to the piezoelectric effect and its converse effect, if we look into the two nodes of the crystal, we can treat it as an electric circuit modeled as in Fig. 4. The circuit designed for the crystal resonator can directly interface with it to extract the phase or frequency of the oscillation and inject energy to compensate for the loss.
To provide a quantitative analysis, consider a 32 kHz crystal model with R_S = 50 kΩ, L_S = 17 kH, C_S = 1.39 fF, C_O = 1.35 pF, and Q = 70000 (estimated model of an ECS-2X6-FLX crystal [21]), assuming a 100 mV oscillation amplitude, V_OSC, across the crystal and C_P = 1 pF (all the calculations in Sections II and III use this crystal model and these assumptions on V_OSC and C_P). There are three things that we can observe from the crystal model in Fig. 4 and the component values here. First, due to R_S, during every oscillation cycle the circuit dissipates a certain amount of energy. Second, referring to (1) and Q = 70000, a very small portion of the stored energy is dissipated after each cycle. Third, because C_S is in series with L_S and we can only access the two nodes of the crystal, we cannot directly add energy to inductor L_S by applying a voltage across it. We will discuss these three observations further later in this section and in Section III.
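As a quick consistency check on this crystal model, the following sketch recomputes the series resonance frequency and the quality factor from the quoted R_S, L_S, and C_S. It uses the standard series-RLC relations f_S = 1/(2π√(L_S C_S)) and Q = 2πf L_S/R_S; the function names are illustrative only, and the small offset from 32.768 kHz is expected because the component values are rounded.

```python
import math

R_S = 50e3       # motional resistance, ohm
L_S = 17e3       # motional inductance, H
C_S = 1.39e-15   # motional capacitance, F
F_NOM = 32768.0  # nominal oscillation frequency, Hz


def series_resonance_hz(l_s: float, c_s: float) -> float:
    """Resonance frequency of the series L_S-C_S branch."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_s * c_s))


def quality_factor(f_hz: float, l_s: float, r_s: float) -> float:
    """Q of a series RLC branch: reactance of L_S over R_S."""
    return 2.0 * math.pi * f_hz * l_s / r_s


f_s = series_resonance_hz(L_S, C_S)
q = quality_factor(F_NOM, L_S, R_S)

print(f"series resonance: {f_s:.0f} Hz")
print(f"quality factor:   {q:.0f}")
```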
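We now consider a scenario where energy is added into the crystal and it is allowed to run freely. Because V_1 and V_2 are 180 degrees out of phase with each other in this case, the crystal model can be simplified as in Fig. 5. Assuming that the initial oscillation amplitude across the crystal is V_OSC = 100 mV, we can calculate the amplitude attenuation after each period of oscillation [9]. First, with the simplified model in Fig. 5, the energy stored in the crystal can be calculated as:
$$E_{Crystal} \approx \frac{1}{2} C_S V_{OSC,INT}^2, \qquad V_{OSC,INT} = \frac{C_L}{C_S} V_{OSC} \tag{2}$$
Here V_OSC,INT is taken as the internal oscillation amplitude across C_S, C_L is the total load capacitance, and C_L = C_O + 0.5C_P. Then, because (1) implies that a fraction 2π/Q of the stored energy is dissipated per cycle and the stored energy is proportional to the square of the amplitude, the amplitude attenuation of V_OSC,INT in Fig. 5 can be obtained [9]:
$$\Delta V_{OSC,INT} = V_{OSC,INT}\left(1 - \sqrt{1 - \frac{2\pi}{Q}}\right) \approx \frac{\pi}{Q} V_{OSC,INT} \tag{3}$$
Finally, because V_OSC has the same attenuation ratio as V_OSC,INT, the attenuation of V_OSC after one cycle due to crystal loss can be estimated as [9]:
$$\Delta V_{OSC} \approx \frac{\pi}{Q} V_{OSC} \tag{4}$$
With V_OSC = 100 mV, after one cycle the oscillation amplitude attenuation, ΔV_OSC, is only 4.5 µV because of the high Q of the crystal.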
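The per-cycle attenuation quoted above can be reproduced with a few lines. This is a minimal sketch that assumes only the definition of Q (a fraction 2π/Q of the stored energy is lost per cycle) and that energy scales with amplitude squared; the free-running decay time constant Q/(πf) is the standard envelope result for a high-Q resonator and is added here for illustration.

```python
import math


def amplitude_drop_per_cycle(v_osc: float, q: float) -> float:
    """Amplitude lost in one cycle when a fraction 2*pi/Q of the energy is dissipated."""
    energy_fraction_left = 1.0 - 2.0 * math.pi / q
    return v_osc * (1.0 - math.sqrt(energy_fraction_left))


def decay_time_constant_s(q: float, f_hz: float) -> float:
    """Time for the free-running amplitude envelope to fall by a factor of e."""
    return q / (math.pi * f_hz)


drop = amplitude_drop_per_cycle(0.1, 70000.0)   # V_OSC = 100 mV, Q = 70000
tau = decay_time_constant_s(70000.0, 32768.0)

print(f"amplitude drop per cycle: {drop * 1e6:.2f} uV")
print(f"envelope time constant:   {tau:.2f} s")
```

The long time constant relative to the 30.5 µs oscillation period is what later allows drivers to be turned off for part of the time without losing the oscillation.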

B. RESONANCE MODES
Because the crystal model in Fig. 4 includes a series RLC branch and C_O (or C_L) in parallel, the resonator can operate in two different types of modes [9]: "parallel resonance" and "series resonance", as seen in Fig. 6. In "parallel resonance," inductor L_S resonates with C_S in series with the load capacitance C_L, which is a combination of C_O and C_P. There is a phase shift of 180 degrees across the crystal, because the inductor current inherently pulls up/down V_1 while pulling down/up V_2. One important feature of the parallel resonance mode is that if the driver is turned off, the crystal will stay in parallel resonance and continue to oscillate as the amplitude decays. In "series resonance," the inductor resonates with C_S only. It requires a driver to maintain zero phase shift across the crystal. In contrast with parallel resonance, the driver must always be on to maintain series resonance. If the driver is turned off, the inductor current would instantaneously cause a phase difference between V_1 and V_2 by pulling up one of them while pulling down the other. Hence, once the driver in series resonance is turned off, the crystal switches to parallel resonance. Fig. 6 also shows the voltage waveforms at V_1 and V_2 in these two modes. The oscillation amplitude across the crystal is V_OSC.

C. OSCILLATION FREQUENCY
Almost all 32kHz quartz crystals are manufactured to resonate with a certain load capacitance (typically 6 pF to 15 pF) at 32.768 kHz at 25 °C. This specific frequency is chosen because 32768 = 2^15, and a 1 Hz output clock can be easily obtained with 15 divide-by-two frequency dividers, or flip-flops, following the 32.768 kHz XO output. 32.768 kHz is emphasized here because most of the nW XO designs achieve low power at the cost of frequency deviation from the exact 32.768 kHz, which will be discussed in Section IV. In the case of a 32.768 kHz crystal with pre-defined load capacitance, the XO is in parallel resonance because inductor L_S resonates with C_S in series with the load capacitance, C_L, and the oscillation frequency is:
$$f_{OSC} = \frac{1}{2\pi\sqrt{L_S C_S}} \sqrt{1 + \frac{C_S}{C_L}} \tag{5}$$
Fig. 7 shows the calculated oscillation frequency at different load capacitances with the model of an ECX-34Q-S crystal (required load capacitance = 6 pF) [22]. We can tell from (5) that its first factor is the "series resonance" frequency:
$$f_S = \frac{1}{2\pi\sqrt{L_S C_S}} \tag{6}$$
When C_L increases, the oscillation frequency decreases and gets closer to this "series resonance" frequency. Meanwhile, at large C_L, the oscillation frequency is less sensitive to the change or variation of C_L. Because C_S/C_L is far less than 1, (5) can be estimated as:
$$f_{OSC} \approx f_S \left(1 + \frac{C_S}{2 C_L}\right) \tag{7}$$
The frequency variation due to a variation of C_L can then be calculated:
$$\frac{\Delta f_{OSC}}{f_{OSC}} \approx -\frac{C_S}{2 C_L^2} \Delta C_L \tag{8}$$
With C_S = 3.5 fF [22] and C_L = 6 pF, the magnitude of the frequency variation due to C_L variation can be estimated as 48.6 ppm/pF. Surface-mount device (SMD) capacitors can have tolerances from ±1% to ±5%, and with 48.6 ppm/pF sensitivity and C_L = 6 pF, this would lead to about ±3 ppm to ±15 ppm frequency variation (or about ±0.1 Hz to ±0.5 Hz absolute frequency variation). We review the resonance frequency calculations here because in Section IV reducing C_L is discussed as one of the low-power techniques, and it would lead to a change of the resonance frequency and of the sensitivity of the resonance frequency to C_L variations.
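The load-capacitance sensitivity quoted above can be checked with a short script. The sketch below assumes the standard parallel-resonance relation f_OSC = f_S·√(1 + C_S/C_L) and its first-order derivative; the series resonance frequency is back-calculated so that the crystal hits 32.768 kHz at its rated 6 pF load, which is an assumption for illustration rather than a datasheet value.

```python
import math

C_S = 3.5e-15        # motional capacitance quoted for the ECX-34Q-S, F
C_L_NOM = 6e-12      # rated load capacitance, F
F_NOM = 32768.0      # frequency at the rated load, Hz

# Assumed: pick f_S so that the crystal is exactly on frequency at C_L_NOM.
F_S = F_NOM / math.sqrt(1.0 + C_S / C_L_NOM)


def parallel_freq_hz(c_l: float) -> float:
    """Oscillation frequency in parallel resonance for load capacitance c_l."""
    return F_S * math.sqrt(1.0 + C_S / c_l)


def pulling_ppm_per_pf(c_l: float) -> float:
    """First-order fractional frequency change per pF of load change, in ppm/pF."""
    per_farad = -C_S / (2.0 * c_l ** 2)
    return per_farad * 1e-12 * 1e6   # F -> pF, fraction -> ppm


sens = pulling_ppm_per_pf(C_L_NOM)
dev_1pct = abs(sens) * (0.01 * C_L_NOM * 1e12)   # ppm for a 1% capacitor tolerance
dev_5pct = abs(sens) * (0.05 * C_L_NOM * 1e12)   # ppm for a 5% capacitor tolerance

print(f"sensitivity at 6 pF: {sens:.1f} ppm/pF")
print(f"1% tolerance: +/-{dev_1pct:.1f} ppm, 5% tolerance: +/-{dev_5pct:.1f} ppm")
print(f"frequency at 3 pF load: {parallel_freq_hz(3e-12):.2f} Hz")
```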
At 25 °C, the frequency tolerance of the 32kHz crystal due to manufacturing is typically ±10 ppm or ±20 ppm. Since most 32kHz crystals are tuning fork or XY cut [3], the oscillation frequency is a second-order function of temperature with its peak around 25 °C:
$$f = f_0 \left[1 + \alpha \left(T - T_0\right)^2\right] \tag{9}$$
T_0 is the turnover temperature (∼25 °C), defined as the point where the curve of oscillation frequency versus temperature is at its peak, and f_0 is the oscillation frequency at T_0. α is a temperature coefficient with a unit of ppm/°C². Fig. 8 presents the oscillation frequency of an ECX-34Q-S crystal [22] (α = −0.034 ppm/°C²) versus temperature with T_0 = 25 °C and the range of frequency variation due to crystal tolerance, variation in T_0, and variation in the temperature coefficient (α).
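The parabolic temperature dependence is easy to evaluate directly. The sketch below assumes the form Δf/f_0 = α(T − T_0)² with the α and T_0 quoted for the ECX-34Q-S; the conversion to seconds per day is added here as an illustration of what the deviation means for timekeeping and is not taken from the text.

```python
ALPHA_PPM_PER_C2 = -0.034   # temperature coefficient quoted in the text
T0_C = 25.0                 # turnover temperature


def temp_deviation_ppm(t_c: float) -> float:
    """Fractional frequency deviation from f_0 at temperature t_c, in ppm."""
    return ALPHA_PPM_PER_C2 * (t_c - T0_C) ** 2


def drift_s_per_day(ppm: float) -> float:
    """Accumulated timing error over 24 hours for a constant frequency offset."""
    return ppm * 1e-6 * 86400.0


for t in (-40.0, 0.0, 25.0, 85.0):
    ppm = temp_deviation_ppm(t)
    print(f"{t:6.1f} C: {ppm:8.2f} ppm, {drift_s_per_day(ppm):7.2f} s/day")
```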
In addition to the behavior captured by the simplified model in Fig. 4, a crystal exhibits several modes of oscillation at higher frequencies [3], which are termed "overtone modes". The second overtone of a 32kHz crystal is about six times the fundamental frequency. These overtone modes can be avoided by limiting the drive level or loop gain at overtone frequencies.

D. FUNCTIONS OF THE CIRCUIT IN A CRYSTAL OSCILLATOR
Once a specific crystal component is chosen, how does one go about designing the circuit to create a crystal oscillator? What are the fundamental requirements or functionalities of this circuit to generate a 32 kHz output?
We start with a conventional structure, a Pierce crystal oscillator, which is the most widely used structure for 32kHz crystals thanks to its simplicity and reliability. It only requires an inverter to serve as an inverting amplifier and two resistors, as shown in Fig. 9. R_Bias is used to provide DC biasing at the inverter's input. A resistor, R_Series, is typically added in series with the output of the inverter to limit the drive level, which prevents damage to the crystal and provides phase shift or filtering. The inverter, the load capacitances, and the resistors must be carefully sized to enable the oscillation [15], [23], which will be discussed in Section III-B. The output inverter can convert the sinusoidal waveform from the crystal to the square wave clock output at the cost of short-circuit current. With resistors and load capacitors, the inverter works as a continuous amplifier that must satisfy gain and phase requirements to sustain the oscillation of the crystal. If the voltage swing at V_2 is small, an extra inverter is typically added to generate a rail-to-rail output clock.
By analyzing this conventional structure, we can obtain the core functions of the XO circuit: 1) it must extract frequency and phase from the crystal waveform; 2) it must inject energy into the crystal to compensate for the loss in the crystal; 3) it generates timing control, so energy is injected at the proper time.

III. ENERGY AND NOISE IN XO DESIGN
Now we have introduced the crystal model and the fundamental requirements of the circuit to interface with the crystal. The next questions we consider are: 1) what is the lowest power that a 32kHz XO can achieve? 2) what sets this fundamental limitation in power consumption? and 3) how is energy injected into the crystal? A final question relates to design considerations associated with noise and PVT variations.

A. CRYSTAL OSCILLATION AND LOSS
As discussed in Section II-D, the circuit needs to 1) extract frequency and phase from the crystal waveform; 2) inject energy into the crystal to sustain the oscillation; 3) determine timing for energy injections. These requirements each require some level of power consumption: power for extraction, power injected into the crystal, and power for timing generation. Even with an ideal circuit that can perfectly accomplish all three tasks, energy is still required to compensate the loss in the crystal. Hence, the crystal loss determines the power limit.
In parallel resonance (Fig. 6, left), because V_1 and V_2 are 180 degrees out of phase with each other, the simplified crystal model from Fig. 5 can be used. C_L is the total load capacitance, and C_L = C_O + 0.5C_P. Because C_L and C_S form a capacitive voltage divider, and the oscillation amplitude across C_L is defined as V_OSC as shown in Fig. 6, the oscillation amplitude across R_S and L_S can be obtained as:
$$V_{R_S,L_S} = \frac{C_L + C_S}{C_S} V_{OSC} \tag{10}$$
Then, the crystal loss in parallel resonance can be obtained by calculating the power dissipated in R_S [10]:
$$P_{Loss,Parallel} \approx \frac{R_S V_{R_S,L_S}^2}{2 \left(\omega_S L_S\right)^2} = \frac{1}{2} R_S \omega_S^2 \left(C_L + C_S\right)^2 V_{OSC}^2 \approx \frac{1}{2} R_S \left(\omega_S C_L V_{OSC}\right)^2 \tag{11}$$
ω_S is the resonance frequency of L_S and C_S, 1/√(L_S C_S).
Hence, crystal loss in parallel resonance is proportional to R_S and a quadratic function of both V_OSC and C_L. In series resonance, because L_S resonates with C_S, V_OSC is across R_S, which makes crystal loss a quadratic function of V_OSC and inversely proportional to R_S [9]:
$$P_{Loss,Series} = \frac{V_{OSC}^2}{2 R_S} \tag{12}$$
From (11), in order to reduce crystal loss in parallel resonance, we should choose a crystal with a small R_S and reduce the oscillation amplitude, V_OSC, and the load capacitance, C_L. Using the 32 kHz crystal model from Section II-A, the crystal loss in parallel resonance is calculated as 36 pW. XOs in series resonance can generate about 100 mV peak-to-peak single-ended oscillation amplitude while keeping V_OSC = 2 mV across the crystal [11]. With the crystal model in Section II-A and V_OSC = 2 mV, the crystal loss in series resonance is calculated as 40 pW using (12). The crystal losses of 36 pW in parallel resonance and 40 pW in series resonance set fundamental limits on the lowest possible power consumption under the chosen assumptions of crystal parameters, V_OSC, and C_P [9].
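The two loss figures can be reproduced from the crystal model of Section II-A. The following sketch assumes the loss expressions P = ½R_S(ωC_L V_OSC)² for parallel resonance and P = V_OSC²/(2R_S) for series resonance, with V_OSC taken as a peak amplitude and ω evaluated at the nominal 32.768 kHz; function names are illustrative.

```python
import math

R_S = 50e3        # ohm
C_O = 1.35e-12    # F
C_P = 1e-12       # F
F_OSC = 32768.0   # Hz, nominal frequency used for omega


def loss_parallel_w(r_s: float, c_l: float, v_osc: float, f_hz: float = F_OSC) -> float:
    """Crystal loss in parallel resonance: motional current omega*C_L*V_OSC through R_S."""
    i_peak = 2.0 * math.pi * f_hz * c_l * v_osc
    return 0.5 * r_s * i_peak ** 2


def loss_series_w(r_s: float, v_osc: float) -> float:
    """Crystal loss in series resonance: V_OSC appears directly across R_S."""
    return v_osc ** 2 / (2.0 * r_s)


c_l = C_O + 0.5 * C_P                       # total load capacitance, 1.85 pF
p_par = loss_parallel_w(R_S, c_l, 0.1)      # V_OSC = 100 mV
p_ser = loss_series_w(R_S, 2e-3)            # V_OSC = 2 mV
p_ser_70k = loss_series_w(70e3, 2e-3)       # R_S = 70 kOhm case from Section IV-A

print(f"parallel loss:          {p_par * 1e12:.1f} pW")
print(f"series loss (50 kOhm):  {p_ser * 1e12:.1f} pW")
print(f"series loss (70 kOhm):  {p_ser_70k * 1e12:.1f} pW")
```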

B. HOW TO SUSTAIN THE OSCILLATION
For a conventional Pierce XO in Fig. 9, the method of equivalent negative resistance [6], [23] is commonly used to derive the transconductance requirement of the driver to sustain the oscillation:
$$R_{Neg} = -\frac{g_m C_P^2}{\left(g_m C_O\right)^2 + \omega^2 \left(C_P^2 + 2 C_P C_O\right)^2} \tag{13}$$
where ω is the angular oscillation frequency and C_P loads each node of the crystal as in Fig. 4. This negative resistance is a non-monotonic function of g_m, but considering the range of g_m for nW XO design, R_Neg typically does not reach its minimum and decreases as g_m increases. Again using the crystal model in Section II-A, (13) requires a driver g_m of about 29 nS to achieve a negative resistance lower than −50 kΩ. Assuming g_m/I_D of 20 V⁻¹ to 30 V⁻¹ for both PMOS and NMOS, which share the same bias current and both contribute to the inverter's g_m, we can obtain the required bias current range of the continuous driver as 0.48 nA to 0.73 nA. Because we target a 100 mV oscillation amplitude across the crystal and assume a 50 mV V_DS for each transistor to stay in saturation, a minimum power supply of 200 mV is needed. Hence, for a continuous g_m-based driver, under the above assumptions, the minimum power to sustain the oscillation can be estimated as 96 pW to 146 pW. Please note that the calculation above is for XOs in parallel resonance. In series mode XOs, the drivers directly provide the AC current through R_S to sustain the oscillation. This requires the driver to be a power amplifier. Assuming Class-B operation in the driver (maximum efficiency = 78.5%) and 40 pW (Section III-A) dissipated by R_S, the power of the driver to sustain the oscillation would be ∼51 pW.
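The g_m, bias current, and power figures for the continuous driver can be reproduced as follows. This sketch assumes the standard three-point (Pierce) negative-resistance expression with C_P at each crystal node and C_O across the crystal, and assumes that the inverter's g_m is the sum of equal NMOS and PMOS contributions sharing one bias current, which is the reading that reproduces the 0.48 nA to 0.73 nA range. Function names are illustrative.

```python
import math

R_S = 50e3        # ohm
C_O = 1.35e-12    # F, capacitance across the crystal
C_P = 1e-12       # F, capacitance at each crystal node
F_OSC = 32768.0   # Hz
V_DD = 0.2        # V, 100 mV swing plus 50 mV V_DS headroom per transistor


def negative_resistance(g_m: float) -> float:
    """Equivalent negative resistance seen by the motional branch (assumed Pierce model)."""
    w = 2.0 * math.pi * F_OSC
    x = C_P * C_P + 2.0 * C_P * C_O
    return -g_m * C_P * C_P / ((g_m * C_O) ** 2 + (w * x) ** 2)


def required_gm(r_s: float) -> float:
    """Smallest g_m for which the negative resistance magnitude equals r_s."""
    w = 2.0 * math.pi * F_OSC
    x = C_P * C_P + 2.0 * C_P * C_O
    a = r_s * C_O ** 2          # quadratic a*g^2 - C_P^2*g + c = 0
    c = r_s * (w * x) ** 2
    disc = C_P ** 4 - 4.0 * a * c
    return (C_P ** 2 - math.sqrt(disc)) / (2.0 * a)


def driver_power_w(g_m: float, gm_over_id: float) -> float:
    """Supply power when NMOS and PMOS share one bias current and both add to g_m."""
    i_bias = g_m / (2.0 * gm_over_id)
    return i_bias * V_DD


gm = required_gm(R_S)
p_min = driver_power_w(gm, 30.0)   # best-case g_m/I_D
p_max = driver_power_w(gm, 20.0)   # worst-case g_m/I_D
p_class_b = 40e-12 / 0.785         # series-mode driver at ideal Class-B efficiency

print(f"required g_m: {gm * 1e9:.1f} nS")
print(f"continuous driver power: {p_min * 1e12:.0f} pW to {p_max * 1e12:.0f} pW")
print(f"series-mode Class-B driver: {p_class_b * 1e12:.0f} pW")
```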
In Section III-A, we derived that the crystal loss in both parallel resonance and series resonance is approximately 40 pW. Based on the calculations above, the conventional structure in parallel resonance is less efficient at sustaining the oscillation than the series mode XO. To dive deeper into this low efficiency, we create a more general model of energy injection in parallel mode, which yields observations that lead to the design techniques in Section IV.
To investigate how the driver injects energy into the crystal in parallel mode, we use the model in Fig. 10(a). The pull-up and pull-down switches can be turned on for a short period of time to pull up/down the voltage at V_1. This configuration can be used to model the continuous g_m-based driver by keeping the pull-up and pull-down switches on and making the current through these switches change with V_2. Since in parallel resonance the series RLC branch shows high impedance, Fig. 10(a) can be simplified to Fig. 10(b) during injections to analyze the energy injection. Energy is thus injected into the capacitor network including C_O and C_P, and then part of this energy is transferred to the crystal through the inductor current. This perspective provides a way to calculate the energy injected into the crystal by checking the stored energy in the capacitor network before and after the injections [10]. How can we add energy to the capacitor network when the voltage at the injection node, V_1, oscillates as a sinusoidal wave, while keeping the DC value of V_1? To avoid a DC shift at V_1, we cannot simply keep pulling up V_1 to add energy but need one pull-up and pull-down pair with the same voltage step, V_INJ, as shown in Fig. 10(c). We assume the pull-up injection occurs when V_1 = V_UP and the pull-down injection happens when V_1 = V_DN. As calculated in Section II-A, the attenuation of the oscillation amplitude is only 4.5 µV per cycle at V_OSC = 100 mV. To sustain the oscillation, the driver must provide energy to compensate for the crystal loss. The energy added to the capacitor network because of the pull-up injection can be calculated as:
$$E_{UP} = \frac{1}{2} C_{Network} \left[\left(V_{UP} + V_{INJ}\right)^2 - V_{UP}^2\right] = C_{Network} V_{INJ} \left(V_{UP} + \frac{V_{INJ}}{2}\right) \tag{14}$$
C_Network is the equivalent capacitance seen by the single-side driver during a single-sided injection and C_Network = C_P + C_O||C_P [10].
The energy removed from the capacitive network due to the pull-down injection can be calculated as:
$$E_{DN} = \frac{1}{2} C_{Network} \left[V_{DN}^2 - \left(V_{DN} - V_{INJ}\right)^2\right] = C_{Network} V_{INJ} \left(V_{DN} - \frac{V_{INJ}}{2}\right) \tag{15}$$
Then, the energy added to the capacitive network after one pull-up and pull-down pair can be calculated as:
$$E_{Added} = E_{UP} - E_{DN} = C_{Network} V_{INJ} \left(V_{UP} - V_{DN} + V_{INJ}\right) \tag{16}$$
The energy from V_DD is:
$$E_{VDD} = C_{Network} V_{INJ} V_{DD} \tag{17}$$
If we perform single-side injection with an injection step of V_INJ, the injection efficiency is:
$$\eta = \frac{E_{Added}}{E_{VDD}} = \frac{V_{UP} - V_{DN} + V_{INJ}}{V_{DD}} \tag{18}$$
When V_UP = V_DN − V_INJ, there would be no energy added to the capacitor network after one pull-up and pull-down pair, as shown in Fig. 10(d). To maximize the energy injected into the crystal and the injection efficiency with a certain value of V_INJ, we need to maximize V_UP − V_DN + V_INJ in our design. The maximum value of V_UP − V_DN + V_INJ is equal to V_OSC, as shown in Fig. 10(e). In this case the pull-up injection happens at the peak of V_1, and the pull-down injection happens at the valley of V_1. Referring to (18) and Fig. 10(e), to achieve the highest efficiency, V_DD is reduced to V_OSC + V_INJ. The required injection step, V_INJ, to compensate for the crystal loss can be obtained by equating the energy added per cycle in (16), with V_UP − V_DN + V_INJ = V_OSC, to the crystal loss in (11) over one period T = 2π/ω_S:
$$V_{INJ} = \frac{P_{Loss,Parallel}\, T}{C_{Network} V_{OSC}} \approx \frac{\pi R_S \omega_S C_L^2 V_{OSC}}{C_{Network}} \tag{19}$$
With the crystal model in Section II-A and V_OSC = 100 mV, the injection step can be calculated as 7 mV, which leads to an injection efficiency of 93.5%. If we set a delay of T/2 between the pull-up and pull-down injections, the injection efficiency changes from 0 to 93.5% as the pull-up injection timing changes from the DC level of V_1 (Fig. 10(d)) to the peak of V_1 (Fig. 10(e)). As previously mentioned, the configuration in Fig. 10(a) can also be used to model the continuous g_m-based driver by activating the driver continuously. Because the injections away from the peak and valley of V_1 exhibit low efficiency, conventional drivers show low efficiency while requiring bias currents. Please note that in the optimum case of Fig. 10, V_DD is reduced to V_OSC + V_INJ = 107 mV, which is barely half the value of the minimum V_DD = 200 mV that we assumed for the conventional Pierce XOs with the same V_OSC = 100 mV at the beginning of Section III-B.
This is because if we only perform pulse injection around the peak and valley as shown in Fig. 10(e), the driver should work as a switch with minimum resistance to pull up V_1 to V_DD or pull down V_1 to V_SS, and we expect minimum voltage drop across the driver. In the case of the conventional Pierce XO, we must leave sufficient headroom for the driver, or our assumption of g_m would be invalid due to nonlinearity or distortion.
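The injection step and efficiency for the peak/valley case can be sketched numerically. The snippet assumes the simplified capacitor-network energy balance described above: the net energy added per pull-up/pull-down pair is C_Network·V_INJ·(V_UP − V_DN + V_INJ), the energy drawn from the supply is C_Network·V_INJ·V_DD, and the crystal loss follows the parallel-resonance expression with ω taken at 32.768 kHz. Function names are illustrative.

```python
import math

R_S = 50e3
C_O = 1.35e-12
C_P = 1e-12
F_OSC = 32768.0
V_OSC = 0.1


def c_network(c_p: float, c_o: float) -> float:
    """Capacitance seen by the single-side driver: C_P + (C_O in series with C_P)."""
    return c_p + c_o * c_p / (c_o + c_p)


def loss_parallel_w(r_s: float, c_l: float, v_osc: float) -> float:
    """Crystal loss in parallel resonance."""
    return 0.5 * r_s * (2.0 * math.pi * F_OSC * c_l * v_osc) ** 2


def required_v_inj(p_loss: float, c_net: float, v_osc: float) -> float:
    """Injection step whose per-cycle energy (peak/valley timing) equals the loss per cycle."""
    return (p_loss / F_OSC) / (c_net * v_osc)


def injection_efficiency(v_up: float, v_dn: float, v_inj: float, v_dd: float) -> float:
    """Energy added to the capacitor network divided by energy drawn from V_DD."""
    return (v_up - v_dn + v_inj) / v_dd


c_net = c_network(C_P, C_O)
p_loss = loss_parallel_w(R_S, C_O + 0.5 * C_P, V_OSC)
v_inj = required_v_inj(p_loss, c_net, V_OSC)
v_dd = V_OSC + v_inj

# Peak/valley timing: pull up from V_OSC to V_DD, pull down from V_INJ to V_SS = 0.
eta_best = injection_efficiency(V_OSC, v_inj, v_inj, v_dd)
# Worst timing: V_UP = V_DN - V_INJ, so the pair adds no net energy.
eta_worst = injection_efficiency(0.05, 0.05 + v_inj, v_inj, v_dd)

print(f"C_Network: {c_net * 1e12:.2f} pF")
print(f"V_INJ:     {v_inj * 1e3:.2f} mV, V_DD: {v_dd * 1e3:.1f} mV")
print(f"efficiency at peak/valley: {eta_best * 100:.1f} %")
```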

C. NOISE CONSIDERATIONS IN XO DESIGN
From the perspective of frequency response, a crystal resonator's high Q means that it only responds to excitations or disturbances within a very narrow frequency range [5]. Though high Q makes oscillation startup more difficult and slower, startup times in 32kHz XOs are typically not a concern given that they will not be duty cycled. To improve noise performance or frequency stability, crystals with higher Q are preferred. The circuit interfacing with the crystal necessarily introduces noise. When it injects energy into the crystal to compensate for the crystal loss, it also injects noise into the crystal, which disturbs the oscillation as shown in Fig. 11. The noise in the circuit comes from the environment, power supplies, passive components, and transistors. We can categorize these sources as low-frequency noise and high-frequency noise. Because in most applications of 32kHz XOs only long-term frequency stability matters, we focus on low-frequency noise, including that arising from power supplies and transistor flicker noise. Thermal noise from resistors or transistors will average out across longer periods of time. Known techniques to reduce flicker noise, including increasing the transistor sizes, can be applied to achieve better long-term frequency stability.
The crystal waveform is sinusoidal. Referring to the theory of phase noise [24], the oscillation phase has the minimum sensitivity to amplitude change or noise at waveform peaks and valleys. Hence, the same amount of noise injected into the crystal at different time points will result in different phase error.

D. CONSIDERATIONS ON PVT VARIATIONS
Though crystals show good frequency stability across PVT variations, their R_S can be much larger at high temperature than at 25 °C [25]. Referring to (11)-(13), this would change the requirements on the circuit. Furthermore, the XO circuit must reliably perform its functions across a wide range of temperatures and power supply voltages in the presence of process variations. In a conventional Pierce XO (Fig. 9), because the dc voltage at the two nodes of the crystal is ∼V_DD/2, the dc current, transconductance, gain, phase shift, noise, and linearity of the inverter change with V_DD and the threshold voltages of the transistors under PVT variations. In this simple topology, one inverter performs numerous tasks (frequency extraction, energy injection, timing), and hence there are only a few design parameters available to tune overall XO performance. As a result, the circuit is typically overdesigned for the typical case or requires calibration to function reliably across PVT variations.

IV. DESIGN TECHNIQUES
After understanding the trade-offs and challenges in XO design, we now review the existing design techniques aimed at reducing power or improving frequency stability. One design methodology strives to partition the XO circuit into several blocks corresponding to the three requirements: frequency/phase extraction, energy injection, and timing control. With this concept, we can sidestep the inherent trade-offs in the conventional Pierce structure and optimize these functional blocks separately. By controlling oscillation amplitude and reducing load capacitance, the crystal loss is reduced and less power is needed to sustain the oscillation. Further, architectures with pulsed drivers were proposed to increase the efficiency of energy injection. In such structures, where the driver is turned off at some points in time to save power, a separate output clock generation circuit is used to maintain output clock at all times. Finally, pulse injection schemes at subharmonic frequency reduce switching loss for timing control, and less noise is injected into the crystal on average.

A. AMPLITUDE CONTROL
The motivation of amplitude control is to reduce power consumption by reducing the oscillation amplitude, so it focuses on reducing the power for energy injection to compensate for the crystal loss. In the conventional Pierce structure in Fig. 9, the supply current is not controlled and can be overdesigned to the point of saturating the waveform at one node of the crystal, which degrades the effective g_m. A current-starving structure with a feedback loop for amplitude control was proposed in [26] and is shown in Fig. 12; it reduces the driver current by controlling the oscillation amplitude. This structure also provides a better power supply rejection ratio (PSRR). However, because the oscillation amplitude is reduced, an extra output buffer is required to generate a rail-to-rail output clock. Recent nW XOs use an amplitude control circuit [11] or a lower supply voltage [9], [10], [12]-[16] to reduce V_OSC, because crystal loss is a quadratic function of the oscillation amplitude, referring to (11) and (12). A 0.55 nW 32 kHz XO operating in series resonance was proposed in [11], [18]. It uses I/Q downconversion and upconversion to preserve the oscillation phase across the crystal and force the crystal into series resonance. A delay-locked loop (DLL) is used to generate I/Q signals from the crystal oscillation. Because the delay between the I/Q signals and the crystal oscillation does not affect the phase synchronization across the crystal, the phase-detecting circuit in the DLL does not require a fast response, which reduces its power consumption. The measured peak-to-peak oscillation amplitude at both sides of the crystal is about 0.1 V [18], and the phase shift across the crystal is close to zero in series resonance, as shown in Fig. 6. The differential oscillation amplitude across the crystal (V_OSC in Fig. 6) is reported as 2 mV [18]. The choice of this low oscillation amplitude is important to reduce crystal loss in series-mode resonance, referring to (12).
With the ECX-34Q crystal ($R_S$ < 70 kΩ) used in [18], the estimated crystal loss is about 28.6 pW when the oscillation amplitude across the crystal is held at 2 mV, which is negligible compared to the measured total power of 0.55 nW.
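The 28.6 pW figure can be checked with a short calculation. The sketch below assumes that (12) reduces to the loss of a sinusoid of amplitude $V_{OSC}$ across the motional resistance, $P = V_{OSC}^2 / (2 R_S)$, and evaluates it at the 70 kΩ bound of $R_S$; the function name is illustrative and not taken from [18].

```python
def series_crystal_loss(v_osc_amplitude, r_s):
    """Average power (W) dissipated in R_S at series resonance.

    Assumed model: a sinusoidal voltage of amplitude v_osc_amplitude (V)
    appears directly across the motional resistance r_s (ohm), so that
    P = V^2 / (2 * R_S).
    """
    return v_osc_amplitude ** 2 / (2.0 * r_s)


V_OSC = 2e-3       # differential amplitude across the crystal reported in [18], V
R_S_BOUND = 70e3   # upper bound on R_S quoted for the ECX-34Q crystal, ohm
TOTAL_PW = 550.0   # measured total power of the XO in [11], [18], pW

loss_pw = series_crystal_loss(V_OSC, R_S_BOUND) * 1e12
print(f"estimated crystal loss: {loss_pw:.1f} pW")
print(f"share of total power:   {100.0 * loss_pw / TOTAL_PW:.1f} %")
```

Under this assumed model the estimate matches the value quoted above, which suggests that the amplitude, not the peak-to-peak value, is the quantity entering the loss expression.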
Reference [15] introduces a design with an inverter-based Pierce structure and an amplitude-based duty-cycling scheme for the driver. The inverting amplifier operates in the subthreshold region with a $V_{DD}$ of 0.3 V. Due to the conventional driver and subthreshold operation, the bias current is sensitive to PVT variations, and thus this design requires calibration. An ultra-low-voltage 32 kHz XO design operating with a supply voltage of only 60 mV was presented in [13]. It uses a Schmitt trigger as the inverting amplifier to compensate for the crystal loss. Although the paper shows that a Schmitt trigger has much less variability across process corners than an inverter, this design suffers from a limited $V_{DD}$ range of 0.06 to 0.1 V and was only tested from 5 to 62 °C due to measurement setup limitations.
XOs with pulsed drivers reduce the oscillation amplitude by generating lower supplies on chip [10], by using an extra low-voltage supply [9], [14], or by operating the whole design from a single low supply [12], [16]. Note that a low-dropout regulator (LDO) or switched-capacitor dc-dc converter is required to generate a low supply voltage for the driver or for the whole XO design from the main $V_{DD}$, which introduces power and area overhead. In the case of an extra low supply voltage for the driver [9], the measured power to compensate the crystal loss through the driver can be less than 60 pW. The power overhead due to nonideal voltage conversion could be low compared with the crystal loss without amplitude control, because the crystal loss is a quadratic function of the oscillation amplitude, referring to (11).

B. REDUCED LOAD CAPACITANCE
Referring to (11), crystal loss in parallel resonance is a quadratic function of the load capacitance, $C_L$. Hence, reducing load capacitance is one avenue to explore in reducing crystal loss. Several nW XO designs [9], [12], [16] have relied solely on parasitic capacitance at the two nodes of the crystal. To further reduce the parasitic capacitances that chip packaging adds at the two nodes of the crystal, a chip-on-board (COB) package is used in [9], [19] (Fig. 13), and the estimated load capacitance is less than 2 pF. Manufacturers have recently started to optimize crystals for low-power Internet-of-Things (IoT) applications, reducing $C_L$ to 3 pF [25]. However, there are still potential concerns about reliability with such low load capacitance, including frequency variations from part to part.
Table 1 [9] shows the measured nominal frequency across different parts of an XO design [9], [19] in COB packages with different crystals. The frequency variation due to the COB package is evaluated to be within ±3 ppm by resoldering one crystal to all 10 COBs, and Table 2 [9] presents the measured nominal frequency across different parts in COB packages with the same crystal. This experiment shows that for the XO design in [9], [19], the frequency variation of the crystals, rather than the parasitics from COB packages, dominates the variation in nominal output frequency. Compared with designs using standard (i.e., larger) load capacitances, the oscillation frequency in a design with smaller $C_L$ is more susceptible to PCB parasitics (Section II-C), and the absence of load capacitors also causes a frequency deviation from the conventional 32.768 kHz. For conventional packages and PCBs, we suggest that designs with reduced $C_L$ use a one-point calibration for every board to account for board-dependent frequency drift.
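To make the quadratic dependence on $C_L$ concrete, the following sketch uses the common approximation that the motional branch carries the current flowing through the load capacitance, which gives $P \approx \frac{1}{2} R_S (2\pi f C_L V_{OSC})^2$. This is assumed to be the form of (11); the amplitude of 0.1 V and the choice of $R_S$ = 70 kΩ are illustrative values, not measurements from the cited designs.

```python
import math


def parallel_crystal_loss(v_osc, c_load, r_s, f_osc=32768.0):
    """Approximate crystal loss (W) in parallel resonance.

    Assumed model: the motional current amplitude equals the current through
    the load capacitance, I = 2*pi*f*C_L*V_OSC, and it dissipates
    I^2 * R_S / 2 in the series resistance.
    """
    i_amp = 2.0 * math.pi * f_osc * c_load * v_osc
    return 0.5 * r_s * i_amp ** 2


V_OSC = 0.1   # hypothetical oscillation amplitude, V
R_S = 70e3    # illustrative series resistance, ohm

for c_pf in (1.9, 3.0, 6.0, 12.5):
    p_w = parallel_crystal_loss(V_OSC, c_pf * 1e-12, R_S)
    print(f"C_L = {c_pf:4.1f} pF -> loss ~ {p_w * 1e12:8.1f} pW")
```

Doubling $C_L$ quadruples the loss in this model, which is why relying only on parasitic capacitance is attractive for power despite the frequency-accuracy concerns noted above.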

C. DUTY CYCLING THE DRIVER WITH AMPLITUDE DETECTION
A crystal's high Q makes it possible to extract the frequency and phase of the crystal waveform for a certain amount of time without injecting energy to compensate the crystal loss.
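As a rough illustration of how long a crystal can run without energy injection, the sketch below uses the standard envelope of an undriven resonator, $A(t) = A_0 e^{-\pi f_0 t / Q}$. The quality factor of 50,000 is a hypothetical value chosen for illustration; real 32 kHz crystals should be characterized from their datasheet parameters.

```python
import math


def cycles_until_fraction(q_factor, fraction):
    """Free-running cycles before the amplitude decays to `fraction` of its start.

    After N cycles an undriven resonator has amplitude A0 * exp(-pi * N / Q),
    so N = -ln(fraction) * Q / pi.
    """
    return -math.log(fraction) * q_factor / math.pi


Q = 50_000        # hypothetical quality factor
F_OSC = 32768.0   # nominal oscillation frequency, Hz

for frac in (0.99, 0.9, 0.5):
    n_cycles = cycles_until_fraction(Q, frac)
    print(f"amplitude falls to {frac:.2f} after ~{n_cycles:,.0f} cycles "
          f"({n_cycles / F_OSC * 1e3:.1f} ms)")
```

With a Q of this order, the amplitude changes very little over tens of cycles, which is what allows the driver to be switched off for part of the time.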
An automatic self-power-gating (ASPG) scheme [27] was proposed in a 39MHz XO as shown in Fig. 14. It uses multistage inverters instead of one inverter in the conventional Pierce XO to reduce the total short-circuit current in the inverter-based driver. Two inverter chains with different threshold voltages and a flip-flop (Fig. 14 bottom) are used as an amplitude detector, and when the amplitude at one node of the crystal is less than a preset level, a set of switches is turned on to connect the driver and the dc-biasing resistor across the crystal. At $V_{DD}$ = 0.7V, ASPG reduces power by 87%, and with both multistage inverters and ASPG, the power is reduced by 92%. Another amplifier duty-cycling scheme was proposed in 2016 in a 32kHz XO [15] with an inverter-based Pierce structure. Analog comparators and a time-constant generation circuit are used to detect the amplitude and enable the duty cycling of the driver in a conventional Pierce structure. The driver for energy injection is duty-cycled, but a clock buffer is always on to convert the crystal waveform to a rail-to-rail output clock. This architecture reduces the power for energy injection but does not reduce the power for frequency or phase extraction.
Though these duty-cycling schemes help reduce power, they are based on the amplitude of the oscillation and the transitions between "on" and "off" are not synchronized with the phase of the crystal oscillation, which could disturb the crystal oscillation and degrade overall noise performance. In the 39MHz XO [27], due to the introduction of duty cycling, the phase noise at 1MHz offset degrades by 38dB. In the 32kHz XO of [15], the measured Allan deviation plot is non-monotonic with irregular patterns, and it shows worse long-term frequency stability than other nW XO designs.
Interestingly, the pulsed driver in nW XOs [10], [17] can also be considered a type of duty-cycling scheme. Referring to our analyses in Section III-B, the efficiency reaches its peak if the driver performs injection at the peak/valley of the crystal waveform. Hence the driver is only turned on for a preset short period around the peak/valley of the crystal waveform. To reduce short-circuit current, these designs often further optimize the structure by only turning on the pull-up PMOS at the peak of the crystal waveform and turning on the pull-down NMOS at the valley of the crystal waveform.

D. XOS WITH PULSED DRIVERS
Pulsed drivers for nW XOs were first proposed in [17] at ISSCC 2012, as seen in Fig. 15(a). The motivation is to reduce the power dissipated in the transistors and series resistor of the conventional Pierce structure in Fig. 9. By injecting current when the crystal waveform is close to the power supply voltage or ground, the driver can sustain the oscillation without a series resistor and with a small voltage drop across the drive transistors, which reduces power dissipation of the driver [10]. In addition, this pulsed driver works as a Class-C amplifier, which does not have a continuous bias current and achieves higher efficiency compared with the Class-A operation in the conventional Pierce structure, as we analyzed in Section III-B.
The first disadvantage of this pulsed scheme is that it introduces overhead for generating injection timing. To activate the pulsed driver around the peak and valley of the crystal waveform, a slicer followed by a delay-locked loop (DLL) was proposed in [17] to generate a T/4 delay. Because the delay in the slicer adds directly to the desired delay of T/4, the slicer requires relatively high power consumption to keep its delay negligible compared to T/4 across PVT variations. A structure with a low-power phase detector and phase-locked loop (PLL) was proposed in [14] to generate timing for pulsed injection, but it requires a large capacitor for ripple reduction or loop stability. An XO design with a differential pulsed driver and relaxed injection timing was introduced in [16] and is shown in Fig. 15(b). A low-power clock slicer was proposed with the input PMOS and NMOS transistors biased on the edge of conduction. When the crystal waveform rises, it turns on the NMOS while turning off the PMOS. This Class-B operation reduces short-circuit current and enables low power consumption. The injection timing depends on the uncontrolled delay in the proposed low-power slicer. Although this structure achieves low power consumption and great simplicity, the performance and delay of the slicer vary considerably across PVT variations. Based on the design in [16], an RC-based quadrature-phase shifter was proposed in [28] to specify the timing for injections around the peak and valley of the crystal waveform. Though this passive network does not consume DC current, it attenuates the input signal while introducing a 90 degree phase shift. To ensure that its output can be used to trigger the control signals, the oscillation amplitude of the crystal must be increased to compensate the phase shifter's attenuation. The design uses an oscillation amplitude of 720mV, which greatly increases crystal loss and total power consumption.
A T/4-delay slicer whose delay is controlled by an on-chip current reference derived from the XO frequency was proposed in [9], [19]. This not only converts the sinusoidal crystal waveform to the output clock but also generates a delay of T/4 for the injection timing. This choice simplifies the architecture compared to DLL or PLL-based timing generation while achieving stable operation from −25 °C to 125 °C.
The second disadvantage of the pulsed driver is that a low power supply voltage is required to achieve high injection efficiency while reducing the oscillation amplitude, because $V_{DD} = V_{OSC} + V_{INJ}$ as discussed in Section III-B. In [10], [17], four extra on-chip voltage domains ($V_{DDM}$, $V_{DDL}$, $V_{SSL}$, and $V_{SSM}$) are generated from the main supply to control the oscillation amplitude, which introduces both power and area overheads. In [14] and [19], one additional lower supply ($V_{DDL}$) is used for the driver. The XO design in [12] operates from a single supply of 0.3V.
The third disadvantage of the pulsed driver is that it requires control signals in a high-voltage domain or with bootstrapping to activate the driver for pulsed injections. This requires extra circuits and results in power overhead due to switching loss, especially when these signals are in bootstrapped voltage domains.
In summary, the architecture with pulsed injections at 32kHz reduces the power for energy injection but increases the power for frequency or phase extraction and the power for timing control.

E. PULSE INJECTION AT SUBHARMONIC FREQUENCY
The concept of pulse injection at subharmonic frequency was proposed owing to three observations: 1) the driver for energy injection can be duty-cycled thanks to the high Q of the crystal, and the pulsed driver can be duty-cycled in a simpler way than the amplitude-based scheme for analog drivers; 2) duty-cycling the pulsed driver could reduce the injected noise on average; 3) injection at a lower frequency than 32kHz with a pulsed driver can reduce the switching loss of the timing control signals.
The work in [9], [19] introduces a 32kHz XO with high energy-to-noise-ratio (HERO) pulse injection at the 8th subharmonic frequency as shown in Fig. 16. A T/4-delay clock slicer is proposed to convert the sinusoidal crystal waveform to an output clock of 32kHz and to introduce a delay of T/4, providing proper timing for the energy injections. The output clock feeds frequency dividers and generates pulses to activate the proposed all-NMOS differential driver at 4kHz. It enables two injections in eight periods at the peak and valley of the crystal oscillation, with the crystal running freely between injections. This configuration reduces the noise injected into the crystal ($N_{INJ}$) as illustrated in Fig. 17 and achieves a 2ppb Allan deviation floor. The less frequent energy injections reduce injection overhead/cost, enabling the lowest-reported power consumption of published nW XOs (510pW). At 0.45V, this XO operates across a temperature range of −25 °C to 125 °C, the widest reported range for nW XOs.
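The effect of the subharmonic order on the injection rate can be tabulated with a few lines of code. The sketch assumes a nominal 32.768 kHz oscillation and that the dynamic power of the injection-control signals scales linearly with their switching frequency ($C V^2 f$); both the helper names and this scaling are illustrative assumptions, and the "4kHz" rate above corresponds to 4.096 kHz under the nominal frequency.

```python
F_OSC = 32768.0   # nominal crystal frequency, Hz


def injection_rate(n):
    """Rate (Hz) at which the driver is activated for N-th subharmonic injection."""
    return F_OSC / n


def relative_switching_power(n):
    """Switching power of the injection-control signals relative to N = 1.

    Assumes dynamic power proportional to switching frequency (C * V^2 * f)
    with the same capacitance and voltage swing for every N.
    """
    return 1.0 / n


for n in (1, 2, 4, 8, 16):
    print(f"N = {n:2d}: injection rate {injection_rate(n):8.1f} Hz, "
          f"relative switching power {relative_switching_power(n):.4f}")
```

The free-running interval between injections grows with N, so the choice of N trades switching power against how far the amplitude is allowed to decay between injections.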
Referring to our design methodology of partitioning the XO circuit into blocks according to their fundamental functions, the architecture with pulse injection at subharmonic frequency reduces the power for energy injection thanks to the pulsed driver, and it also reduces the power for timing control by activating the pulsed driver at a subharmonic frequency instead of 32kHz. In the work [9], [19], the T/4-delay slicer also presents one way to optimize the power for frequency or phase extraction while generating injection timing. By operating the blocks and signals in green in Fig. 16 at 4kHz instead of 32kHz, the switching power to activate the driver is reduced to about 1/8. Because less noise is injected into the crystal on average and the crystal runs freely between injections, pulse injection at subharmonic frequency also shows an advantage in noise performance. The measured Allan deviation of the XO design in [9], [19] is presented in Fig. 18. Three baseline designs are also measured, including a 1.9µW 1.1V Pierce XO on PCB with a discrete inverter as the inverting amplifier, the 2nW on-chip Pierce oscillator for startup (Fig. 16), and the same structure as Fig. 16 but with 32kHz injections. At a short averaging window (left part of Fig. 18), Allan deviation evaluates high-frequency noise, and the high-power XO on PCB shows much lower Allan deviation thanks to its lower thermal noise. The two on-chip baselines and HERO use the same T/4-delay slicer to convert the crystal waveform into the output clock. Because this low-power slicer limits the high-frequency noise performance, the yellow, red, and blue lines are close together on the far-left side of Fig. 18. With a relatively longer averaging window (middle part of Fig. 18), the advantage of the pulse injection at subharmonic frequency with respect to noise becomes evident.
When the averaging window is more than hundreds of seconds, the noise in the testing environment, including the nonideal temperature stability of the temperature chamber, dominates and determines the value of the Allan deviation [9].
If the crystal is presented with a pure 4kHz sinusoidal excitation, the oscillation cannot be sustained because the crystal can only see 32kHz signals due to its high Q. It may therefore seem confusing how the design in [9], [19] works with such a 4kHz injection. The key point is that pulse injection is used instead of pure sinusoidal injection. An interesting perspective [29] for understanding the subharmonic injection is as follows. The crystal's high Q dictates that it cannot see frequencies other than a very narrow range of components around the fundamental frequency. When supplied with pulsed injections at a subharmonic frequency, these pulses can be considered as square waves that are full of harmonics. If the circuit performs pulsed injection at the 8th subharmonic frequency, it sustains the oscillation because the crystal sees the 8th harmonic components in these pulses.
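The harmonic argument can be checked with the Fourier series of a rectangular pulse train. For a unit-height pulse train with duty cycle $d$ (pulse width divided by repetition period), the magnitude of the $k$-th coefficient is $|\sin(\pi k d)| / (\pi k)$. The 1% duty cycle below is a hypothetical value for illustration and is not taken from [9], [19].

```python
import math


def pulse_train_harmonic(k, duty):
    """Magnitude of the k-th Fourier-series coefficient of a unit-height
    rectangular pulse train with the given duty cycle (width / period)."""
    if k == 0:
        return duty
    return abs(math.sin(math.pi * k * duty) / (math.pi * k))


N = 8         # subharmonic order: pulses repeat at f_osc / 8
DUTY = 0.01   # hypothetical pulse width as a fraction of the injection period

for k in (1, 2, 4, 8, 9):
    print(f"harmonic {k}: |c_k| = {pulse_train_harmonic(k, DUTY):.5f}")

# A 50 % duty-cycle square wave has no even harmonics, so the narrowness of
# the pulses matters for energy to reach the 8th harmonic.
print(f"8th harmonic at 50 % duty: {pulse_train_harmonic(N, 0.5):.2e}")
```

A pure sinusoid at the injection rate has only the $k = 1$ component and therefore nothing at the crystal frequency, whereas a narrow pulse train carries a nonzero component at $k = 8$, which is the one the crystal responds to.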
The drawback of the injection at subharmonic frequency is the spurs that are introduced by the pulse injections. In [9], [19], injecting energy at the 8th-order subharmonic frequency of the oscillation introduces spurs at frequencies $m \cdot f_{osc}/8$, with $m$ = 1, 2, ..., 7, 9, .... Fig. 19 shows the power spectral density (PSD) measured with an HP 35670A signal analyzer, which clearly shows the expected spurs. After each pull-down injection at $V_1$ (Fig. 16), the dc value or the zero-crossing of the oscillation waveform would be different from its value in the following 7 periods by $V_{INJ}$. Though based on phase noise theory [24] this voltage step will cause minimal phase disturbance of the crystal oscillation, it can cause phase error and short-term frequency change at the output clock due to a non-ideal clock slicer. On average, this dc shift will not cause a large frequency deviation, but it may result in worse jitter performance [9]. The measured RMS period jitter of the proposed design is 230.8 ns (10000 samples with a Keysight 53230A frequency counter with single-shot time resolution of 20 ps and input frequency range of 1 mHz to 350 MHz), while the measured RMS period jitter of the three baseline designs reported in [9], [19] (on-chip Pierce, XO with 32kHz injection, and PCB Pierce) is 53.9 ns, 52.8 ns, and 0.5 ns, respectively. Considering the spurs and jitter performance, the design of [9], [19] is not suitable as a frequency reference for applications requiring low spurs and low jitter, such as communication circuits.
In [9], [19], $V_{DD}$ = 0.45V is used for the main blocks and $V_{DDL}$ = 0.15V is used for the driver and startup circuit. As with any additional supply, $V_{DDL}$ introduces power overhead because it must be generated from $V_{DD}$ with a low-dropout regulator or a 3-to-1 switched-capacitor DC-DC converter. In [12], a 32kHz XO with a single supply and subharmonic pulse injection is proposed. By using a more digital architecture, it operates from a single 0.3V supply from −20 °C to 80 °C. The design can be configured to perform injections at $f_{osc}/N$ (N = 1, 2, 4, 8, and 16), and at N = 16 the total power is measured as 0.74nW at 25 °C. Table 3 summarizes the performance of state-of-the-art nW XOs. The lowest reported power is 510pW in [9], [19] from two power supplies, 0.45V and 0.15V, and the measured power from the 0.15V supply to compensate the crystal loss is 55pW [9], which is 11% of the total power. Power for phase/frequency extraction and injection timing generation hence still dominates the total power consumption. We expect new design techniques in the future to further reduce the power consumption, pushing it closer to the fundamental limit discussed in this paper.

V. PERFORMANCE SUMMARY
In Table 3, XO designs in parallel resonance use oscillation amplitudes ranging from <30mV [13] to about 300mV [16]. The XO design in [11], [18] operates in series resonance and the common-mode peak-to-peak swing at one node of the crystal is about 100mV while the differential oscillation amplitude across the crystal is 2mV. As discussed in Section IV, these low amplitudes greatly help reduce power consumption.
To reduce crystal loss, all the sub-nW XO designs in Table 3 use less load capacitance than the load capacitance required to set the resonance frequency at 32.768kHz, which makes the XOs in parallel resonance show a higher output frequency than 32.768kHz, as shown in Fig. 7 in Section II-C. This frequency deviation from 32.768kHz is not an issue for sleep timers that do not require an exact frequency of 32.768 kHz. For real-time clock applications that generate 1 second from 32 kHz XOs, there are two solutions [9] for the designs in parallel mode to deal with frequency deviation from 32.768 kHz. The first solution is to use the required load capacitance specified by the crystal datasheet to set the output frequency to 32.768 kHz. For example, we can use the crystal optimized for low-power IoT [25] that requires $C_L$ = 3pF to achieve 32.768kHz. Because $C_L$ = 3pF could be larger than the load capacitance in these designs, this would increase the power to compensate the crystal loss. In [9], [19], by using $C_L$ = 3pF, the total power consumption can be estimated as 593 pW instead of 510pW at $C_L$ < 1.9pF. The second solution is to adjust the output frequency in the digital domain by using fractional division, which would introduce power and area overhead compared with the conventional chain of 15 divide-by-2 frequency dividers to obtain 1 second from 32.768 kHz. As for the XO designs [11], [18] in series resonance, one advantage is that the oscillation frequency is not sensitive to the variation of the load capacitance or the parasitic capacitances at the two nodes of the crystal, referring to (6). Despite this advantage, their output frequency with a 32.768kHz crystal is inherently lower than 32.768kHz because of series resonance, referring to (7), and a variable capacitor can be connected in series with the crystal for fine frequency adjustment [18], [30] if exactly 32.768kHz is required.
Please note that the row "Calibrations required to operate in PVT?" in Table 3 does not mean frequency calibration. This row indicates whether the XO designs require calibration to operate properly under PVT variations.
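The estimate of the total power at $C_L$ = 3pF can be approximated by scaling only the crystal-loss portion of the power budget quadratically with load capacitance. The sketch below assumes that the 55 pW measured on the 0.15 V supply [9] is the part that scales, that the remaining power is unchanged, and that the bound $C_L$ < 1.9pF can be used as the actual value; the function name is illustrative.

```python
def scaled_total_power(total_pw, crystal_loss_pw, c_old_pf, c_new_pf):
    """Total power (pW) after changing the load capacitance.

    Assumption: only the crystal-loss component scales, as (C_new / C_old)^2,
    while the rest of the power budget stays constant.
    """
    scaled_loss = crystal_loss_pw * (c_new_pf / c_old_pf) ** 2
    return total_pw - crystal_loss_pw + scaled_loss


estimate_pw = scaled_total_power(total_pw=510.0, crystal_loss_pw=55.0,
                                 c_old_pf=1.9, c_new_pf=3.0)
print(f"estimated total power at C_L = 3 pF: {estimate_pw:.1f} pW")
```

Under these assumptions the result lands within about 1 pW of the 593 pW quoted above, so the quadratic scaling of crystal loss alone appears sufficient to explain the estimate.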
A typical tested temperature range in Table 3 is −20 °C to 80 °C, and the design in [9], [19] shows the widest tested temperature range, −25 °C to 125 °C. Generally, the lowest operational temperature of the XO designs is limited by the power supply voltage. All the designs in Table 3 use relatively low power supply voltages to reduce power. Designs with relatively simple architectures that avoid analog amplifiers [12], [16] show their advantage in minimum power supply voltage across PVT variations.
The lowest reported Allan deviation floor is 2ppb in [9], [19], enabled by duty-cycled noise injection. Referring to the calculations and use models of wireless sensor nodes in [10], an Allan deviation of approximately 10ppb over a 1000s time window is already enough to make the power overhead due to the required guard band negligible. In these scenarios, reducing Allan deviation floor from 10ppb to 2 ppb would not make a difference from the perspective of power reduction, but it provides a technique to improve long-term frequency stability in scenarios where the Allan deviation of a baseline design is too large due to issues such as lower oscillation amplitude, worse resonator, or noisier environment [9].
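A first-order feel for the guard band can be obtained by multiplying the fractional frequency deviation by the elapsed time. This is a simplification and not the calculation used in [10], which is not reproduced here; the helper name is illustrative.

```python
def timing_uncertainty_s(allan_deviation, window_s):
    """First-order timing uncertainty (s) accumulated over `window_s`.

    Simplification: the accumulated time error is taken as the fractional
    frequency deviation multiplied by the elapsed time.
    """
    return allan_deviation * window_s


WINDOW_S = 1000.0   # averaging window discussed in the text, s

for adev_ppb in (10.0, 2.0):
    err_s = timing_uncertainty_s(adev_ppb * 1e-9, WINDOW_S)
    print(f"{adev_ppb:4.1f} ppb over {WINDOW_S:.0f} s -> ~{err_s * 1e6:.0f} us")
```

In this simple picture both values correspond to microseconds of uncertainty over a 1000 s window, which is consistent with the observation that moving from 10ppb to 2ppb changes the guard-band overhead very little.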

VI. RESEARCH DIRECTIONS
In addition to power reduction, there are several ongoing research directions related to 32kHz XOs.

A. SYNTHESIZABLE CRYSTAL OSCILLATOR
An XO structure with a pulsed driver was proposed in [17] to improve energy injection efficiency. One observation about the pulsed driver is that digital driver operation also offers the possibility of digital operation in the XO control loop, allowing the entire XO design to be more process portable. In [10], [14], analog blocks such as clock slicers [10] or phase detectors [14] are used to extract phase information from the crystal waveform to properly time pulse injection. These analog blocks require moderately high design and verification effort to meet performance targets across PVT corners. Moreover, they typically require a higher power supply voltage than digital standard cells.
Work [12] uses a sub-harmonic peak-detection DLL to control the pulse injections. In this design, only the slicer and peak detector are analog circuits, and both use inverter-like structures. Thanks to digital operation, it uses a single 0.3V power supply while achieving stable operation from −20 °C to 80 °C. By further simplifying the analog blocks in crystal oscillators, fully digital operation may be achievable with commensurate process portability and design effort benefits.

B. ULTRA-LOW POWER 32KHZ TEMPERATURE-COMPENSATED CRYSTAL OSCILLATOR
Without temperature compensation, 32kHz XOs show 100s of ppm frequency variation across typical temperature ranges. By applying temperature compensation to existing ultra-low power XOs, single digit ppm frequency variation across temperature may be achievable while keeping the total power lower than 50nW.
Towards this goal, an ultra-low power temperature-compensated crystal oscillator (TCXO) with a pulsed driver [31] was proposed at VLSI 2021. Temperature compensation is achieved using a single switched load capacitor, modulated by a sigma-delta modulator (ΣΔM). A piecewise linear approximation of the crystal temperature dependence is used, and a 4-bit temperature sensor is implemented to select the segment of the piecewise linear approximation. It achieves an accuracy of ±4.2ppm across −20 °C to 85 °C with 3-point trimming. The power consumption is measured as 43nW at 25 °C, which is about 8× lower than prior state-of-the-art 32kHz TCXOs [32]-[34].
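The idea of a piecewise linear approximation can be illustrated with the parabolic temperature characteristic commonly quoted for 32 kHz tuning-fork crystals, $\Delta f / f = -\beta (T - T_0)^2$. The sketch below is not the implementation in [31]: the coefficient $\beta$ = 0.034 ppm/°C², the turnover temperature $T_0$ = 25 °C, and the use of 16 equal-width segments (one per code of a 4-bit sensor) are all assumptions chosen for illustration.

```python
BETA = 0.034                  # assumed parabolic coefficient, ppm / degC^2
T0 = 25.0                     # assumed turnover temperature, degC
T_MIN, T_MAX = -20.0, 85.0    # temperature range from the text, degC
SEGMENTS = 16                 # assumed: one segment per 4-bit sensor code


def crystal_ppm(temp_c):
    """Assumed crystal frequency deviation (ppm) at a given temperature."""
    return -BETA * (temp_c - T0) ** 2


def pwl_ppm(temp_c):
    """Piecewise-linear approximation using chords between segment endpoints."""
    width = (T_MAX - T_MIN) / SEGMENTS
    idx = min(int((temp_c - T_MIN) / width), SEGMENTS - 1)
    t_lo = T_MIN + idx * width
    f_lo, f_hi = crystal_ppm(t_lo), crystal_ppm(t_lo + width)
    return f_lo + (f_hi - f_lo) * (temp_c - t_lo) / width


temps = [T_MIN + i * (T_MAX - T_MIN) / 2000 for i in range(2001)]
uncompensated = max(abs(crystal_ppm(t)) for t in temps)
residual = max(abs(crystal_ppm(t) - pwl_ppm(t)) for t in temps)
print(f"worst-case deviation without compensation: {uncompensated:.1f} ppm")
print(f"worst-case approximation error with {SEGMENTS} segments: {residual:.2f} ppm")
```

With these assumed numbers the approximation error of the segments themselves is well below 1 ppm, which suggests that in practice the achievable accuracy is set by other factors such as sensor error, trimming, and part-to-part spread of the crystal parameters.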

C. 32KHZ MEMS OSCILLATORS
Though crystal resonators have gone through significant size reductions in the last decades, this trend may be nearing an end: the packaged resonator size reduction has slowed from 50% in 2 years to 50% in 10 years [35]. Being notably more compact than quartz crystals, MEMS resonators have received great research attention over the last decade [33], [36]-[38]. In addition to package size, MEMS-based oscillators outperform quartz crystal-based oscillators in several other aspects [39], including 1) higher reliability and lower failure rate; and 2) lower electromagnetic, vibration, and acceleration sensitivity. Meanwhile, combining both the resonator and the oscillator circuit in a single package leads to superior stability, improved robustness, and lower power by minimizing environmental effects and parasitic capacitances [32]. These timing modules may become ubiquitous and fuel the emerging IoT computing class.
Please note that most of the reviewed low-power techniques and design methodologies in Section IV can be applied to oscillators or frequency references with other types of resonators in addition to quartz crystals, like MEMS resonators and steel tuning forks. For example, it would be very compelling to achieve nW 32kHz MEMS oscillators with sub-mm chip scale packages that use pulsed drivers and subharmonic injections.

VII. CONCLUSION
This paper presents a review of ultra-low power 32kHz crystal oscillators. We hope that the discussions on the fundamentals of crystal oscillators and the related analyses will help readers understand the evolution of design techniques in recent nW XOs. The performance of state-of-the-art nW XOs is summarized, and research directions related to 32kHz crystal oscillators are briefly introduced.