Low-Power RF Wake-Up Receivers: Analysis, Tradeoffs, and Design

Wake-up receivers (WuRXs) offer a potentially energy-efficient means to enable asynchronous wake-up of higher power and higher performance radios without needing frequent (often energy-expensive) synchronization. Since WuRXs are typically on for a large percentage of the time, keeping their power consumption very low is critical to minimizing the total energy draw. However, this is difficult while maintaining good sensitivity, interference resiliency, and robustness, all with application-appropriate wake-up latencies and form factors. This article reviews the main challenges facing WuRXs, outlines the most popular WuRX architectures, and details essential design techniques and tradeoffs toward enabling utility in emerging applications.

network. For many standards-based radios, the target latency is imposed by the standard or by an application-dependent communication latency constraint. If the active WuRX power consumption is low, energy savings can be substantial for applications that impose frequent synchronization events but do not require frequent transmissions of data packets.
To be beneficial over a standard wake-on radio approach, WuRXs must simultaneously balance performance across several metrics.
1) Sensitivity: The sensitivity of the WuRX should be as good as or better than that of the main receiver. Otherwise, the distance between networked nodes must shrink to accommodate the shorter range of the WuRX, which may or may not be acceptable depending on the application.
2) Power: Since the WuRX typically dominates power consumption in standby mode, its power consumption should be as low as possible to preserve battery life. To be beneficial, the WuRX power should be lower than the average power of the main radio at iso-latency.
3) Wake-Up Latency: The WuRX should respond to wake-up requests with a reasonable latency, as defined by application requirements. In some applications, latencies of multiple seconds are acceptable, while others require latencies of microseconds. The latency may be limited by multiple factors, such as hardware delays or the on-air time of wake-up messages.
4) Interference Resiliency: Like the main radios they wake up, WuRXs may operate in congested wireless environments and, thus, should offer some ability to reject wireless interferers, ideally commensurate with the capabilities of the main radio.
5) Robustness: WuRXs may need to operate reliably across wide temperature and supply voltage ranges and should ideally require minimal calibration.
6) Form Factor: WuRXs should ideally have small form factors, including the antenna and passive components, to fit into small-form-factor IoT devices that rely on small batteries or energy harvesters.
7) Standards Compliance: In some cases, adherence to commercial standards is important.
Compared to a conventional main radio, the key tradeoff a WuRX makes to achieve these specifications is primarily in the bandwidth of the wake-up signal.
The main radio requires a large bandwidth to support its throughput needs, and its sensitivity is governed by the well-known sensitivity equation
$$P_{\mathrm{sen}} = -174\,\mathrm{dBm/Hz} + 10\log_{10}(BW) + NF + 10\log_{10}(\mathrm{SNR}_{\min}) \qquad (1)$$
where BW is the bandwidth, NF is the noise figure, and SNR_min is the minimum SNR required to demodulate at a prespecified bit error rate (BER). A low-NF front-end is required in such main radios to achieve good sensitivity. Unfortunately, RF front-ends have a fundamental noise-power tradeoff that makes it difficult to achieve both a good NF and very low power. Instead, WuRXs tend to significantly reduce the signal (and, thus, the noise) bandwidth, so that a low-NF front-end is not required to achieve the same overall sensitivity as the main radio. As long as the wake-up latency, which is related to the wake-up signal's bandwidth, is commensurate with application demands, this can be an acceptable tradeoff. In some cases, blocker rejection capability can also be traded for lower power consumption. While WuRXs can, in principle, use any receiver architecture, the desire for low power also necessitates low-complexity architectures operating with low-complexity modulation schemes (e.g., OOK or FSK). Most modern standards do not natively support such simple modulation schemes, or if they do, they do so at much larger bandwidths than desired for WuRX purposes. As a result, much of the published literature on WuRXs is not directly compatible with modern standards. While some standards are now starting to include a WuRX mode (e.g., IEEE 802.11ba in Wi-Fi), most designs that operate with commercial radios employ a technique called back-channel communication, whereby the bits fed to the transmitter that delivers the wake-up packet are carefully selected to make the transmitted packet look like a lower complexity, lower-bandwidth waveform that is more readily detectable by a low-power WuRX [4].
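As a quick numerical illustration of the sensitivity equation (1), the sketch below compares a hypothetical main radio (1-MHz bandwidth, 5-dB NF) with a hypothetical WuRX (1-kHz bandwidth) at the same assumed 10-dB SNR_min. The numbers are illustrative assumptions, not values from any cited design; they show how a 30-dB bandwidth reduction can absorb a 30-dB worse NF at equal sensitivity.

```python
import math


def sensitivity_dbm(bw_hz: float, nf_db: float, snr_min_db: float) -> float:
    """Sensitivity per (1): -174 dBm/Hz + 10*log10(BW) + NF + 10*log10(SNR_min)."""
    return -174.0 + 10.0 * math.log10(bw_hz) + nf_db + snr_min_db


# Hypothetical main radio: wide bandwidth, low-NF front-end.
main_radio = sensitivity_dbm(bw_hz=1e6, nf_db=5.0, snr_min_db=10.0)
# Hypothetical WuRX: 1000x narrower bandwidth, 30-dB worse NF.
wurx = sensitivity_dbm(bw_hz=1e3, nf_db=35.0, snr_min_db=10.0)

print(f"main radio: {main_radio:.1f} dBm, WuRX: {wurx:.1f} dBm")
```

Both cases come out at about −99 dBm, which is the essence of the bandwidth-for-NF trade: the WuRX pays for its relaxed front-end with a slower wake-up signal.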
As will be discussed shortly, most of the work that utilizes this approach ends up consuming more power than a WuRX that is not standards compatible, largely due to the bandwidth limitations imposed by even carefully crafted wake-up packets, in addition to the desire to achieve interference resiliency similar to what the underlying standards require.
Another benefit of back-channel communication is that it can wake up a backscatter modulator that reflects and modulates incident standards-derived signals at much lower power than a conventional active transmitter can achieve (e.g., ∼1000× lower power Wi-Fi compatible communication in [3]). Knowing exactly when to enable backscattering is critical to ensure symbol-level synchronization of the incident and backscattered signals for proper decodability. Since a conventional receiver consumes too much power to do this, a WuRX with appropriately designed wake-up latency, in some cases using a hierarchical wake-up approach, can meet the application needs at low overall power [5].
The purpose of this article is to review low-complexity architectures that are finding popularity in WuRXs and discuss how these architectures trade off sensitivity, power, latency, interference resiliency, and robustness. Section II reviews the most prominent architectures, while Sections III-V study the tradeoffs involved in direct-demodulation, transmitted local oscillator (LO), and on-chip LO architectures, respectively. Section VI then describes how duty cycling can be applied to these architectures to modulate the power-latency tradeoff. Section VII describes baseband (BB) correlator structures. Section VIII then describes various figures of merit (FoMs) that capture the tradeoffs between the different architectures, with the ultimate goal of giving the reader sufficient information to make informed design choices when building a WuRX. Finally, Section IX offers concluding thoughts and a brief discussion on future directions.

II. OVERVIEW OF WURX ARCHITECTURES
There are three prevalent WuRX architectures.
1) A direct demodulation architecture, which after some passive or active RF amplification and filtering, directly passes the signal to an envelope detector (ED) for demodulation to BB. 2) A transmitted LO scheme, which utilizes the same general architecture as the direct demodulation architecture, but is used for signals which include a transmitted tone that, through the nonlinear action of the ED, enables downconversion to an intermediate frequency (IF) instead of BB. 3) A more conventional heterodyne architecture featuring on-chip LO generation used for downconversion prior to demodulation. The first two architectures tend to be used for the lowest power applications at the expense of some degree of interference resiliency, while the on-chip LO architectures tend to consume much higher power yet offer better resiliency. This section will briefly summarize each architecture before diving into the detailed design tradeoffs in subsequent sections.

A. DIRECT DEMODULATION
As the name implies, a direct demodulation architecture directly demodulates the incident RF signal without prior downconversion to IF or BB. This occurs by routing the incident RF signal to an ED that, through its inherent nonlinearity, performs implicit downconversion to BB. The main reason this approach has found popularity is its ability to achieve extremely low power: no active RF circuits are needed, and the only power-consuming blocks are at BB, which can be very low power due to the low-frequency operation. Since there is no ability to separate I from Q or to easily distinguish frequencies around an RF carrier, direct-ED architectures tend to work with OOK signals.
A generic direct-ED architecture is shown in Fig. 1(a). One of the most important blocks here is the input matching network: it provides passive voltage gain set ideally by the square root of the ratio of the input impedance of the first stage (either an RF amplifier or the ED itself) to the impedance of the source (in most cases, a 50-Ω antenna). Since the ED output is proportional to the square of its input voltage, any RF voltage gain achieved prior to envelope detection directly improves the resulting sensitivity. Since the matching network can provide this gain for free (at no power cost), it is a critical block to design well. In addition, the matching network can (and should) also provide some RF filtering. Matching networks in direct-ED architectures are typically designed using high-Q off-chip inductors to maximize the achievable output impedance and, thus, the voltage gain. Voltage gains on the order of 25-30 dB have been demonstrated at sub-GHz frequencies [6], [7], [8], with lower gains such as 13.5 dB at 9 GHz [9]. In practice, the achievable gain may ultimately be limited by component tolerances and matching. An alternative approach is to design a high-impedance antenna that directly interfaces with the ED [10].
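The passive gain of an ideal, lossless L-section match follows directly from the impedance ratio: A_V = √(R_in/R_s), with a network Q of √(R_in/R_s − 1). The sketch below assumes a 50-Ω source, a hypothetical 50-kΩ ED input resistance, and a hypothetical 433-MHz carrier, and it places an inductor in the series (low-resistance) branch; it ignores inductor loss, which in practice caps the achievable R_in.

```python
import math


def l_match(r_source: float, r_load: float, f_hz: float) -> tuple[float, float, float]:
    """Ideal lossless L-match stepping r_source up to r_load (r_load > r_source).

    Returns (voltage gain in dB, network Q, series inductance in H), assuming the
    series element on the low-resistance side is an inductor.
    """
    ratio = r_load / r_source
    gain_db = 20.0 * math.log10(math.sqrt(ratio))      # A_V = sqrt(R_in / R_s)
    q = math.sqrt(ratio - 1.0)                          # Q of the L-section
    l_series = q * r_source / (2.0 * math.pi * f_hz)    # series reactance X_s = Q * R_s
    return gain_db, q, l_series


gain_db, q, l_series = l_match(r_source=50.0, r_load=50e3, f_hz=433e6)
print(f"A_V = {gain_db:.1f} dB, Q = {q:.1f}, L = {l_series * 1e9:.0f} nH")
```

The resulting 30-dB gain is in line with the 25-30-dB values cited above, and the Q of about 32 that it requires hints at why component losses and tolerances limit practical designs.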
The ED can be passive or active for demodulating purposes. Passive EDs [8], [10] have zero power consumption and good noise performance since they do not exhibit flicker noise. However, their ability to achieve a high conversion ratio is dictated by the number of cascaded stages, which trades off with input impedance and, thus, with passive pre-ED gain. Active EDs utilizing common-gate (CG) [11] or common-source (CS) [7] architectures, on the other hand, can achieve higher conversion gain at the cost of active power and flicker noise. After the ED, BB circuits consisting of a BB amplifier, a digitizer (which in many cases is a simple 1-bit comparator), and some digital logic (e.g., a correlator) are used to perform demodulation.
During the past few years, sensitivities down to −80 dBm with power below 10 nW have been achieved [6], [7], [8], [11], [12], [13]. Adding active RF gain stages before the ED can improve the SNR by suppressing the noise contribution from the following stages. However, such stages typically consume microwatts to milliwatts, and their power grows with frequency. Sensitivities down to −86 dBm have been achieved with 10s of μW [14], [15], [16]. To further reduce power consumption at the cost of increased latency, bit-level duty cycling (BLDC) can be employed; −106-dBm sensitivity has been demonstrated with only 33 nW using an off-chip high-Q MEMS filter [17]. Section III describes these designs and their relevant tradeoffs in more detail.

B. TRANSMITTED-LO
One critical issue with the direct-demodulation approach is that the incident signal is demodulated directly to BB. This creates problems with dc offsets, flicker noise, limited filtering ability, and RF self-mixing. While capacitive coupling can remove the dc offset, the required capacitance can be prohibitively large, and it does not address the other issues. If the signal could instead be mixed down to a low IF, some of these issues could be addressed by allowing bandpass filtering [18].
A transmitted LO scheme paired with a direct demodulation architecture can achieve this type of low-IF downconversion [19]. As shown in Fig. 1(b), the transmitted LO approach effectively mixes the incident LO with the signal itself through the nonlinear ED, placing the incident signal at a low IF. The low-IF signal can then be filtered and further demodulated to BB by a second ED. This sort of IF channelization creates opportunities for better interference rejection and enables channel selection. Wideband multitone transmissions have also been demonstrated [20], [21]. The penalties for IF channelization are increased linearity requirements on the transmitter and a 3-dB sensitivity reduction (compared to a single-tone transmission). The sensitivity degradation can be understood by comparing the RF amplitudes that must be transmitted to generate a similar amplitude at the ED output at dc in the case of a single tone versus at an IF in the case of two tones [19].
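The downconversion performed by the ED in a transmitted-LO scheme can be checked with an ideal square-law model. In the sketch below, two equal tones (scaled-down, hypothetical stand-ins for the signal tone and the transmitted LO) pass through y = v²; besides dc and components near twice the carrier, the output contains a tone at the difference frequency |f2 − f1| with amplitude a², which is the low-IF signal the following bandpass filter selects. The ideal square law and all frequencies are assumptions for illustration.

```python
import numpy as np

fs = 10e6                 # sample rate (Hz), hypothetical
n = 10_000                # 1-ms window: every tone below completes whole cycles
t = np.arange(n) / fs

a = 1e-3                  # per-tone amplitude at the ED input (V), hypothetical
f1, f2 = 1.00e6, 1.05e6   # stand-ins for the signal tone and the transmitted LO
v_in = a * np.cos(2 * np.pi * f1 * t) + a * np.cos(2 * np.pi * f2 * t)
v_ed = v_in ** 2          # ideal square-law envelope detector


def tone_amplitude(x: np.ndarray, f: float, t: np.ndarray) -> float:
    """Amplitude of the sinusoidal component of x at frequency f (single-bin DFT)."""
    return 2.0 * abs(np.sum(x * np.exp(-2j * np.pi * f * t))) / len(x)


dc_level = v_ed.mean()
if_amplitude = tone_amplitude(v_ed, f2 - f1, t)   # component at the 50-kHz IF
print(f"dc = {dc_level:.3e} V, IF tone = {if_amplitude:.3e} V")
```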
The TX linearity requirement due to the nonconstant-envelope output can be relaxed by adopting a BB Manchester-coded OOK scheme, or channel-embedded ON-OFF keying (CE-OOK) [22]. This provides 4/π (+2 dB) better voltage conversion gain than the two-tone modulated case (for similar peak amplitudes), which is advantageous if the transmitter is limited by peak output power due to regulations or device breakdown constraints. The penalty for adopting CE-OOK over the simple two-tone method is that CE-OOK requires a sharper bandpass filter at the IF to suppress the higher odd-order harmonics of the BB square wave. However, given that this filtering occurs at the IF, it can be realized at low power.
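The 4/π figure can be reproduced numerically under an ideal square-law assumption. The sketch below drives the ED with (i) two tones whose combined peak is P and (ii) a carrier of peak P gated on and off by a 50%-duty square wave at the IF, a simplified stand-in for the Manchester-coded/CE-OOK waveform, and compares the IF fundamentals at the ED output. The sample rate, carrier, and IF are hypothetical; the carrier is chosen so that twice its frequency is an odd multiple of the IF, which keeps the products near 2f_c from landing on the IF bin in this short simulation.

```python
import numpy as np

fs = 10e6                       # sample rate (Hz), hypothetical
n = 10_000                      # 1-ms window (whole cycles of every tone used)
k = np.arange(n)
t = k / fs
f_if = 50e3                     # IF (Hz), hypothetical
f_c = 1.025e6                   # carrier; 2*f_c = 41*f_if (odd multiple, see text)
peak = 1.0                      # common peak amplitude P (normalized)


def tone_amplitude(x: np.ndarray, f: float) -> float:
    """Amplitude of the sinusoidal component of x at frequency f."""
    return 2.0 * abs(np.sum(x * np.exp(-2j * np.pi * f * t))) / len(x)


# (i) Two tones at f_c -/+ f_if/2, each of amplitude P/2 so the peak is P.
two_tone = (peak / 2) * (np.cos(2 * np.pi * (f_c - f_if / 2) * t)
                         + np.cos(2 * np.pi * (f_c + f_if / 2) * t))

# (ii) Carrier of amplitude P gated by a 50%-duty square wave at f_if.
samples_per_half = int(fs / (2 * f_if))                 # 100 samples
gate = ((k // samples_per_half) % 2 == 0).astype(float)
gated = peak * gate * np.cos(2 * np.pi * f_c * t)

# Ideal square-law ED, then compare the IF fundamentals.
ratio = tone_amplitude(gated ** 2, f_if) / tone_amplitude(two_tone ** 2, f_if)
print(f"ratio = {ratio:.4f} (4/pi = {4 / np.pi:.4f}), {20 * np.log10(ratio):.2f} dB")
```

The ratio comes out at about 1.273 (≈2.1 dB): the two-tone IF amplitude is (P/2)², while the gated carrier's fundamental is P²/π, matching the +2-dB advantage quoted above.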

C. ON-CHIP LO
In contrast to the other two main architectures, the third popular architecture generates an LO on-chip to explicitly mix down the incident RF signal to an IF or BB, as shown in Fig. 1(c). Mixing down to BB/IF allows more power-efficient amplification than at RF, and it is generally possible to design sharp yet low-power BB or IF filters to remove circuit noise and interferers at nearby frequency offsets. Therefore, the mixer-based approach is advantageous in terms of sensitivity and interference resilience compared to the direct-ED approach. However, LO generation requires significant power and, thus, mixer-based architectures are generally used in applications where 10-100s-μW power levels are acceptable [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35].
Depending on which frequency the signal is downconverted to, the on-chip LO mixer-based WuRX architectures can be classified into the following categories.

1) ZERO-IF
The incident RF signal is downconverted directly to BB. Since the OOK signal is its own image and does not cause demodulation issues, the zero-IF approach ideally does not require an I/Q mixer, especially when used with multicarrier signals such as MC-OOK in [30], which saves extra LO buffer power. However, careful design is required to combat 1/f noise in applications with low BB bandwidths/data rates, such as LPWAN.

2) LOW-IF
For the heterodyne approach, the RF signal is downconverted to a known IF that is high enough to be away from the 1/f noise corner frequency while still low enough for power-efficient bandpass filtering and amplification. This approach, however, simultaneously downconverts any unwanted jammers located at the image frequency to the same IF as the desired signal. If jammers at the image frequency must be tolerated, either a front-end image rejection filter must be adopted, or an I/Q mixer is required to enable on-chip image rejection techniques at the expense of higher LO buffer power.
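To make the image problem concrete: with a single real (non-I/Q) mixer, inputs at f_LO + f_IF and f_LO − f_IF both land at the same IF. The sketch below uses hypothetical frequencies in MHz (a 2440-MHz wake-up signal and a 2-MHz IF with low-side LO injection) to locate the image that a front-end filter or I/Q mixer would have to suppress.

```python
def if_frequency(f_in: float, f_lo: float) -> float:
    """IF produced by a real mixer: only |f_in - f_lo| survives after IF filtering."""
    return abs(f_in - f_lo)


def image_frequency(f_rf: float, f_lo: float) -> float:
    """The other input frequency that maps to the same IF (mirror of f_rf about f_lo)."""
    return 2.0 * f_lo - f_rf


f_rf, f_if = 2440.0, 2.0          # MHz, hypothetical
f_lo = f_rf - f_if                # low-side injection
f_img = image_frequency(f_rf, f_lo)
print(f"LO = {f_lo} MHz, image = {f_img} MHz, "
      f"IF(rf) = {if_frequency(f_rf, f_lo)} MHz, IF(image) = {if_frequency(f_img, f_lo)} MHz")
```

Here the image sits at 2436 MHz, only 2·f_IF = 4 MHz from the desired signal, which sets how selective a front-end image rejection filter would need to be at a low IF.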

3) UNCERTAIN-IF
The uncertain-IF architecture adopts a free-running oscillator, not locked by a PLL, to downconvert the high-frequency input RF signal to a relatively lower IF [23], [36]. The benefit is that the power of LO generation can be much lower than in a PLL-based approach and that high gain can still be realized at IF with relatively low power (assuming the downconversion is low noise). The downside, however, is that since the LO is free-running, its exact frequency is uncertain and, thus, the IF bandwidth (including its noise bandwidth) can potentially be very large. A large IF noise bandwidth entering an ED can result in significant demodulated noise, resulting in a poor overall noise figure.
The LO uncertainty can be tightened by opting for a robust oscillator topology, such as an LC-VCO with high-Q components [26], and by oscillator calibration [34].
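A rough way to size the uncertain-IF penalty is to note that the IF filter must pass the signal wherever the free-running LO places it, so its bandwidth grows to roughly the signal bandwidth plus twice the LO frequency error. The sketch below assumes a hypothetical 2.4-GHz LO with ±1% (uncalibrated) or ±0.1% (calibrated) frequency uncertainty and a 100-kHz signal bandwidth, and reports the resulting increase in noise power presented to the ED relative to a filter matched to the signal. These error figures are illustrative, not measured values.

```python
import math


def uncertain_if_noise_penalty_db(f_lo_hz: float, lo_error: float, bw_signal_hz: float) -> float:
    """Extra noise power (dB) reaching the ED when the IF filter must also
    cover +/- lo_error * f_lo of LO frequency uncertainty."""
    bw_if = bw_signal_hz + 2.0 * lo_error * f_lo_hz
    return 10.0 * math.log10(bw_if / bw_signal_hz)


for err in (0.01, 0.001):  # hypothetical +/-1% and +/-0.1% LO uncertainty
    penalty = uncertain_if_noise_penalty_db(2.4e9, err, 100e3)
    print(f"LO error ±{err:.1%}: IF noise bandwidth penalty ≈ {penalty:.1f} dB")
```

How much of this noise-power increase appears as lost sensitivity depends on whether the ED output is dominated by self-mixing or signal-to-noise mixing, but the roughly 10-dB difference between the two cases illustrates why tightening the LO uncertainty pays off.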

A. ARCHITECTURE OVERVIEW AND MODELING
The direct demodulated architecture shown in Fig. 1(a) can be subdivided into "LNA-ED-first" or "ED-first" categories based on whether the receiver includes an RF gain stage before the ED. In either case, an OOK-modulated RF signal passes through a matching network and is filtered before the ED to reduce the total noise rectified by the ED. In an ED-first architecture, the matching network directly matches the antenna impedance (typically 50 Ω) to a high input impedance, providing high voltage gain to increase the signal level at the ED input. The ED provides square-law detection of the input signal with limited bandwidth, which passes only the signal's envelope and filters out the second harmonic of the carrier. Due to the lack of RF gain, the ED output is typically small enough to require BB amplifiers that amplify the signal prior to digitization. A digital correlator follows the digitizer to compare the signal against reference codes to determine whether the input matches a wake-up command.
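The correlator step can be sketched as a sliding comparison between the digitized bit stream and a stored wake-up code. The sketch below counts matching bits at each offset and flags offsets that reach a threshold, which tolerates a few bit errors; the 8-bit code, the threshold, and the received stream are hypothetical, and real designs differ in code length, oversampling, and how the threshold is set.

```python
from typing import Sequence


def find_wakeups(bits: Sequence[int], code: Sequence[int], threshold: int) -> list[int]:
    """Return start indices where the 0/1 stream agrees with `code` in at
    least `threshold` positions (a sliding Hamming-agreement correlator)."""
    n = len(code)
    hits = []
    for start in range(len(bits) - n + 1):
        window = bits[start:start + n]
        agreement = sum(1 for b, c in zip(window, code) if b == c)
        if agreement >= threshold:
            hits.append(start)
    return hits


wake_code = [1, 0, 1, 1, 0, 0, 1, 0]                   # hypothetical wake-up code
received = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0]        # code at index 2, one bit flipped
print(find_wakeups(received, wake_code, threshold=7))  # -> [2]
```

Lowering the threshold trades false-wake-up probability for robustness to bit errors; requiring all eight bits here would miss the corrupted code entirely.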
There are three major noise mechanisms in direct demodulated WuRXs: 1) signal-to-noise mixing; 2) self-mixing; and 3) BB noise. Since the input matching gain before the ED is typically relatively low in an ED-first architecture, the dominant noise source is ED/BB noise. By properly designing the detector output impedance and BB bias current, the noise contribution of the BB amplifier can be minimized [37]. The sensitivity of the ED-first architecture has been explored in [6], [12], [37], and other works. Assuming the ED is driven by a source resistance R_s, as shown in Fig. 2, the output voltage is derived in [37] as (2), where μ_det is the open-circuit voltage sensitivity of a single-stage ED with units of 1/V, N is the number of stages in the ED, R_D is the device channel resistance, and P_RF is the available power from the source. The SNR is then derived in [37] as (3), where k_B is Boltzmann's constant, T is the temperature in kelvin, and B is the noise bandwidth. The optimal N for the highest SNR is R_D/R_s, which corresponds to a power match between the source and the ED. Thus, by replacing the SNR with SNR_min, the minimum SNR required for demodulation, B with the BB bandwidth BW_BB, and N with R_D/R_s, the minimum detectable signal (in dBW) is found [37]
$$P_{\mathrm{MDS}} = 5\log_{10}(16k_BT) + 5\log_{10}(BW_{BB}) + 5\log_{10}(\mathrm{SNR}_{\min}) - 5\log_{10}(R_s) - 10\log_{10}(\mu_{\det}) \qquad (4)$$
From (4), the optimal sensitivity is achieved when the source resistance is arbitrarily large while the impedance match is maintained. With the inductor's limited Q factor, especially at high frequencies, the maximum achievable source resistance is typically 10s of kΩ. The minimum detectable signal is plotted in Fig. 3 assuming μ_det = 10 V⁻¹, BW_BB = 100 Hz, and SNR_min = 10 dB. Given that μ_det is inversely proportional to the subthreshold slope factor [13], [37], which is technology-dependent, the sensitivity is limited to −80 dBm for these assumptions unless the data rate is reduced significantly or the technology improves significantly to increase μ_det.
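The scaling in (4) can be explored numerically. The sketch below assumes (4) takes the form P_MDS = √(16k_BT·BW_BB·SNR_min/R_s)/μ_det (in watts), a reconstruction of the partially recovered expression whose R_s and μ_det dependence is inferred from the surrounding discussion, and uses the Fig. 3 assumptions μ_det = 10 V⁻¹, BW_BB = 100 Hz, and SNR_min = 10 dB at an assumed T = 300 K. Under these assumptions, an R_s of about 10 kΩ lands near −81 dBm, and each decade of R_s buys 5 dB.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)


def p_mds_dbm(mu_det: float, bw_bb_hz: float, snr_min_db: float,
              r_s_ohm: float, temp_k: float = 300.0) -> float:
    """Minimum detectable signal of a power-matched ED-first receiver, assuming
    P_MDS = sqrt(16*k_B*T*BW_BB*SNR_min / R_s) / mu_det (reconstructed form of (4))."""
    snr_min = 10.0 ** (snr_min_db / 10.0)
    p_w = math.sqrt(16.0 * K_B * temp_k * bw_bb_hz * snr_min / r_s_ohm) / mu_det
    return 10.0 * math.log10(p_w) + 30.0  # W -> dBm


for r_s in (1e3, 10e3, 50e3):  # source resistances seen by the ED after matching
    print(f"R_s = {r_s / 1e3:4.0f} kOhm: P_MDS = {p_mds_dbm(10.0, 100.0, 10.0, r_s):.1f} dBm")
```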
Active RF gain stages can be added before the ED to further improve the SNR in the LNA-ED-first category. Increasing the RF front-end gain causes signal-to-noise mixing or self-mixing noise to become the dominant contributor compared to the BB noise. Beyond that point, further increasing the gain does not improve the SNR if the front-end NF stays the same; on the contrary, it increases RF power consumption and decreases linearity. The match from the 50-Ω antenna to the high-impedance LNA input provides an advantageous voltage gain, similar to the ED-first case. As shown in Fig. 4, both the voltage gain and the LNA noise figure improve due to high-Q matching.
Huang et al. [38] explored the sensitivity of the LNA-ED-first architecture. As shown in Fig. 5, envelope detection includes both self-mixing noise and signal-to-noise mixing. By decreasing the RF bandwidth before the ED, the self-mixing noise level drops, and signal-to-noise mixing eventually dominates. Assuming BW_BB = 10 kHz, the sensitivity versus RF bandwidth is shown in Fig. 6.
When either signal-to-noise mixing or self-mixing noise dominates, the minimum detectable signal equations simplify to (5) and (6), respectively [38], where the self-mixing-limited case is
$$P_{\mathrm{MDS},N^2} = 2k_BT\,F_{FE}\sqrt{BW_{\mathrm{filter}}\cdot BW_{BB}\cdot \mathrm{SNR}_{\min}} \qquad (6)$$
where F_FE is the noise factor before the ED, and BW_filter is the RF bandwidth. These equations assume a noiseless BB, an ideal filter, and an RF bandwidth much larger than the BB bandwidth. From (5), (6), and Fig. 6, reducing the RF bandwidth until signal-to-noise mixing dominates improves overall SNR and sensitivity. Once signal-to-noise mixing is the dominant noise source, reducing the RF bandwidth further does not help sensitivity but increases the filter's complexity and power. Thus, a filter with the proper bandwidth before the ED is necessary to achieve optimal sensitivity while avoiding overdesign.
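To see the bandwidth dependence in the self-mixing-limited regime, the sketch below evaluates (6) assuming the square-root form P_MDS,N² = 2k_BT·F_FE·√(BW_filter·BW_BB·SNR_min), which is the form needed for the expression to have units of power. BW_BB = 10 kHz follows Fig. 6, while the 5-dB front-end NF and 10-dB SNR_min are hypothetical. Each 10× reduction of BW_filter lowers the self-mixing-limited P_MDS by 5 dB, which is why narrowing the RF filter helps only until signal-to-noise mixing takes over.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)


def p_mds_self_mixing_dbm(nf_fe_db: float, bw_filter_hz: float, bw_bb_hz: float,
                          snr_min_db: float, temp_k: float = 300.0) -> float:
    """Self-mixing (noise x noise) limited MDS, assuming
    P = 2*k_B*T*F_FE*sqrt(BW_filter*BW_BB*SNR_min)."""
    f_fe = 10.0 ** (nf_fe_db / 10.0)
    snr_min = 10.0 ** (snr_min_db / 10.0)
    p_w = 2.0 * K_B * temp_k * f_fe * math.sqrt(bw_filter_hz * bw_bb_hz * snr_min)
    return 10.0 * math.log10(p_w) + 30.0  # W -> dBm


for bw_filter in (100e6, 10e6, 1e6):
    p = p_mds_self_mixing_dbm(5.0, bw_filter, 10e3, 10.0)
    print(f"BW_filter = {bw_filter / 1e6:5.0f} MHz: P_MDS,N^2 = {p:.1f} dBm")
```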

B. ENVELOPE DETECTORS
As the primary block that performs direct RF to BB demodulation of the input signal, the performance of the ED greatly affects the sensitivity of the entire WuRX. Envelope detection can be performed passively by a rectifier or actively via an amplifier-biased design to maximize second-order nonlinearities. Both approaches exploit a subthreshold MOSFET's nonlinear exponential V-I relationship. This section reviews ED designs and compares choices based on the target power consumption, operating frequency, and data rate.

1) ACTIVE EDS
To get the full benefit of passive RF gain from the front-end matching network, the ED needs to provide a high enough input resistance, R_chip, so as not to degrade the corresponding R_EQ,P of the RF matching network. Although a passive N-stage RF rectifier [4], [39] is a tempting choice since it consumes no power, it is hard to achieve an R_chip of 10s of kΩ while supporting >1-kHz bandwidth. Thus, active EDs tend to be preferable when a high data rate is required. A transistor biased in sub-V_th can operate with a low supply voltage and low power consumption while providing an exponential voltage-current relationship.
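The link between the exponential subthreshold law and envelope detection can be checked numerically. Writing I_DS ∝ exp(V_GS/(nφ_t)), the second-order coefficient is g_m2 = ½·∂²I_DS/∂V_GS² = I_DS/(2n²φ_t²), and a small input v·cos(ωt) shifts the average drain current by g_m2·v²/2, which is the detected (rectified) component. The sketch below compares that prediction with a direct average of the exponential over one RF period; the slope factor, bias current, and 1-mV amplitude are hypothetical.

```python
import numpy as np

n_slope = 1.3      # subthreshold slope factor n (hypothetical)
phi_t = 0.02585    # thermal voltage at ~300 K (V)
i_ds = 10e-9       # dc bias current (A), hypothetical
amp = 1e-3         # RF amplitude at the gate (V), hypothetical


def drain_current(dv_gs: np.ndarray) -> np.ndarray:
    """Subthreshold drain current for a gate-voltage deviation dv_gs from the bias point."""
    return i_ds * np.exp(dv_gs / (n_slope * phi_t))


g_m2 = i_ds / (2.0 * (n_slope * phi_t) ** 2)   # (1/2) d^2 I_DS / dV_GS^2

theta = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)   # one RF period
delta_i_dc = drain_current(amp * np.cos(theta)).mean() - i_ds  # rectified dc shift
predicted = g_m2 * amp ** 2 / 2.0

print(f"simulated: {delta_i_dc:.4e} A, g_m2*v^2/2: {predicted:.4e} A")
```

The two agree to within about 0.01% here; the small residual comes from fourth-order terms, which grow as the input amplitude approaches nφ_t.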
Assuming the transistor operates in saturation in sub-V_th, the exponential V-I relationship ultimately leads to the second-order nonlinearity desired for ED operation, which can be derived as [40]
$$g_{m2} = \frac{1}{2}\frac{\partial^2 I_{DS}}{\partial V_{GS}^2} = \frac{I_{DS}}{2n^2\phi_t^2} \qquad (7)$$
where n is the subthreshold slope factor and φ_t is the thermal voltage. Different biasing schemes for a CS ED with a low supply voltage are shown in Fig. 7; they set the dc load line and provide the output resistance R_out. An active diode-loaded ED has an output resistance similar to that of a source follower ED [23], since
$$R_{out} \approx \frac{1}{g_m} = \frac{n\phi_t}{I_{DS}} \qquad (8)$$
Because φ_t is smaller than the Early voltage V_A, this output resistance is low compared to the transistor's small-signal output resistance r_ds, so such an ED can only provide a large conversion gain for high input signals. In contrast, a resistive-loaded ED has a higher output resistance than an active diode-loaded ED, set by the load resistor R_L:
$$R_{out} \approx R_L = \frac{0.5V_{DD}}{I_{DS}} \qquad (9)$$
if the output node is set to half of V_DD. However, R_L is fixed by the current level and limited by the low supply voltage. Other techniques, such as cascoded level shifters, can set the dc value of the input transistor's drain node and provide high output resistance but need extra voltage headroom [41]. An active-L self-biased ED was proposed in [40] [Fig. 7(c)] to solve these issues. The feedback resistor sets the dc value of both the gate and drain nodes of the input transistor and, because of the low current level, also serves as the output impedance of the ED. A Bode plot of the output impedance Z_out is shown in Fig. 7(c): Z_out is boosted to R_FB in the signal passband because of the active-L biasing, which leads to higher conversion gain.

FIGURE 8. (a) Comparison of DTMOS CS and CG EDs [11] and (b) active pseudo-balun current-reuse CG ED operation [11].

However, active EDs suffer from 1/f noise, which requires large transistors to minimize its effect at BB. This is another reason why active EDs are more suitable for high data rate applications. Because of this requirement, a CS-type ED, whether in bulk technology or in SOI technology that can leverage a dynamic threshold-voltage MOSFET (DTMOS) [42] to increase the second-order transconductance [40], introduces significant C_gd (and C_bd for DTMOS) at the ED input, which is unsuitable for high transformer gain at high frequency. In contrast, a CG ED has only its source connected to the RF input, whereas both the gate and bulk nodes are connected to a dc bias voltage, which eliminates the effects of C_gd (and C_bd for DTMOS) on the input [Fig. 8(a)] [11]. In addition, for the CG DTMOS approach in [11], the dc bias voltages for the gate and bulk nodes can be set at different potentials for threshold voltage adjustment and freedom in transistor sizing. Fig. 8(b) depicts the active pseudo-balun ED configuration described in [11].
Two n- and p-type CG EDs are stacked in a current-reuse structure to provide single-ended to pseudo-differential conversion, eliminating the need for an explicit reference. This ED acts as a pseudo-balun only for 2nd-order nonlinearities: linear RF currents flow symmetrically through the n- and p-CG amplifiers and partially cancel at the outputs (and are then further filtered), whereas the BB 2nd-order components flow pseudo-differentially with slightly different gains due to the asymmetric loading. A fully (pseudo)-differential CG design has also been presented in [43], where it was used in a super-regenerative receiver after the voltage-controlled oscillator (VCO) to rectify a differential input signal. Used in this WuRX, such a design would require a center-tapped transformer, which results in lower Q and, thus, lower A_V than a single-ended design. The current-reuse pseudo-balun architecture improves k_ED by 66.6% compared to [40] and the WuRX sensitivity by ∼1.5 dB (i.e., 2× signal voltage with 2× noise power compared to a single-ended ED) without a power penalty. The same active-inductor bias technique increases the output impedance and, therefore, k_ED [11]. Although the current-reuse architecture generally requires more voltage headroom than a single amplifying transistor, in subthreshold the required overhead increases by only ∼100 mV, which still fits within a 0.5-V supply. Given that a dc-dc converter would be required to generate a lower supply voltage than this, and such a converter has area and power overhead, current reuse is generally a useful technique to improve performance without a power penalty.
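The ∼1.5-dB figure follows from simple arithmetic under a square-law, BB-noise-limited assumption: doubling the signal voltage adds 6 dB of output signal power, doubling the noise power costs 3 dB, and because the ED output scales with the square of the input amplitude, the resulting 3-dB output SNR gain corresponds to only half as much input-referred sensitivity. A minimal sketch under that assumption:

```python
import math


def sensitivity_gain_db(signal_voltage_ratio: float, noise_power_ratio: float) -> float:
    """Input sensitivity improvement of a square-law ED when the output signal
    voltage and output noise power scale by the given ratios (BB-noise limited)."""
    snr_gain_db = 20.0 * math.log10(signal_voltage_ratio) - 10.0 * math.log10(noise_power_ratio)
    return snr_gain_db / 2.0  # output SNR ~ P_in^2, so halve the dB gain


print(f"{sensitivity_gain_db(2.0, 2.0):.2f} dB")  # pseudo-balun vs single-ended ED
```

The same arithmetic applies to the 2× conversion gain and 1.5-dB improvement of the passive pseudo-balun ED discussed in the next subsection.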

2) PASSIVE ENVELOPE DETECTORS
As discussed in the previous section, active EDs can offer R_in > 10s of kΩ with wide output bandwidth but suffer from 1/f noise. Passive EDs, on the other hand, were historically designed with low-V_t devices [44] or with standard high-V_t devices along with V_t-cancellation techniques [45] to maximize power (not voltage) conversion efficiency, which results in a reduced R_in. By using high-V_t devices at the cost of lower BB bandwidth (BW_BB), passive EDs can achieve an R_in comparable to that of active EDs and, most importantly, have no 1/f noise since they carry no dc current. This permits smaller devices and, thus, lower C_in. As such, passive EDs can have higher SNR and enable higher A_V than active EDs and, therefore, tend to be a better choice for low data rate applications. Fig. 9(a) depicts conventional ED unit cells and architectures. Cross-coupled self-mixers [46] rectify a differential input signal and, thus, require a center-tapped transformer. For direct-ED architectures operating below 500 MHz, the input transformer filter is normally implemented off-chip. To implement such a transformer with a good balance between footprint and gain, the coupling part is usually implemented as a distributed trace on the PCB, while most of the secondary inductor is implemented with lumped components [40]. In this scenario, a center-tapped transformer is difficult to implement and is likely to have lower Q and, thus, lower passive gain than a single-ended design. Moreover, biasing requires an extra RC network at the RF node, which reduces the ED's input impedance.
On the other hand, a traditional Dickson rectifier operating in sub-V_t [13], [39] can rectify a single-ended input signal but has no tunability and only a single-ended output, which requires a tunable reference circuit for the digitizer. To overcome these issues, a tunable passive pseudo-balun ED architecture was proposed in [6]: a 2N-stage rectifier with the middle node connected to V_CM and the bulk nodes connected to a tunable voltage V_bulk to set the bandwidth [Fig. 9(b)]. As such, the BB ac currents flow in opposite directions relative to ground to form a pseudo-differential output. Compared to the original single-branch N-stage Dickson rectifier, this structure achieves 2× conversion gain and a 1.5-dB sensitivity improvement under the same input signal level without sacrificing output bandwidth. In [47], the 2nd branch of the N-stage ED is connected in parallel with the 1st branch without flipping the polarity. The outputs of the parallel rectifier branches are then summed and amplified by the proposed charge-transfer summation amplifier (CTSA), which, along with the following analog-to-digital converter (ADC), improves SNR compared to a conventional single-branch ED. More importantly, this design shows robustness across process, voltage, and temperature (PVT) over the industrial range from −40 °C to 85 °C and has been deployed commercially in volume.

C. TRADEOFF ANALYSIS BETWEEN THE MATCHING NETWORK AND THE ED INPUT IMPEDANCE
The design of an ED-first architecture involves intricate tradeoffs among input impedance, output impedance, and, as a result, A_V, k_ED, BW_BB, and the total integrated noise v_n². For instance, how should one choose the optimum number of stages and transistor size of a passive ED to achieve the best WuRX performance? To drive a fixed capacitive load from the BB amplifier, a passive ED with many stages requires larger transistor widths to maintain the same output bandwidth and, thus, has a larger C_in, which limits the achievable transformer gain. As the transistor width increases, the parasitic capacitance of the ED starts to add to the fixed capacitive load at the output node, which requires R_out to decrease further. As shown in Fig. 10(a), a larger passive voltage gain in the impedance transform, A_V, is possible with small N, which has higher R_in and lower C_in. However, as shown in Fig. 10(b), since the conversion gain and, thus, the ED scaling factor k_ED are proportional to N, an ED with a large N is more suitable for suppressing post-ED stage noise. Moreover, since the passive ED noise power density is 4k_BTR_out, an ED with a larger N has less total integrated noise v_n². To find the optimum N, an objective function was developed in [6] to compare designs with different N under the same output bandwidth and operating frequency; it is essentially the achievable ED output SNR normalized to its input signal power. As shown in Fig. 10(c), an optimum of N = 5 was found for the ED in [6] using this FoM. Fig. 10(d) shows simulated C_in and A_V for an ED with N = 5 at different V_bulk. By forward biasing the transistor bulk-to-source junction diode (<200 mV), V_t is reduced and, therefore, narrower transistors can be used for a given BW_BB. Thus, bulk tuning can overcome process variation and effectively reduce C_in via smaller devices, maximizing the achievable A_V.

D. ACTIVE RF AMPLIFICATION
The sensitivity of ED-first WuRXs is typically dominated by BB noise, because there is not enough pre-ED gain for RF noise to dominate. LNA-ED-first WuRXs add RF gain, which improves sensitivity linearly in dB with the added gain until RF noise begins to dominate. This gain level can be ∼50 dB for a typical LNA-ED-first receiver [34]. A power-efficient amplifier is therefore needed to limit the impact of RF power on overall system power consumption. Active amplification can be realized with conventional topologies such as a CS amplifier, but a tuned load should be adopted to limit the RF BW and, hence, the self-mixed noise at the ED. Prior demonstrations include off-chip passive inductive loads [15] and on-chip active inductive loads [14]. Off-chip inductors can achieve narrower BW and lower noise but add integration cost, especially if multiple amplification stages are cascaded. At low power, the quality factor of an active inductor is often insufficient to provide narrow bandwidths, and the active inductor adds extra noise to the signal path.
Regenerative amplifiers are another attractive class of amplifiers for power-efficient high gain. Such amplifiers operate close to the stability boundary to maximize the power gain of the device for a given bias current. Most regenerative amplifiers are based on oscillator structures, such as Colpitts [48] or ring oscillators [17]. These are backed off slightly from the unstable region where the loop gain approaches unity, which allows considerable gain at low dc power. However, such amplifiers require careful biasing to guarantee stable operation across PVT variation.
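The biasing sensitivity of regenerative amplifiers can be illustrated with an idealized positive-feedback model. In this model, a core gain A is enhanced to A/(1 − L) as the loop gain L approaches unity. The model ignores noise, bandwidth shrinkage, and nonlinearity, and the numbers below are hypothetical. It only shows why a small PVT-induced shift in L produces a large gain change near the stability boundary.

```python
import math


def regenerative_gain_db(a_core, loop_gain):
    """Idealized regenerative gain A/(1 - L) in dB; L >= 1 means the stage oscillates."""
    if loop_gain >= 1.0:
        raise ValueError("loop gain >= 1: the stage oscillates instead of amplifying")
    return 20.0 * math.log10(a_core / (1.0 - loop_gain))


for loop_gain in (0.90, 0.98, 0.99):
    print(f"L={loop_gain:.2f}: gain={regenerative_gain_db(2.0, loop_gain):.1f} dB")
```

In this model, moving L from 0.99 to 0.98 costs 20·log10(2), about 6 dB of gain. This is the reason the bias point must track PVT.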

E. BASEBAND AMPLIFIERS AND TECHNIQUES
BB noise can limit the achievable sensitivity for nW-level operation, where no active RF amplification stages are available to provide gain. BB amplifiers therefore need to be both low noise and low power. Subthreshold design is typically used to achieve nW-level power consumption, with large device sizes employed to reduce flicker noise. In addition, BB amplifiers need low input capacitance and high input impedance so that they do not load the high-impedance ED output. CS amplifiers with gate inputs are therefore popular and are often used in a differential configuration to reject common-mode noise. BB amplifiers typically offer tens of dB of gain programmability to cover the required dynamic range.
In LNA-ED-first architectures, the active RF gain stage typically dominates the power consumption and provides enough gain to suppress the noise of later stages. LNA-ED-first BB amplifiers are architecturally similar to those in ED-first designs but face more relaxed constraints. Their bias current can therefore increase to the μA level to support a higher data rate.
AC coupling is a popular way to connect BB amplifiers to the ED because it removes the dc offset of the detector stages. A self-biased current-reuse topology [6], [49] is shown in Fig. 11. In it, the gain is controlled by changing the bias current, and an additional buffer stage provides bandwidth control. However, ac coupling can require a very large capacitance between the ED output and the BB circuits, which increases the start-up time of the ED. Techniques including neutralization capacitors and fast start-up pseudo-resistors have been proposed to address this [6]. DC coupling eliminates the large capacitor but requires the design to handle the dc offset between the ED and the amplifier stages. The BB amplifiers must also remain functional over the range of common-mode voltages produced by the detector. As shown in Fig. 12, differential amplifiers with common-mode feedback (CMFB) stabilize the dc bias and reduce the dc offset at their input. The feedback resistors should be large and provide a self-bias for the pMOS transconductance load. Low-frequency zeros can be added with an off-chip capacitor between the sources of the input pair. This yields a bandpass response that filters low-frequency flicker noise and reduces the dc gain. In addition, global CMFB techniques for dc coupling have been demonstrated by introducing an additional feedback loop [9]. One example is an auxiliary amplifier that senses the tail nodes of the BB amplifiers and drives the ED common-mode voltage.
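The start-up penalty of ac coupling follows from first-order RC settling. The sketch below assumes that a coupling capacitor and a bias resistance (for example, a pseudo-resistor) form a single high-pass pole; both values are hypothetical. It shows that pushing the high-pass corner low enough to pass a slow wake-up signal stretches the settling time in inverse proportion to that corner.

```python
import math


def ac_coupling_startup(r_bias, c_couple, settle_error=0.01):
    """Return (high-pass corner in Hz, settling time in s) of a first-order ac-coupling network.

    settle_error is the residual fraction of the initial step that is tolerated
    (0.01 means the node has settled to within 1 %).
    """
    tau = r_bias * c_couple
    f_hp = 1.0 / (2.0 * math.pi * tau)
    t_settle = tau * math.log(1.0 / settle_error)
    return f_hp, t_settle


# Hypothetical values: a 10-pF coupling capacitor with a 100-Gohm pseudo-resistor.
f_hp, t_settle = ac_coupling_startup(100e9, 10e-12)
print(f"corner = {f_hp:.3f} Hz, start-up to 1 % = {t_settle:.2f} s")
```

Because both the corner and the settling time depend on the same product R·C, lowering the corner by 10× lengthens start-up by 10× in this model.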
After BB amplification, a common approach is to filter and digitize the BB signal directly. During digitization, a proper threshold must be chosen for the comparator or ADC. For a comparator, the threshold should be set well above the noise level to meet the false-positive rate requirement. However, dc offset is always present and varies with PVT; an automatic offset calibration loop was implemented to address this issue in [8]. Setting an ADC threshold is similar to setting a comparator threshold. Robust reference voltages must also be generated for the ADC to minimize PVT effects on quantization.
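Choosing a comparator threshold for a target false-positive rate can be sketched with a Gaussian noise model. The sketch assumes zero-mean Gaussian noise at the comparator input plus a known dc offset; the noise level, target rate, and offset drift are hypothetical. It uses only Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist


def comparator_threshold(sigma_v, p_false, offset_v=0.0):
    """Threshold (V) giving a per-decision false-positive probability of p_false.

    Assumes zero-mean Gaussian noise (std sigma_v) at the comparator input, shifted
    by a known dc offset offset_v.
    """
    q_inv = NormalDist().inv_cdf(1.0 - p_false)  # inverse Gaussian tail, Q^-1(p_false)
    return offset_v + q_inv * sigma_v


def false_positive_rate(threshold_v, sigma_v, offset_v=0.0):
    """Probability that noise alone (plus offset_v) exceeds threshold_v."""
    return 1.0 - NormalDist(mu=offset_v, sigma=sigma_v).cdf(threshold_v)


# Hypothetical numbers: 1 mV rms noise, 0.1 % false positives per decision.
th = comparator_threshold(1e-3, 1e-3)
print(f"threshold = {th * 1e3:.2f} mV")
# An uncalibrated 1-mV offset drift eats into the noise margin:
print(f"false-positive rate with drift = {false_positive_rate(th, 1e-3, offset_v=1e-3):.3f}")
```

In this model, a 1-mV offset drift raises the false-positive rate by more than an order of magnitude. This illustrates why the offset calibration mentioned above matters.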
Synchronizing the incoming data symbols with the on-chip BB clock is important, whether that clock comes from a crystal oscillator or a small on-chip RC oscillator. Oversampling can eliminate the need for explicit synchronization in always-on and packet-level duty cycling (PLDC) architectures. For bit-level duty cycling (BLDC), synchronization is not an issue because the BB data is sampled within a small portion of each bit. The timing signals are typically programmable and slow enough that they are unlikely to miss a bit given a reasonable frequency offset. However, conventional symbol-level decoding does require some means of synchronizing the incident symbols with the on-chip clock.
Instead of digitizing in the amplitude domain, [12] proposed an alternative time-based signal processing approach in which a matched filter was implemented as a windowed integrator. Specifically, a voltage-controlled delay line served as a voltage-to-time converter and integrated the time-encoded signal. This matched filter removed high-frequency BB noise and maximized the SNR before sampling.
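The benefit of a windowed integrator can be illustrated with a discrete-time, amplitude-domain integrate-and-dump filter, which is the matched filter for rectangular OOK pulses. This sketch does not model the voltage-controlled delay line of [12]. The oversampling factor, noise level, and function names are hypothetical. It compares a single mid-bit sample against the sum over each bit window.

```python
import numpy as np


def integrate_and_dump(samples, samples_per_bit):
    """Sum each bit-length window (matched filter for rectangular pulses)."""
    n_bits = len(samples) // samples_per_bit
    windows = np.asarray(samples[: n_bits * samples_per_bit]).reshape(n_bits, samples_per_bit)
    return windows.sum(axis=1)


def compare_decisions(n_bits=2000, samples_per_bit=16, amplitude=1.0, sigma=1.5, seed=1):
    """Return (BER with one mid-bit sample, BER with a windowed integrator)."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, n_bits)
    clean = np.repeat(bits * amplitude, samples_per_bit)   # rectangular OOK baseband
    noisy = clean + rng.normal(0.0, sigma, clean.size)     # white BB noise
    # Decision 1: a single sample in the middle of each bit, threshold at half amplitude.
    mid = noisy[samples_per_bit // 2 :: samples_per_bit]
    ber_single = np.mean((mid > amplitude / 2) != bits)
    # Decision 2: integrate over the bit window; the threshold scales with the window length.
    integrated = integrate_and_dump(noisy, samples_per_bit)
    ber_integrated = np.mean((integrated > samples_per_bit * amplitude / 2) != bits)
    return ber_single, ber_integrated


ber_single, ber_integrated = compare_decisions()
print(f"BER single-sample: {ber_single:.3f}, BER windowed integrator: {ber_integrated:.3f}")
```

Summing M samples grows the signal by M but the white-noise standard deviation only by √M. This SNR gain over sampling a single point is what the matched filter provides.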

F. PVT ROBUSTNESS
PVT variability challenges all circuits, but the tight power budget and wide range of operating conditions make robustness to such variations particularly difficult for WuRXs. Every element of the WuRX requires some attention to PVT robustness. Still, most of the challenge is solved if the references, bias circuits, and clock sources remain stable across PVT. For robustness to voltage variations, numerous recent nA-level reference circuits and low-dropout (LDO) regulators can be integrated with WuRXs. Exploring these adjacent topics in depth is beyond the scope of this article, but several examples that show how low-power components can offer a range of robustness features are discussed below. For example, a stack of diode-connected transistors can generate a complementary-to-absolute-temperature (CTAT) voltage, which can be used to compensate for temperature variation [10]. A variation of less than 500 pA has been achieved for a 10-nA current source from −30 °C to 70 °C [10]. Jiang et al. [9], Shen et al. [10], and Bassirian et al. [49], [50] have also applied techniques for improving temperature robustness.
Temperature variations can cause clock frequency drift, reductions in ED or BB amplifier bandwidth, and changes in comparator offset. The clock frequency affects the accuracy of data detection. If the clock deviates from the data rate, the probability of successful detection drops, especially with a long code. A simple solution is to generate the clock with a crystal (XTAL) oscillator. An XTAL's temperature coefficient is typically less than 0.05 ppm/°C² over a range that can extend from −40 °C to 85 °C. Bassirian [49] also implemented an on-chip temperature-compensated relaxation oscillator with less than 41 ppm/°C and nW-level power consumption, among many other examples [51], [52].
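The effect of clock drift on a long code can be estimated by accumulating the fractional frequency error over the code length. The sketch treats the bounds quoted above (0.05 ppm/°C² for the crystal and 41 ppm/°C for the relaxation oscillator) as worst cases. It also assumes a 25 °C crystal turnover point, a 25 °C reference for the linear oscillator, and a hypothetical 63-bit code.

```python
def xtal_drift_ppm(temp_c, k_ppm_per_c2=0.05, turnover_c=25.0):
    """Worst-case parabolic crystal drift magnitude (ppm) for a coefficient k (ppm/degC^2)."""
    return k_ppm_per_c2 * (temp_c - turnover_c) ** 2


def linear_drift_ppm(temp_c, tc_ppm_per_c=41.0, ref_c=25.0):
    """Drift magnitude (ppm) of an oscillator with a linear temperature coefficient."""
    return tc_ppm_per_c * abs(temp_c - ref_c)


def slip_in_bits(drift_ppm, code_length_bits):
    """Timing error accumulated over one code, in units of the bit period."""
    return drift_ppm * 1e-6 * code_length_bits


for name, drift in [("XTAL", xtal_drift_ppm(85.0)), ("relaxation", linear_drift_ppm(85.0))]:
    print(f"{name}: {drift:.0f} ppm at 85 degC -> {slip_in_bits(drift, 63):.3f} bit slip over 63 bits")
```

Under these assumptions, the relaxation oscillator accumulates roughly 0.15 bit of timing error over a 63-bit code at 85 °C. The crystal stays near 0.01 bit.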
The ED and BB amplifier bandwidths depend strongly on temperature because their output impedances change with temperature. One example of a countermeasure is applying a CTAT bulk bias to the ED, which keeps the channel resistance of the ED devices approximately constant [9].
For BB amplifiers, MIM capacitors and temperature-insensitive current references achieve less than 2.5% BW variation [9]. Offset changes due to temperature and process variation can cause comparator errors. A tunable comparator offset combined with an nW-level PID calibration loop has been demonstrated to compensate for the offset automatically [49]. The autozeroing network proposed in [9] uses a switched-capacitor network to sample the dc offset while connected to a replica ED. It then subtracts the stored offset during normal operation. Thus, PVT-induced offset can be calibrated out.
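A background offset calibration loop can be sketched behaviorally. The calibration loop in [49] is described as PID, but this sketch keeps only the integral action: with the comparator input shorted, a trim DAC code is nudged one LSB against each decision. The offset, LSB size, and step count are hypothetical, and the loop only converges if |offset| is smaller than steps × LSB.

```python
def calibrate_offset(offset_v, lsb_v=0.5e-3, steps=64):
    """Integral-type comparator offset calibration (behavioral sketch).

    With its input shorted, the comparator reports sign(offset + trim). Each step moves a
    trim DAC one LSB against that decision, so the loop settles into a +/-1 LSB limit
    cycle around -offset_v. Returns (final trim code, residual offset in volts).
    """
    code = 0
    for _ in range(steps):
        comparator_high = (offset_v + code * lsb_v) > 0.0
        code += -1 if comparator_high else 1
    return code, offset_v + code * lsb_v


code, residual = calibrate_offset(3.2e-3)  # hypothetical 3.2-mV offset, 0.5-mV trim LSB
print(code, f"{residual * 1e3:.2f} mV")
```

The residual offset ends up bounded by one trim LSB, so the LSB size sets the calibration accuracy in this simplified loop.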
These examples show how a range of power-compatible techniques are available for providing WuRX robustness to PVT variations.

IV. TRANSMITTED-LO ARCHITECTURES
The direct-ED approach has two major downsides. The first is the lack of filtering at RF, which limits channel selectivity and interference resiliency. The second is the lack of pre-ED gain, which can only be provided at power-expensive RF frequencies.
To simultaneously address these issues, a mixer can translate incident RF energy to a lower frequency, where it is easy and power efficient to filter and amplify the signal before energy detection. This, however, requires the generation of an LO, which can consume significant power and, thus, these architectures generally consume significantly more power than direct-ED-based approaches.
One alternative is to send the packet with a timing reference, such as a two-tone RF OOK signal, which can then be demodulated to an IF using only a simple ED [19], [20], [22]. The signal propagation is shown in Fig. 13. The transmitted signal v_RF,2-tone(t) may comprise two RF tones spaced by Δf, which can be written as the sum of two sinusoids

$$v_{RF,\text{2-tone}}(t) = \frac{A_{RF}}{2}\cos\theta_1(t) + \frac{A_{RF}}{2}\cos\theta_2(t), \qquad \theta_{1,2}(t) = 2\pi\left(f_C \mp \frac{\Delta f}{2}\right)t + \varphi_{1,2} \tag{11}$$

where the envelope swings between 0 and A_RF (a peak-to-peak amplitude of A_RF), the carrier frequency is f_C, and φ_1 and φ_2 represent random phase offsets. When this two-tone input enters a square-law ED with scaling factor k_ED, the output can be expressed as

$$v_{ED}(t) = k_{ED}\,v_{RF,\text{2-tone}}^2(t) = \frac{k_{ED}A_{RF}^2}{4}\left[1 + \cos(\theta_2 - \theta_1) + \cos(\theta_1 + \theta_2) + \frac{1}{2}\cos 2\theta_1 + \frac{1}{2}\cos 2\theta_2\right] \tag{12}$$

Low-pass filtering removes the last three terms, which lie near 2f_C. Neglecting them, the resulting IF signal is

$$v_{IF}(t) = \frac{k_{ED}A_{RF}^2}{4}\left[1 + \cos(2\pi\Delta f\,t + \varphi_2 - \varphi_1)\right] \tag{13}$$

Equation (13) shows rectified energy at dc and a tone at Δf due to intermixing of the two tones. The desired signal appears at the IF, while the rectified interference stays at dc. A bandpass filter can therefore isolate the desired information while suppressing the dc offset, as shown in Fig. 14. To further reduce the effect of in-band interference, [21] presents code-based interference rejection: one code is applied to the two tones and another to a reference tone. A code-based N-path mixer driven by the same code then despreads the signal, while the in-band interference is spread and thereby reduced.
A second IF ED, or a mixer driven at the IF, can then downconvert the signal to BB for quantization and digital processing. Using this approach, [22] achieves a −99-dBm sensitivity with only 260 nW and without an off-chip MEMS filter. Additionally, CE-OOK signaling can simplify the multitone modulation relative to two-tone sinusoidal transmission while providing 2 dB of additional conversion gain at the receiver [53]. The CE-OOK waveform can be expressed as

$$v_{CE\text{-}OOK}(t) = A_{RF}\,Sq(t)\cos(2\pi f_C t) \tag{14}$$

$$Sq(t) = \frac{1}{2} + \frac{2}{\pi}\left[\sin(2\pi\Delta f\,t) + \frac{1}{3}\sin(3\cdot 2\pi\Delta f\,t) + \cdots\right] \tag{15}$$

where Sq(t) is a square wave with unity amplitude (switching between 0 and 1) and Δf is its frequency. An infinitely long symbol sequence is assumed for convenience of derivation. Expanding Sq(t) and cross-multiplying with cos(2πf_C t) produces

$$v_{CE\text{-}OOK}(t) = \frac{A_{RF}}{2}\cos(2\pi f_C t) + \frac{A_{RF}}{\pi}\Big[\sin\big(2\pi(f_C+\Delta f)t\big) - \sin\big(2\pi(f_C-\Delta f)t\big)\Big] + \frac{A_{RF}}{3\pi}\Big[\sin\big(2\pi(f_C+3\Delta f)t\big) - \sin\big(2\pi(f_C-3\Delta f)t\big)\Big] + \cdots \tag{16}$$

Because Sq^2(t) = Sq(t) for a 0/1 square wave, rectifying (16) with an ideal square-law ED and filtering out the high-frequency content results in

$$v_{IF}(t) \approx k_{ED}A_{RF}^2\left[\frac{1}{4} + \frac{1}{\pi}\sin(2\pi\Delta f\,t) + \frac{1}{3\pi}\sin(3\cdot 2\pi\Delta f\,t)\right] \tag{17}$$

where the odd-order harmonic content beyond the 3rd harmonic is assumed to be negligible. Comparing the Δf components of (17) and (13) shows that CE-OOK provides a 4/π (≈ +2 dB) better voltage conversion gain than the two-tone case for similar peak amplitudes. The CE-OOK method can also be used with an uncertain-IF topology [34] to distribute the required pre-rectification gain across several frequencies, improving stability and selectivity.
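The 4/π conversion-gain advantage can be checked numerically by passing both waveforms through an ideal square-law detector (k_ED = 1) and reading the amplitude of the Δf component from its DFT bin. The DFT bin read stands in for the low-pass filter. This is a noiseless sketch with normalized, hypothetical frequencies and A_RF = 1. f_C is set to 200.5 so that 2f_C is an odd multiple of Δf. The square-wave harmonics mixed around 2f_C then fall only on even multiples of Δf and cannot leak into the Δf bin.

```python
import numpy as np


def if_tone_amplitude(v_ed, fs, f_if):
    """Amplitude of the f_if component of v_ed, read from its DFT bin (f_if must fall on a bin)."""
    n = len(v_ed)
    spectrum = np.fft.rfft(v_ed)
    k = int(round(f_if * n / fs))
    return 2.0 * abs(spectrum[k]) / n


def compare_two_tone_vs_ce_ook(a_rf=1.0, f_c=200.5, delta_f=1.0, fs=10_000.0, periods=4):
    """Return the Delta-f amplitude after an ideal square-law ED for two-tone and CE-OOK inputs."""
    n = int(round(periods * fs / delta_f))
    t = np.arange(n) / fs
    # Two tones spaced by delta_f, each of amplitude a_rf/2, so the envelope peaks at a_rf.
    v_two_tone = 0.5 * a_rf * (np.cos(2 * np.pi * (f_c - delta_f / 2) * t)
                               + np.cos(2 * np.pi * (f_c + delta_f / 2) * t))
    # CE-OOK: carrier gated by a 0/1 square wave at delta_f, also peaking at a_rf.
    samples_per_period = int(round(fs / delta_f))
    sq = (np.arange(n) % samples_per_period < samples_per_period // 2).astype(float)
    v_ce_ook = a_rf * sq * np.cos(2 * np.pi * f_c * t)
    # Ideal square-law ED with k_ED = 1.
    return (if_tone_amplitude(v_two_tone**2, fs, delta_f),
            if_tone_amplitude(v_ce_ook**2, fs, delta_f))


amp_two_tone, amp_ce_ook = compare_two_tone_vs_ce_ook()
ratio = amp_ce_ook / amp_two_tone
print(f"two-tone IF amplitude: {amp_two_tone:.4f}, CE-OOK: {amp_ce_ook:.4f}")
print(f"ratio = {ratio:.4f} (4/pi = {4 / np.pi:.4f}), gain = {20 * np.log10(ratio):.2f} dB")
```

The two-tone amplitude comes out at A_RF²/4 and the CE-OOK amplitude at A_RF²/π, giving a ratio of about 1.27, or 2.1 dB.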

V. ON-CHIP LO ARCHITECTURES
In some applications, the need for excellent channel filtering necessitates going to the on-chip LO generation architecture. This section will discuss how to do this with a low-power penalty.

A. LO GENERATION CONSIDERATIONS
Utilizing noncoherent energy detection for demodulation relaxes the RF LO phase noise requirement to around −80 dBc/Hz at a 1-MHz offset [54] and, thus, a ring oscillator-based LO, which can consume lower power than an equivalent LC VCO, can be used [23]. For zero-IF and heterodyne architectures that require channelization, either a PLL or an FLL may be required to stabilize the LO to a known good frequency.
In [28], a ring-VCO-based FLL consumes less than 55 μW with a phase noise of −65 dBc/Hz at a 1-MHz offset. However, phase noise still limits the SIR because of reciprocal mixing. Thus, for most WuRXs operating on a single channel at a time, further SIR improvement can be achieved only by employing an LC oscillator at the cost of higher power. For instance, [30] reports an LC-VCO-based FLL that consumes 292 μW with a phase noise of −128 dBc/Hz at a 20-MHz offset.
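The reciprocal-mixing limit on SIR can be estimated with the standard first-order relation. An interferer at an offset where the LO phase noise is L (dBc/Hz) contributes in-channel noise of about P_int + L + 10·log10(B) in a channel of bandwidth B. The sketch assumes the phase noise is flat across the channel and that reciprocal mixing is the only impairment. The 100-kHz channel and 10-dB SNR requirement are hypothetical. The two quoted phase-noise values are specified at different offsets (1 MHz and 20 MHz), so the results are not a like-for-like comparison.

```python
import math


def reciprocal_mixing_sir_min_db(phase_noise_dbc_hz, channel_bw_hz, snr_min_db):
    """Lowest SIR (dB) tolerated when reciprocal mixing is the only impairment.

    Keeping the signal snr_min_db above the reciprocally mixed interferer power
    P_int + L + 10*log10(B) gives SIR_min = L + 10*log10(B) + SNR_min; negative values
    mean the interferer may be stronger than the wanted signal.
    """
    return phase_noise_dbc_hz + 10.0 * math.log10(channel_bw_hz) + snr_min_db


# Hypothetical 100-kHz channel and 10-dB SNR requirement; phase-noise values quoted above.
for label, pn in [("ring FLL, 1-MHz offset", -65.0), ("LC FLL, 20-MHz offset", -128.0)]:
    print(f"{label}: SIR_min = {reciprocal_mixing_sir_min_db(pn, 100e3, 10.0):.0f} dB")
```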
To obviate this issue, a 3-channel frequency-hopping wake-up signature and a majority voting algorithm are presented in [31], enabling the use of a ring VCO while still achieving interference resiliency. Moreover, with a careful frequency plan, an integer-N PLL and a frequency tripler are adopted in [31], where the VCO and PLL divider operate at a 3× lower frequency than the channel frequency, further saving power. The entire LO generation consumes 166 μW with a phase noise of −79 dBc/Hz at a 1-MHz offset.
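The majority-voting idea can be sketched as a 2-of-3 vote over per-channel wake-up decisions. The sketch assumes per-channel false alarms are independent, and the per-channel probability is hypothetical; it is not the exact algorithm of [31]. Under that assumption, an interferer that corrupts or fakes a detection on one hop cannot trigger a wake-up by itself. A genuine wake-up still passes when the other two hops agree.

```python
from itertools import product


def majority_wake(detections):
    """Wake up only if more than half of the per-channel decisions are positive."""
    return 2 * sum(bool(d) for d in detections) > len(detections)


def false_wake_probability(p_channel, n_channels=3):
    """Majority-vote false-wake probability, assuming independent per-channel false alarms."""
    total = 0.0
    for outcome in product((False, True), repeat=n_channels):
        if majority_wake(outcome):
            k = sum(outcome)
            total += p_channel**k * (1.0 - p_channel) ** (n_channels - k)
    return total


print(majority_wake([True, False, True]))    # True: one blanked hop does not block a wake-up
print(majority_wake([True, False, False]))   # False: one hop alone cannot trigger a wake-up
print(f"{false_wake_probability(1e-2):.2e}")  # 3p^2(1-p) + p^3 for p = 1 %
```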
Several prior works have explored the possibility of further reducing the power consumption of LO frequency generation by completely removing the frequency stabilization circuitry, i.e., a PLL or FLL. However, mixing an incoming RF waveform with a free-running oscillator whose precise frequency is not well controlled or known requires a relatively large IF bandwidth to guarantee proper demodulation after envelope detection. This requirement then results in the uncertain-IF architecture as discussed in prior sections. For a typical uncertain-IF WuRX, the free-running oscillator frequency can be calibrated periodically over process and temperature variation, while the frequency variation over time eventually determines the IF bandwidth. For instance, although a ring oscillator consumes less power than an equivalent LC oscillator, it introduces larger frequency variation and, therefore, requires larger IF bandwidth.
Moreover, since a larger IF bandwidth before the ED degrades sensitivity, the uncertain-IF WuRX faces a tradeoff between power and sensitivity that depends on the RF LO design. In [23], a ring oscillator consuming 20 μW with a 15-MHz frequency variation over a 6-h observation window is adopted. This results in a sensitivity of −72 dBm with 100 MHz of IF bandwidth. For better sensitivity, [26] uses an LC oscillator with an off-chip high-Q inductor that consumes 44 μW with a 68.5-kHz frequency variation over a 5-h observation window. This results in a sensitivity of −97 dBm with 1 MHz of IF bandwidth. In addition, a dual uncertain-IF architecture has been explored to ease the tradeoff between LO accuracy and noise bandwidth. The filter's center frequency is located at the second IF and is independent of the LO accuracy because the filter clock is recovered at the second IF. With 50 kHz of second-IF bandwidth, up to 180 kHz of LO uncertainty can be tolerated [26].

B. OVERVIEW OF STANDARD-COMPATIBLE ON-CHIP LO ARCHITECTURES
WuRXs that operate directly with standard-compatible radios (e.g., BLE or Wi-Fi) can work with existing infrastructure, which reduces cost and simplifies deployment. Standard-compatible WuRXs tend to consume more power than proprietary WuRXs. Standards impose tighter frequency control, channelization, higher RF frequencies, and other constraints that custom protocols can avoid. Numerous standard-compatible WuRXs have nevertheless been presented, most of which achieve low power by incorporating backchannel-based modulation schemes [4], [27], [28], [29], [30], [31], [32], [55]. To achieve sensitivity and interference resiliency comparable to the main radios, mixer-based architectures are normally used. Moreover, BLE and Wi-Fi allocate multiple channels within the band. The uncertain-IF architecture is therefore generally unsuitable, because it lacks a PLL or FLL for channel switching. This section thus focuses on designs employing mixer-based zero-IF and heterodyne architectures.
Most prior-art Wi-Fi-compatible WuRXs utilize a mixer-first zero-IF architecture, as shown in Fig. 15(a) [28], [29], [30]. In this approach, the front-end RF LNA is removed to save power. Instead, the incident RF signal is fed to a passive mixer after an on-chip matching network, which downconverts the signal to BB for filtering and amplification. Low passive mixer switch resistance is required to achieve high sensitivity, inevitably increasing the switch size and, therefore, passive mixer driver power. This further increases the LO generation and driving power requirement, especially given the 2.4-GHz frequency.
Although this architecture consumes more power than a direct-ED approach, such receivers can still achieve sub-mW power and high sensitivity. This is much lower than the 4–5 mW of BLE or the 80–100 mW of Wi-Fi main radios, and performance is generally better in more scaled CMOS processes where dynamic switching power is low. For example, [28] achieves a sensitivity of −72 dBm with 173 μW in 14-nm FinFET technology for Wi-Fi. However, since the signal bandwidth of a WuRX is purposely reduced, the sensitivity of this zero-IF approach is limited by the 1/f noise of the post-mixer stages [28]. In [29], a dynamic amplifier with low 1/f noise is proposed to address this issue, achieving a sensitivity of −92.4 dBm at 340 μW in 28 nm, again for Wi-Fi. Further SIR improvement is achieved in [30] by replacing the ring oscillator with an LC oscillator at the cost of power. This results in a sensitivity of −92.6 dBm and a 16.6-dB better SIR than [29] at 495 μW. This approach may work well for Wi-Fi back-channel communication, which has a larger signal bandwidth. Translating it to the lower bandwidths of BLE, however, would require careful redesign to combat both the relatively more important 1/f noise and possible FLL frequency fluctuation. Only then can the wake-up signal be demodulated without sensitivity degradation.
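Why narrowing the BB bandwidth makes 1/f noise more important can be seen by integrating a white-plus-flicker noise spectrum, S(f) = S_w·(1 + f_c/f), over the BB band. The white part grows linearly with the upper band edge, while the flicker part grows only logarithmically. It therefore claims a larger share of the total as the bandwidth shrinks. The 100-kHz corner, the 100-Hz lower band edge (for example, a high-pass corner), and the two bandwidths are hypothetical.

```python
import math


def integrated_noise(white_psd, f_corner, f_low, f_high):
    """Integrate S(f) = S_w * (1 + f_c / f) from f_low to f_high; returns (white, flicker) parts."""
    white = white_psd * (f_high - f_low)
    flicker = white_psd * f_corner * math.log(f_high / f_low)
    return white, flicker


def flicker_fraction(f_corner, f_low, f_high):
    """Share of the in-band integrated noise contributed by the 1/f component."""
    white, flicker = integrated_noise(1.0, f_corner, f_low, f_high)
    return flicker / (white + flicker)


# Hypothetical 100-kHz corner and 100-Hz lower band edge; two BB bandwidths.
for f_high in (1e6, 50e3):
    print(f"BW = {f_high / 1e3:.0f} kHz: 1/f share = {flicker_fraction(100e3, 100.0, f_high):.2f}")
```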
A mixer-first heterodyne architecture can address the 1/f noise issues that arise at the lower BLE bandwidths [31]. It adds amplification at an IF above the 1/f noise corner frequency. This approach is shown in Fig. 15(b). The design in [31] achieves a sensitivity of −85 dBm with 220 μW in 65 nm. It does so in part by operating the RF LO at one-third of the signal frequency (with a frequency tripler) and using a 0.5-V supply voltage. Moreover, a PLL rather than an FLL is adopted, which guarantees the frequency stability needed for low-bandwidth signal demodulation. The design also achieves good interference resiliency, with a signal-to-interference ratio (SIR) of −60 dB. It reaches this even with a ring oscillator when operating in the proposed three-channel frequency-hopping voting mode. However, lacking I/Q RF LO signals, this design requires a custom off-chip single-die three-channel FBAR filter for image rejection. This off-chip image rejection approach suits BLE applications, which need only a single FBAR die. It cannot operate in Wi-Fi mode, however, without multiple multichannel FBAR dies.
In [32], a heterodyne WuRX architecture is presented [Fig. 15(c)] that supports dual-mode BLE/Wi-Fi operation while improving sensitivity over [31]. Before the first downconversion, a matching network and a current-reuse LNA improve sensitivity over the mixer-first architecture of [31]. After amplification, I/Q passive mixers downconvert the signal to a first IF of 8 MHz, where IF amplifiers can amplify it power efficiently. A passive polyphase filter then provides image rejection without an off-chip image-rejection filter. After the second downconversion, a programmable-gain BB amplifier with a built-in low-pass filter further amplifies the signal. It also rejects both noise and interference to increase the ED output SNR. The ED then provides a squaring function for energy detection. The ED output is oversampled and digitized by a comparator, which serves as a 1-bit ADC. Finally, the digital BB determines whether to wake up. Together with the BLE/Wi-Fi dual-mode control logic, it also controls the channel selection of the RF front-end. This dual-mode BLE/Wi-Fi WuRX achieves −92/−90.3-dBm sensitivity with a latency-configurable power consumption of 4.4–352 μW.

VI. DUTY CYCLING TOWARD LOWER POWER
Given the high dc power of the LNA-ED-first, uncertain-IF, and on-chip LO architectures, a duty-cycling scheme must be adopted if sub- or near-μW operation is desired. Duty cycling can be implemented synchronously with the aid of a dedicated base-station-transmitted (TX) signal, such as the "target wake-up time" method adopted by prior demonstrations [28]. However, asynchronous duty cycling can enable mesh-type networks and does not require any dedicated TX signal. Hence, this section focuses on two asynchronous duty-cycling schemes, BLDC and PLDC, shown in Fig. 16. Both methods can be adopted when the receiver has no prior information about the transmitter.
As shown in Fig. 16, the BLDC method senses a portion T_on,B of each transmitted bit, whose full duration is T_per,B. The minimum BB bandwidth required to capture the rectified energy is given by the rise-time-to-bandwidth relationship

$$BW_{BB,BLDC} = \frac{0.35}{T_{on,B}} \tag{18}$$

The PLDC method requires back-to-back wake-up packet transmissions so that a shorter receiver on-time can be adopted. With back-to-back transmission, the repeated code means the receiver only needs to capture N bits (one packet duration) instead of 2N. The captured N bits can then be rotated one bit at a time and compared against the node's wake-up address to detect a wake-up interrupt.
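The BB bandwidth in the PLDC case needs to satisfy the Nyquist criterion for the PLDC bit duration T_on,P/N and is given by

$$BW_{BB,PLDC} = \frac{N}{T_{on,P}} \tag{19}$$

The power dissipation of the BLDC (P_BLDC) and PLDC (P_PLDC) methods is related to the instantaneous power P_inst by

$$P_{BLDC} = P_{inst}\,\frac{T_{on,B}}{T_{per,B}} \tag{20}$$

$$P_{PLDC} = P_{inst}\,\frac{T_{on,P}}{T_{per,P}} \tag{21}$$

Equations (18)–(21) describe general BLDC and PLDC operation. The same-latency condition requires the PLDC period T_per,P to equal the duration of the full N-bit message in the BLDC case:

$$T_{per,P} = N\,T_{per,B} \tag{22}$$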

FIGURE 17. Illustration of (a) BLDC and (b) PLDC for same latency and power.
Considering the special case in which both schemes have the same latency and the same power, (20) and (21) give

$$\frac{T_{on,B}}{T_{per,B}} = \frac{T_{on,P}}{T_{per,P}} \tag{23}$$

Combining (22) and (23) gives

$$T_{on,P} = N\,T_{on,B} \tag{24}$$

The case of equal latency and power is shown in Fig. 17. The unit time interval shown is the transmitted bit duration for the PLDC case, where the wake-up message length N is 4 bits. The bit duration in the BLDC case corresponds to four unit intervals, and the duty cycle in both cases is 25%. Substituting (24) into (18) and (19), the relative BB bandwidth of the two cases is

$$\frac{BW_{BB,BLDC}}{BW_{BB,PLDC}} = \frac{0.35/T_{on,B}}{N/(N\,T_{on,B})} = 0.35 \tag{25}$$
The above analysis (25) shows that PLDC requires a BB bandwidth about 1/0.35 ≈ 2.9 times wider than BLDC. For a heterodyne receiver with the same power and latency, where sensitivity is proportional to BW_BB, the resulting sensitivity difference between PLDC and BLDC is 10 log(1/0.35) ≈ 4.6 dB. However, the PLDC bandwidth may be reduced below this requirement if certain statistics of the packet structure are known. Examples include the number of consecutive 1's and 0's allowed in a message and BB pulse-shaping information, which relax the intersymbol interference requirements. For an LNA-ED-first or uncertain-IF receiver whose RF bandwidth is much larger than its BB bandwidth, sensitivity is proportional to √BW_BB, and the difference reduces to 2.28 dB.
The analysis so far has assumed an ideal, abrupt startup; Fig. 18 shows the effect of nonideal startup. A startup time of one additional PLDC bit duration is assumed. This increases the power by 100% for BLDC but by only 25% for PLDC. In other words, BLDC achieves a poor "energy-per-bit" metric compared with a PLDC receiver of the same latency and power. Consider also an integrated PLL with a low-frequency reference (3 kHz), such as in [28]. Its settling time can be several milliseconds, so a BLDC scheme may pay a much heavier power penalty, because the startup time then far exceeds the bit-sensing period.
Another undesirable effect of BLDC is dc offset and interference upconversion caused by the spectral shaping of the abrupt rectangular sampling window [56]. Because a PLDC receiver operates mostly in steady state, such spectral shaping does not occur for a sufficiently long T_on,P.
A summary of the above analysis is shown in Fig. 19. For a fixed transmitter bit rate, BLDC must trade off sensitivity to achieve lower power, while PLDC can trade off latency for power while maintaining maximum sensitivity. This tradeoff assumes that the transmitter continues the back-to-back wake-up pattern transmission for a predetermined maximum latency period. A similar scheme cannot exist for asynchronous BLDC. Maintaining maximum sensitivity while lowering power would require a fixed on-time combined with a reduced bit rate. Such a bit-rate reduction must be correctly conveyed to all WuRX nodes in a network, which requires a synchronous network. Additionally, for extremely narrow bandwidths, the reference oscillator accuracy required for RF channel selection in downconverter-based topologies can be prohibitively stringent. In that case, BLDC may not be suitable.
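The equal-latency, equal-power comparison and the startup overhead can be reproduced numerically. The sketch takes the BLDC bandwidth from the rise-time rule (0.35/T_on,B). It takes the PLDC bandwidth as the reciprocal of the PLDC bit duration (N/T_on,P), which is the choice that reproduces the 0.35 bandwidth ratio quoted in this section. The function name, the `t_startup` argument, and the unit-interval time base are illustrative assumptions.

```python
import math


def bldc_vs_pldc(n_bits, t_per_b, duty, t_startup=0.0):
    """Compare BLDC and PLDC at equal latency and equal power (ideal timing model).

    n_bits: wake-up message length N; t_per_b: BLDC bit period; duty: shared duty cycle;
    t_startup: extra on-time spent on each receiver wake-up (same time units).
    """
    t_on_b = duty * t_per_b            # BLDC senses this much of every bit
    t_per_p = n_bits * t_per_b         # equal latency: PLDC period spans one BLDC message
    t_on_p = duty * t_per_p            # equal power: same duty cycle, so t_on_p = N * t_on_b
    bw_bldc = 0.35 / t_on_b            # rise-time-to-bandwidth rule
    bw_pldc = n_bits / t_on_p          # reciprocal of the PLDC bit duration t_on_p / N
    ratio = bw_bldc / bw_pldc
    return {
        "bw_ratio_bldc_over_pldc": ratio,
        "sens_diff_db_linear_bw": 10.0 * math.log10(1.0 / ratio),  # P_MDS ~ BW_BB
        "sens_diff_db_sqrt_bw": 5.0 * math.log10(1.0 / ratio),     # P_MDS ~ sqrt(BW_BB)
        "startup_overhead_bldc": t_startup / t_on_b,
        "startup_overhead_pldc": t_startup / t_on_p,
    }


# Fig. 17/18 scenario in unit intervals: N = 4, BLDC bit = 4 units, 25 % duty, 1-unit startup.
result = bldc_vs_pldc(n_bits=4, t_per_b=4.0, duty=0.25, t_startup=1.0)
for key, value in result.items():
    print(f"{key}: {value:.3f}")
```

For this scenario, the sketch returns a bandwidth ratio of 0.35 and sensitivity differences of about 4.6 dB and 2.3 dB. It also returns startup overheads of 100% for BLDC and 25% for PLDC.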

VII. CORRELATOR ARCHITECTURES
To demodulate the signal in a WuRX, the signal is typically digitized by an ADC or comparator and then fed into a digital correlator that compares the received data with a reference code [6], [9], [11], [34], [56], as shown in Fig. 20(a). A longer correlator code provides higher coding gain and can relax the minimum SNR required for detection. Bassirian [49] showed that a 63-bit correlator can provide around 6 dB more SNR than an 8-bit correlator.
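One way to see why longer codes relax the SNR requirement is to compare error tolerance at a fixed false-wake probability: a longer code can accept more bit errors. The sketch assumes that, with no wake-up sent, the correlator sees independent, equiprobable random bits. It also assumes bit errors are independent when a wake-up is sent. The 1/256 false-wake target and the 10% bit-error rate are hypothetical, and the example does not attempt to reproduce the 6-dB figure of [49].

```python
from math import comb


def false_wake_prob(n_bits, max_errors):
    """P(random equiprobable bits match the code within max_errors)."""
    return sum(comb(n_bits, k) for k in range(max_errors + 1)) / 2**n_bits


def detect_prob(n_bits, max_errors, p_bit_error):
    """P(the true code is accepted) when each bit flips independently with p_bit_error."""
    return sum(comb(n_bits, k) * p_bit_error**k * (1 - p_bit_error) ** (n_bits - k)
               for k in range(max_errors + 1))


def largest_threshold(n_bits, p_false_max):
    """Largest Hamming-distance threshold keeping the false-wake probability <= p_false_max."""
    tau = -1
    while tau + 1 <= n_bits and false_wake_prob(n_bits, tau + 1) <= p_false_max:
        tau += 1
    return tau


for n in (8, 63):
    tau = largest_threshold(n, 1 / 256)
    print(f"{n}-bit code: accept <= {tau} errors, "
          f"P(detect | 10% bit errors) = {detect_prob(n, tau, 0.1):.3f}")
```

At the same false-wake rate, the 8-bit code must match exactly, while the 63-bit code tolerates many bit errors. The longer code therefore detects reliably at a bit-error rate, and hence an SNR, at which the short code usually fails.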
Digital correlators require synchronization between the clock and data, and 2× or higher oversampling is typically implemented to overcome the phase misalignment [6], [9], [34]. In a duty-cycled WuRX, a first-in first-out (FIFO)-based linear shifting correlator requires the WuRX to stay on for at least two full packet lengths. This is needed to guarantee detection despite the phase misalignment between transmitter and receiver. Dissanayake et al. [34] presented a rotating digital correlator that reduces the RF block on-time to one packet. Only the low-power digital logic stays on for two packets, and the captured code is rotated and compared with the reference code. Digital correlators have the advantage of small area and can typically be implemented at the 1-nW level [8], [9]. However, because quantization occurs before the digital correlator, the correlator offers no selectivity against in-band AM interference.
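The rotating-correlator idea can be sketched in software. The sketch assumes the receiver captures exactly one code length of bits, at an arbitrary phase, from a back-to-back repeated wake-up code. It then tries every cyclic rotation against the reference. The function name and the mismatch threshold are hypothetical, and a hardware implementation would typically rotate a register one position per clock rather than build new lists.

```python
def rotating_correlator(captured, reference, max_errors=0):
    """Return the rotation at which the captured window matches the reference, or None.

    captured: N bits sampled over one packet duration of a back-to-back repeated
    wake-up code, starting at an arbitrary phase; reference: the node's N-bit code.
    """
    n = len(reference)
    if len(captured) != n:
        raise ValueError("captured window must be exactly one code length")
    for shift in range(n):
        rotated = captured[shift:] + captured[:shift]
        errors = sum(a != b for a, b in zip(rotated, reference))
        if errors <= max_errors:
            return shift
    return None


reference = [1, 0, 1, 1, 0, 0, 1, 0]
# The transmitter repeats the code; the receiver happens to start 3 bits into a packet.
captured = (reference * 2)[3:3 + len(reference)]
print(rotating_correlator(captured, reference))  # 5
```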
Analog correlators have been proposed to address this issue [57], [58], as shown in Fig. 20(b). Analog correlators avoid the need for clock synchronization by performing the correlation continuously. In addition, the correlation acts as a matched filter before the BB decision. It suppresses AM interference from unwanted codes while maximizing the signal matched to the desired code, as shown in Fig. 21. Ideally, selecting orthogonal codes maximizes this advantage. As a result, if different WuRXs are assigned different wake-up codes, the analog correlator allows multiple receivers to be woken up simultaneously using code-division multiple access (CDMA). Analog correlators provide no coding gain, but the correlation adds signal in amplitude while adding noise in power [57]. It therefore provides a 10 log(N)-dB SNR processing gain with nonreturn-to-zero (NRZ) coding and a 10 log(L)-dB gain with return-to-zero (RZ) coding. Here, N is the number of bits in the correlator and L is the number of 1's in the RZ code. Current challenges of analog correlators include a relatively larger area and higher power than digital correlators (37 nW for an 11-bit correlator in [57]). In addition, the delay cells in the correlator require calibration.
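The 10 log(N) processing gain for NRZ codes can be checked with a discrete-time Monte Carlo model. Each chip of a ±1 code carries additive white Gaussian noise, and the correlator sums the received chips weighted by the reference. The signal therefore adds in amplitude while the noise adds in power. This is a behavioral model only, not a model of the analog delay-line circuit. The 11-chip Barker code, noise level, and trial count are hypothetical choices, not the code used in [57].

```python
import numpy as np


def correlator_snr_gain_db(code, sigma=1.0, trials=20_000, seed=0):
    """Estimate the SNR processing gain (dB) of correlating an NRZ (+/-1) code in AWGN."""
    rng = np.random.default_rng(seed)
    code = np.asarray(code, dtype=float)
    noise = rng.normal(0.0, sigma, size=(trials, code.size))
    received = code + noise                 # unit-amplitude chips plus white noise
    out = received @ code                   # correlate each trial against the reference
    snr_in = 1.0 / sigma**2                 # per-chip SNR
    snr_out = out.mean() ** 2 / out.var()   # signal grows as N, noise variance as N
    return 10.0 * np.log10(snr_out / snr_in)


barker_11 = [1, 1, 1, -1, -1, -1, 1, -1, -1, 1, -1]
print(f"estimated gain: {correlator_snr_gain_db(barker_11):.2f} dB, "
      f"10*log10(11) = {10 * np.log10(11):.2f} dB")
```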

VIII. FIGURE-OF-MERIT AND TRADEOFF ANALYSIS
A. FIGURE-OF-MERIT ANALYSIS
So far, this article has discussed several WuRX architectures with different sensitivities, latency/data rates, and power consumption results and tradeoffs. These differences make it unclear which design approach is best suited for a particular application. This section will describe a theoretical analysis that will result in a few different FoMs that can be used to compare previous work, while also enabling better insight into the corresponding design tradeoffs in WuRXs.
Because of its simplicity and low power, frequency downconversion using an ED is present in most WuRXs; however, the inherent nonlinear squaring function on both signal and noise makes deriving the noise figure nontrivial. It is shown in [38] that three kinds of noise can be the limiting factor on sensitivity.
1) Baseband Noise: This includes the noise of the ED itself and of all the BB amplifiers/filters before the ADC. The sensitivity limited by BB noise can be written as (26). Assuming the BB noise is mostly white, V_n,eq^2 is proportional to BW_BB, which makes P_MDS proportional to √BW_BB. BB noise tends to dominate in designs with insufficient pre-ED RF voltage gain. 2) Convolution Noise: For designs with sufficient pre-ED gain, the noise is dominated by the pre-ED circuitry, and one of two mechanisms can then dominate.
When the pre-ED RF filter bandwidth BW_filter is small, the convolution between the RF signal and the RF noise, caused by the nonlinear squaring function of the ED, can dominate. The minimum detectable signal in this situation is given by (5), and P_MDS is proportional to BW_BB. 3) Self-Mixing Noise: On the other hand, if BW_filter is large, sensitivity is dominated by the self-mixed portion of the noise, called self-mixing noise. The minimum detectable signal in this case is given by (6), and P_MDS is proportional to √BW_BB. For these latter two cases, a crossover bandwidth BW_cor determines whether sensitivity is limited by convolution noise or self-mixing noise. It can be derived by equating (5) and (6) directly, which results in (27). Based on these three sensitivity-limiting factors, all reported WuRX architectures can be further divided into six categories.
1) ED-First: The first category includes direct-demodulation WuRXs that feature only passive voltage gain before the ED (i.e., ED-first architectures) [4], [6], [9], [11], [12], [13], [39], [40], [41], [57], [59], [60], [61]. Because of the limited passive gain, the sensitivity of this category is dominated by BB noise only, as described in (26). 2) LNA-ED-First With High Pre-ED Gain: Some designs improve sensitivity by placing an LNA before the ED to obtain active RF gain beyond the passive gain of the matching network. However, because of the high noise figure of the ED, some designs still cannot achieve enough gain, even with an LNA, for pre-ED noise to dominate [15], [55], [62]. For these designs, sensitivity is still dominated by BB noise. On the other hand, with enough RF gain from the LNA, sensitivity is limited by self-mixing noise because of the large pre-ED filter bandwidth at RF [62], [63]. In both scenarios, sensitivity is proportional to √BW_BB regardless of the LNA gain.

3) Direct-Demodulation, Active Pre-ED Gain With MEMS Filter or CE-OOK: Researchers have shown that by incorporating either a high-Q MEMS filter for pre-ED noise rejection [17], [56] or a CE-OOK technique that allows narrowband IF filtering [22], the $\mathrm{BW_{filter}}$ of the direct-demodulation architecture with active pre-ED gain can be made narrow enough for convolution noise to dominate the overall noise, which makes the sensitivity proportional to $\mathrm{BW_{BB}}$ instead of $\sqrt{\mathrm{BW_{BB}}}$.

4) Mixer-Based Heterodyne or Zero-IF: As discussed in the prior sections, the mixer-based heterodyne or zero-IF architecture can provide sufficient gain and filtering before the ED for convolution noise to dominate. Therefore, sensitivity is proportional to $\mathrm{BW_{BB}}$ for this category. Note that some designs adopt a conventional linear signal chain without the squaring function provided by the ED, which also makes the sensitivity proportional to $\mathrm{BW_{BB}}$.

5) Mixer-Based Uncertain-IF With Large Pre-ED Filter Bandwidth: The mixer-based uncertain-IF architecture can be divided into two categories depending on whether the pre-ED filter bandwidth $\mathrm{BW_{filter}}$ is larger or smaller than $\mathrm{BW_{cor}}$ from (27) for a given set of parameters. For designs such as [23], [25], [36], and [64] that have a relatively large $\mathrm{BW_{filter}}$, self-mixing noise dominates, which means that sensitivity is proportional to $\sqrt{\mathrm{BW_{BB}}}$.

6) Mixer-Based Uncertain-IF With Small Pre-ED Filter Bandwidth: As discussed in Section V, uncertain-IF designs that adopt an LC oscillator [26], [34] can achieve a small $\mathrm{BW_{filter}}$, which makes the sensitivity limited by convolution noise. Therefore, sensitivity is proportional to $\mathrm{BW_{BB}}$ instead of $\sqrt{\mathrm{BW_{BB}}}$.
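The six categories differ mainly in how $P_{\mathrm{MDS}}$ scales with $\mathrm{BW_{BB}}$. The Python sketch below encodes only the proportionalities stated above (exponent 0.5 for BB-noise- and self-mixing-noise-limited designs, 1.0 for convolution-noise-limited designs) and reports the resulting change in $P_{\mathrm{MDS}}$ when $\mathrm{BW_{BB}}$ is scaled. The names `EXPONENT`, `CATEGORY_MECHANISM`, and `mds_change_db` are hypothetical, and the model deliberately ignores every constant prefactor in (5), (6), and (26).

```python
import math

# Exponent n in P_MDS ∝ BW_BB**n for each sensitivity-limiting mechanism,
# taken from the proportionalities stated in the text.
EXPONENT = {
    "bb_noise": 0.5,      # BB-noise-limited: P_MDS ∝ sqrt(BW_BB)
    "self_mixing": 0.5,   # self-mixing-noise-limited: P_MDS ∝ sqrt(BW_BB)
    "convolution": 1.0,   # convolution-noise-limited: P_MDS ∝ BW_BB
}

# Dominant mechanism for each of the six categories. Category 2 may be
# BB-noise- or self-mixing-limited; both give the same exponent.
CATEGORY_MECHANISM = {
    1: "bb_noise",
    2: "self_mixing",
    3: "convolution",
    4: "convolution",
    5: "self_mixing",
    6: "convolution",
}


def mds_change_db(category: int, bw_ratio: float) -> float:
    """Change in P_MDS (dB) when BW_BB is multiplied by bw_ratio.

    A positive result means the minimum detectable signal rises,
    i.e., sensitivity degrades.
    """
    n = EXPONENT[CATEGORY_MECHANISM[category]]
    return 10.0 * n * math.log10(bw_ratio)


if __name__ == "__main__":
    for cat in range(1, 7):
        delta = mds_change_db(cat, 10.0)
        print(f"Category {cat} ({CATEGORY_MECHANISM[cat]}): "
              f"{delta:+.1f} dB per 10x BW_BB")
```

Under this model, a tenfold wider $\mathrm{BW_{BB}}$ costs about 5 dB of sensitivity in Categories 1, 2, and 5 but about 10 dB in Categories 3, 4, and 6, which is the distinction the normalization in (28) and (29) accounts for.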
The result of this analysis is that the sensitivity of all of these designs is either linearly or square-root proportional to $\mathrm{BW_{BB}}$. Therefore, for designs that report sensitivity at a 0.1% BER, normalized sensitivity can be computed in one of two ways, depending on its relationship with $\mathrm{BW_{BB}}$:

$$P_{\mathrm{SEN,NORM,1}}\ (\mathrm{dB}) = -P_{\mathrm{SEN}} + 5\log_{10}\mathrm{BW_{BB}} \qquad (28)$$

$$P_{\mathrm{SEN,NORM,2}}\ (\mathrm{dB}) = -P_{\mathrm{SEN}} + 10\log_{10}\mathrm{BW_{BB}} \qquad (29)$$
The factor of 5 in (28) arises from the square-root dependence. On the other hand, for designs that report sensitivity at a 0.1% MDR, the measurement involves the digital BB/correlator and the associated wake-up signature, so sensitivity is normalized to the latency: $\mathrm{BW_{BB}}$ is replaced with 1/latency. Similarly, for WuRXs that report only BER, the bit period takes the place of the latency. Note, however, that latency is a more pertinent basis for WuRX sensitivity than BER or packet error rate because of the additional constraints imposed by the false-alarm rate. Using (28) for Categories 1, 2, and 5, and (29) for Categories 3, 4, and 6, the corresponding normalized sensitivity versus power for state-of-the-art WuRXs is shown in Fig. 22. Moreover, a FoM that accounts for both data rate and power consumption can be derived; lines of constant FoM are shown as the fixed lines in Fig. 22. A higher FoM indicates a design that better navigates the noise/power/bandwidth tradeoff.
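As a minimal sketch of the normalization in (28) and (29), the Python function below selects the coefficient by category and, for MDR-based reports, substitutes 1/latency for $\mathrm{BW_{BB}}$. It assumes $P_{\mathrm{SEN}}$ in dBm and $\mathrm{BW_{BB}}$ in Hz (the text does not fix these units); the function names and all example numbers are hypothetical and are not taken from any reported design.

```python
import math

# Categories whose sensitivity scales with sqrt(BW_BB) use (28);
# those scaling linearly with BW_BB use (29).
SQRT_CATEGORIES = {1, 2, 5}
LINEAR_CATEGORIES = {3, 4, 6}


def normalized_sensitivity_db(p_sen_dbm: float, bw_bb_hz: float,
                              category: int) -> float:
    """Normalized sensitivity per (28) or (29); higher is better."""
    if bw_bb_hz <= 0:
        raise ValueError("BW_BB must be positive")
    if category in SQRT_CATEGORIES:
        coeff = 5.0    # (28): square-root dependence on BW_BB
    elif category in LINEAR_CATEGORIES:
        coeff = 10.0   # (29): linear dependence on BW_BB
    else:
        raise ValueError(f"unknown category {category}")
    return -p_sen_dbm + coeff * math.log10(bw_bb_hz)


def bw_from_latency(latency_s: float) -> float:
    """For MDR-based reports, BW_BB is replaced by 1/latency."""
    return 1.0 / latency_s


if __name__ == "__main__":
    # Hypothetical designs: -80 dBm sensitivity at BW_BB = 1 kHz.
    print(normalized_sensitivity_db(-80.0, 1e3, category=1))  # sqrt scaling
    print(normalized_sensitivity_db(-80.0, 1e3, category=4))  # linear scaling
    # Hypothetical MDR-based report: -70 dBm with 10 ms wake-up latency.
    print(normalized_sensitivity_db(-70.0, bw_from_latency(10e-3), category=6))
```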

B. BOUNDARIES BETWEEN ARCHITECTURE CHOICES
One major decision constraining the available design choices is the RF carrier frequency. Various tradeoffs enabled by low- or high-frequency operation are shown in Fig. 23. A lower carrier frequency generally eases low-power circuit design and extends the communication range because of the lower path loss. However, the required antenna is larger, which increases the total device area/volume. The inductors required for input matching or LC oscillators are also area-consuming and generally restricted to off-chip components. Thus, low-frequency operation may offer a long communication range at the cost of a large footprint and hence tends to be more suitable for large-scale outdoor networks, such as agricultural and farm monitoring. Given the larger volume, one could argue that a larger battery can also be included, reducing the need for low power consumption; careful application-level decisions need to be made here. Conversely, a higher carrier frequency requires more power for active gain and/or RF filtering but benefits from a smaller antenna. On-chip inductors, although low-Q, also become usable, which benefits integrated low-volume systems. The wider available bandwidths can support a larger number of simultaneously accessible nodes through multichannel communication and efficient spectrum management, but they also bring more interference. Due to the high level of integration, high path loss, and availability of larger bandwidths, high-frequency operation is ideally suited to short- to mid-range indoor networks (<100 m).

Fig. 24 shows the dc power versus normalized sensitivity of several state-of-the-art works in the sub-GHz and multi-GHz regions [65]. Two bounds on the design space can immediately be identified at the edges: the heterodyne topology achieves the best sensitivity but also consumes the highest power, while ED-first consumes the lowest power but achieves the lowest sensitivity.
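The path-loss and antenna-size side of the carrier-frequency tradeoff can be made concrete with textbook free-space relations. The sketch below assumes free-space (Friis) propagation between antennas of fixed gain and uses a half-wave dipole as a rough proxy for antenna size; the 433 MHz and 2.4 GHz carriers and the 10 m distance are illustrative choices, not values from the text. Real indoor links add wall, body, and multipath losses that this model ignores.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss (dB) between isotropic antennas (Friis)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C)


def half_wave_dipole_m(freq_hz: float) -> float:
    """Length of a half-wave dipole, used here as a proxy for antenna size."""
    return C / freq_hz / 2.0


if __name__ == "__main__":
    d = 10.0  # illustrative link distance, m
    for f in (433e6, 2.4e9):
        print(f"{f / 1e6:7.0f} MHz: FSPL = {fspl_db(d, f):5.1f} dB, "
              f"lambda/2 = {100 * half_wave_dipole_m(f):5.1f} cm")
    extra = fspl_db(d, 2.4e9) - fspl_db(d, 433e6)
    print(f"Extra free-space loss at 2.4 GHz: {extra:.1f} dB")
```

In this idealized model the extra loss grows as 20 log10 of the frequency ratio, roughly 15 dB between these two carriers, while the dipole shrinks from about 35 cm to about 6 cm.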
At higher frequencies, ED-first suffers from lower sensitivity because the passive voltage boost provided by the input matching network is limited by low-Q components at multi-GHz frequencies. The LNA-ED-first topology achieves moderate sensitivity at moderate dc power, and its power can be reduced to sub-μW levels with duty cycling. This moves the receiver operating point diagonally downward, as shown by the duty-cycling trends on the plots. Sensitivity can be improved by adopting high-Q narrowband MEMS filters to achieve a lower equivalent noise bandwidth (ENB) prior to the rectifier [56]. However, MEMS components increase node volume and integration cost, and they lack tunability.
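The effect of duty cycling can be approximated with a simple average-power model. The sketch below is a hypothetical first-order model, not the one used to draw the trends in Fig. 24: it ignores start-up energy and settling time, assumes the worst-case wake-up latency grows by a factor 1/D for duty cycle D, and applies the latency-based normalization described earlier (1/latency in place of $\mathrm{BW_{BB}}$) with the square-root coefficient of (28) that applies to LNA-ED-first (Category 2). Under those assumptions, average power and normalized sensitivity fall together, which is one plausible reading of the diagonal trend. The power values are placeholders.

```python
import math


def duty_cycled_power_w(p_active_w: float, p_sleep_w: float,
                        duty: float) -> float:
    """Average power of an idealized duty-cycled receiver (no start-up cost)."""
    if not 0.0 < duty <= 1.0:
        raise ValueError("duty cycle must be in (0, 1]")
    return duty * p_active_w + (1.0 - duty) * p_sleep_w


def normalized_sensitivity_shift_db(duty: float, coeff: float) -> float:
    """Shift in latency-normalized sensitivity when latency scales by 1/duty.

    coeff is 5 for sqrt-scaling designs and 10 for linear-scaling designs.
    Since BW_BB is replaced by 1/latency, a longer latency lowers the metric.
    """
    return coeff * math.log10(duty)


if __name__ == "__main__":
    p_on, p_off = 100e-6, 10e-9  # placeholder active and sleep power, W
    for duty in (1.0, 0.1, 0.01):
        p_avg = duty_cycled_power_w(p_on, p_off, duty)
        shift = normalized_sensitivity_shift_db(duty, coeff=5.0)
        print(f"D = {duty:5.2f}: P_avg = {p_avg * 1e6:7.3f} uW, "
              f"normalized sensitivity shift = {shift:+.1f} dB")
```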
At higher frequencies, LNA-ED-first suffers from a larger ENB, leading to lower sensitivity. However, if the LO stability can be improved, the Uncertain-IF architecture can reduce the ENB with tight lowpass filtering after the mixer. Hence, techniques such as high-Q off-chip inductors for the LC oscillator [26] and built-in calibration methods [34] have been proposed. Uncertain-IF also lends itself more easily to duty cycling than heterodyne because it omits the PLL and high-Q crystal reference.
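Why LO stability matters for the Uncertain-IF ENB can be illustrated with a first-order frequency-error budget. The model below is an assumption for illustration, not a design procedure from the text: it takes the post-mixer filter bandwidth as the full ±Δf LO uncertainty window plus the signal bandwidth, with Δf set by the LO drift in ppm. The 2.4 GHz carrier, the drift figures, and the 100 kHz signal bandwidth are hypothetical.

```python
import math


def lo_error_hz(f_lo_hz: float, drift_ppm: float) -> float:
    """Worst-case LO frequency error for a given drift in ppm."""
    return f_lo_hz * drift_ppm * 1e-6


def filter_bw_hz(f_lo_hz: float, drift_ppm: float,
                 signal_bw_hz: float) -> float:
    """Assumed filter bandwidth: +/- LO uncertainty window plus signal bandwidth."""
    return 2.0 * lo_error_hz(f_lo_hz, drift_ppm) + signal_bw_hz


def enb_reduction_db(bw_wide_hz: float, bw_narrow_hz: float) -> float:
    """Noise-bandwidth reduction (dB) obtained by narrowing the filter."""
    return 10.0 * math.log10(bw_wide_hz / bw_narrow_hz)


if __name__ == "__main__":
    f_lo, sig_bw = 2.4e9, 100e3  # hypothetical carrier and signal bandwidth
    loose = filter_bw_hz(f_lo, 1000.0, sig_bw)  # hypothetical poorly stabilized LO
    tight = filter_bw_hz(f_lo, 100.0, sig_bw)   # hypothetical better-stabilized LO
    print(f"1000 ppm LO: filter BW = {loose / 1e6:.2f} MHz")
    print(f" 100 ppm LO: filter BW = {tight / 1e6:.2f} MHz")
    print(f"ENB reduction: {enb_reduction_db(loose, tight):.1f} dB")
```

How much of such a noise-bandwidth reduction translates into sensitivity depends on which noise mechanism dominates, per the category discussion above.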
The identified operating-space tradeoffs are summarized in Fig. 25(a). The ED-first and heterodyne architectures occupy the edges of the operating space for sensitivity and power. The in-between regions can be realized by heterodyne, Uncertain-IF, or LNA-ED-first architectures combined with duty cycling. The sensitivity of LNA-ED-first can be improved with MEMS-based ENB-limiting filters, while the Uncertain-IF sensitivity can be enhanced with calibration techniques that improve the LO stability.
The SIR trends are shown in Fig. 25(b). The ability to reject interference is very important if the WuRX is expected to operate in a crowded area with many potential interference sources. Among the architectures, the heterodyne option achieves the best SIR because sharp filtering is easy to implement at IF/BB. ED-first and LNA-ED-first cannot achieve any BB filtering without incorporating a channelized signaling scheme. Even with such a scheme, the reported LNA-ED-first works have achieved poor SIR [19], [22], which can be attributed to front-end nonlinearity caused by the high RF gain required for high sensitivity. The main benefit of channelized signaling appears to be that the ED output can be ac-coupled to BB, which alleviates the large dc offset caused by noise self-mixing [19]. Prior work adopting channelized signaling lowered the front-end gain to achieve better SIR, but this also lowered the sensitivity [20]. Hence, further work is required to determine whether channelized signaling schemes can provide reasonable SIR while maintaining sensitivity.
A straightforward way to improve the SIR of the ED-first and LNA-ED-first options is to adopt a high-Q input matching network [40] or a front-end MEMS filter [56], both of which require off-chip components. The Uncertain-IF topology can realize better SIR even with a wideband input match because of the selectivity offered by downconversion and filtering at BB [26], and its SIR is further enhanced by better LO stability. The insights related to the SIR trends and the approximate minimum power floor with duty cycling are illustrated in Fig. 25(b). The minimum power floor is estimated from the values reported in the prior art, considering the overall WuRX system complexity.

IX. CONCLUSION
This article has shown that simultaneously achieving low power and good sensitivity, wake-up latency, interference resilience, size, and robustness can be quite challenging, especially if standards compatibility is required. However, this article has also shown that a careful selection of architecture, circuits, and system-level techniques can enable WuRXs that meaningfully meet many of these requirements and, therefore, provide lower power connectivity than conventional approaches in a large number of emerging application spaces.
The key tradeoffs involve deciding whether to include an LO and/or an LNA, and deciding how much wake-up latency is tolerable by the underlying application.
Moving forward, the next generation of WuRXs will need to be compatible with more standards to achieve more widespread adoption. Including wake-up modes directly in the standards themselves will help tremendously. It will be important to continue researching ways to improve interference resiliency, especially in very congested spectral bands, while keeping power low and sensitivity matched to that of the main radio. Further research on low-power LO generation, sharp RF filtering, and antenna/matching network/LNA/ED co-integration can help achieve these goals and bring WuRXs into more commercial spaces.