A 0.7-V Sub-mW Type-II Phase-Tracking Bluetooth Low Energy Receiver in 28-nm CMOS

We present an architecture of a Bluetooth low energy (BLE)-compliant receiver which, for the first time, breaks the 1-mW barrier of power consumption. It is based on a type-II phase-tracking loop and addresses the mutual magnetic coupling between the on-chip inductors of a digitally controlled oscillator (DCO) and a low-noise transconductance amplifier (LNTA), which degrades the RX performance in prior-art implementations. An inverter-based inductor-free LNTA is employed instead. The resulting adjacent channel rejection (ACR) improves by 1.5/2.5 dB at 2/3-MHz offset. By further leveraging current-reuse and switched-capacitor circuitry, this RX achieves the best-in-class FoM of 183.2 dB with a sensitivity of −93.2 dBm. Thanks to the single-channel topology, the proposed RX occupies a tiny area of 0.48 mm² in 28-nm CMOS.


I. INTRODUCTION
With the increasingly high demand for Internet-of-Things (IoT) devices, there is a strong push for ultra-low power (ULP) and ultra-low voltage (ULV) operation of wireless nodes [1]-[6]. Conventional in-phase/quadrature (I/Q), or Cartesian, receivers (RX) still dissipate large amounts of power, e.g., >5 mW for commercial products [7] and >1.9 mW [5], [8], [9] for exploratory research, due to the unavoidable use of a power-hungry PLL (commonly requiring >1 mW for sufficient phase noise performance) and the need for two identical baseband I/Q signal paths. Several sub-mW Cartesian receivers have been reported, yet they either feature a sensitivity worse than −90 dBm [10]-[12] or report only an RF front-end design without counting the power of the PLL [10]-[14]. Ref. [15] achieves great sensitivity at 0.98 mW, yet it does not include a baseband filter or ADC. To achieve good interferer rejection, a PLL with low phase noise and a sharp filter are required, which adds to the overall power dissipation in [15].
To avoid the power-hungry PLL and quadrature demodulation path, a single-channel or phase-domain RX architecture appears to be an attractive solution for Bluetooth low energy (BLE) or comparable radios. Liu et al. [3] developed the first non-quadrature, phase-tracking (PT) receiver with direct demodulation to achieve excellent power efficiency, but it introduced new issues of a constrained data pattern and an unstable locking loop. The follow-up version by Ding et al. [4] employs a digital PLL and heavy digital filtering to address the unstable locking and the large side-lobe energy that degrades the ACR. While the digital PLL ensures stable locking, it degrades the system power efficiency. Moreover, the RX in [4] still does not support long run-lengths of consecutive "1" or "0" symbols because it is inherently a type-I PT-RX.
Our previous work [1] resolves the aforementioned issues with a type-II phase-tracking architecture. It employs a multibit ADC to zero out the residual phase error and to provide a wider locking range in the steady-states of "1" and "0" symbols. As a result, [1] achieves excellent power efficiency with 1.5 mW consumption and a sensitivity of −93 dBm. However, current implementations of PT-RXs commonly suffer from mutual coupling between the on-chip inductors in the low-noise transconductance amplifier (LNTA) and the digitally controlled oscillator (DCO), which results in a reduced tolerance to interferers and a degraded signal-to-noise ratio (SNR). Furthermore, despite being able to offer ULV operation at <0.4 V, the inductor-based LNTAs in [1], [3] still consume more than 2× the power of a push-pull architecture [16] at roughly the same noise and gain, and occupy a much larger area.
The objective of our work is to address the remaining mutual coupling issue and to break the 1-mW power consumption barrier of a BLE-compliant receiver. We employ an inductor-free inverter-based LNTA to obtain an overall NF of 6 dB at a maximum RX gain of 60 dB. Furthermore, inverter-based $g_m$ cells are widely used in this work to support the ULV operation, which also promotes power efficiency. This enables the use of a supply voltage of 0.7 V and results in a record-low total power consumption of 900 μW. Apart from the mutual coupling, excessive loop delay can also severely degrade the interferer tolerance. Noise analysis and type-II theory have been explored in our prior work [1], but there are still open questions about the optimization of the loop delay and the selection of the ADC resolution. Therefore, to provide design guidance for the type-II PT-RX optimization, we also extend the system analysis in this article via a nonlinear model of the type-II PT-RX and offer some further discussion on the ADC resolution.
This article is organized as follows. Section II discusses the mutual coupling issue in the PT-RXs. Section III gives a brief overview of the proposed sub-mW RX. Section IV provides the analysis of loop delay and bandwidth to achieve the best sensitivity and interferer rejection performance. The key circuit blocks are introduced in Section V, followed by measurement results in Section VI. The conclusions are drawn in Section VII.

II. BRIEF OVERVIEW OF THE MUTUAL COUPLING ISSUE
Figure 1(a) illustrates the parasitic mutual coupling paths (labeled "C" and "D") between the LNTA and DCO inductors in the phase-tracking receiver. "A" is the received signal and "B" is the DCO RF component in response to the tracking loop modulation. Figures 1(b)-(d) conceptually present the resulting issues:
1) The mutual coupling can further degrade the demodulated SNR by strengthening the unwanted mixing components in both the signal channel and the adjacent channels, as shown in Fig. 1(b). Furthermore, the increased energy at the adjacent channel ($\omega_{int}$) results in poor ACR performance. Although a higher-order discrete-time (DT) low-pass filter (LPF) with a sharp transition band [17] could offer strong filtering of the unwanted signal, the closed-loop RX must still obey the typical trade-off between bandwidth and loop delay. The unwanted components in the signal band lead to a static dc offset. Although this dc offset can be addressed by auto-calibration, it further enlarges the required dynamic range of the DAC. Therefore, it is necessary to minimize any coupling to prevent the SNR degradation.
2) Coupling from a strong adjacent-channel interferer (e.g., at 1 MHz offset) can give rise to DCO pulling or even locking to that interferer, and thus can severely impact the tracking loop [see Fig. 1(c)]. This is similar in concept to injection locking and pulling in an oscillator. As discussed in [18], [19], an oscillator coupled to a strong interferer exhibits an injection locking range $\omega_L$ given by Eq. (1), where $I_{inj}$ and $I_{osc}$ are the injection and oscillator currents, respectively, $\omega_0$ is the natural angular resonant frequency of the tank, and $Q$ is its quality factor. According to Eq. (1), avoiding injection locking to an adjacent-channel interferer requires a large $I_{osc}$, which is against our ULP objective. In the injection-pulling mode, according to [19], the power of the largest pulling spur $P_{\omega_{spur}}$ depends on $\omega_m$, the frequency offset between the pulling spur and the resonant frequency $\omega_0$. This indicates that the closer the adjacent channel, the larger the pulling spur. The inductive coupling could be mitigated by resonating the DCO at 4.8 GHz and dividing it by two, at a cost of increased power dissipation or worsened phase noise performance.
3) Since the DCO frequency tracks the GFSK-modulated data in real time, the coupling energy would also contain a symbol-dependent dynamic offset, which, via self-mixing, would cause a dynamic dc offset at the LPF output [see Fig. 1(d)]. Thus, static and time-dependent offsets must also be minimized so as not to degrade the sensitivity performance.
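The dependence of the locking range on the injection ratio in item 2) can be illustrated numerically. The sketch below assumes that Eq. (1) takes the generalized Adler form reported in [19], $\omega_L = \frac{\omega_0}{2Q}\cdot\frac{I_{inj}}{I_{osc}}\cdot\frac{1}{\sqrt{1 - I_{inj}^2/I_{osc}^2}}$; the function name `locking_range_hz` and the tank values (2.44 GHz, Q = 10) are hypothetical and chosen only for illustration.

```python
import math

def locking_range_hz(f0_hz, q, inj_ratio):
    """One-sided injection-locking range in Hz (generalized Adler form, assumed).

    inj_ratio is I_inj / I_osc and must satisfy 0 <= inj_ratio < 1.
    """
    if not 0.0 <= inj_ratio < 1.0:
        raise ValueError("inj_ratio must be in [0, 1)")
    return (f0_hz / (2.0 * q)) * inj_ratio / math.sqrt(1.0 - inj_ratio ** 2)

# Hypothetical tank: 2.44-GHz resonance with Q = 10.
for ratio in (1e-4, 1e-3, 1e-2):
    f_lock = locking_range_hz(2.44e9, 10.0, ratio)
    print(f"I_inj/I_osc = {ratio:g}: locking range = {f_lock / 1e6:.4f} MHz")
```

Under these assumed values, the locking range grows almost linearly with the injection ratio, so reducing $I_{osc}$ for low power makes the oscillator easier to lock or pull for the same coupled interferer current.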
III. OVERVIEW OF THE PROPOSED SUB-mW PT-RX
Figure 2 shows the proposed phase-tracking RX. The RF input is passed to an inductor-free inverter-based LNTA, which cuts the power by 50% compared to the LC-tank-based LNTA [1], [3]. The passive mixer functions as both a down-converter and a phase detector (PD), without contributing any flicker noise because it is biased below $V_{TH}$. A process-voltage-temperature (PVT)-independent switched-capacitor-based discrete-time (DT) LPF is utilized to remove the unwanted mixing harmonics and interferers with a programmable bandwidth of 0.4 to 3 MHz. The residual phase error is digitized by an 8-bit SAR-ADC before being delivered to the digital proportional-integral (PI) loop filter. The recovered Gaussian frequency shift keying (GFSK) waveform at the 9-bit oscillator tuning word (OTW) modulates the DCO to generate a replica of the GFSK-modulated RF input signal.
Figure 3 summarizes the advantages of the type-II PT-RX operation, which were discussed in detail in [1]. The key point is that the conventional type-I PT-RX [3], [4] cannot zero out the residual phase at the comparator output [either "1" or "0", see Fig. 3(a)] even with an additional integrator (digital accumulator). Borrowing an analogy from the PLL field, this is because the gain of a bang-bang (BB) phase detector (PD), which is equivalent to the combination of a mixer and a one-bit quantizer in the type-I PT-RX, is nonlinear. The linearized BB-PD gain can only cover a small range near its threshold [20], [21], which is where the symbol transitions in a conventional PT-RX occur; however, the BB-PD gain drops to nearly zero in the steady-states [see Fig. 3(a)]. This may result in a significant reduction of the loop bandwidth, or the loop may intermittently lose lock, thus causing instability. To zero out the phase error, the PD has to operate in the linear range to provide a certain gain.
In other words, in the previous type-I PT-RX, the comparator has to toggle around its threshold to obtain enough gain in the steady-states; however, this is contradictory with its other key role of demodulating input symbols, which requires the comparator to hold its output until the next symbol.
The type-II phase-tracking configuration can also mitigate side-lobe energy at the DCO output. Figure 3(b) shows the oscillator tuning word (OTW) curves for various cases. Subfigures (i) and (ii) in Fig. 3(b) correspond to the type-I PT-RX [3], [4], while subfigure (iii) corresponds to the type-II PT-RX. Note that the remaining parts of the phase-tracking loop are not shown in Fig. 3(b) for simplicity. Thanks to the employment of the multi-bit ADC, our type-II PT-RX tracks the input modulating waveform more faithfully and power-efficiently. Note that Ref. [4] [see subfigure (ii)] mitigates the side-lobe energy notably by inserting a relatively power-hungry (0.75 mW) digital filter, which is not compatible with our goal of sub-mW operation.
It is worth mentioning that, for a type-II loop, setting the ratio of the integral and proportional coefficients ($\rho/\alpha$) is often a trade-off between increasing overshoot and decreasing settling time. The principle we followed was to set $\alpha$ so that the DCO tracking range is slightly larger than the deviation frequency range of 500 kHz, which decreases the settling time, while keeping the $\rho/\alpha$ ratio below 0.01 to ensure enough damping [1].

A. Linear S-Domain Model Including LPF's Poles
Although the linear model in [1] can be quite effective for the analysis of signal and noise transfer functions (STF/NTF), it might be overly simplistic in some cases as it does not take into account the LPF bandwidth. An improved linearized model that does include the effects of the LPF poles is shown in Fig. 4. A seventh-order discrete-time (DT) real-pole LPF is employed in this work. Since the sampling frequency (∼120 MHz) is much higher than the signal bandwidth (1 MHz), we can use a bilinear transform to obtain the CT transfer function of the DT LPF as shown in (3) [17], where $n$ refers to the order of the filter. Equation (3) indicates that all seven poles coincide at $-1/\lambda$. The STF and NTF$_{DCO}$ can be derived as (4) and (5), respectively.
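The effect of seven coincident real poles on attenuation and delay can be sketched numerically. The block below assumes a unity-dc-gain CT equivalent of the form $H(s) = 1/(1 + s\lambda)^n$, which is consistent with all poles lying at $-1/\lambda$; the function names and the choice of a 700-kHz −3-dB bandwidth (the typical setting quoted later for the DT LPF) are illustrative assumptions.

```python
import math

def lambda_for_bandwidth(f3db_hz, order):
    """Time constant lambda such that 1/(1 + s*lambda)**order is -3 dB at f3db_hz."""
    return math.sqrt(2.0 ** (1.0 / order) - 1.0) / (2.0 * math.pi * f3db_hz)

def magnitude_db(f_hz, lam, order):
    """Magnitude of the coincident-pole response at f_hz, in dB."""
    return -10.0 * order * math.log10(1.0 + (2.0 * math.pi * f_hz * lam) ** 2)

def phase_delay_s(f_hz, lam, order):
    """Phase delay -phase/omega; it tends to order*lam as f_hz approaches 0."""
    w = 2.0 * math.pi * f_hz
    return order * math.atan(w * lam) / w

ORDER = 7                                   # seventh-order real-pole LPF
lam = lambda_for_bandwidth(700e3, ORDER)    # assumed 700-kHz bandwidth

print(f"lambda              = {lam * 1e9:.2f} ns")
print(f"low-frequency delay = {ORDER * lam * 1e6:.3f} us")
for f_mhz in (1.0, 2.0, 3.0):
    print(f"attenuation at {f_mhz:.0f} MHz = {-magnitude_db(f_mhz * 1e6, lam, ORDER):.1f} dB")
```

With these assumed numbers the low-frequency delay $n\lambda$ comes out close to half a microsecond, which illustrates how the filter order and bandwidth jointly set the delay that the loop has to tolerate.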

B. Loop Delay Optimization
Due to the incorporation of the LPF, group and phase delays in the PT-RX loop are unavoidable. After the signal $x(t)$ passes through the LPF, a group delay $\tau_g$ and a phase delay $\tau_p$ are applied to the amplitude and the phase of the cosine output $y(t)$, respectively, where $A(t)$ and $\theta$ refer to the amplitude and initial phase of the cosine signal. Figure 6 plots the group and phase delays of a fifth-order Butterworth LPF (mathematical model) with a 1-MHz bandwidth. It reveals that the signal exhibits a constant time shift at low frequencies, but then experiences severe delay changes in the ∼1-MHz roll-off region. This indicates that the loop undergoes the worst instability during the symbol transitions. This conclusion will be used later on. For the constant-envelope GFSK modulation, only the phase carries the information. The phase delay versus the bandwidth of the Butterworth LPF is shown in Fig. 7. As expected, the phase delay is inversely proportional to the bandwidth. This phase retardation through the LPF is likely to dominate the loop delay of the PT-RXs (both type-I and type-II). To avoid disturbing the next incoming symbol, intuitively, the loop delay cannot exceed $1/f_{symbol}$, where $f_{symbol}$ is the symbol rate. This leads to the requirement that the loop bandwidth should be as wide as possible to properly track the incoming symbols. On the other hand, when a substantial interferer or blocker accompanies the desired signal, a narrow LPF bandwidth is required to filter out the undesired components; otherwise, this interference could degrade the baseband SNR or even saturate the receiver. Hence, the phase delay of the LPF worsens the trade-off between bandwidth and interferer attenuation.
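The delay behavior described above (flat at low frequencies, rising sharply near cutoff, and scaling inversely with bandwidth) can be reproduced with a short calculation. This is a minimal sketch using the textbook pole locations of an analog Butterworth low-pass filter; it is not the model used to generate Figs. 6 and 7, and the function names are hypothetical.

```python
import math

def butterworth_poles(order, fc_hz):
    """Left-half-plane poles (rad/s) of an analog Butterworth low-pass filter."""
    wc = 2.0 * math.pi * fc_hz
    angles = (math.pi / 2 + (2 * k - 1) * math.pi / (2 * order)
              for k in range(1, order + 1))
    return [wc * complex(math.cos(a), math.sin(a)) for a in angles]

def group_delay_s(f_hz, poles):
    """Group delay -d(phase)/d(omega) of an all-pole filter at f_hz."""
    w = 2.0 * math.pi * f_hz
    return sum(-p.real / (p.real ** 2 + (w - p.imag) ** 2) for p in poles)

def phase_delay_s(f_hz, poles):
    """Phase delay -phase/omega of an all-pole filter at f_hz (f_hz > 0)."""
    w = 2.0 * math.pi * f_hz
    phase = -sum(math.atan2(w - p.imag, -p.real) for p in poles)
    return -phase / w

poles_1mhz = butterworth_poles(5, 1e6)    # fifth order, 1-MHz bandwidth
for f in (10e3, 500e3, 1e6):
    print(f"f = {f / 1e6:5.2f} MHz: group delay = {group_delay_s(f, poles_1mhz) * 1e6:.3f} us, "
          f"phase delay = {phase_delay_s(f, poles_1mhz) * 1e6:.3f} us")

# Halving the bandwidth doubles the delay (inverse proportionality).
poles_500k = butterworth_poles(5, 500e3)
print(phase_delay_s(10e3, poles_500k) / phase_delay_s(20e3, poles_1mhz))
```

The last line compares the two filters at the same normalized frequency, so the ratio isolates the bandwidth scaling from the frequency dependence of the delay.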
To gain a deeper insight into the loop delay effects, we follow a nonlinear time-domain mathematical model similar to the one introduced in [22], but with adjustments for the digitally intensive type-II phase-tracking loop. To simplify the analysis and to better visualize the waveforms, the digital PI controller in Fig. 8 is relocated to the PD output. The PD itself is modeled as a combination of a phase subtractor and a nonlinear sine function. The PD response in Eq. (8) is a function of two inputs, $\phi_{IN}(t)$ and $\phi_{DCO}(t)$, being the input GFSK and modulated DCO phase signals, respectively, where $f_C$ is the carrier frequency. Assuming that the DCO locks to $f_C$ correctly and that the harmonics caused by the mixing are suppressed by the following LPF, Eq. (8) simplifies to Eq. (9). In the following, we model the input frequency $d\phi_{IN}(t)/dt$ as a sinusoidal frequency-modulated (FM) signal with a symbol period of $T_{sym}$ for three reasons:
1) This sinusoidal FM signal represents the fastest, namely, the worst-case input.
2) The GFSK modulation in the transmitter applies a Gaussian filter to smoothen the transitions between the symbols "1" ($f_C$ + 250 kHz) and "0" ($f_C$ − 250 kHz); with the modulation index of 0.5, these symbol transitions follow an almost sinusoidal trajectory.
3) As mentioned in Section IV-B, the loop exhibits the worst instability during the transitions between the symbols.
The DCO output frequency $f_{DCO}(t) = d\phi_{DCO}(t)/dt$ can be modeled as a delayed version of the input. Note that because the type-II arrangement tracks the input symbols with much finer resolution thanks to the multi-bit ADC, it is far more justifiable to model the DCO frequency as a delayed input frequency compared to the case of the type-I PT-RXs.⁴ The frequency difference between the input-modulated signal and the lagging DCO output then follows, where $f_{pk}$ is the 250-kHz symbol peak deviation frequency of the incoming GFSK signal, $f^{(DCO)}_{pk} = f_{pk}$ implies accurate DCO gain estimation, and $\tau_{loop}$ is the total loop delay, which comprises the phase shift of the DT LPF, the delays of the ADC and digital processing, etc. Since the phase error is an integral of the frequency error, we obtain the phase error $\Delta\phi(t)$ in (11), where $\Delta\phi_{max}$ is the maximum phase error:

$$\Delta\phi_{max} = 4 \cdot f_{pk} \cdot T_{sym} \cdot \sin\left(\frac{\pi \cdot \tau_{loop}}{2 T_{sym}}\right).$$

By substituting (11) into (9) and applying a Taylor series expansion,⁵ the PD controller output is derived as (13), where $\Delta\phi(t)^3$ can be expanded as (14). Substituting (14) into (13) yields the PD output. Thus, the PI controller output⁶ can be derived as (16), where $\alpha$ and $\rho$ refer to the proportional and integral coefficients of the PI controller (note that $\alpha \gg \rho$ [1]), and $T_s$ is the clock sampling period. The LPF suppresses the third-order harmonics at the PI output, which yields (17), where $\tau_{LPF}$ refers to the phase delay caused by the LPF. In case $T_{sym} = 1\,\mu s$ and $\alpha \gg \rho \cdot T_{sym} / (\pi T_s)$, (17) simplifies to (18). Assuming that the ADC and digital processing introduce an extra time delay of $\tau_d$, the DCO output phase can then be derived. Note that the amplitude of the LPF output in Eq. (18) is normalized to the peak deviation frequency via a compensation of the PD and ADC gains, and the PI coefficient $\alpha$ assists the DCO with the correct tracking.⁷ As mentioned above, the DCO output phase is a delayed version (by $\tau_{loop}$) of the input signal, which relates $\tau_{loop}$ to $\tau_{LPF}$ and $\tau_d$. Commonly, $\tau_d$ would be at least 10× smaller than $\tau_{loop}$; hence, the loop delay is dominated by the LPF phase delay, as expressed in (21). Furthermore, (17) indicates that to obtain the maximum amplitude/SNR at the LPF output, we should optimize the loop delay according to (22).
⁴ In [22], the square OTW shape [see subfigure (i) in Fig. 3(b) above] does not make it sufficiently convincing to model the DCO output as a delayed version of the input frequency, which is GFSK modulated.
⁵ For $\sin(x)$, the Taylor series expansion is $\sin(x) = x - x^3/3! + \dots$
⁶ Assume that the PI controller operates in the continuous-time domain since its sampling frequency ($f_s$ = 25 MHz) is much higher than the signal bandwidth of interest (1 MHz).
⁷ Coefficient $\rho$ also helps the tracking process, but it predominantly handles the repeating symbols [1].
Equations (21) and (22) indicate that for the GFSK modulation with 1 Mbps symbol rate, the LPF should be optimized with a phase delay of 0.5 μs, such that the phase-tracking loop obtains a maximized SNR.
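The closed-form maximum phase error, $4 \cdot f_{pk} \cdot T_{sym} \cdot \sin(\pi \tau_{loop} / (2 T_{sym}))$, can be cross-checked by integrating the frequency error directly. The sketch below assumes that the input frequency deviation is $f_{pk}\sin(\pi t / T_{sym})$ and that the DCO reproduces it with a pure delay $\tau_{loop}$ and an exact gain; the function names are hypothetical.

```python
import math

def peak_phase_error_closed_form(f_pk_hz, t_sym_s, tau_loop_s):
    """Closed-form peak phase error in rad: 4*f_pk*T_sym*sin(pi*tau/(2*T_sym))."""
    return 4.0 * f_pk_hz * t_sym_s * math.sin(math.pi * tau_loop_s / (2.0 * t_sym_s))

def peak_phase_error_numeric(f_pk_hz, t_sym_s, tau_loop_s, steps=20000):
    """Integrate the frequency error over one FM period and return its amplitude."""
    period = 2.0 * t_sym_s               # one "1" symbol followed by one "0" symbol
    dt = period / steps

    def freq_error(t):
        return f_pk_hz * (math.sin(math.pi * t / t_sym_s)
                          - math.sin(math.pi * (t - tau_loop_s) / t_sym_s))

    phase, lowest, highest = 0.0, 0.0, 0.0
    prev = freq_error(0.0)
    for n in range(1, steps + 1):
        cur = freq_error(n * dt)
        phase += 2.0 * math.pi * 0.5 * (prev + cur) * dt   # trapezoidal rule
        lowest, highest = min(lowest, phase), max(highest, phase)
        prev = cur
    return 0.5 * (highest - lowest)      # amplitude of the sinusoidal phase error

F_PK, T_SYM = 250e3, 1e-6                # BLE: 250-kHz deviation, 1-us symbol
for tau in (0.1e-6, 0.25e-6, 0.5e-6):
    print(f"tau_loop = {tau * 1e6:.2f} us: "
          f"closed form = {peak_phase_error_closed_form(F_PK, T_SYM, tau):.4f} rad, "
          f"numeric = {peak_phase_error_numeric(F_PK, T_SYM, tau):.4f} rad")
```

For the BLE values used here the prefactor $4 f_{pk} T_{sym}$ equals 1 rad, so the peak phase error is simply $\sin(\pi \tau_{loop} / (2 T_{sym}))$ and grows monotonically with the loop delay up to one symbol period.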

C. ADC Resolution
The SAR-ADC plays significant roles in the proposed system, such as zeroing out the residual phase error, providing a wider locking range in the steady-states (i.e., symbols of "1" and "0") and reducing the side-lobe energy versus the 1-bit comparator case in the type-I PT-RX. The ADC merely needs a ≥2-bit resolution to form a type-II loop; however, to achieve better ACR performance, a fairly high resolution is still required. Note that the demonstrated excellent energy efficiency of contemporary SAR-ADCs brings these additional benefits at almost no extra cost. Figure 9 confirms our expectation that the higher the ADC resolution, the lower the bit error rate (BER). However, at a certain level of ADC resolution (8 bits), the BER improvement will stop and this marks the point where neither quantization noise nor side-lobe energy dominates. To meet the BLE specification, a 20-dB rejection at 2 MHz offset is targeted with a 3-dB margin, and so the ADC requires at least 8 bits. The ACR performance of the proposed PT-RX achieves an improvement of 8.5/3.5/6.5 dB at ±1/±2/±3 MHz offset versus that in Ref. [3]. Ref. [4] inserts a fourth-order Chebyshev II digital filter after the PI block [see subfigure (ii) in Fig. 3(b)] to filter out higher-order harmonics in the OTW curve, but at a cost of 0.8 mW and extra loop latency. Compared to that, the multibit SAR ADC resolves this issue via a more straightforward and efficient method of tracking the GFSK signal with higher fidelity. Additionally, it costs only 0.05 mW and does not bring any significant latency into the loop. Figure 10 shows the system simulation results of the DCO output spectrum at various levels of ADC resolution, i.e. 1, 3, 4, 7, and 8 bits. Due to the limited memory resources in this simulation, the DCO's carrier frequency is set to 16 MHz, but this by no means affects the efficacy of the concept. 
It can be observed that the main lobe occupies a 1.5-MHz bandwidth (since the 500-kHz input signal bandwidth is modulated by the 1-Mbps data rate). For the first side-lobe, a 3-dB reduction is achieved by applying an 8-bit ADC compared to the one-bit quantizer. However, no reduction in the first side-lobe was obtained in the case of the one-bit quantizer with the fourth-order digital filter [4] due to the constraint between bandwidth and delay. Moreover, a >20-dB improvement is gained for the other side-lobes, while only a 3∼6-dB improvement is achieved in [4]. Therefore, with the employment of a low-complexity ADC, our proposed type-II PT-RX can reduce the side-lobe energy dramatically without extra cost in either power or latency.
As a side note, an automatic gain control (AGC) function is one of the essential features in commercial receivers. Even though the AGC is not implemented in this design, it is useful to take it into consideration when deriving design specifications for the system. For instance, with a maximum input of −10 dBm and a sensitivity of −93 dBm, an open-loop RX requires a dynamic range (DR) of 83 dB. What, then, is the minimum range of the receiver's variable gain before the ADC to capture this DR, while delegating as much as possible of the remaining AGC function to the DSP? It is relatively easy in today's technology to realize a SAR ADC (25 MS/s) with a power of <0.1 mW and 8-bit resolution, which can offer a 50-dB (= 8 × 6.02 + 1.76 dB) DR. Suppose that the maximum ADC input is −6 dBm; then a gain of 4 dB would be required at the maximum RX input. With a requirement of 12 dB for SNR and a margin of 6 dB, the RX must provide a gain of at least 55 dB (= −6 dBm − 50 dB + 12 dB + 6 dB + 93 dBm) at the sensitivity level. Thus, we are taking advantage of the low power consumption of digital and SAR-ADC circuitry in nanoscale CMOS.
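The gain budget above is plain dB arithmetic and can be restated as a small calculation. This is a minimal sketch that reuses only the numbers quoted in the text; the function names are hypothetical, and the ADC dynamic range is rounded to 50 dB exactly as in the text.

```python
def sar_dynamic_range_db(bits):
    """Ideal quantization-limited dynamic range of an N-bit ADC in dB."""
    return 6.02 * bits + 1.76

def gain_at_max_input_db(adc_full_scale_dbm, max_input_dbm):
    """Gain that maps the largest RX input onto the ADC full scale."""
    return adc_full_scale_dbm - max_input_dbm

def gain_at_sensitivity_db(adc_full_scale_dbm, adc_dr_db, snr_db, margin_db,
                           sensitivity_dbm):
    """Gain that lifts the weakest input above the ADC floor by SNR plus margin."""
    adc_floor_dbm = adc_full_scale_dbm - adc_dr_db
    return adc_floor_dbm + snr_db + margin_db - sensitivity_dbm

rx_dr_db = -10 - (-93)                            # required RX dynamic range
adc_dr_db = round(sar_dynamic_range_db(8))        # 49.92 dB, rounded as in the text
g_low = gain_at_max_input_db(-6, -10)
g_high = gain_at_sensitivity_db(-6, adc_dr_db, 12, 6, -93)

print(f"RX dynamic range        : {rx_dr_db} dB")
print(f"ADC dynamic range       : {adc_dr_db} dB")
print(f"Gain at maximum input   : {g_low} dB")
print(f"Gain at sensitivity     : {g_high} dB")
print(f"Variable-gain range     : {g_high - g_low} dB")
```

The difference between the two gain settings is the variable-gain range that the analog front-end has to cover before the ADC; the remainder of the dynamic range is absorbed by the ADC resolution.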
To summarize, the number of ADC bits in this architecture is determined by two aspects:
1) How much side-lobe energy and quantization noise we would like to mitigate to meet the design specifications.
2) How much of the AGC function we would like to delegate to the DSP for the sake of a low-power and digitally intensive design.
Based on these two aspects, a 10-bit SAR ADC (with a 2-bit margin) was implemented in this work.
Figure 11 shows the circuit-level details of the proposed RX. Apart from the DCO operating at 0.22 V, the rest of the RX can run at a 0.7-V supply due to the employment of inverter-based and DT approaches. The inverter-based LNTA avoids any use of an on-chip source-degeneration metal inductor in order to minimize the coupling energy with the oscillator. To obtain good input matching without the conventional on-chip source-degeneration inductor (exploiting only a small value of wire-bonding inductance), the transistor's cut-off frequency $f_T$ is optimized for a slightly higher value such that it compensates the loss of the real part of the input matching due to the small wire-bonding inductance. As customarily done in the majority of common-source or cascode LNAs in BLE RXs [1], [22], the off-chip inductor $L_G$ (see the LNTA schematic in Fig. 11) is employed to provide good input matching. Although this off-chip component adds a bit of extra cost in mass production, it saves chip area. On the other hand, to minimize the coupling from on-chip inductors, a >600 μm distance (observed from the chip die) between the inductors of the LNTA and DCO was imposed in Ref. [23].

A. RF Part
Thanks to a current-reuse technique, the transconductance $g_m$ of the LNTA is increased by a factor of 2 with the same current. The inverter-based $g_m$ cell with a programmable gain is employed for both the TIA and the DT LPF, together with a common-mode feedback (CMFB), which stabilizes the common-mode voltage by sourcing/sinking current to/from the main transistors $M_{1-4}$.
To minimize the coupling through the substrate or parasitic capacitance, several routing techniques have been adopted in this work. Firstly, a native (NT_N) layer is added under the inductors to define non-doped high-resistance regions in the substrate. Apart from the on-chip inductors, the NT_N layer is also used to separate different blocks by providing high-resistance isolation between them. Secondly, deep N-wells (DNW) are placed under each key block (i.e., LNTA, mixer, TIA, LPF, digital block, etc.) to provide further isolation.
A single-balanced passive mixer is employed in this work. Along with the TIA, it provides extra bandpass filtering of the out-of-band blockers. The TIA achieves 2× better power efficiency than the solution in [1] due to two factors: 1) The TIA is designed with an optimized (3 MHz) cut-off bandwidth to offer extra blocker suppression but without sacrificing the loop delay. 2) The requirement for the input impedance of the TIA ($R_{in,TIA}$) is relaxed due to the enhanced output impedance of the LNTA ($R_{out,LNTA}$). Note that to ensure the current-mode operation of the LNTA and passive mixer, $R_{out,LNTA}$ needs to be much larger than $R_{in,TIA}$.

B. Analog Part
A DT LPF [17] is employed in this work due to its advantages of highly reconfigurable bandwidth, process-voltage-temperature (PVT) independence, and very low power consumption compared to continuous-time analog filters [24], [25]. The $g_m$ cell converts voltage to charge on the history capacitor $C_{H1}$. In each cycle, during the first phase $\varphi_1$, the stored charge on $C_{H1}$ is sampled by the sampling capacitor $C_S$. From $\varphi_2$ to $\varphi_7$, $C_S$ charge-shares with $C_{H2}$ to $C_{H7}$, one history capacitor per phase. $C_S$ is reset to virtual ground at $\varphi_8$ after sharing with the last history capacitor $C_{H7}$, and then a new cycle starts. With the aim of attenuating in-band interferers or other undesired components (e.g., the mixer harmonics) and obtaining an optimized loop delay for the best performance (see Section IV), the typical value of the DT LPF's bandwidth is set at 700 kHz with a programmability of 0.4 to 3 MHz.
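The charge-sharing sequence can be mimicked with an idealized behavioral model in which each phase equalizes the voltage between $C_S$ and one history capacitor. This is only a sketch under simplifying assumptions (ideal switches, no parasitics, equal history capacitors, the input arriving as one charge packet per cycle); the capacitor ratio of 8.3 and the function name `simulate_dt_lpf` are hypothetical and not taken from the design.

```python
def simulate_dt_lpf(q_in, c_h=8.3, c_s=1.0, stages=7):
    """Return the voltage on the last history capacitor after each cycle.

    q_in: iterable of input charge packets (one per cycle) from the gm cell.
    c_h, c_s: history and sampling capacitances (arbitrary, hypothetical units).
    """
    v_h = [0.0] * stages
    out = []
    for q in q_in:
        v_h[0] += q / c_h              # gm cell deposits charge onto C_H1
        v_s = 0.0                      # C_S starts each cycle in its reset state
        for k in range(stages):        # phases phi_1 ... phi_7
            v_shared = (c_h * v_h[k] + c_s * v_s) / (c_h + c_s)
            v_h[k] = v_shared          # C_Hk and C_S settle to a common voltage
            v_s = v_shared             # C_S carries its charge to the next phase
        out.append(v_h[-1])            # phi_8: C_S is reset, its charge is discarded
    return out

step = simulate_dt_lpf([1.0] * 2000)   # unit charge packet every cycle
print(f"per-stage pole      : {8.3 / (8.3 + 1.0):.4f}")
print(f"output after 10     : {step[9]:.4f}")
print(f"output after 100    : {step[99]:.4f}")
print(f"final output value  : {step[-1]:.6f}")
```

Each stage behaves as a first-order IIR section with a pole at $C_H / (C_H + C_S)$, so the bandwidth depends only on a capacitor ratio and the clock rate, which is the origin of the PVT independence and programmability noted above.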
The DCO [1] running at 0.22 V is implemented with a cross-coupled pair and a 2:3 transformer. This 2:3 transformer achieves a higher $k_m$ of 0.82, such that it provides sufficient gain for a robust start-up at ULV, as opposed to the conventional 1:2 transformer in [8].
The SAR ADC [26] is of 10-bit resolution, providing a margin of 2 bits to minimize the side-lobe energy and maximize the interferer tolerance. A bootstrap structure is employed for the input sampling switches to obtain better linearity and speed.
Figure 12 shows the details of the digital logic interfacing the ADC to the DCO. A PI loop filter with programmable coefficients $\alpha$ and $\rho$ is connected to the ADC.

C. Carrier Frequency Drift and DCO Gain Error
To meet the requirement of carrier offset tolerance, two kinds of carrier frequency offset (CFO) calibration (i.e., static and dynamic CFO) are implemented in this work. The CFO calibration is enabled after the four pairs of "10" preamble symbols are detected. The static calibration operates only during the preamble sequence of "10101010"; it detects the offset value by averaging the corresponding OTW curve and then compensates the offset for the incoming actual data. After the preamble sequence, the static calibration is disabled, while the dynamic calibration is enabled to correct for the slow carrier frequency drift. After detecting a "1" symbol, the dynamic CFO holds its accumulated value until the next "0" symbol. In this way, the dynamic CFO does not accumulate error due to unbalanced symbol sequences.
Figure 13(a) plots the simulated sensitivity across different levels of the carrier frequency offset. As expected, thanks to the automatic frequency calibration (AFC), the proposed type-II PT-RX has a ±4% tolerance of the symbol frequency offset without the initial coarse frequency calibration (CFO). With the CFO enabled, the proposed system can tolerate up to ±40% of the symbol frequency deviation, without any sensitivity degradation.
In addition, Fig. 13(b) demonstrates the simulated sensitivity degradation due to a DCO gain ($K_{DCO}$) error. Compared to the type-I PT-RX in [4], thanks to the elimination of the dc phase error, our proposed PT-RX can tolerate a $K_{DCO}$ error of up to ±10%, while only ±4% is allowed in [4].
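The static calibration idea, averaging the OTW over the balanced "10101010" preamble so that the symmetric data deviation cancels and only the offset remains, can be sketched as follows. This is an illustrative model rather than the implemented logic: the OTW deviation of ±32 LSB, the 4 samples per symbol, and the function names are all hypothetical.

```python
def estimate_static_cfo(otw_preamble):
    """Estimate the carrier offset (in OTW LSBs) as the mean OTW over a balanced preamble."""
    return sum(otw_preamble) / len(otw_preamble)

def compensate(otw_samples, offset):
    """Remove the estimated offset from subsequent OTW samples."""
    return [x - offset for x in otw_samples]

def make_otw(bits, deviation_lsb, offset_lsb, samples_per_symbol):
    """Idealized square OTW trace: +/-deviation per symbol plus a constant offset."""
    trace = []
    for b in bits:
        level = deviation_lsb if b == "1" else -deviation_lsb
        trace.extend([level + offset_lsb] * samples_per_symbol)
    return trace

# Hypothetical numbers: +/-32-LSB deviation, 5-LSB offset, 4 samples per symbol.
preamble = make_otw("10101010", 32, 5, 4)
offset = estimate_static_cfo(preamble)
payload = compensate(make_otw("1100", 32, 5, 4), offset)

print(f"estimated offset : {offset} LSB")
print(f"payload levels   : {sorted(set(payload))}")
```

Because the preamble contains equal numbers of "1" and "0" symbols, the average is insensitive to the data itself; an unbalanced payload would bias such an average, which is why the text describes a separate hold mechanism for the dynamic calibration.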

VI. EXPERIMENTAL RESULTS
The RX is fabricated in 28-nm CMOS with a core area of 0.48 mm² (the chip micrograph is shown in Fig. 14). One-third of the area of our prior implementation of the type-II PT-RX [1] has been saved thanks to the inductor-free LNTA architecture as well as the bandwidth optimization of the DT LPF. Compared to the Cartesian RXs in [5], [8], at least 2× better area utilization has been achieved due to the removed quadrature down-conversion path.
The power breakdown (see Fig. 15) reveals that the DCO (400 μW) and LNTA (207 μW) account for 67% of the total power budget due to the sensitivity requirement. The $g_m$ cells (for the TIA and DT LPF) and the DCO buffers burn 117 and 72 μW, respectively. Only 108 μW is dissipated by the loop filter; the digital proportional-integral (PI)-based loop filter operates at a 25-MHz sampling frequency. Apart from the DCO running at 0.22 V, the RX operates at a 0.7-V supply, which is the lowest supply voltage among CMOS BLE RXs.
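The quoted shares can be cross-checked by summing the listed contributions. The snippet below uses only the figures given in the text; the dictionary labels are paraphrased, and it is assumed that these five entries make up the whole budget.

```python
# Power contributions in microwatts, as listed in the text.
power_uw = {
    "DCO": 400,
    "LNTA": 207,
    "gm cells (TIA + DT LPF)": 117,
    "DCO buffers": 72,
    "loop filter": 108,
}

total_uw = sum(power_uw.values())
dco_lnta_share = (power_uw["DCO"] + power_uw["LNTA"]) / total_uw

print(f"total power      : {total_uw} uW")
for name, p in power_uw.items():
    print(f"{name:<24}: {p:4d} uW ({100.0 * p / total_uw:.1f}%)")
print(f"DCO + LNTA share : {100.0 * dco_lnta_share:.1f}%")
```

Under that assumption the entries sum to roughly 0.9 mW, which agrees with the stated total power consumption and with the 67% share attributed to the DCO and LNTA.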
As shown in Fig. 16, the measured BER is better than in the previously reported PT-RXs (by as much as 6 dB vs. the type-I RX in [3]) across RF input power, due to the mitigation of the dynamic dc offset and the coupling-caused SNR degradation. Compared to [1], it maintains a similar BER while dissipating 40% lower power thanks to the current-reuse technique in the LNTA. With the BER requirement of 0.1% for BLE, the RX achieves −93.2 dBm sensitivity.
During the ACR measurement, the interference power level is increased until the BER reaches 0.1%. The RX achieves 3.5/18.5/29.5 dB adjacent channel rejection at ±1/±2/±3 MHz offsets, respectively, as shown in Fig. 17(b). An 8-dB improvement is achieved compared to [4] at a 1-MHz offset, while dissipating less than half of its power. Compared to [1] at 2/3 MHz offset, this work advances by 1.5/2.5 dB because of the mitigated coupling energy. In Fig. 17(c), the out-of-band (OOB) rejection is measured.
The NF degradation in the face of in-band/out-of-band blockers is plotted in Fig. 18. The NF degrades to 22 dB with a blocker level of −50/−25/−15 dBm residing 3/20/440 MHz away from the desired 2.44-GHz signal, respectively. The measured in-band/out-of-band IIP3 is −43/−16 dBm at the maximum gain of 60 dB, corresponding to the receiver sensitivity of −93 dBm, as shown in Fig. 19. Note that due to the closed-loop nature of the PT-RX, the IIP3 measurement is only meaningful in open loop.
The observed LPF output waveform is shown in Fig. 20, starting at four pairs of "10" preamble symbols. It shows that in the case where long run-length symbols are present, the residual phase error of the tracking loop converges towards zero. The LPF output is scaled and accumulated by the PI loop filter to generate the OTW waveform, whose time trajectory is a digitally demodulated replica of the transmitted bit stream.
Figure 21 shows the sensitivity vs. energy efficiency benchmark of the low-power BLE RXs. Our RX achieves the best sensitivity figure-of-merit (FoM). Table I compares this work with state-of-the-art low-power BLE RXs. This RX achieves the best-in-class power efficiency FoM with a 2.2-3.2 dB improvement compared to the state-of-the-art single-channel RXs in [1], [4], [27], while maintaining cutting-edge ACR performance (similar to [4]) thanks to the mitigation of mutual coupling between the DCO and LNTA on-chip inductors.

VII. CONCLUSION
We present a type-II phase-tracking receiver for BLE IoT applications that operates at the ultra-low supply of 0.7 V and breaks through the 1-mW barrier of power consumption. The mutual coupling issues in the current PT-RXs are minimized by employing an inductor-free LNTA, which leads to a 1.5/2.5-dB improvement of ACR performance at 2/3 MHz offset. We provide design guidance on the loop delay optimization and ADC resolution selection. By further leveraging current-reuse and switched-capacitor circuitry, this RX achieves the best-in-class FoM of 183.2 dB with a sensitivity of −93.2 dBm.