Low Power Analog Processing for Ultra-High-Speed Receivers With RF Correlation

Ultra-high-speed data communication receivers (Rxs) conventionally require analog-to-digital converters (ADCs) with high sampling rates, which are challenging to design with adequate resolution and acceptable power. As a result, ultra-high-speed Rxs rely on expensive and bulky high-speed oscilloscopes, which are extremely inefficient for demodulation in terms of power and size. Designing energy-efficient mixed-signal and baseband units for ultra-high-speed Rxs requires a new paradigm, detailed in this paper, that circumvents power-hungry ADCs by employing low-power analog processing. The low-power analog Rx employs direct-demodulation with RF correlation using low-power comparators. The Rx supports multiple modulations up to 16-QAM, the highest modulation order reported so far for direct-demodulation with RF correlation. Simulations using Matlab Simulink R2020a® indicate sufficient symbol-error rate (SER) performance at a symbol rate of 8 GS/s for the 71 GHz Urban Micro Cell and 140 GHz indoor channels. A power analysis against current analog, hybrid and digital beamforming approaches requiring ADCs indicates considerable power savings. This novel approach can be adopted for the ultra-high-speed Rxs envisaged for beyond fifth generation (B5G)/sixth generation (6G)/terahertz (THz) communication without power-hungry ADCs, leading to a low-power integrated design solution.


I. INTRODUCTION
Terahertz (THz) electromagnetic spectrum applications such as imaging have been extensively used in radio astronomy, with the first astronomical THz images dating back to the 1960s. Since 1990 there has been much progress in THz time-domain spectroscopy for tissue characterisation and cancer detection due to advances in femtosecond optoelectronics. New antenna and CMOS integrated circuit (IC) technologies are fast emerging as an alternative for realizing affordable THz systems. This, together with the availability of wider bandwidths in the THz spectrum, motivates more innovative applications, for example, novel cognition, sensing, imaging, communications, and positioning capabilities that may be employed by automated machinery, autonomous cars, and new human interfaces [1].
The associate editor coordinating the review of this manuscript and approving it for publication was Mohamed M. A. Moustafa.
Policy and research communities have yet to agree on the range of frequencies that would form the THz communication band. However, the Federal Communications Commission (FCC) in November 2019 formally opened up the spectrum between 95 GHz-3 THz [2] for experimental purposes termed as THz communication, beyond fifth generation (B5G) and sixth generation (6G) communication. This spectrum has many challenges such as narrow or pencil beam antennae demanding accurate line-of-sight (LOS) communication with frequent beam alignment for the mobile users. Mitigating these challenges will require research on topics such as channel modelling, antenna design, fast antenna beam alignment, radio resource management, and protocol design.
The spectrum supports ultra-high-speed data rates of the order of Gb/s, which require low-power and compact ICs that are challenging to design.
Due to the degradation of active device performance at frequencies close to its maximum operating frequency, the operating frequency cannot be arbitrarily high. Current state-of-the-art CMOS technologies enable circuits that operate up to 1.3 THz, detect both the amplitude and phase of signals up to 1.2 THz, and detect signal amplitudes up to 10 THz [3]. Other CMOS implementations include a 240 GHz quadrature phase shift keying (QPSK) transceiver with a data rate of 16 Gb/s and a 300 GHz radio-frequency (RF) transmitter (Tx) that can support 105 Gb/s (32-QAM) [4], [5].
To cater for the high path loss that occurs in signal transmission, the ultra-high-speed receiver (Rx) antenna front-end (AFE) will require beamforming to increase antenna directivity. As frequency increases, beamforming directional antennas incur a lower path loss for the same aperture area [6]. Beamforming can be employed in the analog and/or digital domains. Analog beamforming (ABF) is performed at either RF or at an intermediate frequency (IF) through a phase shifter (PS) and a low noise amplifier (LNA) per antenna element as shown in Fig. 1, where $N_{RX}$ is the number of antenna elements and $N_S$ is the number of baseband chains. The antenna elements can be employed as uniform linear arrays or uniform rectangular arrays (URA). This reduces power consumption, as only one down-conversion chain is required with a variable gain amplifier (VGA) and an analog-to-digital converter (ADC) per digital stream. The VGA ensures that the signal power is adequate to drive the ADC. Certain applications require a high beamforming gain, which is achieved by increasing $N_{RX}$ to 64, 128 or 256. For these, hybrid beamforming (HBF) is preferred in order to reduce the number of PSs, down-conversion chains and ADCs, as $N_{RX}$ is high. HBF with two RF chains is shown in Fig. 1, where each RF chain has its own down-conversion chain, VGA and ADC. Beamforming is done partially by PSs in the RF domain and partially in digital baseband. For digital beamforming (DBF), shown in Fig. 1, beamforming is performed digitally in baseband. This overcomes the limitation of ABF and HBF of being able to transmit/receive in only a few or fixed directions. Since each antenna element requires its own down-conversion stage along with a VGA and ADC, DBF becomes very power hungry for large numbers of antenna elements. Even for a lower number of antenna elements, power requirements remain high in the case of high-speed data due to power-hungry high-speed ADCs and VGAs, which require higher sampling rates and larger gain ranges respectively to compensate for the reduced beamforming gain.
ABF utilizing one RF chain has a significant advantage in energy efficiency. PSs can be employed with low resolution, which makes them easier to realize and more energy-efficient. To circumvent the restriction of limited beamforming directions, one can obtain the optimal continuous-phase ABF and then quantize the PS phases to a finite set [7], [8], [9], employ codebooks [10], [11], [12], [13], [14], or use machine learning [15]. Recent novel approaches indicate similar performance of ABF PSs with low and high resolutions in mmWave downlink multicast systems [16]. Similar approaches in ABF and HBF have been proposed for THz communication [17], [18], [19], [20]. The choice of beamforming and the associated transceiver depends on the application, but in most cases HBF and DBF architectures require multiple energy-intensive RF chains and have relatively high power consumption. After the AFE, Rxs employ either direct-conversion (i.e. direct-demodulation) [21], [22] or IF-conversion [23], [24] architectures for demodulating data. To demodulate the raw bit information, Rxs with ultra-high-speed data will require high-resolution ADCs with sampling rates at least 2-4 times the symbol rate $T_S$ of the modulated baseband or IF signals to avoid aliasing [25]. However, both the signal-to-noise-and-distortion ratio (SNDR) and the spurious-free dynamic range degrade as the ADC sampling rate increases, leading to poor resolution. Current state-of-the-art high-speed Rxs, therefore, employ expensive and bulky high-speed oscilloscopes to demodulate data [26], [27]. This makes ultra-high-speed Rxs extremely energy inefficient for demodulation. Channel bonding is an alternative solution, but requires several parallel data converters with a high level of calibration [21]. Accordingly, designing energy-efficient mixed-signal and baseband units for ultra-high-speed Rxs is challenging and requires a new paradigm.
One such approach is to design high-speed Rxs with direct-demodulation requiring no ADCs. Such architectures are reported but support only single and low-order OOK, BPSK, QPSK modulations and have large latency, or have architectures which are not suitable for ultra-high speed data Rxs.
The design challenges in ultra-high-speed ADCs, direct-demodulation without ADCs, and the motivations for adopting the approach in this paper are detailed next.
A. ULTRA-HIGH-SPEED ADC
The ADC is one of the most power-hungry blocks in Rxs. The effective number of bits (ENOB) of an ADC is its dynamic range above the noise levels (quantization and thermal) that is available for measuring the signal input amplitude. ENOB can be increased by increasing the number of quantization levels for Nyquist ADCs, or by raising the sampling frequency ($f_s$) above the Nyquist frequency as in oversampling ADCs. A trade-off, however, exists between power dissipation $P_d$, ENOB and $f_s$, determined by a figure-of-merit $FOM_{ADC}$ [28]:
Sampling data is fundamental to an ADC and hence the power required for it acts as a lower bound to $P_d$. The thermal sampling noise is $N_{TS} = kT/C_s$, where $C_s$ is the sampling capacitor of the ADC, $k$ is Boltzmann's constant, and $T$ is the temperature in Kelvin [29]. Typically, $C_s$ is chosen large enough such that $N_{TS}$ is of the same order as the ADC's quantization noise $N_Q$. Assuming that an n-bit ADC is designed such that $N_{TS} \sim N_Q$, it leads to the following minimum value of $C_s$ [29]: where $V_{in}$ is the full-scale voltage at the ADC input. For an ideal n-bit ADC [29]: For high-resolution ADCs ($n > 10$), $P_d$ is dominated by sampling thermal noise and grows in proportion to $2^{2n}$. For low-resolution ADCs ($n < 6$), $P_d$ is dominated by component mismatch and capacitor size, and is proportional to $2^n$ [29]. As technology and supply voltages scale to lower values, the permissible noise levels reduce further, requiring larger values of $C_s$ for a given resolution [29], and for low-resolution ADCs, the energy efficiency becomes limited by thermal noise.
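The capacitor sizing argument above can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it models the quantization noise of an ideal uniform n-bit quantizer as $V_{in}^2/(12 \cdot 2^{2n})$ and equates it to $kT/C_s$. This textbook noise model, the 300 K temperature and the 1 V full-scale voltage are assumptions chosen for the example, and the function names are hypothetical.

```python
BOLTZMANN = 1.380649e-23  # Boltzmann's constant in J/K


def quantization_noise_power(n_bits, v_fs):
    """Noise power (V^2) of an ideal uniform n-bit quantizer (assumed model)."""
    lsb = v_fs / 2 ** n_bits
    return lsb ** 2 / 12.0


def min_sampling_capacitor(n_bits, v_fs, temp_k=300.0):
    """Capacitance (F) at which kT/C equals the quantization noise power."""
    return BOLTZMANN * temp_k / quantization_noise_power(n_bits, v_fs)


if __name__ == "__main__":
    # Illustrative values only: 1 V full scale at 300 K.
    for n in (4, 6, 8, 10, 12):
        c_s = min_sampling_capacitor(n, v_fs=1.0)
        print(f"n = {n:2d} bits -> C_s >= {c_s * 1e15:.4f} fF")
```

Under this model each additional bit quadruples the required capacitance, which is the $2^{2n}$ growth mentioned for thermal-noise-limited ADCs.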
For high-speed demodulation, a minimum resolution is required across the entire bandwidth. Due to parasitic capacitances, the relation between ADC power and speed becomes nonlinear, leading to a lower $FOM_{ADC}$. ADCs operating above a certain $FOM_{ADC}$ thus suffer degraded resolution and speed [30]. To increase the sampling speed without degrading the $FOM_{ADC}$, time-interleaved ADCs are used, in which the individual ADCs run at lower sampling speeds. However, they require power-hungry front-end drivers and are prone to timing and gain mismatches [31], [32]. Hence, they have limited ENOB and require high power (e.g. 950 mW) [33].

B. DIRECT DEMODULATION WITHOUT ADC
High-speed Rxs with direct demodulation and requiring no ADCs are reported in [21], [22], [34], [35], and [36]. However, these are either architecture specific, such as for spread spectrum systems, or support only single low-order modulations like OOK, BPSK and QPSK. Ultra-high-speed Rxs will require higher modulation formats for increased data rates and should be adaptive, so that they can support multiple modulations depending on channel conditions. Direct-demodulation for 8-PSK and higher-order modulations becomes challenging as the decision boundaries between symbols shrink, making them more susceptible to noise. An 8-PSK direct-demodulation scheme based on the arctangent technique in parallel with a digital phase-locked loop is presented in [37]. This, however, requires large lookup tables and memory, which increases latency and becomes extremely challenging to implement at ultra-high speeds. A technique that overcomes this limitation is described in [38], but it is limited to 8-PSK. This paper proposes a novel low-power analog processing (LPAP) Rx with ABF, employing direct-demodulation and RF correlation without the power-hungry high-speed ADCs. The main technical contributions of the paper are summarized below:
• The Rx design is able to support multiple modulations of BPSK and 4-QAM, along with 16-QAM, the highest modulation order reported so far for direct-demodulation with RF correlation, with a single architecture.
• Simulations of the Rx employed with ABF indicate sufficient symbol-error rate (SER) performance at a symbol rate of 8 GS/s for 71 GHz Urban Micro (UMi) Cell and 140 GHz indoor (InH) channels.
• Power and linearity analyses are undertaken against the current state-of-the-art ABF, HBF and DBF architectures requiring ADCs, indicating considerable power savings and spurious-free dynamic range (SFDR).
Section II details the LPAP Rx architecture, channel model, RF signal power at the AFE, phase noise model, and comparator and digital decoder design. Simulation results in Section III provide the BER curves, equalizer performance, and blocking and interference analysis. The component values (gain, insertion loss and linearity) and the power comparison analysis for various Rx architectures are detailed in Section IV, with concluding remarks in Section V.

II. LOW POWER ANALOG PROCESSING
A. RECEIVER ARCHITECTURE
The LPAP Rx consists of 32 antenna elements, each followed by an LNA and a PS, as shown in Fig. 2. The input signal is a BPSK/QPSK/16-QAM RF signal with a symbol rate of 8 GS/s. The signal is converted to in-phase ($V_{in}$) and quadrature-phase ($V_q$) components by two LOs at RF, one in phase with the RF signal and the other with a 90° phase difference. For direct-demodulation Rxs, widely employed techniques are available for synchronization, such as pilots/training sequences, and for mitigating I/Q imbalances. The low pass filters (LPFs) are employed to remove any interference signals in the proximity of the RF signal. Since both $V_{in}$ and $V_q$ are low-pass baseband signals, the LPF does not need a sharp roll-off, which lowers the power requirements. Typically, the LPF is a single-pole RC filter with a 3 dB cutoff at 12 GHz. The VGAs provide sufficient amplification for a stable signal reference value in the subsequent stages.
At the integrator outputs, $V_{in}$ and $V_q$ are pulse amplitude modulated (PAM) signals with two levels (PAM-2) for BPSK/QPSK modulation. For 16-QAM, $V_{in}$ and $V_q$ are four-level PAM signals (PAM-4). The automatic gain controls (AGCs) ensure a certain voltage level at the comparator. Decision feedback equalizers (DFEs) remove any inter-symbol interference (ISI) and channel distortion. The equalized signals are then decoded by the digital decoder, which enables demodulation of the BPSK/QPSK/16-QAM signal.
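The RF correlation step described above can be sketched numerically: one symbol of an ideal passband signal is multiplied by in-phase and quadrature LOs and integrated over the symbol period, which returns the PAM levels on the two rails. This is a minimal, noise-free sketch and not the simulated Rx. The 72 GHz carrier is a hypothetical value chosen so that a whole number of carrier cycles fits in one 8 GS/s symbol, and the normalized levels ±1, ±3 and the function name are likewise assumptions.

```python
import math


def correlate_iq(i_level, q_level, carrier_hz, symbol_s, samples=4000):
    """Mix one ideal RF symbol with I and Q LOs and integrate over the symbol.

    Returns the recovered (in-phase, quadrature) baseband levels.
    """
    dt = symbol_s / samples
    acc_i = 0.0
    acc_q = 0.0
    for k in range(samples):
        t = (k + 0.5) * dt  # midpoint rule
        w = 2.0 * math.pi * carrier_hz * t
        rf = i_level * math.cos(w) - q_level * math.sin(w)
        acc_i += rf * 2.0 * math.cos(w) * dt   # in-phase LO
        acc_q += rf * -2.0 * math.sin(w) * dt  # LO shifted by 90 degrees
    return acc_i / symbol_s, acc_q / symbol_s


if __name__ == "__main__":
    symbol_period = 1.0 / 8e9  # 8 GS/s
    v_in, v_q = correlate_iq(3.0, -1.0, carrier_hz=72e9, symbol_s=symbol_period)
    print(f"V_in = {v_in:.3f}, V_q = {v_q:.3f}")
```

With a carrier that is not an integer multiple of the symbol rate, a residual double-frequency term remains after integration; in the architecture above the LPF is the block that would suppress it.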

B. CHANNEL MODEL AND RF SIGNAL POWER
Based on the 3D statistical channel model in [39] and [40], an open-source MATLAB-based statistical simulator, NYUSIM v3.1, has been developed by New York University (NYU) [41]. The simulator generates 3D angle-of-departure (AOD) and angle-of-arrival (AOA) power spectra and power delay profiles (PDPs) that match measured field results for RF frequencies from 0.1-148 GHz. This is based on over 15,000 PDPs that were measured and used to derive directional and omnidirectional path loss models and to extract small-scale channel statistics such as the number of time clusters, cluster delays, and cluster powers. In the 3rd Generation Partnership Project (3GPP) Technical Report 38.901 outdoor channel model for frequencies above 0.5 GHz [42], the number of clusters is unrealistically large, which is not supported by real-world measurements at mmWave bands [39], [40], [43]. In contrast, in the outdoor statistical model implemented in NYUSIM v3.1, the number of time clusters ranges from 1 to 6, and the mean number of spatial lobes is about 2, with an upper bound of 5. These are obtained from field observations and are much smaller than those in the 3GPP channel model [44], [45]. In order to realistically quantify the received signal power $P_r$ at the AFE, NYUSIM v3.1 is employed in this paper for simulating the 71 GHz and 140 GHz channels, as the simulator is built from field data and therefore gives more realistic results.
Input values for NYUSIM v3.1 are given in TABLE 1 as channel parameters, which include atmospheric conditions, spatial consistency, and antenna parameters, where HPBW is the half power beamwidth and $N_{TX}$ is the number of antenna elements in the Tx array. Spatial consistency mode is applicable in outdoor channels, where the Rx moves along a specific path, generating correlated and consecutive channel impulse responses for successive sampling points on the path. The path can be selected as linear or hexagonal. Spatially correlated large-scale parameters such as shadow fading, and time-variant small-scale parameters such as the angles, power, delay and phase of each multipath component, are generated [46], [47]. In addition, the effects of human blockage causing temporal shadowing are modelled for both indoor and outdoor channels. For this, the default setting is used in this paper, where an average mean attenuation for human blockage is implemented based on a linear fit applicable to the Tx/Rx antenna HPBW [48]. The PDPs generated are weighted by the Tx and Rx antenna directivity given as [39]: where $(\theta, \phi)$ are the azimuth and elevation angle offsets from the boresight direction in degrees, $G_0$ is the maximum directive boresight gain in linear units, $(\theta_{3dB}, \phi_{3dB})$ are the azimuth and elevation HPBW in degrees, $\alpha$ and $\beta$ are parameters that depend on the HPBW values, and $\eta = 0.7$ is the typical average antenna efficiency. Conventionally, the HPBW of an antenna array is a function of the number of antenna elements and the antenna spacing. However, in the simulator the three parameters, i.e. the HPBW, the number of antenna elements, and the antenna spacing, can be specified independently, since there may be a wide range of beamforming approaches as in Fig. 1, where different individual antenna element types (e.g., patch antennas, vertical antennas, horns) are used. The PDP generated for the 71 GHz channel is shown in Fig. 3.
The variation in PDP due to shadow fading for the mobile Rx is shown in Fig. 4 for an Rx track at 45°. The PDP for the 140 GHz channel is shown in Fig. 5.

C. LOCAL OSCILLATOR PHASE NOISE
A common circuit solution for the LO is frequency generation with a voltage controlled oscillator (VCO). There are several parameters that can be employed in VCO design, and the performance is well captured by the following figure-of-merit ($FOM_{VCO}$), which accounts for power, different semiconductor technologies and circuit topologies: where $L(f)$ is the phase noise in dBc/Hz at a frequency offset $f$, $f_o$ is the oscillation frequency, and $P_{VCO}$ is the power consumption in mW. On a linear scale both $L(f)$ and $P_{VCO}$ are $\propto f_o^2$. Maintaining the phase noise level at a certain offset when increasing $f_o$ by a factor $\alpha$ requires the power to be increased by $\alpha^2$, assuming a fixed $FOM_{VCO}$. Conversely, for a fixed power consumption and $FOM_{VCO}$, the phase noise will increase by $\alpha^2$, or 6 dB for every doubling of $f_o$. A common way to suppress LO phase noise is to apply a phase locked loop (PLL), where the VCO is locked to a highly stable reference, normally a low frequency crystal oscillator, using a phase frequency detector, filter and counter. The PLL compares the phase of a reference signal to the phase of an adjustable feedback signal to ensure a steady higher frequency output.
Different strategies can be employed for implementation of signal generation and distribution such as centralized PLL generation (one PLL for all baseband chains), distributed PLL generation (one PLL per baseband chain) and semidistributed PLL generation (baseband chains within a group sharing a common PLL). The different strategies have not yet been investigated. A comparison could lead to some potential advantages/disadvantages for implementation in mmWave or higher frequencies, forming a basis for future study. The total phase noise of a PLL is composed of contributions from the VCO outside the loop bandwidth and the reference oscillator inside the loop. A significant noise contribution is also added by the phase detector and the divider. This poses significant challenges when employing higher frequencies with phase sensitive modulation such as 16-QAM in ultra-high speed Rxs. As the VCO phase noise increases by 6 dB per doubling of the frequency, an increase in VCO frequency from 3 GHz to 30 GHz would result in phase noise degradation of 20 dB for a given offset frequency.
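The scaling rule quoted above (6 dB per doubling of the oscillation frequency, 20 dB from 3 GHz to 30 GHz) follows from a $20\log_{10}$ dependence on the frequency ratio. The short sketch below reproduces those two figures; the function names are hypothetical and a fixed $FOM_{VCO}$ is assumed, as in the text.

```python
import math


def phase_noise_increase_db(f_old_hz, f_new_hz):
    """Phase noise increase (dB) at a fixed offset for fixed power and FOM."""
    return 20.0 * math.log10(f_new_hz / f_old_hz)


def power_scale_for_constant_phase_noise(alpha):
    """Factor by which VCO power must grow to hold phase noise constant
    when the oscillation frequency is multiplied by alpha."""
    return alpha ** 2


if __name__ == "__main__":
    print(f"Doubling f_o:      +{phase_noise_increase_db(1.0, 2.0):.2f} dB")
    print(f"3 GHz -> 30 GHz:   +{phase_noise_increase_db(3e9, 30e9):.2f} dB")
    print(f"Power factor (x2): {power_scale_for_constant_phase_noise(2.0):.1f}")
```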
Another contributing factor that increases phase noise at higher frequencies is the degradation in quality factor (Q)-value and low signal power. In order to achieve low phase noise, the Q-value and signal power need to be maximized while minimizing the noise figure of the active device, which is challenging to achieve when the signal frequency increases. For monolithic circuits, the Q-value of the on-chip resonator decreases as frequency increases due to increase of parasitic losses due to substrate effects and resistance of metal tracks. The fundamental VCO frequency is, therefore, generally limited to ∼15 GHz while employing frequency multipliers for higher frequency requirements. In addition, up-conversion of the 1/f noise creates an added slope in vicinity of the carrier frequency. 1/f noise depends on technology; planar devices such as FET, PHEMT and CMOS have a higher noise level than vertical bipolar devices like Si bipolar, SiGe HBT and GaAs HBTs.
Existing phase noise models include Leeson's model [49] and those developed by Hajimiri and Lee [50], Rael and Abidi [51], and Razavi [52]. A phase noise model that gives accurate results when compared with actual prototypes is the single side band (SSB) pole-zero model developed by the IEEE 802.15.3c task group [53]. The phase noise at offset frequency $f$ in dBc/Hz is given by: where $f_z$ and $f_p$ are the zero and pole frequencies, respectively. The model was extended and employed by 3GPP for mmWave circuits. It is a generalization of the multi-pole/zero model extended to fractional orders in [54] where: In [54] the modelled phase noise obtained by (9) is compared with a research prototype PLL designed in a 28 nm FD-SOI CMOS process, indicating accurate results. This is for distributed PLL generation at 29.55 GHz. The model is used by 3GPP to estimate PLL phase noise at higher frequencies of 45 GHz and 70 GHz [53]. The same method is followed in this paper to estimate the PLL phase noise at 71 GHz and 140 GHz. When the PLL phase noise profile is given at frequency $f_o$ with phase noise $L(f_o)$, the corrected phase noise $L(f)$ at any other oscillation frequency $f$ according to Leeson's equation is [49]: Here $f_o = 29.55$ GHz is the carrier frequency with a known phase noise profile. $FOM_{VCO}$ degrades at higher frequencies as shown in Fig. 7, which plots $FOM_{VCO}$ versus frequency for several state-of-the-art published VCOs implemented in CMOS [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65], [66], [67]. To determine the phase noise characteristics, the phase noise is first increased by the $20\log_{10}(f/f_o)$ degradation according to TABLE 2. Then the parameters $f_{z,n}$, $\alpha_{z,n}$, $f_{p,m}$ and $\alpha_{p,m}$ are altered to obtain the specified $FOM_{VCO}$ degradation at 30 MHz offset while maintaining a constant phase noise of −140 dBc/Hz at large offset and at the hump around 1.55 MHz offset. The resulting phase noise characteristics are shown in Fig. 8, with parameters listed in TABLE 3 for 71 GHz and 140 GHz.
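The two-step procedure described above (evaluate a multi-pole/zero profile, then shift it to a new carrier) can be sketched as follows. The sketch assumes the product form commonly used for such models, a low-offset level multiplied by zero terms $1 + (f/f_{z,n})^{\alpha_{z,n}}$ and divided by pole terms $1 + (f/f_{p,m})^{\alpha_{p,m}}$. That form, the function names and every parameter value in the example are assumptions for illustration and are not the values of TABLE 3.

```python
import math


def phase_noise_dbc_hz(f_offset_hz, psd0_db, zeros, poles):
    """Evaluate an assumed multi-pole/zero phase noise profile in dBc/Hz.

    zeros and poles are lists of (corner_frequency_hz, order) pairs;
    fractional orders are allowed.
    """
    numerator = 1.0
    for f_z, a_z in zeros:
        numerator *= 1.0 + (f_offset_hz / f_z) ** a_z
    denominator = 1.0
    for f_p, a_p in poles:
        denominator *= 1.0 + (f_offset_hz / f_p) ** a_p
    return psd0_db + 10.0 * math.log10(numerator / denominator)


def scale_to_carrier(l_db, f_ref_hz, f_new_hz):
    """Shift a profile from carrier f_ref to f_new by 20*log10(f_new/f_ref)."""
    return l_db + 20.0 * math.log10(f_new_hz / f_ref_hz)


if __name__ == "__main__":
    # Hypothetical profile parameters, for illustration only.
    zeros = [(2.0e7, 2.0)]
    poles = [(1.0e5, 2.0), (5.0e6, 1.5)]
    for offset in (1e4, 1e5, 1e6, 1e7, 1e8):
        ref = phase_noise_dbc_hz(offset, -85.0, zeros, poles)
        at_71 = scale_to_carrier(ref, 29.55e9, 71e9)
        print(f"{offset:9.0f} Hz: {ref:8.2f} dBc/Hz @ 29.55 GHz, "
              f"{at_71:8.2f} dBc/Hz @ 71 GHz")
```

In the paper's procedure the pole and zero parameters are additionally retuned after the shift to match the $FOM_{VCO}$ degradation; that tuning step is not reproduced here.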

D. COMPARATOR AND DIGITAL DECODER
Three single-bit comparators $C_1$, $C_2$ and $C_3$ are employed for 16-QAM demodulation as shown in Fig. 9, where the input is a PAM-4 $V_q$/$V_{in}$ signal. The effects of process, voltage and temperature (PVT) variation can be countered by employing AGC [68], [31] and adaptive reference voltage generators [69], [70]. A PAM signal offers superior performance in comparison to non-return-to-zero (NRZ) signalling under the effects of inter-symbol interference (ISI) and clock jitter [71], achieving data rates of 100 Gbps [72], [73]. The threshold values for $C_1$, $C_2$ and $C_3$ are set as $TH_1$, $TH_2$ and $TH_3$ respectively. The comparator output voltage levels $V_1$, $V_2$ and $V_3$ are high or low depending on the amplitude level of the input signal. For QPSK the Rx utilises two $C_2$ comparators, one each for $V_q$ and $V_{in}$, with PAM-2 inputs. For BPSK a single $C_2$ comparator is utilised for either $V_q$ or $V_{in}$, with a PAM-2 input. Fast, high-resolution comparators can be implemented with one or two stages of preamplification followed by a track-and-latch stage. Normally a low-gain (∼10) preamplifier reduces the kickback caused by fast output transitions while reducing the probability of metastability. Higher gain values can increase the time constant, thereby reducing the speed. The latch alternates between a reset phase and a positive feedback phase that generates full-swing digital signals from the preamplifier output. State-of-the-art comparators can sample at 14 GS/s [74], and interleaved comparators can be employed to increase the sampling rates.
The digital decoder for 16-QAM is shown in Fig. 10. With $V_{in}$ as the input, the decoder employs $V_2$ as the MSB, and a second bit $B_2$ is produced by XORing $V_3$ and $V_1$. This results in a two-bit Gray-mapped output for $V_{in}$. With $V_q$ as the input, bit $B_3$ is obtained by inverting $V_2$, and the LSB is produced by XORing $V_3$ and $V_1$. Bits $B_3$ and LSB also form a two-bit Gray-mapped output for $V_q$. The final symbol is obtained by combining the $V_{in}$ and $V_q$ bits, resulting in a four-bit 16-QAM symbol according to the Gray mapping shown in Fig. 11. The QPSK and BPSK decoders are shown in Fig. 12. To demodulate a QPSK symbol, the decoder inverts $V_2$ from the $V_q$ and $V_{in}$ inputs to obtain the MSB and the LSB respectively. This results in a QPSK symbol with the Gray mapping shown in Fig. 11. For a BPSK symbol, the output bit $B_1$ is obtained by inverting $V_2$, with either $V_q$ or $V_{in}$ as the input.
The truth table values for four 16-QAM symbols are shown in TABLE 4, which provides the PAM-4 2-bit quantised outputs for $V_{in}$ and $V_q$. All 16-QAM symbols in Fig. 11 can be generated using appropriate combinations of $V_{in}$ and $V_q$ output values. The truth tables for the QPSK and BPSK decoders are shown in TABLE 5.
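The comparator and decoder logic described in this subsection can be written out as a small behavioural sketch. The thresholds (−2, 0, +2) and the normalized PAM-4 levels (−3, −1, +1, +3) are assumptions for illustration, as is the ordering of the four output bits; the bit equations themselves ($V_2$ as MSB, $V_3 \oplus V_1$, inverted $V_2$) follow the text.

```python
def slice_pam4(v, th1=-2.0, th2=0.0, th3=2.0):
    """Three single-bit comparators C1, C2, C3 with assumed thresholds.

    Returns (V1, V2, V3), each 1 if the input exceeds the threshold.
    """
    return int(v > th1), int(v > th2), int(v > th3)


def decode_in_phase(v1, v2, v3):
    """In-phase rail: MSB = V2, B2 = V3 XOR V1."""
    return v2, v3 ^ v1


def decode_quadrature(v1, v2, v3):
    """Quadrature rail: B3 = NOT V2, LSB = V3 XOR V1."""
    return 1 - v2, v3 ^ v1


def decode_16qam(v_in, v_q):
    """Combine both rails into (MSB, B2, B3, LSB); this order is assumed."""
    msb, b2 = decode_in_phase(*slice_pam4(v_in))
    b3, lsb = decode_quadrature(*slice_pam4(v_q))
    return msb, b2, b3, lsb


if __name__ == "__main__":
    for level in (-3.0, -1.0, 1.0, 3.0):
        print(level,
              decode_in_phase(*slice_pam4(level)),
              decode_quadrature(*slice_pam4(level)))
```

Adjacent PAM-4 levels differ in exactly one output bit on each rail, which is the Gray-mapping property the decoder relies on.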

III. SIMULATION RESULTS
The LPAP Rx in Fig. 2 was implemented in Matlab Simulink R2020a®. Simulations were undertaken for the signal inputs generated by NYUSIM v3.1 for the 71 GHz and 140 GHz channels. Thermal noise was generated according to (6) and the LO phase noise as in Fig. 8. The Rx component values employed, such as gain, insertion loss, linearity and noise figure, are detailed in Section IV.

A. BER CURVES
The BER curves for the LPAP Rx and the ideal theoretical values with no channel coding for AWGN channels are shown in Fig. 13. The LPAP performance is similar for the 71 GHz and 140 GHz channels, with the 140 GHz channel performing marginally better. This can be attributed to the signal at the AFE being higher at −16 dBm for the 140 GHz channel, compared to −21 dBm for the 71 GHz channel.

B. DECISION FEEDBACK EQUALIZER
A continuous time linear equalizer (CTLE), feed-forward equalizer (FFE), decision feedback equalizer (DFE) or a combination of these is normally employed for channel equalization [75], [76]. Another approach is to employ spatiotemporal equalization with beamforming in mmWave channels [77]. A DFE offers a significant advantage over CTLE and FFE since it cancels post-cursor ISI without amplifying noise or crosstalk between the in-phase and quadrature channels. The DFE employed is shown in Fig. 14. The automatic gain control (AGC) ensures a certain voltage level at the DFE input. It applies an adaptive variable gain to the input waveform to achieve a desired RMS output voltage. Averaging the RMS voltage over a specified number of symbols, the AGC increases, decreases or holds the gain. The DFE samples data at each clock sample time, and the amplitude of the waveform is adjusted by a correction voltage. The zero-forcing algorithm is employed to determine the correction factors necessary to eliminate ISI. Due to the sparse nature of mmWave channels, a two-tap DFE is sufficient to equalize the 71 GHz line-of-sight (LOS) and non-line-of-sight (NLOS) channels and the 140 GHz LOS channel. For the 140 GHz NLOS channel a four-tap DFE is required, because the increased number of multipaths that occur indoors arrive with sufficient power to cause ISI. The normalized impulse response waveforms for the DFE are shown in Fig. 15 for the 71 GHz LOS and 140 GHz NLOS channels, where more multipaths are visible for the latter.
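The decision-feedback principle described above can be illustrated with a noise-free two-tap example. The sketch below is a simplified behavioural model and not the simulated equalizer: it assumes a known channel with a unit main cursor, PAM-2 symbols, and feedback taps set equal to the post-cursor samples, which is what a zero-forcing solution gives in that idealised case. The impulse response values and function names are hypothetical.

```python
def slicer_pam2(x):
    """Hard decision for a two-level (PAM-2) signal."""
    return 1.0 if x >= 0 else -1.0


def apply_channel(symbols, impulse_response):
    """Convolve symbols with a causal, symbol-spaced impulse response."""
    out = []
    for k in range(len(symbols)):
        out.append(sum(h * symbols[k - i]
                       for i, h in enumerate(impulse_response) if k - i >= 0))
    return out


def dfe(received, taps, slicer=slicer_pam2):
    """Subtract post-cursor ISI estimated from previous decisions, then slice."""
    decisions = []
    for x in received:
        isi = sum(c * decisions[-1 - i]
                  for i, c in enumerate(taps) if i < len(decisions))
        decisions.append(slicer(x - isi))
    return decisions


if __name__ == "__main__":
    symbols = [1.0, 1.0, -1.0, 1.0, -1.0, -1.0, -1.0, 1.0, 1.0, -1.0]
    channel = [1.0, 0.6, 0.5]            # hypothetical: main cursor + 2 post-cursors
    rx = apply_channel(symbols, channel)
    plain = [slicer_pam2(x) for x in rx]
    equalized = dfe(rx, taps=channel[1:])
    print("errors without DFE:", sum(a != b for a, b in zip(plain, symbols)))
    print("errors with DFE:   ", sum(a != b for a, b in zip(equalized, symbols)))
```

A channel with more significant post-cursor samples, as described for the 140 GHz NLOS case, would simply use a longer `taps` list.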
The pseudo-random binary sequence waveforms for PAM-4 (16-QAM) and NRZ (QPSK/BPSK) are shown in Fig. 16. Eye diagrams and BER for PAM-4 and NRZ are shown in Fig. 17.
The signal quality parameters for PAM-4 and NRZ are eye linearity, vertical eye closure (VEC) and channel operating margin [78], [79]. Linearity is a measure of the variance in amplitude separation (distribution) between the different PAM-4 levels. The eye linearity is always equal to or less than 1.0. The value 1.0 indicates that the separations between all levels are equal. The eye linearity (EL) is given by [78]:
$$EL = \frac{\min(AV_{upp}, AV_{mid}, AV_{low})}{\max(AV_{upp}, AV_{mid}, AV_{low})} \tag{11}$$
where $AV_{xxx}$ is the mean of the central 5% of eye amplitude values as shown in Fig. 18. The vertical eye closure (VEC) penalty is given by [78]: The channel operating margin (COM) is defined as [79]:
$$COM = 20\log_{10}\left(\frac{A_s}{N}\right) \tag{13}$$
where $N$ is the peak BER noise and $A_s$ is the peak signal. COM combines the eye-mask and frequency-domain masks, with user-defined equalization parameters at the Rx. COM produces a signal-to-noise ratio (SNR) as a final value, which represents the channel performance and must be >3 dB. The measured PAM-4 and NRZ parameters, along with the DFE coefficient values, are shown in TABLE 6.
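As a worked illustration of (11) and (13), the two metrics can be computed directly from eye measurements. The eye amplitudes and the signal and noise values used below are hypothetical numbers chosen for the example, not entries from TABLE 6, and the function names are assumptions.

```python
import math


def eye_linearity(av_upp, av_mid, av_low):
    """Eye linearity per (11): smallest eye amplitude over the largest."""
    eyes = (av_upp, av_mid, av_low)
    return min(eyes) / max(eyes)


def channel_operating_margin_db(a_s, n):
    """Channel operating margin per (13), in dB."""
    return 20.0 * math.log10(a_s / n)


if __name__ == "__main__":
    # Hypothetical PAM-4 eye amplitudes (V) and peak signal/noise values.
    el = eye_linearity(av_upp=0.090, av_mid=0.100, av_low=0.095)
    com = channel_operating_margin_db(a_s=0.20, n=0.10)
    print(f"EL  = {el:.2f}")
    print(f"COM = {com:.2f} dB, meets >3 dB requirement: {com > 3.0}")
```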

C. BLOCKING AND INTERFERENCE
During beamforming the beam can be directed towards the Rx, preventing interference to nearby Rxs. The effects of interfering power that is offset by 5° in azimuth can be reduced by ∼3-8 dB depending upon the elevation offset [80]. This is for HPBWs from 13.4° to 25.8°. Lower values of interference are likely since the Rx HPBW considered is 7° [81].
Rx protection from noise and interference can be achieved through requirements for performance parameters like adjacent channel selectivity (ACS) and blocking characteristics. Since standards for ultra-high-speed Rxs are not available, the 5G New Radio (NR) parameters are adopted for quantifying the interference as below:
• ACS is a measure of an Rx's ability to receive a wanted signal at its assigned channel frequency in the presence of an adjacent channel signal at a given frequency offset from the center frequency of the assigned channel. The required ACS is 23 dB (Table 7.5-1 [82]).
• Blocking characteristic is a measure of an Rx's ability to receive a wanted signal at its assigned channel frequency in the presence of an unwanted interferer on frequencies other than those of the adjacent channels, without this unwanted input signal causing a degradation in Rx performance beyond a specified limit.
Beamforming and network function virtualization (NFV), currently adopted for 5G networks, provide an inherent mechanism that would mitigate interference. High frequency bands make it possible to increase antenna element density without increasing the physical size of the antenna, which allows much narrower beams to be formed. For 5G networks, NFV is employed to perform the majority of the processing needed to run commercial networks, via virtual network functions and by scheduling users. This can reduce interference significantly.

IV. CIRCUIT COMPONENTS AND POWER ANALYSIS
A. RECEIVER LINEARITY AND DISTORTION
The state-of-the-art CMOS circuit parameters, including nonidealities, considered in simulation to obtain the BER curves in Fig. 13 are summarized in TABLE 7. The gain, noise figure and third-order intercept point of the stages $i \in [1, 5]$ in the Rx are given by $G_i$, $NF_i$ and $IIP_{3,i}$ respectively. According to the International Telecommunication Union-Radio (ITU-R) recommendation, the spurious-free dynamic range of an Rx accounting for distortion is given by: where $IIP_3$ and $NF$ are the total third-order intercept point and noise figure of the Rx. The $IIP_3$ of $M$ stages is given by [89]: The noise factor for $M$ stages is given by the Friis formula: where $NF = 10\log_{10} f_M$. Using the values in TABLE 7 in (14)-(16), the SFDR for the LPAP Rx is 46.48 dB. For a fixed bandwidth the design process requires a trade-off between the NF and $IIP_3$. Loss in front of an Rx stage improves the $IIP_3$, while increasing gain improves the NF. This conflicts with the common practice of defining the system specifications stage by stage and considering each performance parameter separately. Currently there exists no technique for optimisation. Consider $IIP_3$ in (15) as a function such that: where $x = IIP_{3,1}$, $y = IIP_{3,3}$, $z = IIP_{3,5}$, and the gains $G_i$ are constants $\forall i$. The gradient of $f(x, y, z)$ is given by: where $\theta = [yz + C_1 xyz + C_2 xz + C_3 xy]^2$. The normalised gradient vector is given by: From (19) it is inferred that increasing $IIP_{3,1}$ would increase $IIP_3$ at a faster rate than any increase in $IIP_{3,3}$ or $IIP_{3,5}$. This is shown in Fig. 19, where the increase in $IIP_3$ is plotted for every 2 dB increase in $IIP_{3,i}$, $i = 1, 3, 5$. Further increase in SFDR can only be achieved by lowering $NF_1$. The LNA can accordingly be designed by employing SiGe HBTs. Such high-performance BiCMOS technology platforms with higher integration levels are currently being employed to address ultra-high-speed data rates [90].
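The gradient argument can be made explicit under one assumption: that (15) has the usual cascade form, in which the reciprocal of the total intercept point is a sum of the reciprocals of the stage intercept points weighted by the preceding gain, with all terms not involving $x$, $y$ or $z$ lumped into the constants $C_1$, $C_2$ and $C_3$. This form is inferred from the expression for $\theta$ and is a sketch, not a transcription of the paper's equations.

```latex
\frac{1}{f(x,y,z)} = \frac{1}{x} + C_1 + \frac{C_2}{y} + \frac{C_3}{z}
\quad\Longrightarrow\quad
f(x,y,z) = \frac{xyz}{D}, \qquad D = yz + C_1 xyz + C_2 xz + C_3 xy, \qquad \theta = D^2

\nabla f = \left( \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y},\ \frac{\partial f}{\partial z} \right)
         = \frac{1}{\theta}\left( y^2 z^2,\ C_2\, x^2 z^2,\ C_3\, x^2 y^2 \right)

\frac{\nabla f}{\lVert \nabla f \rVert}
  = \frac{\left( y^2 z^2,\ C_2\, x^2 z^2,\ C_3\, x^2 y^2 \right)}
         {\sqrt{\, y^4 z^4 + C_2^2\, x^4 z^4 + C_3^2\, x^4 y^4 \,}}
```

Under this form, and for positive values, the first component is the largest when $y^2 > C_2 x^2$ and $z^2 > C_3 x^2$, which is the condition under which raising $IIP_{3,1}$ gives the fastest increase in the total $IIP_3$.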
The design of a wideband SiGe HBT LNA with $G_1 = 20.2$ dB, $\mathrm{IIP}_{3,1} = 8.36$ and $\mathrm{NF}_1 = 3.7$ dB is presented in [91]. Adopting this LNA improves the SFDR to 50.1 dB. The LNA power consumption is 17 mW.
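The cascade analysis of this subsection can be explored numerically. The sketch below is illustrative only: it assumes the standard cascaded intercept-point relation and the Friis formula in linear units, and the five stage values are hypothetical placeholders, not the TABLE 7 values. The helper `iip3_sensitivity_db` mimics the experiment behind Fig. 19 by raising the intercept point of one stage by 2 dB and reporting the change in the cascaded value.

```python
import math


def db_to_lin(value_db):
    """Convert a dB (or dBm) value to a linear ratio (or mW)."""
    return 10.0 ** (value_db / 10.0)


def lin_to_db(value_lin):
    """Convert a linear ratio (or mW) to dB (or dBm)."""
    return 10.0 * math.log10(value_lin)


def cascaded_iip3_dbm(gains_db, iip3_dbm):
    """Cascaded input IP3: 1/IIP3 = 1/IIP3_1 + G_1/IIP3_2 + G_1*G_2/IIP3_3 + ..."""
    inv_total = 0.0
    gain_before = 1.0  # linear gain preceding the current stage
    for g_db, p_dbm in zip(gains_db, iip3_dbm):
        inv_total += gain_before / db_to_lin(p_dbm)
        gain_before *= db_to_lin(g_db)
    return lin_to_db(1.0 / inv_total)


def cascaded_nf_db(gains_db, nf_db):
    """Friis formula: f = f_1 + (f_2 - 1)/G_1 + (f_3 - 1)/(G_1*G_2) + ..."""
    total = 0.0
    gain_before = 1.0
    for i, (g_db, n_db) in enumerate(zip(gains_db, nf_db)):
        f = db_to_lin(n_db)
        total += f if i == 0 else (f - 1.0) / gain_before
        gain_before *= db_to_lin(g_db)
    return lin_to_db(total)


def iip3_sensitivity_db(gains_db, iip3_dbm, stage, step_db=2.0):
    """Change in cascaded IIP3 (dB) when one stage's IIP3 is raised by step_db."""
    bumped = list(iip3_dbm)
    bumped[stage] += step_db
    return cascaded_iip3_dbm(gains_db, bumped) - cascaded_iip3_dbm(gains_db, iip3_dbm)


# Hypothetical five-stage chain (assumed values, NOT taken from TABLE 7).
GAINS_DB = [20.0, -3.0, 10.0, -2.0, 15.0]
IIP3_DBM = [0.0, 30.0, 10.0, 30.0, 20.0]
NF_DB = [4.0, 3.0, 8.0, 2.0, 10.0]

if __name__ == "__main__":
    print("cascaded IIP3 (dBm):", cascaded_iip3_dbm(GAINS_DB, IIP3_DBM))
    print("cascaded NF (dB):", cascaded_nf_db(GAINS_DB, NF_DB))
    for stage in (0, 2, 4):  # stages 1, 3 and 5 in the paper's indexing
        print("stage", stage + 1, "gain in IIP3 (dB):",
              iip3_sensitivity_db(GAINS_DB, IIP3_DBM, stage))
```

Which stage dominates depends entirely on the gains and intercept points supplied, so the ranking reported for the LPAP Rx should only be expected when the actual TABLE 7 values are substituted.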

B. DISCUSSION
For the LPAP Rx, the AFE and comparator technology determine the feasibility of demodulation. The AFE determines the highest RF frequency that can be processed, and the comparator sampling rate determines the data symbol rate. Current BiCMOS/CMOS technologies permit the design and implementation of Rxs with a low-power AFE at carrier frequencies as high as 125 GHz [37], and similar technologies can be employed to implement the LPAP Rx detailed in this paper. The comparators must have sufficiently high sampling rates to cover the wide signal bandwidth for ultra-high-speed data transfer.

Assuming the power consumption values indicated in TABLE 7, the power breakdown for the configurations in TABLE 8 is given in TABLE 9. A state-of-the-art ADC in 20 nm CMOS has a $\mathrm{FOM}_{\mathrm{ADC}}$ of 0.24 pJ and a sampling rate of 16 GS/s with an ENOB of 6 bits [33]. The power consumed by the ADC is determined using (1) with $\mathrm{FOM}_{\mathrm{ADC}} = 0.24$ pJ. High-speed comparators are widely employed as low-power voltage slicers for decoding PAM-4/PAM-2 signals. Power consumption can be as low as ∼1.7 mW for a 28 nm CMOS voltage slicer clocking at 30 GS/s [92], which is the value assumed in this paper. The power breakdown of ABF, HBF and DBF is given in Fig. 20. The ADCs contribute the largest share, ranging from 29% to as high as 85%; the proposed LPAP Rx avoids this cost by not employing ADCs.

Alternative approaches use low-resolution ADCs, i.e. an ENOB of 3.5-4, to lower the power requirements [93], [94], [95], [96], [97], [98], [99]. However, these works do not consider the wide bandwidths required for ultra-high-speed data, and their algorithms assume moderate sample rates of about 1 GS/s [93], [94], [95], [96], [97], [98], [99]. For the expected ultra-high-speed data rates, ADC power requirements will remain high even at lower resolutions. In comparison, when comparators are employed, the power requirements remain much lower at similar data rates, as shown in Fig. 21.
The reference power is taken as 1.7 mW, which corresponds to a PAM-4 comparator at a sample rate of 30 GS/s [92].
The alternative low-resolution ADC approaches are complex to implement and require additional signalling algorithms, such as those for synchronization, user scheduling and beamforming [94], [95], [96], [97], [98]. When low-resolution ADCs with an ENOB of 3.5-4 are employed, the power requirement is 82 mW-115 mW at a sample rate of 30 GS/s, not accounting for the overheads of the additional signalling algorithms.
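The ADC power figures quoted in this discussion can be checked with a few lines of arithmetic. The sketch below assumes that (1) is the usual figure-of-merit relation P = FOM x 2^ENOB x f_s. That form is an assumption here, chosen because it reproduces the 82 mW-115 mW range stated above. The function name `adc_power_mw` is hypothetical.

```python
def adc_power_mw(fom_pj, enob_bits, sample_rate_gsps):
    """ADC power in mW, assuming P = FOM * 2**ENOB * f_s.

    With FOM in pJ and f_s in GS/s, the product is directly in mW
    (1e-12 J * 1e9 1/s = 1e-3 W).
    """
    return fom_pj * (2.0 ** enob_bits) * sample_rate_gsps


FOM_ADC_PJ = 0.24         # figure of merit quoted from [33]
COMPARATOR_POWER_MW = 1.7  # 28 nm voltage slicer at 30 GS/s, quoted from [92]

if __name__ == "__main__":
    # Low-resolution ADCs at the comparator's 30 GS/s sample rate.
    for enob in (3.5, 4.0):
        p = adc_power_mw(FOM_ADC_PJ, enob, 30.0)
        print("ENOB", enob, "->", round(p, 1), "mW,",
              round(p / COMPARATOR_POWER_MW, 1), "x one comparator")
    # The 6-bit, 16 GS/s ADC of [33] under the same assumed relation.
    print("ENOB 6 at 16 GS/s ->", round(adc_power_mw(FOM_ADC_PJ, 6.0, 16.0), 1), "mW")
```

Under this assumption the low-resolution cases evaluate to roughly 81.5 mW and 115.2 mW, consistent with the quoted range. The comparison is against a single 1.7 mW slicer.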

V. CONCLUSION
Designing energy-efficient mixed-signal and baseband units for ultra-high-speed Rxs requires a new paradigm, such as the analog processing proposed in this paper. The novel approach is based on RF correlation and can process the ultra-high-speed data envisaged for B5G/6G/THz Rxs without power-hungry ADCs. Circuit non-idealities such as linearity, noise figure, insertion loss, thermal noise and phase noise are taken into consideration. Phase noise of the LOs has a significant effect on signal quality in higher-order modulations such as 16-QAM. The 3GPP mmWave phase noise model is adopted to accurately model the performance of LOs implemented in a 28 nm FD-SOI CMOS process. Two PAM-4 low-power voltage slicers replace the power-hungry ADCs. The digital decoder not only supports multiple modulations with a single architecture but also supports 16-QAM, the highest-order modulation reported so far for direct demodulation with RF correlation. The digital decoder conforms to Gray mapping for the baseband signals, minimizing the probability of bit error. Power analysis against current beamforming approaches that require ADCs indicates that the proposed Rx is a promising alternative for designing ultra-high-speed Rxs with low-power analog integrated circuit design solutions.