Digital Polar Transmitters for Massive MIMO: Sum-Rate and Power Efficiency Analysis

In this article, we comprehensively investigate the potential of the digital polar radio transmitter architecture for multi-user massive multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) downlink systems. In terms of throughput performance, we derive a lower bound for the average sum-rate achievable with Gaussian signaling inputs and zero-forcing (ZF) precoding based on Bussgang decomposition. Using a diagonal approximation, we derive an approximate, yet accurate, model for the distortion caused by uniform polar quantization, which can be used to evaluate the corresponding sum-rate in closed form. To assess the power efficiency, we provide power consumption models with realistic parameters and values for the quantized polar and Cartesian transmitters, based on state-of-the-art integrated circuit (IC) designs and measurements. Extensive numerical results demonstrate that the proposed quantized polar transmitter can enable excellent performance in terms of average sum-rate, symbol error rate (SER), and out-of-band (OOB) emission level, compared to the Cartesian architecture. Furthermore, the power consumption comparisons show that the digital polar transmitter can save more than 36% of the energy consumption in a 64-antenna setting in typical 5G enhanced mobile broadband use cases, making it highly appealing for future power-efficient massive MIMO transmitter implementations.

Massive MIMO systems with large numbers of antennas at the base station (BS) could theoretically achieve gains of multiple orders of magnitude in spectral and energy efficiency over current 4G networks [1], [2]. Moreover, massive MIMO is regarded as a potential enabler of massive connectivity and ultra-low latency in the context of 5G internet of things (IoT) [3]. However, the efficient implementation of a massive MIMO radio frequency frontend (RFFE) with hundreds or even thousands of antennas is a significant challenge [4]. One can directly implement as many conventional RF chains and independently controlled RF outputs as required; however, the power consumption, size, and cost of the mixed-signal components, i.e., analog-to-digital converters (ADCs) and digital-to-analog converters (DACs), and the RF parts, increase proportionally with the number of antennas. Furthermore, wideband orthogonal frequency-division multiplexing (OFDM) signals and high-order quadrature amplitude modulation (QAM) schemes utilized in 5G result in a high peak-to-average power ratio (PAPR). This lowers the power efficiency of power amplifiers (PAs) in power-back-off operation [5]. Thus, it is highly desirable to develop custom BS radio transmitters and energy-efficient transmission schemes [6] to increase hardware integration and output power, and to reduce the cost and power consumption of large-scale MIMO implementations at sub-6 GHz and mmWave bands [4].
A. Relevant Prior Art
1) Constant-Envelope and One-Bit Precoding: One key strategy for improving the cost and power efficiency of the transmitter is to cut the power consumption of the current power-hungry circuit components, such as high-resolution DACs and RF PAs. As an example, it has been demonstrated that massive MIMO systems with coarsely quantized DACs, even with one-bit quantization, can achieve satisfactory bit/symbol error rate (BER/SER) and achievable rate performance, especially if combined with tailored nonlinear precoding [7], [8], [9], [10].
Another appealing option is to generate constant envelope (CE) waveforms with 0 dB PAPR, which would enable the highest efficiency for traditional PAs or the utilization of switch-mode PAs. In general, the complex baseband signals of the CE waveforms at each transmitting antenna are constant-amplitude and thus information can only be modulated in the phase of the transmitted signal [11]. This stringent constraint, however, degrades both the power and spectral efficiency, and introduces new signal processing challenges. To tackle this issue through spatial processing, one-bit and CE precoding have been recently proposed and have since attracted wide interest in the literature [12], [13]. With CE precoding, the transmitted signal at each antenna is restricted to take an M-ary phase-shift keying (PSK) form or to have a continuous constant envelope [14]. It is also worth noting that one-bit precoding can be regarded as a special case of CE precoding from the precoding design point of view.
The abundant CE precoding designs can be classified into mean squared error (MSE)-based nonlinear precoding and constellation-dependent precoding designs. The former approach generates the constant envelope transmit vector directly by solving a nonlinear least-squares optimization problem, resulting in better BER performance compared to quantized linear precoding [12]. The latter designs solve reformulated optimization problems that take the constellation structure into account for performance improvement [13]. However, the current algorithmic developments for interference exploitation with symbol-level precoding are intimately linked with factors such as the chosen design formulation, the scenario (e.g., single user or multi-user), and the utilized modulation constellation. In particular scenarios and with specific modulation constellations (e.g., low-order QAM and PSK), efficient algorithms or closed-form solutions have been developed for one-bit and CE precoding [15], [16].
For the above taxonomy, we note that one-bit/CE precoding for narrow-band massive MIMO systems is well studied [13], while in this work we focus on wideband OFDM transmission. Table I summarizes the state of the art of one-bit/CE precoding for downlink MIMO-OFDM, while one-bit quantized linear precoding such as maximal-ratio transmission (MRT) and zero-forcing (ZF) [17] is also listed for comparison. It was shown in [5] that the algorithm presented in [19] for CE single-carrier precoding over frequency-selective channels can be extended to OFDM transmission. The iterative time-domain processing includes the computation of a convolution at each iteration (see (64) in [5]), which leads to a high computational complexity. To reduce the complexity, [20] proposed a squared-infinity norm Douglas-Rachford splitting (SQUID) algorithm, called SQUID-OFDM. In [21] and [22], the CE precoding designs are formulated as a constrained least-squares problem with a constant modulus constraint and solved using a cyclic coordinate descent (CCD) based iterative algorithm and a more efficient Gauss-Newton (GN) algorithm, respectively. Based on the concept of constructive interference, a nonlinear precoder to transmit PSK signals in one-bit quantized MIMO-OFDM systems is proposed in [23], where a problem of maximizing the safety margin (MSM) is formulated to optimize the decision thresholds and reduce the quantization distortion. While the designs mentioned above have their distinct benefits, they also suffer from a loss in the achievable data rate, since a large share of the transmitted power is wasted into the channel null-space. Furthermore, these methods rely on a large number of symbol-level optimization iterations, which is a considerable challenge for real-time implementation of wideband massive MIMO transmitters.
2) Digital Transmitter Architectures: The emerging paradigm of digital-intensive RFFE design naturally leads to transmitter architectures relying on extensive time-based signal processing. In time-domain signal processing, the information is coded into the phase or delay of the signal, while the dependency on amplitude resolution can be limited or even removed. Thus, digital logic can be extensively utilized to generate these signals [24], [25]. Then, the RFFE integrated on a single chip can also benefit from complementary metal-oxide-semiconductor (CMOS) process scaling, since smaller transistors allow faster switching time and thus increased time resolution while consuming less energy per transition [26], [27]. Moreover, time-based transmitter architectures have the potential to adapt well to phased-array and massive MIMO systems, where precise phase control of each antenna element is required. Well-known examples of time-based transmitter architectures are polar, outphasing and pulse-width modulation (PWM) transmitters. All these architectures were invented decades ago, but they have only recently found new life thanks to advancements in digital PAs (DPAs) and wideband digital phase modulators. The emerging DPA designs that incorporate the functionality of DAC and PA in the same circuit can be used in combination with time-based transmitter architectures to leverage the advantages of switched-mode PAs [28]. Specifically, the switched-capacitor PA (SCPA) has become a popular DPA architecture in the RF integrated circuit (RFIC) domain, since it can achieve high accuracy and linearity by exploiting the precision of capacitance ratios that CMOS processes provide, while simultaneously achieving good power efficiency [29]. For further descriptions and implementations of the various DPA designs, the reader is referred to the overview article [30] and references therein.
In terms of power consumption, the digital polar architecture is the most promising transmitter architecture. It has been shown that the polar architecture has an inherent power efficiency gain of up to 3 dB compared to the Cartesian architecture, due to the in-phase and quadrature (I/Q) signal summation in the latter [31]. Several successful designs and chip prototypes [32], [33], [34], [35], [36] have demonstrated the feasibility and superiority of digital polar transmitters. Specifically, [32] presented an analysis and implementation of a polar quantizer in a CMOS process for wireless receivers. It showed, both analytically and experimentally, that the polar quantizer can achieve a higher signal-to-quantization-noise ratio (SQNR) compared to the rectangular (I/Q) quantizer. Then, a polar transmitter was implemented in [33] for mmWave communications, wherein the phase-modulated signal was generated with upconversion mixers while the amplitude signal modulated a 4-bit RF-DAC. The first polar transmitter to meet either the linearity requirements of 256-QAM WLAN signals or the transmit signal quality requirement of an aggregated 40 MHz LTE signal was reported in [34], in which the DPA delivers a peak output power of 21.9 dBm with 41% drain efficiency. In [35], a digital polar transmitter using a digital-to-time converter (DTC) to enable a high-efficiency RF-DAC for multi-band applications was implemented. The recent implementation of an all-digital polar transmitter for a 2.5/5 GHz dual-band Wi-Fi 6 application [36] shows that the DPAs can reach a peak power/average power efficiency of 27 dBm/53% and 27 dBm/37% at the low and high frequency bands, respectively. However, to the best of our knowledge, there are no implementations reported for large-scale MIMO transmitters. Furthermore, comprehensive theoretical analysis and performance characterization of applying the digital polar transmitter in massive MIMO-OFDM systems are still lacking. This paper is the first to address these issues, in terms of providing performance analysis tools and results related to the adoption of polar radio transmitters in massive MIMO systems. Such fundamental results and design insight related to, e.g., dimensioning the antenna array size in different deployment cases are of instrumental importance, paving the way towards actual hardware implementations.

B. Contributions
In contrast to the existing works [17], [18], [20], which consider the Cartesian transmitter with finite-resolution DACs and/or CE precoding in massive MIMO, we focus in this paper on the polar radio transmitter architecture. This architecture can benefit significantly from CMOS process downscaling and highly efficient DPAs, contrary to the Cartesian architecture [37]. To better understand the potential advantages of the polar transmitter, we first investigate the average sum-rate, SER, and out-of-band (OOB) spectral emissions of the quantized polar transmitter in the wideband multi-user massive MIMO-OFDM downlink system. Then, we analyze the power efficiency of the quantized polar transmitter based on proposed power consumption models for the circuit components. Commonly used asymptotic analyses for massive MIMO systems are not applicable here, because the assumptions of (i) Gaussian multi-user interference and nonlinear distortion and (ii) Gaussian distribution-based channel models required for such analysis are not fulfilled in our system model. Thus, we follow a similar analysis framework as [18], which analyzed the achievable rate performance of the Cartesian architecture with low-resolution DACs in massive MIMO-OFDM using Bussgang decomposition, and then investigate and compare the performance of both architectures. However, the analysis methods in [18] are only valid for the case of negligible overload distortion, whereas we take the overload distortion into account explicitly in this study. Compared to our earlier work in [38], here we develop more accurate and complete system analyses and performance evaluations, in the framework of the 5G enhanced mobile broadband (eMBB) applications. Specifically, in [38], a second-order Taylor approximation was utilized, which yields good performance estimates with high phase resolution. Following up on this, in the current work, a more generic analysis is developed, which is also applicable with lower phase resolutions.
The key contributions of the paper can be summarized as follows:
• Firstly, a digital polar transmitter architecture for massive MIMO-OFDM systems is presented to improve the transmitter's power efficiency.
• We derive a lower bound for the average sum-rate achievable with Gaussian signaling and ZF precoding, utilizing Bussgang decomposition and a diagonal approximation which treats the polar quantization distortion as uncorrelated in both the spatial and temporal domains.
• We analyze and compare the total power consumption of the quantized polar transmitter and the traditional Cartesian architecture based on proposed power consumption models whose parameters/values in each required circuit block are extracted from state-of-the-art circuit designs and transmitter implementations.
• The extensive numerical results demonstrate that the quantized polar transmitter enables superior performance in terms of achievable average sum-rate, uncoded SER and OOB emission levels, compared to the Cartesian one [18] and CE precoding with the popular SQUID-OFDM algorithm [20]. Specifically, we show that only 3-4 amplitude bits and 5-6 phase bits are sufficient to approach the system performance without quantization. The results suggest that the optimum array size for a given use case can be different for the different transmitter architectures.

C. Organization and Notation
Organization: The remainder of this paper is organized as follows. In Section II, we introduce the system model and the concept of the quantized polar transmitter. In Section III, we derive the signal-to-interference-noise-and-distortion ratio (SINDR) at the user equipments (UEs) and develop a lower bound for the average sum-rate achievable with Gaussian signaling and ZF precoding. Then, we derive a diagonal approximation of the covariance of the distortion caused by polar quantization, which we use to evaluate the sum-rate. In Section IV, we analyze the power consumption of polar and Cartesian transmitters based on proposed power consumption models. In Section V, we provide extensive numerical results to demonstrate the system performance and the power consumption of the quantized polar transmitter. Section VI concludes the paper.
Notation: Boldfaced lowercase and uppercase letters are used to denote vectors and matrices, respectively. $\mathbb{C}^n$ and $\mathbb{R}^n$ represent the $n$-dimensional complex and real vector spaces, respectively. Superscripts $(\cdot)^T$ and $(\cdot)^H$ stand for the transpose and conjugate transpose operators, respectively, while $\mathrm{vec}(\cdot)$ and $\mathrm{blkdiag}(\cdot)$ denote the vectorization and block diagonal operators, respectively. The operators $E\{\cdot\}$, $\mathrm{Tr}(\cdot)$, $\lfloor\cdot\rfloor$ and $\|\cdot\|$ stand for the statistical expectation, trace, floor, and vector norm, respectively. The Kronecker product is denoted by $\otimes$, while the Hadamard (element-wise) product of two equally-sized matrices is denoted by $\odot$. $\Re\{\cdot\}$ and $\Im\{\cdot\}$ represent the real and imaginary parts of a complex input, respectively. $\mathbf{I}_n$ denotes the $n \times n$ identity matrix and $[\mathbf{X}]_{i,j}$ denotes the $(i,j)$-th element of matrix $\mathbf{X}$. $\mathcal{CN}(\cdot)$ denotes the complex Gaussian distribution. The function $\exp(\cdot)$ denotes the natural exponential function.

II. SYSTEM MODEL
We consider a single-cell massive MIMO-OFDM downlink system as depicted in Fig. 1. The system consists of a BS with $N_t$ antennas, which simultaneously serves $K$ single-antenna UEs over a frequency-selective channel, where $N_t$ is significantly larger than $K$. At the BS, the discrete-time OFDM signal at each antenna is generated by applying the inverse fast Fourier transform (IFFT) to the frequency-domain precoded vector of data symbols, before passing the signal through the polar transmitter. At the UEs, the time-domain received signal is transformed back to the frequency domain through an FFT followed by OFDM demodulation to obtain the received data symbols. We assume that the subcarrier spacing is $\Delta f$ and the sampling rate is $f_s = \kappa N_{\mathrm{FFT}} \Delta f$, where $\kappa$ is the oversampling ratio and $N_{\mathrm{FFT}}$ is the corresponding FFT size. Then, each OFDM symbol consists of $N = \kappa N_{\mathrm{FFT}}$ time-domain samples, with a sample interval of $T_c = \frac{1}{N \Delta f}$. We also assume that a cyclic prefix (CP) of length larger than $D$ is added to remove inter-symbol interference, with $D$ denoting the channel impulse response length in samples (cf. (6)).
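The relations between the subcarrier spacing, FFT size, oversampling ratio, sampling rate and sample interval stated above can be checked with a minimal sketch. The function name and the numerical values below are illustrative assumptions and are not taken from the paper.

```python
def ofdm_sampling(delta_f_hz, n_fft, kappa):
    """Return (f_s, N, T_c) for a given subcarrier spacing, FFT size and oversampling ratio."""
    n = kappa * n_fft        # time-domain samples per OFDM symbol, N = kappa * N_FFT
    f_s = n * delta_f_hz     # sampling rate, f_s = kappa * N_FFT * delta_f
    t_c = 1.0 / f_s          # sample interval, T_c = 1 / (N * delta_f)
    return f_s, n, t_c


# Illustrative values only: 30 kHz spacing, 1024-point FFT, 4x oversampling.
f_s, n, t_c = ofdm_sampling(30e3, 1024, 4)
print(f"f_s = {f_s / 1e6:.2f} MHz, N = {n}, T_c = {t_c * 1e9:.3f} ns")
```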

A. Baseband Precoding
The symbol vector $\mathbf{u}_l \in \mathcal{O}^K$ at the $l$-th subcarrier contains the data symbols for the $K$ UEs, where $\mathcal{O}$ represents the set of QAM constellation points. Let the disjoint sets $\mathcal{S}_u$ and $\mathcal{S}_g$ be the set of $S$ active subcarriers associated with intended data symbols and the set of $N - S$ guard subcarriers, respectively. Hence, we set $\mathbf{u}_l = \mathbf{0}_{K \times 1}$ for $l \in \mathcal{S}_g$ and $E\{\mathbf{u}_l \mathbf{u}_l^H\} = \mathbf{I}_K$ for $l \in \mathcal{S}_u$. In order to cancel multi-user interference, frequency-domain precoded vectors are generated by multiplying $\mathbf{u}_l$ with the per-subcarrier precoding matrices $\mathbf{P}_l \in \mathbb{C}^{N_t \times K}$. The transmitted time-domain signal $\mathbf{z}_n$, $n = 0, \cdots, N-1$, is then generated by applying normal OFDM processing, i.e., an $N$-point IFFT of the precoded vectors, at each transmit branch. Moreover, the transmitted signal $\mathbf{z}_n$ is assumed to satisfy a total transmit power constraint, where $P_{\mathrm{total}}$ denotes the total transmit power at the BS. In the following, we will focus on linear ZF precoding, which is commonly studied in the Cartesian case in massive MIMO systems.
Assuming that perfect channel state information (CSI) is available at the BS,$^1$ the precoding matrices for the active subcarriers $l \in \mathcal{S}_u$ are given by the ZF solution, i.e., the right pseudo-inverse of the channel matrix scaled by an associated power scaling constant [18]. The channel matrix $\mathbf{H}_l = \boldsymbol{\Gamma}^{1/2} \tilde{\mathbf{H}}_l \in \mathbb{C}^{K \times N_t}$ includes the small-scale fading at the $l$-th subcarrier, $\tilde{\mathbf{H}}_l$, and the large-scale fading coefficients between the BS and the UEs. The diagonal large-scale fading matrix $\boldsymbol{\Gamma}$, with $[\boldsymbol{\Gamma}]_{k,k} = \gamma_k$, contains the path loss, penetration loss and shadow fading for the $k$-th UE. For the small-scale fading, we adopt a cluster-based channel model to address the frequency-selectivity and spatial correlation characteristics of the massive MIMO channels. We note that 3GPP also utilizes a similar channel modeling approach in 5G mobile radio standardization [39].
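As an illustration of per-subcarrier ZF precoding, the following sketch computes the right pseudo-inverse of each channel matrix and applies one common scaling factor. The normalization used here (the squared Frobenius norms of all precoders sum to a given power budget) is an assumption made for the example; the exact power scaling constant of [18] is not reproduced. The function name `zf_precoders` is hypothetical.

```python
import numpy as np


def zf_precoders(H, p_total):
    """ZF precoders for a stack of channels.

    H: complex array of shape (S, K, Nt), one K x Nt channel per active subcarrier.
    Returns P of shape (S, Nt, K) such that H[l] @ P[l] is a scaled identity.
    """
    Hh = np.conj(np.swapaxes(H, 1, 2))    # Hermitian transpose per subcarrier, (S, Nt, K)
    gram_inv = np.linalg.inv(H @ Hh)      # (H H^H)^{-1}, batched over subcarriers
    P_unscaled = Hh @ gram_inv            # right pseudo-inverse H^H (H H^H)^{-1}
    # Assumed normalization: one common factor so that the total precoder power equals p_total.
    beta = np.sqrt(np.sum(np.abs(P_unscaled) ** 2) / p_total)
    return P_unscaled / beta
```

With this choice, every UE sees the same effective gain $1/\beta$ on every active subcarrier, and the multi-user interference term vanishes when the CSI is perfect.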
Thus, the delay-$d$ channel vector for UE $k$ can be expressed as a superposition of $N_{\mathrm{cl}}$ clusters, each consisting of $N_{\mathrm{ray}}$ rays. Each cluster $c$ has a time delay $\kappa_c$, angle of arrival (AoA) $\phi_c$, and angle of departure (AoD) $\theta_c$, while each ray $r$ has the ray delay $\kappa_r$, and the corresponding AoA and AoD are denoted by $\varphi_r$ and $\vartheta_r$, respectively. The complex gain is $\alpha_{rc} \sim \mathcal{CN}(0, \sigma_c^2)$, where $\sigma_c^2$ represents the average power of the cluster $c$. The function $f_{\mathrm{flt}}(\cdot)$ is a pulse-shaping function for $T_c$-spaced signaling. $\mathbf{a}_{\mathrm{BS}}(\cdot)$ denotes the antenna array response vector of the BS, and $a_{\mathrm{UE}}(\cdot)$ accounts for the phase between the clusters and UEs. For the uniform linear array (ULA), $\mathbf{a}_{\mathrm{BS}}(\cdot)$ is determined by the carrier wavelength $\lambda$ and the inter-antenna spacing $d_s$. The compound multi-user channel matrix at delay $d$ collects the channel vectors of the $K$ UEs. Lastly, the corresponding multi-user frequency-domain response at the $l$-th subcarrier is obtained from the delay-domain channel matrices, where $D$ denotes the channel impulse response (CIR) length in samples. A line-of-sight (LOS) component can also be added in (5) to account for a propagation environment with a LOS path.
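For reference, a ULA response vector can be sketched as follows. This is the textbook form with unit-norm scaling and a positive phase progression; the normalization and sign convention are assumptions, since the exact expression used in this paper is not available in the text above. The function name `ula_response` is hypothetical.

```python
import cmath
import math


def ula_response(angle_rad, n_t, spacing_over_wavelength=0.5):
    """Unit-norm ULA response vector for a given angle (textbook form, assumed convention).

    spacing_over_wavelength is the ratio d_s / lambda (0.5 means half-wavelength spacing).
    """
    phase_step = 2 * math.pi * spacing_over_wavelength * math.sin(angle_rad)
    return [cmath.exp(1j * phase_step * m) / math.sqrt(n_t) for m in range(n_t)]


# Example: 8 antennas, half-wavelength spacing, 30 degrees.
a = ula_response(math.pi / 6, 8)
```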

B. Quantized Polar Transmitter Architecture
Instead of the conventional and popular Cartesian architecture, a quantized polar transmitter architecture is adopted at the BS. As shown in Fig. 1, the time-domain per-antenna transmitted I/Q signal is first upsampled and pulse-shaped. Then, these I/Q signal components are converted to a polar representation in the digital domain with, for example, the coordinate rotation digital computer (CORDIC) algorithm. Thus, the amplitude and phase signals are given by the magnitude and the argument of each per-antenna sample, $A_n = |z_n|$ and $\Psi_n = \arg(z_n)$, for $n = 0, \cdots, N-1$, where the I/Q signal branches of $z_n$ asymptotically follow independent identically distributed (i.i.d.) zero-mean Gaussian distributions for a sufficiently large number of subcarriers. Thus, the amplitude signal $A_n$ follows the Rayleigh distribution and the phase signal $\Psi_n$ follows a uniform distribution in the interval $[0, 2\pi)$.
After that, uniform polar quantization is adopted, in which the amplitude and phase signals are quantized independently. Then, the quantized amplitude signal directly modulates an $l_A$-bit resolution (i.e., $L_A = 2^{l_A}$ quantization levels) RF-DAC, acting also as the PA. The phase signal, quantized to $l_P$ bits, modulates the phase of the RF carrier in the phase modulator. The phase modulator can be realized with digitally controlled circuits, for example, a digital phase-locked loop (PLL) [40] or a delay-line based approach [24], [25], [35], [36], [41]. In the last stage, the amplitude signal and the phase-modulated signal are combined in the RF-DAC to generate the per-antenna transmitted signal at the desired carrier frequency.
In this work, we assume that the RF components are ideal without impairments, aside from the quantization in the DACs and phase modulators. Moreover, perfect timing between the amplitude and phase paths as well as perfect synchronization between the BS and UEs are assumed. However, there are some issues that need to be considered for the practical implementation of the wideband digital polar transmitter. Firstly, the bandwidths of the amplitude and phase signals in the polar transmitter are significantly higher than that of the original I/Q signal due to the nonlinear signal conversion from I/Q to polar. Hence, the polar transmitter needs to be able to accommodate a wider internal bandwidth, or the composite RF signal will be distorted. Secondly, any time delay mismatches between the amplitude and phase signals will result in erroneous restoration of the transmit signal, causing linearity degradation. These are crucial challenges in real-life wideband polar transmitter implementations, especially for massive MIMO-OFDM systems. Nonetheless, it is encouraging that the recent studies and implementations mentioned above have demonstrated the feasibility of the digital polar transmitter and phase modulator with more than 100 MHz RF signal bandwidth.

C. Uniform Polar Quantization
Uniform polar quantization is adopted in the polar transmitter, where two uniform real-valued scalar mid-rise quantizers act on the amplitude and phase paths, respectively. Due to the Rayleigh distribution of the amplitude signal, the amplitude quantization is more complicated. In general, one should choose the best step size for minimizing the quantization distortion caused by the finite amplitude resolution. However, too small a step size will result in significant overload distortion. Here, we set the clipping level $A_{\mathrm{clip}}$ of the amplitude quantizer such that the amplitudes of samples that exceed this level will be clipped. The clipping operation creates signal distortion, resulting in unwanted in-band and out-of-band emissions that degrade the error vector magnitude (EVM) and interfere with users in neighboring channels.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
With $L_A$ quantization levels, the step size is $\Delta = \frac{A_{\mathrm{clip}}}{L_A}$, and the quantization thresholds and labels of the amplitude quantizer are spaced uniformly by $\Delta$, with the labels at the midpoints of the quantization cells. The quantized amplitude signal $A_n^Q$ then equals the amplitude $A_n$ perturbed by $e_{A_n}$, the quantization error of the amplitude signal. Similarly, the quantization thresholds of the $L_P$-level (i.e., $L_P = 2^{l_P}$) phase quantizer are $\upsilon_p = (p-1)\frac{2\pi}{L_P}$, $p = 1, \cdots, L_P$, with $\upsilon_{L_P+1} = 2\pi$, and the output of the phase quantizer equals the phase $\Psi_n$ perturbed by $e_{\Psi_n}$, the quantization error of the phase signal.
Then, the quantized transmitted signal at each BS antenna over the $n$-th time sample can be written as the product of the quantized amplitude and the complex exponential of the quantized phase, $x_n = A_n^Q \exp(j\Psi_n^Q)$. From an input-output perspective, the polar quantizer can be viewed as a single instantaneous nonlinear transformation. Thus, as is demonstrated in the next section, we can apply the Bussgang decomposition in analyzing the average sum-rate.
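The uniform polar quantizer described in this subsection can be sketched for a single complex sample as follows. The step size $\Delta = A_{\mathrm{clip}}/L_A$ and the phase thresholds $(p-1)2\pi/L_P$ follow the text; placing the output labels at the cell midpoints is an assumption that matches the mid-rise structure and the maximum amplitude output $A_{\mathrm{clip}}(1 - \frac{1}{2L_A})$ used later in the analysis. The function name `polar_quantize` is hypothetical.

```python
import cmath
import math


def polar_quantize(z, l_a, l_p, a_clip):
    """Quantize one complex sample with uniform mid-rise amplitude and phase quantizers."""
    n_amp, n_phase = 2 ** l_a, 2 ** l_p
    step = a_clip / n_amp                          # amplitude step size, Delta = A_clip / L_A
    cell = 2 * math.pi / n_phase                   # phase cell width, 2*pi / L_P

    amp = abs(z)
    phase = cmath.phase(z) % (2 * math.pi)         # map the phase to [0, 2*pi)

    a_idx = min(int(amp // step), n_amp - 1)       # amplitudes above A_clip fall into the last cell
    p_idx = int(phase // cell) % n_phase           # modulo guards the 2*pi rounding edge case

    a_q = (a_idx + 0.5) * step                     # midpoint label (assumed)
    psi_q = (p_idx + 0.5) * cell                   # midpoint label (assumed)
    return a_q * cmath.exp(1j * psi_q)


# Example: 2 amplitude bits, 3 phase bits, clipping level 2.0.
q = polar_quantize(0.7 * cmath.exp(1j * 1.0), l_a=2, l_p=3, a_clip=2.0)
```

Because the amplitude index is capped at $L_A - 1$, overload (clipping) distortion appears whenever the input amplitude exceeds $A_{\mathrm{clip}}$, while inputs below the clipping level only incur granular distortion.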

A. Channel Input-Output Relationship
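The received discrete-time baseband signal $\mathbf{y}_n \in \mathbb{C}^K$ for the $K$ UEs is given by the quantized transmitted signal passed through the frequency-selective multi-user channel plus noise, where $\mathbf{n}_n \sim \mathcal{CN}(\mathbf{0}, N_0 \mathbf{I}_K)$ denotes the additive white Gaussian noise (AWGN) at the UEs at time instant $n$ and $N_0$ is the noise variance. Then, the transmit SNR is defined as $P_{\mathrm{total}}/N_0$.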
For analytical convenience, we transfer the time-domain signal model in (14) into the frequency domain. Let $\mathbf{F}_N$ be the normalized $N \times N$ discrete Fourier transform (DFT) matrix with the property $\mathbf{F}_N \mathbf{F}_N^H = \mathbf{I}_N$. Then the transmitted and received signal matrices in the frequency domain are $\hat{\mathbf{X}} = \mathbf{X}\mathbf{F}_N$ and $\hat{\mathbf{Y}} = \mathbf{Y}\mathbf{F}_N$, respectively, where $\mathbf{X} = [\mathbf{x}_0, \cdots, \mathbf{x}_{N-1}]$ and $\mathbf{Y} = [\mathbf{y}_0, \cdots, \mathbf{y}_{N-1}]$ are the corresponding time-domain transmitted and received signal matrices over $N$ time samples. After discarding the CP, the received frequency-domain signal $\hat{\mathbf{y}}_l \in \mathbb{C}^K$ over the $K$ UEs at the $l$-th subcarrier can be expressed in terms of the per-subcarrier channel, the transmitted vector $\hat{\mathbf{x}}_l$ and the noise, where $\hat{\mathbf{y}}_l$ and $\hat{\mathbf{x}}_l$ are the $l$-th columns of $\hat{\mathbf{Y}}$ and $\hat{\mathbf{X}}$, respectively. Next, we define compact forms by stacking the per-subcarrier vectors and matrices, where $\mathbf{P}_l = \mathbf{0}_{N_t \times K}$ for $l \in \mathcal{S}_g$ in (16c). By using the simplified Bussgang decomposition [18], [42], the transmitted frequency-domain vector $\hat{\mathbf{x}}$ can be written as a linear term in the precoded signal plus an uncorrelated distortion term, where the precoded signal is known to converge to Gaussian as the number of subcarriers goes to infinity. Finally, the frequency-domain received signal over $N$ subcarriers is obtained by substituting (17) and (18) into (19). Letting $\hat{y}_{k,l} = [\hat{\mathbf{y}}_l]_k$ be the received signal for the $k$-th UE at the $l$-th subcarrier then yields the per-UE signal model used in the following.
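The role of the normalized DFT matrix can be illustrated with a short numerical check. The sketch below assumes the common sign convention $[\mathbf{F}_N]_{n,k} = \exp(-j2\pi nk/N)/\sqrt{N}$, which is an assumption since the text only states the unitarity property; the matrix sizes are illustrative.

```python
import numpy as np


def normalized_dft(n):
    """Unitary N x N DFT matrix (assumed sign convention: exp(-j*2*pi*n*k/N) / sqrt(N))."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)


N = 8
F = normalized_dft(N)

# Time-domain matrix: one row per antenna, one column per time sample.
rng = np.random.default_rng(0)
X_time = rng.standard_normal((4, N)) + 1j * rng.standard_normal((4, N))

X_freq = X_time @ F               # frequency-domain matrix, one column per subcarrier
X_back = X_freq @ F.conj().T      # unitarity gives back the time-domain samples
```

Right-multiplying by $\mathbf{F}_N$ transforms every antenna's sample sequence (a row) at once, so the $l$-th column of the result collects all antennas at subcarrier $l$.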

B. Achievable Sum-Rate With Gaussian Signaling
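With Gaussian signaling inputs,$^2$ i.e., $\mathbf{u}_l \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_K)$ for $l \in \mathcal{S}_u$, the SINDR for UE $k$ at the $l$-th subcarrier, $\mathrm{SINDR}_{k,l}(\hat{\mathbf{H}})$, can be expressed as in [18], where $\mathbf{C}_d = E\{\mathbf{d}\mathbf{d}^H\}$ denotes the covariance of the distortion $\mathbf{d}$.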
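It is challenging to derive the exact achievable rate due to the non-Gaussian distributed residual multi-user interference and quantization distortion. Nevertheless, a lower bound on the achievable rate can be derived by using the so-called "auxiliary-channel lower bound" [7], [18] approach. Specifically, the lower bound of the average sum-rate can be written in terms of $\log_2(1 + \mathrm{SINDR}_{k,l}(\hat{\mathbf{H}}))$, accumulated over the UEs and active subcarriers and averaged over the channel realizations. From (17), the time-domain transmit signal is $\mathbf{x} \triangleq \mathrm{vec}(\mathbf{X}) = \mathbf{B}\mathbf{z} + \mathbf{d}$. Since the distortion $\mathbf{d}$ is uncorrelated with $\mathbf{z}$, we have $\mathbf{C}_d = \mathbf{C}_x - \mathbf{B}\mathbf{C}_z\mathbf{B}^H$, where $\mathbf{C}_x = E\{\mathbf{x}\mathbf{x}^H\} \in \mathbb{C}^{N_t N \times N_t N}$ and $\mathbf{C}_z$ denotes the covariance matrix of the input Gaussian signal $\mathbf{z}$ (cf. (18)). Due to the overload distortion, the rounding approximation method, used for example in [18], is not valid. In addition, the computation of $\mathbf{C}_x$ is more challenging compared to I/Q signal quantization, due to the asymmetry of polar quantization.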
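Once per-UE, per-subcarrier SINDR values are available, a sum-rate figure of the kind described above can be evaluated as in the following sketch. The normalization by the number of active subcarriers and the input layout are assumptions made for illustration; the exact expression of the bound is not reproduced here. The function name `average_sum_rate` is hypothetical.

```python
import math


def average_sum_rate(sindr_per_realization):
    """Average of sum_k sum_l log2(1 + SINDR_{k,l}) / S over channel realizations.

    sindr_per_realization: list over channel realizations; each entry is a K x S
    nested list of linear-scale SINDR values (K UEs, S active subcarriers).
    """
    rates = []
    for sindr in sindr_per_realization:
        n_active = len(sindr[0])                                   # S active subcarriers
        total = sum(math.log2(1.0 + v) for row in sindr for v in row)
        rates.append(total / n_active)                             # assumed normalization by S
    return sum(rates) / len(rates)


# Example: one realization, 2 UEs, 2 subcarriers, SINDR = 3 (about 4.8 dB) everywhere.
rate = average_sum_rate([[[3.0, 3.0], [3.0, 3.0]]])
```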
For tractable analysis, we employ a diagonal approximation for deriving $\mathbf{C}_d$, which is widely used when analyzing the performance of low-resolution converters in massive MIMO systems (see, e.g., [42]). Doing this, only the diagonal entries of the covariance matrices need to be evaluated. Hence, the Bussgang gain $b_m$ corresponding to the $m$-th component for the polar quantization of $z_m$ is given by (30), where the superscript $(\cdot)^*$ denotes the complex conjugation operation, $a_m^Q$ and $\psi_m^Q$ are the quantized amplitude and phase, respectively, $e_{a_m} = [e_{A_n}]_m$ denotes the amplitude quantization error, and the phase error $e_{\psi_m} = [e_{\psi_n}]_m$ is assumed to be uniformly distributed over the interval $[-\frac{\pi}{L_P}, \frac{\pi}{L_P})$. With a Rayleigh distributed amplitude signal, $E\{(a_m^Q)^2\}$ can be expressed in closed form. The overall amplitude distortion $E\{e_{a_m}^2\}$ includes both granular and overload distortion, where the granular distortion is obtained under the assumption of uniformly distributed granular distortion, $a_{\max} = A_{\mathrm{clip}}(1 - \frac{1}{2L_A})$ denotes the maximum quantization output of the amplitude quantizer, and $Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^{\infty} \exp(-\frac{v^2}{2})\,dv$ denotes the Q-function. Finally, by inserting (34), (33) and (32) into (30), we can find the $b_m$ in the Bussgang gain matrix $\mathbf{B}$. Then, the diagonal approximation $\mathbf{C}_d^{\mathrm{diag}} = \mathbf{C}_d \odot \mathbf{I}_{N N_t}$ can be derived based on (26).
$^2$According to information theory, one of the necessary conditions to achieve Shannon capacity/throughput is that the channel inputs must be continuous Gaussian random variables. However, this is not true in modern wireless communication systems, in which the channel inputs generally take their values from a finite alphabet (constellation) with equal probability.
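The per-component Bussgang gain can also be estimated numerically as $b = E\{x z^*\}/E\{|z|^2\}$, which gives a sanity check for closed-form results. The sketch below is a Monte Carlo estimate for a unit-variance complex Gaussian input and a polar quantizer with midpoint labels (an assumption, as is the helper name `polar_quantize`). With very fine amplitude resolution the gain should approach $\sin(\pi/L_P)/(\pi/L_P)$, which is the mean of $\exp(je)$ for a phase error $e$ uniform over $[-\pi/L_P, \pi/L_P)$.

```python
import cmath
import math
import random


def polar_quantize(z, l_a, l_p, a_clip):
    """Uniform mid-rise polar quantizer with midpoint labels (assumed)."""
    n_amp, n_phase = 2 ** l_a, 2 ** l_p
    step, cell = a_clip / n_amp, 2 * math.pi / n_phase
    a_idx = min(int(abs(z) // step), n_amp - 1)
    p_idx = int((cmath.phase(z) % (2 * math.pi)) // cell) % n_phase
    return (a_idx + 0.5) * step * cmath.exp(1j * (p_idx + 0.5) * cell)


def bussgang_gain(l_a, l_p, a_clip, n_samples=100_000, seed=0):
    """Monte Carlo estimate of b = E{x z*} / E{|z|^2} for z ~ CN(0, 1)."""
    rng = random.Random(seed)
    sigma = math.sqrt(0.5)               # per-branch standard deviation of a unit-variance input
    num, den = 0j, 0.0
    for _ in range(n_samples):
        z = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        num += polar_quantize(z, l_a, l_p, a_clip) * z.conjugate()
        den += abs(z) ** 2
    return num / den


# Fine amplitude resolution (10 bits) isolates the effect of 3-bit phase quantization.
b = bussgang_gain(l_a=10, l_p=3, a_clip=4.0)
```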

A. Methodology
In general, while there are several IC and prototyping implementations for Cartesian-based 5G transmitters [43], [44], [45] and digital polar transmitter architectures [33], [34], [35], [36], directly comparing the power efficiency of the two transmitter architectures is difficult [46]. This is because the reported implementations may have different design targets and omit some implementation details due to only some circuit components being integrated. Moreover, the implementations may use different silicon technologies and circuit components. As depicted in Fig. 1, the quantized polar transmitter at each antenna consists of block components including an I/Q to polar converter (e.g., CORDIC), a local oscillator (LO), a digital phase modulator, and an RF-DAC. On the other hand, the RF chain at each antenna with the conventional Cartesian architecture mainly consists of the DAC, low-pass filter (LPF), I/Q mixer, LO, a 90° hybrid with buffers, and a linear-class PA (class-A/B/AB), which is typically an integrated CMOS PA in the large-scale MIMO context. To make these two architectures comparable, we need to establish realistic power consumption models of the required circuit blocks in each transmitter, based on an extensive survey of the state-of-the-art circuit designs and measurements.
For a fair comparison, each transmit architecture has to deliver the same target spectral efficiency (SE). Thus, the total power consumption $P_{\mathrm{All}}$ for each transmitter architecture can be expressed in terms of three quantities: $P_{\mathrm{total}}$ is the required total transmit power to meet the target SE, $\eta$ is the average power efficiency of the power amplification stage, and $P_{\mathrm{cir}}$ denotes the power consumption of the circuit blocks, which depends on the chosen transmitter architecture. We limit our focus to the power consumption of the RF transmitter and RFFE. The baseband power consumption is omitted because the same baseband processing is applied in both architectures. The power consumption of any active cooling is also omitted. Moreover, three test environments (indoor hotspot, dense urban, rural) for eMBB are defined in 5G standardisation [47]. In this work, the rural and dense urban environments are considered, since the power consumption of wide-area BSs is most critical from the overall network power efficiency point of view.
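A minimal sketch of such a power budget is given below. It assumes the usual form in which the radiated power is divided by the amplification-stage efficiency and the aggregate circuit power is added, $P_{\mathrm{All}} = P_{\mathrm{total}}/\eta + P_{\mathrm{cir}}$; this form, the function names and all numerical values are assumptions for illustration and are not taken from the paper.

```python
def total_power_w(p_total_w, eta, p_cir_w):
    """Assumed budget: P_All = P_total / eta + P_cir (all powers in watts)."""
    if not 0.0 < eta <= 1.0:
        raise ValueError("eta must lie in (0, 1]")
    return p_total_w / eta + p_cir_w


def relative_saving(p_reference_w, p_candidate_w):
    """Fraction of the reference consumption saved by the candidate architecture."""
    return 1.0 - p_candidate_w / p_reference_w


# Illustrative numbers only: same radiated power, different efficiency and circuit power.
p_cartesian = total_power_w(p_total_w=10.0, eta=0.20, p_cir_w=15.0)
p_polar = total_power_w(p_total_w=10.0, eta=0.30, p_cir_w=10.0)
saving = relative_saving(p_cartesian, p_polar)
```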

B. $P_{\mathrm{total}}$ Estimation
In general, the output power of each PA is split into $K$ parts due to spatial multiplexing and even power allocation, while $N_t$ transmit antennas with beamforming provide an $N_t^2$-fold power gain. Hence, the maximum output signal power after coherent summation is $P_{\mathrm{total}} N_t / K$. Note that increasing the transmit power $P_{\mathrm{total}}$ and the array size $N_t$ both improve the signal power gain in (22), which effectively provides higher effective isotropic radiated power (EIRP) and helps achieve the target SINR. However, we can only rely on Monte Carlo simulations to estimate the actual power terms in (22) instead of quantitatively analyzing them with respect to the system parameters. Then, the corresponding estimated SINDR with the given channel realization $\mathbf{H}$ can be denoted as where the signal power gain $G_{k,l}$ is given by $G_{k,l} = \arg\min_{g_{k,l}} \mathbb{E}\{\|g_{k,l}\hat{y}_{k,l} - u_{k,l}\|^2\}$, and the interference plus dis-

The receiver thermal noise power $\sigma_{\mathrm{rx}}^2$ is treated as constant in each use case. Moreover, all signal, interference plus distortion, and noise powers are relative powers, referred to the maximum BS transmit power (see more in Table of [48]). As a result, we obtain the expected signal power $\bar{G}^2$ and the multi-user interference plus distortion power $\bar{I}^2 + \bar{D}^2$ over all active subcarriers via Monte Carlo simulation. Then, to reach the targeted SINRs in each use case, the average SINDR per UE yields

$$\frac{P_{\mathrm{adjust}}\,\bar{G}^2}{P_{\mathrm{adjust}}\,(\bar{I}^2 + \bar{D}^2) + \sigma_{\mathrm{rx}}^2} \geq \mathrm{SINR}_{\mathrm{req}},$$

where $P_{\mathrm{adjust}}$ is the power adjustment factor, $\mathrm{SINR}_{\mathrm{req}}$ is the required SINR in each use case, and the thermal noise power $\sigma_{\mathrm{rx}}^2$ at the UE side in each use case can be reproduced by link budget estimation from Table II in [46] with corresponding parameter values [39], [48]. When the equality holds, we have

$$P_{\mathrm{adjust}}^{\star} = \frac{\mathrm{SINR}_{\mathrm{req}}\,\sigma_{\mathrm{rx}}^2}{\bar{G}^2 - (\bar{I}^2 + \bar{D}^2)\,\mathrm{SINR}_{\mathrm{req}}}.$$

Lastly, the required transmit power for different system parameters is obtained, leading to, e.g., $P_{\mathrm{total}}\,(\mathrm{dBm}) = 49\,\mathrm{dBm} + 10\log_{10}(P_{\mathrm{adjust}}^{\star})$ in the rural eMBB use case.
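The power adjustment step can be illustrated with a minimal Python sketch. It assumes the average SINDR has the form $P\bar{G}^2 / (P(\bar{I}^2 + \bar{D}^2) + \sigma_{\mathrm{rx}}^2)$ with all powers in linear scale; the function names and the numerical inputs are hypothetical and serve only as an illustration, since the actual power terms come from Monte Carlo simulation.

```python
import math


def required_power_adjustment(g2, i2_plus_d2, sigma2_rx, sinr_req):
    """Smallest P with P*g2 / (P*i2_plus_d2 + sigma2_rx) >= sinr_req.

    All arguments are linear-scale relative powers (assumed model).
    Returns None when interference plus distortion alone already caps
    the SINDR below sinr_req, so that no finite power can reach the target.
    """
    margin = g2 - i2_plus_d2 * sinr_req
    if margin <= 0.0:
        return None
    return sinr_req * sigma2_rx / margin


def total_tx_power_dbm(p_adjust, reference_dbm=49.0):
    """Scale the 49 dBm reference transmit power by the adjustment factor."""
    return reference_dbm + 10.0 * math.log10(p_adjust)


# Hypothetical example values, not taken from the simulations.
p_star = required_power_adjustment(g2=1.0, i2_plus_d2=0.001,
                                   sigma2_rx=0.05, sinr_req=10.0)
if p_star is not None:
    print(f"P_adjust = {p_star:.4f}, P_total = {total_tx_power_dbm(p_star):.2f} dBm")
```

The `None` branch corresponds to a distortion-limited operating point, where raising the transmit power scales the distortion together with the signal.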

C. Hardware Power Consumption Modeling
In this subsection, we analyze the power consumption of the circuit components in both the polar and Cartesian transmitter architectures. Additionally, we select realistic power consumption values or parameters for each circuit block based on the state-of-the-art reported circuit designs and measurements.
Firstly, we consider a direct-conversion Cartesian transmitter and denote the power consumption of the DAC, filters, I/Q mixer and LO as $P_{\mathrm{DAC}}$, $P_{F}$, $P_{M}$ and $P_{\mathrm{LO}}$, respectively.
1) Power Consumption of DAC: The DAC power consumption is mainly determined by the sampling rate and quantization resolution. An empirical model for where $\mathrm{FOM}_{\mathrm{DAC}}$ and $\mathrm{BW}_{\mathrm{OSR}}$ denote the figure-of-merit (FOM) and oversampled data rate, respectively, and $P_{\mathrm{buffer}}$ is a constant hardware overhead for signal amplification, e.g., we assume $P_{\mathrm{buffer}} = 10$ mW for -14 dBm output signal power. We specify the FOM of the DAC to be $\mathrm{FOM}_{\mathrm{DAC}} = 0.08$ pJ/conversion as in [46] and $\mathrm{BW}_{\mathrm{OSR}} = 614$ MS/s for 100 MHz signal bandwidth.

2) LO, Mixer and Filters: LO signal generation and distribution pose a challenge to massive MIMO transceiver design, due to the large number of RF chains and the high frequency range. Currently, three approaches for LO generation and distribution have been proposed in the literature for massive MIMO [49]. Firstly, the LO signal can be generated locally by an independent oscillator for each transceiver. Secondly, a low-frequency reference is first distributed to each transceiver and a PLL is then used to generate the desired LO frequency. Thirdly, in the centralized LO distribution architecture, a high-frequency LO signal is directly distributed from a common source to all RF chains. In this work, we assume that the independent LO generation approach is utilized in both transmitter architectures. State-of-the-art oscillator designs achieve phase noise lower than -110 dBc/Hz at 1 MHz offset, such that the system performance is not affected, and they consume less than 12.5 mW of direct current (DC) power [50]. Thus, we consider that the LO components consume 25 mW of power when the 90° phase shifters and the LO buffer for driving the passive up-conversion mixers are included. We assume the power consumption of the mixers to be $P_{M} = 5$ mW [51]. Regarding the power required to run the filter components, we select the value of $P_{F} = 20$ mW from [52].
3) CMOS PA: PAs consume a large amount of power in current BSs operating at sub-6 GHz. It is difficult to balance the output power, linearity, and efficiency in practical PA designs. In [53], a comprehensive PA performance survey is reported. We note that PA efficiency is related to the operating frequency, output power, and semiconductor technology. In this work, we assume that CMOS-based PAs are utilized. Assuming a PAPR of 8 dB for the clipped OFDM signals, the 49 dBm total transmit power leads to 18.9 dBm output power per PA with an extremely large-scale array of 1024 antennas at the BS. Thus, the required saturated power of the PA is 26.9 dBm, which poses a huge challenge for CMOS PA design. Additionally, contrary to the optimistic values of $\eta \approx 40\%$ found in the massive MIMO energy efficiency analyses [54], we opt for a more conservative PA efficiency of $\eta = 15\%$, based on the required saturated output power and the state-of-the-art CMOS PA implementations.
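The per-PA figures quoted above follow from simple dB arithmetic, sketched below in Python. The sketch assumes that the total transmit power is shared equally among the antennas and that the saturated power equals the per-PA output power plus the PAPR; the function names are illustrative only.

```python
import math


def per_pa_power_dbm(total_dbm, num_antennas):
    """Output power per PA when the total power is shared equally."""
    return total_dbm - 10.0 * math.log10(num_antennas)


def required_saturated_power_dbm(total_dbm, num_antennas, papr_db):
    """Saturated power needed to pass signal peaks: per-PA power plus PAPR."""
    return per_pa_power_dbm(total_dbm, num_antennas) + papr_db


avg_dbm = per_pa_power_dbm(49.0, 1024)                   # about 18.9 dBm
sat_dbm = required_saturated_power_dbm(49.0, 1024, 8.0)  # about 26.9 dBm
print(f"per-PA power: {avg_dbm:.1f} dBm, saturated power: {sat_dbm:.1f} dBm")
```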
The power consumption details of the quantized polar transmitter are, in turn, listed below.

1) CORDIC: The main principle of CORDIC is to carry out calculations with shift registers and adders instead of multipliers, which saves hardware resources. In [55], parallel rotation and vectoring CORDIC are designed to perform I/Q signal to polar conversion with 5-bit amplitude and 7-bit phase signals, with an energy metric (Power/Frequency/Bits) of 0.41 pJ/bit. Thus, we estimate the maximum power consumption of CORDIC (with integrated pulse shaping filter) to be $P_{\mathrm{CORDIC}} = 20$ mW.

2) Phase Modulator: In the quantized polar transmitter, a digital-intensive RF phase modulator can be implemented using digital PLL or delay-based architectures such as DTC or DIPM. In this work, we assume that DTC-based phase modulators are adopted and the power model is $P_{\mathrm{PM}} = 27$ mW with 100 MHz bandwidth [35].

3) RF-DAC: Combining the RF-DAC and PA functionality on a single die constitutes fully integrated RF-DPA implementations, capable of delivering high output powers. In addition, by utilizing switched-capacitor circuits, power efficiency, linearity and scalability can be further improved. The state-of-the-art average PA efficiency of DPA in polar transmitter implementations is around 25% [36].
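To make the shift-and-add principle of CORDIC concrete, the following Python sketch implements a generic vectoring-mode CORDIC for I/Q to polar conversion. It is a floating-point illustration only and not the fixed-point design of [55]; the function name, the iteration count, and the use of `math.atan` in place of a lookup table are assumptions made for readability.

```python
import math


def cordic_to_polar(i, q, iterations=16):
    """Vectoring-mode CORDIC: drive the vector onto the positive x-axis.

    Returns (amplitude, phase in radians). Multiplications by 2**-k stand
    in for the right shifts used in fixed-point hardware.
    """
    x, y, phase = i, q, 0.0
    if x < 0.0:
        # Pre-rotate by +/- pi so the vector lies in the right half-plane,
        # where the micro-rotations are guaranteed to converge.
        x, y = -x, -y
        phase = math.pi if q >= 0.0 else -math.pi
    gain = 1.0
    for k in range(iterations):
        shift = 2.0 ** -k
        angle = math.atan(shift)  # a small lookup table in hardware
        if y >= 0.0:
            x, y = x + y * shift, y - x * shift  # rotate clockwise
            phase += angle
        else:
            x, y = x - y * shift, y + x * shift  # rotate counter-clockwise
            phase -= angle
        gain *= math.sqrt(1.0 + shift * shift)   # constant CORDIC gain
    return x / gain, phase


amplitude, phase = cordic_to_polar(-3.0, 4.0)
print(amplitude, phase)  # close to 5 and atan2(4, -3)
```

In hardware, the gain correction is a single constant known in advance for a fixed iteration count, so the loop itself needs only shifts, additions, and sign tests.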
Finally, we assume that the power consumption of the LO is the same for both the Cartesian and the quantized polar transmitter architectures.
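As a rough illustration of how these values combine, the Python sketch below evaluates a total consumption of the form PA input power plus per-antenna circuit power. It assumes one instance of each circuit block per antenna, the same $P_{\mathrm{total}} = 49$ dBm for both architectures, and a placeholder DAC power `P_DAC_ASSUMED_MW`, since the DAC model is not reproduced here; these are assumptions of the sketch, not values from Table II.

```python
def dbm_to_watt(p_dbm):
    return 10.0 ** (p_dbm / 10.0) / 1000.0


def total_consumption_w(p_total_dbm, pa_efficiency, circuit_mw, num_antennas):
    """PA input power plus circuit power of all antenna branches (assumed model)."""
    pa_input_w = dbm_to_watt(p_total_dbm) / pa_efficiency
    return pa_input_w + num_antennas * circuit_mw / 1000.0


# Per-antenna block powers in mW, as quoted in the text.
P_LO_MW, P_M_MW, P_F_MW = 25.0, 5.0, 20.0
P_CORDIC_MW, P_PM_MW = 20.0, 27.0
P_DAC_ASSUMED_MW = 15.0  # hypothetical placeholder, not from the paper

CARTESIAN_MW = P_DAC_ASSUMED_MW + P_F_MW + P_M_MW + P_LO_MW
POLAR_MW = P_CORDIC_MW + P_PM_MW + P_LO_MW


def polar_saving(num_antennas, p_total_dbm=49.0):
    """Relative saving of the polar over the Cartesian architecture."""
    cart = total_consumption_w(p_total_dbm, 0.15, CARTESIAN_MW, num_antennas)
    polar = total_consumption_w(p_total_dbm, 0.25, POLAR_MW, num_antennas)
    return 1.0 - polar / cart


for n in (64, 1024):
    print(n, f"{100.0 * polar_saving(n):.1f}%")
```

Under these assumptions the saving is dominated by the PA efficiency gap (25% versus 15%) for small arrays and shrinks as the circuit power grows with the number of antennas. The sketch holds $P_{\mathrm{total}}$ fixed, whereas in the actual analysis the required transmit power depends on the array size and the quantization resolution.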
Table II shows a summary of the specifications of the circuit blocks for the Cartesian and polar transmitter architectures. Note that the estimated power consumption from the proposed power model matches well with the reported power consumptions in the surveyed designs, or is at least comparable to key circuit blocks in [43], [44], [45], and [36].

V. NUMERICAL RESULTS
In this section, we first present the quantized polar transmitter-based massive MIMO system performance in terms of OOB emission levels, average sum-rate, and uncoded average SER. Then, we present the total power consumption results and a comparison between the polar transmitter and the conventional Cartesian transmitter architecture in the rural eMBB and dense urban eMBB use cases.
We focus on a massive MIMO-OFDM multi-user downlink transmission in which the BS is equipped with $N_t = 128$ half-wavelength spaced antennas. There are $K = 16$ single-antenna UEs which are mutually far apart (≥ 10 m mutual distance) and uniformly distributed within the respective cell with the predefined cell radius in [48]. The RMa-NLOS and UMa-LOS propagation channels are assumed, where the large-scale fading coefficients are calculated based on [39] and the small-scale fading coefficients are generated through cluster-based channel models (cf. (5)). We assume that the channel between the BS and each UE consists of $N_{\mathrm{cl}}$ clusters with uniformly distributed AoAs/AoDs in $[0, 2\pi)$ and a LOS path with the Ricean K-factor of 9 dB, if it exists. We also assume that all clusters have equal power. Moreover, each cluster consists of $N_{\mathrm{ray}}$ rays with Laplacian distributed AoAs/AoDs, and the maximum delay spread of 60 ns on each path. The detailed channel parameters are listed in Table III. Similar to [56], channel normalization (Normalization 1 in [56]) is utilized such that the imbalances of channel attenuations (e.g., path loss variations) between UEs are removed, while variations over antenna elements and frequencies remain. With this normalization, the channel vectors of each UE are normalized such that the average energy over all antenna ports and effective subcarriers is equal to one. As a result, equality between average per-UE received SNRs is obtained in all scenarios.
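The per-UE normalization described above can be sketched in Python as follows. The nested-list layout `h[k][m][l]` (UE, antenna, subcarrier) and the function name are assumptions made for this illustration.

```python
def normalize_per_ue(h):
    """Scale each UE's channel to unit average energy.

    h[k][m][l] is the complex gain of UE k, antenna m, subcarrier l.
    The average is taken over all antennas and subcarriers of one UE, so
    differences between UEs are removed while variations over antennas
    and frequency are kept.
    """
    normalized = []
    for ue in h:
        energy = sum(abs(c) ** 2 for row in ue for c in row)
        count = sum(len(row) for row in ue)
        if energy == 0.0:
            raise ValueError("cannot normalize an all-zero channel")
        scale = (count / energy) ** 0.5
        normalized.append([[c * scale for c in row] for row in ue])
    return normalized


# Two UEs with very different path loss end up with equal average energy.
h = [[[1 + 1j, 2 + 0j], [0.5j, 3 + 0j]],
     [[10 + 0j, 10j], [10 + 0j, 10 + 0j]]]
h_norm = normalize_per_ue(h)
```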
The assumed 100 MHz $M$-QAM OFDM baseband signal for 3.5 GHz carrier frequency is in line with 5G NR specifications [57]. There are $S = 1584$ effective subcarriers with a subcarrier spacing of 60 kHz and the normal CP length, and a total of $N = 5 \times 2048$ time-domain samples per main OFDM symbol, where the oversampling factor is $\kappa = 5$ and $N_{\mathrm{FFT}} = 2048$. The whole signal consists of 10 OFDM symbols, and time-domain windowing is adopted to improve the spectral containment of the OFDM signals.
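The numerology above can be cross-checked with a few lines of Python. The sketch only multiplies the stated parameters; the variable names are illustrative, and the cyclic prefix is not included in the per-symbol sample count.

```python
SUBCARRIER_SPACING_HZ = 60e3
ACTIVE_SUBCARRIERS = 1584
FFT_SIZE = 2048
OVERSAMPLING = 5

# Occupied bandwidth of the active subcarriers: 95.04 MHz of the 100 MHz channel.
occupied_bw_hz = ACTIVE_SUBCARRIERS * SUBCARRIER_SPACING_HZ

# Oversampled rate: 614.4 MS/s, consistent with the 614 MS/s assumed for the DAC.
sample_rate_hz = OVERSAMPLING * FFT_SIZE * SUBCARRIER_SPACING_HZ

# Samples per main OFDM symbol (without cyclic prefix): 10240.
samples_per_symbol = OVERSAMPLING * FFT_SIZE

print(occupied_bw_hz / 1e6, sample_rate_hz / 1e6, samples_per_symbol)
```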

A. Power Spectral Density
To demonstrate the OOB emission levels of the quantized polar transmitter, we plot in Fig. 2 the normalized power spectral density (PSD) of the transmitted signal (averaged over the BS antennas and over 100 channel realizations) and the normalized PSD of the received signal (averaged over the UEs and over 100 channel realizations) for the case of the RMa-NLOS scenario (the transmit spectrum is similar in the UMa-LOS scenario). We also plot the PSD of a traditional Cartesian transmitter with low-resolution DACs using ZF precoding and the SQUID-OFDM algorithm. In the simulations, the total transmit power is set equal in all cases. We set $A_{\mathrm{clip}} = 1$ and set the average clipping probability of the amplitude signals to $10^{-2}$. Then, under the assumption that the I/Q signal in each antenna is $z_m \sim \mathcal{CN}(0, P_{\mathrm{total}}/(\kappa N_t))$, $m = 1, \cdots, N_t$, the total transmit power is set to $P_{\mathrm{total}} = \kappa N_t / (-\ln(10^{-2}))$.
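The relation between the clipping probability and $P_{\mathrm{total}}$ follows from the Rayleigh-distributed amplitude of a circularly symmetric complex Gaussian sample, for which $\Pr(|z_m| > A_{\mathrm{clip}}) = \exp(-A_{\mathrm{clip}}^2/\sigma^2)$ with $\sigma^2 = P_{\mathrm{total}}/(\kappa N_t)$. The Python sketch below evaluates this and checks it with a small Monte Carlo run; the function names, the general `a_clip` argument, and the trial count are assumptions of the sketch.

```python
import math
import random


def total_power_for_clip_prob(clip_prob, kappa, num_antennas, a_clip=1.0):
    """P_total such that Pr(|z_m| > a_clip) equals clip_prob."""
    return kappa * num_antennas * a_clip ** 2 / (-math.log(clip_prob))


def clip_probability(p_total, kappa, num_antennas, a_clip=1.0):
    """Closed-form tail probability of the Rayleigh-distributed amplitude."""
    sigma2 = p_total / (kappa * num_antennas)
    return math.exp(-a_clip ** 2 / sigma2)


def simulate_clip_probability(p_total, kappa, num_antennas, a_clip=1.0,
                              trials=200_000, seed=0):
    """Monte Carlo estimate using independent real and imaginary parts."""
    rng = random.Random(seed)
    std = math.sqrt(p_total / (kappa * num_antennas) / 2.0)  # per real dimension
    hits = sum(1 for _ in range(trials)
               if math.hypot(rng.gauss(0.0, std), rng.gauss(0.0, std)) > a_clip)
    return hits / trials


p_total = total_power_for_clip_prob(1e-2, kappa=5, num_antennas=128)
print(clip_probability(p_total, 5, 128), simulate_clip_probability(p_total, 5, 128))
```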
We can see that the OOB emission levels of the quantized polar transmitter with 3-bit amplitude and 3-bit phase quantization are slightly higher as compared to the Cartesian one with 3-bit DACs.
Higher phase resolution (e.g., 6-bit) can be used to reduce the emission levels, since smaller phase errors will improve the reconstructed signal integrity. However, the emission levels will eventually saturate when increasing the number of amplitude and phase bits, due to the overload distortion (see Fig. 2a). We also note that the relative level of OOB emissions is lower at the UEs than at the BS, which is in line with recent findings reported in [18]. Lastly, we can observe that the OOB distortions with CE precoding (SQUID-OFDM) are a significant issue in practical systems as they may cause interference to users in adjacent frequency bands. This is due to CE precoding reducing the in-band power while the OOB power increases under the fixed average total transmit power at the BS. Further analyses of the effects of the number of served UEs, BS antennas and different channel models on the OOB emissions, as well as the spatial radiation pattern of the distortions, are out of the scope of this paper, but form an important future work item.

B. Average Sum-Rate
Next, we present results on the average sum-rate for the massive MIMO-OFDM systems. We compare the performance of the proposed quantized polar transmitter paired with ZF precoding to a Cartesian transmitter with low-resolution DACs, as proposed in [18], and to a CE precoding system employing the SQUID-OFDM algorithm [20]. For SQUID-OFDM, we use $l_P$ to denote the number of phase quantization bits, while the constant-amplitude signal satisfies the total transmit power constraint. Because the overload distortion is omitted in [18], we use the simulated results for the Cartesian transmitter rather than the analytical ones. In addition, analytical results with the second-order Taylor approximation method from [38] are also presented for comparison. The results in Fig. 3 are the averages over 100 channel realizations.
1) Perfect CSI: We can see from Fig. 3 that the proposed diagonal approximation with Bussgang decomposition attains very good accuracy for the case of $l_A = l_P \geq 4$ bits (see Fig. 3b and Fig. 3d) and $l_A \geq 2$, $l_P = 6$ bits (see Fig. 3a and Fig. 3c) for the average sum-rate performance of the polar transmitter. We note that the Taylor approximation is accurate for the case of higher phase resolution $l_P = 6$, but significantly overestimates the achievable rate for the small phase resolution case. This is due, in large part, to the quantization noise being temporally non-white, similarly to the case of the Cartesian transmitter with low-resolution DACs [18]. We can see that the average sum-rates of the unquantized polar transmitter can be approached by using very few amplitude bits and a moderate number of phase bits. For instance, the performance loss can be neglected in both scenarios when $l_A = 3$-$4$ amplitude bits and $l_P = 6$ phase bits are utilized at the SNR of 10 dB. On the contrary, there is a substantial average sum-rate loss for the SQUID-OFDM scheme. Thus, like other CE precoding schemes [14], [19], extra power is required to achieve the sum-capacity of the Gaussian broadcast channel under an average total power constraint, since a notable share of the transmitted power is lost into the channel null-space.
Furthermore, the SQUID-OFDM algorithm requires further parameter optimization to improve its robustness, see Fig. 3b and Fig. 3d.
Overall, we note that better average sum-rate performance can be attained with the digital polar transmitter with $l_P = 6$ and low amplitude resolution, i.e., 2-4 bits, compared to the Cartesian one with higher-resolution DACs. Moreover, the polar transmitter has a better average sum-rate performance compared to the Cartesian transmitter also with equal amplitude and phase resolutions. This is due to the properties of polar quantization with complex Gaussian input, which have been analyzed and demonstrated in [32].
2) Imperfect CSI: Next, we investigate the impact of imperfect CSI on the average sum-rate performance. Specifically, we consider a case where the BS acquires a noisy estimate $\mathbf{H}_{\mathrm{est}}[d]$ of the time-domain compound multi-user channel, which can be modelled as [18]

$$\mathbf{H}_{\mathrm{est}}[d] = \sqrt{1-\epsilon}\,\mathbf{H}[d] + \sqrt{\epsilon}\,\mathbf{E},$$

where $\epsilon \in [0, 1]$ is the uncertainty factor, and $\mathbf{E}$ is the channel estimation error, whose elements are i.i.d. zero-mean complex Gaussian random variables with a variance equal to the variance of the multi-path components of $\mathbf{H}[d]$. The case $\epsilon = 0$ corresponds to perfect CSI, while values between 0 and 1 indicate that only partial CSI is available. In a TDD system, the value of the channel uncertainty $\epsilon$ depends on the length of the pilot sequences transmitted during the uplink training phase. Then, the corresponding multi-user frequency-domain response at the $l$th subcarrier is obtained as the discrete Fourier transform of $\mathbf{H}_{\mathrm{est}}[d]$ over the delay taps $d$. In Fig. 4, we show the average sum-rate for the massive MIMO-OFDM system with different transmitter architectures in the two considered channel cases, and with the uncertainty factor $\epsilon$ having values of 0.1, 0.2, and 0.5. We note that the imperfect CSI does indeed deteriorate the average sum-rate similarly for each considered transmitter architecture, no matter the channel type. This is because the imperfect CSI only affects the ZF precoding, which is the same in the quantized polar and Cartesian transmitters, and the quantization occurs only after the precoding.
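A noisy channel estimate of this kind can be generated as in the Python sketch below. It assumes the commonly used form $\sqrt{1-\epsilon}\,\mathbf{H} + \sqrt{\epsilon}\,\mathbf{E}$ for a single delay tap, with the error variance passed in explicitly; the function name, the nested-list layout `h[user][antenna]`, and the `tap_variance` argument are assumptions of this illustration.

```python
import math
import random


def noisy_channel_estimate(h, eps, tap_variance, seed=None):
    """Return sqrt(1 - eps) * h + sqrt(eps) * e for one delay tap.

    h[k][m] is the complex tap of user k and antenna m. The entries of e
    are i.i.d. zero-mean complex Gaussian with variance tap_variance.
    """
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    rng = random.Random(seed)
    std = math.sqrt(tap_variance / 2.0)  # per real dimension
    keep, noise = math.sqrt(1.0 - eps), math.sqrt(eps)
    return [[keep * c + noise * complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
             for c in row] for row in h]


h = [[1 + 1j, 0.5 - 0.2j], [-0.3j, 0.8 + 0j]]
h_est = noisy_channel_estimate(h, eps=0.2, tap_variance=1.0, seed=1)
```

With `eps=0` the estimate equals the true channel, and with `eps=1` it carries no information about it, matching the two extremes of the uncertainty factor.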

C. Symbol Error-Rate Performance
In Fig. 5 and Fig. 6, we present the uncoded average SER performance of the different transmitter architectures with ZF precoding and different modulation schemes for the case of $l_A = 2, 3, 4$ and $l_P = 6$ in the RMa-NLOS and UMa-LOS scenarios, respectively. We can see that only a few amplitude bits are sufficient to approach the optimal SER performance of the unquantized one in both scenarios. For instance, only 3-4 amplitude bits are required in the polar transmitter to achieve an uncoded SER of $10^{-4}$ with a negligible performance loss for 64-QAM in RMa-NLOS and 16-QAM in UMa-LOS scenarios as compared to the infinite-resolution case. As expected, fewer amplitude bits are needed for the polar transmitter with 6-bit phase resolution to outperform the Cartesian one with more DAC bits, e.g., $l_A = 2$ while $l_{\mathrm{DAC}} = 3$. Moreover, the SER performance of the quantized polar transmitter is significantly improved by increasing the amplitude resolution from $l_A = 2$ to $l_A = 3$. This is because the polar quantization error of the low-amplitude input signals is dominated by the amplitude resolution and that of large-amplitude input signals is dominated by the phase resolution [32]. On the other hand, the performance of the SQUID-based scheme shows clear dependence on the channel characteristics and the modulation order, and can only be seen to have comparable performance in the UMa-LOS case with 16-QAM at high SNR. We also note that the superiority of the quantized polar transmitter is more significant for the high-order modulation schemes (e.g., 256-QAM and 64-QAM in RMa-NLOS and UMa-LOS, respectively), compared to the traditional Cartesian architecture and the SQUID-based CE precoding design with the same phase resolution. This is an important finding of high practical relevance, as the modulation orders supported in 5G NR downlink at sub-6 GHz have been recently extended to cover even 1024-QAM.
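The amplitude dependence of the polar quantization error can be illustrated with a short Python sketch. It assumes a uniform mid-rise quantizer on $[0, A_{\mathrm{clip}})$ for the amplitude and on $[0, 2\pi)$ for the phase, which may differ from the exact level placement used in the analysis; all function names are hypothetical. The key point is that the amplitude error is bounded by half an amplitude step regardless of the input, while a given phase error displaces the sample by a chord whose length grows linearly with the amplitude.

```python
import cmath
import math


def polar_quantize(z, amp_bits, phase_bits, a_clip=1.0):
    """Uniform mid-rise polar quantizer (assumed level placement)."""
    amp_step = a_clip / 2 ** amp_bits
    phase_step = 2.0 * math.pi / 2 ** phase_bits
    # Amplitudes at or above a_clip fall into the top cell (overload).
    amp_index = min(int(abs(z) / amp_step), 2 ** amp_bits - 1)
    phase = cmath.phase(z) % (2.0 * math.pi)
    phase_index = int(phase / phase_step) % 2 ** phase_bits
    return cmath.rect((amp_index + 0.5) * amp_step,
                      (phase_index + 0.5) * phase_step)


def max_amplitude_error(amp_bits, a_clip=1.0):
    """Half an amplitude step, independent of the input amplitude."""
    return 0.5 * a_clip / 2 ** amp_bits


def max_phase_induced_error(amplitude, phase_bits):
    """Chord length for the worst-case phase error of half a phase step."""
    return 2.0 * amplitude * math.sin(math.pi / 2 ** (phase_bits + 1))


for a in (0.05, 0.9):
    print(a, max_amplitude_error(3), max_phase_induced_error(a, 3))
```

With 3 amplitude and 3 phase bits, the amplitude bound exceeds the phase-induced bound for the small input (0.05) and the order is reversed for the large input (0.9).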

D. Power Consumption Comparison
In this subsection, the required total transmit power $P_{\mathrm{total}}$ to reach the targeted SE and the total power consumption $P_{\mathrm{All}}$ for the two transmitter architectures operating in the rural and dense urban eMBB use cases are presented. We assume the RMa-NLOS and UMa-LOS propagation environments for the rural eMBB and the dense urban eMBB use cases, respectively [48]. From Fig. 7, we can see that increasing the array size is an effective way to reduce the transmit power requirement in both use cases, since it helps to improve both signal gain and interference control by forming narrower beams. We note that the required total transmit power of the quantized polar transmitter with $l_A = 3$ and $l_P = 6$ is almost the same as that of the Cartesian transmitter with $l_{\mathrm{DAC}} \geq 4$ in the rural eMBB use case. However, the polar transmitter with fewer amplitude and phase bits requires more power to meet the target SE for the case of $l_A = l_P = l_{\mathrm{DAC}}$. In addition, it is not feasible to estimate $P_{\mathrm{adjust}}$ for the polar transmitter with 2-bit amplitude and phase resolutions in the dense urban use case, since the term $\bar{G}^2 - (\bar{I}^2 + \bar{D}^2)\,\mathrm{SINR}_{\mathrm{req}} < 0$ in (38). This is because the SINDR is limited by the distortion caused by coarse quantization. For a fair comparison, we consider the total required transmit power with $l_A = l_P = l_{\mathrm{DAC}} = 6$ for the total power consumption comparison between the two transmitter architectures.

The total power consumption $P_{\mathrm{All}}$ in the rural eMBB use case is presented in Fig. 8a, while also pushing the antenna count up to 1024. We can see that the quantized polar transmitter architecture is highly power efficient and saves up to 39.6% in power consumption compared to the conventional Cartesian architecture in the 64-antenna setting. The difference is reduced when increasing the array size, because the power consumption per PA, which is the main factor in the power efficiency gain of the polar architecture, is reduced. Thus, since the individual circuit components are assumed to have a constant power consumption in our analysis, the corresponding total circuit power consumption, $P_{\mathrm{cir}}$, becomes more dominant and eventually the bottleneck in terms of the total power consumption. Specifically, with 1024 antennas, the consumption of the PAs is already becoming minor compared to the total circuit power consumption. Therefore, to make large arrays feasible and practical, the circuit power consumption must be scaled down as the array size is increased.
The power-saving superiority of the quantized polar transmitter is also demonstrated in Fig. 8b for the dense urban use case. In this use case, the polar transmitter architecture is still highly power efficient, with up to 36.4% power saving compared to the Cartesian one. Similarly to the rural eMBB case, as the number of BS antennas increases, the output power per PA becomes smaller and the power consumption of the other circuit blocks gradually starts to dominate the total power consumption. Interestingly, the optimum array sizes that minimize the total power consumption are different in this case: 256 for the Cartesian and 128 for the quantized polar transmitter.
Overall, we can conclude that a power-efficient system requires large array sizes in both use cases. However, there is a clear limit beyond which increasing the array size is no longer directly useful. Similar to the analysis in [46], the exact turning points of the total power consumption for different transmitter architectures depend on the use case, the assumed circuit power consumption model, and the targeted SE.

VI. CONCLUSION
The quantized polar transmitter for a multi-user downlink massive MIMO-OFDM system was presented and analyzed. We derived a lower bound on the achievable sum-rate based on Bussgang decomposition. To express and evaluate this lower bound, we presented a diagonal approximation for the covariance of the polar quantization distortion. In addition, we presented two realistic circuit power consumption models for the quantized polar and Cartesian transmitters, and analyzed the total transmit power by considering two 5G eMBB use cases. The quantized polar transmitter in a massive MIMO system was shown to outperform the traditional Cartesian-based system in terms of sum-rate, SER, and OOB emissions. Moreover, the power consumption comparison demonstrated the superior power efficiency of the quantized polar transmitter architecture compared to the Cartesian transmitter in two representative 5G eMBB use cases. Altogether, these results show that the requirements for the array size, transmit power, and hardware specifications of the RF chains can be relaxed compared to the traditional Cartesian transmitter architecture. Additionally, the results show that the optimum array size varies depending on the use case, and that the optimum array size for a given use case can be different for the different transmitter architectures. The tools and analysis methods provided in the article allow for assessing such an optimum array size for any given deployment scenario.

Fig. 1. The block diagram of the considered massive MIMO-OFDM multi-user downlink transmission with linear precoding and the quantized polar transmitter at the BS.
where $\mathbf{B} = \mathrm{diag}(b_1, \ldots, b_{N_t}) \otimes \mathbf{I}_N \in \mathbb{R}^{N_t N \times N_t N}$ is the Bussgang gain matrix, and the quantization error $\mathbf{d} = \mathrm{vec}([\mathbf{d}_0, \mathbf{d}_1, \cdots, \mathbf{d}_{N-1}]) \in \mathbb{C}^{N_t N}$ is uncorrelated with the discrete-time precoded vector

C. Computation of $\mathbf{C}_{\mathbf{d}}^{\mathrm{diag}}$
Let now $z_m = [\mathbf{z}_n]_m$ be the $m$-th element of the input signal $\mathbf{z}_n$, with magnitude $a_m = [\mathbf{A}_n]_m$ and phase angle $\psi_m = [\boldsymbol{\Psi}_n]_m$. Then, its probability density function in the polar coordinates can be expressed as

Fig. 5. Average uncoded SER performance for different transmit architectures operating in the RMa-NLOS scenario with ZF and different modulation schemes versus different amplitude and phase resolutions; $l_P = 6$, 1% clipping probability, $N_t = 128$ and $K = 16$; perfect CSI.

Fig. 6. Average uncoded SER performance for different transmit architectures operating in the UMa-LOS scenario with ZF and different modulation schemes versus different amplitude and phase resolutions; $l_P = 6$, 1% clipping probability, $N_t = 128$ and $K = 16$; perfect CSI.

Fig. 7. The total transmit power $P_{\mathrm{total}}$ with different numbers of transmit antennas versus the amplitude and phase resolutions to reach the target SE in the rural eMBB and dense urban eMBB use cases; ZF with Gaussian signaling and $K = 16$; perfect CSI.

Fig. 8. Total power consumption $P_{\mathrm{All}}$ for the traditional Cartesian and the quantized polar transmitter architectures in the rural and dense urban eMBB use cases; ZF with Gaussian signaling, $N_t \in \{64, 128, 256, 512, 1024\}$ and $K = 16$.

TABLE I. SUMMARY OF THE STATE-OF-THE-ART ONE-BIT/CE PRECODING APPROACHES FOR DOWNLINK MIMO-OFDM

TABLE II. SUMMARY OF CIRCUIT BLOCKS IN THE CARTESIAN AND THE QUANTIZED POLAR TRANSMITTER ARCHITECTURES