Digital Post-Distortion for Multi-Layer MIMO-OFDM

In this letter, we propose a new digital post-distortion (DPoD) technique for precoded multi-layer orthogonal frequency division multiplexing (OFDM) systems, in order to enhance the received signal error vector magnitude (EVM) under heavily nonlinear power amplifiers (PAs) in the transmitter. The proposed multi-layer digital post-inverse (ML-DPoI) builds on layer- or stream-level nonlinear distortion modeling and corresponding signal processing at the receiver, combined with true reference signal based model parameter estimation, which allows operation without any prior knowledge of the transmit precoder or PA behavior. Numerical evaluations with dual-layer rank-2 transmission in 5G New Radio (NR) uplink context with standard-compatible physical-layer transmit waveform show that despite heavily nonlinear PA units, the received signal EVM can be reliably enhanced, even in the case of very high modulation order such as 256-QAM per transmit stream. These results pave the way towards improved power-efficiency and sustainability in future networks, especially at millimeter-wave bands, where transmitter power-efficiency is known to be a major implementation challenge.

Huseyin Babaroglu, Lauri Anttila, Guixian Xu, and Mikko Valkama, Fellow, IEEE

Index Terms: 5G, 6G, digital post-distortion, EVM, DMRS, MIMO-OFDM, multi-layer precoding, nonlinear distortion, physical-layer, power amplifier, power-efficiency, RF impairments.

I. INTRODUCTION
The high peak-to-average power ratio (PAPR) of orthogonal frequency division multiplexing (OFDM) waveforms creates a fundamental tradeoff between the transmitter (TX) power-efficiency and the antenna signal quality when operating under practical nonlinear power amplifiers (PAs) [1]. Digital predistortion (DPD) methods are the most established approach to suppress PA nonlinear distortion, especially in cellular base-stations (BSs). However, applying DPD in emerging millimeter-wave (mmWave) networks with increasing bandwidths, large numbers of PA units, and elaborate hybrid beamforming methods is known to be challenging [2]. Additionally, in cellular uplink, the processing capabilities of user equipment (UE), and hence the prospects of using DPD, are much more limited compared to base-stations.
Digital post-distortion (DPoD) [3], [4] is an alternative, receiver-based method to suppress transmitter PA distortion, with primary emphasis on inband signal quality and improving the corresponding error vector magnitude (EVM). This is also the main technical scope of this letter, with specific focus on cellular uplink and challenging precoded multi-layer transmission with high modulation order per transmit stream.
Different means for improved energy-efficiency, including DPoD, are also currently considered in 3GPP standardization for the evolution of 5G New Radio (NR) [5], while a notable energy-efficiency leap and the overall sustainability are among the key targets towards IMT-2030 and 6G [6].
The seminal model-based works in [3], [4] introduce the so-called PA nonlinearity cancellation (PANC) and reconstruction of distorted signals (RODS) methods, respectively, in an ordinary single-antenna or single-stream system context. Further single-stream refinements are provided, e.g., in [7], [8], while [9] develops a machine learning (ML) based physical-layer receiver for single-stream OFDM transmission showing robustness against PA distortion. Additionally, different multi-antenna or MIMO variants have been considered in [10], [11], [12], [13], [14], [15], [16], [17]. In [10], [12], transmission of multiple parallel streams is assumed; however, channel-aware transmit precoding is not considered and the TX nonlinearity is assumed known. In [11], in turn, rank-1 space-time coded transmission is considered while again assuming a known TX nonlinearity at the receiver. In [13], a multi-user MIMO case is considered with rank-1 transmission per UE, hence neglecting nonlinear mixing or interference between the streams. The work in [14] considers again rank-1 transmission and proposes new pilot or reference signal structures for nonlinear TX model estimation. In [15], a space-frequency transmit diversity system is considered under the assumption of known channel and TX nonlinearity. In [16], an iterative receiver scheme is devised for multi-layer transmission, assuming a known TX nonlinearity. The work in [17] considers a hybrid beamforming based MIMO system and devises an iterative particle-filter based receiver solution building again on the assumption of a known TX nonlinearity. Importantly, in all of the above works, the fundamental aspect of TX nonlinearity model estimation at the receiver is neglected, while many are also applicable only in rank-1 transmission scenarios, miss the channel-aware precoding aspects, or call for iterative processing with notable complexity.

Hence, in this letter, we bridge these important gaps and focus on precoded multi-layer transmissions beyond rank-1, and provide a direct one-shot DPoD solution referred to as the multi-layer digital post-inverse (ML-DPoI), while also explicitly considering the TX nonlinearity model estimation using the available reference signals. First, multi-dimensional baseband nonlinearity modeling is provided, describing how a PA nonlinearity with precoded signals manifests at stream level. Inspired by the modeling, the proposed one-shot ML-DPoI scheme is then devised, complemented with a demodulation reference signal (DMRS) based parameter estimation scheme. Finally, comprehensive numerical results are provided in the context of 5G NR dual-layer rank-2 uplink transmission at the 28 GHz mmWave band with standard-compliant uplink waveform and DMRS structure. The obtained results show that despite heavily nonlinear PA units at the UE, the received signal EVM can be reliably enhanced through the proposed DPoD scheme even in the very challenging case of 256-QAM data modulation.
2162-2345 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
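II. SYSTEM MODEL
We consider an $N_\mathrm{TX} \times N_\mathrm{RX}$ MIMO-OFDM transmission system with $N_\mathrm{L}$ parallel streams, where $N_\mathrm{TX}$ and $N_\mathrm{RX}$ denote the numbers of transmit and receive antennas, respectively. The transmit symbol of stream $i$ at subcarrier $k$ is denoted by $a_k^{(i)}$, while $N_\mathrm{act}$ denotes the number of active subcarriers. The data symbols of the transmission layers are mapped to transmit antennas via precoding, expressed in vector-matrix format as

$$\tilde{\mathbf{a}}_k = \mathbf{V}\mathbf{a}_k, \tag{1}$$

where $\mathbf{a}_k = [a_k^{(1)}, \ldots, a_k^{(N_\mathrm{L})}]^T$ stacks the layer symbols, $\mathbf{V}$ is the $N_\mathrm{TX} \times N_\mathrm{L}$ precoding matrix, and $\tilde{\mathbf{a}}_k = [\tilde{a}_k^{(1)}, \ldots, \tilde{a}_k^{(N_\mathrm{TX})}]^T$ is the $N_\mathrm{TX} \times 1$ precoded antenna signal vector. We note that frequency-independent precoding is deliberately considered, as such an approach is the baseline method in 5G NR uplink [18].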
While the precoding is commonly implemented in frequency-domain, as shown in (1), we next express the equivalent time-domain model, which then allows for including PA nonlinear distortion modeling through well-established models such as the memory polynomial (MP) [19]. To this end, the time-domain equivalent of (1) can be directly expressed as

$$x^{(j)}(n) = \sum_{i=1}^{N_\mathrm{L}} v_{ji}\, z^{(i)}(n), \quad j = 1, \ldots, N_\mathrm{TX}, \tag{2}$$

where $z^{(i)}(n)$ and $x^{(j)}(n)$ denote the discrete-time OFDM waveform samples corresponding to $a_k^{(i)}$ and $\tilde{a}_k^{(j)}$, respectively, while $v_{ji}$ denote the entries of the precoding matrix $\mathbf{V}$. To accommodate accurate PA modeling, we assume that the time-domain model sample rate is $f_s = \varepsilon N_\mathrm{FFT} \Delta f$, where $\varepsilon$ is the oversampling factor, $N_\mathrm{FFT}$ is the fundamental OFDM fast Fourier transform (FFT) size, and $\Delta f$ denotes the subcarrier spacing (SCS). Thus, assuming further that the cyclic prefix (CP) length is $N_\mathrm{CP}$, $z^{(i)}(n)$ can be expressed as

$$z^{(i)}(n) = \frac{1}{\sqrt{N}} \sum_{k \in \mathcal{K}} a_k^{(i)}\, e^{j 2\pi k n / N}, \quad n = -N_\mathrm{CP}, \ldots, N-1, \tag{3}$$

where $\mathcal{K}$ is the set of the $N_\mathrm{act}$ active subcarrier indices and $N = \varepsilon N_\mathrm{FFT}$, while the corresponding samples of $x^{(j)}(n)$ follow through (2).
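The equivalence between frequency-domain precoding in (1) and time-domain precoding in (2) follows from the linearity of the IFFT, and can be checked numerically. The following is a minimal sketch, not the authors' simulator: the helper `ofdm_modulate`, the centered subcarrier mapping, the unitary-style IFFT scaling, and all parameter values are illustrative assumptions.

```python
import numpy as np

def ofdm_modulate(a, n_fft, eps, n_cp):
    """One oversampled CP-OFDM symbol from the active-subcarrier symbols a.

    n_cp is the CP length in samples at the oversampled rate (assumption).
    """
    n = eps * n_fft                          # IFFT size N = eps * N_FFT
    k = np.arange(len(a)) - len(a) // 2      # subcarriers centered around DC (assumption)
    grid = np.zeros(n, dtype=complex)
    grid[k % n] = a                          # negative indices wrap to the upper bins
    z = np.fft.ifft(grid) * np.sqrt(n)       # net 1/sqrt(N) scaling (assumption)
    return np.concatenate([z[-n_cp:], z])    # prepend the cyclic prefix

rng = np.random.default_rng(0)
n_layers, n_tx, n_act = 2, 4, 48             # illustrative sizes only
V = (rng.standard_normal((n_tx, n_layers))
     + 1j * rng.standard_normal((n_tx, n_layers))) / 2
A = rng.choice([-1, 1], (n_layers, n_act)) + 1j * rng.choice([-1, 1], (n_layers, n_act))

# Layer waveforms z^(i)(n), followed by time-domain precoding as in (2)
Z = np.stack([ofdm_modulate(A[i], 64, 4, 16) for i in range(n_layers)])
X_time = V @ Z

# Frequency-domain precoding as in (1), followed by per-antenna OFDM modulation
B = V @ A
X_freq = np.stack([ofdm_modulate(B[j], 64, 4, 16) for j in range(n_tx)])
```

Both routes give the same antenna waveforms because the precoder is frequency-independent; with subcarrier-dependent precoding, the simple per-sample mixing in (2) would no longer hold.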
To now explicitly account for the TX nonlinearities, we employ the widely-used memory polynomial (MP) modeling approach [19]. It is further assumed that each transmit antenna branch has its own individual nonlinear PA unit. The corresponding nonlinearly distorted transmit waveform on the $j$-th antenna branch, $y^{(j)}(n)$, can now be expressed as

$$y^{(j)}(n) = \sum_{\substack{p=1 \\ p\ \mathrm{odd}}}^{P_\mathrm{TX}} \sum_{d=0}^{D_\mathrm{TX}} c_{p,d}^{(j)}\, x^{(j)}(n-d)\, \big|x^{(j)}(n-d)\big|^{p-1}, \tag{4}$$

where $c_{p,d}^{(j)}$ is the MP complex coefficient of order $p$ and delay $d$, while $P_\mathrm{TX}$ and $D_\mathrm{TX}$ denote the maximum nonlinearity order and memory depth, respectively. Importantly, because of the precoding in (2) and assuming nontrivial precoders with $v_{ji} \neq 0\ \forall i, j$, an individual PA input comprises linear combinations of the individual streams or layers, and thus the nonlinear distortion model in (4) causes a complicated nonlinear distortion profile at the PA output from the data layers' point of view. This is further elaborated in Section III, forming also the basis of the proposed DPoD method.
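As an illustration of the memory polynomial structure, the sketch below evaluates a sum of odd-order terms $c_{p,d}\, x(n-d)|x(n-d)|^{p-1}$ for one antenna branch. The function name, the dictionary-based coefficient layout, and the zero initial state for delayed samples are assumptions made for this example.

```python
import numpy as np

def memory_polynomial(x, coeffs):
    """Memory polynomial output for input samples x.

    coeffs maps (p, d) -> complex coefficient c_{p,d}, with odd order p
    and delay d >= 0. Samples before the start of x are taken as zero.
    """
    x = np.asarray(x, dtype=complex)
    y = np.zeros(len(x), dtype=complex)
    for (p, d), c in coeffs.items():
        # Delayed copy x(n - d) with zero initial state
        xd = np.concatenate([np.zeros(d, dtype=complex), x[:len(x) - d]])
        y += c * xd * np.abs(xd) ** (p - 1)
    return y

# Hypothetical coefficients: unit linear gain plus mild third-order compression
example = memory_polynomial([1.0, 2.0], {(1, 0): 1.0, (3, 0): -0.1})
```

With these hypothetical coefficients, the larger input sample is compressed more strongly than the smaller one, which is the basic mechanism behind the EVM degradation discussed in this letter.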
Next, the distorted antenna waveforms propagate through a noisy MIMO multipath channel. We assume that the receiver basic sample rate is $f_\mathrm{s,RX} = N_\mathrm{FFT} \Delta f$, and that the receiver front-end contains applicable channel selection filtering. The corresponding channel-filtered received signal is transformed at the receiver into frequency domain via an FFT of size $N_\mathrm{FFT}$. The resulting frequency-domain samples at subcarrier $k$ across the RX antennas can be expressed in vector-matrix notation as

$$\mathbf{R}(k) = \mathbf{H}(k)\,\mathbf{Y}_\mathrm{filt}(k) + \mathbf{W}_\mathrm{filt}(k), \tag{5}$$

where $\mathbf{H}(k)$ is the $N_\mathrm{RX} \times N_\mathrm{TX}$ MIMO channel frequency response matrix at subcarrier $k$. Furthermore, $\mathbf{Y}_\mathrm{filt}(k)$ stacks the FFT values of the effective channel-filtered transmit antenna waveforms at subcarrier $k$, while $\mathbf{W}_\mathrm{filt}(k)$ refers to the corresponding FFT values of the channel-filtered thermal noise at subcarrier $k$.
Finally, we assume that an ordinary linear minimum mean-squared error (LMMSE) equalizer is employed to mitigate the MIMO channel linear distortion. We further assume that the uplink reference signal transmissions are also precoded [18], such that the RX can directly estimate the effective precoded linear channel without TX precoder knowledge. We express the effective precoded MIMO channel as $\mathbf{H}_\mathrm{eff}(k) = \mathbf{H}_\mathrm{cs}(k)\mathbf{H}(k)\mathbf{V}$, where the diagonal matrix $\mathbf{H}_\mathrm{cs}(k)$ accounts for the frequency responses of the channel selection filters at the RX branches. Denoting further the corresponding equalizer of size $N_\mathrm{L} \times N_\mathrm{RX}$ by $\mathbf{H}_\mathrm{eq}(k)$, the equalizer output signal reads

$$\mathbf{Q}(k) = \mathbf{H}_\mathrm{eq}(k)\,\mathbf{H}(k)\,\mathbf{Y}_\mathrm{filt}(k) + \mathbf{H}_\mathrm{eq}(k)\,\mathbf{W}_\mathrm{filt}(k). \tag{6}$$

This equalizer output serves as the input to the proposed DPoD processing, described next. Furthermore, the considered system model is graphically illustrated in Fig. 1.
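The letter does not spell out the LMMSE equalizer expression. A common textbook form, assuming unit-power uncorrelated layer symbols and spatially white noise of variance `noise_var`, is sketched below for a single subcarrier; the function name and these assumptions are ours.

```python
import numpy as np

def lmmse_equalizer(h_eff, noise_var):
    """Textbook LMMSE equalizer for one subcarrier.

    h_eff: (N_RX, N_L) effective precoded channel estimate.
    Returns the (N_L, N_RX) matrix (H^H H + noise_var * I)^{-1} H^H,
    assuming unit-power layers and white noise (assumption).
    """
    n_l = h_eff.shape[1]
    gram = h_eff.conj().T @ h_eff + noise_var * np.eye(n_l)
    return np.linalg.solve(gram, h_eff.conj().T)

rng = np.random.default_rng(3)
h = rng.standard_normal((8, 2)) + 1j * rng.standard_normal((8, 2))  # hypothetical channel
h_eq = lmmse_equalizer(h, 0.0)   # noise-free case reduces to the pseudo-inverse
```

Because the equalizer is computed from the effective channel estimated on precoded reference signals, it separates the layers without the receiver ever needing the precoder itself.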

A. Multi-Input Baseband Nonlinearity Modeling
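We start by explicitly addressing the nonlinear system model at layer or stream level, building on the antenna waveform model in (4). By substituting the precoded signal expression in (2) into (4), we can first formally write

$$y^{(j)}(n) = \sum_{\substack{p=1 \\ p\ \mathrm{odd}}}^{P_\mathrm{TX}} \sum_{d=0}^{D_\mathrm{TX}} c_{p,d}^{(j)} \Big(\sum_{i=1}^{N_\mathrm{L}} v_{ji}\, z^{(i)}(n-d)\Big) \Big|\sum_{i=1}^{N_\mathrm{L}} v_{ji}\, z^{(i)}(n-d)\Big|^{p-1}. \tag{7}$$

Next, for notational convenience, let us denote $z^{(i)}(n)$ as $z_i$ while momentarily omitting also the antenna branch index $j$. Then, (7) can be rewritten for any given antenna index $j$ as

$$y(n) = \sum_{m=1}^{N_\mathrm{TX}^\mathrm{BF}} \sum_{d=0}^{D_\mathrm{TX}} \bar{c}_{m,d}\, \Upsilon_{m,d}(n), \tag{8}$$

where the basis functions $\Upsilon_{m,d}(n)$ and the effective coefficients $\bar{c}_{m,d}$ are expressed at layer level. The corresponding vector-matrix model for any given OFDM symbol with $N$ samples (excluding CP) can be expressed as

$$\mathbf{y} = \mathbf{\Upsilon}\,\mathbf{c}, \tag{9}$$

where $\mathbf{y}$ is of size $N \times 1$, $\mathbf{\Upsilon}$ is the $N \times N_\mathrm{TX}^\mathrm{coef}$ basis function matrix, and $\mathbf{c}$ is the $N_\mathrm{TX}^\mathrm{coef} \times 1$ coefficient vector, with $N_\mathrm{TX}^\mathrm{coef} = N_\mathrm{TX}^\mathrm{BF}(D_\mathrm{TX} + 1)$ denoting the total number of coefficients.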
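The layer-level model can be verified numerically in a small case. The sketch below takes a memoryless third-order PA on one antenna with two precoded layers, and checks that the PA output is an exact linear combination of two linear and six third-order layer-level basis functions. The precoder weights, PA coefficients, and signal statistics are hypothetical values chosen only for the check.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256
z1, z2 = (rng.standard_normal((2, n)) + 1j * rng.standard_normal((2, n))) / np.sqrt(2)
v1, v2 = 0.6 + 0.2j, -0.3 + 0.5j       # hypothetical precoder weights of one antenna
c1, c3 = 1.0 + 0.1j, -0.05 + 0.02j     # hypothetical memoryless PA coefficients

x = v1 * z1 + v2 * z2                   # PA input: linear mix of the two layers
y = c1 * x + c3 * x * np.abs(x) ** 2    # PA output with order 3 and no memory

# Layer-level basis functions: 2 linear terms and 6 third-order terms
basis = np.column_stack([
    z1, z2,
    z1 * np.abs(z1) ** 2, z1 * np.abs(z2) ** 2, z1 ** 2 * np.conj(z2),
    z2 * np.abs(z1) ** 2, z2 * np.abs(z2) ** 2, np.conj(z1) * z2 ** 2,
])
c_eff, *_ = np.linalg.lstsq(basis, y, rcond=None)
residual = np.linalg.norm(basis @ c_eff - y) / np.linalg.norm(y)
```

The fitted effective coefficients lump the PA coefficients and precoder weights together; for example, expanding $x|x|^2$ by hand gives $2 c_3 v_1 |v_2|^2$ for the term $z_1|z_2|^2$, which the fit reproduces.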
The exact expressions for the nonlinear basis function samples $\Upsilon_{m,d}(n)$ can in principle be derived by brute force, by working out the substitution in (7) for a given nonlinearity order $P_\mathrm{TX}$. To address this more elegantly for an arbitrary nonlinearity order, we next introduce a recursive approach to obtain the entries of the basis function matrix $\mathbf{\Upsilon}$. To this end, let us denote the set of instantaneous basis functions that corresponds to a nonlinearity order of $p$ as $\Upsilon^{(p)}$, for any arbitrary sample instant $n$. Now, stemming from the nature of the assumed memory polynomial PA model, a recursive relation can be established between orders $p$ and $p-2$, expressed as

$$\bar{\Upsilon}^{(p)} = \Upsilon^{(p-2)} \otimes G, \tag{10}$$

where $\otimes$ denotes the Kronecker product, $G$ is a generator vector, while the fundamental linear basis function set is defined as $\Upsilon^{(1)} = [z_1(n), z_2(n), \ldots, z_{N_\mathrm{L}}(n)]^T$. The entries of the generator vector $G(n)$ contain all pairwise products of the involved streams, with the second element always complex-conjugated. Furthermore, and importantly, the recursion in (10) always provides the correct basis function terms but also includes some duplicates when directly implemented. Thus, the actual basis function set $\Upsilon^{(p)}$ is obtained from $\bar{\Upsilon}^{(p)}$ in (10) by simply removing the repeated terms. A concrete example for $N_\mathrm{L} = 2$ is given in the Appendix.
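The recursion and the duplicate removal can be sketched symbolically by storing each basis function as a pair of exponent tuples, one for the powers of $z_i$ and one for the powers of $z_i^*$. This representation and the function names are our own illustrative choices; the ordering of the generator entries here also differs from the Appendix, which does not affect the resulting set.

```python
from itertools import product

def linear_set(n_layers):
    """Upsilon^(1): one term z_i per layer, as (powers of z, powers of conj(z))."""
    zero = (0,) * n_layers
    return [(tuple(int(i == j) for j in range(n_layers)), zero)
            for i in range(n_layers)]

def generator(n_layers):
    """G: all pairwise products z_a * conj(z_b)."""
    unit = [tuple(int(i == j) for j in range(n_layers)) for i in range(n_layers)]
    return [(unit[a], unit[b]) for a in range(n_layers) for b in range(n_layers)]

def next_order(prev, gen):
    """Kronecker-style product of prev and gen, keeping each monomial once."""
    out = []
    for (pa, pb), (ga, gb) in product(prev, gen):
        term = (tuple(s + t for s, t in zip(pa, ga)),
                tuple(s + t for s, t in zip(pb, gb)))
        if term not in out:          # drop repeated terms, preserve first occurrence
            out.append(term)
    return out

def basis_sets(n_layers, max_order):
    """Dictionary mapping each odd order p to its list of unique basis functions."""
    sets = {1: linear_set(n_layers)}
    gen = generator(n_layers)
    for p in range(3, max_order + 1, 2):
        sets[p] = next_order(sets[p - 2], gen)
    return sets

sets = basis_sets(2, 5)
```

For two layers, the entry `((2, 0), (0, 1))` in `sets[3]` stands for $z_1^2 z_2^*$, and the third-order set contains six unique terms, in line with the Appendix example.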

B. Multi-Layer Digital Post-Inverse (ML-DPoI)
Building on the previous modeling, we next describe the actual proposed multi-layer DPoD processing operating at the RX. First, the LMMSE equalizer output in (6) is transformed back to time-domain for more efficient calculation of nonlinear transformations. This is done via an $N = \varepsilon N_\mathrm{FFT}$-point IFFT, and yields a signal $q^{(i)}(n)$ for the $i$-th layer, expressed as

$$q^{(i)}(n) = \tilde{y}^{(i)}(n) + w^{(i)}(n), \tag{11}$$

where $\tilde{y}^{(i)}(n)$ and $w^{(i)}(n)$ denote the upsampled time-domain sequences of the first and the second terms in (6). Now, the actual multi-layer DPoD processing, referred to as the multi-layer digital post-inverse (ML-DPoI), is based on a processing engine similar to the transmitter nonlinearity modeling in (8), in which the inputs are the received and equalized time-domain layers $q^{(i)}(n)$ in (11). That is, the basis functions up to order $P_\mathrm{RX}$ are calculated using the time-domain layers $q^{(i)}(n)$, while because of seeking inverse nonlinearities, we denote the corresponding basis functions with $\Upsilon^\mathrm{inv}$ in the following. Thus, the ML-DPoI output for layer $i$ is defined as

$$\hat{z}^{(i)}(n) = \sum_{m=1}^{N_\mathrm{RX}^\mathrm{BF}} \sum_{d=0}^{D_\mathrm{RX}} \theta_{m,d}^{(i)}\, \Upsilon_{m,d}^\mathrm{inv}(n), \tag{12}$$

where $\theta_{m,d}^{(i)}$ denote the ML-DPoI processing coefficients. Similar to (9), and to support efficient parameter estimation, it is convenient to represent (12) in vector-matrix notation. Accounting for all involved $N_\mathrm{L}$ parallel layers, we express this as

$$\hat{\mathbf{Z}} = \mathbf{\Upsilon}_\mathrm{inv}\,\boldsymbol{\theta}, \tag{13}$$

with one column of $\hat{\mathbf{Z}}$ per layer.

C. DMRS-Based Parameter Estimation
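Next, we address the DMRS-based estimation of the inverse model parameters $\boldsymbol{\theta}$. This is important since in real networks, the PA model coefficients are unknown at the RX. Additionally, when DMRS transmission is precoded similar to data symbols [18], the receiver does not know the applied precoding weights either. However, the precoded DMRS symbols form a direct basis for effective linear channel estimation as well as estimation of the ML-DPoI model parameters in (13). The linear channel estimation is a standard procedure, while the parameter estimation for the proposed ML-DPoI is one technical contribution of this letter. To this end, by using the known DMRS sequences, $z_\mathrm{DMRS}^{(i)}(n)$, the model in (13) takes the form

$$\mathbf{Z}_\mathrm{DMRS} = \mathbf{\Upsilon}_\mathrm{inv,DMRS}\,\boldsymbol{\theta}, \tag{14}$$

where $\mathbf{Z}_\mathrm{DMRS}$ is the $N \times N_\mathrm{L}$ matrix of the known time-domain DMRS samples, $\mathbf{\Upsilon}_\mathrm{inv,DMRS}$ is the $N \times N_\mathrm{RX}^\mathrm{coef}$ basis function matrix during DMRS reception, and $\boldsymbol{\theta}$ is the $N_\mathrm{RX}^\mathrm{coef} \times N_\mathrm{L}$ matrix of inverse model coefficients. Then, the unknown coefficient matrix can be estimated with least-squares (LS) as

$$\hat{\boldsymbol{\theta}} = \big(\mathbf{\Upsilon}_\mathrm{inv,DMRS}^H \mathbf{\Upsilon}_\mathrm{inv,DMRS}\big)^{-1} \mathbf{\Upsilon}_\mathrm{inv,DMRS}^H\, \mathbf{Z}_\mathrm{DMRS}. \tag{15}$$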
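The post-inverse processing and its least-squares parameter estimation can be sketched end to end on a toy link. This is only an illustration under strong simplifying assumptions: two layers, inverse order 3, a hypothetical unitary precoder and memoryless cubic PA, an ideal channel with perfect equalization, and no noise. The function names and `toy_link` are ours and are not taken from the letter.

```python
import numpy as np

def memoryless_basis(q1, q2):
    """Instantaneous basis functions for two layers and order 3 (2 linear + 6 cubic)."""
    return [q1, q2,
            q1 * np.abs(q1) ** 2, q1 * np.abs(q2) ** 2, q1 ** 2 * np.conj(q2),
            q2 * np.abs(q1) ** 2, q2 * np.abs(q2) ** 2, np.conj(q1) * q2 ** 2]

def basis_matrix(q, mem_depth=0):
    """Basis function matrix of size N x (N_BF * (mem_depth + 1))."""
    cols = []
    for f in memoryless_basis(q[0], q[1]):
        for d in range(mem_depth + 1):
            # Delayed copy f(n - d) with zero initial state
            cols.append(np.concatenate([np.zeros(d, dtype=complex), f[:len(f) - d]]))
    return np.column_stack(cols)

def estimate_theta(q_dmrs, z_dmrs, mem_depth=0):
    """LS fit from equalized DMRS samples to the known DMRS samples."""
    theta, *_ = np.linalg.lstsq(basis_matrix(q_dmrs, mem_depth), z_dmrs.T, rcond=None)
    return theta                                    # shape (N_coef, N_L)

def ml_dpoi(q, theta, mem_depth=0):
    """Apply the post-inverse; returns an (N_L, N) array of corrected layers."""
    return (basis_matrix(q, mem_depth) @ theta).T

def toy_link(z, a=0.05):
    """Hypothetical link: unitary precoder, cubic PA, ideal channel and equalizer."""
    v = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    x = v @ z
    y = x - a * x * np.abs(x) ** 2
    return v.conj().T @ y

rng = np.random.default_rng(2)

def random_layers(n):
    """Two layers with bounded magnitude and uniform phase (illustrative only)."""
    return rng.uniform(0, 1, (2, n)) * np.exp(2j * np.pi * rng.uniform(0, 1, (2, n)))

z_dmrs, z_data = random_layers(2000), random_layers(2000)
theta = estimate_theta(toy_link(z_dmrs), z_dmrs)    # training on known samples
q_data = toy_link(z_data)
err_before = np.linalg.norm(q_data - z_data) / np.linalg.norm(z_data)
err_after = np.linalg.norm(ml_dpoi(q_data, theta) - z_data) / np.linalg.norm(z_data)
```

Using `lstsq` rather than forming the normal equations explicitly is a numerical design choice; it solves the same least-squares problem but is less sensitive to an ill-conditioned basis function matrix.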

D. Notes on Complexity
We next briefly address the ML-DPoI processing complexity. Building directly on the basis function generation mechanism in (10), the numbers of instantaneous basis functions, $N_\mathrm{RX}^\mathrm{BF}$, can be deduced and are shown in Table I for feasible example values of $P_\mathrm{RX}$ and $N_\mathrm{L}$. The corresponding actual floating point operations (FLOPs) per processed sample, excluding the IFFT/FFT operations, can also be calculated. These are provided in Table II. One can clearly observe that the complexity grows fast for increasing $P_\mathrm{RX}$, especially if $N_\mathrm{L} > 2$. However, for $N_\mathrm{L} = 2$, the complexity numbers are still modest even for fairly large values of $P_\mathrm{RX}$.
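The growth of the basis function count can be reproduced with a short combinatorial sketch. It assumes that, after duplicate removal, the order-$p$ set contains every distinct monomial with $(p+1)/2$ unconjugated and $(p-1)/2$ conjugated layer factors, which is what the recursion in (10) generates. The counts are derived from that structure and are not copied from Table I.

```python
from math import comb

def num_basis_functions(n_layers, max_order):
    """Number of unique memoryless basis functions over odd orders 1..max_order."""
    total = 0
    for p in range(1, max_order + 1, 2):
        n_plain, n_conj = (p + 1) // 2, (p - 1) // 2
        # Multiset counts: ways to pick the unconjugated and conjugated factors
        total += (comb(n_plain + n_layers - 1, n_layers - 1)
                  * comb(n_conj + n_layers - 1, n_layers - 1))
    return total

def num_coefficients(n_layers, max_order, mem_depth):
    """Coefficients per layer: N_BF * (D + 1)."""
    return num_basis_functions(n_layers, max_order) * (mem_depth + 1)
```

For two layers, this gives 2 + 6 = 8 functions up to order 3, consistent with the six unique third-order terms in the Appendix example; with a single layer it reduces to one function per odd order, as in an ordinary memory polynomial.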

IV. NUMERICAL RESULTS
The proposed ML-DPoI method is next evaluated in a computer simulation environment using MATLAB, while focusing on the practical case of dual-layer transmission ($N_\mathrm{L} = 2$). Random realizations of a 5G NR standard-compliant uplink CP-OFDM waveform of length 14 OFDM symbols (1 slot) are used. The assumed values of the different related parameters and quantities are listed in Table III. Furthermore, the layer-to-antenna mapping of data symbols and DMRS is performed assuming a codebook-based transmission as described in [20], with a randomly chosen transmit precoding matrix indicator (TPMI) value (within nontrivial precoders corresponding to TPMI values from 14 to 21 defined in [20]), while DMRS boosting is employed as described in [7]. Each transmit branch has its own PA unit whose nonlinear behaviors are characterized by measured MP models available in [21] with mutually different coefficient realizations. A soft envelope limiter is also deployed, such that the PAPR of the PA input signal is limited to 8 dB. The strength of nonlinearity introduced by the PAs is controlled with the backoff applied at the PA input, ranging from 4.25 dB to 9.25 dB with 0.5 dB increments. In exact terms, the input backoff is defined as

$$\mathrm{IBO\,[dB]} = 10 \log_{10}\!\big(P_\mathrm{in\text{-}sat} / P_\mathrm{avg}\big), \tag{16}$$

where $P_\mathrm{in\text{-}sat}$ and $P_\mathrm{avg}$ are the input saturation power of the PA and the average power of the PA input signal, respectively. The assumed reference input saturation power of an individual PA module is +5 dBm.
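The input backoff definition in (16) translates directly into a scaling rule for the PA input signal. The following is a minimal sketch; the function names and the test signal are assumptions, and powers are expressed in linear units (mW here) with sample power taken as the squared magnitude.

```python
import numpy as np

def input_backoff_db(x, p_in_sat):
    """IBO in dB: input saturation power over average input signal power."""
    return 10 * np.log10(p_in_sat / np.mean(np.abs(x) ** 2))

def scale_to_ibo(x, p_in_sat, ibo_db):
    """Scale x so that its average power sits ibo_db below p_in_sat."""
    target_power = p_in_sat / 10 ** (ibo_db / 10)
    return x * np.sqrt(target_power / np.mean(np.abs(x) ** 2))

p_sat_mw = 10 ** (5 / 10)                    # +5 dBm expressed in mW
rng = np.random.default_rng(4)
x = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)  # stand-in waveform
x_scaled = scale_to_ibo(x, p_sat_mw, 6.25)
```

Sweeping `ibo_db` over the 4.25 dB to 9.25 dB range in 0.5 dB steps would reproduce the operating points described above, with lower values driving the PA deeper into its nonlinear region.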
The involved 4 × 64 MIMO multipath channel is modeled by the CDL-E channel profile [22] with an RMS delay spread (DS) of 43 ns. An LMMSE equalizer with a practical DMRS-based linear channel estimate is employed before ML-DPoI, as shown also in Fig. 1. Both the linear channel and the ML-DPoI parameter estimation are executed on a per-slot basis, using DMRS. The main performance metric is the received signal EVM for PUSCH symbols, with thermal noise excluded from the EVM calculation to focus on the true useful signal quality under the noise. Thermal noise is, however, properly modeled in all other stages, such as the LMMSE and ML-DPoI parameter estimation phases, for realistic assessments. Since the existing literature does not contain any model-based RX DPoD reference methods applicable for true multi-layer transmission with unknown transmitter nonlinearities, the performance of the proposed method is compared to that of an ideally linearized PA (ILP), which has a constant gain over its linear range while entering saturation after $P_\mathrm{in\text{-}sat}$ is reached.
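The EVM metric and the ILP benchmark can both be written in a few lines. The sketch below uses the common RMS definition of EVM relative to the reference symbol power and models the ILP as an envelope limiter with unit gain below saturation; both are our reading of the description above rather than the exact evaluation code, and the function names are assumptions.

```python
import numpy as np

def evm_percent(received, reference):
    """RMS error vector magnitude in percent, relative to the reference power."""
    err_power = np.mean(np.abs(received - reference) ** 2)
    return 100 * np.sqrt(err_power / np.mean(np.abs(reference) ** 2))

def ideally_linearized_pa(x, p_in_sat, gain=1.0):
    """Constant gain up to the saturation amplitude, hard envelope limit above it."""
    a_sat = np.sqrt(p_in_sat)
    mag = np.abs(x)
    # Keep the phase and clip only the envelope of samples exceeding a_sat
    scale = np.where(mag > a_sat, a_sat / np.maximum(mag, 1e-30), 1.0)
    return gain * x * scale

clipped = ideally_linearized_pa(np.array([0.5, 2.0j]), p_in_sat=1.0)
```

Since the limiter leaves samples below saturation untouched, the ILP error comes entirely from clipped peaks, which explains why its EVM degrades at low IBO where more samples exceed the saturation level.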
Figure 2 shows the achieved receiver EVM values of the proposed ML-DPoI and of the LMMSE and ILP benchmarks with 256-QAM. Compared to the ordinary LMMSE, employing ML-DPoI provides substantial EVM enhancements. Specifically, when single-symbol DMRS is utilized for parameter estimation, the tight EVM limit is satisfied at and above 5.25 dB IBO values, reflecting an IBO gain of around 4 dB compared to ordinary LMMSE. Additionally, the receiver EVM is consistently below the required EVM limit when double-symbol DMRS is utilized, corresponding to even larger IBO gains of more than 5 dB. It can also be observed that the proposed ML-DPoI shows better performance compared to the corresponding ILP benchmark at the important low IBO region. This is due to the fact that the PA input signal has a high PAPR, and thus when the IBO is low, a significant number of samples fall into the clipping or saturation region. The number of such samples gradually decreases with increasing IBO, and the ILP eventually starts to perform better than ML-DPoI. These results thus directly show that the ML-DPoI is capable of satisfying tight EVM requirements at the challenging low IBO values where the ideally linearized PA approach already fails. This implies that the ML-DPoI allows for higher power-efficiency by letting the PAs operate in their highly nonlinear region.
Setting the ML-DPoI nonlinearity order is, in general, a compromise between modeling capability and processing complexity. However, as illustrated in Table IV, large EVM improvements can already be obtained with modest orders of $P_\mathrm{RX} = 3$ and $P_\mathrm{RX} = 5$, even when operating at low IBOs. In this case, increasing the nonlinearity order beyond $P_\mathrm{RX} = 5$ does not further improve the performance. However, it is important to acknowledge that the best choice of nonlinearity order and memory depth depends on the underlying assumptions and is thus application specific.
V. CONCLUSION
In this letter, we proposed a new digital post-distortion technique for precoded multi-layer MIMO-OFDM systems under strong transmitter nonlinear distortion. The proposed ML-DPoI builds on layer-level nonlinear distortion modeling and consists of reference signal based model parameter estimation and the corresponding receiver side post-processing. Furthermore, ML-DPoI is a one-shot approach and does not require any prior information of the transmit precoder or PA model parameters. The performance of the proposed method was evaluated through computer simulations considering a dual-layer 5G NR mmWave uplink transmission at 28 GHz. The results showed that employing ML-DPoI provides significant enhancements in the receiver EVM, and is able to meet the tight EVM limit of 3.5% standardized for 256-QAM at low IBO values. It was also shown that the proposed method outperforms an ideally linearized transmitter at low IBO values, allowing transmitter PAs to operate in their highly nonlinear but power-efficient region while still fulfilling the corresponding EVM limits. Our future work focuses on extending the proposed method to frequency-selective precoding scenarios.

APPENDIX INSTANTANEOUS BASIS FUNCTION GENERATION
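In the following, a concrete basis function (BF) generation example for $N_\mathrm{L} = 2$ with the linear basis function set $\Upsilon^{(1)} = [z_1(n), z_2(n)]^T$ is given. First, the generator vector reads

$$G = \big[\,|z_1(n)|^2,\ |z_2(n)|^2,\ z_1(n) z_2^*(n),\ z_1^*(n) z_2(n)\,\big]^T. \tag{17}$$

Then, for $p = 3$, we can write $\bar{\Upsilon}^{(3)} = \Upsilon^{(1)} \otimes G$, which yields

$$\bar{\Upsilon}^{(3)} = \big[\,z_1|z_1|^2,\ z_1|z_2|^2,\ z_1^2 z_2^*,\ z_1 z_1^* z_2,\ z_2|z_1|^2,\ z_2|z_2|^2,\ z_2 z_1 z_2^*,\ z_1^* z_2^2\,\big]^T, \tag{18}$$

where the sample index $n$ is omitted for brevity. Above, the fourth item $z_1(n) z_1^*(n) z_2(n)$ is equal to $z_2(n)|z_1(n)|^2$, which is identical to the fifth item. Furthermore, the seventh item $z_2(n) z_1(n) z_2^*(n)$ equals $z_1(n)|z_2(n)|^2$, which is identical to the second item. Thus, $\Upsilon^{(3)}(n)$ is obtained by removing these duplicate terms, resulting eventually in

$$\Upsilon^{(3)} = \big[\,z_1|z_1|^2,\ z_1|z_2|^2,\ z_1^2 z_2^*,\ z_2|z_1|^2,\ z_2|z_2|^2,\ z_1^* z_2^2\,\big]^T. \tag{19}$$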

Fig. 1. Baseband equivalent block diagram of the considered multi-layer MIMO-OFDM system with TX nonlinearities included. Up- and down-sampling are embedded in the IFFT/FFT operations, respectively.
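In (13), $\hat{\mathbf{Z}}$ denotes the $N \times N_\mathrm{L}$ post-distorter output matrix with one column representing one layer, $\mathbf{\Upsilon}_\mathrm{inv}$ is the $N \times N_\mathrm{RX}^\mathrm{coef}$ basis function matrix, $\boldsymbol{\theta}$ is the $N_\mathrm{RX}^\mathrm{coef} \times N_\mathrm{L}$ matrix of the inverse model coefficients, and $N_\mathrm{RX}^\mathrm{coef} = N_\mathrm{RX}^\mathrm{BF}(D_\mathrm{RX} + 1)$. Finally, the output signals $\hat{z}^{(i)}(n)$ are decimated by $\varepsilon$ and transformed via an $N_\mathrm{FFT}$-point FFT for the actual bit or symbol detection.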
In (8), $\Upsilon_{m,d}(n)$ are the nonlinear basis functions that are now comprised directly of the unprecoded input layers $z_i$. The exact expressions for these basis functions are obtained through the recursion in (10), with further details being available in the Appendix. Furthermore, $\bar{c}_{m,d}$ denote the effective complex coefficients lumping the underlying unknown MP model coefficients $c_{p,d}^{(j)}$ and the precoder weights $v_{ji}$, while $N_\mathrm{TX}^\mathrm{BF}$ denotes the number of memoryless basis functions, which in general depends on the nonlinearity order $P_\mathrm{TX}$ and the number of layers $N_\mathrm{L}$. The corresponding vector-matrix model for any given OFDM symbol is the one expressed in (9).

TABLE I
EXAMPLE AMOUNTS OF MEMORYLESS BASIS FUNCTIONS ($N_\mathrm{RX}^\mathrm{BF}$)