Predistortion-Based Linearization for 5G and Beyond Millimeter-Wave Transceiver Systems: A Comprehensive Survey

The next-generation (5G/6G) wireless communication aims to leapfrog the currently occupied sub-6 GHz spectrum to the wideband millimeter-wave (MMW) spectrum. However, MMW spectra with high-order modulation schemes drive the power amplifier (PA) at significant back-off, causing severe nonlinear distortions and thus deteriorating the modulation process of the transceiver (TRX). Typically, TRX efficacy is quantified with standardized linearization metrics, and different predistortion (PD) schemes are used to handle deep compression of the PA. In this regard, TRX baseband signals are mostly linearized in the digital domain, where digitally controlled linearization needs higher sampling rates as MMW bandwidths increase in order to compensate for intermodulation (IMD) products, resulting in increased system cost and power consumption. Alternatively, digitally controlled analog-based linearization, i.e., the hybridization of digital predistortion (DPD) and analog predistortion (APD), is highly efficient and cost-effective for fulfilling the linearized, energy-efficient design vision of MMW networks. Therefore, this paper puts an extensive spotlight on the progress in PD-based linearization for 5G and beyond communications. It first provides background information on the advancements of PD schemes through recent surveys and then, as a preliminary, classifies the general roadmap of PD waveform processing across the TRX system models. After this, we present three prominent PD architectures and their design approaches with intrinsic performance metrics. Finally, we explore four case studies encompassing PD operation under certain nonlinear constraints of different communication schemes. We examine the suitability of PD-based linearization solutions, both existing and proposed up to the first quarter of 2022, and identify the potential prospects in this domain.


I. INTRODUCTION
THE RAPID evolution of wireless cellular technology and its increasing usage have put an enormous responsibility on research organizations, industry, and academia to work towards the new generations (5G/6G) of mobile communications. In 2015, a survey reported that the number of smart devices had exceeded the world population [1]. Also, Ericsson's prediction of 29 billion devices by 2022 [2] appears attainable according to one of the recent estimates [3]. Across different surveys, all eyes were on the early 2020s for the roll-out of 5G millimeter-wave (MMW) technology [1], [4]. However, widespread global use of 5G cellular technology still looks far away because of the stiff challenges that telecom operators are facing [5]. These constraints are evident from the high-order modulations used at MMW frequencies [6]. If not treated properly, such advanced modulation schemes can degrade the overall modulation accuracy, resulting in significant signal distortion.
The extremely high-frequency band spans from the 30 GHz MMW band to the 300 GHz terahertz (THz) band. In 5G networks, however, the MMW band (24.25 GHz to 52.6 GHz) has been combined with the conventional sub-6 GHz band (410 MHz to 7.12 GHz) to maximize the spectrum available to users [7]. According to 3GPP Release 13, a peak data rate of 25 Gb/s is achievable by aggregating 5 or 32 contiguous component carriers [8]. This feature opens up further interest in the non-contiguous carrier aggregation (CA) of different frequency spectra [9]. In addition, 5G orthogonal frequency division multiplexing (OFDM) combines multiple subcarriers, each with a dedicated modulation scheme, which results in an excessive peak-to-average power ratio (PAPR) [10]. As a result, 5G modulation schemes become extremely susceptible to intermodulation (IMD) distortion. Moreover, operating high-PAPR signals at significant back-off increases the system-level power consumption and degrades the transmit power and efficiency, leading to a low signal-to-noise ratio (SNR) at the receiver (RX). Over the years, clusters of works have been presented for PAPR reduction [10], [11]. The prime focus of each contribution is to reduce the PAPR level while avoiding inherent signal distortion or a loss of spectral efficiency (SE). However, such approaches demand additional behavioral baseband optimization for nonlinearity prediction at the transmitter (TX) and/or RX, which ultimately increases the computational complexity, making them ill-suited to large multiple-input-multiple-output (MIMO) arrays [11], [12], [13], [14].
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
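To make the PAPR argument concrete, the following minimal sketch (Python/NumPy, our own illustration rather than a method from [10], [11]) compares the PAPR of a single-carrier 16-QAM sequence with that of one OFDM symbol carrying random 16-QAM on 1024 subcarriers. The subcarrier count, the unnormalized 16-QAM mapping, and the 4× oversampling are assumptions chosen only for illustration.

```python
import numpy as np

QAM16_LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])


def papr_db(x: np.ndarray) -> float:
    """Peak-to-average power ratio of a complex baseband signal, in dB."""
    power = np.abs(x) ** 2
    return float(10 * np.log10(power.max() / power.mean()))


def random_16qam(n: int, rng: np.random.Generator) -> np.ndarray:
    """n random (unnormalized) 16-QAM symbols."""
    return rng.choice(QAM16_LEVELS, n) + 1j * rng.choice(QAM16_LEVELS, n)


def ofdm_symbol(n_sub: int, oversample: int, rng: np.random.Generator) -> np.ndarray:
    """One OFDM symbol (no cyclic prefix) with 16-QAM on every subcarrier.

    Zeros are inserted in the middle of the FFT grid so that the IFFT output
    is oversampled, which tracks the continuous-time peaks more closely.
    """
    if n_sub % 2:
        raise ValueError("n_sub must be even")
    half = n_sub // 2
    qam = random_16qam(n_sub, rng)
    grid = np.zeros(n_sub * oversample, dtype=complex)
    grid[:half] = qam[:half]    # positive-frequency subcarriers
    grid[-half:] = qam[half:]   # negative-frequency subcarriers
    return np.fft.ifft(grid)


rng = np.random.default_rng(0)
sc = random_16qam(4096, rng)
ofdm = ofdm_symbol(1024, 4, rng)
print(f"single-carrier 16-QAM PAPR: {papr_db(sc):.1f} dB")
print(f"OFDM (1024 subcarriers) PAPR: {papr_db(ofdm):.1f} dB")
```

The single-carrier value is bounded by the constellation itself (corner power 18 over mean power 10, about 2.6 dB), whereas the OFDM symbol typically lands several dB higher because many independent subcarriers occasionally add in phase.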
In 5G base station (BS) TXs, the power amplifier (PA) is considered the most critical component because of its dynamic characteristics across the multi-connected front-end modules (FEMs). PAs typically exhibit insufficient power-added efficiency (PAE) when driven by high-PAPR signals [13], [14], [15], [16], [17]. Moreover, the substantial back-off in wideband CA hampers PA performance, causing higher power consumption and, thus, inefficiency. Consequently, heat dissipation induces inherent nonlinear effects on the PA characteristics [18]. To avoid such constraints, predistortion (PD) in its digital variant (DPD) is considered an essential component of the amplification chain [17], [19], [20], [21]. For the linearization of 4G long-term evolution (LTE) waveforms, a standalone PA accompanied by a DPD solution can operate near the saturation region while handling PAPR levels of ≥8 dB and meeting adjacent channel leakage/power ratio (ACLR/ACPR) levels of >−45 dBc [22], [23], [24], [25]. This was achievable for LTE mainly for two reasons: 1) the limited number of antenna phased arrays at the macro-cell BS (MBS) transceiver (TRX) and, thus, 2) the feasibility of implementing a dedicated digital baseband unit for each antenna element [26]. Also, traditional PAs were expected to operate in the weakly nonlinear region, where dedicated DPDs in a limited number of RF chains could comfortably meet the PAE and ACLR criteria [20]. However, this methodology is ill-suited to the 5G massive MIMO (mMIMO) densification of small-cell BSs (SBSs), because each antenna is fed by a low-power compact PA that operates highly nonlinearly. For this reason, the PAE and ACLR limits of the MMW 5G-NR (frequency range 2 (FR2)) spectrum are reasonably relaxed compared to those of sub-6 GHz (FR1) MIMO systems [27].
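The link between drive level, compression, and ACLR can be illustrated with a toy simulation. The sketch below is our own assumption-laden example, not a model from [22], [23], [24], [25]. It drives a memoryless Rapp AM/AM model (smoothness p = 2, no AM/PM, no memory, so far simpler than a real MMW PA) with a band-limited noise-like multicarrier signal and estimates the upper-adjacent-channel ACLR from an FFT. The signal occupies 20% of the grid (5× oversampling), so products up to fifth order fit before wrapping around the FFT grid; higher-order products alias, which is a limitation of this setup.

```python
import numpy as np


def rapp_pa(x: np.ndarray, gain: float = 1.0, a_sat: float = 1.0, p: float = 2.0) -> np.ndarray:
    """Memoryless Rapp AM/AM model: linear at low drive, saturating at a_sat."""
    r = np.abs(x)
    r_out = gain * r / (1 + (gain * r / a_sat) ** (2 * p)) ** (1 / (2 * p))
    return r_out * np.exp(1j * np.angle(x))  # phase left untouched (no AM/PM)


def channel_slices(n: int, occupied_frac: float):
    """Index ranges (in fftshift order) of the main and upper-adjacent channels."""
    bw = int(occupied_frac * n)
    lo = n // 2 - bw // 2
    return slice(lo, lo + bw), slice(lo + bw, lo + 2 * bw)


def aclr_db(y: np.ndarray, occupied_frac: float) -> float:
    """Upper-adjacent-channel power relative to main-channel power, in dB."""
    spec = np.abs(np.fft.fftshift(np.fft.fft(y))) ** 2
    main, adj = channel_slices(len(y), occupied_frac)
    return float(10 * np.log10(spec[adj].sum() / spec[main].sum()))


# Band-limited test signal: random complex bins filling the main channel only.
rng = np.random.default_rng(1)
n, frac = 8192, 0.2
main, _ = channel_slices(n, frac)
width = main.stop - main.start
grid = np.zeros(n, dtype=complex)
grid[main] = rng.standard_normal(width) + 1j * rng.standard_normal(width)
x = np.fft.ifft(np.fft.ifftshift(grid))
x /= np.sqrt(np.mean(np.abs(x) ** 2))  # unit average power

for drive in (0.1, 0.3, 1.0):  # RMS drive relative to a_sat = 1
    print(f"RMS drive {drive:.1f}: ACLR = {aclr_db(rapp_pa(drive * x), frac):.1f} dB")
```

Backing off the drive (smaller RMS relative to saturation) lowers the ACLR at the cost of output power and efficiency, which is the trade-off that DPD is meant to relax.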
On the other hand, 5G-CA schemes impose major limitations on the wideband operating capabilities of DPD, especially in targeting nonlinearities across the gigabit broadband spectrum. Hence, it is important to determine an appropriate sampling rate in the presence of memory effects (MEs) to meet the target performance of intra-band (within carrier) and inter-band (outside carrier) signal modulations [28], [29]. It is also worth highlighting that the worst case of ambient MEs occurs in wideband systems, where the inter-band distortion coincides with the DPD-modulated intra-band signal [15], [30]. Therefore, another variant of the PD scheme, known as analog predistortion (APD), can target out-of-band (OOB) distortions more effectively than DPD because of its wideband operating capability [31]. Besides, APD offers a high degree of design freedom through cost-effective analog techniques that come in different topologies [32], [33], [34], [35], [36], [37]. As a result, it is feasible in ultra-dense 5G BS-FEMs to employ a compact APD before each PA, significantly relaxing the digital hardware complexity and safeguarding against OOB spectral regrowth. Such FEMs enable joint coordination or processing in a smart hybrid TX, where the memory-based DPD strengthens the in-band (IB) communication while the memoryless analog circuitry maintains array uniformity and minimizes PA-induced OOB emissions. Hence, a promising technique that emerges from the hybridization of DPD and APD, called hybrid predistortion (HPD), is anticipated to be highly effective for the distinct modulation schemes of 5G networks.

A. Challenges of Designing MMW-PD Systems and Motivation
There are various challenges to be addressed in the PD mechanism of MMW systems. First, it is very challenging to find the optimal analog and digital PD matrices quickly and in real time, e.g., to obtain sufficient feedback knowledge for measuring the radiated PA-induced emissions when the analog signal is divided across an arbitrary number of RF chains. In this pursuit, the main difficulties include: i) the digital and analog precoders that drive parallel-connected RF branches at the TX, together with the different feedback options at the RX end (see [38] for an overview of feedback configurations), must jointly work around the practical non-convex TRX constraints while still providing sufficient feedback knowledge to extract the nonlinear parameters; ii) analog precoders are usually built from a network of phase shifters (PSs) and variable gain amplifiers (VGAs), which often imposes additional constraints on the preceding digital baseband because of limited phase rotations or amplitude tapering [39]; iii) when truncated DPD (Volterra- or ML-based) models are used to avoid high nonlinear orders (i.e., to keep the number of basis functions as small as possible), the optimal hyperparameters lie in the finite search sets of optimization algorithms [40], which often leads to a trade-off between modeling generality and run-time complexity [41]; and iv) acquiring the PA output signal with undersampling techniques at the RX (ADC) side is difficult, because reducing the ADC acquisition bandwidth and sampling rate below those of the input signal causes aliasing, which makes DPD channel estimation in both downconversion and upconversion more complicated [31].
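Difficulty iv) can be made concrete with some frequency-folding arithmetic. The sketch below is our own illustration; the 28.0/28.1 GHz two-tone carriers and the 1.2 GHz real-valued ADC rate are hypothetical values. It computes where the two tones and their third- and fifth-order IMD products appear after undersampling. With these numbers, the upper fifth-order product folds onto exactly the same frequency as the second tone, so the feedback path can no longer separate distortion from signal.

```python
def alias_mhz(f_mhz: int, fs_mhz: int) -> int:
    """Apparent frequency in [0, fs/2] of a real tone at f_mhz sampled at fs_mhz."""
    f = f_mhz % fs_mhz
    return min(f, fs_mhz - f)


def two_tone_aliases(f1: int, f2: int, fs: int) -> dict:
    """Folded frequencies (MHz) of two tones and their odd-order IMD products."""
    products = {
        "f1": f1,
        "f2": f2,
        "imd3_lower": 2 * f1 - f2,
        "imd3_upper": 2 * f2 - f1,
        "imd5_lower": 3 * f1 - 2 * f2,
        "imd5_upper": 3 * f2 - 2 * f1,
    }
    return {name: alias_mhz(f, fs) for name, f in products.items()}


aliases = two_tone_aliases(28_000, 28_100, 1_200)
for name, f in aliases.items():
    print(f"{name:>11}: {f} MHz")
# f1 -> 400, f2 -> 500, IMD3 -> 300 and 600, IMD5 -> 200 and 500 (collides with f2)
```

Integer MHz values are used so that the modulo arithmetic is exact; choosing the ADC rate (frequency planning) so that no product of interest collides is one way to keep undersampled feedback usable.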
As such, given the importance of predistortion methodologies in MMW-mMIMO wireless communications, the proposed solutions for integrating PD techniques into practical 5G and beyond cellular BS architectures, and the existing approaches and challenges, we present an extensive survey of PD modeling for both standalone and MIMO transceiver systems under different operating scenarios. We cover their underlying performance-complexity profiles from various perspectives, from baseband processing to the corresponding behavioral modeling, as well as their feasibility for the candidate communication schemes of contemporary and future networks, to provide a valuable guide to readers working in related RF and microwave fields.

B. Contributions and Organization of the Paper
This paper aims to present a comprehensive, up-to-date review of power amplifier predistortion schemes in terms of system, architecture, and communication technologies, with emphasis on the millimeter-wave technology of modern telecommunication systems. Since the main focus of this survey is the discussion of recent research findings on predistortion-based PA linearization dedicated to 5G and beyond modules, the survey does not cover the basics of linearization methods, for which ample reviews exist in the open literature. For example, readers are encouraged to consult the surveys in [42], [43], which cover the basics of various linearization methods with a focus on 5G networks. Nevertheless, this work includes a brief overview with fundamental insights and in-depth discussions on well-established PD techniques, design approaches, complexities, and potential future directions that can be harnessed to improve PD processing across emerging telecommunication standards, so that readers can appreciate the distinguishing features of traditional and current PD-based linearization solutions. A comparison of contributions between our survey and relevant works is summarized in Table I.
The rest of the paper is organized as follows. Section II presents the background on the three main modes of PD schemes, i.e., digital, analog, and hybrid linearization solutions, from recent survey papers. Section III lists the various system model perspectives of PD signal processing through the different stages of a transceiver. Section IV covers the recent developments of the three PD techniques. Section V discusses case studies involving four enabling communication schemes to illustrate PD characterization in several applications. The emerging research trends of the PD topic are covered in Section VI, followed by the conclusions in Section VII. In addition, the structure of this paper is shown at a glance in Fig. 1, and the list of important acronyms used throughout the article is given in Table II.

II. BACKGROUND: FROM AN EYE OF RECENT SURVEYS
The objective of this section is to familiarize the reader with the fundamental background, state-of-the-art proposals, approaches, and application scenarios of the three PD paradigms in the development of MIMO technology for 5G BS modules. Summaries of general and relevant surveys [15], [44], [45], [46], [47], [48] and magazine articles [19], [49], [50], [51], [52] pertaining directly or indirectly to PD linearization schemes or technology can be found in Tables I and III, respectively.
An exhaustive survey has been presented in [15] to investigate the system requirements of distortion-free PA architectures in conventional 3G/4G communications. This survey carries out an extensive study of the energy efficiency (EE) and SE trade-off. The paper also discusses the fundamental limitations of nonlinearity modeling in practical domains, i.e., PA architecture, signal behavior, and network protocols. These limitations of traditional systems have motivated researchers to introduce distinct channel modeling techniques, which have been applied to modern multi-connected sub-arrays (SAs) using adaptive, conductive, and over-the-air (OTA) ACLR testing [43], [53], [54], [55], [56], [57]; these are the enabling features of MIMO linearization in 5G TRXs.
Cao et al. [41] overview interferer mitigation techniques for 5G digital array technologies. These techniques aim to provide effective solutions for MIMO arrays with high dynamic range (DR) signals, asynchronous waveform aggregation, and nonlinear equalization of OOB interference. The authors further illustrate the pros and cons of adaptive nonlinear predistortion (or pre-equalization) solutions for eliminating spectral contamination in cost-effective 5G systems, given that 5G communications rely on multi-carrier OFDM transmissions. For this reason, researchers resort to low-complexity 5G PD schemes to reach the major milestone of delivering high linearity, especially at large back-off points [58], [59], [60]. Gilabert et al. [19] set out to address the challenges of multi-dimensional DPD with concurrent MMW multi-standard 5G communication signals. The authors favored envelope tracking and outphasing PA configurations to amplify concurrent 5G sub-6 GHz (FR1) band signals. Fager et al. [61] analyze the limitations of 5G TX characteristics that arise from the mutual differences in PA and antenna co-design in active MIMO arrays, since impedance mismatch between the two components is the main source of crosstalk and mutual distortion. Through simulations, the authors demonstrated a behavioral framework for different MIMO transmitting test boards to investigate the impact of dynamic load impedance on linearity, efficiency, and output power.
In ultra-dense heterogeneous 5G MIMO arrays, the PD paradigms seek the technical benefits of forthcoming fully hybrid linearization solutions. Hong et al. [46] study the feasibility of system characterization for 5G and 6G MMW networks. The paper starts with the characterization of 5G BS modules, which involves active arrays, integrated circuits (ICs), user terminal mobility, system calibration, and spectral models. Furthermore, the authors present an extensive comparison among ever-emerging 5G beamforming techniques, such as digital, analog, and hybrid structures. Here, the authors articulate the importance of choosing the right PD method to maintain sufficient hardware flexibility and a good performance trade-off for the distributed SAs of 5G networks. After that, the practicality of symmetrical and asymmetrical beamformer (BF) chipsets in fully digital and hybrid distortion learning environments associated with 5G and 6G wireless evolution is elucidated. In addition, the authors claim that full-digital linearization is the logical choice in asymmetrical nonreciprocal mMIMO arrays for high capacity, hardware compatibility, and broad signal coverage with sufficiently reduced circuit volume and operational cost [62]. The thematic issue by Chung and Boss [63] focuses on recent surveys of 5G DPD and digital post-correction techniques. The issue describes how these techniques are executed in wireless, wired, and optical communication systems to meet linearization challenges. Moreover, the authors review recent feature articles to illustrate the feasibility of linearization solutions in the diverse digital signal processing (DSP) units of the three communication media.

A. Digital and Analog Linearization Solutions
The next-generation MMW solutions will enable high-speed connectivity for users with a long-awaited extended footprint. To make this happen, network operators first have to overcome linearization deficiencies without compromising power efficiency. The interrelated DPD challenges that should be considered when initiating the massive deployment of MIMO arrays include oversampling in wide bandwidths, fast data converters to extract wideband nonlinear coefficients, computational complexity, large circuit volume, implementation cost, and high power consumption. Abdelaziz et al. [51] provide deeper insights into less complex DPD structures for spectrally agile non-contiguous CA. They outline the use cases of DPD processing for the sub-band and full-band spurious emissions of the sub-6 GHz LTE-A band, and full-band DPD proves more competent for contiguous 5G NR waveforms. However, the cost barriers linked with this method unveil some open research issues, such as oversampling in the broadband transmission of MMW signals. In this context, the authors of [64] studied the impact of sampling frequency on single-band and multi-band DPD structures. Moreover, they identified a major factor that hinders DPD effectiveness at large-scale integration, namely the trade-off between DPD modeling impairments and performance accuracy at the sampling rates of single- and multi-band DPD settings. The authors propose pruning the multi-band DPD model to achieve lower complexity with performance comparable to single-band DPD processing.
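The sampling-rate argument behind [64] can be illustrated with rule-of-thumb arithmetic. The sketch below assumes, as a simplification of ours, that a DPD path must run at roughly K times the bandwidth it linearizes (K = 5 for products up to fifth order). A single full-band DPD then has to cover the whole aggregated span, including the empty gap between carriers, whereas a multi-band DPD processes each carrier at its own baseband rate. The carrier plan (two 100 MHz carriers with centers 1 GHz apart) is hypothetical, and real multi-band DPDs also need cross-band terms that this count ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Carrier:
    center_mhz: float     # carrier center (offset from an arbitrary reference)
    bandwidth_mhz: float  # occupied bandwidth


def full_band_rate_msps(carriers: list, k: int = 5) -> float:
    """Rate for one DPD spanning every carrier plus the gaps between them."""
    lo = min(c.center_mhz - c.bandwidth_mhz / 2 for c in carriers)
    hi = max(c.center_mhz + c.bandwidth_mhz / 2 for c in carriers)
    return k * (hi - lo)


def multi_band_rates_msps(carriers: list, k: int = 5) -> list:
    """Per-band rates when each carrier gets its own baseband DPD path."""
    return [k * c.bandwidth_mhz for c in carriers]


plan = [Carrier(0.0, 100.0), Carrier(1000.0, 100.0)]
print("full-band DPD:", full_band_rate_msps(plan), "MS/s")     # 5 x 1100 MHz span
print("multi-band DPD:", multi_band_rates_msps(plan), "MS/s")  # 5 x 100 MHz per band
```

Under these assumptions the full-band path needs 5.5 GS/s, while each multi-band path needs only 500 MS/s, which is why pruned multi-band models are attractive when the carriers are widely separated.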
The potential benefits of mMIMO terminals include high capacity, high data rates, low-cost RF subsystems, etc. However, mMIMO also introduces constraints related to hardware integration, where finding the optimal number of antenna elements is of paramount importance. This factor is even more important in multi-user (MU) transmission, which is often supported by costly full-DPD equipment [65]. In this regard, Yan et al. [52] provide a detailed statistical overview of full digital array (DA), hybrid SA, and full-hybrid (FH) connected arrays for optimized 5G baseband precoding. More specifically, simulations for three 5G use cases, namely dense urban mobile broadband (DMB), 50+ Mb/s everywhere, and self-backhauling, are performed to find the optimal number of data streams and analog precoding chains that fulfill the output power and SE requirements of MU transmissions. For DMB, as the number of data streams increases, DA and FH outperform SA by lowering the SNR requirement while obtaining a high signal gain, which SA achieves only at a 2 dB higher transmission power level. In self-backhauling, however, FH and SA perform equivalently because of their common precoding schemes, both requiring a 1 dB higher power level than DA. The authors' analysis thus shows that DA delivers adequate performance with fewer RF (or APD) elements or less transmission power. Further, their analysis shows a negligible impact on SE performance, irrespective of the number of PS quantization bits. Nevertheless, with many bits, agile radio connectivity requires rapid updating of the beamforming amplitude weights across different phase-shifting states. This motivates the need for a low-cost field-programmable gate array (FPGA) that monitors the phase-shifting error in a graphically based look-up table (LUT) [66], [67]. Payami et al. [44] present a novel framework for agile RF analog BFs. The authors elaborate on the theoretical and practical aspects of PS speed and resolution in multi-agile RF communication. Their analysis covers both infinite- and finite-resolution phase shifts for an RF-chain-free array system. These findings motivate us to explore the use of agile RF-PD techniques in large-array TXs over digital-only BFs.
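The weak dependence on phase-shifter resolution discussed above can be sanity-checked with a simple array model. The sketch below is our own and assumes an ideal N-element uniform linear array with half-wavelength spacing, no mutual coupling, and phase shifters that round each ideal phase to the nearest of 2^b states; it computes the main-lobe gain loss caused by b-bit quantization. For uniformly distributed quantization errors, the expected loss follows the familiar sinc-squared factor, roughly 3.9 dB, 0.9 dB, and 0.2 dB for 1, 2, and 3 bits.

```python
import numpy as np


def steering_phases(n_elem: int, theta_deg: float, d_over_lambda: float = 0.5) -> np.ndarray:
    """Progressive phase of a plane wave from theta_deg across a uniform linear array."""
    n = np.arange(n_elem)
    return 2 * np.pi * d_over_lambda * n * np.sin(np.deg2rad(theta_deg))


def quantize_phase(phi: np.ndarray, bits: int) -> np.ndarray:
    """Round each phase to the nearest of the 2**bits states of a digital phase shifter."""
    step = 2 * np.pi / 2 ** bits
    return np.round(phi / step) * step


def main_lobe_loss_db(n_elem: int, theta_deg: float, bits: int) -> float:
    """Gain loss at the steering angle relative to ideal (unquantized) phase shifters."""
    psi = steering_phases(n_elem, theta_deg)
    weights = np.exp(-1j * quantize_phase(psi, bits))  # conjugate-phase beam steering
    response = np.exp(1j * psi)                        # array response toward theta_deg
    gain = np.abs(np.sum(weights * response)) ** 2     # ideal weights give n_elem**2
    return float(-10 * np.log10(gain / n_elem ** 2))


for b in (1, 2, 3, 4, 6):
    print(f"{b}-bit phase shifters: {main_lobe_loss_db(64, 20.0, b):.2f} dB loss")
```

Beyond a few bits the loss is a small fraction of a dB, which is consistent with the modest SE sensitivity reported above; the model ignores PS amplitude variation and phase-dependent insertion loss.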
In addition, array uniformity is another important parameter for characterizing the mutual crosstalk and phase differences that govern successful multi-dimensional transmission [57], [68]. For this reason, compact APD circuits placed before the low-power PAs of MIMO arrays can tune the PSs to produce uniform OTA signals, relieving the DSP unit of part of the feedback-acquisition burden. Reference [21] presents a classic timeline of PD structures. Subsequently, the authors trace the applicability of PD from RF analog feedforward linearization in landline telephone systems to the digital feedback linearization of RF PAs. Finally, the paper lays out the remaining challenges of PA and PD architectures that must be overcome to pave the way for distinct 5G modulation schemes [69]. These investigations are beneficial for ongoing research on MMW PAs for phased-array MIMO. Consequently, researchers across the globe have actively pursued these directions in recent years, and driving PAs at significant back-off levels with the aid of DPD has become the new normal [70], [71], [72].
Another developing linearization trend for B5G cellular BSs is to operate PAs with wide-DR signals. For this purpose, Wang et al. [48] present the latest review, which analyzes the typical impact of MMW PAs on the rest of the TX subsystems with an eye on large-DR modulation signals. Moreover, the paper addresses the variants of load modulation networks (LMNs) in different semiconductor technologies and their ability to support MMW modulation bandwidths. Besides, the trade-offs between power gain and linearity in various semiconductor materials are addressed, along with vital load-modulated MMW PA structures that eliminate the need for external PD while meeting the complex linearity requirements of mMIMO channel measurements. Hence, the review provides a fundamental understanding of the importance of weak mutual coupling and mismatch in array-based digital and analog sub-components.

B. Hybrid Linearization Solution
The persistent hardware constraints associated with common distortion learning methods have shifted the research focus towards hybrid linearization. Because interest in HPD has only recently grown with equipment containing dozens of active arrays, overviews of this evolving domain are rare. We therefore review surveys devoted to hybrid beamforming (HB), as its EE and SE specifications are interrelated with PD procedures, and extract the relevant HB insights to correlate with signal PD in 5G NR systems. The study by Han et al. [49] demonstrates the embryonic state of HB structures for large-scale antenna systems. The authors scrutinize analog BF (ABF) and digital BF (DBF) separately to learn HB optimality. ABF is used to generate the optimized reference beam signal from M optimal SAs and N RF TRXs, with the purpose of obtaining adequate channel state information (CSI) from the reference signal. Based on the CSI, digital precoding is applied to suppress unwanted spatial components in the combined array response of ABF. In this way, the authors find the optimal number of hybrid TRXs that sustains the desired EE and SE. However, the behavior of CSI in dynamic DSP units at different 5G NR bands (sub-6 GHz/MMW) is still unclear; for example, Molisch et al. [50] present an extensive survey on the HB-based CSI levels required for different MMW standards. Despite finding the optimal number of analog/digital precoders for different CSI levels, the authors still seek a hybrid structure that balances the system overhead and performance trade-off. In this context, Rihan et al. [47] review up-to-date HB methods for 5G and B5G networks. The authors consider multiple factors, such as CSI availability, frequency spectra, system complexity, analog and digital precoder designs, user presence, and the number of RF chains.
These studies show that analog precoders can bring uniformity to the nonlinear RF chains and, thus, steer the phase-shifted beam with high gain. Array uniformity also improves DPD compatibility in capturing adequate (instantaneous) CSI, which, as a result, relaxes the hardware integration constraints.
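The analog/digital split discussed in these HB surveys can be sketched in a few lines. In the example below, which is an illustrative heuristic of ours and not the method of [47], [49], or [50], the analog precoder keeps only the phases of the leading left singular vectors of a fully digital precoder, so every entry has the constant modulus that a phase-shifter network can realize. The digital precoder is then the least-squares fit to the fully digital target, rescaled to the total transmit power. The random Rayleigh channel and the array sizes are assumptions.

```python
import numpy as np


def fully_digital_precoder(h: np.ndarray, n_s: int) -> np.ndarray:
    """Unconstrained precoder: the n_s dominant right singular vectors of H."""
    _, _, vh = np.linalg.svd(h)
    return vh.conj().T[:, :n_s]


def hybrid_precoder(f_opt: np.ndarray, n_rf: int):
    """Split f_opt into a phase-only analog part and a least-squares digital part."""
    n_t, n_s = f_opt.shape
    if not n_s <= n_rf <= n_t:
        raise ValueError("need n_s <= n_rf <= n_t")
    u, _, _ = np.linalg.svd(f_opt)                             # u is n_t x n_t
    f_rf = np.exp(1j * np.angle(u[:, :n_rf])) / np.sqrt(n_t)   # constant-modulus entries
    f_bb, *_ = np.linalg.lstsq(f_rf, f_opt, rcond=None)        # min ||f_opt - f_rf f_bb||_F
    f_bb *= np.sqrt(n_s) / np.linalg.norm(f_rf @ f_bb)         # ||F_RF F_BB||_F^2 = n_s
    return f_rf, f_bb


rng = np.random.default_rng(2)
n_r, n_t, n_s = 4, 16, 2
h = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
f_opt = fully_digital_precoder(h, n_s)
for n_rf in (2, 4, 8, 16):
    f_rf, f_bb = hybrid_precoder(f_opt, n_rf)
    err = np.linalg.norm(f_opt - f_rf @ f_bb) / np.linalg.norm(f_opt)
    print(f"N_RF = {n_rf:2d}: relative approximation error {err:.3f}")
```

Adding RF chains gives the digital stage more freedom, so the approximation error generally shrinks; with one RF chain per antenna, the hybrid structure reproduces the fully digital precoder exactly.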

C. Summary and Lessons Learned
The overviews mentioned above discuss the practicality of PD solutions in 5G BS networks for different spectrum scenarios, such as a standalone carrier frequency or CA, and the design challenges of integrated PA circuits under large-DR modulations. Moreover, the potential assemblies of MMW PAs in different substrate materials and the replacement of PD circuits by various load-modulated PAs are discussed.
Ultimately, the joint optimization of analog and digital signal linearizers is needed, and potential hybrid structures should linearize the BF signals with nominal baseband complexity. From an analog precoder (or APD) perspective, the best possible uniformity of the subsequent arrays is expected in order to meet high beamforming gain requirements, and this uniformity should also compensate for the dynamic impairments of the multi-connected active arrays. The common or dedicated DPDs are then responsible for mitigating the spectral pollution of the transmitted signals. In the MU mMIMO scenario, CSI screening becomes far-reaching and burdens the DSP unit. Lastly, it is important to determine the optimal number of hybrid arrays to achieve an appropriate EE and SE balance.

III. PREDISTORTION LINEARIZATION: A GENERAL ROADMAP OF TRANSCEIVER SIGNAL PROCESSING
Predistortion is essentially a distortion mitigation procedure that suppresses the residual interference across the entire PA unit to eliminate nonlinear characteristics and improve output performance. Since the invention of wireless communication, the linearity of transmitted signals has been of paramount importance [73], [74]. Over the past couple of decades, DPD has been the center of focus for achieving spectrally efficient signaling [21], [75], [76]. However, with the ambitious goal of maintaining spectral consistency in modern, closely located MIMO TRXs, it is very important to introduce practical, real-time DPD prototypes that support continuous data transmission with simultaneous beam steering [55]. These factors have resulted in many DPD-related studies, enabling new technologies to quickly enter the 5G market. Accordingly, a simple search for DPD in the IEEE digital library returns over 3000 publications, compared with over 500 for its analog counterpart (as of early 2022), which reflects the prominence of DPD in transceiver equipment.
Interestingly, nonlinearity modeling has also evolved with every mobile generation, based on the distortion (ACPR) levels mandated by the governing authorities [77]. The progression from linearizing standalone TX elements to linearizing mMIMO TRX assemblies [49], whether by maintaining the uniform phase setting of the desired beam through analog circuitry [44], [78], [79] or by squeezing the IB nonlinear distortion via a DPD assembly [80], [81], indicates the successful recent deployment of PD schemes.
In principle, PD signal processing across MIMO arrays revolves around multi-connected RF chains with associated digital engines, which involve highly sophisticated backend procedures: starting from the software-defined basis functions of DPD polynomial algorithms in the computer-aided design tools (MATLAB, for instance) of the baseband system, through high-speed data converters, to VGAs, PSs, amplifiers, etc., as illustrated in Fig. 2. Each of these system models has its own characteristics, challenges, and set of applications, as discussed in the subsequent sections. Therefore, in the following sections, we classify the general roadmap of PD signal generation and system processing across the TRX system models. In addition, Fig. 3 shows the classification of existing works studying various aspects of different system models.
Fig. 3. The existing work on PD signal and waveform processing across the TRX can be classified into baseband, digital, and analog sub-components.

A. Baseband Waveform Processing
The baseband, the backbone element of the TRX module, is responsible for processing the high-data-rate information from the collective beams of an FEM. The baseband processing complexity grows with the spectrum resources; for example, a 2.5-Gbaud signal with 500-MHz bandwidth in a 384-element mMIMO TRX can easily push the baseband data rate to 10 Gb/s with 16-QAM [82] (and even higher as the constellation order grows). In other words, predistorting the IB and OOB distortion requires a baseband sampling rate that is typically 5× or 7× the input modulation bandwidth. However, such high data rates increase the computational complexity of baseband processing and exceed the bandwidth capabilities of existing digital hardware, irrespective of whether single-carrier or multi-carrier communication is adopted [83].
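The 5× and 7× factors above follow from spectral regrowth: an odd k-th order term such as x|x|^(k-1) convolves the signal spectrum with itself k times, so a signal of bandwidth B produces distortion spread over roughly kB. The sketch below is our own check with a synthetic band-limited signal on an FFT grid (the bin counts and unit-magnitude random-phase bins are arbitrary choices); it verifies this numerically and then turns it into a minimum complex-baseband sample rate. For the 500 MHz example, it gives 2.5 GS/s for fifth-order and 3.5 GS/s for seventh-order modeling.

```python
import numpy as np


def dpd_sample_rate(bandwidth_hz: float, order: int) -> float:
    """Minimum complex-baseband rate to represent IMD up to an odd `order`."""
    if order < 1 or order % 2 == 0:
        raise ValueError("order must be a positive odd integer")
    return order * bandwidth_hz


def occupied_half_width(x: np.ndarray, floor: float = 1e-20) -> float:
    """Largest |normalized frequency| whose power exceeds floor * peak power."""
    spec = np.abs(np.fft.fft(x)) ** 2
    freqs = np.fft.fftfreq(len(x))  # cycles per sample, in [-0.5, 0.5)
    return float(np.abs(freqs[spec > floor * spec.max()]).max())


# Band-limited test signal: unit-magnitude random-phase bins k = -h..h on an n-point grid.
rng = np.random.default_rng(3)
n, h = 1024, 40
bins = np.r_[0:h + 1, n - h:n]
grid = np.zeros(n, dtype=complex)
grid[bins] = np.exp(2j * np.pi * rng.random(bins.size))
x = np.fft.ifft(grid)

for k in (1, 3, 5, 7):
    y = x * np.abs(x) ** (k - 1)  # odd k-th order basis function
    ratio = occupied_half_width(y) / occupied_half_width(x)
    print(f"order {k}: occupied bandwidth = {ratio:.1f} x input bandwidth")

print(dpd_sample_rate(500e6, 5) / 1e9, "GS/s")  # 2.5
print(dpd_sample_rate(500e6, 7) / 1e9, "GS/s")  # 3.5
```

The grid is large enough (7h < n/2) that even the seventh-order product does not wrap around; in hardware, the same reasoning sets the observation-path bandwidth of the DPD.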
The adaptive DSP algorithm in the baseband module can be compiled in either an open-loop configuration (OLC) or a closed-loop configuration (CLC), where the OLC is further categorized into the direct learning architecture (DLA) and the indirect learning architecture (IDLA); see [53] for an overview of these configurations. Meanwhile, it is important to consider the adaptation of the DPD configuration along multiple dimensions, especially in the presence of MEs, including the convergence of the algorithm over different numbers of iterations and the associated trade-offs between error rate and computational complexity across microwave and MMW frequency bands. It has been shown that, for complex MMW systems, the CLC with inherent nonlinear deviations can balance the performance-complexity trade-off more effectively when quantized, pruned algorithms are selected [39], [55], [84].
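The indirect learning idea can be sketched compactly: a post-inverse is fitted so that it maps the gain-normalized PA output back to the PA input, and its coefficients are then copied in front of the PA as the predistorter. The sketch below is a minimal memoryless version of ours. The PA model (cubic AM/AM compression with mild AM/PM), the seventh-order odd polynomial, the Gaussian test signal, and the three iterations are assumptions for illustration; a practical IDLA would include memory terms and use measured feedback data.

```python
import numpy as np


def pa(x: np.ndarray) -> np.ndarray:
    """Hypothetical memoryless PA: unit small-signal gain, AM/AM and AM/PM distortion."""
    r2 = np.abs(x) ** 2
    return x * (1 - 0.15 * r2) * np.exp(1j * 0.1 * r2)


def odd_basis(u: np.ndarray, order: int) -> np.ndarray:
    """Columns u, u|u|^2, u|u|^4, ... up to the given odd order."""
    return np.column_stack([u * np.abs(u) ** (k - 1) for k in range(1, order + 1, 2)])


def nmse_db(ref: np.ndarray, est: np.ndarray) -> float:
    """Normalized mean square error of est against ref, in dB."""
    return float(10 * np.log10(np.sum(np.abs(est - ref) ** 2) / np.sum(np.abs(ref) ** 2)))


def ila_train(x: np.ndarray, order: int = 7, iters: int = 3, gain: float = 1.0) -> np.ndarray:
    """Indirect learning: fit a post-inverse y/gain -> z, then reuse it as the predistorter."""
    coeffs = np.zeros((order + 1) // 2, dtype=complex)
    coeffs[0] = 1.0                          # start from an identity predistorter
    for _ in range(iters):
        z = odd_basis(x, order) @ coeffs     # predistorted PA input
        y = pa(z)                            # observed PA output (feedback path)
        coeffs, *_ = np.linalg.lstsq(odd_basis(y / gain, order), z, rcond=None)
    return coeffs


rng = np.random.default_rng(4)
x = 0.25 * (rng.standard_normal(4000) + 1j * rng.standard_normal(4000)) / np.sqrt(2)
c = ila_train(x)
print(f"NMSE without DPD: {nmse_db(x, pa(x)):.1f} dB")
print(f"NMSE with ILA DPD: {nmse_db(x, pa(odd_basis(x, 7) @ c)):.1f} dB")
```

The attraction of this structure is that the coefficients come from an ordinary linear least-squares problem; its known weakness is that measurement noise on the PA output biases the post-inverse, which is one motivation for the direct learning alternative.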
In essence, analytical basis functions in software-based algorithms are defined for these architectures to carry out the iterative, back-and-forth estimation of nonlinear coefficients. A set of polynomial orders and a memory depth (M), which depend on the carrier bandwidth, are assembled over several iterations to predict the impending nonlinearity at the PA output [67]. Then, the delayed version of the initial-iteration signal, together with the cross-polynomial behavioral data streams of the baseband, is passed to the DPD observation engine, where high-speed clock rates help diminish the captured IMD of the PA output, as illustrated in Fig. 2. Table IV provides the formulations of baseband coefficient approximations for different nonlinear polynomial models. In this pursuit, the frequently applied PA behavioral models, the Volterra series (VoS) and machine learning (ML), are implemented in baseband programming; the former uses a complete set of polynomial terms to capture the static and dynamic nonlinearities of the input signal, while the latter offers universal adaptation to real-time envelope signals [75], [85]. In this respect, the VoS is considered a complete polynomial model that accounts for the distortion information of the IB and OOB spectrum, measured as the normalized mean square error (NMSE) and ACLR/ACPR, respectively. However, this comes at the cost of very high computational complexity, which grows severely at the MIMO level because of the dense, overdetermined regression that relates the coefficient vector (a) to the input basis matrix (X) and the output vector (y) [86], [87]. The output signal y_out(n) of equation (1) can be expressed in vector (matrix) form as
$$\mathbf{y} = \mathbf{X}\mathbf{a}, \qquad \mathbf{y} = \left[\, y_{out}(0),\ y_{out}(1),\ \ldots,\ y_{out}(N-1) \,\right]^{T},$$
where each row of X collects the basis functions of the input evaluated at one time instant. To avoid estimating a large number of coefficients, pruned modeling approaches derived from the VoS arrange the basis functions in block-oriented structures (see [88], [89], [90] for overviews), which can be classified into the variants discussed in the following subsections.
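As a concrete instance of the matrix form y = Xa, the sketch below builds the regression matrix of the memory polynomial, a widely used pruned Volterra model that keeps only the diagonal terms x(n-m)|x(n-m)|^(k-1), and solves the overdetermined system by least squares. It is our illustration and not one of the formulations in Table IV. The "PA" here is itself a memory polynomial with made-up coefficients, so noise-free identification should recover them and give a very low NMSE; the nonlinearity order K = 5, memory depth M = 3, and signal statistics are assumptions.

```python
import numpy as np


def mp_matrix(x: np.ndarray, K: int, M: int) -> np.ndarray:
    """Memory-polynomial regression matrix with columns x(n-m)|x(n-m)|^(k-1).

    Samples before n = 0 are taken as zero; column index is m * K + (k - 1).
    """
    n = len(x)
    cols = []
    for m in range(M):
        xm = np.concatenate([np.zeros(m, dtype=complex), x[: n - m]])  # delayed input
        for k in range(1, K + 1):
            cols.append(xm * np.abs(xm) ** (k - 1))
    return np.column_stack(cols)  # shape (n, K * M)


def nmse_db(ref: np.ndarray, est: np.ndarray) -> float:
    """Normalized mean square error in dB (the IB accuracy metric mentioned above)."""
    return float(10 * np.log10(np.sum(np.abs(est - ref) ** 2) / np.sum(np.abs(ref) ** 2)))


rng = np.random.default_rng(5)
K, M = 5, 3
x = 0.3 * (rng.standard_normal(2000) + 1j * rng.standard_normal(2000)) / np.sqrt(2)

# Hypothetical "true" coefficients: strong linear tap, weaker nonlinear and memory terms.
a_true = np.zeros(K * M, dtype=complex)
a_true[0] = 1.0             # m = 0, k = 1
a_true[2] = -0.08 + 0.02j   # m = 0, k = 3
a_true[K] = 0.10 - 0.05j    # m = 1, k = 1
a_true[K + 2] = -0.02j      # m = 1, k = 3

X = mp_matrix(x, K, M)
y = X @ a_true
a_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # solves the overdetermined y = X a
print(f"{X.shape[1]} coefficients, NMSE = {nmse_db(y, X @ a_hat):.1f} dB")
```

Only K·M = 15 coefficients are estimated here, whereas a full Volterra series of the same order and memory would also include the many cross terms between different delays, which is what makes it so costly at the MIMO level.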
1) Volterra Series Based Pruning Models: a) Linear memory-based models: The Wiener model (WM) is a special pruned form of the VoS: a block-structured cascade of a linear time-invariant (LTI) filter and an LUT-based memoryless polynomial function, as shown in [91] and redrawn in Fig. 4(a). In some works, the WM is also referred to as a linear-nonlinear model [40]. In the WM, the input signal passes through the cascaded linear (LTI) and nonlinear (LUT) blocks, which identify and model the static and dynamic distortions of the device under test (DUT). Because of the large number of coefficients involved, the joint correlation of the nonlinear and linear orders does not necessarily yield coherent convergence over the discrete input samples of the WM, as given in equation (2) of Table IV. The solution is an extension of the WM, depicted in Fig. 4(b), which incorporates a parallel LTI filter of opposite magnitude and is known as the augmented WM (AWM) (see its coefficient equation in Table IV). The result is a weakly nonlinear system that reproduces the dynamic MEs at the output with better accuracy.
In contrast, the Hammerstein model (HM) reverses the block order of the WM, as illustrated in Fig. 4(c). This model is often preferred over the WM because its linear block follows the nonlinear LUT basis function [40], [91]. In other words, a minor rearrangement of the HM polynomial blocks can give a discrete linear output based on the a priori nonlinear information of the LUT, comparable to what the augmented WM achieves, as given by equation (4) of Table IV. Another pruned version of the VoS, given in (5), is the Wiener-Hammerstein (W-H) or H-W model, a serial combination of the two: in the three-block W-H orientation an LTI filter precedes the nonlinear LUT, which is followed by another LTI filter, and vice versa for the H-W model, as shown in Fig. 4(d). This topology enables a simple gradient-based learning algorithm to run over several iterations for identifying the nonlinearities of different systems such as filters, microwave mixers, and  [40], [92], [93]. b) Memoryless models: Likewise, LUT-based memoryless VoS models (as illustrated in Fig. 4(e)) are exempt from complex coefficient analytics. However, their memoryless nature precludes their use with realistic dynamic MIMO PAs driven by a digital baseband processing module. For this purpose, Dalbah et al. [94] accounted for the impact of MEs in dynamic nonlinear systems over a 100 MHz bandwidth by deploying a typical memoryless LUT together with a hybrid LUT-based model referred to as the nested LUT sub-model. Since MIMO beam scanning involves impulsive coefficient learning at high data rates together with beamforming algorithms, the resulting computational bottlenecks hinder the iterative runtime at dense memory-depth orders [84], [95]. For this reason, additional pruning techniques for the correlated polynomials of nonlinear memory systems have been proposed to meet 5G-NR signal-quality requirements.
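The difference between the Wiener, Hammerstein, and Wiener-Hammerstein orderings can be seen by composing the same two primitives in different orders. In this sketch, a causal FIR filter stands in for the LTI block and a short odd-order polynomial stands in for the LUT-based static nonlinearity; the coefficients and function names are hypothetical and are not taken from the models in Table IV.

```python
import numpy as np

def lti(x, h):
    """Causal FIR filter (the LTI block), output truncated to the input length."""
    return np.convolve(x, h)[: len(x)]

def static_nl(x, c=(1.0, -0.1)):
    """Memoryless odd-order polynomial standing in for the LUT block: sum_k c_k x|x|^(2k)."""
    return sum(ck * x * np.abs(x) ** (2 * k) for k, ck in enumerate(c))

def wiener(x, h):                   # LTI -> NL
    return static_nl(lti(x, h))

def hammerstein(x, h):              # NL -> LTI
    return lti(static_nl(x), h)

def wiener_hammerstein(x, h1, h2):  # LTI -> NL -> LTI
    return lti(static_nl(lti(x, h1)), h2)

rng = np.random.default_rng(2)
x = 0.5 * (rng.standard_normal(256) + 1j * rng.standard_normal(256))
h = [1.0, 0.3, -0.1]
y_w, y_h = wiener(x, h), hammerstein(x, h)
```

With a single-tap filter (`h = [1.0]`) both orderings collapse to the same memoryless model; with real memory they produce different outputs, which is why the block order matters for identification.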
c) Nonlinear memory-based models: Becerra et al. [96] applied different variants of the memory-polynomial (MP) model to commercially available PA designs. The MP model, another subset of VoS kernels, is notable for the high computational complexity of the cross-correlation matrix of instantaneous complex signals. For this reason, cross-term effects are usually not included in the basic MP model, as given by the single-basis equation (6) of Table IV. In this regard, several algorithms from the family of greedy pursuit techniques have been proposed to reduce the cross-correlation complexity at high orders without losing the nonlinearity prediction for the leading and lagging terms of the IB carrier signal [97], [98]. A related research group extended this discussion to the sparse MP (SMP) model of the VoS, emphasizing the doubly orthogonal matching pursuit (DOMP), a modification of the Gram-Schmidt orthogonalization process [44], [81], [99]. The SMP formulation benefits signals with high sampling rates or wideband modulations. Moreover, it provides the freedom to model the discrete samples of past iterations and the cross-term MPs of the current model. Within this paradigm, the generalized MP (GMP) model overcomes the limitations of the basic MP model by adding basis functions, including leading and lagging cross-term MEs [45], [100], as shown in Fig. 4(f) and given in equation (7) of Table IV. The GMP model usually sits between the full VoS and the basic MP model, balancing performance against computational complexity. In addition, the poor matrix conditioning of the MP at large nonlinear orders is also well addressed by the GMP for multi-tone signals, such that when multi-carrier signals of the same waveform produce variable outputs, one model still converges consistently on the training signals of various communication waveforms [101].
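A GMP basis can be assembled by adding lagging and leading envelope cross-terms to the aligned MP terms. The sketch follows one common GMP parameterization (aligned terms $x(n-l)|x(n-l)|^k$, lagging terms $x(n-l)|x(n-l-m)|^k$, leading terms $x(n-l)|x(n-l+m)|^k$); the exact indexing of equation (7) in Table IV may differ, and the parameter names `Ka, La, Kb, Lb, Mb, Kc, Lc, Mc` are assumptions for illustration.

```python
import numpy as np

def shift(x, d):
    """Return x(n - d) with zero padding; a negative d gives a leading (future) sample."""
    N = len(x)
    out = np.zeros(N, dtype=complex)
    if d >= 0:
        out[d:] = x[: N - d]
    else:
        out[: N + d] = x[-d:]
    return out

def gmp_basis(x, Ka, La, Kb, Lb, Mb, Kc, Lc, Mc):
    cols = []
    for k in range(Ka):                      # aligned terms (the basic MP part)
        for l in range(La):
            xl = shift(x, l)
            cols.append(xl * np.abs(xl) ** k)
    for k in range(1, Kb + 1):               # lagging envelope cross-terms
        for l in range(Lb):
            for m in range(1, Mb + 1):
                cols.append(shift(x, l) * np.abs(shift(x, l + m)) ** k)
    for k in range(1, Kc + 1):               # leading envelope cross-terms
        for l in range(Lc):
            for m in range(1, Mc + 1):
                cols.append(shift(x, l) * np.abs(shift(x, l - m)) ** k)
    return np.stack(cols, axis=1)

rng = np.random.default_rng(3)
x = rng.standard_normal(512) + 1j * rng.standard_normal(512)
X_mp = gmp_basis(x, 5, 3, 0, 0, 0, 0, 0, 0)    # cross-terms disabled: plain MP
X_gmp = gmp_basis(x, 5, 3, 2, 3, 2, 2, 3, 1)
```

The column count grows as `Ka*La + Kb*Lb*Mb + Kc*Lc*Mc` (15 for the plain MP case above, 33 for the GMP case), which shows why GMP sits between the basic MP and the full VoS in complexity.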
In addition, for highly nonlinear systems with increased memory depth, the GMP algorithm is partitioned (so-called pruning) over a wide range of input amplitudes for multi-carrier CL-DPD functions [39], [64], [102]. Table V summarizes the performance of different VoS pruning models presented in the recent literature.
• Complexity Discussion − Although the VoS is considered one of the most promising approaches to PA behavioral modeling, owing to its versatile family of pruning algorithms and its complex geometric channel modeling, which yields realistic coefficient estimates across an array of nonlinear PAs, its high-quality linearization of time-varying, strongly nonlinear systems generally comes with large computational complexity. This complexity can be broadly classified into three categories: running complexity, identification complexity, and adaptation complexity [103]. In the first category, the model parameterization goes through an iterative procedure until optimal convergence; this becomes more challenging when the basis function matrix must be recalculated because the signal direction changes in real-world transmission [104]. The second and third categories are the most resource-intensive, since the number of model coefficients and training samples determines the FLOPs needed to compute the model output. To avoid such intensive computation, it is advisable to collect the vector components of the DPD input data symbols into a covariance matrix with efficient matrix-inversion properties [104], [105].
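One way to read the covariance-matrix recommendation is through the normal equations: instead of storing the full $N \times P$ basis matrix, the identification step can accumulate the $P \times P$ covariance $\mathbf{R} = \mathbf{X}^H\mathbf{X}$ and the vector $\mathbf{r} = \mathbf{X}^H\mathbf{y}$ block by block (roughly $NP^2$ complex multiply-accumulates) and then solve a small $P \times P$ system (on the order of $P^3$ operations). The sketch below assumes an MP basis and the hypothetical helper `ls_by_covariance`; it is an illustration, not the procedure of [104] or [105].

```python
import numpy as np

def mp_basis(x, K, M):
    """Memory-polynomial matrix with columns x(n-m)|x(n-m)|^(k-1)."""
    N = len(x)
    cols = []
    for m in range(M + 1):
        xm = np.concatenate([np.zeros(m, dtype=complex), x[:N - m]])
        cols += [xm * np.abs(xm) ** (k - 1) for k in range(1, K + 1)]
    return np.stack(cols, axis=1)

def ls_by_covariance(X, y, block=1024):
    """Solve y = X a via R a = r, accumulating R = X^H X and r = X^H y in row blocks."""
    P = X.shape[1]
    R = np.zeros((P, P), dtype=complex)   # size P x P, independent of the record length N
    r = np.zeros(P, dtype=complex)
    for start in range(0, X.shape[0], block):
        Xb, yb = X[start:start + block], y[start:start + block]
        R += Xb.conj().T @ Xb
        r += Xb.conj().T @ yb
    return np.linalg.solve(R, r)

rng = np.random.default_rng(4)
x = 0.3 * (rng.standard_normal(6000) + 1j * rng.standard_normal(6000)) / np.sqrt(2)
X = mp_basis(x, K=3, M=2)                                  # P = 9 coefficients
a_true = rng.standard_normal(9) + 1j * rng.standard_normal(9)
y = X @ a_true                                             # noise-free synthetic output
a_hat = ls_by_covariance(X, y)
```

The trade-off is numerical: forming $\mathbf{R}$ squares the condition number of $\mathbf{X}$, which is one motivation for the orthogonalized bases discussed next.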
• Critical Analysis − Most representative works on sequential W/H and memoryless equalization models capture linear MEs and yield stable learning convergence with an inherently low computational burden, which suits real-time systems with limited computational resources. Moreover, recent advancements in such models, namely the development of the Wiener structure for online cancellation of TRX self-interference and the flexible combination of ANNs with W/H models, have paved the way for recursive coefficient adaptation in full-duplex strategies, making them an enabling linearization method for both contemporary 5G and future communications [106], [107].
− As highlighted in previous sections, to keep the computing algorithm stable, advanced pruning algorithms should be adopted to reduce the basis-function orthogonalization effort to a tolerable level. To minimize prohibitively high computational complexity in practical real-time applications, it is intuitive to deploy a self-orthogonalized cooperative version of the basis function, as it offers powerful and smooth MP convergence [108]. However, dedicated self-orthogonalized learning can also be computationally intensive for the DPD forward path. To avoid such computational effort, it is advisable, whenever applicable (especially in MMW array linearization), to combine the orthogonalization transformation with an inverse covariance matrix to guarantee smooth convergence in the DPD main path [104].
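The greedy-pursuit pruning mentioned above can be sketched with plain orthogonal matching pursuit (OMP): at each step the basis column most correlated with the current residual is added, and the output is re-projected onto all selected columns. This is a generic OMP, not the DOMP of [44], [81], [99]; the function name `omp_prune` and the synthetic dictionary are assumptions for illustration.

```python
import numpy as np

def omp_prune(X, y, n_keep):
    """Greedily select n_keep columns of X; return support, coefficients, residual norms."""
    norms = np.linalg.norm(X, axis=0)
    support, residuals = [], []
    r = y.copy()
    coef = np.zeros(0, dtype=complex)
    for _ in range(n_keep):
        score = np.abs(X.conj().T @ r) / norms     # normalized correlation with the residual
        if support:
            score[support] = -np.inf               # never pick a column twice
        support.append(int(np.argmax(score)))
        Xs = X[:, support]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)  # orthogonal projection on the support
        r = y - Xs @ coef
        residuals.append(np.linalg.norm(r))
    return support, coef, residuals

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 8)) + 1j * rng.standard_normal((100, 8))
a_true = rng.standard_normal(8) + 1j * rng.standard_normal(8)
y = X @ a_true
support, coef, res = omp_prune(X, y, n_keep=8)
```

Because each step projects onto a growing subspace, the residual norm can only decrease; in a DPD setting one would stop once the NMSE gain per added term becomes negligible.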

2) Machine Learning-Based Models:
a) Neural network models: The nonlinear polynomial convergence of the VoS becomes non-ideal for continuously varying MMW signals. Here, an ANN model takes over, using its universal approximation ability to counter nonlinear distortion with highly adaptive integration at MMW frequencies [75], [109]. In NN models, the polynomial convergence is processed through multiple layers, including input, output, and intermediate hidden layers (HLs); see Fig. 4(g). For real-time behavioral training, the complex baseband input signal is split into real and imaginary data streams (the so-called neurons), which are fed to the succeeding HL ($o_k^{(l)}$), where the synaptic weights ($w_{jk}$) and biases ($b_k$) are trained for modeling accuracy, as given in (8) and (13). The output is then validated at the DUT to observe the dynamic convergence response over every epoch, as illustrated in Fig. 4.
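A real-valued time-delay NN of the kind described above can be sketched as follows: the I and Q components of $x(n), \ldots, x(n-M)$ form the input layer, one hidden layer computes $o_k = f(\sum_j w_{jk} o_j + b_k)$ with $f = \tanh$, and a linear output layer returns the I and Q of the PA output. The layer sizes, the tanh activation, and the names `tdl_inputs`, `init_params`, and `forward` are assumptions; training (e.g., by backpropagation over epochs) is omitted.

```python
import numpy as np

def tdl_inputs(x, M):
    """Tapped-delay-line input layer: [Re, Im] of x(n), x(n-1), ..., x(n-M)."""
    N = len(x)
    cols = []
    for m in range(M + 1):
        xm = np.concatenate([np.zeros(m, dtype=complex), x[:N - m]])
        cols += [xm.real, xm.imag]
    return np.stack(cols, axis=1)           # shape (N, 2(M+1))

def init_params(n_in, n_hidden, rng):
    return {"W1": rng.standard_normal((n_in, n_hidden)) / np.sqrt(n_in),
            "b1": np.zeros(n_hidden),
            "W2": rng.standard_normal((n_hidden, 2)) / np.sqrt(n_hidden),
            "b2": np.zeros(2)}

def forward(p, U):
    H = np.tanh(U @ p["W1"] + p["b1"])      # hidden neurons: o_k = f(sum_j w_jk o_j + b_k)
    O = H @ p["W2"] + p["b2"]               # linear output neurons: [I_out, Q_out]
    return O[:, 0] + 1j * O[:, 1]

rng = np.random.default_rng(6)
x = 0.3 * (rng.standard_normal(1000) + 1j * rng.standard_normal(1000))
U = tdl_inputs(x, M=2)
y_hat = forward(init_params(U.shape[1], 10, rng), U)   # untrained network output
```

Splitting the complex signal into real-valued I/Q neurons lets the network use ordinary real arithmetic, which is the main appeal of the RVTDNN structure discussed later in this subsection.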
In general, the number of training epochs (e), i.e., complete passes over the training data in NN terminology, is predetermined for scaling the complex weights and biases of the HL neurons $N_l$. A particular activation function ($a_k^{(l)}$) is defined to process the synaptic convergence of the HL neurons, represented by f(·) in (8); see [110] for an overview of different activation functions. The model that converges fastest within the fewest epochs is considered the best, with less distortion and error [111], [112]. One of the enabling features of NNs over MP-based VoS is their lower modeling complexity: in MP, K-times coefficient extraction and inverse functions are required as the number of MIMO transmission paths (K × K nonlinear orders) increases, as noted by the authors of [111] as future work. In contrast, a standalone NN model can model MIMO TX imperfections through its universal approximation property, as proposed in [109], [113]. Moreover, MP-based models also lose their ability to characterize nonlinearities other than those of PAs, such as DC offset and I/Q imbalance, as evaluated in the case study of [114].
However, for inverse NN models, a fast and steady convergence rate is achieved only after multiple HLs, each comprising three processing stages (summation, multiplication, and activation), which complicates training on the input signal. The authors of [115] proposed an LUT-based predesigned filter in a convolutional NN (CNN) model to extract the complete information of the complex (I/Q) input basis functions. This approach significantly eases the burden of neuron modeling on the succeeding intermediate layer, a single fully connected layer. The training overhead is also greatly reduced because the synaptic weights and biases of the input neurons are shared by the convolution structure.
Meanwhile, modern beam-varying integrated array structures have encouraged online system training with real-valued time-delay NN (RVTDNN) weights. Recent research has presented sophisticated RVTDNN structures that balance the trade-off between the number of epochs and the training time, avoiding prohibitive real-time testing of communication signals [113], [114], [115], [116]. This trade-off is well managed by combining the online and offline training of the input signal data through a deep NN (DNN) model [117], [118]. In DNN modeling, all current and past neuron samples of the input signal are first selected offline. The optimal algorithm then extracts the most suitable combination of envelope-dependent terms for PA modeling in real-time online training [118], [119]. This approach also avoids the signal-processing bottleneck of raw baseband data samples accumulated after continuous coefficient iterations [84]. Nevertheless, online input coefficient modeling with low training overhead should remain the preferred ANN-based behavioral approach for the PD synthesis of MIMO-oriented PAs [109]. Table VI highlights the performance metrics of recent NN models included in this survey.
• Complexity Discussion − ANN models are more effective than the VoS at targeting high-order polynomials because their adaptive nature and approximation capacity allow one-step compensation of the three most common TRX hardware impairments, namely I/Q imbalance, DC offset, and PA nonlinearity, as proposed in [114], [120]. However, the complexity of an ANN during model extraction depends strongly on the number of synaptic weights and biases. Pruning methods are therefore commonly used to accelerate training and reduce computational intensity, mainly by removing unimportant weights from each NN layer [120]. − Developing complex-valued NNs imposes additional design constraints, which lead to more involved training rates and overfitting-prone optimization problems than the commonly deployed real-valued ANN frameworks based on RVTDNN parameters. Numerical results in [121] show that each complex-valued multiplication costs four real-valued FLOP multiplications, compared with the single multiplication of its real-valued counterpart. Since research in this area is still in its infancy, the generalization problem and computational complexity of complex-valued NNs should be properly addressed in prospective ANN-related works. • Critical Analysis − The literature on real-valued ANN implementations forms the basis for integrating ANNs into the different transmission modes of 5G integrated communications. Most designs decompose the refined I/Q baseband output signal envelope into the NN's input layer, which amounts to a concatenated matrix-vector multiplication. Once the neurons are formed and the modeling data (based on the chosen activation function) are established, the optimal NN hyperparameters are derived iteratively until the model converges.
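Weight pruning of the kind described in the complexity discussion can be illustrated with simple magnitude pruning: the smallest-magnitude weights of a layer are set to zero and a mask records which connections survive. This is only one pruning criterion among many and is not necessarily the method of [120]; the function name `magnitude_prune` is hypothetical.

```python
import numpy as np

def magnitude_prune(W, fraction):
    """Zero out the `fraction` of weights with the smallest magnitude; return weights and mask."""
    k = int(round(fraction * W.size))
    if k == 0:
        return W.copy(), np.ones(W.shape, dtype=bool)
    thresh = np.sort(np.abs(W), axis=None)[k - 1]   # k-th smallest magnitude
    mask = np.abs(W) > thresh                       # keep only weights above the threshold
    return W * mask, mask

rng = np.random.default_rng(7)
W = rng.standard_normal((10, 20))                   # e.g., one hidden-layer weight matrix
W_pruned, mask = magnitude_prune(W, 0.5)
```

After pruning, the remaining weights are usually fine-tuned for a few epochs; the multiply count of the layer drops in proportion to the number of zeroed connections if the implementation skips them.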
b) SVM-based models: Like the ANN, the support vector machine (SVM) is a subset of machine learning (ML) recently adopted for PA behavioral modeling because of its outstanding features, e.g., preventing model overfitting (often associated with ANNs) and requiring less CPU training time. Support vector regression (SVR) is one of the building blocks of SVM modeling and provides improved behavioral modeling and distortion mitigation for dynamic nonlinear systems. The core idea of this baseband model is to feed the real and imaginary parts of the input signal into two separate SVR machines that build models for the output signal, as shown in Fig. 4(h). Recent research has studied SVR modeling at two levels: large-signal transistor-level modeling [122] and nonlinear circuit (PA)-level modeling [123]. The transistor-level approach predicts the optimal load impedance conditions across the Smith chart, whereas the PA-level approach determines the optimal dynamic nonlinear model parameters.
Transistor-level SVR modeling: The first approach integrates the SVR method into the behavioral modeling framework of an RF power transistor by considering all possible loading conditions, measuring the device's incident and reflected waves, which directly or indirectly affect the power and efficiency contours across the Smith chart [124].
In particular, for a frequency-domain system, the RF transistor under all load impedance and harmonic conditions can be periodically excited, giving incident waves $A_{qn}$ and reflected waves $B_{pm}$, with the model describing function $f_{pm}$ given in (9) [124], where $\beta_{pm,i}$ and $b_{pm}$ denote the model coefficients and bias terms, respectively. By feeding the real and imaginary parts of the incident waves into the SVR machines, the behavioral model can be trained to predict the transistor waveforms.
However, the SVR model proposed in [124] can only be employed for a single frequency band and ignores predictions at adjacent frequencies. To address this, Cai et al. [125] extended [124] to broadband SVR-based behavioral modeling. Their approach includes the frequency information f as an additional input, adjusting (9) as given in (14). In addition, the model accounts for the sensitivity to DC offset imperfections across the frequency band. The principle of this behavioral approach is simple and makes it easy to automate the kernel K information at the scattered-wave output, although its transistor waveform prediction depends heavily on the range of input power levels and the complex amplitudes of the incident waves.
PA-level SVR modeling: This solution integrates an advanced time-delay SVR modeling method into traditional SVR solutions to accelerate CPU training, especially for large-sample PA behavioral modeling in wideband systems. The most effective technique for obtaining the SVR model in a short time is the grid-search method [123], which empirically and automatically determines the best combination of hyperparameters for training the SVR model without extensive testing and validation. Fig. 4(h) shows a general block diagram of the SVR behavioral model, in which the real and imaginary parts of the input signal are separated and fed into their respective SVR machines. The basic SVR-based PA model uses all present and past amplitude and phase terms of the time-delayed input signal, feeding the real and imaginary parts to the SVR trainers to build the PA behavioral model. The corresponding kernel functions, built from a large number of training samples in the basic or time-delay SVR models, are then used to realize a DPD mechanism [123], [126], [127]. According to [123], the SVR nonlinear function f(x) for PA modeling can be written as

$$f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i\, K(\mathbf{x}, \mathbf{x}_i) + b,$$

where $\mathbf{x}_i$ are the $N$ training (support) vectors, $\alpha_i$ represents the model weights, and $K(\cdot)$ and $b$ denote the kernel function and the bias term of the radial-basis-function-based SVR method, respectively.
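The prediction side of the formula above is easy to write down once the support vectors, weights $\alpha_i$, and bias $b$ are known. The sketch evaluates two RBF-kernel SVR machines, one for the I part and one for the Q part of the PA output. It assumes the trained parameters are supplied from elsewhere: for instance, scikit-learn's `SVR` exposes them after fitting as `support_vectors_`, `dual_coef_`, and `intercept_`, and a grid search over $(C, \varepsilon, \gamma)$ would select the hyperparameters. The helper names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2) for every pair of rows of A and B."""
    d2 = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))     # clip tiny negative round-off

def svr_predict(Xq, sv, alpha, b, gamma):
    """f(x) = sum_i alpha_i K(x, x_i) + b for each query row of Xq."""
    return rbf_kernel(Xq, sv, gamma) @ alpha + b

def svr_pa_predict(Xq, machine_i, machine_q):
    """Two real-valued SVR machines produce the I and Q parts of the PA output."""
    return svr_predict(Xq, **machine_i) + 1j * svr_predict(Xq, **machine_q)

# Toy parameters (not trained): two 2-D support vectors per machine.
sv = np.array([[0.0, 1.0], [1.0, 0.0]])
machine_i = {"sv": sv, "alpha": np.array([2.0, -1.0]), "b": 0.5, "gamma": 0.5}
machine_q = {"sv": sv, "alpha": np.array([0.3, 0.3]), "b": 0.0, "gamma": 0.5}
y_hat = svr_pa_predict(sv, machine_i, machine_q)
```

Prediction cost scales with the number of support vectors, which is why large training sets make SVR-based DPD expensive unless the sample set is reduced.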
To account for PA memory effects, the input features must include the present and past amplitude and phase terms of the input signal. The works in [126] and [127] also include higher-order exponential terms of the amplitude and phase of the input signal, referred to as the augmented time-delay SVR model. The relationship between the received output signal y(n) and the input signal x(n) is then given in (15) and (16). Finally, the real-valued and complex-valued functions in (15) and (16), respectively, are separated into a real part, Re{x(n)} → Re{y*(n)}, and an imaginary part, Im{x(n)} → Im{y*(n)}, mapping the current input signal to the output signal as expressed in (10) and (11) and shown in Fig. 4.
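A feature matrix for an augmented time-delay SVR could be assembled as below: for each delay $m = 0, \ldots, M$, the amplitude powers $|x(n-m)|^k$ for $k = 1, \ldots, K$ are combined with the phase of $x(n-m)$. Encoding the phase as cosine and sine (to avoid the $2\pi$ wrap) and using integer amplitude powers as the "higher-order terms" are assumptions of this sketch; [126], [127] may define the terms differently. The real and imaginary parts of the measured output then serve as targets for the two SVR machines.

```python
import numpy as np

def atd_features(x, M, K):
    """Hypothetical augmented time-delay features: |x(n-m)|^k plus cos/sin of its phase."""
    N = len(x)
    feats = []
    for m in range(M + 1):
        xm = np.concatenate([np.zeros(m, dtype=complex), x[: N - m]])   # x(n-m)
        amp, ph = np.abs(xm), np.angle(xm)
        feats += [amp ** k for k in range(1, K + 1)]    # |x(n-m)|, |x(n-m)|^2, ..., |x(n-m)|^K
        feats += [np.cos(ph), np.sin(ph)]               # phase without the 2*pi discontinuity
    return np.stack(feats, axis=1)                      # shape (N, (M+1)(K+2))

def split_targets(y):
    """Real and imaginary SVR targets for the I and Q machines."""
    return y.real, y.imag

rng = np.random.default_rng(8)
x = rng.standard_normal(300) + 1j * rng.standard_normal(300)
F = atd_features(x, M=2, K=3)
y_re, y_im = split_targets(x * (1 - 0.1 * np.abs(x) ** 2))   # toy output signal
```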
Table VII provides some comparisons in terms of performance and limitations of the aforementioned SVR-based behavioral modeling approaches.
• Complexity Discussion − Overfitting often affects ANN modeling and requires additional time-consuming computation to avoid. SVR-based models, in contrast, provide well-designed scalability by defining a soft-margin ε boundary that shields the nonlinear function from approaching the sensitive region. As a result, an optimal model can be obtained quickly with marginal complexity in typical narrowband nonlinear mapping scenarios. However, the complexity of such a scheme becomes excessive for wideband 5G modulations. Gradient descent algorithms with lower memory consumption can therefore be an approximate solution for wideband adaptive applications. • Critical Analysis − As discussed in the previous item, the main goal of SVM-enabled behavioral optimization, with either a transistor-level or a PA-level application model, is to draw suitable training samples from the ε-insensitive zone. Hence, finding an optimal range of ε is crucial for normalizing large sample sets. With ANN models, longer training loops with overfitting issues are expected, which is the price of their high modeling fidelity for real-time systems. SVR, by contrast, offers flexible model extraction and accommodates both standalone and multi-transistor PA devices. Nevertheless, high-complexity real-time behavioral prototyping and validation through SVM still have to strike a compelling trade-off between performance and fitting complexity.
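The ε margin discussed above comes from the ε-insensitive loss that SVR minimizes (together with a regularization term): errors inside the tube of half-width ε cost nothing, and larger errors are penalized linearly. The short sketch shows this loss for a few residuals; the function name is illustrative.

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the +/-eps tube; grows linearly with the excess error outside it."""
    return np.maximum(0.0, np.abs(np.asarray(y_true) - np.asarray(y_pred)) - eps)

losses = eps_insensitive_loss([0.0, 0.0, 0.0], [0.05, -0.3, 0.1], eps=0.1)
```

A larger ε leaves more samples inside the tube, so fewer of them become support vectors and the model is cheaper to evaluate, at the cost of a coarser fit. This is the trade-off behind choosing the range of ε.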

B. Waveform Processing at Transmission and Observation Path: From MIMO Perspective
A typical research and development environment for a baseband signal processing module encapsulates a transmission path and a feedback/observation (receiver) path, as illustrated in Fig. 2. The digital baseband signal is fed to the vector signal generator (VSG) for modulation, which incorporates a digital-to-analog converter (DAC), intermediate frequency (IF)-to-RF (desired band) up-conversion through passive mixers and a local oscillator (LO), and bandpass filtering. The RF analog section then provides uniformity against phase and amplitude imbalance of the DUT, which comprises PA and antenna co-designs for carrier amplification and signal beamforming. Meanwhile, the vector signal analyzer (VSA) forms the feedback loop, performing RX digitization through frequency down-conversion and an analog-to-digital converter (ADC). However, practical TRX modules exhibit flaws caused by erroneous test stimuli of the data converters, instability, and imperfections of the feedback and signal processing equipment (VSG/VSA). These discontinuities introduce marginal inconsistencies between simulated and measured results [22], [25]. Accordingly, recent research works report energy- and cost-efficient data converter performance under different operating conditions. For example, in [128], the performance constraints of high-precision ADCs are removed by defining PD codes at the DAC input, which enables DAC/ADC co-testing while overcoming the sensitivity to large feedback overheads. In contrast, [129] proposes a fast heuristic linearity characterization of a standalone PA that effectively adjusts the amplitude synchronization and phase coherence of the IB signal while considering the practical constraints of spectral regrowth in up to five channels of the DUT.
This solution can be harnessed to improve the nonlinear characteristics of a multichannel ULA under abrupt changes in the amplitude/phase beamforming weights, which would otherwise involve signaling overhead and computational complexity [130].
This section overviews recent studies of baseline TRX modules for PD operation that suffer from thorny performance evaluation problems, especially under high-frequency integration. It starts with the sub-components of the transmission path that handle the baseband-generated PD signal while alleviating DAC/ADC deficiencies, followed by APD sub-components such as the PS and VGA, leading to efficient PD linearization of MIMO systems.

1) DAC and ADC Non-Idealities:
The PD operation on truncated digital signals relies on the mathematical model defined in the software-based baseband workstation. These algorithms drive the built-in black-box (VSG/VSA) DPD structure. Recent research efforts aim to reduce the high power consumption of black-box data processors, whose processing speeds approach the teraflop range in order to capture the nonlinear distortion of wideband modulated signals [66]. High sampling rates are handled by high-speed converters, which consume considerable power. Conversely, reducing the sampling rate is risky for the main spectrum because aliasing may occur, as shown in Fig. 5(a), degrading the modeling accuracy of the down-sampled signals and the SNR.
On the other hand, a software-defined radio (SDR)-based TRX with an FPGA serves as an alternative solution that alleviates the high system cost and hardware integration of commercial black-box TRX drivers. Kumar et al. [131] developed a DPD platform with a system-on-chip (SoC) embedded SDR-based TRX. They showed that the real-time acquisition of PA inverse-model coefficients in the SDR-based TRX converges faster than in commercial DPD TRXs owing to the high-speed embedded SoC processors. Complexity and power consumption can also be reduced by an undersampling transmission-observation receiver with a single ADC on a commercial FPGA platform, as shown in [132]. That paper compares the performance of two real-time DPD test cases with FPGA-based PD engines and advocates a parallel-processing-based DPD system with fewer coefficients, along with envelope-assisted coefficient modeling at the RX. Although FPGA-based DPD coefficient extraction requires fewer hardware resources and less power and time, the number of bits cannot be increased arbitrarily over multiple iterations to enhance DPD performance because of the truncated bits in each DSP unit.
However, FPGA-based TRX modules built on high-resolution DACs/ADCs (≥ 8-bit with ≥ 150 MFLOPS) are well known to better handle high-rate feedback acquisition for wideband (CA) nonlinear systems [66]. Besides the high power consumption entailed by high bit resolution, separately coding and assembling each coefficient sample in an FPGA is a complex and laborious task. Consequently, FPGA accuracy falls short in large-scale MIMO operation, and commercial high-volume TRX instruments remain widely deployed because of their high compatibility and reliability in both offline and online testing of wideband MMW radio signals.
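For context on the bit-resolution figures above, the ideal quantization SNR of an N-bit converter driven by a full-scale sine wave is approximately 6.02N + 1.76 dB, and the effective number of bits (ENOB) inverts this relation for a measured SINAD. The helpers below are a generic textbook sketch, not a model of the converters in [66].

```python
import math

def ideal_sqnr_db(bits):
    """Ideal quantization SNR of a full-scale sine wave: about 6.02*N + 1.76 dB."""
    return 6.02 * bits + 1.76

def ideal_sqnr_db_exact(bits):
    """Same quantity without rounding: 20*log10(2**N * sqrt(1.5))."""
    return 20 * math.log10(2 ** bits * math.sqrt(1.5))

def enob(sinad_db):
    """Effective number of bits from a measured SINAD (inverse of the rule above)."""
    return (sinad_db - 1.76) / 6.02

sqnr_8bit = ideal_sqnr_db(8)      # about 49.9 dB for an 8-bit converter
```

Each extra bit buys roughly 6 dB of quantization SNR, which is why the observation path resolution bounds the NMSE and ACPR that a DPD can verify.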
One such approach contains the IB aliasing effects and the quintuple sampling rates of the digital processors while maintaining an acceptable SNR without compromising DPD linearization performance [29]. It is worth noting that low-frequency spectra are less susceptible to aliasing: a 10 MHz bandwidth signal requires only a 50 MHz (5×) clock rate for DPD implementation, as depicted in Fig. 5(b), at which the system cost and power consumption are bearable. The authors of [133] adopted a time-interleaved approach with multiple DPD baseline components for dual- and tri-band signal linearization, where the modulation bandwidths of the two bands are 3 MHz and 10 MHz, respectively. These narrowband signals do not force the data converters to operate at high clock rates, as the 5× sampling rates (15/50 MHz) fall well within the maximum bandwidth of the DPD converters. The authors of [29] extended the time-interleaved structure to jointly alleviate aliasing distortion and high DPD sampling rates. A band-limited DPD is introduced to focus on IB linearization and limit the 5× sampling rates of the DAC/ADC. In addition, the authors use bandpass filters to restrict OOB distortions, as in [134]. The input samples are distributed among parallel (sliced) DPD blocks to control the clock rate. Although the authors reduced power consumption by scaling the sampling rate down from 5× to 1.5×, the hardware complexity and cost increased, making the implementation questionable for distributed TXs.
The authors of [135] considered reduced sampling rates and aliasing effects for continuous MMW signals in the 5G-NR FR2 band. The paper proposed a novel method of reducing the sampling bandwidth at the baseband level. Because the output nonlinearity is predetermined at baseband, the oversampling rate of the signal entering the transmission path is relatively low, which significantly eases the burden on the TRX digital processors. In addition, fifth-order VoS kernels with a sinc function are compiled in the baseband to capture the nonlinear cross-terms of IB and near-OOB distortions, whose multiplications greatly increase the computational complexity. The same research group extended this idea in [136] to ease the algorithmic complexity, assembling a linear-decomposition model in the baseband that covers only the IB signal's time-aligned and leading cross-term memory polynomials. Their validation showed satisfactory performance of the linearly decomposed cross-term memory samples, with fewer polynomials and lower computational complexity than the former Volterra-based model. However, extra polynomial functions are required to recover input signal terms lost in the bandpass filters, which reduces the effectiveness of undersampled DPD. Hence, an NN model provides a viable solution to this problem [117].
In 2017, Analog Devices proposed an on-chip DPD circuit for 4G SBSs, paving the way for pre-5G mMIMO system design [137]. Such on-chip DPD integration has opened new research directions for leveraging the cost benefits of scaled semiconductor technology [131]. However, the massive distribution of IC connections in MIMO systems introduces unpredictable nonlinearities across all TRX subsystems, making it necessary for test signals to avoid sine-wave offsets [128]. Conversely, causal dynamic drifts of the supply voltage degrade the desired ACPR and error vector magnitude (EVM) of the modulated signal [138], [139]. This underlines the importance of robust DPD structures that adapt to inevitable dynamic effects and waveform-magnitude discontinuities in distributed TRX systems.
2) Effects of RF Sub-Components: For large arrays, it is technically cumbersome to deploy a dedicated DPD for each TX chain. Therefore, modern arrays use fewer digital streams than TX chains [140], [141]. However, variations in the load conditions of multiple SAs make it difficult for standalone ADPD drivers to maintain optimum efficiency and linearity across an arbitrary number of identical PAs. In addition, load modulation of the antenna impedance causes beam tilting relative to the desired direction. These challenges make it necessary to introduce a hybrid form of PD that shares the arduous PD tasks more evenly between digital and analog circuitry [130].
Fig. 6. Spatial linearization in the array far field when the SLL has a −15 dB threshold [57]. The DPD is adopted to minimize the ACP at an incidence angle with no impact on main-lobe power.
With their small form factor and low power consumption, RF sub-components can be embedded before each PA of the MIMO array. The VGA and PS are the two key APD components: they provide wide gain control and accurate phase tuning to minimize the AM-AM and AM-PM errors of the beam-steering signal [142]. This concept can be understood as follows. 1) A good PS should handle a wide range of beam-steering rotations; if the PS satisfies the incidence-angle requirements of different channels, it reduces phase (AM-PM) errors and supports user mobility [143]. 2) The VGA maintains constant gain (AM-AM) characteristics at different phase angles, providing amplitude balancing that prevents the PS from tilting the beam away from the incidence angles. In essence, the PS and VGA are the final components in a TX chain to process the DPD data streams for optimum beamforming weights. In addition, phased-array DPD is used as an add-on to cancel the nonlinearities of the PA outputs in the desired direction. In this way, the DPD-APD collaboration jointly optimizes the beam gain and ACPR in favor of the desired steering angle, as illustrated in Fig. 6. In other words, DPD behavioral modeling counters the PA nonlinearities caused by the strong interference of load modulation, while the APD sub-components ensure high beamforming efficiency by providing appropriate array gain toward different steering angles. Consequently, adequate array gain suppresses interference from unknown directions and prevents EVM degradation, which is essential for high-order constellation modulation schemes [144]. Because they are fabricated on silicon, these analog components have shown excellent area efficiency in recent works.
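The roles of the PS (phase) and VGA (amplitude) can be seen in the array factor of a uniform linear array. In the sketch below, each element weight is $a_n e^{-j2\pi (d/\lambda) n \sin\theta_0}$: the PS sets the progressive phase that steers the beam to $\theta_0$, and the VGA sets $a_n$. An 8-element, half-wavelength array and the small per-element phase errors are hypothetical values chosen to show how PS errors reduce the gain toward the steering angle.

```python
import numpy as np

def steering_weights(n_el, theta0_deg, amps=None, d_over_lambda=0.5):
    """PS phases steer the beam to theta0; VGA amplitudes default to a uniform taper."""
    n = np.arange(n_el)
    a = np.ones(n_el) if amps is None else np.asarray(amps, dtype=float)
    return a * np.exp(-1j * 2 * np.pi * d_over_lambda * n * np.sin(np.deg2rad(theta0_deg)))

def array_factor(w, theta_deg, d_over_lambda=0.5):
    """AF(theta) = sum_n w_n exp(j 2 pi (d/lambda) n sin(theta)) for a uniform linear array."""
    n = np.arange(len(w))
    phase = 2 * np.pi * d_over_lambda * np.outer(np.sin(np.deg2rad(theta_deg)), n)
    return np.exp(1j * phase) @ w

w = steering_weights(8, 30.0)
theta = np.linspace(-90, 90, 1801)
af = np.abs(array_factor(w, theta))
peak_deg = theta[np.argmax(af)]                         # main lobe near 30 degrees
ps_err = np.deg2rad(np.array([0, 3, -2, 4, -1, 2, -3, 1]))   # hypothetical PS phase errors
gain_ideal = abs(array_factor(w, [30.0])[0])            # equals 8 for ideal weights
gain_err = abs(array_factor(w * np.exp(1j * ps_err), [30.0])[0])
```

With ideal weights the gain toward 30° equals the element count; any non-uniform phase error lowers it, and uneven VGA amplitudes (for example, from PS insertion-loss variation) further reshape the side lobes.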
Amplitude-Compensated Phase Shifter: A PS should maintain uniform gain over its phase-tuning states with as little insertion loss as possible in each state. The appropriate choice of an active or passive PS is therefore extremely important for RF or hybrid predistortive TRXs, since it must provide high resolution for the large-DR beamforming of 5G modulation schemes [145]. The PS in a TRX system can be calibrated at different stages, including the digital baseband, LO, or RF stage; readers are referred to [146], [147] for an overview of PS classifications. Active (digital) PSs require auxiliary data converters to minimize gain and phase errors, and high-resolution, high-speed converters can provide high accuracy (i.e., near-360° isotropic beam resolution) [148], [149], [150]. Within this class, most studies use quadrature all-pass filters with lumped-element-based phase/amplitude compensation networks. Admittedly, a digital PS offers much higher accuracy than its passive counterpart. However, a hybrid or digital predistortive TRX cannot afford additional baseband complexity, since avoiding input-signal oversampling is a top priority in large-array TRX systems.
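The accuracy argument above can be quantified for an ideal N-bit PS, whose phase states are spaced by an LSB of 360°/2^N. The sketch below assumes ideal, uniformly spaced states and ignores amplitude variation between states; it rounds desired phases to the nearest state and reports the peak (LSB/2) and RMS (LSB/√12) quantization errors. The bit widths are illustrative.

```python
import numpy as np

def quantize_phase(phase_deg, n_bits):
    """Round a desired phase to the nearest state of an ideal n-bit PS."""
    lsb = 360.0 / 2 ** n_bits
    return np.mod(np.round(np.asarray(phase_deg) / lsb) * lsb, 360.0)

def phase_error(desired_deg, n_bits):
    """Error between quantized and desired phase, wrapped to [-180, 180)."""
    err = quantize_phase(desired_deg, n_bits) - np.asarray(desired_deg)
    return (err + 180.0) % 360.0 - 180.0

desired = np.linspace(0, 360, 10001, endpoint=False)
for bits in (4, 5, 6):
    e = phase_error(desired, bits)
    # peak error is LSB/2; RMS error is LSB/sqrt(12) for uniformly spread targets
    print(bits, 360 / 2 ** bits, np.abs(e).max(), np.sqrt(np.mean(e ** 2)))
```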
Nevertheless, the residual phase and amplitude errors of digital-only PSs lie in the ranges of 1°∼4° and 0∼2 dB, respectively. To further improve the active PS approach, recent designs [151], [152] calibrate the PS at the LO stage and report record OTA performance for a 64-element TRX array, reducing the phase and amplitude errors to 0.08°∼0.3° and 0.01∼0.1 dB, respectively.
Conversely, passive RF PSs have an advantage over their active counterparts in reduced circuit complexity and near-zero power consumption [78], [153]. Since the RF signal is generated directly by the VSG, no complex digital waveforms from arbitrary baseband generators are involved in this approach. For the same reason, this calibration topology is reported to offer lower phase resolution and beamforming efficiency than the others [152]. To remain competitive, however, a simple PA-aware precoding baseband algorithm can be defined for passive PSs (independent of DPD execution) to guarantee optimum signal transmission [154]. The recently developed concept of a hybrid PS has also been advocated, in which the tuning states are defined in passive form while gain-neutral characteristics with low insertion loss are obtained from an active PS [79], [155]. In addition, the leading VGA compensates for magnitude tilting at various angles, easing the heavy-duty task of maintaining low gain/phase errors and high beam-scanning resolution in the PS, as discussed next. A less common method is the vector-sum differential (I/Q) phase shifter, in which multiple VGA units are integrated into a single phase-shifting core to adjust the amplitude weights [156]. This technique can also provide joint compensation of I/Q imbalance and PA nonlinearity. Recently reported PS works and their effects are summarized in Table VIII.
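The vector-sum principle mentioned above can be sketched in a few lines: two VGAs weight an in-phase path and a 90°-shifted path, and their signed gains (cos φ, sin φ) set both the synthesized phase φ and the amplitude. This is an idealized model assuming perfect quadrature and matched paths; it does not model the I/Q imbalance or PA nonlinearity compensation of [156], and the function names are hypothetical.

```python
import numpy as np

def vector_sum_weights(phase_deg, amplitude=1.0):
    """Signed I/Q VGA gains that synthesize a given phase and amplitude."""
    phi = np.deg2rad(phase_deg)
    # in hardware the sign of each gain is set by a polarity (quadrant) switch
    return amplitude * np.cos(phi), amplitude * np.sin(phi)

def vector_sum_output(x, g_i, g_q):
    """Weighted sum of the in-phase path and the ideal 90-degree-shifted path."""
    return g_i * x + g_q * (1j * x)

x = np.exp(1j * 0.3)  # unit input phasor
for target in (0, 45, 135, 250):
    g_i, g_q = vector_sum_weights(target, amplitude=0.8)
    y = vector_sum_output(x, g_i, g_q)
    print(target, round(abs(y), 3), round(np.rad2deg(np.angle(y / x)) % 360, 3))
```

Because the same two gains set both amplitude and phase, amplitude compensation across phase states comes directly from the weight calculation in this model.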
Phase-Compensated and Low-Noise Variable Gain Amplifier: The VGAs (or driver amplifiers) in phased-array TXs perform multiple tasks: 1) compensating the gain variations of the PS at different phase angles and 2) providing a large gain-control range and linearity improvement over a wide input power range [157]. Insertion and switching losses are a common concern for a PS while tuning the beam angles. As described earlier, a built-in digital calibration system can mitigate these losses at the expense of large area occupancy. However, recent RF beamformers have shown an all-around display of  [44]. Like the PS, the VGA is deployed on both the TX and RX sides, as shown in Fig. 2; therefore, it is important to understand the respective responsibilities of each.
At the TX, the VGA deployed in conjunction with the PS is called a phase-invariant VGA. Over the last five years, research on phase-invariant VGAs has focused on enhancing their dB-linear range [158], [159]. Such VGAs must reduce gain errors across phase-shifting states to achieve low sidelobe levels (SLLs), i.e., to provide sufficient amplitude weights for the main beam and strong tapering for interfering sidelobes. At the RX, a VGA integrated with a low-noise amplifier (LNA), known as a low-noise VGA, improves sensitivity and DR [147]. Apart from sharing these performance criteria, the two types differ in their bias voltages, as optimum biasing is needed for sufficient amplitude weighting for low  [160]. Meanwhile, improper biasing can push the VGA transistor beyond its maximum temperature limit, inducing DC offset through severe parasitics [161]. Hence, the gate biases are controlled to minimize the effective transconductance for gain enhancement.
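The role of VGA amplitude weighting in lowering SLLs can be illustrated by comparing a uniformly weighted array with a tapered one. The sketch below computes the peak sidelobe level of a broadside 16-element half-wavelength ULA from a zero-padded FFT of the element weights. The Hamming taper (`np.hamming`) is an assumed stand-in for the Taylor or Chebyshev tapers commonly used in arrays, and the VGAs are assumed to realize these weights exactly.

```python
import numpy as np

def peak_sidelobe_db(weights, nfft=8192):
    """Peak sidelobe level (dB below the main lobe) of a broadside ULA, d = lambda/2."""
    # zero-padded FFT samples the array factor densely in u = sin(theta)
    af = np.abs(np.fft.fft(weights, nfft))
    half = af[: nfft // 2]  # |AF| is symmetric in u for real weights
    k = 0
    # walk down the main lobe until the first null (local minimum)
    while k + 1 < len(half) and half[k + 1] <= half[k]:
        k += 1
    return 20 * np.log10(half[k:].max() / half[0])

N = 16
uniform = np.ones(N)
tapered = np.hamming(N)  # assumed VGA amplitude weights
print(peak_sidelobe_db(uniform), peak_sidelobe_db(tapered))
```

With uniform weights the first sidelobe sits near the familiar −13 dB level, whereas the tapered weights push it below −30 dB at the cost of a wider main lobe; VGA gain errors across phase states would partially undo this suppression.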
Different gain-control techniques have been reported for MMW analog-controlled VGAs [162]. Among them, most established designs use the conventional current-steering topology (CST) [163]. Since CST enables phase inversion through gate-voltage adjustment, different gate widths or biasing levels are used to minimize parasitic effects and increase the dB-linear range. For this reason, CST-based VGAs have been realized with both analog and digital control voltages [164]. However, the main obstacle of CST in a TRX VGA is circuit area, because fine-tuning the dB-linear gain characteristics and effective transconductance requires a cascaded (multi-stage) structure.
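The dB-linear requirement means that the gain in dB should vary linearly with the control variable. A common way to approximate this in analog circuits is the pseudo-exponential law (1 + x)/(1 − x), whose dB value tracks the ideal exponential exp(2x) closely for small |x|. The sketch below is a generic illustration rather than the CST circuits of [163], [164]: it finds the control range over which the approximation stays within 0.5 dB of the ideal and reports the resulting dB-linear gain span. The 0.5 dB tolerance is an assumed design target.

```python
import numpy as np

def pseudo_exp_gain_db(x):
    """Gain (dB) of the pseudo-exponential law (1 + x) / (1 - x), |x| < 1."""
    return 20 * np.log10((1 + x) / (1 - x))

def db_linear_range(max_err_db=0.5, steps=100001):
    """Gain span over which (1+x)/(1-x) stays within max_err_db of exp(2x)."""
    x = np.linspace(0.0, 0.99, steps)
    ideal_db = 20 * np.log10(np.exp(2 * x))  # exactly linear in x when in dB
    err = np.abs(pseudo_exp_gain_db(x) - ideal_db)
    x_max = x[err <= max_err_db].max()  # the error grows monotonically with x
    # the law is odd-symmetric in dB, so the usable range spans -x_max..x_max
    return 2 * pseudo_exp_gain_db(x_max), x_max

span_db, x_max = db_linear_range(0.5)
print(round(span_db, 2), round(x_max, 3))
```

Under these assumptions the usable control range is about |x| ≤ 0.42, giving a dB-linear span of roughly 15.8 dB; cascading stages adds such spans in dB, at the cost of the area overhead noted above.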
In contrast, a single-stage structure requires a large transistor to achieve an adequate gain-control range, but the large transistor introduces stringent parasitics that degrade the bandwidth. A recent study has merged the RF sub-components into a common-leg TRX topology for compact die size and low temperature variation, instead of designing separate PS and VGA blocks for a standalone TX or RX [165].
The RX-VGA, together with the LNA, is responsible for 1) providing large gain with in-band flatness while keeping a low noise figure to meet stringent RX sensitivity, 2) balancing the amplitude and phase response of incoming signals to avoid I/Q mismatch and desensitization, and 3) maximizing the ADC DR to establish a reliable wireless link [166]. Incoming signals with constant phase and gain-control range are expected to have a low EVM (< 5%) and high SNR for acceptable RX sensitivity. With broadband modulated signals, RX sensitivity becomes more delicate, as multiple IMD products can easily fold in as aliasing. Because the VGA is placed before the ADC [167], these aliasing components must be filtered out ahead of the ADC [29], so that the gain supplied by the preceding VGA lets the ADC digitize the incoming signal over its full input range. Table IX compares the performance of the aforementioned VGA designs.
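The EVM and ADC-DR targets above are linked by simple rules of thumb. When EVM is dominated by additive noise, SNR ≈ 20·log10(100/EVM%), so the < 5% EVM target corresponds to an SNR of about 26 dB; an ideal N-bit ADC offers a quantization-limited SNR of 6.02N + 1.76 dB for a full-scale sine wave. The sketch below encodes these relations and measures RMS EVM on a synthetic QPSK signal with assumed Gaussian noise; phase noise, I/Q imbalance, and nonlinear distortion are ignored.

```python
import numpy as np

def evm_rms_percent(received, reference):
    """RMS EVM in percent, normalized to the reference constellation power."""
    err = received - reference
    return 100 * np.sqrt(np.mean(np.abs(err) ** 2) / np.mean(np.abs(reference) ** 2))

def evm_to_snr_db(evm_percent):
    """Equivalent SNR when the EVM is dominated by additive noise."""
    return 20 * np.log10(100.0 / evm_percent)

def ideal_adc_sqnr_db(n_bits):
    """Quantization-limited SNR of an ideal ADC for a full-scale sine wave."""
    return 6.02 * n_bits + 1.76

rng = np.random.default_rng(0)
n_sym = 100_000
qpsk = (rng.choice([-1, 1], n_sym) + 1j * rng.choice([-1, 1], n_sym)) / np.sqrt(2)
noise = 0.05 * (rng.standard_normal(n_sym) + 1j * rng.standard_normal(n_sym)) / np.sqrt(2)
evm = evm_rms_percent(qpsk + noise, qpsk)  # assumed noise power gives about 5 %
print(evm, evm_to_snr_db(5.0), ideal_adc_sqnr_db(10))
```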

C. Future Research Directions
Based on the above review, many debates and open research issues remain regarding the specifications of future high-performance PD waveform processing across TRX subsystems.
As discussed earlier, wideband systems exhibit strong dynamic nonlinearities, and the high-order memory depth needed to cover sufficient DR becomes a challenge. As future work in behavioral modeling, it is of interest to extend [103] by slowing the complexity growth of a fully basis-function-rich nonlinearity model that covers the leading/lagging cross-terms and linear/nonlinear MEs in real-time iterations of multi-antenna TXs, taking the problematic IB and OOB interference into consideration. Next, the applicability of undersampling techniques in the multiple feedback paths of massive FEMs is another important design challenge. An interesting research contribution would therefore extend the undersampling of both real and complex (I/Q) baseband signal components to enable low-rate feedback sampling in MIMO active arrays [168]. However, undersampling can also introduce aliasing. For this purpose, a memoryless PA-aware precoding algorithm could provide some freedom in extracting partial kernel knowledge of the nonlinear basis functions [154]. In addition, for MMW-enabled HB, the strategy reported in [169], which exploits computationally and power-efficient analog PS precoding algorithms, could be adopted to relax the high sampling burden on digital processors. This strategy frames an optimization problem in MIMO transmission that minimizes the worst-case MU interference due to SL distortion.
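The aliasing risk of undersampled feedback paths follows from how a real bandpass signal folds into the first Nyquist zone. The helper functions below, hypothetical names written for illustration, compute the alias of a tone, its Nyquist zone, whether the folded spectrum is inverted (even zones), and whether a band straddles a zone boundary, which is the situation in which undersampling causes in-band aliasing. The 1 GS/s rate and the 2.6 GHz band in the example are assumed values.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency in [0, fs/2] that a real tone at f_hz folds to at rate fs_hz."""
    f_mod = f_hz % fs_hz
    return min(f_mod, fs_hz - f_mod)

def nyquist_zone(f_hz, fs_hz):
    """1-based Nyquist zone index of f_hz for sampling rate fs_hz."""
    return int(f_hz // (fs_hz / 2)) + 1

def is_inverted(f_hz, fs_hz):
    """Even Nyquist zones fold with a mirrored (inverted) spectrum."""
    return nyquist_zone(f_hz, fs_hz) % 2 == 0

def band_fits_in_zone(f_lo_hz, f_hi_hz, fs_hz):
    """True if the whole band lies in one zone, so it folds without overlap."""
    return nyquist_zone(f_lo_hz, fs_hz) == nyquist_zone(f_hi_hz, fs_hz)

fs = 1e9  # assumed feedback ADC rate
print(alias_frequency(2.6e9, fs), nyquist_zone(2.6e9, fs), is_inverted(2.6e9, fs))
print(band_fits_in_zone(2.55e9, 2.65e9, fs), band_fits_in_zone(2.45e9, 2.55e9, fs))
```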
On the analog circuitry side, the prospects include state-of-the-art fabrication and testing of passive integrated RF sub-components. Specifically, accurate PSs and VGAs are the modern building blocks of APD [72] for combating nonlinearities through proper beam shaping. In particular, phase and gain squint errors should not exceed the maximum threshold of EVM degradation [78], [166]. Future research needs PSs with improved scanning resolution to enable wide beam steering in large phased arrays. MU scenarios are also interesting to pursue, because dynamic PSs must cope with abrupt phase-shifting states as users move. Also worth exploring are the design challenges posed by PS insertion/parasitic losses and PVT variations, which disturb mutual coupling and ideal impedance-matching conditions and lead to high phase and gain errors. On the other hand, to suppress SLLs with sufficient amplitude weighting, the TX-VGA should provide a wide gain-tuning range with low variation over a large input power range. This would increase the directional gain and transmission power efficiency and yield a high spectral density of the main-beam signal; correspondingly, wideband RXs handling multi-standard communication signals require an RX-VGA with in-band gain flatness for proper quantization. However, future research should also investigate the deficiencies caused by VGA transistor parasitics, mainly at high frequencies, which badly affect PS insertion loss and wideband operation. Hence, a compact RF VGA circuit with versatile gain tuning and small phase variation at higher frequencies is needed to provide uniform linearity, which remains an area of future research.

D. Summary and Lessons Learned
This section has discussed system-model perspectives of PD linearization, including the roadmap and PD waveform processing across the baseband and DUT sides. The modeling approaches and their references are summarized in Tables IV to IX. In summary, a PD process encompasses computer-aided testing, signal processing across the baseband and precoders, and radio transmission. In all scenarios, behavioral modeling is the most important avenue of PD processing and is a large research topic of its own. The above review of baseband waveform processing for DPD signal generation has discussed various nonlinear polynomial models for testing PA immunity against static and dynamic nonlinearities. Specifically, VoS and ML-based models have remained the two key approaches to PA behavioral assessment. The key lessons from Section III-A are summarized as follows.
• The cross-term MP extraction of conventional DPD models, including the subsets of VoS models, can become one of the most complex operations as the memory order increases for wide bandwidths. This will remain an active research focus, and further study is needed to codify robust PD metrics for concurrent MMW communication standards.
• The augmented NN model explicitly accounts for both static and dynamic distortions, making it suitable for wideband nonlinear systems [114]. It also points to a tradeoff between system convergence and stringent computational requirements when training the NN coefficients online and offline. A DNN model can help characterize this tradeoff in such a setting.
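The complexity growth discussed in the first lesson can be made concrete by counting coefficients. The sketch below uses the memory polynomial (MP) and the widely used generalized memory polynomial (GMP) with lagging and leading cross-terms; the parameter names (K for nonlinear orders, Q or L for memory taps, M for cross-term lags) are our notation and may differ from the models in Table III.

```python
def mp_num_coeffs(K, Q):
    """Memory polynomial: K nonlinear orders times Q memory taps."""
    return K * Q

def gmp_num_coeffs(Ka, La, Kb, Lb, Mb, Kc, Lc, Mc):
    """GMP: aligned terms + lagging cross-terms + leading cross-terms."""
    return Ka * La + Kb * Lb * Mb + Kc * Lc * Mc

for Q in (2, 4, 8):
    # assumed orders: 5 aligned, 4 lagging/leading, cross-term lag of 3 samples
    print(Q, mp_num_coeffs(5, Q), gmp_num_coeffs(5, Q, 4, Q, 3, 4, Q, 3))
```

With five nonlinear orders, doubling the memory depth doubles the MP count, while the GMP count grows with the product of memory depth and cross-term lag, which is why pruning is usually applied before extraction.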
In Section III-B, we have reviewed the methodology of PD waveform processing across the different sub-components of digital and analog precoders. The key lessons learned are summarized as follows:
• An error-free test stimulus in TRX modules can increase the frequency diversity of both the VSG and VSA by mitigating the effects of high-order distortions and resolving the acquisition of modulation signals with large DR.
• To avoid the high sampling rates and hardware complexity of black-box signal processors, FPGA-embedded SDR-based data converters are deployed to generate fast and flexible undersampled PD waveforms with minimal system latency. Nevertheless, using black-box DSP units as a testbed is considered a promising route to large-scale integration. In coordinated arrays, multiple processors enable coefficient extraction through array feedback diversity to compensate for pre-/post-PA losses at MMW frequencies.
• The spatial diversity of large-scale MMW systems relies on hybrid-precoding-based beamforming that coordinates low-dimension DPD with high-dimension APD. The potential array-based enablers of APD are the VGA and PS [57]. In contrast, DPD methods can use established pruning algorithms to facilitate fast adaptation of nonlinear memory coefficients.

IV. PREDISTORTION ARCHITECTURE CLASSIFICATIONS AND DESIGN APPROACHES
Modern MMW wireless channels encourage sparse connectivity between TRX modules with high SE and EE to meet the tougher constellations of high-order modulation schemes. With the intensive reconfigurability of array architectures, PD schemes need to be highly adaptive to alleviate the adverse effects of PA nonlinearity. The PA is the hub of distortion, as it invokes severe distortion effects of different natures. These nonlinear effects are often categorized as IMD products, aliasing effects, OOB spectral regrowth (ACPR/ACLR), or AM-AM/AM-PM distortion, which can degrade the transmitter PAPR specifications, beamforming gain, spectral density, receiver SNR, and EVM. These effects were negligible a decade ago owing to the limited bandwidths of conventional wireless communications [21], [170]. Since then, TRX linearization preferences have changed completely, and most current BSs are equipped with digital-domain systems for high-fidelity linearization. Recent studies have demonstrated high-quality PD-based linearization [17], [19], [171].
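Two of the metrics listed above, PAPR and ACPR, can be computed directly from a complex baseband frame. The sketch below builds an OFDM-like frame with 401 occupied FFT bins, passes it through a memoryless Rapp AM-AM model (an assumed PA model with assumed saturation level and smoothness), and measures the PAPR and the upper-adjacent-channel power ratio over a channel of equal width. A single periodic FFT frame is used so that no windowing is needed; the measurement bandwidths and filters defined by 3GPP are not reproduced.

```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of a complex baseband signal."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

def acpr_dbc(x, half_bw_bins):
    """Upper adjacent-channel power relative to main-channel power (one frame)."""
    spec = np.abs(np.fft.fft(x)) ** 2
    k = np.fft.fftfreq(x.size, d=1.0 / x.size)  # integer bin indices
    main = spec[np.abs(k) <= half_bw_bins].sum()
    width = 2 * half_bw_bins + 1
    adj = spec[(k > half_bw_bins) & (k <= half_bw_bins + width)].sum()
    return 10 * np.log10(adj / main + 1e-30)  # tiny floor avoids log(0)

def rapp_pa(x, a_sat=1.5, p=2.0):
    """Memoryless Rapp AM-AM model (no AM-PM), assumed for illustration."""
    return x / (1 + (np.abs(x) / a_sat) ** (2 * p)) ** (1 / (2 * p))

rng = np.random.default_rng(1)
nfft, B = 4096, 200
X = np.zeros(nfft, complex)
bins = np.r_[0:B + 1, nfft - B:nfft]  # occupied subcarriers |k| <= B
X[bins] = rng.choice([-1, 1], bins.size) + 1j * rng.choice([-1, 1], bins.size)
x = np.fft.ifft(X)
x /= np.sqrt(np.mean(np.abs(x) ** 2))  # unit average power
print(papr_db(x), acpr_dbc(x, B), acpr_dbc(rapp_pa(x), B))
```

The linear frame shows only numerical-noise leakage into the adjacent channel, whereas the compressed output shows spectral regrowth and a reduced PAPR, which is the OOB degradation that PD is meant to suppress.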
This section surveys recent developments in analog, digital, and hybrid PD-based linearization solutions, together with the technical challenges and basic advantages of the featured techniques in modern wireless networks. Fig. 7 highlights the classification of existing works on PD architectures. We also summarize some of the important features of the PD classifications in Table X.

A. Analog Predistortion (APD)
Analog-based communications have dominated the telecom industry for over 50 years. The low cost and convenient footprint of APD techniques have made it the established choice for the high-power PAs used in satellite and radar communications [21], [172]. Since APD delivers the correction signal before the nonlinear PA, it can also be called feedback-based PD linearization, as shown in Fig. 8. Unlike DPD, where the PD process starts from the PA output after the first iteration of behavioral assessment, APD directly delivers the compensating linearized signal to the PA input, so that the mutually inverse characteristics cancel the nonlinearity; hence, no prior distortion information is passed to the APD from the baseband modem. This removes the need for the 5× digitization bandwidth of high-speed data converters in the analog circuitry. In other words, the IF filter bandwidth can remain equal to the input signal bandwidth while still achieving the linearization bandwidth, which leads to low power consumption and hardware complexity. In recent years, APD has been designed in various configurations, summarized in Table XI, offering a high degree of freedom with area- and cost-effective analog techniques, as discussed in the following subsections.
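The 5× figure follows from the bandwidth expansion of odd-order products: for a signal occupying |f| ≤ B, the term x|x|^(k−1) contains products of k tones and therefore occupies |f| ≤ kB, so correcting fifth-order distortion requires observing about five times the signal bandwidth. The sketch below checks this numerically with a 41-tone test signal whose tones sit on exact FFT bins; the tone count and random phases are arbitrary choices.

```python
import numpy as np

def occupied_half_bandwidth(x, tol=1e-9):
    """Largest |FFT bin| holding non-negligible energy in one frame of x."""
    spec = np.abs(np.fft.fft(x)) / x.size
    k = np.fft.fftfreq(x.size, d=1.0 / x.size).astype(int)
    return int(np.abs(k[spec > tol * spec.max()]).max())

n, b = 1024, 20
rng = np.random.default_rng(2)
t = np.arange(n)
tones = np.arange(-b, b + 1)  # multitone test signal occupying |k| <= b
phases = rng.uniform(0, 2 * np.pi, tones.size)
x = np.exp(1j * (2 * np.pi * np.outer(t, tones) / n + phases)).sum(axis=1)

y3 = x * np.abs(x) ** 2  # third-order product
y5 = x * np.abs(x) ** 4  # fifth-order product
print(occupied_half_bandwidth(x), occupied_half_bandwidth(y3), occupied_half_bandwidth(y5))
```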

1) Internal/Built-in APD linearizer:
This topology of APD linearizer is embedded inside the PA, mostly at the input matching network, as shown in Fig. 8(a). Recent works have demonstrated high gain and linearity with little DC power consumption while eliminating the need for external PD control circuitry. In particular, the linearizer is used as a shunt component to provide design stability and high isolation against the detrimental effects of the gate and drain bias conditions. This feedback-neutralization technique minimizes the impact of parasitic capacitance and inductance at the PA input. Transistor back-gate biasing is another approach to equalize the current distribution at the drain and gate nodes [173]. Back-gate biasing maintains the desired quiescent drain bias current and output voltage swing while ensuring no current leakage due to the device's intrinsic transconductance sensitivity, thereby providing high efficiency and linearity.
2) IMD Generator-Based APD Linearizer: This external APD circuit comprises two parallel branches. Usually, the branches are excited with a two-tone input signal: the upper (linear-path) branch carries the original RF signal to the PA for amplification, while the lower (nonlinear-path) branch uses an IMD generator to form components that are opposite (180° out of phase) to those of the original RF signal, as illustrated in Fig. 8(b). After the IMD nonlinear response of the PA is obtained, the delayed version of the original input signal, together with the reversed IMD components, reaches the PA input for distortion cancellation. The two paths need matched delays for the reversed IMD characteristics to compensate accurately. APD linearizers with an external structure mostly target effective cancellation of IMD3 close to the fundamental bandwidth. To compensate for fifth- and higher odd-order IMD, additional analog components can be added [32].
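The cancellation principle of the IMD-generator branch can be sketched at complex baseband with a memoryless cubic PA model, y = x + a3·x|x|², where the value of a3 is assumed. The nonlinear branch adds −a3·x|x|², a cubic component in antiphase with the PA's own third-order term, to the linear path; a two-tone test then compares the lower IMD3 product before and after predistortion. Delays are trivially matched because the model is memoryless.

```python
import numpy as np

A3 = -0.1  # assumed third-order coefficient of a compressive PA model

def pa(x):
    """Memoryless cubic PA model: unity linear gain plus a third-order term."""
    return x + A3 * x * np.abs(x) ** 2

def imd_generator_apd(x):
    """Linear path plus an antiphase (180-degree) cubic IMD path."""
    return x - A3 * x * np.abs(x) ** 2

def tone_level_db(y, k):
    """Level (dB) of FFT bin k for a periodic frame."""
    return 20 * np.log10(np.abs(np.fft.fft(y)[k]) / y.size)

n, k1, k2, amp = 256, 10, 11, 0.2
t = np.arange(n)
x = amp * (np.exp(2j * np.pi * k1 * t / n) + np.exp(2j * np.pi * k2 * t / n))
imd3_bin = 2 * k1 - k2  # lower IMD3 product at bin 9
before = tone_level_db(pa(x), imd3_bin)
after = tone_level_db(pa(imd_generator_apd(x)), imd3_bin)
print(round(before, 1), round(after, 1))
```

Because this first-order inverse cancels only the third-order term, the residual at the IMD3 frequency comes from a new fifth-order product of size about 3·a3²·x|x|⁴, which illustrates why additional analog components are needed for fifth- and higher-order IMD.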
3) Tuneable Diode-Based APD: This technique provides independent conversion of the gain and phase waveforms. Like the previous method, it also forms two parallel branches, whose biases are adjusted to enable independent tuning of AM-AM and AM-PM compensation, as illustrated in Fig. 8(c). Unlike the previous technique, no delayed copy of the input signal is needed here, making it easier for the two paths to correlate simultaneously. In addition, the bias voltages of a diode-based nonlinear generator enable an equivalent reverse control of gain and phase expansion (or compression) according to the PA output response over a large input power range (>25 dBm). This approach is suitable for linearizing high-power electronic devices such as solid-state PAs and traveling-wave tube amplifiers.
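The independent AM-AM and AM-PM controls of a diode-based APD aim to approximate the exact inverse of the PA's amplitude and phase characteristics. The sketch below computes that inverse for the Saleh traveling-wave-tube model, using parameter values commonly quoted in the literature as an assumption: the amplitude is obtained by solving the AM-AM curve for the drive level that yields a linear target gain, and the phase is pre-rotated by the AM-PM the PA will add at that drive. It is an idealized target curve, not a circuit model of the diode linearizer.

```python
import numpy as np

# Saleh TWT model parameters commonly quoted in the literature (assumed here)
ALPHA_A, BETA_A = 2.1587, 1.1517
ALPHA_P, BETA_P = 4.0033, 9.1040

def saleh_pa(x):
    """Saleh AM-AM and AM-PM applied to a complex envelope."""
    r = np.abs(x)
    gain = ALPHA_A / (1 + BETA_A * r ** 2)
    phase = ALPHA_P * r ** 2 / (1 + BETA_P * r ** 2)
    return gain * x * np.exp(1j * phase)

def saleh_predistorter(x, target_gain=ALPHA_A):
    """Invert AM-AM and cancel AM-PM so that saleh_pa(PD(x)) = target_gain * x."""
    g = target_gain * np.abs(x)  # desired PA output amplitude
    # drive u solves BETA_A*g*u^2 - ALPHA_A*u + g = 0 (smaller root, below saturation)
    disc = ALPHA_A ** 2 - 4 * BETA_A * g ** 2
    if np.any(disc < 0):
        raise ValueError("requested output exceeds PA saturation")
    u = 2 * g / (ALPHA_A + np.sqrt(disc))  # numerically stable form of the root
    phase = ALPHA_P * u ** 2 / (1 + BETA_P * u ** 2)  # AM-PM the PA will add
    return u * np.exp(1j * (np.angle(x) - phase))

x = np.linspace(0, 0.4, 50) * np.exp(1j * np.linspace(0, 6, 50))
print(np.max(np.abs(saleh_pa(saleh_predistorter(x)) - ALPHA_A * x)))
```

The inversion is valid only below saturation: for the assumed parameters the Saleh AM-AM curve peaks at an output of about 1.0, so inputs whose target output exceeds this level cannot be linearized and raise an error.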

4) Tuneable Sub-Component Array Based APD:
This category is one of the emerging and preferred forms of APD linearization along the digital baseband paths of FEMs, as shown in Fig. 8(d). The analog signal processing (or RF beamformer) block in hybrid TX systems mostly comprises low-cost PSs only (referred to as a phase-shifting predistorter in [174]), where accurate phase rotation is used to control the phase mismatch and degradation of the feedback signal for DPD learning [140], [141]. However, a PS-only approach becomes prohibitive under the millisecond-scale adaptation of dense 5G networks, as it incurs high insertion loss and thus AM-PM errors. Therefore, to further enhance amplitude weighting and PS switching speed, an APD beamforming unit combining an analog-controlled VGA [175] with PSs has also been advocated in recent works [57], [72].

B. Digital Predistortion (DPD)
DPD has been recognized as the de facto approach to PA linearization in the research community. In contrast to APD, DPD involves baseband and DSP units that bring more flexibility and effectiveness to the wide DR of modern communication signals. DPD is well known for its user-friendly adaptivity. As discussed earlier, the nonlinear output signals captured in the RX path proceed to the baseband modem, where a vendor-supplied (or custom-designed) kernel control algorithm is compiled to reduce the sensitivity to distortion products, as shown schematically in Fig. 2. Typically, a LUT stores the nonlinear coefficients, and a convergence algorithm (see Table IV), weighted by a set of coefficient polynomials over the memory depth, updates the LUT to shape the DPD operation.

1) LUT-Based DPD:
This simplest and most cost-effective form of DPD linearization is memoryless and can also be referred to as a non-adaptive solution. In DPD, the LUT stores the magnitude information of the original input and output signal entries after the first iteration (or convergence). To evaluate the nonlinearities while avoiding the complexity of large tables, the most relevant discrete samples at uniform intervals are usually picked as LUT entries, referred to as bins in [76].
Generally, LUT-based PA linearization is considered memoryless. To account for memory, a WM- or HM-based LTI filter is combined with the LUT [105], [176], as explained in Section III-A. Irrespective of the polynomial order, a LUT avoids the analytical ill-conditioning of high-order polynomials and offers power-efficient modeling at no extra cost [131], [177]. The oversizing drawback of LUTs is well compensated by replacing the pruned subsets of VoS polynomials with cross-term interpolation and extrapolation, forming a less complex linearized output in a single LUT-based DPD [178].
As stated in [76], the number of LUT entries may vary from 128 to 256 control points. However, in 5G-NR bands, the standardized IB and OOB linearization metrics have been relaxed for OTA characterization. Therefore, a recent LUT-based DPD operating at FR2 [179] with 16∼32 LUT entries comfortably met the minimum ACPR limit of 28 dBc defined by 3GPP [77].
Usecases: A variety of LUT-based DPDs have been demonstrated in the recent literature to support challenging use cases, such as providing well-conditioned parameter estimation for large-scale polynomial coefficients. One such lower-complexity solution, an extension of the conventional direct-learning adaptation of LUT coefficients, is the linearly interpolated LUT implementation [105], [177], [178]. With this approach, a subset of the necessary basis functions is extracted from a wide distribution of many individually controllable multi-LUTs, providing an extra degree of freedom against the typical ill-conditioning problem. It has been successfully applied to continuous DPD adaptation in both FR2 and MMW-enabled active-array communications [105].
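A memoryless LUT with linear interpolation between neighboring entries can be sketched as follows. The LUT is indexed by input magnitude at uniformly spaced entries and stores real gains that invert a hypothetical tanh AM-AM curve (AM-PM is ignored), and `np.interp` interpolates between entries. Comparing 8 and 32 entries shows how the interpolation error shrinks as the table grows; this is the basic idea only, not the multi-LUT schemes of [105], [177], [178].

```python
import numpy as np

def pa_amam(r):
    """Hypothetical smooth compressive AM-AM curve (tanh); AM-PM ignored."""
    return np.tanh(r)

def build_lut(n_entries, r_max):
    """Real gains indexed by uniformly spaced input magnitudes."""
    r_in = np.linspace(0.0, r_max, n_entries)  # LUT entry positions (|x|)
    drive = np.arctanh(np.clip(r_in, 0.0, 0.999))  # drive giving output r_in
    gains = np.ones_like(r_in)  # small-signal limit of drive / input is 1
    gains[1:] = drive[1:] / r_in[1:]
    return r_in, gains

def lut_dpd(x, r_in, gains):
    """Apply the LUT with linear interpolation between neighbouring entries."""
    return x * np.interp(np.abs(x), r_in, gains)

x = 0.8 * np.linspace(0, 1, 2000) * np.exp(1j * np.linspace(0, 20, 2000))
errors = {}
for n_entries in (8, 32):
    r_in, gains = build_lut(n_entries, r_max=0.8)
    u = lut_dpd(x, r_in, gains)  # predistorted drive
    y = pa_amam(np.abs(u)) * np.exp(1j * np.angle(u))  # PA output
    errors[n_entries] = np.max(np.abs(y - x))  # deviation from unity gain
print(errors)
```

Since linear-interpolation error scales roughly with the square of the entry spacing, the worst-case deviation in this toy model should shrink by roughly the square of the spacing ratio when moving from 8 to 32 entries.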
2) Adaptive DPD (ADPD): Modern phased-array PAs exhibit strong MEs with the wideband OFDM test signals of 5G-NR transmissions. The OTA emissions of closely packed TX elements suffer from insufficient isolation against OOB emissions, which calls for an ADPD to mitigate the high sensitivity of modern dynamic systems. ADPD based on canonical basis functions is the most productive and established nonlinear modeling method. An ADPD can be computed in an FPGA or in commercial black-box digital TRX modules. Because of their lower hardware cost and power consumption, FPGAs are widely deployed in real-time testing of 5G-NR FR1 bands [60], [66], [132], [180]. Meanwhile, commercial DSP generators are prevalent in MIMO systems for continuously tracking scattered beam variations in FR2 bands. As such, the conventional offline learning of DPD coefficients has been emulated in a real-time online environment to increase the potency of commercial TRXs at MMW spectrums [39], [108], [181].
In real-time 5G array modeling, series of nonlinear basis functions are computed in DPD engines to reduce the in-band NMSE and out-of-band ACPR. These functions, adapted in real time through an ergodic observation (i.e., PD) matrix in the form of least-squares (LS) or least-mean-squares (LMS) algorithms, often lead to numerically unstable (ill-conditioned) interpolation in ADPD behavioral modeling. Such conditions are difficult for a practical DPD system. Hence, two kinds of precautionary measures can be taken to alleviate ill-conditioning in ADPD: 1) pruning the PD polynomial model to update only the relevant coefficient sets of the basis functions [181], [182], [183]; and 2) applying a regularization matrix to stabilize the training of the weight vector, especially in iterative online linearization [60], [99], [177], [184], [185], [186].
A practical 5G baseband relies on an inexpensive, well-defined iterative algorithm with steady-state convergence and a moderate number of model coefficients to meet the specified residual distortion. Smooth polynomial convergence can be realized with thousands of training samples (or floating-point operations (FLOPs)) captured over several blocks after amplification. As written in equation (12), a matrix formulation is used to extract the training samples. An LS/LMS data-matrix product X·a can be constructed for any of the polynomial models (given in Table III) and is then assembled in a baseband workstation to build the memory basis for linearization. The number of samples in each block can vary from 2k to 32k, as claimed in [76]. In addition, a regularized matrix inversion that minimizes the conditioning errors of the overdetermined matrix X, such as the Moore-Penrose pseudo-inverse [75] or singular value decomposition [76], is usually used to guarantee good stability and convergence:
$$\hat{\mathbf{a}} = \left(\mathbf{X}^{H}\mathbf{X}\right)^{-1}\mathbf{X}^{H}\mathbf{y} \qquad (17)$$
where (·)^H denotes the conjugate (Hermitian) transpose. Recent studies have proposed effective methods to ease the high computational complexity of ADPD baseband training. An improved ridge-regression method in an FPGA-based baseband was applied to an OFDM-driven PA [185]. The authors compared its effectiveness with LS-based PD coefficient estimation using the same number of training samples. The approach relaxes the high training-sample requirement because it reduces the correlation among the components of the output vector y, which LS estimation does not address. They showed that their ridge-regression approach significantly reduces the number of training samples, from 32768 to 200, with satisfactory ACPR performance.
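The matrix formulation and the two estimators discussed above, the pseudo-inverse LS solution of (17) and a ridge (Tikhonov) variant, can be sketched with a memory-polynomial data matrix X. In the sketch, the MP basis with odd orders and two memory taps, the circular delay, the coefficient values, and the regularization weight λ are assumptions for illustration. The system is noiseless so that the recovered coefficients can be checked against the true ones, and it is not the specific improved ridge-regression method of [185].

```python
import numpy as np

def mp_basis(x, K=3, Q=2):
    """Memory-polynomial data matrix: columns x[n-q]|x[n-q]|^(k-1), odd k."""
    cols = []
    for q in range(Q):
        xq = np.roll(x, q)  # circular delay (assumes a periodic capture)
        for k in range(1, 2 * K, 2):  # odd orders 1, 3, 5, ...
            cols.append(xq * np.abs(xq) ** (k - 1))
    return np.stack(cols, axis=1)

def ls_coeffs(X, y):
    """Least-squares estimate a = X^+ y (pseudo-inverse computed via SVD)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge_coeffs(X, y, lam):
    """Ridge estimate a = (X^H X + lam I)^-1 X^H y."""
    XH = X.conj().T
    return np.linalg.solve(XH @ X + lam * np.eye(X.shape[1]), XH @ y)

rng = np.random.default_rng(3)
x = (rng.standard_normal(4096) + 1j * rng.standard_normal(4096)) / np.sqrt(2)
X = mp_basis(x)  # 4096 x 6 data matrix
# column order: (q=0: k=1,3,5), (q=1: k=1,3,5); values are assumed
a_true = np.array([1.0, -0.05 + 0.02j, 0.002, 0.1j, -0.01, 0.0])
y = X @ a_true  # noiseless model output y = X a
a_ls = ls_coeffs(X, y)
a_ridge = ridge_coeffs(X, y, lam=1e-3)
print(np.abs(a_ls - a_true).max(), np.abs(a_ridge - a_true).max())
```

`np.linalg.lstsq` solves the LS problem through an SVD, which avoids explicitly forming XᴴX; the ridge form trades a small bias for better conditioning when the basis columns are highly correlated, as the odd-order terms of the same tap are here.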
Piecewise (PW) coefficient estimation has recently evolved owing to its fast adaptation to nonlinear MEs in the DPD forward (transmission) path. In [186], a mesh-selecting-based ADPD identification algorithm is proposed; with the minimized sample extraction of the PW technique, it comfortably meets the standard LTE ACPR limit of −45 dBc for high-PAPR multicarrier signals. References [39], [187], [188] applied a PW learning algorithm to 5G-NR waveforms to derive a reduced-order DPD basis from the cross-term combinations of near-band samples. For MIMO arrays, the PW function exploits the mutual characteristics of BF regions in a subset-orthogonalization form to guarantee smooth convergence of coefficient adaptation. Hence, joint adaptive learning and a regularization matrix are embedded in the LMS-based PD identification to circumvent mutual divergence among the BF submodels.
Meanwhile, the risk of instability due to inevitable estimation errors during a long estimation process, or to mutual differences among PD models, is an important design challenge for LMS stability in real-time applications. Reference [189] extended the study of LMS-based ADPD iteration to include signal-dependent noise (SDN) effects on the nonlinear memory-induced output weight vector. SDN accumulates over many iterations and causes divergence or matrix ill-conditioning; the authors also showed that reduced sampling rates are the initial source of SDN (or aliasing effects), as the cutoff frequency of the RX low-pass filter lies close to the first Nyquist zone. However, the authors' vector-update strategy eliminated SDN over many iterations, with adequate bias levels to balance hardware resources and computational complexity.
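An LMS-type ADPD update can be sketched with complex normalized LMS (NLMS), which divides the step by the instantaneous basis energy to keep the iteration stable across signal levels. The sketch identifies a small memory-polynomial model (orders 1 and 3, two taps, assumed coefficients) from noiseless data. The optional `leak` factor implements leaky NLMS, a common regularized variant that trades a small bias for robustness to accumulated noise; it is offered as a generic illustration, not the vector-update strategy of [189].

```python
import numpy as np

def nlms_identify(Phi, d, mu=0.5, eps=1e-6, leak=0.0):
    """Complex NLMS over the rows of basis matrix Phi toward target samples d."""
    w = np.zeros(Phi.shape[1], complex)
    for phi, target in zip(Phi, d):
        e = target - phi @ w  # a-priori error for this sample
        norm = eps + np.vdot(phi, phi).real  # instantaneous basis energy
        w = (1 - leak) * w + mu * e * phi.conj() / norm
    return w

rng = np.random.default_rng(4)
x = (rng.standard_normal(20000) + 1j * rng.standard_normal(20000)) / np.sqrt(2)
x1 = np.r_[0, x[:-1]]  # one-sample delay
Phi = np.stack([x, x * np.abs(x) ** 2, x1, x1 * np.abs(x1) ** 2], axis=1)
w_true = np.array([1.0, -0.05 + 0.01j, 0.1, 0.01j])  # assumed MP coefficients
w_hat = nlms_identify(Phi, Phi @ w_true)
print(np.max(np.abs(w_hat - w_true)))
```

With mu between 0 and 2, each normalized update shrinks the a-priori error of the current sample by the factor (1 − mu), which is what keeps NLMS stable regardless of the signal level.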
Usecases: In contrast to LUT-based DPDs, ADPD offers the opportunity to dynamically update the DPD coefficients in response to fast-changing PA nonlinearity. An ADPD learning procedure can be viewed as another way of realizing optimal parameterization in the combined or dedicated feedback of multiple TXs. Moreover, unlike LUTs, ADPD is not constrained to a small number of nonlinear orders.
Instead, based on mature pruning algorithms, ADPD can keep an even larger number of BFs and coefficients, which is ideal for the massive mobility and rapid beam tuning of 5G/B5G channels that require frequent DPD adaptation.

C. Hybrid Predistortion (HPD)
HPD has recently emerged as a new frontier for confronting the targeted nonlinearities with intermediate hardware cost and resources. In modern mMIMO beamforming systems, digital-only PD linearization is considered unrealistic. The high energy consumption and cost inefficiency of dedicated DSP units have motivated the joint use of APD and DPD in MMW communication. In an integrated hybrid TX, dedicated APD circuitry in the analog beamforming array modules is envisioned to provide directive communication that eases the coefficient identification of a shared DPD. The unusual PA-antenna correlations place the PD operation under highly nonlinear conditions, especially in FR2 5G bands, where the initial ACLR must be improved from around 20 dBc to 28-30 dBc [39]. DPD-based MIMO beamforming offers satisfactory all-around accuracy at the expense of increased system overhead. For example, in [190], the authors linearize the multipath transmitting beams of 5G-NR signals through dedicated commercial DPD instrumentation for each RF chain.
In contrast, APD-based RF beamforming is a cost-effective solution for obtaining optimal spatial multiplexing gain [44]. However, the optimization goal of APD in a hybrid-MIMO TX is worth understanding: since APD has lower flexibility and linearization performance than a dedicated DPD, its purpose is not to enhance the linearity achieved by the DPD alone but to maintain uniformity among the interconnected IC-based PA devices, as debated and experimented in [68], [191]. Together, they give HPD stable performance for both IB and OOB communication, since APD is efficient in suppressing OOB emissions and DPD is a powerful tool for IB linearization. With these favorable features, HPD is anticipated to support the user densification otherwise forfeited to the high propagation loss of 5G and B5G cellular networks. This section further covers the key contributions of HPD in the different PA operating scenarios investigated in recent research.
1) HPD Linearization for Standalone PA: This method provides a high level of nonlinearity compensation, since it alleviates both static and dynamic PA distortions [192]. It mainly applies to low-frequency signals (e.g., sub-6 GHz), as the preceding analog circuitry becomes lossy and bulky at higher frequencies. In some works, an analog linearizer is employed as a compensation circuit alongside an MP-assisted DPD model to target both static (short-term MEs) and dynamic (long-term MEs) nonlinearities arising from stringent self-biasing oscillations [193], [194], as depicted in Fig. 9. This approach removes the need for custom-designed MP-based DPD identification algorithms and thus lets the built-in vendor-supported DPD training model achieve a satisfactory level of linearization. Nevertheless, the power-hungry DPD engines used in [193], [194] limit the linearization capacity for wideband parameter estimation.
The long-term MEs or cross-term polynomials, if treated properly in HPD, can effectively linearize broadband systems in compliance with 5G-CA schemes [66], [192]. In [66], for instance, an FPGA-based baseband digitally controls the APD circuits for higher accuracy. The FPGA-based DPD engine avoids high-clock-rate processing of the baseband signals while rendering the synchronized phase characteristics of the IMD function, which also avoids the lossy nonlinear APD components of conventional designs [31]. The HPD methodologies for standalone PAs presented in recent papers are summarized in Table XII.
Fig. 9. An HPD linearization of a standalone PA that compensates both short-term (DPD) and long-term (APD) MEs.
2) HPD Linearization for MIMO Systems: HPD in MIMO systems has been studied and implemented in various topologies, in which dedicated or common digital processors drive multiple analog RF processing chains (see Fig. 10). In array-based HPD, the feedback calibration coefficients are digitally precoded after the first iteration to estimate the nonlinear array behavior caused by imbalances in the analog amplitude and phase control circuits. Within the scope of MIMO-HPD, the DPD uses the feedback information in two ways: error identification from only the phase-controlled feedback signals of nonlinear PA responses in near-field coupling [195], and error estimation from the amplitude- and phase-controlled beam-oriented streams in the context of far-field coefficient propagation vectors [38].
With Single-DPD for Hybrid Array System: This linearization model places a dedicated APD before each PA of the MIMO array to bring the PA nonlinearities to a uniform state and ease the coefficient extraction of the shared DPD. In other words, the single DPD treats the array as a single-input single-output (SISO) DPD system, as shown in Fig. 10(a). It is predominantly an example of stabilizing the main-beam signal by combining all PA outputs of the SAs, as part of the mean-DPD identification for a given amplitude and phase distribution [196]. In [190], a four-channel hybrid array is treated as one virtual unit to obtain full-angle linearization of the main beam and sidelobes. The authors' tests show that a single DPD can match the performance of a full DPD if the PA variations among the RF chains are properly compensated with analog tuning. Their experiments also compare different nonlinear distortion modeling scenarios, namely a combined PA response, a dedicated DPD for each chain, digital tuning of the analog boxes to include memory, and memoryless analog tuning. However, PA characteristics in large arrays inevitably differ. To address this, Liu et al. [195] present hybrid linearization of the main-beam signal of a large-scale array in which the feedback receiving points are integrated at the array edges across the transmitting elements. The authors emphasize the importance of properly tuning the analog beamforming vector with respect to the dynamic behavior of the array-integrated PAs. Consequently, the optimized analog circuits keep the beamforming weights unaltered, as the authors use the superposition principle to align the main-beam signals of the operating SAs. Moreover, the developed iterative algorithm achieves near-optimal decoupling of the signal streams from the feedback signals, thus reducing computational complexity and power consumption.
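The idea of equalizing the array with analog tuning so that one shared DPD suffices can be illustrated with a deliberately simplified Python sketch. Assumptions: four memoryless Rapp PAs with hypothetical gain and saturation spreads, per-element gain trimming as a crude stand-in for the dedicated APD, ideal in-phase main-beam combining, and a memoryless polynomial mean-DPD identified by indirect learning on the combined output. The function and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
N_PA = 4
gains = np.array([1.0, 1.25, 0.8, 1.1])    # hypothetical small-signal gain spread across the array
a_sats = np.array([1.0, 1.1, 0.9, 1.05])   # hypothetical saturation levels

def pa_bank(u, p=2.0):
    """Per-element memoryless Rapp PAs; row i of u drives PA i."""
    v = gains[:, None] * u
    return v / (1 + (np.abs(v) / a_sats[:, None]) ** (2 * p)) ** (1 / (2 * p))

def poly_basis(x, K=7):
    return np.stack([x * np.abs(x) ** k for k in range(K)], axis=1)

def shared_dpd_per_element_nmse(x, trim, iters=3):
    """Identify ONE polynomial PD on the in-phase main-beam sum; return each element's NMSE (dB)."""
    z = x.copy()
    for _ in range(iters):
        y_beam = pa_bank(trim[:, None] * z).mean(axis=0)            # ideal main-beam combining
        c, *_ = np.linalg.lstsq(poly_basis(y_beam), z, rcond=None)  # indirect learning
        z = poly_basis(x) @ c
    y = pa_bank(trim[:, None] * z)
    err = np.sum(np.abs(y - x) ** 2, axis=1) / np.sum(np.abs(x) ** 2)
    return 10 * np.log10(err)

x = 0.25 * (rng.standard_normal(20000) + 1j * rng.standard_normal(20000)) / np.sqrt(2)
untuned = shared_dpd_per_element_nmse(x, np.ones(N_PA))
tuned = shared_dpd_per_element_nmse(x, 1 / gains)   # analog gain trim equalizes small-signal gains
print("per-element NMSE, no analog trim (dB):  ", np.round(untuned, 1))
print("per-element NMSE, with analog trim (dB):", np.round(tuned, 1))
```

Without trimming, the shared DPD can only linearize the average beam response, so the elements with the largest gain deviation keep an error of roughly that deviation. After trimming, only the smaller spread in compression remains.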
In the far field, the large variation in beam-steering resolution ruins the amplitude tapering of the analog PSs, which raises the importance of designing an optimal amplitude-controlled PA array [196]. References [57], [197] consider beam shaping for the linearization of a hybrid array system. The ABF section is equipped with VGAs and PSs to simultaneously reduce the AM-AM error and the ACPR arising from sidelobe emission and phase differences among the parallel PAs. It has been shown that slight variations of the amplitude weights have a notable impact on the SLL, requiring extra calculations to determine the mean-DPD coefficients for different SLL cases before fixed beamforming weights are assigned in the direction of the targeted user. An alternating optimization of the DPD and APD parameters is employed in wideband hybrid MIMO-TXs to avoid complex formulations that would compromise the DPD algorithms [68], [198]. These HPD models rely on analog step uniformization to linearize all PAs in an array with a shared DPD. The similar layouts of the tested PAs enable time-saving tuning of the APD control voltages, which quickly reaches the optimization goal of the iterative DPD learning and thus reduces the feedback overhead. A brief comparison of the referenced HPD developments with a single DPD is listed in Table XIII.
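The sensitivity of the SLL to small amplitude-weight errors can be checked numerically. The Python sketch below assumes a hypothetical 16-element, half-wavelength-spaced uniform linear array steered to broadside, a Hamming amplitude taper, and independent Gaussian amplitude errors with a 0.5 dB standard deviation (e.g., VGA setting errors). It reports the peak sidelobe level outside the main lobe, where the main lobe is taken to extend to the first null on each side.

```python
import numpy as np

N = 16                                              # hypothetical array size
U = np.linspace(-1.0, 1.0, 8001)                    # u = sin(theta) grid
A = np.exp(1j * np.pi * np.outer(U, np.arange(N)))  # half-wavelength ULA steering matrix

def peak_sll_db(w):
    """Peak sidelobe level (dB relative to the main-beam peak) of the array factor |A @ w|."""
    af = np.abs(A @ w)
    af_db = 20 * np.log10(af / af.max() + 1e-12)
    i = int(np.argmax(af_db))
    lo, hi = i, i
    while lo > 0 and af_db[lo - 1] < af_db[lo]:            # walk down to the first null on the left
        lo -= 1
    while hi < len(U) - 1 and af_db[hi + 1] < af_db[hi]:   # ... and on the right
        hi += 1
    return max(af_db[:lo].max(initial=-np.inf), af_db[hi + 1:].max(initial=-np.inf))

rng = np.random.default_rng(3)
w_nom = np.hamming(N)                               # nominal amplitude taper
nominal = peak_sll_db(w_nom)
perturbed = [peak_sll_db(w_nom * 10 ** (rng.normal(0.0, 0.5, N) / 20)) for _ in range(200)]
print(f"uniform weights : {peak_sll_db(np.ones(N)):.1f} dB")
print(f"Hamming nominal : {nominal:.1f} dB")
print(f"Hamming + 0.5 dB amplitude errors (mean of 200 draws): {np.mean(perturbed):.1f} dB")
```

Under these assumptions, amplitude errors of a fraction of a decibel can noticeably raise the peak sidelobes above the nominal taper level, which is consistent with the need to recompute mean-DPD coefficients for different SLL cases.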
With Dedicated-DPD for Each Hybrid Chain: This HPD configuration is derived from full-digital beamforming and is also known as a fully connected (FC) hybrid array [52]. Dedicated DPD hardware effectively mitigates the cross-interactions among TX chains because each channel works independently, and hence it enables spatial linearization for the MU scenario, as illustrated in Fig. 10(b). However, this approach may become impractical because of its hardware overhead, especially with additional feedback processors, where high clock rates could drive the maintenance cost prohibitively high. In contrast to [190], the work in [191] minimizes multibeam interference per chain by constructing a PD signal for each beam. The modeling characteristics of a full-digital TX are applied to an analog multibeam TX to alleviate mutual interference due to PA input/output deviations. For this purpose, LS estimation of a single-channel DPD is adopted to remove both linear and nonlinear distortions effectively. Interestingly, the alleviation of linear distortion is ignored in [190] by a similar research group, since the transmission of a virtual main beam toward a direction-specific user was prioritized there. Nevertheless, multibeam communication requires high spatial isolation in all directions to enhance the multiplexing gain [108].
In parallel SAs, the coupling among RF components is much higher than that among individual array elements. Ng et al. [199] present one such system that minimizes the mutual coupling and steering mismatch of a single array. The nonlinear basis functions of a pruned VoS model of the MIMO system are adopted for a SISO-DPD to exploit the multiplexing gain of steering angles from the analog subcomponents. Yan and Cabric [200] studied the combined effects of ISI and inter-beam interference (IBI) to obtain near-optimal SE of a fully connected hybrid TX. They further evaluate the unit amplitude from the combined beamforming weights of the analog precoders for the user equipment and run the digital compensation algorithm on the IBI components of those side emissions that interfere with the main signal, while ignoring the other sidelobes. Their analysis found that the optimal SE is more or less the same regardless of the number of RF chains, even with an ideal PA. This result highlights the importance of compensating the nonlinear distortion of the beam signals toward the corresponding users. Liu et al. [201] extend this focus to improving the analog PS streams for adequate spatial isolation, removing IMD and ISI from the OTA beams of a dual-stream phased-array TX. Thanks to the multi-dimensional MP-DPD model, the cross-channel modulations are significantly suppressed at the cost of high computational complexity. The same authors extended their work in [202] by linearizing four streams of an FC-HPD array. They show that the multi-variable DPD model can effectively mitigate the high-order IMD radiation imposed on the linear beam streams in a real transmission scenario, which deviates altogether from the analog PS modulation. The RF chains are then split into a main branch and an auxiliary branch to control the extraction overhead of the DPD behavioral model, so that the nonlinear distortion is largely compensated by the main branch.
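The benefit of multi-variable DPD under cross-channel coupling can be illustrated with a toy two-chain example. The Python sketch assumes two identical memoryless Rapp PAs whose inputs are mixed by a hypothetical linear crosstalk coefficient `C`. It compares a SISO polynomial DPD per chain with a dual-input polynomial DPD that also uses the other chain's terms and cross-modulation terms x_i|x_j|^k. It is a memoryless simplification in the spirit of multi-dimensional MP-DPD, not the exact model of [201], [202].

```python
import numpy as np

C = 0.15   # hypothetical linear crosstalk between the two chains ahead of the PAs

def rapp(u, a_sat=1.0, p=2.0):
    return u / (1 + (np.abs(u) / a_sat) ** (2 * p)) ** (1 / (2 * p))

def dual_pa(z1, z2):
    """Two identical PAs whose inputs are mixed by the crosstalk C."""
    return rapp(z1 + C * z2), rapp(z2 + C * z1)

def basis(a, b, K=5, cross=True):
    """Regressors for the chain driven by a; b is the other chain (ignored when cross=False)."""
    cols = [a * np.abs(a) ** k for k in range(K)]              # SISO terms
    if cross:
        cols += [b * np.abs(b) ** k for k in range(K)]         # other chain's terms
        cols += [a * np.abs(b) ** k for k in range(1, K)]      # cross-modulation terms
    return np.stack(cols, axis=1)

def dpd_nmse_db(x1, x2, cross, iters=3):
    """Indirect-learning DPD for both chains; returns the joint output NMSE in dB."""
    z1, z2 = x1.copy(), x2.copy()
    for _ in range(iters):
        y1, y2 = dual_pa(z1, z2)
        c1, *_ = np.linalg.lstsq(basis(y1, y2, cross=cross), z1, rcond=None)
        c2, *_ = np.linalg.lstsq(basis(y2, y1, cross=cross), z2, rcond=None)
        z1, z2 = basis(x1, x2, cross=cross) @ c1, basis(x2, x1, cross=cross) @ c2
    y1, y2 = dual_pa(z1, z2)
    err = np.sum(np.abs(y1 - x1) ** 2) + np.sum(np.abs(y2 - x2) ** 2)
    return 10 * np.log10(err / (np.sum(np.abs(x1) ** 2) + np.sum(np.abs(x2) ** 2)))

rng = np.random.default_rng(6)
x1, x2 = 0.2 * (rng.standard_normal((2, 20000)) + 1j * rng.standard_normal((2, 20000))) / np.sqrt(2)
nmse_siso = dpd_nmse_db(x1, x2, cross=False)
nmse_2d = dpd_nmse_db(x1, x2, cross=True)
print(f"SISO DPD per chain : {nmse_siso:.1f} dB")
print(f"dual-input DPD     : {nmse_2d:.1f} dB")
```

The SISO DPD cannot see the other chain's signal, so the crosstalk term stays as residual error. The dual-input basis can cancel it at the price of roughly tripling the number of coefficients, which mirrors the complexity trade-off noted above.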
In [202], if the main-beam IMDs are strongly aligned with the auxiliary-branch streams, the full multi-variable basis functions are used instead as an alternative calibration. The work in [203] conducts computer simulations of an FC analog precoder framework to evaluate the sensitivity of OOB radiation, using a technique called generalized eigen-space BF to develop the high-power beam pattern. The achievable information rate under PA mismatch nonlinearity is the performance metric of interest. The Bussgang approximation (BA) is used in ABF to provide phase and amplitude corrections for the desired beam pattern of the intended group. It has been shown that the FC analog TX alone cannot fulfill the system capacity, offering only a minor OOB reduction at the desired user terminal, even with conventional DPD. However, a generalized mutual information (GMI) analysis carried out in ABF reveals improved system capacity with conventional DPD, especially for single-carrier and multi-carrier (OFDM) high-order modulations. Table XIV summarizes the HPD architectures with a dedicated DPD for each analog RF chain. Promising research directions on PA predistortion structures and MIMO linearization schemes are discussed in the next subsection; they could serve as a starting point for designers addressing the linearity challenges envisioned for next-generation mMIMO systems.
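The Bussgang approximation mentioned for [203] decomposes the output y of a nonlinear PA driven by a Gaussian input x as y = Bx + d, where B = E[y x*]/E[|x|^2] and the distortion d is uncorrelated with x. A minimal Python sketch using sample averages follows. The memoryless Rapp PA and the drive levels are assumptions for illustration.

```python
import numpy as np

def rapp(u, a_sat=1.0, p=2.0):
    return u / (1 + (np.abs(u) / a_sat) ** (2 * p)) ** (1 / (2 * p))

def bussgang(x, y):
    """Sample Bussgang decomposition y = B*x + d, with d uncorrelated with x."""
    B = np.vdot(x, y) / np.vdot(x, x)   # np.vdot conjugates its first argument: sum(conj(x) * y)
    d = y - B * x
    sdr_db = 10 * np.log10(np.abs(B) ** 2 * np.mean(np.abs(x) ** 2) / np.mean(np.abs(d) ** 2))
    return B, d, sdr_db

rng = np.random.default_rng(4)
g = (rng.standard_normal(50000) + 1j * rng.standard_normal(50000)) / np.sqrt(2)
results = {}
for rms in (0.2, 0.4, 0.6):
    x = rms * g
    B, d, sdr = bussgang(x, rapp(x))
    results[rms] = (x, B, d, sdr)
    print(f"rms drive {rms}: |B| = {abs(B):.3f}, SDR = {sdr:.1f} dB")
```

As the drive increases, the signal-to-distortion ratio drops and B shrinks below unity, which is the gain compression seen by the linear part of the beamformer.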

D. Future Research Directions
For future built-in APD designs, it is important to minimize the effects of the transistor's parasitic capacitances, especially under large input biasing [205]. Moreover, wideband operation requires high output power gain. Hence, APD circuits are expected to match the inverse transfer characteristic of the PA (from small signal to large signal) over the operating frequency range of interest for sufficient distortion cancellation. Once these shortcomings are addressed, the actively biased, high-linearity transformer-based predistorter in [34] could serve as a gain and phase adjustment tool to linearize hybrid phased-array systems. Meanwhile, the delay mismatches in dual-branch APD can be reduced in future designs [208]. In addition, the modified HPD in [66], derived from the standalone APD of [31], can be extended to CA-TXs and to the more practical case of MU MIMO-OFDM 5G systems.
Moreover, combining [66] with the parallel-processing fully-digital array of [132] would be an interesting way to extend scalable hybrid linearization to an FPGA-based baseband. On the other hand, tunable ABF sub-components with limited digital baseband calibration could be developed to compensate PA transfer functions dominated by multipath propagation and limited PS switching rotations, aiming at high SNR and low EVM, especially in far-field isotropic directions.
Since the amplitude beamforming weights of the SA beams are mainly controlled by the VGAs and are therefore independent of the PS beam-steering directions, as illustrated in [130], it would be interesting to extend the PS network of the RF-chain-free BF in [44] with VGA-dependent amplitude beamforming vectors, yielding a low-complexity solution for adaptive real-time array diversity.
Besides, building the basis functions of multidimensional ADPD at MMW frequencies needs further investigation, as concluded in [105]. In diverse BS deployments, each SA has an independent input driving signal responsible for transmitting truly uncorrelated data sequences [209]. However, inadequate array spacing always leaves impairments across the PA interconnections, which induce strong correlation losses during spatially multiplexed transmission. Such crosstalk-induced impairments have been successfully demonstrated in [108]. The additional variations in power gain (see, e.g., [210]) in the multi-channel scenario should also be considered to improve spectral containment for multi-directive beamforming.
For hybrid TXs with a shared DPD, operation with wide modulation bandwidths still needs to be addressed, because wideband PAs might exhibit stronger MEs (depending on the array isolation characteristics [190]) that are difficult for a single DPD to compensate. As the cross-memory terms are shared across the array channels, it is of interest to consider geometric sets of SISO-DPD coefficients [211] to accommodate wide modulation bandwidths in a common-DPD MIMO system. On the other hand, in the FC hybrid MIMO system [202], a potential research opportunity is to reduce the high sampling rates of the DPD modules in continuous MMW bands. The linear-decomposition-based DPD modeling in [136] can be explored for the FC-HPD architecture to limit the oversampling rates of the dedicated DSP units while preserving high linearity.

E. Summary and Lessons Learned
Characterizing large-scale MIMO-BS TRXs by their spatial distribution properties is key to understanding linearization performance and to designing PD circuits for real-world implementation. Moreover, the bandwidths of present-day communication standards keep evolving to satisfy user demand for high data rates and high quality of service. This section has discussed three PD learning architectures and their practical application scenarios for both standalone and MIMO transceiver systems; the approaches and their references are summarized in Tables X-XIV. The key lessons learned are as follows.
• APD can linearize the PA while operating over the same bandwidth as the input baseband signal, which is not the case for DPD, whose sampling bandwidth is about five times the operating band. To get the maximum benefit from APD, a common principle is to shape the drain and gate bias voltages carefully [21], since this complements maintaining an optimal instantaneous envelope of the RF carrier for satisfactory linearity.
• Because the LUT stores input/output signal magnitudes, it must contain a sufficiently dense distribution of data points so that the most appropriate samples can be selected among the discrete table entries.
• Although nonlinear modeling is the most preferred method and empowers ADPD solutions in real-time testing and tracking of array conditions, the overall derivation of the PD function may suffer when an improperly constructed coefficient matrix causes ill-conditioning. The PD regularization matrix should therefore be configured properly to handle ill-conditioning.
• The various hybrid distortion learning architectures are application-specific. For example, a TX with dedicated-DPD-based beamforming can effectively prevent spectral pollution of adjacent bands while retaining the PD performance, at the expense of additional hardware resources. In contrast, the low-complexity hybrid TX with a shared DPD provides lower spectral efficiency. Meanwhile, the resolution of power-hungry DPD engines and the array uniformity provided by APD sub-components play a pivotal role in spectral containment.
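The regularization mentioned in the third lesson is often implemented as Tikhonov (ridge) regularized least squares, c = (Φ^H Φ + λI)^(-1) Φ^H z, where Φ is the PD basis matrix. The Python sketch below uses a hypothetical memoryless Rapp PA, a ninth-order polynomial post-inverse, and an assumed λ to show how λ lowers the condition number of the normal-equation matrix. Choosing λ in practice is application-specific.

```python
import numpy as np

def rapp(u, a_sat=1.0, p=2.0):
    return u / (1 + (np.abs(u) / a_sat) ** (2 * p)) ** (1 / (2 * p))

def poly_basis(x, K=9):
    """Memoryless polynomial regressors x|x|^k, k = 0..K-1 (strongly correlated columns)."""
    return np.stack([x * np.abs(x) ** k for k in range(K)], axis=1)

def ridge_ls(phi, z, lam):
    """Tikhonov-regularized LS: minimizes ||phi c - z||^2 + lam ||c||^2."""
    g = phi.conj().T @ phi + lam * np.eye(phi.shape[1])
    c = np.linalg.solve(g, phi.conj().T @ z)
    return c, np.linalg.cond(g)

rng = np.random.default_rng(5)
x = 0.3 * (rng.standard_normal(10000) + 1j * rng.standard_normal(10000)) / np.sqrt(2)
phi = poly_basis(rapp(x))                        # post-inverse (ILA-style) regressors
cond_ls = np.linalg.cond(phi.conj().T @ phi)     # unregularized normal equations
c_reg, cond_reg = ridge_ls(phi, x, lam=1e-3)
nmse_reg = 10 * np.log10(np.sum(np.abs(phi @ c_reg - x) ** 2) / np.sum(np.abs(x) ** 2))
print(f"condition number without / with regularization: {cond_ls:.2e} / {cond_reg:.2e}")
print(f"post-inverse fit NMSE with regularization: {nmse_reg:.1f} dB")
```

A small λ leaves the fit accuracy nearly unchanged while bounding the condition number by roughly (largest eigenvalue + λ)/λ.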

V. CASE STUDIES INVOLVING PREDISTORTION MEASURES
This section illustrates the essential PD metrics and their use in analyzing the performance of several application scenarios. Four case studies are presented: cellular BS communications, satellite communications, radio over fiber (RoF) communications, and terrestrial broadcasting systems. These application areas show the performance of PD paradigms at different frequencies of 5G BS systems and, in particular, point to future strategies for realizing effective PD mechanisms for reliable communications at various scales.

A. Cellular Base Station Communications
For cellular communications, PD-based linearization has been well addressed in recent literature. For example, MBSs cooperating with traditional 4G and evolving 5G mMIMO networks have been evaluated for high throughput and linearity, supporting up to two, four, or eight up/down-link BS terminals [27], [212]. Meanwhile, the currently researched femtocell, picocell, microcell, and metrocell BSs (generally regarded as subsets of SBS) achieve high power gain and independent beam-steering capabilities through closely packed phased-array ICs, ranging from a single TX element [79] to 32-64 [213], 8-256 [214], and even up to 384 TRX elements [82]. Reference [215] presents a thorough study of the design considerations and challenges of 5G BS access technology from various dimensions. Multiple aspects must be explored to establish the feasibility of PD schemes at different scales of 5G and B5G networks; in particular, the linearization metrics of digital and RF sub-systems at different power levels and the choice of semiconductor materials ultimately need to be addressed [216].

1) Predistortion Characterization in Small-Cell Base Station:
The low-power (milliwatt-range) SBS is a more suitable choice than the monolithic MBS for reusing the available spectrum to provide ubiquitous connectivity. The all-in-one package of high-volume MIMO terminals employs low-cost, low-power PAs, for which linearization requirements are significantly relaxed or external PD calibration is even unnecessary [217], [218], [219]. However, cross-tier interference in SBS network densification always calls for additional equalization to achieve high EE [220]. Badal et al. [221] review in depth various configurations of CMOS-based micro-BS TXs suitable for 2.4 GHz ISM-band applications. For these solutions, they also describe the design challenges associated with CMOS downscaling, namely device parasitics, carrier leakage, and power consumption, for different TX configurations. The authors favor dual-conversion and direct-conversion TXs because of their enhanced linearization capacity and power-saving ability.
SBS is widely regarded as a potential enabler of MMW-HB for realizing the envisioned 1000-fold network capacity of B5G networks [222]. Hence, owing to their low power consumption, SBSs are expected to avoid the conflicting hardware-resource demands of large-scale linearization. References [223], [224] demonstrate indoor small-cell densification employing single omnidirectional and multiple BF elements, thus achieving efficient frequency reuse among closely spaced terminals. Moreover, BF nonlinearity in the form of multipath dispersion is highlighted in [223], where a Rake receiver is proposed as a constructive remedy. This strategy significantly reduces the system overhead while providing substantial SE and EE, and it further validates the effectiveness of a low-power RF-PD learning approach for omnidirectional SBS antenna elements.
Westberg et al. [225] present shipment figures for 5G BSs, fueled by the dramatic growth of mMIMO deployments, reaching over 100 million in 2021 for MMW spectrums. This growth arises because dense SBS deployments are usually preferred to increase network capacity and efficiency, especially in hard-to-reach indoor environments such as stadiums, airports, and shopping malls; [226] outlines the limited signal penetration of MBSs into high-rise buildings. In doing so, the considerable cost and hardware complexity incurred by large-scale SBS deployments must be balanced. The special issue [227] focuses on the practical challenges of large-scale MIMO systems that would jointly pave the way for hardware-efficient deployment of SBS and IoT applications. Its key philosophy is to derive novel optimization solutions for analog precoders and fast heuristic DSP algorithms that maximize the SE and EE of low-power small-cell devices. Strictly speaking, however, DPD is usually not a practical form of linearization in SBS. Reference [223] has already addressed directive line-of-sight (LOS) communication using RF analog precoder acquisition in dense small-cell constellations. Even with low-power PAs, the cost and power requirements of DPD do not scale down, since the linearization metrics remain more or less the same. However, the memory effects of broadband modulation signals can significantly degrade the performance of RF linearizers. Therefore, Campo et al. [228] implemented real-time DPD to linearize the PAs of a high-throughput graphics processing unit (GPU)-based baseband modem intended for SBS. Signal transmission and reception run back and forth across the GPU and the CPU, respectively.
The authors further used PA sets that support commercial indoor and medium-range outdoor products. The DPD was formulated to improve the PAs' inherent characteristics, reducing the initial PAPR from about 8.5 dB to 7 dB and extending the modulation bandwidth from a 20 MHz LTE signal to a 100 MHz 5G-NR signal. Their investigation concluded that a complex MP with high memory orders improves the symmetry of the PA spectra at the cost of more computation and hardware resources, whereas in the other cases the spectra remained asymmetric owing to insufficient memory depth.
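The PAPR figures above can be computed directly from baseband samples. In practice, PAPR is usually lowered by crest-factor reduction (CFR) ahead of the DPD. The Python sketch below uses plain hard clipping only as a simple stand-in for CFR, together with a complementary-CDF-based PAPR at an assumed probability of 0.01 %. The Gaussian test signal and the 7 dB target are illustrative.

```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of a complex baseband block."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

def ccdf_papr_db(x, prob=1e-4):
    """Power level (dB above average) exceeded by a fraction `prob` of samples (1e-4 = 0.01 %)."""
    p = np.abs(x) ** 2 / np.mean(np.abs(x) ** 2)
    return 10 * np.log10(np.quantile(p, 1 - prob))

def hard_clip(x, target_db):
    """Magnitude clipping at target_db above the input mean power (crude CFR stand-in, not DPD)."""
    a = np.sqrt(np.mean(np.abs(x) ** 2) * 10 ** (target_db / 10))
    mag = np.abs(x)
    return x * np.minimum(1.0, a / np.maximum(mag, 1e-12))

rng = np.random.default_rng(9)
x = (rng.standard_normal(100000) + 1j * rng.standard_normal(100000)) / np.sqrt(2)
x_clip = hard_clip(x, 7.0)
print(f"PAPR before / after clipping: {papr_db(x):.2f} / {papr_db(x_clip):.2f} dB")
print(f"0.01 % CCDF level before clipping: {ccdf_papr_db(x):.2f} dB")
```

Clipping also lowers the mean power slightly, so the measured PAPR after clipping ends up marginally above the 7 dB target.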
2) Predistortion Characterization in Macro-Cell Base Station: Next-generation networks will rely on MBSs as much as on SBSs to enhance outdoor signal strength and throughput. The MBS is envisioned to leverage high transmit power (up to several watts) and to support MU transmission with multiple data streams through a limited number of MIMO elements [229]. Unlike SBS, where APD predominates because of the concomitant SE-EE trade-off, conventional DPD is the viable linearization mode in MBS for combating MU interference while offering a flexible compromise between large PA back-off and SE. Furthermore, TXs with highly efficient PA topologies, such as the Doherty amplifier, waste less power while providing ubiquitous signal transmission and area coverage of around 1-25 km. On the other hand, such PAs suffer from poor linearity because they operate close to saturation [230].
Meanwhile, the MBSs in [231] are also assumed to jointly optimize the analog and digital PD solutions to compensate for hardware impairments. The paper presents an early-stage design model to study the impact of insufficient physical information about power-consuming RF and digital control circuits, and it proposes an optimized, energy-efficient statistical power model of candidate PA designs for 4G and 5G MBS modules. The investigation identifies the best regression points for RF analog sub-components, including the PA and the DAC/ADC, to derive an appropriate fitting function for the low power consumption of 5G MBS.
To meet the growing traffic demands of users with ultrahigh data rates, 5G and B5G MBS architectures will be densely deployed; it is therefore vital to design wireless infrastructure with high linearity and high efficiency, especially in large form-factor systems such as MBS, to keep power consumption as low as possible. GaN is one such proposition in PA design for balancing the stringent linearization and PAE constraints of 5G MBS architectures [232]. Kuwabara et al. [233] propose a highly integrated 5G active antenna array with 480 transceiving elements that achieves high effective isotropic radiated power (EIRP) for MBS coverage. Its full digital beamforming further harnesses spatial reuse of the 5G spectrum, supporting spatial multiplexing of 32 data streams from 15-element SAs. However, the development of MMW mMIMO structures for MBS is often hindered by thermal instability, especially in systems where DPD is considered unrealistic. A remedy is addressed in [233]: the design employs a unique heat-discharging mechanism in which heat pipes are placed underneath the array unit, reducing the effective temperature per channel by 13 °C. Another article [234] explores the energy-efficient design of MBS PAs with respect to heat emission and the underlying hardware resources. The authors consider picocell networking for incorporation into MBS and address heat dissipation due to excessive transistor power supply, paving the way for energy- and cost-effective, readily installable solutions. In addition, introducing a commercial dual-stage balanced PA improves several performance metrics for efficient signal transmission in outdoor pico BSs.
On this basis, the overall objective of operating a PA from a reduced DC power supply while maintaining an acceptable quality of service, in terms of average output power and linearity, has been accomplished.
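As a first-order check of how element count drives EIRP in large arrays such as the 480-element design of [233], assume ideal coherent combining over N identical elements, each delivering power P_el with element gain G_el, and neglect feed, combining, and mismatch losses. Then

```latex
\mathrm{EIRP}_{\mathrm{dBm}} = P_{\mathrm{el,dBm}} + G_{\mathrm{el,dBi}} + 20\log_{10} N
```

For hypothetical values P_el = 20 dBm, G_el = 5 dBi, and N = 64, this gives 20 + 5 + 36.1 ≈ 61.1 dBm. The 20 log10 N term combines the N-fold increase in total radiated power with the N-fold array gain.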

B. Satellite Communications
For satellite communications, PD-based linearization has been integrated with more practical and sophisticated adaptation processes to meet the target performance of future high-throughput satellites [235], [236], [237], [238], [239]. Both APD and DPD solutions may be considered for near-optimal linearization across numerous frequency applications [239]. Additionally, MP-based DPD has been actively investigated for onboard satellite transponders to counter the severe dynamic distortions of transponder filters and high-power PAs. In [237], the main architecture of an onboard satellite payload with ground-station-based DPD operating in multicarrier mode is considered. In this satellite network, the MP resources are allocated according to the band-limitation (high sampling rate) constraints in the DPD forward path. Additionally, the nonlinear behavior of the sampled feedback signal is captured from the output multiplexing filter co-designed with the high-power PA, so as to include the deleterious MEs of the payload filters. Hence, under the spectral regrowth constraint at the transponder, adaptive DSP estimation is performed in either the DLA or the IDLA, depending on the application scenario and the adopted carrier waveforms. To this end, the DPD model can be exploited at either the onboard [236], [237] or the on-ground [238] level to evaluate satellite performance when handling nonlinearities for higher transponder throughput:
• On-board predistortion: It optimizes bandwidth efficiency and filters OOB radiation by utilizing DPD gain adaptation in multicarrier waveforms with higher modulation order and larger bit-error rate.
• On-ground predistortion: It offers joint optimization of PD identification parameters to mitigate high-power PA nonlinearity and interfering carriers across the user's terminal.
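The DLA/IDLA distinction above can be made concrete with one direct-learning variant. In the Python sketch below, the ideal PD output is first found by sample-wise iterative learning, u <- u + γ(x - y/G), and a memoryless polynomial PD is then fitted directly from x to u. The Rapp PA, the step size γ, the gain G = 1, and the iteration count are assumptions, and transponder effects such as multiplexing filters are not modeled.

```python
import numpy as np

def rapp(u, a_sat=1.0, p=2.0):
    return u / (1 + (np.abs(u) / a_sat) ** (2 * p)) ** (1 / (2 * p))

def ilc_ideal_drive(x, pa, gain=1.0, gamma=1.0, iters=30):
    """Direct learning of the ideal PD output samples: u <- u + gamma * (x - pa(u) / gain)."""
    u = x.astype(complex)
    for _ in range(iters):
        u = u + gamma * (x - pa(u) / gain)
    return u

def poly_basis(x, K=7):
    return np.stack([x * np.abs(x) ** k for k in range(K)], axis=1)

def nmse_db(y, ref):
    return 10 * np.log10(np.sum(np.abs(y - ref) ** 2) / np.sum(np.abs(ref) ** 2))

rng = np.random.default_rng(8)
x = 0.25 * (rng.standard_normal(5000) + 1j * rng.standard_normal(5000)) / np.sqrt(2)
u_ideal = ilc_ideal_drive(x, rapp)
c, *_ = np.linalg.lstsq(poly_basis(x), u_ideal, rcond=None)   # PD model identified directly: x -> u
print(f"no PD          : {nmse_db(rapp(x), x):.1f} dB")
print(f"ideal ILC drive: {nmse_db(rapp(u_ideal), x):.1f} dB")
print(f"fitted PD model: {nmse_db(rapp(poly_basis(x) @ c), x):.1f} dB")
```

Unlike the IDLA, no post-inverse of the PA is identified here; the model is fitted to the learned PD output itself, which avoids the bias that noisy PA outputs introduce into post-inverse training.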

C. Radio Over Fiber Communications
The PD model has been used in RoF transmission, taking into account the realistic impairments of optical components such as lasers, photodiodes, filters, and semiconductor amplifiers. Specifically, to analyze the combined nonlinear effects of long-haul RoF transmission systems efficiently, the authors of [240] and [241] incorporate DPD and Schottky-diode-based APD for broadband LTE and 5G systems, respectively. In the DPD approach, the feedback information is tested under PVT variations without incurring delay. In short, the on-demand DPD operation is invoked only in the event of significant PVT variations, providing an effective way to update the coefficients of the RoF downlink channel infrequently. In the APD approach, the nonlinearities of the laser diode are compensated. Using identical, equidistantly spaced Schottky diodes for interference cancellation with the same impedance matching yields a wide linearization bandwidth of 1.8 GHz, well suited to bandwidth-intensive 5G applications.

D. Terrestrial Broadcasting Systems
DPD has also recently been applied to the severe distortions of high-power PAs in terrestrial broadcasting TXs [242], [243]. DPD-based iterative correction, posed as a non-convex optimization problem, can be used to analyze the quality of digital broadcasting signals with fast convergence. In [242], DPD is formulated as a convex problem, which is not necessarily a perfect design method owing to the complicated impedance mismatch between the correction and the real PA. A memoryless solid-state PA is used, and static linearization is achieved for an OFDM signal with the AM-AM distortion typical of terrestrial broadcasting transmission. This simulation-based iterative analysis shows that convergence performance depends mostly on the additional parameters employed to tune the convergence. The feasible number of online convergence parameters in broadcasting systems was investigated in [243]. That study examines divergence of the output and the poor handling of large input amplitudes in a signal-based DPD scheme, and it proposes optimal coefficient extraction strategies near the saturation level of high-power PAs.
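For a memoryless solid-state PA such as the one in [242], static (AM-AM) linearization can be illustrated with the Rapp model, whose AM-AM curve has a closed-form inverse. Writing s = |x|/A_sat for the desired normalized output amplitude, the required normalized drive is r = s / (1 - s^(2p))^(1/(2p)). The Python sketch below assumes the Rapp model and its parameters. It is not the iterative correction scheme of [242] or [243], and it limits s below 1 because the PA cannot produce outputs at or above saturation.

```python
import numpy as np

def rapp(u, a_sat=1.0, p=2.0):
    """Memoryless Rapp AM-AM model of a solid-state PA (unit small-signal gain, no AM-PM)."""
    return u / (1 + (np.abs(u) / a_sat) ** (2 * p)) ** (1 / (2 * p))

def rapp_inverse_pd(x, a_sat=1.0, p=2.0, s_max=0.99):
    """Closed-form AM-AM inverse of the Rapp model; the phase of x is kept unchanged."""
    s = np.minimum(np.abs(x) / a_sat, s_max)       # desired normalized output, limited below saturation
    r = s / (1 - s ** (2 * p)) ** (1 / (2 * p))    # required normalized PA drive
    return a_sat * r * np.exp(1j * np.angle(x))

rng = np.random.default_rng(7)
x = 0.3 * (rng.standard_normal(20000) + 1j * rng.standard_normal(20000)) / np.sqrt(2)
y_pa = rapp(x)
y_lin = rapp(rapp_inverse_pd(x))
mask = np.abs(x) < 0.9                             # samples the PA can reproduce exactly
print("max AM-AM error without PD:", np.max(np.abs(y_pa - x)))
print("max AM-AM error with PD (|x| < 0.9):", np.max(np.abs(y_lin[mask] - x[mask])))
```

Below the clipping limit the cascade is linear up to floating-point error; samples that would require outputs near saturation are where the convergence and large-amplitude issues discussed in [243] arise.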

E. Future Research Directions
The research efforts above demonstrate the feasibility of PD schemes at different levels. These contributions serve as a valuable guide for devising versatile linearization solutions that integrate seamlessly with future generations of wireless communication.
1) Airborne Communications: PD schemes for aerial and ad hoc networks, such as UAV-enabled next-generation frameworks, are strongly appealing for 5G and 6G standards because of their wide coverage and capacity [244]. In this regard, fast heuristic real-time acquisition of feedback data with near-zero latency, for different user mobility patterns and nonlinear system models, can be considered in future research. In existing works on the TX side, either a one-to-one mapping is formed between the PA and the cascaded PD equipment to support simultaneous uplink and downlink channel estimation, or a high-precision beam-steering link is established to increase spectrum utilization. As illustrated in [245], a steered beam designed for single-target spatial linearization causes interference over space for different spatial distributions and narrows the user's moving range, resulting in highly directive PD operation. Extending this framework to multi-target linearization can be explored across different aerial communication schemes to improve air-to-ground channel characteristics, since a wide linearization angle could support high-mobility scenarios.
2) Backscatter Communications (BackComm): PD schemes in mMIMO-enabled BSs can be developed for the ambient waves of BackComm networks, a new and appealing research area for future connected IoT devices that demand ultra-low power consumption. As in the conventional MIMO transmission scenario [108], PD system models with multiple digital and/or analog sources at backscatter TXs/RXs need to be considered to mitigate interference between the transmitted ambient and received backscattered signals. As indicated in [246], BackComm may still be degraded by layout interconnect parasitics despite supporting complex modulation. This calls for appropriate signal PD/equalization at the TX/RX side that provides suitable biasing conditions to the integrated backscatter TRX so that more complex schemes can be supported, which is an avenue for future research.
3) Other Enabling Technologies: a) Linearization scalability: The output power requirements are relaxed as frequency-band deployment becomes denser; therefore, scalable PD configurations should be among the most important considerations for the viable deployment of 5G and beyond networks, because the power overhead of existing DPD structures remains the same irrespective of network densification. In this regard, further exploration of hybrid linearization solutions can help offset the high power consumption of digital processing components and ease the implementation complexity of integrated front-end circuitry operating in MMW spectrums.
b) Thermal management: Heat-transfer solutions need further optimization to provide long-lasting operation at the high packaging densities of different communication schemes. Future research could relax the strict heat-removal requirements and the subsequent maintenance by combining cost-effective PD solutions with heat-sinking fins for passive cooling in distributed radio channels [247].
c) Concurrent transmission: Existing PD solutions in SBS-TRXs schedule a communication link for a single user only [132], supporting either a frequency-division duplex (FDD) [248] or a time-division duplex (TDD) phased array [38], [65]. It would be interesting to evaluate the efficacy of current linearization solutions for the concurrent transmission of closely packed radio spectrums in future wireless nodes, especially in non-standalone FDD/TDD-CA systems. Such transmission has recently received considerable attention [249] after Ericsson announced successful trials of TDD-FR1/FDD-FR2 CA proof-of-concept systems [9].
d) Interference management: Insufficient robustness to interference has always been a symptomatic issue for microwave and MMW communications. From the 5G cellular BS point of view, one example is multi-tier (or cross-tier) interference from the SBS to the MBS, which has been extensively addressed in recent works [15], [250]. Ultimately, variable load conditions caused by unstable PA interconnections are the main source of electromagnetic interference, regardless of whether the network is small-scale or large-scale. The latest PD designs and the paradigm shift towards learning algorithms [61], [108] could be investigated to maximize radiated power and SE over wider transmission bandwidths while handling the practical constraints of inter/intra-cell modulation distortions [58], [60].

F. Summary and Lessons Learned
The linearization of strongly nonlinear MMW systems will have to be performed in dynamic broadband propagation environments with a high density of transmission modules and users. This highly saturated spectral environment further increases the linearization complexity at the intended nodes. In addition, sensitivity to interferers and spurs in densified networks can increase the nonlinearity order and memory depth, which makes the performance of various PD aspects at the baseband and architectural levels a critical issue. Eventually, the severe nonlinearity of MMW spectrums in denser 5G and B5G networks will raise many new challenges worth exploring, such as the power scalability of PD subsystems, periodic optimization for ambient thermal variations, concurrent transmission protocols, and interference management. The key lessons learned from this section are as follows.
• In mMIMO communications enabled by sub-6 GHz and MMW frequencies, SBSs and MBSs must be deployed simultaneously at hotspot and large-area levels using the same orthogonal resources. Handling the cross-tier interference between their transmissions requires joint coordination and optimization of established PD techniques, which raises system cost and computational demand.
• In integrated SBSs that adopt MMW communications, the network EE and the gains of FEMs can be complemented by assembling low-power PAs. The SBS communication function is therefore typically suited to the short transmission distance between a user equipment and the SBS under good channel conditions [15]. However, the main challenge is understanding the impact of cascaded coordination and optimization of hybrid precoders in exploiting spatial diversity, which calls for flexible and scalable distortion modeling.
• 5G MBS networks add further "operating-expense" constraints, which are aggravated by the increased power consumption of communication equipment. Optimizing PD functions can considerably complicate the power-allocation problems in the MU-MMW scenario. DPD solutions for GaN-based FEMs should be investigated in high-power macro-cellular BSs in the presence of linearizability clutters [251].
• The use of PD schemes in satellite-based communications can enhance the spectral density of the transponder by mitigating the spectral regrowth from the payload subcomponents and resolving the multicarrier reflections.
• In RoF architectures, long-range transmission bandwidths may not yield optimal communication or downlink performance, because transmitting superimposed multiplexed subcarriers through a fiber cable or free space incurs a linearity loss in the centralized light source [241].
• In broadcast PAs, linearization performance is affected by cumulatively inefficient back-off and by DPD convergence. Model identification in high-power broadcast PAs should therefore be formulated with well-defined convexity bounds, which benefits convergence for input baseband symbols with high-order modulation diversity [242].
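The convexity point in the last bullet can be made concrete with the classical indirect-learning architecture (ILA): when a memory-polynomial (MP) model is linear in its coefficients, each identification step is an ordinary least-squares problem. The Python/NumPy sketch below is illustrative only; the toy Wiener-type PA (`toy_pa`), its coefficients, and the model orders `K` and `M` are assumptions, not values taken from [242] or any other cited work.

```python
import numpy as np

def delayed(x, m):
    """Return x[n - m], padding the first m samples with zeros."""
    return np.concatenate([np.zeros(m, dtype=complex), x[: len(x) - m]])

def mp_basis(x, K, M):
    """Memory-polynomial regressors x[n-m] * |x[n-m]|**(k-1), k = 1..K, m = 0..M-1."""
    return np.stack([delayed(x, m) * np.abs(delayed(x, m)) ** (k - 1)
                     for m in range(M) for k in range(1, K + 1)], axis=1)

def toy_pa(z):
    """Hypothetical Wiener-type PA: short FIR memory, then cubic compression with AM/PM."""
    u = z + 0.05 * delayed(z, 1)
    return u * (1 - (0.15 - 0.05j) * np.abs(u) ** 2)

def nmse_db(ref, out):
    """Normalized mean-square error of out with respect to ref, in dB."""
    return 10 * np.log10(np.sum(np.abs(out - ref) ** 2) / np.sum(np.abs(ref) ** 2))

def ila_train(x, pa, K=5, M=3, iters=4):
    """Indirect learning: fit a post-inverse of the PA by least squares, copy it to the predistorter."""
    w = np.zeros(K * M, dtype=complex)
    w[0] = 1.0                      # column 0 is x[n], so this is the identity predistorter
    for _ in range(iters):
        z = mp_basis(x, K, M) @ w   # predistorted PA drive
        y = pa(z)                   # PA output; unit small-signal gain assumed, no gain normalization
        # The post-inverse y -> z is linear in w: an ordinary (convex) least-squares problem.
        w, *_ = np.linalg.lstsq(mp_basis(y, K, M), z, rcond=None)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = 0.25 * (rng.standard_normal(4000) + 1j * rng.standard_normal(4000)) / np.sqrt(2)
    w = ila_train(x, toy_pa)
    print("NMSE without DPD: %.1f dB" % nmse_db(x, toy_pa(x)))
    print("NMSE with ILA-MP DPD: %.1f dB" % nmse_db(x, toy_pa(mp_basis(x, 5, 3) @ w)))
```

Each inner fit has a unique minimizer whenever the regression matrix has full column rank; the outer ILA iteration, however, carries no such guarantee in general.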

VI. EMERGING RESEARCH TRENDS
The previous sections have presented concrete and promising PD solutions for linearized 5G and B5G BS geometries, highlighting the key offerings and technical challenges of various linearization methods. This section presents areas that have received significant attention recently and are expected to remain important topics for future research.

A. Millimeter-Wave MIMO Predistortion
Compared with traditional wireless networks using standalone microwave bands, MMW frequencies in mMIMO systems (allocated jointly with microwave bands in a unique propagation environment [252]) are the real enabler for future generations of wireless technology. PD-based linearization also forms the foundation of this evolving domain for maintaining reasonably good spectrum and power efficiency. There is growing interest in PD solutions that compensate for the nonlinear terms of the multiple, mutually different PAs operating in MIMO active antenna arrays. This has raised considerable open research problems in nonlinear distortion modeling and the corresponding PD solutions, which need significant attention when designing MMW-mMIMO systems.
1) From Standard SISO to Diverse MIMO Predistortion Systems: SISO-based PD processing and learning have been widely adopted to design and analyze sub-6 GHz modulated signals. They exploit nonlinear reciprocity in a single-block DPD model, i.e., a classical single set of DPD polynomials learned for each individual PA element. This has motivated the design of a hierarchical dual-input DPD model that offers a high-performance solution for multi-element beamforming arrays [102], [253]. Building on the frequency reconfigurability of dual-input DPD modeling, this architecture supports nonlinear characterization with multidimensional forward and reverse inputs, which are assigned to simultaneously monitor the PA outputs and potential antenna crosstalk. Under pure LOS conditions, such DPD processing has been found to minimize the direction-dependent (θ, φ) OOB emissions in both narrowband [102] and wideband [211] BFs, improving coverage and capacity for the intended single user. However, because such a DPD learning method is built around mildly nonlinear operating points, it is difficult to meet the standard-compliant ACLR metrics of 3GPP 5G-NR specifications for FR2 bands [77]. Moreover, the finite spatial domain can cause insufficient linearization for a desired user in a random direction, or for a victim user located within the swath of the intended RX.
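To illustrate why a reverse input helps, the sketch below compares a single-input polynomial with a dual-input one when part of a PA's output is driven by a coupled reverse wave `r`. The regressor set (`r` weighted by powers of `|x|`), the coupling level, and the PA coefficients are assumptions chosen for illustration; they loosely follow the dual-input idea described above rather than reproducing the models of [102] or [253].

```python
import numpy as np

def single_input_basis(x, K):
    """Memoryless regressors x * |x|**(k-1), k = 1..K."""
    return np.stack([x * np.abs(x) ** (k - 1) for k in range(1, K + 1)], axis=1)

def dual_input_basis(x, r, K):
    """Hypothetical dual-input regressors: own input x plus reverse wave r shaped by |x|."""
    cross = np.stack([r * np.abs(x) ** (k - 1) for k in range(1, K + 1)], axis=1)
    return np.hstack([single_input_basis(x, K), cross])

def fit_nmse_db(Phi, y):
    """Least-squares fit y ~ Phi @ c and return the residual NMSE in dB."""
    c, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    err = y - Phi @ c
    return 10 * np.log10(np.sum(np.abs(err) ** 2) / np.sum(np.abs(y) ** 2))

def simulate(n=4000, coupling=0.1, seed=0):
    """Return (single-input NMSE, dual-input NMSE) for a toy two-element coupling scenario."""
    rng = np.random.default_rng(seed)

    def cgauss():
        """Complex Gaussian samples with RMS amplitude 0.25."""
        return 0.25 * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

    x1, x2 = cgauss(), cgauss()      # drive signals of two neighbouring array elements
    r1 = coupling * x2               # reverse wave reaching PA 1 (assumed linear coupling)
    # Hypothetical PA 1 response: own compression plus an envelope-modulated crosstalk term.
    y1 = x1 * (1 - 0.12 * np.abs(x1) ** 2) + r1 * (0.3 - 0.2 * np.abs(x1) ** 2)
    y1 = y1 + 1e-4 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))  # measurement noise
    return (fit_nmse_db(single_input_basis(x1, 3), y1),
            fit_nmse_db(dual_input_basis(x1, r1, 3), y1))

if __name__ == "__main__":
    single, dual = simulate()
    print(f"single-input NMSE: {single:.1f} dB, dual-input NMSE: {dual:.1f} dB")
```

Because the crosstalk term is statistically independent of the PA's own input, no single-input polynomial can absorb it, whereas the dual-input regressors capture it down to the assumed noise floor.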
The pioneering work addressing the shortcomings of single-user DPD processing is [108], inspired by its authors' earlier demonstration [140]. Its main philosophy is to enhance spatial coverage and performance gain for MU scenarios through multibeam linearization [191], so that arbitrarily located affected users are fully compensated. It is fair to say that these contributions on MU-MIMO PD systems have adopted the TDD transmission and reception mode to avoid the coefficient-training and feedback overhead of DPD parameter estimation. Moreover, in high-mobility and obstructed zones, the TDD mode supporting MU beamforming is no longer effective. Some recent works can help to address the self-interference and impaired-hardware constraints of full-duplex mMIMO systems [254], [255]. However, to the best of our knowledge, there are still no real-time adaptations or applications of FDD-based MIMO systems that support seamless MU connectivity for 5G-NR BSs; this remains a subject for investigation, informed by recently presented ray-tracing simulations.
2) Mutual Coupling: Mutual coupling is another important cause of performance degradation in multi-antenna FEMs. It can shift the source and load impedances away from their ideal 50-Ω point. The issue must therefore be mitigated in realistic scenarios that employ continuous real-time testing of MMW signals [39], [65]. This is generally challenging because the PD inversion of strongly nonlinear systems needs high-order polynomial formulations with MEs, and the encapsulated PA elements are set to operate near the compression region for high-efficiency performance [20], which maximizes the level of nonlinearity. Future work should address the randomness of crosstalk or mutual-coupling distortion beyond the assumption of mutually identical PAs [190].
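The impedance shift caused by mutual coupling can be quantified with the active reflection coefficient Γ_m = Σ_n S_mn a_n / a_m, which is seen by PA m when all elements are excited with steering weights a_n. The sketch below evaluates it for an 8-element, λ/2-spaced uniform linear array; the coupling model in `coupling()` (magnitude halving per extra element of separation, free-space phase, perfectly matched isolated elements) is a hypothetical assumption, not measured data.

```python
import cmath
import math

def coupling(m, n, c1=0.15, decay=0.5):
    """Hypothetical S_mn for a lambda/2-spaced ULA: |S| shrinks by `decay` per extra element of
    separation, and the phase follows the free-space delay pi * |m - n|."""
    if m == n:
        return 0j                      # isolated elements assumed perfectly matched (S_mm = 0)
    d = abs(m - n)
    return c1 * decay ** (d - 1) * cmath.exp(-1j * math.pi * d)

def steering_weights(num_elems, theta_deg):
    """Phase-only beam-steering weights a_n for a lambda/2-spaced ULA."""
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * math.pi * n * s) for n in range(num_elems)]

def active_reflection(num_elems, theta_deg):
    """Active reflection coefficient Gamma_m = sum_n S_mn * a_n / a_m for every element m."""
    a = steering_weights(num_elems, theta_deg)
    return [sum(coupling(m, n) * a[n] for n in range(num_elems)) / a[m]
            for m in range(num_elems)]

def load_impedance(gamma, z0=50.0):
    """Impedance presented to the PA for a given reflection coefficient."""
    return z0 * (1 + gamma) / (1 - gamma)

if __name__ == "__main__":
    for theta in (0.0, 30.0, 45.0):
        g = active_reflection(8, theta)[3]          # an inner element of the array
        print(f"theta={theta:4.1f} deg  |Gamma|={abs(g):.3f}  Z={load_impedance(g):.1f} ohm")
```

Under these assumed couplings, the same inner element sees a clearly different load at broadside than at 45°, which is the beam-dependent load variation that identical-PA DPD assumptions ignore.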
3) Algorithm Complexity and Compressive Sampling: Existing algorithms that tune the nonlinear coefficients still have open issues (as discussed in Section III-A), especially for the wideband (CA) MEs of PA units in MMW active arrays. On the other hand, compressive sampling is a fertile research topic in signal processing, particularly for MMW communication, owing to the sparse nature of MMW channels. This high sparsity means that wide instantaneous modulation bandwidths do not necessarily require large digitization bandwidths (as discussed in Section III-B). The main limiting factor is the amount of aliasing that can be tolerated: the captured signal must retain enough spectral-regrowth information to accurately estimate the aliased time/frequency-domain samples [29], [136]. With a simplified DPD feedback RX designed for minimal aliasing, the technique reduces signaling overhead and computational complexity by using band-limited DPD beacons to estimate the model coefficients, which significantly reduces the ADC acquisition bandwidth and sampling rate. It is also an integral technique for baseband precoders, enabling novel learning algorithms that need fewer data samples and computational resources. Compressed sampling can therefore play a key role in developing implicit model-extraction algorithms for the effective linearization of true MMW wireless BSs.
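A minimal sketch of the reduced-rate feedback idea follows. Because a memory-polynomial model is linear in its coefficients and the full-rate input `x` is known at the transmitter, the coefficients can be identified from PA output samples captured only at every `D`-th instant, i.e., by an ADC running `D` times slower without an anti-aliasing filter. The PA coefficients, `D`, and the model orders are assumptions; this is plain time-domain undersampling, not compressive sensing in the sparse-recovery sense, and it is not claimed to be the specific method of [29] or [136].

```python
import numpy as np

def delayed(x, m):
    """Return x[n - m], padding the first m samples with zeros."""
    return np.concatenate([np.zeros(m, dtype=complex), x[: len(x) - m]])

def mp_basis(x, K, M):
    """Memory-polynomial regressors x[n-m] * |x[n-m]|**(k-1), k = 1..K, m = 0..M-1."""
    return np.stack([delayed(x, m) * np.abs(delayed(x, m)) ** (k - 1)
                     for m in range(M) for k in range(1, K + 1)], axis=1)

def identify_undersampled(x, y_sub, D, K, M):
    """Least-squares MP fit using only the rows where a slow-ADC feedback sample exists."""
    Phi = mp_basis(x, K, M)[::D]        # full-rate regressors, decimated to feedback instants
    c, *_ = np.linalg.lstsq(Phi, y_sub, rcond=None)
    return c

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    K, M, D = 3, 2, 8
    x = 0.25 * (rng.standard_normal(4096) + 1j * rng.standard_normal(4096)) / np.sqrt(2)
    # Hypothetical PA coefficients, ordered (m=0,k=1..3) then (m=1,k=1..3).
    c_true = np.array([1.0, 0.02j, -0.1 + 0.03j, 0.05, 0.0, -0.01])
    y = mp_basis(x, K, M) @ c_true      # full-rate PA output (never captured in full)
    c_hat = identify_undersampled(x, y[::D], D, K, M)
    print("max coefficient error:", np.max(np.abs(c_hat - c_true)))
```

The example is noise-free for clarity; with 512 feedback samples for 6 unknowns the fit is heavily overdetermined, and the identified model reproduces the full-rate output even though the feedback path never observed it.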

B. Predistortion Based on NOMA Scheme
Compared with the PD integration of MMW-MIMO with 5G-NR OFDM signals, which is a well-studied domain [93], [111], the combination of PD methods with the NOMA scheme in emerging PA architectures is still an underdeveloped research area with ample room for contributions. Generally, two types of NOMA multiplexing are used in 5G MU communication: 1) power-domain NOMA (pNOMA), which groups users by power coefficients, and 2) code-domain NOMA (cNOMA), which groups users by codeword-modulated symbols [256]. In what follows, we outline some promising research directions for integrating emerging PD methods with NOMA multiplexing waveforms.
1) Predistortion Facets for pNOMA Waveform: As mentioned above, the real-time integration of NOMA into modern PAs is in its infancy [257], [258], and much work remains to be done in this area. Specifically, the high PAPR, combined with the requirement of a minimum bit-error rate per subcarrier, brings unique challenges to the acquisition of downlink pNOMA multiplexed signals. In particular, the nonlinear distortion of the downlink NOMA system prevents reliable decoding at the RX and aggravates error propagation for users sharing the same resources, even if perfect successive interference cancellation (SIC) is applied [258]. Successful SIC reception in NOMA networks requires high computational power and processing time, especially in real-time MMW-mMIMO systems that demand iterative nonlinearity profiling for all transmitting users. This characteristic of pNOMA makes it challenging for the SIC receiver to mitigate multipath fading effects.
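The interaction between PA compression and SIC can be illustrated with a two-user downlink pNOMA link. The sketch below superimposes a far user's and a near user's QPSK symbols with power shares `p_far` and `p_near`, passes the sum through a memoryless Rapp PA model, and runs two-stage SIC at the near user. All parameters (power split, Rapp smoothness `p`, noise level) and the assumption that the receiver normalizes by the known drive level are illustrative choices, not taken from [257] or [258].

```python
import numpy as np

def qpsk(bits_i, bits_q):
    """Per-rail QPSK mapping: bit 1 -> +1/sqrt(2), bit 0 -> -1/sqrt(2)."""
    return ((2 * bits_i - 1) + 1j * (2 * bits_q - 1)) / np.sqrt(2)

def rapp_pa(x, a_sat=1.0, p=2.0):
    """Memoryless Rapp AM/AM model (no AM/PM) with unit small-signal gain."""
    return x / (1.0 + (np.abs(x) / a_sat) ** (2 * p)) ** (1.0 / (2 * p))

def near_user_ser(drive, p_far=0.8, p_near=0.2, noise_std=0.08, n_sym=20000, seed=1):
    """Near-user symbol error rate after two-stage SIC for a given PA drive level."""
    rng = np.random.default_rng(seed)
    b = rng.integers(0, 2, size=(4, n_sym))
    s_far, s_near = qpsk(b[0], b[1]), qpsk(b[2], b[3])
    x = drive * (np.sqrt(p_far) * s_far + np.sqrt(p_near) * s_near)   # superposition coding
    noise = noise_std * (rng.standard_normal(n_sym) + 1j * rng.standard_normal(n_sym))
    # Noise is added after normalization so that only PA compression changes with drive.
    r = rapp_pa(x) / drive + noise
    # SIC stage 1: detect the far user's (stronger) symbol, treating the near user as noise.
    s_far_hat = qpsk((r.real > 0).astype(int), (r.imag > 0).astype(int))
    # SIC stage 2: cancel the far-user contribution and detect the near user's symbol.
    resid = r - np.sqrt(p_far) * s_far_hat
    err_i = (resid.real > 0) != (b[2] == 1)
    err_q = (resid.imag > 0) != (b[3] == 1)
    return np.mean(err_i | err_q)

if __name__ == "__main__":
    for drive in (0.3, 0.6, 1.0):
        print(f"drive={drive:.1f}  near-user SER={near_user_ser(drive):.4f}")
```

As the drive approaches saturation, the outer points of the composite constellation are compressed toward the inner ones, so the residual left after cancelling the far user shrinks and the near-user SER rises even though the SNR is held fixed.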
On the other hand, in OFDM systems, many ongoing research works jointly optimize PAPR reduction and PD matrices; see, for example, [259] and the references therein, which explore a unified, cooperative PAPR-reduction and DPD model. In pNOMA networks, however, there is a lack of research or real applications integrating PD solutions to strengthen downlink transmissions. Although several high-PA standards based on analytical assessments of pNOMA schemes have recently been proposed [257], [258], further exploration is needed because the PA in modern TRX applications demands real-time adaptation to high PAPR values. In [260], SIC stability was addressed in a downlink pNOMA-enabled scenario that normalizes the designated power with superposition coding across the allocated NOMA users. In essence, customizing such schemes and solutions, in one form or another, to jointly combat inter/intra-cell modulation distortions in an MU-pNOMA-enabled TRX system is an area worthy of future work.
2) Predistortion Facets for cNOMA Waveform: The cNOMA scheme can be viewed as another approach to analytical downlink communication with a nonlinear signal-processing system. Unlike pNOMA, it does not require users to be separated by power allocation; instead, cNOMA supports MU diversity through a unique codeword for each user. Integrating PD into the cNOMA setting means that the reconfigurable properties of the trained PA predistortion model can be used to reduce the otherwise unbearable PAPR, enabling accurate detection of user symbols at the RX. However, how best to exploit the most suitable convergence algorithm for PA distortion modeling to maximize cNOMA gains remains an open question in its preliminary phase, and only a few polynomial models have been developed [261].

C. Linearizability of Monostatic Systems
An intelligent shared-TRX design can benefit large-scale arrays with value-added features such as reduced chip area and power consumption, and can compensate for inadequacies such as high noise figure, gain and phase mismatches, and parasitics arising from PVT variations. Joint TX-RX time frames in a switch-based TDD operating environment have improved the applicability of APD in coordinated FEMs [165]. However, the many losses in wideband matching networks and TRX switches of such practical RF circuits at MMW frequencies remain an issue worth exploring. This is especially challenging for frequency-selective MIMO channels with strong fading, which make the time-scale modulation process difficult for gain and phase adjustments [72].

D. Hybrid Predistortion
HPD leverages the benefits of both analog and digital PD solutions while guaranteeing the linearity and performance requirements of highly nonlinear system applications. In particular, the predistorter is currently widely implemented in the hybrid baseband [52], which has emerged as an attractive proposition because of its much lower implementation complexity, and it outperforms existing DPD-only [25] or APD-only [262] PA setups. When hybrid SAs are designed within a consolidated MIMO platform, linearly combining the RF signals and then performing parameter estimation, especially over MMW coherence bandwidths, remains a formidable challenge. Nevertheless, pseudorandom sequences of memory coefficients in [108] and the references therein have shown good autocorrelation properties of the SA output signals, close to those obtained with architectures without SAs [61].
In MIMO systems, a co-designed HPD framework can be treated as a group that jointly maximizes the array uniformity and spatial isolation provided by APD biasing and the behavioral modeling offered by the shared DPD [198]. Interest in HPD solutions for beamforming design is motivated by the fact that the number of DSP modules is lower-bounded only by the feedback paths from the PA outputs, while the analog phase and gain normalization circuits only determine the SLL upper bounds. However, the extent to which the target ACPR can be achieved is sensitive to the nonuniformity of analog precoders [57]. It is therefore important to consider APD uncertainties and the nonlinear characteristics of individual PAs when developing hybrid linearization for MIMO systems.
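The sensitivity of a shared DPD to PA nonuniformity can be sketched as follows: one predistorter is trained by indirect learning on a single combined feedback signal (here simply the average of the PA outputs, standing in for a main-beam observation), and its residual error is then evaluated for each PA. The four memoryless cubic PA models and their coefficient spread are hypothetical, and the APD stages and steering phases of a real HPD system are omitted.

```python
import numpy as np

def poly_basis(x, K):
    """Memoryless regressors x * |x|**(k-1), k = 1..K."""
    return np.stack([x * np.abs(x) ** (k - 1) for k in range(1, K + 1)], axis=1)

def make_pa(c):
    """Hypothetical memoryless PA with cubic compression coefficient c (complex for AM/PM)."""
    return lambda u: u * (1 - c * np.abs(u) ** 2)

def nmse_db(ref, out):
    """Normalized mean-square error of out with respect to ref, in dB."""
    return 10 * np.log10(np.sum(np.abs(out - ref) ** 2) / np.sum(np.abs(ref) ** 2))

def train_shared_dpd(x, pas, K=5, iters=4):
    """Indirect learning of ONE predistorter from a single combined (averaged) feedback signal."""
    w = np.zeros(K, dtype=complex)
    w[0] = 1.0                                          # identity predistorter
    for _ in range(iters):
        z = poly_basis(x, K) @ w                        # common drive applied to every PA
        y_comb = np.mean([pa(z) for pa in pas], axis=0) # assumed combined observation
        w, *_ = np.linalg.lstsq(poly_basis(y_comb, K), z, rcond=None)
    return w

def evaluate(x, pas, w):
    """NMSE of the combined output and of each individual PA output, in dB."""
    z = poly_basis(x, len(w)) @ w
    outs = [pa(z) for pa in pas]
    return nmse_db(x, np.mean(outs, axis=0)), [nmse_db(x, y) for y in outs]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = 0.2 * (rng.standard_normal(4000) + 1j * rng.standard_normal(4000)) / np.sqrt(2)
    pas = [make_pa(c) for c in (0.08 - 0.02j, 0.12 - 0.05j, 0.18 - 0.03j, 0.22 - 0.06j)]
    w = train_shared_dpd(x, pas)
    combined, per_pa = evaluate(x, pas, w)
    print(f"combined NMSE: {combined:.1f} dB")
    print("per-PA NMSE:", [f"{v:.1f}" for v in per_pa])
```

The combined output is linearized well, but each PA retains a residual proportional to how far its own coefficient lies from the array average, which mirrors the nonuniformity sensitivity noted above.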
Meanwhile, the exponential complexity and power consumption of feedforward APD circuits in hybrid linearization still need further investigation [193]. Among various possibilities, HPD solutions with such distortion-compensation circuitry have attracted increasing attention because they can handle the long-term MEs of wideband waveforms. The in-depth analysis in [263] serves as a guiding principle for designing energy-efficient baseband precoders at high frequencies such as MMW to mitigate the intractable distortions of broadband systems.

E. Predistortion for 6G-Enabled Communications
While early research on super-wideband 6G (or B5G) communications has already intensified, it is also of interest to develop energy- and spectrum-efficient solutions, because the large bandwidths of 6G modulations require burst-like waveforms with high PAPR [264]. Energy dissemination becomes particularly important in the context of massive deployments of superdense small cells and IoT devices [265], [266].
Furthermore, enhanced CA in future communications will inevitably congest the shared frequency spectrum, which necessitates frequent updating of distortion coefficients. This, in turn, raises the importance of computational simplicity in the continuous acquisition and extraction of DPD coefficients. In this regard, [267] counterbalances, in real time, the nonlinear dynamic effects induced during DPD learning by 6G frequency-hopping carrier waveforms. The basic idea of the heuristic DPD model proposed in [267] is to enhance coefficient-training capacity for constant-modulus waveforms with large subcarrier spacing in the lower 6G MMW (26 and 28 GHz bands) regime. However, the very large bandwidths of sub-THz frequencies have not been adequately exploited to achieve high SE with fair transmit power, mainly because of their high path loss. The works in [269] and [270] are recent attempts to study the impact of high propagation losses on PA linearization metrics at the intended 6G frequencies. Such insights would pave the way for MIMO systems with satisfactory linearization performance in evaluation scenarios involving upper MMW and sub-THz frequencies.
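Frequent coefficient updates favor recursive estimators over repeated batch least squares. The sketch below tracks the coefficients of a hypothetical memoryless polynomial PA model with exponentially weighted recursive least squares (RLS), whose per-sample cost is O(P^2) for P coefficients; the abrupt coefficient change at `switch_at` stands in for a carrier or thermal change. The forgetting factor, regularization, and PA coefficients are assumptions, and this is a generic RLS recursion rather than the heuristic model of [267]; the same update can also be applied to the post-inverse in an indirect-learning loop.

```python
import numpy as np

def poly_row(xn, K):
    """Regressor row [x, x|x|, ..., x|x|**(K-1)] for one complex sample."""
    return np.array([xn * abs(xn) ** (k - 1) for k in range(1, K + 1)])

class RLS:
    """Exponentially weighted RLS for the linear model d = u^T w (complex-valued)."""

    def __init__(self, n_coeffs, lam=0.995, delta=1e-2):
        self.w = np.zeros(n_coeffs, dtype=complex)
        self.P = np.eye(n_coeffs, dtype=complex) / delta   # inverse correlation estimate
        self.lam = lam                                      # forgetting factor

    def update(self, u, d):
        Pu = self.P @ u.conj()
        k = Pu / (self.lam + u @ Pu)                        # gain vector
        e = d - u @ self.w                                  # a priori error
        self.w = self.w + k * e
        self.P = (self.P - np.outer(k, u @ self.P)) / self.lam
        self.P = 0.5 * (self.P + self.P.conj().T)           # keep P Hermitian numerically
        return e

def track_pa_drift(n_samples=6000, switch_at=3000, seed=0):
    """Return (estimate just before the switch, final estimate, h_before, h_after)."""
    rng = np.random.default_rng(seed)
    x = 0.25 * (rng.standard_normal(n_samples) + 1j * rng.standard_normal(n_samples)) / np.sqrt(2)
    h_before = np.array([1.0, 0.0, -0.12 + 0.04j])   # hypothetical coefficients
    h_after = np.array([1.0, 0.0, -0.18 + 0.06j])    # after a hypothetical operating change
    rls, est_at_switch = RLS(3), None
    for n in range(n_samples):
        h = h_before if n < switch_at else h_after
        u = poly_row(x[n], 3)
        d = u @ h + 1e-4 * (rng.standard_normal() + 1j * rng.standard_normal())
        rls.update(u, d)
        if n == switch_at - 1:
            est_at_switch = rls.w.copy()
    return est_at_switch, rls.w, h_before, h_after

if __name__ == "__main__":
    w_sw, w_end, h1, h2 = track_pa_drift()
    print("error before switch:", np.max(np.abs(w_sw - h1)))
    print("error at end:", np.max(np.abs(w_end - h2)))
```

The forgetting factor trades tracking speed against noise averaging: with `lam = 0.995` the effective memory is roughly 1/(1 - lam) = 200 samples.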

VII. CONCLUSION
This paper provides the key concepts and a comprehensive view of PD-based linearization, a critical requirement for 5G and beyond wireless communications. While previous contributions on PA linearization focused more on specific distortion-learning architectures, this paper focuses on the evolution and advancements in baseband processing, system models, and distinct PD aspects in various case studies. In particular, this work categorizes PD utility into i) the general roadmap of its signal processing through the TRX baseband and digital and analog subcomponents; ii) the resulting APD, DPD, and HPD hardware architectures; iii) the distinguishing configurations and PD features of each architecture, together with the resulting linearization performance metrics; and iv) the applicability of emerging PD solutions in various practical environments. We gave an inclusive overview of current research trends for each of these topics and discussed the complexities and shortcomings that motivate future research directions in the design of PD-based linearization.