Heuristic Model Structure Optimization for Digital Predistortion

Power amplifiers (PAs) are widely used in RF broadcasting applications. However, they exhibit nonlinear behavior that degrades the quality of transmitted signals. Digital predistortion (DPD) was developed to compensate for the distortion generated by PAs. Because modeling precision and complexity are in tension, an appropriate or optimized model structure must be determined before applying DPD in order to reduce complexity while maintaining comparable performance. Heuristic methods are well suited to such multivariate problems with a considerably large search domain. In this paper, a novel heuristic approach to determining the DPD structure is proposed, including an enhanced hill-climbing (EHC) method with stronger global search ability and an enhanced genetic algorithm (EGA). Ridge regression is introduced to ensure the correctness of the search direction. In addition, orthogonal matching pursuit is used to further reduce the number of basis functions while maintaining good performance. Validation of the proposed techniques on a Doherty PA using the generalized memory polynomial (GMP) model demonstrates that they can efficiently find the dimensions of an appropriate GMP model. As shown by the convergence curves and the final model performance, the model found by the proposed method yields satisfactory results and achieves a good balance between complexity and performance.


I. INTRODUCTION
Driven by the combination of market demands and technology advances, the next-generation communication system, i.e., 5G, has become a research hotspot and was commercialized in 2020 [1]. The signal transmitted in the system is characterized by ultra-wide bandwidth and a high data transmission rate [2]-[4]. The power amplifier (PA) is an indispensable component of the system and is also the major source of nonlinearity, which degrades the quality of the transmitted signal in both the time and frequency domains.
Therefore, the behavioral modeling and linearization of PAs have been widely studied. Many methods have been proposed to alleviate the distortion, such as power back off (BO), analog predistortion (APD), and digital predistortion (DPD) [5] [6]. DPD attracts wide attention among these methods, owing to its flexibility and high efficiency.
The Volterra series [7] is typically adopted to characterize nonlinear systems and has also been introduced into PA modeling. However, due to the high correlation among Volterra series terms, too many basis functions have to be included when characterizing a PA with strong nonlinearity. Therefore, many simplified versions have been proposed, such as the memory polynomial (MP) model [8], the generalized memory polynomial (GMP) model [9], and the dynamic deviation reduction-based (DDR) model [10]. As the bandwidth of the transmitted signal becomes increasingly wider, the MP model is not accurate enough to model the PA. Thus, GMP is more attractive because of its capability to model a system with relatively strong nonlinearity. The structure of these models is determined by the nonlinear orders and the memory depths, representing the static and dynamic nonlinearity, respectively. Without loss of generality, GMP is the model used in this article, and the proposed search algorithm can also be applied to other models such as MP and DDR.
Before applying DPD to linearize PAs, it is necessary to first determine the structure of the model which is suitable for corresponding scenarios.
Basically, the requirements for modeling performance and modeling complexity are contradictory. An appropriate model structure can make a good tradeoff between the computational complexity and the performance [11]. Classically, the model structure is set with the corresponding parameters chosen by experience, which makes it difficult to balance model performance and complexity. In other words, either an over-fitting or an under-fitting is likely to appear and deteriorate the model performance.
Therefore, it is necessary to solve this problem. Exhaustive search can naturally find the best structure of a model, but it is too computationally costly to implement in a real system. Fortunately, heuristic search uses heuristic information about the problem to guide the search, reducing both the complexity and the search domain; the hill-climbing (HC) method [12], genetic algorithm (GA) [13], simulated annealing (SA) [14], and particle swarm optimization (PSO) [15] are classical examples. Therefore, in this paper we propose a heuristic-based method to determine the proper structure of the GMP model. However, even in simplified models such as MP, GMP, and DDR, basis functions contribute differently to the modeling performance. In [16], a non-uniform memory polynomial behavioral model is proposed to further reduce the number of parameters in the MP model without affecting its performance. A method based on the attention mechanism is used to choose the memory terms effectively in [17]. Similarly, basis functions with different nonlinear orders also contribute differently to the modeling performance. In [18], by ignoring the nonlinear terms that contribute less to the modeling performance at the same memory depth, an MP model with non-uniform nonlinearity is proposed. These phenomena are attributed to the correlation between basis functions of models derived from the Volterra series. Unfortunately, in [13]-[15], model structures obtained by search algorithms based on simplified Volterra-like models still contain redundancy. If only the model structure is searched, redundancy among basis functions cannot be avoided.
Therefore, it is necessary to select an appropriate subset of the searched models. At the same time, the selected subset can achieve a comparable modeling performance with the searched structure and reduce the modeling complexity further. Orthogonal matching pursuit (OMP) contributes to picking the primary proper subset in an efficient way [19].
In this paper, heuristic search methods are proposed to search DPD model structures, where the fitness value is used to decide the solutions of the next step, so that the search converges quickly. The proposed methods also possess the following advantages: parallel search from multiple initial points, quick convergence, enhanced numerical stability in the search process to ensure a correct iterative search direction, and acceptance of a poorer solution in some subdomains with a certain probability, which may help the search jump out of a local optimum and finally reach the global optimum. The remainder of this paper is organized as follows: Section II introduces the DPD background, the heuristic background, and the definitions of the heuristic algorithms. In Section III, novel methods based on the heuristic algorithm and an appropriate implementation structure are proposed to identify the appropriate DPD model structure. Experimental results are presented in Section IV. In Section V, a conclusion is drawn.

A. DPD BACKGROUND
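In terms of model identification, indirect learning (IDL) [20] and direct learning (DL) [21] are the classic structures. In this paper, the IDL architecture [20] is considered, as shown in Fig. 1. One characteristic of IDL is its quick convergence. In order to effectively linearize PAs in broadband scenarios, the GMP model is adopted as an example, which is represented as

$$y(n) = \sum_{k=0}^{K_a-1}\sum_{l=0}^{L_a-1} a_{kl}\, x(n-l)\,|x(n-l)|^{k} + \sum_{k=1}^{K_b}\sum_{l=0}^{L_b-1}\sum_{m=1}^{M_b} b_{klm}\, x(n-l)\,|x(n-l-m)|^{k} + \sum_{k=1}^{K_c}\sum_{l=0}^{L_c-1}\sum_{m=1}^{M_c} c_{klm}\, x(n-l)\,|x(n-l+m)|^{k} \tag{1}$$

where $K_a$ and $L_a$ are the nonlinear order and memory depth of the aligned signal; $L_b$, $K_b$, and $M_b$ are the memory depth, nonlinear order, and cross-term length of the lagging term; and $L_c$, $K_c$, and $M_c$ are the memory depth, nonlinear order, and cross-term length of the leading term. Representing (1) in matrix form gives

$$\mathbf{y} = \mathbf{X}_a\mathbf{a} + \mathbf{X}_b\mathbf{b} + \mathbf{X}_c\mathbf{c} = \mathbf{X}\mathbf{w} \tag{5}$$

where $\mathbf{a} \in \mathbb{C}^{N_1 \times 1}$, $\mathbf{b} \in \mathbb{C}^{N_2 \times 1}$, and $\mathbf{c} \in \mathbb{C}^{N_3 \times 1}$ stand for the coefficients, $\mathbf{X}_a \in \mathbb{C}^{N \times N_1}$, $\mathbf{X}_b \in \mathbb{C}^{N \times N_2}$, and $\mathbf{X}_c \in \mathbb{C}^{N \times N_3}$ are the corresponding basis-function matrices, $\mathbf{X} = [\mathbf{X}_a, \mathbf{X}_b, \mathbf{X}_c]$, and $\mathbf{w} = [\mathbf{a}^T, \mathbf{b}^T, \mathbf{c}^T]^T$. Generally, (5) is over-determined, hence the parameters of the model can be calculated by the least squares (LS) algorithm [22] as

$$\hat{\mathbf{w}} = (\mathbf{X}^H\mathbf{X})^{-1}\mathbf{X}^H\mathbf{y}$$

where the input signal $\mathbf{x} \in \mathbb{C}^{N \times 1}$ and the output signal $\mathbf{y} \in \mathbb{C}^{N \times 1}$ are power-aligned, $\mathbf{w} \in \mathbb{C}^{M \times 1}$ represents the parameters, $M = N_1 + N_2 + N_3$ is the total number of parameters, and $(\cdot)^H$ represents the conjugate transpose. If a system can be modeled by a $p$th-order model, according to [23], the $p$th-order pre-inverse is equal to the $p$th-order post-inverse of the model. Correspondingly, by swapping the input and output in the above procedure, the post-inverse of the PA is identified and used as the DPD model. The DPD coefficients can be expressed as

$$\hat{\mathbf{w}}_{DPD} = (\mathbf{Y}^H\mathbf{Y})^{-1}\mathbf{Y}^H\mathbf{x}$$

where $\mathbf{Y}$ is the basis-function matrix built from $\mathbf{y}$ in the same way that $\mathbf{X}$ is built from $\mathbf{x}$.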
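As a minimal sketch of how a GMP basis matrix and its LS coefficients might be computed, the following Python code builds one column per basis function and solves the over-determined system. The function names (`delayed`, `gmp_basis`, `ls_fit`), the argument order, the zero padding at the signal edges, and the index ranges (aligned terms with k starting at 0, cross terms with k and m starting at 1) are assumptions made for illustration and may differ from the conventions used in the experiments.

```python
import numpy as np


def delayed(x, d):
    """Return x(n - d) with zero padding; a negative d gives a lead."""
    out = np.zeros_like(x)
    if d == 0:
        out[:] = x
    elif d > 0:
        out[d:] = x[:-d]
    else:
        out[:d] = x[-d:]
    return out


def gmp_basis(x, Ka, La, Kb, Lb, Mb, Kc, Lc, Mc):
    """Stack the GMP basis functions of x column by column (assumed indexing)."""
    cols = []
    # Aligned terms: x(n-l) |x(n-l)|^k
    for k in range(Ka):
        for l in range(La):
            xl = delayed(x, l)
            cols.append(xl * np.abs(xl) ** k)
    # Lagging cross terms: x(n-l) |x(n-l-m)|^k
    for k in range(1, Kb + 1):
        for l in range(Lb):
            for m in range(1, Mb + 1):
                cols.append(delayed(x, l) * np.abs(delayed(x, l + m)) ** k)
    # Leading cross terms: x(n-l) |x(n-l+m)|^k
    for k in range(1, Kc + 1):
        for l in range(Lc):
            for m in range(1, Mc + 1):
                cols.append(delayed(x, l) * np.abs(delayed(x, l - m)) ** k)
    return np.column_stack(cols)


def ls_fit(X, y):
    """Least-squares coefficients of the over-determined system y = X w."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w
```

For an IDL-style post-inverse, one would presumably call `gmp_basis` on the power-aligned PA output and fit it to the PA input, i.e., `ls_fit(gmp_basis(y, ...), x)`.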

B. BASIC FACTORS OF HEURISTIC ALGORITHMS
The neighborhood is one of the main concepts in optimization, and is defined in terms of Euclidean distance:

$$N(\mathbf{P}(i)) = \{\mathbf{P}' : \|\mathbf{P}' - \mathbf{P}(i)\|_2 \le \delta\} \tag{8}$$

where $N(\mathbf{P}(i))$ is the neighborhood of the structure vector $\mathbf{P}(i)$, which collects the eight GMP dimensions in (1), and $\delta$ is the neighborhood radius. The figure of merit for model performance is the normalized mean square error (NMSE), defined as

$$\mathrm{NMSE} = 10\log_{10}\left(\frac{\sum_{n=1}^{N}|y(n) - x(n)|^2}{\sum_{n=1}^{N}|x(n)|^2}\right)$$

where $N$ stands for the length of the input signal, $x(n) \in \mathbb{C}^{N \times 1}$ is the input signal, and $y(n) \in \mathbb{C}^{N \times 1}$ is the output of the PA after DPD. Because the complexity of coefficient extraction is directly related to the number of parameters, the complexity of the model can be measured by the number of parameters of the GMP, denoted $M$. Considering both modeling performance and complexity, a cost function can be proposed as

$$J_i = \mathrm{NMSE}_i + \mu\, f(M(i)) \tag{10}$$

where $J_i$ and $\mathrm{NMSE}_i$ represent the $i$th fitness value and the performance evaluated by NMSE, $f(M(i))$ is a function of the $i$th parameter number $M(i)$, and $\mu$ is the regularization factor. The trade-off between the performance and the complexity of the model is governed by $\mu$, which indicates how much importance is given to the complexity of the model. It can therefore be concluded that the problem to be solved is a multi-objective optimization problem, specifically the joint optimization of performance and complexity.
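The three ingredients above (neighborhood, NMSE, and cost function) can be sketched in plain Python as follows. This is only an illustration: the helper names are hypothetical, the complexity function is assumed to be the identity, f(M) = M, and the default regularization factor 0.036 is the value reported later in the parameter-setting section.

```python
import itertools
import math


def nmse_db(x, y):
    """NMSE in dB between the reference signal x and the signal y."""
    err = sum(abs(b - a) ** 2 for a, b in zip(x, y))
    ref = sum(abs(a) ** 2 for a in x)
    return 10.0 * math.log10(err / ref)


def cost(nmse, n_params, mu=0.036):
    """Fitness value: NMSE (dB) plus a complexity penalty (f(M) = M assumed)."""
    return nmse + mu * n_params


def neighborhood(p, radius, lower, upper):
    """Integer points within Euclidean distance `radius` of p, inside the bounds."""
    r = int(math.floor(radius))
    out = []
    for step in itertools.product(range(-r, r + 1), repeat=len(p)):
        if not any(step):
            continue  # skip p itself
        if math.sqrt(sum(s * s for s in step)) > radius:
            continue
        q = tuple(a + s for a, s in zip(p, step))
        if all(lo <= v <= hi for v, lo, hi in zip(q, lower, upper)):
            out.append(q)
    return out
```

With a radius of 1, an interior point of an eight-dimensional search domain has 16 neighbors (one step up or down in each dimension).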

III. HEURISTIC STRUCTURE SEARCHING
The hill-climbing method is directly used to select the model structure in [13], where a satisfactory result can be obtained. However, the drawbacks of hill-climbing remain: they may limit the final performance, and the computational complexity is still high. Therefore, we propose the following optimized selection method for model structures to enhance the hill-climbing method.
1) To enhance the global search capability of hill-climbing and to reduce the probability of falling into local optima, multiple initial points are set and searched in parallel.
2) To search the feasible domain efficiently, a blacklist is established to prevent duplicate searches.
3) Only an appropriate subset of the basis functions is the primary contributor to the model performance. We propose to use the OMP method to further optimize the basis set and reduce the number of model parameters, which is efficient since it selects the most appropriate subset.
4) Since heuristic algorithms, including the hill-climbing method, iterate in a fully adaptive way, numerical instability is likely to occur, especially when searching high-order models. Therefore, we also introduce a ridge regression technique into our optimized selection method to enhance numerical stability.

A. ENHANCED HILL-CLIMBING METHOD (EHC)
In the initialization stage, in order to enhance the global search capability of hill-climbing and, to some extent, to avoid becoming trapped in local optima, multiple initial points instead of one are set and represented as

$$\mathbf{P}_0 = [\mathbf{P}_1(0), \mathbf{P}_2(0), \ldots, \mathbf{P}_P(0)]$$

where each initial point $\mathbf{P}_j(0) \in \mathbb{R}^{8 \times 1}$ is set randomly and the points are distributed over different places in the solution domain, which supports parallel search. Taking the $j$th search path as an example, the fitness value of the current search point is compared with those of its neighbors, and the next search point is decided by

$$\mathbf{P}_j(i+1) = \begin{cases} \mathbf{P}'_j(i), & J_j(i) > J'_j(i) \\ \mathbf{P}_j(i), & \text{otherwise} \end{cases}$$

where $\mathbf{P}_j(i)$ is the $i$th iteration in the $j$th path and $\mathbf{P}'_j(i) \in N(\mathbf{P}_j(i))$ is one of the neighbors of $\mathbf{P}_j(i)$. The corresponding cost functions are $J_j(i)$ and $J'_j(i)$. The fitness value of the next point is $J_j(i+1) = \min\{J_j(i), J'_j(i)\}$. After the iteration, the best of the multiple solutions obtained in parallel is taken as the final solution.
Traditionally, duplicated searches tend to appear in heuristic algorithms when different points are searched together with their neighbors, resulting in low search efficiency. For example, $\mathbf{P}'(i-1)$ is a neighbor of the point $\mathbf{P}(i-1)$ and is also a neighbor of the current point $\mathbf{P}(i)$, i.e., $\mathbf{P}'(i-1) \in N(\mathbf{P}(i-1))$ and $\mathbf{P}'(i-1) \in N(\mathbf{P}(i))$. Since $N(\mathbf{P}(i-1)) \cap N(\mathbf{P}(i)) \neq \emptyset$, as shown in Fig. 2, some points are likely to be searched again, possibly several times, which wastes both computation and memory resources, especially in a wideband scenario where a GMP model with high orders is needed.
Therefore, a blacklist is created in which the poorly performing searched points are stored. It avoids duplicate searching and thus improves search efficiency. Meanwhile, it effectively prevents the search from becoming trapped in a previously searched area, which would lead to local optima. Specifically, $\lfloor N_s(\mathbf{P}(i)) \rfloor_{L\%}$ is added to the blacklist in each iteration, where $\lfloor \cdot \rfloor_{L\%}$ denotes the poorest $L\%$ of the individuals in the neighborhood and $N_s(\mathbf{P}(i))$ is $N(\mathbf{P}(i))$ sorted in ascending order. In addition, accepting a suboptimal solution can also reduce the probability of being trapped in local optima, an idea borrowed from simulated annealing (SA) [14]: a poorer neighbor $\mathbf{P}'(i)$ is accepted once $r > \mathit{threshold}$, where $\mathit{threshold} \in (0, 1)$ is a preset threshold value and $r \in (0, 1)$ is a random number.
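The following Python sketch puts the three EHC ingredients together: several starting points, a blacklist of the poorest neighbors, and SA-style acceptance of a poorer neighbor when a random number exceeds the threshold. It is an illustration under stated assumptions rather than the implementation used in the experiments: the function name `ehc`, its parameters, the shared blacklist, the stopping rule, and the sequential execution of paths (which would run in parallel in practice) are all choices made here for brevity.

```python
import random


def ehc(cost, starts, neighbors, worst_frac=0.5, threshold=0.9,
        max_iter=200, seed=0):
    """Multi-start hill climbing with a blacklist and SA-style acceptance."""
    rng = random.Random(seed)
    blacklist = set()
    cache = {}

    def J(p):
        # Evaluate each structure at most once.
        if p not in cache:
            cache[p] = cost(p)
        return cache[p]

    best = min(starts, key=J)
    for start in starts:  # independent paths, run one after another here
        cur = start
        for _ in range(max_iter):
            cand = [q for q in neighbors(cur) if q not in blacklist]
            if not cand:
                break
            cand.sort(key=J)  # ascending cost: best neighbor first
            n_bad = int(len(cand) * worst_frac)
            if n_bad:
                blacklist.update(cand[-n_bad:])  # poorest neighbors
            nxt = cand[0]
            # Move if the neighbor is better, or accept a poorer one
            # when the random draw exceeds the threshold.
            if J(nxt) < J(cur) or rng.random() > threshold:
                cur = nxt
            else:
                break
            if J(cur) < J(best):
                best = cur
    return best, J(best)
```

Because the best point seen so far is tracked separately, accepting a poorer neighbor can never make the returned solution worse than any starting point. Setting `threshold=1.0` disables the SA-style acceptance and reduces each path to plain hill climbing with a blacklist.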

B. PROPER SUBSET SELECTION OF BASIS FUNCTIONS
In this section, we propose a method to further reduce the number of model parameters. In practice, the performance of the model is mainly contributed by an appropriate subset of the model parameters, which makes it possible to use fewer basis functions in modeling. There are many approaches to further model size reduction. Typically, principal component analysis (PCA) [24] is a well-known technique suitable for converting correlated bases into uncorrelated ones and reducing the number of basis functions. In contrast, orthogonal matching pursuit (OMP) [19] selects the primary proper subset by testing all the elements in the dictionary and sorting their contributions to reducing the NMSE. In DPD, OMP tends to achieve better performance than PCA for the same number of basis functions. The reason is that after PCA, more basis functions tend to correspond to certain components because of the decoupling of the basis functions. Therefore, OMP is adopted in our method.
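A minimal OMP sketch in Python is given below to illustrate the selection principle: at each step the column most correlated with the current residual is added, and the coefficients of all selected columns are refitted by least squares. The function name `omp_select` and the fixed number of kept basis functions `n_keep` are assumptions for this example; a practical stopping rule could instead be based on the NMSE.

```python
import numpy as np


def omp_select(X, y, n_keep):
    """Greedily pick n_keep columns of X that best explain y."""
    residual = y.copy()
    selected = []
    norms = np.linalg.norm(X, axis=0)
    w = np.zeros(0, dtype=complex)
    for _ in range(n_keep):
        # Normalized correlation of every column with the residual.
        corr = np.abs(X.conj().T @ residual) / norms
        if selected:
            corr[selected] = 0.0  # never pick a column twice
        selected.append(int(np.argmax(corr)))
        # Refit all selected columns jointly (the "orthogonal" step).
        w, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ w
    return selected, w
```

The returned indices identify the retained basis functions, and `w` holds their coefficients in the order of selection.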
TABLE I shows the FLOPs corresponding to different operations, where ( * ) −1 stands for the inverse of the matrix and ( * ) represents the conjugate transpose. X stands for the basis functions and ∈ ℂ × , where and are number of samples and basis functions, respectively.
The computational complexity can be represented by： O ⊕ = 4 2 + 4 where O ⨂ and O ⊕ represent the computational quantity of multiplications and additions, respectively, for one iteration.

C. NUMERICAL STABILITY IMPROVEMENT
Using LS to calculate the coefficients involves a matrix inversion, which tends to produce large errors when the Hessian matrix involved is ill-conditioned, leading to a wrong iteration direction. Because the HC algorithm works in a fully adaptive manner, numerical instability is likely to occur, especially when searching high-order models. Tikhonov regularization addresses the numerical instability of the matrix inversion and subsequently produces lower-variance models. This method adds a positive constant to the diagonal of $\mathbf{X}^H\mathbf{X}$ to ensure that the matrix is nonsingular. As a consequence, the problem can be expressed as

$$\min_{\mathbf{w}} \; \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2 + \lambda\|\mathbf{w}\|_2^2$$

which is an unconstrained, continuously differentiable convex optimization problem, with the solution

$$\hat{\mathbf{w}} = (\mathbf{X}^H\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^H\mathbf{y}$$

where $\lambda$ is the damping factor and $\mathbf{I} \in \mathbb{C}^{M \times M}$ is an identity matrix. The method is called ridge regression (RR) in [25], which is a particular case of Tikhonov regularization. The solution can be obtained analytically in closed form. A typical value of $\lambda$ is equal to the resolution of the system. In this paper, we let $\lambda = 2 \times 10^{-7}$. This choice is reasonable because a recent method of iteratively updating $\lambda$ is proposed in [26], and it converges to $\lambda = 10^{-7}$ to $10^{-8}$, which is comparable to ours. This small bias does not noticeably disturb the solution but significantly eases the ill-conditioning of the Hessian matrix because

$$\kappa = \frac{\sigma_{\max}}{\sigma_{\min}} > \frac{\sigma_{\max} + \lambda}{\sigma_{\min} + \lambda} = \kappa'$$

where $\kappa'$ is the matrix condition number after modification, $\sigma_{\max}$ and $\sigma_{\min}$ are the maximum and minimum eigenvalues of the Hessian matrix, and $\kappa$ is the original matrix condition number. A larger condition number signifies a less stable numerical procedure for calculating the matrix inverse. Therefore, the numerical stability is improved significantly, avoiding the instability that may lead to a wrong search direction.
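The effect of the damping factor on the condition number can be checked numerically with a short sketch such as the one below. The function names are hypothetical, and the default value of `lam` is the damping factor quoted in the text; the closed-form ridge solution is computed with a linear solve rather than an explicit matrix inverse.

```python
import numpy as np


def ridge_fit(X, y, lam=2e-7):
    """Closed-form ridge solution (X^H X + lam I)^{-1} X^H y."""
    M = X.shape[1]
    H = X.conj().T @ X  # Hessian of the LS problem
    return np.linalg.solve(H + lam * np.eye(M), X.conj().T @ y)


def hessian_cond(X, lam=0.0):
    """Condition number of X^H X + lam I from its extreme eigenvalues."""
    eig = np.linalg.eigvalsh(X.conj().T @ X)  # ascending order
    return (eig[-1] + lam) / (eig[0] + lam)
```

For a basis matrix with nearly collinear columns, `hessian_cond(X, lam)` is smaller than `hessian_cond(X)` for any positive `lam`, which mirrors the inequality between the modified and original condition numbers.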
A simple model is used to show the important influence of numerical stability. As shown in Table II, the final NMSE performance, the matrix condition number, and the floating-point operation (FLOP) count of the Hessian matrix involved in a single iteration depend on the model structure. The MP model is employed. An exhaustive search is adopted to determine the best structure, where the search domain is limited to memory depths 0 to 8 and nonlinear orders 1 to 8, giving 9 × 8 = 72 different possible models. The best model structure is (Mem., Nonlin.) = (4, 6).
Normally, as the memory depth or nonlinear order of the model increases, the matrix condition number and FLOPs increase correspondingly, as shown in Table II. Specifically, A(4,1) and B(4,6) have the same memory depth, and point B, with the higher nonlinear order, has a larger matrix condition number, more FLOPs, and a lower NMSE. Similarly, A and C indicate that, under the same nonlinear order, increasing the memory depth increases the matrix condition number and FLOPs. B represents the best MP structure. D(4,7) has the same memory depth as B and a nonlinear order higher by 1, but B has a better NMSE. Furthermore, the matrix condition number of D is nearly 1 212 times that of B, and 2 001 560 more FLOPs are required. Therefore, choosing a suitable model size helps to achieve higher performance with a lower matrix condition number and fewer FLOPs.
So far, to solve the problems of the traditional HC algorithm, namely its tendency to fall into local optima that limit the final performance, its frequent duplicated searches that result in very low efficiency, and its numerical instability during the iteration process, we have proposed EHC. It has a more powerful global search ability through parallel search from multiple initial points and the idea borrowed from SA. The blacklist in EHC effectively prevents duplicate searches and improves search efficiency. Moreover, RR is used to enhance the numerical stability, which makes the EHC search more robust.

D. ENHANCED GENETIC ALGORITHM (EGA)
Next, to provide a reference for the global search ability of the proposed EHC, we further propose an EGA with global search capability for comparison.
The genetic algorithm (GA) is also an important member of the family of heuristic algorithms and is a classical adaptive global search algorithm. Therefore, it is chosen as the benchmark for the global search capability of EHC. In addition, we also use RR in GA to enhance numerical stability and introduce a blacklist mechanism to prevent duplicated searches. We call the result the enhanced genetic algorithm (EGA).
In EGA, the fitness function is defined as in (10). Crossover and mutation operators are used to search the feasible region. A crossover example is presented in Fig. 4. For $(\mathbf{P}_1^{g}, \mathbf{P}_2^{g})$, interchanges happen at the intersections to produce new individuals $(\mathbf{P}_1', \mathbf{P}_2')$. Then, the two individuals with the best fitness among $(\mathbf{P}_1^{g}, \mathbf{P}_2^{g}, \mathbf{P}_1', \mathbf{P}_2')$ are selected as the next generation $(\mathbf{P}_1^{g+1}, \mathbf{P}_2^{g+1})$ to participate in the next iteration. The mutation operator replaces certain gene values in a chromosome with other values to create a new one. A mutation example is also presented in Fig. 4, where the gray level represents the changes in the corresponding index of the original individuals, limited by the search domain.
The crossover and mutation operations cooperate with each other to search the model structure in the search domain. A concise experimental result is presented in TABLE III to confirm the feasibility and effectiveness of the EGA. The specific parameter selection process for EGA is in the next section.
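The crossover, mutation, and selection steps can be sketched in Python as follows. Structure vectors are represented as tuples of integers; the single-point crossover, the single-gene mutation, and all function names are assumptions made for illustration, since the exact operators are defined by the figure rather than by formulas.

```python
import random


def crossover(p1, p2, rng):
    """Single-point crossover: swap the tails of two structure vectors."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(p, lower, upper, rng):
    """Replace one randomly chosen gene with a value inside the search domain."""
    i = rng.randrange(len(p))
    q = list(p)
    q[i] = rng.randint(lower[i], upper[i])  # may redraw the same value
    return tuple(q)


def next_pair(p1, p2, cost, rng):
    """Keep the two fittest individuals among the parents and their children."""
    c1, c2 = crossover(p1, p2, rng)
    ranked = sorted([p1, p2, c1, c2], key=cost)
    return ranked[0], ranked[1]
```

A caller would typically pass a seeded `random.Random` instance so that a search run can be reproduced.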
So far, we propose two search methods based on heuristic algorithm, namely EHC and EGA. In the subsequent experiments, on the one hand, we specifically verify the high performance of models obtained by EHC and EGA. On the other hand, we also compare the global search capability of

E. PROPOSED ARCHITECTURE
In order to combine the proposed model search algorithms with DPD in a more flexible way, a novel structure is proposed, and its block diagram is shown in Fig. 3. Firstly, the heuristic algorithm proposed above is used offline to search for a model structure. Then the extracted model structure is employed in DPD. The complex input signal, output signal, and signal after DPD are all vectors in $\mathbb{C}^{N \times 1}$. $\mathbf{P}_{best}$ is the model structure found by the heuristic algorithm proposed above. The proposed architecture has the following three working modes.
Initialization) The structure-search switch is on and the DPD-training switch is off. Either heuristic method proposed above is employed offline to obtain an optimal model, marked as $\mathbf{P}_{best}$.
Working) The structure-search switch is off and the DPD-training switch is on. DPD training is carried out under the IDL structure based on the optimal model obtained in the initialization mode or updating mode.
Updating) Both switches are on. Once the operating state of the PA changes significantly, the DPD model should be updated. The heuristic method operates again to obtain a new optimized model offline without interrupting the operation of DPD. It should be noted that the updating mode operates offline and that the parallel mechanism introduced can accelerate the convergence.

A. MEASUREMENT SETUP
In this section, the proposed model is validated and compared with the comparison model. The photograph of the experimental platform is shown in Fig. 5. It is composed of a personal computer (PC) with MATLAB R2021a, a Vector Signal Generator (SMW200A) from Rohde & Schwarz, a linear drive amplifier, the main PA under test (BLM9D2325-20AB), a spectrum analyzer, and a 40 dB RF attenuator.
The specific procedure is as follows. Firstly, the discrete baseband signal generated by the PC is sent to the Vector Signal Generator, where a continuous baseband OFDM signal is generated and up-converted to the RF band. Secondly, the RF signal is amplified by the linear drive amplifier and the main PA. Thirdly, the output of the main amplifier is attenuated, down-converted, and sent to the spectrum analyzer. Finally, the signal is sent back to the PC for DPD processing, specifically power alignment, time alignment, and coefficient update.

B. HEURISTIC PARAMETERS SETTING
Before the experiment, the parameters of the heuristic algorithms need to be determined. They are determined using a 100 MHz OFDM signal and the PA under test mentioned above. The range of the eight structure parameters in (1) varies according to the application scenario. The regularization factor $\mu$ in (10) is searched offline at an interval of 0.0001 between 0.001 and 0.2, and a best value of 0.036 is obtained. In this paper, we let $\lambda = 2 \times 10^{-7}$ to ease the ill-conditioning problem of the Hessian matrix.
For EHC, the neighborhood radius in (8) is set as $\delta = 1$. A small $\delta$ means better local search capability and more iterations. The blacklist mechanism established in EHC effectively prevents the duplicated search caused by the neighborhood overlap of search points. Furthermore, by dividing the search domain into $P$ segments and randomly generating $P$ initial search points, not only is the search accelerated, but the risk of falling into a local optimum is also reduced. According to TABLE V, considering both performance and running time, we set $P_{HC} = 3$.
For EGA, the elimination rate and survival rate are chosen through simulation as 0.2 and 0.5, respectively. Half of the remaining 30% of the population is randomly selected for mutation, and the rest participates in crossover to generate new individuals. The size of the population $P$ affects the running time and the performance, as shown in TABLE V. Meanwhile, it is necessary to ensure that enough models are searched in the search domain with a short running time. Considering the above factors, $P_{GA} = 100$ is reasonable. Under the above settings, Fig. 6 is obtained. The discrete red dots are structures searched by EGA, and the blue ones represent structures searched by EHC.
The convex hull of the search interval formed by the red stars is also called the Pareto front, which is composed of the optimal solution set. The convex hull represents the best tradeoff between model performance and complexity. In fact, finding a suitable model structure is a multi-objective optimization problem, which aims to find the best balance between model complexity and performance. In such problems, the optimal solution set of the objective function is also called the Pareto optimal solution set.
Furthermore, from the distribution in Fig. 6, we can confirm that the global search ability of EHC is greatly enhanced by the parallel search from multiple initial points. Because of the strong local search capability of EHC itself, the search range of EHC remains close to the convex hull even with the enhanced global search capability. Therefore, each path in EHC can converge to the convex hull quickly. It can be concluded that, compared with EGA, EHC not only offers enhanced global search capability but also achieves a more efficient search.
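Given a list of searched structures, each summarized by its parameter count and NMSE, the set of non-dominated points can be extracted with a short routine such as the one below. This is a sketch of the Pareto (non-dominated) set only, not of a geometric convex hull, and the function name and the (parameter count, NMSE in dB) tuple layout are assumptions for the example.

```python
def pareto_front(points):
    """Return the non-dominated (n_params, nmse_db) pairs, lower is better."""
    front = []
    # Sort by complexity first, then by NMSE for equal complexity.
    for n_params, nmse in sorted(set(points)):
        # Keep a point only if it improves on every cheaper model.
        if not front or nmse < front[-1][1]:
            front.append((n_params, nmse))
    return front
```

For example, among the points (10, -30), (15, -28), (20, -35), (25, -40), and (30, -35), the routine keeps (10, -30), (20, -35), and (25, -40), since each of the other two is matched or beaten by a model with fewer parameters.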

C. COMPARISON MODEL SETTING
For a fairer comparison, we select the GMP comparison model according to the following methodology.
Since the GMP model can be considered an extended version of the MP model, to determine the parameters of the comparison GMP model we first select the best memory depth and nonlinear order of the MP model, as described in Section III-C, and use them as $L_a$ and $K_a$ in the GMP model, respectively. After that, the parameters of the leading and lagging terms are determined. In terms of nonlinearity, $K_b$ and $K_c$ are taken to be close to but not greater than $K_a$. In terms of memory depth, we need to ensure the existence of cross terms. It should be noted that, in general, $L_b \ge M_b$ and $L_c \ge M_c$ in the GMP model. The comparison model is marked as Compara in the following section.

D. MEASUREMENT RESULTS
Experiments on both 60 MHz and 100 MHz signal are conducted and they will be described in detail as follows.

1) 60 MHz SIGNAL
In the initialization mode, turn on the structure-search switch in Fig. 3 and obtain $\mathbf{P}_{best}$ based on the proposed heuristic algorithms. In EHC, the best model found is $\mathbf{P}_{EHC}^{60M} = [6,2,5,5,1,1,4,1]$, and EGA converges to the model $\mathbf{P}_{EGA}^{60M} = [4,5,2,3,2,3,2,2]$. In terms of the highest memory depth and nonlinearity, there is not much difference between the final models obtained by EHC and EGA. In terms of the number of parameters, the EHC model has 48 parameters and the EGA model has 44 parameters. The performance of the two models is comparable.
To illustrate further, the neighborhoods of radius $\delta = 2$ of $\mathbf{P}_{EHC}^{60M}$ and $\mathbf{P}_{EGA}^{60M}$ are presented in Fig. 7, with the two models marked as green diamonds. Both $\mathbf{P}_{EHC}^{60M}$ and $\mathbf{P}_{EGA}^{60M}$ are the structures with the best performance in their respective neighborhoods. These two models are then further simplified by OMP, represented by two black stars, whose NMSE values are comparable to each other and superior to those of other DPD structures with the same number of parameters.
Then turn off the structure-search switch and turn on the DPD-training switch. The model $\mathbf{P}_{best}$ is trained based on IDL to obtain the corresponding DPD coefficients. The convergence curves of the models determined by EHC and EGA are shown in Fig. 8. There is almost no difference in convergence speed between them. The specific NMSE, ACPRL/ACPRU, and EVM are listed in TABLE VI, and the proposed method can find models with comparable performance and fewer parameters. For the case without DPD, because the PA operates in a high gain compression state, a poor NMSE is obtained. For the other four cases with DPD, comparable and satisfactory performance is obtained. After the model converges, the power spectral density (PSD) comparison among different models is shown in Fig. 9. Compared with the case without DPD, all models have comparable and quite satisfactory adjacent channel performance after DPD.
Finally, compared with the Compara model, the computational complexity of the models found by EHC and EGA is reduced by approximately 75% in both cases, which is calculated based on (14) and (15); the results are shown in the 60 MHz section of Fig. 10.
However, the models found by EHC and EGA differ. In EHC, several search paths instead of one improve the search capability and help avoid being trapped in local optima, while in EGA, a relatively optimal result is reached because of its global search capability.

2) 100 MHz SIGNAL
The test signal is an OFDM signal with an increased bandwidth of 100 MHz, and its PAPR is 8.78 dB. The data length is 92 112 samples. The input power is -3 dBm and the sampling rate is 614.4 MHz. The main PA is a Doherty PA from Ampleon with a center frequency of 2.4 GHz. Compared with case 1), the bandwidth of the input signal is increased, implying stronger nonlinearity and memory effects of the PA. The order of the model should be increased accordingly in order to improve the modeling ability. The ranges of the eight structure parameters are set, in order, as 1 to 8, 1 to 20, 0 to 5, 0 to 5, 0 to 8, 0 to 8, 0 to 8, and 0 to 8 to limit the search domain, which contains 7 464 960 different possible model structures. According to the "Comparison Model Setting" described above, the comparison model is obtained as $\mathbf{P}_{compara} = [6,12,5,4,4,5,4,4]$.
Turn on the structure-search switch in Fig. 3 and obtain $\mathbf{P}_{best}$. In EHC, the final model $\mathbf{P}_{EHC}^{100M} = [7,4,5,1,1,3,5,3]$ is obtained after 909 different models are compared. EGA converges to $\mathbf{P}_{EGA}^{100M} = [7,4,3,4,0,4,5,2]$ after 808 different models are compared. It can be seen that the models found by the two methods are very similar. Although there are slight differences between their results, both achieve a good trade-off between complexity and performance. After OMP, the numbers of parameters of the EGA and EHC models are reduced to 66 and 68, respectively. In Fig. 11, we compare all models within a neighborhood radius $\delta = 2$ of $\mathbf{P}_{EHC}^{100M}$ and $\mathbf{P}_{EGA}^{100M}$, which are the structures with the best performance in their respective neighborhoods. After OMP, the numbers of parameters of $\mathbf{P}_{EHC}^{100M}$ and $\mathbf{P}_{EGA}^{100M}$ are further reduced, and the resulting models significantly outperform other structures with the same number of parameters. The AM/AM and AM/PM characteristics of the PA, calculated and plotted offline, are shown in Fig. 12, and it can be seen that the PA has been effectively linearized. Then, turn off the structure-search switch and turn on the DPD-training switch for DPD training. The optimal solutions and their convergence rates are comparable with each other, as shown in Fig. 13. The specific numerical results are listed in TABLE VII, where the NMSE and ACPRU/ACPRL are comparable with those of $\mathbf{P}_{compara}$, as shown in Fig. 13 and Fig. 14, and the EVM is comparable too. Besides, the models generated by the proposed heuristic methods have fewer parameters, which greatly reduces the computational complexity. For both EGA and EHC, the amount of computation in a single iteration is reduced by approximately 80%, as shown in the 100 MHz section of Fig. 10.

V. CONCLUSIONS
In this paper, a novel method based on heuristic algorithms is proposed to optimize and identify the appropriate DPD model structure, with two specific algorithms, namely EHC and EGA. The proposed method addresses the problem of obtaining an optimal solution in parallel within a considerably large search domain that is difficult to search exhaustively. Once the operating state of the PA changes significantly, new optimal structures can be searched offline, also in parallel with DPD. Since the heuristic algorithm itself iterates in an adaptive manner, RR is introduced to improve the numerical stability and ensure the correctness of the search direction. Furthermore, considering that only a proper subset of the parameters is actually the primary contributor to the final performance, OMP is introduced to further reduce the number of parameters. Therefore, the combination of parallel searching, enhanced numerical stability with RR, and further reduction of model parameters using OMP greatly reduces the computational complexity and improves the robustness of the searched model. Specifically, the proposed EHC has a more powerful global search ability and can effectively prevent duplicated searches and improve search efficiency. Both EHC and EGA have a more stable and faster iterative process owing to the introduction of RR and the blacklist.
Finally, based on the novel heuristic DPD paradigm we proposed, the experimental results of GMP on both 60 MHz and up to 100 MHz bandwidth signals further prove the effectiveness of the proposed method.