A System Approach for Efficiency Enhancement and Linearization Technique of Dual-Input Doherty Power Amplifier

In this paper, we propose an efficient system approach to improve the power efficiency of a dual-input Doherty power amplifier (DIDPA) while maintaining its linearity level. An auto-tuning process based on a hybrid heuristic search control (HHSC) is applied to define the DIDPA configuration by optimizing its free parameters. Digital predistortion (DPD) is then integrated to linearize the DIDPA using an optimal reduced-complexity model based on the segmentation approach. An optimal pruning process of the free parameters, based on hill-climbing (HC) heuristics, is proposed to reduce the HHSC complexity when refining the optimal DIDPA configuration. The system approach has been validated by experimental results, in different scenarios, using an LTE 20 MHz signal with a PAPR of 8 dB. DPD linearization under the optimal DIDPA configuration improved linearity using a low-complexity model with only 30 coefficients, which exhibited an error vector magnitude (EVM) of 2.5% and an adjacent channel power ratio (ACPR) of −50 dB at an average output power of 34 dBm. By updating the cost function coefficients and pruning the free parameters, the DIDPA exhibited an EVM of 3%, an ACPR of −50 dB, and a drain efficiency of 47% at an average output power of 39 dBm.


I. INTRODUCTION
Introducing non-constant-amplitude modulated signals in radio frequency (RF) transmission systems makes it more challenging to enhance power amplifier (PA) efficiency while maintaining an adequate linearity level. Starting from wideband code division multiple access (W-CDMA) in 3G, the peak-to-average power ratio (PAPR) of signals has increased with the use of orthogonal frequency division multiplexing (OFDM) in 4G LTE and 5G New Radio (NR).
Advanced PA architectures based on dynamic load or supply modulation have been proposed in the literature to avoid wasting excessive power resources [1]. Some of the most popular solutions are Doherty [2], envelope tracking [3], Chireix [4], and outphasing [5]. These highly efficient topologies require linearization techniques such as digital predistortion (DPD) to meet the linearity requirement, especially as the signal bandwidth increases.
Amplification architectures based on active load modulation, such as Doherty, are among the most common PA efficiency enhancement techniques. They rely on the nonlinear interaction between the main and auxiliary transistors for modulated signals with a large dynamic range. Although these architectures can be designed with a single RF input to be used in the transmitter, several studies have highlighted the benefits of maintaining separate inputs [6], [7], [8], and the advantages of dual-input PAs over single-input PAs have been studied in [9].
The separation of RF inputs provides additional degrees of freedom, so-called free parameters, that can be set to improve the performance or to enhance the PA efficiency [10]. These free parameters present a set of crucial circuit and system-level parameters, including PA bias voltages, power ratio, and phase shift of the separate input signals.
Focusing on the dual-input Doherty PA (DIDPA), searching for optimal free parameters that guarantee high performance requires experimental cross-validation or exhaustive search. These processes are usually costly and computationally expensive, especially when the search space is large and unbounded. Besides, setting these free parameters to their optimal values within a defined interval can be considered a global optimization problem. Several techniques have been proposed in the literature to find the optimal set of free parameters among large tunable ranges considered as search intervals [11].
Generally speaking, DIDPA should operate at its maximum efficiency, leading to poor linearity. To overcome this system drawback, it is necessary to introduce an efficient system approach to ensure the trade-off between linearity and efficiency. Regarding linearity, digital predistortion (DPD) is a powerful linearization technique used to compensate for the PA nonlinearities [12].
DPD consists of applying a pre-correction to the input signal so that the cascaded system (DPD and PA) behaves like an ideal linear and memoryless amplifier. Designing a DPD model that captures the inverse characteristic of the DIDPA is challenging, particularly finding a hardware-friendly candidate model with good numerical properties and high modeling accuracy.
On the other hand, the efficiency of the DIDPA is limited by the output back-off (OBO) needed to prevent signal peaks from going beyond the saturation point. Since DPD introduces a back-off that allows the OBO to be equal to the PAPR of the input signal, efficiency can be further improved by reducing the signal PAPR with various crest factor reduction (CFR) techniques [13]. In the literature, several research works have focused on the joint combination of CFR and DPD [14], [15], [16], [17].
The motivation of this paper is to provide a rigorous and practical system approach to DIDPA. This is achieved by placing DIDPA in an iterative system-based process while focusing on three aspects related to the study of DIDPA: optimizing the free parameters provided by the separation of the input signal, reducing the PAPR using CFR, and compensating for the nonlinearities using DPD linearization.

A. STATE-OF-THE-ART
In the literature, the design of DIDPA with enhanced efficiency has been reported in numerous research works. Few of them deal with the optimization of its configuration, and even fewer deal with the joint optimization of DIDPA parameters and its linearization technique.
Early work investigating the linearization of DIDPA has been reported in [18], where the authors used vector-switched generalized memory polynomials [19] to improve the linearity. The separation of the RF inputs has been statically achieved by performing several combinations in a simulation environment.
In [20], the authors split the RF inputs of a multi-input Doherty PA by performing an exhaustive search that returns the best configuration in terms of the power ratio. However, the amplifiers of the Doherty PA are identically biased, which may limit the efficiency enhancement.
Another research work in [21] proposed an adaptive signal separation in terms of the power ratio from the static measured results of DIDPA. The results in [21] once again confirmed the interest in optimizing the separation of the input signals to be transmitted to DIDPA, which can significantly improve efficiency.
Another confirmation can be found in [22], where the authors highlight the need for optimizing the phase shift between the two RF inputs of a dual-input load modulated balanced PA. In [22], the authors performed an exhaustive search to find the optimal phase shift by sweeping the phase over a determined interval. This approach can be costly, especially in terms of its implementation. Besides, sweeping the phase over a large interval with reduced resolution can be critical, especially at intervals where there is no output power, which leads to heat dissipation in the device and can damage it. Finding the optimal phase shift can be viewed as an optimization problem with a global minimum that requires a unidirectional minimization with reduced complexity.
The first work related to the online learning-based optimization of DIDPA is proposed in [23]. The authors proposed an adaptive technique based on a simultaneously perturbed stochastic approximation (SPSA) algorithm to tune the power ratio, the phase shift, and the bias voltages as free parameters. A cost function, defined as an additive criterion of the output gain $G$ and the power added efficiency (PAE), is used to control the algorithm convergence. However, the linearity requirement has not been met since SPSA has focused only on efficiency enhancement.
An extension of the work in [23] is reported in [24], where the authors update the cost function by including, in addition to $G$ and PAE, the output power $P_{out}$ and the adjacent channel power ratio (ACPR) as a linearity metric. The optimization process used in [24] first applies a global optimization algorithm, simulated annealing (SA). Once the cost function achieves its optimal value, a fine-tuning process based on learning-based control, namely extremum-seeking control (ESC), is used. In [24], the global optimization of the free parameters does not include any linearization technique or PAPR reducer.
In [25], a novel auto-tuning approach has been proposed to enhance the power efficiency of the DIDPA while meeting the linearity requirement. It consists of optimizing the free parameters using a proposed hybrid heuristic search control (HHSC) according to a designed cost function that indicates the trade-off between power efficiency and linearity. The work presented in [25] constitutes the first part of the research work presented in this paper.

B. MAIN CONTRIBUTION
In this paper, we propose a new system approach to follow the workflow of the efficiency enhancement of DIDPA with its linearization. Three major phases conduct this workflow: optimization of DIDPA configuration based on HHSC, DPD linearization, and joint optimization or pruning version of HHSC and DPD. These three phases use an adaptive cost function and change its coefficients at each phase according to each specification requirement. The free parameters optimized in the HHSC process cover the baseband calibration process, such as the power level and PAPR reduction. Besides, a DPD linearization is jointly optimized and integrated into our approach to meet the linearity requirements. The proposed approach brings a set of contributions regarding the optimization and linearization of DIDPA, which are emphasized by:
• An efficient optimization process based on HHSC by combining a global optimization problem and adaptive control to ensure the convergence of the free parameters to their optimum.
• Design a multi-objective optimization cost function to represent the trade-off between linearity and power efficiency.
• Design an optimal DPD model with a reduced-complexity aspect by hill-climbing (HC) heuristic. The DPD model involved in this study is based on the model with a segmentation approach.
• An adaptive tuning of the cost function according to the linearity improvement by the DPD and power efficiency by the optimal configuration.
• An optimal process to prune the free parameters in HHSC to refine the optimal configuration with the presence of DPD towards reduced computational complexity.
Unlike conventional solutions reported in the literature for DIDPA that focus on a single perspective study, the advantage of our proposed system approach is to drive DIDPA iteratively to an optimal operating point where linearity and efficiency meet their requirements. This is achieved by thoroughly studying and optimizing each phase of the system approach and executing the appropriate algorithms with high performance, low complexity, and good numerical properties for its implementation.
The remainder of this paper is organized as follows. Section II presents the system-level aspects, including the DIDPA, the proposed architecture, and the free parameters to be optimized. Section III describes the proposed system approach by focusing on the cost function design, the auto-tuning process-based architecture, the DPD linearization technique, and re-optimizing free parameters using an optimal pruning process. Section IV describes the test bench. Section V presents the experimental results of the proposed approach. Finally, the conclusion is given in Section VI.

A. DUAL-INPUT DOHERTY POWER AMPLIFIER
The PA based on active load modulation such as Doherty and outphasing with separate RF inputs can be viewed, by generalization, as the block diagram depicted in Fig. 1.
The DIDPA has two RF inputs, a drain bias $V_{DC}$, and two gate-source voltages $V_{GS,1}$ and $V_{GS,2}$ to control the transistors' terminals independently. A typical example of a PA with independent $V_{GS}$ is Doherty, where the main (carrier) amplifier is biased in class B and the auxiliary amplifier (typically named peaking amplifier) in class C.
The instantaneous amplitude and phase of each input in the baseband, as well as the $V_{GS}$ gate bias voltages, can be controlled and adjusted separately, allowing a significant degree of freedom for these parameters to improve the power efficiency. The main amplifier reaches its maximum output voltage at a given operating back-off and becomes maximally efficient. From this power level, the auxiliary amplifier turns on and injects current into the common node, increasing the output power and modulating the load seen by the main amplifier. In this paper, the DIDPA presented in [26] is used as the device under test (DUT), where the authors have presented a 3.0-3.6 GHz wideband GaN Doherty PA with a frequency dependency compensating circuit.

B. ON-LINE ARCHITECTURE
To meet the objectives of optimizing DIDPA, which consist of improving efficiency while maintaining an adequate linearity level, we propose an on-line architecture described in Fig. 2.
In this paper, the DPD is used to linearize the DUT by compensating for the nonlinearities of DIDPA. Additionally, CFR is used to reduce the PAPR of the transmitted signal so that DIDPA can operate with less back-off. Both CFR and DPD are implemented in baseband.
On the other hand, the DIDPA requires two separate input signals. Therefore, the baseband signal to be sent to the PA should be divided into two input signals, which differ in amplitude and phase, using a splitting function, the so-called digital splitter, designed in the baseband.
Each block has parameters to be set or controlled. This requires a control engine based on an optimization approach that determines these parameters so that the DUT operates at a point exhibiting a good trade-off between efficiency and linearity. In this architecture, the DPD block is not controlled by this control engine since the DPD technique requires linear regression techniques such as the least squares (LS) method to identify the model coefficients.

C. FREE PARAMETERS
Each block in Fig. 2 has free parameters to be set or controlled, which are summarized in Table 1.

1) FREE PARAMETER OF CFR
The CFR technique used to reduce the PAPR is based on peak cancellation [27], [28], which builds on the clipping-and-filtering CFR technique and is carried out in two stages: hard clip and clip-and-filter.
The hard clip is the most basic CFR technique, where the input signal $v(n)$ is clipped according to a threshold $\mu$. The signal generated at the first stage is expressed as:
$$v_{HC}(n) = \begin{cases} v(n), & |v(n)| \le \mu \\ \mu\, e^{j\angle v(n)}, & |v(n)| > \mu \end{cases}$$
At the second stage, $v_{HC}(n)$ is filtered using noise shaping. Finally, the output signal $u(n)$ is obtained by subtracting a time-aligned version of the filtered peak cancellation signal, weighted by the subtraction parameter $\alpha_s$, from the original input signal $v(n)$.
Therefore, we use the clipping threshold $\mu$ as a free parameter of the CFR block to be controlled.
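The two CFR stages can be illustrated with a minimal sketch. It assumes that the peak cancellation signal is the excess $v(n) - v_{HC}(n)$ and that noise shaping is a symmetric odd-length FIR filter applied with zero delay so that the result stays time-aligned. The function names, the filter taps, and these two choices are illustrative assumptions, not the implementation of [27], [28].

```python
import cmath
import math

def hard_clip(v, mu):
    """Limit each complex sample to magnitude mu while keeping its phase."""
    return [s if abs(s) <= mu else cmath.rect(mu, cmath.phase(s)) for s in v]

def fir_filter_centered(x, taps):
    """FIR filtering with the (odd-length) kernel centered on each sample,
    so the output is time-aligned with the input (assumed noise shaping)."""
    half = len(taps) // 2
    out = []
    for n in range(len(x)):
        acc = 0j
        for k, h in enumerate(taps):
            idx = n + half - k
            if 0 <= idx < len(x):
                acc += h * x[idx]
        out.append(acc)
    return out

def peak_cancellation_cfr(v, mu, alpha_s, taps):
    """Subtract the weighted, filtered peak signal from the original input."""
    v_hc = hard_clip(v, mu)
    peaks = [a - b for a, b in zip(v, v_hc)]      # excess above the threshold
    shaped = fir_filter_centered(peaks, taps)     # assumed noise-shaping step
    return [a - alpha_s * p for a, p in zip(v, shaped)]

def papr_db(x):
    """Peak-to-average power ratio of a complex sequence, in dB."""
    powers = [abs(s) ** 2 for s in x]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))
```

With `taps=[1.0]` and `alpha_s=1.0` the sketch reduces to the plain hard clip. Lowering `mu` reduces the PAPR further at the cost of a higher EVM, which is why the threshold is a natural candidate for a tunable parameter.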

2) FREE PARAMETERS OF DIGITAL SPLITTER
The motivation behind using two separate RF inputs is to eliminate analog input splitters, such as the Wilkinson divider, and to allow independent power control of the main and peaking amplifiers.
The digital splitter divides the complex signal $x = X e^{j\theta}$ into two complex signals $x_m$ and $x_p$ defined as:
We propose to take the power ratio $\alpha$ and the phase offset $\varphi$ as free parameters with
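As an illustration of the splitting function, the sketch below uses one common convention, which is an assumption here and not necessarily the definition used in this work: $\alpha \in [0, 1]$ is the fraction of the input power routed to the main path, and $\varphi$ is a phase rotation applied to the peaking path only. The function name `digital_split` is hypothetical.

```python
import cmath
import math

def digital_split(x, alpha, phi):
    """Split a complex baseband sequence into main and peaking signals.

    Assumed convention (illustrative only):
      x_m = sqrt(alpha) * x
      x_p = sqrt(1 - alpha) * exp(j * phi) * x
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] under this convention")
    g_m = math.sqrt(alpha)
    g_p = math.sqrt(1.0 - alpha) * cmath.exp(1j * phi)
    x_m = [g_m * s for s in x]
    x_p = [g_p * s for s in x]
    return x_m, x_p
```

Under this convention the total power of the two branches equals the power of the input sample by sample, so $\alpha$ only redistributes power and $\varphi$ only changes the relative phase.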

3) FREE PARAMETER OF TRANSCEIVER
In the calibration process, two essential operations must be carried out from the baseband: fixing the DAC resolution, which is integrated into the RF transceiver, and setting the gain attenuation, which controls the power level of the transmitted signal. For DAC resolution, it is recommended to scale the IQ data in baseband to ensure high accuracy and minimize loss of information. The gain attenuation directly controls the power level of the signal in the Tx branch.
In the transceiver block, we use two parameters $A_m$ and $A_p$ defined in the baseband to control the attenuation in the branch Tx1 occupied by the main amplifier and in Tx2 occupied by the peaking amplifier.
We have approximately estimated the relationship between $A_m$ (and $A_p$) and the average power of the main $P_{m,dBm}$ (and peaking $P_{p,dBm}$) RF signal as:
where $a$, $b$, and $c$ are parameters defined empirically from preliminary tests, stored in a look-up table (LUT), and indexed by the center frequency $f_c$ and the signal bandwidth. These test results provide datasets of $a$, $b$, and $c$ that are used subsequently according to the parameters of the scenario at hand. If $A_m$ and $A_p$ are equal, the powers $P_{m,dBm}$ and $P_{p,dBm}$ should be equal. However, in practice, we have observed that when the same numerical value is assigned to $A_m$ and $A_p$, the values of $P_{m,dBm}$ and $P_{p,dBm}$ measured by the power sensors differ. This difference between $P_{m,dBm}$ and $P_{p,dBm}$ can be adjusted and compensated in baseband by using a parameter denoted by $\psi$ with
Finding $\psi$ that satisfies (6) can be done in the calibration process. However, we propose to consider it as a free parameter controlled from baseband, which could be viewed as a hardware parameter since it can adjust the input power distribution over the main and peaking amplifiers.

4) FREE PARAMETER OF DUT
The main and peaking inputs of DIDPA control the main and peaking amplifiers, biased with $V_{GS,m}$ and $V_{GS,p}$, respectively. These bias voltages are controlled from the baseband and defined within a DC voltage range. The DC power supply used to manage the gate bias voltages is connected to the PC workstation through an Ethernet connection that enables real-time voltage monitoring from the baseband. Therefore, $V_{GS,m}$ and $V_{GS,p}$ are taken as free parameters.

III. SYSTEM APPROACH FOR LINEARIZATION AND EFFICIENCY ENHANCEMENT OF DIDPA
The process of the proposed system approach is mainly composed of five sub-processes:
• Design of the cost function to control the convergence of the free parameters optimization process.
• Optimization of free parameters based on the proposed auto-tuning process.
• DPD linearization based on ILA.
• Update the cost function designed in the first sub-process.
• Optimal pruning of free parameters in HHSC.

A. DESIGN OF COST FUNCTION
The cost function is an essential aspect in this research work to ensure a good trade-off between linearity and efficiency. The linearity requirement is presented in terms of two figures of merit (FOMs): error vector magnitude (EVM) and ACPR, while the efficiency requirement is presented by PAE and the output power $P_{out}$.
The EVM is a metric that measures the in-band distortion level. It is defined in the constellation domain and evaluates the deviation between the reference constellation point and the actual constellation point obtained in the presence of distortions. Analytically, EVM is defined as:
$$\mathrm{EVM} = \sqrt{\frac{\frac{1}{N}\sum_{n=1}^{N}\left(\delta I_n^2 + \delta Q_n^2\right)}{S_{avg}^2}}$$
where $\delta I$ and $\delta Q$ are the error magnitudes of the in-phase and quadrature symbols of the received data compared with an ideally reconstructed constellation, respectively, $N$ is the number of symbols, and $S_{avg}^2$ is the average squared magnitude.
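As a small numerical illustration of the EVM definition, the sketch below computes the EVM in percent from a list of ideal symbols and a list of received symbols. It assumes that the average squared magnitude is taken over the ideal constellation points; the function name `evm_percent` is hypothetical.

```python
import math

def evm_percent(reference, measured):
    """RMS error vector magnitude in percent.

    reference: ideal constellation points (complex)
    measured:  received constellation points (complex), same length
    """
    n = len(reference)
    # |m - r|^2 equals dI^2 + dQ^2 for each symbol
    err = sum(abs(m - r) ** 2 for r, m in zip(reference, measured)) / n
    # Average squared magnitude of the ideal constellation (assumed normalization)
    s2_avg = sum(abs(r) ** 2 for r in reference) / n
    return 100.0 * math.sqrt(err / s2_avg)
```

For example, a pure 10% gain error on every symbol of a QPSK constellation yields an EVM of 10%, since each error vector is one tenth of the corresponding reference vector.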
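The ACPR is used to evaluate the out-of-band distortions and is defined for the lower (left) and upper (right) adjacent channels as:
$$ACPR_{L,dB} = 10\log_{10}\frac{\int_{-3B/2}^{-B/2} P(f)\,df}{\int_{-B/2}^{B/2} P(f)\,df}, \qquad ACPR_{U,dB} = 10\log_{10}\frac{\int_{B/2}^{3B/2} P(f)\,df}{\int_{-B/2}^{B/2} P(f)\,df}$$
where $B$ represents the bandwidth of the signal, $P(\cdot)$ is the power spectral density, and $f$ is the frequency offset from the carrier.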
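The ACPR can be evaluated from a sampled power spectral density by summing the bins of each band. The sketch below assumes that the adjacent channels have the same width $B$ as the main channel and are centered at offsets of $\pm B$ from the carrier; actual measurement standards may use different integration bandwidths and offsets. The function names are hypothetical.

```python
import math

def band_power(freqs, psd, f_lo, f_hi):
    """Sum the PSD bins whose frequency offset lies in [f_lo, f_hi)."""
    return sum(p for f, p in zip(freqs, psd) if f_lo <= f < f_hi)

def acpr_db(freqs, psd, bandwidth):
    """Return (lower, upper) ACPR in dB for a PSD on a uniform frequency grid.

    freqs: frequency offsets from the carrier (same unit as bandwidth)
    psd:   linear-scale power spectral density samples
    """
    b = bandwidth
    main = band_power(freqs, psd, -b / 2, b / 2)
    lower = band_power(freqs, psd, -3 * b / 2, -b / 2)
    upper = band_power(freqs, psd, b / 2, 3 * b / 2)
    return 10 * math.log10(lower / main), 10 * math.log10(upper / main)
```

Because the grid is uniform, the bin width cancels in the ratio, so the sums can be used directly in place of the integrals.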
The FOMs are weighted according to their importance in the cost function. Additionally, some FOM penalization thresholds can also be defined when targeted specifications are not met.
In this paper, we propose to design the cost function $J$ according to the weighted sum method [29] with constraints [30], which is defined as
where $EVM_t$, $ACPR_t$, $PAE_t$, and $P_{out,t}$ are the EVM target, ACPR target, efficiency target, and $P_{out}$ target, respectively, that the user attempts to reach. Since CFR, as a nonlinear process, deteriorates the EVM dramatically, we propose to use in $J_{EVM}$ the EVM of CFR, denoted by $EVM_C$, the EVM of DPD and DUT, denoted by $EVM_{DD}$, and the EVM of the whole system including CFR, DPD, and DUT, denoted by $EVM_{CDD}$.
The three EVMs can be combined into one feature, denoted by $EVM_{ms}$, by using the mean square of $EVM_{CDD}$, $EVM_C$, and $EVM_{DD}$, which can be defined as:
The constraints of the cost function designed in (9) are defined as:
In (9), each objective function $J_{FOM}$ is normalized by its target value $J_{FOM,t}$, which is defined as a user specification. The cost function is designed such that the optimization process attempts to maximize $J$ toward 1, which indicates that the user's specifications are met.
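To make the weighted-sum idea concrete, the following sketch shows one possible normalized cost function that saturates at 1 when all targets are met. Its exact form is an assumption for illustration and is not the expression (9): each FOM is divided by its target (or the target by the FOM when smaller is better), each ratio is clipped to [0, 1], and the weights are constrained to sum to 1. Combining the three EVMs through a root mean square is likewise an assumed reading of the "mean square" feature.

```python
import math

def evm_ms(evm_cdd, evm_c, evm_dd):
    """Single EVM feature from three EVMs (assumed: root of the mean square)."""
    return math.sqrt((evm_cdd ** 2 + evm_c ** 2 + evm_dd ** 2) / 3.0)

def cost_function(fom, target, weights):
    """Hypothetical weighted-sum cost, equal to 1 when every target is met.

    fom, target: dicts with keys 'evm' (%), 'acpr_db' (dB, negative),
                 'pae' (%), 'pout_dbm' (dBm)
    weights:     four non-negative weights that sum to 1
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ratios = (
        target["evm"] / fom["evm"],            # lower EVM is better
        fom["acpr_db"] / target["acpr_db"],    # more negative ACPR is better
        fom["pae"] / target["pae"],            # higher PAE is better
        fom["pout_dbm"] / target["pout_dbm"],  # higher output power is better
    )
    # Clip each normalized objective so that exceeding a target earns no bonus
    return sum(w * min(1.0, max(0.0, r)) for w, r in zip(weights, ratios))
```

Clipping each term keeps one over-achieved FOM from masking another that misses its target, which is one way to encode penalization thresholds.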

B. HYBRID HEURISTIC SEARCH CONTROL
Finding the optimal configuration of each free parameter in Table 1 in a defined search range corresponds to our optimization problem.
The brute-force search can help find the optimal free parameters by exploring all possible combinations in the searching space. However, the brute-force search is not a practical solution to be implemented in real-time applications. Consequently, an auto-tuning approach based on an optimization algorithm is proposed to meet this need.
The proposed auto-tuning approach to optimize the free parameters relies on an efficient hybrid heuristic search control (HHSC) that combines two types of model-free optimization methods: simulated annealing (SA) as a global optimization search and extremum-seeking control (ESC) as an adaptive control to fine-tune the optimized results.
The choice of combining SA and ESC stems from the two main properties of these optimization methods. Indeed, SA is well known to guarantee the convergence to a neighborhood of the global optimum in a compact search set [31]. Furthermore, ESC is proven in [32] to converge to a local optimum. Based on these two convergence properties, we choose to combine the SA, which will guide the free parameters to a neighborhood of the global optimum, and then switch to ESC, which will fine-tune the search for the optimal parameters in the local neighborhood of the optimum.
The vector of free parameters to be optimized is denoted by $\Theta$. The cost function corresponding to $\Theta$ is denoted by $J(\Theta)$, or $J$ for simplicity. The free parameters to be optimized are defined as:
$$\Theta = [\mu \;\; \alpha \;\; \varphi \;\; \psi \;\; V_{GS,m} \;\; V_{GS,p}]$$
The boundaries $\Theta_{min}$ and $\Theta_{max}$ are defined as well. The interval $[\Theta_{min}, \Theta_{max}]$ represents the search range of each free parameter. Some preliminary tests, or information about the system, especially the DUT, from previous works are necessary to determine the proper optimization interval range.

1) SIMULATED ANNEALING
One of the best-known heuristic search methods for addressing the complex black-box global optimization problems is the SA algorithm proposed in [33].
Physical annealing in the metallurgy domain inspires the principle of the SA algorithm. Physical annealing is the process of heating a material until it reaches an annealing temperature. Then it will be cooled down slowly to increase the size of its crystals and reduce their defects. When the material is hot, the molecular structure is weaker and is more likely to change. When the material cools down, the molecular structure is more rigid and is less responsive to change.
Following the analogy with metallurgy, the slow cooling in simulated annealing corresponds to a gradual decrease in the probability of accepting a worse solution as the solution space is explored. The algorithm should perform an extensive search to find the global optimum solution, so accepting worse solutions is fundamental.
The SA algorithm is a fairly straightforward stochastic search based on the Metropolis Monte Carlo method [34], which accepts not only the solutions that improve $J$, but also some solutions that worsen it, with a probability $p$ known as the Metropolis criterion and defined as:
$$p = \exp\left(-\frac{\Delta E}{k_{bolt}\, T}\right)$$
where $\Delta E$ is the change in the cost function, $k_{bolt}$ is Boltzmann's constant, and $T$ is the control parameter analogous to the temperature of the annealing process.
During the search, the temperature is gradually decreased until it reaches zero in the ideal case.
The free parameters optimized using SA are denoted by $\Theta_{opt,SA}$. The cost function corresponding to $\Theta_{opt,SA}$ is denoted by $J_{opt,SA}$. The algorithm of SA is described in Algorithm (1).
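A minimal sketch of a simulated annealing loop for a bounded maximization problem is given below. It is not a transcription of Algorithm (1): the geometric cooling schedule, the uniform neighbor proposal, the default values, and the folding of Boltzmann's constant into the temperature are all assumptions made for illustration.

```python
import math
import random

def simulated_annealing(cost, theta0, bounds, step, t0=1.0, cooling=0.98,
                        n_iter=500, seed=0):
    """Maximize cost(theta) inside box bounds with a Metropolis acceptance rule.

    bounds: list of (low, high) per parameter
    step:   list of maximum perturbations per parameter
    Returns the best visited point and its cost.
    """
    rng = random.Random(seed)
    theta, j_cur = list(theta0), cost(theta0)
    best, j_best = list(theta), j_cur
    temp = t0
    for _ in range(n_iter):
        # Propose a random neighbor and keep it inside the search range
        cand = [min(hi, max(lo, x + rng.uniform(-s, s)))
                for x, (lo, hi), s in zip(theta, bounds, step)]
        j_cand = cost(cand)
        delta = j_cand - j_cur  # positive when the candidate is better
        # Always accept improvements; accept worse points with prob. exp(delta/T)
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            theta, j_cur = cand, j_cand
            if j_cur > j_best:
                best, j_best = list(theta), j_cur
        temp *= cooling  # assumed geometric cooling schedule
    return best, j_best
```

In a measurement-based setting, `cost` would wrap one acquisition of the FOMs for a given configuration, so `n_iter` directly sets the number of measurements spent in the global search.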

2) EXTREMUM SEEKING CONTROL
Once the SA algorithm has reached a neighborhood of $J_{opt,SA}$, the optimization procedure switches to ESC to fine-tune $\Theta_{opt,SA}$.
ESC is a control method that regulates the output of a dynamic map to its optimal value. Recently, it has been used extensively for real-time auto-tuning of physical systems, e.g., [35]. We propose here to use ESC in this context of real-time auto-tuning.
Indeed, ESC can be seen as a model-free optimization method, which does not need the gradient information explicitly, but estimates the gradient value of the cost function over time via properly designed feedback loops, i.e., filters. ESC is proven to converge to a local optimum of the cost function.
One of the most popular and simple ESC methods is the perturbation-based ESC proposed in [32]. Its concept is depicted by the block diagram in Fig. 3, which consists of a target system whose output is the cost function $J$, a perturbation signal $a\sin(\omega t)$, a gain $K$, and an integrator.
According to Fig. 3, the controller $K/s$ injects a perturbation signal $a\sin(\omega t)$ into the system, resulting in an output of the cost function $J(\Theta)$. This output is subsequently multiplied by $a\sin(\omega t)$ and passed through the integrator $1/s$, leading to an estimate of the cost's gradient, which is added to the perturbation signal $a\sin(\omega t)$.

Algorithm 2: Algorithm of HHSC.
The loop of ESC can be written as the following dynamical system
It is worth noting that this ESC method only needs the numerical values of the cost $J$, and does not need any closed-form or numerical computation of the gradient of $J$. Its implementation is also rather straightforward, and corresponds to a simple forward Euler discretization of the dynamics given by (17). This simplicity of implementation makes ESC a good choice for online tuning of physical systems when real-time computation capacities are limited or expensive, e.g., [35].
In our application, the perturbation-based ESC is used as an on-line process, which is placed downstream of SA to fine-tune $\Theta_{opt,SA}$ around its neighborhood. The optimized configuration by ESC is denoted by $\Theta_{opt,HHSC}$. The cost function corresponding to $\Theta_{opt,HHSC}$ is denoted by $J_{opt,HHSC}$.
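The perturbation-based loop can be sketched with a forward Euler discretization. The sketch below is a generic textbook form for maximizing a cost, in which each parameter receives its own dither frequency; it is an assumption-laden illustration and not the exact dynamics (17). The function name and all numerical settings are hypothetical.

```python
import math

def extremum_seeking(cost, theta0, gain, amp, omegas, dt, n_steps):
    """Perturbation-based extremum seeking (maximization), forward Euler.

    theta0: initial estimate, e.g. the point returned by a global search
    gain:   integrator gain K (positive for maximization)
    amp:    dither amplitude a
    omegas: one dither frequency (rad/s) per parameter, ideally distinct
    """
    theta_hat = list(theta0)
    for k in range(n_steps):
        t = k * dt
        dither = [amp * math.sin(w * t) for w in omegas]
        # Probe the system at the perturbed operating point
        theta = [th + d for th, d in zip(theta_hat, dither)]
        j = cost(theta)
        # Demodulate the cost with the dither and integrate (Euler step)
        theta_hat = [th + dt * gain * j * d
                     for th, d in zip(theta_hat, dither)]
    return theta_hat
```

Only cost values are used: on average, the product of the cost and the dither is proportional to the local gradient, so the integrator drifts toward a local optimum. The estimate carries a residual ripple of roughly `gain * amp * J / omega`, which is why the dither frequency is chosen large relative to the adaptation rate.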

3) ALGORITHM OF HHSC
Starting from an initial solution $\Theta_0 = [\mu_0 \;\; \alpha_0 \;\; \varphi_0 \;\; \psi_0 \;\; V_{GS,m_0} \;\; V_{GS,p_0}]$, SA optimizes the free parameters according to a designed cost function $J$. The optimized solution $\Theta_{opt,SA}$ returned by SA will be the initial solution for the ESC process. The algorithm of HHSC is described in Algorithm (2).
The optimized configuration by HHSC is denoted by $\Theta_{opt,HHSC}$, and its corresponding cost function is denoted by $J_{opt,HHSC}$.

C. DPD LINEARIZATION
Once HHSC is done, it will be decided whether to include DPD in the system approach or not. If not included, the cost function will be re-designed based only on the efficiency requirement. However, the linearity specifications are more challenging to meet for wideband signal scenarios. Hence, DPD linearization is part of the system approach.
The decomposed vector rotation (DVR) model, proposed in [36], has been chosen to serve as a DPD model as it demonstrates its ability to linearize strong nonlinear behavior with memory [37].
The DVR model can be expressed by:
where $x(n)$ and $y(n)$ are the input and output of the model, $M_{lin}$ is the memory depth for the linear term, $a_i$ are the complex coefficients of the linear term, and $S$ is the set of the terms $T_t$ which are used in the model with $S \subset T$ and
where $T_{1,[0,\dots,P]}$ is the set $\{T_{1,0}, \dots, T_{1,P}\}$ whose elements are defined hereafter along with $T_2, \dots, T_7$.
where $K$ is the number of segments, $\beta_k$ are the bounds of the segments, $p$ is the nonlinearity order, $M$ is the memory depth, and $c_{ki}$ are the complex coefficients of the model for each segment.
The DVR model has been sized to find its optimal structure using the HC algorithm proposed in [38]. The indirect learning architecture (ILA), presented in Fig. 4, is used to identify the DPD coefficients. They are estimated iteratively with the LS approach, which minimizes the LS criterion built on the difference between the PA input $x(n)$ and the output $z_p(n)$ of the model, the so-called postdistorter. The postdistorter output is computed using the estimated model coefficients $c$ and $z(n)$, the PA output divided by the amplification gain $G$.
Only the input-output signals from DIDPA are required in ILA to estimate the model coefficients. The principle is based on Post-Distortion and illustrated in Fig. 5.
The instantaneous error is defined as: The postdistorter input and output can be rewritten for N samples as or equivalently using the pseudo-inverse: where (22) minimizes the LS criterion: with:
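The LS identification step of the ILA can be illustrated with a small self-contained sketch. For brevity it uses a two-term memoryless polynomial basis as a stand-in for the DVR model, and it solves the normal equations $(Z^H Z)\,c = Z^H x$ with a basic Gaussian elimination. The basis, the function names, and the matrix symbol $Z$ are assumptions for illustration only.

```python
def basis(z):
    """Stand-in postdistorter basis (NOT the DVR model): [z, z|z|^2]."""
    return [z, z * abs(z) ** 2]

def solve_linear(a, b):
    """Solve a complex linear system with Gaussian elimination and pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def ila_least_squares(pa_input, pa_output, gain):
    """Estimate postdistorter coefficients c minimizing sum |x(n) - z_p(n)|^2."""
    z = [y / gain for y in pa_output]          # normalized PA output z(n)
    rows = [basis(s) for s in z]               # regression matrix, row per sample
    k = len(rows[0])
    zhz = [[sum(r[i].conjugate() * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    zhx = [sum(r[i].conjugate() * x for r, x in zip(rows, pa_input))
           for i in range(k)]
    return solve_linear(zhz, zhx)
```

In the ILA, the estimated coefficients are then copied to the predistorter, and the estimation is repeated over a few iterations with newly captured PA input and output signals.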

D. UPDATE OF WEIGHTING COEFFICIENT
The DPD may apply a back-off to the operating point of DIDPA, which can dramatically reduce the efficiency. Therefore, the cost function $J$ with DPD, which is denoted by $J_{HHSC+DPD}$, with updated FOMs and the same initial weighting coefficients will probably have deteriorated, and some free parameters in $\Theta_{opt,HHSC}$ will no longer be optimal.
To maximize the cost function again, we propose to rely on the design of the cost function $J$ by adapting its weighting coefficients $w = [w_1 \;\; w_2 \;\; w_3 \;\; w_4]$ according to the change effected by the DPD. An intuitive approach is to attribute an equal weight to each FOM according to the following equation:
$$w_i = \frac{1}{n}$$
where $i = 1, 2, \dots, n$ and $n$ is the number of objective functions that represent FOMs. In our context, the cost function is designed by combining efficiency and linearity, as shown in (9). On the other hand, DPD with an efficient optimal DPD model can significantly improve linearity, which leads to reducing the weight of the linearity FOMs in (9). Hence, we propose to design an adaptive cost function, in which the weighting coefficients $w$ adapt to the improvement of linearity and efficiency over each block. The weighting coefficients $w$ are updated with respect to how much DPD improves the linearity FOMs, e.g., EVM and ACPR, compared to before applying it. Starting from initial weighting coefficients $w$, we apply HHSC to optimize the free parameters with $J_{opt,HHSC}$, then DPD to linearize the DUT under the optimized free parameters.
It is required to re-compute the cost function $J_{HHSC+DPD}$ once the DPD is performed, and to compare it to $J_{opt,HHSC}$. If $J_{HHSC+DPD} < J_{opt,HHSC}$, we propose to update the weighting coefficients $w_{1,2}$, $w_{1,3}$, $w_2$, $w_3$, and $w_4$. The coefficient $w_{1,1}$ is not concerned since it depends on the CFR operation.
In this process, we first focus on $w_{1,2}$, $w_{1,3}$, and $w_2$, which refer to the linearity FOMs improved by DPD. We calculate the ratio $n_i$ of the difference before and after DPD improvement for each linearity FOM. The ratio $n_i$ is expressed as:
Next, we update the weighting coefficients of the linearity FOMs as:
Once $w_{1,2}$, $w_{1,3}$, and $w_2$ are updated, we propose to assign an equal weight between $w_3$ and $w_4$ according to:
The algorithm for updating the weighting coefficients $w$ is described in Algorithm (3).

E. OPTIMAL PRUNING OF FREE PARAMETERS IN HHSC
Once $w$ is updated according to the DPD contribution, the design of the cost function $J$ is changed.
We run the HHSC again, but only on a reduced set of free parameters in $\Theta$. The HHSC will only be performed on one free parameter, namely the most sensitive one in $\Theta$, which has the most significant impact on the behavior of $J$. The pruning process is an off-line procedure, which aims to reduce the complexity of the HHSC when the weighting coefficients are updated.
Pruning the free parameters in HHSC is optimally achieved using the HC algorithm [39]. The motivation behind using the HC algorithm is that it is not a black-box optimization process. The neighborhood property in the HC algorithm makes it possible to follow the algorithm's evolution at each iteration.
Here, the cost function is used for the joint optimization of CFR and DUT.
The HC algorithm starts from a given initial element 1 (0) ,HC at the first iteration and continually moves in the direction of the element with the best cost function value among its neighbors.
In the following, we denote q (i) ,HC by q (i) , and J q (i) ,HC by J q (i) .
At the q-th iteration, the search procedure starts from q (0) and tests its neighbors q (1) , q (2) , . . . , q (M ) , where M is the number of neighbors of q (0) .
In this study, the neighborhood definition is inspired by the one proposed for the DVR model in [38]. The free-parameter vector consists of six free parameters: $[\mu\ \alpha\ \phi\ \psi\ V_{GS,m}\ V_{GS,p}]$. As the values of these free parameters can be changed independently, they span a six-dimensional space. A neighbor of an element is defined as a 6-tuple in which the operation $\delta^{(i)}$, with $i = 1, \ldots, 6$, is applied to the $i$-th free parameter individually while the other free parameters are kept unchanged.
According to this definition, the element at the q-th iteration has 12 neighbors, i.e., two per free parameter. The element q (i) with the maximum cost function J q (i) is the solution, denoted by q (s) . With this neighborhood definition, the best solution q (s) can be compared with the initial solution q (0) , since only one free parameter is changed. An efficient way to do the comparison is to subtract q (s) from q (0) , which makes it easy to locate the position of the nonzero element in the vector V = q (0) − q (s) . The index of the nonzero element, denoted by idx, is then stored in the LUT, and the HC algorithm moves to the next iteration as long as J q (s) is better than J q−1 (s) . Otherwise, the HC algorithm stops.
The size of the LUT is 6 × 2, where 6 refers to the number of free parameters (6 inputs). The cell corresponding to each free parameter, denoted by pos, is incremented when the HC algorithm finds idx. Once the HC algorithm is finished, the free parameter to be used in HHSC is the one with the maximum count in the second column of the LUT. If several free parameters have the same number of occurrences in the LUT, the algorithm retains all of them.
The optimal pruning of free parameters in HHSC is described in Algorithm 4.
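The pruning procedure described above can be sketched as follows. The sketch assumes a relative step of ±10% per free parameter (the exact operation δ is not reproduced in this section), tracks the index of the modified parameter directly instead of subtracting vectors, and uses a `Counter` in place of the 6 × 2 LUT. The function names and the toy cost function are hypothetical.

```python
from collections import Counter


def neighbors(theta, step=0.1):
    """Yield (index, neighbor) pairs; one parameter is scaled by (1 - step) or (1 + step).

    Note: a parameter equal to zero is not moved by a multiplicative step.
    """
    for i, value in enumerate(theta):
        for sign in (-1.0, 1.0):
            cand = list(theta)
            cand[i] = value * (1.0 + sign * step)
            yield i, tuple(cand)


def prune_by_hill_climbing(cost, theta0, step=0.1, max_iter=100):
    """Hill-climb on `cost` (maximized) and count which parameter drives each move.

    Returns the most frequently selected parameter indices (ties are all kept),
    the final element, and its cost.
    """
    current = tuple(theta0)
    best = cost(current)
    counts = Counter()  # plays the role of the LUT: index -> number of occurrences
    for _ in range(max_iter):
        idx, cand = max(neighbors(current, step), key=lambda p: cost(p[1]))
        value = cost(cand)
        if value <= best:  # no neighbor improves the cost: stop
            break
        counts[idx] += 1
        current, best = cand, value
    if not counts:
        return [], current, best
    top = max(counts.values())
    selected = [i for i, c in sorted(counts.items()) if c == top]
    return selected, current, best


# Toy cost (hypothetical): far more sensitive to the second parameter.
def toy_cost(theta):
    return -(theta[0] - 1.0) ** 2 - 100.0 * (theta[1] - 2.0) ** 2


selected, final_theta, final_cost = prune_by_hill_climbing(toy_cost, (1.0, 1.0))
```

For a vector of six free parameters, `neighbors` produces the 12 candidates per iteration mentioned above, and `selected` plays the role of the pruned parameter set handed to the HHSC.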
The cost function that corresponds to the re-optimized free parameter, denoted by $J_{\mathrm{opt,upd}}$, is compared to $J_{\mathrm{HHSC+DPD}}$. If $J_{\mathrm{opt,upd}}$ is better than $J_{\mathrm{HHSC+DPD}}$, HHSC has improved the FOMs compared to those of the previous optimal configuration, and DPD is required again to linearize the DUT. Otherwise, the user must check whether the linearity-efficiency specifications are met.

IV. TEST BENCH
In order to validate the effectiveness of the proposed system approach, experiments have been carried out using a test bench. The block diagram and the photo of the test bench are shown in Figs. 6 and 7, respectively.
The DIDPA is controlled and evaluated using a MATLAB-based linearization and efficiency enhancement technique. The baseband IQ data are generated and split into two different IQ data streams, which are sent to the DUT through the AD9371 dual-channel RF transceiver connected to a Xilinx FPGA ZC700 through FMC connectors. The AD9371 transceiver up-converts the baseband signals to the carrier frequency $f_c$ of 3 GHz. A Keysight N9010A MXA spectrum analyzer is used to characterize the signal spectrum at the output of the DUT. For the observation path, the RF output signal is down-converted to baseband by the AD9371, which provides the baseband signal to the PC workstation. Around 100,000 IQ samples were recorded for baseband processing at a sampling rate of 245.76 MHz.
The proposed system approach for DIDPA is tested using a 64-QAM modulated LTE signal with a 20 MHz bandwidth, a roll-off factor of 0.6, and a PAPR of 8 dB at $f_c$ = 3 GHz.

A. INITIAL CONDITIONS OF FREE PARAMETERS
We define the search range of the free parameters in Table 1 by setting the upper and lower bounds for each free parameter.
The search range is determined empirically according to some preliminary tests.
• Threshold of PAPR reduction μ: The CFR applies a nonlinear process by clipping the input signal v(n) according to a clipping threshold μ, which causes a degradation of $EVM_C$. Fig. 8 shows the behavior of $EVM_C$ and of the PAPR of the output u(n) of the CFR block as μ varies. The PAPR of v(n) decreases with respect to μ, while the EVM of v(n) degrades exponentially with increasing μ. As we target an $EVM_t$ of around 3%, we define the search interval $[0, \mu_{max}] = [0, 1.2]$, since at μ = 1.2 the degradation of the EVM by CFR is almost 3%.
• Power ratio α: Since α represents the power distribution between the main and peaking branches, we set its search range to [0, 1].
• Phase shift φ: The search range [−180°, 180°] is the standard interval. However, preliminary tests showed that part of this range is critical and must be avoided. Therefore, [−10°, 200°] is taken as the search range for the free parameter φ.
• Attenuation difference ψ: As ψ is categorized as a hardware free parameter that refers to the physical power difference between the main and peaking amplifiers, we set its search range to $[\psi_{min}, \psi_{max}]$ = [−2, 2] dB.
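The PAPR/EVM trade-off introduced by clipping can be illustrated with a minimal sketch. Several assumptions apply: the mapping between μ and the actual clipping level is not specified in this section, so the code is parameterized directly by a hypothetical amplitude `clip_level`; the test signal is complex Gaussian noise rather than an LTE waveform; and the error metric is a time-domain error ratio used as a stand-in for the EVM, which is normally measured on demodulated symbols.

```python
import math
import random


def papr_db(x):
    """Peak-to-average power ratio of a complex sequence, in dB."""
    powers = [abs(s) ** 2 for s in x]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))


def evm_percent(ref, sig):
    """RMS error between `sig` and `ref`, relative to the RMS of `ref`, in percent."""
    err = sum(abs(s - r) ** 2 for r, s in zip(ref, sig))
    return 100.0 * math.sqrt(err / sum(abs(r) ** 2 for r in ref))


def hard_clip(x, clip_level):
    """Limit the magnitude of each sample to `clip_level` while keeping its phase."""
    return [s if abs(s) <= clip_level else s * (clip_level / abs(s)) for s in x]


# Hypothetical test signal: complex Gaussian samples with a fixed seed.
rng = random.Random(1)
v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(2000)]

# Sweep a few clipping levels: lower levels reduce the PAPR but raise the error.
sweep = [(level, papr_db(hard_clip(v, level)), evm_percent(v, hard_clip(v, level)))
         for level in (3.0, 2.5, 2.0, 1.5)]
```

Each entry of `sweep` holds the clipping level, the resulting PAPR in dB, and the error in percent, which is the kind of data used to bound the search interval for the clipping threshold.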

B. INITIALIZATION OF COST FUNCTION
At this point, it is necessary to determine the initial weighting coefficients w and the target FOMs.
We initialize the weighting coefficients by attributing more weight to the efficiency FOMs, since DPD will be included to linearize the DUT. The linearity requirements can be relaxed at this stage because, unlike the efficiency requirements, they are easier to meet with DPD.
Regarding target FOMs, they are defined as:

C. JOINT OPTIMIZATION OF CFR AND DUT
Starting from the initial solution, HHSC is performed in this step according to Algorithm 2. For the SA algorithm, we set $T_0 = 1$, $T_f = 0.01$, $C = 0.96$, $k_{bolt} = 1$, and $SA_{max} = 60$.
The convergence of J is shown in Fig. 9.
In HHSC, we empirically set 100 iterations for SA to converge, while ESC requires 20 iterations. As SA is a stochastic optimization method, Fig. 9 shows stochastic behavior in the first 60 iterations, where $J$ evolves randomly.
After 61 iterations, the SA algorithm returns the optimal solution. After 120 iterations in total, HHSC returns the optimal configuration, which is summarized in Table 2.
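The SA stage can be illustrated with a minimal sketch that uses the reported settings $T_0 = 1$, $T_f = 0.01$, $C = 0.96$, and $k_{bolt} = 1$. Everything else is an assumption: the geometric cooling rule, the Metropolis acceptance test written for maximization, the way candidates are drawn, the hypothetical `max_iter` cap, and the toy cost function. The role of $SA_{max}$ is not detailed in this section, so it is not modeled.

```python
import math
import random


def simulated_annealing(cost, theta0, bounds, t0=1.0, tf=0.01, c=0.96,
                        k_bolt=1.0, max_iter=1000, seed=0):
    """Maximize `cost` with simulated annealing and geometric cooling (sketch)."""
    rng = random.Random(seed)
    current = list(theta0)
    j_cur = cost(current)
    best, j_best = list(current), j_cur
    t, it = t0, 0
    while t > tf and it < max_iter:
        # Hypothetical move: perturb each parameter by up to 10% of its range.
        cand = [min(hi, max(lo, v + rng.uniform(-0.1, 0.1) * (hi - lo)))
                for v, (lo, hi) in zip(current, bounds)]
        j_cand = cost(cand)
        delta = j_cand - j_cur
        # Metropolis rule: always accept improvements, sometimes accept worse points.
        if delta >= 0 or rng.random() < math.exp(delta / (k_bolt * t)):
            current, j_cur = cand, j_cand
            if j_cur > j_best:
                best, j_best = list(current), j_cur
        t *= c  # geometric cooling
        it += 1
    return best, j_best, it


# Toy one-parameter cost (hypothetical), maximized at 0.3 inside [0, 1].
def toy_cost(theta):
    return -(theta[0] - 0.3) ** 2


best, j_best, n_iter = simulated_annealing(toy_cost, [0.9], [(0.0, 1.0)])
```

With these values, a purely geometric schedule takes 113 steps to fall from $T_0$ to below $T_f$, so the fixed budget of 100 SA iterations used in the experiment suggests that the actual stopping rule is not identical to the one in this sketch.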
The evolution of free parameters over HHSC is illustrated in Fig. 10, in which the free parameters have similar behavior to J according to the HHSC iterations.
The FOMs corresponding to $J_{\mathrm{opt,HHSC}}$ are shown in Table 3.
It should be noted that the HHSC run shown in Fig. 10 was performed in the linear region of the DIDPA. Indeed, since SA generates random solutions, the operating point of the DUT was reduced while HHSC was running, as a safety measure to protect the DUT.
Once the free parameters are optimized, we reduce the back-off by increasing the input power from 11.36 dBm to 21.16 dBm so that the system operates close to $P_{\mathrm{out},t}$. The FOMs after increasing the input power are summarized in Table 4.
According to the results in Table 4, the efficiency FOMs are enhanced at the expense of the linearity FOMs. This confirms the choice of the initial weighting coefficients, where $w_3$ and $w_4$ have more influence than $w_1$ and $w_2$.
By setting the free parameters to the optimal HHSC configuration in Table 2, the DUT, including the digital splitter and the transceiver, can be seen as a single-input single-output system whose input is x(n) and whose output is denoted y(n). The AM-AM and AM-PM characteristics of the DIDPA are shown in Fig. 11, where saturation at high power leads to strong nonlinearities. Memory effects are visible as well.

D. DPD LINEARIZATION
The DPD is carried out in two steps:
• Determination of the optimal DVR model.
• Convergence of the linearity FOMs (ACPR and EVM) using ILA.

1) DETERMINATION OF OPTIMAL DVR MODEL
According to the HC algorithm presented in [38], the structure of the DVR model is optimally sized.
In this study, the cost function, denoted by $Y$, is defined as a search criterion to ensure a good trade-off between three features: modeling accuracy, represented by the normalized mean square error (NMSE), denoted by $N$ and computed between $x(n)$ and $z_p(n)$; model complexity, represented by the number of coefficients $C$; and computational complexity, given by the condition number of the regressor matrix $Z^H Z$ in (22), denoted by Cond.
The cost function $Y$ is used to control the convergence of the HC algorithm so that it returns an optimal DVR model that is well suited to hardware implementation. The cost function is defined in (29). The hardware implementation relies on the numerical properties of the DPD model, which in this case are represented by $C$ and Cond. The design of the cost function in (29) may deteriorate $C$ and $N$, since the objective function of Cond is sized so as to slightly override the influence of $C$ and $N$.
However, the deterioration of $N$ can be overcome by the DPD convergence towards the solution that offers a better trade-off between linearization performance, complexity, and numerical properties. Fig. 12 illustrates the evolution of the HC algorithm in terms of $C$, Cond, and $N$; a total of 1659 DVR model structures are tested over 9 HC iterations. The blue dots represent the neighbors tested by the HC algorithm. The red dots indicate the search path taken by the HC algorithm. The green diamond highlights the best solution.
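As an illustration of the accuracy term $N$, the NMSE between two complex baseband sequences can be computed as in the short sketch below. The dB normalization used here is the common definition and is an assumption about how $N$ is evaluated; the function name and the sample values are hypothetical.

```python
import math


def nmse_db(reference, estimate):
    """Normalized mean square error between two complex sequences, in dB."""
    err = sum(abs(r - e) ** 2 for r, e in zip(reference, estimate))
    ref = sum(abs(r) ** 2 for r in reference)
    return 10.0 * math.log10(err / ref)


# Hypothetical samples: the estimate has a 10% amplitude error on every sample.
x = [1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j]
z_p = [1.1 * s for s in x]
n_value = nmse_db(x, z_p)  # about -20 dB
```

A lower (more negative) value indicates that the model output follows the reference more closely.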
Replacing the parameters of the optimal DVR model structure in (18) yields the optimal DVR model used in DPD.

2) DPD USING ILA
As NMSE is considered a strong indicator of ILA convergence, Fig. 13 presents the evolution of the NMSE over the DPD iterations. As can be seen, the NMSE converges rapidly and improves significantly from the first DPD iteration. At the first DPD iteration, the coefficients of the DPD model are initialized to c = [1, 0, . . . , 0, 0], which makes the predistorter a transparent block, so that x(n) = u(n) and $z_p(n)$ = z(n).
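For reference, one common way to write the ILA identification step is the least-squares estimate below. It is a sketch under assumptions: $\mathbf{Z}$ denotes the regressor matrix built from the DVR basis functions evaluated on the normalized DUT output $z(n)$, $\mathbf{x}$ stacks the predistorter output samples $x(n)$, and $\mathbf{c}$ is the coefficient vector. The exact form of (22) may differ, for example in normalization or in the use of a damped update.

```latex
\hat{\mathbf{c}} = \left(\mathbf{Z}^{H}\mathbf{Z}\right)^{-1}\mathbf{Z}^{H}\mathbf{x},
\qquad
\mathbf{z}_p = \mathbf{Z}\,\hat{\mathbf{c}}
```

In this form, the conditioning of $\mathbf{Z}^{H}\mathbf{Z}$ directly affects the inversion, which is consistent with including Cond in the model-sizing criterion.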
EVM CDD is improved from 8% to 2.97%, which is confirmed in Fig. 14, where the red dots represent the IQ constellation of y(n) and the blue dots are the reference IQ constellation of v(n). ACPR is improved by over 20 dB, which can be confirmed in Fig. 15, where the output signal of the DIDPA without DPD is shown in red and with DPD in green. The DVR model is optimally sized to ensure a good trade-off between linearization performance, model complexity, and numerical stability of the identification process. The numerical properties of the optimal model include the dynamic range of the model coefficients estimated by DPD.
The linearity FOMs are greatly improved by DPD and come close to the targeted linearity FOMs under the optimal HHSC configuration.
However, DPD applies a BO to the DIDPA, which may deteriorate the efficiency. At the final DPD iteration, the efficiency FOMs are PAE = 30.75% and $P_{out}$ = 34.09 dBm, which do not meet the target FOMs ($P_{out,t}$ and $PAE_t$) set in the user specifications. Table 5 summarizes the FOMs before and after DPD, together with the cost function computed from them.
From Table 5, the cost function decreases after DPD is applied. As DPD partially modifies the system conditions, the optimal HHSC configuration may no longer be the optimal solution, which motivates updating the weighting coefficients and launching a new HHSC.

E. UPDATE WEIGHTING COEFFICIENTS
The weighting coefficients $w$ should be updated to account for the linearity improvement that DPD brings to the DUT, which also affects the efficiency.
The weighting coefficients $w$ are updated according to Algorithm 3. Since the weighting coefficients $w$ reflect the impact of each FOM on the value of the cost function $J$, Fig. 16 presents the contribution of each FOM with the initial and with the updated weighting coefficients.

F. HHSC OPTIMAL PRUNING

1) HC ALGORITHM
Optimal pruning of the free parameters in HHSC is performed in the presence of the DPD block. The evolution of the HC algorithm used to prune the free parameters for HHSC is illustrated in Fig. 17. The red dots represent the solution at each HC iteration. The blue dots represent the neighbors. In order to investigate the behavior of the HC algorithm, Fig. 18 shows which free parameter has been stored in the LUT at each HC iteration.
As can be seen, the free parameter φ is the one stored most often, which means that the optimization behavior of HHSC is most sensitive to φ.
On the other hand, the free parameter μ has been stored three times: the configuration in which only μ is modified has the best cost function $J$ at the 6th, 13th, and 17th HC iterations. After the 21st HC iteration, no configuration with a better $J$ was found, which causes the HC algorithm to stop.

2) HHSC WITH PRUNED FREE PARAMETERS
As discussed before, the goal of the HC algorithm here is to optimally prune the free parameters of HHSC, i.e., to find the free parameter to which the CFR and the DUT with DPD are most sensitive.
Hence, [φ] is the most sensitive free parameter and is the only one used in HHSC with the cost function $J$ designed with the updated weighting coefficients $w$.
Starting from the initial solution of 164°, which is the optimal φ from the previous HHSC, Fig. 19 presents the evolution of the cost function $J$ and of the free parameter φ over the HHSC iterations.
According to Fig. 19, the free parameter φ has been re-optimized; its optimal value becomes 175.9°, which corresponds to the optimal cost function $J_{\mathrm{opt,upd}}$ = 0.81. Comparing $J_{\mathrm{opt,upd}}$ with $J_{\mathrm{HHSC+DPD}}$ shows that the DPD coefficients must be updated using ILA, since the optimal configuration of the DUT has changed. Table 6 summarizes the results, where the FOMs are presented before and after applying DPD. At this stage, applying DPD consists only of updating the existing DPD coefficients in the predistorter, since the optimal pruning of the free parameters of HHSC was performed with DPD.
With the new optimal configuration, DPD has improved the linearity FOMs, and the efficiency FOMs are improved as well. This can be confirmed in Table 6, where the PAE is improved by almost 5.5%. Comparing the cost functions shows that $J_{\mathrm{HHSC+DPD}}$ = 0.85 is now better than $J_{\mathrm{opt,HHSC}}$ = 0.82, which ends the auto-tuning procedure. Fig. 20 illustrates the final architecture of the DIDPA with CFR and DPD. The final optimal configuration is also highlighted. The AM-AM and AM-PM characteristics of the whole system (CFR+DPD+DUT) are shown in Fig. 20, along with the spectra of the input and output signals and the IQ constellations of u(n), plotted in blue, and y(n), plotted in red.

VI. CONCLUSION
In this paper, we proposed an auto-tuning approach that makes the best use of the dual-input Doherty PA to maximize power efficiency while complying with the linearity specifications. The proposed auto-tuning approach relies on a global optimization combined with a control process to find the optimal configuration of a set of crucial circuit-level and system-level parameters, which are appropriately combined with DPD linearization and the CFR technique. The approach is driven by an adaptively designed cost function that represents the trade-off between efficiency and linearity. In order to refine the optimal configuration, we proposed a new approach based on the HC algorithm to optimally prune the free parameters when DPD linearization is applied. The proposed approach has been validated through experimental results using a 20 MHz LTE signal scenario, in which it achieved a good trade-off between linearity, computational complexity, and efficiency. In addition, the optimally sized DPD model has very good numerical properties, making it a good candidate for implementation on hardware such as an FPGA.