A Novel Design Methodology for a Multi-Octave GaN-HEMT Power Amplifier Using Clustering Guided Bayesian Optimization

The increasing demand for high-performance power amplifiers (PAs) in modern wireless systems has made PA structures more complex, which in turn makes the optimization process for PA design extremely difficult. In this paper, a Bayesian optimization (BO) algorithm with a novel acquisition function, namely the clustering guided Gaussian process upper confidence bound (CG-GPUCB) method, is proposed for the optimization of a multi-octave PA. To validate the developed optimization strategy, a high-performance multi-octave PA with a 10-W gallium nitride (GaN) transistor has been implemented. The measured performance of the fabricated PA over the frequency band between 0.6 GHz and 2.8 GHz shows that the output power is greater than 40 dBm, the power added efficiency (PAE) is over 62%, and the gain is more than 10 dB. Compared with the existing BO-based method, the proposed methodology is more efficient, since it achieves better PA performance with less optimization time. A comparison between the achieved results and the performance of other state-of-the-art PAs based on different optimization algorithms highlights the validity of the proposed design methodology and the obtained improvement in terms of bandwidth.


I. INTRODUCTION
Wireless communication technology is developing rapidly towards larger channel capacity, wider bandwidth, higher efficiency, and more carrier frequencies. PAs, as the main energy-consuming components in modern wireless communication systems and the key modules at the end of the transmitters, are required to provide high performance in terms of their main characteristics, such as output power, drain efficiency, and bandwidth [1]-[3]. For example, in modern wireless communication systems, such as 5G indoor wideband wireless systems, PA modules are required to have an extremely wide bandwidth to ensure higher data transfer rates, while maintaining high output power and high efficiency at the same time [4]. To meet the demand for wide-bandwidth PAs in emerging communication systems, several implementation methods have been proposed, such as continuous-mode PAs, harmonically tuned PAs, and so on [5]-[17]. Among these techniques, the design of multi-octave PAs has received a great deal of attention [18]-[22]. However, the design of multi-octave broadband high-efficiency PAs must rely on more complex matching structures and a larger number of transmission lines, which makes designing and optimizing the whole PA circuit more challenging. Therefore, in order to meet the required performance, the selection of a suitable optimization method is essential for designing PAs, especially multi-octave PAs. It should be underlined that GaN high-electron-mobility transistors (HEMTs) are the best choice for high-frequency and high-power amplifier design [23]-[32], owing to the outstanding performance of GaN technology.
Nowadays, automated circuit optimizers are widely used in commercial electronic design automation (EDA) tools, such as the Advanced Design System (ADS). The results obtained after optimization are generally satisfactory when dealing with passive circuits, such as filters and antennas. However, the drawback of these commercial optimizers is that it is difficult to obtain satisfactory results when optimizing active circuits, such as PAs, especially those requiring high performance or having complex topologies. Their optimization time is long and the results are usually poor [33]. In order to alleviate this issue, many well-known optimization techniques have gradually been employed for designing PAs, such as the simplified real frequency technique (SRFT) [34], the fast simplified real frequency technique (FSRFT) [35], the post-matching technique [36], artificial neural networks [37], space mapping [38], the shape-preserving response [39], band-pass auxiliary transformers [40], and Bayesian optimization (BO) [41]. Among all these methods, the BO technique is considered an effective global optimization algorithm for solving complex optimization problems in which the model expressions are unknown and costly to evaluate.
The BO technique first applies a regression model to form the initial distribution of the objective function, then trains the model on the available parameter information, and finally selects the next evaluation point by applying an appropriate acquisition function. This process is repeated until the iterations are finished. The effectiveness and efficiency of the BO technique have been clearly demonstrated by several studies on employing the BO algorithm for RF circuit design [42]-[45]. However, in the traditional BO algorithm, the hyper-parameters of the regression model and the acquisition function require auxiliary optimization, which is computationally expensive in practice. In [46], the idea of cluster analysis is combined with model sampling, and the acquisition function is improved accordingly, which effectively increases the calculation speed. The improvement is even more pronounced when the objective function is not smooth and its peaks and valleys are sharp.
In [41], the application of BO to PA design was proposed. The optimized 10-W GaN PA in [41] achieves a drain efficiency higher than 60% with an output power greater than 39.8 dBm in the frequency range from 1.5 GHz to 2.5 GHz. In this paper, we apply for the first time the BO technique proposed in [46], which differs from [41] in the choice of the acquisition function, to the design of a multi-octave PA. In addition, unlike the work in [41], which uses the fundamental output power as the target function, the new target function employed in this work takes both the output power and the PAE into account. Moreover, further improvements to the algorithm have been made to avoid invalid iterations caused by poor model accuracy. The overall process of the PA design is illustrated in Fig. 1. The fabricated multi-octave PA, which has been optimized by the proposed methodology, achieves an output power greater than 40 dBm with a power added efficiency (PAE) higher than 60% in the frequency range from 0.6 GHz to 2.8 GHz. Compared with the work in [41], the bandwidth of the PA designed in this paper is remarkably extended, whereas the other specifications are maintained at a similar level.
The rest of the paper is organized as follows. In Section II, the theoretical foundations of the BO algorithm with the new acquisition function are presented. In Section III, the steps of using the BO technique for multi-octave PA design are described in detail. In Section IV, the proposed method is verified. To accomplish this goal, four different GaN PAs designed with different optimization techniques are fabricated, tested, and analyzed to investigate the effects of the different optimization methods on the achieved PA performance. A 10-W gallium nitride (GaN) high-electron-mobility transistor (HEMT) is used for the PA design and fabrication. Finally, the conclusions are given in Section V.

II. THE THEORY OF BAYESIAN OPTIMIZATION
The most critical step of the PA design is to obtain the optimal matching network that meets the target performance in the target operating frequency band. However, this is an extremely complex and time-consuming step, even for an experienced PA designer. In order to alleviate this issue, an optimization method based on the BO technique is employed in this work.
The BO technique, as a very effective global optimization algorithm, can be used to obtain the optimal solution of a complex objective function with fewer evaluations when the exact form of the function is unknown. This is valuable for PA design, especially when the PA design objective is known but its exact relation to the circuit layout geometry parameters is unknown.
Taking the root mean square (RMS) of the output power and efficiency of a PA as the target value, Y, and the length and width of the transmission lines as the independent variable, X, the relationship between them can be expressed as in eq. (1), which can be regarded as a black-box function:

$$Y = f(X, w) \tag{1}$$

where w represents the weight vector. The target of the BO method in this work is, first, to accurately predict the behavior of the PA by continuously learning from the training data and, then, to find the X that achieves the optimal value of the objective function based on the predicted results. Assuming that the training data set obtained through the initial circuit is $D_{1:n} = \{x_i, y_i\}_{i=1}^{n}$, then, according to Bayes' theorem, we obtain:

$$P(w \mid D_{1:n}) = \frac{P(D_{1:n} \mid w)\, P(w)}{P(D_{1:n})} \tag{2}$$

where $P(w)$ is the prior probability, i.e., the probability that w takes a certain value, as known to the observer from experience before the training data are used. $P(w \mid D_{1:n})$ is the posterior probability. It represents the updated probability of w after correction by the training data set, D.
$P(D_{1:n} \mid w)$ is the likelihood function, which represents the probability of obtaining the data set, D, for a fixed value of w. The likelihood function plays a key role in the calculation of the posterior probability. $P(D_{1:n})$ is the marginal probability, which represents the probability of occurrence of the data set $D_{1:n}$. It is independent of w and is regarded as a constant. Therefore, finding the maximum of $P(w \mid D_{1:n})$ is equivalent to finding the maximum of $P(D_{1:n} \mid w)\, P(w)$. Once w is determined, the black-box function is also determined. In other words, the model regression is completed. Meanwhile, an acquisition function is constructed based on the posterior probability distribution, and the next most promising evaluation point is selected by maximizing the acquisition function.
The BO algorithm consists of two main parts: the probabilistic surrogate model (PSM) and the acquisition function. These two parts are described in detail in the following two subsections.

A. Probabilistic Surrogate Model
In this paper, we choose Gaussian process regression (GPR) as the probabilistic surrogate model (PSM), which is more flexible and scalable than parametric models [47], [48]. A Gaussian process is specified by a mean function, $m(x)$, and a covariance function, $k(x, x')$. The specific expressions are shown in eqs. (3)-(5):

$$f(x) \sim \mathcal{GP}\big(m(x),\, k(x, x')\big) \tag{3}$$

$$m(x) = \mathbb{E}[f(x)] \tag{4}$$

$$k(x, x') = \mathbb{E}\big[(f(x) - m(x))(f(x') - m(x'))\big] \tag{5}$$

As the PA design is a multi-dimensional problem, the multi-dimensional Gaussian distribution is employed in this work. The calculations of both the mean vector and the covariance matrix are related to the kernel function.
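1) Kernel function: A few different kinds of kernel functions are commonly used, including linear, polynomial, and Gaussian kernel functions. One of the most frequently used is the Gaussian kernel function, which can map data to an infinite-dimensional space and is described as follows:

$$k(x, x') = \exp\left(-\frac{(x - x')^2}{2 l^2}\right)$$

The multidimensional Gaussian kernel can be expressed as in eq. (8), where l is the hyper-parameter of the equation:

$$k(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^2}{2 l^2}\right) \tag{8}$$

2) Mean vector and covariance matrix: Assume that $\mathbf{x}$ and $\mathbf{y}$ represent two different multi-dimensional variables that both follow multi-dimensional Gaussian distributions. Assume that the mean vectors of $\mathbf{x}$ and $\mathbf{y}$ are a and b, respectively, that their covariance matrices are A and B, respectively, and that C is the cross-covariance matrix between $\mathbf{x}$ and $\mathbf{y}$. Then, the joint distribution of $\mathbf{x}$ and $\mathbf{y}$ is given by:

$$\begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} a \\ b \end{bmatrix},\; \begin{bmatrix} A & C \\ C^{T} & B \end{bmatrix}\right)$$

The marginal probability distribution of $\mathbf{x}$ is:

$$\mathbf{x} \sim \mathcal{N}(a, A)$$

From Bayes' theorem, we know that:

$$\mathbf{x} \mid \mathbf{y} \sim \mathcal{N}\big(a + C B^{-1}(\mathbf{y} - b),\; A - C B^{-1} C^{T}\big)$$

This result is now applied to the predictions of the model. Let $Y$ be the training target variable extracted from the circuit and $Y'$ be the prediction target variable, with $X$ and $X'$ the corresponding independent variables. Then, the posterior mean and posterior covariance can be obtained as in eqs. (12) and (13):

$$\mu_{Y' \mid Y} = m(X') + K(X', X)\, K(X, X)^{-1}\, \big(Y - m(X)\big) \tag{12}$$

$$\Sigma_{Y' \mid Y} = K(X', X') - K(X', X)\, K(X, X)^{-1}\, K(X, X') \tag{13}$$

where $K(\cdot, \cdot)$ denotes the matrix of kernel values between two sets of independent variables. Subsequently, by combining the above theory with PA design, $Y$ and $Y'$ can be expressed as follows:

$$Y = [y_1, y_2, \ldots, y_n]^{T}, \qquad Y' = [y'_1, y'_2, \ldots, y'_m]^{T}$$

where $y_1, y_2, \ldots, y_n$ denote the target values corresponding to each set of training data, and n represents the number of training data sets. Similarly, $y'_1, y'_2, \ldots, y'_m$ denote the target values corresponding to each set of prediction data, and m represents the number of prediction data sets. Suppose that a certain set of independent variables in the training set is $x_{11}, x_{12}, \ldots, x_{1t}$; the corresponding set in the prediction set can then be denoted by $x'_{11}, x'_{12}, \ldots, x'_{1t}$, where t represents the number of independent variables. Then, the variables $x_1$ and $x'_1$ can be expressed as follows:

$$x_1 = [x_{11}, x_{12}, \ldots, x_{1t}], \qquad x'_1 = [x'_{11}, x'_{12}, \ldots, x'_{1t}]$$

The target variable Y and the independent variable X can be fully expressed as follows:

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1t} \\ x_{21} & x_{22} & \cdots & x_{2t} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nt} \end{bmatrix}$$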
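The posterior mean and covariance of eqs. (12) and (13) can be illustrated with a short numerical sketch. The code below is not the authors' implementation: it assumes a zero prior mean, a fixed length scale, and a small diagonal jitter term for numerical stability, and the names `gaussian_kernel`, `gp_posterior`, and the toy data are hypothetical.

```python
import numpy as np

def gaussian_kernel(XA, XB, length_scale=1.0):
    """Multidimensional Gaussian kernel between the rows of XA and XB."""
    sq_dist = (
        np.sum(XA**2, axis=1)[:, None]
        + np.sum(XB**2, axis=1)[None, :]
        - 2.0 * XA @ XB.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * length_scale**2))

def gp_posterior(X_train, Y_train, X_pred, length_scale=1.0, jitter=1e-8):
    """Posterior mean and covariance at X_pred (zero prior mean assumed)."""
    K = gaussian_kernel(X_train, X_train, length_scale)
    K = K + jitter * np.eye(len(X_train))        # assumption: jitter for stability
    K_s = gaussian_kernel(X_pred, X_train, length_scale)
    K_ss = gaussian_kernel(X_pred, X_pred, length_scale)
    mean = K_s @ np.linalg.solve(K, Y_train)
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, cov

# Toy data (hypothetical): three training points with one independent variable.
X_train = np.array([[0.0], [1.0], [2.0]])
Y_train = np.array([0.0, 1.0, 0.5])

# Predicting at the training points should reproduce the training targets
# with almost no remaining uncertainty.
mean, cov = gp_posterior(X_train, Y_train, X_train)
```

In a PA design setting, each row of `X_train` would hold the lengths and widths of the transmission lines and `Y_train` the simulated target values.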

B. Acquisition Function
The acquisition function plays an important role in the optimization process. It directly determines the efficiency of the optimization. The following is a detailed description of several acquisition functions.

1) Probability of improvement (PI):
Assume that $f(x^{+})$ is the largest target value in the training data, and $f(x)$ represents the target value at a prediction point. The idea of PI is to first find the predicted points whose target value $f(x)$ is greater than $f(x^{+})$ and then select, among these points, the one with the highest probability as the next acquisition point [49]. It can be expressed as in eq. (14):

$$PI(x) = P\big(f(x) \ge f(x^{+}) + \varepsilon\big) \tag{14}$$

The specific formula for calculating PI is given by:

$$PI(x) = \Phi\left(\frac{\mu(x) - f(x^{+}) - \varepsilon}{\sigma(x)}\right)$$

where $\Phi(\cdot)$ denotes the standard normal cumulative distribution function, $\mu(x)$ and $\sigma(x)$ are the posterior mean and standard deviation at x, and $\varepsilon$ is the trade-off coefficient. However, PI treats all improvements as equal, reflecting only the probability of an improvement and not its magnitude, which may cause the optimization to fall into a local optimum.
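As a small numerical illustration of PI, the sketch below evaluates the cumulative-distribution form for a single candidate point. The function names are hypothetical, and the handling of a zero standard deviation is an assumption added here, not something stated in the text.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mu, sigma, f_best, eps=0.0):
    """PI for one candidate with posterior mean mu and standard deviation sigma."""
    if sigma <= 0.0:
        # Assumption: with no uncertainty, the improvement is either certain or impossible.
        return 1.0 if mu - f_best - eps > 0.0 else 0.0
    return normal_cdf((mu - f_best - eps) / sigma)

# A candidate whose mean equals the current best has a 50% chance of improving.
pi_equal = probability_of_improvement(mu=1.0, sigma=1.0, f_best=1.0)
```

Two candidates with the same probability but very different expected gains receive the same PI value, which is the limitation noted above.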

2) Expected improvement (EI):
In view of the shortcomings of PI in practical applications, the acquisition function EI was proposed. EI integrates both the probability of improvement and the amount of improvement [50]; thus, the risk of the optimization falling into local optima is reduced. The formula for EI(x) is:

$$EI(x) = \begin{cases} \big(\mu(x) - f(x^{+})\big)\,\Phi(Z) + \sigma(x)\,\phi(Z), & \sigma(x) > 0 \\ 0, & \sigma(x) = 0 \end{cases}$$

where Z can be described as follows:

$$Z = \frac{\mu(x) - f(x^{+})}{\sigma(x)}$$

and $\phi(\cdot)$ denotes the standard normal probability density function.
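The EI expression can be checked numerically with the following sketch. It assumes the common form without a trade-off coefficient, and the function names are hypothetical.

```python
import math

def normal_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, f_best):
    """EI for one candidate with posterior mean mu and standard deviation sigma."""
    if sigma <= 0.0:
        return 0.0
    z = (mu - f_best) / sigma
    return (mu - f_best) * normal_cdf(z) + sigma * normal_pdf(z)

# With mu equal to the current best, only the uncertainty term contributes.
ei_equal = expected_improvement(mu=1.0, sigma=1.0, f_best=1.0)
```

Unlike PI, the value grows with both the predicted gain and the predictive uncertainty, so a point with a large possible improvement is preferred over one with a marginal but likely improvement.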

3) EI*PI:
To further balance the optimization process, the acquisition function EI*PI was proposed in [41]. The new sampling points in this method are selected as shown below:

$$x_{n+1} = \arg\max_{x}\; EI(x) \cdot PI(x) \tag{18}$$

As can be seen in eq. (18), this acquisition function takes both EI and PI into account as a combined whole. It brings some improvement over EI, but the improvement is still limited. The sampling range for the above three acquisition functions is the whole prediction set. Thus, if the number of prediction sets is large in practice, evaluating these three acquisition functions becomes extremely time consuming, which makes the corresponding optimization process very inefficient.

4) CG-GPUCB:
Thus, in this work, the clustering guided GPUCB (CG-GPUCB) is used as the acquisition function of the BO method for PA design. Based on the idea of cluster analysis, the range of candidate points examined by the acquisition function is reduced by properly classifying all the predicted data sets, which can greatly improve the efficiency of the optimization. The idea of the new acquisition function guided by cluster analysis is described in detail below.
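The traditional GPUCB method is expressed as follows:

$$\alpha(x \mid D_{1:n}) = \mu(x) + \beta^{1/2}\,\sigma(x) \tag{19}$$

If we take eq. (19) as the acquisition function, the next sampling point becomes:

$$x_{n+1} = \arg\max_{x}\; \big[\mu(x) + \beta^{1/2}\,\sigma(x)\big]$$

where $D_{1:n}$ is the training data set, $\mu$ and $\sigma$ represent the posterior mean and posterior standard deviation, respectively, and $\beta^{1/2}$ represents the weighting factor, whose value increases with the number of iterations. In a given iteration, $\beta$ is fixed, so eq. (19) can be rewritten as follows:

$$\beta^{1/2}\,\sigma(x) = -\mu(x) + \alpha(x \mid D_{1:n})$$

Thus, $\sigma(x)$ can be obtained as follows:

$$\sigma(x) = -\frac{1}{\beta^{1/2}}\,\mu(x) + \frac{1}{\beta^{1/2}}\,\alpha(x \mid D_{1:n}) \tag{28}$$

which can be seen as a straight line in the μ-σ plane relating the mean, $\mu(x)$, to the standard deviation, $\sigma(x)$, as shown in Fig. 2. Thus, finding the point that maximizes the acquisition function is equivalent to finding the point that maximizes the σ-axis intercept, $\alpha(x \mid D_{1:n}) / \beta^{1/2}$.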

Fig. 2. Optimization process guided by cluster analysis.
In this paper, all the candidate points for the acquisition function, i.e., all the prediction sets, are mapped to the μ-σ space and classified into k classes using the K-means clustering algorithm. After that, the σ-axis intercept of the center of mass of each cluster is calculated. The cluster whose center of mass has the largest σ-axis intercept is the optimal cluster. Then, the final acquisition point is selected within the optimal cluster:

$$x_{n+1} = \arg\max_{x \in C^{*}}\; \alpha(x \mid D_{1:n}) \tag{29}$$

where $\alpha$ is the acquisition function and $C^{*}$ represents the best cluster. The summarized optimization process is illustrated in Fig. 1. In addition, the optimization procedures of the traditional Bayesian optimization and the proposed algorithm are compared in Table I. In order to further compare the performance of different acquisition functions for the BO method, both CG-GPUCB and EI*PI are applied to the multi-octave PA design in this work.
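The cluster-guided selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the K-means routine is a plain NumPy version with a simple deterministic initialization, the σ-axis intercept of a cluster center is taken as σ + μ/β^{1/2}, and the function names and toy values are hypothetical.

```python
import numpy as np

def nearest_center(points, centers):
    """Index of the closest center for every point."""
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(dist, axis=1)

def kmeans(points, k, n_iter=100):
    """Plain K-means; initial centers are spread along the sorted first coordinate."""
    points = np.asarray(points, dtype=float)
    order = np.argsort(points[:, 0])
    init = order[np.round(np.linspace(0, len(points) - 1, k)).astype(int)]
    centers = points[init].copy()
    for _ in range(n_iter):
        labels = nearest_center(points, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return nearest_center(points, centers), centers

def cg_gpucb_select(mu, sigma, beta, k=3):
    """Pick the next point: best cluster by sigma-axis intercept, then max UCB inside it."""
    labels, centers = kmeans(np.column_stack([mu, sigma]), k)
    intercepts = centers[:, 1] + centers[:, 0] / np.sqrt(beta)
    counts = np.bincount(labels, minlength=k)
    intercepts[counts == 0] = -np.inf            # ignore empty clusters
    best_cluster = int(np.argmax(intercepts))
    members = np.flatnonzero(labels == best_cluster)
    ucb = mu[members] + np.sqrt(beta) * sigma[members]
    return int(members[np.argmax(ucb)]), best_cluster

# Toy posterior values (hypothetical) for nine prediction points in three groups.
mu = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 2.0, 2.1, 2.2])
sigma = np.array([0.1, 0.2, 0.1, 0.5, 0.6, 0.4, 3.0, 3.1, 2.9])
next_index, best_cluster = cg_gpucb_select(mu, sigma, beta=4.0)
```

Only the members of the selected cluster are scored with the upper confidence bound, which is where the reduction in computation comes from when the prediction set is large.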
The loss function used in this paper to measure the accuracy of the prediction is the squared loss function, which measures the difference between the predicted value, Y, and the actual output value, T, of the circuit. The detailed expression is shown below:

$$L(Y, T) = (Y - T)^2$$

A. Initial Design of Multi-Octave PA
In the initial design of the PA, the load-pull technique is employed in ADS to obtain the optimal matching impedance for the target frequency band. Once the bias condition and input power are fixed, a series of output power contours and PAE contours are obtained by the load-pull technique for an output power larger than 40 dBm and a PAE higher than 60% from 0.6 GHz to 3.0 GHz, as shown in Fig. 3. Then, the initial matching networks are formed using stepped-impedance structures. In order to stabilize the PA without degrading the gain across the bandwidth, a stabilizing network needs to be added to the matching circuit. Therefore, a series resistor $R_1$ is added at the input of the amplifier and bypassed by a capacitor $C_1$. Furthermore, in order to achieve RF choking and avoid the high-frequency parasitic effects generated by lumped elements, quarter-wavelength transmission lines are used [51]. Finally, the matching networks and the bias circuit are integrated to form the whole PA circuit. The topology of the PA is shown in Fig. 4. As can be seen from the illustrated topology, this design requires four fewer transmission lines than the work in [41].

B. Optimization Steps for PA Design
Once the initial design of the PA is obtained, the circuit is tested. The result shows that the performance of the initial PA meets the requirement only in part of the target frequency band. Thus, optimization of the initial design is necessary. However, due to the complex structure and the large number of transmission lines of the whole PA circuit, it is extremely time consuming to achieve the desired performance with the ADS optimizer. Therefore, in this work, the proposed clustering guided BO algorithm is employed for PA optimization. The input parameters of BO are the lengths and widths of all transmission lines of the PA, including those of the input and output matching networks and the bias circuits. The fundamental output power and the PAE, as two extremely important metrics of a PA, are both included in the optimization function. Assuming that $Y_1$ and $Y_2$ represent the RMS of the fundamental output power and of the PAE over the operating band, respectively, and that the objective function Y is the RMS of $Y_1$ and $Y_2$, we obtain the following expressions, where $P_{out,i}$ and $PAE_i$ denote the fundamental output power and the PAE at the i-th frequency point and n refers to the number of frequency points in the target operating frequency band:

$$Y_1 = \sqrt{\frac{1}{n}\sum_{i=1}^{n} P_{out,i}^{2}}$$

$$Y_2 = \sqrt{\frac{1}{n}\sum_{i=1}^{n} PAE_i^{2}}$$

$$Y = \sqrt{\frac{Y_1^{2} + Y_2^{2}}{2}} \tag{33}$$

Fig. 5. Flowchart of the optimization procedure for the multi-octave PA design.
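The objective function can be evaluated from swept simulation data as in the sketch below. The function names and the sample values are hypothetical, and the units are assumed to be whatever the simulator reports (for example, dBm and percent), since the text does not state any scaling between the two quantities.

```python
import numpy as np

def rms(values):
    """Root mean square of a sequence of values."""
    values = np.asarray(values, dtype=float)
    return float(np.sqrt(np.mean(values**2)))

def objective(p_out, pae):
    """Combined target: RMS of the band-wise RMS output power and RMS PAE."""
    y1 = rms(p_out)   # RMS of the fundamental output power over the band
    y2 = rms(pae)     # RMS of the PAE over the band
    return float(np.sqrt((y1**2 + y2**2) / 2.0))

# Hypothetical sweep over three frequency points.
y_example = objective(p_out=[40.5, 41.0, 40.2], pae=[62.0, 65.0, 61.0])
```

Because both terms enter symmetrically, a candidate design cannot raise the target by improving the output power alone while the PAE collapses, and vice versa.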
The flowchart of the detailed optimization procedure of the PA is shown in Fig. 5. Firstly, the initial input parameters, i.e., the lengths and widths of the transmission lines of the PA, are obtained by the method described earlier. Then, 50 sets of input parameters, which are obtained through the Latin hypercube sampling (LHS) method in MATLAB [52], are imported into ADS.
The fundamental output power, the PAE, and the RMS of the power and PAE, which are associated with the PA with different transmission lines, are simulated. Next, the obtained training data are sent to MATLAB for behavior prediction. Meanwhile, the data classification and model sampling processes are carried out.
After that, the input data corresponding to the acquisition point are imported into ADS for simulation to obtain the actual output data. Finally, the input data and the actual output data of the acquisition point are imported into MATLAB again as new training data. At this point, one complete iteration has been finished. For the next iteration, the LHS is performed again near the new acquisition point, and the above iterations are repeated until the optimal point is obtained.
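The iteration described above can be summarized by the following simplified loop. It is only a sketch under several assumptions: the `simulate` function is a hypothetical stand-in for one ADS simulation, the candidates are drawn uniformly over a normalized two-parameter space rather than by LHS near the last acquisition point, a plain upper confidence bound is maximized instead of the clustered variant, and the schedule of the weighting factor is arbitrary.

```python
import numpy as np

def kernel(XA, XB, length_scale=0.3):
    """Gaussian kernel matrix between the rows of XA and XB."""
    d2 = np.sum((XA[:, None, :] - XB[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2.0 * length_scale**2))

def posterior(X, y, X_cand, jitter=1e-6):
    """Zero-mean GP posterior mean and standard deviation at X_cand."""
    K = kernel(X, X) + jitter * np.eye(len(X))
    K_s = kernel(X_cand, X)
    mu = K_s @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def simulate(x):
    """Hypothetical stand-in for one circuit simulation (not a PA model)."""
    return float(-np.sum((x - 0.6) ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 2))          # 50 initial parameter sets
y = np.array([simulate(x) for x in X])
initial_best = y.max()

for it in range(20):                              # 20 iterations
    X_cand = rng.uniform(0.0, 1.0, size=(200, 2)) # prediction sets for this iteration
    mu, sigma = posterior(X, y, X_cand)
    beta = 2.0 * np.log(it + 2.0)                 # assumption: slowly increasing weight
    x_next = X_cand[np.argmax(mu + np.sqrt(beta) * sigma)]
    X = np.vstack([X, x_next])                    # new point joins the training data
    y = np.append(y, simulate(x_next))

best_x, best_y = X[np.argmax(y)], y.max()
```

Each pass adds exactly one simulated point to the training data, so the cost of the whole procedure is dominated by the number of circuit simulations rather than by the surrogate model.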
Furthermore, the steps for MATLAB to communicate with ADS are as follows. Firstly, the corresponding ADS path is added in MATLAB to establish the connection. Secondly, a function is written in MATLAB to control the operation of ADS. Finally, MATLAB obtains the corresponding simulation data from ADS through a read function. This process realizes the complete communication between MATLAB and ADS.

IV. FABRICATION AND ANALYSIS OF MULTI-OCTAVE GaN-HEMT PAs
The PAs are implemented on a Rogers RO4350 board and use Cree CGH40010F GaN HEMTs. Both the ADS optimizer and the BO-based optimizers are used for PA optimization, in order to validate the effectiveness of both methods. In order to compare the efficiency of different acquisition functions of the BO method, the EI*PI function is employed as an acquisition function to optimize the PA in addition to the CG-GPUCB function. The number of iterations is set to 20 for both BO methods.
Taking into account the cost and area of the fabricated PAs, the geometric size of the transmission lines needs to be restricted. In addition, because the PA carries a large current at the drain terminal during operation, the transmission lines must be wide enough to ensure the regular operation of the PAs. Therefore, the bounds for the width and the length of the transmission lines are set as follows: 1 mm ≤ W ≤ 10 mm and L ≤ 30 mm, respectively.
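The bounds above define the box from which the initial parameter sets are drawn by LHS. The sketch below shows one way to generate such samples for a single transmission line in NumPy. It is an illustration only: the paper uses MATLAB for this step, the function name is hypothetical, and the 1 mm lower bound on the length is an assumption, since only the upper bound is stated.

```python
import numpy as np

def latin_hypercube(n_samples, lower, upper, seed=0):
    """Latin hypercube samples: one point per stratum in every dimension."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    # Place one random point inside each of the n_samples equal-width strata.
    unit = (np.arange(n_samples)[:, None] + rng.random((n_samples, dim))) / n_samples
    # Shuffle each dimension independently so the strata are paired at random.
    for j in range(dim):
        unit[:, j] = rng.permutation(unit[:, j])
    return lower + unit * (upper - lower)

# One line described by (W, L) in mm; the lower bound on L is assumed.
lower = [1.0, 1.0]
upper = [10.0, 30.0]
samples = latin_hypercube(50, lower, upper)
```

For the full circuit, `lower` and `upper` would simply contain one width entry and one length entry per transmission line.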
In this design, a 10-W GaN HEMT manufactured by Wolfspeed is used, with the gate bias set to -2.7 V and the drain bias set to 28 V. Four different PAs, namely the initially designed PA, the PA optimized by ADS, the PA optimized by BO with the EI*PI method, and the PA optimized by the proposed clustering guided BO, are all fabricated and tested.
In Table II, the detailed geometric sizes of all four fabricated PAs are presented. The photographs of these PAs are reported in Fig. 6. As can be seen from Table II and Fig. 6, the geometries of the optimized PAs all change considerably compared with the initial one, and the two PAs optimized by the BO methods change even more than the PA optimized by ADS.
The test bench used for the PA measurements is shown in Fig. 7. The comparison between simulated and measured results is reported in Fig. 8. Compared with the simulation results, the effective bandwidth of each fabricated PA is reduced by 0.1 GHz to 0.2 GHz.
The measurement results of all four fabricated PAs are compared in Fig. 9. From the obtained results, we can see that the performance of the optimized PAs is significantly improved, not only in terms of bandwidth but also in terms of output power, PAE, and gain, when compared with the initially designed PA. The PA optimized by the ADS annealing optimizer achieves an output power greater than 40 dBm, a PAE greater than 60%, and a gain greater than 10 dB from 0.9 GHz to 2.4 GHz. The PA designed by BO with EI*PI as the acquisition function achieves an output power greater than 40 dBm, a PAE greater than 60%, and a gain greater than 10 dB from 0.8 GHz to 2.3 GHz. The PA designed using the proposed method shows the best performance, achieving an output power greater than 40 dBm, a PAE greater than 60%, and a gain greater than 10 dB from 0.6 GHz to 2.8 GHz.
In addition, a more detailed comparison is given in Table III. According to the achieved results, the proposed clustering guided BO method provides the best design, since the effective bandwidth is considerably extended compared with the other two optimizers, while the power, PAE, and gain are all maintained at a high level. As can be seen from this table, with the same optimization target, the bandwidth obtained with the proposed optimization method reaches 2.2 GHz, while it is 1.5 GHz for the ADS optimizer and 1.5 GHz for the technique in [41]. The fractional bandwidth of the proposed method reaches 129%, while it is 91% for the ADS optimizer and 97% for the method in [41].
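The quoted bandwidth figures follow directly from the measured band edges reported above, taking the fractional bandwidth as the absolute bandwidth divided by the center frequency. The short check below reproduces them; the function name and dictionary keys are hypothetical labels.

```python
def fractional_bandwidth(f_low_ghz, f_high_ghz):
    """Fractional bandwidth in percent, relative to the band center frequency."""
    center = (f_low_ghz + f_high_ghz) / 2.0
    return 100.0 * (f_high_ghz - f_low_ghz) / center

# Measured band edges in GHz, as reported for the three optimized PAs.
bands = {
    "proposed CG-GPUCB": (0.6, 2.8),
    "ADS optimizer": (0.9, 2.4),
    "BO with EI*PI": (0.8, 2.3),
}
fbw = {name: fractional_bandwidth(lo, hi) for name, (lo, hi) in bands.items()}
```

The two reference designs share the same 1.5 GHz absolute bandwidth but differ in fractional bandwidth because their center frequencies differ.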
The normalized errors of the optimization with the two different BO algorithms are shown in Fig. 10. It can be observed that, for the same number of iterations, the BO algorithm with the CG-GPUCB function shows a lower normalized error than the BO algorithm with the EI*PI function, which confirms that the proposed method is more efficient than BO with the EI*PI function.
Finally, a comparison between the performance achieved in this work and that reported in recently published articles is presented. The results of this comparison are shown in Table IV. As can be seen from this table, the proposed method provides the widest bandwidth among the compared works, while maintaining the output power and efficiency at a high level.

V. CONCLUSION
In this paper, a new clustering guided BO algorithm is employed for the design of a multi-octave PA based on a 10-W GaN HEMT. Compared with the PA optimized by an existing commercial optimizer, the PA designed by the proposed technique achieves not only greatly improved performance but also a greatly reduced optimization time. In addition, compared with the EI*PI-based BO method, the proposed CG-GPUCB-based BO method yields a PA with a much wider effective bandwidth in much less optimization time, which verifies the effectiveness of the proposed methodology. To further demonstrate the validity of the proposed design methodology, the achieved results have been compared with the performance of other state-of-the-art PAs, highlighting the obtained improvement in terms of bandwidth.