Differential Evolution Based Manifold Gaussian Process Machine Learning for Microwave Filter’s Parameter Extraction

Gaussian process (GP) regression is a supervised machine learning (ML) method that has developed rapidly in recent years and is widely used to build surrogate models in electromagnetics. However, when processing high-dimensional data it suffers from large sample demand, high computational complexity and low accuracy. To address these problems, this study proposes a manifold Gaussian process (MGP) ML method based on the differential evolution (DE) algorithm. In the proposed method, the DE algorithm optimizes the dimension reduction parameters, and the model is then built with the optimized parameters. In contrast to the traditional GP model, Isomap-based dimensionality reduction is adopted to simplify the mapping relationship between data pairs, which makes the model better suited to problems with insufficient samples and high data dimension. The proposed DE-based MGP (DE-MGP) is applied to the extraction of the coupling coefficients of a fourth-order and a sixth-order coupling filter. The test error of the fourth-order surrogate model is reduced to 0.84%, and that of the sixth-order surrogate model to 1.53%, which demonstrates the effectiveness of the proposed method.


I. INTRODUCTION
As an important part of wireless communication, the microwave filter has a long history. As early as the 1930s, Mason and Sykes used ABCD parameters to derive a large number of useful filter image impedance phase and attenuation functions [1]. In the 1950s, for the design of synchronously tuned cascade resonant filters, S. B. Cohn designed a direct-coupled cavity filter with transmission zeros at infinity based on the low-pass filter prototype [2]. In the 1960s, J. D. Rhodes eliminated the redundant coupling parameters by rotating the coupling coefficient matrix of the transformation filter, which simplified the design structure of the filter [3]. In the 1970s, Atia and Williams developed a comprehensive general theory that can obtain the topological structure of coupled resonance filters of less than the fourth order by an analytical method [4]. In the 1980s, Cameron developed this theory further, generalizing the original Chebyshev function into a generalized Chebyshev function and applying it to the synthesis of cross-coupled filters. In Cameron's theory, the synthesis work reduces to calculating the coupling matrix once the low-pass prototype filter function of the filter is determined [5], [6]. In the 1990s, Cameron proposed a similarity transformation method to eliminate the coupling coefficient matrix [7]. In the early 21st century, Amari proposed the circuit optimization method, which is not limited by the coupling topology of the filter and can easily obtain the required coupling matrix [8].
The associate editor coordinating the review of this manuscript and approving it for publication was Yuhao Liu.
In recent years, there have been many new design methods for microwave filters. In 2013, Tian Yubo et al. realized the optimization of the frequency characteristics of the electric band-gap structure of bow unit through particle swarm optimization algorithm [9]. In 2014, Otani H et al. carried out high-speed automatic optimization design of the filter through genetic algorithm, and obtained better design results [10]. In 2019, in order to provide a simple and effective solution to the complex tuning problem of microwave filter, J. C. Melgarejo et al. proposed the space mapping technology of microwave filter, and verified the method by experimentation of adjusting the six-pole induction waveguide filter [11].
To solve these problems, many researchers are now introducing machine learning (ML) techniques into the design of microwave devices, and surrogate model technology has gradually come into view. By establishing a surrogate model for a microwave device, we can realize its rapid simulation and optimization [12]. At present, functional surrogate modeling techniques for microwave devices mainly include the artificial neural network (ANN) [13], support vector machine (SVM) [14], polynomial regression (PR) [15] and kernel extreme learning machine (KELM) [16]. However, with the study of Bayesian neural networks, a new training model, the Gaussian process (GP), has gradually attracted attention [17]. This model has advantages such as small sample demand, few training parameters and flexible acquisition, so it is well suited to high-dimensional nonlinear problems. At present, the GP model has been widely applied in the field of electromagnetics. From 2009 to 2010, Villiers successfully modeled the slot antenna fed by an ultra-wideband (UWB) dual-frequency coplanar waveguide (CPW) using a GP model, which indicated that the GP could be used as an alternative to full-wave analysis in the design of microwave devices [18], [19]. In 2012, Jacobs successfully modeled a microwave filter using a GP with a non-standard kernel function [20]. In 2013, Jacobs proposed a two-stage network modeling method for antenna input characteristics [21] and compared GP modeling of the antenna input characteristics in single mode and ensemble mode; the results showed that the ensemble-mode GP model had smaller test errors [22]. In 2015, Jacobs proposed the use of GP to model the resonant frequency of a dual-frequency microstrip antenna, which extended the application range of the Gaussian process in electromagnetics [23].
In the same year, Jacobs proposed a second-stage network modeling method for GP, which was successfully applied to the design of microwave filters [24]. In 2016, Vargas Cardona et al. proposed a multi-output GP to enhance diffusion-tensor field resolution [25]. In 2018, Chen Yi et al. used GP regression to obtain prior knowledge, and then used knowledge-based neural network (KBNN) to model electromagnetic problems. This method greatly reduced the calculation time and improved the computational efficiency of modeling [26]. In 2019, Xiao-Hong Fan et al. proposed the design optimization of GP fast microwave antenna based on particle swarm optimization algorithm, and successfully optimized the substitute model as the fitness function of intelligent optimization algorithm and obtained the design parameters we need [27]. In the same year, Xiao-Hong Fan modeled and optimized the cone-core horn antenna through a coarse grid GP [28]. In 2020, Jing Gao et al. proposed a semisupervised learning GP, which combines unlabeled samples to improve the accuracy of the GP model and reduce the number of labeled training samples required [29].
Thanks to the unremitting efforts of our predecessors, we are also deeply inspired when we do relevant research. In order to further solve the problems of insufficient samples, high dimensions and high computing cost in electromagnetic optimization design, this paper proposes a differential evolution (DE) based manifold GP (DE-MGP) algorithm, and applies it to the inverse model parameter extraction of coupled microwave filters [30]. Compared with the traditional GP model, the DE-MGP has better adaptability to high-dimensional data in the case of the same amount of data. The dimensionality reduction operation is carried out on the training data, i.e. the spatial complexity of the data is reduced, which reduces the dependence of the model on the amount of data. At the same time, it also greatly reduces the calculation amount of the model operation. The main contribution details of this paper are as follows: 1) a differential evolution based manifold Gaussian process (DE-MGP) is established. Firstly, different neighborhoods are divided in the data space, and the geodesic distance between sample points in each neighborhood is calculated by the shortest path algorithm [30]. Secondly, multi-dimensional scaling (MDS) [32] is used to obtain the projection of the training data in the low-dimensional space, so as to obtain the isometric mapping (Isomap) of the training data we need in the low-dimensional space. Then the processed data are used as input of the GP to calculate the test error. Finally, the error is taken as the fitness function of the DE algorithm, and the dimensionality reduction parameters in the Isomap are optimized. This means that DE algorithm is used to train dimensionality reduction parameters. After continuous iteration, when the optimization algorithm converges, we can get a set of optimal dimensionality reduction parameters. By applying this set of optimal parameters in the Isomap, our training model can be constructed. 
Compared with simply combining manifold learning and GP, the advantage of this method is that the DE algorithm trains not only the GP model but also the manifold parameters, which greatly improves training efficiency and generalization ability. 2) In order to verify the training effect of the proposed algorithm and further improve our theory, we use the proposed DE-MGP algorithm to extract the inverse parameters of a fourth-order coupled filter and a sixth-order coupled filter. In the experiments, we take the S-parameters of the filter as the system input and the coupling coefficient matrix as the system output to establish the model. For both filters, we compare the parameter extraction results of the ordinary GP model and the DE-MGP model with 300 and 600 training samples, respectively. The DE-MGP fits both filters better with the same number of samples, and its test error is much smaller than that of the ordinary GP model, which also shows that the DE-MGP adapts better to high-dimensional data and that its training accuracy depends less on the number of training samples. These characteristics are valuable for problems in electromagnetics, which are characterized by heavy computation, insufficient samples and a high cost of obtaining samples.
The rest of this paper is organized as follows. Section II introduces the algorithmic theories used in our study. Section III introduces the proposed training model. Section IV presents experiments with two coupled filters and different numbers of samples, and illustrates the advantages of the proposed model by comparing it with the original GP model and four classical models. Section V summarizes this research and outlines future work.
VOLUME 8, 2020

II. RELATED WORKS
In this section, we will introduce some related theories, including differential evolution algorithm, Gaussian process and manifold learning theory.

A. DIFFERENTIAL EVOLUTION (DE) ALGORITHM
DE is a kind of intelligent optimization algorithm which can perform in the way of parallel search. It uses floating-point vector encoding. The optimal fitness value in solution space is searched through the continuous evolution of the population. In the non-linear and non-differentiable continuous space, the DE algorithm has a good ability of optimization.
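In each iteration of DE, the population goes through three steps: variation, crossover and selection. Each individual $x_i$ in the initial population $\{x_1, \dots, x_n\}$ evolves towards the optimal solution, and each individual searches in the $m$-dimensional space. In the standard DE/rand/1/bin form, which is consistent with the symbols defined below, this process can be described by the following formulas.
Initialization:
$$x_{ij}(0) = x_{ij}^{L} + \mathrm{rand}(0,1)\,\bigl(x_{ij}^{U} - x_{ij}^{L}\bigr) \tag{1}$$
Variation:
$$v_i(g) = x_{r1}(g) + F\,\bigl(x_{r2}(g) - x_{r3}(g)\bigr) \tag{2}$$
Crossover:
$$u_{ij}(g) = \begin{cases} v_{ij}(g), & \mathrm{rand}(0,1) \le CR \ \text{or} \ j = j_{rand} \\ x_{ij}(g), & \text{otherwise} \end{cases} \tag{3}$$
Selection:
$$x_i(g+1) = \begin{cases} u_i(g), & f(u_i(g)) \le f(x_i(g)) \\ x_i(g), & \text{otherwise} \end{cases} \tag{4}$$
where $x_{ij}$ represents the $j$th element of the $i$th individual, and $x_{ij}^{L}$ and $x_{ij}^{U}$ represent its lower and upper limits, respectively. $x_{r1}$, $x_{r2}$ and $x_{r3}$ are three different individuals randomly selected from the current population, and $F$ is the scaling factor. $u_{ij}(g)$ represents the crossover result of generation $g$, $CR$ is the crossover rate, and $j_{rand}$ is a random integer from 1 to $m$, which ensures that at least one element of the crossed individual comes from the mutation result $v_i$. $f$ denotes the fitness function to be minimized, and $x_i(g+1)$ is the individual that survives into the next generation.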
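The three steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the common DE/rand/1/bin variant, and the function name `differential_evolution`, the default values of `pop_size`, `F`, `CR` and `generations`, the clipping of mutants to the search box, and the `sphere` test function are all choices made here for illustration.

```python
import random


def differential_evolution(fitness, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=200, seed=0):
    """Minimise `fitness` over the box `bounds` with DE/rand/1/bin (sketch)."""
    rng = random.Random(seed)
    m = len(bounds)
    # Initialization: sample each element uniformly between its limits.
    pop = [[lo + rng.random() * (hi - lo) for lo, hi in bounds]
           for _ in range(pop_size)]
    fit = [fitness(x) for x in pop]
    for _ in range(generations):
        new_pop, new_fit = list(pop), list(fit)
        for i in range(pop_size):
            # Variation: three distinct individuals, all different from i.
            candidates = [k for k in range(pop_size) if k != i]
            r1, r2, r3 = rng.sample(candidates, 3)
            v = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(m)]
            # Assumption: clip the mutant back into the search box.
            v = [min(max(v[j], bounds[j][0]), bounds[j][1]) for j in range(m)]
            # Crossover: j_rand guarantees one element from the mutant.
            j_rand = rng.randrange(m)
            u = [v[j] if (rng.random() <= CR or j == j_rand) else pop[i][j]
                 for j in range(m)]
            # Selection: keep the trial vector only if it is not worse.
            fu = fitness(u)
            if fu <= fit[i]:
                new_pop[i], new_fit[i] = u, fu
        pop, fit = new_pop, new_fit
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]


def sphere(x):
    """Toy fitness function with its minimum at the origin."""
    return sum(xi * xi for xi in x)


best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

Because selection is greedy, the best fitness in the population never gets worse from one generation to the next, which is why the convergence curves of such a search are monotone.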

B. GAUSSIAN PROCESS (GP) MODELING
A GP is a stochastic process composed of an infinite number of random variables, any finite subset of which follows a joint Gaussian distribution. It is specified by its mean function and covariance function, which in the standard notation are
$$m(x) = \mathbb{E}[f(x)]$$
$$k(x, x') = \mathbb{E}\bigl[(f(x) - m(x))\,(f(x') - m(x'))\bigr]$$
where $m(x)$ is the mean function of the random variable and $k(x, x')$ is the covariance function. Therefore, we define the GP model as
$$f(x) \sim \mathcal{GP}\bigl(m(x),\, k(x, x')\bigr)$$
When system noise is considered, the training model is represented as
$$y = f(x) + \varepsilon$$
where $\varepsilon$ is additive noise following a normal distribution with mean 0 and variance $\sigma_n^2$, i.e. $\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$, $x$ is the input vector, and $y$ is the observed value polluted by noise. The prior distribution of $y$ is
$$y \sim \mathcal{N}\bigl(0,\, K + \sigma_n^2 I_n\bigr)$$
where $K = K(X, X)$ is an $n \times n$ symmetric positive definite covariance matrix whose element $K_{ij} = k(x_i, x_j)$ measures the correlation between $x_i$ and $x_j$. The joint Gaussian prior distribution of the output $y$ of the $n$ training samples and the output $f_*$ of the $n_*$ test samples is
$$\begin{bmatrix} y \\ f_* \end{bmatrix} \sim \mathcal{N}\left(0,\, \begin{bmatrix} K(X, X) + \sigma_n^2 I_n & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right)$$
where $K(X, X_*)$ is the $n \times n_*$ covariance matrix between training and test inputs, $K(X_*, X) = K(X, X_*)^{T}$, and $K(X_*, X_*)$ is the $n_* \times n_*$ covariance matrix of the test samples. The covariance function of the Gaussian process model must satisfy the Mercer condition, that is, any set of points must produce a non-negative definite covariance matrix. Usually the squared exponential (SE) function is used as the covariance function:
$$k(x, x') = \sigma_f^2 \exp\Bigl(-\tfrac{1}{2}\,(x - x')^{T} M^{-2} (x - x')\Bigr)$$
where $M = \mathrm{diag}(l)$, $l$ contains the positive characteristic length-scale parameters corresponding to the elements of the input variable, and $\sigma_f^2$ is the signal variance. The hyperparameters of the Gaussian process determine the properties of the model and can be obtained by maximum likelihood estimation on the training data. The negative log-likelihood function has the form
$$-\log p(y \mid X) = \tfrac{1}{2}\, y^{T} (K + \sigma_n^2 I_n)^{-1} y + \tfrac{1}{2} \log \bigl|K + \sigma_n^2 I_n\bigr| + \tfrac{n}{2} \log 2\pi$$
According to the Bayesian principle, given the new input $x_*$, the training input $X$ and the training output $y$, the predictive posterior distribution of the test output follows from conditioning the joint prior above:
$$f_* \mid X, y, x_* \sim \mathcal{N}(\hat{m},\, \Sigma)$$
Its mean and covariance are
$$\hat{m} = K(X_*, X)\,\bigl[K + \sigma_n^2 I_n\bigr]^{-1} y$$
$$\Sigma = K(X_*, X_*) - K(X_*, X)\,\bigl[K + \sigma_n^2 I_n\bigr]^{-1} K(X, X_*)$$
$\hat{m}$ contains the most likely values of the test output for the test input vectors in $x_*$, and the corresponding prediction variance is given by the covariance matrix $\Sigma$.
The predicted mean and variance of the GP model describe a Gaussian distribution that the predicted output may follow. If the predicted mean is regarded as the output of a general nonlinear fitting tool, the predicted variance can be regarded as a measure of the uncertainty of that mean. If a test input vector is close to a training input vector in the training set, a small prediction variance is obtained and the predicted mean is close to the actual output. In other words, the size of the prediction variance reflects the accuracy of the GP model at that point.
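The relation between distance to the training data and prediction variance can be checked numerically. The following is a minimal sketch of GP regression with a squared exponential kernel for scalar inputs, written with the Python standard library only. The helper names (`se_kernel`, `cholesky`, `gp_predict`), the fixed hyperparameters (`length`, `sigma_f`, noise standard deviation `noise`) and the sine test data are assumptions made here for illustration; hyperparameter fitting by maximum likelihood is deliberately left out.

```python
import math


def se_kernel(a, b, length=1.0, sigma_f=1.0):
    """Squared exponential covariance for scalar inputs."""
    return sigma_f ** 2 * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper_t(L, z):
    """Solve L^T x = z by back substitution."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - s) / L[i][i]
    return x


def gp_predict(X, y, x_star, noise=1e-2, length=1.0, sigma_f=1.0):
    """Posterior mean and variance at one test input (zero prior mean)."""
    n = len(X)
    # K + sigma_n^2 I, where `noise` is the noise standard deviation.
    K = [[se_kernel(X[i], X[j], length, sigma_f)
          + (noise ** 2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))  # (K + sigma_n^2 I)^-1 y
    k_star = [se_kernel(xi, x_star, length, sigma_f) for xi in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = se_kernel(x_star, x_star, length, sigma_f) - sum(t * t for t in v)
    return mean, var


X_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [math.sin(x) for x in X_train]
mean_near, var_near = gp_predict(X_train, y_train, 2.0)  # on a training point
mean_far, var_far = gp_predict(X_train, y_train, 8.0)    # far from the data
```

At the training point the variance collapses to roughly the noise level, while far from the data it returns to the prior variance `sigma_f ** 2`, which is the behaviour the paragraph above describes.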

C. ISOMETRIC MAPPING (Isomap)
Manifold learning is a dimensionality reduction method that draws on the concept of topological manifolds. A 'manifold' is a space that is locally homeomorphic to Euclidean space. In other words, it has the properties of Euclidean space locally, so the Euclidean distance can be used to calculate distances within a small neighborhood.
Isomap is a classical manifold learning algorithm based on multi-dimensional scaling (MDS). When a low-dimensional manifold is embedded in a high-dimensional space, directly calculating the Euclidean distance in the high-dimensional space is misleading, because the straight-line path in the high-dimensional space does not lie on the low-dimensional manifold. In this case, the distance between two points on the low-dimensional embedded manifold has to be expressed by the 'geodesic' distance. The detailed process of isometric mapping is shown in Algorithm 1.
When using the Isomap algorithm to process data, it is very important to select a suitable neighbor parameter. If the selected neighborhood is too large, the algorithm will treat distant points as near neighbors, which leads to a 'short circuit'; if the neighborhood is too small, many points on the manifold surface may have no neighbor relationship with other points, which leads to an 'open circuit'.
Algorithm 1 Isomap
1: for each sample point x_i do
2: Determine the k-nearest neighbors of x_i;
3: Set the distance between x_i and its k-nearest neighbor samples to the Euclidean distance, and the distance between x_i and all other points to infinity;
4: end for
5: Calculate the distance dist(x_i, x_j) between any two sample points using a shortest path algorithm (Dijkstra, Floyd);
6: Use dist(x_i, x_j) as the input of the MDS algorithm;
Output: The projection of the samples in the low-dimensional space, i.e. the output of the MDS algorithm, recorded as Z.
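The effect of the neighbor parameter can be illustrated with the graph-distance part of Isomap alone. The sketch below builds a k-nearest-neighbor graph and runs the Floyd-Warshall shortest path algorithm; it stops before the MDS step. The function name `knn_geodesic`, the symmetrisation of the neighbor graph and the half-circle test data are assumptions made here for illustration.

```python
import math


def knn_geodesic(points, k):
    """Approximate geodesic distances over a k-nearest-neighbor graph."""
    n = len(points)
    euclid = [[math.dist(p, q) for q in points] for p in points]
    # Start with "infinity" everywhere except the diagonal.
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: euclid[i][j])[:k]
        for j in nearest:
            # Assumption: treat the neighbor relation as symmetric.
            dist[i][j] = dist[j][i] = euclid[i][j]
    # Floyd-Warshall: relax every pair through every intermediate point.
    for via in range(n):
        for i in range(n):
            for j in range(n):
                d = dist[i][via] + dist[via][j]
                if d < dist[i][j]:
                    dist[i][j] = d
    return dist


# Eleven points on a half circle of radius 1 (a 1-D manifold in 2-D space).
points = [(math.cos(math.pi * i / 10), math.sin(math.pi * i / 10))
          for i in range(11)]

geo = knn_geodesic(points, k=2)     # small neighborhood: follows the arc
short = knn_geodesic(points, k=10)  # every point is a neighbor: short circuit

arc_estimate = geo[0][-1]
short_circuit = short[0][-1]
```

With `k=2` the distance between the two end points follows the arc and comes out at about 3.11, close to the true arc length of pi. With `k=10` the end points become direct neighbors and the distance falls back to the Euclidean chord length of 2, which is the 'short circuit' described above. A neighborhood that is too small can instead leave entries at infinity, the 'open circuit' case.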

III. PROPOSED ALGORITHM MODEL
In order to obtain the optimal data dimensionality reduction parameters scientifically, improve the generalization ability of the manifold GP model, and reduce the computational complexity and time consumption, a differential evolution based manifold Gaussian process (DE-MGP) is proposed in this study. A new manifold parameter training method is provided for the proposed surrogate model through DE algorithm. We take the manifold parameters as the optimization object, and the output of GP model as the optimization guidance of the DE algorithm. With the continuous iteration of DE, we train the dimension reduction parameters of manifold learning method while training the GP model. Firstly, we use the method of Isomap to process the collected initial data, and obtain a set of dimensional-reduction data with less information loss, and then use it to train GP. The mean absolute percentage error (MAPE) of the model is calculated and applied to the DE algorithm as the fitness function. The DE algorithm is used for iterative optimization, and the dimensionality reduction parameters can be solved after reaching the output conditions. Finally, the optimal parameters obtained by the algorithm are fed back to the Isomap, so that our optimal parameter model is established. The main algorithm flow is shown in algorithm 2, and its flow chart is shown in figure 1.
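The coupling between the DE search and the model error can be sketched as a fitness wrapper. DE works on real-valued vectors, while the neighbor parameter and the target dimension are integers, so the sketch below rounds and clips each individual before evaluating it and caches results so that each parameter pair is evaluated only once. Everything here is hypothetical scaffolding: `decode`, `make_fitness`, the bounds, and in particular `toy_mape`, which merely stands in for "run Isomap with these parameters, train the GP, and return the test MAPE".

```python
def decode(individual, k_bounds, d_bounds):
    """Map a real-valued DE individual to integer parameters (K, d')."""
    k = min(max(int(round(individual[0])), k_bounds[0]), k_bounds[1])
    d = min(max(int(round(individual[1])), d_bounds[0]), d_bounds[1])
    return k, d


def make_fitness(evaluate_mape, k_bounds, d_bounds):
    """Wrap a (K, d') -> MAPE evaluator as a DE fitness function with a cache."""
    cache = {}

    def fitness(individual):
        key = decode(individual, k_bounds, d_bounds)
        if key not in cache:
            # The expensive step: dimension reduction plus GP training.
            cache[key] = evaluate_mape(*key)
        return cache[key]

    return fitness, cache


def toy_mape(k, d):
    """Hypothetical error surface, NOT measured data; minimum at (7, 7)."""
    return 1.0 + 0.1 * abs(k - 7) + 0.2 * abs(d - 7)


fitness, cache = make_fitness(toy_mape, k_bounds=(2, 30), d_bounds=(1, 35))
first = fitness([6.6, 7.4])   # decodes to (7, 7)
second = fitness([7.2, 6.9])  # also decodes to (7, 7), served from the cache
```

Caching matters in this setting because many distinct real-valued individuals decode to the same integer pair, and each uncached evaluation requires retraining the surrogate model.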
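In Algorithm 2, there are several key points that need to be explained in detail: 1) When using the DE algorithm, we choose the mean absolute percentage error (MAPE) as the fitness function. In its standard form it is calculated as
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{\mathrm{observed}_t - \mathrm{predicted}_t}{\mathrm{observed}_t} \right|$$
where $n$ is the number of samples, $\mathrm{observed}_t$ is the label of the $t$th sample, and $\mathrm{predicted}_t$ is the corresponding output of the model.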
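As a small worked example of this fitness measure, the following sketch computes the MAPE in its standard form. The function name `mape` and the sample values are made up for illustration.

```python
def mape(observed, predicted):
    """Mean absolute percentage error in percent; labels must be non-zero."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Relative errors of 10%, 10% and 0% average to about 6.67%.
example = mape([1.0, 2.0, 4.0], [1.1, 1.8, 4.0])
```

Since each term is divided by the label, the measure is undefined for zero labels, which fits the choice of taking only the non-zero coupling coefficients as model outputs.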

4: According to the method in formula (3), calculate the individual obtained by crossing the ith individual with its mutant individual.
5: Select according to equation (4) to get the next-generation individuals.
6: End for
7: Generation = Generation + 1;
8: End while
9: Calculate the fitness value of the best individual: Bestfitness = fitness(pop(Generation)). pop(Generation,1) is the optimal neighbor parameter, denoted as K, and pop(Generation,2) is the optimal low-dimensional space, denoted as d'. The MAPE of the Gaussian process prediction model is used as the fitness function.
10: For i = 1, 2, ..., (m + n) do
11: Take K samples as a neighborhood space and determine the neighborhood of each sample point.
12: The distance between x_i and the points in its neighborhood is set to the Euclidean distance, and the distance to other points is set to infinity.
13: End for
14: The shortest path algorithm is used to calculate the distance dist(x_i, x_j) between any two sample points.

IV. CASES STUDY
In modeling and optimizing microwave components, the training data pose several problems: they have too many features and are time-consuming to obtain. It is therefore important to reduce the feature dimension and complexity of the samples. In this part, the proposed DE-MGP is used to establish the inverse model that maps the S-parameters of a microwave filter to its coupling coefficient matrix: the given S-parameters are the input, and the model predicts the corresponding coupling coefficients of the filter, so that the required microwave filter can be designed. To test the regression ability of the proposed model on small-sample, high-dimensional problems, we use the DE-MGP to extract the parameters of two filters with 300 and 600 groups of training samples respectively, and compare the results with those of the ordinary GP model. In our experiments, the proposed method already yields a surrogate model with high accuracy when only 300 samples are available. When the number of training samples is increased to 600, the model accuracy fully meets our design requirements.

A. THE FOURTH-ORDER COUPLING FILTER
In this case, the proposed DE-MGP is used to establish the parameter extraction model of a fourth-order coupling filter with a central frequency of 4 GHz and a bandwidth of 40 MHz. The initial coupling coefficient matrix is taken from [33]. The four different non-zero terms of the initial coupling coefficient matrix are randomly perturbed within a tolerance of ±0.3 to generate groups of coupling coefficients, and the S-parameters corresponding to each group are then calculated, which yields the training data needed to establish the dimension reduction fitting model. The $S_{11}$ curve is sampled at 35 frequency points, and the 35 samples form the 35 dimensions of the training input:
$$X = [\,\mathrm{dB}(S_{11}(f_1)) \ \ \mathrm{dB}(S_{11}(f_2)) \ \ \dots \ \ \mathrm{dB}(S_{11}(f_{35}))\,] \tag{17}$$
According to previous experiments, these 35 frequency points express the characteristics of the whole $S_{11}$ curve well, which gives the highest surrogate model accuracy with the fewest frequency samples. The four non-zero parameters of the coupling matrix M are taken as the model outputs. The dimension reduction parameters of the established MGP are (d', K), where d' is the target dimension of the manifold dimension reduction and K is the number of neighborhood individuals of each sample. We use the DE optimization algorithm to build the DE-MGP with 300 samples and 600 samples respectively, and compare it with the original GP model. The evaluation of the experimental results is based on an ideal coupling coefficient matrix. In this way, it becomes clear that with the same sample size the training model proposed in this study has obvious advantages in high-dimensional parameter extraction. In the following, we present the S-parameter extraction experiments for the ideal coupling coefficient matrix with the proposed DE-MGP model and the GP model, using 300 and 600 samples.
The S-parameters are calculated with the method of previous studies and related electromagnetic simulation software [34].
An ideal coupling coefficient matrix is used as the test case. Equations (20) and (21) are the coupling coefficient matrices predicted by the DE-MGP with 300 and 600 training samples, respectively. The results obtained by the original GP are shown in equations (22) and (23).
As shown in Table 1, in the experiment with 300 groups of samples, the DE algorithm returns the dimensionality reduction parameters (d', K) = (4, 4): the dimension of the low-dimensional space is 4 and the number of samples in each neighborhood is 4. The iterative convergence curve of the optimization algorithm is shown in Figure 2a, where the abscissa is the number of iterations and the ordinate is the error at each iteration. The error of the first iteration is 2.93%, and the error at convergence is 2.75%. That is, the MAPE of the proposed model with the optimal dimensionality reduction parameters is 2.75%, a relative reduction of 6.14% from the first iteration. The error rate of the original GP model is 6.56%, more than twice that of the proposed model. In the experiment with 600 groups of samples, the dimension of the low-dimensional space returned by the DE algorithm is 7, and the number of individuals in each neighborhood is 7. The convergence curve for this case is shown in Figure 2b, with the same axes. The error of the first iteration is 1.41%, and the error at convergence is about 0.84%, a relative reduction of 40.43%. The prediction error of the original GP model is 5.24%, more than five times that of the proposed algorithm. These two experiments show that, with the same sample sets, the proposed DE-MGP effectively improves the regression performance of the inverse model of the filter. Figure 3 and Figure 4 show the S-parameter fitting curves corresponding to the coupling coefficient matrices predicted by the DE-MGP model with 300 and 600 groups of samples, respectively. Their abscissa is the frequency, ranging from 3900 MHz to 4100 MHz, and the ordinate is the S-parameter in dB. The blue curve represents the S-parameter of the ideal coupling coefficient matrix, and the red circles represent the S-parameter of the predicted coupling coefficient matrix. The figures show that the proposed DE-MGP fits well in both S-parameter extraction experiments.
It can be seen from Table 1, Figure 3 and Figure 4 that increasing the sample size improves the fitting accuracy of the algorithm. With the same sample size, the precision of the proposed DE-MGP model is clearly better than that of the original GP model: the MAPE of the MGP after parameter selection is less than half that of the original GP. Comparing the DE-MGP trained on 300 groups of samples with the original GP trained on 600 groups (2.75% for the DE-MGP with 300 samples, 5.24% for the GP with 600 samples) shows that the DE-MGP depends much less on the number of samples. In other words, the proposed DE-MGP model achieves higher accuracy with fewer training samples, which is of great significance for establishing surrogate models in the field of electromagnetics.
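The relative figures quoted for the fourth-order filter follow directly from the reported error rates. The short check below recomputes them; the helper name `relative_reduction` is introduced here for illustration, and the inputs are the rounded percentages stated in the text.

```python
def relative_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# DE convergence: first-iteration error versus converged error (MAPE, %).
reduction_300 = relative_reduction(2.93, 2.75)
reduction_600 = relative_reduction(1.41, 0.84)

# Original GP error divided by DE-MGP error for the same sample size.
ratio_300 = 6.56 / 2.75
ratio_600 = 5.24 / 0.84
```

The two reductions come out at about 6.14% and 40.43%, and the error ratios at about 2.4 and 6.2, which matches "more than twice" and "more than five times" in the text.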

B. THE SIXTH-ORDER COUPLING FILTER
In order to make our conclusion more convincing, we carry out further parameter extraction experiments in the same environment with a sixth-order coupling filter. The central frequency of the filter is 4 GHz, the bandwidth is 36 MHz, and the initial coupling coefficient matrix [33] is shown in equation (24).
Using the same experimental method as for the fourth-order coupling filter, we randomly generate groups of coupling coefficients from the seven different non-zero terms of the initial matrix within the tolerance of ±0.3, and then calculate their corresponding S-parameters. We take the S-parameters as the training input and the seven different non-zero coupling coefficients as the sample labels. In the same way as above, we sample the curve at 35 frequency points, which gives the 35-dimensional system input (25). In this case, we also use an ideal coupling coefficient matrix to test our model, and extract the S-parameters of the sixth-order coupling filter with 300 and 600 groups of samples respectively.
The ideal coupling coefficient matrix is given in (27). The coupling coefficient matrices predicted by the DE-MGP with 300 and 600 training samples are given in (28) and (29), and those predicted by the original GP with 300 and 600 training samples are given in (30) and (31), respectively.
As shown in Table 2, in the experiment with 300 groups of samples, the optimization results of the DE algorithm are as follows: the optimal dimension of the low-dimensional space is 11, and the optimal number of neighborhood individuals is 17. The convergence curve of the DE algorithm for this case is shown in Figure 5a, where the abscissa is the number of iterations and the ordinate is the error at each iteration. The error of the first iteration is 2.67%, and the error at convergence is 2.22%, which means that the MAPE obtained by the proposed model with 300 training samples is 2.22%, a relative reduction of 16.85% from the first iteration. The error rate of the original GP model is 5.04%, more than twice that of the proposed model. In the experiment with 600 groups of samples, the optimal dimension of the low-dimensional space is 7, and the optimal number of neighborhood individuals is 30. The convergence curve of the DE algorithm is shown in Figure 5b, where the abscissa is the number of iterations and the ordinate is the error at each iteration. The error of the first iteration is 2.09%, and the error at convergence is 1.53%, a relative reduction of 26.76%. The resulting error rate is only about 1/3 of that of the original GP model. This shows that the proposed model also works very well for the sixth-order coupled filter. With the same training samples, the proposed DE-MGP extracts the parameters more accurately than the original GP.
Table 2 also shows that, as for the fourth-order coupling filter, the precision of the DE-MGP model with 300 groups of training samples is better than that of the GP model with 600 groups of samples, which again indicates that the proposed model depends little on the number of samples when dealing with high-dimensional problems. This is important for problems where not enough samples are available. Figure 6 shows the parameter extraction results of the DE-MGP for the sixth-order coupling filter with 300 groups of samples, and Figure 7 shows the results with 600 groups of samples. Figure 6a and Figure 6b are the fitting results of $S_{11}$ and $S_{21}$ with 300 groups of samples, respectively. Figure 7a and Figure 7b are the fitting results of $S_{11}$ and $S_{21}$ with 600 groups of samples, respectively. The abscissa is the frequency, ranging from 3900 MHz to 4100 MHz, and the ordinate is the S-parameter in dB. The blue curve is the S-parameter curve of the ideal coupling matrix, and the red circles are the S-parameters predicted by the proposed DE-MGP.
It can be found from Table 2, Figure 6 and Figure 7 that, generally, increasing the sample size can improve the accuracy of the training model. However, we can reasonably reduce the training dimension and set the dimensionality reduction parameters to improve the accuracy of our training model under the same sample size. According to Table 2, we also find that for the sixth order filter, the DE-MGP training model we proposed can also effectively save the training samples needed in our experiment.

C. COMPARISON WITH OTHER MODELS
In this part, we implement several classical surrogate models respectively and apply them to the filter design we studied. In the case of the same data set, their experimental results are shown in Table 3. The models include shallow neural network (SNN), deep neural network (DNN), support vector machine (SVM), and kernel extreme learning machine (KELM). Among them, the modeling process of SNN and DNN refers to the method in [35], the modeling of SVM refers to the method in [14], and the modeling of KELM refers to the method in [16]. From Table 3, we can find that compared with other surrogate models, the proposed DE-MGP has better extraction accuracy in both fourth-order filter and sixth-order filter under the same experimental data. In all the methods mentioned above, DNN can improve the accuracy of the model obviously when there are more training data, but compared with the proposed DE-MGP, it still has a large error.

V. CONCLUSION
In this paper, we propose a new machine learning model named differential evolution based manifold Gaussian process (DE-MGP). Compared with the original GP model, the proposed model has a higher accuracy in dealing with high-dimensional problems. After reasonable low-dimensional parameters are set, the DE-MGP obtains a more concise low-dimensional mapping with only a small information loss when processing data, which greatly reduces the spatial complexity of the training data and the amount of calculation. The comparison of the DE-MGP with the original GP for different sample sizes also shows that the DE-MGP with fewer training samples can match the modeling performance of the original GP model. This indicates that the proposed model saves a large share of the training samples when solving problems with high dimension and small sample size, and it also greatly reduces the cost of establishing the inverse surrogate model of a microwave filter. To sum up, the model proposed in this study addresses the problems of high sample acquisition cost and large sample space dimension in the microwave field, which is of great significance to the optimization design of microwave components.
In the following research, we will continue our study in two directions. Firstly, we will further explore how to avoid the local optimal risk of manifold parameters in the DE training, and we will improve the generalization ability of our model through more reliable optimization methods. Secondly, we will further study the multivalued problem of inverse surrogate model of microwave components, and try to use a new surrogate model structure to model the problem.
YUBO TIAN was born in Changtu, China, in 1971. He received the Ph.D. degree from Nanjing University, in 2004. He is currently a Full Professor with the School of Electronics and Information, Jiangsu University of Science and Technology. His research interests include the applications of computational intelligence to the electromagnetics field.
TIANLIANG ZHANG was born in Ma'anshan, China. He is currently pursuing the master's degree with the Jiangsu University of Science and Technology. His research interests include signal processing theory and technology.
JING GAO was born in Huai'an, China, in 1995. She is currently pursuing the master's degree with the Jiangsu University of Science and Technology. Her research interests include signal processing theory and technology.