A Novel Deep Neural Network Topology for Parametric Modeling of Passive Microwave Components

Artificial neural network techniques have gained recognition as powerful tools in microwave modeling and design. This paper proposes a novel deep neural network topology for parametric modeling of microwave components. In the proposed deep neural network, the outputs are S-parameters, and the inputs include the geometrical variables and the frequency. We divide the hidden layers in the proposed deep neural network topology into two parts. Hidden layers in Part I handle only the geometrical inputs, while hidden layers in Part II handle the frequency input together with the outputs of Part I. In this way, more training parameters are utilized to specifically learn the relationship between the S-parameters and the geometrical variables, which is more complicated than that between the S-parameters and the frequency. The purpose is to reduce the total number of training parameters in the deep neural network model. New formulations are derived to calculate the derivatives of the error function with respect to the training parameters in the deep neural network. Taking advantage of the calculated derivatives, we propose an advanced two-stage training algorithm for the deep neural network. The two-stage training algorithm determines the number of hidden layers in both parts during the training process and continues until the proposed deep neural network model achieves the required model accuracy. The proposed deep neural network can achieve similar model accuracy with fewer training parameters than the commonly used fully connected neural network. The proposed technique is demonstrated by two microwave parametric modeling examples.


I. INTRODUCTION
Parametric modeling of microwave components plays an important role in the area of electromagnetic (EM)-based microwave design. Parametric models can be developed from the relationship between the EM behavior of microwave components and the geometrical variables. The inputs of the parametric model include geometrical variables and the frequency, and the outputs are EM responses (such as S-parameters) of the microwave component. The developed parametric models can provide fast and accurate prediction of the EM responses for different values of geometrical parameters and subsequently can be implemented in high-level circuit and system designs. They can accelerate the EM-based design process by avoiding repetitive EM simulations, which are usually time-consuming.
Different modeling methods, such as the neural network technique [1], [2], the polynomial-based surrogate modeling technique [3], the Kriging technique [4], and the support vector machine (SVM) [5], can be used to build parametric models for microwave components. The polynomial-based surrogate modeling technique, the Kriging technique, and SVM have good generalization capability when the training data are limited [3]. The neural network technique is applicable when the size of the training data is large. In this paper we develop a neural network-based method for parametric modeling of passive microwave components where extensive training data are used.
The associate editor coordinating the review of this manuscript and approving it for publication was Dušan Grujić.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Artificial neural network (ANN) modeling technique is an efficient parametric modeling technique for EM behaviors of microwave components. The developed ANN models can provide fast solutions to the tasks they have learned [6]. ANN technique has been utilized as a powerful technique for microwave modeling and design optimization [6]- [12]. In [13]- [15], microwave parametric models are developed by the knowledge-based neural network techniques where the neural networks are combined with prior knowledge. In addition, there are applications reported in nonlinear microwave device modeling [16]- [21], power amplifier modeling [22]- [24], multiphysics parametric modeling [25], and microwave component design [26]- [28].
Most of these reported applications utilize shallow neural networks. In recent years, deep neural networks with many hidden layers have gained recognition in the neural network community for learning complicated relationships in large data sets [29]-[32]. Outstanding results have been achieved by deep neural networks in various challenging areas, such as speech recognition [33], image recognition [34], sentiment analysis [35], language processing [36], and machine translation [37]. In [28], the deep neural network technique is introduced into the microwave modeling field to address the challenges caused by high-dimensional inputs. The application examples presented in [28] show the advantages of the deep neural network over the shallow neural network. The deep neural network in [28] can be trained to learn the training data in a high-dimensional space. In the area of parametric modeling of microwave components, there are situations where the relationship between the model outputs and the geometrical parameters is more complex than the relationship between the model outputs and the frequency. In this paper, we propose a novel deep neural network topology for parametric modeling to specifically address this situation. The proposed technique can represent the input-output relationship using fewer training parameters than the commonly used fully connected neural network.
In this paper, we propose a novel deep neural network topology specifically to reduce the total number of training parameters for neural network-based parametric modeling of microwave components. The inputs of the parametric model include geometrical variables and the frequency, and the outputs are S-parameters. In the proposed deep neural network topology, the inputs are divided into two sets, i.e., geometrical inputs and the frequency input. We divide the hidden layers in the proposed deep neural network into two parts. Hidden layers in Part I handle only the geometrical inputs, while hidden layers in Part II handle the frequency input together with the outputs of Part I. In this way, more training parameters (including weights and biases in the proposed deep neural network) are utilized to specifically learn the relationship between the S-parameters and the geometrical variables, which is more complex than that between the S-parameters and the frequency. The purpose is to reduce the total number of training parameters in the deep neural network model while maintaining the model accuracy. In order to train the proposed deep neural network, we derive new formulations for the derivatives of the error function with respect to the training parameters. Taking advantage of the calculated derivatives, we propose an advanced two-stage training algorithm for training the deep neural network. By using the two-stage training algorithm, the number of hidden layers in both parts can be determined during the training process and the model accuracy can reach the required threshold after training. The novel deep neural network topology can achieve similar model accuracy using fewer training parameters than the fully connected neural network.
This paper is organized as follows. In Section II, we first describe the structure of the proposed deep neural network topology. The feedforward computations from model inputs to outputs are derived for the proposed deep neural network. The computations of the derivatives of the error function with respect to the training parameters are derived for the training algorithm. An advanced two-stage training algorithm is proposed to train the novel deep neural network model. The proposed deep neural network technique is applied to two parametric modeling examples of microwave filters in Section III. Finally, Section IV concludes the paper.

II. PROPOSED DEEP NEURAL NETWORK TOPOLOGY FOR PARAMETRIC MODELING OF MICROWAVE COMPONENTS
A. THE PROPOSED DEEP NEURAL NETWORK TOPOLOGY
We propose a novel deep neural network topology to develop parametric models for microwave components more efficiently. Parametric models of microwave components can be developed from the information of EM responses as functions of geometrical parameters and the frequency. The inputs of the parametric model are the geometrical parameters of the microwave component and the frequency. The outputs of the parametric model are the EM responses, such as the scattering parameters. The relationship between the EM responses and the geometrical parameters is usually more complex than the relationship between the EM responses and the frequency. In order to represent both relationships more efficiently, we propose to use more hidden layers to learn the relationship between the EM responses and the geometrical parameters and fewer hidden layers to learn that between the EM responses and the frequency.
As shown in Figure 1, we divide the hidden layers in the deep neural network into two parts. The sigmoid function is utilized as the activation function for hidden neurons in both parts. We intend to feed the frequency input directly into the first layer in Part II. To make the topology of the proposed deep neural network tidier, we add a hidden neuron that achieves a unit mapping from its input to its output. Let p be the number of hidden layers in Part I. The vector containing the weights in Part I is defined as $\mathbf{u}$. Let q be the number of hidden layers in Part II. The vector containing the weights in Part II is defined as $\mathbf{v}$. The proposed deep neural network model can be defined as
$$\mathbf{y} = g(\mathbf{x}, f, \mathbf{u}, \mathbf{v}) \tag{1}$$
where g is the function representing the input-output relationship of the proposed deep neural network model. The output of the developed model is a function of the geometrical variables $\mathbf{x}$ and the frequency variable f. The frequency f is a single input variable of the model.
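When using the proposed model to represent the EM behavior of a microwave component, the developed model needs to be evaluated at multiple frequency points by changing the value of f.
Let $N_1^h$ denote the number of neurons in the hth hidden layer of Part I, and let $z_1^{h,i}$ denote the output of the ith neuron in that layer. Let $u_{ij}^h$ represent the weight between the jth neuron in the (h − 1)th layer and the ith neuron in the hth layer of Part I, and let $u_{i0}^h$ represent the bias of the ith neuron in the hth layer, for $h = 1, 2, \ldots, p$, $j = 0, 1, \ldots, N_1^{h-1}$, $i = 1, 2, \ldots, N_1^h$. For h = 1, the (h − 1)th layer is the geometrical input layer, i.e., $z_1^{0,j} = x_j$. By using the feedforward computation, $z_1^{h,i}$ of Part I can be calculated as
$$z_1^{h,i} = \sigma\left( \sum_{j=1}^{N_1^{h-1}} u_{ij}^h \, z_1^{h-1,j} + u_{i0}^h \right) \tag{2}$$
where σ(·) is the activation function used in the hidden neurons of Part I, i.e., the sigmoid function.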
Then we proceed to Part II. Since the first layer in Part II connects to both the pth layer of Part I and the single hidden neuron connected to the frequency input, we add the single hidden neuron into the pth layer of Part I for convenience of description. The outputs of the neurons in the extended pth layer of Part I become
$$\left[ z_1^{p,1}, z_1^{p,2}, \ldots, z_1^{p,N_1^p}, z_1^{p,N_1^p+1} \right]^T, \quad z_1^{p,N_1^p+1} = f \tag{3}$$
The number of neurons in the extended pth layer of Part I becomes $N_1^p + 1$. Let $v_{ij}^l$ represent the weight between the jth neuron in the (l − 1)th layer and the ith neuron in the lth layer of Part II for $l = 2, 3, \ldots, q$, $j = 1, 2, \ldots, N_2^{l-1}$, $i = 1, 2, \ldots, N_2^l$. For l = 1, $v_{ij}^1$ represents the weight between the jth neuron in the extended pth layer of Part I and the ith neuron in the first layer of Part II. We also introduce an extra weight parameter $v_{i0}^l$ to represent the bias of the ith neuron in the lth layer. Thus, the vector $\mathbf{v}$ for Part II includes $v_{ij}^l$, $l = 1, 2, \ldots, q$, $j = 0, 1, \ldots, N_2^{l-1}$, $i = 1, 2, \ldots, N_2^l$, where $N_2^0 = N_1^p + 1$ for l = 1. By using the feedforward computation, the output $z_2^{l,i}$ of the ith neuron in the lth layer of Part II can be calculated as
$$z_2^{l,i} = \begin{cases} \sigma\left( \sum_{j=1}^{N_1^p+1} v_{ij}^1 \, z_1^{p,j} + v_{i0}^1 \right), & l = 1 \\ \sigma\left( \sum_{j=1}^{N_2^{l-1}} v_{ij}^l \, z_2^{l-1,j} + v_{i0}^l \right), & l = 2, 3, \ldots, q \end{cases} \tag{4}$$
where $z_1^{p,j}$ is the output of the jth neuron in the extended pth layer of Part I. To represent the weights between the qth layer of Part II and the output layer, the definition of $v_{ij}^l$ is extended to the output layer, i.e., $v_{ij}^l$, l = q + 1 represents the weight between the jth neuron in the qth layer of Part II and the ith neuron in the output layer. After the feedforward computation, we can extract the outputs of the proposed deep neural network model from the output layer as
$$y_i = \sum_{j=1}^{N_2^q} v_{ij}^{q+1} \, z_2^{q,j} + v_{i0}^{q+1}, \quad i = 1, 2, \ldots, m \tag{5}$$
where m is the number of model outputs. Then we need to train the proposed deep neural network model with the training data by adjusting the values of the training parameters.
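As an illustration of the feedforward computation described above, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes each layer is stored as a list of rows `[bias, w_1, ..., w_N]`, that inputs are already scaled, and that weights are drawn uniformly at random; the helper names (`dense`, `forward`, `build`) and the layer sizes are hypothetical choices made for the example.

```python
import math
import random

def sigmoid(g):
    return 1.0 / (1.0 + math.exp(-g))

def dense(layer, inputs, activation=None):
    """layer[i] = [bias, w_i1, ..., w_iN]; returns the outputs of one layer."""
    out = []
    for row in layer:
        g = row[0] + sum(w * z for w, z in zip(row[1:], inputs))
        out.append(activation(g) if activation else g)
    return out

def forward(x, f, u, v):
    """x: geometrical inputs, f: frequency,
    u: hidden layers of Part I, v: hidden layers of Part II plus the output layer."""
    z = list(x)
    for layer in u:            # Part I only sees the geometrical inputs
        z = dense(layer, z, sigmoid)
    z = z + [f]                # extended pth layer: the unit-mapping neuron passes f through
    for layer in v[:-1]:       # q hidden layers of Part II
        z = dense(layer, z, sigmoid)
    return dense(v[-1], z)     # linear output layer

def init_layer(n_out, n_in, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def build(n_geo, n_out, part1, part2, rng):
    """part1 / part2: lists with the number of neurons in each hidden layer."""
    sizes1 = [n_geo] + part1
    u = [init_layer(sizes1[h], sizes1[h - 1], rng) for h in range(1, len(sizes1))]
    sizes2 = [part1[-1] + 1] + part2 + [n_out]   # +1 for the frequency neuron
    v = [init_layer(sizes2[l], sizes2[l - 1], rng) for l in range(1, len(sizes2))]
    return u, v

rng = random.Random(0)
u, v = build(n_geo=4, n_out=2, part1=[5, 5], part2=[15, 15], rng=rng)
y = forward([0.1, 0.2, 0.3, 0.4], 0.5, u, v)   # made-up, pre-scaled inputs
print(len(y))  # two outputs, e.g. Re(S21) and Im(S21)
```

Because the frequency enters only at the first layer of Part II, the weights of Part I are spent entirely on the geometrical inputs.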

C. PROPOSED TRAINING ALGORITHM FOR THE NEW DEEP NEURAL NETWORK
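The deep neural network model needs to be trained with the training data. Let $\mathbf{d}_{kt}$ be defined as the desired model outputs for the kth geometric sample $\mathbf{x}_k$ at the tth frequency sample $f_t$, $k = 1, 2, \ldots, N_g$, $t = 1, 2, \ldots, N_f$, where $N_g$ represents the total number of geometric samples and $N_f$ represents the total number of frequency samples in the frequency band of interest. Specifically, for microwave parametric modeling, $\mathbf{x}_k$ is a set of geometrical parameters of the microwave component, $f_t$ is a frequency in the frequency band of interest, and $\mathbf{d}_{kt}$ contains the S-parameters corresponding to $\mathbf{x}_k$ at the frequency $f_t$. The standard error function is used to evaluate the performance of the proposed deep neural network model in the training process [38]. The error function is defined as
$$E(\mathbf{u}, \mathbf{v}) = \frac{1}{2 N_g N_f} \sum_{k=1}^{N_g} \sum_{t=1}^{N_f} \sum_{j=1}^{m} \left( y_j(\mathbf{x}_k, f_t, \mathbf{u}, \mathbf{v}) - d_{kt}^j \right)^2 \tag{6}$$
where $y_j(\mathbf{x}_k, f_t, \mathbf{u}, \mathbf{v})$ is the jth output of the proposed deep neural network model corresponding to $\mathbf{x}_k$ and $f_t$, $d_{kt}^j$ is the jth element of $\mathbf{d}_{kt}$, and m is the number of model outputs.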
Gradient-based training method is one of the most efficient methods to train a neural network model [6]. When training the deep neural network model with the gradient-based training method, the weights are adjusted according to the derivatives of the error function formulated in Equation (6) with respect to the weights. We propose a method to calculate the derivatives by extending the back propagation (BP) concept for multilayer perceptron (MLP) to our novel deep neural network structure [38]. New formulations are proposed to compute the derivatives of the error function with respect to the training parameters. The derivatives can guide the gradient-based training for the proposed deep neural network structure.
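The error function in Equation (6) is the average of the errors of the model for all geometric samples at all sampled frequencies. For the kth geometric sample at the tth sampled frequency, the total error of all outputs of the model can be computed by [38]
$$E_{kt} = \frac{1}{2} \sum_{j=1}^{m} \left( y_j(\mathbf{x}_k, f_t, \mathbf{u}, \mathbf{v}) - d_{kt}^j \right)^2 \tag{7}$$
where m is the number of model outputs. Let $\gamma_1^{h,i}$, $h = 1, 2, \ldots, p$ and $\gamma_2^{l,i}$, $l = 1, 2, \ldots, q$ represent the total input to the ith neuron in the hth layer of Part I and that to the ith neuron in the lth layer of Part II, respectively. For convenience of description, the definition of $\gamma_2^{l,i}$ is extended to the output layer, i.e., $\gamma_2^{l,i}$, l = q + 1 represents the total input to the ith neuron in the output layer. The local derivatives are defined as $\delta_1^{h,i} = \partial E_{kt} / \partial \gamma_1^{h,i}$ and $\delta_2^{l,i} = \partial E_{kt} / \partial \gamma_2^{l,i}$. Starting from $\delta_2^{l,i}$, l = q + 1 at the output layer of the deep neural network model, the BP algorithm propagates this local derivative backward through the hidden layers to the input layer. Since neurons in the output layer utilize linear activation functions, $\delta_2^{l,i}$, l = q + 1 at the output layer can be calculated as
$$\delta_2^{q+1,i} = y_i(\mathbf{x}_k, f_t, \mathbf{u}, \mathbf{v}) - d_{kt}^i \tag{8}$$
Subsequently, this local derivative of the output layer is propagated backward to the hidden layers in Part II by the BP algorithm. The local derivative $\delta_2^{l,i}$ in these hidden layers in Part II can be derived as
$$\delta_2^{l,i} = \left( \sum_{j=1}^{N_2^{l+1}} \delta_2^{l+1,j} \, v_{ji}^{l+1} \right) z_2^{l,i} \left( 1 - z_2^{l,i} \right), \quad l = q, q-1, \ldots, 1 \tag{9}$$
where $N_2^{q+1} = m$ is the number of neurons in the output layer. Then, the local derivative is propagated backward from the hidden layers in Part II to those in Part I by the BP algorithm. For the pth hidden layer in Part I that is directly connected to the first hidden layer in Part II, the local derivative $\delta_1^{h,i}$, h = p can be derived as
$$\delta_1^{p,i} = \left( \sum_{j=1}^{N_2^1} \delta_2^{1,j} \, v_{ji}^1 \right) z_1^{p,i} \left( 1 - z_1^{p,i} \right), \quad i = 1, 2, \ldots, N_1^p \tag{10}$$
For the other hidden layers in Part I (i.e., h = p − 1, p − 2, . . . , 1), the local derivative $\delta_1^{h,i}$ is calculated in a similar way to the standard back propagation [38]
$$\delta_1^{h,i} = \left( \sum_{j=1}^{N_1^{h+1}} \delta_1^{h+1,j} \, u_{ji}^{h+1} \right) z_1^{h,i} \left( 1 - z_1^{h,i} \right) \tag{11}$$
Once we obtain the local derivatives for all layers in the proposed deep neural network, we can compute the derivatives of $E_{kt}$ with respect to the weight parameters by
$$\frac{\partial E_{kt}}{\partial u_{ij}^h} = \delta_1^{h,i} \, z_1^{h-1,j}, \quad h = 1, 2, \ldots, p \tag{12}$$
$$\frac{\partial E_{kt}}{\partial v_{ij}^l} = \delta_2^{l,i} \, z_2^{l-1,j}, \quad l = 1, 2, \ldots, q+1 \tag{13}$$
where $\delta_1^{h,i}$ and $\delta_2^{l,i}$ are calculated using Equations (8)-(11), $z_1^{0,j} = x_j$ is the jth geometrical input, $z_2^{0,j} = z_1^{p,j}$ is the output of the jth neuron in the extended pth layer of Part I, and the input associated with a bias (j = 0) is equal to 1. Using Equations (8)-(13), we can compute the derivative of the error function with respect to each weight parameter. Then the weight parameters are updated based on the calculated derivatives [6].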
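The derivative computation described above can be checked numerically. The sketch below is an illustrative plain-Python version under stated assumptions, not the authors' code: layers are stored as rows `[bias, w_1, ..., w_N]`, hidden neurons use the sigmoid function, the output layer is linear, and the per-sample error is half the sum of squared output errors. The function names, layer sizes, and sample values are hypothetical. The analytic derivatives are compared against central finite differences.

```python
import math
import random

def sigmoid(g):
    return 1.0 / (1.0 + math.exp(-g))

def forward(x, f, u, v):
    """Return layer outputs of Part I (z1) and Part II (z2); z2[-1] is the model output."""
    z1 = [list(x)]
    for W in u:
        z1.append([sigmoid(r[0] + sum(w * a for w, a in zip(r[1:], z1[-1]))) for r in W])
    z2 = [z1[-1] + [f]]                       # extended pth layer (frequency appended)
    for l, W in enumerate(v):
        g = [r[0] + sum(w * a for w, a in zip(r[1:], z2[-1])) for r in W]
        z2.append(g if l == len(v) - 1 else [sigmoid(t) for t in g])
    return z1, z2

def error(x, f, d, u, v):
    y = forward(x, f, u, v)[1][-1]
    return 0.5 * sum((yi - di) ** 2 for yi, di in zip(y, d))

def backprop(delta, W, below):
    """Propagate local derivatives through W to the sigmoid layer with outputs `below`."""
    return [sum(delta[j] * W[j][i + 1] for j in range(len(delta))) * below[i] * (1 - below[i])
            for i in range(len(below))]

def gradients(x, f, d, u, v):
    z1, z2 = forward(x, f, u, v)
    delta = [yi - di for yi, di in zip(z2[-1], d)]     # linear output layer
    gv = [None] * len(v)
    for l in range(len(v) - 1, -1, -1):                # output layer, then Part II
        gv[l] = [[dl] + [dl * a for a in z2[l]] for dl in delta]
        # at l == 0 the frequency neuron is skipped: it has no trainable input weights
        delta = backprop(delta, v[l], z2[l] if l > 0 else z1[-1])
    gu = [None] * len(u)
    for h in range(len(u) - 1, -1, -1):                # Part I
        gu[h] = [[dl] + [dl * a for a in z1[h]] for dl in delta]
        if h > 0:
            delta = backprop(delta, u[h], z1[h])
    return gu, gv

rng = random.Random(1)
def init(n_out, n_in):
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

u = [init(3, 2), init(3, 3)]                 # p = 2, two geometrical inputs
v = [init(4, 4), init(4, 4), init(2, 4)]     # q = 2 plus a two-neuron output layer
x, f, d = [0.2, -0.4], 0.7, [0.1, -0.3]      # made-up sample
gu, gv = gradients(x, f, d, u, v)

def numeric(W, i, j, eps=1e-6):
    old = W[i][j]
    W[i][j] = old + eps
    e_plus = error(x, f, d, u, v)
    W[i][j] = old - eps
    e_minus = error(x, f, d, u, v)
    W[i][j] = old
    return (e_plus - e_minus) / (2 * eps)

max_diff = max(abs(g[i][j] - numeric(W, i, j))
               for W, g in zip(u + v, gu + gv)
               for i in range(len(W)) for j in range(len(W[i])))
print(max_diff < 1e-6)
```

The only departure from standard back propagation is at the boundary between the two parts, where the local derivative is propagated to the original neurons of the pth layer but not to the frequency neuron.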
We propose a two-stage training algorithm for the proposed deep neural network. The derivative information is used to guide the gradient-based training process. To simplify the training algorithm, we predetermine the number of hidden neurons per layer in Part I and Part II according to experience before the training process. Usually, if the geometrical input dimension is relatively high and/or the variation ranges of the model inputs are relatively large, we choose larger number of hidden neurons. If the geometrical input dimension is relatively low and/or the variation ranges of the model inputs are relatively small, we choose smaller number of hidden neurons [28]. Using the proposed two-stage training algorithm, the number of hidden layers in Part I and Part II can be determined during the training process.
In Stage I, the number of layers in Part II is set to one, i.e., q = 1. In this stage, we determine the number of layers in Part I, i.e., the value of p, by changing p during the training process. Usually we start from p = 1. A deep neural network with one hidden layer in Part II and p hidden layers in Part I is trained to reduce the training error as much as possible. After training, we check the model accuracy. If the model accuracy satisfies the requirement, we stop the training process and Stage II is not needed. If the model accuracy does not satisfy the requirement, a new hidden layer is added to Part I. We keep adding new layers to Part I until the model accuracy satisfies the requirement or until the training error cannot be reduced even if a new layer is added to Part I. After each layer is added to Part I, we train the deep neural network model and compute the training error. We define $E_{be}$ and $E_{tr}$ as the training errors of the deep neural network model before and after adding the most recent layer to Part I, respectively. Let $E_{re}$ represent the required error threshold of the model. If $E_{tr} < E_{be}$ and $E_{tr} > E_{re}$, adding a layer to Part I reduces the training error, but the reduced training error still does not reach the required error threshold. In this case, the deep neural network model is in the underlearning state, and a new hidden layer needs to be added to Part I again. If $E_{tr} \ge E_{be}$ even after many training iterations, adding more layers to Part I cannot reduce the training error any further. In this case, the last added layer in Part I is deleted and the total number of hidden layers in Part I is determined as p. After determining the value of p, we proceed to Stage II.
In Stage II, in order to further reduce the training error of the deep neural network, we begin to add hidden layers in Part II. We add one new hidden layer to Part II again and again until the model accuracy can achieve the accuracy requirement. After training of Stage II, the number of hidden layers in Part II can be determined as q. The final deep neural network has p hidden layers in Part I and q hidden layers in Part II. It can represent the input-output relationships of parametric models of microwave components accurately.
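The control flow of the two-stage procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: `train_fn(p, q)` is a hypothetical callback that trains a network with p layers in Part I and q layers in Part II and returns its training error, `max_layers` is an added safety cap, and the test-error checks, the growth of the training data, and the reuse of Stage I weights are omitted. The error values in the toy table are made up.

```python
def two_stage_search(train_fn, e_re, max_layers=10):
    """Return (p, q, training error) found by the two-stage layer search."""
    p, q = 1, 1
    e_be = train_fn(p, q)
    # Stage I: grow Part I while it keeps reducing the training error
    while e_be > e_re and p < max_layers:
        e_tr = train_fn(p + 1, q)
        if e_tr >= e_be:          # no improvement: discard the added layer
            break
        p, e_be = p + 1, e_tr
    # Stage II: grow Part II until the required threshold is reached
    e_tr = e_be
    while e_tr > e_re and q < max_layers:
        q += 1
        e_tr = train_fn(p, q)
    return p, q, e_tr

# Toy stand-in for real training: a lookup table of made-up training errors.
toy_errors = {(1, 1): 0.05, (2, 1): 0.03, (3, 1): 0.03, (2, 2): 0.019}
result = two_stage_search(lambda p, q: toy_errors[(p, q)], e_re=0.02)
print(result)  # (2, 2, 0.019)
```

In the toy run, the third layer in Part I gives no improvement and is discarded, after which a second layer in Part II brings the error below the threshold.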

D. PROCESS OF DEVELOPING THE DEEP NEURAL NETWORK PARAMETRIC MODEL
We summarize the development process of the deep neural network parametric model of microwave components as follows.
Step 1) Perform EM simulation to generate training and test data using random sampling method. Fix the number of hidden neurons in each layer of Part I and Part II. Initialize p = 1, q = 1.
Step 2) Train the neural network with p hidden layers in Part I and q hidden layers in Part II.
... reduce the training error as much as possible. Add one hidden layer in Part II to the trained model, i.e., q = q + 1.
Step 8) Train the neural network with p hidden layers in Part I and q hidden layers in Part II. Calculate the training error $E_{tr}$ and the test error $E_{te}$.
Step 9) Compare the values of $E_{tr}$ and $E_{re}$. If $E_{tr} \le E_{re}$, go to Step 10); otherwise, add one hidden layer in Part II, i.e., q = q + 1, and go to Step 8).
Step 10) Compare the values of $E_{te}$ and $E_{re}$. If $E_{te} > E_{re}$, add more training data and go to Step 8); otherwise, stop the training process.
The flow diagram of the overall development process of the proposed deep neural network parametric model is shown in Figure 2. According to the two-stage training algorithm, Stage I consists of Steps 2) to 6), and Stage II consists of Steps 7) to 10).

E. DISCUSSION
In addition to the sigmoid function, other activation functions, such as tanh function, the rectified linear unit (ReLU), and so on, can also be adopted in the proposed deep neural network structure. If another activation function is used, the model structure and training algorithm of the proposed technique will not be changed. Only the calculation of model outputs and the derivatives of the error function with respect to the weights need to be revised accordingly. The idea of deriving the calculations of model outputs and derivatives is similar to what we have presented in Section II. In this paper, we choose sigmoid function as the activation function for the hidden neurons because it is one of the most commonly used activation functions for neural network [38].
In the two-stage training, it is necessary to check the test errors in both stages and adjust the training data accordingly. This is because in Stage II, the new layers are added to the neural network trained from Stage I. We keep the trained layers and weights from Stage I and further train the whole neural network after adding new layers in Stage II. The trained weights from Stage I provide a good starting point for Stage II. During the training process, we need to check the test errors in both stages. If we do not check the test error in Stage I, the neural network trained from Stage I may be in the overlearning state. In this case, the weights trained from Stage I cannot provide a good starting point for Stage II. It will make the training of Stage II harder. Therefore, we need to check the test errors in both stages and adjust the training data accordingly.
The number of hidden neurons is changed during the training process since the number of hidden layers is changed. In our proposed two-stage training algorithm, to make the training process simpler, we predetermine the number of neurons per layer and adjust the total number of training parameters by changing the number of layers in the training process. If the randomly initialized weights are close to a local minimum, and the local minimum cannot satisfy the required error threshold, we will add a new layer with random weights and train the neural network with the newly added layer. The added layer makes it possible for the neural network to get out of the current local minimum and achieve a better result.
The neural network topology proposed in this paper is meant to reduce the number of training parameters in the neural network model while maintaining the model accuracy. It is specifically designed to address the situation where the relationship between the outputs and the geometrical parameters is more complex than the relationship between the outputs and the frequency. For applications where the relationship between the model outputs and the geometrical parameters is less complex than the relationship between the model outputs and the frequency, the proposed technique may not be as effective. The developed neural network model is accurate within the training range and is unreliable if it is used outside the training range. An extrapolation technique to guide the neural network outside the training range is reported in [39] to address this issue. The proposed deep neural network method is intended for parametric modeling of passive microwave components. The parametric modeling of nonlinear components is an interesting direction for future work.

III. EXAMPLES
A. PARAMETRIC MODELING OF A THREE-POLE H-PLANE FILTER
In this example, we develop a parametric model for a three-pole H-plane filter [40], whose structure is shown in Figure 3, using the proposed deep neural network topology. The cross section of the waveguide in which the filter is constructed is 19.05 mm × 9.525 mm (WR-75). For this filter, the relationship between the S-parameters and the geometrical parameters is more complex than the relationship between the S-parameters and the frequency, so the proposed deep neural network structure is suitable for developing the parametric model of this filter. Four geometrical variables are used as geometrical inputs to the model of this example, i.e., $\mathbf{x} = [L_1, L_2, W_1, W_2]^T$, and the real and imaginary parts of $S_{21}$ are used as model outputs, i.e., $\mathbf{y} = [\mathrm{Re}(S_{21}), \mathrm{Im}(S_{21})]^T$. When developing parametric models for microwave components, the outputs are S-parameters of the components. In this filter example, we choose one of the S-parameters ($S_{21}$) as the model output to demonstrate the proposed technique. The proposed modeling method is also suitable when S-parameters other than $S_{21}$ are used as model outputs, and the model is developed using a similar modeling procedure in that case.
We apply the novel deep neural network topology to two different cases as defined in Table 1. In Case 1, the geometrical parameters change in a narrower range, while in Case 2 the geometrical parameters change in a wider range. For both cases of this example, the frequency range is from 11.5 GHz to 12.5 GHz. For both cases, we perform EM simulations to generate the training and test data using random sampling method. Suppose N tr and N te are the total numbers of training and test data needed for developing the parametric model. The value of N tr changes in the training process according to the two-stage training algorithm. In order to generate the training and test data, we first generate N tr and N te sets of randomly distributed geometrical samples in the variation range of geometrical parameters. For each set of geometrical parameters, we perform full-wave EM simulation using HFSS to simulate the filter at all chosen frequency samples. Suppose we have N f frequency samples, where N f equals to 101 for this example. We will get N f sets of training/test data for each set of geometrical parameters. Each set of training/test data is composed of the geometrical parameters, one frequency, and the S-parameters at the specific frequency. After simulations at all the geometrical samples and frequencies, we will get N tr × N f sets of training data and N te × N f sets of test data. We develop the parametric models in both cases for the three pole H-Plane filter using the proposed deep neural network technique. The training and test error threshold for this example is set to be 2%. The training and test error threshold is a user-defined parameter. It is defined according to the accuracy requirement of the developed model for different application examples. Smaller error threshold means higher model accuracy. We defined the error threshold to be 2% because it is accurate enough for this filter example. 
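The data-generation procedure described above can be sketched as follows. This is an illustration under stated assumptions: `fake_solver` is a hypothetical stand-in for the full-wave EM simulation and returns placeholder values, the geometrical bounds are placeholders rather than the ranges of Table 1, the numbers of geometrical samples (200 and 50) are chosen only for the example, and the 101 frequency points are assumed to be uniformly spaced between 11.5 GHz and 12.5 GHz.

```python
import random

def generate_data(bounds, n_geo, freqs, simulate, rng):
    """Draw n_geo random geometrical samples and evaluate each at every frequency.
    Each record is (geometrical parameters, frequency, (Re(S21), Im(S21)))."""
    data = []
    for _ in range(n_geo):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        for f in freqs:
            re_s21, im_s21 = simulate(x, f)
            data.append((x, f, (re_s21, im_s21)))
    return data

def fake_solver(x, f):
    # Hypothetical placeholder for the EM solver; returns dummy S21 values.
    return (0.0, 0.0)

bounds = [(0.0, 1.0)] * 4                       # placeholder ranges for L1, L2, W1, W2
freqs = [11.5 + 0.01 * t for t in range(101)]   # N_f = 101 points, in GHz
rng = random.Random(0)
train = generate_data(bounds, 200, freqs, fake_solver, rng)
test = generate_data(bounds, 50, freqs, fake_solver, rng)
print(len(train), len(test))  # 20200 5050
```

Each geometrical sample contributes one record per frequency point, so the data set size is the product of the two sample counts.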
In Case 1, the numbers of hidden neurons per layer in Part I and Part II are set to 5 and 15, respectively. We start from one hidden layer in each part and change the number of hidden layers based on the training algorithm. When the training process is finished, the final structure is composed of two layers with 5 hidden neurons per layer in Part I and two layers with 15 hidden neurons per layer in Part II. The total number of training parameters, including weight parameters and biases, is 432. We start with 20200 training data and gradually add more training data. When the number of training data reaches 80800, the training and test errors achieve the required threshold. The average training error and test error of the proposed deep neural network parametric model are 1.93% and 1.94%, respectively. Two different geometrical samples in Case 1 are used to test the modeling results, as shown in Figure 4. The two test samples are randomly chosen from all test data. From Figure 4, we can see that the S-parameters provided by the deep neural network model match the desired S-parameters very well. For comparison, we also develop a parametric model for this example in Case 1 using the fully connected 3-layer MLP [6]. The comparison of the modeling results is shown in Table 2. A 3-layer MLP with 54 hidden neurons, i.e., MLP: 5-54-2, is used to learn the input-output relationship of the parametric model. Its total number of training parameters is similar to that of the proposed deep neural network model. The average training and test errors of the 3-layer MLP with 54 hidden neurons are 2.36% and 2.39%, respectively. Then we increase the number of hidden neurons of the 3-layer MLP to 150, i.e., MLP: 5-150-2, which has 1202 training parameters in total. The average training and test errors of the 3-layer MLP with 150 hidden neurons are 1.98% and 2.00%, respectively.
In Case 2, the numbers of hidden neurons per layer in Part I and Part II are 10 and 25, respectively. The number of hidden layers in each part starts from one and changes based on the training algorithm. When the training process is finished, the final structure is composed of two layers with 10 hidden neurons per layer in Part I and two layers with 25 hidden neurons per layer in Part II. The total number of training parameters is 1162. For this case, we start with 101000 training data and gradually add more training data. The final size of the training data is 303000 when the two-stage training process is finished. After training the deep neural network, the average training error and test error are 1.78% and 1.88%, respectively. The modeling results at two random geometrical samples in Case 2 are shown in Figure 5. The deep neural network parametric model provides accurate S-parameter solutions, which match the desired S-parameters very well. For comparison, we also use the traditional fully connected neural network to develop parametric models for this example in Case 2. The comparison of the modeling results is shown in Table 2. Two 3-layer MLPs with different numbers of hidden neurons are used to build the parametric model for Case 2. The numbers of hidden neurons of these two neural networks are 145 and 200, respectively. The MLP with 145 hidden neurons has 1162 training parameters, which is the same as the number in the proposed deep neural network model. The average training and test errors of the 3-layer MLP with 145 hidden neurons are 3.47% and 3.43%, respectively. When the number of hidden neurons is increased to 200, the total number of training parameters in the neural network increases to 1602. The average training and test errors are reduced to 3.22% and 3.21%, respectively.
From the test results shown in Figure 4 and Figure 5, the S-parameters provided by the proposed deep neural network model match the desired S-parameters very well for both cases. According to the comparisons of the modeling results shown in Table 2, our proposed deep neural network topology achieves lower training and test errors for both cases than the 3-layer MLPs with a similar number of training parameters. For the narrower range (Case 1), adding more hidden neurons to the 3-layer MLP can reduce the training and test errors to the required error threshold. For the wider range (Case 2), adding hidden neurons to the 3-layer MLP reduces the training and test errors only slightly, and it is much harder to reduce them to the required error threshold. In other words, for the narrower range, both the proposed deep neural network topology and the conventional 3-layer MLP can satisfy the model accuracy requirement, but the proposed deep neural network technique is more efficient because it needs fewer training parameters than the 3-layer MLP to achieve similar model accuracy. For the wider range, the proposed deep neural network achieves the required accuracy efficiently, while the 3-layer MLP does not reach the required accuracy even with many more training parameters than the proposed technique.
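The parameter counts quoted in this example can be reproduced with a short calculation. The sketch below assumes that every hidden and output neuron has one bias, that Part I receives only the geometrical inputs, and that the first layer of Part II receives the outputs of Part I plus the frequency; the function names are hypothetical.

```python
def count_two_part(n_geo, n_out, part1, part2):
    """Weights plus biases of the two-part topology.
    part1 / part2: neurons per hidden layer in Part I / Part II."""
    sizes1 = [n_geo] + part1
    total = sum((sizes1[h - 1] + 1) * sizes1[h] for h in range(1, len(sizes1)))
    sizes2 = [part1[-1] + 1] + part2 + [n_out]   # +1 input for the frequency neuron
    total += sum((sizes2[l - 1] + 1) * sizes2[l] for l in range(1, len(sizes2)))
    return total

def count_mlp(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected 3-layer MLP."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

print(count_two_part(4, 2, [5, 5], [15, 15]))       # 432  (Case 1)
print(count_two_part(4, 2, [10, 10], [25, 25]))     # 1162 (Case 2)
print(count_mlp(5, 54, 2), count_mlp(5, 150, 2))    # 434 1202
print(count_mlp(5, 145, 2), count_mlp(5, 200, 2))   # 1162 1602
```

The MLP: 5-54-2 has 434 parameters, which is why it is described as similar in size to the 432-parameter model of Case 1, whereas the MLP: 5-145-2 matches the 1162 parameters of Case 2 exactly.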

B. PARAMETRIC MODELING OF A FIFTH-ORDER WAVEGUIDE BANDPASS FILTER
Parametric models of a fifth-order waveguide bandpass filter in two cases are developed using the proposed deep neural network topology in this example. The structure of the filter is shown in Figure 6 [15]. $d_1$, $d_2$, and $d_3$ are the distances from the irises to the waveguide wall. $z_1$, $z_2$, and $z_3$ are the distances between two adjacent irises. The thicknesses of the irises are $t_1$, $t_2$, and $t_3$. The parametric model for this example has nine geometrical inputs, i.e., $x = [d_1, d_2, d_3, z_1, z_2, z_3, t_1, t_2, t_3]^T$, and two outputs, i.e., $y = [\mathrm{Re}(S_{21}), \mathrm{Im}(S_{21})]^T$.
We apply the novel deep neural network topology to two different cases as defined in Table 3. The frequency range for both cases of this example is from 9.5 GHz to 12.5 GHz. A total of 151 frequency samples in this range are used for this example. For both cases, we perform EM simulations to generate the training and test data using a random sampling method.
The training and test error threshold for this example is set to 2%. In Case 1, we use 8 hidden neurons per layer in Part I and 15 hidden neurons per layer in Part II. The number of hidden layers in each part is adjusted according to the training algorithm. When the training process is finished, the final structure is composed of two layers with 8 hidden neurons per layer in Part I and two layers with 15 hidden neurons per layer in Part II. The initial training data size for Case 1 of this example is 75500. After the two-stage training process, the total number of training data used for this case is 302000. The developed model with 574 training parameters achieves a 1.72% training error and a 1.75% test error. Training and test errors are the average errors over all training data and test data, respectively. Two random geometrical samples in Case 1 are used to show the modeling results in Figure 7. From Figure 7, we can see that the S-parameters provided by the deep neural network model match the desired S-parameters very well. The comparison results of the proposed technique and MLPs with different numbers of hidden neurons are shown in Table 4. The MLP (i.e., MLP: 10-44-2) with the same number of training parameters as the proposed technique has a 2.28% training error and a 2.30% test error. When the total number of training parameters increases to 1224, the average training and test errors of the MLP (i.e., MLP: 10-94-2) are reduced to 1.76% and 1.76%, respectively.
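The training data sizes quoted for Case 1 are exact multiples of the 151 frequency points, which suggests that each training datum is one (geometry, frequency) pair. The following minimal sketch makes that arithmetic explicit. The function name and the one-datum-per-pair interpretation are assumptions introduced here for illustration and are not stated in the text.

```python
def geometries_from_samples(num_samples: int, num_freq_points: int) -> int:
    """Return the number of distinct geometries implied by a training data size.

    Assumption (hypothetical): every simulated geometry contributes exactly
    one training datum per frequency point.
    """
    geometries, remainder = divmod(num_samples, num_freq_points)
    if remainder != 0:
        raise ValueError("data size is not a multiple of the frequency points")
    return geometries


NUM_FREQ_POINTS = 151  # frequency samples between 9.5 GHz and 12.5 GHz

initial_geometries = geometries_from_samples(75500, NUM_FREQ_POINTS)
final_geometries = geometries_from_samples(302000, NUM_FREQ_POINTS)
print(initial_geometries, final_geometries)  # prints: 500 2000
```

Under this reading, the two-stage training for Case 1 would start from 500 simulated geometries and end with 2000.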
In Case 2, the final structure is composed of two layers with 10 hidden neurons per layer in Part I and two layers with 25 hidden neurons per layer in Part II after the two-stage training. There are 1212 training parameters, including weight parameters and biases, in the developed model. The initial training data size for Case 2 of this example is 302000. After the two-stage training process, the total number of training data used for this case is 377500. The average training error and test error are 1.74% and 1.80%, respectively. Figure 8 shows the modeling results at two random geometrical samples in Case 2. The S-parameter solutions from the developed model match the desired S-parameters very well. The comparison results of the proposed technique and MLPs with different numbers of hidden neurons are shown in Table 4. The 3-layer MLP (i.e., MLP: 10-94-2) with a similar number of training parameters to the proposed technique has a 2.88% training error and a 2.94% test error. When the total number of training parameters increases to 1900, the average training and test errors of the MLP (i.e., MLP: 10-146-2) are reduced to 2.70% and 2.76%, respectively. Figure 9 shows the training errors versus the number of training epochs for the proposed neural network model and the 3-layer MLP with 146 hidden neurons. A 4-layer MLP is used to develop the parametric model for Case 2 of this example as a further comparison. The modeling result is shown in Table 4. There are 35 hidden neurons per layer in the 4-layer MLP model, and the total number of training parameters is 1717. After training, the average training and test errors are 1.77% and 1.84%, respectively.
According to the comparisons of the modeling results shown in Table 4, for the narrower variation range (Case 1), both the proposed deep neural network technique and the conventional 3-layer MLP can achieve the required model accuracy. The proposed deep neural network technique is more efficient than the 3-layer MLP because it needs fewer training parameters than the 3-layer MLP to achieve similar model accuracy. For the wider range (Case 2), the proposed deep neural network technique can achieve the required accuracy efficiently, while the 3-layer MLP cannot reach the required accuracy even with many more training parameters than the proposed technique. Compared to the 4-layer MLP, the proposed neural network technique can achieve similar accuracy with fewer training parameters.
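The training-parameter counts quoted for the fully connected baselines can be checked with a short calculation. The sketch below assumes a standard MLP in which every hidden and output neuron has one weight per input plus one bias. The helper name `mlp_parameter_count` is hypothetical. It does not cover the proposed two-part topology, whose count depends on how Part I and Part II are connected.

```python
from typing import Sequence


def mlp_parameter_count(layer_sizes: Sequence[int]) -> int:
    """Count weights and biases of a fully connected network.

    layer_sizes lists the neurons per layer from input to output,
    e.g. (10, 44, 2) for the "MLP: 10-44-2" notation.
    Assumption: one bias per hidden and output neuron.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += (fan_in + 1) * fan_out  # fan_in weights plus one bias per neuron
    return total


# 10 inputs = 9 geometrical variables + frequency; 2 outputs = Re(S21), Im(S21)
for sizes in [(10, 44, 2), (10, 94, 2), (10, 146, 2), (10, 35, 35, 2)]:
    print("-".join(map(str, sizes)), mlp_parameter_count(sizes))
# prints:
# 10-44-2 574
# 10-94-2 1224
# 10-146-2 1900
# 10-35-35-2 1717
```

These values agree with the 574, 1224, 1900, and 1717 training parameters reported for the MLPs in this example. This supports the reading that the 4-layer MLP has two hidden layers of 35 neurons each.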

IV. CONCLUSION
This paper has proposed a novel deep neural network topology for parametric modeling of microwave components. The inputs in the proposed deep neural network topology have been divided into geometrical inputs and the frequency input. We have divided the hidden layers in the proposed deep neural network into two parts in order to reduce the total number of training parameters. We have derived the feedforward computation from the model inputs to the outputs for the proposed deep neural network. New computations of the derivatives of the error function with respect to training parameters have been proposed. We have proposed an advanced two-stage training algorithm to train the deep neural network. The training algorithm can determine the number of hidden layers in both parts during the training process and guarantee that the proposed deep neural network model can achieve the required model accuracy. The proposed deep neural network topology can achieve similar model accuracy using fewer training parameters than the fully connected neural network.