A Hybrid Approach Based on Recurrent Neural Network for Macromodeling of Nonlinear Electronic Circuits

This paper proposes a hybrid approach that combines a recurrent neural network (RNN) with polynomial regression for time-domain modeling of nonlinear circuits. The proposed hybrid RNN-polynomial regression (HRPR) method significantly reduces training time and speeds up simulation compared with both a conventional RNN and the existing models in simulation tools, without sacrificing accuracy. The HRPR method comprises two steps: first, an RNN structure is generated; then, the output of the RNN is combined with the external input(s) of the circuit to perform a regression. As a result, part of the training is carried out by polynomial regression, which is simpler than training an RNN. In addition, the RNN used in the HRPR method has a simpler structure than a single conventional RNN modeling the same component. To verify the validity of the proposed method, modeling results and comparisons for three nonlinear examples are presented.


I. INTRODUCTION
With the increasing complexity of systems exploiting nonlinear circuits and components, controlling and macromodeling them with high accuracy remains an important concern and an active research area [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. Macromodeling is an approach for creating efficient circuit models that reduces the amount of information required to handle them. In other words, a macromodel can be viewed as a compact abstraction of a circuit. With macromodeling, only the information necessary to calculate the desired output variables is retained, while the rest of the data can be suppressed [1]. Several macromodeling approaches have been used in the literature, such as inertial delayed Elmore delay (DED) [1] and the Trajectory PieceWise (TPW) method [13].
The associate editor coordinating the review of this manuscript and approving it for publication was Mu-Yen Chen.
With the development of new technologies, existing models may become insufficiently accurate and may need to be modified or improved [16], [17]. However, developing a new equivalent circuit model, which usually requires manual trial-and-error effort, is a time-consuming procedure. As an alternative, artificial neural networks (ANNs) have been introduced in the literature for nonlinear device control and modeling and have contributed to the evolution of computer-aided design (CAD) of circuits and systems [2], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27]. ANN-based CAD methods have led to a notable improvement in the efficiency and speed of modeling. Optimization and simulation of circuits and components have been frequently used for modeling of nonlinear circuits and systems [28], [29], [30], [31]. ANNs can learn input-output relationships, and a trained model can be used in circuit simulators to provide quick and accurate responses [32]. Such models can be developed from the external signals of the original circuit without implementing its internal details. This capability is very useful for modeling a new device or circuit when its analytical representation is not available, or when a detailed model is too computationally expensive to evaluate [32]. Several time-dependent neural networks, such as dynamic neural networks (DNNs) [33], time-delay neural networks (TDNNs) [34], echo state networks [35], Long Short-Term Memory (LSTM) [36], recurrent neural networks (RNNs) [32], [37], [38], [39], [40], [41], [42], [43], [44], [45], state-space dynamic neural networks [29], [46], and, recently, batch-normalized recurrent neural networks [47], have been proposed in the literature in order to obtain high-performance models for nonlinear circuits and components. There are also other modeling methods, such as fractional-order methods, which rely on solving differential equations.
Fractional-order methods use fewer parameters and are simpler, whereas neural networks use more parameters and are more complex to train but achieve higher accuracy. Also, using differential equations for system identification results in more sensitivity to noisy data, a challenge that discrete-time RNNs do not face. Moreover, models based on fractional orders can usually be used only for the steady-state response of circuits, whereas an RNN-based model can capture both the transient and steady-state behavior of nonlinear circuits accurately [48], [49], [50], [51]. Among all these methods, RNNs are broadly used for macromodeling the time-domain response of nonlinear circuits. Their capability is notably due to the universal approximation property of RNNs, which can be trained to learn and approximate virtually any sophisticated input-output relationship [38]. However, RNNs have many trainable parameters, and recursive training through multiple time steps and layers makes RNN training time-consuming.
In this paper, a method that combines an RNN with polynomial regression is introduced to address these issues. For the first time in modeling nonlinear circuits and components, regression units are added on top of the RNN and take over part of the model training. This hybrid method results in a significant reduction in the number of parameters compared to the conventional RNN method. The reason is that the RNN used before the regression units has a smaller structure than the one (without regression) that is required for modeling the same circuits with suitable accuracy. The goal of regression is to predict the values of one or more continuous target variables given the values of a multi-dimensional vector x of input variables [52]. There are several types of regression methods, such as linear regression and polynomial regression. Regression has been used in different areas and applications, such as face recognition [53], price forecasting [54], decoding muscle activation patterns [55], and estimation of human affective states [56]. The proposed hybrid method uses polynomial regression as part of the training process, so the training time for modeling nonlinear components is significantly reduced: the model obtained from the proposed method trains several times faster than those obtained from the conventional methods. Besides reducing training time, this method reduces test time (evaluation of the response) compared to a conventional RNN and existing circuit simulation tools [57]. This is due to the smaller number of parameters in the proposed method. Indeed, some of the inputs of the adopted regressions can be obtained from simple RNNs containing few parameters, whereas conventional RNN methods contain many parameters to be trained. Therefore, the method proposed in this paper can outperform the conventional RNN technique in both speed and accuracy.
This paper is organized as follows: the conventional RNN structure and its use for macromodel development are presented in section II. The proposed HRPR method is presented in section III. Validation of this method using three examples is reported in section IV. Finally, conclusions drawn from this research are presented in section V.

A. FORMULATION OF CIRCUIT DYNAMICS
Let $N_u$, $N_y$, and $N_P$ denote the number of input signals, output signals, and circuit parameters of a nonlinear component, respectively. Also let $\hat{y} = [\hat{y}_1 \ldots \hat{y}_{N_y}]$, $u = [u_1 \ldots u_{N_u}]$, and $P = [p_1 \ldots p_{N_P}]$ be the outputs, time-varying inputs, and circuit parameters, respectively. In the rest of the paper, the input signals of the circuit are waveforms that take different values at each time, meaning that they are vectors of real values. Circuit parameters, such as capacitance, on the other hand, have constant values that do not change over time. The outputs of a circuit are vectors of signals generated from the input signals at each time and the parameters of the circuit. The characteristics of the original nonlinear circuit can be described in a nonlinear state-space form as

$$\dot{x}(t) = g\big(x(t), u(t)\big), \qquad \hat{y}(t) = h\big(x(t), u(t)\big) \tag{1}$$

where $g$ and $h$ are nonlinear functions, $x = [x_1 \ldots x_{N_s}]^T$ is the vector of state variables, and $N_s$ is the number of states [41]. When the nonlinear circuit is complex (comprising numerous nonlinear components), the original nonlinear equations in (1) are computationally expensive to solve. Thus, a reduced-complexity model that is easier to solve than the original equations is needed. It can be obtained by converting the original set of equations to a discrete-time set of equations with a specific sampling rate as follows [42]:

$$\hat{y}(t_k) = f\big(\hat{y}(t_{k-1}), \ldots, \hat{y}(t_{k-M_y}), u(t_k), u(t_{k-1}), \ldots, u(t_{k-M_u}), P\big) \tag{2}$$

where $k$ indicates the time index in the discrete-time domain, $t_k$ is the $k$th time step, $M_y$ and $M_u$ are the number of delay steps of $\hat{y}$ and $u$, respectively, and $f$ is a set of nonlinear functions.
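As a minimal sketch of the discrete-time formulation, the snippet below assembles the arguments that the nonlinear function f would receive at time index k. It assumes the model takes M_y delayed outputs, the current input plus M_u delayed inputs, and the circuit parameters P; the function name `regressor_inputs` and the sample values are hypothetical and only serve the illustration.

```python
def regressor_inputs(y_hist, u_hist, k, M_y, M_u, P):
    """Collect the arguments of f at time index k (assumed ordering).

    y_hist, u_hist: lists of scalar samples indexed by time step.
    Requires k >= max(M_y, M_u) so that every delayed sample exists.
    """
    delayed_y = [y_hist[k - m] for m in range(1, M_y + 1)]   # y(t_{k-1}) ... y(t_{k-M_y})
    delayed_u = [u_hist[k - m] for m in range(0, M_u + 1)]   # u(t_k) ... u(t_{k-M_u})
    return delayed_y + delayed_u + list(P)

# Hypothetical samples: one output, one input, one circuit parameter (a capacitance).
y_hist = [0.0, 0.1, 0.2, 0.3]
u_hist = [1.0, 1.1, 1.2, 1.3]
args = regressor_inputs(y_hist, u_hist, k=3, M_y=2, M_u=1, P=[20e-15])
print(args)
```

The circuit parameters are appended unchanged at every time index because they are constant over a waveform, while the signal entries slide with k.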

B. RECURRENT NEURAL NETWORKS (RNNs)
In this section, the RNN structure with global feedback from the output to the input (without local feedback for each hidden neuron) is presented. This has been widely used in macromodeling of nonlinear components and circuits [32], [37], [38], [39], [40], [42].
C. RNN STRUCTURE
Figure 1 shows the structure of a conventional recurrent neural network. Let $d_y$ and $d_u$ be the number of buffers for output $y$ and input $u$, respectively. In Figure 1, the first layer of the RNN includes the delayed output signals $y$, which are fed back from the output of the RNN, the delayed input signals $u$, and the time-invariant circuit parameters $P$. The last layer produces the time-varying output signal $y$, which can be formulated for the $i$th neuron at the $k$th time step as

$$y_i(t_k) = \sum_{j=1}^{N_z} v_{ij} z_j(t_k) + v_{i0} \tag{3}$$

where $v_{ij}$ is the weight between the $i$th neuron of the output layer and the $j$th neuron of the last hidden layer, and $v_{i0}$ is the bias of the $i$th output neuron. The output of the $j$th hidden neuron of the last hidden layer is

$$z_j(t_k) = \sigma\big(\phi_j(t_k)\big)$$

where $\sigma$ is the sigmoid activation function, $\phi_j(t_k)$ is the weighted summation of the outputs of the layer before the last hidden layer, and $N_z$ is the number of hidden neurons of the last hidden layer.
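The structure described above can be sketched as a forward simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: a single output, a single hidden layer, zero initial conditions for the buffers, and the current input included alongside its delayed copies. All names (`rnn_simulate`, `W`, `b`, `v`, `v0`) and weight values are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def rnn_simulate(u, P, W, b, v, v0, d_y=1, d_u=1):
    """Simulate a single-output RNN with global feedback from output to input.

    W: one weight row per hidden neuron over [delayed y, current and delayed u, P].
    b: hidden biases; v: output weights; v0: output bias.
    """
    y = []
    for k in range(len(u)):
        # Input layer: buffers hold zeros before the first sample (assumption).
        y_del = [y[k - m] if k - m >= 0 else 0.0 for m in range(1, d_y + 1)]
        u_del = [u[k - m] if k - m >= 0 else 0.0 for m in range(0, d_u + 1)]
        x = y_del + u_del + list(P)
        # Hidden layer: sigmoid of the weighted sum.
        z = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
             for row, bj in zip(W, b)]
        # Output layer: linear combination of hidden outputs plus bias.
        y.append(sum(vj * zj for vj, zj in zip(v, z)) + v0)
    return y

# Hypothetical weights for 2 hidden neurons and inputs [y(k-1), u(k), u(k-1), P].
W = [[0.5, 1.0, -0.3, 0.2], [-0.4, 0.7, 0.1, 0.0]]
y_out = rnn_simulate(u=[0.0, 0.5, 1.0], P=[0.3], W=W, b=[0.0, 0.1], v=[1.0, -1.0], v0=0.0)
print(y_out)
```

Because each output sample is written back into the delay buffer, the samples must be computed sequentially; this is the recursion that makes RNN evaluation and training more expensive than a feedforward pass.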

D. ERROR CALCULATION
The error function of the RNN, needed in the proposed method, is described here. Let $y(t)$ be the predicted output of the RNN model and $\hat{y}(t)$ be the target value. Suppose the training data are represented by input-output signals $(u_d(t), \hat{y}_d(t))$. To employ a gradient-based optimization technique for training the RNN, we need the derivative of the error function with respect to the parameters of the structure. For the $k$th time index in the $d$th training signal $(u_d(t_k), \hat{y}_d(t_k))$, the training error is defined as

$$E_d(t_k) = \frac{1}{2} \sum_{i=1}^{N_y} \big(y_{id}(t_k) - \hat{y}_{id}(t_k)\big)^2$$

where $N_y$ is the number of output signals. Let $\psi$ denote the parameters of the RNN, comprising the weights between layers and the biases of the neurons in the different layers. To compute $dy_{id}(t_k)/d\psi$, the following procedure can be executed. Consider $x_a$ as the $a$th neuron of the input layer. For $k = 1$, assume $dy_{id}(t_k)/d\psi = \partial y_{id}(t_k)/\partial\psi$; then $dy_{id}(t_k)/d\psi$ for $k > 1$ can be obtained from the histories of $dy_{id}(t_k)/d\psi$ as below [32], [33], [37]:

$$\frac{dy_{id}(t_k)}{d\psi} = \frac{\partial y_{id}(t_k)}{\partial \psi} + \sum_{m=1}^{d_y} \sum_{j=1}^{N_y} \frac{\partial y_{id}(t_k)}{\partial x_{[j+(m-1)N_y]}} \, \frac{dy_{jd}(t_{k-m})}{d\psi}$$

where $x_{[j+(m-1)N_y]}$ and $y_{jd}(t_{k-m})$ are equal. The recurrent backpropagation consists of two parts. In the first part, the partial derivative $\partial y_{id}(t_k)/\partial\psi$ is obtained by normal backpropagation through the feedforward neural network (FFNN) between the input and output layers. In the second part, $\partial y_{id}(t_k)/\partial x_a$ is computed by further backpropagating to the input layer. The derivative $dy_{id}(t_k)/d\psi$ is then stored to be used as history for computing the derivative at the $(k+1)$th time step.
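The two-part derivative computation can be illustrated on a deliberately small case. The sketch below assumes a scalar RNN with one sigmoid neuron, one output buffer, and a zero initial output, y_k = v * sigmoid(w_y * y_{k-1} + w_u * u_k), and differentiates with respect to the feedback weight w_y only. The total derivative is the explicit partial derivative plus the derivative through the fed-back output multiplied by the stored history. All names and numbers are hypothetical, and the result is compared against a central finite difference.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def simulate_with_gradient(u, w_y, w_u, v):
    """Return outputs y_k and total derivatives dy_k/dw_y for a scalar RNN."""
    y_prev, dy_prev = 0.0, 0.0          # zero initial output, so its derivative is zero
    ys, dys = [], []
    for u_k in u:
        a = w_y * y_prev + w_u * u_k
        z = sigmoid(a)
        dz = z * (1.0 - z)              # derivative of the sigmoid
        y = v * z
        partial_w = v * dz * y_prev     # part 1: explicit dependence on w_y
        partial_x = v * dz * w_y        # part 2: dependence through the fed-back input
        dy = partial_w + partial_x * dy_prev
        ys.append(y)
        dys.append(dy)
        y_prev, dy_prev = y, dy         # store history for the next time step
    return ys, dys

u = [0.2, -0.4, 0.7, 0.1]
ys, dys = simulate_with_gradient(u, w_y=0.8, w_u=1.5, v=2.0)

# Central finite-difference check of the recursion.
eps = 1e-6
y_plus, _ = simulate_with_gradient(u, 0.8 + eps, 1.5, 2.0)
y_minus, _ = simulate_with_gradient(u, 0.8 - eps, 1.5, 2.0)
fd = [(p - m) / (2 * eps) for p, m in zip(y_plus, y_minus)]
max_gap = max(abs(a - b) for a, b in zip(dys, fd))
print(max_gap)
```

At the first time step the history is zero, so the total derivative equals the partial derivative, which matches the initialization stated in the text.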

III. THE PROPOSED HYBRID METHOD
The proposed hybrid RNN-polynomial regression (HRPR) method combines recurrent neural network and polynomial regression models. The polynomial models take the outputs of the RNN as inputs. The two parts of a model developed with the HRPR method have separate structures, and the RNN must be trained before the polynomial models.

A. HYBRID RNN-POLYNOMIAL REGRESSION (HRPR)
In this section, the structure of the models produced by the proposed HRPR method is presented in detail. The means of combining the RNN and the polynomial regression model is explained in section III.B. Figure 2 shows the structure of the proposed hybrid method, which consists of two parts. The first part is a trained conventional RNN structure that receives the input signal(s) and circuit parameters of the electronic component and generates output signals that serve as part of the inputs of the next part. The second part consists of polynomial regressions that receive the output of the RNN, the circuit parameters, and the input(s) of the electronic component simultaneously in order to generate the final output. As shown in Figure 2, each time step in the proposed method has its own polynomial regression unit. Assume there are $S$ time steps in each waveform. In Figure 2, $u(t_i)$, $y(t_i)$, $\mathrm{RNN\_O}(t_i)$, and $P$ are the input and output signals of the electronic component at the $i$th time step, the output of the RNN at the $i$th time step, and the circuit parameters, respectively. In this way, each external input signal at the $i$th time step and the circuit parameters are first passed through the RNN to generate its output at the same time step; then, the generated output, along with the external inputs and circuit parameters, is given to the corresponding polynomial regression unit (the $i$th unit). Finally, the output of the model is obtained from the $S$ polynomial regression units.
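The evaluation path of the two-part structure can be sketched as follows. This is an illustrative sketch only: it assumes degree-1 regression units with a bias term, a single input and a single output, and takes the RNN outputs as already computed. The function name `hrpr_predict` and the layout of the weight rows are hypothetical.

```python
def hrpr_predict(rnn_out, u, P, W):
    """Evaluate S degree-1 regression units, one per time step.

    rnn_out[i], u[i]: RNN output and external input at time step i.
    P: list of circuit parameters (constant over the waveform).
    W[i]: weights of unit i, ordered as [bias, w_rnn, w_P..., w_u] (assumed layout).
    """
    y = []
    for i in range(len(u)):
        x_i = [1.0, rnn_out[i]] + list(P) + [u[i]]   # inputs of the i-th unit
        y.append(sum(w * x for w, x in zip(W[i], x_i)))
    return y

# Hypothetical values: three time steps, one circuit parameter.
rnn_out = [0.10, 0.45, 0.80]
u = [0.0, 0.5, 1.0]
P = [0.3]
W = [[0.01, 1.0, 0.0, 0.02]] * 3
print(hrpr_predict(rnn_out, u, P, W))
```

Each unit only sees the quantities belonging to its own time step, so the S units can be evaluated independently once the RNN output waveform is available.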

B. POLYNOMIAL REGRESSION
1) POLYNOMIAL REGRESSION IN A LINEAR MODEL
As mentioned in section III.A, the second part of the proposed HRPR method consists of polynomial regressions. It is worth noting that, in machine learning, polynomial regression can be used in a linear model, as discussed in [52]. The linear model concept as defined in [52] is explained first.
The simplest linear model for regression involves a linear combination of the input variables:

$$y(x, w) = w_0 + w_1 x_1 + \ldots + w_{D+1} x_{D+1} \tag{7}$$

where $D$ is the sum of the number of input signals and circuit parameters of the nonlinear component to be modeled and $x = (x_1 \ldots x_{D+1})$ is the vector of input variables. This is often simply known as linear regression. The main property of this model is that it is a linear function of the regression parameters $w_0, \ldots, w_{D+1}$. It is also a linear function of the input variables $x_j$, which imposes significant limitations on the model. Thus, we extend the class of linear models by considering linear combinations of fixed nonlinear functions of the input variables:

$$y(x, w) = w_0 + \sum_{j=1}^{\tilde{D}+1} w_j \zeta_j(x) \tag{8}$$

where $\zeta_j$ is known as a basis function. The basis functions let the model operate on a function of the input instead of the input itself, which increases its ability to capture more complex relationships between input and output. With the maximum value of the index $j$ denoted by $\tilde{D}+1$, the model has $\tilde{D}+1$ basis functions and $\tilde{D}+2$ regression parameters in total; the parameter $w_0$ allows for any fixed offset in the data and is called the bias [52]. Any nonlinear function can be used as a basis function in (8). As an example, for an input vector $x = (x_1, x_2)^T$, if we set $\zeta_1(x) = x_1$ and $\zeta_2(x) = x_2$, a polynomial regression of degree 1 is formed as below:

$$y(x, w) = w_0 + w_1 x_1 + w_2 x_2 \tag{9}$$

To form a multivariable polynomial regression of higher degree, for example degree 2, we can set the basis functions as below:

$$\zeta_1(x) = x_1, \quad \zeta_2(x) = x_2, \quad \zeta_3(x) = x_1^2, \quad \zeta_4(x) = x_1 x_2, \quad \zeta_5(x) = x_2^2 \tag{10}$$

Consequently, a linear model of the polynomial regression of degree 2 based on (8) is constructed as below:

$$y(x, w) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1^2 + w_4 x_1 x_2 + w_5 x_2^2 \tag{11}$$

We can see in (11) that a polynomial regression can be used in linear models, where the output is linear in terms of $w$ [52].
In fact, we can have both linear and nonlinear polynomial regression models. In this paper, the linear polynomial regression model has been used in the proposed method.
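The point that a degree-2 polynomial is still a linear model in its parameters can be checked numerically. The sketch below is a hedged illustration: the ordering of the basis functions and the names `basis_degree2` and `model` are assumptions made for this example. It evaluates the model for two weight vectors and for their sum.

```python
def basis_degree2(x1, x2):
    """Degree-2 basis functions of two inputs (assumed ordering)."""
    return [x1, x2, x1 * x1, x1 * x2, x2 * x2]

def model(x1, x2, w0, w):
    """Linear-in-parameters model: bias plus weighted sum of basis functions."""
    return w0 + sum(wj * zj for wj, zj in zip(w, basis_degree2(x1, x2)))

w_a = [1.0, 0.0, 2.0, 0.0, 0.0]
w_b = [0.0, 3.0, 0.0, 0.0, 1.0]
w_sum = [a + b for a, b in zip(w_a, w_b)]

# Nonlinear in x, but additive (linear) in w: both sides agree.
lhs = model(2.0, 3.0, 0.0, w_a) + model(2.0, 3.0, 0.0, w_b)
rhs = model(2.0, 3.0, 0.0, w_sum)
print(lhs, rhs)
```

This linearity in w is what later allows a closed-form least-squares solution instead of iterative training.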

2) POLYNOMIAL REGRESSION IN THE PROPOSED METHOD
To use polynomial regression at each time step of the HRPR method as in Figure 2, equation (8) is rewritten as below:

$$y_{PR(i)} = \sum_{j=1}^{\tilde{D}} w_j \zeta_j(x_i) \tag{12}$$

where $y_{PR(i)}$ is the output of the $i$th polynomial regression unit and $x_i = [\mathrm{RNN\_O}(t_i), P, u(t_i)]^T$, in which $\zeta_j$ should be defined accordingly to form a polynomial regression of the desired degree. Also, $w_i = [w_1 \ldots w_{\tilde{D}}]$ is a vector containing the parameters of the polynomial regression, where $\tilde{D}$, the number of terms in the constructed polynomial regression, depends on the degree of the polynomial. For example, in (12), when $x_i$ has three entries, a polynomial regression of degree 1 gives $\tilde{D} = 3$, and a polynomial of degree 2 has 9 non-constant terms, or 10 when the constant (bias) term is counted.
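The dependence of the number of terms on the polynomial degree can be verified by enumerating the monomials. The sketch below assumes three scalar inputs (RNN output, one circuit parameter, one external input); the helper name `polynomial_terms` is hypothetical. With three variables it yields 3 non-constant terms for degree 1, and 9 non-constant terms (10 with the constant term) for degree 2.

```python
from itertools import combinations_with_replacement

def polynomial_terms(n_vars, degree, include_bias=False):
    """List every monomial of total degree <= `degree` as a tuple of variable indices.

    The empty tuple stands for the constant (bias) term.
    """
    terms = []
    for d in range(0 if include_bias else 1, degree + 1):
        terms.extend(combinations_with_replacement(range(n_vars), d))
    return terms

names = ["RNN_O", "P", "u"]
for term in polynomial_terms(3, 2):
    print("*".join(names[idx] for idx in term))

n_deg1 = len(polynomial_terms(3, 1))
n_deg2 = len(polynomial_terms(3, 2))
n_deg2_with_bias = len(polynomial_terms(3, 2, include_bias=True))
```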

C. ORDINARY LEAST SQUARES (OLS)
In section III.B, the polynomial regression that can be used in a linear model was introduced. Iterative optimization methods such as stochastic gradient descent (SGD) can be used to find the parameters of the linear model. Besides iterative methods, there are closed-form methods that do not require hyperparameters such as the learning rate and the number of epochs. In this section, ordinary least squares (OLS) is introduced. It is one of the well-known closed-form solutions for linear model estimation in the literature [58], [59].
Assume there are $n$ samples for solving a polynomial regression. For simplicity, consider a polynomial regression of degree 1 as below:

$$y^i = w_0^i + w_1^i x_1^i + \ldots + w_{D+1}^i x_{D+1}^i \tag{13}$$

The targets can be collected in the vector $Y^i = [y_1^i, y_2^i, \ldots, y_n^i]^T$, where $y_j^i$ ($j = 1 \ldots n$) is the target output of the $j$th sample at the $i$th time step, and the coefficients at the $i$th time step can be written as $W^i = [w_1^i, \ldots, w_{D+1}^i]^T$ (the bias is ignored for simplicity).
Also, let the $n$ samples at the $i$th time step be denoted as

$$X^i = \begin{bmatrix} x_{11}^i & \cdots & x_{1(D+1)}^i \\ \vdots & \ddots & \vdots \\ x_{n1}^i & \cdots & x_{n(D+1)}^i \end{bmatrix} \tag{14}$$

where each row of $X^i$ is devoted to one sample. Now we should find the parameters $W^i$ such that the following objective function is minimized:

$$W^{i*} = \arg\min_{W^i} J(W^i) \tag{15}$$

where $J(W^i)$ is the objective function defined as

$$J(W^i) = \left\| Y^i - X^i W^i \right\|^2 \tag{16}$$

Because the squared Euclidean norm of a vector equals the product of its transpose with itself, $J(W^i)$ can be expanded to

$$J(W^i) = (Y^i)^T Y^i - 2 (W^i)^T (X^i)^T Y^i + (W^i)^T (X^i)^T X^i W^i \tag{17}$$

Finally, the gradient can be calculated as

$$\frac{\partial J(W^i)}{\partial W^i} = -2 (X^i)^T Y^i + 2 (X^i)^T X^i W^i \tag{18}$$

To find the optimal $W^i$, the gradient is set to zero. Therefore,

$$W^i = \big((X^i)^T X^i\big)^{-1} (X^i)^T Y^i \tag{19}$$

where $\big((X^i)^T X^i\big)^{-1}$ denotes the inverse of the matrix $(X^i)^T X^i$. Using (19), the parameters of the linear model are found from the inputs and outputs of the training data. The parameters are thus learned directly, without running many epochs, and the model is ready to be used on test data. Because of its closed-form nature, the OLS method is much faster than iterative methods, and free parameters such as the number of iterations and the learning rate do not need to be set.
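The closed-form solution can be sketched in a few lines of NumPy. This is an illustrative example on synthetic data, not the authors' code: the data, the seed, and the name `ols_fit` are hypothetical, and the normal equations are solved with `numpy.linalg.solve` rather than by forming the inverse explicitly, which is a common numerical choice and mathematically equivalent when the matrix is invertible.

```python
import numpy as np

def ols_fit(X, Y):
    """Solve (X^T X) W = X^T Y, i.e. the condition that the gradient of J is zero."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Synthetic, noise-free data: n = 50 samples, 3 input variables, no bias term.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
W_true = np.array([0.5, -1.0, 2.0])
Y = X @ W_true

W_hat = ols_fit(X, Y)
gradient = -2.0 * X.T @ Y + 2.0 * X.T @ X @ W_hat   # should vanish at the optimum
print(W_hat)
```

A single linear solve replaces the epoch loop of an iterative optimizer, which is where the training-time saving of the regression stage comes from.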

D. DETERMINING THE POLYNOMIAL DEGREE IN HRPR
The OLS formulas introduced in section III.C are based on equation (13), which is a polynomial regression of degree 1. Polynomial regressions of higher degree can also be used in the OLS method because, as discussed in section III.B, they are also linear in terms of the $W^i$ coefficients. To use a polynomial regression of higher degree in OLS, the matrix of $n$ samples in equation (14) is rebuilt from the polynomial terms. For a polynomial of degree 2 and $D = 1$ (ignoring the bias for simplicity), this matrix is

$$X^i = \begin{bmatrix} x_{11}^i & x_{12}^i & (x_{11}^i)^2 & x_{11}^i x_{12}^i & (x_{12}^i)^2 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ x_{n1}^i & x_{n2}^i & (x_{n1}^i)^2 & x_{n1}^i x_{n2}^i & (x_{n2}^i)^2 \end{bmatrix} \tag{20}$$

It is then substituted into equation (16), and the corresponding coefficient vector is $W^i = [w_1^i, \ldots, w_{\tilde{D}}^i]^T$, where $\tilde{D}$ is the number of variables in each row of (20). The rest of the procedure is the same for polynomial regressions of degrees 1 and 2 in HRPR: for degree 1, equations (14)-(19) are used; for degree 2, the same equations are used, except that equation (14) is replaced with (20). Each row in the matrix of equation (20) corresponds to one training sample for the $i$th polynomial regression model, where $x_{m1}^i$ and $x_{m2}^i$ are the output of the RNN and the external input of the circuit at the $i$th time step, respectively, for the $m$th training sample.
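A degree-2 design matrix for two variables (RNN output and one external input) can be built and fitted with the same closed-form step. The sketch below uses synthetic data; the column ordering, the seed, and the name `design_matrix_degree2` are assumptions for illustration, and the bias column is omitted as in the text.

```python
import numpy as np

def design_matrix_degree2(x1, x2):
    """One row per training sample: [x1, x2, x1^2, x1*x2, x2^2] (assumed ordering)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([x1, x2, x1 ** 2, x1 * x2, x2 ** 2])

# Synthetic samples at one time step: x1 plays the role of the RNN output,
# x2 the role of the external input.
rng = np.random.default_rng(1)
x1 = rng.uniform(-1.0, 1.0, 40)
x2 = rng.uniform(-1.0, 1.0, 40)
w_true = np.array([1.5, 0.0, 0.0, -0.5, 0.25])

X = design_matrix_degree2(x1, x2)
y = X @ w_true

# Same normal-equation solve as for degree 1; only the matrix changed.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)
```

Only the construction of the sample matrix differs between degrees; the least-squares step is unchanged because the model remains linear in its coefficients.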

E. TRAINING OF THE RNN WITH THE PROPOSED HRPR METHOD
As shown in Figure 3, training of the proposed method is done in two stages. In the first stage, the training waveforms are obtained using a circuit simulation tool and an RNN is trained on these waveforms. In the second stage, the outputs obtained from the trained RNN, the input(s) of the nonlinear component at each time step, and the circuit parameters are concatenated to form the training data for the polynomial regression at the same time step. Therefore, for each time step, a specific regressor is trained according to equation (19). Suppose we have $n$ training waveforms, each containing $S$ time steps. After training the RNN using these training waveforms, equation (13) can be rewritten as

$$y_{PR(i)} = w_0^i + w_1^i \, \mathrm{RNN\_O}(t_i) + w_2^i u_2^i + \ldots + w_{D+1}^i u_{D+1}^i \tag{21}$$

where $\mathrm{RNN\_O}(t_i)$ is the output of the RNN at the $i$th time step based on equation (3), $u_2^i, \ldots, u_{D+1}^i$ are the circuit parameters and input signals of the nonlinear component at the $i$th time step, and $w_0^i, \ldots, w_{D+1}^i$ are the parameters of the polynomial regression unit at the $i$th time step. It is worth noting that the RNN structure used in the proposed method has considerably fewer parameters than the conventional RNN structure for modeling the same component. The flowchart in Figure 4 shows the training procedure of the proposed HRPR method.
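The second training stage can be sketched as one least-squares fit per time step. The example below is a hedged illustration on synthetic data: the "RNN output" is a stand-in (a scaled and shifted copy of the target, mimicking a small RNN with a systematic error), not a trained network, and all names, shapes, and numbers are hypothetical. It uses `numpy.linalg.lstsq`, which returns the least-squares solution and also copes with an ill-conditioned sample matrix.

```python
import numpy as np

def fit_regression_units(rnn_out, u, P, y_target):
    """Fit one degree-1 unit per time step.

    rnn_out, u, y_target: arrays of shape (n, S); P: array of shape (n, N_P).
    Returns W of shape (S, 2 + N_P + 1) with rows [bias, w_rnn, w_P..., w_u].
    """
    n, S = y_target.shape
    W = np.zeros((S, 2 + P.shape[1] + 1))
    for i in range(S):
        X_i = np.column_stack([np.ones(n), rnn_out[:, i], P, u[:, i]])
        W[i] = np.linalg.lstsq(X_i, y_target[:, i], rcond=None)[0]
    return W

def predict_with_units(rnn_out, u, P, W):
    """Evaluate the fitted units on (n, S) waveforms."""
    n, S = rnn_out.shape
    y = np.zeros((n, S))
    for i in range(S):
        X_i = np.column_stack([np.ones(n), rnn_out[:, i], P, u[:, i]])
        y[:, i] = X_i @ W[i]
    return y

# Synthetic stand-in data: n = 18 waveforms, S = 5 time steps, one parameter.
rng = np.random.default_rng(0)
n, S = 18, 5
u = rng.uniform(-1.0, 1.0, (n, S))
P = rng.uniform(0.0, 1.0, (n, 1))
y_target = np.tanh(u * (1.0 + P))          # hypothetical "circuit" response
rnn_out = 0.8 * y_target + 0.1             # stand-in for an imperfect stage-1 RNN

W = fit_regression_units(rnn_out, u, P, y_target)
y_hrpr = predict_with_units(rnn_out, u, P, W)
err_rnn = np.max(np.abs(rnn_out - y_target))
err_hrpr = np.max(np.abs(y_hrpr - y_target))
print(err_rnn, err_hrpr)
```

In this constructed case the systematic error of the stand-in is exactly correctable by a linear unit, so the hybrid error drops to numerical precision; with a real RNN the correction would only be partial.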

A. TRANSMISSION-GATE CIRCUIT
The first example to verify the validity of the proposed method is the Transmission-Gate (TG) component shown in Figure 5. Training and test waveforms were generated using the SPICE circuit simulator. A set of signals were generated as training data by varying rise/fall times from 50ps to 60ps with steps of 2ps and load capacitance of 20-24fF with steps of 2fF.
Additional signals with rise/fall times of 51ps to 57ps in steps of 2ps and load capacitances of 20.5, 21, 23, and 23.5 fF were generated as test waveforms, which were not used in the training procedure. Table 1 compares the training and test errors and times of the proposed HRPR, conventional RNN, and LSTM [36] methods for modeling the TG circuit. The results show that the proposed hybrid method achieves sufficient accuracy in much less training time than the conventional RNN and LSTM methods.
Also, the test time of the proposed HRPR-based model is less than that of the conventional RNN-based and LSTM-based models. The output test signals obtained using the proposed method, the RNN-based model, and the transistor-level model are compared in Figure 6. Table 2 reports the simulation (test) time speedup of the proposed HRPR-based model relative to the transistor-level model. As can be seen from the table, the model obtained from the proposed technique for the TG component is considerably faster than the existing model in circuit simulators.
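The training and test sweeps for the TG example can be enumerated as below. This sketch assumes that every rise/fall time is combined with every load capacitance (a full grid), which the text does not state explicitly; the variable names are hypothetical. The check at the end confirms that the test points lie inside the training range without coinciding with any training point.

```python
from itertools import product

# Training sweep: rise/fall 50 to 60 ps in 2 ps steps, load 20 to 24 fF in 2 fF steps.
rise_fall_ps_train = list(range(50, 61, 2))
load_cap_fF_train = [20, 22, 24]
train_grid = list(product(rise_fall_ps_train, load_cap_fF_train))

# Test sweep: rise/fall 51 to 57 ps in 2 ps steps, load 20.5, 21, 23, 23.5 fF.
rise_fall_ps_test = list(range(51, 58, 2))
load_cap_fF_test = [20.5, 21, 23, 23.5]
test_grid = list(product(rise_fall_ps_test, load_cap_fF_test))

inside_range = all(50 <= rf <= 60 and 20 <= c <= 24 for rf, c in test_grid)
overlap = set(train_grid) & set(test_grid)
print(len(train_grid), len(test_grid), inside_range, len(overlap))
```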

B. FREQUENCY DOUBLER DEVICE
The schematic of a frequency doubler is shown in Figure 7. A set of signals was generated as training waveforms by varying the frequency from 2 kHz to 2.1 kHz in steps of 0.02 kHz with amplitudes of 0.09, 0.1, and 0.11 V. Additional signals with frequencies of 2.01, 2.03, and 2.05 kHz and amplitudes of 0.092, 0.094, 0.098, 0.106, and 0.108 V were generated as test data. Table 3 compares the training and test errors and times of the proposed HRPR, conventional RNN, and LSTM methods for modeling the frequency doubler. As can be seen in this table, polynomial regression of degree 1 does not achieve the desired accuracy in training or test, whereas polynomial regression of degree 2, owing to its more complex structure, performs better in both.
The output test signals obtained using the proposed HRPR-based, RNN-based, and transistor-level models are compared in Figure 8. Table 4 shows the speedup comparison among the transistor-level, the proposed HRPR-based, and LSTM-based models for the frequency doubler.
As can be seen from Tables 3 and 4, the proposed hybrid modeling method requires significantly less training time than the conventional RNN and LSTM methods. Also, the final HRPR-based model of the frequency doubler not only shows considerable speedup compared to the transistor-level model but is also faster to compute than the models obtained using the conventional RNN and LSTM techniques.

C. CMOS INVERTER
The schematic of the CMOS inverter used in this paper is shown in Figure 9. A set of signals was generated as training data by varying the rise/fall times from 1.6ps to 2.6ps in steps of 0.2ps with amplitudes of 0.9V, 1V, and 1.1V. Additional signals with rise/fall times of 1.9ps to 2.5ps in steps of 0.2ps and amplitudes of 0.92, 0.94, 0.95, 0.96, 0.98, 1.02, and 1.08 V were generated as test data.
A comparison of the training and test errors and times of the HRPR, conventional RNN, and LSTM methods for modeling the CMOS inverter is shown in Table 5. As can be seen from the table, the proposed HRPR method is remarkably faster to train than the conventional RNN and LSTM models. Also, the final model of the CMOS inverter obtained from the HRPR technique is much faster to compute than the ones obtained using the RNN and LSTM techniques. The output test signals of the proposed HRPR-based, RNN-based, and transistor-level models are shown in Figure 10. These results show that the proposed HRPR-based model matches the transistor-level model better than the RNN-based model does. Table 6 shows the speedup of the proposed HRPR-based model relative to the transistor-level model for the CMOS inverter. The results in this table show a considerable CPU time improvement for the HRPR-based model in comparison with the transistor-level model.
As seen in all three examples, models produced by the HRPR method outperform the conventional plain RNN models in terms of both accuracy and speed. Indeed, according to equation (21), the regression units of stage 2 fine-tune the outputs coming from the RNN of stage 1, resulting in better accuracy. The presented results demonstrate that the proposed method needs considerably less training time than the conventional RNN and LSTM methods to model the same circuits without losing accuracy. Also, as the RNN of stage 1 contains considerably fewer parameters than the conventional RNN and LSTM, a large speedup is achieved. That is, using the proposed method for modeling nonlinear circuits leads to a faster model that generates the output in a shorter time.
It is worth noting that the proposed method is not limited to transient simulation of components and circuits. It can be applied to any time-domain (transient or steady-state) simulation. The limitation is the availability of accurate training data: to use this technique for periodic steady-state analysis, enough steady-state training data must be generated and used to train the model. Since neural network-based models, such as the model obtained by the proposed HRPR method, are data-dependent, the functionality of the model directly depends on the generated training data. For example, if we train the model on a specified range of data, we can only expect the model to work well on test data inside that range; to obtain a suitable model outside the range, we should expand the initial range used for generating the training data.
VOLUME 10, 2022

D. DATA REDUCTION IN MODEL DEVELOPMENT
One of the main concerns in modeling nonlinear components is the amount of data required for developing an accurate model, as generating data is usually costly. The proposed HRPR technique not only creates more accurate models but also requires fewer training data than the conventional RNN method to create models of similar accuracy. To show the efficiency of the proposed HRPR method in this respect, the Transmission-Gate example was trained again with different numbers of training waveforms, and the results are shown in Table 7. As can be seen from Table 7, the proposed HRPR method requires half as many training waveforms as the conventional RNN to develop a model of similar accuracy for this device.

E. ROBUSTNESS OF THE PROPOSED METHOD AGAINST NOISY DATA
A model based on the HRPR method can be trained with data generated by simulation software or by measurement tools. When training data are generated by measurement tools, the data will likely contain noise. Therefore, a set of noisy data was generated to train the model based on the HRPR method. The noisy data were generated by adding white Gaussian noise to the original waveforms [36], [47]. Table 8 shows the effect of noise on the performance of the proposed HRPR method for modeling the TG circuit. As can be seen in this table, although the noise in the data slightly degrades the accuracy of the HRPR-based model, an acceptable accuracy is still obtained. These results demonstrate the robustness of the proposed HRPR-based model against noisy data.
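Generating noisy training waveforms of this kind can be sketched as follows. The text does not state the noise level used, so the signal-to-noise ratio parameter `snr_db`, the sinusoidal test waveform, and the function name `add_white_gaussian_noise` are assumptions chosen purely for illustration.

```python
import numpy as np

def add_white_gaussian_noise(waveform, snr_db, rng):
    """Return the waveform plus zero-mean white Gaussian noise at the requested SNR."""
    waveform = np.asarray(waveform, dtype=float)
    signal_power = np.mean(waveform ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return waveform + noise

# Hypothetical clean waveform and a 30 dB SNR (assumed value, not from the paper).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20000)
clean = np.sin(2.0 * np.pi * 5.0 * t)
noisy = add_white_gaussian_noise(clean, snr_db=30.0, rng=rng)

measured_snr_db = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(measured_snr_db)
```

Specifying the noise through an SNR ties its variance to the power of each waveform, so waveforms of different amplitude are perturbed proportionally.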

V. CONCLUSION
In this paper, a hybrid method called HRPR was proposed. The method combines conventional RNN and polynomial regression models and was used for modeling nonlinear electronic components. The input-output waveforms of the original components used in the training and testing procedures were obtained from the SPICE circuit simulator. The new method demonstrated a significant reduction in training time compared to models obtained from the conventional RNN and LSTM modeling methods, owing to its more efficient structure and training procedure. It also showed considerable speedup in inference (simulation) time compared to the models obtained from the conventional RNN and LSTM methods and the transistor-level models in existing circuit simulation tools. Additionally, the models obtained from the proposed structure exhibit lower inference-time error than the models obtained from the conventional RNN structure. When the proposed HRPR method is used for modeling nonlinear components, detailed knowledge of the internal structure of each component or device is not needed. In addition to these advantages, the proposed hybrid method required fewer training waveforms than the conventional RNN method to create models of similar accuracy. Three practical examples were used in this paper to demonstrate the validity of the proposed macromodeling approach.