Remaining Useful Performance Estimation for Complex Analog Circuit Based on Maximal Information Coefficient and Bidirectional Gate Recurrent Unit

The degradation of circuit components is typically accompanied by a deviation of component parameters from their normal values, which can ultimately affect the stable operation of a complex analog circuit. To address this concern, remaining useful performance (RUP), regarded as the useful performance from the current time to the end of performance, is an effective way to ensure system safety by providing early warning of failure and enabling predictive maintenance. In this paper, a novel RUP estimation method based on the two-stage maximal information coefficient (TSMIC) and a bidirectional gate recurrent unit (Bi-GRU) network is proposed. First, run-to-failure data of the circuit are obtained in real time with the RT-LAB hardware-in-the-loop platform. Second, to obtain suitable features reflecting the degradation trend over cycles, a TSMIC method is proposed that eliminates features that hardly change with the degradation cycle in the first stage and mines the mutual information between features in the second stage. Furthermore, a linear regression model is used as a performance evaluation to retain the original pattern in the selected features. Through the fusion of the selected features, health indicators of different circuit components are constructed. Finally, a deep Bi-GRU network, which can extract representative time-series information and explore subtle differences between degradation cycles, is used to generate the prediction results. The proposed framework is verified through a case study on a complex analog circuit, and comparisons with other state-of-the-art methods are presented. The experimental results show the effectiveness and superiority of the proposed approach.


I. INTRODUCTION
Inertial confinement fusion (ICF) [1] is one of the most fascinating approaches to achieving fusion ignition, and several huge laser drivers have been built around the world for such research, e.g., the Laser Megajoule in France and the National Ignition Facility (NIF) in the United States [2]. The SG-III laser facility, the largest laser driver in China, is hosted by the Laser Fusion Research Center of the China Academy of Engineering Physics (CAEP). As the most important equipment of the SG-III laser fusion facility, the MJ-level power condition module (PCM) circuit consists of 53 components in total and is regarded as a complex analog circuit. Under high-voltage and high-current pulses, the circuit components gradually deviate from their nominal values. This gradual degradation affects the performance of the circuit and can even result in catastrophic accidents. Therefore, it is of great significance to propose a prognostics and health management (PHM) approach that facilitates early warning and predictive maintenance of circuit failures.

The associate editor coordinating the review of this manuscript and approving it for publication was Baoping Cai.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

In the past decades, PHM [3], [4] technology has been extensively studied in various engineered systems, e.g., health assessment for electronic components [5], fault diagnosis for three-phase inverters [6], and remaining useful life (RUL) prediction for IGBTs [7], subsea pipelines [8], and batteries [9]. PHM for circuit systems can be divided into the following: 1) fault detection and isolation (FDI) [10]-[13] and 2) the prediction of the remaining useful performance (RUP) [14] of the failing circuit.
At present, PHM research on circuits mainly focuses on FDI technology, and there is little literature on RUP estimation for circuits. Nevertheless, FDI technology cannot prevent circuit faults, since it processes the abnormal output information of the analog circuit only after the fault has occurred. Circuit faults are generally caused by the gradual aging of components, which makes the component values gradually deviate from their normal values until they reach a failure threshold beyond the normal tolerance range [15]. Therefore, accurate and robust RUP estimation for the circuit provides a basis for timely and reasonable maintenance to avoid circuit disasters.
The concept of using RUP for circuit system prognosis was first put forward by the Center for Advanced Life Cycle Engineering (CALCE) of the University of Maryland in 2013 [14]. Here, RUP estimation refers to the prediction of the degradation cycle at which the circuit is no longer able to perform its intended function [16]. RUP estimation is challenging because of component tolerances, the complex fault mechanism, and the interdependence of the circuit components. Section II comprehensively reviews the state-of-the-art RUP estimation approaches for analog circuits in recent years and analyzes their strengths and limitations. Generally, RUP estimation approaches for an analog circuit can be divided into model-based [17] and data-driven approaches [18]. Although these approaches achieve good results for analog circuits, some deficiencies remain:
• Existing research on circuit RUP estimation mainly focuses on DC-DC converters [19] and simple analog circuits, e.g., Sallen-Key bandpass and Biquad low-pass filter circuits (refer to [14], [16], [17]). Additionally, existing model-based RUP estimation approaches have difficulty building precise degradation mechanism models in practice owing to circuit uncertainty and sensor noise.
• As for feature selection, existing methods ignore the nonlinear correlations among features, which exist in most circuit systems. As for the prognostics algorithm, shallow learning models, such as support vector regression (SVR) and relevance vector regression (RVR), often suffer from invalid learning and weak generalization when learning from a large amount of feature data.
This paper proposes a novel prognostic method based on the two-stage maximal information coefficient (TSMIC) and a bidirectional gate recurrent unit (Bi-GRU) network to deal with the above shortcomings. The technical contributions are summarized as follows:
• A comprehensive metric, namely TSMIC, is proposed to select optimal features with deep nonlinear correlations between features over the whole degradation cycle. In the first stage, the main feature set is obtained by eliminating redundant features that have only tiny correlations with the degradation cycle. In the second stage, the optimal feature subset is acquired by comparing the MIC value between any two features in the main feature set, which eliminates features that are not representative of the degradation trend.
• For RUP estimation, a deep Bi-GRU network is devised to track the evolution of the HI and generate high-quality RUP results for circuit components. It has the advantage of using two hidden layers to process the sequence of feature data in two directions, capturing past and future information, respectively. Compared with different prognosis algorithms, i.e., SVR, deep convolutional neural networks (DCNN), long short-term memory (LSTM), gate recurrent unit (GRU), and bidirectional long short-term memory (Bi-LSTM), the experimental results show that the proposed method achieves relatively higher prognosis accuracy for different circuit components.
The rest of this paper is organized as follows: Section II reviews the related work on RUP estimation methods. Section III introduces the background of the MJ-level PCM circuit. Section IV outlines the proposed RUP estimation framework and its theoretical background. Section V presents the experimental results of data processing, feature selection, degradation trend construction, and performance prognostics, respectively. Conclusions and future work are given in Section VI.

II. RELATED WORK OF RUP ESTIMATION METHODS
In the following subsections, state-of-the-art model-based and data-driven RUP estimation approaches are reviewed.

A. MODEL-BASED APPROACHES
Most model-based approaches rely on empirical knowledge of the operating conditions, material characteristics, and failure mechanisms to build mathematical models. Representative examples include regression-based methods, the hidden Markov model [20], the Weibull distribution model [21], the Kalman filter, and the particle filter [22].
The regression-based method assumes that the future state value of the analog circuit depends linearly on its previous observations and a stochastic term [23]. Two conventional models, the autoregressive moving average (ARMA) model and the autoregressive integrated moving average (ARIMA) model, have been widely utilized for RUP estimation of analog circuits. For instance, in [24], the ARMA model is combined with a particle filtering algorithm to predict the RUP of analog circuits. In [25], Ibrahim et al. employed an ARIMA model and a polynomial regression method for RUP estimation in the wavelet domain.
Nevertheless, the regression-based method mainly depends on the trend of historical data of circuit components, which may lead to an inaccurate RUP estimation over the degradation cycle.
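To make the regression assumption concrete, the following minimal sketch fits a first-order autoregressive model by least squares and iterates it forward until a failure threshold is crossed. It is an illustration only, not the method of [24] or [25]; the function names, the AR(1) order, and the synthetic drift of 0.8 per cycle are assumptions chosen for the example.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] + noise."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    cov = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    var = sum((a - mean_prev) ** 2 for a in prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi


def cycles_until_threshold(series, threshold, max_steps=10_000):
    """Iterate the fitted AR(1) model until the forecast crosses the threshold."""
    c, phi = fit_ar1(series)
    x = series[-1]
    rising = threshold > x
    for step in range(1, max_steps + 1):
        x = c + phi * x
        if (rising and x >= threshold) or (not rising and x <= threshold):
            return step
    return None  # threshold not reached within max_steps


# Hypothetical history: a component value drifting upward by 0.8 per cycle.
history = [200.0 + 0.8 * k for k in range(50)]
print(cycles_until_threshold(history, threshold=260.4))
```

Because the forecast only extrapolates the historical trend, any change in the degradation rate after the last observation is invisible to the model, which is the limitation noted above.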
The underlying assumption of Markov models is that the degradation process of the analog circuit evolves in a finite state space and obeys the Markov property [20]. Kharoufeh and Cox [26] proposed a RUP estimation method based on the Markov model and a stochastic failure model to numerically compute the RUP distribution and its moments. Subsequently, the hidden Markov model has been employed for prognostics to overcome the drawback that the Markov model cannot represent the hidden health state of the degradation process. Besides, the Weibull distribution model has been used to study the lifetime distribution of key components of a complex analog circuit such as that of the NIF in the USA [21].
Kalman filter and particle filter methods are commonly used for RUP estimation of analog circuits. The Kalman filter can update and process the degradation data of circuit components in real time, and there is no need to store circuit state estimation parameters or circuit observation data. Celaya et al. [27] used an accelerated aging test to obtain the degradation trend of the MOSFET on-resistance and then used an extended Kalman filter algorithm to model and predict the degradation trend of the circuit. In the particle filter method, the Bayesian update and the particles carrying probability information of unknown parameters are processed in a certain order, which can solve the time series prediction problem well [22].
In summary, the above model-based approaches can produce accurate prediction results, but the damage mechanics of circuit components must be analyzed. In contrast, data-driven prognostic approaches are attracting increasing attention with their powerful generalization abilities toward complex engineering systems and the rapid development of artificial intelligence.

B. DATA-DRIVEN APPROACHES
Data-driven approaches mainly include four parts: feature extraction, feature selection, health indicator (HI) or fault indicator (FI) construction, and RUP estimation.

1) FEATURE EXTRACTION
The feature sets usually include three types: time-domain features, frequency-domain features, and time-frequency-domain features. The time-domain and frequency-domain features can capture the amplitude and spectrum differences between the fault state and the normal state of the signal, respectively. The time-frequency features can reflect the joint distribution information in the time and frequency domains. In the existing literature, Long et al. extracted high-order time-domain statistics as features for the diagnosis of analog circuits [28]. Circuit features have also been obtained via the fast Fourier transform (FFT) and sweep frequency response analysis of the output signal. Time-frequency analysis methods, e.g., the cross wavelet transform [29] and the S-transform [30], have been proposed to obtain a time-frequency signal [31]. In this paper, the output response signal of the MJ-level PCM circuit changes significantly during the whole degradation cycle, and the time-domain features can reflect more detailed information on the degradation process.
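As an illustration of time-domain statistics of the kind used later in this paper (Mean, RMS, Skewness, Kurtosis, Margin factor, and Impulsion index), the sketch below computes them for one voltage record. The formulas are the definitions commonly used in condition monitoring and are an assumption here, since they may differ in detail from those tabulated by the authors; the function name and the synthetic pulse are hypothetical.

```python
import math


def time_domain_features(x):
    """Six time-domain statistics of one sampled voltage record x.

    Assumed (common) definitions: MF = peak / (mean sqrt|x|)^2,
    II = peak / mean|x|. A constant record has zero standard deviation
    and would raise ZeroDivisionError for skewness and kurtosis.
    """
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    skewness = sum((v - mean) ** 3 for v in x) / (n * std ** 3)
    kurtosis = sum((v - mean) ** 4 for v in x) / (n * std ** 4)
    peak = max(abs(v) for v in x)
    mean_abs = sum(abs(v) for v in x) / n
    mean_sqrt_abs = sum(math.sqrt(abs(v)) for v in x) / n
    return {
        "Mean": mean,
        "RMS": rms,
        "Skewness": skewness,
        "Kurtosis": kurtosis,
        "MF": peak / mean_sqrt_abs ** 2,
        "II": peak / mean_abs,
    }


# Hypothetical record: an exponentially decaying pulse of 100 samples.
pulse = [math.exp(-k / 20.0) for k in range(100)]
for name, value in time_domain_features(pulse).items():
    print(f"{name}: {value:.4f}")
```

Applying the same function to the record of each sensor at each degradation cycle yields one feature trajectory per statistic and sensor.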

2) FEATURE SELECTION
At present, various feature selection metrics have been put forward to eliminate redundant features and retain representative ones. Javed et al. [32] developed a trendability metric to calculate the relationship between a feature and the degradation cycle. Monotonicity was proposed to reflect the irreversibility of degradation processes. However, the above metrics are more sensitive to linear correlation than to nonlinear correlation. If the selected features are uncorrelated, weakly correlated, or redundantly correlated, the following consequences can result: 1) the more features are selected, the more time-consuming their analysis becomes; 2) selecting too many features can cause the ''dimension disaster''. In contrast, the maximal information coefficient (MIC) is a more effective metric, which performs well in detecting nonlinear information in data [33]. For example, Bing et al. [15] proposed a feature selection algorithm based on MIC to process continuous data and identify nonlinear functions. Zhang et al. [16] proposed a McTwo algorithm that measures the MIC associations of all features with the class labels. For these reasons, this paper proposes an improved TSMIC algorithm to obtain the optimal feature set.

3) HI/FI CONSTRUCTION
The HI/FI curve characterizes the degradation trend of the circuit. Circuit prognosis is implemented by generating long-term predictions of the HI/FI curve until the failure threshold is reached. In [22], Li et al. proposed that the FI of the circuit components can be calculated from several special parameter features extracted from the output frequency-domain signals. Yan et al. [34] applied the Markov distance to construct a virtual FI from the circuit output response. Zhang et al. [16] proposed a continuous parameter calculation method characterized by the correlation of the output voltage, which calculated the HI as the $\cos(\cdot)$ and $\sin^{-1}(\cdot)$ of the distance between the test feature and the feature extracted from the circuit response. Nevertheless, all the above HI/FI construction methods are idealized, i.e., they assume linear HI/FI curves and do not calculate the appropriate HI/FI according to the different degradation effects of the circuit components. Thus, a multi-feature fusion approach is proposed in this paper to construct a multi-scale HI model that represents the degradation of the MJ-level PCM circuit components.

4) PROGNOSTICS ALGORITHM
There are many shallow learning models, e.g., SVR [35] and RVR [36], which have been widely implemented in RUP estimation. Nevertheless, these methods cannot reveal the complex inherent relationships between the root cause of failure and the signal signatures, and they often suffer from invalid learning and weak generalization when trained on a large number of fault features.
Nowadays, Bayesian networks (BNs) [37] and dynamic Bayesian networks (DBNs) [38] have many applications in fault diagnosis [39] and are increasingly applied in RUL prediction [8]. Cai et al. [8] proposed a hybrid RUL estimation approach for structure systems that considers the influence of multiple causes by using DBNs. In [40], BN models were used to perform the reliability evaluation of subsea blowout preventer control systems. In addition, DCNNs are emerging as a highly effective neural network architecture for RUP estimation. Lei et al. [41] proposed a new data-driven approach based on DCNN using raw sensor data. Compared to shallower networks, the DCNN has a strong ability to capture basic features, but it ignores the relevance of time series signals over the degradation cycle.
RNNs can learn the latent representation from the entire history of the previous inputs to the target vector by establishing connections between units that form a directed cycle. RNNs, particularly the LSTM first introduced in [42], are developing into powerful models for reducing the difficulty of learning long-term dependencies in condition-based maintenance data. In the existing literature, an LSTM-based encoder-decoder scheme [43] is proposed to obtain unsupervised prognosis results using multi-sensor time-series data. However, LSTM can only access previous information, not future information. For this reason, a novel prognostic method based on Bi-LSTM is developed by Huang et al. in [44]. This method can capture not only previous information but also future information. In [45], a multi-scale dense GRU network is proposed by Ren et al. The GRU uses a type of hidden unit motivated by the LSTM unit. It combines the forget gate and the input gate into a single update gate and merges the cell state and the hidden state. The resulting model is simpler than the standard LSTM model and is a very popular variant. Therefore, in this paper, the Bi-GRU is used to process the selected features, followed by linear regression layers for generating high-quality prediction results.

III. BACKGROUND OF SG-III LASER FACILITY
A. INTRODUCTION OF SG-III LASER FACILITY
Fig. 1(a) shows the schematic of the SG-III laser facility. Fig. 1(b) and (c) show the architecture and the Beam Reverser in the laser bay. Most of the architecture has been constructed in the laser bay, including four bundles of laser beams with the beam parameter measurement assembly, the whole front-end system, the energy system, the target system, and the beam integrated diagnostic system. Therein, the MJ-level PCM circuit is the most important high-power pulsed circuit in the energy system. The MJ-level PCM circuit consists of the pre-ionization and main ionization circuits. Because the circuit system is enormous and cannot be destroyed, the degradation performance of the MJ-level PCM circuit can only be studied through simulation experiments. Fig. 2 shows the structure of the MJ-level PCM circuit, which includes 53 components in total, i.e., main ionization capacitors ($C_1 \sim C_{10}$), inductors ($L_1 \sim L_{10}$), and resistors ($R_1 \sim R_{10}$); the pre-ionization capacitor $C_0$, inductor $L_0$, and resistor $R_0$; ballast inductors ($L_{B1} \sim L_{B10}$); and Xenon lamps ($\mathrm{Lamp}_1 \sim \mathrm{Lamp}_{10}$). In operation, the pre-ionization circuit generates high-voltage pulses with a pulse width of 120 µs through the pulse-forming component to provide energy. Then, the Xenon lamp and the pre-ionization circuit are initially triggered. After pre-ionization, the main ionization capacitor is discharged, which provides a high-voltage pulse for the Xenon lamp; the pulse duration of the main ionization circuit is 470 ∼ 920 µs. The simulated charging voltages of pre-ionization and main ionization are 12 kV and 23 kV, respectively. The specifications of the MJ-level PCM circuit are shown in Table 1.

1) PRE-IONIZATION CIRCUIT
The pre-ionization circuit is composed of a pre-ionization capacitor, a pre-ionization inductor, and a pre-ionization resistor in series, with nominal values of 15 µF, 100 µH, and 100 mΩ, respectively. Without pre-ionization, the ultra-high pulse voltage of the main ionization may lead to the breakdown of circuit components, thus affecting the stability of the MJ-level PCM circuit. The pre-ionization circuit is equipped with one sensor (sensor1) to obtain voltage signals. The pre-ionization pulse charges the Xenon lamps 250 µs ahead of the main ionization pulse, forming a stable discharge channel and thus improving the efficiency of the main ionization circuit.

2) MAIN IONIZATION CIRCUIT AND LOAD CIRCUIT
The main ionization circuit and load circuit are equipped with sensor2 and sensor3, respectively. To form the critical damping current waveform, the parameters of the main ionization circuit are relatively fixed. Each branch is composed of a main ionization capacitor, a main ionization inductor, and a main ionization resistor in series, with nominal values of 90 µF, 150 µH, and 200 mΩ, respectively. The damping elements (inductor and resistor) not only limit the peak value of the fault current but also absorb energy, which effectively protects the main ionization circuit. The main ionization circuit has 10 groups of ballast inductors in parallel so that the current is distributed equally among the Xenon lamps of the load circuit.

IV. PROPOSED FRAMEWORK & THEORETICAL
As shown in Fig. 3, the proposed prognosis framework for the MJ-level PCM circuit includes five steps.
Step1: Data acquisition. With the MJ-level PCM circuit excited under different degradation conditions, the degradation data are acquired by adjusting the value of a circuit component so that it deviates gradually from its nominal value, and the output voltages of the three sensors are gathered from the RT-LAB simulator in real time.
Step2: Feature extraction. The time-domain features are extracted from the obtained output voltage under different degradation cycles, and the related life cycles are recorded.
Step3: Feature selection. A novel TSMIC algorithm is proposed to reduce the dimension of the data and remove some unrelated features.
Step4: HI construction. The selected feature set is used to construct HIs by the linear regression model. Additionally, the failure threshold and variation limitation in different HI curves are determined for different circuit components.
Step5: RUP estimation. The training feature set and HIs are fed into the Bi-GRU network for model training. In the testing stage, the circuit output for a real signal is encoded into a testing set by the same feature extraction and feature selection methods. Ultimately, the testing set is fed into the trained model to obtain the RUP.
The following sections describe the theoretical background of the five steps involved in the proposed RUP estimation framework.

A. FEATURE EXTRACTION
For the MJ-level PCM circuit, the probability of a single-fault condition is higher than that of a multiple-fault condition. Therefore, this paper assumes a single-fault condition to demonstrate the proposed prognosis framework. A component in the circuit is randomly selected for the simulated degradation experiment, while the other circuit components vary within their tolerance ranges. The output voltage signals of the sensors are obtained as a set of one-dimensional run-to-failure time series. Since the output response signal of the MJ-level PCM circuit in the time domain shows a clear increasing or decreasing trend with the degradation cycle, time-domain signal processing methods are used to extract, transform, and analyze the measured response signal data. Thus, as listed in Table 2, the time-domain features, namely Mean, Skewness, Margin factor (MF), Root mean square (RMS), Kurtosis, and Impulsion index (II), are extracted from the output voltage, where $x$ is the voltage signal, $i$ is the sample index, and $N$ is the number of samples.

The MIC [33] aims to discover and classify data with either linear or other functional relationships. A higher MIC value indicates a stronger dependence between the feature data, while a lower MIC value means less dependence.
The basic principle builds on mutual information (MI). To compute the MI, the data points $(x, y)$ are partitioned by a grid in the XY plane. Given a finite set of ordered pairs $C = \{(x_i, y_i),\ i = 1, 2, \ldots, n\}$, the X and Y axes are divided into $s$ and $t$ bins, respectively, which yields a grid $T$ of shape $s \times t$. In practice, the number of grids is fixed, and the MI can be represented as:

$$I(x; y) = \sum_{i} \sum_{j} p(x_i, y_j) \log \frac{p(x_i, y_j)}{p(x_i)\, p(y_j)}$$

where $p(x_i, y_j)$ is the joint probability density, and $p(x_i)$ and $p(y_j)$ are the marginal densities.
To facilitate comparison and weighting of the MI values on the same scale, normalization is applied to map the MI into the range from 0 to 1, namely $I^{*} \in [0, 1]$. The element in the $s$th row and $t$th column of the feature matrix $M(C)$ on dataset $C$ is:

$$M(C)_{s,t} = I^{*} = \frac{\max_{T} I(x; y)}{\log \min\{s, t\}}$$

where the maximum is taken over all grids $T$ of shape $s \times t$. The maximum entry of $M(C)$ is taken as the value of MIC. Higher MIC values represent stronger dependency between variables, whereas lower MIC values mean weaker correlation. The formula for MIC is:

$$\mathrm{MIC}(C) = \max_{s \times t < B(n)} M(C)_{s,t}$$

where $\mathrm{MIC}(C)$ is the MIC value of the two variables ($x$ and $y$) in dataset $C$, and $B(n)$ is a function of the sample size $n$, which is usually expressed as $B(n) = n^{0.6}$ [33].
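The MIC definition and the two-stage idea can be sketched in a few lines. The sketch is a simplification under stated assumptions: it evaluates only equal-frequency grids instead of searching for the optimal partition of [33], so it gives a lower bound on the true MIC; the stage-2 rule shown (drop a feature whose MIC with an already kept feature is too high) is one plausible reading of the second stage, not the authors' confirmed rule; and the thresholds `relevance_min` and `redundancy_max`, like all function names, are hypothetical.

```python
import math
from collections import Counter


def _bins(values, k):
    """Equal-frequency binning into k bins; tied values share one bin."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    current = 0
    for rank, i in enumerate(order):
        if rank == 0 or values[i] != values[order[rank - 1]]:
            current = rank * k // len(values)
        out[i] = current
    return out


def _mutual_information(bx, by):
    """MI in bits between two discretised variables."""
    n = len(bx)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def mic_approx(x, y):
    """Max over s-by-t grids with s*t < n**0.6 of MI / log2(min(s, t))."""
    budget = len(x) ** 0.6
    best = 0.0
    s = 2
    while 2 * s < budget:
        t = 2
        while s * t < budget:
            mi = _mutual_information(_bins(x, s), _bins(y, t))
            best = max(best, mi / math.log2(min(s, t)))
            t += 1
        s += 1
    return best


def tsmic_select(features, cycles, relevance_min=0.8, redundancy_max=0.95):
    """Two-stage selection; features maps a name to its value per cycle."""
    # Stage 1: keep features that depend strongly on the degradation cycle.
    relevance = {name: mic_approx(v, cycles) for name, v in features.items()}
    main = [name for name in sorted(relevance, key=relevance.get, reverse=True)
            if relevance[name] >= relevance_min]
    # Stage 2 (assumed rule): compare features pairwise and drop one that is
    # almost fully determined by a feature that was already kept.
    selected = []
    for name in main:
        if all(mic_approx(features[name], features[kept]) < redundancy_max
               for kept in selected):
            selected.append(name)
    return selected


cycles = list(range(100))
features = {
    "S1_RMS": [c * c for c in cycles],           # nonlinear but monotone trend
    "S1_Mean": [5.0] * 100,                      # does not change with the cycle
    "S2_RMS": [2 * c * c + 1 for c in cycles],   # duplicates the S1_RMS trend
}
print(tsmic_select(features, cycles))
```

A full implementation would replace `mic_approx` with the dynamic-programming grid search of [33]; the two-stage structure around it stays the same.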

C. HEALTH INDICATOR CONSTRUCTION
The HI model reflects the evolution of the degradation in any of the circuit's critical components by fusing multiple features according to the influence of the degradation of different circuit components. In this paper, multiple features are fused to form a one-dimensional vector, which is defined as the HI curve and indicates the degradation trend of the circuit component. The training data $\{x^{(u)}, Y^{(u)}\}$ relate the multi-features to the HI. In [46], logistic regression is proposed to fuse the multi-features into one-dimensional HIs, and the HIs are then used to predict the RUP through the ARMA model. However, it is found that the original degradation pattern of the MJ-level PCM circuit is distorted by logistic regression, which may lead to inaccurate prognosis output and larger RUP error. Consequently, this paper uses a linear regression model as a performance assessment to retain the original pattern in the selected features:

$$Y^{(u)} = \alpha + \sum_{i=1}^{p} \beta_i x_i^{(u)} + \varepsilon$$

where $x^{(u)}$ is the $p$-dimensional selected feature vector with elements $x_i^{(u)}$, $Y^{(u)}$ is the actual value of the $u$th HI curve in the training set, $\alpha$ is the bias, $(\beta_1, \beta_2, \ldots, \beta_p)$ are the weight parameters, and $\varepsilon$ is the noise term. Under the healthy condition, the corresponding HI value of circuit components is close to 1.
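The linear fusion of selected features into an HI can be illustrated with an ordinary least-squares fit. This is a minimal sketch: the function names are hypothetical, the target HI labels in the example (decaying from 1) are synthetic, and solving the normal equations by Gaussian elimination is chosen only to keep the example dependency-free.

```python
def fit_linear_hi(features, hi):
    """Least-squares fit of hi = alpha + sum_i beta_i * x_i.

    features: one p-dimensional feature vector per degradation cycle.
    Collinear feature columns make the system singular (zero pivot).
    """
    rows = [[1.0] + list(r) for r in features]   # prepend the intercept column
    m = len(rows[0])
    # Normal equations A w = b with A = X^T X and b = X^T y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    b = [sum(r[i] * y for r, y in zip(rows, hi)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(col + 1, m):
            factor = a[k][col] / a[col][col]
            for j in range(col, m):
                a[k][j] -= factor * a[col][j]
            b[k] -= factor * b[col]
    # Back substitution.
    w = [0.0] * m
    for i in reversed(range(m)):
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, m))) / a[i][i]
    return w[0], w[1:]   # alpha, [beta_1, ..., beta_p]


def health_indicator(alpha, betas, x):
    """Fuse one selected feature vector x into a scalar HI value."""
    return alpha + sum(beta * xi for beta, xi in zip(betas, x))


# Synthetic example: two features per cycle and an HI that starts at 1.
feats = [[float(k), float(k * k)] for k in range(10)]
target = [1.0 - 0.01 * f[0] - 0.002 * f[1] for f in feats]
alpha, betas = fit_linear_hi(feats, target)
print(alpha, betas, health_indicator(alpha, betas, feats[0]))
```

Because the model is linear in the features, the shape of the fused HI curve follows the shape of the selected features, which is the property the authors rely on when preferring it to logistic regression.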

D. BIDIRECTIONAL GATE RECURRENT UNIT NETWORK
A GRU network can only capture the dependence of the current state on the previous state (i.e., the forward direction in context). In contrast, the core idea behind Bi-GRU is that two separate hidden layers are used to process the sequence of feature data in two directions (i.e., forward and backward) to capture past and future information, respectively. The Bi-GRU network is shown in Fig. 4. The following equations describe the function of the separate hidden layers at degradation cycle $t$, and two different arrow symbols, i.e., → and ←, denote the forward and backward processes, respectively.
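For reference, one common GRU formulation for the two directions is sketched here. It is a standard textbook form given under assumptions, not necessarily the authors' exact notation: the gate symbols $z_t$ (update gate), $r_t$ (reset gate), and $\tilde{h}_t$ (candidate state), as well as the recurrent weights $W$, are introduced for illustration, and the gate subscripts on $U$, $W$, and $b$ are likewise assumed.

```latex
% Forward direction (past to future); standard GRU form, symbols partly assumed
\begin{aligned}
\overrightarrow{z}_t &= \sigma\!\left(\overrightarrow{U}_z x_t + \overrightarrow{W}_z \overrightarrow{h}_{t-1} + \overrightarrow{b}_z\right) \\
\overrightarrow{r}_t &= \sigma\!\left(\overrightarrow{U}_r x_t + \overrightarrow{W}_r \overrightarrow{h}_{t-1} + \overrightarrow{b}_r\right) \\
\overrightarrow{\tilde{h}}_t &= \tanh\!\left(\overrightarrow{U}_h x_t + \overrightarrow{W}_h \left(\overrightarrow{r}_t \odot \overrightarrow{h}_{t-1}\right) + \overrightarrow{b}_h\right) \\
\overrightarrow{h}_t &= \left(1 - \overrightarrow{z}_t\right) \odot \overrightarrow{h}_{t-1} + \overrightarrow{z}_t \odot \overrightarrow{\tilde{h}}_t
\end{aligned}

% Backward direction (future to past): same structure, run from t+1 to t
\begin{aligned}
\overleftarrow{z}_t &= \sigma\!\left(\overleftarrow{U}_z x_t + \overleftarrow{W}_z \overleftarrow{h}_{t+1} + \overleftarrow{b}_z\right) \\
\overleftarrow{r}_t &= \sigma\!\left(\overleftarrow{U}_r x_t + \overleftarrow{W}_r \overleftarrow{h}_{t+1} + \overleftarrow{b}_r\right) \\
\overleftarrow{\tilde{h}}_t &= \tanh\!\left(\overleftarrow{U}_h x_t + \overleftarrow{W}_h \left(\overleftarrow{r}_t \odot \overleftarrow{h}_{t+1}\right) + \overleftarrow{b}_h\right) \\
\overleftarrow{h}_t &= \left(1 - \overleftarrow{z}_t\right) \odot \overleftarrow{h}_{t+1} + \overleftarrow{z}_t \odot \overleftarrow{\tilde{h}}_t
\end{aligned}
```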
where $\overrightarrow{\Theta}_{\text{Bi-GRU}}$ and $\overleftarrow{\Theta}_{\text{Bi-GRU}}$ are the parameter sets of the forward and backward processes, which are shared by all degradation cycles and learned during model training. $h_{t-1}$ represents the hidden state at degradation cycle $(t-1)$. $\overrightarrow{b}$ and $\overleftarrow{b}$ are the bias weights of the forward and backward processes. $\overrightarrow{U}$ and $\overleftarrow{U}$ are the input weights of the forward and backward processes. $\sigma$ and $\tanh$ (hyperbolic tangent) are pointwise nonlinear activation functions, and $\odot$ denotes pointwise multiplication of two vectors.
In the first stage of RUP estimation, the input of the Bi-GRU network includes the selected features and HIs, which compose the training set $\{x^{(u)}, Y^{(u)}\}_{u=1}^{n}$. Therein, the input feature is $x^{(u)} = [x_p^{(1)}, \ldots]$, and the first stage is expressed as

$$H_u = f\left(x^{(u)};\ \overrightarrow{\Theta}_{\text{Bi-GRU}},\ \overleftarrow{\Theta}_{\text{Bi-GRU}}\right)$$

where $f$ denotes the hidden layer function of the Bi-GRU network, which is defined by Eq. (5) and (6). $H_u$ represents the $u$th sequence of the output functions processed by the first stage of the Bi-GRU network, which is characterized by the parameter sets ($\overrightarrow{\Theta}_{\text{Bi-GRU}}$ and $\overleftarrow{\Theta}_{\text{Bi-GRU}}$). The complete output $h_t$ at degradation cycle $t$ is the sum of the output elements from the forward and backward processes, which can be calculated as:

$$h_t = \overrightarrow{h}_t \oplus \overleftarrow{h}_t$$

where $\oplus$ represents the element-wise sum of the two vectors.
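The bidirectional processing described above can be sketched as a plain forward pass. This is a minimal illustration under assumptions: it uses the common GRU gate convention, random untrained parameters, and hypothetical names (`init_gru`, `gru_step`, `bi_gru`); training, dropout, and the fully connected layers are omitted.

```python
import math
import random


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def init_gru(n_in, n_hidden, rng):
    """Random parameters for one direction (hypothetical initialisation)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    # One input matrix U, recurrent matrix W, and bias b per gate.
    return {g: {"U": mat(n_hidden, n_in), "W": mat(n_hidden, n_hidden),
                "b": [0.0] * n_hidden} for g in ("z", "r", "h")}


def gru_step(p, x, h_prev):
    """One GRU update: gates z and r, candidate state, new hidden state."""
    def pre(gate, h):
        return [a + b + c for a, b, c in
                zip(_matvec(p[gate]["U"], x), _matvec(p[gate]["W"], h), p[gate]["b"])]
    z = _sigmoid(pre("z", h_prev))
    r = _sigmoid(pre("r", h_prev))
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(a) for a in pre("h", gated)]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


def bi_gru(seq, fwd, bwd, n_hidden):
    """Run the sequence in both directions and sum the states per cycle."""
    h = [0.0] * n_hidden
    forward = []
    for x in seq:                      # past to future
        h = gru_step(fwd, x, h)
        forward.append(h)
    h = [0.0] * n_hidden
    backward = []
    for x in reversed(seq):            # future to past
        h = gru_step(bwd, x, h)
        backward.append(h)
    backward.reverse()                 # realign with the cycle index
    return [[a + b for a, b in zip(f, g)] for f, g in zip(forward, backward)]


rng = random.Random(0)
fwd, bwd = init_gru(3, 4, rng), init_gru(3, 4, rng)
sequence = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
outputs = bi_gru(sequence, fwd, bwd, 4)
print(len(outputs), len(outputs[0]))
```

Each output vector at cycle $t$ therefore depends on the features before and after $t$, which is the property that distinguishes the Bi-GRU from a single forward GRU.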
In the second stage of RUP estimation, the output of the first stage is fed to the fully connected layers (FCs) to seek a higher level of representation, which can be defined by the following equation:

$$o_u = g\left(W_u H_u + b_u\right)$$

where $o_u$ is the output vector of an FC layer. $\Theta_{FC}$ represents the parameter set, i.e., the weight matrix $W_u$ and the bias vector $b_u$. $g(\cdot)$ denotes the activation function of the neurons, which is set as a rectified linear unit (ReLU). Dropout is adopted in the Bi-GRU network to mitigate overfitting. Ultimately, the final learned output features of the FC layers are fed into the linear regression layer, and the prognosis results are generated:

$$y_u = W_o\, o_u$$

where $y_u$ is the predicted RUP, and $W_o$ denotes the weight vector of the final linear regression layer.

As illustrated by Fig. 5, the simulation experiment of the MJ-level PCM circuit was built in the OP5600 simulator, which constructs a circuit response database containing multiple degradation conditions and transmits the multi-sensor voltage signals to the PC. The circuit response was captured at the output using a National Instruments (NI) USB-6212 data acquisition board. The data were recorded using LabVIEW on the PC. The experiment operations and the degradation parameter settings of the different circuit components are implemented in the OP5607 controller. After the circuit degradation data are collected, the proposed data-driven approach based on the TSMIC and Bi-GRU algorithms is used to track the evolution of the HI and predict the RUP of the MJ-level PCM circuit.

Tolerance range, Failure Threshold, and Variation Limitation:
The relationship among the tolerance range, failure threshold, and variation limitation should be predefined. There are two mainstream viewpoints in the existing literature: 1) One viewpoint is that a circuit component with ±30% deviation from its nominal value (regardless of its tolerance range) should be regarded as failed [14], [47]-[49]. 2) Another viewpoint is that the circuit is regarded as failed when a component deviates beyond its tolerance range. According to the characteristics of different circuit components, the tolerance is set in one of the following three ways: ±1% for all the components [50]; ±5% for resistors and ±10% for capacitors, as adopted in previous work [17]; or ±10% for both resistors and capacitors. Because the MJ-level PCM circuit contains a large number of components, including resistors, capacitors, inductors, and Xenon lamps, it is necessary to consider both the failure threshold and the tolerance range.
Therefore, this paper adopts the viewpoint that the MJ-level PCM circuit reaches failure if a component value has exceeded a predefined failure threshold (±30% deviation from its nominal value). Since the value of a circuit component cannot increase or decrease without limit, the variation limitation of each circuit component is taken as ±40% deviation from its nominal value. Furthermore, according to the influence of the degradation of different circuit components on the distortion of the output voltage waveforms, which is shown in Section IV-A-(2), the tolerance ranges of the components are divided into four cases according to the importance of the components of the MJ-level PCM circuit: ±10% for resistors, ±5% for main ionization capacitors/inductors, ±3% for pre-ionization capacitors/inductors, and ±1% for Xenon lamps/ballast inductors, respectively.

1) DEGRADATION PARAMETER SETTING
According to the above definition, the variation limitation and failure threshold of the circuit components are obtained.
Here, $Value_1$ represents the variation step of the experimental component in each degradation cycle. The value of each component increases or decreases by equal steps with the degradation cycle. The degradation parameters of the MJ-level PCM circuit are recorded in Table 3. Take $R_1$ as an example: the nominal value of $R_1$ is 200 kΩ. Since the variation limitation of each circuit component is ±40% deviation from its nominal value, the upper and lower values of the variation limitation are 280 kΩ and 120 kΩ, respectively. If the value of $R_1$ increases by 0.8 kΩ in each variation step, 100 degradation cycles are needed to go from 200 kΩ to 280 kΩ. If the value of $R_1$ increases by 0.4 kΩ in each variation step, 200 degradation cycles are needed to go from 200 kΩ to 280 kΩ. Besides, the upper values of the failure threshold and tolerance range are 260 kΩ and 220 kΩ, and the lower values are 140 kΩ and 180 kΩ, respectively. The other components are handled in the same way.
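The arithmetic of the $R_1$ example can be reproduced with a short helper. This is a sketch for checking the numbers only; the function name and dictionary keys are hypothetical, the default percentages are those stated above for resistors, and values are in the same unit as the nominal value.

```python
def degradation_plan(nominal, step, tolerance=0.10, failure=0.30, limit=0.40):
    """Bounds and cycle count for one component.

    Defaults follow the text: +/-10% tolerance for resistors, +/-30% failure
    threshold, and +/-40% variation limitation.
    """
    span = nominal * limit
    return {
        "variation_limit": (nominal - span, nominal + span),
        "failure_threshold": (nominal * (1 - failure), nominal * (1 + failure)),
        "tolerance_range": (nominal * (1 - tolerance), nominal * (1 + tolerance)),
        # Equal steps from the nominal value to either variation limit.
        "cycles_to_limit": round(span / step),
    }


# R_1 example from the text: nominal value 200, variation step 0.8 or 0.4.
for step in (0.8, 0.4):
    plan = degradation_plan(nominal=200.0, step=step)
    print(step, plan["cycles_to_limit"], plan["failure_threshold"])
```

With the ±30% threshold and equal steps, failure is reached after three quarters of the cycles needed to reach the variation limitation, e.g., cycle 75 of 100 for the 0.8 step.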

2) MULTI-SENSORS OUTPUT VOLTAGE
For the MJ-level PCM circuit, the degradation of different components has different effects on the output voltage.

a: MAIN IONIZATION CAPACITORS/INDUCTORS
The degradation of the main ionization capacitors ($C_1 \sim C_{10}$) and inductors ($L_1 \sim L_{10}$) only affects the output voltage waveform of the main ionization circuit. Because the main ionization circuit has ten branches, the failure of one component has less influence on the circuit system. Fig. 6 (a) and (b) show the output voltage response corresponding to different values of $C_1$ and $L_1$, where 0, −20%, and 20% denote the output voltage at the nominal value, at a 20% decrease from the nominal value, and at a 20% increase from the nominal value, respectively.

b: PRE-IONIZATION CAPACITORS/INDUCTORS
The pre-ionization circuit plays an essential role in transition ionization. Moreover, the pre-ionization circuit has only one branch. The degradation of the pre-ionization capacitor and inductor will affect the output voltage of the pre-ionization circuit. Fig. 6 (c) and (d) show the output voltage response corresponding to different values of $C_0$ and $L_0$.

c: XENON LAMPS AND BALLAST INDUCTORS
The arc plasma produced by the discharge of the Xenon lamp depends strongly on the load current and exhibits a nonlinear dynamic resistor characteristic. The resistor coefficient of the Xenon lamp is $K = 1.27\,(p/450)^{0.2}\,(4l/d)$, where $p$ is the internal pressure of the Xenon lamp, $l$ is the arc length of the Xenon lamp, and $d$ is the inner diameter of the lamp tube. According to the Gonze model, the nonlinear dynamic resistor of the Xenon lamp is expressed in terms of $K$ and the load current. The resistor coefficient of the Xenon lamp is set to $K = 100$ in this paper, and the degradation of the Xenon lamp is simulated by adjusting the value of $K$. Furthermore, the function of the ballast inductor is to distribute the current evenly among the branches of the load circuit.

... in different cycles for the training set and testing set. The essential information is found in Table 4. There are 18 features in total, which are grouped by sensor (sensor1, sensor2, sensor3); their details are shown in Table 5. For example, S1_RMS represents the root mean square of sensor1's output voltage. More formally, as shown in Fig. 7, the raw training feature set is recorded as $F = [F_1, F_2, \ldots, F_m, \ldots, F_{18}]$.

Fig. 8 shows some of the features extracted from the training set; different features respond differently to component degradation. Every feature contains 159 multivariate time series, and the cycle number ranges from 100 to 200. However, not all features indicate the health degradation well; such features are invalid and may not perform well in RUP estimation. The features in Fig. 8 (a) and (b) are poorly correlated with the degradation cycle and are removed in the first stage of feature selection. The features in Fig. 8 (c) and (d) present different attenuation curves with a lot of clutter; they are poorly correlated with other features and are deleted in the second stage of feature selection. Reasonable features correlate well with the process of health degradation and show a monotonically increasing or decreasing trend. The features in Fig. 8 (e) and (f) represent such a regular degradation trend and will therefore improve the prognosis accuracy.
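As a small illustration of the feature naming used above, the RMS feature of each sensor can be computed as in the following sketch. The helper names `rms` and `rms_features` are hypothetical, and the waveform values are made up. Only the RMS feature is shown, because the remaining features are defined in Table 5.

```python
import math


def rms(samples):
    """Root mean square of a sequence of output-voltage samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def rms_features(sensor_waveforms):
    """Map per-sensor waveforms to named features: S1_RMS, S2_RMS, ..."""
    return {f"S{k}_RMS": rms(w) for k, w in enumerate(sensor_waveforms, start=1)}


# Illustrative (made-up) waveforms for the three sensors in one degradation cycle.
features = rms_features([
    [1.0, -1.0, 1.0, -1.0],
    [0.0, 2.0, 0.0, -2.0],
    [3.0, 4.0],
])
print(sorted(features))  # ['S1_RMS', 'S2_RMS', 'S3_RMS']
```

Repeating this for every degradation cycle yields one time series per feature, which is the form in which the features enter the selection stage.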

D. FEATURE SELECTION BASED ON TSMIC
In this part, the TSMIC algorithm is proposed to overcome the insufficient consideration of nonlinear relations and to mine the deep mutual information between features. In the first stage, the MIC between every feature and the degradation cycle is calculated, and the features that hardly change with the degradation cycle are eliminated to obtain the main feature subset. If a feature does not satisfy Eq. (14), it is removed:

$$\mathrm{MIC}_i = \mathrm{MIC}(F_i, t) \ge \sigma_1 \tag{14}$$

where $F_i$ denotes the $i$th feature, $t$ is the degradation cycle, and $\mathrm{MIC}_i$ represents the value of MIC between the $i$th feature and the degradation cycle $t$. $\sigma_1$ is the threshold in the first stage, which is given by Eq. (15), where $N$ represents the number of features in the raw feature set $F = [F_1, F_2, \ldots, F_{18}]$. Fig. 9 plots the MIC value of each feature (ordinate) against the feature label (horizontal coordinate), together with the threshold $\sigma_1$.

The symmetric matrix in Eq. (16) is shown in Fig. 10, where each value represents the MIC between two features in the main feature set $S$. The higher the correlation value, the better the feature represents the declining trend of the circuit health status. The average MIC of each row, $\mathrm{Mean}_i$, which reflects the degree of correlation between the $i$th feature and all other features, is used to eliminate the redundant features. If $\mathrm{Mean}_j$ satisfies Eq. (17), the $j$th feature is regarded as an optimal feature. Otherwise, it is eliminated.

$$\mathrm{Mean}_j \ge \sigma_2 \tag{17}$$

where $\mathrm{Mean}_j$ represents the average MIC of the $j$th row. $\sigma_2$ is the threshold in the second stage, which is expressed as:

$$\sigma_2 = \frac{1}{M}\sum_{j=1}^{M} \mathrm{Mean}_j \tag{18}$$

where $M$ is the number of features in the main feature set $S$. For the off-diagonal elements, if $m_{ij} \ge \sigma_2$ $(i \le j)$, the feature is deleted (set $m_{ij}, m_{jj} = 0$), and the matrix is updated accordingly. From Eq. (18), the threshold $\sigma_2$ is calculated as 0.8. If $m_{ij}$ does not satisfy Eq. (18), the corresponding diagonal value $m_{jj}$ becomes 0. Subsequently, all features whose diagonal value is 1 are selected as the optimal features with the strongest correlation. Finally, after feature selection through the TSMIC algorithm, the optimal feature set is obtained.
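The two thresholding stages described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation:

- The MIC values are taken as precomputed inputs from any MIC estimator.
- The first-stage threshold $\sigma_1$ is assumed to be the mean of the first-stage MIC values.
- The row average includes the diagonal entry.
- The off-diagonal update of the matrix is omitted.

The function names `first_stage` and `second_stage` and the sample numbers are hypothetical.

```python
from statistics import mean


def first_stage(mic_with_cycle):
    """Keep features whose MIC with the degradation cycle reaches sigma_1.

    Assumption: sigma_1 is the mean of the N first-stage MIC values.
    Returns the kept feature indices and the threshold.
    """
    sigma_1 = mean(mic_with_cycle)
    kept = [i for i, value in enumerate(mic_with_cycle) if value >= sigma_1]
    return kept, sigma_1


def second_stage(mic_matrix):
    """Keep features whose row-average MIC reaches sigma_2.

    mic_matrix is the symmetric M x M matrix of pairwise MIC values
    (diagonal entries equal to 1); sigma_2 is the mean of the row averages.
    """
    row_means = [mean(row) for row in mic_matrix]
    sigma_2 = mean(row_means)
    kept = [j for j, value in enumerate(row_means) if value >= sigma_2]
    return kept, sigma_2


# Illustrative (made-up) MIC values, not results from the paper.
stage1_kept, sigma_1 = first_stage([0.9, 0.2, 0.8, 0.1])
stage2_kept, sigma_2 = second_stage([
    [1.0, 0.9, 0.3],
    [0.9, 1.0, 0.2],
    [0.3, 0.2, 1.0],
])
print(stage1_kept, stage2_kept)
```

In practice the second stage would be applied to the pairwise MIC matrix of the features that survive the first stage, so the indices returned by `second_stage` refer to positions within that reduced set.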

E. HEALTH INDICATOR CURVES
An HI curve is associated with each circuit component, and its HI value is tracked to predict the end of performance (EOP) of the components in the MJ-level PCM circuit. In this paper, HI curves are calculated by Eq. (4) and fitted with the linear regression model of health condition. Then, a training HI model $\{M^{(u)}\}$ can be established by Eq. (20) to describe the degradation performance of circuit components from normal to failure, where $y_t^{(u)}$ is the data point at degradation cycle $t$ of one HI curve and $L^{(u)}$ is the length of the last degradation cycle before circuit component failure in simulation experiment $u$. The HI model $\{M^{(u)}\}$ can be trained by the Bi-GRU network to produce an estimated output. Fig. 11 shows the 159 HI curves for different components in the MJ-level PCM circuit, which are constructed by the linear regression model. Each HI curve indicates the degradation trend of one component as it deviates from its nominal value.

F. FAILURE PROGNOSTICS 1) PROGNOSTIC PERFORMANCE METRICS
The error in estimating the RUP of the $n$th circuit component is given by

$$E_n = \mathrm{RUP}_{\mathrm{Estimated}} - \mathrm{RUP}_{\mathrm{Actual}}$$

where $\mathrm{RUP}_{\mathrm{Estimated}}$ and $\mathrm{RUP}_{\mathrm{Actual}}$ denote the estimated RUP and the actual RUP, respectively, and $n$ indexes the testing circuit components. $E_n$ can be divided into early prediction (i.e., the estimated RUP value is smaller than the actual RUP value) and late prediction (i.e., the estimated RUP value is larger than the actual RUP value). In this work, the root mean square error (RMSE) is used to evaluate the prognosis accuracy of RUP. A smaller RMSE reflects higher effectiveness and stability of the prognosis result. The RMSE is the square root of the mean of $E_n^2$ over all testing circuit components.
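The metric can be illustrated with a short sketch. The sign convention (estimated minus actual, so that early predictions are negative) is an assumption consistent with the early/late definition above. The function names and the RUP values are hypothetical.

```python
import math


def rup_error(estimated_rup, actual_rup):
    """Signed RUP error; the sign convention is an assumption.

    Negative values are early predictions, positive values are late ones.
    """
    return estimated_rup - actual_rup


def classify(error):
    """Label a signed error as an early, late, or exact prediction."""
    if error < 0:
        return "early"
    if error > 0:
        return "late"
    return "exact"


def rmse(errors):
    """Root mean square of the per-component RUP errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Illustrative (made-up) estimated and actual RUP values, in degradation cycles.
estimated = [53, 41, 39]
actual = [60, 40, 36]
errors = [rup_error(e, a) for e, a in zip(estimated, actual)]
print(errors, [classify(e) for e in errors], rmse(errors))
```

Because the errors are squared, RMSE does not distinguish early from late predictions, so the signed errors are worth reporting alongside it when early warning matters.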

2) THE PARAMETERS SELECTION IN BI-GRU MODEL
The network structure has a significant impact on the prognosis results. Because the prediction results may be affected by many factors, such as parameter initialization and the dropout technique, each reported result is the average of 10 repeated experiments.

a: EFFECTS OF THE DIFFERENT NUMBERS OF THE NETWORK LAYER
The training time increases as the network structure deepens, and there is a risk of overfitting due to the limited number of training samples in the experiment. Thus, the total number of hidden layers is constrained between 1 and 5. As shown in Fig. 12 (a), the structure with three hidden layers produces the best RUP estimation results over a total of 10 predictions compared with the other structures. Therefore, it is used as the default network structure in the subsequent simulation experiments.

b: EFFECTS OF DIFFERENT BATCH SIZES
Another critical parameter in the network is the batch size. Fig. 12 (b) shows the results of RUP estimation for different batch sizes. Because early prediction deserves more consideration in condition-based maintenance, the batch size with the smaller RMSE is selected within a certain error range. The best performance is obtained when the batch size is set to 32.

c: EFFECTS OF THE DIFFERENT STEPS OF THE SLIDING WINDOW
The step of the sliding window decides how much information is used for training in one sample. The effect of the sliding window size is presented in Fig. 12 (c). When the step of the sliding window is set to 20, the RMSE is the lowest. On the one hand, the Bi-GRU network cannot extract effective features from shorter sequences of degradation data, which contain limited sequence information. On the other hand, although longer sequences of degradation data contain more local information, longer steps can increase the overfitting risk of the Bi-GRU network during the training stage.
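The sample preparation with a sliding window can be sketched as below. The exact sample layout is not given in the text, so this sketch assumes that each sample is a window of 20 consecutive HI points with the next HI value as the target, and that the window advances one cycle at a time. The function name `sliding_windows` is hypothetical.

```python
def sliding_windows(series, window=20):
    """Split one HI sequence into (input window, next-step target) pairs.

    Assumptions: the window advances by one degradation cycle at a time
    and the target is the value that immediately follows the window.
    """
    samples = []
    for start in range(len(series) - window):
        inputs = series[start:start + window]
        target = series[start + window]
        samples.append((inputs, target))
    return samples


# A made-up HI sequence of 25 points yields 5 training samples.
hi_curve = [1.0 - 0.01 * t for t in range(25)]
samples = sliding_windows(hi_curve, window=20)
print(len(samples))  # 5
```

The sketch also makes the trade-off above concrete: a longer window leaves fewer samples per run-to-failure record, while a shorter one gives each sample less sequence information.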
For the choice of hyperparameters such as the number of network layers, batch size, step of the sliding window, dropout rate, training epochs, and early stopping criteria, we implemented a grid search strategy to obtain a satisfactory prediction performance. The hyperparameters of the Bi-GRU are summarized in Table 6.
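A grid search with repeated runs can be sketched as follows. The scoring function `toy_rmse` is a hypothetical stand-in for training the Bi-GRU and measuring its test RMSE; it is deliberately constructed so that its minimum lies at three layers, batch size 32, and window 20, and it produces no real results. The candidate batch sizes and window lengths are also assumptions. Only the layer range 1 to 5 and the averaging over 10 runs come from the text.

```python
import itertools
import random
from statistics import mean


def toy_rmse(layers, batch_size, window, seed):
    """Hypothetical stand-in for one training run; returns a fake RMSE."""
    noise = random.Random(seed).uniform(0.0, 0.1)  # run-to-run variation
    return abs(layers - 3) + abs(batch_size - 32) / 16 + abs(window - 20) / 10 + noise


def grid_search(evaluate, grid, repeats=10):
    """Return the configuration with the lowest RMSE averaged over repeats."""
    best_config, best_score = None, float("inf")
    for config in itertools.product(*grid.values()):
        # Average over repeated runs to smooth out initialization effects.
        score = mean(evaluate(*config, seed=s) for s in range(repeats))
        if score < best_score:
            best_config, best_score = dict(zip(grid, config)), score
    return best_config, best_score


grid = {
    "layers": [1, 2, 3, 4, 5],
    "batch_size": [16, 32, 64],  # assumed candidates
    "window": [10, 20, 30],      # assumed candidates
}
best, score = grid_search(toy_rmse, grid)
print(best)
```

The cost grows with the product of the candidate list lengths times the number of repeats, which is one reason to keep each list short.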

3) PROGNOSIS RESULTS OF THE PROPOSED METHOD
When the prognosis module is triggered, the HI curve is tracked at any degradation cycle to predict the RUP of the circuit component. As shown in Fig. 13 (a)-(c), the variation limitations of a pre-ionization capacitor $C_0$ (test experiment #31), Xenon Lamp 8 (test experiment #61), and a main ionization inductor $L_9$ (test experiment #76) are reached at degradation cycles 160, 180, and 160, at which the values of $C_0$, Lamp 8, and $L_9$ have increased/decreased by 40% from their nominal values. The degradation of each component is assumed to increase/decrease gradually and evenly with the degradation cycle. Thus, the failure thresholds of $C_0$, Lamp 8, and $L_9$ are reached at degradation cycles 120 (160 × 3/4), 135, and 120, at which the values of $C_0$, Lamp 8, and $L_9$ have increased/decreased by 30% from their nominal values. In addition, the prognosis experiments for $C_0$, Lamp 8, and $L_9$ start at degradation cycles 60 (120 × 50%), 68, and 60, i.e., the testing run-to-failure data are truncated at 50%. More precisely, taking $C_0$ as an example, Fig. 13 (a) shows that the RUP estimation for $C_0$ starts at degradation cycle 60. The tracked and predicted trend closely follows the actual values until it reaches the failure threshold at degradation cycle 120. The estimated failure occurs at degradation cycle 113, so the estimated RUP is 53 (113 − 60). The actual failure occurs at degradation cycle 120, so the actual RUP is 60. Hence, the error between the actual RUP and the predicted RUP for $C_0$ is 7 degradation cycles.
To test different prediction lengths that gradually approach failure, a second group of prognosis experiments for $C_0$, Lamp 8, and $L_9$ is performed at degradation cycles 72 (120 × 60%), 81, and 72, with the testing run-to-failure data truncated at 60%, as shown in Fig. 14 (a)-(c). In the same way, as shown in Fig. 15 (a)-(c), a third group of prognosis experiments for $C_0$, Lamp 8, and $L_9$ is performed at degradation cycles 84 (120 × 70%), 95, and 84, with the testing run-to-failure data truncated at 70%. The estimated failures for $C_0$, Lamp 8, and $L_9$ occur at degradation cycles 118, 136, and 123, and the actual failures occur at degradation cycles 120, 135, and 120. Hence, the errors between the actual RUP and the predicted RUP are small: only 2, 1, and 3 degradation cycles for $C_0$, Lamp 8, and $L_9$. Fig. 16 (a)-(c) show the RUP estimation at every degradation cycle for $C_0$, Lamp 8, and $L_9$, respectively. The blue line represents the RUP estimates; the estimated curve is relatively smooth and nearly linear. Since the ideal degradation curve of the circuit component is linear, this indicates that the proposed method is effective and robust whenever the prognosis module is triggered.
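The cycle arithmetic of these prognosis experiments can be reproduced with a short sketch. The function name `prognosis_case` is hypothetical. Rounding the start cycle half up is an assumption inferred from the reported values 68 (135 × 50%) and 95 (135 × 70%).

```python
def prognosis_case(limit_cycle, truncation_percent, estimated_failure_cycle):
    """Cycle arithmetic for one prognosis experiment.

    limit_cycle is the cycle at which the component reaches its 40%
    variation limitation; truncation_percent is an integer such as 50.
    """
    # With even degradation, the 30% failure threshold is reached at
    # 3/4 of the cycles needed to reach the 40% variation limitation.
    failure_cycle = limit_cycle * 3 // 4
    # Integer arithmetic; rounds half up (an assumption, see lead-in).
    start_cycle = (failure_cycle * truncation_percent + 50) // 100
    actual_rup = failure_cycle - start_cycle
    estimated_rup = estimated_failure_cycle - start_cycle
    return {
        "failure_cycle": failure_cycle,
        "start_cycle": start_cycle,
        "actual_rup": actual_rup,
        "estimated_rup": estimated_rup,
        "error": estimated_rup - actual_rup,
    }


# C0 at 50% truncation: estimated failure at degradation cycle 113.
print(prognosis_case(160, 50, 113))
# Lamp 8 at 70% truncation: estimated failure at degradation cycle 136.
print(prognosis_case(180, 70, 136))
```

Since both RUP values are measured from the same start cycle, the error equals the difference between the estimated and actual failure cycles, whatever the truncation point.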

4) COMPARISON WITH THE STATE-OF-THE-ART RUP ESTIMATION APPROACHES
The proposed method is compared with state-of-the-art methods, i.e., SVR [35], DCNN [41], LSTM [43], GRU [45], and Bi-LSTM [44]. The important parameters of these networks are summarized in Table 7. To illustrate that the Bi-GRU model is superior to the SVR, DCNN, LSTM, GRU, and Bi-LSTM models, the average prediction error (Eave%) is used to evaluate the test results. It is computed over the $M$ samples in the degradation cycle from $RUP_i$ and $\widehat{RUP}_i$, which represent the actual RUP percentage and the predicted RUP percentage, respectively. From the perspective of engineering application, the RUP estimation accuracy over the final 10% and 5% of the whole degradation cycle (Eave10% and Eave5%) is essential. Therefore, Eave10% and Eave5% are also calculated to validate the proposed approach. The results are presented in Fig. 17. It reveals that the Bi-GRU model outperforms the other compared methods in the averages of Eave%, Eave10%, and Eave5%.
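One way to compute such errors is sketched below. It assumes that Eave is the mean absolute difference between the actual and predicted RUP percentages, and that Eave10% and Eave5% restrict the average to the last 10% and 5% of the samples. Both are assumptions, as are the function name `eave` and the sample values.

```python
def eave(actual_pct, predicted_pct, last_fraction=1.0):
    """Mean absolute RUP-percentage error over the final part of a cycle.

    last_fraction=1.0 uses all M samples; 0.10 and 0.05 use only the
    last 10% and 5% of the samples (at least one sample is always kept).
    """
    pairs = list(zip(actual_pct, predicted_pct))
    count = max(1, round(len(pairs) * last_fraction))
    tail = pairs[-count:]
    return sum(abs(a - p) for a, p in tail) / len(tail)


# Made-up example: 20 samples, RUP percentage falling from 100 to 5.
actual = [100 - 5 * i for i in range(20)]
predicted = [a + 2 for a in actual[:18]] + [actual[18] + 4, actual[19] + 6]

print(eave(actual, predicted))        # whole degradation cycle
print(eave(actual, predicted, 0.10))  # final 10%
print(eave(actual, predicted, 0.05))  # final 5%
```

In this made-up example the late-stage errors are larger than the overall average, which shows why the final-stage metrics are reported separately.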
More specifically, compared with SVR, the proposed Bi-GRU-based method solves the overfitting problem of shallow learning networks caused by fluctuating noisy features collected from multiple sensors. In addition, whereas SVR predictions are based on local degradation features, Bi-GRU can fuse multi-sensor data with variable time window sizes, which allows the neural network to capture both the short-range (local) and long-range (global) degradation features from multiple sensors.
DCNN can extract global and local features from feature maps, but it generally requires a large number of input parameters, which will inevitably be mixed with noise. In contrast, Bi-GRU requires fewer input parameters, which shortens the training time and keeps a good balance between noise and useful information. Furthermore, the Bi-GRU network is more flexible in adjusting its structure parameters.
Compared with standard RNN methods (such as LSTM and GRU), Bi-LSTM and Bi-GRU can better learn the hidden relationship between the degradation features of circuit components and RUP estimation. In addition, the bidirectional approach extracts more information than the single-direction approach. Especially at the late stage of the degradation process (i.e., Eave10% and Eave5%), the Bi-LSTM and Bi-GRU methods obtain better results.
The performance of the Bi-LSTM model is similar to that of Bi-GRU in most cases. To further illustrate the superiority of the proposed Bi-GRU model, Fig. 17 shows that Bi-GRU outperforms Bi-LSTM in the averages of Eave%, Eave10%, and Eave5% by 0.63%, 0.34%, and 1.42%, respectively. Moreover, the standard deviation (Stdev) is calculated to explore the stability of the two models. It can be seen that Bi-GRU outperforms Bi-LSTM in the Stdevs of Eave%, Eave10%, and Eave5% by 0.42%, 0.1%, and 0.25%, respectively. In summary, the experimental results show that the Bi-GRU achieves better RUP estimation performance than the above state-of-the-art methods.

VI. CONCLUSION
In this paper, a novel prognostic framework is proposed that uses a TSMIC algorithm to select the most relevant features and a Bi-GRU network for RUP estimation, which reduces the fluctuation of the selected features and improves the prediction accuracy. The contributions of this paper can be summarized as follows:
• Firstly, for the feature selection stage, a new feature selection method, namely TSMIC, is proposed, owing to the lower sensitivity of other indicators to nonlinearity. With the selected optimal feature set as the model input, the prediction accuracy of all the prediction models is improved.
• Secondly, a Bi-GRU network model is employed to achieve RUP estimation. Several other state-of-the-art models, such as SVR, DCNN, LSTM, GRU, and Bi-LSTM, are compared to demonstrate the superiority of the proposed Bi-GRU. The test results indicate that the proposed method is superior to the existing models.
Future research will pursue the following goals:
1) Extending RUP estimation to the case of multiple synchronous failures.
2) Mining the deeper relations between features with more distinctive approaches to further reduce the error of RUP estimation.
3) Considering other deep models, such as the attention model, to enhance the robustness and generalization ability of circuit RUP estimation.
YIGANG HE (Member, IEEE) received the Ph.D. degree in electrical engineering from Xi'an Jiaotong University, Xi'an, China, in 1996. He was a Senior Visiting Scholar with the University of Hertfordshire, Hatfield, U.K., in 2002. From 2011 to 2017, he worked as the Head of the School of Electrical Engineering and Automation, Hefei University of Technology. In December 2017, he joined Wuhan University, China, where he currently works as the Vice-Head of the School of Electrical Engineering and Automation. His research interests include power electronic circuit theory and its applications, testing and fault diagnosis of analog and mixed-signal circuits, electrical signal detection, smart grid, satellite communication monitoring, and intelligent signal processing. He has authored or coauthored 300 journal articles and conference papers in the aforementioned areas, which have been included more than 1000 times in the Science Citation Index of the American Institute for Scientific Information.

CHAOLONG ZHANG received the Ph.D. degree from the Hefei University of Technology, in 2018. He is currently a Postdoctoral Researcher of electrical engineering and automation with Wuhan University. He is also an Associate Professor with the School of Physics and Electronic Engineering, Anqing Normal University. His current research interests include fault diagnostics and prognostics of analog and mixed-signal circuits, battery capacity prognostics, satellite communication monitoring, and intelligent signal processing.