Fault Prediction for Capacitor of Power Converters Based on CEEMDAN and GWO-RELM

Aluminum electrolytic capacitors (AECs) serve multiple functions such as filtering, energy storage and decoupling, and they strongly affect the performance and lifetime of power converters. Therefore, analyzing and predicting AEC faults helps improve the safety and reliability of power converters. To establish an AEC fault prediction model and improve its accuracy, an integrated model based on complete ensemble empirical mode decomposition with adaptive noise, the grey wolf optimization algorithm and the regularized extreme learning machine (CEEMDAN-GWO-RELM) is proposed. CEEMDAN decomposes the time series of the AEC degradation process into several sequences, which decouples local fluctuations from the global degradation trend. Then, the RELM optimized by GWO is used to predict each decomposed sequence. RELM has few hyperparameters and a short operation time, and GWO, which converges strongly, is used to optimize it to obtain better fault prediction. Finally, the predicted values of the sequences are reconstructed to obtain the prediction of the integrated model. The results on AEC aging data show that the integrated CEEMDAN-GWO-RELM model provides better prediction performance than traditional models, and the maximum relative error at each prediction time point is lower than 1.6%.

As the interface between source and load, the power converter is widely used in rail transit, electric vehicles, aerospace and other fields [1]. The normal operation of the whole system is restricted by its security and reliability [2]. Therefore, many scholars have studied the causes of its failure in order to improve reliability [3], [4]. The circuit failure of a power converter is mainly caused by the failure and degradation of its internal components. Moreover, the AEC is an important part of the power electronic converter and also one of its most vulnerable components. As shown in Fig. 1, a survey of more than 200 products from 80 companies attributes 30% of failures to capacitors [5]. Thus, fault prediction of AECs is conducive to the timely replacement or maintenance of failed capacitors in the circuit, which ensures normal operation and improves the reliability of the circuit.
FIGURE 1. Percentage of power electronic systems failures [5].
The mainstream methods used in fault prediction can be roughly divided into filter-based and data-driven methods. However, most filter-based methods require complex empirical models [7], [8]. Moreover, the degradation of an AEC is easily affected by the external environment, including temperature and voltage, as well as by internal capacitance regeneration, which increases the complexity of building an empirical model.
Fortunately, with the rapid development of artificial intelligence in recent years, data-driven methods have been widely used to establish fault prediction models in various fields. Hao et al. proposed a conditional deep neural network (DNN) with a dropout technique for fault prediction of AECs, which greatly reduces the testing time of AECs in aging tests [9]. Jeong et al. used a vanilla long short-term memory (LSTM) network to establish the fault prediction of AECs [10]; its root mean square error and mean absolute error decreased by 8.6% and 1.7%, respectively. Based on the NASA data set, Mesquita applied different artificial neural networks to predict the remaining useful life (RUL) [11], and Delanyo proposed a prediction method based on Bi-LSTM [12]. A broad learning system and LSTM were combined into a fusion network that realizes fault prediction and RUL prediction of lithium-ion batteries with a small proportion of training data on two public data sets [13]. Ma et al. proposed a hybrid model for landslide displacement prediction based on k-fold cross validation, support vector regression (SVR) optimized by different metaheuristics, and the nonparametric Friedman test [14]. Moreover, a comprehensive comparison of twenty metaheuristics benefits research on hyperparameter tuning in other fields [15].
A defect of data-driven models is that the local fluctuations and regeneration phenomena of time series are not considered [31]. Furthermore, AECs generally present typical nonlinear characteristics in the actual degradation process. To improve fault prediction and establish a more accurate prediction model for nonlinear time series, feature-based methods are used in time series processing. These methods can effectively mine the internal information of the time series, which helps establish the prediction model and improve its accuracy. In addition, optimization algorithms are often used to adaptively tune model parameters during model building; although they increase model complexity and calculation time, they yield better model performance.
Feng used a multiobjective grasshopper optimization algorithm to determine a gear transmission dynamic model [16]. Wang et al. proposed a hybrid ARIMA-BO-Bi-LSTM model, which makes full use of the autoregressive integrated moving average (ARIMA) model to process the linear features in the time series and of the ability of Bi-LSTM to predict the nonlinear part [17]. Zhang et al. decomposed the time series of lithium-ion batteries through ensemble empirical mode decomposition (EEMD) and then built an RUL prediction model based on GWO-SVR [18]; compared with GWO-SVR alone, the average relative error is reduced by 3.125%. Shi et al. used CEEMDAN to divide the lithium battery capacity into a main degradation trend and several local degradation trends, and then used LSTM to predict the RUL from each decomposed component. Finally, the predicted data were integrated to improve the prediction accuracy [19].
Considering the limitations of variational mode decomposition (VMD), a novel fault-information-guided VMD was proposed to enhance the sensitivity to bearing fault signatures. With this method, the mode number and bandwidth control parameter can be optimally determined and bearing diagnosis is realized [20]. An integrated model based on VMD, particle filter and Gaussian process regression was proposed to predict the RUL of lithium-ion battery packs, which improves prediction accuracy and reduces errors [21]. However, the mode number of VMD must be determined before decomposition, and an unreasonable mode number may lead to incomplete decomposition or large reconstruction error. Wu used the relative root mean square to divide the stages of bearing degradation and used the Pearson correlation coefficient combined with the entropy weight method to select sensitive features as the input of RELM, thus realizing RUL prediction of bearings. By introducing a regularization factor, RELM inherits the advantages of ELM, such as simple parameter selection and fast operation speed, and further prevents the overfitting that occurs in ELM, obtaining better performance. In addition, compared with algorithms based on neural networks, its operation speed is faster and its stability is higher [22].
To sum up, considering the advantages and disadvantages of each algorithm, a fault prediction model based on feature decomposition and machine learning is used in this paper for AECs. The major innovations and contributions include: (1) An AEC fault prediction model based on CEEMDAN-GWO-RELM is proposed. The superiority and feasibility of the integrated model are verified on the data set obtained from the accelerated aging experiment of AECs. (2) The original time series is decomposed by CEEMDAN, which overcomes the fluctuation of the AEC during degradation and accurately captures the local fluctuations in the degradation process.
The rest of this paper is organized as follows. The failure mechanism of AECs is described in Section II. The algorithm principles used for AEC fault prediction are presented in Section III. The prediction results are evaluated and analyzed in Section IV. The conclusion is given in Section V.

II. FAILURE MECHANISM OF AEC
Accelerated degradation tests of AECs under electrical overstress and thermal overstress have been conducted at the NASA Ames Research Center [23]. In [3] and [24], Chen and Bhargava analyzed and summarized the failure mechanisms of capacitors. According to these studies, the failure of an AEC is related to many factors. Environmental factors include temperature, humidity, air pressure and vibration. Electrical factors include voltage, ripple current, and the number of charge and discharge cycles. Physical factors include incomplete sealing, electrolyte leakage and evaporation. Chemical factors include the invasion of halogen ions and the deterioration of sealing materials.
The failure of an AEC can be divided into two types: structural failure (hard failure) and parametric failure (soft failure) [4]. Structural failure is mainly caused by a short circuit or an open circuit in the AEC equivalent circuit. Its main causes are short circuits between electrodes, damage to the oxide film insulation, and terminal disconnection or poor contact. Parametric failure usually appears as a change in a circuit function index caused by capacitor parameter drift. It does not have the same immediate impact on the system as a structural failure. However, with prolonged use and under the influence of the environment and the usage of the system, parametric failure becomes more and more serious. Meanwhile, the reliability and security of the system decrease and the related performance indicators change significantly. If not handled in time, it may further evolve into a more serious structural failure, resulting in irreversible consequences.
From the physical analysis, evaporation of the electrolyte and degradation of the dielectric are the main reasons for the deterioration of AEC performance with aging. Evaporation of the electrolyte is mainly due to the high temperature of the external working conditions and the heating of the internal equivalent series resistance. The operating temperature of AECs is generally lower than 85 °C. However, in the actual working environment, the temperature may be higher than 85 °C, which accelerates the degradation of the AEC. In normally operating circuits, when ripple current flows through an electrolytic capacitor, loss and heat are generated at the equivalent series resistance of the capacitor. The increases in the external temperature and in the core temperature of the AEC are the two main reasons for the accelerated evaporation of the electrolyte. Evaporation of the electrolyte increases the equivalent series resistance of the capacitor and reduces the contact area between the electrolyte and the oxide layer, which deteriorates the AEC. The result is a decrease in capacitance and an increase in equivalent series resistance [25].
For electronic systems with capacitors, once one of the capacitors has degraded beyond its normal range of use, the other components are subjected to increasing electrical stress, which accelerates their degradation. Eventually, the degradation of the components in the power electronic system inevitably leads to the degradation of the whole system, reducing its service life, safety and reliability. In addition, the loss factor reflects the real part of the impedance of the capacitor (the equivalent series resistance), which is also an important index for evaluating the quality of AECs.
In brief, the changing trends of the capacitance and the equivalent series resistance both characterize the degradation of AECs.

III. AEC FAULT PREDICTION BASED ON THE CEEMDAN-GWO-RELM
A. CEEMDAN DECOMPOSITION
CEEMDAN is improved from EMD and EEMD [18], [19], [26]. Because these signal decomposition methods do not require prior analysis, a time series can be adaptively decomposed into multiple intrinsic mode functions (IMFs) and a residue (R) with different frequencies and scales. These methods have been widely used in prediction models for nonlinear and non-stationary signals and time series. However, mode mixing occurs easily in EMD [19], and EEMD cannot eliminate the added white noise, leading to incomplete decomposition and large reconstruction error. To solve these problems, Torres et al. improved the decomposition process and proposed the CEEMDAN method [27]. It introduces a signal-to-noise ratio (SNR) coefficient for the added white noise at each decomposition stage to control the noise level, so the decomposition is more complete and the reconstruction error is smaller. The decomposition steps for the AEC degradation time series are as follows:
(1) For the original data $f(t)$ ($t = 0, 1, \ldots, n$) of the AEC degradation time series, add white noise to the series:
$$f_i(t) = f(t) + \beta_0 w_i(t) \qquad (1)$$
where $t$ represents the number of cycles of the AEC, $i$ indicates that white noise is added for the $i$th time, $\beta_k$ represents the $k$th SNR, $w_i(t)$ ($t = 0, 1, \ldots, n$) represents the white noise following the standard normal distribution that is added for the $i$th time, and $f_i(t)$ is the resulting noisy series.
(2) $f_i(t)$ is decomposed $n$ times (once per noise realization, as in EEMD), the first modal component $IMF_1(t)$ is obtained by averaging, and $R_1(t)$ is obtained by (2):
$$IMF_1(t) = \frac{1}{n}\sum_{i=1}^{n} EMD_1\big(f_i(t)\big), \qquad R_1(t) = f(t) - IMF_1(t) \qquad (2)$$
(3) The series $R_1(t) + \beta_1 EMD_1(w_i(t))$ is decomposed $n$ times and averaged to calculate $IMF_2(t)$ and $R_2(t)$:
$$IMF_2(t) = \frac{1}{n}\sum_{i=1}^{n} EMD_1\big(R_1(t) + \beta_1 EMD_1(w_i(t))\big), \qquad R_2(t) = R_1(t) - IMF_2(t)$$
where $EMD_k(\cdot)$ represents the $k$th component after decomposition by EMD.
(4) Repeat the above steps until the number of extreme points of the residue sequence is less than or equal to two, that is, until the residue can no longer be decomposed, and then terminate the algorithm. The final time series after decomposition can be expressed as:
$$f(t) = \sum_{k=1}^{K} IMF_k(t) + R(t)$$
where $K$ is the number of IMFs obtained.
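The bookkeeping around the decomposition steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the EMD sifting itself is omitted, and the function names (`add_noise_realizations`, `count_extrema`, `is_final_residue`, `reconstruct`) are hypothetical helpers introduced here, not part of the original work. The sketch covers the noise realizations of step (1), the stopping test of step (4), and the additive reconstruction of the series from the IMFs and the residue.

```python
import random


def add_noise_realizations(f, beta0, n_real, seed=0):
    """Step (1): build n_real noisy copies f_i(t) = f(t) + beta0 * w_i(t),
    with w_i drawn from a standard normal distribution."""
    rng = random.Random(seed)
    return [[v + beta0 * rng.gauss(0.0, 1.0) for v in f] for _ in range(n_real)]


def count_extrema(r):
    """Count strict interior local maxima and minima of a sequence."""
    count = 0
    for k in range(1, len(r) - 1):
        is_max = r[k] > r[k - 1] and r[k] > r[k + 1]
        is_min = r[k] < r[k - 1] and r[k] < r[k + 1]
        if is_max or is_min:
            count += 1
    return count


def is_final_residue(r):
    """Step (4): stop when the residue has at most two extreme points."""
    return count_extrema(r) <= 2


def reconstruct(imfs, residue):
    """Sum all IMFs and the residue point by point to recover the series."""
    return [sum(imf[t] for imf in imfs) + residue[t] for t in range(len(residue))]
```

A full decomposition would call an EMD routine on each noisy realization and average the resulting components; only the surrounding logic is shown here.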

B. PRINCIPLE OF RELM
ELM is built on the single-hidden-layer neural network [25], [28]. ELM has the advantages of few parameters, short operation time and small generalization error. However, it considers only the principle of minimizing empirical risk in the calculation process. Thus, overfitting may occur as the number of hidden-layer nodes increases, resulting in poor generalization ability [29]. In addition, ELM directly calculates the least-squares solution of the output-layer weights. When modeling a time series, it is difficult to adjust the information contained in a time window that spans many data points, resulting in insufficient controllability. RELM balances the empirical risk ($\|\varepsilon\|^2$) and the structural risk ($\|\beta\|^2$) through the regularization parameter ($\lambda$) to obtain better generalization ability. The purpose of RELM is to find the minimum of an objective function ($E$) that represents the total risk. In this formulation, $h(x)$ represents the activation function of the hidden-layer neurons, $\varepsilon_i$ ($i = 1, 2, 3, \ldots, N$) is the training error of the $i$th sample, and $N$ is the number of samples. A Lagrange equation is then constructed with the Lagrange multipliers $\alpha_i \in R$ ($i = 1, 2, 3, \ldots, N$).
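The regularized least-squares idea behind RELM can be illustrated with a small pure-Python sketch. It assumes a common textbook form in which the output weights minimize $\|H\beta - y\|^2 + \lambda\|\beta\|^2$, giving $\beta = (H^T H + \lambda I)^{-1} H^T y$; where exactly the regularization parameter enters in the original formulation is an assumption here, and all function names (`solve_linear`, `hidden_output`, `relm_fit`, `relm_predict`) are hypothetical.

```python
import math
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def hidden_output(X, W, b):
    """Sigmoid hidden-layer output matrix H, one row per sample."""
    return [[1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + bj)))
             for w, bj in zip(W, b)] for x in X]


def relm_fit(X, y, n_hidden=10, lam=0.01, seed=0):
    """Random input weights and biases, ridge-regularized output weights."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    H = hidden_output(X, W, b)
    # Normal equations of the assumed objective: (H^T H + lam * I) beta = H^T y
    A = [[sum(h[i] * h[j] for h in H) + (lam if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(h[i] * t for h, t in zip(H, y)) for i in range(n_hidden)]
    beta = solve_linear(A, rhs)
    return W, b, beta


def relm_predict(X, model):
    """Network output: hidden-layer output times the output weights."""
    W, b, beta = model
    return [sum(hk * bk for hk, bk in zip(h, beta)) for h in hidden_output(X, W, b)]
```

In this sketch a larger `lam` shrinks the output weights and trades training error for smoother predictions, which is the balance between empirical and structural risk described above.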
H is the output matrix of the hidden layer. Taking the partial derivatives of the Lagrange equation and setting them to zero yields the output weight matrix, in which I denotes the identity matrix. The output of the network is then obtained from H and the output weight matrix.

C. GREY WOLF OPTIMIZATION ALGORITHM
GWO has the advantages of a simple principle, strong search ability and few parameters, and it is widely used in model optimization based on machine learning and deep learning [18,0]. In GWO, the wolves are ranked into four levels according to the objective function of the algorithm to be optimized. The top level is the α wolf, the leader, which corresponds to the solution with the best fitness. The second level is the β wolf, which is subordinate to the α wolf. The third level is the δ wolf. The fourth level consists of the ω wolves, which form the base of the pack. Through continuous changes in the positions of the wolves, the distance between the wolves and the prey gradually narrows, and the hunt is finally completed. The specific steps of GWO are as follows:
Step 1: Calculate the distance (D) between the grey wolves and the prey, and update the positions of the wolves as D decreases. Here, $X_p(t)$ and $X(t)$ represent the positions of the prey and of an individual grey wolf, A and C are coefficient vectors, $t$ is the iteration number, and $r_1$ and $r_2$ are random vectors in [0,1].
Step 2: When the individual wolves recognize the position of the prey, the prey is hunted under the leadership of α, β and δ, and the positions of the ω wolves are updated on the basis of α, β and δ.
Step 3: Finally, after GWO converges, the wolves attack the prey and the optimal solution of the objective function is found.
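The three steps can be sketched as a minimal GWO loop. The sketch assumes the commonly cited update rules, $D = |C \cdot X_p(t) - X(t)|$, $X(t+1) = X_p(t) - A \cdot D$, $A = 2a \cdot r_1 - a$ and $C = 2 r_2$, with a control parameter $a$ decreasing linearly from 2 to 0; the parameter $a$, the box bounds and the function name `gwo_minimize` are assumptions introduced for illustration.

```python
import random


def gwo_minimize(objective, dim, bounds, n_wolves=5, max_iter=500, seed=0):
    """Minimal grey wolf optimizer (needs n_wolves >= 3).
    Returns (best position, best score, best-score history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    # The three best solutions found so far: alpha, beta, delta (best first).
    leaders = [(float("inf"), [0.0] * dim) for _ in range(3)]
    history = []
    for it in range(max_iter):
        # Rank the pack and refresh the leaders.
        for w in wolves:
            candidate = (objective(w), list(w))
            leaders = sorted(leaders + [candidate], key=lambda p: p[0])[:3]
        history.append(leaders[0][0])
        a = 2.0 - 2.0 * it / max_iter  # decreases linearly from 2 toward 0
        for w in wolves:
            for j in range(dim):
                pulled = 0.0
                for _, leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - w[j])  # distance to this leader
                    pulled += leader[j] - A * D
                # New position: mean of the three leader-guided moves, clipped.
                w[j] = min(hi, max(lo, pulled / 3.0))
    return leaders[0][1], leaders[0][0], history
```

For example, `gwo_minimize(lambda x: sum(v * v for v in x), dim=3, bounds=(-5.0, 5.0))` searches for the minimum of a simple sphere function. In a GWO-RELM setting, the objective would instead be the RMSE of an RELM whose input weights and biases are encoded in the position vector.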

D. DATA SOURCE
To analyze the characteristics of AECs, an aging experiment was conducted at high temperature on AECs rated at 330 µF/35 V. The AECs degraded in a chamber held at 150 °C. Each AEC has two wires, connected to its positive and negative poles, that extend outside the drier so that the degrading AECs can be measured with an LCR meter every 6 hours. At the initial stage of the experiment, open-circuit and short-circuit corrections of the LCR meter were performed to reduce the measurement error caused by the wires. Finally, considering time and cost, the experiment lasted 1608 hours, and 268 data points were collected in the AEC degradation experiment.

E. FAULT PREDICTION BASED ON CEEMDAN-GWO-RELM
On the basis of the data set, the AEC fault prediction model based on CEEMDAN-GWO-RELM is shown in Fig. 2. The process of the prediction model is as follows:
Step 1: The AEC time series is decomposed into IMFs and R by CEEMDAN.
Step 2.1: The data set is divided into a training set and a testing set according to three different proportions using a sliding time window. The window size is 9, that is, the last data point is predicted from the previous 8 data points. RELM is used to make single-step and short-term predictions for each IMF and for R. The short-term prediction is the multi-step prediction of the AEC, which starts fault prediction from an earlier time point than single-step prediction.
Step 2.2: The parameters of GWO-RELM are set: the number of hidden-layer nodes and C of RELM are 50 and 0.01, respectively. GWO is used to optimize the input-layer weights and hidden-layer biases of RELM, with RMSE as the objective function. The population size and maximum number of iterations of GWO are 5 and 500, respectively. After the optimization, the parameters that give the minimum RMSE are used for the testing set.
Step 3: The predictions are reconstructed to realize the fault prediction of the AEC, and the prediction error is calculated.
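The data handling in Steps 2.1 and 3 can be sketched as follows. This is a minimal illustration, assuming a chronological split and a point-wise sum of the component predictions; the helper names (`make_windows`, `split_by_ratio`, `reconstruct_prediction`) and the exact split rule are assumptions, not the original implementation.

```python
def make_windows(series, n_inputs=8, n_outputs=1):
    """Slide a window of n_inputs + n_outputs points over the series and
    return (inputs, targets) pairs."""
    size = n_inputs + n_outputs
    samples = []
    for start in range(len(series) - size + 1):
        window = series[start:start + size]
        samples.append((window[:n_inputs], window[n_inputs:]))
    return samples


def split_by_ratio(samples, train_ratio):
    """Chronological split: the first train_ratio share is used for training."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


def reconstruct_prediction(component_predictions):
    """Step 3: sum the predictions of all IMFs and R at each time point."""
    return [sum(values) for values in zip(*component_predictions)]
```

With a window size of 9, a series of 268 points yields 260 samples; each IMF and R would be windowed and predicted separately before `reconstruct_prediction` adds the component forecasts together.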
Furthermore, on the basis of the above single-step prediction, the multi-step prediction of the AEC is realized by recursion. The operation mode is shown in Fig. 3. First, the size of the time window is set to 13, that is, 8 historical data points are used to predict the next 5 data points. Second, suppose that the nonlinear model obtained from the training set by GWO-RELM is $f(x_t)$, where $x_t$ is the $t$th historical value in the time window. In the testing set, $[y_1, y_2, \ldots, y_5]$ is obtained. Then, $[y_1, y_2, \ldots, y_5]$ is re-divided into the training set to continue predicting the next 5 data points. By repeating these steps, multi-step prediction is finally realized by recursion.
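The recursion can be sketched as below. The sketch assumes that each block of 5 predictions is appended to the history and reused as input for the next block; `trend_model` is a hypothetical stand-in for the trained GWO-RELM (a simple linear extrapolation), used only so that the example runs on its own.

```python
def recursive_forecast(history, model, n_inputs=8, n_outputs=5, total_steps=25):
    """Predict total_steps values by feeding each block of predictions
    back in as history for the next block."""
    extended = list(history)
    forecast = []
    while len(forecast) < total_steps:
        window = extended[-n_inputs:]       # last 8 known or predicted values
        block = model(window)[:n_outputs]   # next 5 values
        extended.extend(block)
        forecast.extend(block)
    return forecast[:total_steps]


def trend_model(window, n_outputs=5):
    """Hypothetical stand-in for the trained model: linear extrapolation
    of the average slope inside the window."""
    slope = (window[-1] - window[0]) / (len(window) - 1)
    return [window[-1] + slope * (k + 1) for k in range(n_outputs)]
```

Predicting 5 points per call means that a 25-step forecast needs only 5 recursive calls instead of 25, which limits how often prediction errors are fed back into the inputs.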

IV. RESULT AND ANALYSIS
In this study, the advantages of the CEEMDAN-GWO-RELM integrated model are verified by the following three groups of comparison experiments: (1) The time series is decomposed by VMD, EEMD and CEEMDAN into IMFs and R. The decomposed sequences are then reconstructed, and the effectiveness of feature decomposition is analyzed by comparing the reconstruction with the real value. (2) The training set accounts for 80%, 60% and 40% of the data set, and the advantages of the model are confirmed by comparison with some commonly used data-driven methods. (3) A comparison experiment of prediction models is designed with different prediction steps. The multi-step prediction of the AEC is realized by a recursive five-step method, and the number of steps is 5, 10, 15, 20 and 25.

A. MODEL EVALUATION
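To evaluate the accuracy of the model prediction, the following four error indicators are used in this paper.
(1) Root Mean Square Error: $$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(P_i - Y_i)^2}$$
(2) Mean Absolute Error: $$MAE = \frac{1}{N}\sum_{i=1}^{N}|P_i - Y_i|$$
(3) Mean Absolute Percentage Error: $$MAPE = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{P_i - Y_i}{Y_i}\right|$$
(4) Relative Error: $$RE_i = \left|\frac{P_i - Y_i}{Y_i}\right| \times 100\%$$
where $P$ represents the prediction values of the different models, $Y$ represents the real values, and $N$ is the number of predicted time points. The smaller the RMSE, MAE, MAPE and RE of the prediction results, the better the predictions.
The program runs in the following environment: Windows 10 x64, Intel(R) Core(TM) i5-10400F CPU @ 2.90 GHz, GeForce GTX 1050 Ti, MATLAB R2021a.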
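The four indicators can be computed with a few short functions. This is a minimal sketch that assumes the usual definitions of the indicators, with the relative error taken as an absolute percentage at each time point; the function names are chosen here for illustration.

```python
import math


def rmse(pred, real):
    """Root mean square error."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, real)) / len(real))


def mae(pred, real):
    """Mean absolute error."""
    return sum(abs(p - y) for p, y in zip(pred, real)) / len(real)


def mape(pred, real):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((p - y) / y) for p, y in zip(pred, real)) / len(real)


def relative_errors(pred, real):
    """Relative error at each time point, in percent."""
    return [abs((p - y) / y) * 100.0 for p, y in zip(pred, real)]
```

The maximum RE over all time points, as reported in the experiments, would be `max(relative_errors(pred, real))`.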

B. ANALYSIS ON DECOMPOSITION RESULTS
CEEMDAN is used to decompose the AEC degradation time series into several IMFs and a residue R. The decomposition results are shown in Fig. 4. The residue R is smooth and monotonic after decomposition, so it better reflects the overall trend of AEC degradation, which helps improve the fault prediction accuracy.
The reconstructions obtained after decomposition by VMD, EEMD and CEEMDAN are compared with the real value in Fig. 5(a). The black curve is the real value, and the blue, yellow and orange curves are the results based on EEMD, VMD and CEEMDAN, respectively. It can be seen directly that EEMD deviates more from the real value than VMD and CEEMDAN and has the largest error.
In contrast, the data sets reconstructed after VMD and CEEMDAN decomposition are both close to the real value, so their difference is difficult to see directly in Fig. 5(a). To further compare the reconstruction errors of these two feature decomposition algorithms, the REs between the reconstruction and the real value are calculated at each time point (TP).
The RE between the reconstructed sequence and the real value for CEEMDAN and VMD at each TP is shown in Fig. 5(b). The blue curve is based on VMD and the orange curve is based on CEEMDAN. The minimum REs of the two algorithms are both lower than 0.006. In addition, CEEMDAN clearly has the smaller error.
Therefore, CEEMDAN is superior to EEMD and VMD for decomposition and is more conducive to the establishment of the AEC fault prediction model.

1) RESULTS OF SINGLE-STEP PREDICTION
In this study, the data set is divided in three ways: data set A uses 80% of the data for training, data set B uses 60%, and data set C uses 40%, as shown in TABLE 1. The purpose is to test whether the prediction performance of each model changes significantly with the size of the training set. The starting point of the prediction set is TP 223 for data set A, TP 170 for data set B, and TP 116 for data set C.
The prediction results of the three groups of experiments are shown in Fig. 6(a), where the blue curve represents the real value, and the results of experiments A, B and C are orange, yellow and purple, respectively. Fig. 6(a) shows that the deviation of the prediction results of data set C is greater, because the starting point of its prediction set is earlier and the prediction error grows as time goes by. The RE at each TP in the three groups of experiments is shown in Fig. 6(b). The maximum RE is 1.247% for data set C, 0.439% for data set B, and 0.284% for data set A.
Some traditional prediction methods are used as comparison experiments, and the prediction results are detailed in TABLE 2. LSTM and GRU are often used to establish prediction models for time series. When the training set is large, these models achieve high prediction accuracy, in which their internal memory units play an important role, and their prediction error is smaller than that of BP and GWO-RELM. However, when the training set is small, as in data sets B and C, their prediction error is large. The prediction accuracy of the integrated CEEMDAN-GWO-RELM model is higher than that of GWO-RELM, which shows that the prediction model achieves higher accuracy after feature decomposition.
The operation time of the fault prediction models based on these algorithms is shown in TABLE 3. BP takes less operation time than GRU and LSTM: because BP is a shallow neural network, its structure is simpler than that of deep neural networks such as LSTM and GRU, and its operation time is correspondingly shorter. For CEEMDAN-GWO-RELM, feature decomposition and parameter optimization increase the complexity of the model. Although its operation time is longer than that of BP, its prediction accuracy is better.
To further verify the convergence of GWO in the model, the convergence curves of IMF1 and R based on CEEMDAN-GWO-RELM are shown in Fig. 7(a) and 7(b). Each curve is averaged over 20 runs for each case to reduce the influence of chance. Fig. 7 shows that the fitness value decreases as the training set grows. Moreover, convergence takes fewer than 250 iterations for IMF1 and 150 iterations for R.

2) RESULTS OF ITERATIVE MULTI-STEPS PREDICTION
On the basis of single-step prediction, the prediction model is extended to multi-step prediction. In practical applications, multi-step prediction allows problems in the use of an AEC to be discovered earlier, so that the AEC and its circuit can be repaired and maintained earlier to ensure the safety and reliability of the system. In this study, the multi-step prediction is made 5, 10, 15, 20 and 25 steps ahead.
In Fig. 8, the prediction results are shown as lines, where the gray line represents the real value and the other prediction results are shown in different colors. The short-term prediction of the AEC time series can be effectively and accurately conducted based on CEEMDAN-GWO-RELM. The REs at each TP are shown in Fig. 9. When the prediction is made 25 steps ahead, that is, 150 hours ahead, the maximum RE at each TP does not exceed 0.7%, the maximum RMSE is 0.91304, the maximum MAE is 0.69557, and the maximum MAPE is 0.26%, as shown in TABLE 4. When the number of steps ahead is large, the RE and prediction error also increase. Nevertheless, the model has high prediction accuracy not only in single-step prediction but also in multi-step prediction. In addition, predicting several steps at a time in the recursive multi-step scheme reduces the number of recursions and the cumulative error, which further improves the accuracy of short-term prediction.

V. CONCLUSION
The establishment of a fault prediction model for AECs is conducive to maintaining the normal operation of the AEC and the circuit in which it is installed, which improves the reliability of system operation. In this paper, an integrated model based on CEEMDAN-GWO-RELM is designed for fault prediction on AEC degradation data from accelerated aging tests under thermal stress. The main conclusions from the experiments are: (1) Compared with EEMD and VMD, CEEMDAN applied to AEC sequence feature decomposition has a better decomposition effect and a smaller reconstruction error. It effectively overcomes the nonlinearity of the time series caused by local regeneration during AEC degradation, which reduces the difficulty of model prediction and improves the prediction accuracy of GWO-RELM. (2) Compared with other commonly used neural networks, the proposed integrated prediction model has higher prediction accuracy, and the maximum RE in the single-step prediction at each TP with 40% of the data set as the training set is not higher than 1.6%. In the short-term prediction 25 steps ahead, its RMSE is 0.91304 and the RE at each TP is less than 0.653%, which indicates high prediction accuracy.
However, in the operating circuit, AECs are also affected by electrical stress. Moreover, electrical stress has a more serious influence on the regeneration of AECs. This further increases the difficulty of fault prediction for AECs and the circuit. Therefore, thermal and electrical stress aging experiments will be further conducted on AECs and circuits to study the failure mechanism and fault prediction methods in practical engineering applications.

He is currently working as an Associate Professor with the School of Automation, Nanjing Institute of Technology, Nanjing. His current research interests include condition monitoring, the fault diagnosis and prognostics of power electronic systems, and deep learning for intelligent health management.
ZHENWEI ZHOU received the Ph.D. degree in operational research and cybernetics from the University of Chinese Academy of Sciences, in 2012. He is currently a Senior Engineer with the China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou, Guangdong, China. His research interests include fault diagnosis, prognostics, health management, and statistical reliability of electronics systems and devices.
LICHEN YANG was born in Jiangyin, Jiangsu, China, in 1997. He received the B.S. degree from the Department of Automation, Nanjing Institute of Technology, Nanjing, China, in 2019. He is currently pursuing the M.S. degree in mechanical engineering with the Nanjing Institute of Technology. His current research interests include the fault diagnosis and deep learning of power electronics.