Complementary Ensemble Adaptive Local Iterative Filtering and Its Application to Rolling Bearing Fault Diagnosis

A fault diagnosis model for rolling bearing based on complementary ensemble adaptive local iterative filtering (CEALIF), Laplacian score (LS) feature selection, and genetic algorithm-based backpropagation neural network (GA-BPNN) is proposed in this article. When the rolling bearing fails, the field-measured waveforms usually shows strong nonlinearity and non-stationary characteristics. Adaptive Local Iterative Filtering (ALIF) is an alternative novel approach to empirical mode decomposition to decompose complex signals into multiple intrinsic mode functions (IMFs), but modal aliasing will occur in actual processing. Aiming at the phenomenon of modal aliasing, a noise-assisted analysis methodology, namely complementary ensemble adaptive local iterative filtering, which could overcome the modal aliasing problem of ALIF. This article applies it to the pre-processing of rolling bearing time series. Then, the time domain (TD) statistical features of the IMFs, their Fourier frequency domain (FD) features, and the time frequency domain (TFD) energy features are extracted to capture the fault information. Meanwhile, to avoid feature redundancy and enhance the diagnostic performance, the LS is adopted to rank the features to improve the fault characteristics. Subsequently, the optimized feature vectors are entered into the GA-BPNN to automatically achieve the fault type recognition. The experimental data analysis results of rolling bearings indicate that the model can effectively diagnose the degree and type of failure.


I. INTRODUCTION
The rolling bearing is an integral component of the rotating machinery and its failure would directly affect the functioning of the machinery as a whole [1], [2]. Therefore, it has become increasingly important to improve bearing reliability and detect bearing failure in timely and accurately [3]. When the rolling bearing fails, it will produce periodic pulse impact force, causing the system's nonlinear vibration, which makes the collected vibration signal always has nonlinear nonstationary features, so the key to its fault diagnosis is how to derive failure characteristic information from nonlinear, nonstationary signal [4], [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Lei Shu .
The traditional signal analysis approaches are often only applicable to the analysis of smooth signals, while the rolling bearing fault signal has certain limitations [6]. For rolling bearing fault signals, the signal analysis processing technology must be adaptive and stable to ensure the effective fault signal characteristic information can be obtained in the face of complex fault signals [7]. In the past few decades, vibration-based analysis approaches have been well developed in fault diagnosis, such as short-time Fourier transform (STFT); Wavelet transform (WT); Empirical mode decomposition (EMD), and so on. STFT [8], [9] employs a moving window function to intercept non-stationary signals, which will result in STFT's time resolution and frequency resolution not being optimal at the same time, and when the frequency changes rapidly, the analysis effect is not good. VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ WT [10], [11] has been well employed in mechanical fault diagnosis, but how to choose a wavelet basis function with better time-frequency resolution is not yet clear, which needs to be determined manually. EMD [12] is an adaptive non-stationary signal analysis technique developed by Huang et al., which achieves adaptive decomposition by seeking the envelope average curve of the time series to be handled. However, this method also has many problems, such as under-envelope and over-envelope phenomena when seeking the extreme envelope, and serious modal aliasing and end-point effects will occur in the face of certain fault signals. The decomposition results are distorted or even completely invalid, which leads to the misunderstanding of signal processing analysis, which affects the accuracy of fault diagnosis. At the same time, this method has been lacking rigorous and effective mathematical proof in theory. To solve the EMD modal aliasing problem, Wu and Huang [13] suggested ensemble EMD (EEMD) utilizing the statistical properties of white noise. Li et al. combined EEMD with improved frequency band entropy to derive bearing failure characteristics [14]. However, when the number of integrations is not large enough, the residual noise in the IMF will increase. Therefore, Yeh et al. [15] proposed complementary EEMD (CEEMD) to reduce the residual noise by adding a pair of white noise to the analysis signal. However, CEEMD cannot solve problems such as different IMF orders due to the addition of different noises. So Torres et al. [16] proposed complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), which can avoid the problem of generating different IMF orders due to the addition of different noises. Cheng et al. successfully employed CEEMDAN for rolling bearing fault diagnosis [17]. Although CEEMDAN overcomes the aforementioned problems of EEMD, it is still essentially not free from the lack of theoretical support for EMD. Lin et al. improved the EMD methodology and proposed the Iterative filtering (IF) [18] algorithm, which employs a low-pass filter function instead of EMD to solve the envelope average curve in EMD to solve the issue of lack of theoretical support for EMD and its derivatives methods, and proved the convergence of the algorithm [19]. Dash et al. applied IF to the feature extraction of EEG signals and achieved positive results [20]- [22]. When the signal to be processed contains strong nonlinear and non-stationary components, a fixed low-pass filter function may cause problems such as distortion of the decomposition results and poor adaptive behavior. To improve the adaptive and accurate decomposition of the method, Cicone et al. [23]- [25] propose a novel signal processing technology, the Adaptive Local Iterative filtering (ALIF) algorithm, by reworking traditional filters to ensure that the IF method can be accurately decomposed when applied to non-stationary signals. It is derived from the EMD method and developed from the IF method. It also uses iterative decomposition technology. The difference is that it uses an adaptive filter with a filter length driven by data to achieve the necessary adaptive decomposition. The solution of the Planck equation constitutes this method, which avoids some defects in the EMD method. The ALIF decomposition method is not only highly adaptive, it can accurately process AM-FM signals containing multiple components and decompose it into several stationary signals [26]- [29]. At the same time, the method has a stronger ability to suppress modal aliasing and is still faced with complex noise. There is a good performance. In addition, the method already has rigorous mathematical proofs, and its decomposed components have proved to be convergent [25]. Therefore, it can be introduced into the discipline of mechanical fault diagnosis.
Since the ALIF is obtained by solving the Fokker-Planck equation and the filtering interval based on extreme-scale drive, the filtering interval of the filter is adaptively adjusted according to the extreme-scale during the iterative filtering process [23]. It was found that when the signal to be processed contains high-frequency intermittent signals, the extreme value of the signal to be decomposed will change suddenly, which affects the ability of the filter to distinguish different modes, resulting in the decomposition result distortion [13], [15]. Therefore, to improve the decomposition effect of ALIF, inspired by the idea of EMD modal aliasing, a complementary ensemble adaptive local iterative filtering (CEALIF) algorithm based on noise-assisted analysis is presented. When the ALIF method is employed to process the signal, a pair of white noise with the same amplitude and opposite sign is added, which allows the different proportional components of the original signal to fill the timefrequency space evenly over the entire time range. The noise addition results in a more uniform distribution of extreme points, which improves the performance of the adaptive filter and provides better resolution to different modes. Each result is averaged over a finite number of ensembles to remove the noise effect and obtain the final result. The CEALIF algorithm is employed to the pre-processing of the vibration signal of bearing failure, which decomposes it into the sum of several IMFs, and subsequently extract the IMF characteristic parameters as well as the time frequency domain information.
After extracting the above rolling bearing fault characteristics, to avoid information redundancy due to the high dimensionality of the feature vector and reduce the efficiency of diagnosis, this article adopts Laplace score (LS) to re-rank the features [30], and select the most relevant features as the feature vectors to characterize the failure indicators, thus reducing the dimensionality of feature vectors and improving the diagnostic efficiency.
Naturally, after obtaining the feature vectors, a multiplefault categorizer is adopted to automatically identify the failure category and the degree of damage of the rolling bearing [3]- [33]. Some common classifiers such as support vector machines (SVM) [34], [35] and Backpropagation neural network (BPNN) have been widely used in machinery and equipment diagnosis. However, SVM requires prior selection of kernel function and its parameters, which is essentially binary classification, and multiple binary classification is required for multi-classification problems. In order to reduce the reliance on human experience and achieve automatic diagnosis, a suitable multi-classifier needs to be selected. BPNN [36]- [38] does not require the introduction of new parameters in the training process, which relies entirely on the adjustment of initial weights and thresholds. To avoid the local optimization problem of random initialization and to enhance the performance of BPNN, this article adopts genetic algorithm (GA) [39], [40] to find the optimal initial weights and thresholds of BPNN and applies them to construct predictive models. The model developed through this approach is more stable and can converge faster than the standard BPNN.
Based on the above analysis, the paper proposes a rolling bearing fault diagnosis model via CEALIF, LS, and GA-BPNN. Firstly, CEALIF is utilized to handle the vibration signal of rolling bearing and several IMFs are obtained.; secondly, the first few IMFs containing the main fault information are selected and their TD and FD fault parameters are extracted respectively, and then the TFD feature values of the vibration signal are extracted to form the initial feature vector; thirdly, the LS is adopted for each feature of the initial feature vector according to its importance, and the first few more important features are selected as feature vectors according to the order of scores from smallest to largest; finally, the feature vectors are entered into the BPNN to automatically diagnose the type and degree of rolling bearing failures. Public datasets as well as experimental data were employed to validate the efficacy of the model.
The rest of the organizational structure is given below: Section 2 introduces the ALIF method and the basic principles of CEALIF. Section 3 uses simulated signals to test the performance of CEALIF and compares it with EEMD and CEEMDAN. Section 4 gives an introduction to the various feature quantities to be extracted and Laplace scores. Section 5 validates the proposed model and compares it with several similar methods. Conclusions are given in Section 6.

II. THEORETICAL DESCRIPTION A. INTRINSIC MODE COMPONENTS AND THEIR FILTERING
The essence of empirical mode decomposition algorithm is actually a filtering algorithm from high frequency to low frequency, and similar algorithms such as variational mode decomposition (VMD), empirical wavelet transform (EWT) and stationary wavelet transform (SWT) have been developed. The EMD algorithm obtains IMF step-by-step and the modal components satisfy the following conditions: where s(t) is a time series to be processed, is a wave operator (also known as the fluctuant operator), and ζ is a sliding operator in = s(t) − ζ (s(t)).
Further, the following is the process of IMF: Step 1. The sliding operator ζ (s(t)) is calculated for a given signal s(t).
Step 2. The time series s(t) is subtracted from the sliding operator ζ (s(t)) to obtain the wave operator = s(t) − ζ (s(t)).
Step 3. If the fluctuant operator satisfies Equation (1), then the first IMF is c 1 (t) = . If the condition is not satisfied, then the next process to obtain IMF component will continue to Steps 1-2 until Equation (1) is satisfied.
Step 4. Subtract the signal s(t) from the preceding m IMF components and obtain the residual components where r(t) is regarded as the previous s(t), and it is looped Steps 1-3 until r(t) becomes a trend component. After the loop stops, the original time series can be expressed by the following equation: The sliding operator ζ represents different decomposition methods if its computational method is different. In EMD algorithm, the sliding operator is computed by taking the average value of the envelopes as follows: where the upper envelope is E 1 and the lower envelope is E 2 .

B. THEORETICAL DESCRIPTION OF ADAPTIVE LOCAL ITERATIVE FILTERING (ALIF)
Under the framework of EMD algorithm, researchers obtained the sliding operator by convoluting the low-pass filter with the original signal, and thus obtained another algorithm called iterative filtering algorithm (IF) [18], [19].
In the IF algorithm, the convolution of signal s(t) and the filtering function h(t) is regarded as a sliding operator as follows: where L(n) is filter length, which is calculated as: where the steady-state coefficient η is 1.6, t denotes the number of extreme points of the sequence, K is the length of signal, and [·] is rounded to zero. Due to the limitation of operation time, it is difficult to realize the situation that n tends to infinity in Equation (1). Considering the actual situation, the features of IMF can be described as follows: where e min , e max and e 0 are the minimum, the maximum and zero values of the IMF respectively, e 1 (t) denotes the envelope function formed by IMF's local maximum and e 2 (t) enotes the envelope function formed by IMF's local minimum. Equation (7) can be employed to determine whether the component satisfies the IMF condition: whenever the SD value reaches the set threshold, iteratively stop to obtain an IMF component.
It is worth mentioning that in the iterative filtering technique, the filter h(t) is a general filter, and the filter interval L(n) is also a fixed value. In order to improve the adaptability and accuracy of the decomposition of this method, the traditional iterative filtering technique is improved to ensure that it can be applied to the decomposition of non-stationary signals. In ALIF, the filter h(t) is given by the solution of the Fokker-Planck equation, and the filter interval L(n) is also adaptive according to the change trend of the signal to be decomposed [23]. The Fokker-Planck equation is as follows: where α and β are constants in (0,1); and f (x) and k(x) are two smooth differentiable functions; and will cause a diffusion effect to make the solution of the equation move to both ends; meanwhile, −α ∂(k(x)p) ∂x will move the solution p(x) from both ends of (a, b) to the center. When the two reach equilibrium, there are as follows: At this point, the solution to the equation (9) satisfies the condition: ∀x ∈ (a, b), p(x) > 0 and in other case p(x) = 0.
This shows that the solutions p(x) of the above steady-state equations are all between (a, b), and then p(x) can be regarded as a filter function. With the change of the filtering interval, the filtering function also changes, so as to realize the adaptive decomposition of the ALIF algorithm.

C. RESTRAIN OF MODE MIXING AND COMPLEMENTARY ENSEMBLE ADAPTIVE LOCAL ITERATIVE FILTERING (CEALIF)
Mode mixing is the disadvantage of most iterative screening algorithms. ALIF's mode mixing is defined as the existence of multiple scales in a single modality or the existence of the same scale information in different modalities. In this article, EMD's suppressing mode mixing strategy is employed to solve the mode mixing issue of ALIF. The study found that the modal aliasing of ALIF mainly comes from the discontinuity of the time scale, which is the influence of signal intermittent. When there is an intermittent signal in the signal to be analyzed, there is no extreme value related to the intermittent signal in certain intervals, so the extreme points of other scales supplement the missing extreme points of the intermittent signal and some of them will remain on the scale of the intermittent signal in the end, the result will cause the performance of ALIF's filter bank to be compromised, which is commonly referred to as modal aliasing. The white noise is evenly distributed throughout the time-frequency space. The original signal will be projected to the corresponding scale when white noise is added [41]- [43]. Therefore, the ALIF filter set can be restored by adding white noise Performance. The addition of noise will include additional noise components in the decomposition results. In this article, white noise of opposite sign is added enough times to eliminate the impacts of the noise. The flowchart of the CEALIF algorithm is depicted in Fig. 1: Step 1: Let the signal to be decomposed be s(t) and add white noise of opposite sign to s(t) respectively.
where n i (t), i = 1, 2, . . . , Ne represents the added white noise signal, Ne represents the number of additions.
Step 2: Use the ALIF technique to decompose the target Step 3: Repeat steps 1-2 until i = Ne, where Ne is the logarithm of white noise added and the number of integrated averages.
Step 4: Integrate all the IMF components obtained above to obtain the first IMF component s(t).
The amplitude of white noise is added to different signals; there is no consistent standard with the integration number Ne. The magnitude of each added noise is determined from the standard deviation of the raw decomposed signal, which is generally less than 0.5 times the difference of the raw signal standard, Ne is generally within one hundred.

III. NUMERICAL SIMULATION ANALYSIS
There are widespread high-frequency intermittent signals in mechanical signals. To demonstrate the mode mixing issue and verify the validity of CEALIF, the high-frequency intermittent signal is added to the signal composed of modulated signals and harmonic components. The expression is provided by:  where δ(t) is the step signal. s 1 (t) is a 2-segment highfrequency intermittent signal with a frequency of 200Hz, s 2 (t) and s 3 (t) are frequency-modulation signal and harmonic signal respectively. The mixed signal and its frequency spectrum are plotted in Fig. 2(a), and its components are presented in Fig. 2(b).
The intermittent signal is a classic example of mode mixing and the modulated signal is generally a bearing fault signal. We know that the more ensembles there are, the less noise residue there is. To achieve the decomposition effect, the Ne is set to 100. Meanwhile, to avoid the noise amplitude is too large to swamp the original signal, the noise amplitude is set to 0.1. The mixed signal is decomposed by ALIF and CEALIF methods, respectively. Fig. 3(a) provides the ALIF processing results of the mixed-signal. According to the results, the three components are heavily contaminated by high-frequency intermittent signals and the IMFs are distorted. Fig. 3(b) presents the frequency domain diagram of each component after ALIF decomposition. Where IMF1 contains both frequency modulated signals and high-frequency intermittent signals. Among them, modulation components also appear in IMF2, while IMF2 and IMF3 both contain low-frequency harmonic components. This means that serious mode mixing has occurred in ALIF and the intermittent signals are not separated. Therefore, the decomposition capability of the ALIF algorithm needs to be improved when the signal to be processed contains highfrequency intermittent components.
The CEALIF decomposition results of the high-frequency intermittent mixed-signal and the frequency domain diagrams of its components are depicted in Fig. 3(c) and Fig. 3(d), respectively. The high-frequency intermittent signal is basically decomposed into the IMF1 component, while IMF2 and IMF3 also correspond to the original mixed-signal components. The obtained components not only correspond to the original signal components but also the decomposition results do not produce redundant components, which indicates that the CEALIF method can suppress the modal aliasing effect well and the decomposition results are basically unaffected by the additional noise.
To demonstrate the validity of the CEALIF algorithm, EMD's mode mixing improvement methods EEMD and CEEMDAN are also employed to analyze high-frequency intermittent mixed signals for comparison. EEMD and CEEMDAN methods process high-frequency intermittent mixed signals to obtain 12 and 9 IMF components, respectively. For simplicity, the first six effective IMF components are taken for analysis, and the results of EEMD and CEEMDAN are displayed in Fig. 4 and Fig. 5, respectively.  From Fig. 4, we can see that the time domain components in the high-frequency intermittent band are subject to different degrees of pollution, and the distortion is more serious. In the frequency domain, it can be found that IMF1 and IMF2 contain multiple frequency components, and low-frequency harmonic components also appear in IMF2 and IMF3 at the same time. That is to say, EEMD in processing contains high-frequency intermittent signal will be a serious mode mixing. In the CEEMDAN processing of Fig. 5, the lowfrequency harmonic components are completely separated and only the high-frequency intermittent and modulated signals undergo modal mixing. That is to say, CEEMDAN will also suffer from modal aliasing when processing signals with high-frequency intermittent signals, but the aliasing will be less than EEMD and the redundant component will be less than EEMD.
The simulated signals indicate that ALIF has serious mode aliasing when the signal to be processed contains high-frequency intermittent components, while CEALIF improves the performance of adaptive filter by adding white noise based on ALIF, which makes the performance of CEALIF method better and has high recognition ability for different modal components. By comparing with EEMD and CEEMDAN, which are proposed to address the modal aliasing of EMD, CEEMDAN suffers less modal aliasing than   EEMD when dealing with high-frequency intermittent signals, but there is still a certain gap compared with the CEALIF decomposition results.

IV. FEATURE EXTRACTION, SELECTION, AND RECOGNITION A. FEATURE PARAMETER EXTRACTION
When the vibration state of the rolling bearing changes, the TD information, FD information, and TFD information of the measured signal may change accordingly [40]. As the measured rolling bearing vibration signal is more complex, the feature extraction approach based on a single TD or FD has certain limitations. To obtain more fault information, the TD, FD and TFD features of the vibration signal are extracted simultaneously in this article.
Time-domain (TD) indicators (such as kurtosis, impulse factor) and frequency domain (FD) characteristics (such as frequency variance, mean square frequency) are physical quantities based on statistical characteristics, which reflect the amplitude and energy fluctuations in the TD and FD of the vibration signal, and when the vibration signal changes, these characteristics can be accurately monitored from both the TD and FD, so they can effectively characterize the fault signal [40].
When mechanical equipment failure, not only the vibration signal of TD and FD characteristics of the change, and the time-frequency domain (TFD) energy characteristics will also change, mainly manifested in the time-frequency surface of the different time-frequency block energy distribution differences, using the time frequency entropy (TFE) to represent the energy differences.  The TFE is determined as follows: the time frequency plane is divided into N blocks of equal areas, where the total energy is E and the energy of each block is E i . Normalize the energy of each block to obtain e i = E i /E, so there is  [44]. In conclusion, to obtain more information about the fault status, 11 characteristics are extracted in this article, as listed in the Table 1.
Since the values of more feature parameters do not belong to the same order of magnitude, the feature matrix is normalized before entering the fault model. The normalization formula utilized was as follows: where [κ min , κ max ] is the normalized interval, x min and x max respectively correspond to the minimum and maximum values of the characteristic matrix respectively. This article normalizes the characteristic parameters to [− 1, 1].

B. FEATURE SELECTION-BASED LAPLACIAN SCORE(LS)
Although the faults in the equipment can be identified from different angles based on the above features, however, all features are not equally sensitive to faults. Some features are highly sensitive to faults, and there may also be components that are irrelevant or even redundant. Therefore, the feature set may be further sorted and selected before it is entered into the classifier [45]. Laplacian score (LS), based on mapping and projection [46], measures feature through local retention capabilities, directly learns the feature set to extract the internal information structure of the data, transform high-dimensional features into low-dimensional ones. The selection of features with smaller scores in the low-dimensional feature space largely preserves information about the overall geometric structure contained in the failure signal character set and thus aids in rolling bearing fault recognition.
Let L k be the LS of the k-th feature, p ki be the k-th feature of the i-th sample, where k = 1, 2, . . . , n represents the number of features of each sample and i = 1, 2, . . . , m represents the sample. The Laplace score approach is presented below: Step 1: Given a sample of features, the nearest neighbor graph G with m nodes is constructed, where x i denotes the i-th node. Then the premise that x i and x j are connected by edges is that they belong to the same class and are k neighbors of each other.
Step 2: The distance weight of the edges in G are denoted by Equation (14).
where σ is a suitable constant; if x i and x j have no edges connected, S ij = 0.
Step 3: For the k-th feature, define The matrix L is often denoted as the Laplace matrix of graph G. De-averaging p r yields: Step 4: Calculate the LS of the k-th feature. (20) where Var() denotes the variance. It is observed that for a well-defined feature, the larger S ij is, the lower the corresponding Laplacian score is. Therefore, the feature with a lower LS is selected as the feature vector.

C. GENETIC ALGORITHM-BASED BPNN
BPNN have strong self-organizing learning capabilities via error backpropagation, which can realize classification through the nonlinear mapping between the layers. However, the choice of initial weights and thresholds significantly affects its nonlinear mapping performance.
Genetic algorithm (GA) [47], [48] is an intelligent optimization algorithm with powerful global nonlinear optimization function. In this article, GA was employed to optimize the initial weight and threshold to boost the nonlinear mapping capacity and convergence speed of the BPNN. In the GA, the population individuals are firstly coded; the neural network training error is then adopted as the fitness function; secondly, the best fitness individuals of the population are obtained through selection, adaptive crossover and mutation operators. Finally, the best initial weights and thresholds are obtained by the best fitness individuals of the population until the set prediction error requirements are satisfied or the set maximum number of iterations are reached, thus obtaining a mature GA-BPNN. where the mean relative error (MRE) is set as the termination criterion, the expression of which is shown in Equation (21). Fig. 6 depicts the whole flow chart of GA-BPNN.
where N denotes the sample size, x i is the true value, andx is the predicted value.

V. THE PROPOSED FAULT DIAGNOSIS MODEL AND EXPERIMENTAL VERIFICATION
A. THE PROPOSED MODEL BASED ON CEALIF 1) CEALIF is employed to process each sample signal and generate a sequence of IMFs for each.
2) Select the first 5 IMF components, extract their TD and FD features in section 4.1 and extract the TFE of the vibration signal to construct the following initial feature vector: 3) Calculating the LS of the initial feature matrix composed of the initial feature vector T, ranking the feature vectors in order of importance from lowest to highest according to their scores, and selecting the first five features as the novel failure feature vectors.
4) The obtained new failure feature vectors are input into the GA-BPNN for training the network to enable automatic  diagnosis of rolling bearings. The proposed model is schematized in Fig. 7.

B. EXPERIMENTAL VERIFICATION 1) CASE 1: APPLICATION TO OPEN-SOURCE DATASETS
In order to validate the validity of the proposed rolling bearing fault diagnosis model, bearing experimental signals from Case Western Reserve University were employed to test the model performance. The transmission sketch as well as the experimental apparatus is presented in Fig. 8 and its description is given in [49]. The data collected at the drive end was utilized for analysis and truncated in 4096 points.
For the above experimental data, the types of failure are categorized as rolling element failure, inner race failure and outer race failure. The degree of failure is categorized as slight, moderate and severe, and considering the normal operating condition, a total of 10 categories need to be identified. A sample waveform for each of these categories are displayed in Fig. 9. For each category, 10 samples were randomly selected for training and the other 10 samples were employed to determine the type of failure. In total, there are 100 training samples and 100 sets of test samples. All data types are listed in Table 2. Since the measured vibration signals containing mechanical faults generally present  nonlinear and non-stationary properties, it is impossible to identify the failure type only based on the waveforms in Fig. 9. Therefore, it is essential to further decompose the time-domain waveform.
Firstly, CEALIF is adopted to decompose each vibration signal of each category separately and to obtain multiple IMFs; secondly, the aforementioned characteristic parameters of the first five IMF components and the TFE of the vibration signal are extracted to construct the feature vector T =  According to LS, the first five optimal feature values can be obtained as the TFE of the original signal, the RMSF of IMF2, the FC of IMF2, the RMS of IMF2, and the SF of IMF2, which indicates that the second component of IMF2 obtained from the decomposition contains important fault information. In addition, the first five selected features completely contain the TD, FD, and TFD indicators, which make the diagnosis results more reliable. Furthermore, to demonstrate the better fault correlation of the sort-optimized feature set, we plotted the first three features of the sort-optimized feature set (T 56 , T 22 , T 20 ) and compared them with the random features (T 1 , T 2 , T 3 ) before the sort-optimization, as illustrated in Fig. 10. It can be seen that the samples in the sort-optimized feature set are mostly clustered around their corresponding centers and a clear demarcation line between classes can be observed. On the contrary, the unordered optimized feature set basically exhibits a dispersion trend and most of the categories have no obvious boundaries between them.
Finally, the optimal feature vector matrix IT 100×5 is imported into the GA-BPNN categorizer for training to derive the prediction models of each category. According to the feature matrix IT 100×5 , extract the 5 optimal feature values of 100 test samples (for the following comparison, the paper still extracts 56 feature values of each vibration signal), and the feature matrix OT 100×5 of the test sample is obtained, which is input into the trained GA-BPNN categorizer to generate the identification results of the test samples, as presented in Fig. 11. The results demonstrate that the proposed model has a good fault recognition effect, which not only realizes the distinction of fault categories but also the partitioning of different fault degrees, and the test samples have a high fault identification rate (100%).
In addition, the number of feature values has an important impact on the diagnosis result. If the dimension of the feature vector is too small, that is, the number of feature values is small, it will not be able to fully reflect and distinguish the fault type and fault degree, which will affect the reliability of diagnosis; on the contrary, too many eigenvalues will cause information redundancy and increase the classifier Training time, thereby reducing diagnostic efficiency. For comparison, this article selects the first F feature values (F = 3, 4, 5, 6, 7) after LS sorting as the feature vector input to the GA-BPNN classifier. The number of samples from the training and the test samples remained the same. After training, a prediction model was established and the output results of the test samples were tested, as depicted in Fig. 12.
The results revealed that too few features will lead to poor diagnostic accuracy, and too many features will inevitably lead to a decrease in diagnostic efficiency. It means that too many or too few features have an inappropriate effect on the failure identification results. Meanwhile, the conclusions in Fig. 12 also verify that it is feasible and reasonable to select the first five characteristic quantities in the fault diagnosis model presented in this article.
To illustrate the necessity and superiority of the LSoptimized feature vectors, select the F (F = 3, 4, 5, 6, 7) feature values before LS sorting as feature vectors to be imported into the GA-BPNN categorizer. The number of training and test samples remains the same, the trained prediction model is created, and the test samples are obtained with the recognition rates shown in Table 3. Comparing Table 3 with Fig. 11 and Fig. 12, it can be obtained that the feature vector is not optimized by LS, and for the same GA-BPNN categorizer, the recognition rate of the test sample is significantly lower than the results of the test sample with the LS-optimized feature vector as the input feature vector, which shows that LS optimization of feature values has certain advantages and necessities.
In addition, to illustrate the superiority of GA-BPNN classifier, an unoptimized BPNN categorizer was employed to do the same process, in which the network structure was 8 layers of hidden layer and 10 layers of output layer, the training target was 0.001, the maximum number of training was 1000, and other parameters were set by default. The feature vectors are composed of the first 3 to 7 feature values after LS optimization, and the number of samples from the training   Table 4. Comparing Table 4 with Fig. 11 and Fig. 12, it is found that for the same features, the recognition rate of BPNN is lower than GA-BPNN, which indicates the necessity of optimizing the neural network and verifies that GA-BPNN is an effective classification method.

2) CASE 2: APPLICATION TO ROTATING MECHANICAL FAILURE TEST BENCHES
To further validate the proposed model, different bearing damage vibration signals were collected under different operating conditions, with damage types including rolling element failure, outer race failure, and inner race failure. The experimental equipment is shown in Fig. 13, which consists of an AC motor, a coupling and a magnetic powder brake. The test bearing is a cylindrical roller bearing of model SKF 6202-2Z. The above damages were simulated using the EDM technique with a damage depth of 1.5 mm and the three faulty bearings are presented in Fig. 14. The data sampling frequency is 2560 Hz. Three operating conditions were designed, as shown in Table 5, with four bearing classes for each condition. Each of the different operating conditions contained 35 sets of data samples per operating condition, resulting in a total of 140 sets of data samples per operating condition, each consisting of 4096 data points.
For the above experimental data, four categories of bearing working conditions under different operation states are identified separately. The waveforms of the four bearing categories under the three operating conditions are depicted in Fig. 15.   For each category, 20 samples were arbitrarily picked for training and the remainder were taken for testing, and the data sample labels are shown in Table 6. Hence, there were 80 training samples and 60 test samples for each condition. The above operation was implemented for all three operating states listed in Table 5.   The model proposed in this article is deployed for three different operating states. Firstly, CEALIF is employed to pre-process each sample and obtain a number of IMFs, and then TD and FD parameters of the first five IMFs and the TFE of the raw vibration signal are extracted in turn, which can constitute the feature vector T = [T i ] 1×56 . The feature vector T of the 80 training samples is then constructed into the initial feature matrixT = [T ] 80×56 . Since the values of each feature parameter do not belong to the same order of magnitude, the feature matrix is normalized by Equation (13). The next step is to compute the LS of each characteristic and prioritize it from smallest to largest, select the first five features with the smallest score as the optimal features and form the optimal feature vector matrix. For the above experimental data, the inverse of the LS was adopted as an indicator for a clearer display, as shown in Fig. 16, which shows the LS scores for the three operating conditions.
According to the LS optimization ranking in Fig. 16, the first five important indicators for operating state 1 are RMS of IMF1, the RMS of IMF2, the PK of IMF1, the SF of IMF2, the RMSF of IMF1; the first five important indicators for operating state 2 are the RMS of IMF1, TFE for IMF5, the SF of IMF1, the RMSF of IMF1, the M of IMF2; the first five important indicators for operating state 3 are the RMS of IMF1, the PK of IMF1, the FC of IMF2, the RVF of IMF2, and the RMS of IMF2. The above data shows that the vibration signal fault information is essentially found in the first two IMFs and the vibration signal time-frequency entropy. It also justifies choosing different indicators such as TD, FD and TFD as indicators of fault characteristics. In addition, to demonstrate better fault correlation of the ranked optimized feature set, for ease of observation, we plotted the first three features of the five important indicators mentioned above and compared them with the unoptimized random features, as indicated in Fig. 17. Most of the samples of the feature set that were LS ranked optimized in the three conditions in the FIGURE 17. The comparison between the sorted optimized feature set and the random initial feature set: (a) Optimal feature set after sorting in operating state 1 (T2, T13, T3); (b) Random initial feature set of operating state 1 (T1, T2, T3); (c) Optimal feature set after sorting in operating state 2 (T2, T56, T50); (d) Random initial feature set of operating state 2 (T1, T2, T3); (e) Optimal feature set after sorting in operating state 3 (T2, T3, T20); (f) Random initial feature set of operating state 3 (T1, T2, T3). figure are clustered around their centers and there are clear boundaries between categories. In contrast, the unoptimized initial feature set largely exhibits a dispersion trend, with no obvious clustering centers except for the normal bearing operating conditions. In particular, the inner and outer race failures are mixed together in all three operating states and cannot be effectively recognized.
Finally, the optimal feature vector matrix IT 80×5 obtained above is input to the GA-BPNN categorizer for training to derive the prediction model for each category. According to the feature matrix IT 80×5 , extract the first five optimal feature values of 60 test samples (for consistency, the paper still extracts 56 feature values of each sample) to obtain the feature matrix OT 60×5 of the training sample, and input OT 60×5 into the trained GA-BPNN classifier for automatic recognition. Fig. 18 depicts the recognition rates of the test samples under three different operating conditions. The results demonstrate that the proposed model has good fault recognition and can distinguish well between different loading conditions and different rotational speed conditions, and the fault recognition rates of the test samples are 100%.

VI. CONCLUSION
A rolling bearing fault diagnosis model integrating CEALIF, LS, and GA-BPNN is presented. In the proposed model, TD, FD, and TFD information were utilized as damage indicators, where the CEALIF method was employed for signal pre-processing. Then with the help of LS and GA-BPNN, the optimal feature vector was adopted to examine the rolling bearing failure category. The experimental results demonstrate the superior capabilities of the model in determining the different typologies of rolling bearing failures as well as the different degrees of damage. The main aspects of the work are as follows: (1) the noise-assisted CEALIF method was proposed to suppress the ALIF modal aliasing problem, and the simulation signal part showed that the CEALIF method was more stable in processing signals containing high-frequency intermittent signals, which was an effective data analysis technique; (2) the TD, FD and TFD features were extracted simultaneously, which reflected the fault characteristics in many aspects and improved the reliability of the diagnostic results; (3) the LS was introduced to rank the obtained feature parameters according to their importance for selection and reduce redundant features to improve the classification performance; (4) GA is employed to address the shortcomings of the tendency of BP neural networks to fall into local optimality as well as slow convergence to obtain the best BPNN performance. In further investigations, more alternative characteristic parameters and more advanced damage indicators will be proposed to provide enhanced classification performance, which can guide further studies in bearing failure diagnosis.