LightGBM-Based Fault Diagnosis of Rotating Machinery Under Changing Working Conditions Using Modified Recursive Feature Elimination

This article presents an intelligent and accurate framework for fault diagnosis of induction motors using light gradient boosting machine (LightGBM). The proposed framework offers promising generalization ability when the testing data contains new unseen operating conditions unavailable during the training process. After the acquisition of vibration signals and feature extraction in multiple domains, we perform an iterative feature selection (FS) approach by utilizing a modified version of recursive feature elimination (RFE) and the features' importance scores obtained by LightGBM. To prevent overfitting and subsequent selection bias, an outer resampling loop encompasses the whole process of our RFE-LightGBM algorithm. Moreover, instead of the conventional resampling methods based on K-fold cross-validation (CV) or leave-one-out CV (LOOCV), we use a new scheme called leave-one-loading-out CV (LOLO-CV). Leveraging LOLO-CV, the proposed FS method identifies the optimal feature subset, making the fault diagnosis robust under changing operating conditions. Then, the final classification is performed with the optimal feature subset by training a new LightGBM model whose hyperparameters are adjusted by Bayesian optimization. Experimental results from two real case studies show that our proposed fault diagnosis framework achieves accuracies between 98.55% and 100% for various testing scenarios. For example, for the worst-case testing scenario in the bearing dataset of Case Western Reserve University, where the no-load data (0hp) is absent during the training process and is only used for testing, the testing accuracy of the LightGBM classifier before and after applying the proposed RFE-LightGBM-FS method is 88.04% and 97.23%, respectively. Using the Bayesian hyperparameter optimization further improves the accuracy to 98.55%.


I. INTRODUCTION
Condition monitoring and fault diagnosis of induction motors (IMs) at incipient stages are critical to decrease maintenance costs and tremendous financial losses and to avoid the long-term shutdown of industrial operations [1], [2]. The methods developed for the condition monitoring of IMs can be classified based on the signals being recorded and analyzed, e.g., motor vibration, current, acoustic emission, temperature, and pressure [3]-[5]. Among them, vibration signal analysis is one of the most reliable, accurate, and standard methods widely utilized in machinery fault diagnosis [6]-[9]. Recently, intelligent data-driven fault diagnosis techniques based on either shallow machine learning (SML) or deep learning (DL) have emerged in industry and academia [10], [11].
(The associate editor coordinating the review of this manuscript and approving it for publication was Shadi Alawneh.)
In the SML-based methods, after data acquisition, several handcrafted features must be extracted using signal-processing methods. Then the selected features are used in the final step (fault classification) to train SML models such as support vector machine (SVM) [12], Naive Bayes [13], K-nearest neighbor (KNN) [14], and ensemble boosted trees [15].
The DL-based fault diagnosis employs deep architectures of neural networks (NNs) with many layers. They enable automatic extraction of abstract and informative features from the raw input signals and eliminate manual feature extraction requirements [16]. Although the classification accuracy of DL-based motor fault diagnosis methods is promising [17]-[19], they rely on black-box models with little physical insight and cannot provide enough interpretation of how the system operates. Moreover, training deep NNs requires massive data, time, and computational resources, which is not practical in all industrial sectors [20], [21].
On the other hand, SML models such as SVM have simpler structures and can be trained with fewer computational requirements [22]-[24]. However, one of the biggest challenges in SML-based methods is dealing with the high dimensionality of the handcrafted features extracted from multiple domains, which deteriorates the classification accuracy due to overfitting [25]-[27]. Light Gradient Boosting Machine (LightGBM) is a recently developed framework based on gradient boosting decision tree (GBDT) [28]. Besides its promising classification performance, LightGBM can rank and evaluate the feature scores. A feature score quantifies the total contribution of a feature to the splitting process of the decision tree learners; it is calculated from the total gain of the splits that use that feature in the structure of the decision trees. Hence, using LightGBM as the classifier, it is also possible to implement an embedded feature selection (FS) and remove the redundant features that are not discriminative.
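As a minimal sketch of how such gain-based feature scores can be accumulated, the snippet below sums the gain of every split per feature and also counts the splits. The split records and feature names (`f1`, `f2`, `f3`) are hypothetical and do not come from the case studies.

```python
from collections import defaultdict

def feature_scores(splits):
    """Aggregate per-feature importance from (feature, gain) split records.

    Returns two dicts: total splitting gain and number of splits per feature.
    """
    total_gain = defaultdict(float)
    split_count = defaultdict(int)
    for feature, gain in splits:
        total_gain[feature] += gain
        split_count[feature] += 1
    return dict(total_gain), dict(split_count)

# Hypothetical splits collected from all trees of a boosted ensemble.
splits = [("f1", 4.0), ("f2", 2.5), ("f1", 1.5), ("f3", 0.5)]
gain, count = feature_scores(splits)

# Rank features from most to least important by total gain.
ranking = sorted(gain, key=gain.get, reverse=True)
print(ranking)  # ['f1', 'f2', 'f3']
```

In the LightGBM Python package, comparable scores are exposed through `Booster.feature_importance(importance_type="gain")` or `importance_type="split"`; the sketch only mimics the aggregation and does not call the library.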
Another critical challenge in intelligent IM fault diagnosis methods is the lack of available data for all operational conditions. Few works in the literature have studied the more realistic and practical scenarios where the training and testing data originate from different working conditions, i.e., different loading levels or rotational speeds [29]. Highlighting this issue, Gangsar et al. [27] showed that when there is a difference between training and testing data working conditions, the average prediction accuracy of a multi-sensor SVM-based fault diagnosis system decreases significantly. It was shown in [27], [30] that the reason for this performance degradation is that varying working conditions lead to a distribution discrepancy between the training and testing feature sets, decreasing the generalization ability of the fault diagnosis methods. Stief et al. [31] utilized principal component analysis (PCA) to reduce the dependency of the extracted features on the loading levels and combined it with a two-stage Bayesian method to improve the performance generalization. Although the technique was reported to be effective in the accurate classification of different types of faults, it could not accurately discriminate the faults' severity. Another powerful approach for solving the distribution discrepancy of features is applying the domain adaptation and transfer learning methods proposed in [30], [32]-[34]. However, the limitation of these methods is that they necessitate the availability of unlabeled data for the new working conditions. Satisfying this requirement is not practical in those industrial cases where neither labeled data nor unlabeled data are available for the motor's new working conditions.
Considering the above challenges of intelligent fault diagnosis, in this article, we propose an integrated SML-based framework using LightGBM to enhance the classification performance in terms of prediction accuracy and generalization ability. We use vibration signals as the inputs of our fault diagnosis framework and extract features in the time domain, frequency domain, and time-frequency domain from the vibration signals obtained by the accelerometers. Thanks to the innovative techniques employed in LightGBM, such as gradient-based one-side sampling (GOSS), leaf-wise tree growth strategy, and histogram-based split finding [28], the proposed fault diagnosis method obtains high accuracy and efficient performance. We utilize the feature ranking capability of LightGBM to sort the features according to their importance. Then, we combine the feature scores with a modified recursive feature elimination (RFE) approach to select the best feature subset. In the proposed RFE-LightGBM-FS, an outer resampling loop encloses the whole process to avoid overfitting the training data and the selection bias. In addition, instead of the conventional resampling methods based on K-fold cross-validation (CV) or leave-one-out CV (LOOCV), we introduce a new resampling scheme called leave-one-loading-out CV (LOLO-CV) to increase the algorithm's generalization when test data contains new unseen loading conditions. The K-fold CV method splits the original dataset randomly into K training and validation sets. LOOCV is a special case of K-fold CV, where K is the number of observations [35], [36]. Therefore, in conventional resampling methods, the data from all the available IM operating conditions exist in all training sets, and there is high overlap between training and validation sets in terms of operating conditions.
In other words, the conventional resampling methods select the training and validation sets without paying attention to the operating conditions under which the data is recorded, because they do not use domain knowledge. In contrast, LOLO-CV utilizes domain knowledge in dividing the data into training and validation sets. At each resampling iteration of the proposed FS strategy using LOLO-CV, all the data samples belonging to a specific loading level are excluded from the original training set and form a virtual validation set. Therefore, each resampling iteration contains non-overlapping training and validation sets in terms of loading level. The proposed FS strategy improves the robustness of selected features against new unseen working conditions by efficiently utilizing the information in the different working conditions existing in the original training data. The main contributions of our work are as follows:
1) A new integrated fault diagnosis framework is presented, which first uses a basic default LightGBM model for obtaining the feature scores. In the next step, a modified RFE containing an outer resampling loop with a new LOLO-CV scheme is linked to the basic LightGBM. The LOLO-CV scheme efficiently leverages the information of multiple loading levels existing in the training set. It makes RFE identify an optimal feature subset that is robust to new unseen loading levels. Using RFE with LOLO-CV also leads to omitting the redundant and uninformative features. Finally, a new optimized LightGBM model is employed to perform the final classification task.
2) To improve the classification accuracy, we use Bayesian optimization to determine the most critical hyperparameters of the final LightGBM model.
3) To evaluate the model's robustness to limited data conditions, we build the testing sets using data at specific loading conditions that are not included in the training data. The loading condition specified for the test data is used neither in FS nor in hyperparameter optimization and serves only for final model evaluation.
4) Two real case studies are used to validate the performance of the proposed algorithm. Experiments under different loading conditions demonstrate the effectiveness and superiority of the proposed framework compared with the traditional fault diagnosis methods and related works.
The remainder of this paper is organized as follows. Section II presents the theoretical knowledge of LightGBM. Section III presents the proposed fault diagnosis framework, including the feature extraction in multiple domains and the proposed RFE-LightGBM FS approach. Section IV focuses on experimental results and evaluates the proposed method on two real case studies. The conclusion is given in Section V.

II. THEORETICAL FOUNDATION OF LIGHTGBM
In this section, the main principles and advantages of the LightGBM algorithm are clarified. GBDT is an iterative ensemble model achieving the final strong classification results by combining multiple base learners (i.e., weak decision trees) [37]. To improve the performance of the traditional GBDT, Chen et al. [38] presented the XGBoost framework, which supports parallel learning by CPU multi-threading, adds a regularization term to the loss function to deal with overfitting, and applies the second-order Taylor approximation in optimizing the objective function.
Having the advantages of XGBoost, LightGBM [28] is a newer enhanced implementation of GBDT. One of the defects of XGBoost is using a level-wise tree growth strategy in which many nodes obtain low splitting gains and increase the computations without improving the accuracy. LightGBM solves this problem by adopting a leaf-wise method, which is faster and more accurate. The leaf-wise method detects the leaf node with the highest gain at each step and only splits that node, growing asymmetrical and deeper trees.
Moreover, LightGBM employs other innovative strategies that distinguish its performance from XGBoost, such as GOSS and a histogram-based algorithm for finding the best split points, which are fully described in [28], [39].
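Consider a training dataset with $M$ instances and $p$ features. The predicted output of the LightGBM model for the $i$-th sample, $\hat{y}_i$, is the combination of multiple weak decision trees as follows:

$$\hat{y}_i = \sum_{n=1}^{N} f_n(x_i) \tag{1}$$

where $N$ is the total number of trees (i.e., number of iterations), and $f_n$ is an $L$-leaf node (terminal node) decision tree at the $n$-th iteration that splits the feature space into $L$ non-overlapping regions $\{R_{ln}\}_{l=1}^{L}$. The region $R_{ln}$ represents the subset of feature space corresponding to the leaf node $l$ in the $n$-th tree. Equation (2) defines $f_n$ as follows:

$$f_n(x) = \sum_{l=1}^{L} \beta_{ln}\, \mathbb{1}\{x \in R_{ln}\} \tag{2}$$

where $\beta_{ln}$ is the predicted score associated with the $l$-th leaf node, and $\mathbb{1}\{\cdot\}$ is the indicator function that outputs 1 if the condition is true and 0 otherwise. LightGBM trains the trees in an additive process. Let $\hat{y}_i^{(k)}$ be the predicted output of the $i$-th sample at the $k$-th iteration. The objective function in the $k$-th iteration is defined as

$$J^{(k)} = \sum_{i=1}^{M} \mathrm{loss}\left(y_i,\ \hat{y}_i^{(k-1)} + f_k(x_i)\right) + \alpha L + \frac{\lambda}{2} \sum_{l=1}^{L} \beta_{lk}^{2} \tag{3}$$

The term "loss" in (3) is the multi-class logistic loss function for classification problems [37]. The second part is the regularization term that prevents the number of leaf nodes $L$ and the leaf node scores $\{\beta_{lk}\}_{l=1}^{L}$ from increasing. $\alpha$ and $\lambda$ are the corresponding tuning parameters. By using the second-order Taylor expansion and omitting the constant terms, (3) can be approximated as

$$J^{(k)} \approx \sum_{i=1}^{M} \left[ g_{ik} f_k(x_i) + \frac{1}{2} h_{ik} f_k^{2}(x_i) \right] + \alpha L + \frac{\lambda}{2} \sum_{l=1}^{L} \beta_{lk}^{2} \tag{4}$$

where $g_{ik}$ and $h_{ik}$ are the first-order (gradient) and second-order (hessian) derivatives of the loss function:

$$g_{ik} = \frac{\partial\, \mathrm{loss}\left(y_i, \hat{y}_i^{(k-1)}\right)}{\partial \hat{y}_i^{(k-1)}} \tag{5}$$

$$h_{ik} = \frac{\partial^{2}\, \mathrm{loss}\left(y_i, \hat{y}_i^{(k-1)}\right)}{\partial \left(\hat{y}_i^{(k-1)}\right)^{2}} \tag{6}$$

At each iteration of the training, the optimal $L$-leaf node tree $f_k^{*}$ must be found that minimizes $J^{(k)}$. Therefore, for each tree, the non-overlapping regions $\{R_{lk}\}_{l=1}^{L}$ and the optimal leaf node scores $\{\beta_{lk}^{*}\}_{l=1}^{L}$ must be obtained. LightGBM adopts a two-step procedure for this purpose. It first fits a regression tree on the pseudo-residuals $\mathrm{res}_{ik}$ of the previous tree (i.e., the negative first derivatives of the loss function) as follows:

$$\mathrm{res}_{ik} = -g_{ik} \tag{7}$$

Therefore, $\{R_{lk}\}_{l=1}^{L}$ is established.
By combining (2) and (4)-(7), the objective function can be rewritten as

$$J^{(k)} = \sum_{l=1}^{L} \left[ \left( \sum_{i \in I_{lk}} g_{ik} \right) \beta_{lk} + \frac{1}{2} \left( \sum_{i \in I_{lk}} h_{ik} + \lambda \right) \beta_{lk}^{2} \right] + \alpha L \tag{8}$$

where $I_{lk}$ is the subset of data instances at the leaf node $l$ of the $k$-th tree. By setting the derivative of (8) with respect to $\beta_{lk}$ to zero, the optimal score of the $l$-th leaf node $\beta_{lk}^{*}$ and the corresponding minimized objective function $J^{(k)*}$ are obtained as follows:

$$\beta_{lk}^{*} = -\frac{\sum_{i \in I_{lk}} g_{ik}}{\sum_{i \in I_{lk}} h_{ik} + \lambda} \tag{9}$$

$$J^{(k)*} = -\frac{1}{2} \sum_{l=1}^{L} \frac{\left( \sum_{i \in I_{lk}} g_{ik} \right)^{2}}{\sum_{i \in I_{lk}} h_{ik} + \lambda} + \alpha L \tag{10}$$

Suppose $I_R$ and $I_L$ are the sample subsets of the right and left leaf nodes after splitting and $I = I_R \cup I_L$ is the sample subset of the original node. The splitting gain (i.e., the reduction of the objective function after the split) is computed as

$$\mathrm{Gain} = \frac{1}{2} \left[ \frac{\left( \sum_{i \in I_L} g_{ik} \right)^{2}}{\sum_{i \in I_L} h_{ik} + \lambda} + \frac{\left( \sum_{i \in I_R} g_{ik} \right)^{2}}{\sum_{i \in I_R} h_{ik} + \lambda} - \frac{\left( \sum_{i \in I} g_{ik} \right)^{2}}{\sum_{i \in I} h_{ik} + \lambda} \right] - \alpha \tag{11}$$

Higher values of gain are preferable in growing the trees. When splitting a leaf node, the gains associated with the segmentation points of candidate features are evaluated by (11). LightGBM selects the feature showing the maximum gain for splitting. Finally, feature scores are calculated according to the total splitting gain of each feature or the number of times it participated in the splitting process.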
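The leaf score and splitting gain can be illustrated numerically. The sketch below assumes the standard second-order form with an L2 penalty λ on the leaf scores and a penalty α per leaf, i.e., β* = -G/(H + λ) and gain = ½[G_L²/(H_L + λ) + G_R²/(H_R + λ) - G²/(H + λ)] - α, where G and H are the sums of gradients and hessians over a node. The gradient and hessian values are made up for illustration.

```python
def leaf_score(g, h, lam):
    """Optimal leaf score: -sum(g) / (sum(h) + lambda)."""
    return -sum(g) / (sum(h) + lam)

def node_term(g, h, lam):
    """Quality term G^2 / (H + lambda) of a single node."""
    return sum(g) ** 2 / (sum(h) + lam)

def split_gain(g_left, h_left, g_right, h_right, lam, alpha):
    """Reduction of the objective when one node is split into two leaves."""
    parent = node_term(g_left + g_right, h_left + h_right, lam)
    children = node_term(g_left, h_left, lam) + node_term(g_right, h_right, lam)
    return 0.5 * (children - parent) - alpha

# Hypothetical gradients/hessians of four samples reaching one node.
g_left, h_left = [-2.0, -1.0], [1.0, 1.0]
g_right, h_right = [1.5, 2.5], [1.0, 1.0]

print(leaf_score(g_left, h_left, lam=1.0))  # 1.0
gain = split_gain(g_left, h_left, g_right, h_right, lam=1.0, alpha=0.1)
print(gain > 0)  # True: this split reduces the objective
```

A candidate split is only worthwhile when the gain is positive, which is how the per-leaf penalty α discourages additional leaves.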

III. PROPOSED FAULT DIAGNOSIS METHODOLOGY
This article presents a new fault diagnosis approach offering high accuracy and generalization. Fig. 1 demonstrates the proposed workflow. Initially, vibration signals are collected from the accelerometers under various loading conditions. Before feature extraction, vibration signals are processed and divided into successive equal-length segments resulting in non-overlapping samples. The details of the remaining steps are described as follows.
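The segmentation step can be sketched as follows. This is only an illustration: the function name is hypothetical, and discarding trailing points that do not fill a complete segment is an assumption rather than a documented choice.

```python
def segment_signal(signal, segment_length):
    """Split a 1-D signal into successive non-overlapping equal-length samples.

    Trailing points that do not fill a complete segment are discarded.
    """
    n_segments = len(signal) // segment_length
    return [
        signal[i * segment_length:(i + 1) * segment_length]
        for i in range(n_segments)
    ]

# Hypothetical record of 10 points cut into samples of 4 points each.
samples = segment_signal(list(range(10)), segment_length=4)
print(samples)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```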

A. MULTIPLE DOMAIN FEATURE EXTRACTION AND STANDARDIZATION
The performance of the intelligent IM fault diagnosis systems is highly dependent on the information contained in the extracted features. In this study, we calculated a total of 33 statistical features from the time domain, frequency domain, and time-frequency domain analysis of the preprocessed vibration signal samples. The constructed features are standard and commonly used in the previous IM fault diagnosis works, and their detailed description can be found in [6], [7], [31], [40]. Table 1 shows the formulations of the 15 statistical time-domain feature parameters TD1-TD15, which are mean value, standard deviation, median value, skewness, kurtosis, square root amplitude, mean absolute deviation, peak to peak value, L1 norm (mean norm), L2 norm (mean-square norm), infinity norm (max norm), crest factor, impulse factor, margin factor, and shape factor, respectively. The frequency domain features may capture the information that cannot be detected in the time-domain features. After performing fast Fourier transform (FFT) on each vibration signal sample, 12 statistical frequency domain features FD1-FD12, summarized in Table 1, are extracted from each sample's frequency spectrum. In the frequency domain, the energy of vibration signals may be reflected in the feature FD1. The power spectrum convergence can be represented by features FD2-FD4, FD6, and FD10-FD12. The position shift of the main frequencies can be seen in features FD5 and FD7-FD9 [40].
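To make the time-domain features concrete, the sketch below computes a handful of them for one sample. It uses common textbook definitions (for example, crest factor as peak divided by RMS), which may differ in detail from the formulations in Table 1; the function name and the test signal are hypothetical.

```python
import math

def time_domain_features(x):
    """Compute a few time-domain features of one vibration sample."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)  # population std
    abs_mean = sum(abs(v) for v in x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    # Square root amplitude: squared mean of the root absolute values.
    sra = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2
    return {
        "mean": mean,
        "std": std,
        "kurtosis": sum((v - mean) ** 4 for v in x) / (n * std ** 4),
        "peak_to_peak": max(x) - min(x),
        "crest_factor": peak / rms,
        "impulse_factor": peak / abs_mean,
        "margin_factor": peak / sra,
        "shape_factor": rms / abs_mean,
    }

# Hypothetical sample: one period of a unit sine sampled at 8 points.
sample = [math.sin(2 * math.pi * k / 8) for k in range(8)]
features = time_domain_features(sample)
print(round(features["crest_factor"], 4))  # 1.4142
```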
In the time-frequency domain, we implemented discrete wavelet transform (DWT) on the vibration signal samples. The Daubechies mother wavelet with four vanishing moments ("db4") is chosen to decompose the signal samples in five levels, yielding five detail levels plus the approximation level. Six wavelet domain features (WE1-WE6) are computed as the percentage of energy associated with the wavelet coefficients of each decomposed level [6], [41].
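The energy percentages can be sketched as below, given the coefficients of each band. The coefficient values are invented and are not real db4 output, and the mapping of bands to WE1-WE6 is an assumption.

```python
def wavelet_energy_percentages(coefficient_bands):
    """Percentage of the total energy carried by each decomposition band.

    coefficient_bands: list of coefficient lists, one per band,
    e.g. [A5, D5, D4, D3, D2, D1] for a five-level decomposition.
    """
    energies = [sum(c * c for c in band) for band in coefficient_bands]
    total = sum(energies)
    return [100.0 * e / total for e in energies]

# Hypothetical coefficients for six bands (approximation + five details).
bands = [[3.0, 1.0], [2.0], [1.0, 1.0], [1.0], [0.5, 0.5], [0.0]]
we = wavelet_energy_percentages(bands)
print([round(v, 2) for v in we])
```

In practice the bands could come from a wavelet library, for instance PyWavelets' `pywt.wavedec(x, "db4", level=5)`, which returns the approximation band followed by the detail bands from coarsest to finest; that call is not exercised here.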
LightGBM and other tree-based algorithms are inherently insensitive to the features' scales. However, for the sake of comparison, we also use the case studies of this paper to train other classification algorithms that require feature scaling, such as SVM and KNN. We also apply PCA to the dataset for feature visualization, which needs feature scaling as well. Therefore, after the feature extraction step, the features are standardized, i.e., their average value is subtracted and the result is divided by their standard deviation, where the average values and standard deviations are computed from the training dataset.
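A minimal sketch of this standardization for a single feature is given below. The key point is that the statistics are fitted on the training data only and then reused for the test data. The helper names and values are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics

def fit_standardizer(train_column):
    """Return mean and standard deviation estimated on training data only."""
    return statistics.fmean(train_column), statistics.pstdev(train_column)

def standardize(column, mean, std):
    """Apply z-score scaling using previously fitted training statistics."""
    return [(v - mean) / std for v in column]

train = [2.0, 4.0, 6.0, 8.0]  # hypothetical feature values from training loads
test = [10.0, 0.0]            # hypothetical values from an unseen load

mu, sigma = fit_standardizer(train)
train_scaled = standardize(train, mu, sigma)
test_scaled = standardize(test, mu, sigma)  # no statistics taken from test data
print(mu)  # 5.0
```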

B. RFE-LightGBM-FS USING LOLO-CV
The next step belongs to FS. As stated in section II, LightGBM (and other tree-based algorithms) can output the feature scores. However, selecting the optimal feature subset is another challenge affecting the fault diagnosis performance. This article proposes a straightforward method based on modified RFE to select a reduced feature subset having almost identical distributions under different loading conditions. The proposed FS also alleviates the classification algorithm's computational burden by removing the redundant features.
RFE is an iterative technique that uses the feature ranking provided by training a model that can offer feature scores. At each step of RFE, the n least important features (n is user-defined) are removed from the feature set, and then the model (LightGBM) is retrained using the new reduced feature subset. An evaluation metric such as classification accuracy is estimated for each rebuilt LightGBM model. RFE finds the optimal feature subset achieving the maximum evaluation metric and selects it for the final model [35].
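The plain RFE loop described above can be sketched as follows. This is a generic illustration rather than Algorithm 1: `rank_fn` and `score_fn` are hypothetical callables standing in for "train a model and rank its features" and "evaluate a feature subset", and the toy importance values replace a real LightGBM model.

```python
def recursive_feature_elimination(features, rank_fn, score_fn, n_remove=1):
    """Plain RFE: repeatedly drop the n_remove least important features.

    rank_fn(subset)  -> features of subset ordered from most to least important
    score_fn(subset) -> evaluation metric (higher is better) for that subset
    Returns the best subset found and its score.
    """
    current = list(features)
    best_subset, best_score = list(current), score_fn(current)
    while len(current) > n_remove:
        ranked = rank_fn(current)                   # retrain and re-rank
        current = ranked[:len(ranked) - n_remove]   # drop the weakest features
        score = score_fn(current)
        if score >= best_score:                     # prefer smaller subsets on ties
            best_subset, best_score = list(current), score
    return best_subset, best_score

# Toy stand-ins: fixed importances and a score that penalizes subset size.
importance = {"a": 3.0, "b": 2.0, "c": 0.1, "d": 0.0}

def rank_fn(subset):
    return sorted(subset, key=importance.get, reverse=True)

def score_fn(subset):
    return sum(importance[f] for f in subset) - 0.2 * len(subset)

best_subset, best_score = recursive_feature_elimination(
    ["a", "b", "c", "d"], rank_fn, score_fn
)
print(best_subset)  # ['a', 'b']
```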
Algorithm 1 illustrates the process of the proposed RFE-LightGBM-FS. As shown in Fig. 1, the original test set containing new unseen working conditions does not take part in the whole process of our RFE-LightGBM-FS. Thus, Algorithm 1 only uses the original training set D as input, and it is possible to assess the robustness of the proposed method against varying working conditions by the original testing set.
To prevent the selected features from overfitting the training data and the subsequent selection bias in their iterative performance evaluation, an outer resampling loop at Line 1 encloses the whole FS loop, as proposed in [35]. This contrasts with the traditional RFE methods in which the FS [...] (Lines 19-20). Then, the complete original training set D is utilized to obtain the final feature ranking and train the final LightGBM model with the top $S_i^*$ features (Lines 21-22). As shown in Fig. 1, our proposed FS method employs a basic default LightGBM model for obtaining the feature scores and calculating evaluation metrics without performing any hyperparameter optimization at this stage. Using the basic default LightGBM model simplifies the iterative process of RFE and reduces the risk of overfitting the training data.
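The resampling idea can be sketched as follows. The first function builds one fold per loading level; the second averages the held-out accuracy of each candidate subset size over the folds and picks the best size. This is a simplified illustration, not a reproduction of Algorithm 1: features are ranked once per fold, and `rank_fn` and `accuracy_fn` are hypothetical stand-ins for LightGBM training and evaluation.

```python
def lolo_cv_splits(loads):
    """Yield (held_out_load, train_idx, val_idx), one fold per loading level."""
    for held_out in sorted(set(loads)):
        train_idx = [i for i, load in enumerate(loads) if load != held_out]
        val_idx = [i for i, load in enumerate(loads) if load == held_out]
        yield held_out, train_idx, val_idx

def select_subset_size(loads, sizes, rank_fn, accuracy_fn):
    """Average the held-out accuracy of each subset size over all folds."""
    folds = list(lolo_cv_splits(loads))
    profile = {size: 0.0 for size in sizes}
    for _, train_idx, val_idx in folds:
        ranked = rank_fn(train_idx)  # ranking uses the fold's training data only
        for size in sizes:
            acc = accuracy_fn(ranked[:size], train_idx, val_idx)
            profile[size] += acc / len(folds)
    best_size = max(sizes, key=lambda size: profile[size])
    return best_size, profile

# Hypothetical training set with 1, 2 and 3 hp data, two samples per load.
loads = [1, 1, 2, 2, 3, 3]
for held_out, train_idx, val_idx in lolo_cv_splits(loads):
    print(held_out, train_idx, val_idx)

# Toy stand-ins for model training and evaluation (not LightGBM).
def rank_fn(train_idx):
    return ["f1", "f2", "f3", "f4"]

def accuracy_fn(subset, train_idx, val_idx):
    return 1.0 - 0.1 * abs(len(subset) - 2)

best_size, profile = select_subset_size(loads, [4, 3, 2, 1], rank_fn, accuracy_fn)
print(best_size)  # 2
```

The same grouping logic is available in scikit-learn as `LeaveOneGroupOut`, with the loading level passed as the `groups` argument; it is mentioned only as a pointer and is not used in the sketch.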

C. BAYESIAN HYPERPARAMETER OPTIMIZATION OF FINAL MODEL
After finding the optimal feature subset and performing FS, the proposed framework carries out the final classification task by training a new optimized LightGBM model. According to the principal theory of LightGBM discussed in section II, various hyperparameters affect the performance of LightGBM (and other tree-based algorithms), such as the total number of decision trees (iterations), maximum depth of each tree, the minimum value of gain for splitting the leaf nodes, the learning rate of each iteration, maximum number of leaves in each tree, etc. To improve the performance, we adjust the hyperparameters of the final LightGBM model using Bayesian optimization [42]. The basic idea of Bayesian optimization is to construct a surrogate probability model of the objective function and determine the configuration of adjusted hyperparameters that perform best on the surrogate model. It then evaluates the true objective function using the selected hyperparameters and updates the surrogate probability model by adding the new evaluation results. This process continues until a certain number of iterations or time limit is reached [42]. The surrogate function we used in this article for probability representation of the objective function is the tree-structured Parzen estimator [43], which is based on Bayesian reasoning.
Because it incorporates the previous evaluation results and avoids spending considerable time on unpromising hyperparameter configurations, Bayesian optimization is typically faster and more efficient than conventional hyperparameter optimization methods like grid search or random search. As shown in Fig. 1, a LOLO-CV scheme similar to the previous FS step evaluates the classification performance of different sets of hyperparameters to avoid overfitting.

IV. EXPERIMENTAL CASE STUDIES AND RESULT ANALYSIS
This section validates the efficacy of our proposed fault diagnosis framework and compares its performance with the existing intelligent methods using two case studies. Rolling element bearings are among rotating machinery's most vulnerable and crucial components [44]. Therefore, we focus on the bearing faults in the first case study and evaluate our proposed algorithm's capabilities in diagnosing the bearing faults. The first dataset is the public Case Western Reserve University (CWRU) bearing fault dataset, which is also studied in many research articles on fault diagnosis in the literature [44].
The second studied dataset belongs to our laboratory's induction machine (IM) setup. In addition to the bearing fault, which is the most common fault type, there are other fault types, including broken rotor bars and eccentricity faults [1]. A powerful fault diagnosis method must be general enough to present excellent performance for various fault types. Therefore, we also tested our algorithm on the IM dataset, which covers six health states: the normal condition and five non-bearing fault conditions.
We used Python 3.8.5 to train the machine learning models.

A. CASE 1: CWRU BEARING DATASET

1) DATA PREPARATION
As shown in Fig. 2, the experimental setup includes a 2hp IM, a torque transducer/encoder, and a dynamometer. The dataset contains vibration data collected by the accelerometers located at the drive-end and fan-end of an IM under four loading level conditions (0, 1, 2, and 3 hp). Three different fault types have been implemented on the drive-end bearings, including inner race fault (IF), outer race fault (OF), and ball fault (BF). Each fault type consists of three severity levels (diameters 0.18, 0.36, and 0.54 mm). Therefore, considering one normal condition and three fault types, each with three severity levels, there are a total of ten class labels for bearing health states, i.e., C1-C10. Table 2 lists the ten considered class labels and their corresponding fault types and diameters. The sampling frequency is 12 kHz. The data acquired at each loading level creates a subset containing 500 samples, where each sample consists of 2400 data points. We randomly selected 50% of the samples of each subset for training. Besides, to assess the performance of the fault diagnosis methods under varying operating conditions, we constructed five testing scenarios (α, β, γ, δ, and ε) shown in Table 3. In the first four scenarios, the training and testing sets are created from different loading levels, and the loading levels used for testing sets are removed from the corresponding training sets. Therefore, they are more critical than the last scenario because they can assess the generalization ability of the diagnosis methods. Fig. 3 depicts three sample time-domain waveforms of raw vibration signals and their corresponding FFT spectra for health states IF, BF, and OF at the severity level of 0.54 mm. It is challenging to observe the hidden features discriminating the bearing states from the raw vibration signals. Hence, it is essential to extract features in multiple domains according to Section III.A.

2) RESULTS, ANALYSIS, AND COMPARISON
Based on the methodology elucidated in Section III and Fig. 1, the proposed fault diagnosis framework is implemented on the CWRU bearing dataset. Initially, the data obtained from two vibration sensors at the drive-end and fan-end of the IM are processed, and 15 time domain features, 12 frequency domain features, and six wavelet energy features are extracted from each sensor's data, leading to 66 features in total. Then, by training basic LightGBM models, the proposed RFE-LightGBM-FS is applied according to Algorithm 1 for each scenario. The total splitting gain in LightGBM is chosen as the criterion for calculating and evaluating the feature scores.
Among the five scenarios shown in Table 3, scenario α using the 0hp load as the original test data is taken as an example and is highlighted here because the no-load and light-load levels are potentially the most challenging conditions in fault diagnosis [27]. The resampling process in the RFE-LightGBM-FS of scenario α has three iterations (L = 3). Each loading (1hp, 2hp, and 3hp) is once excluded from the training set and included in the virtual nested validation set for LOLO-CV evaluation. Fig. 4a illustrates the RFE-LightGBM curve for scenario α concerning the LOLO-CV accuracy. For the sake of comparison, the testing accuracy for 0hp load is also shown in Fig. 4a for different subsets of features. It is observed that LightGBM achieves the maximum LOLO-CV accuracy (99.2%) when the top 11 most important features are selected. Correspondingly, the maximum testing accuracy (97.23%) is also achieved with the top 11 features. Therefore, applying the optimal feature subset for LOLO-CV accuracy determined by our proposed RFE-LightGBM-FS, we can reach the optimal testing performance, too, even when the test data contains new unseen loading levels.
In Algorithm 1, we applied XGBoost and Random Forest (RF) instead of LightGBM to obtain feature rankings (Lines 10-11 and 21) and perform classifications (Lines 15 and 22) to compare their performance with RFE-LightGBM-FS. Figs. 4(b) and 4(c) depict the performance of RFE-XGBoost-FS and RFE-RF-FS for scenario α, respectively. In both figures, the optimal feature subsets achieving the highest testing accuracy differ from those achieving the highest LOLO-CV accuracy. In fact, using the proposed FS strategy of Algorithm 1, RFE-XGBoost-FS gives only 93.51% testing accuracy with the top 27 features, and RFE-RF-FS gives 94.4% testing accuracy with the top 15 features. Thus, we cannot achieve the best testing performance by replacing LightGBM with XGBoost or RF in Algorithm 1.
The superiority of the RFE-LightGBM-FS method can be attributed to the different feature scores provided by LightGBM due to its leaf-wise tree growth strategy, explained in Section II. Fig. 5 compares the top 30 most important features and their scores obtained by LightGBM, XGBoost, and RF models. For instance, FD8_S1 in Fig. 5 represents the eighth frequency domain feature from the first vibration sensor located at the drive-end of the IM, and WE3_S2 represents the third wavelet energy feature from the second vibration sensor located at the fan-end of the IM. Although most of the top 30 features are shared between the three models, the feature rankings differ remarkably between the three cases in Fig. 5. Given the excellent performance of RFE-LightGBM-FS in selecting the optimal feature subset in Fig. 4a, which relies on the leaf-wise tree growth strategy, the feature ranking provided by LightGBM in Fig. 5 appears more reliable than those of XGBoost and RF.
Moreover, it can be seen that most of the features extracted from the drive-end vibration sensor (S1) have higher importance scores than the features obtained from the fan-end vibration sensor (S2) because the faults have been implemented on the drive-end bearing.
According to the confusion matrix shown in Fig. 6, the testing accuracy of a basic LightGBM classifier without applying FS is only 88.04% for scenario α. From Fig. 4a, we saw that adopting the proposed FS method improves the testing accuracy to 97.23%. The testing accuracy can further increase by applying Bayesian hyperparameter optimization for the final LightGBM classifier. Table 4 shows the adjusted LightGBM hyperparameters after performing Bayesian optimization. Fig. 7 illustrates the testing confusion matrix of the final optimized LightGBM classifier. After implementing the proposed FS and Bayesian optimization, the average testing accuracy becomes 98.55%.
To further explore the effectiveness of the proposed FS method, we applied PCA to reduce the features' dimensions and visualize their distribution in Fig. 8 for scenario α. Using the first three principal components (PCs), Fig. 8a depicts the distribution of features without implementing the FS method. It can be seen that there is a distribution discrepancy between the training and testing samples having identical class labels due to changing loading conditions, leading to misclassified samples in the confusion matrix of Fig. 6, particularly for classes C3, C5, C7, and C9. Moreover, because some features are redundant and uninformative, there is a significant overlap between the samples of class labels C3, C5, C6, C7, and C9, leading to poor classification performance. In contrast, after selecting the optimal subset with the top 11 features in Fig. 8b, the distributions of the same class label samples are no longer sensitive to loading levels, and they are clustered together. In addition, the proposed FS has also increased the interclass separability between the samples of different fault types (C1-C10).
(Fig. 8 caption, partially recovered: the training data samples from 1hp, 2hp, and 3hp loads are shown by hollow shapes, while filled shapes show the test data samples from 0hp load. Each of the ten class labels from C1 to C10 has its own shape and color.)

TABLE 5. Comparison of classification accuracies using CWRU dataset (%).
Previously, Fig. 4 showed the superiority of LightGBM over XGBoost and RF in terms of providing the feature ranking. Table 5 compares the performance of six different settings of the proposed fault diagnosis framework for all scenarios. In methods (1-6) examined in Table 5, LightGBM is used for obtaining the feature rankings in the proposed FS (Algorithm 1, Lines 10-11 and 21). However, for evaluating different feature subsets (Lines 15-16) and obtaining the final classification results (Line 22), the methods (1-5) in Table 5 replace LightGBM with other existing optimized SML-based methods, i.e., KNN, SVM, RF, XGBoost, and artificial NN (ANN). For each experiment, ten trials are performed, and the average testing accuracies are reported. It can be seen that the proposed framework using LightGBM in all steps (method 6 in Table 5) offers the highest average testing accuracies compared with the other methods for all five scenarios.
Table 6 summarizes a number of fault diagnosis methods reported in the literature using the CWRU bearing dataset and compares their performance with the proposed LightGBM-based fault diagnosis framework. It can be seen that, unlike our proposed method, which uses a shallow model, most of the recently developed methods in the literature are based on deep convolutional neural networks (CNNs). The methods based on deep transfer learning (DTL) [21], [46], [47] overcome the problems of high computational time and overfitting of conventional deep CNNs that are trained from scratch. According to Table 6, the DTL-based methods presented in [21] and [47] using VGG-16 and ResNet-50 models present the highest accuracies in the literature. Table 6 verifies the effectiveness of the proposed method in terms of accuracy compared with the state-of-the-art DL-based approaches reported in the literature.
However, another key factor that directly affects the efficiency of a fault diagnosis method is the computational training time. As mentioned previously, shallow models are more efficient than DL models in terms of computational requirements. The average training time of the proposed shallow LightGBM-based fault diagnosis method over the studied scenarios is 26.08 seconds, considerably shorter than the training times of the DTL-based methods in [21] and [47], which are reported to be in the range of 150-318 seconds for different scenarios of the CWRU bearing dataset. Therefore, the proposed method is promising in terms of both accuracy and efficiency.

B. CASE 2: INDUCTION MACHINE DATASET OF AALTO UNIVERSITY
In this section, we further examine the performance of the proposed fault diagnosis system by conducting experiments on the IM dataset acquired in our laboratory.

1) DATA PREPARATION
This setup consists of two 18 kW IMs connected back-to-back via their shafts. The IM measurement setup and its configuration are depicted in Figs. 9(a) and 9(b), respectively. The vibration signals were measured from the first IM (the tested machine), fed from a sinusoidal voltage supply at 50 Hz. The second IM, operating as the loading machine, was connected to a frequency converter to provide various loading levels. Three Kistler 8763B050AB accelerometers were evenly arranged at positions 120 degrees apart on the circumference of the tested IM. Fig. 9(c) shows the position of the accelerometers. The vibration signals were collected under full-load (FL), half-load (HL), and no-load (NL) conditions, where the currents were 40 A, 30 A, and 18 A, respectively. The following defects were implemented on the tested IM: dynamic eccentricity with 28.5% severity (Ecc), two consecutive broken rotor bars (2 BRBs), 3 BRBs, two non-consecutive (NC) BRBs (2 NC-BRBs), and a simultaneous fault of 2 NC-BRBs and Ecc. Hence, together with the normal condition, there are six class labels in total, i.e., C1-C6. Table 7 lists the six class labels and their corresponding fault conditions.
The sampling frequency is 10 kHz. The data acquired at each loading level form a subset containing 400 samples, where each sample consists of 2400 data points. We randomly selected 50% of the samples of each subset for training. Table 8 shows the four testing scenarios (α, β, γ, and ε), which are constructed in the same way as in the previous CWRU case study. In the first three scenarios, the loading level used for testing is entirely unseen during the training process. The raw vibration signals for three sample health states under the FL level and their spectra are illustrated in Fig. 10.
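The data preparation described above can be illustrated with a short sketch. It is only a plausible reading of the text, under several assumptions: the samples are cut without overlap, the test set of a held-out scenario is the non-training half of the unseen load, and class labels are omitted for brevity. The records are synthetic, and the names `segment`, `split_half`, and `build_scenario` are hypothetical.

```python
import numpy as np

FS_HZ = 10_000          # sampling frequency from the text
SAMPLE_LEN = 2400       # data points per sample
N_SAMPLES = 400         # samples per loading level
LOADS = ("FL", "HL", "NL")

def segment(signal, sample_len=SAMPLE_LEN, n_samples=N_SAMPLES):
    """Cut a 1-D record into non-overlapping samples (assumption: no overlap)."""
    needed = sample_len * n_samples
    if signal.size < needed:
        raise ValueError("record too short for the requested segmentation")
    return signal[:needed].reshape(n_samples, sample_len)

def split_half(n, rng):
    """Random 50/50 split of the sample indices within one loading-level subset."""
    perm = rng.permutation(n)
    return np.sort(perm[: n // 2]), np.sort(perm[n // 2:])

def build_scenario(subsets, held_out, rng):
    """Train on the training halves of the seen loads, test on the held-out load."""
    train, test = [], []
    for load, samples in subsets.items():
        train_idx, test_idx = split_half(len(samples), rng)
        if load == held_out:
            test.append(samples[test_idx])      # unseen load: testing only
        else:
            train.append(samples[train_idx])    # seen loads: training only
    return np.vstack(train), np.vstack(test)

rng = np.random.default_rng(0)
# Synthetic records standing in for the measured vibration signals.
subsets = {load: segment(rng.normal(size=SAMPLE_LEN * N_SAMPLES)) for load in LOADS}
X_train, X_test = build_scenario(subsets, held_out="NL", rng=rng)

print(SAMPLE_LEN / FS_HZ)            # duration of one sample in seconds: 0.24
print(X_train.shape, X_test.shape)   # (400, 2400) (200, 2400)
```

With these numbers, each sample covers 0.24 s of signal, and holding out the no-load subset leaves 400 training samples from the two remaining loading levels.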

2) RESULTS, ANALYSIS, AND COMPARISON
The same fault diagnosis methodology as in the previous CWRU case is carried out on the IM dataset. In total, 99 features are calculated from the data of the three vibration sensors. Taking scenario α as an example, Fig. 11 displays the RFE-LightGBM-FS curve and the profile of the testing accuracy. After adopting the proposed RFE-LightGBM-FS, the feature subset containing the top 19 features is determined to be the optimal one, as it gives the maximum LOLO-CV accuracy (98.9%). According to Fig. 11, this feature subset also yields the highest testing accuracy (97.92%). Thus, the proposed FS technique finds the optimal feature subset offering the highest testing accuracy. Fig. 12 shows the testing confusion matrix of a basic LightGBM classifier for scenario α, indicating an average testing accuracy of 79.85%. As seen from Fig. 11, applying the proposed FS method improves the testing accuracy to 97.92%. Fig. 13 illustrates the testing confusion matrix of scenario α after adopting the proposed RFE-LightGBM-FS and Bayesian hyperparameter optimization. According to Fig. 13, hyperparameter optimization further improves the testing accuracy to 99.08%. The optimized LightGBM hyperparameters are listed in Table 9.
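As a rough illustration of the selection procedure behind a curve such as Fig. 11, the sketch below runs recursive elimination inside a leave-one-loading-out loop, averages the accuracy profile over the held-out loads, and keeps the subset size with the highest LOLO-CV accuracy. It does not reproduce Algorithm 1: to stay dependency-free, a Fisher score stands in for the LightGBM importance scores and a nearest-centroid rule stands in for the LightGBM classifier. The data, the function names, and the feature layout are all hypothetical.

```python
import numpy as np

def fisher_scores(X, y):
    """Stand-in for model importances: between/within class variance per feature."""
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)

def centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Stand-in classifier: assign each test sample to the nearest class centroid."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dists = ((X_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y_te).mean())

def elimination_path(X, y):
    """Yield feature-index lists from all features down to one, dropping the weakest."""
    remaining = list(range(X.shape[1]))
    while remaining:
        yield list(remaining)
        weakest = int(np.argmin(fisher_scores(X[:, remaining], y)))
        remaining.pop(weakest)

def rfe_lolo(X, y, loads):
    """Pick the subset size with the best mean leave-one-loading-out accuracy."""
    fold_ids = np.unique(loads)
    profile = np.zeros((len(fold_ids), X.shape[1]))   # accuracy per fold and size
    for i, held_out in enumerate(fold_ids):
        val = loads == held_out
        # Ranking and elimination use only the loads seen in this fold.
        for subset in elimination_path(X[~val], y[~val]):
            profile[i, len(subset) - 1] = centroid_accuracy(
                X[~val][:, subset], y[~val], X[val][:, subset], y[val])
    mean_profile = profile.mean(axis=0)
    best_size = int(np.argmax(mean_profile)) + 1      # smallest size on ties
    # Final subset: repeat the elimination on all training data down to best_size.
    best = next(s for s in elimination_path(X, y) if len(s) == best_size)
    return best, float(mean_profile[best_size - 1])

# Synthetic data: 3 classes x 3 loads x 30 samples, 8 features.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 90)
loads = np.tile(np.repeat(np.arange(3), 30), 3)
X = rng.normal(size=(y.size, 8))
X[:, :2] += 4.0 * y[:, None]                      # load-invariant, class-informative
X[:, 2:4] += y[:, None] + 5.0 * loads[:, None]    # class signal swamped by load offset

best_subset, best_score = rfe_lolo(X, y, loads)
print(best_subset, best_score)
```

Because the held-out fold is an entire loading level, features whose values drift with the load are penalized in the accuracy profile, which is the property the LOLO-CV scheme relies on.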
The average computational training time of the proposed shallow LightGBM-based fault diagnosis framework for the studied scenarios of the IM dataset is 43.12 seconds. It is higher than the training time of the same method on the CWRU dataset because the number of sensors, and hence the number of features, is larger in the IM case study.
Figs. 14(a) and 14(b) show the 3D PCA visualization of the feature distributions before and after performing RFE-LightGBM-FS, respectively. According to Fig. 14(a), the training and testing samples with the same class label are separated because of the varying loading levels. In contrast, Fig. 14(b) shows that the proposed FS strategy substantially reduces the distance between same-class features at different loading conditions and thereby improves the classification accuracy. Table 10 evaluates the classification results of the six variations of the proposed fault diagnosis framework. In all methods assessed in Table 10, the feature rankings are provided by the LightGBM model, but in methods 1-5, the evaluation of the different feature subsets and the final classification in Algorithm 1 are performed by other SML-based methods. The results indicate the superiority of method 6, in which LightGBM is used for all steps of Algorithm 1, including obtaining feature rankings, evaluating feature subsets, and final classification.
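The effect visible in Fig. 14 can also be expressed numerically. The sketch below computes a simple, hypothetical indicator (not one used in this article): the mean distance between the training and testing centroids of the same class, divided by the mean distance between the training centroids of different classes. A smaller value means that same-class samples from different loads lie closer together relative to the spacing of the classes. The data are synthetic, and the sketch assumes that every class appears in both sets.

```python
import numpy as np

def load_shift_ratio(X_train, y_train, X_test, y_test):
    """Mean same-class train/test centroid gap divided by mean between-class gap."""
    classes = np.unique(y_train)
    c_train = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    c_test = np.stack([X_test[y_test == c].mean(axis=0) for c in classes])
    same_class_gap = np.linalg.norm(c_train - c_test, axis=1).mean()
    pairwise = np.linalg.norm(c_train[:, None, :] - c_train[None, :, :], axis=2)
    between_class_gap = pairwise[np.triu_indices(len(classes), k=1)].mean()
    return float(same_class_gap / between_class_gap)

# Synthetic data: feature 0 separates the classes, feature 1 only tracks the load.
rng = np.random.default_rng(0)
y_train = np.repeat(np.arange(3), 50)
y_test = np.repeat(np.arange(3), 50)
X_train = rng.normal(size=(150, 2))
X_test = rng.normal(size=(150, 2))
X_train[:, 0] += 4.0 * y_train
X_test[:, 0] += 4.0 * y_test
X_test[:, 1] += 5.0        # hypothetical load-induced offset on feature 1

ratio_all = load_shift_ratio(X_train, y_train, X_test, y_test)
ratio_selected = load_shift_ratio(X_train[:, :1], y_train, X_test[:, :1], y_test)
print(ratio_all, ratio_selected)
```

In this toy setting, dropping the load-sensitive feature lowers the ratio, mirroring the tighter same-class clusters described for Fig. 14(b).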

V. CONCLUSION AND FUTURE WORK
In this article, a theoretical and experimental study of a new fault diagnosis framework was presented. The framework offers high accuracy and generalization ability for testing scenarios in which the data originate from new operating conditions unavailable during training. Leveraging LightGBM's ability to provide feature rankings, we proposed an efficient FS strategy combining LightGBM, RFE, and a LOLO-CV-based resampling process. Moreover, we performed Bayesian hyperparameter optimization to enhance the final classification result. Two experimental case studies, i.e., the CWRU bearing dataset and the IM dataset obtained in our laboratory, were used to examine the effectiveness and accuracy of the proposed fault diagnosis system.
Considering the changing operating conditions, we studied various testing scenarios in which a particular loading level is removed from the dataset and used only for testing. The results demonstrated that the proposed RFE-LightGBM-FS method could identify the optimal subset of features that are not sensitive to changing operating conditions and offer high separability between the class labels. The evaluation results showed that, for the various testing scenarios of the CWRU and IM datasets, the proposed fault diagnosis framework achieved 98.55% to 100% accuracy. In both case studies, we highlighted the most challenging testing scenario (scenario α), where the data measured under the no-load condition did not participate in the training process and were used only for testing. According to the results, the classification accuracy of scenario α using a basic LightGBM classifier without any FS or hyperparameter tuning was 88.04% and 79.85% for the CWRU and IM datasets, respectively. The proposed RFE-LightGBM-FS method increased these testing accuracies to 97.23% and 97.92%, respectively, and Bayesian hyperparameter optimization further increased them to 98.55% and 99.08%.
Future research work includes:
• Extending the proposed framework to be able to locate the faults in addition to detecting and discriminating the fault types by implementing and studying the faults occurring in both drive-end and fan-end bearings.
• Considering and implementing the primary causes of bearing failure and more realistic faults such as lubrication degradation of bearings, overheating, excessive loads, and corrosion.
• Embedded system implementation of the proposed framework to assess its ability in real-time operation.
ALIREZA NEMAT SABERI (Graduate Student Member, IEEE) received the M.Sc. degree in electrical power engineering from the University of Tehran, Tehran, Iran, in 2017, and is currently pursuing the Ph.D. degree with the Department of Electrical Engineering and Automation, Aalto University, Espoo, Finland. He is currently a Data Science Engineer at ABB System Drives, Helsinki, Finland. His research interests include intelligent fault diagnosis and condition monitoring of industrial apparatus and systems, applied machine learning and deep learning, and design and modeling of non-conventional electrical machines and drives. He is a member of the Estonian Society of Moritz Hermann Jacobi and the Estonian Society for Electrical Power Engineering.