Intelligent Fault Diagnosis for BLDC with Incorporating Accuracy and False Negative Rate in Feature Selection Optimization

Early fault diagnosis is essential for the proper operation of rotating machines. This article proposes a fitness function for differential evolution (DE) that considers both the accuracy rate and the false negative rate for optimization in brushless DC (BLDC) motor fault diagnosis. The Hilbert–Huang transform (HHT) extracts features from four different types of BLDC motor Hall-sensor signals, and feature selection based on a distance discriminant (FSDD) calculates the feature factors from the category separability of those features. The feature ranking is then optimized by DE before the features are fed, in order, into a backpropagation neural network (BPNN). By reducing the number of Hall-signal features and the complexity of the neural network input, the combined method proposed in this article can significantly reduce the computational cost. The identification model obtained an accuracy rate of 98.98% and a false negative rate of 13.66% with 18 features; in addition, the receiver operating characteristic (ROC) curve and the probability curve provide evidence that the number of false negatives is decreased. Moreover, experiments verified that the proposed method is also effective on a UCI data set.


I. INTRODUCTION
Intelligent fault diagnosis (IFD), the application of machine learning theories to machine fault diagnosis, is a promising way to reduce the reliance on human labor and automatically recognize the health states of machines. A commonly implemented diagnosis procedure includes feature extraction, feature selection and classifiers. Deep learning (DL) is often contrasted with conventional IFD, in which features are manually extracted and selected using prior knowledge. To avoid manual feature extraction, DL has been successfully applied to fault diagnosis; examples include a fault diagnosis model designed with a novel stacked transfer auto-encoder (NSTAE) [1], an adaptive feature learning approach built on a spatiotemporal pattern network (STPN) [2], and a deep belief network (DBN) used as the classifier [3]. By adjusting its connection weights, DL learns and selects the representations and patterns in the input data that best represent the condition of working machines [4]. It thus learns fault features automatically from the collected data instead of relying on the manual feature extraction of IFD. However, the computational cost of deep learning models is high because the input data and the network structure are usually high dimensional. Moreover, these models require a large amount of labeled training data, whereas only a fraction of condition monitoring data is labeled in practice [5].
The work in this article constructed a model to identify BLDC fault types. To detect the operation status of a BLDC, Hall sensors or sensorless algorithms based on the back electromotive force are commonly used [6]. A Hall sensor has the obvious advantages of low cost and a simple structure [7]. Additionally, DC motors using Hall sensors have been widely used in commercial and industrial applications [8]. Therefore, this article uses the Hall signal, an electrical measurement, to establish an identification model. A motor may suffer from different failures, including stator failure [9]–[12], rotor failure [13], [14], bearing failure [15], [16], eccentricity fault [17] and inverter fault [18]. Stator failure accounts for 30% to 40% of the total failures in motors, rotor failure for 5% to 10%, and bearing failure for 40% to 50% [19].
The proposed fault diagnosis model comprises feature extraction, feature selection and classifiers. Signal analysis has been developed for decades. The Hilbert-Huang transform (HHT) uses the intrinsic mode functions (IMFs) of the original signal to calculate the instantaneous frequency and then performs spectrum analysis [20]. Since no mother wavelet has to be chosen, the method is not affected by the resolution of the time domain and frequency domain, so it can decompose the signal more accurately in the high-frequency domain.
After signal analysis, feature extraction and feature selection provide features of the original signal with a good identification rate. Feature selection can be divided into filter, wrapper, hybrid and embedded approaches [21]. The filter type uses the relationship between features as the criterion [22], and the wrapper type uses the relationship between the features and the target variable as the criterion [23]. The embedded type is usually used for high-dimensional data features [24], [25]. The wrapper and hybrid types can obtain better results [26], but the filter type is usually preferred when the computational cost and a large number of features must be considered [24], [25]. Therefore, this study used filter feature selection to calculate the feature weights, specifically feature selection based on a distance discriminant (FSDD), which belongs to the clustering algorithms [27].
The main disadvantage of filter feature selection is that it examines the relationship between features independently; because no classifier participates in the feature selection process, the effect of each feature on the identification results is ignored. In the proposed model, the distance discriminant factors of the features from FSDD are optimized by differential evolution (DE), so the classifier is involved in the feature selection process through the optimization, and an optimized feature ranking is obtained. DE is an effective and simple global optimization algorithm whose convergence speed and robustness on common benchmark functions and practical problems are better than those of many other algorithms [28].
An artificial neural network (ANN) is a common nonlinear function processor that imitates the structure and pattern of the human brain [29]. The performance of the learning process depends on the weights of the neural network in the training phase. A BPNN is a supervised machine learning technique that adjusts its weights to minimize the error of the calculated output, and it is suitable for identifying nonlinear relationships [30]. BPNNs have been used for the fault diagnosis of NPC inverters [31], high impedance faults [32] and virtual speed sensors for DC motors [33].
In a novel convolutional neural network, performance was evaluated with metrics that include the true positive (TP), false positive (FP), true negative (TN) and false negative (FN) counts [34]. In wound rotor induction machine drives, a parameter was regulated to reduce the variations that can lead to false alarms under healthy operating conditions of the motor [35]. The error probability (false alarm and missed alarm), used as one of the thresholds, determines whether iterative decision making continues in a fault diagnosis scheme [36]. In most of the literature, the false alarm or missed alarm probability is reported to characterize the performance of the proposed fault diagnosis scheme.
Based on the related literature above, this research proposes a fault identification model for a BLDC based on Hall signals, which includes signal analysis, feature selection and classifiers. In addition, the number of false positives in the model is discussed to reveal the performance shift of the proposed fault diagnosis scheme.

A. EXPERIMENTAL ARCHITECTURE
This section introduces the experimental equipment, experimental architecture and signal samples used in this research. Four BLDC conditions (healthy, bearing fault, winding fault and rotor fault) were studied to build a fault diagnosis model. In the test setup, the servo motor (11 kW/2000 rpm/69 Hz) of the dynamometer generates a torque opposing the BLDC (420 W/3020 rpm/DC 24 V/60 Hz) and thus acts as the load while the BLDC motor drives the operation. The BLDCM parameters are listed in Table I. A total of four BLDCs were tested in this experiment: one motor was healthy, whereas the other three were faulty. The fault types were bearing damage in the inner raceway, a winding short circuit and rotor damage. The bearing inner raceway had a 1 mm physical crack. The winding short circuit was created by stripping part of the insulation of 2 coils. The rotor damage was created by digging a hole.
A data acquisition system (NI PXIe-1073) acquired the Hall signal of the BLDC motor at a sampling rate of 1000 Hz for a measurement time of 1500 seconds in each condition. Each 1500-second record was divided into 750 samples of 2000 points, which gave a total of 3000 samples of Hall signal data for the four motors. The HHT was used as the preprocessing tool for the Hall-sensor signals; the original Hall-sensor output voltage of the healthy BLDC over 150 points is shown in Fig. 1. After acquisition, the measured signals were processed and analyzed by the HHT in Matlab on a personal computer with an Intel Core i5-4460 3.2 GHz CPU and 8 GB RAM.
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication.
After the analysis, the extracted features that reflect the motor conditions were normalized so that the feature values of the 4 motor types were between 0 and 1, which avoids the gradient explosion problem in the classifier. Feature selection was then used to calculate the factor of each feature. The features were ranked in descending order of their feature factors, and the rank was then optimized by DE. Finally, the fault type identification results were returned by the classifiers. The experimental processing and configuration are shown in Fig. 2.
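The normalization step can be illustrated with a minimal min-max scaling sketch. The text only states that feature values end up between 0 and 1, so the min-max form, the function name `min_max_normalize` and the handling of constant columns are assumptions made for illustration.

```python
def min_max_normalize(columns):
    """Scale each feature column to [0, 1]; constant columns map to 0.0.

    `columns` is a list of feature columns, each holding the values of one
    feature over all samples (assumed layout, not taken from the article).
    """
    normalized = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        normalized.append([(v - lo) / span if span else 0.0 for v in col])
    return normalized


# Two toy feature columns: one varying, one constant.
features = [[2.0, 4.0, 6.0], [10.0, 10.0, 10.0]]
scaled = min_max_normalize(features)
```

Scaling each feature over all four motor types together, rather than per motor, keeps the relative differences between conditions that the classifier relies on.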

B. SIGNAL ANALYSIS AND FEATURE EXTRACTION
A signal can be analyzed in the time and frequency domains. In some cases, the frequency domain presents the signal more clearly than the time domain [37]. Dr. Norden E. Huang proposed the HHT in 1998, and it has since been widely used in speech analysis and in the analysis of nonlinear and non-stationary signals [38]. The HHT consists of empirical mode decomposition (EMD) and the Hilbert transform (HT). EMD, a highly efficient data decomposition method based on a series of sifting operations, adaptively extracts the basis functions from the signal and decomposes the original input into intrinsic mode functions (IMFs) and a trend function. In Hilbert spectral analysis, each IMF is then passed through the Hilbert transform and converted to complex form, from which the Hilbert spectrum of the signal is obtained.
In this work, EMD separates the four types of motor Hall signals into eight layers (IMF1 to IMF7 and the residual), and the Hilbert transform is applied to each layer to obtain its instantaneous amplitude and instantaneous frequency. Twelve features are captured for each layer: the maximum (Tmax), average (Tmean), mean square error (Tmse), standard deviation (Tstd), maximum/mean (Tmax/Tmean) and maximum/root mean square (Tmax/Trms) of the time domain, and the maximum (Fmax), average (Fmean), mean square error (Fmse), standard deviation (Fstd), maximum/average (Fmax/Fmean) and maximum/root mean square (Fmax/Frms) of the frequency domain. For every sample, the 12 features of each layer were normalized so that the feature values of the 4 motor types were distributed between 0 and 1. This step yielded a total of 96 features (8 layers with 12 features each), as shown in Table II. The experiment uses this feature extraction method to obtain the feature set, optimizes the feature rank with the feature selection and optimizer methods, and then uses classifiers to present the results, as shown in Fig. 3.
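As a rough illustration of the per-layer statistics, the sketch below computes five of the six time-domain features for one sequence, such as the instantaneous amplitude of a single IMF. The function name, the dictionary keys and the use of the population standard deviation are assumptions. The mean square error feature is left out because the text does not state the reference it is measured against.

```python
import math


def time_domain_features(x):
    """Return max, mean, std, max/mean and max/RMS of a sequence.

    Assumes a non-empty sequence with a non-zero mean and RMS, as would be
    the case for a positive instantaneous-amplitude envelope.
    """
    n = len(x)
    mean = sum(x) / n
    # Population standard deviation (divides by n); this choice is an assumption.
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(x)
    return {
        "max": peak,
        "mean": mean,
        "std": std,
        "max_over_mean": peak / mean,
        "max_over_rms": peak / rms,
    }


# Toy envelope standing in for the instantaneous amplitude of one layer.
feats = time_domain_features([1.0, 2.0, 3.0, 4.0])
```

The same function could be applied to the instantaneous frequency of a layer to obtain the corresponding frequency-domain statistics.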

C. FEATURE SELECTION
The model includes feature selection in order to eliminate redundant features, which improves prediction accuracy and reduces computational cost. Feature selection was implemented using FSDD, a clustering-based algorithm, to calculate the category separability of the features. A higher factor indicates a more important feature, so the features were ranked in descending order of their feature factors after feature selection. By selecting and deleting features extracted from the Hall signal after signal analysis, the recognition rate of the classifier can be increased, or at least left unaffected, while the computational cost of the recognition model is reduced. The feature distance discriminant factor $\lambda_m$ is based on the Euclidean distance between the features of the same category and the Euclidean distance between the features of different categories. The Euclidean distances were calculated from the feature value of each sample, the center of the category feature $g_c^m$ and the center of the sample feature $g_i^m$, where C, m and i are the category number, feature number and sample number. The compensation factor $\eta_m$ was calculated from the distance variance factors $u_b^m$ and $v_w^m$. The calculation procedure is as follows:
Step 1. Calculate the variance and average of all the samples in the mth feature.
Step 2. Calculate the variance and average of the samples of class C in the mth feature.
Step 3. Calculate the weighted variance of the class center $g_C$ in the mth feature.
Step 4. Calculate the inter-class distance and the intra-class distance of the mth feature.
Step 5. Calculate the variance factor of the inter-class distance and the variance factor of the intra-class distance in the mth feature.
Step 6. Calculate the compensation factor $\eta_m$ of the mth feature.
Step 7. Calculate the distance discrimination factor of the mth feature.
Step 8. Normalize the distance discriminant factor.
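The idea behind a distance discriminant can be sketched with a deliberately simplified score: the between-class spread of a feature minus a weighted within-class spread, normalized over all features. This is not the FSDD formulation itself. The variance factors and the compensation factor of Steps 5 and 6 are omitted, and the function name, the weight `beta` and the toy data are assumptions.

```python
def separability_scores(samples, labels, beta=1.0):
    """Per-feature score: between-class spread minus beta * within-class spread.

    Simplified stand-in for a distance discriminant factor; scores are
    min-max normalized to [0, 1] in the spirit of Step 8.
    """
    n_features = len(samples[0])
    classes = sorted(set(labels))
    scores = []
    for m in range(n_features):
        column = [row[m] for row in samples]
        global_mean = sum(column) / len(column)
        d_between = 0.0
        d_within = 0.0
        for c in classes:
            values = [row[m] for row, lab in zip(samples, labels) if lab == c]
            prior = len(values) / len(column)
            center = sum(values) / len(values)
            # Distance of the class center from the overall center.
            d_between += prior * (center - global_mean) ** 2
            # Mean squared distance of the class samples from their own center.
            d_within += prior * sum((v - center) ** 2 for v in values) / len(values)
        scores.append(d_between - beta * d_within)
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]


# Toy data: feature 0 separates the two classes, feature 1 does not.
samples = [[0.0, 0.5], [0.1, 0.4], [0.9, 0.5], [1.0, 0.6]]
labels = [0, 0, 1, 1]
scores = separability_scores(samples, labels)
# Rank feature indices in descending order of their score.
ranking = sorted(range(len(scores)), key=lambda m: scores[m], reverse=True)
```

Sorting the feature indices by descending score gives the initial ranking that is later handed to the optimizer.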

D. CLASSIFIER
The feature rank obtained by feature selection is conducive to the neural network classifier. For the nonlinear BPNN classifier, 70% of the motor samples are randomly selected and their features are fed into the classifier for training, and the remaining 30% are used as test samples. The size of the input layer depends on the number of features, the size of the hidden layer is 60, and the training function is trainscg, which uses less memory. The BPNN parameters are listed in Table III. A BPNN imitates the way a neural system processes data and performs discriminant analysis by simulating the structure of biological data processing. Neurons transmit messages, and backpropagation corrects the errors in order to achieve the best identification result. A BPNN is composed of three structures: an input layer, a hidden layer and an output layer.
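The random 70%/30% partition can be sketched as follows. The function name `split_indices` and the fixed seed are assumptions, and the total of 3000 samples comes from the experimental description.

```python
import random


def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


# 3000 Hall-signal samples split into 2100 training and 900 test samples.
train_idx, test_idx = split_indices(3000)
```

Drawing a new seed for each repetition would give a different partition every time, which is one way to average the accuracy over repeated runs.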

A. DIFFERENTIAL EVOLUTION
Differential evolution, proposed by Price and Storn, is a competitive and reliable evolutionary optimization technique used to solve various complex problems [39]. Its calculation principle is similar to that of the genetic algorithm (GA) and includes three mechanisms: mutation, crossover and selection. The offspring are derived from random mutations of the parental parameters, as shown in Fig. 4. In addition, the algorithm borrows from particle swarm optimization (PSO) so that the evolution direction approaches the best particle. DE is a random search algorithm, and its randomness helps prevent it from falling into a local optimum. It can therefore be applied to many important optimization problems, including neural network training and Bayesian network inference [40]. Other articles have combined further algorithms with DE to improve the computational efficiency or the recognition rate [41], [42]. In this article, the accuracy rate and false negative rate are combined into the fitness value, and the feature rank is optimized by DE to improve the identification rate and the false negative rate. The calculation procedure of the differential evolution algorithm is as follows:
Step 1. Set the parameters as follows: the population size $r = 10$; the number of particles $i = 96$, which equals the number of features; the crossover rate $CR = 0.8$; the first generation of the first particle in the population $G_{1,0}$, which is set to the distance discriminant factors from FSDD; and the iteration counter $j = 0$.
Step 2. Calculate the fitness value of the first generation of the first particle.
Step 3. Randomly select the parameters $G_{1,j}$, $G_{2,j}$ and $G_{3,j}$ in the offspring to produce mutations by $V = G_{1,j} + F\,(G_{2,j} - G_{3,j})$.
Step 4. Perform the crossover through a random operation based on rand and CR.
Step 5. The elimination step keeps the particle with the better fitness value through a greedy algorithm.
Step 6. The stopping rule is whether the fitness value, which combines the accuracy rate and false negative rate, has converged to the optimal value. The algorithm also stops when the number of calculations reaches the iteration limit j; otherwise, steps 3 to 5 are repeated.
Step 7. Finally, all particles converge to obtain the best global solution. After the optimization, a set of solutions is obtained as the best particle coordinate $G_{best}$, which is the optimized importance of the features.
[Figure: flowchart of the differential evolution procedure (initialization, fitness calculation, mutation, crossover, selection, convergence check, output of $G_{best}$).]
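The mutation, crossover and greedy selection steps can be sketched as a compact DE loop. The population size of 10 and CR = 0.8 follow Step 1. The scale factor F = 0.5, the random initialization of the remaining particles, the fixed iteration count, the assumption that a higher fitness is better and the toy fitness function are all choices made for illustration. In the article the fitness comes from the classifier results instead.

```python
import random


def differential_evolution(fitness, initial, pop_size=10, F=0.5, CR=0.8,
                           iterations=50, seed=0):
    """Maximize `fitness` with a basic rand/1/bin DE loop (illustrative only)."""
    rng = random.Random(seed)
    dim = len(initial)
    # The first particle is the given starting vector; the others are random
    # in [0, 1] (an assumed initialization).
    population = [list(initial)] + [[rng.random() for _ in range(dim)]
                                    for _ in range(pop_size - 1)]
    scores = [fitness(p) for p in population]
    for _ in range(iterations):
        for k in range(pop_size):
            # Mutation: V = G1 + F * (G2 - G3) from three distinct other particles.
            a, b, c = rng.sample([i for i in range(pop_size) if i != k], 3)
            mutant = [population[a][d] + F * (population[b][d] - population[c][d])
                      for d in range(dim)]
            # Crossover: take the mutant gene when rand < CR; one gene is forced.
            forced = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < CR or d == forced)
                     else population[k][d] for d in range(dim)]
            # Greedy selection: keep the trial only if it is at least as fit.
            trial_score = fitness(trial)
            if trial_score >= scores[k]:
                population[k], scores[k] = trial, trial_score
    best = max(range(pop_size), key=lambda i: scores[i])
    return population[best], scores[best]


def toy_fitness(g):
    """Hypothetical stand-in for the classifier-based fitness value."""
    return -sum((x - 0.3) ** 2 for x in g)


start = [0.9, 0.1, 0.5]
best, best_score = differential_evolution(toy_fitness, start)
```

Because selection is greedy, the fitness of each particle never decreases from one generation to the next, so the returned solution is at least as fit as the starting vector.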

B. FITNESS FUNCTION
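The accuracy rate (ACC) is calculated from the numbers of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) [43]:

$$\mathrm{ACC} = \frac{TP + TN}{TP + FP + FN + TN} \times 100\%$$

A perfect test would have zero false positives and zero false negatives, so building the curve of samples from the different recognition results is another way to evaluate the model. The false negative rate is shown in (23), where FN is the number of false negatives, TP is the number of true positives and N = FN + TP is the total number of ground truth positives [40].

$$\mathrm{FNR} = \frac{FN}{FN + TP} = \frac{FN}{N} \tag{23}$$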
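Both rates follow directly from the confusion-matrix counts, as the short sketch below shows. The function names and the counts are made up for illustration, and both rates are returned in percent to match how results are reported in this article.

```python
def accuracy_rate(tp, tn, fp, fn):
    """Accuracy in percent: correctly classified samples over all samples."""
    return 100.0 * (tp + tn) / (tp + fp + fn + tn)


def false_negative_rate(tp, fn):
    """False negative rate in percent: missed positives over all positives."""
    return 100.0 * fn / (fn + tp)


# Illustrative counts only, not taken from the experiments.
acc = accuracy_rate(tp=90, tn=880, fp=20, fn=10)
fnr = false_negative_rate(tp=90, fn=10)
```

With these counts the accuracy is high while one positive in ten is still missed, which shows why the two rates are tracked separately.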
In the following, the fitness equations combine the accuracy rate and the false negative rate, and the properties of each equation are visualized in a surface plot. The red points in the surface plots of fitness evolution mark the accuracy rate and false negative rate before optimization, which shows the overall situation.
Equation (24) draws on the Pearson correlation coefficient to combine the accuracy rate (ACC) and false negative rate (FNR) obtained after optimization in each iteration. Fitness changes markedly only when FNR is extremely small and is flat elsewhere. The increase in fitness is therefore too slight to reveal the evolution direction, as shown in Fig. 6.
Equation (25) removes the division to increase the degree of change in fitness and introduces the original accuracy rate and false negative rate. However, many combinations of ACC and FNR share the same fitness value, so Fitness can increase while ACC and FNR are not optimized at the same time, as shown in Fig. 7.
With Equation (26), Fitness improves when ACC increases and FNR decreases at the same time. The original accuracy rate and false negative rate form the origin of the coordinates in the evolution plot. The surface shows a plane and a slope even for very small shifts in ACC and FNR, so the optimization has a clear direction, as shown in Fig. 8.
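Equations (24) to (26) are not reproduced in this text, so the following is only a hypothetical illustration of the property described for (26): a score that is positive only when ACC rises and FNR falls relative to the original point. The `min` form, the function name `joint_improvement` and the sample values are assumptions and should not be read as the authors' formula.

```python
def joint_improvement(acc, fnr, acc0, fnr0):
    """Hypothetical fitness: the smaller of the ACC gain and the FNR reduction.

    acc0 and fnr0 are the rates before optimization (the original point).
    The result is positive only if both rates improved.
    """
    return min(acc - acc0, fnr0 - fnr)


# Original point: ACC = 95, FNR = 20 (illustrative percentages).
both_better = joint_improvement(97.0, 15.0, 95.0, 20.0)
only_acc_better = joint_improvement(97.0, 25.0, 95.0, 20.0)
only_fnr_better = joint_improvement(94.0, 10.0, 95.0, 20.0)
```

A score of this kind cannot be raised by trading one rate against the other, which is the weakness described above for (25).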

A. DATASET RESULTS
The work in this article built a model into which the original signal was fed. In the classifier part of the model, the features were passed to the BPNN, which can obtain better results in BLDC fault diagnosis. The features from 2,100 samples were used to train the classifier, whilst the features from the remaining 900 samples were used as test samples; this was repeated 100 times, and the average accuracy rate was calculated to assess how well the motor fault types were distinguished. Initially, the input comprised 96 features, and the number of inputs then decreased after feature selection. The data matrix was 96×3000, i.e., 96 features and 3000 samples from the four types of motor.
As a baseline, the signal was recognized directly by the classifier after feature analysis by the HHT. Although this used the largest number of features, many of them may not clearly distinguish the faults, which led to a BPNN accuracy rate of 95.70%. Feature selection can reduce the computational cost by identifying features that are redundant or have little influence.
In Fig. 9, the accuracy rate becomes smooth once the number of features exceeds 10. The accuracy rate is 95.70% with 96 features; the highest accuracy rate in the full experiment is 99.25% with 62 features; and the lowest false negative rate in the full experiment is 1.00% with 58 features.
With 96 features, there is no outstanding advantage in either accuracy or false negative rate; with 58 features, the false negative rate is the lowest but the number of features is large; with 62 features, the recognition rate is the highest but the false negative rate is high. For these reasons, a different number of features should be chosen for optimization. The recognition rate gradually stabilizes at around 16 features, where the false negative rate is also relatively low compared with the rest of the experimental results.
In Fig. 9, the accuracy rate gradually stabilizes from around 10 features, but the false negative rate remains unstable even with more features. One noteworthy point is 18 features, where the false negative rate is relatively low and the accuracy rate begins to stabilize. Based on this finding, this work used differential evolution to optimize the factors of the 18 most important features.
Citation information: DOI 10.1109/ACCESS.2022
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

B. OPTIMIZATION
Table IV shows that 18 features is the best choice when the recognition rate and the false negative rate are considered together, although 18 features did not have an outstanding advantage in the overall experiment. Compared with the feature counts discussed previously, fewer features are needed for recognition, and the shortcomings can be addressed by optimization to increase the recognition rate and reduce the false negative rate.
In Table V, when the fitness value combines the accuracy rate and false negative rate as in (24), the optimization effectively enhances fitness and greatly reduces false negatives but also reduces the accuracy rate. Adding the original point of fitness to the fitness formula as in (25) effectively improves fitness and greatly reduces false negatives, while only slightly reducing the recognition rate. By considering both the original point of fitness and the mutation direction, (26) increases the recognition rate and reduces false negatives at the same time, which makes it the best fitness function of the three equations.
A continuous dataset provided by UCI was used for verification [44]. Experimental verification was performed to demonstrate the effectiveness of the proposed method, with (26) used as the fitness function in DE. First, the features were selected using FSDD and ranked by their feature factors. The features were then entered into the DE optimizer. The optimizer results are shown in Table VI; the false negative rates and accuracy rates after optimization were better than the original ones.
The ROC curve plots the TPR against the FPR and is used to check the FP situation [45], [46]. Changing only the decision cutoff value could raise the FP while decreasing the FN [47], [48]. The ROC curves for diagnosis with and without optimization are plotted in Fig. 10, which illustrates that the FP does not noticeably increase with the proposed method. The area under the ROC curve (AUC), a metric derived from the ROC curve, evaluates the performance of classifiers. The AUC values of the model with DE are higher than those of the model without optimization, except for the winding short circuit motor, for which the AUC differs only slightly between the two models, as shown in Table VII.
To compare the probability curves before and after optimization, the probability distribution of the classification results for the four motor types was plotted. Fig. 11 shows the probability of one motor type being classified as each of the four motor conditions. The probability curves of the motors with rotor damage and a short circuit in the stator windings move toward both ends of the accuracy axis after optimization, which indicates that the probability of these two types being classified as the correct motor increases, as shown in Fig. 11(c), (d), (g), (h). The ROC curves indicate that the model with the proposed fitness function has no more false negatives than the model without optimization.
The ROC results and probability curves above verify that the model with the proposed fitness function performs better than merely moving the threshold to raise the accuracy rate.
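The ROC and AUC comparison can be illustrated with a small threshold sweep for one motor class against the rest. The function names, the one-vs-rest labeling (1 for the class of interest, 0 otherwise) and the toy scores are assumptions. The sketch only shows how the TPR, FPR and AUC are obtained and does not reproduce Fig. 10 or Table VII.

```python
def roc_curve(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold.

    Assumes at least one positive (1) and one negative (0) label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last sample of a tied score group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy classifier outputs for one class versus the rest.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_curve(scores, labels)
area = auc(points)
```

Each point on the curve corresponds to one cutoff value, so moving the cutoff trades false negatives against false positives along a fixed curve, whereas a better model lifts the whole curve and increases the AUC.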

VII. CONCLUSION
Bearing damage, stator winding failure and rotor damage make up the majority of motor fault types. This article presented a fault diagnosis model for BLDCs that reduces the complexity of fault detection when executing a preliminary diagnosis. The model includes five subsystems: signal analysis, feature extraction, feature selection, ranking optimization and classifiers. The proposed model reduced the number of features from 96 to 18, eliminating 81% of the features in the BLDC dataset, and the model was also successfully applied to another dataset. The final accuracy rate reached 98.98% with a false negative rate of 13.66% in the BLDC dataset, and this accuracy is higher than that obtained with all 96 features. Moreover, the ROC curve and probability curve verify that the model with the proposed fitness function does more than move the decision threshold and even increases the probability of correct classification.