Fault Diagnosis Based on Multi-Scale Redefined Dimensionless Indicators and Density Peak Clustering With Geodesic Distances

A novel fault diagnosis method for rolling bearings based on multi-scale redefined dimensionless indicators and an unsupervised feature selection method using density peak clustering with geodesic distances is proposed in this paper. First, a new feature extraction method, called multi-scale redefined dimensionless indicators, is constructed by combining redefined dimensionless indicators with multi-scale analysis. Then, density peak clustering with geodesic distances is utilized to automatically find the important multi-scale redefined dimensionless indicators. To the best of our knowledge, this is the first study to use density peak clustering with geodesic distances for unsupervised feature selection in the fault diagnosis field. Finally, the selected multi-scale redefined dimensionless indicators are fed into a quadratic discriminant analysis classifier to simultaneously identify 12 different conditions of rolling bearings. Experimental results demonstrate that the proposed method can successfully differentiate 12 localized fault types, fault severities, and fault orientations of rolling bearings.


I. INTRODUCTION
Rolling bearings play a significant role in many types of machinery, yet they are also among the most damage-prone components of rotating machinery. In practical engineering, bearing failures are among the most frequent causes of breakdowns in rotating machinery; in large induction motors, for example, bearing faults can account for 44% of all faults [1]. A bearing failure causes unexpected downtime and economic loss. Consequently, there is great demand for rolling bearing fault diagnosis techniques that can be used to make reliable maintenance decisions [2]. In general, fault diagnosis and prognosis algorithms can be categorized into model-based, knowledge-based, and data-driven approaches [3]. (The associate editor coordinating the review of this manuscript and approving it for publication was Gerard-Andre Capolino.)
In data-driven methods, features extracted from historical process data serve as a priori knowledge for the diagnostic system [4], [5]. To date, many feature extraction methods have been developed for vibration signals [6]-[10]. In [11], several advanced feature extraction approaches (i.e., the fast Fourier transform, singular spectrum analysis, the wavelet packet transform, empirical mode decomposition, and local mean decomposition) were applied in parallel to extract informative features from raw signals for diagnosing gear faults. In [12], heterogeneous feature extraction methods based on the complex envelope spectrum, statistical time- and frequency-domain parameters, and wavelet packet analysis were proposed to search for the most discriminative features for bearing fault diagnosis. In [13], singular spectrum analysis was employed to extract valuable information from the measured vibration data sets and identify bearing faults. In [14], considering that condition-based monitoring of rotating machines requires robust features for accurate fault diagnosis, statistical spectral analysis was used to extract the fault features. Nevertheless, all of these methods rely on advanced signal processing to extract fault features and overlook the complexity and efficiency of the resulting diagnosis method. Dimensionless indicators, by contrast, are insensitive to many disturbances while remaining sensitive to faults; because of these merits, they have been used as fault diagnosis features in several studies [15]-[17].
However, the existing types of original dimensionless indicators are limited in number, and they usually suffer from varying amounts of aliasing in the feature space, causing a decrease in diagnosis accuracy and an increase in diagnosis uncertainty [18]. In view of the above problems, this paper proposes a new feature extraction method based on redefined dimensionless indicators (RDIs) and multi-scale analysis, called multi-scale redefined dimensionless indicators (MRDIs). MRDIs can reveal more of the fault information hidden in the original vibration signals without resorting to signal processing methods. Because only the original vibration signals are used to extract the fault features, the complexity of the fault diagnosis method can be greatly reduced and the time consumption effectively decreased, which distinguishes this approach from the above studies.
In practice, a large number of features are usually extracted to collect all of the available information, and irrelevant, redundant, and similar information may exist in the feature set. Such information not only swamps the effective features but also increases the difficulty of solving the model. This is also true for the extracted MRDIs, so a feature selection method should be used to search for the most important features. In recent years, several feature selection methods have been presented in the related literature [19]-[22]. In this study, density peak clustering (DPC) with geodesic distances (GD) is used for the first time to explore feature selection in the fault diagnosis field. DPC with the Euclidean distance (ED) has been successfully used as a clustering method in several studies [23]-[27]. However, if the data points lie on a complex nonlinear manifold hidden in a high-dimensional space, it is difficult for DPC with the ED to obtain an accurate clustering result. To overcome this inadequacy, the GD is used to reveal the true distance between every pair of data points, a strategy inspired by the isomap algorithm [28]. To date, only a few studies have considered the use of DPC-GD for grouping data with arbitrary shapes or image data sets [29], [30]. Although these studies have shown the successful use of DPC-GD on synthetic or image data, DPC-GD still has a problem that needs to be addressed: in the field of machine fault diagnosis, it may be difficult to directly cluster the correct number of sample types for a sample data set with a high-dimensional feature set. To handle this problem, we first use DPC-GD to perform fault feature clustering analysis and then employ a pattern classifier to perform sample classification; this differs from existing studies, in which DPC-GD is used directly for sample-type clustering analysis.
DPC-GD for unsupervised feature selection is based on the observation that two fault features probably belong to the same group if they are close to each other. Thus, the fault features can be clustered into several groups such that features in the same group are similar to each other, while features in different clusters are dissimilar. Moreover, DPC-GD for unsupervised feature selection is based on the assumption that cluster centres and outliers should both be regarded as important features. The centre of each group represents one kind of fault feature and is regarded as an important fault feature. An outlier can be regarded as a cluster centre that contains only a single data point but that also represents one type of fault feature. In addition, DPC-GD reduces the error in obtaining the distance between every pair of features: the distances between features within the same cluster of the feature space are decreased, whereas the distances between features from different clusters are increased. Thus, important features can be effectively selected in the feature selection process.
Based on the above analysis results, a novel fault diagnosis method is proposed based on MRDIs and unsupervised feature selection using DPC-GD. First, the definition of the original dimensionless indicators is extended, and multi-scale analysis is used to further process the RDIs, which can effectively capture more useful fault signal characteristics. Second, DPC-GD is used to enhance the unsupervised feature-clustering ability and obtain several important MRDIs. Finally, quadratic discriminant analysis (QDA) is adopted as a pattern classifier to simultaneously identify 12 different working conditions of rolling bearings. The experimental results clearly demonstrated that DPC-GD was effective at selecting the important features, and the proposed method could successfully differentiate 12 localized fault types, fault severities, and fault orientations of rolling bearings.
The main contributions of this work are as follows: 1) In this study, MRDIs were used as new fault features to extract more useful fault information directly from the original vibration signals, which significantly decreases the complexity and time consumption of existing fault diagnosis methods.
2) The proposed DPC-GD feature selection method was used to adaptively select several important features from a high-dimensional feature set for the first time. Compared with other feature selection methods, DPC-GD showed better feature selection performance.
The remainder of this work is organized as follows: The rolling bearing fault diagnosis problem is described in Section 2. Section 3 presents the basic definitions and theories related to this study. Section 4 examines the applicability of the proposed method to rolling bearings based on experimental results, and the results of extensive comparative studies are presented as examples. Finally, conclusions and future work are given in Section 5.

II. PROBLEM STATEMENT
The original dimensionless indicators have been widely used as fault features in the fault diagnosis field. Nevertheless, the number of existing types of original dimensionless indicators is limited. In practical engineering, an increasing number of fault types, fault severities, and fault orientations have emerged, and accurately diagnosing them with the limited set of original dimensionless indicators has become extremely difficult. Thus, the first problem is developing additional fault features to effectively and accurately identify additional fault categories. In addition, high-dimensional feature data usually lie on a complex, highly curved manifold, which causes significant difficulties in feature selection. Thus, the second problem is selecting important fault features from a high-dimensional feature set, which may not be handled effectively by existing feature selection methods. In consideration of these two problems, a new intelligent fault diagnosis method based on MRDIs, DPC-GD, and QDA is proposed in this paper. For the first problem, MRDIs are used to construct additional types of dimensionless indicators that reveal more useful fault information. For the second problem, DPC-GD is used as an unsupervised feature selection method to select the important features. A flowchart of the proposed method is shown in Fig. 1, and the general steps are described as follows:
Step 1: The vibration signals of a rolling bearing were measured by an acceleration sensor and divided into a series of segments for feature extraction. Rolling bearing fault signals were used to verify the effectiveness of the proposed method. Three bearing fault types (ball fault, inner race fault, and outer race fault) with various levels of fault severity (0.007-0.028 mils) were introduced. A total of 12 different operating conditions were tested.
Step 2: Based on the original vibration signals, µ types of RDIs were extracted as fault features. Additionally, multi-scale analysis was used to further process the RDIs, aiming to uncover more useful fault information hidden in the original vibration signals. In this study, the scale factor was denoted as τ. A high-dimensional feature set comprising µ × τ MRDIs was built. These MRDIs can measure the complexity and nonlinearity of the vibration signals at different scales.
Step 3: An unsupervised feature-selection method based on DPC-GD was used to select several important MRDIs from the µ × τ MRDIs. The Floyd algorithm was used to calculate the GD, and the Gaussian kernel function was utilized to calculate the local density of each feature. A representation called a decision graph was introduced to find the MRDI cluster centres and outliers, and each of the remaining MRDIs was assigned to the same cluster as its nearest neighbour of higher density. The MRDIs possessing relatively large local density ρ and distance δ were the cluster centres, and the MRDIs possessing a small local density ρ and relatively large distance δ were regarded as potential outliers. An outlier was regarded as a cluster centre that contained only a single data point but represented one type of fault feature. Thus, the cluster centres and potential outliers were both regarded as important features. Using the DPC-GD feature selection method, the MRDI set could be greatly reduced, and several important MRDIs could be obtained.
Step 4: To illustrate the classification performance of the original dimensionless indicators, RDIs, and MRDIs, QDA was used to identify the 12 working conditions of the rolling bearing, and the 10-fold cross-validation approach was employed to validate the performance of the QDA. The average identification accuracy for all 10 trials was computed and used as the final identification accuracy. For the RDIs and MRDIs, DPC-GD was used to select several important features. Then, these important features were added to the QDA one by one. Thus, the relationships between the identification accuracy and different numbers of RDIs and MRDIs were obtained.
Step 5: To further demonstrate the superiority of DPC-GD, it was compared with other feature selection methods. For each feature selection method, the detailed feature selection results and classification results of the MRDIs corresponding to the maximum accuracy for normal and eleven faulty working conditions for a rolling bearing were given.

III. RELATED THEORIES A. MULTI-SCALE REDEFINED DIMENSIONLESS INDICATORS
Dimensionless indicators are good diagnostic characteristics that do not require transformation or processing of the vibration signal. In practice, dimensionless indicators are sensitive to faults rather than to working conditions, and they are not affected by disturbances. Traditional dimensionless indicators (TDIs) are defined as follows [18]:

DI(l, m) = [∫ |x|^l p(x) dx]^(1/l) / [∫ |x|^m p(x) dx]^(1/m),

where x denotes the vibration amplitude, p(x) denotes the probability density function of the vibration amplitude, and l → ∞ yields the peak value max|x| in the numerator. Details are given as follows: (i) If l = 2, m = 1, we obtain the waveform indicator.
(iv) If l → ∞, m = 2, we obtain the peak indicator. To date, there are six types of dimensionless indicators, namely, the waveform indicator, impulse indicator, margin indicator, peak indicator, kurtosis indicator, and skewness indicator. However, because the number of fault types, fault severities, and fault orientations of rolling bearings has gradually increased, it is necessary to extract more useful fault information from the vibration signals to identify additional fault categories. Therefore, RDIs were proposed to find more informative fault information hidden in the vibration signals. The construction of the RDIs is based on the TDIs. From the definition of the dimensionless indicators given above, it can be seen that they depend on the parameters l and m. The six existing dimensionless indicators are obtained for six particular pairs of l and m values. Additional types of dimensionless indicators can be obtained by choosing other values of l and m, while ensuring that the numerator and denominator of the defining formula have the same dimension. Thus, a large number of new dimensionless indicators, called RDIs, can be obtained simply by selecting different values of l and m. In addition, because of its good performance in fault feature extraction, multi-scale analysis can be used to further process the RDIs, with the goal of uncovering more informative fault information hidden in the vibration signals. The definitions of the RDIs are as follows [31]: Definition 1: Similar to the definition of the waveform indicator, we let l and m vary from α to β with a change interval of λ, where α = 0.5, β = 5, λ = 0.5, and l ≠ m. Definition 2: Similar to the definitions of the impulse indicator, margin indicator, and peak indicator, we let l → ∞ and m = γ, where γ varies from 0.5 to 5 with a change interval of λ = 0.5.
Definition 3: Similar to the definitions of the kurtosis indicator and skewness indicator, we let l and m vary from ε to η with a change interval of λ1, subject to the constraint that the numerator and denominator of the defining formula have the same dimension. Definitions 1, 2, and 3 yield 90, 10, and 45 types of dimensionless indicators, respectively. Thus, a total of 145 types of dimensionless indicators were used as fault features in this study; in other words, the number of RDIs was 145.
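As a sketch of how such indicators can be generated programmatically, the following minimal Python function (an illustrative reading of the definitions above, not the authors' code) evaluates a dimensionless indicator for any pair (l, m) as the l-th root of the l-th absolute moment divided by the m-th root of the m-th absolute moment, with l → ∞ taken as the peak value max|x|:

```python
import numpy as np

def dimensionless_indicator(x, l, m):
    """Generalized dimensionless indicator: the l-th root of the l-th
    absolute moment over the m-th root of the m-th absolute moment.
    l = np.inf uses the peak value max|x| as the numerator."""
    x = np.abs(np.asarray(x, dtype=float))
    num = x.max() if np.isinf(l) else np.mean(x ** l) ** (1.0 / l)
    den = np.mean(x ** m) ** (1.0 / m)
    return num / den

# l = 2, m = 1 recovers the waveform indicator (RMS over mean absolute value);
# for a pure sine wave this is about 1.11, and the peak indicator is about 1.414.
sig = np.sin(np.linspace(0, 2 * np.pi, 1000, endpoint=False))
waveform = dimensionless_indicator(sig, 2, 1)
peak = dimensionless_indicator(sig, np.inf, 2)
```

Sweeping (l, m) over a grid of values in this way yields an RDI family; the exact grid and dimension constraint follow the three definitions above.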
In addition, to further improve the diagnostic ability of the RDIs, multi-scale analysis is used to transform the original signal into data at different scales through a coarse-graining process. Given a one-dimensional discrete time series S = {x_1, x_2, . . . , x_n}, the original series is first divided into non-overlapping windows of length τ. The consecutive coarse-grained time series y^(τ) is then reconstructed by averaging the data points within each window:

y_j^(τ) = (1/τ) Σ_{i=(j−1)τ+1}^{jτ} x_i, 1 ≤ j ≤ ν,

where τ is called the scale factor and ν is the integer quotient of n/τ. In particular, for scale one, the time series y^(1) is simply the original time series {x_n}. Finally, the RDI value is calculated for each coarse-grained time series and plotted as a function of the scale factor.
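The coarse-graining step can be sketched as follows (a minimal implementation; dropping the trailing remainder of the series when n is not a multiple of τ is one common convention and is assumed here):

```python
import numpy as np

def coarse_grain(x, tau):
    """Coarse-grained series at scale tau: the mean of consecutive
    non-overlapping windows of length tau (trailing remainder dropped)."""
    x = np.asarray(x, dtype=float)
    nu = len(x) // tau                      # number of complete windows
    return x[:nu * tau].reshape(nu, tau).mean(axis=1)

# scale 1 returns the original series; scale 2 averages adjacent pairs
pairs = coarse_grain([1, 2, 3, 4, 5], 2)    # -> [1.5, 3.5]
same = coarse_grain([1, 2, 3], 1)           # -> [1.0, 2.0, 3.0]
```

Applying each RDI to `coarse_grain(signal, tau)` for τ = 1, . . . , 6 then produces the MRDI feature set used in this paper.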
Based on the aforementioned three definitions of the dimensionless indicators and multi-scale analysis, new dimensionless indicators can be obtained; we define these as the multi-scale redefined dimensionless indicators (MRDIs), which provide much richer information regarding the operating condition of rolling bearings. The MRDIs are obtained by applying the RDI formulas to each coarse-grained series:

MRDI(l, m, τ) = [∫ |y^(τ)|^l p(y^(τ)) dy^(τ)]^(1/l) / [∫ |y^(τ)|^m p(y^(τ)) dy^(τ)]^(1/m),

where y^(τ) denotes the vibration amplitude of the coarse-grained series and p(y^(τ)) denotes its probability density function.

B. DENSITY PEAK CLUSTERING BASED ON GEODESIC DISTANCES
The density peak clustering algorithm, proposed by Rodriguez and Laio, is a density-based clustering method that does not require the number of clusters to be specified in advance. It is based on the idea that cluster centres are characterized by a higher local density than their neighbours and by a relatively large distance from points with higher densities [23]. The DPC method utilizes two important quantities: the local density ρ_i of each data point i, and its distance δ_i from points of higher density. Let {x_1, x_2, . . . , x_N} denote the data set to be clustered, where N is the number of data points. These two quantities are computed as follows. The distance matrix of the data set must be computed first; the Euclidean distance d_ij between x_i and x_j is defined as

d_ij = ||x_i − x_j||_2.

According to the Gaussian kernel function, the local density ρ_i can then be defined as

ρ_i = Σ_{j≠i} exp(−(d_ij / d_c)^2),

where the cut-off distance d_c is an adjustable parameter.
The only part of Formulas (6) and (7) not yet described is the selection of d_c, which is defined as

d_c = D(⌈p · M⌉),

where D = {d_1 ≤ d_2 ≤ . . . ≤ d_M} is the set of all distances between pairs of points in the data set, sorted in ascending order, M is the number of pairs, and p is a percentage.
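The cut-off distance and Gaussian-kernel local density can be sketched as follows, assuming a precomputed pairwise distance matrix; the percentile-style choice of d_c follows the description above:

```python
import numpy as np

def cutoff_distance(dist, p=0.02):
    """d_c chosen as the p-th fraction of the ascending sorted list of
    pairwise distances (upper triangle of the distance matrix)."""
    d = dist[np.triu_indices_from(dist, k=1)]
    d.sort()
    idx = max(int(round(p * len(d))), 1) - 1
    return d[idx]

def local_density(dist, d_c):
    """Gaussian-kernel local density: rho_i = sum_{j != i} exp(-(d_ij/d_c)^2)."""
    return np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0  # subtract the self term

# small demo: two nearby points and one isolated point
X = np.array([[0., 0.], [0., 1.], [5., 5.]])
dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
d_c = cutoff_distance(dist, p=0.4)
rho = local_density(dist, d_c)   # the isolated point gets the lowest density
```

The parameter `p` plays the role of the percentage in the definition of d_c; p = 2% corresponds to the setting used later in the experiments.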
The computation of δ_i is quite simple: it is defined as the smallest distance between data point i and any point of higher density,

δ_i = min_{j: ρ_j > ρ_i} d_ij.

For the data point with the highest local density, we simply let δ_i = max_j d_ij. Based on the local density ρ_i and separation distance δ_i, a representation called the decision graph is introduced to help find the cluster centres and probable outliers. In the decision graph, a data point possessing relatively large ρ_i and δ_i values is likely the centre of a cluster. When ρ_i and δ_i are both small, the data point lies near the edge of a cluster. When ρ_i is large and δ_i is small, the data point is close to, but is not, a cluster centre. When ρ_i is small and δ_i is large, on the other hand, the point is far from the rest of the data set, indicating that it is probably an outlier.
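The separation distance δ can be sketched in the same setting; points are visited in order of decreasing density so that each δ_i is the distance to the nearest denser point, and the densest point receives the maximum distance:

```python
import numpy as np

def separation_distance(dist, rho):
    """delta_i: distance to the nearest point of higher density;
    the globally densest point gets its maximum distance to any point."""
    n = len(rho)
    delta = np.empty(n)
    order = np.argsort(rho)[::-1]            # indices by decreasing density
    delta[order[0]] = dist[order[0]].max()   # densest point
    for k in range(1, n):
        i = order[k]
        delta[i] = dist[i, order[:k]].min()  # nearest among denser points
    return delta

# small demo with three points of strictly decreasing density
dist = np.array([[0., 1., 4.], [1., 0., 2.], [4., 2., 0.]])
rho = np.array([3., 2., 1.])
delta = separation_distance(dist, rho)       # -> [4.0, 1.0, 2.0]
```

Plotting `delta` against `rho` yields the decision graph described above: points with large values of both quantities are candidate centres, and points with small `rho` but large `delta` are candidate outliers.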
From Equation (6), it can easily be observed that the distance d_ij between x_i and x_j has a direct impact on the values of the local density ρ_i and separation distance δ_i. Thus, this paper uses Floyd's algorithm [32] to calculate the GD, which is the length of the shortest path connecting two features in the high-dimensional fault feature space. The GD is calculated as follows. First, the neighbourhood of each point is determined; for example, the neighbourhood of a point may be its k nearest points. If data point j is one of the k nearest neighbours of data point i, then G(i, j) = 1 and G(j, i) = 1; otherwise, G(i, j) = 0 and G(j, i) = 0. The value of G(i, j) thus indicates whether data point j is a nearest neighbour of data point i, and a neighbourhood graph can be constructed in this way. Next, the GD d_G(x_i, x_j) is initialized to the ED between neighbouring data points i and j:
d_G(x_i, x_j) = d(x_i, x_j) if G(i, j) = 1, and d_G(x_i, x_j) = ∞ otherwise, where d(x_i, x_j) represents the ED between data points x_i and x_j. Finally, the GD between each pair of data points is calculated as follows: for each value of q = 1, 2, . . . , N in turn, replace all entries d_G(x_i, x_j) by

d_G(x_i, x_j) ← min{d_G(x_i, x_j), d_G(x_i, x_q) + d_G(x_q, x_j)}.

The procedure of the DPC-GD algorithm for adaptive feature selection is presented in Table 1.
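The neighbourhood-graph construction and Floyd update can be sketched together as follows (a minimal version; the symmetrised k-nearest-neighbour rule and infinite initial distances follow the description above):

```python
import numpy as np

def geodesic_distances(X, k=5):
    """Approximate geodesic distances: build a symmetrised k-nearest-neighbour
    graph with Euclidean edge weights, then run Floyd's all-pairs shortest
    paths. Pairs in disconnected components remain np.inf."""
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    dG = np.full((n, n), np.inf)
    np.fill_diagonal(dG, 0.0)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]   # k nearest neighbours of each point
    for i in range(n):
        dG[i, nn[i]] = d[i, nn[i]]
        dG[nn[i], i] = d[nn[i], i]           # symmetrise the graph
    # Floyd update: d_G(i, j) <- min(d_G(i, j), d_G(i, q) + d_G(q, j))
    for q in range(n):
        dG = np.minimum(dG, dG[:, q:q + 1] + dG[q:q + 1, :])
    return dG

# three points on a bent path: the geodesic from 0 to 2 must pass through 1,
# so it exceeds the straight-line Euclidean distance of 2.0
dG = geodesic_distances(np.array([[0., 0.], [1., 1.], [2., 0.]]), k=1)
```

For larger data sets, the O(N^3) Floyd loop can be replaced by a sparse shortest-path routine, but the simple form above matches the update rule given in the text.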

IV. EXPERIMENTAL RESULTS AND COMPARISONS A. EXPERIMENTAL DATA ANALYSIS
The effectiveness of the proposed method was validated using measured vibration fault signals of a rolling bearing provided by the Electrical Engineering Laboratory of Case Western Reserve University [33]. This study employed drive-end bearing data for a shaft speed of 1750 rpm. Three bearing fault types (ball fault, inner race fault, and outer race fault) with various levels of fault severity (0.007-0.028 mils) were introduced using electro-discharge machining. A total of 12 different operating conditions were tested: the normal condition; ball faults of 0.007, 0.014, 0.021, and 0.028 mils; inner race faults of 0.007, 0.014, 0.021, and 0.028 mils; and outer race faults of 0.007, 0.014, and 0.021 mils. The vibration signals for the 12 operating conditions were processed using multi-scale analysis, and multiple coarse-grained time series were obtained. In the implementation of the multi-scale analysis, the scale factor was τ = 1, 2, 3, . . . , 6, a choice based on [34], [35]. For each coarse-grained time series, the values of the RDIs were calculated under the different scale factors. From the above analysis, a feature set containing 870 MRDIs was obtained from the original vibration signals. The specific order of the 870 MRDIs is as follows: the first 145 MRDIs are for the original signals at scale one (note that when the scale factor is one, the 145 MRDIs are simply the 145 RDIs); the next 145 MRDIs are for the original signals at scale two; and so on, with the last 145 MRDIs corresponding to scale six. To qualitatively show the difference between the RDIs and TDIs, four RDIs, namely, the 1st, 9th, 55th, and 108th RDIs, are illustrated in Fig. 3. In addition, to quantitatively illustrate how the MRDIs and RDIs differ from the TDIs, the first 145 MRDIs (i.e., the 145 RDIs) and the 6 TDIs are taken as an example.
Distance ratios (DRs) of between-class and within-class distances for the TDIs and MRDIs, respectively, were calculated and arranged in decreasing order. The DR values of the 145 MRDIs are shown in Fig. 4. Among the 145 MRDIs, the six TDIs are marked in red, and the remaining dimensionless indicators are marked in blue. If a dimensionless indicator has a large DR value, it can maximize the between-class distance and minimize the within-class distance, indicating a good performance in distinguishing different operating conditions. By analysing the DR values of the 145 MRDIs, it can be concluded that the DR values of most MRDIs were greater than those of the TDIs, indicating that the MRDIs have better distinguishability for different fault types than the TDIs.
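The exact DR formula is not reproduced in this section, but a Fisher-style ratio of between-class to within-class scatter is one plausible reading of the description above; a minimal sketch for a single feature:

```python
import numpy as np

def distance_ratio(feature, labels):
    """Ratio of between-class to within-class scatter for one feature.
    A larger ratio means the feature separates the classes better.
    (An illustrative Fisher-style ratio; the paper's exact DR definition
    is assumed, not quoted.)"""
    feature, labels = np.asarray(feature, dtype=float), np.asarray(labels)
    overall = feature.mean()
    between = within = 0.0
    for c in np.unique(labels):
        vals = feature[labels == c]
        between += len(vals) * (vals.mean() - overall) ** 2
        within += ((vals - vals.mean()) ** 2).sum()
    return between / within

# a well-separated feature scores far higher than an overlapping one
labels = np.array([0, 0, 0, 1, 1, 1])
good = distance_ratio([0, 0.1, -0.1, 10, 10.1, 9.9], labels)
poor = distance_ratio([0, 10, 5, 0.1, 9.9, 5.2], labels)
```

Ranking all 145 indicators by such a ratio reproduces the kind of comparison shown in Fig. 4.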

B. COMPARISONS FOR TDIS, RDIS, AND MRDIS
This subsection compares the classification performance of the TDIs, RDIs, and MRDIs based on QDA, a powerful statistical multivariate pattern-recognition method; a detailed introduction to QDA can be found in [36]. In this study, 10-fold cross-validation was used to train and test the QDA pattern classifier: the sample dataset was randomly divided into 10 subsets, one of which was used as the testing set while the other 9 were combined to form the training set. The average identification accuracy over all 10 trials was computed and used as the final identification accuracy.
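This evaluation protocol can be sketched with scikit-learn (assumed available; the feature matrix below is a synthetic stand-in for the selected MRDIs, not the CWRU data):

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# synthetic stand-in: 600 samples x 9 features, 12 operating conditions,
# with class-dependent mean shifts so the classes are separable
rng = np.random.default_rng(0)
y = np.repeat(np.arange(12), 50)
X = rng.normal(size=(600, 9)) + y[:, None] * 0.8

qda = QuadraticDiscriminantAnalysis()
scores = cross_val_score(qda, X, y, cv=10)   # 10-fold cross-validation
accuracy = scores.mean()                     # averaged identification accuracy
```

Adding the selected features to `X` one column at a time, as done in Step 4, traces out the accuracy-versus-feature-count curves reported below.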
For the 870 MRDIs, DPC-GD was used to adaptively select the important features. To evaluate the selection results, the selected important features were used as the input of QDA. In DPC-GD, the parameter k for the k nearest neighbours was set at five, and the cut-off distance d_c was set at 2%. In addition, it is worth pointing out that when calculating the GD between each pair of features according to Formula (11), some entries of the distance matrix may be infinite (disconnected pairs), which would introduce infinite values into the delta matrix. In this study, the largest finite distance value was used to replace each infinite value in the distance matrix. Thus, the delta matrix of the DPC did not contain infinite values, which was beneficial for constructing the decision graph. The decision graph generated by DPC-GD is shown in Fig. 5, which plots the separation distance as a function of the local density for each MRDI. Three MRDIs, identified as centres, are located in the upper right part of the decision graph, and six MRDIs, identified as outliers, are in the upper left part. It is worth noting that some outliers overlap in the decision graph, making it appear that only four MRDIs are selected; the actual number of selected MRDIs is nine. These nine MRDIs are the 1st, 2nd, 10th, 19th, 101st, 137th, 248th, 475th, and 632nd MRDIs. Among these, the 1st, 2nd, 10th, 19th, 101st, and 137th MRDIs are outliers, and the 248th, 475th, and 632nd MRDIs are centres. To evaluate the classification effectiveness of the nine selected MRDIs, QDA was used as the pattern classifier to identify the 12 bearing working conditions. When these nine MRDIs were fed into the QDA one by one, the identification accuracy was 59.17%, 76.83%, 76.17%, 76.33%, 88.33%, 91.00%, 97.67%, 97.00%, and 97.17%, respectively. The maximum accuracy of 97.67% was reached when the number of selected MRDIs was 7.
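The infinite-distance replacement described above can be sketched as follows (replacing each infinite entry of the geodesic distance matrix with the largest finite distance it contains):

```python
import numpy as np

def replace_inf(dist):
    """Replace infinite geodesic distances (disconnected pairs) with the
    largest finite distance in the matrix, so all delta values stay finite."""
    finite_max = dist[np.isfinite(dist)].max()
    return np.where(np.isinf(dist), finite_max, dist)

# demo: a 3x3 distance matrix with one disconnected pair
m = np.array([[0., 1., np.inf],
              [1., 0., 2.],
              [np.inf, 2., 0.]])
fixed = replace_inf(m)   # the inf entries become 2.0
```

This keeps the decision graph well defined without otherwise changing the ranking of finite distances.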
The classification results of the MRDIs corresponding to the maximum accuracy of 97.67% are shown in Fig. 6. For the normal condition and the faulty conditions B-014, B-028, IR-014, OR-007, and OR-021, the fault classification rates reach 100%. For the faulty conditions B-007, B-021, IR-028, and OR-014, the fault classification rate is 94%, and for the faulty conditions IR-007 and IR-021 it is 98%. The overall recognition rate is 97.67%.
For the 145 RDIs, the decision graph generated by DPC-GD is shown in Fig. 7. Three RDIs identified as centres are located in the upper right part of the decision graph, and one RDI identified as an outlier is in the upper left part; the number of selected RDIs is thus four. These four selected RDIs are the 52nd, 60th, 104th, and 137th RDIs. When they were fed into the QDA one by one, the identification accuracy was 64.83%, 75.17%, 90.67%, and 93.17%, respectively. The classification results of the RDIs corresponding to the maximum accuracy of 93.17% are shown in Fig. 8, where the fault classification rates for several of the faulty conditions are 84%, 78%, 96%, 74%, and 90%, respectively. The overall recognition rate is 93.17%.
Based on the above experimental results, it can be seen that compared with the TDIs and RDIs, the MRDIs improve the fault diagnosis accuracy by 11% and 4.5%, respectively. This demonstrates that the classification effectiveness of the MRDIs is better than that of the RDIs and TDIs. In addition, DPC-GD was applied directly to cluster the 600 samples. The sample clustering results based on the MRDIs and RDIs are illustrated in Fig. 10, where it can be observed that only six clusters and three clusters are obtained, respectively, whereas the actual number of sample clusters is 12. These experimental results show that DPC-GD fails to cluster the correct number of fault types for the sample data set in a high-dimensional feature space.

C. COMPARATIVE STUDY OF ORIGINAL DPC AND DPC-GD BASED ON MRDIs
For the original DPC, several tests were performed with different values of the cut-off distance (d_c = 0.1%, 0.5%, 1%, 1.5%, 2%, 2.5%). The decision graphs generated by the original DPC with the six different cut-off distances are shown in Fig. 11. The number of selected MRDIs is always two: the 137th MRDI paired with the 396th, 396th, 466th, 548th, 548th, and 548th MRDIs for the six experiments, respectively. Among the six experiments, the maximum identification accuracy is 59.67%, obtained when the cut-off distance d_c is 0.1% or 0.5%. This low accuracy is caused by the fact that the ED cannot truly measure the distances between pairs of MRDIs: the intrinsic geometry of the complex nonlinear structure is ignored, leading to an inaccurate local density for each MRDI. The classification results corresponding to the maximum accuracy of 59.67% are shown in Fig. 12. Only for the normal condition is the recognition rate 100%. For the faulty conditions B-007, B-014, B-021, B-028, IR-007, IR-014, IR-021, IR-028, OR-007, OR-014, and OR-021, the fault classification rates are 58%, 52%, 60%, 38%, 52%, 54%, 64%, 92%, 50%, 22%, and 74%, respectively. The overall recognition rate is 59.67%.
For DPC-GD, several tests were performed with different values of parameter k for the k nearest neighbours (k = 5, 6, 7) and different cut-off distance values (d c = 0.1%, 0.5%, 1%, 1.5%, 2%, 2.5%). The decision graphs generated by DPC-GD with different values of d c at k = 5 are shown in Fig. 13. It can be seen that the number of selected MRDIs is always nine. The detailed MRDI selection results and identification accuracy when using DPC-GD are presented in Table 2. The identification accuracy is slightly lower only when the cut-off distance is 0.5%. The maximum identification accuracy is 98.17% when the number of nearest neighbours and the cut-off distance are set at 5 and 0.1%, respectively. The classification results corresponding to the maximum accuracy of 98.17% are shown in Fig. 14. It can be seen that for the normal condition and the faulty conditions B-014, B-028, IR-014, IR-021, OR-007 and OR-021, the fault classification rates reach 100%. For the faulty conditions B-007 and B-021, the fault classification rate is 94%. For the faulty condition IR-007, the fault classification rate is 98%. For the faulty conditions IR-028 and OR-014, the fault classification rate is 96%. The overall recognition rate is 98.17%.
In addition, several tests were performed using DPC-GD with different cut-off distance values when the parameter k was fixed at six and seven. The decision graphs are shown in Fig. 15 and Fig. 18, respectively, and the detailed MRDI selection results and identification accuracy are presented in Tables 3 and 4. It can be observed that when the parameter k is fixed at six, the identification accuracy shows only a slight change. The maximum identification accuracy is 98.17%, obtained when the cut-off distance is 0.5% or 1.5%. The classification results corresponding to this maximum accuracy of 98.17% are shown in Figs. 16 and 17. When the number of nearest neighbours is set at seven, the identification accuracy again shows only a slight change. The maximum identification accuracy is 98.33%, obtained when the cut-off distance d_c is 1.5%. The classification results corresponding to this maximum accuracy of 98.33% are shown in Fig. 19.
Based on the above experimental results, it can be concluded that, compared with the original DPC, DPC-GD enhances the unsupervised feature selection ability and obtains better feature selection results. By accurately measuring the distance between every pair of features, DPC-GD constructs a more faithful decision graph from which a small set of important features can be selected, improving identification accuracy. The proposed DPC-GD for unsupervised feature selection thus achieves satisfactory diagnosis results.

D. COMPARATIVE STUDY OF OTHER FEATURE SELECTION METHODS
Furthermore, to verify the superiority of the proposed method, a comparison was conducted against existing feature selection methods. This comparative study considered four methods: local learning-based unsupervised clustering for feature selection (LLC) [37], the Laplacian score for feature selection (LSFS) [38], unsupervised discriminative feature selection (UDFS) [39], and the ordinal locality approach for unsupervised feature selection (OLFS) [40]. We only briefly investigate the parameter settings of these four methods; the selection of their optimal parameters is beyond the scope of this study. For simplicity, we report only the maximum identification accuracy and the corresponding detailed feature selection result for each compared method. For LLC, the number of clusters c_1, the size of the mutual neighbours k_1 and the trade-off parameter β_1 were searched over c_1 ∈ {6, 8, 12}, k_1 ∈ {5, 10, 20} and β_1 ∈ {0.1, 1, 10}, respectively. For LSFS, no parameter selection was needed. For UDFS, the number of clusters was set at 12, and the regularization parameter γ_1 was searched over γ_1 ∈ {10^-9, 10^-8, 10^-7, ..., 10^9}. For OLFS, the number of clusters c_2, the number of nearest neighbours k_2, and the regularization parameters α_2 and β_2 were searched over c_2 ∈ {6, 12}, k_2 ∈ {5, 10}, and α_2, β_2 ∈ {10^-6, 10^-4, 10^-2, ..., 10^6}. After a detailed simulation analysis, the maximum identification accuracy achieved by LLC was 94.33%, with the combination c_1 = 12, k_1 = 20, β_1 = 0.1. For UDFS, the maximum identification accuracy achieved was 91.5%, with γ_1 = 10^-8. For OLFS, the maximum identification accuracy achieved was 91.83%, with the combination c_2 = 6, α_2 = 10^4, β_2 = 10, k_2 = 5.
For each of the abovementioned methods, the first 30 MRDIs were selected as the QDA inputs for fault identification analysis. The classification accuracy results of the four methods are illustrated in Fig. 20. It can be observed that as the number of MRDIs increases, the classification accuracies of the LLC, LSFS, UDFS, and OLFS methods improve at first. When the number of input features reaches 11, 11, 18, and 26, the classification accuracies attain their highest values of 94.33%, 92.17%, 91.50%, and 91.83%, respectively; the accuracy then gradually decreases as the number of input MRDIs continues to increase. Table 5 presents the detailed feature selection results and identification accuracy of these four feature selection methods. Compared with these results, the identification accuracy obtained with the MRDIs selected by DPC-GD is higher. Specifically, the proposed method achieved a classification accuracy of 98.33% with six MRDIs, better than LLC with 11 MRDIs (94.33%), LSFS with 11 MRDIs (92.17%), UDFS with 18 MRDIs (91.50%), and OLFS with 26 MRDIs (91.83%). These comparative results demonstrate that the proposed method can successfully differentiate 12 localized fault types of rolling bearings and is superior to the aforementioned methods.
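The accuracy-versus-feature-count sweep described above can be reproduced in outline with scikit-learn. The snippet below uses synthetic data as a stand-in for the score-ranked MRDIs (the real data are not reproduced here), so the numbers it produces are illustrative only; the structure of the loop is the point.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the MRDI feature matrix: rows are samples,
# columns are features assumed already ordered by a selection score.
X, y = make_classification(n_samples=600, n_features=30, n_informative=10,
                           n_classes=4, random_state=0)

# Feed the first n features into QDA and record 10-fold CV accuracy
accs = []
for n_feat in range(1, 31):
    qda = QuadraticDiscriminantAnalysis()
    acc = cross_val_score(qda, X[:, :n_feat], y, cv=10).mean()
    accs.append(acc)

best = int(np.argmax(accs)) + 1  # feature count with peak accuracy
```

Plotting `accs` against the feature count gives a curve of the same shape as Fig. 20: accuracy rises while informative features are added and degrades once redundant ones dominate.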

E. COMPARATIVE STUDY FOR PATTERN CLASSIFIERS BASED ON THE SELECTED MRDIs
In addition, the performance of the QDA classifier was compared with that of linear discriminant analysis (LDA), a quadratic support vector machine (QSVM) and bagged trees (BT). The nine MRDIs selected by DPC-GD, with the parameter k for the k nearest neighbours set at five and the cut-off distance d_c set at 2%, were used as the inputs of the LDA, QSVM and BT. Ten-fold cross-validation was used to train and test these pattern classifiers, and the average identification accuracy over the 10 trials was taken as the final identification accuracy. The maximum classification accuracies of the LDA, QSVM and BT are 89.5% with 9 MRDIs, 93.8% with 8 MRDIs and 91.7% with 9 MRDIs, respectively. These comparative results demonstrate that, using the nine selected MRDIs, the classification accuracy of the QDA pattern classifier is higher than that of the LDA, QSVM and BT.
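This classifier comparison can be sketched with scikit-learn's stock implementations. Two substitutions are assumptions on my part: QSVM is approximated by a degree-2 polynomial-kernel SVM, and BT by bagged decision trees; the data are again a synthetic stand-in for the nine selected MRDIs across 12 conditions.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 12 classes, 50 samples each, 9 features
X, y = make_classification(n_samples=600, n_features=9, n_informative=9,
                           n_redundant=0, n_classes=12, random_state=0)

classifiers = {
    'QDA': QuadraticDiscriminantAnalysis(),
    'LDA': LinearDiscriminantAnalysis(),
    'QSVM': SVC(kernel='poly', degree=2),   # quadratic-kernel SVM
    'BT': BaggingClassifier(DecisionTreeClassifier(), n_estimators=50),
}

# 10-fold cross-validated accuracy for each classifier
scores = {name: cross_val_score(clf, X, y, cv=10).mean()
          for name, clf in classifiers.items()}
```

On the actual MRDI data the paper reports QDA as the strongest of the four; on synthetic data the ranking may of course differ.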

V. CONCLUSION
This paper proposed a novel hybrid intelligent fault diagnosis method based on MRDIs and DPC-GD to identify 12 different working conditions of a rolling bearing. The proposed method comprises three steps: fault feature extraction by MRDIs, unsupervised feature selection using DPC-GD, and fault pattern recognition using QDA. The advantage of the MRDIs is that additional types of dimensionless indicators can be constructed to handle the growing number of fault types, fault severities, and fault orientations, uncovering more of the useful fault information hidden in the original vibration signals. The unsupervised feature selection method DPC-GD plays an important role in improving classification accuracy; compared with other feature selection methods, the proposed method tended to achieve better diagnosis results with fewer selected MRDIs. However, the effectiveness of DPC-GD is sensitive to two parameters, the cut-off distance and the number of nearest neighbours, so a method for selecting optimal parameter values during feature selection remains a topic for further study.