Machinery Fault Diagnosis Scheme Using Redefined Dimensionless Indicators and mRMR Feature Selection

Machinery fault diagnosis methods based on dimensionless indicators have long been studied. However, traditional dimensionless indicators usually suffer from low diagnostic accuracy for mechanical components. To address this, an effective fault diagnosis method based on redefined dimensionless indicators (RDIs) and minimum redundancy maximum relevance (mRMR) is proposed to identify the health conditions of mechanical components. In the proposed method, the vibration signals are first processed by variational mode decomposition, and multiple RDIs are constructed from the decomposed signals. Subsequently, the mRMR approach is used to select a small number of important RDIs. Finally, the selected RDIs are fed into a grid search support vector machine to perform fault pattern identification. To verify the superiority of the proposed method, two experimental examples covering different fault types of mechanical components, namely rolling bearings and a gearbox, are conducted. The experimental results demonstrate that the RDIs, as new fault features, can effectively remedy the deficiency of the traditional dimensionless indicators and have a stronger ability to distinguish machinery faults. Additionally, the proposed method successfully differentiated 12 fault conditions of rolling bearings and nine fault conditions of gears with average accuracies of 97.47% and 97.12% using 11 and 5 RDIs, respectively.


I. INTRODUCTION
Rolling bearings and gears are the most commonly used components in modern machinery, which frequently operates under harsh environmental conditions. When a fault emerges in such mechanical components, it may cause catastrophic machine breakdown and significant economic loss. Therefore, fault diagnosis technology for mechanical components must be investigated to improve reliability and reduce unscheduled downtime [1]. As machinery vibration signals typically exhibit nonlinear and nonstationary characteristics, the health condition information distributed in machinery vibration signals is complex. Over the past decades, many signal processing techniques, such as symplectic geometry mode decomposition [2], wavelet packet decomposition [3], empirical mode decomposition [4], local mean decomposition [5], variational mode decomposition [6], and empirical wavelet transform [7], have been widely used to analyze nonlinear and nonstationary signals. Among them, a self-adaptive and entirely nonrecursive signal decomposition method, called variational mode decomposition (VMD), was proposed by Dragomiretskiy and Zosso [8]. This method can decompose a complex vibration signal into a sum of amplitude-modulated and frequency-modulated components, as well as effectively preprocess nonlinear and nonstationary signals. Several studies have been conducted regarding the applications and extensions of VMD to fault diagnosis [6], [9]-[11], and VMD has been regarded as highly effective in rotating machinery fault diagnosis [12], [13]. Therefore, we utilized VMD to produce the original feature sets in this study. Based on signal processing, various types of fault features can be extracted [14]-[18].
The associate editor coordinating the review of this manuscript and approving it for publication was Li He. VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
Dimensionless indicators, as time-domain features, are sensitive to faults and not affected by disturbances. Therefore, they have been widely used as fault features in several studies [19]-[21]. Nevertheless, the existing types of traditional dimensionless indicators (TDIs) are limited; they are only sensitive to certain fault types. In practical engineering, an increasing number of fault types, fault severities, and fault orientations have emerged. Using the limited number of dimensionless indicators to accurately diagnose more faults has become extremely difficult. Additionally, TDIs often suffer from varying amounts of aliasing in the feature space, causing a decrease in diagnosis accuracy and an increase in diagnosis uncertainty [22], [23]. To solve the abovementioned problems of TDIs, a new fault feature extraction method based on redefined dimensionless indicators (RDIs) is proposed in this study. Based on the RDIs, more types of dimensionless indicators and more useful fault information hidden in the vibration signal can be obtained.
After feature extraction, feature selection is typically conducted to reduce the feature dimension and obtain an appropriate subset of features. Several feature selection methods have been investigated and applied in fault diagnosis. In [24], a genetic algorithm (GA)-based feature selection was applied to identify crack types for pressure vessels. In [25], two particle swarm optimization (PSO)-based feature selection algorithms were proposed. However, GA- or PSO-based feature selection either is easily trapped in local optima or has a low computational efficiency. In [26], kernel principal component analysis was used to select important features by the feature contribution rate. In [27], a locally linear embedding algorithm was used to reduce the dimensions of each row vector of a feature matrix. In [28], locality preserving projections were used to select the more sensitive low-dimensional information hidden in high-dimensional fusion feature structures. Nevertheless, these methods fail to consider the possible redundant and irrelevant features in high-dimensional feature sets, which might reduce the fault identification accuracy and increase the computation time. Among the available feature selection methods, the max-relevance and min-redundancy (mRMR) method [29] is an effective approach that can automatically eliminate redundant and irrelevant features in a high-dimensional feature space and select important features according to the maximum correlation and minimum redundancy criteria. In [30], [31], the mRMR approach was employed to eliminate redundant features and select the first important features to construct new fault feature vectors to represent fault characteristics. Additionally, the mRMR method, which affords fast calculation and strong robustness, has been validated in [32].
Based on the above analysis, an effective fault diagnosis method for mechanical components is proposed based on VMD, RDI extraction, mRMR feature selection, and grid search support vector machine (GSSVM) fault identification. To the best of our knowledge, this study is the first to investigate the superiority of RDIs in machinery fault diagnosis, thereby providing a new feature extraction strategy for intelligent fault diagnosis. Two experimental examples covering different fault types of mechanical components, namely rolling bearings and a gearbox, demonstrated that the RDIs can effectively remedy the deficiency of the TDIs and have a stronger ability to distinguish the fault types. In addition, the experimental comparison results reinforce the superiority of mRMR in selecting the useful RDIs.
The remainder of this paper is organized as follows: section 2 presents the proposed diagnostic framework and techniques involved; section 3 describes the related theories; section 4 presents the experimental demonstration, VMD-based RDIs construction, and mRMR-based RDI selection; section 5 describes the comparative studies; section 6 concludes this paper.

II. HYBRID INTELLIGENT FAULT DIAGNOSIS METHOD
In the fault diagnosis field, feature extraction and selection are important steps prior to fault identification. Feature extraction is the basis of feature selection, which directly affects the fault identification accuracy. Selecting an appropriate and discriminative feature set to reflect the operating condition of the machinery is critical in fault diagnosis. Considering the increasing number of fault types, fault severities, and fault orientations of machinery components, it is necessary to construct more fault features to represent useful fault information. In this study, a new feature extraction method based on RDIs is proposed. The RDIs can extract more useful fault information hidden in the vibration signals than TDIs, which is beneficial for fault identification. However, different types of RDIs differ in their ability to distinguish fault patterns. Therefore, the distinguishing ability of each RDI must be considered. Additionally, because the fault feature set is typically high dimensional and contains irrelevant and redundant features, a high-dimensional feature set tends to deteriorate the performance of fault identification. This applies to RDIs as well. Therefore, in this study, the mRMR method was used to select RDIs possessing higher degrees of distinguishing ability and eliminate redundant and irrelevant RDIs in high-dimensional feature sets. The key idea behind the mRMR approach is to maximize the relevance between features and output class labels, and minimize the redundancy between features.
Based on the aforementioned analysis, this study proposes an intelligent fault diagnosis approach for mechanical components based on VMD, RDI extraction, mRMR feature selection, and GSSVM fault identification. First, the VMD method is used to process the original vibration signals of rolling bearings and gears under different operating conditions. Using the VMD method, each vibration signal can be decomposed into a number of intrinsic mode functions (IMFs). Subsequently, TDIs are redefined and multiple types of RDIs are constructed. Based on the original signals and IMFs, multiple RDIs are extracted as fault features. As such, a high-dimensional feature vector comprising (n + 1) × m RDIs is built, where n represents the number of IMFs, and m the number of RDIs of the original signals. Subsequently, the mRMR method is used to select the appropriate and discriminative features from the extracted RDIs to establish a new feature set that contains the informative fault information. The mutual information between each RDI and the class is calculated, and the maximum-relevance RDI with the largest mutual information is selected first. Subsequently, the mutual information between every two RDIs is calculated, and the minimum-redundancy RDIs are then selected. Using the mRMR feature selection, several top-ranked RDIs are selected as fault features. Finally, the GSSVM pattern classifier is employed to recognize 12 operating conditions of rolling bearings and nine fault conditions of gears. The GSSVM is useful in solving pattern recognition problems with small samples, nonlinearity, and high dimensionality. The selected RDIs are fed into the GSSVM to distinguish several fault types of rolling bearings and gears. By adopting the grid search method, we can obtain the best kernel parameter γ and penalty parameter C for the support vector machine (SVM). Specifically, a set of candidates is first selected for both C and γ.
Using the training sample set, each pair of C and γ is evaluated by cross validation, and the pair with the highest accuracy is determined as the optimal parameters. As such, the optimal GSSVM model can be obtained, and the testing sample set is used to validate the optimal GSSVM model. Experimental results demonstrated that RDIs as fault features can successfully reveal the fault characteristics. Furthermore, RDIs can overcome the major shortcomings of TDIs, and the selected RDIs using mRMR can improve the fault identification accuracy of rolling bearings and gears. The flowchart of the proposed fault diagnostic framework is shown in Fig. 1.

III. RELATED THEORIES

A. VARIATIONAL MODE DECOMPOSITION
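VMD is an adaptive and non-recursive signal decomposition method that decomposes the original signal into narrow-band components with independent center frequencies. In the VMD framework, the signal is decomposed into a discrete number of components, and the bandwidth of each component is evaluated through a constrained variational optimization problem, which can be expressed as follows:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_k u_k = f \tag{1}$$

where $u_k$ is the kth component of the signal, $\omega_k$ denotes the center frequency of the kth component, $f$ is the original signal, $\delta$ is the Dirac distribution, $t$ is the time, and $*$ denotes convolution. By introducing a quadratic penalty and a Lagrangian multiplier $\lambda$, the above constrained optimization problem can be expressed as:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_k u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_k u_k(t) \right\rangle \tag{2}$$

where $\alpha$ denotes the balancing parameter of the data-fidelity constraint. Eq. (2) is then solved with the alternating direction method of multipliers, and the modes obtained in the spectral domain are written as:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \frac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha(\omega - \omega_k)^2} \tag{3}$$

where $\hat{f}(\omega)$, $\hat{u}_i(\omega)$, $\hat{\lambda}(\omega)$, and $\hat{u}_k^{n+1}(\omega)$ are the Fourier transforms of $f(t)$, $u_i(t)$, $\lambda(t)$, and $u_k^{n+1}(t)$, respectively. Note that Eq. (3) has the structure of a Wiener filter. Each component in the time domain can be obtained from the real part of the inverse Fourier transform of the filtered signal. More detailed information concerning VMD can be found in [8].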
Using the VMD method, multiple sub-signals in different frequency bands are obtained. For each sub-signal, 145 types of RDIs are calculated. The calculation of the RDIs is described in the following.

B. REDEFINED DIMENSIONLESS INDICATORS
Dimensionless indicators are effective diagnostic features that do not require transformation or processing of the vibration signal. In practice, dimensionless indicators are sensitive to faults rather than to working conditions, and they are not affected by disturbances. Thus, they are widely employed in complex industrial conditions. Traditional dimensionless parameters are defined as follows [22]:

$$\zeta_x = \frac{\left[ \int_{-\infty}^{+\infty} |x|^{l}\, p(x)\, dx \right]^{1/l}}{\left[ \int_{-\infty}^{+\infty} |x|^{m}\, p(x)\, dx \right]^{1/m}}$$

where x denotes the vibration amplitude, p(x) denotes the probability density function of the vibration amplitude, and l and m are two different constants.
Details are given as follows:
To date, there are six types of dimensionless indicators, namely the waveform indicator, impulse indicator, margin indicator, peak indicator, kurtosis indicator, and skewness indicator. However, considering that the number of fault types, fault severities, and fault orientations of rolling bearings has gradually increased, more useful fault information needs to be extracted from the vibration signals to identify more fault categories. For this reason, RDIs are proposed in this study to find more informative fault information hidden in the vibration signals. The construction of the RDIs is based on the traditional dimensionless indicators. From the above definition, it can be seen that the dimensionless indicators depend on the parameters l and m. The six existing dimensionless indicators are obtained when six groups of l and m are specified. That is, if different values of l and m are chosen while ensuring that the numerator and the denominator of the defining formulas have the same dimension, more types of dimensionless indicators can be obtained. As such, simply by choosing different values of l and m, a large number of new dimensionless indicators, called RDIs, can be obtained. RDIs provide a new feature extraction strategy for intelligent fault diagnosis. The definitions of the RDIs are as follows.
Definition 3: Similar to the definitions of the kurtosis indicator and skewness indicator, we let l and m vary from ε to η, where the change interval is λ 1 for ε and the change interval
Based on the aforementioned three definition types of the dimensionless indicators, new dimensionless indicators can be obtained, and we define such dimensionless indicators as redefined dimensionless indicators (RDIs), which can provide much more informative fault information regarding the operating condition of rolling bearings. The RDIs are respectively defined as follows: where x denotes the vibration amplitude, p(x) denotes the probability density function of the vibration amplitude, and α, β, γ, ε, η, λ, λ 1, λ 2 denote different constants.
When different constants are chosen while ensuring that the numerator and the denominator of each defining formula have the same dimension, more types of dimensionless indicators can be obtained. For vibration signals, the integrals in the above formulation can be easily estimated from the samples.
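The remark that the integrals can be computed from the samples can be illustrated with a minimal sketch. It assumes the ratio form (E|x|^l)^(1/l) / (E|x|^m)^(1/m), replaces each expectation with a sample mean, and treats l → ∞ as the peak amplitude. The function names and the exponent pairs listed for four traditional indicators are illustrative assumptions, not the authors' implementation.

```python
import math


def moment_root(x, order):
    """Return (mean(|x|^order))^(1/order); order = math.inf gives max|x|."""
    if math.isinf(order):
        return max(abs(v) for v in x)
    return (sum(abs(v) ** order for v in x) / len(x)) ** (1.0 / order)


def dimensionless_indicator(x, l, m):
    """Sample estimate of the ratio-type indicator with exponents l and m."""
    return moment_root(x, l) / moment_root(x, m)


# Exponent pairs commonly associated with four traditional indicators
# (assumed here for illustration).
TRADITIONAL = {
    "waveform": (2, 1),
    "impulse": (math.inf, 1),
    "margin": (math.inf, 0.5),
    "peak": (math.inf, 2),
}

# Hypothetical short vibration record.
signal = [0.1, -0.4, 0.3, 1.2, -0.2, 0.05, -0.9, 0.6]
indicators = {name: dimensionless_indicator(signal, l, m)
              for name, (l, m) in TRADITIONAL.items()}

# Scaling the signal leaves every indicator unchanged, which is what
# makes the ratio dimensionless.
scaled = [10.0 * v for v in signal]
scaled_indicators = {name: dimensionless_indicator(scaled, l, m)
                     for name, (l, m) in TRADITIONAL.items()}
```

Choosing other (l, m) pairs in the same function yields further indicators of this type, which is the idea behind extending the traditional set.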

C. MAX-RELEVANCE AND MIN-REDUNDANCY
The core concept of the mRMR approach is to select a feature subset that best characterizes the statistical property of a target classification variable, subject to the constraint that these features are mutually as dissimilar as possible but marginally as similar as possible to the classification variable. Hitherto, several different forms of mRMR have been proposed, where "relevance" and "redundancy" were defined using factors such as mutual information, correlation, and distances. In this study, a mutual-information-based mRMR criterion is used to measure the similarity level between features and the correlation level between a feature and the class. Let $T = [x_{ij}]_{N \times m}$ denote a sample matrix, where $x_{ij}$ is the ith sample value of the jth feature, N the total number of samples, and m the total number of features. The mRMR approach can select an optimal p-dimensional (p < m) feature subspace to maximize the correlation between a feature and the class, and minimize the correlation between features. Given two jointly discrete random variables X and Y, their mutual information is defined as follows:

$$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$

where p(x, y) is the joint probability mass function of X and Y; p(x) and p(y) denote the marginal probability mass functions of X and Y, respectively. Suppose $a_i$ represents an individual feature and c the class. The max-relevance criterion can be expressed as

$$\max D(S, c), \quad D = \frac{1}{|S|} \sum_{a_i \in S} I(a_i; c)$$

where |S| is the dimension of the feature space S, $I(a_i; c)$ is the mutual information between the individual feature $a_i$ and class c, and D is the mean mutual information between the features in S and the class. In addition, because the features selected based on the max-relevance criterion may be redundant, the min-redundancy criterion is used to minimize the redundancy between features, which is expressed as

$$\min R(S), \quad R = \frac{1}{|S|^2} \sum_{a_i, a_j \in S} I(a_i; a_j)$$

where $I(a_i; a_j)$ is the mutual information between features $a_i$ and $a_j$, and R is the mean mutual information between the features in S.
Combining the max-relevance and min-redundancy criteria mentioned above, the mRMR feature selection method is formed, which can optimize D and R simultaneously:

$$\max \Phi(D, R), \quad \Phi = D - R \tag{9}$$

Using formula (9), the max-relevance and min-redundancy criteria are transformed into maximizing the mutual information difference Φ(D, R). Typically, the incremental search method is used to obtain near-optimal features. Suppose m − 1 features have been selected to establish feature set S; the following criterion must then be satisfied to select the m-th feature:

$$\max_{a_j \notin S} \left[ I(a_j; c) - \frac{1}{m-1} \sum_{a_i \in S} I(a_j; a_i) \right] \tag{10}$$

Formula (10) ensures that the correlation between the selected feature and the class is maximized, and that the correlation between the selected feature and the features already in S is minimized. A more detailed and comprehensive description of the mRMR feature selection method is available in [29].

D. GRID SEARCH SUPPORT VECTOR MACHINE
The SVM performs classification by constructing a hyperplane that optimally separates the data into two categories. We assume that a training set is given by

$$\{(x_i, y_i)\}_{i=1}^{N}, \quad y_i \in \{-1, +1\}$$

The critical problem of SVM is to obtain the optimal separating hyperplane. This problem may be handled by solving the following optimization problem [33]:

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i$$

subject to

$$y_i \left( w^{T} \phi(x_i) + b \right) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, N$$

where w is normal to the hyperplane, b is the bias, C is a penalty parameter that defines how important it is to avoid misclassification errors, and $\xi_i$ is a nonnegative slack variable. SVM obtains a linear separating hyperplane with maximal margin by using the function ϕ to map the training instances to a high-dimensional space, which assists in solving nonlinearly separable problems. A radial basis function is adopted for building the function ϕ, which has the form:

$$K(x_i, x_j) = \exp\left( -\gamma \|x_i - x_j\|^2 \right)$$

where γ is the kernel parameter.
In previous research, SVM has been used to solve classification problems. However, SVM still has some disadvantages owing to the difficulty of determining parameters such as the penalty parameter C and kernel parameter γ. For the traditional SVM, the parameter setting is subject to human subjectivity, and improper parameters directly decrease the final classification accuracy. In this study, the grid search algorithm is used to optimize the penalty parameter C and kernel parameter γ [34]. Specifically, a set of candidates is first selected for both C and γ. Using the training sample set, each pair of C and γ is evaluated by cross validation, and the pair with the highest accuracy is taken as the optimal parameters. As such, the optimal GSSVM model can be obtained.
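The grid search procedure described above can be sketched with scikit-learn, which is an assumption of this example rather than the toolchain reported in the paper. The synthetic two-class data, the power-of-two candidate grids, the 5-fold cross validation, and the alternating train/test split are all illustrative choices.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for selected features: two well-separated classes.
X = np.vstack([rng.normal(0.0, 1.0, (60, 4)),
               rng.normal(3.0, 1.0, (60, 4))])
y = np.array([0] * 60 + [1] * 60)

# Hold out every third sample as a testing set (illustrative split).
train_mask = np.arange(len(y)) % 3 != 0
X_train, y_train = X[train_mask], y[train_mask]
X_test, y_test = X[~train_mask], y[~train_mask]

# Candidate values for C and gamma (assumed grid, powers of two).
param_grid = {
    "C": [2.0 ** k for k in range(-2, 7, 2)],
    "gamma": [2.0 ** k for k in range(-6, 3, 2)],
}

# Each (C, gamma) pair is scored by cross validation on the training set;
# the best pair is then refit on the whole training set.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_C = search.best_params_["C"]
best_gamma = search.best_params_["gamma"]

# The held-out testing set validates the tuned model.
test_accuracy = search.score(X_test, y_test)
```

For the multi-class problems in the case studies, `SVC` handles more than two classes internally, so the same search structure would apply with multi-class labels.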

IV. CASE STUDIES
In this section, the validity of our proposed method is assessed through two experimental examples. The first experiment used data from Case Western Reserve University for 12 different operating conditions of rolling bearings, and the second experiment was conducted for nine different operating conditions of gears. For each experiment, the VMD method was first employed to decompose the original signals. The mRMR feature selection method was subsequently applied to select several sensitive RDIs, which were finally fed into the multiclass GSSVM for pattern recognition. For each experiment, comparative studies between TDIs and RDIs and between RDI selection methods were conducted.

A. CASE 1: ROLLING BEARING FAULT DIAGNOSIS
In this case, the vibration fault signals of rolling bearings from the Electrical Engineering Laboratory of Case Western Reserve University were used to validate the effectiveness of the proposed method. Fig. 2 shows the experimental platform [35]. This case employed drive-end bearing data for a shaft speed of 1750 rpm. Three bearing fault types (ball fault, inner race fault, and outer race fault) with fault diameters of 0.007-0.028 in. were introduced using electro-discharge machining. A total of 12 different operating conditions were tested, which were as follows: For each operating condition, the first 120,000 points were divided into 50 sub-signals, where each sub-signal contained 2400 points. Thus, the whole data set consisted of 600 samples. The time-domain vibration waveforms of a single sample for the 12 operating conditions are presented in Fig. 3. The detailed arrangements of the experimental datasets for fault identification are shown in Table 1. Subsequently, VMD was applied to decompose the 600 original signals separately. The number of decomposition levels was set as six. Using the VMD method, the original signals were decomposed into several subsignals. Owing to space limitations, only the decomposition signals of the ball fault (0.007 in.) are presented in Fig. 4, where the x-axis represents the number of sampling points and the y-axis the IMF component. In this experiment, the first 35 groups of vibration data in each state were used as the training data. The remaining 15 groups of vibration data were used to verify the effectiveness of the proposed method.
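The data arrangement described above (50 sub-signals of 2400 points per condition, with the first 35 used for training and the remaining 15 for testing) can be sketched as follows. The helper names and the all-zero placeholder recording are hypothetical; real recordings would be loaded from the bearing data files.

```python
def segment(signal, n_segments, segment_length):
    """Split the first n_segments * segment_length points into equal sub-signals."""
    needed = n_segments * segment_length
    if len(signal) < needed:
        raise ValueError("signal is shorter than the requested span")
    return [signal[i * segment_length:(i + 1) * segment_length]
            for i in range(n_segments)]


def split_train_test(segments, n_train):
    """Use the first n_train sub-signals for training and the rest for testing."""
    return segments[:n_train], segments[n_train:]


N_CONDITIONS = 12

# Placeholder recording for one operating condition (hypothetical length).
recording = [0.0] * 121_000

segments = segment(recording, n_segments=50, segment_length=2400)
train, test = split_train_test(segments, n_train=35)

# 12 conditions x 50 sub-signals gives the 600 samples stated in the text.
total_samples = N_CONDITIONS * len(segments)
```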

1) RDI EXTRACTION AND SELECTION
Based on the original signals and the VMD decomposition signals, the values of the previously defined RDIs of each IMF and of the original signals were calculated. From the analysis above, we obtain a feature set containing 145 RDIs from the original signals and 870 RDIs from the VMD decomposed signals. Therefore, 1015 dimensionless indicators were extracted as a feature set to diagnose the 12 fault types of rolling bearings. However, although 1015 RDIs as a high-dimensional feature set increase the useful information for fault diagnosis, they inevitably generate redundant information. Redundant information may reduce the accuracy and efficiency of fault diagnosis. Therefore, the mRMR approach was used to select the most discriminant features, which are highly effective for the pattern classifier when identifying fault types.
Step 1: The mutual information between each RDI and the class is calculated according to formula (9). The RDI possessing the largest mutual information value is selected as the most sensitive feature and added to an initially empty set {S}, and the mutual information values of the remaining RDIs are stored in a vector D, which is then used to measure the relevance between each RDI and the class.
Step 2: The mutual information values between each remaining RDI and the RDI in set {S} are calculated according to formula (10) and then stored in vector R. The mutual information value in vector R is used to measure the redundancy of the RDIs. We used the vector D to subtract the vector R and select the RDI possessing the largest difference as the sensitive feature to join into the RDI set {S}.
Step 3: Judging the number of RDIs in the RDI set {S}, if the number of RDIs is lower than the setting value K , we should return to step 2. Otherwise, if the number of RDIs is equal to K , we output the RDI set {S} containing K RDIs as the final sensitive features. The setting value K represents the number of sensitive RDIs that we must select from the original feature set. In this case, K is set as 15.
Using the mRMR method, 15 sensitive RDIs were selected from the 1015 RDIs. The detailed mutual information values of the selected 15 RDIs are shown in Table 2.
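The three selection steps can be sketched as a greedy loop. This is a minimal illustration under stated assumptions: the features are already discretized (continuous RDI values would first need binning), the mutual information is estimated from empirical counts, the redundancy of a candidate is averaged over the RDIs already in S, and the toy feature columns are invented for the example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())


def mrmr_select(features, labels, k):
    """Greedy selection of k feature indices by relevance minus mean redundancy."""
    # Step 1: relevance of every feature; start with the most relevant one.
    relevance = [mutual_information(col, labels) for col in features]
    selected = [max(range(len(features)), key=lambda j: relevance[j])]
    # Steps 2 and 3: add features until K have been selected.
    while len(selected) < k:
        best_j, best_score = None, -math.inf
        for j in range(len(features)):
            if j in selected:
                continue
            redundancy = sum(mutual_information(features[j], features[i])
                             for i in selected) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected


# Toy data: four classes, two samples each (hypothetical).
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # relevant
    [0, 0, 0, 0, 1, 1, 1, 1],  # exact copy of the first: redundant
    [0, 0, 1, 1, 0, 0, 1, 1],  # relevant and complementary to the first
    [0, 1, 0, 1, 0, 1, 0, 1],  # carries no class information
]
chosen = mrmr_select(features, labels, k=2)
```

In this toy case the duplicate column is as relevant as the first one, yet it is passed over in favor of the complementary column, which is the behavior that distinguishes mRMR from a relevance-only ranking.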

2) COMPARATIVE STUDIES BETWEEN RDIs AND TDIs
To quantify the distinguishability of the TDIs and the proposed RDIs for different operating conditions of rolling bearings, the distance ratios (DRs) of between-class to within-class distances were calculated for the TDIs and RDIs. A dimensionless indicator possessing a larger DR value yields a larger between-class distance relative to its within-class distance, indicating a better performance in distinguishing different operating conditions. The DR values of the 145 RDIs are shown in Fig. 5. Among the 145 RDIs in Fig. 5, the six TDIs are marked in red, and the remaining new dimensionless indicators (NDIs) are marked in blue. By analyzing the DR values of the 145 RDIs, it is observed that although there are 83 dimensionless indicators with DR values greater than 8, there is only one TDI among them. Hence, the DR values of most NDIs were greater than those of the TDIs, indicating that the RDIs have better distinguishability for different fault types than the TDIs. To further illustrate the advantages of the RDIs over the TDIs, the classification accuracy of the RDIs and TDIs was investigated under two circumstances: 1) six TDIs of the original signals; and 2) six RDIs selected from the 145 RDIs of the original signals using the mRMR method. The GSSVM was used as the pattern classifier to compare the accuracy of the RDIs and TDIs. For each circumstance, the diagnosis accuracy was obtained by repeating the testing procedure 20 times to reduce the effect of randomness on the results. Fig. 6 shows the classification results under the two circumstances for the 20 trials. Additionally, the comparative indexes from the 20 trials are listed in Table 3, which includes the maximum, minimum, mean, and standard deviation of the classification accuracy. As shown in Table 3, a maximum accuracy of 94.44%, a minimum accuracy of 87.22%, and an average accuracy of 91.11% were obtained using the RDIs, which were higher than those obtained using the TDIs.
Additionally, the standard deviation of 2.37% of the classification accuracy with the RDIs was significantly lower than that with the TDIs. The comparison results above indicate that the RDIs are superior to the TDIs in terms of not only the average diagnosis accuracy, but also the stability of the diagnosis result.
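The exact formula for the distance ratio is not given in the text, so the following sketch assumes one plausible form for a single indicator: the mean absolute distance between class centers divided by the mean absolute deviation of samples from their own class center. The function name and the toy values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def distance_ratio(values_by_class):
    """Between-class over within-class distance for one indicator (assumed form)."""
    centers = [mean(v) for v in values_by_class]
    # Mean distance between every pair of class centers.
    between = mean(abs(a - b) for a, b in combinations(centers, 2))
    # Mean distance of the samples to their own class center.
    within = mean(mean(abs(x - c) for x in v)
                  for v, c in zip(values_by_class, centers))
    return between / within


# An indicator whose classes are tight and well separated ...
separable = [[1.0, 1.1, 0.9], [2.0, 2.1, 1.9], [3.0, 3.1, 2.9]]
# ... and one whose classes overlap heavily.
overlapping = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5], [2.0, 3.0, 4.0]]

dr_separable = distance_ratio(separable)
dr_overlapping = distance_ratio(overlapping)
```

Under this assumed definition, an indicator with a larger ratio separates the operating conditions more cleanly, in line with how the DR values are interpreted above. The function would fail with a division by zero if every class had zero spread, which a practical implementation should guard against.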

3) COMPARATIVE STUDIES FOR RDI SELECTION METHOD
To verify the superiority of the mRMR feature selection method, the Fisher criterion (FC) and mRMR feature selection were compared in selecting sensitive RDIs from the 1015 RDIs. Twenty trials were conducted to reduce the effect of randomness on the results. Table 4 shows the detailed diagnosis results with the first 15 RDIs selected using the two feature selection methods in the 20 trials, and the average classification accuracies using different numbers of RDIs are shown in Fig. 7. The diagnosis accuracy using mRMR was higher than that using FC feature selection. As shown in Fig. 7, for the 15 RDIs selected using the mRMR method, the classification accuracy fluctuated as the number of RDIs increased at the early stage. However, when the number of input RDIs increased to 11, the average classification accuracy reached the maximum of 97.47%. Subsequently, the average classification accuracy changed only slightly as the number of input RDIs continued to increase. The classification results of the first 11 RDIs corresponding to the maximum accuracy of 97.47% for the 12 different operating conditions of rolling bearings are shown in Fig. 8. For the faulty conditions B-014, B-028   and 89.33%, respectively. The overall recognition rate was 97.47%. However, using the FC feature selection method, the highest average classification accuracy was only 78.56%, which was significantly lower than the 97.47% obtained by the mRMR method. Additionally, the mRMR method achieved a lower standard deviation (1.14%). Hence, the accuracy of the proposed method was more stable than that of the FC method. In addition, for the proposed method, the original set of 1015 RDIs was reduced by approximately 98.92% (from 1015 to 11 RDIs) while still reaching a high identification accuracy of 97.47%.
The classification effectiveness of the first five RDIs selected by the different feature selection methods is illustrated in Figs. 9 and 10. As shown, the mRMR method could obtain relevant and nonredundant RDIs to improve the fault identification accuracy. In contrast to the mRMR method, the FC method only considers the relevance of each RDI with respect to the class labels, so similar or redundant features appear among the selected RDIs, which causes the low identification accuracy.

B. CASE 2: GEAR FAULT DIAGNOSIS
In this case, the vibration signals of the gear were collected from a benchmark two-stage gearbox [36], as shown in Fig. 11. A 32-tooth pinion and an 80-tooth gear were installed on the first stage input shaft. The second stage comprised a 48-tooth pinion and a 64-tooth gear. The input shaft speed and gear vibration signals were measured using a tachometer and an accelerometer, respectively. Nine different gear conditions were tested, as follows: healthy condition, missing tooth, root crack, spalling, and chipping tip with five different levels of severity. The vibration signals were recorded with a sampling frequency of 20 kHz. For convenience, these nine operating conditions are expressed as NC, MT, RC, SP, C1, C2, C3, C4, and C5, and labeled as 1, 2, 3, 4, 5, 6, 7, 8, and 9, respectively.
For each operating condition, 104 signals were collected using the gearbox system and each signal contained 3600 sample points. Hence, 104 groups of vibration data were measured for each state, and 936 samples of the nine different operating conditions were collected for further investigations. The time-domain vibration waveforms of a single sample for the nine operating conditions of the gear are presented in Fig. 12. The detailed arrangements of the experimental datasets for fault identification are shown in Table 5. The VMD method was subsequently applied to decompose the 936 original signals separately. The number of decomposition levels was set as six. Using the VMD method, the original signals were decomposed into several subsignals. Owing to space limitations, only the decomposition signals of the missing tooth gear are presented in Fig. 13, where the x-axis represents the number of sampling points and the y-axis the IMF component. In this experiment, the first 70 groups of vibration data in each state were used as the training data. The remaining 34 groups of vibration data were used to verify the effectiveness of the proposed method.

1) RDI EXTRACTION AND SELECTION
Based on the original signals and the VMD signals, the values of the previously defined RDIs of each IMF and original signals were calculated. From the analysis above, a feature set containing 145 RDIs from the original signals and 870 RDIs from the VMD decomposed signals can be obtained. Therefore, 1015 dimensionless indicators were extracted as a feature subset to diagnose the nine gear fault types. Although the 1015 RDIs as a high-dimension feature set increased the amount of useful information for fault diagnosis, some redundant information was generated inevitably, which may reduce the accuracy and efficiency of fault diagnosis. Therefore, the mRMR approach was used to select the most discriminant features from the 1015 RDIs. Using the mRMR method, 10 sensitive RDIs were selected from the 1015 RDIs. The detailed mutual information values of the selected 10 RDIs are shown in Table 6.
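The greedy selection idea behind mRMR can be sketched in a few lines. The sketch below is not the implementation used in this study: it assumes equal-width binning of each indicator, a histogram estimate of mutual information, and the difference form of the criterion (relevance minus mean redundancy). The feature names `rdi_a`, `rdi_b`, and `rdi_c` and the toy values are hypothetical.

```python
import math
from collections import Counter


def discretize(values, n_bins=4):
    """Map continuous values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Histogram estimate of mutual information (nats) of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k, n_bins=4):
    """Greedy mRMR over a dict {feature name: list of values, one per sample}."""
    binned = {name: discretize(vals, n_bins) for name, vals in features.items()}
    relevance = {name: mutual_information(b, labels) for name, b in binned.items()}
    selected, remaining = [], set(binned)

    def score(name):
        # Relevance to the labels minus mean redundancy with already chosen features.
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(binned[name], binned[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: rdi_b is a rescaled copy of rdi_a; rdi_c carries different information.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
toy = {
    "rdi_a": [0.1, 0.2, 0.1, 0.2, 0.9, 1.0, 0.9, 1.0],
    "rdi_b": [0.2, 0.4, 0.2, 0.4, 1.8, 2.0, 1.8, 2.0],
    "rdi_c": [0.1, 0.2, 0.9, 1.0, 0.1, 0.2, 0.9, 0.1],
}
relevance_only = sorted(
    toy, key=lambda name: (-mutual_information(discretize(toy[name]), labels), name)
)[:2]
print(relevance_only)               # ['rdi_a', 'rdi_b']
print(mrmr_select(toy, labels, 2))  # ['rdi_a', 'rdi_c']
```

In this toy case, ranking by relevance alone keeps the duplicated indicator, whereas the redundancy penalty makes the greedy search prefer the less relevant but complementary one.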

2) COMPARATIVE STUDIES BETWEEN TDIs AND RDIs
To quantify the distinguishability of the TDIs and the proposed RDIs for different gear operating conditions, the DRs of between-class and within-class distances were calculated for the TDIs and RDIs, respectively. A dimensionless indicator with a larger DR value has a larger between-class distance relative to its within-class distance, indicating a better performance in distinguishing different operating conditions. The DR values of the 145 RDIs are shown in Fig. 14. Among the 145 RDIs in Fig. 14, six TDIs are marked in red, and the NDIs are marked in blue. The DR values of most NDIs are greater than those of the TDIs, indicating that the RDIs have better distinguishability for different fault types than the TDIs. To further illustrate the advantages of the RDIs over the TDIs, the classification accuracy of the RDIs and TDIs was investigated under two circumstances: 1) six TDIs of the original signals; and 2) six RDIs selected from 145 RDIs of the original signals using the mRMR method. The GSSVM was used as the pattern classifier to compare the accuracy of the RDIs and TDIs. For each circumstance, the diagnosis accuracy was obtained by repeating the testing procedure 20 times to reduce the effect of randomness on the results. Fig. 15 shows the classification results under the two circumstances for the 20 trials. Additionally, the comparative indexes in the 20 trials are listed in Table 7, which includes the maximum, minimum, mean, and standard deviation of the classification accuracy. It can be observed that the maximum accuracy of 77.78%, minimum accuracy of 67.65%, and average accuracy of 72.01% obtained using the RDIs were higher than those obtained using the TDIs. Additionally, the standard deviation of the classification accuracy with the RDIs (3.14%) was approximately equal to that with the TDIs (3.03%).
The comparison results indicate that the RDIs are superior to the TDIs in terms of the average diagnosis accuracy.
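To make the role of the DR concrete, the following sketch computes a between-class to within-class distance ratio for a single scalar indicator. The exact DR formula is defined earlier in the paper and is not reproduced here; the definition below (mean absolute difference of class means divided by mean absolute pairwise difference inside each class) is an assumption chosen for illustration, and the sample values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def distance_ratio(values_by_class):
    """Between-class over within-class distance for one scalar indicator.

    Assumed definition (not necessarily the one used in the paper):
      between = mean |difference of class means| over all class pairs
      within  = mean over classes of the mean pairwise |difference| in a class
    Requires at least two classes and two samples per class.
    """
    class_means = [mean(v) for v in values_by_class.values()]
    between = mean(abs(a - b) for a, b in combinations(class_means, 2))
    within = mean(
        mean(abs(a - b) for a, b in combinations(v, 2))
        for v in values_by_class.values()
    )
    return between / within


# Hypothetical indicator values for two conditions.
well_separated = {"NC": [1.0, 1.2, 1.1], "MT": [3.0, 3.1, 2.9]}
overlapping = {"NC": [1.0, 2.0, 3.0], "MT": [1.5, 2.5, 3.5]}

print(distance_ratio(well_separated) > distance_ratio(overlapping))  # True
```

Under this assumed definition, an indicator whose classes form tight, well-separated clusters receives a much larger ratio than one whose classes overlap, which is the property the comparison in Fig. 14 relies on.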

3) COMPARATIVE STUDIES FOR RDI SELECTION METHOD
To verify the superiority of the mRMR feature selection method, a comparison between the FC and the mRMR feature selection was performed to select sensitive RDIs from the 1015 RDIs. Twenty trials were conducted to reduce the effect of randomness on the results. Table 8 shows the detailed diagnosis results with the first 10 RDIs selected by the two feature selection methods above in 20 trials, and the average classification accuracies using different numbers of RDIs are shown in Fig. 16. The diagnosis accuracy using the mRMR was higher than that using the FC feature selection. As shown in Fig. 16, for the 10 RDIs selected by the mRMR method, the classification accuracy fluctuated as the number of RDIs increased at the early stage. When the number of input RDIs increased to five, the average classification accuracy reached its maximum of 97.12%. Subsequently, the average classification accuracy changed only slightly as the number of input RDIs continued to increase. The classification results of the first five RDIs corresponding to the maximum accuracy of 97.12% for nine different gear operating conditions are shown in Fig. 17. For the normal condition and the faulty conditions C3 and C5, the fault classification rates reached 100%. For the faulty conditions MT, RC, SP, C1, C2, and C4, the fault classification rates were 96.03%, 96.62%, 97.65%, 99.41%, 92.35%, and 91.91%, respectively. The overall recognition rate was 97.12%. However, using the FC feature selection method, the highest average classification accuracy was only 81.27%, which was significantly lower than the 97.12% obtained by the mRMR method. Additionally, the mRMR method achieved the smaller standard deviation (1.09%).
Hence, the accuracy rate of the proposed method was more stable than that of the FC method. In addition, for the proposed method, the original set of 1015 RDIs was reduced by approximately 99.01% while reaching a high identification accuracy of 97.12%.
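As a quick arithmetic check of the figures above, the per-condition rates and the feature counts can be combined as follows. This assumes that every condition contributes the same number of test samples (34 per condition, as stated in the data description), so that the overall rate is the unweighted mean of the nine per-condition rates.

```latex
\frac{3 \times 100 + 96.03 + 96.62 + 97.65 + 99.41 + 92.35 + 91.91}{9}
  = \frac{873.97}{9} \approx 97.11\%,
\qquad
\frac{1015 - 10}{1015} \approx 99.01\%,
\qquad
\frac{1015 - 5}{1015} \approx 99.51\%.
```

The mean of the rounded per-condition rates is within 0.01 percentage points of the reported overall rate of 97.12%. The quoted reduction of 99.01% corresponds to retaining the 10 selected RDIs; retaining only the first five would correspond to a reduction of about 99.51%.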
The classification effectiveness of the first five RDIs selected by the different feature selection methods is illustrated in Figs. 18-19. As shown, the mRMR method could obtain relevant and nonredundant RDIs to improve the fault identification accuracy. In contrast to the mRMR approach, the FC method only considers the relevancy of each RDI measured with respect to the class labels. Similar or redundant features therefore appear among the selected RDIs, which lowers the identification accuracy.

V. CONCLUDING REMARKS
With the increasing number of fault types, fault severities, and fault orientations of machinery components in industrial applications, it is necessary to construct more fault features to represent the useful fault information hidden in the vibration signals. Hence, this study provides a new fault feature extraction strategy for the fault diagnosis of machinery components based on RDIs. The mRMR approach was introduced to select sensitive RDIs with higher distinguishability as the input of the GSSVM. Using the mRMR approach, the mutual information of each RDI was calculated, and a set of maximally relevant and minimally redundant RDIs was obtained. Two experiments based on 12 different operating conditions of rolling bearings and nine different operating conditions of gearboxes were performed to verify the effectiveness of the proposed method. Comparative experiments demonstrated that the RDIs performed better than the TDIs and could improve the diagnosis accuracy. Experimental results showed that our proposed method could successfully differentiate 12 and nine faulty conditions of rolling bearings and gears, respectively, with average accuracies of 97.47% with 11 RDIs and 97.12% with 5 RDIs. Although our proposed method could yield a satisfactory diagnosis accuracy, human subjectivity still existed in the parameter settings of the RDI definitions. The effectiveness of RDIs using different parameters is therefore unknown. Further studies will be performed to test the effectiveness of the proposed method with different parameter settings in the RDI definitions.

QIN HU received the master's degree in control theory and control engineering from the Guangdong University of Technology, China, in 2013. He is currently pursuing the Ph.D. degree with the Department of Automation, Rocket Force University of Engineering, China. His research fields are fault diagnosis of rotating machinery, signal processing, and pattern recognition.