Sparse Representation Classification With Structured Dictionary Design Strategy for Rotating Machinery Fault Diagnosis

Fault diagnosis is the core of a Prognostics and Health Management (PHM) system and plays a crucial role in the intelligent operation and maintenance of rotating machinery. In this paper, we present a novel sparse representation classification framework with a structured dictionary design strategy (SRC-SDD) for intelligent fault diagnosis of rotating machinery. The proposed SRC-SDD method consists of two stages, i.e., the structured dictionary design stage and the sparsity-based intelligent diagnosis stage. In the first stage, the novelty of SRC-SDD lies in the overlapping segmentation strategy for structured dictionary design, which leverages the structured prior knowledge of rotating machinery vibration signals, namely, the periodic self-similarity and shift-invariance properties. In the second stage, SRC-SDD recognizes the faults of testing samples using a sparsity-based diagnosis strategy based on the minimum sparse reconstruction error. The proposed structured dictionary design strategy enhances the representation power of the dictionaries and thus improves the recognition performance of the sparsity-based diagnosis strategy. Finally, the effectiveness of SRC-SDD is validated on the gearbox fault dataset from the IEEE PHM Society. The diagnosis results show that SRC-SDD achieves a recognition accuracy of 100% in predicting six different gearbox health states. Further, comparative studies with three conventional SRC methods show the superiority of SRC-SDD in terms of both recognition performance and computation efficiency.


I. INTRODUCTION
Prognostics and health management (PHM) has proven to be one of the core technologies for improving the reliability and safety of complex industrial systems in the era of smart manufacturing [1]-[4]. Fault diagnosis techniques play an important role in the PHM framework; they aim to detect component faults in industrial equipment early and accurately. The key components of rotating machinery, such as gears and bearings, usually operate under heavy and harsh working conditions, which makes them susceptible to local damage that can even lead to catastrophic system failures. Therefore, it is of great significance to diagnose the faults of rotating machinery early and accurately [5]. To this end, vibration-based signal analysis methods have been intensively studied and have proven to be among the most effective techniques [6]; they can be roughly categorized into feature extraction methods and pattern recognition methods.
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Wang.
Among the feature extraction methods, the classical time-domain statistics and spectral analysis techniques usually fail to detect incipient weak faults because of strong harmonic interference and heavy background noise. To accurately extract and characterize the fault-related features from the complex multi-component signals of rotating machines, many advanced vibration-based signal processing algorithms have been developed, such as spectral kurtosis (SK) [7], minimum entropy deconvolution [8], empirical mode decomposition [9], stochastic resonance [10], time-frequency representation [11]-[13], wavelet transforms [14]-[16], and sparsity-based fault diagnosis algorithms [17]-[19]. However, these feature extraction methods require prior feature knowledge such as the fault characteristic frequency (FCF), and thus are not suitable when explicit prior feature knowledge is unavailable because of the dynamics and uncertainty of complex mechanical systems [20].
In contrast to the fault feature extraction methods, pattern recognition methods can discover the underlying knowledge that implicitly represents various fault modes without prior knowledge of the fault characteristic frequency. In recent years, pattern recognition methods integrated with artificial intelligence techniques have achieved great success in rotating machinery fault diagnosis. Among the artificial intelligence techniques, deep learning (DL) has proven to be one of the most popular options for data-driven machinery fault diagnosis [21]-[25]. Wen et al. [22] combined the deep convolutional neural network (DCNN) with a signal-to-image conversion method for data-driven fault diagnosis. Shao et al. [23] improved the convolutional deep belief network (CDBN) with compressed sensing for efficient rolling bearing fault diagnosis. Sun et al. [24] developed the sparse deep stacking network (SDSN) method to overcome the overfitting risk of deep networks for motor fault diagnosis. Saufi et al. [25] proposed a stacked sparse autoencoder model for gearbox fault diagnosis in the case of limited sample data. Besides, recent works on deep transfer learning have attracted increasing attention for addressing the challenge of transferable fault diagnosis of rotating machinery across different working conditions and even different machines. Although these DL techniques can learn abstract features for data-driven machine fault diagnosis, their diagnosis performance still relies heavily on well-designed complex deep network architectures and various finely tuned hyperparameters [26].
Alternatively, the emerging sparse representation-based classification (SRC) methods, which enable pattern recognition with sparse representation theory, have also been widely applied in computer vision and pattern recognition [27], [28]. Compared with DL techniques, SRC methods do not require designing complex deep network architectures or labor-intensive parameter tuning, and can therefore be regarded as a promising tool for rotating machine fault diagnosis as well. The core idea of sparse representation theory is to represent or approximate the signal of interest $y \in \mathbb{R}^{n}$ over the so-called redundant dictionary $D \in \mathbb{R}^{n \times K}$ using a linear combination of a few atoms (columns of the dictionary $D$), i.e., $y = Dx$ or $y \approx Dx$, where $x \in \mathbb{R}^{K}$ is the sparse code of $y$ containing just a few non-zero coefficients. The SRC methods for pattern recognition can be roughly categorized into the reconstruction error-based SRC and the classifier training-based SRC methods. Within these SRC methods, the dictionary can be designed either by using the training samples directly [29] or by using dictionary learning algorithms such as K-SVD (K-means singular value decomposition) [30]. On the one hand, the core idea of the reconstruction error-based SRC is that testing samples tend to achieve the minimum reconstruction error with respect to the category-specific dictionary designed using the training samples with the same class label [29]. Inspired by this idea, Tang et al. [31] achieved the sparse classification of rolling bearing faults using the reconstruction error-based SRC with a compressive sensing strategy. Dictionary learning-based sparse classification approaches were developed for planet bearing fault diagnosis [32] and epilepsy detection [33]. Han et al. [34] combined K-SVD dictionary learning with the reconstruction error-based SRC method for wind turbine fault diagnosis. Later, Wang et al. [35] further proposed the weighted sparse representation classification method using an improved K-SVD algorithm for bearing fault diagnosis.
On the other hand, the core idea of the classifier training-based SRC method is to combine the reconstruction power of signal sparse representation with the discrimination power of classifiers, which is achieved by using the sparse codes as input features to train the classifier for pattern recognition. To this end, Zhang and Li [36] proposed the discriminative K-SVD algorithm to learn the dictionary and classifier jointly for face recognition. Jiang et al. [37] further developed a Fisher discriminative K-SVD method by imposing the Fisher discrimination criterion on sparse codes for face recognition. Aiming to obtain both reconstructive and discriminative power from dictionary learning, Zheng and Tao [38] proposed a label consistent K-SVD method to learn the discriminative dictionary and an optimal linear classifier simultaneously for pattern recognition. Ren et al. [39] developed a new multi-view SRC method based on joint supervised dictionary and classifier training for synthetic aperture radar (SAR) image classification. Recently, Kong et al. [40] developed the discriminative dictionary learning based sparse classification method for planet bearing fault diagnosis. However, when these SRC methods are applied to rotating machinery fault diagnosis, the structured prior knowledge of rotating machinery vibration signals is not used to enhance their discriminative power, which may lead to inferior fault recognition performance.
In this paper, a novel sparse representation classification framework with a structured dictionary design strategy (SRC-SDD) is proposed, which leverages the structured prior knowledge of rotating machine vibration signals for accurate rotating machinery fault diagnosis. Specifically, the periodic self-similarity and shift-invariance properties are two crucial pieces of prior knowledge about rotating machine vibration signals, which are incorporated into the proposed structured dictionary design strategy using an overlapping segmentation operation. The proposed SRC-SDD method consists of two stages, i.e., the structured dictionary design stage and the fault recognition stage. The main contributions of this paper are as follows.
1) In the structured dictionary design stage, the periodic self-similarity and shift-invariance properties of rotating machinery vibration signals are fully considered using an overlapping segmentation strategy, which enables the design of category-specific dictionaries for each health state of rotating machines and enhances the discriminative power of SRC for accurate fault recognition.
2) In the fault recognition stage, a sparsity-based intelligent diagnosis strategy is established using the discrimination criterion based on the minimum sparse reconstruction error of testing samples with respect to all structured category-specific dictionaries.
3) Compared with some state-of-the-art SRC methods, the experimental validation results on the well-known gearbox fault dataset from the IEEE PHM Society demonstrate that SRC-SDD achieves the highest recognition performance and the lowest computation cost for machine fault diagnosis.
The rest of this paper is organized as follows. Section II presents the related works on the sparse representation-based classification (SRC) methods and their limitations for rotating machinery fault diagnosis. In Section III, we introduce SRC-SDD for rotating machinery fault diagnosis, including the structured dictionary design strategy and the sparsity-based intelligent diagnosis strategy. The experimental validations of SRC-SDD for rotating machine fault diagnosis are presented in Section IV. Finally, Section V draws the conclusions.

II. RELATED WORKS
A. SPARSE REPRESENTATION AND DICTIONARY LEARNING
Different from the classical signal representation paradigms using complete orthogonal bases (such as the Fourier transform), sparse representation has brought new insights into signal representation using redundant dictionaries. The basic idea of sparse representation is to find the sparsest way to represent or approximate the input signal by a linear combination of a few dictionary atoms (columns of the dictionary), as illustrated in Fig. 1(a). Given that $y \in \mathbb{R}^{n}$ denotes the input signal and $D = [d_1, d_2, \ldots, d_K] \in \mathbb{R}^{n \times K}$ stands for the dictionary, the sparse code $x \in \mathbb{R}^{K}$ (i.e., the sparse representation coefficients) of $y$ with respect to $D$ can be calculated by solving the sparse coding problem as follows,
$$\min_{x} \|y - Dx\|_2^2 \quad \text{s.t.} \quad \|x\|_0 \le T, \tag{1}$$
where the $L_0$-norm $\|x\|_0$ defines the sparsity constraint and $T$ is the sparsity threshold that bounds the number of nonzero entries in the vector $x$. Typical sparse coding algorithms for solving (1) are the sparse regularization and greedy pursuit methods. On the one hand, the sparse regularization methods solve the following least squares problem regularized with a sparsity-inducing penalty,
$$\min_{x} \frac{1}{2}\|y - Dx\|_2^2 + \lambda \phi(x), \tag{2}$$
where $\lambda$ is the regularization parameter and $\phi(x)$ is a sparsity-inducing penalty such as the convex $L_1$-norm or nonconvex penalties including the arctangent penalty [17] and the generalized minimax-concave penalty [18]. The optimization algorithms to solve the sparse regularization problem in (2) include the proximal splitting method [41] and the alternating direction method of multipliers [42]. On the other hand, the greedy pursuit methods solve the sparse coding problem in (1) in a greedy way to obtain an approximate solution satisfying the $L_0$-norm constraint. Among the greedy pursuit algorithms, the orthogonal matching pursuit (OMP) is a strong competitor due to its computation efficiency for sparse coding [43]. The procedure of OMP for sparse coding is listed in Algorithm 1.
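To make the sparse regularization route concrete, the following is a minimal sketch of one proximal method, the iterative shrinkage-thresholding algorithm (ISTA). It assumes the convex L1 penalty φ(x) = ||x||_1 and a data-fidelity term scaled by 1/2; the function names `soft_threshold` and `ista`, the step-size rule, and the synthetic data are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(y, D, lam, n_iter=500):
    """Minimize 0.5 * ||y - D x||_2^2 + lam * ||x||_1 by proximal gradient steps."""
    # Step size 1/L, where L = ||D||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)                      # gradient of the quadratic term
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Synthetic example (assumed data): a 3-sparse code over a random unit-norm dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((30, 60))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(60)
x_true[[5, 17, 42]] = [1.5, -2.0, 1.0]
y = D @ x_true
x_hat = ista(y, D, lam=0.05)
```

With the step size fixed at 1/L, the objective is non-increasing from one iteration to the next. Nonconvex penalties such as those in [17], [18] would replace `soft_threshold` with their own thresholding rule.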
The dictionary plays an important role in effective sparse representation, yet it is hard to predefine because prior knowledge about the input signals is limited. In this case, dictionary learning is helpful: it learns a dictionary for sparse representation from training signals by solving an optimization model. Assume that $Y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{n \times N}$ denotes $N$ $n$-dimensional training samples. Dictionary learning solves the following optimization problem to learn the data-driven dictionary $D$,
$$\min_{D, X} \|Y - DX\|_F^2 \quad \text{s.t.} \quad \forall i,\ \|x_i\|_0 \le T, \tag{3}$$
where $D = [d_1, d_2, \ldots, d_K] \in \mathbb{R}^{n \times K}$ is the dictionary containing $K$ atoms ($K$ is the dictionary size) and $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{K \times N}$ contains the sparse codes of $Y$ with respect to the dictionary $D$. The basic principle of dictionary learning is shown in Fig. 1(b). The typical dictionary learning algorithm is K-SVD [30], which alternately implements the sparse coding and dictionary update procedures to minimize the cost function in (3).

Algorithm 1 OMP for Sparse Coding
Input: Input signal $y \in \mathbb{R}^{n}$, dictionary $D = [d_1, d_2, \ldots, d_K] \in \mathbb{R}^{n \times K}$, and sparsity threshold $T$.
Initialize: Residual $r_0 = y$, an empty index set, the activated atom set $D_0 = \emptyset$, and iteration number $t = 1$.
Procedure:
1: for $t = 1, 2, \ldots, T$, do
2: Find the index $\gamma_t$ that solves the optimization problem, ...
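The OMP procedure referred to as Algorithm 1 can be sketched in NumPy as follows. This is the textbook form of OMP rather than a transcription of the paper's listing: it assumes unit-norm atoms, selects the atom most correlated with the current residual, and refits the active coefficients by least squares. The function name `omp`, the early-stopping tolerance, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def omp(y, D, T):
    """Greedy sparse coding: approximate min ||y - D x||_2 s.t. ||x||_0 <= T."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []                       # indices of the activated atoms
    coeffs = np.zeros(0)
    for _ in range(T):
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = 0.0    # never reselect an active atom
        k = int(np.argmax(correlations))
        if correlations[k] <= 1e-12:       # residual is orthogonal to remaining atoms
            break
        support.append(k)
        # Orthogonal step: least-squares fit of y on the active atoms.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(D.shape[1])
    if support:
        x[support] = coeffs
    return x

# Synthetic example (assumed data): code a 3-sparse signal with T = 3.
rng = np.random.default_rng(1)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
x_true = np.zeros(50)
x_true[[3, 11, 30]] = [2.0, -1.0, 0.5]
y = D @ x_true
x_hat = omp(y, D, T=3)
```

The returned code has at most T nonzero entries by construction; whether it recovers the true support depends on the coherence of the dictionary.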

B. SPARSE REPRESENTATION-BASED CLASSIFICATION
In recent years, the sparse representation-based classification (SRC) methods have attracted lots of attention for computer vision and pattern recognition tasks [27]. Based on different discrimination criteria for classifications, SRC methods are categorized into the reconstruction error-based SRC and the classifier training-based SRC.

1) RECONSTRUCTION ERROR-BASED SRC
The basic idea of the reconstruction error-based SRC is that the testing signals tend to be sparsely approximated and achieve the minimum reconstruction error with respect to the dictionary designed using training signals from the correct category [29], [32]. Given L sets of training signals from L different categories, the reconstruction error-based SRC methods achieve pattern recognition via the following steps.
Step 1: Design L different category-specific dictionaries D l (l = 1,2,. . . , L) using the training signals directly [29] or the dictionary learning algorithm [32] for each category.
Step 2: Calculate the sparse codes $x_l^*$ ($l = 1, 2, \ldots, L$) of the testing signal $y$ with respect to the $L$ different category-specific dictionaries $D_l$ by sparse coding in (1), and the corresponding reconstruction errors $RE(y, D_l, x_l^*)$ as follows,
$$RE(y, D_l, x_l^*) = \|y - D_l x_l^*\|_2^2, \quad l = 1, 2, \ldots, L. \tag{4}$$
Step 3: Predict the class label of $y$ using the discrimination criterion based on the minimum reconstruction error,
$$\mathrm{label}(y) = \arg\min_{l} RE(y, D_l, x_l^*). \tag{5}$$

2) CLASSIFIER TRAINING-BASED SRC
The other strategy to gain discrimination power for SRC methods is to incorporate dictionary learning with classifier training. The basic idea of the classifier training-based SRC methods is to optimize the classifier model for pattern recognition by exploiting sparse codes of the training signals as inputs of the classifiers [36]- [40]. Given the training signals Y∈R n×N and the associated label matrix H=[h 1 , h 2 ,. . . , h N ]∈R L×N , the classifier training-based SRC methods achieve pattern recognition as follows.
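Step 1: Learn the discriminative dictionary and classifier model jointly by solving a unified optimization problem,
$$\langle D^*, W^*, X^* \rangle = \arg\min_{D, W, X} \|Y - DX\|_F^2 + \lambda_1 \sum_{i=1}^{N} C\big(h_i, f(x_i, W)\big) + \lambda_2 \|W\|_F^2 \quad \text{s.t.} \quad \forall i,\ \|x_i\|_0 \le T, \tag{6}$$
where $D \in \mathbb{R}^{n \times K}$ and $W \in \mathbb{R}^{L \times K}$ represent the discriminative dictionary and the classifier model parameters, respectively, and $\lambda_1$ and $\lambda_2$ are the regularization parameters. Besides, $C$ is the classification loss and $f(x_i, W)$ is the label predicted using the sparse code $x_i$ as the input feature of the classifier $W$.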
Step 2: Calculate the sparse code $x^*$ of the testing signal $y$ with respect to the optimized discriminative dictionary $D^*$,
$$x^* = \arg\min_{x} \|y - D^* x\|_2^2 \quad \text{s.t.} \quad \|x\|_0 \le T. \tag{7}$$
Step 3: Assign the class label of the testing signal $y$ as the index corresponding to the largest element in $\mathrm{label}(y)$, which is estimated using the optimized $W^*$ and the sparse code $x^*$,
$$\mathrm{label}(y) = W^* x^*. \tag{8}$$

C. LIMITATIONS OF THE EXISTING SRC METHODS
When these existing SRC methods are applied to rotating machinery fault diagnosis, some limitations remain to be addressed. First, the recognition performance of SRC methods relies on the representation capability of the dictionaries. However, the dictionary design strategy in the existing SRC methods neglects the structured prior knowledge of rotating machinery vibration signals, which may limit the recognition performance of SRC for rotating machinery fault diagnosis. Second, the local features shared by the training samples and testing samples, which play a key role in SRC methods for pattern recognition, are not fully considered in the existing SRC methods [35]. Therefore, it is promising to develop a dictionary design strategy that leverages both the structured prior knowledge of rotating machinery vibration signals and this similarity information to improve the recognition performance of SRC for rotating machinery fault diagnosis.

III. PROPOSED SRC-SDD METHOD
The proposed sparse representation classification method with structured dictionary design strategy (SRC-SDD) for rotating machinery fault diagnosis is presented in this section. It involves two stages, i.e., the structured dictionary design stage and the sparsity-based intelligent diagnosis stage.
VOLUME 9, 2021

A. STRUCTURED DICTIONARY DESIGN
To overcome the limitations of the existing SRC methods for rotating machinery fault diagnosis, we propose the structured dictionary design strategy to leverage the key structured prior knowledge of rotating machinery vibration signals, namely, the periodic self-similarity and shift-invariance properties, for enhancing the representation power of dictionary and the fault recognition performances of SRC methods.

1) PERIODIC SELF-SIMILARITY OF ROTATING MACHINERY VIBRATION SIGNALS
When rotating machinery operates under constant working conditions, its repetitive working modes produce vibration response signals with strong periodicity, which gives rise to the periodic self-similarity of rotating machine vibration signals. Taking the faulty vibration signals of rotating machinery such as gear transmission systems as an example, different rotating parts can generate multiple vibration components with strong periodic self-similarity, as illustrated in Fig. 2(a). The faulty vibration signals of rotating machinery generally consist of four constituent components, namely, the harmonic features, the amplitude modulation (AM) and frequency modulation (FM) components, the repetitive transients, and the random noise. The first three constituent components all share a strong periodic self-similarity property: harmonic features are induced by rotating shafts and healthy meshing gears, AM-FM components by distributed gear faults, and repetitive transients by localized bearing defects. Therefore, identifying the periodic self-similarity property of rotating machinery vibration signals, especially of the fault-related feature components, is crucial for rotating machinery fault diagnosis under constant working conditions [44]. As for the SRC methods, leveraging the periodic self-similarity property of rotating machine vibration signals for dictionary design under different health states can enhance the representation capability of the dictionary and improve the fault recognition performance.

2) SHIFT-INVARIANCE PROPERTY WHEN PREDICTING HEALTH STATES
During rotating machinery condition monitoring using sensors (such as accelerometers), the data acquisition system acquires a growing amount of sensor data as the sampling time increases. In real applications of rotating machinery fault diagnosis using pattern recognition methods, the long-term monitored sensor data are first divided into many short data segments, and these short data segments are then used as inputs of pattern recognition methods for health state prediction. As illustrated in Fig. 2(b), if the rotating machinery operates under a certain health state l and the degradation of the health state is negligible over a short time, then a large number of short data segments correspond to different time histories but all indicate the same health state l. In this case, the health state predictions of these short data segments by the same pattern recognition method should ideally be invariant to the time-shifts of these short data segments. In other words, if we apply pattern recognition methods to rotating machinery fault diagnosis, the predicted health state labels of these short data segments over a short time should ideally be consistent with the single real health state, i.e., they should satisfy the shift-invariance property.
More specifically, as shown in Fig. 2(b), consider rotating machinery operating under a certain health state $l$. The vibration data $y_{\text{state } l}$ monitored by sensors can be divided into a large number of short data segments, which correspond to different time histories but all indicate the same health state $l$. Assume that there are $N$ short data segments $\{y_1, y_2, \ldots, y_N\}$ indicating the same health state $l$. The health state labels predicted by the same classifier $W$ for two different short data segments $y_s$ and $y_t$ should ideally be consistent with the health state $l$ and invariant to the time-shifts of $y_s$ and $y_t$ in the whole vibration data $y_{\text{state } l}$. In other words, the predicted health state labels of the short data segments $y_s$ and $y_t$ satisfy the following shift-invariance property,
$$\forall s, t \in \{1, 2, \ldots, N\}, \quad \mathrm{label}(y_t, W) = \mathrm{label}(y_s, W), \tag{9}$$
where $\mathrm{label}(y_s, W)$ and $\mathrm{label}(y_t, W)$ are the predicted health states of the short data segments $y_s$ and $y_t$ using the classifier $W$, respectively. It should be mentioned that the short data segments $y_s$ and $y_t$ differ only in their starting points within the vibration data $y_{\text{state } l}$ under the health state $l$.

3) STRUCTURED DICTIONARY DESIGN USING OVERLAPPING SEGMENTATION STRATEGY
Proper dictionary design strategy is important to represent the structured prior knowledge of rotating machinery vibration signals, namely, the periodic self-similarity and shift-invariance property. Exploiting the periodic self-similarity and shift-invariance properties for structured dictionary design can enhance the representative power of dictionaries for effective sparse representations and promote the discriminative power of the sparsity-based fault diagnosis methods. To this end, we propose an overlapping segmentation strategy for the structured dictionary design as follows, which uses the raw rotating machine vibration data.
The overlapping segmentation strategy for the structured category-specific dictionary design is shown in Fig. 3. First, we define a segmentation operator $R_k$ that is parameterized by two segmentation parameters, i.e., the window size $W$ and the overlapping rate $\delta$. Different segmentation operations for 1-D mechanical vibration signals can be achieved using the segmentation operator $R_k$ with different parameters $W$ and $\delta$. As illustrated in Fig. 3(a), assuming that $y \in \mathbb{R}^{1 \times m}$ denotes the 1-D mechanical vibration signal, the segmentation operator $R_k: \mathbb{R}^{1 \times m} \to \mathbb{R}^{W \times 1}$ is defined to extract the transpose of the $k$-th local data segment $y_k$ of the 1-D mechanical vibration signal $y$ and use it as the $k$-th atom of the dictionary $D$ as follows,
$$d_k = R_k(y) = y_k^{\mathsf{T}}, \tag{10}$$
where $d_k$ is the $k$-th atom in the dictionary $D$. Further, the overlapping segmentation operator $R$ is defined to transform the 1-D vibration signal $y$ into a 2-D dictionary matrix. Specifically, the overlapping segmentation operator $R$ consists of $K$ cascaded segmentation operators, i.e., $R = [R_1, R_2, \ldots, R_{K-1}, R_K]$ ($K$ is the dictionary size). As a result, as illustrated in Fig. 3(b), the overlapping segmentation operator $R: \mathbb{R}^{1 \times m} \to \mathbb{R}^{W \times K}$ designs the structured 2-D dictionary $D \in \mathbb{R}^{W \times K}$ using the raw 1-D mechanical vibration signal $y$ directly as follows,
$$D = R(y) = [R_1(y), R_2(y), \ldots, R_K(y)] = [d_1, d_2, \ldots, d_K]. \tag{11}$$
Moreover, if we consider $L$ different health states of rotating machinery and assume that $y_l \in \mathbb{R}^{1 \times m}$ stands for the 1-D raw vibration data under the health state $l$, then we can apply the overlapping segmentation operator $R$ to design $L$ different structured category-specific dictionaries $D_l$ for each health state $l$ ($l = 1, 2, \ldots, L$) using the vibration data $y_l$ as follows,
$$D_l = R(y_l), \quad l = 1, 2, \ldots, L. \tag{12}$$
In summary, the structured dictionary design for $L$ different health states using the overlapping segmentation strategy is illustrated in Stage I of Fig. 4.
It is worth noting that the structured dictionary design strategy can leverage the local features shared by training samples and testing samples, since the overlapping segmentation operator maximally maintains the self-similarity of local data segments for each health state. It should be mentioned that the overlapping segmentation strategy for the structured dictionary design also applies to the raw testing vibration signals, which generates plenty of testing samples with unknown health states for health state prediction in the second stage of the SRC-SDD method.
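The overlapping segmentation operator can be sketched as a sliding-window stacking of the raw signal. The sketch below assumes that the hop between adjacent windows is the window size times (1 - δ), rounded to the nearest integer and at least one sample; the paper does not state how a non-integer hop is handled, so this rounding, the function name `overlapping_segmentation`, and the toy signal are assumptions.

```python
import numpy as np

def overlapping_segmentation(y, window, overlap_rate):
    """Stack overlapping windows of a 1-D signal as the columns of a dictionary."""
    y = np.asarray(y, dtype=float).ravel()
    if y.size < window:
        raise ValueError("signal is shorter than one window")
    # Assumed hop size between the starting points of adjacent segments.
    stride = max(1, int(round(window * (1.0 - overlap_rate))))
    n_atoms = (y.size - window) // stride + 1
    starts = np.arange(n_atoms) * stride
    # Each column is one local data segment, i.e., one dictionary atom.
    return np.stack([y[s:s + window] for s in starts], axis=1)

# Toy signal: 10 samples, window of 4 samples, 50% overlap (hop of 2 samples).
D_small = overlapping_segmentation(np.arange(10), window=4, overlap_rate=0.5)
# Columns are y[0:4], y[2:6], y[4:8], y[6:10], so D_small has shape (4, 4).
```

Applying the same function to the raw vibration record of each health state would give one category-specific dictionary per state, and applying it to a testing record would give the testing samples. If the atoms are to be used with a correlation-based pursuit such as OMP, normalizing each column to unit norm is a common additional step that the text does not specify.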

B. SPARSITY-BASED INTELLIGENT DIAGNOSIS
The second stage of SRC-SDD for rotating machine fault diagnosis is the sparsity-based intelligent diagnosis stage for fault recognition, which is illustrated in Stage II of Fig. 4. The sparsity-based intelligent diagnosis strategy achieves fault recognition using the discrimination criterion based on the minimum sparse reconstruction error, which predicts the unknown health states of testing samples according to the following three steps.

1) SPARSE CODING OF TESTING SAMPLE
For the testing sample $y_i$ with unknown health state, we first implement the sparse coding procedure to compute its sparse code $\hat{x}_i^{(l)}$ with respect to the structured category-specific dictionary $D_l$ for each health state $l$ ($l = 1, 2, \ldots, L$) using OMP in Algorithm 1,
$$\hat{x}_i^{(l)} = \arg\min_{x} \|y_i - D_l x\|_2^2 \quad \text{s.t.} \quad \|x\|_0 \le T, \tag{13}$$
where $\hat{x}_i^{(l)} \in \mathbb{R}^{K \times 1}$ is the sparse code of $y_i$ with respect to the structured category-specific dictionary $D_l$ for health state $l$.

... $\hat{x}_i^{(l)}$) via (14).
7: end for
8: Predict the health state label($y_i$) via (15).
Output: The health state label($y_i$) of testing sample $y_i$.

2) RECONSTRUCTION ERRORS CALCULATION
For the testing sample $y_i$, we then calculate the sparse reconstruction errors $RE(y_i, D_l, \hat{x}_i^{(l)})$ of $y_i$ using the sparse approximations of $y_i$ with respect to the $L$ different category-specific dictionaries $D_l$ for each health state $l$ ($l = 1, 2, \ldots, L$) as follows,
$$RE(y_i, D_l, \hat{x}_i^{(l)}) = \|y_i - D_l \hat{x}_i^{(l)}\|_2^2, \quad l = 1, 2, \ldots, L. \tag{14}$$

3) HEALTH STATE LABEL PREDICTION
Third, for the testing sample $y_i$, we predict the health state label($y_i$) of $y_i$ according to the following discrimination criterion based on the minimum sparse reconstruction error,
$$\mathrm{label}(y_i) = \arg\min_{l \in \{1, 2, \ldots, L\}} RE(y_i, D_l, \hat{x}_i^{(l)}). \tag{15}$$
The above sparsity-based intelligent diagnosis strategy for fault recognition assumes that samples with the same health state label tend to share similar sparse representations. Thus, a testing sample is highly likely to achieve its minimum reconstruction error with the sparse approximation over the structured category-specific dictionary that was designed using training signals from the correct category.
In summary, the algorithmic procedures of SRC-SDD for rotating machine fault diagnosis are detailed in Algorithm 2.
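The three diagnosis steps (sparse coding over every category-specific dictionary, reconstruction error calculation, and minimum-error label prediction) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it repeats a compact textbook OMP so that the block runs on its own, uses the squared L2 norm as the reconstruction error, and demonstrates the rule on a toy pair of dictionaries. The names `omp` and `predict_health_state` are hypothetical.

```python
import numpy as np

def omp(y, D, T):
    """Compact textbook OMP (unit-norm atoms assumed): at most T active atoms."""
    y = np.asarray(y, dtype=float)
    support, coeffs = [], np.zeros(0)
    residual = y.copy()
    for _ in range(T):
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0
        k = int(np.argmax(corr))
        if corr[k] <= 1e-12:               # nothing left to explain with this dictionary
            break
        support.append(k)
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(D.shape[1])
    if support:
        x[support] = coeffs
    return x

def predict_health_state(y, dictionaries, T):
    """Return (1-based label, errors) by the minimum sparse reconstruction error."""
    y = np.asarray(y, dtype=float)
    errors = []
    for D in dictionaries:                 # one dictionary per health state
        x = omp(y, D, T)                   # step 1: sparse coding
        errors.append(float(np.sum((y - D @ x) ** 2)))   # step 2: reconstruction error
    return int(np.argmin(errors)) + 1, errors            # step 3: minimum-error label

# Toy example: two "health states" whose dictionaries span different subspaces.
D1 = np.eye(4)[:, :2]                      # atoms e1, e2
D2 = np.eye(4)[:, 2:]                      # atoms e3, e4
label, errors = predict_health_state([1.0, 0.5, 0.0, 0.0], [D1, D2], T=2)
```

In the toy example the sample lies in the span of the first dictionary, so its reconstruction error there is zero and the rule assigns label 1. With real vibration data, the dictionaries would come from the overlapping segmentation of each training record.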

IV. EXPERIMENT VALIDATIONS
In this section, the effectiveness and superiority of the SRC-SDD method for rotating machinery fault diagnosis are evaluated on the parallel-shaft gearbox fault dataset, which was released in the 2009 data challenge by the IEEE Prognostics and Health Management (PHM) Society. Moreover, the fault diagnosis results of state-of-the-art sparse representation-based classification (SRC) methods, including the dictionary learning based SRC (DL-SRC) [32] and the label consistent KSVD (LC-KSVD1 and LC-KSVD2) [37], are compared in terms of both recognition performance and computation efficiency. Finally, the effects of the algorithm parameters on the recognition performance of SRC-SDD are analyzed in depth.

A. CASE STUDY: GEARBOX FAULT DIAGNOSIS

1) EXPERIMENT DESCRIPTION
The gearbox fault dataset released by IEEE PHM Society [45] is collected on a generic industrial gearbox, which consists of three shafts, four spur or helical gears and six bearings, as shown in Fig. 5. As for the signal acquisitions, two accelerometers installed on the input and output shaft retaining plates and a tachometer are used to acquire the vibration data and rotational speed of the input shaft, respectively. The vibration signals of spur gears acquired by the accelerometer on the input side are analyzed in this case. The specific fault modes on this gearbox dataset contain three gear fault modes, three bearing fault modes, the shaft imbalance, the normal gear, and the normal bearing, as illustrated in Fig. 6. In this case study, six different gearbox health states are considered and their fault modes are detailed in Table 1. During the vibration measurement experiments, the rotation speed of the input shaft is 30 Hz and the gearbox operates with a high load. Besides, the sampling rate is 200/3 kHz and the sampling time of each vibration acquisition is 4 seconds. Two different vibration measurements are acquired under each gearbox health state, one of which serves as the training dataset for algorithm training and the other serves as the testing dataset for fault recognition. As a result, both the training and testing vibration signals of six different gearbox health states are illustrated in Fig. 7.

2) FAULT RECOGNITION RESULTS BASED ON SRC-SDD
In this case study, the algorithm parameters of SRC-SDD for fault recognition of gearbox health states are set as follows. First, for the structured dictionary design using the overlapping segmentation strategy, the window size W and the overlapping rate δ are selected as 6250 and 0.97, respectively. As a result, the number of training data segments for each gearbox health state is 1394, and the dictionary size K of the class-specific dictionary D l for each gearbox health state is 1394 as well. Second, for the sparsity-based diagnosis strategy in the fault recognition stage, the sparsity threshold T for sparse coding is selected as 10. Third, the testing samples are generated using the overlapping segmentation strategy with the same parameters W and δ as for the structured dictionary design. Therefore, each gearbox health state has 1394 testing samples, and the total number of testing samples for health state prediction is 8364. Finally, the label prediction results of SRC-SDD on the gearbox fault dataset are shown in Fig. 8. It can be observed from Fig. 8 that no testing sample among the 8364 samples is misclassified, so SRC-SDD achieves a recognition accuracy of 100% for all six gearbox health states. These health state recognition results demonstrate the effectiveness of SRC-SDD for gearbox fault diagnosis.
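As a quick check on these settings, the arithmetic below relates the window size to the stated sampling rate (200/3 kHz) and input shaft speed (30 Hz). The hop H between adjacent segments is an assumption, taken as (1 - δ)W; the text does not state how the resulting non-integer hop is rounded, so the sketch does not attempt to re-derive the segment count of 1394.

```latex
\begin{align*}
H &= (1-\delta)\,W = 0.03 \times 6250 = 187.5 \ \text{samples (assumed hop between adjacent segments)},\\
t_W &= \frac{W}{f_s} = \frac{6250}{(200/3)\times 10^{3}\ \text{Hz}} = 0.09375\ \text{s},\\
n_{\text{rev}} &= 30\ \text{Hz} \times 0.09375\ \text{s} = 2.8125 \ \text{input-shaft revolutions per segment},\\
N_{\text{test}} &= 6 \times 1394 = 8364 .
\end{align*}
```

Each atom therefore spans nearly three revolutions of the input shaft, which is consistent with the aim of capturing periodic self-similarity within a single segment.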

3) COMPARATIVE STUDIES
To demonstrate the superiority of SRC-SDD for rotating machinery fault diagnosis, the fault recognition results of SRC-SDD are compared with those of state-of-the-art SRC methods. The comparative methods are DL-SRC, LC-KSVD1, and LC-KSVD2. DL-SRC [32] uses the KSVD-based dictionary learning algorithm for dictionary design and recognizes faults of planet bearings based on the minimum reconstruction error. LC-KSVD1 and LC-KSVD2 [37] are among the most successful SRC methods; they jointly learn a discriminative dictionary and an optimal linear classifier for pattern recognition. However, all three SRC methods neglect the structured prior knowledge of rotating machinery vibration signals, which may degrade their fault recognition performance.
In the comparative studies, both the fault recognition performance and the computation efficiency are compared. To compare the recognition performance of the different methods quantitatively, both the overall recognition accuracy and the F1-score of SRC-SDD and the comparative SRC methods are calculated. The F1-score is a useful indicator for quantitatively evaluating pattern recognition approaches, since it jointly considers the precision rate and the recall rate [46]. The F1-score is the harmonic mean of the precision rate P and the recall rate R, which is defined as follows,

$$F_1 = \frac{2 P R}{P + R},$$

where the precision rate P is the ratio of the number of true positives (TP) to the sum of TP and the number of false positives (FP), and the recall rate R is the ratio of TP to the sum of TP and the number of false negatives (FN). The F1-score reaches its worst value at 0 and its best value at 1.
The comparison results of the recognition accuracies are shown in Fig. 10(a), which indicates that SRC-SDD always obtains the best recognition accuracy for each of the six gearbox health states. Moreover, the overall average recognition accuracy of SRC-SDD is 100%, which is superior to those of DL-SRC (72.22%), LC-KSVD1 (91.34%), and LC-KSVD2 (92.13%) for gearbox fault diagnosis. Further, the comparison results of the F1-score are shown in Fig. 10(b), which show that SRC-SDD also achieves the best F1-score for all six gearbox health states and the highest overall average F1-score (100%) on the gearbox fault dataset. In contrast, DL-SRC, LC-KSVD1, and LC-KSVD2 obtain overall average F1-scores of 72.05%, 91.33%, and 92.06%, respectively. These quantitative recognition results show that SRC-SDD outperforms the three comparative SRC methods for reliable gearbox fault diagnosis.
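The per-class precision, recall, and F1-score defined above can be computed from a confusion matrix as in the following sketch. The function name and the toy two-class matrix are illustrative assumptions; the numbers are not results from the paper.

```python
import numpy as np

def per_class_f1(confusion):
    """Per-class precision, recall, and F1 from a confusion matrix.

    confusion[i, j] counts the samples of true class i predicted as class j.
    """
    confusion = np.asarray(confusion, dtype=float)
    tp = np.diag(confusion)
    fp = confusion.sum(axis=0) - tp  # predicted as the class but belonging to another
    fn = confusion.sum(axis=1) - tp  # belonging to the class but predicted as another
    precision = np.divide(tp, tp + fp, out=np.zeros_like(tp), where=(tp + fp) > 0)
    recall = np.divide(tp, tp + fn, out=np.zeros_like(tp), where=(tp + fn) > 0)
    denom = precision + recall
    f1 = np.divide(2.0 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

# Toy two-class example (illustrative numbers only).
toy_confusion = [[8, 2],
                 [1, 9]]
precision, recall, f1 = per_class_f1(toy_confusion)
average_f1 = f1.mean()  # unweighted average over classes
```

Averaging the per-class values without weights is one common way to obtain an overall F1-score; whether the paper uses this averaging is not stated.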
Finally, the computation efficiency of the different methods is compared quantitatively. The computation time of SRC-SDD and the comparative SRC methods on the gearbox fault dataset is shown in Fig. 11. All four methods are executed under the Windows 7 operating system in the MATLAB 2016b environment on a computer equipped with an Intel Xeon E5 CPU at 2.2 GHz and 64 GB of RAM. From Fig. 11, the testing times of LC-KSVD1 and LC-KSVD2 are 0.75 s and 0.51 s, respectively. Considering both training and testing, the total computation time of SRC-SDD is 23.93 s, which is the lowest among the four methods. In contrast, the total computation times of DL-SRC, LC-KSVD1, and LC-KSVD2 are 548.92 s, 1391.32 s, and 1288.32 s, respectively. The reason for this advantage is that SRC-SDD exploits the structured prior knowledge of rotating machinery vibration signals directly for dictionary design, whereas DL-SRC, LC-KSVD1, and LC-KSVD2 all rely on time-consuming dictionary learning algorithms to design their dictionaries. These computation time results demonstrate the superiority of SRC-SDD in terms of computation efficiency for gearbox fault diagnosis.

B. PARAMETER ANALYSIS
The algorithm parameters play an important role in the fault recognition performance of SRC-SDD for rotating machinery fault diagnosis. In this section, the effects of three algorithm parameters on the recognition accuracies of SRC-SDD for gearbox fault diagnosis are thoroughly investigated using the cross-validation strategy.

1) OVERLAPPING SEGMENTATION PARAMETERS W AND δ
The window size W and overlapping rate δ are two crucial parameters in the structured dictionary design stage using the overlapping segmentation strategy. The parameters W and δ determine the dimensions of the category-specific dictionary for each gearbox health state and affect the representation power of the dictionaries in the sparsity-based diagnosis strategy. A structured dictionary design with a greater overlapping rate δ can take better advantage of the structured prior knowledge of rotating machinery vibration signals and is thus expected to improve the recognition performance of SRC-SDD. To validate this expectation, the effects of the window size W and overlapping rate δ on the recognition accuracy of SRC-SDD are investigated using the grid search method, and the obtained results are shown in Fig. 12. According to the recognition accuracy results in Fig. 12, we reach the following conclusions. First, SRC-SDD with a greater overlapping rate δ achieves a higher recognition accuracy than SRC-SDD with a small overlapping rate δ for gearbox fault diagnosis. Second, with a large overlapping rate δ (such as 0.97), the recognition accuracy increases slowly as the window size W increases, whereas with a small overlapping rate δ (such as 0.2, 0.4, and 0.6), the recognition accuracy decreases with several fluctuations as W increases. Based on these observations, for general rotating machinery fault diagnosis tasks, a relatively large overlapping rate is recommended in order to exploit the structured prior knowledge of rotating machinery vibration signals as much as possible, and the window size should be properly selected using the cross-validation strategy.
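The grid search over W and δ can be organized as in the sketch below. Here `score_fn` is a hypothetical callback that stands in for the cross-validated recognition accuracy of SRC-SDD at a given (W, δ). The toy scoring function and the grid values are placeholders and do not reproduce Fig. 12.

```python
from itertools import product

def grid_search(window_sizes, overlap_rates, score_fn):
    """Evaluate score_fn(W, delta) on every grid point and return the best pair."""
    results = {}
    for window, overlap in product(window_sizes, overlap_rates):
        results[(window, overlap)] = score_fn(window, overlap)
    best = max(results, key=results.get)  # grid point with the highest score
    return best, results

# Placeholder score: a made-up function, used only to exercise the search.
def toy_score(window, overlap):
    return overlap - abs(window - 4000) / 1e4

best_pair, scores = grid_search([2000, 4000, 6000], [0.2, 0.6, 0.97], toy_score)
```

In practice, `score_fn` would rebuild the class-specific dictionaries for each (W, δ) and return the accuracy on held-out segments, which makes the search cost grow with the number of grid points.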

2) SPARSITY THRESHOLD T
The sparsity threshold T is the crucial parameter of the OMP algorithm for sparse coding in the sparsity-based diagnosis strategy of SRC-SDD. The effect of the sparsity threshold T on the recognition accuracy of SRC-SDD for gearbox fault diagnosis is shown in Fig. 13, where T is varied within the range [5:1:60], i.e., from 5 to 60 in steps of 1. Based on the recognition accuracy results in Fig. 13, we can clearly observe that the recognition accuracy of SRC-SDD is insensitive to variations of the sparsity threshold T. The subplot in Fig. 13 further indicates that the variations of the recognition accuracy with different choices of T are negligible, and that the optimal sparsity threshold T for the best recognition performance of SRC-SDD is 10 for gearbox fault diagnosis.
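To make the role of T concrete, the sketch below implements a basic OMP that selects at most T atoms, followed by a minimum-reconstruction-error decision over class-specific dictionaries. It is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the atoms are assumed to have unit norm, and each test sample is coded against each class dictionary separately, which is one possible reading of the diagnosis strategy.

```python
import numpy as np

def omp(D, y, T, tol=1e-12):
    """Orthogonal matching pursuit: approximate y with at most T columns of D."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(T):
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect an atom
        k = int(np.argmax(correlations))
        if correlations[k] <= tol:       # nothing left to explain
            break
        support.append(k)
        # Least-squares refit on the selected atoms (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[np.array(support, dtype=int)] = coef
    return x

def classify(dictionaries, y, T):
    """Return the index of the class dictionary with the smallest reconstruction error."""
    y = np.asarray(y, dtype=float)
    errors = [float(np.linalg.norm(y - D @ omp(D, y, T))) for D in dictionaries]
    return int(np.argmin(errors)), errors

# Toy usage: two "health states" represented by sinusoids of different frequency.
t = np.arange(64)

def toy_dictionary(freq):
    atoms = np.stack([np.sin(2 * np.pi * freq * (t + s) / 64) for s in range(8)], axis=1)
    return atoms / np.linalg.norm(atoms, axis=0)

sample = np.sin(2 * np.pi * 7 * (t + 0.5) / 64)  # shifted copy of the second pattern
label, errors = classify([toy_dictionary(4), toy_dictionary(7)], sample, T=3)
```

In this toy case the shifted test signal is not itself an atom, yet a few shifted atoms of the matching class reconstruct it almost exactly, which mirrors the shift-invariance argument behind the structured dictionary.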

V. CONCLUSION
In this paper, a new sparse representation-based classification method with structured dictionary design strategy (SRC-SDD) is proposed for rotating machinery fault diagnosis. The SRC-SDD method consists of the dictionary design stage and the sparsity-based intelligent diagnosis stage. The structured prior knowledge of rotating machinery vibration signals (periodic self-similarity and shift-invariance) is exploited to design structured dictionaries by the overlapping segmentation strategy. The proposed structured dictionary design strategy can also leverage the similarity between the labeled training samples and the unlabeled testing samples for robust health state prediction. Besides, the sparsity-based intelligent diagnosis strategy is established according to a discrimination criterion based on the minimum sparse reconstruction error, which achieves robust fault recognition by virtue of the strong representation power of the structured sub-dictionaries designed for each health state. Finally, the effectiveness and superiority of SRC-SDD have been validated on the gearbox fault dataset from the IEEE PHM Society. The diagnosis results show that SRC-SDD obtains an overall recognition accuracy of 100% for the identification of six different gearbox health states. Besides, the comparative studies with classical SRC methods demonstrate the superior fault recognition performance and low computation cost of SRC-SDD for rotating machinery fault diagnosis.
As a next step, we plan to incorporate unsupervised or semi-supervised learning algorithms into the proposed SRC-SDD method for wider applications in rotating machinery fault diagnosis.