One Dimensional Convolutional Neural Networks Using Sparse Wavelet Decomposition for Bearing Fault Diagnosis

This paper proposes a novel algorithm for bearing fault diagnosis that combines sparse wavelet decomposition for feature extraction with a multi-scale one-dimensional convolutional neural network (1-D CNN). The proposed algorithm consists of three stages. The first stage determines the bearing fault frequency bands from the bearing physical parameters and constructs a sparse wavelet decomposition structure. The second stage decomposes a raw bearing signal into multi-resolution signals using the decomposition structure obtained in the first stage. Finally, the decomposed multi-resolution signal features are fed into the sub-networks of the multi-scale 1-D CNN (MSCNN), and the outputs of the final convolutional/pooling layers are concatenated into a single channel that serves as the input to a fully connected layer. Compared with other bearing fault diagnosis methods, the proposed algorithm achieves a higher classification accuracy of 99.85% on the Case Western Reserve University (CWRU) bearing dataset. The proposed algorithm is also validated through our designed experiments.


I. INTRODUCTION
Rolling element bearings are widely employed in industrial rotating machinery and are found in everything from household appliances to heavy-duty machines. They are critical components, and their failure is one of the most frequent causes of machine breakdown [1], [2], [3], [4], [5]. Detecting bearing defects at an early stage and applying proper maintenance can greatly extend the working life of rotating parts and reduce the economic loss due to downtime [2], [6], [7], [8], [9].
The associate editor coordinating the review of this manuscript and approving it for publication was Wentao Fan.
Generally, any rolling bearing in operation generates vibration signals due to the unevenness of the forces acting on its internal components. If damage occurs in the bearing components, the bearing produces a periodic pulse signal that differs from the one produced under normal operating conditions. Therefore, bearing defects can be diagnosed by regularly analyzing and monitoring the acquired vibration signals [10], [11]. Common vibration-based methods for bearing fault diagnosis are classified into traditional signal processing-based methods, machine learning-based methods, and deep learning-based methods. In traditional signal processing-based methods, the vibration signals are analyzed in the time, frequency, or time-frequency domain to derive vibration signatures, and the characteristics of these signatures are then used to determine the type of defect [12]. Common signal processing techniques include the fast Fourier transform (FFT), wavelet transform (WT), empirical mode decomposition (EMD), and wavelet packet transform (WPT) [14], [15], [16], [17], [18]. Time-domain analysis often determines the bearing condition by directly investigating scalar indicators of the vibration signals [19]. These scalar indicators include, but are not limited to, the signal peak value, root-mean-square (RMS), crest factor, kurtosis, spectral kurtosis, and clearance factor [20], [21], [22], [23]. However, time domain-based signal analysis has difficulty handling non-smooth vibration signals, which has drawn attention to frequency domain-based signal analysis methods. M. Cocconcelli et al. [24] used the short-time Fourier transform (STFT) to process the original vibration signal and analyzed the transformed results in the frequency domain to detect bearing defects.
Similarly, J.-H. Lee et al. [25] used the Wigner-Ville distribution in their study to perform the frequency domain analysis of vibration signals for mechanical fault detection. Although various signal processing methods can effectively extract the features, that is, vibration signatures, present in the original signal, the determination of defect status for these features is still conducted manually. This process relies heavily on judgment rules developed by experts using the characteristics exhibited from typical defects. However, when bearing defects produce atypical or obscure characteristics, these manually developed judgment rules may produce erroneous judgments due to the inability to describe these characteristics.
On the other hand, machine learning can discover unique patterns hidden in data that are difficult to detect manually. A machine learning model can make accurate judgments on completely new data based on the patterns it has learned from old data. M.-Y. Chow et al. [26] used an artificial neural network (ANN) to detect early mechanical faults occurring in the rotors of small and medium-sized motors. A. Malhi and R. X. Gao [27] used principal component analysis (PCA) to automatically extract the vibration signal features most relevant to the defect state and classified the features using a feedforward neural network (FFNN). Y. S. Wang et al. [28] proposed an intelligent engine fault diagnosis method that employs the Hilbert-Huang transform (HHT) with the support vector machine (SVM) algorithm. Similarly, D. H. Pandya et al. [29] applied the HHT to process the acoustic signals of bearings, with the difference that the type of bearing defect was determined using the K-nearest neighbor (KNN) algorithm. F. Shen et al. [30] used a transfer learning-based singular value decomposition (SVD) model to diagnose bearing defects. Although machine learning-based bearing defect diagnosis algorithms achieve good results in most cases, their shallow models have difficulty extracting more abstract features, which limits further improvement of the diagnosis accuracy. In contrast, deep learning-based bearing fault diagnosis methods achieve improved performance and have gradually become an active research area in recent years. A deep learning model usually relies on a deeper neural network structure and requires a larger amount of data for network training.
Although deep learning imposes a greater computational load and requires a larger amount of training data, these problems have been eased in recent years by increases in computer performance and in the number of available open-source datasets.
Most applications of deep learning models for bearing defect diagnosis tend to use the raw signal directly for end-to-end learning. M. Zhao et al. [31] proposed a novel deep neural network structure called deep residual shrinkage networks to improve the accuracy of fault diagnosis. However, feeding deep neural networks with signal features extracted from the raw signals may significantly improve the bearing diagnostic accuracy. Our previous research [32], which used input features generated by the STFT and a convolutional neural network (CNN) optimized by the particle swarm optimization (PSO) algorithm, achieved improved classification results on public datasets. A combination of STFT and CNN with a tunable input size for bearing fault diagnosis is explored in [33]. Furthermore, an evaluation of deep neural networks such as the CNN and long short-term memory (LSTM) with input features processed by various signal transforms, including the STFT, cepstrum, WPT, and EMD, has been reported in [34]. Although all of the above methods have obtained good results on multiple public datasets, these deep learning models operate at only a single scale of the input signals. There is still room for further improvement in terms of classification accuracy and computational load. To this end, multi-scale neural network-based algorithms have recently been proposed. W. Chen and K. Shi [35] used an end-to-end multi-scale convolutional neural network (MSCNN) to classify the types of bearing defects. In their approach, the original signal, the smoothed signal, and the down-sampled signal constitute three different scales of inputs. G. Jiang et al. [36] also used the MSCNN to detect the health status of wind turbine gearboxes. Although the MSCNN in [36] simply used down-sampled signals at different scales as the inputs, it achieved excellent detection results.
Furthermore, Y. Yao et al. [37] used a one-dimensional convolutional layer to decompose the original signal into multi-scale features so that the method operates end to end, which improves its practicality. In addition, their method introduces an attention mechanism to improve the final detection. Our initial results using the 1-D CNN, auto-encoder, deep residual network (ResNet), and gated recurrent unit (GRU) with multi-scale input features obtained from down-sampled signals, EMD, and WPT are reported in [38], where the 1-D CNN with WPT signal decomposition demonstrates the best classification accuracy. However, the methods proposed in [38] did not explore how to construct multi-resolution signal features in different frequency bands, even though the signal characteristics of bearings are often located in several specific frequency zones [32], [39], [40], [41]. These frequency zones are designated as the low frequency, bearing defect frequency, bearing natural frequency, and high frequency zones, respectively, and can be determined from the shaft speed and the bearing physical parameters. Furthermore, the frequency bands in these zones are very narrow and sparse relative to the signal sampling rate of the data acquisition system. Therefore, it is of practical significance to extract signal features from these frequency zones and construct them as multi-resolution signal features to improve the diagnosis performance of the neural networks.
VOLUME 10, 2022
To extract the signals in the bearing characteristic frequency bands, we propose to enhance the sparse WPT decomposition algorithm from our previous work [42], [43], [44] by searching for an optimal sparse decomposition structure, using the fact that the paths of the WPT decomposition follow Gray codes. Once the sparse WPT decomposition is completed, the multi-resolution features are applied to the sub-networks in parallel, followed by a fully connected layer in the 1-D CNN network. The contributions of this paper are as follows:
a. A framework using multi-resolution signal features in the bearing characteristic frequency bands and a multi-scale 1-D CNN is proposed.
b. An improved sparse WPT decomposition method for extracting signals in the sparse multi-frequency bands is developed. This method first uses Gray codes to generate all the paths required for the complete signal decomposition of each frequency band. A sparse decomposition path is then found based on the lowest cost of the decomposition operations.
c. A multi-scale 1-D CNN network is designed using the multi-resolution signal features from each bearing characteristic frequency band.
d. The proposed method is validated using a public dataset as well as a dataset from our designed experiments.

II. BEARING FAULT CHARACTERISTICS
Bearing faults can be divided into roughness faults and single-point faults [32], [39], [40], [41]. A roughness fault arises when the surface of a bearing has considerably degraded over a large area. It is a distributed, noncyclic fault caused by improper lubrication, erosion, or pollution, and it does not contain identifiable characteristic frequency components. A single-point fault, on the other hand, is a localized fault caused by a small hole, a pit, a crack, or a missing piece of material. When a localized defect in the bearing repeatedly strikes other parts of the bearing during rotation, the successive strikes produce a series of impulse responses whose amplitudes are modulated [1]. This periodic impact vibration creates characteristic fault frequency components, which can be analyzed to detect the type of bearing fault. Our research focuses on the detection and diagnosis of single-point faults in ball bearings.
A. BEARING DEFECTIVE FREQUENCY BANDS
Fig. 1 shows the structure of a typical ball bearing, which consists of inner race, outer race, cage, and balls. The characteristic bearing defect frequencies can be determined from Equations (1)-(4), respectively.
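Equations (1)-(4) are not reproduced in this text. For reference, the standard kinematic expressions for these four frequencies, written with the symbols defined in the next paragraph, are sketched below. Their numbering and ordering here are assumptions, chosen to follow the order in which the frequencies are listed.

```latex
\begin{align}
f_{OD}   &= \frac{n}{2}\, f_{rm}\left(1 - \frac{BD}{PD}\cos\phi\right) \tag{1}\\
f_{ID}   &= \frac{n}{2}\, f_{rm}\left(1 + \frac{BD}{PD}\cos\phi\right) \tag{2}\\
f_{BD}   &= \frac{PD}{2\,BD}\, f_{rm}\left(1 - \left(\frac{BD}{PD}\right)^{2}\cos^{2}\phi\right) \tag{3}\\
f_{Cage} &= \frac{1}{2}\, f_{rm}\left(1 - \frac{BD}{PD}\cos\phi\right) \tag{4}
\end{align}
```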
where $f_{OD}$, $f_{ID}$, $f_{BD}$, and $f_{Cage}$ are the outer race, inner race, ball, and cage frequencies, respectively; $n$ and $f_{rm}$ are the number of balls and the shaft speed, respectively; and $BD$ and $PD$ are the ball diameter and rolling pitch diameter, respectively. Note that φ is the contact angle between the ball and the bearing as shown in Fig. 1, which is approximately 0° to 10° [13]. The bearing fault frequencies can be grouped into the following four zones: the low frequency, bearing defect frequency, bearing natural resonance frequency, and high frequency zones, as shown in Fig. 2. The bearing defect frequency zone contains low-frequency components produced when the ball/roller enters the vicinity of the defect and high-frequency components caused by the impact of the rolling element on the trailing edge of the defect. Note also that a full-band vibration signal contains not only the frequencies related to the shaft speed and bearing defects but also the natural frequencies and the high frequency noise caused by the bearing rotation. If the full-band vibration signals are used directly as training data, the detection effectiveness of the model may be diminished because the model learns features that are irrelevant to the labels. Therefore, extracting features from the characteristic frequency bands and using them as training data samples is expected to improve the final learning and detection accuracies.
As an illustration using the physical parameters of the test bearing in the CWRU bearing dataset, the shaft speed $f_{rm}$ is set to 30 Hz (1800 rpm), since the rotational speed varies only slightly, between 29.93 and 29.95 Hz, during data acquisition, and the contact angle is taken as φ ≈ 10° [13]. The corresponding fault frequency values for $f_{OD}$, $f_{ID}$, $f_{BD}$, and $f_{Cage}$ obtained from Equations (1)-(4) are shown in Table 1.
From Table 1, the band between $f_{BD}$ and $f_{ID}$, from 70.71 Hz to 162.45 Hz, is the target band for a single-point fault.
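As a worked check of the arithmetic behind this band, the sketch below evaluates the standard kinematic fault-frequency formulas. The bearing geometry (9 balls, 0.3126 in ball diameter, 1.537 in pitch diameter) is an assumption taken from the commonly quoted CWRU drive-end bearing specification and is not stated in this text; the function name and the contact-angle default are likewise illustrative. With a zero contact angle, these assumed dimensions give values close to the 70.71 Hz and 162.45 Hz quoted above.

```python
import math

def fault_frequencies(n_balls, ball_d, pitch_d, shaft_hz, contact_deg=0.0):
    """Return (f_OD, f_ID, f_BD, f_Cage) in Hz from the standard kinematic formulas."""
    # Ratio of ball diameter to pitch diameter, projected by the contact angle.
    r = (ball_d / pitch_d) * math.cos(math.radians(contact_deg))
    f_od = 0.5 * n_balls * shaft_hz * (1.0 - r)                   # outer race
    f_id = 0.5 * n_balls * shaft_hz * (1.0 + r)                   # inner race
    f_bd = 0.5 * (pitch_d / ball_d) * shaft_hz * (1.0 - r * r)    # ball spin
    f_cage = 0.5 * shaft_hz * (1.0 - r)                           # cage
    return f_od, f_id, f_bd, f_cage

# Assumed geometry (not given in the text): 9 balls, BD = 0.3126 in, PD = 1.537 in.
f_od, f_id, f_bd, f_cage = fault_frequencies(9, 0.3126, 1.537, shaft_hz=30.0)
print(f"f_OD={f_od:.2f}  f_ID={f_id:.2f}  f_BD={f_bd:.2f}  f_Cage={f_cage:.2f}")
```

Only the diameter ratio enters the formulas, so the unit of BD and PD does not matter as long as both use the same one.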

B. OTHER RELEVANT FREQUENCY BANDS
In addition to the fault frequency band where the inner race, outer race, and ball frequency components are located, the band near $f_{rm}$ also contains characteristic information, including the cage frequency. The fault characteristic information in this band comes from the effect of bearing defects on the rolling motion. In our research, the frequency band from $0.5 f_{rm}$ to $f_{rm}$, that is, 15 Hz to 30 Hz, is considered as the low frequency zone. An additional frequency band from 162.46 Hz to 500 Hz is considered to contain information about the defect characteristics in the high-frequency band. The maximum frequency is limited to 500 Hz to keep invalid information from high frequency noise out of the training data. After the three required frequency bands are obtained, each band is expanded by $0.25 f_{rm}$ at its band edges to ensure sufficient frequency margins to cover the required frequency range. The three frequency bands and their respective frequency edges for extracting training data samples are listed in Table 2.
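The band construction can be sketched as a small helper. Table 2 is not reproduced in this text, so the edge rules below are inferred from values quoted elsewhere in the paper (the 7.5 Hz to 37.5 Hz band in Section III-A and the 169.95 Hz to 500 Hz band in Section III-B). In particular, the code assumes that band 3 starts at the expanded upper edge of band 2 and stops at 500 Hz without a margin; the function name `training_bands` is hypothetical.

```python
def training_bands(f_rm, f_bd, f_id, f_max=500.0, margin_factor=0.25):
    """Return the three (low, high) band edges in Hz for feature extraction.

    Assumption: band 3 begins where the expanded band 2 ends and is capped
    at f_max, which matches the band-3 edges quoted in Section III-B.
    """
    margin = margin_factor * f_rm
    band1 = (0.5 * f_rm - margin, f_rm + margin)   # low frequency zone
    band2 = (f_bd - margin, f_id + margin)         # bearing defect frequency zone
    band3 = (band2[1], f_max)                      # high frequency zone
    return band1, band2, band3

# Values quoted for the CWRU illustration: f_rm = 30 Hz, f_BD = 70.71 Hz, f_ID = 162.45 Hz.
band1, band2, band3 = training_bands(30.0, 70.71, 162.45)
for name, (lo, hi) in zip(("band 1", "band 2", "band 3"), (band1, band2, band3)):
    print(f"{name}: {lo:.2f} Hz to {hi:.2f} Hz")
```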

III. SPARSE WAVELET DECOMPOSITION
A. GRAY CODES IN WAVELET DECOMPOSITION
To extract the information in the characteristic and relevant frequency bands from the full-band original signal, a complete wavelet packet transform (WPT) decomposition is first considered. It decomposes the original signal into a low-frequency and a high-frequency sub-band and continues the decomposition of each sub-band at the next level. Fig. 3 displays the complete decomposition structure obtained by decomposing a signal with a frequency band of 0∼200 Hz up to three levels. As shown in Fig. 3, the symbol "0" represents the low-frequency decomposition and "L" the resulting low-frequency sub-band. The symbol "1" denotes the high-frequency decomposition and "H" the resulting high-frequency sub-band. The arrows represent the direction of the frequency distribution at the current decomposition level, pointing from low to high frequencies. A green arrow indicates no change in the direction of the frequency distribution, while a red arrow indicates a change of direction. According to Fig. 3, the complete decomposition paths follow Gray codes [41], [42], [43], as can be seen in Table 3 for a three-level decomposition. Note that the results in Table 3 can be extended to an n-level decomposition, in which the $2^n$ sub-bands, arranged from low to high frequencies, correspond to the n-bit Gray codes in their generated order.
Using this approach, the decomposition paths for a target frequency band can be easily obtained. As an illustration, if a target band ranges from 75 Hz to 125 Hz, the required decomposition paths found from Table 3 or Fig. 3 are 010 and 110, as shown in Fig. 4. For bearing fault detection, we can use this approach to extract the multi-resolution signal features for a given bearing characteristic frequency band. As an illustrative example, we can extract features from the band between 7.5 Hz and 37.5 Hz, given a full frequency band from 0 to 6000 Hz at a sampling rate of 12,000 Hz. We describe our optimized method in the next section.
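The path lookup can be sketched in a few lines: the i-th sub-band in frequency order is reached by the path given by the i-th Gray code. The helper names below are illustrative and do not come from the paper.

```python
import math

def gray_code(index, n_bits):
    """Decomposition path (0 = low-pass, 1 = high-pass) of the index-th
    sub-band when sub-bands are ordered from low to high frequency."""
    return format(index ^ (index >> 1), f"0{n_bits}b")

def band_paths(f_lo, f_hi, bandwidth, levels):
    """Paths of all sub-bands at the given level that overlap [f_lo, f_hi]."""
    width = bandwidth / 2 ** levels          # sub-band width at this level
    first = math.floor(f_lo / width)         # first sub-band touching the band
    last = math.ceil(f_hi / width)           # one past the last sub-band
    return [gray_code(i, levels) for i in range(first, last)]

all_paths = [gray_code(i, 3) for i in range(8)]
paths = band_paths(75.0, 125.0, bandwidth=200.0, levels=3)
print(all_paths)
print(paths)  # ['010', '110']
```

For the 0∼200 Hz example each level-3 sub-band is 25 Hz wide, so the 75 Hz to 125 Hz target covers sub-bands 3 and 4, whose Gray codes are the two paths quoted above.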

B. OPTIMAL SPARSE WPT DECOMPOSITION
To obtain the multi-resolution signal features using a sparse WPT decomposition for the characteristic frequency bands listed in Table 2, our proposed method is divided into three steps: determining the optimal decomposition level, finding a sparse decomposition path structure, and executing the decomposition.
First, given the sampling frequency $f_s$ and a target frequency band ideally ranging from $f_L$ to $f_H$, the minimum number of decomposition levels $N_{\min}$ is determined. Let $N_{\max}$ be the maximum number of decomposition levels, which is set by $L_{\max}$, the maximum number of allowable steps. We search for the optimal level $N_{best}$ that minimizes an error defined in terms of $f_{Lt}$ and $f_{Ht}$, the actual lower and upper frequency edges, respectively. These edges are determined by $\Delta f$, the band resolution at decomposition level $N$. The desired minimum error is defined through an empirical constant λ, which sets the minimum error as a percentage of the target frequency band. The algorithm for searching the optimal decomposition level $N_{best}$ can be expressed in pseudo-code.
With $N_{best}$, the second step is to find a sparse decomposition structure. This is illustrated by the example in Fig. 5 for the case of extracting band 3. With the parameters $f_s$ = 6,000 Hz, $f_L$ = 169.95 Hz, $f_H$ = 500.00 Hz, $L_{\max}$ = 6, and λ = 0.15, the algorithm finds $N_{best}$ = 7, $f_{Lt}$ = 140.63 Hz, and $f_{Ht}$ = 515.63 Hz. The eight 7-bit Gray codes covering this frequency band, beginning with 0000010 and ending with 0001111, are listed in Fig. 5. Note that any two consecutive Gray codes whose least significant bits are 1 and 0 can be merged, and the corresponding branch can be pruned by one level. Repeating this process leads to the four Gray codes shown in Fig. 5, which define the sparse decomposition structure also displayed there. Without this reduction of the decomposition structure, 7 × 8 = 56 decomposition operations are needed. With the sparse structure, only 13 operations are required.
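The level search and the pruning step can be sketched together. The equations and pseudo-code of this section are not reproduced in this text, so several forms below are assumptions chosen to be consistent with the band-3 example: the band resolution is taken as the full bandwidth divided by $2^N$, $N_{\min}$ as the first level whose resolution is finer than the target band, $N_{\max}$ as $N_{\min} + L_{\max}$, the error as the sum of the two edge overshoots, and the stopping threshold as λ times the target bandwidth. Pruning is expressed on frequency-ordered sub-band indices, where indices 2k and 2k+1 at the same level share a parent; all function names are illustrative.

```python
import math

def gray_code(index, n_bits):
    """Decomposition path of the index-th sub-band in frequency order."""
    return format(index ^ (index >> 1), f"0{n_bits}b")

def find_level(bandwidth, f_lo, f_hi, l_max, lam):
    """Return (level, lo_idx, hi_idx): the first level whose edge error is
    within lam * (f_hi - f_lo), or the level with the smallest error."""
    target = f_hi - f_lo
    n_min = math.ceil(math.log2(bandwidth / target))   # assumed form of N_min
    best = None
    for n in range(n_min, n_min + l_max + 1):          # assumed N_max = N_min + L_max
        df = bandwidth / 2 ** n                        # band resolution at level n
        lo_idx, hi_idx = math.floor(f_lo / df), math.ceil(f_hi / df)
        err = (f_lo - lo_idx * df) + (hi_idx * df - f_hi)
        if best is None or err < best[1]:
            best = (n, err, lo_idx, hi_idx)
        if err <= lam * target:
            break
    return best[0], best[2], best[3]

def prune(level, lo_idx, hi_idx):
    """Merge sibling sub-bands into their parent until no pair remains."""
    nodes = [(level, i) for i in range(lo_idx, hi_idx)]
    changed = True
    while changed:
        changed, merged, k = False, [], 0
        while k < len(nodes):
            lvl, idx = nodes[k]
            if k + 1 < len(nodes) and idx % 2 == 0 and nodes[k + 1] == (lvl, idx + 1):
                merged.append((lvl - 1, idx // 2))     # replace the pair by its parent
                k, changed = k + 2, True
            else:
                merged.append(nodes[k])
                k += 1
        nodes = merged
    return nodes

def count_operations(nodes):
    """Count distinct filtering steps on the paths from the root to each node."""
    seen = set()
    for lvl, idx in nodes:
        while lvl > 0:
            seen.add((lvl, idx))
            lvl, idx = lvl - 1, idx >> 1
    return len(seen)

# Band-3 example from the text; 6000.0 is the value the text gives for f_s.
level, lo_idx, hi_idx = find_level(6000.0, 169.95, 500.0, l_max=6, lam=0.15)
df = 6000.0 / 2 ** level
nodes = prune(level, lo_idx, hi_idx)
codes = [gray_code(idx, lvl) for lvl, idx in nodes]
print(level, lo_idx * df, hi_idx * df)
print(codes, count_operations(nodes))
```

Under these assumptions the sketch selects level 7 with edges 140.625 Hz and 515.625 Hz, reduces the eight level-7 sub-bands to four branches, and counts 13 operations, in line with the figures quoted in the example.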
Note that the extracted signal features are sparse and multi-resolution in nature. The algorithm is repeated in the same way for the other two bands. The combined and simplified final sparse structure for the three bands, with a further reduced number of decomposition operations, is displayed in Fig. 6. As shown in Fig. 6, the highest level is 10 and there are 31 decomposition operations in total. The third step is to perform signal extraction by applying the final sparse decomposition structure. The original raw signal, with a duration of 10 seconds at a 12,000 Hz sampling rate, is first pre-processed to have zero mean and then normalized. Then 512 segments of equal length are retrieved at random starting locations. Each segment contains no fewer than $2^{N_{best}+3}$ data samples. This minimum length ensures that the signal remains long enough to prevent loss of signal characteristics after the $N_{best}$-level decomposition. Each segment is decomposed into the three frequency bands based on the final sparse structure shown in Fig. 6. After decomposition, the multi-resolution signal features from each band form a feature vector. The feature vectors from all three frequency bands are used as the input to the 1-D MSCNN network in the training and detection processes.
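The pre-processing and segmentation step can be sketched as follows, using only the standard library. Two points are assumptions rather than details from the paper: normalization is taken to mean division by the standard deviation, and the segment length is read as exactly $2^{N_{best}+3}$ samples. The function name and the synthetic sine input are purely illustrative.

```python
import math
import random

def random_segments(signal, n_best, n_segments=512, seed=0):
    """Zero-mean, normalize, and cut equal-length segments at random starts."""
    seg_len = 2 ** (n_best + 3)              # assumed reading of the length bound
    if len(signal) < seg_len:
        raise ValueError("signal is shorter than one segment")
    mean = sum(signal) / len(signal)
    var = sum((x - mean) ** 2 for x in signal) / len(signal)
    std = math.sqrt(var) or 1.0              # guard against a constant signal
    normalized = [(x - mean) / std for x in signal]   # assumed normalization
    rng = random.Random(seed)
    starts = [rng.randrange(len(signal) - seg_len + 1) for _ in range(n_segments)]
    return [normalized[s:s + seg_len] for s in starts]

# Synthetic stand-in for a 10 s recording at 12,000 Hz (120,000 samples).
signal = [math.sin(2 * math.pi * 100 * t / 12_000) + 0.5 for t in range(120_000)]
segments = random_segments(signal, n_best=10)
print(len(segments), len(segments[0]))
```

With the highest level of 10 quoted for Fig. 6, this reading gives segments of 8,192 samples, so the random start positions can overlap freely within the 120,000-sample record.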

IV. MULTI-SCALE 1-D CNN
To use the three extracted feature vectors, our proposed method applies three separate, parallel stacks of 1-D convolutional/pooling layers. Fig. 7 displays the proposed structure. As shown in Fig. 7, each 1-D CNN extracts and learns features in its own frequency range. The outputs of the three parallel 1-D CNNs are combined into a single channel by a concatenation layer, which is fed into a fully connected layer and then the softmax output layer. All the convolutional layers and the first fully connected layer use the ReLU activation function. The softmax function is used as the classifier at the output layer, which has 10 nodes, each representing a bearing class. Table 4 lists the detailed parameters of the designed multi-scale 1-D CNN structure. Note that max pooling is used for all the pooling layers.
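To make the concatenation step concrete, the sketch below tracks how the feature length of each branch shrinks through its convolution and pooling stages and how the branch outputs add up to the width of the concatenation layer. Table 4 is not reproduced in this text, so every number here (input lengths, kernel sizes, pool sizes, filter count) is hypothetical, and the convolutions are assumed to use no padding and unit stride.

```python
def conv1d_len(length, kernel, stride=1):
    """Output length of a 1-D convolution without padding."""
    return (length - kernel) // stride + 1

def pool1d_len(length, pool):
    """Output length of non-overlapping max pooling."""
    return length // pool

def branch_width(length, stages, filters_last):
    """Flattened output size of one branch: final length times filter count."""
    for kernel, pool in stages:
        length = pool1d_len(conv1d_len(length, kernel), pool)
    return length * filters_last

# Hypothetical settings, not taken from Table 4.
band_lengths = [256, 512, 1024]      # feature-vector length per frequency band
stages = [(5, 2), (3, 2)]            # (kernel size, pool size) per stage
widths = [branch_width(n, stages, filters_last=16) for n in band_lengths]
concat_width = sum(widths)           # input size of the fully connected layer
print(widths, concat_width)
```

Because the three bands yield feature vectors of different lengths, the branches cannot share one input tensor, which is one reason to process them in parallel and join them only after flattening.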

V. EXPERIMENTAL RESULTS
In this section, the proposed method is first tested using the Case Western Reserve University (CWRU) bearing dataset, and the obtained classification results are compared with other mainstream methods. Second, the proposed method is further validated using our own testbed.

A. EXPERIMENTS USING CWRU BEARING DATASET
The hardware platform is an x86 workstation with the following configuration: CPU Intel i7-6900K, GPU NVIDIA GTX 1080, RAM 128 GB DDR3. Data pre-processing, construction of the sparse decomposition structure, decomposition execution, and feature vector generation are conducted in the MATLAB environment. TensorFlow 2.6 is adopted as the main machine learning tool for training and evaluating the multi-scale 1-D CNN model. From the CWRU bearing dataset, we choose one normal data file and nine fault data files, consisting of inner race, outer race, and ball defects at each of the three fault depth levels of 0.021, 0.014, and 0.007 inches. Table 5 shows more detailed information and the 10 class labels.
The extracted signal features as proposed in Section III from the CWRU bearing dataset are randomly divided into three parts for training, testing, and validation in the ratio of 6:2:2. These three parts of data samples are then used to train the neural network model 10 times independently, and the obtained average results are compared with our previous 1-D CNN methods [38] as shown in Table 6.
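A random 6:2:2 split can be sketched as below. The helper name, the fixed seed, and the use of sample indices are illustrative assumptions; the sample count in the usage lines is only a stand-in, since the paper does not state how the split was implemented.

```python
import random

def split_622(samples, seed=0):
    """Shuffle and split samples into training, testing, and validation
    parts in a 6:2:2 ratio (integer arithmetic avoids rounding surprises)."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 6 // 10
    n_test = len(items) * 2 // 10
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]
    return train, test, val

# Stand-in sample indices, e.g. 10 classes with 512 segments each.
train, test, val = split_622(range(5120))
print(len(train), len(test), len(val))
```

Repeating the split with a different seed for each of the independent training runs is one way to obtain the averaged results described above.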
As shown in Table 6, the proposed method obtains an accuracy of 0.9988 and averaged macro F1-score and recall of 0.9988 and 0.9989, respectively, which are higher than the values achieved by our previous methods. Table 7 compares our proposed algorithm with other published methods. Note that only the accuracy values are compared. Our proposed method achieves the best performance in terms of classification accuracy.
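For readers who want to reproduce the reported metric types, the sketch below computes accuracy and the macro-averaged precision, recall, and F1-score from a confusion matrix. It assumes rows are true classes and columns are predicted classes; the 3-class matrix is made-up illustration data, not a result from the paper.

```python
def macro_metrics(cm):
    """Return (accuracy, macro precision, macro recall, macro F1) for a
    square confusion matrix with true classes as rows."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))   # column sum
        actual = sum(cm[i])                           # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Made-up 3-class example for illustration only.
cm = [[50, 0, 0],
      [1, 48, 1],
      [0, 2, 48]]
accuracy, precision, recall, f1 = macro_metrics(cm)
print(f"acc={accuracy:.4f} P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Macro averaging weights every class equally, which is why the macro recall can differ slightly from the overall accuracy when class sizes or error patterns are uneven.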

B. DESIGNED EXPERIMENTS
Similar to [45], [46], Fig. 8 shows the data acquisition platform (testbed) established in our Engineering Design Studio at Purdue University Northwest to validate our proposed algorithm. It consists of a rotating power stage modified from a press drilling machine (Ryobi), a joint assembly, a bearing unit, a vibration sensor (CTC AC 102-1A accelerometer with 100 mV/G, ±10%), a signal conditioning circuit (designed PCB circuit), a data acquisition board (NI USB-6356), and a desktop workstation. The joint assembly connects the drill and the bearing unit, and the vibration sensor is installed at the six o'clock position of the bearing mounting. The vibration signals are captured via the designed signal conditioning circuit and sent to the data acquisition board using LabVIEW.
The bearing defects are artificially created on the inside of the outer race, the outside of the inner race, and the ball surface, respectively. The locations and shapes of the defects are shown in Fig. 9. In our experiments, a normal bearing and three faulty bearings are each driven at a rotational speed of 630 rpm. The vibration signals are collected at a sampling rate of 12,000 Hz with a duration of 10 seconds (120,000 samples) for each acquisition. The data collected from the four bearings are aggregated to form a self-collected bearing dataset. Details of the bearing physical parameters and the self-collected bearing dataset are shown in Table 8. Typical plots of the acquired vibration signals for the normal bearing, inner race defect, outer race defect, and ball defect are displayed in Fig. 10. Based on Section II and using a bearing contact angle of φ = 0°, the frequency ranges of the three sparse decomposition bands (band 1, band 2, band 3) are determined as 2.62∼13.13 Hz, 25.02∼64.76 Hz, and 64.76∼500 Hz, respectively. Ten independent training sessions are conducted on our proposed multi-scale 1-D CNN. The training and validation loss curves and a confusion matrix are shown in Fig. 11a and Fig. 11b, respectively. Our proposed method performs very well on our self-collected bearing dataset, achieving an average accuracy of 99.95% over the 10 independent training runs. In addition, the average values of the macro F1-score, precision, and recall are approximately the same, that is, 99.95%. Because the proposed method significantly reduces the data dimension, it is suitable for real-time processing applications.
The proposed algorithm offers superior performance because the sparse wavelet decomposition stage targets the multi-resolution signal features on the bearing characteristic frequency bands. Note that the 1-D MSCNN model is obtained for a given shaft speed, a limitation shared with other standard CNN methods used for bearing fault diagnosis. Our future work will investigate the performance of the proposed method using multiple metrics on different public datasets and in different machine fault detection scenarios.

VI. CONCLUSION
We have proposed a novel algorithm for bearing fault diagnosis that combines sparse wavelet decomposition for feature extraction with multi-scale neural networks. The algorithm first determines a sparse wavelet decomposition structure based on the bearing characteristic frequency zones. The second stage decomposes a raw bearing signal into multi-resolution signal features. Finally, the algorithm applies the decomposed signal features to sub-networks based on the one-dimensional convolutional neural network (1-D CNN). Compared with other bearing fault diagnosis methods, our proposed method achieves a higher classification accuracy of 99.85% on the Case Western Reserve University (CWRU) bearing dataset. The effectiveness of our proposed method is also validated via our designed experiments, with 99.95% classification accuracy on our self-collected bearing dataset. Our proposed algorithm has potential for applications in real-time training and classification.