Convolutional Neural Network in Intelligent Fault Diagnosis Toward Rotatory Machinery

Rotating machinery is of vital importance in engineering fields such as aviation and navigation. Its failure can cause severe harm to personnel safety and to the stability of the equipment system. Fault diagnosis methods for such machinery, especially intelligent methods based on deep learning, still require long-term investigation. Given the limitations of traditional fault diagnosis approaches based on shallow network structures, methods based on the deep neural network (DNN) are worthy of thorough exploration. As a common DNN with a special structure, the deep convolutional neural network has attracted great interest in intelligent fault diagnosis because of its advantages in handling nonlinear problems. This review places its emphasis on the convolutional neural network (CNN). The basic structure and principle are introduced. The applications of CNN-based fault diagnosis methods in rotating machinery are summarized and analyzed. Furthermore, the diagnostic performance and potential mechanisms of different CNN methods are discussed. Finally, the review highlights the challenges and the potential key points in research on novel intelligent fault diagnosis strategies. The analysis and discussion provide a reference and a foundation for investigation in related fields.


I. INTRODUCTION
As an interdisciplinary and comprehensive technology, fault diagnosis is a critical research direction in the control field and can be considered a necessary precondition for guaranteeing the safety and reliability of mechanical equipment [1], [2]. Rotating machinery plays an essential role in many domains, including aerospace, nuclear power, shipbuilding, and railway. Owing to system complexity and the influence of nonlinear factors, equipment faults occur frequently in operation and can result in damage to the equipment itself, disruption of production, economic loss, and even catastrophic safety accidents [3], [4]. Moreover, the uncertainty and complexity of failures make fault diagnosis during condition monitoring very difficult.
Therefore, it is particularly vital to pursue fault diagnosis strategies for rotary machinery.
The associate editor coordinating the review of this manuscript and approving it for publication was Dazhong Ma.
In academic research and practical applications, numerous studies have been performed on widely used traditional fault diagnosis methods [5]-[7]. Yu et al. integrated the polynomial chirplet transform with a synchroextracting strategy to investigate nonstationary signals of rotating machinery [8]. Aiming at the challenge of underdetermined blind source separation, Hao et al. analyzed bearing faults by combining optimized intrinsic characteristic scale decomposition with local non-negative matrix factorization [9]. In view of the complexity of hydraulic systems and their fault characteristics, Zheng et al. employed a new feature extraction approach, named improved empirical wavelet transform, to investigate hydraulic pump faults [10]-[12]. On the basis of local mean decomposition, a new approach integrating amplitude and frequency demodulation analysis was developed for fault diagnosis of planetary gearboxes [13].
By exploiting a windowing and mapping technique, Liang et al. investigated gear tooth fault detection [14]. However, traditional diagnostic methods depend heavily on human experience and expertise, which limits their use in practical applications.
Intelligent development is an irresistible trend in engineering, and intelligent fault diagnosis has emerged accordingly. Recently, several studies have achieved good results in intelligent fault diagnosis based on traditional machine learning [15]-[18]. On the basis of a non-Naive Bayesian classifier, an intelligent method combined with Empirical Mode Decomposition (EMD) was constructed for compound fault diagnosis [19]. By integrating symbolic aggregate approximation with the nearest neighbor classifier, Georgoulas et al. investigated fault features of rolling element bearings under changing fault positions and various severities, and demonstrated that this approach offers favorable reliability and robustness [20]. In addition to a moving-peak-hold strategy, Song et al. integrated a statistic filter and the wavelet packet transform to extract features of roller bearing faults [21]. Tang et al. employed a Shannon wavelet support vector machine (SVM) to investigate gearbox faults of wind turbines [22]. Han et al. discussed and compared intelligent fault diagnosis approaches for rotating machinery, including random forest, ANN, and SVM [23]. Sparse representation was employed jointly with SVM to conduct fault diagnosis [24]. Using a quantum genetic algorithm for optimization, Zhu et al. applied SVM to diagnose faults of rotary machinery [25]. To handle multiple-failure problems, Deng et al. investigated an improved SVM [26]. Although these intelligent fault diagnosis methods cope with some complex nonlinear problems better than traditional methods, their diagnostic performance can still be degraded by external interference and by the loss of useful information.
In contrast to shallow neural networks with the above shortcomings, the deep neural network (DNN) can automatically learn effective discriminative features and more abstract representations, which makes it well suited to complex classification problems. Hence, fault diagnosis approaches based on DNN have attracted the interest of researchers in the machinery field [27]-[29]. Ma [36]. Owing to its special characteristics of local connection, weight sharing, and spatial subsampling, the convolutional neural network (CNN) shows advantages over other DNN-based methods and has aroused great interest in intelligent fault diagnosis [37]-[39].
A hierarchical structure called adaptive deep CNN was constructed by Guo et al. to diagnose bearing faults [40]. Using multiple sensors, Xia et al. studied the application of CNN-based methods to fault diagnosis of rotating machinery [41]. A new CNN was established for bearing fault diagnosis, with particle swarm optimization used to tune its parameters [42]. Dempster-Shafer theory was combined with an ensemble CNN to perform bearing fault diagnosis, which showed desirable diagnostic properties and adaptive capacity under various conditions [43]. Chen et al. introduced Naïve Bayes data into CNN and investigated its application to crack detection in nuclear power plants [44]. Combined with the Fourier transform, CNN frameworks were constructed for condition monitoring and fault diagnosis of rotary machinery [45]-[47]. Integrating multiscale feature learning with CNN, Ding et al. took the wavelet packet energy as the input data for bearing fault diagnosis [48]. Moreover, the multi-wavelet transform and time-frequency transform were applied in advance, and the resulting images were fed into a CNN to diagnose gear faults [49]. Lei et al. proposed a new learning method called deep normalized CNN to study bearing failure, effectively addressing the problem of imbalanced distribution [50]. Their team then applied labeled data acquired from one piece of equipment to the classification of other equipment [51].
The emphasis of this review is on CNN-based intelligent fault diagnosis. First, a brief introduction is given to the architecture and principle of CNN. The analysis and discussion then focus on representative rotating machinery, including bearings, gears/gearboxes, and pumps. The applications of CNN to fault diagnosis of this machinery are summarized and discussed in detail. Finally, potential challenges and research directions are outlined as a reference for further research.

II. CNN
A. DL
Artificial intelligence (AI) is a fascinating, interdisciplinary science that aims to simulate human behavior by means of computers. As demonstrated by SIRI and AlphaGo, AI has undergone striking development and found extensive applications in various fields [52], [53]. As the volume of data and the complexity of machinery failures increase, some AI-based techniques will outperform traditional methods in the accuracy of fault diagnosis [54], [55].
VOLUME 8, 2020
The single-layer neural network called the Perceptron can only perform linear classification, whereas a two-layer neural network can achieve good nonlinear classification. Deep learning (DL), first proposed by Hinton et al., is one kind of machine learning and reflects the profound development of AI [56]. DL interprets data by mimicking the mechanisms of the human brain. It emphasizes the depth of the model structure and can describe the rich internal information of the data in greater detail. Therefore, DL can produce high-level representations of extracted features and can also be viewed as feature learning or representation learning [57]. Furthermore, by combining pre-training with fine-tuning, DL based on multilayer neural networks attains more thorough feature representations as well as stronger function approximation capability [58]. DL has become prevalent because of enhanced computational power, reduced cost, and improvements in machine learning research [59]. DNN has been applied successfully in a variety of areas, including image processing, speech recognition, and audio processing [60]. Owing to the strong learning capability of DNN, it is valuable to explore DNN-based feature mining and intelligent diagnosis methods in research on mechanical failure [61]-[64].

B. ARCHITECTURE AND PRINCIPLE OF CNN
As a typical feed-forward neural network inspired by the cerebral cortical neurons of animals, CNN commonly adopts supervised learning and has a hierarchical structure [65], [66]. The overfitting issue that is frequent in DL-based approaches can be effectively addressed via CNN [67]. CNN is composed of different layers, including the data input layer, convolution layer, ReLU activation layer, pooling layer, and fully connected layer. These layers are employed to achieve feature extraction and classification, respectively. The fundamental structure is displayed in Figure 1. The convolution layer and pooling layer, whose operations are considered to be carried out on a plane, distinguish CNN from other neural networks, and this is where the superiority of CNN lies. The important elements of CNN are summarized in Table 1. Local connection means that each neuron in the convolutional layer is connected only to the neurons within a local window of the preceding layer, and every kernel can be viewed as a small local window that slides with a certain stride. Weight sharing means that the same convolutional kernel is applied at every position of the input, so one kernel captures only one specific kind of local feature. As the function of the pooling layer, down-sampling (subsampling) performs feature selection and reduces the number of features.
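As a minimal illustration of the down-sampling performed by a pooling layer, the following sketch applies non-overlapping max pooling to a 1D feature map. The function name `max_pool_1d`, the window size, and the sample values are assumptions chosen for illustration and are not taken from any of the cited works.

```python
def max_pool_1d(feature_map, window=2, stride=2):
    """Keep the maximum of each window, reducing the number of features."""
    pooled = []
    for start in range(0, len(feature_map) - window + 1, stride):
        pooled.append(max(feature_map[start:start + window]))
    return pooled


# Hypothetical feature map produced by a convolution layer.
features = [0.1, 0.8, 0.3, 0.2, 0.9, 0.4, 0.0, 0.5]
pooled = max_pool_1d(features)
print(pooled)  # eight features are reduced to four
```

With a window and stride of 2, the number of features is halved while the strongest response in each window is retained.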
The calculation process of the convolution layer can be expressed as:
$$x_j^{l} = f\left(x^{l-1} * k_j + b_j^{l}\right)$$
where $*$ denotes the convolution operation and $x^{l-1}$ represents the input data. The input data are convolved with the kernel $k_j$ and the bias $b_j^{l}$ is added. Finally, the function $f(\cdot)$ is applied to obtain the output result $x_j^{l}$.
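The convolution layer computation described above (convolve the input with a kernel, add a bias, apply a function to obtain the output) can be sketched in plain Python. This is a minimal single-channel sketch under stated assumptions: ReLU is used as the function, no padding is applied, and the kernel is not flipped (the cross-correlation convention common in deep learning libraries). All names and values are hypothetical.

```python
def relu(value):
    """Rectified linear unit, used here as the activation function."""
    return max(0.0, value)


def conv1d_layer(x, kernel, bias, stride=1):
    """Slide the kernel over x, add the bias, and apply the activation."""
    width = len(kernel)
    outputs = []
    for start in range(0, len(x) - width + 1, stride):
        weighted_sum = sum(x[start + i] * kernel[i] for i in range(width))
        outputs.append(relu(weighted_sum + bias))
    return outputs


# Hypothetical input segment and kernel (a simple difference detector).
signal = [1.0, 2.0, 4.0, 3.0, 0.0]
kernel = [-1.0, 0.0, 1.0]
output = conv1d_layer(signal, kernel, bias=0.5)
print(output)  # [3.5, 1.5, 0.0]
```

Because the same three kernel weights are reused at every position, this sketch also illustrates weight sharing: the layer has three weights and one bias regardless of the input length.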

C. INTELLIGENT FAULT DIAGNOSIS METHODS BASED ON CNN
As a new interdisciplinary subject, intelligent fault diagnosis combines the advantages of mechanical engineering and computer science and plays a significant role in processing massive mechanical failure data [28], [69], [70]. Owing to its special structure and powerful functions, CNN has been commonly used in image processing, defect recognition, text processing, and natural language processing [71]-[73]. Moreover, CNN has attracted considerable attention in intelligent fault diagnosis, especially in the field of rotating machinery [74], [75].
In place of the tedious multi-step procedures required by traditional methods and by traditional intelligent fault diagnosis approaches, new intelligent fault diagnosis technologies can learn the extracted features automatically [76].

III. CNN-BASED FAULT DIAGNOSIS FOR ROTATING MACHINERY
With the progress of strategies for machinery fault diagnosis, novel intelligent fault diagnosis methods based on CNN show clear superiority over other similar methods. Several investigations have demonstrated that CNN-based diagnosis approaches achieve favorable classification results and higher diagnostic performance in machinery applications [77]-[79]. The objects of analysis here are significant and extensively used kinds of rotary machinery, mainly bearings, gears/gearboxes, and pumps. Although these kinds of rotating machinery differ, they share one characteristic: varying working loads and interference from ambient noise influence the acquired vibration data. As a result, the final detection and prediction results are affected to different degrees. This is exactly the challenge of fault diagnosis research, both academically and practically. Owing to its special structure and characteristics, CNN has a powerful capability for automatic feature learning. CNN-based intelligent diagnostic methods greatly decrease the dependence on expert experience and on complex preprocessing steps. However, different CNN designs concentrate on different problems and aim to improve the classification performance of the approaches. Depending on whether raw vibration signals or spectral images are used in intelligent machinery fault diagnosis, the commonly used structures are one-dimensional (1D) and two-dimensional (2D) CNN [71], [72], [80]. Therefore, each kind of rotating machinery is analyzed and discussed in terms of research on 1D and 2D CNN, respectively. Correspondingly, the intelligent fault diagnosis methods are discussed from the perspective of diagnostic accuracy and classification discrimination, including the potential mechanisms behind the improved diagnosis.

A. CNN-BASED FAULT DIAGNOSIS FOR BEARING
Accounting for roughly 45% or more of all equipment failures, bearing faults are widespread in the machinery field and may bring about great losses in personal safety and the national economy [81]-[83]. Consequently, research on bearing fault diagnosis methods is both necessary and significant. To make up for the shortcomings of traditional diagnosis methods, DNN-based approaches are worth exploring and investigating thoroughly.
Raw 1D vibration data can be taken directly as the input of a neural network, and 1D CNN has been developed and employed for bearing fault diagnosis.
To cope with the loss of performance caused by changing working loads and the interference of environmental noise, a novel end-to-end method based on CNN was constructed for bearing fault diagnosis [37]. The special training method is one highlight of this work. Dropout and small batch training were employed in the framework, and complex preprocessing was avoided. The Adam stochastic optimization algorithm was used in the training step to minimize the loss function without a regularization term. A classification accuracy of 99.77% was achieved when the signal-to-noise ratio was 10. To investigate the potential mechanism and present the features extracted by each layer of the CNN, the t-SNE method was used to visualize and understand the classification effect, as shown in Figure 2.
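Robustness tests of this kind are typically carried out by corrupting a clean vibration signal with noise at a prescribed signal-to-noise ratio. The following sketch shows one way to do this, assuming additive white Gaussian noise and an SNR expressed in decibels; the unit of the value 10 is not stated in the text above, so the decibel interpretation, the synthetic sine signal, and all function names are assumptions rather than the procedure of [37].

```python
import math
import random


def signal_power(signal):
    """Mean squared amplitude of the signal."""
    return sum(v * v for v in signal) / len(signal)


def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise so that the SNR is approximately snr_db."""
    noise_power = signal_power(signal) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]


rng = random.Random(0)  # fixed seed for reproducibility
# Hypothetical clean signal: a 50 Hz sine sampled at 1 kHz.
clean = [math.sin(2 * math.pi * 50 * n / 1000) for n in range(2000)]
noisy = add_noise_at_snr(clean, snr_db=10, rng=rng)

noise = [n - c for n, c in zip(noisy, clean)]
measured_snr_db = 10 * math.log10(signal_power(clean) / signal_power(noise))
print(round(measured_snr_db, 1))  # close to 10, up to sampling variation
```

The measured SNR deviates slightly from the target because the noise power is estimated from a finite number of samples.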
By introducing residual learning, Peng et al. built a new CNN framework for fault diagnosis of wheelset bearings [84]. The method proved valid and feasible for diagnosis, and the results indicate that a diagnostic accuracy of more than 95% is obtained in a noisy environment. The features learned by the deeper 1D CNN from raw data are well differentiated, as displayed in Figure 3. Furthermore, on the basis of transfer learning, Yang and his colleagues investigated a new deeper CNN applied to wheelset bearings and demonstrated its diagnostic accuracy [70].
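The idea behind residual learning is that a block outputs the sum of a learned mapping F(x) and its own input x, so that deeper networks can fall back on the identity mapping. The sketch below illustrates this for a 1D signal under simplifying assumptions: a single channel, two stacked zero-padded convolutions as F, odd kernel lengths, and ReLU activations. It is not the architecture of [84], and all names are hypothetical.

```python
def conv1d_same(x, kernel):
    """1D convolution with zero padding; output length equals input length.

    Assumes an odd kernel length.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(len(x))]


def relu(values):
    return [max(0.0, v) for v in values]


def residual_block(x, kernel_a, kernel_b):
    """Compute relu(F(x) + x), where F is two stacked convolutions."""
    mapped = conv1d_same(relu(conv1d_same(x, kernel_a)), kernel_b)
    return relu([m + s for m, s in zip(mapped, x)])


x = [0.5, -1.0, 2.0, 0.0]
zero_kernel = [0.0, 0.0, 0.0]
# With all-zero weights F(x) = 0, so the skip connection passes x through.
out = residual_block(x, zero_kernel, zero_kernel)
print(out)  # [0.5, 0.0, 2.0, 0.0]
```

Even when the convolution weights contribute nothing, the skip connection carries the input forward, which is why stacking such blocks does not make the identity mapping harder to represent.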
Given the advantages of CNN in image processing, it is feasible to exploit 2D CNN-based approaches for fault classification by transforming the raw vibration signal.
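One simple way to turn a raw vibration signal into an input for a 2D CNN is to reshape a segment of samples into a square grayscale image. The sketch below is a minimal illustration of that idea, assuming min-max scaling to the range 0-255 and row-by-row filling; the cited works may rely on other transformations (for example time-frequency representations), and the function name and sample data are hypothetical.

```python
def signal_to_image(signal, size):
    """Reshape the first size*size samples into a size x size image (0-255)."""
    segment = signal[:size * size]
    if len(segment) < size * size:
        raise ValueError("signal is too short for the requested image size")
    low, high = min(segment), max(segment)
    span = (high - low) or 1.0  # avoid division by zero for a constant segment
    pixels = [round(255 * (v - low) / span) for v in segment]
    return [pixels[row * size:(row + 1) * size] for row in range(size)]


# Hypothetical signal: a ramp of 16 samples mapped to a 4 x 4 image.
signal = [float(n) for n in range(16)]
image = signal_to_image(signal, size=4)
for row in image:
    print(row)
```

Each row of the image holds consecutive samples, so periodic impacts in the signal appear as repeating textures that a 2D kernel can pick up.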
In consideration of the limitations of noise and randomness in CNN, LiftingNet was established to diagnose motor bearing faults [85], [86]. This method reaches a classification rate of about 99.63% (trained for 40 loops) and still shows admirable feature learning and classification accuracy under various rotating speeds and in noisy environments. It should be pointed out that the disadvantages of the network were also analyzed and discussed, including deficiencies in the identification of failure severity, rapid learning, and initial setting. Similarly, another CNN-based method proposed by Hoang et al. achieved an accuracy of 100%. Even under the interference of noise, it achieved a desirable classification accuracy of 97.74% [87].
Figure caption (partial): ... [62]; WDCNN represents deep convolutional neural networks with wide first-layer kernels [89]; WDCNN(AdaBN) represents WDCNN with AdaBN, where AdaBN is a simple method named Adaptive Batch Normalization; TICNN represents [37]; and HCNN represents hierarchical convolutional neural network [90].
Taking into consideration the potential repetitive effects that arise as bearing faults develop, cyclic spectral coherence was introduced for data preprocessing. A novel CNN diagnosis method was investigated on the basis of LeNet-5, with a distinct enhancement of recognition capability [88]. Group normalization was introduced to mitigate the influence of imbalanced data distribution on the effectiveness of the network, which improved its generalization ability. In comparison with other related studies, this approach reaches an average accuracy of 98.93%, which is a remarkable advantage. Even under challenging and changing working conditions, the diagnosis accuracy rises to over 99%, which exceeds that of other relevant strategies (Figure 4).
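Group normalization divides the channels of a feature map into groups and normalizes each group with its own mean and variance, so the statistics do not depend on the batch size. The following sketch shows the computation for a single sample, assuming each channel is a plain list of values and omitting the learnable scale and shift parameters; the function name and data are hypothetical and the code is not the implementation of [88].

```python
import math


def group_norm(channels, num_groups, eps=1e-5):
    """Normalize a list of channels (each a list of floats) group by group."""
    if len(channels) % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = len(channels) // num_groups
    normalized = []
    for g in range(num_groups):
        group = channels[g * per_group:(g + 1) * per_group]
        values = [v for channel in group for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        normalized.extend([[(v - mean) * scale for v in channel]
                           for channel in group])
    return normalized


# Hypothetical feature map: 4 channels with very different value ranges.
feature_map = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
normed = group_norm(feature_map, num_groups=2)
for channel in normed:
    print([round(v, 3) for v in channel])
```

After normalization both groups have approximately zero mean and unit variance, even though their original scales differ by an order of magnitude.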

B. CNN-BASED FAULT DIAGNOSIS FOR GEAR AND GEARBOX
Similar to bearing failure, it has been reported that around 40% of machinery faults result from gearboxes or gears [91]. To maintain mechanical safety and stability, research on their fault diagnosis methods is crucial and challenging [92], [93]. Furthermore, some investigations on DL-based fault diagnosis for gears and gearboxes have achieved successful results [94], [95]. This section places its emphasis on CNN-based intelligent diagnosis as well as combined technologies.
To decrease the number of parameters and avoid overfitting in common CNN, residual connection and separable convolution were introduced to construct an improved CNN architecture [96]. Mixed operating conditions were taken into account. The residual connection was devised to overcome the difficulty of representing features in the upper layers. The framework is displayed in Figure 5. Moreover, inspired by research on hyper-parameter optimization, the random search algorithm was employed to choose suitable hyperparameters [97]. In comparison with the accuracy of 99.50% achieved by conventional CNN, this method achieved 99.75% under compound conditions, which is a slight advantage over other DL-based approaches. To tackle the discrepancy between the marginal and conditional probability distributions, multi-layer balanced domain adaptation was used in the training process after the feature extraction of a traditional CNN [98]. In view of the limited amount of labeled data, the domain adaptation step was brought into the network and arranged after the feature extraction layer and before the output layer. For the selection of the weight parameters of the loss function, the suggestions of related studies were adopted. It can be observed that the overlapping phenomenon was alleviated by the multilayer balanced domain adaptation strategy (Figure 6(g)-(i)). The proposed method is superior to other approaches with respect to the effectiveness of each layer in the network.
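The parameter saving offered by separable convolution can be checked with simple arithmetic. The sketch below assumes that "separable convolution" refers to the depthwise separable form (a per-channel depthwise convolution followed by a 1x1 pointwise convolution) and ignores bias terms; the layer sizes are hypothetical and are not those of [96].

```python
def standard_conv_params(kernel_elems, in_channels, out_channels):
    """Weights of a standard convolution (biases ignored)."""
    return kernel_elems * in_channels * out_channels


def separable_conv_params(kernel_elems, in_channels, out_channels):
    """Weights of a depthwise convolution plus a 1x1 pointwise convolution."""
    depthwise = kernel_elems * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel (9 elements), 64 input and 128 output channels.
standard = standard_conv_params(9, 64, 128)
separable = separable_conv_params(9, 64, 128)
print(standard, separable)  # 73728 8768
print(round(standard / separable, 1))  # 8.4
```

For this hypothetical layer the separable form needs roughly one eighth of the weights, which is the kind of reduction that helps limit overfitting when training data are scarce.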
Wu et al. developed a CNN structure for gearbox fault diagnosis that effectively copes with the existing challenge of end-to-end fault diagnosis. The Prognostics and Health Management 2009 gearbox challenge data and a planetary gearbox test rig were used to verify the effectiveness of the method [99]. A maximum accuracy of 99% was obtained, which outperformed the other three approaches used for comparison.
Through the combined use of 1D and fused dilated convolutional layers, an improved CNN was developed for fault diagnosis of planetary gearboxes [100]. Besides the input and output layers, the architecture included one 1D convolutional layer, four fused dilated convolutional layers alternating with max-pooling layers, and one fully connected layer. The design of varied convolutional layers helped to expand the receptive field, which made it possible to obtain more useful information from the vibration signal. As shown in Figure 7, the original vibration signal is used directly as the input data. An average test accuracy of 97.73% was obtained, which was superior to those of other traditional intelligent methods. From the perspective of the reparative fault information gained from varying detection locations, a new CNN was investigated for fault diagnosis of planetary gearboxes, with fused data from vibration signals in different directions as input [101]. For training, the Adaptive Moment Estimation (Adam) stochastic optimization algorithm was employed to optimize the learnable parameters. An accuracy above 97% was obtained with this method, which demonstrated that data fusion enhanced the diagnostic performance, as can be seen from Figure 8.
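The effect of dilation on the receptive field can be quantified with a short calculation. For a stack of stride-1 convolutions, each layer with kernel size k and dilation d adds (k - 1) * d samples to the receptive field. The sketch below applies this rule; the kernel size and dilation rates are hypothetical, pooling layers are ignored, and the numbers do not describe the network in [100].

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions with given dilations."""
    field = 1
    for dilation in dilations:
        field += (kernel_size - 1) * dilation
    return field


# Four layers with kernel size 3: ordinary versus dilated convolutions.
plain = receptive_field(3, [1, 1, 1, 1])
dilated = receptive_field(3, [1, 2, 4, 8])
print(plain, dilated)  # 9 31
```

With the same number of layers and weights, the dilated stack covers 31 input samples instead of 9, so each output can respond to longer patterns in the vibration signal.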
With data preprocessing, 2D CNN can be used to diagnose gear or gearbox faults from 2D images converted from vibration signals.
A novel CNN was constructed for fault diagnosis of planetary gearboxes, and the discrete wavelet transform was used to obtain a better representation of the time-frequency distribution [102]. The average accuracy reached around 99.3%, which demonstrated a learning ability superior to that of other traditional machine learning approaches.
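To make the discrete wavelet transform concrete, the following sketch performs one decomposition level with the Haar wavelet, splitting a signal into approximation (low-frequency) and detail (high-frequency) coefficients. The Haar wavelet is chosen here only for simplicity; the wavelet used in [102] is not stated above, the sign convention of the detail coefficients varies between implementations, and the function name and data are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return approx, detail


signal = [4.0, 2.0, 5.0, 5.0]
approx, detail = haar_dwt(signal)
print([round(v, 4) for v in approx])  # [4.2426, 7.0711]
print([round(v, 4) for v in detail])  # [1.4142, 0.0]
```

Applying the same step repeatedly to the approximation coefficients yields a multi-level decomposition, whose coefficients can be arranged into a time-frequency image for a 2D CNN.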
Aiming at complex compound faults of gearboxes, Liang and his colleagues combined the wavelet transform with multilabel classification and designed a new CNN model that achieved satisfactory diagnosis performance [80]. It should be noted that this approach can be applied to both single faults and compound faults. The compound fault is not regarded as another kind of single fault; instead, the diagnosis is treated as an image classification problem. Two datasets were selected to verify the diagnosis performance and stability of the method, namely the 2009 PHM data and data from the University of Macau. The best accuracies reached 95.513% and 98.75%, respectively.
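In multilabel classification each fault type receives its own independent probability, so several faults can be reported at once instead of forcing a single class. The sketch below illustrates the decision step, assuming one sigmoid output per fault label and a fixed threshold of 0.5; the label names, logits, and threshold are hypothetical and do not reproduce the model in [80].

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_faults(logits, labels, threshold=0.5):
    """Return every label whose independent probability reaches the threshold."""
    return [label for z, label in zip(logits, labels)
            if sigmoid(z) >= threshold]


# Hypothetical fault labels and network outputs (logits).
labels = ["tooth_crack", "tooth_wear", "bearing_outer_race"]
compound = predict_faults([2.1, 1.3, -3.0], labels)
single = predict_faults([-1.0, 3.0, -2.0], labels)
print(compound)  # ['tooth_crack', 'tooth_wear']
print(single)    # ['tooth_wear']
```

The same output layer therefore handles single and compound faults without defining a separate class for every fault combination.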

C. CNN-BASED FAULT DIAGNOSIS FOR PUMP
With the improvement of intelligent fault diagnosis technology, a number of related studies have paid great attention to complex and versatile pumps [103]-[105]. The pump is a significant and widely used kind of machine, and pump-induced failures account for roughly half or more of the mechanical failures in machinery systems [106], [107]. Owing to the deficiencies of the traditional fault diagnosis methods discussed above, developing intelligent diagnosis strategies for pumps is both challenging and meaningful [108]-[111].
FIGURE 9. The feature learning processes in simulation verification. D1, D2 and D3 denote the three dimensions, and MED represents minimum entropy deconvolution [117].
Recently, most studies on intelligent diagnosis for pumps have focused on traditional intelligent methods [112], [113]. There are few studies on the application of DNN-based methods to pump fault diagnosis, especially CNN-based approaches. On the basis of LeNet-5, a novel CNN was developed for fault diagnosis. The advantage of CNN in image processing was fully exploited by transforming raw signals into images and removing some useless features [114]. Similarly, based on LeNet-5, a new online CNN framework was constructed, and several offline CNNs were employed to enhance its performance. The prediction accuracy reached 99.98% on the pump dataset [115]. Three datasets were used to evaluate the accuracy: a motor bearing dataset, a self-priming centrifugal pump dataset, and an axial piston hydraulic pump dataset. Satisfactory accuracies of 99.79%, 99.481%, and 100% were obtained, respectively, which were superior to those of some other DL-based fault diagnosis methods. Yan et al. also used a CNN-based diagnostic approach for hydraulic pumps, and the accuracy exceeded about 90% even under changing speeds [116]. To explore the unknown failure mechanism of axial piston pumps, a new CNN was constructed on the basis of minimum entropy deconvolution, which was employed to preprocess the raw signal and thereby enhance the stability of feature learning and the classification performance. The classification accuracy reached 97.33%, which was superior to the other methods compared [117]. Moreover, the t-SNE visualization shows that the method yields favorable clustering results, as shown in Figure 9. Wang et al. developed a new CNN for fault diagnosis of centrifugal pumps, using data fusion to convert the original signals [118]. Yang et al. integrated hierarchical symbolic analysis with CNN and achieved a diagnostic accuracy of 98.50% for centrifugal pumps [119].

IV. CONCLUSIONS AND PERSPECTIVES
Intelligent fault diagnosis generally includes traditional methods based on shallow networks and modern methods based on DNN. Owing to the power of DL in big data processing, it is also a powerful tool for handling mechanical big data [120]-[122]. Moreover, DNN-based technologies have unique advantages in fault classification because of their capacity for automatic learning and feature extraction. This review primarily surveys the applications of CNN-based approaches to rotary machinery, including bearings, gears/gearboxes, and pumps. In particular, it focuses on the classification performance reported in recent related studies and on comparative analysis with other methods. At present, a number of studies have achieved desirable results. Nevertheless, some challenging issues deserve thorough consideration and research.
(1) Some of the methods discussed do not reach the desired classification performance, which may be attributed to incomplete feature extraction. Some useful information is liable to be masked, which limits the representation of the extracted features.
(2) Many methods concentrate on single faults, and their generalization ability has not been explored further. Therefore, it is not clear whether the methods are suitable for complex compound faults, which is exactly one of the limitations of CNN-based approaches in practice.
(3) Some of the newly developed methods can be viewed as a fusion of different approaches. These methods have partly improved the diagnostic accuracy in comparison with traditional ones, but the intrinsic mechanism of the enhancement remains unclear, which makes it difficult to generalize them to other machines.
In the future, research on CNN-based fault diagnosis is likely to place its emphasis on compound faults, and the prospective strategies will be approaches applicable to more kinds of machinery.