Mining of Weak Fault Information Adaptively Based on DNN Inversion Estimation for Fault Diagnosis of Rotating Machinery

Fault feature extraction plays a significant role in bearing fault diagnosis, especially in the incipient fault period. Recently, deep neural networks have been favored by many researchers due to their excellent hierarchical feature extraction capabilities. Existing diagnosis methods based on deep neural networks mostly take original condition monitoring data as input and convert fault diagnosis into a pattern recognition problem. Although this improves the level of intelligent diagnosis, it still faces practical problems. Since a deep neural network includes many non-linear mapping layers, the extracted fault-related features show a high degree of abstraction, which reduces the practical level of this technology. Aimed at this problem, a mining method of understandable weak fault information is proposed based on deep neural network inversion estimation. From the perspective of neuron response degree, this method mines, in the original input feature space, the most sensitive input pattern that maximizes the activation value of a neuron in the network output layer. It can intuitively reveal the weak fault information contained in the original signal. Two bearing experiments verify the effectiveness and reliability of the proposed method.


I. INTRODUCTION
With the rapid development of Internet-of-Things technology and cyber collaborative manufacturing technology, it is convenient to collect and store massive condition monitoring data of rotating machinery. These data help improve the level of smart manufacturing while bringing challenges to massive data analysis. The rolling element bearing is widely used and plays a significant role in major rotating machinery [1]. It is necessary to conduct accurate and timely fault diagnosis of bearings, in order to extend the response time and reduce the occurrence of accidents. Therefore, handling massive condition monitoring data and extracting highly representative fault features have become a research hotspot in the context of big data [2]. Consequently, data-driven intelligent fault feature extraction technology has been developing rapidly.
The associate editor coordinating the review of this manuscript and approving it for publication was Nuno Garcia.
In general, intelligent fault feature extraction methods have gone through three generations of technological development. The first generation is based on modern signal processing technology and built in the time-frequency domain. The empirical mode decomposition (EMD) [3] method can be seen as representative. Compared with other time-frequency analysis methods, e.g., the short time Fourier transform (STFT) [4] and the wavelet transform [5], which employ predefined waveform basis functions with limited support and oscillation attenuation, EMD is able to adaptively obtain a generalized basis from the input signal itself. It realizes signal decomposition based on the shape of the signal itself, decomposing the input signal into multiple intrinsic mode functions (IMF), and can therefore be considered an adaptive fault feature extraction method. A good review of EMD for fault diagnosis of rotating machinery can be found in Ref. [6]. However, this method has a drawback: when dealing with a particular rotating component, it needs prior expert knowledge to construct a specific indicator for selecting the fault-related IMF. The generalization ability of EMD-based fault feature extraction methods is therefore limited. The second generation is based on shallow models, of which the artificial neural network (ANN)-based feature extraction method is typical [7]. An ANN is capable of learning the complex non-linear mapping relationship between fault types and condition monitoring data, by the use of its self-learning ability and fault tolerance, which helps researchers or technical staff master knowledge that is difficult to express in analytical form. The ANN-based fault feature extraction method has attracted many researchers and achieved great success [8]-[10]. Typical ANN structures include the radial basis function network (RBFN) [11], the multilayer perceptron (MLP) [12], [13], and so on. However, this kind of method has some shortcomings.
Since existing ANN-based fault diagnosis models adopt fully-connected inter-layer connections, the network needs to traverse all input nodes when learning certain types of features. This redundant connection makes it hard for an ANN to extract fault features from original condition monitoring data, so expert knowledge is needed to perform initial feature extraction for specific equipment, which reduces the generalization ability of the ANN model. In addition, the ANN models applied in fault feature extraction are shallow, which severely restricts the network's ability to learn the complex non-linear mapping relationships in fault diagnosis applications. The third generation is centered on the deep neural network (DNN) [14]. The DNN-based method overcomes the drawback of the above-mentioned methods, which need prior knowledge for feature extraction. It can adaptively extract fault features from raw condition monitoring data, taking advantage of its hierarchical feature extraction and hierarchical feature expression abilities. Currently, typical DNN-based fault feature extraction models include the restricted Boltzmann machine (RBM) [15]-[17], the deep belief network (DBN) [18]-[20], the convolution neural network (CNN) [21]-[24], and so on. These models have achieved great progress in the field of intelligent fault diagnosis. For example, Jia et al. [2] built a deep auto-encoder (DAE)-based diagnosis model that takes the spectrum as network input. This model obtained highly discriminative fault features in the low-dimensional space, and achieved extremely high recognition performance in the identification of various defect sizes and fault types of rolling bearings. To further reduce the degree of manual intervention in the feature extraction process, Chen et al. [16] proposed an adaptive and non-linear signal decomposition method based on a convolution restricted Boltzmann machine model that can learn shift-invariant waveform patterns. This method takes the original vibration signal as input.
It successfully separated the transient impulse signal from the weak fault signal of bearings. Considering that deep neural networks have many non-linear mapping layers and a huge number of hyper-parameters, Zhang et al. [23] exploited the weight sharing property of CNN and designed a deep convolution neural network with kernel sizes of 64 and 3, so as to reduce the difficulty of network parameter selection. This network used original condition monitoring data as input, and realized early fault detection of rolling bearings under strong background noise and variable working conditions. It can be seen from these research cases that the fault features extracted by the above-mentioned deep models are hard to understand. It is difficult for subsequent researchers to comprehend what kind of fault-related components are mined from the input data. The deep models are used as black boxes. Further analysis of the extracted fault-related features only stays at the discriminative or clustering analysis in the reduced two-dimensional or three-dimensional feature space.
To address the shortcoming mentioned above, a mining method of understandable fault information is proposed on the basis of deep neural network inversion estimation. This method is built on the principle that the activation value of a neuron in the network output layer reflects its sensitivity to the current input, and further constructs a mathematical model for revealing the fault information contained in the extracted fault features. Then, starting from the overall structure of the trained and optimized deep neural network, the proposed method applies gradient descent to mine, in the original input feature space, the most sensitive input feature pattern that maximizes the activation value of the target neuron of the network output layer. Naturally, this sensitive pattern can intuitively reveal what kind of fault component is extracted and identified from the original input data. Since the above operations are performed in the original input feature space, this method overcomes the problem that the extracted fault-related features are hard to understand, while achieving weak fault feature extraction effectively.
The main contributions are summarized as follows: (1) the abstraction problem of extracted fault-related features based on DNN diagnosis models is clearly pointed out, which is ignored by many research studies. (2) To overcome this problem, a mathematical model for mining understandable fault information is proposed. It is able to realize mining of fault-related components from input data in the original feature space. (3) Two bearing experiments verify the effectiveness and generalization ability of the proposed method.
The rest of the paper is organized as follows. The motivation is stated in Section 2. Section 3 presents the proposed mining method of understandable fault information based on DNN inversion estimation in detail. The effectiveness and generalization of the proposed method are demonstrated by two bearing experiments in Section 4. Finally, the conclusion is drawn in Section 5.

II. MOTIVATION
The fault-related components are easily submerged in the acquired condition monitoring data of rotating machinery, due to strong background noise or interference from adjacent components. Deep neural networks have received widespread attention because of their excellent adaptive fault feature extraction ability [25], [26]. However, the original input features undergo multiple non-linear mapping layers in the network, which makes the features obtained in each hidden layer show a high degree of abstraction. The typical deep auto-encoder-based fault diagnosis network employed in Ref. [2] is taken as an example to illustrate this problem. Fig. 1(a) shows the overall structure of the DAE model, which is composed of one input layer, three hidden layers, and one output layer for classification of fault types. The numbers of neurons are set to 1200, 600, 200, 100, and 10, respectively. The hyperbolic tangent function is selected as the activation function in the hidden layers. The network input is the Fourier spectrum vector calculated from the bearing vibration signal, with each sample length being 1200. The 10 neurons in the output layer correspond to 10 bearing health states. This DAE model is trained on one bearing vibration dataset with massive samples, by the use of layer-wise pre-training and fine-tuning algorithms. Consequently, the optimized DAE model reached an accuracy of more than 99% in the identification of various fault types in the bearing test set. However, in the existing research, the analysis of why the deep neural network model has such an excellent fault feature extraction ability only lies in the discriminative analysis of the extracted hidden-layer features in the low-dimensional space after dimensionality reduction. Fig. 1(b) shows the visualization results of the extracted features after dimensionality reduction by principal component analysis (PCA).
Obviously, the fault-related features extracted by the DAE model are highly discriminative. The samples belonging to the same type of health state are well gathered together, while the samples belonging to different types of health state are well separated.
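As a rough illustration of this visualization step, the PCA projection of hidden-layer features can be sketched with plain NumPy; the feature matrix below is a toy stand-in, not the DAE's actual 200-D hidden features:

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project hidden-layer feature vectors onto their leading principal
    components, computed via SVD of the centered feature matrix."""
    X = features - features.mean(axis=0)          # center each feature dimension
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T                # low-dimensional scores

# toy stand-in: 100 samples whose variance lies almost entirely
# in the first two directions, so 2-D PCA preserves the structure
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 3))
feats[:, 2] *= 1e-3
proj = pca_project(feats, n_components=2)
```

The scatter plot of `proj` is what a figure like Fig. 1(b) displays for each health state.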
However, the above-mentioned dimensionality reduction analysis can only reveal the discriminative property of the extracted fault features. It cannot give a clear and intuitive explanation of what fault-related components the network digs out from the original input samples; there is a lack of visual and interpretable representation of fault information. In addition to the DAE model, the existing fault feature extraction models based on other types of DNN also contain multiple non-linear mapping layers, so the mining of understandable fault information has become a common problem faced by this kind of technology. Therefore, there is an urgent need for a mining method that can reveal what understandable fault information is mined by the intelligent DNN diagnosis model.

III. THE PROPOSED METHOD
In order to solve the problem specified in Section 2, a mining method of understandable weak fault information is proposed based on DNN inversion estimation. Its schematic diagram is displayed in Fig. 2. It includes the following steps: (1) Construction of a one-dimensional deep residual network (DRN) for fault feature extraction. Following the basic principles of modular design and considering the convergence and over-fitting problems in network training, a deep residual network with residual connections is devised. It takes the raw vibration signal of rotating machinery as input. The whole network is trained by the back-propagation (BP) algorithm in an end-to-end way.
(2) Construction of a mathematical model for mining understandable fault information. The activation value of a neuron in the network output layer reflects its sensitivity to the current input data. Once a DRN model is fully trained, the weight parameters of the network are frozen. Then, a mathematical model for mining understandable fault information is built up with regard to the neurons in the output layer. This model takes activation value maximization as the optimization goal and the input data as variables.
(3) Solution of the above mathematical model. Since the mathematical model is highly non-convex, it is hard to calculate its analytical solution. Therefore, gradient descent is employed to find the approximately optimal numerical solution. Then, the most sensitive fault-related pattern can be dug out from the original input feature space.
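The three steps above can be sketched end-to-end on a toy problem. The "network" below is a single linear-ReLU neuron standing in for the frozen DRN output neuron, and the Lagrange coefficient, step size, and weight vector are illustrative choices, not values from the paper; the gradient is approximated numerically instead of by back-propagation:

```python
import numpy as np

w = np.array([3.0, 4.0])
act = lambda x: max(0.0, w @ x)              # frozen toy "network" neuron, a(x)
lam = 2.5                                    # illustrative Lagrange coefficient
R = lambda x: -act(x) + lam * (x @ x - 1.0)  # objective with energy constraint E = 1

def grad(f, x, eps=1e-6):
    """Central-difference gradient, standing in for back-propagated gradients."""
    g = np.zeros_like(x)
    for k in range(x.size):
        d = np.zeros_like(x)
        d[k] = eps
        g[k] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

x = np.array([0.1, 0.1])                     # step (1): initialize the solution
for _ in range(300):                         # steps (2)-(3): gradient descent on R
    x = x - 0.05 * grad(R, x)
# x converges toward w / (2 * lam) = [0.6, 0.8], the most sensitive input pattern
```

For this toy neuron the stationary point can be checked analytically: setting the gradient of R to zero gives x* = w/(2λ), which here lies exactly on the unit-energy sphere.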

A. CONSTRUCTION OF ONE-DIMENSIONAL DEEP RESIDUAL NETWORK FOR FAULT FEATURE EXTRACTION
Fault feature extraction is very significant to the fault diagnosis of rotating machinery. As one of the most powerful feature extraction tools, the deep neural network has achieved great success in feature engineering. In this section, the convolution neural network is used as the sub-module for building up the feature extractor. It is one of the most famous deep network structures and originated from computer vision [27]-[29]. Its effectiveness for feature extraction has been verified in many domains. An intelligent fault diagnosis network is designed here by following the basic principles of CNN structures.
Considering that vibration signals contain abundant operation status information of rotating machinery and are able to reveal the health status of equipment, this paper proposes a one-dimensional deep residual network for fault feature extraction. The overall structure of the proposed deep network is displayed in Fig. 3. This network takes raw vibration signals as input. It consists of two parts, i.e., a backbone network and a tail network. The former is made up of alternately stacked convolution layers (Conv) and pooling layers (MP), while the latter is composed of fully-connected (FC) layers. The FC layer projects the feature information extracted by the backbone network into the label space of fault categories. It is worth mentioning that the input of this network is just the original time-domain vibration signal, without any advanced signal preprocessing. The subsequent 27 convolution layers perform hierarchical feature extraction and hierarchical feature expression, followed by one global average pooling (GAP) layer and one FC layer. The size of each convolution kernel is set to 16. The FC layer contains K neurons, each corresponding to one fault class. Here, the reason why the convolution unit is used as the sub-module of feature extraction is that the input time-domain vibration signal is regarded as locally stationary. In other words, the features extracted in one short-time window of the original signal by a certain convolution kernel are also applicable to the other short-time windows. Meanwhile, the convolution operation with a sliding window can maintain the timing of the extracted features. The head of the proposed network employs a pre-activation module [29], while the middle part of the network contains 12 residual blocks. Each block consists of two batch-normalization (BN) layers and one dropout layer.
The residual block makes the error gradient of the weight parameters easier to pass back to the front-end layers of the network, which makes the proposed deep network easier to train.
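A minimal single-channel sketch of the shortcut connection, assuming 'same' padding and omitting the BN and dropout layers of the actual block:

```python
import numpy as np

def conv1d_same(x, kernel):
    """Single-channel 1-D convolution (cross-correlation) with 'same' padding."""
    pad = len(kernel) // 2
    xp = np.pad(x, (pad, pad))
    return np.array([xp[i:i + len(kernel)] @ kernel for i in range(len(x))])

def residual_block(x, k1, k2):
    """y = x + F(x): the shortcut adds the input back to the output of two
    stacked Conv/ReLU layers, so the block never has to 'relearn' identity."""
    h = np.maximum(conv1d_same(x, k1), 0.0)   # Conv + ReLU
    h = conv1d_same(h, k2)                    # second Conv
    return x + h                              # identity shortcut connection

# with zero-valued kernels F(x) = 0 and the block reduces to the identity map,
# which is why error gradients pass easily back to the front-end layers
x = np.sin(np.linspace(0.0, 6.28, 32))
y = residual_block(x, np.zeros(16), np.zeros(16))
```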
In order to suppress the slow convergence and over-fitting problems occurring in the network training process, batch-normalization (BN) technology [30] and dropout technology [31] are introduced into the proposed DRN. The former limits the activation values of BN layers to a stable distribution, and overcomes the internal covariate shift phenomenon caused by the changes of network parameters during training. It allows the network to use a large learning rate to speed up training. The detailed process of the BN algorithm is as follows: let ζ = {x_1, x_2, ..., x_m} denote one batch of data containing m training samples. Then, the output o_i of the BN layer is calculated as follows:

o_i = γ x̂_i + β (1)

where x̂_i is computed through the following equations:

μ_ζ = (1/m) Σ_{i=1}^m x_i (2)

σ_ζ² = (1/m) Σ_{i=1}^m (x_i − μ_ζ)² (3)

x̂_i = (x_i − μ_ζ) / √(σ_ζ² + ξ) (4)

γ and β are parameters that need to be learned during network training; ξ is a very small constant, set to 0.0001. From Eq. (1) to Eq. (4), it can be found that the BN layer also plays a role in data augmentation. The whole set of network parameters is trained and optimized by the stochastic gradient descent (SGD) algorithm, so a training sample is considered in conjunction with the other samples in one batch. In each training iteration step, a training sample is randomly combined with other samples to constitute one batch, while each batch influences the learned parameters γ and β of the BN layers. This is equivalent to introducing disturbances and noise into the training samples. In other words, the behavior of BN layers can be seen as an implicit form of data augmentation. Data augmentation is widely used to prevent over-fitting of DNN [32]. The dropout technology is employed to explicitly weaken the over-fitting problem in network training. Its core idea is that some neurons in dropout layers are forced to be inactive, which means that the connections between the neurons in the dropout layer and its previous layer are randomly cut off with a fixed probability p of 0.5.
In other words, many ''small'' networks sampled from the original ''big'' network can be obtained. In the training stage, these ''small'' networks are trained and optimized. In the test stage, the dropout layers no longer work, and the final trained model can be considered as the average of the above multiple ''small'' networks that share the same network structure. This technology embodies the idea of ensemble learning, and largely improves the generalization ability of the proposed DRN. The detailed process of the dropout technology is as follows: let a^(l) denote the activation value vector in layer l. Then, the input feature vector i^(l+1) of layer l+1 is calculated as follows:

r^(l) ~ Bernoulli(p) (5)

i^(l+1) = W^(l+1) (r^(l) ⊗ a^(l)) + b^(l+1) (6)

where r^(l) represents the random vector that obeys the Bernoulli distribution with probability p; W^(l+1) and b^(l+1) denote the weight matrix and bias vector of layer l+1; the symbol ⊗ denotes element-wise product. Through the above-mentioned multiple non-linear layer mappings, the deep feature representation of any sample can be obtained in the GAP layer. The subsequent FC layer projects this feature representation into the label space of fault categories to obtain the diagnosis result.
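The BN and dropout forward passes described above can be sketched as follows. This is a minimal illustration, not the DRN implementation: p is interpreted here as the keep probability of each neuron, and the affine parameters γ and β are fixed rather than learned:

```python
import numpy as np

def batch_norm(batch, gamma=1.0, beta=0.0, xi=1e-4):
    """o_i = gamma * x_hat_i + beta, with x_hat_i normalized by the batch
    mean and variance; xi = 0.0001 guards against division by zero."""
    mu = batch.mean(axis=0)                       # batch mean
    var = batch.var(axis=0)                       # batch variance
    x_hat = (batch - mu) / np.sqrt(var + xi)      # normalized activations
    return gamma * x_hat + beta

def dropout(a, p=0.5, rng=None):
    """Element-wise mask r ~ Bernoulli(p) applied as r ⊗ a; p is treated as
    the probability of keeping a neuron active."""
    rng = np.random.default_rng(0) if rng is None else rng
    r = rng.binomial(1, p, size=a.shape)
    return r * a

batch = np.random.default_rng(1).normal(5.0, 2.0, size=(64, 8))
out = batch_norm(batch)      # per-dimension mean ~ 0, variance ~ 1
masked = dropout(out)        # roughly half of the activations are zeroed
```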

B. MATHEMATICAL MODEL FOR MINING UNDERSTANDABLE FAULT INFORMATION
From the structure of the proposed DRN, it can be seen that the original input data are mapped by each non-linear layer in the network, which makes the extracted fault-related features in the hidden layers show strong abstraction and high dimensionality. Existing research on the understanding of this type of feature focuses on discriminative analysis in the low-dimensional space after linear or non-linear dimensionality reduction. However, it is difficult for subsequent researchers or engineers to intuitively understand the extracted deep fault feature representation. Aimed at this problem, this paper builds a mathematical model for mining understandable fault information. This model takes the trained DRN as a whole, rather than visualizing high-level features layer by layer. It digs out the most sensitive pattern from the original input data, based on the principle of activation maximization. Naturally, this sensitive pattern can be used to reveal the fault information hidden in the original condition monitoring data.
In this section, we take one neuron in the network output layer as an example to describe the proposed mathematical model in detail. With regard to the i-th neuron in layer j, the activation value a_i^(j) reflects the response degree to the current input data. The activation function adopted in the proposed DRN model is the rectified linear unit (ReLU), i.e., σ(z) = max(0, z). The larger the activation value of the neuron is, the more sensitive the neuron is to the input pattern. Then, once the DRN model is trained, its role can be understood by solving for the most sensitive input pattern that maximizes the activation value a_i^(j) in the original feature space. That is to solve the following optimization problem:

x* = argmax_x a_i^(j)(x), subject to Σ_{k=1}^L x_k² = E (7)

where the symbol x* represents the most sensitive input pattern mined in the original feature space under the condition that the network input is x; E denotes the energy of the input sample x, that is:

E = Σ_{k=1}^L x_k² (8)

E is a constant; this condition limits the search range of the solution in Eq. (7). Based on the Lagrange multiplier method, the optimization problem shown in Eq. (7) can be transformed into the extreme value problem of the following Lagrange function:

R(x, λ) = −a_i^(j)(x) + λ (Σ_{k=1}^L x_k² − E) (9)

where −a_i^(j)(x) denotes the negative activation value of the i-th neuron in layer j; it is a function of the input x. The symbol λ is the Lagrange coefficient, and λ > 0. From the perspective of the entire construction process, the proposed model mines the sensitive input pattern based on the activation value. Therefore, this model can be applied to other types of DNN. It can be regarded as a general mining method of understandable fault information for DNN-based diagnosis models.

C. SOLUTION OF THE MATHEMATICAL MODEL
From Eq. (9), it can be found that the function R(x, λ) contains multiple non-linear mapping processes, so R(x, λ) is highly non-convex. It is hard to obtain the optimal or analytical solution of this function. To solve this problem, gradient descent (GD) is applied to find an approximately optimal solution. The GD algorithm includes three steps: (1) random initialization of the solution; (2) calculation of the gradient for each parameter; and (3) update of the parameters. The above process corresponds to the following equations, respectively.
x^(0) = x (10)

∇R(x, λ) = (∂R/∂x_1, ∂R/∂x_2, ..., ∂R/∂x_L) (11)

x^(s+1) = x^(s) − η ∇R(x^(s), λ) (12)

where x^(0) denotes the initial solution, i.e., the current input sample; L represents the feature dimension, i.e., the length of the input signal; ∇R(x, λ) denotes the gradient vector of the function R(x, λ) with respect to x; s represents the number of iteration steps; η denotes the step size. Considering that the features in the hidden layers of a DNN are learned from training samples, these features can be regarded as whole or partial feature expressions of the input data from the perspective of the input feature space. Therefore, the most sensitive input pattern can be mined out in the original input feature space by the use of Eq. (10)-Eq. (12).

IV. EXPERIMENTAL VERIFICATION
A. CASE 1: FAULT DIAGNOSIS OF BEARINGS UNDER STRONG BACKGROUND NOISE
1) EXPERIMENT AND DATA DESCRIPTION
The bearing vibration data used in this case are from Ref. [33]. The corresponding experiment rig is shown in Fig. 4 [33]. It is mainly composed of four parts, i.e., a 2 horsepower motor, a torque transducer, a dynamometer, and control electronics (not shown). The tested bearings are used to support the motor shaft. Single-point faults are introduced into the tested bearings at the outer race, inner race, and rolling element positions, respectively, by electro-discharge machining (EDM). The weak fault vibration signals with strong background noise are adopted for experimental verification. The tested bearing type is the deep groove ball bearing 6205-2RS JEM SKF. Table 1 lists the important structural parameters of this bearing type. The vibration data are acquired by an accelerometer attached to the bearing housing with a magnetic base. The sampling frequency is set to 12 kHz. The fault characteristic frequencies of the tested bearings at unit rotation frequency are also tabulated. In general, the fault feature extraction methods based on DNN are data-hungry: network training consumes massive data, since a DNN usually has many non-linear mapping layers and a huge number of parameters. In order to train the proposed one-dimensional DRN model, an explicit data augmentation method is proposed.
That is, one original vibration signal is divided into multiple sub-signals, and 50% overlap of sampling points is allowed between adjacent sub-signals. For example, one original vibration signal with 1536 sampling points can be separated into 2 sub-signals, each with a length of 1024. Fig. 5 intuitively shows the process of constructing samples from the original bearing vibration signals. It is worth pointing out that there are no overlapping sampling points between the training set and the test set. Table 3 lists the number of samples contained in each fault class of the constructed data set. Fig. 6 displays the time-domain waveforms of typical samples in each fault class of the test set. It can be seen that the transient impulse information is severely submerged by the strong background noise, which brings great difficulties to early weak fault detection.
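The overlapped segmentation scheme can be sketched as follows; the record length of 1536 points below is chosen to reproduce the two-sub-signal example:

```python
import numpy as np

def segment_with_overlap(signal, length=1024, overlap=0.5):
    """Explicit data augmentation: split one vibration record into sub-signals
    of the given length, with the given overlap between adjacent sub-signals."""
    stride = int(length * (1 - overlap))                 # 512 points for 50% overlap
    n = (len(signal) - length) // stride + 1             # number of full sub-signals
    return np.stack([signal[i * stride : i * stride + length] for i in range(n)])

samples = segment_with_overlap(np.arange(1536), length=1024)
# a 1536-point record yields 2 sub-signals; the second half of the first
# sub-signal coincides with the first half of the second
```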

2) RESULTS OF FAULT DIAGNOSIS
In order to intelligently identify the fault location of each sample in the test set under strong background noise, this paper first constructs the one-dimensional DRN shown in Fig. 3. The hyper-parameters of this network are set as follows: the length of the network input is set to 1024 sampling points; the max-pooling operation is employed in the pooling layers; the size of all convolution kernels is set to 16. Each time the original input signal passes through two consecutive residual modules, it is down-sampled to half of its original length. The last FC layer contains 4 neurons, each linked to one fault mode.
Afterwards, the proposed DRN model is trained on the divided bearing training set using the back propagation algorithm. Fig. 7 shows the loss curve and accuracy curve of the network on the training set and test set during the training process. It can be seen from this figure that the DRN model's performance on the test set is equivalent to that on the training set in terms of both loss and accuracy, which indicates that the network does not fall into an over-fitting state. The recognition performance of the proposed DRN model in each fault category is listed in Table 4. These results demonstrate the strong fault recognition capability of the proposed deep residual network. In order to further understand the highly abstract features extracted by the network at the GAP layer, a traditional high-dimensional feature visualization method, i.e., t-distributed stochastic neighbor embedding (t-SNE), is applied for dimension reduction and further discriminative analysis. Fig. 8 shows the scatter distribution of the samples' deep feature representations after dimensionality reduction. It is clear that the samples belonging to the same fault type are well clustered while the samples from different fault types are clearly separated, which demonstrates that the deep feature representation is highly discriminative.

3) MINING OF WEAK FAULT INFORMATION BASED ON THE PROPOSED METHOD
The typical time-domain samples shown in Fig. 6 are taken as examples and considered to be initial values. We mine the most sensitive input pattern that maximizes the activation value of the corresponding neuron in the output layer, by the use of Eq. (10)-Eq. (12). During iteration, the changes in the sensitive input pattern of each category are shown in Fig. 9-Fig. 12. The iteration is stopped when the solution converges. Fig. 9(d)-Fig. 12(d) show the most sensitive input patterns related to the corresponding neurons in the output layer of the DRN. It can be seen from these figures that the transient impulse information is retained while the background noise is greatly suppressed.
For example, with regard to the healthy state neuron in the output layer of the DRN model, the most sensitive input feature pattern is a DC signal with mean value 0. For the faulty neurons, the most sensitive input feature patterns show that the local interference between adjacent transient impulses is almost completely suppressed, while the high-frequency transient impulse information is highlighted. Further, the Hilbert transform and envelope spectrum analysis are applied to the most sensitive input patterns that correspond to the three fault neurons in the output layer of the DRN model. The envelope spectra of these three sensitive input patterns are shown in Fig. 13. The outer race, inner race, and rolling element fault characteristic frequencies are very evident in Fig. 13(a), (b), and (c), respectively. This strongly illustrates the effectiveness of the proposed method for mining understandable weak fault information.

B. CASE 2: INCIPIENT FAULT DETECTION OF RUN-TO-FAILURE BEARINGS
1) EXPERIMENT DESCRIPTION
To further validate the generalization ability of the proposed method, an incipient fault detection task on a run-to-failure bearing experiment is adopted. The test data of rolling element bearings are from the Prognostics Center of Excellence (PCoE) prognostic data repository, contributed by the Intelligent Maintenance Systems (IMS) center, University of Cincinnati [34]. Fig. 14 shows the experiment rig. Four rolling element bearings are installed on a shaft that is driven by an AC motor at a speed of 2000 rpm. A radial load of 6000 lbs is applied to the shaft and bearings by a spring mechanism, which accelerates the degradation of the bearings. The bearings are force-lubricated, and an oil circulation system is used to regulate the flow and temperature of the lubricant. In order to carry out the run-to-failure experiment smoothly and monitor the condition of the tested bearings, a magnetic plug is installed in the oil feedback pipe to collect debris from the oil.
The experiment will be terminated if the accumulated debris exceeds a certain level.
The model of the tested bearings is Rexnord ZA-2115. Its important structural parameters are listed in Table 5. Table 6 also lists the fault characteristic frequencies of the tested bearings at unit rotation frequency. A PCB 353B33 high sensitivity quartz ICP accelerometer is installed on the housing of each bearing. Data collection is facilitated by an NI DAQ Card 6062E, and the vibration data are collected every 10 minutes. The sampling rate is 20 kHz, and the data length of each file is 20480 points. Detailed information about this experiment can be found in Ref. [34].

2) DATA DESCRIPTION
This study uses the data of bearing 1 in the second data set, which was collected from February 12, 2004 10:32:39 to February 19, 2004 06:22:39. An outer race failure occurred in bearing 1 at the end of the run-to-failure experiment. Fig. 15 depicts the time-domain statistical feature, i.e., the root mean square (RMS), over the entire life cycle of bearing 1. According to Ref. [35], the trend of the RMS can be separated into three stages: the first is the normal and stable operation stage; the second is the early fault stage; and the last is the severe failure stage. It can be seen that the RMS value increases greatly in the last stage. Fig. 16 shows the typical samples corresponding to the above three stages, respectively. In the last stage, the periodicity of the transient impulses is so obvious that the fault characteristic frequency can be calculated directly, that is, 1/0.0042 s = 238.1 Hz, where 0.0042 s is the interval between two adjacent impulses. However, the transient impulse feature is not apparent in the incipient fault phase, as shown in Fig. 16(b). The periodic impulses are very weak and heavily contaminated by noise or other interference. If the fault can be detected at this stage, the response time for maintenance will be longer.
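The RMS statistic tracked in Fig. 15 and the impulse-spacing calculation above amount to:

```python
import numpy as np

def rms(x):
    """Root mean square of one vibration record, the time-domain statistic
    plotted over the bearing's life cycle in Fig. 15."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.mean(x ** 2))

# fault characteristic frequency from the impulse spacing read off the
# severe-failure-stage waveform
interval = 0.0042                 # seconds between adjacent transient impulses
fault_freq = 1.0 / interval       # ~ 238.1 Hz, as computed in the text
```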

3) MINING OF WEAK FAULT INFORMATION BASED ON THE PROPOSED METHOD
In order to diagnose bearing faults accurately and in time, the proposed DNN inversion estimation technology is employed. Firstly, the one-dimensional DRN model is constructed. The parameters of this model are the same as those in Case 1, except that the number of neurons in the output layer is set to 3. These neurons correspond to the three stages of bearing performance degradation, respectively. The length of each input sample is also 1024 sampling points. Just as in Case 1, the training set and test set are constructed using the proposed explicit data augmentation method. Afterwards, the whole model is trained using the back propagation algorithm.

FIGURE 19. Envelope spectrum of the most sensitive input pattern that is related with the early fault stage.
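Once the model is trained, the inversion step can be sketched as plain gradient ascent on the input: holding the trained weights fixed, the input signal is iteratively adjusted so that the activation of the chosen output neuron grows. The toy network below is only an illustrative stand-in for the trained DRN; its architecture and the optimizer settings are assumptions, not the paper's Eq. (10)-Eq. (12) verbatim.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative stand-in for the trained one-dimensional DRN (assumption)
model = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=16, stride=4), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(8, 3),  # 3 output neurons, one per degradation stage
)
model.eval()

def most_sensitive_input(model, neuron, length=1024, steps=200, lr=0.05):
    """Gradient-ascent inversion: starting from a zero signal, adjust the
    input so that the chosen output neuron's activation is maximized."""
    x = torch.zeros(1, 1, length, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        activation = model(x)[0, neuron]
        (-activation).backward()   # minimizing the negative = ascending
        opt.step()
    return x.detach().squeeze()

pattern = most_sensitive_input(model, neuron=1)  # e.g. the early-fault neuron
```

The returned signal is the "most sensitive input pattern" for that neuron; in the paper this is what makes the mined fault information directly interpretable.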
The operation stage of the tested bearing can be easily recognized using a forward pass of the network. To further break away from using the DNN as a ''black box'' and give an intuitive explanation of what kind of fault information is mined from the input samples, the proposed mining method of understandable fault information is adopted. The two typical time-domain signals shown in Fig. 16(a) and (b) are taken as inputs, and the most sensitive input patterns are mined from them by the use of Eq. (10)-Eq. (12). The waveform changes of the sensitive input pattern during the iterative solution are shown in Fig. 17-Fig. 18, and the most sensitive input patterns for the corresponding neurons of the network output layer are plotted in Fig. 17(d) and Fig. 18(d). Obviously, for the input signal belonging to the early fault stage, the transient impulse information is highlighted while the background noise is largely suppressed. For the neuron related to the normal operation stage, the most sensitive input feature pattern resembles a DC signal. To specify the fault location of the tested bearing clearly, the Hilbert transform and envelope spectrum analysis are applied to the most sensitive input pattern related to the early fault stage. The envelope spectrum result is shown in Fig. 19. It can be seen from this figure that the outer race fault characteristic frequency is very evident, which validates the generalization ability of the proposed method.
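The envelope analysis applied to the mined pattern can be reproduced with a few lines of standard signal processing. The sketch below demonstrates it on a synthetic impulse train with the same 0.0042 s spacing; the 3 kHz resonance frequency is an illustrative assumption, not a value from the experiment.

```python
import numpy as np
from scipy.signal import hilbert

def envelope_spectrum(x, fs):
    """Envelope spectrum via the Hilbert transform: demodulates the
    high-frequency resonance so the impulse repetition rate shows up
    as a clear spectral line."""
    envelope = np.abs(hilbert(x))             # amplitude envelope
    envelope = envelope - envelope.mean()     # remove the DC component
    spectrum = np.abs(np.fft.rfft(envelope)) / len(envelope)
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    return freqs, spectrum

# Synthetic outer-race-like signal: impulses every 84 samples (4.2 ms at
# 20 kHz, i.e. ≈ 238.1 Hz), each exciting a decaying 3 kHz resonance.
fs = 20_000
t = np.arange(0, 0.01, 1.0 / fs)              # one impulse response, 10 ms
impulse = np.exp(-t / 0.0005) * np.sin(2 * np.pi * 3000 * t)
train = np.zeros(20_480)
train[::84] = 1.0
signal = np.convolve(train, impulse)[:len(train)]
freqs, spectrum = envelope_spectrum(signal, fs)
peak_hz = freqs[np.argmax(spectrum)]          # ≈ 238 Hz repetition rate
```

Applied to the most sensitive input pattern instead of this synthetic signal, the dominant spectral line lands at the outer race fault characteristic frequency, as in Fig. 19.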
The experimental results reveal that the proposed method can adaptively invert and estimate the fault-related components contained in the original input data. The proposed method breaks away from using the DNN as a ''black box'' during feature extraction, and can provide strong support for management personnel conducting intelligent fault diagnosis.

V. CONCLUSION
To address the abstraction problem of the fault-related features extracted by DNN diagnosis models, a mining method of understandable fault information is put forward on the basis of deep neural network inversion estimation. This method starts from the overall structure of the deep neural network and mines, in the original input feature space, the most sensitive input pattern that maximizes the activation value of the neuron to be analyzed. It not only realizes the effective extraction of weak fault-related features, but also reveals the weak fault information hidden in the original condition monitoring data. The experimental results demonstrate that the proposed method can effectively and adaptively mine understandable fault information, which helps researchers and engineers understand the meaning of the fault-related features extracted by the DNN model and further enhances the practical level of this technology.