Machinery New Emerging Fault Diagnosis Based on Deep Convolution Variational Autoencoder and Adaptive Label Propagation

In the research field of mechanical equipment fault diagnosis, usually only the existing fault types are identified and any new emerging fault class is ignored; in practice, however, new fault classes may appear. To solve this problem, a novel fault diagnosis model based on a deep convolution variational autoencoder network and adaptive label propagation (DCVAN-ALP) is proposed. Firstly, initial high-dimensional features are constructed by the dual-tree complex wavelet packet method and used as the input of the model. Secondly, a convolutional neural network architecture is applied to construct the variational autoencoder, and the local and non-local characteristics of the samples are embedded into the loss function for training, which improves the discriminability of the hidden-layer features of the neural network. Finally, t-SNE and an improved label propagation algorithm are adopted to process the hidden features of the neural network, so that both the existing fault classes and, in particular, the new emerging fault class can be diagnosed. Experimental results show that the proposed model can effectively extract the fault characteristics of the vibration signal, and that it achieves a significantly higher recognition accuracy than other typical deep learning methods and traditional classifiers in diagnosing the new emerging fault class.


I. INTRODUCTION
Mechanical equipment is widely used in industrial processes and military fields. Once the equipment is damaged, the reliability of the system is reduced, which may cause serious consequences. In order to improve the safety of industrial processes and mitigate the risks of equipment failure, automatic health management is of great significance, and fault diagnosis plays an important role in equipment health management [1], [2].
The key task in fault diagnosis research is to extract useful information from the data collected by sensors; a classifier is then applied to obtain the diagnosis result [3], [4]. Traditional fault diagnosis methods based on signal processing and expert knowledge have been applied for many years [5], and with the development of artificial intelligence, intelligent algorithms have been increasingly used to build fault diagnosis models in recent years. For instance, Huang et al. [6] proposed an RNN-based variational autoencoder (VAE) network to extract features from raw vibration signals for motor fault detection. Jing et al. [7] proposed a convolutional neural network (CNN) to learn features directly from the frequency data of vibration signals. In addition, not only one-dimensional (1-D) data such as raw vibration signals but also two-dimensional (2-D) image data or high-dimensional data are widely used as the input of deep neural networks [8]-[10]. Some studies have shown that preprocessing the original time-domain signal can improve the performance of the diagnostic model. Wen et al. [11] proposed a signal-to-image conversion method in which the time-domain raw signals fill the pixels of an image in sequence, so that the input 1-D signals are converted to 2-D images. Xu et al. [12] used the continuous wavelet transform to convert bearing vibration signals into time-frequency images, and then a CNN was used to extract intrinsic fault features from the images. Wang et al. [13] converted time-domain vibration signals to 3-D images based on an erosion operation, and the AlexNet convolutional neural network was used to diagnose the faults of a coal washing machine. Zhao et al.
[14] adopted discrete wavelet packet analysis to transform the gearbox vibration signal into a 2-D time-frequency diagram, which was used as the input of a DRN network for fault diagnosis. Liu et al. [15] constructed a high-dimensional feature set, including time-domain, frequency-domain, and energy spectrum features, as the input of a deep auto-encoder (DAE) to identify the fault severities of a gearbox. Shao et al. [16] developed an adaptive deep belief network (DBN) with the dual-tree complex wavelet packet transform (DTCWPT), where DTCWPT was adopted to refine the measured vibration signals and design a manual feature set from each frequency-band signal. Tang et al. [17] decomposed raw vibration signals into intrinsic mode functions (IMFs) by CEEMD, and converted the appropriate IMFs into 2-D images as the input of a CNN for low-speed structural fault diagnosis. Among the various deep neural networks, the neurons in DAE and DBN models generally adopt full connection, whereas a CNN adopts local connection and weight sharing, which greatly reduces the number of trainable network parameters and improves efficiency [18]-[20]. In addition, the VAE inherits the advantages of the auto-encoder in unsupervised learning, while its hidden-layer features represent the input data better than those of the auto-encoder [21]. To improve the safety of industrial processes, other methods such as transfer learning [22] and a hybrid method of fault detection and diagnosis [23] have also been proposed to reduce risks. Although the validity of the above models has been proven, a common limitation of these fault diagnosis methods is the assumption that the classes of the target domain samples belong to the source domain; in other words, these methods can only diagnose the existing faults.
However, the source domain samples cannot contain all fault classes, which means that a new fault class may emerge during the long-term operation of the mechanical equipment. A method that can diagnose not only the existing faults but also a new emerging fault class is therefore more in line with actual diagnosis requirements. Wang et al. [24] proposed a novel deep metric learning model for imbalanced fault diagnosis and new fault classification; however, a few samples of the new fault were labeled for training, and although the proposed model achieved effective results, in practice the real labels of new fault samples cannot be determined in advance.
To overcome this problem, a novel diagnosis model based on a deep convolution variational autoencoder network with an adaptive label propagation algorithm is proposed. The main contributions of this work are summarized as follows.
1) The loss function of the deep convolution variational autoencoder network is constructed by minimizing the intra-class scatter (locality) and maximizing the inter-class separability (non-locality) in the feature space.
2) The label propagation algorithm is improved to adaptively acquire the parameter used to build the neighborhood graph.
3) The proposed DCVAN-ALP model can diagnose not only the existing faults but also a new emerging fault class under domain adaptation.
The rest of the paper is organized as follows. Section II describes the VAE method. The improved label propagation algorithm is described in Section III. Section IV details the framework of the proposed DCVAN-ALP model. In Section V, two case studies are given to illustrate the effectiveness of the proposed method. Finally, the conclusion is drawn in Section VI.

II. VARIATIONAL AUTOENCODER

A. THE STRUCTURE OF VAE
The VAE is an unsupervised generative method based on the autoencoder. It inherits the autoencoder's encoder-decoder architecture, which is trained to reconstruct its own input, but differs in that the latent representation z of the given data x is replaced with stochastic variables. The basic structure of the VAE is shown in Figure 1. The VAE forces the distribution of the latent variables z to approach a standard normal distribution; as a result, the latent variables z have stable statistical properties. The idea of the VAE is to use the hidden variables z to generate the data x̂ by optimizing the generation parameters θ. In order to make x̂ similar to the original data x with high probability, the marginal distribution p_θ(x) is maximized:

p_θ(x) = ∫ p_θ(x|z) p_θ(z) dz,
where p_θ(x|z) represents the probability distribution of the original data x reconstructed from the hidden variables z, and p_θ(z) represents the prior distribution of the hidden variables z. Since the true posterior distribution p_θ(z|x) is difficult to calculate, an approximate posterior q_φ(z|x), which obeys a Gaussian distribution, can be used instead [25]. The objective function of the VAE is the variational lower bound on the marginal likelihood of the data; for an individual data point x^(i) it can be written as

L(θ, φ; x^(i)) = −D_KL(q_φ(z|x^(i)) || p_θ(z)) + E_{q_φ(z|x^(i))}[log p_θ(x^(i)|z)],

where D_KL is the Kullback-Leibler divergence, D_KL(q_φ(z|x^(i)) || p_θ(z)) represents the regularization term, and E_{q_φ(z|x^(i))}[log p_θ(x^(i)|z)] denotes the reconstruction term. The reconstruction term requires a Monte Carlo estimate of the expectation, which is not easily differentiable and would prevent the network parameters from being updated by gradient descent. To solve this problem, the reparameterization technique [26] is applied: the variables z are reparameterized as z = µ + ε ⊙ σ, with ε ~ N(0, I). With this trick, a differentiable estimator of the variational lower bound can be calculated and the gradient descent algorithm can be applied to train the VAE.
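The reparameterized sampling and the closed-form KL regularization term can be sketched in NumPy as follows (a minimal sketch; the function names are illustrative, not from the original implementation):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """z = mu + eps * sigma with eps ~ N(0, I): the sampling noise is moved
    into an auxiliary variable so the lower bound stays differentiable
    with respect to mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + eps * np.exp(0.5 * log_var)

def kl_to_standard_normal(mu, log_var):
    """Closed-form D_KL(q_phi(z|x) || N(0, I)) for a diagonal Gaussian,
    i.e. the regularization term of the variational lower bound."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))
```

Note that the KL term vanishes exactly when the approximate posterior equals the standard normal prior, which is what pushes the latent distribution toward N(0, I) during training.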

B. VAE LOSS FUNCTION WITH LOCAL AND NON-LOCAL INFORMATION
In order to distinguish the features of different classes more effectively, the local and non-local characteristics among samples are embedded into the feature representation space, and the following objective function is minimized:

J = Σ_{i=1}^{C} Σ_{j=1}^{n_i} ||f_i^j − u_i||² − Σ_{i=1}^{C} n_i ||u_i − u||²,

where C denotes the number of classes, n_i is the number of samples of the ith class, f_i^j is the feature matrix of the jth sample of the ith class in the latent z feature space, u_i represents the mean matrix of the feature matrices f_i^j of the ith class, and u denotes the mean matrix over all classes of sample features. In other words, samples of the same class should be clustered together in the feature space, and samples of different classes should be scattered apart.
The loss function of the VAE with local and non-local information can then be written as

L_total = L_VAE + λ J,

where L_VAE is the negative variational lower bound of the VAE and λ is a trade-off coefficient weighting the discriminant term J that encodes the local and non-local information.
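A minimal NumPy sketch of the local/non-local discriminant term follows (the function name is illustrative, and how the term is weighted against the VAE loss is left to the caller):

```python
import numpy as np

def local_nonlocal_loss(features, labels):
    """Intra-class scatter (locality, to be minimized) minus inter-class
    scatter (non-locality, to be maximized), computed on latent features z."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    u = features.mean(axis=0)                       # global mean
    intra, inter = 0.0, 0.0
    for c in np.unique(labels):
        f_c = features[labels == c]
        u_c = f_c.mean(axis=0)                      # class mean
        intra += np.sum((f_c - u_c) ** 2)           # compactness term
        inter += len(f_c) * np.sum((u_c - u) ** 2)  # separability term
    return intra - inter
```

Minimizing this quantity drives same-class samples toward their class mean while pushing class means apart, which is exactly the clustering/scattering behavior described above.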

III. THE IMPROVED LABEL PROPAGATION ALGORITHM

A. LABEL PROPAGATION ALGORITHM
In this study, the label propagation algorithm is used as the classifier to diagnose faults, including the new emerging fault class. The algorithm assumes that the more similar the features of two samples are, the greater the probability that they belong to the same class. Graph-based label propagation transfers the label information of labeled samples to the neighboring nodes of any node through the weights of the graph edges. The label information of the unlabeled samples is derived after the process reaches a globally stable state through several iterations [27], [28]. Generally, the k-nearest neighbor method is used to construct the node neighborhood weight graph, and an edge with a larger weight transfers information more easily.
Define the initial sample matrix X = {x_i}, i = 1, ..., n, and divide the data set X into two non-overlapping parts: the first l samples x_i (i ≤ l) are labeled, and the remaining n − l samples x_i (l + 1 ≤ i ≤ n) are unlabeled; that is, the labels of the samples x_i (i ≤ l) are known in advance, while those of the samples x_i (l + 1 ≤ i ≤ n) are not. The purpose of the label propagation algorithm is to predict the labels y_i (l + 1 ≤ i ≤ n) by using the samples x_i (i ≤ l) and their labels y_i (i ≤ l). Let C be the number of classes in the source domain; the key assumption of the algorithm is that the target domain has one more class than the source domain, that is, the number of classes is set to C + 1. According to the k-nearest neighbor method, if x_i and x_j are neighbors, they are linked by an edge whose weight is computed by

w_ij = exp(−||x_i − x_j||² / (2σ²)), if x_i ∈ N_k(x_j) or x_j ∈ N_k(x_i); otherwise w_ij = 0.

Here, N_k(x_j) denotes the set of k neighborhood samples of x_j, N_k(x_i) denotes the set of k neighborhood samples of x_i, and σ is the variance; in this paper, σ is set to 0.1.
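Building the k-nearest-neighbor graph with Gaussian kernel weights can be sketched as follows (a NumPy sketch; symmetrizing the weight over both neighbor sets is an assumption consistent with the "x_i or x_j is a neighbor" condition):

```python
import numpy as np

def knn_gaussian_weights(X, k=3, sigma=0.1):
    """Weight matrix w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) when x_i
    and x_j are k-NN neighbours of each other (in either direction),
    and 0 otherwise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # pairwise squared Euclidean distances
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # k nearest neighbours of each sample, excluding the sample itself
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]
    W = np.zeros((n, n))
    for i in range(n):
        for j in nn[i]:
            w = np.exp(-d2[i, j] / (2.0 * sigma ** 2))
            W[i, j] = W[j, i] = w
    return W
```

The resulting matrix is symmetric with a zero diagonal; only pairs inside some k-neighborhood carry nonzero weight, so label information can only flow along locally similar samples.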
Let Y denote the initial label matrix and F(t) the predicted label matrix at iteration t. The propagation process is illustrated in Figure 2: the blue circles denote the neighbors of the orange square x_i, and the green triangle denotes the initial label y_i of x_i. In each iteration of the label propagation process, the predicted label information of x_i is partly received from its neighbors' labels, and the rest is retained from its initial label y_i. The label information at time t + 1 is propagated as

F(t + 1) = I_α S F(t) + (I − I_α) Y,

where S is the row-normalized weight matrix, I_α is an n × n diagonal matrix whose ith entry is α_i, and α_i is a parameter for x_i that balances its initial label information against the label information received from its neighbors during the iteration. For labeled samples x_i (i ≤ l), α_i should be close to 0, such as α_i = 0.1, so that the known label dominates; for unlabeled samples x_i (l + 1 ≤ i ≤ n), α_i should be close to 1, such as α_i = 0.9. The iterative process converges to

F* = (I − I_α S)^(−1) (I − I_α) Y,

where each row of F* is normalized so that it sums to 1. For an unlabeled sample x_i, its class is predicted as j = arg max_j F*_ij: if j ≤ C, x_i is recognized as the known class j; if j = C + 1, x_i belongs to the new emerging class.
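The iterative propagation described above can be sketched as follows (a NumPy sketch; row normalization of the weight matrix is assumed, and the function name is illustrative):

```python
import numpy as np

def label_propagation(W, Y, alpha, iters=200):
    """Iterate F <- I_alpha S F + (I - I_alpha) Y, where S is the
    row-normalised weight matrix. alpha_i is small for labeled samples
    (keep the initial label) and large for unlabeled ones (absorb the
    neighbours' labels)."""
    S = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    F = Y.astype(float).copy()
    for _ in range(iters):
        F = alpha[:, None] * (S @ F) + (1.0 - alpha)[:, None] * Y
    # normalise each row to sum to 1 before taking the argmax
    return F / np.maximum(F.sum(axis=1, keepdims=True), 1e-12)
```

As a usage example, two labeled anchors connected to one unlabeled neighbor each propagate their labels to those neighbors, so the unlabeled samples inherit the class of their connected cluster.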

B. THE ADAPTIVE LABEL PROPAGATION ALGORITHM
In the k-nearest neighbor method, the value of k is usually determined by experience, which affects the performance of the algorithm. In this paper, an adaptive algorithm based on manifold learning theory is proposed to solve this problem. Compared with the Euclidean space, a nonlinear distribution can be better characterized by measuring local geometric features in the manifold space. Thus, a manifold curvature index is used to determine the number of neighbors of each sample in the manifold space. For a sample x_i, the manifold curvature r_i is defined as

r_i = Σ_{x_j ∈ X_i} d_e(x_i, x_j) / Σ_{x_j ∈ X_i} d_g(x_i, x_j),

where k is the initial neighborhood number and X_i represents the neighborhood set of sample x_i. d_e(x_i, x_j) denotes the Euclidean distance between sample x_i and its neighborhood sample x_j, and d_g(x_i, x_j) represents the geodesic distance between them; a detailed geodesic distance calculation method can be found in reference [29]. The smaller the ratio of the Euclidean distances within X_i to the corresponding geodesic distances, the more curved the local manifold around sample x_i; in order to better characterize the nonlinear properties of sample x_i, its neighborhood set should then contain fewer neighbors. That is, the smaller the value of r_i, the smaller the value of k_i.
The adaptive neighborhood number is then computed as

k_i = ceil(2 r_i k),

where k is the initial neighborhood number and ceil(a) denotes the smallest integer not less than a.

IV. PROPOSED METHOD

A. INPUT DATA OF DCVAN-ALP MODEL
The vibration data are processed by a multiscale feature extraction method, and high-dimensional features are constructed as the input of the proposed model. The steps of the vibration data preprocessing are as follows:
1) The raw vibration signals are decomposed to level 4 by the DTCWPT [30] method, so that 16 frequency-band signals are obtained.
2) Twenty-four time-domain and frequency-domain features are extracted from each of the 16 frequency-band signals. The 384 extracted features of each sample are then combined to construct the original high-dimensional feature set.
The twenty-four feature parameters [30], [31] selected in this work are shown in Table 1. Eleven parameters (p_1-p_11) are time-domain statistical characteristics and thirteen parameters (p_12-p_24) are frequency-domain statistical characteristics.
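A few of the classic time-domain statistics typically found in such feature sets can be sketched as follows (an illustrative subset only; the full 24-parameter list of Table 1 is not reproduced, and the dictionary keys are our own labels):

```python
import numpy as np

def time_domain_features(x):
    """A handful of standard time-domain statistics (mean, RMS, peak,
    kurtosis, crest factor) computed on one frequency-band signal."""
    x = np.asarray(x, dtype=float)
    mean = np.mean(x)
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    std = np.std(x)
    return {
        "mean": mean,
        "rms": rms,
        "peak": peak,
        "kurtosis": np.mean((x - mean) ** 4) / std ** 4,
        "crest_factor": peak / rms,
    }
```

For a pure sine wave these take their textbook values (RMS = 1/√2 of the amplitude, crest factor √2, kurtosis 1.5), which is a convenient sanity check for any feature extraction code.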

B. STRUCTURE OF DCVAN-ALP MODEL
The DCVAN-ALP model consists of three parts, as shown in Figure 3. Stage I is the training and fine-tuning process of the deep convolution variational autoencoder, Stage II reduces the dimension of the hidden-layer features z by t-SNE, and in Stage III the improved label propagation algorithm is applied to fault diagnosis.
Stage I:
(1) Training process: In the encoding stage, convolutional layers are applied to construct the VAE. Conv1 is the first convolutional layer and is connected to the input layer; the symbol 16@4×1 denotes 16 feature surfaces with a convolution kernel of size (4, 1), and the stride is (2, 1), which indicates that the sliding step is 2 in the longitudinal direction and 1 in the transverse direction of the feature surfaces. Batch normalization [32] is added right after Conv1, which reduces the internal covariate shift and accelerates the training process, and the Rectified Linear Unit (ReLU) is used as the activation function. A max-pooling layer with a step size of 2 is then added after Conv1. The output of the first pooling layer is used as the input of Conv2, the second convolutional layer; similarly, batch normalization and max-pooling are applied right after it. The mean (µ) and the logarithm of the variance (log σ²) of the hidden layer are output through a fully connected layer (Dense) with 100 neurons, and the hidden-layer features (z) are obtained by the reparameterized sampling method. Since the VAE is an unsupervised learning method, it reconstructs the input data to complete the training by implementing the decoding process. The decoding process is the reverse of the encoding process, with deconvolution replacing the convolution operation. Meanwhile, the local and non-local information is embedded into the loss function during training.
(2) Fine-tuning process: The hidden-layer features (z) of the training data are extracted as the input of a Softmax layer; the labels of the training data and the output of the Softmax layer then jointly construct a cross-entropy loss function, and the network parameters are updated through reverse fine-tuning to complete the training.
For the convenience of comparison, the model proposed in Stage I is named deep convolution variational autoencoder network (DCVAN).
Stage II: After the training of the DCVAN model is completed, the hidden-layer features (z) of the training data and testing data are extracted, respectively. The t-distributed stochastic neighbor embedding (t-SNE) [33] algorithm is adopted to obtain low-dimensional features of the training data and testing data.
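This dimension reduction step can be sketched with scikit-learn's t-SNE implementation (random stand-in data here in place of the real hidden-layer features; the perplexity value is an illustrative choice):

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for the hidden-layer features z of training + testing data:
# 60 samples with a 100-dimensional latent representation.
rng = np.random.default_rng(0)
z = rng.standard_normal((60, 100))

# Map the latent features to 3-D for the label propagation stage.
z_3d = TSNE(n_components=3, perplexity=15, init="pca",
            random_state=0).fit_transform(z)
print(z_3d.shape)
```

Note that t-SNE has no out-of-sample transform, which is why training and testing features are embedded together in one call, matching the joint processing of both sets described above.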
Stage III: The low-dimensional features of the training data and testing data are taken as the input of the label propagation algorithm to predict the labels of the testing data, thereby completing the diagnosis of both the known faults and the new emerging fault. For comparison in this paper, the diagnosis model with a fixed value of the parameter k in the label propagation algorithm is named DCVAN-LP, and the diagnosis model embedded with the adaptive label propagation algorithm is called DCVAN-ALP.

C. DIAGNOSIS PROCEDURE
The proposed DCVAN-ALP method includes the following procedures.
Step 1: The vibration signals of the training and testing samples are decomposed to level 4 by the DTCWPT method, and the 24 feature parameters are extracted from each frequency-band signal to construct the 384 initial features of each sample.
Step 2: The initial high-dimensional features of the training samples and their labels are used as the input of the DCVAN model in Stage I; after the training and fine-tuning processes are completed, the hidden-layer variables (z) of the training and testing samples are extracted, respectively.
Step 3: The t-SNE algorithm is applied to reduce the dimension of the hidden-layer variables (z) of the training and testing samples.
Step 4: The three-dimensional features of the training and testing samples are processed by the adaptive label propagation algorithm to predict the labels of the testing samples.

V. CASE STUDIES AND EXPERIMENTAL RESULTS
Two case studies are presented in this section to examine and validate the effectiveness of the proposed DCVAN-ALP model. Case 1 focuses on the benchmark data provided by the Case Western Reserve University (CWRU) bearing data center [34]. Case 2 is devoted to the damage data of ammunition swing mechanism from an artillery experimental platform.
A. CASE 1: ANALYSIS OF THE BENCHMARK DATA SET FROM CWRU

1) DATA SOURCES DESCRIPTION
The basic layout of the test rig is shown in Figure 4. The data sets were collected at a sampling frequency of 12 kHz under four different loads and speeds (0 hp/1797 rpm, 1 hp/1772 rpm, 2 hp/1750 rpm and 3 hp/1730 rpm). In this study, data sets S1, T1, S2, T2, S3, T3 and S4, T4 are constructed from the bearing data under the four working conditions, as shown in Table 2; the purpose is to test the performance of the proposed model in diagnosing a new emerging fault class under a single working condition or variable working conditions. The training data include 7 classes, and the testing data contain 8 classes; 90 samples of each class are taken as training samples, and 25 samples of each class are constructed as testing samples. For instance, S1-T1 means to train the diagnostic model with the training samples of data set S1 (7 classes) and to test with the testing samples of data set T1 (8 classes) under the same working condition (load 0 hp, speed 1797 rpm); S1-T2 means to train with the training samples of data set S1 (load 0 hp, speed 1797 rpm, 7 classes) and to test with the testing samples of data set T2 (8 classes) under a different working condition (load 1 hp, speed 1772 rpm).
To demonstrate the fault diagnosis capability of the proposed model, several methods are applied for comparison, with the same input data for all models. A brief introduction of each method is as follows.
(1) SVM: SVM is a representative traditional classifier and is widely used in fault diagnosis.
(2) SDAE: SDAE is a sparse auto-encoder-based deep neural network [35]; as an unsupervised learning method, it has been successfully applied to fault classification.
(3) CNN: CNN is an intelligent fault diagnosis method; in this work the model adopts a five-layer convolutional structure, the feature surfaces of the convolutional layers are 16, 32, 64, 64 and 64, respectively, and the fully connected layer contains 100 neurons and is connected to the Softmax layer.
As the number of training samples of each class is the same, the initial value of k in the DCVAN-LP and DCVAN-ALP models must simply be less than the number of training samples of each class. In this case, the initial value of k is set to 30.

2) EXPERIMENTAL RESULT AND ANALYSIS
Each diagnostic model is run 10 times to reduce the influence of randomness, and the diagnostic accuracies are averaged, as shown in Figure 5 and Table 3. It can be seen that the proposed DCVAN-ALP model significantly outperforms the other compared models in domain adaptation.
The diagnostic results basically fall into two groups, one below 87.5% and the other above 87.5%. For example, when adapting from domain S1 to T2, the average accuracies of SVM, SDAE, CNN, DCVAN, DCVAN-LP and DCVAN-ALP reach 85.5%, 87.5%, 87.5%, 87.5%, 95.6% and 98.9%, respectively. To show the diagnosis information more clearly, the confusion matrices of SVM, DCVAN and DCVAN-ALP are displayed in Figure 6, Figure 7 and Figure 8, respectively.
In order to evaluate the influence of the value of the parameter k in the label propagation algorithm on the diagnosis results, and as the number of training samples of each class is 90, the DCVAN-LP and DCVAN-ALP models are compared while the value of k varies from 10 to 90. The diagnosis accuracies of DCVAN-LP and DCVAN-ALP in the domain adaptation tasks S1 to T4 and S4 to T2 can be seen in Table 4. For example, in task S1 to T4, when the initial value of k is equal to 30, the parameter k_i (the number of neighbors of the ith sample) keeps the value 30 unchanged in the DCVAN-LP model, and the average accuracy is 90%. According to formula (12), however, the value of k_i varies in the DCVAN-ALP model, which helps construct the neighborhood of each sample; in fact, the values of k_i are 60, 58, 60, 54, 59, 57, 53, 48, 51 and so on, and the average accuracy is 96.25%. Moreover, in task S4 to T2, when the initial value of k is equal to 40 and k_i keeps the value 40 unchanged in the DCVAN-LP model, the average accuracy is 95.1%. In the DCVAN-ALP model, however, the values of k_i are 77, 78, 80, 39, 79, 76, 63, 39, 37 and so on, and the average accuracy is 99.1%.
As shown in Figure 9 and Figure 10, although the initial value of the parameter k is determined empirically, the proposed DCVAN-ALP model based on the improved label propagation algorithm performs more accurately and stably than the DCVAN-LP model in the domain adaptation tasks as the initial value of k varies. In this case, the initial value of k in the DCVAN-ALP model can be set between 30 and 90, and the average accuracy is higher than 95%.
To improve the discriminability of the hidden-layer features, the local and non-local information is embedded into the loss function of the DCVAN-ALP model. To better understand the effect of the added local and non-local information, the distributions of the hidden-layer variables (z) of the testing data are shown in Figure 11 and Figure 12, where scatter points of the same color represent samples from the same class. Obviously, the model that considers the local and non-local information makes the samples of the same class gradually gather and the samples of different classes gradually scatter, which is conducive to fault diagnosis.

B. CASE 2: ANALYSIS OF THE DAMAGE DATA OF AMMUNITION SWING MECHANISM

1) DATA SOURCES DESCRIPTION
The experimental data come from the bench test of an ammunition swing mechanism, as shown in Figure 13. Acceleration sensors are arranged near the vulnerable parts, such as the press plate and the roller, to collect the vibration acceleration signals generated during the up and down swing of the swing arm; the sampling frequency is 10 kHz. In addition, the press plate opens and closes during the recoil and re-entry of the barrel, and its damage is shown in Figure 14. The roller rolls in the groove during the re-entry process, and its crack damage is shown in Figure 15.
The experimental data include three classes: the normal state of the swing mechanism, the damage of the press plate, and the crack of the roller. In this section, the data set of each class contains 40 samples for training and 20 samples for testing, as shown in Table 5. As can be seen, data set S1 contains 2 classes (normal and press plate damage), while data set T includes 3 classes (normal, press plate damage, and roller crack). For example, S1-T means to train the diagnostic model with the training samples of data set S1 and to test with data set T.
The time-domain vibration signals of the three states of the ammunition swing mechanism are shown in Figure 16, and the time-domain diagram of the vibration signal in the dotted box of Figure 16 is amplified in Figure 17, which shows the down-swing and up-swing processes in an action cycle; the impact during the operation of the ammunition swing mechanism is large.

2) EXPERIMENTAL RESULT AND ANALYSIS
As the number of training samples of each class is the same, the initial value of k in the DCVAN-LP and DCVAN-ALP models must simply be less than the number of training samples of each class. In this case, the initial value of k is set to 30. Each diagnostic model is repeated 10 times to measure the average accuracy, as shown in Figure 18 and Table 6. The confusion matrices of SVM, DCVAN and DCVAN-ALP are displayed in Figure 19, Figure 20 and Figure 21, respectively. As shown in Figure 19, all samples of the normal state and the new emerging fault class are wrongly diagnosed as the roller crack state; the average number of misdiagnosed samples is 40 over 10 trials, and the average accuracy is 33.33%. Besides, as shown in Figure 20, the two existing classes can be correctly diagnosed by DCVAN; however, all samples of the new emerging fault class (press plate damage) are wrongly diagnosed as the normal state or the roller crack state, so the average accuracy is 66.67%. In contrast, as shown in Figure 21, all samples of the new emerging fault class (press plate damage) can be correctly diagnosed by DCVAN-ALP, while a few samples of the normal state and roller crack state are wrongly diagnosed as the new emerging fault class; the average number of misdiagnosed samples is 1.5 over 10 trials, and the average accuracy is 97.5%. A diagnostic accuracy lower than 66.67% indicates that the method can neither accurately diagnose the existing fault classes nor recognize the new emerging fault class; in other words, the traditional classifier and the deep learning models without the label propagation mechanism cannot recognize the new emerging fault class.
In order to evaluate the influence of the value of the parameter k in the label propagation algorithm on the diagnosis results, and as the number of training samples of each class is 40, the DCVAN-LP and DCVAN-ALP models are compared while the value of k varies from 5 to 40. The diagnosis accuracies of DCVAN-LP and DCVAN-ALP in the tasks S2 to T and S3 to T can be seen in Table 7.
For example, in task S2 to T, when the initial value of k is equal to 10, the parameter k_i (the number of neighbors of the ith sample) keeps the value 10 unchanged in the DCVAN-LP model, and the average accuracy is 95.67%. According to formula (12), however, the value of k_i varies in the DCVAN-ALP model, which helps construct the neighborhood of each sample; in fact, the values of k_i are 12, 18, 20, 14, 16, 5, 10, 19, 17 and so on, and the average accuracy is 97.17%. Moreover, in task S3 to T, when the initial value of k is equal to 30 and k_i keeps the value 30 unchanged in the DCVAN-LP model, the average accuracy is 94.33%. In the DCVAN-ALP model, however, the values of k_i are 32, 31, 30, 29, 30, 32, 28, 33, 34 and so on, and the average accuracy is 96.17%. As shown in Figure 22 and Figure 23, the proposed DCVAN-ALP model based on the improved label propagation algorithm performs more accurately than the DCVAN-LP model as the initial value of k varies. In this case, the initial value of k in the DCVAN-ALP model can be set between 20 and 40, and the average accuracy is higher than 95%.
To better understand the effect of the added local and non-local information in the DCVAN-ALP model, task S1 to T is taken as an example, and the visualizations of the distribution of the hidden-layer variables (z) of the testing data are shown in Figure 24 and Figure 25. Obviously, the model that considers the local and non-local information makes the samples of the same class gradually gather and the samples of different classes gradually scatter. However, a few samples from different classes still overlap in the feature space, as shown in Figure 24, which may lead to misdiagnosis; hence the average accuracy is 98.33%, as shown in Table 6.

VI. CONCLUSION
This study proposed a novel method called DCVAN-ALP for mechanical fault diagnosis. The local and non-local information among samples is embedded into the loss function for training, which compresses intra-class samples and separates inter-class samples in the feature space. The extracted discriminant features are then fed to the adaptive label propagation algorithm for fault diagnosis, including the new emerging fault class. It should be noted that no information about the new emerging fault class is provided in advance for training. Compared with the traditional classifier and other deep learning methods, the effectiveness and superiority of the proposed model are verified on bearing fault data under multiple working conditions and on damage data of an ammunition swing mechanism under a single working condition.
In the future, it is necessary to study how to improve the robustness and domain adaptability of the model under continuously changing working conditions or with an imbalanced number of training samples, and to reduce the time cost. In addition, we will further investigate applying this model to other engineering applications to improve the safety of industrial processes and reduce risks.