Fault Diagnosis Method Based on Encoding Time Series and Convolutional Neural Network

Traditional fault diagnosis methods based on time-domain vibration analysis require complicated calculations of feature vectors, depend heavily on prior diagnostic knowledge and on hand-designed feature extraction algorithms, and have limited ability to extract the complex relationships in fault signals. To address these shortcomings, this paper proposes a convolutional neural network (CNN) framework for machine health monitoring based on encoding one-dimensional (1-D) time series as two-dimensional (2-D) images. A new Gram matrix is defined and, using the Python programming environment, rendered as 2-D images, so that feature extraction and recognition can be performed by CNNs. The proposed method was tested on two datasets, a multi-stage centrifugal fan dataset from our lab and the motor bearing dataset from Case Western Reserve University, and achieved average prediction accuracies of 94.07% and 96.28%, respectively. The results were compared with other deep learning and traditional methods, including a recurrent neural network (RNN), support vector machines (SVM), a Multi-Genetic algorithm, a shallow CNN and a BP neural network. The comparison shows that the proposed method improves fault diagnosis accuracy and stability relative to these techniques.


I. INTRODUCTION
The complexity of rotating unit drive systems and the diversity and complexity of operating conditions lead to non-stationary and non-linear vibration signals of the rotating units [1]. The fuzzy, mixed and multi-coupled signals caused by all kinds of excitation sources result in fault shock characteristics that are difficult to extract from strong background signals and noise, especially given the weak early impact characteristics of rotating units. More specifically, it is difficult to extract the expected fault feature information from the sensor signals of rotating units used in petrochemical rotary sets, because they operate in harsh environments and may be affected by wind, rain, strong noise, high pressure, high temperature, poisonous or harmful gases, equipment corrosion, and so on. As a result, there are different types of faults, such as value faults, timing faults, and format faults. This renders traditional fault diagnosis methods based on frequency domain analysis ineffective and inhibits effective and quick fault diagnosis in petrochemical rotary sets [2], [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Zhanyu Ma.
In recent years, many experts have made progress on timing and format fault detection. Xiong [4] proposed a simple and effective model-based sensor fault diagnosis scheme to detect and isolate the fault of a current or voltage sensor for a series-connected lithium-ion battery pack. Wang [5] proposed a novel gearbox fault diagnosis method based on maximum kurtosis spectra, which provides a new approach to fault diagnosis. For the fault monitoring of petrochemical units, scholars have proposed many new theories and methods. Although more and more scholars apply traditional artificial intelligence methods [6], such as support vector machines (SVM) [7], K-nearest neighbors [8] and random forests [9], to petrochemical rotary set health monitoring and obtain valuable results, it remains difficult to extract complex information from the original data effectively. Artificial neural networks with a shallow structure have a limited ability to extract information from the complex nonlinear relationships hidden in such data. As a hot sub-field of machine learning, deep learning can serve as a bridge between mechanical data and health monitoring of petrochemical rotating units [10], [11]. Many researchers have introduced deep learning into intelligent mechanical fault diagnosis [12], [13] and have summarized the research on such methods [14], [15]. Recently, new intelligent fault diagnosis methods have been developed. For example, to overcome some deficiencies of the traditional sparse autoencoder, Wang [16] introduced batch normalization, a recently developed optimization method, into deep neural networks to achieve fast intelligent fault diagnosis of machines. Fan [17] presented an intelligent fault diagnosis method for rolling bearings based on FCM clustering of vibration images obtained by EMD-PWVD.
Shao [18] proposed a new framework for rotor-bearing system fault diagnosis under varying working conditions by using a modified CNN with transfer learning. The results show that these intelligent fault diagnosis methods provide an efficient means for troubleshooting rotating equipment. Building on the confirmed good performance of CNNs in dealing with sparse data [19] and the advantages of intelligent fault diagnosis, in this study we convert a signal to an image using a novel methodology. To reduce as far as possible the need for expert knowledge and the problems associated with it [20], the proposed method does not require knowledge of any predetermined parameters, and thus constitutes a novel approach towards real-time monitoring and data processing systems [11].
The main contributions of this paper are summarized as follows:
1) An adaptive fault diagnosis method for petrochemical units is proposed. A two-dimensional image encoded from a one-dimensional time series is regarded as a feature representation. The encoded image contains all the fault information, which greatly reduces the dependence on existing experience and knowledge.
2) A fault diagnosis method based on encoding time series and CNNs is established to extract the hidden connections in the mapped one-dimensional time series and to establish the non-linear mapping between the sampled time series and machine health. We study the influence of the structure and parameters of the CNN, especially some key parameters, on the identification accuracy, which helps to realize feature acquisition and recognition using a CNN.
3) The parameter selection for this design and the influence of key parameters are studied in depth for the case of petrochemical unit fault diagnosis. The 1-D time series encoding method that is introduced is easy to apply and does not require much preprocessing of the original data. This makes the solution convenient for handling large amounts of data and opens new directions for further work.
The rest of this paper is organized as follows. Section II presents related work in the field. Section III discusses the relevant theory, establishing a method for processing time signals with image processing techniques and CNN theory. Section IV presents the experimental environment and analysis; data sets collected at different times are used to test the method, and the results are presented and discussed. The last part is the summary and acknowledgement.

II. RELATED WORK
In this section, we describe related work on transforming a 1-D vibration signal into a 2-D image for fault classification, and related work on fault diagnosis by deep learning.

A. TRANSFORMING A 1-D VIBRATION SIGNAL INTO A 2-D IMAGE
Ding [21] first transformed the vibration signals into wavelet packet images and then used CNNs to recognize the health condition of bearings. Long [20] proposed a data-driven fault diagnosis method based on CNNs, aiming to tackle the limitation that traditional data-driven fault diagnosis methods rely on experts to extract features, and determined the influence of manually extracted features on fault diagnosis with good results. Since then, motivated by the excellent performance of deep learning in image processing, researchers have tried to map the original one-dimensional signals to two-dimensional images, that is, to represent the vibration data of the machine using two-dimensional images. Chong [22] proposed an effective way to determine the cause of the failure characteristics of an induction motor from a vibration signal. This method transforms a 1-D vibration signal into a 2-D gray image and has high fault classification accuracy. Kang and Kim [23] proposed a 2-D representation of Shannon wavelets for highly reliable fault diagnosis of multiple induction motor defects, and this method was shown to achieve higher classification accuracy even in noisy environments. Lu [24] put forward a method to convert signals into images and classify these images with a probabilistic neural network (PNN). This approach verified that image-based methods have higher accuracy and provided an efficient means for mechanical monitoring of rotating units. Yuan [25] proposed using the Continuous Wavelet Transform to convert 1-D original vibration signals into 2-D time-frequency images, addressing the problem that training emerging deep learning methods requires a large number of fault signals. However, these methods have limitations, and most of them rely on expert knowledge.
VOLUME 8, 2020

B. FAULT DIAGNOSIS
Xiong [26] and other scholars proposed a data fusion method relying on Mutual Dimensionless, addressing the problem that data fusion in traditional methods is not accurate enough. In this method, the original data were collected in real time and dimensionless calculations were carried out to obtain five dimensionless indexes for each data set. The dimensionless indexes were then used to process the original data, which solved the problem that the original dimensionless indexes were not perfect and led to low fault diagnosis accuracy. Other scholars [27] proposed a dimensionless basis optimization algorithm for the SVM based on a multi-genetic algorithm, addressing the problem that overfitting or underfitting can sometimes occur because SVM algorithms depend strongly on particular parameters and lack a systematic theory of parameter selection. Recently, many researchers have proposed new methods of fault diagnosis. In the time-frequency domain, Attoui [28] proposed a different solution to automatic machinery condition monitoring based on signal similarity measurement in the time-frequency domain. Zheng [29] proposed a new sparse elitist group lasso denoising algorithm in the frequency domain to detect the incipient impulse-based fault feature, which does not require prior knowledge. Chen [30] and other scholars studied the data-based diagnosis of mixed faults coming from multiple components, with an emphasis on model robustness against a wide spectrum of external perturbation. This method solved the problem that the original wide spectrum was not perfect. In terms of wavelet transform, in view of the shortcomings of traditional singular value decomposition, Hua [31] introduced an enhanced SVD to detect bearing faults.
In terms of deep learning, in view of the shortcomings of traditional fault diagnosis methods based on time domain vibration analysis, which require complicated calculations of feature vectors, Duan [32] proposed a learning framework called deep focus parallel CNN. These methods provide an efficient means for troubleshooting rotating equipment. In this paper we propose a CNN framework for mechanical fault identification and classification based on the encoding of 1-D time series to 2-D images [33], [34].

III. THEORETICAL BACKGROUND
In 2006, to address the complexity of the neural networks required to learn the necessary characteristics and the difficulty of training them, Hinton [35] first proposed deep belief networks together with a training algorithm based on restricted Boltzmann machines. This algorithm solved the problem that traditional training strategies for multi-layer neural networks obtain only locally optimal solutions or cannot guarantee convergence. However, for 1-D time series it is difficult to determine the training characteristics of the neural network, owing to the difficulty of constructing a prediction model and of training recurrent neural networks [33]. On the other hand, the recognition performance of deep learning networks on 2-D images has been demonstrated [36], [37]. Therefore, the problem studied in this paper is how to map a 1-D vibration signal to a 2-D representation. We propose a fault diagnosis framework based on encoding the time series and a CNN, as shown in Fig 1. The time series encoding theory and derivation are as follows. Considering the values of the 1-D time series and their corresponding timestamps, and so that the inner product is not biased towards the maximum observed value, equation (1) is used to normalize the time series to the range [−1, 1].
Definition 1: A new correlation equation is defined, in which the values of the time series and their corresponding timestamps are encoded as the angle and the radius, respectively. The angle is calculated using arccos(x), and its values lie within [0, π]. To calculate the radius, the interval [0, 1] is split into N equal parts, and the N + 1 separation points [0, · · · , N] are associated in order with the time series. The correlation equations are shown in (2) and (3).
As shown in Fig 2, the time series is mapped to polar coordinates by a bijective function, and the time dependence of the series is maintained by the r coordinate. Combined with Hypothesis 1, the polar coordinate r tends to 1 as time increases, and the combinatorial mapping is well realized. In this paper, Definition 1 is used to construct a bidirectional mapping between a 1-D sequence and 2-D space to ensure the integrity of fault information.
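The rescaling and polar encoding described above can be sketched in a few lines of Python. Equations (1) to (3) are not reproduced in this text, so the sketch assumes the usual min-max rescaling to [−1, 1], the angle arccos of the rescaled value, and the radius t/N for the t-th sample; the function names `rescale` and `to_polar` are hypothetical and chosen only for illustration.

```python
import math

def rescale(series):
    """Min-max rescale a sequence into [-1, 1] (assumed form of equation (1))."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    return [((x - hi) + (x - lo)) / (hi - lo) for x in series]

def to_polar(series):
    """Return (angle, radius) pairs for a 1-D series.

    angle  = arccos(rescaled value), which lies in [0, pi]
    radius = t / N for t = 1..N (an assumption about how timestamps are mapped)
    """
    scaled = rescale(series)
    n = len(scaled)
    # Clamp to guard against tiny floating-point overshoot outside [-1, 1].
    angles = [math.acos(max(-1.0, min(1.0, v))) for v in scaled]
    radii = [t / n for t in range(1, n + 1)]
    return list(zip(angles, radii))

if __name__ == "__main__":
    for angle, radius in to_polar([0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0]):
        print(f"angle={angle:.4f} rad, radius={radius:.4f}")
```

Because arccos is monotonic on [−1, 1], each rescaled value corresponds to exactly one angle, which is what makes the mapping invertible.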

B. NON-SPARSITY
The inner product in a two-dimensional polar space has several limitations because the norm of each vector is adjusted according to time dependence. When the inner product of an observed value with itself is calculated, a biased norm is obtained, which is not conducive to our purposes. Therefore, we must define an inner product operation that depends only on angles.
Definition 2: The inner product is obtained through an operation between two vectors that measures their ''similarity''. It allows the use of concepts from traditional Euclidean geometry: length, angle, and orthogonality in two and three dimensions.
In 2-D space, two fault signal vectors, δ and a second vector, are defined as shown in equations (4) and (5).
By setting the norms of both vectors to 1, equation (6) is obtained.
Equation (6) shows that, for unit vectors, the inner product is determined only by the angle between them, which is expressed in radians. The calculated values lie in [−1, 1].
Hypothesis 2: In order to retain the time dependence of the vibration signal, the Gram matrix is introduced below. Moving from the upper left corner to the lower right corner corresponds to increasing time, so the time factor is embedded in the geometric structure. Given a group of 1-D fault vibration time series, the Gram matrix of n vectors is the matrix defined by the inner product of each pair of vectors, as shown in equation (7).
Assume that all 2-D vectors are unit vectors, as shown in equation (8).
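As a small numerical check of the unit-vector case, the sketch below builds a Gram matrix from 2-D unit vectors. It relies only on the standard identity that the inner product of two unit vectors equals the cosine of the angle between them; the function name `gram_matrix` and the sample angles are illustrative assumptions, not taken from the paper.

```python
import math

def gram_matrix(vectors):
    """Gram matrix: entry (i, j) is the inner product of vectors i and j."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]

# Illustrative angles (radians); each one defines a unit vector (cos a, sin a).
angles = [0.0, math.pi / 3, math.pi / 2, math.pi]
unit_vectors = [(math.cos(a), math.sin(a)) for a in angles]
G = gram_matrix(unit_vectors)

if __name__ == "__main__":
    for i, row in enumerate(G):
        for j, value in enumerate(row):
            # For unit vectors the entry equals cos(angle_i - angle_j).
            expected = math.cos(angles[i] - angles[j])
            print(f"G[{i}][{j}] = {value:+.4f} (cos of angle difference: {expected:+.4f})")
```

Every entry lies in [−1, 1] and the diagonal is 1, since each unit vector makes a zero angle with itself.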
Since univariate time series are 1-D, the inner product cannot distinguish valuable information from Gaussian noise. This is shown in Fig 3, where the Gram matrix values of a time series of length N were used to obtain the density histogram and the three-dimensional (3-D) graph of the inner product. The density map shows that the output of the Gram matrix is non-sparse and appears to follow a zero-centered Gaussian distribution. The more closely the data follow a Gaussian distribution, the more difficult it is to distinguish them from Gaussian noise, so the resulting picture is also noisy. The histogram and the 3-D inner product graph of the Gram matrix output therefore indicate that, before the Gram matrix is formulated, the data must be processed to remove the noise.
Definition 3: Any operation similar to the inner product inevitably converts the information of two different observation values into a single value. For this purpose, an alternative operation that retains the information given by the two angles is defined, as shown in equation (9).
where ξ is the angle between δ and the second vector. Because this operation does not satisfy linearity and positive definiteness, it is denoted by a different symbol from the inner product, and we define the Gram-like matrix of equation (10). The original values of the time series after scaling constitute the diagonal of the Gram matrix, and the time series can be reconstructed from the high-level features learned by the neural network. The temporal correlation is imposed by the direction of the time interval K, which can be interpreted according to the relative correlation. In addition, a new sparse Gramian angular field is obtained from Definition 3, and its distribution density is shown in Fig 4. It can be seen that the matrix mapping image is easy to distinguish from noise, which allows us to fully utilize the advantages of CNNs in sparse data processing [23].
Definition 4: In order to explain Definition 3 and obtain the sparsity of the new Gram matrix and its density distribution histogram, a new penalty function is constructed, as shown in equation (11).
$$\cos(\xi_1 + \xi_2) = \cos\xi_1 \cos\xi_2 - \sin\xi_1 \sin\xi_2 \tag{11}$$
where $\xi_1 = \arccos(\delta)$ and $\xi_2$ is the arccosine of the second value.
A new penalty function can be defined using equation (11), as shown in equation (12). To examine the behavior of the penalty function from Definition 4, its 3-D and density diagrams were simulated and analyzed; the 3-D diagram, set against that of the traditional inner product, is shown in Fig 5. From the density distribution histogram in Fig 4 and the 3-D graph, it can be seen that the main property of the function is that it moves the average output towards −1. The penalty is largest when δ and the second value are both 0, because these points are closest to the Gaussian values. If δ is equal to the second value, the output is −1. Compared with the Gram matrix above, the output of the Gramian angular field is easier to distinguish from Gaussian noise, which gives full play to the advantages of neural networks for image processing.
The time coding method used in this paper has several advantages. It maintains time dependence, which is retained by the Gram matrix. Since time increases as the position moves from the upper left corner to the lower right corner, the time dimension is encoded in the geometric structure. The map constructed between the 1-D time series and 2-D space is bidirectional, so no information is lost. With the new definition of the Gram matrix, the absolute time relation is preserved relative to Cartesian and polar coordinates. The diagonal consists of the original values of the time series after scaling. The temporal correlation is explained through the relative correlation by superposition of the direction of time interval k.
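The angle-sum expansion in equation (11) can be checked numerically with a short sketch. It assumes that the Gram-like matrix has entries cos(ξ_i + ξ_j), with ξ the arccosine of a value already scaled to [−1, 1]; this is one reading of Definition 3, since equations (9), (10) and (12) are not reproduced here. The names `gaf` and `gaf_closed_form`, and the placeholder names x and y for the two scaled values, are illustrative only.

```python
import math

def gaf(scaled):
    """Gram-like matrix with entries cos(xi_i + xi_j), xi = arccos(value).

    Every value in `scaled` is assumed to lie in [-1, 1].
    """
    xi = [math.acos(v) for v in scaled]
    return [[math.cos(a + b) for b in xi] for a in xi]

def gaf_closed_form(scaled):
    """Same matrix from the expanded form x*y - sqrt(1 - x^2) * sqrt(1 - y^2).

    This uses cos(arccos x) = x and sin(arccos x) = sqrt(1 - x^2) on [-1, 1].
    """
    root = [math.sqrt(1.0 - v * v) for v in scaled]
    return [[x * y - rx * ry for y, ry in zip(scaled, root)]
            for x, rx in zip(scaled, root)]

if __name__ == "__main__":
    values = [-1.0, -0.5, 0.0, 0.3, 1.0]
    direct, expanded = gaf(values), gaf_closed_form(values)
    worst = max(abs(direct[i][j] - expanded[i][j])
                for i in range(len(values)) for j in range(len(values)))
    print(f"largest difference between the two forms: {worst:.2e}")
```

The expanded form avoids the arccos and cos calls, and it makes explicit the second term that is subtracted from the ordinary product x·y. A series of length n yields an n × n matrix, which is the growth from n to n² data points noted among the limitations.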
The method presented in this paper has limitations. For example, the main diagonal represents the original time from the upper left corner to the lower right corner. Since the number of data points grows from n to n², the data processing time also increases correspondingly, and the operation is not an inner product in the strict sense.
Through the above steps from Definition 1 to Definition 4, the double-mapping effect of the 1-D time series is obtained, and the process is shown in Fig 6 and 7.

C. CONVOLUTIONAL NEURAL NETWORK
CNNs form a branch of deep learning. They are a type of feed-forward neural network inspired by the animal visual cortex, whose study led researchers to examine the relationship between layers [38]. CNNs can automatically learn hierarchical features from the input images; features from the higher levels are more invariant than those from the lower levels, and such invariant features are conducive to accurate classification and automatic learning. A typical feed-forward neural network of this kind is constructed using multiple filters. These filters extract features from the input data, and the convolution and pooling of the filtered data are carried out at the convolutional and pooling layers, respectively. As more layers are stacked, the input data gradually lose their topological structure and are eventually abstracted [15]. The process of CNN operation is shown in Fig 8. CNNs consist of three layer types: convolutional layers, pooling layers and fully connected layers. In the convolutional layers, the input image is convolved with the weights in the convolution kernel, and a new feature map forms the output. The weighted feature process is shown in Fig 9. Maximum pooling is the most commonly used pooling rule, as shown in Fig 10. In the classification task, the output layer is a softmax function used to predict the category. Given the confirmed good performance of CNNs in dealing with sparse data, several well-known CNN models have been developed, such as LeNet-5 [39], AlexNet [40] and VGGNet [41].
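The convolution and maximum pooling operations described above can be illustrated on a tiny image with plain Python lists. This is a minimal sketch with no padding and a stride of 1, not the network used in the paper; the function names and the sample kernel are hypothetical. As in most deep learning libraries, the "convolution" is implemented as a sliding dot product without flipping the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2 x 2 block (odd edges are dropped)."""
    return [[max(feature_map[i][j], feature_map[i][j + 1],
                 feature_map[i + 1][j], feature_map[i + 1][j + 1])
             for j in range(0, len(feature_map[0]) - 1, 2)]
            for i in range(0, len(feature_map) - 1, 2)]

if __name__ == "__main__":
    # 5 x 5 test image holding the values 1..25 row by row.
    image = [[5 * r + c + 1 for c in range(5)] for r in range(5)]
    kernel = [[1, 0], [0, -1]]  # illustrative diagonal-difference filter
    feature_map = conv2d_valid(image, kernel)
    print(feature_map)
    print(max_pool_2x2(feature_map))
```

A 5 × 5 input with a 2 × 2 kernel gives a 4 × 4 feature map, and the 2 × 2 pooling then halves each spatial dimension.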
The training models in this paper are designed on the basis of LeNet-5. In LeNet-5, the model contains only two alternating convolutional and pooling layers with one fully connected layer for images. The proposed CNN models contain five alternating convolutional and pooling layers with three fully connected layers. ReLU is selected as the activation function of the model. Compared with traditional activation functions, ReLU avoids the vanishing gradient problem more effectively [42].
The method in this paper also differs from the original LeNet-5 in that the classification network is trained by minimizing the cross-entropy loss, which better measures how closely the actual output approximates the expected output. Its definition is shown in equation (13).
Here L(·) is 0 or 1 according to the true label, and p is the output probability of the CNN model. The role of the cross-entropy loss is to measure the error between the predicted value and the actual value.
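A minimal sketch of the softmax output and the cross-entropy loss follows. Equation (13) is not reproduced in this text, so the sketch assumes the common form in which the loss is the negative sum, over classes, of the 0/1 label times the logarithm of the predicted probability. The function names and the `eps` guard are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot, probs, eps=1e-12):
    """Assumed form of equation (13): -sum_k L_k * log(p_k) for one sample."""
    return -sum(l * math.log(max(p, eps)) for l, p in zip(one_hot, probs))

if __name__ == "__main__":
    # Four classes, matching the four conditions N, MT, RW and OW.
    probs = softmax([2.0, 0.5, 0.1, -1.0])
    label = [1, 0, 0, 0]  # the true class is the first one
    print([round(p, 4) for p in probs])
    print(round(cross_entropy(label, probs), 4))
```

The loss is zero only when the probability assigned to the true class is 1, and it grows as that probability falls.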

IV. EXPERIMENTAL ENVIRONMENT AND ANALYSIS
Considering both the complexity of big data and the limitations of petrochemical plant health monitoring systems, in this study we propose a fault diagnosis method based on the combination of 1-D encoded time series and deep learning theory. Fault classification is realized by a CNN through multi-layer nonlinear learning of 2-D representations of the acquired time signal. The experimental environment and process of this method are discussed and analyzed below.

A. EMT490
We use the EMT490, a new-generation data collection and signal analysis instrument. The instrument is small, light and compact, which meets the demand for portable collection, large-scale field data collection and machine fault analysis.

B. EXPERIMENTAL ENVIRONMENT
A multi-stage centrifugal fan fault diagnosis unit was used to collect experimental data from large rotating machinery on an experimental petrochemical fault diagnosis platform, as shown in Fig 11. The graphics card used in this study was an NVIDIA GeForce GTX 1070. The unit included an 11 kW 5-stage centrifugal blower, along with the necessary transmissions, an inverter motor, a torque sensor, a conventional conversion shaft, gears and several bearings. On this fault diagnosis unit, we can simulate a variety of common faults, such as the abnormal sound of multi-stage centrifugal blowers.
We placed an EMT490 data acquisition probe at the designated location, labeled ''4'' in Fig 12. The experimental data were read and stored with the help of the Guangdong Provincial Key Laboratory system software. The EMT490 collected the instantaneous values of acceleration at a sampling frequency of 1024 Hz, and the time taken for each data set was 410 s. The initial data collected included chassis vibration acceleration values. Considering that the characteristics of the shaft differ depending on the fault location, different fault locations were analyzed. Generally, the fault type is determined by analyzing the vibration acceleration information of the rack. The failure data used in this experiment were 200 sets of measurements for each condition, such as normal and inner ring wear, collected on the same equipment at different time periods. For convenience, the data collected in each case were numbered. N, MT, RW and OW were used to represent normal, missing tooth of large gear, inner ring wear and outer ring wear, respectively. The specific numbers are shown in Table 1.

C. EXPERIMENTAL PROCEDURES
Step 1: The EMT490 data collector was used to collect the vibration acceleration of the housing for the four fault types at the same position of the housing; 50 sets of measurements were collected for each fault type on each of four occasions. Each set of data has 1024 vibration acceleration measurements obtained from the housing. The time-domain waveforms of the collected vibration signals are plotted in Fig 13.
Step 2: We used the EMT490 data management system to export the collected data and save them in csv file format.
Step 3: We randomly divided the 200 data sets for each fault type into groups of 180 and 20 sets, used as the training set and the test set, respectively. The Python environment was used to encode the time series. The angles of each group were calculated according to Definition 1, and Definition 3 was used to generate the Gram-like matrix. From the Gram-like matrix we then generated the Gramian angular field, that is, a 2-D image with a resolution of 512*512 generated from the time series. The process is shown in Fig 6 and Fig 7. We repeated the third step, splitting each fault type's 200 data sets into 1-D vectors with different numbers of equal parts and mapping them to images with different resolutions, such as 100*100, 256*256, and 1024*1024. Images converted from the four original vibration signals are presented in Fig 14 and are easily distinguishable from each other.
Step 4: The convolutional neural network was then trained and tested. In the description of the network structure, C(10, 5, 5, 1) denotes a convolution process, and P(2, 2, 'same') denotes a pooling process with a subsampling factor of 2 × 2 that uses the 'same' padding strategy. After the convolution of each channel, the boundary of the feature map is padded. All pooling layers use maximum pooling, which can avoid distortion effectively. FC denotes a fully connected layer of neurons, whose output vector contains the corresponding number of units. Finally, the softmax regression function was used to represent the probability estimate of class membership for each image.
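To see how the image size shrinks through the network described in Step 4, the following sketch traces the spatial size through alternating convolution and pooling stages. It assumes that every convolution uses a stride of 1 with 'same' padding (so the size is unchanged), that every pooling stage is P(2, 2, 'same') (so the size is halved and rounded up), and that there are five such stages as stated for the proposed model. The function names are hypothetical.

```python
import math

def same_conv(size, stride=1):
    """Spatial size after a 'same'-padded convolution: ceil(size / stride)."""
    return math.ceil(size / stride)

def same_pool(size, factor=2):
    """Spatial size after 'same'-padded pooling with the given subsampling factor."""
    return math.ceil(size / factor)

def trace_shapes(input_size, n_stages=5):
    """List the feature-map side length after each convolution + pooling stage."""
    sizes = [input_size]
    for _ in range(n_stages):
        sizes.append(same_pool(same_conv(sizes[-1])))
    return sizes

if __name__ == "__main__":
    for resolution in (100, 256, 512, 1024):
        print(resolution, trace_shapes(resolution))
    # For a 512*512 input this prints [512, 256, 128, 64, 32, 16].
```

Under these assumptions a 512*512 image reaches the fully connected layers as 16 × 16 feature maps, while a 1024*1024 image gives 32 × 32 maps, four times as many values per channel.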
Step 5: To improve the accuracy, we changed a single parameter at a time, such as the image resolution, the number of fully connected layers, the number of neurons in the hidden layers, and the weight decay, and compared the resulting diagnostic accuracy on the 2-D images. Through this fine-tuning strategy, the loss function was driven towards its optimum and the highest accuracy was sought. Finally, the output was compared with the fault type to determine the identification accuracy. The experimental process flow chart is shown in Fig 15.
Step 6: In order to verify the portability of the proposed method, we also analyzed the experimental data of the Case Western Reserve University Bearing Data Center and conducted the corresponding experiments and analyses. Seven original vibration signals from this dataset were likewise converted into images.
Image resolution affects both the accuracy and the time of diagnosis. If the resolution is too high, the recognition burden on the CNN increases and the computational time grows greatly. Experimental results show that when the image resolution is 1024*1024, the average diagnostic accuracy is better.
As shown in Fig 17, the accuracy of fault diagnosis increases with the increase of image resolution. When the resolution is 1024*1024, the average diagnostic accuracy is slightly improved, but the required time is significantly increased. By analyzing the results and influencing factors, the 512*512 resolution time series encoding was selected as a compromise between training time and diagnostic accuracy. The average recognition accuracy achieved at this resolution was more than 93%, which means that fault diagnosis was successful. The proposed method does not require a large amount of data, but achieves good diagnostic accuracy, which is mainly attributed to the fact that the time series are encoded into 2-D images, which retains the effective features of faults. When the fault diagnosis accuracy requirement is relaxed, lower resolutions can be adopted to further shorten the network training time and facilitate the quick identification of fault types. In addition, for smaller quantities of measured data, the equipment requirements for data storage and transmission can be relaxed.

2) NUMBER OF HIDDEN NEURONS IN THE FULLY CONNECTED LAYER
Although it is possible to build a good neural network using only convolutional layers, pooling and fully connected layers are still added in most neural network architectures, because they are easier to design than convolutional layers. In this paper, the influence of the number of hidden neurons on feature learning is analyzed, as well as its influence on the accuracy of fault diagnosis and the average time spent on training the model. Fig 18 and 19 show the influence of the number of neurons in the two hidden layers. As can be seen in Fig 18, the number of hidden neurons in the first fully connected layer has relatively little influence on accuracy. As the number of hidden neurons increases, the fault recognition accuracy increases only slightly. When the number reaches 2056, the average recognition accuracy decreases slightly. When the number of neurons reaches 512 or more, the accuracy of fault detection is stable at around 94%, and the error is small.
Since the training time increases with the increase in the number of neurons, we have chosen 512 as the number of neurons in the first hidden layer.
Usually, the number of neurons in the hidden layer of the second fully connected layer should be less than or equal to the number of neurons in the first hidden layer. In this paper, the effects of 128, 256, 512 and 1024 neurons in the second hidden layer were studied, as shown in Fig 19.
FIGURE 19. Influence of the number of neurons in the hidden layer of the second FC layer.
When the number of hidden neurons increased, the average diagnostic accuracy fluctuated slightly, but the general average recognition accuracy remained above 90%. When the number was greater than 256, the diagnostic accuracy remained almost steady, while the training time increased slightly. When the number was 512 or larger, the diagnostic accuracy decreased slightly, but the time required increased significantly. Considering the training time, 256 neurons were selected for the second hidden layer.
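One reason training time grows with the number of hidden neurons is that the number of trainable parameters in the fully connected layers grows with it. The sketch below counts weights and biases for a stack of fully connected layers. The flattened input size of 2560 (for example 16 × 16 × 10) is a hypothetical value chosen only for illustration, since the paper's exact feature-map dimensions are not given here; the hidden sizes 512 and 256 and the four output classes follow the text.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def fc_stack_params(n_flat, hidden, n_classes):
    """Total parameters of flatten -> hidden layers -> output layer."""
    sizes = [n_flat, *hidden, n_classes]
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))

if __name__ == "__main__":
    n_flat = 2560  # hypothetical flattened feature size, not taken from the paper
    for first in (128, 256, 512, 1024):
        total = fc_stack_params(n_flat, [first, 256], 4)
        print(f"first hidden layer = {first:4d} -> {total:,} parameters")
```

With these assumed sizes the first hidden layer dominates the count, so doubling it roughly doubles the number of parameters to be updated at every training step.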

3) IMPACT OF WEIGHT DECAY
This parameter adjusts the size of the weights to avoid overfitting, thereby improving the stability of the network. The average reconstruction error can be controlled by the weight decay parameter, and so can the relative importance. The weight decay parameter is usually close to zero. In this study, the weight decay parameter ranged from 1e-1 to 1e-6. Fig 20 shows the average recognition accuracy for different values of the weight decay parameter. It can be seen from Fig 20 that the smaller the weight decay parameter is, the higher the average diagnostic accuracy is, and the smaller the standard deviation is. When the weight decay parameter is less than 1e-4, the average accuracy is almost stable, the standard deviation is smaller, and so is the loss function. Based on the above analysis and the required training time, 1e-4 was selected as the weight decay parameter.
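To make the role of this parameter concrete, the sketch below shows one common form of weight decay: an L2 penalty added to a plain gradient-descent update. The paper does not state which optimizer or decay formulation was used, so the update rule, the function name `sgd_step`, and the learning rate are assumptions for illustration only.

```python
def sgd_step(weights, grads, lr=0.01, weight_decay=1e-4):
    """One gradient-descent update with L2 weight decay (assumed form):
    w <- w - lr * (grad + weight_decay * w)
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


weights = [1.0, -2.0, 0.5]
zero_grads = [0.0, 0.0, 0.0]

# With a zero data gradient only the decay term acts, so each weight is
# multiplied by (1 - lr * weight_decay) and shrinks toward zero.
for decay in (1e-1, 1e-4, 1e-6):
    updated = sgd_step(weights, zero_grads, lr=0.01, weight_decay=decay)
    print(decay, updated)
```

A larger decay value pulls the weights toward zero more strongly at every step, which is one way to see why very large values in the tested range can reduce accuracy.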

E. COMPARISON WITH TRADITIONAL METHODS
Deep learning has been widely used in fault classification of rotating machinery. In this part, the proposed time coding method is compared with a multi-hidden layer approach, a shallow monolayer neural network with the same deep learning structure, Mutual Dimensionless using an SVM, and a Multi-Genetic algorithm. The diagnostic accuracy of 12 experiments conducted using the different diagnostic methods is shown in Fig 21. These results show that the method proposed in this paper has good stability and achieves high diagnostic accuracy with small errors. Although the recognition accuracy of the SVM and the Multi-Genetic algorithm is comparatively stable, their diagnostic accuracy is lower than that of the proposed method. Compared with the SVM method, shallow CNNs have higher diagnostic accuracy, but their results fluctuate due to low stability, so the machine health cannot be monitored effectively and in a timely manner. Compared with the method proposed in this paper, shallow CNNs have lower stability and diagnostic accuracy. The five methods were applied experimentally, and the average diagnostic accuracy and standard deviation were obtained from 12 sets of data for each method. The results are compared in Table 2.
In order to test the feasibility and applicability of this algorithm in fault diagnosis engineering of petrochemical rotating units more directly, the proposed algorithm was compared with the Mutual Dimensionless + SVM algorithm. From the fault data measured in this experiment, 180 data sets were randomly selected for each fault type. The fault types and corresponding data sets are shown in Table 3. The Mutual Dimensionless + SVM algorithm was simulated in the Matlab environment, and the random function of the environment was used to generate the random data for the different tests. The experimental simulation was carried out, and the accuracy was calculated in each case. The comparison results are shown in Table 2.
Table 2 shows that for failure types No. 3, in terms of average diagnostic accuracy, the proposed method achieves the best performance of 94.07%, while the BP neural network shows the worst performance of 35.76%. In addition, in terms of diagnostic stability, the proposed method achieved the best performance with a standard deviation of 1.04%, while the shallow CNN performed the worst with a standard deviation of 8.46%. These results show that, compared with the other four fault diagnosis algorithms, the proposed algorithm combining the encoded time series and a CNN has obvious advantages. It can directly extract fault characteristics from the 2-D representation of the encoded time series, and fault diagnosis and health monitoring of the device can be realized through deep learning. This method encodes the vibration data so that the fault can be identified and classified by the neural network model. Compared with traditional methods, for example those based on time-domain signal analysis, the solution of this paper requires only a small number of samples to train CNNs to obtain higher recognition accuracy. One of the major drawbacks of traditional statistical properties is that they cannot express all the fault information of the equipment. Therefore, the method combining common statistical features with an SVM has lower average diagnostic accuracy. Through deep learning of data encoded into 2-D representations, the CNNs can mine hidden information and distinguish differences. Therefore, the CNN-based classification method proposed in this paper can achieve high classification accuracy in many experiments and maintain relatively stable classification performance. However, the feature learning ability of the BP neural network is limited due to its shallow structure, leading to low diagnostic accuracy.
The shallow CNN method adopts an adaptive gradient algorithm, which has unstable performance and poor generalization ability.
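The average diagnostic accuracy and standard deviation used in this comparison are summary statistics over the 12 repeated experiments of each method. The sketch below shows how such a pair can be computed; the per-run values are hypothetical placeholders rather than entries of Table 2, and the use of the sample (n - 1) standard deviation is an assumption, since the paper does not say which estimator was used.

```python
from statistics import mean, stdev


def summarize_runs(accuracies):
    """Return (average accuracy, sample standard deviation), both in percent."""
    return mean(accuracies), stdev(accuracies)


# Hypothetical per-run accuracies (percent) for 12 repeated experiments.
# Placeholders only, not values from the paper.
runs = [94.0, 95.0, 93.0, 94.0, 95.0, 93.0,
        94.0, 95.0, 93.0, 94.0, 95.0, 93.0]

avg_acc, acc_std = summarize_runs(runs)
print(f"average accuracy = {avg_acc:.2f}%, standard deviation = {acc_std:.2f}%")
```

If the population estimator is preferred, `statistics.pstdev` can be substituted for `stdev`.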
As shown in the comparison results of Table 2, 12 simulation experiments were conducted for each fault type diagnosis group. Because their accuracy standard deviation (ASD) is far from ideal, the following discussion does not cover the shallow CNN and the BP neural network. A significant difference between the proposed method, the SVM-based method and the Multi-Genetic algorithm appears as the number of fault types increases. To some extent, the average accuracy (AA) of all three methods fluctuates. The average accuracy of the commonly used Mutual Dimensionless + SVM method fluctuates greatly, from 90.30% for two fault types to 66.13% for four types, which is a difference of 27.17%. The average accuracy of the commonly used Multi-Genetic algorithm fluctuates less, from 95.33% for two fault types to 76.37% for four types, which is a difference of 18.97%. The average accuracy of the method in this paper is 99.97% when identifying two fault types and 94.07% for four types, a difference of 5.9%. For the identification of two fault types, the method in this paper has relatively stable identification accuracy. In terms of accuracy standard deviation, both our method and the other two methods have relatively stable performance. The minimum standard deviation of the SVM method was 1.28%, the minimum standard deviation of the Multi-Genetic algorithm was 1.84%, while the minimum standard deviation of the proposed method was 0.08%. Compared with the other two methods in Table 2, these three methods show the best performance stability.
Table 2 shows that when the failure types are No. 1 and No. 2, the proposed method and the Multi-Genetic algorithm both perform well. The identification accuracy of the Multi-Genetic algorithm was 95.33% and 91.86%, respectively, while that of the proposed method was 99.97% and 98.85%, respectively. However, when the failure types were expanded to include No. 3 during machine operation, the average identification accuracy of the Multi-Genetic algorithm dropped to 76.37%, which shows that it fluctuates greatly, while the average identification accuracy of the proposed method was maintained at 94.07%, which was relatively stable compared with the other methods. This is also an advantage of the proposed method with regard to identification accuracy. Table 2 also shows that the training process of the proposed method is more time-consuming than that of the other methods.
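The accuracy gaps discussed above are plain differences between the two-fault-type and four-fault-type averages, expressed in percentage points. The short check below recomputes them from the endpoint values quoted in the text; the dictionary and function names are hypothetical. Note that 90.30 and 66.13 as quoted differ by 24.17 rather than 27.17 points, so one of those figures may contain a typographical slip; the ranking of the three methods is unaffected.

```python
# Endpoint average accuracies (percent) as quoted in the text:
# (two fault types, four fault types).
quoted = {
    "Mutual Dimensionless + SVM": (90.30, 66.13),
    "Multi-Genetic algorithm": (95.33, 76.37),
    "Proposed method": (99.97, 94.07),
}


def accuracy_drop(two_types, four_types):
    """Drop in average accuracy in percentage points, rounded to 2 decimals."""
    return round(two_types - four_types, 2)


drops = {name: accuracy_drop(*pair) for name, pair in quoted.items()}
for name, drop in drops.items():
    print(f"{name}: {drop:.2f} percentage points")
```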
The method in this paper directly extracts fault characteristics from the 2-D representation of the encoded time series, and fault diagnosis and health monitoring are realized with the help of deep learning algorithms. The method encodes the vibration data, which allows identification and classification of faults by the neural network model but increases the computational load for machine learning. In future work, common statistical analysis and encoding time series theory can be combined to preprocess the original vibration signal, and the advantage of deep learning in image processing can be used to establish a model for fault diagnosis and health monitoring of petrochemical units.
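As an illustration of how a 1-D series can be encoded into a 2-D matrix, the sketch below implements the standard Gramian Angular Summation Field (GASF). This is a stand-in chosen for illustration: the Gram matrix defined earlier in this paper may differ from it, and the function names and the toy sine signal are hypothetical.

```python
import math


def rescale(series):
    """Linearly rescale a sequence to the interval [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in series]


def gasf(series):
    """Standard GASF: G[i][j] = cos(phi_i + phi_j), phi_i = arccos(x_i),
    where x_i is the rescaled sample."""
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    scaled = [max(-1.0, min(1.0, x)) for x in rescale(series)]
    phi = [math.acos(x) for x in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]


# Hypothetical toy signal standing in for a vibration segment.
signal = [math.sin(2 * math.pi * k / 16) for k in range(16)]
image = gasf(signal)
print(len(image), "x", len(image[0]), "matrix")
```

A series of N samples yields an N x N symmetric matrix, which can then be rendered as an image and passed to a CNN. The quadratic growth in size is one reason the encoding adds computational load.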

F. PORTABILITY
In order to verify the portability of this method, this section analyzes the experimental data of the Case Western Reserve University Bearing Data Center [45]. The data sets are composed of multivariate vibration series generated by a bearing test rig. In this experiment, the drive-end vibration signals, sampled at 48 kHz, are used. The bearing data set includes the following four health conditions: Normal condition (N), Outer race fault (OF), Inner race fault (IF), and Roller fault (RF). Faults can be further classified according to their fault diameters. The experimental results are shown in Table 4.
In the analysis of the data provided by Case Western Reserve University, Table 4 shows that when the failure types are No. 1 and No. 2, the average accuracy can reach up to 100%. As the number of fault types increases, the average accuracy decreases. When the number of fault types reached 7, the average accuracy was 96.28%. After a large number of experiments, we found that classification on the Case Western Reserve University data set used in this paper is more stable and more accurate than on our own experimental data. Because that data set is large, we speculate that the method of this paper is better suited to large amounts of data.

G. COMPARISON WITH OTHER METHODS IN THE LITERATURE
The results are also compared with those reported in the literature [11]; the experimental results are shown in Table 5. Table 5 shows that when the failure types are Data1-No. 3 and Data2-No. 7, both the proposed method and the Compression+DNN method perform well. However, in terms of diagnostic stability, the proposed method achieved the best performance, with standard deviations of 1.05% and 0.98%, while the Compression+DNN method obtained standard deviations of 1.56% and 1.02%. The Compression+DNN method achieved good average accuracies of 92.61% and 96.07%, but the proposed method achieves higher average accuracies of 94.07% and 96.28%. This result validates the performance of the proposed method, and it can be clearly seen that the proposed method outperforms the Compression+DNN method of [11]. The CNN model in this paper is also compared with an RNN model in order to verify its superiority. The average accuracy of the RNN is only 87.23% and 93.69%, which is inferior to both the proposed method and the Compression+DNN method of [11].
In order to verify the validity of the proposed method, each fault data set in the experiment consists of 200 data samples. However, since increasing the size of the data set can yield better experimental results, we can try to improve the computer performance and increase the size of the data set, which may lead to better experimental results.

V. CONCLUSION
In this paper we study a conversion theory of fault vibration from signal to image, propose a method combining a CNN model with this transformation theory, and apply it to the field of fault diagnosis. The proposed method was tested on two datasets, achieving average accuracies of 94.07% and 96.28%, respectively, which outperform other deep learning and traditional methods. The results are also compared with another deep learning method from the literature, and the proposed method clearly outperforms it. These results show the good potential of the proposed method in the diagnosis field.
The advantage of the proposed method is that it does not require expert knowledge or predetermined parameters. It provides a new method for data processing of mechanical faults, which is worthy of further study. The method also has limitations: the training process is long, and it requires a better GPU hardware environment. With the above in mind, we will conduct further research in the following directions. First, the encoding time series algorithm can be improved to make the fault characteristics more prominent, so that different fault types can be added while maintaining stable diagnostic accuracy. Second, further effort will be devoted to reducing the training time of the CNN-based fault diagnosis model.

DATA AVAILABILITY
The data used to support the findings of this study are available from the corresponding author upon request.

CONFLICTS OF INTEREST
The authors declare that they have no conflicts of interest.