Fault Diagnosis Based on Space Mapping and Deformable Convolution Networks

Data-driven methods based on deep learning are a popular topic in the field of fault diagnosis. The completeness and representativeness of the feature matrix extracted from massive, high-dimensional fault data have a great impact on fault diagnosis performance. In addition, the ability of deep networks to extract the spatial characteristics of fault data is especially important for diagnostic accuracy. Therefore, we propose a method based on space mapping and deformable convolution networks (DCN) that ensures diagnostic accuracy by improving the spatial resolution and spatial constraint characteristics; in DCN, both the size and shape of the convolution kernel are adjusted adaptively according to inputs of different sizes. Original data are projected into a more discriminative space by the combination of CN and PCA (i.e., space mapping). Then, the DCN learns the spatial constraints between fault data during training. The Case Western Reserve University (CWRU) bearing dataset and the Xi’an Jiaotong University and Changxing Sumyoung Technology Co., Ltd. (XJTU-SY) bearing datasets are used as benchmarks in the experiments. The results demonstrate that the fault diagnosis method proposed in this paper performs well and achieves 100% accuracy within the first several epochs. Comparative experiments based on 3 deep learning methods that combine preprocessed and unprocessed data with a convolutional neural network (CNN), residual networks (ResNets) and DCN are carried out to further show the advantages of the fault diagnosis method based on space mapping and DCN.

Recently, with the rapid development of computer science and technology, artificial intelligence has played an essential role in industrial manufacturing, and the study of industrial cyberphysical systems has become a worldwide research focus [1]. Due to the large scale and complexity of industrial control equipment, failures and faults become more likely as the industrial manufacturing process becomes more automated. Equipment maintenance grows more important because even small faults can cause damage to equipment, with serious consequences for both the economy and the production process [2]. Fault detection and abnormality diagnosis of dynamic systems have therefore attracted much attention in industry and academia [3]. It is thus necessary to detect and diagnose faults early to reduce abnormal circumstances and unforeseen situations in complex dynamic systems, especially for rolling bearing elements.
Fault diagnosis methods generally fall into three categories: 1) estimation techniques and analysis approaches based on mathematical models, parameters and the mathematical statistics of data [4]-[7]; 2) expertise-based methods built on expert experience and prior knowledge; and 3) data-driven methodologies based on machine learning and signal processing. Massive, high-dimensional data create modeling complexity and analysis difficulty for estimation techniques and expertise-based methods [8]-[11]. Data-driven methodologies based on machine learning and signal processing are therefore currently widely used. Because time-domain or frequency-domain signals can be used directly to extract fault features [12], the Fourier transform (FT) and wavelet transform (WT) have played a crucial role in recognizing mutation signals in the field of fault diagnosis [13]-[16]. The Fourier transform, however, cannot accurately locate fault points; the wavelet transform solves this problem by using different time windows in different time and frequency domains. The usual approach is to use WT or FT to convert the fault vibration data into time-frequency images and then use a support vector machine (SVM) or extreme learning machine (ELM) for diagnosis and classification, which is efficacious but slightly complicated [17], [18].
With the rapid development of artificial intelligence, deep learning methods such as artificial neural networks have achieved great results in fault diagnosis due to their strong feature extraction capability. Artificial neural networks include convolutional neural networks (CNNs), deep belief networks (DBNs), residual networks (ResNets), and deep transfer networks (DTNs) [12], [19]-[23]. The collected fault data are input into specific networks, and the classification or diagnosis result is output directly; such approaches are called end-to-end methods. Most data-driven deep learning methods are based on different networks or on improvements to existing network models [24]-[26]. Although the fault data collected under actual conditions generally have high-dimensional characteristics, data preprocessing is rarely applied in these fault diagnosis methods [27], [28].
Most of the existing data-driven fault diagnosis methods extract fault features by neural networks, which demands that the backbone networks be complicated enough to have strong feature extraction capability. If the data entered into the network are sufficiently discriminative, the requirements on the backbone network are reduced. Therefore, we introduce data preprocessing for fault diagnosis.
Color names (CN) perform well in the object detection and action recognition fields by dividing the three RGB channels into eleven specific and representative channels: black, blue, brown, gray, green, orange, pink, purple, red, white and yellow [29]-[31]. This means CN can subdivide color features and make them more distinguishable, facilitating subsequent object recognition by the network. Therefore, we introduce CN into data preprocessing for fault diagnosis to divide the features of the original fault data into multiple, more representative dimensions, further improving the spatial resolution of the data and making fault diagnosis easier. Principal component analysis (PCA) is combined with CN to extract principal features and reduce dimensionality when there are correlations between the multidimensional features. A comparison of the spatial distribution of the original fault data and the data preprocessed by space mapping (as Figure 1 shows) demonstrates that the combination of CN and PCA improves the spatial discrimination of the features.
In this paper, motivated by the massive, high-dimensional characteristics of the collected raw fault data, we introduce a data preprocessing method that combines CN and PCA, aiming to extract representative features and reduce dimensionality to construct a complete and representative feature matrix. Because the ability of deep networks to extract the spatial characteristics of faults is especially important for diagnostic accuracy, we introduce deformable convolution networks (DCN) as the backbone deep learning network for fault diagnosis to achieve high fault diagnosis accuracy.
The rest of this paper is organized as follows. Section 2 presents the materials and methods of CN, PCA and DCN used in this paper. Section 3 details the data acquisition, parameters and experimental setup. Section 4 presents the results of the proposed method and a comparison with two other deep learning methods. Finally, Section 5 concludes this paper.

A. COLOR NAMES (CN)
In recent years, color attributes (or color names) have been widely applied in the fields of object detection, image recognition and action recognition [29]-[31]. Berlin and Kay determined that linguistic color labels include eleven basic terms: black, blue, brown, gray, green, orange, pink, purple, red, white and yellow [32]. In other words, color names are labels assigned by humans to represent colors in the real world. Based on this color attribute model, a mapping learned from images retrieved via Google image search, provided by [33], is used in this paper. Through this W2C mapping matrix, RGB values can be converted into an 11-dimensional color space whose probabilistic sum is 1.
In the field of computer vision, grayscale values in conventional object trackers are typically normalized to [−0.5, 0.5] [34]. To obtain better performance, the color names are normalized by projecting them onto an orthonormal basis of a 10-dimensional subspace [34]. Because the eleven color-name probabilities sum to 1, the 11-dimensional space can be reduced to this 10-dimensional subspace, and the projection simultaneously centers the color names.
Rolling bearing data in actual measurements conventionally consist of multiple channels and inevitably contain interference factors such as noise. Based on the strong data fusion capability of CN, this paper introduces the CN method into the data preprocessing of rolling bearing fault diagnosis to project the original fault data into more specific and distinct channels. Analogous to the three RGB channels, three channels of data are selected from the original data, projected by the W2C matrix and normalized to 10 dimensions. Thereafter, the multichannel raw data are fused and divided into 10 dimensions, each of which can independently represent a different typical feature.
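As a concrete illustration, the lookup-and-projection steps above can be sketched in NumPy. The real W2C table of [33] is a 32 × 32 × 32-bin lookup learned from Google image search results; here a random row-stochastic stand-in is used purely to show the shapes and flow, so only the structure, not the values, is meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the real 32768 x 11 W2C lookup table of [33]
# (rows: quantized RGB bins, columns: 11 color-name probabilities).
# A random row-stochastic matrix is used purely for illustration.
w2c = rng.random((32 * 32 * 32, 11))
w2c /= w2c.sum(axis=1, keepdims=True)

def rgb_to_color_names(rgb):
    """Map an (N, 3) array of 8-bit RGB triplets to (N, 11) color-name
    probabilities by indexing the quantized W2C lookup table."""
    idx = (rgb[:, 0] // 8) + 32 * (rgb[:, 1] // 8) + 32 * 32 * (rgb[:, 2] // 8)
    return w2c[idx]

def project_to_subspace(cn):
    """Normalize the 11-dim color names by projecting onto an orthonormal
    basis of the 10-dim subspace orthogonal to the all-ones direction [34];
    because the probabilities sum to 1, this also centers the values."""
    ones = np.ones((11, 1)) / np.sqrt(11.0)
    basis = np.linalg.qr(np.eye(11) - ones @ ones.T)[0][:, :10]
    return cn @ basis
```

Because every basis vector is orthogonal to the all-ones direction, shifting all eleven components by a common constant leaves the projected 10-dimensional representation unchanged, which is exactly the centering property described above.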

B. PRINCIPAL COMPONENT ANALYSIS (PCA)
In most research areas, large quantities of data are needed to discover patterns. Without doubt, a large quantity of data provides a wealth of information and makes analysis easier. However, there may be correlations among many variables, so it is necessary to extract the principal information and reduce the dimensionality of the data to boost computational efficiency. Randomly discarding data or dimensions inevitably leads to the loss of useful information. Therefore, it is of great necessity to introduce principled dimensionality reduction methods.
There are many dimensionality reduction methods, such as local neighborhood structure preserving embedding (LNSPE), local geometric structure Fisher analysis (LGSFA) and principal component analysis (PCA) [35], [36]. From the view of calculation simplicity, PCA is used in this paper to solve the above problem.
The principle of PCA is to project the original sample data into a new space, i.e., to map the data matrix into another coordinate system. In the new coordinate system, only the coordinates along the directions corresponding to the largest eigenvalues are retained, rather than all of the original dimensions. The key step of the PCA algorithm is the calculation of the eigenvalues, and their corresponding eigenvectors, of the covariance matrix of the original data.
In this paper, 784 × 1 data points are selected from a sample of the original data; after the CN operation, they become 10-dimensional, i.e., 784 × 10 data. To reduce the dimensionality with PCA, first, the 10 × 10 covariance matrix is calculated. Second, its eigenvalues and corresponding eigenvectors are obtained. If the first 4 eigenvalues already account for more than 99% of the sum of all eigenvalues, then only the eigenvectors corresponding to those first 4 eigenvalues are retained. The selected eigenvectors form a 10 × 4 transformation matrix. Finally, the 784 × 10 data matrix is multiplied by the 10 × 4 transformation matrix, yielding the coordinates of the original sample data in the new eigenspace. The 10-dimensional data are thus reduced to 4-dimensional data with almost no loss of useful information. Figure 1 shows the difference in the pixel value distribution of two kinds of faults between the data preprocessed by CN and PCA (i.e., space mapping) (b) and the raw data (a).
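The 784 × 10 → 784 × 4 reduction above can be sketched as follows. The input here is synthetic data with four dominant directions standing in for the CN output, and centering before projection is included, as is standard for PCA (the text multiplies the original data directly).

```python
import numpy as np

rng = np.random.default_rng(1)

def pca_reduce(data, var_threshold=0.99):
    """Reduce (n, d) data to the smallest number of principal components
    whose eigenvalues account for var_threshold of the total, mirroring
    the 784x10 -> 784x4 example in the text."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)         # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalue order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(ratio, var_threshold)) + 1
    transform = eigvecs[:, :k]                   # d x k transformation matrix
    return centered @ transform, transform

# Synthetic stand-in for the 784 x 10 CN output: four latent directions
# with variances 100, 64, 36 and 16 plus a little noise, so the first
# 4 eigenvalues carry more than 99% of the total variance.
s = np.array([10.0, 8.0, 6.0, 4.0])
Q = np.linalg.qr(rng.standard_normal((10, 4)))[0]  # orthonormal 10 x 4
data = (rng.standard_normal((784, 4)) * s) @ Q.T \
    + 0.01 * rng.standard_normal((784, 10))
reduced, transform = pca_reduce(data)
```

After projection the retained components are mutually uncorrelated, since the transformation matrix diagonalizes the sample covariance.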

C. DEFORMABLE CONVOLUTIONAL NETWORKS (DCN)
In CNNs, fixed kernel structures are used to extract fixed feature maps, and pooling layers reduce the spatial resolution at a fixed rate. To adapt effectively to the spatial variations of objects, large numbers of images or samples with different viewpoints, or transformation-invariant features and algorithms, are needed; conventional CNNs are limited in adapting to inputs of varying size and shape. To address this, deformable convolutional networks (DCN) were proposed to adapt to the geometric variations and transformations of objects, and they have been confirmed to perform excellently. Compared with a CNN, the spatial support of DCN neural features is more consistent with the object structure [37]. DCN makes two improvements over the CNN: the first is deformable convolution, which adds a 2D offset to the regular grid sampling positions, and the second is deformable region of interest (RoI) pooling, which adds an offset to each bin position in the regular bin partition. Both methods enhance the spatial sampling positions with additional learned offsets.
The first difference lies in the convolutional layers. If the input feature map is defined as x, then the output feature map y of a conventional convolution at location p_0 is obtained by equation 1:

y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n),  (1)

where R denotes the regular grid of sampling locations (e.g., R = {(−1, −1), (−1, 0), . . ., (1, 1)} for a 3 × 3 kernel), p_n (n = 1, 2, . . ., N) enumerates the locations in R, and w denotes the kernel weights. In DCN, the output feature map y is obtained by augmenting the regular grid with learned 2D offsets Δp_n, as presented in equation 2:

y(p_0) = Σ_{p_n ∈ R} w(p_n) · x(p_0 + p_n + Δp_n).  (2)

Because the offsets Δp_n are typically fractional, x(·) is evaluated by bilinear interpolation.

The second difference is the RoI pooling layers. Analogously, given the input feature map x, the output y and p_0 as the location at the top left of the RoI, an RoI of size w × h is divided into k × k bins (k is a free parameter) by RoI pooling. Conventional (average) RoI pooling is presented as equation 3:

y(i, j) = Σ_{p ∈ bin(i, j)} x(p_0 + p) / n_{ij},  (3)

whereas deformable RoI pooling adds an offset Δp_{ij} to each bin, as presented in equation 4:

y(i, j) = Σ_{p ∈ bin(i, j)} x(p_0 + p + Δp_{ij}) / n_{ij},  (4)

where (i, j) indexes the bins (0 ≤ i, j < k), n_{ij} is the number of pixels in bin (i, j), and Δp_{ij} denotes the offset added to the spatial binning positions.
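A minimal single-channel sketch of deformable convolution: each sampling point of the 3 × 3 grid R is shifted by its 2D offset, and the input is read by bilinear interpolation. The offsets here are supplied directly rather than predicted by a separate convolution layer, as they would be in a full DCN.

```python
import numpy as np

def bilinear(x, py, px):
    """Bilinearly interpolate feature map x at fractional location (py, px),
    treating locations outside the map as zero, as in [37]."""
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    val = 0.0
    for yy, xx in [(y0, x0), (y0, x0 + 1), (y0 + 1, x0), (y0 + 1, x0 + 1)]:
        if 0 <= yy < h and 0 <= xx < w:
            val += x[yy, xx] * (1 - abs(py - yy)) * (1 - abs(px - xx))
    return val

def deformable_conv_at(x, weight, p0, offsets):
    """Single-channel deformable convolution at one output location p0:
    every sampling point of the 3x3 grid R is shifted by its own 2D
    offset before bilinear sampling, then weighted and summed."""
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]  # the grid R
    out = 0.0
    for n, (dy, dx) in enumerate(grid):
        off_y, off_x = offsets[n]
        out += weight[dy + 1, dx + 1] * bilinear(
            x, p0[0] + dy + off_y, p0[1] + dx + off_x)
    return out
```

With all offsets set to zero, this reduces exactly to the conventional convolution over the regular grid R.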
Although DCN improves, to a certain extent, the deficiency of CNNs in adapting to geometric changes, the extracted features can still be influenced by irrelevant image content. Building on DCN, an improved version, DCNv2, was proposed to focus on pertinent image regions [38]. The first improvement is extending the number of deformable convolution layers; the second is adding a modulation mechanism to the deformable convolution. Model power is effectively enhanced by these two approaches. In this paper, DCNv2 is used as the backbone network, and we use the abbreviation DCN to refer to it.
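The modulation mechanism of DCNv2 can be sketched as one extra factor in the weighted sum of deformable convolution: each sampled value is scaled by a sigmoid-activated scalar in (0, 1), so the network can suppress sampling points that land on irrelevant content. The function below operates on pre-sampled values for brevity.

```python
import numpy as np

def modulated_aggregation(samples, weights, logits):
    """DCNv2-style modulation (a sketch): each bilinearly sampled value is
    scaled by a learned scalar m_n = sigmoid(logit_n) in (0, 1) before the
    weighted sum, letting the network down-weight irrelevant samples."""
    modulation = 1.0 / (1.0 + np.exp(-logits))   # m_n in (0, 1)
    return float(np.sum(weights * samples * modulation))
```

When all modulation scalars saturate at 1, the mechanism degenerates to plain deformable convolution; scalars near 0 effectively remove a sampling point.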
In Figure 2, the size of the input is 32 × 32, which is shown as an example. In actual situations, the network can automatically adjust its output sizes according to different inputs.

III. EXPERIMENT SETUP
To effectively exploit the high-order features in massive fault diagnosis data, the CN and PCA methods are introduced to preprocess the data. The Case Western Reserve University (CWRU) rolling bearing fault dataset [39] and the XJTU-SY bearing datasets, provided by the Institute of Design Science and Basic Component at Xi'an Jiaotong University (XJTU) and Changxing Sumyoung Technology Co., Ltd. (SY) [40], are used in this paper to make the experiments and results more reliable.

A. CWRU DATASET
The CWRU dataset used in this paper has 9,900 training samples and 375 testing samples. The sizes of the training dataset and testing dataset are [9900, 2048, 2] and [375, 2048, 2], respectively (i.e., every sample has 2 channels, and each channel contains 2,048 data points). To a certain extent, more data result in better stability and robustness of deep learning networks. Therefore, to give the dataset a sufficient number of samples, data are selected repeatedly with a certain step size when preparing the training and testing datasets. The abbreviation CWRU is used in the following to keep the expression concise.
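The repeated selection with a certain step size can be sketched as an overlapping sliding window over the 1-D vibration signal. The segment length of 784 matches the preprocessing described below; the step size of 64 is an assumed example value, since the paper does not state the actual step.

```python
import numpy as np

def sliding_windows(signal, length=784, step=64):
    """Repeatedly select overlapping segments of a 1-D vibration signal
    with a fixed step size to enlarge the training set. The segment
    length (784) follows the text; the step size is an assumed example."""
    n = (len(signal) - length) // step + 1
    return np.stack([signal[i * step: i * step + length] for i in range(n)])
```

A smaller step yields more (and more strongly overlapping) samples; the trade-off is redundancy between neighboring windows.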
The data preprocessing is described as follows. Analogous to the RGB 3-channel input in the field of image processing, three parts were first selected from the same set of data to form a set of 3-dimensional data in preparation for space mapping; every part comprised 784 × 1 data points. By projecting the selected data through the W2C matrix, the 3 dimensions were converted to 10 dimensions. However, there may be correlations among those features; to make the features of each dimension more representative and the computation less complicated, PCA was applied. By calculating the eigenvalues of the 10 × 10 covariance matrix, the first 4 eigenvalues were selected, and the corresponding eigenvectors were formed into a 10 × 4 transformation matrix. Therefore, after CN and PCA (i.e., space mapping), the data were converted to 784 × 4. Finally, the preprocessed data were reshaped into images of size 56 × 56 (784 × 4 = 56 × 56) for convenient training and testing in the deep learning network.
In addition, to make the experiments more reliable, the unprocessed data, also called original data, were needed as a contrast. For these, 784 data points were repeatedly selected with a certain step size from the CWRU dataset and converted into images of size 28 × 28.
The flowchart of the data acquisition process on CWRU, for both the data preprocessed by space mapping and the original data, is shown in Figure 3. At this point, data preprocessing was complete. The overlapping selection described above yielded 59,400 sets of training data and 2,250 sets of testing data in both the space-mapped dataset and the original dataset.

B. XJTU DATASET
The XJTU-SY bearing datasets, provided by Xi'an Jiaotong University (XJTU) and Changxing Sumyoung Technology Co., Ltd. (SY), contain complete run-to-failure data of 15 rolling element bearings acquired through many accelerated degradation experiments [40]. In [40], the XJTU dataset is built for estimating the remaining useful life of rolling element bearings, which differs from the purpose of this article; therefore, we only use the XJTU dataset and do not compare with [40] in the subsequent sections. The test rig is shown in their study. XJTU-SY comprises 3 major categories of operating conditions: 2,100 rpm (35 Hz) and 12 kN; 2,250 rpm (37.5 Hz) and 11 kN; and 2,400 rpm (40 Hz) and 10 kN. Each category contains 5 fault types. For comparability, only the operating condition of 2,100 rpm (35 Hz) and 12 kN, which contains 5 faults, was selected for preparing the dataset. In this paper, the abbreviation XJTU refers to this dataset of 5 faults.
Horizontal and vertical vibration signals are included in the original XJTU-SY bearing dataset; the data for each of the 5 faults under the 2,100 rpm (35 Hz) and 12 kN operating condition therefore have size [32768 × 2]. In the experiments in this paper, only the horizontal vibration signals are selected for preparing the XJTU bearing dataset. The data preprocessing of the XJTU dataset is the same as that of the CWRU dataset and is shown in Figure 4, where only one of the 5 faults is presented to illustrate the data acquisition process. In total, 8,200 sets of data are included in the training dataset and 2,050 sets in the testing dataset.
Both the original data of CWRU and XJTU are vibration signals, and the form of data input to the network is images. Taking CWRU as an example, Figure 5 shows the process of data conversion.
According to the methods proposed in this paper, these images, converted from the raw data, are input to the deformable convolutional networks for training a deep learning model and for testing the accuracy of fault diagnosis and classification. The specific parameter settings of the backbone network are shown in Table 1.
Other deep learning methods, such as CNN and ResNet, are also introduced in this paper to carry out contrast experiments to further show the reliability and superiority of the methods proposed in this paper.
To make the network more stable, a BatchNorm2d layer is added after each convolution layer for normalization, followed by a rectified linear unit (ReLU).
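The BatchNorm2d-plus-ReLU step applied after each convolution can be sketched in NumPy. This mirrors PyTorch's BatchNorm2d at a single training step, with per-channel statistics computed from the batch itself rather than running averages, and with gamma and beta as fixed scalars instead of learned per-channel parameters.

```python
import numpy as np

def batchnorm2d_relu(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Sketch of BatchNorm2d followed by ReLU for an (N, C, H, W) batch:
    each channel is normalized by its mean and variance taken over the
    N, H and W axes, scaled and shifted, then clipped at zero (ReLU)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    normed = gamma * (x - mean) / np.sqrt(var + eps) + beta
    return np.maximum(normed, 0.0)               # ReLU
```

Normalizing before the nonlinearity keeps activations on a comparable scale across layers, which is what stabilizes training here.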
The experiments ran on a Windows system with an Intel(R) Core(TM) i7-9700K processor, 32.0 GB of memory and an NVIDIA GeForce GTX 1080. Python is used as the programming language, and PyCharm is used as the integrated development environment (IDE). During training and testing, the batch size and number of epochs are set to 32 and 20, respectively.

A. RESULTS
After preprocessing by the CN and PCA (i.e., space mapping) methods, the data are converted into images and input to the backbone network DCN. Experimental results show that, owing to the excellent performance of DCN and sufficient data, the method achieves 100% accuracy within the first ten epochs on both CWRU and XJTU. To avoid chance results, each experiment is run twice, and the mean value is taken as the result. The training loss and the training and testing accuracy are presented in Figure 6. There are 5 fault types in both CWRU and XJTU.
To show the advantage of DCN and verify the efficacy of space mapping, different deep learning methods are used as backbone networks with both preprocessed and unprocessed data. In addition to DCN, two other deep learning methods, CNN and ResNet, are introduced in this paper for comparison. For CWRU, there are 6 methods combining preprocessed data and original data with CNN, ResNet and DCN: CN+PCA+DCN, CN+PCA+CNN, CN+PCA+ResNet, Original+DCN, Original+CNN and Original+ResNet. The testing accuracies of the 6 methods on CWRU are presented in Table 3. Table 2 and Table 3 present the experimental results in numerical terms, and Figure 8 shows all 12 methods in the form of a chart.
In addition, experiments based on unprocessed (original) data and the two other deep learning methods were performed for comparison. A total of 12 methods were tested to demonstrate the advantage of the space mapping and DCN approach proposed in this paper. To make the experimental results clearer and more intuitive, a confusion matrix was plotted during training to visually indicate whether there was confusion among the 5 categories, as presented in Figure 7. We chose the testing results of the last of the 20 epochs as the benchmark. As Figure 7 indicates, data preprocessed by space mapping perform better than unpreprocessed data (compare (a) and (b)), and the DCN deep learning method performs better than CNN and ResNet (shown in (e) and (f)).

B. DISCUSSION
The CWRU dataset used in this paper is the same as that in [18], which uses rolling bearing fault data from Case Western Reserve University and reports a fault diagnosis accuracy of 99.47%. Approximately 1.5 hours are needed in [18] to complete training and testing, while only 40 minutes are needed in this paper. The experiments confirm that the method proposed in this paper outperforms [18] in both accuracy and speed.
As Table 2 and Table 3 show, the fault diagnosis method proposed in this paper, based on space mapping and DCN, achieves high accuracy within the first several epochs. The stability of the algorithm is also guaranteed, as can be seen in Figure 9 and Figure 10. Comparing all 12 fault diagnosis methods in Figure 8, Figure 9 and Figure 10 shows that the combination of space-mapped data and DCN achieves the best performance.
In addition, the proposed method is not computationally demanding: the network part is no more complicated than other deep learning methods for fault diagnosis, and the space mapping preprocessing takes only a few minutes to process the whole CWRU rolling bearing dataset.
In our present work, 5 fault types are used from each of CWRU and XJTU. In real cases, as the number of fault types increases, the demand on the discriminability between fault types after data preprocessing (i.e., space mapping) becomes higher. Therefore, in future work, we should take the number of fault types into account and conduct further research.

V. CONCLUSION
In this paper, a rolling bearing fault diagnosis method based on space mapping and deformable convolution networks is proposed to effectively address the computational and modeling complexity caused by massive, high-dimensional fault data. Space mapping, the combination of the color names (CN) and principal component analysis (PCA) methods, can effectively extract high-order multichannel features and construct a representative feature matrix. Images converted from the constructed feature matrix are fed into the deformable convolution networks for training and testing, and fault diagnosis is accomplished with 100% accuracy.
The rolling bearing fault data of Case Western Reserve University (CWRU) and the bearing dataset provided by Xi'an Jiaotong University and Changxing Sumyoung Technology Co., Ltd. (XJTU-SY) are used in this paper as the benchmark datasets to validate the efficacy of space mapping and DCN. The experimental results show that the proposed method achieves extremely high diagnosis accuracy within the first several epochs. To further demonstrate the performance of the proposed fault diagnosis method, comparison experiments combining 2 other deep learning methods with two groups of fault datasets, containing preprocessed data and original data, are performed. The comparison results show that the fault diagnosis method based on space mapping and DCN performs better in both accuracy and stability.