A New Transfer Learning Fault Diagnosis Method Using TSC and JGSA Under Variable Condition

It is very difficult to obtain labeled data of rolling bearings under complicated and variable working conditions, which results in low diagnosis accuracy. Transfer sparse coding (TSC) is a feature representation method that can effectively extract features from a data matrix. Joint geometric and statistical alignment (JGSA) is a domain adaptation method that can reduce the distribution shift and geometric shift between domains. In order to make full use of the feature extraction ability of TSC and the transfer classification ability of JGSA, this paper proposes a new transfer learning fault diagnosis method (TSC-JGSA) that combines the two to diagnose rolling bearing faults under variable working conditions. In TSC-JGSA, the fast Fourier transform is used to transform the time-domain signals into frequency-domain amplitudes. TSC then extracts deep features from these amplitudes to construct a sparse feature matrix, which is input into JGSA to realize the fault diagnosis of rolling bearings. Finally, vibration data of rolling bearings under variable working conditions are used to verify the effectiveness of TSC-JGSA. The experimental results show that TSC-JGSA can effectively solve the problem of lacking labeled data in actual engineering by using labeled laboratory data, and that it obtains higher diagnosis accuracy than the compared methods. It provides a new diagnosis idea for rotating machinery.


I. INTRODUCTION
Rolling bearings play an important role in large-scale rotating machinery and equipment. Once a rolling bearing fails, it can cause serious economic loss or casualties. Therefore, it is necessary to carry out fault diagnosis for rolling bearings [1]-[3]. However, the actual working conditions of rolling bearings are complex and changeable. It is of great practical significance to effectively use the labeled data of known working conditions to diagnose faults in unlabeled data collected in actual engineering under unknown working conditions.
The associate editor coordinating the review of this manuscript and approving it for publication was Baoping Cai.
In recent years, many scholars have studied the fault diagnosis problem under variable operating conditions in depth [4]-[7]. The electrostatic detection method is used to study the characteristics of electrostatic signals corresponding to different fault injection degrees of rolling bearings under variable working conditions. A multi-body contact dynamic model is established to explore the multi-body contact dynamic characteristics of ball bearings under variable working conditions. Parameter optimized variational mode decomposition (POVMD) and the envelope order spectrum are used to extract fault feature information. The vibration signal of rolling bearings is analyzed and processed by fast spectral correlation to obtain the eigenvector, which is input into PSO-SVM for state recognition.
However, the traditional fault diagnosis methods are limited by a complicated analysis process and poor generalization ability under variable working conditions. Transfer learning has therefore attracted extensive attention [8], [9]. The feature extraction method of singular value decomposition and autocorrelation matrix is combined with the transfer learning TrAdaBoost algorithm to diagnose motor faults [10]-[14]. A new fault identification method based on combining the long short-term memory network (LSTM) with transfer learning (TL) is proposed. A semi-supervised integrated learning tool (SSIT) based on transfer learning is proposed for engine bearing fault prediction to solve the problems of insufficient prediction accuracy and over-fitting [15]-[17]. A bearing fault diagnosis model based on transfer learning is composed of a stacked sparse autoencoder and softmax regression.
Domain adaptation is an important branch of transfer learning. It is an effective way to solve the problem in which the source domain and the target domain have different data distributions but the same task. The subspace described by the eigenvectors is used to represent the source domain and the target domain [18]. A mapping function that aligns the source subspace with the target subspace is learned to find a domain adaptation solution. The subspace mapping method is extended so that the distributions and subspace bases of the source and target domains are aligned simultaneously [19]. A kernel-based method is proposed to utilize the inherent low-dimensional structure of data sets [20]. The geodesic flow kernel model describes the changes in geometric and statistical characteristics from the source domain to the target domain by integrating an infinite number of subspaces [21]. A new dimension reduction method, transfer component analysis, is proposed to find a good cross-domain feature representation [22]. The joint distribution adaptation method adapts the marginal distribution and the conditional distribution during dimension reduction and constructs a new feature representation. The transfer joint matching method combines feature matching and sample re-weighting in a unified optimization problem [23]. A joint geometric and statistical alignment (JGSA) algorithm is proposed to reduce domain shift both statistically and geometrically [24].
Recently, apart from the feature-based transfer learning methods mentioned above, domain adaptation has been realized through many new ideas. Meanwhile, more and more scholars focus on neural networks and adversarial learning. An algorithm named structured domain adaptation (SDA) is proposed to seek a discriminative subspace shared by two domains where the well-learned knowledge of the source domain can be transferred to the target domain [25]. Samples from both domains are combined to reveal more shared information across the two domains. A previously unexplored instance of the general framework is proposed, which combines discriminative modeling and untied weight sharing [26]. A novel deep learning framework that can exploit labeled source data and unlabeled target data to learn informative hash codes is proposed to accurately classify unseen target data [27]. A weighted MMD model is proposed by introducing an auxiliary weight for each class in the source domain [28]. A novel approach named Learning Distribution-Matched Landmarks (LDML) is proposed [29]. LDML reveals the latent factors by learning a domain-invariant subspace where the two domains are well aligned at both feature level and sample level. A two-stream architecture is introduced where one stream operates in the source domain and the other in the target domain [30]. In contrast to other approaches, the weights in corresponding layers are related but not shared. A new unsupervised domain adaptation approach called Collaborative and Adversarial Network (CAN) is proposed through domain-collaborative and domain-adversarial training of neural networks [31]. A new domain adaptation method called Domain-Symmetric Networks (SymNets) is proposed [32]. Transferrable Prototypical Networks (TPN) are presented for adaptation such that the prototypes for each class in the two domains are close in the embedding space and the score distributions predicted by prototypes separately on source and target data are similar [33].
Contrastive Adaptation Network (CAN) is proposed to optimize a new metric which explicitly models the intra-class domain discrepancy and the inter-class domain discrepancy [34]. A dynamic Bayesian network (DBN)-based fault diagnosis methodology in the presence of TF and IF for electronic systems is proposed [35]. An OOBN-based real-time fault diagnosis methodology is proposed [36]. A hybrid physics-model-based and data-driven remaining useful life (RUL) estimation methodology for structure systems using dynamic Bayesian networks (DBNs) is proposed [37]. A personalized diagnosis method to detect faults in gears using numerical simulation and an extreme learning machine is proposed [38]. A personalized diagnosis method to detect faults in a bearing based on acceleration sensors and an FEM simulation driving support vector machine is proposed [39]. A new unsupervised domain adaptation method named domain-adversarial residual-transfer learning of deep neural networks is proposed to tackle cross-domain image classification tasks [40]. A domain adaptation method for machinery fault diagnostics based on deep learning is proposed to address fault diagnostic tasks with data from different places on machines [41], [42]. A domain adaptation diagnostic model based on an improved deep neural network is proposed, which diagnoses early gear pitting faults under multiple working conditions [43], [44]. In addition, some new algorithms are also proposed, which can optimize the diagnosis models to improve the classification performance [45]-[52].
It can be seen from the previous work that deep learning methods have been introduced to solve the transfer problem and achieve good results through better feature extraction. Transfer sparse coding (TSC) is proposed to construct robust sparse representations for classifying cross-distribution images accurately [53]. It is a feature representation method that can effectively extract features from a data matrix. JGSA is a domain adaptation method that can reduce the distribution shift and geometric shift between domains by using shared features and domain-specific features, and it can realize the classification effectively and accurately. In order to make full use of the feature extraction ability of TSC and the transfer classification ability of JGSA, TSC and JGSA are combined and introduced into fault diagnosis to propose a new transfer learning fault diagnosis (TSC-JGSA) method for rolling bearings under variable working conditions. In the proposed TSC-JGSA method, the time-domain vibration signals under known and unknown working conditions are preprocessed by the fast Fourier transform to obtain the frequency-domain amplitudes. TSC is then used to extract features from the frequency-domain amplitudes to construct the sparse feature matrix, which is input into the JGSA model to realize the fault diagnosis. Finally, the vibration signals of rolling bearings under variable working conditions are used to verify the effectiveness of the TSC-JGSA method.
The main contributions of this paper are summarized as follows:
• A new transfer learning fault diagnosis method that combines the sparse coding method with the domain adaptation algorithm is proposed to realize the fault classification.
• The motivation for combining TSC and JGSA is to make full use of the feature extraction ability of TSC and the transfer classification ability of JGSA to realize the fault diagnosis of rolling bearings under variable working conditions.
• The performance of the TSC-JGSA method has been extensively investigated using the vibration signals of rolling bearings under variable working conditions.
• The TSC-JGSA method can effectively solve the problem of lacking labeled data in actual engineering by using labeled data from the laboratory.

A. TSC
Sparse coding is an effective method for feature extraction, which can realize adaptive feature extraction [53], [54]. Given a data matrix $X = [x_1, \dots, x_n] \in \mathbb{R}^{m \times n}$, i.e., $n$ data points sampled from an $m$-dimensional feature space, the dictionary matrix is expressed as $B = [b_1, \dots, b_k] \in \mathbb{R}^{m \times k}$, where each column vector $b_i$ is a basis vector of the dictionary, and the coding matrix is expressed as $S = [s_1, \dots, s_n] \in \mathbb{R}^{k \times n}$, where each column vector $s_i$ is a sparse representation of the data point $x_i$. The specific formula is given as follows.
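In its standard formulation, which matches the symbols $X$, $B$, $S$ and λ used here, the sparse coding objective can be written as below. This is a reconstruction under that assumption; the constant $c$ bounding the norm of each basis vector is a name introduced for illustration.

```latex
\min_{B,S}\ \|X - BS\|_F^2 + \lambda \sum_{i=1}^{n} \|s_i\|_1
\quad \text{s.t.}\quad \|b_i\|_2^2 \le c,\quad i = 1,\dots,k
```

The first term measures how well the dictionary and codes approximate the input data, and the second term penalizes dense codes.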
where λ is an adjustable regularization parameter, which balances the sparsity of the coding against the approximation of the input data. Sparse coding learns a set of dictionary basis vectors and the coding matrix from the given training samples by alternating iterations. It then uses an optimization method to solve for the coding matrix of the test samples. Through this process, the raw data is reconstructed into a new feature representation using the dictionary basis vectors. Because the coding matrix S is sparse, it can greatly improve the calculation speed and save storage space.
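As an illustration of the coding step only, the following minimal sketch solves for sparse codes with a fixed dictionary using ISTA (iterative soft thresholding). ISTA is a swapped-in, generic solver chosen for brevity, not necessarily the solver used by the TSC authors. The dictionary update is omitted, and the function names and the synthetic data are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft thresholding, the proximal operator of t * |v|."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(X, B, S, lam):
    """Sparse coding objective ||X - B S||_F^2 + lam * sum |S|."""
    return np.linalg.norm(X - B @ S, "fro") ** 2 + lam * np.abs(S).sum()

def sparse_codes_ista(X, B, lam=0.1, n_iter=200):
    """Minimize the objective over S for a fixed dictionary B (assumed solver: ISTA)."""
    # Lipschitz constant of the gradient of the quadratic term.
    lipschitz = 2.0 * np.linalg.norm(B, 2) ** 2
    S = np.zeros((B.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = 2.0 * B.T @ (B @ S - X)
        S = soft_threshold(S - grad / lipschitz, lam / lipschitz)
    return S

# Synthetic example: 15 data points built from 3 dictionary atoms each.
rng = np.random.default_rng(0)
m, k, n = 20, 40, 15
B = rng.standard_normal((m, k))
B /= np.linalg.norm(B, axis=0)          # unit-norm basis vectors
S_true = np.zeros((k, n))
for j in range(n):
    idx = rng.choice(k, size=3, replace=False)
    S_true[idx, j] = rng.standard_normal(3)
X = B @ S_true

S_hat = sparse_codes_ista(X, B, lam=0.1)
print("objective at zero codes:", objective(X, B, np.zeros((k, n)), 0.1))
print("objective after ISTA:   ", objective(X, B, S_hat, 0.1))
```

A larger `lam` drives more entries of `S_hat` to exactly zero at the cost of a larger reconstruction error.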
In order to take into account the intrinsic geometry of the input data, a graph regularized sparse coding (GraphSC) method is proposed [54]. This method captures the geometric information of the data by learning a sparse representation that explicitly considers the local manifold structure of the data. The formula is given as follows.
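In the usual formulation of GraphSC, a graph Laplacian term is added to the sparse coding objective. The form below is a reconstruction under that assumption; $L$ (the Laplacian of a nearest-neighbor graph built on the data points) and the norm bound $c$ are symbols introduced here.

```latex
\min_{B,S}\ \|X - BS\|_F^2 + \gamma\,\mathrm{tr}\!\left(S L S^{\top}\right)
+ \lambda \sum_{i=1}^{n} \|s_i\|_1
\quad \text{s.t.}\quad \|b_i\|_2^2 \le c,\quad i = 1,\dots,k
```

The trace term is small when data points that are neighbors in the graph receive similar codes.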
where γ is a graph regularization parameter, which balances sparse coding against geometry preservation. However, when the labeled and unlabeled data are sampled from different distributions, they may be quantized into different feature words of the codebook and encoded with different representations, which may seriously degrade the classification performance. To improve the learning ability of sparse coding, the distribution difference between the labeled data and the unlabeled data is minimized, and this criterion is added to the objective function of sparse coding [45]. The empirical maximum mean discrepancy (MMD) is used as a nonparametric distance measure to compare different distributions, and the objective function of TSC is obtained as follows.
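Following the usual formulation of TSC, the MMD term enters the objective through a matrix $M$ alongside the graph Laplacian $L$ of the GraphSC term. The form below is a reconstruction under that assumption; $M$, $n_l$, $n_u$ (the numbers of labeled and unlabeled samples) and the norm bound $c$ are symbols introduced here.

```latex
\min_{B,S}\ \|X - BS\|_F^2 + \mathrm{tr}\!\left(S\,(\mu M + \gamma L)\,S^{\top}\right)
+ \lambda \sum_{i=1}^{n} \|s_i\|_1
\quad \text{s.t.}\quad \|b_i\|_2^2 \le c,\quad i = 1,\dots,k

M_{ij} =
\begin{cases}
1/n_l^2, & x_i, x_j \text{ both labeled} \\
1/n_u^2, & x_i, x_j \text{ both unlabeled} \\
-1/(n_l n_u), & \text{otherwise}
\end{cases}
```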
where µ > 0 is the MMD regularization parameter, which is used to determine the weight between GraphSC and distribution matching.
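The MMD term can be checked numerically. The sketch below builds the standard empirical MMD matrix for $n_l$ labeled and $n_u$ unlabeled samples and confirms that the trace form equals the squared distance between the mean codes of the two groups. The function name and the random codes are hypothetical, and a linear kernel on the codes is assumed.

```python
import numpy as np

def mmd_matrix(n_l, n_u):
    """MMD matrix M: 1/n_l^2 (labeled pairs), 1/n_u^2 (unlabeled pairs), -1/(n_l*n_u) otherwise."""
    e = np.concatenate([np.full(n_l, 1.0 / n_l), np.full(n_u, -1.0 / n_u)])
    return np.outer(e, e)

rng = np.random.default_rng(0)
k, n_l, n_u = 8, 5, 7
S = rng.standard_normal((k, n_l + n_u))   # columns: codes of labeled, then unlabeled samples

M = mmd_matrix(n_l, n_u)
mmd_trace = np.trace(S @ M @ S.T)

# Direct computation: squared distance between the two mean codes.
mean_gap = S[:, :n_l].mean(axis=1) - S[:, n_l:].mean(axis=1)
mmd_direct = float(mean_gap @ mean_gap)

print(mmd_trace, mmd_direct)
```

Both printed values agree, which is why minimizing the trace term pulls the mean codes of the two distributions together.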

B. JGSA
In order to reduce the shift between domains both geometrically and statistically, a domain adaptation method referred to as joint geometric and statistical alignment (JGSA) is proposed, which uses both shared features and domain-specific features. The definition of JGSA is described as follows.
The source domain data is denoted $X_s$, drawn from the distribution $P_s(X_s)$, and the target domain data is denoted $X_t$, drawn from the distribution $P_t(X_t)$, where $n_s$ and $n_t$ are the numbers of samples in the source domain and target domain, respectively. We assume that the feature space and label space of the two domains are the same, i.e., $\mathcal{X}_s = \mathcal{X}_t$ and $\mathcal{Y}_s = \mathcal{Y}_t$. Due to dataset shift, $P_s(X_s) \neq P_t(X_t)$.
JGSA obtains a new representation of each domain by finding two coupled projections ($A$ for the source domain and $B$ for the target domain) that map the source domain data and the target domain data to their respective subspaces.
VOLUME 8, 2020
After projection, the following four goals are pursued:
1) The variance of the target domain data is maximized to preserve the target domain data attributes.
2) The discriminative information of the source domain data is retained to transfer the label information effectively.
3) Both the marginal distribution and conditional distribution divergences between the source and target domains are minimized to reduce the domain shift statistically.
4) The divergence between the two projections is kept as small as possible to reduce the domain shift geometrically.
Based on the above four points, the objective function of the JGSA method is obtained as follows.
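In the usual trace-ratio formulation of JGSA, the four goals are combined as below. This is a reconstruction under that assumption: $S_t$ (target scatter), $S_w$ and $S_b$ (within-class and between-class scatter of the source data), and $M_s$, $M_t$, $M_{st}$, $M_{ts}$ (the blocks of the marginal and conditional MMD matrices) are symbols introduced here.

```latex
\max_{A,B}\
\frac{\mu\,\{\text{target variance}\} + \beta\,\{\text{between-class variance}\}}
     {\{\text{distribution shift}\} + \lambda\,\{\text{subspace shift}\} + \beta\,\{\text{within-class variance}\}}

\max_{A,B}\
\frac{\mathrm{Tr}\!\left(
\begin{bmatrix} A^{\top} & B^{\top} \end{bmatrix}
\begin{bmatrix} \beta S_b & 0 \\ 0 & \mu S_t \end{bmatrix}
\begin{bmatrix} A \\ B \end{bmatrix}\right)}
{\mathrm{Tr}\!\left(
\begin{bmatrix} A^{\top} & B^{\top} \end{bmatrix}
\begin{bmatrix} M_s + \lambda I + \beta S_w & M_{st} - \lambda I \\
                M_{ts} - \lambda I & M_t + (\lambda + \mu) I \end{bmatrix}
\begin{bmatrix} A \\ B \end{bmatrix}\right)}
```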
The goal is to find two coupled projections $A$ and $B$ by solving the optimization function, where $I \in \mathbb{R}^{d \times d}$ is the identity matrix, µ is the target domain scatter matrix coefficient, β is the coefficient of the within-class and between-class scatter matrices, and λ is the subspace shift coefficient.
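The following minimal sketch shows how such coupled projections can be computed as a generalized eigenproblem. It is a simplification under stated assumptions, not the authors' implementation: only the marginal MMD term is used (no pseudo-labels, so no conditional term and no iterations), the data space is linear, and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def scatter_matrices(Xs, ys):
    """Within-class (Sw) and between-class (Sb) scatter of source data Xs (d x n_s)."""
    d = Xs.shape[0]
    mean_all = Xs.mean(axis=1, keepdims=True)
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(ys):
        Xc = Xs[:, ys == c]
        mc = Xc.mean(axis=1, keepdims=True)
        Sw += (Xc - mc) @ (Xc - mc).T
        Sb += Xc.shape[1] * (mc - mean_all) @ (mc - mean_all).T
    return Sw, Sb

def jgsa_matrices(Xs, ys, Xt, mu=1.0, beta=0.1, lam=1.0):
    """Numerator and denominator block matrices (marginal MMD only, an assumption)."""
    d = Xs.shape[0]
    I, Z = np.eye(d), np.zeros((d, d))
    Sw, Sb = scatter_matrices(Xs, ys)
    Xt_c = Xt - Xt.mean(axis=1, keepdims=True)
    St = Xt_c @ Xt_c.T                      # target scatter (variance) matrix
    ms = Xs.mean(axis=1, keepdims=True)
    mt = Xt.mean(axis=1, keepdims=True)
    Ms, Mt, Mst = ms @ ms.T, mt @ mt.T, -ms @ mt.T
    num = np.block([[beta * Sb, Z], [Z, mu * St]])
    den = np.block([[Ms + lam * I + beta * Sw, Mst - lam * I],
                    [Mst.T - lam * I, Mt + (lam + mu) * I]])
    return num, den

def solve_projections(num, den, k):
    """Leading k solutions of num w = phi * den w via a Cholesky reduction."""
    C_inv = np.linalg.inv(np.linalg.cholesky(den))   # den is positive definite for mu, lam > 0
    vals, vecs = np.linalg.eigh(C_inv @ num @ C_inv.T)
    order = np.argsort(vals)[::-1][:k]
    W = C_inv.T @ vecs[:, order]
    d = num.shape[0] // 2
    return W[:d], W[d:], vals[order]

# Synthetic example: 3 classes in 5 dimensions, target shifted by a constant offset.
rng = np.random.default_rng(0)
ys = np.repeat(np.arange(3), 20)
centers = 3.0 * rng.standard_normal((5, 3))
Xs = centers[:, ys] + rng.standard_normal((5, ys.size))
Xt = centers[:, ys] + 1.5 + rng.standard_normal((5, ys.size))

num, den = jgsa_matrices(Xs, ys, Xt)
A, B, vals = solve_projections(num, den, k=2)
Zs, Zt = A.T @ Xs, B.T @ Xt                 # projected source and target samples
print(A.shape, B.shape, Zs.shape, Zt.shape)
```

The defaults `mu=1.0`, `beta=0.1` and `lam=1.0` mirror the coefficient values reported in the parameter settings of this paper; the subspace dimension `k=2` is chosen only to keep the toy example small.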

III. A TRANSFER LEARNING FAULT DIAGNOSIS METHOD
A. THE IDEA OF THE FAULT DIAGNOSIS
In the case of variable working conditions, there is a large difference in the statistical distribution between the data under known and unknown working conditions, but the feature space and the label space are the same. The potential common features of the known working conditions are learned, while the unique features of the unknown working conditions are retained to improve the generalization ability of the model. The vibration signal is transformed from the time domain to the frequency domain to obtain fault information that is more conducive to identification and matching. This transformation also reduces the computational complexity of subsequent algorithms. The data collection environment is not a strict laboratory environment; it is closer to actual engineering conditions. The fault data therefore often contains redundant information and noise, so useful features need to be further extracted. Sparse coding is well suited to feature extraction. TSC is a feature representation method that finds and extracts effective features from the data, which also suppresses the influence of noise, and it thus expresses data features more accurately and concisely. Domain adaptation is an effective way to handle the case where the data distribution in the source domain differs from that in the target domain but the tasks are the same. JGSA reduces the distribution shift and geometric shift between domains by simultaneously using the shared features and domain-specific features of the source and target domain data, thereby achieving transfer learning.
In order to make full use of the feature extraction ability of the TSC and the transfer classification ability of the JGSA, a new transfer learning fault diagnosis (TSC-JGSA) method based on creatively combining TSC with JGSA is proposed to realize the fault diagnosis of rolling bearings under variable working conditions. The bearing vibration data of three different rotating speeds and ten fault types under known and unknown working conditions are used to verify the effectiveness of the TSC-JGSA method.

B. FAULT DIAGNOSIS MODEL
The flow of the TSC-JGSA method for the fault diagnosis of rolling bearings under variable working conditions is shown in Figure 1.

C. THE STEPS OF THE TSC-JGSA
In this paper, the advantages of sparse coding are used to extract deep features. The variable working condition problem is addressed by combining the characteristics of TSC and JGSA: the differences between samples from different working conditions are reduced in order to diagnose multiple fault types of rolling bearings at different rotating speeds. The specific steps of TSC-JGSA are described as follows.
Step 1. Data preprocessing. The multi-state time-domain vibration signals of rolling bearings under known and unknown working conditions are transformed into the frequency domain by the fast Fourier transform to obtain the corresponding frequency-domain amplitudes.
Step 2. Feature extraction. TSC is used to extract the deep features of the frequency-domain amplitudes of the vibration signals.
Step 3. Feature matrix construction. The obtained features are used to construct the feature matrix, which is composed of the training samples (source domain) and the test samples (target domain).
Step 4. Domain adaptation processing. JGSA is used to perform domain adaptation on the training samples (source domain) and the test samples (target domain). It statistically reduces the marginal distribution shift and the conditional distribution shift between domains, geometrically reduces the subspace shift after projection, and improves the distribution similarity of the samples between domains.

IV. VALIDATION AND ANALYSIS
A. DATA DESCRIPTION AND PARAMETER SETTING
In this experiment, the experimental data come from the QPZZ-II rotating machinery experiment platform, which is shown in Figure 2. The platform consists of a variable-speed drive motor with a power of 0.75 kW, bearings, gearboxes, shafts, eccentric turntables, a governor, and so on. The inner ring, outer ring and rolling elements of the bearing are machined with single-point damage. The eccentric turntable simulates the unbalanced working condition of the rotor by placing weights. The rolling bearing is a cylindrical roller bearing of type N205. By simulating bearing faults and shafting imbalance, the acceleration vibration signals at different rotating speeds and different fault positions are collected. The fault positions are the inner ring, the outer ring and the rolling element of the rolling bearings. The vibration signal is collected by a vibration acceleration sensor and a data acquisition card at a sampling frequency of 12 kHz. The bearing fault data also includes signals sampled at 48 kHz. The vibration acceleration signals are recorded at motor speeds of 1000 rpm, 1250 rpm and 1500 rpm, respectively.
At each speed, the vibration signal data cover 10 states: 5 single states (normal, inner ring fault, outer ring fault, rolling element fault and rotor imbalance) and 5 coupled faults. The data is segmented into samples of length 1024. The input data are thus classified into 10 classes, whose states and labels are described in Table 1. The experiments were run on a computer with an Intel(R) Core(TM) i5-7400 CPU @ 3.00 GHz and 8 GB of memory, using MATLAB 2018b. The 10 kinds of multi-state samples of rolling bearings under different working conditions are set as follows.
(1) Set the working condition A as 1000r/min.
(2) Set the working condition B as 1250r/min.
(3) Set the working condition C as 1500r/min.
The specific descriptions of the vibration signal samples are shown in Table 2. In Table 2, the A/B working condition indicates that the multi-state feature sample set of working condition A is used as the source domain, that is, the training feature sample set, and the feature sample set of working condition B is used as the target domain, that is, the test feature sample set. Under each working condition, 2000 samples in the source domain are selected for training and 1800 samples in the target domain are selected for testing. The reason for this choice is that the fault diagnosis accuracy is better when there is slightly more training data than test data.
The time-domain data samples of rolling bearing vibration signals under the above-mentioned working conditions are transformed into frequency-domain data samples by the fast Fourier transform. The dimension of the frequency-domain amplitudes is then half of the original time-domain dimension plus one, namely 1024/2 + 1 = 513 dimensions. The transformed frequency-domain amplitudes of all samples are input into TSC as the feature values.
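The preprocessing described above can be sketched as follows. The segment length of 1024 and the 12 kHz sampling frequency come from the text; the one-sided real FFT is an assumption that is consistent with the stated 513 dimensions, and the 1500 Hz test tone and the function names are hypothetical.

```python
import numpy as np

def segment(signal, length=1024):
    """Cut a 1-D signal into non-overlapping samples of the given length."""
    n = len(signal) // length
    return signal[: n * length].reshape(n, length)

def fft_amplitudes(segments):
    """One-sided FFT amplitude spectrum of each row (length // 2 + 1 bins)."""
    return np.abs(np.fft.rfft(segments, axis=1))

fs = 12_000                                  # sampling frequency in Hz
t = np.arange(4 * 1024) / fs
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 1500 * t) + 0.1 * rng.standard_normal(t.size)

segments = segment(signal)
amps = fft_amplitudes(segments)
peak_bin = int(amps[0].argmax())
peak_hz = peak_bin * fs / 1024               # frequency of the strongest bin

print(amps.shape, peak_bin, peak_hz)
```

Each row of `amps` is one 513-dimensional sample of the kind that is fed into TSC.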
In this paper, the parameter values are set according to the suggestions given by the original authors of the two methods. It is generally believed that good results can be achieved as long as the parameters are within the suggested ranges. The number of subspace basis vectors of JGSA is set to 30, and the kernel type is set to primal, which means that JGSA operates in the original data space. In JGSA, the target domain scatter matrix coefficient µ is 1, the coefficient β of the within-class and between-class scatter matrices is 0.1, and the subspace shift coefficient λ is 1. In TSC, the number of PCA basis vectors is set to 64, the dictionary dimension is k, the number of dictionary basis vectors is 128, the MMD regularization parameter is 1e6, the graph regularization parameter is 1, the sparsity penalty parameter is 0.1, and the number of TSC iterations is 10.

B. FEATURE EXTRACTION
The TSC can adaptively extract the features from the data under the unknown data label condition, which can avoid the empirical dependence of signal processing methods and improve the feature extraction efficiency. Therefore, in this paper, the features of distinguishable multi-state vibration signals obtained by TSC are used as the input of the domain adaptation model of JGSA. The flow of deep feature extraction of rolling bearing vibration signals using the TSC is shown in Figure 3.
The TSC is adopted to extract the vibration signal features of rolling bearings. The specific steps are described as follows.
Step 1. Select the original multi-state data of rolling bearings. Set the number of vibration signal data points in each state and the length of each sample.
Step 2. According to the equation (1), the time-domain data is transformed into the frequency-domain data by using fast Fourier transform. The obtained frequency-domain data is used as the input data of TSC.
Step 3. Initialize the network structure parameters of the TSC. The frequency-domain data of the vibration signal is used as training samples. The TSC is trained in order to obtain the optimal outputs B and S when the objective function is minimized.
Step 4. The output is the sparse matrix S, which contains the deep features extracted by TSC.

C. EXPERIMENT RESULTS
After the frequency-domain amplitudes of the rolling bearing vibration signals are input into the TSC to obtain deep feature samples, the JGSA is introduced to process the deep feature samples under different working conditions in order to increase within-class compactness and between-class discrimination. In order to reflect the advantages of JGSA, the non-transfer learning scheme TSC-KNN and the schemes of principal component analysis (PCA), GFK, TJM and JDA were selected for comparison. The fault diagnosis accuracies of rolling bearings using different schemes under different working conditions are shown in Table 3 and Figure 4.
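For reference, the nearest-neighbor classification used by a KNN-based baseline, and the accuracy measure reported for each scheme, can be sketched as follows. The choice K = 1, the function names and the toy data are assumptions made for illustration; the value of K used in the experiments is not stated here.

```python
import numpy as np

def knn1_predict(Z_train, y_train, Z_test):
    """1-nearest-neighbor prediction with squared Euclidean distance (samples in rows)."""
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d2.argmin(axis=1)]

def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples."""
    return float(np.mean(y_true == y_pred))

# Toy features: two source-domain classes and two target-domain test samples.
Z_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_train = np.array([0, 0, 1])
Z_test = np.array([[0.2, 0.1], [4.8, 5.1]])
y_test = np.array([0, 1])

y_pred = knn1_predict(Z_train, y_train, Z_test)
print(y_pred, accuracy(y_test, y_pred))
```

In a transfer setting, `Z_train` would hold the source-domain features and `Z_test` the target-domain features.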
As can be seen from Figure 4 and Table 3, compared with the other five schemes, JGSA has the highest fault diagnosis accuracy under five working conditions, the exception being the working condition A/C. Under the working condition A/C, its accuracy is slightly lower than those of the two schemes TJM and JDA. In the overall comparison, JGSA has the highest average fault diagnosis accuracy among the six schemes. The analysis shows that the deep features obtained by TSC are the same for all schemes. The other methods do not retain the attributes of the original data well when the distribution shift is reduced. They fail to consider the correlation between the statistical
Due to the randomness of the TSC, each experiment is repeated 10 times in order to show that the experimental results are not accidental and to further explore the robustness of the proposed scheme. The box plots of the six experimental schemes under the six working conditions, based on the results of the 10 experiments, are shown in Figures 5 to 10. As can be seen from Figure 5 to Figure 8, under these four working conditions the lowest accuracy of the TSC-JGSA in the 10 experiments is still higher than those of the other comparative methods. The experimental results are also less dispersed, which shows that the TSC-JGSA not only has the highest accuracy under the four conditions but also has good robustness.
It can be seen from Figure 9 that under the working condition A/C, the JGSA shows a large dispersion: its lowest accuracy is lower than that of the TJM method, and its median accuracy is lower than those of the two methods TJM and JDA. This shows that when the working conditions differ greatly, that is, when the two rotating speeds are far apart, the JGSA cannot show strong robustness, especially when the high-speed data learns from the low-speed data.
The reason may be that the data of the two speeds are quite different, so the features cannot be well aligned to reduce the distribution shift and the subspace shift. This is also a limitation of the scheme. How to enhance the robustness of the JGSA under two very different working conditions is the focus of future research.
It can be seen from Figure 10 that the accuracy of the JGSA is lower than that of the JDA in the 10 experiments. This again reflects the robustness problem of the proposed method seen in Figure 9 when the working conditions are quite different. However, compared with the results of the previous working condition, the dispersion of the experimental results and the median accuracy have been greatly improved, and the median accuracy is still higher than those of the other comparative experimental methods.

V. CONCLUSION AND PROSPECTS
In this paper, a new transfer learning fault diagnosis method based on TSC and JGSA for rolling bearings under variable working conditions is proposed. The TSC is used to extract the vibration signal features of rolling bearings under variable working conditions, so as to obtain features that can better represent the state of rolling bearing. The JGSA domain adaptation method is introduced to reduce the distribution domain shift and subspace domain shift simultaneously, and the samples under different working conditions are well aligned statistically and geometrically to reduce sample differences between different working conditions. Compared with other domain adaptation methods and non-transfer learning methods, the experimental results show that the proposed TSC-JGSA method has higher accuracy and stronger robustness in rolling bearing state classification under variable speed.
In the next step, we will further study domain adaptation methods in transfer learning to improve the robustness of the JGSA in fault diagnosis under variable working conditions, so as to better diagnose rolling bearing faults under variable speed.