A Novel Method for Predicting Fault Labels of Roller Bearing by Generalized Laplacian Matrix

Because mechanical failures are accompanied by contingency and randomness, fault data are often difficult to obtain, and fault labels are also difficult to assign. The lack of data and fault labels has become an important issue that restricts the development of fault diagnosis. This paper proposes a generalized Laplacian label prediction (GLLP) algorithm, which uses the generalized Laplacian matrix to construct a new locally smooth term. As a result, data points with ambiguous and unclear labels are assigned a small label value, while samples with more certain labels receive a more confident label value. The effectiveness of the method is verified on a public dataset and a real test rig dataset, and it is expected that the method can be extended to fault diagnosis of more complex mechanical systems.


I. INTRODUCTION
Mechanical systems are similar to medical systems: when faults occur, it is vital to quickly and accurately locate the fault points and identify their causes. In actual mechanical systems, the collected fault data often lack a large amount of label information, which is a fatal blow to most existing fault diagnosis methods based on supervised learning and deep learning [1], [2]. How can the fault type be classified accurately using only the incomplete labels at hand? This dilemma is one that current fault diagnosis research has to face, and in response a series of fault diagnosis methods based on few-shot learning have emerged [3], [4].
Fault diagnosis methods based on unsupervised learning are adopted in [5]-[8]. The labels of the fault samples are discarded directly and samples of the same category are clustered together, realizing fault diagnosis under unlabeled conditions. This approach effectively addresses the problem of insufficient sample labels, but the failure type cannot be determined: an intelligent fault diagnosis method based on unsupervised learning can only keep samples of the same category close together; it cannot identify the fault category. In response to this shortcoming, [9]-[12] proposed a series of fault diagnosis methods based on transfer learning. These methods can effectively resolve the lack of data and labels by transferring knowledge learned in other fields to the field of fault diagnosis, which is a research focus of intelligent fault diagnosis. However, how to judge whether the knowledge learned from other fields is relevant to the features of the existing data is a practical problem that transfer learning has to face. To solve this problem, researchers returned their focus to the existing data and tried to find an effective solution from the data with missing labels, as shown in Fig.1. [13]-[18] proposed intelligent fault diagnosis methods based on semi-supervised learning, which assign the same labels to samples with similar features by building a similarity matrix. However, existing semi-supervised fault diagnosis methods consider only pairwise smooth terms as the basis for constructing similarity matrices, which is vague and ambiguous, so they are often unsatisfactory when propagating label information, as shown in Fig.2.
The associate editor coordinating the review of this manuscript and approving it for publication was Dazhong Ma.
In Fig.2(a), the blue, red, and black circles represent positive samples, negative samples, and unlabeled samples, respectively. The circle with coordinates (6, 2) is equidistant from the positive and negative samples, so it can be assigned to either category; this point is called the 'ambiguity point'. Fig.2(b) shows that the ambiguity point is misclassified when only pairwise smooth terms are adopted. In response to this ambiguity, a generalized Laplacian matrix [19] is adopted to define a new smooth term. As shown in Fig.2(c), the proposed smooth term effectively prevents label information from passing through ambiguity points and achieves a better classification result.
The main contributions of this research can be summarized as follows: 1) Solving the dilemma of insufficient data labels for fault diagnosis; 2) A new smooth term is constructed by adopting generalized Laplace matrix; 3) The proposed GLLP can be regarded as a unified framework for graph-based label propagation methods.
The rest of the paper is organized as follows: the inference model and the inductive model are described in detail in Section II. In Section III, the label propagation and fault diagnosis performance are verified on the Double Moon dataset and the MFS-MG test rig dataset. Conclusions are drawn in Section IV.

II. THE PROPOSED METHOD
The proposed GLLP algorithm is divided into two parts: an inference model and an inductive model. The inference model is derived in Euclidean space, and the inductive model is described in a reproducing kernel Hilbert space.

A. INFERENCE MODEL
The graph W = (Q, R) is given. Existing graph inference algorithms [20], [21] usually use the original Laplacian matrix H = Z − P to describe the smoothness between labels. When the vector u = (u_1, u_2, ..., u_k)^T is used to record the soft labels of all samples {c_a}_{a=1}^{k} in φ, the existing smooth term can be described as

\[ \Omega(u) = u^{T} H u = \frac{1}{2}\sum_{a,b} p_{ab}\,(u_a - u_b)^2. \tag{1} \]

However, Fig.2 has shown that the pairwise smooth term defined by Eq.1 cannot effectively handle ambiguous bridge points, so we redefine a new smooth term

\[ \bar{\Omega}(u) = u^{T} H u + \nu\, u^{T}\!\left(E - Z/r\right) u, \tag{2} \]

where ν is a non-negative parameter used to impose different weights, and r = \sum_{a=1}^{k} l_{aa} represents the volume of the graph W. The second term on the right-hand side is the locally smooth term, which can be further elaborated as

\[ \nu\, u^{T}\!\left(E - Z/r\right) u = \nu \sum_{a=1}^{k}\left(1 - \frac{l_{aa}}{r}\right) u_a^2. \tag{3} \]

In the K-Nearest Neighbor graph W, l_{aa} records the closeness of c_a to its surrounding elements, so when Eq.3 is minimized, a sample with a larger l_{aa} can obtain a confident label u_a, while a sample with a smaller l_{aa} obtains a relatively weak label.
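To make the construction concrete, the smooth terms of Eq.1-Eq.3 can be sketched in Python. The Gaussian-weighted K-Nearest Neighbor affinity and the function names (`knn_graph`, `smooth_terms`) are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def knn_graph(X, k=3, omega=1.0):
    """Build a symmetric K-Nearest Neighbor affinity matrix P
    with Gaussian weights (a common construction; the paper's
    exact weighting may differ)."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    P = np.zeros((n, n))
    for a in range(n):
        nbrs = np.argsort(d2[a])[1:k + 1]       # skip the point itself
        P[a, nbrs] = np.exp(-d2[a, nbrs] / (2 * omega ** 2))
    return np.maximum(P, P.T)                   # symmetrize

def smooth_terms(P, u, nu=0.1):
    """Pairwise smooth term u^T H u with H = Z - P (Eq.1),
    plus the proposed local term nu * u^T (E - Z/r) u (Eq.2/3)."""
    Z = np.diag(P.sum(axis=1))                  # degree matrix
    H = Z - P                                   # traditional Laplacian
    r = np.trace(Z)                             # volume of the graph
    E = np.eye(len(P))
    pairwise = u @ H @ u
    local = nu * (u @ (E - Z / r) @ u)
    return pairwise, local
```

For a constant label vector the pairwise term vanishes (a perfectly smooth labeling), while the local term reduces to ν(k − 1), since the l_{aa}/r values sum to one.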
The generalized Laplacian matrix, expressed as H = E − βP − β²(E − Z), first appeared in [19], [22], where β is a variable parameter. When β is set to 1, the generalized Laplacian matrix H reduces to the traditional Laplacian matrix Z − P, so the traditional Laplacian matrix can be regarded as a special case of our proposed method.
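Since the generalized Laplacian is fully specified by this formula, its reduction to the traditional Laplacian at β = 1 can be checked numerically (a minimal sketch; `generalized_laplacian` is an illustrative name):

```python
import numpy as np

def generalized_laplacian(P, beta=1.0):
    """H = E - beta*P - beta^2*(E - Z), following [19], [22];
    P is the affinity matrix and Z its degree matrix."""
    E = np.eye(len(P))
    Z = np.diag(P.sum(axis=1))
    return E - beta * P - beta ** 2 * (E - Z)

# beta = 1 recovers the traditional Laplacian Z - P:
# E - P - (E - Z) = Z - P
P = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
Z = np.diag(P.sum(axis=1))
assert np.allclose(generalized_laplacian(P, beta=1.0), Z - P)
```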
Let H̄ = H + ν(E − Z/r); then Eq.2 can be expressed as Ω(u) = u^T H̄ u, where H and H̄ play roles analogous to H in Eq.1. Further, it can be found that when the ratio of the two weights equals r(r − 2)/(r − 1), dividing by the combined coefficient yields the matrix Ĥ given in Eq.4. Based on the proposed smooth term, the inference model of GLLP in Euclidean space can now be derived. First, the initial state of the samples is defined as d = (d_1, d_2, ..., d_k)^T: when d_a = 1, c_a is a positive sample; when d_a = 0, c_a is an unlabeled sample. We then extend this vector to matrix form and define a diagonal matrix A ∈ R^{k×k}: if c_a is a labeled sample, the a-th diagonal element is set to 1; if it is an unlabeled sample, it is set to 0. The inference model of GLLP can then be written as Eq.5. The first term in brackets in Eq.5 is the pairwise smooth term, which only considers the smoothness of the labels between pairs of samples; we therefore add a locally smooth term, which considers each sample together with its nearby neighbors to achieve smoothness in the local area. To obtain the optimal solution of Eq.5, take the derivative of G(u) with respect to u and set the partial derivative equal to zero, which gives Eq.6. The optimal solution u can then be described by Eq.7: if u_a > 0, the label is defined as a positive label.
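A minimal sketch of this closed-form inference, assuming the simplified objective u^T H̄ u + (u − d)^T A (u − d); the paper's Eq.5 may carry additional weighting coefficients, and `gllp_infer` is an illustrative name:

```python
import numpy as np

def gllp_infer(P, d, labeled_mask, nu=0.1):
    """Sketch of the inference step: minimize
    u^T Hbar u + (u - d)^T A (u - d), with Hbar = H + nu*(E - Z/r)
    and A marking labeled samples. Setting the gradient to zero
    gives the linear system (Hbar + A) u = A d."""
    n = len(P)
    Z = np.diag(P.sum(axis=1))
    H = Z - P
    r = np.trace(Z)
    Hbar = H + nu * (np.eye(n) - Z / r)
    A = np.diag(labeled_mask.astype(float))
    return np.linalg.solve(Hbar + A, A @ d)
```

On a toy graph with two clusters bridged by a weak edge, labeling one node in each cluster (+1 and −1) propagates positive values across the first cluster and negative values across the second.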
Theorem 1: Eq.5 is a convex optimization problem, and the optimal solution obtained is the global optimal solution.
Proof: The Hessian matrix of Eq.5 is given in Eq.8. By observation, the matrix O is diagonally dominant with positive diagonal entries, so it is a positive definite matrix. In summary, Theorem 1 is proved.
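The diagonal-dominance argument can be illustrated numerically; `is_diagonally_dominant` and the sample matrix are illustrative, and the positive-definiteness conclusion for symmetric matrices with positive diagonals follows from the Gershgorin circle theorem:

```python
import numpy as np

def is_diagonally_dominant(M):
    """Check strict diagonal dominance: |m_aa| > sum_{b != a} |m_ab|.
    For a symmetric matrix with positive diagonal entries this
    implies positive definiteness (Gershgorin circle theorem)."""
    diag = np.abs(np.diag(M))
    off = np.abs(M).sum(axis=1) - diag
    return bool(np.all(diag > off))

M = np.array([[3., -1., -1.],
              [-1., 3., -1.],
              [-1., -1., 3.]])
assert is_diagonally_dominant(M)
assert np.all(np.linalg.eigvalsh(M) > 0)   # indeed positive definite
```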
Next, the two important weighting parameters of the proposed GLLP will be fully analyzed. The classification results of GLLP will prove to be insensitive to changes in these parameters, and this will be further confirmed in the subsequent experiments. Based on Eq.8, the influence of the parameters on u can be converted into studying the stability of the solution of the equation Ou = d. For this, a lemma is provided.
Lemma 2: Given a linear equation system Ou = d with d fixed, when the coefficient matrix O has a slight disturbance ιO, the relative deviation of the perturbed solution from the actual solution is bounded as in Eq.9.

1) STABILITY OF ν
When a slight disturbance appears on ν, ιO in Eq.9 is ιO = ιν(E − Z/r), and the resulting error can be described by Eq.10, as shown at the bottom of the page, where ζ = 2 Σ_{a=1}^{i} [l_{aa} + ν(1 − l_{aa}/r)] + i > 0. It can be found that in Eq.10, except for the different coefficients, the last terms of the numerator and denominator have the same form, which shows that the disturbance has little effect on the result, so Eq.10 is infinitely close to 0.
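Lemma 2 can be illustrated with a small numerical experiment: for a well-conditioned, diagonally dominant O, a tiny disturbance of the coefficient matrix changes the solution of Ou = d only marginally, with the relative error governed by the classical first-order bound cond(O)·‖ιO‖/‖O‖. The matrix, right-hand side, and disturbance scale below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
O = np.diag([3., 4., 5.]) + 0.1 * np.ones((3, 3))   # diagonally dominant
d = np.array([1., 0., -1.])
u = np.linalg.solve(O, d)

iota = 1e-6 * rng.standard_normal((3, 3))           # slight disturbance of O
u_pert = np.linalg.solve(O + iota, d)

rel_err = np.linalg.norm(u_pert - u) / np.linalg.norm(u)
# First-order perturbation bound (2-norms throughout):
bound = np.linalg.cond(O) * np.linalg.norm(iota, 2) / np.linalg.norm(O, 2)
assert rel_err <= bound * 1.1
```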

2) STABILITY OF THE PAIRWISE WEIGHT
When a slight disturbance appears on the pairwise weight, ιO in Eq.9 is ιO = ι·H, and the resulting error can be described by Eq.11, as shown at the bottom of the page.
Similar to the above proof, the final result of Eq.11 is very close to 0, which also shows that GLLP is not sensitive to changes in this parameter.

B. INDUCTIVE MODEL
The inductive model is established in a reproducing kernel Hilbert space. Assuming that F(·, ·) is a Mercer kernel associated with the Hilbert space H and that the corresponding norm is ‖·‖_H, the regularization expression of GLLP in the Hilbert space can be described by Eq.12. Comparing Eq.12 with Eq.5, an inductive term ‖f‖²_H is added, which is used to control the complexity of the model and enhance its generalization ability.
The literature [21] shows that the solution of Eq.12 can be decomposed into a kernel expansion over the labeled and unlabeled samples (Eq.13). When Eq.13 is substituted into Eq.12, the objective function of Eq.14 is obtained, where F is the Gram matrix defined on the training dataset. It is not difficult to find that Eq.14 is a convex function, so the model attains the global optimal solution, and the solution of Eq.14 is given by Eq.15.
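A sketch of this inductive step under simplifying assumptions: a Gaussian kernel Gram matrix, a fitting term weighted only on labeled samples, and a single regularization weight `varsigma` standing in for the coefficients of Eq.14; all function names are illustrative:

```python
import numpy as np

def gaussian_kernel(X1, X2, omega=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * omega ** 2))

def inductive_fit(X, d, labeled_mask, varsigma=1.0, omega=1.0):
    """By the representer theorem (Eq.13) the decision function is
    u(c) = sum_a m_a F(c_a, c). Minimizing the convex quadratic
    (Fm - d)^T A (Fm - d) + varsigma * m^T F m (a simplification of
    Eq.14) leads, via a zero gradient, to (A F + varsigma I) m = A d."""
    F = gaussian_kernel(X, X, omega)
    A = np.diag(labeled_mask.astype(float))
    m = np.linalg.solve(A @ F + varsigma * np.eye(len(X)), A @ d)
    return m, F

def inductive_predict(m, X_train, X_test, omega=1.0):
    """Label a new point by its kernel expansion (the inductive step)."""
    return gaussian_kernel(X_test, X_train, omega) @ m
```

Unlike the inference model, this fitted expansion can label samples that were not in the original graph.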
1) ROBUSTNESS ANALYSIS
Based on Definition 3 and Definition 4, Definition 5 can be stated as follows.
Definition 5: The input space is C, and ∀c_a, c_b ∈ C, ‖c_a − c_b‖ ≤ υ. Based on the Gaussian kernel function, a K-Nearest Neighbor graph is built with θ_{ab} = exp(−‖c_a − c_b‖²/(2ω²)), where N(υ/2, C, ‖·‖₂) < ∞.
Proof: When M in Eq.14 is set to M₀ = (0, ..., 0)^T, G(M₀) = ‖d‖²/2 = i/2 is obtained. It can also be found that all terms in the brackets in Eq.14 are non-negative, so ½ ς M^T F M ≤ G(M) ≤ G(M₀) = i/2 can be further obtained. When faced with a binary classification problem, Z can be divided into η = 2N(υ/2, C, ‖·‖₂) disjoint sets with an interval of υ [23]. According to Definition 3, if n₁ and n₂ both belong to a set N_a (1 ≤ a ≤ η), then ‖c₁ − c₂‖ ≤ υ and d₁ − d₂ = 0. The loss function of the proposed GLLP is given in Eq.19. According to Definition 4, the difference of the loss function of the GLLP mapping function on the sets n₁ and n₂ is given in Eq.20. Substituting Eq.13 into Eq.20, the loss-function difference splits into two terms O₁ and O₂ (Eq.21), whose upper bounds can be derived using the inner product ⟨·, ·⟩_H of the space H (Eq.22). At the same time, considering d₁ − d₂ = 0 and ‖c₁ − c₂‖ ≤ υ, the upper bound of O₂ can be obtained (Eq.23). Finally, substituting Eq.22 and Eq.23 into Eq.21, the specific loss-function difference is obtained (Eq.24), so the proposed GLLP is (8i/ς)(1 + i/ς)(1 − exp(−υ²/(2ω²)))-robust, and the robustness has been proved.

2) GENERALIZATION ABILITY ANALYSIS
Assuming that all samples are independent and identically distributed, the empirical error and the generalization error can be defined accordingly.
Definition 6: Assuming that there are n independent and identically distributed samples in the training dataset φ and that the algorithm G is robust, then for any ι > 0 [24], the following constraint holds with probability at least 1 − ι (Eq.25), where the upper bound of the loss function H(·, ·) is denoted as U.
Definition 7: Assuming that the loss function of the GLLP algorithm is H(u, φ) = (u(c) − d)², then for any ι > 0, with probability at least 1 − ι, the generalization error bound of GLLP is given by Eq.26.
Proof: The generalization error bound of GLLP is related to υ(φ), η, and U. Since υ(φ) and η have been obtained above, only the upper bound U remains to be derived; the upper bound of the loss function is given in Eq.28. Definition 7 is then proved by substituting Eq.24 and Eq.28 into Eq.25.

3) LINEARIZATION OF GLLP
The inductive GLLP above is nonlinear, but its corresponding linear model can also be derived.
According to Eq.13, the label of a test sample c₀ is given by Eq.29, where F is a kernel matrix. The linear decision function u(c₀) = θ^T c₀ and the data matrix C = (c₁, c₂, ..., c_k) are used to build the linear GLLP model (Eq.30), whose optimal solution is given by Eq.31; the label of the test sample c₀ then follows as Eq.32. Substituting F = C^T C into Eq.29 yields Eq.33. According to the matrix inversion lemma, u₁(c₀) and u₂(c₀) can be derived as Eq.34 and Eq.35, which are obviously equivalent.
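The equivalence via the matrix inversion lemma can be checked directly for a linear kernel F = C^T C: the "kernel-form" and "primal-form" solutions coincide. The regularization weight `lam` and the matrix sizes below are illustrative:

```python
import numpy as np

# Push-through identity underlying the matrix inversion lemma:
# (C^T C + lam*I)^-1 C^T  ==  C^T (C C^T + lam*I)^-1,
# which makes the kernel-form and linear-form label predictions equal.
rng = np.random.default_rng(1)
C = rng.standard_normal((3, 5))    # 3 features, 5 samples (illustrative)
lam = 0.7
n, k = C.shape
lhs = np.linalg.solve(C.T @ C + lam * np.eye(k), C.T)
rhs = C.T @ np.linalg.inv(C @ C.T + lam * np.eye(n))
assert np.allclose(lhs, rhs)
```

The right-hand form inverts an n×n matrix instead of a k×k one, which is why the linear model is cheaper when the feature dimension is much smaller than the sample count.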

III. EXPERIMENTAL ANALYSIS
The experimental analysis is divided into two parts: 1) Verify the label propagation ability of the GLLP algorithm on the artificial dataset; 2) Verify the fault classification ability of the proposed GLLP method on the dataset of MFS-MG test rig.

A. DESCRIPTION OF DATASETS
In this section, two types of datasets are adopted: the Double Moon dataset and the dataset collected on the MFS-MG test rig. These two datasets are introduced in detail as follows. 1) Double Moon dataset: The dataset first appeared in [25]. 1000 samples are divided into two moon shapes, whose center coordinates are (0, 0) and (10, 0), respectively. 2) MFS-MG dataset: The MFS-MG test rig is powered by a 1 HP motor. The specific parameters of the experimental equipment are presented in Table 1. The faults collected on the test rig include: inner fault (IF), outer fault (OF), roller fault (RF), and normal condition (NC). The specific structure of the MFS-MG test bench is shown in Fig.3.
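A generator for such a Double Moon dataset might look as follows; the radius, width, and noise-free construction are assumptions, with only the two moon centers (0, 0) and (10, 0) and the sample count taken from the description above:

```python
import numpy as np

def double_moon(n=1000, radius=10.0, width=2.0, distance=0.0, seed=0):
    """Sketch of a Double Moon generator: an upper half-moon centered
    at (0, 0) and a lower half-moon centered at (10, 0). The radius,
    width, and vertical distance are assumed parameters."""
    rng = np.random.default_rng(seed)
    half = n // 2
    # Upper moon: angles in [0, pi], center (0, 0)
    r = radius + width * (rng.random(half) - 0.5)
    theta = np.pi * rng.random(half)
    upper = np.c_[r * np.cos(theta), r * np.sin(theta)]
    # Lower moon: mirrored, shifted to center (10, 0)
    r2 = radius + width * (rng.random(n - half) - 0.5)
    theta2 = np.pi * rng.random(n - half)
    lower = np.c_[radius + r2 * np.cos(theta2),
                  -r2 * np.sin(theta2) - distance]
    X = np.vstack([upper, lower])
    y = np.r_[np.ones(half), -np.ones(n - half)]
    return X, y
```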

B. VERIFICATION OF LABEL PROPAGATION PERFORMANCE
In the theoretical analysis, the various properties of GLLP have been fully proved. Next, the label propagation ability of the GLLP method is verified through experiments, with the results presented intuitively through visualization. In order to fully demonstrate the superior performance of the proposed GLLP method, five recent graph-based label propagation methods (ALP, BPFLP, LNLP, NLPPC, and GLP) are adopted for comparison. The results of the comparative experiment are presented in Fig.4. Since the proposed GLLP algorithm is non-iterative, only the final label propagation visualization results of all comparison methods are given. It can easily be seen from the results that the label propagation ability of the proposed GLLP method is the strongest. In the process of label propagation, ALP and BPFLP propagate negative labels onto positive samples, which is over-propagation. After propagation, LNLP, NLPPC, and GLP still fail to spread all the labels completely, which is under-propagation. The proposed GLLP method successfully propagates all kinds of labels to the correct positions and can be considered a state-of-the-art label propagation method.

C. FAULT CLASSIFICATION OF MFS-MG TEST RIG
After the theoretical and experimental verification, the performance of the proposed GLLP method has been fully demonstrated. The GLLP method is now adopted to predict the fault labels of rolling bearings and to explore its fault diagnosis performance.
In this work, 100 samples are selected to extract time-domain features, and each sample is a vibration sequence with a length of 1024.
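The time-domain feature extraction for a 1024-point vibration segment can be sketched as follows; the paper does not list its exact feature set, so these are typical choices (RMS, peak, kurtosis, crest factor, etc.):

```python
import numpy as np

def time_domain_features(x):
    """Typical time-domain features for a vibration segment
    (illustrative choices; the paper's exact feature set may differ)."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    mean_abs = np.mean(np.abs(x))
    std = np.std(x)
    kurtosis = np.mean((x - x.mean()) ** 4) / std ** 4
    skewness = np.mean((x - x.mean()) ** 3) / std ** 3
    crest = peak / rms                     # crest factor
    impulse = peak / mean_abs              # impulse factor
    return np.array([rms, peak, std, kurtosis, skewness, crest, impulse])

# Example: a 1024-point segment (pure tone stands in for a vibration signal)
seg = np.sin(np.linspace(0, 20 * np.pi, 1024))
feats = time_domain_features(seg)
```

Impulsive bearing faults typically raise kurtosis and the crest factor well above the values of a healthy, near-sinusoidal signal.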
In the experimental verification of the above subsection, five advanced algorithms were adopted for comparison. In this subsection, we adopt three state-of-the-art fault diagnosis methods based on semi-supervised learning for comparison. At the same time, to ensure the diversity of comparisons, two of the above label propagation algorithms are also compared with the GLLP method. The parameters of these methods are set as follows: 1) CRSSL: following the settings in [16]; 2) SS-CDGM: following the settings in [14]. Since the first five methods are all deep learning methods with complex network structures, we only provide the corresponding references, and the specific parameter settings will not be repeated. It is worth noting that, in order to verify the quality of GLLP in the case of missing labels, the random sampling rate is set to ρ = 0.2, 0.4, 0.6. Under the different random sampling rates, the prediction accuracy of the label propagation methods is shown in Fig.5, Fig.6, and Fig.7; the label prediction ability is measured by confusion matrices [31], [32]. In Fig.5, since the sampling rate is 0.6, that is, the number of labeled samples is 60, all methods can learn enough knowledge and the accuracy is the highest. Fig.6 shows the label prediction accuracy of the various methods when the sampling rate is 0.4: since the features learned by the various methods are still sufficient, the accuracy of the predicted labels is generally high, and the GLLP method is the only one that predicts completely correctly. Fig.7 shows the label prediction accuracy when the sampling rate is 0.2.
As shown in Fig.7, when the sampling rate is 0.2, due to the lack of training samples, the label prediction accuracy of every comparison method falls below 80%, but the accuracy of the proposed GLLP still reaches 98.33%, with only one label predicted incorrectly.
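The confusion-matrix bookkeeping behind these accuracy figures can be sketched as follows; the 60-sample split and the single flipped label are illustrative numbers chosen to reproduce a 98.33% accuracy (59/60):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=4):
    """Minimal confusion matrix for the four conditions
    (IF, OF, RF, NC) encoded as integers 0..3."""
    M = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        M[t, p] += 1
    return M

# Illustrative: 60 evaluated samples, one mislabeled -> 59/60 correct
y_true = np.repeat(np.arange(4), 15)
y_pred = y_true.copy()
y_pred[0] = 1                     # a single wrong label
M = confusion_matrix(y_true, y_pred)
accuracy = np.trace(M) / M.sum()  # diagonal = correct predictions
assert abs(accuracy - 59 / 60) < 1e-12
```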

IV. CONCLUSION
In this study, a novel label prediction method based on the generalized Laplacian matrix has been proposed. A new locally smooth term is constructed through the generalized Laplacian matrix to accurately determine the category of ambiguity points. GLLP is a new attempt at a fault label prediction algorithm. The performance of the proposed GLLP method has been verified theoretically, on a manifold-shaped artificial dataset, and on an actual test rig dataset. In the label propagation verification, GLLP is the only method whose label propagation is completely correct. In the verification on the MFS-MG dataset, even in the extreme case where the sampling rate is only 0.2, the proposed GLLP method still achieves an accuracy of 98.33%. It can therefore be seen that the GLLP method has broad prospects in actual industrial fault diagnosis.
In future work, we will further study the application of the GLLP method to the problem of unbalanced data distribution, and will also explore the potential of GLLP in fault diagnosis of gears and non-rotating machinery.

CHAOFAN HU (Member, IEEE) received the Ph.D. degree in mechanical engineering from the Guilin University of Electronic Technology, Guilin, China, in 2020. He is currently an Associate Professor with the School of Mechanical and Electrical Engineering, Guilin University of Electronic Technology. He has authored/coauthored more than 15 technical papers published in prestigious international journals and conferences. His research interests include industrial mechanical fault diagnosis and tensor representation.
ZEXI LUO received the B.S. degree in solid mechanics from Tongji University, Shanghai, China, in 2020. He is currently working as an Engineer with the Shanghai Institute of Aeronautical Measurement and Control Technology. His research interests include fault diagnosis and mechanical dynamic modeling.

VOLUME 9, 2021