Semi-Supervised Fuzzy C-Means Clustering Optimized by Simulated Annealing and Genetic Algorithm for Fault Diagnosis of Bearings

As one of the most popular clustering algorithms, the fuzzy c-means (FCM) algorithm has been used in various fields, including fault diagnosis and machine learning. To overcome the sensitivity of FCM to outliers and its tendency to converge to local minima, a new algorithm is proposed based on the simulated annealing (SA) algorithm and the genetic algorithm (GA). The combined algorithm exploits the strong local search ability of simulated annealing. Thereby, problems associated with the genetic algorithm, such as its tendency toward premature convergence, can be overcome, and the genetic algorithm can be applied to fuzzy clustering analysis. Moreover, the new algorithm addresses other problems of the fuzzy clustering algorithm, including sensitivity to the initial clustering centers and convergence to a local minimum. Furthermore, the simulation results can be used as classification criteria for identifying several types of bearing faults. Compared with the dimensionless indexes, the mutual dimensionless indexes prove more suitable for clustering algorithms. Finally, the experimental results show that the method adopted in this paper improves clustering accuracy and correctly classifies the bearing faults of rotating machinery.


I. INTRODUCTION
Rotating machinery is the most popular type of equipment used in mechanical engineering industrial applications, and rolling bearings are among its most important components. However, rapid developments in science and technology have increased the complexity of rotating machinery structures, increasing the probability of rolling bearing failure [1]. Therefore, rotating machinery diagnosis for predicting rolling bearing failures is of particular significance [2], [3]. A common fault diagnosis method is to analyze vibration signals that contain mechanical fault information [4], [5]. This method comprises two important steps: signal feature extraction and fault status identification. Because mechanical failures develop gradually, there is often uncertainty about which vibration signal characteristics should be extracted, and a fault state may not be identifiable from the eigenvalues extracted from the vibration signals. Thus, an effective way to identify failure modes is to use fuzzy c-means (FCM) clustering based on fuzzy theory [6], [7].
(The associate editor coordinating the review of this manuscript and approving it for publication was Gianmaria Silvello.)
Significant research on rotating machinery fault diagnosis has been conducted to address the problem of complex rotating machinery failure, and a number of efficient fault diagnosis methods have been derived. To improve the dimensionless index classification of petrochemical rotating machinery equipment, Xiong et al. proposed a double-sample data fusion method. Based on a combination of rules and raw data collection, each dimensionless index could be used as evidence of the system, and the Kolmogorov-Smirnov (K-S) test could then be used to detect the exact type of failure. Their experiments demonstrated successful fault type identification using dimensionless indexes with coincidences or evidence conflicts. Compared to the k-nearest neighbors (KNN) algorithm, this method provided better fault recognition and improved the fault detection accuracy by 9.45% [8]. Under actual working conditions, serious overlap problems can occur when the dimensionless parameter range is defined by vibration monitoring calculations to simulate compound faults. On this basis, Sun et al. proposed a novel online method based on a dimensionless immune detector and evidential reasoning (ER) to diagnose rotating machinery. Their method was able to achieve effective real-time fault diagnosis with great potential for practical engineering applications [9].
FCM is a popular clustering method that uses the geometric closeness of data points to classify data [10]. At present, clustering algorithms are widely used in different domains, such as mechanical fault diagnosis, medical data processing, and image processing [11]–[13].
Wang et al. applied the fuzzy c-means clustering algorithm to mechanical fault diagnosis. The centers of the data eigenvalues of rolling bearings were used to distinguish the fault categories, and good results were obtained. However, when processing complex data, using a single point as the clustering center of a data type may not be sufficient [14].
When the data eigenvalues are difficult to separate, the numbers of samples of different types differ, or the distance between different data types is small, the data center point tends to shift toward the category with more data, which reduces clustering accuracy. In such cases, selecting the initial clustering center point is important. Based on the above analysis, Wu et al. proposed an improved algorithm that takes two factors into account, distance and local density, and obtained good results [15].
Zhang et al. proposed an FCM algorithm based on a genetic algorithm (GA). It was shown that when the numbers of samples and categories are large, the proposed algorithm achieves faster speed and more accurate results than the FCM algorithm. Bai et al. proposed a fuzzy clustering algorithm based on simulated annealing (SA) and the genetic algorithm. Their algorithm overcomes the sensitivity to data sets and initial clustering centers and avoids falling into local minima [16].
Among the aforementioned algorithms, the optimization algorithms proposed by Zhang et al. and Bai et al. use the sum of the distances between the clustering centers and the data points as the fitness function of the genetic algorithm. It is generally believed that the smaller the distance, the higher the fitness. However, this approach may not be appropriate for some complex data types, especially when the data do not follow a normal distribution. In addition, such an algorithm is sensitive to outliers.
In addition, semi-supervised clustering, a learning method proposed in recent years, combines semi-supervised learning with cluster analysis [17], [18]. Existing semi-supervised clustering algorithms can be divided into three classes: methods based on constraints, methods based on distance, and methods based on both constraints and distance [19].
To overcome the shortcomings of the method proposed by Zhang et al., in this paper both the distance and the clustering accuracy are taken as the fitness function of the genetic simulated annealing algorithm. The contributions are as follows: 1) A mutual dimensionless method is used to process the data. The mutual dimensionless index can reduce the distance within the internal structure of each dimensionless index and the coincidence of the same dimensionless index across faults. The experimental results show that, compared with the dimensionless method, mutual dimensionless processing increases the accuracy of bearing fault diagnosis by up to 9.22%.
2) Since the traditional FCM is very sensitive to outliers, the clustering accuracy is taken as part of the objective function, which is optimized by the genetic simulated annealing algorithm.
3) Several popular intelligent fault diagnosis methods, including the GA, SA, and FCM clustering algorithm, are merged to form an integrated diagnosis method for rotating machinery.
The paper is organized as follows: Section II introduces our related work on data processing and fault diagnosis. Section III provides the FCM clustering theory, the GA and SA algorithm principles, and the theory of the fusion algorithm applied to single intelligent fault diagnosis. Section IV presents the experimental procedure, results, and analysis of the hybrid fault diagnosis method based on the SA and GA algorithms. Section V provides a general discussion, while a summary of the study and future directions are presented in Section VI.

II. RELATED WORK
This section describes related work on data processing for rotating units and our previous work on fault diagnosis, which shows the feasibility of our algorithm to some extent.

A. RELATED WORK ON DATA PROCESSING
The characteristic signals of a bearing differ depending on its operational state. The original vibration signal can be analyzed and processed to better meet the needs of state diagnosis. For example, the time domain waveform is the original vibration signal without any processing, and its waveform characteristics differ across operation statuses, as shown in Figs. 1 to 5. However, it is difficult to identify the operation status type directly from these waveforms. Therefore, it is feasible and significant to extract fault features from time domain signals, and dimensionless indexes are often used in time domain analysis.
Xiong et al. proposed a method based on a static discount factor, combining the KNN classification algorithm with dimensionless indexes for information fusion fault diagnosis; their experimental results show that the method can reduce the impact of unreliable factors on the fusion effect. However, difficulties remain in distinguishing evidence under complete fusion or large conflicts, which introduces uncertainty into the diagnosis [20].
Subsequently, Xiong et al. exchanged the numerator and denominator in the dimensionless formulas and named the result the mutual dimensionless index. It is proved in reference [21] that the mutual dimensionless index is sufficiently sensitive to faults and is not affected by the working conditions of the machine, making it more suitable for the fault diagnosis of rotating machinery. Accordingly, the mutual dimensionless method is used to process the data in this paper.
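As a hedged illustration of this processing step, the sketch below computes a common set of dimensionless time-domain indexes and their mutual counterparts obtained by swapping numerator and denominator (i.e., taking reciprocals); the exact index set used in [21] is an assumption here, not taken from this paper.

```python
import numpy as np

def dimensionless_indexes(x):
    """Classic dimensionless time-domain indexes of a vibration signal.
    The index set below is a common choice and is assumed, not quoted."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    mean_abs = np.mean(np.abs(x))
    sqrt_amp = np.mean(np.sqrt(np.abs(x))) ** 2
    return {
        "crest":    peak / rms,                 # peak / RMS
        "impulse":  peak / mean_abs,            # peak / mean |x|
        "margin":   peak / sqrt_amp,            # peak / square-root amplitude
        "shape":    rms / mean_abs,             # RMS / mean |x|
        "kurtosis": np.mean(x ** 4) / rms ** 4, # 4th moment / RMS^4
    }

def mutual_dimensionless_indexes(x):
    """Mutual indexes: numerator and denominator of each index swapped,
    i.e. the reciprocals of the classic indexes."""
    return {name: 1.0 / value for name, value in dimensionless_indexes(x).items()}
```

A feature vector for clustering is then assembled from these mutual index values for each signal segment.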

B. RELATED WORK ON FAULT DIAGNOSIS
Xiong et al. proposed an SVM and a correlation coefficient algorithm to diagnose mechanical bearings, according to the separability and correlation of fault data eigenvalues [8], [21]. However, the SVM method did not solve the parameter selection problem, and the correlation coefficient method suffers from a low diagnosis rate.
Wang et al. used the FCM algorithm to diagnose mechanical status [14]. The centers of the data eigenvalues can be obtained by FCM. However, the method does not overcome the local minimum problem of FCM. Therefore, this paper improves the FCM algorithm by using SA and GA.

III. RELATED THEORY
For readability and ease of later development, this section describes the three algorithms used in the paper: the fuzzy c-means clustering algorithm, the genetic algorithm, and the simulated annealing algorithm.

A. FUZZY C-MEANS CLUSTERING ALGORITHM
The FCM clustering algorithm is one of the most popular clustering algorithms and is based on an objective function [22]. Known data points are grouped into various categories, and the cluster centers and degrees of membership in each category are calculated [23]. The objective function is optimized by calculating the membership of the given data points to all cluster centers. However, the algorithm is very sensitive to the initial parameter values, which means it can easily fall into a local minimum [24]. Using a combination of the SA and GA algorithms, the function can quickly converge to the global optimal solution, where the objective-function value of the non-similarity index is stabilized at its minimum.
Theorem 1: The objective function $J_m$ can be expressed as follows:
$$J_m = \sum_{i=1}^{n} \sum_{j=1}^{c} A_{ij}^{m} d_{ij}^{2} \tag{1}$$
where $A_{ij}$ is the membership of the sample $x_i$ relative to category $S_j$.
Lemma 1: The Euclidean distance can be expressed as follows:
$$d_{ij} = \left\| x_i - N_j \right\| \tag{2}$$
where $d_{ij}$ is the Euclidean distance between the $i$-th sample $x_i$ and the center $N_j$ of the $j$-th class.
Lemma 2: The purpose of fuzzy clustering is to find an optimal membership matrix that minimizes the value of the objective function $J_m$, subject to the constraint that the membership degrees of each sample sum to one:
$$\sum_{j=1}^{c} A_{ij} = 1, \qquad i = 1, \dots, n \tag{3}$$
The membership degree of the sample point $x_i$ relative to class $S_j$ can then be calculated as:
$$A_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}} \tag{4}$$
and the clustering center $N_j$ as:
$$N_j = \frac{\sum_{i=1}^{n} A_{ij}^{m} x_i}{\sum_{i=1}^{n} A_{ij}^{m}} \tag{5}$$
Equation (5) is used to calculate the $c$ cluster centers. Equations (4) and (5) are applied repeatedly to adjust the clustering centers and membership degrees. In this way, the clustering centers of all sample types and the membership degrees of each sample can be obtained theoretically, completing the partition of the fuzzy clustering algorithm. Although FCM has a high retrieval speed, it is a local search algorithm that is sensitive to the initial clustering centers, and it will fall into a local minimum if the initial values are not properly selected [10].
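The alternating updates (4) and (5) can be sketched as follows (a minimal NumPy implementation, not the authors' code; parameter defaults are illustrative):

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Minimal FCM sketch following Eqs. (1)-(5): alternate the membership
    update (4) and the center update (5) until J_m stabilizes."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = rng.random((n, c))
    A /= A.sum(axis=1, keepdims=True)                    # enforce Eq. (3)
    J_prev = np.inf
    for _ in range(max_iter):
        Am = A ** m
        N = (Am.T @ X) / Am.sum(axis=0)[:, None]         # Eq. (5): centers
        d = np.linalg.norm(X[:, None, :] - N[None], axis=2)  # Eq. (2)
        d = np.fmax(d, 1e-12)                            # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        A = inv / inv.sum(axis=1, keepdims=True)         # Eq. (4): memberships
        J = np.sum((A ** m) * d ** 2)                    # Eq. (1): objective
        if abs(J_prev - J) < tol:
            break
        J_prev = J
    return N, A, J
```

On well-separated data the loop typically converges in a few iterations; on overlapping data it may stop at a local minimum, which is exactly the weakness addressed later by the GA-SA combination.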
The fuzzy c-means clustering algorithm, using fuzzy theory, obtains the membership degree of each sample point to all class centers by optimizing the objective function and weighting the membership degrees, thereby determining the class of each sample point and classifying the sample data. This method minimizes the $J_m$ function, but the result may be a local minimum or a saddle point. Besides, FCM is easily affected by outliers. Therefore, the fuzzy c-means clustering algorithm alone is not suitable for diagnosing bearing faults of large-scale petrochemical units with noise and sample imbalance. To solve these problems, this paper applies a semi-supervised fault diagnosis method based on simulated annealing and genetic algorithm optimization to bearing fault diagnosis. See Parts B and C for details.

B. GENETIC ALGORITHM
The GA, proposed by John Henry Holland in the 1970s, is a computational model that mimics natural evolutionary systems, using biological evolution theory and stochastic exchange according to Darwin's survival of the fittest [25]. The basic concept is to randomly generate an initial population, in which each individual's genotype is a coded string (chromosome). Then, according to the survival-of-the-fittest principle, population duplication, crossover, and mutation are performed through iterative optimization. Based on the fitness value of each individual, the best individuals are selected to form a new population. The purpose of the iteration is to make the offspring population better adapted to the environment and, once the iteration terminates, to decode the optimal individuals as the optimal solution, which is usually used for parameter selection [26], [27]. However, in the early stages of the iterative process, the GA can easily cause the whole population to consist of the offspring of a few super individuals, leading to premature convergence. Since in the later stages of the GA the fitness values of the individuals become similar, the predominance of ''super individuals'' is more apparent in the offspring [28]. A detailed flowchart is shown in Fig. 6, where GEN is the generation counter and MAXGEN is the maximum number of generations.
1) Coding method: In this paper, the parameters to be optimized are the C initial clustering centers, so each chromosome is composed of C clustering centers. Assuming each center is M-dimensional and each variable is encoded by K binary bits, the length of a chromosome is C × M × K.
2) Fitness function: The fitness function evaluates the degree of adaptation of each code string; the search in a genetic algorithm is guided only by the fitness function (Fit) [29]. The Fit in this paper combines the clustering accuracy with $J_m$ in Equation (1): the higher the accuracy and the smaller $J_m$, the higher the fitness. The accuracy term is defined as
$$\mathrm{acc} = \frac{\sum_{i=1}^{c} h_i}{\sum_{i=1}^{c} H_i}$$
where $h_i$ denotes the number of data points correctly assigned to class $i$ after clustering, and $H_i$ denotes the total number of data points belonging to class $i$.
3) Selection operator: The selection operator copies parent chromosomes of high fitness into the following generation. Assuming a population size of $U$, the probability $P_t$ that individual $t$ is chosen is
$$P_t = \frac{\mathrm{Fit}_t}{\sum_{u=1}^{U} \mathrm{Fit}_u}$$
4) Mutation operator: The mutation operator is an auxiliary method for generating new individuals, ensuring genotype diversity in the population while preventing search stagnation [29].
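These two operators can be sketched as follows. The fitness combination shown (accuracy plus the reciprocal of $J_m$) is an assumption consistent with the description above, since the paper's exact weighting is not specified; the selection is standard roulette-wheel sampling.

```python
import numpy as np

def fitness(labels_pred, labels_true, J_m):
    """Hypothetical fitness: clustering accuracy plus the reciprocal of the
    distance objective J_m; higher accuracy and smaller J_m give higher Fit."""
    acc = np.mean(np.asarray(labels_pred) == np.asarray(labels_true))
    return acc + 1.0 / J_m

def roulette_select(population, fits, rng):
    """Selection operator: individual t survives with probability
    P_t = Fit_t / sum_u Fit_u (roulette-wheel sampling with replacement)."""
    p = np.asarray(fits, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(population), size=len(population), p=p)
    return [population[i] for i in idx]
```

Over repeated generations, individuals with higher fitness dominate the sampled population, which is exactly the selection pressure described above.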
The genetic algorithm uses the fitness function as the information guiding the search. It uses only the fitness value to measure the quality of an individual and does not involve derivatives or differentials of the objective function, which gives the genetic algorithm a distinct advantage, because in practice many objective functions are difficult to differentiate or are not differentiable at all. However, with extensive research, the disadvantages of the genetic algorithm have come to light, such as: 1) The genetic algorithm is prone to nonstandard and inaccurate coding.
2) Since a single genetic algorithm coding cannot fully express the constraints of the optimization problem, infeasible solutions must be handled with additional thresholds, which increases the workload and solution time.
3) The efficiency of the genetic algorithm is usually lower than that of traditional optimization algorithms. 4) The genetic algorithm is prone to premature convergence.

C. SIMULATED ANNEALING ALGORITHM
To overcome the shortcomings of the genetic algorithm, the simulated annealing algorithm [31] is used in this paper. The basic principle of SA, which is used to simulate the search for an optimal solution, is based on the solid annealing process in physics, where a solid is first heated to a certain temperature until it melts and is then slowly cooled into a solid with a regular microstructure. As the temperature of the solid increases, the particles in the solid accelerate and move continuously in a non-uniform manner, and as the solid cools down, they decelerate. When the temperature reaches the ambient temperature, the particles are in the lowest energy state and in thermal equilibrium [32]. In the SA algorithm, the solution is obtained by accepting, with a certain probability, a non-locally-optimal solution within the neighborhood of the current solution. The loop iterates, and as the simulated temperature decreases, the algorithm's activity declines, eventually approaching the global optimum. According to the Metropolis criterion, the probability $P$ of a particle becoming stable at temperature $T$ can be defined by:
$$P = \exp\!\left( -\frac{\Delta E}{k_b T} \right)$$
where $\Delta E$ is the change in the internal energy of the particle at temperature $T$ and $k_b$ is the Boltzmann constant. At temperature $T$, the particle satisfies the Boltzmann probability distribution [33]–[35], which for a molecule in state $x$ is:
$$P\left( \bar{E} = E(x) \right) = \frac{1}{Z(T)} \exp\!\left( -\frac{E(x)}{k_b T} \right)$$
where $\bar{E}$ is a random molecular energy variable, $E(x)$ denotes the energy of the molecule in state $x$, and $Z(T)$ is the normalization factor of the probability distribution, defined as:
$$Z(T) = \sum_{x} \exp\!\left( -\frac{E(x)}{k_b T} \right)$$
The steps to implement the SA algorithm are as follows.
1) Arbitrarily generate an initial solution M_0. Let M(0) = M_0, set the initial temperature to T_0, and let T = T_k with k = 0.
2) At temperature T, generate a new solution in the neighborhood of the current solution M_k and use the Metropolis criterion to decide whether to accept it as the current solution.
3) Cool down from the current temperature T, let T = T_{k+1} with T_{k+1} < T_k, and set k = k + 1. 4) Check whether the annealing algorithm satisfies the termination condition. If so, continue to step 5); otherwise, return to step 2). 5) Take M_k as the optimal solution, output the optimal value, and end the algorithm.
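The five steps above can be sketched as a generic SA loop. Geometric cooling T_{k+1} = q·T_k matches the parameters given later (q = 0.8, T_0 = 100, T_end = 1); the neighbor function and iteration count per temperature are illustrative assumptions.

```python
import math
import random

def simulated_annealing(f, x0, neighbor, T0=100.0, T_end=1.0, q=0.8,
                        iters_per_T=50, seed=0):
    """SA sketch of steps 1)-5): Metropolis acceptance P = exp(-dE / T)
    (the Boltzmann constant is absorbed into T) with geometric cooling."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)                       # step 1: initial solution
    best, fbest = x, fx
    T = T0
    while T > T_end:                        # step 4: termination condition
        for _ in range(iters_per_T):
            y = neighbor(x, rng)            # step 2: candidate solution
            fy = f(y)
            dE = fy - fx
            # Metropolis criterion: accept improvements always, and worse
            # moves with probability exp(-dE / T)
            if dE <= 0 or rng.random() < math.exp(-dE / T):
                x, fx = y, fy
                if fx < fbest:
                    best, fbest = x, fx
        T *= q                              # step 3: cooling
    return best, fbest                      # step 5: output the best solution
```

Because worse moves are occasionally accepted while the temperature is high, the search can escape local minima that would trap a pure descent method.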
The advantages of SA are the following: 1) SA can handle objective functions with any degree of nonlinearity, discontinuity, and randomness, and the objective function can have arbitrary boundary conditions and constraints. 2) Compared with other optimization methods, SA is easy to implement with little programming work.
3) It can statistically guarantee finding the global optimal solution.
The disadvantages of SA are the following: 1) It takes a long time to find the optimal solution, especially when using the ''standard'' Boltzmann sampling technique (i.e., the standard acceptance function).
2) Compared with other algorithms, it requires more difficult parameter tuning for a specific problem.
3) Cooling too fast degrades simulated annealing into simulated quenching (SQ), which is not statistically guaranteed to find the optimal solution.

IV. CLUSTERING ALGORITHM BASED ON SIMULATED ANNEALING AND GENETIC ALGORITHM
The three algorithms described above are each associated with certain disadvantages. In the proposed GA-SA-FCM algorithm, the following improvements are made: 1) To address the poor local search ability of the GA, this study adopts the strong local search ability of the SA algorithm. The SA algorithm is integrated into the solving process of the GA to prevent the GA from becoming trapped in local optima.
2) The combination of the GA and the SA algorithm can significantly improve the global search ability of the whole algorithm.
3) This paper combines the SA with the GA and then applies the combination to the FCM. By exploiting the strong local search ability of the SA and the strong global search ability of the GA, the clustering problem can be solved effectively and quickly [36].
First, the parameters of the three algorithms are set. Second, the training data are clustered, from which the distance value is obtained; the accuracy after clustering is also obtained by comparing the clustering categories with the actual categories. The distance and the reciprocal of the accuracy are then taken as the fitness function of the genetic algorithm. The combined SA and GA algorithm adjusts the clustering centers until the algorithm converges. Finally, a superior set of clustering centers is obtained. The flow chart of the proposed FCM based on the GA and SA algorithms is shown in Fig. 7.
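The overall loop can be illustrated with the following compact sketch. It is not the authors' implementation: the fitness combination, real-valued mutation scale, and cooling schedule are assumptions (the paper's chromosomes are binary-coded), and the labelled accuracy is scored by mapping each cluster to its majority true label.

```python
import numpy as np

def ga_sa_fcm_centers(X, y, c, pop=10, gens=10, T0=100.0, q=0.8, seed=0):
    """Illustrative GA-SA sketch: evolve a population of candidate center
    sets; fitness rewards small total distance and high labelled accuracy,
    and mutations pass an SA Metropolis acceptance test."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    def assign(centers):
        dist = np.linalg.norm(X[:, None, :] - centers[None], axis=2)
        return dist.argmin(axis=1), dist.min(axis=1).sum()

    def fit(centers):
        labels, J = assign(centers)
        # map each cluster to its majority true label to score accuracy
        acc = sum((y[labels == k] == np.bincount(y[labels == k]).argmax()).sum()
                  for k in range(c) if np.any(labels == k)) / n
        return acc + 1.0 / (1.0 + J)        # assumed fitness combination

    popu = [X[rng.choice(n, c, replace=False)] for _ in range(pop)]
    T = T0
    for _ in range(gens):
        fits = np.array([fit(p) for p in popu])
        probs = fits / fits.sum()
        parents = [popu[i] for i in rng.choice(pop, pop, p=probs)]  # selection
        children = []
        for p in parents:
            child = p + rng.normal(0.0, 0.1, p.shape)   # mutation
            dE = fit(p) - fit(child)                    # fitness is maximized
            if dE <= 0 or rng.random() < np.exp(-dE / T):  # SA acceptance
                children.append(child)
            else:
                children.append(p)
        popu = children
        T *= q                                          # cooling
    return max(popu, key=fit)
```

The returned center set can then seed a final FCM pass, mirroring the flow of Fig. 7.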

A. STEPS TO IMPLEMENT THE FUZZY C-MEANS CLUSTERING BASED ON SIMULATED ANNEALING AND GENETIC ALGORITHM
begin setup: 1) Initialize the FCM clustering parameters: the power exponent is 3, the maximum number of iterations is 20, and the objective-function termination tolerance is 1×10^−6.
2) Initialize the SA algorithm parameters: the cooling coefficient q is 0.8, the initial temperature T_0 is 100, and the termination temperature T_end is 1.
3) Initialize the genetic algorithm parameters: the population size sizepop is 10, the maximum number of generations MAXGEN is 10, the variable dimension, the number of binary digits per variable PRECI is 10, the generation gap GGAP is 0.9, the crossover probability p_c is 0.7, and the mutation probability p_m is 0.01. 4) Create the initial population Chrom and call the objective function ObjFun to calculate the objective-function values ObjV of the initial population individuals.
The detailed steps to implement the GA-SA-FCM algorithm are shown in Algorithm 1. Through the above procedure, the clustering centers of all types of data can be obtained. The clustering centers are then used as the standard for each class of data, while the remaining data are clustered.

V. EXPERIMENTAL PROCEDURE
The experiments were performed in the Key Laboratory of Fault Diagnosis of Petrochemical Equipment, Guangdong Province, China, using a fault diagnosis test platform of a large-scale petrochemical multistage centrifugal blower. The experimental platform was composed of 1) an electric motor, 2) a gearbox, 3) a base platform, 4) a coupling, 5) an oil pipe, and 6) a fan, as shown in Fig. 8. The platform was used to simulate a multistage centrifugal blower and common fault conditions of rolling bearings.

A. EXPERIMENTAL PROCEDURE
The experiments were performed on the rotary unit at a sampling frequency of 20 kHz and a rotational speed of 800 r/min. An EMT390 vibrometer was used to collect signals under four different conditions: three fault states (outer-ring wear, inner-ring wear, and a bearing with one ball missing) and the normal state (normal bearing). Table 1 lists the machine operation status types and the number of corresponding datasets.
In this study, in order to clearly observe the clustering effect in the two-dimensional plane, the mutual dimensionless indicators were standardized, principal component analysis was performed, and the first and second principal component characteristic indicators were selected as input values. The data processing flow is shown in Fig. 9, and the relevant theory is described in detail in the literature [21].
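This standardize-then-PCA step can be sketched as follows (a minimal NumPy version; the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def first_two_components(F):
    """Standardize the feature matrix F (samples x indexes) and project it
    onto the first two principal components, returning the 2-D scores and
    the variance contribution rate of those two components."""
    Z = (F - F.mean(axis=0)) / F.std(axis=0)   # standardization
    cov = np.cov(Z, rowvar=False)
    w, V = np.linalg.eigh(cov)                 # eigendecomposition of cov
    order = np.argsort(w)[::-1]                # sort eigenvalues descending
    w, V = w[order], V[:, order]
    contrib = w[:2].sum() / w.sum()            # variance contribution rate
    return Z @ V[:, :2], contrib
```

The contribution rate returned here corresponds to the values reported in Tables 2 to 5; only when it is high (above roughly 95%) do the two components faithfully represent the original indexes.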
Following principal component analysis, Tables 2 to 5 list the contribution rate of each principal component for the different data combinations. When the first and second principal components were extracted, the contribution values of the selected samples were as high as 98.84%, 99.2%, 99.7%, and 99.2%. When the variance contribution rate of the principal components exceeds 95%, the indicators can essentially represent the original characteristic information [37].

B. SIMULATION AND RESULTS
In this study, the different data combinations above were experimentally simulated, with the characteristic indexes of the first and second principal components as the horizontal and vertical axes, respectively [38]. The experimental results are shown in Figs. 10 to 13. Fig. 10 shows the results before and after clustering for a normal bearing and a bearing with inner-ring wear: the number of correctly clustered samples was 49 for the normal bearing and 49 for the bearing with inner-ring wear, giving a total clustering accuracy of 100%. Fig. 11 shows the results for a normal bearing and a bearing with outer-ring wear before and after clustering: the numbers of correctly clustered samples were 49 and 47, respectively, for a total clustering accuracy of 98%. According to Figs. 10 and 11, normal bearings and bearings with inner-ring or outer-ring wear cluster well. In Fig. 12, the situation was different: the number of correct judgments was 42 for the normal bearing and 33 for the bearing with one ball missing, for a total accuracy of 77%. In Fig. 13, with three fault types, the numbers of correct judgments were 30, 33, and 37, respectively, and the accuracy was 73%, lower than before. Evidently, as the number of fault types increases, the optimized clustering algorithm has more difficulty resolving them, and the clustering accuracy decreases.

C. EXPERIMENTAL COMPARISON
To investigate the applicability and feasibility of the proposed algorithm for bearing fault diagnosis in engineering applications, the FCM and the proposed GA-SA-FCM algorithms were compared. In this comparative experiment, six sets of fault data were used. Each group and the corresponding fault types are listed in Table 6, and the experimental results are shown in Table 7.
In Table 7, ''D method'' denotes data processed by the dimensionless method, and ''M-D method'' denotes the mutual dimensionless index. The samples of each fault combination were selected randomly; that is, in Table 7, the samples within each row are the same but differ between rows, so each experiment shows different results. Moreover, to make the results more convincing, we repeated each experiment 20 times and averaged the results.
Comparing the mutual dimensionless index with the dimensionless index, the mutual dimensionless index shows better clustering performance. For FCM, the difference in accuracy was as high as 8.36%; for GA-SA-FCM, it was as high as 9.22%.
In addition, the accuracy of GA-SA-FCM is higher than that of FCM. As shown in Table 7, in No. 3 the difference in accuracy was as high as 5.5%. Therefore, taking the accuracy after clustering as part of the fitness function of the GA can improve the clustering effect, which is an improvement on the FCM algorithm. Besides, we compared the proposed GA-SA-FCM algorithm with the method proposed in reference [8], denoted DSDF for brevity. Five types of data sets were compared: inner-ring wear, outer-ring wear, large-gear tooth deficiency, large-gear tooth deficiency combined with inner-ring wear, and large-gear tooth deficiency combined with outer-ring wear. The comparison results are shown in Tables 8 and 9.
In Table 8, the fault diagnosis accuracy of DSDF is 38.89% versus 56.30% for the GA-SA-FCM model. In Table 9, the fault diagnosis accuracy of DSDF is 33.33% versus 48.89% for the GA-SA-FCM model, which means that the proposed algorithm is clearly better than that of reference [8].

VI. DISCUSSION
In this paper, the clustering accuracy and the distance value were taken as the objective function of the SA-GA, which improves the clustering effect to a certain extent because the proposed method solves the problem that cluster centers are easily influenced by outliers, as well as the local minimum problem.
In addition, the experimental results show that the mutual dimensionless indexes are more suitable for clustering algorithms than the dimensionless ones, because mutual dimensionless indicators narrow the internal-structure distance of each dimensionless index and thereby reduce the overlap of the same dimensionless index across faults. Table 7 also shows that the advantage of our algorithm is most evident when the fault data are processed by the mutual dimensionless method.
However, compared with the FCM algorithm, the proposed method has higher complexity, because our algorithm must calculate the accuracy and then use the SA to overcome the local minimum problem of FCM. As shown in Table 7, as the sample size increases, the CPU time of our method increases more rapidly than that of FCM. In addition, at present we have no systematic method for determining the weights of the clustering accuracy and the distance value, which influence the clustering performance.

VII. CONCLUSION
It is generally believed that the quality of a clustering algorithm depends on the sum of the distances between the clustering centers and the data points, and that the smaller this sum, the better the clustering effect; in fact, this is not always the case. Therefore, the clustering distance and the clustering accuracy were used together as the objective function, and the GA-SA algorithm was used for optimization. The experimental results demonstrated that the proposed algorithm can outperform general clustering algorithms or fuzzy clustering algorithms that take the sum of the clustering distances as the objective function. Moreover, as the complexity of the data distribution and the number of outliers increase, the algorithm proposed in this paper performs better.
However, as discussed in the previous section, the proposed method still requires further improvement. Therefore, our future research will focus on determining the weights for the accuracy and distance ($J_m$) values and on optimizing our algorithm to reduce its CPU time.

DATA AVAILABILITY
The data used to support the findings of this study are available from the corresponding author upon request.