Improved Winding Mechanical Fault Type Classification Methods Based on Polar Plots and Multiple Support Vector Machines

The accurate and fast diagnosis of transformer winding deformation faults is of significance to power suppliers and utilities. An improved winding mechanical deformation fault classification method is proposed. In this study, the transformer frequency response data are used to draw polar plots, and the texture features of these plots are then extracted for fault classification. A classification model constructed from multiple support vector machines is obtained and shows good classification performance. In addition, an improved genetic algorithm based on the Emperor-Selective mating scheme and a catastrophic operation is used to optimize the parameters of the support vector machines. The feasibility and accuracy of the proposed method are verified with experimental data obtained from a model transformer, and the proposed method is demonstrated to exhibit better performance than the traditional method.


I. INTRODUCTION
Power transformers are among the most important and expensive elements of transmission and distribution networks, and they directly affect the performance and reliability of the network [1]. Winding deformation is one of the main causes of transformer accidents [2]. Generally, mechanical faults can occur on transformer windings, which can subsequently lead to transformer failures [3]. Significant failures can result in long-term transformer outages, and the repairs are costly and time-consuming [4]. The early detection of winding mechanical faults can prevent consequential and catastrophic transformer failures [5]. Thus, it is necessary to diagnose transformer winding mechanical faults at an early stage, to provide guidance on transformer maintenance [6].
At present, many transformer fault diagnosis methods have been proposed and practically implemented [7], for instance, temperature measurement [8], dissolved gas-in-oil analysis (DGA) [9], dielectric response analysis [10], vibration analysis [11], partial discharge analysis (electrical and acoustic) [12], [13], transfer function measurement [14], the ultrasonic method and online short circuit impedance (SCI) [15]. All of these methods are significant. Some of them are already commercialized; however, some power frequency techniques, such as online SCI methods, are still in the development stage because they lack sufficient accuracy and robustness.
The associate editor coordinating the review of this manuscript and approving it for publication was Gerard-Andre Capolino.
Among the commercialized techniques, frequency response analysis (FRA) is a simple and popular technique [16]. However, despite the wide application of FRA in the power industry, feedback from numerous field tests indicates that its diagnostic accuracy can still be improved. In addition, although online FRA has been studied, its practical application still faces challenges, for example, the difficulty of online measurement, the impact of electromagnetic interference on the measurement, and the lack of a standard interpretation. Thus, many improved versions of the FRA method have been widely studied [17]-[19]. Nevertheless, most of the existing literature only considered the amplitude information of the frequency response, while the phase-frequency characteristics were often neglected, except in a few studies [20]-[22]. In [22], transformer polar plots were drawn by using both the amplitude and phase information of the frequency response, and were used as a new signature for fault diagnosis. In [23], polar plot and digital image processing (DIP) techniques were used to detect the radial deformation (RD) fault of a transformer. Because it carries both amplitude and phase information, the polar plot is superior for fault diagnosis to the traditional FRA fingerprints, which only provide amplitude information. Thus, the polar plot method is also adopted and studied in this contribution.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
By using the DIP technique, the FRA polar plot image features can be extracted. There have been efforts to apply DIP to transformer fault diagnosis [24]- [26]. Ref. [27] studied the polar plot texture features of several common faults, which is of significance in fault diagnosis.
Another problem is the diagnosis of winding faults. Winding mechanical deformations can be diagnosed by the traditional statistical indicator method, using either the single FRA signature or the polar plot signature. However, the accurate diagnosis and recognition of winding deformations from FRA traces remains a challenge. Some advanced artificial intelligence methods have been used and studied. The support vector machine (SVM) is one of these intelligent methods, and it has proven to be suitable for processing FRA amplitude-frequency signatures, which are high-dimensional and available only in small samples [28], [29]. In addition, the parameters of the SVM model are significant for the performance of the classifier. As one of the most widely accepted intelligent algorithms, the genetic algorithm (GA) has often been used in SVM parameter optimization [30], [31].
However, among the above studies, the methods in [22], [23] were not verified by relevant physical experiments. In addition, the feature extraction of signatures can be improved. In [27], only the gray-level co-occurrence matrix (GLCM) features were extracted and examined, which has limitations to some extent. Moreover, most of the existing literature only used a single SVM classifier for transformer fault classification [29], [32]; this can be replaced by a combination strategy of multiple SVMs to improve the accuracy, calculation speed and robustness of fault classification. Finally, there remains much room for improvement in the standard genetic algorithm (SGA).
To improve the performance of winding deformation fault diagnosis, this study proposes an improved fault classification method based on FRA polar plot images and a combination of multiple SVM classifiers. The main steps of the method are summarized as follows.
• First, the polar plot with both amplitude and phase information of FRA is used to replace the traditional FRA signature as the foundation of fault diagnosis.
• Second, three texture features of the polar plot image are extracted by using DIP technology, including histogram of oriented gradient (HOG) features, local binary pattern (LBP) features, and GLCM features, which are extracted from the perspective of global description, local description and mathematical statistics, respectively.
• Third, three SVM models are trained using different texture features as input. An improved genetic algorithm (IGA) that integrates several recognized methods is used to optimize the SVM parameters. Three SVM models with independent decision-making ability are obtained.
• Finally, the three independent SVM models are combined into a strong classifier for fault diagnosis.
With the above background, this study provides some new ideas for winding fault type classification relative to state-of-the-art methods. The four main contributions of this study are as follows: a) the polar plot, b) the texture features, c) the multiple-SVM classifier, and d) the IGA. In addition, experimental data obtained from a model transformer are adopted to verify the feasibility and accuracy of the proposed method.
The rest of this article is organized as follows. In Section II, the FRA polar plot is introduced. Sections III and IV present the feature extraction and the IGA-SVM with its classification strategy, respectively. In Section V, the experimental test setup is introduced and the classification results are discussed. Finally, Section VI concludes this article.

II. FREQUENCY RESPONSE ANALYSIS POLAR PLOT
FRA, first introduced by Dick and Erven [33], uses a swept-frequency sinusoidal signal to excite the transformer windings and measures the response signal in the frequency domain to construct a frequency response signature [34]. The frequency response of a transformer winding is expressed as:
$$H(f) = \frac{U_{out}(f)}{U_{in}(f)} \tag{1}$$
where U_in is the excitation voltage, U_out is the response voltage, f is the frequency of the applied sinusoidal excitation, and H is the frequency response. Both U and H are phasors with amplitude and phase.
At present, only the amplitude information of the FRA signature is analyzed in most FRA applications, and the phase information is often neglected. However, Ref. [35] noted that a combination of amplitude and phase information is more sensitive to transformer winding faults than amplitude information alone.
The FRA polar plot is an image that combines the amplitude and phase information of the frequency response. The polar plot is constructed as follows: when a swept sinusoidal signal is applied to the terminal of the transformer winding, the corresponding amplitude and phase of the FRA signature are recorded at each frequency sampling point. In polar coordinates, the point that corresponds to the frequency response at a given frequency is plotted by taking the measured phase as the polar angle and the measured amplitude as the polar radius. In this way, the FRA polar plot can be drawn in polar coordinates.
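The construction of the polar plot described above can be sketched in a few lines. This is only an illustration: the function names `fra_polar_points` and `polar_to_xy` are hypothetical, and the use of the linear magnitude (rather than a value in dB) as the polar radius is an assumption, since the text does not state the amplitude unit.

```python
import numpy as np

def fra_polar_points(u_in, u_out):
    """Polar coordinates of H(f) = U_out(f) / U_in(f) at each frequency point.

    u_in, u_out: complex phasors sampled at the same frequencies.
    Returns (theta, r): phase in radians as the polar angle, and
    linear magnitude as the polar radius (assumption: not in dB).
    """
    h = np.asarray(u_out, dtype=complex) / np.asarray(u_in, dtype=complex)
    theta = np.angle(h)  # measured phase -> polar angle
    r = np.abs(h)        # measured amplitude -> polar radius
    return theta, r

def polar_to_xy(theta, r):
    """Cartesian coordinates of the polar points, e.g. for rasterizing an image."""
    return r * np.cos(theta), r * np.sin(theta)
```

The returned `(theta, r)` pairs could be passed to any polar plotting routine; one point is produced per frequency sampling point of the sweep.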
The frequency response amplitude curve and phase curve of the healthy winding and some typical FRA polar plots of windings are shown in Fig. 1.

III. FEATURE EXTRACTION
As mentioned before, this study takes the polar plot as a fingerprint of a transformer. In the field of biometric technology, many fingerprint and signature recognition methods depend on the comparison of texture features [36]. This approach can be transferred to transformer fault diagnosis. Thus, with the polar plot as the signature of a transformer, it is feasible to realize fault diagnosis by identifying the texture features of the polar plot.
In image processing, texture can be defined as a function of the spatial variation of the brightness intensity of the pixels [36]. This study selects three widely used texture features, namely HOG features, LBP features and GLCM features, as the basis of fault recognition. These features are described in detail below.

A. HOG FEATURES
The HOG feature is a feature descriptor that is used to detect objects in computer vision and image processing. The feature is constructed by calculating the gradient direction histogram of the local region of an image. In an image, the appearance and shape of the local target can be described by the directional density distribution of the gradient or edge. HOG features are essentially statistical information of the gradients.
Because gradients mainly exist at edges, the HOG features can be used to describe texture. The procedure is as follows. First, the image is divided into small connected areas, which are known as cell units. Then, a histogram of the gradient or edge directions of the pixels in each cell unit is collected. Finally, these histograms are combined to form the feature descriptor. Fig. 2 shows a visualization of the HOG feature.
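The three steps of the HOG procedure can be illustrated with a minimal NumPy sketch. The cell size of 8 pixels, the 9 unsigned orientation bins and the single global normalization are assumptions made for the illustration; the text does not give the settings used in this study, and the block normalization of the standard HOG descriptor is omitted for brevity.

```python
import numpy as np

def hog_cell_histograms(image, cell=8, bins=9):
    """Gradient-orientation histogram of each cell, weighted by gradient magnitude."""
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)                       # gradients along rows and columns
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation in [0, 180)
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    n_rows, n_cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_rows, n_cols, bins))
    for r in range(n_rows):
        for c in range(n_cols):
            block = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(bin_idx[block].ravel(),
                                     weights=mag[block].ravel(),
                                     minlength=bins)
    return hist

def hog_descriptor(image, cell=8, bins=9):
    """Concatenate all cell histograms into one L2-normalized feature vector."""
    vec = hog_cell_histograms(image, cell, bins).ravel()
    return vec / (np.linalg.norm(vec) + 1e-12)
```

For a horizontal intensity ramp, all gradient energy falls into the first orientation bin, which gives a quick sanity check of the binning.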

B. LBP FEATURES
The LBP feature, first proposed by Ojala et al. [37], is an operator that describes the local texture of an image. The core concept of LBP is to treat the gray value of the central pixel as a threshold, compare it with the gray values of the neighboring pixels, and use the resulting binary code to represent the local texture. From the perspective of texture analysis, the texture feature of a pixel in an image refers to the relationship between that pixel and its neighboring pixels. Thus, the LBP feature is a descriptor that measures the relationship between a pixel and its surrounding pixels. Fig. 3 shows a visualization of the LBP feature. In this study, the LBP value is calculated as an ordered set of binary comparisons of pixel intensities between the center pixel and its eight surrounding pixels [38]. Therefore, the radius of the LBP operator is 1, and the LBP value of the center pixel is the binary code formed from its 8 neighbors.
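A minimal sketch of the radius-1, 8-neighbor LBP operator is given below. The bit ordering (clockwise from the top-left neighbor), the use of `>=` in the comparison and the 256-bin histogram as the final feature vector are assumptions for illustration; the text fixes only the radius and the number of neighbors.

```python
import numpy as np

# Offsets (row, col) of the 8 neighbours at radius 1, clockwise from top-left.
# The bit order is an assumption; any fixed order gives a valid LBP code.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_image(gray):
    """8-bit LBP code of every interior pixel of a 2-D gray image."""
    g = np.asarray(gray, dtype=float)
    center = g[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dr, dc) in enumerate(_OFFSETS):
        neigh = g[1 + dr: g.shape[0] - 1 + dr, 1 + dc: g.shape[1] - 1 + dc]
        # A neighbour at least as bright as the centre sets the corresponding bit
        codes |= (neigh >= center).astype(np.uint8) << np.uint8(bit)
    return codes

def lbp_histogram(gray):
    """Normalized 256-bin histogram of LBP codes, usable as a feature vector."""
    hist = np.bincount(lbp_image(gray).ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

Border pixels are skipped because they do not have a full set of eight neighbors.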

C. GLCM FEATURES
The GLCM statistical method, proposed by Haralick et al. [39], is a widely used texture analysis method. It is based on the assumption that the spatial distribution relationship among the pixels contains the texture information of the image. The image texture is formed by the repetitive appearance of pixel gray in a spatial position. GLCM is a joint distribution of two pixel gray levels with a certain spatial position relationship. Briefly, GLCM is the joint histogram of two pixel gray levels, which is a second-order statistic.
However, the direct input of classifiers is not the GLCM but the mathematical features extracted from the GLCM. In this study, the following widely used mathematical features are adopted [40] (in the following equations, p(i,j) is the normalized GLCM):

1) ANGULAR SECOND MOMENT (ASM)
The ASM is the sum of squares of the elements of the GLCM. The ASM reflects the uniformity of the gray distribution and texture thickness.

2) INVERSE DIFFERENCE MOMENT (IDM)
The IDM is a measure of the local variation of the image texture that reflects the degree of the regularity of the texture.

3) ENTROPY (ENT)
The ENT represents the amount of information in an image, which is a measure of the randomness of image content, and represents the complexity of texture.

4) CORRELATION (COR)
The COR is a measure of the similarities of GLCM elements in row or column directions. This parameter can be used to determine the main direction of the texture.
Because GLCM features are statistical features, they are not easily visualized. Therefore, the feature values of some representative samples are listed in Table 1.
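For illustration, the four GLCM features can be computed as in the following sketch. The formulas are the commonly used Haralick-style definitions and are assumed, not confirmed, to match the equations of this study; the number of gray levels, the pixel offset and the symmetric counting are likewise assumptions.

```python
import numpy as np

def glcm(gray, levels=8, offset=(0, 1)):
    """Normalized symmetric GLCM p(i, j) for one non-negative (row, col) offset.

    gray must contain integer gray levels in the range [0, levels).
    """
    g = np.asarray(gray, dtype=int)
    dr, dc = offset
    ref = g[: g.shape[0] - dr, : g.shape[1] - dc]   # reference pixels
    nbr = g[dr:, dc:]                               # neighbours at the offset
    counts = np.zeros((levels, levels))
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1)
    counts += counts.T                              # count each pair in both directions
    return counts / counts.sum()

def glcm_features(p):
    """ASM, IDM, ENT and COR of a normalized GLCM (standard definitions, assumed)."""
    i, j = np.indices(p.shape)
    asm = np.sum(p ** 2)                            # sum of squared elements
    idm = np.sum(p / (1.0 + (i - j) ** 2))          # weights near-diagonal entries
    nz = p[p > 0]
    ent = -np.sum(nz * np.log(nz))                  # entropy (natural logarithm)
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    # For a constant image both deviations are zero; return 1.0 by convention
    cor = (np.sum((i - mu_i) * (j - mu_j) * p) / (sd_i * sd_j)
           if sd_i * sd_j > 0 else 1.0)
    return {"ASM": asm, "IDM": idm, "ENT": ent, "COR": cor}
```

In practice the gray image would first be quantized to the chosen number of levels, and features from several offsets or directions could be averaged or concatenated.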

IV. IGA-SVM AND CLASSIFICATION STRATEGY
It may seem unreasonable to use only SVMs when other intelligent methods such as LSTM and GRU networks are in wide use. In practice, however, it is very difficult to obtain a large amount of FRA data from actual transformers, and such a small quantity of data is insufficient for training a neural network with a complex structure. In contrast, the SVM is well suited to small-sample pattern classification. This study proposes a strategy that combines multiple SVMs to obtain a more robust classifier. Another reason for using multiple SVMs is that the dimensions of the texture features differ from each other. If only a single SVM model were used, a simple feature fusion strategy could cause the high-dimensional features to mask the low-dimensional features. This problem can be avoided by using multiple SVMs.

A. IGA-SVM
Fig. 4 shows the flowchart for training an SVM model.

1) EMPEROR-SELECTIVE (EMS) MATING SCHEME
The mating scheme plays an important role in the convergence and robustness of GA. The EMS scheme has been demonstrated to be more effective than other schemes. It allows the fittest individual to procreate freely with the rest of the population, resulting in a greater diversity and faster convergence, and ensures that the high-quality gene is preserved during the random process. Fig. 5 shows a schematic of two different mating schemes.
As seen in Fig. 5, the mating strategy of the SGA is to treat the individuals in the population equally and mate them randomly. Such a mating scheme cannot stably produce good offspring. The EMS scheme regards the optimal individual as the emperor of the population, and the emperor has the right to mate with any individual in the population. This mating scheme enables the genes of the optimal individual to be passed smoothly to the next generation.
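The EMS idea can be sketched as follows. The real-valued chromosome encoding, the blend crossover and the unchanged copy of the emperor in the offspring are assumptions for illustration; `ems_mating` and `blend_crossover` are hypothetical names, not taken from this study.

```python
import random

def blend_crossover(a, b, rng=random):
    """Child lying between two real-valued parents (an assumed crossover operator)."""
    w = rng.random()
    return [w * x + (1 - w) * y for x, y in zip(a, b)]

def ems_mating(population, fitness, crossover=blend_crossover):
    """Mate the fittest individual (the emperor) with every other individual."""
    best = max(range(len(population)), key=lambda k: fitness[k])
    emperor = population[best]
    offspring = [list(emperor)]          # assumption: the emperor itself survives
    for k, mate in enumerate(population):
        if k != best:
            offspring.append(crossover(emperor, mate))
    return offspring
```

For SVM parameter optimization, each chromosome could hold, for example, the penalty factor and the kernel parameter, with the cross-validation accuracy as the fitness.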

2) SELECTION OPERATION
The selection operation of the SGA usually adopts roulette wheel selection. However, this selection operation has some defects. Individuals with high fitness rapidly dominate the population in the early iterations, and the population stops evolving in the later iterations. Therefore, another selection operation is adopted to ensure the transmission of high-quality genes.
In the proposed selection operation, the population is first sorted according to fitness and divided into three parts. The superior individuals are then copied to replace the inferior individuals, which constructs the new generation of the population, as shown in Fig. 6. This selection operation helps protect the diversity of the population and avoid premature convergence to a local optimum.
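A minimal sketch of this rank-based selection is shown below. Splitting the sorted population into three equal parts is an assumption, because the text does not state the part sizes; `rank_replace_selection` is a hypothetical name.

```python
def rank_replace_selection(population, fitness):
    """Sort by fitness, then overwrite the worst third with copies of the best third."""
    order = sorted(range(len(population)), key=lambda k: fitness[k], reverse=True)
    ranked = [population[k] for k in order]
    third = len(ranked) // 3
    if third == 0:                       # population too small to split
        return [list(ind) for ind in ranked]
    top = ranked[:third]
    middle = ranked[third: len(ranked) - third]
    # Copies of the top part take the place of the discarded bottom part
    return [list(ind) for ind in top + middle + top]
```

The population size is preserved, and the middle part passes to the next generation unchanged.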

3) CROSSOVER AND MUTATION OPERATION
The crossover and mutation operations of the SGA are performed with a fixed crossover rate P_C and mutation rate P_M, which may destroy the genes of individuals with high fitness and reduce the convergence speed of the algorithm. Therefore, in this study, an improved adaptive function is introduced so that P_C and P_M adjust nonlinearly with the individual fitness, improving the convergence speed and stability of the algorithm. When the individual fitness is superior to the average fitness, the stability of the excellent genome is maintained by reducing the individual crossover rate and mutation rate; when the individual fitness is inferior to the average fitness, the gene diversity of the population is enhanced by increasing the individual crossover rate and mutation rate. The adaptive adjustment functions of P_C and P_M are given in (6) and (7) [41],
where A is the transformation constant, equal to 9.903438; P_Cmax and P_Cmin are the maximum and minimum of the crossover rate, respectively; P_Mmax and P_Mmin are the maximum and minimum of the mutation rate, respectively; f_avg is the average fitness of the population; f' is the higher fitness value of the two mating individuals; and f is the individual fitness value.
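As an illustration of how such a rate can vary nonlinearly with fitness, the sketch below uses one sigmoid-shaped rule that appears in the adaptive GA literature with the same constant A. It is an assumed form and is not confirmed to be identical to (6) and (7); in particular, the maximum population fitness `f_max` is a symbol introduced here and is not defined in the text.

```python
import math

A = 9.903438  # transformation constant quoted in the text

def adaptive_rate(fit, f_avg, f_max, rate_max, rate_min, a=A):
    """Sigmoid-shaped adaptive rate (assumed form, for illustration only).

    fit: f' (higher fitness of the two parents) for crossover,
         or f (individual fitness) for mutation.
    Individuals below the average fitness receive the maximum rate; above
    the average, the rate decays smoothly toward the minimum rate.
    """
    if fit < f_avg or f_max <= f_avg:
        return rate_max
    x = 2.0 * (fit - f_avg) / (f_max - f_avg)
    return (rate_max - rate_min) / (1.0 + math.exp(a * x)) + rate_min
```

The same function would serve for both rates: called with (P_Cmax, P_Cmin) and f' for crossover, and with (P_Mmax, P_Mmin) and f for mutation.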

4) CATASTROPHIC OPERATION
The local optimum problem of the SGA can be addressed by borrowing from the laws of natural evolution. According to these laws, escaping the trap of a local optimum requires destroying the current best individuals so that other, better solutions have a chance to emerge. When the best individual has not improved for several successive generations, or the individuals have become too close to each other and the population lacks diversity, a catastrophic operation can be applied. Specifically, individuals are randomly selected from the population and subjected to random mutation operations to increase the genetic diversity of the population.
By using the catastrophic operation, the monopoly of the original genes can be broken, gene diversity can be restored in the population, and the algorithm is more likely to converge to the global optimum.
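The trigger and the operation itself can be sketched as follows. The stall criterion, the fraction of individuals affected and the choice to re-draw every gene of a selected individual uniformly within its bounds are all assumptions for illustration; `should_trigger` and `catastrophe` are hypothetical names.

```python
import random

def should_trigger(best_history, stall_generations):
    """True if the best fitness has not improved in the last `stall_generations` generations."""
    if len(best_history) <= stall_generations:
        return False
    recent_best = max(best_history[-stall_generations:])
    return recent_best <= best_history[-stall_generations - 1]

def catastrophe(population, bounds, fraction=0.5, rng=random):
    """Randomly pick a fraction of the individuals and re-draw all of their genes.

    bounds: one (low, high) pair per gene (assumed real-valued encoding).
    """
    n_hit = int(round(fraction * len(population)))
    hit = set(rng.sample(range(len(population)), n_hit))
    new_population = []
    for k, individual in enumerate(population):
        if k in hit:
            new_population.append([rng.uniform(lo, hi) for lo, hi in bounds])
        else:
            new_population.append(list(individual))
    return new_population
```

A diversity-based trigger (for example, a threshold on the spread of the population) could be added alongside the stall criterion, matching the second condition mentioned in the text.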

B. CLASSIFICATION STRATEGY
In order to distinguish the three IGA optimized SVM models obtained by the abovementioned texture features, the SVM models are named SVM_HOG, SVM_LBP and SVM_GLCM. To test the performance of each SVM model, 1,000 random experiments are conducted to test the three SVM models separately. The random tests are carried out according to the following steps. The data set is divided into a test set and a training set. The test set is made up of 10 samples that are randomly picked from the data set, and these samples contain all fault types. The remaining samples in the data set are used as the training set. The SVM model is then trained and obtained with the training set; after that, each SVM model is tested with the test set. The performance of the three SVM models in the random tests is shown in Fig. 7.
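The random test procedure can be sketched as below. How the study guarantees that every fault type appears in the test set is not stated; the sketch assumes one sample per fault type is drawn first and the remaining test samples are drawn at random. The callback `evaluate`, which would train a model on the training indices and return its accuracy on the test indices, is hypothetical.

```python
import random

def random_split(labels, n_test=10, rng=random):
    """Return (train_idx, test_idx) such that every fault type is in the test set."""
    classes = sorted(set(labels))
    # One guaranteed sample per fault type (assumed mechanism)
    test = [rng.choice([k for k, y in enumerate(labels) if y == c]) for c in classes]
    chosen = set(test)
    remaining = [k for k in range(len(labels)) if k not in chosen]
    # Fill the rest of the test set at random
    test += rng.sample(remaining, n_test - len(test))
    chosen = set(test)
    train = [k for k in range(len(labels)) if k not in chosen]
    return train, sorted(test)

def mean_accuracy(labels, evaluate, n_trials=1000, rng=random):
    """Average the accuracy returned by evaluate(train_idx, test_idx) over random splits."""
    scores = [evaluate(*random_split(labels, rng=rng)) for _ in range(n_trials)]
    return sum(scores) / n_trials
```

With a label list for the whole data set, each trial leaves all samples outside the 10-sample test set for training.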
After the three independent SVM models are obtained, the way in which they are combined into a strong classifier determines the performance of the final classifier. The most appropriate solution is to assign weights to the outputs of the SVM models and combine them into one strong classifier. The problem is how to assign these weights. The first approach is equal weight assignment, namely, majority voting. This weight distribution method is simple, easy to implement, and of strong practical significance. However, the performance comparison of the three SVMs in Fig. 7 shows that the classification accuracy of SVM_HOG and SVM_LBP is significantly higher than that of SVM_GLCM. Thus, the strategy of equal weight distribution has some shortcomings: it ignores the differences between the SVM models, and it cannot handle the worst-case scenario in which the outputs of the three SVM models all differ from each other.
Therefore, in this study, the following strategy is adopted. First, the three SVM models vote, and the majority output is taken as the final classification result of the sample. If the outputs of the three models all differ, majority voting is not applicable. Because judging which model is correct is a random process, a Markov chain model is used to decide which SVM model's output is adopted as the final classification result of the sample.
We conducted numerous random experiments to collect a sufficient number of cases in which the outputs of the three models are inconsistent. The classification accuracy of SVM_HOG and SVM_LBP is much higher than that of SVM_GLCM; therefore, the output of SVM_GLCM is directly excluded, and the output of either SVM_HOG or SVM_LBP is selected as the final classification result. When the output of SVM_HOG is the correct judgment, the recorded state of the Markov chain is 1. When the output of SVM_LBP is the correct judgment, the recorded state of the Markov chain is 2. A Markov chain with a length of 100 is established, so that 99 state transitions are recorded. The state transition probabilities are calculated from the frequency of transitions between states. The initial state probabilities are obtained from the ratio of the test accuracies of SVM_HOG and SVM_LBP.
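The combination strategy can be sketched as follows. The estimation of the transition matrix from recorded states follows the description above, whereas the final decision rule (choosing the state with the highest probability after one transition from the initial distribution) is an assumption, since the text does not spell it out. All function names are hypothetical.

```python
import numpy as np

def transition_matrix(states, n_states=2):
    """Estimate transition probabilities from a recorded chain of 1-based states."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a - 1, b - 1] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transition fall back to a uniform distribution
    uniform = np.full_like(counts, 1.0 / n_states)
    return np.divide(counts, row_sums, out=uniform, where=row_sums > 0)

def preferred_state(initial, transition):
    """State (1 = SVM_HOG, 2 = SVM_LBP) with the highest one-step probability."""
    return int(np.argmax(np.asarray(initial) @ transition)) + 1

def combine(pred_hog, pred_lbp, pred_glcm, state):
    """Majority vote; if all three outputs differ, follow the Markov chain state."""
    votes = [pred_hog, pred_lbp, pred_glcm]
    for label in set(votes):
        if votes.count(label) >= 2:
            return label
    return pred_hog if state == 1 else pred_lbp
```

In use, `states` would be the recorded chain of length 100, and `initial` would be the pair of SVM_HOG and SVM_LBP test accuracies normalized to sum to one.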

V. EXPERIMENTAL RESULT

A. RESULTS AND ANALYSIS
In this study, the experimental transformer FRA data are obtained from the previous work reported in [7]. A special model transformer was manufactured to perform the winding simulated fault experiments. All experiments were carried out using FRA end-to-end open circuit connection, and the amplitude and phase of the frequency response were recorded in experimental data. Fig. 9 shows images of the experimental transformer model. More information about the model transformer and the detailed experimental setup can be found in [42], [43].
Winding RD, inter-disk short circuit (SC) and disk space variation (DSV) faults were simulated on the model transformer. For the SC fault, 15 groups of frequency response data were measured. For the DSV fault, 21 groups of FRA data were obtained. RD fault experiments were carried out under 17 different conditions. Fig. 11 shows some typical FRA polar plots for different fault types and conditions. To be clear, the proposed method relies on the texture of the polar plot to achieve fault classification. The presence of axis scales in the image would seriously affect the accurate extraction of the polar plot texture. Thus, the scales were removed from the polar plots shown in Fig. 11. The scale and data distribution range can be found in Fig. 1.
Fig. 10 shows a typical fitness curve when the IGA is used to optimize the SVM parameters. Two obvious phenomena are observed. In the early iterations of the algorithm, the average fitness of the population rapidly converges to the optimal fitness, which is owing to the EMS scheme adopted by the algorithm. Each individual of the population mates with the best individual, which results in improved average fitness and rapid convergence. In the later iterations, the average fitness of the population changes significantly between generations, which is caused by the catastrophic operation. Because the fitness of the population is already high, the catastrophic operation is very likely to cause a decline of the average fitness. Meanwhile, the gene diversity improves, which is helpful for searching for the global optimum.
After the SVM models are combined into a strong classifier according to the above methods, the performance of the final classifier is also evaluated by the aforementioned random experiments. Fig. 12 shows the confusion matrix of the final classifier. Labels 1, 2 and 3 correspond to the DSV, SC and RD faults, respectively. The labels at the bottom of Fig. 12 are the actual labels of the winding fault, which are the targets of the classifier. The labels on the left of Fig. 12 are the output labels obtained by the classifier. With the output and target of the classifier as the index, each box displays the number of samples classified accordingly and their percentage of the total number of samples. If the classification result ...
It can be seen in Fig. 12 that the accuracy of the final classifier reaches 97.8%. It should be noted that the FRA data are not divided into several sub-frequency bands in this contribution; instead, the entire frequency band of 1-1,000 kHz is analyzed. Because the proposed method already reaches a satisfactory accuracy, it is not necessary to divide the frequency band. In addition, the most likely mistake of the classifier is to misclassify an SC fault as a DSV fault. One might suspect that the classifier does not fully extract the features of the SC fault compared with the other fault types. In fact, the measured FRA data of the SC fault are the scarcest of the three fault types, and this phenomenon is closely related to the lack of measured SC fault data. If the proposed method is industrialized in the future, more winding fault training data should become available, and the classification accuracy of the classifier can be further improved.
Table 2 compares the IGA-based and SGA-based parameter optimization algorithms for the SVM model. The prediction accuracy and time consumption of the three SVM models are compared.
It can be concluded that the time consumption of the IGA is longer than that of the SGA, but the accuracy of the IGA-optimized model is considerably higher than that of the SGA-optimized model. The searching ability of the IGA is stronger, and the quality of the obtained solution is higher.

B. COMPARISON WITH THE TRADITIONAL METHOD
To compare the classification results of the proposed method with those of the traditional method, the classification results of the previous contribution [7], which used traditional mathematical indexes and a single SVM, are also presented. Fig. 13 (a) compares the recognition accuracy of the proposed method with that of the traditional method under different fault types. Fig. 13 (b) illustrates the classification accuracy. From Fig. 13, the following conclusions can be drawn. Although the SC fault classification accuracy of the proposed method is slightly lower than that of the traditional method, the RD fault classification accuracy is much higher. In addition, the shaded area surrounded by the red line in Fig. 13 (a) is much larger than that surrounded by the blue line, which shows that the proposed method has a more balanced performance for the classification of various faults. This benefit comes from the combination of multiple SVMs, which again indicates that the proposed multi-SVM model is more capable than the single SVM model.
In addition, several typical contributions from the existing literature are compared with the method proposed in this study. The fault type classification accuracy achieved by Ref. [44] is 96.74%, and that of Ref. [45] is 97.2%. Both are lower than the 97.8% of the proposed method. Although the samples used for training and testing the SVM models differ considerably across the literature, the comparison of accuracy indirectly demonstrates the effectiveness of the proposed method.

VI. CONCLUSION
This article presents an improved method for classifying and recognizing transformer winding mechanical faults. The HOG, LBP and GLCM texture features of the transformer polar plot are extracted and input to the multi-SVM classifier for fault type identification. The IGA is used to optimize the SVM parameters to achieve a better performance. The following conclusions can be drawn: 1) The classifier model obtained by the proposed method is capable of classifying transformer winding deformation fault types; 2) The texture features extracted from the polar plot can represent the state information of the transformer winding; 3) The classification accuracy of the multi-SVM classifier is higher than that of the traditional single SVM classifier. However, there are still some limitations to this study. The data set of this study is not large enough to be widely representative. Moreover, this study only discusses the classification of fault types, and does not address the degree and location of the fault.
The experimental results show that the proposed method classifies faults with high accuracy. It can provide guidance in fault diagnosis work if massive FRA fault data are obtained in the future. In addition, the identification of transformer fault location and degree can be addressed in future work. Data mining and neural networks with more complex structures can be used for fault diagnosis when large-scale data of transformer winding deformation faults are available.