Classifying Transformer Winding Deformation Fault Types and Degrees Using FRA Based on Support Vector Machine

As an important part of the power system, the power transformer plays an irreplaceable role in power transmission, and diagnosing transformer failures is important for maintaining its safe and stable operation. Frequency response analysis (FRA) has been widely accepted as an effective tool for diagnosing winding deformation, which is one of the common failures of power transformers. However, there is so far no standard and reliable code for FRA interpretation. In this paper, the support vector machine (SVM) is combined with FRA to diagnose transformer faults, and advanced optimization algorithms are applied to improve the performance of the models. A series of winding fault emulation experiments was carried out on an actual model transformer; key features were extracted from the measured FRA data, and the diagnostic model was trained on them to classify the types and degrees of winding deformation faults with satisfactory accuracy. The diagnostic results indicate that this method has the potential to become an intelligent, standardized, accurate and powerful tool.


I. INTRODUCTION
Large power transformers are very expensive and vital components of electric power systems [1]. As critical core equipment in power transmission and distribution systems, power transformers determine the safe and reliable performance of the entire electrical system [2], and their stable and safe operation is essential to the normal operation of the power system. Transformer faults can seriously affect the safety of power grids [3]. Because of their cost and the consequences of their failure for the electric system, the reliability of power transformers deserves close attention [4].
For these reasons, condition monitoring of transformer operating status has attracted increasing attention worldwide [5]. Winding mechanical deformation faults are one of the common fault types of transformers. Many winding mechanical fault diagnosis methods have been proposed in theory and practice, for instance, the short-circuit impedance (SCI) method based on short-circuit impedance measurement, the low-voltage impulse (LVI) method based on signal analysis, and the frequency response analysis (FRA) method. In the SCI method, the measured SCI of a phase winding is compared with the value on the nameplate or in the factory test results [6]. In the LVI method, proposed by Lech, W. and Tyminski, L. in 1966 [7], the time-domain signals of the winding before and after a fault are compared and analyzed to reveal winding deformation.
The associate editor coordinating the review of this article and approving it for publication was Canbing Li.
Among these methods, FRA has been widely accepted because it is economical, accurate, simple and fast. FRA, first introduced by Dick and Erven [8], excites the transformer winding with a swept-frequency sinusoidal signal and measures the response in the frequency domain to construct a frequency response signature [9]. In the high-frequency range, the transformer winding has been shown to be equivalent to an electrical network of resistances, capacitances and inductances, so its frequency response signature can represent the status of the winding [10]. The frequency response signature measured after factory production can be regarded as the original, standard and healthy signature of the transformer, and is frequently called the fingerprint. By comparing a frequency response curve measured later, when the transformer may be faulty, with its fingerprint, the specific fault type and severity of the winding can be diagnosed [11].
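Each point of a swept-frequency FRA trace is an amplitude ratio between the response and the excitation at one test frequency. The sketch below computes one such point from sampled input and output waveforms with a single-bin DFT; the sampling rate, tone frequency and synthetic signals are illustrative assumptions, and commercial FRA analysers use their own tuned-receiver hardware rather than this code.

```python
import math


def tone_amplitude(samples, freq, fs):
    """Amplitude of the `freq` component of `samples` via a single-bin DFT.

    Exact when the record spans a whole number of periods of `freq`.
    """
    w = 2 * math.pi * freq / fs
    i_sum = sum(x * math.cos(w * k) for k, x in enumerate(samples))
    q_sum = sum(x * math.sin(w * k) for k, x in enumerate(samples))
    return 2 * math.hypot(i_sum, q_sum) / len(samples)


def fra_point_db(v_in, v_out, freq, fs):
    """One point of an FRA magnitude trace: 20*log10(|Vout| / |Vin|) in dB."""
    return 20 * math.log10(tone_amplitude(v_out, freq, fs) / tone_amplitude(v_in, freq, fs))


# Synthetic check: 10 kHz tone sampled at 1 MS/s, 1000 samples (exactly 10 periods)
fs, f0, n = 1_000_000, 10_000, 1000
v_in = [math.sin(2 * math.pi * f0 * k / fs) for k in range(n)]
v_out = [0.5 * math.sin(2 * math.pi * f0 * k / fs - 0.3) for k in range(n)]
print(round(fra_point_db(v_in, v_out, f0, fs), 2))  # -6.02
```

Repeating this at every frequency of the sweep yields the magnitude curve that is compared with the fingerprint; the phase shift of the response does not affect the magnitude point.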
Frequency response signatures of transformers with the same fault type tend to present similar characteristics and patterns. On this basis, comparing frequency response signatures with the fingerprint allows the common characteristics of a given fault type to be extracted. It is then only necessary to extract the characteristics that distinguish the frequency response signatures of different faults in order to diagnose the type of winding mechanical fault; the same holds for diagnosing the fault degree. However, there is still no standard and reliable code for interpreting FRA signatures [12], [13], and FRA analysis relies mainly on visual inspection or mathematical calculation. Diagnosis by visual inspection is easily affected by the subjectivity of personnel, while the winding fault type and degree are not easily recognized by simple mathematical calculation.
Recently, artificial intelligence (AI) has developed into an advanced technique that has been used successfully in many fields, and several studies have applied FRA together with AI algorithms to transformer fault diagnosis. For instance, A. J. Ghanizadeh and G. B. Gharehpetian trained neural network classifiers on processed FRA data [14]. Bigdeli used SVM to diagnose transformer winding faults on the basis of the transfer function (TF) [15]. Zhao combined SVM with impulse frequency response analysis (IFRA) to diagnose transformer faults [16]. Deng classified transformer winding deformation based on SVM and finite element analysis (FEA) [17]. However, the neural network in [14] is not well suited to the small sample sizes available for transformer winding faults and can fall into a local optimum, which leads to premature convergence; moreover, the training and testing data in that study were obtained by simulation rather than measured on actual transformers. In [15], only a few features were extracted and applied to fault diagnosis, which leaves room for improvement. Reference [16] diagnoses only the fault types, without further classifying or predicting the fault degree. In [17], winding faults are identified with the short-circuit impedance method rather than from FRA signatures, and the training data were obtained by FEA. In addition, the optimization of the SVM parameters in these studies can still be improved.
Against this background, this study identifies winding deformation faults in an actual transformer by combining FRA and SVM. SVM is popular and powerful because of its particular advantages in solving small-sample, nonlinear and high-dimensional pattern recognition problems, and it generalizes well when samples are limited. This is of great practical significance for the fault diagnosis and prediction of power transformers, because the identification of winding deformation faults always suffers from sample shortage and from a highly nonlinear relationship between fault phenomena and fault causes [18].
In this study, the SVM model is trained on FRA data, and its parameters are optimized by the particle swarm optimization (PSO) algorithm. The SVM model learns the characteristics of transformer FRA signatures with different fault types and degrees and is then used for fault classification. In addition, several characteristic features are derived by statistical analysis of the FRA signatures, and both fault types and fault degrees are discriminated. To compare the proposed method with the current state of the art of SVM-based FRA interpretation, two common parameter optimization algorithms, the grid search algorithm and the genetic algorithm, are also discussed.
The rest of this paper is organized as follows. Section II introduces the experimental setup and results. Section III describes the key tools and procedures used in this study. Section IV presents the classification results and analysis. Section V gives the conclusions and further research work. All the data tables supporting this article can be found in Section VI.

II. EXPERIMENTAL SETUP AND RESULT
A. EXPERIMENTAL SETUP
In this paper, a specially manufactured model transformer is used for all experiments; its detailed parameters are shown in Tab. 1. Three common transformer faults are emulated experimentally: disk space variation (DSV), inter-disk short circuit (SC) and radial deformation (RD). More information about the model transformer and the detailed experimental setup can be found in [16], [19], [20].
It has been shown that the effect of a DSV fault is dominated by the capacitance parameter, so this fault can be emulated by changing the inter-disk capacitance [11]. Here, the influence of a DSV fault is emulated by connecting capacitors in parallel across several disks, and the capacitance value indicates the fault degree: 50 pF, 67 pF, 100 pF, 200 pF, 400 pF, 600 pF and 800 pF. An SC fault is emulated by shorting the connectors between adjacent disks. The transformer manufacturer also produced windings with various deformations, which replace the middle 10 disks of the winding to emulate RD faults [16]. The winding RD fault is illustrated in Figure 1. In Figure 1(a), d is the variable amount of RD and θ is the angle, fixed at 45°; the ratio of d to the winding radius r is set to 3%, 5%, 7% and 10% to emulate different degrees of RD produced in one direction [20]. Other RD fault windings have faults manufactured in different directions, with the ratio of d to r fixed at 5%, as shown in Figure 1(b) [16]. All fault emulation experiments are performed using the FRA end-to-end open-circuit connection. Two sets of FRA data measured with the transformer in healthy status are taken as the standard FRA signatures. Two sets are used as fingerprints to account for measurement error: during FRA measurement, a number of unavoidable factors interfere with the results to varying degrees.
Taking two sets of FRA data as fingerprints can weaken the influence of measurement error on the diagnosis results to a certain extent. The two sets of data are named normal 1 and normal 2.
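The RLC-network view of the winding, and the way a parallel capacitor emulates DSV, can be illustrated with a lumped ladder model. The sketch below uses a hypothetical 10-section ladder with made-up component values (r, l, cs and cg are assumptions, not the parameters of the model transformer in Tab. 1). It computes the end-to-end response into an assumed 50 Ω measuring impedance by nodal analysis and places the emulated DSV capacitance across a single section, a simplification of connecting capacitors across several disks.

```python
import math


def ladder_fra_db(freqs, n_sections=10, r=1.0, l=100e-6, cs=100e-12,
                  cg=100e-12, rm=50.0, extra_cs=None):
    """End-to-end magnitude response 20*log10|Vout/Vin| of a lumped RLC ladder.

    Section k joins node k to node k+1 and is a series R-L branch in parallel
    with a series capacitance cs (+ extra_cs[k] farads, if given). Each node
    1..n has a shunt capacitance cg to ground, node 0 is driven with 1 V and
    node n is terminated by the measuring impedance rm.
    """
    extra_cs = extra_cs or {}
    n = n_sections
    out = []
    for f in freqs:
        w = 2 * math.pi * f
        y = [1 / complex(r, w * l) + 1j * w * (cs + extra_cs.get(k, 0.0))
             for k in range(n)]
        yg = 1j * w * cg
        # Tridiagonal nodal-admittance system for the node voltages 1..n
        diag = [y[m] + (y[m + 1] if m + 1 < n else 0) + yg for m in range(n)]
        diag[-1] += 1 / rm
        off = [-y[m + 1] for m in range(n - 1)]
        rhs = [0j] * n
        rhs[0] = y[0]  # current injected through section 0 from the 1 V source
        # Thomas algorithm, forward sweep only: the last unknown is Vout itself
        c_prev = off[0] / diag[0] if n > 1 else 0j
        d_prev = rhs[0] / diag[0]
        for i in range(1, n):
            denom = diag[i] - off[i - 1] * c_prev
            d_prev = (rhs[i] - off[i - 1] * d_prev) / denom
            c_prev = off[i] / denom if i < n - 1 else 0j
        out.append(20 * math.log10(abs(d_prev)))
    return out


freqs = [10 ** (1 + 5.3 * k / 299) for k in range(300)]  # 10 Hz to about 2 MHz
healthy = ladder_fra_db(freqs)
for c_extra in (50e-12, 200e-12, 800e-12):               # subset of the emulated DSV values
    dsv = ladder_fra_db(freqs, extra_cs={4: c_extra})      # capacitor across section 4
    shift = max(abs(a - b) for a, b in zip(healthy, dsv))
    print(f"{c_extra * 1e12:.0f} pF: max deviation {shift:.2f} dB")
```

With these assumed values, the low-frequency response is set by the winding resistance and the termination and is essentially unchanged by the added capacitor, whereas the resonances at higher frequencies move; this is the kind of deviation from the fingerprint that the diagnosis relies on.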

B. EXPERIMENTAL RESULT
For the SC fault, 15 groups of frequency response data were measured; for the DSV fault, 21 groups were obtained; and RD fault experiments were carried out under 18 different conditions. Figures 3 to 5 compare the newly measured FRA traces with the fingerprint for the different fault statuses, including different fault degrees and locations.

III. APPLICATION OF SVM TO DIAGNOSE WINDING FAULT
A. BRIEF INTRODUCTION TO THE KEY TOOLS
SVM has existed for a couple of decades; since it was first proposed by Vapnik, it has developed into a powerful tool in the field of machine learning. Its application to fault diagnosis has attracted increasing attention in recent years because of its good classification performance. SVM is based on statistical learning theory [21]. In its basic form it is a binary classifier: a linear classifier with the largest margin in feature space. In other words, the learning strategy of SVM is to maximize the margin, which can ultimately be transformed into a convex quadratic programming problem.
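The max-margin idea can be sketched with a linear soft-margin SVM. For brevity, the sketch below trains it with the Pegasos stochastic sub-gradient method instead of solving the convex quadratic program (LibSVM, used in this study, solves the dual problem with an SMO-type solver); the toy data, the regularization weight λ and the iteration count are illustrative assumptions.

```python
import random


def pegasos_train(xs, ys, lam=0.1, iters=2000, seed=0):
    """Linear soft-margin SVM trained with the Pegasos stochastic sub-gradient method.

    Minimizes (lam / 2) * ||w||^2 + mean hinge loss. Each x is augmented with a
    constant 1 so the last weight acts as the bias (regularized here too, a
    simplification compared with the textbook SVM).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    for t in range(1, iters + 1):
        x, y = rng.choice(data)
        eta = 1.0 / (lam * t)
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1 - eta * lam) * wi for wi in w]  # step on the regularizer
        if margin < 1:  # hinge loss active: move w toward classifying x with margin
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    return 1 if sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) >= 0 else -1


def geometric_margin(w):
    """Width 2 / ||w|| of the band between the two supporting hyperplanes (bias excluded)."""
    return 2.0 / sum(wi * wi for wi in w[:-1]) ** 0.5


xs = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
ys = [1, 1, 1, -1, -1, -1]
w = pegasos_train(xs, ys)
print([predict(w, x) for x in xs], round(geometric_margin(w), 3))
```

Shrinking ||w|| through the regularization term is what widens the band 2/||w||, which is the margin maximized by the SVM.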
SVM is a classification method based on the structural risk minimization (SRM) criterion; finding the optimal hyperplane can be transformed into solving a convex quadratic programming problem [22]. It is particularly suited to classification problems with small samples, nonlinearity and high dimension. In statistical learning theory (SLT), the theory of finite-sample statistics put forward by Vapnik et al. [23]-[26], structural risk minimization means reducing the Vapnik-Chervonenkis (VC) dimension of the learning machine while keeping the empirical risk (the training classification error) low, so as to control the expected risk of the learning machine on the whole sample set [25]. The relationship between the expected risk and the empirical risk is
$$R_{\mathrm{exp}} \le R_{\mathrm{emp}} + \Phi\left(\frac{n}{h}\right) \qquad (1)$$
where $R_{\mathrm{exp}}$ and $R_{\mathrm{emp}}$ are the expected risk and the empirical risk, $\Phi$ is the confidence interval, n is the number of samples and h is the VC dimension. The confidence interval in (1) decreases monotonically as the ratio of n to h increases. To give the model better generalization ability, it is therefore necessary to reduce the VC dimension while minimizing the empirical error. Let ρ denote the classification margin. For hyperplane sets with margin ρ, the VC dimension h satisfies
$$h \le f\left(\frac{1}{\rho^{2}}\right) \qquad (2)$$
where f is a monotonically increasing function. Equation (2) shows that the bound on h decreases as ρ² increases, so maximizing the margin between the samples minimizes the VC dimension and strengthens the generalization ability of the model. In summary, the principle of SVM is to maximize the margin between the two classes of samples during classification, so that the trained model generalizes well.
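To make the dependence of the confidence interval above on n and h concrete, one classical form of Vapnik's bound, which holds with probability 1 - η, is coded below. The specific formula and η = 0.05 are assumptions for illustration; the paper itself only states that the confidence term decreases as n/h grows.

```python
import math


def vc_confidence(n, h, eta=0.05):
    """Confidence term of Vapnik's bound, holding with probability 1 - eta:
    R_exp <= R_emp + sqrt((h * (ln(2n/h) + 1) - ln(eta/4)) / n)."""
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)


for n in (100, 1000, 10000):
    print(n, [round(vc_confidence(n, h), 3) for h in (5, 10, 50)])
```

For a fixed number of samples the term grows with h, and for a fixed h it shrinks as n grows, which is why a smaller VC dimension, obtained through a larger margin, tightens the bound on the expected risk.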
The widely used radial basis function (RBF) is selected as the kernel function of the SVM in this study:
$$K(x_i, x_j) = \exp\left(-g\,\lVert x_i - x_j \rVert^{2}\right) \qquad (3)$$
where g is the kernel parameter. LibSVM provides complete functions for training and testing SVM models [27].
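The RBF kernel in the form exp(-g·||x_i - x_j||²) and its Gram matrix can be computed directly, as in the sketch below. In LibSVM's command-line tool this kernel corresponds to `svm-train -t 2`, with C and g passed through `-c` and `-g`; the three feature vectors are made-up examples.

```python
import math


def rbf_kernel(xi, xj, g):
    """K(xi, xj) = exp(-g * ||xi - xj||^2): the RBF kernel with LibSVM's gamma (g)."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(xi, xj)))


def gram_matrix(rows, g):
    """Kernel (Gram) matrix of all pairs of feature vectors."""
    return [[rbf_kernel(a, b, g) for b in rows] for a in rows]


rows = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0)]
for line in gram_matrix(rows, g=0.5):
    print([round(v, 4) for v in line])
```

A larger g makes the kernel decay faster with distance, so each training sample influences a smaller neighbourhood; this is why g is tuned together with the penalty coefficient C.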
The parameters of the SVM model are optimized by the PSO algorithm, first proposed by Eberhart and Kennedy in 1995 [28], [29]. Its basic concept stems from the study of the foraging behavior of bird flocks; the PSO algorithm emulates the behavior of this biological population to solve optimization problems.
The PSO algorithm is initialized with a group of random particles, and the optimal solution is then found through multiple iterations. In each iteration, each particle updates itself using the best solution it has found so far (the individual optimum) and the best solution found so far by the entire population (the global optimum). The velocity and position of the particles are updated by Equations (4) and (5):
$$V_i^{t+1} = W V_i^{t} + C_1 R_1 \left(E_{\mathrm{best}} - P_i^{t}\right) + C_2 R_2 \left(G_{\mathrm{best}} - P_i^{t}\right) \qquad (4)$$
$$P_i^{t+1} = P_i^{t} + V_i^{t+1} \qquad (5)$$
where $P_i$ and $V_i$ are the position and velocity of the i-th particle and the superscript t is the iteration index, $C_1$ and $C_2$ are acceleration constants (also known as learning rates), W is the inertia constant, $R_1$ and $R_2$ are random numbers in the range 0 to 1, $E_{\mathrm{best}}$ is the individual optimum of the i-th particle and $G_{\mathrm{best}}$ is the global optimum. Because of its strong optimization performance, PSO is used in many fields and plays an important role among swarm intelligence algorithms.
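The update rules can be sketched as a small, generic PSO minimizer. The swarm size, iteration count, W = 0.7, C1 = C2 = 1.5 and the clamping of positions to the search bounds are assumptions for illustration, not the settings used in this study; to tune an SVM, the objective would be an error measure of the model trained with (C, g), here replaced by a simple test function.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(position) over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    e_best = [p[:] for p in pos]                  # individual optimum of each particle
    e_val = [objective(p) for p in pos]
    k = min(range(n_particles), key=lambda i: e_val[i])
    g_best, g_val = e_best[k][:], e_val[k]        # global optimum of the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (e_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))     # velocity update, Eq. (4)
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # Eq. (5), clamped to bounds
            val = objective(pos[i])
            if val < e_val[i]:
                e_best[i], e_val[i] = pos[i][:], val
                if val < g_val:
                    g_best, g_val = pos[i][:], val
    return g_best, g_val


def sphere(p):
    """Test function with its minimum 0 at the origin."""
    return sum(x * x for x in p)


best, best_val = pso_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(best, best_val)
```

For SVM tuning, the two coordinates would typically be log2(C) and log2(g), so that the swarm searches both parameters on a logarithmic scale.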

B. STRUCTURE OF DATA SETS
Extracting representative fault features is an important prerequisite for accurately discriminating the various types of faults. At present, the mainstream methods for extracting features from FRA signatures fall into two categories: statistical indicators and waveform features of the FRA signature [30]. This study uses statistical indicators to quantify the difference between the fingerprint and the newly measured trace, because they are easy and fast to calculate and are little affected by noise [30].
The data used for training and testing the SVM model are the experimental FRA data of the transformer described above. Tab. 2 lists the eight mathematical features [31]-[33] extracted from the measured data by comparing the new trace with the fingerprint, together with their expressions. These features are the correlation coefficient (CC), the Euclidean distance (ED), the maximum of difference (MAX), the integral of absolute difference (IA), the sum squared error (SSE), the sum squared ratio error (SSRE), the sum squared max-min ratio error (SSMMRE) and the root mean square error (RMSE).
These eight features were chosen for the following reasons. CC describes how closely two variables are related, which to a certain extent measures their similarity. ED is the commonly used straight-line distance between two points in m-dimensional space and has a clear geometric meaning. CC and ED are the two features of FRA signatures most frequently used for pattern recognition. MAX represents the maximum range of data change. IA measures the gap between the new trace and the fingerprint; intuitively, it is the total area enclosed between the two curves, and it reflects the actual error. SSE and RMSE are similar: both describe, in different ways, the dispersion of the new trace around the fingerprint and measure the deviation between them. Finally, SSRE and SSMMRE are similar: they emphasize relatively large errors and suppress relatively small ones. In practice, a transformer whose curve is only slightly offset is most likely healthy, whereas a large deviation between the curves may indicate a fault. SSRE and SSMMRE conform to this common diagnostic criterion.
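The eight indicators can be computed from a fingerprint and a new trace sampled at the same frequencies. The sketch below uses assumed textbook definitions, since the exact expressions are given in Tab. 2 and may differ, for example in normalization by the number of points; the two short traces are invented values, and SSRE and SSMMRE as written assume strictly positive magnitudes (linear |H| rather than dB).

```python
import math


def fra_features(fp, nt):
    """Eight indicators comparing a fingerprint fp with a new trace nt.

    ASSUMED textbook definitions; the expressions in Tab. 2 may differ, e.g. in
    normalization. fp and nt are equal-length magnitude sequences sampled at the
    same frequencies; SSRE and SSMMRE assume strictly positive values (linear |H|).
    """
    n = len(fp)
    diff = [y - x for x, y in zip(fp, nt)]
    mx, my = sum(fp) / n, sum(nt) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(fp, nt))
    var_x = sum((x - mx) ** 2 for x in fp)
    var_y = sum((y - my) ** 2 for y in nt)
    sq = sum(d * d for d in diff)
    return {
        "CC": cov / math.sqrt(var_x * var_y),           # Pearson correlation
        "ED": math.sqrt(sq),                            # Euclidean distance
        "MAX": max(abs(d) for d in diff),               # largest deviation
        "IA": sum(abs(d) for d in diff),                # area between curves, unit spacing
        "SSE": sq / n,
        "SSRE": sum((y / x - 1) ** 2 for x, y in zip(fp, nt)) / n,
        "SSMMRE": sum((max(x, y) / min(x, y) - 1) ** 2 for x, y in zip(fp, nt)) / n,
        "RMSE": math.sqrt(sq / n),
    }


fingerprint = [1.00, 0.80, 0.55, 0.40, 0.52, 0.70]
new_trace = [1.00, 0.78, 0.50, 0.45, 0.60, 0.66]
feature_row = [fra_features(fingerprint, new_trace)[k]
               for k in ("CC", "ED", "MAX", "IA", "SSE", "SSRE", "SSMMRE", "RMSE")]
print([round(v, 4) for v in feature_row])
```

Each such eight-value row corresponds to one group of data in the data set, that is, one comparison between a fingerprint and a measured trace.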
The data set is a 111-by-8 matrix, where 111 is the number of data groups and 8 is the number of features; that is, each group of data contains eight components. Correspondingly, the data label matrix is 111-by-1. The details are shown in Tab. 3 and Expression (6):
$$\begin{bmatrix} \tau_{1,1} & \tau_{1,2} & \cdots & \tau_{1,8} \\ \tau_{2,1} & \tau_{2,2} & \cdots & \tau_{2,8} \\ \vdots & \vdots & \ddots & \vdots \\ \tau_{111,1} & \tau_{111,2} & \cdots & \tau_{111,8} \end{bmatrix} \qquad (6)$$
where $\tau_{i,j}$ is the j-th feature of the i-th data group.
When classifying transformer fault types, rows 1-3 of the data label matrix are set to 1 to represent the healthy status, rows 4-45 are set to 2 to represent the DSV fault, rows 46-75 are set to 3 to represent the SC fault, and rows 76-111 are set to 4 to represent the RD fault. Rows 4, 6, 8, 10, ... of the data set are taken as the training set, and the corresponding rows of the data label matrix as the training label matrix. Accordingly, rows 5, 7, 9, 11, ... are taken as the testing set, and the corresponding rows of the data label matrix as the testing label matrix.
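The row bookkeeping can be checked with a short script. Rows are 1-indexed as in the text. Reading "4, 6, 8, 10, ..." and "5, 7, 9, 11, ..." literally (an interpretation, since the text does not say where the sequences stop), the healthy rows 1-3 fall in neither set, and each set holds 54 rows (21 DSV, 15 SC, 18 RD), which agrees with the training accuracy of 100% (54/54) reported in Section III-D.

```python
from collections import Counter


def fault_type_label(row):
    """Class of a 1-indexed data-set row: 1 healthy, 2 DSV, 3 SC, 4 RD."""
    if row <= 3:
        return 1
    if row <= 45:
        return 2
    if row <= 75:
        return 3
    return 4


train_rows = list(range(4, 112, 2))  # rows 4, 6, 8, ..., 110
test_rows = list(range(5, 112, 2))   # rows 5, 7, 9, ..., 111
train_labels = [fault_type_label(r) for r in train_rows]
test_labels = [fault_type_label(r) for r in test_rows]
print(len(train_rows), Counter(train_labels))
print(len(test_rows), Counter(test_labels))
```

Interleaving even and odd rows places samples of every fault type, and of both fingerprints, in each half.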
When classifying the fault degree, the SVM algorithm is applied only to the DSV and RD faults, because the degree of an SC fault is difficult to quantify. For the DSV fault, rows 4-24 of the data set are taken as the training set and rows 25-45 as the testing set, and the data label matrix is determined by the fault level of the corresponding rows. Because the experimental FRA data for the RD fault are limited, only a few samples are taken randomly as the testing set and the rest as the training set.

C. PROCEDURE OF PROPOSED METHOD
The procedure of the proposed method is depicted in Figure 6. The original data are used to plot the corresponding FRA signatures, and the features of the curves are then extracted as the input of the SVM. The input is divided into two parts, one for training and the other for testing. The model completes training only when the training accuracy meets the requirement; it is then tested, and the diagnostic results are obtained.

D. RESULT OF PROPOSED METHOD FOR FAULT DIAGNOSIS
After the SVM model has been trained, it is evaluated on the testing set; the detailed testing results are shown in Tabs. 4-10. When the aim of the classifier is to identify fault types, half of the data set is used as the training set and the other half as the testing set. So that the SVM model can identify the various fault types, the training set contains samples of every fault type: DSV, SC and RD. The training accuracy of the model is 100% (54/54). The testing results for the various fault types are given in Tabs. 4-6.
Tab. 7 summarizes the testing results of Tabs. 4-6. It shows that the proposed method is capable of discriminating fault types: the overall classification accuracy reaches 96.30%, and the DSV and SC faults in particular are identified without error.
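The reported figure can be traced back to sample counts. Assuming the 54-row testing set of the odd-row split described in Section III-B (21 DSV, 15 SC and 18 RD rows, an inference from the row ranges rather than a count stated by the authors), 96.30% corresponds to 52 correct predictions; since DSV and SC are identified without error, both misclassifications fall among the RD samples:

$$\text{Accuracy} = \frac{N_{\text{correct}}}{N_{\text{test}}} = \frac{52}{54} \approx 96.30\%, \qquad \text{Accuracy}_{\text{RD}} = \frac{18 - 2}{18} \approx 88.89\%$$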
When the aim of the classifier is the fault level, two groups of experiments were carried out: DSV level classification and RD level classification. Based on the statistics of repeated randomized trials, the diagnostic accuracy for discriminating the RD fault degree is around 70%. The experimental data are visualized in Figure 7.

E. COMPARISON OF PARAMETER OPTIMIZATION ALGORITHMS BETWEEN PROPOSED METHOD AND CURRENT LITERATURE
The penalty coefficient C and the kernel function parameter g of the SVM model have a crucial impact on its performance; in other words, the quality of the model parameters determines the performance of the model. In this study, in addition to the PSO algorithm, two other frequently used algorithms, grid search and the genetic algorithm, are also used to optimize (C, g).
The traditional parameter optimization method in most of the existing literature is the grid search algorithm, which is essentially an enumeration method. A given interval is divided into many small mesh cells according to a given step size, and the parameters in each cell are evaluated for whether they reduce the training error of the model; the best-performing parameters are selected as optimal.
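Grid search over (C, g) is usually carried out on a base-2 logarithmic grid. The sketch below enumerates such a grid and keeps the best-scoring pair; the grid ranges and `toy_score` are illustrative assumptions, and in practice the score would be, for example, the cross-validation accuracy of an SVM trained with each (C, g).

```python
import itertools
import math


def grid_search(score, log2c_values, log2g_values):
    """Evaluate score(C, g) at every grid point (2**lc, 2**lg); keep the best."""
    best = None
    for lc, lg in itertools.product(log2c_values, log2g_values):
        c, g = 2.0 ** lc, 2.0 ** lg
        s = score(c, g)
        if best is None or s > best[2]:
            best = (c, g, s)
    return best


def toy_score(c, g):
    """Hypothetical stand-in for validation accuracy, peaking at C = 2**3, g = 2**-2."""
    return -((math.log2(c) - 3) ** 2 + (math.log2(g) + 2) ** 2)


best_c, best_g, best_s = grid_search(toy_score, range(-5, 16), range(-15, 4))
print(best_c, best_g, best_s)
```

The cost is one model evaluation per grid point (21 × 19 = 399 here), which is why a coarse grid is often searched first and then refined around the best point.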
Both the genetic algorithm and the PSO algorithm are currently popular; they can obtain optimal parameters quickly and accurately with high probability.
To illustrate the differences among these three algorithms in optimizing the SVM parameters, the classification of fault types is taken as an example, and each method is run three times to calculate the most appropriate parameters. The results of each optimization algorithm are shown in Tab. 11. Similar results were obtained for the classification of fault degree.
Figures 8 to 10 show the results of the above parameter optimization methods, respectively.
When the grid search algorithm is used to optimize the SVM parameters, the parameters are first searched over a large interval, and the search interval and step size are then manually adjusted according to the results. This method is therefore often time-consuming in the early search stage and requires human intervention. When the genetic algorithm is used to optimize the parameters,

FIGURE 1. Diagrammatic sketch of winding RD fault: (a) diagram and image of RD; (b) 3D visualization of winding RD with faults manufactured at different directions.

Figure 2(a) shows the schematic diagram of the transformer experiment, and Figure 2(b) shows an image of the testing ground.

FIGURE 2. Image of measurements of the research: (a) schematic diagram of transformer experiment; (b) image of the testing ground.

FIGURE 3. Visualization of DSV Fault Experiment Data (the solid line is the fingerprint and the dotted lines are new traces corresponding to different fault statuses).

FIGURE 4. Visualization of SC Fault Experiment Data (the solid line is the fingerprint and the dotted lines are new traces corresponding to different fault statuses).

FIGURE 5. Visualization of RD Fault Experiment Data (the solid line is the fingerprint and the dotted lines are new traces corresponding to different fault degrees).

FIGURE 6. Block diagram of the proposed method.

TABLE 10. Processing results of three runs of 1000 randomized trials for discriminating the RD fault degree.
When it comes to DSV level classification, the DSV fault data were similarly divided into two equal parts, one for training and the other for testing. The training accuracy of the model is 100% (21/21), and the results of classifying the DSV fault levels are presented in Tab. 8. As Tab. 8 shows, the DSV fault degree is identified accurately in almost all cases; in the only misclassified group, the model's prediction (67 pF) is close to the actual value (50 pF). The model is therefore capable of detecting the degree of DSV faults. When classifying the degree of RD fault, the testing and training sets are extracted randomly from the data set; Tab. 9 gives the classification results for a typical training set and its testing set. Because the diagnostic accuracy varies with the selection of the testing and training sets, it should be evaluated statistically. On this basis, 1000 randomized trials were conducted three times. In each trial, 5 groups of RD fault FRA data were selected randomly from the data set as the testing set, covering all fault degrees, and the remaining data, which also cover all fault degrees, were taken as the training set. The average training accuracy over these 3000 random trials is 96.79%. The diagnostic accuracy rates of the experiments are listed in Tab. 10.

FIGURE 8. Result diagram of grid search algorithm.

TABLE 1. Design specifications of the specially manufactured model transformer.

TABLE 3. Structure of the data set and details of each group.

TABLE 4. Classification results for DSV.

TABLE 5. Classification results for SC.

TABLE 6. Classification results for RD.

TABLE 7. Summary of diagnostic results for fault types.

TABLE 8. Classification results for the level of DSV.
TABLE 9. Classification results for a typical testing set for the level of RD.