Accuracy Improvement of Power Transformer Faults Diagnostic Using KNN Classifier With Decision Tree Principle

Dissolved gas analysis (DGA) is the standard technique for diagnosing fault types in oil-immersed power transformers. Various traditional DGA methods have been employed to detect transformer faults, but their accuracy was mostly poor. In this light, the current work aims to improve the diagnostic accuracy of power transformer faults using artificial intelligence. A KNN algorithm is combined with the decision tree principle as an improved DGA diagnostic tool. A total of 501 dataset samples are used to train and test the proposed model. Based on the number of correct detections, the number of neighbors and the distance type of the KNN algorithm are optimized to improve the classifier's accuracy rate. For each fault, several input vectors are assessed to select the most appropriate one for the classifier's corresponding layer, increasing the overall diagnostic accuracy. On the basis of the accuracy rate obtained per node and per fault type, two models are proposed, and their results are compared and discussed. It is found that the global accuracy rate exceeds 93% for the power transformer diagnosis, demonstrating the effectiveness of the proposed technique. An independent database is employed as a complementary validation phase of the proposed research.


I. INTRODUCTION
High-voltage power transformers are mainly required for heavy, high-power applications in industrial environments. These transformers use particular insulation systems that depend primarily on the voltage level. Indeed, the higher the voltage, the greater the impact on the transformer's lifetime and reliability [1]. Moreover, the transformer insulation systems may deteriorate when exposed to numerous defects arising from overheating, paper carbonization, arcing, and discharges of low or high energy [2]. Therefore, early-stage detection of faults should be conducted to ensure efficient service of these transformers [3], [4]. For this purpose, several methods were proposed in the literature. Among them, Dissolved Gas Analysis (DGA) represents one of the fastest, most economical, and most widely used techniques among those related to the dielectric insulation systems [5], [6]. Hydrogen (H2), Methane (CH4), Acetylene (C2H2), Ethylene (C2H4), and Ethane (C2H6) might be generated within the oil during a faulty mode [7]. Hence, the power transformer's abnormal state can be identified by the DGA method according to the composition and content of the dissolved gases. The concentrations of these gases are associated with six basic electrical and thermal faults, which might occur separately or in combination [1], [8].
The associate editor coordinating the review of this manuscript and approving it for publication was Baoping Cai.
Based on DGA, different approaches have been developed to diagnose the multiple transformer faults and quantitatively indicate each fault's likelihood. These approaches are mainly based on (i) graphical methods (e.g., [2], [9]), (ii) artificial intelligence techniques (e.g., [10], [11]), and (iii) other improved coupled techniques (e.g., [12]). Overall, the analysis is generally based on the concentrations of the five principal gases, used either directly in ppm or as percentages of the total sum. Likewise, various methods are based on a combination of ratios of some specific gases. For instance, three gas ratios (C2H2/C2H4, CH4/H2, and C2H4/C2H6) are used in Rogers' method [6] and IEC 60599 [8], three relative percentages (%CH4, %C2H2, and %C2H4) in Duval's triangle [2], and four gas ratios (CH4/H2, C2H2/C2H4, C2H2/CH4, and C2H6/C2H2) in Dornenburg's method [1]. Such techniques are mostly required for heavy, high-power applications. These techniques and other ones (e.g., [13], [14]) are presented in the literature to identify the different kinds of faults occurring in operating transformers. However, their diagnostic accuracy requires further improvement. In this light, Duval's pentagon has been developed as a complementary tool for interpreting the DGA in power transformers [9].
For this technique, the relative percentages of the five leading gases (%H2, %CH4, %C2H2, %C2H4, and %C2H6) are used. In the same context, probabilistic classifiers based on Parzen windows, Bayesian, and Mexican hat functions (e.g., [15], [16]) have been employed for transformer fault classification using actual DGA data. Moreover, various artificial intelligence techniques have also been applied to transformer fault diagnosis, such as a fuzzy logic technique [17]. On the other hand, bootstrap and genetic programming have been used to extract classification features for each fault class. These extracted features have been employed as the inputs to an artificial neural network (ANN), a support vector machine (SVM), or a K-Nearest Neighbor (KNN) classifier to perform multicategory fault classification [18]. Also, the combination of the Duval pentagon with SVM and KNN algorithms has been proposed to improve the fault diagnostic accuracy [7]. It is worth noting that the KNN algorithm was first suggested by Cover and Hart in 1967 [19]. This algorithm has undergone several recent improvements (e.g., [20]-[23]).
Overall, the originality of this work consists in introducing several input vectors into the KNN classification algorithm based on a decision tree (DT) principle in order to select the one that achieves the highest accuracy for transformer fault diagnosis. Various types and combinations of input vectors have been employed, namely, the concentration of gas in ppm, the relative concentration of gas in percentage, IEC ratios, Rogers four-ratios, Dornenburg ratios, Duval triangle coordinates, Duval pentagon coordinates, a combination of Rogers and Dornenburg ratios, and a combination of Duval triangle and pentagon coordinates. The accuracy rate has been analyzed to select the most appropriate input vector for the proposed method.
The current paper is organized as follows: In Section II, we formulate the fault classification problem in power transformers and describe the database used in this investigation. A general description of the KNN technique, including its basic theory, is given in Section III. In Section IV, the interpretative methods based on DGA are reported. The selection of an appropriate input feature for each classification layer and the performance of the proposed KNN classifier combined with the decision tree rule are presented in Section V to demonstrate the effectiveness of the proposed method. Finally, conclusions are summarized, and potential future works are discussed in Section VI.

II. PROBLEM FORMULATION
Highly reliable transformers are mainly made of an iron core and windings. Both components are placed in a tank filled with insulating oil. Figure 1 shows a cross-section of a typical oil-immersed power transformer. Dissolved gases, which are liberated under particular electrical or thermal stresses, are a powerful indicator of the degradation of the oil properties. In general, the most important gases are H2, CH4, C2H2, C2H4, and C2H6. The concentration of these gases is strongly related to the type of transformer fault, and the rate of gas generation can be used to identify the fault type [24]. For instance, acetylene is associated with arcs where temperatures reach several thousand degrees. Ethylene is related to hotspots between 150 °C and 1000 °C, and hydrogen to cold gas plasma from corona discharges. In most fault cases, however, mixtures of all gases, including other saturated hydrocarbons, are obtained, and their relative proportions have been correlated with the different fault types [25].
A database of 501 samples is used to train, test, and validate the proposed classifier in the present investigation. This database is collected from the literature [4], [28]-[31]. The distribution of the samples over the fault types (number of samples associated with each defect) is given in Table 1.

III. K NEAREST NEIGHBORS CLASSIFIER (KNN)
The KNN algorithm ranks among the simplest intelligent algorithms and does not require any learning phase. It is based on calculating the distances between the sample to be classified and the set of labeled points in order to find its nearest neighbors [19]. The decision is based on the majority vote of the k nearest neighbors. Many types of distances can be used to decide the nearest neighbors, such as Gaussian, triangular, cosine, etc. [10].
Figure 2 illustrates the principle as well as the influence of the chosen number of neighbors. Selecting the three closest neighbors of the star classifies it as a square (objects inside the small circle drawn with a continuous line). However, the star is classified as a triangle if the five closest neighbors are considered (items inside the large circle drawn with a dashed line). Indeed, the choice of the number of neighbors k is a leading factor in the classification process.

A. 1-NN ALGORITHM (KNN WITH K = 1)
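We consider the set L of labeled points given by:
$$L = \{(y_i, x_i),\ i = 1, \ldots, n\}$$
where $y_i \in \{1, \ldots, c\}$ denotes the class of sample i and the vector $x_i = (x_{i1}, \ldots, x_{ip})$ contains the p variables that characterize sample i.
The distance function d(., .) determines the nearest neighbor. The Euclidean distance between a sample $x_i$ and a labeled point $x_j$ is computed over the p variables that characterize them. The nearest neighbor $(y_{(1)}, x_{(1)})$ of an observation (y, x) in the learning sample is determined by:
$$d(x, x_{(1)}) = \min_{i} d(x, x_i)$$
We designate by $\hat{y} = y_{(1)}$ the estimated class, that is, the class of the nearest neighbor. This class is selected for the prediction of y.
Considering the Minkowski formula of equation (3), written here with an order q to avoid confusion with the number of variables p,
$$d(x_i, x_j) = \left( \sum_{s=1}^{p} |x_{is} - x_{js}|^{q} \right)^{1/q} \tag{3}$$
the Euclidean distance is obtained by replacing q by 2 and is given by equation (4):
$$d(x_i, x_j) = \sqrt{\sum_{s=1}^{p} (x_{is} - x_{js})^{2}} \tag{4}$$
With k neighbors ($k \in \mathbb{N}$), the observation is predicted in class l only if $l = \arg\max_{r} k_r$, where $k_r$ is the number of the k nearest neighbors that belong to class r.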
It is recommended to choose an odd k to avoid tied votes in binary classification. However, in the multiclass case, the best choice of k depends on the nature of the data. The effect of noise on the classification (risk of overfitting) is reduced when k takes large values. However, this choice makes the boundaries between classes less distinct. The best selection of k is the one that minimizes the classification error [27].
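The majority-vote rule and the influence of k described in this section can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`minkowski`, `knn_predict`) and the toy points are hypothetical, chosen only so that the prediction flips between k = 3 and k = 5 in the way Figure 2 describes.

```python
from collections import Counter


def minkowski(a, b, q=2):
    """Minkowski distance of order q (q=2 gives the Euclidean distance)."""
    return sum(abs(u - v) ** q for u, v in zip(a, b)) ** (1.0 / q)


def knn_predict(train_x, train_y, x, k=3, q=2):
    """Return the majority class among the k labeled points nearest to x."""
    # Rank the labeled points by their distance to the query sample.
    order = sorted(range(len(train_x)),
                   key=lambda i: minkowski(train_x[i], x, q))
    # votes[r] plays the role of k_r: neighbors among the k nearest in class r.
    votes = Counter(train_y[i] for i in order[:k])
    # Assumption: ties go to the class encountered first (arbitrary choice).
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two squares close to the query, three triangles farther away.
train_x = [(1.0, 1.0), (1.2, 0.9), (2.0, 2.0), (2.1, 1.9), (2.2, 2.1)]
train_y = ["square", "square", "triangle", "triangle", "triangle"]
query = (1.3, 1.2)

print(knn_predict(train_x, train_y, query, k=3))  # two squares outvote one triangle
print(knn_predict(train_x, train_y, query, k=5))  # three triangles outvote two squares
```

With k = 3 the two nearby squares win the vote, whereas with k = 5 the three more distant triangles do, which is why k has to be tuned on the data rather than fixed in advance.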

C. KERNEL FUNCTION
The classification is influenced by the parameter k and the type of the kernel function K(d). The function K must satisfy the following properties:
- K(d) ≥ 0 for all d ∈ ℜ;
- K(d) reaches its maximum for d = 0;
- K(d) decreases as d → ±∞.
Several kernel functions are available [27]: rectangular (uniform law), triangular, Epanechnikov, cosine, Gaussian, and inverse.
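As a hedged illustration of how such kernels can weight the neighbors' votes, the sketch below uses common textbook forms of four of the kernels listed above. The exact definitions and the distance normalization used in [27] are not given in this text, so the formulas, the function names, and the assumption that distances are already scaled to [0, 1] are illustrative only.

```python
import math
from collections import defaultdict


def rectangular(d):
    return 0.5 if abs(d) <= 1 else 0.0


def triangular(d):
    return max(0.0, 1.0 - abs(d))


def epanechnikov(d):
    return 0.75 * (1.0 - d * d) if abs(d) <= 1 else 0.0


def gaussian(d):
    return math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)


def weighted_vote(neighbors, kernel):
    """neighbors: list of (scaled_distance, label) pairs for the k nearest points.

    Assumption: distances were scaled to [0, 1] beforehand.
    """
    score = defaultdict(float)
    for d, label in neighbors:
        score[label] += kernel(d)  # closer neighbors contribute larger weights
    return max(score, key=score.get)


# Hypothetical neighbors: one very close "A", two distant "B".
neighbors = [(0.1, "A"), (0.8, "B"), (0.9, "B")]
print(weighted_vote(neighbors, rectangular))  # uniform weights: plain majority vote
print(weighted_vote(neighbors, triangular))   # distance-aware weights favor the close point
```

The rectangular kernel reproduces the unweighted majority vote, while a decreasing kernel lets a single very close neighbor outweigh two distant ones.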

D. KNN ALGORITHM ADVANTAGES AND DISADVANTAGES
This technique is easy to implement and can be applied to any data, including complex ones such as geographic information, text, images, and sound. Also, it is robust to noise. The introduction of new data does not require the reconstruction of a model. The class is assigned to an object with ease and clarity once the closest neighbors are displayed.
As mentioned, the performance of the method depends on the distance type, the number of neighbors, and how the neighbors' responses are combined. The results could be of poor quality if the number of relevant attributes is low relative to the total number of attributes, because the distances on the irrelevant attributes drown out the proximity on the relevant ones. The calculations made in the classification phase can also be very time-consuming.

IV. DGA-BASED INPUT VECTOR
Many interpretative methods based on DGA have been reported in the literature to detect the nature of incipient faults within an oil-immersed power transformer [7], [10]. These techniques mainly rely on the following input vectors:
• Vector 1: Since the database contains the concentrations of the five gases in parts per million (ppm), each sample X is represented directly by these five concentrations (H2, CH4, C2H2, C2H4, and C2H6).
• Vector 2: Since the weight percent of the gases would result in an inconveniently small number, the percent concentration of each gas relative to the total sum is also used to represent each sample.
• Vector 3: The IEC ratios method is used to produce an input vector containing three ratios of the dissolved gases (C2H2/C2H4, CH4/H2, and C2H4/C2H6).
• Vector 4: Rogers' four-ratio method is used in this case to transform each sample into a vector of four gas ratios.
• Vector 5: Dornenburg's method is also investigated in this study. In this method, the input consists of four ratios computed from the dissolved gas concentrations in ppm.
• Vector 6: The Duval triangle is a graphical method that uses only the concentrations of three gases (CH4, C2H2, and C2H4). The input vector has two components, the coordinates C_x and C_y, which are computed from parameters x_i and y_i. The parameters y_i are found by replacing the cosine with the sine in the expressions of x_i, with α = 2π/3.
• Vector 7: The input vector, in this case, also has two components, C_x and C_y, computed according to the Duval pentagon, which uses the concentrations of the five gases in percentages. As for the triangle, the parameters y_i are found by replacing the cosine with the sine in the expressions of x_i, here with α = 2π/5.
Other possible combinations of the above techniques have also been proposed to give strong credibility to the obtained results. Two combinations are given below.
• Vector 8: The first combination consists of an input vector of five ratios, given by equation (14), which mixes Rogers' and Dornenburg's methods.
• Vector 9: The input vector, in this case, has four components, C_x1, C_y1, C_x2, and C_y2, computed from the concentrations of the five gases in percentages according to Duval's triangle and pentagon. C_x1 and C_y1 are calculated according to the triangle method, while C_x2 and C_y2 employ the pentagon technique.
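As a hedged sketch of how the simplest of these input vectors can be built from one DGA sample, the code below constructs vectors 1 to 3 (ppm, relative percentages, and the three IEC ratios named in the Introduction). The function names, the gas ordering, and the handling of a zero denominator are assumptions made for illustration; the ratio-based and Duval-coordinate vectors 4 to 9 are not reproduced because their defining equations are not available in this text.

```python
GASES = ("H2", "CH4", "C2H2", "C2H4", "C2H6")  # assumed ordering of the five gases


def safe_ratio(num, den, eps=1e-9):
    # Assumption: a zero denominator is replaced by a small eps so the ratio stays finite.
    return num / (den if den > 0 else eps)


def vector_ppm(sample):
    """Vector 1: raw concentrations in ppm."""
    return [float(sample[g]) for g in GASES]


def vector_percent(sample):
    """Vector 2: each gas as a percentage of the total of the five gases."""
    total = sum(sample[g] for g in GASES)
    return [100.0 * sample[g] / total for g in GASES]


def vector_iec(sample):
    """Vector 3: the three IEC ratios C2H2/C2H4, CH4/H2, C2H4/C2H6."""
    return [
        safe_ratio(sample["C2H2"], sample["C2H4"]),
        safe_ratio(sample["CH4"], sample["H2"]),
        safe_ratio(sample["C2H4"], sample["C2H6"]),
    ]


# Hypothetical sample (ppm), not taken from the 501-sample database.
sample = {"H2": 100, "CH4": 50, "C2H2": 10, "C2H4": 20, "C2H6": 20}
print(vector_ppm(sample))
print(vector_percent(sample))
print(vector_iec(sample))
```

Each function maps the same sample to a different feature space, which is what allows a separate input vector to be chosen for each node of the classifier.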

V. SIMULATIONS AND RESULTS
The database of 501 samples, already used to evaluate the KNN classifier's accuracy rate, has also been employed in this section. Randomly selected, 321 samples have been used for the training phase, 160 samples for the testing phase, and 20 samples to examine the validity of the proposed classifier. Each of the previous vectors has been used independently as the input of a KNN classifier. Several types of distance have been used, namely "Euclidean", "Cityblock", "Cosine", and "Correlation". For the training phase, the number of neighbors k has been varied from 1 to 321, and the value giving the best accuracy rate is retained. This procedure has been repeated for the nine types of previously defined input vectors. Figure 3 shows an example of the classification results obtained when using the first input vector (gases in ppm).
From the results in Figure 3, an accuracy rate of 88.75% has been obtained for both the "Correlation" and "Cosine" distances with the same number of neighbors, k = 6. In contrast, accuracy rates of 77.50% and 76.25% are obtained for the "Cityblock" and "Euclidean" distances, with k = 8 and k = 4 neighbors, respectively.
All input vectors have been tested separately for several numbers of neighbors and distance types. Based on the obtained results, the best classification accuracy rate is selected for each input vector. Figure 4 gathers the nine best classification results.
From this figure, it is clear that the accuracy rate is affected by both the distance type and the number of neighbors. The highest accuracies are obtained for a relatively low number of neighbors; as the number of neighbors increases, the accuracy tends toward a plateau of about 28%. In order to quantify the results of Figure 4, the best accuracy rate over the number of neighbors is extracted for each input vector. The obtained results are given in Table 2.
As shown in Table 2, the highest accuracy rate, 91.88%, is obtained when employing vector 2 (gas in percentage) as the input of the KNN with the Cityblock distance and k = 10. Furthermore, the combined triangle and pentagon coordinates (vector 9, using the Cityblock distance and k = 6) came in second place with an accuracy rate of 90.63%. Finally, the combined Rogers and Dornenburg ratios gave the poorest results (29.88%) compared to the others.
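The search over distance types and neighbor counts described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the four distances follow their usual definitions (Cosine and Correlation as one minus the corresponding similarity), and the data, function names, and the small range of k are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def cityblock(a, b):
    return sum(abs(u - v) for u, v in zip(a, b))


def cosine(a, b):
    # Assumes neither vector has zero norm.
    dot = sum(u * v for u, v in zip(a, b))
    na = math.sqrt(sum(u * u for u in a))
    nb = math.sqrt(sum(v * v for v in b))
    return 1.0 - dot / (na * nb)


def correlation(a, b):
    # Cosine distance between mean-centered vectors (assumes non-constant vectors).
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return cosine([u - ma for u in a], [v - mb for v in b])


DISTANCES = {"euclidean": euclidean, "cityblock": cityblock,
             "cosine": cosine, "correlation": correlation}


def knn_predict(train_x, train_y, x, k, dist):
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k, dist):
    hits = sum(knn_predict(train_x, train_y, x, k, dist) == y
               for x, y in zip(test_x, test_y))
    return 100.0 * hits / len(test_y)


def sweep(train_x, train_y, test_x, test_y, k_values):
    """Accuracy (%) for every (distance, k) pair, plus the best pair."""
    results = {(name, k): accuracy(train_x, train_y, test_x, test_y, k, dist)
               for name, dist in DISTANCES.items() for k in k_values}
    return max(results, key=results.get), results


# Hypothetical, well-separated toy data with three features per sample.
train_x = [(10, 1, 1), (9, 2, 1), (11, 1, 2), (1, 10, 1), (2, 9, 1), (1, 11, 2)]
train_y = ["A", "A", "A", "B", "B", "B"]
test_x, test_y = [(10, 2, 2), (2, 10, 2)], ["A", "B"]

best, results = sweep(train_x, train_y, test_x, test_y, k_values=(1, 3, 5))
print(best, results[best])
```

On real DGA data the sweep would cover k from 1 up to the training-set size for each of the nine input vectors, and the best (distance, k) pair would be kept per vector.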
As stated in the introduction, the novelty of this investigation is to examine several types of input vectors in the proposed classifier and to compare the obtained diagnostic accuracies. The first step is to select the best input vector that effectively separates the electrical and thermal faults, which represents the first node of the classification process. For the electrical faults, there are two stages (nodes): (i) the first electrical node separates the fault PD from D1&D2, and (ii) the second electrical node distinguishes between the faults D1 and D2. The same scenario is repeated for the thermal faults. The first thermal node aims to separate T3 from T1&T2, and the last thermal node intends to isolate T1 from T2. The decision tree strategy is represented in Figure 5. Note that the best number of neighbors k and the selected distance types are employed in the following study.
Using all of the previous input vectors, an analysis has been made to compute the accuracy rate at each node. Table 3 gives a detailed analysis of the accuracy rate, leading to the selection of the most appropriate input vector and number of neighbors for each node. From these results, one can clearly see that the input vectors 1, 2, 3, 7, and 9 separate the electrical and thermal faults with an accuracy rate of 98.75%. Indeed, both electrical fault nodes are well suited to the input vector in ppm. Regarding the thermal fault nodes, the first node (T3 vs. T1&T2) is most compatible with two input vectors: the gas percentages and the combined Duval triangle and pentagon. The accuracy rate at this stage reaches 93.24%. The second thermal node is best served by the combined Duval triangle and pentagon input vector, with an accuracy rate of 91.67%.
Based on the results of Table 3 as well as the accuracy rate by node and by type of fault, it can be concluded that the use of the input vectors 2 and 9 in the same algorithm can improve the overall accuracy of the diagnosis. This reasoning is summarized in the flowchart given in Figure 6 (denoted by model 1).
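The five-node cascade described above can be sketched as a small routing function. This is only an outline under assumptions: the node names are hypothetical, and each node is represented by an arbitrary callable standing in for a binary KNN classifier with its own input vector, distance, and k. The scripted stubs below simply return preset answers so that the routing logic can be followed.

```python
NODE_NAMES = ("electrical_vs_thermal", "PD_vs_D1D2", "D1_vs_D2",
              "T3_vs_T1T2", "T1_vs_T2")


def diagnose(sample, nodes):
    """Route a sample through the decision tree of binary classifiers.

    nodes maps each node name to a callable(sample) -> label. In the proposed
    method each callable would be a KNN classifier using the input vector,
    distance type, and k selected for that node.
    """
    if nodes["electrical_vs_thermal"](sample) == "electrical":
        if nodes["PD_vs_D1D2"](sample) == "PD":
            return "PD"
        return nodes["D1_vs_D2"](sample)       # returns "D1" or "D2"
    if nodes["T3_vs_T1T2"](sample) == "T3":
        return "T3"
    return nodes["T1_vs_T2"](sample)           # returns "T1" or "T2"


def scripted(name):
    # Placeholder classifier: reads a preset answer stored in the sample itself.
    return lambda sample: sample[name]


nodes = {name: scripted(name) for name in NODE_NAMES}

thermal_case = {"electrical_vs_thermal": "thermal",
                "T3_vs_T1T2": "T1T2", "T1_vs_T2": "T2"}
electrical_case = {"electrical_vs_thermal": "electrical", "PD_vs_D1D2": "PD"}

print(diagnose(thermal_case, nodes))
print(diagnose(electrical_case, nodes))
```

Because every node is an independent callable, a different input vector can be plugged into each node without changing the routing.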
Table 4 shows the diagnostic accuracy rate of the KNN algorithm when using the DGA input vector in percentage (vector 2) and the combined Duval triangle and pentagon input vector (vector 9), together with the enhanced diagnostic accuracy obtained by applying the decision tree strategy to the KNN algorithm.
One can see that the diagnostic accuracy rate reaches 92.5% with the proposed method, compared with 91.88% for the DGA input vector in percentage and 90.63% for the combined Duval triangle and pentagon input vector when each is used alone (Table 2). The first thermal fault node classified many cases as T3 although the actual fault was T2. Therefore, another model has been proposed, as shown in Figure 7, to overcome this situation.
The second proposed model shows a higher accuracy rate of 93.75% compared with the previous one given in Table 4. This implies that the second model is more accurate. For further verification, all proposed models are tested on an independent new dataset of 20 samples. Table 5 shows the results of both proposed models along with those of the KNN classifier with input vectors 2 and 9.
For the validation phase, the results of Table 5 confirm that the second proposed model (model 2) gave the best accuracy rate compared to the other ones. An accuracy rate of 90% was obtained against 80% when employing the first proposed model (model 1). Both models are more accurate than the traditional methods of power transformer diagnosis, as shown in Table 5.
Table 6 compares the diagnosis results of the KNN-DT (with model 2) with those of the Duval triangle, Rogers ratios, and IEC 60599 methods, which are the common DGA methods in the literature. The results in Table 6 demonstrate the high performance of the KNN-DT in correctly diagnosing transformer faults: it succeeded in detecting 18 samples out of 20, i.e., 90% accuracy. On the other hand, the other three methods yielded poor diagnostic accuracies, which are 30% (6/20), 25% (4/20), and 30% (6/20) for the Duval triangle, Rogers' ratios, and IEC code, respectively. Therefore, the KNN-DT is highly reliable for diagnosing transformer faults.
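To make the reported percentages more concrete, the following sketch converts them back into counts of correctly classified samples. It assumes, as the text suggests but does not state explicitly, that the global test rates are computed over the 160 test samples and the validation rates over the 20 validation samples; the helper name `implied_correct` is hypothetical.

```python
def implied_correct(rate_percent, n_samples):
    """Number of correct diagnoses implied by an accuracy rate over n_samples."""
    return round(rate_percent * n_samples / 100.0)


# Global test accuracies quoted in this section (assumed to be over 160 samples).
test_rates = {"vector 1 (ppm)": 88.75, "vector 2 (percent)": 91.88,
              "vector 9 (triangle + pentagon)": 90.63,
              "model 1": 92.5, "model 2": 93.75}
for name, rate in test_rates.items():
    print(f"{name}: {implied_correct(rate, 160)} / 160")

# Validation accuracies (20 independent samples).
print("model 1 validation:", implied_correct(80, 20), "/ 20")
print("model 2 validation:", implied_correct(90, 20), "/ 20")
```

Under this assumption, moving from model 1 to model 2 corresponds to two additional correct diagnoses on the test set, and each test sample is worth 0.625 percentage points.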

VI. CONCLUSION
Oil-immersed power transformer fault diagnosis was investigated using different DGA methods, in which a database of 501 samples was exploited. The KNN classifier was combined with a decision tree principle to enhance the accuracy of transformer fault diagnosis. The number of neighbors and the distance type of the KNN algorithm were optimized to improve the classifier's accuracy rate. Several input vectors were assessed for each fault to select the most appropriate vector for that type of fault in the combination of the KNN classifier with the decision tree principle. The obtained results were discussed, and two models were proposed in order to improve the global accuracy rate. Both proposed models proved accurate, with the global accuracy rate exceeding 93% for the power transformer diagnosis. A complementary validation phase of the proposed research was also carried out using an independent database.

FIGURE 2. Influence of the number of neighbors on the classification results (reproduced from [7]).
B. KNN ALGORITHM
In applications, several nearest neighbors are usually employed. The decision favors the majority class represented among the k neighbors obtained from equation (5), k_r being the number of observations from the group of closest neighbors belonging to class r and c the number of classes.

FIGURE 3. Impact of input vector "type 1" and number of neighbors on the classification performance (accuracy rate).

FIGURE 5. General representation of the decision tree strategy.

TABLE 2. KNN classifier for all input vectors.

TABLE 3. KNN analysis of accuracy rate.

TABLE 4. Comparison of accuracy rate before and after improvement with model 1.

TABLE 5. The accuracy rate of the validation phase.

TABLE 6. Comparison between the proposed algorithm and other traditional methods.