Detection and Classification of Lamination Faults in a 15 kVA Three-Phase Transformer Core Using SVM, KNN and DT Algorithms

This paper deals with the detection and classification of two types of lamination faults (edge burr and lamination insulation faults) in a three-phase transformer core. It exploits previous experimental results obtained from a 15 kVA transformer under healthy and faulty conditions. Different test conditions were considered, such as the flux density, the number of affected laminations, and the fault location. Four features (Average, Fundamental, Total Harmonic Distortion (THD), and Standard Deviation (STD)) were extracted from the current signals. Using a total of 328 samples, these features serve as input vectors to train and test classification models based on SVM, KNN, and DT algorithms. Based on the selected features, the results confirm that the transformer current can be used for the detection of lamination faults. An accuracy rate of more than 84% was obtained using three different classifiers. Such findings are a promising step toward fault detection and classification in electrical transformers, helping to protect the system and avoid related issues such as increased power loss and temperature.


I. INTRODUCTION
Electrical transformers are a key component of the power network, from generation to the end customer. Therefore, the safety and reliability of these components are essential to ensuring the continuity of utility services. In this light, several researchers have studied and analyzed the impact of faults in transformers in order to offer a better understanding of these faults and provide appropriate protection techniques. In addition, these studies may also provide consistent monitoring and diagnostic techniques to detect transformer faults at earlier stages, improving the service life and the reliability of the equipment [1], [2].
The associate editor coordinating the review of this manuscript and approving it for publication was M. Mejbaul Haque.
In the literature, numerous studies have investigated the impact of faults in electrical machines, including power transformers, e.g., [3]-[6]. Other researchers have focused on developing and improving techniques to prevent such faults or to increase the performance of transformers, e.g., [7]-[10]. A few works have aimed to detect and classify faults in transformers, e.g., [11]-[13]. Together, these techniques help to make better use of electrical transformers and to avoid material losses resulting from possible malfunctions.
According to the study in [8], 37% of power transformer failures were caused by an insulation problem. These results come from a study of 343 power transformers with a voltage range of 33-400 kV. Among many other failures, winding, bushing, on-load tap changer, and core failures are the most pertinent. In low-voltage transformers, the rate of insulation failures is lower, whilst core failure related to lamination and interlamination issues can be identified as a primary failure [14]. Therefore, these failures should be analyzed to provide a better understanding of these problems in power transformers as well as to identify and develop techniques for diagnosis and maintenance.
In previous work [15], the authors studied the effects of two transformer core faults: edge burrs and lamination insulation faults. They experimentally simulated and analyzed both faults using a 15 kVA three-phase power transformer, considering different scenarios such as the area of the affected regions and the number of short-circuited laminations. Various flux densities were considered, ranging from 0.5 to 1.8 T. The obtained results give a good indication of the severity of short circuits relative to their position in the transformer core and can be exploited to discuss the power losses in the transformer core.
Based on the results presented in [15], this paper aims to detect and identify lamination faults in the core of a 15 kVA electrical transformer. Under normal and faulty conditions, different scenarios are considered, such as the flux density, the number of affected laminations, and the number and location of faults. Features are extracted from the measured current signals and used as input vectors for training and testing SVM, KNN, and DT classifiers. A total of 328 samples are used, with four selected features.
The paper is organized as follows: Section II provides details about the experimental results and the signal processing. Examples of the dataset are also presented and discussed in this section. Section III starts with a brief description of the classifiers used, followed by the results obtained for different scenarios. These results are presented and discussed, with a detailed example given for the DT algorithm.

II. PRE-PROCESSING METHODOLOGY AND RESULTS
This section briefly describes the feature extraction process for the detection and classification of lamination faults in the transformer core. Features have been extracted using signal processing techniques, namely Fourier analysis applied to the current signals. The obtained dataset is then processed to reduce the number of features, selecting those that contribute most to the overall accuracy. Only examples of the obtained results are presented and discussed in this paper, since the full details are the core of other work in which the authors studied the effect of these faults [15].

A. CURRENT SIGNALS
Mechanical shear deformation during the punching and cutting of electrical steel usually produces burrs on the cut edges. The two faults considered here are edge burrs and insulation deterioration between laminations, which are the most common faults in this type of transformer. These deformations in the core laminations affect the performance of transformers and electrical machines, causing power losses, as experimentally verified in many studies (e.g., [15]-[17]). Figure 1 illustrates two examples of the measured current signals under normal and faulty conditions. For the healthy mode, one can see that the flux density has an important effect on the magnitude and waveshape of the no-load current. At low flux density, the current has a very low magnitude, on the order of 0.7 A. In the same range of flux density, the current waveform resembles noise superimposed on a periodic signal of low amplitude. A detailed discussion of the effect of each type of fault on the current waveforms, with regard to flux density, can be found in [15] and [18].
From the current waveform, one can distinguish between the operation modes of the transformer. The current magnitude increases with both lamination faults. However, the current waveforms are practically similar. This similarity in the waveforms may affect the classification or detection of transformer faults. Quantifying the current signals is a common technique to support the detection and classification of faults in electrical transformers and other electrical systems [19], [20]. For this reason, signal processing techniques have been applied to the current signals.

B. FEATURES EXTRACTION
In the first stage, the data has been collected without applying any faults, i.e., under normal conditions (healthy operating mode). In order to increase the credibility of the database, several flux densities are considered, namely 0.5, 0.8, 1.0, 1.5, 1.7 and 1.8 T. In the second stage, two types of faults have been applied to the transformer core to form the database. A full day was allotted to collect the data for each fault separately, leaving the transformer core enough time to cool down. The studied cases are summarized in Table 1. For reliable results, each test was repeated many times. In order to enlarge the database further and to verify the obtained results, each scenario of Table 1 has been repeated several times on different dates. Data collection started in November 2020 and continued for five months. It should be noted that a detailed description of the experimental results has been presented in [15].
A MATLAB tool, "FFT_Analyzer_App", has been used to perform Fourier analysis on the measured results. The current signals have been used, and the frequency spectrum has been determined for each case of the experimental results. In general, the feature extraction process starts by displaying the frequency spectrum over the 0-500 Hz frequency band. Figure 2 shows an example of the frequency analysis obtained at 1.8 T flux density for the healthy and faulty operation modes.
As can be seen from this figure, the healthy operation mode can be distinguished from the faulty one in this case. The healthy mode is characterized by the appearance of harmonics of order 3, 5 and 7. Other odd harmonics appear with negligible amplitude along the frequency spectrum of the current signals. In terms of magnitude, the healthy mode is characterized by a small magnitude of about 0.6 A, against 2.73 A for the edge burr fault. In faulty conditions, the magnitude and number of harmonics increase compared to healthy conditions.
The feature selection step is used to minimize dimensionality by excluding irrelevant features, which helps to improve the model performance by focusing only on the important variables. This step is conducted using differential evolution. For instance, the features have been selected based on a graphical representation to distinguish the independent features among the others, which are optimized into representative features. Figure 3 shows an example of the fundamental values as a function of the THD of the transformer currents at 1.7 T flux density for healthy and faulty conditions. This figure clearly shows how the THD and the fundamental differ between healthy and faulty conditions. This means that both features can be applied to detect both types of faults in the power transformer core. For instance, for the points plotted in this figure, a simple line of equation "Fundamental = αTHD + β" can be used to separate the two operation modes. Figure 4 shows a second example, the distribution of the STD values with respect to the THD of the transformer current under healthy and faulty conditions.
The same observation can be made from this figure. However, a graphical method is not practical in a real situation, since a large number of samples is considered. For this reason, four features (fundamental, average, THD and STD) are used in this investigation. It was found that these features are appropriate for the detection purpose. Referring to Figures 3 and 4, the same observation has been made with different combinations of the four selected features.
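The separating line "Fundamental = αTHD + β" mentioned for Figure 3 can be turned into a simple decision rule. The following is a minimal sketch only: the function name, the coefficient values, and the THD values in the usage lines are hypothetical, and the assumption that faulty points lie above the line is made purely for illustration.

```python
def make_line_rule(alpha, beta, faulty_above=True):
    """Build a rule that classifies a (THD, fundamental) point relative to a line.

    The line is fundamental = alpha * thd + beta. Which side is "faulty"
    is an assumption controlled by `faulty_above`.
    """
    def classify(thd, fundamental):
        above = fundamental > alpha * thd + beta
        return "faulty" if above == faulty_above else "healthy"
    return classify


# Hypothetical coefficients, not fitted to the measured data.
rule = make_line_rule(alpha=0.01, beta=1.0)

# Fundamental magnitudes loosely inspired by the 0.6 A and 2.73 A values
# quoted in the text; the THD values (in %) are invented.
low_point = rule(thd=20.0, fundamental=0.6)
high_point = rule(thd=30.0, fundamental=2.73)
```

In practice the coefficients would be obtained from the data, which is what the classifiers of Section III do in a more general way.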

C. DATASET
The FFT technique was applied to the measured current under both healthy and faulty conditions. Four features are extracted from the transformer currents: the average value, the magnitude of the fundamental, the total harmonic distortion (THD) and the standard deviation (STD). Table 2 gives the selected features, extracted from the current signal at 0.5 T, a relatively low flux density.
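The four features can be computed along the following lines. This is a sketch in Python, not the MATLAB tool used by the authors. It assumes a 50 Hz fundamental, a record containing a whole number of periods, and the common definition of THD as the root-sum-square of harmonic amplitudes divided by the fundamental amplitude; none of these are stated in the source. The signal is synthetic and all names are hypothetical.

```python
import cmath
import math
import statistics


def harmonic_amplitude(samples, cycles):
    """Amplitude of the component that completes `cycles` periods in the record.

    This is a single-bin discrete Fourier transform, valid when the record
    holds a whole number of periods of that component.
    """
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * cycles * i / n)
              for i, x in enumerate(samples))
    return 2.0 * abs(acc) / n


def extract_features(samples, fundamental_cycles, max_harmonic=10):
    """Return average, fundamental amplitude, THD (as a ratio) and STD."""
    fundamental = harmonic_amplitude(samples, fundamental_cycles)
    harmonics = [harmonic_amplitude(samples, h * fundamental_cycles)
                 for h in range(2, max_harmonic + 1)]
    thd = math.sqrt(sum(a * a for a in harmonics)) / fundamental
    return {
        "average": statistics.fmean(samples),
        "fundamental": fundamental,
        "thd": thd,
        "std": statistics.pstdev(samples),
    }


# Synthetic current: 1.0 A fundamental at an assumed 50 Hz plus a 0.3 A
# third harmonic, sampled at an assumed 5 kHz over 10 periods.
fs, f0, periods = 5000, 50, 10
n = fs * periods // f0
signal = [math.sin(2 * math.pi * f0 * i / fs)
          + 0.3 * math.sin(2 * math.pi * 3 * f0 * i / fs)
          for i in range(n)]
features = extract_features(signal, fundamental_cycles=periods)
```

With `max_harmonic=10` and a 50 Hz fundamental, the harmonics considered extend to 500 Hz, which matches the frequency band displayed in the analysis. For this synthetic signal the THD ratio is 0.3 and the average is essentially zero.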
As shown in this table, the average values for the healthy and faulty cases are practically the same: the healthy value is 0.0655, the highest value is 0.0677 (fault 3), and the lowest is 0.0552 (fault 2). These results are expected, since the DC component of the current signal can be neglected. The results also indicate that neither type of fault affects the symmetry of the current signal. Furthermore, the fundamental values clearly differ between the healthy case and the faulty cases; the healthy value is 0.655.
For a relatively high flux density of 1.7 T, Table 3 gives the four selected features under both healthy and faulty conditions.
Comparing faulty and healthy conditions, the results in this table are more clearly separated than those obtained at the relatively low flux density. In this case, there is a clear margin between the results obtained in healthy conditions and those measured when a fault is applied.

III. METHODS, RESULTS AND DISCUSSIONS
This section describes the methods used for the detection and the classification of faults in the transformer core. Samples of the database used to train and test the classifiers have been presented and discussed. The section also provides the obtained accuracy rate of each classifier for different datasets.

A. CLASSIFICATION ALGORITHMS
For detection and classification, three classifiers have been used: SVM, KNN and DT. SVM techniques are commonly used for classification problems, prediction models, and regression [21]. For classification problems, the principle of the SVM is to find separating hyperplanes between two classes yi and yj. The hyperplanes should have maximum margin. Finding these hyperplanes turns the classification into an optimization problem. The solution of this optimization problem is particularly important because the hyperplanes represent the decision boundaries that distinguish two different classes [22].
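For reference, the maximum-margin optimization problem described above can be written in its textbook soft-margin form. The symbols are introduced here as assumptions: feature vector $x_k$, label $s_k \in \{-1, +1\}$ encoding the two classes, weight vector $w$, bias $b$, slack variables $\xi_k$, and penalty parameter $C$. The source does not state which SVM variant, kernel, or multi-class strategy was used, so this should be read as the generic formulation only.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{k=1}^{N}\xi_k \\
\text{subject to}\quad & s_k\left(w^{\top}x_k + b\right) \ge 1 - \xi_k,
\qquad \xi_k \ge 0, \qquad k = 1,\dots,N .
\end{aligned}
```

The decision boundary is the hyperplane $w^{\top}x + b = 0$. For separable data with all $\xi_k = 0$, the margin width is $2/\lVert w\rVert$, which is why minimizing $\lVert w\rVert$ maximizes the margin.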
The second classifier is the KNN algorithm. In this algorithm, the decision of the classifier is obtained from the vote of the k nearest neighbors. The vote is based on the distances calculated between the sample point and its nearest neighbors among the assigned points. Gaussian, triangular and cosine are some of the typical distance functions used in this classifier. It should be noted that the KNN technique is easy to implement and can be applied to many problems, including complex ones involving geographic information, text, images, and sound [23], [24]. Also, it is robust to noise. The introduction of new data does not require the reconstruction of a model. The class is assigned to an object with ease and clarity once the closest neighbors are identified. The performance of the method depends on the distance type, the number of neighbors, and how the neighbors' responses are combined. The results could be of inferior quality if the number of relevant attributes is low relative to the total number of attributes, because the distances on the irrelevant attributes drown out the proximity on the relevant ones. The calculations made in the classification phase can be very time-consuming if the dataset is too large.
The third classifier is the decision tree (DT) algorithm. In this technique, a decision is obtained by following the tree from the root node down to a leaf node [25]. The leaf node holds the classifier response.
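The KNN vote described above can be sketched in a few lines. This sketch assumes a plain Euclidean distance and an unweighted majority vote, which may differ from the distance and weighting actually used in the study. The two-feature training points are invented for illustration.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    # Count the labels of the k closest training points.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (fundamental, THD) pairs; not measured values.
train_x = [(0.6, 10.0), (0.7, 12.0), (2.7, 30.0), (2.5, 28.0), (2.9, 33.0)]
train_y = ["healthy", "healthy", "faulty", "faulty", "faulty"]

near_healthy = knn_predict(train_x, train_y, (0.65, 11.0), k=3)
near_faulty = knn_predict(train_x, train_y, (2.6, 29.0), k=3)
```

Because the distance mixes features with different units and ranges, feature scaling is usually applied before KNN; whether this was done in the study is not stated.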
The data has been organized into different scenarios. Three decompositions of the database have been selected. The 30-70 decomposition means that 30% of the database is reserved for training and 70% for testing. The second decomposition is 50-50, with 50% of the database used for training and the remaining 50% for testing. The last decomposition uses 70% for training and 30% for testing.
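The three decompositions can be illustrated on a database of 328 samples as follows. This sketch assumes a random shuffle with a fixed seed and rounding to the nearest integer; the source does not say how the split was drawn, whether it was stratified by class, or how fractional counts were rounded.

```python
import random


def split_dataset(samples, train_fraction, seed=0):
    """Shuffle the samples and split them into training and testing parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Sample indices stand in for the 328 feature vectors.
sizes = {}
for name, fraction in {"30-70": 0.3, "50-50": 0.5, "70-30": 0.7}.items():
    train, test = split_dataset(range(328), fraction)
    sizes[name] = (len(train), len(test))
```

Under these assumptions the training and testing sizes are 98 and 230, 164 and 164, and 230 and 98, respectively.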

B. RESULTS OF FAULT DETECTION
In this section, both types of faults have been grouped into a single class representing the faulty operation mode. Therefore, a binary classification (healthy and faulty) is formulated, where the aim is to detect the presence of faulty conditions. This process is based on the features extracted from the measured current. Table 4 gives the accuracy rate obtained using the three classifiers, namely SVM, KNN and DT.
VOLUME 10, 2022
From the obtained results, one can see that the three classifiers give roughly equivalent results for the three proposed scenarios (data decomposition for training and testing). Overall, the accuracy rate is around 80%, with a maximum of more than 82% obtained when using half of the dataset for training. This indicates that both the number and the quality of the input vectors have an important impact on the detection results.

C. RESULTS OF FAULT CLASSIFICATION
In this part, the classification between the healthy condition and both types of faults is considered. The problem becomes a three-class classification. Table 5 provides the results calculated using the three classifiers for the three training and testing scenarios. The results in this table show the accuracy rate of each class separately, which is the ratio of the number of correct decisions to the total number of samples in the given class.
From this table, one can clearly see that the classification results are affected by the type of fault. For instance, the results for the second fault show good accuracies in all the considered cases. This means that the lamination insulation fault can be easily distinguished from the other conditions. This observation is in good agreement with the conclusion drawn from the experimental results in [15]. In addition, edge burr faults are classified well when more data is used in the training process. When less data is used for training, the accuracy rate for this second class shows a slight decrease. Moreover, the healthy condition shows relatively lower accuracy rates, especially when using the DT classifier. The overall accuracy rate for each case is presented in Table 6.
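The per-class and overall accuracy rates defined above can be computed as in the following sketch. The label lists are invented for illustration and do not reproduce the values of Tables 5 and 6; the function names are hypothetical.

```python
def per_class_accuracy(y_true, y_pred):
    """Correct decisions divided by the number of samples, for each true class."""
    totals, correct = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (t == p)
    return {c: correct[c] / totals[c] for c in totals}


def overall_accuracy(y_true, y_pred):
    """Correct decisions divided by the total number of samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true and predicted labels for ten test samples.
y_true = ["healthy"] * 4 + ["edge burr"] * 4 + ["insulation"] * 2
y_pred = ["healthy", "healthy", "healthy", "edge burr",
          "edge burr", "edge burr", "healthy", "edge burr",
          "insulation", "insulation"]

per_class = per_class_accuracy(y_true, y_pred)
overall = overall_accuracy(y_true, y_pred)
```

Note that the per-class accuracy defined this way coincides with the recall of that class.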
Overall, the KNN classifier shows a better accuracy rate than the SVM classifier. For SVM, the accuracy rate is affected by the number of samples used to train the classifier. It ranges from 70.55% with the 30/70 decomposition to 84.26% with the 70/30 decomposition. With the KNN classifier, it is more than 80% for all decomposition scenarios, and an accuracy rate of 84.05% is obtained for the 50/50 case. For better visualization, Figure 4 shows an example of the confusion matrices obtained using the DT algorithm for the three scenarios.
From the confusion matrices, one can get a general understanding of the classification process. For example, precision and recall can be defined for each of the classes. Table 7 gives the precision and recall for each class using the DT classifier.
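Precision and recall follow directly from a confusion matrix, as sketched below. The convention assumed here is that rows are true classes and columns are predicted classes. The counts are hypothetical and are not those of the DT confusion matrices in the paper.

```python
def precision_recall(confusion, labels):
    """Per-class precision and recall from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i that were
    predicted as class j (an assumed convention).
    """
    result = {}
    for k, label in enumerate(labels):
        true_positive = confusion[k][k]
        predicted_as_k = sum(row[k] for row in confusion)  # column sum
        actually_k = sum(confusion[k])                     # row sum
        result[label] = {
            "precision": true_positive / predicted_as_k if predicted_as_k else 0.0,
            "recall": true_positive / actually_k if actually_k else 0.0,
        }
    return result


labels = ["healthy", "edge burr", "insulation"]
# Hypothetical counts for 30 test samples.
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 1, 9],
]
metrics = precision_recall(confusion, labels)
```

In this invented example the insulation class has a precision of 1.0 and a recall of 0.9, while the healthy class has a precision of 8/9 and a recall of 0.8.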
In general, the results indicate that the classification was successful, especially for the second class of fault. Such findings are encouraging for the detection and classification of lamination insulation faults in electrical transformers. However, larger databases are required to reach higher precision, and more accurate classifiers are also required to help protect the electrical system.

IV. CONCLUSION
This paper presented a study on the detection and classification of lamination faults in the power transformer core. From a previous work [15], experimental results obtained using a 15 kVA transformer were exploited. Overall, the obtained results indicated that the transformer current signal is an effective tool for the detection and classification of lamination faults in the transformer core. The following conclusions are also drawn.
1. SVM, KNN, and DT classifiers gave a good accuracy rate of around 82% for detection, where two classes were considered.
2. For classification, an accuracy rate of 84.26% was obtained using the SVM algorithm. It was 84.04% for the KNN and DT classifiers. The classification process was also sensitive to the data decomposition, especially for the DT algorithm.
3. The lamination insulation fault showed a good accuracy rate compared to the other classes. Higher precision and recall were obtained for this class.
Such findings indicated that better detection and classification results may be obtained by enlarging the database or by using more accurate classification algorithms. It is also suggested to investigate the classification using other features by employing other signal processing techniques.