General Machine Learning-Based Approach to Pulse Classification for Separation of Partial Discharges and Interference

This article describes a complete approach to filtering partial discharge (PD) pulses from interference in high voltage (HV) electrical equipment using supervised machine learning (ML) techniques. The PD signals are registered in the ultra high frequency (UHF) radiation band with a multisensor acquisition system composed of four antennae. The proposed methodology focuses on the implementation of ML algorithms and proposes a novel field approach to the onset detection of incoming signals. The goal was to achieve high filtering accuracy with reasonably low compilation times of the ML classifier, which would allow the model to be used on edge sensor devices. In this article, different models and training variants of the ML framework are tested. The presented results are based on an extensive measurement campaign performed in the laboratories of Global Energy Interconnection Research Institute (GEIRI) Europe. The methodology is validated on three separate test scenarios, each representing a different complexity of the problem with an increasing number of active sources. The results show high potential for the use of the artificial neural network (ANN) and other classifiers in PD filtering, as the accuracy reaches the desired threshold of 80% for most of the tested variants. The methodology is a step forward toward a fully online PD and interference filter.


I. INTRODUCTION
PARTIAL discharges (PDs) are localized electric discharges that only partially bridge the insulation between conductors [1], affecting only a small part of the dielectric media [2]; they take place in all types of insulation systems. PD consists of self-sustaining electron avalanches caused by a local increase in field strength or a local reduction in electric strength [1], [3]. PD produces transient electrical pulses lasting around 0.1-1 µs with a rise time of a few nanoseconds [4], and these pulses usually do not impact the short-term dielectric strength. However, when frequent and repetitive discharge impulses occur under ac voltage, PD drastically reduces the service life of high voltage (HV) equipment [2]. Therefore, there is a constant need to monitor PD, as its occurrence is an important criterion for the evaluation of insulation quality [1], [2], [3].
Three main types of PD defects can be defined: internal PD, surface PD, and corona PD. Each type has different characteristics and impacts on the dielectric [5], with internal PD being the most damaging and corona being almost harmless to the internal structure of the insulation [6]. Due to their nature, they depend on the voltage in different ways, as can be observed in phase-related PD (PRPD) patterns [7]. Apart from phase-related identification, PD can also be described in the time and frequency domains. This approach allows for direct analysis of individual PD pulses and observation of the correlations between the pulses' shapes and parameters and their origin [8]. Moreover, time- and frequency-domain pulse shape features (such as equivalent time and bandwidth) have previously been used successfully to identify PD sources in various HV machines [9].
Traditionally, PD measurement was conducted offline [10], using a variety of processes that accompany a PD, such as charge displacement, emitted radiation, heat generation, acoustic emission, or chemical reactions [11]. The dielectric had to be removed from the operating HV machine (e.g., a transformer) and studied in circuits designed for PD detection [2]. These methods allow for detailed analysis and description of the PD events occurring in the insulation [12] and are generally considered noise-free compared to their online counterparts [13].
The need to examine PD during machine operation [14] led to the development of various methods based on techniques such as electromagnetic or acoustic waveform monitoring. The conventional acoustic method is prone to electromagnetic interference related to the measurement device, even though the typical frequency range for acoustic PD detection is 20 to 300 kHz. Yet, recent developments show a high potential for adapting this technology with optical fiber-based sensors to achieve better results in the condition monitoring of HV equipment [15].
PD pulses can occur in the range of about 600 MHz to 2 GHz [2]; therefore, very high frequency and ultra high frequency (VHF/UHF) band recordings with a very high sampling rate are currently the most widely used and studied approach for online measurements [16], [17]. VHF/UHF band measurement has proven to be relatively noise-free in comparison to other online PD measurement methods [18], especially with additional denoising applied [19], and allows for reliable and early detection of defects in various HV machinery, such as gas insulated switchgear [17], [20] and HV transformers [16], [21]. However, the radio-frequency band is relatively crowded (e.g., by mobile phones), and in a real-life environment the incoming PD signals must be filtered from interference present in the same band [22].

II. ML IN PD DETECTION
With recent advancements in machine learning (ML) techniques, there has been notable growth in approaches that handle PD pulses with new or existing algorithms [23]. The uses range from denoising the recorded PD signal with neural networks (NNs) [24], to the localization of PD defects [25], and the classification of PD pulses with regard to the emitting source (a different defect or an interfering factor) [23].
PD classification is a complex task without a single correct solution, and various implementations of ML techniques and PD descriptions have been proposed. ANNs have been tested with good results in classification methodologies that focus mostly on pattern recognition (PR) within the PRPD [26], [27] and its statistical features [28], [29]. Approaches using the time-domain recording to classify individual pulses have also been made, using the pulses' statistical and waveform features [30], [31] or various dimensionality reduction techniques, such as principal component analysis (PCA) [32]. Combined approaches have also been tested in [33] and [34].
Other ML techniques have also been used for classification with good results, including decision tree ensembles [35], [36] and support vector machines [37], [38]. In recent years, deep learning (DL) techniques have been tried as well. Typical uses range from the classification of PD pulses via pattern recognition with convolutional NNs [39] and long short-term memory (LSTM) NNs [40], to data augmentation with generative adversarial networks [41].
However, due to their inherent black-box design, their use is limited when the designer wants more control over the input features and the output [23]. Moreover, DL techniques suffer greatly when little training data is available, and the computational burden of both training and classification is high [42]. Because the intended use is a universal filter installed on programmable hardware (as an edge computing smart sensor), we decided to use standard ML techniques in our work [43]. As the computational capacity of small controllers develops further, switching to DL will become highly reasonable [44].
This work proposes and tests a complete classification methodology, starting from the signal acquisition sensors and their setup, continuing with the data transformation pipelines, and finishing with an ML classification algorithm test. The proposed method allows for the identification and filtering of individual VHF/UHF pulses before they arrive at the recording system. The methodology is intended for generic use with real-life online HV equipment. It uses supervised ML techniques and proposes a novel approach to dimensionality reduction of the incoming signals based on precise detection of the pulses' onset. Moreover, this approach allows for a better comparison of PD-like pulses given their varied times of arrival at the four acquisition sensors.
The test is performed on a detailed dataset of VHF/UHF bandwidth pulses recorded specifically for this purpose in a remote sensing laboratory belonging to Global Energy Interconnection Research Institute (GEIRI) Europe.
In Section III, the main measurement procedure is described, with details about the devices used to generate the pulses. Section IV contains a short description of the proposed ML approach used in the classification procedure. Sections V-VII describe in detail the data recorded in this case study and the results achieved by the proposed methodology.

III. MEASUREMENT
The datasets were acquired via a UHF/VHF PD detection system. We conducted a detailed measurement campaign with physical sources of PD pulses and interference. The same acquisition system has been used to record pulses in other case studies, including real-life situations [22], [31], [45], [46].
The acquisition system is composed of four biconical ultra-wideband antennae that were installed close to potential PD sources. The pulses have been recorded through a field programmable gate array (FPGA) as short snapshots with a duration of 4 µs at a sampling frequency of 2.5 GHz. The recording software is triggered by a rapid increase in the VHF/UHF radiation spectrum on a time scale of 0.1-1 µs. The bandwidth of the biconical antennae is 20 MHz-1 GHz, which is sufficient to cover the bandwidth of the captured PD pulses (described in Section I as lying in the range of about 600 MHz to 2 GHz).
Generally, the antennae layout is modified depending on the studied PD source to achieve the best possible performance. In the case of the measurements presented in this article, the layout has been kept constant throughout the experiments. The goal was to place the PD source roughly in the middle of the antennae setup, with all sensors at a distance of 1-2 m from the emitting source. The measurement setup is presented in Fig. 1, and the exact locations of the sensors and PD sources are shown in the top view in Fig. 2.

Fig. 1. Measurement setup in GEIRI Europe laboratory, Berlin, Germany.
Fig. 2. Antennae layout used in the experiment: green X marks the location of the PD device, green-APG; the ARC location has not been tracked thoroughly.
The PD pulses have been generated using a portable PDSIM-600 device by Spark Instruments, which is able to physically emulate six different kinds of insulation defects and the related PD pulses [as seen in Fig. 3(a); the nameplate with more details is shown in Fig. 3(b)]. The device is commercially available, and the authors did not participate in its design. When recording the dataset, the sources have been activated individually (apart from a few test setups described in more detail later) by stimulation with an ac voltage of 5 kV. This allowed for a clear representation of the studied sources, both in the form of the acquired UHF waveform and of the PRPD for each source.
Moreover, to simulate background conditions and test the ML-based filter, other devices that generate interference in the same band as the PD signals have been used (as seen in Fig. 4). The two devices used for that purpose were an electric arc lighter (ARC), which emits signals that are chaotic with respect to their phase and power spectra, and an artificial pulse generator (APG) designed by the authors, which can emit pulses within strict power and phase ranges that can be set during the measurement.
All the recorded pulses (PD defects and interference) are presented in their time- and frequency-domain representations in Fig. 5 (based on a single example from the respective recorded dataset), and the collection of their PRPD patterns is shown in Fig. 6. A PRPD graph shows the scaled power (in dB) of all single pulses associated with each cluster as a function of the phase of the 50-Hz power cycle. These graphs allow for visual PR of the recorded signals and will be used to distinguish the PD clusters from the noise clusters.
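As a rough illustration of how each pulse becomes one point of a PRPD graph, the following minimal NumPy sketch maps a pulse's trigger time to a phase angle within the 50-Hz cycle and its energy to a power level in dB. It rests on hypothetical assumptions that the article does not state: trigger times are measured from a positive-going zero crossing of the supply voltage, and pulse power is taken as the mean squared sample value.

```python
import numpy as np

MAINS_HZ = 50.0  # supply frequency stated in the text


def prpd_points(trigger_times_s, waveforms):
    """Return (phase in degrees, power in dB) for each pulse.

    Hypothetical assumptions: trigger_times_s are measured from a positive-going
    zero crossing of the 50-Hz voltage; power is the mean squared sample value.
    """
    period = 1.0 / MAINS_HZ
    t = np.asarray(trigger_times_s, dtype=float)
    # Position within the current power cycle, mapped to 0-360 degrees.
    phase_deg = (np.mod(t, period) / period) * 360.0
    w = np.asarray(waveforms, dtype=float)
    power_db = 10.0 * np.log10(np.mean(w ** 2, axis=1))
    return phase_deg, power_db


# Three pulses triggered 5 ms, 15 ms and 25 ms after the reference zero crossing.
times = [0.005, 0.015, 0.025]
pulses = np.ones((3, 625)) * np.array([[1.0], [10.0], [1.0]])
phase, power = prpd_points(times, pulses)
```

Scattering `power` against `phase` for all pulses of one cluster yields a PRPD plot of the kind shown in Fig. 6.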
As can be seen in Figs. 5 and 6, it is quite easy for the human eye to note the differences between PD pulses and interference. In the time domain, PD signals are usually characterized by a short but dense fluctuation in the UHF waveform with fairly low power (as the background noise is still relevant in those cases). However, the difference is even more visible in the PRPD. PD pulses have a strong correlation with phase, as the events are usually located in the rising part of both the positive and negative half cycles, with a stochastic power range. In comparison, the interference either has a completely random phase distribution (ARC) or occurs at very strictly defined phases (APG).
The proposed algorithms have been tested on a workstation at Politecnico di Milano equipped with an Intel Core i9-10900KF CPU with ten cores and a base frequency of 3.7 GHz. Additionally, an Ampere-architecture GPU, the "Nvidia GeForce RTX 3060," supporting complex tensor operations, was used. The CUDA driver version is 11.2 and the related TensorFlow version is 2.8.

IV. METHODOLOGY
Here, we propose a fast ML methodology for multiclass and multisensor classification that uses supervised learning techniques supported by a precise signal onset detection method. The procedure is summarized by the flowchart in Fig. 7.

A. Onset Detection
In a previous study [31], we showed that training a classification ANN model on the entire waveform as input can lead to satisfying results. However, the original waveform of each signal has 5120 samples; thus, training and classification become highly demanding even for high-end machines. We simplify the waveform with a feature extraction method that keeps the original waveform samples but significantly shortens the recording. For that purpose, an onset detection method has been employed to pinpoint the precise beginning of each PD-pulse waveform. Furthermore, onset detection helps to avoid misclassification of samples due to travel lag and varied arrival times between sensors. Thanks to onset detection, the exact location of the source becomes irrelevant for the functioning of the filter; however, it is still relevant for the general diagnostics of HV machinery.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

The onset detection procedure is as follows.
1) A sequence of partially overlapping moving windows is defined, and the sixth standardized statistical moment ($S_6$) is computed over all the samples belonging to each window:
$$S_k = E\left[\left(\frac{X-\mu}{\sigma}\right)^{k}\right]$$
for $k$ equal to 6, where $X$ are the values from the chosen window, $\mu$ and $\sigma$ are, respectively, their mean and standard deviation, and $E$ is the expected value. The value of 15, which is the sixth standardized moment of a Gaussian distribution, is then subtracted from the result to bring the values close to 0. Values close to 0 indicate that mostly noise is recorded in the given time window.
2) The derivative of $S_6$ over the sequence of windows ($dS_6$) is calculated, and its maximum value is identified. A threshold is set equal to 10% of this maximum. This threshold definition has proven to be effective for real-life case studies and laboratory tests.
3) All the crossings between $dS_6$ and the threshold are identified by checking the sign of the product of two consecutive values, $dS_{6,n}$ and $dS_{6,n+1}$, compared against the threshold [47]:
$$C_{id,n} = \left(dS_{6,n} - t\right)\left(dS_{6,n+1} - t\right)$$
where $C_{id,n}$ is the identifying index of a given sample and $t$ is the threshold value for the given signal. For negative values of $C_{id,n}$, a crossing is identified at the $n$th sample of the signal.
4) For each crossing, a signal-to-noise ratio (SNR) is calculated. It is defined as the ratio of the energy content of the 250 samples after the crossing (about 5% of the recorded signal) to the energy content of the 250 samples before the crossing:
$$\mathrm{SNR}_n = \frac{\sum_{i=n+1}^{n+250} s_i^2}{\sum_{i=n-250}^{n-1} s_i^2}$$
where $s_i$ is the value of the signal at the $i$th sample. The crossing sample with the highest SNR value is chosen as the onset. Afterward, only about 12% of the original waveform is kept: each signal of 5120 samples is reduced to 625 samples. The procedure is visualized in Fig. 8. The shortened waveforms are later used as input for the classification models.
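The four onset detection steps can be sketched in NumPy as follows. This is an illustrative reading of the procedure, not the authors' code: the window length `win` and hop `step` are hypothetical (the article does not give them), the mapping from a crossing index to a sample index is an assumption, and the shortened segment is assumed to start at the detected onset. The 10% threshold and the 250-sample SNR windows follow the text.

```python
import numpy as np


def excess_sixth_moment(x):
    """Standardized sixth moment E[((X - mu) / sigma)^6] minus 15, its Gaussian value."""
    sigma = x.std()
    if sigma == 0.0:
        return 0.0
    z = (x - x.mean()) / sigma
    return float(np.mean(z ** 6) - 15.0)


def detect_onset(signal, win=128, step=32, thr_frac=0.10, snr_len=250):
    """Return the sample index chosen as pulse onset, or None if there is no candidate.

    win and step are hypothetical values: the article does not state the window
    length or overlap. thr_frac (10 %) and snr_len (250) follow the text.
    """
    signal = np.asarray(signal, dtype=float)
    starts = np.arange(0, len(signal) - win + 1, step)
    # Step 1: S6 (Gaussian value removed) on partially overlapping windows.
    s6 = np.array([excess_sixth_moment(signal[s:s + win]) for s in starts])
    # Step 2: derivative along the window sequence; threshold = 10 % of its maximum.
    ds6 = np.diff(s6)
    t = thr_frac * ds6.max()
    # Step 3: a crossing wherever (dS6_n - t) * (dS6_{n+1} - t) is negative.
    c_id = (ds6[:-1] - t) * (ds6[1:] - t)
    crossings = np.flatnonzero(c_id < 0)
    # Assumption: map crossing n to the first sample that enters window n + 2.
    candidates = (crossings + 1) * step + win
    best, best_snr = None, -np.inf
    for n in candidates:
        if n < snr_len or n + snr_len >= len(signal):
            continue  # not enough samples on one side to compute the SNR
        # Step 4: energy of 250 samples after the crossing over 250 samples before it.
        after = np.sum(signal[n + 1:n + 1 + snr_len] ** 2)
        before = np.sum(signal[n - snr_len:n] ** 2)
        snr = after / before if before > 0 else np.inf
        if snr > best_snr:
            best, best_snr = int(n), snr
    return best


def shorten(signal, onset, length=625):
    """Keep `length` samples starting at the onset (zero-padded if the record ends)."""
    seg = np.asarray(signal, dtype=float)[onset:onset + length]
    return np.pad(seg, (0, length - len(seg)))


# Synthetic example: Gaussian noise with a damped oscillation starting at sample 2010.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 5120)
k = np.arange(5120 - 2010)
x[2010:] += 30.0 * np.exp(-k / 50.0) * np.sin(2 * np.pi * 0.1 * k)
onset = detect_onset(x)
segment = shorten(x, onset)
```

In this synthetic example, the falling edge of the $dS_6$ spike also produces a crossing, but its SNR is lower because part of the pulse energy already lies before it, so the rising crossing is kept. The detected onset lands within one window hop of the true start.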

B. Model Selection and Training
To select the model best suited to the filtering task, we studied five typical classification models that are widely available in open-source packages: support vector machine classifier (SVC), random forest ensemble classifier (RFC), k-nearest neighbors (KNN) classifier, gradient boosting classifier (GBC), and an ANN classifier. The preliminary hypothesis (based on the previous study in [31]) assumed the best performance for the ANN; therefore, the training procedure focused on achieving the best possible parameter tuning for this classifier. For the same reason, the detailed simulation results are shown hereafter only for the ANN. Through a short sensitivity study, the optimal parameters were chosen as follows.
1) An input layer for the feature space, composed of the shortened waveform (625 samples), with the rectified linear unit (ReLU) activation function.
2) Two hidden layers with 50 neurons each, with the ReLU activation function.
3) An output layer with a shape adapted to the number of classes, with the "softmax" activation function.
The rest of the classifiers serve as benchmarks in this case study. Their parameters are tuned through a brute-force grid search to achieve the best accuracy score on the validation dataset. Each model (ANN included) is trained through stratified fivefold cross-validation on the training data. Finally, the optimal models are kept and tested in the later part of the article. In each case, the model training input is composed of input-label pairs, with the shortened waveforms (625 samples each) serving as input and the recording identification (ID) serving as label. The input waveforms are not standardized or scaled, as the strength of the signal also carries information about the emitting source.
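A minimal scikit-learn sketch of this training setup follows. The article does not state which library implements each model, so mapping the ANN to `MLPClassifier` and the benchmarks to scikit-learn estimators is an assumption; the KNN parameter grid and the synthetic data are hypothetical placeholders, not the article's values or dataset. Only the ANN and one benchmark are shown; SVC, RFC, and GBC would be tuned the same way with their own grids.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Stratified fivefold cross-validation, shared by all models.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# ANN: two hidden layers of 50 ReLU neurons. For more than two classes,
# MLPClassifier uses a softmax output layer (logistic for the binary case).
ann = MLPClassifier(hidden_layer_sizes=(50, 50), activation="relu",
                    max_iter=500, random_state=0)

# Benchmark example: brute-force grid search over a hypothetical KNN grid.
knn_search = GridSearchCV(KNeighborsClassifier(),
                          param_grid={"n_neighbors": [3, 5, 7]},
                          scoring="accuracy", cv=cv)

# Synthetic stand-in data: 625-sample "shortened waveforms" with two labels.
# Inputs are deliberately left unscaled, as in the text.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 625))
y = np.repeat([0, 1], 100)
X[y == 1, :50] += 3.0  # make class 1 distinguishable near the onset

ann_scores = cross_val_score(ann, X, y, cv=cv, scoring="accuracy")
knn_search.fit(X, y)
```

With real data, `X` would hold the shortened waveforms of one sensor and `y` the recording IDs (or the grouped binary labels), and one such model would be trained per sensor.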
In practice, four separate models are trained for each case, one per acquisition sensor (as shown in Fig. 7). This was done to achieve classification per emitted source instance, even for pulses that are not fully captured by all the sensors. The final result for each emitted instance is calculated as the average of the classification probabilities proposed by the sensor models; models that have no representation of the pulse for a given emitted instance are ignored in the process.
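The per-instance fusion rule described above can be sketched as follows, assuming (hypothetically) that each sensor model outputs a row of class probabilities and that a sensor without a representation of the pulse is marked with a row of NaN.

```python
import numpy as np


def fuse_sensor_probabilities(probs):
    """Average class probabilities over sensors, skipping sensors that missed the pulse.

    probs: array of shape (n_sensors, n_classes); a missing sensor is marked by a
    row of NaN (a hypothetical convention, not taken from the article).
    """
    probs = np.asarray(probs, dtype=float)
    available = ~np.isnan(probs).any(axis=1)
    if not available.any():
        raise ValueError("no sensor captured this emitted instance")
    return probs[available].mean(axis=0)


# Four sensors, two classes (PD-like, interference); sensor 3 missed the pulse.
p = [[0.9, 0.1],
     [0.7, 0.3],
     [np.nan, np.nan],
     [0.8, 0.2]]
fused = fuse_sensor_probabilities(p)
label = int(np.argmax(fused))
```

Here the missing sensor is simply left out of the average, so the result is the mean of the three remaining rows, [0.8, 0.2].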

V. CASE STUDY

A. Training Dataset
The case study is based on the data recorded during experiments in laboratories belonging to GEIRI Europe GmbH in Berlin, Germany.The recorded data can be subdivided into two parts that were separately used for training and testing of the developed ML filtering procedure.
The training dataset has been prepared by separately activating the available emitting sources (as introduced in Section III): six PD sources from the PDSIM-600 emulator and two interference sources. The interference sources were chosen specifically to represent different repetition rates of the signal (on the order of 1-10 kHz for ARC and 100 Hz for APG). During each single-source activation, the acquisition system was kept online until 4000 events had been recorded. As a result, a library of 32 000 perfectly labeled signals (owing to the separate recording) has been obtained. All the recorded sources with the corresponding IDs are listed in Table I.
Corona discharges happen on the edge (outside) of the dielectric; hence, their harmfulness depends on the type of HV equipment [2]. Moreover, their presence might disturb the functioning of the acquisition and filtering system. This is especially true for open-air equipment, such as HV transformers [48], where PD acquisition systems might be triggered by corona discharges occurring on overhead HV lines.
Therefore, different affiliations will be tested for PD defect number 5, i.e., the corona. Each test case will be run with the ML model trained first with this defect treated as a PD, and then as interference similar to the ARC or APG devices.
A decision had to be made regarding the proper approach to ML model training. Many ML algorithms (such as SVC) can perform worse in multiclass classification, especially with a high number of possible classes and a low number of samples per class. For that reason, both multiclass and binary classification will be tested and compared (the grouping procedure can be seen in Fig. 9).
1) Multiclass Classification: Each recorded source is labeled as a separate class for training purposes. Therefore, the classifier labels the test samples as belonging to one of the eight available classes. For comparison purposes, these classes are grouped a posteriori into PD-like and interference results according to the key in Table I.
2) Binary Classification: The training datasets are grouped before training into PD-like and interference signals according to the key in Table I. In this case, the classification is performed on two classes instead of eight. Binary training has a significant additional computational advantage in the form of direct signal classification into "PD-like" and "Interference" groups. This can be particularly advantageous for the edge-computing (smart sensor) goal of the algorithm: the interference signals could be filtered "on the edge," before being recorded by the system, without clogging the data transmission pipelines and without involving human control to group the classes a posteriori.
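The two grouping strategies and the configurable treatment of corona could be sketched as below. Table I is not reproduced in this section, so the class names and their order are hypothetical placeholders. Summing the grouped class probabilities in the multiclass case is likewise an assumed rule; mapping the arg-max class to its group is an equally plausible alternative.

```python
import numpy as np

# Hypothetical class names: Table I is not reproduced here, so these names and
# their order are placeholders (six PDSIM-600 defects plus ARC and APG).
CLASSES = ["defect 1", "defect 2", "defect 3", "defect 4",
           "defect 5 (corona)", "defect 6", "ARC", "APG"]


def is_pd_like(name, corona_as_pd):
    """True for PD-like classes; corona follows the chosen affiliation."""
    if name in ("ARC", "APG"):
        return False
    if name.startswith("defect 5"):
        return corona_as_pd
    return True


def to_binary_labels(labels, corona_as_pd):
    """Binary case: group labels before training (1 = PD-like, 0 = interference)."""
    return np.array([int(is_pd_like(c, corona_as_pd)) for c in labels])


def group_probabilities(probs, corona_as_pd):
    """Multiclass case (assumed rule): sum 8-class probabilities per group.

    Returns [P(PD-like), P(interference)] for each row of probs.
    """
    probs = np.asarray(probs, dtype=float)
    mask = np.array([is_pd_like(c, corona_as_pd) for c in CLASSES])
    return np.stack([probs[..., mask].sum(axis=-1),
                     probs[..., ~mask].sum(axis=-1)], axis=-1)


# One pulse whose multiclass output is split between corona and ARC.
p = np.array([0.05, 0.0, 0.05, 0.0, 0.5, 0.0, 0.4, 0.0])
corona_pd = group_probabilities(p, corona_as_pd=True)
corona_int = group_probabilities(p, corona_as_pd=False)
```

The same pulse flips from PD-like ([0.6, 0.4]) to interference ([0.1, 0.9]) depending only on the corona affiliation, which is why both treatments are tested.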

B. Test Dataset
The test cases have been recorded separately from the training dataset. Each scenario uses a different combination of previously independently recorded sources. The recording has been performed similarly, with the acquisition system online until a batch of 4000 signals was captured; here, however, multiple sources have been activated at once. To properly test the filtering procedure, the subsets were created with interference signals included as a "background" for PD-like pulses. As can be seen in Table II, three different scenarios are tested in total. Defect IDs refer to the list in Table I.
All of the recorded test scenarios contain the same type of defect (defect 3, near-ground PD) on three different backgrounds. This defect has been chosen due to its standard behavior compared to other PD defects in the time, frequency, and phase domains [as seen in Figs. 5(a) and (b), and 6, respectively]. The scenarios vary in complexity, with more sources introduced into the mix. In scenario I, the defect is presented on its own with two clear interference-generating devices (ARC and APG). In scenario II, the corona defect is added as an unclear interference (unclear due to its mixed treatment, as in Table I). Finally, scenario III adds a second clear PD source (defect 1, internal PD).

VI. PERFORMANCE METRICS
The proposed model will be evaluated using four metrics that are commonly used in classification problems.
4) F1-Score: A parameter combining precision and recall; for this study, the two have been weighted equally [50], which gives their harmonic mean:
$$F1_c = \frac{2\,\text{Precision}_c\,\text{Recall}_c}{\text{Precision}_c + \text{Recall}_c}$$

In the test cases, the "true" label of the signals is identified through a hierarchical agglomerative clustering (HAC) procedure previously described in [45]. It is based on pairwise cross-correlation (CC) of the pulses and has proven effective for identifying different groups of pulses within a batch of signals.
However, this procedure does not directly yield the desired PD-like or Interference labels. The PRPD patterns of the resulting clusters have to be studied and, through comparison with the typical individual PRPD patterns of PD and interference, manually labeled as such. An example of this procedure can be seen in Fig. 10, where the results of HAC for scenario I are shown. These results have to be compared with the general PRPD patterns of the available sources (in Fig. 6) to identify the true labels of the defined groups. In the end, cluster 1 can be labeled as a PD source due to its defined power and phase signature, while clusters 0 and 2 are labeled as interference, as they behave similarly to APG and ARC, respectively.
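For readers who want to experiment with CC-based clustering, the sketch below builds a distance matrix of 1 minus the peak normalized cross-correlation between pulses and clusters it with average-linkage HAC from SciPy. It is not the procedure of [45]: the distance definition, the linkage method, and the fixed number of clusters are assumptions chosen for illustration, and the pulses are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def max_norm_xcorr(a, b):
    """Peak of the normalized cross-correlation of two pulses over all lags."""
    a = a - a.mean()
    b = b - b.mean()
    cc = np.correlate(a, b, mode="full")
    return cc.max() / (np.linalg.norm(a) * np.linalg.norm(b))


def cc_hac(pulses, n_clusters):
    """Average-linkage HAC on the distance 1 - peak normalized CC (an assumption)."""
    n = len(pulses)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - max_norm_xcorr(pulses[i], pulses[j])
            dist[i, j] = dist[j, i] = max(d, 0.0)  # guard against rounding below 0
    z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(z, t=n_clusters, criterion="maxclust")


def damped(freq, start, n=625, tau=50.0):
    """Synthetic damped oscillation beginning at sample `start`."""
    k = np.arange(n - start)
    pulse = np.zeros(n)
    pulse[start:] = np.exp(-k / tau) * np.sin(2 * np.pi * freq * k)
    return pulse


# Two synthetic pulse families with different oscillation frequencies and lags.
rng = np.random.default_rng(1)
pulses = [damped(0.05, s) + rng.normal(0.0, 0.05, 625) for s in (10, 40, 70, 100, 130)]
pulses += [damped(0.2, s) + rng.normal(0.0, 0.05, 625) for s in (10, 40, 70, 100, 130)]
labels = cc_hac(pulses, n_clusters=2)
```

Because the peak of the cross-correlation is taken over all lags, pulses of the same shape cluster together regardless of their arrival time; as in the text, the resulting clusters still have to be labeled as PD or interference from their PRPD patterns.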

VII. RESULTS AND DISCUSSION
Each of the three test scenarios is used to evaluate the classification accuracy of the five classification models (SVC, RFC, KNN, GBC, and ANN) trained for both binary and multiclass classification tasks. Regarding the treatment of corona discharge as interference or PD, we also tested the performance of the models in both cases. In total, 60 test cases have been performed (3 scenarios × 5 models × 2 training variants × 2 corona treatments), and the general results are presented in Fig. 11 for corona as PD and in Fig. 12 for corona as interference.
The main takeaway from these graphs is that most of the proposed classifiers (even the simpler ones) achieve the desired accuracy of 80% (marked on the graphs with a dashed line). Additionally, regardless of the method used, the accuracy decreases as the complexity of the scenario increases. Generally, binary training yields higher accuracy, with a very small drop between scenarios. Surprisingly, treating corona as interference brings better stability to the model and decreases the variance between scenarios. A possible explanation is that the original training recording of corona is far from perfect (as seen in Fig. 6), with numerous interference pulses caught in the mix. Hence, the uncertain status of corona discharges is mirrored by a somewhat faulty recording, and it is probably easier for the classifier to consider these pulses as interference rather than PD.
Aggregated results can be seen in Table III. Since the ANN achieves only the second-best results, the hypothesis of the ANN being the optimal classifier for this task can be questioned. However, considering the lower standard deviation of the ANN compared to KNN (which has the best accuracy), the ANN performance can be regarded as more stable and thus better suited for a multipurpose filtering tool.

TABLE VI SCENARIO III-ANN DETAILED RESULTS
A more detailed study of the ANN performance has also been conducted for the corona-as-interference case. Based on the metrics defined in Section VI, precision, recall, and F1-score have been calculated for each scenario, and the results are presented in Tables IV-VI. The scenarios represent an increasing level of complexity, with an increasing number of active PD and interference sources (as seen in Table II).
As can be seen, for the ANN all the results reach the desired threshold of 80% accuracy, with good precision and recall for the interference class. In contrast, for scenarios II and III the results for the PD class are not fully satisfactory, with relatively low precision and recall. Low PD recall means that the filtering is too strict, as many PD pulses have mistakenly been classified as interference and rejected by the filter. This confusion is most probably caused by the inclusion of corona discharge in the interference group, as the ANN learns that patterns typically in line with PD may represent one of the rejected classes. In scenario III, the additional sources further affect the detection and classification accuracy of the system. Nonetheless, the global accuracy of the ANN still exceeds the desired threshold of 80%. Low precision is also observed for the PD class, which reaches 63.5% and 45.8% for binary and multiclass training, respectively, meaning that a high number of interference signals are passed through the filter. Moreover, as the number of recorded signals grows (recording 4000 signals takes only a few seconds), this outcome might be highly undesirable, as it may lead to clogging of the data pipelines. Further studies have to be performed to optimize the precision of the PD class in different test cases.

VIII. CONCLUSION
As demonstrated, PD detection and classification is a complex problem that requires a robust solution spanning various areas of expertise: from electrical engineering, through signal processing and analysis, to statistics and ML algorithms.
The proposed methodology tries to bridge the gaps between these domains and deliver a general and efficient tool for filtering PD signals from interference present in the UHF band. As the tests show, despite the high classification accuracy, the precision of PD pulse identification suffers for more complex recordings. Further optimization of the selected model's training has to be performed to decrease the number of signals that are rejected by the filter. With proper implementation, all of the tested models used in the developed procedure had sufficient accuracy and short enough compilation times to be used as an online filtering tool for edge sensing.
However, a question remains regarding the treatment of corona PD pulses. As discussed, depending on the use case, corona can be considered either as interference or as one of the PD types. Surprisingly, the classification accuracy is slightly better and more stable when treating it as interference, which might be caused by an imperfect recording in the training dataset. Additionally, the binary classifiers tend to outperform multiclass predictions by a small margin in almost every tested case.
Nonetheless, the global accuracy generally exceeded 80% across the examined cases; thus, the main goal of the case study has been achieved. With good accuracy and satisfying runtimes, the methodology is a step forward toward a fully online PD and interference filter.

Fig. 6.Phase-related PD pattern of all the recorded pulses used for the training of the ML interference filter.Each typology has been recorded separately by activating the sources one by one.

Fig. 8. Example of pulse onset detection based on the derivative of the sixth-order statistical moment ($dS_6$). The result is the shortened waveform (in burgundy).

1) Accuracy: The global accuracy of the classification model, defined as the ratio of correctly identified signals ($TP_{\text{total}}$) to the total number of recorded signals:
$$\text{Accuracy} = \frac{TP_{\text{total}}}{\text{Total input signals}}$$
2) Precision: The ratio of correctly identified class members (true positives of class $c$) to all predicted class members (true positives plus false positives of the class) [49]:
$$\text{Precision}_c = \frac{TP_c}{TP_c + FP_c}$$
3) Recall: The ratio of correctly identified class members (true positives of class $c$) to all true class members (true positives plus false negatives of the class) [49]:
$$\text{Recall}_c = \frac{TP_c}{TP_c + FN_c}$$
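The four metrics of Section VI (accuracy, precision, recall, and equally weighted F1-score) follow directly from their definitions. A minimal NumPy sketch is given below; the class names "PD" and "INT" and the toy labels are hypothetical, standing in for the HAC-derived true labels and the filter's predictions.

```python
import numpy as np


def classification_metrics(y_true, y_pred, classes):
    """Global accuracy plus per-class precision, recall and F1 (equal weights)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    report = {"accuracy": float(np.mean(y_true == y_pred))}
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Equal weighting of precision and recall gives their harmonic mean.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Toy example: 3 true PD pulses and 5 true interference pulses.
y_true = ["PD", "PD", "PD", "INT", "INT", "INT", "INT", "INT"]
y_pred = ["PD", "INT", "PD", "PD", "INT", "INT", "INT", "INT"]
m = classification_metrics(y_true, y_pred, classes=["PD", "INT"])
```

In the toy example, one PD pulse is rejected (lowering PD recall) and one interference pulse passes the filter (lowering PD precision), giving an accuracy of 0.75, PD precision and recall of 2/3, and an interference F1 of 0.8.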

Fig. 10. CC-based HAC clusters identified in the scenario I test case; the true labels for the test cases are defined by comparing the patterns with the PRPD in Fig. 6.

Fig. 11.Global accuracy for all the test cases for corona as PD.The lighter shade is for binary training, and darker for multiclass.

Fig. 12. Global accuracy for all the test cases for corona as interference. The lighter shade is for binary training, and darker for multiclass.