Detection and Classification of Fault Types in Distribution Lines by Applying Contrastive Learning to GAN Encoded Time-Series of Pulse Reflectometry Signals

This study proposes a new method for detecting and classifying faults in distribution lines. The physical principle of classification is based on time-domain pulse reflectometry (TDR). High-frequency pulses are injected into the line, propagate through all of its bifurcations, and are reflected back to the injection point. According to the impedances encountered along the way, these signals carry information regarding the state of the line. In the present work, an initial signal database was obtained using the TDR technique, simulating a real distribution line with PSCAD™. By transforming these signals into images and reducing their dimensionality, they can be processed using convolutional neural networks (CNN). In particular, contrastive learning in Siamese networks was used for the classification of different types of faults (ToF). In addition, to avoid the problem of overfitting owing to the scarcity of examples, generative adversarial networks (GAN) were used to synthesise new examples, enlarging the initial database. The combination of Siamese neural networks and GAN makes it possible to classify this type of signal using only synthesised examples for training and validation and only the original examples for testing the network. This solves the problem of the lack of original examples of this type of signal, which arises from natural phenomena that are difficult to obtain and simulate.


I. INTRODUCTION
The automatic detection and classification of short circuits (faults) in distribution lines (especially in low-voltage installations) is a hot research topic with significant challenges ahead. Several techniques are available for the detection and classification of the type of fault (hereinafter referred to as ToF) [1], [2]. One is the analysis of the event signal (high-frequency spike) that appears when a fault occurs somewhere in the network [3], [4], [5]. Owing to the catastrophic and chaotic nature of these events (faults), the acquisition, simulation, and analysis of these types of signals represent a very difficult problem.
Another way to detect and classify the ToF is to analyse the response to the injection of high-frequency pulses into the power grid. The different responses of the grid to these injected signals, owing to the different fault types, make it possible to classify these faults.
This study presents an analysis of these signals using neural networks. The objective was to solve the main problem of a lack of original examples by finding a method to train a neural network model to classify faults with good accuracy and proper generalisation.
To understand this problem, it is necessary to know that faults in distribution lines [6], [7] depend on a physical phenomenon which, as mentioned above, produces a catastrophic event in the electrical network (short circuit of one or several phases to earth). This phenomenon has a chaotic and random nature (falling trees, lightning, fire, etc.). Therefore, it is a major handicap that real signals are not easily available. By contrast, the training of a neural network [8] requires a large number of examples (typically tens of thousands). Neural networks learn to discover a particular pattern within a set of examples through a training process in which the weights of the network are adjusted. Each time a new example is shown, the output of the network is adjusted a little more, moving it closer to the real output (label). This makes it necessary to have a large number of examples so that the network can generalise the desired pattern well. The objective is to avoid ''learning'' just a few examples, which is known as overfitting [9].
On the other hand, the technique frequently used to deal with this phenomenon is to model the electrical network mathematically [10]. Using this model, we can produce a database of fault signals for certain scenarios and locations. However, even so, it is always a very complex task to obtain a database with the tens of thousands of fault signals typically needed to train a neural network with sufficient reliability; for instance, there may be a wide range of cases that produce each ToF. This implies that the electrical network may be in a very diverse state, even for the same ToF. Therefore, it is difficult to simulate all possible random conditions that would give rise to the same ToF.
However, a more plausible solution is to start from an existing database which includes several simulated faults. This database, although insufficient (hundreds of examples), is representative of the problem that needs to be solved. More details of this database are provided in subsequent sections.
It is at this point that our work starts by solving the problem of the limited amount of training data using a generative adversarial network (GAN) [11].
A GAN consists of two neural networks that are trained simultaneously. One of them, called the Generator, can generate an example from a random input (noise) of a certain dimension. The generated example is used as an input for another network, called the Discriminator. The Discriminator, on the other hand, receives these so-called Fake examples, along with others, from the original database. The latter, therefore, will be correctly labelled Real examples. The Generator attempts to generate more realistic examples, and the Discriminator attempts to identify whether an example is Real or Fake. With this ''zero-sum'' game, both networks are trained, and this process ends when the Discriminator is unable to tell whether the example generated by the Generator is Real or Fake.
From that point on, the Generator can be used to generate examples of this type. These have the particularity of having the main information that characterises them as belonging to that type. However, each example is different from the others because it is generated from a random signal (noise).
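The adversarial dynamic described above can be illustrated with a minimal one-dimensional sketch (this is not the architecture used in this work, which operates on images; all values here are illustrative): a Generator with a single parameter tries to match the mean of the real data, while a logistic Discriminator tries to separate Real from Fake samples.

```python
import numpy as np

rng = np.random.default_rng(42)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

real_mean = 3.0        # the "real data" distribution: N(3, 1)
theta = 0.0            # Generator parameter: g(z) = theta + z
w, b = 0.0, 0.0        # Discriminator: D(x) = sigmoid(w*x + b) = P(x is Real)
lr = 0.05

for _ in range(2000):
    x_real = real_mean + rng.normal()
    x_fake = theta + rng.normal()

    # Discriminator step: push D(real) -> 1 and D(fake) -> 0
    for x, y in ((x_real, 1.0), (x_fake, 0.0)):
        p = sigmoid(w * x + b)
        w -= lr * (p - y) * x      # gradient of the binary cross-entropy
        b -= lr * (p - y)

    # Generator step: move theta so that D labels the fake sample Real
    p = sigmoid(w * x_fake + b)
    theta -= lr * (p - 1.0) * w    # chain rule through D's logit

# theta has drifted towards the real mean: the Generator now mimics the data
```

When the two distributions overlap, the Discriminator's output tends towards 0.5 for both Real and Fake samples, which is the stopping condition described above.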
A final problem we face is that although these fake images are very similar to the originals, they do not share the same distribution. As a consequence, a ''standard'' network, even a sufficiently powerful one, will not be able to generalise well between different distributions. This issue has generally been addressed in other studies by mixing the data from both distributions so that the network learns both. In our case, on the contrary, we tackle the problem using Siamese networks [12], [13], thus exploiting their best feature: adapting well to different distributions.

II. ENVIRONMENT
The new challenges facing modern society have resulted in an increasingly important electrification of the energy system, with a growing role for distribution networks. It is therefore necessary to prepare these infrastructures to face new challenges in the future, including the installation of multiple distributed generation resources.
The detection and classification of faults in electricity grids [14], as well as their localisation, are considered essential requirements for implementing the smart grid concept. Unfortunately, locating a fault is a truly complex challenge, and it is also often an arduous task that involves the movement of specialised personnel with ground or aerial means across the entire affected section. This is the main reason why automatic fault location systems are very worthwhile, and they are the essence of this article.
Currently, a widely used option is to minimise the impact of outages by introducing self-healing mechanisms [15]. In any case, the use of this type of mechanism implies that the networks are strongly meshed (radial and redundant topology) and able to isolate a faulty section until the problem is solved, without the need for the entire network to be out of service. These faults are usually due to short circuits between one of the phases and the earth or between different phases. It should be borne in mind that, as all the sections of a network are connected in parallel, a short circuit is transferred to the entire network ''connected'' to that section. Once the protections of the section closest to the fault are opened, leaving that section isolated, it must be possible to detect the fault and its type, as well as to locate the distance to it, as accurately and quickly as possible.
Alternatively, in some lines (high-voltage and medium-voltage distribution lines), there are fault passage detectors [16]. These detectors continuously monitor the voltage and current of the network. When a fault occurs, the device detects it and can supply this signal as an input for the fault locator system. However, these types of detectors are not usually deployed in low-voltage lines because of their high cost; therefore, the detection and classification of faults is undoubtedly very useful information as a preliminary step to locating them.
In this article, our main objective is to address the task of automatically detecting/classifying faults, which is a preliminary work to solve the bigger problem of localisation, while being aware that both are parts of the same solution and have to work together.
We use neural networks to obtain promising results from the time series obtained from certain captured signals. The physical principle used to obtain these signals is time-domain reflectometry (TDR) [17], [18]. It consists of performing periodic pulse injections on the line in each of the phases of the network (R-S-T) so that the electrical response of the network is updated frequently (as explained in detail below). When a fault occurs, the returned signal carries implicit information regarding the ToF that has occurred. A trained neural network uses this information to classify the faults properly.
When TDR is employed, the pulses injected into each of the phases travel throughout the network and, at each impedance change (node, junction, etc.), part of the pulse continues its journey, and another part is reflected. This reflection of the pulses makes it possible to record at the injection point all the signals that ''return'' after ''bouncing'' through the different bifurcations of the network. These pulses carry embedded information regarding the network status. When the network is operating normally, the information from the signals is different from the information they carry when a fault has occurred, and this is precisely the information that the neural network is able to extract to determine the ToF that has occurred.
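The fraction of the pulse reflected at each impedance change follows from standard transmission-line theory. A small sketch of the reflection coefficient Γ = (Z₂ − Z₁)/(Z₂ + Z₁) shows why a short circuit to earth produces such a distinctive echo (the 50 Ω characteristic impedance used here is purely illustrative):

```python
def reflection_coefficient(z1, z2):
    """Fraction of the incident pulse amplitude reflected at a Z1 -> Z2 boundary."""
    return (z2 - z1) / (z2 + z1)

# Matched section: nothing bounces back
assert reflection_coefficient(50.0, 50.0) == 0.0

# Short circuit to earth (Z2 ~ 0): the pulse is fully reflected and inverted
print(reflection_coefficient(50.0, 0.0))    # -> -1.0

# An ordinary bifurcation (partial mismatch): only part of the pulse returns
print(reflection_coefficient(50.0, 150.0))  # -> 0.5
```

Each bifurcation therefore returns a partial echo, while a fault changes the pattern of echoes; it is this change that carries the ToF information.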

III. STATE OF THE ART
As discussed in the previous section, the detection/ classification process and the localisation process are part of a more global process which would be the localisation and interpretation of faults in electricity distribution networks. In other words, they represent two different phases of the same process.
Having said that, we wanted to give our state of the art a more general scope: in the following, we not only review state-of-the-art classification methods but also provide a summary of localisation methods (outside the scope of this publication).

A. METHODS FOR FAULT CLASSIFICATION
In this section, we show the state of the art in fault detection/classification. According to [19] and [20], it comprises three major groups:
- Prominent techniques
- Hybrid techniques
- Modern techniques

Prominent techniques are most commonly used for fault classification and are further divided into three subgroups. One of the techniques that falls under this group is numerical tools based on the wavelet transform (WT).
In this technique, the key is to choose a ''mother wavelet'' and then perform checks using versions of this WT. The WT can separate the signal into frequency bands that can then be analysed with the help of multi-resolution analysis (MRA). For example, it is possible to use the discrete wavelet transform (DWT) to classify faults once the fault currents are known at a given location [21].
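As a minimal illustration of how a wavelet decomposition isolates a fault transient, the sketch below uses the simple Haar wavelet (rather than the mother wavelets typically chosen in these works) on a toy signal:

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform.
    Returns (approximation, detail) coefficients; len(x) must be even."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass: smooth trend
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass: transients and spikes
    return a, d

# A flat current signal with a fault-like spike at sample 6
signal = np.array([1, 1, 1, 1, 1, 1, 9, 1], dtype=float)
approx, detail = haar_dwt(signal)

# The spike shows up as a single large detail coefficient
print(np.argmax(np.abs(detail)))   # -> 3 (the sample pair containing the spike)
```

Repeating the decomposition on the approximation coefficients yields the multi-level view that MRA exploits.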
Also in the same group, artificial neural networks (ANN) are usually used to tackle this problem. These systems can be trained by showing them labelled examples so that they learn the features common to each class. The trained systems can then classify new examples shown to them.
There are some experiences of this technique implemented in MATLAB software to detect faults on transmission lines. The output of the Simulink model was used to train the ANN to identify faults in transmission systems [22].
Also under this group, efforts have been made to address this problem by means of fuzzy logic. The fuzzy-logic technique uses a simple relationship between the input and output variables. Therefore, it performs simple control to deal with numerous issues, especially when the numerical model is not well known or is difficult to solve.
There are strategies for investigating overhead line failures based on a fuzzy system. For example, by comparing the s-transform and wavelet transform [23], it can be concluded that the fuzzy decision tree (DT) based on the s-transform provides accurate fault classification.
Hybrid techniques attempt to compensate for the shortcomings of separate methods. Neuro-fuzzy techniques are also found in this group; these neural systems attempt to adapt to changing situations. This technique arises from combining fuzzy inference systems, which link human learning and performance approaches, with certain improvement strategies.
One type of combination is the Stockwell transform (ST) and multilayer perceptron neural network (MLP-NN) / FFNN [24], [25] tested in a simulated IEEE 13-node test feeder. IEEE 13-node is a very small circuit model used to test common features of distribution analysis software. In the proposed technique, the three-phase current waveforms are measured from different points and then processed using ST to extract statistical features. The features are later fed into the MLP-NN / FFNN system to detect and classify the faults.
Another type of system that combines techniques is the wavelet and ANN approach, which combines the features of a wavelet transform and an artificial neural network to obtain better results in fault classification. Some work has been carried out on fault classification using current signals for thyristor-controlled series-compensated transmission systems by integrating the DWT and ANN algorithms [26].

On the other hand, the combination of wavelet and fuzzy logic is used to decompose the output signals of the simulated power grid. These output signals are then used as inputs to the fuzzy-logic block, which applies the particular rules defined for this example to determine the type of fault.
In [27], a method that uses the mother wavelet Daubechies4 (db4) and a combination of DWT and fuzzy logic to classify faults was developed.
Finally, wavelet and neuro-fuzzy techniques [28] can be combined for fault detection and classification using wavelet MRA coefficients only. A fault location method for a series-compensated transmission system using WT and ANFIS was developed in [29].
Modern techniques include support vector machines (SVM). This is a technique for learning separation functions in classification tasks (pattern recognition) and is based on statistical learning: the input vectors are mapped nonlinearly into a high-dimensional feature space. It has been effectively applied to many classification problems, for example, by combining the SVM and wavelet techniques to detect and classify fault types [30].
Another group is genetic algorithms (GA), which work with variable encoding. A GA uses a population of points at a time, in contrast to the single-point approach of traditional optimisation methods. Some proposed methods contain a pre-processing unit that depends on both the DWT and a GA, in which the DWT is used to extract characteristic features from the input current signal [31].
In addition, Euclidean distance-based methods belong to this group. The Euclidean distance between successive current samples can be used for power-line fault detection and to identify the faulty phase [32].
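A sketch of this idea on a synthetic single-phase current (the signal and threshold are illustrative assumptions; [32] applies the distance to the measured currents, and for three-phase vectors the absolute difference generalises to the norm of the difference):

```python
import numpy as np

def detect_fault(current, threshold):
    """Flag the first sample where the distance between successive
    current samples exceeds a threshold; None if no such sample."""
    jumps = np.abs(np.diff(current))
    idx = int(np.argmax(jumps > threshold))
    return idx + 1 if jumps[idx] > threshold else None

# Steady sinusoidal current, then a fault-like step at sample 50
t = np.arange(100)
i_phase = np.sin(2 * np.pi * t / 20)
i_phase[50:] += 5.0

print(detect_fault(i_phase, threshold=2.0))   # -> 50
```

Because normal samples change slowly relative to the sampling rate, the inter-sample distance stays small until the fault transient arrives; running the same check per phase identifies the faulty phase.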
Focusing on neural networks, which is the technique we use, we have found some additional articles of special relevance to the work developed in this paper. Some studies have used neural networks to process power-line voltage and current signals to classify faults [33], [34], [35], [36], [37]; the voltages and currents were digitised to form the input to the neural network. There are also some works on detecting and locating aged cable sections in underground lines [38]; in this case, the transfer function of the cable is used as the input of the CNN.
Some studies have also been conducted on fault detection, attempting to extrapolate the voltage values from several simulated power grids. These values were used to train a neural network to detect faults in a real electrical network [39].
We have found some studies in which convolutional neural networks (CNN) were used to process time series. Fault signals are time-series signals, so this kind of work is very relevant for us. In particular, EEGNet, a compact CNN for the classification and interpretation of electroencephalography (EEG) signals, has been used to classify this type of signal [40].
Another interesting work is the processing of the time signals of partial discharge phenomena in power grids using image transformations (scalograms) [41], in which the nature of the signals is very similar to our own. The transformation of this type of signal into images allows it to be processed by convolutional neural networks (CNN), which provide excellent results in image processing.
Another transform from time signals to images presented in the literature is the Gramian angular field (GAF) transform [42]. We studied this transformation in the present work for use with our signals.
The state of the art in deep learning includes publications discussing the use of a special type of network called the generative adversarial network (GAN). Using GAN, synthesised examples can be generated from a limited database. In our case, we face a scarcity of examples, so this type of neural network can probably be used to expand our database of signals of the transients produced in an electrical grid under fault [43], [44].
As has been shown, in the literature, there are several publications proposing the use of Neural Networks for the detection of faults and other phenomena of a similar nature in electrical distribution networks.
The possibility of being able to detect and classify the ToF by injecting periodic pulses into the network is currently state-of-the-art and can be a great help as a preliminary step in locating faults in automatic systems.

B. METHODS FOR FAULT LOCATION
Two state-of-the-art techniques are available for locating faults in power lines:
- Impedance measurement
- Methods based on travelling waves

1) IMPEDANCE MEASUREMENT
The first technique involves measuring the impedance of the line at the fundamental frequency from a point on the network. The point at which the fault occurs can be calculated by observing how the impedance varies when a failure occurs. Fault location in double-circuit power networks is presented in [45]; this new method considers the mutual effect of double-circuit lines and has been tested on the IEEE 13-node test feeder.
However, although this method works well for transport grids, whose topology is generally simple, with no bifurcations [46], it does not yield good results in distribution lines. In these lines, there are numerous bifurcations, and it is not possible to equip each section with a detector because of the high cost.
Moreover, measuring these networks from a single point results in a large error because of the lack of proportionality between the impedance value and the distance, caused by the multiple bifurcations of the network.

2) METHODS BASED ON TRAVELLING WAVES
Methods based on travelling waves rely on the propagation of waves through conductors and on the well-known telegrapher's equations. One of the techniques within this group consists of measuring the bounce of a high-frequency signal that appears when a failure event occurs [47]. Short circuits caused by faults produce a high-frequency transient that travels through the network (Fig. 1).
This technique has provided good results in transmission lines, where the use of two devices, one at each end of the section to be monitored, is widespread.
However, in distribution lines with many bifurcations, it is not possible to install a device at each node because of the high cost.
Another difficulty lies in the time synchronisation that must exist between the devices for the calculation of the distance, as well as in the prior knowledge required of each section of the lines.
Other travelling-wave methods are based on the injection of a signal and the analysis of the electrical response of the system. Following the telegrapher's equations, these signals undergo reflections at each impedance change, as well as attenuations and distortions, until they return to the starting point. The occurrence of a fault implies a change in the characteristic impedance such that localisation is possible, at least in theory.
This technique is known as time-domain reflectometry (TDR). In distribution lines, the biggest challenge that TDR must overcome is resolving the multiple reflections caused by shunts, which complicate the analysis of the electrical response of the system. Thus, in [48] it is proposed to inject through a healthy and a faulted phase, and then perform a modal transformation to decouple the three-phase signals (using the Karrenbauer transformation matrix). The problem with this methodology is that it requires a healthy phase, which is not always available (three-phase faults), and that the injected pulse must have a very high amplitude (high voltage), which requires the network to be de-energised.
Techniques have also been proposed to measure the travel time of the transients generated by the fault itself and by an injected signal; comparing the two then allows the fault location to be calculated [49]. The method is accurate at the simulation level; however, it requires a high sampling rate (the use of the wavelet transform is proposed). In addition, it is not possible to locate all faults, because detection requires the fault to coincide temporally with the high part of the sine-wave cycle.
Another proposed technique based on TDR consists of performing a periodic injection on the line in a pre-fault state so that the electrical response of the network is updated frequently. When a fault occurs, the injection is repeated, and the responses are compared (Fig. 2). Thus, an attempt can be made to locate the fault from the first point of divergence between the two signals [17], [47], [50], [51]. In the figure, it can be observed that the signals begin to diverge at approximately 50 × 10⁻⁹ s.
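Once the divergence instant is read from the two responses, the distance to the fault follows from the round-trip travel time. A minimal sketch, assuming an illustrative velocity factor of 0.6 (the actual propagation speed depends on the cable):

```python
# Distance to the fault from the pre-fault / fault divergence time (TDR)
C = 3e8             # speed of light in vacuum, m/s
v = 0.6 * C         # assumed propagation speed in the cable (velocity factor 0.6)
t_div = 50e-9       # divergence instant read from the responses (~50 ns)

# The pulse travels to the fault and back, hence the division by two
distance = v * t_div / 2
print(distance)     # -> 4.5 (metres)
```

In a real distribution line, the multiple bifurcations make this mapping from time to distance far less direct, which is precisely why this work focuses first on detection and classification.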
The pre-fault signal (line response to pulse injection when the fault has not yet occurred) and the fault signal (line response to pulse injection when the fault has already occurred) are almost the same until the pulse reaches the point where the fault is located. It is this phenomenon that is exploited in the TDR technique to locate the fault and, in our case, to detect and classify it.
Regardless of the type of fault, the pre-fault and fault signals begin to differ from the moment they reach the point where the fault has occurred.

3) USE OF FAULT PASSAGE RELAYS TO DETECT THE FAULT
In medium-voltage lines (distribution lines, 15 kV to 20 kV), there are devices in transformer substations called fault passage relays. These devices monitor the voltage and current in each phase and detect when a fault occurs in any phase. Once the fault passage relays warn that a fault has occurred, the locating system can operate with certainty that the line is indeed faulted. Protection systems are alerted and can open the section of the line where the fault is located until the problem is resolved (fault cleared).
However, in low-voltage distribution lines (230 V ac), these devices usually do not exist, and as there are large numbers of bifurcations, the installation of many devices would be unfeasible.

C. STRUCTURE OF THE PAPER
This paper aims to demonstrate the use of the pulse injection technique (TDR) for fault detection and classification as a preliminary step in fault localisation.
For this purpose, neural networks are used, which are trained to classify the signals obtained as a response of the electrical grids to the injected signals.

IV. METHODOLOGY
This section describes the proposed methodology based on the injection of signals into the distribution line and the analysis and classification of the received bounces. The methodology, including the pulse injection and subsequent transformations, is illustrated in Fig. 3. The following subsections explain the different steps of the methodology in detail.
To obtain a set of examples with which the neural network can be trained, the electrical grid of interest is modelled, as well as the injector with which the pulses are produced in each of the phases.
In this study, a real electrical grid was modelled using PSCAD™ software. Based on the above, the contribution of the present paper is threefold. The first is to demonstrate that there is sufficient information embedded in the signals reflected by the network in response to injected pulses (TDR), and that this information can be extracted to detect faults and classify them by type.

The first process consists of creating a database with signals obtained using the software (PSCAD™). To do this, a real electrical network was modelled beforehand, and a series of five fault types was simulated at different points in the distribution line:
- Fault type 1: Short circuit between R and Earth.
- Fault type 2: Short circuit between S and Earth.
- Fault type 3: Short circuit between T and Earth.
- Fault type 4: Short circuit between R-S and Earth.
- Fault type 5: Short circuit between R-S-T and Earth.

The procedure followed, based on the time-domain reflectometry (TDR) technique, consists of injecting short-period pulses (∼10 ns) separated by a few seconds (e.g. every 5 s). It should be noted that these pulses are injected into each of the three phases (R-S-T in a three-phase system).
According to the physics of transmission lines, these pulses travel through the network and are reflected at every bifurcation; therefore, some of them bounce back. The magnitude of these reflections depends on the impedance of the line at each bifurcation.
As shown in Fig. 4, the pulses were injected into each phase with a time lag to allow the signal to be extinguished and therefore, to prevent the induction of the pulses into other phases.
Injection and reading of the response signal were performed at the same physical point on the line, in such a way that the response signal is captured as soon as the pulse is injected.

This technique is suitable for high-, medium-, and low-voltage distribution lines. However, its use is most relevant for low-voltage lines, where fault passage relays are not normally installed because of the high installation costs.
In this study, the injected signals were sampled in a simulator at 100 Msps (Mega samples per second). For each of the three injected signals (R-S-T), approximately 40 µs were digitised, after which the signal was practically extinguished. Thus, the total sampling time for the three phases was 40 µs × 3 = 120 µs. At the indicated sampling rate, 12,000 digitised values (parameters) were obtained, which could be used as input to the neural network.
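The sampling budget quoted above can be checked directly:

```python
fs = 100e6            # sampling rate: 100 Msps
t_phase = 40e-6       # time digitised per phase: 40 us
phases = 3            # R-S-T

samples_per_phase = round(fs * t_phase)        # 4,000 values per phase
total_samples = samples_per_phase * phases     # 12,000 network input values
total_time = t_phase * phases                  # 120 us of sampling in total
print(samples_per_phase, total_samples)        # -> 4000 12000
```

These 12,000 values are the raw input that the transformation phase below reduces before it reaches the neural network.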

B. TRANSFORMATION PHASE: FROM TIME SIGNALS TO IMAGES
Currently, in the state of the art, there are several examples of time-series problems that are treated very satisfactorily by Convolutional Neural Networks through their prior transformation from time series to images [41], [42], [52].
Therefore, a standard technique is to convert a time series into images for better and easier data processing. For instance, in [52], the Gramian Angular field (GAF) transform was used to transform the time series into images.
In the GAF technique, we represent the time series in a polar co-ordinate system instead of typical Cartesian co-ordinates. Here, the amplitude of the signal is encoded as the angular cosine, and the timestamp is the radius, as shown in Equation (1). This information is gathered in the form of a matrix in which the relationships between different time instants can be identified. In this matrix, each element is the cosine of the summation of angles (GASF) (2) or the sine of the difference of the angles (GADF) (3).

Applying this technique to each phase (R-S-T), we digitised 4,000 samples. In total, we have 12,000 samples (4,000 × 3), which leads to high dimensionality in the output matrix of the GAF transformation. To reduce this dimensionality [17], an algorithm called Piecewise Aggregate Approximation (PAA), proposed in [38], was used. First, we use the PAA algorithm to reduce the dimensionality to 128 (the choice of this parameter is justified below in the ''Results'' section). After applying PAA, GASF is applied to obtain a 128 × 128 × 3 matrix, where each channel corresponds to the signal of one of the three phases of the distribution line (R-S-T). As demonstrated in [52], the GASF transformation has an advantage over the GADF transformation in that the temporal signal can be reconstructed from the image (Fig. 5). This property allows us to compare the original signals with the ''reduced'' ones (Fig. 6). Once the signals have been adapted in this way, the new format can be used in the subsequent processing phases.
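The PAA reduction and GASF construction described above can be sketched as follows for a single phase (the sinusoid is an illustrative stand-in for a TDR response, and the clip to [−1, 1] assumes the series has already been rescaled, as the GAF technique requires):

```python
import numpy as np

def paa(x, segments):
    # Piecewise Aggregate Approximation: mean of (near-)equal-length windows
    return np.array([w.mean() for w in np.array_split(np.asarray(x, float), segments)])

def gasf(x):
    # Gramian Angular Summation Field of a series scaled to [-1, 1]:
    # amplitude -> angle phi = arccos(x); GASF[i, j] = cos(phi_i + phi_j)
    # (the GADF instead uses sin(phi_i - phi_j))
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])

# One phase: 4,000 samples reduced to 128, then turned into a 128 x 128 image
phase_r = np.sin(np.linspace(0, 8 * np.pi, 4000))   # stand-in for a TDR response
reduced = paa(phase_r, 128)
image = gasf(reduced)
assert image.shape == (128, 128)
# Stacking the R, S and T images as channels yields the 128 x 128 x 3 input
```

Note that the main diagonal of the GASF equals cos(2φ) = 2x² − 1, which is what makes the reduced series recoverable from the image, the property exploited in Fig. 6.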

C. DATASET AUGMENTATION PHASE: GAN
At this stage, our database was converted to a GASF image database. However, owing to the problem discussed above and the difficulty of obtaining more examples quickly and easily, the sample set had a limited size of 200 signals/images that included five types of faults. Thus, only 40 samples were available for each class.
Because our next phase is automatic classification, a very small number of samples is a real handicap; it is well known that any learning algorithm depends highly on a large training dataset.
This issue is common in many cases where data extraction is extremely difficult, and in some cases, becomes impossible. As previously mentioned, simulating tens of thousands of examples is unfeasible because of the simulation time required. Another related problem is the need to manually adjust the parameters to generate diverse fault types. A wide range of cases can occur due to the stochastic nature of the phenomenon for each ToF. Therefore, we used GAN to generate synthetic examples, thereby increasing the initial database.
As we have seen, dimensionality reduction leads to a substantial reduction in training time and allows for an easier implementation of the GAN. We tested different types of GAN: the convolutional GAN (CGAN) [53], the conditional GAN (cGAN) [54], and even the conditional deep convolutional GAN (cDCGAN) [55]. We did not obtain satisfactory results owing to the well-known convergence problem of this type of neural network known as ''mode collapse''.

The GAN consists of a generator (G) and a discriminator (D), as shown in Fig. 7. The Generator takes a random signal (noise) as input, which it uses to produce new output images of the required dimension (Generated samples) (128 × 128 × 3). On the other hand, the Discriminator is presented with original images (Real samples) and images generated by the Generator (Generated samples).
The Discriminator has to try to label them as ''True'' or ''False''. The successive training of the generator and discriminator ends once the discriminator is not able to distinguish the generated examples from the real ones. At this point, the generator can be used to produce as many examples as necessary (Fig. 8).
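A minimal PyTorch sketch of such a generator/discriminator pair producing 128 × 128 × 3 images is shown below. This is not the exact architecture used in the paper: the layer widths and the 100-dimensional noise vector are assumptions made for illustration.

```python
import torch
import torch.nn as nn

LATENT_DIM = 100  # assumed size of the input noise vector

class Generator(nn.Module):
    """Maps a noise vector to a 3-channel 128 x 128 image (Generated sample)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(LATENT_DIM, 256, 8, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),  # 8x8
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),         # 16x16
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),           # 32x32
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),            # 64x64
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),                                 # 128x128
        )

    def forward(self, z):
        return self.net(z.view(-1, LATENT_DIM, 1, 1))

class Discriminator(nn.Module):
    """Outputs the probability that a 3 x 128 x 128 image is a Real sample."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.LeakyReLU(0.2),    # 64x64
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),   # 32x32
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),  # 16x16
            nn.Conv2d(128, 1, 16), nn.Sigmoid(),             # 1x1 probability map
        )

    def forward(self, x):
        return self.net(x).view(-1)
```

Training alternates between the two networks with the usual adversarial objective until the discriminator can no longer separate Generated from Real samples.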

D. CLASSIFICATION PHASE: FAULT TYPES
Finally, at this stage, we have a database of ''synthetic'' GASF data (Generated samples) produced with the help of GAN, and a set of ''original'' data (Real samples) obtained from the simulation.
Usually, the training, validation, and testing data are split into a mixture of Generated and Real samples to achieve measurable results and improvements. For example, in [44], the authors proposed using mixtures of no more than 50% Generated samples.
However, to avoid this limitation, we propose a different training solution, in which the distribution of the training, validation, and testing data is made up as shown in Table 1. The training and validation sets are composed only of Generated data, and the test set is built only with Real samples.
As a first check, we verified the results obtained using well-tested networks. From the simplest to the most complex, LeNet, AlexNet, and ResNet18 were used in this work. With these classifiers, the results were very poor, as expected and as previously reported. All these tests lead to the conclusion that the model cannot generalize: it only learns the distribution of the training examples (Generated) but fails to work with Real samples.
Then, as a suitable alternative, we aimed to verify whether another neural-network-based approach could be trained only with synthetic data and still perform well when tested with original data. For this purpose, we used Siamese convolutional networks [13]. As shown in Fig. 9, a Siamese network consists of twin nets that share the same weights. In this case, even though there are two convolutional networks, overall we have half the parameters with respect to the GAN (46,854,530). The main feature of these networks is that they can be trained to learn an embedding space in which the features of samples of the same class lie close together while those of different classes lie far apart. This suits our scenario, in which the GASF images of different fault types appear very similar to each other. This is achieved by exposing the network to pairs of similar and dissimilar observations.
The network minimizes the Euclidean distance between similar pairs and maximizes the distance between dissimilar pairs (L = contrastive loss).
In (4), we can see that s1 and s2 are two samples (GASF images), y is a Boolean denoting whether the two samples belong to the same class, α and β are two constants, and m is the margin.
where (5) is the Euclidean distance computed in the embedded feature space. It can be seen that if y is 1 (different classes), the left-hand term disappears, and we attempt to maximize the distance between examples. On the other hand, if y is 0 (same class), the right-hand term disappears, and we attempt to minimize the distance between examples.
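This behavior can be sketched in numpy as follows; the particular values of α, β, and the margin m are assumptions for illustration, not the constants used in the paper.

```python
import numpy as np

def contrastive_loss(e1, e2, y, alpha=0.5, beta=0.5, margin=1.0):
    """Contrastive loss for a pair of embeddings, following Eq. (4).
    y = 0 -> same class (pull together), y = 1 -> different class (push apart).
    alpha, beta, and margin values are assumed here."""
    d = np.linalg.norm(e1 - e2)                    # Euclidean distance, Eq. (5)
    return alpha * (1 - y) * d**2 + beta * y * max(0.0, margin - d)**2

# Same-class pair: loss grows with the squared distance
same = contrastive_loss(np.array([0.0, 0.0]), np.array([1.0, 0.0]), y=0)
# Different-class pair already farther apart than the margin: zero loss
diff = contrastive_loss(np.array([0.0, 0.0]), np.array([3.0, 0.0]), y=1)
```

With y = 0 only the first term survives and the loss shrinks as the pair gets closer; with y = 1 only the second term survives and the loss vanishes once the pair is separated by more than the margin.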

V. RESULTS
This section describes the methodology used in our experiments. First, we introduce the configuration of the examples in the database, specifically how they were selected for the different training phases (training, validation, and testing). Then, the results obtained for each case are presented.

A. STRUCTURE OF THE EXPERIMENTS
To carry out the experiments in this work, we relied on a database of 200 Real samples (Fig. 6 (a)), converted to images using the GASF transformation (Fig. 6 (b)), and 10,000 GASF Generated samples (2,000 for each of the five fault types under study).
In the first group of experiments, we focused on demonstrating the potential losses due to the GASF transformation and the subsequent suitability of GAN images. This will provide us with significant conclusions to determine whether these steps affect the quality of our results.
Finally, when dealing with the Siamese network in the last experiment, a set of pairs randomly selected from the Generated samples was used to train the network. Two important facts should be mentioned at this point: 1. This set was chosen to prevent imbalance among the types. 2. The training and validation phases were carried out with synthetic data (Generated), and the tests were carried out with the original data (Real).

B. STUDY OF THE GASF TRANSFORMATION AND ITS RELATED LOSSES
As previously mentioned, the first step in processing is to transform the time signals into images using the GASF transform. In this step, it is necessary to decide which dimensionality reduction should be applied when performing this transformation. As explained in [56], this dimensionality reduction is obtained by applying Piecewise Aggregate Approximation (PAA).
PAA is therefore a source of potential loss and distortion, and a study in the frequency domain can show how this reduction distorts the original signals. To proceed, we compute the FFT of the recovered, already reduced signals according to the following procedure:
- A Real sample is initially transformed into GASF images of different dimensionalities (4000, 1000, 512, and 128, respectively).
- Each GASF image is back-processed to obtain new versions of the same signal.
- Finally, we compute the FFT of these recovered signals (see Fig. 10).
From the study of these FFTs, it can be deduced that this dimensionality reduction is equivalent to a low-pass filter: as the reduction increases, the higher frequencies disappear while the lower frequencies are preserved.
For other types of studies, where the objective is to extract other types of more complex information (distance to the fault, for instance), these high frequencies can become relevant. In our case and in view of the subsequent results, the assumed reduction (128 × 128) is adequate and allows us to save time in the training/validation phase.
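The low-pass effect of PAA can be checked numerically. The sketch below (a toy signal, not one of the paper's TDR traces) compares the fraction of spectral energy above an arbitrary cutoff bin before and after reducing 4,000 samples to 128:

```python
import numpy as np

def paa(signal, segments):
    """Piecewise Aggregate Approximation; trailing remainder samples dropped."""
    n = len(signal) - len(signal) % segments
    return signal[:n].reshape(segments, -1).mean(axis=1)

# Hypothetical test signal: low-frequency tone plus high-frequency ripple
t = np.linspace(0, 1, 4000, endpoint=False)
sig = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 900 * t)

def high_freq_energy(x, cutoff_bin):
    """Fraction of FFT magnitude located above a given frequency bin."""
    spec = np.abs(np.fft.rfft(x))
    return spec[cutoff_bin:].sum() / spec.sum()

before = high_freq_energy(sig, 50)            # full-resolution signal
after = high_freq_energy(paa(sig, 128), 50)   # reduced to 128 samples
# after < before: the reduction behaves like a low-pass filter
```

The window averaging in PAA suppresses the 900 Hz ripple almost entirely while leaving the 5 Hz component intact, which is consistent with the FFT observations in Fig. 10.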

C. ANALYSIS OF POTENTIAL MEASUREMENT ERRORS
Simulations based on real power distribution networks involve many difficulties in obtaining an adequate amount of data. This is why the central aim of this article is to obtain valuable synthetic data (Generated samples) that can be used to train a neural network and obtain optimal results. At the same time, simulations leave some uncertainty about the performance of the classifier in the face of possible failures of the measurement devices that can occur in a real environment. Indeed, this is a significant factor that distinguishes a simulated network from a real one, which is why this aspect has a specific section in this article.
In this sense, we focus on the study of three types of measurement errors:

1) RANDOM ERROR
This error is configured as a white-noise signal with the following properties: a) the signal is not statistically correlated between two different time instants, and b) its power spectral density (PSD) is constant throughout the spectrum. This type of error provides information on the behavior of the classifier over the entire frequency spectrum.
The results of the tests with this type of error are presented in Fig. 11, where errors of different signal-to-noise ratios have been used.
Several training epochs with different accuracy rates were chosen, and each of them was evaluated with noisy signals of 1 dB, 10 dB, 20 dB, 30 dB, and 40 dB. Two conclusions can be drawn from the figure. On the one hand, the network is stable, showing similar accuracy rates in the epochs where its performance is below 100%; the exception is epoch 3, where there is a generalized drop in performance in the presence of noise. On the other hand, in the epochs with a 100% hit rate, the network maintains 100% accuracy regardless of the noise level.
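Corrupting a trace at a prescribed SNR can be sketched as follows (a generic implementation, assuming additive white Gaussian noise scaled from the signal's mean power; the test signal is hypothetical):

```python
import numpy as np

def add_white_noise(signal, snr_db, seed=0):
    """Corrupt a signal with white Gaussian noise at a given SNR (in dB)."""
    rng = np.random.default_rng(seed)
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10)   # SNR = 10 log10(Ps / Pn)
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)

# Hypothetical trace corrupted at the SNR levels used in Fig. 11
sig = np.sin(np.linspace(0, 8 * np.pi, 1000))
noisy_versions = {snr: add_white_noise(sig, snr) for snr in (1, 10, 20, 30, 40)}
```

Each corrupted version is then fed to the classifier to measure the accuracy drop at that noise level.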

2) BIAS ERROR
Bias error provides an estimation of the accuracy of the measurement system and represents the systematic error that can occur in it. Fig. 12 shows the results for this type of error. The experiment is set up identically to the random-error case, and the results are also very similar: a drop in performance at epoch 2, and identical behavior whenever 100% validation accuracy is obtained.

3) AMPLIFICATION ERROR
In the latter case, we consider the error that may be introduced by the instrumentation, due to its limited bandwidth.
The classifier's tolerance to this type of error has a much simpler justification, as this would be a subset of the errors considered in the first two cases.

D. CHECKING THE SUITABILITY OF THE IMAGES OBTAINED WITH THE GAN
As mentioned above, after transforming the temporal signals into images, we trained a GAN to synthesise examples and increase the database size. This would be useless if those images were not of sufficient quality and did not closely resemble the real ones. To check this, we select the Real signals and compare them with the Generated signals, converted back from GASF images, to verify whether they are suitable. The comparison is therefore made with signals rather than with GASF images, for a better understanding of any negative effects.
In Figs. 11 and 12, we can see the comparison between each harmonic of all the signals, both Real and Generated, for each ToF. In these figures, the amplitude variation of each harmonic is represented by the black lines. Similarly, the green flag represents the 50th percentile.
Looking at the graphs and the moving average (blue line), it can be seen that both types of signals have a very similar frequency distribution pattern, which means that the GAN has synthesised consistently in terms of the frequency spectrum and its amplitudes.
Another aspect to highlight is that, in the case of the Generated signals, the percentile variation remains constant throughout the frequency spectrum, whereas the Real ones show different variabilities in each of the harmonics. Table 2 lists the maximum and minimum values shown in Fig. 13. It can be stated that, in general, both sets of signals are strongly correlated. However, some Real signals show a low correlation (blue color) with the rest of the Generated signals. Fig. 15 (a) shows that they have a frequency distribution pattern different from the rest. In these cases, the GAN has not learned to generalize this type of signal, which may be a limitation of GAN. The rest of the examples, with similar patterns, were well synthesised by the GAN and showed strong correlations.
Finally, the reader can appreciate the degree of correlation between the Real signals themselves (see Fig. 14 and Table 3). The poorly synthesised signals show a different pattern because they are originally poorly correlated with all the others.
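A comparison of this kind can be sketched by correlating the harmonic magnitude profiles of two signals. This is a generic sketch, not the paper's exact procedure; the number of harmonics and the toy signals are assumptions.

```python
import numpy as np

def harmonic_profile(signal, n_harmonics=64):
    """Magnitudes of the first n_harmonics FFT bins (DC excluded)."""
    return np.abs(np.fft.rfft(signal))[1:n_harmonics + 1]

def spectral_correlation(sig_a, sig_b):
    """Pearson correlation between the harmonic profiles of two signals."""
    return np.corrcoef(harmonic_profile(sig_a), harmonic_profile(sig_b))[0, 1]

# Hypothetical Real-like and Generated-like signals with similar spectra
t = np.linspace(0, 1, 512, endpoint=False)
real_like = np.sin(2 * np.pi * 5 * t) + 0.3 * np.sin(2 * np.pi * 20 * t)
gen_like = 0.9 * np.sin(2 * np.pi * 5 * t) + 0.35 * np.sin(2 * np.pi * 20 * t)
r = spectral_correlation(real_like, gen_like)   # close to 1 for similar spectra
```

A low value of this coefficient flags signals whose frequency distribution pattern the GAN failed to reproduce.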

E. TRAINING STANDARD NEURAL NETWORKS MODELS
Our first attempt to classify was to train the standard models LeNet [57], AlexNet [58], and ResNet18 [59] with our new extended database to verify the results. Fig. 16 shows the progression of the complete training, validation, and testing processes for the AlexNet model. We continue applying the same strategy to ensure a fair comparison (training and validation phase using newly Generated data and testing results with the Real dataset).
The model can obtain values close to 100% after a few epochs, both in training and validation (with Generated data).
However, in the test stage, with Real examples, the accuracy is poor, which means that the model cannot generalize well. This is in line with what exists in the literature regarding the need to train using a limited number of synthetic examples [44]. Table 4 lists the training results of the LeNet, AlexNet, and ResNet18 models. Similar results were obtained for the three models.

F. TRAINING THE SIAMESE NEURAL NETWORK
Siamese networks are well known for their effectiveness in one-shot or few-shot learning strategies [28], being able to adapt quickly to new distributions.
Applying the same learning strategy as in the previous sections, Fig. 17 shows the evolution of the training loss for each epoch, as well as the loss and accuracy in the training and validation phases for different learning rates.
Smaller learning rates yield smoother curves. We would also like to highlight the performance with the test data. The Siamese network (SN) is able to generalize between both distributions (synthetic and original), and the results are even better when the correct learning rate is chosen (in this case, 1e-5). Fig. 18 compares the validation and test accuracies for each epoch as a function of the learning rate.
In conclusion, the network converges to validation accuracy values of approximately 70% and test accuracy values of approximately 80% for the correct learning rate. This represents an improvement of more than two times with respect to the standard networks trained in Section D. This result shows that contrastive learning helped us to effectively classify original signals: the SN has been able to classify original examples that it had never seen before and that belong to a different distribution.
It is also worth noting that, with high learning rates (1e-4), as shown in Fig. 17 (a), and despite the resulting high variability, the Siamese network is still able to reach 100% accuracy rates in both the validation and test phases.
As mentioned before, this behavior can be explained by the fact that this type of network can be used for one-shot learning, so it does not require much training. Thus, the model can correctly classify never-seen examples belonging to another distribution. With a sufficient number of synthetic examples, only a few epochs were needed to correctly learn to separate and classify unseen original samples.

VI. CONCLUSION
We propose a solution to the problem of fault detection and classification with few examples: the generation of synthetic examples by means of GAN networks, combined with the application of contrastive loss (Siamese networks).
In relation to the objectives set in this study: because the training and validation sets belong to the same data distribution (Generated samples), generalization could not be assessed from them alone. To certify that the system does not merely generalize to the training distribution, a test was performed on the 200 original samples, which belong to a different distribution.
This new method facilitates the training of neural networks with this type of signal, for which few examples are available owing to their nature.

VII. FUTURE WORK
Our work is directly linked to a research institution that is strongly committed to TDR technology, which leads us to continue advancing in this field. First, we are trying to understand the limits to which we can take neural networks in the understanding of this type of phenomenon; second, we are excited by the results obtained, and we are encouraged to discover new particularities that can help solve different problems in this area.
However, our short-term points of interest are as follows:
1. To explore whether our method is also valid for high impedances and for simultaneous fault detection.
2. To explore whether our method may be affected by the presence of distributed generation in the distribution networks.
3. To explore whether the neural network can generalize the learned fault types to other real distribution lines, without retraining or at least with a minimal learning phase. The versatility of the process must be such that it is functional in diverse environments.
4. To explore the possibility of extracting more complex information, such as fault location, by adapting the technique described in this paper. The challenge here is to move from a classification problem to a regression problem by generating synthetic examples.
5. Finally, we believe that we have kept open the door that others have opened with one-shot learning methods. This philosophy has huge potential to be pursued, and new contributions can generate the common know-how that the research community needs.

In 1999, he worked in the automotive field at the Production Department, Opel Spain, for four years. Since June 2002, he has been a Professor at the University of Zaragoza. In the meantime, he was a Coordinator of the smart vehicle initiative at the Aragon Institute for Engineering Research (I3A). He currently holds a European patent and has led more than six projects funded in public competitions, more than 20 projects with companies, 12 indexed publications, and more than 50 contributions to international conferences. It is worth mentioning the 3 million awards for his research in the CvLAB Research Group. His teaching work has focused on electronics, from its basics to power electronics, including embedded systems. Within the university, he has developed his research in the area of computer vision.
Since 2012, he has focused his research interests on the field of neural networks and, more recently, on deep learning.

At the CIRCE Technology Center, he was in charge of the Innovation and Promotion Unit, which he himself created in May 2009. He has been the General Director of the CIRCE Technology Center since April 2016, having previously been its Executive Director since January 2011. In November 2011, he was designated by the Science and Innovation Ministry as an Expert in the Energy Area Committee of the 7th Funding Program of the European Union, in charge of the coordination of electricity-grid topics. He has participated in more than 35 R&D+i projects, in 16 of them as the primary researcher. He has authored ten articles in indexed journals and more than 50 contributions to international congresses. He participates actively in forums, associations, and platforms linked to his activity lines. He has eight patents, of which seven are being exploited. He researched the reduction of the impact of power-electronic source grids using passive filters in the kilowatt range. This research earned him a Ph.D. degree (specialized in electrical engineering) from the University of Zaragoza, in 2000.
JOSE SALDANA (Senior Member, IEEE) was born in San Sebastián, Spain, in 1974. He received the B.S. and M.S. degrees in telecommunications engineering and the Ph.D. degree in information technologies from the University of Zaragoza, Spain, in 1998, 2008, and 2011, respectively. He is currently a Senior Researcher at the CIRCE Technology Center. He has participated in research projects related to the digitalization of power systems, including ICT performance and security, and the use of wireless communications in industrial environments. He has published over 60 articles in peer-reviewed journals and conferences and RFC7962 in the Internet Engineering Task Force (IETF) on alternative network deployments. His research interests include wired and wireless networks with tight delay constraints, including multimedia services and digital communications in electric substations. He serves on the Editorial Board for IEEE ACCESS (Associate Editor) and the KSII Transactions on Internet and Information Systems (Area Editor). He has also served on the Steering Committee and TPC for many conferences, such as the IEEE Consumer Communications and Networking, the ACM Multimedia Systems Conference, IEEE ICC, and the IEEE Globecom.