High Impedance Fault Location Methods: Review and Harmonic Selection-Based Analysis

High Impedance Faults (HIFs) are recurring events in electrical Distribution Systems (DSs) and occur through contact between energized conductors and high impedance surfaces. HIFs may pose hazards to living beings and cause bushfires. However, HIF protection has not been completely solved because the small fault current and varying impedance prevent traditional protection techniques from functioning correctly. In the literature, researchers have mainly focused on detection techniques. Thus, the development of HIF Location Methods (HIFLMs) is recent, and evidence for conclusive solutions is still lacking. Moreover, to date, no existing study reviews the main challenges concerning HIFLMs in DSs. This paper proposes a systematic analysis of the common stages in the design of the main existing HIFLMs. The strategy is to evaluate the similar characteristics that define a common research path regarding the challenges faced in real-world conditions. Additionally, this paper proposes a case study to assess the best input signals, metrics, and machine learning-based decision algorithms of a new HIFLM. The results are promising, with high identification rates, even in noisy conditions. The methodology can help to select the datasets for supervised learning-based HIFLMs. This paper’s main contributions are highlighting the state of the art of current methods and supporting the development of HIFLMs.

High Impedance Faults (HIFs) represent 5 to 20% of fault occurrences in Distribution Systems (DSs) [1]. They occur due to the contact between energized conductors and high impedance surfaces, such as vegetation, asphalt, sand, grass, or cement. They can occur when conductors break [2] or when the conductor contacts vegetation [3]. One of the main problems related to HIFs is the risk of exposing humans and animals to electric shock and causing bushfires [3]. HIFs can also interrupt the electricity supply when they result in a conductor rupture.
Due to the previously mentioned issues, it is essential to develop solutions for HIF protection on DSs. The interest in developing studies regarding this type of fault started in the 70s [3]. Since then, efforts have been mainly focused on fault detection. However, a complete protection scheme should also include the fault location, which decreases the time needed for utility maintenance teams to identify the fault location and restore the energy supply.
There are several constraints on HIF location, such as the fault's non-linear impedance and low fault current. Typically, in low-impedance fault location techniques, the estimation error due to load current variation after the fault can be ignored. However, for HIFs, this assumption is inaccurate, as the fault current is usually lower than the load current. Moreover, due to the nonlinearity and low amplitude of the HIF current, the uncertainty in the estimated distance is considerable, increasing the chance of errors [4].
When analyzing the existing HIF Location Methods (HIFLMs), there is no consensus on how a new algorithm should be tested. A wide range of metrics and decision algorithms are used to locate the fault. In the literature, there are comprehensive studies regarding HIF detection [5], [6]. These studies also cover a few HIFLMs, but they do not perform a specific analysis of their challenges and development stages. Thus, there are no previous studies concerning a critical and comprehensive comparison of existing HIFLMs in DSs. Hence, this paper's extended version performs an unprecedented critical review of these techniques. The proposed approach is to identify the main steps addressed to develop each method and, by comparing them, outline the characteristics that make an algorithm more appropriate to real-world conditions. Hence, this study can contribute to establishing the state of the art, also performing a review of commercial solutions. Furthermore, to help researchers develop new and practical HIFLMs, this paper also proposes an analysis of training and test data for a new HIFLM. The methodology aims to establish the best approach for a supervised-learning-based HIFLM regarding its training/testing data and noise levels. The analysis consists of several tests, selecting the input signals, extracting the metrics, and determining the features. The results show high identification rates, revealing the best solutions for a case study.
Accordingly, this paper has been organized as follows. Section II exhibits the critical review of HIFLMs. Section III presents the proposed analysis methodology, and Section IV shows the results. Lastly, Section V concludes the analysis.

II. A REVIEW ON THE STAGES TO DEVELOP HIFLMs
The selected HIFLMs discussed in this paper were published in major journals and conferences in the area. There are few existing techniques for HIF location, as it is a recent research field, starting in 2012. The selected papers aim at locating HIFs in DSs and present comprehensive tests of their real applicability. Their main aspects are described in this paper.
After selecting the techniques, the review methodology was addressed, identifying similar characteristics among them. Overall, HIFLMs are developed considering the following stages: i) data acquisition through simulation or real tests; ii) acquired data processing, iii) metrics extraction, and iv) a decision algorithm using a mathematical technique or a classifier to locate the fault [7]. Thus, this section presents an analysis of the stages executed when developing and testing HIFLMs. The purpose is to conduct a critical study, identifying the challenges and tests required to delimit a HIFLM application.

A. REVIEW METHODOLOGY
The first task in this study was selecting the main HIFLMs in the literature. Next, similar characteristics among the techniques were identified. Overall, the main conclusions regarding the similar stages of the selected HIFLMs are:
• Stage I - data acquisition: Deals with the ways to obtain an HIF signal to test an algorithm. The first way is by using simulation software to model the HIF computationally. The second one is through tests in real situations. For both approaches, it is necessary to specify how the fault is applied and the signal's measurement location.
• Stage II - data processing: Assesses the signals measured during HIFs in different situations, such as when varying the system loading, the existence and penetration level of Distributed Generators (DGs), the number of measurement devices, the fault location, and the noise level. In general, these are the conditions that new HIFLMs can face in a practical application.
• Stage III - metrics extraction: Investigates the metrics extracted from the signals. They are the researchers' tools to locate the HIF in a given technique. Consequently, comparing and studying them can reveal the best approaches for new HIFLMs.
• Stage IV - decision algorithm: Defines the decision algorithm of HIFLMs. It evaluates how the location is estimated, the possibility of multiple estimations, and the threshold determination method, if applicable.
The following sections analyze each stage and present a comprehensive comparison among the methods that propose HIFLMs in DSs with sufficient information to make a fair comparison. The strategy is to list the elements that imply safe and effective protection and to summarize the characteristics of the existing HIFLMs. Hence, a critical and detailed analysis is developed.

B. STAGE I: DATA ACQUISITION
Firstly, HIFLMs require input data. Thus, it is necessary to acquire real signals in a field or use a software-based model that emulates the HIFs. Performing real HIF measurements can often be difficult, especially considering different conditions, such as fault location and system loading. Nevertheless, the HIF must be adequately modeled to assess a fault location algorithm by using simulations.
The models must represent the HIF characteristics previously established in the literature. The first characteristic is the small fault current magnitude. Also, the HIF current is distorted by the nonlinear electric arc resistance, which produces low-order odd harmonics in the signals. The HIF currents are also asymmetric, resulting in even harmonics. Another characteristic is the build-up, given by the progressive increase in current amplitude at the fault's beginning. The shoulder characteristic represents the periods in which the fault current amplitude does not present significant changes for a few cycles. There is also modulation, which is the progressive variation of the current amplitude over time. Moreover, HIF currents have the avalanche characteristic: the current amplitude remains at zero from the instant the voltage signal crosses zero until the voltage reaches a value sufficient for the current to flow. It generates high-frequency components in HIF signals. Lastly, there is the intermittence, the period when the fault current extinguishes, with subsequent re-ignition. It is noteworthy that all these features occur randomly along the HIF due to the random nature of the electric arc. All these characteristics can be seen in a typical HIF current waveform in Figure 1 and are thoroughly described in [8].
In some methodologies, the HIF was simulated using only a high-value resistance. However, this may be insufficient, as it does not reproduce the main HIF characteristics. Furthermore, the evaluations performed in [1] and [9] used fault resistances from 60 to 100 Ω and of 19 Ω, respectively. These impedance values can be considered low for HIFs in medium voltage DSs. Thus, their practical application may be affected. Table 1 also shows that some techniques were tested using a non-linear model. Despite representing resistance variation and phenomena such as avalanche, build-up, and modulation, these models do not randomly vary the characteristics (which occurs in real signals). Consequently, this can affect the HIFLMs' performance in real conditions. In [10], the tests used both an HIF model and real signals. The results revealed that the technique's success rate decreases when using real signals, showing that considering the actual random HIF characteristics is crucial for evaluating HIFLMs.
Some techniques, such as [11] and [12], consider an analytic fault model in the methods, which makes considering the HIF randomness difficult. In this situation, using an accurate HIF model becomes even more relevant.

C. STAGE II: DATA PROCESSING
This section presents the main issues in DSs that can affect HIFLMs, such as the input signal, sampling frequency, electrical system size, number of measurements, noise, and DGs.
Table 2 summarizes the signals used as input data for the HIFLMs. Most approaches use both voltage and current signals. Current transformers might respond as low-pass filters, resulting in the loss of part of the signal information. Thus, this aspect can restrict the use of current signals. In contrast, using voltage is only efficient when there is a sufficient voltage drop between the measurement and fault locations. Therefore, as the HIF current amplitude is small, the measured voltage may not be affected [25]. Reference [24] proposes the use of the magnetic field for HIF location. Nevertheless, using this signal requires specific measurement devices, as it is not usual in protection schemes.
Table 2 also shows the different sampling frequencies used in HIFLMs. When using a low sampling rate, only the HIF characteristics that impact the signal's low-frequency components can be analyzed. Consequently, HIFLMs based on high-frequency characteristics require a high sampling rate. However, the measurement equipment becomes more expensive, and its computational burden increases.

1) THE INFLUENCE OF THE INPUT SIGNAL AND SAMPLING FREQUENCY
In general, as seen in those papers, there is no consensus on the optimal sampling frequency used to acquire the input signals of HIFLMs.
Additionally, Table 2 lists the software used for testing the techniques. The purpose is to inform the reader of the most used simulation software for testing new HIFLMs. As shown by the table, these are transient modeling programs rather than steady-state or power flow-only software. An example is the Alternative Transients Program (ATP) [26], which is widely employed for studying electromagnetic transients. It is a version of the Electromagnetic Transients Program (EMTP) [27], which is also used for this type of analysis. Another software for this purpose is PSCAD [28], which is used in [9]. There are also tests with the Real-Time Digital Simulator (RTDS) [29]. It uses real-time hardware to perform the algorithm calculation, similar to practical applications. It is worth highlighting the use of MATLAB [30], both as a circuit simulator via Simulink and as a tool for processing the algorithms. Other software packages are PCFLO and PowerWorld [31], used in [19] and [1], respectively.

2) THE INFLUENCE OF THE METERS POSITION ON HIFLMs
Usually, DSs have measurement devices in the substation. However, due to HIFs' low fault current, identifying them using a single meter becomes challenging. The current measured at the substation is the sum of the HIF and the system load current. Also, the voltage at the fault location is different from the measured voltage due to the line's voltage drop. Thus, many techniques require using multiple meters in different system locations. Nonetheless, tests concerning the HIF incidence in several areas must be conducted to delimit the region in which the method can locate the fault. Table 3 outlines the system sizes and measurement devices in HIFLM tests. Most methods were tested on small systems with fewer than thirteen nodes and a single measurement location. The algorithms tested on large systems used more than one measurement location. The exception is the 1436-node system [10], which uses just one meter. However, the HIF was simulated at a single location.
The higher the number of meters in relation to the system size, the easier it is to locate the fault. However, the utilities need to invest more in measurement devices, increasing the costs. Also, the measurement allocation requires using an optimization algorithm or a statistical analysis.

3) THE INFLUENCE OF DISTRIBUTED GENERATORS ON HIFLMs
Usually, DSs have a radial topology. Therefore, protection techniques' analysis often considers a unidirectional power flow. However, there has been an increasing connection of DGs in distribution systems. The DGs influence the general performance of the DSs operation, causing a bidirectional power flow. From this perspective, the protection system structures have to change due to the modification in the fault current amplitude and direction [32]. Thus, it has led to a complete revision of the methods by which DSs should be monitored, controlled, and protected [33]. Furthermore, inverter-based DGs can inject harmonics into the system, which may increase the non-linearity on the current and voltage waveforms in DSs [34]. As there are HIFLM approaches that rely on the signals' frequencies (harmonics), they can be affected by the presence of DGs.
Fault protection devices have to deal with different types of faults, operating conditions, imbalances, load variations, and topology modifications (which may increase with the DGs) [7]. These phenomena affect techniques that use the system's power/current flow to identify the fault. Additionally, since the DGs' penetration levels can vary over time, techniques based on decision algorithms that require training may be affected. This last aspect will be addressed in Stage IV. Among the selected HIFLMs, only the approaches presented in [7] and [18] consider the presence of DGs in the system. Only [7] analyzed the influence of DG penetration level variation on the algorithm's operation. This analysis becomes increasingly important for modern DSs. In these cases, the presence of DGs had little effect on the accuracy. Nonetheless, there was no analysis involving inverter-based energy resources.

4) THE NOISE INFLUENCE ON THE MEASUREMENTS
The signals measured in real DSs usually have noise. Thus, it is essential to consider its influence when developing protection techniques. The noise may have a low impact on approaches that use only the signal's fundamental component as a metric. In contrast, techniques based on high-frequency components may be more affected by it [18]. The noise in DS signals can be modeled as white Gaussian noise (WGN). It usually has a Signal to Noise Ratio (SNR) between 50 and 60 decibels (dB) [35]. However, some HIFLMs are tested with higher noise levels to evaluate their noise immunity. Table 4 shows that most HIFLMs are not tested under noisy conditions. Among the tested algorithms, there is no consensus on the noise level. An ideal approach is taken by [24], which evaluated the technique over a range of noise levels, delimiting the method's applicability.
Table 5 summarizes the main metrics and tools used to extract or measure them in the selected HIFLMs. In general, the metrics are based on comparing measurements among devices or on signal processing techniques.
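The WGN model mentioned above for DS measurements can be sketched as follows. This is a minimal illustration, not any reviewed method's implementation: the function names, the sinusoidal test signal, and the fixed seed are assumptions made for the example.

```python
import math
import random


def add_white_gaussian_noise(signal, snr_db, rng=None):
    """Return a noisy copy of `signal` whose SNR is approximately `snr_db`."""
    if rng is None:
        rng = random.Random()
    # Mean power of the clean signal (per sample).
    signal_power = sum(s * s for s in signal) / len(signal)
    # SNR_dB = 10*log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB/10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def measured_snr_db(clean, noisy):
    """SNR actually obtained, computed from the clean and noisy signals."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical test signal: 60 cycles of a unit sinusoid, 128 samples per cycle.
SAMPLES_PER_CYCLE = 128
clean = [math.sin(2 * math.pi * k / SAMPLES_PER_CYCLE)
         for k in range(60 * SAMPLES_PER_CYCLE)]
noisy_50db = add_white_gaussian_noise(clean, snr_db=50.0, rng=random.Random(0))
print(f"measured SNR: {measured_snr_db(clean, noisy_50db):.1f} dB")
```

Sweeping `snr_db` over a range of values gives the kind of noise-range evaluation that delimits a method's applicability.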

D. STAGE III: METRICS USED IN HIFLMs
One of the signal processing techniques used in the literature is the Fourier Transform (FT). Despite being widely employed in protection relays, it is a frequency domain transform, which considers the signal stationary in a fixed window. The frequency information calculated by the FT is an average over the window. Therefore, the location of a specific transient on the time axis is unknown, as it is not related to the temporal domain [36]. Additionally, there is no consensus regarding the size and shape of the window used in FT-based transforms. As a result, these issues must be addressed when using this technique in a protection algorithm. Another method used to extract the metrics in HIFLMs is the Wavelet Transform (WT). It is a time-scale domain technique that captures the instant in time of the signal's frequency components [25]. Signals measured during HIFs usually present discontinuities near the current's zero-crossing region due to the avalanche effect, which can be identified by changes in the coefficients obtained through the WT [36]. Despite the listed benefits, it is considered a challenge to design a systematic protection technique based on the WT due to the narrow high-frequency support and subjective selection of the mother wavelet. Also, there is a resolution loss since the number of signal samples is halved at each decomposition level [25], [36]. However, this problem can be solved by using the maximal overlap discrete wavelet transform (MODWT), for example, which does not present undersampling problems when increasing the decomposition level [8]. Table 5 also reveals that the zero-sequence current can be used as a metric to locate the HIF, as HIFs are usually single-phase or two-phase events and can unbalance the system. Such methodologies must consider tests in steady-state unbalanced systems and the HIF incidence on end-of-branch nodes. Table 5 shows that metrics based on the system's power flow are also used for HIF location.
In these techniques, the metrics are given by the system's modeling and the variations in the measured current and voltage signals. It is important to carefully analyze the system and HIF modeling when using such a strategy. Furthermore, the measurement can be performed along the system using Phasor Measurement Units (PMUs), Power Line Communication (PLC), and smart meters, as in [7], [14], and [16], respectively.
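As an illustration of the zero-sequence metric discussed above, the sketch below computes the sample-by-sample zero-sequence current, i0 = (ia + ib + ic) / 3, for a balanced system and for the same system with a small extra current on one phase. The amplitudes are hypothetical, and the added current is a plain sinusoid, which is a deliberate simplification that ignores the HIF's non-linear and random behavior.

```python
import math

N = 128  # samples in one cycle (assumed)


def phase_current(amplitude, shift_rad):
    """One cycle of a sinusoidal phase current with the given phase shift."""
    return [amplitude * math.sin(2 * math.pi * k / N - shift_rad) for k in range(N)]


def zero_sequence(ia, ib, ic):
    """Sample-by-sample zero-sequence current i0 = (ia + ib + ic) / 3."""
    return [(a + b + c) / 3.0 for a, b, c in zip(ia, ib, ic)]


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


# Balanced 100 A load: the three phase currents cancel, so i0 is ~0.
ia, ib, ic = (phase_current(100.0, s) for s in (0.0, 2 * math.pi / 3, 4 * math.pi / 3))
i0_healthy = zero_sequence(ia, ib, ic)

# Hypothetical 3 A fault current added to phase A only.
fault = phase_current(3.0, 0.0)
ia_faulted = [a + f for a, f in zip(ia, fault)]
i0_faulted = zero_sequence(ia_faulted, ib, ic)

print(f"i0 RMS, healthy: {rms(i0_healthy):.3e} A")
print(f"i0 RMS, faulted: {rms(i0_faulted):.3f} A")
```

Although the added current is only 3% of the load current, it appears undiluted in i0. A steady-state load unbalance would produce a non-zero i0 as well, which is why tests in unbalanced systems are needed.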
The PLC-based approaches are among the few existing active methods for HIF location. Devices are set in a transmitter-receiver scheme, where a signal is injected at one line point and received at another. Multiple transmitters can be allocated in the system. Then, the fault position is estimated by analyzing the transmission quality in different network parts. Nevertheless, installing transmitters at every lateral branch is not economically feasible. Thus, using the received signal to estimate the channel (power line) characteristics can be more suitable. As a fault can create a discontinuity, it can be estimated by changes in the PLC channel's characteristics. The authors of [37] elaborate on how this can be achieved, but no practical implementation was found. PLCs have ongoing and established use in high-voltage systems, but their use in medium voltage is less frequent. Nonetheless, it has potential. The application of PLCs can expand, which implies that higher sampling rates are one of the most crucial factors in enhancing the DSs' monitoring and diagnostic capabilities.

E. STAGE IV: DECISION ALGORITHMS OF HIFLMs
After extracting the metric, the techniques decide the HIF location using a specific algorithm, as summarized in Table 6. Note that methods that use multiple meters can use differences between the devices' metrics to decide the fault location. Nevertheless, there are also HIFLMs based on machine learning techniques, which can be robust for HIF location. They present higher identification rates than analytical techniques and can be promising for this difficult-to-solve protection problem. However, they require training, which demands a large database, and the HIFLMs must also consider changes in the system's topology and load variation for their evaluation. Additionally, some methodologies indicate the distance between the meter and the fault, while other methods indicate the region where the HIF occurred. Overall, the purpose is to ensure that utility maintenance teams travel quickly to the correct fault location. Additionally, Table 6 reveals that most techniques use an adaptive threshold (the threshold varies according to the situation/system without requiring direct intervention). The extensive use of adaptive thresholds occurs because most techniques compare measurements in the same system to determine the fault location. Thus, even if changes occur in the system, the HIF location is correctly determined by comparison among meters. However, there is a need to establish a methodology to determine the quantity and location of the measurement devices. Moreover, it is important to point out the problem of multiple estimations in the fault location. It occurs in systems with various branches, where there may be zones with the same electrical distance to the meter [12]. Consequently, there are methods that offer alternatives to prevent multiple estimations, such as [12].

F. REVIEW ON COMMERCIAL SOLUTIONS
Despite being elaborate, the presented approaches are not prevalent or part of standard practice among power utilities. Commercial solutions are still in early development, and most are unattractive due to their high implementation and operational costs. Standard practices such as including more selective relay schemes, reclose blocking, fast fuse curves, and sensitive ground protection are considered best practices for HIFs [38], but fast location is still unsolved. Commercial products often fall into two categories: distribution-level micro-PMUs or high-frequency sensing from line sensors. Both procedures rely on multiple measurements along the network because the load current is lower at lateral feeders, making the small HIF current more detectable. The high-frequency components are attenuated by the system reactance, requiring a measurement close to the fault.
Schweitzer Engineering Laboratories (SEL) presents a whole-system integration for addressing HIFs and locating them [38]. However, these functionalities come with crucial considerations. For HIF detection, SEL recommends their proprietary Arc Sense technology based on interharmonics, which is embedded in their recloser device [38]. The location is proposed for the downed-conductor case with sensors installed in lateral feeders that transmit current readings to a receiver closer to the substation [39]. They also present an option using private 900-MHz LTE or DNP3 wireless networks to achieve broken conductor detection before it touches the ground. Nevertheless, there is evidence that not all HIFs are arcing faults presenting interharmonic frequencies [40], and transmitter spacing appears to be limited to 400 meters. Arguably, any sufficiently distributed monitoring system can be used for locating HIFs if those result from broken conductors. Fault signals are stronger close to the fault, thus allowing a better estimation of its location. This approach of making multiple measurements financially feasible was adopted by SilverFern Power, which developed self-powered sensors to be installed in several locations in all phases [41]. The devices are effectively micro-PMUs that can record waveforms and transmit system metrics through 3G to 5G networks. As the devices are closer to the load, they have a higher chance of detecting HIFs. Zero-sequence currents can also be recorded. However, given the latency of cellular networks, it is arguable that the trip signal will not be sent before the conductor touches the ground. Moreover, the company does not mention any technology for HIFs generated by contact with vegetation, for example.
Another approach is using high-frequency measurements to find signatures created by such faults, which was adopted by [42] through a sensor device that sits under power lines with antennas that capture the high-frequency components. As high-frequency signals suffer high attenuation, such devices must be installed every ten kilometers. The signals are pre-processed at the edge and then sent to the cloud for further decision-making. The main advantage is that it can be used for other goals, such as predictive maintenance and early fault detection. The main downside is that it cannot be used as a tripping device due to its processing latency.
In general, the three described approaches illustrate the state of the commercial solutions for the HIF location: devices are in an early phase of production with standard practices still to be objectively set.

G. OVERALL REVIEW
This section summarizes the main conclusions regarding the performed review of HIFLMs.
The analysis revealed that most HIFLMs' tests are not performed using actual HIF signals. Consequently, HIF models are employed to emulate the fault behavior. Additionally, the test systems usually have a low number of buses or present several meters spread throughout the circuit. The study also proved the importance of considering the influence of noisy signals, as many methodologies are based on the signal's frequencies extracted by signal processing techniques. Moreover, machine learning-based decision algorithms achieved high accuracy.
Hence, the next sections aim at helping to solve some of the issues new researchers can face when developing new HIFLMs. The first one regards the input signal, whether it is best to use the neutral or phase current signals when measuring them in one location and considering different noise levels. The second analysis focuses on establishing the best metrics based on the signals' frequencies to be used as input features. Lastly, the study comprises the best use for a machine learning-based decision algorithm.

III. PROPOSED METHODOLOGY FOR HIF LOCATION
The critical review concluded that there is no consensus regarding the input features, metrics, and decision algorithms of HIFLMs. Thus, aiming to help researchers develop efficient tools, this paper proposes a new methodology for HIF location. The approach is based on selecting the input features that represent the HIF location and using a machine learning algorithm to provide a robust decision algorithm. The goal is to indicate the best input features of an HIFLM. Figure 2 exhibits the proposed method's main steps. In the analysis, considering the HIF incidence at different system buses, the phase and neutral currents are measured at the system substation. The goal is to evaluate which input signal is more relevant to HIF location. The analysis considers the input signals with different noise levels (as is common in real DSs). After the measurement, the harmonics of the signals are extracted through the Stockwell Transform (ST) at each cycle. Then, the harmonics' energy at each cycle is calculated. The harmonics' energies are the input features used for the HIF location. To avoid overfitting and increase efficiency, an analysis of a feature-selection algorithm to select the metrics is presented. Finally, the Random Forest algorithm (a machine learning technique) is used as a decision algorithm for the HIF location. A study of its best use is also carried out.

A. HIF SIMULATION
The authors in [2] sampled real HIF currents in a medium voltage test system specially built for testing new HIF protection methods. The measurement occurred at the fault location, and the tests consisted of the conductor rupture with subsequent contact with different types of soil, such as sand, gravel, asphalt, cement, and clay, both wet and dry. The authors made the data available, which were used in this paper's tests. Some of the current waveforms are illustrated in Figure 3. Note that the currents have low amplitude, are non-linear, and have random characteristics, preventing traditional fault location algorithms from functioning properly.
According to [43], current sources can be used to model harmonic-producing loads. Thus, the field-recorded HIF signals were inserted into the system using current sources controlled by Models in the ATPDraw (shown in Figure 4). The HIF simulation model also contains a switch to simulate the conductor breakage and another to connect the current source to the system.
The test system used in the study is the IEEE 34-node test system, whose single-line diagram is in Figure 5 [44]. This system was modeled through the ATP software [26]. All measurements were performed at bus 800 (the substation). The feeder has a total length of 55 km. Thus, dividing it into regions facilitates fault location. The regions were divided based on the electrical distance between the buses and the substation, considering the main three-phase branch and excluding the one-phase side branches. The main three-phase branch has a total impedance of almost 50 Ω. Thus, each region has between 5 and 10 Ω.
At each simulation, both phase and neutral currents obtained in the ATP software were transferred to MATLAB to carry out the next steps of feature extraction and HIF location tests.

B. METRICS EXTRACTION
The HIF current signals present non-linear characteristics due to the electric arc. Consequently, the frequency spectrum content extracted from the current signals measured during the HIF can be used to locate it. The metrics extraction in this analysis is accomplished by using the ST, a multiresolution spectral analysis technique that provides a good representation of time-varying harmonics [45]. It combines the time-reference ability of the WT with the FT's capability of providing frequency information, using a Gaussian-shaped window [45]. The inputs are the current signals. As output, it provides a matrix containing the amplitude and phase at each signal sample. In this study, the ST is calculated at each one-cycle window, so the matrix dimensions are the harmonic orders by the signal's samples. Next, the energy of each harmonic is calculated at each cycle using Parseval's Theorem [46].
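The per-cycle harmonic energy extraction can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: it uses the standard frequency-domain form of the discrete ST, takes the energy of a harmonic as the sum of the squared ST magnitudes over the cycle's samples (one plausible reading of the Parseval-based step), and uses hypothetical function names and a synthetic test current. With a one-cycle window, frequency bin h corresponds to harmonic order h.

```python
import cmath
import math


def stockwell_rows(cycle, max_harmonic):
    """Discrete S-transform of one cycle for harmonic orders 1..max_harmonic.

    Returns one row per harmonic; each row holds one complex value per sample.
    max_harmonic should not exceed len(cycle) // 2 (Nyquist limit).
    """
    n = len(cycle)
    # Twiddle factors exp(-2j*pi*r/n), shared by the forward and inverse DFTs.
    tw = [cmath.exp(-2j * math.pi * r / n) for r in range(n)]
    spectrum = [sum(cycle[k] * tw[(m * k) % n] for k in range(n)) for m in range(n)]
    # Signed frequency offsets (0, 1, ..., -2, -1) keep the Gaussian symmetric.
    offsets = [m if m < n // 2 else m - n for m in range(n)]
    rows = []
    for h in range(1, max_harmonic + 1):
        # Shift the spectrum by h bins and weight it with a Gaussian whose
        # width grows with the harmonic order (the multiresolution window).
        weighted = [
            spectrum[(m + h) % n]
            * math.exp(-2.0 * math.pi ** 2 * offsets[m] ** 2 / h ** 2)
            for m in range(n)
        ]
        # Inverse DFT back to the time axis.
        rows.append([
            sum(weighted[m] * tw[(-m * j) % n] for m in range(n)) / n
            for j in range(n)
        ])
    return rows


def harmonic_energies(cycle, max_harmonic):
    """Energy of each harmonic in the cycle: sum of squared ST magnitudes."""
    return [sum(abs(v) ** 2 for v in row)
            for row in stockwell_rows(cycle, max_harmonic)]


# Hypothetical one-cycle current: fundamental plus a 20% third harmonic.
N = 128
cycle = [math.cos(2 * math.pi * k / N) + 0.2 * math.cos(2 * math.pi * 3 * k / N)
         for k in range(N)]
energies = harmonic_energies(cycle, max_harmonic=5)
for order, energy in enumerate(energies, start=1):
    print(f"harmonic {order}: energy {energy:.4f}")
```

Repeating this for every cycle of a recording yields one energy value per harmonic order per cycle. Because the Gaussian window widens with the harmonic order, some energy leaks into neighboring high-order harmonics.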

C. FEATURES SELECTION ALGORITHM
As previously mentioned, this paper uses a machine learning technique as a decision algorithm to localize the HIF, using harmonics as input. The proposed study's metrics are the harmonics' energy at each cycle. However, not all harmonic orders may be necessary as input features to provide adequate accuracy.
In this sense, feature selection algorithms are often used to reduce the computational complexity and dimensionality in machine learning applications. In this paper, the FEAST toolbox [47] was used to select the most relevant features (the harmonics' energy) for the HIF location. Among the toolbox's feature selection algorithms, the minimal-redundancy-maximal-relevance (mRMR) is one of the best. Proposed by [48], this method selects the features with high relevance while maintaining a low redundancy among the selected features. Even though the method creates a rank of the given features, the number of features to keep is not defined. Nevertheless, the features higher in rank are considered the best ones. Hence, the mRMR was used to select the harmonic orders that best represent the HIF location in the system and can be used as input features for the decision algorithm. The study analyzes whether the feature selection algorithm is actually needed.
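The mRMR idea can be sketched as a greedy ranking that, at each step, picks the feature with the largest relevance to the class minus its mean redundancy with the features already chosen. The sketch below is an illustration only and does not reproduce the FEAST toolbox: the equal-width binning, the function names, and the toy data are assumptions.

```python
import math
from collections import Counter


def discretize(values, n_bins=4):
    """Equal-width binning, so mutual information can be estimated by counting."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in count_ab.items())


def mrmr_rank(columns, labels, n_select):
    """Greedy mRMR: maximize relevance minus mean redundancy at each step."""
    binned = [discretize(col) for col in columns]
    relevance = [mutual_information(col, labels) for col in binned]
    selected, remaining = [], list(range(len(columns)))

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(mutual_information(binned[f], binned[s])
                         for s in selected) / len(selected)
        return relevance[f] - redundancy

    while remaining and len(selected) < n_select:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: features 0 and 1 carry the same information, feature 2 is
# complementary; the class (e.g. a region index) depends on features 0 and 2.
feature_0 = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
feature_1 = [2.0, 2.0, 2.0, 2.0, 9.0, 9.0, 9.0, 9.0]  # redundant with feature 0
feature_2 = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
labels = [0, 0, 1, 1, 2, 2, 3, 3]
ranking = mrmr_rank([feature_0, feature_1, feature_2], labels, n_select=3)
print("feature ranking:", ranking)
```

In this toy case the redundant copy is pushed to the end of the ranking even though, taken alone, it is as relevant as the others. As noted above, the ranking itself does not say how many features to keep.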

D. HIF LOCATION DECISION
The proposed HIFLM's decision technique is based on the Random Forest algorithm, which was chosen because it performed well in a set of tests with different machine learning-based algorithms. Random Forest is a powerful machine learning technique based on an ensemble of multiple individual decision trees (thus creating a forest) [49]. Every decision tree is trained independently, using a randomly selected part of the training dataset. After training all the trees, the decision is given by the selective multi-voting method [49]. The input data consists of the selected harmonics extracted from the HIFs' current signals, and the algorithm outputs the region of the HIF location.
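The ensemble-and-vote mechanism can be illustrated with scikit-learn. This is a sketch under stated assumptions, not the authors' implementation: the data are synthetic placeholders rather than HIF harmonic energies, the number of features is arbitrary, and a plain majority vote over the trees stands in for the voting scheme of [49]. Note that scikit-learn's own `predict` averages tree probabilities instead of counting hard votes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder data: 5 regions, 16 features per sample (NOT real HIF data).
rng = np.random.default_rng(0)
n_regions, n_features, n_per_region = 5, 16, 200
centers = rng.normal(0.0, 5.0, size=(n_regions, n_features))
y = np.repeat(np.arange(n_regions), n_per_region)
X = centers[y] + rng.normal(0.0, 1.0, size=(y.size, n_features))
train = rng.random(y.size) < 0.55

# Each of the 100 trees is fitted on a bootstrap sample of the training set.
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
forest.fit(X[train], y[train])

# Collect the individual tree decisions and take a majority vote per sample.
# The sub-estimators return encoded class indices, which equal the labels here
# because the labels are 0..4.
votes = np.stack([tree.predict(X[~train]) for tree in forest.estimators_]).astype(int)
majority = np.array(
    [np.bincount(col, minlength=n_regions).argmax() for col in votes.T]
)
vote_accuracy = float(np.mean(majority == y[~train]))
```

In the proposed method, the rows of `X` would be the per-cycle harmonic energies and `y` the region in which the HIF was applied.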

IV. TESTS AND RESULTS
This paper's tests consider the current measurements during steady-state and during the fault. In real conditions, the signals measured in DSs are noisy, with an SNR varying between 50 and 70 dB [35]. Thus, the tests used noise-free signals and signals with WGN at SNRs of 60 dB and 50 dB.
The HIF location algorithm assumes that the fault was previously detected. The algorithm input data consists of the first 1 s of the HIF. The signals used in the tests were acquired with 128 samples per cycle (7680 Hz). Therefore, according to the Nyquist Theorem, it is possible to obtain harmonics up to the 64th order. Thus, for the signals acquired in the 60 Hz system during 1 s, 60 energy samples were obtained for each harmonic order as the Random Forest input features.
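A noisy test signal at a target SNR can be produced as sketched below. The function name is hypothetical, and the noise power is assumed to be set relative to the mean power of the whole record; this section does not state how the reference power was computed.

```python
import numpy as np

def add_wgn(signal, snr_db, rng=None):
    """Return `signal` plus white Gaussian noise at the requested SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal**2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  ->  P_noise = P_signal / 10^(SNR/10)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Illustrative use: a clean 60 Hz record sampled at 7680 Hz, then 60 dB and 50 dB copies.
t = np.arange(7680) / 7680.0
clean = np.sin(2 * np.pi * 60 * t)
noisy_60 = add_wgn(clean, 60, np.random.default_rng(1))
noisy_50 = add_wgn(clean, 50, np.random.default_rng(2))
```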
Overall, the analysis used 33 HIF signals applied to 18 three-phase buses with three noise levels, resulting in 1782 phase and 1782 neutral current signals. When using the energy of 64 harmonic orders over 1 s, the 60 data samples per signal for each harmonic order result in 106,920 data samples (1782 signals times 60 cycles) for each of the phase and neutral currents. Then, the dataset was divided into training and testing sets, which consisted of 55.55% and 44.44% of the data, respectively. HIFs on the terminal buses of each region were used in the training dataset, while HIFs on intermediate buses were used for testing. For example, in Region 2, HIFs on buses 812 and 816 are used for training, and HIFs on buses 814 and 850 are used for testing.
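The figures quoted above follow from simple arithmetic, reproduced here as a check. The split of 10 training buses and 8 testing buses is an inference from the stated 55.55% and 44.44% shares of 18 buses, not a number given in the text.

```python
FUNDAMENTAL_HZ = 60
SAMPLES_PER_CYCLE = 128
RECORD_SECONDS = 1

sampling_rate_hz = FUNDAMENTAL_HZ * SAMPLES_PER_CYCLE   # 7680 Hz
max_harmonic_order = SAMPLES_PER_CYCLE // 2             # Nyquist limit: 64th order
cycles_per_record = FUNDAMENTAL_HZ * RECORD_SECONDS     # 60 energy samples per order

n_hif_signals, n_buses, n_noise_levels = 33, 18, 3
n_signals = n_hif_signals * n_buses * n_noise_levels    # 1782 signals
n_rows = n_signals * cycles_per_record                  # 106920 samples per order

# Assumed bus split (inferred from the stated percentages).
train_buses, test_buses = 10, 8
train_share = train_buses / n_buses                     # 0.5555...
test_share = test_buses / n_buses                       # 0.4444...
```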
The decision algorithm in the proposed methodology uses either a single Random Forest to classify the data among the five pre-defined regions (see Figure 5) or five independent Random Forests, each a specialist in one region. For both approaches, the total number of decision trees was equal to 100. Furthermore, the decision algorithm's input data is built with or without a feature selection algorithm that indicates the harmonics that best represent the database. As a result, the analysis is divided into four main cases to enable a comparison study: Case I, one Random Forest with the first 8, 16, 32, and 64 harmonic orders as input data; Case II, one Random Forest with the first 8, 16, and 32 harmonic orders selected by the mRMR algorithm; Case III, five specialist Random Forests with the first 8, 16, 32, and 64 harmonic orders; and Case IV, five specialist Random Forests with the first 8, 16, and 32 harmonic orders selected by the mRMR algorithm. Thus, this section shows the results for each case, separating the phase and neutral current analyses for each noise level.

A. CASE I - ONE RANDOM FOREST WITHOUT FEATURE SELECTION
Table 7 shows the accuracy achieved using the first 8, 16, 32, and 64 harmonic orders as input data for the single Random Forest to classify the HIF location. It reveals that the success rate is above 91% when training and testing with the same noise level using all 64 available harmonic orders. The lowest accuracy occurred when training with 50 dB noise and testing with no noise. When analyzing the first 8, 16, and 32 harmonic orders, the best accuracy was achieved when training with the non-noisy dataset. Using only the first eight harmonic orders gave the worst results, while using 32 or 64 harmonic orders yielded similar accuracy. There was no significant difference between the phase and neutral signals.
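The two classifier layouts compared in this section can be sketched with scikit-learn. Several details are assumptions of this sketch rather than facts from the paper: each specialist is a one-vs-rest binary forest, the 100 trees are split evenly into 20 per specialist, the specialists' outputs are combined by taking the region with the highest positive-class probability, and the data are synthetic placeholders. All helper names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def make_placeholder_data(n_regions=5, n_features=16, n_per_region=200, seed=0):
    """Synthetic stand-in for harmonic-energy features (NOT real HIF data)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(0.0, 5.0, size=(n_regions, n_features))
    y = np.repeat(np.arange(n_regions), n_per_region)
    X = centers[y] + rng.normal(0.0, 1.0, size=(y.size, n_features))
    train = rng.random(y.size) < 0.55
    return X[train], y[train], X[~train], y[~train]

def fit_specialists(X, y, n_regions=5, trees_per_forest=20, seed=0):
    """One binary forest per region: 'is the fault in this region or not?'"""
    forests = []
    for region in range(n_regions):
        forest = RandomForestClassifier(
            n_estimators=trees_per_forest, random_state=seed + region
        )
        forest.fit(X, (y == region).astype(int))
        forests.append(forest)
    return forests

def predict_specialists(forests, X):
    """Assumed combination rule: pick the region whose specialist is most confident."""
    scores = np.column_stack([f.predict_proba(X)[:, 1] for f in forests])
    return scores.argmax(axis=1)

X_tr, y_tr, X_te, y_te = make_placeholder_data()

# Layout 1: a single forest with 100 trees classifies among the five regions.
single = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc_single = single.score(X_te, y_te)

# Layout 2: five specialist forests with 20 trees each (100 trees in total).
specialists = fit_specialists(X_tr, y_tr)
acc_specialists = float(np.mean(predict_specialists(specialists, X_te) == y_te))
```

Accuracies obtained on this placeholder data say nothing about the relative merit of the two layouts for HIF location; the comparison on real signals is the one reported in the tables.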

B. CASE II - ONE RANDOM FOREST WITH FEATURE SELECTION
This case used the ranking of the 32 best-classified harmonics, shown in Table 8. It is important to consider how the input features are delimited. For example, if three input features are chosen and there is no selection, the features are the DC, fundamental, and 2nd harmonic orders. However, if the ranking is used, the selected features are the DC, 12th, and 13th harmonic orders. The results in Table 9 show again that the accuracy is independent of the input current. The accuracy increased compared to Case I, mainly when using the first 32 selected features. Note that, when training and testing with no noise, the accuracy was higher than 99%. Hence, the recommended approach for real applications is to train the algorithm for all noise levels expected to be present, using 16 or more input features.

C. CASE III - SPECIALIST RANDOM FORESTS WITHOUT FEATURE SELECTION
Table 10 shows the algorithm accuracy when using five independent Random Forests, each a specialist in one region. This case used the first 8, 16, 32, and 64 harmonic orders as input without ranking them by relevance. Overall, the best results were achieved using the first 16 harmonic orders as input data, presenting high accuracy even in noisy conditions. This suggests that overfitting can occur in machine learning techniques when using a large number of inputs, decreasing their generalization capability. Similar to Cases I and II, there is no relevant difference between the accuracy of the neutral and phase currents. As in the previous cases, the best results were achieved when training and testing with the same noise levels.
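The difference between taking the first k harmonic orders and taking the k best-ranked orders, as described for Case II, can be sketched as a column selection. Only the first three entries of the ranking (DC, 12th, and 13th) are quoted in the text, so the sketch uses just those; the function name and the placeholder energy matrix are illustrative assumptions.

```python
import numpy as np

# First three ranked orders quoted in the text: DC (0), 12th, and 13th.
RANKING_HEAD = [0, 12, 13]

def select_orders(energy, k, ranking=None):
    """Pick k columns of `energy` (cycles x harmonic orders).

    Without a ranking the first k orders (0..k-1) are used; with a ranking
    the k best-ranked orders are used instead.
    """
    if ranking is None:
        orders = list(range(k))
    else:
        if k > len(ranking):
            raise ValueError("ranking is shorter than the requested k")
        orders = list(ranking[:k])
    return energy[:, orders], orders

# Placeholder matrix: 60 cycles by 65 columns (DC plus orders 1..64).
energy = np.arange(60 * 65, dtype=float).reshape(60, 65)
_, first_three = select_orders(energy, 3)                      # [0, 1, 2]
_, ranked_three = select_orders(energy, 3, RANKING_HEAD)       # [0, 12, 13]
```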

D. CASE IV - SPECIALIST RANDOM FORESTS WITH FEATURE SELECTION
The last case evaluates the HIF location using five independent Random Forests and the mRMR to select the input data (Table 8). This case resulted in higher accuracy than Case III, indicating that the feature selection algorithm improved the performance. However, the results were still inferior to Case II, showing that a single Random Forest presented more promising results. Also, there is still no evident difference when comparing the phase and neutral currents. The highest accuracy was achieved using the 32 best-classified harmonics listed in the mRMR ranking.

E. RESULTS SUMMARY
Table 12 summarizes the best accuracy achieved in the evaluated cases for the phase current, as the neutral current had similar results. To analyze this table, take Case I as an example. When considering the tests with signals without noise, the best training occurs when using the 32 harmonics extracted from the non-noisy signals as input data, achieving an accuracy of 98.90%. However, when considering the overall tests, which means training with a single dataset and testing with all noise levels (no noise, 60 dB, and 50 dB), the best solution would be to train with the noise-free data considering the first 16 harmonic orders, achieving an average accuracy of 93.75%. The highest success rate is achieved when training the Random Forest for the same noise level included in the tests. In this situation, the Case II conditions give the best result. Nevertheless, if only one dataset is used for training, the conditions of Case I are the best. Cases III and IV have the lowest accuracy.
Even though the accuracy decreases in some cases, the results remain promising because real HIF signals were used in the tests. Additionally, there is still no consolidated solution for the HIF location. The accuracy achieved using the neutral and phase currents was similar in all cases. Thus, if there is access to the substation transformer's neutral, this measurement would be a good choice because it does not need three measurement devices. Nonetheless, using the phase currents would be recommended if the transformer's neutral is not accessible.

V. CONCLUSION
HIF location in DSs is an issue without evidence of a conclusive solution, and no thorough analysis of it exists. This paper addresses this gap by presenting a critical review of the existing methods. The review revealed that most tests of HIFLMs consider HIF models and not actual signals. The test systems are usually short or have several meters spread along them, and noisy signals and DGs are frequently not considered. Also, the review indicated that most HIFLMs can be divided into three approaches. The first approach uses the signal's frequencies. The second one utilizes measurement devices along the system. The third approach considers the system modeling and the signals' measurement. The presented analysis summarized these approaches. They estimate either the distance between the measurement point and the fault or the fault region. In summary, the strategies are based on comparing metrics (expected and measured signals) or on machine learning techniques.
Additionally, this paper presented an analysis of the input features and of different configurations of a new HIF location method using the Random Forest algorithm. The results revealed that a feature selection algorithm, besides decreasing the computational complexity of the method, improves its accuracy when training and testing with a specific noise level. In contrast, when considering a single dataset for training, the overall accuracy decreased when using a feature selection algorithm, showing a loss of generalization. The analysis also showed that using a single Random Forest to classify the HIFs among different locations yields higher accuracy than using specialist Random Forests. The use of phase and neutral currents yielded similar results.