Mathematical Morphology-Based Feature-Extraction Technique for Detection and Classification of Faults on Power Transmission Line

A continuous, highly reliable power supply is a core requirement of an electric power transmission network. The transmission line is the main part of this network, through which power is transmitted to the utility. These lines are often damaged by accidental breakdowns with different random origins. Hence, researchers are trying to detect and identify these failures as early as possible to avoid financial losses. This paper offers a new, fast, real-time fault feature extraction scheme based on mathematical morphology for the detection and classification of transmission line faults. The morphological median filter is used to extract unique fault features, which are then fed as input to a decision tree classifier to classify the fault type. The graphical and numerical results obtained for the extracted features confirm the effectiveness of the offered scheme. The proposed scheme is verified for different fault cases simulated on a high-voltage transmission line modelled using ATP/EMTP with varying system constraints. The performance of the technique is also validated for fault detection and classification on real-field transmission lines. The results show that the proposed method is capable of detecting and classifying the faults with adequate precision and reduced computational intricacy, in less than a quarter of a cycle.


I. INTRODUCTION
During the past few decades, with the emergence of high-tech innovations, conventional electrical power systems all over the globe have been transforming into smart power systems that integrate advanced monitoring and control approaches [1]-[3]. Also, consumer demand for high-quality, reliable and uninterrupted power has been rising steadily. This has led to the installation of many new transmission and distribution lines, making the electrical power system enormously intricate [4].
The electric power transmission line is one of the most vital elements of the power system network since it conveys electricity from the generation end to the distribution end. The performance of these lines plays a pivotal role in maintaining a continuous power supply. One of the most significant factors that obstructs the continuity of electric power supply is a fault on these lines, which is inevitable and beyond human control [5]. If a fault is not detected accurately and persists for a while, it may lead to massive destruction or a power outage. Consequently, it is essential to have an enhanced and well-coordinated transmission line relaying scheme that detects and characterizes any kind of fault efficiently within the stipulated time, to assist swift repair and restoration of the power supply with the least disruption [6], [7].
The associate editor coordinating the review of this manuscript and approving it for publication was Francesco Tedesco.
Transmission line protective relaying is a significant feature of reliable power system operation. It comprises three core functions, viz. detection, classification, and location of transmission line faults [8]. Quick detection of faults facilitates speedy isolation of the faulty line to safeguard the system from the probable disastrous impacts of the fault. Moreover, the information delivered by fault classification can greatly facilitate quick estimation of the fault location, thereby reducing the fault clearing time and enabling rapid restoration of power service [9]. As a consequence, plenty of scholarly research is devoted to developing a robust, precise and intelligent scheme for fault detection and classification on transmission lines. In the literature, fault detection and classification techniques are generally divided into two categories: i) conventional techniques, and ii) machine-learning (ML)-based techniques. Owing to their involved mathematical calculations, the conventional techniques have a large computational intricacy that depends on the size of the power system. In terms of speed and accuracy, the ML-based techniques are found to be more efficient for detecting and classifying transmission line faults [9]. The first step of ML-based techniques is the training of the classifier. The classifier needs to be trained with the extracted values of the particular fault features, acquired from simulations of several fault scenarios in reliable software such as MATLAB, PSCAD, ATP-EMTP, etc. Later, new fault cases can be detected and classified easily with the help of this trained classifier [10].
Numerous signal processing techniques such as the Fourier transform (FT), wavelet transform (WT), Stockwell transform (ST), Hilbert-Huang transform (HHT), principal component analysis (PCA), empirical mode decomposition (EMD), etc. are reported in the literature for fault feature extraction. Each of these techniques has pros and cons, as listed in Table 1 [9]-[12]. Various ML-based techniques such as the artificial neural network (ANN), support vector machine (SVM), decision tree (DT), etc. [13]-[32], combined with the above-mentioned feature extraction techniques, are used for detecting and classifying the faults.
In [13]-[17], WT together with ANN is used for fault detection and classification on the transmission line. In [18], a combination of ST and ANN is used for detecting and classifying faults on the overhead transmission line. Although the ANN-based techniques have been quite efficient in identifying the fault types, the major drawback of ANN is that a considerable amount of sampled data is needed for training, which increases the computational time and complexity.
Besides ANN, SVM combined with the above-mentioned feature extraction techniques has been widely used for fault detection and classification on the transmission line. In [19]-[22], fault features extracted using WT are used as input to SVM for classifying faults on the transmission line. In [23], efficient and reliable detection of faults on the transmission line is achieved from the fault features extracted using ST analysis, and SVM is then deployed for identifying the fault type. One more approach, combining PCA with SVM for fault diagnosis in the power transmission line, is reported in [24]. Hybrid HHT-SVM and EMD-SVM based fault detection and classification techniques are reported in [25] and [26], respectively. Though all the aforementioned techniques have been successfully applied for fault diagnosis, they are limited in speed, and the data and computational burden is high for both training and testing of the SVM. Also, in some cases, the SVM has high intricacy and sizeable memory requirements for classification, which limits its real-time implementation.
Although all these feature extraction techniques have been used for classifying fault types with different patterns, considerable prior information about the particular system pattern is essential. Obtaining these details requires constant amendments and corrections, which can be time-consuming and also lacks generalizability [34]. Also, the main drawback of most of the techniques used for fault feature extraction is that they fail to extract the features precisely in the presence of a decaying DC component (DDC), noise and severe harmonic conditions. This implies that the main issue in ML-based techniques is the extraction of the fault features from the power system signals (i.e. voltage or current). If the technique used has high computational complexity and a significant time delay, it can affect the overall speed and accuracy of the fault detection and classification technique [35]. Consequently, to achieve the highest accuracy, an effective technique is desirable for pre-processing and generating the most relevant features from the voltage or current waveforms observed during the fault, with minimum delay. Hence, to tackle the problem of computational intricacy and time delay in fault feature extraction, a new, simple, fast and powerful real-time fault feature extraction technique based on mathematical morphology (MM) is proposed in this paper.
MM is a time-domain signal processing method that precisely extracts signal features regardless of distortions, using a reduced data window, in real time [36]-[40]. Hence, morphological operators can be used to capture the features of the disturbance that has occurred. The proposed approach works in three steps. First, the sampled 3-phase current signals, along with the zero-sequence current acquired from the sending end of the transmission line, are pre-processed using the morphological median filter (MMF) to extract the relevant fault features. In the second step, these extracted features are used to train the ML classifier. Afterwards, this trained classifier is used to detect and classify the different fault types. DT, being rule-based, is more transparent and human-friendly than black-box solutions such as neural networks and other pattern classifiers. Hence, the proposed method is combined with DT to obtain speedy and precise fault detection and classification on power transmission lines.
This work makes the following contributions:
1) A new, simple, fast and efficient MMF-based real-time fault feature extraction technique with reduced time delay, computational intricacy and data window size.
5) Validation of the performance of the proposed MMF-DT-based fault detection and classification technique by using both simulated and real power system data.
The transients arising from transmission line/transformer/capacitor/reactor switching and lightning are considered temporary transients, which lead to temporary ionization in transmission lines that persists for a short period of time. The major aspect affecting the transmission system is a permanent fault. The most frequent permanent fault types in power systems are short-circuit faults, which can be broadly classified into two groups, symmetrical and asymmetrical. Accordingly, the term fault generally refers to these types of permanent faults, which are of major concern to the utility and transmission companies. Hence, all 11 types of short-circuit faults, comprising the asymmetric faults (AG, BG, CG, ABG, BCG, ACG, AB, BC, and AC) and the symmetric faults (ABC and ABCG), have been considered.
Numerous simulation studies have been performed using a system built in ATP/EMTP software for dataset generation. Several training and test cases are simulated with different combinations of fault types and varying system constraints such as fault inception angle (FIA), fault resistance (R_F), source impedance (Z_S), fault location (D_F), etc. Since the proposed method needs only four features for detecting and classifying the faults, the memory requirements and computational time are substantially reduced. The obtained results show that the proposed technique offers a straightforward and effective means to detect and classify transmission line faults.
The proposed approach, being algorithmic and inherently digital in nature, can be easily incorporated into the overall cyber-physical armour and fits in well with the cyber-physical scheme of today's state-of-the-art power systems. At the same time, the proposed method further strengthens decision making by using ML as an integral part of the approach. Hence, it is eminently suitable for the data-driven, 'smart' and automated environments that are becoming ubiquitous as we transition to Industry 4.0.
The remainder of this paper is organized as follows. Section II briefly outlines the MM and DT fundamentals. In Section III, the mathematical framework of the proposed MMF-DT-based fault detection and classification technique, along with the flowchart, is elaborated. The validation of the proposed technique using both simulated and real-field data is discussed and compared with existing methods in Section IV. In Section V, the results of the comparative analysis are discussed. Finally, Section VI puts forward some concluding remarks.

A. BASICS OF MATHEMATICAL MORPHOLOGY (MM)
MM is a deep-rooted non-linear waveform analysis technique offered by Matheron and Serra to facilitate the extraction of vital and most relevant features of the signals using a suitable function called the structuring element (SE) [36]. The SE glides through the signal like a moving window and extracts the peculiar features in the neighborhood of each sample in the signal. The shape and size of the SE contribute significantly to this type of analysis and thus should be selected as per the requirement and aim of the particular application [38]. Owing to the availability of mostly one-dimensional signals, a flat SE is suitable for several power system applications.
VOLUME 8, 2020
MM is basically a time-domain approach with several advantages over frequency-domain approaches such as FT and WT:
1) Fast and simple computations, viz. subtraction, addition, minimum and maximum.
2) High-speed and transparent processing with a much-reduced data window size, hence suitable for real-time applications.
3) Applicable to non-periodic transient signals.
4) Time-domain signal processing that extracts the most relevant features accurately despite any deformity.
For signal processing, MM employs two basic operations, viz. dilation and erosion. Let $X(i)$ and $S(j)$ be the input signal to be processed and the SE, defined in the domains $D_X = \{0, 1, \ldots, I-1\}$ and $D_S = \{0, 1, \ldots, J-1\}$, respectively, where $I$ and $J$ are integers with $I > J$. The dilation of $X(i)$ by $S(j)$, denoted by $(X \oplus S)$, is given by:
$$(X \oplus S)(i) = \max_{j \in D_S,\ (i-j) \in D_X} \{ X(i-j) + S(j) \} \tag{1}$$
In the same way, the erosion of $X(i)$ by $S(j)$, denoted by $(X \ominus S)$, is given by:
$$(X \ominus S)(i) = \min_{j \in D_S,\ (i+j) \in D_X} \{ X(i+j) - S(j) \} \tag{2}$$
Though MM operations have mainly been offered and implemented for image processing, they have been applied to power system applications as well [37]. Proper utilization of these operations can readily extract the most relevant and meaningful features from the power system signals captured during a fault. Hence, for fault feature extraction, a morphological median filter (MMF) based on these two operations is defined as:
$$\mathrm{MMF}(i) = \frac{(X \oplus S)(i) + (X \ominus S)(i)}{2} \tag{3}$$
DT is one of the most frequently used non-parametric ML techniques for classification. The aim is to build a flowchart-like tree structure that predicts the class of the target variable by interpreting simple decision rules deduced from the feature dataset. Owing to this flowchart-like representation, DTs are easy to interpret and user-friendly. In the mathematical formulation of the DT algorithm, T denotes the transpose, n the number of cases, m the number of extracted features, F the n-dimensional vector of the observed cases, and S_j the n-dimensional vector of the variables to be predicted. From the available observations F, several DTs can be built, each having a different precision level [29].
Hence, robust algorithms have been developed to create an optimal tree with a sound balance between intricacy and precision [32]. The strategy is to grow a tree by making a series of locally optimal decisions about which attribute to use for splitting the dataset F. The DT is built in line with these splitting rules. The foremost node in a DT is termed the root node or parent node n_p. Each time, n_p is divided into two further nodes n_l and n_r, as depicted in Fig. 1. This splitting is done with maximum homogeneity, which is determined from an impurity function i(n) defined in terms of p(c_i|n), the fraction of patterns f_i allotted to class c_i at the node n. The best split is the one that maximizes the change in impurity Δi(n) between the parent node and its two child nodes. Since the impurity of n_p is constant for any probable partition x_i ≤ x_i^R, i = 1, 2, ..., m, maximizing the homogeneity of n_l and n_r is equivalent to maximizing Δi(n), which is the arg max condition of equation (7). As shown in Fig. 1, equation (7) gives the best partitioning condition as f_i < f_i^R. Accordingly, the DT keeps splitting into sub-trees until no further steep decline in the impurity measure is possible. The built DT is then used to classify new test data.
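The splitting rule described above can be illustrated with a minimal sketch. It assumes the Gini index as the impurity function and the usual CART impurity decrease, i(n_p) − P_l·i(n_l) − P_r·i(n_r), where P_l and P_r are the fractions of cases sent to the left and right child nodes; the paper does not state which impurity measure was used, so this choice, the function names and the toy data are assumptions made only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity i(n) = 1 - sum_c p(c|n)^2 of the class labels at a node."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Search the rule `value <= threshold` that maximizes the impurity decrease
    i(parent) - P_l * i(left) - P_r * i(right) for a single feature.
    Returns (threshold, impurity_decrease); threshold is None if no split helps."""
    parent_impurity = gini(labels)
    best = (None, 0.0)
    candidates = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2.0
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        p_left = len(left) / len(labels)
        decrease = parent_impurity - p_left * gini(left) - (1.0 - p_left) * gini(right)
        if decrease > best[1]:
            best = (threshold, decrease)
    return best


if __name__ == "__main__":
    # Hypothetical phase-A feature values: low for healthy cases, high for AG faults.
    feature = [0.1, 0.2, 0.15, 3.0, 4.2, 3.6]
    classes = ["no fault"] * 3 + ["AG"] * 3
    print(best_split(feature, classes))
```

In this toy case a single threshold between the healthy and faulted values separates the two classes completely, so both child nodes are pure and the impurity decrease equals the parent impurity.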

III. PROPOSED MMF-DT-BASED TECHNIQUE
The step-by-step basis of the proposed methodology developed to extract the fault features and identify the fault types is outlined in this section. The schematic drawing and flowchart are shown in Fig. 2. The method works in three phases, viz. feature extraction, feature selection and fault classification. In the first phase, once the entire system has been simulated for several fault conditions by varying the system parameters, the fault features are extracted using the proposed feature extraction technique. Phase two consists of feature selection, also known as attribute selection. It entails searching all probable combinations of the extracted features to find a feature subset with good predictive or classifying ability. Later, the selected set of extracted features with known relevant classes is given as input to the DT to classify the new test cases.

A. FEATURE EXTRACTION
Most detection and classification techniques rest on a good dataset of fault features. The set of relevant features should be small and have low computational intricacy. Generally, in real-world applications, the relevant features are buried in redundancy and noise; hence it is desirable to extract this information in cost-effective ways without losing valuable data.
Morphological filters (MF) can precisely extract the features characterized by the SE and obtain a signal containing only the part of concern through distinct MM operators [35], [40]. The accuracy and effectiveness of the MF depend on the length, height and shape of the SE [36], [38]. Hence, the choice of an optimal SE plays a significant role in fault feature extraction. Here, a flat linear SE and an averaging filter named MMF, designed using the two basic MM operators, viz. dilation and erosion, are used for feature extraction. The detailed mathematical analysis is as follows.
Whenever a fault arises in the power system, the fault signal can be expressed as a combination of a steady-state sinusoidal component and an exponentially decaying DC (DDC) component. Mathematically, it can be stated as:
$$x_f(t) = X_0 e^{-t/\tau} + X \sin(\omega t + \varphi) \tag{8}$$
where $X_0$ denotes the DDC amplitude, $\tau$ is the DDC time constant, $X$ is the magnitude of the sinusoidal component, $\varphi$ is the phase shift of $x_f(t)$ and $\omega = 2\pi f$ is the angular frequency. After sampling the signal $x_f(t)$ at a sampling frequency $f_s$, the $k$-th sample of $x_f(t)$ can be expressed as:
$$X_f(k) = X_0 e^{\lambda k \Delta t} + X \sin(k\delta + \varphi) \tag{9}$$
where $\Delta t = 1/f_s$ is the sampling interval, $N$ denotes the samples/cycle, $\lambda = -1/\tau$, $\delta = \omega \Delta t$ and $t = k \Delta t$. According to the first-order Taylor series expansion, equation (9) can be expressed as:
$$X_f(k) \approx X_0 (1 + \lambda k \Delta t) + X \sin(k\delta + \varphi) \tag{10}$$
If this expansion occurs at a centre point $X_f(k)$, then its left and right side samples can be expressed as:
$$X_f(k-n) \approx X_0 \left(1 + \lambda (k-n) \Delta t\right) + X \sin\left((k-n)\delta + \varphi\right) \tag{11}$$
$$X_f(k+n) \approx X_0 \left(1 + \lambda (k+n) \Delta t\right) + X \sin\left((k+n)\delta + \varphi\right) \tag{12}$$
Adding equations (11) and (12),
$$X_f(k-n) + X_f(k+n) \approx 2 X_0 (1 + \lambda k \Delta t) + 2 X \sin(k\delta + \varphi) \cos(n\delta) \tag{13}$$
If $n$ is a small integer, then for high $f_s$, $\cos(\omega \cdot n \Delta t) = \cos(n\delta) \approx 1$. Hence, equation (13) can be approximated as:
$$X_f(k-n) + X_f(k+n) \approx 2 X_f(k) \tag{14}$$
Fault events on transmission lines create transient disturbances in the current and voltage signals. As mentioned in Section II-A, the most relevant and meaningful features of these disturbances can be readily extracted by proper utilization of morphological operators.
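The averaging relation derived above, in which a sample of the DDC-plus-sinusoid model is close to the mean of its two neighbours at distance n when the sampling rate is high, can be checked numerically. The following is only a sketch: the signal parameters (unit amplitudes, a DDC time constant of two cycles, a phase of 0.3 rad) and the function names are hypothetical values chosen for illustration, not taken from the paper.

```python
import math


def fault_sample(k, samples_per_cycle, x0=1.0, tau_cycles=2.0, x_mag=1.0, phi=0.3):
    """k-th sample of X0*exp(lambda*k*dt) + X*sin(k*delta + phi)."""
    delta = 2.0 * math.pi / samples_per_cycle            # delta = omega * dt
    lambda_dt = -1.0 / (tau_cycles * samples_per_cycle)  # lambda * dt, tau given in cycles
    return x0 * math.exp(lambda_dt * k) + x_mag * math.sin(k * delta + phi)


def max_averaging_error(samples_per_cycle, n=1, cycles=2):
    """Largest |X_f(k) - (X_f(k-n) + X_f(k+n)) / 2| over the given number of cycles."""
    worst = 0.0
    for k in range(n, cycles * samples_per_cycle - n):
        centre = fault_sample(k, samples_per_cycle)
        mean_of_neighbours = 0.5 * (fault_sample(k - n, samples_per_cycle)
                                    + fault_sample(k + n, samples_per_cycle))
        worst = max(worst, abs(centre - mean_of_neighbours))
    return worst


if __name__ == "__main__":
    # The residual shrinks as the number of samples per cycle grows,
    # because cos(n * delta) moves closer to 1.
    for samples_per_cycle in (24, 240):
        print(samples_per_cycle, max_averaging_error(samples_per_cycle))
```

For the sinusoidal part the residual is bounded by X(1 − cos nδ), so at 24 samples per cycle and n = 1 it stays within a few percent of the sinusoid amplitude, and it falls further as the sampling rate increases.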
For feature extraction, an averaging filter, the MMF, is used. For the signal X_f(k), the dilation and erosion operations with the flat SE are described by (16) and (17), and from these the MMF is expressed as (19), which is restated for the special case n = 1 in (20). The difference between X_f(k) and D_n(k), where n = 1, 2, ..., m, is then calculated as in (21), and based on (21) the MMF output is constructed as (22). A fault onset is perceived if D_n(k) exceeds the threshold value M, which depends on the flow of fault signals through the transmission line.
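The feature extraction step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a flat SE of zero height spanning 2n + 1 samples (so dilation and erosion reduce to a sliding maximum and minimum), takes the MMF as the average of the two, and uses the absolute difference between the signal and the MMF output as the fault feature. The function names, the window clipping at the signal ends, the test current and the threshold of 0.5 are all hypothetical.

```python
import math


def dilate(signal, half_width=1):
    """Flat-SE dilation: sliding maximum over 2*half_width + 1 samples (clipped at the ends)."""
    return [max(signal[max(0, k - half_width):k + half_width + 1])
            for k in range(len(signal))]


def erode(signal, half_width=1):
    """Flat-SE erosion: sliding minimum over the same window."""
    return [min(signal[max(0, k - half_width):k + half_width + 1])
            for k in range(len(signal))]


def mmf(signal, half_width=1):
    """Morphological median filter taken as the average of dilation and erosion."""
    return [0.5 * (d + e)
            for d, e in zip(dilate(signal, half_width), erode(signal, half_width))]


def mmf_deviation(signal, half_width=1):
    """Feature: absolute difference between each sample and the MMF output."""
    return [abs(x - m) for x, m in zip(signal, mmf(signal, half_width))]


def detect_onset(signal, threshold, half_width=1):
    """Index of the first window centre whose deviation exceeds the threshold, or None."""
    for k, deviation in enumerate(mmf_deviation(signal, half_width)):
        if deviation > threshold:
            return k
    return None


def make_test_current(samples_per_cycle=24, cycles=5, fault_start=63, fault_gain=5.0):
    """Hypothetical current: unit sinusoid whose amplitude jumps at fault_start."""
    delta = 2.0 * math.pi / samples_per_cycle
    return [(fault_gain if k >= fault_start else 1.0) * math.sin(k * delta)
            for k in range(cycles * samples_per_cycle)]


if __name__ == "__main__":
    current = make_test_current()
    print("fault inserted at sample 63, onset flagged at", detect_onset(current, threshold=0.5))
```

Because the window is centred, a sample can be flagged as soon as the first post-fault sample enters its window, so the reported index may precede the inception sample by up to the half-width; in a causal relay the same decision becomes available half_width samples after the window centre.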

B. ATTRIBUTE SELECTION
The data features used to train the ML classifiers have a huge impact on model performance. Unrelated or partly relevant features can adversely affect the model performance. Hence, attribute selection is one of the most important steps to be performed. It is a way of choosing a subset of pertinent features to build a precise predictive model. Good feature selection helps in enhancing the learning accuracy. If the relevant feature subset is selected, it reduces the training time, the computational intricacy and over-fitting.
In accordance with (22), the fault features are extracted for all the phase and zero-sequence currents and voltages and are expressed as Di_n and Dv_n, respectively, where the index p ∈ P = {0, 1, 2, 3} corresponds to the phase A, B, C and zero-sequence current and voltage signals.
Here, the effectiveness of the extracted features is examined with the help of four basic feature selection measures (FSM) viz. i) information gain, ii) univariate feature selection, iii) recursive feature elimination, and iv) feature importance.

1) INFORMATION GAIN (IG)
IG quantifies how much knowledge a feature offers regarding the class, and hence helps to decide which variable in the available set of training attributes is the most significant for the classification. The set of attributes which maximizes the IG is selected [33].
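As an illustration of this measure, the sketch below computes IG as the reduction in class entropy obtained by conditioning on a discretized feature. The binning of a feature into "high" and "low" values and the toy labels are assumptions made for the example; how the features were discretized for Table 2 is not stated in the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(feature_bins, labels):
    """IG = H(class) - sum_v P(v) * H(class | feature bin = v)."""
    total = len(labels)
    groups = {}
    for bin_value, label in zip(feature_bins, labels):
        groups.setdefault(bin_value, []).append(label)
    conditional = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - conditional


if __name__ == "__main__":
    classes = ["AG", "AG", "BG", "BG"]
    # A feature that separates the classes perfectly versus one that carries no information.
    print(information_gain(["high", "high", "low", "low"], classes))
    print(information_gain(["high", "low", "high", "low"], classes))
```

A feature whose bins line up with the classes recovers the full class entropy, while a feature that is independent of the class yields zero gain, which is why the attribute set with the largest IG is preferred.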

2) UNIVARIATE FEATURE SELECTION (UFS)
This is a robust approach to enhance the classifier performance and to reduce the intricacy as well as the computing cost. Each attribute is evaluated individually. Univariate statistical tests are performed to pick those input variables that have a strong relationship with the output variables. The set of features with the highest scores is chosen.

3) RECURSIVE FEATURE ELIMINATION (RFE)
This is a greedy technique that aims to identify the most effective attribute subset. It iteratively builds a model and identifies the best or worst performing features at each iteration. It then ranks the attributes according to the order of their removal. The features marked with rank one are selected.

4) FEATURE IMPORTANCE (FI)
In this technique, tree-based models such as random forest and extra trees are applied to assess the feature importance. Scores are assigned to the attributes present in the dataset. Features with the highest scores are kept.
As mentioned earlier, the effectiveness of the extracted features is examined with the help of four basic FSM, and the results are given in Table 2. From Table 2, it is clear that the set Di_n has better classifying strength than Dv_n. Hence Di_n is used for composing the fault feature vector F_i.

C. FAULT CLASSIFICATION
Later, the hand-picked set of extracted attributes with known significant classes is given as an input to the ML classifier to classify the new test cases. Extracted features are assessed

IV. PERFORMANCE EVALUATION
In this section, both analytical and simulation tests are performed to confirm the effectiveness of the proposed technique for feature extraction as well as fault detection and classification. The performance is assessed from the perspectives of feature extraction time, group delay, computational intricacy, classifier performance, and classification time. The feasibility of the proposed technique is also validated using real-field data to demonstrate its suitability for real-time applications.

A. APPLICATION TO DEVIATION FINDING OF A SINE WAVE
In this subsection, to analyse the suitability of the proposed MMF-based feature extraction technique, two hypothetical sinusoidal signals X and Y are considered, as portrayed in Fig. 3(a) and Fig. 3(b), respectively. The signal Y represents a fault waveform and is composed of a fundamental component, harmonics, noise and DDC; conversely, the signal X is relatively pure. Both signals are expressed mathematically in (24) and (25), where A_0 represents the DDC amplitude, τ is the DDC time constant, m denotes the harmonic order, M specifies the highest harmonic order, A_1 is the magnitude and φ_1 is the phase angle of the fundamental component, whereas A_m is the amplitude and φ_m is the phase angle of the m-th harmonic component. Gaussian noise with standard deviation σ (varied from 0.1 to 1) is considered. After discretization, equations (24) and (25) are expressed in sampled form with t = nΔt, where Δt = 1/f_s signifies the sampling interval, f_s indicates the sampling frequency, N stands for the samples/cycle and n denotes the sampling instant (n-th sample) in the discrete domain. Higher-order harmonics (higher than N/2 − 1) are assumed to be screened out with a low-pass filter to avoid aliasing.
The proposed MMF-based technique is applied to both signals. The extracted features of both waveforms, D_x(n) and D_y(n), are depicted in Fig. 3(c). It is found that the instant of the sudden change in magnitude is revealed by a spike. By virtue of the noise immunity of the MMF [40], even though D_y(n) has some low ripples owing to the presence of noise in Y, it can still be applied for sensing a sudden change in the magnitude of the signals.

B. APPLICATION TO EMTP-GENERATED SIGNALS
In this subsection, a typical 400 kV, 50 Hz, 150 km long overhead transmission line with the parameters described in Table 3 is modelled using ATP/EMTP. The single line diagram of the system under study is shown in Fig. 4. All the data is acquired from the sending end. The fault current signals acquired from the relaying point at a sampling rate of 1.2 kHz are fed to the MMF for feature extraction. The set of extracted features is used for creating the fault feature vector/dataset, which is further used for training and testing of the DT. For this, simulations of different fault cases are carried out by varying the system constraints FIA, R_F, Z_S and D_F as shown in Table 4. The simulated fault current signal and the extracted features Di_n for an LG fault at D_F = 40 km and 100 km on phase 'A' and an LL fault at D_F = 40 km and 100 km on phases 'AB', with varying system constraints, are depicted for a few cases only in Fig. 5 and Fig. 6.
TABLE 6. Extracted fault feature vector using proposed scheme (Z_S1 = 100%, Z_S2 = 50%).
The values of Di_n obtained for these cases are reported in Table 5 and Table 6. It is observed that the value of Di_n for the faulted phase is high and exceeds the threshold value, while it is low for the healthy phases. Hence, the obtained results demonstrate that the extracted fault feature vector F = {Di_n}, p = 0, 1, 2, 3 can be used for both fault detection and faulty phase selection. Later, this extracted feature dataset with known pertinent classes is fed to the DT as input to categorize the new test cases.
Fault classes are labelled as 0 (no-fault situations, i.e. line/capacitor switching, external faults, lightning and normal cases), 1 (AG), 2 (BG), 3 (CG), 4 (AB), 5 (BC), 6 (AC), 7 (ABG), 8 (BCG), 9 (ACG), 10 (ABC), 11 (ABCG). The DT built for the above-mentioned combination of the dataset is shown in Fig. 7. The obtained results demonstrate the suitability of the proposed MMF-DT-based technique for fast, precise and dependable detection and classification of transmission line faults. Although the aim of this work is a feature extraction technique for quick and reliable detection and classification of transmission line faults, Table 5 and Table 6 show the extracted fault features at two different fault locations, 40 km and 100 km, respectively. These features could also be used in regression algorithms such as decision tree regression (DTR), support vector regression (SVR), multi-layer perceptron, etc. to estimate the fault location.
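The observation that the feature of a faulted phase exceeds the threshold while that of a healthy phase stays low can be turned into a simple labelling rule, sketched below. This is not the trained DT of Fig. 7: it is a hand-written threshold rule given only to show how the four features map onto the class labels listed above. The function name, the feature ordering (phase A, B, C, zero sequence) and the use of the zero-sequence feature to decide ground involvement are assumptions; in particular, a balanced ABCG fault may produce little zero-sequence current, so the last distinction would not be reliable in practice.

```python
# Class labels as listed in the text (0 = no fault, 1..11 = fault types).
FAULT_CLASSES = {
    "no fault": 0, "AG": 1, "BG": 2, "CG": 3, "AB": 4, "BC": 5,
    "AC": 6, "ABG": 7, "BCG": 8, "ACG": 9, "ABC": 10, "ABCG": 11,
}


def label_from_features(features, threshold):
    """Map (d_a, d_b, d_c, d_0) to a fault label with plain thresholds.

    features: per-phase features for A, B, C followed by the zero-sequence feature.
    threshold: hypothetical detection threshold shared by all four features.
    """
    d_a, d_b, d_c, d_0 = features
    phases = "".join(name for name, value in zip("ABC", (d_a, d_b, d_c))
                     if value > threshold)
    if not phases:
        return "no fault"
    if len(phases) == 1:
        # Every single-phase class in the list is a line-to-ground fault.
        return phases + "G"
    grounded = d_0 > threshold  # assumption: zero-sequence feature flags ground involvement
    return phases + ("G" if grounded else "")


if __name__ == "__main__":
    for sample in [(5.0, 0.1, 0.1, 4.0), (5.0, 5.0, 0.1, 0.1), (0.1, 0.1, 0.1, 0.1)]:
        label = label_from_features(sample, threshold=1.0)
        print(sample, label, FAULT_CLASSES[label])
```

A trained DT learns such thresholds from data, one split at a time, rather than relying on a single hand-picked value, which is what allows it to cope with varying fault resistance, inception angle and location.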
The effectiveness of the offered MMF-based feature extraction scheme is compared with other existing techniques [20], [24], [34], [35] from the perspectives of group delay and data window size. The obtained results are given in Table 7.
To confirm the suitability of the DT, the suggested feature extraction technique is also validated by combining it with other ML-based techniques such as ANN and SVM. The obtained results in terms of the classification accuracy (CA), root mean squared error (RMSE), mean absolute error (MAE), training time and response time are given in Table 8. The results show that the proposed MMF-DT-based scheme is more competent than the others and that fault detection and classification is achieved in less than a quarter of a cycle.
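The three error metrics named above can be computed as in the following sketch. It assumes that RMSE and MAE are evaluated on the numeric class indices of the true and predicted labels; the paper does not state how these two metrics were obtained for Table 8, so this is one plausible reading, and the label lists are made up for illustration.

```python
import math


def classification_accuracy(true_labels, predicted_labels):
    """Fraction of cases whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


def mean_absolute_error(true_labels, predicted_labels):
    """Mean of |true - predicted| over numeric class indices."""
    return sum(abs(t - p) for t, p in zip(true_labels, predicted_labels)) / len(true_labels)


def root_mean_squared_error(true_labels, predicted_labels):
    """Square root of the mean squared difference between numeric class indices."""
    squared = sum((t - p) ** 2 for t, p in zip(true_labels, predicted_labels))
    return math.sqrt(squared / len(true_labels))


if __name__ == "__main__":
    true_classes = [0, 1, 2, 3]       # hypothetical true class indices
    predicted_classes = [0, 1, 2, 5]  # one CG case misclassified as BC
    print(classification_accuracy(true_classes, predicted_classes))
    print(mean_absolute_error(true_classes, predicted_classes))
    print(root_mean_squared_error(true_classes, predicted_classes))
```

Note that, under this reading, RMSE and MAE depend on the arbitrary numbering of the classes, so CA is the more directly interpretable figure when comparing classifiers.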

C. APPLICATION TO REAL-FIELD SIGNALS
In this subsection, to assess its performance, the proposed technique is applied to real fault events recorded for different fault types on different phases of different transmission lines in the Maharashtra State Electricity Transmission Network, India. All the fault signals are in COMTRADE format and recorded at a 1.2 kHz sampling rate (24 samples/cycle). More than 100 fault cases are studied. Owing to space constraints, the results are shown for some cases only (Fig. 8 and Table 9); they verify the applicability of the offered method for real-time fault detection and classification.
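For reference, the stated sampling figures translate into the quarter-cycle decision window as follows. This is a worked arithmetic sketch in which the 50 Hz fundamental frequency is implied by the 1.2 kHz rate and 24 samples/cycle; the symbol f_0 for the fundamental frequency is introduced here only for the calculation.

```latex
N = \frac{f_s}{f_0} = \frac{1200\ \text{Hz}}{50\ \text{Hz}} = 24\ \text{samples/cycle}, \qquad
T = \frac{1}{f_0} = 20\ \text{ms}, \qquad
\frac{T}{4} = 5\ \text{ms} \;\Leftrightarrow\; \frac{N}{4} = 6\ \text{samples}
```

A decision reached in less than a quarter of a cycle therefore corresponds to fewer than 6 samples, or under 5 ms, after fault inception at this sampling rate.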

V. DISCUSSION
The proposed MMF-based feature extraction technique extracts the fault features reliably with a reduced data window size and a delay of less than a quarter of a cycle, which is lower than that of the other techniques. Apart from this, the suggested MMF-DT-based technique, which combines MMF with DT, is more competent when used for fault detection and classification. To confirm the suitability of the DT, its performance is compared with ANN and SVM. From the simulation and real-field results, the most significant outcomes are:
1) The proposed feature extraction technique involves only simple operations (addition, subtraction, minimum and maximum) and hence has a reduced computational intricacy compared to others.
2) It is highly immune to the DDC parameters, noise, and harmonics, which are frequently present during a fault.
3) The obtained results show that, when combined with DT, the proposed feature extraction scheme achieves speedier and more precise fault detection and classification than when combined with ANN or SVM.

VI. CONCLUSION
A new, simple MMF-based real-time fault feature extraction technique to achieve speedy and precise fault detection and classification on the HV transmission line is proposed in this paper. The efficacy of the proposed technique in extracting the distinctive fault features is verified in terms of data window size, delay and computational complexity by comparing it with recent techniques. The sampled 3-phase current signals, along with the zero-sequence current captured from the sending end of the transmission line, are pre-processed using MMF to extract the relevant fault features, which ease the fault detection. The extracted features are then fed as input to the DT for fault classification. The performance of the suggested MMF-DT-based approach is verified by simulating several fault events with varying system constraints such as FIA, R_F, Z_S and D_F on a 400 kV, 50 Hz, 150 km long overhead transmission line. To justify the choice of DT, the presented feature extraction technique is also validated by combining it with ANN and SVM. In addition, the proposed technique is applied to detect and classify real fault events recorded in the Maharashtra State Electricity Transmission Network, India. An extensive set of simulation and real-field results has revealed that:
1) The proposed feature extraction technique is highly sensitive to abrupt deviations, with reduced data window size, time delay and computational intricacy compared to others.
2) When combined with DT, the offered technique can be efficiently applied to obtain speedy and precise fault detection and classification on an HV transmission line.
3) The suggested MMF-DT-based approach needs data from only a single end of the line, and the decision-making is achieved within a quarter of a cycle with an accuracy of 99.98 %.
4) Apart from this, it is easy and simple to execute as it entails far fewer computations.
5) The high accuracy of the presented scheme relative to others makes it suitable for application in real-time protection schemes.