Signal Parameter Estimation and Classification Using Mixed Supervised and Unsupervised Machine Learning Approaches

The increasing use of modern power electronics raises the issue of harmonics in power systems, which ultimately deteriorate system performance through increased power loss, breaker failure and mal-operation of equipment. It has been found that the most severe harmonics in the system are the odd ones, due to their unsymmetrical nature. This work presents a new framework for the estimation and classification of harmonics using machine learning approaches. Initially, a shallow neural network and a fuzzy logic system are used to estimate the harmonic content of the voltage and current signals. The estimation is based on the sequence components and the IHD level of the source signals. The obtained results are compared with analytically computed data to validate the performance of the designed networks. The results from the neural and fuzzy systems are then used to train an explainable convolutional neural network (xCNN) for harmonics classification. The xCNN consists of a pre-trained ALEXNET network, which trains a standard binary support vector machine (SVM) for the classification of harmonics. A dictionary-based approach is used to add explanations, in the form of prototypes, to the SVM classifier output. The performance of the proposed framework is measured in terms of accuracy and loss function and is evaluated on the basis of its scalability and computability. The proposed approach is called Human with Machine-In-Loop (HMIL).


I. INTRODUCTION
The problem of harmonics is as old as power systems themselves, but it has multiplied in recent years with the evolution of power electronics. Power electronic converters, which are widely used to integrate Renewable Energy Sources (RES), Energy Storage Systems (ESS), loads, etc., are the major source of such disturbances. According to the IEEE 519 and EN50160 standards, harmonics below 2 kHz must be limited to 5% and 8%, respectively, for optimal performance of devices and systems.
Nowadays, power systems contain a large number of converters, and their cumulative contribution to harmonics is not simply the arithmetic sum of the individual contributions. Therefore, the analytical methods used to estimate and correctly quantify the actual harmonic content fail. One possible way to overcome this difficulty is to use statistical methods such as Monte Carlo Simulation (MCS) or the Unscented Transform (UT), but their computational time and cost are very high [1].
The associate editor coordinating the review of this manuscript and approving it for publication was Zhigang Liu.
Another possible solution is to use machine learning techniques, i.e., neural networks, fuzzy logic and Principal Component Analysis (PCA), for efficient harmonics estimation with a reasonable computational burden. Supervised machine learning maps inputs to outputs, while unsupervised learning does not give a specific output but reveals the trend of occurrence instead (such as classification and clustering). Figure 1 shows a generalized electrical power system architecture, indicating the main sources of harmonics as well as the location of the monitoring centers for estimation and evaluation of signals.
One of the key problems in the signal processing of power quality measures is the lack of both dimensionality reduction and adaptive characteristics. Machine learning (ML) approaches are very useful for solving both of these issues [2]. ML techniques can improve the hyper-parameters of data models and can give an abstract representation of the source data, which is necessary for processing the harmonic content in a reduced subspace. They can also be applied to generalize across different power system setups [2].
Based on the analysis of signal processing in the power quality domain [3], the odd harmonics in power systems turn out to be the most severe ones, due to their unsymmetrical nature. It is well known that symmetrical components are widely used to determine the type of harmonics. The zero sequence components are dominant in triplen harmonics (multiples of three), while the negative sequence component accounts for the major portion of the other odd harmonics. The influence of even harmonics on the system is mostly neglected due to their symmetrical nature.
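The link between harmonic order and sequence component can be illustrated with a short sketch. It is not taken from the original work: it assumes an ideal balanced three-phase system in which phases b and c lag phase a by h times 120 and 240 degrees for the h-th harmonic, and the function names are hypothetical.

```python
import cmath
import math

# Fortescue operator a = 1 at an angle of 120 degrees
A = cmath.exp(2j * math.pi / 3)


def sequence_components(va, vb, vc):
    """Return the (zero, positive, negative) sequence phasors of a three-phase set."""
    zero = (va + vb + vc) / 3
    positive = (va + A * vb + A * A * vc) / 3
    negative = (va + A * A * vb + A * vc) / 3
    return zero, positive, negative


def harmonic_phasors(h, magnitude=1.0):
    """Phasors of the h-th harmonic, assuming a balanced system (illustrative assumption)."""
    shift = -2 * math.pi / 3 * h
    return (magnitude + 0j,
            magnitude * cmath.exp(1j * shift),
            magnitude * cmath.exp(2j * shift))


def dominant_sequence(h):
    """Name of the sequence component with the largest magnitude for harmonic order h."""
    magnitudes = [abs(x) for x in sequence_components(*harmonic_phasors(h))]
    return ("zero", "positive", "negative")[magnitudes.index(max(magnitudes))]


for order in (3, 5, 9, 11):
    print(order, dominant_sequence(order))
```

Under these assumptions the 3rd and 9th harmonics map to the zero sequence, while the 5th and 11th map to the negative sequence.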

II. RELATED WORKS AND PROPOSED FRAMEWORK
Syazlina et al. (2003) proposed a fuzzy controller to regulate the amount of harmonics in systems with non-linear loads. The harmonic spectra of the signals were obtained using the Fourier transform. It was found that the designed fuzzy controller behaved like an active power filter while mitigating the harmonic content in the system [4]. However, the controller was unable to distinguish between source and load harmonics. Mazumdar et al. (2008) suggested a recurrent neural network for distinguishing between source and load harmonics using the back-propagation through time algorithm. The advantage of the proposed method was that it could be applied to active circuits for harmonics estimation without disconnecting the load from the source [5]. Its disadvantage was its deficiency in detecting harmonics in noisy data. Lin et al. (2007) developed a functional neural network for the detection of harmonics in the power system with noisy data for half of the values of disturbances. In terms of sensitivity, the model had a tolerance of less than 5% and was also compared with results obtained from DFT analysis [6]. The designed network was, however, unable to measure harmonic impedance, which is necessary for passive filter tuning. Zebardast et al. (2016) discussed the use of fuzzy logic systems, based on a recursion algorithm, for tracing the harmonic impedance of the network. Initially, a three-point method was used to remove the noise from the data, and then fuzzy logic together with the complex-valued recursive least squares (CRLS) algorithm was used to estimate the harmonic impedance of the network [7]. The system was tested only on a data set for a particular frequency. Nascimento et al. (2011) presented a parallel neural network for estimating the harmonics of non-linear loads in the power system from half-cycle data. The proposed technique was implemented along with an AC controller in an active filter to compensate the first six harmonics, based on the amplitude and phase of the current signals [8]. Due to the lack of explanations, the filter efficiency was compromised for inter-harmonics suppression.
Fernandez et al. (2019) discussed a hybrid approach combining evolutionary algorithms and fuzzy inference systems as explainable artificial intelligence. It was found that evolutionary fuzzy systems are building blocks of explainable machine learning, and their use in different applications was also presented [9]. The explanations, however, were not based on social aspects of human behavior.
Miller et al. (2019) explored the use of social science concepts for further improving the transparency of explainable AI algorithms. The key idea was to build constructive explanations that do not contain probabilities or contrastive queries and that overcome ambiguities outside the domain of human beings. This work was limited to explanations that consider only human behavior; the interaction between humans and machines is another aspect to be considered when framing explanations for AI models [10].
To the best of the authors' knowledge, there is no specific work available on estimating and classifying power system harmonics using an explainable machine learning approach. The proposed framework initially uses a neural network and fuzzy logic for harmonics estimation. A convolutional neural network (CNN) is then used to classify the harmonics into their respective classes. In order to improve the reliability and trust of the CNN, dictionary-based explanations, obtained using PCA, are added to the output of the classifier. The obtained results are then compared with analytically derived data to validate the performance of the proposed framework. The classification of harmonic content is based on the sequence components and the THD level of the source signal.
The presented work is arranged into six parts. Section-I introduces the need for machine learning techniques in signal parameterization. Section-II discusses previous work in the domain of signal parameter estimation and classification and the need for the proposed framework. Section-III presents the motivation behind the development of the framework. Section-IV describes the application of the neural and fuzzy approaches, as well as the explainable CNN, to the estimation and classification of signal harmonics. Section-V discusses the results obtained by applying the proposed framework to different scenarios. Finally, Section-VI gives concluding remarks on the achieved results and recommendations for future work.
VOLUME 8, 2020

III. PROBLEM STATEMENT AND CONTRIBUTIONS
The estimation and classification of harmonics is a crucial aspect of many industrial applications, e.g., the choice of filter, the type of motor drive insulation, converter topology, HVDC installation, etc. This research uses coordinated supervised and unsupervised machine learning techniques to first determine the total level of harmonics and then classify them into their respective classes. The biggest disadvantage of the machine learning approach is its black-box nature and unpredictable behavior. Therefore, in order to increase the reliability and performance of the learning approaches, explanations were added to the possible outcomes of the network; hence the name explainable machine learning.
The main contributions of the presented research are:
(1) Application of supervised learning approaches with a neural network and fuzzy logic. Harmonics are estimated using the sequence components and the individual THD of the signal components. The proposed systems use the least number of neurons and membership functions for harmonics estimation.
(2) Performance evaluation of the designed method by comparing the estimated quantities with analytically determined values. The comparative analysis ensures the effectiveness of the proposed framework in contrast to traditional methods of signal parameterization.
(3) A new framework for the classification of harmonics in smart grid systems. It uses mixed supervised and unsupervised learning to add dictionary-based explanations to the CNN. The xCNN not only classifies the harmonics into their respective classes but also provides the reason for putting them into a particular category.
The developed algorithms are model-agnostic methods and can be applied to any complex system, provided the system frequency and voltage levels are consistent.

IV. RESEARCH METHODOLOGY
A. TOTAL HARMONIC DISTORTION
THD is a power quality index defined as the root mean square of the first 40 components of the harmonic spectrum, expressed as a percentage of the fundamental frequency component, considering the inter-harmonics in the 100 Hz to 2 kHz frequency range [11]. The THD is computed using equation (1), in which IHD is the individual harmonic distortion of a harmonic component, i.e., the ratio of the RMS value of the individual harmonic component to the RMS value of the fundamental frequency component, as given by equation (2).
The THD of the distorted waveform is computed using the Fourier transform, which expresses the disturbances as a sum of separate sine and cosine waveforms. The Fourier series used to represent any waveform as a combination of sines and cosines is given by equation (3) [12], where A_0 is the DC offset in the waveform, m is the number of frequency components, f(x) is a recurring (periodic) function, A_m is a harmonic current sine wave in phase with the fundamental frequency and B_m is a multiple harmonic current sine wave in quadrature with the fundamental frequency. A_0, A_m and B_m can be found from equation (4).

B. ARTIFICIAL NEURAL NETWORK
ANNs are widely used in many deep-learning applications due to their robustness and computational flexibility. ANNs play a vital role in parameter estimation, as they take the input data and propagate it layer by layer to the output. Each neuron combines the weights and bias of all the signals coming from the previous layer. Based on the convolution of the two signals, the inputs pass through the activation function to produce the output, as shown in Fig. 2. Since the presented ANN consists of only one hidden layer, it is called a shallow neural network. The presented network is a feed-forward, multi-layer ANN consisting of input, hidden and output layers.
The development of neural networks is mainly based on three stages: training, testing and validation. Initially, the input data is sampled and divided into training, testing and validation subsets. The main goal of the training stage is to minimize the error, while during the testing stage the acceptability of resulting error is determined [13].
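The layer-to-layer propagation described above can be sketched in a few lines. This is a minimal illustration rather than the network used in this work: the tanh hidden activation and linear output are assumptions, the function names are hypothetical, and the example values (10 hidden neurons, weights of 0.1, offsets of 0.9) are the initial settings reported in Section V-B.

```python
import math


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer feed-forward pass: tanh hidden units, linear output (assumed)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def mse(targets, outputs):
    """Mean squared error, the quantity minimized during the training stage."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


# Ten hidden neurons and a single input, all weights 0.1 and all offsets 0.9.
n_hidden = 10
w_hidden = [[0.1] for _ in range(n_hidden)]
b_hidden = [0.9] * n_hidden
w_out = [0.1] * n_hidden
b_out = 0.9

y = forward([0.0], w_hidden, b_hidden, w_out, b_out)
print(y)
```

Training then amounts to adjusting the weights and offsets so that `mse` over the training subset decreases, while the test subset is used to judge whether the remaining error is acceptable.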

C. FUZZY INFERENCE SYSTEM
The Fuzzy Inference System (FIS) is based on IF-THEN logic for processing the input data. The fuzzy logic approach uses a probabilistic technique to produce the desired output. It is a multi-valued function that deals with the degrees of the membership function and the truth table collectively. Mathematically, it can be expressed by equation (5) [14], where u_T(y) is the membership function, which completely characterizes the fuzzy logic. The typical membership functions used for this application are the sigmoidal and triangular relations, given by equations (6) and (7) [14]. The formation of the fuzzy logic is based on four essential steps. First, the input and output variables are declared. Second, the membership function is chosen. Third, the associated rules are formulated. Fourth, the outcome is de-fuzzified. The fourth step also serves as a test of the system against a wide range of inputs. The conceptual diagram of the designed fuzzy logic is shown in Fig. 3.
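As an illustration of the two membership function families named above, the following sketch uses their common textbook forms. The exact parameterization of equations (6) and (7) is not reproduced here, so the parameter names (`slope`, `center`, `left`, `peak`, `right`) are assumptions.

```python
import math


def sigmoid_mf(y, slope, center):
    """Sigmoidal membership: rises from 0 to 1 around `center` (assumed standard form)."""
    return 1.0 / (1.0 + math.exp(-slope * (y - center)))


def triangular_mf(y, left, peak, right):
    """Triangular membership: 0 outside [left, right], 1 at `peak` (assumed standard form)."""
    if y <= left or y >= right:
        return 0.0
    if y <= peak:
        return (y - left) / (peak - left)
    return (right - y) / (right - peak)


# Degree of membership of a few inputs
print(sigmoid_mf(5.0, slope=1.0, center=5.0))
print(triangular_mf(1.0, left=0.0, peak=2.0, right=4.0))
```

Each function returns a degree of membership between 0 and 1, which the rule base then combines before de-fuzzification.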

D. EXPLAINABLE CONVOLUTIONAL NEURAL NETWORK
The Convolutional Neural Network is a powerful tool for image recognition and classification due to its ability to extract specific features directly. Using pre-trained CNNs, instead of training and validating on large data sets, reduces the computational cost and time. Despite their robustness and efficiency, neural networks suffer from low transparency and trust due to their black-box nature. Therefore, explanations are needed to enhance their reliability and authenticity. A pre-trained ALEXNET network (CNN) is used for training a standard binary support vector machine (SVM) for image classification. The dictionary-based approach is used to add explanations to the classifier output. The proposed approach is called a Human-In-Loop (HIL) system and is presented in Fig. 4. It explains why the classifier produced a particular output, and the explanations are given by the mediator in a human-interpretable way.
The support vector machine is in principle a binary classifier, but its functionality can be extended by grouping several of them into a multi-class family. In this work, the error correcting output code (ECOC) was used to merge them together. The binary SVM was used as an image classifier based on the ECOC algorithm, trained with cross-fold validation, and the k-fold technique was used to assess its quality [15]. The binary SVM is defined by equation (8), where K is the number of distinct classes in the data sets. The SVM is trained using the pre-trained ALEXNET, which is essentially used for extracting features from the images and transforming them into RGB matrices. The input to the SVM classifier consists of mid-level features and is essentially a 25-d feature vector.
The dictionary-based approach relies on mid-level and high-level features that are also understandable by human beings. The elements of the dictionary are the features of the objects to be recognized and classified. The dictionary does not include all the samples but only subsets, which makes it selective rather than distinctive. The idea is to decompose the input data set into meaningful features and pick a few of them for decision making. In order to obtain the elements of the dictionary, principal component analysis (PCA) is used [16]. The elements of the dictionary are human interpretable. PCA transforms the given input data into new variables along the principal axes of greatest variance. Mathematically, it is defined by equation (9), where X is the input data set of m dimensions, U is the set of principal coefficients of n dimensions with m >= n, and V is the set of principal vectors. PCA can be formulated as a minimization problem given by equation (10), where, in this case, V represents the dictionary, whose entries are called atoms, and U represents the variance of V. Both conditions of PCA, i.e., orthogonality and dimensionality, set constraints on the above minimization problem and can be relaxed by regularization terms for the principal vectors and their corresponding coefficient values.
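The notion of a principal axis of greatest variance can be made concrete with a small sketch. It substitutes plain power iteration on the sample covariance matrix for a full PCA or dictionary-learning routine, recovers only the first principal vector, and uses hypothetical function names and toy data; it is not the procedure applied to the image features in this work.

```python
import math


def center(rows):
    """Subtract the per-column mean from every sample."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already centered data."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_axis(rows, iterations=200):
    """Power iteration: returns (unit vector, variance along it).

    Assumes the start vector is not orthogonal to the leading axis.
    """
    cov = covariance(center(rows))
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                   for i in range(d))
    return v, variance


# Toy data lying on the line y = 2x (illustrative only)
data = [[-2.0, -4.0], [-1.0, -2.0], [0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
axis, var = first_principal_axis(data)
print(axis, var)
```

In dictionary terms, the returned vector plays the role of a single atom, and the variance along it indicates how much of the data that atom accounts for.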

A. ANALYTICAL APPROACH
The circuit diagram used as the test case for computing the harmonics and the THD level is shown in Fig. 5. The designed network is based on a non-linear source, which itself generates harmonics, and the load is supplied through a converter, which further degrades the power quality of the system. Figs. 6 and 7 show the voltage and current waveforms of the source, respectively, together with their corresponding Fast Fourier analysis. It can be seen that the 3rd, 5th, 7th and 11th odd harmonics are the dominant components of the source signals.
Figs. 8 and 9 show the voltage and current waveforms of the load, respectively, together with their corresponding Fast Fourier analysis. Again, the odd harmonics are the dominant components. This result is expected: the analyzed signals are inherently symmetrical, and therefore the dominant harmonics are always odd on both the source and load sides.
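The analytical computation of IHD and THD from a sampled waveform can be sketched as follows. The sketch assumes the commonly used forms IHD_h = X_h / X_1 and THD = sqrt(sum of IHD_h^2 for h = 2..40), which are consistent with the definitions in Section IV-A, and a direct DFT over exactly one fundamental period. The test waveform and its amplitudes are hypothetical and are not the signals of Figs. 6 to 9.

```python
import cmath
import math


def harmonic_magnitude(samples, h):
    """Peak amplitude of harmonic h; samples must span exactly one fundamental period."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * h * k / n)
              for k, x in enumerate(samples))
    return 2 * abs(acc) / n


def ihd(samples, h):
    """Individual harmonic distortion: harmonic h relative to the fundamental."""
    return harmonic_magnitude(samples, h) / harmonic_magnitude(samples, 1)


def thd(samples, max_order=40):
    """Total harmonic distortion over orders 2..max_order (as a fraction, not percent)."""
    return math.sqrt(sum(ihd(samples, h) ** 2 for h in range(2, max_order + 1)))


# Hypothetical waveform: fundamental plus 20% third and 10% fifth harmonic
N = 256
signal = [
    math.sin(2 * math.pi * k / N)
    + 0.2 * math.sin(3 * 2 * math.pi * k / N)
    + 0.1 * math.sin(5 * 2 * math.pi * k / N)
    for k in range(N)
]

print(round(100 * ihd(signal, 3), 2), "% IHD of the 3rd harmonic")
print(round(100 * thd(signal), 2), "% THD")
```

Because the ratio of peak amplitudes of two sinusoids equals the ratio of their RMS values, working with peak amplitudes here gives the same IHD as the RMS-based definition.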

B. ARTIFICIAL NEURAL NETWORK
The current signal selected for estimating the THD of the generated waveform was calculated on the basis of the following harmonics of the source signal: 3rd, 5th, 7th, 9th and 11th. First, the fundamental frequency component was set to 50 Hz, the nominal frequency of the power system. Then, the number of samples was set to 1000, as this proved large enough to provide an appropriate amount of training data with a reasonable computational burden. The sampling time was chosen to be half of the fundamental period to avoid signal aliasing, and a threshold value was set for the possible amplitude of the fundamental frequency. According to IEEE standards, the admissible amplitude of the analyzed signal should be within a tolerance of ±10% of the nominal value. The samples of the analyzed signal are stored in the variable 'harm', which is later used to determine the variable 'thd' for the training and testing data sets. The investigated neural network was created using the Neural Network Toolbox provided by the Matlab software. It is a typical feed-forward multilayer perceptron network.
As shown in Fig. 10, the investigated network consists of one hidden layer (with 10 neurons) and one output layer (1 neuron). In both layers, the weights 'w' and offsets 'b' are set to 0.1 and 0.9, respectively. The input signal (THD value) was calculated and sent to the network over 3000 times in order to provide effective learning. Initially, the network was designed for fitting the training data, and then the input data set was randomly divided: 70% of the samples were used as training data, 15% as test data, and the remaining 15% as the validation set. After that, the network was trained using the Levenberg-Marquardt algorithm. Figure 11 shows that the Mean Square Error curve of the test data coincides with the training data curve; therefore, the generalization achieved by the network is very good. The test curve reaches its best value, at the lowest mean squared error, after 119 epochs. At the beginning, after 12 epochs, the test data deviates from the training set and thus the covariance between the test and validation sets is large, but at the end, after 119 epochs, the variance becomes zero and the test data normalizes to the training data throughout the validation set.
As can be seen in Fig. 12, the training procedure was successful because the error between the actual and the desired output was minimized. Since the considered signals are periodic, the error is inherently small due to their symmetrical nature. It can also be seen that the signal does not contain even harmonics for the particular evaluated case, because they vanish for periodic signals. Fig. 13 shows the performance of the neural network with twenty neurons in the hidden layer. For the same number of iterations, increasing the number of neurons does not guarantee that the test curve and the training curve will coincide. In this case, over-fitting is observed.
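The random 70/15/15 division of the samples can be sketched as below. This is only an illustration of the split itself: the function name, the fixed seed and the use of Python's `random` module are assumptions and do not reproduce the Matlab toolbox routine used in this work.

```python
import random


def split_indices(n_samples, train=0.70, test=0.15, seed=0):
    """Randomly divide sample indices into training, test and validation subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed only for repeatability
    n_train = round(train * n_samples)
    n_test = round(test * n_samples)
    return (indices[:n_train],
            indices[n_train:n_train + n_test],
            indices[n_train + n_test:])


# 1000 samples, as used for the investigated signal
train_idx, test_idx, val_idx = split_indices(1000)
print(len(train_idx), len(test_idx), len(val_idx))
```

Keeping the three subsets disjoint is what makes the comparison between the training and test error curves meaningful as an indicator of over-fitting.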

C. FUZZY INFERENCE SYSTEM
For efficient detection of the desired quantity, the fuzzy inference system relies on the formation of effective rules and the use of appropriate membership functions for the input and output variables. The sigmoidal and triangular membership functions are used according to equations (6) and (7) and are shown in Fig. 14. The rules designed for the presented system to detect the correct harmonic level are listed below:
IF seq. comp. is zero AND IHD is 1.25% THEN it is 3rd Harmonic
IF seq. comp. is negative AND IHD is 14.65% THEN it is 5th Harmonic
IF seq. comp. is negative AND IHD is 6.25% THEN it is 7th Harmonic
IF seq. comp. is zero AND IHD is 2.25% THEN it is 9th Harmonic
IF seq. comp. is negative AND IHD is 3.5% THEN it is 11th Harmonic
Based on the investigated membership functions for the input and output variables of the fuzzy system, the FIS was tested for various harmonic contents and was found to operate successfully. Initially, for the sequence components, the membership function was selected as 3.5, but this led to difficulties in classifying the 9th and 11th harmonics. The value was then increased to 10, the minimum threshold for the IHD of odd harmonics, which eventually resolved the problem. The performance of the designed fuzzy inference system is shown in Fig. 15, which pictures the rule viewer used to validate the accuracy of detecting specific harmonics in the test circuit shown in Fig. 5.
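The rule base above can be illustrated with a simplified sketch. It is not the designed FIS: the sequence component is treated as a crisp label, the IHD is matched with a triangular membership of assumed half-width 1.0 centered on each rule value, the AND operator is taken as the minimum, and the rule with the highest firing strength wins. The harmonic order 7 in the third rule is inferred from the harmonic list in Section V-B.

```python
# (sequence component, IHD in percent, harmonic order)
RULES = [
    ("zero", 1.25, 3),
    ("negative", 14.65, 5),
    ("negative", 6.25, 7),   # order inferred from the list of analyzed harmonics
    ("zero", 2.25, 9),
    ("negative", 3.5, 11),
]


def triangular(x, center, half_width):
    """Symmetric triangular membership centered on `center` (assumed shape)."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


def classify(sequence, ihd_percent, half_width=1.0):
    """Return (harmonic order, firing strength) of the strongest rule, or (None, 0.0)."""
    best_order, best_strength = None, 0.0
    for rule_sequence, rule_ihd, order in RULES:
        seq_match = 1.0 if sequence == rule_sequence else 0.0
        # AND is implemented as the minimum of the two memberships
        strength = min(seq_match, triangular(ihd_percent, rule_ihd, half_width))
        if strength > best_strength:
            best_order, best_strength = order, strength
    return best_order, best_strength


print(classify("zero", 1.25))
print(classify("negative", 14.65))
print(classify("zero", 2.0))
```

With this choice of half-width, an input that falls between two rules of the same sequence component is assigned to the rule whose IHD value is closer.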

D. EXPLAINABLE CONVOLUTIONAL NEURAL NETWORK
In this case, the data is stored in the workspace and is then divided into training and testing sets containing 75% and 25% of the input data, respectively. The input data consisted of a total of 100 images stored in a pre-defined variable, fifty each for even and odd harmonics. ALEXNET has five convolution layers and three fully connected layers. For the first layer, the input images are arranged into RGB matrices. The images are resized before being sent to the next layer. Then, in order to obtain the features from the training and validation sets, the activation function is applied to the fully connected layers. In order to evaluate the performance of the classifier, its confusion matrix is calculated and is shown in Table 1. The computed confusion matrix indicates 95% accuracy of the designed SVM classifier for this particular test case, since its principal diagonal elements are almost identical.
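The way an accuracy figure follows from a confusion matrix can be shown with a short sketch. The counts below are hypothetical and chosen only so that the result equals 95%; they are not the entries of Table 1, and the function names are illustrative.

```python
def accuracy(confusion):
    """Overall accuracy: correctly classified samples (diagonal) over all samples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_recall(confusion):
    """Fraction of each true class (row) that was classified correctly."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


# Hypothetical two-class matrix: rows = true class, columns = predicted class
#                 predicted even  predicted odd
confusion = [[19, 1],   # true even
             [1, 19]]   # true odd

print(accuracy(confusion))
print(per_class_recall(confusion))
```

Reporting the per-class values alongside the overall accuracy shows whether the errors are balanced between the even and odd harmonic classes.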
Figure 16 shows a subset of randomly selected images used for training the designed neural network. The selection is diverse and balanced, as it contains samples of both even and odd harmonics.
Figure 17 shows the training of the designed CNN. The system is evaluated for 5 epochs with a mini-batch number of 20 to avoid over-fitting. At the first iteration, the accuracy of the system is 50% for both smoothed and unsmoothed training and also for testing. Before the 5th epoch, the accuracy for the training and validation sets reaches 95%. A smoothing window is applied to the training data for noise reduction and smoother trends. The loss function defines the difference between the actual and calculated values: the greater the loss, the lower the performance of the designed CNN. It can be seen from Fig. 18 that the loss value is initially high, but as the training progresses the loss decreases to zero and the system achieves higher robustness.
In order to ensure the generalization capability of the proposed framework, the performance of the system was evaluated on samples selected from outside the training and testing data sets. Fig. 19 shows the set of images used for testing purposes.
The designed network successfully recognizes and classifies the images according to their predefined harmonic classes, based on the THD values and sequence components. The output of the network displays the class of the tested image and has 95% accuracy for this particular case, as indicated by its confusion matrix, which is shown in Table 1.
The key response of the script is shown above, with the proper classification category and the confusion matrix used to determine its accuracy. To reduce the amount of data to two sets of vector components, called principal vectors, the PCA approach is applied to the statistically distributed input data set, as shown in Fig. 20. The principal components are found by maximizing the variance of the data in particular directions. Fig. 21 represents the variation of the eigenvalues of the respective principal components against the input data variance. Based on the PCA approach, the dictionary with atoms built on the features of the input test data is shown in Fig. 22. The test image comes from the odd harmonics class; thus, the dictionary contains the feature images of odd harmonics.
Based on the dictionary atoms obtained by applying the PCA technique, the explanation for test-1 is obtained along with its score and its matching class, as shown in Fig. 23. The explanation correctly indicates the odd harmonics class together with its relevance score. The explanations are given as prototypes, which give a complete reflection of the associated class.
In order to verify the generalization characteristics of the proposed framework, test images from outside the training and testing data sets are chosen and presented in Fig. 24. Based on the same dictionary atoms defined in Fig. 22, the framework places the test image into the even harmonics class, and the corresponding explanations are shown in Fig. 25. From these results, it is clear that the classification layers of the convolutional neural network rely on the signal estimations made by the designed neural and fuzzy systems. If the estimation results are sufficient to train the designed convolutional neural network over the possible signal components, then the proposed framework is able to generalize to different sets of signals, provided the frequency and voltage levels are consistent.

E. SCALABILITY AND COMPUTABILITY OF PROPOSED FRAMEWORK
The loss rate and accuracy, used as performance criteria in the previous section, are not sufficient to evaluate the effectiveness of a learning algorithm when dealing with large data sets, because these measures do not consider the major characteristics that determine the scalability of a learning algorithm. Hence, three additional factors are considered: error, time and accuracy. The target is to design a learning framework that deals with larger data samples in a shorter time period with the lowest possible error. These scalability measures are defined according to the PASCAL large scale learning challenge [17].
The scalability factors evaluated for the framework are shown in Fig. 26, which illustrates:
a) test error vs. training time, obtained by evaluating the test error over an allocated time period for the large data set output by the estimation part of the explainable convolutional neural network;
b) test error vs. memory size, obtained by plotting the training data sets against the corresponding test errors;
c) training time vs. memory size, obtained by plotting various training sets against the corresponding training time of the proposed framework.
In the presented situation, the PASCAL measures assign a score to each of the characteristics and then take the average of all scores to decide their rankings according to their severity level. The algorithm with the least test error, the lowest GPU memory consumption and the shortest training time is ranked the highest. From the results presented in Fig. 26, it can be seen that the test error reduces significantly when the proposed framework is trained on a large data set for a sufficient training time.
The learning rate vs. memory size characteristics of the proposed framework are plotted in Fig. 27 as a computability measure. The proposed framework has two output layers: one classifies the signal as belonging to a particular class, and the other gives the explanation through statistical analysis. From Fig. 27, one can see that the learning process of the proposed model is very fast in the initial period, when the GPU is unloaded, and then saturates as it passes through the classifier layer. The key reason for this behavior is that the accuracy of the signal is already high during the early stage of training.

VI. CONCLUSIONS AND RECOMMENDATIONS
A new framework for estimating and classifying harmonics was proposed. The initial step was based on designing a fuzzy logic and a neural network system for estimating the harmonic level on the basis of signal waveforms and the associated THDs. It was found that the neural network was more efficient than the fuzzy approach, as it correctly identified all the respective harmonic contents. In terms of the error rate, the ANN resulted in an error of 0.5%, while the fuzzy logic was at around 9%; however, the fuzzy approach was faster than the neural network, because the latter involved an iterative process during training and testing of different input samples. The biggest error value found during the presented investigation was 32.6%. Such an error might occur when the neural network output signal had a shape similar to that of another signal with a different THD value and was therefore recognized incorrectly. In most cases, however, the designed neural network performed correctly. The error, computed for a set of signal samples, was calculated as the relative difference between the THD value at the neural network output (approximation) and the THD value calculated using the analytical method (exact value). Since the samples were generated randomly and differed from the training set, the output error was smaller, and increasing the number of neurons in the hidden layer did not necessarily give better results.
After the estimation of harmonics, the pre-trained convolutional neural network was used for training the Support Vector Machine (SVM) through transfer learning and feature extraction. It was found that this solution drastically reduced the computational time and burden of training the designed network, with the least number of training and validation data sets, evaluated on the basis of the PASCAL criteria. The designed network was tested randomly with different numbers of signals, and it showed perfect accuracy in the recognition and classification of the input signals into their respective harmonic classes. The confusion matrix indicated a 95% accuracy of the designed CNN for the given test case. In order to improve the transparency and trust levels of the designed CNN, explanations based on the dictionary-based approach were added to the computational process. The applied technique was based on principal component analysis (PCA) and was formulated as a minimization problem for finding the principal vectors that serve as explanations for the designed neural network.
The designed neural network can be further improved by using a larger number of samples per sampling period, and the accuracy of the fuzzy inference system can be enhanced by utilizing the wavelet characteristics of the voltage and current signals. Another possible improvement of the designed system could be an extension with other explanation techniques, such as prototype, pixel-wise and layer-wise explanations, to test their effectiveness under a diverse range of learning techniques. The suggested technique could also be applied in other applications along with supervised machine learning techniques other than SVM.