A New Convolutional Network Structure for Power Quality Disturbance Identification and Classification in Micro-Grids



I. INTRODUCTION
Microgrids are an effective technical approach to the large-scale integration of distributed generation, an effective way to realize an active distribution network, and an effective means of implementing the transition from the traditional grid to the smart grid. However, because a microgrid uses a large number of power electronic devices, considerable harmonic content is injected into the grid, resulting in power quality problems such as voltage waveform distortion, fluctuation, flicker and three-phase unbalance [1]. These problems seriously threaten the security, stability and economic operation of the power system, as well as the surrounding electrical environment. Therefore, the identification and classification of microgrid disturbance signals is of great significance.

The associate editor coordinating the review of this manuscript and approving it for publication was Canbing Li.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

In terms of identification, with the development of computer simulation technology, many advanced signal characteristic extraction methods, such as the fast Fourier transform (FFT), wavelet transform (WT) and Hilbert-Huang transform (HHT), have emerged. However, each of these methods has shortcomings that are difficult to overcome. For example, the FFT requires the signal to be stationary and periodic and to satisfy the sampling theorem; otherwise, the analysis results exhibit spectral aliasing and the picket-fence effect. The WT suffers from the difficulty of choosing the wavelet basis, and the HHT faces the end effect (endpoint "flying wings") and mode mixing [2]. Numerous studies have shown that the convolutional structures in convolutional neural networks have strong characteristic learning and representation abilities. Sharing the convolutional kernel parameters in the hidden layers and sparse interconnection between layers enable convolutional neural networks to learn characteristics of data with a grid-like topology, such as image pixels and audio samples, with less computational effort; the learning is stable, and no additional characteristic engineering of the data is needed. This structure has a strong self-learning ability and is naturally suited to one-dimensional signals, so it can overcome the limitations of previous characteristic extraction methods. Against this background, scholars have pioneered the application of convolutional neural networks with traditional structures to the characteristic extraction of power quality disturbance signals [3]-[7]. However, a convolutional network structure that simply stacks convolution layers and then connects one or more fully connected layers has a poor learning ability.
As a result, the network converges slowly and generalizes poorly. Many scholars have found that increasing the depth of the network is currently the most effective way to enhance its learning ability. However, for traditional convolutional network structures such as LeNet and AlexNet, greater depth leads to excessive computation and overfitting. Among convolutional network models, the shortcut connections of the residual network (ResNet) allow the depth to be increased considerably without excessive computational effort, thus improving the accuracy. Within a residual block, the vanishing-gradient problem caused by increased depth is alleviated by a skip connection, which makes the network easy to optimize. The inception structure, in turn, uses computing resources more efficiently: it extracts more characteristics for the same amount of computation and improves the training effect. A method combining these two network structures has previously been proposed for image classification with good results [8]-[11].
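Why a skip connection eases optimization can be seen in a toy scalar example. In the minimal sketch below (plain Python; all function names are hypothetical), a residual block computes y = x + F(x), so dy/dx = 1 + F′(x): even when the learned branch F has an almost vanishing gradient, the identity path still passes a gradient of about 1 backwards. This illustrates the standard additive ResNet shortcut only; the 1D-MIR module described in Section II-B combines its branch outputs with the shortcut by multiplication instead.

```python
import math


def branch(x, w=1e-3):
    """A 'learned' branch F(x) with a tiny gradient (weakly weighted, saturated tanh unit)."""
    return w * math.tanh(x)


def plain_block(x):
    # Without a shortcut, the block output is just F(x).
    return branch(x)


def residual_block(x):
    # The identity shortcut adds the input back to the branch output: y = x + F(x).
    return x + branch(x)


def numerical_grad(f, x, h=1e-6):
    """Central-difference approximation of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)


x0 = 3.0
g_plain = numerical_grad(plain_block, x0)
g_res = numerical_grad(residual_block, x0)
print(f"plain block gradient:    {g_plain:.2e}")   # on the order of 1e-5
print(f"residual block gradient: {g_res:.6f}")     # close to 1
```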
In terms of classification, with the growth of computing power, power quality classification methods based on pattern recognition have been continually proposed, e.g., methods based on neural networks [12], random forests [13], wavelet neural networks [14], support vector machines [15], [16], k-nearest neighbors [17], and decision trees. These methods have partly solved the problems of earlier approaches and have considerably improved the accuracy of power quality disturbance classification. However, they still require a large amount of computation, and there is still room to improve their accuracy.
In this paper, a new method for power quality disturbance identification and classification is proposed based on a new deep convolutional network structure. A database of power disturbance signals, each labeled with an n-dimensional unit vector, is first used to train a deep convolutional network consisting of five 1D-MIR modules and three fully connected layers. The network, trained and optimized by the gradient descent method with adaptive moment estimation (Adam), is then applied to power quality disturbance identification and classification, improving both classification accuracy and speed.
The remainder of this paper is organized as follows. Section II describes the construction of the proposed convolutional network structure and its application to power quality disturbance identification and classification: Section II-A covers the establishment of the database, Section II-B the construction of the deep convolutional network with the 1D-MIR structure, and Section II-C the training process. Section III reports the simulation results obtained with the proposed method. Section IV analyzes representative cases of single and composite disturbances, which illustrate the validity of the classification method. Finally, Section V summarizes the main outcomes of this paper.

II. DESCRIPTION OF THE METHOD

A. ESTABLISHMENT OF THE DATABASE
According to IEEE standards, mathematical models of seven single power quality voltage disturbances are established: harmonics ($X_1$), transient oscillation ($X_2$), transient swell ($X_3$), transient sag ($X_4$), interruption ($X_5$), transient pulse ($X_6$) and fluctuation ($X_7$), as listed in TABLE 1. The fundamental frequency $f_0$ is 50 Hz, T is the power-frequency period, u(t) is the unit step function, and U is the voltage amplitude. Based on the actual conditions of the power grid, harmonics below the 30th order are considered, and the frequency range of the transient oscillation is 800 to 1800 Hz.
The disturbance signals listed in TABLE 1 are generated in Python with a sampling frequency of $f_s$ = 10 kHz; each signal spans 25 fundamental cycles (500 ms), i.e., 5000 sampling points. The waveforms corresponding to TABLE 1 are shown in Fig. 1.
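As an illustration of how such a database could be generated, the sketch below produces a pure fundamental and one random voltage sag at $f_s$ = 10 kHz over 25 cycles (5000 points). Because TABLE 1 is not reproduced here, the sag model $U[1 - a(u(t - t_1) - u(t - t_2))]\sin(\omega_0 t)$ and the ranges 0.1 ≤ a ≤ 0.9 and T ≤ $t_2 - t_1$ ≤ 9T are assumptions taken from models commonly used in the power quality literature, not the paper's exact parameters; the helper names are hypothetical.

```python
import math
import random

F0 = 50.0        # fundamental frequency (Hz), from the text
FS = 10_000.0    # sampling frequency (Hz), from the text
N = 5000         # 25 cycles x 200 samples per cycle = 500 ms
T = 1.0 / F0     # power-frequency period (s)


def step(t):
    """Unit step function u(t)."""
    return 1.0 if t >= 0 else 0.0


def fundamental(amplitude=1.0):
    """Undisturbed 50 Hz waveform sampled at FS."""
    return [amplitude * math.sin(2 * math.pi * F0 * n / FS) for n in range(N)]


def voltage_sag(rng, amplitude=1.0):
    """Assumed sag model: U * [1 - a * (u(t - t1) - u(t - t2))] * sin(w0 t)."""
    a = rng.uniform(0.1, 0.9)           # assumed sag depth range
    duration = rng.uniform(T, 9 * T)    # assumed duration range
    t1 = rng.uniform(0.0, N / FS - duration)  # random start time inside the window
    t2 = t1 + duration
    samples = []
    for n in range(N):
        t = n / FS
        envelope = 1.0 - a * (step(t - t1) - step(t - t2))
        samples.append(amplitude * envelope * math.sin(2 * math.pi * F0 * t))
    return samples, (t1, t2, a)


rng = random.Random(0)
clean = fundamental()
sag, (t1, t2, depth) = voltage_sag(rng)
print(len(clean), len(sag))  # 5000 5000
print(f"sag from {t1 * 1000:.1f} ms to {t2 * 1000:.1f} ms, depth {depth:.2f}")
```

Randomizing the start time, duration and depth in this way mirrors the randomized begin-end times and amplitudes described in Step 1 of the forward propagation process.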

B. BUILDING OF THE DEEP CONVOLUTIONAL NETWORK WITH A 1D-MIR STRUCTURE
R. Gong, T. Ruan: New Convolutional Network Structure for Power Quality Disturbance Identification and Classification

The convolutional network consists of three parts, used for characteristic extraction, characteristic classification and output optimization, and contains eight layers in total. It is built as follows. Characteristic extraction relies on five 1D-MIR modules. Each module consists of four branches: three are subnetworks with two convolution layers each, and the remaining branch is a shortcut, as shown in Fig. 2.
The three subnetwork branches are named Branch_0, Branch_1 and Branch_2. Their functions are to extract subtle characteristics of the signals and to reduce the computational load of the network. In every branch, the first layer is a convolution layer with a 1 × 1 kernel and 1 channel; the second layer is a convolution layer with a 1 × 3 kernel and 32 channels in Branch_0, a 1 × 5 kernel and 16 channels in Branch_1, and a 1 × 7 kernel and 8 channels in Branch_2, as listed in TABLE 2. Because the output tensors of the three branches have different dimensions, the output of each branch is padded symmetrically. The outputs of the three branches are then concatenated to form a characteristic vector, this vector is multiplied by the output of the directly connected channel (the shortcut), and the product is input to the next layer. The shortcut passes the input information directly to the output and thus preserves its integrity.
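The data flow through one 1D-MIR module can be sketched in plain Python under several simplifying assumptions: each branch keeps a single channel instead of 32, 16 and 8; the 1 × 1 convolution is a single scalar weight; symmetric zero padding ("same" padding) keeps every branch output as long as the input; and the shortcut is taken as the identity, whereas the paper's shortcut uses a downsampling convolution kernel. All function names are hypothetical, and this is not the authors' implementation.

```python
def conv1d_same(x, kernel):
    """Cross-correlation with symmetric zero padding; output length equals len(x).

    Assumes an odd kernel length (1, 3, 5, 7, ...).
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(x))
    ]


def branch(x, w_1x1, kernel):
    """One branch: a 1 x 1 convolution (a scalar weight here) followed by a 1 x k convolution."""
    squeezed = [w_1x1 * v for v in x]
    return conv1d_same(squeezed, kernel)


def mir_module(x, params):
    """Concatenate the three branch outputs channel-wise and multiply each by the shortcut.

    params: three (w_1x1, kernel) pairs with kernel lengths 3, 5 and 7.
    Returns a list of channels, each of length len(x).
    """
    shortcut = list(x)  # identity shortcut (simplification)
    channels = [branch(x, w, k) for w, k in params]  # channel-wise concatenation
    return [[c * s for c, s in zip(ch, shortcut)] for ch in channels]


x = [0.5, -1.0, 2.0, 0.0, 1.5]
identity_params = [
    (1.0, [0.0, 1.0, 0.0]),                      # Branch_0: 1 x 3
    (1.0, [0.0, 0.0, 1.0, 0.0, 0.0]),            # Branch_1: 1 x 5
    (1.0, [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]),  # Branch_2: 1 x 7
]
y = mir_module(x, identity_params)
print(len(y), [len(ch) for ch in y])  # 3 channels, each as long as the input
print(y[0])                           # identity kernels give x * x element-wise
```

With identity kernels each branch reproduces its input, so the multiplication by the shortcut yields the element-wise square of x; trained kernels would instead emphasize different time scales (3, 5 and 7 samples) in the three branches.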
The classification part consists of three fully connected layers and three dropout layers, connected alternately.
In each fully connected layer, every node is connected to all nodes of the previous layer, so the local information that discriminates between categories is integrated into a new characteristic vector. Each fully connected layer is followed by a dropout layer, which sets the outputs of a randomly selected subset of nodes to 0, as shown in Fig. 3.
The output optimization part consists of an output layer and an optimization layer. In the output layer, the softmax function (also called the normalized exponential function) converts the outputs of the classification part into class probabilities, from which the loss is computed. First, the probability of each output type is calculated. Next, the type of each power disturbance signal is determined from these probabilities and compared with the label assigned in advance, giving the ratio of incorrectly classified samples to all samples. The loss is then output. In the optimization layer, the Adam algorithm computes first- and second-order moment estimates of the gradients of the output-layer loss and dynamically adjusts the learning rate according to these estimates, thus regulating the step size of back propagation and minimizing the loss. The final network structure is shown in Fig. 4.

C. TRAINING PROCESS
The training process of the 1D-MIR deep convolutional network on the power quality disturbance signal database is divided into two processes: the forward propagation process and back propagation process.

1) FORWARD PROPAGATION PROCESS
Step 1: The disturbance signals listed in TABLE 1 are generated in Python. Two thousand random samples are generated for the fundamental signal and for each of the seven types of disturbance signals, giving 16000 samples in total. For each type, 1600 samples form the training set and 400 samples form the test set. To bring the simulated signals as close as possible to actual power system disturbances, the start and end times and the amplitude of each disturbance are varied randomly within set ranges. To improve the generalization ability of the network, data augmentation is performed by randomly shuffling the samples of the different classes when the disturbance signals are extracted [18].
To facilitate training, each class $X_0, X_1, \ldots, X_7$ is represented by an eight-dimensional unit vector. The 16000 samples and corresponding labels are divided into 100 batches with a batch size of 160, and each label has size 1 × 8 × 1, where 8 is the number of classes. Each sample is written as a vector, as defined in Eq. (1):

$$ x_i = \left[ x_i(1), x_i(2), \ldots, x_i(m) \right], \quad i = 1, 2, \ldots, n \qquad (1) $$

where n is the number of samples in a batch and m is the length of the collected signals. Each batch of samples is expressed as:

$$ X = \left[ x_1, x_2, \ldots, x_n \right]^{\mathrm{T}} \in \mathbb{R}^{n \times m} \qquad (2) $$

When the program starts, every network node is initialized, and the weights and biases are assigned randomly.
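The labeling and batching scheme can be sketched as follows. Class indices 0 to 7 stand for $X_0$ to $X_7$, and the helper names `one_hot` and `make_batches` are hypothetical; placeholder (id, class) pairs stand in for the 5000-point waveforms. The sketch only shows how 16000 shuffled samples split into 100 batches of 160 with eight-dimensional unit-vector labels.

```python
import random

NUM_CLASSES = 8          # fundamental (X0) plus seven disturbance types
SAMPLES_PER_CLASS = 2000
BATCH_SIZE = 160


def one_hot(class_index, num_classes=NUM_CLASSES):
    """Label a class with an eight-dimensional unit vector."""
    label = [0] * num_classes
    label[class_index] = 1
    return label


def make_batches(items, batch_size):
    """Split a list into consecutive batches of equal size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Placeholder samples: (sample_id, class_index) pairs instead of waveforms.
dataset = [
    (k * SAMPLES_PER_CLASS + j, k)
    for k in range(NUM_CLASSES)
    for j in range(SAMPLES_PER_CLASS)
]
random.Random(42).shuffle(dataset)  # mix the classes before batching

batches = make_batches(dataset, BATCH_SIZE)
labels_first_batch = [one_hot(k) for _, k in batches[0]]
print(len(dataset), len(batches), len(batches[0]))  # 16000 100 160
print(one_hot(3))                                   # [0, 0, 0, 1, 0, 0, 0, 0]
```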
Step 2: A batch-input function (next batch) is constructed on top of Python's built-in iterator mechanism. The samples and labels are fed into the network batch by batch to train the weights and biases of the characteristic extraction part.
Passing the batch samples drawn by the iterator through the five 1D-MIR modules yields one characteristic vector per sample.
The characteristic vector output by the n-th 1D-MIR module is given by Eq. (3). In Eq. (3), $y^{(n)}$ is the characteristic vector output by the n-th layer; $k^{n}_{a_n}$, $k^{n}_{b_n}$ and $k^{n}_{c_n}$ are the convolution kernels of the three branches of the n-th layer; $k^{n}$ is the downsampling convolution kernel of the n-th layer; and "*" denotes the convolution operation, with n ∈ {0, 1, 2, 3, 4}.
Step 3: The output of the characteristic extraction part is input to the classification part to train the weights of the fully connected layers. The specific operation is as follows:

i. All the weights are aggregated into a weight matrix W in the order of the hidden units, and the biases are likewise aggregated into a bias matrix B, where q is the length of the input vector, i.e., the number of data points input into the fully connected layer. W and B are initialized with random variables drawn from a Gaussian distribution, which are then multiplied by $\sqrt{2 / q^{(l-1)}}$.

ii. Each characteristic vector in a batch is input into the fully connected layer as a vector and is transformed into a compact characteristic vector $z^{(1)}$.

iii. The outputs of the fully connected layer are input into the dropout layer, whose inactivation rate lies in the range [0, 1] and is set to 0.8. A sparse vector is then obtained by zeroing the outputs of some of the nodes:

$$ \tilde{z}^{(2)} = r^{(2)} \odot z^{(1)} $$

where each element $r^{(2)}_n$ of $r^{(2)}$ is an inactivation factor drawn at random from a Bernoulli distribution, $z^{(1)}$ is the output of the first fully connected layer, ⊙ denotes element-wise multiplication, and $\tilde{z}^{(2)}$ is the resulting sparse vector. Repeating the same process for all three fully connected and dropout layers yields the final characteristic vector of the power disturbance signals.
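Items i to iii can be illustrated with a small plain-Python sketch: Gaussian weights scaled by $\sqrt{2/q}$ (He initialization), an affine fully connected layer, and a Bernoulli dropout mask. The layer sizes, the zero bias initialization, the omission of an activation function and the "inverted" rescaling of surviving nodes by 1/(1 − rate) are assumptions for illustration; the dropout rate used here is arbitrary, and the function names are hypothetical.

```python
import math
import random


def he_init(q_in, q_out, rng):
    """Gaussian weights scaled by sqrt(2 / q_in); biases start at zero (an assumption)."""
    scale = math.sqrt(2.0 / q_in)
    W = [[rng.gauss(0.0, 1.0) * scale for _ in range(q_in)] for _ in range(q_out)]
    B = [0.0] * q_out
    return W, B


def dense(x, W, B):
    """Fully connected layer z = W x + B (activation omitted for clarity)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W, B)]


def dropout(z, rate, rng):
    """Zero each node with probability `rate` (Bernoulli mask); rescale survivors by 1/(1 - rate)."""
    mask = [0.0 if rng.random() < rate else 1.0 for _ in z]
    return [m * v / (1.0 - rate) for m, v in zip(mask, z)], mask


rng = random.Random(1)
q = 56                                     # hypothetical input length
x = [rng.uniform(-1.0, 1.0) for _ in range(q)]
W, B = he_init(q, 32, rng)                 # hypothetical layer width
z1 = dense(x, W, B)
z2, mask = dropout(z1, rate=0.2, rng=rng)  # illustrative rate only
print(len(z1), len(z2), int(sum(mask)), "nodes kept")
```

The $\sqrt{2/q}$ scaling keeps the variance of the layer outputs roughly independent of the input length, which is why it is a common default for deep networks.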
Step 4: The final characteristic vector is input into the softmax layer and normalized with the softmax function, which maps each element to the interval (0, 1) and yields an eight-dimensional probability vector. The estimated probability that sample x belongs to category i is denoted p(i|x) and is given by

$$ p(i \mid x) = \frac{e^{x_{so}[i]}}{\sum_{j=1}^{8} e^{x_{so}[j]}}, \quad i = 1, 2, \ldots, 8 $$

where i is the index of the vector $x_{so}$. The closer an element of the probability vector is to 1, the more likely the sample belongs to the disturbance type at the corresponding position. The network assigns the sample to the class with the highest probability.
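The normalization in Step 4 and the final arg-max decision can be sketched in plain Python as follows. The logits are made-up values for an eight-class output; subtracting the largest logit before exponentiating is a standard numerical-stability trick that leaves p(i|x) unchanged.

```python
import math


def softmax(x_so):
    """Numerically stable softmax: p(i|x) = exp(x_so[i]) / sum_j exp(x_so[j])."""
    m = max(x_so)  # subtracting the max does not change the result
    exps = [math.exp(v - m) for v in x_so]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.2, 3.1, -1.0, 0.5, 0.0, 1.2, -0.3, 0.8]  # hypothetical 8-class output
p = softmax(logits)
predicted_class = max(range(len(p)), key=p.__getitem__)
print([round(v, 3) for v in p])
print("predicted class index:", predicted_class)  # 1, the position of the largest logit
```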
Step 5: The error rate is calculated by comparing the network's decisions with the corresponding labels. At the same time, the difference between the output and the sample label, called the loss value, is calculated node by node via the loss function in Eq. (11), where a is the input of each layer and y is the ground-truth label of the sample.
Then, the cost function is calculated according to Eq. (12).
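The paper's loss and cost expressions are not reproduced in this text. Assuming categorical cross-entropy on the softmax output, averaged over the batch to give the cost (both are assumptions, not confirmed by the paper), Step 5 can be sketched as follows; the function names and sample values are hypothetical.

```python
import math


def cross_entropy(probs, one_hot_label, eps=1e-12):
    """Categorical cross-entropy -sum_i y_i * ln(a_i) for one sample (assumed loss)."""
    return -sum(y * math.log(max(a, eps)) for a, y in zip(probs, one_hot_label))


def batch_cost_and_error(prob_batch, label_batch):
    """Average loss over the batch (assumed cost) and the fraction of misclassified samples."""
    losses = [cross_entropy(p, y) for p, y in zip(prob_batch, label_batch)]
    wrong = sum(
        max(range(len(p)), key=p.__getitem__) != y.index(1)
        for p, y in zip(prob_batch, label_batch)
    )
    n = len(prob_batch)
    return sum(losses) / n, wrong / n


# Two hypothetical samples with 8-class probability vectors.
probs = [
    [0.05, 0.65, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05],  # predicted class 1
    [0.30, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10],  # predicted class 0
]
labels = [
    [0, 1, 0, 0, 0, 0, 0, 0],  # true class 1 -> correct
    [0, 0, 1, 0, 0, 0, 0, 0],  # true class 2 -> wrong
]
cost, error_rate = batch_cost_and_error(probs, labels)
print(f"cost = {cost:.4f}, error rate = {error_rate:.2f}")
```

With one-hot labels, only the probability assigned to the true class contributes to each sample's loss, so a confident wrong prediction is penalized heavily.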

2) BACK PROPAGATION PROCESS
Step 1: The derivatives of the loss function obtained in the forward propagation process with respect to the weights (w) and biases (b) are determined.

Step 2: Using the chain rule, the derivatives of the loss function with respect to the weights and biases of each layer are obtained from the output $y^{[i]}$ of each layer, and these gradients are used to minimize the loss function. The back propagation equations are expressed in vectorized form. In the iteration process, the Adam optimization algorithm is adopted to reduce the amount of computation.
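For reference, the standard Adam update can be sketched in plain Python with the hyperparameters given in the parameter configuration below (learning rate 0.01, β1 = 0.9, β2 = 0.999). The value ε = 1e-8, the number of steps and the toy quadratic objective are assumptions; this is the generic algorithm applied to a single scalar parameter, not the authors' TensorFlow code.

```python
import math


def adam_minimize(grad, w0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=3000):
    """Minimize a scalar function with Adam, given its gradient function."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment (uncentered) estimate
        m_hat = m / (1 - beta1 ** t)           # bias corrections for zero initialization
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return w


# Toy loss L(w) = (w - 3)^2 with gradient dL/dw = 2 (w - 3).
w_star = adam_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(round(w_star, 3))
```

Because each step is divided by the running root-mean-square of the gradient, the step size stays close to the learning rate regardless of the gradient's scale, which is the "dynamic adjustment of the learning rate" described for the optimization layer.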

3) CONFIGURATIONS OF THE PARAMETERS AND ENVIRONMENT IN THE TRAINING PROCESS
i. Parameter configuration: the initial learning rate, the exponential decay rate β1 of the first-order moment estimate, and the exponential decay rate β2 of the second-order moment estimate are set to 0.01, 0.9 and 0.999, respectively.
iii. Software environment: Windows 10 64-bit, CUDA Toolkit 9.0, CUDNN V9.0, Python 3.7.2 and TensorFlow-GPU 1.13.1. The training and testing of the network are accelerated by the GPU.

III. RESULTS AND ANALYSIS
The key indicators for evaluating an algorithm for power quality disturbance identification and classification mainly include classification accuracy, computation cost and convergence speed. These indicators are also adopted in this paper.
The classification accuracy is defined as the ratio of the number of correctly classified samples to the total number of samples and is calculated according to Eq. (20):

$$ \text{accuracy} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}} \qquad (20) $$

The number of parameters to be trained (the number of weights) is determined by the network structure shown in Fig. 2 and is calculated by Eq. (21).
where $p_l$ is the number of parameters in layer l, K × G is the number of convolution kernels of the layer, and C is the number of convolution kernels or channels. The amount of computation is also determined by the network structure and is calculated by Eq. (22).
where O is the number of output characteristic graphs (feature maps), H and K are the height and width of the input characteristic graphs, respectively, S is the stride, and $P_h$ and $P_w$ are the numbers of pixels padded along the height and width directions, respectively. Using Eqs. (20), (21) and (22), the loss value and accuracy of the network on the training and testing sets are determined as functions of the number of iterations, as shown in Figs. 5, 6, 7 and 8. As the number of iterations increases, the gap between the classification results and the expected results decreases, i.e., the loss value decreases, and the statistical accuracy gradually increases.
As shown in Figs. 7 and 8, the maximum accuracy of the network on the testing set reaches 99.8%, and the loss value falls to 0.1910. Evaluating the network with 10-fold cross-validation gives an average accuracy of 98.5% and an average loss value of 0.1945 in the simulation experiment.
The curve of accuracy versus the number of iterations is almost flat, indicating that the accuracy depends only weakly on the number of iterations. Therefore, the number of iterations on the testing set can be reduced to lower the overall amount of computation.
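The 10-fold protocol mentioned above can be sketched as follows: the sample indices are shuffled once, split into ten folds, and each fold serves once as the test set while the other nine are used for training. `evaluate` is a hypothetical placeholder for training and testing the network on one split; the paper does not state how its folds were drawn.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def cross_validate(n_samples, evaluate, k=10):
    """Average a per-fold score returned by evaluate(train_idx, test_idx)."""
    scores = [evaluate(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Placeholder score (the test-set fraction) just to exercise the loop.
mean_score = cross_validate(16000, lambda tr, te: len(te) / (len(tr) + len(te)))
print(round(mean_score, 3))
```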
To examine the advantages of the proposed network over others, three indicators characterizing convergence speed (training time, total number of iterations and number of iterations at convergence) are measured, and two indicators characterizing computation cost (amount of computation and number of parameters) are calculated by Eqs. (22) and (21), respectively, for several networks for power quality disturbance identification and classification, including the network proposed in this paper and those in [3], [4] and [19].
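Since Eqs. (21) and (22) are not reproduced here, the sketch below uses the common textbook formulas for a 1D convolution layer as an assumption: output length O = ⌊(L + 2P − K)/S⌋ + 1, parameter count (K · C_in + 1) · C_out with one bias per output channel, and multiply-accumulate count O · K · C_in · C_out. It applies them to the second convolution layers of the three branches in TABLE 2 for a 5000-point input, assuming stride 1, "same" padding and a single input channel from the preceding 1 × 1 layer. Function names are hypothetical.

```python
def conv1d_output_length(length, kernel, stride=1, padding=0):
    """O = floor((L + 2P - K) / S) + 1."""
    return (length + 2 * padding - kernel) // stride + 1


def conv1d_params(kernel, c_in, c_out, bias=True):
    """K * C_in weights per filter, C_out filters, plus one bias per filter."""
    return (kernel * c_in + (1 if bias else 0)) * c_out


def conv1d_macs(length, kernel, c_in, c_out, stride=1, padding=0):
    """One K * C_in dot product per output position and output channel."""
    out_len = conv1d_output_length(length, kernel, stride, padding)
    return out_len * kernel * c_in * c_out


L_IN = 5000  # samples per signal
for name, k, c_out in [("Branch_0", 3, 32), ("Branch_1", 5, 16), ("Branch_2", 7, 8)]:
    pad = k // 2  # assumed 'same' padding
    print(
        name,
        conv1d_output_length(L_IN, k, padding=pad),
        conv1d_params(k, 1, c_out),
        conv1d_macs(L_IN, k, 1, c_out, padding=pad),
    )
```

Under these assumptions the three branches need only 128, 96 and 64 parameters, which illustrates why narrow 1 × 1 bottlenecks keep the parameter count of inception-style modules small.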
The results are summarized in TABLES 3 and 4, which indicate that the training cost of the proposed network is much lower than that of the others and that it converges much faster.
To verify the generalization ability of the network, 64 types of disturbance samples are considered, including 8 single disturbances and 56 composite disturbances. The number of samples is 32000, and they are divided into two training sets.
Figs. 6, 9 and 10 show the accuracy obtained on the three training sets (data sets I, II and III, respectively). The generalization accuracies of 1D-MIR are 99.88%, 99.80% and 99.61% on data sets I, II and III, respectively; although the data sets differ in size and type, these values are very close. Figs. 7 and 8 show the performance of the network on the test set.
The generalization accuracies of the four networks on these 64 disturbance types are compared in TABLE 5; they are calculated in the same way as in [20], [21].
The classification accuracies for the 8 types of single disturbances are summarized in TABLE 6, and those for the 56 types of composite disturbances are listed in TABLE 7. The accuracy is calculated with Eq. (20), and the numbers of correctly and incorrectly classified samples are obtained by screening all samples.
In addition, less data is used to train this network than the other networks. These findings demonstrate that the network has a strong generalization ability.
Moreover, to test the noise immunity of the method, Gaussian white noise with signal-to-noise ratios (SNRs) of 30 dB, 40 dB and 50 dB is added to the disturbance signals to simulate noise pollution in practice, and the resulting classification accuracies are also listed in TABLES 6 and 7. The accuracies remain high at all three noise levels. Compared with the common methods, the proposed method attains a much higher accuracy.
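The noise test can be reproduced in outline with the sketch below, which scales zero-mean white Gaussian noise so that SNR = 10 log10(P_signal / P_noise) matches 30, 40 or 50 dB. The pure 50 Hz test signal and the function name are illustrative assumptions; the paper does not give its noise-generation code.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    """Add zero-mean white Gaussian noise so that 10*log10(P_signal / P_noise) = snr_db."""
    p_signal = sum(v * v for v in signal) / len(signal)
    p_noise = p_signal / (10 ** (snr_db / 10))
    sigma = math.sqrt(p_noise)
    return [v + rng.gauss(0.0, sigma) for v in signal]


fs, f0, n = 10_000, 50, 5000
clean = [math.sin(2 * math.pi * f0 * k / fs) for k in range(n)]
rng = random.Random(7)
for snr in (30, 40, 50):
    noisy = add_white_noise(clean, snr, rng)
    noise = [a - b for a, b in zip(noisy, clean)]
    measured = 10 * math.log10(
        (sum(v * v for v in clean) / n) / (sum(e * e for e in noise) / n)
    )
    print(f"target {snr} dB, measured {measured:.1f} dB")
```

The measured SNR fluctuates slightly around the target because the noise power is estimated from a finite 5000-sample realization.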

IV. ANALYSIS OF THE EXAMPLES
To verify the effectiveness of the method, the single-disturbance types listed in TABLE 1 and several composite disturbance types are simulated in Python and then identified and classified by the proposed method. The results show that the method clearly reveals the start and end times of the disturbances and their characteristic information, thus enabling the disturbance types to be identified and classified. Examples of single and composite disturbances are given below to illustrate the validity of the classification method.

A. STEADY-STATE HARMONIC SIGNAL
For the given voltage signal, the waveform is shown in Fig. 12, and the characteristic waveform extracted by 1D-MIR is shown in Fig. 11. The figure shows that the steady-state harmonic disturbance occurs in the interval t ∈ [0, 500] ms, which is consistent with the information contained in the given signal and reflects the characteristics of this type of disturbance.

B. VOLTAGE SWELL
For the given voltage signal, the waveform is shown in Fig. 14, and the characteristic waveform extracted by 1D-MIR is shown in Fig. 13. The figure shows that the voltage swell occurs in the interval t ∈ [349, 385] ms, which is consistent with the information provided by the given signal and reflects the characteristics of this type of disturbance.

C. VOLTAGE SAG
For the given voltage signal, the waveform is shown in Fig. 16, and the characteristic waveform extracted by 1D-MIR is shown in Fig. 15. The figure shows that the voltage sag occurs in the interval t ∈ [386, 490] ms, which is consistent with the information in the given signal and reflects the characteristics of this type of disturbance.

D. VOLTAGE INTERRUPTION
For the given voltage signal, the waveform is shown in Fig. 18, and the characteristic waveform extracted by 1D-MIR is shown in Fig. 17. The figure shows that the voltage interruption occurs in t ∈ [240, 465] ms, which is consistent with the information provided by the given signal and reflects the characteristics of this type of disturbance.
For the composite disturbance of harmonics and voltage swell, the signal parameters satisfy 0.5T ≤ $t_2 - t_1$ ≤ 30T and 50 ms < $t_3$ < 150 ms. The waveform of the given signal is shown in Fig. 20, and the characteristic waveform extracted by 1D-MIR is shown in Fig. 19. The figures show that the harmonics occur in t ∈ [60, 415] ms and the voltage swell occurs in t ∈ [415, 465] ms, which is consistent with the information contained in the given signal and reflects the characteristics of this type of composite disturbance.
The waveform of the given signal is shown in Fig. 22, and the characteristic waveform extracted by 1D-MIR is shown in Fig. 21. As shown in the figures, the harmonics occur in t ∈ [75,360] ms, and the voltage interruption occurs in t ∈ [360,484] ms, which is consistent with the information provided by the given signal and reflects the characteristics of this type of composite disturbance.
The waveform of the given signal is shown in Fig. 24, and the characteristic waveform extracted by 1D-MIR is shown in Fig. 23. The figures reveal that the harmonics occur in t ∈ [0, 500] ms, the voltage flicker occurs in t ∈ [0, 500] ms, and the voltage sag occurs in t ∈ [150, 290] ms, which is consistent with the information in the given signal and reflects the characteristics of this type of composite disturbance.

FIGURE 23. The extracted characteristic waveform.
H. HARMONICS + SWELL + SAG + IMPACT

For the given voltage signal, the waveform is shown in Fig. 26, and the characteristic waveform extracted by 1D-MIR is shown in Fig. 25. As shown in the figures, the harmonics occur in t ∈ [0, 500] ms, the voltage sag occurs in t ∈ [210, 500] ms, the impact occurs in t ∈ [205, 209] ms and the voltage swell occurs in t ∈ [0, 205] ms, which is consistent with the information contained in the given signal and reflects the characteristics of this type of composite disturbance.

The waveform of the next given signal is shown in Fig. 28, and the characteristic waveform extracted by 1D-MIR is shown in Fig. 27. The figures show that the harmonics occur in t ∈ [0, 500] ms, the voltage flicker occurs in t ∈ [0, 500] ms, the voltage interruption occurs in t ∈ [315, 475] ms and the voltage swell occurs in t ∈ [475, 500] ms, which is consistent with the information in the given signal and reflects the characteristics of this type of composite disturbance.

The waveform of the next given signal is shown in Fig. 30, and the characteristic waveform extracted by 1D-MIR is shown in Fig. 29. As shown in the figures, the harmonics occur in t ∈ [0, 500] ms, the voltage flicker occurs in t ∈ [0, 500] ms, the voltage sag occurs in t ∈ [111, 159] ms and the voltage swell occurs in t ∈ [159, 500] ms, which is consistent with the information provided by the given signal and reflects the characteristics of this type of composite disturbance.

The waveform of the next given signal is shown in Fig. 32, and the characteristic waveform extracted by 1D-MIR is shown in Fig. 31. The figures show that the harmonics occur in t ∈ [0, 500] ms, the voltage flicker occurs in t ∈ [0, 500] ms, the voltage sag occurs in t ∈ [0, 211] ms, the voltage swell occurs in t ∈ [211, 249] ms, and the voltage interruption occurs in t ∈ [249, 500] ms, which is consistent with the information in the given signal and reflects the characteristics of this type of composite disturbance.

The parameters of the following example satisfy 0.5T ≤ $t_6 - t_5$ ≤ 30T, 0.05 < a < 0.2 and 0.1 < b < 0.5.
The waveform of the given signal is shown in Fig. 34, and the characteristic waveform extracted by 1D-MIR is shown in Fig. 33. As shown in the figures, the harmonics occur in t ∈ [0, 500] ms, the voltage flicker occurs in t ∈ [0, 500] ms, the voltage sag occurs in t ∈ [0, 170] ms, the impact occurs in t ∈ [170, 173] ms, and the voltage swell occurs in t ∈ [173, 500] ms, which is consistent with the information contained in the given signal and reflects the characteristics of this type of composite disturbance.

Figs. 11 to 34 show the time-domain signal waveforms and the extracted characteristic waveforms of 4 single disturbances, 2 two-component composite disturbances, 1 three-component composite disturbance, 3 four-component composite disturbances and 2 five-component composite disturbances. The figures reveal that the characteristic extraction part of the network not only extracts the characteristics of the signals but also magnifies the differences between them, so that they sufficiently characterize the corresponding disturbance types. The classification stage can therefore clearly distinguish the basic disturbance types contained in the given signal from the extracted characteristic waveforms and quickly localize the time at which each disturbance occurs, thus identifying the disturbance type.

V. CONCLUSIONS
To address the slow convergence, low accuracy and poor generalization of existing power disturbance identification and classification methods, this paper proposes a new deep convolutional network structure and, based on it, a power quality disturbance identification and classification method for microgrids. Extensive simulation experiments verify that the constructed network can quickly and accurately extract the characteristics of various disturbance signals, including single and composite disturbances, and identify and classify them. Compared with existing power disturbance identification and classification methods, the proposed method attains a higher classification accuracy (as high as 99.1%), faster convergence and stronger generalization. The measured classification accuracies of the network on the data sets with different sample types are 99.88%, 99.80% and 99.61%, the average classification accuracy is 99.1%, and the network training time is only approximately 6 minutes.
In conclusion, the network and the power quality disturbance identification and classification method proposed in this paper are feasible and effective, and they provide a new approach to power quality disturbance identification and classification. However, they have limitations. The main one is that they cannot yet detect certain parameters of disturbance signals, such as the amplitude of the disturbances. Future work will improve them so that they can be applied to more areas of power quality management.