Identification of Oscillatory Modes in Power System Using Deep Learning Approach

Rising electric power demand pushes modern power systems toward more interconnected networks. This leads to a lack of inertia and creates more critical disturbances in the power system. When the resulting oscillations are not damped out, they result in cascade tripping. Immediate detection of low-frequency oscillatory (LFO) modes and their parameters helps the power system operator act on a particular event without delay. This paper proposes novel strategies for identifying low-frequency modes using deep learning techniques, such that the model can predict the LFO modes in different topologies. The work presents a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) approach to predict the instantaneous oscillatory mode parameters in the power system. Once the LSTM-RNN model is trained for different power disturbance situations, it can be used for any event associated with the system. Simulation results are verified on the two-area Kundur system under various disturbance conditions. The simulations are performed using MATLAB and the Python TensorFlow library. The results are validated using statistical methods, which confirm the superior viability and adaptability of the proposed approach in predicting the instantaneous mode parameters.

The associate editor coordinating the review of this manuscript and approving it for publication was Ali Raza.

I. INTRODUCTION
Due to rapidly changing electrical dynamics, the modern interconnected power system is inevitably disturbed by various oscillation events. Electromechanical oscillations may occur at different frequencies, and they are not dangerous if they decay quickly. Without proper damping, however, the stability of the power system is compromised [1], [2]. The analysis of low-frequency oscillatory (LFO) modes and their characteristics leads to an adequate understanding of the dynamic performance of the power system and gives the operator useful inputs for prevention and control. Because of the issues caused by LFOs, the capability to monitor grid operations in real time is critical for the safe and reliable operation of the grid. Underdamped oscillations lead to significant power swings and tripping of protective relays, resulting in the disconnection of loads [3]. Advanced wide-area monitoring of power grids based on phasor measurement units (PMUs) can continuously evaluate the health of the power system. Dynamic monitoring of the power system for real-time operation and control has grown over the last two decades. Researchers have proposed various linear and nonlinear approaches to assess the dynamic responses and estimate the parameters of dominant low-frequency oscillatory modes. Power system modes are evaluated using two classes of methods: model-based and measurement-based. The former linearizes the governing equations about an operating point [4], and the latter follows a purely data-driven analysis of the system measurement data [5]. The IEEE task force on identifying electromechanical oscillatory modes summarizes the different techniques used in model-based and data-driven approaches [6]. The applicability of model-based techniques to large-scale power systems is constrained, and their computation time is too long. Measurement-based methods, in contrast, are widely used with synchrophasor technology to identify the low-frequency modes.
Measurement-based techniques are widely reported in the literature. They include Prony analysis [7], the matrix pencil method (MPM) [8], estimation of signal parameters via rotational invariance techniques (ESPRIT) [9], the autoregressive moving average (ARMA) technique [10], and the eigensystem realization algorithm (ERA) [11]. These methods are used in ring-down oscillation studies. The methods used for ambient oscillation studies include transfer function methods [12], [13] and subspace methods [14], [15]. The subspace approach gives better accuracy, but transfer function methods are preferred in terms of computation time [16].
Most power oscillation analyses assume a stationary signal. However, nonstationary analysis is preferred for power oscillation studies and parameter estimation. The Hilbert-Huang transform (HHT) is very popular in this class, and it comprises two techniques: empirical mode decomposition (EMD) and the Hilbert transform (HT) [17], [18]. EMD is a well-known signal decomposition technique, and the HT is used for instantaneous parameter estimation. EMD has been extended to ensemble EMD and complete ensemble EMD with adaptive noise (CEEMDAN) [19] to improve its denoising capability. The drawbacks of EMD are overcome by variational mode decomposition (VMD), a non-recursive signal decomposition technique suggested by Dragomiretskiy et al. for modal parameter estimation [21]. It exhibits better denoising properties and has a credible theoretical foundation. VMD employs an adaptive Wiener filter bank to efficiently decompose the test signal into band-limited modes around their center frequencies. The work presented in this paper uses VMD with the Teager-Kaiser energy operator (TKEO) to estimate the instantaneous parameters [22]-[24]. The TKEO is an estimation method that is more robust than the HT method [25]. The resulting estimates serve as supervised targets for the deep learning techniques adopted in this paper.
Recent trends show that artificial neural network (ANN) methods are powerful and reliable and perform well in real-time applications. In addition, the proposed method should track LFOs in real time and thus requires wide-area PMU data. Measurement-based estimation techniques combined with ANN-based methods can therefore extract useful information from the power system. ANN-based power oscillation damping control is suggested in a few studies [26], [27]. Because synchrophasors produce massive amounts of data, the raw data is first subjected to preprocessing and dimensionality reduction. This reduces the input data and the network size, and hence the computation time for offline training. In recent years, deep learning has become one of the most efficient tools in many research areas. It has been successfully applied in power systems, particularly to short-term load forecasting and disturbance classification [28]. Other works include a deep learning approach with a cost loss function for transient stability assessment proposed by Zhou et al. [29] and the prediction of load demand in the smart grid using an LSTM network presented by Cheng et al. [30].
This paper presents a deep learning method using an LSTM-RNN to estimate low-frequency mode parameters. The method is compared with results obtained using a conventional backpropagation neural network (BPNN), a radial basis function neural network (RBFNN), and a gated recurrent unit (GRU) architecture. The results for the Kundur two-area system are verified using MATLAB and the Python TensorFlow library. The analysis follows a feature-imperative strategy that enhances the determination of dominant modes in power system oscillations, so that an operator can quickly determine the critical modes in the power system. Deep learning methodologies also enhance the data visualization possibilities for the dominant modes.
The significant contributions of this study are as follows:
(1) The identification of oscillatory modes in power systems using deep learning techniques is discussed.
(2) The LSTM-RNN model is trained using the advanced signal processing methods of VMD and TKEO.
(3) The proposed LSTM-RNN method offers lower computation time and memory storage than the existing learning methods.
The rest of the paper is organized as follows. Section II describes the methodology: synchrophasor technology, the VMD approach, the TKEO technique, and the LSTM architecture. Section III presents the results and discussion, and the concluding remarks are given in Section IV.

II. METHODOLOGY
This section introduces the standard synchrophasor data acquisition procedure, signal decomposition using VMD, estimation of instantaneous parameters using the TKEO, and the LSTM architecture. It also states the importance of the LSTM over other learning approaches.

A. SYNCHROPHASOR DATA ACQUISITION
PMUs provide time-stamped measurements of the positive-sequence voltages and currents of all buses and feeders where they are installed, in addition to frequency and rotor angle. The system voltage is collected using PMUs connected to the buses. Fig. 1 represents the process of data acquisition using PMUs. The measured data sets are sent to a phasor data concentrator (PDC), which rejects bad data and stores the remaining information for further analysis. Another advantage of synchrophasors is that data measured at different substations can be correlated to a common time reference, and hence the status of a wide-area network can be assessed. When a disturbance occurs, the data is collected and subjected to a post-disturbance analysis. One such method, consisting of VMD and TKEO, is discussed in the following subsections.

B. VARIATIONAL MODE DECOMPOSITION
The preprocessed data is decomposed using the VMD technique, and the instantaneous parameters, namely instantaneous amplitude (IA), instantaneous frequency (IF), and damping ratio (DR), are estimated using the TKEO method. This subsection discusses the VMD technique, and the TKEO is presented in the next subsection. VMD is a multi-resolution analytical signal decomposition method based on the concepts of adaptive Wiener filtering, the one-dimensional Hilbert transform, and heterodyne demodulation [21]. Its aim is to decompose a real-valued nonlinear, nonstationary signal $f(t)$ into a discrete set of quasi-orthogonal intrinsic mode functions (IMFs) $u_k$, $k = 1, \dots, K$, where $K$ denotes the number of modes. These IMFs are regarded as amplitude- and frequency-modulated signals, each with a center frequency $\omega_k$. VMD requires the following computational steps:
(1) The Hilbert transform is applied to each IMF to obtain its analytic signal, which has a one-sided spectrum.
(2) Each mode is multiplied by $e^{-j\omega_k t}$ to shift its frequency spectrum to baseband.
(3) The bandwidth is estimated from the squared $L^2$ norm of the gradient of the demodulated signal.
VMD is formulated as a constrained optimization problem, as in (1). The objective function is then converted into an unconstrained optimization problem, as in (2). VMD determines the center frequencies and the IMFs at these frequencies concurrently using an optimization technique called the alternating direction method of multipliers (ADMM) [36]. The optimization problem is formulated in continuous time. The modes are determined by iteratively updating each mode and its center frequency using equations (3) and (4). After the modes and center frequencies are updated, the Lagrangian multiplier is also updated, as in equation (5). The updates are repeated until the convergence criterion in equation (6) is met, where ε is the tolerance factor.
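For reference, the steps referred to as (1) to (6) can be written in the standard VMD form of [21] as sketched below. This is an assumption about what the referenced equations contain: the numbering follows the references in the text, the notation in the original equations may differ, and the dual ascent step $\tau$ and the Fourier-domain hats are symbols taken from the standard formulation rather than from this section.

```latex
\begin{align}
&\min_{\{u_k\},\{\omega_k\}} \sum_{k=1}^{K}
  \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right]
  e^{-j\omega_k t} \right\|_2^2
  \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{1} \\
&\mathcal{L}\left(\{u_k\},\{\omega_k\},\lambda\right) =
  \alpha \sum_{k=1}^{K}
  \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right]
  e^{-j\omega_k t} \right\|_2^2
  + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2
  + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{2} \\
&\hat{u}_k^{n+1}(\omega) =
  \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}
       {1 + 2\alpha \left( \omega - \omega_k \right)^2} \tag{3} \\
&\omega_k^{n+1} =
  \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}
       {\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega} \tag{4} \\
&\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega)
  + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right) \tag{5} \\
&\sum_{k=1}^{K}
  \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}
       {\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon \tag{6}
\end{align}
```

In this form, $\alpha$ is the fidelity factor that weights the bandwidth term against the reconstruction error, which is why a larger $\alpha$ yields narrower modes.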
The IMFs are extracted according to the mode number. The fidelity factor (α) and the mode number (K) are needed to initialize VMD. If these two parameters are assigned arbitrarily, needless decomposition stages result. In this work, the Fourier spectrum of the PMU data samples is computed [25], and the number of peaks in the spectrum is taken as the mode number. For the fidelity factor, higher values of α are typically preferred for low-frequency extraction, and α is set to 8000 for the VMD process.
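The choice of the mode number from the spectral peaks can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function names, the relative threshold `rel_threshold`, and the two-tone test signal are assumptions, since the text does not state the peak-picking criterion.

```python
import cmath
import math


def one_sided_spectrum(samples):
    """Magnitudes of DFT bins 0..N/2 of the mean-removed signal (naive O(N^2) DFT)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [v - mean for v in samples]
    return [
        abs(sum(c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centered)))
        for k in range(n // 2 + 1)
    ]


def count_spectral_peaks(samples, rel_threshold=0.1):
    """Return the bins that are local maxima above rel_threshold * (largest magnitude)."""
    mags = one_sided_spectrum(samples)
    floor = rel_threshold * max(mags)
    return [
        k for k in range(1, len(mags) - 1)
        if mags[k] > floor and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
    ]


# Illustrative signal (not the paper's data): 0.5 Hz and 1.5 Hz components at 60 samples/s.
fs = 60
x = [math.sin(2 * math.pi * 0.5 * n / fs) + 0.5 * math.sin(2 * math.pi * 1.5 * n / fs)
     for n in range(240)]
peak_bins = count_spectral_peaks(x)
mode_number = len(peak_bins)                      # candidate value of K for VMD
peak_freqs_hz = [k * fs / len(x) for k in peak_bins]
```

For measured data, a windowed FFT from a numerical library would normally replace the naive DFT used here; the threshold decides how small a spectral peak may be before it is ignored.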

C. TEAGER KAISER ENERGY OPERATOR
The TKEO tracks the modulation energy of a signal and determines its instantaneous amplitude and frequency. A highly nonlinear energy operator is considered here, and the TKEO ψ(·) is one of the best options. For a continuous signal c(t), the operator is formed from the signal and its first and second time derivatives; in discrete form it is evaluated from three adjacent samples of the discrete-time signal c(n). The operator has a better time resolution in capturing energy fluctuations. The instantaneous amplitude (IA) and instantaneous frequency (IF) are obtained from the energy operator outputs, and the instantaneous damping ratio (DR) is calculated by (15), where $\omega_n$ is the natural frequency. The TKEO method gives a good estimate of the instantaneous parameters with low computational complexity.
In all the cases analyzed in this paper, the properly preprocessed input data [31], [32] is fed to the VMD process, and the mode parameters are estimated through the TKEO. This kind of instantaneous parameter estimation is described in [25]. These results are taken as the actual values for comparison with the learning algorithms. The LSTM-RNN architecture is described in the following subsection.
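The discrete energy operator and the resulting amplitude and frequency estimates can be illustrated with a short sketch. It uses the standard discrete TKEO, $\psi[c(n)] = c^2(n) - c(n-1)\,c(n+1)$, together with the DESA-2 energy separation algorithm. DESA-2 is chosen here only as one common variant and is an assumption; the estimator used in this work may differ, and the damping ratio of (15) is not reproduced. The function names are hypothetical.

```python
import math


def tkeo(x):
    """Discrete Teager-Kaiser energy psi[x(n)] = x(n)^2 - x(n-1) * x(n+1), for n = 1..N-2."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x, fs):
    """Estimate instantaneous amplitude and frequency (Hz) with DESA-2.

    Returns two lists of length len(x) - 4, aligned with samples n = 2..N-3.
    Entries are NaN where an energy value is not positive.
    """
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]  # symmetric difference
    psi_x = tkeo(x)  # psi_x[i] belongs to sample n = i + 1
    psi_y = tkeo(y)  # psi_y[j] belongs to sample n = j + 2
    amplitude, frequency = [], []
    for j, py in enumerate(psi_y):
        px = psi_x[j + 1]
        if px <= 0.0 or py <= 0.0:
            amplitude.append(float("nan"))
            frequency.append(float("nan"))
            continue
        arg = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        omega = 0.5 * math.acos(arg)                 # rad/sample
        amplitude.append(2.0 * px / math.sqrt(py))
        frequency.append(omega * fs / (2.0 * math.pi))
    return amplitude, frequency
```

For an undamped sinusoid, both estimates are exact up to rounding, which makes the sketch easy to check before it is applied to a decomposed mode.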

D. LSTM ARCHITECTURE
A conventional BPNN assumes that data instances are independent and cannot handle sequential data such as text and time series. The recurrent neural network (RNN) is a class of neural networks whose cyclic connections give it the ability to work with temporal data. Two RNN variants are well known: the Jordan network and the Elman network. The Jordan network is a simple recurrent network in which X is the input, h is the hidden representation, y is the output, σ is the activation function, $W_h$ is the weight of the hidden layer, $W_y$ is the weight of the output layer, and $W_r$ is the weight of the recurrent computation. Elman proposed a slight modification in which the information from the previous time step is provided by the previous hidden layer, and the selection of $W_r$ also differs from the Jordan technique [33], [34]. The bidirectional RNN was later introduced by Schuster and Paliwal [35]. Its first hidden layer is unfolded as in a basic RNN, and the second hidden layer is unfolded with reversed connections. Backpropagation can then be applied through time, and the weights are updated. This is the basic concept underlying the LSTM-RNN. The disadvantage of this technique is the vanishing gradient problem described in [36]. It arises with traditional activation functions whose gradients are bounded. The gradients are calculated by backpropagation, and the error values decrease exponentially over the time steps, which eventually leads to the loss of long-term dependencies. The LSTM was introduced to overcome the vanishing gradient problem by incorporating a specially designed memory unit called the LSTM cell. The LSTM thereby avoids the long-term dependency problem of the RNN.
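For reference, the Jordan and Elman recurrences described above are commonly written as follows, using the symbols of this section. The time index $t$ is an assumption added for clarity and bias terms are omitted, so this is a sketch of the usual textbook form rather than a reproduction of the original equation.

```latex
\begin{align}
\text{Jordan:} \quad & h_t = \sigma\left( W_h X_t + W_r\, y_{t-1} \right),
  & y_t &= \sigma\left( W_y h_t \right) \\
\text{Elman:} \quad & h_t = \sigma\left( W_h X_t + W_r\, h_{t-1} \right),
  & y_t &= \sigma\left( W_y h_t \right)
\end{align}
```

The only difference is the fed-back quantity: the previous output $y_{t-1}$ in the Jordan network and the previous hidden state $h_{t-1}$ in the Elman network.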
The LSTM is characterized by states and gates: the states are the values that carry information toward the output, and the gates decide how information flows between the states. The input state I is computed from h, the value of the hidden layer (the hidden state), and x, the input data. In addition to the input state there is an internal state, denoted m, which serves as memory. There are three gates: the input gate, the forget gate, and the output gate. The input gate, denoted g, decides whether the input state enters the internal state. The forget gate, denoted f, decides whether the internal state forgets the previous internal state. The output gate, denoted o, determines whether the internal state passes its value to the output and to the next hidden state. In the complete formulation of the LSTM network, the symbol * represents element-wise multiplication, and the whole LSTM architecture is represented symbolically in Fig. 2.
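The interaction of the states and gates can be illustrated with a single-unit LSTM step. The sketch follows the standard LSTM formulation with the symbol names used above (I, g, f, o, m, h); the parameter layout, with one `(w_h, w_x, b)` triple per state or gate, is a hypothetical choice for illustration and is not taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, m_prev, params):
    """One time step of a scalar (single-unit) LSTM cell.

    params maps each of "I", "g", "f", "o" to a tuple (w_h, w_x, b).
    Returns the new hidden state h and the new internal state m.
    """
    def pre_activation(name):
        w_h, w_x, b = params[name]
        return w_h * h_prev + w_x * x + b

    input_state = math.tanh(pre_activation("I"))   # I: candidate information
    g = sigmoid(pre_activation("g"))               # input gate
    f = sigmoid(pre_activation("f"))               # forget gate
    o = sigmoid(pre_activation("o"))               # output gate
    m = g * input_state + f * m_prev               # internal (memory) state
    h = o * math.tanh(m)                           # hidden state passed onward
    return h, m
```

With all weights and biases set to zero, every gate equals 0.5 and the input state is zero, so the cell simply halves its memory at each step, which is a convenient sanity check.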
The hyperparameters of deep neural networks are set by optimization or tuning. Searching for the hyperparameters that give the best model performance on a specified data set is called hyperparameter optimization. It involves defining a search space in which each dimension represents a hyperparameter and each point represents a model configuration [37]. The batch size and the number of epochs are tuned first: batch sizes of 10, 20, 40, 60, 80, and 100 and epochs of 100, 150, 200, and 250 are tried in different combinations, and a batch size of 20 with 200 epochs is found to be the best. Next, a grid search is carried out over the number of hidden layers and the number of neurons per hidden layer, using the candidate configurations [10], [10, 10], [10, 20], and [10, 20, 30]. Of these, [10] gives the best results, which corresponds to a bidirectional network with one hidden layer of ten neurons.
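The first stage of this search can be sketched as an exhaustive loop over the batch-size and epoch grid. The grid values are those listed above; `grid_search` and `toy_loss` are hypothetical names, and `toy_loss` is only a placeholder with a known minimum so that the sketch runs on its own. In practice the evaluation function would train the network and return its validation loss.

```python
from itertools import product

BATCH_SIZES = [10, 20, 40, 60, 80, 100]
EPOCHS = [100, 150, 200, 250]


def grid_search(evaluate, batch_sizes=BATCH_SIZES, epochs=EPOCHS):
    """Evaluate every (batch size, epochs) pair and return (loss, batch_size, n_epochs) of the best."""
    best = None
    for batch_size, n_epochs in product(batch_sizes, epochs):
        loss = evaluate(batch_size, n_epochs)
        if best is None or loss < best[0]:
            best = (loss, batch_size, n_epochs)
    return best


def toy_loss(batch_size, n_epochs):
    """Placeholder score, NOT a real validation loss; replace with model training."""
    return abs(batch_size - 20) + abs(n_epochs - 200)


best_loss, best_batch, best_epochs = grid_search(toy_loss)
```

The grid contains 6 x 4 = 24 configurations, so each added hyperparameter dimension multiplies the number of training runs.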
An activation function transforms the weighted sum of the inputs to a node, or to the nodes within a layer of a neural network, into an output. A rectified linear unit (ReLU) activation function is used; it outputs x if x is positive and zero otherwise, that is, A(x) = max(0, x). Because it involves simpler mathematical operations, ReLU is computationally less expensive than tanh or sigmoid. Only a few neurons are activated at a time, making the network sparse and efficient to compute.
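A minimal sketch of the activation and of the sparsity it produces is given below. The helper `active_fraction` is a hypothetical name introduced only to show how many units remain active for a given set of pre-activations.

```python
def relu(x):
    """Rectified linear activation A(x) = max(0, x)."""
    return max(0.0, x)


def active_fraction(pre_activations):
    """Fraction of units whose ReLU output is non-zero."""
    outputs = [relu(v) for v in pre_activations]
    return sum(1 for v in outputs if v > 0.0) / len(outputs)
```

For the pre-activations [-1.0, 0.5, -3.0, 2.0], only the two positive entries pass through, so half of the units are active.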
The assessment of the LSTM-RNN architecture on PMU data is as follows. The datasets from the PMUs are collected and evaluated; these are called testing datasets. The next step is preprocessing, in which the data is subjected to noise filtering, checking for missing values, and averaging to the mean value. This phase is necessary to make the data set suitable for the algorithm and to reduce computation time. The training and testing data are obtained as
Training data = PQ × 80%
Testing data = PQ × 20%
where P represents the number of samples and Q is the input variable. Thus 80% of the data is used for training and 20% for testing.
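The 80/20 partition can be sketched as below. The split is taken along the sample axis in chronological order without shuffling, which is an assumption: the text gives the proportions but not how the samples are assigned. The function name is hypothetical.

```python
def split_train_test(samples, train_fraction=0.8):
    """Split a sequence into leading training samples and trailing testing samples."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(round(len(samples) * train_fraction))
    return samples[:cut], samples[cut:]
```

A 20-second record at 60 samples per second has 1200 samples, which this split divides into 960 training and 240 testing samples.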
The Python TensorFlow library is used to evaluate the LSTM network. Each weight of the LSTM is updated in proportion to the derivative of the error with respect to that weight. There are three hidden gate layers in each hidden node. The estimated output of the algorithm is compared with the actual response to determine the learning rate and activation. The accuracy of the LSTM is then evaluated using the Mean Absolute Error (MAE), Mean Square Error (MSE), and Mean Absolute Percentage Error (MAPE).
In these measures, S represents the number of testing samples, $\hat{\lambda}_t$ the identified value, and $\lambda_t$ the target value. The complete algorithm is implemented in Keras with a TensorFlow backend on a desktop computer with a 3.2 GHz processor. In the first stage, the raw frequency data from the PMUs at different nodes must be adequately preprocessed: noise is filtered out of the measurements, the data is detrended, and a proper window frame is set up for offline training. After the preprocessing stage, different power disturbance cases are analyzed and fed to the ANN or LSTM approach in a sliding-window manner. The outputs are the estimated instantaneous values of amplitude, frequency, and damping ratio for each mode. Fig. 3 represents the overall approach for training and testing the learning algorithm. The present study focuses only on applying the learning algorithms to estimate the parameters of the power oscillations, and the results are verified against the actual values. The ANN approaches of the backpropagation neural network and the radial basis function network are tested. The GRU technique is also tested with the dataset, and finally, the LSTM-RNN approach is implemented for instantaneous parameter prediction with a good-fit LSTM model. The results and discussion section describes the dataset generation and the application of the various learning approaches to parameter estimation.
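The sliding-window presentation of the preprocessed signal to the learning algorithm can be sketched as follows. Pairing each window with the target value at the last sample of the window is an assumption made for illustration, as is the function name; the text states only that the data is fed in a sliding-window manner.

```python
def sliding_windows(signal, targets, time_steps):
    """Build (window, label) pairs: each window holds `time_steps` consecutive samples
    and is labeled with the target at the window's final sample."""
    if len(signal) != len(targets):
        raise ValueError("signal and targets must have the same length")
    if time_steps < 1 or time_steps > len(signal):
        raise ValueError("time_steps must be between 1 and len(signal)")
    windows, labels = [], []
    for end in range(time_steps, len(signal) + 1):
        windows.append(signal[end - time_steps:end])
        labels.append(targets[end - 1])
    return windows, labels
```

A record of N samples yields N - time_steps + 1 windows, so longer windows give the network more context per example but slightly fewer training examples.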

III. RESULTS AND DISCUSSION

A. DATA SET CREATION
The application of the proposed approach to the two-area Kundur system, shown in Fig. 4, is discussed in this section. A post-event analysis is carried out based on the measurements obtained from this standard system. The system consists of 11 buses, four generators, and two areas. The two areas are connected through a weak tie-line between buses 7 and 9. A rotor speed signal is recorded during the 20-second simulation period at 60 samples per second. Generator 3 is taken as the reference, and several disturbances are applied to the system to create a suitable training set. VMD is applied to the measurements with the predefined parameters: the mode number is set to six and the fidelity factor to 8000. The measurements are thus decomposed into six modes, and the IMF is selected using the correlation coefficient. IMF3 is preferred because of its high correlation coefficient, and the instantaneous parameters (amplitude, frequency, and damping ratio) are estimated from it using the TKEO method. The model is trained using various disturbances such as a three-phase fault, line removal, and load variation, as shown in Fig. 5. Many further disturbance cases can be created in the same way for data generation, and the results are used to train the learning techniques.
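The selection of an IMF by its correlation coefficient can be sketched as below. The Pearson coefficient between the measured signal and each IMF, ranked by absolute value, is assumed here; the text does not specify the exact correlation measure, and the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def select_imf(signal, imfs):
    """Return (index of the IMF most correlated with the signal, list of |r| scores)."""
    scores = [abs(pearson(signal, imf)) for imf in imfs]
    best = max(range(len(imfs)), key=scores.__getitem__)
    return best, scores
```

The returned index is zero-based, so an index of 2 would correspond to the mode labeled IMF3 in this section.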

B. APPLICATION OF VARIOUS LEARNING APPROACHES
Based on the generated sample set, various learning algorithms and their application to parameter estimation are verified in this subsection. The parameter summary of the various learning methods and their statistical results are given in TABLE 1 and TABLE 2, respectively. The ANN model is a feed-forward backpropagation neural network with the maximum number of epochs set to 500. After each epoch, the network weights and biases are adjusted toward the minimum error value. The TANSIG transfer function gives a lower MSE. The model is trained with different numbers of hidden-layer neurons (1, 2, 5, 8, 10, 12, and 15). Among the tested configurations, 20 neurons give the lowest mean square error of 0.0010, and the TRAIN-LM algorithm gives faster results. The model is verified with both 10 and 60 time steps, and the errors become smaller as the number of time steps increases. The ANN model shows good performance, with an overall regression value of 0.96574 for mode 3, as shown in Fig. 6. The regression value R measures the correlation between output and target, and an R value close to one indicates a close relationship. The trained models are tested using another disturbance within the system, a three-phase fault applied at 6 seconds between buses 7 and 8, as shown in Fig. 7. The BPNN estimate is compared with the actual values in Fig. 8. The RBFNN is also trained with different time steps.
The results show that the RBFNN achieves a lower MAE than backpropagation. It still shows performance difficulties on nonlinear problems, and it is not easy for it to follow the peak when the amplitude varies dramatically, which calls for another learning technique. The ANN techniques are thus unsuitable for nonlinear systems, indicating that more accurate approaches are necessary to estimate the instantaneous parameters [38].
The GRU can capture the dependencies between input and output parameters. It achieves better performance than the ANN methods in terms of the statistical parameters shown in TABLE 2. A GRU with 60 time steps offers better performance than one with 10 time steps or 1 time step. Its MAE values are smaller than those of the BPNN and larger than those of the RBFNN. The model identified through the GRU can be improved using better learning techniques.
LSTM-RNN models are also trained using the mode estimation results. The preprocessed frequency signals from the PMUs are taken as the input and the instantaneous parameters as the target data. Optimization helps the model reduce losses and provide the most accurate results. This model uses adaptive moment estimation (Adam) because it is fast and converges rapidly. Of the total samples, 80% are used for training and 20% for validation. The convergence characteristics of the BPNN, RBFNN, and GRU are compared with those of the LSTM network. The LSTM ensures a good-fit model for the estimation of parameters, which is reflected in the convergence characteristics.
The primary purpose of investigating training and validation errors is to check for overfitting and underfitting. Underfitting should be suspected if the training performance is lower than the validation performance, and overfitting in the opposite case. For a properly trained model, the training loss and validation loss are close, with the validation loss slightly greater than the training loss. The LSTM estimate is compared with the actual values in Fig. 9. The convergence characteristics in Fig. 10 show the better performance of the LSTM: it shows better convergence and a lower error than the ANN techniques and the GRU. According to the studies, increasing the number of time steps of the LSTM improves the convergence.
For a well-trained LSTM, the identification performance is independent of the previous system configuration and parameters. The minimum values of MAE, MAPE, and MSE obtained by the LSTM in TABLE 2 reflect the quality of the deep learning approach in power oscillation analysis. According to the convergence characteristics, the LSTM takes 26 epochs to converge with 10 time steps, which indicates the higher training quality of the results.
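The three error measures reported in TABLE 2 can be computed as sketched below, using their usual definitions over S testing samples. Expressing MAPE in percent is an assumption, since the original formulas are not reproduced in the text, and the function names are hypothetical.

```python
def mae(target, identified):
    """Mean absolute error over S samples."""
    return sum(abs(p - t) for t, p in zip(target, identified)) / len(target)


def mse(target, identified):
    """Mean square error over S samples."""
    return sum((p - t) ** 2 for t, p in zip(target, identified)) / len(target)


def mape(target, identified):
    """Mean absolute percentage error in percent; undefined if any target value is zero."""
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(target, identified)) / len(target)
```

Because MAPE divides by the target value, it is unreliable for quantities that pass close to zero, which is one reason to report it alongside MAE and MSE.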
The overall CPU time and memory storage of the different methods are shown in TABLE 3. The LSTM approach dominates in this analysis too, owing to its higher convergence speed. From the final results of the convergence and statistical analysis, it is observed that the ANN techniques and the GRU require more input characteristics and time steps to achieve a better result, whereas the LSTM method consistently follows the peak and uses fewer neurons and time steps.

IV. CONCLUSION
This paper is a preliminary step toward a more extended project on a complex real-time power system environment. The LSTM architecture is used to predict the dominant modes or the instantaneous behaviour of the system. The proposed technique is based on offline training of the LSTM and is compared with conventional approaches such as the BPNN, RBFNN, and GRU techniques. The training set is developed through a preprocessing stage followed by the VMD and TKEO combination for mode estimation. The proposed technique is validated on the two-area Kundur system with a simulated test signal, and its effectiveness is verified using statistical parameters such as MAE, MAPE, and MSE. Convergence characteristics are also plotted for the learning techniques used in this article. The LSTM offers lower error values than the conventional learning techniques. Through this paper, the authors seek to establish the importance of the deep learning approach in the estimation of instantaneous oscillatory parameters in power systems.
SUNITHA RAJAN (Senior Member, IEEE) received the B.Tech. degree in electrical and electronics, the M.Tech degree in energetics, and the Ph.D. degree in power systems from the National Institute of Technology Calicut, in 1996, 1999, and 2014, respectively. She is currently working as an Associate Professor with the Department of Electrical Engineering, National Institute of Technology Calicut, Kerala, India. She has authored and coauthored several research papers and has been a referee to several journals and conferences. Her current interests include the applications of wide-area monitoring systems in power systems, power system stability and control, power system security, smart grids, and microgrids.
MANU MADHAVAN received the B.Tech. degree from the Nehru College of Engineering and Research Center, the M.Tech. degree from the Government Engineering College Palakkad, and the Ph.D. degree from the National Institute of Technology Calicut. He is currently working as an Assistant Professor with the Amrita School of Engineering, Coimbatore. He has published several research papers. His research interests include machine learning, natural language processing, and bioinformatics.
HASSAN HAES ALHELOU (Senior Member, IEEE) was with University College Dublin (UCD), Ireland, and the Isfahan University of Technology (IUT), Iran. He is a Faculty Member with Tishreen University, Syria. He is included in the 2018 and 2019 Publons list of the top 1% Best Reviewer and Researchers in the field of engineering. He has published more than 150 research papers in high-quality peer-reviewed journals and international conferences. His research interests include power systems, power system dynamics, power system operation and control, dynamic state estimation, frequency control, smart grids, microgrids, demand response, load shedding, and power system protection. He has participated in more than 15 industrial projects.
VOLUME 10, 2022