A Multidirectional LSTM Model for Predicting the Stability of a Smart Grid

The grid denotes the electric grid, which consists of communication lines, control stations, transformers, and distributors that aid in supplying power from the electrical plant to the consumers. Presently, the electric grid comprises large power production units that generate millions of megawatts of power distributed across several demographic regions. There is a dire need to efficiently manage the power supplied to the various consumer domains such as industries, smart cities, households, and organizations. In this regard, smart grids with intelligent systems are being deployed to cater to dynamic power requirements. A smart grid system follows the Cyber-Physical Systems (CPS) model, in which Information Technology (IT) infrastructure is integrated with physical systems. In a smart grid embedded with CPS, the Machine Learning (ML) module is the IT aspect and the power dissipation units are the physical entities. In this research, a novel Multidirectional Long Short-Term Memory (MLSTM) technique is proposed to predict the stability of the smart grid network. The results obtained are evaluated against other popular deep learning approaches such as Gated Recurrent Units (GRU), traditional LSTM, and Recurrent Neural Networks (RNN). The experimental results show that the MLSTM approach outperforms the other ML approaches.


FIGURE 1. Components of the smart grid.
distributed solar-photovoltaic generation and energy storage. This deployment overcomes the decision-making issues of using a battery-powered system. Figure 1 depicts the interaction between the power generation units, distribution centers, and other entities such as industrial factories, electric vehicles, smart buildings, and households. The smart grid plays a significant role in efficiently dissipating the right amount of power to these various entities. The flexibility of the power distribution process is achieved by implementing different AI algorithms in the smart grid.
A smarter grid, which can predict the power demand, is the need of the hour. This can be accomplished by applying Machine Learning (ML) algorithms [14], [15] to the data generated from the grid. The smart grid can help to reduce pollution and make electricity cheaper.
In this work, a novel Multidirectional Long Short-Term Memory (LSTM) model is proposed to predict the stability of the SG by classifying the smart grid dataset collected from the UCI machine learning repository [16]. The experimental results are then compared with recent deep learning algorithms like Recurrent Neural Networks (RNN), conventional LSTM, and Gated Recurrent Units (GRU). The steps involved in the current work are as follows:
1) The Electricity Grid dataset is collected from the UCI machine learning repository.
2) Min-max normalization is used for normalizing the dataset.
3) The label encoding technique is used for converting categorical, textual data into numerical data.
4) This data is fed to the proposed MLSTM model for training.
5) The proposed model is then evaluated against other deep learning techniques like RNN, GRU, and traditional LSTM using metrics including accuracy, precision, recall, and F1-score.
In summary, our contributions in this work can be highlighted as follows:
• A novel MLSTM model is introduced to predict the stability of the smart grid dataset.
• Comprehensive preprocessing is performed on the smart grid dataset.
• An accuracy of 99.07% is achieved, which is higher than other state-of-the-art deep learning models.
The rest of the paper is organized as follows. Section II discusses recent state-of-the-art literature related to the application of deep learning algorithms on smart grids. In Section III, the proposed model is discussed in detail. Experimental results are discussed in Section IV, which is followed by the conclusion and future work in Section V.

II. LITERATURE SURVEY
In this section, recent state-of-the-art works related to the application of machine learning techniques on smart grids are discussed.
Technology growth leads to different cybercrimes; one of the most prevalent is modifying the data in smart meters [17]. The authors present a framework that aggregates Finite Mixture Model clustering for customer segmentation with a Genetic Programming algorithm [18] for recognizing new functions that aid accurate predictions. The authors also use a Gradient Boosting Machine algorithm, which is embedded in the framework and performs better than existing ML algorithms. Most existing smart systems combine IoT [19]-[21] with the power of ML algorithms so that the systems perform to their fullest capacity efficiently. The authors in [22] analyze various IoT-based ML models used in different domains such as healthcare, smart cities, and vehicle-to-vehicle communication in smart cities. The deployment challenges of various IoT and machine learning systems are also discussed.
A covert data integrity assault (CDIA) on a communications network can be hazardous, directly reducing the availability and safety measures of a smart grid. This assault is carefully deployed to evade the conventional bad-data detectors in power control stations; it can compromise the integrity of the data and instigate a false assessment of the state that would severely affect the whole power system process. In [23], the authors develop an intelligent system using an unsupervised ML-based model to identify CDIAs in smart grids by using non-labeled data. The smart grid is developed using various ICTs, which leads to humongous data originating from various sources. The usage of big data analysis [24] and intelligent systems in smart grid systems helps to solve the challenge of processing and managing this huge amount of data. Various challenges associated with the usage of big data analysis in smart grids are highlighted in [25]. The authors also enlist various applications of big data in smart grid systems.
The estimation of smart grid stability is a challenging research problem because the information used for membership authentication may lead to instabilities in the grid. Stability estimation is used to determine configurations in which the grid is stable irrespective of abnormalities. The authors in [26] analyze the usage of machine learning algorithms for forecasting smart grid stability based on feature extraction. They use three methods for the feature selection process: Binary Particle Swarm Optimization Feature Selection (BPSOFS), Binary Kangaroo Mob Optimization Feature Selection (BKMOFS), and Multivariate Adaptive Regression Splines (MARS). The forecast of grid stability is done using four classifiers: Logistic Regression (LR), Random Forest (RF), Gradient Boosted Trees (GBT), and the Multilayer Perceptron Classifier (MPC).
The age-old electrical grids consisted of one-way communication between the grid and consumers. These grids were deployed across the globe, but the efficiency of power management was a significant concern. To solve this challenge, smart grids evolved with two-way communication between the grid and the consumer. The primary objective behind developing the smart grid was to accurately predict the energy requirements of a given demographic. Parameters such as historical weather, load, and energy generation data [27] are used to implement machine learning algorithms for predictions. The researchers also develop two models, namely a linear regression model and a deep neural network [28]. These models were assessed by means of root mean squared error, and it was observed that the deep neural network models outperformed the linear regression models in forecasting load and energy production for a given region.
Energy load prediction in the smart grid is the forecasting of electrical power requirements to meet dynamic demands. There is a dire need for accurate load prediction, which aids electrical utilities in monitoring their energy generation process. Presently, most prediction in any system is done through machine learning algorithms to achieve maximum efficacy. Apache Spark and Apache Hadoop [29] are proposed as big data frameworks for distributed processing for the prediction of the energy load. The authors also use MLlib to assess the prediction accuracy of various regression methods such as linear regression, generalized linear regression, decision tree, random forest, and gradient-boosted trees.
Currently, smart grids are the trend for the deployment of power systems to manage energy dissipation efficiently. Since the deployment of a smart grid network involves huge complexities due to the humongous data being generated, the use of artificial intelligence systems helps in easing this process. The advancements in intelligent systems have paved the way for better methods such as Deep Learning (DL), Reinforcement Learning (RL), and Deep Reinforcement Learning (DRL). The challenges in deploying these methods in the smart grid are enlisted to enable further investigation by future researchers [30].
The dynamic energy consumption of household appliances is a major concern for sustainability and efficacy in building smart cities. The advancement of IoT technologies enables newer energy management techniques to address this dynamic energy usage. The authors in [31] suggest an implementation of a probabilistic data-driven prognostic technique for consumption prediction in inhabited buildings. This technique uses a Bayesian Network (BN) framework, which enables the system to determine dependency relations among contributing variables. The authors assess the proposed technique using the datasets provided by the Pacific Northwest National Lab (PNNL), which were aggregated through a pilot smart grid project.
Various industry giants have come together, forming a consortium to address the challenges and future needs of smart grids. Smart grids use different Artificial Intelligence (AI) methods such as Artificial Neural Networks (ANN), Machine Learning (ML), and Deep Learning (DL) for efficient energy consumption. Different DL algorithms for load prediction issues in the smart grid are proposed in [32]. The authors focus on the usage of different DL applications for load prediction in the smart grid network. They also compare the Root Mean Square Error and Mean Absolute Error of the studied applications, and their results infer that the use of a convolutional neural network with the k-means algorithm yielded a large percentage reduction in root mean square error.

Electricity price has a significant impact on the stability of distributed power grids. The cost sensitivity and reaction times of power producers and consumers also influence the stability factor. Wood [33] proposes a model called Decentral Smart Grid Control (DSGC) to render the demand-side control of distributed power grids by associating the electricity price with variations in grid frequency on a time scale of a few seconds. The author simulates demand-side consumption/production on analogous time scales and implements an optimized data-matching machine learning algorithm, the Transparent Open Box (TOB) learning network, to envisage dynamic grid stability for the simulation from its independent variables.

Training vast data with variations in dimensionality reduces the effectiveness of a machine learning model. The authors in [34] implement a secondary principal component analysis (PCA) algorithm [35] to decrease the data dimensions. This algorithm is applied to guide the ML techniques to increase the stability of the grid systems.
From the above discussion, it is evident that there is no foolproof system to address the challenges of stability in a smart grid network. In this work, a novel Multidirectional LSTM (MLSTM) architecture to train the SG dataset is proposed.

III. PROPOSED MULTIDIRECTIONAL LSTM MODEL
The workflow of the proposed MLSTM model to analyze the power utilization [36], [37] and predict the stability of the SG dataset is depicted in Figure 2. First, the electricity grid dataset from different power generating units is aggregated. The dataset is then normalized using min-max normalization. During this process, the minimum and maximum values of the data are obtained and each entry is rescaled using Eq. (1):
L = ((l − min(X)) / (max(X) − min(X))) × (new_max(X) − new_min(X)) + new_min(X), (1)

where X represents an attribute in the given dataset; min(X) and max(X) denote the minimum and maximum values of the attribute in the given dataset, respectively; L indicates the updated value of every entry in the dataset; l corresponds to the previous value in the dataset; and new_max(X) and new_min(X) denote the upper and lower boundary values of the given range, respectively. ML algorithms cannot process the categorical values in the dataset directly. This is the reason for using the label encoding technique, which converts the categorical values in the dataset into numerical values suitable for processing by ML algorithms. In the next step, the smart grid dataset is used to train the proposed MLSTM approach. The performance of the proposed model is then compared with RNN, traditional LSTM, and GRU models using metrics such as accuracy, precision, recall, and F1-score. These models are explained briefly in the following.
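The two preprocessing steps can be sketched in a few lines of Python; the attribute values and label strings below are illustrative, not taken from the actual dataset:

```python
import numpy as np

def min_max_normalize(x, new_min=0.0, new_max=1.0):
    """Rescale an attribute into [new_min, new_max] per Eq. (1)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min()) * (new_max - new_min) + new_min

def label_encode(values):
    """Map categorical labels to integer codes (e.g. 'stable' -> 0)."""
    classes = sorted(set(values))
    mapping = {c: i for i, c in enumerate(classes)}
    return [mapping[v] for v in values], mapping

attribute = [2.9597, 9.3041, 0.5074, 7.1254]   # hypothetical attribute column
labels = ["unstable", "stable", "stable", "unstable"]
print(min_max_normalize(attribute))
codes, mapping = label_encode(labels)
print(codes, mapping)
```

In practice a library such as scikit-learn provides equivalent transformers, but the arithmetic is exactly the rescaling of Eq. (1).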
Recurrent Neural Networks (RNN): Traditional neural networks (NN) do not yield satisfactory results on time-series data because they cannot capture dependencies across time steps. In order to handle this issue, the RNN was introduced in 1982 by Hopfield [38]. The RNN enables the network to learn patterns over a period of time and can predict sequential data such as actions based on previous events in a video, audio from speech, events from text, etc. The working model of the RNN is shown in Figure 3. In the figure, W represents the hidden layer's weight matrix, E_t represents the output layer's weight vector, and D_t denotes the input word vector. The hidden state at timestamp t is calculated using Eq. (2):

X_t = σ(W X_{t−1} + U D_t + b), (2)

with U an input weight matrix and b a bias vector.
Here, σ(·) is the activation function, which can be Sigmoid, Tanh, or ReLU. At every timestamp t, the hidden state X_t is computed using Eq. (2) with the corresponding inputs and parameters.

Long Short-Term Memory (LSTM): An RNN unrolled over many time steps struggles to retain information from earlier steps, since gradients vanish as they propagate back through the layers. In recent studies, the LSTM network has become one of the popular approaches to overcome this challenge [39]. The LSTM has a chain structure like the RNN, with multiple neural network modules. Figure 4 illustrates the architecture of the LSTM, which consists of different gates: the input gate, output gate, and forget gate. These gates select and reject the information passing through the network.
The input gate i(t) uses the sigmoid activation, while the candidate cell state uses tanh, ranging from −1 to 1. It takes the current input x(t) and the parameters C(t−1) and h(t−1) for processing. The forget gate f(t) uses the sigmoid activation function and decides how much of the information from the previous output has to be retained: if the value is 1, the data is passed on through the network; if it is 0, the data is not passed through the network. The output gate o(t) also uses the sigmoid activation function, ranging from 0 to 1. At every timestamp, i(t), o(t), and f(t) are computed using the following equations:

i(t) = σ(W_i x(t) + U_i h(t−1) + b_i)
f(t) = σ(W_f x(t) + U_f h(t−1) + b_f)
o(t) = σ(W_o x(t) + U_o h(t−1) + b_o)

where the W and U terms are weight matrices and the b terms are bias vectors.
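As a concrete reference, the gate computations can be realized in a few lines of NumPy. This is a generic LSTM cell sketch with the four weight blocks stacked for compactness, not the authors' implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, d), U: (4H, H), b: (4H,), stacking i, f, o, g."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[:H])             # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:])         # candidate cell state
    c = f * c_prev + i * g         # updated cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c

rng = np.random.default_rng(0)
d, H = 3, 4                        # illustrative input and hidden sizes
h, c = lstm_step(rng.normal(size=d), np.zeros(H), np.zeros(H),
                 rng.normal(size=(4 * H, d)), rng.normal(size=(4 * H, H)),
                 np.zeros(4 * H))
print(h.shape, c.shape)  # (4,) (4,)
```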
The traditional bidirectional LSTM scans in two directions, while in this research two bidirectional LSTMs are used: one LSTM for upward and downward scanning and the other for left and right scanning. The input of the second LSTM is the summation of the outputs of the first LSTM. The proposed MLSTM thus utilizes twice the number of input, output, and forget gates compared to the traditional LSTM. The MLSTM achieves a promising accuracy, but it has a computational overhead.

Gated Recurrent Unit (GRU): The GRU is the latest variant of the RNN, designed to deal with short-term memory problems similar to the LSTM. The GRU does not have a cell state and makes use of a hidden state to carry information. The GRU consists of two gates, a reset gate r_t and an update gate z_t, which are represented in the following equations:

z_t = σ(W_z x_t + U_z h_{t−1} + b_z)
r_t = σ(W_r x_t + U_r h_{t−1} + b_r)
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ tanh(W_h x_t + U_h (r_t ⊙ h_{t−1}) + b_h)

where z_t denotes the update gate, σ(·) represents the sigmoid function, the W, U, and b terms are parameter matrices and vectors, h_t denotes the output vector, and x_t denotes the input vector. The update gate performs functions similar to those of the forget gate and input gate of an LSTM: it decides which information has to be dropped and which information has to be added. The reset gate determines the amount of previous data to be forgotten. Since the GRU has fewer gates than the LSTM, its training process is normally faster.
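A generic GRU step consistent with this gate description can be sketched as follows; bias terms are omitted for brevity and all names are illustrative, not the authors' code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step: update gate z, reset gate r, candidate state h_tilde."""
    z = sigmoid(Wz @ x + Uz @ h_prev)              # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))  # candidate hidden state
    return (1.0 - z) * h_prev + z * h_tilde        # new hidden state

rng = np.random.default_rng(0)
d, H = 3, 4                                        # illustrative sizes
h = gru_step(rng.normal(size=d), np.zeros(H),
             rng.normal(size=(H, d)), rng.normal(size=(H, H)),
             rng.normal(size=(H, d)), rng.normal(size=(H, H)),
             rng.normal(size=(H, d)), rng.normal(size=(H, H)))
print(h.shape)  # (4,)
```

Note the parameter economy: three weight pairs instead of the LSTM's four, which is why GRU training is typically faster.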

A. MLSTM MODEL
The proposed model consists of four 1D spatial LSTMs to scan each column and row in different directions independently. At every step, the hidden states are calculated and summed at the end. Two spatial 1D LSTMs are applied vertically in (A), where the hidden states are calculated at every step and the results are summed. Similarly, in (B), two spatial 1D LSTMs are applied in the horizontal directions. The combination of A and B defines the Multidirectional LSTM, as shown in Figure 5. As presented in Figure 5, the suggested module is a combination of the 1D spatial LSTMs (A) and (B). In each module, two 1D LSTMs are used for scanning across the attribute maps vertically or horizontally in both directions (bidirectional LSTMs), and their hidden states are updated at every spatial step. For each of the vertical and horizontal directions, two output attribute maps are produced. In the current simulation, the sum of these output attribute maps is used (an alternative is to concatenate the output feature maps, but that would increase the number of parameters).
For illustration, consider the input data of the 1D spatial LSTM. F^L_{m,n} is an attribute vector at every spatial location (m, n), where m and n index the feature dimensions, and every column or row of the feature maps is treated as a sequence. While scanning from top to bottom, the attribute response at (m, n) can be estimated as

F^{L+1}_{m,n} = f(U F^L_{m,n} + W F^{L+1}_{m,n−1} + b),

where F^{L+1}_{m,0} = 0, F^L_{m,n} ∈ R^{d×1}, F^{L+1}_{m,n}, F^{L+1}_{m,n−1} ∈ R^{D×1}, U ∈ R^{D×d}, W ∈ R^{D×D}, and b ∈ R^{D×1}. Here D denotes the number of nodes used in the 1D spatial LSTM, f represents the non-linearity function, and d represents the number of attributes in the dataset. The right-to-left 1D spatial LSTM can be calculated using the same approach. Figure 6 represents the working of the proposed model at every hidden layer. To process the data at the hidden layers, the MLSTM model uses three operations: forward propagation, sum, and concatenation.
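A minimal sketch of the four-directional scan-and-sum described above is given below; a simple tanh recurrence stands in for the full 1D spatial LSTM cell, and all shapes are illustrative assumptions:

```python
import numpy as np

def scan_1d(seq, U, W, b):
    """Scan a sequence of feature vectors with a simple recurrent cell.
    (A tanh recurrence stands in for the paper's 1D spatial LSTM.)"""
    D = W.shape[0]
    h = np.zeros(D)                  # initial state, F^{L+1}_{m,0} = 0
    out = []
    for f in seq:
        h = np.tanh(U @ f + W @ h + b)
        out.append(h)
    return np.stack(out)

def multidirectional_scan(X, U, W, b):
    """X: (rows, cols, d) feature map. Scan down, up, right, left; sum outputs."""
    rows, cols, d = X.shape
    D = W.shape[0]
    out = np.zeros((rows, cols, D))
    for n in range(cols):            # vertical scans over each column
        col = X[:, n, :]
        out[:, n, :] += scan_1d(col, U, W, b)              # top -> bottom
        out[:, n, :] += scan_1d(col[::-1], U, W, b)[::-1]  # bottom -> top
    for m in range(rows):            # horizontal scans over each row
        row = X[m, :, :]
        out[m, :, :] += scan_1d(row, U, W, b)              # left -> right
        out[m, :, :] += scan_1d(row[::-1], U, W, b)[::-1]  # right -> left
    return out

rng = np.random.default_rng(1)
d, D = 3, 5
Y = multidirectional_scan(rng.normal(size=(4, 6, d)),
                          rng.normal(size=(D, d)), rng.normal(size=(D, D)),
                          np.zeros(D))
print(Y.shape)  # (4, 6, 5)
```

Summing the four directional outputs (rather than concatenating them) keeps the output dimensionality at D, matching the design choice stated above.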

IV. RESULTS AND DISCUSSION
This section discusses the prediction of smart grid stability using the proposed model.

A. EXPERIMENTAL SETUP
The experimentation is carried out using an online Graphical Processing Unit (GPU) provided by Google, called ''Google Colab''. A personal computer with the Windows 8.1 operating system and a Core i3 processor is used. The programming language used is Python 3.7.

B. DATASET DESCRIPTION
The dataset used for the experimentation is collected from the UCI machine learning repository [16]. The dataset has 10,000 instances with 14 attributes. The attributes carry information related to the electricity producer values, nominal power consumed/produced, price-elasticity coefficients, the maximum value of the equation root, and the stability of the system (the class label, i.e., whether the system is stable or not).

C. METRICS USED FOR EVALUATING THE PROPOSED APPROACH
The performance metrics used to evaluate the proposed MLSTM model are explained below.
Confusion matrix: A confusion matrix is used to evaluate the performance of a classification model in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
• Accuracy: Accuracy signifies the correctness of a classifier and is calculated as Accuracy = (TP + TN) / (TP + TN + FP + FN).
• Precision: Precision = TP / (TP + FP).
• Recall: Recall = TP / (TP + FN).
• F1-score: F1 = 2 × Precision × Recall / (Precision + Recall).
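These metrics follow directly from the confusion-matrix counts; a minimal sketch with illustrative counts (not taken from the paper's tables):

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts only:
acc, prec, rec, f1 = classification_metrics(tp=90, tn=85, fp=10, fn=15)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
# 0.875 0.9 0.857 0.878
```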

D. PERFORMANCE EVALUATION OF THE PROPOSED MODEL
70% of the SG dataset is used for training and validation, and the remaining 30% is used for testing. Various evaluation measures are used in the experimentation, including true positives, true negatives, false negatives, false positives, precision, recall, F1-score, training loss, and testing loss. The Receiver Operating Characteristic (ROC) curve is also used to justify the results. The proposed MLSTM model is found to be very effective for the smart grid stability system: the accuracy for both the stable and unstable labels is more than 99%. Table 1 depicts the confusion matrices for the traditional deep learning models and the proposed MLSTM model. The proposed MLSTM correctly detects 1084 samples as true positives and 1888 as true negatives; only 28 samples are misclassified. The training and testing accuracy achieved using the GRU are 97.17% and 97.30%, respectively, with a loss of 0.06 for both training and testing. Using the RNN, the training and testing accuracy is recorded as 96.66% and 96.60%, respectively, with a loss of 0.08 for both. The proposed MLSTM achieved 99.07% accuracy and a loss of 0.02 for both training and testing. The proposed method achieved about 2% higher accuracy when compared to the GRU, RNN, and LSTM, as shown in Table 2. The proposed MLSTM achieved 97% and 100% precision for the stable and unstable classes, respectively. Recall for the stable and unstable classes is recorded as 100% and 99%, respectively. The F1-score for detecting the stability of the smart grid system is 99% for both the stable and unstable classes. The ROC for the MLSTM is 99.27%, which is higher in comparison with the GRU and LSTM.
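The reported confusion counts are internally consistent with the headline accuracy, which quick arithmetic confirms (counts taken from the text above):

```python
# Reported MLSTM test-set counts: 1084 true positives, 1888 true negatives,
# and 28 misclassified samples (false positives + false negatives combined).
tp, tn, misclassified = 1084, 1888, 28
total = tp + tn + misclassified
accuracy = (tp + tn) / total
print(total, round(100 * accuracy, 2))  # 3000 99.07
```

The total of 3000 samples matches the 30% test split of the 10,000-instance dataset, and the resulting accuracy matches the reported 99.07%.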
From Figure 7, it can be concluded that the proposed model achieved a loss of 0.02, while the traditional deep learning models GRU and LSTM each achieved a loss of 0.06, and the RNN achieved a loss of 0.08, which is high when compared to the proposed model. From Figure 7, it is evident that the proposed model performs well, as the loss incurred is minimal compared to the other traditional deep learning models. Figure 8 depicts that the proposed model achieved 99.07% accuracy for both training and testing, compared to the other models: the GRU achieved 97.17% and 97.30% accuracy for training and testing, respectively; the RNN achieved 96.66% accuracy for training and 96.60% for testing; and the LSTM achieved 97.30% accuracy for training and 97.13% for testing. The proposed model achieved about a 3% increase in training and testing accuracy. Figures 9, 10, 11, and 12 depict that the proposed model achieved a high area under the curve (AUC) of 99.27%; compared to the RNN, GRU, and LSTM, the proposed model achieved about 2% higher AUC. Table 3 and Figure 13 show the effectiveness of the proposed model in terms of precision, recall, and F1-score. The proposed model achieved 97% precision for the stable class, while the other deep learning models (GRU, RNN, and LSTM) achieved lower values. Similarly, the F1-score for the GRU is 96.00%, the RNN achieved 97.00%, and the LSTM's F1-score was 96.00%, while the proposed model achieved a 99.00% F1-score, which is higher than the mentioned deep learning models for the stable class. Similarly, for the unstable class, the proposed model outperformed the traditional models in terms of precision, recall, and F1-score.

V. CONCLUSION
The smart grid is an application of cyber-physical systems that is used to intelligently manage power dissipation across different entities. The stability of the smart grid is essential for efficient power distribution to the control stations, and machine learning techniques play a vital role in predicting it. In this work, a novel MLSTM model is introduced to predict the stability of the smart grid. The proposed model is evaluated on the smart grid dataset from the UCI Machine Learning Repository, and its performance is compared with traditional deep learning models like LSTM, GRU, and RNN. The comparative analysis proves the superiority of the proposed model with respect to accuracy, precision, loss, and ROC curve metrics. The proposed model achieved 99.07% training and testing accuracy, which is about 3% higher than the other traditional deep learning models. The training and testing loss for the proposed model is 0.02, whereas for the GRU, RNN, and LSTM it is 0.06, 0.08, and 0.06, respectively. The precision, recall, and F1-score for the stable class are 97.00%, 100.00%, and 99.00%, respectively, for the proposed model, which is about 3% higher than the other traditional deep learning models. Similarly, for the unstable class, the proposed model achieved 100.00% precision, 99.00% recall, and a 99.00% F1-score. The ROC curve also highlights the effectiveness of the proposed model: a 99.27% ROC is achieved, outperforming the other models. As part of future work, a context-aware model can be deployed to cater to dynamic power requirements and to make smart grids more reliable.
MAMOUN ALAZAB (Senior Member, IEEE) received the Ph.D. degree in computer science from the School of Science, Information Technology and Engineering, Federation University of Australia. He is currently an Associate Professor with the College of Engineering, IT, and Environment, Charles Darwin University, Australia. He is also a cyber security researcher and practitioner with industry and academic experience. His research is multidisciplinary, focusing on cyber security and digital forensics of computer systems, with an emphasis on cybercrime detection and prevention. He has more than 150 research articles in international journals and conferences, and he has delivered many invited and keynote speeches, at 24 events in 2019 alone. He has convened and chaired more than 50 conferences and workshops. He also works closely with government and industry on many projects, including with the Northern Territory (NT) Department of Information and Corporate Services, IBM, Trend Micro, the Australian Federal Police (AFP), the Australian Communications and Media Authority (ACMA), Westpac, and the United Nations Office on Drugs and Crime (UNODC). He is also the Founding Chair of the IEEE NT Subsection.
SULEMAN KHAN received the master's degree from the Department of Computer Science, Air University, Islamabad, Pakistan, in 2019. He is currently a Research Associate with Air University. His research interests include network security, machine learning, and data science.
SOMAYAJI SIVA RAMA KRISHNAN is currently a Research Member with the Centre for Ambient Intelligence and Advanced Networking Research (AMIR) and is also working with VIT as an Assistant Professor (Senior). He has working experience with the Centre for Development of Advanced Computing (C-DAC) (Ministry of Science and Technology, Government of India) as a Research Intern in the area of data center technologies. He is certified by EMC Corp. as a Proven Professional in information storage and management. He is also an EMC Academic Alliance Faculty member and played a key role in establishing an MoU between VIT University and EMC Corp. He has proposed and developed an intelligent network design framework for building small and large-scale networks, and he developed an efficient and secure framework for IP storage networks for C-DAC. His current interests include e-waste management in India, wireless networks, and cloud computing.