Artificial Intelligence-Assisted Edge Computing for Wide Area Monitoring

The massive digital information generated in conjunction with the ever-increasing phasor measurement data in the power grid has led to a tremendous constraint on the analysis and timely processing of real-time data. Under these conditions, leveraging Artificial Intelligence (AI) can play a crucial role in assisting more efficient data processing and analysis. In this paper an AI-assisted power grid event classification method is proposed, which aims at improving the overall power grid system performance. Furthermore, an edge cloud sharing scheme is introduced for a large-scale power grid system. To balance the load and reduce the maximum processing time, a multiple edge cloud node-based scheme is developed. The simulation results verify that the proposed AI-assisted event classification method, together with the edge cloud sharing scheme, can significantly improve the overall performance of the system.


I. INTRODUCTION
With the increasing deployment of phasor measurement units (PMUs), smart meters, and Internet of Things (IoT) devices, together with the growing adoption of renewable and sustainable energy technologies, managing the huge amount of generated data is becoming a major challenge. The collected data is mostly unstructured and arrives in various formats, which complicates data processing and analysis. For instance, traditional power system analysis and management is primarily based on physical modeling and numerical calculations, which cannot meet the requirements of smart grids due to their increasing complexity, high uncertainty, and huge volume of digital information. On the other hand, advances in computing power and the recent development of complex AI algorithms provide an enabling technology that can effectively assist the processing of such massive data.
In the past decade, a great deal of work has been done on the topic of applying AI techniques to the smart grid. Deep Learning (DL), Reinforcement Learning (RL), and Deep Reinforcement Learning (DRL) are three widely used AI techniques. These algorithms can be used to ensure secure and stable operation in uncertain and complex environments.
Data-driven DL and RL algorithms can play a crucial role in processing massive digital information for power grid event detection, identification, and prediction [1]. DL algorithms can be deployed to extract valuable features from collected data, which can be used to obtain insights about the power system's state and dynamic behavior. For instance, ambient signals collected from the large number of PMUs installed throughout the power grid can be exploited to detect disturbances as early as possible. In [2], Multilayer Perceptrons (MLPs), Deep Belief Networks (DBNs), and Convolutional Neural Networks (CNNs) are used to classify disturbances by processing PMU measurement data collected from the power grid. A DBN-based transient stability assessment method was proposed in [3] to separate stable cases from unstable cases. In [4], a CNN was used to extract features during the transient process. CNNs have been shown to be effective in classifying faults, such as line faults [5], [6]. For instance, a deep CNN architecture [7] is used to classify power line insulators based on aerial images. The authors in [8] demonstrate the effectiveness of a CNN-based approach in probabilistic wind power forecasting. For fault classification, these methods also achieve good performance [4], [9]. For fault detection, [10] offers three CNN-based models, which are capable of handling big data generated from multiple PMUs, including measurements such as voltage, current, and frequency.
To ensure reliable and stable operation of a power grid, it is essential to detect and predict faults and defects in their early stages. To achieve this, DL algorithms can be utilized to analyze the gradual changes caused by faults and defects. In [7], a high-level discriminative Convolutional Neural Network (CNN) is proposed to extract features of insulators experiencing gradual changes. For fault detection and prediction, Artificial Neural Network (ANN) based methods have been proposed in [11], [12], [13]. Nonetheless, ANN-based methods may not be able to extract temporal information during disturbances, which is an essential requirement for fault prediction. In contrast, Recurrent Neural Networks (RNNs), which have a proven record of extracting hidden features in big data for image captioning, voice conversion, and language processing [14], [15], [16], [17], [18], [19], have been shown to achieve good performance in detecting faults [20], [21]. However, conventional RNNs commonly suffer from vanishing gradients, which degrades their ability to capture temporal features over a long time span. Long short-term memory (LSTM) networks, proposed in [22], can effectively overcome this problem. An LSTM-based approach is presented in [23] to diagnose and predict faults in complicated scenarios. In [24], an LSTM network is utilized to detect and identify faults in a timely manner based on measurement data. The authors in [25] use LSTM networks to capture the temporal features caused by line trip faults, which develop gradually. As an improved RNN, the LSTM network can achieve better performance on longer time series. The recently introduced Gated Recurrent Unit (GRU) is another RNN variant that can also perform well [26]. As opposed to LSTM, where three distinct gates are used, GRU reduces the gating signals to two, namely an update gate and a reset gate.
Support vector machine (SVM) based schemes [27] are another effective class of classifiers used in power grid systems and are considered among the most robust classification models. In [28], [29], SVM-based schemes have been used for online transient stability assessment. In addition, [30] proposes an SVM-based algorithm to lower misclassification rates for voltage stability assessment. In [31], SVM has also been considered for fault detection and islanding. Notably, the SVM's computational complexity is determined by the number of support vectors rather than by the dimension of the sample space, which results in low computational complexity and good robustness. More importantly, SVM can achieve a globally optimal solution [27].
In this paper, a power grid event classification method is proposed to process various power events by integrating multiple machine learning schemes. Specifically, we first use SVM to quickly identify the event type. Events that are categorized as disturbances, which are expected to develop gradually, are sent to an LSTM/GRU algorithm for identification, while faults are processed using CNN algorithms. In addition, processing the massive data generated in a large-scale power grid network at the remote control center can increase latency and impose a heavy load on the communication network. Multiple small base stations can be used as edge cloud nodes to overcome these drawbacks [32]. However, a single edge cloud node can only serve an area of limited range. Local traffic activities, which depend on the occurrence of power grid events in the domain of each edge cloud, may be different and unbalanced in terms of not only the computation, but also the amount of data that needs to be reported to the remote cloud [33]. Furthermore, the complexity of the employed machine learning algorithms varies widely, leading to an imbalance and shortage of computing and networking resources in edge cloud nodes. To share computing resources and reduce processing delays, this paper proposes an edge cloud sharing mechanism that includes an efficient bandwidth allocation strategy. Under the proposed scheme, edge nodes that are experiencing a large number of events will be able to intelligently cooperate with the remote cloud for processing and real-time transmission of synchrophasor data generated by the many PMUs placed at sensitive locations throughout the grid network. The main function of PMUs is to measure phase and voltage variations. GPS is mostly used to synchronize PMUs using a sampling clock, which is phase locked to one pulse per second (PPS).
The basic idea of sampling electric waveforms using GPS is to examine phase and amplitude uncertainties, which can affect the fundamental frequency. For example, monitoring transmission lines using PMUs can give insights into power system phenomena like loss of stability, faults, and load encroachments. PMU data streaming is built on top of the user datagram protocol over Internet protocol (UDP/IP) for transmission to the Phasor Data Collector (PDC) [34].
In this paper, our main objective is to investigate the impact of machine learning on identifying various types of faults based on massive amounts of real-time data generated by PMUs, which are dispersed throughout grid networks. The contributions of this paper are summarized as follows:
1) Different machine learning algorithms are integrated to classify and identify power events based on their characteristics and features. Compared with traditional schemes where only one machine learning algorithm is used to process all power events, our method shows a significant improvement in system performance.
2) A collaborative cloud and edge computing scheme is proposed to balance load and reduce overall processing delay by using an efficient and adaptive bandwidth allocation strategy. To the best of our knowledge, the proposed scheme is the first to use multiple machine learning algorithms and take their complexities into consideration for resource allocation. The simulation results demonstrate that our scheme can effectively reduce the processing delay of the system, which is a crucial factor in preventing a possible blackout.
The paper is organized as follows. After a brief overview of multiple machine learning algorithms, we propose a power grid event classification and identification method in Section II. In Section III, an AI-assisted edge cloud sharing mechanism is introduced for a large-scale power grid system with multiple edge cloud nodes. To assess the performance of the proposed scheme, we present the simulation results in terms of latency and reliability in Section IV. This is followed by the conclusion in Section V.

II. INTEGRATED MACHINE LEARNING FOR POWER EVENTS CLASSIFICATION
How to effectively extract valuable information from the big data generated in the smart grid has attracted considerable attention in past decades. To improve system performance, machine learning techniques are considered to support classification and identification of power grid events. We present an AI-assisted event detection and classification scheme for a power grid system where different machine learning algorithms are used to improve the performance.
In a smart grid, power quality events include transient disturbances, such as Generation Trip (GT), Load Shedding (LS), Oscillation (OS), and Line Trip (LT), and various power system faults, including Generator Fault (GF), Line Fault (LF), Bus Fault (BF), and Transformer Fault (TF). Disturbances mainly result from a gradual process over a long time span, which may be caused by aging, damaged distribution equipment, bad insulation, weather changes, etc. For example, a line trip is caused by a gradual change in distribution line resistance, and it can lead to massive blackouts. The corresponding change in the measurements can be captured by LSTM/GRU, since LSTM/GRU-based algorithms are well known for handling time-series problems. On the other hand, faults such as line faults can occur suddenly, without a gradual process. Under such conditions, LSTM is not suitable. Instead, a CNN (LeNet/AlexNet) can be used after re-shaping the input data into a matrix structure that resembles the frame of an image. CNNs (LeNet/AlexNet) are well known for handling problems with spatial correlation, such as image data.
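The re-shaping step mentioned above can be illustrated with a minimal sketch. It assumes that one window of PMU samples is a flat sequence that is laid out row by row and zero-padded at the tail; the function name `to_frame` and the padding rule are illustrative assumptions, not the preprocessing used in the paper.

```python
from typing import List, Sequence


def to_frame(samples: Sequence[float], rows: int, cols: int,
             pad: float = 0.0) -> List[List[float]]:
    """Lay a flat measurement window out as a rows x cols matrix.

    Hypothetical helper: samples beyond rows*cols are dropped and a
    short window is padded with `pad`, so the result always has the
    fixed shape a CNN input layer expects.
    """
    needed = rows * cols
    flat = list(samples[:needed])
    flat.extend([pad] * (needed - len(flat)))
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# A 6-sample window becomes a 2 x 3 "image".
frame = to_frame([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], rows=2, cols=3)
```

For a LeNet-style input of 32 × 32 × 1, the same call would use `rows=32, cols=32`; how multiple measurement channels (voltage, current, frequency) are arranged in the matrix is left open here.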
Therefore, in the proposed method shown in Fig. 1, SVM is first used to categorize the event type. The SVM not only has the advantage of low computational complexity and good robustness, but is also able to achieve a globally optimal solution. Specifically, after fast identification of the event type, events that are categorized as disturbances are sent to the LSTM/GRU algorithm for identification, while events that are categorized as faults are processed using the LeNet/AlexNet algorithm. The simulation results in Section IV show that the proposed event classification method can improve the overall performance by using different machine learning algorithms.
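The two-stage routing described above can be sketched as follows. This is only an illustration of the control flow: the three stages are passed in as plain callables, and the `toy_*` functions are stand-ins invented here so the sketch runs without trained models; they are not the SVM, LSTM/GRU, or LeNet/AlexNet classifiers of the paper.

```python
from typing import Callable, Optional, Sequence, Tuple

Window = Sequence[float]  # one window of PMU samples (assumed representation)


def classify_event(
    window: Window,
    coarse_stage: Callable[[Window], str],
    disturbance_stage: Callable[[Window], str],
    fault_stage: Callable[[Window], str],
) -> Tuple[str, Optional[str]]:
    """Route a window through the coarse stage, then the matching identifier."""
    category = coarse_stage(window)                  # SVM in the paper
    if category == "disturbance":
        return category, disturbance_stage(window)   # LSTM/GRU in the paper
    if category == "fault":
        return category, fault_stage(window)         # LeNet/AlexNet in the paper
    return category, None                            # normal operation: no action


def toy_coarse(window: Window) -> str:
    """Stand-in rule: flat = normal, one dominant jump = fault, else disturbance."""
    spread = max(window) - min(window)
    if spread < 0.01:
        return "normal"
    biggest_step = max(abs(b - a) for a, b in zip(window, window[1:]))
    return "fault" if biggest_step > 0.5 * spread else "disturbance"


def toy_disturbance(window: Window) -> str:
    return "LT"  # placeholder label


def toy_fault(window: Window) -> str:
    return "LF"  # placeholder label


result = classify_event([1.0, 0.98, 0.96, 0.94, 0.92],
                        toy_coarse, toy_disturbance, toy_fault)
```

Passing the stages as callables keeps the routing independent of any particular model library, so each stage can be replaced without touching the dispatch logic.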
Most machine learning algorithms are good at solving one kind of problem or processing one type of dataset. However, for a large amount of data with different characteristics and features, no single algorithm working alone can realize the full potential of AI. This is why we propose the integrated machine learning scheme for power event classification. The proposed scheme takes advantage of the strengths of each algorithm and makes the algorithms work together to complement each other. This effectively avoids their weaknesses when they face problems that they were not individually designed to solve.

III. EDGE CLOUD COMPUTING AND SHARING
With the large-scale deployment of terminals (such as PMUs, smart meters, and IoT devices) in the smart grid, large amounts of data can be processed by handling data processing tasks at the edge of the network [35]. In the case of detecting and predicting power grid events, edge cloud nodes can use machine learning to locally categorize and identify the events. Under these conditions, only a small portion of the data, such as that needed for inter-area fault locating and inter-area oscillation detection, will be sent to the remote cloud for further processing. This method of parallel processing and analysis [36] of data from massive numbers of terminals through machine learning based edge computing not only provides rapid responses, but also significantly reduces network overhead.
Specifically, under the proposed method power grid events can be largely detected and predicted promptly by edge cloud nodes. They can implement data storage locally and respond in real-time without uploading data. As shown in Fig. 2, power grid events will firstly be categorized into four different types of events: local disturbances, global disturbances, local faults, and global faults. In our approach the local disturbances and local faults will be processed locally by edge computing, where the local disturbances are sent to the LSTM/GRU algorithm for identification, while the local faults are processed by using the LeNet/AlexNet algorithm. Global disturbances and faults will be uploaded to the remote cloud for classification.

A. COLLABORATIVE CLOUD AND EDGE COMPUTING
To meet the requirements of smart grid synchrophasor networks, 5G Ultra-Reliable and Low-Latency Communications (URLLC) can be considered a viable option [37]. Therefore, in this paper we consider a URLLC-based wireless network architecture that consists of multiple micro base stations (MIBSs), each acting as an edge cloud node that can communicate with its associated macro base station (MABS). With the ever-increasing deployment of PMUs for power grid monitoring, each edge node may have to handle multiple PMUs within its coverage area. However, some of these nodes may experience many events, such as disturbances and faults, and may not have sufficient processing power to handle event categorization and detection in a timely manner. Under these conditions, cooperation and interaction among edge nodes and the remote cloud can be crucial. With the aid of machine learning algorithms, edge nodes should be able to collaboratively offload a part of the data processing to the centralized cloud via MABSs. This can be achieved by intelligently exchanging the network status among edge nodes and the remote cloud. In addition, the end-to-end transmission delay caused by unstable wireless links between MIBSs and MABSs, as well as the transmission between MABSs and the cloud platform, can impact the transmission efficiency. In this paper, a collaborative cloud and edge computing scheme is proposed to share the load and minimize the end-to-end latency.
In the case of detecting and predicting power grid events, edge cloud nodes will first use SVM to quickly categorize the power grid events into different types, such as disturbances, faults, and normal operation. The events categorized as disturbances are sent to an LSTM-based algorithm for identification, and those categorized as faults are processed by a CNN algorithm. Note that different algorithms have different complexities, and those with high complexity will need more computing resources, including extra time to process the events. Furthermore, edge cloud nodes experiencing events categorized as normal operation will become idle until the occurrence of the next event. It is therefore expected that some edge cloud nodes remain idle without having to transmit any information [33], while others are heavily loaded. This affects load balancing and severely undermines the reliability and latency performance of the entire power grid. Under these conditions, the collaborative cloud and edge computing scheme can be used to assign more channel capacity to busy edge cloud nodes and to offload data from busy edge cloud nodes to the remote cloud. The collaborative resource-allocation-aided multi-edge-cloud architecture is shown in Fig. 3, where each micro base station (MIBS), acting as an edge cloud node (ECN), accommodates an uncertain number of PMU terminals. Each ECN can communicate and exchange resources and status information with MABSs and the remote cloud.
Based on the size of the processed data and the complexity of the selected algorithm, we define the processing resource, $RS$, as:

$$RS = \frac{\text{size of the processed data}}{\text{size of input vectors/matrix}} \times \text{total parameters of the algorithm},$$

where the total number of parameters of the algorithm denotes the complexity of the algorithm used. Specifically, the complexity of SVM is $O(N^3)$, where $N$ is the number of data vectors [38]. The total number of parameters in an LSTM network can be expressed as

$$N_{LSTM} = n_c \times n_c \times 4 + n_i \times n_c \times 4 + n_c \times n_o + n_c \times 3,$$

where $n_c$ is the number of memory cells (and the number of memory blocks in this case), $n_i$ is the number of input units, and $n_o$ is the number of output units. The total number of parameters in the GRU RNN can be expressed as

$$N_{GRU} = 3 \times (n^2 + nm + n),$$

where $n$ and $m$ are the sizes of the hidden state and input vector, respectively. The total number of parameters of the classic LeNet, with an input image of 32 × 32 × 1, is around 60k; the total number of parameters of AlexNet, with an input image of 227 × 227 × 3, is around 60M. The integrated machine learning scheme presented in Section II underlies our proposed collaborative approach: multiple machine learning schemes are used to process different types of power events. Their complexities and the sizes of their input vectors/matrices are taken into consideration when allocating computing resources and bandwidth for load sharing [32]. This makes our scheme unique and more efficient compared with traditional collaborative schemes, such as the one in [39], where only one machine learning algorithm is considered.
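As a worked illustration of the processing resource, the sketch below evaluates RS for given parameter counts. The LSTM and GRU counts are assumptions based on commonly used formulas, 4·n_c² + 4·n_i·n_c + n_c·n_o + 3·n_c for an LSTM layer with peephole connections and 3·(n² + nm + n) for a GRU layer; the function names and the example sizes are hypothetical.

```python
def lstm_parameter_count(n_c: int, n_i: int, n_o: int) -> int:
    """Assumed LSTM count: recurrent, input, output, and peephole weights."""
    return 4 * n_c * n_c + 4 * n_i * n_c + n_c * n_o + 3 * n_c


def gru_parameter_count(n: int, m: int) -> int:
    """Assumed GRU count: three weight sets of size n*n + n*m + n each."""
    return 3 * (n * n + n * m + n)


def processing_resource(data_size: float, input_size: float,
                        total_parameters: float) -> float:
    """RS = (size of processed data / size of input) * total parameters."""
    return data_size / input_size * total_parameters


# Illustrative numbers only: two LeNet-sized inputs (32 x 32 x 1 values each)
# processed by a network with about 60k parameters.
lenet_rs = processing_resource(data_size=2 * 32 * 32,
                               input_size=32 * 32 * 1,
                               total_parameters=60_000)
```

Because RS scales linearly with the parameter count, the same amount of data costs roughly a thousand times more processing resource under AlexNet (about 60M parameters) than under LeNet (about 60k), which is why the choice of algorithm matters for load balancing.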
To balance the loads of multiple ECNs, an edge-cloud selection algorithm is described as follows. The processing resource used by every edge cloud in a given time period can be defined as $RS_i$, $i = 1, \dots, E$, where $E$ represents the total number of ECNs. This means that the processing resource used by all ECNs is $\sum_{i=1}^{E} RS_i$, while the average processing resource is $\frac{1}{E} \sum_{i=1}^{E} RS_i$. At the beginning of a time period, each ECN selects suitable machine learning algorithms for the power events occurring within its domain, from which it calculates the needed $RS_i$. After exchanging information with MABSs, each ECN can obtain the $RS_i$ of all ECNs and then decide whether to forward the measurements to the remote cloud. Assuming $\lambda_i \in [0, 1]$ is the data splitting ratio, denoting the proportion of data remaining at ECN $i$, the new processing resource at the ECN can be expressed as $RS_i' = \lambda_i \cdot RS_i$. The entire processing time for an idle or less busy ECN $i$, without offloading to the remote cloud, is the computation delay, denoted by $t_i^{comp,e} = \tau_i RS_i$, where $\tau_i$ is the time spent on one unit of the processing resource in ECN $i$. For a busy ECN, the whole processing time includes i) the computation delay of the ECN itself, $t_i^{comp,e} = \tau_i \lambda_i RS_i$; ii) the transmission delay from the ECN to the remote cloud, $t_i^{tran,e} = (1 - \lambda_i) L_i (R_i^{-1} + W_i^{-1})$, where $L_i$ is the size of the processed data, $R_i$ is the wireless channel capacity assigned to ECN $i$, and $W_i$ is the backhaul communication capacity used by ECN $i$ ($R_i^{-1}$ and $W_i^{-1}$ can be interpreted as the time required for the wireless and backhaul links, respectively, to transmit one bit of data [39], [40]); and iii) the computation delay of the remote cloud, $t_i^{comp,c} = \tau_{c,i} (1 - \lambda_i) RS_i$, where $\tau_{c,i}$ is the time spent on one unit of the processing resource of ECN $i$ by the assigned cloud computation resources.
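The three delay components for a busy ECN can be evaluated numerically as in the sketch below. The expressions are assumptions derived from the definitions above (time per unit of processing resource, time per transmitted bit, and the split ratio λ_i as the share kept at the edge); the function names and the example values are hypothetical.

```python
def edge_computation_delay(rs: float, lam: float, tau_edge: float) -> float:
    """Time the ECN spends on the share lam of the processing resource it keeps."""
    return tau_edge * lam * rs


def transmission_delay(bits: float, lam: float, r: float, w: float) -> float:
    """Time to send the offloaded share over the wireless link, then the backhaul."""
    return (1.0 - lam) * bits * (1.0 / r + 1.0 / w)


def cloud_computation_delay(rs: float, lam: float, tau_cloud: float) -> float:
    """Time the remote cloud spends on the offloaded share."""
    return tau_cloud * (1.0 - lam) * rs


# Example: keep 60% at the edge, offload 40% of a 15K-bit event.
t_edge = edge_computation_delay(rs=100.0, lam=0.6, tau_edge=0.01)
t_tran = transmission_delay(bits=15_000, lam=0.6, r=1e6, w=1e7)
t_cloud = cloud_computation_delay(rs=100.0, lam=0.6, tau_cloud=0.001)
```

With λ_i = 1 the transmission and cloud terms vanish, which matches the case of an ECN that does not offload.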
Problem Formulation: Based on the above delay types, the overall processing delay $T_i$ of the $i$-th busy ECN is determined by its own computation delay, the transmission delay $t_i^{tran,e}$, and the remote cloud computation delay $t_i^{comp,c}$. An ECN without offloading can be treated as a special case with $t_i^{tran,e} = 0$ and $t_i^{comp,c} = 0$, so its processing delay reduces to its own computation delay. Assuming that all ECNs have the same priority, the processing delay of the entire system, $T_{whole}$, follows from the delays $T_i$ of the individual ECNs. It should be noted that the parameters $\lambda_i$, $R_i$, $W_i$, and $\tau_{c,i}$ need to be carefully selected and adjusted to minimize the overall system processing delay $T_{whole}$. Therefore, the optimization problem is to minimize $T_{whole}$ subject to the resource constraints, where $R_{all}$ and $W_{all}$ are the overall wireless and backhaul communication resource constraints of the ECNs, and $\tau_{c,all}$ corresponds to the maximum available computing resources provided by the cloud server. The optimization variables include the data splitting ratio $\lambda_i$, the communication resource allocation $R_i$ and $W_i$, and the computation resource allocation $\tau_{c,i}$.
Optimal Data Splitting and Resource Allocation: For busy ECNs, the splitting ratio $\lambda_i$ should be selected to minimize the processing delay $T_i$ and the overall processing delay $T_{whole}$. As mentioned in [39], optimization of the splitting ratio $\lambda_i$ is complicated, and its solution depends on the available wireless channel capacity, backhaul link capacity, edge node computation resources, and remote cloud resources. However, for event detection and classification, there are only finitely many choices for $\lambda_i$. For example, in the case of a busy ECN where each period is composed of 5 successive time slots (one time slot for each event), the splitting ratio $\lambda_i$ can only be selected as 0%, 20%, 40%, 60%, 80%, or 100% (note that the processed data for one event should not be split). Furthermore, if only 3 events are detected after SVM prediction (i.e., no event occurs in two of the time slots), the size of the processed data is reduced to 60% of the entire data. Under these conditions, the choices for $\lambda_i$ are 0%, 33.3%, 66.7%, and 100%. As discussed in [39], when ECNs have far fewer edge computation resources than the remote cloud, they should offload more processed data to the remote cloud in order to ease the data processing load at the ECNs. Conversely, when the wireless channel and backhaul link capacities are not sufficient, more data should remain at the ECNs. Furthermore, as far as the wireless channel and backhaul link capacity assignment is concerned, it is reasonable to assign more capacity to edge cloud nodes with more offload to the remote cloud, increasing their values of $R_i$ and $W_i$. Similarly, in the remote cloud, more computation resources need to be assigned to the edge cloud node with more offloading in order to reduce the value of $\tau_{c,i}$. Since our approach uses multiple machine learning algorithms, their complexities and the sizes of their input vectors/matrices differ significantly. Therefore, more computation resources in the remote cloud should be allocated to an ECN using a high-complexity algorithm.
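The finite set of splitting ratios discussed above follows directly from the rule that the data of one event is never split. A minimal sketch, using a hypothetical helper name and exact fractions to avoid rounding:

```python
from fractions import Fraction
from typing import List


def splitting_choices(num_events: int) -> List[Fraction]:
    """Feasible values of the split ratio when whole events are kept or offloaded.

    With k detected events the ECN keeps 0, 1, ..., k of them, so the ratio
    takes the values j/k. With no events there is nothing to offload, so the
    only choice is to keep everything (ratio 1).
    """
    if num_events == 0:
        return [Fraction(1)]
    return [Fraction(j, num_events) for j in range(num_events + 1)]


five = [float(x) for x in splitting_choices(5)]
three = splitting_choices(3)
```

For five events this yields 0, 0.2, 0.4, 0.6, 0.8, and 1, and for three events it yields 0, 1/3, 2/3, and 1, matching the percentages quoted in the text.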
After exchanging information with MIBSs and the remote cloud, MABSs will derive the optimal offload data size of the ECNs and the optimal assignment of wireless channel capacity, backhaul link capacity, and remote computation resources. Specifically, the collaborative cloud and edge computing scheme aims to minimize the processing delay $T_i$ of the ECNs, as well as the overall processing delay $T_{whole}$ of the whole system, according to the following steps:
1: Initial values of $R_i$, $W_i$, and $\tau_{c,i}$ will be calculated by equally assigning wireless channel capacity, backhaul link capacity, and remote cloud computation resources to non-idle ECNs.
2: Based on these initial values, each non-idle ECN will derive the splitting ratio $\lambda_i$ and the size of the offload to the remote cloud by minimizing the processing delay $T_i$.
3: The values of $R_i$ and $W_i$ will be updated by assigning wireless channel and backhaul link capacities proportionally to the size of the offloaded data. The value of $\tau_{c,i}$ will be updated by assigning remote cloud computation resources proportionally to the offloaded processing resource, where the complexity and the size of the input vector/matrix are taken into consideration.
4: Based on the updated values of $R_i$, $W_i$, and $\tau_{c,i}$, the splitting ratio $\lambda_i$ and the size of the offload to the remote cloud will be updated by re-minimizing the processing delay $T_i$.
5: Steps 3 and 4 will be repeated until $\lambda_i$ is stable and $T_i$ and $T_{whole}$ are minimized.
Note that, as there are not many choices for $\lambda_i$, it is not difficult to derive the optimal $\lambda_i$ and minimize the overall processing delay $T_{whole}$.
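The five steps above can be sketched as a small fixed-point iteration. This is a sketch under stated assumptions, not the authors' implementation: the per-ECN delay is assumed to be the larger of the edge branch and the offloaded branch (the two run in parallel), resources are expressed as shares of `r_all`, `w_all`, and the cloud speed, and a small `floor` keeps every share positive so that an ECN can start offloading later. The names `Ecn`, `delay`, `best_lambda`, `shares`, and `allocate` and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Ecn:
    rs: float        # RS_i: processing resource needed in this period
    bits: float      # L_i: size of the processed data, in bits
    events: int      # detected events; the split ratio moves in steps of 1/events
    tau_edge: float  # tau_i: seconds per unit of processing resource at the edge


def delay(e: Ecn, lam: float, r: float, w: float, tau_c: float) -> float:
    """Assumed T_i: edge branch and offloaded branch run in parallel."""
    edge = e.tau_edge * lam * e.rs
    offloaded = (1.0 - lam) * (e.bits * (1.0 / r + 1.0 / w) + tau_c * e.rs)
    return max(edge, offloaded)


def best_lambda(e: Ecn, r: float, w: float, tau_c: float) -> float:
    """Steps 2 and 4: pick the split ratio on the finite grid with the smallest T_i."""
    if e.events == 0:
        return 1.0  # idle ECN: nothing to offload
    grid = [k / e.events for k in range(e.events + 1)]
    return min(grid, key=lambda lam: delay(e, lam, r, w, tau_c))


def shares(weights: List[float], floor: float = 0.05) -> List[float]:
    """Proportional shares summing to 1; the floor is a choice of this sketch."""
    n, total = len(weights), sum(weights)
    if total == 0:
        return [1.0 / n] * n
    return [floor / n + (1.0 - floor) * x / total for x in weights]


def choose(ecns, bw, cpu, r_all, w_all, tau_cloud) -> List[float]:
    return [best_lambda(e, r_all * b, w_all * b, tau_cloud / c)
            for e, b, c in zip(ecns, bw, cpu)]


def allocate(ecns: List[Ecn], r_all: float, w_all: float, tau_cloud: float,
             max_iter: int = 50) -> Tuple[List[float], List[float]]:
    """Return the split ratios and the resulting per-ECN delays."""
    busy = [1.0 if e.events else 0.0 for e in ecns]
    bw, cpu = shares(busy), shares(busy)             # step 1: equal among non-idle
    lams: Optional[List[float]] = None
    for _ in range(max_iter):                        # step 5: repeat steps 3 and 4
        new = choose(ecns, bw, cpu, r_all, w_all, tau_cloud)  # steps 2 and 4
        if new == lams:
            break                                    # split ratios are stable
        lams = new
        bw = shares([(1.0 - l) * e.bits for l, e in zip(lams, ecns)])  # step 3
        cpu = shares([(1.0 - l) * e.rs for l, e in zip(lams, ecns)])   # step 3
    lams = choose(ecns, bw, cpu, r_all, w_all, tau_cloud)
    delays = [delay(e, l, r_all * b, w_all * b, tau_cloud / c)
              for e, l, b, c in zip(ecns, lams, bw, cpu)]
    return lams, delays


example = [Ecn(rs=500, bits=75_000, events=5, tau_edge=0.01),
           Ecn(rs=100, bits=15_000, events=1, tau_edge=0.01),
           Ecn(rs=0, bits=0, events=0, tau_edge=0.01)]
lams, delays = allocate(example, r_all=20e6, w_all=100e6, tau_cloud=0.004)
```

In this example the busiest ECN keeps one of its five events and offloads the rest, reducing its delay from 5 s without offloading to about 1.66 s, while the lightly loaded and idle ECNs keep everything local. If the delay components are summed instead of overlapped, the delay becomes linear in the split ratio and the search degenerates to keeping or offloading everything.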

IV. SIMULATION
In our simulation, the EMTP tool¹ is used to simulate power grid events and generate measurement data. First, we consider a scenario with a single ECN in which 800 power quality events are used, including 100 GTs, 100 LSs, 100 OSs, 100 LTs, 100 GFs, 100 LFs, 100 BFs, and 100 TFs. They are first categorized by the SVM algorithm. As shown in Table 1, [...] OSs and 13 LTs. These 403 events are sent to LeNet for identification. As shown in the confusion matrices of Tables 2 and 3, LSTM achieves an accuracy of 90.18% in identifying the 397 events, while LeNet obtains an accuracy of 90.07% in identifying the 403 events. The overall accuracy of the proposed event classification method is 90.13%, which is the best performance in the comparison of Table 4, where the 800 events are processed separately using only LSTM, only SVM, only LeNet, and the proposed event classification method. Tables 5 and 6 display the relative performances of LSTM, SVM, and LeNet on disturbances and faults, respectively. In Table 7, the proposed event classification method is compared with LSTM, SVM, and LeNet. Table 7 shows that the proposed event classification method achieves a similar performance on real PMU data as on EMTP data.
¹ Certain commercial equipment, instruments, or materials are identified in this paper to foster understanding. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.
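The reported overall accuracy is consistent with a count-weighted average of the two branch accuracies, as the short check below shows. The helper name is hypothetical, and the inputs are the rounded percentages quoted in the text, so the result agrees with 90.13% only to within rounding.

```python
from typing import List, Tuple


def overall_accuracy(parts: List[Tuple[int, float]]) -> float:
    """Average the accuracies (in percent), weighted by number of events."""
    total = sum(count for count, _ in parts)
    return sum(count * acc for count, acc in parts) / total


# 397 events identified by LSTM at 90.18%, 403 events by LeNet at 90.07%.
overall = overall_accuracy([(397, 90.18), (403, 90.07)])
```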
In Figs. 4, 5, and 6, we compare the performance of cloud and edge detection in a scenario with a single edge cloud node and a remote cloud server. In our simulations, 50 PMUs generate real-time synchrophasor data with sampling rates ranging from 10 to 60 samples per second. The generated data is first processed locally and categorized into four different types of events, namely local disturbances, global disturbances, local faults, and global faults. The local disturbances and faults are processed locally using edge detection, where local disturbances are sent to the LSTM algorithm for identification, while local faults are processed using the LeNet algorithm. The related data is stored and processed locally. On the other hand, the global disturbances and faults are uploaded to the remote cloud for classification. In our simulations, the percentage of global events is set to 10%, 20%, and 50%. Figs. 4 and 5 demonstrate the significant advantage of edge detection over cloud detection in terms of packet loss and average latency, which verifies the efficiency of the proposed edge detection. Fig. 6 evaluates the amount of traffic produced on the network. As can be observed, cloud detection produces far more traffic than edge detection. Although the traffic of edge detection increases with the percentage of global events, it still achieves far better performance than cloud detection.
In Figs. 7 and 8, we evaluate the performance of the proposed collaborative cloud and edge computing scheme in a three-edge-cloud-node scenario, where each edge cloud node handles 50 PMUs. The wireless channel bandwidth between the MIBSs and the MABS is set to 20 MHz, the backhaul link capacity is 100 Mbps, the edge cloud computation capacity is 1 × 10^10 CPU cycles/s, and the remote cloud computation capacity is 2 × 10^11 CPU cycles/s. Within each time period, multiple events, simulated using the EMTP, are randomly distributed in the system. Each period is split into 5 successive time slots, each of which is allocated to one possible event. Each event consists of 15K bits of measurement data. In this case, only local events are considered. Local disturbances and local faults are equally likely to occur. At the beginning of a time period, each ECN uses SVM to categorize the power events. The average processing time of SVM is around 2.3 s. LSTM is used to process local disturbances, and LeNet or AlexNet is used to process local faults. If the power signal is categorized as normal, no action is taken. The processing resource used by every edge cloud is then derived, and its value is exchanged with the MABS. The collaborative cloud and edge computing scheme aims to balance the load and to minimize the processing delay $T_{whole}$ of the entire system. As can be seen from Fig. 7, the collaborative cloud and edge computing scheme can significantly reduce the processing delay $T_{whole}$ of the system, especially in scenarios with more events. Fig. 8 depicts the average convergence rate of the proposed edge cloud sharing scheme in a scenario where the average number of events in each time period is 9. As mentioned earlier, there are not many choices for $\lambda_i$, and the proposed scheme converges in around 10 iterations on average.
In Table 8, the processing time and complexity of the employed LeNet, AlexNet, LSTM, and GRU are investigated and compared. Events with the same amount of data (15K bits of measurement data) are used for all algorithms. Specifically, LeNet or AlexNet is used for processing faults, while LSTM or GRU is used for processing disturbances. Compared with LeNet, AlexNet achieves better accuracy at the expense of higher complexity. On the other hand, GRU reduces the gating signals to two, compared with three in LSTM. However, it has to increase the hidden state size $n$ to achieve performance similar to LSTM, which leads to a similar complexity for a similar accuracy.

V. CONCLUSION
In this article, we present a power grid event classification method in which different AI-assisted algorithms are used to process different events, which can effectively improve system performance. For a large-scale power grid system with multiple edge cloud nodes, a collaborative cloud and edge computing scheme is introduced to balance the load and reduce the overall processing delay, $T_{whole}$, of the whole system. The complexity and processing time of some widely used machine learning algorithms, such as LSTM, GRU, LeNet, and AlexNet, are investigated and compared. Across different scenarios, our investigations show that the proposed event classification method, together with the collaborative cloud and edge computing scheme, can improve accuracy and effectively balance the load.