Learning Power Grid Outages With Higher-Order Topological Neural Networks

With the increase in cyber-physical threats and extreme weather events, resilience of the power system has become a problem of utmost societal importance. In this article, we propose a novel approach for resilience improvement of power distribution networks, based on the notions of persistent homology and simplicial neural networks (SNNs), which are new directions in graph learning. In particular, tools of persistent homology allow us to capture the most essential topological descriptors of the distribution network. In turn, extending the convolutional operation to simplicial complexes on the distribution network, using the Hodge-Laplacian analytics, enables us to describe complex interactions among multi-node higher-order graph substructures. Such higher-order graph substructures are of particular importance in distribution networks, since a change in power demand at a load bus (or the power supplied from a substation) will produce a corresponding perturbation in nodal variables (such as the bus voltages) and edge variables (such as branch currents). We validate our new Higher-Order Topological Neural Networks (HOT-Nets) model for contingency classification on three test distribution networks: the IEEE 37-bus feeder, the IEEE 123-bus feeder, and the 342-bus low-voltage network. Our experimental results on two case studies (i.e., (i) with sensors placed at all the buses alone in the networks and (ii) with partial observability in the networks) indicate that HOT-Nets substantially outperforms 9 state-of-the-art methods, yielding relative gains of up to 14.04% in terms of system resilience classification.

Index Terms: Resilience, distribution network, graph neural networks, graph learning.

I. INTRODUCTION
A resilient power network is crucial for the security and prosperity of a society. The alarming increase in power network outages due to extreme weather events and cyber-physical attacks has made it even more necessary to enhance the resilience of the power network infrastructure. This includes strategies to prevent, detect, and mitigate outages caused by contingency events. Outage detection in the distribution network (DN) is hence an important facet of improving resilience and provides the much-needed real-time situational awareness prior to implementing restorative actions [1], such as distributed energy resource dispatch, network reconfiguration [2], and sectionalizing into microgrids [3]. Additionally, knowledge of the operating status of the DN is critical for state estimation, monitoring of distributed energy resources, and management of demand response [4], [5].
Line failures or short-circuit faults in the DN may be caused by natural factors (such as aging and extreme weather) or by cyber-physical attacks. During such adverse conditions, the protective devices automatically isolate the faulted components and disconnect the downstream network, with its connected loads, from the main grid, thereby resulting in an outage. Fault diagnostics (determining the location and the type of failure) are often performed after detecting an outage [6]. Hence, outage detection comes first and corresponds to the task of finding the status of the network functionality, including that of the protective devices [7].
The transmission network is known to be a critical infrastructure and hence has a widespread presence of advanced communication, metering, and sensor equipment (such as phasor measurement units, PMUs). Several techniques have been proposed for outage detection in transmission networks, such as Bayesian regression [8] and sparse vector estimation [9]. In contrast, DN outage detection is more challenging because the network is largely unobservable [10]. Until recently, the criticality of the DN was overlooked, which is now changing with the transition towards the smart grid concept and the decentralization of the grid. Unlike the transmission network, the DN traditionally has a radial structure and larger resistance-to-reactance (R/X) ratios, and it is spread out over hundreds of thousands of nodes. Hence, many of the algorithms developed for transmission networks may not be suitable for DN outage detection. Additionally, increasing the number of sensors to impart global observability to the DN would result in high installation and maintenance costs [11]. On the other hand, with innovations in instrumentation technology, a decrease in total sensor cost is expected over the years. Even so, given factors such as latency and communication bottlenecks that grow with the DN size [5], a limited number of sensors placed optimally at select locations is deemed sufficient to capture the system state. This is because any information relayed to the distribution system operator should be available in real time so as to enable instantaneous control and transfer to the emergency management system if necessary.
Sparse sensor placement is an area with a plethora of research surrounding the advancements in compressed sensing. Sparse sensing and the related network classification or signal reconstruction rely on the tendency of complex high-dimensional dynamic systems to exhibit low-dimensional patterns [12]. These patterns can be exploited to learn the system state using data-driven techniques. There exist different streams of data in the DN. The supervisory control and data acquisition (SCADA) system provides the electrical measurements and status of the devices in the primary substations, and is expected to cover the lower-voltage networks in the near future. Power consumption and generation of customers in the DN are measured using the advanced metering infrastructure (AMI). The smart meters in this class can also provide the voltage and current magnitudes at several points in the network. Other devices, such as micro-PMUs and linewatch sensors, provide the real and reactive power flows along the lines incident to the node at which they are placed [13]. Time series measurements from PMUs specifically designed for DNs are also being employed for analytics and diagnostics. Optimal placement of these devices in the DN has already been studied in several works such as [1], [7] and is beyond the scope of our article.
The data from the sensors can be used for three different categories of tasks: predictive, prescriptive, and descriptive. In predictive approaches, historical data is used to predict the state of the network or events in the near future. State estimation is a common task in this category. Switching (network reconfiguration), load shedding, and similar actions fall in the class of prescriptive tasks, where suitable actions and/or probable outcomes are prescribed based on the current system state. Descriptive approaches are used to learn from past behaviors to classify elements into groups [14]. Fault diagnostics and outage detection are typical examples in this category and usually encompass classification and clustering techniques. In this article, we propose a new graph learning architecture for descriptive tasks, in particular outage detection using available sensor measurements.
Until recently, DN operators gathered information about the network functionality from customer calls and crew inspections, or from the signals picked up by sensors in the substation [7]. Following this, knowledge-based systems utilizing both customer calls and polling from AMI resources were used for identifying DN outages [15]. This approach, however, lacked real-time identification of the overall system performance, given the expense of acquiring data and the computational constraints associated with network size. Further, with the deployment of smart sensors, direct monitoring of the status of some of the network parameters and the substation electrical parameters became possible. Given the requirement for timely detection of network contingencies with sparse sensors, machine learning (ML) models are considered a good fit and have hence been employed in several works such as [6].
Extreme events that result in network outages are known to be high-impact, low-probability events. A major limitation associated with this is the unavailability of sufficient data (from sensors during outages) to train the ML models. This is often mitigated by the adoption of pseudo measurements (e.g., historical load forecasts and weather predictions) and physics-based (power flow) models. On the other hand, different outage scenarios may have nearly identical observations in power flows, thus resulting in a non-unique mapping, which is challenging to learn [5]. Hence, using a naive ML approach with pseudo measurements and power flow measurements may not be sufficient. The actual power flow is essentially a function of the network topology, load demand, and the outage scenario in the network [7]. This implies that factoring in the topological aspect of the DN is pertinent for the outage detection task, yet it has received relatively little attention in the literature.
The DN has an inherent graph structure, with buses as nodes, and lines or transformers as edges. There also exists a strong interdependence among state variables such as bus voltages and line flows, which can be modeled as node and edge variables, respectively. Considering that component failures result in changes in the topology of the distribution feeder, and the underlying network connectivity also affects the scale of system degradation, we adopt a learning-over-graph approach to detect power disruption in the network.
To address the above-mentioned limitations, and in particular, to account more accurately for the role of local topological information in the DN as well as the latent multi-node interactions among loads, substation buses, and the subsequent branch flows, we propose a novel deep neural architecture, namely, Higher-Order Topological Neural Networks (HOT-Nets), which fuses two emerging directions in graph learning, i.e., a fully trainable topological layer and a convolution operation on simplicial complexes. We extensively validate the HOT-Nets architecture for outage detection on power distribution networks. Our findings show that HOT-Nets delivers significant improvements in the classification of distribution system outages, with relative gains of up to 14.04% compared to 9 state-of-the-art deep learning methods.
The main contributions of this work can be summarized as follows:
• We introduce the concepts of topological signatures and higher-order interactions among multi-node graph substructures into the learning of power distribution systems by combining the benefits of persistent homology, Hodge theory, and deep neural networks. To the best of our knowledge, neither tools of persistent homology nor Hodge-Laplacians alongside graph neural networks have previously been applied to any resilience-related task in distribution grids.
• To the best of our knowledge, this is the first deep neural architecture that leverages both a simplicial convolutional layer and a fully trainable topological layer. Considering the slow adoption of graph learning techniques in power system applications, our work goes a step further by leveraging tools of topological data analysis to characterize distribution networks.
• To enable the resilient operation of the distribution network, we develop a model for detecting outages in a practical distribution network with sparse sensors. The model is validated on two conventional radial distribution networks and one urban, low-voltage meshed distribution network. Additionally, two case studies are performed: one with sensors placed at all the buses alone in the networks and another with partial observability in the networks. The performance of the model is compared with state-of-the-art learning models.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

II. RELATED WORK
Outage Detection and Resilience in Power Distribution Networks: The degradation of the system state with the evolution of high-impact, low-probability extreme events in the DN is a temporal process, which can be categorized as before, during, and after an event. Hence, the tasks associated with outages can be classified as prediction, detection, and mitigation accordingly. It is important to infer the system status during or immediately following line outages prior to taking restorative actions such as network reconfiguration and intentional islanding [3]. Generally, outage detection relies on sensors allocated in the DN, as discussed, for example, in [5], [6].
In related tasks such as state estimation and topology identification, several works have employed classical approaches such as the weighted least squares and weighted least absolute value methods. In these methods, state estimation or topology identification is modeled as an optimization problem (usually a mixed-integer linear program) that can be solved using suitable solvers. In [16], for example, a reformulated version of the weighted least absolute value (WLAV) method has been used for topology identification. Although the WLAV method is more robust against bad data than weighted least squares optimization methods, it is computationally expensive and sensitive to measurement uncertainty, making it unsuitable for real-time online state estimation. Additionally, the robustness and bad data rejection of the WLAV method are also attributed to measurement redundancy, which may not be present in distribution networks [16]. Distribution networks are often only partially observable due to limited sensor placement. This is because the distribution network is considerably large, with hundreds of thousands of nodes and lines, and it may not be economically feasible to allocate sensors for each component. However, the limited sensors are placed optimally such that the network state can be derived from the sparse signals. This principle of compressed or sparse sensing is used in [17], albeit with a mixed-integer non-linear programming approach, which is computationally intensive given the NP-hard nature of the state estimation problem.
Considering outage-specific tasks in the DN, [5] adopts a strategy of partitioning the DN into subnetworks, which are further divided into multiple control areas. Similarly, in [6], optimally placed power flow sensors and load estimates acquired from the advanced metering infrastructure (AMI) are used for outage detection by dividing the tree network into subtrees, with sensors placed at the root and boundaries of the subtrees. The detector is based on a maximum a posteriori probability (MAP) formulation; in particular, the graph (tree) structure of the DN is also considered, with the power flow measurements as the edge variables and the load forecasts as the node variables. However, as mentioned earlier, given the partial observability of the DN with limited sensors, as well as the non-scalability and latency associated with such approaches, ML-based techniques can alternatively be adopted for real-time outage detection and situational awareness. In this light, [18], [19] have adopted ML approaches in which historical outage data for specific extreme weather events is collected, analyzed, and used for training a deep neural network to predict repair time and restoration time. Such models are useful for post-disaster decision making such as optimal repair and crew scheduling. Many outage prediction models using weather data and ML are also available in the literature, such as [20], [21]. In particular, a graph embedding model is adopted in [21], where a DN overlaying a geographical area is modeled as a graph to account for spatial information and the variation in weather with geography while performing outage prediction. These models can be used for early warning and for outage preparation measures such as increasing storage and renewable energy trading among prosumers [22]. Alternatively, the outage detection problem is addressed in [23], where social media posts (tweets) are used as sensors. Although practical and interesting, this approach lacks immediate detection, as the human response to such events is much slower than the speed of outages and cascading effects in the DN. There also exist models such as [24], where a data-driven scheme with smart meter data and ML models is used for outage detection. Alternatively, in [25], the authors propose a deep learning (DL)-based approach for topology identification (TI) and state estimation for an unbalanced three-phase distribution system. These models are, however, purely data-driven and do not consider any underlying topological functionalities to accurately capture network dynamics.
As opposed to the large amount of work concerning outage prediction, outage location, and prediction of repair and restoration time, work on outage detection using ML is comparatively limited. It is, however, essential to know in real time whether there are indeed any load outages (i.e., loads not served) and what risks are associated with line outages. Incorporating the topological information along with the physics-based power flow information for each outage scenario into the learning framework, which could enhance the accuracy of the model used for situational awareness, has not yet been explored. This is necessary especially considering the variation in topology with outages and the inherent graph structure of the DN, with correlation among different nodal variables.
Persistent Homology: Persistent homology [26], [27] is a suite of tools within topological data analysis (TDA) that has shown substantial promise in a broad range of study domains, from bioinformatics to material science to social networks to energy systems [28], [29]. Persistent homology (PH) has also been successfully integrated as a fully trainable topological layer into various deep learning models, addressing tasks such as node and graph classification, link prediction, and anomaly detection (see, e.g., overviews in [29], [30]). More specifically, the goal of PH is to study properties of observed data that are invariant under continuous transformations such as twisting, stretching, and compressing. Such properties are broadly referred to as shape, or topological characteristics of the data. Nevertheless, persistent homology and TDA have not yet been employed for resilience-related tasks in power distribution systems.
Simplicial Neural Networks: Modeling higher-order interactions on graphs is an emerging direction in graph representation learning. While the role of higher-order structures for graph learning has been documented for a number of years [31], [32] and involves diverse applications such as graph signal processing in electricity networks [33], [34], dynamics of disease transmission, and biological networks, the integration of higher-order graph substructures into deep learning on graphs emerged only in 2020. As shown in [35], [36], higher-order network structures can be leveraged to improve the performance of link and trajectory prediction tasks. Indeed, several recent studies [37], [38], [39], [40] propose to leverage simplicial information to train neural networks on graphs. However, none of these so-called Simplicial Neural Networks (SNNs) is integrated with a fully trainable topological layer that would allow it to learn both persistent topological features and the simplicial geometry of graphs. In this article, HOT-Nets is proposed to address this limitation.
Graph Classification with Graph Neural Networks: The application of graph neural networks to graph classification has received considerable attention in recent years. Graph Convolutional Networks (GCNs) [41] employ a simplified graph convolution that aggregates the target node's feature information from its neighbors. GraphSAGE [42] performs sum, average, or max-pooling neighborhood aggregation and updates the node representation by applying a linear transformation on top of graph convolution. Graph Attention Networks (GAT) [43] combine graph neural networks with an attention mechanism by computing attention scores using each node's features and those of its one-hop neighbors. Adaptive Multi-channel Graph Convolutional Networks (AM-GCN) [44] extract different embeddings from node features and topological structures, and use an attention mechanism to learn the importance weights for the different embeddings. Moreover, some graph neural network (GNN)-based models exploit a pooling architecture that addresses a limitation of traditional graph pooling architectures, namely their inability to capture graph substructure information. For instance, DiffPool [45] develops a differentiable pooling operator that learns a soft assignment at each graph convolutional layer. A similar idea is utilized in EigenGCN [46], which introduces a pooling operator based on the graph Fourier transform. Compared with these approaches, our proposed HOT-Nets architecture can capture both the most characteristic topological features of the graph and complex interactions among higher-order graph structures described via simplices. As such, HOT-Nets is the first architecture within geometric deep learning that combines a topological layer with learning on simplices.

III. METHODOLOGY
In this section, we present our proposed HOT-Nets model for the graph classification task. The failure of lines in the power DN affects the network connectivity and may result in the isolation of buses from the main grid. Fault isolation by the DN protection system may split the network into islands with no power-generating resource, resulting in disruption of supply at the load buses and thereby creating a network outage. A line failure in the equivalent graph representation of the DN corresponds to the removal of the particular edge. Despite the removal of some edges (lines), the network may still be functioning. This can be attributed to the meshed structure of some networks and the presence of zero-injection nodes in the DN, which carry no load. However, the power flow variables are also required to provide an accurate description of the system state. Further, any instance with unserved energy is considered an outage, and any instance with no loss of energy is considered normal operation. This detection task is formulated as a graph classification problem. Fig. 1 shows the overall framework of our proposed model. The key idea is that HOT-Nets incorporates both higher-order structures and local topological information about the distribution network into the neural network architecture. The attention mechanism then explicitly captures dependencies among different topological representations. We start by outlining the HOT-Nets approach and show how the proposed framework is used to learn higher-order representations and topological signatures of distribution networks.
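As a rough illustration of the labeling rule described above, the following sketch marks a scenario as an outage when any load bus loses its path to the substation after a set of lines is removed. It is a connectivity-only simplification: the helper names (`reachable`, `label_scenario`), the bus names, and the toy feeder are hypothetical, and the labels used in the paper additionally depend on the power flow variables.

```python
from collections import deque

def reachable(adj, source):
    """Breadth-first search: return the set of buses connected to `source`."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def label_scenario(lines, failed, substation, load_buses):
    """Return 1 (outage) if any load bus is cut off from the substation, else 0."""
    adj = {}
    for u, v in lines:
        if (u, v) in failed or (v, u) in failed:
            continue  # a failed line corresponds to a removed edge
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    energized = reachable(adj, substation)
    return int(any(bus not in energized for bus in load_buses))

# Toy feeder (hypothetical): a loop s-a-b-s plus a radial spur b-c.
# Bus "a" is a zero-injection node, so only "b" and "c" carry load.
lines = [("s", "a"), ("a", "b"), ("b", "s"), ("b", "c")]
loads = {"b", "c"}
label_meshed = label_scenario(lines, {("s", "a")}, "s", loads)  # loop still feeds b and c
label_radial = label_scenario(lines, {("b", "c")}, "s", loads)  # spur failure isolates c
```

In this toy case the failure inside the loop leaves every load supplied, whereas the failure on the radial spur does not, which mirrors the remark that a network may keep functioning after some edges are removed.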
Fig. 1. Framework of our HOT-Nets model for graph classification. Top row: The higher-order simplices convolution (HoSC) module is used to extract higher-order simplices embeddings and form a primary higher-order simplex descriptor (i.e., $Z_H$) via the concatenate operation $\oplus$. Bottom row: First, we generate a persistence image $PI_i$ for the input graph $G$ using the $i$-th filtration function $F_i$ (where $i \in \{1, 2, 3\}$, i.e., here we display 3 different filtrations: degree-based, betweenness-based, and closeness-based); we then feed these PIs into a CNN-based model to obtain the image-level topological features. An attention mechanism is used to adaptively learn the correlation information among higher-order structures and different topological representations.
Problem Definition: We consider an undirected graph $G = (V, E, A)$ as a model of a distribution system with nodes $V$ and edges $E$. Here $|V| = N$ is the number of nodes and $|E| = M$ is the number of edges. Let $A \in \mathbb{R}^{N \times N}$ denote the symmetric adjacency matrix, $X_v \in \mathbb{R}^{N \times d_v}$ be a node feature matrix, and $X_e \in \mathbb{R}^{M \times d_e}$ be an edge feature matrix, where $d_v$ and $d_e$ are the dimensions of node and edge features, respectively. Specifically, $A_{uv} = 1$ implies that there exists an edge between nodes (buses) $u$ and $v$, and $A_{uv} = 0$ otherwise. The function of the proposed model is to detect the presence of outages in the distribution network. Thus, an outage detector is a binary classifier, with '0' representing normal operation and '1' representing power disruption. Although outages change the network connectivity, only the base network topology is known prior to the detection and location of outages. However, the sensor signals superimposed on the graph, i.e., the node variables (bus voltage and active power supplied) and the edge variable (branch flow), vary with each scenario and can be measured. Additionally, some of the edge properties (such as phases, resistance, reactance, etc.) can also be considered as edge features. Outage detection is a graph classification task, which consists of viewing the graph (including its node and edge features) as input and predicting its corresponding label. We assume that each graph belongs to one of $C$ classes (here $C = 2$).
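To make the notation concrete, the sketch below stores one labeled scenario as the tuple $(A, X_v, X_e, y)$ and checks that the shapes agree with $N$ and $M$. The `GraphSample` container, its field names, and the numeric feature values are illustrative assumptions, not part of the original model.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GraphSample:
    adjacency: List[List[int]]        # A, N x N, symmetric with 0/1 entries
    node_features: List[List[float]]  # X_v, N x d_v (e.g., bus voltage, active power)
    edge_features: List[List[float]]  # X_e, M x d_e (e.g., branch flow)
    label: int                        # 0 = normal operation, 1 = power disruption

    def validate(self):
        """Check shape consistency and return (N, M)."""
        n = len(self.adjacency)
        if any(len(row) != n for row in self.adjacency):
            raise ValueError("A must be square")
        if any(self.adjacency[u][v] != self.adjacency[v][u]
               for u in range(n) for v in range(n)):
            raise ValueError("A must be symmetric")
        # Count each undirected edge once (upper triangle of A).
        m = sum(self.adjacency[u][v] for u in range(n) for v in range(u + 1, n))
        if len(self.node_features) != n:
            raise ValueError("X_v needs one row per bus")
        if len(self.edge_features) != m:
            raise ValueError("X_e needs one row per line")
        if self.label not in (0, 1):
            raise ValueError("label must be 0 or 1")
        return n, m

# Three buses connected in a path 0-1-2; all feature values are made up.
sample = GraphSample(
    adjacency=[[0, 1, 0], [1, 0, 1], [0, 1, 0]],
    node_features=[[1.00, 0.5], [0.98, 0.0], [0.97, 0.0]],
    edge_features=[[0.5], [0.3]],
    label=0,
)
n_nodes, n_edges = sample.validate()
```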

A. Preliminaries on Persistent Homology
The key approach of PH is to first associate a graph $G$ with some filtration
$$G_1 \subseteq G_2 \subseteq \ldots \subseteq G_k = G,$$
and then to count various shape patterns of $G_i$ (where $i \in \{1, 2, \ldots, k\}$), such as the number of connected components, triangles, and voids, throughout this nested sequence. To make the counting process systematic, we equip each $G_i$ with a combinatorial object called an abstract simplicial complex.
Definition 3.1: A family $K$ of subsets of a set $S$ is an abstract simplicial complex if for every $\sigma \in K$ and every non-empty subset $\tau \subseteq \sigma$, we have $\tau \in K$, that is, $K$ is closed under the operation of taking subsets. Elements of $K$ are called simplices. An element $\sigma \in K$ such that $|\sigma| = k + 1$ is called a $k$-simplex. Every subset $\tau \subset \sigma$ such that $|\tau| = k$ is called a face of $\sigma$. All simplices in $K$ that have $\sigma$ as a face are called co-faces. Finally, the dimension $d$ of $K$ is the largest dimension of any of its simplices.
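The closure property in Definition 3.1 can be checked mechanically. The following minimal sketch represents a complex as a collection of vertex sets; the function names and the example complex are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def is_simplicial_complex(simplices):
    """Check closure: every non-empty proper subset of a simplex is in the family."""
    family = {frozenset(s) for s in simplices}
    for sigma in family:
        for size in range(1, len(sigma)):
            for tau in combinations(sigma, size):
                if frozenset(tau) not in family:
                    return False
    return True

def faces(sigma):
    """Faces of a k-simplex: its subsets with exactly k elements (none for a vertex)."""
    sigma = frozenset(sigma)
    if len(sigma) <= 1:
        return set()
    return {frozenset(tau) for tau in combinations(sigma, len(sigma) - 1)}

def dimension(simplices):
    """Dimension of the complex: the largest k such that a k-simplex is present."""
    return max(len(s) for s in simplices) - 1

# A filled triangle on buses 1, 2, 3 plus a pendant edge {3, 4}.
K = [{1}, {2}, {3}, {4}, {1, 2}, {1, 3}, {2, 3}, {3, 4}, {1, 2, 3}]
closed = is_simplicial_complex(K)                    # every subset is present
broken = is_simplicial_complex(K[:4] + [{1, 2, 3}])  # triangle without its edges
triangle_faces = faces({1, 2, 3})
```

Dropping the three edges of the triangle breaks closure, which is why a graph with a filled triangle must list the triangle together with all of its edges and vertices.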
As a result, we represent the graph filtration as a nested sequence of simplicial complexes $K_1 \subseteq K_2 \subseteq \ldots \subseteq K_k$, which allows us to use tools of simplicial homology, providing computational techniques to study shape characteristics of $G$ in an efficient manner. In particular, we track which shape patterns appear in this filtration of complexes and record the indices of the first and last appearance of each topological feature (i.e., its birth $i_b$ and death $i_d$, respectively). All extracted topological information can then be summarized in the form of a persistence diagram (PD), which is a multi-set $D$ of points in $\mathbb{R}^2$ such that the $x$ coordinate is the birth $i_b$ of a $q$-dimensional topological feature ($0 \le q \le d$) and the $y$ coordinate is the death $i_d$ of this topological feature. Since $i_d \ge i_b$, all points in $D$ lie in the half-plane on or above $y = x$. The lifespan, or persistence, of a $q$-dimensional topological feature is defined as $i_d - i_b$. The longer the lifespan, the likelier it is that the topological feature contains important information about the structural organization of $G$ [47], [48]. Features with shorter lifespans are often referred to as topological noise.
In general, there are multiple ways in which a graph filtration can be constructed (see, e.g., [49]). Here we consider a sublevel filtration induced by a real-valued function $F$ defined on the nodes of $G$. That is, let $F : V \to \mathbb{R}$ and let $\nu_1 < \nu_2 < \ldots < \nu_n$ be a sequence of sorted filtration values; then $K_i = \{\sigma \in K : \max_{v \in \sigma} F(v) \le \nu_i\}$. (A filtration on edges of $G$ can be defined in a similar manner.) As $K$, we use a Vietoris-Rips abstract simplicial complex [50] due to its computational benefits. In our study, we consider $F$ to be a function of node degree, betweenness, and closeness. Such an architecture with multiple types of filtration functions allows us to better learn multi-scale network properties along different dimensions. In particular, since in practice the power distribution network is designed to deliver power from the substation to the consumers connected at the different load buses, metrics such as degree and betweenness can identify buses that carry maximal power in the network. In turn, node closeness can identify buses that connect the loads to the substation and hence result in maximum disruption in case of outages.
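As a minimal sketch of a degree-based sublevel filtration, the code below assigns each vertex the value $F(v) = \deg(v)$, lets each edge enter at the maximum of its endpoint values, and tracks 0-dimensional features (connected components) with a union-find structure. It works on the graph itself rather than on a Vietoris-Rips complex and ignores higher-dimensional features, so it is a simplification under stated assumptions; all function names are hypothetical.

```python
def degree_filtration(nodes, edges):
    """Filtration value of every vertex and edge under F(v) = degree of v."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    edge_value = {(u, v): max(deg[u], deg[v]) for u, v in edges}
    return deg, edge_value

def sublevel_complex(deg, edge_value, threshold):
    """K_i restricted to vertices and edges: simplices with value <= threshold."""
    verts = {v for v, f in deg.items() if f <= threshold}
    edgs = {e for e, f in edge_value.items() if f <= threshold}
    return verts, edgs

def zero_dim_persistence(deg, edge_value):
    """(birth, death) pairs of connected components, using the elder rule."""
    parent = {v: v for v in deg}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    pairs = []
    for (u, v), value in sorted(edge_value.items(), key=lambda kv: kv[1]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge closes a cycle, so no component dies
        # Each root is the oldest vertex of its component; the younger one dies.
        old, young = (ru, rv) if deg[ru] <= deg[rv] else (rv, ru)
        pairs.append((deg[young], value))
        parent[young] = old
    # Components that never merge persist forever.
    pairs.extend((deg[r], float("inf")) for r in {find(v) for v in deg})
    return sorted(pairs)

# Small radial feeder (hypothetical): 1-2-3 with two laterals 3-4 and 3-5.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (3, 4), (3, 5)]
deg, edge_value = degree_filtration(nodes, edges)
K_at_2 = sublevel_complex(deg, edge_value, 2)
diagram = zero_dim_persistence(deg, edge_value)
```

Pairs with equal birth and death lie on the diagonal and have zero persistence, while the pair with infinite death corresponds to the single connected component of the feeder.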
To encode the topological information presented in the PD $D$ into GNNs, we use its vectorized representation, i.e., the persistence image (PI) [51]. The PI is a finite-dimensional vector representation derived from a weighted kernel density function and can be formulated in two steps. First, we map $D$ to an integrable function $\rho_D : \mathbb{R}^2 \to \mathbb{R}$, which is called a persistence surface. The persistence surface $\rho_D$ is given by a sum of weighted Gaussian kernels centered at each point in $D$. Second, we integrate the persistence surface $\rho_D$ over each grid box to obtain the PI.
More specifically, the value of each pixel $z$ within the PI is formed as
$$PI(z) = \iint_z \sum_{\mu \in T(D)} \frac{g(\mu)}{2\pi \delta_x \delta_y} \exp\left(-\left(\frac{(x - \mu_x)^2}{2\delta_x^2} + \frac{(y - \mu_y)^2}{2\delta_y^2}\right)\right) dy\, dx,$$
where $T(D)$ is the transformation of $D$ (i.e., $T(x, y) = (x, y - x)$), $g(\mu)$ is a weighting function (where the mean $\mu = (\mu_x, \mu_y) \in \mathbb{R}^2$), and $\delta_x$ and $\delta_y$ are the standard deviations of the Gaussian kernels in the $x$ and $y$ directions. To gain a better understanding of the complex representations of graph data, we consider multiple types of filtration functions, and each filtration function corresponds to one persistence image.
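The two-step construction can be sketched in a few lines. The code below applies $T(x, y) = (x, y - x)$, places a Gaussian at each transformed point, and integrates it exactly over every grid box using the normal cumulative distribution function. The weighting $g(\mu)$ is assumed here to be linear in persistence, which is a common choice but not one stated in the text; the grid edges, default standard deviations, and function names are likewise illustrative assumptions.

```python
import math

def normal_cdf(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def persistence_image(diagram, x_edges, y_edges, delta_x=0.5, delta_y=0.5):
    """Integrate a sum of weighted Gaussians centred at T(D) over each grid box."""
    # T(x, y) = (x, y - x): move from (birth, death) to (birth, persistence).
    # Points with infinite death are skipped in this sketch.
    points = [(b, d - b) for b, d in diagram if math.isfinite(d)]
    max_pers = max((p for _, p in points), default=0.0)
    n_rows, n_cols = len(y_edges) - 1, len(x_edges) - 1
    image = [[0.0] * n_cols for _ in range(n_rows)]
    for mu_x, mu_y in points:
        # Assumed weighting g(mu): linear in persistence, zero on the diagonal.
        weight = mu_y / max_pers if max_pers > 0 else 0.0
        for r in range(n_rows):
            mass_y = (normal_cdf((y_edges[r + 1] - mu_y) / delta_y)
                      - normal_cdf((y_edges[r] - mu_y) / delta_y))
            for c in range(n_cols):
                mass_x = (normal_cdf((x_edges[c + 1] - mu_x) / delta_x)
                          - normal_cdf((x_edges[c] - mu_x) / delta_x))
                # A separable Gaussian integrates over a box as a product of 1-D masses.
                image[r][c] += weight * mass_x * mass_y
    return image

# Toy diagram: two persistent features, two diagonal points, one essential class.
diagram = [(1, 3), (1, 3), (2, 2), (3, 3), (1, float("inf"))]
image = persistence_image(diagram, x_edges=[0, 1, 2, 3, 4], y_edges=[0, 1, 2, 3])
total_mass = sum(sum(row) for row in image)
```

With the linear weighting, zero-persistence points contribute nothing to the image, which matches the interpretation of short-lived features as topological noise.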

B. Higher-Order Simplices Convolution Module
A closely related concept to simplicial homology is Hodge theory, which allows us to extend the notion of a standard combinatorial graph Laplacian, which addresses diffusion from node to node of $G$ through edges, to diffusion over higher-order substructures of $G$, described by $k$-simplices of $G$. This generalization of graph Laplacians enables us to account for complex multi-node interactions in $G$, beyond the node level [35], [52]. Such higher-order interactions are particularly important in the analysis of DNs, where the buses (corresponding to nodes in the graph) interact with one another and the power flow through the branches or edges is a result of this interaction. Any change introduced at the buses, such as a variation in load demand or power generation, affects the state of all the system variables, such as voltage and current.
Definition 3.2: Let $C_k$ be a real-valued vector space endowed with a basis from the oriented $k$-simplices. (By orientation of a simplex, we mean selecting some (arbitrary) order for its nodes, where two orderings are said to be equivalent if they differ by an even permutation.) A linear map $\partial_k : C_k \to C_{k-1}$ is called a boundary operator, with matrix representation $B_k$. An operator over oriented $k$-simplices $L_k : C_k \to C_k$ is called the $k$-Hodge Laplacian, and its matrix representation is given by

$$L_k = B_k^\top B_k + B_{k+1} B_{k+1}^\top,$$

where $B_k^\top B_k$ and $B_{k+1} B_{k+1}^\top$ are often referred to as $L_k^{down}$ and $L_k^{up}$, respectively. For example, the standard graph Laplacian $L_0 = B_1 B_1^\top \in \mathbb{R}^{N \times N}$ is a special case of the above $k$-th combinatorial Hodge Laplacian, and the matrix $L_1 \in \mathbb{R}^{M \times M}$ is the Hodge 1-Laplacian. In our experiments, following the spectral properties of the normalized Laplacian operator (see Schaub et al. [36]), we consider the normalized Hodge 1-Laplacian

$$\tilde{L}_1 = D_2 B_1^\top D_1^{-1} B_1 + B_2 D_3 B_2^\top D_2^{-1},$$

where $D_2 = \max(\mathrm{diag}(|B_2| \mathbf{1}), I)$ represents a diagonal degree matrix of the edges, $D_1 = 2\,\mathrm{diag}(|B_1| D_2 \mathbf{1})$ represents a diagonal degree matrix of the nodes, and $D_3 = \frac{1}{3} I$. For the sake of simplicity and without loss of generality, we set $k = 1$. Furthermore, we define a propagation rule for the normalized Hodge 1-Laplacian, i.e., the higher-order simplices convolution (HoSC) module, in which $Z_H^{(\ell)} \in \mathbb{R}^{M \times d}$ is the input activation matrix to the $\ell$-th hidden layer, each layer $\ell$ has its own trainable weight matrix, $\max(\cdot)$ denotes the element-wise max operator, and $\Psi(\cdot)$ is a non-linear transform consisting of a Batch Normalization followed by a ReLU activation. In this case, the message passing schemes on simplicial complexes are (i) from edges to edges through nodes and (ii) from edges to edges through triangles, i.e., capturing how information from an edge propagates through the surrounding nodes and faces.
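To make the two message passing routes concrete, the following sketch builds the incidence matrices $B_1$ and $B_2$ for a toy complex (a filled triangle with one pendant edge) and assembles the unnormalized Hodge 1-Laplacian $B_1^\top B_1 + B_2 B_2^\top$. The toy complex, the orientation convention (edges oriented from the lower to the higher node index), and all function names are assumptions made for illustration only.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def incidence_matrices(num_nodes, edges, triangles):
    """Node-to-edge (B1) and edge-to-triangle (B2) incidence matrices.

    edges: list of (u, v) with u < v; triangles: list of (i, j, k), i < j < k.
    """
    edge_index = {e: idx for idx, e in enumerate(edges)}
    b1 = [[0] * len(edges) for _ in range(num_nodes)]
    for col, (u, v) in enumerate(edges):
        b1[u][col] = -1  # edge leaves its tail node
        b1[v][col] = 1   # edge enters its head node
    b2 = [[0] * len(triangles) for _ in range(len(edges))]
    for col, (i, j, k) in enumerate(triangles):
        # Boundary of (i, j, k) is (j, k) - (i, k) + (i, j).
        b2[edge_index[(i, j)]][col] = 1
        b2[edge_index[(j, k)]][col] = 1
        b2[edge_index[(i, k)]][col] = -1
    return b1, b2

def hodge_1_laplacian(b1, b2):
    down = matmul(transpose(b1), b1)  # edges coupled through shared nodes
    up = matmul(b2, transpose(b2))    # edges coupled through shared triangles
    return [[d + u for d, u in zip(drow, urow)] for drow, urow in zip(down, up)]

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
triangles = [(0, 1, 2)]
B1, B2 = incidence_matrices(4, edges, triangles)
L1 = hodge_1_laplacian(B1, B2)
L0 = matmul(B1, transpose(B1))  # standard graph Laplacian
```

In this example the three edges of the filled triangle decouple from one another in `L1` because their down and up contributions cancel, while the pendant edge (2, 3) remains coupled to its neighbours only through node 2.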
The power distribution network is a real-world structure with complex interdependencies among its key constituents, such as the power flow through branches and the load (or generation) at the buses. Hence, the state of the power grid network can be described by the properties of buses (node features), the flow through branches (edge features), and the network topology. Encompassing all of these within the same framework is pertinent for realistic decision making, as opposed to conceptual graphs where one category of features is more relevant than the others. Hence, we use a Hodge 1-Laplacian that considers both the electrical and topological properties of the distribution grid. From a deep learning perspective, all learned higher-order simplices embeddings at different levels (layers) are combined into a primary higher-order simplices descriptor. That is, we concatenate the outputs of the $L$ higher-order simplices convolution modules along the column dimension, using the concatenation operation $\oplus$, to obtain $Z_H$.
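The column-wise concatenation of layer outputs can be sketched as follows. The per-layer update used here, $\mathrm{ReLU}(L_1 Z W)$, is a generic stand-in and an explicit assumption: it omits the element-wise max operator and Batch Normalization of the HoSC module and is only meant to show how the $L$ outputs are stacked into one descriptor. The matrices `L1`, `X`, and `W` are hypothetical toy values.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def relu(a):
    return [[max(0.0, x) for x in row] for row in a]

def concat_columns(blocks):
    """Concatenate matrices with equal row counts along the column dimension."""
    return [sum((block[i] for block in blocks), []) for i in range(len(blocks[0]))]

def stacked_descriptor(laplacian, features, weights):
    """Run one generic simplicial convolution per weight matrix and stack the outputs."""
    outputs, z = [], features
    for w in weights:
        # Assumed stand-in for the HoSC update: ReLU(L1 Z W).
        z = relu(matmul(matmul(laplacian, z), w))
        outputs.append(z)
    return concat_columns(outputs)

# Toy Hodge 1-Laplacian on 4 edges and 2 input features per edge.
L1 = [[3, 0, 0, 0], [0, 3, 0, -1], [0, 0, 3, -1], [0, -1, -1, 2]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
W = [
    [[0.1, -0.2, 0.3], [0.0, 0.1, 0.2]],      # layer 1: 2 -> 3 channels
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]],    # layer 2: 3 -> 2 channels
]
Z_H = stacked_descriptor(L1, X, W)  # 4 edges x (3 + 2) columns
```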

C. Topological Signatures Representation Learning
In our experiments, we use a convolutional neural network (CNN) based model to learn the topological features of a persistence image. Given the persistence image of resolution $p$, i.e., $PI_i \in \mathbb{R}^{p \times p}$ (where $i \in \{1, \ldots, \xi\}$ and $\xi$ represents the number of different types of filtrations), we employ a CNN based model and global max pooling to obtain the image-level local topological feature $Z_{T_i}$ as

$$Z_{T_i} = f_{GMP}(f_{\theta_i}(PI_i)),$$

where $f_{GMP}$ is the global max pooling, $f_{\theta_i}$ is a CNN based neural network with parameter set $\theta_i$, and $Z_{T_i} \in \mathbb{R}^{d_c}$ is the output for the $i$-th persistence image $PI_i$. Hence, we obtain $\xi$ topological representations $Z_{T_1}, \ldots, Z_{T_\xi}$ of a graph, which enables us to capture graph properties along various geometric dimensions.
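As a minimal sketch of the composition of a CNN with global max pooling, the code below applies a single convolutional layer with fixed $3 \times 3$ kernels, a ReLU, and global max pooling, so that each kernel contributes one entry of the embedding. The single-layer architecture, the hand-picked kernels, and the function names are assumptions for illustration; the model in the experiments is a trained 2-layer CNN.

```python
def conv2d_valid(image, kernel):
    """2-D cross-correlation of a square image with a square kernel, no padding."""
    k = len(kernel)
    out = len(image) - k + 1
    return [[sum(kernel[a][b] * image[r + a][c + b]
                 for a in range(k) for b in range(k))
             for c in range(out)] for r in range(out)]

def image_embedding(image, kernels):
    """One embedding entry per kernel: conv -> ReLU -> global max pooling."""
    embedding = []
    for kernel in kernels:
        feature_map = conv2d_valid(image, kernel)
        activated = [max(0.0, v) for row in feature_map for v in row]  # ReLU
        embedding.append(max(activated))  # global max over the whole map
    return embedding

# Toy 4 x 4 "persistence image" with a single bright pixel.
image = [[0.0] * 4 for _ in range(4)]
image[1][1] = 1.0

kernels = [
    [[1.0] * 3 for _ in range(3)],                        # local sum
    [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]],  # scaled centre pixel
    [[-1.0] * 3 for _ in range(3)],                       # negative response only
]
z_t = image_embedding(image, kernels)
```

Because global max pooling discards spatial position, the embedding length equals the number of output channels regardless of the image resolution $p$.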

D. Attention Mechanism
To adaptively learn the intrinsic dependencies among different higher-order simplices and topological representations, we utilize the attention mechanism to focus on the importance of task-relevant parts of the learned representations for decision making, i.e., $(\alpha_H, \alpha_{T_1}, \ldots, \alpha_{T_\xi}) = \mathrm{Att}(Z_H, Z_{T_1}, \ldots, Z_{T_\xi})$. In practice, the attention coefficients are computed using $\Upsilon_{Att} \in \mathbb{R}^{1 \times d_{out}}$, a linear transformation, and $\Xi$, a trainable weight matrix, with the softmax function used to normalize the attention vector. Then, we obtain the final embedding $Z$ by combining all embeddings. Lastly, we feed the final embedding $Z$ into a multilayer perceptron (MLP) layer and use a differentiable classifier (here, a softmax layer) to perform graph classification.
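One common way to realize such an attention step is sketched below. The scoring function $\Upsilon_{Att} \tanh(\Xi z)$, the weighted-sum fusion, and the premise that all embeddings have already been brought to a common dimension are assumptions in the style of multi-channel attention models, not a statement of the exact rule used by HOT-Nets. The names `attention_fuse`, `xi`, and `upsilon` are hypothetical.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(embeddings, xi, upsilon):
    """Score each embedding, normalize with softmax, and fuse by weighted sum.

    embeddings: list of vectors of length d (assumed common dimension)
    xi:         d_out x d weight matrix (assumed form)
    upsilon:    vector of length d_out (assumed form)
    """
    scores = []
    for z in embeddings:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, z))) for row in xi]
        scores.append(sum(u * h for u, h in zip(upsilon, hidden)))
    alphas = softmax(scores)
    dim = len(embeddings[0])
    fused = [sum(a * z[j] for a, z in zip(alphas, embeddings)) for j in range(dim)]
    return alphas, fused

# One simplicial embedding and two topological embeddings (toy values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
xi = [[0.5, -0.5], [0.2, 0.1]]
upsilon = [1.0, 1.0]
alphas, Z = attention_fuse(embeddings, xi, upsilon)
```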

A. Test Networks
The HOT-Nets model is validated on three different distribution test networks, namely the IEEE 37-bus, IEEE 123-bus, and 342-bus low voltage networks [53]. The IEEE 37-bus distribution network is a small-size, three-phase, unbalanced, delta-configured medium voltage network rated at 4.8 kV. The 123-bus network is also a three-phase, unbalanced system; it operates at 4.16 kV and has multiple shunt capacitors and voltage regulators. The 342-bus low voltage network (LVN), on the other hand, is a moderate-size urban network with a meshed structure found in North America, unlike the traditional radial feeders. The total load on the 342-bus LVN is approximately 50 MVA and is supplied by a 230 kV substation. The power flow and the state of the test networks under varying scenarios are evaluated using the OpenDSS simulation software [54].

B. Dataset Generation
Due to the lack of available data comprising signal measurements for different outage scenarios, a synthetic approach is used to train the HOT-Nets model. This includes a scenario generation method and an evaluation of network behavior obtained by implementing the scenarios on the equivalent DN model in the OpenDSS simulation tool. The network state in each scenario includes measurements such as bus voltage, power supplied at the buses, and branch flows through the lines or transformers. These measurements are extracted only for those buses or lines which have a sensor installed, so as to emulate a practical distribution network. The locations of buses and lines with sensors are assumed in our study, whereas this information is readily available in real-world DNs.
The scenario generation method accounts for variations in line failures and load profiles. It also encompasses scenarios with normal operating conditions but with load variations, so that the model can differentiate an outage from normal operation. The degradation of the distribution system due to component failures can be approximated by randomly disconnecting components from the network [55]. A random edge or component removal approach can be used to obtain a generalized model that is independent of the type of disturbance or the specifics of the weather-related event causing the outage in the feeder. The nature of outages in the distribution network, however, is often localized with a cascading effect. To emulate the degradation behavior of a real-world distribution feeder, we adopt a subgraph-based random edge removal as in [56]. For the graph model representing the DN, considering the localized effect of contingency events, a subgraph of radius $r_s$ is initiated around a node $u_i$ selected at random. Following this, a fraction $f_s$ of the edges is randomly removed within this subgraph. Since the distribution network tends to be geographically constrained, the probability of failure of the edges due to extreme events can be drawn from a uniform random distribution. The radius of the subgraph $r_s$, the fraction of edge failures $f_s$, and the central node $u_i$ are varied to obtain multiple scenarios. Additionally, annual historical load data with an hourly resolution is used to account for the variation in load in the DN. The time of outage for scenarios with line failures is drawn randomly from a uniform distribution, and the corresponding load profile is used in each scenario. In situations where outage data for weather-related events are available, contingency scenarios could also be generated by simulating edge failures drawn from probability distributions derived from the historical data. However, such a Bayesian inference approach is beyond the scope of this work. Equivalent graphs of the distribution feeders are constructed by representing the substation, distribution transformers, and other interconnecting buses as the nodes. The distribution lines and primary transformers in the network are considered as the edges.
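The subgraph-based random edge removal described above can be sketched as follows. This is a simplified illustration rather than the scenario generator used in the study: the subgraph is taken to be all nodes within `radius` hops of a uniformly chosen centre, candidate edges are those with both endpoints inside it, and the number of removed edges is the rounded fraction of candidates. All function and parameter names are hypothetical.

```python
import random
from collections import deque

def ball(adjacency, center, radius):
    """Nodes within `radius` hops of `center`, found by breadth-first search."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the requested radius
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def sample_line_failures(edges, radius, fraction, rng):
    """Pick a random centre node and fail a fraction of the edges around it."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    center = rng.choice(sorted(adjacency))
    nodes = ball(adjacency, center, radius)
    # Candidate edges lie entirely inside the local subgraph.
    candidates = [e for e in edges if e[0] in nodes and e[1] in nodes]
    failed = rng.sample(candidates, round(fraction * len(candidates)))
    remaining = [e for e in edges if e not in failed]
    return center, failed, remaining

# Toy radial feeder: a path of 10 buses.
edges = [(i, i + 1) for i in range(9)]
center, failed, remaining = sample_line_failures(
    edges, radius=2, fraction=0.5, rng=random.Random(0))
```

Varying `radius`, `fraction`, and the random seed yields multiple localized contingency scenarios, each of which would then be passed to a power flow solver to obtain the resulting network state.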
Although the line failures are known while generating scenarios for training, in real-world network operation this information is not known a priori. Rather, the purpose of our model is to detect the presence of such a failure. Relying on the knowledge of the fully functional network topology and the varying node and edge attributes, the outage detection model determines whether edges (lines) have failed, and thus detects the topology change due to line failures. Therefore, the graph scenarios are dynamic in that the signals associated with nodes and edges are varying. Note that our topological features (e.g., persistence images) are extracted by using various filtration functions (on line features and bus features), and thus the topological signature varies with the nodes and edges in a dynamic system.
We validate the effectiveness of our model using two separate case studies, resulting in two different datasets. In the first case, we assume that the actual power supplied at all the buses can be measured by the sensors. Hence, the active power supplied and the demand estimate are considered as the descriptive node features. The edges, however, do not have sensors, and hence different properties (such as resistance, reactance, phase interconnections, maximum capacity rating, base load, and residual capacity) are used as representative edge features.
The following are the properties used to describe an edge $(u, v)$:
• The phase interconnection.
• The resistance $R_{(u,v)}$ is the equivalent line resistance evaluated from the three-phase matrix of each DN branch.
A statistical overview of the dataset for all three networks in the first case study is given in Table I. For the second case, we assume partial observability in the network, with sensors placed at 45-50% of the buses and lines in the network. Here the node features include the three-phase voltage, the active power supplied, and the demand estimate at the buses. Of these, the former two measurements are available in real time only at the nodes with sensors. On the other hand, the demand estimate, i.e., the demand forecasted at the load buses, is a pseudo measurement. The edge features include resistance, reactance, maximum capacity, residual capacity at base load, and power flow through the branch. Except for the branch flow, these are properties available to the distribution system operator for all edges. The branch flow, on the other hand, is measured continuously only at those edges with line sensors and varies with outages. In the synthetic approach that we adopt, the sensor measurements are acquired from the power flow simulator. Table II summarizes the statistics of the dataset for all three networks considered in the study with partial observability.
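The partial observability setting can be emulated with a simple masking step, sketched below. How unobserved real-time measurements are encoded is not specified here, so zero-filling is an assumption, as are the scalar (rather than three-phase) voltage, the feature names, and the function name `mask_unobserved`.

```python
import random

def mask_unobserved(node_features, sensor_nodes,
                    realtime_keys=("voltage", "power_supplied")):
    """Hide real-time measurements at buses without sensors.

    The demand estimate is a pseudo measurement and is kept for every bus.
    Assumption: missing real-time values are encoded as 0.0.
    """
    masked = []
    for node, feats in enumerate(node_features):
        row = dict(feats)  # copy so the simulator output stays untouched
        if node not in sensor_nodes:
            for key in realtime_keys:
                row[key] = 0.0
        masked.append(row)
    return masked

rng = random.Random(1)
num_buses = 10
# Randomly pre-allocate sensors at 50% of the buses.
sensor_nodes = set(rng.sample(range(num_buses), k=num_buses // 2))
features = [{"voltage": 1.0, "power_supplied": 5.0, "demand_estimate": 4.0}
            for _ in range(num_buses)]
observed = mask_unobserved(features, sensor_nodes)
```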
A graphical representation of the 342-bus LVN used in the study is shown in Fig. 3. A representative failure scenario in the 342-bus LVN is illustrated in Fig. 4. Similarly, the IEEE 37-bus network, along with a contingency event induced on the network, is represented in Fig. 5.

C. Baselines
We compare our proposed HOT-Nets with 11 state-of-the-art (SOA) models, including (i) Random Forest (RF) [57], which is a combination of a series of tree-structured classifiers; (ii) Artificial Neural Networks (ANN), which is a feedforward multilayer perceptron (MLP) and can capture both linear and nonlinear features; (iii) GCN, which learns node representations by aggregating representations from neighbors; (iv) GAT [58], which extends the graph convolutional operations in GCN with masked self-attentional layers; (v) Graph Isomorphism Network (GIN) [59], whose representation power is well-matched with the Weisfeiler-Lehman test for graph isomorphism; (vi) GraphSage [42], a graph convolutional network framework that proposes different types of aggregator functions; (vii) Set2Set [60], a graph convolutional network framework that replaces global mean pooling with a global pooling operation through Long Short-Term Memory (LSTM) networks; (viii) DiffPool [45], a graph convolutional network model designed for graph-level representation learning with differentiable pooling layers which coarsen the graph in a hierarchical manner; (ix) EigenGCN [46], a graph convolutional network that deploys a pooling operator based on the graph Fourier transform; (x) Adaptive Multi-channel Graph Convolutional Networks (AM-GCN) [44], a model that learns the node embedding based on both node features and topological structures with an attention mechanism; and (xi) SNNs [37], a convolutional neural architecture for data supported on simplicial complexes.

D. Experimental Setups
All the SOAs, including our HOT-Nets, are implemented in Python with PyTorch 1.8.0 and executed on a server with one NVIDIA RTX 3090 GPU card. We optimize all the models with the Adam optimizer for a maximum of 50 epochs. We conduct our experiments using 10-fold cross-validation and report the average accuracy. The learning rate is searched in {0.1, 0.01, 0.001, 1e−4, 1e−5}, the number of layers of the higher-order simplices convolution module in $L \in \{1, 2, \ldots, 5\}$, and the hidden layer dimension in $nhid_{HoSC} \in \{8, 16, 32, 64\}$; the dropout rate is 0. For PI representation learning (i.e., PIs from 3 different filtrations), we simultaneously train three 2-layer CNN based models with the same hidden layer dimension $nhid_{CNN_1} \in \{8, 16, 32\}$ and the same output dimension $nhid_{CNN_2} \in \{16, 32, 64\}$. Energy is said to be unserved at a node if the estimated load demand at that node is not met. The total energy unserved for the network is the sum of the energy unserved at all the nodes, and a nonzero total indicates an outage in the network; conversely, there is no outage if there is no unserved energy. Hence, we label each network scenario accordingly in the outage detection task, resulting in a graph classification problem with 2 classes. In our experiments, we consider $\xi = 3$ different filtrations (i.e., degree-based, betweenness-based, and closeness-based filtrations), setting the grid size of the PIs to 50 × 50, and note that the proposed topological signatures representation learning module can learn from an arbitrary number of filtrations on the graph. The source code is available at https://github.com/hotnets/HOT-Nets.git.
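The labeling rule based on unserved energy can be written down in a few lines. The sketch below assumes that the unserved energy at a node is the positive part of demand minus supply and that a small tolerance `tol` absorbs numerical noise from the power flow solver; both choices, and the function names, are illustrative assumptions.

```python
def total_energy_unserved(demand, supplied):
    """Sum over nodes of the demand that is not met (never negative)."""
    return sum(max(d - s, 0.0) for d, s in zip(demand, supplied))

def outage_label(demand, supplied, tol=1e-6):
    """Binary graph label: 1 if any energy is unserved (outage), else 0."""
    return 1 if total_energy_unserved(demand, supplied) > tol else 0

demand = [10.0, 5.0, 8.0]
normal = outage_label(demand, [10.0, 5.0, 8.0])   # every load is met
outage = outage_label(demand, [10.0, 0.0, 8.0])   # the second bus is isolated
```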

E. Graph Classification
The evaluation results are summarized in Table III. In particular, on all three networks, the improvement gain of HOT-Nets over the runners-up ranges from 2.88% to 7.63%. In terms of baseline methods, GCN only takes information from 1-hop neighbors into consideration and is not capable of learning topological representations of a graph. GraphSage learns a principle of aggregation to extend GCN into the inductive setting and shows stable improvement over GCN. Besides, Set2Set, DiffPool, EigenGCN, and AM-GCN are SOA GCN-based models (including global pooling, hierarchical pooling, and attention mechanisms implemented over a graph network architecture), and SNNs is a simplicial complex-based model (i.e., an extension of the GCN architecture to process data supported on simplicial complexes). A common limitation of these SOA methods is that they are incapable of incorporating both higher-order features and multi-scale local topological structures. Therefore, it is natural that HOT-Nets shows much better performance against these SOA baselines.
The results on the case study with partially observable test networks (i.e., IEEE 37 Bus, IEEE 123 Bus, and 342 Bus LVN) are summarized in Table IV. For IEEE 37 Bus, we observe that our HOT-Nets yields significant relative gains of 2.37% on average compared to all 9 baselines. Specifically, HOT-Nets surpasses the performance of GNN-based models by a notable margin of up to 5.36% and outperforms the simplicial complex-based model (SNNs) by 3.09%. Furthermore, we have also considered that real-time measurements are acquired from the node and line sensors placed in the network as in [5]. We have conducted another experiment for the 123-bus network (denoted by IEEE 123 Bus) with 30% sensor density, which is also used in [5]. The results of this experiment are presented in Table V. We observe that our HOT-Nets clearly outperforms all baselines, leading to a relative gain of 3.57% compared to the runner-up (i.e., SNNs). As with the case study for 50% sensor density, here the sensors have been randomly pre-allocated at specific buses and lines. The node features and edge features remain the same as in the previous studies. The voltage, the power supplied at buses, and the power flow through branches are available in real time only for those nodes and edges with sensors.

F. Ablation Study
To better evaluate the contributions of the Hodge Laplacian representation and of persistence images from multiple filtrations, we conduct an ablation study. From Table VII, the results show that all the components are indispensable. Specifically, the result for NT-Nets (i.e., without using the normalized Hodge 1-Laplacian) indicates the importance of higher-order structure information for contingency classification of distribution networks. Furthermore, the result for HOT-Nets W/o Topo. (i.e., without any PI for topological signatures representation learning) shows the advantage of leveraging topological summaries to capture hidden structural and local topological information in distribution networks.
Our experiments demonstrate that HOT-Nets, if employed in power distribution networks, can detect outages caused by single and multi-component failures with higher accuracy than SOA graph and simplicial based learning techniques. This provides the much-needed situational awareness for the network under contingency, thereby enabling distribution system operators (DSOs) to take suitable actions to mitigate outages. Our model can also be used to assess the risk associated with potential failure scenarios, since it predicts the occurrence of an outage. In cases where the component failures do not result in load disruptions, the DSO can simply ensure that the faulty area is isolated and continue to monitor the power balance in the functioning network component. However, if the failure event results in a power outage, immediate actions following isolation, such as the dispatch of energy resources and repair crews and other control and switching actions, may be required.

G. Computational Complexity
For higher-order simplices, the incidence matrices $B_1$ and $B_2$ can be calculated efficiently with computational complexity $O(N + M)$ and $O(M + Q)$, respectively, where $N$ is the number of 0-simplices (i.e., nodes), $M$ is the number of 1-simplices (i.e., edges), and $Q$ is the number of 2-simplices (i.e., filled triangles). Persistent homology can be calculated efficiently for dimensions 0 and 1 (i.e., with a worst-case complexity of $O(M \eta(M))$ for $M$ sorted edges, where $\eta(\cdot)$ denotes the extremely slow-growing inverse Ackermann function), and the complexity of calculating persistent features from a filtration is dominated by the complexity of sorting all edges, i.e., $O(M \log M)$.
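The complexity argument for dimension 0 can be seen in a short union-find sketch: edges are sorted once, and each edge then triggers at most one near-constant-time merge. The sketch assumes a sublevel filtration in which an edge enters at the larger of its two endpoint values and applies the elder rule at each merge; the node values in the example are arbitrary rather than degree, betweenness, or closeness scores, and the function name is hypothetical.

```python
def zero_dim_persistence(node_values, edges):
    """0-dimensional persistence pairs of a sublevel filtration on a graph.

    Returns (finite (birth, death) pairs, births of components that never die).
    """
    n = len(node_values)
    parent = list(range(n))
    size = [1] * n
    birth = list(node_values)  # birth[root] = earliest birth in its component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def entry(edge):
        # An edge appears once both of its endpoints are present.
        return max(node_values[edge[0]], node_values[edge[1]])

    pairs = []
    for u, v in sorted(edges, key=entry):  # sorting dominates: O(M log M)
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge closes a cycle and merges nothing
        # Elder rule: the younger component dies when the edge enters.
        pairs.append((max(birth[ru], birth[rv]), entry((u, v))))
        if size[ru] < size[rv]:  # union by size
            ru, rv = rv, ru
        parent[rv] = ru
        size[ru] += size[rv]
        birth[ru] = min(birth[ru], birth[rv])
    essential = sorted(birth[r] for r in {find(i) for i in range(n)})
    return pairs, essential

# Path graph 0-1-2-3 with arbitrary filtration values on the nodes.
values = [0, 2, 1, 3]
pairs, essential = zero_dim_persistence(values, [(0, 1), (1, 2), (2, 3)])
nontrivial = [p for p in pairs if p[1] > p[0]]
```

Pairs with equal birth and death carry no persistence and would lie on the diagonal of the persistence diagram, so only `nontrivial` and `essential` contribute visibly to a persistence image.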

H. Application to Practical Distribution Networks
Real-world distribution networks are widespread, with hundreds of thousands of nodes. Hence, sensor placement at each line or bus is impractical, considering the installation and maintenance cost as well as the communication bottleneck. Therefore, sparse signals collected from a few sensors must be used to accurately detect the network status in real time. Learning based methods can be used to address these issues. Most traditional learning techniques, however, are solely data-driven and do not account for the topological interdependence of the complex network. Factoring in topology and learning from both the global and local structural information is pertinent to deriving accurate conclusions about the system state. This is evident from the performance of HOT-Nets and the ablation study. Such a trained model could be deployed in distribution networks where sparse signal measurements are continuously used to monitor the network status.

V. CONCLUSION
To enable the resilient operation of power distribution systems, we have proposed a new graph learning model that leverages the utility of persistent homology on graphs and extends the convolutional operation to simplicial complexes on the distribution network using Hodge-Laplacian analytics. Integrating persistent homology into learning on distribution networks allows us to extract the most characteristic topological descriptors of the distribution grid, while the Hodge-Laplacian analytics account for complex interactions among the higher-order substructures of the grid. Compared to the state-of-the-art learning methods, the new HOT-Nets model has been shown to deliver highly competitive performance in resilience analysis of power distribution networks by predicting the outage status of a network under varying operating conditions. In the future, we plan to exploit these concepts to also determine the locations of outages and estimate the scale of disruption (energy not served). Additionally, we plan to explore the advantages and limitations of HOT-Nets and, more generally, simplicial neural networks with a fully trainable topological layer for optimal design problems in general cyber-physical systems.

ACKNOWLEDGMENT
The United States Government has a royalty-free license throughout the world in all copyrightable material contained herein. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Office of Naval Research.
The article has been awarded the 2022 Best Student Paper Award from the Section for Statistics in Defense and National Security (SDNS) of the American Statistical Association (ASA).

Fig. 2. Example of generating simplicial complexes from a synthetic power network, where each node represents a bus. (a) $B_1$ is the node-to-edge incidence matrix; (b) $B_2$ is the edge-to-face incidence matrix; (c) the graph structure of the synthetic power network; (d) the simplicial complexes of the synthetic power network.
A linear map $\partial_k : C_k \to C_{k-1}$ is called a boundary operator. The adjoint of the boundary map induces the co-boundary operator $\partial_k^\top : C_k \to C_{k+1}$. Matrix representations of $\partial_k$ and $\partial_k^\top$ are $B_k$ and $B_k^\top$, respectively. In Fig. 2, we illustrate an example of generating simplicial complexes from a synthetic power network.

• The reactance $X_{(u,v)}$ is the equivalent line reactance evaluated from the three-phase matrix of each DN branch.
• The base load capacity $I^{B}_{(u,v)}$ is the flow through the distribution line or transformer for the base load condition without contingency.
• The maximum capacity $I^{max}_{(u,v)}$ is the maximum permissible flow through the line or transformer.
• The residual capacity $I^{res}$

Fig. 4. Illustration of a contingency event on the 342-bus low voltage network. Nodes and edges are color coded. Gray nodes are buses isolated by the network failure, black edges are functioning lines or transformers, and the failed components or edges are represented in red.

Fig. 5. Graphical representation of the IEEE 37-bus distribution network and an illustration of a contingency event on the network. The source node is represented in green and marked 'S', the blue nodes indicate load buses, and the black nodes are interconnecting buses. The failed components or edges are marked in red, and gray nodes are buses isolated by the failure.
Yuzhou Chen, Roshni Anna Jacob, Graduate Student Member, IEEE, Yulia R. Gel, Jie Zhang, Senior Member, IEEE, and H. Vincent Poor, Life Fellow, IEEE

TABLE I SUMMARY OF DATASETS USED IN GRAPH CLASSIFICATION TASK WITH FULL OBSERVABILITY AT BUSES
TABLE II SUMMARY OF DATASETS USED IN GRAPH CLASSIFICATION TASK WITH PARTIALLY OBSERVABLE BUSES AND LINES
Fig. 3. Graphical representation of the 342-bus low voltage North American distribution network. The source node is marked 'S' and represented in green, blue nodes are buses with loads, and black nodes are the interconnecting buses.

TABLE III OVERALL CLASSIFICATION PERFORMANCE (%) (± STANDARD DEVIATION) OF DIFFERENT METHODS ON TEST NETWORKS FOR A CASE WITH ALL BUSES OBSERVABLE. *** DENOTES THE HIGHLY STATISTICALLY SIGNIFICANT RESULT

TABLE IV CLASSIFICATION PERFORMANCE (%) (± STANDARD DEVIATION) OF DIFFERENT METHODS ON IEEE 37 BUS, IEEE 123 BUS, AND 342 BUS LVN (I.E., NEW TEST CASE ON NETWORKS) WITH PARTIALLY OBSERVABLE DISTRIBUTION GRIDS. *** DENOTES THE HIGHLY STATISTICALLY SIGNIFICANT RESULT
TABLE V OVERALL CLASSIFICATION PERFORMANCE (%) (± STANDARD DEVIATION) OF DIFFERENT METHODS ON IEEE 123 BUS WITH 30% SENSOR DENSITY. *** DENOTES THE HIGHLY STATISTICALLY SIGNIFICANT RESULT
complex-based model (SNNs) by 3.09%. For IEEE 123 Bus, we find that our HOT-Nets model significantly outperforms all baselines, i.e., p-value < 0.01 by t-test. Specifically, HOT-Nets can improve upon the runner-up of attention-based GNNs (i.

TABLE VII ABLATION STUDY OF THE NETWORK ARCHITECTURE ON IEEE 123-BUS. * DENOTES THE SIGNIFICANT RESULT
PIs into the representation learning of topological information, in conjunction with the classification of distribution grid status. Furthermore, we find that combining the simplicial convolutional layer and the fully trainable topological layer results in more informative learning of the underlying graph structure, yielding more competitive classification performance (where HOT-Nets achieves relative gains of up to 7.83% and 12.99% over HOT-Nets W/o Topo. and HOT-Nets W/o HoSC (NT-Nets), respectively). Similarly, to demonstrate the effectiveness of our HOT-Nets model on the IEEE 123 Bus dataset, we have added the ablation study.