Graph Neural Networks for Voltage Stability Margins With Topology Flexibilities

High penetration of distributed energy resources (DERs) changes the flows in power grids causing thermal congestions which are managed by real-time corrective topology switching. It is crucial to consider voltage stability margin (VSM) as a constraint when modifying grid topology. However, it is nontrivial to exhaustively search using AC power flow (ACPF) for all control actions with desired VSM. Sensitivity methods are used to solve this issue of “power flow-free VSM estimation” to screen candidate control actions. However, due to the volatile nature of DERs, sensitivity methods do not perform well near nonlinear operating regions which is overcome by solving ACPF. Here, we propose a new VSM estimation method that performs well at nonlinear operating regions without solving ACPF. We achieve this by formulating the learning of graph neural networks like the matrix-free power flow algorithms. We empirically demonstrate how this similarity bypasses the inaccuracy issues and performs well on unseen operating conditions and topologies without further re-training. The effectiveness is demonstrated on a power network with realistic load and generation profiles, various generation mix, and large control actions. The benefits are showcased in terms of speed, reliability to identify insecure controls, and adaptability to unseen scenarios and grid topologies.


I. INTRODUCTION
T HE high penetration of renewables, climate change, and extreme events are causing the power system to operate close to its operation boundary which results in congestion. For example, climate change has pushed up the average temperature every year, and 20 to 40% of the global population have experienced over 1.5 deg warming in at least one season [1] causing the steady increase in power demand. From the transmission/independent system operator's (TSO/ISO) point of view, to handle the congestions in power systems, it is shown that they have an interest to use existing grid infrastructure to perform corrective topology switching [2], [3], [4] before relying on other costlier strategies such as load shedding, reinforcing the grid, renewable energy curtailment, etc.
However, when the network topology is modified through topology control actions, the maximum power transfer capability of the grid also changes since topologically it is not the same grid anymore [5]. Thus, it is important to know if the selected corrective topology control action will not only manage the thermal rating violations on the grid but also ensure that it will not trigger a power grid blackout due to voltage instability [6]. The distance (margin) to the voltage collapse point from the current operating condition is called voltage stability margin (VSM). Therefore, there is a need for methods that can exhaustively search for candidate topology control actions that satisfy the desired VSM threshold in realtime since corrective topology switching for thermal congestion management is a real-time application [6], [7], [8], [9].
One approach to estimate the VSM is by using traditional power flow solvers [10], [11], [12], [13], [14], [15], and continuation power flow is popular method used for this VSM estimation [16]. However, these traditional power flow solver-based methods are too slow [17] for real-time applications and assumes how the system loads may increase to compute the margin [17]. To address the assumption on load increase direction, boundary tracing methods are proposed which exhaustively trace the entire stability boundary for various load increase directions [18]. However, this approach is still slow for real-time control applications due to its exhaustive search properties. To solve the issues of exhaustivity and speed, real-time measurement based methods are proposed to estimate VSM [19], [20], [21], [22]. Even though [19], [20], [21], [22] solves these issues, they require real-time voltage measurements which are not available throughout the grid. Thus, there is a need for power flow-free and measurement-free estimation of VSM when developing real-time topology switching applications. To address the problem of power flow-free estimation of VSM, for the first time, [23] proposed a sensitivity analysisbased real-time power flow-free estimation of VSM by using the Jacobian at voltage collapse point. This is a desired method as it is adaptable to volatility and flexible to topology changes in the power grid. This work in [23] was extended to real-time topology switching applications by [7], [8], and [9]. However, [7], [8], [9] suffer from the linearity assumption of power grid physics since the original method in [23] assumes the same.
Fortunately, the historical data is becoming readily available due to continuous investment of sensor systems and advances in machine learning (ML). Researchers have used this data to propose ML based algorithms to quickly estimate the VSM accurately without linearity assumption. [24], [25], [26] proposed a decision tree, reactive power reserve (RPR), and absolute shrinkage based approaches for calculating VSM. Even though [24], [25], [26] are faster and do not have linearity assumption like in [7], [8], [9], and [23], they still suffer from being adaptable to volatile operating conditions introduced due to intermittent energy resources. Solving the challenge of predicting VSM in a volatile environment, [27], [28], [29] used multi-layer deep neural networks (DNNs), convolutional neural networks to estimate the VSM. But they are not flexible to accommodate topology changes in the power grid which occur in the case of corrective topology switching actions. To adapt to grids with flexible topology, [30], [31], [32] were proposed. Even though [30], [31], [32] solved the issue of adapting to topology changes and volatile grid operating conditions, they have two fundamental issues. 1) These methods improve the existing model through an iterative re-training process which is undesirable in an online application where speed is a concern, and 2) lack of data is also a major hindrance (datasets containing all combinations of topology and loading conditions is impossible to generate due to the combinatorial nature of power grids). Power system researchers have recently gained more interest in the application of special type of convolutional neural networks known as graph convolutional neural networks (GCNNs) to the power system problems. For example, GCNNs are employed in research areas such as (but not limited to) transformer fault diagnosis [33], [34], [35], fault localization [36], fault detection and isolation [37], solar power prediction [38], [39], wind power/speed prediction [40], power flow approximation [41], [42], and optimal power flow [43] Contributions: To fundamentally resolve the issues of the iterative re-training process and lack of data while also ensuring the adaptability to volatile operating conditions and flexibility to topology changes, we propose a graph convolutional neural network (GCNN) based method.
We propose a power system specific graph convolution layer by embedding the information of power system topology into the optimization process when learning the neural network weights. In such a framework, the embedding of topology information leads the proposed method to achieve the following 1) it can solve volatile environments, adapting to unseen topologies and unseen scenarios without any need to re-train unlike [30], [31], [32]; 2) it is a power flowfree VSM estimation method that can predict well when the power grid is heavily loaded where the online sensitivitybased methods like [7], [8], [9], [23] fail due to linearity assumption; 3) it does not require to be re-trained nor requires new data once trained; and 4) to our knowledge, this is the first machine learning-based VSM estimation work that is designed address the problem and to learn both line and bus switching control actions, and introduces to the power system community about how GCNNs are similar to traditional power flow solvers [11]. Additionally, the proposed method may require a wide variety of training data. To solve this issue we used synthetic data developed by industry with careful design to represent real-world information [44] with a variety of generation mix and seasonal variations throughout the year.
In this work, we present the numerical validation in comparison with the existing method in the literature to demonstrate the advantages of the proposed method on a power network design (made publicly available in [45]) with elaborate operating conditions, various types of the generation mix, seasonal variation in operating conditions, and line as well as bus switching control actions. These real-world scenarios are generated using publicly available Chronix2Grid [44] developed by RTE (TSO of the French power grid). We also compared the proposed method with the power flow-free domain-specific method like in [7] and [9] that estimates VSM for different topology control actions, which are currently used by several independent system operators (ISOs) in USA [6]. We show a significant improvement in speed and accuracy in estimating VSM when compared to well-known sensitivity-based methods [7], [9].
The paper is organized as follows: Section II presents the different realistic design settings of the selected power grids. Section III presents the impact of control actions on the VSM. Section V demonstrates how GCNNs resemble traditional power flow solvers. Section IV presents the proposed GCNN architecture. Sections VI showcases the efficacy of the proposed method on various case studies.

II. POWER GRID DESIGN AND PROBLEM DESCRIPTION
In this section, we present various rigorous details of designing the case study in a more realistic setting. It is key to understand that the inclusion of the following details makes the problem not only realistic but also more challenging to solve. To our knowledge, such a design is not considered in the existing machine learning-based VSM estimation literature. Here, we present a carefully designed IEEE-14 bus system with time varying load and generation profiles (5-minute resolution), various types of generation mix, thermal limit enforced transmission lines, and line switching as well as bus switching topology reconfiguration actions.

A. OPERATING CONDITION DESIGN OF THE GRID
The power grid presented in Fig. 1 is injected with production and consumption profiles, which are generated by using the real-world profiles' statistics [44]. The generations are indicated by the green circles and the consumptions are indicated by the yellow circles. Each vertex (blue circle) represents a substation. Each substation in this grid contains two busbars. These busbars in each substation node are named bus 1 and bus 2 as shown in the legend of Fig. 1. The generators 0, 1, 2, 3, 4, 5 in Fig. 1 are nuclear, thermal, wind, solar, solar, and hydro respectively.

B. LINE THERMAL OPERATING LIMITS DESIGN
The transmission lines in this network are provided with thermal limits and when a transmission line is overloaded, it will disconnect and becomes out-of-service. The % value for each transmission line in Fig. 1 represents the percentage of loading for that corresponding operating condition.

C. MAPPING OF TOPOLOGY ACTIONS INTO THE NETWORK
The following approach is implemented to reconfigure the power grid topology which is referred to as ''action(s)'' in the remainder of this paper. The action space of the power grid topology reconfiguration involves two types of actions i.e., ''line switching'' and ''bus splitting/merging'' actions. A line switching action decides whether a transmission line should be in-service or out-of-service. This may change the number of edges in the graph representation of the power grid depending on whether another line parallel to the switched transmission line exits. However, a bus splitting/merging action is a more complex topology reconfiguration and a needed one for planning studies according to NERC system analysis and modeling subcommittee [46]. As shown in Fig. 1, inside each substation (blue circle) there are two bus bars (Bus 1 and Bus 2) to which all the transmission lines (incoming and outgoing), loads (yellow circles), and generations (green circles) are connected. By flexibly connecting the electrical equipment in the substation to either of these bus bars, it results in a new electrical circuit as described in [46]. Thus, a bus splitting/merging action may change the number of electrical nodes in the graph representation of a power grid.

D. TOTAL POSSIBLE TOPOLOGIES
The power grid presented in Fig. 1 has 14 substations, 20 transmission lines, and 16 injections (loads and generations). The total possible line switching actions for a grid with 20 transmission lines is 2 20 . Similarly, the total bus switching actions at a substation with k elements is ≈ 2 k−1 . At a given state/snapshot (unique combination of loading condition and grid topology), Tab. 1 shows the approximate size of the total possible actions available to be selected by the operator.
For a given loading condition, it is nontrivial to solve the continuation power flow (CPF) [16] to estimate VSM in a reasonable computing time for all possible power grid states resulting from the topology actions in Tab. 1. Using CPF, this computational time issue further worsens when identifying the optimal sequence of topology actions over a time horizon. Thus, the proposed method in this work is used to estimate VSM considering all the intricacies of the power grid as described in this section, in a fast manner without needing to solve computationally intensive AC power flows for all control actions set at every unique state.

III. OUTPUT LABEL: CLASSIFICATION OF TRAINING DATASET WITH VOLTAGE STABILITY MARGIN
Before introducing the proposed machine learning framework to estimate the voltage stability margin (VSM i.e., output label of the training data set) under different loading scenarios and topology reconfigurations, VSM needs to be quantified in the input training data set using either CPF [16] or a similar technique that is faster with no approximations such as ''distributed voltage stability index'' (D-VSI) from [21]. It is useful to understand the VSM (i.e., D-VSI) and how the different loading scenarios and topology reconfigurations impact the VSM. In this section, the impact of loading conditions on VSM is demonstrated on the test system.
To calculate the voltage stability margin of the grid for a given topology and loading condition, we use the D-VSI from [21]. It is given by where p , q , and pq are functions of radii and centers of ''power flow circles'' from [21]. In this section, the characteristics of VSM (D-VSI) are demonstrated on the IEEE-14 bus system and the power grid from Fig. 3 with focus on 1) how D-VSI quantifies VSM, and 2) how different loading and topology reconfiguration conditions impact the VSM respectively. First, the IEEE-14 bus system is used to understand how D-VSI from [21] quantifies the VSM. Fig. 2 shows the plot of D-VSI calculated at buses 9, 10, 13, and 14 of the IEEE-14 bus system under uniformly increasing loading conditions (using the continuation parameter (λ) from CPF in [16]). According to the theory in [21], when D-VSI = 1, it represents a no-load condition, and when D-VSI ≈ 0, it represents the voltage collapse point. Thus, D-VSI provides a measure of distance between the current operating condition of the grid and its voltage collapse point. For example, from Fig. 2, when the load multiplying factor (operating condition) λ = 2.2, the corresponding D-VSI value at bus 14 is 0.54, which provides a measure of distance to the voltage collapse point from the current operating condition (λ = 2.2). This D-VSI value changes dynamically for each state/snapshot. However, it always represents the measure of distance to voltage instability phenomenon. An unnormalized D-VSI does not have an upper bound, and in this work, the D-VSI used is the unnormalized version described in (1) since lower bound (grid outage when D-VSI≈0) is of interest to us.

A. DESIGN OF GCNNs FOR VOLTAGE STABILITY MARGIN ESTIMATION
In this subsection, we present the detailed information related to the datasets, inputs, and outputs of the proposed GCNNs. We also present the proposed graph convolution/fixed point equation used for VSM estimation used in this work.

1) DATASETS
The datasets used for the training and testing of the proposed GCNNs are as follows. The candidate substations (subID = {1, 3, 4, 5, 8, 12}) are first selected from the system in Fig. 1 as they are prominently used in corrective topology switching to manage thermal cascade outages as per the industry analysis in [47] and [48]. We have also considered the reactive power limits of the synchronous condensers in this paper when performing topological actions. We also used constant power models in the studies presented in this paper. The line and bus switching actions that topologically impact only one of the substations in subID set are used to generate the data for various loading scenarios. This data is split into training and testing/adaptability datasets using the 70−30 rule which is used to check for performance on unseen loading scenarios. The size of the training dataset used to train the GCNN is 35,742. The size of the testing dataset used to evaluate the trained GCNN model has 11,914 unique graph instances.

b: FLEXIBILITY/TESTING DATASET-2
The trained GCNN model must also perform well on the new topologies that are very different from the topologies the GCNN model is trained upon. Therefore, we generate another set of graphs or data samples but this time, the line and bus switching actions performed on the grid impact two substations from subID set at a time. This dataset has 7,965 unique graph instances.
The goal is that the trained model can learn the generalization rule such that it can predict insecure operating states (based on D-VSI) accurately not only for unseen loading scenarios i.e., adaptability/testing dataset-1, but also for new unseen topologies i.e., flexibility/testing dataset-2.

c: DATA GENERATION PROCESS
Instead of stressing the system like explained in Fig. 2, we generate the datasets as follows.
1) We considered different operating conditions using historical load and generation profile data. Fig. 4 presents the load and generation injection in the grid. These are generated using the real-world profiles' statistics from [44] ( [44] is developed by the French ISO, RTE.). 2) We generated different operating conditions by sampling an operating condition at 2 hour resolution from Fig. 4. 3) Using the two-hour resolution operating condition, we generated the dataset in this paper by applying all actions in the action space (action space described in adaptability/flexibility datasets) for each operating condition. This is illustrated in Fig. 5. It is important to note that we considered the impact of only one topology action at a time for a given operating condition but not the combinations due to the combinatorial nature of the data generation. However, each topology action can have multiple changes in topology (like N-1, N-2). This is because we considered all possible topology actions that impact the critical substations/corridors in the power grid. 4) We used the D-VSI [21] to directly estimate the proximity to voltage collapse for each developed scenario (from above point) since the D-VSI from [21] can be computed for a given scenario of a grid without stressing the system.

B. INPUT FEATURES
The GCNNs take graphs as input, meaning that it uses the adjacency matrix (A ∈ R |V|×|V| ) to understand the connectivity between the nodes in the graph, which also requires both node and edge features. In this work, we do not use edge features since we observed that the depth of the GCNN (K ) and epochs (Q) were sufficient to generate good VSM estimations. The nodes in the input graph are segregated as V = V L ∪ V G , into n purely load buses (PQ buses), denoted by V L and m generator buses (PV buses), denoted by V G . Each node i ∈ V has four features i.e., {P load,i , Q load,i , P gen,i , V setpoint,i }. Therefore, a graph G = (V, E) will have an input feature matrix x ∈ R |V|×4 . We used a realistic setting for experiments. For example, for a PQ bus, only the real and reactive power information is assumed to be known. Similarly, for the PV bus, only the real power generation and voltage set points are assumed to be known. Hence, different combinations of grid loading conditions (node features x) and topologies (adjacency matrix A) generate different unique graph representations (input samples).

C. OUTPUT LABEL
Estimate the D-VSI ∈ [0, 1] (voltage stability margin) value at each node (substation) in the power grid. We scaled the D-VSI values by multiplying with 10 so that the output labels are not small numbers between 0 and 1.

D. POWER FLOW FREE VSM ESTIMATION
On a high level, as shown in Fig. 6, the goal of the proposed GCNN is to estimate D-VSI for all nodes in the grid given the VOLUME 10, 2023 adjacency matrix and the node features. It can be known that these inputs i.e., adjacency matrix and the node features are the fundamental inputs for a power flow solver. Hence, in this work, estimation of D-VSI uses only modest input features and it does not require voltages as input features which otherwise need to be obtained either from a power flow solver (computationally expensive when generating training data and not practical due to combinatorial nature of power grid) or field PMUs (input voltage measurement can be relatively noisier [49], [50]). Therefore, even though the manual calculation of D-VSI from [21] requires voltage measurements, the proposed GCNN predicts the D-VSI directly without needing voltage measurements as its input. This ''power flow free margin estimation'' helps in the efficient generation of data for model-based control applications by avoiding states with undesired VSM values since there is no burden due to computation of power flow. It also helps in estimating the power grid margin (D-VSI) instantaneously which helps in CPU intensive applications such as Monte Carlo Tree Searchbased (MCTS) solutions (like AlphaGo) for power grids with margin constraints [51], [52] (these approaches are currently impractical for power grids due to the dependency on power flow solvers).

E. PROPOSED ARCHITECTURE WITH GRAPH CONVOLUTIONAL LAYER FOR POWER GRIDS
The proposed architecture with the graph convolutional layers is presented in Fig. 7. The overall architecture contains a three-layer stack of proposed graph convolutional layers followed by ReLU activation units and a dense layer. The parameters of the GCNNs are adjusted during the learning process while they generate graph embedding representation of the input graph samples. The graph convolution equation generates the node embeddings for node i at k + 1 iteration as follows where x i = P load,i Q load,i P gen,i V set,i T , the weights w 1 , w 2 are optimizable parameters used to ensure that the feature information from neighboring node j x k j ∀j ∈ N (i) does not completely dominate that of the main node x k i when generating the node embedding x k+1 i at k + 1 iteration/epoch.

V. DEMONSTRATION OF HIGH RESEMBLANCE BETWEEN GCNNS AND TRADITIONAL POWER FLOW ALGORITHMS
Section II talks about the impact of different loading conditions and topology reconfigurations on VSM (D-VSI). Given the large size of action space from Tab. 1, we need a fast estimation tool for VSM. There are many other use cases for the tool, and it is an active topic pursued by both research and industry communities worldwide [4], [30], [48]. Here, we present a learning-based approach that uses graph neural networks to estimate VSM correctly over a wide range of operating conditions and topology reconfigurations without the need to retrain in [30], [31], and [32]. Our method is also more accurate than the other domain-based methods numerically [7], [9].
In this section, we first introduce the computational learning process of GCNNs. Second, we focus on how the topology is encoded into the GCNNs' learning process by sharing its close resemblances with matrix-free distributed power flow solvers [11], [53]. This reasoning helps to understand the better performance of GCNNs presented in this work over a wide range of operating conditions and topologies.

A. GRAPH CONVOLUTIONAL NEURAL NETWORKS
We will first define few terms associated with a graph network. Given an AC power network/graph (G) with set of nodes (V) and edges (E), where |V| = N is the number of nodes in the graph. Each node i from the set of nodes (V) in the graph has a set of node features denoted by x i ∈ R M 1 . Similarly, each edge e i,j joining nodes i and j from the set of edges (E) in the graph has a set of edge features denoted by e i,j ∈ R M 2 . In the rest of this work, we use ''bold'' and '' ''(bar) to identify the matrices and vectors respectively. Fig. 7 presents the architecture of GCNNs. The inputs of a GCNNs are a set of graphs ({G 1 Each input graph sample G t = (V t , E t ) ∀t ∈ {1, 2, · · · , p} consists of three types of information, 1) node feature matrix x t ∈ R |V t |×M1 , 2) edge feature matrix e t ∈ R |E t |×M2 , and 3) adjacency matrix A t ∈ R |V t |×|V t | .
The graph convolution layer differentiates GCNNs from regular NNs. Before we look at the overall GCNN algorithm, it is important to understand the functioning mechanism of a graph convolution layer. Graph convolution layers are the basic building blocks of GCNNs. They perform an operation known as the graph convolution step. It is given by where is known as update function, is known as aggregate function, h i , and h j represents the hidden embedding at nodes i and j, e j,i represent the feature vector of the edge connecting nodes i and j, superscript k represents the iteration, and N (i) represents the set of nodes neighboring to node i. Equation (4) is the standard definition of a graph convolution step and hence the functions ''f'', '' '', and '' '' are not explicitly defined. The graph convolution step proposed in this paper is (2) from Section IV.

1) GRAPH CONVOLUTION STEP AND NEIGHBORHOOD INFORMATION CAPTURE
Equation (3) is of the general form x = g(x) which is generally referred to as the ''fixed-point form'' [54]. Typically, this fixed-point form is used to calculate x in an iterative manner and different fixed-point forms have different convergence properties [54], [55]. Thus, selection of an appropriate fixedpoint/graph convolution equation using f , , has a major impact on the learning/convergence behavior of GCNNs [55]. According to (4), at iteration k +1 of the GCNN, as described in Fig. 7 using the seven node system example, the aggregate function ( ) takes the set of embeddings of the nodes in i's graph neighborhood (N (i)) and computes the function value based on the neighborhood information. The update function ( ) combines the function value with the previous iteration embedding h (k) i of node i to generate the updated embedding h (k+1) i . It is important to note that the iterations (k) advance with the number of ''graph convolution layers'' in GCNNs. It is also sometimes referred to as the ''depth'' of the overall GCNN architecture. For example, Fig. 7 shows an illustration of the graph convolution step when k = 0 (initialization). When k = 0, the fixed-point equation in (3) at node 2 becomes h j , ∀j ∈ {1, 2, 4} (fixedpoint update). This fixed-point update is computed at every node in the graph as shown in Fig. 7.

2) OVERALL ALGORITHM
Algorithm 1 formally presents the overall optimization process. Steps 4 − 8 represent the fixed-point equation updates (graph convolution step) described in (3). Steps 9 − 14 are common for any deep learning architecture where the neural network weights are optimized by computing the loss function, gradient, and weight parameter updates. We will use Algorithm 1 next to understand its relation to the matrix-free distributed power flow solver in [11].

B. RESEMBLANCE WITH MATRIX-FREE DISTRIBUTED PF SOLVERS
In this subsection, we present a new perspective about GCNNs such as how the training procedure of GCNNs is very similar to a standard power flow solving algorithm in the literature. This analogy aims to emphasize how GCNNs can naturally learn/train to mimic a traditional power flow solver that is used to solve the power flow problem. This explanation helps to better understand the observed results in this paper. More specifically, we will present how the GCNNs share resemblance with matrix-free distributed power flow algorithm in [11] and [53] and how the topology is encoded into the learning of GCNN optimization. In this article, we refer to the matrix-free distributed power flow algorithm from ''Algorithm 1 presented in [11]'', but it is not presented here in the interest of space. Fig. 8 is used to illustrate the similarity with [11] and [53]. The left-hand side of Fig. 8 shows the GCNN architecture discussed in Fig. 7. It is possible to directly transform the Algorithm 1 equivalently into the Gauss-Seidel power flow solvers in [53] by 1) removing steps 10 − 14, 2) replacing the fixed-point equation presented in (3) with the fixed-point equation from [53], and 3) selecting K to be large positive integer (e.g., 100) in Algorithm 1. This direct transformation is visually demonstrated in Fig. 8. However, the power flow algorithm in [11] and [53] can be used to analytically solve only one snapshot in a single run whereas the proposed Algorithm 1 can learn to estimate outputs of multiple operating conditions and different topologies efficiently. The intuitions are 1) Introduce Learning via Non-Linearity: We can add non-linear activation functions with optimizable weights (ReLUs) to the power flow solver architecture (right side) in Fig. 8. The universal approximation nature of ReLUs makes the learning possible and this learning depends on only local features (due to graph convolution steps). 2) Introduce Deep Learning: We can stack the graph convolution layer with a fixed-point equation from (3) followed by ReLUs and dense layers as shown in Fig. 8 (left side) to make the learning deep. Therefore, Algorithm 1 is the transformation of the power flow in [11] and [53], except that the Algorithm 1 has the capability to solve multiple snapshots with the introduction of non-linearity and deep learning aspects. The graph convolution equation used in step 7 of Algorithm 1 is presented in (2).

1) RESEMBLANCE WITH GAUSS-SEIDEL LIKE POWER FLOW ALGORITHM
In the interest of space for this initial submission, a more in-depth comparison is not presented in the paper.

VI. SIMULATIONS AND DISCUSSION
In this section, simulation studies on a power network from Fig. 1 are presented but with real-world resembling load and generation (nuclear, thermal, wind, solar, hydro) profiles using the Chronix2Grid developed by RTE [44]. Additionally, the test system is designed to accommodate not only line switching actions but also bus splitting/merging actions for the first time in the ML-based VSM estimation literature. The proposed GNN-based method is compared with the online sensitivity-based methods like [7], [9] that are currently used by several independent system operators (ISOs) in the USA [2]. The proposed method is also compared against a different graph neural network called ''GCNConv'' known for its performance [56]. The the datasets are generated using the Grid2Op framework available from [45]. The complete framework for this study is programmed in Python. The optimizer used is Adam with standard Pytorch parameters [57]. The training is performed on a GPU and Pytorch geometric library [58] is used for the implementation of the proposed GCNNs. The utilized machine has 128 GB DDR5 RAM and each instance was trained using a Nvidia GeForce RTX 3090.

A. METRICS TO ANALYZE THE PERFORMANCE OF VSM ESTIMATION
This work originated from a real-world problem of corrective topology switching observed by RTE [47] and many other industry communities [2], [3]. Extensive studies have been conducted and implemented in the industry where corrective topology switching actions are performed to manage the congestions on power grids while ensuring that the constraint on voltage stability margin (VSM) is satisfied [7], [8], [9]. This constraint on VSM is generally enforced by a threshold value selected by the operator to ensure that the grid always operates above this pre-selected margin even after performing a topology switching action [7], [8], [9]. For real-time corrective topology switching of the grid, the literature uses an online sensitivity-based method [7], [8], [9] to calculate the VSM and enforce the voltage stability constraint. However, [7], [8], [9] are inaccurate due to their linearized assumption of grid dynamics for heavy loading conditions and are also very slow for practical implementation of deep MCTS-based solutions (e.g.: AlphaGo [51], [52]) for power grids. Thus, in this work, we use GCNN-based power flow free VSM estimation for fast and accurate identification of insecure operating states. Throughout the numerical validation, conforming to the existing literature and real-world enforcement of voltage stability constraint for topology switching [7], [8], [9], we analyze the performance of the proposed GCNN-based method using a pre-defined threshold which is generally selected by an operator. This constraint on VSM (threshold) set by the operators makes the VSM estimation into a classification problem (safe or unsafe) (similar to the formulations in [7], [8], and [9]).  For example, similar to use cases in [7], [8], and [9], we analyze the performance using the constraint (λ) on VSM which is used to distinguish the given snapshot of a power grid as a safe or unsafe scenario as shown below VSI crit ≥ λ, (operating condition: safe, negative class) VSI crit < λ, (operating condition: unsafe, positive class) where VSI crit = min(D-VSI i ) ∀i ∈ N . Furthermore, we used the following metrics to measure the performance of the trained model, 1) false negative rate (FNR), 2) recall, 3) MCC (correlation coefficient between actual and predicted data for classification problems), and 4) error rate (ratio of number of incorrect predictions to the total predictions).

B. PERFORMANCE BENCHMARK WITH GCNCONV-BASED GCNN
In this subsection, we compare the proposed method with a similar GNN-based method known as GCNConv [56].
The comparison is performed on various perspectives such as 1) training and testing errors on adaptability dataset, 2) testing errors on flexibility dataset, 3) performance evaluation of the proposed method and GCNConv using the metrics from Section VI-A. Finally, [59] showed that the best way to achieve a good performance on graph-level classification is to initialize the problem with a large optimization parameter space and obtain the average of the output using pooling layers [59]. However, we present our finding that the physicsdriven learning nature of GCNNs does not require pooling but can learn well with relatively smaller optimization parameter space. This is also demonstrated in this subsection.
Train & test errors of proposed method versus GCNConv: The loss function to minimize on the training dataset is selected to be the mean squared loss for both proposed method and GCNConv. It is given by where z = batch size; y, andŷ are actual, and predicted values of VSI crit . The training and testing errors on adaptability dataset for proposed method and GCNConv are presented in Fig. 11. It can be observed from Fig. 11 that as the epochs increase, the training errors of proposed method and GCN-Conv decrease. From Fig. 11, at epoch = 0, we can observe that both proposed methods and GCNConv are initialized at a similar point (mean squared error = 400 on training and 100 on testing). Even though both methods are initialized closely, the minimization rate of the error for the proposed method is shown to be superior when compared to that of the GCNConv. By looking at the performance comparison of the adaptability dataset, we can see how the proposed method can extend its learning to unseen scenarios more effectively. Furthermore, as shown in Fig. 12, the proposed method performed exceptionally compared to GCNConv on the flexibility dataset. The flexibility dataset has the topological changes that impact two substations at a time whereas the training dataset (adaptability dataset) only contains the topology changes that impact one substation. By looking at the performance comparison on the flexibility dataset, we note how the proposed method can extend its learning to unseen topologies better.
Proposed method versus GCNConv with pooling layer: In the case when a pooling layer is present, both the proposed method and GCNConv directly predict the critical VSI of a grid/scenario ( VSI crit ). Fig. 9 shows the actual critical VSI versus predicted VSI plot for both methods on both the adaptability dataset and the generalization dataset. It can be observed from Fig. 9 that the proposed method and GCNConv predict the output label well in the case of the adaptability dataset which can be observed by the fact that the majority of the data points lie on the identity line (predicted = actual blue line). However, the performance of GCNConv and the proposed method on the flexibility dataset is quite different from each other. According to Fig. 9, for the flexibility dataset, the proposed method underestimates the VSM while GCNConv overestimates the VSM of the scenarios. Therefore, GCN-Conv performs better than that of the proposed method on both datasets. However, it can be observed that the performance of GCNConv does not perform well on the flexibility dataset since the predicted data is not aligned with the identity line. The reason for the poor performance of the proposed method is because the pooling layer introduces inaccuracies as it dominates topology-based learning. Thus, in the next discussion we show that the proposed method performs better than GCNConv on both datasets when the pooling layers are removed, and the total optimizable parameters are reduced (implicates easy convergence on data in less training time).
Proposed method versus GCNConv without pooling layer: In case of not using the pooling layers, both the models directly predict VSI at every node in the power grid, and the minimum of predicted VSI values at all nodes is considered as the predicted critical VSI of the grid ( VSI crit ). Fig. 10 shows the actual critical VSI versus predicted VSI plot for both methods without pooling layers on both the datasets. It can be observed from Fig. 10 that both the models have a similar performance on the adaptability dataset. However, unlike the case study with pooling layers, in this case study without pooling layers, the proposed method performs better than that of the GCNConv on flexibility dataset by predicting the critical VSI values closer to the identity line. Therefore, in the next discussion, we further analyze the performance of the proposed method without pooling layers and GCNConv with pooling layers using various metrics to highlight the usefulness of the proposed method.
Proposed method without versus GCNConv with pooling: To further understand the performance benefits of the proposed method without pooling layer versus the GCNConv with pooling layer, Fig. 13 presents the various metric measures of the selected two models on both datasets. First, we explain whether having a high or low value of these metrics (error rate, false-negative rate (FNR), recall, MCC) is desirable to assess the models in this paper. The correlation coefficient between actual and predicted data for a classification problem. MCC values range from -1 to 1. If the MCC value is closer to 1, it indicates that the actual and predicted labels are strongly correlated which is desired for machine learning model. Fig. 13 (left side) presents the comparison (using the mtrics described above) between the proposed method and the GCN-Conv with pooling on the adaptability dataset (described in Section IV-A). From Fig. 13 (left side), for the adaptability dataset, it can be observed that GCNConv with pooling has better performance metrics when compared to the proposed method due to its lower error rate, smaller FNR, and high recall. However, it is also important to analyze the performance of the proposed method versus GCNConv on whether it can generalize its learning to the new unseen topologies (i.e., flexibility dataset -described in Section IV-A). Fig. 13 (right side) presents the comparison between the proposed method and GCNConv with pooling for the flexibility dataset. It can be observed that the proposed method has better performance in generalizing its learning when compared to that of the GCNConv i.e., smaller error rate, smaller FNR, and higher recall values. Therefore, by comparing the conclusions from the previous and current paragraph, we can observe that GCNConv overfits during the training. We can also verify this by comparing the MCC values of GCNConv between the adaptability dataset performance (left side) and flexibility dataset performance (right side) from Fig. 13. For example, we can observe that the MCC value of GCNConv for adaptability dataset (left side of Fig. 13) is 0.71 and its MCC value in the flexibility dataset (right side of Fig. 13) is 0.33 only. This indicates less generalizability of the trained model. Whereas the proposed method learned the real mapping rule of power flow-free VSM estimation on both adaptability (MCC = 0.8) and flexibility (MCC = 0.7) testing datasets.
IEEE 118-Bus System: The larger grid used in this case is the IEEE 118-bus system implemented in the L2RPN NeurIPS 2020 competition [48]. We generated different operating conditions for this study by using the historical load and generation profiles generated using [44]. We then performed different topological actions on the L2RPN NeurIPS grid and computed the D-VSI values [21] for all scenarios. After training using the proposed methodology, Tab. 2 presents the performance of the proposed method on both adaptability and flexibility datasets. It can be observed that the proposed method performs well because as discussed in this paper, the learning of the node labels in the graph convolutional neural networks is only dependent on its neighboring bus features. This tremendously reduced the learning complexity since the required input features to predict the node label is limited by the number of neighboring nodes connected to it.

C. PERFORMANCE OF PROPOSED METHOD VERSUS DOMAIN SPECIFIC ONLINE SENSITIVITY METHOD
Similar to the power flow-free margin estimation nature of the proposed method, the sensitivity-based method from [7], [8], [9], [23] is also specially designed of estimating the VSM in a very fast manner without needing to solve the power flow. In this subsection, we solve the problem to estimate the VSM in the case of line switching type topology reconfiguration that impacts one substation (adaptability dataset) and two substations (flexibility dataset) using both the proposed method and sensitivity method on the power network from Fig. 1. Fig. 14 shows the MW error in estimation of VSM using the sensitivity-based method in [7] and [23] for adaptability and flexibility datasets in Fig. 14a and Fig. 14b, respectively. It can be observed from Fig. 14 that the VSM estimation error is significant even when the Jacobian was updated frequently i.e., at the end of each 5-minute interval. This error is due to the two main assumptions made in [7], [8], and [9]. Assumption 1: The net power injections at a bus are approximately replaced by line flows in the formulation of [7] which causes an estimation error for the switching of lines that are carrying large flows. For example, the line IDs in Fig. 14a and Fig. 14b with the largest VSM estimation errors carry large line flows when compared to the other line IDs with relatively less error in VSM estimation. Assumption 2: Linearized assumption of the system through the Jacobian introduces estimation error when the current loading condition is near the system's actual collapse loading condition.
Tab. 3 presents the comparison between the proposed method, sensitivity method, and and brute force by AC power flow solver. The experiment consists of line switching actions that impact one and two substations. It can be observed that even though the AC power flow solver has zero error in identifying unsafe control actions but it takes a lot of time. Additionally, the sensitivity method [7] caters to estimating the VSM online and requires negligible time to compute, but it has large errors of 71.3% when switching lines, thus creating large line flows. It should be noted that sensitivity methods cannot adapt for ''N-2'' contingencies and hence it has a 100% error rate. However, the proposed method has relatively fewer errors and can be used for developing reliable topology controllers (as discussed in Section VI-A) without solving computationally intensive power flows.

VII. CONCLUSION
This paper proposes a power flow-free voltage stability margin (VSM) estimation method using graph convolutional neural networks (GCNNs) to reliably identify insecure operating states before deploying a topological control action. The key novelty of this work is the power flow algorithm motivated design of GCNNs and its adaptability to unseen scenarios and topological control actions. The design of the proposed GCNNs enables the power flow-free estimation of VSM since it does not require voltage measurements as its input. This opens the door for CPU intensive applications such as efficient data generation for model-based control applications by avoiding states with undesirable VSM, practical implementation of AlphaGo-based deep Monte Carlo Tree Search control solutions for power grids, etc. In addition, the proposed work is also used to screen the candidate contingencies that satisfy specific criteria on VSM without needing to solve millions of power flow problems for real-time corrective topology switching. Furthermore, the proposed method is shown to work when states stay at nonlinear operating regions. Despite the presented hard-to-solve problem design in this work (publicly available in [45]), the proposed method developed a fast and highly reliable model to identify insecure operating states due to its power flow physics-driven design of GCNNs.