Towards Model Reduction for Power System Transients with Physics-Informed PDE

This manuscript reports the first step towards building a robust and efficient model reduction methodology to capture transient dynamics in a transmission level electric power system. Such dynamics is normally modeled on seconds-to-tens-of-seconds time scales by the so-called swing equations, which are ordinary differential equations defined on a spatially discrete model of the power grid. Following Seymlyen (1974) and Thorpe, Seyler, and Phadke (1999), we suggest to map the swing equations onto a linear, inhomogeneous Partial Differential Equation (PDE) of parabolic type in two space and one time dimensions with time-independent coefficients and properly defined boundary conditions. We illustrate our method on the synchronous transmission grid of continental Europe. We show that, when properly coarse-grained, i.e., with the PDE coefficients and source terms extracted from a spatial convolution procedure of the respective discrete coefficients in the swing equations, the resulting PDE reproduces faithfully and efficiently the original swing dynamics. We finally discuss future extensions of this work, where the presented PDE-based modeling will initialize a physics-informed machine learning approach for real-time modeling, $n-1$ feasibility assessment and transient stability analysis of power systems.


I. INTRODUCTION
This manuscript is focused on building a computationally efficient and sufficiently accurate model describing the transient response of a transmission level electric power system to a significant perturbation -for example the disconnection and/or reconnection of a large generator. We consider the dynamics of the transmission level of power systems on a continental scale and focus on sub-minute transients on time scales ranging from one second to few tens of seconds. We follow an approach that is standard in power system studies and assume that the so-called swing equations [1], [2], giving the time-evolution of the voltage angles at all nodes on the power grid, provide a sufficiently accurate representation of the power system dynamics within the considered spatiotemporal scales. Stated differently in the language of modern machine learning, the spatio-temporal integration of the swing equations provide a high-fidelity representation of the ground truth. There are two competing aspects of the swing equations. On the one hand, they are based on physically meaningful quantities and parameters such as line capacities, machine inertia and damping. Accordingly they are expected to correctly capture the physics of the system. On the other hand, integrating these equations on a large, continental scale grid can be computationally very expensive, even for a single run corresponding to a specific initial condition. Obviously, it becomes even more expensive if the task is to screen many possible initial conditions, and often prohibitively expensive when the screening need to be repeated numerous times, testing many possible control actions. Model reduction for this type of online applications [3] comes as a way to strike a balance between accuracy and computational complexity. Central to this optimization is that the transient dynamics of interest occurs over time scales up to few tens of seconds while the goal is to numerically resolve multiple scenarios of initial conditions and various controls faster than real time.
How does model reduction work? In the current era of deep learning, many model reduction techniques rely on neural networks and other tools of modern data science and machine learning, see e.g. [4]- [7]. The idea is to use the ground truth model -the swing equations in our caseto produce dynamical data, and then to train a pre-selected reduced model on these datasets to fine-tune the parameters of the model. If the reduced model is of an application agnostic type, as is customary in mainstream machine learning, the scheme relies on very large datasets. However, recall that running our ground truth model is computationally expensive. Then, if producing the needed training datasets is not an option, can we still hope to build a reliable reduced model? Our only hope in this case is to inject the relevant, application-specific information -in our case information about the power system physics -into the model reduction framework. Physics-Informed Machine Learning (PIML) is the modern approach to resolve the model reduction bottleneck -that is to compensate for the lack of data (typical of online applications) by building models that are aware of the underlying physics [8], as, e.g., expressed in terms of differential equations [9]- [11]. (See also [12], [13] for discussion of the application of PIML to power systems.) Why is Partial Differential Equation (PDE) modeling a sound option for power system model reduction? In this manuscript, we propose a first step towards developing PIML for general online applications and advancing model reduction of PIML applications to power systems, as e.g. developed earlier by members of our team [13], [14]. Similarly to [12], we take advantage of the PIML approach and construct an online framework for simulating power system dynamics faster than real time. We are however aiming to capture the transient dynamics in a very large, continental-scale power system, a goal that has not been addressed by any earlier related approach we are aware of. Accordingly, we choose to build our reduced model on the continuous PDE approach to modeling power system dynamics pioneered by Semlyen [15] and later extended by Thorpe, Seyler, and Phadke [16] (see also [17]). These works were however restricted to spatially-continuous one dimensional, time-dependent systems, i.e. with 1+1 dimensional PDE. Our PDE approach to power systems, to be presented below, inherits all the relevant physics of the original swing Ordinary Differential Equations (ODEs), accordingly it is 2+1 dimensional. Thus, it resolves power grid dynamics over a spatially-continuous two-dimensional domain associated with the power system's geographical area of service. Approximating the swing ODEs by a PDE may seem strange at first sight, as naively, this transition dramatically increases the number of degrees of freedom. However, this naive thinking is not quite right for several reasons. First, numerical solutions of linear 2+1 dimensional PDE assume spatial regularization via a twodimensional grid, where the grid size can be chosen accord-ing to the desired spatial resolution. Therefore, the number of grid points may eventually be comparable to or even smaller than the number of nodes in the original grid. Second, numerical operations, such as matrix inversion, can be performed much more efficiently on a regular grid than on a complex meshed graph. Third, and most importantly, the number of physical parameters in the original power grid model (line capacities, machine inertia and damping coefficients) may be reduced significantly within the PDE approach. Indeed, within the reduced model, we want to faithfully capture only the long-wavelength components of the swing dynamics. This justifies using a coarse-grained/filtered expression for all the coefficients in the linear PDE, therefore representing the coefficients via only a few long-wavelength harmonics.
Our Contribution: In this manuscript we make the first steps towards a novel online methodology for multi-scenario testing and control based on modeling the dynamics of a large, continental scale power system within a novel 2+1 PDE modeling framework. We show how a properly coarsegrained PDE model faithfully captures the power grid transient responses to disturbances of a high fidelity model. Our approach is rather heuristic, but it is backed by physical understanding of how perturbations propagate over the grid. Therefore, we "prove by example" -illustrating our model reduction methodology on the PanTaGruEl model of the synchronous grid of continental Europe introduced in [18], [19]. Specifically, PanTaGruEl simulates power flow and swing equations with high fidelity to produce the ground truth data. The latter in their turn are used to infer a spatially continuous 2+1 dimensional PDE model. The quality of the reconstruction is judged, first, by its ability to mimic power system dynamics and, second, by a faithful reconstruction of spatially coarse-grained and physically meaningful static -spatial distribution of line impedances -and dynamicspatial distribution of damping and inertia -parameters. We conclude the manuscript with a suggestion for a path towards using the PDE based reduced modeling framework for efficient online screening of multiple failure scenarios on large transmission grids, faster than real time.

II. PROBLEM FORMULATION A. POWER FLOW AND SWING EQUATIONS (SYSTEM OF ODES)
AC Power Flow (PF) equations describe steady distributions of electric power flows over an AC power grid. The equations connect complex power injections and θ i denote the active and reactive power injections, and the voltage magnitude and angle at node i ∈ V respectively: Here, b ij and g ij are elements of the susceptance and conductance matrices, see e.g. [20] for more details. Suppose that a steady solution of the PF Eqs. (1) is perturbed, for example by a fast disconnection and reconnection of a large generator or load. Such a fault induces a transient voltage angle and amplitude dynamics. It is customary to assume a time-scale separation between voltage amplitudes and angles. On time scales ranging from sub-seconds to few tens of seconds, voltage amplitudes remain constant, while voltage angles evolve according to the swing equations [2], [21], The voltage amplitudes v i and v j are considered constant, already stabilized to the steady-state solution of Eqs. (1) and m i and d i denote the inertia and the damping (i.e., primary control) of the generators. Eq. (2) describes the relaxation dynamics of voltage angles towards a steadystate solution,θ i =θ i = 0, corresponding to the lossless, g ij = 0, linearized version of Eqs. (1a). Two comments are in order here. First, the linearized approach used here is in practice quite accurate to reproduce the transient dynamics following not too strong perturbations -a problem called small signal stability [2]. Nevertheless, the approach to be presented below can be extended to the nonlinear case, with (2). Second, the swing equation approach is not restricted to the just discussed case of a fast disconnection-reconnection fault, but also captures the voltage angle dynamics following a fault which is not immediately cleared, such as the removal of a generator or a load without reconnection. In such cases, the final relaxed state is not balanced, i.e. i p i = 0, and the power mismatch is compensated by the second, damping term in Eq.
(2), leading to an AC frequency shiftθ i = ω pf ∀i, where ω pf denotes the post-fault synchronous frequency.
In the next paragraph, we construct a reduced model by mapping the discrete system of ODEs (2) into a continuous PDE. Before we do that, we re-emphasize why a model reduction is needed at all. The motivation was lucidly expressed in Ref. [22] as follows: "The focus is on the construction of low-order models which closely approximate the global behavior of the hybrid nonlinear system. There is a growing recognition of the strong need for rapid and reliable computation of the system dynamics." Comprehensive discussions of model reduction in a general context as well as for specific applications to slow coherency and inter-area oscillations can further be found in Ref. [23].

B. MAPPING THE POWER SYSTEM TO A TWO-DIMENSIONAL CONTINUUM: THE SWING PDE MODEL
Consider a two-dimensional domain Ω ⊂ R 2 , with coordinates r = (x, y), inside which the discrete, planar or quasiplanar network is embedded. The boundary of the domain is denoted by ∂Ω and ∀r ∈ ∂Ω, n ≡ (n x , n y ) denotes the normal vector to the boundary at r. Imagine that the swing Eqs. (2) are derived by discretizing a PDE describing the dynamics of a scalar field θ(t; r) over an irregular mesh which corresponds to the original network. Then, following [16], one naturally asks: what is the PDE corresponding to the swing Eqs. (2)? We answer this question by writing the following, most general form of the swing PDE on Ω: where r 1 = x, r 2 = y. One of our main task is to map the physical parameters of (2) into the continuum as follows We discuss a procedure for initializing these continuous parameters in Section III. Next, the swing PDE (3) must be constrained with physically appropriate boundary conditions. In our case, they are Neumann boundary conditions ∀t, ∀r ∈ ∂Ω : α,β=1,2 n α (r)b αβ (r)∂ r β θ(t; r) = 0, (5) VOLUME 4, 2016  (2) if the second term in the righthand side of Eq. (6) vanishes identically, which is guaranteed by the Neumann boundary conditions (5).
In the following we simplify our PDE model, assuming that the b-tensor is diagonal b 12 = b 21 = 0, accordingly, we use a shorter notation, b 11 → b x , b 22 → b y . We will shortly show that it can be even more simplified to b y (r) ≈ b x (r) =: b(r).

III. INITIALIZATION OF PDE PARAMETERS AND CALIBRATION
Once the general structure of the swing PDE (3) is established, we need to initialize the PDE parameters b x,y (r), m(r) and d(r). This is achieved by applying a smoothing to the original parameters defined on the discrete grid of Eq. (2). Obviously, there are many different choices for this coarse-graining/filtering procedure. Therefore, it is crucial to develop a validation criteria. We calibrate and validate via post factum tests, described in the following section, where we compare the dynamics of a fault in the original, discrete swing equations with that in the spatially continuous model. In future work, this initialization procedure will be complemented by a machine learning scheme performing tailored adjustments to increase the accuracy of the model even more.
This simple smoothing procedure was proposed in Ref. [15], [16] focusing on a 1+1 dimensional PDE representation of a linear power network, where all parameters in the 1+1 dimensional PDE were chosen to be spatially constant. This homogeneous smoothing procedure was improved in [17], where non-uniform parameters of the 1+1 PDE system were derived by means of a convolution with a fixed Gaussian kernel.
We introduce a slight generalization of this Gaussian smoothing process. We apply an Artificial Diffusion (AD) to spatial distributions of the physical coefficients, m(r), d(r) or b αβ (r). It starts by assigning discrete physical quantities to the nearest nodes discretizing the PDE (3), then they diffuse over the lattice. The longer the diffusion is allowed, the broader the convolution kernel. The diffusion is stopped when parameters satisfy some smoothness criterion. This generalization is advantageous because it allows the optimal width of the Gaussian kernel to be self-determined (no additional criteria are required).
We illustrate the AD process on PanTaGruEl which consists of 3809 buses, 618 generators and 4944 lines. There are 3221 nodes in the discretization of our continuous model. Even though this number is not significantly smaller than the number of buses in the discrete system, the continuum model can be parametrized efficiently with a much fewer parameters than what is required in the discrete model as we will show shortly. Fig. 1 illustrates the AD process on the susceptances, the damping and inertia coefficients and the power injections. These numerical results suggest in particular that the resulting spatial distribution of the diagonal part of the susceptance tensor is isotropic, i.e. b x (r) ≈ b y (r). This, combined with the assumption that the off-diagonal terms of the b-matrix are much smaller than the diagonal ones (thus dropped), means that the entire susceptance tensor may be approximated by a scalar function, b αβ (r) ≈ δ αβ b(r), where δ αβ is the Kronecker symbol.
We conclude this section by showing how already this relatively simple initialization procedure provides a useful and intuitive insight into the system behavior. Indeed, once the continuous values of the susceptance tensor, b(r) and of the inertia vector, m(r), are determined and validated, we can immediately use them to build a spatial map of the electro-mechanical wave velocity, c(r) = b(r)/m(r), shown in Fig. (2). This figure also illustrates how knowing the velocity, c(r), allows us to reconstruct the dynamical spread of a localized perturbation. Notice, however that these illustrations, even though encouraging, also suggest that one needs to be careful in extending the approach to the grids with strong degree of heterogeneity. The extension is certainly possible, however to achieve an accurate representation of the actual grid dynamics by the PDE model will require introducing more trainable parameters representing higher degree of spatial inhomogeneity.

IV. NUMERICAL VALIDATION
In this Section, we juxtapose our PDE model, with the parameters found through the AD process explained in the preceding section, against the original swing model considered as the ground truth. Therefore, our first validation task is to compare the steady state solution (of the static PF equations) with the solution of a Poisson problem associated with the static version of the PDE model. We then compare responses, within our PDE model vs the ground truth model, to a sufficiently large perturbation -a power outage. Finally, we show that the number of parameters describing the PDE model can be reduced dramatically without loss in accuracy.

A. STEADY STATE EXPERIMENTS
We start with a comparative analysis of the steady-state solutions of the PDE and of the ground-truth model. To do that, we find the grid points closest to the location of the buses within PanTaGruEl and terminate the AD process when the voltage angles of the two steady-state solutions are as close to one another as possible, θ cont i θ disc i . Comparison of the two solutions is shown in Fig. 3 (a). They are clearly in a good agreement overall, even though not without some discrepancies. The outliers were highlighted in different colors in Fig 3 (b). We conjecture that the discrepancies are largely due to misrepresentation of parameters in the part of the grid with strong aspect ratio, e.g. at the Italian peninsula which is long and narrow. Specifically, in this case the boundary conditions we set in the continuous model may be too restrictive, effectively forcing parameters in the part of the grid with large aspect ratios to become much smaller than what we observe in the respective part of the discrete model.
To verify the hypothesis, we simply modify susceptances in the parts of the grid corresponding to the outliers. Specifically, we increase susceptance uniformly within the Italian peninsula (as the aspect ratio there is large) and reduce it over the Iberian peninsula and Transylvania. As seen in Fig. 3, this simple and admittedly ad-hoc adjustment was sufficient to improve the agreement (the outlier effect was reduced significantly). We anticipate that a more accurate and automatic tuning (via ML tools which are work in progress) will produce even better results.
We conclude by mentioning that some of the discrepancies just discussed can be attributed to transformers present in the ground truth model, but absent in the PDE model. These and other strongly localized effects cannot be, properly represented in the continuous model. However, these discrepancies are expected to weaken at larger (spatial) scales -that is at in the coarse-grained picture which is in the focus of our model reduction analysis. Moreover, we do not VOLUME 4, 2016 expect the transformers to play a significant role in analysis of transients, we are now switching our attention to.

B. DYNAMIC EXPERIMENTS
Here we discuss how well the PDE model reproduces the ground truth dynamics observed in response to an abrupt removal of a 900 MW power plant in Greece. Fig 4 shows comparison of the frequency response at four generators across Europe. The agreement between the reduced and ground truth model is very good. It is especially accurate far away from the fault. (We remind the reader that reproducing fatefully coarser picture, and not the small scale details, is exactly our goal.) We also report that the propagation time and the frequency map of the inter-area oscillations are well reproduced too. Not surprisingly, we also see that higher harmonics present close to the fault location (and gradually disappearing as we move away from the fault) are over-estimated by the PDE model. We expect that a more accurate -machine-learning trained -re-parametrization of the PDE model will be able to correct the problem, however on the expense of introducing higher-degree of parameter inhomogeneity in the PDE model. In the end, this is a matter of a trade-off decision which a designer of the reduced model should make -this is a trade off between accuracy of prediction and degree of the model reduction.

C. MOVING TOWARDS MODEL REDUCTION
As we mentioned earlier in the text, the number of harmonics in the parameters of the PDE model was set to be slightly less than the number of nodes in the original grid model. Our next step is to see if we can reduce the number of harmonics in the PDE model coefficients even further without any significant loss of accuracy in the spatio-temporal dynamics coarsegrained at the resolution of 50 km or larger. In other words we now study how our PDE model performs once we apply a low-pass filter to the coefficients.
To investigate this matter, we perform an additional Fourier filtering to the results of the AD procedure. Specifically, we choose the cut-off spatial frequency equal to 30% of the largest spatial frequency set in the bare (unfiltered) version of the PDE model. The results of this filtering experiments are shown in Figs. (5,6). We observe that some compression artifacts, similar to "ripples", are present, but they seem to show little to no effect on the system dynamics, see in particular Fig. 6 (c). Here we would like to emphasize (again) that a much more accurate filtering can be achieved with a ML approach, however it is encouraging to see that with a relatively simple tuning of parameters we were able to achieve reproduction of the principal features with such a good quality, even though out filtering (parameter fitting) procedure was certainly not optimal. Let us also notice that the coarse adjustments, mentioned in Section IV-A, are clearly visible.

V. CRITICAL EVALUATIONS, DISCUSSIONS AND FUTURE WORK
Our main accomplishment in this manuscript is the construction and the validation of the continuous PDE model of the swing equations. The construction included the accurate resolution of the boundary conditions and the development of an efficient and flexible parameter filtering procedure based Artificial Diffusion (AD) and Fourier filtering. The validation proceeded via static and dynamic comparisons of the continuous PDE model and the original discrete model.
We also made a number of interesting observations which are clearly preliminary. Comparing the computation times for dynamical simulations of the PDE model on a regular grid of size comparable to the size of the original discrete graph we found that the PDE model is faster by at least a factor of ten. This is consistent with what we stated in the introduction. We also observed that in many instances a significant filtering of the dynamical parameters through a rather large coarse-graining scale (via wide Gaussian kernels and Fourier filtering) does not impact the accuracy of the dynamics of voltage frequency waves on sufficiently large scales (hundreds of kilometers) and sufficiently long times (seconds).
Finally, this manuscript's most important message is that the model reduction presented here is only the starting point for an upcoming PIML methodology, where the functional maps for m(r), d(r) and b αβ (r) will be modeled as Neural Networks. Specifically, future work will focus on using the approach developed here as a warm start for learning physical parameters of the PDE model (3). Indeed, we envision modeling the functional maps for, m(r), d(r) and b αβ (r) as Neural Networks. Hence, the AD procedure is still expected to be useful to initialize the future PIML schemes.
Planning for this work, we are aiming for keeping training process of the Machine Learning schemes under control in terms of computation time -consistently with the goal of making them capable of achieving the goal of evaluating in parallel multiple perturbation scenarios in the time which is comparable or faster than dynamic simulations of the ground truth (swing) model.

APPENDIX. DETAILS ON THE DISCRETIZATION OF THE PDE AND ITS NUMERICAL INTEGRATION
We use the same spatial increment ∆ for x and y axes, subsequently r = (i∆, j∆).
Similar expressions are obtained for the y-axis. Then discretization of the last term in Eq. (3) becomes where β = b x i,j−1 +b x i,j +b x i−1,j +b y i,j . In order to make our numerical scheme more efficient we vectorize (re-index) the grid, and the field, θ(t; r) defined over the grid, according to θ i,j →θ k , where k = N y (i − 1) + j. It results in the following re-indexing of the grid-neighbors: i − 1, j → k − 1, i + 1, j → k + 1, i, j − 1 → k − N y , i, j + 1 → k + N y . This results in reformulation of the principal part of Eq. (7) in terms of a matrix Ξ acting on the vectorθ. Furthermore, with the convention that inner nodes, i.e. nodes that aren't on the boundary layer, have a zero normal vector, n x = 0 and n y = 0, and introducing η ± (x) = {1 if ±x ≥ 0 ; 0 otherwise}, we rewrite Ξ, therefore accounting for the Neumann boundary conditions (5), Ξ kl = −β k δ k,l + η + (n x )b x k−Ny δ k−Ny,l + η − (n x )b x k δ k+Ny,l + η + (n y )b y k−1 δ k−1,l + η + (n y )b y k δ k+1,l , whereβ k = η − (n x )b x k + η + (n x )b x k−Ny + η − (n y )b y k + η + (n y )b y k−1 and δ ·,· is the Kronecker product. It is important that the method used for the numerical integration of the PDE is a finite volume method. This class of methods is conservative. This means that there is zero flux leakage at the boundary by construction which, in particular, guaranties that the post-fault system frequency is indeed at the value it is expected to be.
Finally, we use the Crank-Nicolson method [24] to integrate PDE (3). At each time step we solve the following system of linear equations