A Versatile Surrogate Model of the Power Distribution Grid Described by a Large Number of Parameters

This paper aims to present a general-purpose Surrogate Model for the probabilistic analysis of power distribution grids with a large number of input parameters. The distinctive feature of the novel technique is the employment of the partial derivatives of output variables versus input parameters to tame the “curse of dimensionality” problem exhibited by prior surrogate model calculation techniques. The second important feature of the proposed Surrogate Model method is that it does not require any a priori assumption about the nature or statistical distribution of the input parameters. In fact, it can be applied whenever design parameters are deterministic variables as well as when they are uncertain and represented by continuous and/or discrete random variables. Relevant applications presented in the paper refer to the probabilistic analysis of the distribution grid in the presence of a large number of photovoltaic sources and electric vehicle charging stations.

Abstract-This paper aims to present a general-purpose Surrogate Model for the probabilistic analysis of power distribution grids with a large number of input parameters.The distinctive feature of the novel technique is the employment of the partial derivatives of output variables versus input parameters to tame the "curse of dimensionality" problem exhibited by prior surrogate model calculation techniques.The second important feature of the proposed Surrogate Model method is that it does not require any a priori assumption about the nature or statistical distribution of the input parameters.In fact, it can be applied whenever design parameters are deterministic variables as well as when they are uncertain and represented by continuous and/or discrete random variables.Relevant applications presented in the paper refer to the probabilistic analysis of the distribution grid in the presence of a large number of photovoltaic sources and electric vehicle charging stations.
Index Terms-Compact modeling, design of experiments, electric vehicles, photovoltaic, probabilistic load flow, sensitivity analysis.

I. INTRODUCTION
C OMPACT Surrogate Models (SMs) play a crucial role in the design optimization and performance/reliability exploration of complex power distribution grids.The envisaged massive integration of renewable energy sources distributed along the grid, without violating stability and safety, represents a remarkable challenge that calls for novel powerful simulation methods and even more efficient compact modeling techniques [1], [2], [3], [4].
A first class of consolidated SMs are those based on linearization of Load Flow equations and Voltage Sensitivity Analysis (VSA).For instance, linearization and VSA has been extensively adopted in the analysis of voltage variations due to PV generation fluctuations [5] or combined with predictive control methodologies and reactive power management techniques in voltage regulation applications [6], [7].
A second class of SMs approximate the nonlinear relationships determined by Load Flow problem (e.g., between injected powers and voltage variations at some observable node of interest) through polynomial approximations.Such SMs, are currently emerging as vital tools in the Probabilistic Power Flow (PPF) analysis of grids penetrated by many renewable sources.In this regard, it is worth remembering how PPF analysis methods can be broadly categorized into Analytical techniques and Numerical methods.Analytical techniques, such as cumulant methods [8] and point estimation techniques [9], employ closedform formulas that rely on simplifying assumptions about probabilistic distributions of uncertainty and/or on linearization of the power flow equations.These assumptions may not hold in real grids of interest (e.g., grids with non-standard distributed PV sources).
By contrast, Numerical methods are generally applicable being based on sampling techniques of uncertainty distributions and Monte Carlo (MC) simulations.The limitation of MC method is its slow convergence that requires repeatedly solving a huge number of Load Flow (LF) problems.It is just in this context where SMs, approximating the grid input-output relationships in compact form, can significantly reduce the number of simulations making MC practically applicable.
State-of-the-art SMs for MC acceleration rely on the usage of generalized Polynomial Chaos (gPC) expansions [10].gPC-based SMs have been exploited in probabilistic load flow analyses aimed to quantify the impact on the grid of uncertain parameters, such as unpredictable power injection from renewable sources, uncertain power demand from residential users [11], [12], [13], or from electric vehicle charging stations [14].
However, state-of-the art gPC-based SMs are suitable for small or medium size problems having less than 50 random parameters since they suffer of the curse of dimensionality [15].This is because the number of terms in the series expansion, and thus the number of LF simulations required by the Design of Experiments (DOE), grow superlinearly with the number of considered parameters.In this paper, we advance state-of-the art methods by exploring a novel research direction.The novelty consists in exploiting partial derivatives of output variables versus input parameters to significantly reduce the number of simulations required to build the power grid SM.
A second important limitation of existing gPC-based probabilistic techniques for power system uncertainty quantification is connected with their being parameter dependent.In fact, existing techniques rely on the assumption of a priori knowledge about the statistical distribution of uncertain parameters.Such a priori knowledge about parameter density functions is indeed an indispensable prerequisite in order to calculate the set of orthonormal polynomials basis functions forming the series expansion [10], [16].As a consequence, gPC model should be recalculated/regenerated, and thus LF simulations repeated, every time a different hypothesis about parameters distribution is made.In this paper, this second drawback is completely removed as well.
In fact, the main contributions of this paper include: r A versatile polynomial-based SM that can be applied independently of the type and distributions of the input parameters.The proposed SM relies on a truncated series expansion of monomial polynomial products.
r A systematic design of experiments procedure that leads to a well posed mathematical problem for coefficients determination.The procedure employs partial derivatives of output variables versus input parameters.Details are presented for the relevant case of series truncation order r Applications to uncertainty quantification problems for the power grid with a large number of input parameters either described as continuous or discrete random variables.The remainder of this paper is organized in the following way: Section II briefly reviews the typical formulation of Load Flow analysis and specifies the input parameters and output (observable) variables of the SM.In Section III, the generalpurpose SM is described, while the details of the proposed Design of Experiments are given in Section IV.Section V shows the computational flow for sensitivity computation.Finally, in Section VI, numerical applications are presented.

A. Load Flow Analysis
We examine a power distribution network composed of N nodes.Load flow analysis consists in determining the node voltage values that yield the desired power flow at the network terminals.The problem is formulated mathematically by a set of nonlinear equations of the type: for n = 1, . . ., N, where V n , I n are voltage and current phasors at node n, vector V = [V 1 , . . ., V N ] collects all voltages, while S n = P n + jQ n denotes complex power at node n with P n and Q n being the active and reactive power, respectively.Furthermore, node currents are related to node voltages by means of where Y ns are the entries of the node admittance matrix.This one is determined by node-interconnection elements, such as grid lines, transformers and phase shifters, which are described by their admittance matrix.Power specification at the terminal nodes are given by loads and generators [17].In this paper, the complex electrical quantities appearing in the power flow problem (1) are described in Cartesian coordinates (i.e., by their real and imaginary parts) and the resulting nonlinear equations are solved with Newton-Raphson (NR) method.For notation compactness, a vector-based representation of the complex phasors and power equations is adopted.The nth voltage phasor V n ∈ C, is represented by the 2 × 1 vector collecting its real and imaginary parts V R n = Re{V n } and V I n = Im{V n }, respectively.Similarly, the power flow (1), ( 2) are transformed into the associated vector form and respectively, where Y ns = G ns + jB ns .Load flow ( 4), ( 5), collected for n = 1, . . ., N, form a system of 2 × N nonlinear real equations that can be denoted in compact form as: where T is the column vector assembling node voltages real and imaginary parts.Such a nonlinear system can be solved with Newton-Rapshon method which requires calculation of the Jacobian matrix

B. Design Parameters
A set of physical quantities affecting grid behavior and appearing in power balance problem (1) are assumed to be the input design parameters.Even though, the method we present can deal with any type of input parameter, in this paper we will specifically focus on the cases where the input parameters are a set of D powers P d , with d = 1, . . ., D, delivered or absorbed at some actor node in the grid.This scenario, in fact, is by far the most frequent in applications.No further assumptions or information is assumed to be known about input parameters that can fall into the following categories: a) deterministic powers injected by some grid controller (e.g., aimed at mitigating voltage fluctuations); b) continuous random variables modeling uncertainty in renewable sources generation (PV, or wind generation) as well as uncertainty in loads absorption; c) discrete random variables (e.g., modeling quantized levels of power absorption).
Since the range of such powers can be different, we first map them into normalized powers: where the scaling factor P M d denotes the maximum installed power at node d.
Once design parameters have been decided, load flow problem (6) can be rewritten as: where x = [x 1 , . . ., x D ] T denotes the parameters vector.For given value x of parameters vector, the solution of ( 9) provides the associated V node voltage values.Commonly, in applications one is interested in calculating the values of a subset of output observable variables y j , with j = 1, . . ., M in the grid.
For instance, observable variables that are relevant for power grid quality are the voltage modules at some critical buses: Observable variables can be assembled in vector y = [y 1 , . . ., y M ] T .Due to power balance problem ( 9), observable variables vector results to be a deterministic multi-output function of design parameters: where the jth element in Φ(•) is a multi-variate nonlinear function of the type: A surrogate model should be able to approximate such multivariate functions on the basis of a limited number of numerical experiments, i.e. solutions of power balance problem (9) for a set of parameter vector samples x (s) .

III. POLYNOMIAL-BASED SURROGATE MODEL
In most uncertainty quantification problems, it is assumed that input parameters describing uncertainty are random variables with known continuous statistical distributions.Under this hypothesis, the jth multi-variate function to be modeled, denoted ϕ( x) for simplicity, is approximated as a truncated series where the orthonormal basis function {Ψ α ( x)} is identified by the vector index α = [α 1 , . . ., α D ] ∈ N D .Such orthonormal basis functions are precalculated starting from the joint Probability Density Function of variables x d .In this sense, the SM ( 13) is a parameter-dependent model.In (13), the series expansion truncation order β bounds the number of basis functions in the series, in fact As a result, for a given number of parameters D and truncation order β, the number of basis functions, and of weighting coefficients a α , is given by: In our approach, instead, we want to derive a more versatile SM that could be employed independently of the type and distribution of its inputs.To this aim, we first observe how truncated polynomial series expansion (13) can always be rearranged in the following form: where the series functions are now given by the product of monomial polynomials, i.e.: Please, note that in (15), multi-variate polynomials and related coefficients can be either specified by a vector index notation where In this paper, where no a priori assumptions on variables x k are made, the general right-most notation in ( 15) is adopted as the surrogate model.In compact form it reads: where is the row vector collecting polynomial functions while is the vector with associated coefficients.For instance, for truncation order β = 2 and D = 3 input parameters, the surrogate model is described by row vector: composed of polynomial basis functions.

IV. DESIGN OF EXPERIMENTS
The expansion coefficients in (15) or (18) are determined by means of the design of experiments explained in this section.The distinctive feature of the herein proposed procedure consists in the exploitation of partial derivatives ∂ϕ( x) ∂x k of output variable versus input parameters.In particular, we present details for truncation order β = 2 since this is by far the most commonly Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.adopted approximation order in power distribution grid problems.
In this hypothesis, S = D + 1 samples points x (s) , for s = 0, . . ., S are selected on a regular grid in the parameter space, as shown in Fig. 1.
Samples include the space origin and the D parameter space vertexes having a one at sth position and zero the remaining elements.
In each sample point x (s) , the observable variable ϕ( x (s) ) and its D partial derivatives for k = 1, . . ., D are calculated by running a deterministic LF analysis and further sensitivity analysis as detailed in Section V. Hence, the series expansion ( 18) is enforced to fit ϕ( x) and its D partial derivatives ∂ϕ( x) ∂x k , at the S = D + 1 sample points x (s) .This results in the following over-determined linear system where M is the experiment matrix described as follows: . . .
denotes the partial derivatives of row vector defined in (19) computed at sample points.
The row vectors and their derivatives in (27) are made of ) elements (i.e., columns).Each graph parenthesis in (27) groups the slot of D + 1 constraints in a sample point.As a result, matrix M is made of (D + 1) 2 rows.
Similarly, b is the column vector collecting in a stack-wise fashion the observable variable and its derivatives at the sample points, i.e.: . . .
with s) .With these premises, we are now in the condition to state the following fundamental result.
Result: With the regular grid sample points selection made in ( 23) and ( 24), the experiment matrix ( 27) has exactly N b = D 2 + 1 (D + 1) linearly independent rows.Thanks to the above reported Result, which is proved in the Appendix, the over-determined system (26) has full rank and thus it can be solved in the least-squares sense by multiplying both sides by the transposed experiment matrix, i.e.: V. COMPUTING SENSITIVITY For prescribed values of parameters vector x, the solution of load flow problem (9) yields the associated V = V ( x) node voltage vector.Hence, a small perturbation δ x of parameters will produce a small perturbation δ V of node voltages such that the power flow problem (9) can be written as: where O(δ 2 ) are second-order terms.In the limit that perturbation goes to zero, neglecting second-order terms, we derive the matrix system: The term ∂F ∂ V is the 2N × 2N Jacobian matrix (7)., the sensitivity of observable variables of interest can then be deduced.For instance, when observable variables are the modules of some node voltage like in (10), i.e., y j = ϕ j ( x) = |V j |, their sensitivity is calculated as follows: We end this section by summarizing the computational steps of versatile SM-based PPF analysis as reported in the flowchart in Fig. 2. Its main features are: 1) For D uncertainty parameters, the determination of SM (18) only requires D + 1 LF simulations, and related sensitivity analyses.In other terms, the number of LF simulations grows linearly with the number of parameters.
2) The compact model (18), with computed coefficients c k , does not depend on statistical distributions of parameters x d .It can be re-evaluated many times assuming several hypotheses about x d parameters distribution.

VI. NUMERICAL RESULTS
In this Section, we investigate application of the proposed versatile SM to the IEEE 85 bus medium voltage distribution grid test case provided within the file case85.m of the MatPower suite.For such a grid, the base voltage is V B = 11 kV.This single-phase test grid is first extended to a balanced three-phase network, referred to as balanced 85bus grid, by replicating the single-phase loads and generators at the three nodes (i.e., the three phases) of each bus.SM determination and grid simulations have been done using Matlab.The machine used is an Intel I7 with 16 GB of ram and 2.3 GHz clock frequency.

A. Scenario With Continuous Input Parameters: Predicting the Impact of PV Penetration
In the first numerical experiment, we describe application of the SM in power grid uncertainty quantification in the presence of a large number of distributed PV generators.Such an issue is currently a hot topic in view of the expected green transition towards renewable sources.To this aim, we want to reproduce in simulations the typical statistical features of PV generations and evaluate the effects of their injection in several potential buses/nodes.As an example, we consider 60 injection buses.Fig. 3 shows the topology of the grid, with buses numbering, and highlights the buses where PV injection is supposed to occur.In our example, all buses in the grid are considered as potential PV injection points with the exception of those where lines bifurcate.In each injection bus a maximum installed PV power P M d = 15 kW is first assumed in simulations.The 60 PV sources are all injected into the Phase-A node of the related bus thus creating grid unbalancing.Since, in IEEE 85 bus test case, the total active power absorbed by loads connected to Phase-A is of P = 2.57 MW, the total installed PV power of Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.15 kW × 60 = 0.9 MW assumed in simulations corresponds to a PV penetration at Phase-A of α = 0.9/2.57= 35 %.
To be realistic, the statistical features of PV generators used in simulations are derived from the freely-available data set of PV measurements [18].The active power delivered by several PV plants over a hourly time window of the day, i.e., from 10:00 AM to 11:00 AM, are extracted from the data set for several days.Such powers P P V d after normalization to the installed power P M d , i.e., provide the samples of the design parameters as defined in (8).Fig. 4 shows, as an example, the typical statistical distributions of normalized active power for three different hourly time windows of the day.From Fig. 4, one can see how PV power delivery is a non-stationary random process whose PDFs change significantly when computed over different hours of the day.As a consequence, in order to assess grid reliability, network operators need to explore and simulate several hours-of-the-day scenarios described by the corresponding PV distributions.
A versatile SM with D = 60 input design parameters and a set of desired observable output variables is calculated in accordance with the Design of Experiments described in Section IV and computational flow reported in Fig. 2. In this example, the determination of SM requires D + 1 = 61 distinct Load Flow simulations.After that, the computed SM is employed in place of running LF simulations to efficiently evaluate the PDFs of a set of observable variables of interest.
As an instance, Fig. 5 reports the PDF of the voltage (module) variations at Bus 55 (Phase-A) with respect to nominal voltage level V55 and for the hourly time window 12PM-1PM.In Fig. 5 voltage variations in abscissa are measured in kilovolt [kV].In the same figure, the PDF is also computed by means of reference Monte Carlo (MC) running   10,000 simulations as well as with orthonormal-polynomialbased gPC method.The three PDF shapes match excellently.
Table I reports the comparisons of the three methods for a single time window computation, in terms of number of required Load Flow analyses, simulation time and relative error with respect to reference MC method (note: the relative errors are computed in all of the nodes of the grid).It is seen how SM reduces significantly the number of required Load Flow simulations.As a result, SM achieves a two orders of magnitude speed up factor compared to reference MC method and a ≈ 20× speed-up factor compared to gPC method for the same accuracy.
Hence, PPF analysis is repeated for ten different hourly time windows of the day, i.e., from 8AM-9AM hour till to 5PM-6PM hour.As explained in described Theory and in the computational flow chart shown in Fig. 2, the proposed versatile SM does not depend on the distribution of the input parameters.Series expansion model (18), with previously computed c k coefficients, is reused for exploring observable node uncertainty over different hours of the day.Naturally, different distributions of input PV sources will result in different distributions of observable node voltage variations.Fig. 6 reports, as an instance, the PDF of (magnitude) voltage variations at Bus-55 over three hours of the day as efficiently calculated with SM.It is seen how the central hours of the day correspond to the largest node voltage variability interval.It is worth observing that in Fig. 6 and following ones, voltage variations are reported in per unit [p.u.] with respect to the base voltage value of V B = 11 kV.As a check of SM correctness, the ten-time-window simulations are also performed with reference MC method as well as with orthonormal-polynomial-based gPC method, and results are reported in Fig. 6 for comparison.The matching of PDFs computed by the three methods remains excellent.With regard  to gPC method, since orthonormal polynomials basis functions are taylored to parameters distribution, its implementation requires rebuilding the associated Experiment Matrix and thus repeating a new set of Load Flow simulations for each hourly time window [10], [16].
Table II summarizes the number of required Load Flow analyses, the overall simulation time and relative error of the three methods for the ten time window computation.
Simulations are also extended to explore voltage uncertainty at several buses in the grid.For instance, Fig. 7 reports the PDFs of such voltage variations for the central hour 12PM-1PM as computed with SM, reference MC and gPC methods.It is seen how Bus-55 is that with the greatest variability interval and thus the critical one to consider and monitor by network  operators.To this aim, Fig. 8 shows the sensitivity values of voltage module at Bus-55 and Bus-76 (Phase-A) versus power injection buses.These values are calculated in correspondence of parameter space origin x (0) = [0, . . ., 0] (23) in the DOE.We note how, in this grid, voltage at Bus-55 exhibits large sensitivity to power injection at buses from Bus-25 to Bus-56 while voltage at Bus-76 is more affected by injection at buses from Bus-59 to Bus-77.However, both considered voltages tend to exhibit a non zero sensitivity versus almost all of the potential injection buses in the grid.We end this first experiment scenario by showing results for different levels of PV penetration.To this aim, we repeat PPF analysis increasing the node installed PV power from previously considered value of P

B. Scenario With Continuous/Discrete Parameters: PV and Electric Vehicle
In this second application example, we exploit the SM to address another emerging hot topic in power grid reliability analysis: the impact of the simultaneous presence of distributed PV generators and electric vehicle Charging Stations (CSs).
To be realistic, the statistical distributions of CSs power delivery to vehicles are derived from available measurement data sets [19].As an example, in Fig. 10 we report the typical statistical distribution of CSs power P CS : it refers to a multi-point CS that can serve a maximum of six electric vehicles simultaneously.Each electric vehicle, when connected, draws a constant 3.5 kW power.As a result, the CS power absorbed from the grid can only assume one in the six quantized values {0, 3.5, 7, 10.5, 14, 17.5, 21}kW.
After normalization to the maximum load power, the design parameter describing the CS load: is a discrete random variable described by its Discrete Probability Function (DPF) [20].Twelve CSs are added to the buses of the IEEE85 bus grid, as shown in Fig. 11.In simulations, CSs are assumed to be single-phase loads that are connected to the Phase-A node of the related bus.Please note, that in this second example the input-output problem to be approximated has on the whole D = 72 input design parameters, i.e. 60 continuous random variables representing PV sources plus 12 discrete random variables modeling CS load.Fig. 12 shows the PDF of voltage variation at Bus-55 in the grid as computed with the SM model for α = 35 % PV penetration.Even though not reported in the same figure, the PDF calculated with the SM matches with a < 0.5% relative error the one computed with reference MC method.In Fig. 12 the PDF of voltage variation at Bus-55 due to PV sources only is also reported for comparison.From the network operators viewpoint it is seen how the presence of CS loads tends to partially counterbalance PV effect moving voltage variability interval towards lower values.

VII. CONCLUSION
In this paper, we have presented an original approach to compact modeling of power distribution grids affected by a large number of parameters.The approach exploits the computation of partial derivatives of variables of interest versus input parameters to tame the curse of dimensionality problem.For the relevant case of expansion order β = 2, the details of the Design of Experiments (DOE) required to extract the coefficients of the SM have been illustrated.It has been proved how the DOE always results in a numerically well posed problem and how it requires a number of LF simulations that grows linearly with the number of parameters.For the cases considered, the proposed SM introduces a two orders of magnitude speed-up compared to reference MC method for the same accuracy.
In addition to that, the proposed surrogate model is multi purpose since it can be applied independently of the type and statistical distributions of its input parameters.Applications to hot topic probabilistic analysis, i.e. with power grid injected by a large number of renewable PV sources and loaded by electric vehicles stations, have been investigated.It has been shown how the SM can provide, in an accurate yet efficient way, key information to network operators.Such an information includes the identification of critical Buses in the grid, exploration of node voltage variability for different hour-of-the-day scenarios or/and for different levels of PV penetration as well as insights about the interaction of many distributed PV sources with electric charging stations loads.The versatile SM has several potential future employments which include applications to grid optimization and decision making problems [21], [22].

The 2N ×
D sensitivity matrix ∂ V ∂ x contains the unknown (partial) derivatives of real and imaginary parts of node voltages versus parameters, i.e., ∂V R n ∂x d and ∂V I n ∂x d .Finally, the 2N × D right-hand side matrix − ∂F ∂ x collects the derivatives of power flow equations versus parameters.For instance, when the parameters are the active powers injected at nodes, i.e. x d = P n , Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

Fig. 2 .
Fig. 2. Computational steps of the PPF analyses with the versatile SM model.

Fig. 4 .
Fig. 4. Typical probability distributions of PV delivered power for several hours of the day.

Fig.
Fig. Sensitivities of voltage at Bus-55 and Bus-76 versus PV power at injection Buses.

Fig. 9 .
Fig. 9. PDF(ΔV 55 ) at Bus-55 Phase-A for different PV penetration levels.Penetration is determined by node installed PV power P M d .
M d = 15 kW to values P M d = 30 kW, and P M d = 45 kW.Such installed powers correspond to PV penetration levels, at Phase-A, of α = 35 %, α = 70 %, and α = 105 %, respectively.Fig. 9 reports the P DF (ΔV 55 ) computed at critical Bus-55, and for the central hour time window.For the considered grid and PV injection arrangement, such simulations inform the network operator about the potential random variability of critical node voltages.For instance, a PV penetration of α = 105 %, can result in a (positive) voltage variability interval of ≈ 0.08 × 11 kV = 0.88 kV at the critical Bus-55.Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

Fig. 10 .
Fig.10.Discrete probability function of typical CS power absorption from the grid.The function gives the probability that power is exactly equal to the quantized values reported in abscissa.

Fig. 11 .
Fig. 11.Detail of feeder in the IEEE 85 grid with electric vehicles charging stations addition.