Fast Probabilistic Voltage Control for Distribution Networks With Distributed Generation Using Polynomial Surrogates

Renewable distributed generation will be a key component of future power distribution networks. To control the voltage in distribution networks with integrated distributed generation (DG) units, it is vital to quantify the impact of control variables on voltage magnitudes under consumption and generation uncertainties. Doing so requires running, for each control action, many power flow simulations for various consumption and generation realizations, which is computationally infeasible for systems with many uncertain inputs. In this work, we address this challenge by developing surrogates, or metamodels, that analytically estimate the random voltage as a function of input variables that include random parameters (consumption and generation levels) and control actions (power factors). Specifically, we propose a model reduction method for building these surrogates that reduces the number of simulations needed for training. This method identifies and includes only the consumption and generation variables that are influential on the voltage at a given bus. Using this 'reduced' surrogate, we then develop a sensitivity-based approach for probabilistic voltage control. We demonstrate the computational efficacy of the control approach on an IEEE 69-bus system with a large number of correlated input parameters. In this case study, the main computational gains are (1) accurate probabilistic power flow analysis using surrogates trained with only 500 simulations for a system with more than 150 random parameters, and (2) a successful surrogate-based voltage control approach that requires only 150 additional simulation samples, as opposed to the more than 500,000 samples needed by conventional perturb-and-observe voltage control.


I. INTRODUCTION
In the future, power distribution will involve a large penetration of renewable distributed generation (DG) units, driven by fossil fuel shortages, greenhouse gas emissions from fossil fuels, technology development, cost reduction, and government incentives [1]. Although there is no unique definition of DG units (also known as embedded, dispersed, or decentralized generation), one can roughly define them as small generation units that mainly use renewable energy and are directly connected to the distribution network or to the customers' side of the network [2]. The penetration of DG units can increase power supply capacity and reduce transmission losses [3]. Despite these advantages, however, DG integration also poses major planning and operation challenges, the main operational ones being voltage control, grid protection, and fault level [4]. In this work, we focus on the voltage control problem and introduce a surrogate-based sensitivity analysis framework that efficiently evaluates how changes in the active and reactive power of DGs impact the voltage profile of the network.
The associate editor coordinating the review of this manuscript and approving it for publication was Lei Chen.
There are two conventional sensitivity analysis approaches that evaluate the change in voltage at a specific bus resulting from a change in active and/or reactive power at another bus: (1) the "Jacobian-based" approach, which uses the inverse of the Jacobian matrix formed in the Newton-Raphson power flow method, and (2) the perturb-and-observe approach. The Jacobian-based approach was used in [5] for voltage sensitivity analysis and for selecting the optimal reactive control of DGs, and in [6], [7] for sensitivity analysis and voltage control in distribution systems with wind and photovoltaic generation. It must be noted that the Jacobian-based approach to sensitivity analysis requires close monitoring and full observation of the corresponding network. Additionally, the voltage sensitivities must be updated every time the state of the network changes (i.e., every time there is a change in consumption and/or generation). Thus, Jacobian-based voltage control methods, such as [5]-[11], cannot be used for proactive voltage control, because the exact future demand and generation are not known. In other words, proactive voltage control with such approaches is feasible only when deterministic predictions of demand and generation are used. Consequently, these control approaches are mostly applied after a voltage violation is observed [5], [8].
In [12], the perturb-and-observe approach was used for voltage sensitivity analysis. Unlike Jacobian-based methods, the perturb-and-observe approach is also applicable when power flow is calculated using an algorithm other than Newton-Raphson [13]; but, similar to Jacobian-based methods, it requires the sensitivity analysis to be re-evaluated every time the system changes. This is because these conventional methods provide only local sensitivity analysis and do not offer analytical insight into how a change in power affects the voltage profile. Consequently, during operation one has to either repeatedly update the Jacobian matrix or run several power flow simulations for different system states. This need for repeated simulation runs has motivated studies that develop analytical approximations of voltage change. For instance, in [14], data was used to approximate voltage change as a linear function. However, a linear approximation may not be accurate when large changes in power generation or consumption are considered. In [15], an analytical approach to sensitivity analysis was proposed, but several simplifying assumptions (e.g., negligible power losses) were made, which can compromise accuracy. In [13], a surface-fitting approach was used to approximate voltage, but it required an unnecessarily large number of simulations. Moreover, none of these studies treated power consumption as random. The random behavior of consumers and renewable energy sources can cause random fluctuations in the voltage profile of a distribution network, which calls for methods that calculate the voltage change probabilistically. In [16], [17], an upper bound on the change in voltage at a specific bus resulting from changes in active and reactive power at other buses was derived and verified using simulations. However, this upper bound can be substantially greater than typical voltage changes, resulting in an overly conservative and unrealistic voltage analysis.
To summarize, very few works have fully accounted for uncertainty in the demand and generation of power systems in voltage control problems. This scarcity is mainly due to the high computational cost of developing analytical models and performing sensitivity analysis in the presence of a large number of uncertainty sources. In this work, we propose a model reduction technique that facilitates building polynomial surrogates for distribution systems with many random parameters. Recently, polynomial surrogates have been successfully used in probabilistic power flow analysis [18], [19], where the power system response is approximated as a function of random inputs under a fixed decision or control action. Note that, for these surrogates to be used in a control framework, a new surrogate must be constructed from new simulation runs for every new control action. This becomes computationally prohibitive for problems with numerous candidate control actions.
The ultimate goal of this work is to develop a computationally efficient framework for voltage control of distribution systems with DGs. To this end, we use surrogates, i.e., analytical functions that represent the system response (e.g., voltages) as a function of inputs (e.g., power consumption). Specifically, we propose a surrogate defined over an augmented input space that includes both the random system parameters and the decision or control variables of distribution systems with DGs. The proposed surrogate is a polynomial model that approximates the voltage at each bus as a function of the following input variables: (1) random parameters, including power consumptions, active powers generated by uncontrollable DGs, and active powers generated by controllable DGs; and (2) decision variables, including the power factors of controllable DGs.
As a demonstration, we consider a general scenario with high penetration of DGs into the distribution system, and treat power consumption and generation at all buses as correlated random variables. Moreover, we consider the realistic case where some of the DGs are not controllable and operate at unity power factor. We will show that the proposed surrogate (i) facilitates efficient calculation of the probability distribution of voltage levels, (ii) enables fast global sensitivity analysis and offers analytical insights into how changes in power at a given bus impact voltages at other buses, and (iii) enables a surrogate-based control approach whose computational cost is substantially smaller than that of the conventional perturb-and-observe approach. To the best of our knowledge, this work is the first attempt at developing analytical surrogates, in the augmented space of random parameters and decision variables, that allow computationally affordable probabilistic voltage control.
The main challenge in building an accurate polynomial surrogate for distribution systems is the high dimensionality of the random inputs: as the number of random inputs increases, the number of simulations needed to form a sufficiently large training set also increases significantly. In this work, we propose a novel divide-and-conquer approach that identifies the influential inputs and excludes the non-influential parameters from the set of random inputs, thereby reducing the dimensionality of the model and, in turn, the number of simulation samples required in the training set. This model reduction technique allows us to avoid simplifying assumptions and to include all influential sources of uncertainty in the voltage control scheme. The proposed divide-and-conquer approach also identifies the appropriate polynomial order for the surrogate and thus can demonstrably improve the surrogate accuracy.
VOLUME 8, 2020
In summary, the contributions of this paper are:
1. Developing polynomial surrogates for voltage levels in distribution systems over an augmented input space that includes random parameters as well as decision variables.
2. Proposing a divide-and-conquer model reduction approach that facilitates the training of surrogates for distribution systems with a large number of random inputs.
3. Proposing a surrogate-based probabilistic voltage control algorithm for distribution systems with DGs.
4. Demonstrating the applicability and efficiency of surrogate-based voltage control of distribution networks with DGs in the presence of many correlated random inputs.
The rest of the paper is organized as follows. Section II presents general concepts and theoretical background on polynomial approximation, as well as the proposed divide-and-conquer approach. Section III introduces our surrogate-based probabilistic voltage sensitivity approach along with the proposed voltage control algorithm. Section IV demonstrates the validity and efficiency of the proposed surrogate-based approach on the IEEE 69-bus system.

II. POLYNOMIAL SURROGATES FOR VOLTAGE LEVELS
As mentioned earlier, we seek to build a surrogate to approximate the voltage at each bus as a function of input variables, which include random parameters and decision variables. The resulting surrogate will be used to calculate, for each bus, the voltage probability distribution and, in turn, the voltage violation probability, which is the probability of the voltage not being in a "safe" range. In what follows, we provide the technical background for polynomial surrogates and propose an efficient approach to train these surrogates.

A. POLYNOMIAL SURROGATES
The polynomial chaos expansion (PCE), or polynomial surrogate, is one of the most widely used surrogates that approximates the system response (in our case the voltage) by a polynomial function in the space of random inputs and can thus replace the full-scale expensive simulation to enable fast response calculation [20], [21]. Specifically, let u denote the voltage at a given bus and x denote the vector of input variables. Then, the PCE produces u(x), which is a functional representation of the system response in the form of an expansion with orthogonal bases that are polynomial functions of input variables x.
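Let us first consider the case where the inputs $x = (x_1, \ldots, x_d)$ are independent random variables. Let $I_x \subseteq \mathbb{R}^d$ be the support of x (i.e., $x_i \in I_{x_i}$) and let $\rho(x) = \prod_{i=1}^{d} \rho_i(x_i)$ be the joint probability density function (PDF) of x, where $\rho_i$ is the PDF of $x_i$. Given this setting, we form the set of univariate orthonormal polynomials $\{\psi_{\alpha,i}\}_{\alpha \in \mathbb{N}_0}$, which by design satisfies

$$\int_{I_{x_i}} \psi_{\alpha,i}(x_i)\, \psi_{\beta,i}(x_i)\, \rho_i(x_i)\, dx_i = \delta_{\alpha\beta}, \qquad \alpha, \beta \in \mathbb{N}_0, \tag{1}$$

where $\delta_{\alpha\beta}$ is the Kronecker delta. The PDF of $x_i$, $\rho_i(x_i)$, determines which polynomial family $\{\psi_{\alpha,i}\}$ should be used. For example, for Gaussian and uniform distributions, Hermite and Legendre polynomials, respectively, should be used as the orthonormal polynomials. After the type of one-dimensional polynomials is determined, the d-dimensional orthonormal polynomials are obtained by multiplying the one-dimensional polynomials in all dimensions. For example, the d-dimensional polynomial formed from one-dimensional polynomials of orders $\alpha_1, \alpha_2, \ldots, \alpha_d$ in dimensions $1, 2, \ldots, d$ is given by

$$\psi_{\alpha}(x) = \prod_{i=1}^{d} \psi_{\alpha_i,i}(x_i), \tag{2}$$

where the multi-index $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_d)$. Consequently, orthonormality also holds for the d-dimensional polynomials, i.e.,

$$\int_{I_x} \psi_{\alpha}(x)\, \psi_{\beta}(x)\, \rho(x)\, dx = \delta_{\alpha\beta}, \qquad \alpha, \beta \in \mathbb{N}_0^d. \tag{3}$$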
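To make the orthonormality condition and the product construction above concrete, the following Python sketch builds orthonormal Hermite polynomials (for standard normal inputs) and Legendre polynomials (for uniform inputs on [-1, 1]) with NumPy, forms a d-dimensional basis function as a product of univariate ones, and checks orthonormality numerically with Gauss quadrature. It is an illustration under these assumptions rather than the authors' implementation, and the helper names are hypothetical.

```python
# Illustrative sketch; helper names are hypothetical.
import math

import numpy as np
from numpy.polynomial import hermite_e, legendre


def hermite_orthonormal(n, x):
    """psi_n(x) = He_n(x) / sqrt(n!), orthonormal w.r.t. the standard normal PDF."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return hermite_e.hermeval(x, coeffs) / math.sqrt(math.factorial(n))


def legendre_orthonormal(n, x):
    """psi_n(x) = sqrt(2n + 1) P_n(x), orthonormal w.r.t. the uniform PDF on [-1, 1]."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return math.sqrt(2 * n + 1) * legendre.legval(x, coeffs)


def multivariate_basis(alpha, x, univariate=hermite_orthonormal):
    """psi_alpha(x) = prod_i psi_{alpha_i}(x_i); x has shape (..., d)."""
    x = np.asarray(x, dtype=float)
    out = np.ones(x.shape[:-1])
    for i, a in enumerate(alpha):
        out = out * univariate(a, x[..., i])
    return out


def gram_matrix(max_order, family="hermite", n_quad=20):
    """E[psi_m(X) psi_n(X)] for m, n <= max_order by Gauss quadrature (identity if orthonormal)."""
    if family == "hermite":
        nodes, weights = hermite_e.hermegauss(n_quad)
        weights = weights / math.sqrt(2 * math.pi)  # weight exp(-x^2/2) -> N(0, 1) PDF
        psi = hermite_orthonormal
    else:
        nodes, weights = legendre.leggauss(n_quad)
        weights = weights / 2.0  # weight 1 on [-1, 1] -> U(-1, 1) PDF
        psi = legendre_orthonormal
    vals = np.array([psi(n, nodes) for n in range(max_order + 1)])
    return (vals * weights) @ vals.T
```

For example, `gram_matrix(4, "legendre")` should return (numerically) the 5 x 5 identity matrix, which is exactly the orthonormality property that allows moments and Sobol' indices to be read off the coefficients when inputs are independent.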
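Using this construction, any square-integrable function $u(x): I_x \to \mathbb{R}$ can be represented as

$$u(x) = \sum_{\alpha \in \mathbb{N}_0^d} c_\alpha \psi_\alpha(x), \tag{4}$$

where $\{\psi_\alpha\}_{\alpha \in \mathbb{N}_0^d}$ is the set of orthonormal basis functions satisfying Equation (3) [21]. For computation, however, u(x) is approximated by a finite-order truncation of the PCE given by

$$u_k(x) = \sum_{\alpha \in \Lambda_{d,k}} c_\alpha \psi_\alpha(x), \tag{5}$$

where k is the total order of the polynomial expansion and $\Lambda_{d,k}$ is the set of multi-indices defined as

$$\Lambda_{d,k} = \Big\{ \alpha \in \mathbb{N}_0^d : \sum_{i=1}^{d} \alpha_i \le k \Big\}. \tag{6}$$

The cardinality of $\Lambda_{d,k}$, i.e., the number of expansion terms, here denoted by K, is a function of d and k according to

$$K = \frac{(d+k)!}{d!\,k!}. \tag{7}$$

Given this setting, $u_k(x)$ approximates u(x) in the mean-square sense and is referred to as the k-th degree PCE approximation of u(x) [21].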
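The total-degree truncation set and its cardinality can be checked with a short Python sketch. It enumerates every multi-index whose entries sum to at most k and compares the count with the binomial formula for K; the function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical.
from itertools import combinations_with_replacement
from math import comb


def total_degree_multi_indices(d, k):
    """All multi-indices alpha in N_0^d with alpha_1 + ... + alpha_d <= k."""
    indices = []
    for total in range(k + 1):
        # each multiset of `total` dimension labels corresponds to one multi-index of that degree
        for dims in combinations_with_replacement(range(d), total):
            alpha = [0] * d
            for i in dims:
                alpha[i] += 1
            indices.append(tuple(alpha))
    return indices


def num_terms(d, k):
    """Number of expansion terms K = (d + k)! / (d! k!)."""
    return comb(d + k, k)


print(len(total_degree_multi_indices(3, 2)), num_terms(3, 2))  # 10 10
print(num_terms(150, 2))                                       # 11476
```

Enumerating multisets of dimension labels avoids looping over all $(k+1)^d$ candidate tuples, which would be hopeless for d in the hundreds.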
To build a PCE surrogate, the specific goal is to calculate the vector of unknown expansion coefficients $c = (c_{\alpha_1}, \ldots, c_{\alpha_K})^T$ in Equation (5). To do so, we draw M input samples and compute the corresponding M system responses using the computer model, resulting in the data vector $u = (u(x^{(1)}), \ldots, u(x^{(M)}))^T$. Among the methods to estimate the PCE coefficients, regression is one of the most widely used; it concerns solving the following problem [22]

$$\min_{c} \; \| u - \Psi c \|_2^2, \tag{8}$$

where $\Psi$ is the $M \times K$ model matrix, constructed according to

$$\Psi_{mj} = \psi_{\alpha_j}(x^{(m)}), \qquad m = 1, \ldots, M, \quad j = 1, \ldots, K. \tag{9}$$

In the next sections, we address challenges associated with solving this regression problem for high-dimensional systems with different types of inputs.
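A minimal Python sketch of the regression step follows, assuming standard normal inputs (hence an orthonormal Hermite basis) and a toy response in place of a power flow solver. It builds the model matrix row by row from the multi-indices and solves the least-squares problem with `numpy.linalg.lstsq`; the function names are hypothetical.

```python
# Illustrative sketch; function names and the toy response are hypothetical.
import math
from itertools import combinations_with_replacement

import numpy as np
from numpy.polynomial import hermite_e


def multi_indices(d, k):
    """Total-degree multi-indices with entries summing to at most k."""
    out = []
    for total in range(k + 1):
        for dims in combinations_with_replacement(range(d), total):
            alpha = [0] * d
            for i in dims:
                alpha[i] += 1
            out.append(tuple(alpha))
    return out


def psi(n, x):
    """Orthonormal probabilists' Hermite polynomial of degree n."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return hermite_e.hermeval(x, c) / math.sqrt(math.factorial(n))


def model_matrix(X, alphas):
    """Psi[m, j] = psi_{alpha_j}(x^(m)); X has shape (M, d)."""
    Psi = np.ones((X.shape[0], len(alphas)))
    for j, alpha in enumerate(alphas):
        for i, a in enumerate(alpha):
            if a > 0:
                Psi[:, j] *= psi(a, X[:, i])
    return Psi


def fit_pce(X, u, k):
    """Least-squares estimate of c in min_c ||u - Psi c||_2^2."""
    alphas = multi_indices(X.shape[1], k)
    Psi = model_matrix(X, alphas)
    c, *_ = np.linalg.lstsq(Psi, u, rcond=None)
    return alphas, c


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))                 # M = 50 samples of d = 2 inputs
u = 1 + 2 * X[:, 0] + 0.5 * X[:, 0] * X[:, 1]    # toy response lying in the k = 2 basis span
alphas, c = fit_pce(X, u, k=2)
```

Because the toy response lies exactly in the span of the second-order basis and M > K, the recovered coefficients match the generating ones up to round-off; with a real simulator, the residual reflects truncation error instead.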

B. CHOICE OF INPUT VARIABLES
As mentioned earlier, the input variables in this work are not necessarily independent random variables. Specifically, the vector of input variables x includes correlated random parameters (i.e. power consumption and active power generated by controllable and uncontrollable DGs) and decision variables (i.e. power factors of controllable DGs) that are not inherently random. In this section, we explain how to address the correlation issue, and also how power factors can be considered as input variables of the polynomial surrogate.

1) CORRELATED SYSTEM PARAMETERS
To construct a polynomial surrogate over correlated random inputs, the common practice is to convert the correlated variables into independent random variables using Cholesky decomposition or copulas [18], [19]. In this work, however, we deem this orthogonalization unnecessary. The only reason for making the variables uncorrelated is to maintain the orthogonality of the PCE basis functions, which is desirable for two reasons. First, an orthogonal basis set allows analytical evaluation of the first two statistical moments of the system output, as well as of the sensitivity indices, once the surrogate is built [21], [23]. However, even when the basis functions of a surrogate are not orthogonal with respect to the random inputs, one can still calculate the statistical moments and sensitivity indices numerically with high accuracy by evaluating the trained surrogate at a sufficiently large number of input samples. This numerical calculation incurs minimal computational cost, as it only involves evaluating an analytical function at the input samples. Second, orthogonality of the basis functions prevents the model matrix $\Psi$ from being highly coherent. Coherence of $\Psi$ is especially important when the number of samples is smaller than the number of unknown coefficients, so that an under-determined system must be solved to estimate the PCE coefficients [24]. This does not apply to our work either: as will be shown, by excluding non-influential inputs in the divide-and-conquer approach we effectively reduce the number of unknowns, and only a few hundred samples are enough to keep the regression problem overdetermined.
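The numerical alternative described above can be sketched in a few lines of Python: the surrogate is evaluated directly at correlated input samples, and its mean and standard deviation are computed from those evaluations, with no decorrelation of the surrogate's inputs. The linear toy surrogate, its coefficients, and the helper names are hypothetical; only the correlation value 0.8 mirrors the case study.

```python
# Illustrative sketch; helper names and the toy surrogate are hypothetical.
import numpy as np


def equicorrelated_cov(std, rho):
    """Covariance matrix with per-input standard deviations `std` and common pairwise correlation `rho`."""
    std = np.asarray(std, dtype=float)
    corr = np.full((std.size, std.size), rho)
    np.fill_diagonal(corr, 1.0)
    return corr * np.outer(std, std)


def surrogate_moments(surrogate, mean, cov, n_eval=100_000, seed=0):
    """Estimate mean and standard deviation of surrogate(X) from correlated Gaussian samples.

    The surrogate's inputs are not decorrelated: it is simply evaluated at the
    correlated samples and the moments are computed numerically.
    """
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(mean, cov, size=n_eval)
    u = surrogate(X)
    return float(u.mean()), float(u.std(ddof=1))


# toy linear surrogate with two correlated inputs (hypothetical coefficients)
a = np.array([-0.01, 0.02])
m, s = surrogate_moments(lambda X: 1.0 + X @ a, np.zeros(2), equicorrelated_cov([1.0, 1.0], 0.8))
# exact values for comparison: mean 1.0, std sqrt(a @ cov @ a) = sqrt(1.8e-4) ≈ 0.0134
```

The cost is one vectorized polynomial evaluation per sample, which is negligible compared with a single power flow run.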

2) TREATING DECISION VARIABLES AS INPUT VARIABLES
In control actions targeted at preventing voltage violations, the power factors of controllable DGs are among the main quantities to control. We specifically include power factors as input variables of the PCE surrogates, in addition to the random inputs, because with the resulting surrogate one can analytically evaluate the probabilistic system conditions for various choices of power factors. Specifically, we treat power factors as input variables uniformly distributed between the minimum and maximum allowable values, so the training samples are uniformly distributed within that allowable range. Note that we do not assume that power factors are random system parameters. We only treat them as additional input variables, assumed to be uniformly distributed over a prescribed range, so that a controller can use the resulting surrogates to analytically calculate voltage levels for different power factor choices as possible control actions. At each fixed value of the power factors, the probability distributions of the voltage levels are due only to the random system parameters (e.g., power consumptions).
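The following Python sketch shows one way to generate training values for the power factors as uniformly distributed inputs, map them to [-1, 1] so that an orthonormal Legendre basis applies, and compute the corresponding reactive power magnitude $|Q| = P \tan(\arccos(\mathrm{pf}))$. The range [0.8, 1] is taken from the case study; the helper names are hypothetical, and the sign of Q (absorbing or injecting) is deliberately left out because it depends on the DG control mode.

```python
# Illustrative sketch; helper names are hypothetical, PF range from the case study.
import numpy as np

PF_MIN, PF_MAX = 0.8, 1.0   # allowable power-factor range of controllable DGs


def sample_power_factors(n_samples, n_dg, seed=0):
    """Training values of the decision variables: pf ~ U(PF_MIN, PF_MAX), independently per DG."""
    rng = np.random.default_rng(seed)
    return rng.uniform(PF_MIN, PF_MAX, size=(n_samples, n_dg))


def to_legendre_domain(pf):
    """Affine map [PF_MIN, PF_MAX] -> [-1, 1], the domain of the orthonormal Legendre basis."""
    return 2.0 * (pf - PF_MIN) / (PF_MAX - PF_MIN) - 1.0


def reactive_power_magnitude(p, pf):
    """|Q| = P * tan(arccos(pf)) for a DG producing active power P at power factor pf."""
    return p * np.tan(np.arccos(pf))
```

For instance, a controllable DG producing its mean output of 0.135 MW at pf = 0.8 exchanges $|Q| = 0.135 \times 0.75 = 0.10125$ MVAr.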

C. MODEL REDUCTION
A main challenge in building a PCE surrogate using the regression problem in (8) is the curse of dimensionality. Specifically, as the dimensionality of the problem (i.e., the number of random inputs) increases, the number of polynomial basis functions increases significantly. For instance, for a system with 150 input variables, a second-order polynomial surrogate includes a total of K = 11,476 unknown coefficients. This means that at least 11,476 runs of the computer model are needed for the regression problem not to be under-determined.
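The coefficient count quoted above follows directly from the cardinality formula for K. For comparison, the same formula is evaluated below for a hypothetical reduced surrogate that retains only d = 10 inputs, which illustrates why removing non-influential inputs brings the required number of simulations down to a few hundred:

$$K(150, 2) = \frac{152!}{150!\,2!} = \frac{152 \times 151}{2} = 11{,}476, \qquad K(10, 2) = \frac{12!}{10!\,2!} = 66, \qquad K(10, 3) = \frac{13!}{10!\,3!} = 286.$$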
In this work, we seek to reduce the dimensionality of the problem by removing the input variables that are not influential. In large-scale physical systems, the response of interest is unlikely to depend equally on all the inputs, and the variation in some of the inputs may have a negligible impact on the system response. Capitalizing on this possibility, we aim to exclude such non-influential inputs in order to lower the number of required computer simulations. Specifically, to determine whether or not an input is influential, we use an incremental approach: we start with a second-order polynomial and try adding input variables to the model, one variable at a time. For each trial, we train the surrogate and compute the "calculation" error, chosen to be the $\ell_2$ norm of the regression residual evaluated on the training data. Out of the tried inputs, we keep the one that results in the smallest calculation error. In subsequent iterations, we try either including more input variables or increasing the polynomial order, and accept the trial that leads to the largest reduction in calculation error. Finally, we stop adding inputs and increasing the polynomial order when there is no further improvement in the "validation" error, chosen to be the $\ell_2$ norm of the regression residual evaluated on a separate test data set.
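Algorithm 1 shows the pseudocode of the proposed divide-and-conquer approach. In this pseudocode, at step t, $k_t$ is the expansion order and $x^r_t$ is the reduced-dimension vector that includes only the "influential" inputs. Corresponding to this vector, $\Psi_d$ and $\Psi_k$ denote the reduced model matrices (with fewer than K columns) of the two trials, adding an input and increasing the order, respectively, evaluated at the training samples. In the validation step, $\Psi^r_{test}$ is the reduced model matrix evaluated at the test data, and $u_{test}$ denotes the response vector evaluated at the test samples.

Algorithm 1: Divide-and-conquer model reduction
Input: training data (x^(m), u(x^(m))), m = 1, ..., M; test data (x_test, u_test).
1: Initialize t = 0, k_0 = 2, x^r_0 = ∅, ε_min = ∞.
2: repeat
3:   for each input x_i not in x^r_t do
4:     Add ith input: x^r_t ← x^r_t ∪ {x_i}; build Ψ_i at order k_t.
5:     Calculate error: e_i = min_c ||u − Ψ_i c||_2^2.
6:     Remove ith input: x^r_t ← x^r_t \ {x_i}.
7:   end for
8:   Pick the best dimension: i* = argmin_i {e_i}; let e_d = e_{i*}, with minimizer c_d and model matrix Ψ_d.
9:   Try increasing the order to k_t + 1, keeping x^r_t unchanged; build Ψ_k.
10:  Calculate error: e_k = min_c ||u − Ψ_k c||_2^2, with minimizer c_k.
11:  if e_k < e_d then
12:    c* = c_k, and build Ψ^r_test based on Ψ_k.
13:  else
14:    c* = c_d, and build Ψ^r_test based on Ψ_d.
15:  end if
16:  Calculate validation error: ε = ||u_test − Ψ^r_test c*||_2^2.
17:  if ε < ε_min then
18:    ε_min = ε and t = t + 1.
19:    Accept the change in x^r_t (add x_{i*}) or k_t (increase by one), accordingly.
20:  else
21:    Stop.
22:  end if
23: until stop
Output: reduced input vector x^r_t, order k_t, and coefficients c*.

The main computational cost of the proposed divide-and-conquer approach is due to the least-squares minimizations, whose cost depends on the size of $\Psi^r_t$ at each step. Since this is an incremental algorithm that solves reduced models with very few inputs at each iteration, the algorithm can be expected to be very efficient. At the conclusion of this algorithm, for probabilistic voltage calculation at each bus, we have a reduced polynomial surrogate that includes only a subset of the random parameters and decision variables.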

III. VOLTAGE CONTROL USING POLYNOMIAL SURROGATES
Here, we briefly explain the proposed probabilistic voltage control framework. First, using the reduced surrogate proposed in the previous section, we calculate for each bus the voltage violation probability, i.e., the probability that the voltage at that bus is not within the "safe" range. A critical bus is then defined as a bus with a violation probability greater than a prescribed threshold (taken to be 5% in the numerical case of this paper), and the most critical bus is the critical bus with the highest violation probability. At each control step, we focus on the most critical bus and identify the most influential DG for that bus. For this identification, we rank the influential (controllable) DGs using a surrogate-based sensitivity analysis. In what follows, we provide the background on global sensitivity analysis and then explain the surrogate-based control approach in detail.
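The definitions of violation probability, critical bus, and most critical bus translate directly into a few lines of Python. This sketch assumes the surrogate voltages have already been evaluated at the sampled inputs and stored in an array with one column per bus; the safe range [0.95, 1.05] p.u. and the 5% threshold are the values used in the case study, and the function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical, limits from the case study.
import numpy as np

V_MIN, V_MAX = 0.95, 1.05   # safe voltage range in p.u.
THRESHOLD = 0.05            # violation-probability threshold defining a critical bus


def violation_probabilities(V):
    """V has shape (N_ev, n_bus): surrogate voltages at N_ev sampled inputs for every bus."""
    return np.mean((V < V_MIN) | (V > V_MAX), axis=0)


def most_critical_bus(V):
    """(0-based bus index, probability) of the most critical bus, or None if no bus is critical."""
    p = violation_probabilities(V)
    critical = np.flatnonzero(p > THRESHOLD)
    if critical.size == 0:
        return None
    b = critical[np.argmax(p[critical])]
    return int(b), float(p[b])
```

Because only surrogate evaluations are involved, this check can be repeated after every candidate control action at negligible cost.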

A. SURROGATE-BASED SENSITIVITY ANALYSIS
Sensitivity analysis studies how variability (or uncertainty) of each input impacts the variability (or uncertainty) of the system's response. The two main categories of sensitivity analysis include local and global sensitivity analysis. Local sensitivity analysis methods typically consider varying an input variable around a fixed point in the input space while keeping the rest of the inputs as fixed or deterministic. Therefore, they measure the response sensitivity only in a small neighborhood in the input space. Global sensitivity analysis, on the other hand, considers the variability in the whole input space and studies how such global input variability induces variability in the system's response.
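One of the most widely used global sensitivity analysis approaches is the Sobol' method [25], in which the variance of the system response is decomposed into a sum of the variances of different terms in the model, and sensitivity is evaluated by Sobol' indices. Traditionally, Sobol' indices are calculated using Monte Carlo (MC) simulations. However, once a PCE surrogate has been constructed to replace expensive simulations, the Sobol' indices can be calculated from the surrogate: analytically if the inputs are uncorrelated [23], or with minimal computational cost if they are correlated [26]. To evaluate Sobol' indices using a PCE surrogate, we first need to decompose the PCE based on the indices of its terms. Let $\nu \subseteq \{1, \ldots, d\}$ be a generic index set that labels the inputs that are varied, and denote these labeled inputs by $x_\nu$, a subvector of x. We also define $\Lambda^{\nu}_{d,k}$ as the set of multi-indices associated with these labeled inputs; that is, $\Lambda^{\nu}_{d,k}$ contains all the multi-indices in $\Lambda_{d,k}$ whose entries satisfy $\alpha_p \neq 0$ if and only if $p \in \nu$:

$$\Lambda^{\nu}_{d,k} = \{ \alpha \in \Lambda_{d,k} : \alpha_p \neq 0 \iff p \in \nu \}. \tag{10}$$

We can now rewrite the PCE as a sum of terms that each depend only on the input variables $x_\nu$:

$$u_k(x) = c_0 + \sum_{\nu \subseteq \{1,\ldots,d\},\, \nu \neq \emptyset} u_{k,\nu}(x_\nu), \tag{11}$$

where $c_0$ is the coefficient of the constant basis function and

$$u_{k,\nu}(x_\nu) = \sum_{\alpha \in \Lambda^{\nu}_{d,k}} c_\alpha \psi_\alpha(x) \tag{12}$$

is the kth-order polynomial expansion that includes only the labeled inputs $x_\nu$. If the inputs are correlated (i.e., x is a vector of correlated variables), then, following [26], the variance of $u_k(x)$ can be calculated as

$$\mathrm{Var}[u_k(x)] = \sum_{\nu \neq \emptyset} \mathrm{Cov}[u_{k,\nu}(x_\nu),\, u_k(x)] = \sum_{\nu \neq \emptyset} \left( V^{a}_{\nu} + V^{b}_{\nu} \right), \tag{13}$$

where the structural contribution $V^{a}_{\nu}$ and the correlative contribution $V^{b}_{\nu}$ are

$$V^{a}_{\nu} = \mathrm{Var}[u_{k,\nu}(x_\nu)], \qquad V^{b}_{\nu} = \mathrm{Cov}\big[u_{k,\nu}(x_\nu),\, u_k(x) - u_{k,\nu}(x_\nu)\big]. \tag{14}$$

To calculate the total covariance-based sensitivity indices $S_\nu$, the contributions in (13) and (14) are normalized as follows [26]:

$$S_\nu = S^{a}_{\nu} + S^{b}_{\nu}, \tag{15}$$

$$S^{a}_{\nu} = \frac{V^{a}_{\nu}}{\mathrm{Var}[u_k(x)]}, \tag{16}$$

$$S^{b}_{\nu} = \frac{V^{b}_{\nu}}{\mathrm{Var}[u_k(x)]}. \tag{17}$$

These indices are calculated by evaluating the PCE surrogate at sample inputs only, which incurs minimal computational cost. In the case of uncorrelated random inputs, the correlative index in Equation (17) vanishes, and the Sobol' indices are given by Equation (16). Few studies have so far reported global sensitivity analysis for power systems.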
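The sample-based evaluation of the structural and correlative indices in Equations (16) and (17) can be sketched in Python as follows. This minimal sketch assumes the surrogate has already been split into its component functions, one callable per index set ν, and replaces every variance and covariance by a sample estimate; the function names and toy components are hypothetical.

```python
# Illustrative sketch; function names and toy components are hypothetical.
import numpy as np


def _cov(a, b):
    """Population covariance of two 1-D sample arrays."""
    return float(np.mean((a - a.mean()) * (b - b.mean())))


def covariance_indices(components, X):
    """Structural (S^a) and correlative (S^b) indices of u = c0 + sum_nu u_nu(x_nu).

    `components` maps a label nu (tuple of input indices) to a callable returning
    u_nu evaluated at the rows of X; expectations are replaced by sample averages.
    """
    parts = {nu: f(X) for nu, f in components.items()}
    u = sum(parts.values())           # the constant c0 does not affect any (co)variance
    var_u = _cov(u, u)
    s_a = {nu: _cov(p, p) / var_u for nu, p in parts.items()}        # structural part
    s_b = {nu: _cov(p, u - p) / var_u for nu, p in parts.items()}    # correlative part
    return s_a, s_b


rng = np.random.default_rng(1)
comps = {(0,): lambda X: X[:, 0], (1,): lambda X: X[:, 1]}
X_corr = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=200_000)
s_a, s_b = covariance_indices(comps, X_corr)
# for u = x0 + x1 with correlation 0.8: Var[u] = 3.6, so S^a ≈ 1/3.6 and S^b ≈ 0.8/3.6 per input
```

By construction, the structural and correlative indices sum to one over all index sets, and for independent inputs the correlative indices shrink toward zero as the sample size grows.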
In [27], the active powers at different buses are considered uncertain, and global sensitivity analysis is used to rank the buses whose active power most strongly influences the system response, e.g., voltages and branch currents. In distribution systems, however, it is typically the power factors of distributed generators that are modified for voltage control [5]. In what follows, we discuss how such voltage control can be performed effectively by leveraging sensitivity analysis and carefully selecting the buses at which power factor modification yields the best result.

B. SURROGATE-BASED VOLTAGE CONTROL
In this section, we thoroughly explain the proposed probabilistic voltage control. Figure 1 summarizes the proposed methodology for surrogate-based probabilistic voltage control, which consists of four procedures: (1) surrogate training, (2) surrogate-based power flow analysis, (3) surrogate-based sensitivity analysis, and (4) control action. After training the voltage surrogates, a surrogate-based probabilistic power flow analysis is performed by calling the surrogate and calculating the voltage PDF at every bus. Then, we identify the critical buses (i.e., buses with voltage violation probability larger than the prescribed threshold). In the next step, we target the most critical bus and perform the surrogate-based sensitivity analysis to rank the respective influential DGs (and accordingly the influential control actions). This ranking is carried out based on the structural sensitivity indices, and not the correlative ones, because control actions are taken one at a time and do not cause any correlative impact.
Once the sensitivity analysis is performed and the influential DGs are ranked, the last step is to modify the operation of the influential DGs in order to bring the voltage violation probability at the most critical bus within the safety threshold. This is done by reducing the power factor of the top-ranked controllable DG. After each power factor reduction, a new surrogate-based power flow analysis is performed to evaluate the updated voltage probability distributions. If the violation probability at the targeted bus is still above the threshold after the power factors of all influential DGs have been lowered to the minimum allowable level, $\mathrm{PF}_{\min}$, the algorithm sequentially curtails active power generation at the top-ranked controllable DGs to reduce the violation probability. Finally, once the violation probability is within the specified threshold at every bus, the algorithm terminates.
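The two-stage control loop described above can be sketched in Python, assuming a callable that re-runs the surrogate-based power flow and returns the violation probability of the targeted bus. This is a minimal sketch, not the authors' implementation: the step sizes `pf_step` and `curtail_step`, the starting power factor `pf0`, and all names are hypothetical, while the 5% threshold and $\mathrm{PF}_{\min}$ = 0.8 follow the case study.

```python
# Illustrative sketch; step sizes, pf0 and all names are hypothetical.
import numpy as np


def surrogate_control(violation_prob, ranked_dgs, n_dg, threshold=0.05,
                      pf0=1.0, pf_min=0.8, pf_step=0.01, curtail_step=0.1):
    """Lower power factors of ranked DGs, then curtail active power, until the target bus is safe.

    violation_prob(pf, curtail) -> float stands in for the surrogate-based power flow of the
    most critical bus; ranked_dgs lists controllable DG indices in decreasing order of their
    structural sensitivity index.
    """
    pf = np.full(n_dg, pf0)          # current power factor of each controllable DG
    curtail = np.zeros(n_dg)         # curtailed fraction of each DG's active power
    p = violation_prob(pf, curtail)
    for dg in ranked_dgs:            # stage 1: power-factor reduction, top-ranked DG first
        while p > threshold and pf[dg] > pf_min:
            pf[dg] = max(pf_min, pf[dg] - pf_step)
            p = violation_prob(pf, curtail)          # re-evaluate the surrogate after each change
    for dg in ranked_dgs:            # stage 2: active-power curtailment, only if still needed
        while p > threshold and curtail[dg] < 1.0:
            curtail[dg] = min(1.0, curtail[dg] + curtail_step)
            p = violation_prob(pf, curtail)
    return pf, curtail, p
```

Each iteration costs only a surrogate evaluation rather than a batch of power flow runs, which is where the savings over perturb-and-observe come from.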
It should be noted that both the surrogate-based probabilistic power flow and the sensitivity analysis incur minimal computational cost, as they only involve evaluating the analytical surrogates at randomly drawn input samples. The choice of the number of evaluation samples is discussed in Section IV. Also, the proposed control approach is centralized, meaning that all control commands are sent by one central unit in the system. How the central unit communicates with the DGs is beyond the scope of this study; see, e.g., [28].
It should also be highlighted that the proposed surrogate-based sensitivity analysis and control approach does not require exact knowledge of consumption and generation levels at a specific control step. Instead, it only uses information about the probability distributions of generation and consumption levels. These distributions can be obtained from offline analysis of historical data, so demand and generation levels do not necessarily need to be closely monitored. In contrast, Jacobian-based methods necessitate extensive system monitoring [13]. Our approach is also significantly more efficient than the conventional perturb-and-observe approach [12], because for each candidate control action it only evaluates analytical functions to calculate voltage distributions. This underscores the applicability of this approach to short-term probabilistic voltage control.
In the next section, we show the applicability and efficiency of our approach for a distribution network with a large number of correlated random inputs and decision variables. Note that the computational cost of constructing the surrogates does not depend directly on the size of the network, but rather on the number of random parameters and decision variables.

A. TEST CASE
The IEEE 69-bus test system (shown in Fig. 2) is chosen as the test case in this study. The nominal voltage of the system is 12.66 kV, and the voltage of the generator connected to node 1 is set to 1.04 per unit.

1) GENERATION UNCERTAINTY
In this work, we consider small DG units that operate at a constant power factor as uncontrollable DGs, whose reactive power cannot be controlled. On the other hand, we consider larger DG units, whose power factor can be modified during voltage control, as controllable DGs. Both controllable and uncontrollable DGs are assumed to generate electricity from renewable energy sources. We assume that small uncontrollable DG units exist at each bus. For all uncontrollable DG units, the random active power generation follows N(0.02, 0.005), with a mean of 0.02 MW. The controllable DGs are located at the nodes whose numbers are divisible by 4 (shown by thick lines in Fig. 2), and their random active power generation follows N(0.135, 0.04), with a mean of 0.135 MW. We also set the correlation coefficient between any pair of active power generations in the network to 0.8, because distributed generators typically use renewable energy sources that are random and highly correlated.
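The generation model above can be sampled jointly with NumPy, as in the following sketch. Two readings are assumptions, labeled as such: the second parameter of N(·, ·) is taken to be a standard deviation in MW, and the uncontrollable DGs are placed at buses 2 to 69 (68 units) with controllable DGs at buses 4, 8, ..., 68 (17 units). The constant and function names are hypothetical.

```python
# Illustrative sketch; bus ranges, the std reading of N(., .), and names are assumptions.
import numpy as np

UNCONTROLLABLE_BUSES = np.arange(2, 70)     # assumed: one small DG at each of buses 2-69
CONTROLLABLE_BUSES = np.arange(4, 70, 4)    # buses whose numbers are divisible by 4


def sample_generation(n_samples, rho=0.8, seed=0):
    """Jointly sample DG active powers (MW) with a common pairwise correlation rho."""
    n_u, n_c = UNCONTROLLABLE_BUSES.size, CONTROLLABLE_BUSES.size
    mean = np.concatenate([np.full(n_u, 0.02), np.full(n_c, 0.135)])
    std = np.concatenate([np.full(n_u, 0.005), np.full(n_c, 0.04)])   # assumed std in MW
    corr = np.full((n_u + n_c, n_u + n_c), rho)
    np.fill_diagonal(corr, 1.0)
    cov = corr * np.outer(std, std)
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_samples)   # columns: uncontrollable, then controllable
```

With a common correlation of 0.8, the correlation matrix has eigenvalues 0.2 and 1 + (n - 1) × 0.8, so it is positive definite for any number of units and can be sampled directly. A normal model can in principle produce negative outputs, although at four standard deviations below the mean this is extremely rare here.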

2) DEMAND UNCERTAINTY
We assume that the power consumption at every bus follows a normal distribution, with its mean set to the nominal value of the benchmark IEEE 69-bus system and its standard deviation set to 10% of the corresponding mean. As with the generation levels, we assume the correlation coefficient between any pair of active power consumptions to be 0.8. We set the power factor of uncontrollable DGs to 1 and the allowable range of power factors for controllable DGs to [0.8, 1]. These assumptions are made without loss of generality to enable the numerical validation of our approach. Actual power consumption and generation distributions, together with their correlation characteristics, can be estimated from historical data, the geographical extent of the distribution system, the spacing of DGs in the system, weather conditions, etc. Note that any continuous probability distribution can be transformed into a uniform or normal distribution, which can then be used with Legendre or Hermite polynomials, respectively, in a voltage surrogate. In general, it is easier to transform non-normal distributions into uniform distributions and use Legendre-based polynomial surrogates, because an arbitrary continuous random variable can simply be transformed into a uniform random variable using its cumulative distribution function (cdf). If needed, transformation into normal random variables can also be done (see [29], [30] for more details). Next, we provide validation results showing the efficiency and applicability of the proposed surrogate-based approach. In particular, we compare the surrogate-based power flow analysis against MC sampling, and the surrogate-based voltage control against the perturb-and-observe approach.
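The cdf-based transformation mentioned above can be illustrated with a hypothetical non-normal input, for example a Weibull-distributed quantity such as wind speed (the case study itself uses normal inputs). In the Python sketch below, U = F(X) is uniform on (0, 1), and ξ = 2U - 1 is then a uniform variable on [-1, 1] suitable for a Legendre basis; the inverse cdf maps back to physical values. The distribution and the function names are assumptions for illustration only.

```python
# Illustrative sketch; the Weibull input and function names are hypothetical.
import numpy as np


def weibull_cdf(x, shape, scale):
    """F(x) = 1 - exp(-(x / scale)^shape) for x >= 0."""
    return 1.0 - np.exp(-(np.asarray(x, dtype=float) / scale) ** shape)


def weibull_inverse_cdf(p, shape, scale):
    """Inverse of weibull_cdf, used to map uniform values back to physical values."""
    return scale * (-np.log1p(-np.asarray(p, dtype=float))) ** (1.0 / shape)


def to_legendre_variable(x, shape, scale):
    """Probability integral transform U = F(X) ~ U(0, 1), then xi = 2U - 1 on [-1, 1]."""
    return 2.0 * weibull_cdf(x, shape, scale) - 1.0


rng = np.random.default_rng(0)
x = 8.0 * rng.weibull(2.0, 100_000)        # Weibull samples with shape 2 and scale 8
xi = to_legendre_variable(x, 2.0, 8.0)     # approximately uniform on [-1, 1]
```

The surrogate is then trained and evaluated in terms of ξ, and the transform is applied to any physical input value before evaluating the surrogate.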

B. VALIDATION OF SURROGATE-BASED VOLTAGE ESTIMATION
As the first step, we train a separate PCE surrogate at each bus to approximate its voltage as a function of all active power consumptions, the active powers generated by uncontrollable DGs, the active powers generated by controllable DGs, and the power factors of the controllable DGs. Each voltage surrogate therefore has a total of 170 input variables, which are characterized in Table 1. Out of these inputs, we identify and include only the influential variables using the proposed divide-and-conquer approach. To do so, we obtain simulation samples, i.e., random realizations of the input variables together with the associated voltage values obtained from power flow simulations. These samples comprise a 'training' set and a 'test' set; the test set always contains 100 samples and is used to calculate the validation error (see Algorithm 1). Next, we explain how the number of samples in the training set is determined.
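The regression step behind a PCE surrogate can be sketched as a least-squares fit of a total-degree Hermite expansion. This is a minimal sketch assuming independent standard-normal inputs and a hypothetical `toy_voltage` function standing in for power flow results; it does not implement the divide-and-conquer variable selection, and all helper names are hypothetical.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermeval


def total_degree_indices(dim, degree):
    """All multi-indices alpha with |alpha| <= degree (total-degree truncation)."""
    return [a for a in itertools.product(range(degree + 1), repeat=dim)
            if sum(a) <= degree]


def design_matrix(Z, indices):
    """Evaluate products of probabilists' Hermite polynomials He_k at inputs Z (M x dim)."""
    Psi = np.ones((Z.shape[0], len(indices)))
    for col, alpha in enumerate(indices):
        for j, k in enumerate(alpha):
            if k > 0:
                unit = np.zeros(k + 1)
                unit[k] = 1.0  # coefficient vector selecting He_k
                Psi[:, col] *= hermeval(Z[:, j], unit)
    return Psi


def fit_pce(Z, y, degree=2):
    """Least-squares (regression) PCE fit; returns the basis and its coefficients."""
    indices = total_degree_indices(Z.shape[1], degree)
    coef, *_ = np.linalg.lstsq(design_matrix(Z, indices), y, rcond=None)
    return indices, coef


def eval_pce(Z, indices, coef):
    return design_matrix(Z, indices) @ coef


def toy_voltage(Z):
    """Hypothetical smooth voltage response (p.u.) at one bus."""
    return (1.0 - 0.02 * Z[:, 0] + 0.01 * Z[:, 1]
            + 0.003 * Z[:, 0] * Z[:, 1] + 0.001 * Z[:, 2] ** 2)


rng = np.random.default_rng(1)
Z_train = rng.standard_normal((200, 3))
indices, coef = fit_pce(Z_train, toy_voltage(Z_train), degree=2)
Z_test = rng.standard_normal((100, 3))
max_err = np.max(np.abs(eval_pce(Z_test, indices, coef) - toy_voltage(Z_test)))
# coef[0] (the constant term) is the PCE estimate of the mean voltage.
```

A total-degree basis of degree 2 over all 170 inputs would already have C(172, 2) = 14,706 terms, far more than 500 training samples, which is why restricting each surrogate to its influential inputs matters. Correlated inputs would also have to be decorrelated (for example, via the Cholesky factor of their correlation matrix) before an orthogonal basis applies; that step is omitted here.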

1) SIZE OF TRAINING SET
To determine the number of training samples, we perform a preliminary convergence analysis: we train surrogates using different sample sizes, denoted by M, and record the approximation accuracy on a pre-specified test set. For each sample size M, we calculate the relative validation error averaged across all buses and over the 100 test samples. Specifically, the average relative validation error is given by
$$\bar{\varepsilon} = \frac{1}{69 \times 100} \sum_{i=1}^{69} \sum_{k=1}^{100} \frac{\left| \hat{V}_i^{(k)} - V_i^{(k)} \right|}{V_i^{(k)}},$$
where $V_i^{(k)}$ is the voltage magnitude at bus $i$ obtained from power flow simulation for the $k$-th test sample and $\hat{V}_i^{(k)}$ is the corresponding surrogate estimate. Figure 3 shows the convergence of this error measure versus the number of training samples. Good accuracy in approximating voltage levels is achieved with only a few hundred training samples. Accordingly, in this work we use 500 training samples, at which point the validation error appears to have converged.
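The average relative validation error can be computed directly from the test-set voltages. A minimal sketch, assuming voltages are stored as arrays of shape (number of test samples, number of buses); `train_and_predict` in `convergence_curve` is a hypothetical callable that trains the surrogates on M samples and returns their predictions for the fixed test inputs.

```python
import numpy as np


def average_relative_validation_error(V_true, V_pred):
    """Mean of |V_pred - V_true| / V_true over all test samples and buses.

    Both arrays have shape (n_test, n_bus), e.g. (100, 69), in p.u.
    """
    V_true = np.asarray(V_true, dtype=float)
    V_pred = np.asarray(V_pred, dtype=float)
    return float(np.mean(np.abs(V_pred - V_true) / np.abs(V_true)))


def convergence_curve(sizes, train_and_predict, V_test):
    """Validation error for each training-set size M (hypothetical trainer)."""
    return [average_relative_validation_error(V_test, train_and_predict(M))
            for M in sizes]
```

Plotting the returned list against `sizes` gives a convergence curve of the kind shown in Figure 3.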

2) VALIDATION AGAINST MC-BASED POWER FLOW RESULTS
In addition to the validation results of Fig. 3, we validate our surrogate-based voltage estimates against conventional MC-based estimates by comparing two probabilistic measures: (1) probability distributions and (2) voltage violation probabilities. To calculate these measures accurately, we need to determine the number of 'evaluation' samples, $N_{ev}$, at which the estimated measures have converged. In this work, convergence is defined as the point beyond which the changes in the estimated mean and standard deviation are within $10^{-3}$. To determine $N_{ev}$, we generate convergence plots for the voltage levels at all buses. As an example, Figure 4 shows the convergence of the mean and standard deviation of the MC-based voltage level at bus 20, a representative bus. Inspection of all 69 convergence plots showed that $N_{ev} = 5000$ evaluation samples are sufficient. Using these evaluation samples, the surrogate-based and MC-based probability distributions of voltage at bus 20, again as a representative case, are calculated and compared in Figure 5. We observed similarly good agreement at all other buses. Furthermore, Figure 6 compares the voltage violation probabilities, defined in this work as the probability that the voltage level is greater than 1.05 p.u. or smaller than 0.95 p.u. Note that, in determining $N_{ev}$ and producing the comparative results of Figures 5 and 6, we set all decision variables (i.e., the power factors of the controllable DGs) to 0.95.
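Both measures used in this comparison reduce to simple sample statistics. The sketch below estimates per-bus violation probabilities for the [0.95, 1.05] p.u. band and applies the convergence rule described above (changes in the running mean and standard deviation within $10^{-3}$) to choose $N_{ev}$. The checking interval `step` and the function names are assumptions, not taken from the paper.

```python
import numpy as np


def violation_probability(V, v_min=0.95, v_max=1.05):
    """Fraction of samples outside [v_min, v_max] p.u.; V has shape (n_samples, n_bus)."""
    V = np.asarray(V, dtype=float)
    return np.mean((V > v_max) | (V < v_min), axis=0)


def converged_sample_size(v, tol=1e-3, step=100):
    """Smallest N (a multiple of `step`) beyond which successive running mean and
    standard deviation estimates change by less than `tol`.

    Returns None if the last observed change still exceeds `tol`.
    Assumes v contains at least `step` samples.
    """
    v = np.asarray(v, dtype=float)
    sizes = np.arange(step, v.size + 1, step)
    means = np.array([v[:n].mean() for n in sizes])
    stds = np.array([v[:n].std(ddof=1) for n in sizes])
    # change[j] is the change between sizes[j] and sizes[j + 1]
    change = np.maximum(np.abs(np.diff(means)), np.abs(np.diff(stds)))
    bad = np.nonzero(change >= tol)[0]
    if bad.size == 0:
        return int(sizes[0])
    if bad[-1] == change.size - 1:
        return None  # not yet converged within the available samples
    return int(sizes[bad[-1] + 1])
```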

C. VALIDATION OF SURROGATE-BASED VOLTAGE CONTROL
In this task, we develop a control framework to ensure that the voltage violation probability, as defined earlier, is always less than 0.05 at every bus. After the voltage levels are estimated by either PCE or MC, Figure 6 shows that bus 27 has the highest voltage violation probability, at about 0.14. We then apply the proposed surrogate-based control approach and compare it step by step against the conventional perturb-and-observe approach. For brevity, we consider two control options: (1) power factor reduction, which sets the power factor of a controllable DG to its lowest value, $PF_{\min} = 0.8$; and (2) power curtailment, which reduces the active power (AP) generation of a controllable DG by 20%. Tables 2 and 3 show the predictive control steps for the surrogate-based and the perturb-and-observe approaches, respectively. As can be seen, the initial steps of the predictive control simply involve alleviating the violation probability at bus 27 by reducing the power factors of the corresponding most influential DGs. In these first steps, the surrogate-based approach requires no new simulations (beyond the original 500 'training' simulations) to evaluate the violation probabilities or to identify the most influential DG. In contrast, the perturb-and-observe approach requires 17 × 5000 new simulations in its first step alone to identify the most influential DG. This is because there are 17 candidate controllable DGs for PF reduction, each requiring 5000 simulation samples to estimate its after-control impact on the violation probability at bus 27. Out of these 17 candidate PF reductions, the perturb-and-observe approach selects the one that results in the lowest violation probability. Similarly, the second, third and fourth steps of this approach require 16 × 5000, 15 × 5000 and 14 × 5000 simulation samples, respectively, to select the most influential DGs.
TABLE 2. Surrogate-based probabilistic control steps to achieve violation probability less than 0.05 at all buses.
TABLE 3. Probabilistic control steps suggested by perturb-and-observe approach to achieve violation probability less than 0.05 at all buses.
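The step-by-step selection and the simulation count of the perturb-and-observe approach can be sketched as follows. `greedy_pf_reduction` takes a hypothetical callable `violation_prob` that maps a power-factor vector to per-bus violation probabilities: evaluated with the surrogate, each call needs no new power flow runs, whereas in perturb-and-observe each call would require 5000 simulations. The brute-force greedy scoring shown here is only an illustration and is not necessarily the sensitivity measure used in the proposed approach. `perturb_and_observe_cost` reproduces the count given in the text: the first four steps alone require (17 + 16 + 15 + 14) × 5000 = 310,000 simulations.

```python
import numpy as np


def greedy_pf_reduction(violation_prob, pf0, pf_min=0.8, threshold=0.05):
    """Repeatedly set to pf_min the controllable DG whose reduction gives the lowest
    predicted maximum violation probability, until all buses are below threshold.

    violation_prob: hypothetical callable, PF vector -> per-bus violation probabilities.
    Returns the final PF vector and the list of selected DG indices.
    """
    pf = np.array(pf0, dtype=float)
    actions = []
    while np.max(violation_prob(pf)) >= threshold:
        candidates = [i for i in range(pf.size) if pf[i] > pf_min]
        if not candidates:
            break  # PF option exhausted; another option (e.g. curtailment) is needed

        def score(i):
            trial = pf.copy()
            trial[i] = pf_min
            return np.max(violation_prob(trial))

        best = min(candidates, key=score)
        pf[best] = pf_min
        actions.append(best)
    return pf, actions


def perturb_and_observe_cost(n_candidates=17, n_steps=4, n_eval=5000):
    """Power flow runs when every remaining candidate is tested with n_eval MC samples."""
    return sum((n_candidates - s) * n_eval for s in range(n_steps))
```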

By Step 5 of the surrogate-based predictive control, the power factors of all influential DGs (i.e., those selected by the divide-and-conquer approach) had already been reduced to $PF_{\min}$. Therefore, we resort to the second control option (i.e., power curtailment at the influential DGs). After the active power at bus 24, the most influential bus, was curtailed, the new violation probabilities were calculated using the polynomial surrogate, and the maximum value was found to be 0.048 at bus 27. Since this value is very close to the threshold of 0.05, we use a refined violation probability estimate to minimize the impact of approximation errors. In particular, whenever the surrogate estimate of a voltage value falls within 1.05 ± 0.002 p.u., we replace it with a Monte Carlo estimate, i.e., the voltage obtained from a power flow simulation. For this control action, this results in 150 additional simulations. As can be seen, at Step 6 the 'refined' voltage violation probability was calculated to be 0.053. Therefore, at an additional step, the active power at the most influential bus (i.e., bus 24) is curtailed again (by 20%) to ensure that the highest violation probability is less than 0.05.
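The refined estimate can be sketched as a hybrid of surrogate and simulation values at one bus. A minimal sketch, assuming `V_sur` holds the surrogate voltage estimates for the evaluation samples and `exact_voltage(k)` is a hypothetical callable that runs a power flow simulation for sample k. The text describes a ±0.002 p.u. band around 1.05 p.u.; applying the same band around 0.95 p.u. is an assumption of this sketch.

```python
import numpy as np


def refined_violation_probability(V_sur, exact_voltage, v_min=0.95, v_max=1.05,
                                  band=0.002):
    """Violation probability at one bus, re-simulating samples near the limits.

    V_sur: surrogate voltage estimates (p.u.) for the evaluation samples.
    exact_voltage(k): hypothetical callable running a power flow for sample k.
    Returns the refined probability and the number of additional simulations.
    """
    V = np.array(V_sur, dtype=float)  # copy so the surrogate estimates are not modified
    near_limit = (np.abs(V - v_max) <= band) | (np.abs(V - v_min) <= band)
    idx = np.flatnonzero(near_limit)
    for k in idx:
        V[k] = exact_voltage(int(k))  # replace ambiguous surrogate values with simulations
    p = float(np.mean((V > v_max) | (V < v_min)))
    return p, int(idx.size)
```

Only samples whose surrogate estimate lies close to a limit can change classification under small approximation errors, so only those are re-simulated.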
For comparison, in the perturb-and-observe control approach, the PF reduction at Step 4 produces only a small change in the violation probability at bus 27, which decreases from 0.077 to 0.07. Therefore, at Step 5 we consider both PF reduction and power curtailment as control options, hence the need for additional simulations. As can be seen in Table 3, active power curtailment was indeed the more effective option after Step 4.
In summary, Tables 2 and 3 show that both approaches result in the same sequence of control actions. Figure 6b shows that, after these control actions, both the exact and the approximated violation probabilities at all buses are below the threshold of 0.05. However, the surrogate-based predictive control approach requires substantially fewer simulation samples (150 additional simulations, compared with more than 500,000 for perturb-and-observe), thereby enabling more efficient short-term voltage control.

V. CONCLUSION
In this work, we considered a distribution network with uncertain and correlated power consumption and multiple distributed generators with uncertain and correlated active power generation. We showed that a relatively small number of simulations suffices to build a surrogate for voltage magnitudes. Specifically, we constructed PCE surrogates that estimate the voltage profile of the system as a function of the active power consumption and generation and the power factors of the distributed generators. The PCE surrogates are then used for voltage approximation, efficient calculation of voltage probability distributions, and analytical sensitivity analysis. Moreover, since the PCE surrogate is a function of the power factors of the distributed generators, it can be used at minimal computational cost to identify the influential distributed generators whose power factors must be modified whenever there is a critical voltage violation. With the ever-increasing penetration of distributed generators into power systems, fast and efficient voltage analysis is necessary. Our proposed method provides a computational tool for efficient and accurate voltage analysis of distribution systems, and results from its implementation on the IEEE 69-bus system validate its accuracy and efficiency.