Improved Observability In Distribution Grids using Correlational Measurements

This paper develops a novel pseudo-measurement construct termed a Correlational Measurement (CM) to improve observability and state estimation accuracy. CMs utilize the correlation between similar classes of loads and DERs. The modifications required to accommodate CMs in traditional node voltage or branch current based state estimators are derived. Additionally, an observability analysis framework for distribution systems with DERs is presented to seamlessly integrate CMs. The framework assembles an observability test and an observable island identification procedure from the literature along with a novel observability metric. Finally, an on-line CM parameter estimation procedure is presented. Simulations on the IEEE 123 bus test system and real field data from Pecan Street demonstrate the utility of the proposed CMs and the observability analysis framework in improving observability and state estimation accuracy.


I. INTRODUCTION
In the distribution grid of the future, the power injection from controllable DERs will be a powerful tool for outage management and restoration [1]. However, determining the optimal DER injections and restoration actions depends on knowledge of the system states. State estimation (SE), already the cornerstone of transmission energy management systems, will become commonplace in distribution management systems as more measurements become available from the deployment of automated metering infrastructure (AMI) [2]. Observability analysis is the first step in SE, as it determines whether the available measurements are sufficient to accurately estimate the current system states [3].
Traditionally, the literature on power system observability analysis has been classified either as topological observability, which relies on the existence of a spanning tree of full rank, or as numerical observability, which is based on the rank of the measurement Jacobian [4]. References [5]-[7] present topological observability analysis methods. Topological methods are at a disadvantage because certain combinations of line parameter values can prevent SE solvability despite apparent observability [8]. References [3], [9]-[11] present numerical observability analysis methods. Numerical methods are vulnerable to ill-conditioning and errors associated with floating-point calculations [6]. Newer hybrid methods were suggested in [12]-[14] to take advantage of the best aspects of both approaches. Recently, [15] introduced a graph-theoretic criterion for local observability, and [16] introduced a probabilistic assessment indicating the accuracy of the SE. This paper's contribution focuses on the development and integration of a novel pseudo-measurement, so observability is assessed using the traditional numerical method. This has the advantage of computational efficiency and often shares operations with SE [9].
In this paper, we apply the non-iterative Observability Test (OT) and Observable Island Identification Procedure (OIIP) in [9] to the gain matrix from the distribution system state estimator. Because sensors are few in distribution systems, pseudo-measurements based on historical load profiles [17] and zero-injection virtual measurements [18] are often employed to ensure observability. There are also newer machine learning based methods used to generate virtual measurements to increase observability [19]. However, both the observability and the accuracy of the state estimates are uncertain when historical data is limited/unavailable. In such cases, machine learning approaches are inappropriate and alternative methods are needed.
To this end, the authors present a new type of pseudo-measurement called a Correlational Measurement (CM). CMs encapsulate knowledge of the correlation between demand patterns for similar classes of loads as well as between injection patterns for same-technology renewable DERs. To utilize CMs, an Observability Analysis Framework (OAF) for primary distribution systems is also presented. The framework includes an efficient OT and OIIP from the literature and a novel Observability Metric (OM) that quantifies the accuracy of the state estimates.
Related pseudo-measurements in the literature fall into three categories: 1) sensor correlation [20], 2) intra-node correlation [21], [22], and 3) inter-node correlation [22], [23]. References [22], [23] consider correlation due to geographic proximity for probabilistic power flow studies. In this paper, the authors represent inter-node correlation of medium voltage loads and DERs using a novel linear formulation. Load profiles exhibit similarities due to class (residential, commercial, industrial, etc.) [24]. DERs such as wind turbines located close together also exhibit similar generation profiles [25]. CMs capture both sources of correlation and interface naturally with the familiar SE algorithms. It is demonstrated that the proposed CMs improve system observability and SE accuracy. The traditional Node Voltage based State Estimation (NVSE) and Branch Current based State Estimation (BCSE) are employed in this paper due to their widespread use, but CMs could easily be incorporated into newer approaches such as Semi-Definite Programming (SDP) relaxation [26], Least Absolute Value (LAV) [19], matrix completion [27], etc.
This work provides four key contributions: 1) We introduce a new class of measurement, i.e., CMs, to improve the system observability and SE accuracy. 2) To leverage CMs, we derive the modifications to the standard node voltage and branch current based SE algorithms and discuss the comparative benefits when using CMs. 3) We propose an on-line CM parameter estimation scheme using a least squares approach. 4) We present a novel OM to quantify the SE accuracy within an observable network. CMs and the proposed OM will provide the Distribution System Operator (DSO) better visibility into the network. Simulations on the IEEE 123 bus test system and real field data from Pecan Street demonstrate the utility of the proposed CMs and the observability analysis framework in improving observability and state estimation accuracy. Ultimately, this will facilitate enhanced grid resiliency through more effective utilization of controllable DERs.

A. OBSERVABILITY TEST & OBSERVABLE ISLAND IDENTIFICATION
This section briefly outlines the OT and OIIP presented in [9]. The following model relates the measurements $z \in \mathbb{R}^M$ to the state $x = [\boldsymbol{\theta}^T \; \boldsymbol{v}^T]^T \in \mathbb{R}^N$, which is composed of the voltage angles and magnitudes at all the nodes:

$$z = h(x) + r \tag{1}$$

where $h(\cdot)$ is a nonlinear function that maps the state to the measurements and $r$ refers to the measurement errors (assumed zero-mean Gaussian). Observability of the power system can be stated using the gain matrix $G$ as follows: a distribution system is observable if and only if $G$ is full rank [3]. $G$ is computed as follows:

$$G = H^T R^{-1} H \tag{2}$$

where $H = \partial h(x) / \partial x$ is the measurement Jacobian and $R = E[r r^T]$ is the diagonal error covariance matrix. Further details can be found in [3], [9].
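As an illustration of the full-rank condition on $G$, the following minimal Python sketch forms the gain matrix for a small linear measurement model and checks its rank. It is not the non-iterative OT of [9]; the rank check via `numpy.linalg.matrix_rank`, the function names, and the three-state Jacobian are assumptions chosen only for demonstration.

```python
import numpy as np

def gain_matrix(H, sigma):
    """Form G = H^T R^{-1} H for a diagonal R = diag(sigma**2)."""
    H = np.asarray(H, dtype=float)
    weights = 1.0 / np.asarray(sigma, dtype=float) ** 2
    return H.T @ (weights[:, None] * H)

def is_observable(H, sigma):
    """Numerical observability check: G must have full rank."""
    G = gain_matrix(H, sigma)
    return np.linalg.matrix_rank(G) == G.shape[0]

# Hypothetical Jacobian for a 3-state toy system (illustrative values only).
H_full = np.array([[1.0, 0.0, 0.0],
                   [1.0, -1.0, 0.0],
                   [0.0, 1.0, -1.0]])
sigma_full = np.array([0.01, 0.02, 0.02])

# Dropping the last measurement leaves the third state unmeasured.
H_short = H_full[:2]

observable_full = is_observable(H_full, sigma_full)
observable_short = is_observable(H_short, sigma_full[:2])
print(observable_full, observable_short)
```

In this toy case, removing one measurement makes $G$ rank deficient, which is the situation that pseudo-measurements are meant to repair.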

B. STATE ESTIMATION
Once observability is guaranteed, state estimation can be carried out [28]. One procedure is the Maximum Likelihood method, which estimates the state as $\hat{x}$, the solution that minimizes the measurement error $z - h(x)$. That is, $\hat{x}$ is the solution of:

$$\hat{x} = \arg\min_x J(x), \qquad J(x) = [z - h(x)]^T R^{-1} [z - h(x)] \tag{3}$$

There are two popular methods for determining the solution: 1) NVSE and 2) BCSE.

1) NVSE
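NVSE uses the nodal voltage magnitudes and phases as state variables. The optimality condition for Eq. (3) is as follows:

$$(H^{NV})^T R^{-1} \left[z - h^{NV}(x^{NV})\right] = 0 \tag{4}$$

where the superscript NV refers to the choice of state variables and $H^{NV} = \partial h^{NV}(x^{NV}) / \partial x^{NV}$. The venerable Gauss-Newton method for computing the solution of Eq. (4) follows [29]:

$$x^{NV}_{k+1} = x^{NV}_k + \left[G(x^{NV}_k)\right]^{-1} \left[H^{NV}(x^{NV}_k)\right]^T R^{-1} \left[z - h^{NV}(x^{NV}_k)\right]$$

where $G(x^{NV}_k)$ is the gain matrix of Eq. (2) evaluated at the current iterate. The converged state estimate from the above iterative method is denoted $\hat{x}^{NV}$.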
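The Gauss-Newton iteration can be sketched in a few lines of Python. This is a generic weighted least squares solver, not the authors' MATLAB implementation; the function name `gauss_newton_wls`, the toy measurement function, and its Jacobian are hypothetical and stand in for the power flow equations.

```python
import numpy as np

def gauss_newton_wls(h, jac, z, R, x0, tol=1e-8, max_iter=50):
    """Iterate x <- x + G^{-1} H^T R^{-1} (z - h(x)) until the step is small."""
    x = np.asarray(x0, dtype=float).copy()
    R_inv = np.linalg.inv(R)
    for _ in range(max_iter):
        H = jac(x)                      # measurement Jacobian at the iterate
        G = H.T @ R_inv @ H             # gain matrix
        dx = np.linalg.solve(G, H.T @ R_inv @ (z - h(x)))
        x += dx
        if np.max(np.abs(dx)) < tol:
            break
    return x

# Hypothetical nonlinear measurement model with two states (not a power flow).
def h_toy(x):
    return np.array([x[0] ** 2, x[0] * x[1], x[1]])

def jac_toy(x):
    return np.array([[2.0 * x[0], 0.0],
                     [x[1], x[0]],
                     [0.0, 1.0]])

x_true = np.array([1.05, 0.98])
z = h_toy(x_true)                       # noise-free measurements for clarity
R = np.diag([1e-4, 1e-4, 1e-4])
x_hat = gauss_newton_wls(h_toy, jac_toy, z, R, x0=[1.0, 1.0])
print(x_hat)
```

With noise-free measurements and a flat start, the iterate should return to the true state; with noisy measurements it converges to the weighted least squares estimate instead.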

2) BCSE
BCSE uses the real and imaginary components of the branch currents as state variables: $x^{BC} = [I_r^T \; I_x^T]^T$. This approach requires converting the power measurements $z$ into equivalent current measurements $z^{BC}$ (see [28] for details). Using the equivalent current measurements $z^{BC}$, the measurement function $h(x)$ becomes linear and the objective function of Eq. (3) simplifies:

$$J(x^{BC}) = \left[z^{BC} - H^{BC} x^{BC}\right]^T R^{-1} \left[z^{BC} - H^{BC} x^{BC}\right]$$

The minimum can be found directly from the optimality condition $\partial J / \partial x = 0$:

$$(H^{BC})^T R^{-1} H^{BC} x^{BC} = (H^{BC})^T R^{-1} z^{BC} \tag{9}$$

In order to find the state estimate $\hat{x}^{BC}$, the following algorithm is used:
Step 1: Use the node voltage estimates $u_{k-1}$ to convert power flow measurements into current measurements.
Step 2: Solve linear system in Eq. (9) to obtain an estimate of the line currents.
Step 3: Update the node voltage estimates u k using the forward sweep procedure. See [30] for details.
Step 4: Check for convergence of node voltage estimates. If converged, stop. Else, return to Step 1.

A. OBSERVABILITY ANALYSIS FRAMEWORK (OAF)
Fig. 1 shows a schematic of the proposed OAF. First, the vector $z$ is constructed using all available measurements, pseudo-measurements, virtual measurements, and CMs. Then, the OT described in Section II-A is run. If the system is observable, then either NVSE or BCSE is performed using all the measurements. The converged gain matrix $G$ from the state estimator is used to compute the OM.
If the system is not observable, the observable subnetworks and corresponding measurement sets are identified. For each subnetwork, a slack bus is chosen; a bus with a voltage measurement, if present, is the natural choice. Then NVSE or BCSE is performed for each subnetwork. The set of converged gain matrices $G_i$ is used to compute a set of OMs.

FIGURE 1: Observability analysis framework flowchart
The observability metric $O$ is defined as the condition number of $G$:

$$O = \frac{\lambda_{\max}(G)}{\lambda_{\min}(G)}$$

where $\lambda_{\max}(\cdot)$ and $\lambda_{\min}(\cdot)$ return the largest and smallest eigenvalues of the matrix argument, respectively. $G$ can be the converged gain matrix from either NVSE or BCSE. Larger values of the dimensionless $O$ indicate less accurate state estimates. For reference, a condition number exceeding $10^{12}$ may yield an inaccurate state estimate [31]. Alternative metrics are provided in [32]. The condition number is preferable because it captures both the "distance" from singularity and the numerical conditioning of the gain matrix. Because sensors are not yet widely deployed in the distribution system, full observability is unlikely. Therefore, we introduce the two types of CMs, CLMs and CDMs, as a means of increasing observability.
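As a small illustration, the OM can be computed from the eigenvalues of a symmetric gain matrix as sketched below. The function name `observability_metric` and the two diagonal test matrices are hypothetical; returning infinity for a singular matrix is a convention adopted here, not one stated in the paper.

```python
import numpy as np

def observability_metric(G):
    """Ratio of largest to smallest eigenvalue of a symmetric gain matrix."""
    eigvals = np.linalg.eigvalsh(G)     # eigenvalues in ascending order
    lam_min, lam_max = eigvals[0], eigvals[-1]
    if lam_min <= 0.0:
        return np.inf                   # assumed convention for a singular G
    return lam_max / lam_min

# Hypothetical gain matrices: one well conditioned, one nearly singular.
G_good = np.diag([4.0, 2.0, 1.0])
G_bad = np.diag([1.0, 1.0, 1e-13])

om_good = observability_metric(G_good)
om_bad = observability_metric(G_bad)
print(om_good, om_bad > 1e12)
```

The second matrix exceeds the $10^{12}$ reference level mentioned above, so a state estimate based on it would be treated with caution.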

B. CORRELATIONAL LOAD MEASUREMENTS (CLMS)
The authors of [33] have proposed a linear regression model for correlated low voltage residential loads. In this section, we propose a similar formulation for CLMs, but for correlated medium voltage loads from the same load class (residential, commercial, industrial, etc.). CLMs are designed for the primary distribution system because of the increased correlation that results from the aggregation of smaller, less predictable loads [34]. Consider a load (power injection) measurement $z_i^p, z_i^q$ at node $i$ [29]:

$$z_i^p = h_i^p(x) + r_i^p = p_i + r_i^p, \qquad z_i^q = h_i^q(x) + r_i^q = q_i + r_i^q$$

where $p_i, q_i$ represent the true real and reactive load values, $h_i^p(x), h_i^q(x)$ are the models of the real and reactive load, and $r_i^p \sim N(0, \sigma_i^2)$, $r_i^q \sim N(0, \sigma_i^2)$ are the sensor noise. The phase superscript has been dropped for clarity. Let the load at node $j$ be linearly correlated to the (same phase) load at node $i$ with constant parameters $c^p, c^q, d^p, d^q$ as follows:

$$p_j = c^p p_i + d^p, \qquad q_j = c^q q_i + d^q \tag{12}$$

Suppose that the parameters in Eq. (12) are composed of a deterministic part $c_0^p, c_0^q, d_0^p, d_0^q$ and a stochastic part $r_c^p, r_c^q, r_d^p, r_d^q$:

$$c^p = c_0^p + r_c^p, \qquad c^q = c_0^q + r_c^q \tag{13}$$

$$d^p = d_0^p + r_d^p, \qquad d^q = d_0^q + r_d^q \tag{14}$$

1) Deterministic part known
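In this case, a CLM $\hat{z}_j^p, \hat{z}_j^q$ can be constructed from the measurement $z_i^p, z_i^q$ at node $i$ as follows:

$$\hat{z}_j^p = c_0^p z_i^p + d_0^p, \qquad \hat{z}_j^q = c_0^q z_i^q + d_0^q$$

In order to use the CLM for SE, it must be written in the form $\hat{z}_j = h_j(x) + \hat{r}_j$ for some stochastic component $\hat{r}_j$. Substituting Eqs. (12)-(14) with $h_j(x) = p_j$, it can be shown that $\hat{r}_j$ has the following form:

$$\hat{r}_j^p = c_0^p r_i^p - r_c^p h_i^p(x) - r_d^p$$

The three sources of uncertainty in the CLM are measurement noise from $z_i$ and the stochastic components of the constants $c$, $d$. In order to obtain the maximum likelihood state estimate using the CLM for NVSE, the error covariance matrix $R^{NV}$ must be updated. Assuming that $r_c \sim N(0, \sigma_c^2)$ and $r_d \sim N(0, \sigma_d^2)$ and that the noise terms are mutually independent, the nonzero entries in the rows and columns of $R^{NV}$ corresponding to measurements $z_i^p, \hat{z}_j^p$ are given by the following:

$$\begin{bmatrix} R_{ii} & R_{ij} \\ R_{ji} & R_{jj} \end{bmatrix} = \begin{bmatrix} \sigma_i^2 & c_0^p \sigma_i^2 \\ c_0^p \sigma_i^2 & (c_0^p)^2 \sigma_i^2 + \sigma_c^2 \left[h_i^p(x)\right]^2 + \sigma_d^2 \end{bmatrix} \tag{17}$$

where the superscript NV on $h_i^p(x)$ has been dropped for brevity. The zero-mean assumption for the $r_c$ and $r_d$ terms follows from the assumption of known deterministic components $c_0$, $d_0$. The entries in $R^{NV}$ corresponding to reactive power measurements $z_i^q, \hat{z}_j^q$ are identical. See Section III-D below for the details on incorporating CLMs in NVSE.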
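To make the construction concrete, the following Python sketch builds a CLM from a source measurement and fills the corresponding 2x2 covariance block. It assumes the linear model $p_j = c\,p_i + d$ with $c = c_0 + r_c$, $d = d_0 + r_d$ and mutually independent zero-mean noise terms, from which the covariance entries in the code comments follow; the function names and all numerical values are hypothetical.

```python
import numpy as np

def build_clm(z_i, c0, d0):
    """Pseudo-measurement for node j from the measurement at node i."""
    return c0 * z_i + d0

def clm_covariance_block(h_i, c0, sigma_i, sigma_c, sigma_d):
    """2x2 covariance of (z_i, z_hat_j), assuming independent noise terms.

    Var(z_i)            = sigma_i^2
    Cov(z_i, z_hat_j)   = c0 * sigma_i^2
    Var(z_hat_j)        = c0^2 sigma_i^2 + sigma_c^2 h_i^2 + sigma_d^2
    """
    var_i = sigma_i ** 2
    cov_ij = c0 * var_i
    var_j = c0 ** 2 * var_i + sigma_c ** 2 * h_i ** 2 + sigma_d ** 2
    return np.array([[var_i, cov_ij],
                     [cov_ij, var_j]])

# Hypothetical values: a 40 kW source load and a correlated load half its size.
z_hat_j = build_clm(z_i=40.0, c0=0.5, d0=2.0)
R_block = clm_covariance_block(h_i=40.0, c0=0.5,
                               sigma_i=2.0, sigma_c=0.05, sigma_d=0.4)
print(z_hat_j)
print(R_block)
```

The off-diagonal entry is what makes the error covariance matrix non-diagonal, and the dependence of the last entry on $h_i(x)$ is what makes it a function of the state.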
For BCSE, there is an alternative approach: the CLM can be represented as a linear equality constraint using the node voltage estimates. Under the assumption that $c_0^p = c_0^q \equiv c_0$, it can be shown that Eq. (12) can be rewritten as the linear equality constraints of Eqs. (19) and (20). This assumption can be regarded as the loads at nodes $i$ and $j$ having the same power factor. This is reasonable considering that the loads belong to the same class, and the power factor is maintained by various control mechanisms such as volt-var control. See Section III-D below for the details on incorporating these equality constraints in BCSE.

2) Deterministic part unknown
In this more realistic case, we use the estimates of the parameters $\hat{c}^p, \hat{c}^q, \hat{d}^p, \hat{d}^q$ to construct the CLM $\hat{z}_j^p, \hat{z}_j^q$:

$$\hat{z}_j^p = \hat{c}^p z_i^p + \hat{d}^p, \qquad \hat{z}_j^q = \hat{c}^q z_i^q + \hat{d}^q$$

For NVSE, the $R^{NV}$ matrix should be updated according to Eq. (17), replacing $c_0, d_0$ with $\hat{c}, \hat{d}$ respectively. Similarly for BCSE, the equality constraints shown in Eqs. (19,20) should use the estimates $\hat{c}, \hat{d}$. If we have access to even sporadic measurements $z_j^p$ of the load at node $j$, we can use them to improve our parameter estimates for the CLM on-line. These sporadic measurements could be the result of infrequent manual monitoring by technicians. Using vector notation and omitting the identical equations for reactive power, the model takes the form of Eq. (22). Under the assumption that $r_c \sim N(0, \sigma_c^2)$ and $r_d \sim N(0, \sigma_d^2)$, we propose to improve our parameter estimates $\hat{\boldsymbol{\eta}}$ using the least squares approach [35] of Eqs. (25,26), where $k$ is the number of available measurements in $z_j^p$. As each new measurement $z_j^p(k+1)$ comes in, Eq. (25) can be used as a fixed-memory estimator to update the parameter estimates. Because the sensor noise $r_j$ is zero mean and $W$ and $r_j$ are statistically independent, the above estimator is unbiased [35].
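A least squares update of this kind can be sketched as follows. The sketch is a batch estimator rather than the fixed-memory form referred to above, and it assumes a regressor matrix $W$ whose rows are $[z_i^p(k), 1]$ with parameter vector $\boldsymbol{\eta} = [c, d]^T$; that structure, the function name, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def estimate_clm_parameters(z_i_hist, z_j_hist):
    """Least squares fit of z_j ~ c * z_i + d from paired samples."""
    z_i_hist = np.asarray(z_i_hist, dtype=float)
    z_j_hist = np.asarray(z_j_hist, dtype=float)
    # Assumed regressor layout: one row [z_i(k), 1] per sporadic sample.
    W = np.column_stack([z_i_hist, np.ones_like(z_i_hist)])
    eta_hat, *_ = np.linalg.lstsq(W, z_j_hist, rcond=None)
    return eta_hat                      # [c_hat, d_hat]

# Synthetic data with hypothetical true parameters and sensor noise.
rng = np.random.default_rng(0)
z_i_hist = rng.uniform(20.0, 60.0, size=200)
c_true, d_true = 0.8, 5.0
z_j_hist = c_true * z_i_hist + d_true + rng.normal(0.0, 0.5, size=200)

c_hat, d_hat = estimate_clm_parameters(z_i_hist, z_j_hist)
print(c_hat, d_hat)
```

In an on-line setting, the same fit could be repeated over a sliding window of the most recent samples so that slowly drifting parameters are tracked.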

C. CORRELATIONAL DER MEASUREMENTS (CDMS)
This section presents the formulation of CDMs, the second type of CMs proposed in this work. CDMs could be used to represent nearby PV units that are correlated based on orientation and tilt angle or storage units that are correlated based on state of charge. In this paper, we utilize CDMs to capture the correlation between wind turbines that are located close together geographically. The correlation coefficient $\rho_{ij}$ between the generation $p_i, p_j$ of two wind turbines can be modeled as an exponential function of the distance $\delta_{ij}$ between them [25], where $\zeta_1 = 20$ miles and $\zeta_2 = 2$ are constant parameters. $\zeta_1$ is the average distance between the DER units and the measurement point, and $\zeta_2$ is an exponential term. These selections can be changed if the generated CDMs are not accurate enough. The correlation coefficient takes values between 0 and 1 and measures the linearity of the relationship between the generation of the two DERs. The authors propose a CDM formulation, Eq. (28), that combines $\rho_{ij}$ with the CLM formulation in Section III-B. In this formulation, the generation $p_j, q_j$ of an unmeasured DER is correlated to the generation $p_i, q_i$ of a measured DER, with an uncorrelated component $\tilde{p}_j, \tilde{q}_j$. As before, $c^p, d^p, c^q, d^q$ are composed of a deterministic part and a stochastic part (see Eqs. (13,14)). Let us also suppose that the uncorrelated component $\tilde{p}_j, \tilde{q}_j$ is a parameter composed of a deterministic part $\tilde{p}_{j0}, \tilde{q}_{j0}$ and a stochastic part $\tilde{r}_j^p, \tilde{r}_j^q$: $\tilde{p}_j = \tilde{p}_{j0} + \tilde{r}_j^p$, $\tilde{q}_j = \tilde{q}_{j0} + \tilde{r}_j^q$, where $\tilde{r}_j^p \sim N(0, \tilde{\sigma}_j^2)$, $\tilde{r}_j^q \sim N(0, \tilde{\sigma}_j^2)$. We consider two cases below.
1) Deterministic part known
If the deterministic parts of all the parameters in Eq. (28) are known, then we can construct a CDM $\hat{z}_j^p, \hat{z}_j^q$ as in Eq. (30). The parameters $\tilde{p}_{j0}, \tilde{q}_{j0}$ can be viewed as a forecast for the generation of the unmeasured DER. Following the same procedure as in Section III-B1, we write the CDM in the form $\hat{z}_j = h_j(x) + \hat{r}_j$ for some stochastic component $\hat{r}_j$. For NVSE, this leads to the entries of Eq. (32) in the rows and columns of the error covariance matrix $R^{NV}$ corresponding to the measurements $z_i^p, \hat{z}_j^p$, where the superscript NV on $h_i^p(x)$ has been dropped for brevity. The entries in $R^{NV}$ corresponding to reactive power measurements $z_i^q, \hat{z}_j^q$ are identical. For BCSE, the CDM can be represented as a linear equality constraint using the node voltage estimates. Under the same power factor assumption as for a CLM ($c_0^p = c_0^q \equiv c_0$), it can be shown that Eq. (28) can be rewritten as the linear equality constraints of Eqs. (34) and (35). The assumption is reasonable for CDMs because most DERs are interfaced to the grid via inverters using a constant power factor mode [36]. See Section III-D below for the details on incorporating CDMs into either NVSE or BCSE.

2) Deterministic part unknown
A CDM contains three parameters, but the coefficients of $d$, $\tilde{p}$, $\tilde{q}$ are constants. Only the coefficient of $c$ is time-varying. As a result, only two parameters can be estimated on-line using sporadic measurements. Therefore, we build a CDM of the form given in Eq. (36), where we have set the $d$ parameter in Eq. (30) to zero and replaced the remaining parameters' known deterministic parts (subscripted 0) with estimates (denoted by the hat notation). For NVSE, the $R^{NV}$ matrix should also be updated according to Eq. (32), replacing the parameters with subscript 0 with their estimates and eliminating the $\sigma_d^2$ term. Similarly for BCSE, the equality constraints shown in Eqs. (34,35) should be modified.
Writing Eq. (28) with $d$ set to zero and the CDM in Eq. (36) in vector notation yields the same result as before (Eq. (22)). However, the vectors $\boldsymbol{w}$, $\boldsymbol{\eta}$ have different contents; the identical equations for reactive power are omitted. Under the zero-mean Gaussian assumptions on the stochastic components of the parameters, we can improve our parameter estimates $\hat{\boldsymbol{\eta}}$ using the same least squares approach in Eqs. (25,26).

D. STATE ESTIMATION WITH CMS
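For NVSE, $R^{NV}$ is no longer diagonal and is now a function of the state $x$ as a result of the CMs. The NVSE objective therefore depends on the state through both $h(x)$ and $R(x)$, where the superscript NV has been omitted for brevity. The CMs introduce an additional squared trigonometric nonlinearity. The CMs yield the following problem for BCSE:

$$\min_x \; [z - Hx]^T R^{-1} [z - Hx] \quad \text{subject to} \quad \Gamma x = \boldsymbol{\gamma}$$

where the linear equality constraints in Eqs. (19,20,34,35) are encapsulated in $\Gamma x = \boldsymbol{\gamma}$, and the superscript BC has been omitted for brevity. The above quadratic program can be solved by forming the Lagrangian and checking the optimality conditions. The resulting system of linear equations with Lagrange multipliers $\boldsymbol{\nu}$ can be solved to find the solution $x$:

$$\begin{bmatrix} H^T R^{-1} H & \Gamma^T \\ \Gamma & 0 \end{bmatrix} \begin{bmatrix} x \\ \boldsymbol{\nu} \end{bmatrix} = \begin{bmatrix} H^T R^{-1} z \\ \boldsymbol{\gamma} \end{bmatrix}$$

The OT and OM can utilize the augmented matrix above.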
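The equality-constrained least squares step for BCSE can be sketched in Python as below. The sketch solves the standard optimality (KKT) system for a weighted least squares objective with linear equality constraints; the function name `constrained_wls`, the two-state example, and the single constraint tying the second state to the first are hypothetical and are not taken from the test feeder.

```python
import numpy as np

def constrained_wls(H, R, z, Gamma, gamma):
    """Solve min (z - Hx)^T R^{-1} (z - Hx) subject to Gamma x = gamma."""
    R_inv = np.linalg.inv(R)
    G = H.T @ R_inv @ H
    n, m = G.shape[0], Gamma.shape[0]
    # Augmented matrix: gain matrix bordered by the constraint rows.
    K = np.block([[G, Gamma.T],
                  [Gamma, np.zeros((m, m))]])
    rhs = np.concatenate([H.T @ R_inv @ z, gamma])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:], K          # states, multipliers, augmented matrix

# Hypothetical case: only the first state is metered (twice).
H = np.array([[1.0, 0.0],
              [1.0, 0.0]])
R = np.diag([0.01, 0.04])
z = np.array([2.0, 2.2])

# One correlational constraint: x2 - 0.5 * x1 = 0.1.
Gamma = np.array([[-0.5, 1.0]])
gamma = np.array([0.1])

x_hat, nu, K = constrained_wls(H, R, z, Gamma, gamma)
print(x_hat)
print(np.linalg.matrix_rank(H.T @ np.linalg.inv(R) @ H), np.linalg.matrix_rank(K))
```

In this toy case the unconstrained gain matrix is rank deficient while the augmented matrix is full rank, which illustrates why the observability test is applied to the augmented matrix when equality constraints are present.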

IV. SIMULATION RESULTS
This section presents three simulation results obtained using MATLAB. All simulations were performed on an Intel Core i7 machine with 12 GB RAM and a 512 GB SSD. The state estimation solution typically converges in less than a second. First, we present a baseline case with partial observability and partial estimation of the islands' state in Section IV-A. This corresponds to the lower path of the OAF flowchart in Fig. 1. Second, we demonstrate the benefit of the CMs: complete observability and complete estimation of the islands' state in Section IV-B. This corresponds to the upper path of the OAF flowchart in Fig. 1. Third, in Section IV-C we present the case in which sporadic measurements of the nodes with CMs are available, allowing on-line estimation of the CM parameters.

FIGURE 2: Modified IEEE 123 node test feeder topology. Black line segments correspond to 3φ lines, whereas colored segments represent 1φ or 2φ. Green boxes represent 3φ power flow meters, green circles represent 3φ power injection meters, and blue x's represent a lack of historical data. DERs are represented as large black circles with a tilde inside.
We consider a scenario for the modified IEEE 123 Node Test Feeder shown in Fig. 2 in which the switches between nodes (18,135) and (13,152) open, and the switch between nodes (450,451) closes. All other switches maintain their nominal status. This creates two independent feeders in the network as shown. Note that the voltage regulators, capacitor banks, and distribution transformer are neglected because they do not affect the observability. The nodes marked without pseudo-measurements were chosen on the periphery of the network so as to maintain observability of the bulk of the islands. They are picked for illustrative purposes only; CMs can be used anywhere in the system.
The measurement error of the meters is zero mean with a standard deviation of 5% of the measured value; the measurement error of the pseudo-measurements is zero mean with a standard deviation of 20% of the measured value. Typical standard deviation for SCADA measurements is in the 2% range per literature [37]. We have a larger standard deviation in the formulation to study the validity of the proposed approach for large errors.

A. OBSERVABILITY ANALYSIS AND STATE ESTIMATION WITHOUT CMS
Without the historical data at the indicated nodes, the two islands are not fully observable. As a result, the OIIP removes the unobservable nodes and branches to perform SE of the remaining networks. Pruning the network to remove unobservable branches allows SE to proceed; however, it introduces topology error compared to the true system. Fig. 3 displays the NVSE and BCSE results for the observable nodes in the system. Table 1 contains the OM values and corresponding SE error. As mentioned previously, OM values exceeding $10^{12}$ could indicate poor state estimation accuracy. It is known that NVSE is susceptible to ill-conditioning as system size increases; this is shown in the OM values of Island 2. However, the OM values from NVSE should not be compared to those from BCSE because the gain matrices have completely different forms. For both estimators, Island 2 has a higher metric value than Island 1, and the SE error is likewise higher. This can be attributed to the lower ratio of measurements to nodes in Island 2.

B. OBSERVABILITY ANALYSIS AND STATE ESTIMATION WITH CMS
In this section, we achieve full observability of both islands in Fig. 2 by leveraging CLMs and CDMs. Each load node without historical data is given a CLM based on the nearest (same-phase) load node with a pseudo-measurement. The unmeasured DERs at nodes 16 and 64 are given CDMs based on the measured DERs at nodes 29 and 63, respectively. The DERs in Island 1 are assumed to be 10 miles apart geographically; the DERs in Island 2 are assumed to be 1 mile apart. The error associated with the CMs' parameters is zero mean with a standard deviation of 10% of the true value. Fig. 4 displays the NVSE and BCSE results. Note that with CMs, both the NVSE and BCSE algorithms are able to estimate the states for all nodes in the test system. This is because the proposed CMs have improved the observability compared to the base case. Regarding the implementation, the error covariance matrix $R$ can occasionally become singular for NVSE. This was handled by replacing $R$ with a "damped" version, $R \leftarrow R + \lambda\,\mathrm{diag}(R)$, where $\lambda$ is chosen to be as small as possible but large enough to ensure invertibility. This approach is based on the Levenberg-Marquardt algorithm [38]. Table 1 contains the OM values and corresponding SE error. Island 2 once again has a higher metric value than Island 1, and the SE error is likewise higher. As before, this can be attributed to the lower ratio of measurements to nodes in Island 2. Compared to the results without CMs, it can be seen that the NVSE error increases due to the relatively high uncertainty associated with the CMs. On the other hand, the BCSE error decreases because the truncated system topology resulted in greater error than the CM constraints. The OM does not reflect this improved accuracy because the OM does not capture the topology error introduced by truncating the system. If the baseline case did not have CMs and used the correct topology, then the OM would remain at $3 \times 10^6$, and the SE error would decrease to less than 0.6%.
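The damping of a singular covariance matrix can be sketched as follows. The search schedule (start from a tiny $\lambda$ and grow it geometrically until the condition number drops below a threshold) is an assumption of this sketch, as are the function name, the threshold, and the 2x2 example; the paper states only that $\lambda$ is chosen as small as possible while ensuring invertibility.

```python
import numpy as np

def damp_covariance(R, lam0=1e-14, growth=10.0, cond_max=1e12, max_tries=40):
    """Return R + lam * diag(R) for the first lam on a geometric grid
    that brings the condition number below cond_max (assumed criterion)."""
    D = np.diag(np.diag(R))
    lam = lam0
    for _ in range(max_tries):
        R_damped = R + lam * D
        if np.linalg.cond(R_damped) < cond_max:
            return R_damped, lam
        lam *= growth
    raise RuntimeError("no damping factor found within max_tries")

# Hypothetical singular covariance block: perfectly correlated pair.
R_singular = np.array([[4.0, 2.0],
                       [2.0, 1.0]])

R_damped, lam = damp_covariance(R_singular)
R_inv = np.linalg.inv(R_damped)
print(lam, np.linalg.cond(R_damped))
```

Scaling the damping term by the diagonal of $R$ rather than by the identity keeps the perturbation proportional to each measurement's own variance.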

C. ON-LINE PARAMETER AND STATE ESTIMATION
This section presents the simulation results for state and parameter estimation with CMs. This is how the authors actually envision CMs being used because the DSO is unlikely to have accurate estimates of the CM parameters. The same modified IEEE 123 Node Test Feeder in Fig. 2

D. VALIDATION OF PROPOSED CMS USING PECAN STREET DATA
As demonstrated in the results above, the use of CMs has the potential to significantly increase the observability of distribution systems. To validate the CM model proposed in Section III, we use the Pecan Street dataset. The Pecan Street dataset is "the largest source of disaggregated customer energy data", as described on their website. Consumer load data and solar generation for a distribution system feeder in Austin, TX are used to derive the correlational measurements. The Pecan Street dataset does not provide GIS coordinates or any other geographical identifiers, so it is assumed that sequential node numbers are adjacent. This is in line with the description provided in the dataset. To demonstrate the validity of the CM model, a "source" node $i$ is used to estimate the measurements for the adjacent node $j$. This exercise is carried out for measurements over one week. Fig. 6 shows the measurements for node $i$, node $j$, and the CM for the CLM and CDM cases. It can be seen that the CM model closely follows the actual node $j$ measurement, and the RMSE was 0.8418 and 0.1976 for the CLM and CDM cases, respectively. It is important to note that the notion of geographical distance between the DER units modeled in Eqn. 29 has not been used for this validation, as the relevant data is not available in the dataset. The RMSE can be reduced further if the correlation model is enhanced with infrequent measurements from the load and with geographical correlation for the DERs.

E. CONFIDENCE INTERVALS FOR STATE ESTIMATION
To ensure that the state estimation augmented with CMs can be used by the operators, confidence intervals are constructed for the states estimated using CMs. The confidence intervals are constructed using well-established methods in statistics, following the work of the authors in [39]. As detailed in Section II-B, the state estimation can be stated as the MLE that estimates the state $\hat{x}$, the solution that minimizes the measurement error $z - h(x)$. The residual vector $r_i$ from the MLE is used to calculate the mean and variance. The value of the objective function follows the χ-squared distribution, and the χ-squared table is used to obtain the confidence interval for the estimated parameters. The confidence interval is shown in Eqn. 43.

FIGURE 7: Confidence intervals for state estimates calculated using correlational measurements
$$\hat{x}_j - \Delta \sqrt{\sigma^2 C_{jj}} \le x_j \le \hat{x}_j + \Delta \sqrt{\sigma^2 C_{jj}} \tag{43}$$

In Eqn. 43, $x$ is the vector of states, $C_{jj}$ is the $j$th diagonal element of $(H^T H)^{-1}$, and $\Delta$ is defined in terms of $\chi^2$, $p$, and $n$, where $\chi^2$ refers to the chi-squared distribution, $p$ is the number of estimated states, and $n$ is the total number of measurements used. The confidence intervals for the states estimated using CMs are shown in Fig. 7. The confidence interval allows the system operator to quickly decide if the state estimates can be used for further analysis. The measurements that do not pass the chi-squared test are removed from the state estimation.

V. CONCLUSIONS
In response to the lack of observability of distribution systems, this paper presents the concept of correlational measurements (CMs) and an observability metric (OM). CMs can be used by system operators to leverage known correlation between loads and between DERs in the system. In order to use CMs to obtain a maximum likelihood state estimate, the required modifications to the classical NVSE and BCSE algorithms are derived. The proposed OM quantifies state estimation accuracy. Simulation results evidence the value