Distribution Grid Line Outage Identification with Unknown Pattern and Performance Guarantee

Line outage identification in distribution grids is essential for sustainable grid operation. In this work, we propose a practical yet robust detection approach that utilizes only readily available voltage magnitudes, eliminating the need for costly phase angles or power flow data. Given the sensor data, many existing detection methods based on change-point detection require prior knowledge of outage patterns, which are unknown for real-world outage scenarios. To remove this impractical requirement, we propose a data-driven method to learn the parameters of the post-outage distribution through gradient descent. However, directly using gradient descent presents feasibility issues. To address this, we modify our approach by adding a Bregman divergence constraint to control the trajectory of the parameter updates, which eliminates the feasibility problems. Because timely operation is critical, we prove that the optimal parameters can be learned with convergence guarantees by leveraging the statistical and physical properties of voltage data. We evaluate our approach using several representative distribution grids and real load profiles with 17 outage configurations. The results show that we can detect and localize the outage in a timely manner with only voltage magnitudes and without assuming prior knowledge of outage patterns.


I. INTRODUCTION
Distribution grid line outage occurrence detection and localization is essential for efficient system monitoring and sustainable system operation [1]. A timely identification of the line outage effectively reduces potential financial loss. According to the U.S. Energy Information Administration, customers had an average of 1.3 outages and went without power for four hours during 2016 [2]. The frequency and severity of line outages caused by extreme weather events and power supply shortages have also increased in recent years.
The traditional line outage identification in distribution grids relies on passive feedback from customer reporting [3] or the "last gasp" message from smart meters [4], which is a notification automatically transmitted to the utility when power to the meter is lost. However, the performance of these methods degrades when the transmission of the "last gasp" signal is not assured [5]. For instance, with the growing penetration of distributed energy resources (DERs) in distribution grids, customers can still receive power from rooftop solar panels, battery storage, and electric vehicles when there is no power flow in the distribution circuit connecting to the customer. In this case, the smart meter at the customer premises cannot report a power outage. Moreover, some secondary distribution grids in urban areas are mesh networks. In this scenario, a single line outage caused by circuit faults or human activities may not cause a power outage because alternative paths for power supply exist. In this second case, we will also observe smart meters measuring power injections without sending the "last gasp" notification for reporting outages.
While alternative power sources make the "last gasp" notification fail to report outages, can we still find the line outage time and location? To answer this question, recent literature has aimed at collecting additional information for smarter decisions. For example, power measurements, such as phasor angles from phasor measurement units (PMUs), were modeled in [6] as a Gaussian Markov random field to track the grid topology change. Other power measurements, like power flows and load estimates, were also utilized in a compressive system [7] and a hypothesis-test-based detection method [8]. Non-power measurements were explored as well, such as human network information from social media [9] and weather information from the environment [10]. In distribution grids, obtaining measurements from micro-PMUs and accurate power flow data can be challenging and costly, as such sensors are not commonly deployed in households. To address this limitation, our earlier research [3] demonstrated that utilizing readily available voltage magnitudes could still yield accurate outage identification outcomes. However, an in-depth examination of the probability distribution of voltage data and a theoretical guarantee for learning this distribution were not included in our previous work. These aspects are crucial for understanding the outage identification procedure. Besides, the method in [3] has feasibility and accuracy issues when learning the probability distribution. In this work, we fill the above gaps via a novel approach with theoretical guarantees.
To utilize the aforementioned measurements, both deterministic and probabilistic methods were proposed. Deterministic methods usually set up a threshold and declare the outage when the change of data exceeds the threshold. Such methods are simple to apply but cannot accurately discern data changes in complex or large-scale grids. Probabilistic methods analyze the data spatially or temporally. For spatial analysis, [11] used graph spectral analysis to assess the grid topology for line outage detection. However, such methods require the grid topology as a prior. For temporal analysis, tracing the probability distribution change of the time-series measurements is a common approach [3]. This is usually studied in the change point detection framework, which aims to find the distribution change of measurements as quickly as possible under the constraint of false alarm tolerance [12]. Such a framework has been used in line outage and fault detection in transmission grids [13], [14] and DC micro-grids [15]. Although the change point detection framework assures optimal performance [16], it typically necessitates knowledge of both distributions before and after the change. In distribution grids, this requirement is not practical: the post-outage distribution is unpredictable due to the large number of possible outage patterns, whereas the pre-outage distribution can be learned from historical measurements.
To remove the impractical requirement discussed above, methods were proposed to approximate or simplify the unknown post-change distribution in change point detection. For instance, an approximated maximum likelihood estimation of unknown distribution parameters was proposed in [3]. A convexified estimation of the unknown distribution was introduced in [17]. The works in [18], [19] bypassed the requirement in restricted distribution cases with partially unknown information (e.g., scalar Gaussian with unknown means and known variances). While these methods may mitigate the incompleteness of post-outage information, they have limitations in detection performance and parameter estimation.
In this paper, we propose a practical and straightforward method for utilities to identify line outages with unknown outage patterns. To address the challenge of limited data availability, our approach relies solely on voltage magnitudes obtained from smart meters. This is advantageous compared to expensive phase angle measurements and accurate power flow data, as voltage magnitudes are more readily accessible in typical distribution grids [20]. In utilizing voltage magnitudes, we make distinctive contributions. We demonstrate that the increments of voltage magnitudes before and after a line outage follow two distinct multivariate Gaussian distributions, where the distribution parameters are influenced by grid connectivity. Moreover, we provide theoretical guarantees for learning the unknown probability distribution parameters based on voltage magnitude data. By effectively utilizing voltage magnitudes and incorporating theoretical guarantees, we address the limitations posed by the absence of precise phase-angle data. By detecting changes in the learned Gaussian distributions, we can successfully identify line outages.
The second challenge is the unavailability of post-outage distribution parameters, as analyzed earlier. To address this issue, we propose a data-driven method that directly learns these unknown parameters using Projected Gradient Descent (PGD). While Gradient Descent (GD) is susceptible to feasibility issues in parameter estimation, the iterative nature of GD allows us to control the parameter updating trajectory. Specifically, we formulate the distribution parameter learning problem as a projection optimization problem constrained by the Bregman divergence [21]. This not only resolves the feasibility issue but also leads to accurate parameter estimation with theoretical guarantees. By accurately learning the parameters, our approach can effectively detect and localize line outages, even in large grids.
In addition to accuracy, utilities are also concerned with timely operation. By utilizing the statistical and physical characteristics of voltage data, we can limit the search space of unknown parameters to a convex set, which allows for fast and accurate recovery of the post-outage distribution. We demonstrate that PGD can achieve optimal parameter learning with a polynomial-time convergence guarantee. Furthermore, we develop an efficient implementation of the PGD algorithm, which reduces computational time by 75% and makes it particularly well-suited for timely grid operations.
In summary, our proposed method offers several contributions. Firstly, it requires only simple data but has theoretical guarantees. Secondly, it does not require prior knowledge of the outage pattern. Thirdly, it enables timely operation. Furthermore, our approach comes with performance guarantees and does not rely on knowledge of the distribution grid's topology, nor does it require all households to have smart meter data. The method is validated using four distribution grids and real-world load profiles with 17 outage configurations. In the following, Section II models the problem of line outage identification. Section III discusses the voltage data and identification procedure. Section IV extends to identification with unknown outage pattern. Section V provides performance guarantees on timely operation. Section VI evaluates our method. Section VIII concludes the paper.

II. SYSTEM MODEL
To present our probabilistic design for change point detection and localization, we define variables on a graph probabilistically. Specifically, we model the distribution grid as a graph $G := \{1, 2, \dots, M\}$ containing $M > 0$ buses connected by branches. Then, the voltage data from each bus $i \in G$ is modeled as a random variable $V_i$. As a time series, its realization at time $n$ is denoted as $v_i[n] = |v_i[n]| \exp(j\theta_i[n]) \in \mathbb{C}$, where $|v_i[n]| \in \mathbb{R}$ represents the voltage magnitude in per unit and $\theta_i[n] \in \mathbb{R}$ is the voltage phase angle in degrees. These steady-state measurements are sinusoidal signals at the same frequency. It is worth noting that, unlike PMUs, smart meters typically do not measure phase angles. Therefore, we emphasize that even though voltage is represented in its phasor form, the voltage magnitude alone can still effectively identify a line outage.
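In the distribution grid $G$, the collection of voltage variables is modeled as $V_G = [V_1, V_2, \dots, V_M]^\top$. Since $V_G$ usually does not follow a regular distribution [3], we model the incremental change of the voltage data as $\Delta V_G$, whose realization at time $n$ is $\Delta v[n] = v_G[n] - v_G[n-1]$, with $v_G[n]$ the realization of $V_G$ at time $n$. For the sake of simplicity, we also use the notation $\Delta v^{1:N} = \{\Delta v[1], \dots, \Delta v[N]\}$. Based on this modeling, the problem of identifying the distribution grid line outage is formally defined as follows.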
• Given: Voltage increments $\Delta v^{1:N}$ from the smart meters.
• Find: The line outage time as soon as possible and the out-of-service branch as accurately as possible.
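As a small illustration of the "Given" input, the voltage increments can be formed by differencing consecutive magnitude readings. The sketch below is only a minimal example under stated assumptions: the function name `voltage_increments` and the sample readings are hypothetical, and the readings are taken to be per-unit magnitudes stored as one row per time step.

```python
from typing import List, Sequence


def voltage_increments(magnitudes: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return dv[n] = |v[n]| - |v[n-1]| for every bus, for n = 1, ..., N."""
    return [
        [curr - prev for prev, curr in zip(magnitudes[n - 1], magnitudes[n])]
        for n in range(1, len(magnitudes))
    ]


# Hypothetical smart meter readings (per unit): 4 time steps x 3 buses.
readings = [
    [1.00, 0.98, 0.97],
    [1.01, 0.99, 0.97],
    [1.00, 0.97, 0.96],
    [1.02, 0.98, 0.95],
]
increments = voltage_increments(readings)
```

Four readings per bus yield three increment vectors, which is the sequence the detection procedure operates on.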

III. OUTAGE IDENTIFICATION VIA VOLTAGE MAGNITUDE
While the expensive phasor angles and accurate power flows are hard to obtain in distribution grids, [3] showed that the easier-to-acquire voltage magnitude could be utilized to identify the line outage. The authors found that although voltage data do not follow a regular distribution, the incremental change of voltage follows a Gaussian distribution. However, two things were missing in [3]: a clear formula of the distribution and an elaborate analysis of how such a distribution is affected by line outages. They are the key to understanding the procedure and performance of identifying the line outage, and are discussed in detail in the following subsection.

A. Gaussian Distribution of Voltage Increment
To fill this gap, in this subsection we prove that the increment of voltage data $\Delta V_G$ follows two multivariate Gaussian distributions before and after the line outage, and we provide a clear formula of such distributions. In doing so, we can identify the outage by tracing the change of the Gaussian distribution.
To study the distribution of $\Delta V_G$, we start from Kirchhoff's Current Law, which gives the relationship between voltages $V_G \in \mathbb{C}^{M}$ and currents $I_G \in \mathbb{C}^{M}$ as
$$Y_G V_G = I_G,$$
where the admittance matrix $Y_G \in \mathbb{C}^{M \times M}$ can be derived through the connectivity of the grid as [22]
$$Y_G = A_{E,G}^\top Y_E A_{E,G} + Y_G^s. \tag{1}$$
In the above equation, $E$ denotes the set of branches in the grid $G$. $A_{E,G} \in \mathbb{R}^{|E| \times M}$ is the incidence matrix where each row represents a branch and has exactly one entry of $1$ and one entry of $-1$ to denote the two buses connected by this branch. We can swap the $-1$ and $1$ since the grid network is undirected. $Y_E \in \mathbb{C}^{|E| \times |E|}$ is a diagonal matrix with the series admittances of each branch, and $Y_G^s \in \mathbb{C}^{M \times M}$ is a diagonal matrix with the total shunt admittances at each bus.
By representing $Y_G$ as in (1), we can discuss the invertibility of $Y_G$, which prepares us for the distribution analysis of $\Delta V_G$. To this end, we assume that the branches are not electromagnetically coupled and have non-zero admittance, i.e., $Y_E$ is full-rank. This assumption is common in distribution grids [22]. With full-rank $Y_E$, we show the invertibility of $Y_G$ in Lemma 1.
Lemma 1. In a connected distribution grid $(G, E)$, the admittance matrix $Y_G \in \mathbb{C}^{M \times M}$ is invertible after eliminating the column and row corresponding to the slack bus.
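The construction in (1) and the slack-bus elimination in Lemma 1 can be sketched on a toy grid. This is a minimal illustration under stated assumptions, not the paper's implementation: the 3-bus loop, its admittance values, and the helper names `admittance_matrix` and `drop_slack` are hypothetical, and the incidence matrix is applied implicitly (each branch adds its series admittance to two diagonal entries and subtracts it from two off-diagonal entries).

```python
from typing import List, Sequence, Tuple

Branch = Tuple[int, int, complex]  # (from_bus, to_bus, series admittance)


def admittance_matrix(
    num_buses: int, branches: Sequence[Branch], shunts: Sequence[complex]
) -> List[List[complex]]:
    """Assemble Y_G = A^T Y_E A + Y_G^s without forming A explicitly."""
    y = [[0j] * num_buses for _ in range(num_buses)]
    for i, k, y_series in branches:
        # A row with +1 at bus i and -1 at bus k contributes y_series * (e_i - e_k)(e_i - e_k)^T.
        y[i][i] += y_series
        y[k][k] += y_series
        y[i][k] -= y_series
        y[k][i] -= y_series
    for bus, y_shunt in enumerate(shunts):
        y[bus][bus] += y_shunt
    return y


def drop_slack(y: List[List[complex]], slack: int = 0) -> List[List[complex]]:
    """Eliminate the row and column of the slack bus."""
    return [
        [value for c, value in enumerate(row) if c != slack]
        for r, row in enumerate(y)
        if r != slack
    ]


# Hypothetical 3-bus loop with bus 0 as the slack bus and no shunt elements.
branches = [(0, 1, 1 - 2j), (1, 2, 2 - 4j), (0, 2, 1 - 3j)]
shunts = [0j, 0j, 0j]
y_full = admittance_matrix(3, branches, shunts)
y_reduced = drop_slack(y_full)
det_reduced = y_reduced[0][0] * y_reduced[1][1] - y_reduced[0][1] * y_reduced[1][0]

# Outage of branch (1, 2): its row of A becomes zero, i.e., the branch is dropped.
y_outage = admittance_matrix(3, [b for b in branches if b[:2] != (1, 2)], shunts)
```

Without shunt elements every row of the full matrix sums to zero, so the full matrix is singular, while the reduced matrix in this example has a non-zero determinant, which is the behavior Lemma 1 describes.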
In the following, we consider the eliminated admittance matrix and keep the notation unchanged for convenience. Based on Lemma 1, the relationship between voltage increments $\Delta V_G$ and current increments $\Delta I_G$ can be expressed as
$$\Delta V_G = Y_G^{-1} \Delta I_G = Z_G \Delta I_G. \tag{2}$$
To derive the distribution of $\Delta V_G$, we further introduce a common assumption regarding $\Delta I_G$, referred to as (3): $\Delta I$ at each non-slack bus is independent and normally distributed. This statement is adopted and validated by real data in many works [23]-[25], where the authors computed the mutual information between current injections to justify the independence. See also the empirical histogram in Fig. 2.
Theorem 1. Provided that (3) holds, $\Delta V_G$ (excluding the slack bus) in a connected distribution grid $(G, E)$ 1) follows a multivariate Gaussian distribution, and 2) still follows a multivariate Gaussian distribution (with different mean and covariance) after the grid topology changes.
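Proof. With (2), $\Delta V_i$ of each bus $i \in G$ (the slack bus is excluded from $G$) can be expressed as $\Delta V_i = \sum_{k \in G} Z_{ik} \Delta I_k$, where $Z_{ik}$ is the $(i, k)$ element of $Z_G$. Hence, any non-trivial linear combination of $\Delta V_i$, $i \in G$, can also be represented by a linear combination of $\Delta I_k$, $k \in G$, and is normally distributed. This implies that the joint distribution of $\Delta V_i$, $i \in G$, is multivariate Gaussian,
$$\Delta V_G \sim \mathcal{N}(\mu, \Sigma), \tag{4}$$
where the mean $\mu$ and the covariance $\Sigma$ are determined by $Z_G$ and the distribution of $\Delta I_G$. When the grid topology is changed (e.g., due to a line outage), the incidence matrix $A_{E,G}$ changes accordingly: if branch $l$ connecting bus $i$ and bus $k$ is out of service, $(A_{E,G})_{l,i}$ and $(A_{E,G})_{l,k}$ become zero. Denoting the new incidence matrix as $\tilde{A}_{E,G}$, there are two scenarios:
• The grid network is still connected (which is our focus in this paper). In this case, the changed admittance matrix $\tilde{Y}_G$, built from $\tilde{A}_{E,G}$ via (1), is still invertible by Lemma 1. Hence, with $\tilde{Z}_G = \tilde{Y}_G^{-1}$, $\Delta V_G$ still follows a multivariate Gaussian distribution, only with a different mean $\tilde{\mu}$ and a different covariance $\tilde{\Sigma}$ calculated according to $\tilde{Z}_G$.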
• The grid network is disconnected. In this case, we view the network as disjoint islands where each part is a connected sub-network. By doing so, we can write the incidence matrix in block format, with one block per island. According to the first case, voltage increments in each sub-network follow a multivariate Gaussian distribution, and so does their joint distribution. In this scenario, since some houses lose their power connection and will have zero voltages, the outage time and location can be more easily found via our approach.
Suppose the outage occurs at time $\lambda$. Theorem 1 allows us to write the sequence of voltage increments as
$$\Delta v[n] \sim g = \mathcal{N}(\mu_0, \Sigma_0), \ n = 1, \dots, \lambda - 1; \qquad \Delta v[n] \sim f = \mathcal{N}(\mu_1, \Sigma_1), \ n = \lambda, \dots, N, \tag{5}$$
where $g$ denotes the pre-outage Gaussian distribution and $f$ denotes the post-outage Gaussian distribution. The mean vectors $\mu_0$, $\mu_1$, along with the covariance matrices $\Sigma_0$, $\Sigma_1$, are the parameters of these distributions. In our work, the pre-outage parameters $\mu_0$ and $\Sigma_0$ can be estimated using historical data during normal operation periods of the distribution grid [3]. The post-outage parameters $\mu_1$ and $\Sigma_1$ are considered unknown to reflect real-world outage scenarios, since the outage pattern is unpredictable. The varying distribution of the sequence is illustrated by the pre-outage and post-outage data in Fig. 1(b).

B. Outage Identification via Distribution Change
Before proposing our novel solution to the unknown outage pattern, we present the commonly used framework to find the outage time $\lambda$ and the outage branch, given the voltage data in (5).
To identify the outage time $\lambda$, we conduct a sequential hypothesis test $H_0: \lambda > N$ and $H_1: \lambda \le N$ at every time step $N$. As $N$ increases, the first time we reject the null hypothesis $H_0$ determines the value of $\lambda$. To decide when to reject $H_0$, we compute the posterior probability ratio at each time step $N$ as
$$\Lambda(\Delta v^{1:N}) = \frac{P(\lambda \le N \mid \Delta v^{1:N})}{P(\lambda > N \mid \Delta v^{1:N})} = \frac{\sum_{k=1}^{N} \pi(k) \prod_{n=1}^{k-1} g(\Delta v[n]) \prod_{n=k}^{N} f(\Delta v[n])}{\sum_{k=N+1}^{\infty} \pi(k) \prod_{n=1}^{N} g(\Delta v[n])}, \tag{6}$$
where $\lambda \in \mathbb{N}$ is assumed to follow a prior distribution $\pi$. The posterior probability ratio in (6) compares the probabilities of "outage occurred ($\lambda \le N$)" and "outage did not occur ($\lambda > N$)" based on the historical measurements $\Delta v^{1:N}$. A larger posterior probability ratio indicates that "outage occurred" is more likely than "outage did not occur." Therefore, we declare the outage when the ratio (6) exceeds a predefined threshold. By the Shiryaev-Roberts-Pollak procedure [12], [16], the threshold in Theorem 2 optimally balances the trade-off between the false alarm rate and the detection delay.
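Theorem 2. (Line outage detection). When $\lambda$ follows a geometric prior $\mathrm{Geo}(\rho)$, we declare the outage at the first time when the posterior probability ratio $\Lambda(\Delta v^{1:N})$ surpasses a threshold $B$:
$$\tau = \inf\{N \ge 1 : \Lambda(\Delta v^{1:N}) \ge B\}, \tag{7}$$
where $B$ is set such that the false alarm rate $P(\tau < \lambda)$ is upper bounded by the maximal false alarm rate $\alpha$. As $\alpha \to 0$, $\tau$ in (7) is asymptotically optimal for minimizing the average detection delay
$$\mathbb{E}[\tau - \lambda \mid \tau \ge \lambda] \approx \frac{|\log \alpha|}{-\log(1 - \rho) + D_{KL}(f \| g)}, \tag{8}$$
where $D_{KL}(f \| g)$ is the KL divergence between $f$ and $g$.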
One notable feature of the detection procedure described above is its ability to function effectively without requiring knowledge of the grid topology. Additionally, it can handle non-Gaussian distributions for $f$ and $g$. As depicted in Fig. 1(c), we calculate the posterior probability ratio sequentially and identify the outage time when the ratio exceeds the threshold.
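The sequential computation of the posterior probability ratio can be sketched as follows. This is a minimal scalar illustration under stated assumptions rather than the paper's implementation: it assumes a geometric prior $\mathrm{Geo}(\rho)$, for which the ratio admits the recursion $\Lambda_N = (\Lambda_{N-1} + \rho)\, f(\Delta v[N]) / \big((1-\rho)\, g(\Delta v[N])\big)$ with $\Lambda_0 = 0$, and it uses the classical Shiryaev threshold $(1-\alpha)/\alpha$, which may differ from the threshold used in Theorem 2. The function names and the sample data are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def gaussian_log_pdf(x: float, mean: float, var: float) -> float:
    """Log-density of a scalar Gaussian N(mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def detect_outage(
    samples: Sequence[float],
    pre: Tuple[float, float],   # (mean, variance) of the pre-outage density g
    post: Tuple[float, float],  # (mean, variance) of the post-outage density f
    rho: float,                 # parameter of the geometric prior (assumption)
    alpha: float,               # maximal false alarm rate
) -> Optional[int]:
    """Return the first 1-indexed time step at which the ratio crosses the threshold."""
    threshold = (1.0 - alpha) / alpha  # classical Shiryaev threshold (assumption)
    ratio = 0.0
    for n, x in enumerate(samples, start=1):
        lik_ratio = math.exp(gaussian_log_pdf(x, *post) - gaussian_log_pdf(x, *pre))
        ratio = (ratio + rho) * lik_ratio / (1.0 - rho)
        if ratio >= threshold:
            return n
    return None


# Hypothetical scalar increments: the distribution changes at time step 6.
data = [0.1, -0.2, 0.0, 0.3, -0.1, 2.1, 1.9, 2.2, 2.0, 1.8]
alarm_time = detect_outage(data, pre=(0.0, 1.0), post=(2.0, 1.0), rho=0.05, alpha=0.01)
```

The alarm is raised a few samples after the change because the ratio has to accumulate enough evidence to cross the threshold, which is the delay that Theorem 2 characterizes.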
Once the line outage occurrence is detected, localizing the out-of-service branch is also critical for system recovery. In [3], the authors proposed an accurate outage localization method by proving that the voltage increments of two disconnected buses are conditionally independent. They computed the conditional correlation of every possible pair of buses in the grid and checked whether the value changes from non-zero to zero. This approach differs from the utilization of nodal electric circuit matrices [26], [27] for estimating the fault location. Our approach is also effective (as shown in Section VI-D) and capitalizes on the learned covariance matrix in scenarios where the post-outage distribution is unknown.
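To estimate the conditional correlation between bus $i$ and bus $k$, the covariance matrix $\Sigma$ is utilized. Let $I := \{i, k\}$ and $K := G \setminus \{i, k\}$. The covariance matrix is decomposed as
$$\Sigma = \begin{bmatrix} \Sigma_{II} & \Sigma_{IK} \\ \Sigma_{KI} & \Sigma_{KK} \end{bmatrix}.$$
Based on this, the conditional correlation $\rho_{ik}$ between bus $i$ and bus $k$ is
$$\rho_{ik} = \frac{\Sigma_{I|K}(1, 2)}{\sqrt{\Sigma_{I|K}(1, 1)\, \Sigma_{I|K}(2, 2)}}, \tag{9}$$
where the conditional covariance is computed by the Schur complement [28] as $\Sigma_{I|K} = \Sigma_{II} - \Sigma_{IK} \Sigma_{KK}^{-1} \Sigma_{KI}$.
Theorem 3. (Line outage localization). The conditional correlation is calculated based on (9) for every pair $(i, k)$ as $\rho_{ik}^{-}$, from the pre-outage covariance $\Sigma_0$, and $\rho_{ik}^{+}$, from the post-outage covariance $\Sigma_1$ (10). The branch between bus $i$ and bus $k$ is out of service if $|\rho_{ik}^{-}| > \delta_{\max}$ and $|\rho_{ik}^{+}| < \delta_{\min}$. The thresholds are set as $\delta_{\max} = 0.5$ and $\delta_{\min} = 0.1$ based on real-world outage data to check whether the correlation changes from a non-zero to a near-zero value.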
According to Theorem 3, we track the change of covariance matrices to localize the out-of-service branch. Specifically, an out-of-service branch between bus $i$ and bus $k$ can be identified if both of the following conditions are met simultaneously: (1) $|\rho_{ik}^{-}| > \delta_{\max}$, indicating the presence of a branch between buses $i$ and $k$ before the outage, and (2) $|\rho_{ik}^{+}| < \delta_{\min}$, indicating the absence of a branch between buses $i$ and $k$ after the outage. Notably, this process still does not need the grid topology as a prior.
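The localization rule can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' code: the helper names (`invert`, `conditional_correlation`, `outaged_pairs`) are hypothetical, the two covariance matrices are synthetic (obtained by inverting hand-picked precision matrices for a 3-bus toy case), and a small Gauss-Jordan routine stands in for a linear algebra library. The conditional covariance is computed with the Schur complement, and the thresholds 0.5 and 0.1 are the ones quoted in Theorem 3.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def invert(matrix: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [float(r == c) for c in range(n)] for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [value / scale for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def conditional_correlation(cov: Matrix, i: int, k: int) -> float:
    """Correlation of buses i and k conditioned on all remaining buses."""
    rest = [b for b in range(len(cov)) if b not in (i, k)]
    inv_kk = invert([[cov[a][b] for b in rest] for a in rest])

    def schur(a: int, b: int) -> float:
        # Entry (a, b) of Sigma_II - Sigma_IK Sigma_KK^{-1} Sigma_KI.
        correction = sum(
            cov[a][r] * inv_kk[p][q] * cov[s][b]
            for p, r in enumerate(rest)
            for q, s in enumerate(rest)
        )
        return cov[a][b] - correction

    return schur(i, k) / math.sqrt(schur(i, i) * schur(k, k))


def outaged_pairs(
    cov_pre: Matrix, cov_post: Matrix, delta_max: float = 0.5, delta_min: float = 0.1
) -> List[Tuple[int, int]]:
    """Bus pairs whose conditional correlation drops from above delta_max to below delta_min."""
    buses = range(len(cov_pre))
    return [
        (i, k)
        for i in buses
        for k in buses
        if i < k
        and abs(conditional_correlation(cov_pre, i, k)) > delta_max
        and abs(conditional_correlation(cov_post, i, k)) < delta_min
    ]


# Synthetic example: the coupling between buses 0 and 2 vanishes after the outage.
cov_pre = invert([[2.5, -0.5, -1.5], [-0.5, 2.5, -0.5], [-1.5, -0.5, 2.5]])
cov_post = invert([[2.5, -0.5, 0.0], [-0.5, 2.5, -0.5], [0.0, -0.5, 2.5]])
flagged = outaged_pairs(cov_pre, cov_post)
```

Only covariance matrices enter the computation, which is consistent with the observation that no grid topology is needed as a prior.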

IV. OUTAGE IDENTIFICATION WITH UNKNOWN PATTERN
The detection and localization procedure in Section III requires knowing all the parameters of $g$ and $f$ in advance. However, this is impractical in real-world distribution grids. Specifically, although we know that $g$ and $f$ are multivariate Gaussian distributions based on Theorem 1, the parameters (mean vectors and covariance matrices) of $f$ are usually hidden. While the pre-outage distribution parameters can be learned from historical measurements during normal grid operation, the post-outage distribution parameters are often unavailable. In fact, since there are a large number of branches and therefore a substantial number of outage pattern combinations, we cannot predict the outage pattern or the post-outage distribution parameters. Hence, we need to estimate the unknown parameters before applying the aforementioned methods to identify the outage.
To resolve this issue, we propose a data-driven framework to learn the post-outage distribution parameters $\theta = (\mu_1, \Sigma_1)$ jointly. Specifically, we want to find the parameter set that minimizes the negative likelihood function $L(\mu_1, \Sigma_1)$ as
$$(\hat{\mu}_1, \hat{\Sigma}_1) = \arg\min_{\mu_1, \Sigma_1} L(\mu_1, \Sigma_1), \tag{11}$$
where $L(\mu_1, \Sigma_1)$ is computed from the observed increments $\Delta v^{1:N}$ as in (12). To address the non-convex nature of the likelihood expressed in (12), the authors in [3] proposed a convex approximation using Jensen's inequality and derived closed-form solutions for (11). However, the use of Jensen's inequality can introduce inaccuracies in the resulting closed-form solutions, particularly in determining the minimum point. Furthermore, the estimated covariance matrix may not always be feasible. Specifically, a feasible covariance matrix must be positive definite, i.e., $\Sigma_1 \succ 0$, and if this condition is not met during the learning process, the computation of the probability density of $f$ can fail.
An alternative approach is to use Gradient Descent (GD) to find the solution to (11). While vanilla GD also cannot ensure the aforementioned feasibility of the parameters, the iterative learning nature of GD enables us to control the updating trajectory of the parameters.

A. Unknown Parameters Estimation via Projected Gradient Descent with Bregman Divergence Constraint
To guarantee that the estimates of the parameters $\theta = \{\mu_1, \Sigma_1\}$ are always feasible, we introduce the Bregman divergence [21] to constrain the estimate in each iteration of GD and arrive at a series of optimization problems, given in (13), each of which trades off the negative likelihood against the Bregman divergence $D_\Phi(\theta_i, \theta_i^{(e)})$ from the previous estimate. In (13), $\theta_i$ is the update of the $i$-th parameter at the $e$-th iteration, $\theta_{-i}^{(e)}$ is the complement set (the remaining parameters at the $e$-th iteration), and $\eta$ is the trade-off learning rate. The Bregman divergence provides a distance measurement between the two variables $\theta_i$ and $\theta_i^{(e)}$, where $\Phi$ is a strictly convex differentiable function. Intuitively, it restricts the new estimate $\theta_i$ to stay close to $\theta_i^{(e)}$. If $\theta_i^{(e)}$ is within the feasible domain, we can expect that the parameter $\theta_i$ following (13) will update towards the direction of minimizing the negative likelihood and meanwhile satisfy a similar property (e.g., positive definiteness of the covariance matrix), since the update step is restricted.
Finding the solution to (13) relies on one characteristic of the Bregman divergence: its gradient with respect to $\theta_i$ has the simple form
$$\nabla_{\theta_i} D_\Phi(\theta_i, \theta_i^{(e)}) = \nabla\Phi(\theta_i) - \nabla\Phi(\theta_i^{(e)}),$$
where $\nabla\Phi$ is the differential of $\Phi$. Based on this, we can eliminate the argmin in (13) by setting the gradient (with respect to $\theta_i$) of the objective function to zero, and derive the following Projected Gradient Descent solution [29].
Lemma 2. The optimization problem in (13) is solved in closed form by the update (14).
With Lemma 2, we propose the main result of our paper: the learning scheme of unknown parameters with a feasibility guarantee. We will further show that this learning scheme is accurate and has convergence guarantees.
Theorem 4. (Projected Gradient Descent of learning $\mu_1$, $\Sigma_1$). With careful customization of the Bregman divergence (i.e., choosing the appropriate function $\Phi$), the Projected Gradient Descent learning in (14) becomes feasible.
The learning scheme is given by (15) for the mean and (16) for the covariance, where the covariance update takes the form
$$\Sigma_1^{(e+1)} = \exp\big(\log \Sigma_1^{(e)} - \eta \nabla_{\Sigma_1} L\big). \tag{16}$$
The learning process of $\Sigma_1$ is shown in Fig. 3. Since the matrix exponential maps any symmetric matrix to a positive definite matrix, the learning scheme (16) maintains the property of positive definiteness, i.e., $\Sigma_1^{(e)} \succ 0$ for every iteration $e$. Besides the statistical properties (e.g., the covariance matrix is positive definite), smart meter data have physical properties as well due to grid operation. For example, because the standard range of voltage magnitude is between 0 p.u. and 1.1 p.u., the mean value of the voltage increment should be between −1.1 p.u. and 1.1 p.u. Our learning scheme in (14) can also satisfy this requirement by choosing the function $\Phi$ for the mean accordingly, which yields the update (17), where $\mu_i$ is the $i$-th element of the mean vector $\mu_1$.
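The feasibility argument can be illustrated in one dimension, where the covariance reduces to a variance and the matrix exponential and logarithm reduce to their scalar counterparts. The sketch below is an assumption-laden illustration, not the paper's scheme: it uses the per-sample Gaussian negative log-likelihood $\frac{1}{2}\log s + \frac{S}{2s}$ with sample variance $S$ as a stand-in for $L$, and an exponentiated update $s \leftarrow \exp(\log s - \eta\, \partial L / \partial s)$ as the scalar analogue of a log/exp-based covariance update. The function names and numerical values are hypothetical.

```python
import math


def nll_grad_var(var: float, sample_var: float) -> float:
    """Derivative of 0.5*log(var) + sample_var/(2*var) with respect to var."""
    return (var - sample_var) / (2.0 * var ** 2)


def plain_gd_step(var: float, sample_var: float, lr: float) -> float:
    """Vanilla gradient step; nothing prevents the result from becoming negative."""
    return var - lr * nll_grad_var(var, sample_var)


def exp_gd_step(var: float, sample_var: float, lr: float) -> float:
    """Gradient step taken in the log domain; the result is positive by construction."""
    return math.exp(math.log(var) - lr * nll_grad_var(var, sample_var))


# One aggressive step from var = 0.5 toward a sample variance of 0.1.
infeasible = plain_gd_step(0.5, 0.1, lr=1.0)  # leaves the feasible domain
feasible = exp_gd_step(0.5, 0.1, lr=1.0)      # stays positive

# With a small learning rate, the exponentiated update approaches the sample variance.
var = 0.5
for _ in range(2000):
    var = exp_gd_step(var, 0.1, lr=0.005)
```

The vanilla step produces a negative variance, for which the Gaussian density is undefined, while the exponentiated step cannot leave the positive half-line regardless of the step size.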
To conclude, when the post-outage distribution parameters $\mu_1$, $\Sigma_1$ are unknown, we use (17) and (16) to accurately learn them with feasibility and convergence guarantees. The pre-outage parameters $\mu_0$ and $\Sigma_0$ can be estimated using historical data during normal operation periods of the distribution grid [3]. By obtaining these parameters of the Gaussian density functions $g$ and $f$ in (6), we can explicitly calculate $g$ and $f$. This enables us to implement Theorem 2 for detecting the outage time and to use Theorem 3 for localizing the outage branch. This framework is summarized in Algorithm 1.
Algorithm 1 (outline): at each time step $N$, initialize $\mu_1$, $\Sigma_1$ from the $(N-1)$-th step as a warm start; for $e = 0, 1, \dots$, update them via (17) and (16) until the stopping tolerance is met; then apply Theorem 2 and Theorem 3.

V. PERFORMANCE GUARANTEE
In addition to the feasibility issue that has already been addressed in Theorem 4, the accuracy and computation time of the proposed learning scheme are two other concerns when implementing such a method for real-world outage identification. In this section, we demonstrate that our proposed method can achieve the optimal parameter solution with guaranteed convergence. Furthermore, we present an efficient implementation for timely operation.

A. Restricted Convexity for Convergence Guarantee
While the non-convexity of the likelihood $L(\mu_1, \Sigma_1)$ in (12) hinders us from deriving a convergence analysis directly, we note that $L(\mu_1, \Sigma_1)$ is convex on a constrained set. Specifically, we notice that the unknown parameters $\mu_1$, $\Sigma_1$ of $f$ are not supposed to be far away from the known parameters $\mu_0$, $\Sigma_0$ of $g$, which frees us from searching the entire parameter space. This is because the alternative power supply makes the impact of the line outage less severe. In fact, if $\mu_1$, $\Sigma_1$ were significantly far from $\mu_0$, $\Sigma_0$, distinguishing between $f$ and $g$ would become trivial: imagining a large leap from pre-outage data to post-outage data in Fig. 1(b), one can detect this change very easily. Hence, a reasonable assumption is that $\mu_1$, $\Sigma_1$ are relatively close to $\mu_0$, $\Sigma_0$, which actually results in a much harder detection problem. With this assumption, we can restrict our search for parameters to a constrained set where $L(\mu_1, \Sigma_1)$ has good properties. To formally present this, we introduce the restricted convexity [30].
Then, we show that $L(\mu_1, \Sigma_1)$ satisfies the restricted convexity in Definition 1. Specifically, $L(\mu_1)$ is restricted convex on a constrained set of parameters. Based on this property, we derive in Theorem 5 the convergence of updating $\mu_1$ and $\Sigma_1$.
Theorem 5 bounds the number of iterations required for this convergence. The proof is in Appendix B. Moreover, since the best update converges faster, as shown in Section VI, we choose it as the output of the learning scheme in Algorithm 1. To better visualize how the parameters are updated in the restricted convex area via Projected Gradient Descent (PGD), we provide Fig. 4. As we see, although the likelihood function $L$ is not convex in the entire parameter space, the restricted area is convex, opening the door for learning accurate and feasible parameter solutions to (11).

B. Acceleration for Timely Operation
Theorem 5 shows that our proposed method can find the optimal parameters with polynomial-time complexity, which enables quick operation. In this subsection, we provide an efficient implementation of the learning scheme to further accelerate the algorithm for timely outage identification. To this end, we notice that while the matrix exponential and logarithm operations in (16) provide good properties for covariance estimation, they are very time-consuming to calculate, because the calculation is often based on their infinite Taylor series. To accelerate these operations, we propose to use finitely many terms of their Taylor series as an approximation.
The matrix exponential is given by the power series in (18), and can be approximated by its first $K_{\exp}$ terms since $\frac{1}{k!}$ decreases drastically when $k$ becomes large. However, once we replace the original matrix exponential operation with the approximated operation $\widetilde{\exp}$, we need to verify that $\widetilde{\exp}$ also maps any symmetric matrix to a positive definite matrix to satisfy the conclusion in Theorem 4. Otherwise, we will arrive at covariance estimates outside the feasible domain. With this motivation, we show in Lemma 3 that if we choose an appropriate value of $K_{\exp}$ for the approximation, the operation $\widetilde{\exp}$ has similar properties as $\exp$.
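Lemma 3. The matrix exponential can be approximated as
$$\widetilde{\exp}(X) = \sum_{k=0}^{K_{\exp}} \frac{1}{k!} X^k$$
for any real matrix $X$. Also, $\widetilde{\exp}(X) \succ 0$ for any symmetric $X$ if $K_{\exp}$ is even and $K_{\exp} > \max\{0, -a_{\min}\}$, where $a_{\min}$ is the smallest eigenvalue of $X$.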
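The role of the parity of $K_{\exp}$ can be checked numerically on a small matrix. The sketch below is illustrative only: the helper names and the test matrix are hypothetical, the truncation is taken to include powers up to order $K_{\exp}$, and positive definiteness is tested with Sylvester's criterion for the 2x2 case.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product for small dense matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def truncated_expm(x: Matrix, order: int) -> Matrix:
    """Approximate exp(X) by sum_{k=0}^{order} X^k / k!."""
    n = len(x)
    term = [[float(r == c) for c in range(n)] for r in range(n)]  # X^0 / 0!
    total = [row[:] for row in term]
    for k in range(1, order + 1):
        term = [[value / k for value in row] for row in matmul(term, x)]  # X^k / k!
        total = [[t + v for t, v in zip(t_row, v_row)] for t_row, v_row in zip(total, term)]
    return total


def is_positive_definite_2x2(m: Matrix) -> bool:
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return m[0][0] > 0 and m[0][0] * m[1][1] - m[0][1] * m[1][0] > 0


# Symmetric test matrix with eigenvalues 1 and -3, so -a_min = 3.
x = [[-1.0, 2.0], [2.0, -1.0]]
odd_order = truncated_expm(x, 3)   # odd truncation order
even_order = truncated_expm(x, 4)  # even truncation order, larger than -a_min
```

In this example the odd-order truncation has a negative eigenvalue and would yield an infeasible covariance estimate, whereas the even-order truncation satisfying the condition of Lemma 3 remains positive definite.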
Similarly, the matrix logarithm is given by the power series in (19), and can be approximated by the first $K_{\log}$ terms.
Lemma 4. The matrix logarithm can be approximated by the operation $\widetilde{\log}$, obtained by truncating the power series in (19) to its first $K_{\log}$ terms.
In order to make Theorem 4 still hold true when we use the approximated operation $\widetilde{\log}$, we only need the symmetry of $\widetilde{\log}$, which can be easily verified as $\widetilde{\log}(X)^\top = \widetilde{\log}(X^\top)$. In summary, the two proposed approximation operations $\widetilde{\exp}$ and $\widetilde{\log}$ still preserve the feasibility of the covariance matrix estimation. The choice of $K_{\exp}$ and $K_{\log}$ is a trade-off between execution time and approximation accuracy: when $K_{\exp}$ is small, the operation $\widetilde{\exp}$ is very fast but provides a poor approximation of $\exp$, and vice versa. In Section VI, we will demonstrate how we choose an appropriate approximation level, which results in almost zero errors and over 75% reduction in execution time.
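The accuracy side of this trade-off can be sketched with a round trip through both truncated operations. The example rests on an explicit assumption: the logarithm series is taken to be the standard expansion around the identity, $\sum_{k \ge 1} \frac{(-1)^{k+1}}{k}(X - I)^k$, which converges when the spectral radius of $X - I$ is below one and may differ from the series used in (19). The helper names and the test matrix are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product for small dense matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def identity(n: int) -> Matrix:
    return [[float(r == c) for c in range(n)] for r in range(n)]


def truncated_expm(x: Matrix, order: int) -> Matrix:
    """Approximate exp(X) by sum_{k=0}^{order} X^k / k!."""
    term = identity(len(x))
    total = identity(len(x))
    for k in range(1, order + 1):
        term = [[value / k for value in row] for row in matmul(term, x)]
        total = [[t + v for t, v in zip(t_row, v_row)] for t_row, v_row in zip(total, term)]
    return total


def truncated_logm(x: Matrix, order: int) -> Matrix:
    """Approximate log(X) by sum_{k=1}^{order} (-1)^(k+1) (X - I)^k / k (assumed series)."""
    n = len(x)
    shifted = [[x[r][c] - float(r == c) for c in range(n)] for r in range(n)]
    power = identity(n)
    total = [[0.0] * n for _ in range(n)]
    for k in range(1, order + 1):
        power = matmul(power, shifted)  # (X - I)^k
        sign = 1.0 if k % 2 == 1 else -1.0
        total = [[t + sign * p / k for t, p in zip(t_row, p_row)] for t_row, p_row in zip(total, power)]
    return total


def round_trip_error(x: Matrix, k_log: int, k_exp: int = 12) -> float:
    """Largest entry-wise gap between X and exp~(log~(X))."""
    back = truncated_expm(truncated_logm(x, k_log), k_exp)
    return max(abs(b - v) for b_row, x_row in zip(back, x) for b, v in zip(b_row, x_row))


# Hypothetical covariance-like matrix close to the identity.
sigma = [[1.2, 0.1], [0.1, 0.9]]
coarse_error = round_trip_error(sigma, k_log=2)
fine_error = round_trip_error(sigma, k_log=20)
log_sigma = truncated_logm(sigma, 20)
```

A coarse truncation is cheaper but leaves a visible reconstruction error, while a finer one recovers the matrix almost exactly, and the truncated logarithm of a symmetric matrix stays symmetric in both cases.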

VI. VALIDATE ON EXTENSIVE OUTAGE SCENARIOS WITH REAL-WORLD DATA
This section shows how our proposed method performs in various distribution grids with real-world data. To evaluate our method in systems with different sizes and environments, we design extensive experiments on the IEEE 8-bus and IEEE 123-bus networks [31], as well as two European representative distribution systems: a medium voltage (MV) network in an urban area and a low voltage (LV) network in a suburban area [32]. In each network, bus 1 is selected as the slack bus.
To account for more complex outage scenarios in real-world distribution grids, we examine situations where alternative power sources are available after a line outage. In these scenarios, the "last gasp" notification is ineffective, making it more difficult to detect the line outage. We simulate the following two representative scenarios to replicate this complex setting. It should also be noted that if certain buses are disconnected from the main grid and experience a voltage magnitude of zero following an outage, our method can accurately and quickly identify the out-of-service line. This is a simpler case compared to the ones we simulate below.
• Mesh networks where most buses have non-zero voltages after the outage since they can receive power from alternative branches. A mesh network often depicts the outage scenario in urban areas. To simulate mesh networks, we add loops in each aforementioned network to ensure it is still connected after line outages, following the studies in [3] and [25].
• Radial networks with DERs where some buses still receive power from DERs though they are isolated from the main grid after an outage (see Fig. 1(a)). This outage scenario is typical in residential areas. To simulate DERs, we select multiple buses to have solar power panels with batteries as the storage. For the solar panel, we use the power generation profile computed by the PVWatts Calculator [33].
To simulate more realistic data, we use the real residential power profile from Duquesne Light Company (DLC) in Pittsburgh, PA, USA. This profile contains anonymized and secure hourly (and 15-minute) smart meter readings of active power for more than 5,000 houses in the year 2016. The basic statistics of this dataset are summarized in Table I. The time-series voltage data are simulated by the MATLAB Power System Simulation Package (MATPOWER) in MATLAB R2022b. In every distribution network, we assign active power p_i[n] from the above DLC power profile to each bus i at time n. The reactive power q_i[n] is computed according to a randomly generated power factor pf_i[n], which follows a uniform distribution, e.g., pf_i[n] ∼ Unif(0.9, 1). Based on the active and reactive power, we use MATPOWER to solve the power flow equations and obtain voltage measurements. Moreover, we can simulate an outage scenario by setting the admittance of a branch or several branches to zero. Hence, we can generate the voltage data during both normal operation and outage scenarios. After we obtain voltage data from MATLAB, all the remaining calculations for outage detection in Algorithm 1 are implemented in Python 3.8 on a personal computer with a Windows 10 operating system, an Intel Core i7 processor clocked at 2.2 GHz, and 16 GB of RAM.
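The load-generation step described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a lagging power factor is assumed so that q = p · tan(arccos(pf)), and the power flow itself (solved with MATPOWER in the paper) is not reproduced here.

```python
import math
import random


def reactive_power(p, pf):
    """Reactive power for active power p and power factor pf.

    Assumption: lagging power factor, so q = p * tan(arccos(pf)).
    """
    return p * math.tan(math.acos(pf))


def simulate_loads(active_profile, rng, pf_low=0.9, pf_high=1.0):
    """Return (p, q) pairs with pf drawn from Unif(pf_low, pf_high) per sample."""
    loads = []
    for p in active_profile:
        pf = rng.uniform(pf_low, pf_high)
        loads.append((p, reactive_power(p, pf)))
    return loads
```

Each (p, q) pair would then be passed to a power flow solver to obtain the bus voltages at that time step.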
Due to the limited deployment of PMUs in practice, voltage phase angles are hard to obtain. Hence, as mentioned in Section II, we only use the voltage magnitude in the following experiments even though we model the voltage data in its phasor form. Another concern is the high dimensionality of the data in large-scale grids. To resolve this computational issue, we apply the whitening transformation ∆v_{1:N} → W ∆v_{1:N} to our data, where the PCA whitening matrix W satisfies W^⊤ W = Σ_0^{−1}. Since the whitening transformation does not change the KL divergence between g and f, it has no impact on the outage detection performance.
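The invariance claim can be checked with a short derivation. The sketch below assumes that Σ_0 is positive definite with eigendecomposition Σ_0 = U Λ U^⊤, so that the PCA whitening matrix W = Λ^{−1/2} U^⊤ is invertible; the symbols U, Λ, f_W, and g_W are introduced here for illustration only.

```latex
% Assumption: \Sigma_0 = U \Lambda U^\top is positive definite, so W is invertible.
W = \Lambda^{-1/2} U^\top
\;\Longrightarrow\;
W^\top W = U \Lambda^{-1/2} \Lambda^{-1/2} U^\top = U \Lambda^{-1} U^\top = \Sigma_0^{-1}.

% Densities of y = W x: the Jacobian factor |\det W|^{-1} is shared by f and g.
f_W(y) = \frac{f(W^{-1} y)}{|\det W|}, \qquad
g_W(y) = \frac{g(W^{-1} y)}{|\det W|}.

% The factor cancels in the ratio, and the change of variables y = W x gives
D_{\mathrm{KL}}(f_W \,\|\, g_W)
= \int f_W(y) \log \frac{f_W(y)}{g_W(y)} \, dy
= \int f(x) \log \frac{f(x)}{g(x)} \, dx
= D_{\mathrm{KL}}(f \,\|\, g).
```

The same argument applies to any invertible linear map, which is why whitening changes the scale of the data but not the separability of the pre- and post-outage distributions.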
In the subsequent experiments, we compare our proposed method with various baselines. When the post-outage distribution f is fully known, we refer to the optimal Bayesian procedure as "f known". When the parameters of f are unknown, our method is referred to as PGD. As baseline methods specifically designed for outage detection with an unknown post-outage distribution, we consider an approximated maximum likelihood estimation (MLE) proposed to learn the unknown parameters [3], [34], a generalized likelihood ratio test (GLRT) that only considers finitely many possible post-outage distributions f [35], and a Shewhart test [36] that utilizes mean-shift and covariance changes in the data to detect outages. As methods developed for an unknown post-change distribution in change-point detection (CPD), we consider a non-parametric binned generalized statistic (BGS) proposed to approximate the original ratio test in classic CPD [37], a non-parametric uncertain likelihood ratio (ULR) proposed to replace the original ratio [38], a distributed approach (DIS) [1], and a deep Q-network approach (DCQ) [39].
For a more robust evaluation, each experiment is conducted as a Monte Carlo simulation with over 1,000 replications. In every replication, we randomly simulate the outage time λ from the geometric distribution Geo(ρ). This geometric prior is based on our belief that outages can occur independently at any time step with an equal probability of ρ. We choose ρ = 0.04 in our experiments, which is derived from historical outage data and indicates that each time step has a 4% chance of experiencing a line outage.
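Drawing the outage time from the geometric prior can be sketched with inverse-transform sampling. The function name is hypothetical, and the support {1, 2, ...} is an assumption, since the convention used for Geo(ρ) is not stated here.

```python
import math
import random


def sample_outage_time(rho, rng):
    """Draw lam with P(lam = k) = rho * (1 - rho)**(k - 1) for k = 1, 2, ...

    Assumption: the geometric prior is supported on {1, 2, ...}.
    """
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return max(1, math.ceil(math.log(u) / math.log(1.0 - rho)))
```

With ρ = 0.04, the sampled outage times have a mean of about 1/ρ = 25 time steps.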

A. Parameter Estimation with Accuracy and Convergence
Prior to demonstrating the accurate identification of outages with unknown post-outage distribution parameters, we must first verify that our method can learn the optimal parameters with guaranteed convergence. Throughout the parameter learning iterations, we plot the Euclidean distance between the best update and the ground truth in Fig. 5. The plot indicates that our learning process converges to the ground truth, thereby verifying the convergence conclusion stated in Theorem 5.
Fig. 5. Distance between best update and ground truth against iterations.
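The quantities tracked in this convergence check can be illustrated on a toy problem. The sketch below is not the paper's PGD update: it runs plain gradient descent, without the Bregman divergence term, on a hypothetical quadratic loss, and it only shows how the best update, the averaged update, and the distance to the ground truth could be recorded with step size η = 1/√E.

```python
import math


def loss(theta, target):
    """Toy convex loss (an assumption): half the squared distance to target."""
    return 0.5 * sum((t - s) ** 2 for t, s in zip(theta, target))


def grad(theta, target):
    return [t - s for t, s in zip(theta, target)]


def run_descent(theta0, target, num_iters):
    """Gradient descent with step size 1 / sqrt(num_iters)."""
    eta = 1.0 / math.sqrt(num_iters)
    theta = list(theta0)
    iterates, distances = [], []
    for _ in range(num_iters):
        iterates.append(list(theta))
        distances.append(math.dist(theta, target))  # distance to ground truth
        g = grad(theta, target)
        theta = [t - eta * gi for t, gi in zip(theta, g)]
    # Best update: the iterate with the smallest loss.
    best = min(iterates, key=lambda th: loss(th, target))
    # Averaged update: the coordinate-wise mean of all iterates.
    avg = [sum(col) / num_iters for col in zip(*iterates)]
    return best, avg, distances, iterates
```

Plotting `distances` against the iteration index gives a curve of the same kind as the one described for Fig. 5, here for the toy loss only.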

B. Outage Detection with Small Delay and Rare False Alarm
After evaluating the effectiveness of using PGD to learn the unknown parameters, we then verify the performance of outage detection using such learned parameters.
The first criterion to evaluate our detection procedure is the average detection delay. To validate the asymptotic optimality of the detection delay in Theorem 2, in Fig. 6 we plot the average delay E(τ − λ | τ ≥ λ) divided by |log α| together with the theoretical lower bound determined by −log(1 − ρ) + D_KL(f || g). We observe that the average detection delay when f is known and that of PGD both achieve the optimal lower bound asymptotically, while the delay of PGD is slightly higher. Moreover, using PGD and accelerated PGD to learn the unknown post-outage distribution statistics enables quicker line outage detection than MLE.
The detection rule in Theorem 2 also keeps the false alarm rate below the maximum tolerance α. To verify this, we calculate the empirical false alarm rate P(τ < λ) and compare it against the upper bound α, as shown in Fig. 7. Our proposed method performs similarly to the case when f is known, since the empirical false alarm rate is mainly below the upper bound α (especially when α → 0). This observation demonstrates that our proposed algorithm can quickly detect line outages with a low false alarm rate, even when the post-outage distribution statistics are unknown.
In Table II, we summarize the performance of our proposed method in various grid systems under different outage configurations. Our method handles diverse outage scenarios in both mesh networks and radial networks with DER penetration. Specifically, when f is unknown, our method exhibits a lower detection delay and a significantly lower false alarm rate than MLE. Compared with the benchmark in which f is given, our proposed method experiences only a slight degradation. Furthermore, Table II reveals two additional phenomena. First, when multiple branches are out of service simultaneously, the average detection delay is shorter than in a single-line outage scenario because the KL distance between the distributions g and f is larger when multiple lines are disconnected. Second, in the radial network with more simulated DERs, it takes more time to detect the line outage because the KL distance between g and f is smaller in this case.
To compare with more relevant methods in the literature, we provide in Table III the detection performance of our proposed method and other methods. The comparison of average detection delay and false alarm rate shows that our method is only slightly degraded from the benchmark even though we have incomplete information, and that it outperforms other methods that also have incomplete information. The reason for this is our performance guarantee, stated in Theorem 5, which ensures the accurate estimation of the unknown post-outage distribution parameters. Furthermore, comparing our approach (PGD) with the machine-learning-based method (DCQ), we notice that the latter displays a greater variance in the average detection delay and false alarm rate. This can be attributed to the fact that the neural network's parameters are randomly initialized during training, leading to a more varied estimation of the unknown post-outage distribution parameters.
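The two criteria above, average detection delay and empirical false alarm rate, can be estimated with a small Monte Carlo simulation. The sketch below is a simplified stand-in, not the paper's procedure: it assumes scalar data with known g = N(0, 1) and f = N(mu_post, 1), uses the classical Shiryaev posterior recursion with threshold 1 − α, and compares against the classical first-order asymptotic delay |log α| / (−log(1 − ρ) + D_KL(f || g)). All names and parameter values are hypothetical.

```python
import math
import random


def run_once(rho, alpha, mu_post, rng, max_steps=100_000):
    """Simulate one data stream and return (tau, lam)."""
    # Outage time lam ~ Geo(rho) on {1, 2, ...}.
    lam = 1
    while rng.random() >= rho:
        lam += 1
    posterior = 0.0  # posterior probability that the outage has occurred
    for n in range(1, max_steps + 1):
        mean = mu_post if n >= lam else 0.0
        x = rng.gauss(mean, 1.0)
        # Likelihood ratio f(x) / g(x) for f = N(mu_post, 1), g = N(0, 1).
        lr = math.exp(mu_post * x - 0.5 * mu_post ** 2)
        prior = posterior + (1.0 - posterior) * rho
        posterior = prior * lr / (prior * lr + (1.0 - prior))
        if posterior >= 1.0 - alpha:
            return n, lam
    return max_steps, lam


def evaluate(rho=0.04, alpha=0.01, mu_post=1.0, runs=2000, seed=0):
    """Return (average delay, false alarm rate, asymptotic delay)."""
    rng = random.Random(seed)
    delays, false_alarms = [], 0
    for _ in range(runs):
        tau, lam = run_once(rho, alpha, mu_post, rng)
        if tau < lam:
            false_alarms += 1
        else:
            delays.append(tau - lam)
    avg_delay = sum(delays) / len(delays)
    kl = 0.5 * mu_post ** 2  # D_KL(f || g) for unit-variance Gaussians
    asymptotic = abs(math.log(alpha)) / (-math.log(1.0 - rho) + kl)
    return avg_delay, false_alarms / runs, asymptotic
```

Calling `evaluate()` for a range of α values and plotting the returned false alarm rate against α would give a plot of the same kind as the one described for Fig. 7, for this toy setting only.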

C. Analysis of Execution Time for Timely Operation
In addition to the detection delay and the false alarm rate, the execution time of the proposed method is also critical for timely detection. In our experiments, less than 3 seconds per sample is needed to obtain the outage detection result when we receive a new sample, even for grid systems with more than 100 buses. This execution time is negligible compared to the normal smart meter sampling interval, which ranges from 1 minute to 1 hour. Moreover, since the most time-consuming part of our algorithm is the matrix exponential and matrix logarithm operations, we can accelerate the algorithm by approximating these operations based on their Taylor series expansions, as discussed in Section V-B.
To maintain the detection performance, we select an approximation level that incurs near-zero error. In Fig. 8, we choose K_exp = 12 because, at this approximation level, the execution time is reduced by more than 75% with almost zero error incurred. Similarly, we choose K_log = 16, which is slightly larger than K_exp since the term 1/k in (19) decreases more slowly than the term 1/k! in (18) as k becomes large. As a result, the accelerated PGD shows only a slight performance degradation, as shown in Figs. 6 and 7. More importantly, in Table IV, the acceleration technique reduces the execution time by more than half, thus achieving more timely outage detection. Evidently, the acceleration technique becomes more valuable in distribution systems with smart meters of a higher sampling rate, which is the trend of the future.
Table IV exhibits another phenomenon: as the sampling rate increases, the processing time for the accumulated data ∆v_{1:N} also increases. Consequently, running Algorithm 1 becomes challenging when N grows very large. To address this issue, we found that a small window of historical samples can adequately differentiate between the pre- and post-outage distributions. Specifically, instead of using all N samples when N is very large, we can employ the latest N_0 samples to represent the entire data stream, since they contain nearly identical distribution information in the temporal dimension. By doing so, the time complexity of the algorithm is bounded by a constant determined by N_0. Through experiments, we determined that N_0 = 100 samples are sufficient to maintain the algorithm's effectiveness and accuracy.
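The fixed-size window of the latest N_0 samples can be kept with a bounded double-ended queue. This is a minimal sketch; the class name and interface are hypothetical.

```python
from collections import deque


class SlidingWindow:
    """Keep only the latest n0 voltage-increment samples."""

    def __init__(self, n0=100):
        # A deque with maxlen drops the oldest sample automatically.
        self.buffer = deque(maxlen=n0)

    def push(self, sample):
        """Add a new sample and return the current window as a list."""
        self.buffer.append(sample)
        return list(self.buffer)
```

Because the window never holds more than N_0 samples, the work done per new observation stays bounded no matter how long the data stream runs.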

D. Outage Branch Localization with Accuracy
After detecting an outage occurrence, we further compute the conditional correlation between buses to localize the out-of-service branch, following Theorem 3. Fig. 9 shows the absolute conditional correlation of every pair of buses in the loopy 8-bus system before and after a line outage at branch 4-7. Since the value in the red box changes from a non-zero value before the outage (ρ^−_47 > δ_max) to near zero after the outage (ρ^+_47 < δ_min), we localize the out-of-service branch at 4-7, which matches the ground truth. Fig. 9(d) indicates that the localization method using the covariance matrix learned through PGD is as effective as the optimal scenario and is more effective than using the covariance matrix learned through MLE.
Table V reports the localization accuracy over 1,000 experiments. As shown, our proposed method can accurately localize over 90% of the outage branches, even without the post-outage distribution parameters.
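The thresholding step of the localization can be sketched as follows. The exact statistic of Theorem 3 is not restated in this section, so the sketch assumes that the absolute conditional correlation of buses i and j is the partial correlation |Θ_ij| / sqrt(Θ_ii Θ_jj) obtained from the inverse covariance matrix Θ; the function names, the small Gauss-Jordan inverse, and the threshold values in any call are hypothetical.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def conditional_correlation(cov):
    """Absolute partial correlation from the inverse covariance (assumption)."""
    theta = invert(cov)
    n = len(theta)
    return [[abs(theta[i][j]) / math.sqrt(theta[i][i] * theta[j][j])
             for j in range(n)] for i in range(n)]


def localize(cov_pre, cov_post, delta_max, delta_min):
    """Return bus pairs whose correlation drops from > delta_max to < delta_min."""
    pre = conditional_correlation(cov_pre)
    post = conditional_correlation(cov_post)
    n = len(pre)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if pre[i][j] > delta_max and post[i][j] < delta_min]
```

Bus indices in the returned pairs are zero-based; a pair is flagged only when both threshold conditions hold, mirroring the before-and-after comparison described for Fig. 9.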

E. Sensitivity Analysis to Data Noise and Data Coverage
Smart-meter data can be noisy and corrupted. In addition, smart-meter data may not be accessible in every household of the distribution grid. Thus, an analysis of our proposed method under different levels of data noise and data coverage is critical to gain a better understanding of its effectiveness in real-world outage scenarios.
In the U.S., the ANSI C12.20 standard permits utility smart meters to have an error within ±0.5% [40]. Hence, we simulate such noise in our smart-meter voltage measurements and then evaluate the corresponding detection performance. Table VI shows both the average detection delay and the false alarm rate of our method under different noise levels. When the noise level is 0.5%, one more sample (compared to the noiseless case) is needed for detection, while the false alarm rate also increases slightly. In fact, we are able to quantify the increase in detection delay by analyzing the change in the KL divergence between the pre- and post-outage distributions caused by noisy data. In doing so, we are able to better understand and control real-world line outage detection.
Another concern regarding smart meter data is that it may not be accessible for every household in the distribution grid. For instance, (1) in rural areas, some households may not have installed smart meters, (2) the voltage data for certain households may be lost due to technical issues, and (3) some households may refuse to provide their voltage data due to privacy concerns. Although the new generation of smart meters is developing very fast, an analysis of incomplete smart meter data coverage is needed to evaluate our algorithm. We first emphasize that our proposed method does not rely on the assumption of 100% smart meter data coverage in the grid. In fact, a power line outage influences almost all buses in the system, while the degree of influence depends on the distance between a bus and the source of the outage. Hence, we can reveal the outage by detecting the distribution change of some (not necessarily all) voltage data collected near the outage source.
According to [41], over 107 million smart meters were deployed by 2021, covering 75% of U.S. households. Hence, we simulate this scenario, in which only a randomly selected fraction of buses provides voltage measurements for detecting the outage. Fig. 10 shows both the average detection delay and the false alarm rate of our method under different coverage ratios. With 75% coverage, the detection delay increases by 1.2 time steps relative to the scenario where voltage data are available for all buses; that is, an extra 1.2 samples are needed to detect the outage. Similarly, when the data coverage ratio drops to 50%, an additional 6.9 samples are required for detection. Furthermore, as the data coverage ratio decreases to only 50%, the false alarm rate increases from 0.7% to 21.9%.
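The noise and coverage settings used in this sensitivity analysis can be mimicked with two small helpers. This is a sketch under stated assumptions: the meter error is modeled as a multiplicative factor drawn uniformly within the ±0.5% band, which is one possible reading of the tolerance rather than the paper's stated noise model, and the function names are hypothetical.

```python
import random


def add_meter_noise(voltages, rel_error, rng):
    """Scale each reading by a factor in [1 - rel_error, 1 + rel_error].

    Assumption: uniformly distributed multiplicative error.
    """
    return [v * (1.0 + rng.uniform(-rel_error, rel_error)) for v in voltages]


def select_covered_buses(num_buses, coverage, rng):
    """Randomly pick the buses that report voltage measurements."""
    count = max(1, round(coverage * num_buses))
    return sorted(rng.sample(range(num_buses), count))
```

For example, `select_covered_buses(123, 0.75, rng)` keeps 92 of the 123 buses, and only their voltage increments would then be passed to the detector.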

F. Sensitivity Analysis to Hyper-parameters
Our detection procedure involves certain hyper-parameters that have the potential to influence the detection performance, such as the geometric distribution parameter ρ. Therefore, a sensitivity analysis of these hyper-parameters is crucial to assess the robustness of our proposed method.
During our experiments, we randomly simulated the outage time λ using a geometric prior distribution denoted as Geo(ρ). This distribution aligns with our assumption that outages can take place at any time step with an equal probability of ρ. Fig. 11 illustrates the effect of the parameter ρ on the performance of our detection method. It can be observed that choosing different values of ρ within the range of 0.004 to 0.05 has a negligible impact on both the false alarm rate (approximately 1.65%) and the localization accuracy (approximately 92.8%). Additionally, decreasing the value of ρ leads to a slight increase in the average detection delay.

VII. LIMITATIONS
While this paper provides performance guarantees, it also has limitations that we look forward to addressing in the future. For instance, while the proposed approach requires only voltage magnitude data, it may be limited by the quality and availability of this data. As shown in Section VI-E, noisy or incomplete data lead to additional detection delay. Future research could investigate how to leverage additional types of data to improve outage detection and localization. Another aspect worth investigating is the ability to withstand diverse outage scenarios. For instance, if an outage occurs in an insignificant branch of the grid and results in minimal fluctuations in the voltage data, detecting such a subtle outage remains a challenge. Hence, further research is necessary to improve the detection performance in such cases. Lastly, although sensor readings facilitate line outage detection, they pose privacy concerns since they can disclose sensitive information, such as household occupancy and economic status, to potential adversaries. An open problem is how to identify outages accurately without compromising customers' data.

VIII. CONCLUSION
This paper resolves three challenges in the line outage identification problem: data availability, unknown outage patterns, and timely operation. Our approach for detecting and localizing line outages utilizes only voltage magnitudes. To handle unknown outage patterns, we propose a Projected Gradient Descent framework that can learn the unknown post-outage distribution parameters with a feasibility guarantee. We demonstrate the convergence guarantee of our method and further accelerate the proposed algorithm for timely operation, resulting in a reduction of more than 75% in execution time with minimal errors. Empirical results on representative grid systems confirm that our proposed method is suitable for timely outage detection and localization, even in the absence of prior knowledge about outage patterns.

Fig. 1. An overview of the distribution grid line outage detection problem: we collect voltage magnitudes from smart meters installed at households and use the posterior probability ratio computed in (6) to detect the change in the underlying distribution of voltage increments.

Algorithm 1: Line outage identification with unknown post-outage distribution parameters
Input: New observation ∆v[N]
Output: Outage time τ and outage location
Set µ

Theorem 5. Using PGD to iteratively update θ_i, the best update θ_i^best := arg min_{e∈[E]} L(θ_i^e) converges to the optimal value θ_i^* = arg min_{θ_i} L(θ_i) with a step size η = 1/√E.

Fig. 7. Plots of the empirical false alarm rate against the theoretical probability of false alarm α in the loopy 8-bus system (outage branch 4-7).

Fig. 8. The ratio of saved execution time versus the ratio of error incurred by the operation exp against the level of approximation K_exp.

TABLE I. STATISTICAL ANALYSIS OF DLC POWER DATASET.
Table IV presents the execution time of Algorithm 1 on various grid systems with different sampling rates.
(21), we arrive at an inequality that upper-bounds the sub-optimality at every e-th iterate. Suppose we initialize the mean vector as µ_1^1; we can then sum the sub-optimality across iterates and average it by dividing by the total number of iterates E, where the step size is η = 1/√E. Therefore, for any ε > 0, we can always use at most E = O(1/ε²) total iterates to make the averaged sub-optimality no larger than ε, where 1 + U² is a constant number. Then, we prove that the averaged and best updates both converge to µ_1^* after E updates. Applying Jensen's inequality, we derive L(µ_1^avg) = L((1/E) Σ_{e=1}^{E} µ_1^e) ≤ L(µ_1^*) + ε.