Nonlinear Least Absolute Value Estimator for Topology Error Detection and Robust State Estimation

Topology error, a modeling misrepresentation of the power system network configuration, can undermine the quality of state estimation. In this paper, we propose a new methodology for robust power system state estimation (PSSE) modeled by AC power flow equations when there exists a small number of topological errors. The developed technique utilizes the availability of a large number of SCADA measurements and minimizes the $\ell _{1}$ norm of nonconvex residuals augmented by a nonlinear, but convex, regularizer. Representing the power network by a graph, we first study the properties of the solution obtained from the proposed NLAV estimator and demonstrate that, under mild conditions, this solution identifies a small subgraph of the network that contains the topological errors in the model used for the state estimation problem. Then, we introduce a method that can efficiently detect the topological errors by searching over the identified subgraph. In addition, we develop a theoretical upper bound on the state estimation error to guarantee the accuracy of the proposed state estimation technique. The efficacy of the developed framework is demonstrated through numerical simulations on IEEE benchmark systems.


I. INTRODUCTION
SAFEGUARDING power system infrastructures against cascading failures is a crucial challenge in operating these systems; if not managed properly, such failures could lead to blackouts [1], [2]. To this end, the power network must be constantly monitored so that, if needed, appropriate actions can be taken. This monitoring is achieved via real-time state estimation, which aims to recover the underlying system voltage phasors given supervisory control and data acquisition (SCADA) measurements and a system model that encodes the network topology and specifications [3], [4]. In fact, state estimation not only helps prevent failures in the power network, but also underpins every aspect of real-time power system operation and control. To ensure an accurate state estimation, it is essential to have the capability of detecting bad data. Assuming that the network parameters are known and the measurement devices are correctly calibrated, the main source of bad data is topological errors in the model. Topological errors refer to the inaccurate modeling of the current network configuration and often arise from the system operator's misconception about the on/off switching status of a few lines in the network due to faults or unreported network reconfigurations. Due to their significant impact on the quality of state estimation, coping with bad data and detecting topological errors have received considerable attention in the past few decades. In addition, recent research shows the impact of topology errors on real-time market operations such as locational marginal pricing [5].

A. LITERATURE SURVEY ON TOPOLOGICAL ERROR DETECTION
Bayesian hypothesis testing [6], collinearity testing [7], and fuzzy pattern machine [8] are examples of statistical approaches for detecting topological errors. These methods often require prior information on the states and/or a significant amount of historical data from past measurements. Other approaches aim to devise state estimators that are robust against topological errors and measurement noise. The work [9] used normalized Lagrange multipliers of the least-squares state estimation problem and, despite being a heuristic method, has been shown to be effective in some cases. Later studies, such as [10], improved on this approach. Another noteworthy method in this category is the least absolute value (LAV) estimator, which was used in the context of power systems in [11]. By minimizing the $\ell_{1}$ norm of the residual vector obtained from the linearized measurement equations, the LAV is able to find a minimum set of measurements untainted by gross errors, thus dismissing bad data and generating a robust state estimate. In spite of its strength, the LAV is susceptible to leverage points, as discussed in [12], [13]. Various methods for resolving this issue have been investigated in [14], [15]. The work of [16] showed that the adverse effect of leverage points can be mitigated if the measurements come only from phasor measurement units (PMUs). What the methods mentioned so far have in common is their reliance on linearized measurement equations. Only a limited number of studies have addressed the fully nonlinear, nonconvex problem with power measurements, for instance [17], where a semidefinite programming (SDP) relaxation was proposed to convexify the nonlinear LAV state estimator; however, no theoretical guarantees have been developed to ensure the recovery of a high-quality solution.
Furthermore, the computational burden of solving the surrogate SDP problem may limit the use of this approach to relatively small-sized problems in practice. In [18], without considering topological errors in the model, the authors studied conditions under which a linearized iterative algorithm can recover the true state from nonlinear measurements that contain sparse bad data. The conditions are difficult to check and have not been verified for real-world power networks. The recent work [19] developed a modified least absolute value state estimator that is experimentally robust against both bad data and topological errors but may require theoretical guarantees and extended simulation results. These issues prompt further research on developing robust state estimation techniques with the capability of managing nonconvexities associated with various types of measurements.

B. CONTRIBUTIONS
Taking into account the theoretical guarantees recently developed for the $\ell_{2}$-norm to avoid spurious local minimizers in nonconvex optimization [20] and the emerging results for the $\ell_{1}$-norm [21], our paper introduces a local search algorithm that can find the global solution of the nonlinear LAV (NLAV) state estimator with high probability. The proposed technique presents a robust approach for estimating the power system's voltages in the presence of a modest number of topological errors as well as detecting such errors. We summarize the main contributions of this work as follows: (1) introducing an algorithm for identifying topology errors and estimating the power system states using an NLAV state estimator combined with local search algorithms, (2) formulating a regularized NLAV state estimator to handle severe nonconvexities, (3) finding error bounds and necessary properties for the regularization parameters. As discussed in the later part of this paper, fast local search algorithms can efficiently find global solutions of the underlying NLAV estimators given a sufficient number of noiseless measurements and an adequate initialization of the algorithm. This manuscript is an extended version of the conference paper [22], with new additions including an updated Theorem 2, an updated topology error detection algorithm (Algorithm 1), comprehensive case studies and a complete appendix with proofs. In Theorem 2 of [22], we derived a result stating that the line residual graph is a subset of the extended state estimation error graph, which contains the topological errors. The extended state estimation error graph is unattainable in practice, so we resorted to searching over the extended line residual graph, which lacked strong theoretical support. To resolve this issue, in the updated Theorem 2 of this manuscript, we show that the topological errors are contained within a small subgraph of the power network, called the suspect-subgraph.
This provides a solid theoretical foundation for our algorithm and improves on the efficiency of the detection algorithm since the search now iterates over a smaller subgraph than before. This change is also reflected in the algorithm, and therefore we present an updated version of Algorithm 1 in this manuscript.
The remainder of this paper is organized as follows. Preliminary materials such as notations and definitions are presented in Section II, followed by the formulation of the algorithm and the main theoretical results in Section III. A comprehensive set of numerical simulations on the IEEE 57-bus system and the 118-bus system is presented in Section IV. Finally, the summary and concluding remarks are given in Section V. The proofs are provided in the Appendix.

A. NOTATIONS
In this paper, lower case letters stand for column vectors, upper case letters stand for matrices and calligraphic letters represent sets and graphs. The sets of real and complex numbers are represented by the symbols $\mathbb{R}$ and $\mathbb{C}$, respectively. Accordingly, $\mathbb{R}^{N}$ and $\mathbb{C}^{N}$ stand for the spaces of $N$-dimensional real and complex vectors, respectively. Next, $\mathbb{S}^{N}$ and $\mathbb{H}^{N}$ denote the sets of $N \times N$ complex symmetric matrices and Hermitian matrices, respectively. The transpose and conjugate transpose of a vector or matrix are denoted by $(\cdot)^{T}$ and $(\cdot)^{*}$, respectively. The notation $(\cdot)^{c}$ indicates the set complement. $\mathrm{Re}(\cdot)$, $\mathrm{Im}(\cdot)$, $\mathrm{rank}(\cdot)$ and $\mathrm{Tr}(\cdot)$ denote the real part, imaginary part, rank and trace of a given scalar or matrix. The notations $\|x\|_{1}$ and $\|x\|_{2}$ indicate the $\ell_{1}$-norm and $\ell_{2}$-norm of the vector $x$, respectively, and $\|X\|_{F}$ indicates the Frobenius norm of the matrix $X$. The symbol $\langle X, Y \rangle$ denotes the Frobenius inner product of the matrices $X$ and $Y$. The symbol $|\cdot|$ is the absolute value operator if the argument is a scalar, vector, or matrix; otherwise, it signifies the cardinality of a measurable set. The relation $X \succeq 0$ means that the matrix $X$ is Hermitian positive semidefinite. The $(i, j)$ entry of $X$ is denoted by $X_{i,j}$. The notation $X[\mathcal{S}_{1}, \mathcal{S}_{2}]$ denotes the submatrix of $X$ whose rows and columns are chosen from the index sets $\mathcal{S}_{1}$ and $\mathcal{S}_{2}$, respectively. $I_{N}$ denotes the $N \times N$ identity matrix. For a given vector $x$, the symbol $\mathrm{diag}(x)$ denotes its diagonalized matrix, whereas for a matrix $X$, $\mathrm{diag}(X)$ denotes the vector consisting of the diagonal elements of $X$. The $i$-th smallest eigenvalue of the matrix $X$ is denoted by $\lambda_{i}(X)$. Given a graph $\mathcal{G}$, the notation $\mathcal{G}(\mathcal{V}, \mathcal{E})$ implies that $\mathcal{V}$ and $\mathcal{E}$ are the vertex set and the edge set of this graph, respectively. The imaginary unit is denoted by $\mathrm{j} = \sqrt{-1}$. The symbol $\mathbf{1}$ denotes a vector of all ones with appropriate dimension.

B. POWER SYSTEM SCADA MEASUREMENTS
Let an electric power network be described by a graph $\mathcal{G}(\mathcal{V}, \mathcal{E})$, where $\mathcal{V} := \{1, \ldots, K\}$ and $\mathcal{E} := \{1, \ldots, L\}$ denote the sets of buses and lines (branches), respectively. We assume that the slack bus is also the reference bus. Let $v_{k} \in \mathbb{C}$ represent the complex voltage at bus $k \in \mathcal{V}$, whose magnitude and phase angle are denoted as $|v_{k}|$ and $\angle v_{k}$. The net apparent power injected at bus $k$ is denoted by $s_{k} = p_{k} + q_{k}\mathrm{j}$. Given a fixed orientation on the branches, there are two complex power flows associated with each line. Define $s_{l,f} = p_{l,f} + q_{l,f}\mathrm{j}$ and $s_{l,t} = p_{l,t} + q_{l,t}\mathrm{j}$ as the complex power flows coming into the line $l \in \mathcal{E}$ through the 'from' and 'to' ends of the branch. Let $v$ and $i$ be the vectors of nodal complex voltages and net current injections, respectively. Following Ohm's law, we know that $i = Yv$, where $Y \in \mathbb{C}^{K \times K}$ is the nodal admittance matrix of the power network. Furthermore, $Y_{f} \in \mathbb{C}^{L \times K}$ and $Y_{t} \in \mathbb{C}^{L \times K}$ represent the 'from' and 'to' branch admittance matrices. Let $\{e_{1}, \ldots, e_{K}\}$ denote the canonical vectors in $\mathbb{R}^{K}$, which are used to define the three nodal measurement matrices in (2). Next, let $\{d_{1}, \ldots, d_{L}\}$ be the canonical vectors in $\mathbb{R}^{L}$, which are used to define the four line measurement matrices in (3) associated with branch $l$, which has a from node $i$ and a to node $j$. Then, the traditional measurable quantities can be expressed as the seven equations in (4), each of which is a simple quadratic function of the complex voltage vector $v$.
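As a numerical illustration of the statement that each measurable quantity is a quadratic function of $v$, the sketch below builds one common form of the three nodal measurement matrices and checks them against the direct definitions $|v_k|^2$ and $s_k = v_k (Yv)_k^{*}$. The matrix formulas, the helper names and the 3-bus admittance values are assumptions made for this example and are not taken from the paper's equations (2)-(4).

```python
import numpy as np

def nodal_measurement_matrices(Y, k):
    """Return (E_k, Y_kp, Y_kq) for bus k (0-based); assumed standard forms."""
    K = Y.shape[0]
    e = np.zeros((K, 1))
    e[k] = 1.0
    E = e @ e.T                      # selects |v_k|^2
    Yh = Y.conj().T                  # conjugate transpose of Y
    Ykp = 0.5 * (Yh @ E + E @ Y)     # Hermitian part  -> active power p_k
    Ykq = 0.5j * (E @ Y - Yh @ E)    # skew part       -> reactive power q_k
    return E, Ykp, Ykq

def quad(A, v):
    """Evaluate v^* A v = Tr(A v v^*), which is real for Hermitian A."""
    return (v.conj() @ A @ v).real

# Hypothetical 3-bus network with three series admittances (illustrative values).
y12, y13, y23 = 1 - 5j, 0.5 - 4j, 0.8 - 6j
Y = np.array([[y12 + y13, -y12, -y13],
              [-y12, y12 + y23, -y23],
              [-y13, -y23, y13 + y23]])
v = np.array([1.0 + 0j, 0.98 * np.exp(-0.05j), 1.02 * np.exp(0.03j)])

s = v * np.conj(Y @ v)               # complex power injections from Ohm's law
k = 1
E, Ykp, Ykq = nodal_measurement_matrices(Y, k)
print(quad(E, v), quad(Ykp, v), quad(Ykq, v))
```

The line measurement matrices can be built in the same way by replacing $e_k e_k^{T} Y$ with $e_i d_l^{T} Y_f$ (or $e_j d_l^{T} Y_t$), under the same assumptions.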
In a power system, measurements are acquired through the SCADA system. Available measurements constitute a subset of all measurable quantities. Given a power system model $\Omega$ characterized by the tuple $(Y, Y_{f}, Y_{t})$ and an index set of measurements $\mathcal{M} = \{1, \ldots, M\}$ of the form (4), the mapping from the measurement index set to the set of measurement matrices can be defined as $\mathcal{A}_{\Omega}(\mathcal{M}) := \{A_{j}(\Omega)\}_{j \in \mathcal{M}}$, where each $A_{j}(\Omega)$ represents one of the matrices defined in (2) and (3), depending on the type of measurement $j$. Next, we define the real-valued state vector and the corresponding real-valued matrices. This enables us to solve optimization problems involving complex voltages in the real domain. The dimension of the real-valued state vector is $2K - 1$ because the voltage angle at the slack/reference bus is fixed to be zero. Accordingly, the matrices also have $2K - 1$ rows and columns.
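Define $\bar{v} := [\mathrm{Re}\{v[\mathcal{V}]^{T}\}\ \ \mathrm{Im}\{v[\mathcal{O}]^{T}\}]^{T} \in \mathbb{R}^{2K-1}$ to be the real-valued state vector, where $\mathcal{O}$ denotes the set of all buses except for the slack bus. In addition, define $\bar{X} \in \mathbb{S}^{2K-1}$ to be the real-valued symmetrization of $X \in \mathbb{H}^{K}$. To elaborate, note that a general $K \times K$ Hermitian matrix can be mapped into a $(2K-1) \times (2K-1)$ real-valued symmetric matrix as follows: $\bar{X} = \begin{bmatrix} \mathrm{Re}\{X[\mathcal{V}, \mathcal{V}]\} & -\mathrm{Im}\{X[\mathcal{V}, \mathcal{O}]\} \\ \mathrm{Im}\{X[\mathcal{O}, \mathcal{V}]\} & \mathrm{Re}\{X[\mathcal{O}, \mathcal{O}]\} \end{bmatrix}$, so that $\bar{v}^{T} \bar{X} \bar{v} = v^{*} X v$ whenever the slack bus angle is zero. Finally, we define an operator that maps the state vector to the vector of measurement values.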
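The real-valued reformulation can be checked numerically. The following sketch, with hypothetical helper names `real_state` and `real_symmetrization` and the slack bus assumed to be bus 0, stacks the real parts of all voltages over the imaginary parts of the non-slack voltages and verifies that the quadratic form is preserved for a random Hermitian matrix.

```python
import numpy as np

def real_state(v, slack=0):
    """Stack Re(v) for all buses over Im(v) for the non-slack buses."""
    others = [k for k in range(len(v)) if k != slack]
    return np.concatenate([v.real, v.imag[others]])

def real_symmetrization(X, slack=0):
    """Map a K x K Hermitian X to a (2K-1) x (2K-1) real symmetric matrix."""
    others = [k for k in range(X.shape[0]) if k != slack]
    top = np.hstack([X.real, -X.imag[:, others]])
    bottom = np.hstack([X.imag[others, :], X.real[np.ix_(others, others)]])
    return np.vstack([top, bottom])

rng = np.random.default_rng(0)
K = 4
B = rng.normal(size=(K, K)) + 1j * rng.normal(size=(K, K))
X = B + B.conj().T                       # random Hermitian matrix
v = rng.normal(size=K) + 1j * rng.normal(size=K)
v[0] = abs(v[0])                         # zero angle at the slack bus

v_bar = real_state(v)
X_bar = real_symmetrization(X)
lhs = v_bar @ X_bar @ v_bar              # real-domain quadratic form
rhs = (v.conj() @ X @ v).real            # complex-domain quadratic form
print(lhs, rhs)
```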
In this paper, we disregard PMU measurements and only consider voltage magnitude and power measurements to streamline the presentation, without loss of generality in our technique. More precisely, if we have access to PMU measurements, they can be viewed as quadratic equations with zero quadratic terms and can be easily incorporated in the current framework. To elaborate, we can append a scalar variable $u$ to the real-valued state vector $\bar{v}$ and impose the condition $u^{2} - 1 = 0$, so that the products of $u$ with the other variables create the linear terms. Note that this can result in $u$ taking the value of $1$ or $-1$. If the solution that we obtain results in $u = -1$, we can simply negate the rest of the values to obtain a meaningful solution.
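The lifting described above can be written out for a single linear measurement. The sketch below assumes a generic linear measurement $c^{T}\bar{v} = b$ with a known real vector $c$; the symbols $c$, $w$ and $C$ are introduced only for this illustration and do not appear in the paper.

```latex
% Assumed linear (PMU-type) measurement: c^T \bar{v} = b, with c a known real vector.
\begin{aligned}
w &= \begin{bmatrix} \bar{v} \\ u \end{bmatrix}, \qquad
C = \frac{1}{2}\begin{bmatrix} 0 & c \\ c^{T} & 0 \end{bmatrix}, \\
w^{T} C w &= u\, c^{T}\bar{v} = b, \qquad u^{2} - 1 = 0 .
\end{aligned}
% If the solver returns u = -1, then c^T(-\bar{v}) = b, so the pair (-\bar{v}, +1)
% satisfies the same lifted equation; every purely quadratic measurement
% \bar{v}^T A \bar{v} is unchanged under \bar{v} -> -\bar{v}.
```

With this construction the linear measurement becomes a quadratic form in the augmented vector $w$, which is why a solution with $u = -1$ can be repaired by a global sign flip.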

III. MAIN RESULTS
In this section, we first briefly discuss the most commonly used nonlinear least-squares (NLS) state estimator and its limitations. Next, we present the NLAV formulation and derive a theoretical upper bound on the state estimation error obtained by the NLAV estimator. Finally, we uncover certain properties of the vector of residual errors and design a new algorithm that jointly performs state estimation and topology error detection.

A. NONLINEAR LEAST-SQUARES STATE ESTIMATION
The most widely used state estimation technique is the nonlinear least-squares (NLS), first proposed by Schweppe [23], [24]. The objective of NLS is to minimize the $\ell_{2}$-norm of the estimation residuals, which is often carried out by local search algorithms such as the Gauss-Newton method. These methods, however, only guarantee a locally optimal solution, which could correspond to an estimate of the state that is significantly different from the true underlying voltages. Interestingly, recent research has shown that local search algorithms are capable of finding a globally optimal solution of this nonconvex problem when the number of measurements is sufficiently larger than the number of degrees of freedom of the system and the measurements are noiseless [4], [20]. As is the case with any other estimator, this method requires that the system's network topology (see Definition 5) be known. However, owing to the existence of topological errors resulting from simple faults or recent changes in the switching status of some lines, the model that the system operator has at hand may be different from the true network. The measurement data in the neighborhood of the incorrectly modeled lines are potential outliers, which can adversely impact the solution of the state estimation problem over a large fraction of the network. This is because the $\ell_{2}$-norm is inadequate for dealing with outliers; simulation results supporting this fact are shown in Figure 3, followed by further discussions in Section IV-B. Despite the drawbacks of NLS, the work [6] developed an effective tool for topology error detection using Bayesian-based hypothesis testing and the covariance matrix of the states. The method that we propose in this paper does not require the covariance information but takes advantage of the favorable aspects of $\ell_{1}$-norm minimization. As mentioned in Section I, there are existing works that have studied the nonlinear least absolute value estimator [18], [19].
In fact, [19] considered a wide variety of parameter errors and the presented numerical experiments are very promising. In this paper, we focus on topological errors in the model and develop theoretical guarantees along with a graph-based intuition for detecting those errors.

B. PROPOSED NLAV FORMULATION
For the remainder of this paper, a line whose presence in the network is misrepresented by the system operator is called erroneous and the set of all erroneous lines is denoted by $\Xi$. Let $\mathcal{A}_{\Omega}(\mathcal{M})$ denote the set of measurement matrices associated with the true system $\Omega$, and $\mathcal{A}_{\tilde{\Omega}}(\mathcal{M})$ denote the set of measurement matrices corresponding to the inaccurate model $\tilde{\Omega}$ that the system operator possesses. In this work, we assume that $\Omega$ and $\tilde{\Omega}$ are sparsely different; in other words, the set of lines whose switch statuses the operator misconceives constitutes only a small subset of all lines. It makes sense to focus only on sparse differences because topological errors often arise from low-probability events, and therefore it is unlikely that the operator's model is significantly different from the true model. We propose the optimization problem (9) as the first step to designing an algorithm that jointly performs state estimation and sparse topological error detection. In this formulation, $\bar{z} \in \mathbb{R}^{2K-1}$ symbolizes the true underlying state of the system and $\eta$ denotes the noise vector. Note that the measurement values $b$ are based on the true system $\Omega$ and $\bar{z}$. Also, $A_{0} \in \mathbb{S}^{K}$ is a regularization matrix while $\rho$ is a regularization coefficient. As discussed later in the paper, these two parameters help with deriving an upper bound on the state estimation error and also facilitate the convexification of the problem for finding a robust solution using local search algorithms. From here on, we assume that the measurement set $\mathcal{M}$ is observable. A necessary condition for observability is that the Jacobian of the measurement equations be full row rank [25]. Let $\bar{v}^{*}$ denote a globally optimal solution of (9).
Then, let $\epsilon \in \mathbb{R}^{2K-1}$ be the state estimation error vector and $r \in \mathbb{R}^{K}$ be the residual error vector, defined as $\epsilon = \bar{v}^{*} - \bar{z}$ (11a). By virtue of the $\ell_{1}$ norm, the problem (9) attempts to push the insignificant residual errors to hard zeros, while the residuals $r_{j}$ associated with the outlier measurements are expected to remain nonzero. This phenomenon is observed empirically through an example in Figure 3(d). The performance of this estimator is in striking contrast with that of the $\ell_{2}$ minimization (Figure 3(c)), where the residuals are spread out across all the measurements. In the sections that follow, we use this intuition to design an efficient topological error detection algorithm. Bear in mind that, similar to the NLS method, the objective function of NLAV is nonlinear and nonconvex, which makes local search algorithms prone to being stuck at spurious local solutions. However, recent studies have shown that increasing the number of redundant measurements helps with reducing the nonconvexity of NLS problems and hence improves the likelihood of obtaining their global solutions using local search algorithms [4], [20]. Therefore, having access to many measurements is crucial for improving the quality of real-world state estimation problems. This property is expected to hold for the NLAV estimator too, as partially proven in [21]. The risk of converging to a spurious local optimum can be further reduced by initializing the algorithm close to the unknown state. This is achievable because in power systems, voltage magnitudes are maintained close to 1 and voltage angles are kept small. Therefore, choosing the initial point to be the nominal point $\mathbf{1}$ would likely ensure that it is relatively close to the true state.
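The contrast between sparse $\ell_{1}$ residuals and spread-out $\ell_{2}$ residuals can be seen in a scalar toy problem, which is only an analogue and not the AC state estimation problem itself. When a single unknown is measured directly several times, the least-squares estimate is the mean and the least-absolute-value estimate is the median; the measurement values below are made up for the illustration.

```python
from statistics import mean, median

def residuals(estimate, measurements):
    """Residual of each measurement with respect to a scalar estimate."""
    return [m - estimate for m in measurements]

# Four consistent measurements of the same quantity and one gross outlier.
measurements = [1.0, 1.0, 1.0, 1.0, 3.0]

x_l2 = mean(measurements)     # minimizes the sum of squared residuals
x_l1 = median(measurements)   # minimizes the sum of absolute residuals

r_l2 = residuals(x_l2, measurements)
r_l1 = residuals(x_l1, measurements)

nonzero_l2 = sum(1 for r in r_l2 if abs(r) > 1e-9)
nonzero_l1 = sum(1 for r in r_l1 if abs(r) > 1e-9)
print(nonzero_l2, nonzero_l1)
```

In this toy case the $\ell_{1}$ estimate leaves a nonzero residual only on the outlier, whereas the $\ell_{2}$ estimate spreads the error over every measurement.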

C. ESTIMATION ERROR
Given a design matrix $A_{0}$, we intend to prove a theoretical upper bound on the state estimation error obtained by the NLAV problem (9). To this end, it is useful to introduce the concept of dual certificate.
Definition 3. Given a positive-semidefinite regularization matrix $A_{0} \in \mathbb{S}^{K}$, a system model $\Omega$ and a set of measurement indices $\mathcal{M}$, a vector $\mu \in \mathbb{R}^{|\mathcal{M}|}$ is called a dual certificate for the voltage vector $v \in \mathbb{C}^{K}$ of the system model $\Omega$ if it satisfies the three conditions in (12).
In essence, the existence of a dual certificate ensures that the second-smallest eigenvalue of $H_{\mu}^{\Omega}$ is strictly positive, which enables us to derive an upper bound of the form presented in the following theorem.
Theorem 1. Consider the scenario where the power system operator has a network model $\tilde{\Omega}$ and a set of measurement indices $\mathcal{M}$. Under this setting, assume that there exists a dual certificate $\mu$ for the true state vector $z$. Also, consider a parameter $\rho$ satisfying $\rho \geq \max_{j \in \mathcal{M}} |\mu_{j}|$. Then, there exists a real-valued scalar $\beta$ such that the bound (13) holds, where the term $g(z, \eta, \rho)$ appearing in the bound depends on $\mathcal{M}' \subset \mathcal{M}$, the set of measurement indices that correspond to the erroneous lines.
Recalling that $\bar{v}^{*}$ and $\bar{z}$ are, respectively, the recovered and true states of the system, inequality (13) quantitatively bounds the state estimation error. The bound has several important characteristics. First, if the measurements are noiseless and there is no topology error, the NLAV estimator recovers a high-quality solution if not the actual state. On the other hand, if there are measurement noise and topology errors, the upper bound for the state estimation error increases proportionally to the magnitude of the noise and the number of topology errors. Note that topology errors do not affect all the measurement matrices but only a subset, which is captured by the set $\mathcal{M}'$. Second, the upper bound is inversely proportional to the second smallest eigenvalue of the matrix $H_{\mu}^{\tilde{\Omega}}$, which acts as the Laplacian of a weighted graph corresponding to the power network. The second smallest eigenvalue of this matrix is also called the algebraic connectivity [26] in graph theory, a parameter that gauges how well-connected the (weighted) graph is. For instance, a fully connected graph has an algebraic connectivity of $K$, while this value is equal to $1$ for a star-shaped graph and $2(1 - \cos\frac{\pi}{K})$ for a path graph (where $K$ denotes the number of nodes in the graph). In the special case when $A_{0}$ reflects the connectivity of the original network $\mathcal{G}$ (i.e., $i \neq j$ and $(i, j) \notin \mathcal{E} \Rightarrow A_{0}(i, j) = 0$), the second smallest eigenvalue of $H_{\mu}^{\tilde{\Omega}}$ represents the algebraic connectivity of the original network where different edges are assigned different weights. As a final note, a unique solution of the NLAV is not guaranteed by the performance bound in (13). For conditions that guarantee the uniqueness of the NLAV solution, the reader is referred to Theorem 3.
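The algebraic connectivity values of unweighted complete, star and path graphs can be reproduced with a few lines of NumPy. The sketch below uses unit edge weights and a hypothetical helper `algebraic_connectivity`; for $K = 6$ it gives $K$ for the complete graph, $1$ for the star graph and $2(1 - \cos(\pi/K))$ for the path graph.

```python
import numpy as np

def algebraic_connectivity(edges, n):
    """Second smallest eigenvalue of the unweighted graph Laplacian."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return np.sort(np.linalg.eigvalsh(L))[1]

K = 6
complete = [(i, j) for i in range(K) for j in range(i + 1, K)]
star = [(0, j) for j in range(1, K)]
path = [(i, i + 1) for i in range(K - 1)]

ac_complete = algebraic_connectivity(complete, K)
ac_star = algebraic_connectivity(star, K)
ac_path = algebraic_connectivity(path, K)
print(ac_complete, ac_star, ac_path)
```

The ordering complete > star > path matches the intuition that a smaller algebraic connectivity (a less well-connected graph) loosens a bound that is inversely proportional to it.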

D. SPARSE SUSPECT-SUBGRAPH
As shown above, the quality of the state estimation deteriorates in the presence of topological errors. Our approach for detecting and correcting these topological errors can be outlined as follows. To start, we solve (9) and utilize the pattern of the nonzero residual errors to identify a (small) subset of lines that are potentially erroneous in the model. We call this subset the suspect-subgraph, which we then efficiently search through to identify the topological errors. This is followed by a correction of the model and a re-estimation of the system states. To formalize this approach, we first introduce some relevant subgraphs.
Definition 4. A node $k \in \mathcal{V}$ is called unsolvable if its state estimation error $\epsilon_{k}$ is nonzero. On the other hand, if $\epsilon_{k}$ is zero, node $k$ is called solvable. Define the following four subgraphs of $\mathcal{G}$:
1) The state estimation error graph $\mathcal{S}(\mathcal{V}_{S}, \mathcal{E}_{S})$ is such that $\mathcal{V}_{S}$ is the set of unsolvable nodes and $\mathcal{E}_{S}$ is the set of all edges that have both endpoints in $\mathcal{V}_{S}$.
2) The extended state estimation error graph $\tilde{\mathcal{S}}(\tilde{\mathcal{V}}_{S}, \tilde{\mathcal{E}}_{S})$ is such that $\tilde{\mathcal{V}}_{S}$ includes all nodes in $\mathcal{V}_{S}$ and also those nodes that are adjacent to any node in $\mathcal{V}_{S}$. The edge set $\tilde{\mathcal{E}}_{S}$ consists of all edges that have both endpoints in $\tilde{\mathcal{V}}_{S}$.
3) The node residual graph $\mathcal{R}_{N}(\mathcal{V}_{N}, \mathcal{E}_{N})$ is such that $\mathcal{V}_{N}$ is the set of nodes whose associated entries in $r$ are nonzero, and $\mathcal{E}_{N}$ is the set of all edges that have both endpoints in $\mathcal{V}_{N}$.
4) The line residual graph $\mathcal{R}_{L}(\mathcal{V}_{L}, \mathcal{E}_{L})$ is such that $\mathcal{E}_{L}$ is the set of edges whose associated entry in $r$ is nonzero. The vertex set $\mathcal{V}_{L}$ is the set of nodes that are either at the 'from' or 'to' end of a line in $\mathcal{E}_{L}$.
In order to help the reader visualize the different subgraphs, we illustrate Definition 4 for a small system in Figure 1.
Theorem 2. Suppose that the measurements are noiseless, i.e., $\eta = 0$. In addition, assume that there do not exist any two distinct vectors of voltages resulting in the same measurement values, i.e., Then, Moreover, if no two erroneous lines share the same node, the following statements hold:
The relationships between the different subgraphs are illustrated in Figure 2. It is important to note that due to the sparsity of the state estimation error (as shown in Figure 3(b)) and the sparsity assumption on $\Xi$, most lines belong to the set $\tilde{\mathcal{S}}^{c} \cap \Xi^{c}$. From Figure 2, it can also be inferred that $\Xi \subseteq (\mathcal{R}_{N} \setminus \mathcal{R}_{L}) \subseteq (\tilde{\mathcal{S}}^{c} \cap \Xi^{c})^{c}$. Hence, the pragmatic benefit of Theorem 2 is that it enables us to develop a method for efficiently detecting topology errors by probing over a small subgraph of the original power system model. We call this small subgraph, namely $(\mathcal{R}_{N} \setminus \mathcal{R}_{L})$, the suspect-subgraph.
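One way to read the construction of the suspect-subgraph is as a set difference on edges: lines with both endpoints in the node residual graph, minus lines whose own measurement residual is nonzero. The sketch below implements that reading; the function name, the dictionary-based inputs, the tolerance and the 5-bus example are all assumptions made for illustration, and each node residual is taken to be a single aggregated value per bus.

```python
def suspect_subgraph(edges, node_residual, line_residual, tol=1e-6):
    """Return the set of suspect line ids.

    edges:         dict line_id -> (from_bus, to_bus)
    node_residual: dict bus -> aggregated residual magnitude at that bus
    line_residual: dict line_id -> residual of the line measurement (if measured)
    """
    # Vertex set of the node residual graph: buses with nonzero residual.
    v_n = {k for k, r in node_residual.items() if abs(r) > tol}
    # Edge set of the node residual graph: both endpoints in v_n.
    e_n = {l for l, (i, j) in edges.items() if i in v_n and j in v_n}
    # Edge set of the line residual graph: lines with nonzero residual.
    e_l = {l for l, r in line_residual.items() if abs(r) > tol}
    return e_n - e_l

# Hypothetical 5-bus chain; line 2 (between buses 2 and 3) is mis-modeled,
# so the nodal residuals at its two endpoints are nonzero.
edges = {1: (1, 2), 2: (2, 3), 3: (3, 4), 4: (4, 5)}
node_residual = {1: 0.0, 2: 0.3, 3: -0.4, 4: 0.0, 5: 0.0}
line_residual = {1: 0.0, 3: 0.0}   # only lines 1 and 3 carry flow measurements

suspects = suspect_subgraph(edges, node_residual, line_residual)
print(suspects)
```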

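Algorithm 1
Construct the suspect-subgraph
Update $\Omega_{t}$ to $\Omega'_{t}$ by altering the on/off status of $l$.
Re-solve (9) with $\Omega'_{t}$, $\mathcal{A}_{\Omega'_{t}}(\mathcal{M})$ and $b$ to obtain the outputs $\bar{v}^{*}_{\mathrm{update}}$ and $r_{\mathrm{update}}$
if $\|r_{\mathrm{update}}\|_{2} < \|r_{t}\|_{2}$ then
Add $l$ to $\mathcal{D}_{L}$ and set $\Omega \leftarrow \Omega'_{t}$, $r_{t} \leftarrow r_{\mathrm{update}}$.
end if
end for
end while
4. Return $\bar{v}^{*}_{\mathrm{update}}$ and $\mathcal{D}_{L}$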

E. ALGORITHM
Based on the results established so far, we propose Algorithm 1 for detecting topology errors while performing state estimation. Algorithm 1 begins by initializing the set of detected erroneous lines, denoted by $\mathcal{D}_{L}$, with the empty set. Then, the algorithm inspects all branches in the suspect-subgraph $(\mathcal{R}_{N} \setminus \mathcal{R}_{L})$ and computes the effect that the existence of each line has on the accuracy of the solution. To this end, the method switches a line on if it is off in the model and vice versa, updates the model based on this change, and re-solves the NLAV problem with the modified model. If the objective value of NLAV goes down, the line is stored in $\mathcal{D}_{L}$; otherwise, the update of the line status is dismissed and the algorithm proceeds to check another line until all lines of $(\mathcal{R}_{N} \setminus \mathcal{R}_{L})$ are evaluated. The justification for using such a criterion is explained in Section D of the Appendix.
Figure 3 (caption fragment): in panel (d), the x-axis shows the measurement tag, which is not the same as the node or line number due to the concatenation of different types of measurements.
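The search described above can be sketched as a single pass over the suspect lines. In the sketch below, `solve` is a hypothetical callable standing in for the NLAV solver (it is not an API from the paper or from any library), the model is represented as a dictionary of on/off line statuses, and a status flip is accepted when the $\ell_{2}$ norm of the residual decreases. Any outer loop of the original algorithm is omitted.

```python
import math

def l2_norm(r):
    """Euclidean norm of a list of residuals."""
    return math.sqrt(sum(x * x for x in r))

def detect_topology_errors(model, suspects, solve):
    """Flip each suspect line once and keep the flips that reduce the residual.

    model:    dict line_id -> bool, the on/off statuses believed by the operator
    suspects: iterable of line ids from the suspect-subgraph
    solve:    hypothetical callable, model -> (state, residuals)
    """
    model = dict(model)               # do not mutate the caller's model
    detected = []                     # the set D_L of detected erroneous lines
    state, r = solve(model)
    for line in suspects:
        trial = dict(model)
        trial[line] = not trial[line]             # alter the on/off status
        trial_state, trial_r = solve(trial)       # re-solve with the trial model
        if l2_norm(trial_r) < l2_norm(r):         # acceptance criterion
            detected.append(line)
            model, state, r = trial, trial_state, trial_r
    return state, detected, model

# Toy stand-in for the estimator: one unit of residual per mis-modeled line.
true_status = {1: True, 2: False, 3: True}

def toy_solve(model):
    return None, [0.0 if model[l] == true_status[l] else 1.0 for l in sorted(model)]

operator_model = {1: True, 2: True, 3: True}      # line 2 is wrongly modeled as on
_, detected, corrected = detect_topology_errors(operator_model, [2, 3], toy_solve)
print(detected, corrected)
```

Keeping the solver behind a callable makes the search logic independent of how the estimation problem is actually solved.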

F. UNPENALIZED NLAV ESTIMATOR AND UNIQUE SOLUTION
After all the topological errors have been detected and fixed, a final state estimation based on the correct network topology can be performed. However, this does not necessarily guarantee a recovery of the true state $\bar{z}$. In this subsection, we disregard the regularization term $A_{0}$ for simplicity and call this the unpenalized NLAV problem (in other words, we set $A_{0}$ to $0$). Without prior knowledge of the state, designing a favorable penalty term $A_{0}$ could be difficult, in which case setting $A_{0}$ to zero is a sensible choice. Theorem 3 provides a sufficient condition under which the unpenalized NLAV problem has a unique solution. Since the state estimation error bound provided in Theorem 1 is no longer valid without $A_{0}$, Theorem 3 also provides a new bound.
where $t$ is defined as the optimal objective value of the following optimization problem: One can easily verify that there does not exist any set of noiseless measurements for the model $\Omega$ that leads to nonunique exact solutions if and only if $t > 0$. That is to say, if $t > 0$, any global optimizer of the NLAV problem matches the true underlying state that we hope to find (note that this applies once all topological errors have been identified and corrected). Therefore, $t$ can be interpreted as a quantification of the measurement set's capability to generate a unique solution of the over-determined power flow equations. In addition, if $t > 0$, then condition (15) is implied.
Recently, there have been some studies on the connection between the absence of spurious local minima and the restricted isometry property (RIP). A linear map $\mathcal{H} : \mathbb{R}^{K \times K} \to \mathbb{R}^{M}$ is said to satisfy $(r, \delta_{r})$-RIP with constant $0 \leq \delta_{r} < 1$ if there exists $p > 0$ such that for all rank-$r$ matrices $X$: If $\mathcal{H}$ satisfies $(2r, \delta_{2r})$-RIP with $\delta_{2r} < 1$, then finding a global optimum constitutes exact recovery of the state [27]. However, this does not exclude the existence of spurious local minima (local minima that are not globally optimal), which can be problematic when using local search algorithms. In order to guarantee no spurious local minima, it suffices for $\mathcal{H}$ to satisfy $(2r, \delta_{2r})$-RIP with $\delta_{2r} < 0.2$, which is a strict condition [28]. A milder condition on RIP for structured mappings (such as power subsystems) has been developed in [29]. The parameter $t$ introduced above is clearly related to the RIP constant. In fact, $t > 0$ is equivalent to having $\delta_{2r} < 1$, which implies that there is a unique global solution.

IV. SIMULATION RESULTS
In order to assess the efficacy of the proposed NLAV algorithm for detecting topological errors, this section presents numerical simulations on the IEEE 57-bus system and the 118-bus system. For running the simulations, we use MATPOWER data along with the MATLAB fmincon as the local search algorithm.

A. SIMULATION SETUP
In this study, we focus on two types of topological errors. A Type I error occurs when a transmission line is switched off in the true system while it is switched on in the hypothetical model that is accessible to the power system operator; a Type II error occurs when a branch is switched on in the true model while it is switched off in the hypothetical model. Our numerical evaluations consist of multiple cases where we vary the number of erroneous lines and the percentage of line measurements that are available. The procedure of running the simulations is as follows: (1) For a given number of erroneous lines and line measurement percentage, we run 20 simulations; (2) In each simulation, the erroneous lines are randomly chosen and checked to ensure that they preserve the system's observability and that they do not share common buses; (3) The type of topological error is also randomly assigned to each selected erroneous line; (4) In all simulations, full nodal measurements ($p_{k}$, $q_{k}$ and $|v_{k}|$) are considered; (5) The line measurements are randomly selected from the intact lines and no measurements are taken from the erroneous ones; (6) To generate a legitimate state, we assume that the voltage magnitudes are close to unity and the angles are small. More specifically, we select the unknown state by sampling each voltage magnitude from a normal distribution with mean equal to 1 and standard deviation equal to 0.1. This range is wider than typically needed, since most voltage magnitudes lie within 5% of the nominal value. Voltage angles, expressed in radians, are sampled from a normal distribution with mean equal to 0 and standard deviation equal to 0.1. In order to assess the performance of the algorithm, we calculate the true positive rate, the false positive rate, and the suspect rate. In addition, we also report the number of lines that the algorithm checks before termination, which is simply the cardinality of the set $(\mathcal{R}_{N} \setminus \mathcal{R}_{L})$.
Note that for most of this section (with the exception of subsection IV-E), the measurement values are assumed to be noiseless.
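Step (6) of the setup above can be sketched with the Python standard library. The function name `sample_state`, the choice of bus 0 as the slack bus and the seed are assumptions for this illustration; the distributions follow the description in the text (magnitudes with mean 1 and standard deviation 0.1, angles in radians with mean 0 and standard deviation 0.1), and the slack angle is fixed to zero.

```python
import cmath
import random

def sample_state(num_buses, slack=0, sigma_mag=0.1, sigma_ang=0.1, seed=None):
    """Sample a complex voltage vector with near-unity magnitudes and small angles."""
    rng = random.Random(seed)
    v = []
    for k in range(num_buses):
        mag = rng.gauss(1.0, sigma_mag)
        # The slack/reference bus has its angle fixed to zero.
        ang = 0.0 if k == slack else rng.gauss(0.0, sigma_ang)
        v.append(cmath.rect(mag, ang))
    return v

state = sample_state(57, seed=1)
avg_magnitude = sum(abs(x) for x in state) / len(state)
print(len(state), avg_magnitude)
```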

B. EXAMPLE: SPARSE RESIDUALS FOR NLAV
Before analyzing the bulk of the simulation data, we concentrate on a specific example to visually illustrate the ideas discussed in Section III. The example under scrutiny is the scenario with two erroneous lines (lines 8 and 67) and 30% line measurements. Figures 3(a) and 3(c) show the state estimation errors and residuals of NLS in the presence of topological errors. It can be observed from these plots that there is no sparsity pattern, and the large peaks are not even related to the end points of the erroneous lines. This indicates that we need to scan over all realizable combinations of transmission lines to detect the erroneous ones, which is numerically intractable for large systems. In contrast, the state estimation errors and the residuals after the first run of the NLAV (i.e., in the presence of all the topological errors) are shown in Figures 3(b) and 3(d). The largest peaks of the residual vector in this plot are associated with the nodes/lines that are directly connected to (or correspond to) the erroneous lines. This implies that the erroneous lines can be detected by searching over only those lines that are related to the largest peaks of the residual vector. Consequently, by following Algorithm 1, the two erroneous lines are correctly identified. In the following subsection, we present a summary of the extensive simulations conducted on the IEEE 57-bus system.

C. 57-BUS SYSTEM
For the 57-bus system, we consider {1, 3, ..., 15} as the discretized range for the possible number of erroneous lines and {0%, 10%, 20%, ..., 100%} as the discretized range for the possible line measurement percentage. Combining these two sets gives a total of 88 scenarios for this system. Figure 4 shows heat maps of the performance statistics for these 88 scenarios. Figure 4(c) shows that an erroneous line is in the suspect subgraph with high probability. In fact, all of the values are above 0.98, which illustrates that the assumptions made in Theorem 2 are reasonable. Figure 4(a) implies that Algorithm 1 is able to detect most of the erroneous lines given a sufficient number of measurements, and Figure 4(b) indicates that there are close to zero false positives. We can also see that detecting topological errors becomes more difficult as the number of such errors grows. However, note that the number of lines that need to be checked grows only linearly with respect to the number of erroneous lines. More specifically, Figure 4(d) shows that the number of lines to be checked is approximately twice the number of erroneous lines. These results imply that the proposed algorithm is capable of accurately detecting topological errors and therefore provides a tool for robust state estimation if the number of measurements is large enough. The computational time for each run ranges from 5 to 30 seconds (depending on the number of erroneous lines considered) on a laptop with 16GB RAM and an Intel i7-8750H processor.
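As a quick check of the scenario count, the grid can be enumerated directly. The variable names below are illustrative; the value of 20 runs per scenario is taken from the procedure in Section IV-A.

```python
from itertools import product

error_counts = list(range(1, 16, 2))             # 1, 3, ..., 15
line_meas_percentages = list(range(0, 101, 10))  # 0%, 10%, ..., 100%

# Every (number of erroneous lines, line measurement percentage) pair.
scenarios = list(product(error_counts, line_meas_percentages))
runs_per_scenario = 20

print(len(scenarios))                      # → 88
print(len(scenarios) * runs_per_scenario)  # → 1760
```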

D. 118-BUS SYSTEM
To better assess the performance of the proposed NLAV estimator on a more realistic problem, we apply the proposed technique to the IEEE 118-bus system. We follow the procedure described in Section IV-A for the numerical simulations, but consider {5, 15, 25} and {10%, 40%, 70%, 100%}, respectively, as the candidate numbers of erroneous lines and line measurement percentages. Table 1 reports the state estimation and topological error detection results of these analyses, which are consistent with those for the IEEE 57-bus system. The computational time for each run with 5 erroneous lines ranges from 1 to 3 minutes on a laptop with 16GB RAM and an Intel i7-8750H processor.

E. NOISY MEASUREMENTS
So far, the numerical simulations have been performed with noiseless measurements. When measurements are tainted with noise, the overall residual vector is affected, which in turn can hinder the topology error detection. To analyze this more carefully, first consider Figure 5(a), which shows the nodal measurement residuals obtained by solving the initial NLAV on the IEEE 57-bus system with 30 percent line measurements, no noise in the measurement values, and the regularization matrix $A_0$ set to zero. The erroneous lines are lines 8 and 67, the same as in the example of Figure 3. Note that this is a different instance of the problem with a distinct set of measurements and a distinct underlying true state. In Figure 5(a), we can see that the buses associated with the erroneous lines are identified, leading to the correct detection of the erroneous lines. However, once noise is added to the measurements, not only do we get a residual vector that is not sparse, but the peaks of the residuals also sometimes fail to identify the relevant buses. This is illustrated in Figure 5(b), where the peaks of the residuals do not capture bus 8 and bus 29, leading to a failure in topology error detection. An interesting result is observed once we add more line measurements and also a nonzero regularization matrix $A_0 = Y^{*}Y$. This result is shown in Figure 5(c), where the residuals corresponding to bus 8 and bus 29 now exhibit peaks, in contrast to their absence in Figure 5(b). Therefore, we see that a high number of measurements combined with an appropriate regularizer offers a state estimator that is more robust to noise.

V. CONCLUSION
In this article, we propose a novel methodology to solve the state estimation problem for power systems when there exists a modest number of topology errors and also to identify such modeling errors. The established technique minimizes a nonconvex function corresponding to the $\ell_1$-norm of the nonconvex residual errors plus a convex quadratic regularization term. We show that, under mild assumptions, the presented

Before going into the proof, we impose the following two conditions for $A_0$:

Assumption 1. The regularizer matrix $A_0$ satisfies the following properties:

Consider the NLAV problem (9). One can create lower and upper bounds on the optimal objective value as follows:

where (a) is due to the triangle inequality and (b) is due to the optimality of $\bar{v}_*$. The equality (c) follows from $\bar{A}_j(\tilde{\Omega}) = \bar{A}_j(\Omega)$ whenever $j \notin M$. Combining the above lower and upper bounds leads to

By adding and subtracting $\bar{z}^T \bar{A}_j(\tilde{\Omega}) \bar{z}$ in the absolute value of the left-hand side, one can write:

Now, consider the following optimization problem that serves as a tool for deriving a lower bound:

Here, $y$ is a fictitious variable with a dimension of choice, and we denote the objective of the above problem by $f$. By introducing a new variable $t \in \mathbb{R}^{M}$, an equivalent formulation can be written as

Let $p_j^+$ and $p_j^-$ be the nonnegative Lagrange multipliers for the first and second sets of constraints. The Lagrangian can be written as

By defining $d(p^+, p^-) = \min_t L(t, p^+, p^-)$ and noting that $p_j^+ + p_j^- = \rho$ for every $j \in M$ at optimality, we have

Note that $d(p^+, p^-)$ gives a lower bound on $f$. By assumption, there exists a dual certificate $\mu \in \mathbb{R}^{M}$. We can find a pair of vectors $p^+_*$ and $p^-_*$ that satisfy the previous constraint $p^+_* + p^-_* = \rho \cdot \mathbf{1}$ and also the new constraint $p^+_* - p^-_* = \mu$. Then, $d(p^+_*, p^-_*)$ also gives a lower bound on $f$.
Using the fact that $H_\mu^{\Omega} \bar{z} = 0$ and defining $X = \bar{v}_* \bar{v}_*^T$, we can establish the following:

The rest of the proof can be adapted from [30] (Appendix, Proof of Theorem 2). Consider an eigen-decomposition $H_\mu^{\Omega} = U \Lambda U^T$, where $\Lambda = \mathrm{diag}(\lambda_{2K-1}, \ldots, \lambda_1)$ such that $\lambda_{2K-1} \geq \cdots \geq \lambda_1$ and $U$ is a unitary matrix whose columns are the corresponding eigenvectors. Define $\breve{X}$

where $\tilde{X}$ is the $(2K-2)$th-order leading principal submatrix of $\breve{X}$, $\tilde{x}$ is the $(2K-2) \times 1$ leftover vector and $\alpha$ is a scalar. It is known that

Combining (30) and (23) leads to $\mathrm{Tr}(\tilde{X}) \leq 2 \cdot g(z, \eta, \rho) / \lambda_2(H_\mu^{\Omega})$.

Define $\hat{z} = \bar{z} / \|\bar{z}\|_2$ and $\hat{v}_* = \bar{v}_* / \|\bar{v}_*\|_2$. Since $H_\mu^{\Omega}$ is positive semidefinite and the eigenvector corresponding to its smallest eigenvalue (i.e., zero) is $\bar{z}$, the matrix $X$ can be decomposed as

Since $\breve{X} \succeq 0$, the Schur complement dictates the relationship $\tilde{X} - \alpha^{-1} \tilde{x} \tilde{x}^T \succeq 0$. Using the fact that $\alpha = \mathrm{Tr}(\breve{X}) - \mathrm{Tr}(\tilde{X})$, one can write

Therefore,

where (d) follows from the fact that $\tilde{U}^{*} \hat{z} = 0$, (e) is due to $\tilde{U}^T \tilde{U} = I_{2K-2}$ and (f) comes from the fact that $\|\tilde{X}\|_F \leq \mathrm{Tr}(\tilde{X})$. Finally, (g) results from substituting equation (31).
By defining $\beta = \alpha / \|\bar{z}\|_2^2$ and realizing that $\mathrm{Tr}(\bar{v}_* \bar{v}_*^T)$

For notational simplicity, we denote by $x(i)$ the $i$-th element of a vector $x$. Notice that

where (h) and (i) are due to the Cauchy-Schwarz and Hölder inequalities, respectively. Now, combining this inequality with (36) leads to

which completes the proof.
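The epigraph and multiplier steps used in the proof above follow a standard pattern, which can be illustrated on a generic weighted $\ell_1$ problem. The block below is only a sketch under assumed notation: the residual functions $r_j(y)$ are placeholders rather than the measurement residuals of the NLAV problem, and the condition $|\mu_j| \leq \rho$ is stated here because it is what makes both multipliers nonnegative.

```latex
\begin{align*}
f &= \min_{y} \; \rho \sum_{j \in M} |r_j(y)|
   = \min_{y,\,t} \; \rho \sum_{j \in M} t_j
   \quad \text{s.t.} \quad r_j(y) - t_j \le 0, \;\; -r_j(y) - t_j \le 0, \\
L(y, t, p^+, p^-) &= \sum_{j \in M} (\rho - p_j^+ - p_j^-)\, t_j
   + \sum_{j \in M} (p_j^+ - p_j^-)\, r_j(y), \\
\min_{t} L &> -\infty \iff p_j^+ + p_j^- = \rho \quad \forall j \in M, \\
p^+ &= \tfrac{1}{2}(\rho \mathbf{1} + \mu), \quad
p^- = \tfrac{1}{2}(\rho \mathbf{1} - \mu), \quad |\mu_j| \le \rho
\;\Longrightarrow\; p^\pm \ge 0, \;\; p^+ + p^- = \rho \mathbf{1}, \;\; p^+ - p^- = \mu .
\end{align*}
```

Under these conditions, weak duality gives $\min_y \sum_{j \in M} \mu_j\, r_j(y) \leq f$, which is the sense in which a dual certificate $\mu$ produces a lower bound.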

B. PROOF OF THEOREM 2
Define $N(k)$ to be the set of nodes adjacent to node $k$, including $k$ itself. We will focus on a line $l \in E$ that connects two nodes $i$ and $j$.

(1) First, consider the case when $l \in S^c \cap \Xi$. The fact that $l \notin S$ implies that all nodes in the set $N(i) \cup N(j)$ are solvable. Also, since $l \in \Xi$, the nodal residuals at nodes $i$ and $j$ are nonzero, which means that $i, j \in V_N$. Finally, noting that $l \notin R_L$ because there is no line measurement for an erroneous line, we can conclude that $l \in R_N \setminus R_L$.

(2) Second, consider the case when $l \in S^c \cap \Xi^c$. Again, the fact that $l \notin S$ implies that all nodes in the set $N(i) \cup N(j)$ are solvable. Also, since $l \in \Xi^c$, the nodal residuals at nodes $i$ and $j$ are zero, and the line residual on line $l$ is zero. Therefore, we can conclude that $l \notin R_N \cup R_L$.

(3) Third, consider the case when $l \in S \cap \Xi^c$. Since $l \in S$, at least one node in $N(i)$ and at least one node in $N(j)$ are unsolvable. From here, two different scenarios can happen. Scenario one is when at least one of the nodes $i$ and $j$ is unsolvable. In this case, using the fact that there do not exist two distinct sets of voltages that result in the same measurement values, we can conclude that $l \in R_L \cap R_N$. Scenario two is when both nodes $i$ and $j$ are solvable. In this scenario, the nodal residuals at nodes $i$ and $j$ are nonzero but the line residual at $l$ is zero. Therefore, $l \in R_N \setminus R_L$.

(4) Finally, consider the case when $l \in S \cap \Xi$. Since $l \in S$, at least one node in $N(i)$ and at least one node in $N(j)$ are unsolvable. Also, since $l \in \Xi$, the nodal residuals at nodes $i$ and $j$ are nonzero, which means that $i, j \in V_N$. Finally, noting that $l \notin R_L$ because there is no line measurement for an erroneous line, we can conclude that $l \in R_N \setminus R_L$.

From (1)-(4), we can deduce that $l \in R_L \implies l \in \Xi^c \cap S$, which proves the first part of the theorem. Furthermore, we can see that $l \in S \cup \Xi \implies l \in R_N$. Finally, from (2) specifically, we also know that $l \in S^c \cap \Xi^c \implies l \notin R_N$. Together, these establish that $R_N = S \cup \Xi$.

C. PROOF OF THEOREM 3
Proof. Consider equation (23) and set $\bar{A}_0 = 0$, $\rho = 1$. With some basic algebraic manipulations, one can write

$2 \sum_{j \in M} |\bar{z}^T (\bar{A}_j(\tilde{\Omega}) - \bar{A}_j(\Omega)) \bar{z}| + 2 \cdots$

The last equality follows because all of the topological errors have been detected and fixed. This completes the proof.

D. THEOREM 4 AND ITS PROOF
Theorem 4. Denote $f_1(\cdot)$ as the objective function of an NLAV problem with $\Xi_1$ as the set of erroneous lines and $M$ as the index set of measurements. Similarly, denote $f_2(\cdot)$ as the objective function of another NLAV problem with $\Xi_2$ as the set of erroneous lines and $M$ as the index set of measurements. Without loss of generality, suppose that $|\Xi_1| < |\Xi_2|$. Furthermore, assume that for any two vectors of voltages, $\bar{x}$ and $\bar{y}$, and a measurement index $j$, the following holds:

$|\bar{x}^T \bar{A}_j(\tilde{\Omega}) \bar{x} - \bar{y}^T \bar{A}_j(\Omega) \bar{y}| > |\bar{x}^T \bar{A}_j(\Omega) \bar{x} - \bar{y}^T \bar{A}_j(\Omega) \bar{y}|$ (38)

Then, $\min_{\bar{v}} f_1(\bar{v}) < \min_{\bar{v}} f_2(\bar{v})$.

Proof. Let $\bar{v}_1$ and $\bar{v}_2$ be the global minimizers of $f_1(\cdot)$ and $f_2(\cdot)$, respectively. Also, let $M_1$ and $M_2$ be the sets of measurement indices pertaining to the erroneous lines in $\Xi_1$ and $\Xi_2$, respectively. Then, the following inequalities hold:

where (a) follows from the fact that $\bar{A}_j(\tilde{\Omega}) = \bar{A}_j(\Omega)$ if $j \notin M_2$, (b) follows from the fact that $\bar{v}_1$ is the global minimizer of $f_1(\cdot)$ and (c) follows from the assumption made in equation (38).