Online learning for robust voltage control under uncertain grid topology

Voltage control generally requires accurate information about the grid's topology in order to guarantee network stability. However, accurate topology identification is challenging for existing methods, especially as the grid is subject to increasingly frequent reconfiguration due to the adoption of renewable energy. In this work, we combine a nested convex body chasing algorithm with a robust predictive controller to achieve provable finite-time convergence to safe voltage limits in the online setting where there is uncertainty in both the network topology and the load and generation variations. In an online fashion, our algorithm narrows down the set of possible grid models that are consistent with observations and adjusts reactive power generation accordingly to keep voltages within desired safety limits. Our approach can also incorporate existing partial knowledge of the network to improve voltage control performance. We demonstrate the effectiveness of our approach in a case study on a Southern California Edison 56-bus distribution system. Our experiments show that in practical settings, the controller is indeed able to narrow the set of consistent topologies quickly enough to make control decisions that ensure stability in both linearized and realistic nonlinear models of the distribution grid.


I. INTRODUCTION
Operators of electricity distribution grids must maintain voltages at each bus within certain operating limits, as deviations from such limits may damage electrical equipment and cause power outages [1], [2]. This "voltage control" or "voltage regulation" problem has been well-studied, e.g., [3], [4], [5] and the references therein. Voltage control devices and algorithms aim to guarantee grid stability and minimize the costs associated with control inputs. While classic voltage regulation devices such as tap-changing transformers are effective in dealing with slow voltage variations [6], [7], increasing penetration of renewables leads to faster variations, and a growing body of literature has focused on inverter-based controllers that can respond quickly by adjusting their active and reactive power set-points. Most of these works cast voltage control as an optimization problem and then propose different centralized or decentralized algorithms depending on the communication infrastructure.
Typically, voltage control algorithms assume exact knowledge of the underlying grid topology. This includes centralized controllers such as algorithms based on model predictive control (MPC), which optimize control decisions over a short-term horizon. For example, [8] uses MPC to manage distributed generation and energy storage systems, whereas [9] proposes an MPC controller that is robust to uncertainty in the forecasts of future loads and solar generation.
However, the exact grid topology and line parameters are often not known, and using existing voltage control algorithms with incorrect grid information may lead to problems with grid stability [10], [11]. For example, parts of the grid may undergo reconfiguration due to load balancing or unplanned maintenance, as frequently as every hour of the day [12], [13], [14], [15]. This problem is exacerbated by the increasing integration of distributed energy resources (DERs), such as photovoltaic (PV) and storage devices. Especially in distribution grids, where DERs are not owned or operated by the electricity utility, the grid operator may lack up-to-date information about the grid topology [16]. While a grid operator can install sensors to help identify the current network topology, unless such sensors are densely deployed (at great cost), uncertainty about the topology remains. Thus, distribution grid operators cannot expect to operate with perfect topology information, and the design of voltage control algorithms robust to unknown grid topology is crucial.
There are several families of existing algorithms that do not require knowing the network topology: decentralized controllers, model-free controllers, and controllers that first try to infer the network topology. While decentralized voltage control algorithms are generally efficient to implement, such controllers lack voltage stability guarantees when the load is time-varying [4], [17], [18], [19], [20]. Likewise, model-free controllers based on deep reinforcement learning do not require knowing the network topology, but they generally have no performance or voltage stability guarantees and are therefore not suitable for safety-critical infrastructure [21], [22], [23], [24], [25]. Some recent works [26], [27], [28] have proposed methods for introducing stability guarantees for model-free deep reinforcement learning approaches. Their main tool is Lyapunov stability theory, from which a structural constraint for stable controllers is derived, and policy optimization with the constraint is performed. However, their stability guarantees are only valid over an infinite time horizon, and achieving good performance with deep reinforcement learning generally requires large amounts of historical training data. In contrast, our proposed framework jointly learns the system model (consistent with data) and a stable controller in an online fashion, achieving a finite-mistake guarantee and good performance without relying on historical data.
Another standard approach for handling uncertainty about network topology is to first estimate the topology and line parameters using a form of system identification with data and then apply a standard voltage control algorithm using the identified network topology. There is a growing literature of such data-driven methods, e.g., [10], [11], [16], [29], [30], [31], [32], [33], [34], [35], [36], [37]. A common approach is to leverage least squares for system model estimation. The estimation guarantees, and therefore the control guarantees, depend on statistical modeling of measurement noise (e.g., Gaussian). In contrast, we leverage online learning in order to be robust against any bounded disturbances, such as modeling errors and adversarial noise. While least squares-based algorithms focus on asymptotic estimation convergence, e.g., [38], [39], we present a finite mistake guarantee that is crucial for safe transient system behavior.
Another prominent approach is to use graphical models for topology reconstruction [40], via maximum likelihood methods while enforcing other structural restrictions like low rank and sparsity. However, these methods that first perform some form of system identification have drawbacks. First, the estimated topology and/or system dynamics may be imperfect [41], and applying standard voltage control algorithms using these imperfect estimates may still lead to system instability. Second, these methods either assume access to historical data or require acquiring data online over hundreds of time steps, during which the stability of the system is ignored [16], [40]. In contrast, our proposed approach does not perform system identification separately from control; the joint operation of our robust controller with the system dynamics estimation gives rise to our stability guarantee.

A. Contributions
We propose a new approach for voltage control over an uncertain grid topology that does not perform system identification and voltage control separately. Instead, our approach robustly learns to stabilize voltage within the desired limits directly, without prior knowledge of the topology and without needing to precisely learn the topology. Our approach takes ideas from online nested convex body chasing (NCBC) [42] and robust predictive control and combines them using a new learning framework [43] to design a voltage control algorithm. Intuitively, we use an NCBC algorithm to track the set of topologies that are consistent with the observed voltage measurements: as more measurements are taken, the set of consistent topologies shrinks (and so the sets are nested). As these measurements are taken, a form of robust predictive control is used for voltage control, where the robustness guarantee is used to ensure that the uncertainty about the topology can be handled. Our main result (Theorem 1) provides a finite error stability bound for the overall controller, which is summarized in Algorithm 1. This represents the first voltage control algorithm that is provably robust to large uncertainty about network topology.
This paper supersedes the results of the preliminary version of this work [44] in the following aspects: 1) We improve the analysis of [44], which assumes no uncertainty in the maximum load/generation variability, to both reduce the mistake bound by a factor of 2 and improve empirical voltage control performance. 2) We extend our approach to handle uncertainty in the maximum variability of load and generation entities in the grid, and we show that in the limiting case of zero uncertainty, our result coincides with the improved analysis mentioned in 1). 3) We perform case studies of the proposed algorithm on the Southern California Edison (SCE) 56-bus distribution system [45] with a more realistic nonlinear power flow model with partial control and partial observation. Even though the design of our method is based on a linear approximation to the power flow model, our method still performs well for the nonlinear system. 4) We demonstrate how to incorporate existing partial knowledge of the grid topology and network line parameters into the algorithm. We show that incorporating such prior knowledge can improve the performance of our algorithm.

II. MODEL
We study voltage control on an unknown grid topology. We consider a radial (tree-structured) power distribution network represented as a connected directed graph $G = (\mathcal{N}, \mathcal{E})$, where $\mathcal{N} = \{0, 1, 2, \dots, n\}$ is the set of buses (nodes) and $\mathcal{E} \subset \mathcal{N} \times \mathcal{N}$ is the set of lines (directed edges). Let the network be rooted at bus 0 (the substation or slack bus), and let other buses be branch buses. Let $\mathcal{C} \subseteq \mathcal{N}$ denote the subset of buses with controllable reactive power injection. Because the network is radial and rooted at bus 0, there is a unique path $\mathcal{P}_i$ from bus 0 to any other bus $i$. For branch buses, let $v \in \mathbb{R}^n$ be their squared voltage magnitudes and $p + \mathbf{i}q$ be their complex power injection, where $p \in \mathbb{R}^n$ (units W) is the net active power injection, and $q \in \mathbb{R}^n$ (units Var) is the net reactive power injection. The DistFlow branch equations [46] for a distribution grid are as follows, for all $j \in \mathcal{N}$ and $(i, j) \in \mathcal{E}$:
$$P_{ij} = \sum_{k : (j,k) \in \mathcal{E}} P_{jk} - p_j + r_{ij} \ell_{ij}, \tag{1a}$$
$$Q_{ij} = \sum_{k : (j,k) \in \mathcal{E}} Q_{jk} - q_j + x_{ij} \ell_{ij}, \tag{1b}$$
$$v_j = v_i - 2 (r_{ij} P_{ij} + x_{ij} Q_{ij}) + (r_{ij}^2 + x_{ij}^2) \ell_{ij}, \tag{1c}$$
where $P_{ij}$ and $Q_{ij}$ represent the active power and reactive power flow on line $(i, j)$, $\ell_{ij}$ is the squared current magnitude on line $(i, j)$, and $r_{ij}, x_{ij} > 0$ are the real-valued line resistance and reactance (units Ω). Equations (1a) and (1b) represent the real and reactive power conservation at bus $j$, and (1c) represents the voltage drop from bus $i$ to bus $j$.
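Assuming the branch power losses $(r_{ij} \ell_{ij}, x_{ij} \ell_{ij})$ are negligible yields the simplified DistFlow equations [47], which can be rearranged into
$$v = R^\star p + X^\star q + v^0 \mathbf{1}_n, \tag{2}$$
where $v^0 \in \mathbb{R}$ is the known, constant squared voltage magnitude at the substation, and $R^\star, X^\star \in \mathbb{S}^n$ are computed from the network topology and line parameters as
$$R^\star_{ij} = 2 \sum_{(h,k) \in \mathcal{P}_i \cap \mathcal{P}_j} r_{hk}, \qquad X^\star_{ij} = 2 \sum_{(h,k) \in \mathcal{P}_i \cap \mathcal{P}_j} x_{hk}, \qquad i, j \in [n], \tag{3}$$
with $[n] := \{1, \dots, n\}$ [18]. ($\mathbb{S}^n$ is the set of symmetric $n \times n$ matrices.) $R^\star, X^\star$ are positive definite with nonnegative entries [48], and the largest entry of each row of these matrices is along the diagonal, since
$$X^\star_{ij} = 2 \sum_{(h,k) \in \mathcal{P}_i \cap \mathcal{P}_j} x_{hk} \le 2 \sum_{(h,k) \in \mathcal{P}_i} x_{hk} = X^\star_{ii}, \tag{4}$$
and likewise for $R^\star_{ij} \le R^\star_{ii}$.
We assume that the active power injection $p$ is exogenous but that reactive power at each bus can be decomposed as $q = q^c + q^e$, where $q^c$ is the "controllable" component and $q^e$ is the "exogenous" (i.e., uncontrollable) component. Following [18], we define $v^{\mathrm{par}} = R^\star p + X^\star q^e + v^0 \mathbf{1}_n \in \mathbb{R}^n$ ("par" stands for "partial") representing the exogenous effects on voltage. Then, $v = X^\star q^c + v^{\mathrm{par}}$, which can be modeled as a discrete-time linear system
$$v(t+1) = X^\star q^c(t) + v^{\mathrm{par}}(t). \tag{5}$$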
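The path-overlap construction of $X^\star$ described above can be sketched in a few lines of Python. The 5-bus feeder, its reactances, and the helper names `path_lines` and `build_sensitivity_matrix` are hypothetical and chosen only for illustration; the sketch assumes each entry is twice the summed reactance of the lines shared by the two root paths.

```python
# Hypothetical 5-bus radial feeder (bus 0 is the substation).
# parent[j] is the bus immediately upstream of bus j, and reactance[j] is the
# reactance (in ohms) of the line (parent[j], j). All values are made up.
parent = {1: 0, 2: 1, 3: 1, 4: 3}
reactance = {1: 0.10, 2: 0.20, 3: 0.05, 4: 0.30}


def path_lines(bus):
    """Lines on the unique path from bus 0 to `bus`, keyed by downstream bus."""
    lines = set()
    while bus != 0:
        lines.add(bus)
        bus = parent[bus]
    return lines


def build_sensitivity_matrix(line_param):
    """Return M with M[i][j] = 2 * (sum of line_param over lines shared by P_i and P_j)."""
    buses = sorted(parent)
    paths = {i: path_lines(i) for i in buses}
    return [
        [2 * sum(line_param[k] for k in paths[i] & paths[j]) for j in buses]
        for i in buses
    ]


X = build_sensitivity_matrix(reactance)
for row in X:
    print([round(entry, 2) for entry in row])
```

In this toy feeder the lowest common ancestor of buses 2 and 4 is bus 1, so the entry for the pair (2, 4) equals the diagonal entry for bus 1, and every row attains its maximum on the diagonal. Passing line resistances instead of reactances to the same helper would give the corresponding $R$ matrix.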
Substituting $u(t) = q^c(t) - q^c(t-1)$ (change in controllable reactive power injection) and $w(t) = v^{\mathrm{par}}(t) - v^{\mathrm{par}}(t-1)$ (change in exogenous noise) yields the linear dynamical system
$$v(t+1) = v(t) + X^\star u(t) + w(t). \tag{6}$$
The voltage control problem [45] is to drive the squared voltage magnitudes of each bus from an initial state $v(1) \in \mathbb{R}^n$ into a given multi-dimensional interval $[\underline{v}, \overline{v}] \subset \mathbb{R}^n$; it is possible that $v(1)$ does not start within the interval due to some large initial disturbance. For all $t \ge 2$, the voltage control algorithm aims to maintain $v(t)$ within $[\underline{v}, \overline{v}]$, ideally as close as possible to a "nominal" value $v^{\mathrm{nom}} \in [\underline{v}, \overline{v}]$, typically $v^{\mathrm{nom}} = (\underline{v} + \overline{v})/2$. The cost for deviating from $v^{\mathrm{nom}}$ is measured by $\|v(t) - v^{\mathrm{nom}}\|_{P_v}^2$ for some positive semidefinite matrix $P_v$, where $\|x\|_A^2 := x^\top A x$. At each time step, buses may change their reactive power injection $q^c(t)$ in order to regulate the voltage close to $v^{\mathrm{nom}}$. The reactive power injection (including $q^c(0)$) is limited within a given bound $[\underline{q}, \overline{q}] \subset \mathbb{R}^n$. Buses not in $\mathcal{C}$ do not have any ability to control the reactive power injection: $\underline{q}_i = \overline{q}_i = 0$ for all $i \notin \mathcal{C}$. We do not place any hard "ramp constraints" on $u(t)$. However, we impose a quadratic ramping cost $\|u(t)\|_{P_u}^2$.
In summary, the voltage control problem is to determine an online sequence of reactive power injections $q^c(1), q^c(2), \dots$ to drive voltages $v(t)$ to a desired interval $[\underline{v}, \overline{v}]$ while minimizing voltage violation and control costs $\|v(t) - v^{\mathrm{nom}}\|_{P_v}^2 + \|u(t)\|_{P_u}^2$. In this work, we solve the voltage control problem in the setting where $X^\star$ is unknown.
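As a quick numerical check of the change of variables above, the following sketch simulates a hypothetical 2-bus system in both the absolute form $v(t+1) = X^\star q^c(t) + v^{\mathrm{par}}(t)$ and the incremental form $v(t+1) = v(t) + X^\star u(t) + w(t)$, and evaluates the stage cost with the diagonal weights $P_v = 0.1I$ and $P_u = 10I$ used later in the case study. The matrix, the time series, and the helper names are made up for illustration.

```python
def matvec(M, x):
    """Matrix-vector product for small dense lists."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def weighted_sq_norm(x, weight):
    """||x||_P^2 for the diagonal weight P = weight * I."""
    return weight * sum(x_i * x_i for x_i in x)


# Hypothetical 2-bus data in arbitrary units; list index t holds the value at time t.
X_star = [[0.2, 0.2], [0.2, 0.6]]
v_par = [[1.00, 0.98], [1.01, 0.97], [0.99, 0.99]]
q_c = [[0.00, 0.00], [0.05, -0.02], [0.02, 0.04]]
v_nom = [1.0, 1.0]

# Absolute form: v_abs[t] holds v(t+1) = X* q_c(t) + v_par(t).
v_abs = [[a + b for a, b in zip(matvec(X_star, q), vp)] for q, vp in zip(q_c, v_par)]

# Incremental form, started from the same v(1).
v_inc = [v_abs[0]]
costs = []
for t in (1, 2):
    u = [a - b for a, b in zip(q_c[t], q_c[t - 1])]      # u(t)
    w = [a - b for a, b in zip(v_par[t], v_par[t - 1])]  # w(t)
    v_now = v_inc[-1]                                    # v(t)
    deviation = [a - b for a, b in zip(v_now, v_nom)]
    costs.append(weighted_sq_norm(deviation, 0.1) + weighted_sq_norm(u, 10.0))
    v_inc.append([v + xu + w_i for v, xu, w_i in zip(v_now, matvec(X_star, u), w)])

print(v_abs)
print(v_inc)
print(costs)
```

Both trajectories agree up to floating-point rounding, which is why the controller can work entirely with the increments $u(t)$ and $w(t)$.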

III. ROBUST ONLINE VOLTAGE CONTROL
In this section we introduce our robust online voltage control algorithm (Algorithm 1) and its performance bound (Theorem 1), which is the main result of this paper.

A. Algorithm
As shown in Figure 1, the algorithm has two main components: a consistent model chasing algorithm SEL (Algorithm 1, step 2) and a robust control oracle Π (Algorithm 1, step 3). SEL and Π are combined by adapting ideas from [43].
The model chasing algorithm SEL selects a consistent model for the robust control oracle Π out of all plausible models that are consistent with the online observations and prior knowledge of the grid. The selection may use any competitive algorithm for NCBC, the online problem of choosing a sequence of points within sequentially nested convex sets with the aim of minimizing the sum of distances between the chosen points [42]. In our experiments, we use a simple projection-based NCBC algorithm, detailed in Section V.
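To make the NCBC problem concrete, here is a minimal sketch of greedy projection in the plane. Axis-aligned boxes stand in for general convex sets because their Euclidean projection is a simple clip; in the setting of this paper the sets live in a matrix parameter space and the projection would require a convex solver. The boxes, the starting point, and the function names are hypothetical.

```python
import math


def project_onto_box(point, box):
    """Euclidean projection of `point` onto an axis-aligned box [(lo, hi), ...]."""
    return [min(max(p, lo), hi) for p, (lo, hi) in zip(point, box)]


def greedy_chase(start, bodies):
    """Greedy NCBC: move to the nearest point of each newly revealed body.

    Returns the chosen points and the total distance moved.
    """
    point, total, path = list(start), 0.0, []
    for body in bodies:
        nxt = project_onto_box(point, body)
        total += math.dist(point, nxt)
        point = nxt
        path.append(point)
    return path, total


# Three nested boxes, each contained in the previous one (made-up numbers).
bodies = [
    [(0.0, 4.0), (0.0, 4.0)],
    [(1.0, 3.0), (0.5, 3.5)],
    [(2.0, 3.0), (2.0, 3.0)],
]
path, total = greedy_chase([0.0, 0.0], bodies)
print(path, total)
```

The total movement is the quantity that a competitive NCBC algorithm keeps bounded relative to the best choice in hindsight.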
The robust control oracle Π is a novel robust predictive controller (Theorem 3). The robustness guarantee of Π is necessary for the analysis, which integrates SEL with Π to provide the finite mistake guarantee of the overall algorithm. We remark that other choices for either component are possible, as long as they provide the guarantees needed in the analysis in Section IV.
Intuitively, SEL and Π are combined in a way such that SEL always reduces the uncertainty about the unknown model whenever Π outputs an action that causes a voltage limit violation. This means that Π cannot take too many "bad" actions before the system uncertainty is small.

B. Assumptions
Before presenting the main results, we introduce three assumptions that underlie our analysis and discuss why they are both needed and practical.
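Assumption 1: The change in noise is bounded as
$$\|w(t)\|_\infty \le \eta^\star \quad \forall t,$$
where $\eta^\star \in [0, \overline{\eta}]$ is unknown and the upper bound $\overline{\eta}$ is known.
This first assumption is standard and bounds the noise in the dynamics. It represents realistic behavior in power systems where the active and exogenous reactive power injections do not vary dramatically between time steps, as can be seen by expanding $w(t)$:
$$w(t) = v^{\mathrm{par}}(t) - v^{\mathrm{par}}(t-1) = R^\star \big(p(t) - p(t-1)\big) + X^\star \big(q^e(t) - q^e(t-1)\big).$$
For example, if the net active and exogenous reactive power injection is the same at time steps $t$ and $t-1$, then $w(t) = 0$.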
An unknown $\eta^\star$ indicates uncertainty in the maximum variability of the exogenous power injections. Unlike [44], which assumes a fixed $\eta$, our inclusion of both an unknown $\eta^\star$ and a known upper bound $\overline{\eta}$ allows more flexibility in our algorithmic design and the incorporation of prior knowledge.
Assumption 2: The true model $X^\star$ lies within a known compact, convex uncertainty set $\mathcal{X} \subset \mathbb{S}^n_+ \cap \mathbb{R}^{n \times n}_+$. ($\mathbb{S}^n_+$ is the set of $n \times n$ positive semidefinite matrices, and $\mathbb{R}^{n \times n}_+$ is the set of $n \times n$ matrices with nonnegative entries.)
Our second assumption bounds the uncertainty about the network topology and line parameters. It ensures that the unknown true model parameters $X^\star$ belong to a compact, convex set $\mathcal{X}$, which is a minimal assumption necessary for proving an analytic guarantee. $P_1 = \mathcal{X} \times [0, \overline{\eta}]$ forms the initial "consistent set" (see Definition 2) for our consistent model chasing algorithm SEL.
This assumption is realistic, as a grid operator should have at least some prior knowledge about the distribution grid topology and the range of possible line parameters, even if they do not have the exact values. In cases where the grid has multiple possible topologies due to switches, $\mathcal{X}$ could be set to the convex hull of the corresponding $X$ matrices.
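Definition 1 ($\|\cdot\|_\triangle$ and $\|\cdot\|_{\triangle,\delta}$): For any matrix $X \in \mathbb{S}^n$ and scalars $\eta, \delta \ge 0$, define
$$\|X\|_\triangle := \|\mathrm{triu}(X)\|_F, \qquad \|(X, \eta)\|_{\triangle,\delta} := \sqrt{\|X\|_\triangle^2 + (\delta \eta)^2},$$
where $\mathrm{triu}(X)$ is the upper-triangular part of $X$ (including the diagonal) and $\|\cdot\|_F$ is the Frobenius norm. For any sets $\mathcal{X} \subseteq \mathbb{S}^n$ and $A \subseteq \mathbb{R}$, we define diameters $\mathrm{diam}(\mathcal{X})$ and $\mathrm{diam}(\mathcal{X} \times A)$ with respect to the norms $\|\cdot\|_\triangle$ and $\|\cdot\|_{\triangle,\delta}$, respectively.
These norms isometrically map our parameter space to Euclidean space, enabling us to take advantage of known results on NCBC within Euclidean space. For the norm $\|\cdot\|_{\triangle,\delta}$, the hyperparameter $\delta$ sets the relative weight of $X$ and $\eta$ in the norm. The choice of $\delta$ is discussed in Section V.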
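The isometry to Euclidean space can be illustrated with a short sketch. It assumes that $\|X\|_\triangle$ is the Frobenius norm of the upper triangle of $X$ (diagonal included) and that $\|(X, \eta)\|_{\triangle,\delta}$ combines it with $\delta\eta$ in a Euclidean fashion; this reading is an assumption made for the example, chosen to be consistent with how $\rho$ and $\delta$ appear in the proof of Theorem 3. The function names and numbers are hypothetical.

```python
import math


def embed(X, eta, delta):
    """Stack the upper triangle of X (with diagonal) and delta * eta into one vector."""
    n = len(X)
    vec = [X[i][j] for i in range(n) for j in range(i, n)]
    vec.append(delta * eta)
    return vec


def norm_tri(X):
    """Assumed ||X||_triangle: Frobenius norm of the upper triangle of X."""
    n = len(X)
    return math.sqrt(sum(X[i][j] ** 2 for i in range(n) for j in range(i, n)))


def norm_tri_delta(X, eta, delta):
    """Assumed ||(X, eta)||_{triangle,delta}."""
    return math.sqrt(norm_tri(X) ** 2 + (delta * eta) ** 2)


X = [[3.0, 1.0], [1.0, 2.0]]  # made-up symmetric matrix
eta, delta = 0.5, 4.0
vec = embed(X, eta, delta)
euclidean = math.sqrt(sum(c * c for c in vec))
print(len(vec), norm_tri_delta(X, eta, delta), euclidean)
```

Under this reading, a symmetric $n \times n$ matrix together with $\eta$ maps to a vector of dimension $n(n+1)/2 + 1$, and a larger $\delta$ makes movement in $\eta$ more expensive relative to movement in $X$.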
In practice, we consider uncertainty sets of the form A larger $\alpha$ yields a larger uncertainty set. From Section II (e.g., (4)), we know that Furthermore, we can incorporate partial knowledge we may have of the network topology and/or line parameters by adding constraints to the description of $\mathcal{X}$. For example, if we know that the lowest common ancestor between buses $i, j$ in the network is bus $k$, then $\mathcal{P}_i \cap \mathcal{P}_j = \mathcal{P}_k$, and we can add the following constraint on $X$, which is a consequence of (3):
$$X_{ij} = X_{kk}. \tag{7}$$
If we additionally know the values for some line parameters $x_{ij}$, we may be able to further constrain some entries of $X$, again by applying (3).
Assumption 3: There exists a compact, convex set $\mathcal{V}^{\mathrm{par}} \subset \mathbb{R}^n$ such that $\forall t \ge 0 : v^{\mathrm{par}}(t) \in \mathcal{V}^{\mathrm{par}}$. Furthermore, for some known $\epsilon > 0$,
$$\forall X \in \mathcal{X},\ \forall v^{\mathrm{par}} \in \mathcal{V}^{\mathrm{par}},\ \exists\, q^c \in [\underline{q}, \overline{q}] : \quad X q^c + v^{\mathrm{par}} \in \big[\underline{v} + (\overline{\eta} + \epsilon) \mathbf{1},\ \overline{v} - (\overline{\eta} + \epsilon) \mathbf{1}\big].$$
Our final assumption is about the existence of feasible control actions for the robust control oracle. This assumption can be interpreted as either a bound on the noise, or a requirement that the controllable reactive power injection be flexible enough to satisfy the demand of any admissible noise. It represents the reasonable assumption that a grid operator should have installed enough controllable reactive power injection capability to perform voltage control. Intuitively, the $\overline{\eta}$ padding is required for robustness to the noise $w(t)$, while the $\epsilon$ padding is required for robustness to model uncertainty (i.e., uncertainty about $X^\star$).

C. Main result
We now state our main result, which is a finite-error bound for Algorithm 1.
Theorem 1 (Main Result): Under Assumptions 1 to 3, Algorithm 1 ensures that the voltage limits will be violated at most .
To the best of our knowledge, this result is the first provable stability bound for voltage control in a setting where the network topology is unknown. It highlights that Algorithm 1 can ensure stability even after unknown changes to the network topology (e.g., due to maintenance or failures) without the need to perform system identification, while remaining robust to any bounded and potentially adversarial perturbations satisfying Assumptions 1 and 3.
Intuitively, this result guarantees that the model chasing algorithm SEL will learn a "good enough" model for control quickly. When the robust controller Π makes a mistake, the model chasing algorithm will learn from that mistake and significantly reduce the set of consistent models. Because the initial set of consistent models is bounded, and this set shrinks a significant amount after each mistake, the total number of mistakes is bounded. Note that this finite mistake bound implies finite-time convergence to safe voltage limits without an explicit finite-time bound.

Algorithm 1 Online Robust Voltage Controller
Inputs
• desired nominal squared voltage magnitude: $v^{\mathrm{nom}}$
3) Query the robust control oracle for the next control action $u(t)$ by solving (9), where $\rho = \delta\epsilon / (1 + \delta \|\overline{q} - \underline{q}\|_2)$.
4) Apply the control action $u(t)$. Observe the system transition to $v(t+1) = v(t) + X^\star u(t) + w(t)$ and $q^c(t) = q^c(t-1) + u(t)$.
5) Append $(v(t), v(t+1), u(t), q^c(t))$ to the trajectory $D$.
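The interplay between model chasing and control can be illustrated on a scalar toy system. The sketch below is not Algorithm 1: it assumes a single bus, a known noise bound `eta`, a consistent set that is an interval, and a certainty-equivalent controller standing in for the robust oracle Π. All numbers and names are hypothetical. It does show the mechanism described above: one bad action caused by a poor initial model shrinks the consistent interval enough that later actions keep the voltage within limits.

```python
def clip(value, lo, hi):
    return min(max(value, lo), hi)


def run_toy_controller(x_star=0.5, eta=0.02, steps=12):
    """Scalar stand-in for the SEL + oracle loop; returns (mistakes, interval, final v)."""
    v_lo, v_hi, v_nom = 0.95, 1.05, 1.0
    q_lo, q_hi = -1.0, 1.0
    lo, hi = 0.1, 2.0      # initial consistent interval for the unknown x_star
    x_hat = 2.0            # deliberately poor initial model
    v, q = 1.20, 0.0       # the voltage starts outside the limits
    mistakes = 0
    for t in range(steps):
        # Model chasing: greedy projection of the previous estimate.
        x_hat = clip(x_hat, lo, hi)
        # Control: aim the one-step prediction v + x_hat * u at v_nom.
        q_next = clip(q + (v_nom - v) / x_hat, q_lo, q_hi)
        u = q_next - q
        # True system with a bounded disturbance |w| <= eta.
        w = 0.5 * eta if t % 2 == 0 else -0.5 * eta
        v_next = v + x_star * u + w
        if not (v_lo <= v_next <= v_hi):
            mistakes += 1
        # Keep only the models x with |v_next - v - x * u| <= eta.
        if u != 0:
            a = (v_next - v - eta) / u
            b = (v_next - v + eta) / u
            lo, hi = max(lo, min(a, b)), min(hi, max(a, b))
        v, q = v_next, q_next
    return mistakes, (lo, hi), v


mistakes, interval, v_final = run_toy_controller()
print(mistakes, interval, v_final)
```

The true parameter never leaves the consistent interval, because the interval is built from exactly the inequalities that the true system satisfies.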
To interpret the error bounds in Theorem 1, we notice that they are proportional to the diameter of the parameter space and the competitive ratio $\gamma(m)$ of the NCBC algorithm, and inversely proportional to the oracle robustness margin $\rho$. Because of computational tractability concerns, our experiments implement SEL with a greedy projection-based NCBC algorithm with $\gamma_{\mathrm{proj}}(m) = \pi (m-1) m^{m/2}$ [42], rather than the state-of-the-art Steiner point method, which can achieve $\gamma_{\mathrm{Steiner}}(m) = m/2$ [49]. As our case studies show, in practice the projection-based NCBC algorithm performs much better than the worst-case bound. We note that any other NCBC algorithm with a finite competitive ratio can be used in (8a) in Algorithm 1. Investigating whether widely used estimation methods, like least squares, have a finite competitive ratio would be an interesting avenue for future research.
Note that for Theorem 1 to hold, the optimization problem for the robust control oracle Π should first be solved without the slack variable $\xi$ in Algorithm 1. This ensures that if $(\hat{X}_t, \hat{\eta}_t)$ is sufficiently close to the true model, then the algorithm will not make a mistake. In the case that Π is infeasible initially (e.g., when the initial model estimate is far from the true model), it should be solved again with a slack variable, which ensures feasibility. However, solving Π twice is unnecessary in practice, and so we have written Algorithm 1 to reflect its practical implementation.
We outline a proof of Theorem 1 in the next section. We want to highlight one piece of that proof that is of independent interest. In particular, a major step in the proof is to provide a feasibility guarantee for the robust control oracle component Π of the algorithm, which is done in Theorem 3.

IV. PROOFS
We now prove our main result, Theorem 1. Our proof builds on and adapts the approach of [43], which outlines a general framework for integrating model chasing and robust control. To explain the general framework, we first consider a discrete-time nonlinear dynamical system where $x \in S \subseteq \mathbb{R}^n$ is the system state and $u \in U \subseteq \mathbb{R}^m$ is the control input. The unknown function $f^*$ and disturbance sequence $w \in \ell^\infty(\mathbb{Z}_+; \mathbb{R}^n)$ belong to an uncertainty set $\mathcal{F}$, and the disturbance is bounded as $\|w\|_\infty \le \eta$. Assume that $\mathcal{F}$ has a compact parametrization $(T, K, d)$, where $T : K \to \wp(\mathcal{F})$ is a mapping from a parameter space $K$ to a set of functions and disturbances such that The control objective is specified as a sequence of indicator "goal" functions $G = (G_0, G_1, \dots)$. Each $G_t : X \times U \to \{0, 1\}$ encodes a desired condition per time step $t$: The main result of [43] specifies a set of sufficient conditions for a finite-mistake guarantee, i.e., $\sum_{t=0}^{\infty} G_t(x_t, u_t) < \infty$. These conditions decouple online robust control into separate online learning and robust control components.
The online learning component requires a consistent model chasing algorithm SEL, which takes as input the current observed trajectory $D_t$ and outputs an estimated parameter $\theta_t \in K$ which must be consistent with $D_t$.
Let $P_t$ denote the set of all parameters consistent with $D_t$; $P_t$ is called the consistent set. We say SEL is $\gamma$-competitive if $\sum_{t=1}^{\infty} d(\theta_t, \theta_{t-1}) \le \gamma \max_{\theta \in K} d(P_\infty, \theta)$ holds for a fixed constant $\gamma > 0$, which we call the competitive ratio.
The robust control component requires a control oracle Π, which, given the current state $x_t$ and a parameter $\theta_t$, outputs a control action $u_t = \Pi_{\theta_t}(x_t)$ that is robust for all systems that are close to $\theta_t$. In particular, we call a control oracle $\rho$-robust for control objective $G$ if all trajectories in $S_\Pi[\rho; \theta]$ achieve $G$ after finitely many mistakes. $S_\Pi[\rho; \theta]$ is defined as the set of all possible trajectories generated by $\Pi_{\hat{\theta}}$ for all $\hat{\theta}$ such that $d(\hat{\theta}, \theta) \le \rho$: Due to the page limit, we refer readers to [43] for a more detailed discussion of consistent model chasing algorithms and $\rho$-robust control oracles. In summary, if SEL chases consistent models and Π is a robust oracle for $G$, then the resulting $\mathcal{A}_\Pi(\mathrm{SEL})$ algorithm achieves a finite mistake guarantee, which is stated in the following.
Theorem 2: [43, Theorem 2.5] Assume that SEL chases consistent models and Π is a robust oracle for objective $G$. Then for any starting point $x_0$ and trajectory $[(x_t, u_t)]_{t=0}^{\infty}$ generated by $\mathcal{A}_\Pi(\mathrm{SEL})$ (illustrated in Figure 1), the following mistake guarantees hold: where $M^\Pi_\rho$ denotes the worst-case total mistakes of the $\rho$-robust control oracle Π.
To apply Theorem 2 to prove Theorem 1, we need to prove that (i) the proposed algorithm (8) chases consistent models and has a bounded competitive ratio, and (ii) the proposed robust algorithm in (11) is a $\rho$-robust control oracle, for bounded disturbance in the system topology. In particular, the correspondence of the definitions is as follows. We have $\theta = (X, \eta)$, and We begin by proving that the set $P_t$ defined in (8b) in Algorithm 1 is consistent with the trajectory $D_t$.
Observe that each $P_t$ is a closed, bounded, and convex set. Furthermore, $P_t$ is non-empty, since $(X^\star, \eta^\star) \in P_t$. Intuitively, $P_t$ is the smallest set containing all parameters that could generate the observed trajectory $D_t$ along with a corresponding admissible sequence of noise compatible with Assumptions 1 to 3.
The consistent sets are nested, $P_t \subseteq P_{t-1}$, and we use our particular choice of norm $\|\cdot\|_{\triangle,\delta}$ to establish a linear bijection between $(\mathbb{S}^n \times \mathbb{R}, \|\cdot\|_{\triangle,\delta})$ and Euclidean space $(\mathbb{R}^m, \|\cdot\|_2)$. This allows us to take advantage of any $\gamma(m)$-competitive NCBC algorithm in Euclidean space [42], [49], where $m$ is the dimension of the space, to prove that SEL is $\gamma(m)$-competitive. This is formalized in the following lemma.
Lemma 2 (SEL is competitive): If the NCBC algorithm used in SEL has competitive ratio $\gamma(m)$, then SEL is $\gamma(m)$-competitive.
Proof: The proof is similar to [44, Lemma 2], except that learning $\eta$ adds an additional dimension to the parameter space. That is, there exists a norm-preserving linear bijection between $(\mathbb{S}^n \times \mathbb{R}, \|\cdot\|_{\triangle,\delta})$ and Euclidean space $(\mathbb{R}^m, \|\cdot\|_2)$.
Finally, we show that our controller Π is $\rho$-robust. In particular, we prove that $\Pi_{\hat{X}}$ makes no mistakes ($M^\Pi_\rho = 0$) given consistent parameters $(\hat{X}, \hat{\eta}) \in P_t$.
Theorem 3 (Π is $\rho$-robust): Under Assumptions 1 to 3, suppose $(\hat{X}, \hat{\eta}) \in P_t$, where $P_t$ is given in (10) for $t \ge 1$. Define $\rho = \frac{\delta\epsilon}{1 + \delta \|\overline{q} - \underline{q}\|_2}$. Then, the following optimization problem is feasible: Further, the solution of (11), $u(t)$, guarantees voltage stability for all $(X, \eta) \in \mathcal{X} \times [0, \overline{\eta}]$ satisfying $\|(X, \eta) - (\hat{X}, \hat{\eta})\|_{\triangle,\delta} \le \rho$.
Observe that (11) corresponds to (9) in Algorithm 1 with the slack variable set to zero. We note that the robustness margin $\rho$ decreases as the interval $[\underline{q}, \overline{q}]$ widens. The intuitive reason is that the voltage is more sensitive to changes in $X$ when the range of possible $u$'s expands. Therefore, a fixed voltage buffer of $\epsilon$ in constraints (9e) and (11d) affords less robustness to changes in $X$ as $[\underline{q}, \overline{q}]$ gets larger.
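Proof of Theorem 3: First, we will show that the following two conditions are sufficient for feasibility of the optimization problem and $\rho$-robustness for the solution $u$:
$$k \le \overline{\eta} + \epsilon \quad \text{(feasibility)}, \qquad k \ge \hat{\eta} + \rho \sqrt{\tfrac{1}{\delta^2} + \|u\|_2^2} \quad \text{(robustness)}.$$
Then, we will show that our choices of $k$ and $\rho$ satisfy these sufficient conditions.
To derive the sufficient condition for feasibility, define
$$\hat{v}^{\mathrm{par}}(t-1) := v(t) - \hat{X} q^c(t-1)$$
as the conjectured noise when we assume the underlying parameter is $\hat{X}$. Since $(\hat{X}, \hat{\eta}) \in P_t$ and $P_t \subseteq P_{t-1}$, we have $\hat{v}^{\mathrm{par}}(t-1) \in \mathcal{V}^{\mathrm{par}}$. Then, by Assumption 3, there exists $q^c \in [\underline{q}, \overline{q}]$ such that
$$\hat{X} q^c + \hat{v}^{\mathrm{par}}(t-1) \in \big[\underline{v} + (\overline{\eta} + \epsilon) \mathbf{1},\ \overline{v} - (\overline{\eta} + \epsilon) \mathbf{1}\big].$$
Set $u = q^c - q^c(t-1)$ (which satisfies (11b)) and define
$$\hat{v}'(u) := v(t) + \hat{X} u.$$
Recalling (6), we can interpret $\hat{v}'(u)$ as the one-step voltage prediction (without disturbance) under the model $\hat{X}$ given control action $u$ and the current voltage $v(t)$. We thus have
$$\hat{v}'(u) = v(t) + \hat{X} \big(q^c - q^c(t-1)\big) = \hat{X} q^c + \hat{v}^{\mathrm{par}}(t-1) \in \big[\underline{v} + (\overline{\eta} + \epsilon) \mathbf{1},\ \overline{v} - (\overline{\eta} + \epsilon) \mathbf{1}\big].$$
Therefore, as long as $k \le \overline{\eta} + \epsilon$, $u$ will satisfy constraint (11e).
Next, we derive the sufficient condition for robustness. Let $u$ be a solution of (11), so it satisfies (11e), i.e., $\underline{v} + k \mathbf{1} \le \hat{v}'(u) \le \overline{v} - k \mathbf{1}$. Let $(X, \eta) \in \mathcal{X} \times [0, \overline{\eta}]$ be arbitrary parameters satisfying $\|(X, \eta) - (\hat{X}, \hat{\eta})\|_{\triangle,\delta} \le \rho$. Define $\rho_X := \|X - \hat{X}\|_\triangle$. By Lemma 3,
$$\|(X - \hat{X}) u\|_\infty \le \rho_X \|u\|_2. \tag{12}$$
Furthermore, suppose
$$\|w\|_\infty \le \eta. \tag{13}$$
Adding together the 3 inequalities (11e), (12), (13) yields
$$\underline{v} + (k - \rho_X \|u\|_2 - \eta) \mathbf{1} \le v(t) + X u + w \le \overline{v} - (k - \rho_X \|u\|_2 - \eta) \mathbf{1}.$$
Clearly, if $k - \rho_X \|u\|_2 - \eta \ge 0$, then the desired robustness condition is satisfied. Since $\rho_X^2 + \delta^2 (\eta - \hat{\eta})^2 \le \rho^2$, we have $\eta \le \hat{\eta} + \frac{1}{\delta} \sqrt{\rho^2 - \rho_X^2}$. Therefore, we can express the robustness condition in terms of $\hat{\eta}$:
$$k \ge \hat{\eta} + f(\rho_X), \qquad f(\rho_X) := \rho_X \|u\|_2 + \tfrac{1}{\delta} \sqrt{\rho^2 - \rho_X^2}.$$
For $\rho > 0$, $f(\rho_X)$ is strictly concave and twice-differentiable and therefore achieves its maximum when $f'(\rho_X) = 0$. This occurs at $\rho_X = \rho \|u\|_2 / \sqrt{1/\delta^2 + \|u\|_2^2}$, where $f(\rho_X) = \rho \sqrt{1/\delta^2 + \|u\|_2^2}$. Thus, if $k$ is at least $\hat{\eta}$ plus this value, then we achieve robustness.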
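Finally, we show that our choices of $k$ and $\rho$ satisfy the sufficient conditions. Since $a + b \ge \sqrt{a^2 + b^2}$ for all $a, b \ge 0$, our choice of $k$ satisfies the robustness condition:
$$k = \hat{\eta} + \rho \left(\tfrac{1}{\delta} + \|u\|_2\right) \ge \hat{\eta} + \rho \sqrt{\tfrac{1}{\delta^2} + \|u\|_2^2}.$$
It also satisfies the feasibility condition, because $\|u\|_2 \le \|\overline{q} - \underline{q}\|_2$ gives $k \le \hat{\eta} + \rho \left(\tfrac{1}{\delta} + \|\overline{q} - \underline{q}\|_2\right) = \hat{\eta} + \epsilon \le \overline{\eta} + \epsilon$. Note that while setting $k = \hat{\eta} + \rho \sqrt{1/\delta^2 + \|u\|_2^2}$ would also satisfy the robustness condition, this expression would make (11) a nonconvex optimization problem.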
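The relationship between the robustness margin, the exact worst case, and its convex upper bound can be checked numerically. The sketch below assumes the norm splits a budget $\rho$ between the model error $\rho_X$ and the noise-bound error as $\rho_X^2 + \delta^2 (\eta - \hat{\eta})^2 \le \rho^2$; the values of `q_range_norm` and `u_norm` are hypothetical, while $\delta = 20$ and $\epsilon = 0.1$ are the values used in the case study.

```python
import math

delta, eps = 20.0, 0.1   # values used in the case study
q_range_norm = 2.0       # hypothetical value of ||q_max - q_min||_2
rho = delta * eps / (1 + delta * q_range_norm)


def worst_case_margin(share, u_norm):
    """rho_X * ||u|| + |eta - eta_hat| when a fraction `share` of rho goes to X."""
    rho_x = share * rho
    eta_gap = rho * math.sqrt(1.0 - share * share) / delta
    return rho_x * u_norm + eta_gap


u_norm = 0.3  # hypothetical ||u||_2
grid_max = max(worst_case_margin(i / 10000, u_norm) for i in range(10001))
closed_form = rho * math.sqrt(1 / delta**2 + u_norm**2)
convex_bound = rho * (1 / delta + u_norm)

print(grid_max, closed_form, convex_bound)
# With the largest possible step, the convex bound uses up exactly the buffer eps.
print(rho * (1 / delta + q_range_norm), eps)
```

The grid search matches the closed-form maximum, and the convex bound is never smaller than it, which is the trade made to keep the oracle's optimization problem convex.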
In the case where $\eta^\star$ is known, a similar proof shows that $k = \eta^\star + \rho \|u\|_2$ and $\rho = \frac{\epsilon}{\|\overline{q} - \underline{q}\|_2}$ satisfy feasibility and robustness. (This can be seen as the $\delta \to \infty$ limiting case of Theorem 3 such that consistent model chasing only updates $\hat{X}$ and keeps $\hat{\eta} = \eta^\star$ fixed.)
Proof: See [44].
Finally, combining Theorem 3 with Lemma 2 and applying Theorem 2 completes the proof of Theorem 1.

V. CASE STUDY
We demonstrate the effectiveness of Algorithm 1 using a case study based on a single-phase 56-bus network ($n = 55$) from the Southern California Edison (SCE) utility, with line parameters $r_{ij}, x_{ij}$ from [45, Table 1]. Even though our algorithm only has guarantees for the linear power flow model (2), we show that our algorithm works well on both the linear model and the more realistic nonlinear DistFlow model (1).

A. Experimental Setup
Following [20], we adapt real-world load and PV data from [50] for the 56-bus network by adding power injection (scaled by the PV generation) at buses $\mathcal{C} = \{2, 4, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 19, 20, 23, 25, 26, 32\}$. Exogenous active and reactive power injection measurements are taken at each bus at 6-second intervals over a 24-hour period. Figure 2 plots these values for several buses to illustrate the setting considered. We assume that controllers with reactive power injection capacity are available at every node. The network parameters used in our experiments are:
• nominal squared voltage magnitude at the substation ]kV$^2$
• reactive power injection limits $[\underline{q}, \overline{q}] = [-0.24, 0.24]$ MVar
• state and input cost matrices $P_v = 0.1I$, $P_u = 10I$
• initial state $v(1) = R^\star p(0) + X^\star q^e(0) + v^0 \mathbf{1}$, $q^c(0) = 0$
In comparison to previous papers in the voltage control literature, our reactive power injection limits $[\underline{q}, \overline{q}]$ are slightly more generous than the ±0.2 MVar used in, e.g., [20]. We choose ±0.24 MVar because even a controller with perfect knowledge of the future would need reactive power injection capabilities of at least ±0.238 MVar in order to maintain $v(t) \in [\underline{v}, \overline{v}]$ (if $\underline{q} = -\overline{q}$) under linear dynamics (2).
We set $\overline{\eta} = 10$, which upper-bounds the maximum change in exogenous noise observed in our data, which is ≈ 8.6: We fix $\epsilon = 0.1$. In order to satisfy the requirement in Assumption 3 that $v(t) \in [\underline{v} + (\overline{\eta} + \epsilon), \overline{v} - (\overline{\eta} + \epsilon)]$, the reactive power injection capabilities must exceed ±0.528 MVar. As we show in experiments with only a ±0.24 MVar range of control, though, Assumption 3 does not need to be fully satisfied in order for our method to still provide strong empirical results.
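Choosing the upper bound on the noise from data amounts to measuring the largest step-to-step change in the exogenous voltage term, in the infinity norm. A minimal sketch, using a short made-up series rather than the study's data (the function name and the numbers are hypothetical):

```python
def max_noise_change(v_par_series):
    """Largest infinity norm of w(t) = v_par(t) - v_par(t-1) over the series."""
    return max(
        max(abs(cur_i - prev_i) for cur_i, prev_i in zip(cur, prev))
        for prev, cur in zip(v_par_series, v_par_series[1:])
    )


# Made-up exogenous voltage terms for two buses at four time steps.
v_par_series = [[1.0, 2.0], [1.5, 1.0], [0.5, 3.0], [1.0, 2.5]]
eta_observed = max_noise_change(v_par_series)
eta_upper = 2.5  # any value at or above the observed maximum is a valid upper bound
print(eta_observed, eta_observed <= eta_upper)
```

A looser upper bound is always valid but enlarges the initial consistent set, so there is a trade-off between safety of the assumption and how much the algorithm has to learn.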
For the robust controller Π, we set slack variable weight $\beta = 100$ and $\mathcal{V}^{\mathrm{par}} = [\underline{v}^{\mathrm{par}}, \overline{v}^{\mathrm{par}}]$ to be a rectangle around the true noise. Under linearized system dynamics, $v^{\mathrm{par}}(t)$ is calculated as described in Section II, and then we set Under nonlinear system dynamics, we approximate $v^{\mathrm{par}}(t)$ as the nodal squared voltage magnitudes when $q^c(t) = 0$ 2), and we add 0.5 kV$^2$ padding, which empirically suffices as a convex outer approximation of $\mathcal{V}^{\mathrm{par}}$:
As mentioned previously, we use a greedy projection-based NCBC algorithm [42] in SEL that minimizes the movement distance $\|(\hat{X}_t, \hat{\eta}_t) - (\hat{X}_{t-1}, \hat{\eta}_{t-1})\|_{\triangle,\delta}$ between nested convex sets $P_t \subseteq P_{t-1}$: This achieves competitive ratio $\gamma_{\mathrm{proj}}(m) = \pi (m-1) m^{m/2}$.
To keep the optimization problem (8) computationally tractable for consistent model chasing, our implementation does not use the full trajectory $D$ as in the constraints of the consistent set (10). Instead, we include the 20 latest observations and 80 more observations sampled uniformly at random, $(v(t), v(t+1), u(t), q^c(t)) \sim D$. This provides a computationally tractable approximation of the uncertainty set. In our experiments on linear system dynamics, we found that $\hat{X}_t$ selected using this approximation was always in the consistent set defined by the full trajectory $D$, when allowing for small numerical inaccuracies introduced by the CVXPY optimization solver.
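The trajectory subsampling described above can be sketched as follows. The text does not say whether the 80 random observations are drawn with or without replacement, or whether they may overlap with the 20 latest ones; this sketch assumes sampling without replacement from the older observations only. The function name and the integer stand-ins for observation tuples are hypothetical.

```python
import random


def subsample_trajectory(trajectory, n_latest=20, n_random=80, rng=None):
    """Keep the latest observations plus a uniform random sample of the older ones."""
    rng = rng or random.Random()
    latest = trajectory[-n_latest:]
    older = trajectory[:-n_latest]
    sampled = rng.sample(older, min(n_random, len(older)))
    return latest + sampled


# Integers stand in for observation tuples (v(t), v(t+1), u(t), q_c(t)).
trajectory = list(range(500))
subset = subsample_trajectory(trajectory, rng=random.Random(0))
print(len(subset), subset[:20])
```

Each retained observation contributes constraints to the consistent set, so using a subset yields an outer approximation of the set defined by the full trajectory.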
Unless otherwise stated, we initialize $\hat{\eta}_1 = 0$. We initialize $\hat{X}_1$ by adding noise to the true $X^\star$ in two ways. First, we scale each line impedance $x_{ij}$ by a random factor $\sigma_{ij} \overset{\text{iid}}{\sim} \text{Uniform}[0, 2]$. Second, we randomly permute the bus ordering, so $\hat{X}_1$ corresponds to a permuted grid topology. Finally, we project $\hat{X}_1$ into the uncertainty set $X_\alpha$, with $\alpha = 1$.
Except for the experiments shown in Figure 5, we fix $\delta = 20$, which empirically strikes a balance between minimizing the modeling error $\|\hat{X}_t - X^\star\|_\triangle$ and overfitting to noise.
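The first two initialization steps can be sketched on a toy radial feeder. The sketch assumes the standard linearized (LinDistFlow-style) structure, in which entry $(i, j)$ of $X$ is the sum of line reactances on the path shared by buses $i$ and $j$ from the substation, up to whatever constant factor the convention of Section II fixes. The helper names and the 4-bus feeder are hypothetical, and the final projection into $X_\alpha$ is omitted.

```python
import random


def build_X(parent, x_line):
    """Build X for a radial network rooted at bus 0 (the substation).

    parent[k] is the parent bus of bus k; x_line[k] is the reactance of the
    line (parent[k], k). X[i][j] sums reactances on the shared path from the
    substation to buses i+1 and j+1.
    """
    n = len(parent) - 1  # number of non-substation buses

    def path(k):
        edges = set()
        while k != 0:
            edges.add(k)  # each line is identified by its child bus
            k = parent[k]
        return edges

    paths = [path(k) for k in range(1, n + 1)]
    return [[sum(x_line[e] for e in paths[i] & paths[j]) for j in range(n)]
            for i in range(n)]


def perturbed_initial_X(parent, x_line, rng):
    """Scale each line reactance by Uniform[0, 2], then permute bus labels."""
    scaled = {k: x * rng.uniform(0.0, 2.0) for k, x in x_line.items()}
    X = build_X(parent, scaled)
    perm = list(range(len(X)))
    rng.shuffle(perm)
    return [[X[perm[i]][perm[j]] for j in range(len(X))]
            for i in range(len(X))]


# Hypothetical 4-bus feeder: bus 0 -> bus 1 -> {bus 2, bus 3}.
parent = [None, 0, 1, 1]
x_line = {1: 0.1, 2: 0.2, 3: 0.3}
X_true = build_X(parent, x_line)
X_init = perturbed_initial_X(parent, x_line, random.Random(0))
```

Applying the same permutation to rows and columns keeps the matrix symmetric, so `X_init` still has the form of a radial-network matrix, only for a relabeled and rescaled grid.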

B. Experimental Results
Our experimental results demonstrate the ability of Algorithm 1 to stabilize the system without knowledge of the network topology, providing good voltage control performance even though it still has significant uncertainty about the topology at the end of the experiments. We test our algorithm under both the linearized system dynamics (5) and the more realistic nonlinear balanced AC power flow setting (1), simulated using Pandapower [51]. The convex optimization problems for SEL and $\Pi$ are solved with CVXPY [52] using the MOSEK solver [53]. Code for our simulations is available on GitHub.$^1$

a) Linearized power flow with full control: Our first set of experiments, shown in Figure 3 and Table I (top), tests our algorithm's performance on the SCE 56-bus network under linearized system dynamics (5). Different amounts of network information are provided to the consistent model chasing algorithm SEL via the initial consistent set $X_\alpha$: no information ("unknown," Figure 3a), information about the edges among the first 14 buses but not the line impedances ("topo-14," Figure 3b), information about both the edges and the line impedances among the first 14 buses ("lines-14," Figure 3c), and complete information about the network ("known," Figure 3d). Because the buses in the SCE 56-bus network are numbered in a topological ordering, the "topo-14" setting adds constraints of the form (7) for all of the first 14 buses, and the "lines-14" setting constrains all $X \in X_\alpha$ such that $X_{ij} = X^\star_{ij}$ for all $i, j \in \{1, \dots, 14\}$.
As shown in Figure 3e, incorporating more prior knowledge about the network into the initial uncertainty set reduces the model estimation error $\|\hat{X}_t - X^\star\|_\triangle$. Furthermore, the model estimation error decreases most sharply when the voltage violations are largest. However, we note that lower model estimation error does not always result in fewer mistakes in our experiments.
Table I quantifies our algorithm's performance under varying amounts of initial network information. A "mistake" refers to any time step where any bus's voltage violated the limits $[\underline{v}, \overline{v}]$. "Avg. violation" refers to the average absolute squared-voltage violation. "Max violation" is like "avg. violation" but replaces the mean with a max. Results given show the mean and standard deviation over 4 random initializations of $\hat{X}_1$.
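The three metrics can be computed as in the following sketch. It is an illustration under stated assumptions, not the authors' evaluation code: the violation of a voltage $v$ is taken to be $\max(0, v - \overline{v}, \underline{v} - v)$, and the average is taken over all time steps and buses. The function name and the sample data are hypothetical.

```python
def violation_metrics(voltages, v_lo, v_hi):
    """Return (mistakes, avg_violation, max_violation).

    voltages: list of time steps, each a list of per-bus squared voltages.
    A mistake is a time step where at least one bus is outside [v_lo, v_hi].
    Assumption: the average is over every (time step, bus) pair.
    """
    mistakes = 0
    violations = []
    for step in voltages:
        step_viols = [max(0.0, v - v_hi, v_lo - v) for v in step]
        violations.extend(step_viols)
        if any(x > 0.0 for x in step_viols):
            mistakes += 1
    avg_violation = sum(violations) / len(violations)
    return mistakes, avg_violation, max(violations)


# Placeholder data: 3 time steps, 2 buses, limits [1.0, 2.0].
sample = [[1.0, 2.0], [0.5, 2.5], [1.5, 1.5]]
mistakes, avg_v, max_v = violation_metrics(sample, 1.0, 2.0)
print(mistakes, avg_v, max_v)
```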
b) Nonlinear power flow with full control: Our second set of experiments tests our online controller on the standard balanced AC power flow model (1). As in the linearized power flow experiments, we compare Algorithm 1's performance across varying levels of prior information (Figure 4 and Table I, bottom). Even though the controller is designed under the assumption of linearized voltage dynamics, our algorithm still performs well in the nonlinear simulation. Performance improves progressively as the algorithm is given more information, with fewer voltage violations and smaller overall deviation from the desired steady-state voltage.
c) Nonlinear power flow with partial observation and partial control: We also test our proposed online controller in the partial observation and partial control setting. In Figure 6, we withhold voltage observations and control authority from buses $i \in \{8, 18, 21, 30, 39, 45, 54\}$ by setting $q^c_i(t) = 0$ for all $t$. We simulate the voltage profiles across 4 random initializations of $\hat{X}_1$ and plot the mean and ±1 standard deviation. Despite the more challenging setting, the performance of Algorithm 1 remains strong. We again observe in Figure 6 that adding prior topology and line parameter information marginally improves the performance of Algorithm 1.
d) Varying $\delta$: In Figure 5, we demonstrate the effect of varying $\delta$ on the performance of our algorithm. From a theoretical perspective, Theorem 1 shows that our algorithm achieves a finite mistake bound for every $\delta > 0$, and this bound is minimized by taking $\delta$ to be very large. With a large $\delta$, however, the model chasing algorithm may overfit to noise until the noise becomes too large, forcing the algorithm to increase the noise bound (e.g., around the 16h mark in Figure 5). This leads to inconsistent performance in the short term, albeit with perhaps better worst-case performance. In contrast, a smaller $\delta$ allows more of the network uncertainty to be captured in a larger noise term $\hat{\eta}$ at the cost of learning a less accurate $\hat{X}$, but the decrease in modeling error $\|\hat{X}_t - X^\star\|_\triangle$ becomes monotonic.
In practice, $\delta$ should be treated as a prior "confidence" about how close the initial guess of $\eta$ is to $\eta^\star$: the greater the confidence that the initial guess is close to the true $\eta^\star$, the larger $\delta$ should be.

e) Detecting topology changes: Finally, we consider the challenge of responding to a change in the distribution grid topology in real time. If the topology changes from one radial grid to another due to switches, new observed data may render the consistent set empty. That is, when consistent model chasing (14) becomes infeasible, we are assured that the topology has changed. At this point, we may reset the algorithm by discarding the observed trajectory $D_t$ and reinitializing consistent parameter estimates from the original consistent set $P_1$. Figure 7 demonstrates this on linear system dynamics, where we introduce a topology change at the 12h mark. We replace lines 33 → 40 and 46 → 48 with new lines 1 → 40 and 10 → 48, which maintains a radial distribution grid.
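The reset logic can be illustrated on a deliberately simplified toy problem. The sketch below is not the SEL algorithm: it assumes a hypothetical scalar model $y = xu + w$ with $|w| \le \eta$, so that the consistent set for the unknown $x$ is an interval. An empty intersection signals that the underlying parameter has changed, at which point the past data are discarded and the estimate restarts from the prior interval.

```python
def update_interval(interval, u, y, eta):
    """Intersect interval with {x : |y - x*u| <= eta}; return None if empty."""
    lo, hi = interval
    if u != 0:
        a, b = (y - eta) / u, (y + eta) / u
        lo, hi = max(lo, min(a, b)), min(hi, max(a, b))
    elif abs(y) > eta:
        return None
    return (lo, hi) if lo <= hi else None


def chase_with_reset(prior, observations, eta):
    """Shrink the consistent interval online; reset to prior when it empties."""
    current, resets = prior, []
    for t, (u, y) in enumerate(observations):
        nxt = update_interval(current, u, y, eta)
        if nxt is None:
            # Infeasible: the parameter must have changed. Discard old data,
            # restart from the prior, and apply only the newest observation.
            resets.append(t)
            nxt = update_interval(prior, u, y, eta) or prior
        current = nxt
    return current, resets


# Toy data: the true parameter is 1.0 for three steps, then jumps to 3.0.
obs = [(1.0, 1.0), (2.0, 2.05), (1.0, 0.95), (1.0, 3.0), (2.0, 6.0)]
final_interval, reset_times = chase_with_reset((0.0, 5.0), obs, eta=0.1)
print(final_interval, reset_times)
```

As in the grid setting, infeasibility is a sufficient but not necessary signal: a change that happens to remain consistent with all past observations would go undetected until later data contradict it.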

VI. CONCLUSION
This paper provides the first controller that establishes a finite-mistake guarantee for voltage control in a setting with uncertainty in both the grid topology and load and generation variations. We showed that our proposed algorithm is able to learn a model of the grid dynamics in an online fashion and provably (under linearized voltage dynamics) converge to a stable controller. Further, simulated experiments on a 56-bus distribution grid demonstrate the effectiveness of our algorithm even under more realistic nonlinear dynamics. We demonstrated how to incorporate prior knowledge about the network topology and line parameters to improve performance, while also extending our algorithm to the partial observability and partial controllability setting, which may better reflect real-world scenarios.
As the current algorithm is centralized, future work may consider decentralized approaches to topology-robust voltage control in order to enable faster real-time control with ideas from [54]. Another direction is to extend the current algorithm to the time-varying topology setting with techniques from works such as [55]. Further studies may also explore loosening the radial topology assumption and testing our algorithm on unbalanced 3-phase AC grids to accommodate a wider range of distribution grids. This would be a challenging, but important, extension. Finally, an interesting algorithmic extension is to consider computationally efficient convex body chasing algorithms with better competitive ratios. Existing methods based on the Steiner point [42], [49] achieve a nearly optimal competitive ratio but are computationally inefficient in high-dimensional settings such as voltage control, so designing efficient approximate Steiner point algorithms could potentially lead to significant performance improvements.

$\|\cdot\|^2_{P_u}$, where $P_u$ is a positive semidefinite matrix.

Fig. 3. (a)-(d) Voltage profiles of 7 different buses simulated under linear system dynamics (2). Dotted black lines indicate voltage limits $[\underline{v}, \overline{v}]$. (a) $\Pi$+SEL initialized with random $X \in X_\alpha$. (b) Like (a), but the topology for buses 1-14 is known. (c) Like (a), but the topology and line parameters for buses 1-14 are known. (d) Like (a), but $X = X^\star$ is fixed and known, so only $\eta$ is learned. (e) Convergence of $\hat{X}_t$ towards the true $X^\star$ (solid lines, left axis) and estimated $\hat{\eta}$ (dotted lines, right axis). Notice that even when $\|\hat{X}_t - X^\star\|_\triangle$ does not reach 0, the controller still performs quite well.

Fig. 6. Balanced nonlinear AC power flow simulation of the voltage profiles under different algorithms with partial control and observation. The dark colors plot the mean voltages across 4 random initializations of $\hat{X}_1$ and the light shading plots ±1 standard deviation. (a) Bus 18. (b) Bus 30.

Fig. 7. Demonstration of the detection of a topology change under linear system dynamics. Convergence of $\hat{X}_t$ towards $X^\star$ is plotted in solid lines (left axis), where $X^\star$ changes at the 12h mark. The topology change triggers a reset of the consistent model chasing algorithm. Estimated $\hat{\eta}$ is plotted in dotted lines (right axis).
and action cost matrices: $P_v, P_u \in \mathbb{S}^n$
Initialize an empty trajectory $D_0 = [\,]$. Set $t = 1$.
2) Query the model chasing algorithm for a new consistent parameter estimate: $(\hat{X}_t, \hat{\eta}_t) \leftarrow \text{SEL}[D_t]$.

TABLE I. PERFORMANCE OF OUR METHOD SIMULATED UNDER LINEAR SYSTEM DYNAMICS (TOP) AND NONLINEAR SYSTEM DYNAMICS (BOTTOM). SEE SECTION V-B.