A Meta-Learning Approach to the Optimal Power Flow Problem Under Topology Reconfigurations

Recently there has been a surge of interest in adopting deep neural networks (DNNs) for solving the optimal power flow (OPF) problem in power systems. Computing optimal generation dispatch decisions using a trained DNN takes significantly less time when compared to conventional optimization solvers. However, a major drawback of existing work is that the machine learning models are trained for a specific system topology. Hence, the DNN predictions are only useful as long as the system topology remains unchanged. Changes to the system topology (initiated by the system operator) would require retraining the DNN, which incurs significant training overhead and requires an extensive amount of training data (corresponding to the new system topology). To overcome this drawback, we propose a DNN-based OPF predictor that is trained using a meta-learning (MTL) approach. The key idea behind this approach is to find a common initialization vector that enables fast training for any system topology. The developed OPF-predictor is validated through simulations using benchmark IEEE bus systems. The results show that the MTL approach achieves significant training speed-ups and requires only a few gradient steps with a few data samples to achieve high OPF prediction accuracy and outperforms other pretraining techniques.


I. INTRODUCTION
The optimal power flow (OPF) problem involves computing the minimum-cost generation dispatch subject to the power flow equations and the grid's operational constraints (e.g., voltage and power flow limits). Power grid operators must solve the OPF problem several times a day to ensure economical operation. Under the generalized alternating current (AC) power flow model, the OPF problem is non-convex, and solving it with conventional optimization solvers can be computationally expensive. The growing integration of renewable energy and the uncertainty in power demand necessitate solving the OPF problem repeatedly at a much faster time scale (on the order of seconds) to respond to changing system states, which poses significant computational challenges [1].
To overcome this challenge, there has been a significant interest in adopting machine learning (ML) techniques to speed up the computation of the OPF problem. The ML models can be trained offline, and the trained model can be used online to support the computation of the optimal generation dispatch. The main advantage of this approach is that online computations are cheap, and hence, they can speed up OPF computation significantly. ML has been applied in a number of different ways to support OPF computation.
The most straightforward approach is to use ML models (e.g., DNNs) to directly learn the mapping from the load inputs to the OPF outputs. The real-time load demands are fed as inputs to the trained ML model, which outputs the corresponding OPF solution. This approach was used to solve the direct-current optimal power flow (DC-OPF) problem in [2], in which the inputs to the DNN are the active power demands at the load buses and the outputs are the active power generations. This approach was shown to provide up to a 100-fold speed-up compared to conventional optimization solvers. A similar approach was used to solve the AC-OPF problem in [3], in which the inputs to the DNN are the active/reactive power demands at the load buses and the outputs are the active power generations and voltage magnitudes at the generator buses. This framework was shown to achieve a 20-fold speed-up compared to conventional OPF solvers. A similar approach has been used for other applications such as scheduling under outages [4]: during the training stage, the outage schedules are used as inputs to the DNN, and the corresponding OPF costs are obtained as the DNN outputs. The resulting model can effectively assess the impact of a given outage schedule on the OPF solution. Furthermore, ML methods have been used to provide decentralized decision support for distributed energy resources (DERs). For example, [5] designs a local controller by training an ML model on historical generation and consumption data; the resulting model is used to schedule generation that minimizes the cost of DER control and network losses. In [6], ML methods are used to predict the optimal inverter actions (DER control policy) based on local measurements. Beyond this direct approach, ML can also be used indirectly to speed up conventional optimization solvers.
For example, ML can be used to learn the set of active constraints at optimality; this approach was used to solve the DC-OPF problem in [7], [8], [9]. Alternatively, ML can also be used to compute the so-called warm start points for optimization solvers, an approach that is especially useful to solve the nonlinear AC OPF problem [10], [11]. Compared to these indirect approaches [7]- [11], the direct approach can achieve greater computational speed-up.
Other machine learning techniques have also been adopted for the OPF problem. For instance, [12] proposes a stacked extreme learning machine to speed up the parameter tuning process and reduce the learning complexity. Reference [13] builds a random forest model to compute a near-optimal OPF solution and to perform post-contingency analysis. Further, [14] compares the performance of OPF solvers developed using different ML methods (random forest, multitarget decision tree, and extreme learning machine). The results show that ML methods can significantly reduce the OPF computation time with minimal constraint violations and optimality loss.
Recent works have also provided feasibility guarantees, i.e., theoretical results showing that the solutions proposed by the ML models satisfy the power grid's operational constraints (e.g., line/voltage limits). In particular, a preventive framework to ensure feasibility for the DC OPF problem was proposed in [15], which calibrates the transmission line capacity limits and the slack bus generation limits to compensate for the inherent approximation errors of DNNs. Similar ideas were extended to the AC OPF problem in [3]. Worst-case guarantees on the physical constraint violations of the DNN's OPF solution were derived in [16], [17], and the results were used to reduce the worst-case error. Reference [18] combined DNNs with robust optimization techniques to directly obtain feasible solutions for the security-constrained OPF problem.
Despite the growing research literature on this topic, a major drawback of existing work [2]-[18] is that it is designed for a specific system configuration. As such, these methods remain effective only as long as the system topology remains fixed. However, topology reconfigurations through transmission switching and impedance changes are essential parts of grid operations that can improve the grid's performance from both the operational efficiency and the reliability points of view [19], [20], [21]. These measures have gained increasing attention recently. For instance, perturbation of transmission line reactances (using distributed flexible alternating-current transmission system (D-FACTS) devices [22]) is finding increasing application in power flow control to minimize transmission power losses [21] and in cyber defense [23], [24], [25]. Similarly, grid operators also perform transmission switching and topology control to ensure economic and reliable system operations [19], [20].
Active topology control poses significant challenges to the use of DNNs for OPF prediction. A DNN trained under a specific system configuration might not provide correct OPF outputs under a different configuration, because the mapping between the load inputs and the OPF outputs changes with the system topology. Indeed, our results show that DNNs trained on a specific topology generalize poorly when the system topology changes. Complete retraining with the new system configuration would require significant amounts of training data and time, thus negating the computational speed-up achieved by DNN prediction.
To address these shortcomings, we propose a novel approach in which we train the DNN-based OPF predictor using a meta-learning (MTL) approach. The main idea behind MTL is to find a good initialization point that enables fast retraining for different system configurations. Specifically, we use the so-called model-agnostic MTL approach [26], which finds the initialization point such that a few gradient steps with a few training samples from any system configuration lead to good prediction performance. This is accomplished by appropriately designing the loss function of the offline training phase (which finds the initialization point) such that the ML model (a DNN in our case) learns internal features that are broadly applicable to the different tasks at hand (i.e., OPF prediction for different variants of the power grid topology) rather than to a specific task [26]. Then, during the online training phase, these features can be fine-tuned to achieve good OPF prediction performance using a few data samples from the new topology. To the best of our knowledge, this work is the first to utilize MTL in a power grid context.
We conduct extensive simulations using benchmark IEEE bus systems and compare the performance of MTL against several other approaches: (i) "Learn from scratch," in which there is no pretraining; i.e., when the system is reconfigured, we initialize the DNN weights randomly and train them using the OPF data from the new system configuration. (ii) "Learn from a joint training model," in which, during the offline phase, we train a DNN model on a combined dataset consisting of OPF data from several different topology configurations; then, during the online phase, we initialize the DNN with this model and fine-tune it using OPF data from the new system configuration. (iii) "Learn from the closest model," in which, during the offline phase, we train several DNN models separately using OPF datasets from different topology configurations (i.e., one DNN per system configuration); then, during the online phase, when the topology is reconfigured, we choose the model that achieves the best prediction performance on the new configuration and use its weights to initialize the DNN. The weights are then fine-tuned using OPF data from the new configuration.
We verify the efficacy of the proposed approach through simulations on IEEE bus systems. We generate the OPF data using the MATPOWER simulator and implement the ML models using PyTorch. The results show that the proposed MTL approach achieves significant training speed-ups and high accuracy in predicting the OPF outputs. For instance, for the IEEE-118 bus system, MTL achieves greater than 99% OPF generation prediction accuracy for a new system configuration with fewer than 10 gradient updates and 50 training samples. Furthermore, MTL achieves much higher prediction accuracy than complete retraining (i.e., training from scratch), especially in the limited-data regime (i.e., when the number of training samples from a new system configuration is limited). MTL also outperforms the other two pretraining methods in terms of OPF prediction accuracy and requires significantly less time and storage in the pretraining phase. Thus, the method is well suited to predicting the OPF solution under planned topology reconfigurations.
We summarize our main contributions as follows:
• To address the shortcoming of existing works that train DNNs under a fixed topology and require complete retraining following topology reconfiguration, we propose an MTL approach for computing the OPF solution. Specifically, the MTL approach finds a good initialization point during offline training that enables fast retraining for different system configurations.
• We compare the performance of the MTL approach against several other pretraining methods designed to compute the OPF solution following topology reconfigurations. To this end, we perform OPF computation on several benchmark IEEE bus systems.
• Using simulation results, we quantify the performance gain of the MTL approach over other pretraining methods in terms of OPF prediction accuracy, feasibility, and computational speed. Our results show that MTL outperforms the other pretraining methods on all these metrics, making it suitable for computing the OPF under real-world settings that include topology reconfigurations.
The rest of this paper is organized as follows. Section II introduces the power grid model, the OPF problem, and the DNN approach. Section III details the proposed MTL method. Section IV presents the simulation setting. Section V analyzes the simulation results and demonstrates the effectiveness of MTL over other pretraining methods. Section VI concludes the paper. Additional simulation results are included in the Appendix.

A. Power Grid Model
We consider a power grid with the set of buses $\mathcal{N} = \{0, 1, \ldots, N-1\}$, where $N \ge 2$ is the total number of buses. Without loss of generality, we assume bus 0 to be the slack bus, whose voltage is set to $1.0\angle 0$ pu. A subset of the buses $\mathcal{G} \subseteq \mathcal{N}$ is equipped with generators. Since this paper focuses on grid topology reconfigurations, we consider $M$ different grid topologies, which differ in their bus-branch connectivity and transmission line impedances. We assume that the grid remains connected under all the considered topologies. We let $\mathcal{L}^{(m)} = \{1, \ldots, L^{(m)}\}$ denote the set of transmission lines under topology $m \in \{1, 2, \ldots, M\}$. Further, we let $Y^{(m)} = G^{(m)} + jB^{(m)}$ denote the bus admittance matrix under topology $m$, where $G^{(m)}$ and $B^{(m)}$ denote the conductance and susceptance matrices, respectively [27].
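To make the dependence of $Y^{(m)}$ on the topology concrete, the following minimal Python sketch builds a bus admittance matrix from a line list. It assumes a series-only branch model (no line charging, transformer taps, or shunts, unlike MATPOWER's full branch model), and the 3-bus line data are hypothetical. Switching a line out or perturbing its reactance only changes the corresponding entries of $G^{(m)}$ and $B^{(m)}$.

```python
def build_ybus(n_bus, lines):
    """Bus admittance matrix Y = G + jB for a series-only branch model.

    `lines` holds (from_bus, to_bus, r, x) tuples in per unit. Line charging,
    transformer taps, and shunt elements are ignored (simplifying assumption).
    """
    Y = [[0j] * n_bus for _ in range(n_bus)]
    for i, j, r, x in lines:
        y = 1.0 / complex(r, x)  # series admittance of the branch
        Y[i][i] += y
        Y[j][j] += y
        Y[i][j] -= y
        Y[j][i] -= y
    return Y


def split_ybus(Y):
    """Return the conductance G = Re(Y) and susceptance B = Im(Y) matrices."""
    G = [[entry.real for entry in row] for row in Y]
    B = [[entry.imag for entry in row] for row in Y]
    return G, B


# Hypothetical 3-bus example: topology 2 switches out line (1, 2)
# and perturbs the reactance of line (0, 1).
topology_1 = [(0, 1, 0.01, 0.10), (1, 2, 0.02, 0.20), (0, 2, 0.01, 0.15)]
topology_2 = [(0, 1, 0.01, 0.12), (0, 2, 0.01, 0.15)]
Y1 = build_ybus(3, topology_1)
Y2 = build_ybus(3, topology_2)
G2, B2 = split_ybus(Y2)
```

In this simplified model every row of $Y$ sums to zero, and removing line (1, 2) zeroes the off-diagonal entries coupling buses 1 and 2.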
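Under topology $m$, let $P_{Gi}^{(m)}$ and $Q_{Gi}^{(m)}$ ($P_{Di}^{(m)}$ and $Q_{Di}^{(m)}$) denote the active and reactive power generations (demands) at node $i \in \mathcal{N}$, respectively. The complex voltage at node $i \in \mathcal{N}$ under topology $m$ is denoted by $V_i^{(m)} \angle \theta_i^{(m)}$.

Optimal Power Flow Problem: The OPF problem computes the minimum-cost generation dispatch for a given load condition subject to the power flow equations and the power generation/voltage constraints. Mathematically, the OPF problem under topology $m$ can be stated as follows:

$$\min \; \sum_{i \in \mathcal{G}} C_i\big(P_{Gi}^{(m)}\big) \tag{1}$$
$$\text{s.t.} \quad P_{Gi}^{(m)} - P_{Di}^{(m)} = V_i^{(m)} \sum_{j \in \mathcal{N}} V_j^{(m)} \big( G_{ij}^{(m)} \cos\theta_{ij}^{(m)} + B_{ij}^{(m)} \sin\theta_{ij}^{(m)} \big), \quad i \in \mathcal{N}, \tag{2}$$
$$Q_{Gi}^{(m)} - Q_{Di}^{(m)} = V_i^{(m)} \sum_{j \in \mathcal{N}} V_j^{(m)} \big( G_{ij}^{(m)} \sin\theta_{ij}^{(m)} - B_{ij}^{(m)} \cos\theta_{ij}^{(m)} \big), \quad i \in \mathcal{N}, \tag{3}$$
$$P_{Gi}^{\min} \le P_{Gi}^{(m)} \le P_{Gi}^{\max}, \quad i \in \mathcal{G}, \tag{4}$$
$$Q_{Gi}^{\min} \le Q_{Gi}^{(m)} \le Q_{Gi}^{\max}, \quad i \in \mathcal{G}, \tag{5}$$
$$V_i^{\min} \le V_i^{(m)} \le V_i^{\max}, \quad i \in \mathcal{N}, \tag{6}$$

where $\theta_{ij}^{(m)} = \theta_i^{(m)} - \theta_j^{(m)}$, $P_{Gi}^{(m)} = Q_{Gi}^{(m)} = 0$ for $i \notin \mathcal{G}$, and $C_i(\cdot)$ is the generation cost at bus $i \in \mathcal{G}$. Further, $P_{Gi}^{\max}$, $Q_{Gi}^{\max}$, $V_i^{\max}$ ($P_{Gi}^{\min}$, $Q_{Gi}^{\min}$, $V_i^{\min}$) denote the maximum (minimum) real/reactive power generations and nodal voltage limits at node $i$, respectively.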
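The AC power flow equations and the generation cost in the OPF objective can be evaluated directly. The sketch below is illustrative only: it computes the net injections in polar form and a MATPOWER-style quadratic cost $c_2 P^2 + c_1 P + c_0$ (an assumed cost form, consistent with the quadratic costs used in Section IV). The lossless 2-bus data are hypothetical.

```python
import math


def power_injections(G, B, V, theta):
    """Net active/reactive injections P_i, Q_i from the polar AC power flow equations."""
    n = len(V)
    P, Q = [0.0] * n, [0.0] * n
    for i in range(n):
        for j in range(n):
            t = theta[i] - theta[j]  # angle difference theta_ij
            P[i] += V[i] * V[j] * (G[i][j] * math.cos(t) + B[i][j] * math.sin(t))
            Q[i] += V[i] * V[j] * (G[i][j] * math.sin(t) - B[i][j] * math.cos(t))
    return P, Q


def generation_cost(p_gen, coeffs):
    """Total cost sum_i C_i(P_Gi) with quadratic C_i(P) = c2*P**2 + c1*P + c0."""
    return sum(c2 * p ** 2 + c1 * p + c0 for p, (c2, c1, c0) in zip(p_gen, coeffs))


# Hypothetical lossless 2-bus system: one line with reactance 0.1 pu (y = -j10).
G = [[0.0, 0.0], [0.0, 0.0]]
B = [[-10.0, 10.0], [10.0, -10.0]]
P, Q = power_injections(G, B, V=[1.0, 1.0], theta=[0.0, -0.1])
# A candidate dispatch satisfies the balance equations if P_Gi - P_Di == P[i]
# and Q_Gi - Q_Di == Q[i] at every bus.
```

Because the line is lossless, the active injections at the two buses cancel, while both buses absorb the same reactive power $10(1 - \cos 0.1)$.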

B. DNN Approach for the OPF problem
We now summarize the approaches proposed by existing works that use DNNs for the OPF problem [2], [3]. Fig. 1 illustrates the overall methodology. The goal of the DNN is to approximate the nonlinear mapping between the system load and the OPF solution. Let $h(x_k^{(m)}, w)$ denote a parametric function (in our case, a DNN) that, under topology $m$, takes the system load as input and produces the OPF outputs, where $w$ denotes the parameters of the DNN.
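Here, $x_k^{(m)}$ denotes the $k$-th load sample under topology $m$, consisting of the active and reactive power demands at the load buses. The corresponding OPF output $y_k^{(m)} = [p_{G,k}^{(m)}, v_{G,k}^{(m)}]$ consists of a vector $p_{G,k}^{(m)}$ of active power generations at all generator buses except the slack bus (note that the generation at the slack bus can be determined by solving the AC power flow problem with the other generations specified) and a vector $v_{G,k}^{(m)}$ of voltage magnitudes at the generator buses. Let $\mathcal{T}_m = \{(x_k^{(m)}, y_k^{(m)})\}_{k=1}^{K^{(m)}}$ denote the training dataset of topology $m$. The parameters of the DNN under topology $m$ are trained to minimize the objective function

$$J_{\mathcal{T}_m}(w) = \frac{1}{K^{(m)}} \sum_{k=1}^{K^{(m)}} \big\| h(x_k^{(m)}, w) - y_k^{(m)} \big\|_2^2. \tag{7}$$

This objective function is the mean squared error between the DNN's prediction $h(x_k^{(m)}, w)$ and the corresponding true value $y_k^{(m)}$ generated by a conventional OPF solver. Following offline training, the DNN is deployed online to predict the generation outputs for given load inputs. We note that once $p_{G,k}^{(m)}$ and $v_{G,k}^{(m)}$ are predicted by a trained DNN, the other system variables (such as the nodal voltages and power injections at the non-generator buses) can be recovered by solving the AC power flow problem, as shown in Fig. 1. Solving the AC power flow problem is much faster than solving the AC OPF problem, and hence adds only a small computational overhead to the DNN approach [3].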
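A minimal sketch of how a training pair and the loss in (7) can be assembled is given below. It assumes that the target stacks the non-slack active generations followed by the generator voltage magnitudes, and that (7) sums the squared error over the output entries and averages over the $K$ samples. The predictor passed in is a hypothetical stand-in for the DNN $h(\cdot, w)$.

```python
def make_target(p_gen_nonslack, v_gen):
    """Stack y = [p_G, v_G]: |G|-1 active generations, then |G| voltage magnitudes."""
    if len(p_gen_nonslack) != len(v_gen) - 1:
        raise ValueError("expected |G|-1 generations and |G| voltage magnitudes")
    return list(p_gen_nonslack) + list(v_gen)


def mse_loss(predict, dataset):
    """Mean over samples of ||predict(x_k) - y_k||^2, as in the loss (7)."""
    total = 0.0
    for x, y in dataset:
        y_hat = predict(x)
        total += sum((a - b) ** 2 for a, b in zip(y_hat, y))
    return total / len(dataset)


# Hypothetical system with |G| = 3 generators, so the output has 2|G| - 1 = 5 entries.
y = make_target([0.8, 0.5], [1.02, 1.01, 0.99])
dataset = [([0.3, 0.1], y)]  # one (load input, OPF output) pair


def perfect(x):
    """A predictor that reproduces the target exactly (for illustration)."""
    return y
```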
Drawbacks of Existing Work: The main drawback of existing works is that the DNN predictions remain effective only as long as the topology of the system remains fixed. As noted before, topology reconfigurations are increasingly being adopted in power grids to ensure economic and reliable operation [19], [20], [21]. While it is certainly possible to retrain the model when the system topology changes, retraining from scratch requires significant amounts of training data and time. Alternatively, the system operator could train a separate DNN for each system configuration, but this would require a significant amount of computational resources. Moreover, the operator would have to know all possible topology reconfigurations beforehand, which is generally not possible, since unforeseen contingencies may arise during power system operations.

III. A META LEARNING APPROACH FOR THE OPF PROBLEM UNDER TOPOLOGY RECONFIGURATION
To overcome these challenges, we seek to build an ML model for the OPF problem that can be rapidly adapted to a new system configuration. MTL is ideally suited to this problem [26]. MTL is a training methodology for learning a series of related tasks; when presented with a new, related task, MTL can quickly learn it from a small number of training samples. The MTL algorithm consists of two phases: an offline training phase (also called the meta-training phase) and an online training phase (adaptation to the new task). During the offline training phase, MTL finds a good set of initialization parameters for the series of related tasks. During the online phase, MTL uses these initialization parameters to quickly adapt the model parameters to a new task using a few gradient updates with a few training samples.

A. MTL Description
We now present the details of the proposed MTL approach. As noted in Section II, we consider $M$ different grid topologies. Assume that, during the offline training phase, the system operator has access to OPF training data samples from $M^* < M$ topologies. We denote the offline training dataset by $\mathcal{T}_{\text{offline training phase}} = \{\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_{M^*}\}$. During the offline training phase, MTL uses $\mathcal{T}_{\text{offline training phase}}$ to find a set of parameters $w_{\text{MTL}}$ that minimizes the loss function

$$J_{\text{MTL}}(w) = \sum_{m=1}^{M^*} J_{\mathcal{T}_m}\big(w - \alpha \nabla J_{\mathcal{T}_m}(w)\big), \tag{8}$$

where $J_{\mathcal{T}_m}$ is defined in (7). The objective function $J_{\text{MTL}}$ is the sum of the MSE losses for all the topologies in $\mathcal{T}_{\text{offline training phase}}$, each evaluated after a single gradient-descent step. The MTL parameters are given by $w_{\text{MTL}} = \arg\min_w J_{\text{MTL}}(w)$. As evident from (8), $w_{\text{MTL}}$ is chosen for the loss it achieves after one task-specific gradient step rather than for its loss before adaptation. The offline training procedure is summarized in Algorithm 1, in which $w_{\text{MTL}}$ are the meta-weights (i.e., the initialization weights) for the related tasks, and $w_m$ are the task-specific weights for training topology $m$ (obtained from a single gradient update on $w_{\text{MTL}}$). The notation $\nabla J_{\mathcal{T}_m}(w)$ denotes the gradient of the loss function (defined in (7) and computed using the dataset $\mathcal{T}_m$) with respect to the weights $w$. Finally, $\alpha$ and $\beta$ denote the step sizes for the gradient updates. During the online training process, assume that the system operator changes the power system topology to a new configuration that does not belong to the dataset of the offline training phase. Let $\mathcal{T}^{(\text{new})} \notin \mathcal{T}_{\text{offline training phase}}$ denote the training dataset from the new system configuration. Note that $\mathcal{T}^{(\text{new})}$ may consist of only a few data points $K^{(\text{new})}$ compared to the offline training data. The objective function of the online training is given by

$$J_{\mathcal{T}^{(\text{new})}}(w) = \frac{1}{K^{(\text{new})}} \sum_{k=1}^{K^{(\text{new})}} \big\| h(x_k^{(\text{new})}, w) - y_k^{(\text{new})} \big\|_2^2.$$

MTL finds the task-specific parameters for this new topology by performing gradient updates starting from the initialization point $w_{\text{MTL}}$ obtained in the offline training phase.
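The offline phase can be sketched as follows. This is a minimal, self-contained illustration rather than the authors' implementation. It replaces the DNN with a hypothetical linear model $h(x, w) = w^\top x$ with a scalar output, so that the meta-gradient of (8), $\sum_m (I - \alpha H_m)\nabla J_{\mathcal{T}_m}(w_m)$ with $H_m$ the Hessian of $J_{\mathcal{T}_m}$, can be written in closed form. As in (8), the same dataset $\mathcal{T}_m$ is used for the inner and the outer step. Taking $\alpha$ as the inner and $\beta$ as the outer step size is an assumption, and the five synthetic tasks are hypothetical. With a DNN, the same gradient would typically be obtained by automatic differentiation (e.g., in PyTorch).

```python
def predict(w, x):
    """Hypothetical linear stand-in for the DNN: h(x, w) = w . x (scalar output)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def loss(w, data):
    """MSE loss J_T(w) over a task dataset of (x, y) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def grad(w, data):
    """Gradient of the MSE loss: (2/K) * sum_k (w . x_k - y_k) x_k."""
    g = [0.0] * len(w)
    for x, y in data:
        r = predict(w, x) - y
        for i, xi in enumerate(x):
            g[i] += 2.0 * r * xi / len(data)
    return g


def hessian(data, dim):
    """Hessian of the MSE loss for a linear model: (2/K) * sum_k x_k x_k^T."""
    H = [[0.0] * dim for _ in range(dim)]
    for x, _ in data:
        for i in range(dim):
            for j in range(dim):
                H[i][j] += 2.0 * x[i] * x[j] / len(data)
    return H


def adapt(w, data, alpha):
    """Inner step: w_m = w - alpha * grad J_Tm(w)."""
    return [wi - alpha * gi for wi, gi in zip(w, grad(w, data))]


def meta_loss(w, tasks, alpha):
    """J_MTL(w): sum over tasks of the loss after one inner step, as in (8)."""
    return sum(loss(adapt(w, d, alpha), d) for d in tasks)


def meta_grad(w, tasks, alpha):
    """Exact gradient of J_MTL: sum_m (I - alpha * H_m) grad J_Tm(w_m)."""
    dim = len(w)
    total = [0.0] * dim
    for d in tasks:
        g = grad(adapt(w, d, alpha), d)
        H = hessian(d, dim)
        for i in range(dim):
            total[i] += g[i] - alpha * sum(H[i][j] * g[j] for j in range(dim))
    return total


def meta_train(tasks, dim, alpha=0.1, beta=0.05, steps=200):
    """Outer loop: w_MTL <- w_MTL - beta * grad J_MTL(w_MTL), starting from zeros."""
    w = [0.0] * dim
    for _ in range(steps):
        w = [wi - beta * gi for wi, gi in zip(w, meta_grad(w, tasks, alpha))]
    return w


def make_task(a, b):
    """Hypothetical task: y = a + b*t sampled at five points, with features x = [1, t]."""
    return [([1.0, t], a + b * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]


tasks = [make_task(1.0 + 0.1 * m, -0.5 + 0.05 * m) for m in range(5)]
w_mtl = meta_train(tasks, dim=2)
```

Because these synthetic tasks share the same inputs, the meta-learned initialization converges to the average of the task parameters, here $[1.2, -0.4]$. With heterogeneous topologies and a nonlinear DNN, no such closed form exists.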
The overall procedure for OPF using the MTL approach is presented in Algorithm 2.

B. Implementation
A schematic diagram of the proposed MTL implementation is shown in Fig. 2. In the offline phase, the system operator uses a power grid simulator to generate the training dataset $\mathcal{T}_{\text{offline training phase}}$. The data are subsequently used to train a DNN as in Algorithm 1. During real-time operation, assume that the system operator plans a topology reconfiguration. In the online training phase, the system operator feeds the new system configuration to a power grid simulator and generates a few new data samples, which are then used to quickly fine-tune the ML model as in Algorithm 2. Following retraining, the new model can be used to predict the generator outputs. The online training procedure must be repeated whenever the system topology changes.
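The online phase amounts to a few gradient steps on the small new-topology dataset, starting from $w_{\text{MTL}}$ rather than from an uninformed initialization. The minimal sketch below compares the two starting points, using a hypothetical linear model $h(x, w) = w^\top x$ in place of the DNN. The initialization values, the three-sample dataset, the learning rate, and the number of steps are illustrative assumptions.

```python
def predict(w, x):
    """Hypothetical linear stand-in for the DNN: h(x, w) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def loss(w, data):
    """MSE loss on the new-topology dataset T(new)."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def grad(w, data):
    """Gradient of the MSE loss with respect to w."""
    g = [0.0] * len(w)
    for x, y in data:
        r = predict(w, x) - y
        for i, xi in enumerate(x):
            g[i] += 2.0 * r * xi / len(data)
    return g


def fine_tune(w_init, data, lr=0.1, steps=5):
    """Online adaptation: a few plain gradient steps starting from w_init."""
    w = list(w_init)
    for _ in range(steps):
        w = [wi - lr * gi for wi, gi in zip(w, grad(w, data))]
    return w


# Hypothetical new topology with only K(new) = 3 samples of y = 1.25 - 0.42 t.
new_data = [([1.0, t], 1.25 - 0.42 * t) for t in (-1.0, 0.0, 1.0)]
w_mtl = [1.2, -0.4]      # assumed meta-learned initialization
w_scratch = [0.0, 0.0]   # stand-in for an uninformed initialization
loss_mtl = loss(fine_tune(w_mtl, new_data), new_data)
loss_scratch = loss(fine_tune(w_scratch, new_data), new_data)
```

With the same five steps, the warm-started model ends with a far smaller loss, which is the behavior the online phase relies on.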

C. Ensuring Feasibility
The OPF solution predicted by the DNN is feasible when it satisfies the active power generation and nodal voltage limits specified in (4), (5), and (6). To ensure the feasibility of the DNN's proposed solution, we adopt the approach proposed in [3], [15]. First, we apply a linear transformation to the active power generations and nodal voltage magnitudes:

$$P_{Gi} = P_{Gi}^{\min} + \rho_i \big(P_{Gi}^{\max} - P_{Gi}^{\min}\big), \qquad V_i = V_i^{\min} + \sigma_i \big(V_i^{\max} - V_i^{\min}\big).$$

Once we make these transformations, the generation and voltage limits are equivalent to

$$0 \le \rho_i \le 1, \qquad 0 \le \sigma_i \le 1.$$

Then, we use the DNN to predict these scaled variables $\rho_i$ and $\sigma_i$ rather than predicting $P_{Gi}$ and $V_i$ directly. To this end, we use the sigmoid activation function at the output layer of the DNN. Recall that the sigmoid function always outputs a number between 0 and 1. Thus, we can guarantee that the scaled variables $\rho_i$ and $\sigma_i$ predicted by the DNN lie in $[0, 1]$, and consequently, the predictions of $P_{Gi}$ and $V_i$ lie between their lower and upper limits. Without the scaling and the sigmoid function, the DNN prediction cannot be guaranteed to be feasible (i.e., to lie between the permissible lower and upper limits). While the aforementioned transformation ensures the feasibility of the variables directly predicted by the DNN, i.e., $P_{Gi}$, $i \in \mathcal{G} \setminus \{0\}$, and $V_i$, $i \in \mathcal{G}$, it does not ensure the feasibility of all the system variables, specifically, those recovered by solving the AC power flow problem (recall Fig. 1). For this reason, we calibrate the voltage constraints while generating the training dataset to avoid such violations [3]. Specifically, in topology $m$, we tighten the voltage limits by a margin controlled by a calibration parameter $\lambda$ that is set to a small value. This calibration ensures that the DNN is trained to predict voltage magnitudes that lie strictly in the interior of the feasible region, and hence mitigates the infeasibility caused by the approximation errors of the DNN.
Finally, the feasibility of the reactive power generations can be ensured using a similar procedure; we omit the details here and refer the reader to [3].
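The output scaling can be sketched as follows, assuming the linear transformation $P_{Gi} = P_{Gi}^{\min} + \rho_i (P_{Gi}^{\max} - P_{Gi}^{\min})$ (and likewise for $V_i$ with $\sigma_i$). The raw output-layer values and the voltage limits below are hypothetical. The clamp in `to_physical` only guards against floating-point rounding; mathematically, the sigmoid already keeps the result inside the limits.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function; the result lies between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def to_physical(scaled, lower, upper):
    """Map a scaled value in [0, 1] back to [lower, upper] (e.g., rho_i -> P_Gi)."""
    value = lower + scaled * (upper - lower)
    return min(max(value, lower), upper)  # guard against floating-point rounding


def to_scaled(value, lower, upper):
    """Training-target transform: (value - lower) / (upper - lower)."""
    return (value - lower) / (upper - lower)


# Hypothetical raw (pre-activation) outputs for three voltage magnitudes,
# each with limits [0.94, 1.06] pu.
raw = [-30.0, 0.0, 2.5]
v_pred = [to_physical(sigmoid(z), 0.94, 1.06) for z in raw]
```

During training, the targets would be passed through `to_scaled`, so the network only ever learns values in $[0, 1]$.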

IV. SIMULATIONS
In this section, we verify the effectiveness of the proposed MTL approach using simulations and present the results.

A. Algorithms and Metrics
Under MTL, the offline and online training are performed according to Algorithms 1 and 2, respectively. We compare the performance of MTL against three other training methods, namely, "learn from scratch," "learn from a joint training model," and "learn from the closest model."
• In "Learn from scratch," there is no pretraining. During the online phase, following topology reconfiguration, the DNN's weights are initialized to random values and trained using the OPF dataset from the new topology.
• "Learn from a joint training model" is described in Algorithm 3. During the offline training phase, a DNN is trained on the dataset $\mathcal{T}_{\text{offline training phase}}$, which combines the training data from topologies $1, \ldots, M^*$. During the online phase, following topology reconfiguration, the DNN's weights are fine-tuned (starting from the pretrained values) using OPF data from the new topology, similar to the MTL online training phase.
• "Learn from the closest model" is described in Algorithms 5 and 6. During the offline training phase, we train a separate DNN for each topology $1, \ldots, M^*$. During the online phase, we choose the DNN that achieves the best prediction performance on the new topology at hand (step 4 of Algorithm 6) and then fine-tune its weights using OPF data from the new topology.
We henceforth refer to "Learn from a joint training model" and "Learn from the closest model" as "Pretrain1" and "Pretrain2," respectively.
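The two pretraining baselines differ only in how the initial weights are obtained. The minimal sketch below shows the data pooling of Pretrain1 and the model-selection step of Pretrain2, $w_{\text{best}} = \arg\min_m J_{\mathcal{T}^{(\text{new})}}(w_m)$. The linear predictor, the candidate weights, and the data are hypothetical placeholders for trained DNNs and OPF samples.

```python
def predict(w, x):
    """Hypothetical linear stand-in for a trained DNN."""
    return sum(wi * xi for wi, xi in zip(w, x))


def loss(w, data):
    """MSE of model w on a dataset of (x, y) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def pool_datasets(datasets):
    """Pretrain1: merge the offline datasets T_1, ..., T_M* into one training set."""
    return [sample for data in datasets for sample in data]


def select_closest(models, new_data):
    """Pretrain2: pick the per-topology model with the lowest loss on T(new)."""
    return min(models, key=lambda w: loss(w, new_data))


offline = [[([1.0, 0.0], 1.0)], [([1.0, 1.0], 0.8), ([1.0, -1.0], 1.6)]]
joint_training_set = pool_datasets(offline)
models = [[1.0, 0.0], [1.2, -0.4], [3.0, 1.0]]  # one model per offline topology
new_data = [([1.0, t], 1.25 - 0.42 * t) for t in (-1.0, 0.0, 1.0)]
w_best = select_closest(models, new_data)
```

Note that Pretrain2 must evaluate every stored model on the new data, which is why its storage cost grows with the number of offline topologies.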

Algorithm 3 Offline Training for Pretrain1
Input: $\mathcal{T}_{\text{offline training phase}}$, $\alpha$
Output: $w_{\text{pretrain1}}$: the initial parameters (model) obtained by joint training
1: while not done do
…

Algorithm 6 Online Training for Pretrain2
…
2: Change the system to the new configuration
3: Obtain training samples from the dataset of the new configuration $\mathcal{T}^{(\text{new})}$
4: Find the model that performs best on the new task: $w_{\text{best}} = \arg\min_m J_{\mathcal{T}^{(\text{new})}}(w_m)$
5: Compute the adapted parameters with gradient descent: …

The online operation framework of the two-step DNN-based OPF solver is presented in Fig. 1 (used for the testing data). In the first step, given the active and reactive power demands at the load buses, the trained DNN predicts the active power generations (except at the slack bus) and the voltage magnitudes at the generator buses. Then, all other system state variables (e.g., $P_{G0}$, $Q_G$, $V_L$, $\theta$, and the branch power flows) can be reconstructed by solving the AC power flow equations.
The performance of the DNN-based OPF solver is assessed by three metrics. The first metric, $\eta_1$, is the DNN validation loss defined in (7). The second metric, $\eta_2$, is the accuracy of the state parameters, defined in (13), where $2|\mathcal{G}| - 1$ is the dimension of the DNN output and $\hat{y}$ …

B. Data Creation and DNN Settings
The power system models are based on MATPOWER's test cases [28]. The training and testing data are generated using MATPOWER's AC OPF solver (specifically, MATPOWER's interior point solver). We test the algorithms on the IEEE-14, 30, and 118 bus systems. During the data generation phase, we create different power grid topologies by randomly disconnecting a subset of transmission lines (e.g., each line is disconnected with a probability of 0.01) and adding a random perturbation to the line reactance values (subject to maximum and minimum reactance limits). Some of these topologies may not admit a feasible OPF solution; e.g., if too many transmission lines are disconnected at once, there may be no OPF solution that satisfies the load demand in that topology. We exclude such topologies from the dataset, since a grid operator would not switch the system to those configurations (i.e., they do not represent real-world topologies). We keep generating topologies in this manner until we obtain a sufficient number that yield a feasible OPF solution. For instance, for the 118-bus system, we generated 100 topologies with a feasible OPF solution.
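The topology-generation procedure can be sketched as below. This is a simplified illustration under stated assumptions: reactances are scaled by a hypothetical uniform factor in $[0.8, 1.2]$ (standing in for the unspecified reactance limits), candidate topologies that disconnect the grid are rejected with a graph search, and the OPF-feasibility filter, which relies on MATPOWER's AC OPF solver, is omitted. The 4-bus ring and the outage probability of 0.3 (larger than 0.01 so that outages actually occur in a small example) are also hypothetical.

```python
import random


def is_connected(n_bus, lines):
    """Depth-first search from bus 0 over the in-service lines."""
    adjacency = {bus: [] for bus in range(n_bus)}
    for i, j, _ in lines:
        adjacency[i].append(j)
        adjacency[j].append(i)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n_bus


def random_topology(n_bus, lines, rng, p_out=0.01, scale=0.2):
    """Switch out each (from, to, x) line with probability p_out and perturb x.

    Returns None if the resulting grid is disconnected.
    """
    new_lines = []
    for i, j, x in lines:
        if rng.random() < p_out:
            continue  # line switched out
        new_lines.append((i, j, x * rng.uniform(1.0 - scale, 1.0 + scale)))
    return new_lines if is_connected(n_bus, new_lines) else None


def sample_topologies(n_bus, lines, count, seed=0, **kwargs):
    """Keep drawing candidates until `count` connected topologies are found."""
    rng = random.Random(seed)
    topologies = []
    while len(topologies) < count:
        candidate = random_topology(n_bus, lines, rng, **kwargs)
        if candidate is not None:  # an OPF feasibility check would also go here
            topologies.append(candidate)
    return topologies


ring = [(0, 1, 0.10), (1, 2, 0.20), (2, 3, 0.15), (3, 0, 0.12)]
topologies = sample_topologies(4, ring, count=20, p_out=0.3)
```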
For each bus system, $M = 100$ different grid topologies are generated. For each topology, we create a set of 1000 data points, where each data point corresponds to a different load value obtained by adding a random perturbation to the base load values (obtained from the MATPOWER simulator). The maximum load perturbation is restricted to 70% of the original values. We consider the quadratic OPF cost and use the default generation cost values in MATPOWER. Changes to the system topology lead to changes in the power flows and hence to a different OPF solution. In our simulations, we allocate $M^* = 70$ tasks to the offline training phase and the remaining 30 tasks (denoted as new tasks) to the online training phase.
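The load sampling and the offline/online task split can be sketched as follows. The sketch assumes that each load is scaled by an independent uniform factor in $[0.3, 1.7]$ (one reading of the 70% perturbation limit) and that the 70/30 split of topologies is drawn at random. Both choices are assumptions, and the base loads are hypothetical.

```python
import random


def sample_loads(base_loads, n_samples, max_perturbation=0.7, seed=0):
    """Scale every base load by an independent factor in [1 - max_perturbation, 1 + max_perturbation]."""
    rng = random.Random(seed)
    low, high = 1.0 - max_perturbation, 1.0 + max_perturbation
    return [[load * rng.uniform(low, high) for load in base_loads]
            for _ in range(n_samples)]


def split_tasks(task_ids, n_offline=70, seed=0):
    """Randomly assign M* topologies to offline training and the rest to online."""
    ids = list(task_ids)
    random.Random(seed).shuffle(ids)
    return ids[:n_offline], ids[n_offline:]


base = [0.51, 0.2, 0.39]  # hypothetical base loads (pu)
samples = sample_loads(base, n_samples=1000)
offline_ids, new_ids = split_tasks(range(100))
```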
We implement the neural network model and the MTL training in the PyTorch framework. We use the ReLU activation function at the hidden layers and the sigmoid activation function at the output layer. The sizes of the input and output layers are chosen to match the dimensions of the dataset. In our case, the input to the DNN is a vector containing the active and reactive power demands at the load buses. For instance, in the IEEE-118 bus system, the input vector has size 198 (the active and reactive power demands of the 99 load buses), and the output vector has size 107 ($2|\mathcal{G}| - 1$, i.e., the active power generations at the 53 non-slack generator buses and the voltage magnitudes at the 54 generator buses). Thus, the input and output layers have 198 and 107 neurons, respectively. For the hidden layers, we vary the number of neurons in proportion to the size of the input/output layers. In Table II, we present the prediction accuracy ($\eta_2$) for different numbers of neurons in the hidden layers for the IEEE-118 bus system. The setting labeled "Ref" provides the highest accuracy and is the DNN setting we use in the rest of the paper. Similarly, for each test system, we vary the size of the hidden layers and choose the setting that gives the best accuracy. The layer settings used for each bus system in our simulations are listed in Table I. In the offline training process, for each pretraining method, we use the Adam optimizer with a learning rate of 0.001 and 1000 training epochs. L2 regularization (weight decay of 0.001) is applied to prevent over-fitting. For the online training phase, unless specified otherwise, we use 50 training samples for fine-tuning the weights, with the stochastic gradient descent (SGD) optimizer, a learning rate of 0.1, and a weight decay of 0.001.
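The layer layout for the IEEE-118 bus case can be sketched as a plain forward pass with ReLU hidden layers and a sigmoid output layer. The hidden width of 256 and the uniform weight initialization are hypothetical (the settings actually used are those in Table I). In practice the network would be built with PyTorch, whereas this snippet uses only the Python standard library to check the layer dimensions.

```python
import math
import random

N_LOAD_BUSES, N_GENERATORS = 99, 54
N_IN = 2 * N_LOAD_BUSES       # active + reactive demand per load bus -> 198
N_OUT = 2 * N_GENERATORS - 1  # 53 non-slack generations + 54 voltages -> 107


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layer(n_in, n_out, rng):
    """Uniform initialization in [-1/sqrt(n_in), 1/sqrt(n_in)] (illustrative choice)."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-bound, bound) for _ in range(n_out)]
    return weights, biases


def forward(layers, x):
    """ReLU on hidden layers, sigmoid on the output layer."""
    h = x
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        h = [sigmoid(v) for v in z] if is_output else [max(0.0, v) for v in z]
    return h


rng = random.Random(0)
sizes = [N_IN, 256, 256, N_OUT]  # hypothetical hidden widths
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
output = forward(layers, [rng.uniform(0.0, 1.0) for _ in range(N_IN)])
```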

V. SIMULATION RESULTS
The simulation results are presented in Figs. 4, 8, and 9 and Tables III and IV. For brevity, we present only the results for the IEEE-118 bus system in Fig. 4; the results for the IEEE-14 and 30 bus systems are relegated to the Appendix. The results for all the bus systems follow a similar trend. Table III presents the accuracy results based on the metrics defined in Section IV. MTL achieves a very high prediction accuracy of over 97% ($\eta_2$) and over 99% ($\eta_3$) with fewer than 10 training epochs. This shows that MTL can rapidly adapt to the new system configuration starting from the initialization point $w_{\text{MTL}}$. In contrast, training from scratch from a random initialization takes a significantly greater number of gradient updates. For the purpose of illustration, we choose one particular system topology from the testing phase and present the true and predicted values of $P_{Gi}$ and $V_i$ for the IEEE-30 bus system in Fig. 3, where we observe a close match between the two.
Furthermore, MTL achieves higher accuracy and lower loss than the other pretraining methods (Pretrain1 and Pretrain2). More importantly, we observe that online training with a very small number of data samples (i.e., 50 OPF data samples from the new topology in our case) does not significantly improve the performance of the other pretraining methods, as shown in Table III (in some cases, we also observed that online training with only a few data samples degraded the performance of the other pretraining methods due to over-fitting). Thus, with the other pretraining methods, the accuracy is limited to the performance achieved during the offline training phase. From Fig. 4 and Table III, we observe that for MTL, most of the performance improvement occurs within the first few epochs. Thus, MTL is suitable for online training with very few data samples and very few training epochs.
We also present the feasibility results for the predicted OPF solution in Table IV. The feasibility rate is calculated as $fr = n_f / n_t$, where $n_f$ denotes the number of testing samples that yield a feasible solution and $n_t$ denotes the total number of testing samples. The results show that the adjustments to the training process described in Section III-C enable MTL to achieve a very high feasibility rate.
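The feasibility rate can be computed as in the minimal sketch below. It assumes that each test sample is represented as a list of (value, lower, upper) triples covering the variables being checked, which would include the variables recovered from the AC power flow. The optional tolerance and the sample values are hypothetical.

```python
def is_feasible(sample, tol=0.0):
    """True if every (value, lower, upper) triple satisfies lower <= value <= upper."""
    return all(lower - tol <= value <= upper + tol for value, lower, upper in sample)


def feasibility_rate(samples, tol=0.0):
    """fr = n_f / n_t over the test samples."""
    n_f = sum(1 for sample in samples if is_feasible(sample, tol))
    return n_f / len(samples)


test_samples = [
    [(1.02, 0.94, 1.06), (0.8, 0.0, 1.5)],   # feasible
    [(1.07, 0.94, 1.06), (0.8, 0.0, 1.5)],   # voltage above its upper limit
    [(0.95, 0.94, 1.06), (1.4, 0.0, 1.5)],   # feasible
    [(1.00, 0.94, 1.06), (-0.1, 0.0, 1.5)],  # generation below its lower limit
]
fr = feasibility_rate(test_samples)
```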

B. Computational Time for the Offline Training Phase
Besides its accuracy, another advantage of MTL is its ability to quickly produce an initialization model (i.e., the offline training phase). Table V lists the time required to produce the initialization model for MTL and the other pretraining methods for different bus systems. MTL takes significantly less time than the other pretraining methods. Moreover, whereas the Pretrain2 method requires a separate DNN to be trained and stored for each power grid topology, MTL requires only a single DNN model to be stored. Thus, MTL also significantly reduces the storage burden compared to the Pretrain2 method.

C. MTL Performance for Different Offline/Online Training Parameters
We investigate the performance of MTL as a function of the online/offline training parameters. First, we test the online training performance of MTL and of learning from scratch under different learning rates $\gamma$. The results for the IEEE 118-bus system are presented in Fig. 5. Increasing the learning rate $\gamma$ can accelerate online training. However, a very high learning rate is undesirable, since it risks oscillations around the minimum, as in any gradient-based update algorithm. For instance, in Fig. 5, the learning speed and the accuracy of MTL improve as $\gamma$ increases from 0.001 to 0.1. However, when $\gamma$ is increased beyond this value (e.g., $\gamma = 0.2$), the accuracy of online learning starts to decrease. For each test system, we similarly determine the best learning rate by gradually adjusting $\gamma$.

Second, we investigate the prediction accuracy (measured by the metric $\eta_2$) as a function of the number of training samples used in the online training process and present the results in Fig. 6. MTL achieves good accuracy by fine-tuning with only 50 to 100 online training samples; increasing the number of online training samples to 700 yields a negligible improvement. This implies that MTL is well suited for fine-tuning with very few data samples, making it particularly attractive for online training. Note that although only a few data samples are used during the online "training" process of MTL, we use a large number of data samples during the "testing" phase to ensure sufficient averaging and unbiased results. For instance, the results in Fig. 6 are computed based on 300 data samples in the testing phase.

Third, we investigate the MTL prediction accuracy as a function of the number of topologies used in the offline training phase.
The results plotted in Fig. 7 indicate that the prediction accuracy decreases when the number of topologies used in the offline training process is reduced. This indicates that a sufficient number of topologies is required in the offline training phase to develop an efficient MTL model.
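The learning-rate trade-off discussed for Fig. 5 can be seen on a one-dimensional quadratic $f(w) = \tfrac{a}{2} w^2$, where each gradient step multiplies the iterate by $1 - \gamma a$. The sketch below is an illustrative toy, not the DNN loss used here: the curvature $a = 10$, the starting point, the step count, and the candidate learning rates are all assumptions, so its numerical thresholds do not transfer to Fig. 5. It also mimics the tuning procedure of sweeping a few values of $\gamma$ and keeping the one with the lowest final loss.

```python
def run_gd(lr, curvature=10.0, w0=1.0, steps=10):
    """Gradient descent on f(w) = 0.5 * curvature * w**2; returns all iterates."""
    w, path = w0, [w0]
    for _ in range(steps):
        w = w - lr * curvature * w   # f'(w) = curvature * w
        path.append(w)
    return path

def loss(w, curvature=10.0):
    return 0.5 * curvature * w ** 2

candidates = [0.01, 0.05, 0.1, 0.15, 0.25]
for lr in candidates:
    path = run_gd(lr)
    # Sign flips indicate oscillation around the minimum w* = 0.
    flips = sum(1 for a, b in zip(path, path[1:]) if a * b < 0)
    print(f"lr={lr:<5} final loss={loss(path[-1]):.3e} sign flips={flips}")

best_lr = min(candidates, key=lambda lr: loss(run_gd(lr)[-1]))
print("best learning rate:", best_lr)  # 0.1
```

In this toy, $\gamma a < 1$ converges monotonically (slowly for small $\gamma$), $\gamma a = 1$ reaches the minimum in one step, $1 < \gamma a < 2$ converges while oscillating, and $\gamma a > 2$ diverges.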

D. Computational Gain Compared to the Traditional OPF Solver
We further measure the computational time of MTL's online training and of its prediction (following the retraining) and compare it with that of the traditional MATPOWER-based OPF solver in Table VI. We summarize the observations below.
Online Training Time: Table VI shows that under the MTL approach, the DNN can be retrained quickly to achieve high prediction accuracy. Recall that MTL's online training achieves a very high prediction accuracy within 10 retraining epochs. For the IEEE 118-bus system, the computation time for online training (10 epochs) is only 1.2 seconds. This suggests that the MTL approach can scale to large OPF systems.
Online Prediction Time: The online prediction phase consists of two steps: (1) DNN prediction and (2) post-processing to ensure feasibility (as illustrated in Fig. 1). We present the computational time for both operations in Table VI and compare it with the time required by the traditional MATPOWER-based OPF solver. The results show that the proposed approach provides a significant speed-up compared to the traditional solver. For instance, for the IEEE 118-bus system, we achieve a speed-up of 19 times for every OPF computation.
Finally, note that online training is an additional computational burden incurred under the MTL approach that the traditional OPF solver does not require. For the IEEE 118-bus system, the online training time of MTL (≈ 1.2 s) is approximately 23 times the solution time of the traditional OPF solver (≈ 52 ms). Hence, MTL is useful for a power system operator if the system topology remains unchanged for at least about 23 OPF computations. If the system topology changes faster than this, the computational burden of MTL exceeds that of the traditional OPF solver. In most practical systems, however, this condition is easily met, since load and renewable energy fluctuations occur at a much faster rate than topology reconfigurations. Thus, MTL is suitable for practical power system operation.
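The break-even argument reduces to a small inequality. With $T_{\text{train}}$ the one-off online training time, $t_{\text{OPF}}$ the solver time per OPF, and $t_{\text{pred}}$ the MTL prediction time per OPF, MTL pays off after $n$ computations on one topology when $T_{\text{train}} + n\,t_{\text{pred}} < n\,t_{\text{OPF}}$, i.e., $n > T_{\text{train}} / (t_{\text{OPF}} - t_{\text{pred}})$. The sketch below evaluates this with the IEEE 118-bus numbers quoted in this section. The prediction time is not stated directly here, so it is inferred, as an assumption, from the reported 19× speed-up ($t_{\text{pred}} \approx 52/19$ ms); the function name `break_even_runs` is hypothetical.

```python
import math

def break_even_runs(t_train, t_solver, t_pred=0.0):
    """Smallest integer n with t_train + n * t_pred < n * t_solver.

    Times are in seconds. Returns math.inf if MTL prediction is not faster
    than the solver, in which case online training never pays off.
    """
    if t_solver <= t_pred:
        return math.inf
    return math.floor(t_train / (t_solver - t_pred)) + 1

T_TRAIN = 1.2      # online training time for 10 epochs (s), as quoted above
T_SOLVER = 0.052   # MATPOWER OPF solve time (s), as quoted above

print(break_even_runs(T_TRAIN, T_SOLVER))                  # 24 (prediction time ignored)
print(break_even_runs(T_TRAIN, T_SOLVER, T_SOLVER / 19))   # 25 (assumed 19x faster prediction)
```

Ignoring prediction time gives the ratio $1.2/0.052 \approx 23.1$, so the first integer that strictly exceeds it is 24; including the inferred prediction time raises the threshold only slightly, to 25, which leaves the conclusion unchanged.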
VI. CONCLUSIONS
In this work, we have proposed a DNN-based approach to the OPF problem that is trained using a novel MTL approach. The proposed approach is particularly relevant for computing OPF generation dispatch decisions under power grid topology reconfigurations. The MTL approach finds good initialization points from which the DNNs can be quickly trained to produce accurate predictions for different system configurations. Simulation results show that the proposed approach significantly enhances training speed and achieves better prediction accuracy, as well as feasible results, compared with several other pretraining methods. To the best of our knowledge, this work is the first to adopt an MTL approach in a power grid context.