Solar PV Inverter Reactive Power Disaggregation and Control Setting Estimation

The wide variety of inverter control settings for solar photovoltaics (PV) causes the accurate knowledge of these settings to be difficult to obtain in practice. This paper addresses the problem of determining inverter reactive power control settings from net load advanced metering infrastructure (AMI) data. The estimation is first cast as fitting parameterized control curves. We argue for an intuitive and practical approach to preprocess the AMI data, which exposes the setting to be extracted. We then develop a more general approach with a data-driven reactive power disaggregation algorithm, reframing the problem as a maximum likelihood estimation for the native load reactive power. These methods form the first approach for reconstructing reactive power control settings of solar PV inverters from net load data. The constrained curve fitting algorithm is tested on 701 loads with behind-the-meter (BTM) PV systems with identical control settings. The settings are accurately reconstructed with mean absolute percentage errors between 0.425% and 2.870%. The disaggregation-based approach is then tested on 451 loads with variable BTM PV control settings. Different configurations of this algorithm reconstruct the PV inverter reactive power timeseries with root mean squared errors between 0.173 and 0.198 kVAR.

on distribution feeders [1]- [5]. However, the data used to drive these algorithms are often based on incomplete feeder models that rely on manual data entry and topology configuration. In particular, a priori knowledge of the inverter settings is typically assumed or derived from the model. Records of these settings may be out-of-date or completely unavailable, and the manual entry process for the collection and maintenance of these data is highly prone to error [6], [7].
In contrast with utility-scale PV, the impact of BTM PV inverters is typically invisible to utilities. Even if the presence of an advanced inverter is known, distribution engineers typically do not have the ability to remotely monitor BTM inverters or access the installation premises to verify the inverter control settings of grid-connected solar PV installations. Furthermore, with the advent of the IEEE 1547-2018 standard update [8], the deployment of advanced BTM solar PV inverters with reactive power control and voltage regulation capabilities will increase. Notably, there is a broad range of control settings permitted by the standard. For these reasons, it is also likely that utilities of the future may face limited information about these advanced inverters and their control settings for BTM PV. As a result, engineers may be unable to accurately predict their effects on a distribution network. It is also likely that limited information about the specific operating conditions of BTM inverter-based resources is attainable. Control parameters of the system such as the power factor, curtailment, and voltage control settings may be completely unavailable. These problems are an active area of research [4], [9]- [15].
Fortunately, data from advanced metering infrastructure (AMI) are becoming increasingly available [16], presenting a significant number of opportunities for the creation of entirely data-driven methods to reconstruct the control settings of solar PV inverters. These contribute to existing AMI-derived estimation methods, which have been developed for estimating the PV hosting capacity of distribution systems [7], [17], disaggregating the active power contribution of solar PV [11]- [15], and many other applications.
However, one of the key obstacles in implementing entirely data-driven analysis methods is the detection and estimation of inverter control settings, particularly those for BTM PV. Currently, this problem has been unaddressed in the literature. The contributions of this paper are methods to reconstruct these settings, even if they are unknown. In this paper, we build up two related mathematical models for achieving this reconstruction from heavily corrupted, aggregated load data. The proposed framework uses power system physics with data-driven estimation and optimization methods. The two models together This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ form the first approach to solve this problem. These estimation techniques allow engineers to acquire information about the reactive power behavior of BTM PV interconnections without the use of a distribution circuit model. The method also does not require the input of the customer or the deployment of field technicians, which allows for easier power system planning and power quality monitoring. The techniques could also be used in the form of a non-intrusive method to monitor for changes in these settings. Knowledge of these changes is critical for feeder-wide control and optimization tasks such as Volt-VAR optimization (VVO) [18]. It may also be useful for other DER management tasks such as acquiring operational impacts with other regulating devices and impacts to system protection. The methods implicitly provide information about the operational impacts of the inverter, which simultaneously reconstructs the control curve. Correspondingly, this allows for the user to infer the actuation of the control states, thereby reconstructing the PV reactive power timeseries.
In summary, the methods provide access to unknown devicelevel PV inverter information that is important for other distribution system planning and analysis tasks, such as the computation of PV hosting capacity and ensuring model fidelity. We achieve this with only historical voltage, active power, and reactive power AMI data, without a need for a distribution circuit model. Improved knowledge and visibility of inverter settings will increase utility confidence in their studies, which has the potential to reduce the need for BTM PV interconnection regulations and restrictions, enabling a swifter transition to a sustainable energy portfolio.

A. Advanced Inverter Reactive Power Control Settings
Modern inverters can operate in several different control modes. These modes are defined by characteristic curves that dictate the injection or absorption of reactive power at the point of common coupling (PCC) of the DER. An inverter with these capabilities is commonly referred to as a "smart inverter" or "advanced inverter". For consistency, we adopt the latter terminology.
The simplest and most common is the fixed power factor mode. This mode is favored for its simplicity and interpretability. In this control mode, the inverter controller is designed to inject or absorb reactive power such that a constant power factor is always maintained at the PCC. To achieve this control objective; as the PV active power output is increased, the reactive power output will increase proportionally to the power factor [5]. This behavior is illustrated in Fig. 1. The default setting is unity power factor control [8], in which case the voltage impacts of the PV generator will not be mitigated [19].
In contrast with the default unity power factor control mode, nonunity power factor control, and another control mode, autonomous Volt-VAR, can mitigate distributed generator voltage impacts. These control modes have received significant attention from both researchers and industry due to their potential to increase PV hosting capacity. Notably, it has been shown that Volt-VAR control yields a superior reduction in voltage profile volatility in comparison with both fixed unity and nonunity   power factor control, while requiring a smaller amount of reactive power injections [5], [20], [21]. When an inverter is operating with Volt-VAR control, the reactive power injection or absorption of the PV system's inverter is controlled according to one of the following control states: where the PCC voltage V pcc is used as the input signal to the controller to determine the state. This behavior is illustrated in Fig. 2, and an example of the state regions corresponding with the control curve is shown in Fig. 3. Research and industrial interest in Volt-Watt control have also grown, primarily serving as a "backup" to reactive power control modes. Currently, reactive power control modes-specifically, fixed power factor and Volt-VAR-have emerged as a primary focus of industry and local regulators [19]. For this reason, and for brevity, we leave the detection and estimation of Volt-Watt and other control modes to future work.

B. Net Load Disaggregation
Net load disaggregation is a well-studied topic in the power engineering literature. Reference [22] considered one of the earliest examples of structured net load disaggregation, with the intent of identifying the presence of different household appliances from variations in voltage and current waveforms. More recently, [23] built on this idea with a non-negative sparse coding approach, integrating concepts from the blind-source separation literature in the signal processing field. Using a dictionary of example signals for the energy consumption of various device types, they find representations for aggregate energy demand in a basis of test signals. Later works such as [24] continued to use sparse coding with various enhancements. Even more recently, game-theoretic approaches, such as [13] have appeared, in addition to numerous unsupervised methods such as [12], [25], [26]. Furthermore, reference [27] developed an online learning approach for disaggregation.
A particular application of this topic that has received considerable attention is the estimation -or disaggregation -of BTM solar PV parameters from net load AMI data, i.e., the total energy summation of the customer's demand and PV generation. There has been a considerable amount of work in both PV location, state, size, and presence detection [9], [28], [29]. Other works have explored the direct disaggregation of solar generation signals and "native" demand signals solely from net load AMI measurements [11]- [15], [30]. These are closely related to our work.
Notably, there has been a growing interest in solving disaggregation problems with a completely data-driven approach, i.e., without considering the system feeder model. Some solutions to this problem are dependent on localized feeder information, with newer works pursuing methods increasingly decoupled from localized information, [12], [13], [30]. A particularly popular method is to use "exemplars" or proxy measurements of solar PV generation to provide a basis for the estimation or to extract features of interest. In [11] a multiple linear regression approach based on proxies for localized irradiance information from a nearby metered PV system was established. A similar approach was generalized to online estimation in [15]. It has also been shown in [13] that economic data can be a useful feature input for the estimation.
Probabilistic methods and unsupervised machine learning methods have shown promise for removing the need for modelderived information. Works such as [26], [28], [31] develop datadriven methods using a variety of input data assumptions to estimate size, azimuth, or tilt information for BTM PV installations in an unsupervised fashion. An estimated probability density of nighttime and daytime active power from loads without PV on the feeder was used to disaggregate solar generation in [12]; a similar concept was developed in [32]. Reference [12] then uses a weighted combination of the exemplars to subtract away from the daytime net AMI measurements, where the weights are chosen such that they maximize a log likelihood function derived from a learned generative distribution. In all these works, the focus is solely on disaggregating the active power production of the PV from the customer's demand. In this paper, we take inspiration from these results and construct an approximate generative distribution for the native reactive power. We also improve upon the methodology of [12] by incorporating a metric for estimating the dimension of the model, i.e., the number of Gaussian mixtures that maximally captures the information of the load population without PV.

C. Advanced Metering Infrastructure (AMI)
The growing rate of interconnections of BTM PV systems has also come with a growth in the use of AMI data, with many applications appearing in the literature [7]; including load forecasting [10], [33] and estimating the error of the meter itself [34].
In our analysis, we consider an electric power system with L loads containing BTM PV operating under reactive power control. For each load l = 1, . . . , L, we assume access to a historical multivariate timeseries AMI dataset D l containing measurements of net load voltage, active power, and reactive power demands over a time horizon M : Hereafter, we drop the subscript l for brevity. For a load with BTM PV, the net demand measured by the AMI, p net t and q net t , will be additive combinations of the "native" active and reactive demands of the load, which we will denote as p t and q t ; the injection or absorption of active and reactive power by the solar PV and its inverter system, denoted as p pv t and q pv t ; and random measurement noise, denoted as .
Remark (Sign Convention): Throughout the paper, we adopt the convention that injections from a node into the distribution system are positive and flows from the distribution system into the load through the AMI are negative.
Considering these conventions, native demand is a negative value. Therefore, the net AMI demands are then: The above convention significantly simplifies notation throughout the paper. Importantly, p net t and q net t are the negative of what will typically be read by AMI in practice.
The net metered demands in (2) and (3) are assumed to be sampled at 15-minute granularity. Furthermore, we assume that measurement errors due to sensor manufacturing defects are distributed according to a standard normal: ∼ N (0, σ 2 ) We will use the symbol K to denote the set of all feasible control parameters Θ for the mode. The contribution of the PV system, q pv t , to the net reactive power measurement seen by the AMI, q net t , is modeled by a control curve φ Θ , where Θ ∈ K is a parameter vector describing the control settings. To be specific, in the case of the Volt-VAR control mode, this control curve will be a piecewise-linear function of the PCC voltage v pcc t , whereas for the fixed power factor control, we will have a linear function of the PV active power, i.e.:

III. CONTROL CURVE BASIS FUNCTIONS
Given the control parameter Θ, we can reconstruct the control curve and therefore, the PV inverter's reactive power response. In this section, we build up parameterized representations of common reactive power control curves to allow us to estimate this parameter.

A. Fixed Power Factor Control
In fixed power factor control mode, the PV system's reactive power response q pv t can be characterized by a slope parameter m = Δq Δp , which is also known as the inverter's sensitivity of reactive power injections to active power injections. If we had direct measurements of the PV system's active power injection, we could reconstruct [18] the PV system's reactive power injection as: This sensitivity defines a simple line in the complex power plane, which maintains cos(φ V − φ I ) for any value of p pv t , as shown in Fig. 1. Finally, the power factor setting pf can then be easily related to this slope via trigonometry and the definition of the power factor: Additionally, the setting can be recovered with a known PV apparent power measurement ||s pv t || 2 :

B. Volt-VAR Control
For autonomous Volt-VAR control, the control input is the PCC voltage, assumed to be measured directly by the AMI. Inspecting the IEEE 1547-2018 standards [8] on the Volt-VAR characteristics curve, we can construct a parameterized function of PCC voltage v pcc t , for a general 1 representation of the reactive power response of a Volt-VAR controller as: where 3 . By finding the best fit parameterΘ for this basis function, we can reconstruct the control setting for load l by fitting the curve to the variables q net and v pcc in D l . Note that the parameter vector Θ is N pairs of voltage and reactive power settings . . , N that define the deadband, saturation, and dynamic VAR regions for the inverter:

C. Feasible Control Curve Set
We use the symbol K to represent the set of parameter vectors Θ that define a valid control curve for the mode of interest. For Volt-VAR control, this is defined by a set of convex constraints on the N knots under study. This can be expressed through a set of inequality constraints that are functions of the elements of Θ, as shown in (10)(11)(12), where the subscripts ub, i and lb, i denote the upper and lower bounds respectively for the decision variable i. ΔV represents the absolute value of the minimum dynamic VAR region width, i.e., the minimum distance between The above system of inequality constraints can be expressed compactly through the system AΘ ≤ b. The exact bounds 2 on each parameter can be found in the IEEE 1547-2018 standard [8].
For fixed power factor control, a residential load can be expected to be inductive. Therefore, K represents the reactive power sensitivities, or slopes, that meet this condition, i.e., all real numbers that are less than or equal to zero.

IV. CONTROL CURVE FITTING
The high-level goal of reconstructing inverter control settings is to find an optimal estimateΘ for the true control parameter Θ in (4). For both fixed-power factor and Volt-VAR, different methods can be used to preprocess the measurements in D l , such that the historical data approximates the underlying behavior of q pv t in the net load AMI signal (3). There are many efficient solvers for the programs described in this section, which we have used extensively throughout our work; in particular, [35], [36].

1) Power Factor Control:
Estimating a fixed power factor control parameter can be achieved by applying a simple linear regression in the complex power plane. The control setting is extracted by finding the reactive power sensitivity (slope) to active power injections that best matches the historical data relationships. If we can find a filtering function f (·) that returns filtered active and reactive power timeseries vectorsp pv and q pv where the contributions of the native load power to these measurements are sufficiently small, a power factor control setting can be estimated with ordinary least-squares regression, achieving the closed-form solution: where: Note that for the typical lagging power factor case, the sensitivity is less than or equal to zero, and the complex power line is fixed at the origin, so we can require that b = 0. Then, using the trigonometric relationship in (6), the power factor setting can be estimated with the regression slope: Other typical loss functions such as the Huber loss or the 1 distance can be used to achieve robust performance, as described in [37].
2) Volt-VAR Control: For residential BTM PV installations, the contribution of the native reactive power demand q nat t to the net signal (3) may be large at certain time points, resulting in the true control curve being significantly corrupted, and very difficult to estimate with net load AMI data.
If we can find a filtering function f such contribution of q nat t to f (q net ) t is sufficiently small, q nat t can effectively be lumped into the noise term , i.e., f (q net ) t = q pv t + . In this "naïve" approach, the optimalΘ ∈ R 2N is the solution to a non-linear least-squares program with the observed voltage and reactive power timeseries as data inputs: Furthermore, real AMI datasets with Volt-VAR-even over long time horizons-often primarily contain samples in the absorption and deadband regions. An example of such a dataset is depicted in Fig. 4. Here, Θ can be simplified to finding the best θ 3 = [V 3 , Q 3 ] T and θ 4 = [V 4 , Q 4 ] T : whereφ is a truncated Volt-VAR basis function and the curve parameters θ 3 , θ 4 ∈ R 2 are the actuated control settings to be estimated. The simplified Volt-VAR basis function can be written as:φ Note that the reactive power intercept b is again defined as The intercept b will go to zero in the case that the control curve does not have a deadband region. Furthermore, Volt-VAR curve definitions are often defined symmetrically about a reference voltage. In such instances, this low-dimensional method can be used, provided the user has a degree of confidence that the symmetry convention is followed at the load of interest.

B. Filtering Native Reactive Power
In order to directly fit the curve, it is necessary to form a filter f (q net ) that returns a preprocessed reactive power timeserieŝ q pv ∈ R M , which are the M samples that are most favorable for the curve fitting in (13), (16), and (17). This approximation for the PV system reactive power contribution q pv can then become the target for the estimation.
For Volt-VAR control, a choice for this filter becomes clear if the observations are labeled according to whether the measurement was taken during the nighttime or daytime, as shown in Fig. 5. During the nighttime, the PV inverter is generally set to turn off and to not provide voltage regulation or contribute to the reactive power measurements. Additionally, as shown in Fig. 4, observations with the most positive net load active power values more closely follow the true control curve.
Noting this trend, the user can find a "percentile threshold" for the net active power demand to filter the measurements. We denote this percentile as δ. When an observation in the AMI timeseries meets the condition p net t < p net δ , we filter out the observation, where p net δ is the δth-percentile net active power observation. After filtering, the observations are then primarily composed of the PV contribution, i.e., |p pv t | > |p nat t | and |q pv t | > |q nat t |. This "naïve filter" can be summarized as: For fixed power factor control, a similar filtering method can be applied over a different variable. As power factor control seeks to reduce overvoltages, the subset of the historical samples with the highest voltage measurements are likely to have net reactive power measurements that are primarily composed of Fig. 6. The same dataset as in Fig. 4 and Fig. 5, filtered with cutoff parameter δ = 90. The actuated control parameters can now be estimated. the inverter contribution, approximating the underlying control curve, as discussed in [37], [38].
Experimentally, we have found that 75 δ 95 often exposes both Volt-VAR and power factor curves in real AMI datasets. The example dataset in Fig. 4 and Fig. 5 is filtered with δ = 90 in Fig. 6. In Section VI-A, we discuss the application of the least-squares methods to the filtered datasets. These methods are practical and convenient because they can be optionally unconstrained, with good results.

V. REACTIVE POWER DISAGGREGATION
The least-squares methods discussed in the previous sections have several limitations. In particular: 1) The selection of the filter threshold δ must be set by the user. While tuning methods could be used to automate this task, they must be repeated for every individual load, which may be computationally costly for large systems. 2) Depending on the behavior of the individual load, different curve regions, shown in Fig. 3, may be historically actuated. The optimal percentile may vary significantly, or the full curve may not be exposed at all. In this section, we present a more general alternative to the filtering method proposed in the first section that seeks to disaggregate the native reactive power and PV-attributable reactive power, eliminating the need for the tuning of the percentile cutoff hyperparameter.
Similar to the methodologies applied to active power disaggregation in [12], [14], [32], we use a Gaussian Mixture Model (GMM) to construct an estimate for the joint distribution of nighttime and daytime native reactive power demands at 1-week granularity. The data are nighttime and daytime reactive power measurements from 678 1-year, 15-minute granularity AMI datasets of real-world loads without BTM PV. For computational speed and to ensure that the number of nighttime and daytime samples are equal, the demands are summed over a 1-week period to form a lower-resolution sample. All observations from the 678 loads are shown in Fig. 7. We denote this resampling asx = T {x} where T : R M → R M is a function that maps an M -dimensional timeseries vector x to an M -dimensional timeseriesx.
Crucially, the reactive power AMI measurements for these loads must be solely native reactive power. In practice, the user may need to employ a PV presence detection algorithm to verify that the training data are from loads without PV.
On loads with PV, this model can then be used to provide an estimate for the portion of daytime net reactive power demand that is attributable to PV by subtracting away the contribution of a particular hypothesized control curve.

A. Gaussian Mixture Model
Throughout this section we consider pairwise nighttime and daytime demands of the form: where m is the augmented time index, M is the augmented number of samples (in this case, 52 weeks), and d, n denote daytime and nighttime values respectively. Constructing a GMM consists of finding an optimal set of K multivariate Gaussian densities each with mean vectors and covariance matrices {μ k , Σ k } K k=1 to approximate the training data's generative distribution. We write γ k := {μ k , Σ k } to represent the set of estimated parameters for each distribution g k . Each density g k is then: where μ k ∈ R 2 is the mean vector for density k and Σ k 0 is the covariance matrix for density k. Each density g k is then assigned a positive scalar weight α k > 0.
Hereafter, let γ := [γ 1 , . . . , γ K ] T be the vector of parameter collections for the K densities. We then model the joint distribution of daytime and nighttime native reactive samples for the feeder as the weighted sum of the K multivariate Gaussian densities: where the weight argument α := [α 1 , . . . , α K ] T is implied. Constructing the GMM can be achieved through the optimization problem (23), which returns the model parameters γ and weights α that maximize the data log-likelihood given a particular native reactive power GMM, taking care to ensure that the granularities are converted: Note that an advantage of this method is that we can quantify the minimum amount of training samples M that we need so that the total variation distance from the true distribution is no more than . Bounds for the minimum number of samples are available in [39]. In our case, this bound is M ≥ 4K 2 log(1/ ) .

B. Construction and Training
Using the transformation T , we resample the timeseries features in (2) As before, n denotes a nighttime sample and d a daytime sample in (24). It is standard to use the expectation maximization (EM) algorithm first described in [40] to solve problems such as (23). The EM algorithm is not theoretically guaranteed to converge to the global optimum, but it is known to converge to a local optimum [40]- [42]. An important consideration for mixture models is the selection of the number of Gaussian components K, as it must be specified by the user [40], [41]. One choice for a metric to make this decision is the Bayesian Information Criterion (BIC), which we formulate based on [43] as: where γ i is the locally optimal solution to (23) for a model with K i number of components, and M is the total number of samples at the chosen timestep granularity. The optimal number of components K i that minimizes the BIC (25) is then selected among a finite range 1, . . . , K max , i.e., The corresponding parameters γ i and weights α i are then held fixed to form the trained model. We drop the subscript i hereafter. It is discussed in [43] that the BIC is a valid criterion for selecting the number of parameters of a model if the distribution behind the data is included in the set of candidate models. It was discussed in [44] that data that are distributed according to a member of the exponential family meet this condition for GMMs. Experimentally, we have found that the nighttime and daytime native reactive power samples for the feeder studied are well approximated by a joint distribution in the exponential family, and that (26) converges to 6, where the 6 components correspond to season transitions.

C. Disaggregation and Curve Reconstruction
Suppose that we are studying another load on the feeder with BTM PV. Once the GMM has been trained, we then use the model to define a likelihood function (27) for a M × 2 matrix of the form (24). Note that the rows are now net reactive power pairs [q net n,m , q net d,m ] for this load (at M -granularity): and the equivalent log-likelihood function is (28): LetΘ denote a hypothesized parameter vector. Note that we can construct the estimated native daytime reactive power demand for a load given the hypothesisΘ: With our fitted model, our problem then becomes finding the best control parameterΘ that maximizes the likelihood of the data given the hypothesizedΘ, which is the solution to the following optimization problem: maximize where K is the set of feasible parameters for the control mode. For example, consider the Volt-VAR control mode, and recall that we can encode this set as inequality constraints such as those mentioned in Section III-C: (32b) Expanding out the problem (32) and combining with (29) yields the final problem (33).
Notably, the solutionΘ to the programs (16) and (33) can be used to disaggregate the native and PV reactive power demands at the original timeseries granularity:

VI. EXPERIMENTS
We construct numerous semi-synthetic multivariate timeseries datasets in the form of (2) using actual native load active and reactive measurements from an electric utility. Similarly, the BTM PV active power injections are acquired from actual residential PV system measurements at 15-minute granularity for a year. We simulate the system's reactive power response and voltage impact on the distribution network to which it is connected using a quasi-static timeseries (QSTS) simulation.
We partition the datasets into four different load categories: Fig. 8. Log-likelihood surface for nighttime and daytime native reactive power demand (1-week granularity). Training dataset shown in Fig. 7 is used to fit this likelihood surface.
1) 701 load datasets with BTM PV of varying sizes, tilts, and azimuths that are all operating with the same Volt-VAR control curve settings in terms of p.u. values. 2) 451 load datasets with BTM PV of varying sizes, tilts, and azimuths that are each operating with different Volt-VAR control curve settings, shown in Fig. 11. Each parameter follows a uniform distribution. 3) 687 real AMI datasets for loads without BTM PV. All of these data are shown at 1-week granularity in Fig. 7, which are used to train the model shown in Fig. 8. 4) 50 load datasets with BTM PV of varying sizes, tilts, and azimuths that are each operating under non-unity fixed power factor control curve settings. For the first group, the control settings have identical p.u. values, and the majority of the loads experience sufficient voltage volatility to expose the inductive dynamic VAR region (i.e., θ 3 , θ 4 ). The user can reasonably expect these regions to be actuated by inspecting the shape of the historical datasets, as shown in Fig. 6. In this case, the truncated least-squares approach (17) can be used, and it is appropriate to directly compute the mean absolute percentage error (MAPE) of the actuated parameters for the L = 701 loads, i.e., The second dataset group is more complicated, with parameter distributions shown in Fig. 11. Some loads may not experience sufficient voltage volatility to actuate the full curve range, making it challenging to select the filter parameter δ and the region of the curve to fit with the least-squares method. We evaluate the performance using the root mean squared error (RMSE). This is the quadratic mean (RMS value) of the deviation, in kVAR, of the predicted PV reactive power found by usingΘ in (34) and the true PV reactive power over the course of the entire year, i.e., where M day is the subset of all daytime time points in the original time horizon {1, . . . , M}.
Additionally, in order to account for the variation in inverter size and control settings and to allow for better comparison, Note that we assume access to day-night transition times for simplicity. If the rough location of the distribution system is known, sunrise and sunset times are easy to obtain for each day using weather data. If the user does not have access to this information, there are many existing methods to detect time-series change points, such as [29].

A. Least-Squares Methods
In designing our algorithm to solve the least-squares programs in (16) and (17), we use the sequential least-squares (SLSQ) programming solver introduced in [45] and implemented for Python by the SciPy project [36].
For Volt-VAR control, we test the least-squares models (16) and (17) with the first group of loads. All 701 loads have identical per-unit Volt-VAR settings and a nominal voltage of 240 V, but the differing PV sizing results in correspondingly different maximum reactive power capabilities that define the saturation parameters Q 1 and Q 4 [8].
Using δ = 80, we estimate the rightmost curve parameters θ 3 and θ 4 with the unconstrained nonlinear least-squares program (16), -denoted as "UNNLSQ" -and the constrained version (17) -denoted as "CNLLSQ". The performance of the two are shown and compared in Table I and Fig. 9.
We note that the algorithm could be used in tandem with a PV size estimation method such as [28], [29], [31], in which case we could impose near-exact constraints on Q 1 and Q 4 . We make no assumptions about the size of the system, and instead simply treat the minimum observation in f δ (q net ) as a conservative estimate for the PV size.
Twelve customers out of the 701 have insufficient voltage volatility over the year available to allow for the use of this simple approach, i.e., the majority of samples are in the deadband for these loads. This is a key limitation of the least-squares methods and corroborates the need for the more advanced algorithms developed.
For the fourth dataset group with fixed power factor control, we use a regularized version of the least-squares approach and filter with respect to the 90th percentile voltage measurements. The results of this experiment are illustrated in Fig. 10. More details about this experiment and many others, including other loss functions, are described in [37], [38].

B. Disaggregation Model
We implement the MLE algorithm described in Section V using interior-point methods [36], [46], and apply the algorithm in two configurations. The first assumes that V ref is fixed and known, and the second simultaneously estimates V ref , an additional parameter as shown in Fig. 3. This setting is incorporated as an additional optimization variable in the parameter vector Θ. This variable is essential for constructing the inequality constraints AΘ ≤ b according to the IEEE 1547-2018 standard [8]. We train the GMM according to our formulation in Section V-A on the entirety of dataset group three using the popular scikitlearn [47] library. The yielded log-likelihood surface for these data, which is part of the objective function of (30), is shown in Fig. 8.
Empirical histograms of the disaggregation model performance in terms of the standard kVAR deviation of the estimated PV reactive power from the true PV reactive power (RMSE) are shown in Fig. 13. Each observation in these histograms corresponds to the execution of the disaggregation algorithm (33) over the course of an entire year. Note that the algorithm is robust to prior knowledge of V ref . The sample means of the RMSE values for the 451 load datasets studied are shown in Table II along with CNLLSQ.
An example of the disaggregation-based reconstruction is shown from the curve fitting perspective, in Fig. 12, and the Fig. 11. Distributions of control parameters θ 1 , θ 2 , θ 3 , θ 4 for the 451 loads in dataset category 2, each with the corresponding curve φ Θ shown in black. Note that the reactive power settings are given in per unit. The BTM PV of each load has a different apparent power rating, therefore the actual curve settings in kVAR will differ significantly among loads.    [12], [13].

A. Limitations and Future Work
The methodologies presented in this paper have both limitations and opportunities for future work. Notably, the disaggregation algorithm could use an analytical distribution for the likelihood function. This may create opportunities for theoretical guarantees and Bayesian treatments of the method. Further, intuition suggests that the methods described in this paper may be generalizable to other control curves φ. If this is the case, φ Θ would not need to describe an inverter control parameter specifically.
Throughout this paper, we have assumed access to the daytime/nighttime change points for the timeseries data. It is likely that these labels can be readily approximated if the location of the BTM PV is roughly known. If the user of these algorithms has no information about the BTM PV location, the thresholds will need to be estimated through weather data or another data-driven method.
As is the case in the second dataset group, the full range of the curve is not actuated for the majority of loads. While the disaggregation MLE approach is able to predict the exposed regions, there are opportunities for future work to recover the unactuated parameters. The current algorithm is accurate only within the actuated states, as shown in Fig. 12.
Furthermore, we have developed the algorithms of this paper based on 15-minute granularity AMI data, which is the most common according to a very recent industry survey in [48]. However, some utilities may limit their AMI data collection to hourly resolutions due to bandwidth, storage, or other limitations. Therefore, there is an opportunity for future work to investigate the sensitivity of the proposed methods with respect to various measurement resolutions, collection methods, and other practical data issues.

B. Comparison to Other Disaggregation Algorithms
The work we have presented is, to the best of our knowledge, the first solution to estimate the control parameters of BTM PV systems and simultaneously disaggregate inverter reactive power from net load data. However, the effectiveness of our proposed methods can be reasonably verified by qualitatively comparing them to similar works. For instance, the least-squares methods can be compared to our past work [9] on estimating power factor control settings from net load data, where we achieved an average RMSE of 0.194 kVAR. Similarly, the disaggregation-based elements of the presented methods are comparable in performance to existing active power load disaggregation algorithms. For example, reference [11] achieved RMSEs of approximately 450 kW for a 7.5 MW PV site. In the residential setting, references [12], [13] achieved RMSEs for active power in the range of 0-2 kW and 0-10 kW respectively. In our case, we achieved an average RMSE of 0.175 kVAR for residential PV inverters with sizes ranging from 1.35 to 6.5 kVA. There is an opportunity for future work to develop more universal testing benchmarks and datasets for this new area of research to allow for new developments and extensions of this work and others.

VIII. CONCLUSION
We have presented a set of methods to reconstruct unknown reactive power control parameters of solar PV inverters from aggregate AMI measurements, without the need for a distribution circuit model. The first of these algorithms allows for efficient reconstruction via non-linear least-squares based on parameterized control curves and requires manual tuning of a filter. This method was shown to reconstruct the parameters with MAPE values between 0.425-2.870%. To avoid tuning this filter for each individual load, we developed the more flexible net load reactive power disaggregation approach to solve the same problem with the same data inputs.
Interestingly, we demonstrated a link between the problems of reconstructing reactive power control settings and net load reactive power disaggregation. We showed that the two goals can be accomplished in a simultaneous and complementary manner with a mean prediction error of less than 0.2 kVAR. These results are comparable to the performance of existing active power disaggregation methods in the literature.
The tools developed in this paper are useful for modeling, predicting, and verifying the impact of advanced inverters on the distribution grid, without the input of the PV installer or the customer. These methods can also be used to update controller databases on BTM PV control settings regardless of whether they change or are modified in the future.
As advanced inverter and AMI deployment increases, these methods can help utilities and distribution engineers combat incomplete or out-of-date information in their feeder models, assisting the ongoing sustainable energy transition.

APPENDIX [NIGHTTIME INVERTER OPERATION]
If reactive power control is set to continue during the night, the disaggregation problem (33)