Design and Implementation of a Machine Learning State Estimation Model for Unobservable Microgrids

An observable microgrid may become unobservable when sensors are at fault, sensor data is missing, or data has been tampered with by malicious agents. In those cases, state estimation cannot be performed using traditional approaches without pseudo-measurements. To address the lack of observability, this article presents the design and implementation of a novel three-phase state estimation method for unobservable and unbalanced AC microgrids, using machine learning techniques, without pseudo-measurements, and under heteroscedastic (i.e., non-constant variance) noise. The proposed machine learning state estimation (MLSE) makes full use of multiple candidate models trained with a small number of power flow simulations via OpenDSS, with random levels of demand and renewable generation in every simulation, enhanced through a proposed Tikhonov regularization operator. To deal with the heteroscedastic nature of measurements, a recursive average model is proposed to accurately estimate the state variables. Results are obtained using real data from a microgrid located on the main campus of the State University of Campinas (UNICAMP), in Brazil. The method can be easily adapted to microgrids with different configurations, distributed energy resources, and measurements. It is shown that the proposed MLSE outperforms the traditional weighted least squares (WLS) state estimator.


I. INTRODUCTION
State estimation calculates the most likely values of the state variables in an electrical grid from the available real-time measurements. This information is helpful for the identification of erroneous measurements, on-line parameter estimation, cyber-security assessment, and autonomous energy management systems (EMS) [1]. State estimation has been mostly applied in transmission systems due to their high level of digitization [2]. The direct translation of state estimation techniques from transmission systems to the distribution level is not always possible [3], so state estimation in distribution systems and AC microgrids remains challenging [4], [5], [6]. On the other hand, incorporating uncertainty into state estimation models has also become a relevant research topic due to the proliferation of low-cost smart meters in distribution systems as a replacement for phasor measurement units (PMUs). In this case, measurement noise and the uncertainty due to smart meter malfunction are generally unknown and difficult to characterize in real-time applications [7]. The main challenges of applying traditional state estimation in microgrids can be summarized as follows: (1) microgrids are typically unbalanced three-phase AC distribution systems. Hence, there is an increase in the required number of variables and equations, compared to classical approaches; (2) the X/R ratios in microgrids are significantly low. As a result, fast decoupled state estimators [8] cannot be used, and conventional DC state estimators become irrelevant [9]; (3) limited observability with only a few measurements available [10]; (4) topological changes (i.e., transition from main grid-connected to islanded mode, and vice versa) can lead to different operating points in a short amount of time [11]; and (5) PMUs are not as ubiquitous as in transmission systems.

The associate editor coordinating the review of this manuscript and approving it for publication was Diego Oliva. VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this context, considerable efforts have been made to develop state estimators for distribution networks [12]. However, very few works have addressed the development of state estimators for AC microgrids [13], [14], [15], [16]. Most static estimators are based on the classical weighted least squares (WLS) method [17]. In [18], the WLS method was applied to multiple microgrids, resulting in high computational complexity. Other works have focused on the problem of dynamic state estimation for microgrids [19]. For instance, in [20] a modified extended Kalman filter (EKF) [21] was proposed to estimate the dynamic variables of generator units, as well as the static variables related to the network. In [22], a distributed dynamic state estimator was proposed to estimate the operation of the energy resources and the status (connected or islanded) of the microgrid. In [23], two techniques for state estimation in droop-controlled islanded microgrids were proposed, via an unscented Kalman filter (UKF) [24] and a non-linear particle filter. Microgrids usually comprise high-speed conversion systems associated with extremely fast dynamics, on the order of nanoseconds. Thus, dynamic state estimation would require expensive resources for very limited applications. On the other hand, estimating steady-state electrical variables under a scarcity of smart meters, communication networks, and computation technology is a more reasonable and cost-effective approach.
The main differences among the aforementioned works lie in the choice of the state variables and how the measurements are considered. By their choice of state variables, state estimation approaches can be broadly classified into two major groups: (1) node voltage estimators and (2) branch current estimators [12]. Both approaches can be formulated in polar or rectangular coordinates. Regarding data, some methods use raw measurements (i.e., originally measured or forecasted values), and others use equivalent voltage and current measurements [12]. Approaches based on forecasted data are not adequate for microgrids that are able to seamlessly disconnect from the grid and operate in islanded mode in the event of a disturbance. Nevertheless, most state estimation methods for microgrids are based on WLS, using analytical formulations to model the system and truncated iterative methods to solve the non-linear set of equations [25].
Traditional state estimation using WLS is widely used in transmission systems [26], and it has been found to be suitable for observable grid-connected microgrids [27]. The non-linear relationship between the state variables and measurements results in a state-dependent Jacobian matrix that has to be updated at each iteration of the state estimation process. Because of its state-dependent Jacobian (and thus gain) matrix, applying this method to microgrids is computationally expensive [28]. Moreover, WLS requires the meter error variance (usually denoted as σ) as an input parameter to perform the estimation.
Classical WLS approaches consider σ as a known constant. This work addresses the possibility of the measurements having different noise levels. Accurate knowledge of the noise characteristics is a prerequisite for designing a high-performance estimator [3]. However, characterizing noise using low-cost smart meters can be challenging because the sampling rate ranges from minutes to hours [7]. Previous works considered measurements from PMUs, which typically have a sampling rate of more than 50 samples per second, for which noise characterization is feasible [2]. Other works have used machine learning only to assist classical state estimators (as in [29], [30]). However, very few works [31], [32] have addressed the direct use of machine learning state estimation (MLSE) for AC microgrids. State estimation can be considered as a regression problem [33]. Therefore, if the noise (σ) is assumed to be known, constant, and independent and identically distributed (iid) with respect to the parameters (i.e., homoscedastic), it is possible to develop a model that maps the available measurements to the state variables, using supervised learning to train any regression model [33] with a suitable dataset. In this work, the noise is considered input-dependent (i.e., heteroscedastic), which is a more realistic assumption when low-cost smart meters with low sampling rates are used instead of PMUs [7]. Heteroscedastic noise encodes the possibility that the measurements have different variances σ for various real-time observations. Moreover, the proposed approach estimates the state without specifying (i) an initial guess (a.k.a. flat start) or (ii) an explicit definition of the measurement variances (σ), both of which are required by the classic WLS approach.
While many existing methods focus on solving state estimation problems under simplified assumptions, such as homoscedastic measurements, this manuscript argues for the importance of exploring ways to merge simulation and machine learning to solve state estimation under varying conditions, such as heteroscedastic measurements. A primary motivation of this study is the potential for future applications that handle unexpected changes in the dynamic and stochastic nature of microgrids, which can be numerically intractable using classical approaches. The proposed MLSE uses a synaptic matrix of weights $W$ that captures the functional relationship between state variables and the available measurements, in order to estimate the state variables accurately. The training model is based on the Moore-Penrose left pseudo-inverse, enhanced through a Tikhonov regularization. An MLSE that is robust against heteroscedastic uncertainty is obtained using an approach named in this paper the recursive average model. AC power flow simulations via OpenDSS [34] are used as the source of knowledge. Results are obtained using data from a real-world microgrid located at the main campus of the State University of Campinas (UNICAMP), in São Paulo, Brazil. An overview of the microgrid is shown in Fig. 1 [35]. The performance and accuracy of the proposed MLSE are analyzed during grid-connected and islanded operation modes with noisy measurements, each with a different signal-to-noise ratio (SNR). In summary, the contributions of this paper are as follows:
• A novel MLSE for unobservable three-phase unbalanced AC microgrids that does not require previous knowledge of the meter error variance σ to deal with the heteroscedastic nature of measurements;
• A proposed regularization operator that captures the functional relationship between state variables and available measurements;
• A novel learning approach named recursive average model for robustness against heteroscedastic uncertainty.

II. CLASSICAL STATE ESTIMATION
Given a set of $m$ available noisy measurements $z = (z_1, z_2, \ldots, z_m)^T$, state estimation determines the most likely $n$ state variables $\hat{x} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n)^T$ of the microgrid.
In this case, the required state variables are the voltage magnitudes ($V_{i,p}$) and phase voltage angles ($\theta_{i,p}$), where $i$ is the bus number, given by $i \in \{1, 2, 3, \ldots, n_b\} \equiv B$, $n_b$ is the total number of buses, and $p$ is the phase defined by $p \in \{a, b, c\}$. Thus, the set of state variables consists of $3n_b$ voltage magnitudes plus $3(n_b - 1)$ phase voltage angles. Note that an estimation of $x$ is sufficient to determine all remaining electrical variables [25]. Set $z$ contains different types of measurements, such as active power injection ($P_{i,p}$), reactive power injection ($Q_{i,p}$), current magnitude injection ($I_{i,p}$), active power flow ($P_{ij,p}$), reactive power flow ($Q_{ij,p}$), and current magnitude ($I_{ij,p}$), among others, where $j \in B$ and $ij$ is a branch. Thus, given a set of measurements $z$, an analytical expression that relates the estimated state variables $\hat{x}$ to $z$ is given by (1).
In (1), $\hat{x} \in \mathbb{R}^n$ contains the estimated state variables, $z \in \mathbb{R}^m$ contains the available measurements, and $e \in \mathbb{R}^m$ is the added noise for each measurement. Note that if the number of real-time measurements $z$ is lower than the number of required state variables $\hat{x}$, observability might not be achievable and, as a consequence, it is impossible to calculate the state variables without using pseudo-measurements [9]. The minimum number of measurements $m_{\min}$ needed for the method to work is $m_{\min} = 2n - k$, where $k$ is the number of defined slack buses. Here, assuming the microgrid contains $n$ buses, it is described by $2n$ variables, namely $n$ voltage magnitudes and $n$ voltage angles, so the number of state variables is $2n - k$. However, to improve accuracy, the number of redundant measurements should be higher. In practice, a value of $m \approx 4n$ is often considered reasonable [9]. Thus, (1) can be written as:

$$z = h(x) + e \qquad (2)$$

where the vector $h(x)$ contains the functions that map the state variables to the measurements, and the stochastic noise $e$ is modeled as multivariate Gaussian, $e \sim \mathcal{N}(0, \Sigma_e)$. The solution of the state estimation problem using the WLS approach is given by (3), where $i \in \{1, 2, 3, \ldots, m\}$ indexes the measurements used in the state estimation and $W_i$ is the weighting factor obtained from the stochastic noise model of each measurement. This noise is considered to be constant white Gaussian noise with zero mean. Thus, $W_i = \sigma_i^{-2}$, and $r_i$ are the residuals computed as in (4). In (3), the presence of different noise parameters $\sigma$ in the objective function implies that dissimilar evaluations of the same set of measurements $z$ will lead to erroneous state values $\hat{x}$.
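For reference, the textbook WLS formulation that matches the definitions of $W_i$ and $r_i$ above can be sketched as follows. This is the standard form and is only assumed to coincide with (3) and (4); the Jacobian $H$ and the iteration index $\ell$ are symbols introduced here for illustration.

```latex
\begin{align}
\hat{x} &= \arg\min_{x} \; J(x), \qquad J(x) = \sum_{i=1}^{m} W_i \, r_i^{2}, \qquad W_i = \sigma_i^{-2}, \\
r_i &= z_i - h_i(x), \\
x^{(\ell+1)} &= x^{(\ell)} + \left( H^{T} W H \right)^{-1} H^{T} W \, r\!\left(x^{(\ell)}\right),
\qquad H = \left. \frac{\partial h}{\partial x} \right|_{x^{(\ell)}}, \quad W = \operatorname{diag}(W_1, \ldots, W_m).
\end{align}
```

The last line is the usual Gauss-Newton update; it makes visible why the state-dependent Jacobian and gain matrix $H^{T} W H$ must be rebuilt at every iteration, and why each $\sigma_i$ has to be supplied in advance.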

III. DATA GENERATION
Training a machine learning model with heteroscedastic noise measurements introduces uncertainty in the machine learning model. In order to deal with this challenge, the following data generation process is performed. From the statistical perspective, the state variables form a random vector $x$ obtained from random variables that follow a multivariate normal distribution $X \sim \mathcal{N}(\mu_x, \Sigma_x)$. In this case, given $x$, it is possible to obtain an $m$-tuple of noiseless measurements $h(x)$ using well-known analytical or computational microgrid models $h(\cdot)$, such as those in [36]. In practice, the information about the parameters that define the multivariate normal distribution $X$ is not always available. However, using samples generated from power flow simulations, it is possible to train the machine learning model $W_{D_{\mathrm{train}},\lambda_{\mathrm{best}}}$ to capture the non-linear relationship between the state variables $x$ and the measurements $z$. The subscripts $D_{\mathrm{train}}$ and $\lambda_{\mathrm{best}}$ refer to the training dataset and the regularization hyperparameter used to fit the model $W$. This will be explained extensively in the next sections. In this paper, the microgrid was first modeled in OpenDSS [34] to generate a noiseless dataset¹, using traditional power flow simulations, with random levels of demand and renewable generation in every event $\delta$. As shown in Fig. 2, 1500 events ($\delta = 1500$) were generated.
For numerical stability, a normalization process is applied so that all the state variables $x$ in the generated data $D_\infty$ lie between 0 and 1.
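A minimal sketch of such a normalization is given below. It assumes min-max scaling applied per state variable, which the text does not specify; the function names `min_max_normalize` and `denormalize` are hypothetical.

```python
def min_max_normalize(columns):
    """Scale each state variable (one list of samples per variable) to [0, 1].

    Assumption: scaling is done independently per variable (min-max).
    """
    scaled, bounds = [], []
    for values in columns:
        lo, hi = min(values), max(values)
        span = hi - lo
        # A constant variable has zero span; map it to 0.0 to avoid dividing by zero.
        scaled.append([(v - lo) / span if span else 0.0 for v in values])
        bounds.append((lo, hi))  # kept so estimates can be mapped back to physical units
    return scaled, bounds


def denormalize(scaled_value, bound):
    """Invert the scaling for a single value."""
    lo, hi = bound
    return lo + scaled_value * (hi - lo)
```

Keeping the per-variable bounds is a design choice: the trained model then works entirely in the normalized space, and its outputs are converted back to voltages and angles only at the end.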

A. HETEROSCEDASTIC NOISE MEASUREMENTS
The noise used in the proposed MLSE is heteroscedastic and is defined as in (5). To transform the noiseless dataset $D_\infty$ into heteroscedastic noise measurements, a signal-to-noise ratio (SNR) ranging from 20 to 40 dB was defined, with increments of 1 dB, based on previous publications [37], [38], [39], [40], [41], [42] that worked with state estimation over homoscedastic measurements. The SNR quantifies the measurement error for each measurement using (6).
The error $e_m \overset{\mathrm{iid}}{\sim} \mathcal{N}(0, \sigma_m^2)$ used to pollute each measurement $z_m$ is injected as in (5). Note that the state variables $x$, computed in $D_\infty$, correspond to a unique microgrid operating point despite the fact that the measurements $z$ are randomly perturbed, which increases the uncertainty in each measurement $z_m$. The set $D_{\mathrm{set}} = \{D_\infty, D_{\mathrm{SNR}20}, D_{\mathrm{SNR}21}, \ldots, D_{\mathrm{SNR}40}\}$, which contains several noisy datasets, eliminates the need to specify the nature of the measurement error $e(z)$ during the training process. As shown in Fig. 2, a stratified sampling strategy is applied to the available data $D_{\mathrm{set}}$ to select a test set $D_{\mathrm{test}}$ of 500 records at random [43]. Then, a second stratified sampling step divides the rest of the set into a training set $D_{\mathrm{train}}$ of 500 records and a validation set $D_{\mathrm{val}}$ of 500 records. In this case, the available datasets are: (i) $D_{\text{train-set}} = \{D_{\mathrm{train},\infty}, D_{\mathrm{train},20}, \ldots, D_{\mathrm{train},40}\}$, (ii) $D_{\text{val-set}} = \{D_{\mathrm{val},\infty}, D_{\mathrm{val},20}, \ldots, D_{\mathrm{val},40}\}$, and (iii) $D_{\text{test-set}} = \{D_{\mathrm{test},\infty}, D_{\mathrm{test},20}, \ldots, D_{\mathrm{test},40}\}$. Due to the noisy heteroscedastic nature of the measurements, traditional multi-output linear regression methods without regularization are inadequate because different measurements could return the same state variables. Other well-known approaches that use classical regularization methods, such as Ridge, Lasso, and Elastic Net regression, were tested using the Scikit-learn [44] Python library without achieving good results. Therefore, the next section is concerned with the computation of a machine learning model $W_{D_{\mathrm{train}},\lambda_{\mathrm{best}}}$ able to provide an approximate solution for estimating the state variables from noisy heteroscedastic measurements in unobservable microgrids, since there is no analytical equation to solve this problem.

¹ In this work, the notation ∞ indicates noiseless measurements, i.e., signal-to-noise ratio (SNR) → ∞. The notation $(\cdot)_{i=1}^{\delta}$ is used to indicate that $i$ varies from 1 to $\delta \in \mathbb{N}$.
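The noise injection described above can be sketched as follows. Because (5) and (6) are not reproduced here, the SNR-to-variance mapping is an assumption: the amplitude-ratio definition $\mathrm{SNR}_{\mathrm{dB}} = 20 \log_{10}(|z_m| / \sigma_m)$ is used, and all function names are hypothetical.

```python
import random


def sigma_from_snr(value, snr_db):
    """Noise standard deviation for one measurement.

    Assumption: SNR_dB = 20 * log10(|z_m| / sigma_m), so sigma depends on the
    measurement magnitude (input-dependent, i.e., heteroscedastic).
    """
    return abs(value) / (10.0 ** (snr_db / 20.0))


def add_heteroscedastic_noise(measurements, snr_db, rng):
    """Return a noisy copy; each measurement is perturbed with its own sigma."""
    return [z + rng.gauss(0.0, sigma_from_snr(z, snr_db)) for z in measurements]


def build_noisy_datasets(noiseless_events, snr_levels=range(20, 41), seed=0):
    """Map each SNR level (20 to 40 dB in 1 dB steps) to a noisy copy of every event."""
    rng = random.Random(seed)
    return {
        snr: [add_heteroscedastic_noise(event, snr, rng) for event in noiseless_events]
        for snr in snr_levels
    }
```

Only the measurements are perturbed; the state variables paired with each event stay at their noiseless power flow values, which mirrors the remark that every event corresponds to a unique operating point.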

IV. PROPOSED MLSE AND REGULARIZATION PROCESS
The analysis explained for $D_{\mathrm{train}}$ and $D_{\mathrm{val}}$ is the same for all the available datasets, one SNR level at a time. As shown in Fig. 2, a tuple of training and validation datasets ($D_{\mathrm{train}}$, $D_{\mathrm{val}}$) is selected for a given SNR level. In general, a machine learning problem can be solved using the following three steps [45]:
1) Define the machine learning model, for instance, single-layer networks, feed-forward neural networks, recurrent neural networks, etc.
2) Define a metric to evaluate the learning process, such as mean squared error (MSE), cross entropy, etc.
3) Define an optimization algorithm to improve the machine learning model, such as least squares, gradient descent, etc.

A. MACHINE LEARNING MODEL
Unlike classical WLS approaches, which perform an iterative optimization process, the proposed MLSE is based on a geometric point of view, in which machine learning is used to fit a parametric model matrix $W$ that projects the information of the available measurements onto the state variables. It is possible to use the least squares method to find a projection operator $W_{D_{\mathrm{train}},\lambda_{\mathrm{best}}}$, which is a matrix whose dimensions depend on the number $n$ of state variables and on $m + 1$ measurement terms, as given in (7), where the regression parameters of the model, $w_{n,m} \in \mathbb{R}$, are not known and must be estimated from a training dataset $D_{\mathrm{train}}$. In this case, $n \times (m + 1)$ regression parameters are used to estimate $n$ state variables from $m$ available measurements, in which the regression parameters $(w_{i,0})_{i=1}^{n}$ correspond to the fixed offset for a basis function, a.k.a. the intercept [45]. In this work, the machine learning model $W_{D_{\mathrm{train}},\lambda_{\mathrm{best}}}$ was defined as a single-layer network, since it has no memory, i.e., the estimated state variables depend only on the current measurements. In practice, $W_{D_{\mathrm{train}},\lambda_{\mathrm{best}}}$ performs an estimation of the state variables from the available measurements using equation (8).
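The estimation step can be sketched as a single matrix-vector product. The sketch below assumes that (8) has the form $\hat{x} = W [1, z^T]^T$, with the leading 1 multiplying the intercept column; the function name `estimate_state` is hypothetical.

```python
def estimate_state(W, z):
    """Estimate n state variables from m measurements.

    W is an n x (m + 1) weight matrix given as a list of rows; column 0 holds
    the intercepts w_{i,0} (assumed layout).
    """
    augmented = [1.0] + list(z)  # leading 1.0 multiplies the intercept column
    if any(len(row) != len(augmented) for row in W):
        raise ValueError("each row of W needs m + 1 entries")
    return [sum(w * a for w, a in zip(row, augmented)) for row in W]
```

No initial guess and no iteration are involved: once $W$ is trained, each new measurement vector is mapped to a state estimate in one pass.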

B. LEAST-SQUARE OPTIMIZATION MODEL
In state estimation, the estimated state variables $\hat{x}$ are required to be as close as possible to the real values of $x$ for all feasible scenarios of microgrid operation, i.e., $\forall\, \delta$, where $\delta$ is the number of available events in the training dataset $D_{\mathrm{train}}$. To do so, the well-regarded mean squared error (MSE) [46] is used as the metric to train and validate the proposed model, as defined in (9), where $E[\cdot]$ denotes the expected value and $\|\cdot\|_2^2$ is the squared Frobenius norm. Since the MSE can be assessed as a measure of the aggregated contributions of all estimated variables, (9) can be rewritten as (10). Given that the set of events obtained from $\delta$ AC power flow simulations is finite, the expected value can be approximated by the sample mean in (11), which is the same metric used to evaluate linear least-squares models in estimation theory and in linear regression models [47]. From equation (11), a natural method to determine the best estimation of $\hat{x}$ is to obtain the matrix $W_{D_{\mathrm{train}},\lambda_{\mathrm{best}}}$ that minimizes the MSE applied to the difference between the true state variables $x$ and the estimated counterparts $\hat{x}$, as stated in (12) and (13). Throughout this paper, superscript $T$ denotes transposition; hence, equation (13) can be summarized in the compact form (14). Replacing (14) in (12) yields (15), which can be solved as in (16). A possible closed-form solution for (16), known as the Moore-Penrose left pseudo-inverse [48], is shown in (17). However, this is not a feasible solution, since the Gramian matrix that must be inverted in (17) is singular without a regularization process, as explained in the next section.
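The chain from the sample MSE to the pseudo-inverse solution follows the standard linear least-squares pattern. The sketch below uses assumed notation (the symbols $\tilde{z}$, $\Phi$, and $X$ are introduced here for illustration and may differ from the symbols used in (9) to (17)).

```latex
\begin{align}
\tilde{z}^{(d)} &= \begin{bmatrix} 1 \\ z^{(d)} \end{bmatrix}, \qquad
\mathrm{MSE}(W) \approx \frac{1}{\delta} \sum_{d=1}^{\delta} \left\| x^{(d)} - W \tilde{z}^{(d)} \right\|_2^2 , \\
\Phi &= \begin{bmatrix} \tilde{z}^{(1)} & \cdots & \tilde{z}^{(\delta)} \end{bmatrix}^{T} \in \mathbb{R}^{\delta \times (m+1)}, \qquad
X = \begin{bmatrix} x^{(1)} & \cdots & x^{(\delta)} \end{bmatrix}^{T} \in \mathbb{R}^{\delta \times n}, \\
W^{T} &= \left( \Phi^{T} \Phi \right)^{-1} \Phi^{T} X .
\end{align}
```

The last expression exists only if the Gramian $\Phi^{T} \Phi$ is invertible. With strongly correlated or redundant measurements, some of its eigenvalues are zero or nearly zero, which is the singularity that motivates the regularization discussed next.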

1) APPLICATION OF THE TIKHONOV REGULARIZATION
In practice, the linear system of equations in (13) is an ill-posed problem because it does not meet the following three Hadamard criteria for well-posedness: (i) for all admissible data, a solution exists, (ii) for all admissible data, the solution is unique, and (iii) the solution depends continuously on the data [49]. Thus, small perturbations in the measurements generate large errors in $\hat{x}$ when the Moore-Penrose left pseudo-inverse (i.e., equation (17)) is used to fit the parametric model $W_{D_{\mathrm{train}}}$ without regularization. This solution is useless due to the severe propagation of large errors, caused by the large norm of the matrix $W_{D_{\mathrm{train}}}$. A matrix with a large norm is called an ill-conditioned matrix. An ill-conditioned matrix can take a unit-length vector and stretch it by a large amount. Thus, small uncertainties in the domain vector get magnified and lead to large uncertainties in the range. Linear systems of equations, such as (13), with a matrix of weights $W_{D_{\mathrm{train}}}$, are often referred to as linear discrete ill-posed problems [50]. The standard way to obtain stable solutions is to modify the problem by replacing (17) with a nearby problem whose solution is less sensitive to large errors. This replacement is known as regularization [51], [52], [53]. While many methods focus on the use of zeroth-order Tikhonov regularization without considering $w_{n,0}$ [54], such as the Ridge regression model, this work proposes a new regularization operator $R$ that adds terms containing regularization parameters to (17) in order to control the norm of $W_{D_{\mathrm{train}}}$ and obtain a stable estimation, as shown in Fig. 3. The remainder of this section describes the regularization process shown in Fig. 3, which details how the best regularization hyperparameter $\lambda$ is selected. This selection is of central importance when equation (18) is used instead of the non-regularized least squares solution in equation (17), aiming at properly characterizing $W_{D_{\mathrm{train}},\lambda}$ for a given set of noisy measurements.
To the best of the authors' knowledge, the regularization operator $R$ presented in this work has not been proposed before. The solution is obtained in closed form, similar to the one based on the pseudo-inverse, as given in (18). The design of the regularization is governed by available knowledge or model assumptions about the state estimation (i.e., prior information) and, to a lesser extent, by the implementation or computational complexity. In the context of linear discrete ill-posed problems, $\lambda$ in equation (18) is known as the Tikhonov regularization hyperparameter, and $R$ is known as the regularization operator [55]. The idea is to shift the spectrum of the Gramian matrix, i.e., to shift the singular values away from zero. A singular value is the positive square root of an eigenvalue of the symmetric Gramian matrix. Note that the Gramian matrix in equation (17) is invertible if and only if all of its singular values are non-zero. It is worth mentioning that $\lambda > 0$ because it serves as the coefficient that shifts the diagonal of the Gramian matrix, a.k.a. the singular moment matrix [56]. Parameter $\lambda$ provides a balance between data fidelity (the first term, i.e., the Gramian matrix) and prior information assumptions (the remaining terms, which involve $R$). Note that, with $\lambda = 0$, equation (18) is equal to (17), making the matrix to be inverted singular again. The proposed MLSE regularization process is shown in Fig. 3. A suitable value of $\lambda$ is generally not known explicitly and must be determined using the validation dataset $D_{\mathrm{val}}$ during the training process. To do so, a $\lambda$ search space with dissimilar values must be defined as $\lambda \in \{\lambda_1, \lambda_2, \ldots, \lambda_{1000}\}$. In this work, equation (18) is used to train candidate MLSE models with different values of $\lambda$, creating the following model search space: $W_{D_{\mathrm{train}},\lambda} \in \{W_{D_{\mathrm{train}},\lambda_1}, W_{D_{\mathrm{train}},\lambda_2}, \ldots, W_{D_{\mathrm{train}},\lambda_{1000}}\}$. To determine the best MLSE model $W_{D_{\mathrm{train}},\lambda_{\mathrm{best}}}$, obtained with a specific Tikhonov regularization hyperparameter $\lambda_{\mathrm{best}}$, the L-curve search methodology is performed [57].
The L-curve is a log-log plot of the norm of a regularized solution, $\|R W_\lambda^T\|_2^2$, versus the norm of the corresponding residual, $\|x^{(\delta)} - \hat{x}^{(\delta)}\|_2^2$. It is a convenient graphical tool for characterizing the trade-off between the size of regularized solutions and their fit to the training dataset as the hyperparameter $\lambda$ changes. Graphically, the best $\lambda$ is the one located at the corner of the L-curve, as shown in Fig. 3. In this work, it was empirically determined that the regularization operator $R$ in equation (19) yields a stable solution for the non-linear three-phase state estimation problem in unbalanced AC microgrids, where $R \in \mathbb{R}^{(m+1) \times (m+1)}$.
The regularization operator $R$ is inspired by the work in [58]. In this case, $R$ plays the role of a penalizing filter, which is achieved by considering the anti-reflective and high-order boundary conditions [59]. These conditions were introduced in [60] and [61], respectively. The boundary conditions of $R$ are determined by the first and last rows, based on first-order Tikhonov regularization [62], which penalizes deviations from a constant model, i.e., it favors "flat" (constant) solutions and penalizes gradients, working as a first-difference operator at the boundaries. On the other hand, the interior rows (rows 2 to $m$) of $R$ penalize model "roughness" or bumpiness (curvature) rather than model gradient, based on second-order Tikhonov regularization [62]. This favors "smooth" (constant gradient) solutions, working as a second-difference operator.
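The regularized training step can be sketched in dependency-free Python as shown below. Several assumptions are made and should be read as such: the closed form is taken to be the generalized Tikhonov solution $W^T = (\Phi^T \Phi + \lambda R^T R)^{-1} \Phi^T X$, with $\Phi$ holding one row $[1, z^T]$ per event; the operator built by `difference_operator` only follows the verbal description (first-difference rows at the boundaries, second-difference rows in the interior) and is not the exact matrix of (19); and the L-curve corner search is replaced by a plain grid search on the validation MSE. All function names are hypothetical.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def difference_operator(size):
    """Assumed shape of R: first differences on the first/last row, second differences inside."""
    R = [[0.0] * size for _ in range(size)]
    R[0][0], R[0][1] = 1.0, -1.0
    R[-1][-2], R[-1][-1] = -1.0, 1.0
    for i in range(1, size - 1):
        R[i][i - 1], R[i][i], R[i][i + 1] = 1.0, -2.0, 1.0
    return R


def fit_tikhonov(Z, X, lam, R):
    """Return W (n x (m+1)) from W^T = (Phi^T Phi + lam R^T R)^{-1} Phi^T X."""
    Phi = [[1.0] + list(z) for z in Z]  # prepend 1.0 for the intercept
    Pt = transpose(Phi)
    gram = matmul(Pt, Phi)
    RtR = matmul(transpose(R), R)
    A = [[g + lam * r for g, r in zip(g_row, r_row)] for g_row, r_row in zip(gram, RtR)]
    return transpose(solve(A, matmul(Pt, X)))


def mse(W, Z, X):
    """Sample mean of the squared estimation error over all events."""
    total = 0.0
    for z, x in zip(Z, X):
        a = [1.0] + list(z)
        total += sum((xi - sum(w * v for w, v in zip(row, a))) ** 2 for row, xi in zip(W, x))
    return total / len(Z)


def select_lambda(train, val, lambdas):
    """Simplification: pick the lambda with the lowest validation MSE (not the L-curve corner)."""
    R = difference_operator(len(train[0][0]) + 1)
    scored = [(mse(fit_tikhonov(*train, lam, R), *val), lam) for lam in lambdas]
    return min(scored)[1]
```

With two identical measurement columns the Gramian is singular, so `fit_tikhonov` with `lam = 0.0` fails inside `solve`, whereas any small positive `lam` yields a usable model. This illustrates the role of the term $\lambda R^T R$ in shifting the spectrum away from zero.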

C. RECURSIVE AVERAGE MODEL
As shown in Fig. 2, instead of using multiple models $W_{\mathrm{SNR},\lambda_{\mathrm{best}}}$, one for each SNR level, a novel recursive average model is proposed to generate a single feasible model $\bar{W}$ for the complete SNR interval of noisy measurements, as follows:
1) Train a model $W_{\mathrm{SNR},\lambda_{\mathrm{best}}}$ using the MLSE regularization process of Fig. 3 with each pair of available training and validation datasets for the different SNR levels, obtained as in Fig. 2.
2) Use equation (20) to average the models recursively, as explained in Fig. 4:

$$\bar{W}_{k+1} = \frac{1}{2}\left( W_{\mathrm{SNR},\lambda_{\mathrm{best}}} + \bar{W}_{k} \right) \quad \forall k \in \mathbb{N} \qquad (20)$$
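The recursion in (20) can be sketched as a simple fold over the per-SNR models. The initialization is an assumption (the first model in the sequence is used as the starting average, since Fig. 4 is not reproduced here), and the function name `recursive_average` is hypothetical.

```python
def recursive_average(models):
    """Fold per-SNR weight matrices into one matrix via W_bar <- (W + W_bar) / 2."""
    models = list(models)
    if not models:
        raise ValueError("at least one model is required")
    # Assumption: the recursion starts from the first model in the sequence.
    w_bar = [list(row) for row in models[0]]
    for W in models[1:]:
        w_bar = [
            [0.5 * (w + b) for w, b in zip(row, bar_row)]
            for row, bar_row in zip(W, w_bar)
        ]
    return w_bar
```

Note that this update is not the arithmetic mean of all models: each step halves the weight of everything seen so far, so models processed later contribute more, and the result depends on the order in which the SNR levels are visited.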

V. MATERIALS AND METHODS
This section first describes the microgrid system used as a case study. Then, Section V-A explains the mathematical background of the hypothesis tests and the criteria used to validate the proposed MLSE model. Numerical results obtained for all the different scenarios are presented and comprehensively discussed in Section VI. A real-world AC microgrid located at the main campus of the State University of Campinas (UNICAMP), in São Paulo, Brazil [35], with 320 state variables ($V_{i,p}$ and $\theta_{i,p}$) and 198 measurements ($P_{i,p}$, $Q_{i,p}$, $I_{i,p}$) was modeled using OpenDSS [34], and the proposed MLSE was programmed in Python 3.8 [63] following the workflow in Fig. 5. From the WLS perspective, this case study is unobservable without using pseudo-measurements.

A. STATISTICAL ANALYSIS
To analyze the accuracy of the proposed MLSE, a box-plot diagram is computed using the available $D_{\text{test-set}}$ for each SNR, as shown in Fig. 2. The results of the estimation are compared against the actual power flow calculations using (9). In this case, two statistical tests are performed. First, a hypothesis test on the mean value is performed with the purpose of evaluating whether the estimated state variables at each SNR level are statistically equal to the reference given by the power flow simulations. In the second test, known as the homogeneity test, each state variable $x$ is analyzed separately with the intention of evaluating whether the set of estimated values $\hat{x}$ (using different noise levels) has the same distribution.

1) TEST OF THE MEAN
According to [64], a hypothesis is a statement about the parameters of one or more populations. In this case, one hypothesis indicates that there is no difference between the estimated state variables and the reference values of the test dataset, while the other hypothesis indicates that they are different. Here, $H_0$ (no difference) is the null hypothesis and $H_1$ (a difference exists) is the alternative hypothesis.
To test whether the null hypothesis is true or not, the t-score was chosen as the test statistic since the standard deviation of the population is unknown. A critical region was computed, following the procedure described in [64], with a significance level of 0.05 (the probability of rejecting $H_0$ when it is true). $H_0$ is rejected when the t-score computed for $\hat{x}$ lies outside the interval delimited by the critical values. Otherwise, the test fails to reject the null hypothesis, and there is no statistical evidence that $\hat{x}$ and $x$ differ.
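A minimal sketch of such a test is shown below. It assumes a paired formulation, i.e., a one-sample t-score on the differences between estimates and references, which the text does not state explicitly; the default critical value of 1.974 is the one reported with the numerical results, and the function names are hypothetical.

```python
import math
import statistics


def paired_t_score(estimates, references):
    """t-score of the mean difference between estimates and references.

    Assumption: the test is paired, so it reduces to a one-sample test on the
    differences with H0: mean difference = 0.
    """
    diffs = [e - r for e, r in zip(estimates, references)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0.0:
        # Degenerate case: all differences identical.
        return 0.0 if mean_diff == 0.0 else math.copysign(math.inf, mean_diff)
    return mean_diff / (sd / math.sqrt(len(diffs)))


def rejects_null(t_score, critical_value=1.974):
    """Two-sided decision: reject H0 when |t| exceeds the critical value."""
    return abs(t_score) > critical_value
```

For example, estimates that scatter symmetrically around the reference give a t-score near zero and are not rejected, while a systematic offset of the same size as the scatter quickly pushes the t-score beyond the critical value.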

2) HOMOGENEITY TEST
On the other hand, the homogeneity test evaluates $n$ populations of interest, divided into $k$ categories. The chi-squared method was performed by processing the residuals to find out whether the estimated state variables $\hat{x}$ contain errors.
Under the null hypothesis, the populations share the same distribution, whereas the alternative hypothesis states that at least one of them differs. If the null hypothesis is not rejected, one could conclude that there are no errors in the estimated state variables. The critical region and $\chi^2$ were computed using the hypothesis test functions of Python software [63].
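The chi-squared statistic behind a homogeneity test can be sketched as follows. How the residuals are binned into categories is not described in the text, so the contingency table layout (one row per population, e.g., per SNR level, and one column per residual category) is an assumption, and the function name is hypothetical.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table.

    Assumed layout: rows are populations, columns are categories of binned residuals.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if every population had the same category proportions.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof
```

The statistic is then compared with the critical value of the chi-squared distribution for the returned degrees of freedom at the chosen significance level; a value below the critical one means homogeneity is not rejected.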

VI. NUMERICAL RESULTS
In all experiments, the basic metric used to assess the performance of the MLSE is the mean squared error (MSE) between the true value of the state variables $x$ (i.e., ground truth) and the output offered by the MLSE model ($\hat{x}$). As shown in Fig. 6, the best $\lambda$ was 0.0004, using a search space that ranges from 1e-10 to 1e+10.
The results using the D test-set and the Machine Learning State Estimation (MLSE) shown in Fig. 7 are summarized in Fig. 8 to Fig. 11. These figures are organized as follows: Boxplots show the MSE for all the estimated state variables, using the proposed MLSE model. The boxplot with label ''∞'' shows the performance of the MLSE model with noiseless measurements.
In this case, the best MSE using noiseless measurements $z$ was approximately 1e-5, as shown in Figs. 8 to 11. The worst MSE was approximately 1e-4. These results suggest that the MLSE model is adequate for heteroscedastic noisy measurements. This is in contrast with traditional WLS state estimation approaches previously tested at the University of Campinas, which reported about 1e-3 using homoscedastic measurements and pseudo-measurements to deal with the lack of observability (cf. [65] for a broader discussion on this).
Regarding the test of the mean, the t-values are presented in Figs. 12-13 (top). It can be seen that all t-values are close to zero. This means that no t-values fall within the rejection regions determined by the critical t-value of ±1.974. Instead, the t-values of all the estimated state variables lie inside the interval −1.974 < t-value < 1.974. This shows that the reference state variables and the estimated state variables are statistically equivalent at the chosen significance level. Note that, since the t-distribution has zero mean, the t-values can be either positive or negative.
In addition, the p-values are shown in Figs. 12-13 (middle). They quantify the probability of observing a test statistic at least as extreme as the one obtained, in the direction of the alternative hypothesis, assuming that the null hypothesis is true. In a nutshell, if all p-values are larger than the chosen threshold (5%), the observed differences are compatible with mere chance, and the null hypothesis of equal population means is not rejected. In this case, all p-values were larger than the threshold. Consequently, the test fails to reject the null hypothesis, i.e., the estimated state variables $\hat{x}$ should not be considered different from the ground truth $x$ for any $V_{i,p}$ or $\theta_{i,p}$.
Finally, the distribution of the p-values is presented in Figs. 12-13 (bottom), to show how consistently the test concludes that the estimated state variables are equivalent to the reference state variables. In this work, the deployed test of the mean concludes that the estimated and reference state variables are equivalent, with a confidence level of 95%.
In order to test whether the estimated state variables and the references have the same distribution, the homogeneity test explained in Section V-A was deployed. In this case, all tests return p-values greater than α = 0.01, as shown in Fig. 14. Thus, at a significance level of 1%, there is no evidence to conclude that the distribution of the reference variables is different from the distribution of the estimated state variables.

A. COMPUTATIONAL COMPLEXITY ANALYSIS
To characterize the running time of the proposed MLSE algorithm, its asymptotic execution time is used as a fundamental measure of computational complexity. In this paper, the asymptotic notations Θ and O are used to describe the worst-case time complexity of the MLSE. To be consistent with the notation in equation (8) and Fig. 7, the entry of the matrix W corresponding to the i-th row and j-th column is denoted as W[i, j]. Similarly, the i-th entry of the measurement vector z, of length m, is denoted as z[i]. Thus, the i-th entry of the vector of estimated state variables x̂ of length n, for 0 ≤ i < n, is defined as follows:
$$\hat{x}[i] = \sum_{j=0}^{m-1} W[i, j]\, z[j]$$
Assuming that each operation can be done in O(1), computing one entry of x̂ requires m multiply-add operations, and computing all n entries requires nm of them. This implies that the worst-case running time of the proposed MLSE is Θ(nm).
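The operation count behind this bound can be illustrated with a short sketch. It assumes that the estimate is the matrix-vector product x̂ = W z suggested by the W[i, j] and z[i] notation; the function name `estimate_state` and the numerical values are hypothetical, and the explicit loops are written for counting purposes rather than speed.

```python
def estimate_state(W, z):
    """Return (x_hat, ops) for x_hat = W z, counting one op per multiply-add."""
    n, m = len(W), len(z)
    x_hat = [0.0] * n
    ops = 0
    for i in range(n):        # one pass per estimated state variable
        acc = 0.0
        for j in range(m):    # one multiply-add per measurement
            acc += W[i][j] * z[j]
            ops += 1
        x_hat[i] = acc
    return x_hat, ops


# Hypothetical trained weights (n = 2 states, m = 3 measurements).
W = [[0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
z = [1.0, 2.0, 3.0]

x_hat, ops = estimate_state(W, z)
print(x_hat, ops)  # ops equals n * m
```

The counter always equals n·m regardless of the measurement values, which is why the best and worst cases coincide for a single non-iterative pass.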
The advantages of the MLSE over the methods in the literature are: (i) reduced implementation complexity, i.e., the formulation yields a single constant matrix based on the trained weights, resulting in a non-iterative state estimation model. Hence, a solution is obtained in a single pass. This contrasts with the traditional WLS state estimation approach, which requires multiple iterations. Consequently, a significant reduction in computational time is observed; (ii) versatility, i.e., the proposed approach can be used with any set of measurements. Furthermore, it is possible to run parallel implementations with models trained on different groups of measurements in order to improve the reliability of the proposed approach; (iii) the MLSE can perform state estimation in unobservable microgrids without pseudo-measurements, while Newton-like approaches cannot; (iv) the MLSE exhibits stable behavior under heteroscedastic noise without explicit knowledge of the measurement variances (σ), while classical approaches require computing the variances before performing state estimation.

VII. CONCLUSION AND DISCUSSION
The proposed machine learning state estimation achieves high performance with heteroscedastic noisy measurements and is suitable for unobservable microgrids with low-cost smart meters. The MLSE model requires only a small number of samples for training: only 500 power flow simulations were needed, in contrast with the approaches in [31] and [32], which required 10 000 and 12 000 events, respectively, to train a suitable model. In addition, the proposed approach can handle heteroscedastic uncertainty in the measurements. The proposed MLSE approach estimates the state without specifying (i) an initial guess of the estimation point (a.k.a. flat start), or (ii) an explicit definition of the measurement variances (σ), as is the case in the classic WLS approach. The MLSE is therefore relevant for expanding existing knowledge in the state estimation area for three-phase, unbalanced, and unobservable microgrid systems that can operate in grid-connected or islanded mode. In this paper, gross errors in the available measurements are not considered. Future work will explore how to incorporate a pre-processing stage into the MLSE model to screen out measurements affected by gross errors, such as those caused by data latency and false data injection attacks, before performing state estimation.
The main difficulty of the proposed MLSE resides in the microgrid modeling required to perform precise simulations. A detailed model of each circuit component was necessary to accurately represent the microgrid's response to different scenarios. To overcome this challenge, the University of Campinas (UNICAMP) is currently deploying a project [35] to study and develop specialized models in the fields of (i) power systems, (ii) renewable energy sources, (iii) power electronics, (iv) control systems, (v) optimization, and (vi) communications and information networks. A microgrid system modeled in OpenDSS with those specialized components was employed to perform this study.