A Belief Propagation Based State Estimator for Semi-Intrusive Load Monitoring System

For industrial application of load monitoring techniques, it is important to establish high-performance state estimators of low-cost and low frequency smart meters (SM) and sensors in a power system, which can run under resource-constrained computing units. Because household electronic appliances often tap power from fixed sockets, a finite state table for the corresponding sensors is suitable and convenient. However, SM in the main line may have an enormous state table. In this study, we propose a belief propagation (BP) algorithm to calculate the power consumption of electronic appliances in a semi-intrusive load monitoring (SILM) system whose SM and sensors have state tables with sizes varying largely. The novelty of the proposed method lies in a continuous approximation to a large state table and a switching scheme between discrete and continuous parts of the SILM system. With datasets from numerical simulations and a real-world experimental SILM system in a set of high-density school buildings within a secondary distribution network, the proposed BP algorithm is compared with relevant state-of-the-art algorithms. The results show that the proposed algorithm achieves a percentage of error (8%), which outperforms the percentage achieved by the other methods, a linear state estimation of 99%, a hidden Markov model of 21%, and a full-discrete BP algorithm of 11%. In addition, the complexity of the proposed algorithm is the least of all methods, and the proposed algorithm can run by SoC on concentrators.

x i,j MAP estimator for the state of SM or sensor indicated by i, j. p i , i = 0 probability function about state of the i-th SM and its sensors' states. q i,j probability density function (pdf) of the measurement error at the SM or sensor indicated by i, j. V state table of the system as a whole. v i , i = 0 constraint about the i-th SM and its sensors. H true value of the system as a whole. U measurement error of the system as a whole. X state of the system as a whole.
The associate editor coordinating the review of this manuscript and approving it for publication was Haris Pervaiz .
Z measurement value of the system as a whole. M M 0,i →C 0 message from the i-th SM to the root. M C 0 →M 0,i message from the root to the i-th SM. P real-world probability . P † reference probability under which SMs and sensors are independent. E † expectation corresponding to the reference probability.

I. INTRODUCTION
The development of technology and the decline in prices make smart meters (SM) attractive, and hence there is a significant proliferation of them in electric power distribution systems [1]. Intrusive load monitoring (ILM) techniques install a sensor inside each household electronic appliance in the customers' environment. Although they can present more VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ accurate measurements than other techniques, people often worry about health hazards with radio waves from sensors and leaks of personal privacy by them [2]. An SM based on non-intrusive load monitoring (NILM) receives information at a single point of measurement outside a building and calculates the individual consumption of each electronic appliance in the building by a disaggregation algorithm. NILM techniques do not install a sensor inside end customers' electronic appliances, and have less violation to customers. However, the accuracy of measurement is a challenge for NILM. Semiintrusive load monitoring (SILM) is a compromise of ILM and NILM in both accuracy of data and user experience, which uses SM outside the building and multiple cheap plugand-play monitoring sensors attached to circuit breakers or sockets inside the building [3].
This study addresses low-cost and high-performance state estimation for a power distribution system with SILM. Based on full noised data not only from a target sensor but also from other sensors and SMs, the proposed belief propagation (BP) algorithm obtains a high-performance state estimator for the target with low-cost resource-constrained computing units. The performance of the proposed algorithm is based on reasonable state tables for different kinds of sensors and SMs in the system, a continuous approximation to a large state table, and a switching scheme between discrete and continuous parts of the SILM system. Concretely, we study an experimental SILM system in a set of high-density school buildings within a secondary distribution network. The electronic appliances in the power systems may change sometimes, but we have good apriori information about them to ensure their identification. Because of capitalized cost of the SILM system, there are sensors with low accuracy and low frequency in the branches, and we need to evaluate observed variables with the noise of measurement. In the system, a sensor may connect to few kinds of electronic appliances and it has a discrete state table. An SM in the main line connects to several sensors in different branches, and the SM reports a current which is the total of the branch currents. Therefore, the measuring value at the SM is assumed to be continuous. Keeping the consistency of the algorithms' underlying mechanism, we establish a new form of the BP algorithm by which nodes in a network exchange conditional expectations of involved variables. As details of computation of conditional expectations, which have different forms for discrete and continuous variables, are hidden for nodes, the method can present consistent state estimation for every part of a mixed system with discrete and continuous variables.
The rest of the paper is organized as follows. Section II reviews related work and shows gaps related to highperformance state estimation data followed by the proposed low-cost objective. Section III describes the experimental SILM system and its model. Section IV presents a consistent BP algorithm for discrete and continuous parts under general assumptions. Simulation results to test the performance of the proposed method and discussions are in Section V. Section VI discusses the result of the experimental SILM system and Section VII concludes the paper and details of caluclations and proofs are in Appendixes.

II. RELATED WORK AND NOMENCLATURE
To illustrate the role of state estimation, let us scan the main parts in ILM, NILM, and SILM. Identifying electronic appliances by disaggregation algorithms is the first step of ILM, NILM, and SILM [5]. Disaggregation algorithms include feature extractions as well as load signatures routines, which require apriori information of different equipment and depend highly on the training data [4], [6]. With the information provided by sensors inside the building, identifying an electronic appliance is relatively easier for SILM than NILM when the appliance taps power from wires or sockets with fixed positions [7]. Calculating power consumption, or state estimation, for each electronic appliance is another important issue of ILM, NILM, and SILM. Because there are many sensors and SMs in a power distribution system and their data are related, good state estimation can derive measurements of higher accuracy for an appliance than its raw value with random noise. A classical and efficient state estimation, linear state estimation with weighted least square, is fit for large distribution systems with continuous measurement values [8]. The method assumes that the variables in the system, such as current, phase angle, frequency, active power, reactive power, and apparent power, are real or complex numbers with Gaussian noise.
Spite that linear state estimation is state-of-the-art, there are many other modern methods for state estimation in today's electric power systems. A state estimation by neural networks is investigated by [9], and [10] proposed a random forest algorithm of state estimation for appliances in office buildings. The Bagging classification algorithm and fuzzy analysis have good performances in state estimation and load monitoring, as reported by [11] and [12]. As an important artificial intelligence (AI) method, the hidden Markov model (HMM) makes the state estimation of an individual load by comparing different steady-state power levels of appliances. A modified HMM based on [13] and [14] shows high accuracy after a long burning time in our test, which is consistent with the theoretical result [15]. Different from linear state estimation which handles continuous measurement values only, the Bayesian network and its BP algorithms are suited for estimating discrete random variables [16], [17]. The method assumes that a variable in such a power distribution system has a value chosen from several candidates, uses a factor graph to describe probabilistic dependencies among different variables in the system, and calculates the posterior distribution of variables to obtain a measurement of higher accuracy [16], [18]. The Bayesian network and its BP algorithms are fit for smaller power distribution systems and need more information about variables than a linear state estimation [20]. In our simulating experiment, a Bayesian and BP algorithm presented by [17] obtains good accuracy.
For a slightly large power distribution system, we must handle synchronously discrete and continuous measurement values and there is not a suitable state estimation algorithm. Apriori information, such as few candidate states for a sensor, can improve estimation. However, because the current in the main line is the total of the branch currents, even though the assumption of discrete variables is reasonable for every branch current, values of some variables about main lines have lots of candidates, and continuous measurement values become necessary assumptions for them. Most algorithms can treat continuous variables well, and Bayesian network and BP algorithms are fit for discrete variables. With different assumptions, BP algorithms [18], [27], [28] can analyze a system with Gaussian variables. However, to apply those algorithms, every variable in the power system must have continuous measurement values. [29] studied a mixed system with discrete variables and continuous variables and used a deeping learning algorithm to learn the nonlinear relation between discrete and continuous parts of the system, but the algorithm is extremely time-consuming.
When focusing on the industrial application of NILM and SILM, we have few algorithms of state estimation, because low-cost are important for such cases and hence most candidate algorithms have been affected deeply [21], [22]. Such a system installs low frequency sensors for few kinds of electric parameters. For example, [24] proposes a system which removes voltage sensors from the hardware design. Hence, we must handle great noise in a small dataset from such a system. Moreover, low compute-demanding and low memoryintensive are also important for such a system [23], [25], which is a challenge when selecting or designing algorithms.
To that end, a state estimator, which can handle discrete and continuous variables synchronously and can run with limited computing resources, has great potential. This study proposes such a belief propagation (BP) algorithm to calculate the power consumption of electronic appliances in a semi-intrusive load monitoring (SILM) system. Some symbols have a subscript indicating the involved sensor or SM. For example, a symbol h i,j in the paper represents the current at the j-th sensor connected up to the i-th SM, and a 0,k is for the k-th SM. When considering all SMs and sensors in the system as a whole, we use a matrix H = (h i,j ), which represents true values of current in the system. Moreover, a random variable is printed in italic type or bold italic type, and a symbol with roman fonts or bold roman fonts denotes a possible value of the random variable. The nomenclature of important symbols is as follows.

A. EXPERIMENTAL SILM SYSTEM AND ITS ENVIRONMENT
We study parts of a simple radial distribution system without any feeder interconnections. The radial system has independent feeders, and each end customer is connected to one feeder. The shadow region in Fig. 1 illustrates a part of the radial system, which is a school building consisting of discussion rooms and classrooms. Inside rooms, monitoring sensors attach the wires near switches and sockets, and outside a room, SM with high accuracy attach feeders. Fig. 2 presents monitoring devices in the experimental SILM system. For several neighbouring rooms, there is a wireless concentrator which collects data of SM and sensors in those rooms, according to the standard IEC61334-4-32/IPv4. The server in the SILM system deals with concentrators, and the server does not receive any data directly from sensors.
Fixed electronic appliances in the building includes lots of lamps with different kinds, air-conditioners, desk computers, projectors. There are several electric kettles and heaters in some rooms. Students, teachers and staff may take personal laptop computers into the building for studying and teaching. As a lecture lasts 50 minutes, the pattern of a laptop computer does not change during this period. The number of laptop computers is not big because most students do not take them. Chargers are not permitted but chargers for mobile phones may be taken into classrooms. Features of this experimental environment includes: the number of sensors is far larger than the number of electronic appliances except lamps; the electronic appliances and their positions do not change during a lecture of 50 minutes; the power of appliances except a few laptop computers and chargers are not continuously varying. Moreover, the key variables are current values of every electronic appliance.
Because there are a few laptop computers and chargers in the system, we use effectively a successive sample of current to identify possible laptop computers [31]. Hence we can assume that the current value of every sensor or electronic appliance has serval discrete candidates and we have good apriori information about its distribution. In a test of the SILM system in a few rooms, every sensor can be replaced by an SM as alternative, and another state estimation algorithm of searching a lookup table makes efficient and correct identification for electronic appliances with the accurate value of current from those SM. However, replacing all sensors with SM is an uneconomical plan and we need a practical and efficient state estimation technique based on the dataset from all sensors and SM.

B. MODEL OF THE SILM SYSTEM
According to the standard measurement model [30], we have that for the system in Fig. 1, where H(X) is the matrix of true values, Z is the matrix of measurement values, and U is the matrix of measurement errors. We write the matrix of state variables val- , and x 0 = (x 0,0 , x 0,1 , . . . , x 0,m ). For j = 1, 2, . . . , n, x 0,j denotes the state variable for the jth SM, M 0,j in Fig. 1, and x 0,0 is the state variable for a root SM, M 0 in Fig. 1 ) and x i,j denotes the state variable for a sensor S i,j . The numbers of sensors from different SM may vary, and we replace some x i,j by blanks for this case. Components of other symbols Z, H, U have the similar meaning. There are n SM (except the root one) in the system and an SM has at most m + 1 sensors.
For our scenario, the raw data consists of the current value Z at every SM and sensor. At a time point, the symbols Z = (z i,j ), H = (h i,j ), U = (u i,j ) ∈ R (n+1)×(m+1) are matrices. z 0,0 , h 0,0 , u 0,0 are the measurement value, measurement function, and measurement error, respectively, of the current at the root SM, M 0 in Fig. 1. For other SMs, that elements in U are independent random variables with known variance and the mean 0, and the noise vector u 0 of all SM has a Gaussian distribution.
When the state x i,j of an SM or sensor is known, the corresponding theoretical value of current value h i,j exists. Hence, there is a measurement function connects h i,j with x i,j for each pair of i, j. Sometimes and the measurement function The state table or the state space for an SM or sensor consists of all possible states of the corresponding random variable x i,j , and we often set the state table larger than necessary. When applying our method to the experimental SILM system, we examine possible kinds of electronic appliances for each room and apply the disaggregation algorithm in Section III-A for a successive sample to identify electronic appliances when necessary. According to those method, we can ensure good apriori information about the state table for each sensor.

C. FACTOR GRAPH
The features of the system imply constraints of state variables and measurement functions. As the currents satisfy the Kirchhoff Current Law (KCL), we have the following constraint between the root SM and other SMs in the scenario given in Section III-B.
And for each SM and sensors from it, we have that for i = 1, 2, . . . , n, The markovian relations between SM, sensors, and constraints (2), (3) are illustrated by Fig. 3 with a factor graph, which is introduced by [16] for power grids. The node C 0 in Fig. 3 corresponds to the constraint (2). A node C i , i = 1, 2, . . . , n, corresponds to one of the constraint (3). When two variable nodes, such as M 0,i and S i,j , are connected by a constraint node, C i , then there exists an electrical correlation between them. Two variable nodes which are not connected by a constraint node, such as sensors from different SM, are independent. For example, the factor graph in Fig. 3 implies that x 1 , . . . , x n are independent random vectors.
Given measurement values Z, we will calculate the posterior distribution P(X = X|Z = Z) of state variables. Note that a symbol with roman fonts or bold roman fonts denotes a possible value of the random variable. Because the factor graph in Fig. 3 is a tree, it implies a straightforward method to compute the posterior distribution.

IV. MAXIMUM A POSTERIORI PROBABILITY ESTIMATOR
For a given observation value of measurement, we will study the posterior distribution of state variables regarding the observation value and will present a BP algorithm to calculate the maximum a posteriori probability (MAP) estimator of state variables.

A. POSTERIOR DISTRIBUTION AND MAP ESTIMATOR
is a finite set. Note that each constraint C i in Fig. 3 influences the real-world probability P( . For the scenario in Section III-B, the state variables of an SM and its sensors follow a constraint (3) from the KCL. Hence, for i = 1, 2, .., n, where v i describes the constraint C i given in Fig. 3.
Similarly, the constraint (2) for SM corresponds to a function The same result can be drawn from the factor graph too. p i (x 0,i , x i ), i = 1, 2, . . . , n, describes a local probability about an SM M 0,i and sensors S i,j , j = 0, 1, . . . , m from the SM in Fig. 1. Note that the local probability p i (x 0,i , x i ) = 0 when the SM's state x 0,i and its sensors' state x i do not satisfy the constraint (3). Similarly, the local constraint function v 0 (x 0 ) is about the states of all SM.
Therefore, it follows from the local Markov property of the investigated system, we have the real-world probability From assumptions of the measurement errors, elements of U = (u i,j ) are independent, then the pdf of U is where q i,j (u) is the pdf of u i,j . From assumptions, q 0,i is a Gaussian pdf with known variance and its mean is 0.
According to the measurement model (1), the joint pdf of state variables X and measurement values Z are For a given observation value Z = (z i,j ) of measurement values, it follows from (8-10) that the posterior distribution is the value of measurement function for this state. As the traditional statistical method, the observation value Z does not appear as an argument variable of f (X). We define a function g 0 of variables connected with the constraint node C 0 in Fig. 3, and function g i , i = 1, 2, . . . , n, for variables connected with the node C i , Then f (X) has a concise form as follows.
The sequence MAP estimator X * of the state variables vector for the given observation value Z satisfies Although V is finite, to solve (15) with a brute force method is an intractable problem because |V | is very large and there are constraints (2), (3) and factors To overcome the difficulty, the BP algorithm studies a posteriori probability of a state variable x i,j of an SM or a sensor with the index i, j, with respect to the given observation value Z of measurement values [35]. The BP algorithm presents the following bit-wise MAP estimatorx i,j for the state variable. For convenience, multiplying the posteriori probability with a constant factor |V | −1 , where |V | is the number of elements in V , we have that the bit-wise MAP estimatorx i,j satisfies VOLUME 10, 2022 where the set V i,j is defined at the beginning of this subsection, and V (α, i, j) ⊂ V consists of all X = (x i,j ) ∈ V whose i, j element x i,j = α.

B. BP ALGORITHMS AND THE FLOODING SCHEDULE
Not all elements of the state variable matrix X are independent under the real-world probability P. To overcome this and to apply the BP algorithm easily to the tree given in Fig. 3, we will consider a reference probability P † . We follows the terminology of a reference probability developed by Elliot [32], and its recent applications include [33], [34]. B shows details of the technique. We assume that the random matrix X is drawn uniformly from V under the reference probability P † , i.e., P † (X = X) = 1/|V | for every X ∈ V . Hence all elements of X are independent under P † . For every random variable ξ = ξ (X), the expectation of ξ under P † is For α ∈ V i,j and X = (x i,j ), the index function I α,i,j is Then the bit-wise posterior distributionf (α, i, j) in (17) can be expressed by the reference expectation E † . Concretely, for i = 1, 2, . . . , n, we can calculate the bit-wise posterior distributionf (α, i, j) that the state of a sensor S i,j is α as follows.
Here we omit the arguments of I α,i,j and g i in the last two expressions. The second last equality follows from the tower property of conditional expectations [36] and the last equality follows from the facts that all elements of X are independent under P † and x i is not among arguments of every g k as k = i. Moreover, according to the terminology of the BP algorithm [16], we define the message from the node C 0 to an SM M 0,i as It follows from the relations between g k in Fig. 3 that where the message from the SM node M 0,k to the constraint node C 0 is A general form of (22) is proved in A. The bit-wise posterior distributionf (α, 0, i) that the state of an SM M 0,i is α can be obtained similarly.
Messages M C 0 →M 0,i and M M 0,i →C 0 are generalizations of an issue in the communication theory [26] where every state variable has two possible states. In our problem, because a state variable often has many possible states, a message is a random variable. Although the form of involved messages is different, the flooding schedule of the BP algorithm [37] can still be employed for our problem. According to the flooding schedule, all SM pass extrinsic information M M 0,i →C 0 up to the constraint node C 0 ; the constraint node C 0 then processes its inputs and passes new information M C 0 →M 0,i down to its neighboring SM. The tree in Fig. 3 is suited for the flooding schedule.

C. PSEUDO-CODE OF THE BP ALGORITHM
We can use (23), (22) in turn to calculate conditional expectations, use the last equality in (20) and (24) to obtain the bit-wise posterior distribution, and apply (16) to obtain the bit-wise MAP estimatorx i,j for each state variable. The conditional expectation for a random variable ξ = ξ (X) is a ratio of two expectations. Because the random variable x 0,j is drawn from V uniformly under the probability P † , is a constant for every α ∈ V 0,i . Multiplying a constant to a posteriori probability does not change an MAP estimator. Therefore, we replace hereafter conditional expectations (22), (23) of messages by their corresponding expectations. And we do not change the notation because introducing two many new symbols eventually becomes confusing. The pseudo-code of the BP algorithm to calculate the bit-wise MAP estimators is as follows.  it down to M 0,i . For every state α ∈ V 0,i , 3: Each SM M 0,i , i = 1, 2, . . . , n sends the message M C 0 →M 0,i to its all sensors S i,j . Each sensor computes for every state α in its state table V i,j , and then computes its bit-wise MAP estimatorx i,j according to (16). 4: The root SM M 0 computesf (α, 0, j), j = 0, . . . , n for itself and other SM according to (24), and then computes their bit-wise MAP estimators from (16). A constraint node is not a physical object, and the root SM takes its duty in our implementation, and Fig. 3 illustrates the flooding schedule of messages too. Every SM except the root executes Step 1 in parallel, and after the root sends its message, all SM executes Steps 3 or 4 in parallel again. In our implementation, an SM takes the duty of computation for all its sensors and a sensor sends only its measurement values z i,j up to its SM.

D. DISCRETE AND CONTINUOUS STATE TABLES
We assume hereafter that state variables x i,j of all sensors are independent under the real-world probability P. We will illustrate by a toy system the relationship between a continuous state table and a discrete one for a summation variable. The system has an SM M 0,1 and its 5 sensors whose states and the corresponding values of measurement functions are in Table 1. Then from (3) the state table V 0,1 of the SM M 0,1 has 18 possible states form 0 to 1.7. As a simple expectation, we have shown in (26) that E † I α,0,1 = 1/18 for every α ∈ V 0,1 . Because numbers of all possible values are not large, we can explore every state and obtain exact expectations.
For example, we calculate, for a state α ∈ V 0,1 , a simple and typical expectation value E † I α,0,1 p 1 (x 0,1 , x 1 ) , where p 1 (x 0,1 , x 1 ) is defined by (5) with the real-world probability P and it is a factor in g 1 , see (13). From (18) we have that |V |E † I α,0,1 p 1 (x 0,1 , x 1 ) = X∈V p 1 (x 0,1 , x 1 ) = P(A), (30) where the random event A is Other complex expectations in the BP algorithm can turn to similar issues with the real-world probability P. In general, when x 1,0 , . . . , x 1,m are independent random variables under the real-world probability, the random variables h 1,0 , . . . , h 1,m are independent too. Then P(A) is the probability of the sum h 1,0 + . . . + h 1,m is α. The sum tends to a Gaussian random variable for a large m. Fig. 4 shows the normal probability plot of the sum h 1,0 + . . . + h 1,4 for the data in Table 1 with such a real-world probability, and the distribution of sum approximates a Gaussian distribution well in this case.
Based on the above idea, we can change some discrete random variable with it continuous alternative and do not influence the mechanism of the BP algorithm. In fact, to apply the BP algorithm in Section IV-C, we just need calculate expectations with a general form where involved random variables are of variable nodes connected by one constraint node in Fig. 3. The details to calculate such an expectation are the topics of B.

V. SIMULATION RESULTS AND DISCUSSION
In this Section, we will test the performance of the BP algorithm by simulations and discuss its performance in some extreme cases. The proposed BP algorithm is concluded as the following pseudo-code.

A. ACCURACY OF APPROXIMATE MESSAGES
The number of an SM's sensors affects the accuracy of the SM's message up to C 0 , and the simulation shown by Fig. 4 indicates that the performance of an SM with 5 sensors is good enough.
Messages up to C 0 , messages down to SM, and all kinds of posterior distribution are probability values for various events. When we replace such a value with a pdf at a value h i,j (α), we miss a factor that describes the neighborhood of h i,j (α). For instance, if h 0,1 (α) does not arrange evenly for every α ∈ V 0,1 , the accuracy of the approximate formula for M C 0 →M 0,1 (α) is questionable. Based on the method given in Section IV-D and the data in Table 1, replacing h 1,2 (3) by 0.41, and with the real-world probability in Table 2, we calculate the exact histogram of M C 0 →M 0,1 (α) and present it in Fig. 5. The random variable is far away from a Gaussian variable. This anomalous histogram is due to the increment 0.01 of h 1,2 (3), which is a clear signature of x 1,2 = 3 in the current value at the SM M 0,1 . Tests show the good performance of the approximate method when h 0,1 (α) arranges evenly.

B. INDEPENDENCE OF SENSORS
The theoretical formula of B is based on the assumption of sensors' independence, i.e., state variables x i,j of all sensors are independent under the real-world probability P. To compare performances of methods without the assumptions, we study an environment has only one SM, i.e., there is only one level in the radial distribution system given in Fig. 3, and the SM's sensors are denpendent.
A modification of the proposed method is based on the following discussion on the algorithm's mechanism. In Step 1, each SM M 0,j collects currents at its all sensors and creates a new virtual measurement M M 0,j →C 0 of its current h 0,j (x 0,j ). In Step 2, the node C 0 computes the message M C 0 →M 0,i for an SM M 0,i , from all SMs' measurements z 0 and virtual measurements of SMs except M 0,i itself. Then in Step 3, the SM uses the message M C 0 →M 0,i and data from its sensor to calculate bit-wise posterior distributions and MAP estimators. When the radial distribution system has only one level, we have that the message down to C 0,i is its measurement and the variance of its noise, i.e., parametersm 0,i = z 0,i , σ 2 0,i = σ 2 U . Then for the data in Table 1, where the SM M 0,1 has 18 possible states and every sensor has 3 or 4 states, both exact and approximate methods can apply.
Because the measurement function h i,j (x i,j ) is a one-to-one function of the state variable x i,j , in the test, for sensors states in Table 1, we replace h 1,0 with and then the independence between two sensors S 1,0 and S 1,1 breaks down. Here h is the nearest integer less than or equal to h, and we choose a suitable k to ensure a preset correlation coefficient ρ betweenh 1,0 and h 1,1 . We apply exact and approximate methods to the issue with two kind of sensors' noise u 1 . The first is Gaussian noise N (0, 2.5 × 10 −3 ), the second is a uniform distribution on [−0.087, 0.087]. Two kinds of noise has the same variance. The noise u 0,1 of the SM has a Gaussian distribution N (0, 10 −4 ), corresponding to be of 0.5 accuracy class. We run two methods 400 times at every given parameters. Fig. 6 shows the percentage of error (PE)x 1,0 = x 1,0 for different methods and noise. The approximate method performs still well when the correlation coefficient ρ ≤ 0.2, and the influence of different kinds of sensor's noise is similar. Therefore, we recommend the exact method only when it can execute because it does not need the assumption of independence and its performance is robust.

C. WRONG CONFIGURATION OF A PRIOR PROBABILITY
From a perspective of Bayesian analysis, the real-world probability is a prior probability. The probability is preset in our issue, and it can be obtained from the historical data, or learned by a stochastic model [20], [39]. In this simulation,  4 SM are connected to the root SM and each SM has 5 sensors whose features are in Tables 1 and 2. Based on the approximate algorithm, We calculate the mean PE of all sensors of 400 simulations with different prior probabilities of sensors's state table. Fig. 7 shows the boxplot for the PE of state variables vs the Wasserstein distance [40] between the prior probabilities and the real distributions. The relation between the PE and the Wasserstein distance is similar to the relation between the PE and another distance of distributions, the Kullback-Leibler divergence. The result can be accepted when the Wasserstein distance between the prior probability and the true one is less than 0.1.
Note that the size of an SM's state table is not less than 17, and the size of the root SM is 65. As the true value for SMs is almost a continuous random variable, we can use the Kolmogorov-Smirnov distance to describe the difference between the prior distribution and the real one. Fig. 8 is the scatter plot of the PE of current vs the Kolmogorov-Smirnov distance between the distributions and the real distributions of the first SM. Here a point in Fig. 8 represents a simulation, and in a simulation, we run 100 tests for a pair of prior probability and real distributions of all sensors. As the current at the SM is the sum of its sensors and we cannot control it completely in the simulation, the difference between the prior probability and real distributions is not arranged uniformly as Fig. 7. Nevertheless, the relation between the PE and the Kolmogorov-Smirnov distance is clear in Fig. 8.

VI. RESULT OF THE EXPERIMENTAL SILM SYSTEM
The proposed state estimation method undertakes the main supervision and measurement in the experimental SILM system. However, state estimation is an intermediate step in a full solution of the SILM system which has routines to set an initial state table and to recover failures. For the experimental SILM system, every a hour, there is a successive sample for two minutes of every sensor and SM. From the sample, we can set a suitable state table for most of the sensors by a disaggregation algorithm [31]. Moreover, once our algorithm detects an extreme state for a variable, that is, a very high or a very low current value for a sensor, the algorithm calls a routine to rearrange the state table for the variable with a new successive sample. The routine can identify some high power electrical appliances such as electric kettles and heaters, and in most cases, it can present a suitable state table for lamps, computers, air-conditioners and projectors.
Besides the proposed method, we tested three other methods for the experimental SILM system. The first one is based on a linear state estimation with constraints (2), (3) which gives means of current for all sensors and SM, then the benchmark searches a lookup table to make identification to electronic appliances. A parameter of the benchmark is the lock time during which the system collects data every 15 seconds for computing those means. The second method is an HMM based on [13], [14] which has better performance and has less time-consuming than other learning-based techniques [29]. The third method, a full-discrete BP algorithm, sets a very large state table for SM in the buildings and deals with discrete random variables for SM. Because of the time and space complexities, in applying the third method, all data are uploaded to the server and the server runs the computation. The third method and our proposed method do not need the parameter of lock time, which can present results dynamically. We just presented their results recorded at T = 3 minutes for comparison.
The test is based on data from sensors and SM of the experimental SILM system and corresponding survey results of manual patrol inspection in a set of high-density school buildings lasting one week (five working days). According to manual inspection, we can determine whether a method detects a working device or not. Besides the above data, there is another set of data from three rooms, where every sensor and an alternative SM are connected in series for the test. From this set of data, we can obtain current of a device correctly and hence we can calculate R 2 of its current, where Here I i is measured by an SM, andÎ i is reported by a method. When a method gives the estimatorx k,j of the state of a sensor or SM indicating by k, j, the corresponding currentÎ is determined by the measurement function, i.e.,Î = h k,j (x k,j ). A shortcoming of R 2 is that two states with similar true current values are hard to distinguish by it.  Table 3. Table 3 presents the PE of detecting a device for different methods with respect to manual patrol inspections. As PE of air-conditioner is 0.00 for all methods, its data are not presented in Table 3. It seems that the proposed method can obtain similar results as the full-discrete BP algorithm, and both of them have better performances than other methods. The performance of HMM is good just for a long lock time. Low PE of linear state estimation for lamps may be due to different kinds of lamps among which some have similar powers. Low PE of electric kettles and heaters may be due to faults in inspections, because there are many reports of working heaters by algorithms and no working heaters in inspections. There are some breaks in patrols, whose details are not reported. Table 4 presents R 2 of different devices for the second set of data. There is no electric kettle or heater in the testing rooms and hence there is no corresponding column in the table. Table 4 reveals an interesting result such that the linear state estimation has a better R 2 for lamps, laptop computers and chargers than other methods. The cause of the phenomenon is that states of a device have similar true current values and hence they are hard to distinguish by R 2 . Hence we suggest the linear state estimation to calculate the mean of power for a device. As an index to distinguish devices with varying power, PE is better than R 2 . Table 3 shows that the proposed method and full-discrete method can report devices' working state more accurately.
To investigate the time and space complexities of the four methods, we use a computer with an Intel(R) Core(TM) i7 CPU to store and compute data from all SM and sensors. The execution time in seconds of four methods and maximal memory in megabytes used by four methods are presented in Table 5. For different lock times, the execution time of linear state estimation is similar, and only data of T = 3 minutes are presented in Table 5. The proposed method and the linear state estimation have the least execution time and the full-discrete BP needs a very long execution time and a large memory.
Besides facts presented by Tables 3 to 5, it is worthwhile to note that a large part of the work of the proposed method can run on concentrators. With the proposed method, data from a sensor are collected and processed by a concentrator located in the same room as the sensor. For other methods, the process can not be implemented by concentrators because the linear state estimation and HMM must analyze all data together, and the complexities of a full-discrete BP can not be treated by an SoC on a concentrator. Because of this feature, the proposed method is a technique of SILM, and the other three methods are ILM techniques as sensors' data leak outside a room by other methods.

VII. CONCLUSION
We proposed an approximate BP algorithm for a state estimator of SM and sensors of an experimental SILM system in a radial distribution system. Although every sensor in the SILM system has a few states, the number of states in the state table of some SM in the system is enormous, and hence the complexities of a full-discrete BP algorithm are unacceptable. In this paper, we establish a new BP algorithm for such a system. The algorithm collects discrete states of sensors of an SM and creates a virtual continuous measurement for the SM. And a discrete MAP estimators for a sensor is derived from its measurement value and received continuous messages.
Simulation results indicate that the proposed BP algorithm is acceptable and can apply to some environments where some theoretical assumptions are slightly wrong. Those theoretical assumptions include sensors' independence and a correct prior probability. Tests of an experimental SILM system in a set of high-density school buildings within a secondary distribution network show that the proposed method detects and identifies a working device well. The results show that the proposed method achieves a percentage of error (8%), which outperforms the percentage achieved by the other state-ofthe-art methods, i.e., a linear state estimation of 99%, a hidden Markov model of 21%, and a full-discrete BP algorithm of 11%. In addition, the time and space complexities of the proposed method and the linear state estimation are the least among all methods, and the proposed algorithm is the only one which can run by SoC on concentrators.

APPENDIX A A GENERAL FORM OF (22)
Lemma 1: For a subset W of {1, 2, . . . , n} and i = 0, 1, 2, . . . , n, we have Proof. It follows from the tower property of conditional expectations that We only need to prove that The proof is by induction on the size of W . When W = {k} has only one element k, because x 0 = (x 0,0 , . . . , x 0,n ) and the argument x k , x 0,k of g k and random variables x 0,i , i = k are independent under P † , it is clear that Suppose then that we have (37) for any subset W with size less than l. For a subset W with size l, let W = W \{s} where s ∈ W , and we have that The last equality follows from the fact that the set {x 0 , x r , r ∈ W } includes every argument of g k , k ∈ W . Moreover, because all elements of X are independent under P † and x i is not among arguments of every g r as i = r.
Because M M 0,r →C 0 is a function of the random variable x 0,r , we have that where the last equality follows from the induction hypothesis for the subset W .

APPENDIX B RANDON-NIKODYM DENSITY AND MESSAGE M M 0,i →C 0
To establish an approximate method to calculate expectations, we use the tool of Randon-Nikodym density. For a probability P on V , consider the following random variable on V , dP dP † (X) = |V |P (X = X), X ∈ V .
It can been verified that for every X ∈ V , and hence for every random variable ξ = ξ (X) on V , where E is the expectation under P . We refer to [32], [33], [34] for more details of the reference probability and its Randon-Nikodym densities.
Given i ≥ 1, to calculate a message M M 0,i →C 0 (α) given in (27), let a probability P satisfies that random variables x i,0 , . . . , x i,m are independent under P , and that where x i,j is any state in V i,j , and the constant factor See (9)(10)(11)(12) for the meaning of q i,j , z i,j , and h i,j . Then we can been verify Therefore, for the given i ≥ 1, from (5), (6), and (13) we have that for a random variable ξ = ξ (x 0,i , x i ), where we omit a constant factor. For example, by letting ξ in (44) be I α,0,i , the message M M 0,i →C 0 (α) in (27) is The sum h i = h i,0 + . . . + h i,m tends to a Gaussian random variable under P , it has a meanḿ 0,i and varianceσ 2 0,i , wheré andḿ i,j ,σ 2 i,j are the mean and variance of h i,j under P .
Then the probability that the sum h i is α in (49) is proportional to the Gaussian pdf.
The above (53) is the proposed approximate formula for the message sent up to the constraint node C 0 . When the number of random variables in the sum is small, the arrangement of the state table of the sum is irregular, or not all sensors are independent under the real-world probability P, then the accuracy of the approximate formula may decrease.
From Section B, the message M M 0,i →C 0 can be regarded as a Gaussian random variable with meanḿ 0,i and varianceσ 2 0,i . Because all SMs in the system are often of the same kind, we assume that every SM's measurement error u 0,i is a Gaussian random variable with mean 0 and variance σ 2 U . Then from the discussion in Appendix E we can obtain the following approximate formula of the message M C 0 →M 0,i (α).
From Appendix E, the message M C 0 →M 0,i can be regard as a Gaussian pdf with meanm 0,i and varianceσ 2 0,i , . Appendix VII discusses more details of (55).

APPENDIX D BIT-WISE POSTERIOR DISTRIBUTION AND MAP ESTIMATORS
For every j = 1, 2, . . . , n, we writed j = where the posterior mean and variance arê When i = 0, the bit-wise approximate posterior distribution f (α, 0, 0) follows (56) too, wherê For i = 1, we have similarly that the approximate formulae of the bit-wise posterior distributionf (α, i, j) is given in the following equation, with omitting a constant factor. Because the pdf in (56) is Gaussian and the approximate state table is R, an approximate formula of the bit-wise MAP estimator defined in (16) for x 0,i isx 0,i =μ i , i = 0, 1, . . . , n. For i = 0 and j = 0, 1, . . . , m, the bit-wise posterior distribution distributionf (α, i, j), α ∈ V i,j has a complex expression. But because |V i,j | is small, we can calculatef (α, i, j) for each α in V i,j to search the bit-wise MAP estimatorx i,j .

APPENDIX E APPROXIMATE FORMULAE WITH THE NODE C 0
It follows from (53) of Section B that h 0,i (α), instead of α, appears in a Gaussian pdf. As an example, we study first the following auxiliary function with an argument row vector h = (h 0,1 , . . . , h 0,n ). where h 0,0 and h satisfy (3). Then the function w(h), multiplying a constant factor, is a joint Gaussian pdf of h. Concretely, let a row vector y = (y 1 , . . . , y n ) and consider a symmetric matrices A = (a i,j ) ∈ R n×n such that Then we have that log(w(h)) = − 1 2 where c and c are two constant terms with the given observation value Z, and the row vector µ = (µ 1 , . . . , µ n ) satisfies µ = yA −1 . Note that A = D + ββ T , where D is a diagonal matrix, D = diag( 1