A Nonintrusive Load Monitoring Based on Multi-Target Regression Approach

This paper proposes an experimental design process for applying multi-target regression, a data learning approach new to this application area, to energy disaggregation. The approach is shown to be a suitable model for energy disaggregation problems, in which the task is to predict the usage of multiple appliances from the aggregate data. The experiments analyzed the AMPds2 and ECO public data sets to verify the effectiveness of the approach. The data were processed through a machine learning workflow to select the optimal set of electrical features, learning algorithm, and model parameters, so that the resulting system delivers the best performance for load inference. The results showed that the electrical feature set of current (I), real power (P), reactive power (Q), and power factor (PF) for the aggregate data, with Random Forest as the base regressor of the multi-target regression model, provided the best disaggregation performance. The overall disaggregation accuracy and F-score outperformed the benchmark Super State Hidden Markov Model (SSHMM) and Denoising Autoencoder (DAE) network approaches.


I. INTRODUCTION
Non-Intrusive Load Monitoring (NILM) is the task of decomposing the whole-building energy information into the energy used by individual appliances [1]. Detailed energy feedback can give users decisive information, such as motivation for saving [2] and load deferral for high-power appliances. Studies showed that users could save up to 12% of energy costs with real-time appliance-level feedback, as against conventional monthly billing feedback [3]. Grid utilities could also benefit in energy demand forecasting and in balancing the energy supply policy [4]. NILM operation relies on a machine learning process that consists of data collection, feature engineering, and data learning and identification [5]. The key challenge in system development is to design a learning algorithm that accurately infers the operation of the contributing loads from the aggregate data, so that proper action can be taken in the energy management scheme. NILM research has therefore aimed to develop new or modified data learning algorithms for better load prediction performance. These works used different data learning frameworks, mostly Pattern Matching, Hidden Markov Model-based approaches, Multi-Label Classification, and Deep Neural Networks [5,6]. Multi-output learning is a framework that has drawn increasing attention in many research areas concerned with predicting multiple output data [7]. In NILM applications, its multiple-output data format is well suited to representing the operating status of multiple appliances. This work presents the use of the Multi-Target Regression approach, a new data learning framework for NILM. The framework provides a direct estimate of the power demand of the target appliances, which is a benefit over conventional multi-output classification-based approaches [8,9]. We also propose an experimental design process based on machine learning practice to obtain the optimal predictive performance.
Thus, the key contributions of this work can be summarized as follows.
(1) We introduced Multi-Target Regression as the data learning framework and its suitability for energy disaggregation tasks.
(2) The experimental procedure was proposed to design a data learning system with optimal performance for appliance loads inference.
(3) We evaluated the proposed approach on two public data sets with performance benchmarking to verify the effectiveness of the approach.
The remainder of this paper is organized as follows: Section II reviews the development, advantages, and disadvantages of the major existing approaches and the research direction. Section III outlines the proposed approach, the system design process, and the performance indexes. Section IV presents the experimental results and discussion, and Section V concludes this paper.

II. RELATED WORKS
Research on NILM system development can be broadly classified into two categories: event-based and non-event-based approaches. The event-based approaches relied on detecting power state switching (ON <-> OFF), learning either the steady-state features of the changed power data (∆P, ∆Q) [1,10] or the power 'ON' transient features for more accurate load identification [11,12]. The need for high-frequency sampling to detect the transient features imposed some limitations: the data storage problem and the difficulty of integrating the data modeling into smart meters, which normally operated at low sampling frequency [5,13]. The non-event-based approaches identified the appliance operating status without detecting the switching events. Thus, the capability to handle low-frequency data was an advantage over the event-based approaches. The well-known approaches were based on the Factorial Hidden Markov Model (FHMM) [14], which factorized the aggregate as a time series to predict the hidden power states of the appliances. The load power data was then inferred from the states predicted by the models [15,16]. These approaches fitted well for modeling appliances with multiple power states but incurred high computational complexity through parameter and probabilistic modeling. Other non-event-based approaches used conventional classification algorithms to classify the appliance power state, such as Neural Network [17], k-Nearest Neighbor (kNN) [18], and Support Vector Machine [19]. The classification approach, however, modeled each appliance label independently, which did not reflect correlations among appliances that may exist in practice. Recent NILM developments applied Deep Neural Networks (DNN) as the data learning framework [20]. Common network architectures included, for example, Convolutional Neural Network [21], Long Short-Term Memory [22], and Denoising Autoencoder [20].
The advantages of the framework were the automatic features extraction and system scalability through transfer learning. However, the Deep Neural Network required a large number of training samples to obtain adequate accuracy performance, and training the model involved the configuration of many parameters and the network structure adjustment.
To ascertain the applicability of low-frequency sampling data with less computational complexity, this work proposes the Multi-Target Regression (MTR) framework [23], a new data learning system for the energy disaggregation task. The approach is a category of multi-output learning [24,25] in which the model directly estimates the appliance power consumption data from the aggregate measurement while incorporating label correlation in data modeling. In contrast, the multi-label classification [8,26] and HMM-based learning frameworks rely on classification tasks whose primary output is the discrete power state values. They then need a further process to convert the power state into estimated power data, which might induce deviation or prediction loss. The proposed approach avoids this conversion step; in addition, its ability to incorporate multiple features of the aggregate data could enhance the overall predictive performance.

III. PROPOSED METHODS
The objective of a NILM system is to decompose the aggregate consumption data into the estimated power consumption of contributed appliances through the disaggregation algorithm as expressed in (1).
$$f:\; P(t) \mapsto \hat{p}_i(t), \qquad P(t) = \big(p_1(t) + p_2(t) + \dots + p_m(t)\big) + e(t) \tag{1}$$
where $P$ is the ground-truth aggregate power data, which is the input given to the disaggregation model $f$. The model is then used to predict the output $\hat{p}_i$, the estimated power data of appliance $i$ among the total of $m$ appliances at time $t$. The error term $e(t)$ is the loss in the AC line or the measurement error. The task of predicting multiple appliance labels, in which each label gives the estimated power consumption, can then be represented as a multi-output or multi-target regression problem [7,23].

A. MULTI-TARGET REGRESSION
Multi-target regression is a type of supervised learning framework where the goal is to estimate the numeric value of the target variable from the observation data. It has been successfully utilized in many applications, for example, predicting gas levels in multiple tanks of a gas converter system [27], ecological modeling for a prediction of multiple variables that describe the quality of plantations [28], and estimation of biophysics parameters from multiple sensing images [29]. For the application of NILM, a data set of N instances consists of the observation X1, …,Xk for k electrical features of the aggregate data and the ground-truth power data for m contributing appliances Y1,…,Ym. Since the observation and power data for each instance can be characterized as a vector of input and output, the data set can be expressed in (2) as a term of the input-output pair.
$$D = \left\{ \left(\mathbf{x}^{(1)}, \mathbf{y}^{(1)}\right), \dots, \left(\mathbf{x}^{(N)}, \mathbf{y}^{(N)}\right) \right\} \tag{2}$$
where $\mathbf{x}^{(j)} = (x_1^{(j)}, \dots, x_k^{(j)})$ is the input vector of the $k$ features of the aggregate data for the $j$-th instance, and $\mathbf{y}^{(j)} = (y_1^{(j)}, \dots, y_m^{(j)})$ is the output vector of the power data of the $m$ appliances for the $j$-th instance.
The multi-target regression model learns the data by mapping the vector of aggregate data ($\mathbf{x}$) to the vector of appliance power data ($\mathbf{y}$) through a disaggregation function ($f: \mathbf{x} \mapsto \mathbf{y}$), that is, a multi-target regression algorithm. The model is then used to simultaneously predict the appliance power consumption data ($\{\hat{\mathbf{y}}^{(N+1)}, \dots, \hat{\mathbf{y}}^{(N')}\}$) from new aggregate instances ($\{\mathbf{x}^{(N+1)}, \dots, \mathbf{x}^{(N')}\}$), where $N'$ represents the period of new samples. The data learning methods for multi-target regression can be classified into 2 categories: the Problem Transformation approach and the Algorithm Adaptation approach. The first category transforms the multi-output regression problem (the data set $D$) into one or more sets of single-output problems using a multi-output regressor, then applies an off-the-shelf regression algorithm, the base regressor, to build the regression models. The second category adapts a single-target regression algorithm to handle the multi-target data directly [23]. If label correlation among the appliances exists, the Algorithm Adaptation approach could capture the dependency and provide better regression performance compared with the methods of the Problem Transformation approach [30]. The characteristics of commonly used multi-target regression methods for each approach are summarized in Table I.

TABLE I
CHARACTERISTICS OF SOME COMMONLY USED METHODS FOR MULTI-TARGET REGRESSION

Method: General operation

Problem Transformation Approach
Single-Target (ST) method
- Build an individual regressor for each output label [23].
- Use an off-the-shelf regression algorithm as the base regressor.
Regressor Chain (RC) method
- Select a random chain of target labels; the input set is transformed by cascading a label from the chain [31].
- Label dependency is incorporated through the sequence of the cascaded chain.

Algorithm Adaptation Approach
Multi-target regression tree method
- Build a tree regressor for multiple outputs as an extension of the single-output approach [32].
Rule induction method
- Build an ensemble of regression trees using a set of rules and select the best subset [32].
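The two Problem Transformation methods in Table I can be sketched with scikit-learn's MultiOutputRegressor (Single-Target) and RegressorChain (Regressor Chain). This is only an illustrative sketch: the data are synthetic stand-ins rather than AMPds2 or ECO measurements, and the number of trees is an arbitrary assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor, RegressorChain

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption, not AMPds2/ECO): 200 instances,
# k = 4 aggregate features and m = 3 appliance power targets.
X = rng.random((200, 4))
Y = np.column_stack([
    100.0 * X[:, 0],
    50.0 * X[:, 1] + 10.0 * X[:, 2],
    20.0 * X[:, 3],
])

base = RandomForestRegressor(n_estimators=20, random_state=0)

# Single-Target (ST): one independent base regressor per appliance label.
st_model = MultiOutputRegressor(base).fit(X, Y)

# Regressor Chain (RC): each regressor also receives the earlier targets
# of the chain as inputs; order="random" draws a random chain of labels.
rc_model = RegressorChain(base, order="random", random_state=0).fit(X, Y)

st_pred = st_model.predict(X[:5])
rc_pred = rc_model.predict(X[:5])
print(st_pred.shape, rc_pred.shape)
```

Both models return one estimated power value per appliance label for every aggregate instance, which is the output format described for the multi-target data set.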

B. DESIGN PROCESS
The design process in this work follows the key procedure of machine learning design to optimize the performance of appliance load prediction. The process includes (1) Data processing: to clean up, aggregate, and build a multi-target regression (MTR) data set. (2) Features selection: to choose the combination of relevant electrical features that delivers the best data estimation performance. (3) Algorithm selection: to choose the best multi-target regressor from commonly used regression algorithms by comparing their predictive performance. (4) Model selection: to optimize the regression model through model parameter tuning. The process is finalized by comparing the performance of the proposed approach with that of existing ones. The experimental design process is summarized in Fig. 1.

C. DATA AND MULTI-TARGET DATA SET
The data sets publicly available for NILM research differ in sampling frequency, monitored electrical features, and set of appliances. This work evaluated the AMPds2 data [33] and the ECO data [34], since both provide several electrical features for the mains meter (aggregate data), which were useful for features selection, and load power data (P) for all sub-meters. AMPds2 is a collection of measurements for 20 load labels with important electrical features, for example, current, power (active, reactive, apparent), and power factor, from a house in Canada over a period of 2 years at a 1-minute sampling interval. The data set contains some appliance labels with a single appliance per label, for example, Clothes Washer, Heat Pump, Wall Oven. Other labels contain multiple appliances within a label, for example, Basement Room, Home Office, Entertainment Plug. ECO is a collection of measurements of some appliance loads from 6 houses in Switzerland over 8 months.
The measurement consists of current, voltage, and their phase shift from the three phases of the mains at a 1-second sampling interval. After cleaning the data and concatenating the aggregate data (x) with the set of output targets (y), i.e., the power consumption of each appliance label, the multi-target data set takes the form illustrated in Fig. 2. The data set has m appliance labels and N instances with the observation of the I, P, Q, and PF features for the aggregate data.
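The concatenation of the aggregate features (x) with the appliance targets (y) can be sketched with pandas as follows. The column names, readings, and timestamps are hypothetical and do not come from AMPds2 or ECO; dropping incomplete rows is only one possible cleaning choice.

```python
import pandas as pd

# Hypothetical 1-minute mains readings (aggregate features I, P, Q, PF).
mains = pd.DataFrame(
    {
        "I": [1.2, 5.8, None],          # one missing reading to clean up
        "P": [140.0, 690.0, 150.0],
        "Q": [20.0, 35.0, 22.0],
        "PF": [0.99, 0.998, 0.99],
    },
    index=pd.date_range("2013-01-01", periods=3, freq="min"),
)

# Hypothetical sub-meter power (W) for two appliance labels.
submeters = pd.DataFrame(
    {"fridge": [120.0, 125.0, 118.0], "kettle": [0.0, 550.0, 0.0]},
    index=mains.index,
)

# Concatenate x and y on the timestamp, then drop incomplete instances.
mtr = pd.concat([mains, submeters], axis=1).dropna()

X = mtr[["I", "P", "Q", "PF"]].to_numpy()   # N x k input matrix
Y = mtr[["fridge", "kettle"]].to_numpy()    # N x m target matrix
print(X.shape, Y.shape)
```

Each remaining row is one instance j, with k = 4 input features followed by the m target power values.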

D. EVALUATION TOOLS AND PERFORMANCE INDEX
FIGURE 1. Experimental design process: data processing (MTR data set), features selection, algorithm selection, and model selection.
VOLUME XX, 2017
FIGURE 2. A sample illustration of multi-target regression data set
This work adopted sklearn.multioutput, a module for learning and evaluating multiple-output problems, from the scikit-learn package (ver. 0.24.2) [35] with Python 3.7. The data learning methods available there for multi-target regression problems belong to the problem transformation approach and consist of MultiOutputRegressor for Single Target (ST) and RegressorChain for Regressor Chain (RC). Both methods are meta-estimators that require a base regressor in the model construction to extend single-output regressors to multi-output regressors. For the algorithm adaptation approach, we used the multi-target regression tree and rule induction methods from CLUS [32], a Java-based data learning API for learning predictive clustering trees, which are extensible to multi-task problems. The performance evaluation of a regression algorithm measures how close the estimated values are to the true values. There are 2 common performance indexes for energy disaggregation tasks, as follows. 1) Disaggregation accuracy: This index defines the accuracy of power prediction as the complement to one of the difference between the predicted power and the true power over a given period [36]. We consider this index instead of the Root Mean Square Error (RMSE) or the Mean Absolute Error (MAE) since it provides a relative value rather than an absolute one, so the errors of load labels with different power consumption levels are scaled and can be compared. The overall disaggregation accuracy can be expressed as (3).
$$\mathrm{Acc.} = 1 - \frac{\sum_{t=1}^{T}\sum_{i=1}^{n} \left| \hat{y}_i^{(t)} - y_i^{(t)} \right|}{2 \sum_{t=1}^{T}\sum_{i=1}^{n} y_i^{(t)}} \tag{3}$$
where $\hat{y}_i^{(t)}$ and $y_i^{(t)}$ are the predicted and true power data of the $i$-th appliance label at instance $t$ within a period of $T$ instances, and $n$ is the number of appliance labels. For individual label accuracy, the equation can be applied by omitting the summation over the $n$ appliances. 2) F-score: A performance measure that calculates the harmonic mean of precision (P) and recall (R). The F-score was evaluated by macro-average whereby, for a binary identification of the power 'On' and 'Off' classes, the measure calculates P and R for each class and averages over both classes, as illustrated in (4).
In (4), $P_c$ and $R_c$ are the precision and recall for each class $c$ of data in the binary identification, and $k$ is the number of appliance labels.
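A minimal sketch of the disaggregation accuracy is given below. It assumes the widely used estimation-accuracy form associated with [36], i.e., one minus the total absolute error divided by twice the total true power; the function name and the sample values are hypothetical.

```python
import numpy as np


def disaggregation_accuracy(y_true, y_pred):
    """Return 1 - sum|y_hat - y| / (2 * sum(y)) over all instances and labels.

    Passing a single column gives the individual label accuracy.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 1.0 - np.abs(y_pred - y_true).sum() / (2.0 * y_true.sum())


# Two instances (rows) and two appliance labels (columns), in watts.
y_true = np.array([[100.0, 0.0], [100.0, 50.0]])
y_pred = np.array([[90.0, 0.0], [110.0, 40.0]])

overall = disaggregation_accuracy(y_true, y_pred)            # all labels
label0 = disaggregation_accuracy(y_true[:, 0], y_pred[:, 0])  # first label only
print(overall, label0)
```

In this toy case the total absolute error is 30 W against 250 W of true power, so the overall value is 1 - 30/500 = 0.94.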
Since the primary output of the regression algorithm is the estimated power data for each appliance label, this data needs to be translated into a binary power state {0,1} that indicates whether the appliance is OFF or ON. A basic strategy is to use a threshold value to determine the power state from the ground-truth power data, then apply the same value to the estimated power data. The F-score is then evaluated from the ground-truth and predicted power states, which give the precision and recall values used in (4). The identification of the power state ($p_{jk}$) can be defined by (5).
$$p_{jk} = \begin{cases} 1, & P_{jk} \ge P_{Th(k)} \\ 0, & \text{otherwise} \end{cases} \tag{5}$$
for the $j$-th data instance and $k$-th appliance label, where $P_{jk}$ is the (ground-truth or estimated) power data and $P_{Th(k)}$ is the threshold power consumption of appliance label $k$. The threshold is derived from the current consumption data that corresponds to the power value, which differentiates the appliance operating state of 'OFF' ('0') or 'ON' ('1').
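The thresholding step and the macro-averaged F-score can be sketched as follows. Several points are assumptions rather than the paper's exact procedure: the threshold values are hypothetical, the comparison is taken as "greater than or equal", and the macro-average uses scikit-learn's definition (the per-class F-scores of OFF and ON are averaged), after which the label scores are averaged.

```python
import numpy as np
from sklearn.metrics import f1_score


def power_states(power, thresholds):
    """Map power data (N x m) to binary states: 1 (ON) if power >= threshold."""
    return (np.asarray(power) >= np.asarray(thresholds)).astype(int)


thresholds = np.array([10.0, 100.0])  # hypothetical per-label thresholds (W)
y_true = np.array([[0.0, 500.0], [60.0, 0.0], [55.0, 520.0], [0.0, 0.0]])
y_pred = np.array([[5.0, 480.0], [40.0, 20.0], [8.0, 300.0], [12.0, 0.0]])

# The same thresholds are applied to the ground-truth and estimated power.
s_true = power_states(y_true, thresholds)
s_pred = power_states(y_pred, thresholds)

# Macro-averaged F-score over the OFF/ON classes for each label,
# then the mean over the appliance labels.
per_label = [
    f1_score(s_true[:, k], s_pred[:, k], average="macro")
    for k in range(s_true.shape[1])
]
overall_f = float(np.mean(per_label))
print(per_label, overall_f)
```

Here the second label is identified perfectly, while the first label has one false ON and one missed ON, which lowers its macro F-score to 0.5.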

IV. RESULTS AND DISCUSSION
This section describes the experimental procedures and results, starting with the evaluation of the AMPds2 data set. We took 3 months of data (129,600 samples) to evaluate the optimal components in data learning. The evaluation was conducted using 10-fold cross-validation (K=10); each performance value thus consists of the mean score and its standard deviation. Each procedure is presented in the following subsections.

A. FEATURE SELECTION
The multi-target data set was created using a set of electrical features that could indicate the presence of appliance operation: current (I), active power (P), reactive power (Q), and power factor (PF). The Single-Target method as the multi-output regressor and the Random Forest Regressor as the base regressor were used for this test. To evaluate the performance of each set of features, the disaggregation accuracy was examined using the forward selection technique based on the paired t-test [37], which determines whether a candidate set of features differs from the others with statistical significance. The result presented in Table II showed that the combination of all 4 electrical features delivered the best disaggregation performance; the bold figure indicates the best value. This result also shows an advantage of the multi-output learning framework, where more relevant features can be incorporated for higher predictive performance. This feature set is therefore used in the following experiments.
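One plausible reading of forward selection with a paired t-test is sketched below: at each step the feature that gives the highest mean cross-validated accuracy is tried, and it is kept only if the per-fold scores improve significantly over the current set. The significance level, the synthetic data, the small forest size, and all function names are assumptions for illustration, not the exact procedure of [37].

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.multioutput import MultiOutputRegressor


def disaggregation_accuracy(y_true, y_pred):
    return 1.0 - np.abs(y_pred - y_true).sum() / (2.0 * np.sum(y_true))


def fold_scores(X, Y, cols, cv):
    """Per-fold disaggregation accuracy of the ST model on the given columns."""
    model = MultiOutputRegressor(
        RandomForestRegressor(n_estimators=10, random_state=0))
    scorer = make_scorer(disaggregation_accuracy)
    return cross_val_score(model, X[:, cols], Y, cv=cv, scoring=scorer)


def forward_selection(X, Y, names, alpha=0.05):
    # Fixed folds, so the scores of two feature sets are paired fold by fold.
    cv = KFold(n_splits=10, shuffle=True, random_state=0)
    selected, best = [], None
    remaining = list(range(X.shape[1]))
    while remaining:
        trials = {j: fold_scores(X, Y, selected + [j], cv) for j in remaining}
        j_best = max(trials, key=lambda j: trials[j].mean())
        cand = trials[j_best]
        if best is not None:
            # Keep the new feature only if the gain is statistically significant.
            _, p_value = ttest_rel(cand, best)
            if cand.mean() <= best.mean() or p_value >= alpha:
                break
        selected.append(j_best)
        remaining.remove(j_best)
        best = cand
    return [names[j] for j in selected], best


rng = np.random.default_rng(0)
X = rng.random((300, 4))  # synthetic stand-ins for I, P, Q, PF
Y = np.column_stack([100 * X[:, 1] + 5, 40 * X[:, 2] + 5])
features, scores = forward_selection(X, Y, ["I", "P", "Q", "PF"])
print(features, scores.mean(), scores.std())
```

In this synthetic case only the second and third columns carry information about the targets, so they are the first two features picked; on real mains data the outcome depends on how much each electrical feature contributes.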

B. LEARNING ALGORITHM SELECTION
This experiment aimed to choose the best regression algorithm for load power estimation. A set of common single-output regressors was used as the base regressor for the Problem Transformation approach. The candidate algorithms ranged from the linear model of Ridge Regression and the non-linear model of Support Vector Machine to the ensemble algorithms of Random Forest Regressor and Gradient Boosting Regressor, for both the Single-Target and Regressor Chain methods. For the Algorithm Adaptation approach, two algorithms based on Regression Tree were deployed. Table III showed that the Single-Target method using Random Forest as the base regressor provided the best disaggregation accuracy. Random Forest is an ensemble learning method that was shown to perform well in previous research [38,39]. Although the Single-Target method ignores label dependency, it still performed well. A possible explanation is that the data set contained 20 appliance labels, a relatively high number, so revealing the correlation among appliances could be difficult for the model. However, this behavior can differ for data sets with different characteristics, such as the appliance power profiles or the set of electrical features under measurement.

C. MODEL SELECTION
This experiment aims to tune the model performance over a given range of model parameters using the Grid Search method in scikit-learn. The number of trees (n_estimators) and the maximum depth of the trees (max_depth) were tuned. These 2 parameters affect the way the Random Forest Regressor builds its decision trees. Table IV shows that the performance can be increased with a new set of model parameters.
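A grid search over these two parameters can be sketched with scikit-learn's GridSearchCV. The grid values, the synthetic data, the 3-fold setting, and the default R^2 scoring are assumptions chosen to keep the example small; they are not the ranges or the scoring used in the experiment.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.random((120, 4))                            # synthetic aggregate features
Y = np.column_stack([100 * X[:, 1], 40 * X[:, 2]])  # synthetic appliance power

model = MultiOutputRegressor(RandomForestRegressor(random_state=0))

# Parameters of the wrapped base regressor are addressed as "estimator__<name>".
param_grid = {
    "estimator__n_estimators": [10, 30],  # hypothetical grid values
    "estimator__max_depth": [5, None],
}

# With no explicit scoring, GridSearchCV uses the estimator's own score (R^2).
search = GridSearchCV(model, param_grid, cv=3)
search.fit(X, Y)
print(search.best_params_, search.best_score_)
```

The `estimator__` prefix is needed because the Random Forest is nested inside the multi-output meta-estimator; a custom scorer such as the disaggregation accuracy could be passed through the `scoring` argument.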

D. EXECUTION TIME
Using the system configuration obtained above, this experiment evaluated the execution time for training and testing. The 3-month samples (129,600 instances) were split 80% and 20% for training and testing, respectively. We used the Python function time.time() to record the start and stop times and computed the elapsed time of the training and testing processes. The experiment was run on a 64-bit Windows 10 PC with an AMD Ryzen 5 processor (2.1 GHz) and 8 GB of memory. All evaluations were executed using a single thread. The training and testing times presented in Table V are the mean and standard deviation over 10 runs of fitting the model and predicting the test data.
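The timing procedure can be sketched as follows, assuming a chronological 80/20 split and a small synthetic data set; the forest size and the data are placeholders, so the measured times are not comparable with Table V.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.random((500, 4))
Y = np.column_stack([100 * X[:, 1], 40 * X[:, 2]])

# 80/20 split; shuffle=False keeps the time order (an assumption).
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, shuffle=False)

train_times, test_times = [], []
for run in range(10):
    # n_jobs=1 keeps the base regressor on a single thread.
    model = MultiOutputRegressor(
        RandomForestRegressor(n_estimators=10, n_jobs=1, random_state=run))
    t0 = time.time()
    model.fit(X_tr, Y_tr)
    t1 = time.time()
    model.predict(X_te)
    t2 = time.time()
    train_times.append(t1 - t0)
    test_times.append(t2 - t1)

print(f"train: {np.mean(train_times):.4f} +/- {np.std(train_times):.4f} s")
print(f"test:  {np.mean(test_times):.4f} +/- {np.std(test_times):.4f} s")
```

For short intervals, time.perf_counter() offers a higher-resolution clock than time.time(), which may matter when the prediction step takes only a few milliseconds.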

E. TRUE AND PREDICTED POWER DATA
This experiment plots the ground-truth (true) power against the predicted power data for some appliance labels, to visualize how well the predictions track the true values in the time series, as shown in Fig. 3. The estimation performed relatively well for high-power appliances (Clothes Dryer and Heat Pump), where the proportion of false positives was low. Low-power appliances, for example, the Kitchen Fridge, had a higher proportion of false predictions. For scenarios where two or more appliances were turned ON or activated simultaneously, the true and predicted power, together with the associated aggregate data, are illustrated for some of these events in Fig. 4 (a) and (b). Beyond the power estimation of a single appliance at given samples, this result shows the capability of the proposed approach to disaggregate multiple appliances from the aggregate data at any specific time instance.

F. LABEL ACCURACY AND PERFORMANCE COMPARISON FOR AMPds2 DATA SET
This experiment evaluated the predictive performance for some appliance labels and for the entire set of labels, using a train/test split of 0.8/0.2. The benchmark approaches were the Super-State Hidden Markov Model (SSHMM) [2,40] and the Denoising Autoencoder (DAE) network [18]. The SSHMM is an HMM-based variant that defines the power states from the combination of individual appliance statuses, and it was claimed to outperform some other HMM-based variants. The DAE approach used the network architecture presented in [18], which was shown to outperform the other network topologies. The implementation used the neuraldisaggregator referenced from [41] and was executed using Keras/Tensorflow and NILMTK [42]. The Denoising Autoencoder network aimed to reconstruct the clean power data of the target appliance from the aggregate measurement.
The aggregate data was considered noisy because it also contains data generated by other appliances. We extended the range of the AMPds2 data to 1 year (the first year of the entire 2-year range) and used the denoised version of the data, in which the unmetered power data is subtracted from the actual aggregate measurement, for a fair comparative experiment. The evaluation results for disaggregation accuracy and F-score are shown in Table VI and Table VII, respectively. The DAE network learns from sequences of data in which the appliance power is activated, rather than from individual samples as the proposed approach does. Thus, for the same amount of training samples in this experiment, the proposed MTR approach provided better performance than the DAE network. Typically, this network type requires a large number of training samples and empirical adjustment of the network parameters to perform well [18].
Overall, the disaggregation accuracy and F-score of the high-power appliances (Clothes Dryer, HVAC/Furnace, and Heat Pump) were higher than those of the lower-power appliances (Kitchen Fridge, Clothes Washer) and of the labels combining several appliances (Basement, Home Office), which aligns with the results of previous studies [1,13].

G. PERFORMANCE EVALUATION OF ECO DATA SET
The data set from House 2 was selected for evaluation since its proportion of unmetered power data was the lowest among the six houses, which benefits data training and model performance. The data set was the noisy version (unmetered data was not excluded); it was resampled to a 1-minute interval and split 0.8/0.2 for training and testing. The same multi-target regression algorithm and model configuration were adopted for data learning. Visualizations of the predictions and ground-truth data for individual appliance operations and for simultaneous power activations are shown in Fig. 5 and Fig. 6, respectively. Interestingly, the low-power Lamp could be disaggregated from the much higher power data of the Dishwasher. In addition, the model performed well in disaggregating the low-power appliances Lamp and TV. Table VIII and Table IX present the predictive performance values, which were comparable to those of the SSHMM and DAE approaches for some appliance labels and better for the overall performance.
The performance values for Fridge and Freezer were quite high for all three approaches, since these appliances operate continuously, which helps to generate a sufficient number of training samples. The Kettle and Audio obtained lower performance than the other labels. Although the Kettle was a high-power appliance, it was rarely activated, and its energy consumption contributed only 3.99% of the whole house [34]. Thus, the rare positive training samples could hardly produce a model with high predictive performance. Audio was a low-power appliance that normally contributed small changes to the aggregate data, which made it difficult for the model to detect its presence.
For the proposed Multi-Target Regression approach, we can summarize the key advantages over its peers as follows. (1) It can incorporate several relevant features in the data training process. This could help the model learn the data better, unlike the HMM-based or DAE approaches that used just the power data (P) feature for model training. The direct data regression with less data processing could also help reduce estimation loss. (2) The approach is more computationally efficient and requires fewer model configurations, namely only the multi-output regressor and the base regressor model parameters. Some limitations of the approach, however, are as follows. (1) It requires fully labeled data for training whenever the set of target appliances changes, which is a property of supervised learning. To ease the issue of intensive training, the semi-supervised concept, which uses partially labeled data together with unlabeled data in the training process [43], can be employed. (2) The performance is limited for appliances with multiple power states (like clothes washers or dishwashers). Multi-state operation generates multiple levels of power data, which makes it harder for the regression model to estimate the power demand accurately. Adding more training samples associated with the relevant power levels can help improve the predictive performance.

V. CONCLUSION
The multi-target regression approach shows suitable capabilities for NILM applications, in which it performs data learning tasks for estimating the power data of multiple appliance labels. The experimental results illustrate the evaluation procedure for obtaining the optimal learning components for the best load disaggregation performance. Using the electrical feature set of I-P-Q-PF and the multi-target regression model with Random Forest as the base regressor, the predictive performance evaluated on the AMPds2 data set reached 92.2% and 91.9% for the overall disaggregation accuracy and F-score, respectively. For the ECO data set, the overall disaggregation accuracy and F-score reached 85.5% and 83.3%, respectively. The results outperformed the benchmark SSHMM and DAE network approaches overall and for most individual appliance labels.
The system design procedure can be applied to individual home energy management. The proposed experimental process would be split into 2 phases. The first phase is data acquisition and training, which collects the data and performs data learning following the procedure described above. The second phase applies the model to new or unknown aggregate data to obtain the power estimation and infer the status of appliances.