Non-intrusive adaptive load identification based on Siamese network

Traditional non-intrusive load monitoring (NILM) algorithms are mostly based on classification models, which have several deficiencies. First, a large amount of labeled data is required to train the classification model. Second, these algorithms cannot identify unknown devices, which are frequently encountered in practical applications. Finally, these models generalize poorly: they adapt only to the data on which they were trained. These shortcomings greatly limit the practicality of such NILM algorithms. To tackle these problems, this paper proposes a non-intrusive adaptive load identification model based on the Siamese network, which uses both the V-I trajectory and the active power as load signatures. The Siamese network is used to calculate the similarity of V-I trajectories, and load identification is realized by matching the signature against a feature library. By dynamically adding new features to the feature library, unknown loads can be identified. In addition, the Siamese network is a typical network for few-shot learning, so the proposed model can be trained with a small number of samples and still achieve good recognition results. Finally, the validity and versatility of the model are verified on the PLAID and COOLL datasets.


I. INTRODUCTION
Electricity, as the main secondary energy source, is a major form of energy consumption and is closely related to our lives. With economic and social development, the demand for electricity keeps increasing [1]. To cope with the energy crisis and environmental problems [2], new energy technologies have been developed vigorously, and the share of new energy (wind, solar, etc.) in the grid has been increasing in recent years. To absorb new energy and increase energy efficiency, there is an urgent need to manage electricity consumption.
Grasping load information is of great significance to load management. In recent years, non-intrusive load monitoring (NILM) technology has attracted extensive attention. Traditional intrusive load monitoring requires installing collection and communication devices in each electrical appliance to detect the load status; existing appliances or circuits could be damaged in the process, so it is difficult to carry out. NILM instead analyzes the status of each load by monitoring the power bus, and thus has advantages in versatility and cost [3].
NILM can not only provide accurate load information to the grid to improve efficiency [4], but also help grid operators formulate policies that guide consumers to use electricity reasonably. For consumers, non-intrusive monitoring shows detailed information on household power consumption in real time, which can be used to optimize consumption and reduce electricity expenses. For equipment manufacturers, equipment status and energy consumption information can be used to detect failures and develop predictive maintenance. Research on NILM is therefore of great significance.
Motivated by these benefits, many researchers have proposed NILM methods. With the rapid development of neural networks in particular, a number of neural-network-based NILM algorithms have emerged, and some of them achieve high precision. However, most of them share the following problems. First, a large amount of labeled data is required to train the model, which is difficult to obtain in reality. Second, new loads cannot be recognized by these algorithms. Third, different households have different loads, and it is impractical to build a model and construct a training set for each household. These problems reduce the versatility of the models and greatly restrict their practical use. In recent years, some researchers have noticed these problems and tried to solve them. In [5], a transfer learning method was proposed, realized by freezing the weights of the CNN layers in the network. However, labeled data is still required after migration to train the fully connected layers. A semi-supervised method was proposed in [6], but the result was not ideal.
Few-shot learning is one of the main research directions in deep learning. Traditional neural network models require a large amount of labeled data for learning. In practice, however, the cost of obtaining labeled data is usually high, and in some cases it is impossible to obtain sufficient data, which severely limits the application of these models. The goal of few-shot learning is to solve the problem of poor model performance under limited training data. Its achievements have been widely used in many fields, such as face recognition and signature verification.
Therefore, in this paper, we propose a model based on the Siamese network, a typical few-shot learning architecture. The Siamese network is used to calculate the similarity between the test sample and each entry of the feature library, one by one, in order to identify the appliance or detect a new appliance. This greatly reduces the dependence on training data. In addition, the Siamese network in this paper mainly consists of convolutional neural networks, which are widely used in transfer learning because of their transferability. Moreover, the acquisition of the V-I trajectory has no direct relationship with the power frequency or the sampling frequency. Benefiting from these properties, our model achieves excellent versatility.
The main contributions of this work are as follows: 1) Good identification results can be achieved even with limited training data by using the Siamese network. 2) The model can automatically identify unknown loads and add them to the feature library. 3) When applied to other households, no additional training is required, which means the model has strong versatility. The rest of this paper is organized as follows. Section II gives an overview of NILM and few-shot learning, and introduces the motivation and applications of Siamese networks. Section III explains the details of the proposed model. Section IV demonstrates its validity on the PLAID and COOLL datasets. Section V concludes the paper.

A. Non-Intrusive Load Monitoring (NILM)
The concept of NILM was first proposed by Hart in 1992 [7]. In recent years, with the development of computer technology and the popularization of smart meters, NILM has become a research hotspot, and many monitoring algorithms have been developed. These methods can be divided into methods based on low-frequency data (generally less than the power frequency) and high-frequency data (generally over power frequency, usually exceeding 1 kHz) according to the sampling frequency of the data used.
For low-frequency data, NILM methods can be divided into methods based on combinatorial optimization (CO) [3] [4], methods based on matching [8] [9], and methods based on classification [10] [11]. In [3], NILM problems were regarded as classical convex optimization problems, which can be solved with the mixed integer linear programming (MILP) solver CPLEX to enhance calculation efficiency. [4] made further improvements on combinatorial optimization: the Karhunen-Loève Expansion (KLE) was used to extract load features, and the authors also used pre-elimination to reduce appliance combinations and combined it with Appliance Usage Patterns (AUPs) to improve identification accuracy. In [8], a background filtering algorithm was proposed to avoid the difficulty of obtaining labeled aggregated data. This algorithm can complete model training using only aggregate data and the operating curves of the target device. The method in [9] is similar, but it proposed data pre-processing and post-processing steps that effectively improve the recognition results for multi-state electrical appliances. [10] and [11] regarded load identification as a multi-label classification problem. [10] proposed a method based on multi-label sparse representation classification and verified it on the REDD and Pecan Street datasets; compared with other methods, it required less training data. [11] proposed the UNet-NILM method, which draws on the UNet architecture used for image segmentation. In addition to classification information, the network also outputs electrical power information. The UK-DALE dataset was used to demonstrate its effectiveness.
Compared with low-frequency data, high-frequency data can provide more load features. [12] and [13] improved on the harmonic method. [12] carried out a detailed harmonic analysis for various light bulbs and used the total harmonic distortion rate, power, and current harmonics as features to calculate similarity scores for load identification. [13] proposed a method based on lower odd-numbered harmonics and a bagging decision tree (FFT_BDT), which consists of two steps: obtaining the magnitude and phase at the lower odd-numbered harmonics, and using a bagging decision tree to recognize loads. A voltage-current (V-I) image based method was proposed for NILM in [14]. In that work, the reconstructed image of a V-I trajectory was used as input to a convolutional neural network (CNN) to classify the appliances, particularly resistive appliances. On the PLAID and IDOUC datasets, the method performed better than two other methods. [15] proposed a two-stream convolutional neural network based on current time-frequency feature fusion for non-intrusive load identification. First, a time series image coding method was proposed to extract the time-domain and frequency-domain features. Then a two-stream neural network combining a gated recurrent unit (GRU) and a 2D-CNN was used to improve load identification performance. Finally, the method was validated on the PLAID and IDOUC datasets. Faustine and Pereira [16] used Fryze current decomposition theory to decompose the load current into active and reactive components, used the Euclidean distance function to obtain the distance similarity matrix, and finally used a CNN to identify the loads.
High-frequency data contains more information about an appliance, which provides a stronger basis for load identification. In the referenced papers, the precision of methods based on high-frequency data is generally much higher than that of low-frequency methods. In addition, with the development of embedded chips and edge computing technology, terminal devices such as smart meters have increasingly powerful functions and computing capabilities, making it easier to obtain high-frequency sampling data. Therefore, it is of great significance to study appliance identification based on high-frequency data.

B. Few-Shot Learning (FSL)
Although some of the above algorithms achieve high precision, they require a large amount of labeled data to train the model, and it is usually difficult to obtain sufficient training data in practice. In image processing, humans can easily extract key information from a small number of samples, while traditional neural networks require a large amount of training data to learn feature representations. Based on this observation, the concept of few-shot learning was proposed and has become a focus of AI research in recent years [17].
One of the topics in few-shot learning is solving the over-fitting problem in model training. When data is scarce, a larger neural network is prone to over-fitting during training. The simplest and most direct remedy is to use data augmentation techniques and regularization methods. In [18], a few layers were added as a decoder after the traditional graph convolutional network (GCN) to turn the reconstruction error into a regularization term, which effectively improved the accuracy of the network.
Although data augmentation and regularization can alleviate over-fitting, it is still difficult to achieve the desired performance. The biggest difference between human learning and neural networks is that human learning builds on previous cognition. Based on this, many researchers have proposed methods for learning and using prior knowledge. In [19], meta-learning was used to achieve few-shot learning: the idea was to pre-train a general model suitable for multiple tasks and use it as the initial model for few-shot learning, so that the desired results can be achieved within a few iterations. In [20], memory modules were used to store prior knowledge. In [21], the goal of the model was to learn a metric that maps the original data into a high-dimensional feature space; in other words, the model was used to extract features. Methods such as k-Nearest Neighbor (KNN) or Euclidean distance were then used for classification.
The Siamese network is a typical metric learning method. It consists of two branches with the same structure and shared weights. It accepts two inputs of the same kind and maps each of them to a new space through one branch. Their similarity is calculated through the loss calculation. The Siamese network is especially suitable for scenarios in which two inputs are comparable, such as face recognition and fingerprint verification. In [21], a Siamese network was used for one-shot recognition on the Omniglot dataset and performed much better than other models. [22] used a Siamese network to detect anomalies in an industrial cyber-physical system, which significantly reduced the demand for training samples. [23] proposed a NILM model based on a Siamese network and the DBSCAN clustering method. It used the Siamese network to map V-I trajectory features into a newly learned feature space; clustering was then performed by DBSCAN, allowing the method to assign appliance samples to clusters or label them as "unidentified". However, the method in [23] can only detect unknown devices and cannot distinguish among them. It also assumes that most of the device information in the scene is known, so there is still much room for improvement. The method proposed in this paper intends to make up for these shortcomings.

III. THE PROPOSED LOAD MONITORING METHOD BASED ON SIAMESE NETWORK
In this section, we first illustrate the overall process of the proposed load identification method. Each part of the process is then introduced in turn.

A. THE PROCESS OF THE ALGORITHM
The load identification process is shown in FIGURE 1. A load separation algorithm is used to detach individual load data from the power bus. This article focuses on load identification, so it is assumed that the detached load data has already been obtained. In addition, so that the V-I trajectories of different electrical appliances can be compared, the voltage and current are normalized when obtaining the V-I trajectory (see Section B), which means that the V-I trajectory does not contain power information. Therefore, this paper takes the V-I trajectory combined with the active power as the signature of an electrical appliance, so that appliances with similar V-I trajectories but different active power can be distinguished.
After obtaining the detached load data, we first check whether a load feature library exists. If it does not (for example, when the model first starts working or when it is migrated to a new home), we build the load feature library, add the load feature (V-I trajectory and active power) to it, and obtain the appliance code.
If the load feature library exists, the trained Siamese network (see Section D) is used to calculate, in turn, the similarity between each V-I trajectory in the feature library and the trajectory of the load. When the highest similarity does not meet the threshold requirement, the load is considered a new load; its trajectory and power are added to the feature library, and a load number is assigned. Otherwise, the active power is compared. Observation of the load data shows that when the load power is small, the power fluctuation is generally relatively large, especially for low-power appliances that contain power converters. An example is the laptop in the PLAID dataset, whose power ranges from 15 W to 40 W in many houses while its V-I trajectories remain similar. High-power appliances, on the other hand, work in steady state with very small power fluctuations. Therefore, when the V-I trajectories are similar, the allowable power variation range can be widened for low-power appliances when comparing power, whereas for high-power appliances the allowable range is smaller. Hence, we divide the appliances into two categories and set different thresholds.
In FIGURE 1, P is the active power of the load to be identified and P is the active power of the load in the feature library; TH0, TH1, TH2 and Pthresh are parameters to be set, which are discussed in Section IV.
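The matching procedure described above can be sketched as follows. This is a minimal illustration rather than the exact flow of FIGURE 1: the names `LibraryEntry`, `identify`, `th_low` and `th_high` are hypothetical, the `similarity` callable stands in for the trained Siamese network, and the power check is assumed to be a relative deviation compared against a tolerance chosen by whether the library entry is below or above `p_thresh`.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Image = Sequence[Sequence[float]]


@dataclass
class LibraryEntry:
    code: int          # appliance code assigned when the entry was created
    trajectory: Image  # normalized V-I trajectory image
    power: float       # active power in watts


def identify(
    trajectory: Image,
    power: float,
    library: List[LibraryEntry],
    similarity: Callable[[Image, Image], float],
    th0: float,       # minimum V-I similarity for a match
    th_low: float,    # assumed relative power tolerance, low-power entries
    th_high: float,   # assumed relative power tolerance, high-power entries
    p_thresh: float,  # power separating low-power from high-power entries
) -> int:
    """Return the appliance code of a load, extending the library if needed."""

    def register_new_load() -> int:
        entry = LibraryEntry(code=len(library), trajectory=trajectory, power=power)
        library.append(entry)
        return entry.code

    # No library yet (first start or migration to a new home): create it.
    if not library:
        return register_new_load()

    # Compare the load against every library entry, best match first.
    scored = sorted(
        ((similarity(entry.trajectory, trajectory), entry) for entry in library),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if scored[0][0] < th0:
        return register_new_load()

    # Assumption: among sufficiently similar entries, accept the first one
    # whose active power is within the allowed relative deviation.
    for score, entry in scored:
        if score < th0:
            break
        tolerance = th_low if entry.power < p_thresh else th_high
        deviation = abs(power - entry.power) / max(abs(entry.power), 1e-9)
        if deviation <= tolerance:
            return entry.code
    return register_new_load()
```

Because the library starts empty and grows only when no entry matches, the same function covers both the first run in a new household and later identification.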

B. V-I TRAJECTORY ACQUISITION
The V-I trajectory is a trajectory diagram drawn with voltage as the abscissa and current as the ordinate. It contains abundant load characteristics in the steady state, which is the basis for distinguishing the type of electrical appliances. FIGURE 2 shows the process of obtaining V-I trajectories from the voltage and current waveforms.
Different loads may differ greatly in active power. To compare V-I trajectories, the voltage and current of each load are normalized. At the same time, to prevent border effects in the convolutional network, the denominator is appropriately amplified during normalization.
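A minimal sketch of this normalization is given below. It assumes that each waveform is divided by its peak absolute value multiplied by a factor slightly larger than 1; the function name `normalize` and the default `margin` of 1.1 are illustrative assumptions, not values taken from this paper.

```python
from typing import List, Sequence


def normalize(samples: Sequence[float], margin: float = 1.1) -> List[float]:
    """Scale a waveform into (-1, 1) using an enlarged denominator.

    With margin > 1 the normalized peak stays strictly inside the image
    border, which keeps the trajectory away from the edge pixels.
    """
    peak = max(abs(x) for x in samples)
    if peak == 0:
        return [0.0 for _ in samples]
    return [x / (peak * margin) for x in samples]
```

The same function would be applied separately to the voltage and the current of one load before the trajectory is drawn.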
Some electrical appliances have a large current change rate during operation, which leads to isolated data points in the V-I trajectory. As shown in FIGURE 3(a), these discrete points degrade the image processing (these points may be ignored). Therefore, so that it is possible to tell whether two points are adjacent in time sequence, the V-I trajectory drawing process has been improved. First, an n×n all-zero matrix is created (n is the size of the V-I image), and 1 is assigned to the positions corresponding to the voltage and current data. Then the distance between temporally adjacent points in the image is calculated. If the distance is more than 2, the pixels on the line between the two points are filled with 1. The result is shown in FIGURE 3(b).
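The improved drawing process can be sketched as below. The sketch assumes that voltage and current are already normalized to [-1, 1], that the pixel distance is Euclidean, and that the line between two points is filled by linear interpolation; the function name `vi_image` and the row/column orientation are illustrative choices.

```python
import math
from typing import List, Sequence


def vi_image(voltage: Sequence[float], current: Sequence[float], n: int = 32) -> List[List[int]]:
    """Rasterize a normalized V-I trajectory into an n x n binary image."""

    def to_pixel(x: float) -> int:
        # Map [-1, 1] to an integer pixel index in [0, n - 1].
        return min(n - 1, max(0, int((x + 1.0) / 2.0 * (n - 1) + 0.5)))

    image = [[0] * n for _ in range(n)]
    # Columns follow voltage (abscissa); rows follow current (ordinate),
    # flipped so that positive current is drawn towards the top.
    pixels = [(to_pixel(v), n - 1 - to_pixel(i)) for v, i in zip(voltage, current)]
    for col, row in pixels:
        image[row][col] = 1

    # Connect temporally adjacent points that are more than 2 pixels apart.
    for (c0, r0), (c1, r1) in zip(pixels, pixels[1:]):
        dist = math.hypot(c1 - c0, r1 - r0)
        if dist > 2:
            steps = int(math.ceil(dist))
            for s in range(1, steps):
                t = s / steps
                col = int(round(c0 + t * (c1 - c0)))
                row = int(round(r0 + t * (r1 - r0)))
                image[row][col] = 1
    return image
```

Filling is driven by the time order of the samples, so two pixels that are close in the image but far apart in time are not joined.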

C. SIAMESE NETWORK
The Siamese network, also known as the twin network, is based on the combined architecture of two CNNs. It accepts two samples as inputs and outputs the similarity of these two samples. Since the network accepts two inputs, samples can be combined in pairs to increase the number of training samples, which gives it great advantages in few-shot or one-shot learning. The Siamese network model of this paper is shown in FIGURE 4. Unlike the traditional Siamese network, this paper combines the two pictures (V-I trajectories) into one 2-channel picture. The advantage is that after the first convolution layer the features of the two images are already blended with each other, which is more conducive to the calculation of similarity.
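To illustrate why a 2-channel input blends the two images in the first layer, the sketch below stacks two trajectory images channel-last and evaluates a single convolution response that sums over both channels. The function names and the kernel are hypothetical, and this is not the architecture of FIGURE 4, only the input construction and one filter position.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]


def stack_pair(image_a: Image, image_b: Image) -> List[List[List[float]]]:
    """Combine two n x n images into one n x n x 2 array (channels last)."""
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("both images must have the same shape")
    return [
        [[a, b] for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


def conv_response(stacked, kernel, top: int, left: int) -> float:
    """Response of one k x k x 2 kernel placed at (top, left).

    The sum runs over both channels, so a single output value already
    depends on pixels from both trajectories.
    """
    total = 0.0
    for dr, kernel_row in enumerate(kernel):
        for dc, weights in enumerate(kernel_row):
            pixel = stacked[top + dr][left + dc]
            total += sum(w * p for w, p in zip(weights, pixel))
    return total
```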
The more similar the two V-I trajectories are, the closer the network output is to 1, so the task can be regarded as a binary classification problem. Therefore, binary cross-entropy can be used as the loss function, and its calculation is shown in (1),

$$L = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log \hat{y}_i + (1 - y_i)\log\left(1 - \hat{y}_i\right)\right] \tag{1}$$

where n represents the output size, and y_i and ŷ_i represent the label and the actual output, respectively.
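As a worked illustration of the binary cross-entropy loss, the following sketch computes the mean loss over n outputs. The clipping constant `eps` is an implementation assumption added to avoid taking the logarithm of zero.

```python
import math
from typing import Sequence


def binary_cross_entropy(labels: Sequence[float], outputs: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between labels (0 or 1) and outputs in [0, 1]."""
    if len(labels) != len(outputs) or not labels:
        raise ValueError("labels and outputs must be non-empty and equally long")
    total = 0.0
    for y, y_hat in zip(labels, outputs):
        p = min(max(y_hat, eps), 1.0 - eps)  # clip to keep log() finite
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / len(labels)
```

For a similar pair (label 1) an output near 1 gives a loss near 0, while an output of 0.5 gives log 2, roughly 0.693.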

PLAID [24]: Plug-Level Appliance Identification Dataset
The PLAID dataset is a public dataset designed for non-intrusive load monitoring research. It contains 11 different types of loads in 56 households in Pittsburgh, Pennsylvania, USA, with a sampling frequency of 30 kHz, and contains more than 200 electrical appliances and 1094 records. Each record includes several seconds of high-frequency voltage and current data and appliance labels.
COOLL [25]: Controlled On/Off Loads Library
The COOLL dataset was measured by the PRISME laboratory of the University of Orléans in France in 2016. Through precise control of the switching time, each electrical appliance was sampled 20 times with a phase step of 1 ms to obtain 20 records. The sampling frequency is 100 kHz, and 12 different electrical appliances are sampled.

B. PERFORMANCE METRICS
The F1 score is a statistical indicator used to measure the classification results of a model. It takes both the correct rate and the recall rate into account, and is calculated as shown in (2), (3) and (4),

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

where precision and recall refer to the correct rate and the recall rate respectively, and TP, FP, and FN refer to the numbers of true positive, false positive, and false negative cases.
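The F1 computation from TP, FP and FN can be sketched as follows. Returning 0 when a denominator is zero is a convention assumed here; the source does not say how such cases are handled.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 score from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, 8 true positives with 2 false positives and 2 false negatives give precision, recall and F1 all equal to 0.8.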
The model proposed in this paper uses the similarity of features to identify appliances. It accepts a pair of appliance features and outputs their similarity, which can be regarded as a binary classification problem. The F1 score indicates the model's performance on individual devices. Fmacro represents the average performance of the model over all types of electrical appliances, and is calculated as shown in (5),

$$F_{macro} = \frac{1}{N}\sum_{i=1}^{N} F_{1,i} \tag{5}$$

where N is the number of load categories and F_{1,i} is the F1 score of the i-th category.
In addition, because the output of the proposed model is the electrical appliance code, the model is meaningless if each appliance corresponds to too many codes. Ideally, each electrical appliance state corresponds to a unique code. To test the performance of the model on the same device, this paper proposes an indicator, whose calculation is shown in (6), where n is the number of codes recognized by the model for an appliance; ci is the number of samples for each code; c is the array of ci, that is, {c1, c2, …, cn}; top(c, i) represents the
It can be seen from the formula that the accuracy rises as cnt increases. However, a larger cnt means that an appliance corresponds to more identification codes, which makes it difficult to label the codes as a specific appliance and requires more annotation work in practice. Therefore, in this paper, the maximum of cnt is 5.
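A minimal sketch of the two aggregate indicators follows. The macro average is the plain mean of per-category F1 scores. For the code-count accuracy, the sketch assumes that top(c, i) denotes the i-th largest entry of c and that acc(cnt) is the fraction of an appliance's samples covered by its cnt most frequent codes; both points are assumptions, and the function names are hypothetical.

```python
from typing import Sequence


def macro_f1(f1_scores: Sequence[float]) -> float:
    """Average of the per-category F1 scores."""
    if not f1_scores:
        raise ValueError("at least one F1 score is required")
    return sum(f1_scores) / len(f1_scores)


def accuracy_with_codes(code_counts: Sequence[int], cnt: int) -> float:
    """Share of one appliance's samples covered by its cnt most frequent codes.

    code_counts[i] is the number of samples of the appliance that the
    model assigned to its i-th code (the array c in the text).
    """
    total = sum(code_counts)
    if total == 0 or cnt < 1:
        raise ValueError("need at least one sample and cnt >= 1")
    largest = sorted(code_counts, reverse=True)[:cnt]
    return sum(largest) / total
```

Under this reading, an appliance whose 100 samples were split over three codes as 90, 8 and 2 would score 0.90 for cnt = 1 and 0.98 for cnt = 2, which matches the statement that accuracy rises with cnt.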

C. TRAINING SET AND TEST SET
House6 in the PLAID dataset is used as the training set. It contains an Air Conditioner (AC), Compact Fluorescent Lamp (CFL), Fan, Fridge (FG), Hairdryer (HD), Laptop, etc. Based on observation of the V-I trajectories and power data of the appliances, the air conditioner and refrigerator have three and two states respectively, as shown in TABLE I. In this procedure, these states are regarded as independent electrical appliances.
Ten samples are obtained from each instance; each sample includes three periods of voltage and current data together with its label. Since the PLAID labels do not contain the state of the electrical appliance, the labels of appliances that have multiple states must be corrected. Samples of each type of electrical appliance are then randomly selected and combined into pairs. To keep the training set balanced, the number of pairs of same-type appliances and the number of pairs of different-type appliances should be equal. In this article, 10 pairs of samples are selected for each pair of different appliance types. That is, for each type of electrical appliance, there are 80 pairs of samples as negative samples. Correspondingly, 80 pairs of samples of the same type are selected as positive samples. The pairs are then divided into the training set, validation set, and test set in the ratio 6:2:2.
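The pair construction and the 6:2:2 split can be sketched as below. The sketch assumes that, for every type, 10 negative pairs are drawn against each other type and the same number of positive pairs is drawn within the type; with nine types this yields 80 negative and 80 positive pairs per type. The function names, the fixed seeds and the sampling with replacement across pairs are illustrative assumptions.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple

Pair = Tuple[Hashable, Hashable, int]  # (sample_a, sample_b, label)


def build_pairs(samples_by_type: Dict[str, Sequence[Hashable]],
                pairs_per_other_type: int = 10, seed: int = 0) -> List[Pair]:
    """Build a balanced list of positive (label 1) and negative (label 0) pairs."""
    rng = random.Random(seed)
    types = sorted(samples_by_type)
    pairs: List[Pair] = []
    for own_type in types:
        own = samples_by_type[own_type]
        for other_type in types:
            if other_type == own_type:
                continue
            for _ in range(pairs_per_other_type):
                # One negative pair: samples from two different types.
                other = samples_by_type[other_type]
                pairs.append((rng.choice(own), rng.choice(other), 0))
                # One positive pair: two distinct samples of the same type.
                a, b = rng.sample(list(own), 2)
                pairs.append((a, b, 1))
    return pairs


def split_622(pairs: Sequence[Pair], seed: int = 0):
    """Shuffle and split pairs into train, validation and test sets (6:2:2)."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Pairing is what enlarges the training set: ten samples per type already admit many more distinct pairs than there are individual samples.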
After the model training is completed, the data from the remaining houses in the PLAID dataset and the data from the COOLL dataset are used to test the model's performance and versatility.

D. MODEL PARAMETER SETTING
Several parameters need to be set in the proposed method, including the size of the V-I trajectory image and the thresholds in FIGURE 1.
First, in order to explore the influence of different V-I trajectory sizes on the model performance, the sizes of 16, 32, 64, 100 were used for verification, and the performance of the model was evaluated by comparing the average performance of the F1 scores of the other houses. The test results are shown in FIGURE 5. Each result is obtained from the mean of five runs.
It can be seen from the figure that when the V-I trajectory image is smaller than 32×32, the model performance is closely related to the image size. With an image size of 16×16, the V-I trajectory is too small to fully express the characteristics, and the model performs poorly. As the image size increases, the performance improves. When the image is larger than 32×32, it can express more detailed information (including noise). If the training samples are insufficient, this information cannot be exploited and may even degrade model performance. Therefore, the V-I trajectory image size finally selected is 32×32.
To find appropriate thresholds, we use house44 in PLAID as the validation data and the F1 score as the evaluation index. Similar results can be obtained in other houses that contain laptops. Most of the other houses, however, are sensitive to only one threshold; changes in the other thresholds hardly affect the evaluation results, so they cannot provide a basis for threshold selection. First, the results in FIGURE 6 and FIGURE 7 show the necessity of dividing electrical appliances into high-power and low-power appliances according to their power. FIGURE 6 shows how the F1 score changes with the threshold when a unified threshold (thresh_u) is used for all electrical appliances. In FIGURE 7, the electrical appliances are divided into high-power and low-power appliances with a dividing power of 100 W, and the thresholds are set separately as TH1 and TH2 in FIGURE 1; (a) and (b) in FIGURE 7 show the relationship between the F1 score and each threshold. It can be seen that setting different thresholds for appliances of different power has an obvious effect. TH1 and TH2 can be set to 0.5 and 1.5, respectively.
In order to verify whether it is reasonable to set a threshold of 100W to distinguish high-power and low-power appliances, we draw the curve of F1 score with Pthresh from 0 to 1000, as shown in FIGURE 8. It can be seen that the threshold of 100W is reasonable.
Finally, the remaining parameter to be determined is the threshold of V-I trajectory similarity, that is, TH0 in FIGURE 1. The relation between TH0 and the F1 score is shown in FIGURE 9. Unlike the previous tests, this test is performed on the entire PLAID dataset. There are several parameter combinations in the figure. Because many houses in PLAID do not contain a laptop, the influence of TH1 is masked. It can be seen that when TH0 is within a certain range, the final performance of the model is not sensitive to this threshold.
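The threshold studies in this section all follow the same pattern: vary one parameter, evaluate the F1 score on the validation data, and inspect the curve. A generic sketch of such a sweep is shown below; `evaluate` is a hypothetical callable that would run the identification procedure with the given threshold and return the resulting F1 score, and the toy function in the usage example is not real data.

```python
from typing import Callable, Iterable, List, Tuple


def sweep_threshold(evaluate: Callable[[float], float],
                    candidates: Iterable[float]) -> Tuple[float, float, List[Tuple[float, float]]]:
    """Evaluate a score for each candidate threshold and return the best one.

    Returns (best_threshold, best_score, curve), where curve lists every
    (threshold, score) pair so the sensitivity can be plotted.
    """
    curve = [(th, evaluate(th)) for th in candidates]
    if not curve:
        raise ValueError("at least one candidate threshold is required")
    best_threshold, best_score = max(curve, key=lambda point: point[1])
    return best_threshold, best_score, curve


# Usage with a toy score function that peaks at 0.5 (illustrative only).
best, score, curve = sweep_threshold(lambda th: 1.0 - abs(th - 0.5), [0.1, 0.3, 0.5, 0.7])
```

Keeping the full curve rather than only the maximum matters here, because a flat region of the curve is what indicates that the model is insensitive to the threshold.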

E. EXPERIMENT RESULT
The comparison with different algorithms from the literature is shown in TABLE II. We use house6 in the PLAID dataset as the training set, and the entire PLAID dataset as the test set. Since different papers may use different datasets and evaluation indicators, it is difficult to compare results under identical conditions. The F1 score, a general indicator for evaluating multi-class classification problems, is used for model evaluation in most papers. Except for [26] and [27], which use the low-frequency UK-DALE and REDD datasets, all the others use high-frequency datasets. As seen from TABLE II, on the PLAID dataset, the algorithm proposed in this paper improves on the ordinary CNN by 3.88%. Compared with the weighted recurrence graph (WRG) method, it improves by 9.38%, and it is comparable to the adaptive weighted recurrence graphs (AWRG) method.
In addition to the F1 score, we explore the relationship between the number of codes and the accuracy for each electrical appliance. If an electrical appliance has m states, then ideally, when the number of allowed codes cnt = m, the accuracy reaches 100%, that is, acc(m) = 100%. In practice, however, due to misrecognition or other reasons, the number of identified codes may be greater than the number of appliance states. Therefore, formula (6) is used to calculate the accuracy when cnt codes are allowed. The accuracy of each appliance on the PLAID dataset is shown in FIGURE 10. The x-axis represents the type of electrical appliance, and the y-axis represents the accuracy when cnt (cnt = 1~5) codes are allowed.
The test cases in this article are designed to show the advantages over existing models. First, to demonstrate the model's few-shot learning ability, only house6 of the PLAID dataset is used for training. Second, to verify the ability to identify unknown appliances, the feature library is rebuilt when migrating to a new environment, and all loads in the new environment are treated as unknown loads. Finally, to verify the versatility of the model, we use the remaining 55 houses of the PLAID dataset and the COOLL dataset as the test set to observe the performance of the model.
It can be seen from the above evaluation data that the model proposed in this paper achieves good recognition results. Compared with the traditional classification model, the model in this paper identifies new loads by establishing a dynamic feature library, and the recognition accuracy exceeds 90% when cnt = 2. According to the practical application requirements for non-intrusive load monitoring proposed in [31], a good user experience can be achieved only when the accuracy of load identification exceeds 90%. That is, the model proposed in this paper can reach this goal by labelling two codes per appliance. As the number of codes increases, the recognition accuracy continues to increase. Therefore, the model in this paper has high practical value.

V. CONCLUSION
This paper has proposed a non-intrusive adaptive load identification algorithm based on the Siamese network, and verified its performance on the PLAID and COOLL datasets. Compared with traditional algorithms, the proposed algorithm has the following advantages:
• The requirement for the number of training samples is low. The Siamese network accepts paired inputs, which can expand the number of samples.
• New electrical appliances can be identified. Through the dynamic expansion of the feature library, new electrical appliances can be recognized when they are added, without retraining the model.
• There is no need for retraining when migrating to a new environment. The Siamese network model is only used to calculate the similarity of two V-I trajectories, which gives it good transferability.
However, the proposed algorithm still has some problems to be solved. In the process of matching against the feature library, the Siamese network needs to be run once for each sample in the feature library. On the one hand, this places a high demand on computation capacity. On the other hand, recognition becomes slow when there are too many samples in the feature library, so real-time performance may decline. This is also a common problem in Siamese network applications: the computational efficiency of the model is low. These problems need to be resolved in future research.