Multivariate Extreme Learning Machine Based AutoEncoder for Electricity Consumption Series Clustering

Multivariate electricity consumption series clustering can reveal trends in power consumption over past time periods, which provides reliable guidance for electricity production. Dimensionality reduction-based methods are an effective technology for this problem: they obtain low-dimensional features of each variate or of all variates for multivariate time series clustering. However, most existing dimensionality reduction-based methods ignore the joint learning of common representations and variable-based representations. In this paper, we build a multivariate extreme learning machine based autoencoder model for electricity consumption clustering (MELM-EC), which performs common representation learning and variable-based representation learning simultaneously. MELM-EC maps the common representation and multiple variable-based representations to the original multivariate time series and computes the common output weights within a few iterations. Experimental results on realistic multivariate time series datasets and multivariate electricity consumption series datasets demonstrate the effectiveness of the proposed MELM-EC model.


I. INTRODUCTION
Multivariate electricity consumption series clustering (MECSC) deals with multivariate series with complex relationships, where each instance is composed of multiple series that contain information related to each other [1]-[4]. Conceptually, MECSC aims to discover, in an unsupervised manner, the dependencies among multiple series and divide the instances into groups [5], [6]. In recent years, artificial neural network-based methods and dimensionality reduction-based methods have been two popular research directions in multivariate electricity consumption series clustering [7], [8]. Artificial neural network-based methods usually design a deep autoencoder network to extract low-dimensional features of multivariate time series and then perform an unsupervised division of instances [9]. For example, Ienco and Interdonato [10] exploited a recurrent autoencoder and an attention mechanism to produce an embedding representation. In some multivariate time series classification works, multivariate time series of equal or variable length were encoded into 2D images, and a convolutional network was then used to extract low-dimensional features [11], [12]. Besides, Franceschi et al. [13] combined an encoder based on causal dilated convolutions with a novel triplet loss employing time-based negative sampling, obtaining low-dimensional features for multivariate time series of variable length.
The dimensionality reduction-based method is also an effective technology for multivariate time series clustering, which aims to learn the low-dimensional features of each variate or all variates for clustering [14]. For example, some efforts demonstrated the effectiveness of dimensionality reduction methods (e.g., VPCA [15], CPCA [16], [17], and matrix factorization [18]) for MECSC and provided informative representations for clustering tasks. He et al. [15] first used VPCA to construct an informative representation and then adopted the spatial weighted matrix distance to measure the similarity among all instances. Li [17] first used CPCA to obtain projection coordinate representations and then adopted the reconstruction error to divide instances. Zheng et al. [18] performed robust matrix factorization on the similarity matrix among all instances and learned the low-dimensional representations. However, most existing dimensionality reduction-based methods ignore the joint learning of the common representations and the variable-based representations.

The extreme learning machine based autoencoder (ELM-AE) is a simple and efficient learning algorithm for single-layer feed-forward neural networks (SLFNs) [19]-[21], where the outputs are equal to the inputs. In addition, ELM-AE obtains the optimal solution analytically, without iterative learning. In this paper, we develop a novel multivariate extreme learning machine based autoencoder framework (MELM-EC) for the MECSC task, which is a joint learning framework for the common representations and the variable-based representations. As shown in Fig. 1, MELM-EC utilizes the common representations and the variable-based representations to reconstruct the multivariate electricity consumption series; it minimizes the reconstruction error and obtains the common output weights connecting the common representations and the outputs (i.e., the input multivariate electricity consumption series). We argue that the common output weights of MELM-EC can represent the features of the input multivariate electricity consumption series. Moreover, unlike iterative learning methods (e.g., back-propagation neural networks), MELM-EC utilizes the regularized least mean squares and the coordinate descent method to solve the objective function. In this way, the common output weights can be obtained after a few iterations. The main contributions are summarized as follows:

• This paper presents a multivariate extreme learning machine based autoencoder framework (MELM-EC) to simultaneously perform common representation learning and variable-based representation learning, where variable-based representation learning helps MELM-EC learn more informative common representations.
• MELM-EC utilizes the regularized least mean squares to calculate the output weights and performs clustering on the common representations. Experimental results on realistic datasets demonstrate that MELM-EC is an effective MECSC framework.

The remainder of the paper is organized as follows. Section II introduces related work on extreme learning machines. Section III details the proposed MELM-EC framework, including the objective function and the inference process. Section IV presents experimental results that show the feasibility of the proposed MELM-EC framework. Finally, conclusions are given in the last section.

II. RELATED WORKS
The extreme learning machine (ELM) model is an efficient learning algorithm for SLFNs [22], [23]. The ELM output weights can be determined analytically, while the input weights and biases of the hidden layer are randomly generated. Suppose $X \in \mathbb{R}^{N \times M}$ denotes the input instances, $a \in \mathbb{R}^{M \times J}$ denotes the input weights, and $b \in \mathbb{R}^{J}$ denotes the biases of the hidden layer. Then, the outputs of the hidden layer can be defined as:

$$H = f(Xa + b), \qquad (1)$$

where $f(\cdot)$ denotes the activation function (e.g., the sigmoid function $f(z) = \frac{1}{1+\exp(-z)}$) and $H \in \mathbb{R}^{N \times J}$ denotes the outputs of the hidden layer (i.e., the hidden features of the inputs). ELM utilizes the least mean squares (LMS) to calculate the output weights and is used for classification and regression. Afterwards, He et al. [24] extended ELM to clustering, where the k-means clustering method is performed on the hidden features. Besides the good performance of ELM-based algorithms, the ELM clustering method is also very convenient to implement and compute.
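For concreteness, the following is a minimal NumPy sketch of the ELM computation in (1) together with the regularized LMS solution of the output weights; the function name, the target matrix T, and the default hyperparameter values are illustrative assumptions rather than code from [22]-[24].

```python
import numpy as np

def elm_fit(X, T, J=100, lam=1e-2, seed=0):
    """Basic ELM: random hidden layer (eq. (1)) + regularized LMS output weights."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    a = rng.standard_normal((M, J))           # random input weights, a in R^{M x J}
    b = rng.standard_normal(J)                # random hidden biases, b in R^J
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))    # eq. (1): sigmoid hidden features
    # regularized LMS: beta = (H^T H + lam*I)^{-1} H^T T
    beta = np.linalg.solve(H.T @ H + lam * np.eye(J), H.T @ T)
    return a, b, beta, H
```

For ELM-based clustering as in [24], one would simply run k-means on the hidden features H rather than solving for the output weights.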
Recently, there have been several works on ELM-based clustering.
As ELM has been demonstrated to have good performance in clustering, some researchers have developed novel models on the basis of ELM. For example, Zeng [25] introduced an adaptive graph regularization term into the unsupervised ELM framework and proposed an effective graph learning method. Zhang et al. [26] proposed a residual compensation ELM for regression problems, where residual compensation is performed layer by layer iteratively to remodel the un-modeled prediction error of the previous layer and further improve the accuracy. Xu et al. [27] proposed a fuzzy granularity neighborhood extreme clustering algorithm on the basis of ELM, which used a fuzzy neighborhood rough set to eliminate redundant attributes and introduced an adaptive adjustment mechanism to solve the parameters of unsupervised ELM.

Because the input weights and biases of the hidden layer are randomly generated, the hidden features of ELM sometimes cannot represent the input instances well. The extreme learning machine based autoencoder (ELM-AE) is a neural network method that can reproduce the inputs in the same way as an autoencoder [19], [28], [29]. In ELM-AE, the outputs are equal to the inputs, and the output weights can also be calculated by the LMS method. Suppose $X$ denotes the input instances and $H$ denotes the outputs of the hidden layer. Then, the objective function of ELM-AE is defined as:

$$\min_{\beta} \|H\beta - X\|_F^2 + \lambda \|\beta\|_F^2, \qquad (2)$$

where $\lambda$ is the regularization parameter. Setting the derivative of (2) with respect to $\beta$ equal to zero, the solution of problem (2) is

$$\beta = \left(H^T H + \lambda I\right)^{-1} H^T X. \qquad (3)$$

In this way, the representations of ELM-AE can be defined as

$$X_{new} = X\beta^T. \qquad (4)$$

Owing to the superior performance of ELM-AE in representation learning, some researchers have made use of it to create multi-layer networks. For example, Zhang et al. [28] stacked ELM-AE to create a multi-layer neural network and learned effective deep features for semi-supervised learning. Wang and Ding [30] extended the ELM-AE framework to multiview learning, which can learn view-based representations and effectively divide multiview instances into groups.
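As a companion to (2)-(4), the sketch below implements ELM-AE in NumPy, including the orthogonalization of the random input weights that ELM-AE typically uses; the helper name and default values are assumptions for illustration.

```python
import numpy as np

def elm_ae_representation(X, J=100, lam=1e-2, seed=0):
    """ELM-AE: solve (2) via (3), return the representation X @ beta.T from (4)."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    a = rng.standard_normal((M, J))
    # orthogonalize the random input weights (QR on the smaller side)
    if J <= M:
        a, _ = np.linalg.qr(a)                # orthonormal columns
    else:
        q, _ = np.linalg.qr(a.T)
        a = q.T                               # orthonormal rows
    b = rng.standard_normal(J)
    b /= np.linalg.norm(b)                    # normalized bias
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))                       # hidden outputs
    beta = np.linalg.solve(H.T @ H + lam * np.eye(J), H.T @ X)   # eq. (3)
    return X @ beta.T                                            # eq. (4)
```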

III. PROPOSED FRAMEWORK
Previous multiview clustering work based on ELM and multiview learning takes advantage of a multiview version of ELM-AE to learn the representation of each individual view but ignores common representation learning [30].
To address this problem, we propose a novel multivariate extreme learning machine based autoencoder model for electricity consumption clustering (MELM-EC). As shown in Fig. 1, MELM-EC maps the common representation and multiple variable-based representations to the original multivariate time series and minimizes the reconstruction error. In other words, MELM-EC is a joint learning framework for the common representation and the variable-based representations, and it utilizes the common representation and the variable-based representations to reconstruct the corresponding variate series. In this way, the autoencoder mechanism and variable-based representation learning can help MELM-EC learn more informative common representations.

Given a multivariate dataset $X^{ME} = \{X_d \in \mathbb{R}^{N \times M_d}\}_{d=1}^{D}$, the variable-based hidden representations are

$$H_d = f(X_d a_d + b_d), \quad d = 1, \ldots, D, \qquad (5)$$

where the input weights $a_d \in \mathbb{R}^{M_d \times J}$ and the biases $b_d \in \mathbb{R}^{J}$ need to be orthogonalized, and $f(\cdot)$ denotes the activation function. In addition, the common representation $H_c$ is initialized with the mean of $\{H_d\}_{d=1}^{D}$, i.e.,

$$H_c = \frac{1}{D} \sum_{d=1}^{D} H_d. \qquad (6)$$

MELM-EC utilizes the common representation and the variable-based representations to reconstruct the multivariate electricity consumption series, where the objective function to be minimized is

$$\min_{H_c,\, \{H_d, \beta_d, B_d\}_{d=1}^{D}} \sum_{d=1}^{D} \alpha_d \left\|H_d \beta_d + H_c B_d - X_d\right\|_F^2 + \lambda_1 \sum_{d=1}^{D} \alpha_d \left\|H_c - H_d\right\|_F^2 + \lambda_2 \sum_{d=1}^{D} \left(\|\beta_d\|_F^2 + \|B_d\|_F^2\right), \qquad (7)$$

where $\beta_d$ denotes the output weights connecting the $d$-th variable-based hidden layer and the output layer, $B_d$ denotes the output weights connecting the common hidden layer and the output layer, $\alpha_d$ is the parameter that balances the different variables (e.g., $\alpha_d = \frac{1}{D}$ is used in this paper) with $\sum_d \alpha_d = 1$, and $\lambda_1, \lambda_2$ are hyperparameters. The objective of MELM-EC consists of three terms: the first term is the reconstruction loss, which minimizes the projection error from the common representation and the multiple variable-based representations to the original multivariate time series; the second term is the consistency constraint, which keeps the common representation close to the variable-based representations; and the third term is the penalty term, which penalizes large values in the output weights.
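Before deriving the updates, a small helper can make the objective concrete. The following NumPy sketch evaluates the three terms of the reconstructed objective (7); the function name and the list-based data layout (one array per variate) are illustrative assumptions consistent with the notation of this section.

```python
import numpy as np

def melm_ec_objective(Xs, Hs, Hc, betas, Bs, alpha, lam1, lam2):
    """Evaluate objective (7). Xs[d]: (N, M_d); Hs[d], Hc: (N, J); betas[d], Bs[d]: (J, M_d)."""
    D = len(Xs)
    recon = sum(alpha[d] * np.linalg.norm(Hs[d] @ betas[d] + Hc @ Bs[d] - Xs[d])**2
                for d in range(D))                      # reconstruction loss term
    consist = lam1 * sum(alpha[d] * np.linalg.norm(Hc - Hs[d])**2
                         for d in range(D))             # consistency constraint term
    penalty = lam2 * sum(np.linalg.norm(betas[d])**2 + np.linalg.norm(Bs[d])**2
                         for d in range(D))             # output-weight penalty term
    return recon + consist + penalty
```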
Therefore, we can perform clustering on the learned common representations $H_c^{new}$ (e.g., the clustering model with adaptive neighbors (CAN) [31] is used in this paper).

A. LEARNING PROCEDURE OF MELM-EC
The objective of MELM-EC can be solved by the coordinate descent method, where the four blocks of variables ($H_c$, $\{H_d\}$, $\{\beta_d\}$, and $\{B_d\}$) are updated in turn. The detailed learning procedure of MELM-EC is provided in Algorithm 1.
1) FIX $\{H_d\}$, $\{\beta_d\}$, $\{B_d\}$ AND UPDATE $H_c$
When $\{H_d, \beta_d, B_d\}_{d=1}^{D}$ are fixed, problem (7) becomes

$$\min_{H_c} \sum_{d=1}^{D} \alpha_d \|H_d \beta_d + H_c B_d - X_d\|_F^2 + \lambda_1 \sum_{d=1}^{D} \alpha_d \|H_c - H_d\|_F^2. \qquad (9)$$

Setting the derivative of (9) with respect to $H_c$ equal to zero, we have

$$\sum_{d=1}^{D} \alpha_d (H_d \beta_d + H_c B_d - X_d) B_d^T + \lambda_1 \sum_{d=1}^{D} \alpha_d (H_c - H_d) = 0. \qquad (10)$$

Therefore, the solution of the common representations $H_c$ is

$$H_c = \left[\sum_{d=1}^{D} \alpha_d \left((X_d - H_d \beta_d) B_d^T + \lambda_1 H_d\right)\right] \left[\sum_{d=1}^{D} \alpha_d \left(B_d B_d^T + \lambda_1 I\right)\right]^{-1}. \qquad (11)$$

2) FIX $H_c$, $\{\beta_d\}$, $\{B_d\}$ AND UPDATE $H_d$
When the other variables are fixed, problem (7) becomes

$$\min_{H_d} \alpha_d \|H_d \beta_d + H_c B_d - X_d\|_F^2 + \lambda_1 \alpha_d \|H_c - H_d\|_F^2. \qquad (12)$$

Setting the derivative of (12) with respect to $H_d$ equal to zero, we have

$$(H_d \beta_d + H_c B_d - X_d)\beta_d^T - \lambda_1 (H_c - H_d) = 0. \qquad (13)$$

Therefore, the solution of the variable-based hidden representations $H_d$ is

$$H_d = \left[(X_d - H_c B_d)\beta_d^T + \lambda_1 H_c\right]\left(\beta_d \beta_d^T + \lambda_1 I\right)^{-1}. \qquad (14)$$

3) FIX $H_c$, $\{H_d\}$, $\{B_d\}$ AND UPDATE $\beta_d$
When the other variables are fixed, problem (7) becomes

$$\min_{\beta_d} \alpha_d \|H_d \beta_d + H_c B_d - X_d\|_F^2 + \lambda_2 \|\beta_d\|_F^2. \qquad (15)$$

Setting the derivative of (15) with respect to $\beta_d$ equal to zero, we have

$$\alpha_d H_d^T (H_d \beta_d + H_c B_d - X_d) + \lambda_2 \beta_d = 0. \qquad (16)$$

Therefore, the solution of the variable-based output weights $\beta_d$ is

$$\beta_d = \left(H_d^T H_d + \frac{\lambda_2}{\alpha_d} I\right)^{-1} H_d^T (X_d - H_c B_d). \qquad (17)$$

4) FIX $H_c$, $\{H_d\}$, $\{\beta_d\}$ AND UPDATE $B_d$
When the other variables are fixed, problem (7) becomes

$$\min_{B_d} \alpha_d \|H_d \beta_d + H_c B_d - X_d\|_F^2 + \lambda_2 \|B_d\|_F^2. \qquad (18)$$

Setting the derivative of (18) with respect to $B_d$ equal to zero, we have

$$\alpha_d H_c^T (H_d \beta_d + H_c B_d - X_d) + \lambda_2 B_d = 0. \qquad (19)$$

Therefore, the solution of the common output weights $B_d$ is

$$B_d = \left(H_c^T H_c + \frac{\lambda_2}{\alpha_d} I\right)^{-1} H_c^T (X_d - H_d \beta_d). \qquad (20)$$
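To make the learning procedure concrete, here is a hedged NumPy sketch of one coordinate descent pass implementing the closed-form updates (11), (14), (17), and (20) as reconstructed above; the function name and data layout are assumptions, not the authors' released code.

```python
import numpy as np

def melm_ec_step(Xs, Hs, Hc, betas, Bs, alpha, lam1, lam2):
    """One coordinate descent pass. Xs[d]: (N, M_d); Hs[d], Hc: (N, J);
    betas[d], Bs[d]: (J, M_d); alpha[d] are the balancing weights."""
    D = len(Xs)
    J = Hc.shape[1]
    I = np.eye(J)
    # (11): update the common representation H_c (lhs is symmetric, so the
    # right inverse can be computed with a transposed solve)
    lhs = sum(alpha[d] * (Bs[d] @ Bs[d].T + lam1 * I) for d in range(D))
    rhs = sum(alpha[d] * ((Xs[d] - Hs[d] @ betas[d]) @ Bs[d].T + lam1 * Hs[d])
              for d in range(D))
    Hc = np.linalg.solve(lhs, rhs.T).T
    for d in range(D):
        # (14): update the variable-based representation H_d
        A = betas[d] @ betas[d].T + lam1 * I
        Hs[d] = np.linalg.solve(
            A, ((Xs[d] - Hc @ Bs[d]) @ betas[d].T + lam1 * Hc).T).T
        # (17): update the variable-based output weights beta_d
        betas[d] = np.linalg.solve(Hs[d].T @ Hs[d] + (lam2 / alpha[d]) * I,
                                   Hs[d].T @ (Xs[d] - Hc @ Bs[d]))
        # (20): update the common output weights B_d
        Bs[d] = np.linalg.solve(Hc.T @ Hc + (lam2 / alpha[d]) * I,
                                Hc.T @ (Xs[d] - Hs[d] @ betas[d]))
    return Hs, Hc, betas, Bs
```

In practice, this pass would be repeated for a small number of iterations K (the sensitivity analysis below suggests a few passes suffice), after which clustering (e.g., CAN) is performed on $H_c$.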
IV. EXPERIMENTS
A. EXPERIMENTS ON MULTIVARIATE TIME SERIES DATASETS
1) EXPERIMENTAL SETTINGS
In MELM-EC, the parameters $\lambda_1$ and $\lambda_2$ are selected from $\{10^{-3}, 10^{-2}, \ldots, 10^{3}\}$, $\alpha_d = \frac{1}{D}$ indicates that all variables are equally important, and the number of hidden layer nodes $J$ is selected from a candidate set. Following [18], two widely used evaluation metrics, rand index (RI) and normalized mutual information (NMI), are used in this paper.
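For reference, both metrics are available in scikit-learn, so a hedged evaluation helper (assuming ground-truth labels are available) can be as short as:

```python
from sklearn.metrics import rand_score, normalized_mutual_info_score

def evaluate(y_true, y_pred):
    """Return (RI, NMI) for a predicted clustering against ground-truth labels."""
    return rand_score(y_true, y_pred), normalized_mutual_info_score(y_true, y_pred)
```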

2) PERFORMANCE COMPARISON
TABLE 2. Performances of MELM-EC and baselines in terms of NMI (normalized mutual information).

TABLE 3. Comparative results between MELM-EC and baselines on RI (rand index).

TABLE 4. Comparative results between MELM-EC and baselines on NMI (normalized mutual information).

We report the detailed experimental results of all comparing algorithms on eight datasets in Table 1 and Table 2, where the best results are highlighted in bold. The following observations can be made from the results: 1) MELM-EC outperforms the baselines, achieving the best performance on all 8 datasets; moreover, MELM-EC outperforms the baselines in terms of "Mean±Std", with minimum improvements of 4.48% in RI and 10.03% in NMI; 2) MELM-EC obviously outperforms MLAN and NESE, which indicates that MELM-EC can obtain effective common representations for clustering. To summarize, compared against these baselines, our method significantly improves on previous results.

To give a clearer view of the relative performance of the algorithms, a partial order ≻ measures the relative performance between two algorithms A1 and A2 based on a pairwise t-test at the 5% significance level. If A1 ≻ A2 holds, then A1 is rewarded with a positive score +1 and A2 is penalized with a negative score -1 [37] (a minimal sketch of this scoring rule is given below). Taking all the partial orders into consideration, the total order of these algorithms is defined in terms of the accumulated score of each algorithm. Table 3 and Table 4 show that our proposed MELM-EC presents the best performance in terms of RI and NMI. We also used t-SNE [38] to visualize the common representation $H_c$ on ArticularyWordRecognition. As shown in Fig. 2, there are few outliers in MELM-EC, which also demonstrates its feasibility.

We further compare MELM-EC with artificial neural network-based methods (i.e., MELM and MRBM). We can see that: 1) MELM-EC and MRBM are better than MELM on the eight real-world datasets. Compared with MELM, MELM-EC and MRBM use the autoencoder mechanism to guide the training of the model, which shows that they can learn more effective features through the autoencoder mechanism; 2) MELM-EC significantly outperforms MRBM on the eight real-world datasets. Unlike MRBM, MELM-EC is a joint learning framework for the common representation and the variable-based representations, which illustrates that variable-based representation learning helps MELM-EC learn more informative common representations.
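Returning to the partial-order comparison above, the scoring rule can be sketched as follows, assuming per-dataset score arrays for two algorithms and SciPy's paired t-test; this is an illustrative reading of the rule in [37], not the authors' code.

```python
from scipy.stats import ttest_rel

def pairwise_score(scores_a1, scores_a2, sig=0.05):
    """Return (+1, -1) if A1 significantly beats A2, (-1, +1) if the reverse,
    and (0, 0) if the paired t-test finds no significant difference."""
    t, p = ttest_rel(scores_a1, scores_a2)
    if p < sig:
        return (1, -1) if t > 0 else (-1, 1)
    return (0, 0)
```

Accumulating these scores over all algorithm pairs and datasets yields the total order reported in Table 3 and Table 4.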
We analyze the impact of the number of iterations K and the number of hidden layer nodes J on the algorithm's performance. Fig. 4 and Fig. 5 show the RI and NMI performances of MELM-EC on the eight real-world datasets as the number of iterations K increases. It can be seen that MELM-EC achieves good performance within a few iterations, which greatly reduces its computational cost. Fig. 6 and Fig. 7 show the RI and NMI performances of MELM-EC on the eight real-world datasets as the number of hidden layer nodes J increases. It can be seen that MELM-EC achieves good performance when J is less than or equal to 200, which also reduces the computational cost. Moreover, the performance of MELM-EC is not very sensitive to the number of hidden layer nodes.

B. EXPERIMENTS ON MULTIVARIATE ELECTRICITY CONSUMPTION SERIES DATASET
1) MULTIVARIATE ELECTRICITY CONSUMPTION SERIES DATASET
The realistic data of a region in China Southern Power Grid (CSPG) [6] is used to test our proposed MELM-EC. The attribute information of the CSPG dataset is given as follows:
• global-active-power: household global minute-averaged active power (in kilowatt);
• global-reactive-power: household global minute-averaged reactive power (in kilowatt);
• voltage: minute-averaged voltage (in volt);
• global-intensity: household global minute-averaged current intensity (in ampere).
Specifically, CSPG contains 2780 multivariate instances of two classes (i.e., two different night time periods, 2 am to 4 am and 8 pm to 10 pm), where the number of variates is 4 and the length of each variate is 18.

2) PERFORMANCE COMPARISON
We first test the performance of our proposed MELM-EC on different series combinations of CSPG. Table 5 shows the RI and NMI performances of MELM-EC on CSPG and on the combinations containing any two variates. We can see that MELM-EC achieves the best results on the full CSPG dataset, which consists of all four variates; this indicates that each variate improves the performance of the model. Furthermore, we compare MELM-EC with the baseline algorithms on CSPG, and MELM-EC significantly outperforms them in terms of RI and NMI. Next, we analyze the impact of the number of hidden layer nodes J on the CSPG dataset. Fig. 9 shows the RI and NMI performances of MELM-EC on CSPG as the number of hidden layer nodes J increases. It can be seen that MELM-EC achieves the best performance when the number of hidden layer nodes equals 50.

V. CONCLUSION
This paper builds an effective multivariate extreme learning machine-based autoencoder framework (MELM-EC) for multivariate electricity consumption clustering, which is a joint learning framework for the common representations and the variable-based representations. The proposed MELM-EC framework utilizes the regularized least mean squares to calculate the output weights and performs clustering on the common representations, where the common representations and the variable-based representations are used to reconstruct the multivariate electricity consumption series. In addition, the common output weights can be obtained after a few iterations. Experimental results on realistic datasets demonstrate that MELM-EC is an effective MECSC framework.