Short-Term Load Forecasting Based on a Hybrid Neural Network and Phase Space Reconstruction

Most current short-term load forecasting models struggle to account simultaneously for the time-series nature of load data and its non-linear characteristics, and they extract potential high-dimensional features from historical series ineffectively. To solve this problem, we propose a hybrid neural network model (DPCL). In DPCL, a convolutional neural network obtains the high-dimensional spatial features of the phase space reconstruction of the load time series. We then combine these spatial features with the external influence features selected by Pearson correlation analysis. A long short-term memory network extracts spatio-temporal features through the connection layer and produces the prediction results. In addition, because network gradient degradation and overfitting can arise during training, we use an improved differential evolution algorithm to optimize the topology and time step of the hybrid neural network. We use the public dataset of a European utility and the load dataset of a Chinese mathematical modeling competition as practical examples. In the experiments, DPCL achieves higher prediction accuracy and faster prediction speed than other traditional algorithms.


I. INTRODUCTION
With the development of the economy, power demand is constantly expanding [1]. Power dispatching is a crucial link for the safe operation of power grids, reliable external power supply, and orderly production of all kinds of electric power [2]. Load forecasting is the basis of power dispatching [3]: it informs the planning and construction of power grids and power sources as well as the operational decisions of power grid enterprises and power grid users, and it is especially important for the economic and safe operation of power systems [4]. The electrical load system is a complex dynamic system whose internal periodicity and correlation with load-influencing factors indicate some measurable pattern [5]. The main research methods in the field of short-term load forecasting are traditional statistical methods [6], [7] and machine learning methods [8]. Traditional statistical methods include linear regression [9], the gray model method [10], the fuzzy prediction method [11], and the autoregressive integrated moving average model [12]. These methods are simple and easy to implement, but they place high demands on raw data processing and on the stationarity of the time series. Short-term load, however, is complex and nonlinear, and it is affected by many factors, so it does not present a stable and orderly state. In the field of machine learning research, random forests [13], support vector machines [14], expert systems [15], artificial neural network forecasting methods [16], and deep learning methods [17] are commonly used for short-term load forecasting. These methods converge quickly thanks to a simple internal structure, but they can destroy the temporal integrity of the load data, so their prediction accuracy is frequently not ideal.
The associate editor coordinating the review of this manuscript and approving it for publication was Nuno Garcia.
With the development of deep learning, researchers gained new ideas, and deep learning algorithms such as convolutional neural networks (CNN) [18], recurrent neural networks (RNN) [19], and deep belief networks [20] were gradually applied to electric load forecasting. Among them, CNN uses convolution and pooling to extract data features, which reduces the error caused by manual feature extraction, and it is widely used in the image and speech fields [21]. In [22], CNN models were used to extract features of the input data and capture seasonal cycles for load forecasting, and they were found to be more accurate in learning highly nonlinear sequences in load data; however, a single CNN model has difficulty learning load dynamics when the load data are volatile and unstable. Compared with other networks, RNN can model dynamic time-series data. In 2019, Fan et al. [23] proposed predicting the output power of PV systems with long short-term memory (LSTM) neural networks; evaluated on hourly datasets from different locations over a year, the LSTM further reduced the prediction error and demonstrated better prediction results [24]. However, when selecting the input quantities, these studies either did not choose suitable influencing factors based on their correlation with the power load data, or processed the input quantities in too cumbersome a way. The studies in [25], [26] propose load forecasting based on the LSTM algorithm, combining features extracted by CNN from different perspectives and using deep learning in both the spatial and temporal domains for short-term load forecasting.
Although these studies have achieved good results, the models they use, especially artificial neural network models with excessive input data, hidden layers, and hidden-layer nodes, are prone to overfitting and to vanishing or exploding gradients during training, and single neural network models generally suffer from insufficient prediction accuracy.
In recent years, neuro-evolution has emerged as a branch of artificial intelligence and machine learning that attempts to design and build neural networks through evolutionary algorithms [27]-[29]. Genetic algorithms [30] and evolutionary strategies [31] have achieved excellent results in optimizing the topology and weights of neural networks. With such methods, we can determine an appropriate network structure for a specific problem and obtain excellent prediction performance. We therefore consider a differential evolution algorithm to design the topology and weights of the neural network model and to determine the appropriate prediction time step. However, the standard differential evolution algorithm and the adaptive differential evolution algorithm converge slowly and run for a long time [32]-[35]. We therefore improve the DE algorithm by improving the mutation operator and the crossover operator.
To address the problems of existing prediction methods, we propose a combined prediction model (DPCL) that combines a convolutional neural network (CNN), long short-term memory (LSTM), and an improved differential evolution algorithm. First, the load time series is reconstructed in phase space and the data are normalized. Then the CNN performs spatial feature extraction on the reconstructed phase space of the load time series and obtains high-dimensional features of the complex dynamic changes of the load. Meanwhile, Pearson correlation analysis is applied to the time series of the original multidimensional input variables, and the influencing factors with the greatest correlation with the power load data are selected as inputs, which reduces and optimizes the original data. After that, the spatial feature sequences extracted by the CNN are fused with these factors into new input features, and an LSTM network extracts temporal features. In addition, because too many neural network nodes are likely to cause overfitting and gradient explosion in network training, the improved differential evolution algorithm is used to optimize the topology and time step of the hybrid neural network.
The paper is organized as follows. Section II describes the data processing. In section III, we introduce CNN, LSTM, and improved ADE. In section IV, we analyze and discuss the experimental results. Finally, section V presents the conclusions and future work directions.

II. DATA PREPROCESSING
The research data in this paper are drawn from two main sources: an electrical mathematical modeling competition and a European load dataset, whose influencing factors are temperature, humidity, and holidays. $X = \{x_1^t, x_2^t, \ldots, x_n^t\}$ is an M-dimensional multivariate time series, where $x_1$ is the short-term load time series and $x_2, x_3, \ldots, x_n$ are the time series of influencing factors.
The chaotic time series $x_1$ is reconstructed in phase space. X is normalized to the range (0,1) before being passed into the model, and the model outputs are de-normalized afterwards. The calculation formula is as follows:

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the series and $x^{*}$ is the normalized value. All the influencing factors can affect the load, so we use the Pearson correlation coefficient to analyze the correlation within the time series $F = \{x_1^t, x_2^t, \ldots, x_n^t\}$. Figure 1 shows the process of data preprocessing.
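The normalization and de-normalization step can be sketched in a few lines. This is a minimal illustration that assumes plain min-max scaling; the function names and the sample load values are made up for the example and do not come from the datasets used in the paper. Note that min-max scaling maps the extremes to exactly 0 and 1.

```python
def minmax_fit(series):
    """Return the (min, max) pair used for scaling; assumed min-max scheme."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max scaled")
    return lo, hi


def normalize(series, lo, hi):
    """Map each value into [0, 1]."""
    return [(x - lo) / (hi - lo) for x in series]


def denormalize(scaled, lo, hi):
    """Invert normalize() so predictions can be read in original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical daily load values, for illustration only.
load = [310.0, 295.5, 402.0, 388.2, 350.1]
lo, hi = minmax_fit(load)
scaled = normalize(load, lo, hi)
restored = denormalize(scaled, lo, hi)
```

In practice the minimum and maximum would be fitted on the training portion only and reused for the test portion, so that no information from the test set leaks into the scaling.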

A. RECONSTRUCTING PHASE SPACE
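Phase space reconstruction is a theoretical basis for the analysis of chaotic systems. It was first articulated by Packard et al. [36] and usually relies on the delay-coordinate method. According to Takens's theorem [37], once the embedding dimension m and the delay time τ are determined, the univariate load values can be reconstructed in an m-dimensional phase space. The short-term load time series is $x_1 = \{x(1), x(2), \ldots, x(T)\}$, and the mutual information method [38] and the Cao method [39] determine the delay time τ and the embedding dimension m, respectively. The reconstructed phase space matrix is:

$$\begin{bmatrix} x(1) & x(1+\tau) & \cdots & x(1+(m-1)\tau) \\ x(2) & x(2+\tau) & \cdots & x(2+(m-1)\tau) \\ \vdots & \vdots & & \vdots \\ x(K) & x(K+\tau) & \cdots & x(K+(m-1)\tau) \end{bmatrix}$$

where T is the length of the load series and $K = T - (m-1)\tau$ is the number of reconstructed phase points.

To establish whether weather and other factors Y influence the short-term load X, the Pearson correlation coefficient is used to analyze the correlation between the short-term load and the influencing factors. The formula is as follows:

$$\rho_{x,y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$

where $\mu_X, \mu_Y$ and $\sigma_X, \sigma_Y$ are the means and standard deviations of X and Y, and E denotes the mathematical expectation. The correlation coefficient $\rho_{x,y}$ of the two variables lies in the range [−1,1]. Table 1 shows the correlation between X and Y.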
Table 2 shows the results of the Pearson correlation coefficient analysis. Temperature has the strongest correlation with load: the maximum, minimum, and average temperatures are all strongly correlated with it, and working days and relative humidity also show some correlation with the load data. The analysis indicates that the main inputs to the model are the historical load and three characteristic factors: working days, temperature, and humidity.
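The two preprocessing operations of this subsection, delay-coordinate embedding and Pearson screening of influencing factors, can be sketched as follows. This is an illustrative sketch only: the values of m and τ are passed in directly rather than estimated with the mutual information and Cao methods, and the threshold of 0.5, the factor names, and all sample data are hypothetical.

```python
import math


def delay_embed(series, m, tau):
    """Build phase points [x(i), x(i+tau), ..., x(i+(m-1)tau)] as matrix rows."""
    rows = len(series) - (m - 1) * tau
    if rows <= 0:
        raise ValueError("series too short for this (m, tau)")
    return [[series[i + k * tau] for k in range(m)] for i in range(rows)]


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_factors(load, factors, threshold):
    """Keep the factors whose absolute correlation with load reaches threshold."""
    return [name for name, series in factors.items()
            if abs(pearson(load, series)) >= threshold]


# Toy data, for illustration only.
load = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
phase = delay_embed(load, m=3, tau=2)

factors = {
    "temperature": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0],
    "alternating": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
}
rho = pearson(load, factors["temperature"])
selected = select_factors(load, factors, threshold=0.5)
```

With six samples, m = 3, and τ = 2, the embedding has 6 − (3 − 1)·2 = 2 rows, which matches the row count K of the reconstructed matrix.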

III. DPCL ALGORITHM
A. CNN MODEL
In this paper, we use a 1D convolutional neural network to extract the internal features of the electric load data, mine the interrelationships within the data, combine them into high-level features, and transform the structure through the fully connected layer [40]. The extracted features are then input to the LSTM so that it can better learn the cyclic variation and the demand patterns of the load data. The output feature vector of the CNN layer is expressed in terms of the weights w, the biases b, the convolution function f, and the pooling function u.
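To make the roles of w, b, f, and u concrete, here is a small sketch of one 1D convolution followed by max pooling. It is an assumption-laden illustration, not the network used in the paper: the kernel values, the pooling size, and the choice of max pooling are hypothetical, and the Sigmoid activation is borrowed from the CNN-layer expression given in the hybrid model description.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def conv1d(signal, kernel, bias):
    """Valid 1D convolution (sliding dot product) followed by Sigmoid."""
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        z = sum(w * x for w, x in zip(kernel, signal[i:i + k])) + bias
        out.append(sigmoid(z))
    return out


def max_pool1d(features, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(features[i:i + size])
            for i in range(0, len(features) - size + 1, size)]


# Hypothetical normalized load window and kernel.
signal = [0.1, 0.4, 0.3, 0.8, 0.6, 0.2, 0.5]
kernel = [1.0, 0.0, -1.0]
features = conv1d(signal, kernel, bias=0.0)
pooled = max_pool1d(features, size=2)
```

A window of 7 samples and a kernel of width 3 give 5 convolution outputs, and pooling with size 2 keeps 2 of them.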

B. LONG SHORT-TERM MEMORY (LSTM)
Recurrent neural networks (RNNs) [41] suffer from vanishing or exploding gradients as time passes and the number of network layers increases [42].
To solve the long-term dependency problem in neural networks, a special recurrent neural network, the LSTM, was developed [43], in which each hidden layer is no longer a single neural network. The structure of the LSTM memory module is shown in Figure 3. The LSTM consists of an input gate, a forget gate, and an output gate. The input gate $i_t$ controls the information input at moment t. The forget gate $f_t$ determines whether the memory unit stores or forgets information. Suppose the current moment is t; then $c_{t-1}$ is the state of the memory cell at the previous moment, and $\tilde{c}_t$ is the current input state information. $h_t$ and $o_t$ are the output values of the current LSTM. The process of state update and information output by the memory module is as follows.

$$f_t = \delta(W_f \cdot [h_{t-1}, x_t] + b_f)$$
$$i_t = \delta(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$o_t = \delta(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \odot \tanh(c_t)$$

where W and b stand for weights and biases, δ is the sigmoid function, tanh is the hyperbolic tangent activation function, $x_t$ is the input at time t, ⊙ denotes element-wise multiplication, $h_t$ is the output of the LSTM cell, and t represents the current time state.
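The gate interactions described above can be traced with a single step of a textbook LSTM cell. This is a sketch of the standard formulation, assumed here rather than confirmed as the exact variant used in the paper; the parameter dictionary layout and the all-zero weights are hypothetical and chosen only so the result is easy to check by hand.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W @ v + b for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a standard LSTM cell; p holds weights and biases per gate."""
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(p["Wf"], z, p["bf"])]      # forget gate
    i_t = [sigmoid(a) for a in affine(p["Wi"], z, p["bi"])]      # input gate
    c_hat = [math.tanh(a) for a in affine(p["Wc"], z, p["bc"])]  # candidate state
    o_t = [sigmoid(a) for a in affine(p["Wo"], z, p["bo"])]      # output gate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Hidden size 2, input size 1, all parameters zero (illustration only).
zeros_W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
zeros_b = [0.0, 0.0]
params = {"Wf": zeros_W, "bf": zeros_b, "Wi": zeros_W, "bi": zeros_b,
          "Wc": zeros_W, "bc": zeros_b, "Wo": zeros_W, "bo": zeros_b}
h_t, c_t = lstm_step([0.3], [0.0, 0.0], [1.0, -2.0], params)
```

With zero parameters every gate equals 0.5 and the candidate state is 0, so the new cell state is half the previous one and the output is 0.5·tanh of that value.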

C. DIFFERENTIAL EVOLUTION AND ITS IMPROVEMENT
1) STANDARD DIFFERENTIAL EVOLUTIONARY ALGORITHM
The standard Differential Evolution (DE) algorithm is an optimization algorithm using real vector encoding that searches randomly in continuous space to find the optimal solution [44], and the most commonly used strategy is DE/rand/1/bin. The specific steps of the standard DE algorithm are as follows.

a: INITIALIZING THE POPULATION
The initial population is generated within the bounds $[x_{i,j}^{\min}, x_{i,j}^{\max}]$ $(i = 1, 2, \ldots, NP;\ j = 1, 2, \ldots, N)$, where NP is the population size, the initialized population serves as the 0th generation, and N is the number of spatial dimensions. The expression is:

$$x_{i,j}^{0} = x_{i,j}^{\min} + \mathrm{rand}(0,1)\,\bigl(x_{i,j}^{\max} - x_{i,j}^{\min}\bigr)$$

where rand(0,1) denotes a random number that follows a uniform distribution in the (0,1) interval.

b: MUTATION OPERATION
The mutation operation uses a differential strategy to generate mutant individuals. The expression is:

$$v_i^{g} = x_{r_1}^{g} + F\,\bigl(x_{r_2}^{g} - x_{r_3}^{g}\bigr)$$

where $v_i^g$ is the ith mutant individual, $x_{r_1}^g$, $x_{r_2}^g$, and $x_{r_3}^g$ are three mutually different parent individuals, $i = 1, 2, \ldots, NP$, $r_1$, $r_2$, and $r_3$ are three mutually different indices, and F is the mutation factor, which generally takes a value in [0,2].

c: CROSSOVER OPERATION
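To increase the diversity of the population, the mutant individual and the target individual are randomly recombined to generate a crossover individual. The specific calculation process is shown in formula (1).

$$u_{i,j}^{g} = \begin{cases} v_{i,j}^{g}, & \text{if } \mathrm{rand}(0,1) \le CR \text{ or } j = j_{rand} \\ x_{i,j}^{g}, & \text{otherwise} \end{cases} \tag{1}$$

where $u_i^g$ is the crossover individual, CR is the crossover factor, and $j_{rand}$ is a randomly chosen dimension index that guarantees at least one component is taken from the mutant individual.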
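Steps a to c of DE/rand/1/bin can be sketched with three small functions. This is a minimal illustration: the bounds, population size, F, and CR values are hypothetical, and excluding the target index i when drawing r1, r2, r3 follows common DE practice rather than anything stated in the text.

```python
import random


def init_population(np_size, lower, upper, rng):
    """Step a: sample each gene uniformly between its lower and upper bound."""
    return [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
            for _ in range(np_size)]


def mutate_rand_1(pop, i, F, rng):
    """Step b: v = x_r1 + F * (x_r2 - x_r3) with three distinct indices != i."""
    r1, r2, r3 = rng.sample([k for k in range(len(pop)) if k != i], 3)
    return [a + F * (b - c) for a, b, c in zip(pop[r1], pop[r2], pop[r3])]


def crossover_bin(target, mutant, CR, rng):
    """Step c: binomial crossover; position j_rand always comes from the mutant."""
    j_rand = rng.randrange(len(target))
    return [m if (rng.random() <= CR or j == j_rand) else t
            for j, (t, m) in enumerate(zip(target, mutant))]


rng = random.Random(0)
lower, upper = [0.0, 0.0, 0.0], [1.0, 10.0, 100.0]
pop = init_population(6, lower, upper, rng)
mutant = mutate_rand_1(pop, 0, F=0.5, rng=rng)
trial = crossover_bin(pop[0], mutant, CR=0.9, rng=rng)
```

The mutant may fall outside the bounds, so a practical implementation would clip or resample it before evaluating the objective.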

d: SELECTION OPERATION
The selection operation is a survival-of-the-fittest competition between the target values of parents and children: the individuals better suited to the population are selected to reproduce, which steers the population toward the optimal solution. The expression is:

$$x_i^{g+1} = \begin{cases} u_i^{g}, & \text{if } f(u_i^{g}) \le f(x_i^{g}) \\ x_i^{g}, & \text{otherwise} \end{cases}$$

where $u_i^g$ is the crossover individual, $x_i^g$ is the target individual, and $f(\cdot)$ is the target function to be minimized.

2) ADE ALGORITHM
The standard differential evolution algorithm has three control parameters with fixed values: the crossover factor CR, the mutation factor F, and the population size. For a specific problem, different parameter values lead to different optimization results, and appropriate values improve them. F and CR are critical parameters that affect the search efficiency, and it is difficult to fix them at constant real values for the whole search. When F is too large, the DE search becomes less efficient and its solution less accurate; when F is too small, the population diversity of the algorithm declines and premature convergence may occur. The adaptive differential evolution algorithm addresses this problem with the adaptive mutation factor F proposed in [41]. In that equation, $F_0$ is the initialized scaling factor, $G_m$ is the maximum number of iterations, G is the current number of iterations, F gradually approaches $F_0$, and G takes values in the range $[1, G_m]$. This lets the adaptive mutation factor gradually move in a favorable direction, but it becomes less effective as w changes. To avoid this situation, we improve the mutation factor in this paper as follows.
In the improved mutation factor, $F_0$ is again the initialized scaling factor, $G_m$ is the maximum number of iterations, G is the current number of iterations, F gradually approaches $F_0$, and G takes values in the range $[1, G_m]$. Meanwhile, to diversify the population, the crossover factor of the adaptive differential evolution algorithm is also improved; in its expression, $G_m$ is the maximum number of iterations, G is the current number of iterations, and G takes values in the range $[1, G_m]$. As shown in Figure 5, compared with the adaptive algorithm, the improved differential evolution algorithm needs fewer iterations, obtains the optimal value faster, and gets closer to the optimal value. The gradual increase of the adaptive crossover factor maintains the diversity of the population at the initial stage and improves the local search ability of the algorithm at the later stage.
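The behavior described in this subsection (a mutation factor that starts large and approaches F0, a crossover factor that grows over the generations, and greedy selection) can be illustrated with the sketch below. The exact expressions are not reproduced here: the exponential schedule for F is an assumption chosen to match the stated property that F approaches F0 as G goes from 1 to Gm, and the linear ramp for CR with its `CR_min` and `CR_max` bounds is purely hypothetical.

```python
import math


def adaptive_F(F0, G, Gm):
    """Assumed schedule: F = F0 * 2**lam, lam = exp(1 - Gm / (Gm + 1 - G)).

    F equals 2*F0 at G = 1 and decays toward F0 as G approaches Gm.
    """
    lam = math.exp(1.0 - Gm / (Gm + 1.0 - G))
    return F0 * 2.0 ** lam


def ramp_CR(CR_min, CR_max, G, Gm):
    """Hypothetical linearly increasing crossover factor (requires Gm > 1)."""
    return CR_min + (CR_max - CR_min) * (G - 1) / (Gm - 1)


def select(target, trial, fitness):
    """Greedy selection for minimization: keep the trial if it is no worse."""
    return trial if fitness(trial) <= fitness(target) else target


def sphere(v):
    """Toy objective used only to exercise select()."""
    return sum(x * x for x in v)


Gm = 100
F_values = [adaptive_F(0.5, G, Gm) for G in range(1, Gm + 1)]
CR_values = [ramp_CR(0.1, 0.9, G, Gm) for G in range(1, Gm + 1)]
```

A large early F favors exploration, while the later values near F0 favor refinement around good solutions.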

3) HYBRID MODEL
This section introduces the framework of the DPCL hybrid model, as shown in Figure 5. The specific details are as follows:
1) Input layer: the input historical electric load data form a matrix of order M.
2) CNN layer: the feature vector output at time t is $H_c = f(X_{train,t} \times W + b) = \mathrm{Sigmoid}(X_{train,t} \times W + b)$, where W and b denote the weights and bias values.
3) Combination layer: the 3D layer includes the output matrix of the CNN layer as well as the new features $F_{train}$.
5) Output layer: the input of the output layer is the output of the LSTM layer. The output layer calculates the output with a prediction step of m through the fully connected layer: $Y = [y_1, y_2, \ldots, y_m]^T$. The prediction formula can be expressed as $y_t = \mathrm{Sigmoid}(w_0 h_t + b_0)$, where $y_t$ denotes the predicted output value at time t, $w_0$ and $b_0$ are the weight matrix and the bias vector, and the Sigmoid function is chosen as the activation function of the Dense layer in this paper.
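Two of the layers listed above are simple enough to sketch directly: the combination layer, which joins the CNN features with the external features at each time step, and the output layer, which applies $y_t = \mathrm{Sigmoid}(w_0 h_t + b_0)$. This is an illustrative sketch; the function names, feature sizes, and numeric values are hypothetical, and the LSTM layer that sits between the two is left out.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def combine(cnn_features, external_features):
    """Concatenate, per time step, the CNN feature vector with external factors."""
    if len(cnn_features) != len(external_features):
        raise ValueError("both inputs must cover the same time steps")
    return [c + e for c, e in zip(cnn_features, external_features)]


def dense_sigmoid(h_t, w0, b0):
    """Output layer for one step: y_t = Sigmoid(w0 . h_t + b0)."""
    return sigmoid(sum(w * h for w, h in zip(w0, h_t)) + b0)


# Two time steps, two CNN features and one external factor each (made-up values).
cnn_out = [[0.2, 0.7], [0.4, 0.1]]
external = [[0.5], [0.9]]
combined = combine(cnn_out, external)

# A hypothetical LSTM output h_t of size 2 passed through the output layer.
y_t = dense_sigmoid([0.0, 0.0], w0=[1.0, 1.0], b0=0.0)
```

Because the Sigmoid output lies in (0,1), the prediction is in normalized units and has to be de-normalized before it is compared with the actual load.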

4) OPTIMIZATION OF HYBRID NEURAL NETWORK USING IMPROVED DIFFERENTIAL EVOLUTION
In this paper, we use an improved differential evolution algorithm to optimize the topology and time step of the hybrid neural network. In the optimization, we use the mean square error (MSE) to select the best individual. The MSE is calculated as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2$$

where $y_i$ and $\hat{y}_i$ are the actual and predicted values, and n is the number of predicted points. The flow chart of the short-term load forecasting model based on DPCL is shown in Figure 5. The forecasting steps are given in Algorithm 1.
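Using the MSE as the fitness that ranks individuals can be sketched as follows. The candidate names and prediction values are made up for illustration; in the actual procedure each candidate would correspond to a trained network configuration.

```python
def mse(actual, predicted):
    """Mean square error between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


# Hypothetical validation targets and predictions from two candidate individuals.
actual = [0.50, 0.60, 0.55, 0.70]
candidates = {
    "individual_a": [0.52, 0.58, 0.57, 0.69],
    "individual_b": [0.40, 0.75, 0.50, 0.60],
}
scores = {name: mse(actual, preds) for name, preds in candidates.items()}
best = min(scores, key=scores.get)
```

The individual with the smallest MSE is the one carried forward by the selection step.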

IV. EXPERIMENTAL ANALYSIS
A. TYPES OF DATASETS
The proposed model is verified on two public datasets. The first dataset is the load prediction dataset of the Electrician's Mathematical Modeling Contest; it contains historical load, daily average temperature, humidity, and holiday data, and the load data sampling period is 1 day. The second dataset is a public dataset provided by European regions, with a sampling period of 1 h. The training and test sets are split in the ratio 8:2. We use Adam [46] to optimize the gradient of the stochastic objective function.

Algorithm 1 Improved Differential Evolution Optimizes Hybrid Neural Network
Step 1: Parameters initialization: mutation factor F_0, crossover operator CR_0, population size NP, and max generation MAX_G
Step 2: Population initialization: (c1, c2, l1, l2, l3, b), where c1 and c2 are the sizes of the CNN filters, l1, l2, and l3 are the numbers of neurons of the LSTM layers, and b is the time-steps for forecasting. Set the generation number G = 1
Step 3: while the stopping criterion is not satisfied
  for i = 1 to NP
    Step 3. … (17), whichever has the smaller value will be selected
  end for
  G = G + 1
end while
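The overall shape of Algorithm 1 can be sketched as a DE loop over individuals (c1, c2, l1, l2, l3, b). This is a skeleton under strong assumptions, not the authors' implementation: the search ranges in `BOUNDS` are hypothetical, `surrogate_mse` is a stand-in for training the hybrid network and returning its validation MSE, and constant F0 and CR0 are used in place of the improved adaptive factors to keep the example short.

```python
import random

# Hypothetical integer ranges for (c1, c2, l1, l2, l3, b).
BOUNDS = [(1, 8), (1, 8), (8, 128), (8, 128), (8, 128), (1, 48)]


def decode(vec):
    """Round a real-valued individual and clip it to the integer ranges."""
    return tuple(min(hi, max(lo, round(v))) for v, (lo, hi) in zip(vec, BOUNDS))


def surrogate_mse(params):
    """Placeholder fitness; a real run would train the network and return MSE."""
    target = (3, 5, 64, 32, 16, 24)  # arbitrary optimum for the toy objective
    return sum(((p - t) / (hi - lo)) ** 2
               for p, t, (lo, hi) in zip(params, target, BOUNDS))


def optimize(np_size=12, max_g=40, F0=0.5, CR0=0.9, seed=0):
    rng = random.Random(seed)
    pop = [[lo + rng.random() * (hi - lo) for lo, hi in BOUNDS]
           for _ in range(np_size)]
    fit = [surrogate_mse(decode(ind)) for ind in pop]
    history = [min(fit)]
    for _ in range(max_g):
        for i in range(np_size):
            r1, r2, r3 = rng.sample([k for k in range(np_size) if k != i], 3)
            mutant = [a + F0 * (b - c)
                      for a, b, c in zip(pop[r1], pop[r2], pop[r3])]
            j_rand = rng.randrange(len(BOUNDS))
            trial = [m if (rng.random() <= CR0 or j == j_rand) else t
                     for j, (t, m) in enumerate(zip(pop[i], mutant))]
            f_trial = surrogate_mse(decode(trial))
            if f_trial <= fit[i]:  # the individual with the smaller MSE survives
                pop[i], fit[i] = trial, f_trial
        history.append(min(fit))
    best = min(range(np_size), key=fit.__getitem__)
    return decode(pop[best]), fit[best], history


best_params, best_fit, history = optimize()
```

Because selection is greedy, the best fitness recorded in `history` never gets worse from one generation to the next, which is the property a convergence plot of this procedure displays.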

B. EVALUATION CRITERIA
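The mean absolute percentage error (MAPE) and the root mean square percentage error (RMSPE) were selected as the evaluation criteria for the model. They are calculated as follows:

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \bar{y}_i}{y_i}\right| \times 100\%$$

$$\mathrm{RMSPE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i - \bar{y}_i}{y_i}\right)^2}$$

where $\bar{y}_i$ is the predicted value of the neural network at time i, $y_i$ is the actual value at time i, and n is the number of predicted points.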
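Both criteria are short to compute. The sketch below assumes the usual definitions, with MAPE reported in percent and RMSPE as a fraction; that scaling convention and the sample values are assumptions made for the example.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((y - p) / y)
                       for y, p in zip(actual, predicted)) / len(actual)


def rmspe(actual, predicted):
    """Root mean square percentage error, as a fraction."""
    return math.sqrt(sum(((y - p) / y) ** 2
                         for y, p in zip(actual, predicted)) / len(actual))


# Made-up loads: each prediction is off by 10% of the actual value.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
mape_value = mape(actual, predicted)
rmspe_value = rmspe(actual, predicted)
```

Both measures divide by the actual value, so they are undefined when an actual load is zero and become unstable when it is very small.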

C. ANALYSIS AND DISCUSSION
In this section, we analyze and discuss the prediction accuracy of the proposed hybrid model through two experiments. One of them is to optimize the hybrid neural network through different evolutionary algorithms, and the other is to compare the proposed model with other variant models.

1) EVOLUTIONARY ALGORITHM COMPARISON EXPERIMENT
In the comparison among different variants of the basic algorithm, Figure 7 shows that, among the basic DE algorithm and the improved adaptive algorithm, DE/rand/1/bin needs the fewest iterations and reaches the optimal value fastest, so it performs better.
To further demonstrate the superiority of the ADE algorithm and to verify the performance of the improved adaptive DE in searching for the model network structure and time step, the search performance of the improved adaptive DE (ADE) is compared with that of other evolutionary algorithms: standard differential evolution (DE), particle swarm optimization (PSO), evolutionary strategy (ES), and the genetic algorithm (GA).
Table 3 shows that the RMSE and MAPE prediction errors of the DPCL model are the lowest. The ADE optimization time is also the shortest among the optimization algorithms. Figure 8 shows the convergence of the evolutionary algorithms. Compared with DE, ADE improves the optimal search value only modestly, but it runs for a shorter time and converges faster. Compared with the other evolutionary algorithms, the improved ADE achieves the lowest search value for the model, converges faster, and has a significantly shorter search time.

2) MODELS COMPARISON EXPERIMENT
After obtaining the optimized prediction model, we compare single LSTM, single CNN, hybrid CNN-LSTM, hybrid PSR-CNN-GRU, and DPCL models to verify that the model proposed in this paper achieves high prediction accuracy.
As shown in Table 4, the RMSE and MAPE of the DPCL prediction model are far lower than those of the PSR-CNN-LSTM, CNN-LSTM, CNN, and LSTM models. DPCL has the lowest RMSE and MAPE on both datasets.
On the European regions dataset, we perform load forecasting for one day and evaluate the accuracy of the 24-hour forecast. The RMSPE of DPCL is 0.0070 and its MAPE is 0.059, both clearly lower than those of the other models. PSR-CNN-LSTM also performs well and is better than CNN-LSTM, CNN, and LSTM. Fig. 9 and Fig. 10 show that DPCL fits best. The predictions of PSR-CNN-GRU are better than those of the remaining methods, but its predicted peak loads arrive early or late, while the predictions of CNN-LSTM deviate more and misjudge the load direction more often. The fit of CNN is relatively poor: it predicts the basic direction of the load but degrades at key turning points. In contrast, the predicted values of LSTM fit the actual values fairly well and largely follow the direction of the load, but they still cannot accurately capture load changes. Relatively speaking, the predicted values of the model in this paper are closer to the actual values.
Table 4, Fig. 11, and Fig. 12 show that DPCL performs very well on the Electrician's Mathematical Modeling Contest dataset. This experiment predicts the next month of load data, that is, the total daily load over one month. As shown in Fig. 11, the prediction accuracy of DPCL is higher than that of the other models, and the prediction error is kept within 1.5%. The MAPE values of DPCL improve on those of the PSR-CNN-LSTM, CNN, and LSTM models by 0.88%, 1.07%, 1.36%, 1.82%. It is worth noting that DPCL has high predictive accuracy on both datasets.

V. CONCLUSION
It is important to further improve the accuracy of short-term load forecasting in order to ensure the balance between power supply and load demand. We propose a model based on an improved differential evolution algorithm (ADE) for optimizing CNN-LSTM hybrid neural networks. The experimental analysis, through data processing and algorithmic modeling, shows that:
1) The adoption of normalization, phase space reconstruction, and correlation analysis reduces the dimensionality of the data and the computational cost, and by retaining the highly correlated data features it prepares the ground for improving the accuracy of the model.
2) A hybrid of CNN and LSTM combines the advantages of the two neural networks in spatial feature extraction and temporal feature extraction. It achieves higher accuracy than a single neural network model.
3) The hybrid neural network model obtained by training the ADE-optimized CNN and LSTM overcomes the training drawback that the number of neurons and the step size cannot be determined in advance. It achieves higher accuracy than other neural network models.
In terms of prediction performance and operation time comparison, the DPCL model is valuable for short-term load forecasting systems and overall power plant operation and maintenance.
YUAN HUANG was born in Hebei, China, in 1987. He received the bachelor's and master's degrees from the Hebei University of Engineering, in 2010 and 2013, respectively, and the Ph.D. degree from Yanshan University, in 2017. Since 2017, he has been working as a Teacher with the School of Information and Electrical Engineering, Hebei University of Engineering. He has published 11 articles. His research interests include data mining and machine learning.
RUIXIAO ZHAO was born in Henan, China, in 1996. She received the bachelor's degree from the Xinxiang University of Science and Technology, in 2018. She is currently pursuing the master's degree with the School of Information and Electrical Engineering, Hebei University of Engineering. Her research interests include data mining and machine learning.
YUXING XIANG was born in Chongqing, China, in 1997. He received the bachelor's degree from Chongqing Technology and Business University, in 2019. Since 2019, he has been a Student with the School of Information and Electrical Engineering, Hebei University of Engineering. His research interests include data mining and machine learning.
VOLUME 10, 2022