A Hybrid Forecasting Model for Short-Term Power Load Based on Sample Entropy, Two-Phase Decomposition and Whale Algorithm Optimized Support Vector Regression

To improve the accuracy and reliability of short-term power load forecasting and reduce the difficulty caused by load volatility and non-linearity, a hybrid forecasting model (CEEMDAN-SE-VMD-PSR-WOA-SVR) is proposed. Firstly, the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is employed to generate multiple intrinsic modal functions (IMF) by decomposing the historical power load series. Then the sample entropy (SE) of each IMF is calculated to quantitatively evaluate the corresponding complexity. Afterward, variational mode decomposition (VMD) is adopted to achieve secondary decomposition for the component with the maximum sample entropy. Subsequently, the phase space reconstruction (PSR) is applied to reconstruct each IMF into a high-dimensional feature space matrix, which is formed as the input of support vector regression (SVR). Finally, SVR optimized by whale optimization algorithm (WOA) is used for the prediction, where the predicted values of all IMFs are accumulated to obtain the final prediction results. The experimental result demonstrates that the proposed hybrid model can effectively decompose the load series with non-linear characteristic and provide more accurate forecasting results by comparing the other models.


I. INTRODUCTION
Power load forecasting plays an important role in power system planning and economic operation. However, the environment and human activities greatly affect the power load. The fluctuation of the short-term power load sequence occurs in real-time, and changes periodically in units of days and weeks. Therefore, the time series of short-term power load has certain regularity as well as large volatility and The associate editor coordinating the review of this manuscript and approving it for publication was Canbing Li . non-linearity, which brings great difficulties during the accurate prediction. At present, the commonly used power load prediction methods are mainly divided into the following two categories [1]: mathematical statistics model, and artificial intelligence algorithm model. The mathematical statistics model is relatively simple to be built based on a strong theoretical basis. For instance, Zhao et al. [2] proposed a data-driven nonparametric autoregressive (AR) model and obtained higher short-term load forecasting accuracy by comparing AR models. To obtain a better prediction effect than the AR and ARMA model [3], VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ Lee et al. [4] proposed an improved autoregressive integrated moving average (ARIMA) model, which improved shortterm load prediction performance. Considering that the seasonal information integrated into the load can improve the performance of the above statistical model, Bozkurt et al. [5] proposed a hybrid model based on SARIMA, and the experimental results verify the feasibility of the method. However, the statistical model also has the problem of insufficient ability for fitting the non-linear load sequence. Over the past decade, artificial intelligence (AI) attracts a lot of attention. Intelligent algorithms with high solving efficiency, such as back propagation neural network (BPNN) and support vector regression (SVR), have been widely applied in load forecasting and achieved fine results [6]- [9]. Neural network models like BP have the good non-linear fitting ability, but they have more parameters, longer training time, and lack of training stability. By contrast, the SVR model possesses fewer parameters and faster running speeds on small samples, but its internal parameters have an impact on the prediction effect. Consequently, the SVR was optimized by a radial basis function network and chaotic artificial bee colony algorithm, which improved the prediction performance of the SVR model [10], [11].
Generally, there are still non-linear load sequences that cannot be weakened in the above models, and thus the more accurate predicted values could not be obtained. In contrast, the hybrid forecasting model usually has the advantages of effectively reducing the non-stationary time series of power load and high solving efficiency.For example, a power load prediction model combining empirical mode decomposition (EMD) with AR achieves a good prediction effect [12]- [14]. Furthermore, some scholars proposed a new short-term load forecasting model based on ensemble empirical mode decomposition and support vector machine (EEMD-SVM) model [15], [16], which effectively improved the prediction performance of the traditional model. Liu et al. [17] proposed a short-term load forecasting model that combines EEMD and SVR optimized by whale algorithm (EEMD-WOA-SVR), which avoids the PSO algorithm easily falling into local optimum and greatly enhances the solving efficiency of the SVR model. Complete EEMD with adaptive noise (CEEMDAN) is an improved method based on EMD and EEMD, which can achieve a better decomposition effect. An optimized dragonfly algorithm was used to optimize the CEEMDAN-SVR model [18]- [20], which effectively promoted the prediction accuracy of the power load. Variational modal decomposition (VMD) is a mature signal decomposition technology, which is also widely used in various fields [21]. Kim et al. [22] proposed the VMD-LSTM model to study the load series and obtained satisfactory forecasting results. Moreover, this kind of hybrid model is also feasible in such research fields as wind speed forecasting, air quality forecast, and runoff prediction [23]- [25]. For instance, Sun and Wang [26] adopted phase space reconstruction (PSR) to optimize the decomposed modal component sequence and established an improved BP model to predict short-term wind speed. While seeing the advantages of decomposition techniques, we should also realize that a single decomposition may be hard to achieve the potential problem of sufficiently weakening the non-linearity of the sequence.
To obtain a smoother sequence and better prediction effect than a single decomposition, the two-phase decomposition framework CEEMDAN-VMD [24] considering sample entropy (SE) was adopted, and the ELM optimized by the differential evolutionary algorithm was employed to forecast the air quality, and the results showed the efficiency of this method. In addition, the two-phase decomposition (CEEMDAN-VMD) combined with SVM has equal applicability for runoff prediction [25]. In summary, the previous references demonstrate that CEEMDAN and VMD can avoid the drawbacks of other decomposition techniques, and they are successfully applied to the decomposition of the non-linear load time series. SVR is a classical and practical technique in machine learning, which has excellent performance. If the key parameters and kernel functions of the SVR can be optimized sufficiently, the solving efficiency of the SVR model can be enhanced. As a novel population intelligent algorithm, the whale optimization algorithm (WOA) has a stronger global search capability than the traditional intelligent algorithm by introducing a unique spiral update operation, so it is widely used to optimize various prediction models [17], [27], [28].
Based on the above introduction, the single decomposition technique has a limited ability to decompose the non-linearity sequences and the two-phase decomposition technique can further weaken the non-stationary of the time series. At the same time, this combined decomposition technique is rarely used in the field of power load forecasting and current researches lack further analysis of prediction errors. Therefore, this study firstly utilizes the two-phase decomposition framework (CEEMDAN-SE-VMD) to obtain a more stable sequence. Next, PSR is adopted to optimize all sequence components. Then, the SVR model optimized by the WOA algorithm is used for short-term load forecasting. Finally, the predicted values of all components are accumulated to obtain a predicted result. The proposed approach performs load forecasting by taking the real-time power load series of the Jiangyin city, Jiangsu Province, China, for August 2018. To the end, the prediction performance of the proposed model is compared and analyzed with various single prediction models (BP, SVR, and LSTM), hybrid models (EMD-SVR, EEMD-SVR, VMD-BP, and CEEMDAN-SVR), and two-phase decomposition model (CEEMDAN-SE-VMD-SVR).
The main contributions of this paper are presented as follows: 1. An innovative data preprocessing method: Based on the complexity discrimination of sample entropy, the proposed model proposes a two-phase (CEEMDAN-SE-VMD)) decomposition technique to preprocess the short-term power load sequence, which fully weakens the non-linearity of the load sequence, obtains more stable and less complex sequence components, and effectively reduces the difficulty of prediction. Subsequently, PSR is applied for the optimization and reorganization of sequence components, thereby facilitating the training and learning of data by the prediction model.
2. Optimized prediction model: The parameters of the traditional SVR model cannot be selected well, which limits the prediction accuracy of the model. Therefore, the WOA algorithm with strong global optimization ability is proposed to optimize the internal parameters of SVR, which greatly improves the prediction performance of the SVR model. 3. A comprehensive evaluation of the predictive performance of the proposed model: By comparing multiple groups of forecasting models, combined with a variety of mathematical statistics evaluation index (MAE, RMSE, MAPE, and R 2 ), an in-depth analysis of the prediction error of each model, the effectiveness of the proposed model in forecasting shortterm power load series is comprehensively considered and verified.
The remainder of this paper is organized as follows: Section 2 presents the base theoretical of CEEMDAN, SE, VMD, PSR, WOA, and SVR. Section 3 introduces the main procedures of the proposed hybrid model in detail. Section 4 provides the experimental data and the specific operation of the model. Section 5 describes a comparison with other prediction models and analyzes prediction errors. The conclusion work is summarized in Section 6.

A. COMPLETE ENSEMBLE EMPIRICAL MODE DECOMPOSITION WITH ADAPTIVE NOISE
Empirical mode decomposition (EMD) [13], [29] can be used to analyze non-linear signal sequences. It does not use any defined function as the basis, but adaptively generates the inherent modal function based on the analyzed signal. The purpose of EMD is to decompose a signal f (t) into multiple intrinsic modal functions (IMFs) and a residual (RES). On this basis, EEMD adds a limited amount of white noise to the original signal to divide the frequency range in timefrequency space based on EMD, which makes EEMD effectively reduce the chance of modal aliasing [15], [30].
With the increase of white noise frequency in the EEMD, the efficiency of the algorithm decreases, and the prediction error after reconstruction becomes larger. To overcome this drawback, the CEEMDAN improved the EEMD by adding a finite amount of adaptive white noise [18], [24]. In the CEEMDAN algorithm, the decomposed modal component is represented by IMF. The operator E j (·) denotes the j-th modal component after the given signal is calculated by the EMD method. ω i is the i-th added Gaussian white noise satisfying N (0, 1) distribution. If f (t) is the original signal sequence, it is denoted by f (t) + ε 0 ω i (t) after adding the white noise. The signal decomposition process applying CEEMDAN can be briefly described based on references [18], [31], [32], as shown in Equations (1) -(7).
Step 1: Obtain the first modal component decomposed by EMD algorithm : Step 2: Calculate the first residual margin R 1 (t): Step 3: EMD is used to decompose the signal R 1 (t) + ε 1 E 1 ω i (t) repeatedly for N times to obtain the second modal component: Step 4: For k = 2, 3, . . . K , calculate the k-th residual margin: Step 5: Repeat Step 3 to calculate the (k + 1)-th modal component and obtain: Step 6: Repeat Steps 4 and 5 until the residual margin is not suitable for decomposition. The final residual margin: Step 7: Finally, Equation (7) descripts the original signal decomposed by CEEMDAN: B. SAMPLE ENTROPY Sample entropy (SE) is a method to quantitatively describe the complexity of time series by measuring the probability of generating new patterns from the original sequence [24]. When the SE value is larger, the complexity of the original sample sequence is higher and the prediction difficulty is greater. Therefore, it is necessary to calculate and screen the power load sequence with the highest entropy value. The definition process of SE is given in Step 1 -Step 6: Step 1: For the one-dimensional sequence sampled at the same time interval, the time series with dimension N is collected: Step 2: Reconstruct the above sequence to m-dimension, with the biggest absolute value difference in sequence elements, and its expression Step 4: Calculate the ratio r ≥ D [X m (i), X m (j)] of the number of vectors meeting the threshold of N − m and B m i (r), and obtain the mean value of the vectors. It is displayed in Equation (8): Step 5: When dimension m increases by 1, repeat Steps 2 -4 to obtain B m+1 (r), then SE can be defined as Equation (9): Step 6: Usually, N is a finite value, and Equation (9) can be estimated as Equation (10): where m represents number of dimensions; r represents conditional threshold.

C. VARIATIONAL MODE DECOMPOSITION
Variational mode decomposition (VMD) is a self-adaptive, completely non-recursive modal variation and signal processing technique [21], [23]. It is considered that the single decomposition (CEEMDAN) usually can not weaken the non-linear and irregular short-term load series sufficiently. To fully decompose the load sequence components, this study intends to use VMD for two-phase decomposition [24], [25]. VMD overcomes the problem of end effect and modal component aliasing in the EMD method and has a more solid mathematical theoretical basis, which can reduce the non-linearity of the time series. The VMD decomposition obtains multiple different frequency scales with relatively stable subsequences.
Suppose the original signal f is decomposed into k components to ensure that the decomposition sequence is a modal component with limited bandwidth and central frequency. The specific variational constraint expression is as Equation (11): where u k and ω k respectively corresponding to the k-th modal component and the center frequency; K is a positive integer and represents the number of modals that need to be decomposed; δ(t) is a Dirac function; * is a convolution operator. Furthermore, the constraint condition is that the sum of all the modal components obtained by decomposition is consistent with the original sequence.
To solve the Equation (11) with variational constraints, a Lagrange multiplication operator is introduced, and quadratic penalty factor α is introduced to reduce the interference of Gaussian noise. The augmented Lagrange expression is expressed as Equation (12): The alternating direction multiplier (ADMM) iterative algorithm is used in combination with Parseval-Plancherel and Fourier isometric transform to optimize each modal component and center frequency [33]. The expressions of u k , ω k and λ can be described as Equations (13) -(15): where γ is the noise tolerance, which satisfies the fidelity requirement of signal decomposition, andû n+1 k (ω),û i (ω), f (ω) andλ(ω) correspond to the Fourier transform forms of u n+1 k (t), u i (t), f (t) and λ(t), respectively. The decomposition steps of VMD can be briefly expressed as Step 1 -Step 4: Step 1: Initializeû 1 k , ω 1 k and λ 1 , and set the maximum number of iterations N , n ← 0; Step 2: Apply Equations (13) and (14) to updateû k and ω k ; Step 3: Updateλ using Equation (15); Step 4: The accuracy convergence criterion ε needs to be greater than zero. If k û n+1 k −û n k 2 2 / û n k 2 2 < ε and n < N are not satisfied, the Step 2 is returned. Otherwise, the iteration is completed, and the final u k and ω k are output.

D. PHASE SPACE RECONSTRUCTION
Phase space reconstruction (PSR) technique is proposed by Packard et al. [34] in 1980, which is an effective and common method to analyze chaotic time series. In this paper, PSR technology is applied to "process and optimize" the twophase decomposition sequence to form the input of the SVR.
The coordinate delay reconstruction method is adopted to construct the phase space vector of the modal component sequence under delay time τ and embedded dimension d . Therefore, the PSR [21], [34] expression of x = {x i | i = 1, 2, . . . , N } at different prediction layers for the modal component sequence h decomposed by the two-phase decomposition can be expressed as Equation (16): where X i represents the i-th space vector in the phase space; N is the total number of load samples. The corresponding output matrix O could be shown as Equation (17): E. SUPPORT VECTOR REGRESSION Support vector machine (SVM) is a kind of machine learning algorithm, which has been widely applied in regression prediction. Compared with the neural network model, SVR has a stable and efficient solving capability. SVR [12], [18], [25] mainly realizes the problem of the linear inseparability of sample data through penalty coefficient and kernel function technology. It is assumed that there are n non-linear load data sets (x i , y i ), x i is the load sample input, y i is the output,φ(x i ) represents the feature vector after x i is mapped to the highdimensional feature space. The corresponding optimal formula is as Equation (18): where ω represents the normal vector and b is the displacement term. The essence of the SVR model training process is to find the optimal ω and b to make f (x i ) approach y i , and its convex optimization function is shown as Equation (19): The constraint conditions are illustrated by Equation (20): where C represents the penalty coefficient, which is used to balance structural risk 1 2 ω 2 and empirical risk  22): In the above Equation, g is a kernel parameter; the kernel function K (x, x i ) is used to replace the linear term in the linear function, and the traditional linear model can deal with the non-linear regression problem.

III. PROPOSED HYBRID MODELLING A. MODEL INTRODUCTION AND OPTIMIZATION
To weaken the non-linearity of power load time series, reduce the difficulty of short-term power load prediction and improve the prediction accuracy, a hybrid prediction model based on CEEMDAN-SE-VMD-PSR-SVR was proposed to predict short-term power load. Firstly, the original load data is preprocessed by the composite modal decomposition framework (CEEMDAN-SE-VMD), which fully weakens the nonlinearity of the load time series and reduces the difficulty of forecasting. And then the PSR technique assembles the series of load components to form the input of the SVR. Finally, this study adopts the optimized SVR model to make a prediction.
Based on fully decomposing the non-linear sequences, this paper proposes a whale optimization algorithm (WOA) optimized CEEMDAN-SE-VMD-PSR-SVR hybrid model to achieve SVR parameter optimization and obtain better prediction performance. WOA is a swarm intelligence optimization algorithm inspired by humpback whale hunting and predating behavior. This algorithm has the advantages of simple parameter adjustment, fast convergence speed, and strong global optimization ability than general intelligent algorithms [17], [27], [28]. Furthermore, the WOA flow is briefly described in Figure 1.
(1) Wrap around [27]. The humpback whales constantly adjust their position according to the position of prey combined with spiral movement to obtain the best hunting strategy. This search strategy can be described as Equation (23): where D is the distance between the current humpback whale and its prey; X * (t) is the current position vector to obtain the optimal solution; X (t) is the current position vector; X (t +1) is the target vector after the next iteration; A and C are coefficient vectors, which can be calculated by Equation (24): where A ∈ [−1, 1]; r 1 and r 2 are random vectors in (0, 1), the constant a linearly decreases during the iteration process; t represents the current number of iterations; T is the maximum number of iterations. (2) Hunting behavior. The humpback whale uses a spiral equation to update the position of the humpback whale after searching and surrounding its prey: where D i is the distance between the current best position of the i-th whale and its prey; b is the shape constant of the logarithmic spiral; l is the random number between [−1, 1]; P is a random number between [0, 1]. The spiral position is updated by Equation (26): (3) Search for prey. In actual hunting, humpback whales combine spiral search and random search, that is, random search within the range of |A| > 1, which is convenient to find other suitable foods and enhance WOA's global search and predation ability: where X rand is a randomly selected position in the whale group. In the subsequent iterations, the current optimal solution and the global optimal solution are selected [17], [27].

B. SPECIFIC PROCEDURES
Since considering the sample entropy of the load component sequences, the non-linearity original sequence is fully weakened by the two-phase decomposition, and the optimized SVR model is used for prediction. Therefore, the prediction accuracy of the traditional model could be improved. The specific flow of the proposed hybrid model is shown in Figure 2.  (6) to get the final result of the power load prediction.

C. MODEL PERFORMANCE EVALUATION INDICATORS
In this paper, the prediction performance of the proposed model on the test set is assessed and analyzed by the statistical indexes [35], [36] such as the mean square error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and the determination coefficient (R 2 ). And the indexes are shown in Equations (28) -(31).
where y i , y i and y i denotes the actual, average, and predicted values of power load sequences.

IV. CASE STUDY A. DATA COLLECTION
The reliability of data and prediction model is the significant factors affecting the accuracy of short-term load forecasting. The experimental data in this paper are from the power load The historical data of the original load sequence is shown in Figure 3.

B. ENGINEERING APPLICATION
In the first place, the original load sequence is decomposed by CEEMDAN to obtain multiple modal components. Then, the SE value of each sequence component is calculated, and the modal component with the maximum SE value is decomposed by VMD. As a modal decomposition technology with high decomposition efficiency, VMD can fully reduce the series non-stationary property with high complexity and reduce the difficulty of short-term power load prediction. For this reason, several key parameters of CEEMDAN were  set as [18]: noise standard deviation, Nstd = 0.05; number of white noise groups, NR = 500 ; maximum iterations, MaxIter = 5000 . Figure 4 shows the 9 modal components IMF and 1 residual component RES at different frequency scales obtained by CEEMDAN.
To accurately calculate the modal component of the highest SE value obtained by CEEMDAN, the core parameter m in the SE algorithm is set as 2, and the conditional threshold r choose 0.2 times of the standard deviation(σ ) of the modal sequences [26]. Table 1 illustrates the SE value of each modal component decomposed by CEEMDAN.
It can be seen from Table 1 that the SE value of IMF1 is the largest, indicating that IMF1 is the sequence with the highest complexity after CEEMDAN decomposition. Therefore, the two-phase decomposition of IMF1 is conducted by VMD. To ensure that IMF1 is effectively decomposed and avoid the influence of too many modal components on parameter optimization of the WOA algorithm, the parameter K of VMD is set to 4, and the remaining parameters are the default values [21], [33]. As illustrated in Figure 5, the modal components VMF1-VMF4 were obtained through VMD.
To explore the embedded features of the power load time series, the phase space reconstruction (PSR) technique, an effective and commonly used chaotic time series analysis tool, is used to process the decomposed modal component sequences of CEEMDAN and VMD [21], [34], [35]. As a result of PSR, the sample can better reflect the change of the actual power load and enter the SVR forecasting model. However, the selection of the delay time τ cannot be too small and the embedded dimension of d should be appropriate. Otherwise, the correlation of the coordinates will be too strong, making it difficult to display the internal information of the power load. Hence, the two key parameters of PSR [36]: delay time τ chooses 1, embedded dimension d chooses 5, specific reconstruction process as shown in Equations (16) - (17). After PSR optimization, the modal component data still need to be normalized before input into the SVR model to improve the training efficiency.
Considering that the WOA algorithm has the characteristics with simple structure, few parameters, strong search ability, and high convergence accuracy, this paper adopts WOA to optimize the SVR model. To improve the searching ability of the whale algorithm and obtain the global optimal solution, relevant parameters should be set reasonably. The size of the whale group, the maximum number of iterations, and the search range of SVR parameters are specified in Table 2. With the help of the WOA algorithm, the hyperparameters of SVR including penalty coefficient C and kernel parameter g are optimized and selected reasonably [17], [27], [36]. Finally, the predicted values of all modal components are summed up to complete the final short-term load forecasting. The optimization parameters of the involved models are shown in Table 2, and σ represents the standard deviation value of the sequence obtained by CEEMDAN. The grid search algorithm is used to select the internal parameters of SVR in the comparison model [37], and the search range is [2 −10 , 2 10 ]. Other important parameters are set by trial and error and predetermined parameter values, as detailed in Table 2.
The proposed model is applied to the actual load prediction by adopting the process mentioned in Figure 2 and combining it with the preferred parameters in Table 2. The short-term power load forecasting effect diagram is shown in Figure 6.
When carrying out research on the prediction of nonlinear and fluctuant power load time series, Figure 6 shows  that the proposed two-phase decomposition technique combined WOA-SVR (CEEMDAN-SE-VMD-PSR-WOA-SVR) model can well fit and predict the future sequences. It can obtain accurate results in short-term urban load forecasting and provide reasonable arrangements for power grid operation in summer.

V. DISCUSSION
To further illustrate the performance of the proposed hybrid model, it is compared with other predictive models (BP, SVR, LSTM, EMD-SVR, EEMD-SVR, VMD-BP, CEEMDAN-SVR, and CEEMDAN-SE-VMD-SVR). Finally, the evaluation indexes (MAE, RMSE, MAPE and R 2 ) are used to test and verify the superiority and accuracy of the proposed model. Table 3 shows the number of modal components obtained by decomposing the original load sequence for each model. Table 4 shows the prediction effect of each modal component. Table 5 shows the evaluation indexes of the specific prediction effect of each model. Figure 7 shows the prediction effect of the proposed model and other comparable VOLUME 8, 2020   models. Figure 8 compares the prediction error and the fitting effect of each model.
It should be noted that the BP, SVR, and LSTM models have not undergone modal decomposition, and the number of  remaining model decompositions is shown in Table 3. Firstly, The proposed model adopts CEEMDAN to obtain 10 components. The VMD then decomposes IMF1 into 4 components. Finally, 13 modal components are obtained from the proposed model. Table 4 gives the prediction error and fitting effect of the predicted value of each component compared to the true value. It shows that the proposed model can achieve better a prediction effect on each component. Accumulating the predicted values of all components can realize the final load forecast.
From Table 5 and Figure 7, the proposed model has obtained the optimal prediction effect on the test set, with the smallest error index (MAE, RMSE, and MAPE) and the highest fitting index (R 2 ), which is respectively 19.53 MW, 23.81 MW, 0.51% and 99.32% on the test set.
For example, compared with a single prediction model, the prediction error of the BP model is the largest, and the proposed hybrid model is the smallest. The prediction effect of SVR and LSTM is better than that of BP. However, the prediction effect of SVR and the LSTM model is not ideal. The neural network model has a good fitting performance, but in the case of non-linear load prediction, the prediction performance is not stable and needs a long time of training due to the requirement to adjust many parameters. The mode decomposition technology can decompose non-linearity power load sequences. Therefore, the hybrid model is much better than that of BP, SVR, and LSTM. Taking MAPE as an example, compared with the SVR model, the EMD-SVR model decreased by 0.04% in the test set. The VMD-BP model benefits from the high decomposition efficiency of VMD, and the MAPE index is 0.43% lower than that of the BP model. However, the VMD-BP model has been trained for a long time in the experiment. The predictive performance of the CEEMDAN-SVR model is better than that of EMD-SVR, EEMD-SVR, and VMD-BP models, and the fitting effect on the test set reaches 93.09%. Since the above single decomposition technique could not deal with the non-linear data sequence well, and there is room for improvement. The proposed model takes the sample entropy of the power load sequence into consideration and combines two-phase decomposition techniques, CEEMDAN and VMD, to sufficiently weaken the non-linearity of the original sequence and further reduce the difficulty of prediction. PSR is used to optimize the sequence components and input of the SVR prediction model, and the WOA is used to optimize the SVR model, which improved the solving efficiency of the SVR. The above optimization operation enhances the prediction performance of the traditional CEEMDAN-SVR and CEEMDAN-SE-VMD-SVR model. For example, MAPE decreases by 0.67% and 0.5%, while R 2 increases by 6.23% and 4.37%, respectively. Therefore, we can conclude that the proposed model uses the two-phase decomposition technique and WOA optimized SVR hybrid model to get a smoother load sequence, which reduces the difficulty of prediction and is of great help to improve the prediction accuracy and stability of the traditional model.
To show the prediction effect of each model more clearly and intuitively, Figure 7 shows the prediction effect of each model, and Figure 8 compares the prediction error and fitting capability of each model.
It can be observed from Figure 7 and Figure 8 that the proposed model effectively predicted the real load sequence, with the minimum prediction error and the best fitting effect. To further study and analyze the distribution characteristics and prediction effect of the prediction errors of each model, the Gaussian mixture distribution test and statistical analysis of the prediction errors of each model can be carried out.
Based on the Gaussian Distribution [23], [38], the distribution of prediction errors of each model is tested and analyzed. It is assumed that the prediction error sequence X is the probability distribution subject to the position parameter µ and scale parameter σ , where µ represents the expected mean of the error value, σ is the standard deviation, and the normal distribution [39] satisfying the N (µ, σ 2 ) prediction error is as shown in Equation (32): As demonstrated in Figure 9, it shows the prediction error value of each model at each test point and calculates µ and σ of the error value of each model by using Equation (32). Then describe the Gaussian frequency histogram and fitting distribution curve of the prediction error of each model.
As can be seen from the prediction error accumulation diagram in Figure 9, the error value of the proposed model is smaller than that of other models at specific prediction points, and the standard deviation is the smallest in the whole test set. The mathematical statistics analysis of the prediction error value describes the Gaussian distribution histogram and the fitting curve of the prediction error [23], [39], [40]. The result shows that the error of the proposed model is mainly concentrated in the near zero, and the probability of points with low prediction error values is higher. In contrast, for other models, the error value offset zero points on both sides, and the occurrence frequency of the points with larger prediction error value is higher. It shows that the error distribution of the proposed model is more reasonable and has higher observation accuracy. On the test set, the intelligent algorithm model (BP, SVR, and LSTM) model has a wide error distribution range and low prediction accuracy. The single decomposition model (EMD-SVR, EEMD-SVR, VMD-BP, and CEMDAN-SVR) and two-phase decomposition model (CEMDAN-SE-VMD-SVR) reduce the non-linearity of the power load sequence due to mode decomposition, thus ensuring the prediction effect. The proposed model obtained a more stable load sequence by utilizing two-phase decomposition and then adopted the WOA optimized SVR model for prediction. Hence, the prediction effect is better than all the comparison models, and the solution effect is efficient and stable. The actual prediction error distribution of the proposed model is more reasonable and concentrated, with less volatility and higher observation accuracy and prediction stability.

VI. CONCLUSION
Short-term load forecasting plays an important role in power system dispatching and market transaction. To reduce the non-linearity of load series and improve the forecasting accuracy, this paper proposes a hybrid model for short-term load forecasting, which combines two-phase decomposition and optimized support vector regression (CEMDAN-SE-VMD-PSR-WOA-SVR). The proposed model firstly uses CEEMDAN to decompose the original sequence and calculate the sample entropy of all components. Subsequently, the sequence with the largest entropy value is used for two-phase decomposition using VMD technology, which fully weakens the non-linearity of the sequence. Then PSR optimizes all the modal components and inputs them into the SVR model. Next, the SVR model optimized by the WOA algorithm is used to train the data and predict the future sequence. Finally, the prediction accuracy of the proposed model is compared with BP, SVR, LSTM, EMD-SVR, EEMD-SVR, VMD-BP, CEEMDAN-SVR, and CEMDAN-SE-VMD-SVR. And the distribution of the prediction error of each model is analyzed by mathematical statistics. According to the detailed prediction results and the corresponding comprehensive analysis, the conclusions can be drawn: (1) The proposed hybrid model achieves the highest prediction performance and the smallest prediction error among all models. It is verified that the proposed hybrid model can effectively improve the forecasting accuracy of shortterm power load. (2) The performance of the hybrid models (EMD-SVR, EEMD-SVR, VMD-BP, CEEMDAN-SVR, and CEMDAN-SE-VMD-SVR) preprocessed by modal decomposition is better than that of the BP, SVR, and LSTM models, indicating the effectiveness of modal decomposition technology in reducing non-linearity short-term load sequences. (3) As two kinds of signal processing technologies with high decomposition efficiency, CEEMDAN and VMD have good complementary advantages. After the sample entropy screening, the proposed hybrid model is decomposed into a two-phase mode to obtain a more stable load sequence component, which reduces the difficulty of prediction. Compared with the traditional single-phase decomposition technology, it is improved to a certain extent and has better practicability. (4) PSR can effectively optimize the sequence components and form the input of SVR. Moreover, the WOA algorithm is very effective for optimizing and selecting the key parameters (penalty coefficient C and kernel parameter g) inside the SVR model. It has a distinct effect on improving the prediction accuracy of the hybrid model and the fitting effect reaches 99.32%. (5) The hybrid model proposed in this paper combines historical data preprocessing, prediction model optimization, and prediction error analysis to carry out research on short-term load forecasting, which is more conducive to achieving higher forecasting accuracy.
QIANG SHI is currently pursuing the master's degree in electrical engineering with China Three Gorges University (CTGU). His research interests include machine learning, power load forecasting, and optimal reservoir operation.
MUHAMMAD SIBTAIN is currently pursuing the Ph.D. degree in electrical engineering with China Three Gorges University (CTGU). His research interests include machine learning and optimal reservoir operation.