Short-Term Power Load Prediction Based on VMD-SG-LSTM

Power load prediction plays an important role in the safety and stability of the national power system. However, the nonlinear and multi-frequency characteristics of the power system make power load prediction difficult. To address this problem, we propose a short-term power load prediction model based on variational mode decomposition (VMD). First, the original data are decomposed into intrinsic mode functions (IMFs) of different frequencies using the VMD algorithm, and the decomposed sub-functions are reconstructed. The reconstructed data are then smoothed with the Savitzky-Golay (S-G) filtering algorithm to obtain the change trend of raw data (CTRD). Next, the IMFs, the CTRD and the raw data are used as inputs to a long short-term memory (LSTM) network that predicts the short-term power load. Finally, the proposed prediction model is compared with two other groups of prediction models. The results show that the proposed VMD-SG-LSTM prediction model has high fitting ability and high prediction accuracy, and is an effective method for short-term power load prediction.

If the power generation capacity of the national power system is lower than the supply demand, there will be large-scale blackouts, which will affect people's normal life and social production and cause immeasurable losses. On the contrary, if the power generation capacity of the national power system is higher than the supply demand, resources will be wasted because power plants are left idle for a certain period of time [1]. Therefore, how to accurately predict the future electricity demand so as to maintain the supply and demand balance of the power system has become a top priority.

The associate editor coordinating the review of this manuscript and approving it for publication was Qichun Zhang.

In the field of power load prediction, many prediction methods have been proposed, including physical model methods, statistical methods, and artificial intelligence methods [2], [3], [4], [5], [6], [7]. The physical model can be used to accurately predict the power load, but it is difficult to design because the necessary information about the system components is hard to describe accurately when the model becomes more complex. Also, its transferability is poor: when the designed model is applied to other systems for prediction, its accuracy is greatly reduced. Statistical methods rely more on the periodicity and outliers of data [8]. In the prediction process, statistical methods tend to identify the intrinsic patterns from the historical data of power load and compare them with other parameters, and then summarize the data for prediction. However, the relationship between power load and other parameters is generally complex and nonlinear, and for these reasons, it is difficult to obtain accurate predicted values with statistical [...]

[...] power load prediction and proposes a short-term power load prediction model based on VMD-SG-LSTM. By using IMF, CTRD and raw data as inputs, a new power load dataset is constructed and the capability of LSTM to process long-term sequences is used to predict short-term power load, thus improving the fitting ability and prediction accuracy of the model. Its main contributions are as follows:

(1) An improved parameter selection method for the Savitzky-Golay filter is proposed. By setting the value of MAE, the appropriate window width and polynomial fitting order can be automatically allocated, which ensures the filtering effect and greatly reduces the time spent selecting parameters.

(2) A short-term power load prediction model based on VMD-SG-LSTM is proposed. By taking IMF, CTRD and raw data as input, and using the ability of LSTM to process long time series to predict short-term power load, high prediction accuracy can be achieved.

(3) Compared with the other two groups of prediction models, the VMD-SG-LSTM model not only has high prediction accuracy, but also fits the predicted peaks and troughs well. Compared with the other three latest prediction models, this model has the best prediction performance.

The main contents of this paper are organized as follows. Section II introduces the algorithms used in this paper. Section III presents the proposed VMD-SG-LSTM prediction model and the performance evaluation method of the model. Section IV presents the experimental and comparative analysis of the proposed method. Finally, the conclusion is given in Section V.

VMD is a variational signal processing method which integrates the Hilbert transform method, the alternating direction multiplier method, and the Wiener filter method [23]. More specifically, the VMD decomposition algorithm can decompose an actual signal or original data into K band-limited intrinsic mode functions. The advantage of VMD is that it can decompose non-stationary and nonlinear time-series signals and filter out some noise and interference from the original signal. Theoretically, the modal aliasing phenomenon can be effectively suppressed by selecting an appropriate value of K [24]. The main flow of the VMD decomposition algorithm is as follows:

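Assuming that the original signal is composed of K finite-bandwidth intrinsic mode components $u_k(t)$, each with central frequency $\omega_k$, the analytic signal of each mode $u_k(t)$ is calculated by the Hilbert transform, and its unilateral spectrum can be expressed as:

$$\left[\delta(t) + \frac{j}{\pi t}\right] * u_k(t)$$

The analytic signal of each mode is multiplied by the operator $e^{-j\omega_k t}$, so that the mode $u_k(t)$ is modulated to the corresponding baseband:

$$\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t}$$

The squared $L^2$ norm of the gradient of the demodulated signal is calculated and the bandwidth of each modal signal is estimated, which gives the constrained variational problem:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)$$

where $f(t)$ is the original signal. Introducing the Lagrange multiplier $\lambda(t)$ and the second-order penalty factor $\alpha$ gives the augmented Lagrange function:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle$$

The alternating direction multiplier method is used to solve this problem iteratively, and all components can be obtained from the following formulas:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2}$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega}{\int_0^{\infty} \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega}$$

The number of modes K is selected using $E_{rse}$, the ratio of residual energy to original energy. When the ratio is less than 1%, it can be considered small enough. When the decline of $E_{rse}$ with increasing K changes from steep to level, the corresponding K value can be selected as the number of IMFs.

VOLUME 10, 2022

The S-G filter is characterized by the ability to filter out noise and interference while ensuring that the shape and width of the original signal do not change, so the change trend of the original signal can be more effectively preserved and analyzed [26]. The principle of the S-G filter is to convolve a filter of a certain length with the data to be processed using a weighted moving-window average, while fitting a polynomial to the data in each window that minimizes the root mean square error of the fit, thereby discarding some edge points far from the majority of points [27].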
The basic formula of the S-G filter is:

$$Y_j^{*} = \frac{\sum_{i=-m}^{m} C_i Y_{j+i}}{N}$$

where $Y^{*}$ is the fitting value, $Y$ is the original value of the signal, and $C_i$ is the coefficient of the S-G polynomial fitting, indicating the coefficient of the ith filtering from the beginning of the filter. $m$ is the width of the half filtering window, and $N$ is the length of the filter, which is equal to the width of the sliding array, $2m + 1$. Smoothing with the S-G algorithm improves the smoothness of the original data and reduces noise interference.

The other core issue of the S-G algorithm is setting its two parameters, namely the window size and the polynomial fitting order. For a given signal, the choice of these two parameters directly determines the filtering effect. When a low order and a large window are selected, the intensity of the absorption peak diminishes and the absorption line becomes wider, which distorts the signal and makes it difficult to retain the required information. When a high order and a small window are selected, the original information of the signal is better retained, but the filtering effect on noise is reduced. How to choose the S-G filter window size and polynomial fitting order is therefore the key to the filtering effect. In this paper, an improved S-G filter parameter selection method is proposed according to the actual control requirements. The method flow chart is shown in figure 1.
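To make the convolution form of the S-G filter concrete, the following is a minimal sketch in plain Python. It assumes a polynomial order of 2 and uses the known closed-form weights for quadratic S-G smoothing, so each weight here plays the role of $C_i / N$ in the formula above. The function names `sg_quadratic_weights` and `sg_smooth`, and the choice to leave edge points unchanged, are illustrative assumptions and not taken from the paper.

```python
def sg_quadratic_weights(m):
    """Closed-form weights of a quadratic (order-2) Savitzky-Golay smoother
    with half-window m, i.e. window length N = 2*m + 1."""
    if m < 1:
        raise ValueError("half-window m must be at least 1")
    denom = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
    return [3 * (3 * m * m + 3 * m - 1 - 5 * i * i) / denom
            for i in range(-m, m + 1)]


def sg_smooth(y, m):
    """Smooth y with the weights above. The first and last m points lack a
    full window and are copied unchanged (an edge policy chosen for this sketch)."""
    w = sg_quadratic_weights(m)
    out = list(y)
    for j in range(m, len(y) - m):
        out[j] = sum(w[i + m] * y[j + i] for i in range(-m, m + 1))
    return out


# A 5-point window (m = 2) gives the classic weights (-3, 12, 17, 12, -3) / 35.
print(sg_quadratic_weights(2))
# A purely quadratic signal passes through the filter unchanged.
print(sg_smooth([x * x for x in range(7)], 2))
```

Because the filter fits a local quadratic, any signal that is itself quadratic is reproduced exactly, while components that vary faster than the window can follow are attenuated. A larger m therefore smooths more strongly, which is the trade-off the parameter selection method has to balance.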

In figure 1, the mean absolute error (MAE) is used as the parameter selection criterion for the S-G filter. First, the desired filtering effect X is input to the system, and then the system automatically assigns the window size i and the [...]

As can be seen from the graph, the LSTM network contains three gate functions: from left to right, the forget gate, the input gate and the output gate, where $f_t$, $i_t$ and $o_t$ represent the state values of the forget gate, the input gate and the output gate, respectively. The sigmoid layer in the forget gate determines which information in the previous data state $C_{t-1}$ is to be forgotten, while the current input $x_t$ and the output $h_{t-1}$ of the previous step are used as input to the forget gate. The output of the forget gate is:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$

The input gate updates the data state in three steps. First, the input gate selects the information to be remembered using the result of the sigmoid layer, and then a new candidate value is generated by the tanh layer. Finally, a new data state is obtained by adding the part of the old state that survives the forget gate and the part to be remembered. The formulas are as follows:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$

$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

In the output gate, the sigmoid layer determines which part of the data state $C_t$ is output, and then the tanh layer scales the updated data state value to [−1, 1] and multiplies it by the output of the sigmoid layer to obtain the final output $h_t$. The formulas are as follows:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$

$$h_t = o_t * \tanh(C_t)$$

where $\sigma$ is the sigmoid function, and $W$ and $b$ are the weight matrices and bias vectors of the corresponding layers.

The VMD-SG-LSTM power load prediction model is proposed by combining decomposition algorithm, filtering algorithm, data enhancement and neural network techniques. The model mainly consists of a decomposition module, an enhancement module and a prediction module. Its structure is shown in figure 3. It can accurately predict the future short-term power load, and the model structure is described below.

In the decomposition module, the VMD algorithm is used to decompose the original data to obtain IMFs of different frequencies. Using IMF as input can improve the accuracy [...]

[...] Here we propose an improved S-G filter parameter selection method, and the specific process is shown in figure 1.

In the prediction module, the newly generated power load data set is taken as input and brought into the LSTM neural network [...]

In the design of the prediction module, the parameters of the prediction model need to be set to achieve better prediction results, and the specific parameter adjustment process will be introduced in Section IV.
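For reference, one time step of the LSTM cell used by the prediction module can be sketched in plain Python. This is a scalar toy version of the standard forget, input and output gate computations, meant only to show the order of operations. The function `lstm_step` and the parameter dictionary layout (`"f"`, `"i"`, `"c"`, `"o"` mapped to hypothetical weight and bias triples) are assumptions for illustration; a real model would use vector-valued states and a deep learning library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    params maps each layer name ("f", "i", "c", "o") to a hypothetical
    triple (w_h, w_x, b): weight on h_prev, weight on x_t, and bias.
    Returns the new output h_t and the new data state c_t.
    """
    def pre_activation(name):
        w_h, w_x, b = params[name]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre_activation("f"))      # forget gate: how much of c_prev to keep
    i_t = sigmoid(pre_activation("i"))      # input gate: how much new information to admit
    c_hat = math.tanh(pre_activation("c"))  # candidate value from the tanh layer
    c_t = f_t * c_prev + i_t * c_hat        # updated data state
    o_t = sigmoid(pre_activation("o"))      # output gate
    h_t = o_t * math.tanh(c_t)              # state scaled to [-1, 1], then gated
    return h_t, c_t


# Run the cell over a short input sequence with arbitrary illustrative weights.
params = {"f": (0.5, 0.1, 0.0), "i": (0.4, 0.3, 0.0),
          "c": (0.2, 0.8, 0.0), "o": (0.3, 0.2, 0.0)}
h, c = 0.0, 0.0
for x in [0.2, 0.5, 0.9]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The data state `c_t` is what carries information across many steps, since it is only rescaled by the forget gate rather than passed through a squashing function at every step.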

In this experiment, it is also necessary to evaluate the prediction ability of the VMD-SG-LSTM model, so that the pros and cons of each model can be seen by comparison. In this paper, two evaluation methods are selected, namely Mean Absolute Scaled Error (MASE) and Mean Absolute Percentage Error (MAPE) [30]. MASE is an evaluation method based on MAE: the greater the error, the greater its value. The MASE formula is as follows:

$$\mathrm{MASE} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|}{\frac{1}{n-1}\sum_{i=2}^{n}\left|y_i - y_{i-1}\right|}$$

where $y_i$ is the real value at the current moment, $y_{i-1}$ is the real value at the previous moment, $\hat{y}_i$ is the predicted value at the current moment, and $n$ is the number of samples.

MAPE considers not only the error between the predicted value and the actual value, but also the ratio between the error and the actual value. It is generally believed that MAPE measures the accuracy of the prediction. The smaller the MAPE value, the higher the prediction accuracy. The formula is as follows:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
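The two metrics can be sketched in a few lines of plain Python. This assumes the standard definitions: MASE divides the model's MAE by the MAE of a naive forecast that repeats the previous real value, and MAPE averages the absolute error relative to the real value, expressed in percent. The function names are illustrative.

```python
def mase(y_true, y_pred):
    """Mean Absolute Scaled Error against the naive previous-value forecast."""
    n = len(y_true)
    if n < 2 or n != len(y_pred):
        raise ValueError("need two equally long series with at least 2 points")
    mae_model = sum(abs(a - p) for a, p in zip(y_true, y_pred)) / n
    mae_naive = sum(abs(y_true[i] - y_true[i - 1]) for i in range(1, n)) / (n - 1)
    return mae_model / mae_naive  # undefined if the real series is constant


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent; real values must be nonzero."""
    n = len(y_true)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y_true, y_pred)) / n


real = [10.0, 12.0, 11.0, 13.0]
pred = [11.0, 12.0, 10.0, 13.0]
print(mase(real, pred), mape(real, pred))
```

A MASE below 1 means the model beats the naive previous-value forecast on average, which makes it a convenient scale-free companion to MAPE.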

A. DATA PROCESSING

In order to verify the feasibility and effectiveness of the proposed power load prediction model, this experiment uses the power load data of the Belgian power grid company Elia for forecasting and analysis [31]. It includes the measured net generation from local power stations that inject power into the Elia grid, the net inflows from the distribution grid to the Elia grid and the net import at the borders. The dataset records the power load every 15 minutes from January 1, 2020 to December 31, 2021, giving a total of 70176 records, of which three are missing and were filled using the mean value method. The power load of grid company Elia is shown in figure 4.

Figure 4 can be analyzed in terms of power load trend and power load size. In terms of trend, Elia's power load shows a cyclical characteristic. In terms of size, the minimum power load of Elia is less than 0.4 MW and the maximum power load is nearly 1.4 MW, so its variation range and magnitude are particularly obvious. At the same time, many abnormal values can be found in the diagram. From this analysis we judge that Elia's power load is nonlinear and unstable, which will have a certain impact on power load prediction, but this variability also allows the performance of the prediction model to be tested more comprehensively.

[...] reflect the CTRD more clearly and retain the important information of the raw data to the maximum extent, we compare different MAE parameters with the noise-reduced original data, as shown in figure 9.

It can be seen that when MAE is 0.02, the important information of the noise reduction data can be well retained, but the performance in removing the jagged fluctuation is not good, and there are still obvious fluctuations. When MAE is 0.04, the jagged fluctuations are completely eliminated, but some important information of the noise reduction data is also lost. Therefore, in order to better reflect the CTRD and maximize the retention of important information from the raw data, the evaluation index of MAE is finally set to 0.03. At this point, using the improved S-G parameter selection method, the window size and the polynomial fitting order can be quickly calculated to be 25 and 2, respectively. Then, each parameter is brought into the S-G filter for smoothing, and finally the change trend of the raw data is obtained. It can be seen that the jagged fluctuations after noise reduction have been significantly reduced. At the same time, the important information of the raw data is retained to the maximum extent, and the change trend of the original data is reflected more clearly. The decomposed IMF components and CTRD are added to the raw power load data to obtain a new power load data set. The new power load data set is shown in Table 1.
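The idea of letting a target MAE determine the S-G window could be sketched as follows. This is only one plausible reading and not the paper's procedure, whose flow chart is not reproduced here: the polynomial order is fixed at 2, each candidate window is tried in turn, and the window whose MAE between the smoothed and original series lies closest to the target is kept. The names `pick_half_window`, `target_mae` and `max_m` are hypothetical, and the series is synthetic.

```python
import math


def sg_smooth(y, m):
    """Quadratic Savitzky-Golay smoothing with half-window m (window 2*m + 1).
    Edge points that lack a full window are copied unchanged."""
    denom = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
    w = [3 * (3 * m * m + 3 * m - 1 - 5 * i * i) / denom for i in range(-m, m + 1)]
    out = list(y)
    for j in range(m, len(y) - m):
        out[j] = sum(w[i + m] * y[j + i] for i in range(-m, m + 1))
    return out


def mae(a, b):
    """Mean absolute error between two equally long series."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def pick_half_window(y, target_mae, max_m):
    """Return the half-window whose smoothing MAE is closest to target_mae.
    Assumed selection rule for this sketch; ties go to the smaller window."""
    best_m, best_gap = None, float("inf")
    for m in range(1, max_m + 1):
        if 2 * m + 1 > len(y):
            break
        gap = abs(mae(y, sg_smooth(y, m)) - target_mae)
        if gap < best_gap:
            best_m, best_gap = m, gap
    return best_m


# Synthetic series: a slow sine plus an alternating "jagged" component.
series = [math.sin(0.2 * k) + 0.3 * (-1) ** k for k in range(200)]
m = pick_half_window(series, target_mae=0.3, max_m=12)
print("chosen window size:", 2 * m + 1)
```

A small target MAE keeps the smoothed curve close to the input and so favors small windows, while a larger target permits heavier smoothing. This mirrors the trade-off between retaining information and removing jagged fluctuations discussed above.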

Once the new power load data set has been constructed, it is brought into the LSTM model for short-term power load prediction. In the design of LSTM [...] Table 2 and 3.

From Table 2 and Table 3, it can be seen that the [...] it is necessary to generate samples for the data set. When the length of each training sample is set to 672 (7 days of 15-minute readings) and the step length of the sliding window is set to 1, a data set of size [69503, 672, 6] is obtained. The new power load data set is divided into a training set and a test set in a ratio of 90% to 10%. At the same time, the Min-Max function is used to normalize the data, which accelerates gradient descent toward the optimal solution [32]. The normalization formula is as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ is the original value, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the data, and $x'$ is the normalized value.
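The sample generation, split and normalization steps can be sketched in plain Python. This is a minimal illustration on toy data, not the paper's pipeline: the helper names are hypothetical, the target is assumed to be the first feature of the row that follows each window, and the exact sample count in the paper depends on its forecast horizon, which this sketch does not try to reproduce.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1] using their minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series cannot be min-max scaled")
    return [(v - lo) / (hi - lo) for v in values]


def make_windows(rows, length, step=1):
    """Slide a window of `length` rows over `rows`.

    Each input is `length` consecutive feature rows. The target (assumed
    here) is the first feature of the row immediately after the window.
    """
    inputs, targets = [], []
    for start in range(0, len(rows) - length, step):
        inputs.append(rows[start:start + length])
        targets.append(rows[start + length][0])
    return inputs, targets


def split_train_test(inputs, targets, train_ratio=0.9):
    """Chronological split: the first part trains, the remainder tests."""
    cut = int(len(inputs) * train_ratio)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


raw = [float(k % 5) for k in range(20)]   # toy load series
scaled = min_max_scale(raw)
rows = [[v, v * v] for v in scaled]       # two toy feature columns
X, y = make_windows(rows, length=4)
(train_X, train_y), (test_X, test_y) = split_train_test(X, y)
print(len(X), len(train_X), len(test_X))
```

With the paper's settings, `length` would be 672 and each row would hold 6 features, giving inputs of shape [samples, 672, 6]. Keeping the split chronological avoids training on data that comes after the test period.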

It can be seen from figure 11 that the prediction performance of LSTM is poor, and there is obvious oscillation in the process of power load prediction, which increases the instability of prediction and leads to large deviation from the actual value. Compared with the LSTM model, the prediction accuracy of the VMD-LSTM model is higher, because the VMD algorithm can extract the characteristic information of the original signal and so further improve the prediction accuracy. The VMD-SG-LSTM model proposed in this paper adds the S-G algorithm on the basis of the original prediction model, which brings the prediction closer to the real value curve in the peak part. At the same time, it also makes the prediction curve smoother, and has a certain fitting effect in predicting some values with large time span.

Secondly, in the comparison of the latest prediction models, this paper compares VMD-SG-LSTM with the VMD-Bi-LSTM prediction model proposed by Tang et al. [33], the VMD-CISSA-LSSVM prediction model proposed by Wang et al. [34] and the VMD-GWO-SVR prediction model proposed by Zhou et al. [35]. Each parameter is set according to the value in the corresponding article to predict the power load in the next hour. The prediction results are shown in figure 12.

It can be seen from figure 12 that the four models have high prediction accuracy, but careful observation shows that [...]

Table 4 records the performance evaluation of the four prediction models for the next hour. Compared with VMD-Bi-LSTM, VMD-CISSA-LSSVM and VMD-GWO-SVR, the VMD-SG-LSTM model reduces MASE by 0.057, 0.071 and 0.076, and MAPE by 0.067, 0.084 and 0.087, respectively. In summary, the prediction performance of the VMD-SG-LSTM model is better than that of the other three models.

[...]

(4) Compared with the other three latest prediction models, the proposed VMD-SG-LSTM model has higher prediction accuracy and stronger prediction performance.

However, the proposed method has certain limitations due to the need to process the current and past power load data in real time in the test set before making predictions. In future work, it is planned to combine some improved decomposition methods with the latest prediction models to further improve the accuracy of short-term power load prediction. In addition, the influence of different input parameters on the prediction model will be carefully studied to further improve the prediction performance of the model.