Two-Stage Prediction Model of the Monthly Sunspot Number Based on CEEMDAN and a Particle-Swarm-Optimized ELM

The sunspot number is the most basic parameter describing the level of solar activity. Accurate prediction of the sunspot number can indicate, in advance, the future level of electromagnetic disturbance in the magnetosphere, the ionosphere, and the middle and upper atmosphere, and thus provides important reference information for navigation, positioning, communication, and the prediction of orbital decay of LEO satellites. Aiming at the non-stationary, chaotic, and hard-to-predict characteristics of the sunspot time series, this paper proposes a two-stage combined prediction model based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), particle swarm optimization (PSO), and the extreme learning machine (ELM) network. In the first stage, the original monthly mean sunspot series is decomposed by CEEMDAN to reduce the non-linearity and non-stationarity of the original series. Then, an ELM prediction model is established for each sub-sequence obtained from the CEEMDAN decomposition, and the input-layer dimension, hidden-layer dimension, input-layer weights, and hidden-layer biases of the ELM are optimized by the PSO algorithm. Finally, the first-stage prediction is obtained by superimposing the prediction results of the sub-sequences. The second stage is an error self-correction stage. First, the prediction error sequence of the first stage is obtained. Then, the CEEMDAN-PSO-ELM prediction model is used to predict, and thereby self-correct, the first-stage prediction error. Finally, the first-stage predictions and the second-stage corrections are superimposed to obtain the final predicted value of the monthly sunspot number.
In this paper, CEEMDAN is used to reduce the non-linearity and non-stationarity of the sunspot series, PSO is used to determine the best parameters of the ELM network, and the useful information in the prediction error is fully exploited, which effectively improves the prediction accuracy for the monthly mean sunspot series. Prediction experiments are carried out on the measured monthly mean sunspot series, and the proposed model is compared with the wavelet neural network (WNN), the back-propagation neural network (BPNN), ELM, CEEMDAN-ELM, and CEEMDAN-PSO-ELM. The experimental results show that the proposed two-stage prediction model achieves significantly improved prediction accuracy and better prediction stability.

the decomposed data, and uses the combined optimization method based on simulated annealing and particle swarm optimization (SA-PSO) to optimize the hyper-parameters of LSSVM. The experimental results of the above hybrid models show that hybrid prediction models achieve higher prediction accuracy than single prediction models.

In addition, most research on prediction models only carries out simple error correction, ignoring the importance of the prediction error, so the useful information it contains cannot be fully exploited. However, current research shows that considering the error factor can significantly improve prediction performance. For example, [20] shows that the accuracy of a prediction model with error correction is significantly better than that of the same model without it. Reference [21] proposed a hybrid prediction model with error correction; its results show that the error-corrected model has better prediction ability than models without error correction. The results of [20], [22], [23], and [24] also show that error correction can improve prediction accuracy. Based on the results of the above literature, this paper considers the error factor in constructing the prediction model of the monthly mean sunspot series, and further improves the prediction accuracy through multi-scale error correction.

Therefore, this paper proposes a two-stage prediction model based on multi-scale decomposition, a swarm-intelligence optimization algorithm, and multi-scale correction of the error sequence, which addresses the problems described above.
Specifically, in the first stage of prediction, to address the non-linearity of the monthly mean sunspot time series, the CEEMDAN method is used to decompose the original data into a series of intrinsic mode functions (IMFs), and an ELM model optimized by particle swarm optimization (PSO-ELM) is then used to predict each intrinsic mode component. In the second stage of prediction, an error-correction prediction method based on multi-scale PSO-ELM is constructed to correct the prediction error of the first stage. Finally, the prediction results of all IMFs in the first stage are combined with the error prediction results of the second stage to obtain the final predicted value. Experimental results show that the proposed two-stage prediction model performs well in predicting the monthly mean sunspot number.

The main innovation and contributions of this paper are summarized as follows:

(1) This paper proposes a new two-stage prediction framework that improves the prediction accuracy of the monthly mean sunspot number. The effectiveness of the proposed two-stage prediction model is verified on several datasets, and its prediction results are compared and analyzed against those of five other classical prediction models.

(2) The proposed sunspot prediction model takes the error factor into account and addresses a major limitation of previous models, which improve the prediction of the monthly mean sunspot series without making full use of the useful information contained in the prediction error.

Step 1: Add I realizations of white noise w_i(n) to the original signal x(n) to obtain the noise-added signals

x_i(n) = x(n) + γ_0 w_i(n),  i = 1, 2, ..., I

where γ_0 is the standard deviation of the noise.

Step 2: In the first stage, the EMD method is used to decompose each noise-added signal, and the first intrinsic mode function (IMF) is obtained as the average

C_1(n) = (1/I) Σ_{i=1}^{I} E_1(x(n) + γ_0 w_i(n))

(An IMF is a function that satisfies the following two conditions: 1) the number of extreme points is equal to, or differs by at most one from, the number of zero crossings; 2) the mean of the envelope defined by the local maxima and the envelope defined by the local minima is zero.)

The first-stage residual signal is expressed as

r_1(n) = x(n) − C_1(n)

Step 3: Define E_k(·) as the operator that extracts the k-th IMF component from a signal by EMD decomposition. By decomposing the sequence r_1(n) + γ_1 E_1(w_i(n)), the IMF component of the second stage can be obtained as

C_2(n) = (1/I) Σ_{i=1}^{I} E_1(r_1(n) + γ_1 E_1(w_i(n)))

The second residual component is

r_2(n) = r_1(n) − C_2(n)    (5)

Step 4: By analogy, the k-th residual component is

r_k(n) = r_{k−1}(n) − C_k(n)

and the (k+1)-th IMF component is

C_{k+1}(n) = (1/I) Σ_{i=1}^{I} E_1(r_k(n) + γ_k E_k(w_i(n)))

Step 5: Repeat Step 4 until the residual component can no longer be decomposed by EMD or the iteration ends. Finally, all K intrinsic mode functions ({imf_k}, 1 ≤ k ≤ K) of CEEMDAN are obtained, and the target data sequence is decomposed as

x(n) = Σ_{k=1}^{K} imf_k(n) + R(n)

where R(n) is the residual.
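To make the recursion of Steps 1-5 concrete, the outer averaging loop of CEEMDAN can be sketched in NumPy. This is a hedged structural sketch only: the first-IMF extractor E_1(·), which in a real implementation is EMD sifting with spline envelopes, is replaced here by a hypothetical moving-average detail filter, and the noise scaling γ_k is held constant.

```python
import numpy as np

def ceemdan_sketch(x, first_imf, K=3, I=50, gamma=0.05, seed=0):
    """Outer loop of CEEMDAN (Steps 1-5). `first_imf(s)` stands in for
    E_1(.), the first-IMF extractor of a real EMD implementation."""
    rng = np.random.default_rng(seed)
    noise = [rng.standard_normal(len(x)) for _ in range(I)]
    # Steps 1-2: average the first IMF over I noise-perturbed copies of x
    C = [np.mean([first_imf(x + gamma * w) for w in noise], axis=0)]
    r = x - C[0]                                  # first residual r_1(n)
    for _ in range(1, K):
        # Steps 3-4: perturb the current residual with an IMF of the noise
        # (simplified: first_imf(w) is used in place of E_k(w) for all k)
        C.append(np.mean([first_imf(r + gamma * first_imf(w))
                          for w in noise], axis=0))
        r = r - C[-1]                             # next residual
    return C, r                                   # IMFs {C_k} and R(n)

def toy_first_imf(s):
    """Stand-in detail extractor (NOT real EMD sifting): the part of s
    lying above a short moving-average trend."""
    return s - np.convolve(s, np.ones(5) / 5.0, mode="same")

t = np.linspace(0, 1, 256)
x = np.sin(2 * np.pi * 30 * t) + np.sin(2 * np.pi * 3 * t)
imfs, residual = ceemdan_sketch(x, toy_first_imf)
# By construction the IMFs plus the residual sum back to the original series
print(bool(np.allclose(sum(imfs) + residual, x)))  # True
```

The exact reconstruction check at the end mirrors the defining "complete" property of CEEMDAN: the sum of all extracted components and the final residual recovers the input.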

For N training samples, the output of a single-hidden-layer feed-forward neural network with L hidden nodes and activation function G(x) can be written as

Σ_{i=1}^{L} β_i G(w_i · x_j + b_i) = o_j,  j = 1, 2, ..., N    (10)

where w_i is the weight vector connecting the input layer to the i-th hidden-layer node, b_i is the bias of the i-th hidden node, and β_i is the weight connecting the i-th hidden node to the output layer. Converting Equation (10) into matrix form, we can get

Hβ = T    (11)

where H is the hidden-layer output matrix of the network.

In the ELM algorithm, the input weights and hidden-layer biases can be assigned randomly, without adjustment during the training process, so the hidden-layer matrix H is determined before training. The training of the feed-forward neural network is thus transformed into the problem of solving the least-squares solution β̂ of the output weight matrix:

β̂ = H⁺T    (12)

where H⁺ is the generalized inverse of the matrix H.

According to Equations (10) to (12), the output weight matrix is determined by the input weight matrix and the hidden-layer biases. Since ELM assigns the initial input weight matrix and hidden-layer biases randomly, some of them may be zero, rendering some hidden-layer nodes ineffective. Therefore, in practical applications, the accuracy and training time of ELM are affected by this randomness.
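Given the equations above, ELM training reduces to one random initialization and one least-squares solve. A minimal NumPy sketch follows; the tanh activation and the toy sine-fitting task are illustrative choices, not the paper's configuration:

```python
import numpy as np

def elm_train(X, T, L, seed=0):
    """Basic ELM training: random input weights W and biases b are fixed,
    and the output weights are the least-squares solution of Eq. (12),
    beta = H^+ T, with H the hidden-layer output matrix."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, (X.shape[1], L))   # input-to-hidden weights
    b = rng.uniform(-1, 1, L)                 # hidden-layer biases
    H = np.tanh(X @ W + b)                    # hidden-layer output matrix H
    beta = np.linalg.pinv(H) @ T              # beta = H^+ T  (Eq. 12)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy regression: fit a sine with 40 hidden nodes
X = np.linspace(-1, 1, 200)[:, None]
T = np.sin(3 * X[:, 0])
W, b, beta = elm_train(X, T, L=40)
mse = np.mean((elm_predict(X, W, b, beta) - T) ** 2)
print(bool(mse < 1e-2))  # True: the random-feature least-squares fit is accurate
```

Because `W` and `b` are never trained, the only fitted quantity is `beta`, which is why ELM training is a single pseudo-inverse computation rather than an iterative optimization.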
The particle swarm optimization (PSO) algorithm was first proposed by Kennedy and Eberhart [29] in 1995. The PSO algorithm originated from the study of the predatory behavior of bird flocks: when birds forage, the simplest and most effective way for each bird to find food is to search the area around the bird currently closest to the food.

Suppose that in a D-dimensional search space there is a population of n particles X = (X_1, X_2, ..., X_n). The i-th particle is represented by a D-dimensional vector X_i = [X_i1, X_i2, ..., X_iD]^T, which gives the position of the i-th particle in the search space and also represents a potential solution of the problem. According to the objective function, the fitness value corresponding to each particle position X_i can be calculated. The velocity of the i-th particle is V_i = [V_i1, V_i2, ..., V_iD]^T, its individual best position is P_i = [P_i1, P_i2, ..., P_iD]^T, and the global best position of the population is P_g = [P_g1, P_g2, ..., P_gD]^T.

In each iteration, each particle updates its velocity and position through the individual best and the global best. The update formulas are as follows:

V_id^(k+1) = ω V_id^k + c_1 r_1 (P_id^k − X_id^k) + c_2 r_2 (P_gd^k − X_id^k)    (13)

X_id^(k+1) = X_id^k + V_id^(k+1)    (14)

In Equations (13) and (14), ω is the inertia weight; X_id ∈ X_i is the d-th element of the i-th particle, d = 1, 2, ..., D, i = 1, 2, ..., n; k is the current iteration number; V_id is the velocity of the particle; c_1 and c_2 are non-negative constants called acceleration factors; and r_1 and r_2 are random numbers uniformly distributed in [0, 1]. The PSO algorithm is used to optimize the ELM parameters through the following steps.
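Before the ELM-specific steps, the bare update rules of Equations (13) and (14) can be illustrated on a toy sphere function. The parameter values below (ω = 0.7, c_1 = c_2 = 1.5) are generic stability-friendly defaults for a sketch, not necessarily the settings used in the paper:

```python
import numpy as np

def pso_minimize(f, dim, n=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard PSO with the velocity/position updates of Eqs. (13)-(14)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, (n, dim))              # positions X_i
    V = np.zeros((n, dim))                        # velocities V_i
    P = X.copy()                                  # individual bests P_i
    p_fit = np.array([f(x) for x in X])
    g = P[np.argmin(p_fit)].copy()                # global best P_g
    for _ in range(iters):
        r1 = rng.random((n, dim))
        r2 = rng.random((n, dim))
        V = w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)   # Eq. (13)
        X = X + V                                           # Eq. (14)
        fit = np.array([f(x) for x in X])
        improved = fit < p_fit
        P[improved] = X[improved]
        p_fit[improved] = fit[improved]
        g = P[np.argmin(p_fit)].copy()
    return g, float(p_fit.min())

best, val = pso_minimize(lambda x: float(np.sum(x ** 2)), dim=2)
print(bool(val < 1e-3))  # the swarm converges near the sphere-function minimum
```

Each particle is pulled toward its own best position and the swarm's best position, with the inertia term ω V carrying momentum between iterations; this is the same mechanism later used to search the ELM parameter space.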

Step 1: Load the data and divide it into an 80% training set and a 20% test set.

Step 2: Initialize the particle population and set the relevant parameters. The particle velocities and positions are then updated iteratively according to Equations (13) and (14). When the maximum number of iterations or the best fitness is reached, the optimization iteration process is stopped.

Step 5: The optimal input-layer dimension, hidden-layer dimension, input-layer weights, and hidden-layer biases found by the PSO search are used to construct the final ELM prediction model.

The two-stage prediction process based on CEEMDAN-PSO-ELM is described below.
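The PSO-ELM steps above can be sketched by wrapping an ELM fitness inside a small PSO loop. For brevity this hypothetical sketch searches only the hidden-layer dimension on a toy series, whereas the paper also optimizes the input dimension, input weights, and biases:

```python
import numpy as np

def elm_rmse(L, x_tr, t_tr, x_va, t_va):
    """Fitness of one PSO particle: validation RMSE of an ELM whose
    hidden-layer dimension is round(L). The ELM weights are seeded so
    the fitness is deterministic in L."""
    L = max(2, int(round(L)))
    rng = np.random.default_rng(42)
    W = rng.uniform(-1, 1, (x_tr.shape[1], L))
    b = rng.uniform(-1, 1, L)
    beta = np.linalg.pinv(np.tanh(x_tr @ W + b)) @ t_tr
    pred = np.tanh(x_va @ W + b) @ beta
    return float(np.sqrt(np.mean((pred - t_va) ** 2)))

# Step 1: an 80%/20% split of a toy series stands in for the sunspot data
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 250)[:, None]
t = np.sin(4 * x[:, 0])
idx = rng.permutation(len(x))
tr, va = idx[:200], idx[200:]
args = (x[tr], t[tr], x[va], t[va])

# Steps 2-5: a small 1-D PSO searches the hidden-layer dimension in [2, 60]
n, iters, w, c1, c2 = 10, 30, 0.7, 1.5, 1.5
pos = rng.uniform(2, 60, n)
vel = np.zeros(n)
pbest = pos.copy()
pfit = np.array([elm_rmse(p, *args) for p in pos])
g = pbest[np.argmin(pfit)]
for _ in range(iters):
    r1, r2 = rng.random(n), rng.random(n)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)  # Eq. (13)
    pos = np.clip(pos + vel, 2, 60)                                # Eq. (14)
    fit = np.array([elm_rmse(p, *args) for p in pos])
    better = fit < pfit
    pbest[better], pfit[better] = pos[better], fit[better]
    g = pbest[np.argmin(pfit)]
print(bool(pfit.min() < 0.1))  # the tuned ELM reaches a small validation RMSE
```

Treating the hidden-layer dimension as a continuous PSO coordinate and rounding inside the fitness function is a common trick for letting a continuous optimizer search an integer hyper-parameter.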

FIGURE 2. Flow of the two-stage prediction based on CEEMDAN-PSO-ELM.
y = (x − x_min) / (x_max − x_min)

where x is the input data, x_max and x_min are the maximum and minimum of x, respectively, and y is the normalized result of the input data.

Step 3: Feed each normalized sub-sequence into the PSO-ELM model for prediction, and de-normalize the output to obtain the prediction results Y_1, Y_2, ..., Y_{K+1}.

Step 4: Sum Y_1, Y_2, ..., Y_{K+1} to obtain the first-stage prediction result Y_sum. Subtract Y_sum from the actual values to get the error sequence E.

Step 5: Use the CEEMDAN method to decompose the error sequence E to obtain C^E_1, C^E_2, ..., C^E_M and R^E.

Step 7: Feed each normalized sub-sequence into the PSO-ELM model for prediction to obtain the prediction result of each error component.

Step 8: Sum these prediction results to obtain the second-stage prediction result Y^E_sum.

Step 9: Sum the prediction result of the first stage, Y_sum, and the prediction result of the second stage, Y^E_sum, to obtain the final prediction value, that is, Ỹ = Y_sum + Y^E_sum.
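The flow of Steps 1-9 can be sketched with placeholder components; `decompose` and `sub_predict` below are hypothetical stand-ins for CEEMDAN and the trained PSO-ELM sub-models, chosen only to make the arithmetic of the two stages visible:

```python
import numpy as np

def two_stage_predict(actual, decompose, sub_predict):
    """Steps 1-9 in miniature: predict the sub-sequences, then predict
    the first-stage error sequence, and sum both stages."""
    Y_sum = sum(sub_predict(s) for s in decompose(actual))   # Steps 1-4
    E = actual - Y_sum                                       # first-stage error
    YE_sum = sum(sub_predict(s) for s in decompose(E))       # Steps 5-8
    return Y_sum + YE_sum                                    # Step 9

# Hypothetical stand-ins: a "decomposition" into two halves and a
# "sub-model" that recovers 90% of each component
decompose = lambda s: [s / 2.0, s / 2.0]
sub_predict = lambda s: 0.9 * s

actual = np.sin(np.linspace(0.0, 6.0, 100))
stage1 = sum(sub_predict(s) for s in decompose(actual))      # 0.9 * actual
final = two_stage_predict(actual, decompose, sub_predict)    # 0.99 * actual
print(bool(np.allclose(final, 0.99 * actual)))  # True
```

With a stand-in predictor that recovers 90% of every component, the first stage leaves a 10% residual error while the corrected output leaves only 1%; this shrinking of the residual is the mechanism the error self-correction stage relies on.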

The monthly mean sunspot number is taken from the official website of the solar activity data analysis center of the Royal Observatory of Belgium (source: SILSO data, Royal Observatory of Belgium, Brussels). We selected the smoothed monthly sunspot observations from August 1749 to March 2021 as the experimental data; the total length of the data set is 3260. In the first experiment, we selected all 3260 data points as experimental data (recorded as dataset1), in which the length of the training set is 2600 and the length of the test set is 660. In the second group of experiments, we selected the

In Equations (16)∼(19), x̂(t) represents the predicted value, x(t) represents the original data, x̄(t) represents the mean of the original data, and n represents the test set data length.

The PSO parameters are set as follows: the population size to 30, the inertia factor to 0.8, and the learning factor to 2. The optimization results of the PSO algorithm for the input dimension and hidden-layer dimension of the ELM model are shown in Table 2. The fitness curve of the PSO algorithm for optimizing the ELM model of each IMF and R is shown in Figure 4.

After the predicted values of the first stage are obtained, the error sequence E is decomposed by CEEMDAN, with the same parameter settings as in the first stage. The CEEMDAN method decomposes E into 11 IMFs (C^E_1, C^E_2, ..., C^E_11) and a residual component R^E. The decomposition result is shown in Figure 5. The initial parameter settings of the second-stage PSO algorithm are the same as those of the first stage. The optimization results of the PSO algorithm for the input dimension and hidden-layer dimension of the ELM model are shown in Table 3.

The models are difficult to distinguish from the prediction graph alone; it is necessary to further calculate, analyze, and compare the model evaluation indicators.
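Since Equations (16)∼(19) are not reproduced here, the sketch below uses the standard definitions of MAE, RMSE, MAPE, and R² that match the quoted variables; assuming these exact forms is an inference about the paper's criteria, and the sample values are invented for illustration:

```python
import numpy as np

def evaluate(pred, actual):
    """MAE, RMSE, MAPE and R^2 (assumed standard forms of Eqs. (16)-(19))."""
    err = pred - actual
    mae = np.mean(np.abs(err))                    # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))             # root-mean-square error
    mape = np.mean(np.abs(err / actual))          # mean absolute percentage error
    r2 = 1.0 - np.sum(err ** 2) / np.sum((actual - actual.mean()) ** 2)
    return mae, rmse, mape, r2

# Illustrative values only (not the paper's data)
actual = np.array([100.0, 110.0, 120.0, 130.0])
pred = np.array([102.0, 108.0, 121.0, 127.0])
mae, rmse, mape, r2 = evaluate(pred, actual)
print(round(mae, 2), round(r2, 3))  # 2.0 0.964
```

R² close to 1 means the residual sum of squares is small relative to the total variance of the series, which is why it is reported alongside the three error magnitudes in the comparisons below.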

4) PREDICTIVE EFFECT EVALUATION OF DATASET1
Using the aforementioned model evaluation criteria to evaluate the models, the evaluation results are shown in Table 4 and Figure 7. Among the single models, the ELM model is better than the WNN model and the BPNN model in predicting the monthly mean sunspot sequence. CEEMDAN decomposition effectively improves the prediction accuracy of the ELM model, and the PSO algorithm optimizes the parameters of the CEEMDAN-ELM model to further improve the prediction accuracy. The introduction of the two-stage method makes the accuracy of the model higher still. The three error indicators of the model used in this article are all smaller than those of the other forecasting models, and the coefficient of determination is higher than that of the other models: MAE is 0.2141, RMSE is 0.3260, MAPE is 0.0039, and R^2 is 1.

The prediction error sequence of the model proposed in this paper is compared with the error sequences of the other models in Figure 8. It can be seen that the error sequence of the proposed model has a smaller amplitude and is closer to zero than those of the other models. Taking the absolute value of the prediction error sequence gives the absolute prediction error sequence |FE|; the empirical cumulative distribution of the absolute prediction error sequence of each model is shown in Figure 9. Combined with the descriptive statistics of |FE| in Table 5, the mean value of |FE| for the model proposed in this paper is 0.2198 and the variance is 0.2594, which are smaller and more stable than those of the other models. The combined forecasting model adopted in this paper is therefore well suited to forecasting the changing trend of the monthly mean sunspot time series; it is a feasible forecasting method with practical significance.

The prediction process of the two-stage CEEMDAN-PSO-ELM for dataset 2 is as follows:

In the first stage, the monthly mean sunspot time series of dataset 2 is first decomposed by CEEMDAN into seven components: six IMFs (C_1, C_2, ..., C_6) and one residual component. Then, the PSO algorithm is used to optimize the ELM model of each component; the optimization results are shown in Table 6 and Table 7.

The optimized model is used to predict each component, and the evaluation results are shown in Table 8.

It can be seen from Table 8 that CEEMDAN decomposition effectively improves the prediction accuracy of ELM, that PSO further improves the prediction accuracy on the basis of CEEMDAN-ELM, and that the error self-correction of the two-stage prediction gives the model still higher accuracy. The three error indexes of the proposed model are all smaller than those of the other prediction models, and the coefficient of determination is higher: MAE is 0.2701, RMSE is 0.3603, MAPE is 0.0065, and R^2 is 0.9906. Compared with CEEMDAN-ELM, MAE decreased by 38.23%.

FIGURE 11. Prediction error of dataset 2 by different models.

Comparing the absolute values of the prediction errors of the different models, that is, the absolute prediction error sequence |FE|, the descriptive statistics of |FE| are shown in Table 9. It can be seen that the predicted values of the proposed model are closer to the real values.

This article uses a two-stage modeling method. In the first stage, CEEMDAN is used to smooth the monthly mean sunspot sequence, an ELM model is established for each sub-sequence after decomposition, the PSO algorithm is used to optimize the ELM parameters of each sub-model, and the prediction results of the sub-sequences are superimposed. In the second stage, CEEMDAN-PSO-ELM modeling is performed on the residuals obtained in the first stage to obtain the second-stage prediction results. The final prediction result is obtained by summing the prediction results of the first and second stages. Through analysis of the simulation experiments, the following conclusions are drawn:

(1) The model that predicts the monthly mean sunspot number after the sequence is decomposed by CEEMDAN has higher overall prediction accuracy than the direct prediction model. Sequence decomposition can effectively reduce the impact of the non-stationary features of the sequence on the prediction results.

(2) The PSO algorithm has good parameter optimization capability, which can effectively mitigate the influence of the randomness of the ELM model parameters and improve the prediction accuracy of the model.

(3) The two-stage prediction method can further improve the prediction accuracy on the basis of the above models.

The experimental results show that the CEEMDAN-PSO-ELM sunspot forecasting model achieves good forecast accuracy.