Optimal Variational Mode Decomposition and Integrated Extreme Learning Machine for Network Traffic Prediction

Network traffic prediction plays a vital role in effective network management, load evaluation and security warning. The Extreme Learning Machine (ELM) has the advantages of fast convergence and strong generalization ability, and it does not easily fall into local optima. An evolutionary algorithm can be used to optimize the number of its hidden layer nodes. However, most existing evolutionary algorithms have adjustable parameters of their own that depend on subjective experience or prior knowledge, which can affect the prediction accuracy of the model. Given this context, this paper proposes a network traffic prediction mechanism based on optimized Variational Mode Decomposition (VMD) and an integrated ELM. A Scalable Artificial Bee Colony (SABC) algorithm, which has fewer adjustable parameters and can thus guarantee the accuracy and stability of the prediction mechanism, is also proposed. It is used for the optimized selection of the parameters of VMD, Phase Space Reconstruction (PSR) and ELM to achieve higher prediction performance. Finally, we use two recognized benchmarks, the Mackey-Glass and Lorenz chaotic time series, together with actual network traffic data from the WIDE backbone, to verify the validity of the proposed network traffic prediction mechanism.


I. INTRODUCTION
A. BACKGROUND
Network traffic is the amount of data transmitted over a network, and its volume is of great significance to the design of network architecture. With the continuous expansion of the Internet, various network types have emerged in recent years. The steady growth in the number of network users has significantly increased the volume of network traffic data, which in turn brings new challenges to the effective management, load evaluation and security warning of the network. Network traffic is time-related data with chaotic characteristics, so its prediction can be treated as a time series problem [1]. Network traffic prediction uses the traffic data of a past period to predict the data at a specific moment in the future, in order to grasp the development trend of the incoming traffic and improve the management of network load.
The associate editor coordinating the review of this manuscript and approving it for publication was Christian Esposito.
In the early days of computer networks, the amount of data transmitted was small. Hence, the Poisson model and the Markov model could be used effectively to describe network traffic data and to establish prediction models. In the 1990s, Leland et al. [2] found that traffic data exhibit self-similar characteristics, a finding which opened a new chapter in traffic prediction. Some self-similar and linear models were thus used to model and predict network traffic data. Nowadays, with the rapid growth of network scale, network traffic has proved to be a typical nonlinear time series that is time-varying, long-range correlated, self-similar, bursty and chaotic [3], [4]. Traditional linear models are therefore unable to describe network traffic characteristics accurately, which leads to poor prediction performance and low accuracy. Thus, researchers shifted from linear models to nonlinear models based on Neural Network (NN), Chaotic Theory (CT) and Statistical Theory (ST), with nonlinear analysis relying mainly on neural networks. As research progressed, the problems of some neural network models were gradually exposed. These issues include the gradient explosion of the Back Propagation neural network [5], the slow training speed and low generalization ability of the Radial Basis Function neural network [6], the large number of adjustable parameters of the ESN network [7], and so on. Against this background, a type of feedforward neural network, the Extreme Learning Machine (ELM), was proposed by Huang et al. The ELM model has the advantages of fast convergence speed, strong generalization ability, and not easily falling into local optima [8]. Hence, it is not surprising that ELM-based time series prediction models have attracted wide research attention in recent years. Liu et al. [9] proposed an ELM network traffic prediction model based on MapReduce. Tian et al. [10] introduced an improved Artificial Bee Colony algorithm to optimize the number of nodes in the hidden layer of ELM for network traffic prediction. In the same year, the same authors proposed a prediction model based on Empirical Mode Decomposition and ELM [11]. In 2017, Li et al. [12] put forward an OS-ELM prediction model based on down-hole working condition. In the following year, Li et al. proposed a short-term wind speed prediction method based on time series decomposition and an optimized ELM [13].
As network traffic data in today's large networks grow more complex, their randomness and non-stationarity give the actual data strong nonlinearity, which decreases the computational accuracy of the prediction model. Time-series decomposition (TSD) analysis appears to be an effective way to handle this problem [13]. In recent years, TSD methods such as empirical mode decomposition (EMD) [14]-[16], local mean decomposition (LMD) [17] and variational mode decomposition (VMD) [18]-[20] have attracted the attention of many scholars. VMD is a signal processing method that transfers the signal decomposition process adaptively into a variational framework and realizes the decomposition by searching for the optimal solution of a constrained variational model [21]. VMD splits the signal into several modal components such that the sum of the estimated bandwidths of the modes is minimised. VMD is an improved version of EMD with better robustness for data decomposition [22]. Compared with EMD, VMD has a superior denoising property and a better ability to separate tones of similar frequencies [23]. Compared with wavelet decomposition (WD) and EMD, VMD decomposes signals non-recursively by adaptively determining the centre frequency of the processed data, thereby optimally fixing the limit of each subcomponent [24].
Moreover, in our research, Phase Space Reconstruction (PSR) is employed so that the original data sequence reflects the real changes of the system. The dynamic change of any variable in the system reflects the influence of the other variables related to it, and the phase space obtained through reconstruction reflects the change rule of the system state. Thus, after VMD decomposes the original signal sequence into K sub-sequences, PSR is carried out on each sub-sequence before ELM modeling to construct the input and output variables of the ELM model.
In the combination of VMD, PSR and ELM, one crucial problem remains to be solved: the values of several tuning parameters in the model must be set accurately. These are the number of modes and the iterative factor in VMD, the embedding dimension and the delay time in PSR, and the number of hidden layer nodes in ELM. In some applications, the values of these adjustable parameters depend heavily on subjective experience or prior knowledge. When the setting is not reasonable, it negatively affects the optimization of the model parameters and the calculation accuracy and stability of the model. Evolutionary Computation is a research direction of Computational Intelligence that involves combinatorial optimization. Its algorithms are inspired by the natural selection mechanism of ''survival of the fittest'' and the transmission law of genetic information in biological evolution. An evolutionary algorithm simulates this process through program iteration, treats the problem to be solved as the environment, and seeks the optimal solution through natural evolution in a population composed of possible solutions. At present, Particle Swarm Optimization (PSO) [25], the improved Artificial Bee Colony algorithm (IABC) [10], the improved Harmony Search algorithm (IHS) [26], the improved Free Search algorithm (IFS) [27], the improved Gravity Search algorithm (IGSA) [13], the Levy-Cloud-Model Fruit Fly Optimization algorithm (LVCMFOA) [28] and other types of evolutionary algorithms have attracted the attention of researchers. The ABC algorithm is a swarm intelligence optimization algorithm which imitates the honey gathering mechanism of bees. Its principle and implementation are relatively simple, so it has been studied and applied in many fields. The ABC algorithm [29] involves three kinds of bees, namely, lead bees, follow bees and scout bees. Each lead bee corresponds to one food source and searches that food source first. Follow bees select among the dancing areas and visit the food sources. Scout bees conduct random searches for food sources. When a food source is exhausted, its lead bee is transformed into a scout bee.

B. RELATED WORKS
In this paper, to reduce the computational cost of the many repeated VMD decompositions and ELM modeling runs, an improved Artificial Bee Colony algorithm, named the Scalable Artificial Bee Colony (SABC) algorithm, is proposed to ensure the reliable selection of the different tuning parameters. In this regard, this study optimizes the VMD and an integrated ELM to establish the prediction mechanism. The study also utilizes the SABC algorithm to optimize the selection of the various adjustable parameters in VMD, PSR and ELM. The proposed prediction mechanism can be applied to network traffic data prediction to ensure that the prediction results are less affected by adjustable parameters and have higher stability and prediction accuracy.
VOLUME 9, 2021
In recent years, time series prediction based on VMD and ELM has attracted the attention of many scholars. The authors of [30] used a Robust Kernel based Extreme Learning Machine (RKELM) integrated with VMD to predict stock price and movement, but they did not consider the phase space reconstruction of each subset or the optimized selection of two control parameters. The authors of [31] combined VMD, Singular Spectrum Analysis (SSA), a Long Short Term Memory (LSTM) network and ELM to forecast wind speed, but many tuning parameters of VMD and ELM were not optimized. In [23], VMD and a new low rank robust kernel based Extreme Learning Machine (RKELM) were combined to forecast solar irradiation, but the optimized selection of the different tuning parameters was not considered. In [32], a hybrid mode decomposition (HMD) method (comprising VMD, sample entropy (SE) and wavelet packet decomposition (WPD)) and an online sequential outlier robust extreme learning machine (OSORELM) were proposed to predict wind speed, but phase space reconstruction was not used to reflect the characteristics of the real system, and the optimized selection of the different control parameters was not conducted. The authors of [24] proposed a combination of an adaptive regularized extreme learning machine (ARELM) and an improved VMD, and used the Ant Colony Optimization (ACO) algorithm to select the optimal values of some tuning parameters, but they did not consider a synchronous optimization of the control parameters in VMD, PSR and ELM. In [22], VMD and an optimized outlier-robust ELM were used for point and interval forecasting of metal prices, but GWO was only used to optimize several tuning parameters in DFs and ORELM, and a synchronous optimization of the control parameters in VMD, PSR and ELM was not considered.
From the above analysis, the main contributions of our work are summarized as follows:
(1) The synchronous optimization of the VMD, PSR and ELM parameters is used to improve the calculation accuracy and stability of the prediction model.
(2) A Scalable Artificial Bee Colony (SABC) algorithm is designed to improve the convergence accuracy and speed of the algorithm through a new solution generation mechanism and the addition of a fine-tuning disturbance to the new solution.
The rest of this paper is organized as follows. Section II gives a brief overview of VMD, ELM and PSR. In Section III, the main steps of the classical Artificial Bee Colony algorithm are given first, and then the new Scalable Artificial Bee Colony (SABC) algorithm is introduced together with its improvements. Section IV presents the proposed prediction mechanism for network traffic data, in which a synchronous optimization of VMD, ELM and PSR is used. In Section V, the validity of the proposed mechanism is verified using three datasets. Finally, Section VI concludes the paper.

II. NETWORK TRAFFIC PREDICTION MODEL
A. VARIATIONAL MODE DECOMPOSITION METHOD
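According to VMD [21], the constrained variational model is:
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{1}$$
where $u_k$ is the $k$-th modal component, $\omega_k$ is its centre frequency, $f(t)$ is the original signal, $\delta(t)$ is the Dirac function, $*$ denotes convolution, and $\|\cdot\|_2$ denotes the Euclidean norm.
To solve the constrained variational problem above, two parameters are introduced: the quadratic penalty factor $\alpha$ and the Lagrange multiplier $\lambda$. The constrained variational problem is thereby transformed into an unconstrained variational problem. The transformed unconstrained variational model is:
$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{2}$$
The main steps for solving the unconstrained variational model of Equation (2) are as follows:
Step 1: Initialise each modal component, centre frequency and Lagrange multiplier, denoted as $u_k^1$, $\omega_k^1$ and $\lambda^1$. Let the counter n = 0.
Step 2: Transform each variable from the time domain to the frequency domain. For the (n+1)-th count, in the non-negative frequency interval, each of the K modal components $u_k$ is updated as
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2} \tag{3}$$
where $\hat{u}_k(\omega)$, $\hat{f}(\omega)$ and $\hat{\lambda}(\omega)$ are the Fourier transforms of $u_k$, $f(t)$ and $\lambda$, respectively.
Step 3: For the (n+1)-th count, update the centre frequency $\omega_k$ of each modal component as follows:
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k(\omega) \right|^2 d\omega} \tag{4}$$
Step 4: For the (n+1)-th count, update the Lagrange multiplier as follows:
$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau_1 \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right) \tag{5}$$
where $\tau_1$ is the iteration factor.
Step 5: Set the convergence accuracy $\varepsilon > 0$. The iteration termination condition is:
$$\sum_{k=1}^{K} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon \tag{6}$$
If Equation (6) is true, the iteration is stopped and the result is output. Otherwise, return to Step 2 to continue.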
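The five steps above can be sketched in a few lines of NumPy. This is a simplified illustration rather than the implementation used in the paper: it assumes a real-valued, uniformly sampled signal, works on the non-negative half of the spectrum via `rfft`, omits the boundary mirroring found in common VMD implementations, and uses hypothetical defaults (`alpha=2000`, `tau=0`, evenly spread initial centre frequencies). The function name `vmd` and its arguments are illustrative.

```python
import numpy as np

def vmd(signal, K, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD sketch: returns (modes, centre_frequencies)."""
    f = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(f.size)        # non-negative frequencies in cycles/sample
    f_hat = np.fft.rfft(f)

    # Step 1: initialise modes, centre frequencies and the multiplier.
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    omega = (np.arange(K) + 0.5) * 0.5 / K  # assumption: spread evenly over [0, 0.5]
    lam_hat = np.zeros(freqs.size, dtype=complex)

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Step 2: Wiener-filter style update of mode k (Eq. (3)).
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (f_hat - others + lam_hat / 2) / (
                1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Step 3: centre frequency as the power-weighted mean frequency (Eq. (4)).
            power = np.abs(u_hat[k]) ** 2
            omega[k] = (freqs @ power) / (power.sum() + 1e-30)
        # Step 4: dual ascent on the Lagrange multiplier (Eq. (5)).
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        # Step 5: relative change of the modes (Eq. (6)).
        num = np.sum(np.abs(u_hat - u_prev) ** 2, axis=1)
        den = np.sum(np.abs(u_prev) ** 2, axis=1) + 1e-30
        if np.sum(num / den) < tol:
            break

    order = np.argsort(omega)               # report modes from low to high frequency
    modes = np.fft.irfft(u_hat[order], n=f.size, axis=1)
    return modes, omega[order]
```

For a clean two-tone test signal, `vmd(sig, K=2)` would be expected to return one mode per tone; on real traffic data the result depends strongly on `K`, `alpha` and `tau`, which is the motivation for optimizing them.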
The VMD method involves two main controllable parameters: the number of modes K and the iterative factor τ 1 . Their values need to be pre-set before a signal can be decomposed. To reduce the influence of subjective experience and prior knowledge on the set values, an optimization method is used to select them adaptively for the decomposition of different signals. After the signal is decomposed into the optimized number of K modes, the ELM method is then introduced to model each mode.

B. EXTREME LEARNING MACHINE
Extreme Learning Machine (ELM) is a neural network training method designed to address the low learning efficiency and complicated parameter setting of the back propagation algorithm. ELM improves the learning speed of the network and avoids the problems of local minima, iteration number and performance index. Let the training sample set be $\{(x_i, y_i)\}$, $i = 1, 2, \ldots, N$, where N is the number of samples in the sample set. The mathematical model of the ELM can be described by Equation (7):
$$\sum_{j=1}^{L} \beta_j\, g(\omega_j \cdot x_i + b_j) = t_i, \quad i = 1, 2, \ldots, N \tag{7}$$
where $t_i$ is the $i$-th training output, $\beta_j$ is the connection weight between the $j$-th hidden layer neuron and the output neuron, $\omega_j$ is the connection weight between the $j$-th hidden layer neuron and the input neurons, $b_j$ is the bias of the $j$-th hidden layer neuron, L is the number of hidden layer nodes, and $g(\cdot)$ is the activation function. Equation (7) can be expressed in the following matrix form:
$$G\beta = T \tag{8}$$
where G is the hidden layer output matrix with entries $G_{ij} = g(\omega_j \cdot x_i + b_j)$, $\beta = [\beta_1, \beta_2, \cdots, \beta_L]^T$ and $T = [t_1, \cdots, t_N]^T$. The connection weights $\omega$ and biases $b$ are randomly given when the system is initialized and remain unchanged during training.
The purpose of ELM training is to find the optimal output weight matrix $\hat{\beta} = [\beta_1, \beta_2, \cdots, \beta_L]^T$ so that the following formula holds:
$$\left\| G\hat{\beta} - y \right\| = \min_{\beta} \left\| G\beta - y \right\| \tag{9}$$
where $y = [y_1, \cdots, y_N]^T$ and $\|\cdot\|$ denotes the Euclidean norm. By solving Equation (9), the output weight matrix $\hat{\beta}$ is obtained as follows:
$$\hat{\beta} = G^{+} y \tag{10}$$
where $G^{+}$ is the pseudo-inverse of the hidden layer output matrix.
Since $\omega$ and $b$ do not change during model training, the output of the hidden layer is only related to the input $x_i$ and the number of hidden layer nodes L. The output of the ELM model can be described by the following formula:
$$\hat{y}_i = \sum_{j=1}^{L} \hat{\beta}_j\, g(\omega_j \cdot x_i + b_j) \tag{11}$$
The output accuracy of the ELM model is closely related to the number of hidden layer nodes L. ELM modeling is conducted for each of the K modes after VMD decomposition, which requires the number of hidden layer nodes L to be set for K models. Therefore, an optimization method is suggested to select L adaptively in order to minimize the influence of subjective experience and prior knowledge on the set values.
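The training procedure described above reduces to one random projection and one pseudo-inverse. The following is a minimal sketch under stated assumptions: a sigmoid activation, input weights and biases drawn uniformly from [-1, 1], and a single output. The class name `ELM` and its methods are illustrative and not taken from the paper.

```python
import numpy as np

class ELM:
    """Single-hidden-layer ELM with a sigmoid activation (illustrative sketch)."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden                 # L, the number of hidden layer nodes
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden layer output matrix G with G[i, j] = g(w_j . x_i + b_j).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Input weights and biases are drawn once and never trained.
        self.W = self.rng.uniform(-1, 1, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1, 1, size=self.n_hidden)
        # Output weights from the pseudo-inverse, as in Eq. (10).
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        # Model output, as in Eq. (11).
        return self._hidden(X) @ self.beta


# Usage: fit a sine curve with 30 hidden nodes.
X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel()
model = ELM(n_hidden=30).fit(X, y)
train_rmse = float(np.sqrt(np.mean((model.predict(X) - y) ** 2)))
```

Because only `beta` is solved for, the only structural choice left to the user is `n_hidden`, which is why the paper treats L as a parameter to be optimized.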

C. PHASE SPACE RECONSTRUCTION
After VMD decomposition, the original signal sequence is decomposed into K sub-sequences. Before ELM modeling of each sub-sequence, Phase Space Reconstruction (PSR) can be carried out to construct the input and output variables of the ELM model. Let the chaotic time series be $\{x_i, i = 1, 2, \ldots, N\}$, where N is the number of samples in the sequence. Then, the input variables after the reconstruction of the phase space can be expressed in the following form [33]:
$$X_i = [x_i, x_{i+\tau_2}, \ldots, x_{i+(m-1)\tau_2}], \quad i = 1, 2, \ldots, N-(m-1)\tau_2-1 \tag{12}$$
And the training set can be expressed by
$$\{(X_i, y_i)\}, \quad y_i = x_{i+(m-1)\tau_2+1} \tag{13}$$
where m is the embedding dimension and $\tau_2$ is the delay time. ELM modeling can be completed according to Equations (12) and (13). In this paper, one-step prediction is adopted, that is, the value at the next time point is predicted from a period of the data sequence.
m and $\tau_2$ are two critical parameters in phase space reconstruction. With reasonable values, they make the reconstructed phase space close to the original system to a certain extent. In traditional studies, the C-C method [34] and the G-P method [35] are often used to determine their values, but these methods require calculation in advance for a specific time series. In this paper, the VMD method is used to decompose the signal, so the phase space reconstruction parameters of each mode cannot be calculated in advance. Therefore, to allow m and $\tau_2$ to adapt to each mode after VMD decomposition, an optimization method is adopted in the training process.
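The reconstruction with one-step-ahead targets can be sketched as a standard delay embedding. The indexing below (oldest sample first, target one step after the last embedded sample) is an assumption about the layout, and the function name `phase_space_reconstruct` is illustrative.

```python
import numpy as np

def phase_space_reconstruct(x, m, tau):
    """Return (X, y): delay vectors of dimension m with delay tau, and the
    value one step after the last element of each vector."""
    x = np.asarray(x, dtype=float)
    span = (m - 1) * tau                 # distance from first to last embedded sample
    n_vectors = len(x) - span - 1        # leave room for the one-step-ahead target
    if n_vectors <= 0:
        raise ValueError("series too short for this embedding")
    # Column j holds x[i + j * tau] for every start index i.
    X = np.stack([x[j * tau : j * tau + n_vectors] for j in range(m)], axis=1)
    y = x[span + 1 : span + 1 + n_vectors]
    return X, y


# Usage: with m = 3 and tau = 2, the first row is [x0, x2, x4] and its target is x5.
X, y = phase_space_reconstruct(np.arange(10), m=3, tau=2)
```

Larger `m` or `tau` shortens the training set by `(m - 1) * tau + 1` samples, so the search range for both parameters is bounded by the length of the series.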

III. ARTIFICIAL BEE COLONY ALGORITHM
In this study, a swarm intelligence optimization algorithm is used to optimize several critical parameters of the network traffic prediction model. These parameters are the number of modes K , the iterative factor τ 1 , the number of hidden layer nodes L of each ELM model, and the phase space reconstruction parameters m and τ 2 after decomposition. Although intelligent optimization algorithms such as PSO, IABC, IHS, IFS and IGSA can be used, each involves adjustable parameters of its own: in PSO, the learning factors, the elasticity coefficients of the velocity update and population update, the parameter iteration velocity coefficient, and so forth; in IABC, the control parameter for abandoning a food source, the random step size and the location update coefficient for moving towards the optimal solution; in IHS, the number of harmonies, the retention probability of the harmony memory, the probability of memory disturbance, the minimum bandwidth, the maximum bandwidth, etc.; in IFS, the small search step, the search radius, etc.; and in IGSA, the gravity coefficient, the iteration coefficient, the percentage of attraction, etc. These adjustable parameters can reduce the stability and accuracy of the prediction model. Therefore, this study proposes a Scalable Artificial Bee Colony (SABC) algorithm to reduce the uncertainty and poor stability of the prediction model.

A. CLASSICAL ARTIFICIAL BEE COLONY ALGORITHM
The ABC algorithm searches for the food source (the optimal solution) through the cooperation of the three kinds of bees. The search process of the algorithm is as follows:
(1) Randomly generate the initial solutions $x_i = \{x_{i1}, x_{i2}, \ldots, x_{id}\}$, $i = 1, 2, \ldots, M$, where M is the number of honey sources and d is the dimension of the solution. The initial solution is generated as follows:
$$x_{ij} = x_j^{\min} + \mathrm{rand}(0,1)\,(x_j^{\max} - x_j^{\min}) \tag{14}$$
where $x_j^{\max}$ and $x_j^{\min}$ are the upper and lower bounds of the $j$-th dimension and rand(0,1) is a random number in [0, 1].
(2) In the initial stage of the search, each lead bee generates a new solution as follows:
$$v_{ij} = x_{ij} + \mathrm{rands}()\,(x_{ij} - x_{kj}) \tag{15}$$
where $k \in \{1, 2, \ldots, M\}$, $j \in \{1, 2, \ldots, d\}$, $k \neq i$, and rands() is a random number in [−1, 1].
(3) After all the lead bees have completed the search process, they share the solution information with the follow bees in the recruitment area.
The follow bee calculates the selection probability of each solution as follows:
$$P_i = \frac{fit_i}{\sum_{n=1}^{M} fit_n} \tag{16}$$
where $fit_i$ is the fitness function value of the $i$-th solution. Then, a random number is generated in the interval [−1,1]. If $P_i$ is greater than the random number, the follow bee generates a new solution from Equation (15). If the $fit_i$ of the new solution is better than the previous one, the follow bee remembers the new solution and forgets the old solution; otherwise, the old solution is retained.
(4) After all the follow bees have completed the search process, if a solution has not been updated after several cycles, the honey source is abandoned (it is considered to be trapped in a local optimum). When a honey source is abandoned, its corresponding lead bee turns into a scout bee and generates a new honey source from Equation (14).
The Artificial Bee Colony algorithm finds the optimal solution through the above cyclic search. The main steps of the Artificial Bee Colony algorithm are as follows:
Step 1: The initialization phase. Set parameters such as the colony size, the maximum number of iterations and the solution search range, and then generate the initial solutions within the search range according to Equation (14). The number of initial solutions is half of the population.
Step 2: Calculate the fitness function value of each initial solution, then rank the solutions from best to worst according to the results. Take the first 50% as the lead bees and the second 50% as the follow bees.
Step 3: Set the loop conditions and begin the search process.
Step 4: For the lead bees, generate a new solution according to Equation (15), and calculate its fitness value.
Step 5: Evaluate the fitness function value of the new solution. If it is better than the old solution, the lead bee remembers the new solution and forgets the old solution. Otherwise, the old solution is kept.
Step 6: Calculate the selection probability of each solution according to Equation (16). Each follow bee selects a honey source according to the probability P i , produces a new solution according to Equation (15), and calculates its fitness value. If the fitness function value of the new solution is better than that of the old solution, the follow bee remembers the new solution and forgets the old solution; otherwise, the old solution is retained.
Step 7: Judge whether there are any solutions to be abandoned. If there are, the scout bees randomly generate new solutions according to Equation (14) to replace them.
Step 8: Record the optimal solution for each iteration.
Step 9: Determine whether the loop termination condition is met; if so, end the loop and output the optimal solution; otherwise, return to Step 4 to continue the search.
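Steps 1 to 9 can be sketched for a generic minimisation problem as follows. This is a sketch of the classical ABC only, not of SABC. Several details are assumptions that the text does not fix: the fitness is taken as `1 / (1 + cost)` for non-negative costs (a common ABC convention), follow bees pick sources by roulette-wheel sampling on the selection probabilities, one dimension is perturbed per move, and a source is abandoned after `limit` unsuccessful trials. The name `abc_minimize` and its arguments are illustrative.

```python
import numpy as np

def abc_minimize(f, lower, upper, n_sources=10, max_iter=200, limit=20, seed=0):
    """Classical Artificial Bee Colony sketch; returns (best_x, best_cost)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    d = lower.size

    # Steps 1-2: random initial honey sources within the bounds, then evaluate.
    X = lower + rng.random((n_sources, d)) * (upper - lower)
    cost = np.array([f(x) for x in X])
    trials = np.zeros(n_sources, dtype=int)
    best = int(cost.argmin())
    best_x, best_cost = X[best].copy(), float(cost[best])

    def try_neighbour(i):
        # Perturb one dimension relative to a random other source, keep if better.
        k = rng.choice([s for s in range(n_sources) if s != i])
        j = rng.integers(d)
        v = X[i].copy()
        v[j] = np.clip(X[i, j] + rng.uniform(-1, 1) * (X[i, j] - X[k, j]),
                       lower[j], upper[j])
        c = f(v)
        if c < cost[i]:
            X[i], cost[i], trials[i] = v, c, 0
        else:
            trials[i] += 1

    for _ in range(max_iter):
        # Steps 4-5: lead bee phase.
        for i in range(n_sources):
            try_neighbour(i)
        # Step 6: follow bee phase with fitness-proportional selection.
        fit = np.where(cost >= 0, 1.0 / (1.0 + cost), 1.0 + np.abs(cost))
        for i in rng.choice(n_sources, size=n_sources, p=fit / fit.sum()):
            try_neighbour(i)
        # Step 8: memorise the best source before scouts may discard it.
        best = int(cost.argmin())
        if cost[best] < best_cost:
            best_x, best_cost = X[best].copy(), float(cost[best])
        # Step 7: scout phase, replace exhausted sources with random ones.
        for i in np.where(trials > limit)[0]:
            X[i] = lower + rng.random(d) * (upper - lower)
            cost[i] = f(X[i])
            trials[i] = 0
    return best_x, best_cost


# Usage: minimise a 3-dimensional sphere function on [-5, 5]^3.
best_x, best_cost = abc_minimize(lambda x: float(np.sum(x ** 2)),
                                 [-5.0] * 3, [5.0] * 3, max_iter=300)
```

Note that every call to `try_neighbour` costs one evaluation of `f`. When `f` involves a full VMD decomposition and K ELM trainings, this per-move cost dominates the run time.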

B. NEW SCALABLE ARTIFICIAL BEE COLONY ALGORITHM
The search of the Artificial Bee Colony algorithm involves three main processes: initial population evaluation, lead bee population renewal, and scout honey source renewal. When the algorithm is applied to VMD decomposition and ELM prediction, the decomposition and modeling calculations are repeated many times, which makes the computation complex and the optimization slow. Moreover, when the lead bee population is updated, each lead bee has to be updated through several neighboring lead bees, and the decomposition and modeling calculations have to be repeated for every updated solution. Too much emphasis on local search wastes computing resources on overlapping calculations and lengthens the optimization. In this regard, this paper improves the search mechanism of the Artificial Bee Colony algorithm by reducing the number of VMD decompositions and ELM predictions required. The resulting Scalable Artificial Bee Colony (SABC) algorithm is proposed to improve the convergence speed and prediction accuracy.

C. UPDATE MODEL WITH NEW SOLUTIONS
According to the population dimension, each dimension is traversed successively, and an individual lead bee in the lead bee population (the first 50% of the whole population) is randomly selected for the updating every time.
We update the selection formula of the solution as follows: where i represents the i th dimension variable, j represents the j th selected individual, and popsize means the population size. The updated and revised formulae for the new solution are as follows: where neigx i,k is the K th nearest neighbour solution of x i,j , δ is the search step size.

D. FINE-TUNE DISTURBANCES THAT GENERATE NEW SOLUTIONS
After the new solution is generated, falling into local optimum is a concern which should be addressed. Hence, set the probability of fine-tuning disturbance PA, and randomly generate a number in the interval [0,1]. When it is less than PA, a positive fine-tuning disturbance will be generated for the new solution, with the following relationship: where σ is the disturbance step size. When the random number generated in the interval of [0, 1] is greater than or equal to PA, a negative fine-tuning disturbance is generated to arrive at the new solution which has the following relationship: The choice of the PA value of the fine-tuning perturbation probability can be set according to the specific application problem. A larger PA value can accelerate the speed of searching solution space and make the optimization process converge as soon as possible. Smaller PA values focus on local searches, particularly in more detailed searches around the global optimal solution.
Remark: According to equations (17)-(21), the update mechanism of our algorithm can be implemented by the following procedures: 1

IV. PROPOSED PREDICTION MECHANISM
We begin by utilizing the VMD method to decompose the data sequence into K modes, corresponding to K data subset sequences. Then, the data subset in each mode is reconstructed in phase space to obtain data sequences in a high-dimensional space. Finally, the K groups of reconstructed phase space data are fed into the K ELM models as input to obtain the predicted output. During the whole prediction process, the proposed Scalable Artificial Bee Colony (SABC) algorithm is used to optimize the selection of the model parameters: the control parameters K and τ 1 for VMD decomposition, the control parameters m and τ 2 for phase space reconstruction, and the number of hidden layer nodes L in the ELM model. The main calculation steps of the proposed prediction method are as follows:
Step 1: Initialization. Set the population size, the maximum number of iterations, and the range of the control parameters to be optimized.
Step 2: Generate the initial population, in which each individual is a 5-dimensional variable represented as [L, m, τ 2 , K , τ 1 ]. Each individual of the initial population is randomly assigned within its range.
Step 3: VMD decomposition is carried out for the data sequence; phase space reconstruction is carried out for each data subset sequence after decomposition. Then K ELM models are established, and the training samples are used for training.
Step 4: Evaluate each individual in the initial population by the fitness function defined in Equation (22) to determine the optimal individual and the global optimal solution in the initial population. In Equation (22), $N_{data}$ is the number of samples, and $y_i$ and $\hat{y}_i$ are the actual value and predicted value of the $i$-th sample, respectively.
Step 5: Following the main stages of the Scalable Artificial Bee Colony (SABC) algorithm and Equations (17)-(21), update the population using the SABC and obtain the revised new solution.
Step 6: Calculate the fitness function value of the updated new solution according to Equation (22) and compare it with the historical global optimal solution. If it is better than the historical global optimal solution, update the historical global optimal solution; otherwise, keep the historical global optimal solution.
Step 7: If the search termination condition is met, terminate the iteration and output the global optimal solution; otherwise, return to Step 5 to continue the iteration.
The flow chart of the prediction method is shown in Figure 1.
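To make the evaluation in Steps 3 and 4 concrete, the sketch below scores a single individual [L, m, τ2, K, τ1]. Several points are assumptions rather than details taken from the paper: the fitness is taken to be the RMSE on the training samples, the overall prediction is taken to be the sum of the per-mode ELM outputs, the ELM uses a sigmoid activation, and `decompose` is a hypothetical callable `decompose(series, K, tau1)` that returns an array of K modes (for example a VMD routine). All function names are illustrative.

```python
import numpy as np

def embed(x, m, tau):
    """Delay embedding with one-step-ahead targets."""
    span = (m - 1) * tau
    n = len(x) - span - 1
    X = np.stack([x[j * tau : j * tau + n] for j in range(m)], axis=1)
    return X, x[span + 1 : span + 1 + n]

def elm_fit_predict(X, y, n_hidden, rng):
    """Train an ELM on (X, y) and return its fitted outputs on X."""
    W = rng.uniform(-1, 1, (X.shape[1], n_hidden))
    b = rng.uniform(-1, 1, n_hidden)
    G = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    beta = np.linalg.pinv(G) @ y
    return G @ beta

def fitness(individual, series, decompose, seed=0):
    """Training RMSE of the summed per-mode ELM outputs (assumed fitness)."""
    L, m, tau2, K = (int(round(v)) for v in individual[:4])
    tau1 = float(individual[4])
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    modes = decompose(series, K, tau1)        # expected shape (K, N)
    _, target = embed(series, m, tau2)        # targets of the undecomposed series
    prediction = np.zeros_like(target)
    for mode in modes:                        # one ELM per mode
        X, y = embed(mode, m, tau2)
        prediction += elm_fit_predict(X, y, L, rng)
    return float(np.sqrt(np.mean((target - prediction) ** 2)))


# Usage with a stand-in decomposition that splits the series into K equal parts.
series = np.sin(np.linspace(0, 20, 300))
equal_split = lambda s, K, tau1: np.tile(s / K, (K, 1))
value = fitness([10, 3, 1, 2, 0.1], series, equal_split)
```

An optimizer such as ABC or SABC would call `fitness` once per candidate individual, so every evaluation repeats the decomposition and K ELM trainings.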

V. THE SIMULATION VERIFICATION
The validity of the proposed mechanism is verified using three datasets: the Mackey-Glass chaotic time series, the Lorenz chaotic time series and actual network traffic data of the WIDE backbone from the MAWI Working Group. The experiments are conducted in a simulated testbed. Six swarm intelligence optimization algorithms (IGSA [13], IABC [10], IFS [27], IHS [26], LVCMFOA [28] and PSO [25]) are each applied to optimize the modeling of VMD and ELM. Their results are compared with those of our proposed Scalable Artificial Bee Colony (SABC) algorithm. The adjustable parameter settings of the different swarm intelligence algorithms are shown in Table 1.
Different parameter values may have some impact on the performance of an algorithm. To avoid bias, the experiments are repeated with parameter values randomly selected within their ranges. The computer configuration is as follows: ''Windows 7'' system, ''Matlab 2015a'', Intel core i7 4.00GHz CPU, 16GB RAM.

A. MACKEY-GLASS CHAOTIC TIME SERIES
The Mackey-Glass chaotic sequence is described by the following delay differential equation:
$$\frac{dx(t)}{dt} = \frac{0.2\,x(t-\eta)}{1 + x^{10}(t-\eta)} - 0.1\,x(t)$$
When η > 17, the Mackey-Glass sequence has chaotic behavior. In this paper, η = 30 and the initial value x(0) = 0.9 are selected. One thousand data points are generated, as shown in Figure 2; the first 700 are taken as training samples, while the last 300 are taken as test samples. Seven swarm intelligence optimization algorithms (IGSA [13], IABC [10], IFS [27], IHS [26], LVCMFOA [28], PSO [25] and SABC) are each used to optimize the modeling of VMD and ELM. Each model is run repeatedly several times. The best and worst values of the three indicators, Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE), as well as the average training time and test time of each model, are recorded. The average fitness curves of the different models at the training stage are displayed in Figure 3.
As seen in Figure 3, the convergence performance of IABC-VMD-ELM is the worst on the Mackey-Glass chaotic time series. Meanwhile, IFS-VMD-ELM and IGSA-VMD-ELM fall into local optima. The convergence speed of IHS-VMD-ELM is relatively slow. IFS-VMD-ELM, LVCMFOA-VMD-ELM and SABC-VMD-ELM all have good convergence performance, with SABC-VMD-ELM achieving the better convergence accuracy.
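The Mackey-Glass benchmark series used in this subsection can be generated numerically as sketched below. The sketch takes η = 30 and x(0) = 0.9 from the text. The coefficients 0.2 and 0.1 and the exponent 10 are the values commonly used for this benchmark and are assumed here, as are the constant history on [-η, 0], the forward-Euler scheme and the step size `dt`. The paper does not state how the series was integrated.

```python
import numpy as np

def mackey_glass(n_points=1000, eta=30, x0=0.9, a=0.2, b=0.1, c=10, dt=0.1):
    """Integrate dx/dt = a*x(t-eta)/(1 + x(t-eta)**c) - b*x(t) with forward Euler
    and return one sample per unit of time."""
    steps_per_unit = int(round(1 / dt))
    delay = int(round(eta / dt))
    n_steps = n_points * steps_per_unit
    x = np.empty(delay + n_steps + 1)
    x[: delay + 1] = x0                      # assumed constant history on [-eta, 0]
    for i in range(delay, delay + n_steps):
        x_tau = x[i - delay]                 # delayed state x(t - eta)
        x[i + 1] = x[i] + dt * (a * x_tau / (1 + x_tau ** c) - b * x[i])
    return x[delay : delay + n_steps : steps_per_unit]


# Usage: 1000 points, first 700 for training and last 300 for testing.
series = mackey_glass()
train, test = series[:700], series[700:]
```

A smaller `dt` or a higher-order integrator would change the sample values slightly, so results are only comparable between runs that use the same generation settings.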
The average output curves of different models in the test stage are shown in Figure 4.
According to Figure 4, IFS-VMD-ELM, IGSA-VMD-ELM and LVCMFOA-VMD-ELM all have poor generalization ability. However, SABC-VMD-ELM can track the actual output better and has better prediction ability comparatively. The predictive performance metrics (RMSE, MAE, and MAPE) for several models run repeatedly are shown in Table 2.
The results in Table 2 show that the metrics of IGSA-VMD-ELM, IFS-VMD-ELM, LVCMFOA-VMD-ELM and PSO-VMD-ELM vary over a wide range. This indicates that different values of the adjustable parameters in the intelligent optimization algorithms affect the prediction results. The best predictive performance metrics of IHS-VMD-ELM are better than those of IABC-VMD-ELM, yet its overall predictive performance and stability are weaker than those of IABC-VMD-ELM. Compared with the other six models, SABC-VMD-ELM has better best-case prediction performance, overall prediction performance and stability.
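For reference, the three indicators reported in the tables can be computed as sketched below, using their standard definitions. MAPE is expressed in percent here, which is an assumption about the reporting convention, and it is undefined when an actual value is zero.

```python
import numpy as np

def rmse(y, y_hat):
    """Root Mean Squared Error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean Absolute Error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    """Mean Absolute Percentage Error in percent; requires non-zero actual values."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(100.0 * np.mean(np.abs((y - y_hat) / y)))


# Usage on a three-sample toy example.
actual = [1.0, 2.0, 4.0]
predicted = [1.0, 1.0, 5.0]
scores = (rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

Reporting the best and worst value of each indicator over repeated runs, as the tables do, gives a simple picture of how sensitive a model is to the random parameter settings.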
The comparison of the average training time and average test time of the seven models is shown in Figure 5. Figure 5 shows that in comparison with the IABC-VMD-ELM model, SABC-VMD-ELM can significantly reduce the training time. Although SABC-VMD-ELM does not have the least training time among all models, it is acceptable when overall predictive performance and stability are taken into account.

B. LORENZ CHAOTIC TIME SERIES
The Lorenz system is described by three coupled differential equations in x(t), y(t) and z(t) with parameters A, B and C, where A = 10, B = 2 and C = 8/3, and the initial values of x(0), y(0) and z(0) are 12, 2 and 9, respectively. One thousand sets of x(t) component data are generated, as shown in Figure 6. The first 700 groups are taken as training samples, while the last 300 groups are taken as test samples. The seven models are each run several times. For each data set, the best and worst values of RMSE, MAE and MAPE, as well as the model's average training time and test time, are recorded.
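Generating the x(t) component can be sketched in Python as below. This is only a sketch under stated assumptions: it assumes the standard Lorenz form dx/dt = a(y - x), dy/dt = x(b - z) - y, dz/dt = xy - cz, with a, b, c playing the roles of A, B, C in that order; the step size 0.01, the fourth-order Runge-Kutta integrator and the function names are likewise assumptions and are not taken from the paper.

```python
import numpy as np


def lorenz_rhs(state, a, b, c):
    """Right-hand side of the (assumed) standard Lorenz system."""
    x, y, z = state
    return np.array([a * (y - x), x * (b - z) - y, x * y - c * z])


def lorenz_x_series(a, b, c, init=(12.0, 2.0, 9.0), n_points=1000, dt=0.01):
    """Integrate with classical RK4 and return the sampled x(t) component.

    The step size dt and the integrator are assumptions.
    """
    state = np.array(init, dtype=float)
    xs = np.empty(n_points)
    for i in range(n_points):
        xs[i] = state[0]
        k1 = lorenz_rhs(state, a, b, c)
        k2 = lorenz_rhs(state + 0.5 * dt * k1, a, b, c)
        k3 = lorenz_rhs(state + 0.5 * dt * k2, a, b, c)
        k4 = lorenz_rhs(state + dt * k3, a, b, c)
        state = state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return xs
```

Calling `lorenz_x_series(A, B, C)` with the chosen parameter values and slicing the result as `[:700]` and `[700:]` gives the 700/300 training and test split described above.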
The average fitness curves of different models at the training stage are shown in Figure 7. As shown in Figure 7, the convergence performance of IABC-VMD-ELM is the worst for the Lorenz chaotic time series, whereas IGSA-VMD-ELM, LVCMFOA-VMD-ELM, IFS-VMD-ELM and PSO-VMD-ELM fall into local optima in the initial stage. IHS-VMD-ELM has good convergence accuracy, but its convergence speed is slow. In contrast, SABC-VMD-ELM performs well in both convergence speed and convergence accuracy.
The average output curves of different models in the test stage are shown in Figure 8.
It can be seen in Figure 8 that the random value of adjustable parameters will affect the stability of the predicted output of the model when repeated many times. Compared with the other models, SABC-VMD-ELM is the least affected and has certain advantages in prediction accuracy and stability. The predictive performance metrics (RMSE, MAE, and MAPE) for several models run repeatedly are shown in Table 3.
Based on Table 3, it can be concluded that over repeated runs, the predictive performance metrics of IGSA-VMD-ELM and LVCMFOA-VMD-ELM vary over a broad range, indicating that the predictive performance of the two models is greatly affected by the random value of adjustable parameters. IHS-VMD-ELM, IFS-VMD-ELM and PSO-VMD-ELM have good predictive stability, but their prediction accuracy is low. Although IABC-VMD-ELM has the best prediction accuracy, its prediction stability is still poor. Compared with the other models, SABC-VMD-ELM demonstrates better prediction performance and stability on the whole.
The comparison of the average training time and average test time of the seven models is shown in Figure 9. As shown in Figure 9, SABC-VMD-ELM significantly reduces the training time compared with the IABC-VMD-ELM model. Moreover, SABC-VMD-ELM has an advantage over IABC-VMD-ELM when the overall prediction performance, stability and training time are considered together.

C. WIDE BACKBONE NETWORK TRAFFIC DATA
The WIDE backbone network traffic data are taken from the network traffic data sets that the MAWI Working Group records from Internet service as part of its routine work (http://mawi.wide.ad.jp/mawi/). From July 1 to July 21, 2018, 480 data points in the data set were collected, processed and used for the simulation experiment. The sampling period of the data was 1 hour, and the data are recorded as the "Hour" data set [28]. The 480 data points are shown in Figure 10; the first 430 data points are used as training samples, and the last 50 data points are used as test samples.
The seven models are each run several times. For each data set, the best and worst values of RMSE, MAE and MAPE, as well as the model's average training time and test time, are recorded.
The average fitness curves of different models at the training stage are shown in Figure 11.
As shown in Figure 11, for the "Hour" data set, IFS-VMD-ELM, LVCMFOA-VMD-ELM and PSO-VMD-ELM fall into local optima at the initial stage and converge poorly. Although IHS-VMD-ELM gradually converges, it does not achieve the best convergence accuracy. The convergence precision of IABC-VMD-ELM is better than that of IHS-VMD-ELM, but its convergence speed is slow. Compared with the other models, SABC-VMD-ELM performs better in both convergence speed and convergence precision.
The average output curves of different models in the test stage are shown in Figure 12. Figure 12 shows that IGSA-VMD-ELM, IFS-VMD-ELM, IHS-VMD-ELM, LVCMFOA-VMD-ELM and PSO-VMD-ELM have low generalization ability when the adjustable parameters are randomly taken in this actual network traffic data. IABC-VMD-ELM and SABC-VMD-ELM can track the actual output better, but the latter has an advantage in prediction accuracy.
The predictive performance metrics (RMSE, MAE, and MAPE) for several models run repeatedly are shown in Table 4.
Based on the results in Table 4 for the actual network traffic data, the random value of adjustable parameters has a great impact on the generalization ability of IGSA-VMD-ELM, IFS-VMD-ELM, IHS-VMD-ELM and LVCMFOA-VMD-ELM, resulting in a large deviation in the predicted output. Although the optimal value of the PSO-VMD-ELM prediction is acceptable, its overall prediction performance is unstable. On the other hand, IABC-VMD-ELM and SABC-VMD-ELM track the actual value well and are less affected by the random value of adjustable parameters. Of these two, SABC-VMD-ELM has better prediction accuracy and stability.
The comparison of the average training time and average test time of the seven models is shown in Figure 13.
As can be seen from Figure 13, SABC-VMD-ELM significantly reduces the training time compared with IABC-VMD-ELM. Its training time is longer than that of the other five models, but it has a clear advantage in test time. In sum, the prediction performance of SABC-VMD-ELM is the best when prediction accuracy, stability and computation time are considered together.

VI. CONCLUSION
This paper proposed a prediction mechanism based on optimal variational mode decomposition and an integrated extreme learning machine to predict network traffic. To reduce the influence of the different scale characteristics of network traffic on prediction accuracy, the variational mode decomposition method is first used to decompose the network traffic data. Then, for the sub-data set corresponding to each mode after decomposition, a predictive sub-model is established using the integrated extreme learning machine. Finally, the outputs of all the sub-models are integrated to obtain the final prediction [36]. Designing the prediction model involves several parameters of VMD, PSR and ELM that need to be optimized. A Scalable Artificial Bee Colony (SABC) algorithm is therefore proposed to search for their optimal values. Compared with other evolutionary algorithms, the proposed SABC algorithm has fewer adjustable parameters. Another advantage is its optimization accuracy and stability. It improves the performance of the network traffic prediction model by optimizing multiple parameters in the prediction model. Simulation experiments are carried out with the Mackey-Glass chaotic time series, the Lorenz chaotic time series and WIDE backbone network traffic data. The results show that the proposed model achieves better accuracy in predicting network traffic.
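The decompose, predict per mode, then combine structure summarized above can be sketched in Python. This is a simplified sketch under loud assumptions and is not the authors' implementation: a trailing moving-average split (`toy_decompose`) stands in for VMD, a fixed delay embedding (`embed`) stands in for the optimized PSR, a basic ELM with a fixed hidden-layer size replaces the SABC-tuned one, and the sub-model outputs are combined by simple summation. All names and parameter values are hypothetical.

```python
import numpy as np


class ELM:
    """Basic ELM: random hidden layer, least-squares output weights."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # Input weights and biases are drawn once and never trained.
        self.W = self.rng.uniform(-1.0, 1.0, (X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, self.n_hidden)
        # Output weights via the Moore-Penrose pseudo-inverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


def embed(series, dim, delay=1):
    """Delay embedding: each row holds dim lagged values, target is the next value."""
    span = (dim - 1) * delay
    n = len(series) - span - 1
    X = np.column_stack([series[i * delay:i * delay + n] for i in range(dim)])
    y = series[span + 1:span + 1 + n]
    return X, y


def toy_decompose(series, window=11):
    """Stand-in for VMD (NOT VMD): trailing moving-average trend plus residual."""
    kernel = np.ones(window) / window
    trend = np.convolve(series, kernel, mode="full")[:len(series)]
    return [trend, series - trend]


def decompose_and_predict(series, n_train, dim=4, n_hidden=20):
    """Fit one ELM per component and sum the one-step-ahead test predictions."""
    total = None
    for k, component in enumerate(toy_decompose(series)):
        X, y = embed(component, dim)
        model = ELM(n_hidden, seed=k).fit(X[:n_train], y[:n_train])
        pred = model.predict(X[n_train:])
        total = pred if total is None else total + pred
    return total


# Toy signal with two time scales (illustrative only, not network traffic).
t = np.arange(500)
signal = np.sin(0.1 * t) + 0.3 * np.sin(0.9 * t)
predicted = decompose_and_predict(signal, n_train=400)
actual = embed(signal, 4)[1][400:]
test_rmse = float(np.sqrt(np.mean((actual - predicted) ** 2)))
```

In the proposed mechanism, quantities that are fixed by hand here, such as the number of modes, the embedding dimension and delay, and the number of hidden nodes, are the parameters that SABC is used to select.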
JINMEI SHI (Member, IEEE) is currently pursuing the Ph.D. degree in computer science with the Faculty of Computing and Informatics, Universiti Malaysia Sabah. She has been engaged in research and teaching for 13 years at Hainan Vocational University of Science and Technology. She has always been at the forefront of scientific research. She plays an active role in the construction of computer science and personnel training, and has good practical experience and academic foundation. Her research interests include network traffic prediction, algorithm analysis, and software. In recent years, she has published ten articles and three books, has had three sponsored projects, and has received ten awards at the provincial level or above.