Optimization and Decomposition Methods in Network Traffic Prediction Model: A Review and Discussion

The 21st century is a high-tech information era in which our lives are closely linked by computer networks. Hence, how to supervise networks effectively and reduce the frequency of network security incidents has become a research hotspot in cyberspace. In particular, researchers have shown increasing interest in predicting network traffic before an untoward incident happens. Optimization and decomposition technologies are core components of network traffic prediction models, which play an important role in network management. This article reviews past network traffic prediction research, critically examines the optimization and decomposition technologies used in these models, and tabulates each model's parameter structure according to its research methodology, dataset, evaluation criteria and other attributes. The comparison indicates that the Particle Swarm Optimization (PSO) algorithm and the Variational Mode Decomposition (VMD) technique can effectively address the difficulties of network traffic prediction modelling and are crucial to improving prediction accuracy and convergence speed. The comprehensive review reveals that PSO and VMD are the most suitable optimization algorithm and decomposition technique for network traffic prediction modelling.


I. INTRODUCTION
The huge tide of the Internet is driving the rapid development of society, and computer networks have become an important technical means of the information society. As network traffic grows, maintaining network service quality has made network monitoring increasingly urgent. These rapid changes also make traffic anomalies harder to detect, so detection alone can no longer guarantee a reliable network for an organization; instead, prevention and intervention must be planned before abnormal events occur. In network security monitoring, network traffic is an important parameter for evaluating the running state of the network [1]. Normal network traffic indicates orderly and safe network operation. On the contrary, malicious and abnormal network traffic [...] model is difficult to converge or falls into local optima. Their findings show that such models are inadequate to reveal subtle internal features of the traffic and cannot predict accurately and quickly enough to support preventive measures. The self-similarity, chaos and mutability of network traffic make its modelling and prediction difficult. Therefore, how to precisely eliminate interference and predict the multi-scale characteristics of network traffic in a prediction model has become a worthy research topic.

The associate editor coordinating the review of this manuscript and approving it for publication was Kashif Saleem.
This article first briefly states the importance of network traffic prediction. It then reviews the optimization algorithms used in network traffic prediction mechanisms. The third section discusses decomposition technologies for traffic time series models. The paper concludes by reiterating the significant role of optimization and decomposition methods in network traffic prediction models.

II. OPTIMIZATION ALGORITHM
To better describe traffic characteristics, improve the accuracy of prediction models and allow effective network protection measures to be taken in advance, the key lies in the selection and improvement of the optimization algorithm. Researchers have therefore adopted various optimization algorithms to optimize their models. This article reviews several of these algorithms, namely the Genetic Algorithm (GA), Quantum Genetic Algorithm (QGA), Fruit Fly Optimization Algorithm (FOA) and Particle Swarm Optimization (PSO), as shown in figure 1.

A. GENETIC ALGORITHM
Genetic Algorithm (GA) is a heuristic search algorithm that combines genetics with evolutionary computation. It was developed by the American professor J. Holland in 1975 [4]. In 1989, Goldberg's work gave a comprehensive and systematic summary and discussion of the genetic algorithm and laid its foundation on the basis of schema (pattern) theory [43]. The purpose of GA is to compare many different individual solutions drawn from a population and select the one that best fits the data in terms of a fitness value. Selection simulates the natural law of "survival of the fittest".
Given a problem and its associated dataset, the genetic algorithm first extracts the characteristic quantities of the data model and then searches for the best individual. The process is as follows:
(1) Encode the data of the initial population to determine the code length, and generate the initial population randomly.
(2) Calculate the fitness of the initial population and determine whether the current requirements are satisfied. If they are, proceed to step (4); otherwise, go to step (3).
(3) Apply selection (replication), crossover and mutation to the population to produce the next generation, and increase the iteration (generation) counter by one. Then calculate the fitness of the new population and determine whether the current requirements are met. If they are, go to step (4); otherwise, repeat step (3).
(4) Return the current optimal individual and terminate.
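The four steps above can be illustrated with a minimal binary-coded GA in Python. This is a sketch under stated assumptions, not the method of any study reviewed here: the 16-bit encoding of a real number in [−5, 5], the toy objective −(x − 1.5)², tournament selection, single-point crossover, bit-flip mutation and elitism are all illustrative choices.

```python
import random

BITS = 16            # code length chosen in step (1) (illustrative)
LOW, HIGH = -5.0, 5.0


def decode(chrom):
    """Map a list of bits to a real number in [LOW, HIGH]."""
    value = int("".join(map(str, chrom)), 2)
    return LOW + (HIGH - LOW) * value / (2 ** BITS - 1)


def fitness(chrom):
    """Toy objective (assumption): maximum at x = 1.5."""
    return -(decode(chrom) - 1.5) ** 2


def tournament(pop, rng):
    """Selection: the fitter of two random individuals survives."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) > fitness(b) else b


def genetic_algorithm(pop_size=40, max_gen=100, p_cross=0.8,
                      p_mut=1.0 / BITS, target=-1e-6, seed=0):
    rng = random.Random(seed)
    # Step (1): encode and generate the initial population at random
    pop = [[rng.randint(0, 1) for _ in range(BITS)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(max_gen):  # each pass is one generation
        # Step (2): stop once the requirement is satisfied
        if fitness(best) >= target:
            break
        # Step (3): selection, crossover and mutation
        children = [best[:]]  # elitism: keep the current best individual
        while len(children) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            if rng.random() < p_cross:
                cut = rng.randint(1, BITS - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                for i in range(BITS):
                    if rng.random() < p_mut:
                        child[i] ^= 1  # bit-flip mutation
                children.append(child)
        pop = children[:pop_size]
        best = max(pop, key=fitness)
    # Step (4): return the best individual found
    return decode(best)
```

Calling `genetic_algorithm()` returns a value close to 1.5; elitism guarantees that the best fitness never decreases from one generation to the next.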
The process described above is summarized in figure 2. Through this iterative search, the genetic algorithm can escape some local optima and improve optimization performance, and it is commonly used to strengthen data-processing models. However, its ability to explore new regions of the search space is limited, so it tends to converge to local optima. Moreover, computation time becomes a problem when the problem is complex and involves a large number of individuals. To address these shortcomings, researchers have developed the quantum genetic algorithm, which builds on the genetic algorithm and achieves good performance.

B. QUANTUM-INSPIRED GENETIC ALGORITHM
Quantum Genetic Algorithm (QGA) combines quantum computing with the genetic algorithm. It was proposed by M. Moore and Narayanan of the University of Exeter in 1996 [5]. QGA encodes data as quantum bits (qubits) and processes them through quantum chromosomes within the classical genetic-algorithm framework. Quantum versions of operations such as crossover and mutation are applied to these chromosomes so that regularities in the data can be captured, yielding an efficient system model with high prediction accuracy. Using a quantum algorithm to address the gradient explosion problem of the BP neural network, Kun Zhang proposed the QPSO-BP prediction model [6]. Ying Han applied quantum mechanics theory to optimize five important parameters of the echo state network (ESN) and proposed the QFOA-ESN model to improve prediction accuracy [7].
VOLUME 8, 2020
The quantum genetic algorithm consists of three main steps:
Step 1: Initialize. Load the feature quantities corresponding to the initial data and extract the features.
Step 2: Measure. Measure each chromosome in the population Q(t), collapsing each gene according to its probability amplitudes, to obtain a population of definite solutions P(t).
Step 3: Evolve. Apply selection, crossover and mutation, and check whether the termination condition is met. If it is, return the current best individual; otherwise, continue iterating until the condition is satisfied.
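The initialize, measure and evolve cycle can be sketched in Python as follows. This is a minimal quantum-inspired GA under stated assumptions: each gene is a qubit stored as an angle θ, measurement collapses it to 1 with probability sin²θ, and, as in many QGA variants, a rotation gate that turns each qubit toward the best solution found replaces classical crossover and mutation. The OneMax objective (count of ones), the rotation angle and the angle margin are illustrative choices, not values from the reviewed studies.

```python
import math
import random


def qga_onemax(n_bits=20, pop_size=10, max_gen=200, delta=0.02 * math.pi, seed=0):
    rng = random.Random(seed)
    margin = 0.05  # keeps every qubit slightly uncertain so the search cannot freeze (assumption)
    # Step 1: initialise each qubit angle at pi/4, i.e. |0> and |1> equally likely
    Q = [[math.pi / 4] * n_bits for _ in range(pop_size)]
    best, best_fit = None, -1
    for _ in range(max_gen):
        # Step 2: measure Q(t); each qubit becomes 1 with probability sin^2(theta), giving P(t)
        P = [[1 if rng.random() < math.sin(th) ** 2 else 0 for th in q] for q in Q]
        for x in P:
            if sum(x) > best_fit:  # toy fitness: number of ones (OneMax)
                best, best_fit = x[:], sum(x)
        if best_fit == n_bits:  # termination condition
            break
        # Step 3: evolve with a rotation gate toward the best individual's bits
        for q, x in zip(Q, P):
            for j in range(n_bits):
                if x[j] != best[j]:
                    q[j] += delta if best[j] == 1 else -delta
                    q[j] = min(max(q[j], margin), math.pi / 2 - margin)
    return best, best_fit
```

Because a single qubit encodes a probability distribution over both bit values, a small population can still sample widely; the rotation angle trades exploration (small angle) against convergence speed (large angle).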
Table 1 gives the pseudo-code of the quantum genetic algorithm. Its fundamental components are quantum coding and quantum gates. Quantum coding uses qubits to encode the corresponding dataset or multiple individuals; because each qubit represents a superposition of states, the encoding covers the whole search space, which helps avoid local optima. Quantum gates make the algorithm more efficient, improve its results and global optimization, and increase population diversity, further expanding the scale at which the data can be exploited. Consequently, the quantum genetic algorithm can search the entire solution space, converges quickly, performs well and resists being trapped in local optima. One weakness, however, is its heavy computational cost [8]. Hui Tian used QGA, with its efficient global search capability, to model network traffic and proposed the QGA-BP hybrid model to predict the development trend of network traffic. However, the model ignores the problem of the QGA slide rule, which increases the complexity of the hybrid model and results in lower generalization performance and a slower convergence rate [36].

C. FRUIT FLY OPTIMIZATION ALGORITHM
Fruit Fly Optimization Algorithm (FOA) is a swarm intelligence algorithm proposed by Dr. Pan Wenchao in 2011 [9], inspired by the foraging behaviour of fruit flies. With its keen sensory system, a fruit fly can fly to a food source quickly and efficiently. Its sense of smell can distinguish different odours and detect food dozens of kilometres away, giving it the approximate location of the food. After a period of blind flight toward that location, it observes the surrounding individuals, determines the exact location of the food from odour information and vision, and then flies to the target.
Specifically, when searching for food starting from the origin (0, 0), individual fruit flies first use their olfactory organs to smell the food and send odour information to the surrounding flies, or receive odour information (X, Y) from them. The flies then use their visual organs to identify the location (X2, Y2) of the fly that has collected the best odour information in the current group, and the other flies in the group all fly to this location and continue searching. The process of a fruit fly searching for food is shown in figure 3. The search proceeds in two broad stages:
(1) Blind flight stage: while still far from the food source, the fly uses its superior sense of smell to quickly locate the general position of the food and then flies toward it.
(2) Precise flight stage: after the first stage, the fruit fly is already near the food source. At this stage, besides observing the surrounding population for information, it relies mainly on its sharp vision to fly directly to the food.
The three main features of fruit fly flight are as follows:
(1) Feature 1: each fruit fly moves toward the group's position.
(2) Feature 2: guided by the odour concentrations across the whole population, each fruit fly flies toward the individuals with higher odour concentration.
(3) Feature 3: the fruit fly population as a whole approaches the food source.
Based on these search stages and flight features, the fruit fly algorithm has four parameters: the initial value, the iteration step length, the population size (sizepop) and the maximum number of iterations (maxgen). Their influence on the algorithm is described below.
(1) Initial value: the initial location is generated by a random function, and the resulting population diversity determines the search ability of the whole swarm and whether it can find the target food.
(2) Iteration step length: the step length governs every flight throughout the search. A long step makes each flight longer and speeds up the search, but the flies may skip over the global optimum and fall into a local optimum. A short step makes the flies move slowly but makes them less likely to be trapped in a local optimum. A step length should therefore be chosen that lets the flies find the global optimum while keeping the search efficient.
(3) sizepop: the number of individual flies searching for food determines the swarm's search ability. With more individuals, each contributes to the group and the target food should be located more quickly; however, a large population increases memory usage and running time, so the population size must also be chosen appropriately.
(4) maxgen: more iterations mean a longer running time, but with too few iterations the flies may not yet have reached the target. The number of iterations therefore also affects the performance of the fruit fly algorithm.
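The search stages and the four parameters (initial value, step length, sizepop, maxgen) can be illustrated with a minimal Python sketch of the classic FOA loop. It assumes the usual formulation in which each fly's smell concentration judgement value is S = 1/Dist, where Dist is the fly's distance from the origin, and the swarm moves only when a generation improves on the best smell so far. The toy objective (S − 2)² is a hypothetical stand-in for the model error that FOA would minimize when tuning a prediction model.

```python
import math
import random


def foa(smell_fn, sizepop=30, maxgen=200, step=1.0, seed=0):
    rng = random.Random(seed)
    # Initial value: a random swarm location
    x_axis, y_axis = rng.random(), rng.random()
    best_smell, best_s = math.inf, None
    for _ in range(maxgen):
        gen_smell, gen_s, gen_x, gen_y = math.inf, None, None, None
        for _ in range(sizepop):
            # Blind flight: random direction and distance (bounded by step) around the swarm
            x = x_axis + step * (2 * rng.random() - 1)
            y = y_axis + step * (2 * rng.random() - 1)
            dist = max(math.hypot(x, y), 1e-12)
            s = 1.0 / dist          # smell concentration judgement value
            smell = smell_fn(s)     # lower smell value = better here
            if smell < gen_smell:
                gen_smell, gen_s, gen_x, gen_y = smell, s, x, y
        # Precise flight: the swarm flies by vision to the best fly, if it improved
        if gen_smell < best_smell:
            best_smell, best_s = gen_smell, gen_s
            x_axis, y_axis = gen_x, gen_y
    return best_s, best_smell
```

For example, `foa(lambda s: (s - 2.0) ** 2)` returns an S close to 2. Shrinking `step` makes each flight shorter, and raising `sizepop` or `maxgen` trades running time for search coverage, mirroring the parameter discussion above.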
Based on this analysis, FOA converges quickly and does not rely on manual experience. Using FOA, Ying Han solved for and optimized the four parameters of the ESN and proposed a combined network traffic model that improves prediction accuracy [10]. Unfortunately, FOA tends to converge prematurely to local optima during iteration, which degrades its optimization performance.

D. PARTICLE SWARM OPTIMIZATION
Particle Swarm Optimization (PSO) was proposed by Kennedy and Eberhart in 1995 [11]. PSO is a heuristic search algorithm inspired by the collective behaviour of natural swarms such as flocks of birds and groups of insects. Like the genetic algorithm, it is population-based: it relies on information sharing, cooperation and competition among individuals to seek the optimum in the global search space. PSO starts from random initial solutions (particles) and iteratively improves them by changing the direction, velocity and position of the search. In basic PSO, a particle updates its state according to three rules: it maintains its inertia; it moves toward the best position it has found itself (its own experience); and it moves toward the best position found by the whole swarm (the experience of the other particles). Tao He proposed the PSO-RBF combined model, based on an improved PSO algorithm, to predict traffic [12]. Fei Han used an adaptive PSO algorithm to optimize an extreme learning machine (ELM), leading to the APSO-ELM hybrid model [13]. Yi Yang used PSO to optimize the two parameters of Least Squares Support Vector Machines (LSSVM) to predict network traffic [14].

1) ALGORITHM PRINCIPLE OF PSO
Suppose that, in a D-dimensional search space, the population $X = (X_1, X_2, \ldots, X_n)$ consists of n particles. The i-th particle is represented by a D-dimensional vector $X_i = (X_{i1}, X_{i2}, \ldots, X_{iD})^T$, which gives its position in the search space and is a potential solution to the problem. Once the objective function is determined, the fitness value of each particle position $X_i$ can be computed from it. The velocity of the i-th particle is $V_i = (V_{i1}, V_{i2}, \ldots, V_{iD})^T$, its individual extremum (the best position it has visited) is $P_i = (P_{i1}, P_{i2}, \ldots, P_{iD})^T$, and the population extremum (the best position found by the whole swarm) is $P_g = (P_{g1}, P_{g2}, \ldots, P_{gD})^T$.
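In each iteration, each particle adjusts its velocity and position according to its individual extremum and the population extremum, as shown in formulae (1) and (2):
$$V_{id}^{k+1} = \omega V_{id}^{k} + c_1 r_1 \left(P_{id}^{k} - X_{id}^{k}\right) + c_2 r_2 \left(P_{gd}^{k} - X_{id}^{k}\right) \quad (1)$$
$$X_{id}^{k+1} = X_{id}^{k} + V_{id}^{k+1} \quad (2)$$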
where ω is the inertia weight; d = 1, 2, ..., D; i = 1, 2, ..., n; k is the iteration number; $V_{id}$ is the velocity of particle i in dimension d; $c_1$ and $c_2$ are non-negative constants ($c_1 \ge 0$, $c_2 \ge 0$) called acceleration factors; and $r_1$ and $r_2$ are random numbers uniformly distributed in [0, 1]. To prevent particles from searching blindly, their positions and velocities are generally limited to the ranges $[-X_{max}, X_{max}]$ and $[-V_{max}, V_{max}]$.
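As a quick numerical check of the update rule, take the standard inertia-weight form $V_{id}^{k+1} = \omega V_{id}^{k} + c_1 r_1 (P_{id}^{k} - X_{id}^{k}) + c_2 r_2 (P_{gd}^{k} - X_{id}^{k})$ and $X_{id}^{k+1} = X_{id}^{k} + V_{id}^{k+1}$ for one dimension of one particle. The values are illustrative only and are not taken from any reviewed study: ω = 0.7, $c_1 = c_2 = 1.5$, $r_1 = 0.5$, $r_2 = 0.2$, $X_{id}^{k} = 1.0$, $V_{id}^{k} = 0.5$, $P_{id}^{k} = 2.0$, $P_{gd}^{k} = 3.0$.

```latex
\begin{aligned}
V_{id}^{k+1} &= 0.7(0.5) + 1.5(0.5)(2.0 - 1.0) + 1.5(0.2)(3.0 - 1.0) \\
             &= 0.35 + 0.75 + 0.60 = 1.70 \\
X_{id}^{k+1} &= 1.0 + 1.70 = 2.70
\end{aligned}
```

The inertia term keeps part of the old motion, while the cognitive and social terms pull the particle toward $P_{id}$ and $P_{gd}$. If the velocity limit were $V_{max} = 1$, the velocity would be clipped to 1.0 and the particle would move only to $X_{id}^{k+1} = 2.0$, which is how the range limit stops particles from overshooting.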

2) BASIC FLOW OF PSO ALGORITHM
Particle swarm optimization (PSO) is often used to solve maximization and minimization problems. Its general processing steps are as follows:
Step 1: Initialize the particle swarm (with population size M).
Step 2: Calculate the fitness value of each particle.
Step 3: For each particle, compare its fitness value with that of the best position $P_{best}$ it has visited. If the current fitness value is better, replace $P_{best}$ with the particle's current position.
Step 4: For each particle, compare its fitness value with that of the best position $G_{best}$ in the whole population. If the current fitness value is better, replace $G_{best}$ with the particle's current position.
Step 5: Adjust the velocity and position of each particle according to formulae (1) and (2).
Step 6: If the end condition is not met, return to Step 2; otherwise, output $G_{best}$.
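Steps 1 to 6 can be sketched as a minimal global-best PSO in Python for a minimization problem, so a lower objective value counts as "better". The parameter values (w = 0.7, c1 = c2 = 1.5, the position and velocity limits) and the sphere test function are illustrative assumptions rather than settings from the reviewed models.

```python
import random


def pso(objective, dim, n_particles=30, max_iter=200, w=0.7, c1=1.5, c2=1.5,
        x_max=5.0, v_max=1.0, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial positions and velocities
    X = [[rng.uniform(-x_max, x_max) for _ in range(dim)] for _ in range(n_particles)]
    V = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    # Steps 2-4: initial fitness, personal bests (P_best) and global best (G_best)
    P = [x[:] for x in X]
    p_val = [objective(x) for x in X]
    g = min(range(n_particles), key=lambda i: p_val[i])
    G, g_val = P[g][:], p_val[g]
    for _ in range(max_iter):  # Step 6: repeat until the iteration budget is used
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Step 5, formula (1): inertia + cognitive + social terms
                v = w * V[i][d] + c1 * r1 * (P[i][d] - X[i][d]) + c2 * r2 * (G[d] - X[i][d])
                V[i][d] = max(-v_max, min(v_max, v))
                # Step 5, formula (2): move, then clamp to the search range
                X[i][d] = max(-x_max, min(x_max, X[i][d] + V[i][d]))
            f = objective(X[i])  # Step 2 for the new position
            if f < p_val[i]:     # Step 3
                P[i], p_val[i] = X[i][:], f
                if f < g_val:    # Step 4
                    G, g_val = X[i][:], f
    return G, g_val
```

For example, `pso(lambda x: sum(v * v for v in x), dim=3)` drives the sphere function close to its minimum of 0 at the origin. Updating $G_{best}$ immediately, rather than once per sweep, is a design choice that lets later particles in the same sweep benefit from an improvement at once.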
Figure 4 summarizes these steps as a flow chart of the particle swarm optimization algorithm. Analysis of PSO shows that the learning factors, the elasticity coefficients of the velocity and population updates, and the iteration velocity coefficient are all adjustable. Moreover, its principle is simple and easy to implement. Weijie Zhang improved the PSO algorithm to optimize the three parameters of an RBF network and proposed a combined network traffic prediction model. By adjusting the inertia weight and learning factors to strengthen the global extremum search, he sought to avoid the local optima of PSO, and he then optimized the four parameters of the model network to improve the accuracy of the prediction model. However, the model can still fall into local optima during iteration [15]. Table 2 summarizes the strengths and limitations of the optimization algorithms, and table 3 analyses and compares their applications in network traffic prediction models. These comparisons suggest that, although PSO is prone to local optima during iteration, it converges very fast, has few parameters, follows a simple principle and is easy to implement; PSO also identifies network traffic characteristics very well [12].
Compared with the other optimization algorithms, PSO is feasible and effective for constructing traffic prediction models, which lays a good foundation for follow-up research and algorithm optimization. At the same time, overcoming the local optimum problem of PSO remains a key issue, and an optimization strategy for constructing the prediction model is much needed. In future research, we will design an algorithm that solves the local optimum problem of PSO and then build the network traffic prediction mechanism.

III. DECOMPOSITION TECHNIQUE
Network traffic is nonlinear and nonstationary, exhibiting burstiness, self-similarity, multifractality, chaos and so on. Time series analysis has been found to be an effective way to handle the nonlinearity and nonstationarity of traffic sequences [42], as it provides a modelling approach based on time-frequency analysis. Researchers have therefore introduced time-frequency analysis into the study of traffic patterns by applying signal analysis theory to traffic time series. Time-frequency analysis treats the traffic sequence as a signal formed by superimposing sinusoidal components with different periods. First, appropriate techniques are used to identify and decompose these components. Then, by analysing the subtle features contained in the signal, the internal variation patterns of the traffic sequence are captured. Finally, the trend and characteristics of future traffic are predicted. Wavelet Analysis and Modal Decomposition are effective applications of time-frequency analysis, and in recent years both have been widely used as signal decomposition tools in traffic sequence prediction modelling. This article first analyses several common signal decomposition technologies, namely Wavelet Analysis, Empirical Mode Decomposition (EMD), Ensemble Empirical Mode Decomposition (EEMD) and Variational Mode Decomposition (VMD), in order to establish a sound scientific and technical basis for decomposing sequence signals, as shown in figure 5.
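To illustrate the idea that a traffic sequence can be viewed as a superposition of sinusoids with different periods, the following minimal Python sketch builds a synthetic hourly traffic series (hypothetical data with daily and half-daily cycles plus noise, not from any dataset in this review) and uses NumPy's FFT to recover its dominant periods. The helper name `dominant_periods` is illustrative.

```python
import numpy as np


def dominant_periods(series, n_top=2, sample_spacing=1.0):
    """Return the periods of the n_top strongest sinusoidal components."""
    x = np.asarray(series, dtype=float)
    spectrum = np.abs(np.fft.rfft(x - x.mean()))  # remove the mean so the DC bin is ~0
    freqs = np.fft.rfftfreq(x.size, d=sample_spacing)
    top = np.argsort(spectrum[1:])[::-1][:n_top] + 1  # skip the zero-frequency bin
    return sorted(1.0 / freqs[top])


# Hypothetical hourly traffic over two weeks: 24 h and 12 h cycles plus noise
hours = np.arange(24 * 14)
rng = np.random.default_rng(0)
traffic = (100 + 30 * np.sin(2 * np.pi * hours / 24)
           + 10 * np.sin(2 * np.pi * hours / 12)
           + rng.normal(0, 3, hours.size))
periods = dominant_periods(traffic)
```

Here `periods` comes out at (approximately) 12 and 24 hours. A plain Fourier analysis like this assumes the components are stationary, which is exactly the limitation the wavelet and mode-decomposition methods below try to relax.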

A. WAVELET TRANSFORM
Wavelet Analysis (WA), also known as the Wavelet Transform (WT), grew out of Fourier analysis and is a common analysis method in signal processing. It was developed by the French engineer Goupillaud et al. [17] in 1984. In 1986, the mathematician Y. Meyer cooperated with Mallat [18] to establish a unified method for constructing wavelet bases. The wavelet transform analyses a signal by scaling and translating a wavelet function. Like the basis functions of the Fourier transform, a wavelet function oscillates, but it also decays quickly, so the transform reveals the characteristics of the signal in both the time domain and the frequency domain, and the signal can be analysed through these characteristics. In recent years, the wavelet transform has become one of the most powerful tools for decomposing non-stationary signals and has been widely applied in network traffic prediction [19], [20].
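For a signal $x(t) \in L^2(\mathbb{R})$, the expansion on the wavelet basis $\psi_{a,b}(t) = |a|^{-1/2}\,\psi\!\left(\frac{t-b}{a}\right)$ is called the continuous wavelet transform, given in formula (3):
$$WT_x(a,b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} x(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right) dt \quad (3)$$
where $\psi^{*}$ denotes the complex conjugate of the mother wavelet ψ.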
The most important point in formula (3) is that the wavelet basis has two parameters, the scale a and the translation b. By definition, the continuous wavelet transform is an integral transform, and $WT_x(a,b)$ is called the wavelet transform coefficient.
If a and b in formula (3) are discretized simultaneously, with $a = a_0^{i}$ and $b = k a_0^{i} b_0$, the discrete wavelet transform of x(t) is obtained, as shown in formula (4):
$$WT_x(i,k) = a_0^{-i/2} \int_{-\infty}^{+\infty} x(t)\,\psi^{*}\!\left(a_0^{-i} t - k b_0\right) dt \quad (4)$$
Compared with the continuous wavelet transform, this discretization both saves computation time and reduces information redundancy.
Building on multi-resolution signal analysis, Mallat et al. proposed a fast algorithm for computing the discrete orthogonal wavelet transform, known as the Mallat algorithm [18]. Its principle is shown in figure 6. As figure 6 shows, at each level the Mallat algorithm splits only the low-frequency (approximation) part into a new low-frequency part and a high-frequency (detail) part; the high-frequency part is retained and not decomposed further. The decomposition is given in formula (5):
$$A_{j+1}(k) = \sum_{n} h(n-2k)\,A_j(n), \qquad D_{j+1}(k) = \sum_{n} g(n-2k)\,A_j(n) \quad (5)$$
where $A_0 = x$, $A_j$ and $D_j$ are the approximation and detail coefficients at level j, and h and g are the low-pass and high-pass decomposition filters.
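The Mallat pyramid can be illustrated with the simplest orthogonal wavelet, the Haar wavelet, whose low-pass and high-pass filters are $h = (1/\sqrt{2}, 1/\sqrt{2})$ and $g = (1/\sqrt{2}, -1/\sqrt{2})$. This minimal Python sketch assumes the Haar filters and a signal length divisible by $2^{levels}$; practical traffic studies typically use longer wavelets (e.g. Daubechies) and boundary handling that are omitted here. At each level only the approximation is split again, and the details are kept.

```python
import math

S = 1 / math.sqrt(2)  # Haar filter coefficient


def haar_step(a):
    """One Mallat level with Haar filters: approximation and detail (even length assumed)."""
    approx = [(a[2 * k] + a[2 * k + 1]) * S for k in range(len(a) // 2)]
    detail = [(a[2 * k] - a[2 * k + 1]) * S for k in range(len(a) // 2)]
    return approx, detail


def mallat_decompose(x, levels):
    """Split the low-frequency part repeatedly; high-frequency parts are not split further."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


def mallat_reconstruct(approx, details):
    """Invert the decomposition level by level (perfect reconstruction for Haar)."""
    a = list(approx)
    for d in reversed(details):
        out = []
        for ak, dk in zip(a, d):
            out += [(ak + dk) * S, (ak - dk) * S]
        a = out
    return a
```

For example, decomposing `[1, 2, 3, 4]` over two levels gives the final approximation `[5.0]` and the level-2 detail `[-2.0]` (up to rounding), and `mallat_reconstruct` recovers the original samples.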
The WT can analyse signal characteristics in both the time domain and the frequency domain while ensuring a certain accuracy [21]. The scale parameters and the multi-resolution property of the wavelet transform are key to signal decomposition.
Recent research has reported that wavelet decomposition can confuse signal components and is not suitable for analysing high-frequency signals [22]. In addition, because of the limited length of the wavelet basis, each result of the scale decomposition is a signal at a fixed scale whose frequency content depends only on the sampling frequency, not on the signal itself, which makes it difficult to decompose the traffic signal [23].

B. EMPIRICAL MODE DECOMPOSITION
Empirical Mode Decomposition (EMD), proposed by Huang et al. in 1998 [24], was the first modal decomposition method and represented a new approach to signal processing. The essence of EMD is to stationarize a time series: it decomposes the original signal into several Intrinsic Mode Functions (IMFs), determining the intrinsic characteristics of the signal empirically from the data themselves. It reflects more effectively how energy is distributed across spatial (or temporal) scales in physical processes. Each IMF component that EMD extracts from a non-stationary, complex signal is comparatively stationary, which makes EMD an effective method for decomposing non-stationary signals [25].
EMD decomposes the original sequence into several subsequence components (IMFs). After decomposition, the IMFs retain the internal characteristics of the original signal but show stronger regularity and stationarity than the original signal. Each IMF obtained by EMD must satisfy two conditions: (1) over the whole signal, the number of extrema and the number of zero crossings must be equal or differ by at most one; and (2) at any point, the mean of the upper and lower envelopes must be zero.
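Condition (1) can be checked directly on a sampled sequence. The minimal Python sketch below counts strict local extrema and sign changes; it checks only condition (1), since condition (2) additionally requires constructing the envelopes. The function names are illustrative.

```python
import math


def count_extrema(x):
    """Number of strict local maxima and minima."""
    return sum(1 for i in range(1, len(x) - 1)
               if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0)


def count_zero_crossings(x):
    """Number of sign changes between consecutive samples."""
    return sum(1 for i in range(len(x) - 1) if x[i] * x[i + 1] < 0)


def meets_condition_1(x):
    """IMF condition (1): extrema and zero crossings differ by at most one."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1


wave = [math.sin(2 * math.pi * 3 * i / 200 + 0.1) for i in range(200)]
shifted = [v + 2 for v in wave]  # same oscillation riding on an offset
```

A pure sinusoid satisfies the condition, whereas the same oscillation shifted upward by a constant still has six extrema but no zero crossings, so it fails; EMD's sifting removes such local means to turn a signal into valid IMFs.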
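For a given signal x(t), the EMD algorithm is as follows:
(1) Find all local extrema of x(t) and use cubic spline interpolation to construct the upper envelope $a_u(t)$ through the maxima and the lower envelope $a_l(t)$ through the minima.
(2) Calculate the mean m(t) of the upper and lower envelopes, as shown in formula (6):
$$m(t) = \frac{a_u(t) + a_l(t)}{2} \quad (6)$$
(3) Subtract m(t) from the signal to obtain $h(t) = x(t) - m(t)$. If h(t) satisfies the two IMF conditions, it is taken as the i-th component $IMF_i(t)$; otherwise, steps (1) to (3) are repeated with h(t) in place of x(t) (this repetition is called sifting).
(4) Subtract $IMF_i(t)$ from the current signal to obtain a new signal $m_i(t)$ with the high-frequency component removed, as shown in formula (7):
$$m_i(t) = m_{i-1}(t) - IMF_i(t), \qquad m_0(t) = x(t) \quad (7)$$
(5) If $m_i(t)$ is monotonic or constant, it is taken as the residual $R(t) = m_i(t)$ and the decomposition ends; otherwise, steps (1) to (4) are repeated with $m_i(t)$ as the signal. The EMD of x(t) can then be expressed as shown in formula (8):
$$x(t) = \sum_{i=1}^{N} IMF_i(t) + R(t) \quad (8)$$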
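A compact NumPy sketch of steps (1) to (5) follows. It is a simplified illustration, not a reference implementation, and its assumptions are labelled: envelopes use a hand-written natural cubic spline (to avoid a SciPy dependency), the first and last samples are added as spline knots as a crude boundary treatment (real implementations handle end effects more carefully), and sifting stops on a simple aggregate change ratio rather than a full check of both IMF conditions.

```python
import numpy as np


def natural_spline(xk, yk, x):
    """Natural cubic spline through knots (xk, yk), evaluated at x."""
    n, h = len(xk), np.diff(xk)
    A, rhs = np.zeros((n, n)), np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0  # natural ends: zero second derivative
    for i in range(1, n - 1):
        A[i, i - 1], A[i, i], A[i, i + 1] = h[i - 1], 2 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6 * ((yk[i + 1] - yk[i]) / h[i] - (yk[i] - yk[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)  # second derivatives at the knots
    j = np.clip(np.searchsorted(xk, x) - 1, 0, n - 2)
    hj, t0, t1 = h[j], xk[j + 1] - x, x - xk[j]
    return ((M[j] * t0 ** 3 + M[j + 1] * t1 ** 3) / (6 * hj)
            + (yk[j] / hj - M[j] * hj / 6) * t0 + (yk[j + 1] / hj - M[j + 1] * hj / 6) * t1)


def extrema(h):
    d = np.diff(h)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def envelope(t, h, idx):
    # Step (1): spline through the extrema; endpoints added as knots (assumption)
    knots = np.concatenate(([0], idx, [len(h) - 1]))
    return natural_spline(t[knots], h[knots], t)


def sift(x, t, max_sift=50, tol=0.05):
    h = x.copy()
    for _ in range(max_sift):
        maxima, minima = extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        m = (envelope(t, h, maxima) + envelope(t, h, minima)) / 2  # step (2), formula (6)
        h_new = h - m                                               # step (3)
        change = np.sum((h - h_new) ** 2) / np.sum(h ** 2)  # aggregate stop rule (assumption)
        h = h_new
        if change < tol:
            break
    return h


def emd(x, t, max_imfs=8):
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = extrema(residual)
        if len(maxima) + len(minima) < 3:  # step (5): (near-)monotone residual, stop
            break
        imf = sift(residual, t)
        imfs.append(imf)
        residual = residual - imf          # step (4), formula (7)
    return imfs, residual
```

By construction the IMFs plus the residual reproduce the input exactly, which is formula (8). For instance, for `x = np.sin(2*np.pi*25*t) + 0.5*np.sin(2*np.pi*4*t)` the first IMF should mostly capture the 25-cycle oscillation, although end effects and the simplified stopping rule can leak some energy between components.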
Evidently, EMD yields multiple IMFs that expose more of the internal characteristics of traffic data. Because its basis functions are derived from the signal itself, EMD is clearly adaptive, which made it a breakthrough in time series signal decomposition. It also mitigates the influence of long-range dependence and noise, and it avoids the multi-resolution and decomposition-scale problems of the wavelet transform. EMD is therefore better suited than WT to decomposing nonlinear, nonstationary signals [26]. Regrettably, as research deepened, Deering et al. showed that signal intermittency causes EMD to suffer from mode mixing and end effects [27].

C. ENSEMBLE EMPIRICAL MODE DECOMPOSITION
To address the problems of EMD, Zhang et al. proposed the Ensemble Empirical Mode Decomposition (EEMD) algorithm in 2010 [28]. EEMD is a noise-assisted data analysis method that exploits the statistical properties of white noise to decompose the original signal. It improves decomposition accuracy and has been rapidly adopted in machine monitoring and fault diagnosis [29].
Consider decomposing a signal that contains an intermittent component into IMF1, IMF2, ..., IMFn. IMF1 may then contain both the intermittent high-frequency signal and part of the low-frequency sinusoid, which also appears in IMF2. This is mode mixing between IMFs, and it has clear negative effects on the subsequent decomposition: for example, IMF2 and IMF3 may have similar waveform frequencies yet be assigned to different components. Intermittency therefore not only causes severe aliasing in the time-frequency distribution but also blurs the physical meaning of individual IMFs [25].
In discussing an optimized EEMD, Du et al. pointed out that the shortcomings of traditional EEMD stem mainly from its decomposition process [25]. If the amplitude of the added noise is too small relative to the original signal, mode mixing cannot be effectively eliminated. Conversely, if the added noise amplitude is too large, extra IMF components are generated, leading to misinterpretation of the results. Fully cancelling the added white noise requires many ensemble trials, and too many trials increase the computational cost. Likewise, an inappropriately chosen white-noise amplitude produces unsatisfactory results.
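The ensemble procedure (add white noise, decompose, average over many trials) can be sketched in Python to show how the two parameters discussed here, the noise amplitude and the number of trials, enter the method. This is a minimal illustration under stated assumptions: `decompose` is a hypothetical placeholder for any EMD routine returning a list of components, component counts are aligned by merging or zero-padding (an assumption), and the usage example plugs in a trivial moving-average splitter, `moving_average_split` (also hypothetical and not EMD), purely so the sketch runs.

```python
import random


def eemd(signal, decompose, noise_std, trials, n_components, seed=0):
    rng = random.Random(seed)
    n = len(signal)
    total = [[0.0] * n for _ in range(n_components)]
    for _ in range(trials):
        # add white noise of the chosen amplitude to a copy of the signal
        noisy = [v + rng.gauss(0.0, noise_std) for v in signal]
        comps = decompose(noisy)
        # align counts: merge surplus components into the last one, zero-pad missing ones
        if len(comps) > n_components:
            tail = [sum(col) for col in zip(*comps[n_components - 1:])]
            comps = comps[:n_components - 1] + [tail]
        comps = comps + [[0.0] * n for _ in range(n_components - len(comps))]
        for k in range(n_components):
            for i in range(n):
                total[k][i] += comps[k][i]
    # ensemble mean: the added noise averages out as the number of trials grows
    return [[v / trials for v in row] for row in total]


def moving_average_split(x, half_window=3):
    """Hypothetical stand-in for EMD: returns [fast part, slow part]."""
    n, slow = len(x), []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        slow.append(sum(x[lo:hi]) / (hi - lo))
    return [[a - b for a, b in zip(x, slow)], slow]
```

The residual noise left in the ensemble mean shrinks roughly as `noise_std / sqrt(trials)`, which is why a larger noise amplitude demands more trials and therefore more computation, as noted above.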
The performance of EEMD therefore depends largely on two important parameters: the amplitude of the white noise and the number of ensemble trials [25]. Thus, although EEMD can decompose the original signal into several IMFs and suppress mode mixing, it depends heavily on parameter selection and tuning, which can impair signal decomposition and lower prediction accuracy.

D. VARIATIONAL MODE DECOMPOSITION
Compared with WT, EMD and EEMD, VMD can overcome the problems of mode mixing and end effects. It also achieves a better decomposition and has been applied in signal research. Wang et al. analysed friction fault signals and verified the friction-induced signature by comparing VMD, Wavelet Transform (WT), Ensemble Empirical Mode Decomposition (EEMD) and EMD on experimental data. Their findings demonstrated the robustness of VMD over the other methods [31].
Similarly, Ma used VMD to decompose traffic data for fault detection and concluded that VMD offered better performance, lower noise and higher detection accuracy than EMD and EEMD [32].

2) PRINCIPLE AND ALGORITHM PROCESS OF VARIATIONAL MODE DECOMPOSITION
VMD recasts signal decomposition as a variational problem. It searches for the optimal solution of this problem by iteratively updating the mode functions, and finally converts each sub-mode back into a time-domain IMF by the inverse Fourier transform.
Assuming that each IMF is compact around a central frequency with limited bandwidth, the variational problem becomes that of finding N mode functions $u_n(t)$ (n = 1, 2, ..., N) whose total estimated bandwidth is minimal, subject to the constraint that the modes sum to the input signal f(t). The specific steps are as follows:
(1) Apply the Hilbert transform to each mode function $u_n(t)$ to obtain its analytic signal and one-sided spectrum, as shown in formulae (9) and (10).
(2) Shift the spectrum of each mode to baseband by mixing its analytic signal with an exponential $e^{-j\omega_n t}$ tuned to the mode's estimated center frequency $\omega_n$, as shown in formula (11):
$$\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_n(t)\right] e^{-j\omega_n t} \quad (11)$$
(3) Estimate the bandwidth of each mode by the squared $L^2$ norm of the gradient of this demodulated signal. The resulting constrained variational problem is given in formula (12):
$$\min_{\{u_n\},\{\omega_n\}} \sum_{n=1}^{N} \left\| \partial_t \left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_n(t)\right] e^{-j\omega_n t} \right\|_2^2 \quad \text{s.t.} \quad \sum_{n=1}^{N} u_n(t) = f(t) \quad (12)$$
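To solve this constrained problem, a quadratic penalty factor C and a Lagrange multiplier θ(t) are introduced, giving the augmented Lagrangian in formula (13):
$$L(\{u_n\},\{\omega_n\},\theta) = C \sum_{n=1}^{N} \left\| \partial_t \left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_n(t)\right] e^{-j\omega_n t} \right\|_2^2 + \left\| f(t) - \sum_{n=1}^{N} u_n(t) \right\|_2^2 + \left\langle \theta(t),\, f(t) - \sum_{n=1}^{N} u_n(t) \right\rangle \quad (13)$$
The saddle point of formula (13) is sought with the alternating direction method of multipliers, updating each $u_n$ and $\omega_n$ in turn; the $u_n$ subproblem written in the frequency domain is formula (14). Here ω denotes an arbitrary frequency. After the substitution $\omega - \omega_n \to \omega$, and using the Hermitian symmetry of the spectra of real signals, the subproblem can be written as an integral over non-negative frequencies only, as shown in formula (15).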
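The solution of this quadratic optimization problem is the update in formula (16):
$$\hat{u}_n^{m+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq n} \hat{u}_i(\omega) + \frac{\hat{\theta}(\omega)}{2}}{1 + 2C(\omega - \omega_n)^2} \quad (16)$$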
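Following the same principle, the center frequency $\omega_n^{m+1}$ is updated by formula (17):
$$\omega_n^{m+1} = \frac{\int_0^{\infty} \omega \left|\hat{u}_n^{m+1}(\omega)\right|^2 d\omega}{\int_0^{\infty} \left|\hat{u}_n^{m+1}(\omega)\right|^2 d\omega} \quad (17)$$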
Here $\hat{u}_n^{m+1}(\omega)$ is a Wiener filtering of the current residual $\hat{f}(\omega) - \sum_{i \neq n} \hat{u}_i(\omega)$, and $\omega_n^{m+1}$ is the center of gravity of the current IMF's power spectrum. The time-domain mode $u_n(t)$ is the real part of the inverse Fourier transform of $\hat{u}_n(\omega)$. This analysis of the VMD variational problem shows that the penalty factor C and the Lagrange multiplier θ(t) play a key role in solving it.
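The VMD algorithm can be summarized in the following five steps:
(1) Initialize the parameters $\hat{u}_n^1$, $\omega_n^1$, $\hat{\theta}^1$ and the iteration counter.
(2) Update $u_n$ and $\omega_n$ according to formulae (18) and (19).
(3) Update the Lagrange multiplier $\theta$.
(4) Repeat steps (2) and (3) until the convergence criterion is met.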
(5) Output the decomposition results, yielding each intrinsic mode function (IMF).
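The algorithm flow above can be sketched in NumPy as follows. This is a minimal, hypothetical implementation: the function name `vmd` and the parameter names `num_modes` (N), `penalty` (C) and `tau` (multiplier update step) are assumptions, not taken from the article. It follows the standard frequency-domain updates of VMD on the one-sided spectrum returned by `np.fft.rfft`, omits the boundary mirroring used in the original reference implementation, and, as a simplification, stops when the aggregate relative change of all modes falls below `tol`.

```python
import numpy as np


def vmd(signal, num_modes=3, penalty=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Minimal VMD sketch (no boundary mirroring).

    Returns (modes, centre_frequencies) sorted by centre frequency,
    with frequencies in cycles per sample (0 to 0.5).
    """
    f = np.asarray(signal, dtype=float)
    T = f.size
    freqs = np.fft.rfftfreq(T)                         # non-negative frequency grid
    f_hat = np.fft.rfft(f)                             # one-sided input spectrum
    # Step (1): initialise modes, centre frequencies and the multiplier.
    u_hat = np.zeros((num_modes, freqs.size), dtype=complex)
    omega = 0.5 * np.arange(num_modes) / num_modes     # uniform initial centres
    theta_hat = np.zeros(freqs.size, dtype=complex)

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for n in range(num_modes):
            # Step (2a): Wiener-filter the residual of the other modes around
            # omega[n]; modes already updated in this sweep are reused (Gauss-Seidel).
            others = u_hat.sum(axis=0) - u_hat[n]
            u_hat[n] = (f_hat - others + theta_hat / 2) / (
                1.0 + 2.0 * penalty * (freqs - omega[n]) ** 2)
            # Step (2b): centre frequency = centre of gravity of the power spectrum.
            power = np.abs(u_hat[n]) ** 2
            if power.sum() > 0:
                omega[n] = np.sum(freqs * power) / power.sum()
        # Step (3): dual ascent on the multiplier (tau = 0 leaves it at zero).
        theta_hat = theta_hat + tau * (f_hat - u_hat.sum(axis=0))
        # Step (4): stop when the aggregate relative change is small.
        prev_energy = np.sum(np.abs(u_prev) ** 2)
        if prev_energy > 0 and np.sum(np.abs(u_hat - u_prev) ** 2) / prev_energy < tol:
            break

    # Step (5): back to the time domain; irfft assumes Hermitian symmetry,
    # so each mode is a real signal.
    modes = np.fft.irfft(u_hat, n=T, axis=1)
    order = np.argsort(omega)
    return modes[order], omega[order]


if __name__ == "__main__":
    t = np.arange(1000)
    x = np.cos(2 * np.pi * 0.02 * t) + 0.5 * np.cos(2 * np.pi * 0.2 * t)
    modes, centres = vmd(x, num_modes=2)
    print(centres)
```

With `tau = 0`, a choice often used for noisy signals, the multiplier never enforces exact reconstruction, so the modes need not sum exactly to the input; a positive `tau` tightens that constraint.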
According to the above analysis, the VMD decomposition technique involves two main parameters, the number of modes and the iterative factor, whose values must be preset before a signal can be decomposed. To reduce the influence of subjective experience and prior knowledge on these values, an optimization method must be adopted to select the parameters adaptively for different signals, so that a prediction model with high accuracy can be built.
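One way to realize this adaptive selection is to let PSO search over candidate parameter values. The sketch below is a basic global-best PSO (inertia weight `w`, acceleration coefficients `c1` and `c2`) over a box of candidate (N, C) pairs; the function names, bounds and coefficient values are illustrative assumptions. The fitness `hypothetical_vmd_fitness` is a toy quadratic with an arbitrary, known minimum so that the example runs on its own; in practice it would run VMD with the candidate parameters and return a decomposition-quality or prediction-error score. The number of modes is searched as a continuous value and rounded afterwards.

```python
import random


def pso_minimise(fitness, bounds, n_particles=20, n_iter=60,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO: minimise fitness(position) inside box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # each particle's best position
    pbest_val = [fitness(p) for p in pos]
    best_i = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[best_i][:], pbest_val[best_i]   # swarm's best so far

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to box
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def hypothetical_vmd_fitness(params):
    """Toy stand-in with an arbitrary minimum at N = 4, C = 1500.
    A real fitness would run VMD with round(N) modes and penalty C."""
    n_modes, penalty = params
    return (n_modes - 4.0) ** 2 + ((penalty - 1500.0) / 1000.0) ** 2


if __name__ == "__main__":
    best, score = pso_minimise(hypothetical_vmd_fitness, [(2.0, 10.0), (100.0, 5000.0)])
    print("N =", round(best[0]), "C =", round(best[1]), "fitness =", score)
```

Scaling the penalty term inside the fitness keeps both dimensions on comparable scales; with a real VMD-based fitness, each evaluation is a full decomposition, so the swarm size and iteration count dominate the cost.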
Based on the above review of time series decomposition techniques, this article summarizes and compares the strengths and limitations of WT, EMD, EEMD and VMD (see Table 4) in order to identify a suitable signal processing method for building a network traffic prediction mechanism. In addition, applications of decomposition techniques in network traffic prediction models are analyzed and compared in Table 5.
The reviewed literature reveals that VMD is less susceptible to noise interference when decomposing noisy signals. VMD not only overcomes the multiresolution and decomposition-scale problems of the traditional wavelet transform, but also solves the mode aliasing and white-noise amplitude problems of the EMD and EEMD methods [30]. Owing to its good decomposition performance and robustness, VMD is better suited as the core decomposition technique of a network traffic prediction model. VMD adaptively determines the relevant frequency bands, estimates the corresponding modes, and balances the errors between them. Consistent with the narrow-band property required by the current IMF definition, VMD finds a set of modes through an optimization process and reconstructs them to obtain the best input signal.
Clearly, VMD decomposition and the reconstruction of the IMF sub-modes are critical processes, and the number of modes, the penalty factor and the Lagrange multiplier update step are the core parameters that determine the optimization. Future research should therefore design an algorithm that finds the best values of these three parameters, improving the performance of VMD and moving toward a network traffic prediction mechanism with higher accuracy and faster convergence.
To sum up, VMD decomposition and PSO optimization for building a combined prediction model with faster convergence and higher accuracy still need further improvement and wider adoption.

IV. CONCLUSION
In a nutshell, optimization algorithms and decomposition techniques play an important role in combined traffic prediction models, ensuring prediction accuracy and increasing convergence speed. On the one hand, PSO is an optimization strategy for network model construction that can better identify network traffic time series; its strengths include a simple principle, a small computational load and fast speed. On the other hand, VMD is the core strategy for constructing a network traffic prediction mechanism; when processing network traffic time series, it reduces signal transmission error, eliminates mode aliasing and reduces endpoint effects, among other strengths. Notably, because both PSO and VMD are limited by how their optimal parameters are obtained, an adaptive algorithm must be designed to overcome the shortcomings of relying on human experience, mitigate convergence to local optima and improve the decomposition of data signals, thereby enhancing the prediction accuracy of the model.
In closing, decomposition techniques and optimization algorithms for constructing combined network traffic prediction models with faster convergence and higher accuracy remain a topic that warrants future research attention.
JINMEI SHI is currently pursuing the Ph.D. degree in computer science with the Faculty of Computing and Informatics, Universiti Malaysia Sabah. She has taught for 12 years at the Hainan Vocational University of Science and Technology, where she has remained actively engaged in research. She plays an active role in developing the computer science discipline and in personnel training, and has solid practical experience and a strong academic foundation. Her main research interests include network traffic prediction, algorithm analysis, and software. In recent years, she has published ten articles and three books, undertaken three sponsored projects, and received ten awards above the provincial level.
ZHIWEI YAN received the Ph.D. degree from the National Engineering Laboratory for Next Generation Internet Interconnection Devices, Beijing Jiaotong University. He joined the China Internet Network Information Center (CNNIC) in 2011, where he is currently a Professor. Since April 2013, he has been an Invited Researcher with Waseda University. His research interests include mobility management, network security, and next-generation Internet. He is active in IETF, APNIC, and ICANN. He is a member of the ICANN RSSAC Caucus and published RFC 8191 in the IETF.
VOLUME 8, 2020