Research on optimization of GWO-BP model for cloud server load prediction

To improve the accuracy of cloud server resource load prediction, the particle swarm optimization (PSO) algorithm, the gray wolf optimization (GWO) algorithm, and the BP neural network are studied in depth and applied. Firstly, the PSO algorithm is introduced to optimize the position update method in the search process of the gray wolves. Secondly, a convex function is introduced to replace the linear convergence of the traditional GWO algorithm. Then the optimized GWO algorithm is used to improve the assignment of weights and thresholds in the traditional BP neural network model, yielding a multi-stage optimized cloud server load prediction model, referred to as the PSO-GWO-BP prediction model. Finally, the performance of the PSO-GWO-BP prediction model is verified by comparison experiments.


I. INTRODUCTION
With the increasing maturity of cloud computing technology, more and more enterprises and individual users are choosing cloud data center hosting to meet the needs of data processing, computing, storage, and other tasks in their daily work. The continuous and rapid growth of cloud platform users and their differentiated needs has led to increasingly prominent resource management problems, such as load imbalance among computing nodes [1]. To solve such problems effectively, the resource scheduling system must be able to predict the resource load of cloud servers accurately, further strengthen the rational allocation of resources, and improve resource utilization and cloud server performance [2].
Resource load prediction belongs to the research field of resource management and provides a reliable basis for the design of resource scheduling algorithms. The research in the field of cloud server resource management mainly focuses on resource scheduling and load prediction [3] . The cloud server resource scheduling system rationally allocates and dynamically reclaims computing and storage resources according to the requirements of user application. For example, the sudden peak period of cloud service access requests can easily lead to a significant reduction in service quality caused by resource contention [4] . However, the peak period of cloud service usage usually has a certain regularity in a long term. Through the analysis and mining of historical data, the resource load prediction for a period in the future can be realized [5] , to provide a basis for resource scheduling and ensure the service performance and quality of the cloud server.
The purpose of this paper is to build a resource load prediction model for cloud servers with high prediction accuracy. Through the comparative analysis of existing research methods, the BP neural network is used as the base model in this paper. The particle swarm optimization (PSO) algorithm and the gray wolf optimization (GWO) algorithm are introduced to improve the traditional BP neural network model. Firstly, the PSO algorithm is used to improve the position update method of the GWO algorithm, and then the linear convergence of the traditional GWO algorithm is optimized by a convex function. Finally, the optimized GWO algorithm is used to improve the calculation of the initial weights and thresholds in the traditional BP neural network. In this way, a multi-stage optimized PSO-GWO-BP model for cloud server load prediction is proposed and constructed, referred to as the PSO-GWO-BP prediction model for short. In addition, when calculating the resource load value, this paper uses the entropy weight method to determine the weights of five factors, multiplies each weight by the corresponding index, and sums the products to obtain the resource load value. To verify the availability and performance of the model, this paper compares it with the support vector machine (SVM) model, the traditional BP neural network model, the BAS-BP model, the GWO-BP model, and the SSA-BP model. The experimental results show that the comprehensive performance of the PSO-GWO-BP prediction model is better than that of these five models. Therefore, the PSO-GWO-BP model can predict the overall trend of cloud server resource load changes more accurately and effectively, which is conducive to improving the intelligent management level of server monitoring and performance optimization in cloud data centers.
The rest of this paper is arranged as follows: the second part is the literature review. The third section is the introduction of GWO model and PSO model. The fourth part uses the traditional BP neural network model as the basic model and studies and proposes the PSO-GWO-BP prediction model. The fifth part verifies the availability and performance of the PSO-GWO-BP prediction model through the comparison experiments. The sixth part summarizes the paper.

II. LITERATURE REVIEW
Scientific and effective cloud service load prediction methods can provide a strong basis for resource scheduling decisions, to indirectly improve the utilization of cloud server resources and enhance its service performance. Through the analysis of the literature in recent years, it is found that the existing research on resource load prediction can be roughly divided into two categories.
The first category is the resource load prediction based on traditional statistical methods. The research on load prediction using statistical methods can be traced back to 1990. Hesterberg used weighted least squares linear regression technology for the load prediction [6] . Since then, many researchers have applied other types of regression statistical methods to load prediction. These methods can be divided into two categories: linear regression prediction methods and time series prediction methods. The linear regression prediction model is a kind of statistical method to analyze and find the causal relationship between resource load value and the key factors which affect resource load based on historical data, and then build a mathematical model to predict the change of resource load in the future. For example, Yang et al. used a linear regression method to predict the workload in the next period and designed an automatic scaling mechanism to scale virtual resources according to cloud workload conditions [7] . But this method assumes that the change trend of cloud service load is linear in the short term. The time series prediction model is a type of statistical method to study and figure out the change law of the data based on the statistical analysis of past time series data and predict the resource load in the future according to this law. For example, Wang Xu and Chen Xiaoyi used the analytic hierarchy process to synthesize the utilization of CPU, memory and disk and obtained the comprehensive resource utilization of the power information system, and then used the auto-regressive integrated moving average (ARIMA) time series prediction method to predict the resource utilization and system response time. The prediction results are used to judge the load status of the system [8] . But this method has higher requirements for the stability of the data. 
Usually, in the process of resource load changes, the data has composite characteristics that are both linear and non-linear [9]. The disadvantage of pure linear regression prediction is that it cannot accurately describe the characteristics of resource load change. Although time series prediction can analyze the changing trend of resource load, it is limited to short-term resource usage prediction. In addition, resource load prediction models based on traditional statistical methods have poor data processing capabilities when facing massive data.
The second category is resource load prediction based on artificial intelligence methods. The research of artificial intelligence technology in load prediction can be traced back to around 1990 [10], and mainly involves the application of single methods such as support vector machines (SVM), feed-forward neural networks, and Bayes methods. Due to shortcomings in these methods, many scholars have developed combined algorithms. For example, to improve the accuracy of SVM in resource load prediction, Zhao Li designed a combination function for SVM to improve its learning ability [11]. Cortez et al. established a load prediction model based on random forest to predict the actual resource load by analyzing the characteristics of virtual machine load data [12]. This model showed that prediction in advance can not only improve the utilization of resources, but also prevent the depletion of physical resources. Di designed a prediction model based on the Bayes method, which realized the prediction of load fluctuation over long-term intervals. Experiments verified that the model has higher accuracy than autoregressive prediction, moving average, and other methods [13]. Bey et al. proposed a fuzzy reasoning system based on fuzzy clustering and an adaptive network, but its prediction result is affected by the number of clusters [14]. Qian Shengpan et al. proposed a multi-step online prediction model based on a deep recurrent neural network encoder-decoder, which can predict the future multi-step host load value by collecting online real-time data [15]. In the experiment of this model, only the CPU utilization is considered. However, the load of the host is affected by many factors, so the generalization ability of this method needs to be verified.
The advantage of artificial intelligence methods is that the accuracy of their prediction results is generally better than that of traditional linear regression and time series prediction models, and that they are more suitable for large-scale data processing. Because the performance of these models is easily affected by model parameter settings, research on improving model initialization has become a hot topic. From this point of view, this paper studies the optimization of the cloud service load prediction model.
At present, the mainstream artificial intelligence methods in the field of prediction research include logistic regression, Naive Bayes, random forest, SVM, and the Back Propagation neural network (BP neural network). The logistic regression algorithm cannot be used to solve nonlinear problems [16]. The Naive Bayes algorithm is limited to classification prediction [17]. The random forest algorithm cannot make predictions beyond the data range of the training set, and it may over-fit when modeling data containing certain specific noises [18]. When the support vector machine faces a large-scale data set, the storage and calculation of the data consume a lot of machine memory and computing time [19]. The BP neural network is a multi-layer feed-forward neural network consisting of an input layer, a hidden layer, and an output layer, with interconnections between the nodes. The BP neural network has strong nonlinear processing ability, generalization ability, fault tolerance, adaptability, and self-learning ability. The model is not only easy to implement, but also widely applied. At present, the BP neural network has been successfully applied to prediction problems in different fields, especially future resource use prediction [20], traffic prediction [21], and demand prediction [22] in cloud data centers. Compared with other artificial intelligence methods, the application and performance of the BP neural network are more prominent in prediction research. Based on the above analysis, this paper uses the BP neural network model as the basic model of load prediction, and then studies and optimizes it. The BP neural network is very sensitive to the initial weights and initial thresholds of each layer in the network, which have a great impact on its accuracy. These weights and thresholds are assigned randomly in the traditional BP neural network. Although the model continuously adjusts the weights through error back propagation to find the optimal weights and thresholds, it easily falls into a local optimum.
To solve the above problems, GWO algorithm is selected to assign initial weight and threshold of BP neural network. The GWO algorithm has the advantages of global optimization, few control parameters, and easy implementation. It is widely used in model parameter optimization and has been proven to have significant effects on the optimization of BP neural network models [23] .
After GWO algorithm's own characteristics and convergence mode are further studied, it is found that the GWO algorithm itself may converge prematurely and fall into a local optimum [24] . Therefore, this paper improves the GWO algorithm from two aspects. On the one hand, due to the characteristics of strong memory and frequent communication of particles, PSO algorithm is used to strengthen the communication ability in the optimization process of GWO algorithm, effectively preventing premature convergence. On the other hand, the convex function is introduced to replace the original linear decreasing convergence, preventing GWO algorithm from falling into local optimization. Finally, performances of the PSO-GWO-BP model in prediction accuracy, convergence speed and stability are verified by comparative experiments.

III. RELATED THEORIES
A. GRAY WOLF OPTIMIZATION ALGORITHM
The Gray Wolf Optimization (GWO) algorithm is a swarm intelligence optimization algorithm proposed by Mirjalili et al. The core of the algorithm is a simulation of the hunting process of gray wolves [25]. Because it is simple, easy to implement, and has few control parameters, the GWO algorithm is widely used in many fields: economic transportation [26]; workshop scheduling [27]; optimization of model parameters, such as PID controllers [28] and Support Vector Machines (SVM); and hybrid algorithm design, such as the gray wolf-bat (GWO-BA) optimization algorithm [29] and the hybrid gray wolf-genetic (GWO-GA) optimization algorithm [30].
In the GWO algorithm, wolves are divided into four ranks, from high to low: the first rank are named as wolves α, the second rank as wolves β, the third rank as wolves δ, and the fourth rank as wolves ω. Wolves of each rank must strictly obey the leadership of the previous rank. The gray wolf pack determines the prey position, evaluates the distance, and adjusts the best hunting position, and then repeats this process until the prey is captured successfully. The whole algorithm contains three main stages: encirclement, hunting, and attack. Its specific process is as follows.
(1) Encirclement stage: The main purpose is to identify the target prey and then encircle them. The mathematical model is as follows.
where $i$ is the current iteration number; $X_j(i)$ is the position of the prey in iteration $i$; $X(i)$ is the position of an individual gray wolf in iteration $i$; and $D$ denotes the distance between the prey and the gray wolf. The position update formula of the gray wolf is as follows.
where $A$ and $C$ are coefficients; $r_1$ and $r_2$ are random numbers in the interval [0, 1]; and $a$ is the convergence factor, whose value decreases linearly from 2 to 0 as the number of iterations increases.
(2) Hunting stage: After the encirclement, wolves α lead wolves β and wolves δ in the hunt, while wolves ω update their positions under the guidance of wolves α, β, and δ. The mathematical model of the position update is as follows, where $X_j(i+1)$ is the optimal solution for iteration $i+1$, and $C_1$, $C_2$, and $C_3$ are coefficients whose values are calculated according to formula (5).
(3) Attack stage: The target is captured, and the optimal solution is obtained. As the convergence factor $a$ decreases linearly from 2 to 0, the value of $A$ in formulas (2) and (3) varies within [-2, 2]. When |A|≥1, the pack is still in the global search stage; when |A|<1, the wolf pack launches an attack to capture the target that is already locked. The flowchart of the GWO algorithm is shown in Figure 1.
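The three stages can be illustrated with a minimal Python sketch. It assumes the standard GWO update rules, which agree with the symbol definitions above: D = |C·X_p − X|, X' = X_p − A·D, A = 2a·r_1 − a, C = 2·r_2, with a decreasing linearly from 2 to 0, and each wolf moving to the average of the three positions suggested by α, β, and δ. The function name `gwo_minimize`, its parameters, and the bound clipping are illustrative choices, not taken from the paper.

```python
import numpy as np

def gwo_minimize(fitness, dim, lower, upper, n_wolves=20, max_iter=100, seed=0):
    """Minimize `fitness` with the standard GWO rules (assumed form, see lead-in)."""
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(lower, upper, size=(n_wolves, dim))
    # Best three positions found so far: alpha, beta, delta.
    leaders = np.zeros((3, dim))
    leader_scores = np.full(3, np.inf)
    for i in range(max_iter):
        scores = np.array([fitness(w) for w in wolves])
        # Re-rank the current pack together with the previous leaders.
        pool = np.vstack([leaders, wolves])
        pool_scores = np.concatenate([leader_scores, scores])
        top = np.argsort(pool_scores)[:3]
        leaders, leader_scores = pool[top].copy(), pool_scores[top].copy()
        # Convergence factor decreases linearly from 2 towards 0.
        a = 2.0 * (1.0 - i / max_iter)
        suggested = np.zeros_like(wolves)
        for leader in leaders:
            r1 = rng.random((n_wolves, dim))
            r2 = rng.random((n_wolves, dim))
            A = 2.0 * a * r1 - a             # |A| >= 1: global search, |A| < 1: attack
            C = 2.0 * r2
            D = np.abs(C * leader - wolves)  # encirclement distance to this leader
            suggested += leader - A * D      # position suggested by this leader
        # Hunting: every wolf moves to the mean of the three suggestions.
        wolves = np.clip(suggested / 3.0, lower, upper)
    return leaders[0], float(leader_scores[0])
```

For example, `gwo_minimize(lambda x: float(np.sum(x ** 2)), dim=5, lower=-10.0, upper=10.0)` returns the α position and its fitness on a simple sphere function.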

B. PARTICLE SWARM OPTIMIZATION ALGORITHM
The Particle Swarm Optimization (PSO) algorithm is also known as the particle swarm algorithm. In this algorithm, particles keep searching the space for optimal positions and store what they have found. Each particle remembers the best position it has visited, called pbest (the particle's best), and by constantly communicating with the other particles the swarm shares the best position found by any particle, called gbest (the global best), which allows the optimal solution to be found quickly. The iterative update process of the particle swarm is as follows: firstly, the initial population is generated at random; then, the particles start to search for pbest and gbest, and the position information of the particles in the swarm is continuously stored in memory.
Finally, the iterative model of position update of all particles is as follows.
where i indexes the particles in the particle swarm; j is the iteration step performed;
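The update process described in this subsection can be sketched in Python as follows. The sketch assumes the common inertia-weight form of PSO, v ← w·v + c1·r1·(pbest − x) + c2·r2·(gbest − x) followed by x ← x + v; the values of `w`, `c1`, and `c2` and the function names are illustrative assumptions rather than settings from the paper.

```python
import numpy as np

def pso_step(pos, vel, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    """One velocity and position update for the whole swarm (assumed standard form)."""
    r1 = rng.random(pos.shape)
    r2 = rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    return pos + vel, vel

def pso_minimize(fitness, dim, lower, upper, n_particles=20, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # The initial population is generated at random.
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    # Memory: best position seen by each particle and by the swarm.
    pbest = pos.copy()
    pbest_scores = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmin(pbest_scores)].copy()
    for _ in range(max_iter):
        pos, vel = pso_step(pos, vel, pbest, gbest, rng)
        pos = np.clip(pos, lower, upper)
        scores = np.array([fitness(p) for p in pos])
        improved = scores < pbest_scores
        pbest[improved] = pos[improved]
        pbest_scores[improved] = scores[improved]
        gbest = pbest[np.argmin(pbest_scores)].copy()
    return gbest, float(pbest_scores.min())
```

Because pbest and gbest are only overwritten by better positions, the returned fitness can never be worse than that of the best particle in the initial population.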

A. RESOURCE LOAD MODEL
The resource load reflects the current working status of the cloud server. The pressure on the processing capacity of the cloud server increases as the resource load value increases, and decreases as it falls. The load of a cloud server is affected by many factors. This paper mainly considers five factors affecting the resource load [31]: CPU utilization, memory utilization, disk space utilization, the number of incoming network packets, and the number of outgoing network packets. The load value at each time point is calculated as follows: firstly, to make the weight determination more objective, the entropy weighting method is used to assign weights to the above five factors. Then the weight of each factor is multiplied by the resource utilization rate of the factor at each moment. Finally, the resource load value of each moment is calculated by Equation (15).
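As an illustration of this calculation, the following Python sketch uses the textbook entropy weight method: each factor column is turned into proportions, its information entropy is computed, and factors with lower entropy (more variability) receive larger weights. It assumes non-negative indicator values with a positive sum per factor and assumes the indicators have already been scaled to comparable ranges before the weighted sum; the function names are hypothetical, and the exact form of Equation (15) is not reproduced here.

```python
import numpy as np

def entropy_weights(X):
    """Entropy weights for the columns of X (rows: time points, columns: factors)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Proportion of each time point within its factor column.
    P = X / X.sum(axis=0)
    # Information entropy per factor, treating 0 * log(0) as 0.
    logP = np.log(np.where(P > 0, P, 1.0))
    e = -(P * logP).sum(axis=0) / np.log(n)
    # Lower entropy means more variability, hence a larger weight.
    d = 1.0 - e
    return d / d.sum()

def resource_load(X, weights):
    """Weighted sum of the factor values at each time point."""
    return np.asarray(X, dtype=float) @ np.asarray(weights, dtype=float)
```

A factor that never changes carries no information under this scheme: its entropy is 1 and its weight is effectively zero.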

B. APPLICATION OF PSO AND CONVEX FUNCTION
Although the GWO algorithm converges quickly when searching for the optimum, it relies mainly on the leadership of wolves α, β, and δ, so the wolf pack as a whole communicates little and is prone to problems such as premature convergence [32]. In addition, the convergence speed of the gray wolves should differ at each stage according to the main task of that stage, so the search for the optimum depends heavily on the convergence factor a. It can be seen from formula (5) that the convergence factor decreases linearly, so the convergence speed is constant throughout the whole process, which may cause the wolf pack to miss part of the search range and converge prematurely to a local optimum [33]. To overcome these two shortcomings, the following approach is used for optimization in this paper.
(1) Using the PSO algorithm to optimize the position update of the wolf pack
The PSO algorithm, with its memory and frequent communication, can compensate for the premature convergence caused by the limited communication among wolves during the optimization process. Based on the PSO mathematical model above, the position update of the gray wolves can be improved as follows.
where $c_1$, $c_2$, and $c_3$ are optimization parameters and ω is the inertia coefficient. The pseudo-code is shown in Table I.
(2) Introducing a convex function to improve the traditional linear convergence of the GWO algorithm
In the early stage, the wolf pack mainly searches for the prey, which requires global search, so the convergence speed should be slowed down. In the later stage, the pack captures the prey, which requires local search, so the convergence speed should be accelerated to prevent the prey from escaping. Therefore, a convex function is introduced in this paper to improve the convergence of the GWO algorithm as follows.
Based on the above analysis and discussion, this paper designs and proposes the PSO-GWO-BP prediction model based on the PSO algorithm, the GWO algorithm, and the BP neural network. Analysis of the GWO model shows that the hierarchy and leadership style of the wolf pack may cause premature convergence in the optimization process, and that the traditional linear convergence makes the wolf pack's search prone to error. Therefore, this paper first introduces the PSO algorithm and a convex function to solve these two problems respectively. The improved GWO algorithm is then applied to the traditional BP neural network model as a new way of assigning weights and thresholds, and a new cloud server resource load prediction model is constructed, which is named the PSO-GWO-BP prediction model. The basic framework and execution of the PSO-GWO-BP model are shown in Figure 4.
Step 1: Initialize the PSO-GWO-BP model, and determine the topological structure of PSO-GWO-BP.
Step 2: Calculate the initial weights and thresholds of the PSO-GWO-BP model.
Step 3: Call the improved GWO algorithm. The above-mentioned initial weights and thresholds are used as optimization targets.
Step 4: Run the improved GWO algorithm. Firstly, the relevant parameters are initialized, such as the size of the wolf pack, the maximum number of iterations, and the search range. Then the fitness value of each gray wolf is calculated. Finally, the α wolf, β wolf, and δ wolf are determined by comparing the fitness values.
Step 5: Calculate the convergence factor and update the position. Firstly, the convex function is used to calculate the convergence factor by formula (21). Then the values of A and C are updated according to formulas (3) and (5). Finally, the PSO algorithm is used to update the position of each gray wolf as shown in formulas (13) and (14).
Step 6: Determine whether the termination condition (number of iterations or error) is reached. If not, Step 4 and Step 5 are repeated until the condition is met. If so, the position of the α wolf is output and mapped to the optimal initial weights and thresholds of the PSO-GWO-BP model.
Step 7: Continue the main process of the PSO-GWO-BP model. Firstly, the training part of the PSO-GWO-BP model is executed with the above-mentioned optimal initial weights and thresholds until the training termination conditions are met. Then the result is output, and the execution of the PSO-GWO-BP model is complete.
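The optimizer loop in Steps 4 to 6 can be sketched in Python under several loudly stated assumptions. Formulas (13), (14), and (21) are not reproduced here, so the sketch substitutes (a) a hypothetical quadratic schedule for the convergence factor, chosen only because it decays slowly at first and quickly near the end as the text describes, and (b) a PSO-style velocity update in the form commonly used in hybrid PSO-GWO variants, v ← ω·(v + Σ c_k·r_k·(X_k − x)), where X_k is the position suggested by the k-th leader. The names `nonlinear_a` and `improved_gwo` and all parameter values are illustrative, and only the iteration-count termination condition is implemented.

```python
import numpy as np

def nonlinear_a(i, max_iter):
    """Hypothetical stand-in for formula (21): slow decay early, fast decay late."""
    return 2.0 * (1.0 - (i / max_iter) ** 2)

def improved_gwo(fitness, dim, lower, upper, n_wolves=20, max_iter=100,
                 w=0.5, c=(0.5, 0.5, 0.5), seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lower, upper, size=(n_wolves, dim))
    vel = np.zeros_like(pos)
    leaders = np.zeros((3, dim))
    leader_scores = np.full(3, np.inf)
    for i in range(max_iter):
        # Step 4: evaluate fitness and determine the alpha, beta, delta wolves.
        scores = np.array([fitness(p) for p in pos])
        pool = np.vstack([leaders, pos])
        pool_scores = np.concatenate([leader_scores, scores])
        top = np.argsort(pool_scores)[:3]
        leaders, leader_scores = pool[top].copy(), pool_scores[top].copy()
        # Step 5: convergence factor, then A and C, then a PSO-style move.
        a = nonlinear_a(i, max_iter)
        pull = vel.copy()
        for k, leader in enumerate(leaders):
            A = 2.0 * a * rng.random((n_wolves, dim)) - a
            C = 2.0 * rng.random((n_wolves, dim))
            X_k = leader - A * np.abs(C * leader - pos)
            pull += c[k] * rng.random((n_wolves, dim)) * (X_k - pos)
        vel = w * pull
        pos = np.clip(pos + vel, lower, upper)
    # Step 6: the alpha position is the candidate set of initial weights and thresholds.
    return leaders[0], float(leader_scores[0])
```

In the prediction model, `fitness` would evaluate a BP network whose weights and thresholds are decoded from the wolf position, and the returned α position would seed the training in Step 7.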

D. EVALUATION CRITERIA OF PREDICTION MODEL
To verify the performance of the PSO-GWO-BP resource load prediction model for cloud servers proposed in this paper, the experiment compares it with the SVM model, the traditional BP neural network model, the BAS-BP model [34], the GWO-BP model [35], and the SSA-BP model [36] in terms of the Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Mean Square Error (MSE) [37]. The performance evaluation indexes of the six models are given by formulas (22)-(24). In addition, this paper also uses formula (24) in the calculation of the fitness value.
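For reference, the three indexes can be computed as in the short Python sketch below, which uses their usual definitions. Reporting MAPE as a percentage is an assumption about the convention used in formulas (22)-(24), and the function names are illustrative.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent; y_true must be non-zero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def mse(y_true, y_pred):
    """Mean Square Error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))
```

MSE penalizes large deviations more heavily than MAE, while MAPE expresses the error relative to the magnitude of the true load.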

V. SIMULATION EXPERIMENTS
A. EXPERIMENTAL DATA
The dataset used in this experiment is derived from the Alibaba cluster-trace-v2018 trace dataset [38]. This dataset records the resource usage of each machine at different points in time, together with information about the instances in the batch workload. This experiment uses the recorded usage of resources such as CPU, memory, and disk space for each machine at different times of the day.

B. DATA PRE-PROCESSING
This experiment randomly selects one machine from the machine usage dataset for a certain day and examines its trace data. Firstly, five factors, namely CPU utilization, memory utilization, disk space utilization, the number of incoming network packets, and the number of outgoing network packets, are used as the evaluation factors of the resource load of the cloud server. Then the entropy weighting method is used to calculate the information entropy and the weight of each factor, as shown in Table II. The entropy weighting method assigns weights according to the variability of the data, so it reflects the relative weight of each factor more objectively. Finally, the weight of each factor is multiplied by the actual value of the factor, and the products are summed to obtain the resource load value of the cloud server. Analysis of the raw data shows that resource utilization changes little within the same minute. Therefore, to reflect the variability of the data, this experiment samples the data by minute and takes the peak within each minute as the load value of the node for that minute. The training data and test data account for 80% and 20% of the samples respectively. This experiment takes CPU utilization, memory utilization, disk space utilization, and the numbers of incoming and outgoing network packets as the input values, and the cloud server resource load value as the output value. To prevent instability in the input data and differences in magnitude between the factors from affecting model training, this experiment normalizes the data so that all values lie in [-1, 1]. The principle is as follows.
Where X is the original input;
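A minimal Python sketch of this pre-processing step is given below. It assumes the usual min-max mapping onto [-1, 1], y = 2·(x − x_min)/(x_max − x_min) − 1, which is consistent with the stated target range but is not confirmed as the paper's exact formula. It also assumes that the 80/20 split is taken in time order and that the scaling bounds are fitted on the training part only; both are illustrative choices, as are the function names.

```python
import numpy as np

def fit_minmax(X):
    """Column-wise minimum and maximum of the data used for fitting."""
    X = np.asarray(X, dtype=float)
    return X.min(axis=0), X.max(axis=0)

def to_unit_range(X, x_min, x_max):
    """Map each column linearly so that [x_min, x_max] becomes [-1, 1]."""
    X = np.asarray(X, dtype=float)
    # A constant column has zero span; use 1.0 to avoid division by zero.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return 2.0 * (X - x_min) / span - 1.0

def chronological_split(X, y, train_frac=0.8):
    """Keep the first `train_frac` of the samples for training, the rest for testing."""
    cut = int(len(X) * train_frac)
    return X[:cut], X[cut:], y[:cut], y[cut:]
```

Fitting the bounds on the training portion alone avoids leaking information from the test period, at the cost that test values may fall slightly outside [-1, 1].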

C. DETERMINING STRUCTURE AND PARAMETER OF NEURAL NETWORK
Based on the number of input and output variables, the numbers of neurons in the input and output layers of the network are 5 and 1 respectively. The number of neurons in the hidden layer is calculated as follows.
$M = \sqrt{N + L} + \alpha$ (26)
where M denotes the number of neurons in the hidden layer, N denotes the number of neurons in the input layer, L denotes the number of neurons in the output layer, and α is a constant in [1, 10].
According to formula (26), the number of neurons in the hidden layer lies in [4, 13]; the whole network structure includes three layers. After several training experiments, it is found that the best output is achieved when the number of neurons in the hidden layer is 11. The maximum number of iterations is 500; the training error target is 0.001; and the learning rate is 0.01.
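The arithmetic behind this structure can be checked with a short Python sketch. It assumes that formula (26) is the common empirical rule M = sqrt(N + L) + α and that the result is rounded up, which reproduces the stated range [4, 13] for N = 5 and L = 1. The sketch also shows one hypothetical way to decode a flat position vector (such as a wolf position) into the weights and thresholds of a 5-11-1 network and to score it by MSE; the tanh hidden activation, linear output, and parameter ordering are assumptions.

```python
import math
import numpy as np

def hidden_layer_range(n_in, n_out, alpha_min=1, alpha_max=10):
    """Range of hidden-layer sizes from M = sqrt(N + L) + alpha, rounded up."""
    base = math.sqrt(n_in + n_out)
    return math.ceil(base + alpha_min), math.ceil(base + alpha_max)

def parameter_count(n_in, n_hidden, n_out):
    """Total number of weights and thresholds in a three-layer network."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

def decode(theta, n_in, n_hidden, n_out):
    """Split a flat vector into (W1, b1, W2, b2) in a fixed, assumed order."""
    theta = np.asarray(theta, dtype=float)
    i = 0
    W1 = theta[i:i + n_in * n_hidden].reshape(n_in, n_hidden)
    i += n_in * n_hidden
    b1 = theta[i:i + n_hidden]
    i += n_hidden
    W2 = theta[i:i + n_hidden * n_out].reshape(n_hidden, n_out)
    i += n_hidden * n_out
    b2 = theta[i:i + n_out]
    return W1, b1, W2, b2

def mse_fitness(theta, X, y, n_hidden=11):
    """MSE of the network defined by `theta` on inputs X and targets y."""
    X = np.asarray(X, dtype=float)
    W1, b1, W2, b2 = decode(theta, X.shape[1], n_hidden, 1)
    hidden = np.tanh(X @ W1 + b1)
    pred = (hidden @ W2 + b2).ravel()
    return float(np.mean((pred - np.asarray(y, dtype=float)) ** 2))
```

With 5 inputs, 11 hidden neurons, and 1 output, the optimizer therefore searches a 78-dimensional space of initial weights and thresholds.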

D. ANALYSIS OF EXPERIMENTAL RESULTS
(1) Analysis of the prediction accuracy of the models
Table III summarizes the values of the error indexes (MAE, MAPE, MSE) in the prediction results of the above six models, which are calculated according to formulas (22)-(24). The results in Table III are reported so as to limit the instability of a single experiment and the impact of chance events.
(2) Analysis of the convergence speed and stability of the models
The BAS-BP model, GWO-BP model, SSA-BP model, and PSO-GWO-BP model are improved prediction models based on the traditional BP neural network. The time consumed in the optimization process of these models determines their respective convergence speeds. Commonly, convergence speed is judged by the time or the number of iterations needed for the fitness curve to reach a plateau. In this paper, the number of iterations is used to evaluate the convergence speed of each prediction model. Meanwhile, the trend of the fitness curve intuitively reflects the stability of the prediction model, which makes it easy to judge whether the model falls into a local optimum during the optimization process.
Figure 5 shows the fitness curves used to analyze the convergence speed and stability of the above four models. The fitness value of the BAS-BP model fluctuates over a large range, which indicates that the model converges slowly and is not stable. The fitness value of the GWO-BP model decreases rapidly in the first 10 iterations and then gradually flattens. Compared with the BAS-BP model, the GWO-BP model has better convergence speed and stability, but it is worse in both respects than the SSA-BP model and the PSO-GWO-BP model. The fitness curve of the SSA-BP model flattens after 5 iterations, indicating fast convergence and strong stability. However, the minimum fitness value of this model does not reach the minimum shown in Figure 5, indicating that the model may have fallen into a local optimum. After about 25 iterations, the fitness value of the PSO-GWO-BP model changes very slowly and stabilizes, and its fitness value is lower than those of the BAS-BP model, GWO-BP model, and SSA-BP model. Therefore, the convergence speed and stability of the PSO-GWO-BP model are much better than those of the above-mentioned models. In conclusion, the PSO-GWO-BP model proposed in this paper outperforms the SVM model, the traditional BP neural network model, the BAS-BP model, the GWO-BP model, and the SSA-BP model in prediction accuracy, convergence speed, and stability.

VI. CONCLUSION
To avoid the one-sidedness of using a single indicator to reflect the resource load of a cloud server, this paper adopts the entropy weighting method to calculate the weight coefficients of five factors affecting the resource load, and then calculates the expected value of the resource load. In addition, this paper uses the PSO algorithm and the GWO algorithm to optimize the traditional BP neural network, and designs and constructs the PSO-GWO-BP cloud server resource load prediction model. Experiments are conducted using the Alibaba cluster-trace-v2018 dataset, and a comparative analysis of the prediction results against the SVM model, the traditional BP neural network model, the BAS-BP model, the GWO-BP model, and the SSA-BP model verifies that the PSO-GWO-BP model proposed in this paper is much better than these five models. Therefore, this model can predict the overall trend of cloud server resource load more accurately and can be applied to monitoring and performance optimization of cloud servers in data centers. The PSO-GWO-BP model is limited by the data acquisition method and the implementation environment of the algorithm, and at present it can only process offline data. In future research, the experimental conditions will be upgraded, and the PSO-GWO-BP model will then be improved and deployed on a real-time computing platform.