Many-Objective Cloud Task Scheduling

The task scheduling problem concerns how to reasonably assign the many tasks submitted by users to virtual machines, and it is central to cloud computing. The quality of scheduling directly affects both customer satisfaction and provider benefits. To describe the cloud task scheduling problem more precisely and to improve scheduling performance, this paper establishes a many-objective cloud scheduling model with four objectives: minimizing time, minimizing cost, maximizing resource utilization, and balancing load. A many-objective optimization algorithm based on hybrid angles (MaOEA-HA) is proposed to solve this model. Its hybrid angle strategy combines two angle measures: the individual-to-individual angle and the individual-to-reference-point angle. A one-by-one elimination strategy is introduced to retain the individuals with better performance. Compared with five other advanced many-objective optimization algorithms, MaOEA-HA shows the best performance on the DTLZ test suite. Moreover, when the algorithms are applied to the cloud task scheduling problem, MaOEA-HA achieves the best results.


I. INTRODUCTION
With the rapid development of computer technology [1]-[3], more and more individuals and enterprises in the IT industry [4], [5] rely on cloud services. Cloud computing, with its on-demand service mode, ultra-large scale, virtualization, and high reliability, offers great convenience. It is defined as a commercial distributed computing and service model that provides users with convenient computing, storage, and other information services via the Internet. Since cloud computing was proposed in 2006, it has become the representative technology of the third wave of information technology. Although scholars have done a great deal of work on the cloud computing environment, it remains a very active research topic. Cloud services have also been applied in many industries. The results of the RightScale 2019 State of the Cloud Report from Flexera are shown in Fig. 1. Tech services account for the largest share, 30%; software, financial services, and business services account for 7%, 8%, and 6%, respectively; and other industries account for 20%. In particular, education and healthcare account for 8%, which means the cloud is present in every aspect of our lives. Many IT enterprises have made great achievements in cloud computing, such as Amazon AWS, Microsoft Azure, Alibaba Cloud, Google Cloud, and IBM Cloud. Specifically, scheduling issues [6], including task scheduling and resource allocation, are critical to maximizing provider benefits. It is therefore important to arrange tasks properly across the different virtual machines in the cloud computing environment.
The associate editor coordinating the review of this manuscript and approving it for publication was Minho Jo.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Because of the complexity of tasks and the heterogeneity of resources, large-scale task scheduling is an NP-hard problem with great research value. A large number of scholars have studied the cloud computing task scheduling problem in depth. Early work relied on traditional scheduling algorithms such as Round-Robin, Min-Min, Max-Min, and Backfilling. These are the simplest scheduling algorithms. Although they are easy to implement, they cannot address all aspects of task scheduling simultaneously. Therefore, researchers began to use swarm intelligence algorithms [7], [8] to solve the problem, such as particle swarm optimization (PSO) [9], ant colony optimization (ACO) [10], the bat algorithm (BA), the firefly algorithm (FA) [11], [12], and cuckoo search (CS) [13], [14]. Many research results show that intelligent algorithms are superior in solving scheduling problems. Meanwhile, population-based multi-objective evolutionary algorithms [15], [16], such as NSGAII [19] and MOEA/D [20], [21], are better suited to practical problems [17], [18] because they can handle multiple objectives at the same time and produce a set of optimal solutions. However, multi-objective evolutionary algorithms [22]-[24] are only suitable when the number of objectives is no more than three. It is well known in evolutionary computation that, as the number of objectives increases, it becomes difficult to obtain satisfactory solutions [25]. Many-objective optimization algorithms [26], [27] have been proposed to solve problems with four or more objectives.
In many-objective algorithms [28], [29], performance is closely related to the initial population. Since the quality of the parents limits the range of the offspring, a strong parent population helps to produce better offspring. Diversity and convergence are the two important indicators for measuring the performance of such an algorithm. Considering only convergence without diversity makes the solutions gather in a certain region while other, better solutions are ignored. Considering only diversity without convergence spreads the solutions over the whole solution space without good performance. How to balance convergence and diversity is therefore a pivotal problem in many-objective optimization.
Previous research focused on only two or three objectives, such as time and cost. To describe the cloud computing task scheduling problem more accurately, this paper considers the scheduling requirements from multiple perspectives, including time, cost, resource utilization, and load, and establishes a many-objective task scheduling model. In addition, to obtain a set of satisfactory compromise solutions and better scheduling performance, a many-objective optimization algorithm based on a hybrid angle strategy (MaOEA-HA) is designed, which better balances the convergence and diversity of the population.
The remainder of this paper is arranged as follows. Section 2 reviews related work on task scheduling in the cloud. Section 3 introduces the proposed model in detail. Section 4 presents the strategies of the proposed algorithm. Section 5 reports the experimental results together with a detailed analysis. Finally, Section 6 summarizes the conclusions of our work.

II. RELATED WORKS
In recent years, many scholars have studied the cloud computing task scheduling problem. Wang [30] proposed a task scheduling algorithm based on constrained discrete particle swarm optimization that comprehensively considers the task deadline, scheduling budget, and reliability. However, there is still room for improvement in terms of load and resource utilization. Based on the particle swarm optimization algorithm, Mapetu et al. [31] designed a load balancing strategy for updating particle positions, which performs better in task scheduling and load balancing. Chen and Long [32] combined the particle swarm algorithm and the ant colony algorithm, adjusting the learning factors to optimize task scheduling with respect to fitness, cost, and operation cycle. Panda and Jana [33] focused on load balancing and proposed a cloud task scheduling algorithm based on probability theory. It balances the load of the virtual machines well under four performance indicators: virtual machine load standard deviation, maximum load, minimum load, and zero load. Qiao and Lin [34] used k-means clustering and an improved bat algorithm to enhance scheduling performance when task scheduling resources are allocated unevenly. Li et al. [35] applied the ant colony optimization algorithm to obtain better scheduling results with shorter total and average task completion times. Considering the data files that jobs require from public or private clouds, Taheri et al. [36] proposed a bi-objective job scheduling optimization model that uses particle swarm optimization to minimize job execution time and data file transfer time.
Because task scheduling is an NP-hard optimization problem with heterogeneous characteristics, considering only a few constraints cannot satisfy the needs of users and resource providers, so the need for multi-objective and multi-constrained optimization is intensifying. Manikandan and Pravin [37] proposed a multi-objective optimization algorithm that combines the lion swarm optimization algorithm and the gravitational search algorithm. The objectives include low resource consumption, low cost, and low energy consumption. He et al. [38] studied resource utilization, task completion time, average cost, and average energy consumption, and designed an adaptive acceleration coefficient to optimize this model. Pradeep and Jacob [39] organized and optimized resource allocation and task scheduling, combining the cuckoo search and harmony search algorithms to improve the efficiency of the scheduling process. On this basis, they proposed a new multi-objective function that includes cost, energy consumption, and storage, and their experimental results show that the proposed algorithm performs better. Abdullahi et al. [40] proposed a multi-objective symbiotic organism search algorithm based on a chaos optimization strategy, which achieved significant improvements in time and cost. Sanaj and Prathap [41] proposed a chaotic squirrel search algorithm to optimize multi-task scheduling in an infrastructure-as-a-service cloud environment. Srichandan et al. [42] explored a hybrid method that combines the genetic algorithm and the bacterial foraging algorithm to optimize cloud task scheduling with respect to time and energy consumption. Jena [43] argued that task scheduling is important for reducing power consumption and processing time, and proposed a multi-objective nested particle swarm algorithm.
Analyzing the characteristics of the above studies, it can be found that most scholars have considered only a few objectives concerning user needs and provider interests in cloud task scheduling, which cannot accurately describe the problem. Moreover, multi-objective optimization algorithms are designed for problems with few objectives and are no longer suitable for problems with many objectives. Therefore, it is necessary to design a many-objective model for task scheduling in the cloud, together with a many-objective optimization algorithm to solve the model.

III. THE PROPOSED MODEL
In this section, the proposed model is described in detail. To begin with, task execution time is a basic indicator that clearly reflects scheduling performance. Since the virtual machines execute tasks in parallel, the total execution time is not the sum of the execution times of all virtual machines. Instead, we take the maximum execution time over the virtual machines as the total execution time. We assume that the order of the tasks is not considered. The time can be expressed as:

$$Total\_time = \max_{i} \; Time\_VM_i$$

$$Time\_VM_i = \sum_{k=1}^{N_i} \frac{TL_k}{Vm_i}$$

where Total_time is the total execution time of the tasks, Time_VM_i is the time needed to execute all tasks assigned to the i-th virtual machine, N_i is the number of tasks on the i-th virtual machine, TL_k is the length of the k-th task, and Vm_i is the MIPS rating of the i-th virtual machine.
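The time objective described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the list-based encoding of an assignment, and the example values are all hypothetical.

```python
def vm_times(task_lengths, assignment, vm_mips):
    """Time_VM for each VM: summed length of its tasks divided by its MIPS."""
    times = [0.0] * len(vm_mips)
    for length, vm in zip(task_lengths, assignment):
        times[vm] += length / vm_mips[vm]
    return times


def total_time(task_lengths, assignment, vm_mips):
    """Total_time: the largest per-VM time, because the VMs run in parallel."""
    return max(vm_times(task_lengths, assignment, vm_mips))


# Hypothetical example: four tasks placed on two VMs.
lengths = [500, 1500, 2500, 3500]   # task lengths (illustrative values)
placement = [0, 1, 0, 1]            # index of the VM that runs each task
mips = [1000, 2000]                 # VM speeds in MIPS (illustrative values)
print(vm_times(lengths, placement, mips))
print(total_time(lengths, placement, mips))
```

Encoding a schedule as one VM index per task keeps every candidate solution the same length, which is convenient for crossover and mutation in an evolutionary algorithm.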
In practice, pricing involves both bandwidth and traffic (rate of flow). To facilitate simulation, and with reference to Tencent Cloud's pricing method, a new pricing model that combines bandwidth and traffic is given. If the bandwidth utilization is below 10%, the cost is charged according to traffic; if the bandwidth utilization is above 10%, the cost is charged according to bandwidth. In the cost description, filesize and outputsize are predefined in the parameter settings in Section 5, bw_k is the bandwidth of the k-th task, and R_bw is the bandwidth utilization. The constants 0.08, 0.063, and 0.025 are inspired by Tencent Cloud's pricing. Unlike the traditional charging mode, this mode adds a bandwidth utilization constraint to cloud scheduling, making the scheduling results better suited to user needs.
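The two-regime charging rule can be illustrated with a short Python sketch. Several details are assumptions rather than facts from the text: traffic is taken to be filesize plus outputsize, each regime uses a single flat rate, a utilization of exactly 10% is charged by bandwidth, and the rates in the example are placeholders because the text does not state which of the three constants applies to which term.

```python
def task_cost(file_size, output_size, bandwidth, utilization,
              traffic_rate, bandwidth_rate, threshold=0.10):
    """Cost of one task under the two-regime rule.

    Assumptions (not from the source): traffic = file_size + output_size,
    and each regime is billed at one flat rate.
    """
    if utilization < threshold:
        # Low bandwidth utilization: charge by the amount of data moved.
        return traffic_rate * (file_size + output_size)
    # Otherwise: charge by the bandwidth of the task.
    return bandwidth_rate * bandwidth


# Placeholder rates, chosen only to make the arithmetic easy to follow.
print(task_cost(100, 200, 50, 0.05, traffic_rate=0.5, bandwidth_rate=2.0))
print(task_cost(100, 200, 50, 0.20, traffic_rate=0.5, bandwidth_rate=2.0))
```

Passing the rates as parameters keeps the sketch independent of any particular provider's price list.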
In addition, a resource utilization objective is needed for task scheduling. The goal is to assign each task to the most appropriate virtual machine possible, in other words, so that the virtual machine has no unused resources. For example, if a task needs 1000M of storage, a 1024M virtual machine is its first choice. Increasing the utilization of CPU, bandwidth, and storage can significantly improve scheduling performance, which benefits cloud providers. We calculate the utilization of CPU, bandwidth, and storage separately and take their average as the utilization objective. The main idea is to calculate the resource utilization of each virtual machine and then sum the results over all virtual machines, where N_i is the number of tasks on the i-th virtual machine.
Moreover, to avoid the situation in which many tasks are allocated to the same virtual machine, a load objective is introduced. We take the execution speed, bandwidth, and storage of a virtual machine together as its execution capacity. Taking the execution time of the tasks into consideration as well, the load of the virtual machines is reflected by two ratios: the ratio of the task execution time to the capacity of each virtual machine, and the ratio of the execution time of all tasks to the execution capacity of all virtual machines. Here D is the capacity of a virtual machine, which involves its execution speed, bandwidth, and storage. Because these attributes have different units of measurement, each attribute must be normalized and mapped to a common numerical interval before values in different units can be combined. Therefore, we normalize MIPS, bandwidth, and storage by

$$Nor(x) = \frac{x - \min}{\max - \min}$$

to reflect the capacity of the virtual machine. TD_i is the ratio of time to capacity for the i-th virtual machine, and TDave is the ratio of time to capacity over all virtual machines. Thus, load balancing means that each virtual machine is used more evenly by the tasks.
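The normalization and the two load ratios can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and because the text does not say how the three normalized attributes are combined into the capacity D, the capacities are passed in directly. Note that min-max normalization maps the smallest value to 0, so whatever combination is used must keep D strictly positive.

```python
def min_max(values):
    """Nor(x) = (x - min) / (max - min) over one attribute of all VMs."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def load_ratios(vm_times, capacities):
    """Return (TD per VM, TDave) from per-VM times and capacities D."""
    td = [t / d for t, d in zip(vm_times, capacities)]
    td_ave = sum(vm_times) / sum(capacities)
    return td, td_ave


print(min_max([1000, 2000, 3000]))               # normalized MIPS values
td, td_ave = load_ratios([3.0, 2.5], [1.5, 1.0])  # illustrative times and capacities
print(td, td_ave)
```

A schedule is well balanced when every TD value lies close to TDave.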
In summary, to achieve better scheduling performance, we need to minimize cost, minimize time, improve resource utilization, and balance the load. All the objectives are listed in Eq. (12). The proposed many-objective algorithm is presented in Section 4.

IV. THE PROPOSED MANY-OBJECTIVE ALGORITHM
In this section, we give a detailed introduction to the proposed algorithm, which better balances convergence and diversity.

A. THE FRAMEWORK OF ALGORITHM
In this subsection, we provide the framework of the proposed algorithm. First, the parameter t, the reference points Z, the population size N, and the number of objectives M are initialized. The ideal point Zm, a set containing the minimum value of each objective, is determined by evaluating the objective functions. Then the iteration process starts, and it terminates when the maximum number of iterations is reached. The algorithm consists of the following three parts. First, the hybrid angle strategy is used to optimize the random population and generate a new population of size N, which has better performance and is more likely to generate better offspring. Second, the parents generate offspring through crossover and mutation operations. Finally, the offspring are combined with the parents, and the best individuals in the population are retained through the one-by-one elimination strategy. The pseudocode of the proposed algorithm is shown in Algorithm 1.

B. HYBRID ANGLE STRATEGY
In this algorithm, a new angle strategy is proposed that combines the individual-to-individual angle and the individual-to-reference-point angle. The hybrid angle strategy considers not only the minimum angle P between an individual and the other individuals, but also the minimum angle Q between the individual and the reference points. For two individuals A and B, one of them is retained by comparing their values of P. If P(A) > P(B), the angle between A and its neighbors is greater than the angle between B and its neighbors, so the diversity of A is better than the diversity of B. The same holds for the angle between the individual and the reference point. Considering the diversity of individuals from these two perspectives enhances the diversity of the population.
In Eq. (13), δ_ij represents the minimum angle between individual i and any other individual j, δ_iz denotes the minimum angle between individual i and a reference point z, and δ_zz′ is the minimum angle between reference point z and any other reference point z′. For the angle between individuals, better diversity is obtained by maximizing the minimum angle value. At the same time, the angle between the reference point and the individual is introduced to maintain the diversity of the population by maximizing the angle between the individual and the reference point to which it is assigned. δ_zz′ is mainly used to normalize this angle, so that it does not become too large or too small when the reference vectors are very dense or very sparse. The hybrid angle is combined with the Euclidean distance. First, the Euclidean distance between an individual and the non-dominated front is calculated, which serves as the convergence strategy. The longer the distance, the better the convergence. Then the hybrid angle value is obtained by Eq. (13). Finally, the convergence and diversity strategies are combined to select the individuals with better performance.
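The three angle quantities used by the hybrid angle strategy can be computed as in the Python sketch below. It covers only the ingredients (the minimum angle to other individuals, the angle to the nearest reference point, and the minimum angle between reference points); how Eq. (13) combines them is not reproduced. The function names are hypothetical, and the objective vectors are assumed to be non-zero and already translated by the ideal point.

```python
import math


def angle(u, v):
    """Angle in radians between two non-zero objective vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def min_angle_to_others(i, vectors):
    """Smallest angle from vectors[i] to any other vector in the list."""
    return min(angle(vectors[i], v) for j, v in enumerate(vectors) if j != i)


def nearest_reference(x, refs):
    """Index of, and angle to, the reference point closest in angle to x."""
    angles = [angle(x, z) for z in refs]
    k = min(range(len(refs)), key=angles.__getitem__)
    return k, angles[k]


population = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]   # illustrative objective vectors
references = [(1.0, 0.0), (0.0, 1.0)]               # illustrative reference points
print(min_angle_to_others(0, population))           # individual-to-individual angle
print(nearest_reference((2.0, 1.0), references))    # individual-to-reference angle
print(min_angle_to_others(0, references))           # reference-to-reference angle
```

An individual whose minimum angle to its neighbors is larger sits in a less crowded direction of the objective space, which is why larger values indicate better diversity.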

C. ONE-BY-ONE ELIMINATION STRATEGY
The one-by-one elimination strategy is used to select a population of size N from the combined population of size 2N. Diversity and convergence both still need to be considered. The strategy therefore uses the angle value for diversity and the Euclidean metric (the distance between an individual and the ideal point) for convergence. First, we calculate the angle values and select the two solutions with the minimum angle between them. Then, the Euclidean metrics of these two solutions are compared: the solution that satisfies both diversity and convergence is retained, and the one with the larger value is eliminated. This method not only contributes more to the convergence of the entire population, but also improves its diversity.
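The elimination loop can be sketched in Python as follows. This is a simplified reading of the description above, not the authors' code: it uses the plain pairwise angle rather than the hybrid angle value, assumes minimization with every point distinct from the ideal point, and uses hypothetical function names.

```python
import math


def _shift(p, ideal):
    """Translate an objective vector so that the ideal point is the origin."""
    return [a - b for a, b in zip(p, ideal)]


def _norm(v):
    return math.sqrt(sum(a * a for a in v))


def _angle(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot / (_norm(u) * _norm(v)))))


def eliminate_one_by_one(points, ideal, n_keep):
    """Repeatedly drop one member of the closest-angle pair until n_keep remain."""
    pts = [tuple(p) for p in points]
    while len(pts) > n_keep:
        shifted = [_shift(p, ideal) for p in pts]
        # Find the pair of solutions separated by the smallest angle.
        _, a, b = min(
            (_angle(shifted[i], shifted[j]), i, j)
            for i in range(len(pts))
            for j in range(i + 1, len(pts))
        )
        # Of the two, eliminate the one farther from the ideal point.
        drop = a if _norm(shifted[a]) > _norm(shifted[b]) else b
        pts.pop(drop)
    return pts


combined = [(1.0, 0.1), (2.0, 0.2), (0.1, 1.0), (0.5, 0.5)]  # illustrative values
print(eliminate_one_by_one(combined, ideal=(0.0, 0.0), n_keep=3))
```

In the example, the first two points lie in the same direction, so they form the closest pair, and the one farther from the ideal point is removed.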

V. EXPERIMENTAL RESULTS AND ANALYSIS
First, the proposed algorithm is compared with other advanced many-objective optimization algorithms on the DTLZ test suite to verify its performance. Then, these algorithms are applied to cloud task scheduling, and the performance of the algorithm is further tested by analyzing the scheduling results.

A. THE PERFORMANCE OF DIFFERENT ALGORITHMS ON DTLZ TEST SUITE
DTLZ is a common test suite for many-objective algorithms, and its problems have different characteristics. The Inverted Generational Distance (IGD) [44] is used as the performance indicator. IGD is the average distance from a set of points sampled on the true Pareto front to the nearest solution obtained by the algorithm, so a smaller IGD value indicates better convergence and diversity. The compared algorithms are NSGAIII [45], GrEA [46], KnEA [47], VaEA [48], [49], and Two_Arch2 [49]. The number of objectives is set to 4, 6, 8, 10, and 15, and the corresponding population sizes are 120, 132, 156, 275, and 135. The crossover probability is 1/20, the mutation probability is 1/20, each algorithm is run independently 30 times, and the maximum number of iterations is 10000. Table 1 shows the comparison results of MaOEA-HA with the five other advanced many-objective optimization algorithms on the DTLZ test suite, where "+/−/=" indicates a better, worse, or equal value compared with MaOEA-HA, respectively. The best values are marked in bold in the table. The results show that MaOEA-HA obtains the best value in more cases. Compared with NSGAIII, the proposed algorithm has obvious advantages in all cases. Compared with GrEA and KnEA, MaOEA-HA is slightly worse in only one case and performs equally in two cases. Compared with VaEA and Two_Arch2, the difference is small in only one case, and MaOEA-HA performs better in the others. In general, MaOEA-HA performs better in solving many-objective problems. The reason is that the chosen strategies better balance convergence and diversity. Therefore, we apply the proposed algorithm to the practical cloud computing task scheduling problem, which has many-objective properties.
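For reference, the standard IGD computation can be written in a few lines of Python. The function name and the example points are illustrative; in practice the reference set is a dense sample of the true Pareto front of the DTLZ problem being tested.

```python
import math


def igd(reference_front, obtained):
    """Mean distance from each reference point to its nearest obtained solution."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    total = sum(min(dist(r, s) for s in obtained) for r in reference_front)
    return total / len(reference_front)


reference = [(0.0, 1.0), (1.0, 0.0)]   # illustrative sample of a true front
solutions = [(0.0, 1.0), (1.0, 1.0)]   # illustrative algorithm output
print(igd(reference, solutions))
```

Because every reference point contributes a term, a solution set that converges well but covers only part of the front still receives a poor (large) IGD value.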

B. THE PERFORMANCE OF DIFFERENT ALGORITHMS IN CLOUD TASK SCHEDULING
1) CLOUD TASK SCHEDULING ENVIRONMENT SETTINGS
To test the performance of the algorithm on the task scheduling problem, the Cloudsim platform and the Matlab platform are used for the simulation experiments. The experiments run on a laptop with Windows 10, an Intel(R) Core(TM) i5-4200M CPU @ 2.50 GHz, and 8 GB of memory. The setting of parameters is one of the more important issues in simulation. Therefore, this section provides the detailed parameter settings of the virtual machines and tasks.
Since cloud computing simulation has no uniform standard for parameter settings, we consulted a large body of literature and related material. In the end, we took into account the prices of Tencent Cloud and related information (https://cloud.tencent.com/document/product/213) and considered virtual machines with different levels of performance. The virtual machines are classified into four types: Low-level, Basic, Pervasive, and Professional. The four representative parameter settings are summarized in Table 2.
Table 3 lists 16 virtual machines with different configurations. As for the task attributes, the number of CPUs required by each task is set to 1. The initial task length, file size, and output size are 500, 100, and 200, respectively. As the number of tasks increases, the three attributes grow in steps of 1000, 10, and 10, respectively. Table 4 shows the numerical results of the different algorithms on the four objectives, with the best values marked in bold. Because a many-objective algorithm produces a set of non-dominated solutions, the solutions cannot be compared one by one. Therefore, for each objective we report the best value, the worst value, and the average value over the solution set. It can be seen that MaOEA-HA performs best on the cost and load objectives in the best-value group. For the best value, MaOEA-HA obtains the minimum time and load, which means it is better than the other algorithms on these two indicators. Meanwhile, it underperforms only GrEA in terms of cost. Its utilization, however, is less favorable. In the worst case, the proposed algorithm shows outstanding performance in terms of time, cost, utilization, and load. Finally, the average value better reflects the overall performance of the solutions. MaOEA-HA performs better on three objectives and performs poorly only on utilization. It can be concluded that the proposed algorithm secures its advantage on the other three objectives at the expense of utilization. The reason may be that the proposed algorithm is biased towards diversity, so the utilization objective does not perform well. Overall, the proposed algorithm has the best performance. The result of an intelligent optimization algorithm is a set of good solutions with similar performance. Since the solutions in the population do not dominate each other, the user can select one of them at random or choose the most satisfactory solution according to their preference.
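The best, worst, and average values per objective can be extracted from a solution set as in the sketch below. The function name and the example data are hypothetical. The code reports the minimum and maximum of each objective; which of the two counts as "best" depends on whether the objective is minimized (time, cost, load) or maximized (utilization).

```python
def summarize(solutions):
    """Per-objective minimum, maximum, and mean over a set of solutions."""
    columns = list(zip(*solutions))   # one tuple of values per objective
    return [
        {"min": min(c), "max": max(c), "mean": sum(c) / len(c)}
        for c in columns
    ]


# Two illustrative solutions with two objectives each.
for stats in summarize([(1.0, 4.0), (3.0, 2.0)]):
    print(stats)
```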

2) COMPARISON OF EXPERIMENTAL RESULTS
The non-dominated PFs obtained by the six algorithms on Cloudsim are shown in Figure 2. It is easy to see that the four designed objectives clearly conflict with each other. KnEA and GrEA have worse distributions, and GrEA shows the worst performance on cost. NSGAIII, VaEA, Two_Arch2, and MaOEA-HA are similar and better balance convergence and distribution.

VI. CONCLUSION
This paper designs a many-objective model for the cloud task scheduling problem that considers time, cost, resource utilization, and load. A many-objective optimization algorithm based on hybrid angles is proposed to solve this model. The simulation experiments are performed using the Cloudsim platform and the Matlab platform. The experimental results show that MaOEA-HA is superior to the five compared algorithms not only on the DTLZ test suite, but also in the application to task scheduling problems. In the future, we will continue to study the scheduling problem in the cloud environment and design many-objective algorithms with better performance.

SHAOJIN GENG is currently pursuing the M.S. degree in computer science and technology with the Taiyuan University of Science and Technology, Taiyuan, China. His research interests include computational intelligence, cloud computing, and combinatorial optimization.
DI WU is currently pursuing the M.S. degree in computer science and technology with the Taiyuan University of Science and Technology, Taiyuan, China. Her research interests include computational intelligence and algorithm optimization.
PENGHONG WANG is currently pursuing the M.S. degree in computer science and technology with the Taiyuan University of Science and Technology, Taiyuan, China. His main research interests include computational intelligence and combinatorial optimization.
XINGJUAN CAI received the Ph.D. degree in control science and engineering from Tongji University, China, in 2017. She is currently an Associate Professor with the School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan, China. Her research interests include bio-inspired computation and its applications.