Dynamic Group Learning Distributed Particle Swarm Optimization for Large-Scale Optimization and Its Application in Cloud Workflow Scheduling

Cloud workflow scheduling is a significant topic in both commercial and industrial applications. However, the growing scale of workflows has made this scheduling problem increasingly challenging. Many current algorithms deal with small- or medium-scale problems (e.g., fewer than 1000 tasks) and have difficulty providing satisfactory solutions to large-scale problems because of the curse of dimensionality. To this end, this article proposes a dynamic group learning distributed particle swarm optimization (DGLDPSO) for large-scale optimization and extends it to large-scale cloud workflow scheduling. DGLDPSO is efficient for large-scale optimization due to the following two advantages. First, the entire population is divided into many groups, and these groups are coevolved by using the master-slave multigroup distributed model, forming a distributed PSO (DPSO) to enhance the algorithm diversity. Second, a dynamic group learning (DGL) strategy is adopted for DPSO to balance diversity and convergence. When DGLDPSO is applied to large-scale cloud workflow scheduling, an adaptive renumber strategy (ARS) is further developed to relate solutions to the resource characteristics and to make the searching behavior meaningful rather than aimless. Experiments are conducted on a large-scale benchmark function set and on large-scale cloud workflow scheduling instances to investigate the performance of DGLDPSO. The comparison results show that DGLDPSO is better than or at least comparable to other state-of-the-art large-scale optimization algorithms and workflow scheduling algorithms.


I. INTRODUCTION
WORKFLOW, which contains a set of tasks interconnected via data or computing dependencies, is widely used in many real-world applications [1]. For example, the Montage workflow can be used to generate custom mosaics of the sky and the CyberShake workflow can be used to characterize earthquake hazards in a region [2]. The workflow scheduling problem is to find the most suitable resource to execute each task of the workflow, so as to satisfy users' quality of service (QoS).
In the past, researchers often studied workflow scheduling in distributed environments such as grids [3], [4]. With the popularity of cloud computing [5]-[8], workflow scheduling on cloud resources has gradually become a significant research topic in recent years [9]-[11], but it is also more challenging. Different from the fixed and limited computing resources in grid computing, the computing resources in cloud computing are elastic and almost unlimited, and can be leased in any amount at any time according to the pay-per-use pricing principle. Leasing more resources or more expensive resources can shorten the execution time, but also incurs a larger cost. Therefore, taking both time and cost into consideration is necessary in workflow scheduling on cloud. Rodriguez and Buyya [12] proposed a cost-minimization and deadline-constrained workflow scheduling (CMDCWS) model, where the tasks are required to execute with a minimum cost and within a given deadline constraint. This has become a popular scheduling model because it can find a potential balance between time and cost, with a number of subsequent works on this model [12]-[17]. For example, due to the success of evolutionary algorithms (EAs) [18]-[26], some EAs have been proposed to deal with the CMDCWS model, such as coevolutionary genetic algorithm (GA) [13], dynamic objective GA (DOGA) [14], cost effective GA [15], particle swarm optimization (PSO) [12], renumber PSO (RNPSO) [16], ant colony system (ACS) [17], and multiobjective ACS (MOACS) [1]. Thus, we also adopt this CMDCWS model in this article.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Even though the above approaches are competitive in solving the small- or medium-scale CMDCWS (e.g., fewer than 1000 tasks), several challenges arise when the scale of the workflow increases, such as the huge search space and the exponentially increasing number of local optima.
To deal with these challenges, this article proposes a dynamic group learning distributed PSO (DGLDPSO) for large-scale optimization and extends it to solve the large-scale CMDCWS. Specifically, three major novel designs help DGLDPSO balance diversity and convergence so that it can find a feasible solution under a tight deadline and minimize the cost in large-scale cloud workflow scheduling. They are described as follows.
1) During the evolutionary process, we randomly divide the entire population into several equal groups in every generation, and these groups are coevolved by using the master-slave multigroup distributed model, forming a distributed PSO (DPSO) framework. This is promising for enhancing the population diversity so as to deal efficiently with the complex search challenge of large-scale optimization.
2) In the particle's evolution, a dynamic group learning (DGL) strategy is adopted for DPSO. On the one hand, the size of each group is dynamically changed in every generation. On the other hand, each group is treated as a big "particle." In this sense, the best particle in the group is regarded as the personal best pbest of the big particle, while the best pbest of all big particles (groups) is regarded as the globally best gbest of the entire population. Besides, only the worst particle in each group is updated by learning from the pbest of the current group and the gbest of the entire population. Therefore, in the DGL strategy, the dynamically changed group size can control the learning strength and find a potential balance between diversity and convergence for large-scale optimization. Moreover, updating only the worst particle in the group saves fitness evaluations (FEs), leaving more of the budget for other particles to search more regions of the large-scale space and to further refine the solution accuracy.
3) In the traditional cloud workflow scheduling algorithms, each dimension of the solution represents a task, while the value of each dimension stands for the index of the cloud resource that is scheduled to execute the corresponding task. However, the index of a resource is in fact only a meaningless number that does not reflect the characteristics of the resource. Learning from the index of a resource might therefore make particles blind. That is, when the value $x^d$ on dimension $d$ in a particle moves toward $pbest^d$ or $gbest^d$, it seems that the resource $pbest^d$ or $gbest^d$ is more suitable for task $t_d$. However, the resource numbers between $x^d$ and $pbest^d$ or $gbest^d$ do not represent anything. Therefore, in the particle's learning, an adaptive renumber strategy (ARS) is proposed to make the index of a resource meaningful. We adaptively use two metrics that are related to execution time and execution cost, respectively, to sort and renumber the resources in order to make the learning among particles clearer and more reasonable.
Therefore, the contributions of this article are two-fold, from both the algorithm design aspect and the real-world application aspect. On the one hand, in the algorithm design aspect, we propose a novel DPSO with the DGL strategy, called DGLDPSO, to efficiently solve large-scale optimization problems. This is an algorithm innovation in both PSO and large-scale optimization. On the other hand, in the real-world application aspect, we have enhanced DGLDPSO with ARS and extended the algorithm to efficiently solve large-scale cloud workflow scheduling problems. This is a significant contribution to the cloud computing field.
Experiments are conducted on a large-scale benchmark function set from the CEC2010 test suite and on large-scale instances of cloud workflow scheduling (e.g., up to 5000 tasks). The results obtained by DGLDPSO are better than, or at least comparable to, those obtained by the state-of-the-art large-scale optimization algorithms and the cloud workflow scheduling algorithms, showing the effectiveness of our DGLDPSO algorithm.
The remainder of this article is organized as follows. Section II describes the basic PSO algorithm, the application of PSO in cloud workflow scheduling, and the CMDCWS model. Section III describes the proposed DGLDPSO algorithm in detail. In Section IV, experimental results are presented and discussed. Finally, Section V draws the conclusions.

A. PSO Framework
In PSO [27], each member of the swarm is called a particle and represents a possible solution in the search space. Each particle $P_i$ is associated with two vectors. The vector $V_i = [v_i^1, v_i^2, \ldots, v_i^D]$ means the velocity of $P_i$, and the vector $X_i = [x_i^1, x_i^2, \ldots, x_i^D]$ means the position of $P_i$, where $D$ stands for the dimensions of the search space. Moreover, each particle $P_i$ has a memory of its historical best position, called personal best $pbest_i = [pbest_i^1, pbest_i^2, \ldots, pbest_i^D]$. The best one of all the $pbest_i$ is treated as the globally best of the entire population, called $gbest = [gbest^1, gbest^2, \ldots, gbest^D]$. In every generation, each particle $P_i$ adjusts its velocity and position based on its own $pbest_i$ and the $gbest$ of the entire population. The velocity $V_i$ and position $X_i$ of each particle are updated according to the following formulas:

$$v_i^d = \omega v_i^d + c_1 r1_i^d \left(pbest_i^d - x_i^d\right) + c_2 r2_i^d \left(gbest^d - x_i^d\right) \tag{1}$$

$$x_i^d = x_i^d + v_i^d \tag{2}$$

where $\omega$ is the inertia weight to balance global and local search performance. $c_1$ and $c_2$ are the acceleration coefficients, where parameter $c_1$ pulls the particle to its own pbest, ensuring the diversity of the population, while $c_2$ pushes the swarm to converge to the current gbest, ensuring the speed of convergence. $r1_i^d$ and $r2_i^d$ are two uniformly distributed random numbers within [0, 1]. A particle's velocity and position on each dimension are clamped in $[-V_{max}, V_{max}]$ and $[X_{min}, X_{max}]$, respectively.
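The update rules (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `pso_step`, the default parameter values, and the search bounds are assumptions made for the example and are not taken from the article.

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=2.0, c2=2.0,
             v_max=1.0, x_min=-5.0, x_max=5.0, rng=random):
    """Apply one velocity/position update to a single particle, in place.

    x, v, pbest, gbest are lists of length D. All default values are
    illustrative assumptions, not settings from the article.
    """
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()      # fresh random numbers per dimension
        v[d] = (w * v[d]
                + c1 * r1 * (pbest[d] - x[d])    # pull toward the personal best
                + c2 * r2 * (gbest[d] - x[d]))   # pull toward the global best
        v[d] = max(-v_max, min(v_max, v[d]))     # clamp velocity to [-v_max, v_max]
        x[d] = max(x_min, min(x_max, x[d] + v[d]))  # move, then clamp position
    return x, v
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing variants of the update rule.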

B. Application of PSO in Cloud Workflow Scheduling
Many improved PSO variants have been proposed to solve the task scheduling problem in cloud computing, which aims to find the most suitable resource for each task to meet users' QoS, where the tasks are independent of each other [28]-[35]. For example, Alkhashai and Omara [29] merged the best-fit algorithm into PSO for generating the initial population and utilized the tabu search algorithm to perform the local search. Zhao et al. [34] proposed an adaptive inertia weight PSO (AIWPSO) to relieve the sensitivity to the inertia weight and make PSO more adaptive and effective in obtaining better schedules. Apart from the adaptive inertia weight, an adaptation of the acceleration coefficients was also used in the mutation time-varying PSO (MTV_PSO) [35] for task scheduling.
Even though these task scheduling algorithms have achieved great success in cloud computing, workflow scheduling problems are much more challenging because the tasks depend on each other and require a specific execution order.
Over the past few years, many PSO-based cloud workflow scheduling algorithms have been extensively researched [36]-[40]. Some scheduling algorithms focus on reducing the financial cost. For example, Wu et al. [37] proposed a revised discrete PSO (RDPSO) to schedule applications among cloud services that takes both data transmission cost and computation cost into account. Pandey et al. [39] proposed a novel PSO-based approach to reduce the cost and showed that it can achieve almost three times the cost saving of the compared algorithms. Some scheduling algorithms focus on minimizing the execution time. For instance, Manasrah and Ali [40] proposed a hybrid scheduling algorithm (GA-PSO) which combines PSO and GA to reduce the workflow execution time.
However, cost and time often conflict with each other: a schedule that shortens the execution time may also incur a larger cost, while a low investment often greatly prolongs the execution time. Therefore, taking both time and cost into consideration is necessary in workflow scheduling on cloud.

C. CMDCWS Model
To take both time and cost into consideration, the CMDCWS model was first proposed by Rodriguez and Buyya [12]. It focuses on finding a schedule to execute a workflow so that the total execution cost (TEC) is minimized and the total execution time (TET) does not exceed a given deadline. A simple workflow example is shown in Fig. 1. Besides, there is a set of available resources $r_j$, where each resource has its own specific processing capacity $PC_j$ and cost per unit time $C_j$. Often, a resource with higher processing capacity can deal with tasks faster, but is relatively more expensive.
In the CMDCWS model, a schedule for each task can be formulated as

III. DGLDPSO FOR THE LARGE-SCALE CLOUD WORKFLOW SCHEDULING
In order to deal with the large-scale cloud workflow scheduling problems efficiently, DGLDPSO is proposed, together with the following three novel designs and advantages. First, DGLDPSO uses a master-slave multigroup distributed model, forming a DPSO, where multiple groups are coevolved concurrently to enhance the population diversity. Second, the DGL strategy is proposed for DPSO. On the one hand, the size of group is dynamically changed, so as to control the learning strength and find a potential balance between diversity and convergence for large-scale optimization. On the other hand, we treat each group as a big particle and only one update is performed in each group (big particle), which can save FEs to search more regions of the large-scale space and to further refine the solution accuracy. Third, considering the characteristics of cloud workflow scheduling, ARS is further proposed, which can adaptively choose two metrics to sort and renumber the resources in order to make the learning among particles more clear and reasonable.
Before we introduce DGLDPSO, we first describe three critical issues when using PSO for solving the cloud workflow scheduling problem.
The first one is the encoding. Here, the dimension of a particle is equal to the number of tasks in the workflow, while the search range of each dimension is the set of real numbers from 1.0 (inclusive) to the number of resources plus one (exclusive). For example, assume the workflow is as in Fig. 1 with eight tasks and four resources can be leased; then each particle is 8-D and the range of each dimension is [1.0, 5.0). In other words, each dimension stands for one task, and the integer part of its value represents the index of the resource scheduled to execute the corresponding task. Fig. 2 shows a simple scheduling, where $d_1$ stands for task $t_1$, and its value 2.2 represents that resource $r_2$ will be assigned to task $t_1$; $d_2$ stands for task $t_2$, and its value 3.5 indicates that resource $r_3$ will be allocated to task $t_2$, and so on.
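The decoding step of this encoding is small enough to show directly. The following sketch takes the integer part of each dimension as the resource index; the function name `decode` and the range check are illustrative additions rather than part of the article.

```python
import math

def decode(position, num_resources):
    """Map each real-valued dimension in [1.0, num_resources + 1) to a 1-based resource index."""
    schedule = []
    for value in position:
        if not 1.0 <= value < num_resources + 1:
            raise ValueError(f"value {value} is outside [1.0, {num_resources + 1})")
        schedule.append(math.floor(value))  # integer part = resource index
    return schedule

# With four resources, 2.2 selects r2 and 3.5 selects r3, as in the text.
print(decode([2.2, 3.5, 1.0, 4.99], 4))  # prints [2, 3, 1, 4]
```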
The second one is the fitness evaluation (FE). As mentioned above, we use (5) and (6) to calculate TEC and TET, respectively. However, in order to obtain TEC and TET, the time that each task takes on each resource needs to be known. The execution time is stored in the matrix exetime, where exetime[i][j] represents the time that task $t_i$ runs on resource $r_j$. Besides, when a parent task and its child task are executed on different resources, the parent needs time to transfer data to its child. As a result, the transfer time stored in the matrix transfertime is used, where transfertime[i][j] indicates the time that task $t_i$ takes to transfer data to $t_j$. Fig. 3 presents an example of the matrices exetime and transfertime. After getting exetime and transfertime, the TEC and TET values of a solution can be calculated as in the pseudocode shown in Algorithm 1.
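As a rough illustration of how exetime and transfertime enter the evaluation, the sketch below computes TEC and TET for one schedule. It is not a transcription of Algorithm 1 or of (5) and (6). It assumes 0-based task and resource indices, tasks visited in a given topological order, one task at a time per resource, and a lease that is billed continuously from a resource's first task start to its last task finish. The names `evaluate`, `order`, `parents`, and `cost_per_unit` are hypothetical.

```python
def evaluate(schedule, order, parents, exetime, transfertime, cost_per_unit):
    """Return (TEC, TET) for one schedule under the assumptions stated above.

    schedule[i]      : resource index assigned to task i
    order            : task indices in a valid topological order
    parents[i]       : list of parent task indices of task i
    cost_per_unit[r] : cost per unit time of resource r
    """
    finish = {}
    lease_start, lease_end = {}, {}
    for i in order:
        r = schedule[i]
        ready = 0.0
        for p in parents[i]:
            # Transfer time is only paid when parent and child run on different resources.
            delay = transfertime[p][i] if schedule[p] != r else 0.0
            ready = max(ready, finish[p] + delay)
        start = max(ready, lease_end.get(r, 0.0))  # wait until the resource is free
        finish[i] = start + exetime[i][r]
        lease_start.setdefault(r, start)           # first use opens the lease
        lease_end[r] = finish[i]                   # last use closes the lease
    tet = max(finish.values())
    tec = sum(cost_per_unit[r] * (lease_end[r] - lease_start[r]) for r in lease_start)
    return tec, tet
```

Under a different billing rule (for example, charging per started time unit), only the final `tec` expression would need to change.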
The last one is the fitness comparison. We wish to minimize TEC while keeping TET within the deadline. Therefore, if both solutions are feasible (TET does not exceed the deadline), the one with the smaller TEC is better. If only one solution is feasible, the feasible solution is preferred. If both solutions are infeasible (TET exceeds the deadline), the one with the smaller constraint violation (smaller TET) is selected.
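The three comparison rules translate directly into a small predicate. In this sketch each solution is represented only by its (TEC, TET) pair; the name `is_better` is chosen for the example.

```python
def is_better(a, b, deadline):
    """Return True if solution a = (TEC, TET) is preferred over b = (TEC, TET)."""
    tec_a, tet_a = a
    tec_b, tet_b = b
    feasible_a = tet_a <= deadline
    feasible_b = tet_b <= deadline
    if feasible_a and feasible_b:
        return tec_a < tec_b   # both feasible: cheaper wins
    if feasible_a != feasible_b:
        return feasible_a      # exactly one feasible: it wins
    return tet_a < tet_b       # both infeasible: smaller violation wins
```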
A simple workflow scheduling corresponding to Figs. 1-3 is shown in Fig. 4.

A. Master-Slave Multigroup Distributed Framework
The master-slave multigroup distributed framework is illustrated in Fig. 5, where the master node dominates the multiple slave nodes in parallel hardware.

Algorithm 1: Calculate TEC and TET

During the evolutionary process, the master always randomly divides the entire population into N/M equal groups, where N is the population size and M is the group size, which is dynamically changed. Note that if N%M ≠ 0, the last group will have M + N%M particles. After the population partition, the master sends each group to its corresponding slave, and different groups are coevolved concurrently on their slave nodes. After the evolution, each slave sends the updated group back to the master. This completes one loop. The process is repeated until the termination criterion is satisfied. As we can see, this master-slave multigroup distributed model can enhance the population diversity by coevolution, which matches the search requirement of large-scale optimization problems.
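The partition performed by the master can be sketched as follows. The example works on particle indices rather than on particle objects and leaves out the communication with the slaves; the function name `partition` is an assumption made for illustration.

```python
import random

def partition(population, group_size, rng=random):
    """Randomly split particle indices into N // M groups of size M.

    When N is not a multiple of M, the N % M leftover particles join the
    last group, which then holds M + N % M particles.
    """
    n = len(population)
    if not 1 <= group_size <= n:
        raise ValueError("group_size must be between 1 and the population size")
    indices = list(range(n))
    rng.shuffle(indices)                      # random regrouping every generation
    num_groups = n // group_size
    groups = [indices[g * group_size:(g + 1) * group_size] for g in range(num_groups)]
    groups[-1].extend(indices[num_groups * group_size:])  # remainder joins the last group
    return groups
```

For example, 23 particles with a group size of 5 give four groups holding 5, 5, 5, and 8 particles.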

B. DGL Strategy
The DGL strategy has the following two novel characteristics.
First, after the population partition, each group is regarded as a big particle, and the best particle in the group is denoted as the pbest of the big particle (group). Besides, only the worst particle $x_w$ in the group updates its velocity and position by learning from the pbest of the current big particle (group) and from the gbest of the entire population, while the other particles in the group enter the next generation directly. In other words, only N/M particles are updated in the entire population in each generation.
Second, the group size M is dynamically changed in every generation: it is randomly selected from a fixed interval. The reason is that, for a given population size, if the group size is large, the pbest is selected from a large group of particles, which is relatively greedy. On the contrary, if the group size is small, the pbest is selected from a small group of particles, which is relatively diverse. However, without any prior knowledge about the landscape of a problem, it is hard to precisely determine a proper group size M in different evolutionary stages. Therefore, as a compromise, M is randomly selected from a fixed interval. Although this scheme is simple, it is effective and readily applicable to a wide range of uses. For example, if the current group gets trapped in a local optimum, the group size M has the opportunity to become smaller to improve the population diversity. In contrast, if the region of the global optimum is found, the group size M can become larger to accelerate the convergence. As a result, we can achieve a potential balance between population diversity and fast convergence. Meanwhile, the random selection also relieves the sensitivity to this parameter. Fig. 6 illustrates the main idea of this novel DGL strategy. The formula for updating the velocity of the worst particle is shown as (7), which is similar to (1), but with the following differences.
1) The first part is similar to the inertia term in the traditional PSO. The only difference is that the inertia weight $\omega$ in PSO is often set as linearly decreasing from 0.9 to 0.4, while herein it is replaced by a random number in DGLDPSO. The random number is helpful to improve the learning diversity and the population diversity.
2) The second part also learns from the pbest, which is the same as in the traditional PSO. However, the pbest in DGLDPSO is the best particle of the big particle (group), as mentioned above.
3) Parameters $c_1$ and $c_2$ are often set as 2.0 in the traditional PSO. In DGLDPSO, however, $c_1$ is set as 1.0 and $c_2$ is set as 0.1. That is because when dealing with large-scale optimization problems, we should concentrate more on maintaining diversity to avoid being trapped in local optima. In this sense, $c_1$ and $c_2$ should be smaller to ensure the population diversity, while $c_2$, which controls the convergence speed to the gbest, should be much smaller.
Therefore, the DGL strategy has several advantages, shown as follows.
1) The random selection of M and the redefinition of the big particle and its pbest can control the learning strength effectively and can find a potential balance between diversity and convergence, which can further improve the performance of DGLDPSO.
2) Only the worst particle in each group is updated, which is helpful to save more FEs for other particles.
3) Resetting the parameters $c_1$ and $c_2$ makes DGLDPSO focus more on population diversity, which matches the search requirement of large-scale optimization problems.
With the above descriptions, the pseudocode of DGLDPSO can be summarized in Algorithm 2.
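To make the DGL description concrete, the following sketch runs one generation on a minimization problem in a single process. It should be read as an approximation under explicit assumptions rather than as Algorithm 2: the velocity rule is written from the three differences listed above (random inertia term, group-best pbest, $c_1$ = 1.0, $c_2$ = 0.1) with a fresh random number per term and per dimension, gbest is taken as the best particle of the current population, clamping is omitted, and every fitness value is recomputed for brevity. The name `dgl_generation` is hypothetical.

```python
import random

def dgl_generation(X, V, fitness, m_range=(2, 10), c1=1.0, c2=0.1, rng=random):
    """Run one DGL generation in place and return the number of updated particles.

    X, V    : lists of position / velocity lists, one per particle
    fitness : function mapping a position to a value to be minimized
    """
    n = len(X)
    m = min(rng.randint(*m_range), n)         # group size redrawn every generation
    idx = list(range(n))
    rng.shuffle(idx)
    k = n // m
    groups = [idx[g * m:(g + 1) * m] for g in range(k)]
    groups[-1].extend(idx[k * m:])            # leftover particles join the last group
    # A real implementation would cache fitness and re-evaluate only updated particles.
    fit = [fitness(x) for x in X]
    gbest = list(X[min(range(n), key=fit.__getitem__)])
    for group in groups:
        pbest = list(X[min(group, key=fit.__getitem__)])  # best particle of the group
        w = max(group, key=fit.__getitem__)               # worst particle of the group
        for d in range(len(X[w])):
            V[w][d] = (rng.random() * V[w][d]             # random number replaces omega
                       + c1 * rng.random() * (pbest[d] - X[w][d])
                       + c2 * rng.random() * (gbest[d] - X[w][d]))
            X[w][d] += V[w][d]
    return len(groups)
```

Because only one particle per group moves, a population of 500 with M between 2 and 10 updates between 50 and 250 particles per generation.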

C. ARS
DGLDPSO performs particularly well on many large-scale optimization problems, which will be further discussed in Section IV-B. However, when applied to the large-scale cloud workflow scheduling, an extra strategy called ARS is incorporated in DGLDPSO due to the characteristics of the cloud workflow scheduling.
Consider the situation in which the cloud provider offers a large number of resources that are numbered arbitrarily. In the traditional PSO, the particles then become blind during the learning process. That is, when the value $x^d$ on dimension $d$ in a particle moves toward $pbest^d$ or $gbest^d$, it seems that the resource $pbest^d$ or $gbest^d$ is more suitable for task $t_d$. However, the resource numbers between $x^d$ and $pbest^d$ or $gbest^d$ do not represent anything. Making the searching process meaningful is therefore quite important, especially when the scale of the workflow is large.
The renumber strategy is first proposed by Li et al. [16]. It used the cost per unit time as the metric to renumber the resources where the resource r i represents the resource with the ith lowest cost per unit time. In that way, when x d flies toward pbest d or gbest d , it predicts a tendency that task t d is suitable for a cheaper resource or an expensive one. As a consequence, the searching process will become meaningful.
Even though it achieved relatively promising results, there is still much room for improvement. As there are two objectives (i.e., time and cost) when evaluating a schedule, using only the cost metric to renumber the resources is not wise, especially when the deadline is tight. As a result, in this article, we propose a novel ARS that adaptively selects the metric to further improve the learning process.
At the beginning, DGLDPSO uses the metric of processing capacity to renumber the resources. This metric is used because, until the population finds a feasible solution, TET is the primary optimization objective. The metric of processing capacity is more closely related to the TET, which helps the algorithm find solutions that satisfy the deadline constraint. Once the population finds a feasible solution, the concern turns to the cost. At this moment, DGLDPSO adaptively switches to the metric of cost per unit time to renumber the resources. As a result, DGLDPSO predicts a tendency for each task, that is, whether the current task is better suited to a cheaper resource or a more expensive one, which is useful for minimizing the TEC. Our ARS has two advantages in solving large-scale cloud workflow scheduling.
1) The renumber strategy can predict a searching tendency for each task, which can make the searching and learning process more meaningful. 2) Adaptively selecting the two metrics can make the population find the feasible solution and minimize the cost more quickly, which is more suitable for the large-scale cloud workflow scheduling, especially when the deadline is tight.
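A minimal sketch of the metric switch in ARS is given below. The resource representation (dictionaries with `name`, `capacity`, and `cost` keys) and the ascending sort order for both metrics are assumptions made for illustration. The text above fixes the ascending order only for the cost metric.

```python
def renumber(resources, feasible_found):
    """Return the resources in their new numbering order (position 0 becomes r1).

    Before any feasible solution is found, resources are ordered by processing
    capacity; afterwards they are ordered by cost per unit time.
    """
    key = "cost" if feasible_found else "capacity"
    return sorted(resources, key=lambda r: r[key])

resources = [
    {"name": "a", "capacity": 2.0, "cost": 5.0},
    {"name": "b", "capacity": 8.0, "cost": 3.0},
    {"name": "c", "capacity": 5.0, "cost": 9.0},
]
print([r["name"] for r in renumber(resources, feasible_found=False)])  # prints ['a', 'c', 'b']
print([r["name"] for r in renumber(resources, feasible_found=True)])   # prints ['b', 'a', 'c']
```

After renumbering, moving a dimension toward a larger value consistently means "more capacity" in the first phase and "more expensive" in the second, which is what gives the learning direction a meaning.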

D. Complete Algorithm DGLDPSO
Based on all the components described above, the pseudocode of the complete procedure of DGLDPSO for the large-scale cloud workflow scheduling is outlined in Algorithm 3. The advantages of DGLDPSO are as follows.
1) Several groups are coevolved by using the master-slave multigroup distributed model, forming a DPSO, which can enhance the population diversity.
2) The DGL strategy is proposed for DPSO, which can control the learning strength effectively and can find a potential balance between diversity and convergence.
3) ARS is proposed to make the index of the resource and the searching process meaningful, which matches the searching requirements and characteristics of cloud workflow scheduling.

E. Complexity Analysis
Herein, we denote the population size, the dimension of problem, the group size, and the maximum number of  Table S.I in the supplementary material.
When DGLDPSO is applied to the large-scale cloud workflow scheduling, ARS adds two sorting operators to DGLDPSO, as shown in steps 1 and 8 in the MASTER process of Algorithm 3, respectively. Denoting the number of resources as $N_r$, the time complexity of ARS in DGLDPSO is $O(N_r \times \log(N_r))$. Therefore, the overall time complexity of DGLDPSO for the large-scale cloud workflow scheduling is $O(MaxFEs \times D) + O(N_r \times \log(N_r))$. Detailed comparisons of the time complexity of different cloud workflow scheduling algorithms are listed in Table S.II in the supplementary material.

A. Experimental Setup
To test the performance of DGLDPSO, we conduct the following two experiments. The first one is on the 20 widely used 1000-D optimization benchmark functions from the CEC2010 test suite [41], which is used to illustrate the superiority of DGLDPSO in dealing with large-scale optimization problems. The second one is on the large-scale extension of the CMDCWS model proposed in [12], which is used to show the superiority of DGLDPSO in solving large-scale cloud workflow scheduling. For more details about the test functions and the cloud workflow scheduling model, please refer to [41] and [12], respectively.
For the implementation of DGLDPSO, the master-slave model of DGLDPSO is built in a multiprocessor distributed environment that consists of a number of distributed computing servers. The CPU of each server has eight processors configured with Intel Core i5-7400, 3.00 GHz. Therefore, we obtain the multiprocessor distributed environment and we can assign each group to one processor through MPI. The population size is set as 500 and the interval for the group size M is [2,10].
In the first experiment on the large-scale benchmark functions, we compare DGLDPSO with eight state-of-the-art large-scale optimization algorithms. The first four competitors are based on the cooperative coevolution (CC) framework, that is, multilevel CC (MLCC) [42], cooperative coevolving PSO (CCPSO2) [43], and cooperative coevolving DE (DECC), including DECC with random grouping (DECC-G) [44] and DECC with differential grouping (DECC-DG) [45]. The other four competitors are non-CC-based large-scale optimization algorithms, including competitive swarm optimizer (CSO) [46], social learning PSO (SL-PSO) [47], dynamic multiswarm PSO (DMS-L-PSO) [48], and dynamic level-based learning swarm optimizer (DLLSO) [49]. Although DLLSO also uses a dynamic technique, it differs in that DLLSO dynamically selects the number of levels from a level pool (containing only a few choices), while DGLDPSO dynamically selects the group size from an interval (containing a large number of choices). In the second experiment on the large-scale cloud workflow scheduling, we compare DGLDPSO with seven typical cloud scheduling algorithms, including five PSO-based scheduling algorithms, namely PSO [12], RNPSO [16], AIWPSO [34], MTV_PSO [35], and RDPSO [37], and two GA-based scheduling algorithms, namely DOGA [14] and GA-PSO [40]. The parameters of these competitors are set according to their original proposals because they have been well tuned for these related problems. For fair comparisons, the MaxFEs is set as 3 000 000 for all competitors. Moreover, all the experiments are run 30 times independently for statistics and the mean results are reported. In addition, the coefficient of variation (C.V) of the 30 runs is calculated to show the stability of the algorithm. Finally, the Wilcoxon's rank-sum test at α = 0.05 between DGLDPSO and the other state-of-the-art algorithms is performed to evaluate the statistical significance of the results.
The symbols "+," "≈," and "−" indicate DGLDPSO performs significantly better (+), similarly (≈), or significantly worse (−) than the corresponding algorithm in comparison.

B. Comparison Results on Large-Scale Benchmark Functions
These functions are with 1000 dimensions and can be classified into three groups. The first group includes three separable functions f 1 -f 3 . The second group consists of the following 15 functions f 4 -f 18 , which are partially separable functions. The last group consists of the last two functions f 19 -f 20 that are nonseparable functions. All these functions are shifted and rotated, which are more difficult to solve and make our test more comprehensive and convincing. Due to the space limitation, the properties of these functions are given in Table S.III in the supplementary material. For more details about these test functions, please refer to [41]. Table I presents the comparison results where the best results are highlighted in boldface. From Table I, we can see the following.
For the first three separable functions $f_1$-$f_3$, DGLDPSO performs significantly better than most of the other algorithms, especially on $f_1$ and $f_3$. Although it performs slightly worse than MLCC and DLLSO on these three functions, MLCC and DLLSO lose their effectiveness when dealing with the partially separable or nonseparable functions, which will be discussed below. Over all 20 functions, DGLDPSO outperforms MLCC, CCPSO2, DECC-G, DECC-DG, CSO, SL-PSO, DMS-L-PSO, and DLLSO on 14, 14, 15, 15, 15, 12, 19, and 11 functions, respectively. Conversely, MLCC, CCPSO2, DECC-G, DECC-DG, CSO, SL-PSO, DMS-L-PSO, and DLLSO can only surpass DGLDPSO on 6, 4, 5, 5, 2, 6, 1, and 8 functions, respectively. Moreover, the C.V values of DGLDPSO are generally smaller than those of the compared state-of-the-art algorithms, indicating that DGLDPSO generally has more stable performance than the competitors. Therefore, DGLDPSO generally performs better than all these competitors on most of the tested large-scale benchmark functions. The promising performance of DGLDPSO also further confirms that the master-slave multigroup distributed model in coevolution helps the algorithm enhance the population diversity to sufficiently search the very large space for the global optimum.
To further study the evolutionary behavior of different algorithms, we draw their convergence curves to observe their evolutionary processes. Besides, in order to make our comparison more convincing, we choose several typical benchmark functions from all three groups. Herein, we select separable function $f_3$, partially separable functions $f_6$, $f_{11}$, $f_{13}$, and $f_{16}$, and nonseparable function $f_{20}$. The convergence curves of DGLDPSO and its competitors on these six functions are plotted in Fig. S1 in the supplementary material to save space.
From Fig. S1(a) in the supplementary material, we can see that only DGLDPSO and DLLSO converge to good solutions quickly on separable function $f_3$, while the other algorithms stagnate in the early stage or evolve very slowly. On the partially separable functions $f_6$, $f_{11}$, and $f_{13}$, shown in Fig. S1(b)-(d) in the supplementary material, DGLDPSO and CSO find better results and converge faster than the other algorithms. Moreover, DGLDPSO is still more accurate and converges faster than CSO, showing that DGLDPSO converges the fastest to the best final results. Note that on the partially separable function $f_{16}$ in Fig. S1(e) in the supplementary material, only DGLDPSO and DECC-DG obtain promising results, and DGLDPSO still obtains a more accurate result than DECC-DG does. The curves in Fig. S1(f) in the supplementary material also show that DGLDPSO can converge to promising results on the very difficult nonseparable function $f_{20}$, and its early convergence speed is faster than that of most of the other algorithms. Overall, DGLDPSO generally converges faster than the compared large-scale optimization algorithms on these benchmark functions. This may be because the DGL strategy in DGLDPSO changes the group size dynamically to control the learning strength effectively and to find a potential balance between diversity and convergence speed.

C. Comparison Results on the Large-Scale Cloud Workflow Scheduling
In this experiment, DGLDPSO with ARS is applied to large-scale cloud workflow scheduling.
Before conducting the experiments, some extra parameters in the cloud workflow scheduling model must be set. First, for every type of resource r_j, we draw its processing capability PC_j from the uniform distribution Rand(1, 10) on [1, 10] and its cost per unit time from the normal distribution Normal(PC_j, 0.1) with mean PC_j and standard deviation 0.1, because a resource with good processing capability is often expensive. Second, for every task t_i, we draw its size s_i from Rand(10, 30) on [10, 30], while its execution time exetime on r_j is drawn from Normal(s_i/PC_j, 0.1). Third, the transfer time transfertime[i][c] from parent task t_i to its child t_c is calculated according to the size of task t_i and the bandwidth, where the bandwidth is set to 2000 in our experiment. Two evaluation criteria, success rate (SR) and MeanTEC, are used to evaluate the performance of DGLDPSO and the other algorithms. For a given MaxFEs and a deadline, SR denotes the percentage of successful runs out of all runs. Here, a successful run is a run in which the algorithm finds a feasible solution. MeanTEC is the average TEC over all successful runs, since the TEC is invalid when the algorithm cannot find a feasible solution within MaxFEs.
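The parameter settings above can be illustrated with a minimal sketch. The function name `sample_instance`, its arguments, and the seeding are hypothetical and used only for illustration. In particular, the transfer-time rule `s_i / bandwidth` is an assumption made here, since the text only states that the transfer time depends on the size of the parent task and the bandwidth.

```python
import random


def sample_instance(num_tasks, num_resources, bandwidth=2000.0, seed=0):
    """Draw one set of scheduling-model parameters (illustrative sketch)."""
    rng = random.Random(seed)
    # Processing capability PC_j ~ Rand(1, 10) for every resource r_j.
    pc = [rng.uniform(1.0, 10.0) for _ in range(num_resources)]
    # Cost per unit time ~ Normal(PC_j, 0.1): faster resources cost more.
    cost = [rng.gauss(p, 0.1) for p in pc]
    # Task size s_i ~ Rand(10, 30) for every task t_i.
    size = [rng.uniform(10.0, 30.0) for _ in range(num_tasks)]
    # Execution time of t_i on r_j ~ Normal(s_i / PC_j, 0.1).
    exetime = [[rng.gauss(s / p, 0.1) for p in pc] for s in size]
    # ASSUMPTION: transfer time from t_i to any child is s_i / bandwidth.
    transfer = [s / bandwidth for s in size]
    return pc, cost, size, exetime, transfer


pc, cost, size, exetime, transfer = sample_instance(num_tasks=20, num_resources=5)
```

A fixed seed is used so that the same instance can be regenerated for every compared algorithm; whether the original experiments did so is not stated.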
1) Comparison Results on the Scientific Workflows: We first test the performance of DGLDPSO on four widely used scientific workflows: CyberShake, LIGO, SIPHT, and Montage. The topology structures of these workflows are shown in Fig. 7. More details of these workflows can be found in [50].
We generate three test instances for each scientific workflow, where the numbers of tasks are 1000, 1500, and 2000, respectively, and the numbers of resources are 100, 150, and 200, respectively. For each test instance, we set three deadlines (loose, medium, and tight) to test whether a given algorithm can find a feasible solution within the MaxFEs. For the test instance with 1000 tasks, the deadlines are 200, 170, and 140, respectively. For the test instance with 1500 tasks, the deadlines are 300, 250, and 200, respectively. For the test instance with 2000 tasks, the deadlines are 400, 340, and 280, respectively.
The detailed comparison results over 30 runs are shown in Table II. For clarity, the best results are highlighted in boldface. Table II shows that as the number of tasks increases and the deadline becomes tighter, the performance of many algorithms deteriorates considerably, while that of DGLDPSO remains promising.
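As a minimal sketch, the SR and MeanTEC criteria could be computed from per-run outcomes as follows. The function name and the `(feasible, tec)` tuple layout are hypothetical; returning `None` when no run succeeds is a choice made here to reflect that the TEC is invalid in that case.

```python
def success_rate_and_mean_tec(runs):
    """Compute SR (in percent) and MeanTEC from a list of runs.

    Each run is a (feasible, tec) pair: `feasible` says whether a feasible
    schedule was found within MaxFEs, `tec` is the total execution cost.
    """
    # Only successful runs contribute to MeanTEC.
    successful = [tec for feasible, tec in runs if feasible]
    sr = 100.0 * len(successful) / len(runs)
    # MeanTEC is undefined when no run is successful.
    mean_tec = sum(successful) / len(successful) if successful else None
    return sr, mean_tec


# Example with four runs, one of which misses the deadline.
sr, mean_tec = success_rate_and_mean_tec(
    [(True, 10.0), (True, 20.0), (False, 99.0), (True, 30.0)]
)
```

In this example, three of the four runs are successful, so SR is 75% and MeanTEC is the average of 10, 20, and 30; the TEC of the failed run is ignored.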
For the CyberShake workflow, only DGLDPSO, GA-PSO, and DOGA can always find a feasible solution on all three test instances and within all three deadlines. Among them, DGLDPSO achieves the best MeanTEC.
For the LIGO workflow, which has a relatively simple topology structure, DGLDPSO, RNPSO, RDPSO, GA-PSO, and DOGA can all find a feasible solution on all three test instances and within all three deadlines. RNPSO performs even better than DGLDPSO on the instance with 2000 tasks, while DOGA achieves a smaller MeanTEC than DGLDPSO on the instance with 1000 tasks when deadline = 200. Even so, DGLDPSO outperforms all the other algorithms on the remaining test cases.
For the SIPHT workflow, GA-PSO achieves the smallest MeanTEC and outperforms DGLDPSO on the instances with 1000 and 2000 tasks. However, DGLDPSO still outperforms the other scheduling algorithms and performs the best on the instance with 1500 tasks.
For the Montage workflow, which has a relatively complicated topology structure, the performance of many compared algorithms is further weakened. MTV_PSO and DOGA achieve performance similar to that of DGLDPSO on the instances with 1500 and 2000 tasks, while DGLDPSO dominates almost all the other algorithms on the instance with 1000 tasks. The C.V values also indicate the better stability of DGLDPSO. Therefore, DGLDPSO achieves the best overall performance on the large-scale scientific workflows.
2) Comparison Results on the Randomly Generated Workflows: To make our test more convincing and comprehensive, we design Algorithm 4, which uses a stochastic mechanism similar to the one in [17], to randomly generate workflows with relatively complicated topological structures.
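For readers who want a concrete picture of random workflow generation, the following is a generic sketch of a random directed acyclic graph generator. It is not Algorithm 4 and does not reproduce the mechanism of [17]; the function name, the `max_parents` parameter, and the rule of choosing parents only among lower-indexed tasks are all assumptions made for illustration.

```python
import random


def random_workflow(num_tasks, max_parents=3, seed=0):
    """Return a dict mapping each task index to its list of parent tasks.

    ASSUMPTION: task c may only depend on tasks with a smaller index,
    which guarantees that the generated graph is acyclic.
    """
    rng = random.Random(seed)
    parents = {0: []}  # task 0 is the entry task and has no parent
    for c in range(1, num_tasks):
        # Pick between 1 and max_parents distinct parents among tasks 0..c-1.
        k = rng.randint(1, min(max_parents, c))
        parents[c] = sorted(rng.sample(range(c), k))
    return parents


workflow = random_workflow(num_tasks=50)
```

Because every edge points from a lower index to a higher one, the index order is already a valid topological order, which makes it easy to evaluate a schedule task by task.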
We randomly generate nine test instances to further illustrate the superiority of DGLDPSO in solving large-scale cloud workflow scheduling problems. The first three test instances T1-T3 have 1000 tasks and 100 resources. The next three test instances T4-T6 have 1500 tasks and 150 resources. The last three test instances T7-T9 have 2000 tasks and 200 resources. For the test instances with 1000 tasks, the deadlines are 2000, 1700, and 1400, respectively. For the test instances with 1500 tasks, the deadlines are 3000, 2500, and 2000, respectively. For the test instances with 2000 tasks, the deadlines are 4000, 3400, and 2800, respectively.
The detailed comparison results over 30 runs are shown in Table III. For clarity, the best results are highlighted in boldface. According to Table III, we can conclude the following.
For the first three test instances T1-T3 with 1000 tasks, only DGLDPSO, RNPSO, GA-PSO, and DOGA can always find a feasible solution within all the deadlines. However, DGLDPSO achieves the best MeanTEC compared with the other algorithms.
For the next three test instances T4-T6 with 1500 tasks, PSO, RNPSO, AIWPSO, MTV_PSO, and RDPSO all fail to find feasible solutions reliably when the deadline is tight. In particular, when deadline = 2000, PSO cannot find any feasible solution on T4. Although GA-PSO and DOGA achieve SR values similar to those of DGLDPSO, they are dominated by DGLDPSO in MeanTEC on T5 under every deadline.
For the last three test instances T7-T9 with 2000 tasks, the performance of the other compared algorithms is further weakened. When deadline = 2800, PSO, RNPSO, and MTV_PSO cannot find any feasible solution on T8. However, DGLDPSO still achieves better or at least comparable MeanTEC compared with the other algorithms on all three test instances.
Overall, across the 9 test instances and 27 cases, DGLDPSO performs better than PSO, RNPSO, AIWPSO, MTV_PSO, RDPSO, GA-PSO, and DOGA in 27, 21, 22, 25, 20, 25, and 24 cases, respectively. Conversely, RNPSO, AIWPSO, RDPSO, GA-PSO, and DOGA surpass DGLDPSO in only 1, 1, 1, 2, and 3 cases, respectively. PSO and MTV_PSO cannot surpass DGLDPSO in any case. The C.V results also show that the performance of DGLDPSO is more stable than that of the other compared algorithms. Therefore, DGLDPSO generally achieves the best performance on the large-scale randomly generated cloud workflow scheduling instances.
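The stability comparison relies on the C.V values. As a minimal sketch, and assuming that C.V denotes the coefficient of variation (standard deviation divided by mean) of the results over the runs, it could be computed as follows. The use of the sample standard deviation is also an assumption made here; the function name is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Return std / mean for a list of per-run results (e.g., TEC values).

    ASSUMPTION: the sample standard deviation (n - 1 denominator) is used.
    """
    mean = statistics.mean(values)
    return statistics.stdev(values) / mean


# A smaller value means the results vary less relative to their mean.
cv_stable = coefficient_of_variation([100.0, 101.0, 99.0])
cv_unstable = coefficient_of_variation([100.0, 150.0, 50.0])
```

Because the measure is normalized by the mean, it allows stability to be compared across instances whose TEC values differ in magnitude.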
3) Scalability of DGLDPSO: To investigate the scalability of DGLDPSO, we further randomly generate three test instances with 5000 tasks and 500 resources using Algorithm 4 and compare the performance of DGLDPSO with that of the other algorithms. The three deadlines are set to 11 000, 10 000, and 9000, respectively. The detailed experimental results on the 5000-D scheduling instances are given in Table IV. As we can see, the performance of GA-PSO and DOGA deteriorates greatly as the scale increases. Only DGLDPSO can find feasible solutions on all three test instances and within all three deadlines. Besides, DGLDPSO performs significantly better and achieves smaller MeanTEC than all the other scheduling algorithms in almost all of the nine cases. Moreover, the performance of DGLDPSO is more stable than that of the compared algorithms according to the C.V values. These results show that DGLDPSO maintains good performance when the scale increases to 5000.
From Tables II-IV, DGLDPSO shows its superiority and good scalability on both the scientific workflows and the randomly generated workflows. As the number of tasks increases and the deadline becomes tighter, the superiority of DGLDPSO becomes increasingly obvious. Therefore, DGLDPSO has promising performance not only on the large-scale benchmark functions but also on the large-scale regular scientific workflows, the complex randomly generated workflows, and even very large workflows with up to 5000 tasks. This may be because the multigroup distributed coevolution of DGLDPSO has very strong global search ability in large-scale optimization. Moreover, the ARS incorporated in DGLDPSO makes the index of a resource meaningful and the learning among particles clearer and more reasonable, helping to solve the large-scale cloud workflow scheduling problems efficiently.

D. Effects of ARS on the Large-Scale Cloud Workflow Scheduling
To investigate the effect of ARS, we consider a DGLDPSO variant without ARS, called DGLDPSO-noARS, and compare it with DGLDPSO on all nine randomly generated workflow instances, which are the same as those used in Section IV-C2. The detailed comparison results with respect to SR and MeanTEC between DGLDPSO and DGLDPSO-noARS are listed in Table S.IV in the supplementary material.
As we can see, DGLDPSO can always find a feasible solution on all nine test instances and under all the deadlines, while DGLDPSO-noARS fails to do so reliably when the deadline is tight, such as when deadline = 1400 on T1 and when deadline = 2000 on T4. Meanwhile, DGLDPSO always achieves better MeanTEC than DGLDPSO-noARS. In all, DGLDPSO dominates DGLDPSO-noARS in 25 cases, and the C.V values of DGLDPSO are generally smaller than those of DGLDPSO-noARS, which illustrates the effectiveness of the ARS.
To further investigate the applicability of ARS, we also adopt the ARS in all the other compared algorithms and term their variants with ARS as algorithm-ARS. For example, PSO with ARS is called PSO-ARS. Note that RNPSO with ARS is also called PSO-ARS. The results of these algorithms with and without the ARS are compared in Table S.V in the supplementary material. The results show that the performance of the PSO-based scheduling algorithms, including PSO, RNPSO, AIWPSO, MTV_PSO, and RDPSO, is greatly improved by using the ARS. This indicates that ARS can guide the flying of particles more clearly and make the learning among particles more reasonable. The performance of the GA-based scheduling algorithms, including DOGA and GA-PSO, is also improved by using the ARS, but the improvement is less obvious. This may be because the evolutionary operators in GA, including selection, crossover, and mutation, do not involve the learning and flying process. Therefore, ARS has relatively less influence on the GA-based scheduling algorithms. As the compared algorithms have been enhanced by ARS, we further compare the results obtained by their ARS variants with those obtained by DGLDPSO (which also uses ARS) in Table S.VI in the supplementary material. From Table S.VI, we can see that DGLDPSO still generally outperforms the ARS variants of the other algorithms.
Therefore, ARS is effective for cloud scheduling algorithms: it can be applied to other cloud scheduling algorithms to further improve their performance, especially the PSO-based ones.

V. CONCLUSION
This article develops a DGLDPSO for large-scale optimization and extends it with ARS for tackling large-scale cloud workflow scheduling. Three major novel designs are developed to improve the performance of the algorithm: 1) the master-slave multigroup distributed model; 2) the dynamic group learning strategy; and 3) the ARS.
Several groups are coevolved by using the master-slave multigroup distributed model, forming a DPSO framework, which can enhance the population diversity. Furthermore, the DGL strategy in DGLDPSO can control the learning strength effectively and can find a potential balance between diversity and convergence. Finally, the ARS can make the index of the resource and the searching process meaningful.
Based on these three novel designs, DGLDPSO can achieve a promising performance when dealing with the large-scale benchmark functions and the large-scale cloud workflow scheduling problems.