Parallel Enhanced Whale Optimization Algorithm for Independent Tasks Scheduling on Cloud Computing

Cloud computing has been imperative for computing systems worldwide since its inception. Researchers strive to utilize cloud resources efficiently so that workloads execute quickly while providing better quality of service. Among several challenges on the cloud, task scheduling is one of the fundamental NP-hard problems. Meta-heuristic algorithms are extensively employed to solve task scheduling as a discrete optimization problem, and several such algorithms have been developed. However, each has its own strengths and weaknesses. Local optima, poor convergence, high execution time, and limited scalability are the predominant issues among meta-heuristic algorithms. In this paper, a parallel enhanced whale optimization algorithm (PEWOA) is proposed to schedule independent tasks in the cloud with heterogeneous resources. The proposed algorithm improves solution diversity and avoids local optima using a modified encircling maneuver and an adaptive bubble net attacking mechanism. The parallelization technique keeps the execution time low despite the algorithm's internal complexity. The proposed algorithm minimizes the makespan while improving resource utilization and throughput. The experiments demonstrate the effectiveness of the proposed PEWOA against the best performing enhanced whale optimization algorithm (WOAmM) and Multi-core Random Matrix Particle Swarm Optimization (MRMPSO). The algorithm consistently produces better results with varying numbers of tasks on the GoCJ dataset, indicating better scalability. The experiments are conducted in CloudSim utilizing a variety of GoCJ and HCSP instances. Various statistical tests are also conducted to evaluate the significance of the results.


I. INTRODUCTION
Cloud computing has become a prime resource for a variety of applications including banking, healthcare, entertainment, and e-commerce. It provides numerous services on a pay-per-use basis to both users and applications by utilizing its computing, storage, and bandwidth resources [1].
The associate editor coordinating the review of this manuscript and approving it for publication was Nitin Gupta.
The three service models are Platform as a Service, Infrastructure as a Service, and Software as a Service [2], while the deployment models are private, public, community, and hybrid [3]. Cloud service providers offer different levels of service with specific Quality of Service (QoS) parameters to meet the varying needs and expectations of their users. Users demand better QoS, while cloud providers aim to provide scalable on-demand services by employing a minimum number of resources. Therefore, a guarantee is warranted in the form of a Service Level Agreement (SLA) to specify the QoS parameters for the provided service.
The philosophy of cloud computing is to provision the right number of resources as needed at a particular time, but datacenters often overprovision resources to avoid SLA breaches. These services can scale up and down dynamically according to user requirements. Users rent resources for a specific period by sending a list of tasks to the cloud; a broker sends the received tasks and the available virtual machines to the scheduler, which maps the tasks to virtual machines. The mapping produced by the scheduler plays a crucial role in the overall efficiency of the datacenter. Thus, providing a consistent mapping of tasks to virtual machines under varying workload is crucial for overall system scalability.
The scheduler is a critical factor in executing workloads swiftly while achieving optimal resource utilization [4]. Virtualization [5] of hardware resources is the core technology for cloud resource sharing, where multiple virtual machines are created on a single computing node to allow running multiple tasks from multiple users, as shown in Figure 1. Cloud data centers receive hundreds of thousands of tasks to run on virtual machines on a daily basis. The huge number of tasks and the large number of heterogeneous virtual machines make task scheduling an NP-hard problem [6]. To meet user needs with a minimum number of resources, the utilization of the employed resources should be increased [7]. Data collected over six months from more than 5000 cloud servers revealed that the servers were utilized at only 10-50% of their maximum capacity [3].
Traditional static task scheduling algorithms such as Round Robin (RR), Min-Min, and Max-Min provide optimal mapping of tasks to virtual machines, but they yield poor resource utilization and cannot be used widely in dynamic cloud environments [8]. On the other hand, dynamic task scheduling algorithms like SLA-RALBA [9], OG-RADL [10], and DRALBA [11] deal effectively with dynamic task scheduling on the cloud with better resource utilization; however, they are deterministic and feasible only when the numbers of tasks and virtual machines are below a certain threshold.
Heuristic-based solutions are tailored for specific problems and yield optimal results; however, for NP-hard problems, their viability is limited to problem sizes below a certain threshold. In contrast, meta-heuristic algorithms are problem-independent techniques that prove promising in areas where integer programming cannot cope with the sheer number of feasible solutions within an acceptable time frame [12]. These algorithms provide an acceptable solution within a reasonable amount of time with the help of their random search capability [13]. Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) are among the pioneering meta-heuristic algorithms that have been consistently used for a variety of optimization problems. Grey Wolf Optimization, Jaya, Firefly, and the Whale Optimization Algorithm (WOA) are some of the newer meta-heuristic algorithms.
The strength of any meta-heuristic algorithm resides in its ability to effectively explore and exploit the solution space; however, these techniques suffer from local optima and premature convergence along with high execution time. These issues can be addressed with a variety of methods that efficiently diversify the solution space and select the optimum solution in a reasonable amount of time. For a better trade-off between global and local search, the operations of meta-heuristic algorithms can be improved, and parallelism can be added to decrease the overall execution time. Parallelism can be employed to parallelize either the computations or the population. Meta-heuristic algorithms possess intrinsic parallelism in their operations, making them even more effective. There are three parallelization topologies: master-slave, coarse-grained, and fine-grained. In the master-slave topology, the master node distributes the workload to multiple slaves and is responsible for both communication and coordination among them. The coarse-grained topology, on the other hand, divides a program into multiple chunks with little synchronization and communication; it is also called the island or ring topology. Lastly, the fine-grained topology distributes a program into small, evenly sized tasks with a high level of synchronization and more communication links.
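As a minimal illustration of the master-slave topology described above, the following Python sketch lets a master distribute candidate solutions to a pool of worker threads and gather their scores. It is not taken from any of the cited algorithms: the `fitness` function (a sum of squares) and the name `master_evaluate` are hypothetical placeholders, and Python's standard thread pool merely stands in for the slave nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def fitness(candidate):
    # Hypothetical stand-in for an expensive fitness evaluation.
    return sum(x * x for x in candidate)


def master_evaluate(population, workers=4):
    # The master hands one candidate to each worker and collects the
    # scores in the same order as the population.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(fitness, population))
    best = min(range(len(population)), key=scores.__getitem__)
    return best, scores


population = [[3, 4], [1, 1], [2, 0]]
best_index, scores = master_evaluate(population)
```

Here all communication passes through the master, which is the defining trait of this topology; the coarse-grained and fine-grained variants would instead exchange information between the workers themselves.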
Traditional algorithms such as OG-RADL [10] and DRALBA [11] schedule tasks based on the notion of Earliest Finish Time (EFT). However, this approach hampers the resource utilization of a cloud server because faster machines are more occupied than others, which also deteriorates the completion time of all tasks, referred to as makespan. Similarly, priority-driven algorithms like PBFS [14] and SG-PBFS [15] favor shorter tasks to execute first, which also degrades resource utilization and the response time of bigger tasks. Meta-heuristic algorithms have proven promising for handling all tasks and resources without any preference for shorter or larger tasks or for slower or faster machines. They can optimize the resource utilization and makespan of a cloud server. These algorithms work through random operations that require a proper balance between exploration and exploitation. They provide good enough solutions within an acceptable time frame. In one of our previous studies [16], the Whale Optimization Algorithm (WOA) emerged as one of the new meta-heuristic algorithms that captured researchers' attention in the domain of task scheduling on cloud and fog computing. However, the exploration capability of WOA needs further refinement. In response, a Parallel Enhanced Whale Optimization Algorithm (PEWOA) is proposed in this study to schedule independent tasks on heterogeneous virtual machines on the cloud. It employs a modified encircling move and an adaptive bubble net attacking mechanism to perform global search and local search at the appropriate times. The algorithm conducts parallel computations for all whales, utilizing a model similar to the master-slave topology with minimal communication between master and slaves. The parallelism keeps the execution time low. The proposed algorithm shows substantial improvements in makespan, resource utilization, and throughput against WOAmM [17], RMPSO [18], MRMPSO [18], SAEA [19], and Genetic Algorithm using MapReduce framework (GAMR) [20]. A series of experiments are conducted in CloudSim using the two workload datasets of GoCJ and HCSP.
The main contributions of this work are:
1) Shortlisting the studies from the literature from 2019 to 2024 that investigate task scheduling on cloud computing in general and, in particular, using parallel meta-heuristic algorithms.
2) Proposing a Parallel Enhanced Whale Optimization Algorithm (PEWOA) utilizing multi-threading and improvements through a modified encircling move and an adaptive bubble net attacking mechanism.
3) Extensive simulations of the proposed PEWOA against WOAmM, RMPSO, MRMPSO, SAEA, and GAMR on two workload datasets of GoCJ and HCSP.
4) Analysis of the proposed algorithm against other algorithms in terms of minimizing the makespan, reducing response time, and increasing resource utilization and throughput in a time-efficient manner.
5) Conducting statistical tests, including assessments of standard deviation, the Friedman test, and the Wilcoxon test, to illustrate the significance of the results.
The rest of the paper is laid out as follows: Section II provides the related work on task scheduling on cloud computing in general, with emphasis on parallel meta-heuristic algorithms. The mathematical modeling is given in Section III. Section IV illustrates the proposed algorithm, while Sections V and VI present the workload selection and the experimentation and results, respectively. Finally, the conclusion is provided in Section VII.

II. RELATED WORK
The adoption of meta-heuristic algorithms for task scheduling in cloud computing has been on the rise. In large-scale heterogeneous environments, discovering the optimal solution incurs a high computation cost. Hence, pursuing a near-optimal solution within a reasonable timeframe emerges as a promising approach. The following are the related studies on task scheduling in cloud computing.
An enhanced Moth Search algorithm [21] improved makespan, throughput, and load balancing during task scheduling on the cloud. The algorithm was enhanced by employing differential evolution, phototaxis, and Lévy flight. Similarly, differential evolution was combined with Electre III for scheduling independent tasks in [22]. A nature-inspired Chaotic Squirrel Search Algorithm increased the velocity and convergence precision to efficiently schedule tasks on the cloud in [23]. The task scheduling problem was formulated as a multi-objective optimization problem. The early eco-system was developed through messy optimization to reduce expenses, prevent SLA violations, and minimize resource consumption. The proposed algorithm not only minimized makespan and energy consumption but also improved resource utilization and load balancing, and met deadlines.
A balanced distribution of resources is essential for minimum makespan and maximum resource utilization. OG-RADL, an Overall Performance-based Resource-aware Load-balancer, was proposed to schedule independent tasks in the cloud [10]. It successfully minimized the makespan and maximized the resource utilization and throughput in addition to better load balancing. However, the algorithm made decisions based on Earliest Finish Time (EFT) for all compute-intensive tasks, which made faster machines more occupied than others. Similarly, the Dynamic and Resource Aware Load Balanced Approach (DRALBA) scheduled independent tasks using a deterministic routine to optimize average resource utilization, throughput, and makespan using GoCJ and HCSP workloads. However, it also occupied the faster machines more than others as the workload increased [11].
The authors in [24] used the Dragonfly algorithm, the Biogeography-based algorithm, and the Mexican Hat Wavelet to reduce both execution time and response time during task scheduling. These three techniques successfully prevented premature convergence of the solution space and minimized SLA violations. The combination of the Biogeography-based algorithm and the Mexican Hat Wavelet Transform introduced a mutation operation to assist the Dragonfly algorithm in avoiding local optima. However, in the given scenario, the mutation operation of a traditional genetic algorithm might be more beneficial. In [25], an adaptive regressive Holt-Winters algorithm was utilized to predict bursty or normal workload. Subsequently, the Firefly algorithm with a lottery approach was applied to optimize the scheduling process, enhancing resource utilization and load balancing and minimizing energy consumption. However, the study did not state the nature of the tasks. Another study, based on modified Henry gas solubility optimization, improved makespan and execution cost during task scheduling [26]. Yet, the impact of the improved makespan on resource utilization and throughput was not discussed. In [27], the authors proposed two scheduling algorithms for independent deadline-sensitive tasks. The first algorithm employed a greedy approach based on a linear weighted sum. The second algorithm used Ant Colony Optimization, a positive feedback mechanism, and heuristic search. The proposed algorithms minimized energy consumption and makespan.
In [18], independent task scheduling was formulated with budget constraints and addressed using two parallelized PSO algorithms. The PSO was initially enhanced using a random integer matrix (RMPSO), followed by two parallel variants of RMPSO: one based on a multi-core system with shared memory (MRMPSO) and one based on a many-core GPU system (GRMPSO). GRMPSO outperformed MRMPSO in decreasing the total cost and the running time of the algorithm. GRMPSO used fine-grained GPU threads to accelerate the computations of the RMPSO particles. During the experiments, the number of threads varied from 2 to 12 for OpenMP and from 4 to 20 for CUDA.
In another study, a parallel Squirrel Search Algorithm (SAEA) combined with fuzzy logic optimally scheduled independent tasks on the cloud to minimize makespan, degree of imbalance, security threats, and energy cost under high load conditions [19]. The population was divided into subgroups to evolve independently. After a specific number of iterations, the best squirrels were placed in the next subpopulation by replacing the worst squirrels. Fuzzy logic was used to calculate the fitness of each squirrel based on total execution time, makespan, energy cost, degree of imbalance, and security value. However, the communication strategy in SAEA was fixed. The population of squirrels was divided into ten sub-populations to facilitate the convergence of separate groups of squirrels. Subsequently, the best squirrels were migrated to the next sub-populations to increase the exploration of the search space and avoid local optima.
Task prioritization poses a bottleneck in exploration-based scheduling approaches that use various techniques for prioritizing tasks, resulting in increased execution times. Prioritizing tasks based on the shortest execution time was deemed appropriate. To address this challenge, a parallel Genetic Algorithm using MapReduce (GAMR) was proposed for cloud workflow scheduling, incorporating different priority queues to reduce the makespan [20]. In the first phase, the GA and the earliest finish time approach assigned tasks to processors, followed by using GA with MapReduce to assign jobs to processors in a heterogeneous cloud environment. GAMR outperformed PSO, WOA, Moth-Flame Optimization (MFO), and Intelligent Water Drops (IWD). Nonetheless, only the mutation operation was parallelized in the proposed algorithm.
The unpredictable nature of workload on cloud servers is a major cause of reduced resource utilization and efficiency. A task scheduling strategy based on binary JAYA was implemented in [28] to alleviate these issues. It not only increased the resource utilization but also reduced the energy consumption and minimized the makespan. In the first stage, tasks were evenly distributed on virtual machines; subsequently, the proposed JAYA algorithm was executed for the best possible matchmaking among tasks and virtual machines. Both independent and dependent tasks were simulated in experiments to reduce both the makespan and energy consumption, improve load balancing, and maximize the resource utilization. However, the proposed algorithm was evaluated against the old versions of Genetic Algorithm, Particle Swarm Optimization, and Round Robin.
A Hybrid Firebug and Tunicate Optimization (HFTO) algorithm optimized makespan, response time, and fault tolerance [29]. The proposed algorithm offered an enhanced searching capability with faster convergence. HFTO is a preemptive technique that assigns smaller tasks to virtual machines with peak load, while assigning bigger tasks to machines with lower CPU utilization. It improved makespan, average execution time, and load balancing among the machines. The task preemption also improved both the execution time and response time.
Another scheduling technique, the Johnson Sequencing algorithm, was originally used in manufacturing units. In [30], the Johnson Sequencing algorithm was adapted using a three-step approach for task scheduling in cloud computing across three servers, minimizing the completion time of all tasks. First, a precedence constraint graph was developed to identify dependencies among jobs. Second, the jobs were assigned to servers, followed by employing Johnson Sequencing to determine the best ordering of the jobs on each server. The proposed Johnson Sequencing algorithm minimized makespan and improved resource utilization in addition to exhibiting better scalability. However, the scalability analysis was based on a limited number of jobs during simulations. In [31], a hybrid algorithm based on Genetic Algorithm and Gravitational Emulation Local Search (GELS) was developed, minimizing makespan and increasing resource utilization while scheduling tasks in the cloud. However, the comparative analysis only included the primitive versions of GA and PSO.
Task schedulers based on priority rules struggle to meet user satisfaction. To tackle this problem, Priority Based Fair Scheduling (PBFS) was presented in [14] to minimize the makespan, flow time, and total tardiness. However, only two of the nineteen GoCJ dataset instances were utilized during simulations. In continuation of this study, the PBFS algorithm was improved by proposing Shortest Gap-PBFS (SG-PBFS), a backfilling technique utilizing gaps in the job schedule [15]. The proposed algorithm outperformed other Shortest Gap based algorithms such as SG-SJF, SG-LJF, and SG-(Max-Min) in terms of minimizing makespan and missed deadlines and reducing both delays and flow time. However, the nature (homogeneous or heterogeneous) of the virtual machines was not stated. Moreover, SG-PBFS favors shorter jobs to execute on a priority basis, which may result in lower resource utilization. The experiments did not use all the instances of the GoCJ workload.
In the literature, there are abundant studies based on parallel meta-heuristic algorithms for task scheduling, but the keyword "Parallel" refers to parallelism from two different perspectives. One involves parallelism among tasks' execution, while the other entails parallel execution of the scheduling algorithm. Most of the studies primarily focus on the first interpretation, which involves running multiple tasks in parallel after the scheduling decision has been made. However, this work focuses on the parallel execution of a meta-heuristic algorithm for independent task scheduling on the cloud. Therefore, [18], [19], and [20] represent the most relevant studies found in the literature that are considered for comparative analysis with the proposed Parallel Whale Optimization Algorithm. Furthermore, [18] and [19] dealt with independent task scheduling, while [20] addressed the scheduling of dependent tasks. Similarly, [18] and [20] parallelized the computations performed by the agents, while [19] parallelized the sub-populations of agents. The proposed PEWOA also parallelizes the computations performed on all whales in the population. Table 1 presents the summary of the related studies.

III. MATHEMATICAL MODELING
A cloud data center contains hundreds of host machines that provide various types of resources to end users. Each host machine often hosts thousands of dynamically generated virtual machines [32]. Similarly, multiple hosts can collectively generate a single virtual machine [33]. Cloud providers offer different types of virtual machines with various performance and pricing specifications. This paper addresses the allocation of virtual machines to incoming independent tasks. It is assumed that each task will run on a single virtual machine and cannot be divided. Task scheduling with heterogeneous resources is a combinatorial optimization problem, where the tasks and virtual machines can be expressed as Eq. (1) and Eq. (2) respectively.
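The displayed forms of Eq. (1) and Eq. (2) are not reproduced in this text. A plausible rendering, assuming n tasks and m virtual machines (the symbol n is an assumption introduced here for illustration), would be:

```latex
\begin{align}
T  &= \{t_1, t_2, \dots, t_n\} \tag{1} \\
VM &= \{vm_1, vm_2, \dots, vm_m\} \tag{2}
\end{align}
```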
The set T contains the number of instructions for each task, while VM represents a set of virtual machines with compute capacities in Millions of Instructions Per Second (MIPS). Generally, the number of tasks is greater than the number of vms. The sets T and VM serve as inputs for a scheduling algorithm, and an optimized mapping of all tasks over the set of vms presents the final solution, expressed in the form of a map as shown in Eq. (3). In the solution map, the first item (task) of every tuple is unique, while the second item (vm) can be repeated. Each task is allocated only one vm, whereas a vm can have multiple tasks mapped to it. The Execution Time (ET) of task i on vm j can be computed using Eq. (4).
$$ET_{t_i, vm_j} = \frac{\text{No. of instructions in } t_i}{\text{MIPS of } vm_j} \quad (4)$$
It is assumed that each virtual machine will execute multiple tasks in a specific order without preemption. The Completion Time (CT) of all assigned tasks on a specific vm is expressed as Eq. (5). Here, faster machines will have shorter completion times than slower machines. In a meta-heuristic algorithm, the agents are manipulated in various ways before their fitness is evaluated; therefore, the tasks assigned to a vm keep changing during the execution of the algorithm. If task x on a vm must be replaced with task y, the completion time of all tasks on that vm is updated through Eq. (6). One of the important factors in task scheduling is the makespan, which is the completion time of all tasks on a given set of virtual machines. It is represented by Eq. (7) [10]. The unit used for execution time, completion time, and makespan in this paper is seconds. As it is beneficial to use a resource in its entirety before employing another instance on the cloud, a higher resource utilization is favorable during task scheduling. The Average Resource Utilization (ARU) of a host machine is computed using Eq. (8) [10]: the sum of the completion times of all vms is divided by the number of vms (m), and the resulting value is then divided by the makespan.
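The displayed forms of Eq. (5) to Eq. (8) are not reproduced in this text. Based only on the verbal definitions above, they could plausibly be written as follows; the notation (for example, writing the tasks mapped to a vm as a sum index) is an assumption and may differ from the original typesetting:

```latex
\begin{align}
CT_{vm_j} &= \sum_{t_i \,\text{mapped to}\, vm_j} ET_{t_i, vm_j} \tag{5} \\
CT_{vm_j}^{new} &= CT_{vm_j} - ET_{t_x, vm_j} + ET_{t_y, vm_j} \tag{6} \\
Makespan &= \max_{1 \le j \le m} CT_{vm_j} \tag{7} \\
ARU &= \frac{\frac{1}{m} \sum_{j=1}^{m} CT_{vm_j}}{Makespan} \tag{8}
\end{align}
```

Under this reading, ARU equals 1 only when every vm finishes at the same time as the slowest one, which is why an even spread of load raises utilization.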
The efficiency of a system can be expressed in terms of throughput, which is the number of tasks executed per unit time. It is given by Eq. (9) [11]:
$$Throughput = \frac{\text{Total No. of tasks}}{Makespan} \quad (9)$$
The unit of throughput is the number of tasks completed per second.
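To make the metrics concrete, the following Python sketch evaluates a small schedule using the verbal definitions of execution time, completion time, makespan, ARU, and throughput given above. The task lengths (taken here to be in millions of instructions so that dividing by MIPS yields seconds), the vm ratings, and the function names are illustrative assumptions, not values or code from the experiments.

```python
def execution_time(length_mi, mips):
    # Eq. (4): instructions of the task divided by the vm's MIPS rating.
    return length_mi / mips


def evaluate(schedule, task_lengths, vm_mips):
    # schedule maps task id -> vm id; a task appears once, a vm may repeat.
    completion = {vm: 0.0 for vm in vm_mips}
    for task, vm in schedule.items():
        completion[vm] += execution_time(task_lengths[task], vm_mips[vm])
    makespan = max(completion.values())
    # Mean completion time over all vms, normalized by the makespan.
    aru = sum(completion.values()) / len(vm_mips) / makespan
    throughput = len(schedule) / makespan
    return completion, makespan, aru, throughput


# Assumed example data: lengths in million instructions, ratings in MIPS.
task_lengths = {"t1": 4000, "t2": 2000, "t3": 1000, "t4": 3000}
vm_mips = {"vm1": 1000, "vm2": 500}
schedule = {"t1": "vm1", "t2": "vm2", "t3": "vm2", "t4": "vm1"}
completion, makespan, aru, throughput = evaluate(schedule, task_lengths, vm_mips)
```

In this example vm1 finishes after 7 s and vm2 after 6 s, so the makespan is 7 s, the ARU is about 0.93, and the throughput is about 0.57 tasks per second.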
After the scheduling decision, the time a vm takes to start executing a task is referred to as Response Time (RT). Multiple vms often share the same physical host, and multiple tasks can run on a single vm. Eq. (10) [10] represents the average response time of all tasks on a set of vms. The sum of the execution start times of all tasks is divided by the total number of tasks to yield the average response time of a single vm. Then the sum of the average response times of all vms is divided by the total number of vms.
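The two-step averaging described above can be sketched in Python as follows. This is one possible reading, under stated assumptions: tasks on a vm run back to back without preemption starting at time zero, the per-vm average divides by the number of tasks on that vm, and a vm without tasks contributes zero. The function name and the example data are hypothetical.

```python
def average_response_time(vm_queues, task_lengths, vm_mips):
    # vm_queues maps vm id -> ordered list of task ids run without preemption.
    total = 0.0
    for vm, queue in vm_queues.items():
        clock, starts = 0.0, []
        for task in queue:
            starts.append(clock)  # a task starts when the previous one ends
            clock += task_lengths[task] / vm_mips[vm]
        if starts:
            total += sum(starts) / len(starts)  # average start time on this vm
    return total / len(vm_queues)  # average over all vms


# Assumed example data: lengths in million instructions, ratings in MIPS.
task_lengths = {"t1": 4000, "t2": 2000, "t3": 1000, "t4": 3000}
vm_mips = {"vm1": 1000, "vm2": 500}
vm_queues = {"vm1": ["t1", "t4"], "vm2": ["t2", "t3"]}
art = average_response_time(vm_queues, task_lengths, vm_mips)
```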
Table 2 lists the notations used in the equations and the pseudocode of the proposed algorithm.

IV. PROPOSED PARALLEL ENHANCED WHALE OPTIMIZATION ALGORITHM
The proposed algorithm is an enhanced version of the Whale Optimization Algorithm (WOA) [34]. WOA was formulated as a population-based meta-heuristic algorithm inspired by humpback whales. In this approach, several whales serve as agents, each representing a prospective solution to the optimization problem. A group of whales is aware of the prey's location and employs a hunting strategy called bubble net feeding. It involves two types of maneuvers: a circular movement and a shrinking move that reduces the circumference of the circle, as illustrated in Figure 2. The whales also release air bubbles that ascend to the top of the seawater. Several whales start these maneuvers, gradually ascending to the sea's surface while simultaneously shrinking the circle and bubbling, effectively trapping the prey (a school of fish or krill) in a confined area.
The shrinking encircling mechanism facilitates exploitation of the solution space and is governed by a variable A using Eq. (11) and Eq. (12).
Here, r1 is a random number, whereas a decreases linearly from 2 to 0, which shrinks the circle around the prey. Similarly, A is a random value in the range [-a, a]. If the absolute value of A is greater than or equal to 1, a new random whale position (nRWPos) in the search space is selected using Eq. (13). Conversely, if the absolute value of A is less than 1, a new position is computed from the best whale's position by Eq. (14).
There is another random variable p in the range [0,1] that represents the probability of using bubble net feeding. If p is greater than or equal to 0.5 (50% probability), the bubble net attack is triggered to update the whale's position (nWPos) according to Eq. (15); otherwise, the whale keeps shrinking the circle according to Eq. (14), as shown in Figure 2.
l is also a random number in the range [−1,1], while b is a constant defining the shape of the spiral. In our proposed algorithm, the value of b ranges from 1 to 2.5 to determine the shape of the logarithmic spiral. If the new whale position (nWPos) is outside the solution space, it is assigned a random position.
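The displayed forms of Eq. (11) to Eq. (15) are not reproduced in this text. For orientation, the update rules of the standard WOA [34] are commonly written as below. Matching each row to the equation numbers used here is an assumption, as are the symbols X(t) for the current whale position, X* for the best whale, X_rand for a random whale, t for the current iteration, and t_max for the maximum number of iterations.

```latex
\begin{align}
% control parameter and coefficients (presumably Eq. (11) and Eq. (12))
a &= 2 - \frac{2t}{t_{max}}, \qquad A = 2\,a\,r_1 - a, \qquad C = 2\,r_2 \\
% exploration around a random whale (presumably Eq. (13))
D_{rand} &= \lvert C \cdot X_{rand} - X(t) \rvert, \qquad X(t+1) = X_{rand} - A \cdot D_{rand} \\
% shrinking encircling around the best whale (presumably Eq. (14))
D &= \lvert C \cdot X^{*}(t) - X(t) \rvert, \qquad X(t+1) = X^{*}(t) - A \cdot D \\
% spiral bubble net update (presumably Eq. (15))
D' &= \lvert X^{*}(t) - X(t) \rvert, \qquad X(t+1) = D' \, e^{bl} \cos(2\pi l) + X^{*}(t)
\end{align}
```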
After computing the new position of a whale, its fitness is calculated. The process of updating the whales' positions continues until the maximum number of iterations is completed. In every iteration, the best whale is selected and kept in memory.
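The position update of the standard WOA described above can be sketched for a single dimension of one whale as follows. This is an illustrative Python sketch of the baseline WOA rules, not the enhanced PEWOA: the function names are hypothetical, and rounding a continuous position to the nearest vm index is an assumption about how positions are discretized.

```python
import math
import random


def woa_step(x, best, rand_whale, a, b=1.0, rng=random):
    # One standard WOA update for a single dimension of one whale.
    r1, r2, p = rng.random(), rng.random(), rng.random()
    A = 2 * a * r1 - a  # random value in [-a, a]
    C = 2 * r2          # random value in [0, 2]
    if p < 0.5:
        if abs(A) >= 1:
            # Exploration: move relative to a randomly chosen whale.
            return rand_whale - A * abs(C * rand_whale - x)
        # Exploitation: shrink the circle around the best whale.
        return best - A * abs(C * best - x)
    # Bubble net attack: logarithmic spiral around the best whale.
    l = rng.uniform(-1, 1)
    return abs(best - x) * math.exp(b * l) * math.cos(2 * math.pi * l) + best


def to_vm_index(pos, n_vms, rng=random):
    # Assumed discretization: round to the nearest vm index; a position
    # outside the solution space is replaced by a random vm index.
    idx = int(round(pos))
    return idx if 0 <= idx < n_vms else rng.randrange(n_vms)
```

A call such as `to_vm_index(woa_step(2.0, 4.0, 1.0, a=1.5), n_vms=5)` would then give the vm index assigned to one task of one whale for the next iteration.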
Due to the limited exploration ability of the whale optimization algorithm, a modified encircling maneuver and an adaptive bubble net attacking mechanism are proposed in addition to parallelism. If the solution space contains local optima, WOA tends to become trapped in them. Unlike continuous optimization problems, combinatorial optimization faces a narrow search space with a high probability of local optima. The proposed enhancements enable PEWOA to move out of local optimum regions. The shrinking encircling mechanism is enhanced using Eq. (16), which replaces Eq. (11).
The proposed change in Eq. (16) increases the solution diversity by generating values of a in the range [1,3], which improves the exploration potential of PEWOA. Further, Eq. (17) modifies the coefficient of the spiral updating mechanism, altering the shape of the logarithmic spiral during the bubble net attacking mode.
Eq. (17) allows different shapes of the logarithmic spiral for whales, thus enabling a better balance between exploration and local search. A lower value of b favors global search, while a higher value exploits the best known solutions in the search space. Eq. (17) keeps the value of b below 1 for roughly 60% of the time, while it reaches up to 3 in the later stages to refine the existing solutions.
In the proposed PEWOA, a separate thread from the Java Executor Framework [35] is allocated to the encircling and bubble net attacking maneuvers of each whale, using a distinct set of values for faster convergence. PEWOA utilizes a master-slave parallelization model with multiple threads and a shared memory to explore different regions of the solution space. It requires minimal communication among threads and between master and slave nodes, as depicted in Figure 4.
Unlike the master-slave model used in MRMPSO [18], the threads in PEWOA do not send their results back to a master node; instead, every thread updates a shared memory that stores the global best solution. If any thread is stuck in a local optimum, the others can still explore the solution space and converge to the global optimal solution. The variables and maps shared among the whales are accessed via locks to ensure data consistency. The Whale Optimization Algorithm, in general, involves a number of complex operations, making it a computation-intensive procedure. Therefore, a parallelization strategy is valuable to reduce the execution time of PEWOA. Algorithm 1 presents the pseudocode of the proposed algorithm.
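The coordination pattern described above, in which worker threads publish improvements to a lock-protected global best instead of reporting to a master, can be sketched in Python as follows. This is only an illustration of the pattern: the class and function names are hypothetical, random schedules stand in for the whale maneuvers, and Python threads are used for brevity although the proposed algorithm relies on the Java Executor Framework (CPU-bound Python threads would not yield the same speed-up).

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedBest:
    # Global best schedule shared by all whale threads, guarded by a lock.
    def __init__(self):
        self._lock = threading.Lock()
        self.makespan = float("inf")
        self.schedule = None

    def offer(self, makespan, schedule):
        with self._lock:  # only one thread may update the best at a time
            if makespan < self.makespan:
                self.makespan, self.schedule = makespan, dict(schedule)


def makespan_of(schedule, lengths, mips):
    load = [0.0] * len(mips)
    for task, vm in schedule.items():
        load[vm] += lengths[task] / mips[vm]
    return max(load)


def whale_worker(seed, lengths, mips, shared, tries=200):
    # Placeholder search: random schedules stand in for the whale maneuvers.
    rng = random.Random(seed)
    for _ in range(tries):
        schedule = {t: rng.randrange(len(mips)) for t in range(len(lengths))}
        shared.offer(makespan_of(schedule, lengths, mips), schedule)


# Assumed example data: lengths in million instructions, ratings in MIPS.
lengths, mips = [4000, 2000, 1000, 3000], [1000, 500]
shared = SharedBest()
with ThreadPoolExecutor(max_workers=4) as pool:  # one thread per whale
    for seed in range(4):
        pool.submit(whale_worker, seed, lengths, mips, shared)
```

Because each worker only touches the shared state inside `offer`, a thread that stalls in a poor region does not block the others from improving the global best.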
Sets of tasks (T) and virtual machines (VM) serve as the inputs for PEWOA, while the final schedule is represented as a hashmap with T as keys and VM as values. In lines (1)-(4), hashmaps are declared for whales (wMap), virtual machines (vmMap), best whales (bWsMap), and the global best whale (gBWMap). The number of whales and the maximum number of iterations are specified in line 5. The best whale makespan (bWMk) and the global best value (gBValue) are initialized with maximum values in line 6. In line 7, all maps are initialized.
A while loop spanning lines 8 to 52 iterates 220 times. At line 9, a pool of threads is created based on the population size of whales. A for loop over the whales begins at line 10 and continues until line 52, implementing various changes for each whale. The shrinking encircling parameter, ranging from 3 to 1, is updated in each iteration (line 11), followed by the declaration of three random numbers r1, r2, and p (probability) in the range [0,1] (line 12). At lines 13-14, A and C are declared as coefficients with ranges [−3,3] and [0,2] respectively, while l is a variable in the range [−1,1] (line 15). b ranges from 0 to 3 (line 16), and the variables Drand, D, D′, nRWPos, nWPos, and randVm are initialized to zero (line 17). The description of these variables is given in Table 2.
A for loop (line 18) iterates over all tasks in a whale to manipulate the assignment of virtual machines for each task. At lines 19 and 20, the current task and vm are assigned to the cloudlet and vm variables respectively. Lines 21 to 32 present an if construct based on the value of p. If p is less than 0.5 and the absolute value of A in a nested if statement (line 22) is greater than or equal to 1, the distance is computed from the current whale position to a random whale position (line 23). Based on this random distance, the new whale position is calculated at line 24 using A. If p is less than 0.5 and the absolute value of A is less than 1, in the else part of the second if clause (line 25), the distance is computed from the current whale to the best whale found so far (line 26). At line 27, the new whale position is computed using the best whale's position and A. The nested if clause ends at line 28. If p is greater than or equal to 0.5, in the else part of the first if clause, the algorithm enters the bubble net attacking mode (line 29). Now the distance of the current whale from the best whale is calculated without using the value of C (line 30). The new whale position is computed using the location of the best whale, the corresponding distance (D′), and a spiral maneuver (e^(bl) * cos(2 * π * l)) at line 31. The if statement at line 21 ends at line 32.
Another if statement (lines 33-35) checks the new whale position. If it is outside the solution space, a random position is assigned to the whale. Based on the new whale position, the relevant vm is selected from the list of virtual machines (line 36). The execution time of the task on the already assigned vm is calculated and saved in oLd (line 37). At line 38, the execution time of the task on the new vm is computed and stored in nLd. A function update_vmMap() propagates the new load of the task by adding and removing the execution times on the two virtual machines (line 39). The whales map is updated at line 40 and the loop ends at line 41. The code from line 42 to 51 can only be executed by a single thread at a time. At line 43, the best whale's makespan is calculated using the getPBMap() function. If the new makespan is less than the whale's old makespan (line 44), the best whales map is updated with the new fitness value (line 45). At line 46, if the newly calculated makespan is less than the global best makespan, the global best is also assigned the new makespan value (line 47). The new global best whale is kept in memory at line 48 as the final solution. Lines 49 to 52 terminate the nested if statements and the for loop started at line 18 respectively.
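Two ideas from this walkthrough lend themselves to a short sketch: moving one task's execution time between two virtual machines, and guarding the global best update so that only one thread runs it at a time. The names `move_task`, `GlobalBest`, and `evaluate` are hypothetical, and the lock-based critical section is an assumption about how the single-thread region could be realized, not a description of the authors' code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def move_task(vm_load, task_length, old_vm, new_vm, vm_mips):
    """Shift one task's execution time from old_vm to new_vm, in place."""
    old_time = task_length / vm_mips[old_vm]   # plays the role of oLd
    new_time = task_length / vm_mips[new_vm]   # plays the role of nLd
    vm_load[old_vm] -= old_time
    vm_load[new_vm] += new_time


class GlobalBest:
    """Holds the best makespan seen so far; updates are serialized by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.makespan = float("inf")
        self.mapping = None

    def offer(self, makespan, mapping):
        with self._lock:  # only one thread at a time enters this block
            if makespan < self.makespan:
                self.makespan = makespan
                self.mapping = list(mapping)


def evaluate(best, vm_load, mapping):
    # The makespan of a candidate is the load of its busiest VM.
    best.offer(max(vm_load), mapping)


best = GlobalBest()
candidates = [
    ([5.0, 2.0], [0, 0, 1]),
    ([3.0, 3.0], [0, 1, 1]),
    ([4.0, 1.0], [1, 0, 0]),
]
with ThreadPoolExecutor(max_workers=3) as pool:
    for load, mapping in candidates:
        pool.submit(evaluate, best, load, mapping)
```

Updating the two affected VM loads incrementally avoids recomputing every VM's load after each reassignment.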
The time complexity of an algorithm is fundamental to evaluating its practicality in an elastic cloud environment. For PEWOA, it is computed as $O(T (NW \cdot D) + FitFun \cdot NW)$, which is the same as WOAmM. Here, T is the number of iterations, NW is the number of whales in the population, D is the dimension of the problem, and FitFun is the cost of evaluating the fitness function. However, WOAmM involves a greater number of operations than PEWOA. Although all whale optimization algorithms have a higher inherent complexity, the parallelization in PEWOA makes its execution time much lower.

V. WORKLOAD DATASETS
The proposed PEWOA and other comparative algorithms are assessed using the following two datasets.

A. GOOGLE CLOUD JOBS (GOCJ)
The Google Cluster Traces [36] is a real time log of workloads that ran on a Google Borg cluster comprising 12.5k machines. The trace covers information such as the submission time, scheduling information, and usage of resources. However, it does not provide the size of jobs or their deadlines. The trace contains data for 25 million tasks grouped into 650 thousand jobs over a span of 29 days [37]. All the data provided in the trace is normalized and obfuscated to avoid disclosing confidential information. Some jobs (0.003%) are omitted from the trace as they ran on nodes that are not part of this trace. Some task and job events, 0.013% and 0.0008% respectively, have non empty missing fields. Moreover, data is missing for an average of 0.05% of job/task scheduling events and less than 1% of resource usage records.
It is tedious and infeasible to utilize such an enormous number of tasks in simulations, especially when faced with limited resources. Therefore, the Google Cloud Jobs (GoCJ) dataset is adopted, which is derived from Google Cluster Traces 2011 [36] using bootstrapped Monte Carlo (MC) simulation [38]. GoCJ is a realistic dataset that reflects the workload behavior of Google Cluster Traces, as asserted by [39], [40], [41], [42], and [43], and by the analysis of the MapReduce logs from the M45 supercomputing cluster in [44].
In bootstrapping, instead of choosing values at random, the original dataset is repeatedly sampled by selecting a single data point from it at a time. A list of 50 different sized jobs from the original dataset is input into the MC bootstrapping with equal probability, considering an average computing power of 1000 MIPS for machines. There is a covariance of 2.49 between the original and average GoCJ datasets. Figure 5 shows the comparison of data distribution for the Original Dataset (O-Dataset) and the 19 GoCJ instances.
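The resampling step described above can be illustrated with a few lines of Python. This is only a sketch of bootstrap sampling with replacement and equal probability; the function name `bootstrap_jobs` and the job sizes in `original_sizes` are hypothetical placeholders, not values from the GoCJ dataset.

```python
import random


def bootstrap_jobs(original_sizes, num_jobs, seed=None):
    """Draw num_jobs sizes from original_sizes, one at a time, with replacement."""
    rng = random.Random(seed)
    return [rng.choice(original_sizes) for _ in range(num_jobs)]


# Hypothetical job sizes standing in for the original list of 50 jobs.
original_sizes = [15000, 59000, 101000, 150000, 525000, 900000]
instance = bootstrap_jobs(original_sizes, num_jobs=100, seed=42)
```

Because every draw picks one existing data point, each generated instance contains only sizes that occur in the original list.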
In the context of GoCJ, the terms ''job'' and ''task'' are used interchangeably, both referring to an independent set of instructions. The distinction between these terms becomes essential when there are dependencies among tasks. Furthermore, it is important to note that the granularity of tasks is higher than that of jobs.
The median of all datasets falls within the range of 870000-970000 MIPS. Similarly, the ranges for the first quartile and third quartile are 610000-670000 and 115000-112000 MIPS respectively. The minimum and maximum sizes of jobs in all the datasets are also the same. The size of jobs is calculated using the expected time to completion figures in the original dataset, as per Eq. (18).

$$\text{Size of Job (MIPS)} = \text{Machine (MIPS)} \times \text{ETC} \tag{18}$$
The ratio of different categories of jobs with the corresponding ranges of instructions in GoCJ is shown in Figure 4. Medium sized jobs (40%) clearly constitute the highest percentage, followed by large (30%) and small (20%) jobs. Figure 5 illustrates that the extra-large (6%) and huge (4%) sized jobs account for the smallest proportions.
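As a small worked illustration of Eq. (18) and of the category mix quoted above, the sketch below computes a job size from a machine speed and an expected time to completion, and draws job categories with the stated percentages as weights. The function names are hypothetical, and treating ETC as a value in seconds is an assumption.

```python
import random

# Category shares in percent, as quoted in the text.
CATEGORY_SHARE = {"small": 20, "medium": 40, "large": 30, "extra-large": 6, "huge": 4}


def job_size(machine_mips, etc_seconds):
    """Eq. (18): job size as machine speed times expected time to completion."""
    return machine_mips * etc_seconds


def sample_categories(num_jobs, seed=None):
    """Draw job categories in proportion to CATEGORY_SHARE."""
    rng = random.Random(seed)
    names = list(CATEGORY_SHARE)
    weights = list(CATEGORY_SHARE.values())
    return rng.choices(names, weights=weights, k=num_jobs)


size = job_size(1000, 15)            # 1000 MIPS machine, ETC of 15 -> 15000
categories = sample_categories(200, seed=1)
```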

VOLUME 12, 2024
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

B. HETEROGENEOUS COMPUTING SCHEDULING PROBLEM (HCSP)
The Heterogeneous Computing Scheduling Problem dataset [45] is based on the notion of Expected Time to Compute (ETC) for tasks in a heterogeneous environment. The workload treats each task as an atomic and non-preemptive unit. Additionally, it assumes that the execution time of a task varies from machine to machine, and the aim is to minimize the makespan of tasks. The dataset offers three types of instances (small, medium, and large) based on the size and complexity of tasks and virtual machines. In this paper, the small HCSP instance is utilized, which has 1024 tasks and 32 virtual machines.
HCSP uses a notation of c/i_heterogeneity for tasks (hi/lo) and VMs (hi/lo). The ''c'' and ''i'' stand for consistency and inconsistency respectively. Heterogeneity for tasks and VMs can be either ''hi'' or ''lo''. Low heterogeneity (lo) signifies similar computing resources, while high heterogeneity (hi) indicates a wide range of computing machines. Similarly, a high degree of similarity among task execution times is denoted as low heterogeneity, and vice versa. To reflect a realistic scenario, HCSP classifies the workload as consistent (c), inconsistent (i), or semi-consistent (s). Consistency occurs when a machine that executes one task faster than the other machines is likely to execute other tasks faster as well. In the case of inconsistent behavior, a machine may be faster at executing one task but may not perform similarly with other workloads. This category mirrors a distributed infrastructure of heterogeneous resources with a variety of tasks. The third category is a semi-consistent model, combining characteristics of the first two workloads. Table 3 shows the different configurations of HCSP instances.
The ETC matrices are designed using a range based method that incorporates task heterogeneity ($R_{TASK}$), machine heterogeneity ($R_{MACH}$), and consistency. Initially, a $T \times 1$ baseline vector (B) is generated using a uniform distribution of floating point values in the range $[1, R_{TASK}]$. Subsequently, the rows of the ETC($t_i \times m_j$) matrix are constructed by multiplying the vector B with another uniform random number X (called the row multiplier), which falls in the range $[1, R_{MACH}]$. As a result, the ETC matrix comprises values within the range $[1, R_{TASK} \times R_{MACH}]$. The minimum and maximum values for task heterogeneity are 100 and 3000 respectively, while the corresponding values for machine heterogeneity are 10 and 1000 respectively. The wider range of task heterogeneity (100-3000) compared to machine heterogeneity (10-1000) reflects the greater variability in heterogeneity for tasks in real world scenarios. For consistent data, the ETC rows are sorted from left to right in descending order, creating an ordered dataset. The unsorted ETC matrix constitutes the inconsistent dataset. In the case of semi-consistent data, the data in the even indexed columns is extracted for each row, sorted, and replaced, while the odd indexed columns remain unchanged. The ETC matrix represents the task dataset, while the X table lists the participating virtual machines. This paper utilizes a small dataset comprising 1024 tasks and 32 machines. In total, there are four workload instances: c_lohi, i_lohi, i_hilo, and c_hilo. The size ranges of tasks and virtual machines are provided in Table 4.
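The range based construction described above can be sketched as follows. This is an illustrative reading of the text rather than the dataset's original generator: the function name `generate_etc` and its parameters are hypothetical, one multiplier is drawn per matrix entry, and the even indexed columns of the semi-consistent case are sorted in the same descending order as the consistent case, which is an assumption.

```python
import random


def generate_etc(num_tasks, num_machines, r_task, r_mach, consistency="i", seed=None):
    """Build an ETC matrix with one row per task and one column per machine."""
    rng = random.Random(seed)
    # Baseline vector B: one value per task in [1, r_task].
    baseline = [rng.uniform(1, r_task) for _ in range(num_tasks)]
    # Each entry is the task's baseline times a multiplier X in [1, r_mach].
    etc = [[b * rng.uniform(1, r_mach) for _ in range(num_machines)] for b in baseline]

    if consistency == "c":
        # Consistent: sort every row (descending, as stated in the text).
        for row in etc:
            row.sort(reverse=True)
    elif consistency == "s":
        # Semi-consistent: sort only the even indexed columns of each row.
        for row in etc:
            row[0::2] = sorted(row[0::2], reverse=True)
    # consistency == "i": leave the matrix unsorted (inconsistent).
    return etc


inconsistent = generate_etc(8, 4, r_task=100, r_mach=10, consistency="i", seed=7)
consistent = generate_etc(8, 4, r_task=100, r_mach=10, consistency="c", seed=7)
semi = generate_etc(8, 4, r_task=100, r_mach=10, consistency="s", seed=7)
```

Using the same seed for all three variants makes them permutations of the same underlying values, which is convenient when comparing the effect of consistency alone.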

VI. EXPERIMENTS AND RESULTS
The experiments are conducted on an Intel Core i7-4790 3.60 GHz processor, equipped with 8 GB of RAM and 1 TB of storage. To assess the performance of the proposed Parallel Enhanced Whale Optimization Algorithm (PEWOA), simulations are carried out in CloudSim 3.0.3 using the Google Cloud Jobs (GoCJ) and Heterogeneous Computing Scheduling Problem (HCSP) datasets. Table 5 details the datacenter configuration for GoCJ according to [18] and for HCSP.
The performance of every meta-heuristic algorithm is sensitive to the number of agents and iterations; therefore, [18], [19], and [20] are executed with their proposed numbers of agents and iterations, provided in Table 6. The authors of those works used randomly generated tasks with a uniform distribution to evaluate the benchmark algorithms. For PEWOA and WOAmM, 60 agents and 220 iterations are selected. With these settings, PEWOA provides adequate performance with optimum running time.
When comparing meta-heuristic algorithms, another pertinent aspect to consider is the seed generation (initial population). All algorithms in this paper use the same seed, except SAEA, which undergoes minor changes to simulate location and mapping matrices. The values presented in all experiments are averages over ten different runs of the algorithms.

A. PERFORMANCE ANALYSIS OF PEWOA
1) MAKESPAN
In cloud computing, makespan is one of the most crucial factors during task scheduling [46]. It represents the completion time of all tasks scheduled on a set of virtual machines, as expressed by Eq. (7) [10]. A lower makespan allows a cloud server to execute workloads swiftly. In Figure 7, the makespan of all algorithms increases with the increasing number of tasks. However, the proposed PEWOA performs considerably well in keeping the makespan low through the optimal assignment of virtual machines to incoming tasks. The second best makespan figures are shown by WOAmM [17], an enhanced whale optimization algorithm. The behavior of SAEA and GAMR is similar, while RMPSO and MRMPSO exhibit nearly identical makespans, as MRMPSO is a parallel version of RMPSO. The relative increase in average makespan for all algorithms is illustrated in Table 7.
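The notion of makespan used here can be made concrete with a short sketch. Eq. (7) is not reproduced in this section, so the code assumes the usual formulation in which each VM runs its assigned tasks one after another and the makespan is the finishing time of the busiest VM. The function name `makespan` and the example values are hypothetical.

```python
def makespan(task_lengths, mapping, vm_mips):
    """Return the completion time of the most loaded VM.

    task_lengths[i] is the size of task i, mapping[i] is the index of the VM
    it is assigned to, and vm_mips[j] is the speed of VM j.
    """
    vm_load = [0.0] * len(vm_mips)
    for length, vm in zip(task_lengths, mapping):
        vm_load[vm] += length / vm_mips[vm]
    return max(vm_load)


# Three tasks on two VMs: VM 0 finishes at 3.0, VM 1 at 6.0.
result = makespan([1000, 2000, 3000], [0, 0, 1], [1000, 500])
```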
PEWOA demonstrates the ability to handle an increasing number of tasks with a relatively modest increase in makespan. In Table 7, PEWOA shows the smallest increase (4.31%) in average makespan over the 19 GoCJ instances, showcasing better scalability. WOAmM has the second smallest increase (8.99%) in average makespan. RMPSO and MRMPSO remain the third-best algorithms, with makespan increases of 16.84% and 16.91% respectively. The worst figures (28.98% and 23.95%) are exhibited by SAEA and GAMR respectively, indicating their limited ability to schedule a growing number of tasks. It is pertinent to mention that the makespan figures of SAEA and GAMR show unpredictable highs and lows across the GoCJ instances. The average makespan values for all algorithms are provided in Table 8. Cloud service providers often have SLAs with users specifying the maximum time frame for executing their workload. A minimum makespan therefore helps companies to deliver results within the agreed upon time duration.
On the HCSP workload, a similar behavior is observed with reduced makespan, as shown in Figure 8. For a consistent dataset comprising low heterogeneity tasks and high heterogeneity vms, the makespan of PEWOA is the lowest compared to WOAmM, RMPSO, MRMPSO, SAEA, and GAMR, but by a relatively small margin, especially in comparison to GAMR. The smaller bars in the graph for c_lohi result from the higher MIPS capacities of vms compared to the smaller size of tasks. There is a small difference between WOAmM and PEWOA for c_lohi, but the difference in makespan increases with the increasing complexity of tasks and vms in i_hilo. For i_lohi, the makespan of both PEWOA and WOAmM remains unaffected by the low heterogeneity tasks and high heterogeneity vms, unlike the other algorithms. GAMR struggles with the inconsistent behavior of i_lohi. For i_hilo, where the ratio of the size of tasks to vms is the largest, the makespan of all algorithms is the highest. Despite the parallelism in MRMPSO, RMPSO and MRMPSO have the same makespan figures, while PEWOA still outperforms the others. Although SAEA and GAMR are parallelized algorithms utilizing multiple subpopulations and concurrent fitness evaluation respectively, they show the worst results. SAEA is dominant over GAMR because it effectively manages the high heterogeneity of tasks. In the case of c_hilo, the results are similar to c_lohi, but with bars of a different magnitude. PEWOA continues to provide a better makespan with high task heterogeneity. WOAmM shows the lowest makespan figures after PEWOA. The results of both PSO variants are identical, but the scheduling behavior of SAEA and GAMR differs, with GAMR performing better than SAEA. The average makespan values for all algorithms are provided in Table 13.

2) RESOURCE UTILIZATION
A higher resource utilization enables the use of fewer resources, resulting in savings and a number of other advantages. Cloud providers strive to use fewer resources while meeting QoS; resource utilization is computed using Eq. (8) [10]. Figure 9 clearly indicates that the proposed PEWOA achieves the highest resource utilization compared to the benchmark algorithms. WOAmM has better resource utilization than RMPSO and MRMPSO, while GAMR displays better resource utilization than SAEA. Despite implementing a migration policy, SAEA has the lowest resource utilization among all algorithms. The average values of resource utilization for all algorithms are presented in Table 9.
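To give a sense of how such a figure can be obtained, the sketch below uses a common definition of average resource utilization: the mean busy time of the VMs divided by the makespan. Eq. (8) is not reproduced in this section, so this definition is an assumption and may differ from the one used in the paper; the function name `resource_utilization` is hypothetical.

```python
def resource_utilization(vm_busy_times):
    """Mean VM busy time divided by the makespan (assumed definition)."""
    span = max(vm_busy_times)
    if span == 0:
        return 0.0
    return sum(vm_busy_times) / (len(vm_busy_times) * span)


# Two VMs busy for 3.0 and 6.0 time units: (3 + 6) / (2 * 6) = 0.75.
utilization = resource_utilization([3.0, 6.0])
```

Under this definition, a value of 1.0 means every VM is busy for the whole schedule, so a balanced assignment raises utilization and lowers idle time together.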
In Figure 10, for c_lohi, the resource utilization of PEWOA is significantly higher than that of the others due to the execution of parallel threads and the enhancements, while RMPSO, MRMPSO, and GAMR show nearly identical utilization of resources. WOAmM has consistently the best performance after PEWOA in all the tests. The parallelization of multiple subpopulations in SAEA does not lead to improved results. It is evident that the inconsistent nature of i_lohi has negatively affected the resource utilization of all algorithms, with the least impact on SAEA. Similarly, MRMPSO and PEWOA show identical reductions in resource utilization. Better resource utilization reduces the idle time of individual resources in the cloud. Idle resources represent wasted capacity that could be used for executing tasks. The ability to scale with dynamic workloads also depends on the proper utilization of available resources. PEWOA manages to provide better performance on varying workloads due to its improved resource utilization. Resource utilization also has multifaceted effects on factors such as cost and energy consumption; however, these aspects are beyond the scope of this study. The average values of resource utilization for all algorithms are provided in Table 14.

3) THROUGHPUT
Throughput is a key indicator of the overall efficiency of a cloud. It is defined as the number of tasks completed per unit time, expressed by Eq. (9) [11]. A system with high throughput makes efficient use of resources. Figure 11 illustrates that PEWOA achieves the highest throughput, primarily attributed to running multiple threads, the modified encircling move, and the logarithmic spiral mechanism. There is a notable difference between the throughput of the proposed algorithm and that of the benchmark algorithms. The average values of throughput for all algorithms are provided in Table 11.
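Following the definition above, throughput can be illustrated as the number of completed tasks divided by the time taken to complete them. Taking that time to be the makespan is an assumption, since Eq. (9) is not reproduced in this section, and the function name `throughput` is hypothetical.

```python
def throughput(num_tasks, total_time):
    """Tasks completed per unit time (total_time assumed to be the makespan)."""
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    return num_tasks / total_time


# Three tasks finished within a makespan of 6.0 time units.
rate = throughput(3, 6.0)
```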
On HCSP, Figure 12 depicts the smallest difference in throughput among all algorithms for i_hilo, followed by c_hilo. The number of tasks executed per second is the highest for all algorithms in the case of c_lohi because all algorithms perform well with a consistent set of tasks and vms. However, for i_lohi, the throughput of all algorithms is negatively affected, particularly that of GAMR. The average values of throughput for all algorithms are provided in Table 15.

4) RESPONSE TIME
The time a virtual machine takes to start executing a mapped task after task scheduling is called the response time. A lower response time indicates a higher level of productivity and performance. For task scheduling in a virtualized environment, Eq. (10) computes the average response time. In Figure 13, the average response time of all benchmark algorithms is the same, but the proposed PEWOA shows the lowest figures for 16 out of 19 GoCJ instances. The average values of response time for all algorithms are provided in Table 10.
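One way to picture this metric is to treat the response time of a task as the time it waits before its VM starts executing it. The sketch below assumes that each VM runs its tasks sequentially in the order they appear in the mapping; Eq. (10) is not reproduced in this section, so this is an assumed formulation, and the function name `average_response_time` is hypothetical.

```python
def average_response_time(task_lengths, mapping, vm_mips):
    """Mean waiting time before each task starts on its assigned VM."""
    vm_free_at = [0.0] * len(vm_mips)   # time at which each VM becomes idle
    total_wait = 0.0
    for length, vm in zip(task_lengths, mapping):
        total_wait += vm_free_at[vm]    # the task starts when its VM is free
        vm_free_at[vm] += length / vm_mips[vm]
    return total_wait / len(task_lengths)


# Tasks 0 and 2 start immediately, task 1 waits 1.0 time unit on VM 0.
avg_wait = average_response_time([1000, 2000, 3000], [0, 0, 1], [1000, 500])
```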
Unlike the other optimization metrics, the response time on all HCSP instances is nearly the same for all algorithms, except for c_lohi, where GAMR and PEWOA show better response times. For i_hilo, the response time of all algorithms is identical, while for i_lohi and c_hilo, both WOAmM and PEWOA indicate minor improvements over the benchmark algorithms, as depicted in Figure 14. The average values of response time for all algorithms are provided in Table 16.

5) EXECUTION TIME
The execution time of a meta-heuristic algorithm is also crucial in a scalable environment. It is measured as the difference between the start time and the completion time of an algorithm. In Figure 15, WOAmM exhibits the worst execution time due to the internal complexity of the whale optimization algorithm, followed by RMPSO. However, the parallelized variant of RMPSO shows a considerably lower execution time. The proposed PEWOA has a lower execution time than RMPSO and MRMPSO, even though it involves more computations. However, it is still inferior to SAEA and GAMR. SAEA has low time complexity due to its simple internal structure and the utilization of multiple subpopulations of squirrels. GAMR has the lowest execution time, attributed to its undemanding implementation of crossover and mutation. The average values of execution time for the different algorithms are provided in Table 12.
Similarly, in Figure 16, GAMR exhibits the lowest execution time with a mutation probability of less than 0.7. PEWOA has the second best execution time because of multithreading. The migration of squirrels in SAEA is complicated, possibly contributing to its higher execution time. Notably, a significant difference between RMPSO and MRMPSO is observed for the first time. The latter uses a master-slave parallelization model, reducing the overall execution time compared to the former. PEWOA ranks better on HCSP, where it has the second lowest execution time, than on GoCJ. WOAmM has the highest execution time on the HCSP workload as well. A scheduling algorithm with a lower execution time enables diverse workloads to be handled effectively. The average values of execution time for all algorithms are provided in Table 17.
The proposed algorithm exhibits a considerably lower makespan than WOAmM, RMPSO, MRMPSO, SAEA, and GAMR. It also demonstrates superior utilization of virtual resources and higher throughput for the workload instances of both GoCJ and HCSP. The execution time of PEWOA has been reduced using multi-threading to schedule tasks efficiently. The algorithm consistently achieves lower makespan, higher resource utilization, and greater throughput for various workload instances, demonstrating better scalability. This scalability makes PEWOA well-suited for an elastic cloud infrastructure, where the scale of workloads continuously grows and shrinks.

B. STATISTICAL TESTS
1) STANDARD DEVIATION
The standard deviation represents the variation or dispersion in a given set of values. Table 18 illustrates that PEWOA has the most consistent makespan, followed by WOAmM. Although the average resource utilization of PEWOA is the best among all algorithms, MRMPSO shows more consistent figures. Similarly, despite PEWOA's impressive throughput, SAEA shows more consistent behavior for the number of tasks completed per unit time. The response times of SAEA and GAMR display the least variation. Additionally, the execution time of GAMR is the lowest among all algorithms and is least affected by the increasing number of GoCJ tasks.
The standard deviation figures on HCSP (in Table 19) show a similar pattern with minor changes for MRMPSO and SAEA, where the latter has a lower range of resource utilization values than the former. The response time for RMPSO, MRMPSO, and SAEA does not show any variation across the four HCSP instances.

2) FRIEDMAN TEST
The Friedman test [47] is a non-parametric statistical test developed by Milton Friedman. It is used to detect differences among several techniques applied to a particular dataset. To illustrate the behavior of PEWOA against the other algorithms, the Friedman test is performed on all instances of GoCJ and HCSP. The values of the test statistic, also known as chi square ($\chi^2$), for makespan, resource utilization, throughput, response time, and execution time are provided in Table 20. Based on the given values, the null hypothesis ($H_0$), that there is no difference among the given algorithms, is rejected for both the GoCJ and HCSP datasets. The computed chi square values are significantly larger than the corresponding critical values at 5 degrees of freedom (df).
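For readers who want to see how such a statistic is obtained, the sketch below computes the Friedman chi square from a table with one row per workload instance and one column per algorithm. It uses the textbook formula with average ranks for ties and no tie correction; the function names are hypothetical and the example data is made up purely for illustration.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def friedman_chi_square(blocks):
    """Friedman statistic for blocks[n][k]: n instances, k algorithms."""
    n = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Made-up makespans of three algorithms on three instances.
chi2 = friedman_chi_square([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

With six algorithms the statistic has 5 degrees of freedom; at a significance level of 0.05 (an assumed level, not stated in the text) the chi square critical value is about 11.07.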

3) WILCOXON TEST
The Wilcoxon Signed-rank test [48] is another non-parametric test developed by Frank Wilcoxon. It is employed to identify any significant difference between two paired sets of data. This test is also performed on every benchmark algorithm against PEWOA for any significant differences in makespan, resource utilization, response time, and execution time. With four instances in HCSP, the Wilcoxon test is unable to identify any difference. Therefore, instead of the average values over ten separate runs, all records have been used, making the total number of records 40, to calculate the Wilcoxon test successfully. The values of the Wilcoxon statistic (W) for both GoCJ and HCSP are provided in Table 21.
For GoCJ, the critical value of Wilcoxon's statistic is 46; therefore, $H_0$ is rejected for PEWOA against all algorithms. Similarly, on the HCSP workload, $H_0$ is rejected except for response time, because the corresponding values of Wilcoxon's statistic are greater than the critical value of 264. Hence, there is no significant difference in the response time of PEWOA in comparison to WOAmM, RMPSO, MRMPSO, SAEA, and GAMR.
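The decision rule used above, rejecting $H_0$ when W does not exceed the critical value, can be sketched as follows. The code computes the signed-rank statistic as the smaller of the positive and negative rank sums, discarding zero differences and averaging ranks for ties; the function names and the sample data are hypothetical, and only the critical value 46 is taken from the text.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_w(x, y):
    """Wilcoxon signed-rank statistic W = min(W+, W-) for paired samples."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)


def reject_null(w, critical_value):
    """H0 is rejected when W is less than or equal to the critical value."""
    return w <= critical_value


# Made-up paired measurements; differences are 1, -2, 3, -4, 5.
w = wilcoxon_w([2, 1, 6, 2, 10], [1, 3, 3, 6, 5])
```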

VII. CONCLUSION
Task scheduling poses a significant NP-hard problem in cloud computing, impacting the efficiency and resource utilization of cloud datacenters. Efficient scheduling is a prime factor for better quality of service. Heuristic algorithms such as DRALBA, OG-RADL, and SG-PBFS schedule tasks on a given set of resources by either prioritizing tasks or ranking resources, which hampers the resource utilization of a system and ultimately its makespan. A meta-heuristic algorithm, in contrast, treats all tasks and resources impartially. However, the right balance between global search and local search in a meta-heuristic algorithm is essential to provide an optimal result. In response to this, a Parallel Enhanced Whale Optimization Algorithm (PEWOA) is proposed for scheduling independent tasks on heterogeneous virtual machines in the cloud. PEWOA incorporates parallelization, an updated encircling maneuver, and a bubble net attacking mechanism to improve solution diversity, avoid local optima, and improve convergence. The enhanced encircling maneuver and bubble net attacking mechanism optimize the solution quality by striking the right balance between exploration and exploitation at the right time. Despite the internal complexity, parallelization reduces its execution time. Extensive simulations demonstrate that PEWOA minimizes makespan and response time, and increases resource utilization and throughput against WOAmM, RMPSO, MRMPSO, SAEA, and GAMR. The proposed PEWOA provides superior scalability and efficient task scheduling across the 19 workload instances of GoCJ. In the case of HCSP with four workload instances, PEWOA maintains similar performance figures while addressing various heterogeneity levels among tasks and virtual machines. Statistical tests, including standard deviation, the Friedman test, and the Wilcoxon test, confirm the significance of the results. In the future, it is planned to further improve the algorithm and tailor it specifically for task scheduling in fog computing environments.

FIGURE 1. Mapping of tasks on cloud VMs.

FIGURE 2. Movement of whales during bubble net attack.

FIGURE 5. Data distribution of Google Cluster Traces and GoCJ instances.

FIGURE 6. Resource mapping distribution in 106 articles included in this study.

FIGURE 13. Task Response time on various GoCJ instances.

FIGURE 14. Task Response time on various GoCJ instances.
TABLE 1. Summary of related studies.

TABLE 2. Notation and descriptions.

TABLE 4. HCSP workload.

TABLE 6. No. of Agents and iterations.

TABLE 7. Relative increase in average makespan.

TABLE 10. Average response time on GoCJ.

TABLE 12. Average execution time on GoCJ.

TABLE 16. Average response time figures on HCSP.

TABLE 17. Average execution time figures on HCSP.

TABLE 18. Standard deviation of various optimization metrics on GoCJ.

TABLE 19. Standard deviation of various optimization metrics on all HCSP instances.

TABLE 20. $\chi^2$ values for GoCJ and HCSP.