Levelized Multiple Workflow Allocation Strategy Under Precedence Constraints With Task Merging in IaaS Cloud Environment

Cloud Service Providers are rapidly becoming the target platform for scientific workflow computations due to their massive computing potential and flexible pay-as-you-go pricing model. The workflow allocation problem in cloud systems is NP-hard. A heterogeneous IaaS cloud can be fully effective only if the allocation method provides an efficient mapping between virtual machines (VMs) and the workflow applications demanding execution. We first model the multiple workflow allocation problem in the cloud environment. We then propose a levelized multiple workflow allocation strategy with task merging (LMWS-TM) to optimize the turnaround time of multiple workflow applications in the Infrastructure as a Service (IaaS) cloud environment. The task merging scheme is applied to the workflows after partitioning and prior to allocation to reduce the inter-task communication share and the total number of depth levels, improving the overall completion time. Moreover, the strategy considers inter-task communication and inter-machine distance when estimating the communication cost share among tasks in the generated schedule, and it uses simple, flexible level attributes to handle precedence constraints. Afterward, we conduct an experimental study comparing LMWS-TM with its peers, namely SLBBS, DLS, and HEFT, on the quality of service (QoS) parameters turnaround time, system utilization, flow time, and response time. The study reveals the superior performance of LMWS-TM over its peers in almost all cases for almost all parameters under investigation. Finally, statistical testing of the significance level using SPSS 20 confirms the hypotheses drawn from the experimental study.


I. INTRODUCTION
Infrastructure as a Service (IaaS) cloud systems consist of heterogeneous processing resources interconnected via a
high-speed network, capable of matching the requirements of industrial and scientific distributed applications on demand through collaborative sharing. These systems facilitate the sharing of heterogeneous, geographically distributed resources in a dynamic environment and their self-aggregation depending on cost, availability, capability, performance, and customer demands [1], [2]. Here, the first and foremost concern is allocating compute-intensive resources to workflow applications so as to optimize quality of service (QoS) parameters. These facilities are provided on a rental basis, also referred to as the pay-per-use model. Cloud Service Provider (CSP) features for workflow computations include elasticity, fault tolerance, reliability, and flexible access to virtualized resources. Workflow allocation in a cloud system is the problem of allocating workflow tasks to virtual resources while matching the workflow's requirements and effectively mapping the parallel executable portion of the workflow onto the available hardware parallelism [3], [4]. The allocation of workflow applications with precedence constraints is NP-hard [5]. Over the past years, the workflow allocation problem has been extensively studied for heterogeneous distributed systems such as clusters, clouds, and their successor systems [6], [7]. Workflow applications consist of cooperative tasks modeled using a Directed Acyclic Graph (DAG); here, the terms DAG and workflow are used interchangeably. These applications include web service workflows, scientific workflows, and big data processing workflows, e.g., MapReduce from Google and Dryad from Microsoft [8], [9], [10], [11].
Scientific workflow management systems manage the deployment of scientific workflows onto virtualized distributed resources. A current example of scientific workflow technology is the detection of gravitational waves by the LIGO project [12]. Workflow allocation must consider the heterogeneity of the computing power of machines and communication edges. An important issue in workflow allocation is how to rank the tasks in the DAG. The task rank is used as its priority in the allocation and yields an order of execution of workflow tasks that preserves the precedence constraints. Once the tasks are ranked, a task-to-resource assignment can be found to minimize the schedule length or other QoS parameters. Alternatively, the level attribute (the lowest level gets the highest priority) is also used to assign priorities to tasks in the workflow and to manage precedence order during execution [3], [4], [13], [14], [15]. Tasks at the same level form the parallel executable portion of a workflow.
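The level attribute mentioned above can be obtained with a single topological pass over the DAG: entry tasks get level 1, and every other task gets one more than the maximum level of its predecessors. The sketch below is illustrative (task names, edge lists, and the function name are ours, not from the paper):

```python
from collections import defaultdict, deque

def depth_levels(tasks, edges):
    """Assign each task its depth level: entry tasks get level 1,
    every other task gets 1 + max level of its predecessors."""
    preds = defaultdict(list)
    succs = defaultdict(list)
    indeg = {t: 0 for t in tasks}
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)
        indeg[v] += 1
    level = {}
    q = deque(t for t in tasks if indeg[t] == 0)   # entry tasks
    while q:
        t = q.popleft()
        level[t] = 1 + max((level[p] for p in preds[t]), default=0)
        for s in succs[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                q.append(s)
    return level

# Tasks at the same level have no mutual dependence and may run in parallel.
lv = depth_levels(["T1", "T2", "T3", "T4"],
                  [("T1", "T3"), ("T2", "T3"), ("T3", "T4")])
# lv == {"T1": 1, "T2": 1, "T3": 2, "T4": 3}
```

Tasks that end up with the same level value are exactly the mutually independent tasks the strategy can schedule side by side.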
Multiple workflow allocation improves the exploitation of the overall parallelism, resulting in optimized parameters [3], [16]. It aims to map a batch of merged workflows onto suitable heterogeneous virtual resources under precedence and batch constraints so that processing optimizes the demanded parameters as per the workflow users' specifications. The challenges of multiple workflow allocation in the cloud environment are workload heterogeneity, performance variability, QoS diversity, fairness, and priority [17]. When several application workflows need to be allocated onto virtual resources, one straightforward method is to merge the workflows and then treat them as a single workflow allocation. Two allocation policies and four composition approaches for merging have been presented in [18]. Merge-based allocation approaches are able to exploit the benefits of static allocation, whereas in a dynamic environment the workflows are allocated immediately upon arrival and the results are finally combined. Other multiple workflow allocation models that use the level attribute to assign priorities and manage precedence constraints while optimizing one or more QoS parameters have also been proposed in the literature [16], [19], [20], [21], [22]. In this case, workflows are grouped according to depth level and then mapped onto resources by exploiting the parallel executable tasks at the same depth level within the batch of DAGs.
This paper proposes a levelized multiple workflow allocation strategy with task merging (LMWS-TM) to optimize the turnaround time of multiple workflows with precedence constraints, represented by DAGs, in the IaaS cloud environment. The turnaround time of a job submitted to a cloud environment is a very important QoS parameter that directly affects customer satisfaction, the provider's reputation, energy consumption, monetary cost, and the overall business. A lower turnaround time is desirable and also results in better performance on other parameters. The proposed strategy combines the benefits of two widely adopted approaches: levelized list-based scheduling and task merging. LMWS-TM uses simple and flexible level attributes to handle precedence constraints. The level attribute provides the order of execution of tasks in the workflows by assigning them execution priorities (the highest priority goes to the lowest level and vice versa). The levelized list-based allocation approach minimizes the actual execution time of the workflows submitted for execution, while the task merging scheme reduces the inter-task communication cost share and the total number of depth levels to enhance the overall completion time. The task merging scheme is applied to the batch of workflows after partitioning and prior to allocation. LMWS-TM considers the inter-task communication and the inter-machine distances to estimate the communication cost between any pair of tasks and VMs. The task merging scheme is incorporated into SLBBS [23], and the allocation pattern is also modified. An experimental study has been carried out to measure the effect of the task merging scheme on SLBBS. A comparative analysis with its peers, namely SLBBS, DLS, and HEFT, has been conducted over various cases to evaluate the performance of LMWS-TM on the quality of service (QoS) parameters turnaround time, system utilization, flow time, and response time.
Further, statistical analysis has been performed to confirm the results and the significance level of the simulation study. In summary, LMWS-TM has been proposed for possible improvement in the performance parameters over SLBBS [23]. The proposed strategy has been designed with the following advantages and contributions:
• Many methods exploit the parallel executable portion available only at the workflow level. Due to multiple workflows and levelized partitioning, LMWS-TM exploits parallelism at both the workflow and task levels.
• Many models in the literature use ranking methods to preserve the precedence constraints, whereas the proposed model uses the level attribute for this purpose. The level attribute is the most convenient and flexible way to preserve precedence constraints in workflows.
• The proposed strategy divides the workflows into partitions in accordance with the depth level. Slicing all workflows according to depth level minimizes the response time, as all workflows start their execution almost at the first depth level. LMWS-TM is thus also well suited to applications with interactive users.
• The merging phase works as a pre-processing step on the workflows, making them better suited for allocation and possible improvement.
• Further, in the task merging phase, tasks from higher depth levels are shifted and merged with tasks of lower depth levels. This reduces the total number of depth levels in the batch; consequently, the considered performance parameters are expected to improve.
• Merging tasks also minimizes the communication cost share of the allocated workflows because tasks combined into one are allocated to a single suitable virtual machine, reducing the communication share among them to zero. Again, improvement is expected.
• Experimental evaluation and statistical analysis are conducted to evaluate the proposed LMWS-TM strategy.
The rest of the paper is organized as follows: Section II presents related work from the domain. Section III explains the proposed model by presenting the notation used, the cloud framework, the mathematical models, the problem formulation with parameter estimation, the algorithm with an illustration, the motivations, and the time complexity. The experimental study evaluating the comparative performance of LMWS-TM is presented in Section IV, along with a statistical analysis testing the validity of the hypothesis developed. Section V concludes the work with some future directions.

II. RELATED WORK
In a list-based heuristic, a list of all tasks is prepared from the given DAG according to their priorities. These heuristics have two phases, namely the task prioritization phase and the processor selection phase. In the first phase, the tasks are topologically sorted using rank or level attributes; in the second phase, the best processor to reduce the overall completion time is selected. Some state-of-the-art list scheduling heuristics are Earliest Time First (ETF) [25], Levelized Min Time (LMT) [39], Dynamic Level Scheduling (DLS) [26], Critical Path On Processor (CPOP) [27], Modified Critical Path (MCP) [25], and Heterogeneous Earliest Finish Time (HEFT) [27]. The DLS [26] algorithm dynamically changes priorities and schedules across both temporal and spatial dimensions to avoid shared-resource contention; its merits are broad applicability, high speed, flexibility, and good performance. LMT [39] works in two phases: the initial phase groups level-wise the tasks that can execute in parallel, and the second phase allocates each task to the best-fitted resource. HEFT [27] targets a bounded number of processors in heterogeneous distributed systems and also has two phases, task prioritization and processor selection. In the first phase, tasks are sorted in descending/ascending order using the upward/downward rank. The second phase selects the best resource to minimize the actual finish time of the tasks in the list. CPOP [27] uses the critical path of the given DAG: all tasks on the critical path are assigned to the single best processor, and the remaining tasks are allocated with the same rank-based allocation as HEFT. The authors of [28] proposed a list-based workflow scheduling method inspired by HEFT for non-preemptive periodic tasks, reporting better performance. DBEFT, an extension of HEFT with the same objective, has been proposed in [29].
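The upward-rank prioritization used by HEFT-style heuristics can be sketched as follows. This is an illustrative reconstruction of the standard upward rank (task names, costs, and the function name are made up, not taken from the surveyed papers):

```python
def upward_rank(tasks, succs, w, c):
    """rank_u(t) = w(t) + max over successors s of (c(t, s) + rank_u(s)).
    w: average computation cost of each task; c: average communication
    cost of each edge. Tasks are then scheduled in decreasing rank_u order."""
    memo = {}
    def rank(t):
        if t not in memo:
            memo[t] = w[t] + max(
                (c[(t, s)] + rank(s) for s in succs.get(t, [])), default=0)
        return memo[t]
    return {t: rank(t) for t in tasks}

ranks = upward_rank(
    ["A", "B", "C"],
    {"A": ["B", "C"], "B": ["C"]},          # successor lists
    {"A": 3, "B": 2, "C": 4},               # computation costs
    {("A", "B"): 1, ("A", "C"): 2, ("B", "C"): 1},  # edge costs
)
# C: 4; B: 2 + (1 + 4) = 7; A: 3 + max(1 + 7, 2 + 4) = 11
```

Sorting tasks by decreasing rank yields a topological order, which is why the first phase of these heuristics automatically respects precedence constraints.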
Security-aware DAG scheduling using ranking-based ordering and earliest finish time has been proposed, with superior performance reported in the domain [30]. In another work, level-based task ordering is used for security-oriented workflow scheduling, minimizing the total number of failures [31]. A multi-objective version of levelized workflow task execution has been presented to optimize makespan and flow time [32]. Another deadline-aware multi-objective model considers execution time and monetary cost [33]. A workflow task allocation model has also been developed with energy and dependency constraints for the heterogeneous environment [38].
Clustering workflow heuristics are developed to minimize the transfer time between dependent tasks, since list-based heuristics do not treat communication cost seriously and consequently generate needless idle gaps on machines, as in EFT [40] and HEFT [27]. A clustering procedure is employed as a sequence of clustering refinements. Clustering-based approaches consist of mapping tasks to clusters and ordering tasks within each cluster [41], [42]. This approach aims to attain an overall minimum communication time share among tasks by merging them into the same clusters and assigning them to the same machines, at the cost of sacrificing parallelism within the DAG. Thus, a trade-off between minimizing communication delay and maximizing parallelism has been observed [43]. Clustering-based algorithms include the Linear Cluster Method [44], Dominant Sequence Clustering [45], Resource Aware Clustering (RAC) [46], and Clustering for Minimizing the Worst Schedule Length (CMWSL) [47]. RAC [46] aims to achieve relative load balancing and efficiency improvement for machines with different capacities; it computes a dynamic score function for all tasks, followed by task clustering and then task allocation according to the processors' computing capability. CMWSL [47] is a cluster-based task scheduling algorithm with four phases that minimizes the schedule length for highly heterogeneous processors.
On the other hand, in multiple workflow allocation, all scheduling decisions are made after a successive activation period: once an estimated number of workflows with similar requirements have arrived and formed a batch, an efficient schedule can be produced. The work proposed in [13] presents an aggregated-DAG multiple workflow allocation model that allocates tasks according to depth level to optimize throughput; in this work, tasks within a workflow are non-communicating, and inter-task communication is considered zero. The author of [48] presents four approaches, i.e., sequential (one after another), gap search (the next DAG utilizes gaps), interleaving, and grouping of DAGs (merging DAGs into a single one with one entry and one exit node). These approaches are evaluated against each other using a modified Path Clustering Heuristic (PCH) for task prioritization and machine selection. Other work [49] proposed two workflow allocation strategies, viz. MWGS4 (Multiple Workflow Grid Scheduling 4 stages) and MWGS2 (Multiple Workflow Grid Scheduling 2 stages), whose stages comprise labeling, adaptive allocation, prioritization, and parallel machine scheduling. In [22], the authors address multiple workflow scheduling with four policies for DAG allocation and another two focusing on fairness, optimizing makespan with good fairness. The paper [50] developed a hierarchical rank considering communication contention with task dependencies (CCRH) for reliable scientific workflow allocation in the cloud; CCRH aims to maximize reliability and improve system fault tolerance. In [51], the authors introduced a novel dynamic task rearrangement and rescheduling approach to allocate multiple workflows considering resource efficiency and robustness; the rearrangement policy improves robustness.
In [52], the authors proposed a cluster-based approach for multiple workflows with soft deadlines to examine the effect of time restrictions on the quality of task allocation in a heterogeneous environment; they consider how tasks from the workflows can occupy free time windows to fulfill users' urgent requirements. In [53] and [54], the authors have proposed an energy-aware stochastic method for multiple DAGs on heterogeneous DVFS-enabled machines to minimize energy and time. A model for a batch of DAGs with precedence constraints that maximizes the load balancing level for effective resource usage has been presented in [55]. A level-based batch scheduling strategy (SLBBS) [23] minimizing the turnaround time with inter-task communication has been presented in the literature, and its performance evaluation against some other methods has been carried out in [56]. In [57], the authors have proposed a multi-objective workflow model for multi-DAGs optimizing completion time and monetary cost.

III. THE PROPOSED MODEL
This Section describes the proposed Levelized Multiple Workflow Scheduling Strategy with Task Merging (LMWS-TM) to optimize the turnaround time of multiple workflows submitted for processing on VMs in an IaaS cloud. Moreover, this Section presents the list of notations used, the cloud system framework, the VM model, the workflow applications model, the problem statement, parameter estimation, the algorithm with illustrations, the motivations of the work, and the time complexity of the algorithm.

A. CLOUD SYSTEM FRAMEWORK
This Section introduces the Infrastructure as a Service (IaaS) cloud system architecture for workflow allocation, as shown in Figure 1. The various components of the architecture are explained as follows:

1) CLOUD USERS
Multiple cloud users have their workflow applications. They need to submit their requests/applications over the cloud system for processing on some negotiated rent. Users aim to fulfill their requirements by satisfying constraints to optimize QoS parameters.

2) WORKFLOW APPLICATIONS
As shown in Figure 1, a batch of multiple workflows is submitted by many users at different locations and grouped in the arrival queue at the central dispatcher for further processing. Each workflow comprises many dependent, cooperative tasks with precedence constraints having parent-child relationships. Workflows are independent of one another and are represented using a Directed Acyclic Graph (DAG), as depicted in Figure 2. A detailed description and model of the workflow applications are presented in Section III-C.

3) WAITING QUEUE
A waiting queue is maintained at the global scheduler to accommodate incoming workflow applications from different users and form a batch. These workflows are then dispatched for processing by the global scheduler on compute-intensive VMs in the cloud. The time an application spends in this queue until mapping is known as the queuing time. We consider that this queue follows an M/M/1 model: a single server with infinite queue length, where workflow inter-arrival and service times follow Poisson and exponential distributions with arrival rate λ and service rate µ, respectively.
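Under these assumptions, the standard M/M/1 results apply: utilization ρ = λ/µ, mean number in system L = ρ/(1 − ρ), and mean time in system W = 1/(µ − λ). A quick numeric check (the rates 0.03 and 0.05 are the ones used later in the paper's worked example; the function name is ours):

```python
def mm1_stats(lam, mu):
    """Standard M/M/1 results: utilization rho, mean number in system L,
    and mean time in system W (queuing + service). Requires lam < mu."""
    assert lam < mu, "queue is unstable when arrival rate >= service rate"
    rho = lam / mu
    L = rho / (1 - rho)      # mean number of workflows in the system
    W = 1 / (mu - lam)       # mean time in system (Little's law: L = lam * W)
    return rho, L, W

rho, L, W = mm1_stats(0.03, 0.05)
# rho = 0.6, L = 1.5, W = 50 time units
```

The stability condition λ < µ matters in practice: as the arrival rate approaches the service rate, the queuing time W grows without bound.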

4) GLOBAL SCHEDULER
The Global Scheduler is one of the essential components of this framework. It is responsible for capturing all required information about the waiting queue, the workflow applications, the virtual machines, and the associated physical machines, and it performs workflow allocation effectively using the captured data. The main aim of the global scheduler is to distribute the workflow components over the virtual machines created by the hypervisor to improve system performance; it thus plays a crucial role in cloud systems. Notably, static centralized workflow allocation has the benefit of generating an efficient schedule for a batch of workflows onto a set of VMs, owing to prior estimated knowledge of the various application and resource characteristics.

5) PHYSICAL MACHINES
The physical machines comprise clusters, supercomputers, servers, etc.: the fundamental physical computing resources that make up a cloud infrastructure. Through virtualization, users can use virtualized versions of the physical machines without any management overhead.

6) VM MANAGER/HYPERVISOR
The hypervisor can be viewed as a software layer that provides flexible management of virtual resources to users at the application layer; it cannot be used directly by end-users. A connection is needed between any two entities such as clients, servers, or applications. The VM monitor enables multiple virtual operating systems (OS) to run simultaneously on one machine. The hypervisor provisions the number of VMs, computing capacity, bandwidth, storage, etc. The VM manager decides on task allocation, reducing resource cost, time, etc.

7) VIRTUAL MACHINE
A Virtual Machine (VM) is an application environment installed on software that mimics the behavior of a dedicated physical machine. VMs are provided by the cloud service provider, and users have the same experience as they would on physical machines. A detailed VM model is presented in Section III-B.

8) FINISH QUEUE
The workflow tasks submitted by users are pooled in the finish queue after their completion. The finish queue acts as a buffer for the results of processing. As depicted in Figure 1, the number of physical machines (r), VMs (K), and workflows (n_WF) generally satisfy r ≤ K ≤ n_WF.

B. VM MODEL
The CSP offers a set of K virtual machines VM = {VM_1, VM_2, . . . , VM_K}. The characteristics of the VMs offered by the IaaS CSP are listed as follows:
• K virtual machines (VMs) for mapping multiple independent workflows.
• Machines are capable of executing compute-intensive workflows.
• Computing capacities of the VMs (CC_k).
• Initial ready time (RT_k), which measures the previous load on VM_k.
• VM distances (D_ab): the distance between VM_a and VM_b, estimated as the number of links.
• A matrix E of size (n_WF × n_wf_i × K), written as E_ijk: the expected time to compute task T_ij of workflow wf_i on VM_k.
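Given task sizes in MIs and VM computing capacities, the expected-time-to-compute entries E_ijk can be derived as size divided by capacity; a minimal sketch (the sizes and capacities below are illustrative, and the function name is ours):

```python
def etc_matrix(sizes, capacities):
    """E[i][j][k] = size of task T_ij (in MIs) / computing capacity CC_k
    of VM_k. sizes: per-workflow lists of task sizes; capacities: CC_k."""
    return [[[size / cc for cc in capacities] for size in wf] for wf in sizes]

E = etc_matrix(sizes=[[1100, 900], [600]], capacities=[9, 10])
# E[0][0] holds the expected times of a 1100-MI task on each VM:
# 1100/9 on VM_1 and 110.0 on VM_2.
```

A faster VM (larger CC_k) gives a smaller E_ijk for the same task, which is what drives the processor selection phase of the allocation.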

C. WORKFLOW APPLICATIONS MODEL
A batch of multiple workflows WF = {wf_i : 1 ≤ i ≤ n_WF} is considered, in which each workflow is modeled using a Directed Acyclic Graph (DAG). Each workflow comprises a set of tasks τ_i = {T_ij | 1 ≤ i ≤ n_WF, 1 ≤ j ≤ n_wf_i} and a set of links (edges) between the tasks in the workflow. Each workflow in the batch consists of tasks at various depth levels, and a task may need inter-task communication with predecessor tasks of the same workflow at previous depth levels. The characteristics of the workflows are listed as follows:
• A batch of n_WF compute-intensive workflows, each represented by a DAG.
• Each task T_ij in workflow wf_i is associated with a unique level attribute l_ij.
• Level attributes are used to manage precedence and dependence constraints.
• L_i is the depth level of workflow wf_i, L_i = max(l_ij : precedence level of all T_ij ∈ wf_i).
• The depth level of the whole batch of workflows (L) is the maximum of the L_i, i.e., L = max(L_i).
• Inter-task communication (Data_ixy) between tasks T_ix and T_iy is considered and measured in MIs.
• The multiple workflows are divided into L partitions (ρ_l) as per depth levels.
A batch of multiple workflows is presented in Figure 2, where the tasks in the workflows are sliced into groups in order of precedence. They may require communication with tasks at previous levels of the same workflow. Workflows in WF may have different depth levels. The same color represents tasks at the same depth level in the workflows; e.g., T_11, T_12, T_21, . . . , T_N1 have precedence level 1 while T_13, T_14, T_22, . . . , T_N4 have level 2, shown in yellow and light blue, respectively. Moreover, it is evident from Figure 2 that T_15 depends on T_13 and T_14 to begin its execution. Tasks at the same depth level, e.g., T_13, T_14, T_22, . . . , T_N4, can be processed simultaneously. Finally, it is assumed that workflow pre-processing is done before mapping the workflows.
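Slicing the batch into partitions ρ_l amounts to grouping the tasks of all workflows by their level attribute; a minimal sketch (the task identifiers and function name are illustrative):

```python
from collections import defaultdict

def partition_by_level(levels):
    """levels: {(workflow, task): depth level l_ij}.
    Returns rho[l] = list of tasks across all workflows at depth level l.
    Tasks inside one partition are mutually independent and can run in
    parallel, which is the parallelism LMWS-TM exploits at the batch level."""
    rho = defaultdict(list)
    for task, l in levels.items():
        rho[l].append(task)
    return dict(rho)

rho = partition_by_level({("wf1", "T11"): 1, ("wf1", "T13"): 2,
                          ("wf2", "T21"): 1, ("wf2", "T24"): 2})
# rho[1] == [("wf1", "T11"), ("wf2", "T21")]
# rho[2] == [("wf1", "T13"), ("wf2", "T24")]
```

Note that a partition mixes tasks from different workflows: this is what lets the strategy exploit parallelism across the whole batch rather than inside a single DAG.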
The communication cost CC^{l−s,l}_{ixyab} in the considered scenario depends on the inter-task communication (Data_ixy) and the machine distance (D_ab) between machines VM_a and VM_b. For two tasks T_iy ∈ ρ_l and T_ix ∈ ρ_{l−s} of workflow wf_i (T_iy is assumed to depend on T_ix), allocated on machines VM_a and VM_b respectively, as presented in Figure 3, it can be estimated as follows [23]:

CC^{l−s,l}_{ixyab} = z × Data_ixy × D_ab

Here, s ∈ Z+, s ≥ 1, and x, y = 1, 2, 3, . . . , n_wf_i. The communication cost CC^{l−s,l}_{ixyab} is directly proportional to both Data_ixy and D_ab, with z the constant of proportionality of the assumed linear relationship. Therefore, the value of z is taken as unity.
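With z = 1, the estimated cost reduces to the product of the data volume and the machine distance; a one-line sketch (the function name is ours, and the example values Data_113 = 3 and D_12 = 2 come from the worked example later in the paper):

```python
def comm_cost(data_ixy, d_ab, z=1):
    """CC = z * Data_ixy * D_ab. The cost vanishes when both tasks are
    placed on the same VM (distance 0), which is why merging dependent
    tasks onto one machine removes their communication share entirely."""
    return z * data_ixy * d_ab

cost = comm_cost(3, 2)      # Data_113 = 3 between T_11 and T_13, D_12 = 2 -> 6
co_located = comm_cost(3, 0)  # merged/co-located tasks incur no communication
```

This zero-cost case for co-located tasks is exactly the effect that the task merging phase of LMWS-TM exploits.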

D. PROBLEM FORMULATION
The workflow allocation problem is to find a mapping ∅ for the set WF = {wf_i : 1 ≤ i ≤ n_WF} submitted for execution onto the set of virtual machines (VMs) in an IaaS cloud environment, producing an allocation schedule (AS) that optimizes the objective criterion:

Minimize TAT(AS)

Here, the objective is the turnaround time of the submitted set of workflows (WF), subject to the precedence constraints: a task may start only after all of its predecessor tasks at lower depth levels have completed, i.e., the allocation must satisfy the precedence constraints for all T_ij.
Allocation starts with merging the tasks of the workflows following the procedure explained in detail in Section III-E. Afterward, the tasks of each depth level are allocated based on the allocation method mentioned there, for which detailed illustrations are also presented. The level-wise allocation and execution of workflows having L depth levels, with their execution time, communication time, and ready time values at the various depth levels, are presented using different colors in Figure 4.

The execution time at level l on VM_k is the sum of the E_ijk of the tasks assigned to that virtual machine:

ET^l_k = Σ_{T_ij ∈ ρ_l allocated on VM_k} E_ijk                         (5)

The net communication cost (CC^l_k) is taken as the maximum of the communication costs of the tasks assigned to VM_k with their predecessor tasks, since the communication requirements of next-level tasks can be met in parallel after the execution of previous-level tasks. The CC^l_k of the tasks assigned to VM_k at depth level l with respect to their predecessors on machines VM_h (1 ≤ h ≤ K) can be estimated as

CC^l_k = max_{T_iy ∈ ρ_l allocated on VM_k} { CC^{l−s,l}_{ixyhk} }      (6)

CT^l_k is the total completion time on VM_k, computed as the sum of RT^l_k, CC^l_k, and ET^l_k:

CT^l_k = RT^l_k + CC^l_k + ET^l_k                                       (7)

where RT^l_k is the ready time of VM_k, i.e., the workload assigned prior to allocation, incorporated at the first depth level only. The turnaround time (TAT) is the duration between the submission of the workflows and their completion:

TAT = QT + max_k (CT^L_k)

where QT [23] is the average queuing time of the workflows in the central queue, expressed using the M/M/1 model as

QT = (B / Q_u) × 1 / (µ − λ)

where the queue unit (Q_u) of the servicing node is the amount it can handle in one go and B is the total number of MIs in the batch. The flow time (FT) is computed as the sum of the finish times of all the workflows in the batch. The average system utilization (U_s) of the resources in the cloud system is estimated as the ratio of the total busy time of the VMs to K times the overall completion time. Finally, the response time (RT) is the duration between a workflow's admission into the system and the appearance of the first response of task execution from the system. It is a significant parameter for measuring the interactiveness of the system towards the user's workflows; for the batch of multiple workflows, it is computed from the completion of the first depth level. The proposed model always desires the optimum values of the QoS parameters for the resultant allocation schedule.
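The turnaround-time and utilization bookkeeping can be combined into a small routine; this is our hedged reading of the definitions above (the numeric values are placeholders, not the paper's results, and the function name is ours):

```python
def qos_metrics(ct_per_vm, queuing_time, busy_per_vm):
    """ct_per_vm: final completion time CT_k of each VM after the last level.
    TAT = QT + max_k CT_k (makespan plus queuing time);
    U_s  = total busy time / (K * makespan)."""
    makespan = max(ct_per_vm)
    tat = queuing_time + makespan
    util = sum(busy_per_vm) / (len(ct_per_vm) * makespan)
    return tat, util

tat, util = qos_metrics(ct_per_vm=[420, 390, 405], queuing_time=60,
                        busy_per_vm=[400, 350, 380])
# tat = 480; util = 1130 / (3 * 420) ~= 0.897
```

Utilization below 1 reflects the idle gaps on VMs; reducing those gaps is precisely what the task merging phase of LMWS-TM targets.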

E. LEVELIZED MULTIPLE WORKFLOW SCHEDULING STRATEGY WITH TASK MERGING
The method starts with partitioning the batch of multiple workflows as per the level attribute in the DAG. The scheduler then first merges the tasks of the workflows in the batch in such a manner that the overall execution time is reduced. Task merging starts by determining the largest task by size across the workflows at each depth level, observed from first to last. An effort is made to package the tasks of successive precedence levels of each workflow such that the resultant package is less than or equal in size to the chosen largest task of the given depth level. The logic behind this packaging is that if there is a considerable size difference between the tasks selected for execution at a given depth level, the VMs will be idle until the largest task finishes its execution. Therefore, the proposed scheme tries to fill the expected gaps produced by the size difference between tasks at the same depth level. This, in turn, ensures a better total execution time and utilization, with the benefits depending on various attributes of the workflows comprising the batch, viz. the degree of dependence, the number of tasks at a given depth level, the number of depth levels, and the available hardware parallelism in terms of the number of VMs. Afterward, the batch of workflows is assigned in accordance with the depth level onto the appropriately selected VMs, as decided by the scheduler per the allocation policy. The overall result of this exercise is to effectively exploit the parallel portion and manage the communication cost share of the workflows on the VMs while maintaining the order of execution of tasks in the workflows.
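The packaging rule described above can be sketched as follows. This is our simplified, single-workflow reading of the merging idea (walk the successive depth levels of one workflow and merge a task into the accumulating package while the combined size stays within the cap set by the level's largest task); the sizes and function name are illustrative, not the paper's data:

```python
def merge_chain(sizes_by_level, cap):
    """sizes_by_level: one workflow's task size at each successive depth
    level. Merge a level's task into the current package while the combined
    size stays within `cap` (the largest task size at the entry level);
    otherwise start a new package. Merged tasks run on a single VM, so
    their mutual communication share becomes zero and depth levels are
    saved. Returns a list of packages of level indices."""
    packages, current, used = [], [], 0
    for level, size in enumerate(sizes_by_level, start=1):
        if current and used + size > cap:
            packages.append(current)
            current, used = [], 0
        current.append(level)
        used += size
    if current:
        packages.append(current)
    return packages

# Levels 3 and 4 merge into one package when their combined size fits the cap,
# echoing the T_25/T_26 merge in the paper's illustration:
pkgs = merge_chain([700, 650, 300, 150], cap=700)
# pkgs == [[1], [2], [3, 4]]
```

The batch-level procedure also has to respect precedence across workflows and recompute the ETC matrix for the merged tasks; the sketch only shows the size-capping core of the idea.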
It is also observed that the scheme performs well when the heterogeneous VMs selected for computation are of comparable capacities. The procedural steps are presented as follows. The procedure for task merging is given in steps 4 to 14 of the algorithmic template. Further, the regrouped tasks are allocated level-wise from the workflows onto the appropriately selected VMs. The allocation pattern of the tasks of the batch of workflows is inherited from the allocation policy given in our previously proposed work [23]; however, the pattern is modified with the expectation of improving the turnaround time. Here, at each step, the largest or smallest task of the specified partition is selected for allocation, after which the scheduler searches for a suitable virtual machine for the tasks belonging to the partition at hand. The algorithm for allocating the tasks of the batch of DAGs as per their depth level is given in steps 15 to 31 of the template. With the amalgamation of task merging (TM), the work intends to further improve the performance of the combined scheme, namely LMWS-TM. The scheme can be adopted by any other precedence-based scheme by fine-tuning the tasks assigned to each level. Finally, for each partition, the values of CC^l_k and CT^l_k are computed.

The algorithmic template of LMWS-TM is summarized as follows (steps condensed):
Input: WF, T_ij, VM_k, E_ijk, Data_ixy, D_ab, QT, size(T_ij), n_WF, n_wf_i, and K.
Output: Allocation schedule (i, j), TAT, U_s, FT, and RT.
Steps 1-3: Divide WF into partitions ρ_l as per depth levels, compute E_ijk, and sort all ρ_l in ascending/descending order of task size.
Steps 4-14 (task merging): for each partition ρ_l, get size^l_max and size^l_min (the first and last tasks of the sorted ρ_l); select the task T_ij with size(T_ij) = size^l_min and, while size^l_min ≤ size^l_max, merge it with tasks from higher depth levels.
Steps 15-31 (allocation): for l = 1 to L and for all T_ij ∈ ρ_l, select a suitable VM_k over all machines, update the execution time ET^l_k = ET^l_k + E_ijk, remove allocated tasks from the partition (ρ_l = ρ_l − T_ij), and compute CC^l_k and CT^l_k as per equations (6) and (7).

An illustration explaining LMWS-TM, SLBBS, HEFT-1, and HEFT-2 is presented in this Section with two workflows consisting of 6 tasks each, with precedence/depth levels among them, as shown in Figure 5. Let the arrival rate of the workflows and the service rate of the central dispatching virtual machine be 0.03 and 0.05 per unit time, respectively, and let the queuing unit (Q_u) be 10,000 MIs. We have taken only three VMs, but there may be more in real scenarios. The complete information on the heterogeneous VMs is shown in Figure 6; for example, the computing capacity (CC_k) and initial ready time (RT_k) of V_1 are 9 and 40, respectively, and the VM distance (D_12) between V_1 and V_2 is 2. For each workflow, information such as the number of instructions, the depth levels, and the inter-task communication between two tasks of the same workflow is presented in Figure 5. For example, the size of T_11 is 1100 MIs and the inter-task communication (Data_113) between T_11 and T_13 is 3. E_ijk for the tasks of each workflow can then be computed, as shown in Table 1. Tasks of wf_2 are merged as per the task merging policy of the algorithmic template; in particular, tasks T_25 and T_26 of depth levels 3 and 4 are merged. The first workflow, wf_1, requires no merging of tasks. The resultant batch of workflows can be seen in Figure 7. The ETC matrix is also updated as per the merging, as shown in Table 2.
The workflows are grouped into partitions according to precedence level and then sorted in descending order, i.e., … as shown in Figure 7 and Table 2. The execution of all tasks is shown in Figure 8.
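The partition-then-allocate flow just described can be sketched as below; the earliest-finish VM-selection rule here is a simplification we assume for illustration (the paper's exact policy is inherited from [23]), and all names are hypothetical:

```python
# Simplified level-wise allocation sketch: tasks of a partition are taken
# largest-first, and each is placed on the VM that would finish it earliest.

def allocate_level(partition, vm_speed, vm_ready):
    """partition: list of (task, size); vm_speed, vm_ready: dicts keyed by VM id.
    Mutates vm_ready (per-VM finish times) and returns the schedule."""
    schedule = []
    for task, size in sorted(partition, key=lambda t: -t[1]):  # largest first
        # Pick the VM with the earliest finish time for this task.
        best = min(vm_ready, key=lambda k: vm_ready[k] + size / vm_speed[k])
        vm_ready[best] += size / vm_speed[best]  # update the VM's execution time
        schedule.append((task, best))            # task leaves the partition
    return schedule

speeds = {"V1": 9.0, "V2": 6.0, "V3": 3.0}   # computing capacities (MIs per unit time)
ready = {"V1": 0.0, "V2": 0.0, "V3": 0.0}    # initial ready times
print(allocate_level([("T11", 1100), ("T21", 900), ("T31", 300)], speeds, ready))
```

Running the loop level by level, from the first partition to the last, preserves the precedence order while spreading each level's tasks across the VMs.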
After task merging, the workflows are assigned to the selected VMs as per the allocation strategy mentioned earlier, and the respective values of RT_k^1, CC_k^l, and ET_k^1 are shown in Figure 8. Using SLBBS, the workflows are allocated level-wise, with tasks allocated on VM_k, and the respective values of RT_k^1, CC_k^l, and ET_k^1 are shown in Figure 9. The detailed illustration can be seen in [23]. Further, the performance parameters are computed as follows: … The same workflows are allocated using HEFT [27], and the performance parameters are computed. HEFT is implemented for multiple workflows in two different approaches, HEFT-1 and HEFT-2, viz. the sequential and merge-based approaches presented in [48], as shown in Figure 10 and Figure 11. In HEFT-1, workflows from the batch are allocated one after another using HEFT as in [26]. In HEFT-2, on the other hand, one pseudo-entry task and one pseudo-exit task form a single larger workflow from all the multiple workflows by joining the initial and end tasks to the pseudo-entry and pseudo-exit tasks, respectively. After this, HEFT is applied to the resultant DAG. The values of the parameters computed for HEFT-2 are … The parameter values computed as per the illustration are shown in Table 3 for the LMWS-TM, HEFT-2, HEFT-1, and SLBBS strategies, along with the performance metrics mentioned in the above example. LMWS-TM gives superior values of 473 and 0.9398 on TAT and system utilization, while the average response times of LMWS-TM and SLBBS are almost the same. Also, HEFT-1 shows the best value among all on flow time but the worst on average response time.
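The HEFT-2 transformation described above can be sketched as follows; the adjacency-dict representation and node names are illustrative assumptions on our part, not taken from [48]:

```python
# Sketch: form one DAG from several workflows by adding a pseudo-entry
# task before every workflow's initial tasks and a pseudo-exit task after
# every workflow's end tasks; HEFT can then be applied to the result.

def merge_workflows(workflows):
    """workflows: list of DAGs given as {task: [successor tasks]}."""
    merged = {"ENTRY": []}
    for dag in workflows:
        merged.update(dag)
        succs = {s for lst in dag.values() for s in lst}
        entries = [t for t in dag if t not in succs]        # tasks with no predecessor
        exits = [t for t, lst in dag.items() if not lst]    # tasks with no successor
        merged["ENTRY"].extend(entries)
        for t in exits:
            merged[t] = ["EXIT"]
    merged["EXIT"] = []
    return merged

wf1 = {"T11": ["T13"], "T12": ["T13"], "T13": []}
wf2 = {"T21": ["T22"], "T22": []}
print(merge_workflows([wf1, wf2]))
```

Giving the ENTRY and EXIT tasks zero execution and communication cost leaves every workflow's own schedule unaffected while exposing all of them to a single run of HEFT.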

F. TIME COMPLEXITY
The time complexity of the proposed algorithm is estimated per the steps involved in the algorithmic template in Section III E. Let the user submit n_WF workflows, WF = {wf_i : 1 ≤ i ≤ n_WF}, where wf_i has n_wf_i tasks. The average numbers of tasks in the workflows and partitions are n and t, respectively, i.e., n = (Σ_{i=1}^{n_WF} n_wf_i) / n_WF, and there are k VMs. The depth level of workflow wf_i is l_i, so the depth level of WF is L = max(l_i : 1 ≤ i ≤ n_WF), and L can be estimated as L = log n. The time complexity of LMWS-TM comprises partitioning the batch, sorting the partitions, task merging, and allocation. 1) Dividing the workflows into partitions takes O(L × n_WF × n).
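The level attribute that drives this partitioning can be computed per workflow in a single topological pass; the following is a minimal sketch under the same adjacency-dict assumption used in our earlier sketches (names are ours):

```python
from collections import deque

# Sketch: a task's depth level is one more than the maximum level of its
# predecessors; entry tasks sit at level 1. One pass over the DAG suffices,
# so computing levels is linear in the tasks and edges of each workflow.

def depth_levels(dag):
    """dag: {task: [successor tasks]}. Returns {task: depth level}."""
    preds = {t: 0 for t in dag}            # count predecessors of each task
    for succs in dag.values():
        for s in succs:
            preds[s] += 1
    level = {t: 1 for t in dag if preds[t] == 0}
    queue = deque(level)
    while queue:
        t = queue.popleft()
        for s in dag[t]:
            level[s] = max(level.get(s, 0), level[t] + 1)
            preds[s] -= 1
            if preds[s] == 0:
                queue.append(s)
    return level

wf = {"T11": ["T13"], "T12": ["T13", "T14"], "T13": ["T15"], "T14": ["T15"], "T15": []}
print(depth_levels(wf))
```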

IV. EXPERIMENTAL STUDY
An experimental study has been conducted to observe and analyze the effect of merging workflow tasks in the batch of workflows on the set of VMs available in the IaaS cloud system at the time of allocation. Statistical testing has also been conducted to check the significance level of the hypotheses derived from the study. The experiments were conducted in MATLAB 7.60 on an Intel(R) Core(TM) i7-3770 CPU with 2 GB RAM and a Sun Fire X4470 Server with 14 GB RAM. The study evaluates the comparative behavior of LMWS-TM against SLBBS, DLS, HEFT-2, and HEFT-1 for various cases to analyze its effectiveness in the middleware.

A. SIMULATION RESULTS
The simulation produces a realistic cloud environment and a batch of random workflows for evaluating the proposed strategy's performance. The multiple random workflows are characterized by parameters such as the number of workflows (batch size), precedence level, task size, degree of parallelism within the batch, and amount of inter-task communication among workflow tasks. Parallelism in the batch of multiple workflows is varied by varying the number of parallel tasks at the depth levels and vice versa. Similarly, the IaaS cloud system has attributes such as the number, computing capacities, and distances of the VMs. Accordingly, a simulator prototype of the workflow allocator implemented in MATLAB produces the associated input parameters of the workflows and VMs, randomly generated within specified feasible limits using a discrete uniform distribution. The common parameter settings used for all cases in the experimental results are given in Table 4. Experiments have been conducted by varying the input parameters related to the cloud system and the batch of workflows for all considered strategies, i.e., batch size, number of VMs, and parallelism (depth level). The simulation results for each case are presented in figures and tables in this Section. All experiments are repeated 50 times, and the mean of the corresponding parameters is reported in all cases to avoid the effect of randomness. For example, for varying batch size (case 1), the experiments are conducted 50 times for each batch size from 4 to 128, and the mean values of turnaround time, utilization, response time, and flow time are reported in Figures 12 to 15 and Table 5. The same pattern is used for the other cases. Further, the response time of HEFT-1 is very large in comparison to LMWS-TM, SLBBS, and HEFT-2. Therefore, for better presentation, HEFT-1 is omitted from Figures 15, 19, and 23, and only the remaining four methods are presented in these figures. However, the numerical results are also presented in the tables for clarity.
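The random input generation described above can be sketched as follows; the ranges, field names, and seed are illustrative, not the exact Table 4 settings:

```python
import random

# Sketch: task sizes, inter-task communication volumes, and VM attributes
# are drawn from discrete uniform distributions within feasible limits.

random.seed(7)  # a fixed seed makes one repetition reproducible

def random_batch(n_wf, tasks_per_wf, size_range=(100, 2000), data_range=(1, 100)):
    """Generate task sizes (MIs) and an inter-task communication matrix per workflow."""
    batch = []
    for _ in range(n_wf):
        sizes = [random.randint(*size_range) for _ in range(tasks_per_wf)]
        comm = [[random.randint(*data_range) for _ in range(tasks_per_wf)]
                for _ in range(tasks_per_wf)]
        batch.append({"sizes": sizes, "comm": comm})
    return batch

# VM attributes drawn the same way: computing capacity and initial ready time.
vms = [{"CC": random.randint(30, 1000), "RT": random.randint(0, 1000)}
       for _ in range(8)]
batch = random_batch(n_wf=4, tasks_per_wf=6)
print(len(batch), len(batch[0]["sizes"]))
```

Repeating such a draw 50 times and averaging the resulting metrics mirrors the repetition scheme used in the experiments.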
1) VARYING BATCH SIZE
For better visibility in the comparative analysis, the average values of TAT, Us, FT, and RT for varying numbers of workflows are also presented in Table 5, with the best values shown in bold.
Observations: 1) As shown in Figure 12, the turnaround time keeps increasing with the number of workflows for all the strategies, viz. LMWS-TM, SLBBS, HEFT-1, HEFT-2, and DLS, as expected with all other input parameters fixed. The rate of increase in turnaround time with the number of workflows is the least for LMWS-TM, which outperforms all other strategies for every batch size from small to large. The performance order for turnaround time is LMWS-TM (best), SLBBS, HEFT-2, HEFT-1, and DLS (worst). The observed performance gain of LMWS-TM over SLBBS is in the range of 16%-50% for 4 to 128 workflows in the batch. This gain is due to the amalgamation of task merging prior to allocation, which reduces the communication cost share and the total number of depth levels, resulting in improved TAT.
2) The trend of system utilization increases with batch size, keeping the other parameters fixed, as shown in Figure 13. The performance order for utilization is LMWS-TM (best), SLBBS, DLS, HEFT-2, and HEFT-1 (worst). LMWS-TM outperforms all other strategies for every batch size in terms of system utilization. This is because, as more workflows accumulate, more tasks become available at any precedence level, so the batch exhibits more parallelism within itself. LMWS-TM is designed to exploit parallelism at both the workflow level and the task level, with uniform allocation on the available VMs. Further, the proposed scheme tries to fill the expected gaps produced by the size difference between tasks at the same depth level through task merging, ensuring better utilization. The improvement of LMWS-TM over SLBBS is significant for smaller batch sizes and roughly at par for larger batches. Task merging has no effect on response time because it combines tasks from the next depth levels with tasks at the previous levels. HEFT-2 performs better than DLS. HEFT-1 shows the worst performance because workflows are assigned one after another serially, exploiting only task-level parallelism in each DAG.

2) VARYING DEPTH LEVELS
Again, for better visibility in the comparative analysis, the average values of TAT, Us, FT, and RT for varying depth levels are also presented in Table 6, with the best values shown in bold.
Observations: 1) As presented in Figure 16, the turnaround time gradually increases with the depth levels for a fixed batch of 256 workflows. Increasing the number of levels reduces the parallelism in the workflows, degrading the turnaround time. LMWS-TM still performs best, and SLBBS is second best on turnaround time, followed by HEFT-1 and HEFT-2, with DLS worst. The performance gain of LMWS-TM over SLBBS for depth levels 8 to 64 is almost 22%-35%.
2) Average utilization also decreases slightly for all the methods considered in the experiments on varying the depth levels, as shown in Figure 17 (batch size 256). LMWS-TM, SLBBS, and DLS are approximately at par, while HEFT-1's performance is the worst. 3) In this scenario, the flow time gradually increases for all the methods considered, as shown in Figure 18. The performance gain over SLBBS is almost 9%-19%. 4) In the case of varying depth levels (parallel tasks), as shown in Figure 19, LMWS-TM and SLBBS are at par and perform significantly better than the other peers. Again, HEFT-1 performs worst, for the reason detailed in the earlier case.

3) VARYING NUMBER OF VM
This case presents the effect of the variation in hardware parallelism (number of VMs) from k = 8 to k = 128, with the remaining input parameters fixed as follows: n_WF = 64, RT_k = 0-1000, CC_k = 30-1000, T_ij = 512, Data_ixy = 1-100, L = 16, D_ab = 1-100, μ = 0.05, λ = 0.03, Q_u = 200000. Again, for better visibility in the comparative analysis, the average values of TAT, Us, FT, and RT for varying numbers of VMs are also presented in Table 7, with the best values shown in bold.
Observations: 1) … configurations of VMs among all the strategies taken in the experiments. The performance order is the same as in the first case, i.e., LMWS-TM, SLBBS, HEFT-2, HEFT-1, and DLS. The reason for the superior performance of LMWS-TM has been explained earlier in detail. The improvement in turnaround time for this case is almost 15%-30% for 8 to 128 VMs. The chances of selecting better machines increase with a larger number of VMs; consequently, the performance of all the strategies improves as VM instances are increased. 2) As shown in Figure 21, average utilization shows a decreasing trend with an increasing number of VMs. The performance order is LMWS-TM, SLBBS, HEFT-2, HEFT-1, and DLS. Again, the proposed model proves best in terms of utilization. 3) As presented in Figure 22, … HEFT-1, and DLS. In this case, the performance gain of LMWS-TM over SLBBS on flow time is almost 5% to 15%. 4) Response time follows the same decreasing trend on varying the VMs from 8 to 128, as presented in Figure 23. The performance order is the same as in the previous cases, with LMWS-TM and SLBBS at par.

B. STATISTICAL ANALYSIS
This part is devoted to statistical testing to evaluate the hypotheses developed from the outcomes of the simulation study. The analysis has been conducted with SPSS Statistics 20 on the data sets generated in the simulation experiments and presented in Figures 12 to 23. Table 8 presents the data generated by the experiments for the considered peers on TAT by varying the batch size (number of workflows), as shown in Figure 12. Table 9 presents the normality test results using the Kolmogorov-Smirnov and Shapiro-Wilk tests for the sample given in Table 8. Here, the sample size is taken as 6 for each case. In the table, almost all Sig. values are greater than 0.05. Hence, the null hypothesis of the normality test (Ho1) is accepted, and the alternate hypothesis H11 is rejected. Thus, the sample data are normally distributed for all the models for TAT on varying the batch size.
As verified, the samples are normally distributed, and all the models, namely LMWS-TM, SLBBS, DLS, HEFT-1, and HEFT-2, are independent, so the samples generated for the parameters TAT, Us, FT, and RT are also considered independent. Therefore, a one-way ANOVA test can be used to test the significance level between the samples. Table 10 presents the statistics of the one-way ANOVA test. Here, the significance (Sig.) value for TAT on the samples from the various models is 0.014, which is less than 0.05 at the 5% level of significance. Hence, Ho2 is rejected. Rejecting the null hypothesis shows that the proposed model, LMWS-TM, differs significantly from SLBBS, DLS, HEFT-1, and HEFT-2 on TAT. The mean plot presented in Figure 24 also confirms the considerably better performance of LMWS-TM on TAT. The sample data from the remaining experiments varying batch size, depth levels, and number of machines for TAT, Us, FT, and RT, as in Section IV A, have been tested in the same manner. For all samples under study on TAT and FT, Ho2 is rejected and H12 accepted, verifying that the performance of LMWS-TM is significantly the best in all cases on TAT and FT. Further, for the samples on RT, Ho2 is accepted every time, owing to the almost equal RT of SLBBS and LMWS-TM in all cases. When the SLBBS samples are removed and only the means of the four remaining models (LMWS-TM, HEFT-1, HEFT-2, and DLS) are compared, Ho2 is rejected in all the tests, and the acceptance of H12 confirms the significantly superior performance of LMWS-TM on RT among all models excluding SLBBS. For average utilization (Us), the majority of the experiments reject Ho2, though some samples under study accept it. Overall, the performance of LMWS-TM on Us is better than that of the others, with some exceptions.
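The one-way ANOVA statistic that SPSS reports can be reproduced by hand; the following pure-Python sketch uses illustrative numbers, not the paper's Table 8 data:

```python
# Hand computation of the one-way ANOVA F statistic: the ratio of
# between-group to within-group variance. SPSS's Sig. value is the tail
# probability of this F under the F(df_between, df_within) distribution.

def one_way_anova_f(groups):
    """groups: list of samples (one per model). Returns F and its dofs."""
    n = sum(len(g) for g in groups)                 # total observations
    k = len(groups)                                 # number of models
    grand = sum(sum(g) for g in groups) / n         # grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

# Three illustrative samples of equal spread but shifted means.
f, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
print(f, df_b, df_w)  # F = 7.0 with (2, 6) degrees of freedom
```

A large F, i.e., one exceeding the critical value of the F(df_b, df_w) distribution at the 5% level, corresponds to Sig. < 0.05 and thus to rejecting Ho2, as done above.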
Thus, the results of hypothesis testing confirm the conclusions drawn from the simulation study in section IV A.

V. CONCLUSION AND FUTURE SCOPE
Strategies dealing with a batch of multiple workflows perform better by considering parallelism at both the workflow and task levels while satisfying the precedence constraints. This work presents a levelized multiple workflow allocation strategy with task merging (LMWS-TM) for multiple workflows with precedence constraints to optimize turnaround time. The task merging scheme is inspired by the clustering approach to reduce the inter-task communication share and the total number of depth levels. It is incorporated in each workflow after partitioning, followed by level-wise allocation from the first level to the last to enhance the overall execution time. Moreover, the scheme uses simple and flexible level attributes to tackle the precedence constraints that provide the order of execution of the tasks in the workflows. An experimental study has been carried out comparing LMWS-TM with other state-of-the-art DAG scheduling strategies, viz. SLBBS, DLS, and HEFT, by varying the batch size, the degree of parallelism in the workflows (depth levels), and the number of available nodes (hardware parallelism) to evaluate its suitability against the literature. The experimental study suggests that LMWS-TM performs significantly best among all its peers in all the cases under study.

KETAN KOTECHA is currently an Administrator and a Teacher of deep learning. He has expertise and experience in cutting-edge research and projects in AI and deep learning for over the last 25 years. He has published three patents and delivered keynote speeches at various national and international forums, including at Machine Intelligence Labs, USA, the IIT Bombay under the World Bank Project, the International Indian Science Festival organized by the Department of Science and Technology, Government of India, and many more. He has published widely, with more than 100 publications in several excellent peer-reviewed journals on various topics ranging from cutting-edge AI, education policies, and teaching-learning practices to AI for all. His research interests include artificial intelligence, computer algorithms, machine learning, and deep learning. He was a recipient of two SPARC projects in AI worth INR 166 lacs from MHRD, Government of India, in collaboration with Arizona State University, USA, and the University of Queensland, Australia. He was also a recipient of numerous prestigious awards, such as Erasmus …

… Deemed to be University. He has published more than 200 peer-reviewed research articles (indexed in SCI/SCIE) and ten international books. He was selected as an Outstanding Reviewer by Knowledge-Based Systems (Elsevier). He is also serving as a guest editor of more than 40 special issues in various peer-reviewed journals. More information can be found at: http://www.dhimangaurav.com.