A Fair, Dynamic Load Balanced Task Distribution Strategy for Heterogeneous Cloud Platforms Based On Markov Process Modeling

Load balancing techniques in cloud computing can be applied at three different levels: virtual machine load balancing, task load balancing, and resource load balancing. At every level, load balancing should be implemented efficiently in order to increase system performance. In this paper, we propose a task load balancing strategy that is fair in terms of the added workload per VM and aims to improve the average response time and the makespan of the system in the cloud environment. The problem is formulated as an irreducible finite-state Markov process, which is known to have a balance equation for each state. From the balance state probabilities we derive the expected utilizations of the virtual machines (VMs), which play a vital role in our task allocation approach. In our model, the Load Balancer (LBer) acts as a central server, which uses our proposed fair task allocation scheme to distribute the incoming tasks in a fair, balanced manner among the virtual machines, taking into account their current state as well as their processing capabilities. Our scheme has been compared to recent algorithms that use particle swarm optimization and the honey bee foraging scheme to achieve load balancing. Our experimental results show that the proposed scheme outperforms other state-of-the-art schemes in terms of makespan, average response time, and resource utilization, and provides a lower degree of imbalance.


I. INTRODUCTION
Cloud computing is a very popular internet-based technology, which provides resources and computer services on demand to customers with different needs [1], [2]. The provision of all these services is divided into three categories: Infrastructure as a Service (IaaS), Software as a Service (SaaS), and Platform as a Service (PaaS) [3]. Load balancing (LB) refers to the allocation of the workload in a distributed system like the cloud in such a manner that the system resources are almost equally loaded, that is, no resource is over- or under-loaded. In this manner, the overall system performance, which is determined by parameters like the makespan, the average response time, or the total execution time, improves dramatically.
The load balancing schemes can be broadly classified as proactive and reactive: the proactive approaches act in advance in order to prevent overloads, while the reactive approaches act after the overload problem appears. A narrower classification then divides the LB schemes into the following categories: virtual machine load balancing (VMLB) schemes, which redistribute the VMs from overloaded nodes to less loaded nodes [4], [5]; task load balancing (TLB) schemes, which evenly distribute the tasks among the VMs [6], [40]; and resource load balancing (RLB) schemes, which focus on the management of the available resources like servers, network links [8], CPU, memory, and bandwidth [10]. The LB strategies are further divided into static and dynamic: the static strategies set up pre-specified workload distributions that remain unaltered during runtime, while the dynamic strategies have the capability of adapting to system changes and distributing the workload during runtime. Apparently, the static LB strategies fail to respond to system changes, thus resulting in overall poor system performance. This fact turns the spotlight on the dynamic LB strategies. Dynamic load balancing is one of the most important aspects of scheduling in cloud systems. The workload must be distributed among the VMs in such a manner that the response times and makespan values are reduced. An efficient load balancing algorithm prevents over-utilization or even exhaustion of the available resources.
Quite often, large numbers of arriving tasks can cause resource exhaustion. In such a scenario, the VM is unable to handle a percentage of these tasks, which remain unprocessed and unaccomplished. Thus, proper VM selection is required during task distribution, which is usually based on the current workload. However, the current workload alone may not be a good criterion. This is because the workload on different machines may vary because of differences in their computing capacities [1] and because they handle tasks of different sizes and processing requirements [9].
In this work, we introduce a new dynamic task load balancing scheme to efficiently distribute the tasks among the system's VMs. The novel idea introduced in our scheme is that it considers fairness as the main criterion of task distribution among the available VMs. A fair task distribution scheme (1) assigns the new workload to each VM proportionally to its current processing capacity and (2) performs this assignment in such a manner that all the VMs are expected to be equally utilized after the distribution of the new tasks.
Our fair task distribution scheme is based on a queuing network model, where the LBer acts as a central server that feeds the VMs. Each VM is modeled as a queue where the assigned tasks are inserted. We employ a closed network model and we work on a time slot basis. For the short period of a time slot, we can assume that the system performance is not affected by external factors like the rate at which the users submit their tasks. The main contributions of our fair task distribution policy are summarized as follows:
• We use a closed network model and thus, our LB strategy is independent of the task arrival rates produced by the user community.
• The time-slot by time-slot system monitoring can help in fixing imbalances on time.
• The Markov process model is easy to apply and computationally efficient.
• The proposed scheme can be applied to heterogeneous clouds with different configurations (task sizes, service rates, and utilizations).
• Task migrations are needed only when a VM is found to be performing unreliably.
The remainder of this article is organized as follows: Section 2 presents a review of the related work. In Section 3, we describe the system model and provide two motivating examples to expose the difficulties of balancing the load in a fair way. In Section 4, we present our approach to task distribution by incorporating our fair distribution strategy into a Markov process model. Section 5 presents our experimental results, and Section 6 concludes this paper and gives future work directions.

II. RELATED WORK
Load balancing is one of the most important topics in cloud computing and many papers have focused on it. There is a large number of surveys that present different taxonomies of LB schemes [2], [11]-[19], among many others. In this section, we discuss papers that belong to the categories of virtual machine LB, resource LB, and task LB.
VMLB distributes the VMs from overloaded to less loaded nodes. A number of VMLB strategies have been proposed: in [4], the authors propose a system for dynamic, well-organized load balancing using VMware Workstation and a genetic algorithm that implements migrations by examining each VM's fitness (overloaded VMs have low fitness). Machine learning algorithms have also been used to group the VMs based on resource (like RAM or CPU) utilization [20]-[23]. Other VMLB strategies are based on active monitoring to locate the least loaded virtual machine among all the virtual machines; the dynamic resource reconfiguration is performed according to the real-time requirements [24], [25]. The aforementioned strategies, just like the majority of LB strategies, are reactive [2]. An interesting proactive, predictive VMLB approach can be found in [5], where ant colony optimization is combined with particle swarm optimization to achieve VMLB. Another predictive strategy introduces a rule-based load-balancing algorithm based on the predictions of an end-to-end system called Cicada [26].
The RLB strategies focus on the management of the available resources. Some strategies employ game-theoretical approaches to RLB. For example, [1] and [27]-[30] formulate the problem as a non-cooperative game among multiple servers, where each server is provided with information about the other servers. Tang et al. [8] propose an approach for OpenFlow network models, which is implemented on a time slot basis. Chen et al. [31] consider the problem of LB in a multi-objective framework, where initially the problem of resource allocation for emergent demands is resolved. In [32], the authors present a load-balancing framework with the objective of minimizing the operational cost of data centers, using a genetic algorithm for resource allocation. Weight factors have also been employed for resources like physical memory, bandwidth, number of processors, and processor speed [33]. The aforementioned strategies are reactive. Proactive methods can also be found in the literature [34]-[36]. More specifically, Singh et al. [34] base their proactive approach on autonomous agents, and whenever the load of a VM approaches a threshold, the agent looks for an alternative VM in another data center; Xiao et al. [35] employ a non-cooperative game-theoretical approach; in [36], a predictive strategy, which efficiently predicts the future need for resources, is proposed. Task LB has been widely studied over the past years. As with the VMLB and RLB schemes, the reactive approaches are far more widely used. The proactive TLB approaches try to detect task overloads before they actually happen [37]-[40]. The main drawback of the proactive task load balancing approaches that have been proposed is that they are used in a rather traditional way and introduce no novel concepts [2].
The reactive TLB approaches respond to a load imbalance situation. A variety of reactive LB techniques have been proposed [41]-[50], the majority of which are multi-objective, in the sense that they aim at enhancing several metrics like makespan, response time, execution time, throughput, etc. A few single-objective schemes have also been proposed. More specifically, in [41], the authors implement a honey bee inspired TLB scheme; the basic idea is taken from the food-finding behavior of honey bees. The algorithm tries to optimize the makespan (overall task completion time). A similar heuristic approach is taken by Gupta et al. [42] to adjust the scheduling load. In [43], a Simulated Annealing (SA) approach is taken to balance the load of the cloud infrastructure and reduce the response time.
The multi-objective LB strategies are implemented in such a way that more than one factor characterizing LB is improved. More specifically, Pradhan et al. [44] propose a particle swarm optimization (PSO) load balancing technique, which aims at minimizing the makespan. The same objective can be found in [45], where a heuristic-based load-balancing algorithm (HBLBA) is proposed. The TLB is expressed as an optimization problem, which aims at minimizing the makespan while maximizing the resource utilization. In [46], the proposed algorithm is an extension of the Min-Min schedule using a genetic optimization algorithm, which employs a computerized search based on natural selection and genetics. The result is an improvement in the makespan and resource utilization. In [47], the authors base their strategy on monitoring possible violations of the SLA requirements by examining whether the task completion time exceeds a defined deadline.
An agent-based strategy is proposed in [48]; the agents assigned to resources learn to select the best sequence of tasks that can optimize the total makespan of the workflow, enhance the utilization of resources, and improve load balancing between resources. In [49], the notion of fair distribution of the workload among the VMs is considered. In contrast to the single honey bee foraging behavior (which first uses round robin to assign tasks to virtual machines and then balances their workload), this scheme first makes the VM selection by checking its load. Then, for an incoming task, it checks how the VM status would change if this task were assigned to it. In this way, the under-loaded VMs are selected to accommodate the incoming tasks, in order to reduce the makespan and increase the degree of load balancing. In [50], the Adaptive Dragonfly Algorithm (ADA) combines the dragonfly and the firefly algorithms. ADA uses a multi-objective function to optimize the makespan, the processing costs, and the load. Table 1 summarizes the techniques used by reactive TLB state-of-the-art strategies, their objectives, and the system information they take into account.
Another interesting approach to task distribution is the so-called cooperative strategy. In this approach, communication and cooperation between the participants are required to accomplish a complex task, which is divided into smaller and simpler sub-tasks. Such tasks may require a specific number of users with specific machine characteristics [51]-[58]. Most of the aforementioned strategies focus on a trade-off between quality and cost, and they first try to find the appropriate users that will cooperate to accomplish the task before distributing the sub-tasks. The issue of trust among the cooperating parties is taken into account, so that the cooperation produces better results. However, these schemes do not consider the task allocation problem in a large-scale scenario ([56] is an exception) and they lack explicit fairness mechanisms.
The proposed scheme is a novel dynamic, reactive TLB scheme, which applies a Markov model to the problem of load balancing. A time slot-by-time slot approach is taken to monitor the current load. Its novelty is that fairness is the basis of the task distribution strategy. Our scheme is multi-objective and tries to enhance four metrics: makespan, average response time, average utilization, and degree of imbalance.

III. PRELIMINARIES
In this section, we first present the framework of our work and then a number of motivating examples that illustrate the difficulties of load balanced task allocation and the advantages of our work. Fig. 1 shows the system model of the proposed LB scheme. The user community generates tasks, which are assigned among the VMs of the cloud data center. The VMs are responsible for processing the user tasks. Each user submits different numbers and sizes of tasks, which have to be distributed among the VMs in such a way that their load remains as balanced as possible, in order to achieve good performance.

1) Markov Process Model
From Fig. 1, it is clear that each VM is modeled as a queue system; the user tasks to be processed by a VM enter its queue, so each VM can be considered as a single-server model. A central server, the load balancer (LBer), is the input to the VMs. We let s represent the overall number of VMs plus the LBer. Throughout this work, we use the index value 1 for the LBer and the remaining s − 1 index values for the VMs. The state of our network with s elements (the LBer and the VMs) is given by a vector N = (N_1, N_2, ..., N_s), where N_k is the number of tasks being processed in element k and N is the overall number of tasks, that is, Σ_{k=1}^{s} N_k = N. For k = 1 (the LBer), processing refers to the allocation of tasks, while for k = 2, ..., s (the VMs) it refers to task accomplishment. The Markov process model is irreducible, that is, each state can be reached from any other state with non-zero probability. Therefore, the equilibrium state probability distribution can be derived. If μ_k is the expected processing capacity of VM k (the processing capacity under a certain load), the equilibrium state probability is

P(N_1, N_2, ..., N_s) = P(N, 0, ..., 0) · Π_{k=2}^{s} y_k^{N_k},

where P(N, 0, ..., 0) is the probability that all the tasks are still located in the LBer (all the tasks are unallocated). In [59], the authors have proven that the equilibrium state probability for a server unit follows this product form, where

y_k = μ_1 · b_{1k} / μ_k

is the distribution factor between the LBer and VM k, b_{1k} is the probability of distributing one task to VM k under the current system configuration, and F(N) expresses all the possible combinations of task placements within each server unit:

F(N) = Σ_{N_1 + N_2 + ... + N_s = N} Π_{k=2}^{s} y_k^{N_k}.

The exponent terms N_k in Eq. 5 are all the possible numbers of tasks that may be allocated to VM k. This number can vary from 0 to N (in which case VM k has N tasks and all the others are empty). If we consider only the states where N_1 > 0, that is, at least one task remains undistributed by the load balancer, we obtain F(N − 1), and the expected utilization of the LBer can be computed by dividing F(N − 1) by F(N), where F(N) also includes the states where N_1 may be reduced to 0:

p_1 = F(N − 1) / F(N).

This value shows the percentage of time the LBer is expected to be busy in distributing the N tasks and increases with N until reaching 1. When p_1 is known, the expected utilizations of the VMs for all the possible task placements can be computed as [59]:

p_k = y_k · p_1 = (μ_1 · b_{1k} / μ_k) · p_1,  k = 2, ..., s,

where the expected utilization p_k of VM k is the percentage of time the VM is expected to be busy in processing a portion of the N tasks.
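To make these computations concrete, the following minimal Python sketch enumerates the task placements that contribute to F(N) and derives p_1 and the p_k values from the product-form expressions above. The function names and the example configuration (3 VMs, 10 tasks, equal routing probabilities) are ours and purely illustrative.

def F(N, y):
    # Normalization constant F(N): sum over all placements with
    # N_2 + ... + N_s <= N of prod_k y_k^(N_k); the remaining tasks
    # sit at the LBer, which contributes a factor of 1.
    if not y:
        return 1.0
    head, tail = y[0], y[1:]
    return sum(head ** n * F(N - n, tail) for n in range(N + 1))

def expected_utilizations(N, mu, b, mu1):
    # y_k = mu_1 * b_1k / mu_k, p_1 = F(N-1)/F(N), p_k = y_k * p_1
    y = [mu1 * bk / mk for bk, mk in zip(b, mu)]
    p1 = F(N - 1, y) / F(N, y)
    return p1, [yk * p1 for yk in y]

# hypothetical configuration: 3 VMs, 10 tasks, equal routing probabilities
p1, p = expected_utilizations(N=10, mu=[8.0, 6.0, 4.0], b=[1/3, 1/3, 1/3], mu1=10.0)
print(p1, p)

For the small values of N and s used here, this direct enumeration is sufficient; larger configurations would use the recursive computation discussed in the time analysis of Section IV.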

2) Server Availability
In this work, we use a variation of the VM availability model proposed in [1]:

μ_k = μ̂_k − ℓ_k,

where μ_k is the expected processing capacity (in number of tasks) of VM k, μ̂_k is its maximum processing capacity, ℓ_k is its current workload, and μ̂_k > ℓ_k. The maximum processing capacity refers to the processing capacity without considering the load impact, while the expected processing capacity is the capacity affected by the current load. A fair assumption that holds throughout the model is that the maximum processing capacity of each VM is independent of the corresponding rate of the other VMs. This independence assumption is necessary, since clouds are usually heterogeneous and integrate different components.

3) Model Parameters
This section describes some of the model parameters, which will be used to evaluate our strategy and make comparisons. The computing power (CP) of each VM, expressed in Millions of Instructions Per Second (MIPS), is given by

CP_k = Σ_{c ∈ cores(k)} MIPS_c,

that is, the total computing power of all the cores assigned to VM k. The execution time of a task i is given by

ET_i = TaskSize_i / CP_k,

that is, the size of the task divided by the CP of the executing VM k. The makespan is the maximum of the completion times (CT) of all the N tasks assigned to the VMs:

Makespan = max_{i = 1, ..., N} CT_i.

Finally, the response time of a task i within a VM's queue is the time elapsed from the task's submission up to the time it starts its execution. The execution starts after the completion of the i − 1 tasks that are ahead of task i in the queue:

RT_i = Σ_{j=1}^{i−1} ET_j.

The main notation used throughout this work is summarized as follows:
N_k: the number of tasks assigned to VM k
μ_k: the expected processing capacity of VM k
μ_1: the expected processing capacity of the LBer
μ̂_k: the maximum processing capacity of VM k
b_{1k}: the probability of distributing tasks to VM k
y_k: the distribution factor for VM k
ℓ_k: the workload at VM k
p_1: the expected utilization of the LBer
RT_i: the response time for task i
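As a quick illustration of these definitions, the following small Python helpers (the names are ours, not part of the model) compute the execution time, the waiting-before-execution response time, and the makespan:

def execution_time(task_size_mi, cp_mips):
    # ET_i = task size (millions of instructions) / computing power (MIPS)
    return task_size_mi / cp_mips

def response_time(queue_exec_times, i):
    # RT of the task at 0-based queue position i: the i tasks ahead of it
    # in the queue must finish before it can start
    return sum(queue_exec_times[:i])

def makespan(completion_times):
    # makespan = the latest completion time over all tasks
    return max(completion_times)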

B. MOTIVATING EXAMPLES
Consider a small set of four VMs, as shown in Fig. 1. The user community generates a number of tasks, which are assigned among the VMs of the cloud data center. Each user submits different numbers and sizes of tasks, which are queued in each VM. The parameters for the examples that follow are given in Table 2.

EXAMPLE 1: Suppose that 5 tasks have remained unprocessed in each VM and that a batch of 40 new tasks arrives. A simple active monitoring scheduler would distribute the incoming tasks equally (10 per VM), so each VM would hold 15 tasks. With the parameters of Table 2, the execution time per task is 5 sec for tasks in VM 1, 5.55 sec for tasks in VM 2, 9.09 sec for tasks in VM 3, and 20 sec for tasks in VM 4. The makespan will be the completion time of the last task running in the slowest VM 4, 15 × 20 = 300 sec. By applying Eq. 13, we get each task's response time for the simple active monitoring scheduler. These values are given in Table 4. The average response time for all the tasks is 49.43. The total execution time is the sum of the execution times of all the tasks (as computed by Eq. 11): 75 + 83.25 + 136.35 + 300 = 594.6 sec.

Our fair task distribution scheme considers the maximum processing capacity and the current load ℓ_k of each VM. In such a scheme (the details are given in the next section), the 40 tasks are distributed as follows:
- VM 1: 17 tasks, 22 in total
- VM 2: 15 tasks, 20 in total
- VM 3: 8 tasks, 13 in total
- VM 4: 0 tasks, 5 in total
Then, the task execution per VM will be completed at: (17 + 5) × 5 = 110 for VM 1, (15 + 5) × 5.55 = 111 for VM 2, (8 + 5) × 9.09 = 118.2 for VM 3, and 5 × 20 = 100 for VM 4. The makespan is 118.2, so there is a reduction of (300 − 118.2)/300 ≈ 60%. The response times for the tasks are given in Table 4 for the fair scheduler. The average response time for all the tasks is 34.62, a reduction of about 30% compared to the simple active monitoring scheduler. The total execution time is 110 + 111 + 118.2 + 100 = 439.2 sec.

EXAMPLE 2: In the second example, the VMs have different computing power and the task sizes differ, as shown in Table 3. Table 5 shows the sizes of the 40 tasks to be distributed. As in Example 1, 20 tasks have remained unprocessed, 5 in each VM. These tasks arrived at t = 0, while the batch of the 40 new tasks arrives at t = 20. Again, a simple active monitoring scheduler would distribute the incoming tasks equally (10 per VM) and assign the tasks according to their computational intensity and the processing capacity of the VMs. Thus, the most intensive tasks will be assigned to the fastest VM, the next most intensive tasks will be assigned to the next fastest VM, and so on. Table 6 shows this distribution, while Table 7 provides the response times for each task in this schedule. Thus, VM 1 will be assigned 8 tasks of size 20000 plus two tasks of size 16000. The execution time required for these tasks will be ((20000 × 8) + (16000 × 2)) / 2000 = 96 sec. Also, the unprocessed 5 tasks require another 25 sec, so the total execution time for the fastest VM will be 121 sec. The execution times for the remaining VMs are given at the bottom of Table 6. The average execution time is (121 + 107.7 + 132.7 + 196) / 60 = 9.29.

Our fair distribution scheme, which considers each VM's processing capacity, will produce the schedule shown in Table 8. The makespan is 129.09 sec, an improvement of 34% compared to the simple active monitoring scheme. The right side of Table 7 shows the response times for all the tasks. The average response time is 44.74, an improvement of 10% compared to the simple active monitoring schedule. The execution times for the VMs are given at the bottom of Table 8. The average execution time is (129 + 127.77 + 129.09 + 100) / 60 = 8.09, an improvement of about 12% compared to the average execution time of the simple active monitoring strategy.
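The headline numbers of Example 1 can be reproduced with a few lines of Python, assuming the per-task execution times implied by Table 2 (5, 5.55, 9.09, and 20 sec) and the 5 leftover tasks per VM; the variable names below are ours:

per_task_time = [5.0, 5.55, 9.09, 20.0]   # seconds per task on VM 1..VM 4
leftover = [5, 5, 5, 5]                   # tasks already queued at t = 0

def finish_times(new_tasks):
    # completion time of the last task on each VM
    return [(old + new) * t
            for old, new, t in zip(leftover, new_tasks, per_task_time)]

equal_split = [10, 10, 10, 10]            # simple active monitoring scheduler
fair_split  = [17, 15, 8, 0]              # fair distribution of Example 1

print(max(finish_times(equal_split)))     # 300.0  (15 tasks x 20 sec on VM 4)
print(max(finish_times(fair_split)))      # ~118.2 (13 tasks x 9.09 sec on VM 3)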

IV. OUR APPROACH TO TASK DISTRIBUTION A. PROBLEM FORMULATION
Objective: The main objective of our approach is to fairly distribute the N incoming tasks submitted to the cloud within a time slot, so that all the VMs work for the same percentage of time to process the incoming load, based on the current system configuration, that is, the current VM load and VM processing capacity. This strategy guarantees load balancing, as we will prove later in this section.
Let us consider a source task distribution R, defined by the current per-VM configuration, that is, the set of values R = {(μ_k, ℓ_k, b_{1k}, p_k) : k = 2, ..., s}. Our objective can be described as a transition from R to a target distribution R* such that the resulting expected utilizations are equal among all the VMs, that is, all p*_k values are equal, k = 2, ..., s. In a new time slot, the LBer has N new tasks to distribute to the VMs, with an expected processing capacity μ_1 and expected utilization p_1. In the target distribution, the N tasks are distributed to the s − 1 VMs, resulting in lower expected processing capacities μ*_k, larger current loads ℓ*_k, and equal expected utilizations p*_k for the system's VMs. The probability b_{1k} may increase or decrease, depending on the workload assigned to VM k.
From Equations 4 and 10, we see that the expected utilizations of the VMs are

p_k = p_1 · μ_1 · b_{1k} / μ_k,  k = 2, ..., s.

In this regard, the expected utilizations are a function of the LBer's utilization p_1 and its expected processing capacity over a time slot, μ_1. In the following, we will assume that μ_1 = N over the period of a time slot, that is, the balancer can distribute the N incoming tasks over a period of one time slot. The expected processing capacity of the VMs, μ_k, is computed as μ̂_k − ℓ_k (Eq. 9), where the maximum processing capacity μ̂_k depends on the computing power of each VM. Finally, b_{1k} is equal for all the VMs in the source distribution, but it changes to reflect the different loads that will be assigned to each VM after the task distribution. In the remaining paragraphs of this section, we present how our fair task distribution scheme can be embedded in the Markov process model and we prove that it leads to equal expected utilizations and, finally, to load balancing.

B. OUR FAIR TASK DISTRIBUTION STRATEGY
The key idea of our task distribution strategy is fairness. By fairness, we mean that the new workload of N tasks must be distributed in such a way that (i) each VM takes on an added workload proportional to its current processing capacity and (ii) the expected utilizations of all the VMs become equal after the distribution, that is, the percentage of time during which the VMs will be busy with processing these newly assigned tasks will be equal. To embed fairness in the Markov process model, we take the following steps:
STEP 1: Initially, we compute the u_k term for each VM, k = 2, ..., s, which is the ratio of its expected processing capacity to the total number of tasks to be distributed during a time slot, μ_1:

u_k = μ_k / μ_1.

The larger the u_k value, the larger the workload a VM is able to take over.

STEP 2:
The load balancing factor, f, can simply be computed as the fraction of μ_1 over the sum of the u_k terms:

f = μ_1 / Σ_{k=2}^{s} u_k.

STEP 3: The workload assigned to each VM is computed as

ℓ*_k = f · u_k.
STEP 4: The expected processing capacity after the distribution, μ*_k, will be

μ*_k = μ_k − ℓ*_k,

that is, the current expected processing capacity minus the newly assigned load ℓ*_k.

STEP 5:
The probability of distributing tasks to VM k changes to b*_{1k} = ℓ*_k / μ_1; this quantity is the percentage of the total workload that has been assigned to VM k during this time slot.
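A compact Python sketch of Steps 1-5 (our own illustration, under the assumption that μ_1 equals the total workload to be distributed in the slot) is the following:

def fair_distribution(mu, mu1):
    # mu:  expected processing capacities mu_k of the VMs (Eq. 9)
    # mu1: total workload the LBer has to distribute in this time slot
    u = [mk / mu1 for mk in mu]                      # Step 1: u_k = mu_k / mu_1
    f = mu1 / sum(u)                                 # Step 2: balancing factor
    load = [f * uk for uk in u]                      # Step 3: l*_k = f * u_k
    mu_star = [mk - lk for mk, lk in zip(mu, load)]  # Step 4: mu*_k = mu_k - l*_k
    b_star = [lk / mu1 for lk in load]               # Step 5: b*_1k = l*_k / mu_1
    return load, mu_star, b_star

Note that the assigned loads sum to μ_1, so the whole workload of the slot is distributed, with each VM receiving a share proportional to its expected processing capacity.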
In the following propositions, we prove that if we embed these steps to the Markov process model, we can obtain a well-balanced task distribution.
Proposition 1: The distribution of the ℓ*_k load to each VM, as computed by our fair policy, reduces the expected processing capacities μ_k of the VMs proportionally.
Proof: Let us consider the expected processing capacities of two arbitrarily chosen VMs, μ_{k1} and μ_{k2}. From Eq. 18 and 19, we get that

ℓ*_{k1} / ℓ*_{k2} = u_{k1} / u_{k2} = μ_{k1} / μ_{k2},  that is,  μ_{k2} ℓ*_{k1} = μ_{k1} ℓ*_{k2}.

From Eq. 9, we know that the expected processing capacity of each VM decreases as more load is added. Let us consider μ_{k1} and μ_{k2}: μ_{k1} will be reduced by a factor d(μ_{k1}) and will become μ*_{k1} = μ_{k1} − μ_{k1} d(μ_{k1}), and similarly μ*_{k2} = μ_{k2} − μ_{k2} d(μ_{k2}). Therefore,

d(μ_{k1}) = (μ_{k1} − μ*_{k1}) / μ_{k1},   d(μ_{k2}) = (μ_{k2} − μ*_{k2}) / μ_{k2}.

The numerators of Eq. 22 are equal to ℓ*_{k1} and ℓ*_{k2}, respectively. Thus, Eq. 22 becomes

d(μ_{k1}) = ℓ*_{k1} / μ_{k1},   d(μ_{k2}) = ℓ*_{k2} / μ_{k2}.

The two ratios d(μ_{k1}) and d(μ_{k2}) are equal if μ_{k2} ℓ*_{k1} = μ_{k1} ℓ*_{k2}, which is always true, and this completes the proof.
Proposition 2: Our fair task distribution policy produces equal expected utilizations p*_k among all the VMs.
Proof: We denote by ω the common value of the d(μ_k) factors computed in Proposition 1. Now, let us consider the expected utilizations of two arbitrarily selected VMs, k_1 and k_2. Let us rewrite Eq. 16 for the target distribution, for VM k_1:

p*_{k1} = p_1 · μ_1 · b*_{1k1} / μ*_{k1}.

We know that b*_{1k} = ℓ*_k / μ_1 and, at the end of Proposition 1, we have shown that μ_{k2} ℓ*_{k1} = μ_{k1} ℓ*_{k2}. Also, from Eq. 22 it follows that μ*_{k1} = (1 − ω) μ_{k1}. Thus,

p*_{k1} = (p_1 μ_1 / ((1 − ω) μ_1)) · (ℓ*_{k1} / μ_{k1}),

and similarly for p*_{k2}. As μ_{k2} ℓ*_{k1} = μ_{k1} ℓ*_{k2}, the ratios ℓ*_{k1}/μ_{k1} and ℓ*_{k2}/μ_{k2} are equal, and since the factor p_1 μ_1 / ((1 − ω) μ_1) is the same for both expected utilizations, it follows that p*_{k1} = p*_{k2}, which completes the proof.
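A quick numerical check of Propositions 1 and 2 (not a substitute for the proofs; the capacities, workload, and p_1 below are hypothetical values of our own) shows that the reduction factors d(μ_k) and the resulting expected utilizations indeed come out equal:

mu  = [35.0, 31.0, 17.0, 5.0]     # hypothetical expected capacities mu_k
mu1 = 40.0                        # workload to distribute in the slot
p1  = 0.8                         # hypothetical LBer utilization

u = [mk / mu1 for mk in mu]
f = mu1 / sum(u)
load = [f * uk for uk in u]                       # l*_k
d = [lk / mk for lk, mk in zip(load, mu)]         # d(mu_k) = l*_k / mu_k
mu_star = [mk - lk for mk, lk in zip(mu, load)]   # mu*_k = (1 - w) mu_k
b_star = [lk / mu1 for lk in load]                # b*_1k = l*_k / mu_1
p_star = [p1 * mu1 * bk / mk for bk, mk in zip(b_star, mu_star)]

print(d)       # all entries equal (up to rounding): d(mu_k) = f / mu_1
print(p_star)  # all entries equal: the expected utilizations are balanced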

C. LOAD BALANCING ANALYSIS
To prove that our scheme provides load balancing, we use the load imbalance factor, defined as

δ(t) = (1/s) · Σ_{k=2}^{s} (p̄(t) − p_k(t))²,

where s is the number of processing elements and p̄(t) is the average expected utilization over a time slot t, given by

p̄(t) = (1/s) · Σ_{k=2}^{s} p_k(t).

The load imbalance factor δ(t) measures the variance between the average expected utilization and the expected utilization of each element at each time slot. Our scheme is reactive, that is, it divides the time into slots and reacts after the first slot, t_0, in which imbalances occur. In [8], the authors provided a definition of a load balanced network: if, for every time slot t, δ(t) ≤ ∆, where ∆ is a threshold value, the network is balanced.
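A minimal sketch of this computation, assuming the mean-squared-deviation form of δ(t) given above (the function name is ours), is:

def imbalance_factor(p):
    # delta(t): mean squared deviation of the expected utilizations p_k
    # of the processing elements from their average (Eqs. 24 and 25)
    avg = sum(p) / len(p)
    return sum((avg - pk) ** 2 for pk in p) / len(p)

print(imbalance_factor([0.6, 0.6, 0.6]))   # 0.0: perfectly balanced
print(imbalance_factor([0.9, 0.5, 0.4]))   # > 0: an imbalance is detected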
We will use this definition to prove that our scheme guarantees load balancing.

Proposition 3: Our fair task distribution strategy guarantees load balancing on a slot-by-slot basis.
Proof: To prove this proposition, we need to prove that, for every time slot t, δ(t) is bounded by a predefined threshold. Let us start with a time slot t_0 (source distribution), where an imbalance situation is spotted. The load imbalance factor is δ(t_0) and it is computed by Eq. 24. By the end of the next slot, t_1, when the task distribution is completed, let α denote the number of VMs that are already overloaded, that is, the VMs for which

p_k ≥ 1.   (26)

Case 1: If all the p_k values are < 1, all the VMs can be assigned extra workload. In this case, because we have already proved that the expected utilizations will be equal, the average expected utilization will be p*_k, every term of the numerator of Eq. 24 vanishes, and thus δ(t_1) = 0.
Case 2: If α VMs are already overloaded (p_k ≥ 1), they take no extra workload and their p*_k values will be 0 in this time slot. The remaining s − α VMs will have equal p*_k values, so the average expected utilization will be (s − α) p*_k / s. Let us consider the terms of the numerator of Eq. 24:
• If p*_k = 0 (an overloaded VM), the deviation term produced equals the average itself, which we denote δ(t) = (s − α) p*_k / s.
• If p*_k > 0, the deviation term produced is < δ(t).
Thus, the largest term that can be produced is δ(t) and, because in the worst case there can be at most α = s − 1 zero (p*_k = 0) terms, the numerator of Eq. 24 remains bounded. Thus, we set ∆ = (α − 1) (δ(t))² / s as our threshold value immediately after an imbalance is spotted. We have proved that δ(t) will never exceed ∆; thus, our scheme guarantees load balancing.

D. TIME ANALYSIS
To analyze the complexity of our strategy, we need to analyze the complexity of the computations required for the transition from R to R*. The computations of F(N), which are required to find the p values, are recursive and are clearly O(N). The computations of Equations 17-20 are clearly O(s). That is, our scheme is computationally efficient, as the computations grow linearly with the number of incoming tasks N and the number of processing elements s. Under the reasonable assumption that N > s, the complexity is O(N).

E. BACK TO THE MOTIVATING EXAMPLES
In this subsection, we briefly show how our strategy applies in the motivating examples of Section 3.

1) Example 1
In slot t_0, μ_1 = 40, ℓ_k = 5 tasks, and b_{1k} = 1/4 = 0.25 (equal probabilities for all the VMs). The maximum processing capacity μ̂_k of each VM k has been computed in proportion to 1/M_k, where the M_k term is a factor that expresses the processing capacity of each VM relative to the maximum CP that exists among the VMs. Apparently, δ(t_1) < ∆. From the proof of Proposition 3, it is clear that if we apply our scheme for a number of time slots, this inequality will continue to hold and balancing is guaranteed.

2) Example 2
In the second example, because the task sizes differ, we make some changes in the source distribution. Specifically, we set μ_1 to the total workload (not the number of tasks, as in Example 1) that is to be distributed, and we accordingly set the maximum and expected processing capacities. In this example, the total workload is 8 × (20000 + 16000 + 12000 + 8000 + 4000) = 480000. The maximum processing capacities are: μ̂_2 = 480000, μ̂_3 = 480000/1.11 = 432000, μ̂_4 = 480000/1.82 = 264000, and μ̂_5 = 480000/4 = 120000. The remaining computations are as in Example 1. The ℓ*_k values are expressed as a workload size. To find the number of tasks, we use a change-making approach that approximates the minimum number of tasks that correspond to this workload. We start from the fastest VM, in order to assign to it the majority of the heavy tasks, and proceed to the next fastest, and so on. For this example, the distribution (expressed in workload size) is ℓ*_2 = 201169, ℓ*_3 = 178713, ℓ*_4 = 100116, and ℓ*_5 = 0. For ℓ*_2, we need 8 tasks of size 20000 and another 3 of size 16000; for ℓ*_3, we need 5 tasks of size 16000, another 8 of size 12000, and one task of size 4000. Finally, for ℓ*_4, we need 8 tasks of size 8000 and another 7 of size 4000.
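The greedy change-making step can be sketched in Python as follows. The function name, the task pool layout (8 tasks of each size, mirroring Example 2), and the greedy rule of taking the largest sizes until the target workload is reached are our own reading of the description above; the exact tie-breaking used may differ.

def tasks_for_workload(target, pool):
    # Greedy approximation: starting from the largest task size, keep taking
    # tasks while the accumulated workload is still below the target.
    # pool maps task size -> number of unassigned tasks of that size.
    taken, acc = {}, 0
    for size in sorted(pool, reverse=True):
        while pool[size] > 0 and acc < target:
            pool[size] -= 1
            taken[size] = taken.get(size, 0) + 1
            acc += size
    return taken, acc

pool = {20000: 8, 16000: 8, 12000: 8, 8000: 8, 4000: 8}
print(tasks_for_workload(201169, pool))  # ({20000: 8, 16000: 3}, 208000), as for l*_2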

V. EXPERIMENTAL RESULTS
The proposed scheduling strategy is evaluated on GRNET's cloud service okeanos-knossos, which provides a wide range of choices to develop, debug, and evaluate an experimental system. We ran three experiments to compare our strategy with three recently proposed schemes: the Load Balancing Modified Particle Swarm Optimization (LBMPSO) [44], the Honey Bee Foraging (HBF) with VM pre-selection [49], and the community-based Particle Swarm Optimization (CPSO) scheme introduced in [56]. These schemes are multi-objective; the first aims at enhancing the makespan and the resource utilization, the second aims at enhancing the makespan, the degree of LB, and the response time, while the third aims at reducing the processing costs and the response time. Also, recall that the third strategy has been selected among many community-based dynamic task allocation strategies because of its large-scale capabilities. For each experiment, we ran 10 simulations and averaged the results. Table 9 shows the basic simulation parameters.
The simulation system works in a manner quite similar to what we have already described in our motivating examples. However, we added the rule that, if the expected utilization of a VM exceeds 0.9, it cannot accept any task requests, that is, its u k value becomes 0. This VM can accept further requests once it completes the execution of its tasks, so its utilization drops below 0.9. Fig. 2 presents a flowchart of how our simulator works.
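The over-utilization rule can be expressed as a one-line filter (the names below are ours): VMs whose expected utilization exceeds the 0.9 threshold are simply left out of the current slot's distribution.

def eligible_vms(p, threshold=0.9):
    # indices of VMs that may still receive tasks in this time slot;
    # over-utilized VMs are skipped (their u_k is treated as 0)
    return [k for k, pk in enumerate(p) if pk <= threshold]

print(eligible_vms([0.95, 0.7, 0.3]))  # [1, 2]: the first VM is excluded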
In all the experiments, we initially (in the first time slot) generated an imbalance situation by equally assigning a small number of tasks among the heterogeneous VMs. In the remaining time slots, we iteratively assigned tasks to the available VMs using our strategy, in order to remove the imbalance. For the first experiment, we have 2 VMs of 250 MIPS and 3 VMs of 300 MIPS. For the second experiment, we have 5 VMs of 500 MIPS, 5 VMs of 1000 MIPS, ..., and 5 VMs of 4000 MIPS. Finally, the tasks were generated randomly, but their sizes were uniformly distributed.

A. MAKESPAN
The makespan is perhaps the most widely used and important metric for evaluating a task scheduling algorithm. In Fig. 3, we compare the makespan of our scheme to the makespan of the LBMPSO scheme. We have proved that our scheme can balance the load under the different task numbers and sizes that may be produced during a time slot. This fact eliminates the need for task migrations, which are necessary only in cases where VMs are unavailable due to failures. The LBMPSO scheme utilizes the task migration approach whenever the VMs are overloaded, and thus its makespan was found to be about 40% higher compared to our scheme.

B. AVERAGE UTILIZATION
One of the objectives of the LBMPSO scheme is to enhance the VMs' utilization. Specifically, it uses the idea of distributing the tasks in such a way that, for each VM k that is assigned N tasks, the ratio ET_k / Makespan is the maximum possible. We experimented with two different scenarios to compare our work to the LBMPSO strategy: in the first scenario, the available VMs are 5; in the second, they are 3. The average utilization for LBMPSO approaches 0.65 (see Fig. 5(a)). When the number of VMs decreases, the LBMPSO increases the utilization, which reaches values > 1 (see Fig. 5(b)). This means that further task assignment can lead to long execution times, large makespan values, and imbalances. Our scheme maintains equal utilization values among the VMs for both scenarios, and the utilization values increased smoothly as we kept adding new tasks. For 3 VMs, the utilization approached 0.9, but no over-utilization occurred, because our scheme temporarily excludes over-utilized VMs.

C. DEGREE OF IMBALANCE
The degree of imbalance is given by

DI = (T_max − T_min) / T_avg,

where T_max and T_min are the maximum and minimum execution times of all the tasks among all the VMs, and T_avg is their average value. We have compared our strategy against the HBF scheme with VM pre-selection for 70 tasks. From Fig. 6, it is clear that our scheme achieves very low values for the degree of imbalance. The reason is that T_max and T_min approximate each other, as a result of our load balancing policy. The HBF strategy achieves good balancing, but there are cases where the response times (see the next paragraph) increase the maximum execution times T_max and, thus, the degree of imbalance.
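The degree of imbalance can be computed directly from the per-task execution times; a minimal sketch (the function name is ours):

def degree_of_imbalance(exec_times):
    # DI = (T_max - T_min) / T_avg over the task execution times on all VMs
    t_max, t_min = max(exec_times), min(exec_times)
    t_avg = sum(exec_times) / len(exec_times)
    return (t_max - t_min) / t_avg

print(degree_of_imbalance([9.8, 10.0, 10.2]))  # small DI: balanced execution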

D. RESPONSE TIME
In the last set of experiments, we compared the response times of our scheme, the HBF, and the CPSO. Fig. 7 shows the average response time comparisons. The HBF scheme is designed mainly to prevent overloading. As the number of tasks increases, the overloaded or balanced VMs are not assigned extra workload. Therefore, more tasks are waiting in the queues of the underloaded VMs, resulting in higher response times. The CPSO scheme employs a task allocation algorithm which reduces the response time per task. The tasks are assigned among selected co-workers, but it has the drawback that, if there are no available workers to perform a set of tasks, these tasks can be queued until the potential workers are released from their currently assigned tasks. This increases the response time. Our work proportionally distributes the tasks to the VMs based on their expected processing capacities. Thus, the waiting times in the queues are lower compared to the other schemes.

VI. CONCLUSIONS -FUTURE WORK
In this work, we addressed the problem of load balanced task scheduling. Our system model is based on the Markov process model, which is combined with a simple fair task distribution scheme. From the balance state probabilities, we obtain the expected utilizations of the virtual machines (VMs). Our fair task allocation policy is implemented on a time slot basis and in such a way that the expected utilizations of all the VMs are equal. We proved that this scheme always guarantees load balancing. The proposed scheme is multi-objective in the sense that it enhances a number of important metrics: makespan, average utilization, degree of imbalance, and response time. Compared with three recent state-of-the-art schemes, namely the Load Balancing Modified Particle Swarm Optimization (LBMPSO), the Honey Bee Foraging (HBF) with VM pre-selection, and the community-based
Particle Swarm Optimization (CPSO) scheme, our task allocation strategy was found to achieve better results with respect to the aforementioned metrics. An issue that needs to be further investigated is optimality: our work was proven to guarantee load balancing and it provides good results for a variety of metrics under different distribution scenarios, but we have not yet proved that the produced distributions are optimal. Also, we need to embed a strategy for VM availability. Currently, when a VM is not functioning properly, its tasks are transferred to the remaining VMs using the fair distribution scheme. However, such situations cause larger total completion times, thus larger makespans, and degrade the overall performance.

STAVROS SOURAVLAS is an Assistant Professor of Computer Architecture at the Department of Applied Informatics, School of Information Sciences, University of Macedonia, which he joined in 2014. His research interests include scheduling of parallel and distributed computing systems, big data stream scheduling, cloud computing, systems modeling and simulation. He has published more than 40 papers in peer-reviewed journals and conferences, including 6 papers in IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Computational Social Systems, and IEEE Access. He is a member of the IEEE.