Power- and QoS-Aware Job Assignment With Dynamic Speed Scaling for Cloud Data Center Computing

As mission-critical mobile applications such as smart manufacturing and self-driving cars proliferate in Industry 4.0, the cloud computing paradigm and its supporting data centers have become more crucial. However, common practice in the cloud data center industry is to supply a surfeit of computing resources, mainly to guarantee a robust quality-of-service (QoS). In this paper, we propose a simple real-time algorithm that combines a power-aware job assignment policy for a centralized job dispatcher with a power- and QoS-aware dynamic speed scaling policy for each physical machine (PM). The job assignment policy, called "Join the Least Power Consuming (LPC) Server," routes an incoming cloud job to the server consuming the least power at the moment of the request. The server-side adaptive speed scaling policy improves energy efficiency while satisfying a response-time QoS condition. We call this policy "Minimizing Earliness (ME)" because it steers the server speed toward finishing jobs as close to their deadlines as possible, reducing the earliness of job completion. The design principle of the LPC-ME combination supports both the energy efficiency and the service quality required in cloud data centers. Numerical experiments compare the proposed algorithm's power consumption and response time with those of existing popular policies and demonstrate better energy efficiency with negligible degradation of service quality.


I. INTRODUCTION
Energy efficiency in data centers has become an essential issue as most computing today happens within cloud data centers, which consume a tremendous amount of electric energy [1]. In terms of environmental impact, data centers consume over 3% of global electricity and produce over 2% of total global greenhouse gas emissions [2]. Although the harmful effects of large-scale data centers have been pointed out for the last decade, typical resource utilization is still known to be 30% or less [3]; there remains huge room for reducing energy consumption.
The associate editor coordinating the review of this manuscript and approving it for publication was Muhammad Zakarya.
For those who manage large-scale data centers, proper job assignment for load balancing is a fundamental problem because it is directly related to service quality [4]-[6]. There are well-known job assignment policies based on different types of congestion information, such as round-robin (no information), shortest queue length (number of jobs), and least workload (load estimation). They are simple, easy to implement, and effective in terms of load balancing [1], [6]-[8].
Inside data centers, servers are considered the main energy-consuming equipment [9]-[11]. Servers are reported to consume about 80% of the total energy, while networking and storage devices account for the remaining 20% [10]. Typically, the CPU has been the most significant contributor to power consumption in servers, as depicted in Fig. 1 (breakdown of power consumption in a server [10]). As such, researchers in various fields have been developing novel technologies for more efficient CPU usage, for example, dynamic voltage/frequency scaling (DVFS), wake-on-LAN (WoL), and virtualization.
Alongside various energy-saving technologies, recent studies have focused on achieving two conflicting objectives: energy efficiency and quality-of-service (QoS) (see [5], [9], [12]-[16] and references therein). Most of them require solving complicated optimization problems whose solutions may not be easy to interpret and implement. To this end, this paper focuses on developing a simple algorithm that explicitly considers servers' power usage behavior and the time-sharing concept under a response-time-related QoS condition, one of the most important performance metrics in service level agreements (SLAs).
Our work contributes to the green cloud computing literature by proposing a simple but powerful algorithm that is easily applicable to data centers. The algorithm consists of a new job assignment policy (least-power-consuming, LPC) and a new dynamic speed scaling policy (minimizing earliness, ME) to save energy and satisfy service quality simultaneously; the joint algorithm is called LPC-ME. To the best of our knowledge, this is the first work that proposes and combines two simple power- and QoS-aware policies, one designed for the job dispatcher and the other for the physical servers. The main contributions of this paper are summarized as follows:
• We propose a power-aware job assignment policy to achieve high energy efficiency. The policy utilizes the real-time power usage information of each server.
• We suggest a power- and QoS-aware speed scaling policy that takes into account the QoS condition on job response time for physical machines (PMs) with time-sharing processors. The policy utilizes real-time information on jobs' deadlines, workloads, and the current speed.
• Combining the job assignment policy with the speed scaling policy, we design a simple power- and QoS-aware real-time algorithm called LPC-ME. The energy-saving effect of the LPC-ME combination is supported by a short proof, numerical examples, and performance evaluation.
We note that we purposely maintain a macroscopic view of the system to focus on energy conservation and service quality. In this regard, we avoid discussing details of other areas of cloud systems, such as security and network topology, that are out of the scope of this study.
The remainder of this paper is organized as follows. We provide a literature review of the existing green cloud computing studies in Section II. Section III describes the target system. Section IV states a mathematical model and formulation. In Section V, we propose our algorithm and explain the underlying design principle. Section VI introduces some numerical examples that account for the working mechanism and energy efficiency of the proposed method. Section VII provides the performance evaluation results and interpretation. Finally, Section VIII concludes and suggests possible extensions of the study.

II. RELATED WORK
Research studies have addressed energy conservation problems for computational tasks in different ways, from physical to logical and from hardware to software, depending on their interests and objectives. Researchers in computer systems have provided solutions based on DVFS, which adaptively scales the processing capacity of CPUs. On the other hand, architectural and platform-level researchers have developed finer-grained levels of cloud virtualization that significantly enhance resource utilization in data centers. In addition, many kinds of consolidation techniques have become feasible thanks to cutting-edge virtualization technologies, e.g., containerization [17].
This study belongs to the literature on how to efficiently utilize those novel technologies in the right place. A large volume of related work has been published in the past decade; the field has even produced a survey of surveys on energy efficiency in cloud-related environments [18]. As we broadly divide the decision types for green cloud computing into server consolidation and real-time operation (see Fig. 2), the surveyed literature is arranged into the following two subsections.

A. SERVER CONSOLIDATION
In cloud data centers, servers are consolidated by virtualization and containerization technologies with efficient management of virtual machines (VMs) running on PMs. Literature on this resource management problem proposes various strategies regarding VM migration, comprising VM selection (deciding which VM to migrate) and VM placement (deciding which PM hosts the selected VMs).
Ye et al. [13] investigated how to assign VMs efficiently to available PMs. The authors proposed a many-objective VM placement model considering energy consumption, resource utilization, and load balancing. The suggested method outperformed previous studies with the accompanying metaheuristic algorithm named EEKnEA (Energy-Efficient Knee Point-Driven Evolutionary Algorithm). Mustafa et al. [9] proposed joint resource allocation techniques addressing energy consumption and SLA violation. Their methods mainly focused on allocating and migrating VMs based on dynamic threshold mechanisms that exploit servers' capacity and power information. Simulation results showed that exploiting the real-time energy state and CPU capacity led to an effective resource management scheme. Remesh Babu et al. [19] proposed an SLA-aware VM allocation strategy for load balancing based on a load prediction model. The paper considered various SLA types and showed effectiveness in reducing SLA violations and balancing load. Buchbinder et al. [20] studied the VM scheduling problem in both offline and online settings. They designed novel algorithms that exploit certain predictions about the workload and showed that the extra information yields significant improvements. While most related studies pursued energy efficiency through migration approaches, Khan et al. [21] pointed out that migration itself is expensive in terms of energy consumption and performance degradation. Proposing energy- and performance-aware allocation and migration techniques that take migration cost into account, they found that not using dynamic consolidation could be more cost-efficient.
From a more macroscopic viewpoint, a body of studies focused on drawing useful managerial insights and supporting decision-making by investigating the solution structure of mathematical optimization models. Gallego Arrubla et al. [22] suggested a unified mixed integer programming (MIP) model that incorporates virtualization, workload routing, DVFS, and powering servers on/off simultaneously. Using the MIP model, they answered eight research questions on data center energy efficiency. Several heuristic algorithms were provided to solve relatively large problems. However, scalability remained an issue since the algorithms still cannot handle realistic-sized problems. Cho and Ko [23] remodeled the unified MIP model developed in [22] to revisit some doubtful research conclusions about operating practices in data centers. By fixing the original model, they drew insights different from those of the previous study. While the mathematical models in [22], [23] simplified the stochastic nature of cloud computing workloads using averaging and majorizing approaches, Kwon [16] explicitly introduced a modern uncertainty-quantifying concept to the mathematical models based on two-stage stochastic programming with chance- and risk-constrained optimization. Compared to the overly conservative solutions of the previous studies, it suggested a more reasonable server provisioning strategy that guarantees a certain level of QoS.

B. REAL-TIME OPERATION
Studies in this phase concern the operational aspects of cloud data centers, to which our paper belongs. The main concerns include appropriate job assignment (or dispatching) to servers for load balancing, demand-adaptive speed scaling of computing servers, and their joint optimization.
Research on job assignment in distributed systems started in the late 1970s. One of the most relevant studies among the oldest literature was conducted by Bonomi [24]. The author investigated the well-known Join-the-Shortest-Queue (JSQ) policy for parallel PS servers and demonstrated that it offers a good, though not necessarily optimal, solution to the load balancing problem. Both academic and industrial attention to load balancing has since moved to cloud data centers. Bose and Kumar [4] provided a survey on load balancing in terms of computational workload, whereas Zhang et al. [5] presented another survey in terms of network load balancing.
For jointly optimizing load balancing and server-side resource scheduling considering energy efficiency and service quality, Liu et al. [25] presented an integer program accompanied by a heuristic solution approach called ''Most Efficient Server First.'' While the study was meaningful in that it formally stated the problem and suggested a rough solution sketch, it has the limitation that load balancing becomes arbitrary for servers with the same efficiency (i.e., identical servers). Ko and Cho [12] proposed a new load balancing and speed scaling framework that combined a distributed optimization algorithm with modern queueing-theoretic analysis to account for the tail probability of response time. Despite its methodological novelty, technical requirements such as a priori knowledge of stationary workload processes restricted its practicality.
While the literature mentioned above mostly concentrated on practical issues in cloud data centers, another body of literature has pursued theoretically meaningful results inspired by industry problems. Wierman et al. [26] examined the fundamental energy-performance tradeoff in computer speed scaling along three metrics: optimality, fairness, and robustness. Wentao et al. [27] studied optimal load balancing for a certain type of cloud architecture that well reflects machine learning applications. Kwon and Gautam [28] and Cho and Ko [29] investigated methods to time-stabilize the performance of stochastic service systems that model cloud data centers well. Anton et al. [30] first showed that a redundancy system (e.g., MapReduce) can help improve the performance of data center computing when the servers' capacities are sufficiently heterogeneous. Recently, Harchol-Balter [1] published a seminal paper that examines open problems in queueing theory inspired by the data center computing industry. The paper presented new queueing models, workload characteristics, and performance metrics that are all helpful for improving the operations of cloud data centers.

III. PROBLEM DESCRIPTION
Fig. 3 describes a cloud data center consisting of a centralized job dispatcher and heterogeneous processor-sharing (PS) PMs (i.e., servers) having time-sharing processors. A number of user equipments (UEs) running various cloud computing applications send their requests to the data center. At the entrance, a centralized job dispatcher assigns the incoming jobs to proper servers.
We consider virtualization technology that enables a single PM to process cloud job requests from multiple application types; server 6 in Fig. 3 shows an illustrative situation in which a single server is associated with two applications (App 2 and App 3). Regarding virtualization, we emphasize that deploying an energy-efficient consolidation strategy is another important research topic in green cloud computing. Throughout this paper, however, we concentrate on a specific time window during which the association between applications and servers is fixed; we assume no VM migration within the time horizon.
For physical-level resource management of each server, we consider Dynamic Voltage/Frequency Scaling (DVFS). DVFS is an off-the-shelf technology that adapts the CPU's performance to the workload [31], motivated by the need to achieve higher utilization of computing resources. Thanks to the DVFS feature, a modern CPU can scale its processing speed up and down dynamically.
Given the system described above, two decisions must be made: (i) the job assignment policy for the centralized job dispatcher and (ii) the speed scaling policy for the servers. The ideal decision should reduce power consumption to the lowest level capable of processing all user requests while guaranteeing the desired QoS. Throughout this paper, we adopt a popular performance metric, response time, to quantify the QoS required in cloud data centers; i.e., the servers should give their best effort to finish every job within a prespecified time budget.
In the next section, we interpret the system from a queueing-theoretic viewpoint, as it provides a strong tool for describing the dynamics of such systems. This abstracts out the important characteristics of data center operating decisions.

IV. MODEL AND FORMULATION
This section formally states the problem using mathematical notations based on the queueing-theoretic interpretation.
A. NOTATION
Table 1 provides a summary of the main notations that help explain the system dynamics mathematically.

1) PREDEFINED SETS
We consider a set of applications A indexed by i (i.e., i ∈ A) that need to run on a set of servers S indexed by j (i.e., j ∈ S). The subscripted sets A_j and S_i denote the association between application types and servers. More precisely, the set A_j comprises the indices of applications that server j hosts; the jobs of applications in A_j can be dispatched to server j. Conversely, the set S_i consists of the indices of the servers hosting application i, i.e., a job of application i can be dispatched to one of the servers in S_i.
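For concreteness, the association sets can be represented as two mutually inverse index maps; the following is a small Python sketch (the instance data is hypothetical, not from the paper's tables):

```python
# Hypothetical association between applications and servers.
# A_of[j] plays the role of A_j; S_of[i] plays the role of S_i.
A = {1, 2, 3}                      # application indices i
S = {1, 2}                         # server indices j
A_of = {1: {1, 2}, 2: {2, 3}}      # server 1 hosts apps 1 and 2; server 2 hosts apps 2 and 3

# S_i is obtained by inverting A_j, so the two views stay consistent by construction.
S_of = {i: {j for j in S if i in A_of[j]} for i in A}
```

For instance, a job of application 2 may be dispatched to either server here, since S_of[2] = {1, 2}.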

2) SERVER RELATED PARAMETERS
We have deterministic parameters that specify a cloud data center. The servers are heterogeneous in terms of their power consumption behavior. More specifically, we adopt a well-defined polynomial function, convex on the positive real line, used in many previous studies [11], [26], [31]; server j has a power function p_j(µ_j) ≡ α_j + m_j µ_j^{n_j}, where µ_j is the speed of server j and α_j, m_j, n_j are predefined constants with α_j, m_j ≥ 0 and n_j ≥ 2. The instantaneous speed of each server, µ_j(t) (explained below), is assumed to be continuously controllable between lower and upper bounds: µ_j(t) ∈ [γ_j, Γ_j] for j ∈ S at any time t. Regarding QoS, let R_j(t) be the response time of a job that joins server j at time t. Since we have assumed that a user's satisfaction is attained when the response time is no longer than a constant δ_j, the system operator should try to keep R_j(t) ≤ δ_j for all j ∈ S and t > 0.
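In code, the power model and speed bounds amount to a convex polynomial plus a clipping step; a minimal sketch with illustrative constants (the values of α_j, m_j, n_j and the bounds are not taken from the paper's tables):

```python
def make_power_fn(alpha, m, n):
    """Return p(mu) = alpha + m * mu**n, the per-unit-time power at speed mu."""
    assert alpha >= 0 and m >= 0 and n >= 2
    return lambda mu: alpha + m * mu ** n

def clip_speed(mu, lower, upper):
    """Keep the controllable speed inside its feasible interval [gamma_j, Gamma_j]."""
    return min(max(mu, lower), upper)

p = make_power_fn(alpha=100.0, m=0.5, n=3)   # illustrative server parameters
```

Any speed-scaling decision discussed later is assumed to pass through such a clipping step before being applied.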

3) WORKLOAD RELATED PARAMETERS
Though most ingredients in Table 1 are straightforward, the parameters that quantify cloud workload characteristics involve some tricky concepts. As described in Section III, we consider a time-dependent arrival rate accompanied by a (nonexponential) random job size. The nonstationary non-Poisson process (NSNP) is a well-known stochastic process that captures these properties. An NSNP arrival process is defined by a time-dependent arrival rate function λ(t) and a base inter-arrival time T, a random variable with mean τ and squared coefficient of variation (SCV) C_a^2. Together with a random job size S with mean β and SCV C_s^2, the workload process w(t) can be expressed as w(t) = βλ(t).
Attaching the subscript i, the application-specific workload is expressed as w_i(t) = β_i λ_i(t). See Cho and Ko [29] for more details about the NSNP and the GI_t/GI_t/1/PS model.
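The NSNP generalizes the nonhomogeneous Poisson process (NHPP). As an illustration of simulating such a time-varying arrival process, the NHPP special case can be sampled by Lewis-Shedler thinning; the sketch below assumes a sinusoidal daily rate purely for illustration (the paper's actual generator follows [36]):

```python
import math
import random

def sample_nhpp(rate_fn, horizon, rate_max, rng):
    """Sample arrival epochs of an NHPP with rate lambda(t) on [0, horizon] by
    thinning a homogeneous Poisson process of rate rate_max >= max_t lambda(t)."""
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_max)              # candidate epoch
        if t > horizon:
            return arrivals
        if rng.random() < rate_fn(t) / rate_max:    # keep with prob lambda(t)/rate_max
            arrivals.append(t)

# Illustrative daily cycle: jobs per hour over a 24-hour horizon.
daily_rate = lambda t: 5.0 + 4.0 * math.sin(2 * math.pi * t / 24.0)
arrivals = sample_nhpp(daily_rate, horizon=24.0, rate_max=9.0, rng=random.Random(7))
```

An NSNP sampler replaces the exponential candidate gaps with a general base inter-arrival distribution having SCV C_a^2.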

4) DECISION VARIABLES
Two real-valued decision variables, r_ij(t) and µ_j(t), represent decisions on job routing and speed scaling, respectively. First, r_ij(t) determines the proportion of the workload of application i that is assigned to server j at time t, with Σ_{j∈S_i} r_ij(t) = 1 for every application i at any time t. Second, µ_j(t) determines the real-time speed of each server j at time t.

B. MATHEMATICAL FORMULATION
Using the notations explained above, we formulate an optimization problem (P1) for minimizing global power consumption in a cloud data center during a planning horizon [0, T]:

(P1)  min  ∫_0^T Σ_{j∈S} p_j(µ_j(t)) dt                          (1a)
      s.t. Σ_{j∈S_i} r_ij(t) = 1,        ∀i ∈ A, t ∈ [0, T]      (1b)
           r_ij(t) ≥ 0,                  ∀i ∈ A, j ∈ S_i, t ∈ [0, T]   (1c)
           Σ_{i∈A_j} r_ij(t) w_i(t) < µ_j(t),  ∀j ∈ S, t ∈ [0, T]      (1d)
           γ_j ≤ µ_j(t) ≤ Γ_j,           ∀j ∈ S, t ∈ [0, T]      (1e)
           R_j(t) ≤ δ_j,                 ∀j ∈ S, t ∈ [0, T]      (1f)

The objective function (1a) minimizes the total power consumption over the planning horizon. Constraints (1b) and (1c) ensure that the sum of the split workloads equals the original input workload. Constraint (1d) guarantees that no server's queue blows up during the planning horizon; this corresponds to the stability condition for a time-varying queue. Constraint (1e) restricts the range of server speeds. Lastly, the service level in terms of response time is specified by constraint (1f).

1) NEED FOR WORK-AROUND APPROACH
If we were given the optimal values r*_ij(t) and µ*_j(t), one natural way to use them to achieve asymptotically optimal energy efficiency would be the following operating scheme:
• Job assignment policy (probabilistic routing): For an arriving job of type i at time t, dispatch it to server j with probability r*_ij(t).
• Dynamic speed scaling policy (continuous-time updating): For server j, adjust the speed to µ*_j(t).
However, finding the optimal r_ij(t) and µ_j(t) in real time is impractical due to complicating factors such as the nonlinear functional objective, nonstationarity, and uncertainty; it requires solving problems similar to those in [12], [25]. Instead, we devise heuristic approaches, the least-power-consuming (LPC) job assignment policy and the minimizing-earliness (ME) speed scaling policy, jointly called LPC-ME, to find a high-quality solution for r_ij(t) and µ_j(t). In the next section, we explain the design principle of LPC-ME.
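If the optimal split r*_ij(t) were available, the probabilistic routing step above could be sketched as follows (r_star is a hypothetical stand-in for the optimal solution, assumed to return proportions summing to 1 over the associated servers):

```python
import random

def route_probabilistically(i, t, r_star, servers, rng):
    """Dispatch a type-i job arriving at time t to server j with probability
    r_star(i, j, t), using inverse-transform sampling over the server list."""
    u, acc = rng.random(), 0.0
    for j in servers:
        acc += r_star(i, j, t)
        if u < acc:
            return j
    return servers[-1]      # guard against floating-point round-off

rng = random.Random(0)
even_split = lambda i, j, t: 0.5          # hypothetical optimal proportions
choices = [route_probabilistically(1, 0.0, even_split, [8, 9], rng) for _ in range(200)]
```

With an even split, repeated draws land on both servers roughly equally, which is the asymptotic behavior the scheme relies on.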

V. ALGORITHM DESIGN
We adopt a generic power function for each server, as discussed in [11], [31]:

p(µ) = α + m µ^n,  with α, m ≥ 0 and n ≥ 2.    (2)

With this convex polynomial power function, our design principle starts from the following simple question: which is more energy-efficient, a shorter job completion time at a higher processing rate or a longer job completion time at a lower processing rate? To answer the question, we state the following proposition.
Proposition 1: For a power function as in equation (2) and a workload of size s, there exists a most energy-efficient speed µ* to finish the workload, given by µ* = [α/((n − 1)m)]^{1/n}.
Proof: Define a function f(µ; s) ≡ (s/µ)p(µ) to be the total power consumption (i.e., energy) required to finish a workload of size s at a constant speed µ. The total consumption is the power function p(µ), power consumption per unit time, multiplied by the time to finish the workload, s/µ. Then, the first derivative of the function with respect to µ is

f'(µ; s) = s(−α/µ² + (n − 1)m µ^{n−2}),

with its minimizer µ* = [α/((n − 1)m)]^{1/n}.
Proposition 1 implies two insights. First, an optimal server speed exists in terms of total power consumption for a given job size. Hence, even if the job is not urgent, running the server at speed µ* is optimal due to the energy-performance tradeoff. Second, when the server must run faster than µ* because of an imminent deadline, fully utilizing the given response time budget is more energy-efficient than finishing the job hastily.
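Proposition 1 can be checked numerically. The sketch below uses illustrative constants (not values from the paper's tables) and confirms that no speed on a fine grid beats the closed-form µ*:

```python
# Total energy to finish a workload of size s at constant speed mu: f(mu) = (s/mu) * p(mu).
alpha, m, n, s = 100.0, 0.5, 3, 30.0
p = lambda mu: alpha + m * mu ** n
f = lambda mu: (s / mu) * p(mu)

mu_star = (alpha / ((n - 1) * m)) ** (1.0 / n)          # closed form from Proposition 1
grid_best = min(f(k / 100.0) for k in range(1, 5001))   # speeds 0.01, 0.02, ..., 50.0
```

Since µ* is the exact continuous minimizer, f(µ*) lower-bounds the best value found on any speed grid.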
Our goal now is to manage the servers' speeds so that jobs just meet their deadlines while never running slower than the most energy-efficient point µ*. To explain, we introduce the following queueing-theoretic notations:
• J_j(t): index set of jobs in server j at time t (we use k to index jobs, i.e., k ∈ {1, . . . , |J_j(t)|})
• S_j^k(t): remaining workload of job k in server j at time t
• A_j^k(t): arrival time of job k in server j at time t
• Q_j(t): number of jobs in server j at time t
Note that the above quantities are stochastic processes in the sense that their values vary randomly as time evolves. What we want is to finish all jobs in a server within their due times. Each job's deadline is determined upon arrival by the server-specific completion-time deadline δ_j. We now propose the following real-time algorithm, which is the main result of this paper.
In the following two subsections, we explain the derivation of Algorithm 1. We explain ME before LPC, as this order more clearly delivers the rationale behind the idea.

Algorithm 1 Operate a Cloud Data Center by the Combination of the Following Two Policies
1) Job assignment policy: Join the least power consuming (LPC) server
• Dispatch an arriving job to the instantaneously least power consuming associated server. That is, choose a server j for a job of application i at time t such that j ← argmin_{j∈S_i} p_j(µ_j(t)).
2) Dynamic speed scaling policy: Minimize earliness (ME) of the work-in-process jobs' completions
• Update the processing rate of each server towards minimizing the jobs' earliness. That is, keep the speed µ_j(t) of each server j in the following manner:

µ_j(t) ← max{ µ_j*, Q_j(t) · max_{k∈J_j(t)} S_j^k(t) / (δ_j − (t − A_j^k(t))) },    (3)

where µ_j* is as defined in Proposition 1.
– If the set of configurable processing speeds is discrete (e.g., the P-States of modern CPUs [32]), choose the smallest configurable speed larger than the value calculated by expression (3).
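The LPC step of Algorithm 1 is a one-line argmin over the associated servers; a sketch (server parameters and current speeds are hypothetical):

```python
def lpc_dispatch(app, S_of, power_fns, speeds):
    """Join the Least Power Consuming server: among the servers associated with
    the job's application, pick j minimizing instantaneous power p_j(mu_j(t))."""
    return min(S_of[app], key=lambda j: power_fns[j](speeds[j]))

power_fns = {8: lambda mu: 500.0 + 0.4 * mu ** 3,   # hypothetical p_8
             9: lambda mu: 600.0 + 0.3 * mu ** 3}   # hypothetical p_9
speeds = {8: 30.0, 9: 40.0}                          # current speeds mu_j(t)
chosen = lpc_dispatch(4, {4: [8, 9]}, power_fns, speeds)
```

With these numbers, server 8 draws 11300 per unit time versus 19800 for server 9, so the dispatcher picks server 8.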

A. ME: SERVER'S DYNAMIC SPEED SCALING POLICY
The term earliness, by definition, means the quality of being early or earlier than expected. With this meaning in mind, consider the k-th job in server j at time t. By the response time requirement, its allowed remaining time in the server is δ_j − (t − A_j^k(t)). That means we must scale the server speed so that every job k finishes within its remaining time δ_j − (t − A_j^k(t)) to keep its response time below δ_j. Under the PS scheduling policy, all jobs in the system evenly share the processor at any given time; hence, with multiple jobs that must all meet their deadlines, server j's earliness-minimizing speed at time t must be chosen by the following expression:

µ_j(t) = Q_j(t) · max_{k∈J_j(t)} S_j^k(t) / (δ_j − (t − A_j^k(t))).    (4)

Then, expression (3) in Algorithm 1 is obtained by combining expression (4) with Proposition 1. Note that this speed scaling makes the earliness of job completions as small as possible whenever a job requires a speed higher than µ_j*.
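A minimal sketch of the ME update as we read expressions (3) and (4): under PS each of the Q_j(t) jobs receives an equal share µ/Q of the speed, so job k imposes µ ≥ Q · S_j^k(t)/(δ_j − (t − A_j^k(t))), and the server runs at the largest such requirement, floored at µ*:

```python
def me_speed(jobs, t, delta, mu_star):
    """Earliness-minimizing speed of a PS server at time t.
    jobs: list of (remaining_workload, arrival_time) pairs for in-process jobs;
    delta: per-server response time budget; mu_star: Proposition 1 speed floor."""
    q = len(jobs)
    if q == 0:
        return mu_star
    # Each job k needs mu >= q * s_k / remaining_time_k to finish by its deadline.
    needed = max(q * s_k / (delta - (t - a_k)) for s_k, a_k in jobs)
    return max(mu_star, needed)
```

For example, two jobs of remaining sizes 30 and 10, both arrived at time 0 with δ = 10, require speed max(2·30/10, 2·10/10) = 6 at t = 0.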

B. LPC: DISPATCHER'S JOB ASSIGNMENT POLICY
The LPC routing policy is motivated both by Proposition 1 and by well-known greedy heuristics such as Join-the-Shortest-Queue (JSQ) and Least-Work-Left (LWL) [7]. The underlying idea of LPC is as follows. Assuming the speed scaling of each server follows ME as in Algorithm 1, dispatching a job to a server increases the server's speed. The following remark explains this behavior.
Remark 1: For a processor-sharing server under minimizing-earliness speed scaling, receiving a job monotonically increases the server speed.
Proof: Let N_j(t) be the number of jobs being processed in server j at time t, i.e., N_j(t) ≡ |J_j(t)|. Then, the scaled speed of the server right after receiving a new job of size s can be expressed as

µ_j(t+) = max{ µ_j*, (N_j(t) + 1) · max{ max_{k∈J_j(t)} S_j^k(t) / (δ_j − (t − A_j^k(t))), s/δ_j } },    (5)

where t+ denotes the instant right after the arrival. Since N_j(t) + 1 > Q_j(t) = N_j(t) and the inner maximum dominates the per-job requirements in expression (3), expression (5) is greater than or equal to expression (3).
Recalling that the server's power function is a convex polynomial, an increase in speed also results in an increase in power consumption. Under this condition, an energy-efficient dispatcher should find a server whose power consumption will increase as little as possible upon receiving a job.
If the servers were homogeneous, dispatching a job to the server with the minimum speed would be the most energy-efficient decision. In practice, however, servers are heterogeneous in terms of their power functions. Hence, we simply choose the server with the instantaneously least power consumption, so the proposed algorithm remains a greedy heuristic.

1) HOW IS THE LOAD BALANCED THROUGH LPC-ME?
Symmetric to Remark 1, a job completion event decreases the instantaneous power consumption. By this mechanism, job arrivals and completions dynamically scale the server speed up and down. Since the LPC policy always assigns an incoming job to the server with the least power consumption, load balancing in a cloud data center is attained under the LPC-ME combination.
In the next section, we introduce numerical examples with graphical descriptions to explain how the proposed method improves energy efficiency while keeping the QoS condition and balancing loads.

VI. NUMERICAL EXAMPLE
This section provides two illustrative examples with specific numbers to help in understanding the concepts of the proposed policies. We adopt the specific values from Tables 4 and 5, which will be introduced in Section VII for performance evaluation.

A. WORKING MECHANISM
The graphics in Figs. 4 and 5 describe the working mechanism of the LPC-ME combination. We assume a situation in which two consecutive job requests of application 4 arrive at a cloud data center. Fig. 4 depicts the operations for the first job and Fig. 5 those for the second job. The left half of each figure (drawings) shows the operational description, whereas the right half shows the server-specific instantaneous power usage according to processing speed (graph) and information about the work-in-process jobs: job index, remaining workload, and arrival time (table).
The following enumeration explains the step-by-step operations of the job dispatcher and servers (refer also to the sky-blue stickers in Figs. 4 and 5).
1) Fig. 4(a): A job request of application 4 arrives at time τ = 302.35 with job size 30.
2) Fig. 4(a): The job dispatcher tries to find the least power consuming server among those associated with application 4. Since servers 8 and 9 are associated and server 8 is currently consuming less power (19779.11 per unit time) than server 9 (21092.66 per unit time), the dispatcher chooses server 8 as the least power consuming server.
3) Fig. 4(a): The dispatcher assigns the job to server 8 by the LPC policy.
4) Fig. 4(b): Since the processor is time-sharing and a new job has been added, server 8 needs to update its speed to satisfy the QoS condition for each job. Based on the remaining workload and arrival time of each work-in-process job, server 8 changes its speed to 45.16 by the ME policy in expression (3), and the power usage increases to 46404.18 per unit time accordingly. Each job is then processed at a rate of 11.29 (= 45.16/4) per unit time, and hence all jobs will be completed within their deadlines. Note that the processing speed is updated regularly as well as whenever there is a new job request or a job completion.
5) Fig. 5(a): Another job request of application 4 arrives at time τ = 302.36 with job size 30, 0.01 unit time after the previous job.
6) Fig. 5(a): The dispatcher again tries to find the least power consuming server among the associated servers. Since server 9 is consuming less power (21092.66 per unit time) than server 8 (46404.18 per unit time), the dispatcher chooses server 9 as the least power consuming server.
7) Fig. 5(a): The dispatcher assigns the job to server 9 by the LPC policy.
8) Fig. 5(b): Based on the remaining workload and arrival time of each work-in-process job, server 9 changes its speed to 58.59 per unit time by the ME policy, and its power usage increases to 40481.28 per unit time accordingly.

B. ENERGY EFFICIENCY
Fig. 6 shows a numerical example that explains why minimizing earliness is energy efficient. We assume a situation in server 9 and adopt numerical values from Table 5. The figure consists of three parts. First, the upper-left table contains illustrative job information (job size and response time budget) to be processed. Second, the upper-right and lower-left tables and the graph show the varying power consumption according to the following levels of earliness of job completion:
• Non-Power-Aware (NPA): Does not care about the earliness but gives best effort to process the job.
• 75% Early: Tries to complete a job within 75% of given response time budget.
• 50% Early: Tries to complete a job within 50% of given response time budget.
• 25% Early: Tries to complete a job within 25% of given response time budget.
• Minimizing Earliness (0% Early): Tries to complete a job exactly at the deadline.
For each level of earliness, we calculate the scaled speed and the corresponding working-time requirement (see the yellow-colored cells). We then observe that ME is the most energy-efficient of all (see the green-colored cells and the accompanying graph).
Here, we emphasize that the ME policy simply tries to fully utilize the given response time budget and hence increases response time. However, a slight response time increase that does not violate the job deadline is not critical in terms of overall service quality, which is why we argue that ME is still a reasonable speed scaling policy for the physical servers in cloud data centers.
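The qualitative effect of Fig. 6 is easy to reproduce: for a single job whose deadline forces a speed above µ*, total energy falls monotonically as completion is pushed toward the deadline. The constants below are illustrative, not the values of server 9 in Table 5:

```python
# Energy to finish one job of size s when targeting completion at fraction f of
# the response time budget delta (f = 1.0 corresponds to ME, i.e., 0% early).
alpha, m, n = 100.0, 0.5, 3
s, delta = 30.0, 2.0

def energy(f):
    mu = s / (f * delta)                     # constant speed finishing at time f * delta
    return (alpha + m * mu ** n) * (f * delta)

energies = {f: energy(f) for f in (0.25, 0.5, 0.75, 1.0)}
```

Here even f = 1.0 needs speed 15, which exceeds µ* ≈ 4.64, so the least-early schedule is the cheapest of the four.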

VII. PERFORMANCE EVALUATION
In this section, we demonstrate the performance of the proposed method on a large set of randomly generated cloud workloads and a virtual cloud data center. All environments and policies are implemented in the Julia programming language [33] and run on Windows Server (Intel(R) Core(TM) i9-11900K @ 3.50GHz, 32GB of RAM). For a fair comparison, we compare the performance with existing benchmarks consisting of well-known job assignment policies and off-the-shelf dynamic speed scaling policies.
FIGURE 6. A quantitative analysis of energy efficiency according to the earliness of job completion. The server-specific numerical values are adopted from server 9 in Table 5. In short, this figure shows that fully utilizing the given response time budget is most energy-efficient in terms of completing a job.
A. BENCHMARK
1) JOB ASSIGNMENT POLICIES
Table 2 summarizes the set of job assignment policies used for performance comparison. It includes three well-known practical policies and the proposed policy. Note that each policy utilizes different information on congestion: no information, number of jobs, workloads, and instantaneous power usage.
2) DYNAMIC SPEED SCALING POLICIES
Table 3 summarizes the set of dynamic speed scaling policies used for performance comparison. It includes three popular policies and the proposed policy. The three existing policies correspond to a Linux kernel feature called CPUFreq governors for state-of-the-art CPUs [34]. Here, we emphasize that the processing performance of a CPU can only be configured to one of the predefined P-States, e.g., P0, P1, ..., Pn [32]. Simply put, the proposed ME policy adaptively changes a processor's P-State to the one corresponding to the smallest speed larger than the value calculated by expression (3) in Algorithm V.
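The rounding from a continuous ME target speed to a discrete P-State can be sketched as follows; the speed table below is hypothetical (real tables are server-specific), and the fallback to the fastest state when the target exceeds all speeds is an assumption:

```python
import bisect

def select_p_state(speeds, target_speed):
    """Map a continuous ME target speed to a discrete P-State speed:
    choose the smallest predefined speed that is >= the target
    (falling back to the fastest state if the target exceeds all)."""
    speeds = sorted(speeds)
    i = bisect.bisect_left(speeds, target_speed)
    return speeds[min(i, len(speeds) - 1)]

# Hypothetical P-State speed table.
p_state_speeds = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
assert select_p_state(p_state_speeds, 45.16) == 50.0  # server 8's ME speed
assert select_p_state(p_state_speeds, 58.59) == 60.0  # server 9's ME speed
```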

B. SIMULATION ENVIRONMENT 1) WORKLOAD GENERATION
The pattern of cloud workloads has been shown to follow a daily cycle [35]. That is, people tend to work more during the day and less at night (for example, see Fig. 7). To mimic the real-world workload characteristics that vary periodically with burstiness, we generate NSNPs for sampling the job arrival epochs according to the simulation algorithm developed in [36]. In short, the NSNP is a generalization of the well-known non-homogeneous Poisson process (NHPP).
FIGURE 7. Real workload traces for 10 applications for one day collected from a publicly available source [22].
Appropriately choosing the input parameters of the NSNP (i.e., λ_i(t) and the distributions of the random variables T_i and S_i in Table 1) describes realistic cloud workload patterns well. Table 4 provides the explicit parameters for generating synthetic workloads from five cloud applications.
We use gradual sine functions as time-varying arrival rates and general probability distributions as job size distributions. First, the time-varying arrival rates reflect the daily cycle of workloads. Second, we mostly use the Lognormal distribution, which is heavy-tailed as well as non-exponential, for T_i and S_i, as it simulates the highly variable nature of data center computing workloads. See [1] for a detailed discussion of the workload properties. We also emphasize that we pick the function parameters to represent the elephants-and-mice effect, i.e., a few applications with high workload levels and many others with low workload levels [5].
FIGURE 8. Comparison of the total power consumption for the benchmarking policy combinations. The numerical data is available from Table 7 in Appendix.
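As an illustration of the arrival-epoch sampling, the following is a minimal sketch of NHPP generation by thinning with a sinusoidal daily-cycle rate; the rate function and its parameters are hypothetical, and the full NSNP construction of [36] layers additional non-Poisson structure on top of this:

```python
import math
import random

def nhpp_arrivals(rate, rate_max, horizon, rng):
    """Sample NHPP arrival epochs on [0, horizon) by thinning:
    generate candidate points at the constant rate rate_max and
    keep each candidate t with probability rate(t) / rate_max."""
    arrivals, t = [], 0.0
    while True:
        t += rng.expovariate(rate_max)   # candidate inter-arrival time
        if t >= horizon:
            return arrivals
        if rng.random() < rate(t) / rate_max:
            arrivals.append(t)

# Hypothetical sinusoidal rate mimicking a daily cycle (period of 24 "hours").
rate = lambda t: 5.0 + 4.0 * math.sin(2 * math.pi * t / 24.0)
arrivals = nhpp_arrivals(rate, rate_max=9.0, horizon=24.0, rng=random.Random(0))
assert arrivals == sorted(arrivals)            # epochs are increasing
assert all(0.0 <= a < 24.0 for a in arrivals)  # and lie within the horizon
```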

FIGURE 9. Comparison of the violation rates of QoS conditions for the benchmarking policy combinations. The cross dots mark outliers and the orange colored line marks the median. The numerical data is available from Table 8 in Appendix.

2) SERVER CONFIGURATION
We implement a small cloud data center comprising ten heterogeneous servers for illustrative purposes. Since the proposed algorithm does not require solving complicated optimization problems, scalability is not an issue. For example, the LPC job assignment policy can be applied by implementing a real-time power usage monitoring feature at the job dispatcher; each server periodically reports its instantaneous power usage to the job dispatcher. When it comes to a server's dynamic speed scaling, no information exchange between the servers and the dispatcher is required; each server estimates its instantaneous workload and tracks its current speed. Table 5 summarizes the configuration of the 10 servers comprising the virtual cloud data center.
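The server-local speed update mentioned above can be sketched as follows. This is one plausible ME-style rule under processor sharing, written only from the constraints stated in the paper, not necessarily the paper's expression (3); the job values are hypothetical:

```python
# Under processor sharing (PS), n jobs share speed s equally, so job j
# must satisfy remaining_j / (s / n) <= deadline_j - now. The smallest
# feasible speed is therefore n * max_j remaining_j / (deadline_j - now).
def me_speed(jobs, now):
    """Minimum PS speed finishing every work-in-process job by its deadline."""
    n = len(jobs)
    return n * max(remaining / (deadline - now) for remaining, deadline in jobs)

# Hypothetical work-in-process jobs: (remaining size, absolute deadline).
jobs = [(30.0, 310.0), (20.0, 308.0), (10.0, 305.0)]
speed = me_speed(jobs, now=302.35)
# At the PS rate speed / n, every job finishes by its deadline.
assert all(r / (speed / len(jobs)) <= d - 302.35 + 1e-9 for r, d in jobs)
```

Note that only local information (remaining workloads, deadlines, current time) appears here, consistent with the claim that speed scaling needs no dispatcher interaction.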

C. RESULTS AND DISCUSSION
We run 10,000 independent replications of a 2,000 unit time simulation for each benchmarking policy combination enumerated in Table 6 to obtain samples large enough to prevent misinterpretation due to outliers. We use 0.01 unit time for all the servers' regular speed updating intervals. Figs. 8-10 and Tables 7, 8 (in Appendix) provide graphical and statistical summaries of the simulation results obtained from the 10,000 replications. As shown in the overall plots, frequent outliers (cross dots in Figs. 8, 9 and spikes in Fig. 10) are observed because we simulated quite variable and heavy-tailed workloads, as discussed in Section VII-B1. In the following three subsections, we discuss the energy efficiency, service quality, and load balancing effect of the proposed method.

1) ENERGY EFFICIENCY
Energy efficiency is the primary concern of this research. Accordingly, the box plot in Fig. 8 summarizes the total power consumption data obtained from the 10,000 replications of the simulation experiment for each benchmark. As expected, the benchmarks with the NPA speed scaling policy (i.e., RR-NPA, JSQ-NPA, LWL-NPA) consistently show much higher power consumption than those with power-aware speed scaling policies. On the other hand, the policy combinations with power-aware speed scaling policies (i.e., RR-PA1, RR-PA2, JSQ-PA1, JSQ-PA2, LWL-PA1, LWL-PA2, and LPC-ME) show dramatic improvements in energy efficiency. Notably, we found that the proposed LPC-ME combination on average requires less than 10% of the power consumed by the other existing power-aware benchmarks (see Table 7 in Appendix). Of course, we need to consider the energy-performance tradeoff, which the next subsection discusses.

2) SERVICE QUALITY
Service quality is another main concern of this research as well as the highest priority of cloud data center computing. As such, Fig. 9 depicts the violation rates of the predefined QoS conditions, i.e., the ratio of delayed jobs to all completed jobs. As shown in the figure, the benchmarks with the LWL job assignment policy (i.e., LWL-NPA, LWL-PA1, LWL-PA2) outperform the other benchmarks, including the proposed LPC-ME combination. However, LPC-ME also demonstrates well-managed service quality, showing violation rates of less than 0.001% on average and only 0.037% in the worst case (see Table 8 in Appendix). In fact, this performance degradation was anticipated, since the proposed ME speed scaling policy manages the processing speed tightly in order to fully utilize the given response time budget (recall the last paragraph in Section VI-B). Given the significant improvement in energy efficiency shown in Section VII-C1, we conclude that the slight degradation of service quality is not critical; the algorithm provides the contracted service level on the response time with probability higher than 99.999% in most cases.

3) LOAD BALANCING EFFECT
Figs. 10(a)-10(c) portray the real-time server-specific status (i.e., workload, number of jobs, and the servers' P-States) logged in a single replication of the experiments under the LPC-ME algorithm. Naturally, all three measures tend to follow the changing trends of the applications' loads. We find that the servers associated with the same applications (e.g., servers 1, 2 and servers 8, 9) show similar levels of workload and number of jobs, which verifies the load balancing effect of the proposed algorithm. In addition, Fig. 10(c) demonstrates the effectiveness of the LPC-ME combination in scaling the servers' speeds in reaction to the changes of loads.

VIII. CONCLUSION AND FUTURE WORK
This study proposes a simple but effective real-time algorithm for cloud data center computing to achieve better energy efficiency. We interpret the target system as a parallel network of heterogeneous single-server PS queues with multiple types of applications, time-varying job request processes, and controllable processing rates. We develop two policies involving LPC job assignment and ME speed scaling motivated by the convexity of processor's power usage with respect to speed. Numerical simulations have demonstrated that LPC-ME combination consistently outperforms other combinations of existing popular policies in terms of energy efficiency without significant QoS degradation.
We suggest several future research directions based on issues not fully covered in this paper. First, we did not directly solve the formulated problem since, to the best of our knowledge, no exact solution approach is available for such a problem. If an optimal solution can be found in some way, we can evaluate the performance gap between our heuristic algorithm and the optimum. Second, incorporating the neglected lower-level computer system details (e.g., overheads between the executions of decisions) can give additional insight into cloud data center computing. Third, we omitted the networking aspect of cloud data centers to concentrate on the computing aspect. We believe that jointly considering the two main aspects of cloud data centers (i.e., computing and networking) will have a significant synergy effect in terms of end-to-end performance. Last but not least, implementation on real-world cloud data centers should reveal issues not studied in this paper.

APPENDIX STATISTICAL SUMMARY OF PERFORMANCE RESULTS
See Figs. 8-10, and Tables 7 and 8.