Introduction
Embedded real-time systems have become prevalent thanks to the technological advances of the last several decades. Since their power supplies, such as batteries, are limited, efficiently managing their energy consumption has become critical. Two schemes are commonly used for energy management: dynamic voltage/frequency scaling (DVFS) and dynamic power management (DPM). DVFS adjusts the supply voltage of processor units to reduce their dynamic energy consumption; dynamic energy is consumed when tasks running on active processors charge or discharge the capacitance of the complementary metal-oxide semiconductor (CMOS) transistors. DPM, on the other hand, puts processors into low-power states. Because the CMOS transistors remain powered even on inactive processors, some amount of leakage power is always dissipated. To reduce this leakage power consumption (or static power consumption), DPM transitions processors into low-power states by disabling some of the processor parts.
In the past, the dynamic energy consumption was dominant over the static power consumption, so DVFS was the primary means of saving energy. As semiconductor technology has scaled down, however, the static power consumption has grown to a level comparable to, or even greater than, the dynamic power consumption [5], [14], which makes the management of static energy increasingly important.
To reduce the static power consumption, several DPM-based algorithms have been introduced [5], [6], [15]–[18]. They switch a processor into one of the low-power states when it is not in use. It is known that a deeper low-power state consumes less power than a shallower one. Therefore, when a processor is idling, greater energy savings can be achieved by keeping it in a deeper low-power state for as long as possible. This implies that the processor idle time should be collected as much as possible so that the processors remain in one of the low-power states during the collected idle interval. However, the transitions between different states are not free, since waking a processor up from a low-power state requires time and energy. As stated in [32], the challenge is to predict the length of the idle time with high accuracy in the presence of multiple low-power states so that the most energy-efficient low-power state can be selected for the idle processor. The length of the collected idle time therefore has to be estimated accurately: if the length is underestimated, a less efficient state is chosen and the energy savings shrink; conversely, if it is overestimated, task deadlines can be missed.
DPM-based algorithms can, for example, determine the minimum number of active processors required to schedule a set of tasks and shut down the unused processors. This is extremely energy-efficient because it disconnects the supply voltage from the unused processors, which reduces both their dynamic and static energy consumption to zero. To use this processor-shutdown technique, the scheduling algorithm must be able to compute the minimum number of required processors.
In this paper, we propose a scheduling algorithm that focuses on three issues regarding static energy savings in real-time scheduling with the DPM scheme:
clustering the idle times as long as possible,
estimating the clustered idle time accurately, and
meeting all of the implicit-deadlines of periodic tasks, i.e., achieving RT-optimality.
Definition 1:
(RT-optimal) An optimal real-time schedule meets all of the task deadlines whenever the total utilization demand of the given task set does not exceed the total capacity of the platform, i.e., $\sum_{\forall i}{C_{i}/T_{i}} \leq M$.
Contributions: In this study, we focus on clustering the idle time distributed over all the processors in order to make a processor stay in a low-power state as long as possible without jeopardizing the real-time task scheduling. The contributions of this study are summarized as follows.
We propose a flow network-based scheduling algorithm, flow network-based DPM (fnDPM), for executing periodic real-time tasks on homogeneous multiprocessors. fnDPM relies on DPM to save static energy. Since there is a trade-off between energy efficiency and time complexity, we propose two variants: fnDPM with fine-grained windows (fnDPM-fw) and fnDPM with coarse-grained windows (fnDPM-cw). fnDPM-fw uses fine-grained time windows to generate longer idle times than fnDPM-cw, at the cost of a higher computational complexity. Conversely, fnDPM-cw uses coarse-grained time windows and focuses more on reducing the time complexity than fnDPM-fw does.
Experimental evaluations of the proposed algorithms were conducted in comparison with the latest existing algorithm, LPDPM, in terms of both the static energy consumption and the state-transition overheads. LPDPM was selected as the counterpart because it supports RT-optimality while considering multiple low-power states, as summarized in Table 1. The experimental results show that the proposed online scheduling algorithms, the fnDPMs, save static energy comparable to that of the offline scheduling algorithm LPDPM. In particular, when the processor-utilization demand of the given task set is low or tasks complete earlier than expected, both fnDPMs save more static energy than LPDPM.
Organization: The remainder of this paper is organized as follows. Section II introduces the related work. Section III defines our models. Sections IV and V describe the flow network problem, the procedures of the fnDPM algorithms, and the complexity of the algorithms. Section VI presents the experimental evaluations and their results. Sections VII and VIII present the future work and discussion, and the conclusion, respectively.
Related Works
Langen and Juurlink [8] showed that when the static power dissipation becomes more significant, employing the maximum number of processors to maximize the amount of slack that can be used to lower the supply voltage is no longer beneficial from an energy perspective. Therefore, they proposed a leakage-aware multiprocessor scheduling algorithm (LAMPS) for determining the minimum number of processors that consumes the minimum energy for a set of periodic tasks. Their algorithm sets upper and lower bounds on the number of processors and applies a binary-search approach to find the minimum number of processors. However, they applied neither the shutdown of processors nor the procrastination of job execution for idle processors.
Chen et al. [9] proposed an energy-efficient scheduling algorithm for periodic real-time tasks in a homogeneous multiprocessor environment in which the leakage current is non-negligible. They sorted all tasks in a non-increasing order of their workloads according to the largest task first (LTF) strategy and then assigned the sorted tasks to processors according to the first fit (FF) strategy to minimize the number of processors in use. In addition, they applied an online procrastination algorithm that turns a processor into the dormant mode (shutdown mode) by delaying the arrival time of the next job in order to reduce the static energy consumption. However, they did not consider the possible overhead, assuming that a processor in the dormant mode consumes zero power and that waking processors up from the dormant mode takes no time.
Awan and Petters [16] proposed an approach, called enhanced race-to-halt (ERTH), to save energy for a single processor supporting multiple low-power states. Their technique uses an offline analysis to compute the break-even time for each mode, where the break-even time is the minimum time interval required for a processor to stay in the low-power state in order to save energy. In addition, they accumulated the additional slack time generated by early completion of high-priority tasks to save extra static energy by allowing the processor to stay in the low-power mode for a longer time.
Bhatti et al. [15] presented a DPM strategy for real-time multiprocessor systems called assertive dynamic power management (AsDPM). AsDPM determines the number of active processors needed to satisfy the execution requirement of the released tasks at runtime. Then, under the Global-EDF or Global-LLF scheduling policy, AsDPM extracts and clusters the idle times that are distributed across the processors. However, neither Global-EDF nor Global-LLF is RT-optimal, which makes AsDPM non-optimal as well.
Awan [21] proposed a leakage-aware energy management algorithm, which is called global scheduler and power management (GPM), for a system using the Global-EDF scheduler on a homogeneous multicore platform. In order to save the static energy consumption, they utilize two types of slack time that are defined as usable execution slack and usable idle slack, which are obtained from the early completion of jobs and the residual capacity from a non-fully loaded platform, respectively. The GPM exploits these usable slack types to either put a core into a sleep state, or to prolong the sleep interval of the cores. This algorithm was evaluated on a simulator modeled after a Freescale PowerQUICC III-based multicore platform with several low-power states.
Legout et al. [17] proposed an offline approach called linear programming dynamic power management (LPDPM). LPDPM generates a feasible schedule on multiprocessors by solving a linear programming problem formulated with several constraints and with the minimization of the static energy consumption as the objective function. It encourages neighboring idle times to be merged as much as possible while considering the characteristics of the low-power states. The formulation takes all of the possible scheduling events into account by considering the hyper-period of the given tasks, where the hyper-period is defined as the least common multiple of all of the periods of the given tasks. After this offline procedure, LPDPM runs the real-time tasks in each time interval between two adjacent scheduling invocations using fixed-priority until zero laxity (FPZL) [19]. FPZL helps to collect the idle times that are generated when a task completes earlier than expected. LPDPM is RT-optimal since its schedule is computed from the linear programming problem formulated over the hyper-period. However, the offline procedure of LPDPM is not applicable to dynamic scenarios in which the given task set changes at runtime, e.g., when a new task is added dynamically. Furthermore, when the specification of the low-power states of a system changes, the offline LP formulation of LPDPM must be recomputed even if the given task set remains the same.
Nair et al. [18] proposed an energy-efficient proportional fair scheduling algorithm for reducing the static energy consumption on multiprocessors, called early-release fair scheduler with suspension on multiprocessors (ESSM). ESSM is based on the early-release fair (ERfair) scheduling algorithm, a variant of the proportional fair (Pfair) scheduling approach [20], the first RT-optimal scheduling algorithm for multiprocessors. ESSM uses a procrastination scheme that postpones the execution of tasks to keep a processor in a low-power state and thereby maximize the duration of the idle time. However, since ESSM obtains the slack time from early-released tasks for the procrastination scheme only within adjacent scheduling points, called quanta, this algorithm maximizes the idle time only locally while keeping fairness. In addition, ESSM assumes a single low-power state, and its extension to multiple low-power states has not been introduced yet.
System Model
A. Task Model
A set of $N$ periodic real-time tasks with implicit deadlines is considered, where each task $\tau_{i}$ is characterized by its worst-case execution time (WCET) $C_{i}$ and its period $T_{i}$, and its utilization is defined as $C_{i}/T_{i}$.
A set of jobs is released by each task; at time $t$, the active job of $\tau_{i}$ is described by its arrival time $a_{i}(t)$, its absolute deadline $d_{i}(t)$, and its remaining execution time $c_{i}(t)$.
B. Processor Model
A system with symmetric and homogeneous multiple processors was assumed for this study. Accordingly, a set of $M$ identical processors is considered, where each processor supports multiple low-power states. Each low-power state is characterized by the following parameters:
The power consumption in the $p$-th low-power state is denoted by $PC_{p}$.
The time needed to wake up from the $p$-th low-power state is denoted by $WT_{p}$.
The power overhead for the wake-up from the $p$-th low-power state is denoted by $PP_{p}$.
The break-even time of the $p$-th low-power state is denoted by $BET_{p}$ according to [16].
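For illustration, the sketch below shows one common first-order way to compute a break-even time from these parameters: the shortest idle interval for which entering the $p$-th state pays off despite the wake-up overhead, and no shorter than the wake-up time itself. This is not the paper's procedure and all numeric values are placeholders; the experiments in Section VI simply set $BET_{p}$ equal to $WT_{p}$.

# Illustrative break-even time computation (an assumption, not the paper's code).
# pc_idle: idle-state power of an active processor; pc_p, wt_p, pp_p: PC_p, WT_p, PP_p.
def break_even_time(pc_idle, pc_p, wt_p, pp_p):
    transition_energy = pp_p * wt_p            # wake-up power penalty applied for WT_p
    saving_per_time_unit = pc_idle - pc_p      # power saved while staying in state p
    return max(wt_p, transition_energy / saving_per_time_unit)

print(break_even_time(pc_idle=10.0, pc_p=2.0, wt_p=0.2, pp_p=30.0))  # -> 0.75 time units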
Energy-Aware Real-Time Scheduler
A. Preliminary
To achieve RT-optimality, several classes of real-time task scheduling algorithms on multiprocessors have been developed for periodic implicit-deadline tasks. One of the well-known classes is the class of fair scheduling algorithms built on the fluid schedule notion, such as Pfair [20] and DP-Fair [33], which keep the execution of every task close to its ideal fluid rate at the cost of frequent scheduling invocations.
To achieve our scheduling objective, we transform the real-time scheduling problem into a network-flow problem. Formulating real-time scheduling as a network flow or linear programming problem is not new [17], [30]. However, since the previous works formulated the problem over a very long time interval, from 0 to the hyper-period of the given task set, they have been used as offline scheduling techniques. In contrast, our formulation considers only the jobs that are active at the current boundary, which enables us to design online scheduling algorithms. In addition, the algorithms are no longer restricted by the fluid schedule notion, which means that they become unfair-but-optimal scheduling algorithms with DPM. In the next subsection, our problem formulation is explained; the proof of RT-optimality is described in detail in [34].
B. Problem Formulation
At every boundary, the scheduling algorithm is invoked to reserve the execution time for all active jobs. We formulate this reservation as an optimization problem. For convenience of description, two sets are defined for the current time interval as follows:\begin{align*} \boldsymbol {K}(t_{s},t_{e})=&\{ k | W_{k} \subset [t_{s}, t_{e}] \}, \tag{1}\\ \boldsymbol {J}(k)=&\{ i | W_{k} \subset [a_{i}(t), d_{i}(t)] \}.\tag{2}\end{align*}
Using these two sets, a flow network problem for scheduling real-time tasks is formulated as follows.\begin{align*}&{{Maximize} }~\sum _{\forall i}\sum _{\forall k}{X_{i,k}} \tag{3}\\&{{s.t.} } ~\sum _{\forall k \in \boldsymbol {K}(t,d_{i}(t))}{X_{i,k}} \leq c_{i}(t), \quad 1 \leq \forall i \leq N \tag{4}\\&\hphantom {{{s.t.} } ~} \sum _{\forall i \in \boldsymbol {J}(k)}{X_{i,k}} \leq Cap(W_{k}),\quad 1 \leq \forall k \leq K \tag{5}\\&\hphantom {{{s.t.} } ~} X_{i,k} \leq l_{k}, \quad 1 \leq \forall i \leq N ~\text {and } 1 \leq \forall k \leq K.\tag{6}\end{align*}
In Equation 5, the capacity of a window is defined as \begin{equation*} Cap(W_{k})=\left [{ M-\sum _{ \forall i \notin \boldsymbol {J}(k)}{C_{i}/T_{i}}}\right] \times l_{k}.\tag{7}\end{equation*}
Here, the active job area (AJA) at time $t$ denotes the time interval from the current boundary $t$ to the latest deadline among the active jobs. This interval is partitioned by the boundaries (job arrivals and deadlines) that fall within it into $K$ windows $W_{1},\ldots,W_{K}$, where $l_{k}$ denotes the length of window $W_{k}$.
The formulation based on Equations 3–6 constructs a flow network that is represented by a directed and capacitated graph $\mathbf{G}=(\mathbf{V},\mathbf{E})$ with \begin{align*} \mathbf {V}=&\{n_{s},n_{e}\}\cup \{\tau _{i}|\forall _{i}\}\cup \{W_{k}|\forall _{k}\} \tag{8}\\ \mathbf {E}=&\{e(n_{s},\tau _{i})|\forall _{i}\}\cup \{e(\tau _{i},W_{k})|\forall _{i,k}\}\cup \{e(W_{k},n_{e})|\forall _{k}\}\tag{9}\end{align*}
At each boundary, a maximum flow over this network is computed; the amount of flow on the edge $e(\tau_{i},W_{k})$, i.e., $X_{i,k}$, is the execution time reserved for task $\tau_{i}$ within window $W_{k}$, and the reservation is feasible when the maximum flow equals $\sum_{\forall i}{c_{i}(t)}$.
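To make the construction concrete, the following Python sketch builds the graph of Equations (8)–(9) with the capacities of Equations (4)–(7) and checks feasibility with an off-the-shelf maximum-flow solver. The task and window values are hypothetical, integer time units are used, and networkx merely stands in for the solvers of [26]–[28].

# A minimal sketch of the formulation in Eqs. (3)-(9): source -> tasks -> windows -> sink.
import networkx as nx

M = 2                                            # number of processors
t = 0                                            # current boundary
# Active jobs: remaining execution c_i(t) and absolute deadline d_i(t).
jobs = {"tau1": {"c": 3, "d": 5}, "tau2": {"c": 4, "d": 10}, "tau3": {"c": 6, "d": 10}}
windows = {"W1": (0, 5), "W2": (5, 10)}          # windows W_k of the current AJA
# Sum of C_i/T_i over tasks not in J(k) (Eq. 7); tau1 (utilization 0.6) is outside J(W2).
util_outside = {"W1": 0.0, "W2": 0.6}

def cap(w):                                      # Cap(W_k) of Eq. (7)
    start, end = windows[w]
    return (M - util_outside[w]) * (end - start)

G = nx.DiGraph()
for name, job in jobs.items():
    G.add_edge("src", name, capacity=job["c"])   # constraint (4)
    for w, (start, end) in windows.items():
        if t <= start and end <= job["d"]:       # W_k lies within [a_i(t), d_i(t)]
            G.add_edge(name, w, capacity=end - start)   # constraint (6): X_{i,k} <= l_k
for w in windows:
    G.add_edge(w, "snk", capacity=cap(w))        # constraint (5)

max_flow, X = nx.maximum_flow(G, "src", "snk")
# Feasible at this boundary if every job gets its whole remaining execution reserved.
print("feasible:", max_flow == sum(job["c"] for job in jobs.values()))
print({(name, w): X[name][w] for name in jobs for w in X[name]})  # reserved X_{i,k}

Only the amounts reserved in the first window are dispatched before the next boundary; the remainder is recomputed at each subsequent boundary, as described in Section V.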
Flow Network-Based Dynamic Power Management Algorithm
To save more static energy using DPM, a processor needs to stay in a deeper low-power state for as long as possible. At the same time, the length of the clustered idle time must be estimated precisely, with the transition overhead taken into account, to avoid missing the deadlines of the real-time tasks. Additionally, the proposed algorithm should be computationally tractable so that it can be used online. To achieve these goals, the formulation described in Section IV-B is used.
To represent the idle time in the flow network, a virtual task called an idle task is added. Since the total capacity of the idle time is the residual capacity of the platform, i.e., the capacity that remains after the demand of the real-time tasks has been reserved, the idle task is given this residual capacity and competes for the windows in the same way as an ordinary task.
A. Flow Control Over the Flow Network
The objective of a maximum flow algorithm is to send the maximal flow from the source to the sink. In general, multiple maximum flows that achieve this goal may exist, which implies that the formulated problem can have multiple solutions. A maximum flow algorithm arbitrarily finds one of the feasible solutions. However, to generate a long idle time, it is necessary to collect the flow of the idle task across the boundaries while maintaining feasibility. This implies that we should be able to prioritize certain flows over the network, which is not possible with a plain maximum flow algorithm. Therefore, to control the flow, a cost $w$ is assigned to every edge and the minimum-cost maximum-flow problem is solved instead.
As shown in Figure 2, the flow for the idle task is controlled using costs \begin{equation*} w(\tau _{Idle},W_{k})=K-k+1,\quad \forall k. \tag{10}\end{equation*}
The costs of all of the other edges are set to 1. Then, the flow of the idle task avoids the higher-cost edges as long as this does not reduce the predefined amount of the maximum flow. In other words, the idle time is clustered at the end of the current AJA's time interval as much as possible without missing any deadlines. This helps the later accumulation of the idle time, so that the idle time at the next boundary becomes longer.
Alternatively, when the idle time needs to be clustered at the current time, i.e., the start of AJA’s time interval, the costs are assigned as follows.\begin{equation*} w(\tau _{Idle},W_{k})=k, \quad \forall k. \tag{11}\end{equation*}
The costs for all of the other edges are also set to 1.
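The following fragment sketches how the cost assignment of Equations (10)–(11) steers the idle flow when a minimum-cost maximum-flow solver is used; the single-processor instance and all values are hypothetical, and networkx again stands in for the solvers of [26]–[28].

# Cost-controlled placement of the idle flow (hypothetical single-processor instance).
import networkx as nx

windows = {"W1": (0, 5), "W2": (5, 10)}          # windows W_k with lengths l_k = 5
K = len(windows)

def idle_cost(k, cluster_backward):
    # Eq. (10): K - k + 1 pushes the idle flow toward the last windows (ClusterBackward);
    # Eq. (11): k pulls it toward the first window (ClusterForward).
    return K - k + 1 if cluster_backward else k

G = nx.DiGraph()
G.add_edge("src", "tau1", capacity=3, weight=1)  # real-time task with c_1(t) = 3
G.add_edge("src", "idle", capacity=7, weight=1)  # virtual idle task: residual capacity
for k, (w, (start, end)) in enumerate(windows.items(), start=1):
    G.add_edge("tau1", w, capacity=end - start, weight=1)
    G.add_edge("idle", w, capacity=end - start, weight=idle_cost(k, cluster_backward=True))
    G.add_edge(w, "snk", capacity=end - start, weight=1)   # Cap(W_k) for one processor

X = nx.max_flow_min_cost(G, "src", "snk")        # min-cost flow among all maximum flows
print(X["idle"])   # e.g. {'W1': 2, 'W2': 5}: the idle time is pushed toward the end of the AJA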
B. Clustering of Idle Times
Using Equation 10 or 11, the idle time can be clustered either close to the end of the current AJA or around the current time. Switching between the two modes, ClusterBackward (which uses Equation 10) and ClusterForward (which uses Equation 11), is governed by the following rules.
Start in ClusterBackward mode.
In ClusterBackward mode, when the first window $W_{1}$ contains the idle time, switch to ClusterForward mode at the current AJA:\begin{equation*} curMode\gets \texttt {CF},\quad \text {if } X_{Idle}^{1} > 0. \tag{12}\end{equation*}
This rule covers the case in which, even though the algorithm attempts to cluster the idle time at the end of the AJA's time interval, the first window $W_{1}$ happens to contain idle time. Rather than wasting the capacity of the first window on running a piece of the idle task, it is preferable to start clustering the idle time forward from the current time.
In ClusterForward mode, when the first window $W_{1}$ contains idle time whose length is less than the length of $W_{1}$, switch to ClusterBackward mode at the next AJA:\begin{equation*} curMode\gets \texttt {CB},\quad \text {if } X_{Idle}^{1} < l_{1}. \tag{13}\end{equation*}
Since the clustered idle time ends in the middle of the first window, the idle time at the next window would be disconnected from it. Therefore, clustering the idle time close to the end of the time interval is resumed at the next AJA.
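A compact sketch of these switching rules is given below; X_idle_1 stands for $X_{Idle}^{1}$, l_1 for the length of the first window, and the timing of each switch (current versus next AJA) is only noted in comments.

# Mode-switching rules of Eqs. (12)-(13); helper names are illustrative.
CB, CF = "ClusterBackward", "ClusterForward"

def next_mode(cur_mode, X_idle_1, l_1):
    if cur_mode == CB and X_idle_1 > 0:
        return CF      # Eq. (12): idle spilled into W_1; re-cluster forward at the current AJA
    if cur_mode == CF and X_idle_1 < l_1:
        return CB      # Eq. (13): the clustered idle ends inside W_1; cluster backward at the next AJA
    return cur_mode    # otherwise keep the current mode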
C. Estimating the Length of the Clustered Idle Time
When the idle time is clustered, its length should be estimated accurately so that the system can transit into the appropriate low-power state during a well-estimated time interval. However, a direct use of the flow-network formulation with the virtual idle task may produce a wrong estimate. For example, assume that the clustered idle time appears to extend to the end of the current AJA; a future job of another task may arrive before the end of that interval and require execution there, so the actual idle time can be shorter than estimated.
This wrong estimation is caused by the fact that the simple flow network model does not consider the boundaries that will be established by inactive jobs (or future jobs) within the time interval under consideration. To estimate the length of the clustered idle time correctly, fnDPM with fine-grained windows (fnDPM-fw) additionally takes these boundaries into account, i.e., it splits the interval into fine-grained windows, at the cost of a larger flow network.
Alternatively, we also propose flow network-based DPM with coarse-grained windows (fnDPM-cw) to alleviate the time complexity. Instead of additionally considering all of the possible boundaries constructed by the inactive jobs, fnDPM-cw considers only the boundaries constructed by the break-even times. More specifically, for each low-power state $p$, fnDPM-cw adds a single boundary at $t+BET_{p}$ and checks whether the idle time can be maintained until that boundary.
D. Scheduling Algorithms
The pseudo-codes of fnDPM-fw and fnDPM-cw are presented in Algorithms 1 and 2, respectively. Both algorithms use a variable $curMode$ that holds the current clustering mode and rely on the following procedures.
constructG_IdleTask($mode$)
Input: The argument $mode$ is either ClusterForward or ClusterBackward. If the $mode$ is ClusterBackward, the costs of the edges of the idle task are assigned using Equation 10; otherwise, they are assigned using Equation 11. In addition, each $Cap(W_{k})$ is calculated using Equation (7), and this value is used for the construction of $\mathbf{G}$.
Output: The constructed graph including the virtual idle task is returned.
constructG_DecCap($time$)
Input: The $time$ is a $BET$ that is used for the creation of the coarse-grained windows. A time boundary $t+time$ is added to the set of boundaries $\mathbf{B}$. In addition, instead of incorporating the virtual idle task into the flow network, each $Cap(W_{k})$ whose boundary $b_{k}$ is no later than $t+time$ is reduced by the utilization of the virtual idle task, as follows:\begin{align*} Cap_{new}(W_{k})=Cap(W_{k})-l_{k}, \tag{14}\\ \text {where }\{k\mid b_{k}\leq t+time{, }\forall k\}. \tag{15}\end{align*}
It is assumed that the subtracted capacity will be used by the idle task. Therefore, the numbers of nodes and edges for the virtual idle task in the flow network are reduced, which alleviates the complexity of Algorithm 2.
Output: The constructed graph without the virtual idle task is returned.
mincost($\mathbf{G}$)
Input: $\mathbf{G}$ is the flow network.
Output: The set of execution times reserved for all of the real-time tasks within the first window and the set of execution times reserved for the virtual idle task within all of the windows of the current AJA are returned by solving the minimum-cost flow problem.
maxflow($\mathbf{G}$)
Input: $\mathbf{G}$ is the flow network.
Output: The set of execution times reserved for all of the real-time tasks within the first window and the total execution time reserved for all of the real-time tasks within all of the windows of the current AJA are returned by solving the maximum flow problem.
SelectPowerState($flow,index$)
Input: The $flow$ is the actual flow of the idle task. The $index$ is either $null$ or the index of a low-power state. If the $index$ is $null$, SelectPowerState() finds the lowest-power state whose $BET$ is less than $flow$. If the $index$ is not $null$, it directly selects the $index$-th low-power state.
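As an illustration, a possible shape of SelectPowerState() is sketched below. The selection rule, choosing the deepest state whose break-even time is shorter than the available idle time, follows the description above; the state names, ordering, and BET values are placeholders rather than the values of Table 3.

# Illustrative SelectPowerState(flow, index); the state table is hypothetical.
STATES = [            # (name, BET_p), ordered from the shallowest to the deepest state
    ("sleep", 2),
    ("deep-sleep", 50),
    ("deep-power-down", 400),
]

def select_power_state(flow, index=None):
    if index is not None:            # the caller has already verified this state is safe
        return STATES[index][0]
    chosen = None                    # None: no state fits, the processor stays active
    for name, bet in STATES:
        if bet < flow:               # the deepest state whose BET fits the idle time wins
            chosen = name
    return chosen

print(select_power_state(flow=60))   # -> 'deep-sleep' (its BET of 50 fits, 400 does not)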
Algorithm 1 $fnDPM$-$fw$
// Initially, $curMode \gets$ ClusterBackward
procedure Schedule($t$)
  if $curMode =$ ClusterBackward then
    $\mathbf{G} \gets$ constructG_IdleTask(ClusterBackward)
    $(\{X_{i,1}\}, \{X_{Idle}^{k}\}) \gets$ mincost($\mathbf{G}$)
    if $X_{Idle}^{1} > 0$ then
      $curMode \gets$ ClusterForward
    end if
  end if
  if $curMode =$ ClusterForward then
    $\mathbf{G} \gets$ constructG_IdleTask(ClusterForward)
    $(\{X_{i,1}\}, \{X_{Idle}^{k}\}) \gets$ mincost($\mathbf{G}$)
    SelectPowerState(the length of the clustered idle time, $null$)
  end if
  if $X_{Idle}^{1} < l_{1}$ then
    $curMode \gets$ ClusterBackward
  else
    $curMode \gets$ ClusterForward
  end if
  return $\{X_{i,1}\}$
end procedure
Algorithm 2 $fnDPM$-$cw$
// Initially, $curMode \gets$ ClusterBackward
procedure Schedule($t$)
  if $curMode =$ ClusterBackward then
    $\mathbf{G} \gets$ constructG_IdleTask(ClusterBackward)
    $(\{X_{i,1}\}, \{X_{Idle}^{k}\}) \gets$ mincost($\mathbf{G}$)
    if $X_{Idle}^{1} > 0$ then
      $curMode \gets$ ClusterForward
    end if
  end if
  for each low-power state $p$, from the deepest to the shallowest do
    $\mathbf{G} \gets$ constructG_DecCap($BET_{p}$)
    $(\{X_{i,1}\}, F) \gets$ maxflow($\mathbf{G}$)
    if $F = \sum_{\forall i}{c_{i}(t)}$ then
      SelectPowerState($null$, $p$)
      Go to the update of $curMode$ below.
    else
      Try the next shallower low-power state.
    end if
  end for
  if $X_{Idle}^{1} < l_{1}$ then
    $curMode \gets$ ClusterBackward
  else
    $curMode \gets$ ClusterForward
  end if
  return $\{X_{i,1}\}$
end procedure
Using the set of procedures above, fnDPM-fw and fnDPM-cw run differently depending on the current clustering mode, $curMode$, as follows.
In ClusterBackward mode, fnDPM-fw attempts to find a feasible solution in which the clustered idle time lies close to the end of the AJA's time interval, in lines 3–9 of Algorithm 1. Using constructG_IdleTask() in line 4, fnDPM-fw constructs the flow network $\mathbf{G}$ and assigns the costs to the edges between the idle-task node and the window nodes using Equation 10. Then, it finds the solution using mincost() in line 5. This solution is used for the scheduling if the first window $W_{1}$ does not contain the idle time, as checked in line 6. This means that fnDPM-fw has successfully clustered the idle time at the end of the AJA's time interval; this forces the $curMode$ to retain ClusterBackward so that the idle time is clustered at the end of the time interval again at the next boundary. This procedure is shown in lines 13–14. Otherwise, fnDPM-fw prepares the clustering of the idle time by setting the $curMode$ to ClusterForward. fnDPM-cw performs the same procedure in lines 3–8 of Algorithm 2.
In ClusterForward mode, fnDPM-fw newly constructs $\mathbf{G}$ and assigns the costs to the edges of the idle task by Equation 11 in line 10 of Algorithm 1. Then, it finds a solution using mincost() in line 11. In this mode, the algorithm also determines which low-power state is available from the length of the clustered idle time and selects that state through SelectPowerState() in line 12. Then, fnDPM-fw sets the $curMode$ to ClusterBackward for the next boundary if the length of the idle time is less than $l_{1}$, thereby merging the current idle time with the previously clustered idle time. If the length of the idle time is equal to $l_{1}$, fnDPM-fw sets the $curMode$ to ClusterForward; as a result, the current idle time is merged with the previous idle time. These procedures are shown in lines 13–16.
fnDPM-cw verifies which low-power state it can transit into in lines 9–17 of Algorithm 2. First, fnDPM-cw constructs $\mathbf{G}$ without the virtual idle task, using the $BET$ of a low-power state, through constructG_DecCap() in line 10. Then, it finds the solution using maxflow() in line 11. When the maximum flow is achieved in the given problem, the left-hand side of Equation 4 equals $c_{i}(t)$ for every task; thus, the feasibility of the solution can be verified in line 12. If the solution is feasible, fnDPM-cw selects the low-power state corresponding to the $BET$ by calling SelectPowerState() in line 13. Otherwise, it iterates the procedures in lines 10–11 with the next shallower low-power state. The updating of the $curMode$ in lines 18–21 of Algorithm 2 is performed similarly to that of fnDPM-fw in ClusterForward mode.
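The state-verification loop of fnDPM-cw described above can be summarized by the following sketch, in which construct_g_dec_cap(), maxflow(), and select_power_state() are assumed helpers that mirror the procedures listed earlier in this section; it is a schematic outline, not the authors' implementation.

# Schematic outline of the fnDPM-cw state check (not the authors' implementation).
def verify_low_power_state(t, jobs, states_deepest_first,
                           construct_g_dec_cap, maxflow, select_power_state):
    demand = sum(job["c"] for job in jobs.values())   # total remaining execution, Eq. (4)
    for index, bet in states_deepest_first:           # try the deepest state first
        G = construct_g_dec_cap(t, bet)               # boundary at t + BET_p, capacities per Eq. (14)
        reserved_first_window, total_flow = maxflow(G)
        if total_flow == demand:                      # feasible: every job fully reserved
            return select_power_state(None, index), reserved_first_window
    return None, None                                 # no state fits; the processor stays active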
To compare the proposed algorithms with LPDPM [17], the actual schedules on the four processors for the task set in Table 2 are presented in Figure 4. Each number in these figures denotes the index of a real-time task, and a separate character marks the idle time.
Actual schedules for the example in Table 2. (a) A schedule of LPDPM when the actual execution time is equal to the WCET. (b) A schedule of fnDPM when the actual execution time is equal to the WCET. (c) A schedule of LPDPM when the actual execution time is less than the WCET. (d) A schedule of fnDPM when the actual execution time is less than the WCET.
Example 1:
Figures 4 (a) and 4 (b) show the actual scheduling by LPDPM and fnDPM when the actual execution time of each task is equal to its WCET, respectively. In Figure 4(a), LPDPM first clusters the idle time around the start time
Example 2:
Figures 4 (c) and 4 (d) show the actual scheduling by LPDPM and fnDPM, respectively, when the actual execution time of each task is less than its WCET. The set of actual execution times is {2, 2, 3, 4, 13}, each of which is less than the corresponding WCET in Table 2.
In Figure 4 (c), LPDPM first schedules the real-time tasks using the offline schedule. At the time 0, LPDPM first executes
In contrast, as shown in Figure 4 (d), fnDPM continuously adds the extra idle time to the last processor. This is possible because fnDPM formulates and solves the scheduling problem in consideration of the extra idle time whenever an early completion occurs. In this example, when
E. Complexity of the Proposed Algorithm
The proposed algorithms are invoked at each boundary and, if necessary, when an early completion occurs. Both algorithms use existing solvers for the maximum flow problem and the minimum cost flow problem. Since the computational complexities of these solvers are higher than those of the other procedures, such as constructG_IdleTask() and constructG_DecCap(), the overall complexity of the proposed algorithms is dominated by that of the solvers.
In fnDPM-fw (Algorithm 1), the maximum number of boundaries of the task
Alternatively, fnDPM-cw (Algorithm 2) considers at most one additional boundary per low-power state, namely at $t+BET_{p}$, so the numbers of nodes and edges in its flow network, and hence its time complexity, are smaller than those of fnDPM-fw.
Experiment
We conducted several experiments to compare the proposed algorithms with LPDPM using STORM, a simulator that was designed to evaluate the real-time schedulers on multiprocessors [29].
A. Simulation Environment
We assumed that the tasks run on four processors, where each processor can transit into one of three low-power states. Table 3 shows the parameters of each low-power state used in these experiments. They were determined by referring to the NXP LPC1800-series microcontrollers (MCUs), which are based on the ARM Cortex-M3 processor. These MCUs are commonly used in several areas, e.g., motor control, industrial automation, and embedded audio applications [31]. The parameters include the regulator supply current, the wake-up time, and the power penalty of each low-power state; the wake-up time is the time required to return to the active state, and the power penalty is the power consumed during that transition. The break-even time was set equal to the wake-up time, which is a reasonable assumption as described in [4].
In the active state, the cores are fully operational and can access the peripherals and memories that are configured by their running software. In the sleep state, the cores receive no clock pulse, but the peripherals and memories remain running. In both the deep-sleep and deep power-down states, all cores and peripherals, except the peripherals in the always-on power domain, are shut down. Memories can remain powered to retain their contents, as defined by the individual power-down state.
We generated 1,000 real-time task sets for each utilization
For the evaluation of the energy-efficiency of the algorithms, the following three metrics were used:
The static energy consumption.
The total power overhead for the transitions between the low-power states and the active state.
The time interval during which the processors stayed in each low-power state.
Here, the static energy consumption depends on the other two metrics. For example, even if the total energy overhead is high, the static energy consumption can be reduced by staying in a low-power state for a long time. Another example is that staying in the shallow sleep state for a long time can be better than staying in the deep sleep state for a short time in minimizing the static energy consumption. These metrics will reveal the characteristic of each algorithm in energy savings.
B. Simulation Results
The metrics for the comparison are specified as follows:
The normalized static energy consumption is calculated as follows:
\begin{equation*}=\frac {\sum _{\forall {\{\text {low-power state } p\}}}{(PC_{p}\times t_{p}+PP_{p})}}{PC_{Idle}\times t_{Idle}}\end{equation*}
$t_{p}$ is the total time interval for which a DPM algorithm remains in the $p$-th low-power state. $t_{Idle}$ is the total time that a non-DPM algorithm spends in the idle state. $PC_{Idle}$ is the power that a non-DPM algorithm consumes in the idle state.
The total power overhead is calculated as follows:
\begin{equation*}=\sum _{\forall p}{(Tr_{p}\times PP_{p})}\end{equation*}
$Tr_{p}$ is the total number of transitions between the active state and the $p$-th low-power state.
The normalized time spent in each low-power state is calculated as follows:
\begin{equation*}=\frac {t_{p}}{t_{Idle}},\quad {\forall p}\end{equation*}
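For clarity, the three metrics can be computed as in the short sketch below; the dictionaries are hypothetical per-state totals gathered from a simulation run, and the formulas follow the equations above literally.

# Evaluation metrics; t_p, pc, pp, tr are dicts keyed by the low-power state p.
def normalized_static_energy(t_p, pc, pp, pc_idle, t_idle):
    # sum over states p of (PC_p * t_p + PP_p), normalized by the idle energy of a non-DPM baseline
    return sum(pc[p] * t_p[p] + pp[p] for p in t_p) / (pc_idle * t_idle)

def total_power_overhead(tr, pp):
    # sum over states p of Tr_p * PP_p
    return sum(tr[p] * pp[p] for p in tr)

def normalized_time_in_state(t_p, t_idle):
    # t_p / t_Idle for every low-power state p
    return {p: t_p[p] / t_idle for p in t_p}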
Figures 5 and 6 show the experiment results when the actual execution time of each task is equal to its WCET. Figure 5 (a) shows the static energy consumption of each algorithm that has been normalized by that of a non-DPM algorithm (B-Fair). This figure shows that compared with B-Fair, both LPDPM and the proposed algorithms significantly reduce the static energy consumption over the whole range of the utilization. The difference of the energy consumption between LPDPM and the proposed algorithms stays within 5%. Figure 5 (b) depicts the total energy overhead for each algorithm. It shows that fnDPM-cw incurs a higher total energy overhead than both LPDPM and fnDPM-fw, which implies that fnDPM-cw transits into the low-power states more frequently than the others.
The energy consumption and the power penalty of algorithms when the actual execution time is equal to WCET. (a) Normalized static energy consumption. (b) Total power overheads for state-transitions.
The time measurement of algorithms when the actual execution time is equal to WCET. (a) Total time that LPDPM spent in each low-power state. (b) Total time that fnDPM-fw spent in each low-power state. (c) Total time that fnDPM-cw spent in each low-power state.
Figures 6 (a), (b), and (c) show the total time spent in the low-power states, normalized by the total idle time of the given tasks. Figure 6 (a) shows that LPDPM utilizes almost 100% of the idle time for staying in one of the low-power states; moreover, it stays in the deep power-down state for a long time. In contrast, fnDPM-fw utilizes a shorter portion of the idle time than LPDPM and also stays in the deep power-down state for a shorter time. A similar but attenuated trend is found between LPDPM and fnDPM-cw in Figures 6 (a) and (c). Note that as the utilization increases toward 4.0, the total idle time decreases; thus, the normalized total time spent in the low-power states has a diminishing effect on the power consumption as the utilization grows.
On the other hand, Figures 7 and 8 show the results when the actual execution time of each task is allowed to be less than its WCET. Figure 7 (a) shows the static energy consumption of each algorithm, as normalized by that of B-Fair. The early completion of the tasks produces the extra idle time dynamically, thereby providing another chance for the reduction of the energy consumption. Especially when the utilization is low (
The energy consumption and the power penalty of algorithms when the actual execution time is less than WCET. (a) Normalized static energy consumption. (b) Total power overheads for state-transitions.
The time measurement of algorithms when the actual execution time is less than WCET. (a) Total time that LPDPM spent in each low-power state. (b) Total time that fnDPM-fw spent in each low-power state. (c) Total time that fnDPM-cw spent in each low-power state.
Figures 8 (a), (b), and (c) show that, when the early completion occurs, both of the fnDPM algorithms utilize a longer idle time in the low-power states than LPDPM. It causes both fnDPM algorithms to consume less energy than LPDPM when the utilization is low (
Future Works and Discussion
A real-time scheduling algorithm based on both DVFS and DPM can reduce the dynamic and static energy consumption, but it faces a trade-off in handling the slack time. To reduce the dynamic energy consumption, DVFS-based scheduling algorithms scale down the operating frequency using the slack time, which increases the execution time of the tasks. The increased execution time, however, reduces the length of the idle time that DPM-based algorithms can utilize for entering a low-power state. This trade-off makes it difficult to find the energy-optimal solution that reduces both the dynamic and static energy consumption.
To resolve the above problem, many researchers have studied the interplay of DVFS and DPM [8], [10]–[13]. As described in [12], scheduling algorithms supporting the interplay of DVFS and DPM generally have to determine: (a) the switching time instant for turning the processor from a low-power state to the active state, (b) the target frequency and the time instant for scaling, and (c) the time instant for turning the processor back to a low-power state. Since it is difficult to consider these decisions at once, the algorithms based on the interplay of DVFS and DPM use the slack time for applying DVFS first and subsequently use the residual slack time for applying DPM, or vice versa.
Langen and Juurlink [10] proposed several scheduling algorithms, extensions of their previous work in [8], by incorporating DVFS, shutdown, and procrastination. These algorithms find the optimal number of active processors like LAMPS [8] and also heuristically determine either the scaled frequency or the time point for the processor shutdown.
Lee [11] proposed a heuristic scheduling algorithm for light workloads on multi-core platform. This algorithm lowers the frequency of the overabundant cores to reduce the dynamic energy consumption and then it turns off rarely used cores after moving their tasks to other cores to reduce the static energy consumption.
Chen et al. [12] presented an energy efficient framework that considers the interplay of DVFS and DPM. In this framework, a power management strategy first determines the optimal frequency based on current workloads without jeopardizing the schedulability of real-time tasks and then uses the idle time interval to turn processors to sleep modes.
Moulik et al. [13] proposed an energy-aware real-time scheduling algorithm for hard real-time multicore systems using the interplay of DVFS and DPM. This algorithm guarantees RT-optimality because it is based on the RT-optimal DP-Fair (deadline partitioning-fair) scheduling algorithm [33]. Their power management strategy first determines the minimum operating frequency and then compares it with the critical frequency. As defined in [5], if the operating frequency is less than the critical frequency, the static energy consumption dominates the dynamic energy consumption, which is known to increase the total energy consumption. Therefore, the critical frequency can serve as the lower bound for achieving high energy savings. The algorithm proposed by Moulik et al. [13] assigns the current tasks to each core at the minimum operating frequency to fully use the processor capacity when the minimum operating frequency is higher than the critical frequency. Otherwise, it scales the operating frequency up to the critical frequency and calculates the residual processor capacity to determine the time interval for entering the sleep state. Instead of minimizing the number of active processors, this algorithm keeps all processors powered on so that it can scale the frequency at any time based on DVFS.
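The frequency-versus-sleep decision described for [13] can be pictured with the following schematic, which is our illustration rather than the authors' code; frequencies are assumed to be normalized to the maximum frequency.

# Schematic of the decision rule described above (illustration only).
def choose_operating_point(f_min, f_crit):
    # f_min: minimum frequency keeping the task set schedulable (normalized);
    # f_crit: critical frequency below which static energy dominates.
    if f_min >= f_crit:
        return f_min, 0.0                     # run as slowly as schedulability allows (DVFS)
    sleep_fraction = 1.0 - f_min / f_crit     # residual capacity turned into sleep time (DPM)
    return f_crit, sleep_fraction

print(choose_operating_point(f_min=0.3, f_crit=0.5))   # -> (0.5, 0.4): sleep 40% of the time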
Our study is based on two premises for achieving energy savings. First, the strategy of shutting down as many processors as possible is more energy-efficient than keeping processors idle for applying DVFS later, as claimed in [8] and [9]. Second, the static power consumption has been growing relative to the dynamic power consumption [5], [14]. These premises have led us to design DPM-based scheduling algorithms that generate longer idle times for entering deeper low-power states.
Our experimental results show that the proposed algorithms reduce the static energy consumption to a degree similar to that of the latest DPM-based offline algorithm [17]. In addition, our algorithms can be easily extended to reduce the dynamic energy consumption by adopting the interplay of DVFS and DPM. For example, when the break-even time of a processor is longer than the available idle time, our algorithm can be extended to use that idle time for scaling down the operating frequency of the processor. However, even if this simple heuristic enables the proposed algorithm to use both DVFS and DPM without wasting the idle time, it does not guarantee the most energy-efficient behavior. Instead, we believe that there should be a way to consider both DVFS and DPM in a single mathematical formulation that eventually provides the optimal energy efficiency. In this sense, we claim that the mathematical formulation in this paper is a good starting point toward obtaining that optimal energy efficiency.
To find a solution, we used solvers (such as [26]–[28]) for the general minimum cost flow problem. The complexity of these solvers dominates the complexity of the proposed algorithms, since solving the general minimum cost flow problem is relatively expensive. To reduce the complexity further, we would need a solver designed specifically for our problem. For example, the scheduling problem in this paper always leads to an unbalanced bipartite graph in which (1) the vertex set is partitioned into two subsets, tasks and windows, and (2) the number of tasks and the number of windows are not always the same. This gives us a chance to use solvers specialized for unbalanced bipartite graphs, which are computationally lighter than the solvers for the general minimum cost flow problem [26]–[28].
Conclusions
An online real-time scheduling algorithm, fnDPM, is proposed in this work; it uses DPM on a symmetric homogeneous multiprocessor to schedule periodic real-time tasks and to achieve static energy savings. The focus is the clustering of the distributed idle times so that a processor remains in an appropriate low-power state for a long time without jeopardizing the real-time constraints, while using the minimum number of active processors. Since fnDPM is based on the flow network model, it efficiently generates long clustered idle times while satisfying the real-time constraints. An experimental evaluation of the proposed algorithms was conducted in comparison with an existing offline approach, which is the only counterpart in this problem domain. The experiments show that the proposed algorithms achieve static energy savings similar to those of the existing algorithm. In particular, when tasks complete early under a low utilization of the real-time tasks, the proposed algorithms consume less static energy than the existing algorithm. As future work, we will investigate a procedure for further reducing the total energy consumption through the integration of DVFS and DPM.