Task Scheduling with Multicore Edge Computing in Dense Small Cell Networks

As a reaction and complement to cloud computing, edge computing is a computing paradigm designed for low-latency computing. Edge servers, deployed at the boundary of the Internet, bridge distributed end devices and the centralized cloud server, forming a harmonious architecture with low latency and balanced loadings. Elaborate task scheduling, including task assignment and processor dispatching, is essential to the success of edge computing systems in dense small cell networks. Plenty of issues need to be considered, such as servers' computing power, storage capacity, loadings, and bandwidth, as well as tasks' sizes, delays, partitionability, etc. This study contributes to task scheduling for multicore edge computing environments. We first show that this scheduling problem is NP-hard. An efficient and effective heuristic is then proposed to tackle it. Our Multicore task Assignment for maximum Rewards (MAR) scheme differs from most previous schemes in jointly considering all three critical factors: task partitionability, multicore execution, and task properties. A task's priority is decided by its cost function, which takes into account the task's size, deadline, partitionability, the cores' loadings, processing power, and so forth. First, tasks from end devices are assigned to edge servers considering the servers' loadings and storage. Next, tasks are assigned to the cores of the selected server. Simulations compare the proposed scheme with First-Come-First-Serve (FCFS), Shortest Task First (STF), Delay Priority Scheduling (DPS), and the Green Greedy Algorithm (GGA), and demonstrate that the task completion ratio can be significantly increased while the number of aborted tasks is greatly reduced. The improvement in task completion ratio for hotspots is up to 26%, 25%, 22%, and 9% over FCFS, STF, DPS, and GGA, respectively.

A straightforward integration is to deploy the edge servers at the Radio Access Network (RAN), as illustrated in Figure 1, which is the typical scenario we consider in this study. In Figure 1, task assignment and server/processor dispatch strongly affect the performance of an edge computing system. Various issues need to be considered, such as servers' computing power, storage, loadings, and bandwidth, as well as tasks' sizes, delays, partitionability, etc.
Our contributions to task scheduling in multicore edge computing environments are summarized below:
• We first show that the task assignment and server/processor dispatch problem is an NP-hard problem.
• The problem is approached with an efficient and effective two-step heuristic. First, tasks from end devices are assigned to edge servers, factoring in the servers' loadings and storage. Next, the tasks dispatched onto a selected server are assigned to its cores.
• A task's priority is decided by its cost function, which considers essential factors including the task's size, deadline, and partitionability, as well as the cores' loadings and processing power.
• The proposed Multicore task Assignment for maximum Rewards (MAR) scheme effectually improves the task completion ratio. Compared with FCFS (First-Come-First-Serve), STF (Shortest Task First), DPS (Delay Priority Scheduling), and GGA (Green Greedy Algorithm), simulation results show that the improvement in task completion ratio for hotspots is up to 26%, 25%, 22%, and 9%, respectively.

The rest of this paper is organized as follows. Previous works are briefly reviewed in Section II, highlighting their relative merits and limitations. We define the system model and formulate the scheduling problem in Section III. Our MAR scheme is then presented in Section IV with an in-depth explanation and discussion. A series of simulations demonstrating the feasibility and effectiveness of the proposed approach is presented in Section V. Finally, conclusions are drawn in Section VI.

II. RELATED WORKS
End devices are generally equipped with limited computing power and storage resources. A standard solution to this problem is to offload tasks to more capable systems, such as cloud servers or data centers [6]- [8]. Offloading can improve processing efficiency and reduce the power consumption of end devices [6]- [7], [9]. However, latency and network congestion might be induced during the offloading process [10]- [11]. In recent years, edge computing has received more and more attention due to its merits in reducing latency and bandwidth demand. Considerable effort devoted to task assignment and scheduling in edge computing environments can be found in the literature [9], [12]- [13].
Many previous works resolve the scheduling problem through spectrum allocation [14]- [17]. In [14], the authors randomly assign tasks to the edge servers by a greedy method. They focus on the bandwidth demand when tasks move between edge servers and propose a solution to minimize congestion. The authors of [15] assume tasks are partitionable. Aiming at the least power consumption and as many completed tasks as possible, they evaluate observed system dynamics to solve resource allocation, spectrum allocation, and power control problems. In [16], a task is partitioned into two parts: one part is executed locally, and the other is executed on the edge server. Different partition sizes and execution durations demand different bandwidth resources. The authors schedule tasks according to the tasks' characteristics to minimize the completion time. Multi-Radio Access Technology (Multi-RAT) is employed in [17] for task partitioning. Their scheme assigns tasks to different edge servers based on the network conditions and the queue status.
In [14]- [17], only external scheduling is addressed. Multicore cases are considered in [18]- [20] to speed up task execution. In [18], the authors inspect how the number of cores affects system performance. Their scheduling allocates tasks to edge servers by weighing the availability of cores against the tasks. The work in [19] considers the differing power consumption of heterogeneous-core designs. The authors of [20] advocate that low delay is especially demanding in remote health monitoring systems; they show that multicore edge servers are more suitable for real-time operation.
There are also works paying particular attention to the properties of tasks, such as workload, deadline, size, and so forth. The context of [21] is a data center responsible for the timely delivery of data. Various tasks have different workloads and deadlines, and multiple cores operate at different clock speeds. Their scheme allocates weighted execution time to cores based on the workloads of tasks. In [22], tasks are assumed to be independent and to have different priorities. With efficient management in mind, a scheme is designed to meet the tasks' QoS (Quality of Service) requirements as much as possible. A genetic algorithm is applied to multi-hop wireless networks in [23]. Data to be processed is partitioned and assigned to wireless nodes for execution. The authors define a cost function, accounting for the influence of computation and communication, for improved scheduling performance. The work in [24] examines the scheduling problem in smart-grid environments; a cost function is defined to determine the target edge server for task offloading. A two-stage scheme is proposed in [25] to minimize the overall operating cost of the entire edge computing system.
Features and appeals of the approaches mentioned above are tabulated in Table I. Aiming at a comprehensive and versatile scheduling scheme for edge computing, we include all three critical factors, namely partitionability, multicore, and task properties (such as deadlines or sizes), and match them with servers' computing power, storage spaces, loadings, and networking bandwidth in this study.

A. SYSTEM MODEL
The context of this study is illustrated in Figure 1. It is a homogeneous network consisting of small cells, edge servers, and end devices. Edge servers are deployed in the proximity of small cells to serve end devices within a single-hop range. End devices periodically collect data and generate tasks, which are offloaded to edge servers for execution. Each edge server has its own task buffer. Results of task execution are returned to the end device and/or passed to the cloud server for further analysis. A scheduling scheme trying to maximize the reward of task completion needs to account for the variety in tasks' sizes, workloads, deadlines, and partitionability, as well as the diversity in edge servers' capacity, computing power, loading, etc.
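The entities of this system model can be represented directly in code. The following sketch mirrors the 4-tuple representations introduced in the next subsection; the field names are illustrative, not the paper's notation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    """A task generated by an end device (4-tuple from the system model)."""
    delay: float         # allowed delay before the task becomes overdue
    workload: float      # required clock cycles
    storage: float       # storage requirement on the server
    partitionable: bool  # whether the task may be split across cores

@dataclass
class EdgeServer:
    """An edge server deployed at a small cell (4-tuple from the system model)."""
    storage: float       # storage capacity
    cores: int           # number of cores
    power: float         # processing power per core (cycles per second)
    tasks: List[Task] = field(default_factory=list)  # tasks assigned to this server
```

A scheduling scheme then operates over a list of `EdgeServer` objects, appending accepted `Task` objects to each server's buffer.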

B. PROBLEM FORMULATION
To facilitate the formulation of the problem to be addressed, a list of symbols is defined in Table II:

t_i : i-th task
d_i : allowed delay of task t_i
w_i : workload of task t_i
s_i : required storage of task t_i
p_i : indicator variable for task t_i's partitionability
e_j : j-th edge server
S_j : available storage of edge server e_j
c_j : number of cores of edge server e_j
P_j : processing power of edge server e_j
A_j : set of tasks assigned to edge server e_j
A_j* : set of tasks ready to be executed
r_i : reward for the in-time completion of task t_i
x_ij : indicator variable indicating if task t_i is assigned to edge server e_j
a_ij : time task t_i enters edge server e_j
φ_i : execution cost of task t_i
Φ_j : set of execution costs of tasks assigned to edge server e_j
α : weighting factor for the delay time
β : weighting factor for the execution time
γ : weighting factor for the number of cores

End devices in the environment collect data and generate tasks forming the task set T = {t_1, ..., t_n}, where each task is denoted by a 4-tuple, as follows:

t_i = (d_i, w_i, s_i, p_i)

where d_i is the allowed delay; w_i is the workload; s_i is the storage requirement; and p_i is an indicator variable for task partitionability. A partitionable task can be divided into parts, with each part assigned to a core. There are m edge servers in the environment, E = {e_1, ..., e_m}, where each server is represented by a 4-tuple, as follows:

e_j = (S_j, c_j, P_j, A_j)

where S_j is the storage capacity; c_j is the number of cores in e_j; P_j is the computing power; and A_j is the set of tasks assigned to e_j. Assume there is a reward, r_i, for the completion of task t_i. The objective of our scheduling scheme is to find a subset of A_j, denoted A_j*, that maximizes the overall reward. A_j* consists of tasks which can be completed in time; overdue tasks are excluded. This problem can be formulated as a constrained optimization problem, as follows:

maximize Σ_j Σ_i r_i · x_ij (6a)
subject to
Σ_j x_ij ≤ 1, for all i (6b)
Σ_i s_i · x_ij ≤ S_j, for all j (6c)
number of partitions of t_i ≤ c_j (6d)
r_i = 0 if t_wait(i,j) + t_exec(i,j) > d_i (6e)

The x_ij in (6a) is an indicator variable: x_ij equals 1 if task t_i is assigned to edge server e_j, and 0 otherwise. Equation (6b) implies that a task can be assigned to only one edge server. Equation (6c) ensures that the storage capacity of an edge server cannot be exceeded. Equation (6d) specifies that a task can be partitioned into at most c_j partitions for an edge server with c_j cores. Equation (6e) defines the condition of whether task t_i can be completed in time if assigned to the edge server e_j; the reward r_i of task t_i is assigned a value of 0 if the task cannot be completed in time. t_wait(i,j) and t_exec(i,j) are the waiting time and execution time of task t_i running on edge server e_j. t_wait(i,j) can be evaluated according to:

t_wait(i,j) = b_ijk − a_ij (7)

where b_ijk is the commencement time of execution of task t_i on edge server e_j's k-th core and a_ij is the time task t_i enters edge server e_j. t_exec(i,j) can be evaluated as follows:

t_exec(i,j) = f_ijk − b_ijk (8)

where f_ijk is the finishing time of execution of task t_i on edge server e_j's k-th core.
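The timing quantities in (7) and (8) and the in-time condition of (6e) can be sketched as follows; the function names are illustrative.

```python
def waiting_time(start, arrival):
    """t_wait: commencement time minus the time the task entered the server."""
    return start - arrival

def execution_time(finish, start):
    """t_exec: finishing time minus commencement time on the assigned core."""
    return finish - start

def completed_in_time(arrival, start, finish, allowed_delay):
    """Condition (6e): the reward is granted only if wait + exec <= allowed delay."""
    return waiting_time(start, arrival) + execution_time(finish, start) <= allowed_delay
```

For example, a task that arrives at time 0, starts at time 1, and finishes at time 3 meets an allowed delay of 4, while one starting at time 2 and finishing at time 5 does not.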

Theorem 1:
The scheduling problem addressed in this study, Equation (6), is an NP-complete problem.

Proof: Since (6b) denotes that a task can be assigned to only one edge server, and (6d) can be set so that a task is partitioned into at most one partition, we reduce the original optimization problem into a simpler one with only one constraint, as follows:

maximize Σ_i r_i · x_i1 subject to Σ_i s_i · x_i1 ≤ S_1 (9)

This is in effect a scheduling problem considering only the capacity of a single edge server if we further incorporate (6e) by setting m = 1 and c_1 = 1, so that only the storage constraint remains. The 0/1 knapsack problem is a classical NP-complete problem [26]. A typical formulation of it is given below:

maximize Σ_i v_i · y_i subject to Σ_i z_i · y_i ≤ C, y_i ∈ {0, 1} (10)

where N is the number of items; v_i is the value of item i; y_i is an indicator variable indicating if item i is packed; z_i is the size of item i; and C is the capacity of the knapsack. The objective is to maximize the total value of the packing without exceeding the capacity limit.

Problem (9) maps onto problem (10) following the above deductive steps: rewards correspond to item values, storage requirements to item sizes, and the server's storage to the knapsack capacity. Therefore, the scheduling problem addressed in this study, Equation (6), is also an NP-complete problem. ∎
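For concreteness, the 0/1 knapsack problem (10) invoked in the proof admits the classic pseudo-polynomial dynamic program, which runs in O(N · C) time rather than polynomial time in the input size:

```python
def knapsack(values, sizes, capacity):
    """0/1 knapsack DP: maximize total value without exceeding 'capacity'.
    dp[c] holds the best value achievable with capacity c; iterating the
    capacities downward ensures each item is used at most once."""
    dp = [0] * (capacity + 1)
    for v, s in zip(values, sizes):
        for c in range(capacity, s - 1, -1):
            dp[c] = max(dp[c], dp[c - s] + v)
    return dp[capacity]
```

Under the mapping of the proof, `values` plays the role of the rewards r_i, `sizes` the storage requirements s_i, and `capacity` the server storage S_j.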

IV. Multicore task Assignment for maximum Rewards (MAR)
To be practical, an NP-complete problem ought to be approached by heuristic methods. A two-stage scheme is developed in this section to solve the problem addressed in the last section. In the first stage, server assignment, tasks are assigned to selected edge servers, considering the servers' capacities and accumulated workloads. In the second stage, core assignment, tasks are assigned to chosen cores within the server. The operational flowchart of the proposed MAR scheme is given in Figure 2. Algorithm 1, given in Figure 3, is in charge of the offloading of tasks to edge servers; it considers the edge servers' capacities and accumulated workloads to offload tasks to adequate edge servers.

Stage 1 - Server Assignment

Algorithm 1 in Figure 3 proceeds as follows. Line 1 performs the initialization. Lines 2 to 11 carry out the server assignment. Lines 3 to 6 list the requirements of an adequate server: Line 4 specifies that the task and the server are within a single-hop distance, and Line 5 ensures that including the task does not exceed the server's storage capacity. Among the servers satisfying these conditions, the one with the least accumulated processing workload is preferred, as shown in Line 6. If an adequate edge server can be found, the assignment and the server's state are updated accordingly, as shown in Lines 9 to 11. Otherwise, the task is removed from the task set, as indicated in Lines 7 and 8.
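The server-assignment stage can be sketched as below. This is a simplified rendering of Algorithm 1 under stated assumptions: servers are dictionaries with an illustrative `single_hop` flag standing in for the single-hop reachability check, and load is measured as accumulated workload.

```python
def assign_server(task, servers):
    """Stage 1 sketch: offload 'task' to the least-loaded single-hop server
    that has enough free storage; return None (task dropped) otherwise."""
    def load(s):
        return sum(t["workload"] for t in s["tasks"])      # accumulated workload

    def used_storage(s):
        return sum(t["storage"] for t in s["tasks"])       # storage already taken

    candidates = [s for s in servers
                  if s["single_hop"]                       # within one hop of the device
                  and used_storage(s) + task["storage"] <= s["storage"]]
    if not candidates:
        return None                                        # no adequate server exists
    best = min(candidates, key=load)                       # least accumulated workload
    best["tasks"].append(task)                             # commit the assignment
    return best
```

A task that fits nowhere is dropped immediately, matching the removal step of Lines 7 and 8.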

Stage 2 -Core Assignment
The core assignment is illustrated in Figure 4. The edge server evaluates the execution cost of the tasks assigned to it with Function 1. The execution cost decides the order of execution: the lower the execution cost, the higher the execution priority. A task might be divided into parts depending on its partitionability. The proposed scheme either assigns a standalone task to a single core or partitions it into subtasks spread over multiple cores, one subtask per core, minimizing the overall execution time. The ultimate goal is to accomplish as many tasks as possible so as to receive the maximal total reward.

Due to the variety in tasks' workloads and deadlines, a dedicated scheme is required to conduct the core assignment once tasks have been assigned to a particular edge server. To decide the order of task execution, our scheme calculates the execution cost of each task. A lower cost implies a higher execution priority, and the associated reward is easier to obtain. Φ_j = {φ_1, ..., φ_n_j} is the set of execution costs φ_i of the tasks assigned to edge server e_j, where n_j is the number of tasks assigned to e_j. The evaluation of φ_i depends on the task's partitionability, as given by Equation (11), which takes into account the task's allowed delay, workload, and reward, as well as the server's processing power and number of cores. Here α is the weighting factor for the allowed delay; β is the weighting factor for the execution time; γ is the weighting factor for the number of cores; and c_j is the number of cores of edge server e_j. An important implication of (11) is that a partitionable task is assigned a higher cost, since it can be partitioned and assigned to multiple cores for execution. The methodology is to endow non-partitionable tasks with higher execution priorities, increasing their chance of completion to earn higher total rewards. The execution cost can be evaluated with Function 1 given in Figure 5; its time complexity is linear in the number of tasks assigned to the server.
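Since Equation (11) is given in the paper, the sketch below is only an illustrative stand-in with an assumed functional form: it combines the weighted allowed delay, the weighted estimated execution time, and a core-count penalty applied to partitionable tasks so that they receive a higher cost and are deferred.

```python
def execution_cost(task, server, alpha, beta, gamma):
    """Illustrative cost in the spirit of Equation (11) (assumed form, not the
    paper's exact formula). Lower cost means higher execution priority."""
    exec_time = task["workload"] / server["power"]   # estimated single-core time
    cost = alpha * task["delay"] + beta * exec_time
    if task["partitionable"]:
        # Partitionable tasks can be split across cores later, so they are
        # penalized here and non-partitionable tasks run first.
        cost += gamma * server["cores"]
    return cost
```

With this form, a larger γ pushes partitionable tasks further down the priority order, consistent with the γ sensitivity study in Section V.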
After the execution costs of all tasks are evaluated, Algorithm 2, core assignment, is called for core dispatch. Tasks in the queue are sorted by their execution costs in ascending order and processed one by one. A task is dropped if it cannot be completed by its deadline. Figure 6 presents Algorithm 2. Line 2 evaluates the execution costs of all tasks assigned to the edge server. Lines 3 to 22 conduct the core dispatch. The task with the least execution cost is picked in Line 5, which also evaluates its waiting time. Line 6 calculates the time left for execution before the task becomes overdue. Line 7 examines the execution time if the task is not partitioned. Lines 8 to 10 check whether the least-loaded core can meet the deadline. Lines 11 to 20 handle the case in which a single core is not enough and explore the possibility of task partitioning. Line 13 calculates the execution time if a partitionable task is partitioned and assigned to multiple cores. If that time is within the task's delay bound, the assignment is committed; otherwise, the task is dropped.
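The overall flow of Algorithm 2 can be sketched as follows. This is a simplified rendering under stated assumptions: each core is modeled only by its accumulated finish time, a partitioned task is split evenly across all cores, and tasks carry a precomputed `cost` field.

```python
def assign_cores(tasks, cores, power, now=0.0):
    """Stage 2 sketch: process tasks in ascending cost order; try the
    least-loaded core first, fall back to an even split across all cores for
    partitionable tasks, and drop tasks that would miss their deadlines.
    'cores' is a list of per-core finish times; returns scheduled task ids."""
    scheduled = set()
    for t in sorted(tasks, key=lambda t: t["cost"]):
        exec_time = t["workload"] / power
        deadline = now + t["delay"]
        k = min(range(len(cores)), key=lambda i: cores[i])   # least-loaded core
        if cores[k] + exec_time <= deadline:                 # one core suffices
            cores[k] += exec_time
            scheduled.add(t["id"])
        elif t["partitionable"]:
            part = exec_time / len(cores)                    # even split across cores
            if max(cores) + part <= deadline:                # all parts finish in time
                for i in range(len(cores)):
                    cores[i] += part
                scheduled.add(t["id"])
        # otherwise the task is dropped
    return scheduled
```

Note how a partitionable task that would overrun its deadline on one core can still be completed in time when its workload is spread over every core.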

V. SIMULATIONS AND ANALYSIS
A series of simulations were conducted using MATLAB to verify the feasibility and effectiveness of the proposed MAR scheme. MATLAB object classes for end devices and edge servers were created to simulate the functionalities and behaviors of their real-world counterparts. The simulation platform is a Windows 10 PC with an i5 CPU, 16 GB DDR RAM, and a 500 GB SSD. For each experiment, we report the average results of 30 runs.
The completion ratio, which denotes the number of tasks completed in time divided by the total number of tasks, is the performance indicator of primary concern. A high completion ratio also implies a high satisfaction rate.

A. SIMULATION SETTINGS
The parameter settings for the simulations are given in TABLE III. In the simulated environment [27], there are 105 small cells residing in a 1550 m × 1550 m region. The radius of a small cell is 100 m. Small cells are 150 m apart from each other. An edge server with four cores is deployed at each small cell. It is assumed that every end-device is singlehop with its neighbor edge server, and edge servers can communicate with each other via small cell. The computing power of each core is 50 MCPS (Million Cycles Per Second). There are 500 to 3,000 end devices in the environment. Tasks generated by end devices have their workload from 270,000 to 330,000 clock cycles and storage requirements between 2,880 and 3,520 bits. We also consider the distribution of end devices. There are two scenarios in our simulations, as shown in Figure 7 and Figure 8. Figure 7 is a typical scenario for uniform distribution. For non-uniform distribution, there are ten hotspots out of 105 small cells, as shown in Figure 8. 60% of the end devices reside within the coverage of hotspots. In other words, each hotspot cell covers 6% of the end devices.
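Synthetic task populations matching these parameter ranges can be generated as below; the function name and uniform sampling are assumptions for illustration (TABLE III gives only the ranges). Note that at 50 MCPS, a mean-sized task of 300,000 cycles needs about 6 ms of pure execution time on one core.

```python
import random

def generate_tasks(n, seed=42):
    """Draw n tasks with workloads of 270k-330k cycles and storage needs of
    2,880-3,520 bits, per the simulation ranges in TABLE III."""
    rng = random.Random(seed)  # fixed seed for reproducible runs
    return [{"workload": rng.uniform(270_000, 330_000),
             "storage": rng.uniform(2_880, 3_520)}
            for _ in range(n)]
```

Averaging results over 30 such populations with different seeds mirrors the paper's reporting of 30-run averages.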

B. COMPARATIVE STUDY AND ANALYSIS
To assess the relative merits and limitations, the following schemes are included in our comparative study:
• First-Come-First-Serve (FCFS): Tasks are executed in the order they arrive at the edge servers. A core is occupied until its task completes.
• Shortest Task First (STF): Scheduling is done according to tasks' workloads; the task with the smallest workload is scheduled first. Core dispatch begins with the least-loaded core.
• Delay Priority Scheduling (DPS): Scheduling is based on the allowed delays of tasks; a task with a smaller allowed delay is given a higher execution priority, and the task with the smallest allowed delay is scheduled foremost.
• Green Greedy Algorithm (GGA) [24]: A cost function considering tasks' workloads, edge servers' computing capability, and power consumption is defined. The total workloads of the servers decide the target server for task offloading.
• Proposed: The MAR scheme proposed in this article.
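The first three baselines differ only in the key by which they order the task queue, which can be captured compactly; the dictionary field names here are illustrative.

```python
def order_tasks(tasks, policy):
    """Queue orderings of the baseline schedulers: FCFS sorts by arrival time,
    STF by workload (smallest first), and DPS by allowed delay (tightest first)."""
    keys = {
        "FCFS": lambda t: t["arrival"],
        "STF":  lambda t: t["workload"],
        "DPS":  lambda t: t["delay"],
    }
    return sorted(tasks, key=keys[policy])
```

GGA and MAR do not fit this mold, since both combine several factors into a cost function rather than sorting on a single task attribute.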
Here we assume that the reward for each task is the same; the maximal overall reward is then equivalent to the maximal number of completed tasks. Hence, the task completion ratio, the percentage of completed tasks, is our primary concern. We examine how the completion ratio is affected by different system settings, such as the number of end devices, the distribution of end devices, and the setting of the weighting factor γ. We use the task generation rate to indicate the percentage of end devices that generate tasks in an epoch.

Figure 9 shows how the completion ratio changes as the number of end devices varies in Scenario 1. When the system is lightly loaded, all schemes perform well. The proposed scheme prevails as the system load grows heavier, yielding the following performance ranking: the proposed MAR scheme, DPS, GGA, STF, and finally FCFS. A similar pattern appears in subsequent simulations. GGA, STF, and FCFS do not consider the allowed delays, so tasks can become overdue and be dropped. GGA defines a cost function to decide the offloading server and therefore performs better than STF and FCFS. DPS drops fewer tasks; however, it considers only the allowed delay, so its performance is still inferior to that of the proposed MAR scheme.

Figure 10 presents the results of the same simulation in the Scenario 2 environment. The performance ranking differs slightly: the proposed MAR scheme, GGA, DPS, STF, and FCFS. Performance degrades significantly due to the hotspot effect; however, the proposed MAR scheme still maintains a clear superiority over the other schemes. The cost-function-based offloading of GGA renders it superior to DPS in the scenario with hotspots.

We next turn our attention to the effect of γ, the weighting factor for the number of cores. The completion ratios of all tasks and of partitionable (independent) tasks are given in Figure 11.
The proposed scheme is designed to postpone the execution of partitionable tasks, since a partitionable task can be divided and assigned to multiple cores for a shorter execution time. By design, a larger γ favors non-partitionable tasks and a smaller γ favors partitionable tasks. The improved overall completion ratio at a greater γ value comes at the cost of a lower completion ratio for partitionable tasks.

We now take a closer look at the system behavior within hotspots. As shown in Figure 12, the performance pattern resembles that in Figure 10, but shifted down along the y-axis and with a steeper decline. This implies that the large number of end devices in a hotspot increases the number of dropped tasks.

Figure 13 illustrates how system performance is affected by the distribution of end devices. When 20% of the end devices are located within hotspots, all schemes achieve a completion ratio above 85%. As the percentage increases to 80%, the system performance drops rapidly, because the large number of end devices residing within hotspots saturates the system capacity. Even so, the proposed MAR scheme can still maintain a completion ratio as high as 85%.

The effect of different numbers of cores is given in Figure 14. For the same number of end devices, a setting with more cores leads to a higher completion ratio, because partitionable tasks can be allocated to multiple cores to shorten the execution time, which in turn decreases the chance of becoming overdue. As the number of end devices increases, the performance difference among core counts becomes increasingly significant. For 3,000 end devices, a six-core setting can maintain a completion ratio of 90%, while that of a dual-core setting drops to 30%.

VI. CONCLUSIONS
With the rise of the Internet of Things, the amount of collected data grows explosively. Due to the insufficient computing power of end devices, tasks should be offloaded to edge servers for the sake of timely response and balanced loading. However, because devices are unevenly distributed in real environments, hotspots may occur: some servers may cover too many devices, causing some tasks to expire because they cannot be scheduled properly.
To solve this problem, this paper proposes a task scheduling mechanism. First, we weigh each edge server's storage space and workload to ascertain an adequate target server for task offloading. Next, the internal task scheduling, the core assignment, is executed. We distinguish the priorities of different tasks by calculating the execution cost of each task. The execution cost factors in the allowed delay, the workload, the number of processor cores, and the processing power. Non-partitionable tasks are given slightly higher priority than partitionable tasks, since partitionable tasks can reduce their chance of becoming overdue by being split into sub-tasks mapped onto multiple cores.
In our simulations, we observe the task completion rate under different conditions. Simulation results corroborate that the proposed scheme outperforms FCFS, STF, DPS, and GGA in maintaining a higher task completion ratio under different situations and conditions.