Computation Offloading and Task Scheduling for DNN-Based Applications in Cloud-Edge Computing

Due to the high computational demands of deep neural network (DNN) based applications, it is hard to run them directly on mobile devices with limited resources. Computation offloading technology offers a feasible solution by offloading some computation-intensive tasks of neural network layers to edges or remote clouds that are equipped with sufficient resources. However, the offloading process might lead to excessive delays and thus seriously affect the user experience. To address this important problem, we first regard the average response time of multi-task parallel scheduling as our optimization goal. Next, the problem of computation offloading and task scheduling for DNN-based applications in cloud-edge computing is formulated with a scheme evaluation algorithm. Finally, greedy and genetic algorithm based methods are proposed to solve the problem. Extensive experiments are conducted to demonstrate the effectiveness of the proposed methods for scheduling tasks of DNN-based applications in different cloud-edge environments. The results show that the proposed methods can obtain near-optimal scheduling performance and generate less average response time than traditional scheduling schemes. Moreover, the genetic algorithm leads to less average response time than the greedy algorithm, but it needs more running time.


I. INTRODUCTION
With the rapid development of deep learning (DL) [1], deep neural network (DNN) based applications, such as personalized recommendation systems [2], face recognition systems [3], and license plate recognition systems [4], have become an integral part of people's daily life. The high intelligence of DNN-based applications relies on large-scale and complex DNNs, and thus they commonly require sufficient resources and lead to high energy consumption [5]. However, mobile systems are usually equipped with limited resources [6], including battery life, network bandwidth, storage capacity, and processor performance. Thus, complex DNN-based applications cannot be directly run on mobile devices. One feasible solution is to offload all or part of the computational tasks to remote clouds with sufficient resources [7]. More specifically, DNNs are first divided at the granularity of neural network layers [8]. Next, some computationally complex neural network layers are offloaded to remote clouds for execution, while other tasks with simpler neural network layers are processed locally. Finally, the results are returned and integrated on mobile devices.
However, offloading tasks to remote clouds is significantly limited by the distance between users and remote clouds. Such a long distance leads to huge delays that might cause applications to lag during frequent user interactions, which seriously affects the user experience [9]. Moreover, the leakage of user privacy might happen when offloading tasks to remote clouds. With the rise of edge computing, mobile edges have become the main platforms for implementing computation offloading [10]. Compared with remote clouds, mobile edges are closer to the user data and provide services nearby. Therefore, they can offer a faster network service response and meet the basic requirements of users for privacy protection [11]. However, it is difficult to realize computation offloading for DNN-based applications, due to the geographical distribution of mobile edges and the mobility of mobile devices [12]. To address this problem, in our earlier work, we designed an adaptive offloading framework for DNN-based applications in mobile edge environments. Correspondingly, we proposed a design pattern and a reconstruction method to support the computation offloading of DNN-based applications.
In the process of computation offloading, task scheduling has become a new challenge [13]-[16], where various types of delays occur. For example, the data transmission delay happens when data is transmitted between different computing nodes. After tasks are offloaded to target nodes, they might need to wait in queues due to the limited concurrency capability of the nodes, which results in the waiting delay. If the total delay of offloading is excessive, the average response time of tasks will be significantly increased, which seriously affects the user experience. Besides, different DNN-based tasks require various amounts of data transmission, while the network connections and data transmission rates between nodes are also diverse. Therefore, different scheduling schemes might lead to different delays, and it has become a tough issue to find an optimal scheduling scheme with the lowest average response time. Traditional computation offloading schemes offload all tasks to mobile edges or remote clouds for execution. However, they result in huge data transmission time. Therefore, it is necessary to design an effective scheduling scheme, especially when multiple tasks are executed concurrently.
To solve these problems, we propose an effective method for offloading and scheduling DNN-based applications in cloud-edge environments. The optimization goal is to reduce the average response time of multi-task parallel scheduling. Moreover, the proposed method is able to make scheduling decisions with high-efficiency in response to the mobility of mobile devices. The main contributions of this paper are summarized as follows.
• The problem of computation offloading and task scheduling for DNN-based applications in cloud-edge computing is formulated. Meanwhile, a scheme evaluation algorithm is designed to evaluate the solutions.
• A greedy algorithm based method is first proposed to address the problem, and it can achieve a near-optimal scheduling scheme in a short time. Next, a genetic algorithm based method is developed with better scheduling performance, but it requires more running time than the greedy algorithm.
• Extensive experiments are conducted to validate the effectiveness of the proposed methods under different scenarios of cloud-edge environments. The results show that the proposed methods achieve less average response time than traditional scheduling schemes.

The rest of this paper is organized as follows. In Section II, the related work is analyzed. Section III formulates the problem of offloading and scheduling for DNN-based applications in cloud-edge computing, and a scheme evaluation algorithm is introduced in Section IV. Section V and Section VI discuss the greedy and genetic algorithm based methods for the scheduling problem, respectively. In Section VII, the proposed methods are evaluated. Finally, we conclude this paper and discuss future work in Section VIII.

II. RELATED WORK
To relieve the limitation of mobile devices on computational capability, local tasks can be partially offloaded to remote clouds by using cloud computing technology. As a new type of business computing model, cloud computing is regarded as an extension of distributed processing, parallel processing, and grid computing [17], [18]. In the early stages, most research on computation offloading relied on cloud environments. Suradkar and Bharati [19] pointed out that offloading computing-intensive tasks to cloud platforms can improve the battery life and performance of mobile devices. For example, a computation offloading system (Phone2Cloud) was designed in [20], and it can offload application tasks from smart-phones to the remote cloud.
However, considerable delays are generated when offloading application tasks to remote clouds. Thus, this offloading scheme is unsuitable for real-time applications. To address this issue, mobile edge computing (MEC) has emerged as a promising way to optimize the performance of computation offloading [21], [22]. With the rapid development of MEC, the research focus of computation offloading has gradually shifted from clouds to edges. Moreover, Jeong et al. [7] indicated that machine learning (ML) based applications (especially those using DNNs) consume a large amount of computational resources. Therefore, mobile devices with limited computational capability cannot well support DNN-based applications. One feasible solution is to offload partial computing tasks of DNNs from mobile devices to nearby edge servers.
To better perform the computation offloading for DNNs, Kang et al. [8] designed a lightweight scheduler (Neurosurgeon) that can automatically divide the computation of DNNs between clouds and mobile edges at the granularity of neural network layers. To avoid the high latency and connection errors caused by offloading entire DNNs to external devices, Saguil and Azim [23] proposed that some DNNs should be executed locally while the others can be split and offloaded to different devices. Moreover, many researchers have contributed to the problems of computation offloading and task scheduling. For example, Jia et al. [24] proposed an online heuristic algorithm for task offloading that can minimize the completion time of applications on mobile devices. Based on Lyapunov optimization, a dynamic computation offloading algorithm was developed in [25] to jointly determine offloading decisions, CPU-cycle frequencies for mobile execution, and transmit power. Liu et al. [26] formulated the problem of delay minimization under a power constraint, and then proposed a one-dimensional search algorithm to explore the optimal scheme of task scheduling. Guo et al. [27] provided an energy-efficient dynamic offloading and scheduling strategy, in order to reduce energy consumption and shorten the completion time of applications. Besides, task scheduling was first modeled as an optimization problem in [28], and then a two-stage task scheduling cost optimization (TTSCO) algorithm was proposed to reduce the cost of edge computing systems by offloading the latency-sensitive tasks of IoT devices to the edge cloud. Its goal is to minimize the computational cost and meet the delay requirements of tasks. Moreover, a novel many-objective optimization algorithm based on hybrid angles (MaOEA-HA) was proposed in [29] to enhance the performance of task scheduling in cloud computing. Rahmani Hosseinabadi et al. [30] studied the selection of crossover and mutation operators in the genetic algorithm for addressing the open-shop scheduling problem (OSSP). To optimize the task scheduling problem, Keshanchi et al. [31] designed an improved genetic algorithm by integrating the evolutionary genetic algorithm with heuristics. Similarly, Ahmad et al. [32] improved the genetic algorithm by involving a heuristic in genetic operators and developed a hybrid genetic algorithm for scheduling workflow applications in heterogeneous computing systems.

Different from the above work, this paper focuses on DNN-based applications. There is some research on the computation offloading problem for DNN-based applications. For example, Qi et al. [33] designed an adaptive scheduling algorithm for choosing the processing environments (e.g., clouds or mobile devices) for DNN-based system models, according to the network condition between the remote cloud and mobile devices. However, only the current network condition is considered: if the network condition is good, the models can be offloaded to the remote cloud; otherwise, the models should be processed locally. Therefore, this method cannot be applied to the complex multi-task scheduling problem studied in this paper.

III. PROBLEM FORMULATION

A. CLOUD-EDGE ENVIRONMENT
In a cloud-edge environment, there are commonly three types of computational resources: mobile devices, edge nodes, and a remote cloud. In general, these resources have different levels of performance, which are mainly reflected in their computational capability and concurrency.
Assume that there are k computational resources (denoted by S = {s_1, s_2, . . . , s_k}) in a cloud-edge environment. Among these resources, there are a mobile devices (denoted by m_i), b edge nodes (denoted by e_i), and a remote cloud (denoted by c). For clarity of presentation, these computational resources are regarded as k nodes, and each node is denoted as s_i (i ∈ [1, k]). More specifically, each mobile device is denoted as m_i (i ∈ [1, a]), and the mobile devices correspond to the node set {s_1, s_2, . . . , s_a}. For example, the mobile device m_1 corresponds to the node s_1, the mobile device m_2 corresponds to the node s_2, and so on. Each edge node is denoted as e_i (i ∈ [1, b]), and the edge nodes correspond to the node set {s_{a+1}, s_{a+2}, . . . , s_{a+b}}. For example, the edge node e_1 corresponds to the node s_{a+1}, the edge node e_2 corresponds to the node s_{a+2}, and so on. Moreover, the remote cloud c corresponds to the node s_k.
Besides, p i (i ∈ [1, k]) is used to indicate the number of concurrent lanes of the node s i , and it represents the maximum number of tasks that can be concurrently processed on the node s i . For example, if the number of concurrent lanes of a node is 3, up to 3 tasks can be simultaneously processed on the node, where each concurrent lane can be used to process a task.
Next, the connections between different nodes are represented by two k × k matrices V and R as

$$V = \begin{bmatrix} v_{1,1} & \cdots & v_{1,k} \\ \vdots & \ddots & \vdots \\ v_{k,1} & \cdots & v_{k,k} \end{bmatrix}, \quad R = \begin{bmatrix} r_{1,1} & \cdots & r_{1,k} \\ \vdots & \ddots & \vdots \\ r_{k,1} & \cdots & r_{k,k} \end{bmatrix} \tag{1}$$

where v_{i,j} is the data transmission rate between the nodes s_i and s_j, and r_{i,j} is the response time between the nodes s_i and s_j. Moreover, the values of v_{i,j} and r_{i,j} are defined as

$$v_{i,j} = \begin{cases} C^{v}_{i,j}, & \text{if } s_i \text{ and } s_j \text{ are connected} \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

$$r_{i,j} = \begin{cases} C^{r}_{i,j}, & \text{if } s_i \text{ and } s_j \text{ are connected} \\ \infty, & \text{otherwise} \end{cases} \tag{3}$$

where C^{v}_{i,j} and C^{r}_{i,j} are constant values under different connection conditions.
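As a concrete illustration, the matrices V and R could be held as NumPy arrays (the paper's experiments use NumPy; the node count, rates, and the flat 5 ms link latency below are our own illustrative values, not the paper's):

```python
import numpy as np

# Hypothetical 3-node example: one mobile device (s1), one edge node (s2),
# and one remote cloud (s3).  V holds data transmission rates (Mb/s, 0 = no
# link); R holds link response times (ms, inf = no link).
V = np.array([
    [0.0, 2.0, 0.4],   # s1 <-> s2 (edge link), s1 <-> s3 (cloud link)
    [2.0, 0.0, 0.6],
    [0.4, 0.6, 0.0],
])
R = np.where(V > 0, 5.0, np.inf)   # assume a flat 5 ms latency on every link
np.fill_diagonal(R, 0.0)           # a node reaches itself instantly

def connected(i, j):
    """Two nodes are mutually reachable when data can be transmitted."""
    return i == j or V[i, j] > 0
```

A scheduler can then test reachability with `connected(i, j)` and look up the transfer cost directly from `V` and `R`.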

B. DESCRIPTION OF DNN-BASED TASKS
In a DNN-based application with m neural network layers, each layer is regarded as a subtask. Thus, each DNN-based task consists of m different subtasks. Assume that the scheduling process involves n tasks, and the task set is denoted as T = {T_1, T_2, . . . , T_n}. Meanwhile, each task can be denoted as T_i = {t_{i,1}, t_{i,2}, . . . , t_{i,m}}, where t_{i,j} represents the j-th subtask of the i-th task. Moreover, each task is generated on a mobile device and arrives at a rate of λ_i, where i ∈ [1, a].
As each neural network layer in DNNs is processed orderly, m subtasks of a DNN-based task are also processed orderly. For example, the subtask t i,j will not be generated until the subtask t i,j−1 is processed. Similarly, the subtask t i,j+1 will only be generated after the subtask t i,j is processed.
Next, time_{i,j} is used to represent the processing time of the i-th subtask of a task on the node s_j. Thus, the set of processing time for the subtasks on different nodes is defined as

$$Time = \begin{bmatrix} time_{1,1} & \cdots & time_{1,k} \\ \vdots & \ddots & \vdots \\ time_{m,1} & \cdots & time_{m,k} \end{bmatrix} \tag{4}$$

When the same subtask is processed on different nodes, the nodes with stronger computational capability lead to a smaller value of time_{i,j}. Similarly, when different subtasks are processed on the same node, the subtasks with smaller resource requirements result in a smaller value of time_{i,j}.
Besides, D = {d 1 , d 2 , . . . , d m } is used to represent the set of data transmission volume between different subtasks of a task, where d j indicates the data transmission volume between the subtasks t i,j and t i,j+1 .
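A minimal sketch of this task model in Python (the class and attribute names are ours; the paper does not prescribe a data structure). Each task is an ordered chain of m layer-subtasks, and subtask j + 1 only becomes available once subtask j has finished:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Subtask:
    task_id: int
    layer: int                       # j in t_{i,j}, 1-based
    arrival: Optional[float] = None  # time it arrives at its node
    end: Optional[float] = None      # time it is completed

@dataclass
class Task:
    task_id: int
    m: int                           # number of DNN layers
    subtasks: List[Subtask] = field(default_factory=list)

    def __post_init__(self):
        # One subtask per neural network layer, processed strictly in order.
        self.subtasks = [Subtask(self.task_id, j + 1) for j in range(self.m)]

    def next_ready(self):
        """The earliest unfinished subtask; all predecessors must finish first."""
        for st in self.subtasks:
            if st.end is None:
                return st
        return None
```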

C. FORMAL DEFINITION OF PROBLEM
During the process of task scheduling, t_{i,j}.arrival is used to indicate the time when the subtask t_{i,j} arrives at the node s_y, and t_{i,j}.begin is used to represent the time when the subtask t_{i,j} begins to be processed. Therefore, the waiting time of the subtask t_{i,j} on the node s_y is defined as

$$w_{i,j}(y) = t_{i,j}.begin - t_{i,j}.arrival \tag{5}$$

Next, the data transmission time of the subtask t_{i,j} between the nodes s_x and s_y is defined as

$$g_{i,j}(x, y) = \frac{d_{j-1}}{v_{x,y}} + r_{x,y} \tag{6}$$

Moreover, the response time of the task T_i is calculated from its generation to completion, and it is equal to the sum of the response time of all its subtasks. More specifically, the response time r_{i,j}(y) of the subtask t_{i,j} processed on the node s_y consists of three components, namely the processing time, data transmission time, and waiting time, which can be denoted as

$$r_{i,j}(y) = time_{j,y} + g_{i,j}(x, y) + w_{i,j}(y) \tag{7}$$

Therefore, the response time of the task T_i can be calculated by

$$f(T_i) = \sum_{j=1}^{m} r_{i,j} \tag{8}$$

Correspondingly, the average response time of n tasks is defined as

$$f_{ave}(T) = \frac{1}{n} \sum_{i=1}^{n} f(T_i) \tag{9}$$

Based on the problem formulation, the objective of our work is to find an optimal scheduling scheme and use it to schedule the tasks of DNN-based applications, in order to minimize the average response time f_ave(T). Therefore, the objective function is needed to measure and guide the potential scheduling schemes for achieving the optimal one with the lowest value of f_ave(T). Meanwhile, the scheduling scheme can specify the processing node for each subtask and the processing order of subtasks on different nodes.
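A quick numeric sanity check of these definitions with made-up values, assuming the transmission time combines the transferred volume, the link rate, and the link response time as in the definition of g_{i,j}(x, y):

```python
# All numbers are illustrative, not from the paper: a subtask whose input
# data is shipped from a mobile device to an edge node.
time_jy = 0.107   # processing time on the target node s_y (s)
d_prev  = 1.2     # data volume produced by the previous layer (Mb)
v_xy    = 2.0     # transmission rate on the link s_x -> s_y (Mb/s)
r_xy    = 0.005   # link response time (s)

g = d_prev / v_xy + r_xy        # data transmission time: 0.6 + 0.005
arrival, begin = 0.20, 0.35
w = begin - arrival             # waiting time in the node's queue
r_sub = time_jy + g + w         # subtask response time: processing + transfer + wait
```

Here the transfer term (0.605 s) dominates, which is exactly why a scheme that ignores transmission delay can pick a poor node.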
To solve this problem, one simple idea is to explore all possible scheduling schemes and find the one with the lowest average response time. However, this strategy has exponential complexity and requires a large amount of running time. Therefore, it is necessary to design a more efficient method to solve this complicated task scheduling problem.
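To see why exhaustive search is infeasible, consider only the assignment decision (ignoring queue orderings) at the scale of the experiments later in the paper:

```python
# k candidate nodes for each of the n*m subtasks gives k**(n*m) assignments,
# before even counting the possible processing orders on each node.
# With the paper's own experiment size (k = 7 nodes, n = 12 tasks, m = 7
# layers), the assignment space alone is astronomically large.
k, n, m = 7, 12, 7
num_assignments = k ** (n * m)   # roughly 9.7e70 candidate assignments
```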

IV. SCHEME EVALUATION
The scheme evaluation algorithm is used to evaluate scheduling schemes by calculating the average response time of a specific scheduling scheme. In general, better scheduling schemes lead to less average response time. In this paper, the proposed scheduling algorithms are optimized based on the results of the scheme evaluation algorithm, which simulates the scheduling process according to a specific scheme. During this process, the algorithm first records the arrival and completion time of each subtask, and then calculates the average response time of the scheduling scheme.
More specifically, curTime is first used to indicate the current time with an initial value of 0. Next, a scheduling scheme (denoted by scheme) is represented by a two-dimensional array with k rows. This array corresponds to k nodes, where each row orderly records the subtasks that will be processed on a node. Moreover, each subtask t_{i,j} has three attributes, including t_{i,j}.arrival, t_{i,j}.end, and t_{i,j}.time. These three attributes indicate the time when the subtask t_{i,j} arrives at the node, the time when the subtask t_{i,j} is completed, and the remaining processing time of the subtask t_{i,j}, respectively. They are initialized as

$$t_{i,j}.arrival = \begin{cases} \text{the generation time of } T_i, & \text{if } j = 1 \\ None, & \text{otherwise} \end{cases}, \quad t_{i,j}.end = None, \quad t_{i,j}.time = time_{j,y}$$

where s_y is the node assigned to process t_{i,j}. The response time of a task is the time elapsed from its generation to completion. The time when the task T_i is generated is the time when its first subtask t_{i,1} arrives (denoted by t_{i,1}.arrival), while the completion time of the task T_i is the time when its last subtask t_{i,m} is completed (denoted by t_{i,m}.end). Therefore, the response time of the task T_i is defined as

$$f(T_i) = t_{i,m}.end - t_{i,1}.arrival \tag{10}$$

Therefore, the average response time can be calculated by

$$f_{ave}(T) = \frac{1}{n} \sum_{i=1}^{n} f(T_i) \tag{11}$$

As shown in Algorithm 1, the key steps of calculating the average response time of a scheduling scheme are as follows.

Algorithm 1 The Scheme Evaluation
Input: A scheduling scheme (denoted by scheme).
Output: The arrival time of each subtask (denoted by t_{i,j}.arrival), the completion time of each subtask (denoted by t_{i,j}.end), and the average response time (denoted by f_ave(T)).
1: Initialize t_{i,j}.time ← time_{j,y}, curTime ← 0, t_{i,j}.arrival ← (0 or None), and t_{i,j}.end ← None.
2: slice ← ∞.
3: # Fill lanes.
4: for s_x in S do
5:   while s_x.empty > 0 do
6:     if t_{i,j}.arrival ≤ curTime then
7:       Add t_{i,j} into pool_x.
8:       Remove t_{i,j} from scheme.
9:       s_x.empty ← (s_x.empty − 1).
10:    end if
11:  end while
12: end for
13: # Find the smallest time slice.
14: for s_x in S do
15:   for t_{i,j} in pool_x do
16:     if t_{i,j}.time ≤ slice then
17:       slice ← t_{i,j}.time.
18:     end if
19:   end for
20: end for
21: curTime ← (curTime + slice).
22: # Calculate the remaining processing time of subtasks.
23: for s_x in S do
24:   for t_{i,j} in pool_x do
25:     t_{i,j}.time ← (t_{i,j}.time − slice).
26:     # Generate the new subtask.
27:     if t_{i,j}.time ≤ 0 then
28:       t_{i,j}.end ← curTime.
29:       Remove t_{i,j} from pool_x.
30:       s_x.empty ← (s_x.empty + 1).
31:       # Assume that the new subtask is processed on the node s_y.
32:       if j ≠ m then
33:         t_{i,j+1}.arrival ← (curTime + g_{i,j+1}(x, y)).
34:       end if
35:     end if
36:   end for
37: end for

Step 1: Fill lanes. According to scheme, the subtasks on each node are orderly placed into lanes until there is no idle lane on the node. More specifically, s_x.empty is used to indicate the number of idle lanes on the node s_x, and the subtasks that have been placed into lanes are removed from scheme. The subtask t_{i,j} placed into a lane needs to meet the condition t_{i,j}.arrival ≤ curTime, which means that this subtask must have arrived at the node s_x. Moreover, pool_x is used to represent the subtasks that have been placed into the lanes of the node s_x.
Step 2: Find the smallest time slice. First, the subtasks in each lane will be traversed, in order to find the subtask t i,j with the smallest value of t i,j .time. Next, t i,j .time is regarded as a time slice (denoted by slice), and curTime is updated by adding slice.
Step 3: Calculate the remaining processing time of subtasks. The remaining processing time of the subtasks in lanes is reduced by slice, which indicates that the subtasks have been executed for a slice of time. For the subtask t_{i,j} on the node s_x, if t_{i,j}.time ≤ 0, the subtask t_{i,j} has been completed, and t_{i,j}.end = curTime will be recorded. Next, the subtask t_{i,j} will be removed from the lane, and s_x.empty will be increased by 1. If the subtask t_{i,j} is not of the last neural network layer (i.e., j ≠ m), the following subtask t_{i,j+1} will be generated. Thus, the processing node s_y for the subtask t_{i,j+1} needs to be found, and its arrival time can be calculated by t_{i,j+1}.arrival = curTime + g_{i,j+1}(x, y).

Repeat the above steps until the arrival and completion time of all subtasks are determined. Finally, the average response time of scheme can be calculated by using Equation (11).
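The time-slice loop above can be sketched as follows. This is a simplified re-implementation (our own, not the paper's code) that assumes every task arrives at time 0 and omits the data transmission term added when a successor subtask is spawned on another node:

```python
import math

def evaluate(scheme, proc_time, lanes, n, m):
    """Simplified sketch of the scheme-evaluation loop.

    scheme[x]            -- ordered list of (task, layer) pairs assigned to node x
    proc_time[(layer,x)] -- processing time of a layer on node x
    lanes[x]             -- number of concurrent lanes on node x
    Assumptions (ours): all tasks arrive at time 0; transmission delays are
    skipped where the full algorithm would add g_{i,j}(x, y).
    """
    queue = {x: list(subs) for x, subs in scheme.items()}
    pool = {x: {} for x in scheme}                 # (task, layer) -> remaining time
    arrival = {(i, 1): 0.0 for i in range(1, n + 1)}
    end, cur = {}, 0.0

    while len(end) < n * m:
        # Step 1: fill idle lanes with subtasks that have already arrived.
        for x, q in queue.items():
            while len(pool[x]) < lanes[x]:
                ready = next((s for s in q if arrival.get(s, math.inf) <= cur), None)
                if ready is None:
                    break
                q.remove(ready)
                pool[x][ready] = proc_time[(ready[1], x)]
        # Step 2: advance time by the smallest remaining processing time.
        slice_ = min(t for p in pool.values() for t in p.values())
        cur += slice_
        # Step 3: retire finished subtasks and release their successors.
        for x, p in pool.items():
            for sub in list(p):
                p[sub] -= slice_
                if p[sub] <= 1e-9:
                    del p[sub]
                    i, j = sub
                    end[sub] = cur
                    if j < m:
                        arrival[(i, j + 1)] = cur   # + transmission time in full version
    # All tasks arrived at 0, so the response time is just the completion time.
    return sum(end[(i, m)] for i in range(1, n + 1)) / n
```

For instance, one task whose two layers run on nodes 0 and 1 (taking 2 s and 1 s) finishes at t = 3 s, so the average response time is 3.0.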

V. GREEDY ALGORITHM FOR SCHEDULING
The greedy algorithm always makes the locally optimal choice at each step of solving a problem. For the scheduling problem, the node with the lowest response time is always chosen to process a newly arriving subtask. After all subtasks are allocated to processing nodes, a scheduling scheme is generated.
According to Equation (7), the response time r i,j (y) of the subtask t i,j consists of the processing time, data transmission time, and waiting time. When the greedy algorithm makes decisions, it will always choose the node s y that leads to the smallest r i,j (y).
On each node, the waiting subtasks are sorted by their size (i.e., the required processing time). Based on the shortest job first (SJF) rule, smaller jobs (with less processing time) are processed with higher priority. For example, the processing order of the subtasks t_{i,j} and t_{i',j'} on the node s_y is determined by comparing time_{j,y} and time_{j',y}. Meanwhile, the beginning time of the subtask t_{i,j} (i.e., t_{i,j}.begin) is also the completion time of the subtask that precedes it (i.e., t_{i',j'}.end). Therefore, when calculating the waiting time w_{i,j}(y) of the subtask t_{i,j}, both t_{i,j}.begin and t_{i,j}.arrival can be obtained by using Algorithm 1.
As shown in Algorithm 2, the key steps of the greedy algorithm for scheduling are as follows.
Algorithm 2 The Greedy Algorithm for Scheduling
1: # Find all reachable nodes of the node s_x.
2: for s_y in S do
3:   if v_{x,y} > 0 then
4:     Add s_y into s_x.avl.
5:   end if
6: end for
7: # Calculate the response time of each reachable node.
8: for s_y in s_x.avl do
9:   Call Algorithm 1 to calculate t_{i,j}.begin and t_{i,j}.arrival.
10:  w_{i,j}(y) ← (t_{i,j}.begin − t_{i,j}.arrival).
11:  g_{i,j}(x, y) ← (d_{j−1}/v_{x,y} + r_{x,y}).
12:  r_{i,j}(y) ← (time_{j,y} + g_{i,j}(x, y) + w_{i,j}(y)).
13: end for
14: minNode ← s_x.
15: # Choose the node with the lowest response time.
16: for s_y in s_x.avl do
17:   if r_{i,j}(y) < r_{i,j}(minNode) then
18:     minNode ← s_y.
19:   end if
20: end for

Step 1: Find all reachable nodes s_x.avl of the node s_x. If there exists a network connection between two nodes and the data can be transmitted between them, these two nodes are mutually reachable.
Step 2: Calculate the waiting time w i,j (y), data transmission time g i,j (x, y), and response time r i,j (y) for each reachable node.
Step 3: Choose the node s y with the lowest r i,j (y) to process the subtask t i,j .
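Steps 1-3 can be condensed into a small helper. The function name and the flat-list inputs are ours, and the transmission-time expression (volume divided by rate, plus the link response time) is our reading of g_{i,j}(x, y):

```python
def greedy_pick(x, d_prev, proc, wait, V, R):
    """Among the nodes reachable from node x, pick the one minimising
    processing + transmission + waiting time (the greedy criterion).

    x      -- node currently holding the subtask's input data
    d_prev -- data volume to transfer from the previous layer (Mb)
    proc   -- proc[y]: processing time of this layer on node y
    wait   -- wait[y]: estimated waiting time in node y's queue
    V, R   -- transmission-rate and link-response-time matrices
    """
    best, best_rt = x, proc[x] + wait[x]       # local execution: no transfer
    for y in range(len(proc)):
        if y == x or V[x][y] <= 0:
            continue                            # unreachable node
        g = d_prev / V[x][y] + R[x][y]          # transmission time (our reading)
        rt = proc[y] + g + wait[y]
        if rt < best_rt:
            best, best_rt = y, rt
    return best, best_rt
```

With illustrative numbers (a slow mobile device, a fast edge, and a fast but congested and poorly connected cloud), the edge node wins despite the transfer cost.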

VI. GENETIC ALGORITHM FOR SCHEDULING
The genetic algorithm is considered as a useful meta-heuristic algorithm that can offer high-quality solutions to a wide range of combinatorial optimization problems, including the task scheduling problem [30]- [32].
In this paper, n tasks are divided into (n × m) subtasks according to neural network layers. Each subtask corresponds to a gene locus, and thus there are (n × m) gene loci in total. The numbering of gene loci starts from 0, and the i-th gene locus corresponds to the subtask t_{⌊i/m⌋+1, i%m+1}. For example, the 0th gene locus corresponds to the subtask t_{1,1}, the 1st gene locus corresponds to the subtask t_{1,2}, and the m-th gene locus corresponds to the subtask t_{2,1}. Moreover, the gene at each gene locus represents the node s_x for processing the corresponding subtask. For instance, if the subtask t_{1,1} is processed on the node s_1, the gene at the 0th gene locus is 1. Thus, each gene takes one of k values that correspond to the k nodes.
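The locus-to-subtask mapping can be written as a one-line helper (our naming):

```python
def locus_to_subtask(locus, m):
    """Map a 0-based gene locus to the 1-based subtask indices (i, j):
    locus // m gives the task, locus % m gives the layer."""
    return locus // m + 1, locus % m + 1

# A chromosome is then simply a list of node ids, one gene per locus,
# e.g. chromosome[0] = 1 means subtask t_{1,1} runs on node s_1.
```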
More specifically, each individual u_i in the genetic algorithm is regarded as a potential scheduling scheme, where its average response time (denoted by u_i.time) is calculated by using Algorithm 1. Moreover, the population size is denoted as size, and the average response time of each generation of the population (denoted by aveTime) can be calculated by

$$aveTime = \frac{1}{size} \sum_{i=1}^{size} u_i.time$$

Next, the individuals whose average response time is less than that of the population (i.e., u_i.time < aveTime) will be retained to the next generation. However, the selection operations reduce the population size. To maintain the initial population size, crossover operators are used to expand the offspring population. For example, the individuals u_1 and u_2 are first selected from the remaining individuals according to the roulette selection method [34]. Next, these two individuals are used as the parent generation to perform the single-point crossover [35], and thus their offspring individuals u'_1 and u'_2 are generated. Finally, the above process is repeated until the population size reaches the initial value of size.
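The selection and crossover operators might be sketched as follows; weighting the roulette wheel by the reciprocal of the response time is our choice, since the paper only names the method:

```python
import random

def roulette_pick(individuals, times):
    """Roulette-wheel selection sketch: a lower average response time gives
    an individual a proportionally larger slice of the wheel (1/time
    weighting is our assumption, not stated in the paper)."""
    weights = [1.0 / t for t in times]
    return random.choices(individuals, weights=weights, k=1)[0]

def single_point_crossover(p1, p2):
    """Cut both parent chromosomes at one random point and swap the tails,
    producing two offspring of the same length."""
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
```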
Besides, mutation operations are used to change the genes of the offspring population to increase diversity, so that premature convergence can be avoided [36]. More specifically, a random number in the range of [0, 0.1] is generated and compared with the mutation rate µ. If the random number is less than µ, mutation operations will be performed. When performing the operations, the number of mutated genes (denoted by num) is first randomly generated. Next, num gene loci are randomly selected, and the genes at these loci are changed randomly. In this way, the genetic mutations of organisms in nature are simulated.
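A sketch of this mutation operator (our naming; the [0, 0.1] draw and the randomly re-rolled gene values follow the description above):

```python
import random

def mutate(chromosome, k, mu=0.05):
    """With probability governed by the mutation rate mu, re-roll a random
    number of randomly chosen gene loci to random node ids in [1, k]."""
    child = list(chromosome)
    if random.uniform(0.0, 0.1) < mu:
        num = random.randint(1, len(child))            # number of mutated genes
        for locus in random.sample(range(len(child)), num):
            child[locus] = random.randint(1, k)        # new random node id
    return child
```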
As shown in Algorithm 3, the key steps of the genetic algorithm for scheduling are as follows.
Step 1: Calculate the average response time of each individual (denoted by u i .time) by using Algorithm 1.
Step 2: Find the best individual (denoted by best) with the minimum value of u i .time.
Step 3: Calculate the average response time of the population (denoted by aveTime).

VII. EXPERIMENTS
In this section, five different scenarios of a cloud-edge environment are simulated to evaluate the proposed greedy and genetic algorithms based offloading and scheduling methods for DNN-based applications.

A. EXPERIMENTAL SETTINGS
We implement the cloud-edge simulation environments and the proposed scheduling methods for DNN-based applications based on Python 3.6, where NumPy is used to provide mathematical function libraries for array and matrix operations. As shown in Figure 1, we simulate the cloud-edge environment with different task arrivals. More specifically, Figure 1(a) depicts the node settings, including 4 mobile devices (i.e., m_1, m_2, m_3 and m_4), 2 edge nodes (i.e., e_1 and e_2), and a remote cloud c. The detailed performance metrics of the nodes are shown in Table 1. As for the essential parameters of the proposed methods, we set the population size as 1000, the maximum number of iterations as 500, and the mutation rate as 0.05. In the experiments, a DNN-based application with 7 neural network layers is simulated, where the number of tasks is 12 (numbered from 1 to 12). Thus, there are 84 subtasks in total. Figure 1(b) shows the tasks generated on mobile devices and the arrival time of each task. More specifically, the tasks T_1, T_5, T_9 and T_12 are generated on m_1, the tasks T_2, T_6 and T_10 are generated on m_2, the tasks T_3, T_7 and T_11 are generated on m_3, and the tasks T_4 and T_8 are generated on m_4, respectively. Meanwhile, the tasks on each mobile device arrive at a uniform speed within 1 second.

Algorithm 3 The Genetic Algorithm for Scheduling
1: Initialize the first generation of the population.
2: Call Algorithm 1 to calculate the average response time of each individual (denoted by u_i.time).
3: best ← u_0.
4: # Find the best individual (denoted by best).
5: for i ← 1 to size do
6:   if u_i.time < best.time then
7:     best ← u_i.
8:   end if
9: end for
10: # Calculate the average response time of the population (denoted by aveTime).
11: aveTime ← (Σ u_i.time) / size.
12: # Selection operations.
13: for i ← 1 to size do
14:   if u_i.time < aveTime then
15:     Add u_i into newPopulation.
16:   end if
17: end for
18: # Crossover operations.
19: while len(newPopulation) < size do
20:   Perform the single-point crossover.
21: end while
22: Perform mutation operations.
For example, the 4 tasks on m_1 arrive at a rate of 1/4 (one task arrives per 0.25 seconds). Moreover, the data transmission volume (Mb) between layers is D = {1.2, 0.3, 0.8, 0.2, 0.4, 0.1, 0.05}. Besides, the processing time (ms) of each neural network layer on the nodes is shown below, where each row corresponds to a neural network layer (from 1 to 7) and each column corresponds to a node (m_1, m_2, m_3, m_4, e_1, e_2 and c):

Layer 1: 163 163 163 163 107  81  69
Layer 2:  12  12  12  12  10  10   8
Layer 3: 219 219 219 219 132 109  92
Layer 4:  21  21  21  21  18  16  15
Layer 5: 313 313 313 313 231 185 152
Layer 6:  25  25  25  25  22  18  14
Layer 7: 820 820 820 820 583 394 330

Next, the following five different scenarios of cloud-edge environments are simulated with the above settings.
Scenario 1: The typical scenario. As shown in Table 2, the cloud-edge environment contains 2 edge nodes (i.e., e_1 and e_2) with 2 concurrent lanes each (i.e., p = 2), 4 mobile devices (i.e., m_1, m_2, m_3 and m_4) with p = 1, and a remote cloud c with p = 8. c is connected to all mobile devices and edge nodes with data transmission rates of 400 Kb/s and 600 Kb/s, respectively. e_1 is connected to m_2 and m_3, and e_2 is connected to m_1 and m_4, where the data transmission rate between an edge node and a mobile device is 2 Mb/s. However, there is no connection between different mobile devices, nor between the edge nodes.
Scenario 2: The scenario with limited mobile edges. As shown in Table 3, only one edge node (i.e., e 1 ) is simulated with p = 2, where e 1 is connected to all mobile devices and the remote cloud. Moreover, other settings are the same as Scenario 1.
Scenario 3: The scenario with sufficient mobile edges. As shown in Table 4, 2 edge nodes (i.e., e 1 and e 2 ) are simulated with p = 3. Moreover, other settings are the same as Scenario 1.
Scenario 4: The scenario with alternative mobile edges. As shown in Table 5, 2 edge nodes (i.e., e_1 and e_2) are simulated with p = 2 and p = 3, respectively. e_1 is connected to m_2, m_3 and m_4 with the data transmission rate of 2 Mb/s, while e_2 is connected to m_1, m_2 and m_3 with the data transmission rate of 1.4 Mb/s. Moreover, other settings are the same as Scenario 1.
Scenario 5: The scenario with connected mobile edges. As shown in Table 6, 2 edge nodes (i.e., e 1 and e 2 ) are simulated with p = 2 and p = 3, respectively. Different from other scenarios, e 1 is connected to e 2 with the data transmission rate of 2 Mb/s. Moreover, other settings are the same as Scenario 1.
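For reproducing these settings, the per-layer processing times and inter-layer data volumes could be transcribed as NumPy arrays (our transcription; one mobile-device value in the second row of the processing-time table is garbled in the source and is filled in following the row's constant pattern):

```python
import numpy as np

# Processing time (ms) per DNN layer on each node.
# Columns: m1, m2, m3, m4, e1, e2, c; rows: layers 1..7.
PROC_MS = np.array([
    [163, 163, 163, 163, 107,  81,  69],
    [ 12,  12,  12,  12,  10,  10,   8],   # one mobile value assumed from the pattern
    [219, 219, 219, 219, 132, 109,  92],
    [ 21,  21,  21,  21,  18,  16,  15],
    [313, 313, 313, 313, 231, 185, 152],
    [ 25,  25,  25,  25,  22,  18,  14],
    [820, 820, 820, 820, 583, 394, 330],
])

# Data transmission volume (Mb) between consecutive layers.
D_MB = np.array([1.2, 0.3, 0.8, 0.2, 0.4, 0.1, 0.05])
```

Note that all four mobile-device columns are identical and the cloud column is the smallest, matching the assumption that stronger nodes yield smaller processing times.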

B. EXPERIMENTAL RESULTS
Based on the above settings, we evaluate the performance of the proposed greedy and genetic algorithms based offloading and scheduling methods for DNN-based applications. As shown in Figure 2, the proposed methods are compared with traditional scheduling schemes, including the load balancing scheme (all tasks are evenly offloaded to the remote cloud and nearby edge nodes), the edge scheme (all tasks are offloaded to nearby edge nodes), and the cloud scheme (all tasks are offloaded to the remote cloud). As shown in Table 7, we compare the average response time generated by using the proposed methods with the optimal results under different scenarios.

The above results show that the highest average response time is generated in Scenario 2, where only one edge node is available and task congestion occurs. In Scenario 3, the edge nodes have more concurrent lanes, so task congestion is greatly relieved, leading to less average response time than the typical cloud-edge environment (Scenario 1). In Scenario 4, m_2 and m_3 are connected to edge nodes with different performance. During the scheduling process, tasks tend to be processed on the edge node with better performance, where task congestion occurs. Thus, a higher average response time is generated (second only to Scenario 2). In Scenario 5, the edge nodes are connected, and the number of concurrent lanes of e_1 is smaller than that of e_2. Therefore, tasks tend to be processed on e_2, but they can be scheduled to e_1 when task congestion occurs on e_2. Consequently, the congestion is relieved, and the average response time generated in Scenario 5 is less than in Scenario 4. However, it is still higher than in Scenario 1 due to the data transmission delay between the edge nodes.
In each scenario, the average response time generated by using the proposed methods is close to the optimal result, and they outperform other scheduling schemes. More specifically, the edge and cloud schemes result in much data transmission delay and task queuing, and thus their average response time is much larger than other schemes. As edge nodes are close to users, the average response time generated by using the edge scheme is less than the cloud scheme. In Scenario 2, there is only one edge node with limited computational capability, and thus more average response time is generated by using the edge scheme. In Scenario 3, there are two edge nodes with better performance, and thus the average response time generated by using the edge scheme is much less than the cloud scheme. Compared with the load balancing scheme, the proposed methods achieve less average response time. This is because that the load balancing scheme does not consider the data transmission delay. If the data transmission time is more than the waiting time, the response time generated by using this scheme will be increased. By contrast, the proposed methods consider the processing time, data transmission time, and waiting time. Therefore, the response time of offloading and scheduling for DNN-based applications in cloud-edge environments can be effectively reduced.
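The distinction drawn above between the load balancing scheme and the proposed methods can be illustrated with a toy response-time model (all numbers and parameters below are illustrative, not values from the experiments):

```python
# Toy model: response time = data transmission time + waiting time
# + processing time. All figures below are illustrative.

def response_time(data_mb, rate_mbps, waiting_s, proc_s):
    """Estimated response time of one task on one node (seconds)."""
    transmission_s = data_mb / rate_mbps  # data in Mb, rate in Mb/s
    return transmission_s + waiting_s + proc_s

# A remote cloud: fast processor and an empty queue, but a slow uplink.
cloud = response_time(data_mb=20, rate_mbps=2, waiting_s=0.0, proc_s=0.5)
# A nearby edge: slower processor and a short queue, but a fast link.
edge = response_time(data_mb=20, rate_mbps=50, waiting_s=1.0, proc_s=2.0)

# Ignoring transmission (as the load balancing scheme effectively does)
# would favour the cloud (0.5 s vs 3.0 s), yet the full model shows the
# edge is actually faster (3.4 s vs 10.5 s).
```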
As shown in Table 8, we compare the running time of the genetic and greedy algorithms under different scenarios. Although the genetic algorithm produces a lower average response time than the greedy algorithm (as shown in Table 7), its running time is much longer. Moreover, when the total number of tasks is fixed, the running time of the two algorithms remains almost unchanged across the different scenarios.
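The running-time gap follows from how the two algorithms search: the greedy method examines each task once, while a genetic algorithm evaluates an entire population of candidate schedules over many generations. A minimal illustrative sketch (the population size, generation count, and placeholder fitness below are our own assumptions, not the paper's configuration):

```python
import random

def evaluate(schedule):
    """Placeholder fitness; in the paper this role is played by the
    scheme evaluation algorithm computing average response time."""
    return sum(schedule)  # illustrative only: lower is better

def genetic_search(num_tasks, num_nodes, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    # A candidate schedule maps each task index to a node index.
    pop = [[rng.randrange(num_nodes) for _ in range(num_tasks)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=evaluate)                 # selection: keep the fittest half
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, num_tasks)  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.1:             # mutation
                child[rng.randrange(num_tasks)] = rng.randrange(num_nodes)
            children.append(child)
        pop = survivors + children
    return min(pop, key=evaluate)
```

With pop_size × generations fitness evaluations against a single greedy pass, the longer running time reported in Table 8 is expected.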
Furthermore, the running time of the greedy and genetic algorithms is evaluated by varying the total number of tasks in Scenario 1. As shown in Figure 3, the running time of both algorithms grows linearly with the total number of tasks.
In general, the two algorithms have their own pros and cons: the greedy algorithm needs less running time, while the genetic algorithm produces better scheduling results. Moreover, the cloud-edge environment may change with the mobility of mobile devices. In such an unstable environment, the greedy algorithm is a better choice than the genetic algorithm because it provides faster decision making.
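A minimal sketch of a greedy scheduler of this kind, assuming a cost model of transmission + waiting + processing time (the node parameters and task sizes are illustrative, and this is not the paper's exact algorithm):

```python
def greedy_schedule(tasks, nodes):
    """Assign each task to the node with the smallest estimated finish time.

    tasks: list of dicts {"data_mb": float}
    nodes: dict name -> {"rate_mbps": float, "proc_s": float, "busy_until": float}
    """
    assignment = {}
    for i, task in enumerate(tasks):
        best_node, best_cost = None, float("inf")
        for name, n in nodes.items():
            # transmission + waiting (queue) + processing
            cost = task["data_mb"] / n["rate_mbps"] + n["busy_until"] + n["proc_s"]
            if cost < best_cost:
                best_node, best_cost = name, cost
        # Occupy the chosen node: its queue grows by the processing time.
        nodes[best_node]["busy_until"] += nodes[best_node]["proc_s"]
        assignment[i] = best_node
    return assignment

# Illustrative run: the fast nearby edge absorbs tasks until it congests,
# after which the next task spills over to the remote cloud.
nodes = {
    "edge":  {"rate_mbps": 50, "proc_s": 2.0, "busy_until": 0.0},
    "cloud": {"rate_mbps": 2,  "proc_s": 0.5, "busy_until": 0.0},
}
assignment = greedy_schedule([{"data_mb": 20}] * 6, nodes)
```

Each task is examined once against each node, so for a fixed set of nodes the running time grows linearly with the number of tasks, consistent with the trend in Figure 3.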

VIII. CONCLUSION AND FUTURE WORK
In this paper, we first formulate the problem of computation offloading and task scheduling for DNN-based applications in cloud-edge environments and design a scheme evaluation mechanism. Next, greedy and genetic algorithm based methods are proposed to efficiently explore suitable schemes. Extensive experiments are conducted to verify the effectiveness of the proposed methods in different scenarios of cloud-edge environments. The results show that the genetic algorithm leads to a lower average response time than the other scheduling schemes but needs more running time than the greedy algorithm. Therefore, the two proposed algorithms are suitable for different scenarios with diverse objectives. For example, the genetic algorithm is more suitable for offline tasks, since these tasks are not sensitive to the training time while the genetic algorithm can promise better application performance. By contrast, the greedy algorithm would be a better choice when dealing with online tasks, because these tasks require fast decision-making ability, which is the advantage of the greedy algorithm. In the future, we will continue this research with learning-based methods, such as reinforcement learning, to better balance the performance and overheads of the algorithms.

GEYONG MIN received the B.Sc. degree in computer science from the Huazhong University of Science and Technology, China, in 1995, and the Ph.D. degree in computing science from the University of Glasgow, U.K., in 2003. He is currently a Professor of high performance computing and networking with the Department of Computer Science, College of Engineering, Mathematics, and Physical Sciences, University of Exeter, U.K. His research interests include the future Internet, computer networks, wireless communications, multimedia systems, information security, high-performance computing, ubiquitous computing, modeling, and performance engineering.

VOLUME 8, 2020