An Energy-Efficient Off-Loading Scheme for Low Latency in Collaborative Edge Computing

Applications on mobile terminals such as smartphones and laptops generate frequent, computationally demanding tasks, yet the terminals have limited battery power. Edge computing is introduced to offload terminals’ tasks so that quality-of-service requirements such as low delay and low energy consumption can be met. By offloading computation tasks, edge servers enable terminals to collaboratively run highly demanding applications within acceptable delay. However, existing schemes rarely consider the characteristics of the edge servers, so tasks are assigned among servers more or less at random, and tasks with high computational intensity (named “big tasks”) may be assigned to servers with low capability. In this paper, a task is divided into several subtasks, and the subtasks are offloaded according to characteristics of the edge servers such as transmission distance and central processing unit (CPU) capacity. Based on this multi-subtasks-to-multi-servers model, a low-complexity adaptive offloading scheme based on the Hungarian algorithm is proposed. Extensive simulations show that the scheme reduces offloading latency while keeping energy consumption low.


I. INTRODUCTION
Mobile terminal devices such as smartphones, laptops, sensors, machines, and vehicles are connected through the Internet to support many different applications and services [1]. To extract valuable information from the huge amount of user data, local computation on terminal devices can no longer provide the required quality of service, such as low latency and low energy consumption [2], [3], especially for video and image stream processing [4]-[6]. In vehicular networks, highly latency-sensitive tasks require short processing times; otherwise, message propagation among vehicles may fail [7]. Therefore, lightweight servers are deployed at the edge, close to terminals, to bring computation and storage resources from the centralized cloud (CC); this is called Mobile Edge Computing (MEC) [8]. Tasks generated by terminals can be offloaded to and processed on edge servers [9], [10] instead of being transferred to the CC with large delay, so tasks and applications can effectively meet their delay requirements [11]-[13]. As privacy and security become more important in daily life [14]-[17], low delay is also particularly important for privacy and security in mobile edge computing systems.
The associate editor coordinating the review of this manuscript and approving it for publication was Parul Garg.
Much research has been devoted to reducing the offloading time and energy consumption of edge computing. A growing number of studies consider differences among tasks, such as computation-intensive tasks and delay-sensitive tasks. For these scenarios, relationships between servers, such as hierarchical server arrangements, are also considered; this is called collaborative edge computing (CEC). CEC allows multiple servers to collaboratively process different types of offloaded tasks to efficiently reduce time and energy consumption.
However, previous work focused only on how to offload different types of tasks. The differences in computational intensity caused by different types of demand are rarely taken into account in existing designs. Meanwhile, characteristics of servers, such as physical distance and CPU capacity, are not considered. If tasks with high computational intensity, which we name ''big tasks'', are assigned to servers with low capability or long task-processing queues, the offloading delay becomes very large and the whole offloading process is choked.
In this paper, we focus on the relationship between tasks and servers and on the characteristics of edge servers, and we allow a task to be divided into several subtasks that are offloaded to different servers. An adaptive offloading scheme based on the Hungarian algorithm is designed to allocate subtasks to edge servers so as to reduce offloading latency and energy consumption. The main contributions of this paper are as follows:
1) A collaborative task offloading model is proposed for edge computing systems.
2) We formulate the task offloading problem and design a distributed task offloading scheme to solve it.
3) Extensive simulations are designed to evaluate the performance of the scheme.
The rest of this article is organized as follows. Section II introduces related work. Section III introduces the offloading model. The offloading scheme is presented in Section IV. In Section V, simulation results are illustrated and discussed. Section VI concludes the paper.

II. RELATED WORK
Offloading is widely adopted to achieve low latency and low energy consumption in edge computing. Existing offloading methods can be summarized into three categories: partial offloading, fully offloading, and preference offloading.

A. PARTIAL OFFLOADING
Because the storage and computation capacity of servers is limited, a task is divided into two or more parts to guarantee that each task can be executed within one time slot: one part is processed at the local server, and the others are processed remotely [18]-[20]. By dividing a task reasonably, the execution time or energy of the whole task can be effectively reduced. Shurman and Aljarah [21] proposed a collaborative method of distributing edge resources for pre-partitioned application modules, which maximized the utilization of edge resources. This method led to lower latency and less network traffic than executing modules in the cloud, providing users with faster service delivery and reducing core network traffic. Wu et al. [22] proposed a dynamic task partitioning algorithm that determines the optimal allocation of tasks performed locally or remotely. A weighted resource consumption map (WRCM) was constructed and a Minimum Cost Offloading Partition (MCOP) algorithm was further proposed. The adaptive partitioning was briefly analyzed using a program profiler and a network profiler. The algorithm can effectively reduce network overhead and can adapt when the network changes. He et al. [23] studied how to improve the computation capacity of cellular networks. A device-to-device mobile edge computing (D2D-MEC) technology was proposed and a mixed integer nonlinear problem was formulated. This problem can be divided into two sub-problems: the first minimizes the computation resources needed by a given edge D2D pair, and the second, based on the solution of the first, maximizes the number of devices the system can support. By solving these two sub-problems, the computational capacity was effectively improved.

B. FULLY OFFLOADING
Fully offloading means that the whole task is offloaded to a device or a server that can process the task within a reasonable latency. Chen et al. [24] analyzed the multi-channel, multi-user computation offloading decision problem in edge computing systems. They proved that this problem always admits a Nash equilibrium, and designed a distributed computation offloading algorithm that achieves the Nash equilibrium and effectively reduces time consumption. Wei et al. [25] investigated the scenario in which multiple mobile phones upload tasks to an MEC server, and the challenge of allocating limited server resources and wireless channels among devices. A Select Maximum Saved Energy First (SMSEF) algorithm was proposed to solve the energy optimization problem of mobile devices with dividable tasks. Xing et al. [26] studied a new D2D multi-assistant MEC system. The authors employed a time division multiple access (TDMA) transport protocol in which local users offload tasks to multiple assistants and download results from them at predetermined orthogonal intervals, which reduced computational latency, the computational frequency of task execution, and algorithm complexity. Zhang et al. [27] proposed a contract-based computation resource allocation method, which improved the utility of vehicle terminals in mobile edge computing. Xu et al. [28] explored a resource allocation method in the Internet of Things (IoT) environment. A model named Zenith was proposed to establish auction-based resource allocation contracts, and a task scheduling model was developed according to specific tasks. Kim et al. [29] studied the resource management of mobile devices in a tradeoff environment. A series of short-term goals was obtained using Lyapunov optimization, and an optimization algorithm was proposed. Chen et al. [30] treated the task offloading problem as a stochastic optimization problem and used a stochastic optimization technique to transform the random problem into a deterministic optimization problem. Zhang et al. [31] studied the trade-off between energy consumption and time consumption in edge computing. An energy-aware offloading scheme was proposed to jointly optimize communication and computing resources under limited energy and delay sensitivity, reducing both time and energy.

C. PREFERENCE OFFLOADING
Some research focuses on offloading tasks according to certain preferences. Jiang et al. [32] studied the relationship among content popularity, user preference, and task offloading in edge computing. The popularity is predicted in the online phase and the user's preferences are learned in the offline phase. Through the ''Follow The (Proximally) Regularized Leader'' (FTRL-Proximal) algorithm and the Online Gradient Descent (OGD) algorithm, the cache-hit rate is improved. This method can reduce the computational complexity and optimize the edge cache problem.
VOLUME 7, 2019
In this paper, we focus on partial task assignment in edge computing systems, but our work differs from the works above, which did not consider that a task can be divided into several subtasks or exploit the synergy among servers. In the proposed scheme, the local server divides big tasks into subtasks and assigns them to neighbor servers using an assignment scheme based on the Hungarian algorithm, which takes the characteristics of the servers into account.

III. SYSTEM MODEL
Consider a mobile edge system consisting of $M$ pieces of user equipment and $N$ edge servers. Edge servers connect with each other through cellular links. For each user, the nearest server that can provide the task offloading service is called the local server, while servers that are one hop away from the local server are called its neighbor servers. We assume that each user generates only one task per time slot, and the $i$th task is described by a triple $\langle D_i, C_i, T_i^{tolerate} \rangle$, where $D_i$ is the data size, $C_i$ is the number of CPU cycles required to successfully process the $i$th task, and $T_i^{tolerate}$ is the time tolerance of the $i$th task. Each task can be divided into several subtasks, which is especially natural for video and image streaming tasks. The subtasks can then be distributed to different servers for processing [22]. This assumption also works for interdependent tasks, because such tasks can be partially offloaded: one part of the subtasks is processed locally and the rest is processed remotely [20], [22], [33]-[35].
Servers provide the computation offloading service to users and share computation resources with other edge servers to overcome the limited capacities of user equipment and of individual edge servers. Servers periodically exchange their own information with their neighbors through cellular links. The information of server $j$ is presented as a triple $\langle f_j, d_j, w_j \rangle$, $j \in \{1, 2, \cdots, N\}$, which describes the characteristics of the server: $f_j$ is the CPU capacity of server $j$, $d_j$ is the transmission distance to server $j$, and $w_j$ is the waiting time if a subtask needs to be executed on server $j$.
A local server is considered to be connected with $k$ neighbor servers, and it divides each task into $(k + 1)$ subtasks. One subtask is left on the local server and the others are assigned to the $k$ neighbor servers. The information of each subtask $u$ is presented as a triple $\langle D_u, C_u, T_u^{tolerate} \rangle$, where $D_u$ is the data size, $C_u$ is the number of CPU cycles required, and $T_u^{tolerate}$ is the time tolerance of the subtask.

A. TIME CONSUMPTION
In practice, the latency mainly consists of three parts: transmission latency, computation latency, and queuing latency. In this work, transmission latency is the time spent transferring a subtask from the local server to a neighbor server. Computation latency is the processing time of the subtask on the server. Queuing latency is the time a subtask waits on a server before it is processed.
In this work, orthogonal multiple access (OMA) based communication is adopted between edge servers. OMA is the communication technology of the present fourth generation (4G) and is widely used in daily life, whereas non-orthogonal multiple access (NOMA) based communication for the fifth generation (5G) is still in the research and development stage. Thus the signal interference during transmission can be ignored [34]. According to the Shannon formula [35], the data rate between the local server and neighbor server $j$ is given as follows.
$$r_{u,j} = B \log_2\left(1 + \frac{p^{tra}_{u,j}\, h_{u,j}}{N_0}\right), \tag{1}$$
where $r_{u,j}$ denotes this data rate, $B$ is the network bandwidth, $N_0$ is the background noise power, $p^{tra}_{u,j}$ is the transmission power, and $h_{u,j}$ is the channel power gain between subtask $u$ and neighbor server $j$. Then, the transmission latency for subtask $u$ to edge server $j$ can be presented as follows:
$$T^{tra}_{u,j} = \frac{D_u}{r_{u,j}}, \tag{2}$$
where $D_u$ is the data size of subtask $u$. If a subtask is processed on the local server, the transmission delay is zero. Note that the download transmission delay and packet loss of the subtask are not considered, because the size of a subtask shrinks sharply after it is processed [24]. The computation latency when subtask $u$ is executed on edge server $j$ depends on the server's computing capacity and can be described as follows [20], [29], [33]-[36]:
$$T^{com}_{u,j} = \frac{C_u}{f_j}, \tag{3}$$
where $C_u$ is the number of CPU cycles required by subtask $u$ and $f_j$ is the CPU capacity of server $j$.
By adding the three types of latency, the total latency $T^{total}_{u,j}$ of subtask $u$ allocated to server $j$ is constructed as follows:
$$T^{total}_{u,j} = T^{tra}_{u,j} + T^{com}_{u,j} + w_j, \tag{4}$$
where $w_j$ is the waiting time when the subtask is executed on edge server $j$. As mentioned above, one subtask is left on the local server for execution to make full use of the limited resources of the local server. Thus $T^{tra}_{u,j} = 0$ if the subtask is executed on the local server, because it does not need to be transferred to a neighbor server.
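The latency model of this subsection can be exercised numerically. The following is a minimal Python sketch, assuming the Shannon-rate form and the three latency terms described above; the function names and every numeric value in the usage lines are illustrative assumptions, not settings taken from the paper.

```python
import math


def data_rate(bandwidth_hz, tx_power_w, channel_gain, noise_power_w):
    """Shannon rate in bit/s between the local server and a neighbor server."""
    return bandwidth_hz * math.log2(1.0 + tx_power_w * channel_gain / noise_power_w)


def total_latency(data_bits, cpu_cycles, cpu_hz, wait_s, rate_bps=None):
    """Transmission + computation + queuing latency of one subtask, in seconds.

    rate_bps=None marks the local server, where nothing is transmitted.
    """
    t_tra = 0.0 if rate_bps is None else data_bits / rate_bps
    t_com = cpu_cycles / cpu_hz
    return t_tra + t_com + wait_s


# Illustrative numbers only (assumed, not the paper's simulation settings).
rate = data_rate(bandwidth_hz=20e6, tx_power_w=1.0, channel_gain=1.0, noise_power_w=1.0)
remote = total_latency(data_bits=2e6, cpu_cycles=1e9, cpu_hz=2e9, wait_s=0.1, rate_bps=rate)
local = total_latency(data_bits=2e6, cpu_cycles=1e9, cpu_hz=1e9, wait_s=0.0)
print(f"rate = {rate:.0f} bit/s, remote = {remote:.2f} s, local = {local:.2f} s")
```

In this toy setting the faster neighbor server finishes sooner than the local server even after paying the transmission and queuing cost, which is the kind of trade-off the assignment phase has to weigh.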

B. ENERGY CONSUMPTION
In this work, energy consumption is the major concern to be addressed. It is mainly composed of two parts: the energy consumed during transmission, and the energy consumed while the server processes tasks.
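On the one hand, the energy consumption during transmission is as follows:
$$E^{tra}_{u,j} = p^{tra}_{u,j}\, T^{tra}_{u,j}, \tag{5}$$
where $p^{tra}_{u,j}$ is the transmission power and $T^{tra}_{u,j}$ is the transmission latency of subtask $u$ to server $j$. On the other hand, the energy consumption of processing tasks on the server is as follows [20]:
$$E^{com}_{u,j} = p^{com}_j\, T^{com}_{u,j}, \tag{6}$$
where $p^{com}_j$ is the energy consumption per second of server $j$ and $T^{com}_{u,j}$ is the computation latency of subtask $u$ on server $j$. By adding the two types of energy consumption, the total energy consumption is obtained as follows:
$$E^{total}_{u,j} = E^{tra}_{u,j} + E^{com}_{u,j}. \tag{7}$$
If the subtask is executed on the local server, then $E^{tra}_{u,j} = 0$ because the subtask does not need to be transferred to a neighbor server.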
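A subtask has different energy consumption when it is executed on different edge servers, which differ in computation capacity, distance, transmission power, and CPU cost. The energy consumption matrix $G^{energy}_i$ for the subtasks of task $i$ is constructed as shown in expression (8):
$$G^{energy}_i = \begin{bmatrix} E^{total}_{1,LS} & E^{total}_{1,NS_1} & \cdots & E^{total}_{1,NS_k} \\ E^{total}_{2,LS} & E^{total}_{2,NS_1} & \cdots & E^{total}_{2,NS_k} \\ \vdots & \vdots & \ddots & \vdots \\ E^{total}_{k+1,LS} & E^{total}_{k+1,NS_1} & \cdots & E^{total}_{k+1,NS_k} \end{bmatrix} \tag{8}$$
Here $E^{total}_{u,j}$ stands for the energy consumption when subtask $u$ is executed on edge server $j$. Each row gives the energy consumption of one subtask when it is offloaded to each of the edge servers, where the local server is denoted as LS and the $j$th neighbor server is denoted as $NS_j$. Each column gives the energy consumption of the different subtasks on the same edge server.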
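To make the construction of this matrix concrete, here is a minimal Python sketch that fills a square energy matrix with column 0 standing for the local server LS and the remaining columns for the neighbor servers. It assumes that transmission energy is transmission power times transmission time and that computation energy is the per-second consumption of the server times computation time; the `Server` and `Subtask` containers and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    cpu_hz: float              # f_j, CPU capacity in cycles per second
    comp_power_w: float        # p_j^com, energy consumed per second of computing
    rate_bps: Optional[float]  # link rate from the local server; None for LS itself
    tx_power_w: float = 0.0    # p^tra, transmission power toward this server


@dataclass
class Subtask:
    data_bits: float   # data size of the subtask
    cpu_cycles: float  # CPU cycles needed to process it


def energy(sub: Subtask, srv: Server) -> float:
    """Total energy of running `sub` on `srv`: transmission plus computation."""
    t_tra = 0.0 if srv.rate_bps is None else sub.data_bits / srv.rate_bps
    t_com = sub.cpu_cycles / srv.cpu_hz
    return srv.tx_power_w * t_tra + srv.comp_power_w * t_com


def energy_matrix(subtasks: List[Subtask], servers: List[Server]) -> List[List[float]]:
    """Row u holds subtask u on every server; column j holds every subtask on server j."""
    return [[energy(sub, srv) for srv in servers] for sub in subtasks]


# Hypothetical setup: server 0 is the local server, servers 1-2 are neighbors.
servers = [
    Server(cpu_hz=1e9, comp_power_w=10.0, rate_bps=None),
    Server(cpu_hz=2e9, comp_power_w=8.0, rate_bps=2e7, tx_power_w=4.0),
    Server(cpu_hz=1.5e9, comp_power_w=6.0, rate_bps=1e7, tx_power_w=4.0),
]
subtasks = [Subtask(2e6, 1e9), Subtask(1e6, 2e9), Subtask(3e6, 1.5e9)]
G = energy_matrix(subtasks, servers)
for row in G:
    print([round(e, 3) for e in row])
```

The first column never contains a transmission term, which mirrors the statement that a subtask kept on the local server incurs no transmission energy.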

C. OPTIMIZATION GOAL
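A vector $X = (x_1, \ldots, x_u, \ldots, x_{k+1})$ is used to indicate which server subtask $u$ of task $i$ is assigned to: $x_u = 0$ indicates that subtask $u$ is executed on the local server, and $x_u = j$ indicates that subtask $u$ is executed on neighbor server $j$. Then we can formulate the optimization goal as follows:
$$\min_{X} \sum_{u=1}^{k+1} E^{total}_{u,x_u} \tag{9a}$$
$$\text{s.t.} \quad \sum_{u=1}^{k+1} T^{total}_{u,x_u} < T^{tolerate}_i, \tag{9b}$$
$$x_u \in \{0, 1, \ldots, k\}, \quad u = 1, \ldots, k+1, \tag{9c}$$
$$\frac{p^{tra}_{u,x_u}\, h_{u,x_u}}{N_0} > \gamma_{th}, \tag{9d}$$
where constraint (9b) indicates that the total time consumption of all subtasks should be less than the time tolerance of the task. Constraint (9c) indicates that a subtask can only be assigned among the $k$ neighbor servers and the local server. Constraint (9d) indicates that the signal-to-noise ratio (SNR) must be higher than a threshold value, written here as $\gamma_{th}$, to ensure successful transmission.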
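As a sanity check on the formulation, the sketch below evaluates a candidate assignment vector against the energy objective and the latency constraint, and searches all one-subtask-per-server assignments exhaustively. It is only an illustration under stated assumptions: the SNR constraint is not modeled, each server is assumed to take exactly one subtask, and the function names and matrices are hypothetical. Exhaustive search grows factorially and is shown only to make the objective concrete for very small $k$.

```python
from itertools import permutations


def evaluate(x, energy, latency, t_tolerate):
    """Return (total energy, feasible) for assignment x, where x[u] is the server of subtask u.

    Index 0 is the local server and 1..k are the neighbor servers.
    """
    total_energy = sum(energy[u][j] for u, j in enumerate(x))
    total_time = sum(latency[u][j] for u, j in enumerate(x))
    return total_energy, total_time < t_tolerate


def brute_force(energy, latency, t_tolerate):
    """Exhaustively search one-subtask-per-server assignments; only viable for small k."""
    best_x, best_e = None, float("inf")
    for x in permutations(range(len(energy))):
        e, ok = evaluate(x, energy, latency, t_tolerate)
        if ok and e < best_e:
            best_x, best_e = x, e
    return best_x, best_e


# Hypothetical 3 x 3 matrices (rows: subtasks, columns: LS, NS_1, NS_2).
E = [[4.0, 1.0, 3.0], [2.0, 0.5, 5.0], [3.0, 2.0, 2.0]]
T = [[1.0, 0.6, 0.9], [0.8, 0.4, 1.2], [0.9, 0.7, 0.5]]
print(brute_force(E, T, t_tolerate=2.5))
```

When no assignment satisfies the latency constraint, `brute_force` returns `(None, inf)`, which signals that the task cannot be served within its tolerance under this toy model.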

IV. ALGORITHM DESIGN
In this section, we propose an energy-overhead-optimized task offloading scheme for multiple subtasks and multiple edge servers. The task offloading scheme consists of two phases: (1) the task division phase and (2) the subtask assignment phase. In the first phase, a task is divided according to the number of neighbor servers; in the second phase, we focus on the assignment strategy. Next, we illustrate how a local server assigns subtasks to its neighbor servers.

A. ILLUSTRATION OF ALGORITHM
As shown in Figure 2, when a task arrives at the local server, it may be divided locally into several subtasks, depending on the computational intensity of the task and the computing capability of the local server. If the local server is busy with computation or has a long task-processing queue, subtasks are assigned to selected neighbor servers.
In the task division phase, the number of subtasks is determined by the number of neighbor servers to make full use of their resources. At least one subtask is kept on the local server to take full advantage of the local server's resources.
After the task is divided into subtasks on the local server, the second phase assigns the subtasks to the neighbor servers, considering several realistic factors including the local server's resource limits, physical distance, and server CPU capacity. Note that subtasks assigned to different neighbor servers have different transmission times and energy consumption, because the physical distance and transmission power differ from one neighbor server to another in the real world. For example, the distance between the local server and a selected neighbor server may be very long while the computing capacity of that neighbor server is very high [28]. Conversely, the distance may be short while the computing capacity of the neighbor server is low. Therefore, neighbor servers must be selected carefully to effectively reduce the energy consumption of the entire task.

B. SOLUTION
Since different subtasks are processed on different servers with different time and energy consumption, it is critical to decide which subtask is assigned to which server so that the energy is minimized. After the first phase, two sets are produced: the subtask set and the neighbor server set. To determine the best server for each subtask, a Multi-subTasks-to-Multi-Servers (MTMS) offloading scheme based on the Hungarian algorithm is proposed to assign subtasks during the subtask assignment phase.
Because edge servers broadcast information about themselves, including CPU frequency, position, and network information, each local server maintains and updates information about its neighbor servers. After the division phase, the local server decides to assign subtasks to neighbor servers. The local server calculates the transmission time and execution time of each subtask on each server, then forms an energy overhead matrix by adding the transmission energy and the computation energy of each subtask on each server. After that, the Hungarian algorithm is used to make the optimal subtask assignment decision. Finally, the local server assigns the subtasks to the corresponding servers. The detailed description is presented in Algorithm 1.
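The assignment step on the energy overhead matrix can be sketched in plain Python. The block below is a textbook potentials-based $O(n^3)$ variant of the Hungarian algorithm for a square cost matrix, given as an illustration rather than as the authors' implementation; the function name `hungarian` and the sample matrix are assumptions.

```python
def hungarian(cost):
    """Minimum-cost perfect assignment on a square matrix.

    Returns (assignment, total) where assignment[u] is the column (server)
    chosen for row (subtask) u. Rows and columns are 1-indexed internally,
    with index 0 acting as a sentinel.
    """
    n = len(cost)
    if any(len(row) != n for row in cost):
        raise ValueError("cost matrix must be square")
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (n + 1)   # column potentials
    p = [0] * (n + 1)     # p[j] = row currently matched to column j
    way = [0] * (n + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached a free column: augmenting path found
                break
        while j0 != 0:       # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    total = sum(cost[r][assignment[r]] for r in range(n))
    return assignment, total


# Hypothetical energy overhead matrix (rows: subtasks, columns: LS, NS_1, NS_2).
A = [[4.0, 1.0, 3.0], [2.0, 0.0, 5.0], [3.0, 2.0, 2.0]]
assignment, total = hungarian(A)
print(assignment, total)
```

Where SciPy is available, `scipy.optimize.linear_sum_assignment` solves the same assignment problem and may be preferable to a hand-written routine.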

C. STABILITY AND COMPLEXITY ANALYSIS
Given a subtask set and a neighbor server set, the MTMS algorithm is stable if each subtask can be successfully offloaded to a server. We assume that a server accepts only one subtask in a time slot, and the number of subtasks to be assigned is equal to the number of neighbor servers plus one (the local server). According to the Hungarian algorithm, each subtask will therefore be assigned to one server.
Algorithm 1 (excerpt)
        choose a subtask u from the subtask set
        choose the edge server w that has the minimum energy consumption when executing subtask u, according to the overhead matrix A
        if w has not accepted any subtask then
            assign subtask u to w
        else  // w has accepted subtask u'
            if the energy consumption of subtask u' is lower than that of subtask u on w then
                subtask u is not assigned to w
            else
                w discards subtask u' and accepts subtask u
                subtask u' returns to the subtask set
            end if
        end if
    end while
    assign subtasks to servers according to the subtask assignment decision
end for
The other major advantage of the MTMS algorithm is its low-degree polynomial complexity [37]. From a practical point of view, the Hungarian algorithm has produced solutions to many industrial problems that were hitherto intractable. A task is divided into $(n + 1)$ subtasks, and for each subtask the transmission and execution energy must be computed on every server; the overhead matrix is then formed by adding the transmission and execution energy, so the time complexity of constructing the overhead matrix is $O(n^2)$. The time complexity of decision making with the Hungarian algorithm is $O(n^3)$. Thus the time complexity of MTMS is $O(n^3)$, where $n + 1$ is the number of subtasks.

V. PERFORMANCE EVALUATION
In this section, the performance of the multi-server task offloading scheme is investigated. We compare MTMS with a non-assignment scheme, a greedy assignment scheme [25], and a random assignment scheme:
1) Non-assignment scheme: tasks are stored and processed on the local server.
2) Greedy assignment scheme [25]: when the task is not processed on the local server, the whole task is offloaded to the neighbor server with the largest CPU capacity.
3) Random assignment scheme: a task is divided into several subtasks, which are randomly assigned to multiple neighbor servers for processing. This is currently the most common and widely used task offloading method in industry.
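The server-selection rule of each baseline can be sketched as follows. This is a hedged illustration only: index 0 is taken to be the local server, the random scheme is assumed to place one subtask per server, and all function names and values are hypothetical rather than taken from the simulation code.

```python
import random

LOCAL_SERVER = 0  # assumed convention: index 0 is the local server


def non_assignment():
    """The whole task stays on the local server."""
    return LOCAL_SERVER


def greedy_assignment(cpu_hz):
    """The whole task goes to the neighbor (index >= 1) with the largest CPU capacity."""
    return max(range(1, len(cpu_hz)), key=lambda j: cpu_hz[j])


def random_assignment(num_subtasks, rng):
    """Subtask u goes to entry u of a random permutation of the servers.

    One subtask per server is an assumption made for this sketch.
    """
    servers = list(range(num_subtasks))
    rng.shuffle(servers)
    return servers


# Hypothetical CPU capacities in Hz: local server first, then three neighbors.
capacities = [1.0e9, 1.2e9, 2.0e9, 1.5e9]
print(non_assignment(), greedy_assignment(capacities))
print(random_assignment(len(capacities), random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the random baseline reproducible across runs, which is convenient when comparing schemes on the same task set.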
We ran the simulations on a Windows computer with a dual-core CPU, 4 GB of memory, and 200 GB of external storage. Ten servers are placed in a 50 m × 50 m area. Without loss of generality, we follow the parameter settings of existing work [20]. The input data size of a task and the number of CPU cycles it requires are drawn from [200, 400] kB and $[1 \times 10^{9}, 5 \times 10^{9}]$, respectively, and each server's CPU capacity is randomly selected from [1, 2] GHz. The bandwidth is set to 20 MHz, the transmission power $p^{tra}_i$ is set to 36 dBm, the receiving noise power $N_0$ is set to $2 \times 10^{9}$, and the channel power gain $p^{gain}_{i,j}$ is set as $-40d^{-4}$, where $d$ is the distance between servers. The energy consumption per second $p^{com}_j$ is set within [5, 20] W.
Figure 3(a) shows that the total delay of the greedy assignment method is 40.39% lower than that of the non-assignment method. Compared with the greedy assignment method, the delay of the random assignment method is reduced by 80.14%. This is not surprising because, as in our method, the task is divided into multiple subtasks that are assigned to multiple servers for distributed processing. Compared with the random assignment method, the delay of the MTMS method is reduced by 40.93%. This is because MTMS assigns subtasks based on the minimum energy consumption on the servers, so its processing time is also lower than that of random assignment. The figure also shows that, as the number of tasks increases, the optimization method reduces the total delay by a growing margin. Therefore, the proposed method is suitable for large-scale task assignment scenarios.
When the number of neighbor servers is fixed, the task execution delay is affected by the size of the task's input data. As shown in Figure 3(b), the input data size increases from 400 kB to 800 kB and the number of CPU cycles required by the tasks increases accordingly. Compared with the non-assignment method, the total delay of the greedy assignment method is reduced by 41.12%. Compared with the greedy method, the random method reduces the delay by 83.26%. Furthermore, compared with the random assignment method, the delay of the optimization method is reduced by 30.21%. This means that our approach is well suited to handling large tasks and can be adapted to the upcoming 5G era.
As can be seen from Figure 4, the average delay of greedy assignment is reduced by 39.87% compared with the non-assignment method. Compared with the greedy assignment method, the average delay of random assignment is greatly reduced. Moreover, the average time of the optimized assignment is 40.91% less than that of random assignment. Figure 4 also shows that the time efficiency of the random assignment method and the optimized assignment method is much higher than that of the non-assignment method and the greedy assignment method, which shows that dividing tasks into multiple subtasks effectively reduces the time consumption and that the MTMS algorithm has good time efficiency.
When the task size and the CPU cycles required to complete the task are fixed, the total latency gradually decreases as the number of servers increases. As can be seen from Figure 5, the delay of the greedy assignment method is reduced by 44.32% compared with the non-assignment method, and the delay of the random assignment method is greatly reduced compared with the greedy assignment method. Compared with the random assignment method, the delay of the MTMS method is reduced by 29.13%. When there is only one server, only the local server processes the task, so the delay is the same in all four schemes. However, as the number of neighbor servers increases, the number of subtasks after the task division phase increases and the data size of each subtask decreases. Therefore, the more neighbor servers there are, the better the time efficiency of the MTMS method.
It can be seen from Figure 6(a) that the energy consumption of the MTMS method is reduced by 50.15% compared with the non-assignment method and by 27.74% compared with the greedy assignment method. Compared with the random assignment method, the energy consumption of the MTMS method is reduced by 16.69%. This shows that the MTMS algorithm can effectively reduce energy consumption, and that the distributed task assignment method consumes less energy than the centralized task assignment method.
As shown in Figure 6(b), when the number of servers is fixed, the energy consumption of the MTMS scheme is reduced by 47.36%, 24.63%, and 15.66% compared with the non-assignment method, the greedy assignment method, and the random assignment method, respectively. As the size of the task increases, the energy consumption gap between the optimization algorithm and the other three methods becomes larger and larger, indicating that the MTMS algorithm is well suited to processing tasks with large data volumes.
In Figure 6(c), the task input size and the required number of CPU cycles are fixed. As the number of servers increases, the energy consumption of the MTMS method is reduced by 45.99%, 5.30%, and 16.75% compared with the non-assignment method, the greedy assignment method, and the random assignment method, respectively. The figure shows that, as the number of servers increases, the total energy consumption of the random assignment method becomes higher than that of the greedy assignment method, so random assignment has limitations in reducing energy consumption.
It can be seen from Figure 7 that the average energy consumption of the MTMS scheme is reduced by 51.51% compared with the non-assignment method and by 26.94% compared with the greedy assignment method. Compared with the random assignment method, the average energy consumption of the MTMS method is reduced by 16.70%. The figure also shows that MTMS effectively reduces the energy consumption and that, as the number of tasks increases, the energy consumption shows a downward trend. Therefore, the proposed MTMS is well suited to Internet of Things environments with a large number of devices and tasks.

VI. CONCLUSION AND FUTURE WORK
To effectively reduce the time and energy consumption of task processing, task segmentation is combined with edge server collaboration. In this work, a big task can be divided and assigned to neighbor servers to minimize the energy consumption of the whole offloading process. Therefore, a Multi-subTasks-to-Multi-Servers (MTMS) offloading scheme based on the Hungarian algorithm is proposed. The MTMS algorithm provides the optimal assignment decision for subtask offloading, which minimizes the energy consumption and reduces the time consumption of the entire task processing. Simulations verify the effectiveness of our ideas and methods.
In this paper, orthogonal channels are used for transmission, but in the upcoming fifth-generation (5G) era, channels are non-orthogonal. The advantage of non-orthogonal channels is that spectral efficiency can be improved, in terms of both rate and bandwidth. Non-orthogonal channels can improve access quality and can transmit multiple tasks simultaneously. How to combine edge computing with 5G is the focus of our future work.
JIN  He has authored or coauthored more than 250 scientific articles in high-standard Science Citation Index (SCI) journals. His research interests include software engineering, computational intelligence, wireless networks, bioinformatics, and embedded systems.
R. SIMON SHERRATT (M'97-SM'02-F'12) received the B.Eng. degree in electronic systems and control engineering from Sheffield City Polytechnic, U.K., in 1992, and the M.Sc. degree in data telecommunications and the Ph.D. degree in video signal processing from the University of Salford, in 1994 and 1996, respectively. In 1996, he was appointed as a Lecturer in electronic engineering at the University of Reading, where he is currently a Professor of consumer electronics and the Head of the wireless and computing research. He is also a Guest Professor with the Nanjing University of Information Science and Technology, China. His research topic is on signal processing in consumer electronic devices concentrating on equalization and DSP architectures, specifically for personal area networks, USB, and Wireless USB.