Q-Learning-Based Task Offloading and Resources Optimization for a Collaborative Computing System

Mobile edge computing (MEC) can effectively overcome the high-latency shortcoming of mobile cloud computing (MCC) by deploying cloud resources, e.g., storage and computational capability, to the edge. However, the limited computation capability of the MEC restricts the scalability of offloading. Therefore, the basic requirements of an MEC system are effective offloading decisions and resource allocation methods. To address this, we develop a collaborative computing system composed of local computing (mobile device), MEC (edge cloud), and MCC (central cloud). Based on the proposed collaborative computing system, we design a novel Q-learning based computation offloading (QLCOF) policy to achieve the optimal resource allocation and offloading scheme by prescheduling the computation side for each task from a global perspective. Specifically, we first model the offloading decision process as a Markov decision process (MDP) and design a state loss function (STLF) to measure the quality of experience (QoE). After that, we define the cumulation of STLFs as the system loss function (SYLF) and formulate an SYLF minimization problem. Since the formulated problem is difficult to solve directly, we decompose it into multiple subproblems and first optimize the transmission power and the computation frequency of the edge cloud by the quasi-convex bisection method and polynomial analysis, respectively. Based on the precalculated offline transmission power and edge cloud computation frequency, we develop a Q-learning based offloading (QLOF) scheme to minimize the SYLF by optimizing the offloading decisions. Finally, the numerical results show that the proposed QLOF scheme effectively reduces the SYLF under different parameters.


I. INTRODUCTION
In the era of the Internet of Things and mobile computing, smart devices (such as smartphones, laptops, wearables, and automotive devices) are growing explosively [1], [2]. Meanwhile, all kinds of emerging applications (such as face recognition, natural language processing, and augmented reality) are also increasing significantly. Most of these applications are computation-intensive, delay-sensitive, and energy-hungry, which places a heavy burden on mobile devices with their limited computation capabilities and battery capacities [3], [4]. As a result, mobile cloud computing (MCC) has been proposed as a promising technology [5]. In fact, MCC is the integration of cloud computing and mobile computing, and it provides considerable computation capability, storage, and energy for mobile devices [6]. However, for real-time or delay-sensitive applications, MCC cannot guarantee high-quality service due to its high transmission latency [7]. Mobile edge computing (MEC) has therefore been proposed to enhance MCC by deploying cloud resources (such as storage and computing resources) to the edge, so as to provide fast and relatively powerful computation capability [8], [9]. Although MEC has great potential to relieve the burden on the core network, it is still insufficient for some highly computation-intensive applications due to its limited computation capability [10], [11]. To this end, we propose a collaborative computing system composed of a mobile device, an edge cloud, and a central cloud to utilize the resources of all three processors effectively.

(Volume 8, 2020. The associate editor coordinating the review of this manuscript and approving it for publication was Igor Bisio. This work is licensed under a Creative Commons Attribution 4.0 License; for more information, see https://creativecommons.org/licenses/by/4.0/.)
For the moment, the main challenge is how to design an efficient computation offloading scheme, including: (i) How should the transmission power be allocated when a task is offloaded? (ii) How should the computing resources of the edge cloud and the central cloud be allocated? (iii) How should the workload be divided between the central cloud and the edge cloud when a task is offloaded? (iv) Which tasks should be offloaded? To address these problems, a hierarchical computing scheme is implemented in the collaborative computing system, where the three processors can opportunistically process tasks. On the one hand, non-computation-intensive tasks can be executed on the mobile device to guarantee lower system latency and higher energy efficiency. On the other hand, computation-intensive tasks can be executed on the collaborative cloud, combining the MEC and the MCC. For the first three problems, we likewise apply the hierarchical computing scheme within the collaborative cloud to optimize resource allocation. For example, a highly computation-intensive task is allocated to the central cloud with a large proportion so that the powerful computation capability of the central cloud can be exploited effectively. On the contrary, delay-sensitive tasks of ordinary computational intensity can be allocated to the edge cloud with a large proportion to achieve low latency and high energy efficiency. For the last problem, we utilize the Q-Learning based OFfloading (QLOF) algorithm, a reinforcement learning (RL) method, to obtain globally optimal offloading decisions; Q-learning is an off-policy control method that separates the behavior policy from the target (learning) policy and updates the action selection using the Bellman optimality equation and the ε-greedy policy [12]. Inspired by the great success of DeepMind's Go software AlphaGo, most existing offloading algorithms are based on RL. Min et al. [13] proposed a model-free scheme based on the deep Q-network (DQN).
Such an RL-based scheme has been shown to be able to learn the optimal offloading policy through sufficient interactions with the network environment. Chen et al. [14] proposed a double deep Q-network (DQN)-based strategic computation offloading algorithm to learn the optimal policy without a priori knowledge of the network dynamics. Motivated by the above works, we model a task-dependency computation offloading problem based on the MDP and utilize a model-free Q-learning algorithm to solve the problem without any prior knowledge [15]. Specifically, the implementation of our QLOF is based on a complete Markov decision process (MDP) composed of states (defined by the offloading decisions of two adjacent tasks), actions (defined by the offloading decisions of the mobile device), and rewards (defined by the negative cost function).
The main contributions of this article are summarized as follows. To solve the entire computation offloading problem, we design an effective Q-learning based computation offloading (QLCOF) policy to optimize the resource allocation and offloading scheme for the proposed collaborative computing system. Specifically, we first assign four states and two actions to each task to model the offloading decision process as an MDP. Then, for each state, we design a state loss function (STLF) based on the quality of experience (QoE), which measures the time and energy consumption of different states. Afterward, we define the cumulation of STLFs as the system loss function (SYLF) and formulate an SYLF minimization problem under hard constraints. We then solve the formulated problem in two steps: (1) optimizing the resource allocation and (2) optimizing the offloading decisions. For the former, we use the quasi-convex bisection method and the polynomial analysis method to obtain the optimal transmission power and the optimal edge cloud computing frequency, respectively. For the latter, based on the MDP, we propose the QLOF algorithm to obtain the optimal offloading scheme.

II. RELATED WORK
Cloud computing and edge computing have been regarded as effective technologies to enhance the computation capability of mobile devices, and they have attracted much attention in recent years. Existing research on MEC and MCC can be divided into three types: latency-based offloading, energy-based offloading, and energy efficiency (cost)-based offloading.
Latency-based offloading schemes aim to minimize the system time consumption for delay-sensitive applications. To minimize the latency, Sun and Ansari [16] proposed a Latency Aware Workload Offloading (LEAD) strategy within a novel cloudlet network, which enables mobile users to offload the workload of computation-intensive applications onto nearby cloudlets to minimize the average application response time among mobile users. To minimize the system latency, Fan and Ansari [17] proposed the cost-aware cloudlet placement in mobile edge computing (CAPABLE) strategy, which considers the cloudlet cost and designs a workload allocation scheme to minimize the E2E delay, but it does not consider energy consumption. Moreover, to provide better QoE for delay-sensitive applications, Ren et al. [10] studied the full offloading scenario and proposed a collaborative computing system combining the MEC and MCC techniques, which leverages an optimal task splitting strategy to jointly optimize communication and computation resources for minimizing the system latency. Compared to our work, the above studies did not consider energy consumption and only focused on an independent MCC or MEC system. Although Ren et al. [10] combined the techniques of MCC and MEC, their scheme did not exploit the resources of the mobile device.
Energy-based offloading schemes aim to reduce energy consumption for energy-sensitive applications. To conserve energy, He et al. [18] formulated an energy-aware collaborative computation offloading (EA-CCO) problem and designed an iterative searching algorithm for the collaborative computation offloading paradigm (ISA-CCO) to jointly optimize the offloading decisions, computational frequency, transmission power, and battery rate. Zhang et al. [19] developed an optimal offloading algorithm for a mobile user in an intermittently connected cloudlet system, formulating an MDP model to minimize the energy cost. Sardellitti et al. [20] proposed a successive convex approximation (SCA) algorithm to minimize energy consumption for highly computation-intensive tasks by optimizing computation offloading across densely deployed multiple radio access points. Compared to our work, the above studies did not consider the time consumption and only focused on an independent MCC or MEC system.
Meanwhile, some studies aim to improve (reduce) energy efficiency (cost) by considering both energy and latency. You et al. [21] developed a threshold-based mobile offloading scheme to improve the energy efficiency of multiuser MEC systems by offloading computation-intensive mobile applications to clouds located at the edge of cellular networks. Chen et al. [22] studied the multiuser computation offloading problem for mobile-edge cloud computing in a multi-channel wireless interference environment and adopted a game-theoretic approach to achieve energy-efficient computation offloading in a distributed manner. Guo et al. [23] proposed the energy-efficient dynamic offloading and resource scheduling (eDors) algorithm to minimize the system cost by jointly optimizing the computation frequency, transmission power, and offloading decisions. Lyu et al. [24] proposed the heuristic offloading decision algorithm (HODA), which jointly optimizes offloading decisions, communication, and computation resources in a multiuser scenario to maximize the system utility function. However, the above studies mainly focused on an independent MEC/MCC system or on collaboration between MEC and MCC, and rarely involved a collaborative computing system composed of local computing, MEC, and MCC.
Different from the existing work, we consider a more practical scenario where a mobile device with limited computation resources works with edge cloud and central cloud to handle computation-intensive single-chain applications. Furthermore, based on the MDP, we propose an effective computation offloading policy, namely QLCOF, which considers both latency and energy consumption through jointly optimizing resource allocations and offloading decisions.

III. SYSTEM MODEL AND PROBLEM FORMULATION
In this section, we first introduce the computation model and communication model for the collaborative computing system. We then model the single-chain application offloading process as an MDP. Based on this, we design an STLF to measure the loss of different states and optimize resource allocations, which will be explained in the next section. Finally, we define the cumulation of STLFs as the SYLF and formulate an SYLF minimization problem.

A. SYSTEM MODEL
We assume that the mobile device, the edge cloud, and the central cloud cooperatively process computation-intensive applications. As shown in Fig.1, each base station (BS) serves one mobile device, and all BSs are linked to a central cloud with unlimited computing resources. Each BS can be regarded as an edge cloud with relatively powerful computation capability, caching, and storage space. We assume that the mobile device offloads tasks to the edge cloud over a wireless channel, and different edge clouds transmit data to the central cloud through different backhaul links. Specifically, the mobile device can choose to execute tasks with low computational complexity locally or offload highly computation-intensive tasks onto the collaborative cloud. Once a task is offloaded onto the collaborative cloud, it is immediately split into two parts optimally; the central cloud processes one part, and the edge cloud processes the other, minimizing the overall latency.
In this system, we assume that each single-chain application is partitioned into a sequence of M tasks, denoted by the set M = {1, . . . , M}, and each task can be split at any ratio. Specifically, all tasks must be processed sequentially, since the output data of the current task is the input data for executing the next task. Here, we use a directed acyclic graph G = (V, E) to represent the sequential dependency relationships among the tasks of a single-chain application. Each vertex i ∈ V in G represents a task, and a directed edge e(i, j) indicates the precedence constraint between tasks i and j, such that task j cannot start execution until its predecessor task i completes [23].

B. COMPUTATION MODEL
In this subsection, we introduce the local computing model and the collaborative cloud computing model. Here, we adopt the two-field notation Task_i = {L_i, C_i} to represent the i-th task of a single-chain application, where L_i (bit) is the input data size, including system settings, program codes, and input parameters, and C_i is the required CPU cycles/bit to accomplish task i. Each task can be either executed locally or offloaded onto the collaborative cloud [25].

1) LOCAL COMPUTING
We assume that the computation capability of each mobile device is a fixed value, denoted as F_l. Thus, the computation delay of local computing for executing L_i bits can be expressed as:

T^l_i = L_i C_i / F_l. (1)

According to [16], [20], and [21], the CPU power is a superlinear function of the computing frequency F_l and is represented as

P_l = κ_l (F_l)^γ, (2)

where γ = 2, and κ_l is the effective switched capacitance, set as 10^−11 [16], [20]. Obviously, the CPU power P_l is an inherent characteristic of a specific mobile device. The energy consumption of the mobile device can then be represented as:

E^l_i = P_l T^l_i = κ_l F_l L_i C_i. (3)

From Eq. (1) and Eq. (3), the computation delay T^l_i and the energy consumption E^l_i of a specific task are determined only by the input data size and the required CPU cycles/bit.
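The local computing model above can be sketched in a few lines of code; the numeric values below (CPU frequency, task size) are illustrative assumptions, not values from the paper's Table 2:

```python
# Sketch of the local computing model (Eqs. 1-3).
KAPPA_L = 1e-11   # effective switched capacitance of the mobile CPU
GAMMA = 2         # superlinear exponent of the CPU power model

def local_cost(L_bits, C_cycles_per_bit, F_l):
    """Return (delay T_l, energy E_l) for executing a task locally.

    T_l = L*C / F_l                         (Eq. 1)
    P_l = kappa_l * F_l**gamma              (Eq. 2)
    E_l = P_l * T_l = kappa_l * F_l * L * C (Eq. 3, with gamma = 2)
    """
    T_l = L_bits * C_cycles_per_bit / F_l
    P_l = KAPPA_L * F_l ** GAMma if False else KAPPA_L * F_l ** GAMMA
    E_l = P_l * T_l
    return T_l, E_l

# Example: a 1 Mbit task needing 1000 cycles/bit on a 1 GHz CPU
T, E = local_cost(1e6, 1000, 1e9)
```

Note how, for γ = 2, the energy grows linearly in F_l while the delay shrinks with 1/F_l, which is why a faster local CPU trades energy for latency.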

2) COLLABORATIVE CLOUD COMPUTING
In our model, the collaborative cloud computing system combines the MEC and the MCC. When a task is completely received by the edge cloud, it is immediately split into two parts. One part is transferred to the central cloud via the backhaul link, and the other part is executed on the edge cloud, so that the edge cloud and the central cloud can execute the task in parallel. Here, we make some reasonable assumptions: (1) Tasks can be split arbitrarily, regardless of their type and original content [21], [27], [28]. (2) Since the optimal splitting ratio is affected by task parameters, such as the input data size and the required CPU cycles/bit, the edge cloud splits a task only after it has completely received the task, to guarantee the accuracy of the splitting ratio [22], [29]. (3) The main objective of the collaborative cloud is to minimize delay, and the optimal splitting ratio is achieved by minimizing the delay of collaborative cloud computing [16].
Since the edge cloud and the central cloud process a task in parallel, the overall delay minimization problem of collaborative cloud computing can be written as [10]:

min_{λ_i} max{ T^edge_i, T^cent_i }, (4)

where T^edge_i = λ_i L_i C_i / F^edge_i is the computation delay of the edge cloud, and T^cent_i = (1 − λ_i) L_i / W + (1 − λ_i) L_i C_i / F^cent is the delay of transferring the remaining part over the backhaul and computing it on the central cloud. As proved in [10], the minimum value of T^c_i is achieved when T^edge_i = T^cent_i, and the optimal splitting ratio is:

λ*_i = F^edge_i (F^cent + W C_i) / (F^edge_i F^cent + F^edge_i W C_i + W C_i F^cent), (5)

where F^edge_i is the computing frequency of the edge cloud, F^cent is the ultra-high frequency of the central cloud, and W is the backhaul communication capacity. Therefore, the overall delay of collaborative cloud computing at the optimal splitting ratio can be expressed as:

T^c_i = λ*_i L_i C_i / F^edge_i. (6)

Proof: Please refer to the Appendix. Since the central cloud has sufficient computation capability, the energy consumption of central cloud computing is not included in the system energy consumption [23], [25]. Besides, the energy consumption of transmission from the edge cloud to the central cloud through the backhaul link is quite small; thus, we ignore it. Therefore, the energy consumption of collaborative cloud computing is determined only by the edge cloud computation, which can be derived as:

E^c_i = κ_edge (F^edge_i)^2 · λ*_i L_i C_i / F^edge_i = κ_edge F^edge_i λ*_i L_i C_i, (7)

where κ_edge is an effective switched capacitance, set as 10^−12 to match the energy consumption characteristics of edge cloud computing.
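Under these definitions, the optimal split can be checked numerically. The following sketch assumes the delay-balancing condition T^edge_i = T^cent_i stated above; all parameter values are illustrative:

```python
def optimal_split(L, C, F_edge, F_cent, W):
    """Optimal splitting ratio lambda* for collaborative cloud computing.

    lambda* balances the edge delay  lam*L*C/F_edge  against the central
    delay  (1-lam)*L/W + (1-lam)*L*C/F_cent  so both sides finish together.
    """
    lam = F_edge * (F_cent + W * C) / (
        F_edge * F_cent + F_edge * W * C + W * C * F_cent)
    T_edge = lam * L * C / F_edge                      # edge-side delay
    T_cent = (1 - lam) * L / W + (1 - lam) * L * C / F_cent  # backhaul + central
    return lam, T_edge, T_cent

# Illustrative numbers: 1 Mbit task, 1000 cycles/bit, 2 GHz edge,
# 10 GHz central cloud, 100 Mbit/s backhaul
lam, te, tc = optimal_split(1e6, 1000, 2e9, 1e10, 1e8)
```

Because the two branches finish simultaneously at λ*, neither processor idles, which is exactly why the splitting minimizes the max of the two delays.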

C. WIRELESS COMMUNICATION MODEL
For a delay-sensitive and computation-intensive application, some tasks are offloaded onto the collaborative cloud through the wireless channel, which incurs extra time and energy consumption. In our model, the gain of the wireless channel between the mobile device and the edge cloud is denoted by H_i, B is the system bandwidth, and N_0 is the noise power. Besides, we use p^t_i to represent the transmission power, which can be configured by the mobile device subject to the maximum transmission power constraint. Therefore, the data rate of transmission from the mobile device to the edge cloud can be given by:

r_i = B log2(1 + p^t_i H_i / N_0). (8)

Since real-time applications are delay-sensitive, the overall system delay should be rather short. To this end, it is reasonable to ignore the movement of the mobile device during this period, and H_i remains a constant that can be estimated and known in advance by the scheduler. So, the transmission time and the consumed energy can be represented as:

T^t_i = L_i / r_i, (9)
E^t_i = p^t_i T^t_i. (10)
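A minimal sketch of this transmission model, with the symbols as above (all numeric inputs are illustrative assumptions):

```python
import math

def uplink(L_bits, p_t, H, B, N0):
    """Shannon rate, transmission time, and energy for offloading (Eqs. 8-10)."""
    r = B * math.log2(1 + p_t * H / N0)  # achievable uplink rate (bit/s)
    T_t = L_bits / r                     # transmission delay (s)
    E_t = p_t * T_t                      # radio energy spent by the device (J)
    return r, T_t, E_t

# Illustrative numbers chosen so the SNR term is 1 + 3 = 4:
# 1 Mbit task, 1 W transmit power, gain 3, 1 MHz bandwidth, unit noise
r, T_t, E_t = uplink(1e6, 1.0, 3.0, 1e6, 1.0)
```

Raising p^t_i increases the rate only logarithmically while the energy grows linearly with p^t_i, which is the tension the power optimization of Section IV-A resolves.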

D. MDP MODEL OF OFFLOADING DECISION PROCESS
For a single-chain application, the entire offloading process is a sequential decision process in which the mobile device must decide the computation side for each task in sequence. However, the globally optimal offloading decision for a task depends not only on its inherent characteristics but also on the computation positions and data information of the other tasks. Therefore, we model the entire offloading process as an MDP and propose an improved QLOF algorithm to obtain the optimal offloading scheme, which will be explained in the next section. An MDP is an optimal decision process for a stochastic dynamic system with the Markov property. It includes two main subjects, the agent and the environment, where the agent is also the decision-maker [33].
To achieve the goal of the decision process, the agent interacts with the environment to make a proper decision at a specific state, while generating an instant reward [33]. Here, we model the offloading process as an MDP and formulate the SYLF minimization problem based on state, action, and reward.

1) STATE SPACE S
As for an MDP, the state space is a finite-dimensional space that can be regarded as the observations of the environment. In our model, we allocate four states to each task to form a sub-state space S_i, which can be represented as:

S_i = {s^1_i, s^2_i, s^3_i, s^4_i}, (11)

where i is the index of the task of a single-chain application.
Here, we give the notation: s^1_i = 00, s^2_i = 01, s^3_i = 10, and s^4_i = 11. Since the Markov property implies that the next state depends on the current state, we use two binary digits to establish the connection between two adjacent tasks. The first digit indicates the position of executing the current task, and the second digit indicates the position of executing the next task. Local computing is represented by 0, and collaborative cloud computing by 1. For example, s^2_i denotes that the current task i is executed on the mobile device and the next task i + 1 will be offloaded onto the collaborative cloud. The entire system state space consists of a start state S0, a termination state 0T, and all sub-state spaces S_i. Note that S0 and 0T satisfy the requirement that the initial task and the termination task must be executed locally. Accordingly, the system state space S can be denoted as:

S = {S0} ∪ S_1 ∪ S_2 ∪ · · · ∪ S_M ∪ {0T}, (12)

where M is the number of tasks of a single-chain application. From Eq. (12), after removing the sub-states that conflict with the locally executed initial and termination tasks, the number of states of an entire application is 4M − 2, which satisfies the finite-dimension requirement of the MDP state space.

2) ACTION SET A
For the standard MDP, the action space refers to the actions that the agent can perform at a specific state to reach the next state [34]. In our model, there are two actions for the agent to choose from, represented as:

A = {0, 1}, (13)

where 0 and 1 represent local computing and collaborative cloud computing, respectively. For our MDP-based offloading process, a specific state contains the position of executing the current task and the position of executing the next task; thus, the action taken by the mobile device applies not to the next task but to the one after it. To explain the state transition process accurately, we use a specific example. Assume that the current state is s^2_i = 01, which means that the current task i is executed locally and the next task i + 1 is executed on the collaborative cloud. At this moment, when the mobile device makes the offloading decision a = 1, the system transits from state s^2_i = 01 to state s^4_{i+1} = 11. The state transition diagram of a single-chain application is shown in Fig.2.
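The transition rule above amounts to shifting a two-bit window along the task chain; a tiny sketch (the tuple encoding is our own illustration of the paper's bit notation):

```python
def step(state, action):
    """Transition of the offloading MDP.

    A state is a pair (cur, nxt) of execution positions for the current
    and next task (0 = local, 1 = collaborative cloud).  The action
    chooses the position of the task after next, so the new state simply
    shifts the pair: (cur, nxt) --a--> (nxt, a).
    """
    cur, nxt = state
    return (nxt, action)

# The example from the text: s2 = (0, 1) with a = 1 leads to s4 = (1, 1)
next_state = step((0, 1), 1)
```

The window shift is exactly why the process is Markovian: the pair (cur, nxt) carries all the history the next decision needs.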

3) POLICY
The MDP policy is a distribution over actions [34], i.e., the probability of choosing a specific action when the agent arrives at a particular state, represented as:

π(a|s) = P[A_t = a | S_t = s]. (14)

Our goal is to choose the optimal action through a deterministic policy π* when the mobile device stays at a specific state.

4) LOSS FUNCTION AND REWARD
MDP aims to maximize cumulative rewards by making decisions optimally under uncertainty [34], [35]. For a standard MDP, a reward should be provided to the agent by the environment so that the agent can learn. Before elaborating on the cumulative reward function, we first design two loss functions, the STLF and the SYLF, to measure the performance of the collaborative computing system. Then, the SYLF minimization problem will be transformed into a maximization problem that complies with the standard MDP. To facilitate the follow-up content, we call the situation where all tasks of the entire application are executed locally the all-local scheme, and the situation where all tasks (except for the initial task and the termination task) are executed on the collaborative cloud the all-collaborative cloud scheme. Definition 1 (State Loss Function (STLF)): The state loss function is designed to measure the loss ratio of different states; it is defined as the weighted sum of the time loss ratio and the energy loss ratio of a specific state with respect to the all-local scheme loss.
So, the uniform expression of the STLF can be formulated as:

STLF^j_i = β_T (T^j_i / T^l_i) + β_E (E^j_i / E^l_i), (15)

where i is the index of the task, j indexes the state, and β_E, β_T ∈ [0, 1] are the weights of the energy loss ratio and the time loss ratio, respectively. They measure the preferences for completion time and energy consumption of applications. Note that we assume β_E + β_T = 1 to balance the time consumption and the energy consumption in case of excessive loss caused by a massive weight. For instance, a larger β_T and a smaller β_E for a delay-sensitive application save time at the expense of energy consumption. Besides, T^l_i and E^l_i denote the time consumption and the energy consumption of the all-local scheme, respectively. Accordingly, the STLF uses the all-local scheme loss as the criterion to measure state performance. As the STLF describes the loss of a particular state of the current task, we only account for the consumption from the start of execution of the current task to the start of execution of the next task. Next, we present the specific expressions of the STLFs.

a: s^1_i = 00. Since both adjacent tasks are executed on the mobile device, we only need to consider the local computing consumption of the current task, which can be represented as:

E^00_i = E^l_i, (16)
T^00_i = T^l_i. (17)

The STLF of s^1_i = 00 can be written as:

STLF^1_i = β_T + β_E = 1. (18)

b: s^2_i = 01. The current task is executed locally while the next task is offloaded onto the collaborative cloud. Since the two adjacent tasks are executed at different sides, the mobile device not only executes the current task but also transmits the next task's data to the edge cloud. Due to the task dependency, the next task's data can only be transmitted after the current task has been completely executed. So, the energy consumption E^01_i and the time consumption T^01_i can be denoted as:

E^01_i = E^l_i + E^t_{i+1}, (19)
T^01_i = T^l_i + T^t_{i+1}. (20)

The STLF of s^2_i = 01 can be expressed as:

STLF^2_i = β_T (T^01_i / T^l_i) + β_E (E^01_i / E^l_i). (21)

c: s^3_i = 10. The current task is executed on the collaborative cloud while the next task is executed locally.
Since the two adjacent tasks are executed at different sides, the system not only executes the current task but also transmits the current task's output (the next task's input data) from the collaborative cloud to the mobile device. Thus, the energy consumption and the time consumption can be expressed as follows:

E^10_i = E^c_i + E^t_{i+1}, (22)
T^10_i = T^c_i + T^t_{i+1}. (23)
The STLF is shown as below:

STLF^3_i = β_T (T^10_i / T^l_i) + β_E (E^10_i / E^l_i). (24)

d: s^4_i = 11. Both adjacent tasks are executed on the collaborative cloud; thus, we only consider the consumption of processing the current task on the collaborative cloud, which can be expressed as follows:

E^11_i = E^c_i, T^11_i = T^c_i. (25)

The STLF is shown as follows:

STLF^4_i = β_T (T^c_i / T^l_i) + β_E (E^c_i / E^l_i). (26)

In our model, the mobile device makes offloading decisions for each task in sequence and assigns a state to each task to form a complete MDP. To measure the decisions performed by the mobile device, we design a system loss function (SYLF) as the cumulation of the STLFs over all states, normalized so that the all-local scheme scores 1:

SYLF = (1/M) Σ_{i=1}^{M} STLF_{s_i}. (27)

Our goal is to minimize the SYLF by optimizing the edge cloud frequency F^edge_i, the transmission power p^t_i, and the offloading decisions. As a result, we can formulate the problem as:

P: min_{p^t_i, F^edge_i, s_i} SYLF (28)
s.t. C1: 0 < p^t_i ≤ p_max, ∀ i ∈ M,
     C2: F^edge_min ≤ F^edge_i ≤ F^cent, ∀ i ∈ M,
     C3: s_i ∈ S_i, ∀ i ∈ M,

where M denotes the number of tasks of an application. C1 enforces the transmission power constraint. Constraint C2 guarantees that F^edge_i is not less than the lowest value F^edge_min and does not exceed the central cloud computing frequency F^cent. C3 states that a task can stay at any state of its sub-state space, meaning that each task can be either locally executed or offloaded to the collaborative cloud. In the following section, we decompose the problem through a distributed algorithm to minimize the SYLF.
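As a numeric illustration of the loss definitions, the following sketch assumes the cumulation of STLFs is normalized by the number of tasks, so that the all-local scheme scores exactly 1 (consistent with the baseline reported in Section VI):

```python
def stlf(T_state, E_state, T_local, E_local, beta_T, beta_E):
    """State loss function (Eq. 15): weighted time and energy loss
    ratios measured against the all-local scheme."""
    return beta_T * T_state / T_local + beta_E * E_state / E_local

def sylf(stlfs):
    """System loss function: the averaged cumulation of per-task STLFs,
    so the all-local scheme scores exactly 1."""
    return sum(stlfs) / len(stlfs)

# Any state whose time and energy match the all-local scheme costs 1,
# regardless of the beta_T / beta_E split (as long as they sum to 1)
loss = stlf(2.0, 3.0, 2.0, 3.0, 0.7, 0.3)
```

A state that halves the delay but doubles the energy scores β_T·0.5 + β_E·2, so the weights directly encode the delay/energy preference of the application.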

IV. COMMUNICATION AND COMPUTATION RESOURCES OPTIMIZATION
In our minimization problem (28), the offloading decisions are coupled with the transmission power and the edge cloud computation capability. Besides, the offloading decisions are denoted by the discrete states s^j_i; thus, (28) is a mixed-integer problem [35]. To solve it, we decompose the problem and first optimize the transmission power and the edge cloud computing frequency by minimizing the STLFs. After that, we optimize the offloading decisions over the optimized resource allocation via the QLOF algorithm.

A. UPLINK TRANSMISSION POWER ALLOCATION
The optimal uplink transmission power is acquired by minimizing STLF^2_i, which can be expressed as:

P1: min_{p^t_i} STLF^2_i s.t. 0 < p^t_i ≤ p_max.

We rewrite the objective function of P1 by substituting Eq. (19) and Eq. (20) into Eq. (21):

f(p^t_i) = β_T + β_E + (β_T / T^l_i + β_E p^t_i / E^l_i) · L_{i+1} / (B log2(1 + μ_i p^t_i)), (32)

where μ_i = H_i / N_0. Examining the first- and second-order derivatives of Eq. (32) shows that f(p^t_i) has a unique stationary point p_0 with f'(p_0) = 0 and f''(p_0) > 0, so f(p^t_i) first decreases and then increases, i.e., it is quasi-convex on the feasible interval with a unique minimum. To locate it, we develop a bisection method to approximate the optimal transmission power p*_i minimizing f(p^t_i), which is illustrated as Algorithm 1. It is worth noting that the optimal transmission power of each task should be computed before prescheduling the offloading scheme.
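Algorithm 1 is not reproduced in this excerpt, but a generic derivative-sign bisection for a quasi-convex objective, a sketch under our own assumptions rather than the authors' exact procedure, can look like this:

```python
def bisect_min(f, lo, hi, tol=1e-9):
    """Bisection on the (numerical) derivative sign of a quasi-convex f.

    Because f first decreases and then increases, the sign of f' at the
    midpoint tells which half of [lo, hi] contains the minimizer.
    """
    eps = tol
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid + eps) - f(mid - eps) > 0:  # f rising: minimum lies left
            hi = mid
        else:                                # f falling: minimum lies right
            lo = mid
    return (lo + hi) / 2

# Toy quasi-convex (in fact convex) objective with its minimum at p = 3
p_star = bisect_min(lambda p: (p - 3.0) ** 2, 0.0, 10.0)
```

The same loop applies to f(p^t_i) over (0, p_max] once the power model parameters are fixed; each iteration halves the search interval, so the cost is logarithmic in 1/tol.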

B. EDGE CLOUD COMPUTING FREQUENCY CONTROL
Edge cloud computing frequency control applies when the task is offloaded onto the collaborative cloud, i.e., in states s^3_i = 10 and s^4_i = 11. Here, we use state s^4_i = 11 to derive the optimal edge cloud computing frequency F^edge_i. Based on polynomial analysis, we optimize F^edge_i by minimizing

P3: min_{F^edge_i} y(F^edge_i) = STLF^4_i s.t. F^edge_min ≤ F^edge_i ≤ F^cent.

Firstly, we derive the first-order derivative y'(F^edge_i). It is easy to find that the sign of y'(F^edge_i) is determined by the sign of its numerator, so we only analyze that numerator (Eq. (42)). From Eq. (42), we can find that it has four roots: two of them are 0, and the other two are determined by the bracketed term of Eq. (42), which is a univariate quadratic polynomial; we denote it by N(F^edge_i) (Eq. (43)) and let F_r be its relevant positive root. In the case F_r > F^cent, we have N(F^cent) < 0, which means that y(F^edge_i) monotonically decreases on the feasible interval and reaches its minimum at F^cent; thus, the optimal frequency is (F^edge_i)* = F^cent. In the case where F_r lies within the feasible interval, a closed form of (F^edge_i)* is hard to obtain, so we approximate the optimal edge cloud frequency with the bisection method shown as Algorithm 1, which we omit here.

V. QLOF ALGORITHM FOR OPTIMAL OFFLOADING SCHEME
In this section, we propose the QLOF algorithm to achieve the optimal offloading scheme. MDP is a sequential decision process in which the agent observes its current state to make optimal decisions about its next states [38]. Q-learning is the most widely utilized technique to solve an MDP; it combines Monte Carlo methods and dynamic programming to learn an off-policy solution without prior knowledge [39]. This section develops a QLOF algorithm to minimize the SYLF based on the optimal transmission power and computing frequency of the edge cloud. When the agent follows a policy π, it should consider how much reward it can obtain. In general, the expected cumulative reward is used to evaluate the policy at the current state, which is denoted by the value function:

V_π(s) = E_π[ Σ_{k=0}^{∞} γ^k R_{t+k+1} | S_t = s ]. (44)

The expectation in Eq. (44) is taken with respect to the policy π, which determines whether the mobile device performs the offloading action at the current state. We can rewrite Eq. (44) in its recursive form:

V_π(s) = E_π[ R_{t+1} + γ V_π(S_{t+1}) | S_t = s ]. (45)

Since the standard MDP aims to maximize the cumulative immediate rewards, the optimal value function is represented as:

V*(s) = max_π V_π(s). (46)

We use the negative rewards R to substitute the STLFs and the negative return G to substitute the SYLF; thus, the SYLF minimization problem can be transformed into the cumulative reward maximization problem P5: max_π V_π(S0). Obviously, P5 satisfies the formulation of the optimal value function in Eq. (46). Thus, we can solve it through the proposed QLOF algorithm. The action at the current state is evaluated by the Q-value Q(s, a), updated as:

Q(s, a) ← Q(s, a) + α [ R + γ max_{a'} Q(s', a') − Q(s, a) ]. (48)

The decision stage of Algorithm 2 proceeds as follows:
6. if the Q-values of all states keep stable then
7.   break the loop of episodes.
   end if
end for
/* Optimal offloading decision */
8. Set current state = initial state S0.
9. From the current state, choose the action with the largest Q-value and move to the next state.
10. Set current state = next state.
11. Repeat steps 9 and 10 until current state = terminal state 0T.
The QLOF algorithm obtains the optimal offloading scheme by continuously updating the Q-value of each state using Eq. (48). We design a Q-table for each task to store its Q-values, as shown in Table 1. The QLOF algorithm keeps updating the Q-values by looking up the Q-table until all Q-values converge. Algorithm 2 shows the QLOF algorithm in detail: each episode of the QLOF algorithm is one training session, and after an episode is learned, the algorithm enters the next episode. Therefore, the outer loop of QLOF iterates over episodes, and the inner loop over the steps within each episode. The episode loop terminates once the Q-values of all states converge. After that, we can use the stable Q-values to guide the agent's actions and obtain the optimal offloading scheme for the entire application.
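The episode/step structure above can be sketched with a toy tabular Q-learning loop. This is a hedged illustration under assumptions: the per-task losses are synthetic numbers, the state is reduced to the task index, and the two actions stand for local execution (0) and offloading (1). It is not the paper's STLF model or the exact Algorithm 2, but the update rule is the standard Q-learning form on which Eq. (48) is based.

```python
# Toy tabular Q-learning sketch of the QLOF idea: each task in a
# single-chain application is a stage, the reward is the negative STLF,
# so maximizing cumulative reward minimizes the SYLF.
import random

random.seed(0)
M = 5                                    # tasks in the single-chain application
# Synthetic STLF of each (task, action); action 0 = local, 1 = offload.
LOSS = [[0.8, 0.3], [0.5, 0.6], [0.9, 0.2], [0.4, 0.7], [0.6, 0.5]]
ALPHA, GAMMA, EPS = 0.1, 1.0, 0.1        # learning rate, discount, exploration

# Q-table: one row per stage (state i = "about to schedule task i");
# row M is the terminal state with Q = 0.
Q = [[0.0, 0.0] for _ in range(M + 1)]

for episode in range(2000):              # outer loop: episodes
    for i in range(M):                   # inner loop: one step per task
        if random.random() < EPS:        # epsilon-greedy exploration
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda x: Q[i][x])
        r = -LOSS[i][a]                  # negative STLF as the reward
        # Standard Q-learning update (the form Eq. (48) builds on):
        Q[i][a] += ALPHA * (r + GAMMA * max(Q[i + 1]) - Q[i][a])

# Greedy rollout over the converged Q-table gives the offloading scheme.
policy = [max((0, 1), key=lambda a: Q[i][a]) for i in range(M)]
sylf = sum(LOSS[i][policy[i]] for i in range(M))
print(policy, sylf)
```

With these synthetic losses the greedy rollout picks, for each task, the action with the smaller loss, so the resulting SYLF is the minimum achievable sum; in the paper the same readout (steps 8-11 of Algorithm 2) yields the global offloading scheme.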

VI. SIMULATION RESULTS
In this section, we provide numerical results to evaluate the performance of the proposed QLOF scheme. We assume that the coverage radius of the edge cloud is 500 m and that each single-chain application lies within the coverage area of one edge cloud. Applications in different edge cloud coverage areas do not interfere with each other. All tasks of an application have the same time loss weight β_T and energy loss weight β_E. The input data size and the required CPU cycles/bit of each task are uniformly distributed, with L_i ∈ [0.1, 0.5] Mbits and C_i ∈ [500, 2500] CPU cycles/bit, respectively. The wireless channel gains between the mobile device and the edge cloud form a sequence of Rayleigh random variables with unit variance. The other simulation parameters are listed in Table 2.
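The task parameters described above can be drawn as in the following sketch. The number of tasks and the random seed are illustrative assumptions; only the distribution bounds and the unit-variance Rayleigh channel come from the text (the Rayleigh draw uses the standard magnitude-of-complex-Gaussian construction).

```python
# Sketch of the simulation's random task parameters (bounds from the text).
import math
import random

random.seed(1)
M = 10  # number of tasks (illustrative choice, not from the paper)

# Input size L_i ~ U[0.1, 0.5] Mbits; CPU density C_i ~ U[500, 2500] cycles/bit.
L = [random.uniform(0.1, 0.5) for _ in range(M)]
C = [random.uniform(500.0, 2500.0) for _ in range(M)]

# Rayleigh channel gains with unit variance (E[|h|^2] = 1):
# |h| = sqrt(X^2 + Y^2) with X, Y ~ N(0, 1/2).
sigma = math.sqrt(0.5)
h = [math.hypot(random.gauss(0.0, sigma), random.gauss(0.0, sigma))
     for _ in range(M)]

print(min(L), max(L))            # stays within [0.1, 0.5]
print(min(C), max(C))            # stays within [500, 2500]
print(all(g >= 0.0 for g in h))  # gains are non-negative magnitudes
```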

A. ANALYSIS OF THREE EXECUTION SCHEMES
We first analyze three execution schemes for delay-sensitive applications with a large β_T and a small β_E. Fig. 4 shows the SYLFs under three schemes: the all-local scheme, the all-collaborative cloud scheme, and our proposed QLOF scheme. For the all-local scheme, the SYLF is always 1 for different applications, so we select it as the baseline against which the other two schemes are evaluated. Since the all-collaborative cloud scheme is affected by the wireless channel state and the computing capabilities of the edge and central clouds, the SYLFs of the other two schemes fluctuate slightly across applications. Nevertheless, for delay-sensitive applications the all-collaborative cloud scheme always outperforms the all-local scheme, and our proposed QLOF scheme is superior to both. Fig. 5 depicts the number of offloaded tasks under the QLOF scheme for different applications. Compared with the other two schemes, the QLOF scheme exploits the resources of both the mobile device and the collaborative cloud to minimize the SYLF by prescheduling the execution position of each task from a global perspective.

B. COMPARISON OF SYLFs ON DIFFERENT OFFLOADING SCHEMES
In this subsection, we compare the SYLFs of the QLOF scheme and the dynamic offloading scheme in [23]. Fig. 6(a) depicts the SYLFs under the two schemes. The performance of the QLOF scheme is always better than that of the dynamic offloading scheme. This is because the dynamic offloading scheme is only a one-step prescheduling scheme: it computes the consumption of the current task on the different sides, directly makes the offloading decision, and then moves on to the next task. On the contrary, our QLOF scheme optimizes the offloading decisions from a global perspective to minimize the SYLF. Fig. 6(b) shows the SYLFs of the two offloading schemes versus the local computing frequency F_l. From Fig. 6(b), we can see that the SYLF of the QLOF scheme is always lower than that of the dynamic offloading scheme as F_l increases. Moreover, the SYLFs of both offloading schemes fluctuate slightly around 1 when F_l is smaller than 2.2 × 10^8, and decrease when F_l exceeds 2.2 × 10^8. This implies that both offloading schemes execute more tasks on the mobile device when F_l is small; as F_l increases, more tasks are offloaded onto the collaborative cloud to reduce the energy consumption caused by the high local computing frequency.

C. COMPARISON OF ENERGY CONSUMPTION AND COMPLETION TIME
In this subsection, according to Figs. 7 and 8, we analyze the completion time and energy consumption under the three execution schemes. From Fig. 7, the energy consumption of all three execution schemes shows an upward trend as M increases. The all-local scheme performs best among the three: its energy consumption is lower than that of the other two schemes. Moreover, the all-collaborative cloud scheme always remains the highest for different applications. The reasons are as follows: (i) the all-local scheme saves the energy for transmitting data from the mobile device to the collaborative cloud; (ii) the energy consumption mainly depends on the computation capability, and since F_l is relatively small, the all-local scheme can effectively reduce the energy consumption.
As can be seen from Fig. 8, the completion time of the all-local scheme rises sharply as the number of tasks increases. In comparison, the completion time of the all-collaborative cloud scheme rises only slightly and remains lower than that of the other two schemes, which indicates that collaborative cloud computing greatly resolves the contradiction between the low-latency requirement and the powerful-computing requirement of computation-intensive real-time applications. In light of Figs. 7 and 8, we can find that the QLOF scheme is a compromise between the all-local scheme and the all-collaborative cloud scheme when only completion time or only energy consumption is considered, because it utilizes not only the local computation resources but also the collaborative cloud computation resources.

D. THE IMPACT OF WEIGHT FACTORS β_E AND β_T
In this subsection, we analyze the impacts of the weight factors β_E and β_T. Fig. 9 depicts the SYLFs of the three schemes versus β_E. As β_E increases, the SYLF of the all-local scheme remains 1, and we select it as the baseline. The SYLF of the all-collaborative cloud scheme rises approximately linearly with β_E. At small β_E, the all-collaborative cloud scheme shows a significant advantage over the all-local scheme; as β_E increases, its SYLF continues to rise and eventually exceeds that of the all-local scheme. When β_E is quite large, the all-collaborative cloud scheme performs much worse than the all-local scheme, which indicates that the all-collaborative cloud scheme only suits delay-sensitive applications with a large β_T. In comparison, the QLOF scheme always performs best among all schemes throughout the growth of β_E. When β_E is small, the performance of the QLOF scheme is close to that of the all-collaborative cloud scheme, meaning that delay-sensitive applications offload more tasks onto the collaborative cloud to reduce the SYLF. As β_E increases, the performance of the QLOF scheme tends toward the all-local scheme, implying that more delay-tolerant applications are executed locally. This is because the QLOF scheme utilizes the computation resources of both the mobile device and the collaborative cloud, so the execution position of each task can be optimally prescheduled. Fig. 10 intuitively corroborates this analysis: the number of offloaded tasks gradually decreases as β_E increases.

E. THE IMPACTS OF COMPUTING FREQUENCY OF EDGE CLOUD
In this subsection, we analyze the impacts of the edge cloud computing frequency F_i^edge on the splitting ratio, the completion time of collaborative cloud computing, and the SYLFs. Fig. 11 displays the optimal splitting ratio versus the edge cloud computing frequency for tasks with different required CPU cycles/bit C_i. The optimal splitting ratios rise with the edge cloud computing frequency, which implies that when the computation resources of the edge cloud are sufficient, more of each task should be executed on the edge cloud, with the central cloud serving as an auxiliary to process the remainder. By vertical comparison, for a specific F_i^edge, the higher the required CPU cycles/bit, the smaller the optimal splitting ratio, which means that the central cloud allocates more computation resources to highly computation-intensive tasks to assist the edge cloud with its limited computation capability.
From Fig. 12, the completion time shows a downward trend as F_i^edge increases, which means that a powerful edge computation capability can effectively reduce the completion time of collaborative cloud computing. Through vertical comparison, tasks with a higher required CPU cycles/bit cost more time. The reasons are: (i) the optimal splitting ratio of tasks with a higher CPU cycles/bit is smaller, so more of the task is executed at the central cloud, resulting in greater transmission time; (ii) a higher CPU cycles/bit means a greater workload, and for a specific edge cloud computing frequency, tasks with an enormous workload cost more time than tasks with a smaller workload. Fig. 13 presents the SYLFs of the three schemes versus the computation capability of the edge cloud for a delay-sensitive application. Since the all-local scheme is independent of F_i^edge, its SYLF always remains at one. In comparison, the SYLF of the all-collaborative cloud scheme rises almost linearly with F_i^edge: at a relatively small computing frequency it is lower than that of the all-local scheme, and then it continues to rise and exceeds the all-local scheme. This is because the SYLF of the all-collaborative cloud scheme is mainly determined by the energy consumption, so a high edge cloud computing frequency leads to a large SYLF. For the QLOF algorithm, its SYLF shows an upward trend that first rises sharply, staying close to the all-collaborative cloud scheme, and then gradually slows down and approaches the all-local scheme, which indicates that more tasks are executed on the collaborative cloud when F_i^edge is low and more tasks are executed locally when F_i^edge is high. Throughout the increase of F_i^edge, the QLOF scheme always performs best among all schemes, which further verifies its superiority.

VII. CONCLUSION
In this article, given the limited computation capability of the edge cloud, we have proposed the QLCOF policy to jointly optimize resource allocation and offloading decisions for the collaborative computing system. We have first modeled the offloading decision process of a single-chain application as an MDP and designed an STLF as the performance metric. Then we have defined the cumulative STLFs as the SYLF and formulated an SYLF minimization problem. Since it is difficult to solve directly, we have decomposed the original minimization problem into multiple subproblems. By minimizing the STLFs, we have preferentially optimized the transmission power and the edge cloud computing frequency using the quasi-convex bisection method and the polynomial analysis method, respectively. Next, based on the precalculated offline optimal transmission power and computing frequency, we have developed an effective offloading scheme, namely the QLOF scheme, which preschedules the execution position of each task of the entire application from a global perspective to minimize the SYLF. Finally, we have implemented the QLOF scheme in MATLAB simulations and analyzed the impacts of the weight factors and the edge cloud computing frequency on the three execution schemes. In particular, we have also compared the QLOF scheme with the dynamic offloading scheme at different scales. The experimental results show that, compared with the all-local scheme, the all-collaborative cloud scheme, and the dynamic offloading scheme, the QLOF scheme is the optimal offloading scheme and minimizes the SYLF under different parameters.
MEC research is still in its infancy and may redefine how services are implemented in fixed/mobile networks. In this article, we have only considered the scenario where a multi-task application is served by a single edge cloud. In future work, we will extend our work to the multi-user, multi-edge-cloud scenario to jointly optimize offloading decisions and resource allocation, which will be more challenging.

APPENDIX PROOF OF OPTIMAL SPLITTING RATIO
In this part, we prove the optimal splitting ratio in Eq. (5) and the delay of collaborative cloud computing in Eq. (6) by analyzing the monotonicity of the overall collaborative cloud computing delay T_i^c with respect to the splitting ratio α_i.