DPRL: Task Offloading Strategy Based on Differential Privacy and Reinforcement Learning in Edge Computing

Mobile edge computing has been widely used in various IoT devices due to its excellent computing power and good interaction speed. Task offloading is the core of mobile edge computing. However, most of the existing task offloading strategies only focus on improving the unilateral performance of MEC, such as security, delay, and overhead. Therefore, focus on the security, delay and overhead of MEC, we propose a task offloading strategy based on differential privacy and reinforcement learning. This strategy optimizes the overhead required for the task offloading process while protecting user privacy. Specifically, before task offloading, differential privacy is used to interfere with the user’s location information to avoid malicious edge servers from stealing user privacy. Then, on the basis of ensuring user privacy and security, combined with the resource environment of the MEC network, reinforcement learning is used to select appropriate edge servers for task offloading. Simulation results show that our scheme improves the performance of MEC in many aspects, especially in security and resource consumption. Compared with the typical privacy protection scheme, the security is improved by 7%, and the resource consumption is reduced by 9% compared with the typical task offloading strategy.


I. INTRODUCTION
The rapid development of the Internet of Things (IoT) and 5G technology has promoted the combination of various smart applications with mobile devices, such as face recognition, smart medical care, etc [1]. These smart applications are gradually becoming more complex and diverse. However, the size and weight of mobile devices, computing power, storage capacity and network transmission capacity of users are limited, which makes them unable to meet the quality of service requirements of users. Therefore, in order to solve the above problems, related scholars proposed mobile edge computing technology (MEC) [2]. FIGURE 1 reveals the application of MEC in an intelligent transportation system. The core technology of MEC is to deploy service nodes at the edge of the network, and offload the complex computing tasks of user to edge servers for computing [3]. Compared with local computing and cloud computing, MEC has stronger computing power and lower latency. However, while MEC provides users with efficient computing services, the privacy of mobile users is also facing a huge threat [4]. In order to satisfy the mobile user's experience, the previous task offloading strategies mostly aim to reduce the delay and network resource overhead, while ignoring the privacy protection of the user. For example, a malicious edge server can infer the user's location, name, interests and other private information by monitoring the user's wireless channel information. Therefore, when using MEC to offload computing tasks of user, his private information must be protected. In addition, it should be noted that excessive privacy protection may lead to high communication overhead and delay, which will affect the user's quality of service experience. Therefore, how to reduce the overhead of task offloading while protecting user privacy is still an urgent problem that needs to be prepared.
To solve the above problems, this paper proposes a task offloading strategy (DPRL) based on differential privacy (DP) [5], [6] and reinforcement learning (RL) [7]- [9]. Specifically, before the computing task is unloaded, to avoid leakage of user privacy, we first encrypt the user's private information with DP. Then, to reduce communication overhead, we select appropriate edge servers to offload tasks in line with the network resource environment and RL.
As a privacy protection technology, DP has been widely used in many fields [10], [11]. It protects user privacy by introducing sufficiently small interference information into user information. Since the introduced interference is small enough and can protect privacy, the edge server in the MEC can still analyze the computing tasks through the disturbed information. In addition, RL, as a typical machine learning technique, has been widely used in many fields. For example, content caching and D2D offloading [12]. Recent studies have also shown its superior performance in edge computing [13] and edge intelligence [14]. Therefore, inspired by the above research, it is feasible to use RL to reduce the communication overhead in MEC. The main innovations of this paper are as follows: (1) Firstly, before task offloading, to prevent malicious edge servers from stealing user's privacy, we use DP to protect user's location information. User's location information often contains a lot of user privacy.
(2) Secondly, we construct a four-layer policy model as a learning agent for RL, which interacts with the MEC network environment to provide a task offloading scheme that minimizes communication overhead.
(3) Finally, to further protect users' privacy, we measure the trust of each edge server according to the intimacy between the edge server and users, and then take it as one of the characteristics of the edge server. In addition, to ensure that the RL agent can be trained in the real MEC network environment, we extract the network resources of MEC and form a feature matrix as the input of the policy model.
The rest of this paper is organized as follows. Section II analyzes the research status of MEC. Section III describes the DP-based user privacy protection strategy. Section IV presents an RL-based edge server selection strategy. Section V analyzes the performance of the DPRL strategy. Section VI summarizes the full text and presents future work.

II. RELATED WORK
As a hot topic in network research in recent years, MEC has been widely used in intelligent transportation systems, Internet of Vehicles and other fields. The existing researches on MEC mainly focus on two aspects: resource optimization and privacy protection during MEC task offloading. This section will analyze the research status of task offloading in MEC from the above two aspects, and compare the task offloading strategy proposed in this paper with these researches.

A. RESOURCE OPTIMIZATION IN TASK OFFLOADING
According to the analysis of existing research, the overhead generated by MEC mainly includes the delay (DL), bandwidth (BW) and computing (CPU) resources required for the calculation. The main goal of resource optimization in the task offloading process is to minimize the BW and CPU resources consumed in the MEC process with low DL [15]- [17].
Liu et al. [18] adjust the task offloading strategy according to the queuing buffer of computing tasks, the available CPU resources of user and edge servers to achieve the minimum time consumption. Literature [19] first proves that minimizing the average DL during task offloading is an NP-hard problem, and then uses Lyapunov to optimize the problem to achieve a near-optimal task offloading strategy. However, it is worth noting that [18], [19] only consider the DL in the task offloading process, but do not pay attention to the overhead in the task offloading process. Therefore, the literatures [20]- [22] try to optimize the computational cost within a small DL. Among them, Kamoun et al. [20] propose a resource allocation strategy that takes into account the user's local CPU resources and the wireless network resources between the user and the edge server, which can minimize the average overhead of the user while satisfying the predefined DL constraints. Huang et al. [21] propose a dynamic task offloading decision algorithm that combines dynamic task offloading decision with computing resource allocation to minimize overhead while ensuring queue stability. Literature [22] studies long-term task offloading and resource scheduling. Through Lyapunov optimization technology, the task offloading problem is decomposed into multiple subproblems, so as to achieve the goal of minimizing the average power consumption of the system. Besides, to minimize the task duration while satisfying the energy budget constraints, literature [23] proposed an online task offloading algorithm based on asynchronous advantage actor-critic. The algorithm can learn good offloading strategies to obtain near-optimal task assignments.
Although the above studies improve the resource consumption of MEC to a certain extent, it has the following limitations. On the one hand, most of these schemes only reduce the unilateral resource consumption and do not comprehensively consider the CPU, DL and BW required by MEC. On the other hand, most of these schemes use heuristic algorithms to solve MEC problems, which is easy to fall into local optimization. In order to solve the above problems, based on the inspiration of literature [23], we propose a DRRL scheme. DPRL extracts the MEC network resource state as the RL training environment, so that the performance of the whole MEC is balanced. In addition, the RL problem is optimized to avoid falling into local optimization.

B. PRIVACY PROTECTION IN TASK OFFLOADING
The privacy problem in task offloading usually consists of two aspects, namely the privacy problem caused by data interaction and the privacy problem caused by the task offloading feature. The privacy problem caused by data interaction is mainly due to the malicious MEC server stealing the user's privacy based on the calculation data. To solve this problem, Xu et al. [24] divide computing tasks into different types of data, and increase privacy entropy by enhancing the uncertainty of computing tasks to protect user privacy. Xu et al. [25] propose a two-stage task offload optimization strategy to maximize resource utilization and minimize time cost in the first stage, and balance task offload performance and privacy effect in the second stage. Aiming at the privacy problem caused by the characteristics of task offloading, Min et al. [26] propose a privacy-aware offloading scheme to improve the privacy protection level of users. The scheme selects the task offloading ratio and local processing ratio of user according to information such as wireless channel status, sensor data, size and priority of computing tasks, which reduces the DL and saves the computational overhead of user. Nguyen et al. [27] propose a MEC-based mobile network user privacy model, where user selects an efficient task offloading scheme through constrained Markov decision.
Similar to the literature [26], [27], the proposed scheme is mainly used to solve the privacy problem caused by the task offloading feature. The difference is that most of the privacy protection methods in previous task offloading only pre-set the privacy protection level and optimize it as an indicator. This kind of methods lacks clear privacy protection means, which may lead to privacy disclosure. And due to the lack of protection of the location information of the user, a malicious MEC server can steal the private information of the user according to the location information of the user. The scheme proposed in this paper takes the user's location information as the entry point, and interferes with the distance between the user and the edge server through DP, which provides clear privacy protection for the user.

C. RESEARCH ON MEC SECURITY BASED ON DIFFERENTIAL PRIVACY
At present, DP-based MEC research is mainly divided into centralized DP and localized DP. Centralized DP refers to uploading the initial data to the data center for storage, and then the data center uses the DP mechanism to perform pri- VOLUME 4, 2016 vacy protection processing on the data, and finally publishes the protected privacy information. Literature [28] proposed a new data encryption technology based on Trusted Third Party (TTP), which can maintain privacy protection in cloud environment. Literature [29] uses DP to build an optimal model based on TTP, in which TTP can obtain some information from public datasets to better understand private datasets without compromising its privacy. The work of Wang et al. [30] is to propose a DP-based location perturbation scheme to protect location privacy in third-party geolocation services, which uses a TTP architecture to perform the protection. However, it is worth noting that although the centralized DP mechanism can ensure that the user's private information will not be leaked. However, since third-party data collectors are often untrustworthy, they can pose a threat to users' private information without their knowledge.
To solve the above problems, a localized DP mechanism is proposed. On the basis of protecting user privacy, localized DP also considers the personalized protection of each user's private information. The difference between it and centralized DP is that each user will obfuscate the private data locally, and then publish the perturbed private data. The representative local DP studies are in [31] and [32]. Wang et al. [31] proposed a method to protect user location based on localized DP preference. In order to avoid the privacy problems caused by excessive reliance on third parties, Wu et al. [32] proposed a contract signing protocol, which uses the decentralized features of block-chain technology to avoid excessive reliance on third parties and protect user privacy.
Similar to the literatures [31] and [32], this paper uses localized DP technology to solve the security problem of MEC, and on this basis, uses RL to reduce the communication overhead of MEC.

III. LOCATION PRIVACY PROTECTION BASED ON DP
During the task offloading process of MEC, the user will choose different task offloading strategies according to the wireless channel between the user and the edge server. If the channel conditions are good, user will offload computing tasks to edge servers to save overhead. On the contrary, if the channel condition is poor, the user tends to perform operations locally. It is worth noting that when the channel conditions are good, the malicious edge server will infer the location of the user by analyzing the wireless channel between the user and the edge server. Therefore, in this section, we will focus on the location privacy leakage problem of user during task offloading.

A. DESIGN OF INTERFERENCE DISTANCE PROBABILITY DENSITY FUNCTION
DP is widely used in data mining and deep learning privacy protection because of its superior privacy protection performance. However, due to the limited coverage of edge servers, traditional DP methods are difficult to directly apply to task offloading in MEC. Therefore, this paper proposes a new distance obfuscation probability density function (PD), which can directly obfuscate the distance between the user and the edge server to prevent the leakage of the user's location information. It is calculated as follows: among them, l is the distance between the user and the edge server, the distance after obfuscation is l * . l 1 and l 2 represent the previous and next term of the confusion range, where l 1 < l 2 ,l 1 , l 2 ∈ [0, l max ] and ∆ l = l 2 − l 1 . This function takes into account the restriction of the confusion range and ensures that l * is within the confusion range with probability 1.  The flow of using PD to interfere with probability is shown in FIGURE 2. It can be revealed from Fig .2 that the key to using the PD to interfere with the distance between the user and the edge server lies in the selection of the interference range, that is, the determination of l 1 and l 2 . Generally speaking, when the real distance l is relatively small, the distance l * ∈ [l 1 , l 2 ] after interference is relatively small, so as to ensure that users can offload more computing tasks to the edge server to save costs. Therefore, in order to ensure that the task offloading decision made by the user based on the interference distance l * has the same utility as the decision made at the distance l, the value of l * must be within the smallest possible range on the left and right sides of l. The closer l 1 and l 2 are to l, the better the utility of the task offloading strategy after privacy protection.

B. DESIGN OF PRIVACY LEAKAGE DEGREE FUNCTION
According to equation 1, the user can interfere with the real distance between the user and the edge server before the task is offloaded. Malicious edge servers are then unable to deduce the user's true location. However, it is worth noting that since the process of using DP to interfere with the real distance is random, the real distance l and the interfered distance l * may be equal. The greater the equal probability, the higher the probability of user privacy leakage. This privacy leakage caused by the randomness of DP interference data is inevitable, so we also need to introduce the user privacy leakage degree function. We first use the Kullback-Leibler divergence (KLD) to measure the fit between the true distance l with and without privacy protection. The specific calculation method is as follows: among them, Q (l * |l) represents the probability distribution of the user's task offloading according to the real distance, and P (l * |l) represents the probability distribution of the user's task offloading according to the distance after interference. According to the definition of KLD, we can know that the smaller the value, the higher the degree of fit and the greater the probability of user privacy leakage. Conversely, when the value of KLD is larger, the fitting degree is lower and the probability of user privacy leakage is smaller. Therefore, the degree of privacy leakage (PLD) can be defined as the inverse of equation 3.

IV. RL-BASED EDGE SERVER SELECTION ALGORITHM
After using the DP to interfere with the user's location information, the computing task needs to be offloaded. Before offloading computing tasks, to further protect user privacy and ensure user QoS requirements, we need to select a safe and high-performance edge server for task offloading. Therefore, in this section, we first set a trust attribute for each edge server according to the interaction frequency between the edge server and the user. Then we design an edge server selection algorithm based on RL, which comprehensively considers the resource environment and security conditions of the entire MEC network. Finally provides users with an optimal task offload scheme through the interaction between the resource environment and the RL agent. This scheme can optimize the resource overhead of task offloading while protecting user privacy [15].

A. FEATURE EXTRACTION OF MEC NETWORK RESOURCES
Edge server has strong social attributes because it interacts with users. Interaction frequency is an important aspect of the social attribute of edge server. The higher the interaction frequency, the more reliable the edge server is. Before feature extraction, we set the trust attribute for each edge server by calculating the interaction frequency between the edge server and users. The calculation of trust is as follows.
where, T represents a time period, n represents the number of interactions between the edge server and the user in the T time period, and t i represents the time of each interaction.
To facilitate the extraction of network features, as shown in FIGURE 3, we model the MEC network as a weighted undirected graph G es = {N es , L es }. N es represents the set of edge servers, and L es represents the set of communication links. The numbers on the edge server nodes represent the available CPU of the edge server, and the numbers on the links represent the BW and DL, respectively. The key to using RL to select edge servers for task offloading lies in the understanding of the MEC network environment, which facilitates the training of RL agents in real environments. Therefore, we extracted the following features for each edge server node: (1) CPU: The capacity of edge server depends on its available computing resources. An edge server with higher computing capability can load more computing tasks.
(2) Degree: In the MEC network, the greater the degree of the edge server node, the more available communication links exist between it and the user.
(3) SUM (n es ) BW : Each edge server has a set of links linked to it, SUM (n es ) BW is the sum of the bandwidth resources of these links. The larger the value, the stronger the ability to transmit data between the edge server and the user.
(4) SUM (n es ) DL : SUM (n es ) DL is the sum of the delays of the links connected to the edge server. The smaller the value, the shorter the interaction time between the edge server and the user, and the better the user QoS experience.
(5) Trust(n es ): Each node has a trust attribute. The higher the trust, the smaller the probability of user privacy disclosure.
In fact, there are many network features that can be extracted from MEC networks. The more features extracted, the more realistic the training environment of the RL agent, but the complexity of the algorithm increases accordingly. Therefore, after comprehensively considering the actual situation of MEC task offloading, we extract the above features, which are also the main factors determining the task offloading overhead.
After extracting the features of the edge servers, we normalize them to feature vectors, and construct the feature vectors of all edge servers into a feature matrix, which is used VOLUME 4, 2016 as the input environment for the RL agent. The feature matrix is shown in equation 5.

B. DESIGN OF POLICY NETWORKS FOR RL
We design a four-layer policy network as the RL agent, which is the input layer, the convolutional layer, the softmax layer and the output layer. We take the feature matrix of the MEC network as the input of the agent, and operate it in the convolution layer. The specific convolution operation is as follows: Out among them, Out put i is the i-th output of the convolution layer, w is the weight vector of the convolution kernel, v i is the i-th input of the convolutional layer, and b is the deviation. The softmax layer mainly outputs the probability of each edge server being selected according to the available resource vector of each edge server. The higher the probability, the higher the priority of selecting the edge server for task offloading. The specific calculation method of softmax is as follows: During the training of an RL agent, the agent relies on the reward signal to decide what action to take next. In order to motivate the agent to act correctly and optimize the resource overhead during the MEC task offloading process, this paper uses the total resource consumption of the task offloading process as the reward signal. If the current total resource consumption is low, it means that the agent's current behavior is correct. Conversely, if the current total resource consumption is high, it means that the agent needs to adjust the current behavior. The equation for calculating total resource consumption is as follows: among them, E (s, a) represents the total computing resources required to offload the current task. T (s, a) represents the total delay in offloading the current task. Γ (s, a) represents the task loss probability, which is calculated as follows: where n loc represents the number of tasks executed locally, and N t represents the total number of currently submitted tasks unloaded. In addition, s and a represent the current state and behavior of RL, respectively, where s is determined by the input feature matrix, and a is the offloading of the current computing task.

V. EXPERIMENTAL DESIGN AND RESULTS ANALYSIS
In this section, we will verify the effectiveness of the proposed scheme in this paper. We divided the experimental part into two parts. On the one hand, we verify the feasibility of the DP-based location privacy protection strategy. On the other hand, we verify the superior performance of the RLbased task offloading strategy in reducing resource overhead.
In addition, we also analyze the scalability and time complexity of the algorithm.

A. EXPERIMENTAL ANALYSIS OF LOCATION PRIVACY PROTECTION STRATEGY BASED ON DP
In order to simulate the scenario of task offloading in MEC as realistically as possible, this paper selects the real dataset EUA to simulate the relationship between edge servers and users. We select a base station with a longitude of 144.96 and a dimension of -37.81 in the EUA as the edge server, its coverage is 400 meters, and there are about 70 mobile users within the coverage. Then we set the parameters of the MEC network. For the specific configuration, we refer to Chen et al. [33]. For the convenience of viewing, the network parameter settings are shown in TABLE 1. To highlight the effectiveness of the DP-based user location privacy protection strategy, we selected two strategies for comparison with the strategy proposed in this paper, namely the Basic task offloading strategy without considering user privacy and the NDR task offloading strategy without considering the interference range. At the same time, we determined two indicators to measure the effectiveness of the strategy, namely the probability of privacy leakage, and the user's security level.
The security level of mobile users in each period of task offloading can be calculated by the following equation: where ζ is the security and privacy compromise factor of the user's continuous task offloading. PLD n is the probability of user privacy leakage, which can be calculated by equation 3. FIGURE 4 indicates the privacy leakage probabilities of the three strategies. As revealed in the FIGURE 4, the probability of privacy leakage of the three strategies gradually increases, because as time goes on, users continue to send task offloading information to the edge server, and the edge server can infer user privacy from this information. Therefore, the probability of user privacy leakage increases with time. Compared with the Basic strategy without privacy  protection, NDR and the strategy proposed in this paper are at a lower level, which means that interfering with user location information can effectively avoid user privacy leakage. In addition, it is worth noting that since the NDR does not limit the interference range, the edge server still has a certain probability to infer the user's location information, so the probability of privacy leakage is higher than the scheme proposed in this paper.   It can be clearly seen from the FIGURE 5 that the Basic task offloading strategy without privacy protection is relatively low in security. Privacy protection can significantly improve the security level of users and reduce the probability of privacy leakage. At the same time, compared with the NDR strategy that does not limit the interference range, the scheme proposed in this paper can stabilize the user's security level above 0.5, and the security level is improved by about 7%.
In addition, it is worth noting that privacy protection will affect the quality of service for users to a certain extent. Because privacy protection is invisible to users, the intuitive performance of privacy protection is the degradation of service quality. Therefore, we compare the service quality of the three strategies over time. As shown in FIGURE 6, the quality of service of the Basic strategy without privacy protection is higher than that of NDR and the task offloading strategy proposed in this paper, because privacy protection will bring additional delay overhead, such as transmission delay and privacy protection delay . Furthermore, because the NDR does not limit the interference range, the real distance of the user may be blurred too far, so the delay and bandwidth overhead will also increase accordingly.

B. RL-BASED EDGE SERVER SELECTION STRATEGY
To verify the performance of the RL-based edge server selection strategy during task offloading. We compare it with the DDLO algorithm proposed in [34], the classical task offloading algorithm using Q-learning and the task offloading algorithm without resource optimization in terms of total resource consumption and task loss rate.
We use Pycharm and python to code the experiments, and build a policy network based on TensorFlow. The initial parameters of the policy network satisfy a normal distribution, and the learning rate of the agent is 0.005. Finally, we trained 100 epochs using gradient descent. The agent dynamically adjusts the current behavior according to formula 8 to obtain better training effect. FIGURE 7 shows the total resource consumption of the four task offloading strategies. It is obvious that our proposed RL-based task offloading strategy is more advantageous than the other three algorithms in optimizing resource consump-    task loss rates of the other three strategies except the Without strategy decreased rapidly in the first 500 time points, and then gradually stabilized. This is because with the training iteration of the model, the model's understanding of the network environment gradually deepens, and then gradually stabilizes. According to the analysis of specific data, the strategy proposed in this paper reduces task offloading by 19.5% compared to Without, 6.8% lower than DDLO, and 11.4% lower than Q-learning.

C. SCALABILITY AND TIME COMPLEXITY ANALYSIS
The scalability of task offloading strategy refers to the adaptability of the strategy when dealing with a large number of computing tasks. For a scalable strategy, the change of processing efficiency is stable when the amount of data increases. In addition, the scalability of task unloading strategy is closely related to its time complexity. Based on this, we analyze the scalability and time complexity of the DPRL strategy proposed in this paper. The analysis results are indicated in TABLE 2. It can be seen from TABLE 2 that the DPRL strategy proposed by us has high time complexity, but good security and scalability. This is because compared with the other three algorithms, we first need to blur the user information and calculate the reputation of the edge server, so the time complexity is higher than other algorithms. However, in terms of scalability, DPRL has excellent performance. This is because our solution uses RL to solve the problem of task unloading, so the mobility is better. When there are new computing tasks, a good task offloading scheme can be obtained by using a simple pre-train, so it has better scalability.
According to the analysis of the above experimental results, the task offloading strategy based on DP and RL proposed in this paper can effectively solve the privacy leakage in the process of MEC task offloading and optimize the resource overhead.

VI. CONCLUSION
The low latency and strong computing characteristics of MEC promote the development of the Internet of Things. However, due to the lack of consideration of security and resource overhead in the task offloading process, the existing task offloading strategies are gradually unable to meet the high quality of service requirements of users. To address these issues, we propose a task offloading strategy based on DP and RL. This strategy first encrypts the user's private information through DP to ensure security, and then uses RL to select appropriate edge servers for task offloading to optimize resource overhead in combination with MEC