Privacy-Preserving Cost Minimization in Mobile Crowd Sensing Supported by Edge Computing

To minimize the sensing cost in mobile crowd sensing (MCS) while preserving participants' privacy, in this paper we propose a Data Sensing mechanism with User Privacy Preserved (DS-UPP). We introduce edge computing into MCS to support task allocation and user privacy protection. In DS-UPP, we minimize the amount of data that needs to be submitted based on compressive sensing theory. We also design an algorithm based on local differential privacy theory. Selected participants only need to submit their real data along with the reconstructed data generated by the algorithm. We prove that DS-UPP satisfies $\varepsilon$-differential privacy. We give the mathematical lower and upper bounds on the number of participants needed for task accomplishment under the constraints that the privacy budget is $\varepsilon$ and the recovery error of task data is 0, as well as the average amount of data that should be submitted by a participant. We also evaluate the performance of DS-UPP through simulations. Compared with the existing method PrivKV, DS-UPP reduces the required data amount by about 90% on average while preserving users' privacy.


I. INTRODUCTION
Mobile crowd sensing (MCS) is an efficient way to obtain physical and social data using the mobile devices carried by people. According to IDC, global smartphone shipments reached 333.2 million units in the second quarter of 2019. As more and more sensors are built into mobile devices, their sensing capabilities have improved greatly, and they have become an important supplement to traditionally deployed static sensors. Compared with deploying embedded hardware nodes [1], MCS collects sensing data more flexibly and at lower cost. Therefore, MCS has become increasingly popular in academia and industry, and it is widely used in environmental monitoring [2], [3], intelligent transportation [4], urban management [5], and so on.
In MCS, task allocation is an important problem: appropriate participants need to be selected to provide high-quality data for tasks at low cost. When privacy is taken into consideration, the problem becomes more complex, because the effectiveness of task allocation in MCS usually depends on accurate user information and sensing data, which may leak users' sensitive private information. On the other hand, traditional privacy preservation mechanisms often degrade data quality. When participants submit data, existing techniques, e.g., differential privacy [6] and fuzzy logic based routing [7], usually distort the sensing data, which reduces the accuracy of the MCS organizer's statistical results for the tasks.
The associate editor coordinating the review of this manuscript and approving it for publication was Maurice J. Khabbaz.
In this paper, we protect the privacy of user location information in both the task allocation stage and the data submission stage of MCS, and achieve a trade-off between the privacy protection requirements of participants and the high-quality data requirements of the MCS organizer. We design and implement an MCS Data Sensing mechanism with User Privacy Preserved (DS-UPP). It introduces edge computing into MCS and changes the traditional two-tier architecture (user and cloud) into a three-tier architecture (user, edge computing server, and cloud). Edge computing can reduce response latency and relieve the load on the backbone infrastructure. More importantly, an edge computing server can manage the mobile sensing users in the area where it is deployed, allocating sensing tasks and properly processing the raw sensed data [8]. After appropriate aggregation, users' privacy can be protected efficiently.
In DS-UPP, we try to select participants who can provide high-quality sensing data. Different from existing work, we exploit the relationship among the sensing data of different users based on compressive sensing theory, and minimize the amount of data that needs to be collected. DS-UPP reduces the sensing cost significantly, which is confirmed by experiments. To protect participants' privacy, we design a privacy preservation algorithm for their submitted sensing data based on local differential privacy (LDP). Our algorithm has two main advantages. First, it is executed locally by each participant and provides quantitative privacy preservation without relying on any other trusted entity. Second, the data published by the participants have the same statistical characteristics as the original data: there is no noise in the published data, and no statistical calculation is required to restore the original information.
Our main contributions in this paper are as follows:
1. We introduce edge computing into MCS and propose a three-tier architecture, under which we make innovative designs for task allocation and user data submission. In task allocation, the edge servers distribute the task requirements to users, so users' precise personal information is no longer necessary. For data submission, edge servers help the participants submit high-quality perturbed data that do not expose their privacy.
2. We define the sensing cost minimization problem with privacy preserved, and propose the DS-UPP mechanism for it. In DS-UPP, we develop an algorithm based on compressive sensing to minimize the amount of necessary sensing data, as well as an algorithm based on LDP to protect the participants' privacy.
3. We analyse the theoretical characteristics of DS-UPP. We prove that DS-UPP satisfies ε-differential privacy. We give the mathematical lower and upper bounds on the number of participants needed for task accomplishment under the constraints that the privacy budget is ε and the recovery error of task data is 0, as well as the average amount of data that should be submitted by a participant. We also implement a simulator and conduct extensive simulations to evaluate the performance of DS-UPP. For comparison, we implement the algorithm PrivKV. Experimental results show that DS-UPP reduces the sensing cost by almost 90% on average under the same privacy and data quality requirements.
This paper is organized as follows. In Section II, we discuss related work. In Section III, we formalize the system model and define the problem. We then introduce our proposed MCS architecture and the DS-UPP mechanism in Section IV. In Section V, we theoretically analyse the characteristics of DS-UPP, and in Section VI, we evaluate its performance through simulations. We conclude in Section VII.

II. RELATED WORK
When MCS is adopted, coverage quality [9], i.e., the number of sensing readings at each task location, is used to measure how well the MCS tasks are allocated. To select the smallest number of users while guaranteeing quality, the MCS organizer needs to know the location of every user when assigning tasks. Reference [10] investigates the situation in which participants perform multiple tasks. The interdependency among multiple tasks is taken into account when allocating the tasks, which optimizes the overall utility of the tasks while ensuring the sensing quality of each one. It can be observed that, to improve the efficiency of task execution in a complex environment, the organizer needs detailed and rich information about the participants, which makes participants' privacy vulnerable to leakage.
It is challenging to design privacy-preserving task allocation mechanisms in MCS. For privacy reasons, accurate information about participants cannot be sent directly to the MCS organizer, which may lead to unreasonable task assignments. To achieve a trade-off between privacy preservation and the effectiveness of task allocation, [11] proposes a method in which users obfuscate their locations before reporting them, while minimizing the total distance traveled by the mobile users to accomplish the sensing tasks. Reference [12] introduces different privacy levels for users and also tries to optimize the users' travel distance. These methods add noise to the user's distance to the task according to LDP. However, they still reveal that a participant is near the task in which he or she is involved. Moreover, neither method protects location privacy when users submit data.
In [13], tasks are allocated from the perspectives of both users and the centralized server. Both the worker-selected tasks (WST) model and the server-assigned tasks (SAT) model are considered, balancing the requirements of user privacy and task accomplishment quality. Letting users privately choose the tasks they want to perform protects their locations, but the MCS system is less efficient because task assignments are not controlled. Other work designs anonymization methods for participants. Reference [14] proposes a privacy-preserving method using group signatures. Reference [15] proposes a way for mobile nodes such as vehicles to interact with each other through a blockchain. There are also mechanisms based on deep reinforcement learning [16]. However, these works add complexity to the approach. Different from the above work, we protect participants' task choices while controlling work efficiency through private participant selection. The location information in the data submitted by participants is also protected. We investigate the relationship among the sensed data of different areas to select high-quality pieces, minimizing the amount of data needed while preserving users' privacy.
Different technologies have been developed to protect data privacy in MCS, e.g., approaches based on distributed agents [17] and on differential privacy [18]-[21]. In the most closely related work [22], a data privacy protection algorithm for key-value data is designed. It privatizes the user-executed task (key) and the user's data (value) for the task based on LDP theory, which ensures that the perturbed data have the same statistical characteristics as the original data, e.g., the same estimated key frequency and mean value. However, it needs a large amount of sample data, which means that its cost is high. Our work differs from it in that we optimize the sensing cost by minimizing the amount of data necessary for the tasks.
VOLUME 8, 2020
Mobile edge computing (MEC) has been used to protect the privacy of healthcare Internet of Things devices [23]. For the MEC-based architecture for MCS, [24] proposes that the system load can be reduced and users' privacy can be better protected by partitioning and distributing users' sensitive data. Unlike our work, it focuses on the architecture and application scenarios, and provides no specific algorithms for privacy preservation or sensing cost minimization.

III. SYSTEM MODEL AND PROBLEM DEFINITION
The architecture of the mobile crowd sensing system supported by edge computing is shown in Figure 1. Sensing task requests are usually of different types and come from different regions. Different requests can be assigned to different mobile edge computing (MEC) servers, each of which is deployed in, and in charge of the sensing tasks of, a certain area. MEC servers select appropriate users as participants for the tasks, collect the data submitted by them, and then submit the data to the MCS organizer after a proper aggregation process. Thus, in this architecture, the MEC server plays three important roles. First, it distributes tasks according to sensing requests and selects suitable participants to join. Second, it collects and aggregates the data submitted by the participants and ensures that the tasks are completed successfully. Third, it isolates participants from the organizer, which effectively reduces the threat of participants' privacy leakage, especially after appropriate data aggregation.
We introduce some definitions and notations as follows.

A. DEFINITION
An MEC server can be responsible for multiple different requests if they come from the area where the server is deployed and which it is in charge of. For a request, we denote all its required tasks as the set $T = \{t_1, t_2, \dots, t_N\}$. We denote the users as $U = \{u_1, u_2, \dots, u_j, \dots\}$ and the selected participants as $U_c \subseteq U$. $C_i = (c^i_1, c^i_2, \dots, c^i_N)$ denotes the real sensing actions of user $u_i$: $c^i_n = 1$ means that $u_i$ really performs task $t_n$. In contrast, $S_i = (s^i_1, s^i_2, \dots, s^i_N)$ denotes the publishing actions of $u_i$: $s^i_n = 1$ means that $u_i$ publishes data of task $t_n$, although these data may not be real and $u_i$ may not in fact have performed the task. This perturbation is applied purely for privacy preservation. We use the correlation between $S_i$ and $C_i$ to measure the degree of privacy threat to participant $u_i$.
In this paper, the participant's privacy is mainly related to his preferences in task selection, his location, and the published data, which may reveal his trajectories and other private information. Our work focuses on how participants publish data. Our goal is to guarantee that certain statistical features of the published data are the same as those of the original sensed data, while attackers can hardly obtain private information from the published data. Therefore, we regard the degree of privacy threat as equivalent to the risk of sensitive information being leaked.
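Definition 4: Privacy budget $\varepsilon$. We quantify the degree of privacy threat to participants based on local differential privacy theory: for every task $t_n$, every published value $s \in \{0, 1\}$, and any $c, c' \in \{0, 1\}$, $P(s^i_n = s \mid c^i_n = c) \le e^{\varepsilon} P(s^i_n = s \mid c^i_n = c')$.
Definition 5: Task data error $\hat{E}$. We use the mean absolute deviation to measure the sensing data error. $\hat{E}$ is the average error over the tasks in $T$, computed on the published data $D$.
Definition 6: The total amount of data required to complete the tasks, $H$, is the sum of the data collected by all participants, i.e., $H = \sum_{u_i \in U_c} N_i$, where $N_i = \|C_i\|_0$ is the number of tasks performed by $u_i$.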
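To make Definitions 5 and 6 concrete, the following Python sketch computes $\hat{E}$ and $H$. It assumes that $\hat{E}$ compares, for each task, the published (aggregated) value with its ground-truth value, that $H$ sums $N_i = \|C_i\|_0$ over the participants, and that each $C_i$ is stored as a 0/1 NumPy vector. The function names are illustrative only and are not part of DS-UPP.

```python
import numpy as np

def task_data_error(published, ground_truth) -> float:
    """Mean absolute deviation between published and true values, averaged over tasks."""
    published = np.asarray(published, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    return float(np.mean(np.abs(published - ground_truth)))

def total_data_amount(selection_matrices) -> int:
    """H: sum over participants of N_i = ||C_i||_0 (number of non-zero entries)."""
    return int(sum(np.count_nonzero(c_i) for c_i in selection_matrices))

# Two participants over N = 4 tasks: u_1 senses t_1, t_3; u_2 senses t_2, t_3, t_4.
C = [np.array([1, 0, 1, 0]), np.array([0, 1, 1, 1])]
print(total_data_amount(C))                                      # 5
print(task_data_error([20.0, 21.5, 19.0], [20.0, 21.0, 20.0]))   # 0.5
```

The condition $\hat{E} = 0$ used later in Theorem 3 corresponds to `task_data_error` returning exactly zero.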
We explain all the symbols used in the article.

B. SENSING COST MINIMIZATION PROBLEM WITH PRIVACY PRESERVED
We define the sensing cost minimization problem with participants' privacy preserved as follows.
$$\min H \qquad (4)$$
subject to constraints (5) and (6), where $c \in C$, $s \in S$, $u_i \in U_c$, and $U_c \subseteq U$. Constraint (5) ensures that the MCS tasks are completed successfully, i.e., that the amount of submitted sensing data for each task $t_n$ meets the task requirement for all tasks in $T$. Constraint (6) is the privacy budget requirement of participant $u_i$.

IV. DATA SENSING MECHANISM UNDER USER PRIVACY PRESERVING IN MOBILE CROWD SENSING
We design the DS-UPP privacy protection mechanism, which satisfies the privacy preservation requirements while minimizing the amount of data necessary for MCS.
We introduce the mechanism in two stages. In the first stage, the MCS organizer allocates tasks to users in a privacy-preserving way. In the second stage, users submit their sensed data to the MCS organizer in a privacy-preserving way.

A. TASK ALLOCATION BASED ON COMPRESSIVE SENSING
The MCS organizer coordinates all sensing requests and assigns them to proper MEC servers according to the regions where the tasks are located. MEC servers then select participants based on compressive sensing theory to minimize the amount of sensing data needed to accomplish the tasks. Users choose the tasks for which they can collect sensing data, based on their trajectories or preferences. MEC servers are deployed at base stations or routers, which provide sufficient computing power and external power supply.
After choosing tasks, every user generates the Task Selection Matrix (TSM), and then calculates the correlation between TSM and the sparse transformation matrix, based on the compressive sensing theory.
Based on the correlation value of each user, the MEC server selects the optimal combination of users as participants. Participants are selected in a greedy manner, i.e., the users with the largest correlation values are selected first. Figure 2 shows the participant selection process.
In detail, task allocation in DS-UPP consists of three key steps.
1) When the MEC server receives a request assigned by the organizer, it divides the request into the task set $T = \{t_1, t_2, \dots, t_N\}$. At the same time, it uses the second-order difference matrix as the sparse transformation matrix. For sensing data, the data transformed by the second-order difference matrix can reach a sparsity of 5% [25]. A principal component analysis algorithm based on singular value decomposition is presented in [25]; we use this method to decompose different types of historical data to obtain a sparse transformation matrix.
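The second-order difference transform can be sketched as below. The sketch assumes the common tridiagonal form with $-2$ on the diagonal and $1$ on both off-diagonals (written $\Psi$ here); the SVD-based construction of [25] is not reproduced. Because each interior row computes $d_{n-1} - 2d_n + d_{n+1}$, piecewise-linear task data map to a vector that is zero except where the slope changes and at the two boundary rows. The data values are synthetic and chosen only for illustration.

```python
import numpy as np

def second_order_difference_matrix(n: int) -> np.ndarray:
    """N x N tridiagonal matrix: -2 on the diagonal, 1 on both off-diagonals.

    Row k computes d[k-1] - 2*d[k] + d[k+1]; rows 0 and N-1 are truncated at
    the boundary, which keeps the matrix invertible.
    """
    return (np.diag(np.full(n, -2.0))
            + np.diag(np.ones(n - 1), k=1)
            + np.diag(np.ones(n - 1), k=-1))

def sparsity_ratio(theta: np.ndarray, tol: float = 1e-9) -> float:
    """Fraction of transformed coefficients whose magnitude exceeds tol."""
    return float(np.count_nonzero(np.abs(theta) > tol)) / theta.size

N = 100
psi = second_order_difference_matrix(N)
n = np.arange(N)
# Synthetic task data: a linear trend whose slope changes once, at task index 50.
d = np.where(n < 50, 20.0 + 0.1 * n, 25.0 - 0.05 * (n - 50))
theta = psi @ d
print(sparsity_ratio(theta))  # 0.03 (rows 0, 50 and 99 are non-zero)
```

The truncated boundary rows are a design choice: they make $\Psi$ invertible, which is needed because the recovery step works with $\Psi^{-1}$.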
2) The MEC server distributes the task set $T$ and the $N \times N$ sparse transformation matrix $\Psi$ to all users associated with it. Each user $u_i \in U$ selects tasks from $T$ according to his preferences and generates the Task Selection Matrix $C_i$. $u_i$ then calculates the correlation $\mu_i$ between $C_i$ and $\Psi$ as follows.
The matrix $\Phi_i$ used in this calculation is generated from $C_i$. The number of rows of $\Phi_i$ is the number of tasks collected by $u_i$, i.e., $N_i$, and the number of columns is $N$. $\Phi_i$ carries the same information as $C_i$ in a different form. In $\Phi_i = (\phi_{mn})_{N_i \times N}$, $N_i = \|C_i\|_0$ is the number of non-zero elements in $C_i$, and $N$ is the number of all elements in $C_i$. The following equation gives the mathematical definition of $\Phi_i$, and Figure 3 shows an example of how $\Phi_i$ is generated from $C_i$.

$$\phi_{mn} = \begin{cases} 1, & \text{if the } n\text{th element of } C_i \text{ is its } m\text{th non-zero element},\\ 0, & \text{otherwise}. \end{cases} \qquad (8)$$
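Equation (8) and the correlation computed in step 2) can be sketched as follows. Equation (7) is not available in this text, so the sketch assumes the classical mutual coherence $\mu = \sqrt{N}\max_{k,j} |\langle \phi_k, b_j\rangle| / (\|\phi_k\|\|b_j\|)$ between the rows $\phi_k$ of $\Phi_i$ and the columns $b_j$ of a representation basis. Since the sparse transformation matrix (written $\Psi$) maps the task data to their sparse form, the sketch passes $\Psi^{-1}$ as that basis. Both helper names are hypothetical.

```python
import numpy as np

def phi_from_selection(c_i) -> np.ndarray:
    """Build Phi_i from a 0/1 Task Selection Matrix C_i, following (8).

    phi[m, n] = 1 iff the n-th entry of C_i is its m-th non-zero entry.
    """
    c_i = np.asarray(c_i).ravel()
    selected = np.flatnonzero(c_i)          # indices of tasks sensed by u_i, in order
    phi = np.zeros((selected.size, c_i.size))
    phi[np.arange(selected.size), selected] = 1.0
    return phi

def mutual_coherence(phi: np.ndarray, basis: np.ndarray) -> float:
    """sqrt(N) * max |<phi_k, b_j>| over unit-normalized rows of phi and columns of basis."""
    n = basis.shape[0]
    rows = phi / np.linalg.norm(phi, axis=1, keepdims=True)
    cols = basis / np.linalg.norm(basis, axis=0, keepdims=True)
    return float(np.sqrt(n) * np.max(np.abs(rows @ cols)))

# Example: N = 5 tasks; user u_i senses t_2, t_4 and t_5.
c_i = np.array([0, 1, 0, 1, 1])
phi_i = phi_from_selection(c_i)             # shape (3, 5)
psi = np.diag(np.full(5, -2.0)) + np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
mu_i = mutual_coherence(phi_i, np.linalg.inv(psi))
print(phi_i.shape, mu_i)
```

Each row of $\Phi_i$ is a unit vector selecting one sensed task, so $\Phi_i D$ simply extracts the entries of $D$ that $u_i$ observed.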
In response, $u_i$ sends the correlation value $\mu_i$ back to the MEC server.
3) After receiving the correlation values from all users, the MEC server sorts them in descending order. The server also calculates the number of required participants $|U_c|$ from the relationship between the number of participants and the data error threshold. Finally, the server greedily selects $|U_c|$ participants, i.e., the first $|U_c|$ users in descending order of their correlation values.
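Step 3) amounts to a sort followed by a cut-off. The following minimal sketch assumes that the correlation values arrive as a mapping from a user identifier to $\mu_i$ and that the number of participants $|U_c|$ has already been computed; the identifiers and values are hypothetical.

```python
def select_participants(correlations: dict[str, float], num_participants: int) -> list[str]:
    """Greedily pick the users with the largest correlation values.

    Python's sort is stable (also with reverse=True), so users with equal
    correlation values keep their arrival order.
    """
    ranked = sorted(correlations, key=correlations.get, reverse=True)
    return ranked[:num_participants]

mu = {"u1": 0.42, "u2": 0.91, "u3": 0.67, "u4": 0.15}
print(select_participants(mu, 2))  # ['u2', 'u3']
```

Sorting costs $O(|U| \log |U|)$ on the server, which is negligible compared with the per-user computation of $\mu_i$.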

B. PRIVACY PRESERVATION OF SENSING DATA BASED ON LDP
In this subsection, we introduce the method by which participants perturb data locally before publishing them. The working process of participants is shown in Figure 4.
Participant $u_i$ first collects sensing data according to the task requirements, as shown in Fig. 4(a). He then recovers the data of all tasks using Algorithm 1, as shown in Fig. 4(b); that is, prediction is used to generate the data of the tasks that $u_i$ did not sense. After that, $u_i$ applies Algorithm 2 to perturb the data before publishing them, which protects his privacy.

Algorithm 1
Input: raw sensing data $D_i$; task acquisition matrix $\Phi_i$; sparse transformation matrix $\Psi$; the number $K$ of non-zero elements in $\Theta$
Output: data of all tasks $\hat{D}_i$
1: $A = \Phi_i \times \Psi^{-1}$
2: Initialize the residual $R_0 = D_i$, the vector index set $V_0 = \emptyset$, and the vector set $A_0 = \emptyset$
3: for ($t = 1$; $t \le K$; $t$++) do
4: for (column vector $a_n$ in $A$) do
5: $\lambda_t = \arg\max_{n \in \{1, 2, \dots, N\}} |R_{t-1} \cdot a_n|$
6: end for
7: Calculate the least squares solution $\hat{\Theta}_t$:

1) RECOVER ALL TASK DATA
We denote the sensing data of participant $u_i$ as $D_i$. According to the previous definitions, $D_i$ is related to the data $D$ of all tasks through $\Phi_i$, as given in (9), where the superscript $T$ denotes the matrix transpose. The sparse transformation matrix $\Psi$ transforms $D$ into the sparse representation $\Theta$.
$N$ is the number of all tasks, $N_i$ is the number of data items collected by $u_i$, and $K$ is the number of non-zero elements in $\Theta$, where $N > N_i > K$.
Since (9) is an under-determined equation ($N_i < N$), it cannot be solved for $D$ directly. Instead, we recover $D$ through the sparse reconstruction problem (10).
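For reference, a standard compressive-sensing formulation consistent with the reduced matrix $A = \Phi_i \Psi^{-1}$ used in Algorithm 1 is sketched below, writing $\Phi_i$ for the task acquisition matrix, $\Psi$ for the sparse transformation matrix, and $\Theta$ for the sparse representation. It is the usual $\ell_0$ problem and its $\ell_1$ relaxation, not a verbatim copy of (9) and (10), and it treats $D$ and $D_i$ as column vectors.

```latex
\begin{aligned}
D_i &= \Phi_i D, \qquad \Theta = \Psi D
  \;\Longrightarrow\; D_i = \Phi_i \Psi^{-1} \Theta = A\Theta, \\
\hat{\Theta} &= \arg\min_{\Theta} \|\Theta\|_0
  \quad \text{s.t.} \quad D_i = A\Theta
  \qquad (\|\Theta\|_0 \text{ is relaxed to } \|\Theta\|_1 \text{ for a convex program}), \\
\hat{D} &= \Psi^{-1} \hat{\Theta}.
\end{aligned}
```

Because $A$ has only $N_i < N$ rows, $D_i = A\Theta$ alone has infinitely many solutions; it is the sparsity prior, together with a sensing matrix that satisfies the RIP, that singles out $\hat{\Theta}$.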

Algorithm 2 Privacy Preserving for Participant Data
Input: all task data $\hat{D}_i$ of $u_i$; Task Selection Matrix $C_i$ of $u_i$; privacy budget $\varepsilon$
Output: the published data $P_i$ of $u_i$

Candès has shown that when the sensing matrix satisfies the restricted isometry property (RIP), reconstruction becomes a solvable optimization problem [26]. $\hat{D}$ can be obtained through (10), and identifying the non-zero elements of $\Theta$ is a convex optimization problem. Following Orthogonal Matching Pursuit (OMP) [27], Algorithm 1 generates the sparse matrix $\hat{\Theta}$ and restores the data of all tasks $\hat{D}$; its time complexity is $O(KN)$. First, we calculate the reduced-dimension transformation matrix $A = \Phi_i \Psi^{-1}$. Then, we initialize the residual $R_0 = D_i$. In each iteration $t$, we select from $A$ the column vector $a_{\lambda_t}$ that has the largest inner product with the residual $R_{t-1}$. We then compute the least-squares solution of the linear equation $D_i = A_t \Theta_t$ using all column vectors selected so far; the number of iterations is $K$. From the index set $V_K$ we obtain $\hat{\Theta}_K$, and finally the data for all tasks, $\hat{D}$, are obtained.
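The OMP loop described above can be sketched in NumPy as follows. This is a generic OMP implementation written for illustration, not the authors' code. To keep the example checkable by hand, it uses a $4 \times 4$ normalized Hadamard matrix as the sparse transformation matrix (written $\Psi$, and equal to its own inverse) instead of the second-order difference matrix, together with a 1-sparse $\Theta$.

```python
import numpy as np

def omp_recover(d_i, phi_i, psi, k):
    """Recover all task data from the N_i observed values with OMP.

    d_i: observed data (length N_i); phi_i: N_i x N selection matrix;
    psi: N x N sparse transform (Theta = psi @ D); k: sparsity level K.
    """
    d_i = np.asarray(d_i, dtype=float)
    a = phi_i @ np.linalg.inv(psi)              # A = Phi_i Psi^{-1}
    residual = d_i.copy()                       # R_0 = D_i
    support: list[int] = []                     # index set V_t
    coef = np.zeros(0)
    for _ in range(k):
        scores = np.abs(a.T @ residual)         # |<R_{t-1}, a_n>| for every column
        scores[support] = -np.inf               # never pick the same column twice
        support.append(int(np.argmax(scores)))  # lambda_t
        a_t = a[:, support]                     # A_t: columns chosen so far
        coef, *_ = np.linalg.lstsq(a_t, d_i, rcond=None)
        residual = d_i - a_t @ coef             # R_t
    theta = np.zeros(a.shape[1])
    theta[support] = coef                       # Theta_hat_K
    return np.linalg.solve(psi, theta), theta   # D_hat = Psi^{-1} Theta_hat

h = 0.5 * np.array([[1, 1, 1, 1],
                    [1, -1, 1, -1],
                    [1, 1, -1, -1],
                    [1, -1, -1, 1]], dtype=float)
d_true = h @ np.array([0.0, 0.0, 5.0, 0.0])    # D = Psi^{-1} Theta, with Psi = h
phi_i = np.eye(4)[:3]                          # u_i sensed tasks t_1..t_3 only
d_hat, theta_hat = omp_recover(phi_i @ d_true, phi_i, h, k=1)
print(d_hat)  # [ 2.5  2.5 -2.5 -2.5]
```

The value of the fourth task, which $u_i$ never sensed, is recovered exactly because $\Theta$ is 1-sparse and the three observed rows of the Hadamard matrix are only weakly correlated. The same function applies unchanged with the second-order difference matrix, but exact recovery then depends on whether enough data are collected, as discussed in Section V.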

2) PARTICIPANT LOCAL DATA PRIVACY
We use $D_i$ to represent the raw sensing data of participant $u_i$. $C_i$ and $D_i$ indicate the user's real choice of tasks and the real sensed data, respectively. We use $S_i$ to indicate the tasks to be published by $u_i$. Through Algorithm 1, the data of all tasks, $\hat{D}_i$, can be obtained, so $u_i$ can publish data for any task selected in $S_i$. We protect user privacy based on LDP theory, as shown in Algorithm 2: the relationship between the tasks in the data published by $u_i$ and the real tasks of $u_i$ satisfies the $\varepsilon$-LDP constraint. We set the relationship between $c^i_n \in C_i$ and $s^i_n \in S_i$ as follows.
Specifically, there are four cases for the elements of $C_i$ and $S_i$. 0 → 0: the participant neither senses nor publishes. 1 → 0: the participant senses but does not publish. 0 → 1: the participant does not sense but publishes. 1 → 1: the participant senses and publishes. Any elements $c^i_n$ in $C_i$ and $s^i_n$ in $S_i$ of the same task $n$ satisfy $\frac{P(s^i_n = 1 \mid c^i_n = 1)}{P(s^i_n = 1 \mid c^i_n = 0)} \le e^{\varepsilon}$ and $\frac{P(s^i_n = 0 \mid c^i_n = 0)}{P(s^i_n = 0 \mid c^i_n = 1)} \le e^{\varepsilon}$. Following Algorithm 2, participants process the sensing data locally and generate the published data. The MEC server collects the published data from all associated participants and sends them to the MCS organizer to accomplish the required tasks.
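The following sketch shows one perturbation that meets the two ratio conditions above; it is not necessarily identical to Algorithm 2. It assumes symmetric randomized response: each entry of $C_i$ is kept with probability $e^{\varepsilon}/(1+e^{\varepsilon})$ and flipped otherwise, so both ratios equal exactly $e^{\varepsilon}$. The published values are taken from the recovered data $\hat{D}_i$ for every task with $s^i_n = 1$. The function names and the dictionary output format are assumptions made for illustration.

```python
import math
import numpy as np

def keep_probability(eps: float) -> float:
    """P(s_n = c_n): probability that a task-selection bit is reported truthfully."""
    return math.exp(eps) / (1.0 + math.exp(eps))

def perturb_selection(c_i, eps: float, rng: np.random.Generator) -> np.ndarray:
    """Symmetric randomized response applied to each bit of C_i, giving S_i."""
    c_i = np.asarray(c_i, dtype=int)
    keep = rng.random(c_i.size) < keep_probability(eps)
    return np.where(keep, c_i, 1 - c_i)

def publish(d_hat_i, c_i, eps: float, rng: np.random.Generator) -> dict[int, float]:
    """Return {task index: value} for every task with s_n = 1, using recovered data."""
    s_i = perturb_selection(c_i, eps, rng)
    return {int(n): float(d_hat_i[n]) for n in np.flatnonzero(s_i)}

rng = np.random.default_rng(0)
d_hat_i = np.array([21.0, 20.5, 19.8, 22.1, 20.9])   # recovered data for N = 5 tasks
c_i = np.array([1, 0, 1, 0, 0])                       # tasks u_i actually sensed
print(publish(d_hat_i, c_i, eps=1.0, rng=rng))
```

Because the values for unsensed tasks come from $\hat{D}_i$ rather than from random noise, a published value looks the same whether or not $u_i$ actually sensed that task; only the selection bits are randomized.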
The time complexity of Algorithm 2 is $O(N)$. When a user performs enough tasks, the missing data generated by the user can be error-free. Theorem 2 gives a lower bound on the number of tasks a participant must collect; if this lower bound is not satisfied, the generated data will be distorted, and in that case we use the mean or mode of all participants' data. In practical applications, more incentives can be provided to motivate users to perform more tasks, and multiple data items are collected for each task to ensure data accuracy.

V. THEORETICAL ANALYSIS
In this section, we analyse the performance of DS-UPP theoretically. Firstly, we prove that the published data in the DS-UPP paradigm satisfies the LDP privacy constraint.
Theorem 1: The data published by any participant in DS-UPP satisfy ε-differential privacy.
Besides privacy preservation, DS-UPP also optimizes the selection of high-quality sensed data to minimize the sensing cost in (4). From [28], we obtain the following theorem, which gives a lower bound on the amount of data that every participant should submit to satisfy the requirements of compressive sensing.
Theorem 2: When the average amount of data submitted by a participant in DS-UPP satisfies this lower bound, the original data can be recovered without error [25].
For the number of participants needed in DS-UPP, we have the following theorem, which gives its lower and upper bounds.
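Theorem 3: Under the constraints that the privacy budget is $\varepsilon$ and the recovery error of task data is $\hat{E} = 0$, the number of participants $m_c$ needed for task accomplishment in DS-UPP must satisfy $\log_{1+e^{\varepsilon}} N \le m_c < \log_{\frac{1+e^{\varepsilon}}{e^{\varepsilon}}} N$.
Proof: Assume that the numbers of tasks performed by the participants are $N_1, N_2, N_3, \dots, N_i, \dots, N_{m_c}$. For participant $i$, the probability that any given task is taken by him is $\frac{N_i}{N}$. Hence, the probability that a given task is not involved in the data submitted by participant $i$ is
$$q_i = \frac{N_i}{N} \cdot \frac{1}{1+e^{\varepsilon}} + \left(1 - \frac{N_i}{N}\right) \frac{e^{\varepsilon}}{1+e^{\varepsilon}}.$$
Because participants perform tasks independently, after $k$ participants submit data, the probability that the task is not involved is $\prod_{i=1}^{k} q_i$. Therefore, if the data completely cover all $N$ tasks, we must have $N \prod_{i=1}^{m_c} q_i \le 1$, which simplifies to $\sum_{i=1}^{m_c} \log \frac{1}{q_i} \ge \log N$. Since $q_i$ is a weighted average of $\frac{1}{1+e^{\varepsilon}}$ and $\frac{e^{\varepsilon}}{1+e^{\varepsilon}}$, we know that $\frac{1}{1+e^{\varepsilon}} \le q_i \le \frac{e^{\varepsilon}}{1+e^{\varepsilon}}$, from which we can analyse the upper and lower bounds of the number of necessary participants.
In the worst case, participants collect only the smallest possible samples, so $\frac{N_i}{N}$ can be regarded as 0 and the above condition reduces to $m_c \log \frac{1+e^{\varepsilon}}{e^{\varepsilon}} \ge \log N$. We thus get $m_{\max} = \log_{\frac{1+e^{\varepsilon}}{e^{\varepsilon}}} N$. In the best case, each participant collects as many samples as possible, $\frac{N_i}{N}$ approaches 1, and we have $m_{\min} = \log_{1+e^{\varepsilon}} N$. Therefore, the range of $m_c$ is $\log_{1+e^{\varepsilon}} N \le m_c < \log_{\frac{1+e^{\varepsilon}}{e^{\varepsilon}}} N$. Combining the above two theorems, the total amount of data $H$ needed in DS-UPP for the optimization problem (4) equals the number of participants $m_c$ multiplied by the average amount of data per participant, and is therefore bounded by the ranges given in Theorems 2 and 3.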
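The bounds of Theorem 3 are easy to evaluate numerically. The sketch below assumes, as the bases of the logarithms in Theorem 3 indicate, that a task-selection bit is reported truthfully with probability $e^{\varepsilon}/(1+e^{\varepsilon})$. A participant who senses a fraction $r_i = N_i/N$ of the tasks then omits a given task with probability $q_i = r_i/(1+e^{\varepsilon}) + (1-r_i)\,e^{\varepsilon}/(1+e^{\varepsilon})$. The function names are hypothetical.

```python
import math

def participant_bounds(n_tasks: int, eps: float) -> tuple[float, float]:
    """Return (m_min, m_max) from Theorem 3 for N tasks and privacy budget eps."""
    m_min = math.log(n_tasks) / math.log(1.0 + math.exp(eps))
    m_max = math.log(n_tasks) / math.log((1.0 + math.exp(eps)) / math.exp(eps))
    return m_min, m_max

def uncovered_probability(ratios, eps: float) -> float:
    """prod_i q_i: probability that one task is absent from every submission."""
    p_keep = math.exp(eps) / (1.0 + math.exp(eps))
    prob = 1.0
    for r in ratios:                       # r = N_i / N for each participant
        prob *= r * (1.0 - p_keep) + (1.0 - r) * p_keep
    return prob

eps = math.log(3)                          # e^eps = 3
m_min, m_max = participant_bounds(64, eps)
print(round(m_min, 2), round(m_max, 2))    # 3.0 14.46
```

With $N = 64$ and $e^{\varepsilon} = 3$, three participants who each sense every task give $N \prod_i q_i = 64 \cdot 0.25^3 = 1$, matching $m_{\min}$. In the worst case ($r_i \to 0$), 14 participants leave $64 \cdot 0.75^{14} > 1$ while 15 give $64 \cdot 0.75^{15} < 1$, consistent with $m_{\max} \approx 14.46$.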

VI. PERFORMANCE EVALUATION
To evaluate the performance of DS-UPP, we implement a simulator in Python and conduct extensive simulations. We analyse the influence of different parameters, i.e., the number of tasks, the privacy budget, and the data sparsity ratio, as shown in Figure 5. The three scenarios include 10, 1,000, and 100,000 tasks, respectively. We use the Data Sample Ratio (DSR) as the performance metric; it is the average amount of data needed to accomplish one task, i.e., the amount of data that must be collected for each task.

A. THE AMOUNT OF DATA REQUIRED OF DS-UPP
We evaluate the DSR under different settings of the privacy budget and the data sparsity ratio. The privacy budget varies from 0.1 to 10 and the data sparsity ratio varies from 0.1 to 1, both with a step size of 0.1. As the privacy budget approaches 0, the privacy of the data is strongest. As the privacy budget increases, privacy protection weakens and eventually converges: when the privacy budget is larger than 10, the privacy level hardly changes. Each result is the average of 100 simulations, and in every simulation the sensing data of each user are generated randomly.
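The parameter sweep described above can be enumerated as in the following Python sketch. The constants restate the values given in the text; the variable names are illustrative, and the simulator itself is not shown.

```python
import itertools
import numpy as np

TASK_COUNTS = (10, 1_000, 100_000)                     # the three scenarios
EPSILONS = np.round(np.arange(1, 101) * 0.1, 1)        # privacy budget 0.1, 0.2, ..., 10.0
SPARSITY_RATIOS = np.round(np.arange(1, 11) * 0.1, 1)  # sparsity ratio 0.1, ..., 1.0
RUNS_PER_SETTING = 100                                 # random data regenerated per run

grid = list(itertools.product(TASK_COUNTS, EPSILONS, SPARSITY_RATIOS))
print(len(grid), len(grid) * RUNS_PER_SETTING)         # 3000 300000
```

Generating integer multiples and rounding avoids the accumulated floating-point error of `np.arange(0.1, 10.1, 0.1)`, which can otherwise include or exclude the end point unexpectedly.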
Section V proves that DS-UPP meets the privacy requirement. The experiments show that the DSR of DS-UPP is below 2.25, 7, and 12 in the three scenarios, respectively, which means that the average amount of sample data needed for each task is less than 2.25, 7, and 12. The DSR also increases with both the sparsity ratio and the privacy budget, and converges to an upper bound, which decreases slightly as the sparsity ratio increases.
A larger sparsity ratio requires more acquired values to recover all data. Thus, the size of the sampled data becomes larger and the DSR increases, as shown in the left half of the curves in Figure 5(a) and (b). On the other hand, when the privacy budget is large enough, the probability that the data stay true during perturbation approaches 1, and the sample size is mainly affected by the number of tasks that have no collected data. In this case, as the sparsity ratio increases, the number of samples also increases; as a result, the number of tasks missing data is reduced and the DSR decreases, as shown in the right half of the curves in Figure 5(a) and (b). As the total number of tasks increases, the relative change in the number of tasks missing data becomes smaller and smaller. Thus, the DSR changes more and more slowly in the right half of the curves in the three sub-figures.
Increasing the privacy budget loosens the privacy requirement, but it also increases the DSR. This is because a larger privacy budget raises the probability that the data stay true, and thus lowers the probability that users without sensed data for a task submit perturbed data as if they had finished that sensing task. As a result, more participants are needed for the MCS tasks, and the DSR increases.
The number of tasks affects the amount of data collected by each participant according to Theorem 2, and it affects the number of participants according to Theorem 3. As a result, there is a logarithmic relationship between the number of tasks and the DSR.
There are two special cases. First, for a small number of tasks, the amount of data needed is almost the same for adjacent sparsity ratios, e.g., 0.1 and 0.2 or 0.2 and 0.3, as shown in Figure 5(a). Second, when the sparsity ratio is 0.9 or 1, the amount of necessary data is almost half of the total number of tasks. Therefore, regardless of how the privacy budget changes, the number of tasks without sensing data in the submission remains the same, and the DSR stays unchanged.

B. THE COMPARISON OF THE AMOUNT OF DATA NECESSARY WHEN THE DATA IS ACCURATE
We also implement PrivKV [22] for comparison. PrivKV privatizes the user-executed task (key) and the user's data (value) for the task based on LDP theory, which ensures that the perturbed data have the same statistical characteristics as the original data. DS-UPP differs from PrivKV in two respects. First, DS-UPP uses compressive sensing to generate data for the tasks the user did not collect, whereas PrivKV generates random fake data. Second, the data generated by DS-UPP are directly the data required by the MCS tasks, and no statistical calculation is needed.
The data accuracy of both DS-UPP and PrivKV increases with the amount of data collected by users. We therefore compare the minimum amount of data that must be collected under both the privacy constraint and the data accuracy constraint.
For both PrivKV and DS-UPP, we use the same range and distribution of task values in the simulations. We evaluate their DSR under four parameter settings, i.e., different sparsity ratios and privacy budgets, as shown in Figure 6. The x-axis of each sub-figure denotes the number of tasks. The DSR of PrivKV does not change with the number of tasks, because in PrivKV the DSR depends only on the range of task values, which remains the same in the simulations. In addition to the average DSR values of DS-UPP, we also evaluate its lower and upper bounds, which are shown as the DS-UPP domains in the figures. The sample size required by DS-UPP is much smaller than that required by PrivKV, with a reduction of about 90% in each parameter setting. This demonstrates the effectiveness of our optimization in selecting high-quality sensing data, which greatly reduces the necessary sensing cost.

VII. CONCLUSION
In this paper, we designed the DS-UPP mechanism to maximize data efficiency while protecting users' privacy in MCS supported by edge computing. Edge servers help the participants submit high-quality perturbed data that do not expose their privacy. In DS-UPP, we develop a compressive sensing based algorithm to minimize the amount of necessary sensing data, and an LDP-based algorithm to protect participants' privacy. We analyse the performance of DS-UPP theoretically and evaluate it through simulations. Experimental results show that DS-UPP reduces the sensing cost by almost 90% on average compared with the existing algorithm PrivKV.