Clustering Based Priority Queue Algorithm for Spatial Task Assignment in Crowdsourcing

Spatial crowdsourcing is an increasingly popular category in the era of mobile Internet and sharing economy, where tasks have spatio-temporal constraints and must be completed at specific locations. In this article, we focus on <italic>the</italic> <bold>M</bold><italic>ulti-</italic><bold>O</bold><italic>bjective</italic> <bold>S</bold><italic>patio-</italic><bold>T</bold><italic>emporal task assignment (MOST) problem</italic> considering the worker heterogeneity in spatial crowdsourcing and model it as a combinatorial multi-objective optimization (MOO) problem with the goals of maximizing the overall task completion rate and minimizing the average task time cost. Finding the optimal global assignment turns out to be intractable since it does not simply imply optimality for an individual worker, as a typical nearest-neighbor heuristic generally does not render a satisfactory result. We prove that the problem is NP-hard. Subsequently, we formulate an efficient algorithm for the MOST problem — <bold>Ta</bold><italic>sk Clustering based</italic> <bold>M</bold><italic>ixed</italic> <bold>P</bold><italic>riority Queue Scheduling (TAMP)</italic>. First, we improve the spectral clustering algorithm to evenly divide the task network into different subdomains according to tasks’ geographical locations, considering the task clustering phenomena in real scenarios. We then design a mixed priority queue strategy considering the geographical influence and temporal urgency, to schedule workers finishing tasks in sequence. Experiments on synthetic and real datasets demonstrate the efficiency of our solution over other methods.

(e.g., 5G network), a new class of crowdsourcing has emerged, called spatial crowdsourcing (SC) [3]. Spatial crowdsourcing advances the potential of a crowd to perform tasks related to real-world scenarios involving physical locations, which were not feasible with conventional crowdsourcing methods. The main feature of spatial crowdsourcing is the presence of spatial tasks that require workers (with smartphones) to be physically present at a particular location for task fulfillment. Its natural connection with the physical world makes spatial crowdsourcing a computing paradigm for a broad spectrum of daily applications, such as real-time ride-hailing services (e.g., Uber) [4], product placement checking in supermarkets [5], road condition monitoring [6], and crowdsourcing-aided positioning [7].
A representative spatial crowdsourcing model consists of three types of participants: requesters (clients), workers (the crowd), and the crowdsourcing platform (server). The general framework of the spatial crowdsourcing model for task assignment is shown in Fig. 1. The task publishers release tasks, also known as human intelligence tasks (HITs), and workers then request these tasks. The crowdsourcing platform acts as a broker between the task publishers and the workers: it aggregates the information of publishers and workers, then assigns tasks to suitable workers using an assignment algorithm. In practice, the spatial crowdsourcing platform is the core of the system and often needs to manage massive numbers of tasks and workers every day.
Thus, the major challenge of spatial crowdsourcing platforms is how to assign large-scale tasks to their workers, i.e., task assignment. In addition, most of the existing studies focus on task assignment based on the whole study area [8], [9], [10], [11], [12], [13].
The platforms usually aim to assign tasks to suitable workers under different optimization objectives, such as maximizing the total number of assigned tasks, maximizing the total payoff of the tasks to their assigned workers, or minimizing the total traveling costs of the allocated workers. The objective is generally determined based on real needs and constraints. For example, one common challenge in spatial crowdsourcing is that the tasks reachable by each worker depend heavily on the distance between origin and destination as well as the tightness of the deadline, both of which have to be treated carefully in constructing the task assignment algorithm.
Therefore, designing an efficient assignment mechanism is of paramount importance for the SC platform, which could improve the system efficiency by increasing the income of workers and saving the cost of the platform. Based on the basic problem characteristics, task assignment in SC can be classified into two different categories: task matching and task scheduling. Task matching provides guidance on which tasks to perform: the assignment mechanism tries to match a set of tasks to workers. Task scheduling provides a plan (or order) to perform tasks located at different places: the assignment mechanism schedules the order of tasks for the workers. The problem of task scheduling is unique to spatial crowdsourcing.
Geographic information is of vital importance in the field of SC and is a necessary condition to allocate tasks in the spatial dimension. As the task allocation problem is NP-hard in its general form [14], it is easier to obtain an accurate solution by dividing a complex spatial problem into multiple sub-problems based on geographical information. However, most of these studies ignored the temporal information of workers and tasks, and thus do not apply readily to an SC application. Niu et al. [15] propose a pricing model based on the distance information and the number of workers, but the expiration time of tasks is not considered.
In addition, in real crowdsourcing situations, the phenomenon of task clustering is often observed, i.e., most of the tasks are concentrated in a few regions of the space instead of being distributed uniformly. The reason for this phenomenon is that community structures are quite common in real networks [16]. For example, in the take-out scenario, the locations of tasks are often concentrated in densely populated areas, such as schools or office spaces, whereas in sparsely populated places or suburbs the distribution of tasks is relatively sparse. If workers are left to select tasks by their own preferences, they will only choose tasks that are closer, so remote tasks may never receive a response, thereby lowering the overall task completion rate.
Based on this observation, we introduce spectral clustering [17], a graph clustering algorithm, into our integrated algorithm to partition the task network. Besides, we improve the spectral clustering algorithm by applying θ-sparseness to reconstruct the affinity matrix, both to achieve reasonable time complexity and to enhance the fairness of subdomain division. Following the divide-and-conquer idea, we first divide the task network into different subdomains based on the tasks' locations in space and then allocate workers to the corresponding subdomains to finish tasks.
Compared to the previous work, the hardness of our problem lies in that, once the traveling cost associated with moving to tasks' locations, the expiration time of tasks, and the heterogeneity of workers are taken into account, the locally optimal assignment does not guarantee global optimality. In other words, assigning the most jobs to each worker does not necessarily imply the maximum number of accomplished tasks by all workers.
To the best of our knowledge, a unified assignment mechanism that considers the spatio-temporal constraints of tasks, worker heterogeneity, and task clustering together, with multiple objectives in the multi-user dynamic environment of SC systems, has not been studied so far. In summary, we make the following contributions:
• To promote the overall performance of task assignment, the task queue scheduling problem is modeled as a multi-objective joint optimization problem, the Multi-Objective Spatio-Temporal task assignment (MOST) problem, which focuses on maximizing the overall task completion rate and minimizing the average task time cost while taking worker heterogeneity into account. Besides, we prove that the problem is NP-hard.
• Considering the clustering phenomena of spatial crowdsourcing tasks, spectral clustering is introduced to divide tasks into subdomains. At the same time, in order to better learn the spatial relationships between tasks and reduce the complexity of the algorithm, we reconstruct the affinity matrix by a θ-sparseness method to improve spectral clustering.
• The proposed algorithm, Task Clustering based Mixed Priority Queue Scheduling (TAMP), integrates two critical factors, the temporal constraints and the spatial information of spatial tasks, into one joint metric and uses it to decide the sequence of task execution. The combined weight of these two metrics is chosen dynamically and is investigated in the design.
• Extensive experiments on both synthetic and real data are performed to compare the proposed scheme with different comparative techniques, and the results show that the new scheme outperforms the others.
This paper extends the initial study [18] by (i) surveying up-to-date literature and summarizing the comparison of task assignment models in Section II; (ii) reformulating the combinatorial multi-objective optimization problem and proving it NP-hard in Section III; (iii) designing a sparseness method to improve the spectral clustering efficiency; (iv) adding more explanations of the model framework and the time complexity analysis of the algorithm in Section III-B3; (v) updating existing figures and adding more diagrams; and (vi) improving the organization and presentation of the paper through a major revision and careful proofreading.

II. RELATED WORK
Spatial Crowdsourcing (SC) can be deemed one of the main enablers for employing smart device carriers as workers who move to specified locations and perform location-based tasks physically [38]. A recent survey on spatial crowdsourcing is [39], which reviews the existing research on major algorithmic issues such as task assignment, quality control, incentive mechanism design, and privacy protection. Moreover, the first challenge of spatial crowdsourcing platforms is task assignment, which is the basis for other research studies.

TABLE I TAXONOMY AND ANALYSIS OF RESEARCH CONCERNING TASK ASSIGNMENT IN SPATIAL CROWDSOURCING

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
In view of the task publishing mode, SC can be classified into two categories, namely the Worker Selected Tasks (WST) mode and the Server Assigned Tasks (SAT) mode [3]. The WST mode gives workers the right to directly select tasks based on their own preferences without coordination with the server [19], [20], while the SAT mode requires the server to assign tasks to interested workers based on the system optimization goals [21], [22], [23]. In WST mode, no specific task allocation algorithm is required, and it suffices for the platform to receive and process the orders of the workers. The users seek the task assignment that best benefits themselves rather than the platform. One drawback of this mode is that the SC server has no control over task allocation. This may result in some spatial tasks never being assigned, while others may be assigned redundantly. (The comparison of the existing works is shown in Table I.) Another drawback of WST is that workers choose tasks based on their own objectives (e.g., choosing the closest spatial tasks to minimize their travel cost), which do not necessarily coincide with the ultimate objective of the SC server (i.e., maximizing the overall task assignment). Besides, incentive mechanisms have been widely used in WST mode. Wang et al. [24] study a worker incentive model combining a genetic algorithm and an ant colony optimization algorithm to maximize the task completion quality while minimizing the incentive budget in the whole area. Zhu et al. [40] propose Incentive-aware Task Location (ITL) for a location-unspecific task with a fixed budget, which aims to maximize the number of workers who are willing to participate in the task. They propose three heuristic methods to solve it: even clustering, uneven clustering, and greedy location methods.
In SAT mode, the server of the crowdsourcing platform assigns tasks to nearby workers usually based on the system optimization goals such as maximizing the number of assigned tasks after collecting all the locations of workers [25], maximizing the total payoff from assigned tasks [26], maximizing the expected total utility achieved by all workers [27], maximizing task reliability for dynamic task assignment [28], maximizing platform profit considering worker utilities simultaneously [29], maximizing the expected quality of results from workers by a real-time budget-aware task package allocation [30], or maximizing the spatial/temporal coverage where/when workers perform tasks [31].
Most existing studies adopt the SAT mode, where an SC server takes charge of the task assignment process. For example, Cheng et al. [32] propose a reliable diversity-based spatial crowdsourcing (RDB-SC) problem in SC, where an SC server assigns tasks to suitable workers in order to maximize the diversity score of assignments. Zhao et al. [33] propose a preference-based task assignment problem and design a tensor-decomposition-based algorithm to learn worker preferences, after which the assignment problem is transformed into a Minimum Cost Maximum Flow (MCMF) problem. However, they all assume that each worker can only perform tasks in a specific spatial region, whereas our model imposes no hard constraint on the working area. Therefore, these works have a much smaller search space in their problem settings compared to ours.
Moreover, within the SAT publishing mode, task assignment can be further classified into two different modes: Single Task Assignment (STA) mode and Redundant Task Assignment (RTA) mode [3]. STA mode assumes that all the workers are trusted and can perform the tasks correctly without any malicious intentions, so each task is assigned to only one worker. However, there inevitably exist some malicious workers who might intentionally complete tasks incorrectly. Therefore, RTA mode is proposed to improve the validity of task completion by assigning each task to several nearby workers. In RTA mode, the task completion result with the majority vote is regarded as correct [34], [35].
Among the above studies in SC, traveling cost is critical, due to the fact that SC workers have to physically move to the locations of spatial tasks in order to perform them [36], [37]. For instance, considering task localness, which refers to workers' preferences based on their traveling cost (i.e., workers are more likely to accept nearby tasks), [36] proposes an effective task assignment framework by modeling task acceptance rate as a decreasing function of travel distance. Cheung et al. [37] formulate the interactions among users as a non-cooperative Task Selection Game (TSG), and propose an Asynchronous and Distributed Task Selection (ADTS) algorithm that balances the rewards and traveling costs of the workers for completing tasks.
In task assignment, modeling the assignment score as the shortest path in visiting the locations of multiple tasks becomes similar to the traveling salesman problem (TSP) and the vehicle routing problem (VRP) [41]. Since there is only one worker in TSP, here we discuss VRP. Different variants of VRP have been studied [42], [43]; still, there are differences between our task assignment problem and these variants. Compared with VRP, our goal is to maximize the overall task completion rate and minimize the average task time cost simultaneously, whereas VRP aims to minimize the total traveling time of all workers. Besides, in VRP, all workers start from the same location, whereas, in our setting, workers have different initial locations.
The latest model is extended to multiple workers in [10], which is the closest related work to our study. In that paper, the authors propose a Task Allocation with Geographic Partition (TAGP) framework for the Multi-Center-based Task Allocation problem (MCTA), which aims to maximize the allocated task number and achieve allocation fairness among workers. More specifically, the work first utilizes a Voronoi diagram mechanism to decompose a complex multi-center graph into multiple smaller single-center-based graphs and then adopts a Reinforcement Learning method to allocate tasks by transforming the task allocation problem into a multiple traveling salesman problem (MTSP). The idea is similar to our work, which is to divide the whole area first and then assign tasks for different subdomains. However, this work still transforms the task assignment problem into an MTSP, which is inconsistent with practice in spatial crowdsourcing, because the graph center assumed by the work does not exist in practice. Besides, worker heterogeneity is not taken into account.
Our problem, which is discussed in this paper, is a version of the task assignment problem considering traveling time cost and worker heterogeneity in STA mode.

III. THE PROPOSED SCHEME
In this section, we first present our model architecture and give a formal statement of the Multi-Objective Spatio-Temporal task assignment (MOST) problem. Then, we explain each part of the proposed scheme in detail.

A. Model Architecture and Problem Statement
Here we investigate a task assignment mechanism under the above spatial crowdsourcing model with Single Task Assignment (STA) mode, referred to as single spatio-temporal task assignment. Specifically, given a user's current location, the platform aims to find an optimal assignment between tasks and workers such that the overall task completion rate is maximized and the average task time cost is minimized. In particular, we note that the task assignment is actually made up of two sub-problems: 1) for each task, we need to assign it to a suitable worker; and 2) for each worker, we need to schedule the sequence in which she performs the assigned tasks. (The list of involved notations is given in Table II.) Before presenting our problem, we first formally define the spatial tasks and the workers in spatial crowdsourcing.
Definition 1 (Spatial Task): A spatial task $s_i$ is characterized by a 2-tuple $s_i = \langle l_{s_i}, e_i \rangle$, which implies that the task $s_i$ is located at $l_{s_i}$ and will expire at time $e_i$. Here $l_{s_i}$ is a position in 2D space expressed by its coordinates $(x_i, y_i)$.
For simplicity and without loss of generality, most studies assume that the processing time of each task is 0 and the workers' speeds are the same, but we consider the worker heterogeneity in our model.

Definition 2 (Worker): A worker $w_k$ is a carrier of a mobile device who volunteers to perform spatial tasks. A worker can be in either online or offline mode. A worker is offline when she is unable to perform tasks and online when she is ready to accept tasks. An online worker is associated with her current location $l_{w_k} = (x_k, y_k)$, her traveling speed $v_k$, and her processing speed $p_k$. In addition, she has to return to her initial departure location $l_{w_k}^0$ before her deadline $e_{w_k}$.
In spatial crowdsourcing, the query of a spatial task $s_i$ can be answered only if a worker $w_k$ is physically located at $l_{s_i}$. Therefore, considering the expiration time of task $s_i$ and the worker's deadline, the task can be completed only if a worker $w_k$ arrives at $l_{s_i}$, finishes the task before its expiration time $e_i$, and returns to her initial departure location $l_{w_k}^0$ before her deadline $e_{w_k}$, which implies the constraint

$$a_{k,i} + \frac{1}{p_k} \le e_i, \qquad a_{k,0} = a_{k,i} + \frac{1}{p_k} + \frac{d(s_i, l_{w_k}^0)}{v_k} \le e_{w_k}, \tag{1}$$

where $a_{k,i}$ is the arrival time of $w_k$ at the location $l_{s_i}$ of task $s_i$, $a_{k,0}$ is the time at which $w_k$ is back at the departure location $l_{w_k}^0$, $\frac{1}{p_k}$ is the time $w_k$ needs to process task $s_i$, and $d(s_i, l_{w_k}^0)$ is the distance between task $s_i$ and the initial location of $w_k$.
Note that in the STA mode, the platform can assign every spatial task to one worker only. Once worker $w_k$ is online, she sends a task inquiry to the server, which includes her current location $l_{w_k}$. The server will take all the available tasks and workers at that time instance into account and return a task sequence to $w_k$.
Let $t$ denote the current time. The distance $d_i^k(t)$ between worker $w_k$ and task $s_i$ at $t$ is calculated as their Euclidean distance, i.e.,

$$d_i^k(t) = \sqrt{(x_i - x_k)^2 + (y_i - y_k)^2}, \tag{2}$$

where $l_{s_i} = (x_i, y_i)$ and $l_{w_k} = (x_k, y_k)$ are respectively the locations of task $s_i$ and worker $w_k$ at the moment $t$.
The maximum allowable remaining time for task $s_i$ is determined by the time left for the task before its expiration time, given by

$$t_i = e_i - t. \tag{3}$$

Based on the foregoing descriptions, we formally state the problem as follows.
Problem Statement (MOST Problem): In a model of spatial crowdsourcing containing a crowdsourcing platform, plenty of task publishers, tasks, and workers, how does the crowdsourcing platform with STA mode simultaneously consider the spatiotemporal interactive information of tasks and the heterogeneity of workers to assign tasks to suitable workers, so as to maximize the number of accomplished tasks and reduce traveling cost of workers?
Optimal Objective: To better analyze and solve the problem, we formalize it as a multiple-objective joint optimization problem, MOST problem, which has two optimal goals: maximizing the overall task completion rate δ and minimizing the average task time cost τ simultaneously.
Because a task can be assigned only to a suitable worker, we let $x_{k,i} = 1$ if worker $w_k$ completes task $s_i$, and $x_{k,i} = 0$ otherwise. Let $S = \{s_1, s_2, \ldots\}$ be the set of all tasks, and let $S_A$ denote the set of tasks that are accomplished under the task assignment strategy $A$. Obviously, $S_A \subseteq S$. Thus, the maximization of the overall task completion rate $\delta$ can be expressed as

$$\max \ \delta = \frac{|S_A|}{|S|} \tag{4}$$

subject to

$$x_{k,i} \in \{0, 1\}. \tag{5}$$

Similarly, the minimization of the average task time cost $\tau$ for accomplished tasks is expressed as

$$\min \ \tau = \frac{1}{|S_A|} \sum_{w_k \in W} \left( \frac{d_{w_k}}{v_k} + t_k^p \right) \tag{6}$$

subject to

$$x_{k,i} \in \{0, 1\}. \tag{7}$$
Here, $d_{w_k}$ is the traveling distance of worker $w_k$, and $t_k^p$ is the time that worker $w_k$ spends on processing tasks.
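The two objectives can be illustrated with a short Python sketch. It assumes that the completion rate is the fraction of accomplished tasks, $\delta = |S_A| / |S|$, and that the average time cost divides the total travel time plus processing time of all workers by the number of accomplished tasks; both forms are a reading of the definitions above rather than a verbatim copy of the paper's equations. The `WorkerLog` record and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkerLog:
    """Hypothetical summary of one worker after an assignment run."""
    speed: float            # v_k, traveling speed
    travel_distance: float  # d_{w_k}, total distance traveled
    processing_time: float  # t^p_k, total time spent processing tasks
    completed: int          # number of tasks this worker accomplished


def completion_rate(logs, total_tasks):
    """delta = |S_A| / |S| (assumed form of the first objective)."""
    if total_tasks == 0:
        return 0.0
    return sum(w.completed for w in logs) / total_tasks


def average_time_cost(logs):
    """tau = sum_k (d_{w_k} / v_k + t^p_k) / |S_A| (assumed form)."""
    done = sum(w.completed for w in logs)
    if done == 0:
        return float("inf")  # no accomplished task, cost is undefined
    total = sum(w.travel_distance / w.speed + w.processing_time for w in logs)
    return total / done


logs = [WorkerLog(2.0, 4.0, 0.5, 2), WorkerLog(1.0, 3.0, 1.0, 1)]
print(completion_rate(logs, total_tasks=4), average_time_cost(logs))
```

The two values pull in different directions: serving a remote task raises $\delta$ but also raises $\tau$, which is why the problem is treated as multi-objective.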
The MOST problem can be proved to be NP-hard by reduction from the Maximum Coverage (MC) problem. In the following, we define the achievable task set for the subsequent proof and then prove that the MOST problem is NP-hard.

Definition 3 (Achievable Task Set (ATS)): A task set $S_A^k$ is called an achievable task set (ATS) for a worker $w_k$ if there exists a task assignment strategy $A_k$ such that
• all the tasks of $S_A^k$ can be completed before their respective expiration times, i.e., $a_{k,i} + \frac{1}{p_k} \le e_i$ for each $s_i \in S_A^k$, and
• worker $w_k$ can return to her departure location on time after completing all tasks in $S_A^k$, i.e., $a_{k,0} \le e_{w_k}$ for each $s_i \in S_A^k$.
Lemma 1: The MOST problem is NP-hard.
Proof: We first introduce the Maximum Coverage (MC) problem, which is proven to be NP-hard [44]. Given a collection of sets $R = \{R_1, R_2, \ldots, R_K\}$ over a set of objects $\Omega$, where $R_i \subseteq \Omega$, and a positive integer $l$, the MC problem is to find a subset $R' \subseteq R$ such that $|R'| \le l$ and the number of elements covered by $R'$ is maximized.
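Returning to Definition 3, the two ATS conditions can be checked for a given visiting order with a few lines of Python. This is a minimal sketch under stated assumptions: the worker moves in straight lines at constant speed $v_k$, each task takes $1/p_k$ time units to process, and the visiting order is supplied by the caller. The `Task` and `Worker` classes and the function `is_achievable` are hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    x: float
    y: float
    expire: float  # e_i


@dataclass
class Worker:
    x: float         # current location l_{w_k}
    y: float
    home_x: float    # initial departure location l^0_{w_k}
    home_y: float
    speed: float     # v_k
    rate: float      # p_k, tasks processed per time unit
    deadline: float  # e_{w_k}


def is_achievable(worker, ordered_tasks, now=0.0):
    """Check both ATS conditions for one fixed visiting order."""
    t, x, y = now, worker.x, worker.y
    for task in ordered_tasks:
        # Arrival time a_{k,i} at the task location.
        t += math.hypot(task.x - x, task.y - y) / worker.speed
        # Processing takes 1 / p_k; the task must be done by e_i.
        t += 1.0 / worker.rate
        if t > task.expire:
            return False
        x, y = task.x, task.y
    # Return time a_{k,0} must respect the worker deadline e_{w_k}.
    t += math.hypot(worker.home_x - x, worker.home_y - y) / worker.speed
    return t <= worker.deadline


w = Worker(0.0, 0.0, 0.0, 0.0, speed=2.0, rate=4.0, deadline=10.0)
print(is_achievable(w, [Task(2.0, 0.0, expire=1.25)]))
```

A task set is an ATS if at least one ordering passes this check, so an exhaustive test would try all orderings, which is only practical for very small sets.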

B. TAMP: Task Clustering Based Mixed Priority Queue Scheduling
Since the MOST problem is NP-hard, a simple greedy algorithm is to use the maximum achievable task set for each worker as the assignment result. This can hardly be a satisfying result, since multiple workers may be assigned the same set of tasks, which may leave more tasks unassigned. In this paper, we propose a spectral clustering based scheme, Task Clustering based Mixed Priority Queue Scheduling (TAMP), which works in the above problem setting for task assignment.
The whole framework of the TAMP algorithm is shown in Fig. 2, with two parts: task network division and worker queue scheduling. First, TAMP initializes the network and reconstructs it by θ-sparseness. Then, the Enhanced Spectral Clustering (ESC) algorithm divides the task network into subdomains according to the tasks' geographical locations. Next, the tasks of each subdomain are allocated to the corresponding workers. Moreover, the task queue for a worker is rearranged by a mixed metric incorporating geographical location information as well as the task's temporal emergency. The target task that the worker needs to accomplish next is then returned by the Mixed Priority Task (MPT) algorithm, which calls two sub-algorithms, Returnable Task (RT) and Not-Returnable Task (NRT). Finally, workers are scheduled to accomplish those tasks according to the final task queue through the Queue Scheduling (QS) algorithm.
1) Task Network Division: In order to group the network of tasks into subdomains, the spectral clustering algorithm is adopted to divide the network into subareas $\{\Omega_k\}$, $k \in \{1, \ldots, |W|\}$. Every $\Omega_k$ has a designated worker, who is mainly responsible for all tasks located inside it.
To apply spectral clustering, the key step is to learn the affinity matrix that measures the similarity among data points. In this paper, we apply θ-sparseness to sparsify the distance matrix and reconstruct the affinity matrix from it. We reconstruct the affinity matrix by matrix sparsification for two main reasons: 1) the spectral clustering algorithm needs to calculate the eigenvectors and eigenvalues of the affinity matrix, and sparsification reduces the computational complexity so that the spatial relationships between tasks can be studied within a reasonable time; 2) sparsification preserves the task information within a much closer neighborhood, so the subsequent subdomain division can divide the subareas as fairly as possible.
We first calculate the geographical distance matrix $G$, where $G_{i,j}$ is the Euclidean distance between each pair of tasks $s_i$ and $s_j$ in 2D space:

$$G_{i,j} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}.$$

Algorithm 1: Enhanced Spectral Clustering (ESC).
Obviously, the distance matrix elements are non-negative. Simply feeding this matrix into the spectral clustering algorithm does not impose any constraint on the graph sparsity, which leads to expensive computation and might introduce noise (i.e., unimportant edges). Moreover, the matrix is so dense that the spectral clustering algorithm cannot focus on tasks in close proximity.
Therefore, we extract the sparse non-negative adjacency matrix $M$ from $G$ by considering only node pairs at a much closer distance. To make the extraction threshold hyperparameter insensitive and not destroy the graph's sparsity distribution, we adopt a relative ranking strategy for the entire graph. Specifically, we mask off (i.e., set to zero) those elements that are larger than a non-negative threshold, obtained by ranking the metric values in $G$. The adjacency matrix can be reconstructed by θ-sparseness as

$$M_{i,j} = \begin{cases} G_{i,j}, & G_{i,j} \le \mathrm{Rank}_{\theta n}(G_{i*}), \\ 0, & \text{otherwise}, \end{cases}$$

where $\mathrm{Rank}_{\theta n}(G_{i*})$ returns the $\theta n$-th smallest value in the $i$th row of the distance matrix $G$, $n$ is the number of nodes, and $\theta$ controls the overall sparsity of the generated graph.
It should be noted that $M$ is not necessarily symmetric under this definition of connectivity, because the threshold is computed row by row. In order to obtain the symmetric affinity matrix required by the spectral clustering algorithm, we define a symmetric affinity matrix $B$ from $M$. This construction differs from the traditional sparsification method that uses the KNN algorithm to calculate the affinity matrix [45]. The latter sets an absolute number threshold to select the neighbors from the distance matrix. In our method, the hyperparameter $\theta$ controls the sparsity of the newly generated graph, and the number of removed elements varies with the size of the graph.
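The θ-sparseness step can be sketched in plain Python as follows. The sketch assumes that the rank index is $\lceil \theta n \rceil$ and that each row's ranking includes the zero self-distance. The symmetrization shown (element-wise maximum of the matrix and its transpose) is only one common choice and is an assumption here, not the paper's definition of $B$; converting the retained distances into similarities is also left out. All function names are hypothetical.

```python
import math


def distance_matrix(points):
    """G[i][j] is the Euclidean distance between points i and j."""
    return [[math.dist(p, q) for q in points] for p in points]


def theta_sparsify(G, theta):
    """Keep, in each row, only entries not larger than the row's
    ceil(theta * n)-th smallest value; all other entries become zero."""
    n = len(G)
    k = max(1, min(n, math.ceil(theta * n)))  # assumed rounding rule
    M = []
    for row in G:
        threshold = sorted(row)[k - 1]
        M.append([g if g <= threshold else 0.0 for g in row])
    return M


def symmetrize(M):
    """Assumed symmetrization: keep an edge if either endpoint kept it."""
    n = len(M)
    return [[max(M[i][j], M[j][i]) for j in range(n)] for i in range(n)]


points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
M = theta_sparsify(distance_matrix(points), theta=0.5)
B = symmetrize(M)
print(M)
print(B)
```

In this three-point example the third task keeps its link to the second, but not the other way round, so the sparsified matrix is asymmetric until it is symmetrized. Because the number of retained entries per row scales with $n$, the same $\theta$ can be reused on task networks of different sizes, which is the contrast with a fixed-$k$ KNN graph drawn in the text.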
As the value of parameter θ largely influences the obtained clusters, we need to carefully regulate the value of θ to make the number of tasks in each cluster more even, which considers the fairness for workers.The choice of θ will be discussed later in Section IV.
According to the above process, we can rebuild a new affinity matrix and divide the task network into different subdomains by the spectral clustering algorithm, as shown in Algorithm 1, Enhanced Spectral Clustering (ESC). Then we consider how to assign a suitable worker to the corresponding subdomain and schedule the worker to accomplish the assigned tasks.
2) Worker Queue Scheduling: At first, we denote the center of the subdomain $\Omega_k$ as $(x_k, y_k)$, which is simply given by

$$x_k = \frac{1}{|\Omega_k|} \sum_{s_i \in \Omega_k} x_i, \qquad y_k = \frac{1}{|\Omega_k|} \sum_{s_i \in \Omega_k} y_i.$$

Additionally, we set the number of clusters to $|W|$ when dividing the clusters. Thus, the tasks in every subdomain can be assigned to a specific worker, because the number of subdomains is equal to the number of workers, as shown in the following part.
Here, we sort the subdomains $\{\Omega_k\}$ by their size $|\Omega_k|$. A subdomain containing more tasks is given priority in being matched with its nearest worker, because the more tasks a subdomain contains, the less traveling cost has to be paid within it to ensure that more tasks are completed. The worker who is nearest to the subdomain center is then allocated the tasks in that subdomain.
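The allocation rule just described can be sketched as follows, assuming that the subdomain center is the mean of its task coordinates, that every subdomain is non-empty, and that each worker receives at most one subdomain (consistent with the number of subdomains being $|W|$). Ties are broken by worker index, which is an arbitrary choice made for this sketch. The function names are hypothetical.

```python
import math


def centroid(tasks):
    """Center of a non-empty subdomain: mean of its task coordinates."""
    n = len(tasks)
    return (sum(x for x, _ in tasks) / n, sum(y for _, y in tasks) / n)


def allocate_workers(subdomains, workers):
    """Match larger subdomains first, each to its nearest free worker.

    subdomains: list of lists of (x, y) task locations
    workers:    list of (x, y) worker locations
    Returns a dict mapping subdomain index to worker index.
    """
    order = sorted(range(len(subdomains)),
                   key=lambda k: len(subdomains[k]), reverse=True)
    free = set(range(len(workers)))
    allocation = {}
    for k in order:
        if not free:
            break
        center = centroid(subdomains[k])
        best = min(free, key=lambda w: (math.dist(workers[w], center), w))
        allocation[k] = best
        free.remove(best)
    return allocation


subdomains = [[(0.0, 0.0)], [(2.0, 0.0), (2.0, 1.0), (2.0, -1.0)]]
workers = [(1.0, 0.0), (50.0, 0.0)]
print(allocate_workers(subdomains, workers))
```

In this example worker 0 is the nearest worker to both subdomains, but the larger subdomain is served first and takes her, leaving the single-task subdomain to the distant worker.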
As shown in Fig. 3, the remaining assigned tasks for a single worker $w_k$ can be formed into a task queue $\Psi_k$, which is rearranged by the mixed priority strategy considering both her geographical distance to each task and the task's temporal emergency. This scheme schedules the worker to accomplish the target task with the highest mixed priority.
In the current task queue of some worker $w_k$, we denote the geographical distance of the nearest task by $d_{\min}^k$ and that of the furthest task by $d_{\max}^k$. In order to make the distance comparable among workers, we first normalize the distances from $w_k$ to the different tasks by defining her spatial priority for each task $s_i$ as

$$\xi_k^{(s)}(i) = \frac{d_i^k - d_{\min}^k}{d_{\max}^k - d_{\min}^k + \epsilon},$$

where $\epsilon$ is a very small number that prevents potential overflow due to division by zero, set to $10^{-6}$ by default. Similarly, we define the temporal emergency of the task $s_i$ for some worker $w_k$ as the maximum allowable remaining traveling time $t_i$. Let $t_{\min}$ denote the remaining time of the most urgent task and $t_{\max}$ that of the least urgent task. The temporal priority of task $s_i$ can be calculated as

$$\xi_k^{(t)}(i) = \frac{t_i - t_{\min}}{t_{\max} - t_{\min} + \epsilon}.$$

To incorporate both metrics when evaluating the importance of a task $s_i$ for $w_k$, a joint mixed priority $\xi_k^{(m)}(i)$ is obtained by combining the spatial and temporal priorities:

$$\xi_k^{(m)}(i) = \alpha \, \xi_k^{(s)}(i) + (1 - \alpha) \, \xi_k^{(t)}(i),$$

where the parameter $\alpha$, which lies between 0 and 1, balances the weights of the spatial and temporal priorities. The influence of $\alpha$ will be discussed and studied in Section IV through experiments. Generally, tasks with smaller values of the mixed priority $\xi_k^{(m)}(i)$ are given higher priority and served first. The results of the algorithm proposed in this paper are quite different from those based solely on time constraints or spatial information. Here we give an example to explain the utility of the mixed priority strategy.
Example 1 (Utility of the Mixed Priority Strategy). Fig. 4 shows the spatial and temporal information of worker $w_1$ and tasks $s_1 \sim s_4$. Suppose that $w_1$ can finish processing 4 tasks in one time unit and that her traveling speed is 2.
The lines with arrowheads show the routes for $w_1$ under different priority strategies. Considering temporal priority only, $w_1$ will first go to process $s_2$, then $s_3$ and $s_4$, and then go back to the initial location (see the yellow route shown in Fig. 4); in this case $w_1$ cannot finish task $s_1$ before its expiration time ($w_1$ accomplishes $s_2$ and $s_3$ at time $t = 7.42$, which exceeds the expiration time $e_1 = 7$ of $s_1$). If only the geographical location information is considered, $w_1$ will first go to process $s_1$, then $s_4$ and $s_3$, and then go back to the initial location (see the green route shown in Fig. 4); in this case $w_1$ cannot finish task $s_2$ before its expiration time ($w_1$ accomplishes $s_1$ at time $t = \frac{2\sqrt{2}}{2} + \frac{1}{4} = 1.66$, and she finishes processing $s_2$ no earlier than time $t = 1.66 + \frac{\sqrt{5}}{2} + \frac{1}{4} = 3.03$, which exceeds the expiration time $e_2 = 3$ of $s_2$).
However, under the mixed priority metric (with $\alpha = 0.5$), which considers both temporal constraints and spatial information, the worker $w_1$ will finish task $s_2$ first (the mixed priority of $s_2$ is the minimum, which is 0.5 167), then $s_1$ and $s_3$, and then $s_4$ (see the red route shown in Fig. 4). The mixed priority strategy thus accomplishes the most tasks, and its route is quite different from those of the purely temporal and purely spatial strategies.
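The mixed priority computation can be sketched in a few lines of Python. The sketch assumes min-max normalization of both the distances and the remaining times with the small constant $10^{-6}$ in the denominator, and a convex combination in which $\alpha$ weights the spatial term; which term $\alpha$ multiplies is an assumption of this sketch. The input numbers are made up for illustration and are not the values of Fig. 4.

```python
EPS = 1e-6  # small constant guarding against division by zero


def min_max(values, eps=EPS):
    """Normalize so the smallest value maps to 0 and the largest to about 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo + eps) for v in values]


def mixed_priorities(distances, remaining_times, alpha=0.5):
    """Combine spatial and temporal priorities; smaller means more urgent."""
    spatial = min_max(distances)
    temporal = min_max(remaining_times)
    return [alpha * s + (1 - alpha) * t for s, t in zip(spatial, temporal)]


def next_task(distances, remaining_times, alpha=0.5):
    """Index of the task with the smallest mixed priority value."""
    scores = mixed_priorities(distances, remaining_times, alpha)
    return min(range(len(scores)), key=scores.__getitem__)


distances = [1.0, 2.0, 4.0]        # hypothetical distances to three tasks
remaining_times = [6.0, 2.0, 5.0]  # hypothetical remaining times t_i
print(mixed_priorities(distances, remaining_times))
print(next_task(distances, remaining_times, alpha=0.5))
```

With these inputs a purely spatial rule ($\alpha = 1$) picks the nearest task, while the mixed rule with $\alpha = 0.5$ picks the second task, which is slightly farther but much more urgent.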
Although a worker can be scheduled by the mixed priority strategy to process the tasks assigned to her in a subdomain, in some extreme cases she is unable to handle the currently assigned tasks. Those tasks are then forwarded to the nearest worker for help, following the specific forwarding rules.
To ensure the shortest collaboration paths and reduce the traveling time, we construct the shortest Hamiltonian path. We use $H_i$ to represent the length of the Hamiltonian path from worker $w_k$ to the $i$th task, created based on the nodes in $\Psi_k$ including the location of $w_k$. We denote the distance between $s_m$ and $s_n$ as $d_{m,n}$; $H_i$ is calculated from these distances, where $u_i$ is the first node of the $i$th path and $e_i$ is the last node.
The worker queue scheduling scheme is shown in the QS algorithm, parts of which are split into the MPT, RT, and NRT algorithms. First, task subdomains are allocated to their designated workers (lines 5-10). Next, the task queue of a worker is rearranged by a mixed metric that incorporates geographical location information as well as the task's temporal emergency (lines 11-16). Finally, the MPT algorithm returns the target task that the worker needs to accomplish next. If the current task is returnable, the scheme selects the next task for the worker by the RT algorithm; otherwise, it re-selects a new task for the worker by the NRT algorithm. In addition, if a task cannot be served by its initially designated worker, it is sent to the nearest worker for help.

Algorithm 4: Returnable Task (RT).
Algorithm 5: Not-Returnable Task (NRT).
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

3) Scheme Analysis: In the task assignment architecture, when the crowdsourcing platform schedules workers to accomplish their tasks, two important problems remain: whether the working time of the worker is exceeded and whether the allowable arriving time of the task is exceeded.
Returnable-In-Time Test: Worker $w_k$ needs to finish her assigned tasks and return to her initial departure location before her deadline $e_{w_k}$. After selecting a target task $s_i$, the worker pre-calculates whether she can finish the task and return to her departure location before her deadline. If so, she moves on to task $s_i$; otherwise, she re-selects a new task from the mixed priority queue and forwards the current task $s_i$ to the nearest worker.
Worker-Still-Online Test: When a worker is assigned a new task by the crowdsourcing platform, the platform tests whether she is still online.
Suppose that worker $w_k$ is assigned a new task $s_i$ but cannot finish it and return to her initial departure location before her deadline. If the task is the last one in the current task queue of $w_k$, the worker enters the offline state; otherwise, she stays online and re-selects a new task from the remaining mixed priority queue.

$$\text{Offline Test} = \begin{cases} \text{offline}, & s_i \text{ is the last and not returnable}, \\ \text{online}, & \text{otherwise}. \end{cases}$$
When a worker enters the offline state, the crowdsourcing platform no longer considers assigning new tasks to her.
When a new task is assigned to a worker who can finish it and return to her initial departure location before her deadline, the worker remains online if her set of remaining tasks is not empty after she finishes the currently assigned task. Otherwise, she enters the online-idle state.
$$\text{Idle Test} = \begin{cases} \text{online-idle}, & \text{the remaining task set is empty}, \\ \text{online}, & \text{otherwise}. \end{cases}$$
If a worker is online-idle, the crowdsourcing platform does not actively assign new tasks to her unless a neighboring worker cannot complete an assigned task (i.e., the task cannot be finished by the neighboring worker, and she is the nearest worker to it).
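The returnable-in-time test and the two state tests above can be combined into a short sketch. It rests on assumptions that go beyond the text: traveling time is Euclidean distance divided by speed, the processing time is passed in directly, and arriving exactly at the deadline counts as on time. The function and parameter names are hypothetical.

```python
import math


def travel_time(a, b, speed):
    """Euclidean distance between points a and b divided by the travel speed."""
    return math.dist(a, b) / speed


def is_returnable(now, pos, task_loc, home, deadline, speed, proc_time):
    """Returnable-in-time test: reach the task, process it, and get back home."""
    finish = now + travel_time(pos, task_loc, speed) + proc_time
    return finish + travel_time(task_loc, home, speed) <= deadline


def worker_state(returnable, is_last_in_queue, remaining_after):
    """Combine the offline test and the idle test into one state label."""
    if not returnable:
        # Offline test: offline only if the failed task was the last in the queue.
        return "offline" if is_last_in_queue else "online"
    # Idle test: idle if nothing remains after the current task.
    return "online" if remaining_after > 0 else "online-idle"
```

Keeping the geometric check separate from the state transition makes it easy to swap in road-network travel times without touching the state logic.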
Reachable-in-Time Test: A subtle issue in the task assignment architecture is that, no matter how hard a worker works, there might be task requests that can never be served. Such requests should be either removed from the queue or delivered to a worker in the nearest region for help.
When a request for task $s_i$ is received, $w_k$ computes the earliest time at which she can reach the location of the task and compares it with the task's maximum allowable traveling time. If the worker cannot reach the task within the maximum remaining allowable traveling time, the task request is either dismissed or forwarded to the nearest subdomain so that another online worker can help. The earliest arrival time depends on the time at which the worker finishes her previous task, and this decision can be calculated by (19) as well.
The platform first allocates each worker to the corresponding subdomain and then assigns a new task to the worker who is the first to complete her previous task and is still online. (When two or more workers finish their previous tasks at the same time, the platform chooses the worker with the smaller id.) The Queue Scheduling (QS) algorithm is shown in Algorithm 2. The target task with the highest mixed priority is assigned to the worker by the MPT algorithm (Algorithm 3). If the worker can finish the current task before her deadline, the RT algorithm (Algorithm 4) returns the reachable task; otherwise, the NRT algorithm (Algorithm 5) re-selects a new task for the worker.
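The reachable-in-time test and the rule for choosing the next worker can be sketched as follows. This is an illustration under assumptions: the earliest arrival time is taken to be the finish time of the previous task plus the Euclidean traveling time, `latest_arrival` is a hypothetical absolute time bound derived from the task's allowable traveling time, and workers are plain dictionaries with hypothetical keys `id`, `prev_finish`, and `online`.

```python
import math


def reachable_in_time(prev_finish, worker_pos, task_loc, speed, latest_arrival):
    """True if the worker can arrive at the task no later than latest_arrival."""
    earliest_arrival = prev_finish + math.dist(worker_pos, task_loc) / speed
    return earliest_arrival <= latest_arrival


def next_worker(workers):
    """Pick the online worker who finishes her previous task first.

    Ties on the finish time are broken by the smaller worker id.
    Returns None when no worker is online.
    """
    online = [w for w in workers if w["online"]]
    if not online:
        return None
    return min(online, key=lambda w: (w["prev_finish"], w["id"]))
```

Sorting on the tuple `(prev_finish, id)` implements the tie-breaking rule without a separate comparison step.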
Conventional spectral clustering typically consists of two time-consuming phases: affinity matrix construction and eigen-decomposition. Constructing the affinity matrix generally takes $O(N^2 d)$ time, and solving the eigen-decomposition problem takes $O(N^3)$ time [46], where $N$ is the data size and $d$ is the dimension. With $N = |S|$ tasks and a constant dimension $d$ for geographical locations, the time complexity of Enhanced Spectral Clustering in Algorithm 1 is $O(|S|^2 + |S|^3)$, which is dominated by the $O(|S|^3)$ eigen-decomposition term.

A. General Setup
We report the results of two sets of experiments on the proposed scheme, using both synthetic datasets (SYN) and real datasets (REAL). All experiments are carried out on a machine with a 6-core AMD R5-4600U CPU and 16 GB of RAM.
In the first set of experiments, we evaluate the impact of the hyper-parameters, in particular θ and α, on the performance of our approach on synthetic datasets. We evaluate the performance through two important metrics: 1) the overall task completion rate and 2) the average task time cost.
In the second part of experiments, we fix the hyper-parameters determined in the first part and evaluate the scalability of our proposed approach by varying the number of tasks and workers on both synthetic and real datasets.

B. Experiments on Synthetic Data Sets
For the synthetic (SYN) datasets, we use random data following two different distributions: uniform (SYN-UNIFORM) and skewed (SYN-SKEWED). In the SYN datasets, 50% of the tasks are generated in twenty clusters (with a standard deviation of 1 and randomly chosen centers), and the other 50% are uniformly distributed; that is, half of the tasks are SYN-SKEWED and the rest are SYN-UNIFORM. This is motivated by the clustering characteristic of tasks in practice.
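A task set with this half-clustered, half-uniform structure could be generated along the following lines. The sketch makes several assumptions not fixed by the description: cluster centers are drawn uniformly from a square region (a side of 200 is used as the default, matching the experimental space), each clustered task picks a center at random and adds isotropic Gaussian noise, and clustered points are not clipped to the region. The function name `generate_syn_tasks` is hypothetical.

```python
import random


def generate_syn_tasks(n_tasks, side=200.0, n_clusters=20, std=1.0, seed=None):
    """Return a list of (x, y) task locations: half clustered, half uniform."""
    rng = random.Random(seed)
    # Randomly chosen cluster centers inside the square region (assumption).
    centers = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_clusters)]

    n_skewed = n_tasks // 2
    skewed = []
    for _ in range(n_skewed):
        cx, cy = rng.choice(centers)
        # Gaussian scatter around the chosen center with the given std.
        skewed.append((rng.gauss(cx, std), rng.gauss(cy, std)))

    # Remaining tasks are spread uniformly over the region.
    uniform = [(rng.uniform(0, side), rng.uniform(0, side))
               for _ in range(n_tasks - n_skewed)]
    return skewed + uniform
```

Passing a fixed `seed` makes a run reproducible, which is convenient when an experiment is repeated many times and averaged.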
1) Effect of Parameter θ: Our experiments first determine the best value of parameter θ for applying θ-sparseness to reconstruct the affinity matrix in Algorithm 1. We conduct experiments on a $200 \times 200$ km$^2$ space in which tasks exhibit a cluster characteristic, as illustrated in Fig. 5, where circles represent tasks in space. The default values of all the parameters used in our experiments are summarized in Table III.
Initially, since the effect of parameter α on our algorithm is not yet known, we set α = 0.5, which implies that the time and distance factors have the same level of impact on the priority of the tasks. Because the synthetic data are generated randomly, the experiments are repeated 1000 times for each value of θ to reduce the impact of randomness on the experimental results, and the means of the metrics are reported.
Fig. 6 illustrates the overall task completion rate δ and the average traveling time cost τ. In general, when θ = 0.007 for θ-sparseness, the overall task completion rate δ is largest, and the average traveling time cost τ is relatively low and close to the minimum obtained in the experiment. Fig. 6 also shows that both measures get worse when θ < 0.007 or θ > 0.007, which indicates that the degree of sparseness of the affinity matrix in spectral clustering affects the performance of the whole model. This suggests that θ = 0.007 is a reasonable choice for spectral clustering, and we fix this value in the following experiments. Moreover, the value of θ is very small, which affects the complexity of spectral clustering: a smaller value of θ can reduce the complexity of the whole algorithm to some extent and improve its performance.
2) Effect of Parameter α: With the value of θ fixed, we next determine the value of parameter α, which sets the weights of the temporal and spatial factors in the integrated priority function. The parameter θ is set to 0.007, and the other parameters are as shown in Table III. To reduce the impact of randomness on the experimental results, the experiments are repeated 1000 times for each value of α, and the means of the metrics are reported. Fig. 7 presents the variation of δ and τ with respect to α. When α = 0.65, the task completion rate δ is the highest, reaching 0.782, and the traveling time cost τ is relatively low. In particular, when α > 0.5 (i.e., $\frac{\alpha}{1-\alpha} > 1$), which indicates that the time priority is more important than the space priority, the task completion rate improves significantly compared with α < 0.5. As shown in Fig. 7, the overall task completion rate δ increases from 0.731 to 0.774, a relative rise of 5.9%, when α is changed from 0.5 to 0.55. Thus, giving the time priority a relatively larger weight can improve the performance of our algorithm. Such a result could be due to the proper combination of these two priorities. On the one hand, if the temporal priority is weighted too heavily, workers may be sent to tasks that are too far away and spend a lot on traveling. On the other hand, a heavily weighted spatial priority may skip tasks that require immediate accomplishment but have lower spatial priority.
Motivated by the results in Fig. 7, we fix α = 0.65 in the following experiments. Because the task completion rate at this setting is higher than 70% on the SYN data, the TAMP algorithm is relatively stable, and the best result among the tested values of α is obtained.

C. Comparison With Other Algorithms
In this part, we use the values of the two hyper-parameters determined in the first two experiments and compare TAMP with suitable baseline methods for task assignment on both synthetic and real data.
1) Baselines: We first briefly present the baseline methods for comparative studies as follows.
• K-MP: The method clusters the tasks for different workers by K-means and then schedules the tasks of every worker by Mixed Priority Queue Scheduling.
• SC-DisGreedy: The method clusters the tasks for different workers by spectral clustering, and then every worker selects the nearest achievable task, which aims to reduce the traveling costs.
• NNH: The nearest neighbor heuristic (NNH) [47] exploits the spatial proximity between tasks by iteratively choosing the nearest available task to the last task added to the task sequence. At each iteration of NNH, the worker chooses the available task that is closest to her current position.
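The nearest-neighbor selection rule behind NNH can be sketched in a few lines. This is a simplified illustration rather than the method of [47]: it assumes Euclidean distances, treats every remaining task as available, and omits all deadline and availability checks. The function name `nearest_neighbor_route` and the dictionary input are hypothetical.

```python
import math


def nearest_neighbor_route(start, tasks):
    """Greedy ordering: always move to the closest remaining task.

    start: (x, y) position of the worker.
    tasks: dict mapping task id -> (x, y) location.
    Returns the list of task ids in visiting order.
    """
    route = []
    pos = start
    remaining = dict(tasks)  # copy so the caller's dict is not modified
    while remaining:
        nearest = min(remaining, key=lambda t: math.dist(pos, remaining[t]))
        route.append(nearest)
        pos = remaining.pop(nearest)  # the worker is now at that task
    return route
```

Because each step looks only at the current position, the rule is fast but can leave urgent distant tasks until last, which is the behavior the mixed priority is designed to avoid.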
2) Results on Synthetic Data: Figs. 8 and 9 show the evaluation metrics δ and τ achieved by different numbers of workers when there are 2000 and 5000 tasks in the SYN data network.The other parameters are the same as shown in Table III.
Intuitively, the rates δ and τ both increase with the number of workers: when more workers are available, it is more likely that a worker can be found for any specific task, and the minimum distance to the task becomes smaller. However, as the number of workers $k$ increases, the growth of the overall task completion rate δ begins to moderate. On the one hand, this could be because the remaining uncompleted tasks are located in remote locations. On the other hand, other factors restrict the growth of δ, such as the traveling speed and the deadlines of workers. Therefore, the growth of the overall task completion rate δ driven merely by the increase in workers can slow down and even saturate.
In terms of the overall task completion rate δ, the TAMP algorithm outperforms the three baseline methods in most cases. When |W| = 200 and |W| = 250 in Fig. 9 (|S| = 5000), δ is lower for TAMP than for SC-DisGreedy, which suggests that α = 0.65 is not the best temporal weight for our algorithm in this setting, i.e., the value of α needs to be adjusted with the model. Even though the parameter setting of the TAMP algorithm might not be optimal in all scenarios (the parameter values are not changed in subsequent experiments), the TAMP algorithm is still significantly better than the other methods in most cases. When |W| = 200 in Fig. 8 and |W| = 500 in Fig. 9, the rate δ of the TAMP algorithm is up to 0.9.
The NNH method has a lower running time since it only considers the distance between workers and tasks, and its average task time cost τ is therefore the lowest. NNH pays more attention to tasks in the neighborhood of workers so as to minimize the traveling time cost of workers, whereas our algorithm seeks the optimal worker-task assignment from a global perspective. Meanwhile, the TAMP algorithm is better than the two remaining methods in the average task time cost τ.
Generally speaking, the TAMP algorithm performs well on SYN data.
The Gowalla dataset is a location-based social network in which users can check in at different spots in their vicinity. Each check-in includes the location and the time at which the user entered the spot. For our experiments, we use the check-in data over a period of one month (i.e., October 2010). Moreover, we assume that Gowalla nodes are the tasks of our spatial crowdsourcing system, and we assume that all the chosen items happen in a single day.
For each check-in, we use its location and time as the location and expiration time of the task. Intuitively, checking in at a spot is equivalent to finishing a spatial task at that location. Worker heterogeneity is considered in the setting, so the processing time and the traveling time differ when the same job is assigned to different workers. For simplicity, the traveling time cost is calculated as the Euclidean distance divided by the worker's traveling speed, and the processing time is calculated from the worker's processing speed.
The initial locations of workers are randomly generated in the restricted rectangular region, and the deadlines of workers are uniformly distributed from 6:00 pm to 8:00 pm. In this set of experiments, we evaluate the scalability of the TAMP algorithm with different numbers of tasks |S|, up to 8000. Figs. 10 and 11 show the rates δ and τ achieved by different numbers of workers when there are 5000 and 8000 tasks in the Gowalla data. On the REAL data, the advantage of the TAMP algorithm in the overall task completion rate δ is more obvious. When |W| = 400 in Fig. 10 and |W| = 700 in Fig. 11, the rate δ of the TAMP algorithm is more than 0.7.

D. Discussion
The experiments demonstrate the efficiency and effectiveness of our proposed method. Real-world application scenarios such as road condition monitoring [6] and crowdsourcing-aided positioning [7] fit the problem setting of our work particularly well. In these circumstances, the distance between the worker and the task affects the number of completed tasks, there are no special requirements on the worker's skills, and there are strict time constraints on task completion. In many applications, the task assignment problem is dynamic rather than static. Real-time task assignment in SC is difficult because tasks and workers arrive unevenly and at random times. The method proposed in this paper is applied to the static state of each batch, where the spatio-temporal information of tasks and workers is obtained in advance.

V. CONCLUSION
In this article, we design an effective task assignment mechanism for spatial crowdsourcing, which assigns spatio-temporal tasks while considering workers' heterogeneity. We formulate a combinatorial multi-objective optimization problem, i.e., the MOST problem, and prove that it is NP-hard. To solve this problem, we propose the Task Clustering-based Mixed Priority Queue Scheduling (TAMP) algorithm, which focuses on task network division and worker queue scheduling. First, we apply θ-sparseness to the spectral clustering algorithm to optimize the network partition and improve the scope of crowdsourcing services. Subsequently, the mixed priority queue scheduling scheme combines the temporal requirement and spatial features into a single priority metric, which schedules workers to complete the assigned tasks in turn. Extensive experiments on both synthetic and real data demonstrate the effectiveness and efficiency of our scheme.
Future work will consider other properties of spatial tasks, such as task rewards and task workload. Moreover, the work can be extended to spatial crowdsourcing in a real-time/online scenario.

Fig. 2. Framework of the TAMP algorithm with two parts: task network division and worker queue scheduling.

Fig. 3. Queue scheduling process for a single worker $w_k$.

Fig. 4. Example 1 for a worker under different priority strategies.

• NNH: In [47], Deng et al. propose an approximation algorithm named nearest neighbor heuristic (NNH).

3) Results on Real Data: Considering real social network scenarios, we use the open real-world dataset from Gowalla$^{1}$. For simplicity, we only sample the data with longitude between −125 and −120 and latitude between 35 and 40 (an approximately 440 km × 557 km rectangular region).
Figs. 10 and 11 show the rates δ and τ achieved by different numbers of workers.

TABLE II. SUMMARY OF SYMBOLS AND NOTATIONS

TABLE III. PARAMETERS OF SIMULATION

Fig. 6. Determining the value of parameter θ.