Maximum Influential Location Selection With Differentially Private User Locations

The widespread use of mobile devices and social network services has made optimal location queries an important research topic. Previous studies have focused on the problem of maximum inﬂuential (Max-inf) location selection, that is, ﬁnding a location that can attract as many clients as possible. The location information of each client should be collected to process such a query. However, client location is considered sensitive information. Therefore, a privacy protection technique should be applied to Max-inf problems. Motivated by this, we propose a Max-inf problem query-processing technique with differentially private client location information. Furthermore, we present a Voronoi region-based technique to guarantee query accuracy and a Voronoi envelope-based pruning heuristic to improve query performance.


I. INTRODUCTION
During the past decades, a vast amount of geo-spatial data has been collected by various location-based services owing to the widespread use of mobile devices. The increasing amount of location data can provide exciting opportunities to support market analysis, such as decision making problems of competitive location selection and establishment of public facilities. Especially, previous studies of geo-spatial data have focused on the maximum influential (Max-inf) location selection problem, that is, finding a location that can attract as many clients as possible. These applications generally assume that customers trust data analysts and agree to the collection of their location information without any restrictions. However, location data are typically collected by telecommunication operators and social network service providers rather than data analysts. In addition, places that people visit disclose extremely sensitive information, such as their behavior, home and work locations, preferences, and habits. Therefore, people dislike disclosing their exact location, and location privacy has become an emerging issue in the spatial database community. For this reason, location-based service providers usually exploit privacy preserving data analyzing techniques The associate editor coordinating the review of this manuscript and approving it for publication was Jerry Chun-Wei Lin . when offering customer location data to data analysts. Motivated by this, we propose a novel privacy preserving query processing technique that finds the best location to establish a new facility while satisfying privacy requirements.
Optimal location selection [1]- [7] is a common problem that finds the best location to add a new facility that optimizes an objective function. Specifically, a Max-inf problem [1]- [3] is a traditional problem that identifies the most influential object in a given database which consists of potential objects P, existing facilities F, and client locations C, as depicted in Fig. 1. A Max-inf problem finds a location that maximizes influence on clients under the assumption that each client utilizes the nearest facility. We occasionally use the words ''user,'' ''client,'' and ''customer'' interchangeably.
A Max-inf problem is extremely useful for market analysis and beneficial in several real-life applications. For example, Starbucks may want to open a new branch to compete with other coffee shop franchises, or a telecommunication service provider may want to add a new base station in a densely populated area to improve their service quality. Fig. 1 shows two existing facilities, {f 1 , f 2 }, and five clients, {c 1 , c 2 , c 3 , c 4 , c 5 }. We want to determine the best location between three different potential locations, {p 1 , p 2 , p 3 }, to establish a new facility.
We assume that f 1 is the nearest facility of {c 1 , c 2 , c 3 }, and f 2 is the nearest facility of {c 4 , c 5 }. If a new facility is established at p 1 , then it becomes the nearest facility of {c 2 }. If we select location p 2 , then we can attract clients {c 3 , c 4 }. However, if we select location p 3 , then it becomes the nearest facility for every clients, that is, {c 3 , c 4 , c 5 }. Therefore, p 3 is the most attractive location. A conventional approach to support such decision making is to utilize computational geometry techniques with the assumption that the exact locations of customers are known. However, customers provide information only to trusted service providers rather than data analysts.
To remedy this problem, we present novel Max-inf problem query-processing techniques while applying differential privacy to client location data. Differential privacy [8] is a de facto standard privacy protection technique that applies a randomized mechanism to add controlled noise into statistical query results. A naïve approach to apply differential privacy to a Max-inf problem is to add a Laplace noise to its objective function, which is called a Laplace mechanism. We invoke reverse nearest neighbor (RNN) queries for each potential location p i ∈ P and identify clients whose nearest neighbor is p i . Then, we add a Laplace noise to the number of clients and select the most influential location. However, if the potential locations are close to each other, then client location information is disclosed recursively. For instance, the RNN clients of p 2 and p 3 are {c 3 , c 4 } and {c 3 , c 4 , c 5 }, respectively, as shown in the previous example. The intersection of the two RNN clients is {c 3 , c 4 }, which indicates that their location is leaked twice by p 2 and p 3 . This problem is called sequential composition [8] in differential privacy and requires dividing the privacy budget by the number of potential locations. However, sequential composition degrades query accuracy exponentially and suffers from the number of potential locations. Thus, we propose a Voronoi region-partitioning method (VPM) that partitions the Voronoi region of a potential location to exploit the parallel composition [9]. Although the VPM mitigates the degradation of performance accuracy, it suffers from expensive computational cost. To reduce computational cost, we exploit 2 r-trees for potential locations and client locations with aggregate r-tree for existing facilities, which is a variation of the r-tree [19]. We present a pruning technique called the Voronoi envelope filtering method (VEM), which precomputes the upper-bound of the noisy count of influece regions to reduce search space.
In summary, our contribution is threefold. First, we present a Max-inf problem with a differential private user location. To the best of our knowledge, this study is the first attempt to apply differential privacy directly to the query processing of optimal location selection. Second, we present two algorithms to improve query performance, namely, the VPM and the VEM. Third, we study the properties of the proposed methods empirically on an actual dataset. The remainder of the paper is organized as follows. Section 2 reviews studies relevant to the Max-inf problem and differential privacy, and Section 3 formalizes problem definitions and presents system models. Sections 4 and 5 propose two query processing techniques, the VPM and the VEM. Section 6 provides the experimental results from an actual dataset. Finally, Section 7 concludes the paper and recommends directions for future work.

II. RELATED WORKS AND BACKGROUNDS
In this section, we review existing studies relevant to the Max-inf problem and differential privacy. Location optimization problems are characterized by optimization functions and can be classified into three categories, namely, Max-inf, Min-sum, and Min-dist problems. These problems are closely related to RNN queries. Therefore, we first briefly review RNN studies.

A. RNN QUERIES
The RNN has received significant attention in research [10]- [13] since its introduction by Korn and Muthukrishnan [10]. These authors were the first to study RNN queries and present a general approach to solve such queries. The authors precalculated the nearest neighbor distance for each data object and found its surrounding circle that its radius is the nearest neighbor distance. Then, for any query q, each point is the RNN for q that contains q in its circle. Yiu et al. [11] first studied the problem of RNN queries on road networks. They proposed the Eager algorithm, which is a filter and refinement method based on the network expansion approach. The Eager algorithm traverses the network around the query point q with ascending order of the shortest distance from q to each node of the network. For each node n retrieved, the Eager algorithm performs a range-NN query in the range d(n, q). If the data object p is retrieved, then all the nodes with the shortest path to q that pass through n can be pruned. Vlachou et al. [12] extended the RNN to the reverse top-k query (RkNN), which retrieves an object in a weighted feature space. An RkNN query is used to assess the impact of a potential product in the market. This option is based on the number of clients that identify a top-k product according to their preference. The authors introduced a threshold algorithm-based method, that is, the RTA, to solve the RkNN problem. Lu et al. [13] investigated the reverse spatial and textual kNN (RSTkNN) search, which considers textual similarities in RkNN retrieval. An RSTkNN query is used to find objects that take a specified query object as one of its k-most spatial-textual similar objects. The authors proposed a hybrid index structure, namely, the intersection union r-tree (IUR-tree) to answer the RSTkNN query. The IUR-tree consists of an r-tree with inverted files for each node. The leaf VOLUME 8, 2020 nodes contain entries with location and keyword information, whereas each intermediate node has union and intersection vectors for the keywords of its child. The authors designed a branch-and-bound algorithm on the basis of the IUR-tree to solve the RSTkNN problem.

B. MAX-INF PROBLEMS
The Max-inf problem was first introduced by Cabello et al. [1]. It maximizes the influence of a facility, where influence indicates the number of clients who are the RNNs of a facility. The authors found regions for a new facility through the nearest location circle (NLC). The NLC of a client c is a circle centered at c, and its radius is the distance between c and the nearest facility of c. Only a facility established within the NLC of c can be the new nearest facility of c. Therefore, a solution can be obtained by finding regions that are enclosed by the largest number of NLCs. Wong et al. [2] also studied the Max-inf problem using a first polynomial time complexity algorithm called the MaxOverlap. The authors likewise exploited NLCs to avoid evaluating intersection points that are guaranteed not to be optimal. Yan et al. [3] presented an approximate method for the Max-inf problem. The authors designed an efficient influential location miner called FILM, which returns a small grid cell where all locations have an influence guarantee. Contrary to existing approaches that return a precisely optimal location at the expense of long running time, the authors' approach returns near optimal locations in considerably less time. Meanwhile, Zhang et al. proposed the Min-dist location selection problem [4]. This method finds points within Q given a client set C, an existing facility set F, and a region Q. Thus, if a new facility is established at any one of these points, then the average distance of the clients to their respective nearest facilities is minimized. To solve the problem, the authors proposed a method that initially identifies a set L of candidate locations from Q and then divides L progressively until an answer set is found. Qi et al. [5] also resolved the Min-dist problem and proposed the maximum NFC distance (MND) method. The MND is a variation of the minimum bounding rectangle (MBR), which is combined with the NLC. Moreover, Xiao et al. [6] and Chen et al. [7] presented an optimal location selection query in a road network environment.

C. DIFFERENTIAL PRIVACY
Differential privacy was first introduced by Dwork and Roth [8]. The aim of differential privacy is to mask the differences in queries among neighboring datasets. Its definition and properties are as follows.
Definition 1: Differential privacy. Given two neighboring databases D 1 ,D 2 , such that D 1 − D 2 1 ≤ 1, a randomized mechanism M is -differential private if the following condition holds for all S ⊆ Range(M ).
where is the privacy budget, and · 1 is a norm of a vector. One method to achieve -differential privacy is to use a Laplace mechanism, as explained in the previous section. It simply adds noise sampled from a Laplace distribution to the query results, where the noise is proportional to the sensitivity of mechanism M . Definition 2: Sensitivity. For two neighboring data sets D 1 ,D 2 , the sensitivity of M captures the magnitude by which a single individual's data can change the output of M in the worst case, as follows: Differential privacy satisfies simple composition properties, which are called sequential composition and parallel composition as follows.
Definition 3: Sequential Composition. Let M 1 and M 2 be two differential private mechanisms and their privacy budgets be 1 and 2 , respectively. Then, their combination, M 1,2 , is ( 1 + 2 )-differentially private. Therefore, the composition of multiple differentially private mechanisms leads to a linear increase in the privacy budget or an increase in noise to maintain a fixed total privacy budget.
Definition 4: Parallel Composition. Let M i provide i -differential privacy and D i be an arbitrary disjoint subset of database D. Then, the sequence of Existing studies on differential privacy that is related to our work have used private spatial decomposition techniques. Location-based services involve several privacy concerns. For example, a location-based server aims to hide the number of people in a region, and this range query can be solved by differential privacy. Cormode et al. [14] applied spatial decomposition methods, which are a type of dataset-partitioning mechanisms, to decrease noise. The authors instantiated a hierarchical tree structure to decompose a geometric space from large to small areas with data points partitioned among the leaves. In addition, the authors added noise to the count for each node. Qardaji et al. [15] identified the selection of partition granularity to balance errors from two sources as the key challenge in differentially private synopsis methods. The authors proposed a methodology for selecting grid size for the uniform grid method on the basis of the analysis of the dependence of errors on grid size. Li et al. [16] proposed a matrix mechanism that can answer sets of linear counting queries. The set of queries, which is defined as a workload, is transformed into matrix A, where each row contains the coefficients of a linear query. The essential element of the matrix mechanism is to select A to represent a set of queries. The matrix mechanism can be extended to various approaches based on the selection of A. For example, if A is an identity matrix, then this mechanism can be a normal Laplace mechanism for batch queries. Zhang et al. [17] created a quadtree for spatial datasets. The authors defined a threshold to determine the minimum of a subdomain and another threshold to 83730 VOLUME 8, 2020 limit the height of a quadtree. The amount of noise added to the quadtree can be limited to a constant by using the two thresholds. Several researchers have focused intensively on the private spatial decomposition method in spatial differential privacy. However, private spatial decomposition is inadequate in optimal location queries owing to its dataset skewness. When applying private spatial decomposition to the Max-inf problem, accuracy would be poor if the users are not evenly distributed. However, users are usually skewed in real-world applications, and Max-inf problems are often out of alignment with grid cells. Fig. 2 shows this alignment problem. The gray area shows the influence region of p 1 . Then, c 3 and c 4 are outside the region, but their grid cells overlap with the region. Since we don't know the exact locations for c 3 and c 4 , it is difficult to decide whether the influence region of p 1 include them or not. Therefore, we utilize the Voronoi region-based approach rather than the grid cell-based approach to process the Max-inf problem.

III. PROBLEM DEFINITION
We formally define the problems. Table 1 summarizes the notations frequently used in the study. All data objects are represented by points in the Euclidean space. Let d(, ) denote the Euclidean distance between two points. Then, the Max-inf location selection problem is defined as follows.
Definition 5: Max-inf location selection query. The Max-inf finds an optimal location that maximizes the influence on clients under the assumption that each client utilizes the nearest facility given existing facilities F, potential location P, and client location C.
In (3), every client c ∈ C is associated with a positive weight w(c) that captures the importance of the client. Generally, every client has the same importance. Thus, we set w(c) as 1 for all the clients in this study. As prescribed in Section 1, a random noise drawn from a Laplace distribution can be added to the objective function to find an optimal location with a differentially private approach. The magnitude of the noise depends on sensitivity, and the sensitivity of the counting problem is 1. Thus, the Max-inf problem with a differentially private user location is defined as follows.
Definition 6: Max-inf with a differentially private user location. Given the same object datasets as the Max-inf problem, the DP-Max-inf finds a location that maximizes influence with a differentially private client location.
The DP-Max-inf simply changes the objective function from (3) to (5) by adding a Laplace noise. As described in the previous sections, the influence regions of potential locations overlap each other. Therefore, the DP-Max-inf divides by the number of potential locations to achieve total -differential privacy. Hence, we set the privacy budget as = /|P| in the naïve approach. However, if we apply sequential composition to the DP-Max-inf problem in the naïve approach, then it will suffer from highly poor performance accuracy. Therefore, we present an enhanced approach, SC enhanced . SC enhanced computes influence regions for each potential location p i . The influence region is a subspace of the Euclidean space in which the customers are the RNN of p i . Then, SC enhanced checks whether they overlap and calculates how much noise should be added to satisfy -differential privacy. Finally, SC enhanced improves accuracy by providing tighter noise bounds than the naïve approach because not all the influential regions of the potential location overlap in general. The proposed methods exploit the Voronoi diagram [18] of existing facilities and the potential location. The Voronoi diagram is defined as follows. The proposed methods determine the influence regions of each potential location after the Voronoi diagram is constructed. Then, the influence region is the Voronoi cell of the corresponding potential location. The influence region is defined as follows.
Definition 8: Influence region. For any object o in the Euclidean space, the influence region of the potential location p ∈ P on the set F ∪ {p} is a region V (p) that satisfies that for any point p ∈ F ∪ {p}, p = p and for any point The influence region of p encloses all and only the clients in IS(p). We use the influence region to quickly identify IS(p). Thus, we redefine (3) as IS(p) = {c|c ∈ C ∧ c ∈ V (p)}. Next, we determine the overlapped potential location set, in which the influence regions overlap with the influence region of each potential location.
Definition 9: Overlapped potential location set. For any potential location p and its influence region V (p), an overlapped potential location set is a subset of potential locations that hold the following condition.
Finally, we redefine the objective function of DP-Max-inf with the overlapped potential location set as follows:

IV. VORONOI REGION PARTITIONING METHOD
Even if we use SC enhanced to solve the DP-Max-inf problem, accuracy degrades exponentially with the maximum cardinality of the overlapped potential location set. Therefore, we propose the VPM to further improve accuracy.

A. BASELINE APPROACH OF VPM
The VPM changes a sequential composition to a parallel composition; thus, it mitigates the degradation of accuracy. The VPM constructs an influence region of each potential location p i ∈ P with existing facilities F. Then, it finds the overlapped potential location set OP(p i ), and the VPM enumerates the combinations of overlapped regions. Thereafter, the VPM counts the number of clients who are located in each partitioned region and adds a Laplace noise. Finally, the VPM sums up the noisy count of every partitioned region. Definition 10: Partitioned influence region. The meet operation is defined as Meet(S) = ∩ u∈S u, which is extracted from the basic theorem on Galois lattice [23]. We also define Voronoi regions of potential locations in OP(p i ) as VOP(p i ) = {V (p j )|p j ∈ OP(p i )}. Then, given potential location p i , influence region V (p i ) and its overlapped potential location set, OP(p i ), the partitioned influence region, PV (p i ), is a set of regions that holds the following condition.
For example, Fig. 3 shows the Voronoi regions of the facilities and the partitioned influence regions of each potential location presented in Fig. 1. Fig. 3 (a) shows the Voronoi regions, which are composed of {f 1 , f 2 , p 1 }, and Fig. 3 (b) and (c) are constructed by {f 1 , f 2 , p 2 } and {f 1 , f 2 , p 3 }, respectively. Then, Fig. 3 (d) and (e) show the partitioned influence regions of {V (p 1 ), V (p 2 ), V (p 3 )}. The partitioned regions are constructed to compute the noisy count of p 1 , as shown in Fig. 3 (d). The Voronoi region of p 1 is composed of the following partitioned influence regions.
As shown in Fig.3 (e), region A contains c 2 , whereas the other regions of PV (p 1 ) contain no clients. Let n X is the noise of the partitioned region X . Then, the noisy count of V (p 1 ) is calculated as follows: Similarly, the Voronoi region of p 2 is partitioned with regions {B, C, D, E}, and its noisy count is 2 + n B + n C + n D + n E . The noisy counts of regions B and C are reused because they have been computed during the process of V (p 1 ). If we compute the noisy count of potential location p, then the total noise of ncount(p) is proportional to O( √ |PV (p)|) owing to parallel composition. Therefore, query accuracy is better than SC enhanced if |PV (p)| is less than |OP(p)| 2 . However, the computation cost of the VPM is extremely high, because the time complexity of finding PV (p) is O(2 |PV (p)| ) in the worst case. Thus, the baseline algorithm of the VPM is based on the divide-and-conquer framework. We utilize the following remark to compute the partitioned influence region. The overall procedures are described in Algorithms 1 and 2. Algorithm 1 computes the partitioned influence region of each potential location, whereas Algorithm 2 shows the overall query-processing steps of the VPM. The VPM constructs Voronoi regions based on potential locations and partitions them by overlapped regions. Then, the VPM adds a Laplace noise once to each partitioned region, which is a parallel composition. Therefore, the VPM is -differentially private owing to parallel composition.

B. R-TREE-BASED APPROACH
Although the VPM is based on the divide-and-conquer approach, it still suffers from high computational cost in generating the influence region and finding the overlapped potential location set. Therefore, the VPM exploits three r-tree [19] indices, that is, R f for Voronoi regions of existing facilities, R p for potential locations, and R c for client locations. Then, we can use the intersection query to determine Algorithm 1 Get Partitioned Influence Region (GetPV) Input: pv -Partitioned influence region, j -index of overlapped potential location set, OP -overlapped potential location set, V -Voronoi regions of potential locations Output: PV (p) -A set of partitioned influence regions 1: L ← [ ] 2: if j is greater than OP.size then 3: insert pv into L 4: else 5: if pv intersects with v j then 8: for v j ∈ V do 6: if v i interects with v j then 7: add v j to OP 8: cnt i ← 0 10: for pv j ∈ L do 11: cnt ← count the number of users that pv j contains 12: cnt i ← cnt i + cnt + Laplace( 1 ) 13: PQ.enqueue(cnt i , p i ) 14: Return PQ.top the overlapped potential location set. We can also use the range query to compute the partitioned influence region of the potential location and to count the number of users in each partitioned influence region with R c . In R f , the leaf node is composed of MBRs for each Voronoi cell of existing facilities. As explained in the previous section, a Voronoi diagram of F ∪ {p i } should be constructed to generate the influence region of each potential location p i ∈ P. We precompute the candidate Voronoi neighbors of each influence region to reduce the computational cost of generating the influence region. Voronoi neighbors are subsets of existing facilities that are adjacent to a given Voronoi cell. We refer to the edge of a Voronoi cell as a Voronoi edge and each end VOLUME 8, 2020 point as a Voronoi vertex. A Voronoi edge is a perpendicular bisector of a line segment between two existing facilities. For each Voronoi edge of the existing facility f , we refer to the corresponding facility f ∈ F as Voronoi neighbors of f denoted VN (f ). In addition, VN (p) represents Voronoi neighbors of the influence region of potential location p. Since |VN (p)| F, if we know VN (p) in advance, then it is possible to reduce the construction time of the influence region. However, it is impossible to pre-compute the Voronoi neighbors; thus, we use candidate Voronoi neighbors instead. Candidate Voronoi neighbors are a superset of Voronoi neighbors. We need to compute the Delaunay triangulation of existing facilities to determine the candidate Voronoi neighbors of the influence region. The Delaunay triangulation [20] of a discrete point set O in the general position corresponds to the dual graph of the Voronoi diagram for O. The circumcenters of Delaunay triangles are the vertices of the Voronoi diagram. In the Euclidean space, Voronoi vertices are connected via edges. They can be derived from the adjacency relationships of the Delaunay triangles. If two triangles share an edge in the Delaunay triangulation, then their circumcenters relate to an edge in the Voronoi cell. Thus, we can define the candidate Voronoi neighbors as follows.

Definition 11: Candidate Voronoi neighbors. Assume that location p is located inside the Voronoi cell V (f ). Let DT be a set of Delaunay triangles that consists of existing facilities F and Disk() be a circumcircle covering each delaunay triangle. Then, candidate Voronoi neighbors, CVN (p)
, are the subset of existing facilities and defined as follows:

Lemma 12: VN (p) ⊆ CVN (p)
Proof: The proof can be found in the appendix. We can construct the influence region of the potential location with its candidate Voronoi neighbors but not all existing facilities. In addition, we can compute the overlapped potential location set through the Voronoi neighbors of the influence region. Let a potential location p is fixed, and its influence region V (p) and Voronoi neighbors VN (p) are given. Then, each Voronoi vertex corresponds to a pair of Voronoi neighbors. Thus, let the corresponding Voronoi neighbors of Voronoi vertex v i ∈ V (p).vertex be vn i and vn i . In addition, the circle whose center is each Voronoi vertex v i and the radius of Then, we can easily find the overlapped potential location set by following lemma.

Lemma 13: Potential location p is an overlapped potential location of p if and only if p is inside the union regions of VC(v i ), for all v i ∈ V (p).vertex.
Proof: The proof can be found in the appendix. Therefore, we can easily compute the overlapped potential location set through the range query of R p based on the above lemma. In conclusion, lines 5 to 7 in Algorithm 2 are changed to invoke the range query of R p with the influence region of the given potential location. Although we utilize these properties, computational cost is still extremely high if potential locations are skewed. In the next section, we propose the last query-processing algorithm that reduces computational time by slightly sacrificing accuracy to overcome this drawback.

V. VORONOI ENVELOPE FILTERING METHOD (VEM) A. BASIC IDEA OF VEM
In this section, we propose a VEM. The VPM suffers from worst case query-processing time. In the worst case, the overlapped potential location set constructs every combination of intersection regions. Thus, the time complexity of the VPM increases exponentially proportional to the cardinality of the overlapped potential location set. Although it is extremely time-consuming, computing the partitioned influence region for query accuracy is inevitable. However, we observe that customers and facilities are generally skewed in the Euclidean space. Most potential locations are less influential among customers than the optimal location. Therefore, we can reduce query-processing time if we know the upper-bound noisy count of the potential location. Motivated by this observation, we filter unnecessary potential locations whose upper-bound noisy count is less than the noisy count of the optimal location. To compute the upper-bound noisy count, we initially determine the candidate influence region of each potential location, which is called the Voronoi envelope. The Voronoi envelope of a potential location p, which is denoted by VE(p), is the union of the Voronoi cells of CVN (p). Fig. 4 shows an example of the Voronoi envelope. We have 15 existing facilities and one potential location p. Fig. 4 (a) shows the Voronoi diagram of existing facilities, and p is located inside the Voronoi cell of f 2 , as depicted in Fig. 4 (b). Then, the Voronoi envelope of f 2 consists of its candidate Voronoi neighbors {f 1 , f 2 , f 3 , f 4 , f 5 , f 6 , f 14 }, as shown in Fig. 4 (c). Fig. 4 (d) describes the actual influence region of p. As shown in this example, VE(p) is an upper-bound region of V (p).

Algorithm 3 Voronoi Envelope Filtering method(VEM)
Input: R f -R-tree of existing facilities, R p -R-tree of potential locations, 2 -privacy budget Output: p r ∈ P -near-optimal location 1 if Both of E p and E f are R-tree node then 7: for N p ∈ E p .children and N f ∈ E f .children do 8: if N p interects with N f then 9: PQ.enqueue(N p , N f , UBncount(N f )) 10: else if E p is R-tree node and E f is not then 11: for N p ∈ E p .children do 12: if N p interects with E f then 13: 14: else if E f is R-tree node and E p is not then 15: for N f ∈ E f .children do 16: if N f contains E p then 17: PQ.enqueue (E p  p r ← E p 30: Return p r Thus, we can compute the upper-bound noisy count of the influence region. As shown in Fig. 4 (c) and (d), the Voronoi envelope is the upper-bound region of any potential location inside a corresponding Voronoi cell. Then, the noisy count of the Voronoi envelope is the upper-bound noisy count of each influence region. Thus, we divide into 1 and 2 , where 1 is used to compute the noisy count of each Voronoi cell of existing facilities, and 2 is used for the VPM. The noisy count of each Voronoi cell and its upper-bound count are computed as follows: Equation (11) has the same form as (7), except the term for the overlapped potential location set is removed. The Voronoi diagram partitions the entire region, so it is possible to apply parallel composition. Therefore, the sensitivity of (11) is also 1 as same as (7). As we will describe later,  the intermediate nodes of the r-tree only use the noisy count of voronoi regions, so it does not need additional privacy budget due to the post-processing property of differential privacy [8]. The VEM can filter out potential locations whose upper-bound noisy count is less than the current best location during query processing. For this reason, we change R f to an aggregate R-tree (aR-tree) [24], which is a variation of the R-tree and maintains aggregate information in the intermediate nodes. The leaf nodes of R f is composed of MBRs for each Voronoi cell, which is the same as the VPM, and their aggregate count is the upper-bound noisy count for each Voronoi cell. In R f , the intermediate node stores the maximum upper-bound noisy count of its children nodes. Let N f be a node of R f and N f .children be its children nodes. Then, the upper-bound of the noisy count of N f is calculated as follows: (i) N f is an intermediate node including the root (ii) Otherwise    If N f is a leaf node of R f , then the upper-bound noisy count is the sum of the noisy count of its candidate Voronoi neighbors. By contrast, the upper-bound noisy count of the intermediate node is the sum of the upper-bound noisy count of its children nodes. Then, the noisy count of the potential location is less than UBncount(N f ) if its nearest existing facility is a descendent of N f .

B. QUERY PROCESSING OF VEM
The VEM also exploits three r-trees, namely, R p , R f , and R c , during query processing. It traverses R p in the best-first search approach [25] while finding the corresponding nodes of R f . The VEM algorithm searches the Voronoi cell of the existing facility, which contains potential locations concurrently and prunes out unnecessary potential locations during the traversing R p . Let N p be a node of R p and E p (E f ) be an entry of R p (R f ). Algorithm 3 shows the query-processing steps of the VEM. The VEM maintains the triplet (N p , N f , UBncnt(N f )) in a maxheap sorted by UBncnt(N f ). Then, the VEM dequeues the triplet (E p , E f , UBcnt) of the maxheap at each step. Four cases exist, depending on the types of E p and E f as follows: (i) E p and E f are nodes of the R-tree.
(ii) E p is a potential location point, and E f is a node of R f . (iii) E p is a node of R p , and E f is a Voronoi cell of the existing facility. (iv) E p is a potential location point, and E f is a Voronoi cell of the existing facility. In Case (i), the VEM extracts the children of N p and N f . Next, it finds the pairs of the children (N c p , N c f ) whose MBRs overlap. Then, the VEM computes the upper-bound noisy count of each N c f and enqueues the new triplet (N c p , N c f , UBncount(N c f )). In Case (ii), the VEM extracts the children of N f and finds each child node N c f , which contains E p . Then, the VEM computes the upper-bound noisy count of each N c f and enqueues the new triplet (E p , N c f , UBncount(N c f )). In Case (iii), the VEM extracts the children of N p and finds each child node, N c p , which overlaps with E f . The last step is the same as that of the above cases. In Case (iv), the VEM computes the influence region of E p with E f and invokes the GetPV operation of the VPM algorithm to calculate its noisy count. If its noisy count is greater than the current best, then the VEM updates the current best. The VEM repeats these steps until the current best is greater than the top of the maxheap. The potential locations in Fig. 5 (a) are considered with existing facilities in Fig. 4 (a). Then, the Voronoi diagram and R f are constructed, as shown in Fig. 5 (b) and (c). Assume that the upper-bound noisy counts of Voronoi regions are given as described in Table 2, and nodes of r f are constructed as Table 3.

VI. EXPERIMENTAL RESULTS
In this section, we report the experimental results of the proposed method. Section 6.2 studies the behavior of the methods with varying dataset sizes. Section 6.3 evaluates the performance of the methods with varying parameters.

A. EXPERIMENTAL ENVIRONMENTS
We perform experiments to evaluate our proposed VPM and VEM methods by using three datasets, namely, Yelp [21], CAL [22] and Gowalla [27]. The Yelp dataset consists of location and check-in data and is a location-based social network service. The user datasets, which comprise a set of 1,326,097 users, are generated by check-in data. A total of 174,566 points of interest (POIs) exists in the business data. Therefore, we divide the POIs into two datasets. One dataset is for potential locations, and the other is for existing facilities. Then, we randomly select 100 to 900 points from potential locations. The CAL dataset is an actual road network dataset from California that contains 21,048 vertices, 21,693 edges, and 85,070 POIs. In this case, we divide the POIs into two sections to form potential locations and existing facilities. Next, we randomly generate client locations based on Zipfian distribution varying the skewness of the data. We randomly select the POIs in CAL dataset and find the nearest neighbor facilities of each POI. Then, we compute the nearest neighbor distance, and we generate distance of each user from zipf distribution. Finally, we randomly generate 2D positional values with this distance. The Gowalla dataset has only check-in data of users. There are 2,548,428 user histories, and the users visit total 706,947 POIs. Thus, we select 50,000 potential locations and existing facilities, respectively. Then, we make the other 606,947 POIs as client locations to match the ratio of POIs to client location with the Yelp dataset. To measure accuracy with generality, all data sets were divided into 10 subdata sets, and the experiments are conducted 10 times with varying parameter values with each subdata sets. Table 4 shows a summary of the experimental setup in which the bold texts are default settings. The experiments are conducted with an Ubuntu 14.04 operation system with an Intel(R) Xeon(R) CPU E5-2620 v2@2.10 GHz and 64 GB RAM. We compare SC_naive (based on sequential composition), SC_enhanced (the enhanced version of sequential composition), the PSD (private spatial decomposition), the VPM, and the VEM.
The PSD is based on DAWA [16] which is one of the best private spatial decomposition methods in 2D data [26]. We set the system parameter of DAWA as same as [16]. To find out optimal location with PSD index, we construct each influence region of potential locations. Then, we compute the overlapping areas of the regions and PSD index. Next, we compute the noisy count of each potential location with overlapping ratio. Finally, we find out the potential location with the highest noisy count. We measure mean absolute error (MAE), query accuracy, and query processing time of optimal location selection. Let p o (i) be an actual optimal location and p m (i) be the query output of each method in i th iteration. Then, the query error means the difference between the number of clients attracted by p o (i) and p m (i). Meanwhile, the query accuracy is the number of times that p m (i) is same with p o (i). With the number of test case n t , the MAE and query accuracy VOLUME 8, 2020 are calculated as follows: B. SCALABILITY TEST Fig. 7 shows the results of potential location scalability. We evaluate accuracy, MAE, and query time by varying the number of potential locations from 100 to 900. As depicted in the figures, all schemes degrade performance in terms of MAE and accuracy as the cardinality of the potential location set increases. However, the VPM and the VEM outperform the other methods in all the experiments. In the Gowalla dataset, synthetic client data is highly skewed, so the proposed techniques seem to be significantly better than the other methods. As Fig. 7 (g), (h) and (i) show, the VPM suffers from poor scalability of query-processing time. Meanwhile, the VEM outperforms the other methods in terms of query processing time, and its query accuracy is similar to that of the VPM. The scalability of existing facilities is depicted in Fig. 8. It shows that the accuracy of all the methods increases as the number of existing facilities increases except accuracy in the Yelp dataset. Fig. 8(d) shows that all the methods slightly degrade in accuracy,  but there are not significant change. Moreover, the VPM and the VEM still outperforms the other methods. The areas of the potential locations depend on the existing facilities.
As the number of existing facilities increases, the influence region of each potential location would be smaller. Therefore, the influence of each potential location decreases, as the cardinality of existing facility increases. As a result, the overall query accuracy decreases because it is highly affected by the inserted noise. The results of query time are depicted in Fig. 8 (g), (h) and (i). The query processing time increases in overall methods except the PSD and the VEM, as the number of existing facilities increases. It is because the computing time of overlapping area decreases in the PSD and the probability of pruning increases in the VEM due to the small size of influence region. In conclusion, the VPM and the VEM outperform the other methods in terms of accuracy and MAE, but the VPM still suffers from query-processing time, as described in Fig. 8. The results of user scalability are depicted in Fig. 9. It shows that the accuracy of all the methods increases as the number of users increases. By contrast, the MAE decreases as the number of users increases. As the number of users increases, the influence of individual potential locations increases. As a result, the accuracy of the query is less affected by the inserted noise. The query processing time increases in all the methods, as the number of users increases. However, the PSD and the VEM are less affected than the others. The PSD just computes the overlapping areas of each influence region and the DAWA index, so the query processing time is independent of the number of users. The VEM prunes out unnecessary potential locations, so the query processing time slightly increases. In all datasets, the VPM and the VEM outperform the other methods in terms of accuracy and MAE, but the VPM still suffers from query-processing time.

C. PARAMETER TEST AND INDEX BUILD TIME
The second set of experiments demonstrates performance evaluation with varying parameter values. We evaluate the effectiveness of the total privacy budget and ratio. As explained in Section 5.1, we divide the total privacy into 1 and 2 . The ratio indicates the budget amount used to construct the index. When the ratio is r, the total privacy budget is computed as follows: We evaluate accuracy and MAE by varying the total privacy budget from 0.25 to 4. The experimental results are shown in Fig. 10. As expected, the accuracy of the overall methods increases substantially with the increase in except the SC_naive and PSD. In general, the VPM outperforms the other methods, and the VEM performs second best. However, the VEM is the best in terms of query-processing time. The query processing time is not affected by the privacy budget. The results of the experiments that vary the ratio are shown in Fig. 11. These figures demonstrate that the VEM exhibits similar accuracy and MAE, but it has the best when ratio is 0.1. Therefore, we select the default value of ratio as 0.1. The ratio does not affect the query processing time as same as the privacy budget. Fig. 12 shows the results of experiments at varying clients distribution skewness (α). As α grows, the clients get to be more skewed.
The query accuracy of the PSD decreases in proportion to α as depicted in Fig. 12 (a) and (b). The query processing time is not affected by the client skewness, as shown in Fig. 12 (c). Fig. 13 describes the results of the last experiments for the index build time. We compare the index build time of the VEM with the R-tree and the DAWA index structures. Since the scalability of potential locations does not affect the build time of the VEM, we conducts the experiments for the existing facilities and clients. The VEM has the poor performance in all datasets. However, index build time is not on-demand service time, but precomputation time.

VII. CONCLUSION AND FUTURE WORKS
We propose a differentially private method to protect user locations in the query processing of Max-inf location selection. To the best of our knowledge, this study is the first to point out the influence region overlapping problem when applying differential privacy to Max-inf problems. It is possible to find an adequate location for various purposes, such as analysis of trade areas and establishment of public facilities, while preserving the individual location of users. Also, we present query processing methods VPM and VEM to improve query accuracy and to reduce query processing time.
In this work, the VEM is the best method in terms of accuracy and query-processing time. In the future, we will perform additional experiments to obtain other appropriate parameters to improve performance on various datasets.
The inequality follows that d( The inequality follows that d( (p, o). It contradicts to the assumption that d(f , o) > d(p, o). Therefore, VN (p) ⊆ CVN (p).
Proof of Lemma 13: Suppose that p ∈ OP(p) and p is outside ∪ v i ∈V (p).vertex VC(v i ). Then, ∃o ∈ V (p) such that ∀f ∈ F, d(o, f ) > d(o, p ). Without loss of generality, assume that o is inside of pv i v j , such that v i and v j are the adjacent Voronoi vertices of p. Let denote their corresponding Voronoi neighbor be as f * . Then, d(v i , f * ) = d(v i , p) and d(v j , f * ) = d(v j , p). The followings are hold by the assumption.
This contradicts to that ∀f ∈ F, d(o, f ) > d(o, p ). Therefore, if p ∈ OP(p), then p is outside ∪ v i ∈V (p).vertex VC(v i ).