Secure and Efficient Nearest Neighbor Query for an Outsourced Database

Cloud computing enables resource-constrained clients to outsource their data storage and computation-intensive tasks to a cloud server. Although it brings many benefits, data outsourcing raises security challenges for clients. In this paper, we propose a secure and efficient query scheme (SecNN) to address the secure nearest neighbor query problem, which is popular in academia and industry. Compared with the state-of-the-art scheme, the proposed scheme significantly decreases the number of communication rounds between the clients and the cloud server. Furthermore, it achieves the desired security in the random oracle model. Moreover, an extended scheme can verify the correctness and completeness of query results and database updates by using the Merkle hash tree technique. Finally, an experimental evaluation demonstrates the high efficiency of our scheme.


I. INTRODUCTION
Cloud computing has attracted considerable research interest in view of its powerful storage and computational capacity, which offers outsourcing services for resource-constrained clients [1]-[4]. Outsourcing enables clients to turn data into a valuable asset. It has been demonstrated as a potential tool and has numerous possible uses in big data [5]-[8]. However, the cloud server may be untrustworthy and eavesdrop on confidential information, such as financial records and personal health information.
One method for protecting data secrecy, which has been studied for many years, is to encrypt the outsourced database [9]-[12]. Unfortunately, traditional encryption schemes inevitably break the correlation among different data items, thus increasing the difficulty of some normal operations. For example, traditional encryption destroys the order information in plaintexts and makes order comparison on ciphertexts impossible. Although fully homomorphic encryption is a potential solution [9], the existing schemes are not practical because of their heavy computational time and large ciphertext size.
The nearest neighbor query aims to identify the data tuples in a database D that are nearest to a query point Q. It is one of the most basic operations in practice, with potential applications in machine learning [13]-[15].
The associate editor coordinating the review of this manuscript and approving it for publication was Claudio Agostino Ardagna.
As a result, data secrecy in nearest neighbor queries becomes a crucial issue for the outsourced database. Recently, many studies have been conducted to protect data secrecy in nearest neighbor queries [16]-[18]. Based on asymmetric scalar-product-preserving encryption (ASPE), Wong et al. [16] proposed two schemes to support nearest neighbor query under different attack models. However, both schemes have a search complexity that is linear in the number of data items. Moving a step forward, Yao et al. [19] constructed a nearest neighbor query scheme based on partitions. Although the scheme achieves a faster-than-linear search complexity, it does not support data items with a dimensionality greater than two. Subsequently, Wang et al. [17] proposed another scheme based on the tree structure and searchable encryption techniques. Their scheme adopted a lightweight cryptographic primitive, order-preserving encryption (OPE), to obtain a faster-than-linear search complexity. To the best of our knowledge, practical OPE schemes cannot simultaneously satisfy the requirements of noninteraction and indistinguishability under an ordered chosen-plaintext attack (IND-OCPA). Therefore, the state-of-the-art scheme [17] with IND-OCPA security needs O(log n) rounds of interactions, where n is the number of data items in the database.

A. OUR CONTRIBUTIONS
In this paper, we focus on the construction of a secure and efficient nearest neighbor query scheme for the outsourced database in cloud computing. The main contributions are as follows:
• We apply the prefix encoding and R-tree techniques to design a nearest neighbor query scheme (called SecNN) for the outsourced database. SecNN achieves index indistinguishability under the chosen keyword attack (IND-CKA) in the random oracle model.
• The proposed SecNN scheme only requires 2 rounds of interactions (the same as a query over plaintexts) between a client and a cloud server, while the state-of-the-art scheme [17] requires O(log n) rounds of interactions due to the OPE technique.
• We further extend SecNN against a malicious cloud server by incorporating the Merkle hash tree technique. Thus, clients can verify the correctness and completeness of the outsourced database and query results.

B. RELATED WORK
The basis of the nearest neighbor query has been studied in the past decades [16], [17], [19]-[21]. In this section, we review some existing nearest neighbor query schemes and analyze their limitations in detail.
Wong et al. [16] proposed an ASPE algorithm, which preserves a special scalar product. They constructed two nearest neighbor query schemes based on the ASPE algorithm. Their schemes support the nearest neighbor (NN) computation and resist practical attacks under different levels of background knowledge. Based on non-colluding cloud servers [22] and the Paillier cryptosystem [23], Elmehdwi et al. [24] designed a nearest neighbor query scheme. Yin et al. [25] introduced a privacy-preserving k-means clustering technology over encrypted multi-dimensional cloud data by leveraging the scalar-product-preserving encryption primitive. Unfortunately, all of the above schemes have a search complexity that is linear in the database size. As an improvement, Hu et al. [20] designed a new nearest neighbor query scheme from privacy homomorphism (PH). PHs are encryption schemes that support computations on ciphertexts without decryption. Their scheme [20] supports large-scale databases and has a faster-than-linear search complexity by using the tree structure and ASPE. However, it requires a local index and O(log n) rounds of interactions between a client and a cloud server.
Recently, Yao et al. [19] first pointed out that the nearest neighbor problem is at least as hard as the construction of an OPE scheme, and showed that it is impossible to retrieve the NN given only E(q) and E(D), based on the impossibility of OPE in the standard model. For this reason, an algorithm based on the secure Voronoi diagram (SVD) technique was proposed to retrieve a partition E(G), where E(G) contains the NN. Unfortunately, it requires the client to locally maintain a large index. Wang et al. [17] designed a practical and secure nearest neighbor query scheme in cloud computing. They adopted the lightweight cryptographic primitive OPE to construct a solution and achieved sublinear search complexity. OPE [26] was proposed to solve the range query problem. Subsequently, Boldyreva et al. [27] proved that IND-OCPA security is unachievable for a practical OPE scheme unless it has an exponential ciphertext space. Furthermore, Popa et al. [28] pointed out that mutable ciphertexts are required for IND-OCPA security. To the best of our knowledge, all OPE schemes with IND-OCPA security require O(log n) rounds of interactions. Therefore, Wang et al.'s scheme [17] requires O(log n) rounds of communication. Similarly, Hsu et al. [29] constructed a PPkNN nearest neighbor search scheme by leveraging an extension of OPE and the R-tree. However, their scheme fails under the man-in-the-middle attack. Lei et al. [30] proposed a secure k-nearest neighbor (SkNN) query scheme that uses a projection function approach to encode neighbor regions. This protocol only returns approximate results and does not support high-dimensional data.
Hacigümüs et al. [1] first proposed the idea of database outsourcing. Devanbu et al. [31] studied the verifiability of an outsourced database by using the Merkle hash tree technique. However, their scheme did not consider the case in which a cloud server returns an empty result. Moreover, Wang et al. [32] proposed verifiable auditing for an outsourced database, which also covers the case in which the cloud server returns an empty result. Cheng and Tan [33] verified that kNN results were complete, authentic and minimal. More promising approaches based on the network Voronoi diagram (NVD) and neighbors have been invented. Hu et al. [34] studied the problem of privacy-preserving query authentication for location-based services over multidimensional indexes and later solved the problem of distance verification and path verification on road networks in [35].

C. ORGANIZATION
The rest of this paper is organized as follows. We illustrate some preliminaries in Section II. The system model and the formal definition are shown in Section III. Furthermore, we describe the proposed SecNN scheme in the honest-but-curious cloud server model in Section IV. In Section V, we further consider the verifiable update and query in the malicious cloud server model. Section VI presents the security and efficiency analysis. The performance evaluation of the SecNN scheme is presented in Section VII. Finally, the conclusions are presented in Section VIII.

II. PRELIMINARIES
We first provide a brief introduction to the definitions and properties of the Bloom filter, Merkle hash tree, and prefix encoding. Then we review the method to find the NN point based on an R-tree.

A. BLOOM FILTER
VOLUME 8, 2020
A Bloom filter is a space-efficient data structure that is used to determine whether an element belongs to a set, with allowable errors [36]. A Bloom filter contains a binary array of size $m$ and $k$ independent hash functions. In the initial phase, all positions of the array are set to 0. To insert an element $x$, it computes the $k$ independent hash functions $h_i(x)$ $(1 \le i \le k)$ and sets the corresponding positions of the array to 1, as for $a_1$ and $a_2$. To query an element $x$, it computes the hash functions and checks whether the corresponding positions are all 1. If some position holds 0, then $x$ does not belong to the set $S$, as for $b_2$. If all of the positions hold 1, then $x$ belongs to the set $S$ with allowable errors, as for $b_1$ and $b_3$. In Fig. 1, $b_3$ does not belong to the set $S$, yet the Bloom filter reports that it does. Such an event is a false positive, and its probability is denoted $P_f$. In [37], the author derived $P_f = (1 - e^{-kn/m})^k$, where $n$ is the number of data items. $P_f$ reaches its minimum value $(0.6185)^{m/n}$ when $k = \ln 2 \times \frac{m}{n}$.
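The insert and query operations described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `BloomFilter`, the helper names, and the way the $k$ hash functions are derived from SHA-256 are assumptions made for this example and are not taken from the scheme.

```python
import hashlib
import math


class BloomFilter:
    """Binary array of size m with k hash functions (derived from SHA-256 here)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # all positions start at 0

    def _positions(self, item: str) -> list[int]:
        # Assumption: the i-th hash function is SHA-256 over "i:item", reduced mod m.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest(), "big") % self.m
            for i in range(self.k)
        ]

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item: str) -> bool:
        # True means "possibly in the set"; False means "definitely not in the set".
        return all(self.bits[pos] == 1 for pos in self._positions(item))


def false_positive_rate(m: int, k: float, n: int) -> float:
    """P_f = (1 - e^{-kn/m})^k for n inserted items."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> float:
    """k = ln 2 * m / n, the value that minimises P_f."""
    return math.log(2) * m / n


bf = BloomFilter(m=1024, k=7)
bf.insert("a1")
bf.insert("a2")
print(bf.query("a1"), false_positive_rate(1000, optimal_k(1000, 100), 100))
```

An inserted element is always reported as present; only the opposite direction can err, which is what $P_f$ measures.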

B. MERKLE HASH TREE
The Merkle hash tree is a special binary tree structure. It is used to authenticate the integrity of an outsourced database [38]. A Merkle hash tree stores the hashes of data items in its leaf nodes and, in each non-leaf node, a hash of its children. The clients only store the root locally and the cloud server stores the whole tree.
In the verification phase, the cloud server returns the data and the sibling nodes on their path to the root. Then, the client reconstructs the root and determines whether the reconstructed root is equal to the stored hash. An example is shown in Fig. 2. If the equality holds, the set is intact and correct; otherwise, it has been tampered with.
The Merkle hash tree has low computational and storage costs. The clients compute O(log n) hashes, which is very fast. Generally, the cloud server only stores data items in leaf nodes. When the clients want to verify a set, the cloud server temporarily generates the sibling values.
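The verification phase can be illustrated with a small Python sketch. It assumes SHA-256 as the hash, a number of leaves that is a power of two, and hypothetical helper names (`build_levels`, `sibling_path`, `verify`); none of these choices come from the scheme itself.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(items: list[bytes]) -> list[list[bytes]]:
    """Server side: all tree levels, leaves first (len(items) must be a power of two)."""
    level = [h(x) for x in items]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def sibling_path(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Server side: (sibling hash, sibling-is-on-the-left) pairs from leaf to root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def verify(item: bytes, path: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Client side: rebuild the root from one item and O(log n) sibling hashes."""
    node = h(item)
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


items = [f"d{i}".encode() for i in range(1, 9)]
levels = build_levels(items)
root = levels[-1][0]               # the only value the client stores
proof = sibling_path(levels, 2)    # proof for the third item, d3
print(verify(b"d3", proof, root))
```

For eight items the proof holds three sibling hashes, which matches the O(log n) client cost stated above.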

C. PREFIX ENCODING
The prefix encoding technique converts a range query problem into testing whether two sets have common elements [39].
Given an element $x$ of $w$ bits, its prefix family $F(x)$ is the set of $w + 1$ prefixes obtained by replacing the last $i$ bits of $x$ with $*$ for $0 \le i \le w$. The range prefix of a range $[a, b]$ is defined as the minimum prefix set which covers the range $[a, b]$. In this way, we convert the testing of whether an element $x$ falls into a range $[a, b]$ to the testing of whether the set $F(x)$ and the range prefix of $[a, b]$ have a common element. An instance is shown as follows. Given 6 of 5 bits, its prefix family is $F(6) = \{00110, 0011*, 001**, 00***, 0****, *****\}$.
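A short Python sketch may make the conversion concrete. The function names and the greedy construction of the range prefix (repeatedly taking the largest aligned block that fits) are assumptions of this example; the inputs are assumed to satisfy $0 \le a \le b < 2^w$.

```python
def prefix_family(x: int, w: int) -> set[str]:
    """F(x): the w + 1 prefixes of the w-bit string of x."""
    bits = format(x, f"0{w}b")
    return {bits[: w - i] + "*" * i for i in range(w + 1)}


def range_prefix(a: int, b: int, w: int) -> set[str]:
    """Minimum set of prefixes covering the closed range [a, b], 0 <= a <= b < 2**w."""
    prefixes = set()
    while a <= b:
        # Grow the largest power-of-two block that starts at a and stays inside [a, b].
        size = 1
        while a % (2 * size) == 0 and a + 2 * size - 1 <= b:
            size *= 2
        stars = size.bit_length() - 1
        prefixes.add(format(a, f"0{w}b")[: w - stars] + "*" * stars)
        a += size
    return prefixes


def in_range(x: int, a: int, b: int, w: int) -> bool:
    """x is in [a, b] exactly when the two prefix sets share an element."""
    return bool(prefix_family(x, w) & range_prefix(a, b, w))


print(sorted(prefix_family(6, 5)))
print(sorted(range_prefix(0, 8, 5)))
print(in_range(6, 0, 8, 5), in_range(9, 0, 8, 5))
```

Here the range $[0, 8]$ is covered by two prefixes, and 6 matches one of them through the shared prefix `00***`, whereas 9 shares no prefix with the cover.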

D. R-TREE
An R-tree is a hierarchical data structure. It handles geometrical data, such as points, lines, surfaces, and volumes [40]. An R-tree typically has a faster-than-linear search complexity and is generally efficient [41].
In an R-tree, geometric data are represented as a minimum bounding rectangle (MBR) and stored in a leaf node. The nonleaf node is an MBR that bounds its children. At the same time, an R-tree is a dynamic data structure that does not require global reorganization. We present the method for solving the nearest neighbor query problem with an R-tree in the following.
1) According to an NN query point Q, the cloud server traverses the R-tree to find the deepest nonleaf node $B_{deepest}$ that contains Q and returns the leaf nodes that exist under $B_{deepest}$.
2) The client computes the closest point $d_j$ in $B_{deepest}$ as a temporary NN and computes the smallest distance r. Next, the client draws a circle O centered at Q with radius r and generates a temporary rectangle R that is the MBR of circle O.
3) The cloud server traverses the R-tree to determine whether some leaf nodes are in R. The cloud server searches the children of a nonleaf node $B_i$ only if $B_i$ intersects with the temporary R. When arriving at a leaf node $d_i$, the cloud server determines whether it falls into R. If no points fall in R, then the temporary NN is the NN. Otherwise, the cloud server returns the leaf nodes in R. Afterward, the client computes the closest point $d_k$ in R and takes it as the NN.
An instance is illustrated in Fig. 3. First, the cloud server finds the deepest nonleaf node $B_4$ that contains the query point Q, and the client computes the temporary NN $d_7$. Then, the client computes its temporary rectangle R and submits it to the cloud server. Afterward, the cloud server finds that no leaf node falls in R. Therefore, the temporary NN $d_7$ is the NN.
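The two-round procedure above can be sketched on plaintext data as follows. The `Node` class, the tuple layout `(x1, x2, y1, y2)` for an MBR, and the hand-built tree are assumptions made for illustration; a real R-tree would also handle insertion and node splitting, which this sketch omits.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    mbr: tuple                                      # (x1, x2, y1, y2)
    children: list = field(default_factory=list)    # child nonleaf nodes
    points: list = field(default_factory=list)      # leaf data stored under this node


def contains(mbr, p):
    return mbr[0] <= p[0] <= mbr[1] and mbr[2] <= p[1] <= mbr[3]


def intersects(a, b):
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]


def deepest_node(node, q):
    """Round 1 (server): deepest nonleaf node whose MBR contains q."""
    for child in node.children:
        if contains(child.mbr, q):
            return deepest_node(child, q)
    return node


def all_points(node):
    pts = list(node.points)
    for child in node.children:
        pts.extend(all_points(child))
    return pts


def points_in_rect(node, rect):
    """Round 2 (server): leaves inside rect, skipping MBRs that do not intersect it."""
    if not intersects(node.mbr, rect):
        return []
    found = [p for p in node.points if contains(rect, p)]
    for child in node.children:
        found.extend(points_in_rect(child, rect))
    return found


def nearest_neighbor(root, q):
    candidates = all_points(deepest_node(root, q))           # server to client
    temp = min(candidates, key=lambda p: math.dist(p, q))     # client: temporary NN
    r = math.dist(temp, q)
    rect = (q[0] - r, q[0] + r, q[1] - r, q[1] + r)           # MBR of circle O
    in_rect = points_in_rect(root, rect)                      # server to client
    return min(in_rect + [temp], key=lambda p: math.dist(p, q))


left = Node(mbr=(0, 4, 0, 4), points=[(1, 1), (3, 3)])
right = Node(mbr=(5, 9, 0, 4), points=[(5, 2), (8, 1)])
root = Node(mbr=(0, 9, 0, 4), children=[left, right])
print(nearest_neighbor(root, (4, 2)))
```

In this toy tree the temporary NN found in the first round is not the final answer, because the true NN lies in a neighboring MBR; the second round over R is what catches it.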

III. PROBLEM FORMULATION

A. SYSTEM MODEL
As shown in Fig. 4, the system model of our SecNN scheme includes two entities: a client and a cloud server. The client is an entity that outsources its encrypted database to save storage and computational burden and then provides a query request. The cloud server is an entity that stores the encrypted database, searches over it according to the query request and returns the corresponding result. We assume that the cloud server is untrusted and consider the following threat models [42].
• Honest-but-Curious: The cloud server follows a protocol honestly and returns the answer correctly to the client, but tries to learn information about the encrypted database. To protect its secrecy, the client encrypts the data items before outsourcing.
• Malicious: The cloud server is not only curious but also dishonest. It can misbehave in any way, such as storing damaged data or returning incorrect results. In this model, the client verifies the correctness and completeness of the outsourced database and query results.

B. FORMAL DEFINITION
A secure nearest neighbor query scheme empowers a client to outsource its database to a cloud server and then to provide the query request to search the NN for a given query point Q.
Wang et al. [17] provide a formal definition of the nearest neighbor query scheme.
Definition 1: A secure nearest neighbor query scheme is a tuple of polynomial-time algorithms {KeyGen, TreeBuild, Enc, TokenGen, Search} defined as follows:
• KeyGen($\lambda$) → sk: Taking a security parameter $\lambda$ as input, this key generation algorithm outputs a secret key sk.
• TreeBuild(D) → $\Gamma$: Taking a database $D = \{d_1, d_2, \ldots, d_n\}$ as input, this algorithm outputs an R-tree $\Gamma = \{d_1, d_2, \ldots, d_n, B_1, B_2, \ldots, B_m, P\}$, where $d_i$ is a leaf node that stores the data record, $B_j$ is a nonleaf node and P is a set of pointers to cover the parent-child relations.
• Enc(sk, $\Gamma$) → $\Gamma^*$: Taking a secret key and an R-tree as inputs, this encryption algorithm outputs an encrypted R-tree $\Gamma^* = \{\bar{d}_1, \bar{d}_2, \ldots, \bar{d}_n, \bar{B}_1, \bar{B}_2, \ldots, \bar{B}_m, P\}$, where $\bar{d}_i$ is an encrypted leaf node, $\bar{B}_j$ is an encrypted nonleaf node and P is a set of pointers to cover the parent-child relations in the encrypted R-tree $\Gamma^*$.
• TokenGen(sk, Q) → tk: Taking a secret key and a query point as inputs, this token generation algorithm outputs a token tk for the query point Q.
• Search(tk, $\Gamma^*$) → I: Taking a search token tk and an encrypted R-tree $\Gamma^*$ as inputs, this search algorithm outputs the result I.
In the following, we introduce the IND-CKA security definition in [43]. Informally, it means that the adversary A cannot distinguish which index is built when the databases $S_0$ and $S_1$ have the same number of items.
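Definition 2: Taking a security parameter $\lambda$ as input, a scheme is IND-CKA secure if for all probabilistic polynomial time (PPT) adversaries A, the quantity $|\Pr[b' = b] - \frac{1}{2}|$ is negligible in $\lambda$, where this quantity is defined as the advantage of adversary A in the experiment as follows:
1) A challenger C generates a secret key sk ← KeyGen($\lambda$);
2) An adversary A chooses two different databases $S_0$ and $S_1$, where both databases have the same number of data items. Then, adversary A gives $S_0$ and $S_1$ to the challenger C;
3) The challenger C randomly chooses a bit $b \in \{0, 1\}$, builds the index $I_b$ for $S_b$ and gives $I_b$ to adversary A;
4) Adversary A outputs a bit $b'$ as a guess for $b$.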

IV. THE PROPOSED SECNN SCHEME

A. HIGH-LEVEL DESCRIPTION
In this paper, we propose a secure SecNN scheme based on an R-tree to solve the privacy-preserving problem in the nearest neighbor query. In the proposed scheme, the cloud server needs to test whether a query point Q falls into an MBR B i , whether a leaf node d i falls into a temporary rectangle R, and whether an MBR B i intersects with R.
Fortunately, we can use the Bloom filter and prefix encoding techniques to address all the above issues. For an MBR $B_i = (x_{i1}, x_{i2}) \times (y_{i1}, y_{i2})$, the coordinates are encoded as prefix families, and for a query point $Q = (x, y)$, the coordinates are encoded as the range prefixes of $[0, x)$ and $[0, y)$. After encoding, the above problems are converted to testing whether two sets have common elements. We take the testing of whether a query point Q falls into an MBR $B_i$ as an example. The condition in which Q falls in an MBR $B_i$ is equivalent to the conditions $x_{i1} < x$, $x < x_{i2}$, $y_{i1} < y$ and $y < y_{i2}$.
• Enc(sk, $\Gamma$): Given a secret key sk and an R-tree $\Gamma$, the client performs data encryption and index preserving. The details of both algorithms are described as follows.
1) Data Encryption. This algorithm encrypts a leaf node (i.e., data record) $d_i = (x_i, y_i)$ into the encrypted leaf node $\bar{d}_i$.
2) Index Preserving. To facilitate the data search, the client constructs an index. Furthermore, the client encrypts the index into ciphertexts to protect data secrecy. The specific construction is as follows.
a) Index Construction. For a leaf node $d_i = (x_i, y_i)$ and a nonleaf node $B_j = (x_{j1}, x_{j2}) \times (y_{j1}, y_{j2})$, the client encodes their coordinates as prefix families. Furthermore, the client computes their Bloom filters as $BF_{d_i} = (BF_{x_i}, BF_{y_i})$ and $BF_{B_j} = (BF_{x_{j1}}, BF_{x_{j2}}, BF_{y_{j1}}, BF_{y_{j2}})$.
b) Node Randomization. For each node v, the client generates a random number v.R and computes $H(v.R, H(k_1, P)), H(v.R, H(k_2, P)), \ldots, H(v.R, H(k_r, P))$ to obtain the Bloom filter positions for every prefix P, as shown in Fig. 5. Then the client uploads the encrypted database and index tree to the cloud server, where $\Gamma^* = \{\bar{d}_1, \bar{d}_2, \ldots, \bar{d}_n, \bar{B}_1, \bar{B}_2, \ldots, \bar{B}_m, P\}$.
• TokenGen(sk, Q): Given a secret key sk and a query point Q, the client generates the search token $M_Q$. Then, the client submits the search token $M_Q$ to the cloud server.
• Search($M_Q$, $\Gamma^*$): Given a search token $M_Q$ and an encrypted tree $\Gamma^*$, the cloud server finds the deepest nonleaf node $\bar{B}_j$ and returns the data records in $\bar{B}_j$ to the client. After receiving the encrypted data records in $\bar{B}_j$, the client decrypts them and computes the closest point T as a temporary NN. Next, it generates a temporary rectangle $R = (x_{r1}, x_{r2}) \times (y_{r1}, y_{r2})$ and gives it to the cloud server in the form of a token. Then the cloud server searches the encrypted tree $\Gamma^*$ to test whether some leaf nodes fall into the temporary rectangle R.
In the Search phase, the cloud server performs three tests: whether a query point Q is in an MBR $B_i$, whether a temporary rectangle R intersects with an MBR $B_i$, and whether a leaf node $d_i$ is in a temporary rectangle R. We address each of them in the following.
1) Whether a query point Q is in an MBR $B_i$. In this step, the cloud server owns $M_Q = \{M_x, M_y\}$ and $BF_{B_i}$. Owing to the fact that $x \in [a, b]$ exactly when the prefix family $F(x)$ and the range prefix of $[a, b]$ have a common element, the relations $x_1 < x < x_2$, $y_1 < y < y_2$ on plaintexts can be verified on ciphertexts by testing $M_x$ and $M_y$ against the Bloom filters in $BF_{B_i}$.
2) Whether a temporary rectangle R intersects with an MBR $B_i$. For a rectangle $R = (x_{r1}, x_{r2}) \times (y_{r1}, y_{r2})$, its token is generated as $(M_{x_{r1}}, M_{x_{r2}}, M_{y_{r1}}, M_{y_{r2}})$. From the root to a leaf node, the cloud server tests whether a nonleaf node $B_i$ intersects with R. If they intersect, the cloud server continues to search the children. Otherwise, the search terminates in this branch. We present it in Algorithm 1.

Ensure:
Output true if $B \cap R \neq \emptyset$; otherwise output false.

3) Whether a leaf node $d_i$ is in a temporary rectangle R.
On arriving at the leaf nodes, the cloud server retrieves the leaf nodes that are in the temporary rectangle R. The algorithm is shown in Algorithm 2. At the same time, the results may contain some redundancy, namely, many elements may fall in the temporary rectangle R. If the redundancy is too large, the client randomly chooses one of the returned leaf nodes as the temporary NN and carries out the Search algorithm again.

Require:
Given a rectangle token $R = (M_{x_{r1}}, M_{x_{r2}}, M_{y_{r1}}, M_{y_{r2}})$ and a leaf node $d_i = (BF_{x_i}, BF_{y_i})$.

Ensure:
Output true if $d_i \in R$; otherwise output false.
1: RESULT = false;
2: if $BF_{x_i}(M_{x_{r1}}) = 0$ and $BF_{x_i}(M_{x_{r2}}) = 1$ and $BF_{y_i}(M_{y_{r1}}) = 0$ and $BF_{y_i}(M_{y_{r2}}) = 1$ then RESULT = true;
3: return RESULT.
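The leaf test of Algorithm 2 can be mimicked end to end in Python. The sketch below rests on several assumptions that are not stated in the text: a token $M_v$ is taken to be the keyed hashes of the range prefix of $[0, v)$, HMAC-SHA256 stands in for $H$, the keys and the parameters `W` and `M` are toy values, and a set of positions replaces the bit array. It is meant to show how the four Bloom filter tests combine, not to reproduce the exact construction.

```python
import hashlib
import hmac
import os

W = 8                          # coordinate bit length (assumption)
M = 1 << 16                    # Bloom filter size in bits (assumption)
KEYS = [b"k1", b"k2", b"k3"]   # toy stand-ins for the secret keys k_1..k_r


def prefix_family(x: int) -> set[str]:
    bits = format(x, f"0{W}b")
    return {bits[: W - i] + "*" * i for i in range(W + 1)}


def range_prefix(a: int, b: int) -> set[str]:
    """Minimum prefix set covering the closed range [a, b], 0 <= a <= b < 2**W."""
    out = set()
    while a <= b:
        size = 1
        while a % (2 * size) == 0 and a + 2 * size - 1 <= b:
            size *= 2
        stars = size.bit_length() - 1
        out.add(format(a, f"0{W}b")[: W - stars] + "*" * stars)
        a += size
    return out


def inner_hashes(prefix: str) -> list[bytes]:
    """H(k_j, P) for every key k_j; a token carries these values."""
    return [hmac.new(k, prefix.encode(), hashlib.sha256).digest() for k in KEYS]


def position(nonce: bytes, inner: bytes) -> int:
    """H(v.R, H(k_j, P)) reduced to a Bloom filter position."""
    return int.from_bytes(hmac.new(nonce, inner, hashlib.sha256).digest(), "big") % M


def build_filter(value: int):
    """Client: randomized Bloom filter over the prefix family of one coordinate."""
    nonce = os.urandom(16)     # the random number v.R for this filter
    ones = {position(nonce, ih) for p in prefix_family(value) for ih in inner_hashes(p)}
    return nonce, ones         # positions set to 1 (equivalent to a bit array)


def token(bound: int):
    """Client: token for the range [0, bound); empty when bound is 0."""
    if bound == 0:
        return []
    return [inner_hashes(p) for p in range_prefix(0, bound - 1)]


def bf(filter_, tok) -> int:
    """Server: 1 if some prefix of the token is found in the filter, else 0."""
    nonce, ones = filter_
    return int(any(all(position(nonce, ih) in ones for ih in hs) for hs in tok))


def leaf_in_rect(leaf, rect_token) -> bool:
    """Algorithm 2: the leaf is in R if the four Bloom filter tests succeed."""
    bf_x, bf_y = leaf
    m_x1, m_x2, m_y1, m_y2 = rect_token
    return (bf(bf_x, m_x1) == 0 and bf(bf_x, m_x2) == 1
            and bf(bf_y, m_y1) == 0 and bf(bf_y, m_y2) == 1)


leaf = (build_filter(100), build_filter(50))                # d_i = (100, 50)
rect = (token(90), token(120), token(40), token(60))        # x in [90, 120), y in [40, 60)
print(leaf_in_rect(leaf, rect))
```

Under this reading, a result of 0 for $M_{x_{r1}}$ says that $x_i$ is not below $x_{r1}$, and a result of 1 for $M_{x_{r2}}$ says that $x_i$ is below $x_{r2}$. Because Bloom filters admit false positives, the "equals 0" tests can fail with a small probability, which is one source of the redundancy mentioned above.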

V. EXTENDING SCHEME: A VERIFIABLE SECNN SCHEME
A malicious cloud server is a cloud server that may perform operations dishonestly and return false results. For example, it may not insert a new value into the database or may return an empty result to save storage and computation cost. To avoid these false operations, the client adds Merkle hashes [44] to verify the correctness and completeness of the outsourced database and the query results.
As indicated in Fig. 6, we present the result of adding Merkle hashes to an R-tree. The hash of a node is computed over its Bloom filters and the Merkle hashes of its children. The client stores the root hash to verify whether an operation was performed correctly by the cloud server. For example, when verifying whether a data record a is in the database, the cloud server provides a sibling path to the root. If the data record a is in the database, the client reconstructs the root Merkle hash and determines whether it equals the stored hash. Otherwise, there exists a level i such that $BF_{i,j}(a) = 0$ for all j. $BF_{i,j}(a) = 0$ means that the original $BF_{i,j}(a) = 0$ or that its children satisfy $BF_{i+1,2j-1}(a) = 0$ and $BF_{i+1,2j}(a) = 0$. In this case, the client likewise reconstructs a root and verifies it. The completeness of a database can be verified as above. For an NN query, the client only needs to verify its correctness. In the following, we present a verification method in detail.

A. PROOFS OF INSERTION AND DELETION
When a client performs the insertion or deletion operation, the cloud server needs to provide a proof that the database has been updated correctly. The proof is presented as follows:
• Old Merkle Information. The information that was affected by the insertion or deletion, together with the information from the sibling nodes.
• New Merkle Information. The information that forms the new sibling nodes after the insertion or deletion.
After receiving the proofs, the client checks them as follows. The client first computes the old root based on the old Merkle information and tests whether it agrees with the stored root. If it agrees, the update operation has been executed on the correct database state. Afterward, the client regenerates the root from the new Merkle information and stores it as the new root of the Merkle hash tree.
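The client-side check of an update proof can be sketched as follows. The sketch simplifies the setting in ways that are assumptions of this example: it uses a plain binary Merkle tree over SHA-256 rather than hashes over Bloom filters, it models an update as the replacement of one leaf value, and the function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def root_from_path(leaf_hash: bytes, path: list[tuple[bytes, bool]]) -> bytes:
    """Fold sibling hashes from leaf to root; the flag marks a left-hand sibling."""
    node = leaf_hash
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node


def verify_update(stored_root: bytes, old_item: bytes, new_item: bytes,
                  path: list[tuple[bytes, bool]]):
    """Return the new root if the old Merkle information matches, else None."""
    if root_from_path(h(old_item), path) != stored_root:
        return None  # the old state does not agree with the stored root
    # The siblings are untouched by the update, so the same path yields the new root.
    return root_from_path(h(new_item), path)


# Four-leaf example built by hand: leaves a, b, c, d; the client updates b.
la, lb, lc, ld = h(b"a"), h(b"b"), h(b"c"), h(b"d")
n_ab, n_cd = h(la + lb), h(lc + ld)
stored_root = h(n_ab + n_cd)
path_b = [(la, True), (n_cd, False)]
new_root = verify_update(stored_root, b"b", b"B", path_b)
print(new_root is not None)
```

The client keeps only `stored_root` before the update and `new_root` afterward, which matches the constant client storage described for the Merkle hash tree.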

B. PROOFS OF SEARCH
To prove the correctness, the client regenerates the root hash based on the results and their sibling nodes. Then the client verifies whether it agrees with the stored root. In the nearest neighbor query algorithm, the cloud server must also show that the result is the NN. In this step, the client regenerates a rectangle R based on the result and verifies that no leaf nodes fall into R. This process is similar to the verification of whether a data record a is in the database.

VI. PERFORMANCE ANALYSIS

A. SECURITY ANALYSIS
In this section, we prove that the proposed scheme SecNN achieves IND-CKA security. The proof relies on a pseudorandom function $g$ that, for a randomly chosen key $k$, is computationally indistinguishable from a random function, i.e., $\{g(\cdot, k)\} \approx \{F \xleftarrow{R} RF_n;\ F\}$. We prove the security of the proposed scheme by contradiction. Suppose that the proposed scheme is not IND-CKA secure, namely, there exists an adversary A that can distinguish the index with non-negligible probability. We then construct a PPT adversary B that uses A as a subroutine to distinguish a pseudorandom function from a random function. If A can break the scheme with non-negligible probability, then adversary B can distinguish the two with non-negligible probability. The algorithm works as follows.
Given a function f, which is either a pseudorandom function or a random function, a challenger C replaces the hash h with f. Note that C can compute the function f by accessing its oracle $O_f$. Algorithm B takes algorithm A as its subroutine in the following game.
Taking an unknown function f as input, algorithm B wants to decide whether it is a pseudorandom function or a random function. B calls algorithm A to choose sets $S_0$ and $S_1$, where $S_0$ and $S_1$ have the same number of data items. B sends the sets $S_0$ and $S_1$ to the challenger C. After receiving the sets, C randomly chooses $S_b$, $b \in_R \{0, 1\}$, constructs its index $I_b$ and sends it to B. B calls A to guess which set $I_b$ was built for. A outputs $b'$ as a guess for $b$. If $b' = b$, then B outputs 1, indicating that f is a pseudorandom function. Otherwise, B outputs 0.
In the following, we show that if A can break the scheme with non-negligible probability, then algorithm B can distinguish a pseudorandom function from a random function with non-negligible probability, by proving the following claims.
• Claim 1: When f is a pseudorandom function, then $|\Pr[B^f = 1 \mid f = g(\cdot, k)] - \frac{1}{2}| \geq \epsilon$, where $\epsilon$ is the non-negligible advantage of A.
• Claim 2: When f is a random function, then $\Pr[B^f = 1 \mid f \leftarrow RF_n] = \frac{1}{2}$.
To begin with, we prove Claim 1. If f is a pseudorandom function, B runs algorithm A as a subroutine in the IND-CKA game, namely, B outputs 1 exactly when A guesses $b$ correctly. Because adversary A can break the IND-CKA game, $|\Pr[B^f = 1 \mid f = g(\cdot, k)] - \frac{1}{2}| \geq \epsilon$. Furthermore, we prove Claim 2. Recall that each node in an R-tree is assigned a different random number v.R (if there is more than one Bloom filter in a node, we assign different random numbers to different Bloom filters), and the function f uses the random number v.R to compute the Bloom filter $B_v$ of node v. Owing to the fact that f is a random function, the output of f is a random number for each node v. Facing random positions in the Bloom filter $B_v$, adversary A has to guess which set the index $I_b$ was built for. Therefore, adversary A guesses $b$ correctly with probability $\frac{1}{2}$. It follows that $\Pr[B^f = 1 \mid f \leftarrow RF_n] = \frac{1}{2}$. From Claim 1 and Claim 2, we can draw the conclusion that $|\Pr[B^f = 1 \mid f = g(\cdot, k)] - \Pr[B^f = 1 \mid f \leftarrow RF_n]| \geq \epsilon$. Namely, B can distinguish a pseudorandom function from a random function, which is impossible. Thus we prove that the proposed scheme is IND-CKA secure.

B. COMPLEXITY ANALYSIS
Compared to the other nearest neighbor query schemes, our SecNN scheme shows advantages in communication and search efficiency. The details are shown in Table 1. Furthermore, we compare the efficiency of the state-of-the-art scheme [17] and the SecNN scheme. For convenience of description, we introduce some notation. We denote by E/D an encryption/decryption of an IND-CPA secure symmetric encryption scheme, by H a hash, by m the number of MBRs $B_i$ in an R-tree, by n the size of database D, by d the dimension of the data in D, by w the bit length of the data, by $l_1$ the number of data items in the deepest nonleaf node $B_{deepest}$, and by $l_2$ the number of data items in the temporary rectangle R.
Wang et al.'s scheme [17] uses OPE to test the order information. As demonstrated in [28], the encryption/decryption computation of OPE equals the number of rounds of communication multiplied by the decryption computation of DET, namely, $E_{OPE}/D_{OPE} = \log n \times D$. In the SecNN scheme, we use the prefix encoding and Bloom filter techniques to replace OPE. Its cost is dominated by hash computations, and we present it in Table 2.

VII. PERFORMANCE EVALUATION
In this section, we present an experimental evaluation of the SecNN scheme. More specifically, we implement SecNN in C++ and run it on a machine with an Intel Xeon(R) CPU E5-2609 v3 processor running at 1.90 GHz and 96 GB of memory. The cipher used is AES-ECB-256, as provided by the Crypto library. Throughout the experiment, we simulate both the cloud server and the client on the same machine.

A. DATABASE
In the experiment, we choose 1,000,000 data items randomly from the Gowalla database, which has 6,442,890 check-in locations collected from 196,591 users.

B. NETWORK
We note that scheme [17] is affected by the communication cost due to its O(log n) interactions, where n is the number of data items in the database. The communication latency varies with the network environment. In this paper, we assume that the network is no faster than 5 ms of latency and 20 Mbps of bandwidth [45]. Table 3 shows the communication time of scheme [17].
We provide the storage and time cost of the tree construction in Fig. 7. In our experiment, the tree construction consists of two steps: TreeBuild and Enc. It is easy to see that the tree construction time and storage increase with the number of data items. The storage cost of both schemes increases linearly, and that of the proposed SecNN scheme is higher than that of scheme [17]. The time cost of tree construction has a similar trend to the storage cost. However, the tree construction is only a one-time cost, and the cost of the SecNN scheme is acceptable. Therefore, the proposed scheme is suitable for real-world applications.
As shown in Figs. 8 and 9, we provide the efficiency comparison of the nearest neighbor query with an increasing number of data items. In our experiment, we divide the Search phase into two rounds and run the experiments in two scenarios. In Figs. 8a and 9a, we simulate the cloud server and the client on the same machine without considering the communication latency. From the figures, we can conclude that both schemes achieve an average query processing time of about 10 milliseconds on 1 million data items and are very efficient. In Figs. 8b and 9b, we present the impact of communication latency on both SecNN and scheme [17]. These figures demonstrate that the communication latency sharply influences the search time of scheme [17]. We also test the client-side computation cost of both schemes. The client computation is about 12 ms in Wang et al.'s scheme [17] and 14 ms in our SecNN scheme. The cost of the SecNN scheme is slightly higher but acceptable. For these experiments, we did not actually transfer the data over a network, but merely measured the theoretical communication cost. If the response times of the client and the cloud server are considered, the performance has a 10x slowdown. From Figs. 8 and 9, it is easy to conclude that the proposed SecNN scheme is efficient in practice.
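The influence of the round count on latency can be estimated with a back-of-the-envelope calculation. The sketch below is an assumption-laden illustration, not a reproduction of Table 3: it takes $\lceil \log_2 n \rceil$ as the round count of an O(log n)-round scheme, charges the assumed 5 ms latency once per round, and ignores payload transfer time and bandwidth.

```python
import math


def rounds_logn(n: int) -> int:
    """Assumed round count of an O(log n)-round scheme: ceil(log2 n)."""
    return math.ceil(math.log2(n))


def latency_cost_ms(rounds: int, latency_per_round_ms: float) -> float:
    """Latency-only cost; payload transfer time is ignored in this sketch."""
    return rounds * latency_per_round_ms


n = 1_000_000
print(rounds_logn(n), latency_cost_ms(rounds_logn(n), 5.0))  # interactive OPE-style scheme
print(2, latency_cost_ms(2, 5.0))                            # two-round scheme
```

Under these assumptions the latency term grows with $\log n$ for the interactive scheme while it stays constant for a two-round scheme, which is consistent with the trend described for Figs. 8b and 9b.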

VIII. CONCLUSION
The nearest neighbor query is a common query operation in many engineering disciplines. In this paper, we introduce the SecNN scheme, which enables privacy-preserving nearest neighbor queries in cloud computing. The key novelty of the proposed scheme lies in leveraging the prefix encoding technique to convert a range query into testing whether two sets have common elements. To achieve a practical performance, SecNN applies an R-tree on a distributed database to facilitate the query. Additionally, it supports dynamic databases and is secure in the random oracle model. Finally, the proposed scheme is implemented on the Gowalla database. The results show that the SecNN scheme achieves high efficiency.