Privacy-Protection Path Finding Supporting the Ranked Order on Encrypted Graph in Big Data Environment

As data outsourcing in big data environments becomes increasingly popular, large quantities of graph data are outsourced to big data servers to save cost. Because the big data server cannot be fully trusted, data owners usually encrypt graph data before outsourcing it to the big data platform to protect privacy. Path finding is a frequently used operation with many practical applications. Path finding that supports the ranked order is more useful still, because the user obtains a ranked set of search results. Because the outsourced graph data is encrypted on the big data server, path finding supporting the ranked order becomes a challenging task. In this paper, we propose a solution for privacy-protection path finding supporting the ranked order on encrypted graph in big data environment (PPFR). Our scheme uses an encryption mechanism and a ranking strategy to achieve path finding supporting the ranked order. We formally analyze the security of our scheme and demonstrate its efficiency by experiment on a real graph data set.


I. INTRODUCTION
The ubiquity of big data applications promotes the development of data outsourcing. Outsourcing services bring great convenience to people [1]. At present, graphs are widely used in many fields, such as social networks [2], road networks [3], and collaboration networks [4]. To reduce costs, data owners are willing to outsource graph data to the big data server [5]. As the big data server cannot be entirely trusted, the outsourced graph data needs security processing. Encrypting graph data before it is sent to the big data platform is a very common method [6]. However, encrypted graph data is inconvenient for users to use and operate on. Therefore, performing privacy-protection path finding supporting the ranked order on encrypted graph in big data environment is a significant problem.
The associate editor coordinating the review of this manuscript and approving it for publication was Mansoor Ahmed.
Path finding in a graph is a frequently used operation with many practical applications [7]. Many operations and applications, such as community detection [8], outlier detection [9], and path planning [10], can be realized through path finding. Path finding supporting the ranked order is a stronger operation, in which a user obtains a set of search results ranked by path weight. Meanwhile, to prevent privacy disclosure, we process the graph information and the query contents with an encryption algorithm [11]. For instance, in an urban road graph, a vertex represents a location in the city, and the weight of each edge represents a path length. The rank of a path between two locations is based on the sum of the weights along it, so a user can obtain the top-k nearest distances from the query results of a path finding. A privacy-protection path finding is a path query between two vertices on an encrypted graph that reveals no private information related to the query. However, path finding on an encrypted graph is a difficult task. Considering the "pay-on-demand" rule in the data outsourcing scenario [12], it is not cost-effective to download all the graph data to the local machine. For that reason, it is extremely valuable to perform path finding supporting the ranked order directly over encrypted graph data, although the privacy concerns in big data environment make this very challenging.
To perform search over encrypted data in big data environment, searchable encryption is a very useful method that allows the big data server to run a search using an encrypted query token [13]-[17]. Searchable encryption is currently a research focus, and many researchers have proposed and implemented searchable encryption methods that support dynamic data updates [18]-[22]. But none of these methods can be used to execute path finding supporting the ranked order over encrypted graph. Recently, some research on encrypted graph search has emerged [23]-[27]. Chase et al. present the ideas of structured encryption and controlled disclosure on encrypted graph [23]. Privacy-protecting subgraph search was studied in [24]-[26]. The secure reachability search problem was studied by Yin et al. in [27]. Yet these works cannot solve the problem of path finding supporting the ranked order on encrypted graph.
We solve this problem with an effective solution for privacy-protection path finding supporting the ranked order on encrypted graph in big data environment (PPFR). In the PPFR scheme, we achieve secure path finding supporting the ranked order by building an index, and we ensure search security. We first link and encrypt every two vertices to get new symbols. Then we build a chained list for each new symbol that holds the ranked order values and the path information. Next, we build an index based on all the chained lists, and the index is kept on the big data server. Before executing path finding, the query vertices are encrypted and converted to a new symbol set, which is sent to the big data platform. The big data server runs the path finding using the index and the query symbols, and the query results are returned to the user after query processing. The big data server cannot obtain private information about the query results or the query symbols. Security analysis and experimental results show that our PPFR scheme is provably secure and highly efficient.
The contribution of our research can be stated as follows.
(1) We design a scheme to solve the problem of path finding supporting the ranked order on encrypted graph in big data environment.
(2) We analyze the security of the scheme, and ensure the privacy of the query process and query results.
(3) We demonstrate the efficiency of the scheme through the experiment results.
The rest of our paper is organized as follows. Section II introduces the related work. Section III designs and analyzes the PPFR scheme. Section IV gives the security analysis of the PPFR scheme. Section V evaluates our PPFR scheme through experiments. Finally, Section VI summarizes our paper.

II. RELATED WORK
In the current field of information technology [28]-[31], information security is an extremely significant aspect, and the protection of private information is a basic need of the public [32]-[34]. Outsourced data security is one of our main research directions at present. Searchable encryption plays an important role in querying outsourced cloud data [35]. A large amount of data is outsourced to remote servers in encrypted form, and privacy-protection queries can be performed over the data. In general, there are two modes of searchable encryption: symmetric searchable encryption (SSE) and asymmetric searchable encryption (ASE) [15], [16]. As symmetric encryption is more efficient than asymmetric encryption in terms of computation and overhead, our scheme design uses the symmetric encryption principle.
In outsourced data query, searchable symmetric encryption has become a very useful basic primitive [13]-[17]. Song et al. first presented the idea of searchable symmetric encryption in [13]. Goh adopted the Bloom filter in [14], and put forward the idea of the secure index to solve the query problem on a remote server. Non-adaptive SSE and adaptive SSE were presented by Curtmola et al. in [16]. Chang et al. proposed a simulation-based idea in [17], intended to protect the privacy of the search index and the query token. A number of extended searchable encryption schemes were then proposed in [18]-[21]. A dynamic searchable encryption method that supports the addition and deletion of outsourced files was proposed in [18]. For very large databases, Cash et al. proposed and implemented a dynamic SSE idea in [19]. Hahn et al. presented a secure searchable encryption scheme in which the outsourced data can be efficiently updated [20]. An SSE scheme that supports high scalability and boolean queries was presented by Cash et al. in [21]. Fu et al. proposed effective schemes based on a concept hierarchy to address the problem of semantic retrieval in [22]. But none of the above solutions can be used to implement path finding supporting the ranked order on encrypted graph.
Some researchers have studied and addressed privacy-preserving graph query problems in recent years [23]-[27], [36]. Chase et al. proposed the notion of structured encryption and studied a query scheme for encrypted graph [23]. Cao et al. defined and addressed the problem of secure query over encrypted graph, and accomplished subgraph query through a "filtering-and-verification" mechanism [24]. Zhang et al. used privacy homomorphism and obscuration ideas to implement privacy-preserving substructure similarity query on encrypted graph in [25]. Fan et al. proposed a practical privacy-protection method to perform structure-preserving subgraph query in [26]. Yi et al. studied the problem of secure reachability query on outsourced encrypted graph in [27], [36]. However, existing schemes for encrypted graph queries have not solved the problem of path finding supporting the ranked order on encrypted graph.
In our paper, we devise a scheme to perform path finding supporting the ranked order in big data environment. We first build the linked lists based on the transformed graph vertices and build an index on the basis of the linked lists. The queries are then executed on the big data server using the index and the encrypted query symbols. Finally, we give a security analysis and an experimental evaluation to demonstrate the security and efficiency of our proposed scheme.

III. PPFR SCHEME CONSTRUCTION
A. PRELIMINARIES
The ideas of semantic security and indistinguishability were first proposed in [37]. A scheme is semantically secure if whatever an attacker can compute about the plaintext given the ciphertext, he can also compute without the ciphertext [37]. In our design, we use a triple (Kge, Enc, Dec) of three polynomial-time algorithms to denote a semantically secure symmetric encryption scheme [38]. Kge is the key generation algorithm, Enc is the symmetric encryption algorithm, and Dec is the symmetric decryption algorithm. The main notations used in our paper are listed in Table 1.
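As an illustration of the (Kge, Enc, Dec) interface only, the following minimal Python sketch builds a randomized symmetric cipher from HMAC-SHA256 used as a pseudorandom function. The function names `kge`, `enc`, and `dec`, the 16-byte nonce, and the keystream construction are assumptions made for this example; the paper does not specify a concrete cipher, and a deployment would rather use a vetted authenticated cipher.

```python
import hashlib
import hmac
import secrets


def kge(l: int = 32) -> bytes:
    """Kge: return a fresh uniformly random key of l bytes."""
    return secrets.token_bytes(l)


def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Expand (key, nonce) into n pseudorandom bytes (HMAC-SHA256, counter mode)."""
    out = b""
    counter = 0
    while len(out) < n:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:n]


def enc(key: bytes, plaintext: bytes) -> bytes:
    """Enc: randomized encryption; output is nonce || (plaintext XOR keystream)."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def dec(key: bytes, ciphertext: bytes) -> bytes:
    """Dec: split off the nonce, regenerate the keystream, and XOR it away."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))
```

Because a fresh nonce is drawn on every call, encrypting the same plaintext twice gives different ciphertexts, which is the behavior a semantically secure scheme needs. The sketch provides no integrity protection.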

B. DESIGN OVERVIEW
In the big data outsourcing environment, the architecture of the query system is illustrated in Fig. 1. It is mainly composed of three parts: the big data server, the data owner, and data users. To protect the outsourced graph data and perform secure path finding, we encrypt the sensitive information of the graph data and then build a query index to implement path finding on the big data server. The big data server is not trustworthy, and is responsible for storing the outsourced graph and the index. A user's path finding request needs to be encrypted for security reasons. In this paper, our main work focuses on the implementation of path finding supporting the ranked order on encrypted graph. For the authentication and query control of data users, we use the strategies of previous searchable encryption schemes, e.g. broadcast encryption [16], [17]. The proposed PPFR scheme performs path finding supporting the ranked order on the big data server, and follows the rules of existing searchable encryption [16], [17]. Our scheme also aims to achieve the following goals.
(1) Privacy-protection path finding function. A query user can achieve path finding supporting the ranked order by the encrypted query symbols.
(2) Security. The security of the proposed scheme is formally analyzed, and the privacy of query symbols and results cannot be divulged to the big data server.
(3) Efficiency. Our scheme accomplishes the path finding functionality at low cost.
In the proposed scheme, we use several data structures, such as the index and the chained list. Both undirected and directed graphs can be handled; for uniformity, we consider undirected graphs in our design. To meet the requirement of the ranked order, we rank paths by the weight computed along each path. We use the chained list to store the encrypted graph vertices and the new symbols obtained after conversion. The query request has to be encrypted before the query is executed. To implement the secure query on the big data server, an index needs to be built, and it must not leak private information.
We achieve path finding in our scheme in three steps. First, we process the vertices to get new symbols and then build the path chained list of each new symbol, where each node of the chained list comprises the path relation and the ranking weight. All the nodes of each chained list need to be encrypted for privacy and security. Second, we build the index, in which the nodes of all the chained lists are randomly placed. Finally, the query vertices are encrypted and processed, and then sent to the big data server. The big data server performs path finding supporting the ranked order using the encrypted symbols and the index. In conformity with most SSE work [16], [17], [22], we presume that the big data server can use the adaptive attack model, and that query users have mutual request authentication and search control mechanisms with the data owner [16].

C. SCHEME DESIGNING
In our proposed scheme, the major problem is to build the index and the path finding method in big data environment. The algorithms used in the scheme are listed as follows.
• KeysG($1^l$): Key generation algorithm. It takes $l$ as an input parameter, and outputs a key $\kappa$.
• Buildchainedlist($G$, $K$): Builds a chained list for every two vertices to hold path information. It takes the graph $G$ and the key set $K$ as inputs, and outputs the set of encrypted linked symbols $U$ and the chained list set $L$.
• Buildindex($U$, $L$, $K$, $K'$): Builds the path finding index. It takes the set $U$, the chained list set $L$, the key set $K$, and the other key set $K'$ as inputs, and outputs the index $I$.
• BuildQueryterm($v_i$, $v_j$, $K$): Builds the encrypted query symbol. The inputs are two different vertices $v_i$ and $v_j$ (where $1 \le i, j \le n$ and $i \ne j$) and the key set $K$, and the output is the encrypted query symbol set $T_{v_{i,j}}$.
• Queryindex($I$, $T_{v_{i,j}}$): Executes path finding in big data environment. It takes the index $I$ and the query symbol set $T_{v_{i,j}}$ as inputs, and outputs the path information set.
As mentioned above, we use (Kge, Enc, Dec) as the symmetric encryption scheme in our path finding scheme, and take $l$ as the security parameter. The construction of our PPFR scheme is as follows.

1) BUILDING CHAINED LIST
For the outsourced graph $G$, we use $V = \{v_1, v_2, \ldots, v_n\}$ to represent the set of vertices. Every two different vertices are linked and encrypted, which yields a new symbol $u_i$ ($1 \le i \le m$, where $m$ is the number of paths in the graph $G$). The set of all the new symbols is represented as $U = \{u_1, u_2, \ldots, u_m\}$. Next, we build a chained list $L_i$ for each symbol $u_i$; the specific process is shown in Algorithm 1. We compute the sum of the weights on each path, which is used as the value for the ranked order. The contents of each node in the chained list include the path information and the value for the ranked order, and we uniformly use the symbol path_cons to denote the contents of each node. To prevent privacy disclosure, we encrypt the node contents. The two key sets used are respectively denoted as $K = \{\kappa_1$
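The chained-list step can be illustrated with a small Python sketch under stated assumptions: the graph is a plain adjacency dictionary, the symbol for a vertex pair is derived with HMAC-SHA256 (a deterministic keyed function standing in for "link and encrypt"), and each node holds a `(weight_sum, path)` pair playing the role of path_cons. The names `simple_paths`, `symbol`, and `build_chained_lists` are hypothetical, and node encryption is omitted for brevity. This is not Algorithm 1 itself.

```python
import hashlib
import hmac
from itertools import combinations


def simple_paths(adj, src, dst, path=None, cost=0):
    """Yield (cost, path) for every simple path from src to dst (depth-first)."""
    path = [src] if path is None else path
    if src == dst:
        yield cost, list(path)
        return
    for nxt, weight in adj[src].items():
        if nxt not in path:  # never revisit a vertex on the current path
            path.append(nxt)
            yield from simple_paths(adj, nxt, dst, path, cost + weight)
            path.pop()


def symbol(key: bytes, vi: str, vj: str) -> str:
    """Assumed symbol derivation: keyed hash of the linked, order-normalized pair."""
    a, b = sorted((vi, vj))
    return hmac.new(key, f"{a}|{b}".encode(), hashlib.sha256).hexdigest()


def build_chained_lists(adj, key: bytes) -> dict:
    """Map each pair symbol to its paths, sorted by ascending weight sum."""
    lists = {}
    for vi, vj in combinations(sorted(adj), 2):
        ranked = sorted(simple_paths(adj, vi, vj))
        if ranked:
            lists[symbol(key, vi, vj)] = ranked
    return lists


# Example: a weighted triangle a-b (1), b-c (2), a-c (4).
adj = {"a": {"b": 1, "c": 4}, "b": {"a": 1, "c": 2}, "c": {"a": 4, "b": 2}}
lists = build_chained_lists(adj, b"k" * 32)
```

Enumerating all simple paths grows exponentially with graph size, so a practical implementation would bound the path length or the number of paths kept per pair; the sketch does neither.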

2) BUILDING PATH FINDING INDEX
To implement path finding on encrypted graph in big data environment, we need to build an index with which the big data server can perform secure path finding. The building process of the path finding index is shown in Algorithm 2. For each element $u_i$ in the set $U$, all of its path contents are kept in the chained list $L_i$, and the number of paths is $|L_i|$. For $1 \le j \le |L_i|$, we build a tag for the path information of $u_i$ by linking $u_i$ and $j$, and the tag is denoted as $u_i \| j$. All the tags of $u_i$ are then represented as the set $J_{u_i} = \{u_i \| 1, \ldots, u_i \| |L_i|\}$. The matching path information of each element in the set $J_{u_i}$ is stored in the path finding index $I$. Carrying out path finding for $u_i$ is equivalent to searching $I$ for the entries that match all the tags in the set $J_{u_i}$. Each tag in the set $J_{u_i}$ corresponds to exactly one element in the path finding index $I$. To prevent the big data server from learning the number of paths of each element $u_i$, we add disturbing elements to the index $I$, so that the number of paths of each element in the set $U$ is the same value $e$, that is,
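The padding idea above can be sketched in Python as follows. The sketch assumes that each chained list is already a list of encrypted node contents (opaque bytes), that the index address of the tag $u_i \| j$ is an HMAC-SHA256 value under a single key, and that a disturbing element is marked by the sentinel `DUMMY`. How disturbing elements are encoded, and the per-symbol keys used in the paper, are not reproduced here; `address` and `build_index` are hypothetical names.

```python
import hashlib
import hmac
import random

DUMMY = b""  # assumed sentinel marking a disturbing (padding) element


def address(key: bytes, symbol: str, j: int) -> str:
    """Index address for the tag symbol || j, derived with a keyed hash."""
    return hmac.new(key, f"{symbol}|{j}".encode(), hashlib.sha256).hexdigest()


def build_index(lists: dict, key: bytes, e: int) -> dict:
    """Place every node of every chained list in one index, padded to e per symbol."""
    entries = []
    for symbol, nodes in lists.items():
        if len(nodes) > e:
            raise ValueError("e must be at least the length of the longest list")
        padded = list(nodes) + [DUMMY] * (e - len(nodes))
        for j, node in enumerate(padded, start=1):
            entries.append((address(key, symbol, j), node))
    random.shuffle(entries)  # placement order should carry no information
    return dict(entries)


# Example: two symbols with 2 and 1 paths, padded to e = 3 entries each.
key = b"k" * 32
index = build_index({"u1": [b"p1", b"p2"], "u2": [b"q1"]}, key, e=3)
```

With this layout the index always holds $m \cdot e$ entries for $m$ symbols, so its size reveals only $m$ and $e$, not the individual list lengths.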

3) BUILDING QUERY SYMBOL AND PERFORMING PATH FINDING
When the index creation is complete, the outsourced graph and the index are stored on the big data server. For every two different vertices $v_i$ and $v_j$, we build the query symbol set $T_{v_{i,j}} = T_{u_x} = (\rho_{x1}, \ldots, \rho_{xe})$, where $1 \le x \le m$. Specifically, the query symbol set is created by a symmetric encryption function $Enc_{key}(\cdot)$, that is, $T_{v_{i,j}} = T_{u_x} = (\rho_{x1}, \ldots, \rho_{xe}) = (Enc_{\kappa_x}(u_x \| 1), \ldots, Enc_{\kappa_x}(u_x \| e))$. When a query user wants to perform path finding for the vertices $v_i$ and $v_j$, the query symbol set $T_{v_{i,j}}$ (that is, $T_{u_x}$) is sent to the big data server. Using the index, the big data server carries out path finding by Algorithm 3.
In the path finding algorithm Queryindex, for each $1 \le y \le e$, if $I[\rho_{xy}]$ is not a disturbing value, we add $I[\rho_{xy}]$ to the path finding result set. If the result set is empty, no query results match the query symbol. Since the server performs one index lookup for each of the $e$ elements of the query symbol set, the time complexity of path finding is $O(e)$.
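The query step can be sketched in Python as below. The sketch assumes the index is a dictionary keyed by addresses derived with HMAC-SHA256 from the tag $u_x \| y$, and that disturbing values are marked by a sentinel `DUMMY`; both are illustrative choices, as are the names `build_query_term` and `query_index`. It is not Algorithm 3 itself.

```python
import hashlib
import hmac

DUMMY = b""  # assumed sentinel marking a disturbing value


def address(key: bytes, symbol: str, y: int) -> str:
    """Query symbol rho_xy for the tag symbol || y (keyed hash as a stand-in)."""
    return hmac.new(key, f"{symbol}|{y}".encode(), hashlib.sha256).hexdigest()


def build_query_term(key: bytes, symbol: str, e: int) -> list:
    """User side: produce the e query symbols (rho_x1, ..., rho_xe)."""
    return [address(key, symbol, y) for y in range(1, e + 1)]


def query_index(index: dict, token: list) -> list:
    """Server side: one lookup per query symbol, skipping disturbing values."""
    results = []
    for rho in token:
        value = index.get(rho, DUMMY)
        if value != DUMMY:
            results.append(value)
    return results


# Example: symbol "u1" has two real entries and one disturbing entry (e = 3).
key = b"k" * 32
index = {
    address(key, "u1", 1): b"p1",
    address(key, "u1", 2): b"p2",
    address(key, "u1", 3): DUMMY,
}
hits = query_index(index, build_query_term(key, "u1", 3))
misses = query_index(index, build_query_term(key, "u9", 3))
```

If the chained-list nodes were stored at positions 1 through $e$ in ranked order, the results come back in that same order, and the loop performs exactly $e$ dictionary lookups.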

IV. SECURITY ANALYSIS
With the PPFR scheme built, we now analyze its security. We first present several notions that are used in the security analysis of our PPFR scheme [16].
• History: The interaction between the big data server and the path finding user, containing the outsourced graph $G$ and the query symbol sets, represented as $H_q = (G, T_1, \ldots, T_q)$. The partial history is represented as $H_q^t = (G, T_1, \ldots, T_t)$, where $t \le q$.
• View: Considering a history $H_q$ under the key $\kappa$, the view is defined as $V_\kappa(H_q) = (Enc_\kappa(G), I, T_1, \ldots, T_q)$. The partial view is $V_\kappa^t(H_q) = (Enc_\kappa(G), I, T_1, \ldots, T_t)$, where $t \le q$.
• Access Pattern: Considering a history $H_q$ under the key $\kappa$, the access pattern is the tuple $R(H_q)$ whose $i$-th entry is the path finding results matching the query symbol $T_i$.
• Search Pattern: Considering a history $H_q$ under the key $\kappa$, the search pattern is a binary symmetric matrix over the $q$ queries, such that the entry in row $i$ and column $j$ is 1 if $T_i = T_j$ and 0 otherwise.
• Trace: Considering a history $H_q$ under the key $\kappa$, the trace $T_r(H_q)$ is the sequence consisting of $|Enc_\kappa(G)|$, the access pattern $R(H_q)$, and the search pattern of $H_q$, where $|Enc_\kappa(G)|$ is the overall size of the encrypted outsourced graph. The trace of a partial history is represented as $T_r(H_q^t)$.
When performing path finding, the big data server does not learn the private information of the graph data or the path finding results. We prove that the path finding in our proposed scheme meets the adaptive semantic security guarantee [16]. In the adaptive attack model, the adversary (i.e., the big data server) can choose each query request according to the query symbols and path finding results of previous queries [16]. Our security analysis follows the security notion used in previous schemes [16], [17]. Under this notion, the big data server cannot learn anything beyond the information we are willing to divulge, i.e., the trace, and hence our scheme is secure. The security theorem for our path finding is stated as follows.
Theorem 1: Our path finding supporting the ranked order on encrypted graph meets the adaptive semantic security.
Proof: To prove the security of our scheme, we first define a polynomial-size simulator $S$. For each $q \in \mathbb{N}$, given the trace of a partial history $T_r(H_q^t)$, the simulator $S$ can generate a view $(V_q^t)^*$ that is indistinguishable from the adversary's view $V_\kappa^t(H_q)$, where $\kappa$ is a randomly chosen key and $0 \le t \le q$.
For $t = 0$, the simulator $S$ builds the encrypted graph in $(V_q^t)^*$ from an arbitrarily generated string. By the indistinguishability of the semantically secure symmetric encryption $Enc_{key}(\cdot)$, the encrypted graph built in $(V_q^t)^*$ is indistinguishable from that in $V_\kappa^t(H_q)$. The simulator $S$ then builds the simulated index $I^* = \{I_1^*, \ldots, I_m^*\}$ from the trace $T_r(H_q^0)$ of the partial history using generated random strings. The size of the index $I^*$ is the same as that of the real index $I$, which is obtained from the trace of the partial history, and the index $I^*$ can be used in all partial views $(V_q^t)^*$ to simulate the adversary's view, where $0 \le t \le q$. Clearly, $I^*$ is indistinguishable from the real $I$; otherwise one could distinguish between the outputs of the symmetric encryption $Enc_{key}(\cdot)$ and random strings of the same size. Thus, $(V_q^0)^*$ is indistinguishable from $V_\kappa^0(H_q)$.
For $1 \le t \le q$, the simulated index $I^*$ in the partial view $(V_q^t)^*$ is still the one built by the simulator $S$. $T_r(H_q^t)$ includes the search pattern matrix of the first $t$ path findings. The simulator $S$ constructs the query symbols $(T_1^*, \ldots, T_t^*)$ that are included in $(V_q^t)^*$. In doing so, the simulator $S$ either reuses the symbols $(T_1^*, \ldots, T_{t-1}^*)$ contained in $(V_q^{t-1})^*$ or reconstructs them from $T_r(H_q^{t-1})$. To generate $T_t^*$, the simulator $S$ first uses the search pattern to check whether $u_t$ was involved in an earlier query. If $H_q^{t-1}$ does not involve $u_t$, the access pattern gives the results $(R(u_t \| 1), \ldots, R(u_t \| e))$. Each $R(u_t \| i)$ involves only one encrypted path content, where $1 \le i \le e$. The simulator $S$ constructs a new encrypted path content by choosing a random string. The simulator $S$ then randomly chooses addresses $ad_1, \ldots, ad_e$ from $I^*$, making sure that any two addresses $ad_i$ and $ad_j$ with $i \ne j$ are distinct, and builds the query symbol $T_t^* = (ad_1, \ldots, ad_e)$. The simulator $S$ remembers the correlation between $T_t^*$ and $u_t$. Otherwise, if $H_q^{t-1}$ involves $u_t$, the simulator $S$ retrieves the query information corresponding to $u_t$ and assigns it to $T_t^*$. This ensures that if $H_q^t$ involves repeated query symbols, the query information contained in $(V_q^t)^*$ is exactly the same. The query symbols $(T_1^*, \ldots, T_t^*)$ in $(V_q^t)^*$ are therefore indistinguishable from the query symbols $(T_1, \ldots, T_t)$ in $V_\kappa^t(H_q)$; otherwise, the output of a symmetric encryption function and a random string of the same size would be distinguishable. Therefore, for all $0 \le t \le q$, there is no probabilistic polynomial-size adversary that can distinguish between $(V_q^t)^*$ and $V_\kappa^t(H_q)$. Thus, the security theorem of our proposed scheme is proven.
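The simulator's behavior in the proof can be made concrete with a short Python sketch. It assumes the trace supplies the number of index entries, the entry length, the value $e$, and the search pattern as a 0/1 matrix; the names `simulate_index` and `simulate_tokens` are hypothetical. The sketch also draws addresses without reuse across different queries, a slightly stronger choice than the distinctness within one query symbol set that the proof requires.

```python
import random
import secrets


def simulate_index(num_entries: int, entry_len: int) -> dict:
    """I*: random addresses mapped to random strings, same size as the real index."""
    index = {}
    while len(index) < num_entries:
        index[secrets.token_hex(32)] = secrets.token_bytes(entry_len)
    return index


def simulate_tokens(index: dict, search_pattern: list, e: int, rng=random) -> list:
    """Build T*_1..T*_q; search_pattern[t][s] == 1 iff queries s and t coincide."""
    unused = list(index)
    rng.shuffle(unused)
    tokens = []
    for t, row in enumerate(search_pattern):
        # Look for an earlier query that the search pattern marks as identical.
        earlier = next((s for s in range(t) if row[s] == 1), None)
        if earlier is not None:
            tokens.append(tokens[earlier])  # repeated query: reuse its token
        else:
            tokens.append([unused.pop() for _ in range(e)])  # e fresh addresses
    return tokens


# Example: three queries where the first and third are the same, e = 3.
sim_index = simulate_index(num_entries=6, entry_len=8)
pattern = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
sim_tokens = simulate_tokens(sim_index, pattern, e=3)
```

The simulator uses only quantities contained in the trace, which is the point of the argument: everything the server sees can be produced without the key or the plaintext graph.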

V. EXPERIMENTAL EVALUATIONS
In this section, we test and evaluate our proposed scheme on the Hep-Ph citation network graph [39], [40]. We carry out our experiments with programs written in C on both the local working machine and the big data server. Our local machine runs Windows 10 with an Intel Core 4 CPU at 2.6 GHz. The big data server runs Linux with 6 CPU cores at 3.0 GHz and 16 GB of RAM. In this paper, our main work is the construction of the privacy-protection path finding method, including index building, query symbol building, and path finding. The performance evaluations of building the index, generating query symbols, and decrypting query results are made on the local machine. The path finding performance evaluation of our scheme is made on the big data server.
In general, a graph data set has more paths if it has more edges for the same number of vertices. The numbers of vertices and edges in the outsourced graph therefore influence our experimental evaluations. In the experiments on our PPFR scheme, we use five randomly selected outsourced graph sets, and consider two scenarios to evaluate performance. In the first scenario the outsourced graph has more edges (marked as PPFR1), and the numbers of graph edges are 8136, 17052, 36242, 75628, and 129573, respectively. The second scenario has fewer edges (marked as PPFR2), and the number of edges is approximately half of that in the first scenario. The comparison between the two scenarios is designed to assess the time and storage costs of path finding and to validate the efficiency of our PPFR scheme.

A. INDEX BUILDING
To implement path queries and protect the security of private information on the big data server, we create a secure index and encrypt the query symbols. In our PPFR scheme, we first link and encrypt every two vertices in the graph vertex set to get a new symbol set. Based on the new symbol set, we build a path chained list for each new symbol through the Buildchainedlist algorithm, and then generate the secure index through the Buildindex algorithm. We compare the experimental evaluations on the outsourced graph data sets in the two scenarios and plot the results of index generation. The experimental results of index building time are plotted in Fig. 2, where the X-axis shows the number of vertices in the outsourced graph, and the Y-axis represents the time of index building.
As Fig. 2 shows, the number of paths increases as the number of vertices in the outsourced graph increases. The time of index building is nearly linear in the number of vertices in the graph. Generally, for graphs with the same number of vertices, there are more paths if the graph contains more edges. Consequently, the time of index building in the PPFR1 scenario is greater than that in the PPFR2 scenario. Likewise, the results for the index size are shown in Fig. 3. The horizontal axis in Fig. 3 shows the number of vertices in the graph, and the vertical axis shows the index size. The index size increases almost linearly with the number of vertices in the two scenarios, and the index is larger when there are more edges for the same number of vertices in the graph. Therefore, the index size in the PPFR1 scenario is larger than that in the PPFR2 scenario.

B. PERFORMING PATH FINDING
When performing query operations, the big data server uses the Queryindex algorithm to execute path finding by means of the index, and the returned query results are then decrypted on the local machine. The results of the experimental analysis are plotted in Fig. 4 and Fig. 5, where the abscissa shows the number of vertices in the outsourced graph, and the ordinate represents the query time or decryption time. In Fig. 4, we can see that the query time of path finding is nearly linear in the number of vertices in the graph, and the query operation in the PPFR2 scenario takes less time than that in the PPFR1 scenario.
After the path finding query is completed, the query user gets the retrieved results returned from the big data server. The results for the decryption process in our experiment are shown in Fig. 5, where the X-axis represents the number of vertices in the outsourced graph, and the Y-axis shows the time of the decryption operation. The decryption time of our scheme depends on the decryption algorithm and the size of the retrieved results. The decryption algorithms used in the PPFR1 and PPFR2 scenarios are identical. Since the size of the retrieved results equals the number of paths, the decryption time depends on the number of paths. The decryption time in both scenarios increases with the number of vertices, and the trend is almost linear. The decryption time in the PPFR1 scenario is greater than that in the PPFR2 scenario.
In conclusion, our experimental analysis shows that the time and storage overhead of building the path finding method in our scheme increases almost linearly with the number of vertices. The operations of index building and query result decryption are done offline on the client side, and the big data server does not obtain information about the outsourced contents, query symbols, or query results. As a result, our proposed PPFR scheme accomplishes path finding supporting the ranked order while meeting the security and efficiency goals.

VI. CONCLUSION
In this paper, we propose a novel path finding scheme supporting the ranked order based on the idea of searchable encryption. We first give the overall design of our scheme, which consists of three steps: building the chained lists, generating the index, and performing path finding. We next prove that our scheme meets adaptive semantic security. We finally evaluate our proposed scheme through experimental analysis, which shows that the scheme has good performance and efficiency.
BIN WU received the Ph.D. degree from the Huazhong University of Science and Technology in 2017. He is currently a Lecturer with the School of Information Science and Technology, Jiujiang University. His research interests include privacy preserving, big data security, cloud security, information security, and information query.
TAO YAN received the Ph.D. degree in communication and information systems from Shanghai University, Shanghai, China, in 2010. He has been with the faculty of the School of Information Engineering, Putian University, where he is currently an Associate Professor. His major research interests include multiview high efficiency video coding, rate control, and video codec optimization.