An Efficient Two-Server Ranked Dynamic Searchable Encryption Scheme

Searchable encryption (SE) allows users to search over encrypted data without decrypting it. In most existing SE schemes, the server returns all matched files without relevance ranking, and the update mechanisms incur high communication and computation costs, so they are not efficient enough for real-life dynamic scenarios. To address these issues, we propose TS-RDSE, a Two-Server Ranked Dynamic Searchable Encryption scheme. We integrate orthogonal vectors and an efficient homomorphic cryptosystem to build a vector-level dynamic secure index, which supports efficient and flexible dynamic update operations such as file deletion and insertion. Moreover, to rank the search results by relevance without decryption, we build a secure sorting protocol based on the widely used tf-idf weighting formula and the additive property of partially homomorphic encryption, which sorts the search results accurately while protecting the privacy of the relevance scores. We give a comprehensive analysis of the correctness of TS-RDSE with respect to searching and sorting, with mathematical proofs. The security analysis shows that TS-RDSE is secure against adaptive dynamic chosen-keyword attacks (CKA2) by honest-but-curious adversaries in the random oracle model. The performance analysis shows that TS-RDSE has both a very light user workload and a moderate server workload, and that it is superior to existing approaches in terms of functionality and extensibility. Extensive experiments on a real-world dataset validate our analysis and show that TS-RDSE is suitable for real-world cloud storage environments.

With the rapid development of computer technology and the spread of 5G service scenarios [1], outsourced data and applications are growing rapidly. To save the cost of data management and local system maintenance, a large number of individual users and enterprises choose to migrate their data and business to the cloud and allow the cloud service provider to store and process their data according to their demands. Since data in the cloud is out of the user's physical control, the user cannot effectively check the data's security and consistency without fully trusting the cloud storage service provider. However, in recent years, more and more security incidents have been caused by security risks at cloud service providers [2], [3]. Statistics show that about 92 percent of enterprises worry that outsourcing data to the cloud is a threat to data privacy [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Yassine Maleh.
To prevent cloud administrators and hackers from gaining unauthorized access to the user's data, the user should encrypt the data locally before outsourcing it. By encrypting sensitive data and storing it on an untrusted cloud server, the user can ensure that data confidentiality is guaranteed regardless of the privacy terms of the cloud service provider. However, because traditional encryption changes the original characteristics of the data, it limits the server's ability to perform search.
In order to let the server compute search functions as comprehensively as possible without decrypting the encrypted data, searchable encryption (SE) was introduced [5], and it quickly became an active research topic both in theory (e.g., [6]-[8]) and in practice (e.g., [9], [10]). The goal of SE research is to ensure the availability of outsourced encrypted data while protecting user privacy and data security. At present, SE research still has shortcomings in the following aspects. Traditional SE schemes mainly consider the static setting. Forward index-based SE supports dynamic updates directly; however, it suffers from search complexity that is linear in the number of files. Many schemes use an inverted index structure to speed up the search process on the cloud server. When files are added or deleted, the index structure must then be reconstructed, which is inefficient in dynamic scenarios: the user's workload and communication overhead are high, and much information is leaked to the server. Although some schemes use additional data structures to perform update operations [6], [16], the access pattern and search pattern are leaked during updates, and the high complexity of the dynamically updatable structure still leads to low update efficiency and allows only a limited number of update operations.
Moreover, most existing schemes do not capture the relevance between the search request and the retrieved files. Many ranked SE schemes [12]-[15] return all the files that match the search request to the user, who then ranks the results locally based on relevance weights. In large-data scenarios, the cloud server is expected to sort the search results by their relevance to the search request and return only the most relevant files to the user, which avoids transferring irrelevant files and hence minimizes the communication and computation cost on the user side. With ranked results, the user can obtain the most relevant files more effectively. Many researchers have studied how to sort search results in SE schemes to support ranked retrieval. However, the existing sorting methods are either based on approximate ranking measurements [14]-[16] or need complex operations [11], [13], [18]. Furthermore, many schemes do not take keyword frequency into account, so two files receive the same rank even if one contains the searched keywords infrequently and the other contains them very frequently. In terms of security, the proposed solutions mainly rely on stronger assumptions [15], [19] (e.g., introducing trusted third parties) or on specific security models [11], [18], [20]-[23] (e.g., rank privacy is slightly weakened by allowing the server to obtain weight information).
Therefore, it is fair to say that designing a searchable encryption scheme that offers both efficient dynamic updates and secure, accurate result ranking is a very challenging task that remains open. To address these issues, we studied state-of-the-art updatable search index structures and MPC-based comparison methods, and propose TS-RDSE, a two-server ranked dynamic searchable encryption scheme that supports efficient updating and secure sorting. The contributions are summarized as follows:
• We design a new dynamic secure index structure based on partially homomorphic encryption and orthogonal vectors. Thanks to the properties of orthogonal vectors, this vector-level index supports efficient search together with flexible dynamic update operations such as file deletion and insertion. The index includes the encrypted keyword-file relevance scores, which are computed with the tf-idf weighting formula and encrypted with homomorphic encryption. To perform a search, the server only needs a simple vector multiplication between the search token and the index to decide whether the search keywords appear, and then computes the sum of the multi-keyword weight scores in the encrypted domain using the additive property of partially homomorphic encryption.
• We build a two-server secure sorting protocol to achieve ranked search in the encrypted domain obliviously. In this protocol, two non-colluding servers play the role of the cloud service provider. By deploying an MPC-based comparison mechanism with Batcher's sorting network, this protocol achieves efficient and accurate sorting while hiding the relevance scores and the result order from the servers. Compared with the single-server model, the two-server model reduces the communication cost between the user and the server, and also lowers the information leaked during searching and sorting.
• We give a comprehensive analysis of the correctness of TS-RDSE with mathematical proofs. The formal analysis of security and performance shows that TS-RDSE has both a very light user workload and a moderate server workload while being secure against adaptive dynamic chosen-keyword attacks by honest-but-curious adversaries in the random oracle model. Extensive experiments on a real-world dataset validate our analysis and show that TS-RDSE is superior to existing approaches in terms of functionality and is suitable for real-world cloud storage environments.

II. RELATED WORKS
In the recent literature, researchers have proposed a number of searchable encryption schemes that focus on the two aspects of secure result ranking and dynamic update.

A. DYNAMIC SEARCHABLE ENCRYPTION
Facing the requirements of dynamic scenarios, researchers have put much effort into dynamic update methods for secure indexes that support both search and update operations. Among them, schemes built on a forward index support efficient updating of files and indexes [17], [18], which makes the dynamic update phase easy to deploy in practical applications such as Skyhigh Networks [9], CipherCloud [10] and Bitglass [24]. However, the forward index structure reveals a lot of plaintext-related information, and its search efficiency is poor. Kamara et al. proposed a dynamic searchable encryption scheme based on a linked-list structure in 2012 [6]. However, the update complexity of this scheme is linear in the number of files in the collection that contain the updated keywords. They also extended their scheme to achieve parallel search [16] by deploying a red-black tree in the index structure. In 2014, Cash et al. [7] designed a dynamic SE scheme for big-data scenarios, which supports encrypted search over billions of record-keyword pairs. Fu et al. [25] designed a dynamic SE scheme that achieves semantic search based on a keyword extension technique. However, it leaks the position of a newly added keyword, because that position is fixed. To reduce update leakage, Bost constructed a dynamic searchable encryption scheme that satisfies forward security [8], but it only supports searching and adding files, not deleting them. In 2018, Zuo et al. [21] built two dynamic searchable encryption schemes that achieve forward security and partial backward security at the same time. However, they allow only a limited number of update operations, and the authors did not provide a detailed privacy analysis. Later, Wang et al. [26] claimed that the schemes in [21] have several problems related to security and scalability, and they proposed a generic forward-private DSSE scheme with range queries; they also extended their scheme to achieve backward privacy by adding another round trip between the user and the server. In the same year, Li et al. [27] proposed an integrity-verifiable multi-keyword dynamic searchable encryption scheme in response to the demand for multi-keyword searchable encryption in practical application scenarios. Etemad et al. [28] proposed to store the secure index on both the user side and the server side, which requires synchronization on every update, while the search key must be re-encrypted and updated after every search operation.
From the above schemes we can conclude that the dynamic update mechanisms in SE either cause extra information leakage when files are updated, or require heavy computation to build and update the index structure, which leads to low update efficiency (even linear in the number of files containing the updated keywords) and allows only a limited number of update operations.

B. RANKED SEARCHABLE ENCRYPTION
Wang et al. [20] first proposed a ranked SE scheme, in which the frequencies of keywords are encrypted together with the file identifier, and the search time is sub-linear in the number of keywords thanks to a one-to-many order-preserving mapping. Cao et al. [11] extended the scheme in [20] and proposed a solution that supports multi-keyword ranked encrypted search based on coordinate matching and inner product similarity. However, they build the secure index on a Boolean representation, so all files that contain the searched keywords have the same weight score. Like many other schemes, it does not take keyword frequency into account, so two files receive the same rank even if one contains the searched keywords infrequently and the other contains them very frequently.
The term frequency-inverse document frequency (tf-idf) weighting formula is a common information retrieval method for evaluating the importance of a keyword to one document in a set of documents or a corpus [29]. Based on tf-idf, the follow-up works below were proposed to achieve better ranking functionality and greater efficiency. In 2013, Yu et al. [30] proposed a dynamic ranked SE scheme that stores the tf values and idf values of each file separately, in an inverted index and in auxiliary vectors. On an update, only the idf value of the updated keyword is re-computed. Therefore, instead of updating all file vectors, only the auxiliary vectors are updated. In 2014, Strizhov et al. [14] combined tf-idf and inner product similarity (IPS) to design a new ranked SE scheme, in which the weight scores are encrypted with a ring-LWE-based variant of a homomorphic cryptosystem. In 2015, Orencik and Savas [22] proposed a multi-keyword ranked searchable encryption scheme based on locality-sensitive hashing (LSH) [32]. However, LSH is only suitable for similarity search and cannot provide accurate ranking. Sun et al. [31] designed a verifiable ranked searchable encryption scheme on a tree structure, using the cosine measure from information retrieval in the vector space model. Their scheme improves search efficiency at the expense of accuracy. Xia et al. [17] also proposed a ranking scheme in which weights are encrypted with an order-preserving encryption (OPE) scheme. The server clusters similar files with the k-nearest neighbors method, filters the retrieved identifiers, and sends only the top-k most relevant identifiers to the user. In 2018, Shen et al. [13] proposed a conjunctive encrypted query scheme based on the scheme in [7]; they employ OPE to sort the results. Unfortunately, OPE keeps the encrypted data sorted, which may enable leakage attacks, so rank privacy is slightly weakened. In 2020, Guan et al. [23] designed a cross-lingual multi-keyword rank search scheme with semantic extension, which also speeds up the sorting process through a top-k data retrieval protocol based on binary heap tree structures.

C. TWO-SERVER SEARCHABLE ENCRYPTION
The design of two-server SE is a recent but active research topic both in theory and in practice. In 2014, Elmehdwi et al. [33] utilized two non-colluding clouds with the Paillier cryptosystem to perform encrypted k-nearest neighbors queries, where the purpose of the two-server setting is to prevent the servers from learning the actual access patterns. Wang et al. [34] deployed two non-colluding servers that jointly compute arithmetic functions over multi-key encrypted data. Their scheme reduces privacy leakage by leveraging homomorphic encryption and proxy re-encryption. However, the interaction between the two servers involves fairly complex processes such as pairing-based operations on elliptic curves, which slightly weakens scalability. In 2015, Orencik et al. [22] proposed a two-server ranked SE scheme, where one server performs the search and the other performs the ranking. But its search operation requires more expensive asymmetric homomorphic encryption than [34]. In 2018, Rompay et al. [35] proposed a two-server multi-user SE scheme based on private set intersection (PSI), where each of the two servers receives either the encrypted data or the key. Meng et al. [14] proposed a relational database encryption scheme that supports two-server range queries with the No-Random-Access (NRA) algorithm. In 2018, Shen et al. [13] extended the scheme in [7] to the two-server setting so that conjunctive results are sorted by keyword occurrences encrypted using OPE. In 2019, Wu et al. [37] also performed secure classification over an encrypted database in a two-cloud environment.
In practice, settings with two non-colluding servers have been built and used in real-world applications for both performance and security reasons [18], [36], [38]. One server acts as a crypto cloud equipped with a secure processor, which stores the decryption key. The other is the cloud service provider, typically a large company that also has a commercial interest in not colluding. Aiming at real-world applications, Li et al. [18] designed an image retrieval service in 2018 that supports similarity search and updates over an encrypted image database under the prevalent two-server model, where the service provider only needs to perform the basic job of processing individual images. In 2019, Cheng et al. [38] constructed an efficient privacy-preserving auction mechanism for two-sided cloud markets, which enables data-oblivious double auctions by sharing keys with one market without leaking bidding privacy.
Following the works done above, we deploy our framework under two non-colluding honest-but-curious cloud servers to perform searching and sorting, one holding the encrypted data and the other the secret key.

III. BUILDING BLOCKS

A. PAILLIER CRYPTOSYSTEM
The Paillier cryptosystem [39] is a probabilistic public-key cryptosystem proposed by Pascal Paillier in 1999. The message space $M$ is $\mathbb{Z}_n$, where $n$ is the product of two large primes $p$ and $q$. Define the function $L$ as $L(x) = (x - 1)/n$. For a message $m \in \mathbb{Z}_n$, we denote by $[m] \in \mathbb{Z}^*_{n^2}$ the encryption of $m$ under the public key $pk$. Paillier encryption consists of three algorithms P = {P.KeyGen, P.Enc, P.Dec}:
• P.KeyGen($1^k$): To construct the public key, set an RSA modulus $n = pq$ of $k$ bits, where $p$ and $q$ are large primes such that $\gcd(pq, (p-1)(q-1)) = 1$. Let $K = \mathrm{lcm}(p-1, q-1)$ and pick $g \in \mathbb{Z}^*_{n^2}$. The public key is the pair $pk_P = (n, g)$ and the secret key is $sk_P = K$.
• P.Enc$_{pk_P}(m)$: Choose a random $r \in \mathbb{Z}^*_n$ and output $[m] = g^m r^n \bmod n^2$.
• P.Dec$_{sk_P}([m])$: Output $m = L([m]^K \bmod n^2) / L(g^K \bmod n^2) \bmod n$.
Because it is secure and efficient and has the additive homomorphic property, Paillier encryption has been used in many applications such as e-voting. The additive homomorphic property is:
$$[m_1] \cdot [m_2] = [m_1 + m_2], \qquad [m]^a = [a \cdot m].$$
We also use a generalization of Paillier encryption [40]. The message space expands to $\mathbb{Z}_{n^s}$ for $s > 1$, and the ciphertext space is the group $\mathbb{Z}^*_{n^{s+1}}$. This generalization allows one to doubly encrypt messages and to use the additive homomorphism of the inner encryption layer under the same secret key. Note that the generalized version of Paillier uses $g \in \mathbb{Z}^*_{n^{s+1}}$ as the public key, while the secret key is $K'$ such that $K' \bmod n \in \mathbb{Z}^*_n$ and $K' \equiv 0 \bmod K$, where $K = \mathrm{lcm}(p-1, q-1)$ is the secret key of the original Paillier scheme. We denote by $[[m]]$ the ciphertext of $m$ under the second layer. This extension allows a ciphertext of the first layer to be treated as a plaintext in the second layer. Moreover, the nested encryption preserves the structure over inner ciphertexts and allows one to manipulate it as follows:
$$[[\,[m_1]\,]]^{[m_2]} = [[\,[m_1] \cdot [m_2]\,]] = [[\,[m_1 + m_2]\,]].$$

B. GOLDWASSER-MICALI CRYPTOSYSTEM
TS-RDSE also relies on the Goldwasser-Micali (GM) cryptosystem [41]. The message space $M$ is $\{0, 1\}$, and $n$ is the product of two large primes $p$ and $q$. For a message $m \in \{0, 1\}$, we denote by $\|m\| \in \mathbb{Z}^*_n$ the encryption of $m$ under the public key $pk$. Let $J_p(x)$ be the Jacobi symbol of $x$ modulo $p$. GM encryption consists of three algorithms GM = {GM.KeyGen, GM.Enc, GM.Dec}:
• GM.KeyGen($1^k$): To construct the public key, set an RSA modulus $n = pq$ of $k$ bits, where $p$ and $q$ are large primes such that $\gcd(pq, (p-1)(q-1)) = 1$. Choose a quadratic non-residue $\delta \in \mathbb{Z}^*_n$ whose Jacobi symbol satisfies $\left(\frac{\delta}{n}\right) = 1$. The public key is $pk_{GM} = (\delta, n)$, and the private key is $sk_{GM} = (p, q)$.
• GM.Enc$_{pk_{GM}}(m)$: Choose a random $r \in \mathbb{Z}^*_n$ and output $\|m\| = \delta^m r^2 \bmod n$.
• GM.Dec$_{sk_{GM}}(c)$: To decrypt a ciphertext $c = \|m\|$, compute the Jacobi symbols $J_p(c)$ and $J_q(c)$. If $J_p(c) = J_q(c) = +1$, then the message bit $m$ is 0; otherwise it is 1.
GM is additively homomorphic over bits, that is, with respect to addition modulo 2:
$$\|m_1\| \cdot \|m_2\| = \|m_1 \oplus m_2\|.$$
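The two homomorphic properties above can be checked numerically. The following is a minimal sketch, not the scheme's implementation: it assumes toy primes (1019 and 1031, far too small to be secure), fixes the common choice $g = n + 1$ for Paillier and $\delta = n - 1$ for GM (valid here because both primes are congruent to 3 modulo 4), and tests quadratic residuosity with Euler's criterion. All function names are hypothetical.

```python
import math
import random

# Toy parameters (assumption): both primes are 3 mod 4; real keys are ~1024 bits each.
P, Q = 1019, 1031
N = P * Q
N2 = N * N

def rand_unit(rng):
    """Sample a random element of Z_N^* (invertible modulo N)."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            return r

# ---- Paillier, with g = N + 1 (an assumed, commonly used choice) ----
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # K = lcm(p-1, q-1)
G = N + 1

def L(x):
    return (x - 1) // N

def p_enc(m, rng):
    """[m] = g^m * r^n mod n^2."""
    return pow(G, m, N2) * pow(rand_unit(rng), N, N2) % N2

def p_dec(c):
    """m = L(c^K mod n^2) / L(g^K mod n^2) mod n."""
    mu = pow(L(pow(G, LAM, N2)), -1, N)
    return L(pow(c, LAM, N2)) * mu % N

# ---- Goldwasser-Micali, with delta = N - 1 (non-residue with Jacobi symbol +1) ----
DELTA = N - 1

def gm_enc(bit, rng):
    """||m|| = delta^m * r^2 mod n."""
    return pow(DELTA, bit, N) * pow(rand_unit(rng), 2, N) % N

def gm_dec(c):
    """Bit is 0 iff c is a quadratic residue mod p (Euler's criterion)."""
    return 0 if pow(c, (P - 1) // 2, P) == 1 else 1

rng = random.Random(0)
c1, c2 = p_enc(17, rng), p_enc(25, rng)
print(p_dec(c1 * c2 % N2))        # decrypts the sum of the two plaintexts
print(p_dec(pow(c1, 3, N2)))      # decrypts three times the first plaintext
b1, b2 = gm_enc(1, rng), gm_enc(1, rng)
print(gm_dec(b1 * b2 % N))        # decrypts the XOR of the two bits
```

Multiplying Paillier ciphertexts adds the plaintexts, which is what lets a server sum encrypted weight scores, while multiplying GM ciphertexts XORs the encrypted bits.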

C. TF-IDF WEIGHTING FORMULA
We use the tf-idf formula [29] to weight search results based on the frequency of keywords in each document and in the collection. Term frequency (tf) measures how frequently a term occurs in a document, while inverse document frequency (idf) measures how important a term is. Let $N$ be the number of documents in the collection, $p$ the number of times term $t$ appears in a document $d$, $n$ the number of terms in $d$, and $q$ the number of documents that contain $t$. Then:
$$\mathrm{tf\text{-}idf}(t, d) = \mathrm{tf}(t, d) \times \mathrm{idf}(t) = \frac{p}{n} \times \log\frac{N}{q}.$$
If a document $d$ does not contain $t$, we set tf-idf$(t, d)$ to zero.
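As an illustration of the weighting step, the sketch below computes tf-idf with the quantities $p$, $n$, $N$ and $q$ defined above. It assumes the plain form $(p/n) \cdot \log(N/q)$ with a natural logarithm and no smoothing; the function name and the tiny corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """Return (p / n) * log(N / q), or 0.0 when the term is absent from doc.

    doc is a list of terms; corpus is a list of such documents.
    The natural logarithm and the lack of smoothing are assumptions.
    """
    p = Counter(doc)[term]                       # occurrences of the term in doc
    if p == 0:
        return 0.0
    n = len(doc)                                 # number of terms in doc
    N = len(corpus)                              # number of documents
    q = sum(1 for d in corpus if term in d)      # documents containing the term
    return (p / n) * math.log(N / q)

corpus = [
    ["cloud", "search", "cloud"],
    ["search", "index"],
    ["ranking"],
]
for word in ("cloud", "search", "ranking"):
    print(word, [round(tf_idf(word, d, corpus), 4) for d in corpus])
```

Because Paillier plaintexts are integers, real-valued scores like these would have to be scaled to integers before encryption; the scaling used by the scheme is not stated in this section.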

A. ARCHITECTURE
The architecture of TS-RDSE is shown in Figure 1. It consists of three types of entities: $U$, $S_1$ and $S_2$.
• $U$: the owner of the outsourced files, who can issue keyword search requests and obtain search results from the server.
• $S_1$: the cloud service provider, which provides data storage and search services for $U$.
• $S_2$: the server that has deployed a secure processor, which blindly assists $S_1$ in searching and sorting.
VOLUME 8, 2020
Assume $U$ has a file collection $f$. TS-RDSE allows $U$ to encrypt $f$ to $c$ locally and outsource the encrypted files $c$ to $S_1$. To enable $S_1$ to search through $c$, $U$ builds a secure index $\gamma$, which contains a search index used for searching and a weight index used for ranking. $U$ can later send keyword search requests to $S_1$. Upon receiving a search request, $S_1$ searches the secure index $\gamma$ and retrieves the identities of all matched files that satisfy the request. Without revealing anything about the search result, it can evaluate the encrypted relevance weights of the files in the result and sort them obliviously with the assistance of $S_2$. $S_1$ then filters out the irrelevant results and returns the final ranked search results to $U$. After receiving the results, $U$ decrypts them locally and obtains the ranked list of files. In addition, the user can dynamically update the outsourced file collection and index on demand by sending an update token to $S_1$.

B. FORMAL DEFINITION
Definition 1: TS-RDSE consists of a tuple of six polynomial-time algorithms and protocols TS-RDSE = (KeyGen, Setup, TokenGen, Search, Sort, Update) such that:
• $(K, PK, SK) \leftarrow \mathrm{KeyGen}(1^k)$: is a probabilistic algorithm run by $U$ that takes a security parameter $1^k$ as input and outputs the user's symmetric key $K$ and public/secret keys $PK$, $SK$.
• $(\gamma, c) \leftarrow \mathrm{Setup}(K, PK, f)$: is a probabilistic algorithm run by $U$ that takes the symmetric key $K$, the public key $PK$ and a set of files $f$ as input, and outputs a secure index $\gamma$ and a set of ciphertexts $c$.
• $\tau \leftarrow \mathrm{TokenGen}(PK, K, W)$: is a deterministic algorithm run by $U$ that takes the symmetric key $K$, the public key $PK$ and a set of keywords $W$ as input, and outputs a search token $\tau$.
• $(S_1(R); S_2(\perp)) \leftarrow \mathrm{Search}(S_1(\tau, \gamma); S_2(SK))$: is an interactive protocol run by $S_1$ and $S_2$, in which $S_1$ takes the search token $\tau$ and the secure index $\gamma$ as input and $S_2$ takes the secret key $SK$ as input; the output of $S_1$ is the search result $R$.
• $(S_1(R_s); S_2(\perp)) \leftarrow \mathrm{Sort}(S_1(R, PK); S_2(SK))$: is an interactive protocol run by $S_1$ and $S_2$, in which $S_1$ takes the search results $R$ as input and $S_2$ takes the secret key $SK$ as input; the output of $S_1$ is the ranked search result $R_s$.
• $(U(\perp); S_1(\gamma', c')) \leftarrow \mathrm{Update}(U(K, PK, f); S_1(\gamma, c))$: is an interactive protocol run between $U$ and $S_1$. In the protocol, the user takes the keys $K$, $PK$ and the file $f$ as input; $S_1$ takes the secure index $\gamma$ and the encrypted file set $c$ as input. After the protocol, $S_1$ outputs the updated encrypted index $\gamma'$ and the updated encrypted file set $c'$.
We formalize the ideal/real world paradigm for TS-RDSE. In the real world Real($1^k$), the protocols between the adversarial servers and the user execute exactly as in the real scheme. In the ideal world Ideal($1^k$), there exist two simulators $Sim_1$ and $Sim_2$, who obtain the leakage information from the leakage functions $L_1$ to $L_4$ and try to simulate the execution of $S_1$ and $S_2$ in the real world.
Using the leakage, the simulators attempt to imitate the input and output behavior of the servers in the real instantiation. If Ideal($1^k$) and Real($1^k$) are indistinguishable to any PPT adversary, then the servers learn no information beyond the leakage, i.e., all data passed to the leakage functions that is not output by the leakage functions remains confidential. The two paradigms are defined as follows.
Definition 2: Given the scheme described in Definition 1, consider the following probabilistic experiments, where $U$ is the user, $A_1$ and $A_2$ are two non-colluding honest-but-curious PPT adversaries, $Sim_1$ and $Sim_2$ are two PPT simulators, and $L_1$ to $L_4$ are leakage functions:
Real($1^k$): It runs among the two adaptive PPT adversaries $A_1$, $A_2$ and the user $U$ using the real scheme. In this experiment, two sets of files $f_0$ and $f_1$ with the same number of files and keywords are sent to $U$. $U$ randomly chooses $b \leftarrow \{0, 1\}$, generates a secure index $\gamma_b$ of $f_b$, and sends $\gamma_b$ to $A_1$. Subsequently, polynomially many queries $(q_1, \ldots, q_t)$ are made. For each query $q_i$: if it is a search query, $A_1$ chooses two multi-keyword search requests of equal size and sends them to $U$, who generates the search token for the request selected by $b$ and sends it to $A_1$; if it is an update query, $A_1$ chooses two files and sends them to $U$, who generates the update token $\tau^{(i)}_{f_b}$ and sends it to $A_1$. Since $A_1$ and $A_2$ are adaptive adversaries, the result of each query can be used as input to generate the next query. At the end of all queries, the information $A_1$ received is recorded as $view_{A_1}$. $A_2$ only participates in the interaction with $A_1$, and this interaction is recorded as $view_{A_2}$. After the experiment, $A_1$ outputs $Output_{A_1}$ and $A_2$ outputs $Output_{A_2}$.
Ideal($1^k$): It is run by the two simulators $Sim_1$ and $Sim_2$. $Sim_1$ and $Sim_2$ respond to queries using randomly generated data, with the leakage functions as their only input. From the information leaked by $L_1$ and $L_2$, $Sim_1$ builds a simulated index structure $\tilde{\gamma}$. After that, polynomially many queries $(q_1, \ldots, q_t)$ are sent to $Sim_1$. For each query $q_i$, whether it is a search or an update, the corresponding token is simulated from the leakage alone. At the end of all queries, the information $Sim_1$ received is recorded as $view_{Sim_1}$. $Sim_2$ only participates in the interaction with $Sim_1$, and this interaction is recorded as $view_{Sim_2}$. After the experiment, $Sim_1$ outputs $Output_{Sim_1}$ and $Sim_2$ outputs $Output_{Sim_2}$.
We say that TS-RDSE is $(L_1, L_2, L_3, L_4)$-dynamically secure against adaptive chosen-keyword attacks in the random oracle model if, for all polynomial-time adversaries $A_1$ and $A_2$, there exist polynomial-time simulators $Sim_1$ and $Sim_2$ such that the two distribution ensembles given by the outputs of Real($1^k$) and Ideal($1^k$) are computationally indistinguishable.

In TS-RDSE, before outsourcing files, $U$ encrypts his file collection with a symmetric encryption method that guarantees CPA security of the original files. He also extracts the keywords of each file and builds an inverted index. Each entry of the index corresponds to a keyword and leads to the set of files that contain this word. To support multi-keyword ranked search based on relevance weights, $U$ first computes the tf-idf scores of all (keyword, file) pairs to evaluate relevance, and inserts the scores into a key-value weight index: each entry of the weight index corresponds to a file ID and leads to the set of its keywords along with their weight scores.
We use two semantically secure homomorphic encryption schemes, Paillier and GM, to encrypt the weight index into a secure weight index based on binary orthogonal vectors. Then, the user publishes the public key and outsources the encrypted data collection and the secure index to $S_1$.
To start a search, $U$ sends a search token to $S_1$. $S_1$ searches for the matched files in the secure index. After the search process, the homomorphic property of Paillier allows $S_1$ to compute the sum of the weights of multiple keywords in a single file without decryption, and thus to obtain the final encrypted weight of every search result. To rank the search results obliviously, we design a two-server MPC-based secure sorting protocol, run between $S_1$ and $S_2$, that sorts the search results. Our protocol relies on the homomorphic properties of the Paillier cryptosystem to allow $S_1$ and $S_2$ to privately compare and swap pairs of ciphertexts. To achieve secure and efficient ranking, we also require a parallel sorting network that performs comparisons in an oblivious way and guarantees that the result is sorted after a deterministic sequence of comparisons. We pick Batcher's sorting network [42] for this purpose. Moreover, $U$ can update the outsourced file set and secure index dynamically. The orthogonality of the binary orthogonal vector-based index makes it possible to support efficient and oblivious updates. Note that the search and update phases are both non-interactive with respect to $U$: $U$ only needs to prepare his search or update token and send it to $S_1$.

B. EXPLICIT CONSTRUCTION
Let SKE = (SKE.Gen, SKE.Enc, SKE.Dec) be a CPA-secure symmetric key encryption scheme. Let $F : \{0,1\}^k \times \{0,1\}^* \to \{0,1\}^k$ be a pseudo-random function. Let P = (P.Gen, P.Enc, P.Dec) and GM = (GM.Gen, GM.Enc, GM.Dec) be the Paillier encryption scheme and the GM encryption scheme. The explicit construction is given as follows:

1) KEY GENERATION
To initialize the system, $U$ needs to generate his own keys for encrypting files and building the secure index. He first selects a $k$-bit string $k_F$ as the key of the pseudo-random function $F$, and runs the symmetric key generation algorithm SKE.Gen to generate the symmetric secret key $k$. Moreover, he uses the Paillier key generation algorithm P.Gen to generate the key pair $(pk_P, sk_P)$, and the GM key generation algorithm GM.Gen to generate the key pair $(pk_{GM}, sk_{GM})$.
To construct the secure index, $U$ generates an orthogonal vector basis $V$. He first chooses $p$ Hadamard arrays $A_1, A_2, \ldots, A_p$, where each $A_i$ has size $e_i \times e_i$ for $1 \le i \le p$ and every $e_i$ is either 2, 4 or 8. He then constructs the matrix $A_M$ of size $e_1 e_2 \cdots e_p$ as the tensor product of these $p$ matrices, $V = A_M = A_1 \otimes A_2 \otimes \cdots \otimes A_p$. Finally, he chooses an orthogonal vector basis $(\tilde{v}_1, \ldots, \tilde{v}_{\#W})$ from $V$, in which the length of each $\tilde{v}_i \in V$ is $1^k$.
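The basis construction described above can be sketched in a few lines. This is an illustrative sketch only: it assumes Sylvester-type Hadamard matrices with entries +1/-1 and picks the orders (2, 4, 2) arbitrarily; the helper names are hypothetical.

```python
def hadamard(order):
    """Sylvester Hadamard matrix of order 2, 4 or 8 (entries +1 / -1)."""
    if order not in (2, 4, 8):
        raise ValueError("order must be 2, 4 or 8")
    h = [[1]]
    while len(h) < order:
        # Doubling step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def kron(a, b):
    """Kronecker (tensor) product of two matrices stored as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# V = A_1 (x) A_2 (x) A_3 with orders e = (2, 4, 2): a 16 x 16 matrix.
basis = kron(kron(hadamard(2), hadamard(4)), hadamard(2))

# Every pair of distinct rows is orthogonal; each row has squared norm 16.
print(len(basis), dot(basis[0], basis[5]), dot(basis[5], basis[5]))
```

Because the tensor product of Hadamard matrices is again a Hadamard matrix, any subset of its rows can serve as the per-keyword vectors $\tilde{v}_i$.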
Finally, $U$ outputs $(K, SK, PK)$ as his keys, where $K = (k, k_F, V)$, $PK = (pk_P, pk_{GM})$ and $SK = (sk_P, sk_{GM})$. $U$ sends $PK$ to the two servers and sends $SK$ to server $S_2$ over a secure channel.

2) SETUP
Let $f = (f_1, \ldots, f_{\#f})$ be the file set. For each file $f_i$, $U$ extracts its keywords $w_{f_i}$ and encrypts $f_i$ to $c_i$ with the symmetric key encryption algorithm SKE.Enc$_k(f_i)$. Then $U$ constructs the secure index $\gamma$, which consists of a search index $I$ and a weight index $S$.
After all keywords are indexed, $U$ selects a random number $r \in R$ and a random vector $\tilde{v}_r \in V$ to build the final search index $I$. The weight index reuses the vectors $\tilde{v}_j$ and the ciphertexts $[w_j]_{r_{w_j}}$ that were generated when building the search index $I$. Finally, the user selects a random number $r \in R$ and a random vector $\tilde{v}_r \in V$, computes the entry $S[id_i]$ for each file, and builds the final weight index $S$. Subsequently, $U$ integrates the search index and the weight index into $\gamma = (I, S)$, outputs $\gamma$ as the secure index and $c = (c_1, \ldots, c_{\#f})$ as the encrypted files, and sends them to $S_1$. The Setup procedure is shown in Algorithm 2.

3) TOKEN GENERATION
As shown in Algorithm 3, to search for a set of keywords $W = \{w_1, \ldots, w_q\}$, $U$ forms a search token $\tau$. For each search keyword $w_i \in W$, $U$ first computes the inverse element of $w_i$ modulo $n$, written $\bar{w}_i \in \mathbb{Z}^*_n$, and the inverse element of $r_{w_i}$ modulo $n^2$, written $\bar{r}_{w_i}$. Then $U$ encrypts $\bar{w}_i$ to $[\bar{w}_i]_{\bar{r}_{w_i}}$ with the Paillier encryption algorithm P.Enc, and generates a sub-token $\tau_i$ as the vector product of the secure vector $\tilde{v}_i$ and this ciphertext: $\tau_i = [\bar{w}_i]\,\tilde{v}_i$. After $U$ has generated all the sub-tokens for $W$, he builds the final search token $\tau = (\tau_1, \ldots, \tau_q)$ and sends $\tau$ to $S_1$.

Algorithm 3 TokenGen
Require: $PK$, $K$, $W$
Ensure: $\tau$
...
5: $[\tilde{w}_i]_{\tilde{r}_{w_i}} \leftarrow \mathrm{P.Enc}_{pk_P}(\tilde{w}_i, \tilde{r}_{w_i})$
...

4) SEARCH
After receiving a search token $\tau = (\tau_1, \ldots, \tau_q)$, $S_1$ performs the multi-keyword encrypted search over the index $\gamma$. It first computes the vector product $R_{w_i}$ of each $\tau_i$ in $\tau$ with the search index $I$: $R_{w_i} = I \cdot \tau_i \bmod n^2$. If the keyword $w_i$ exists in the keyword set $\mathbf{w}$, $S_1$ obtains $R_{w_i} = \|v_i\| \bmod n^2$, which is a GM ciphertext of the binary index vector of $w_i$. Otherwise, $R_{w_i}$ is 0 by the property of the orthogonal vector basis. Afterwards, $S_1$ generates the search results $R = (\|v_1\|, \ldots, \|v_q\|)$ and sends them to $S_2$. $S_2$ decrypts each $\|v_i\|$ in $R$ with the GM decryption algorithm GM.Dec to get the binary vector $v_i$ of each keyword $w_i \in W$, and computes the intersection of $v_1, \ldots, v_q$ to get the final binary vector $v_W$ of the files that satisfy the search request. Finally, $S_1$ generates the search results $R = \{\langle id_1, e_1 \rangle, \ldots, \langle id_d, e_d \rangle\}$, where $d$ is the number of matched files. The search phase is shown in Algorithm 4.
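The role of orthogonality in the search step can be illustrated on plaintext values. The sketch below is not the encrypted protocol: it replaces every ciphertext by a plain integer bitmask (bit $t$ set means that file $t$ contains the keyword), assumes Sylvester-type Hadamard rows as the secure vectors, and uses hypothetical helper names.

```python
def sylvester(k):
    """2^k x 2^k Sylvester-type Hadamard matrix; its rows are mutually orthogonal."""
    h = [[1]]
    for _ in range(k):
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def build_index(postings, basis, pad_row, pad_value):
    """I = sum_j payload_j * v_j + pad_value * v_pad, with plain integer payloads."""
    index = [pad_value * x for x in basis[pad_row]]
    for row, payload in postings.items():
        index = [acc + payload * x for acc, x in zip(index, basis[row])]
    return index


def query(index, basis, row):
    """Project the index onto basis[row]; dividing by the dimension normalizes v.v to 1."""
    return dot(index, basis[row]) // len(basis)


basis = sylvester(3)  # 8 orthogonal vectors of length 8
# keyword row -> bitmask of matching files
postings = {0: 0b1011, 1: 0b0110, 2: 0b1110}
index = build_index(postings, basis, pad_row=7, pad_value=424242)

hits_w0 = query(index, basis, 0)   # payload stored for keyword 0
hits_w2 = query(index, basis, 2)   # payload stored for keyword 2
absent = query(index, basis, 5)    # unused vector: the product vanishes
both = hits_w0 & hits_w2           # intersection for a two-keyword query
```

All sub-indexes share one vector of fixed length, and a query for a keyword whose vector was never added projects to zero, which mirrors the two cases distinguished in the search phase.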

5) SORT
Since the weight scores $(e_1, \ldots, e_d)$ in the search result $R$ are encrypted with Paillier encryption, the randomization of the ciphertexts prevents $S_1$ from sorting the search result and returning the identifiers ranked by their weight scores to U. To achieve ranking, we construct a secure sorting protocol that runs between $S_1$ and $S_2$ and privately sorts the weights of the matched files. Our idea is to compare each pair $(\langle [id_x], e_x \rangle, \langle [id_y], e_y \rangle)$ in $\tilde{R}$ by the order of $\mathrm{P.Dec}_{sk_P}(e_x)$ and $\mathrm{P.Dec}_{sk_P}(e_y)$, and then to use this comparison as a black-box building block for the sorting protocol. We pick Batcher's sorting network [39] because it performs many comparisons efficiently in parallel.
At each level $i$, for every pair $(\langle [id_x], e_x \rangle, \langle [id_y], e_y \rangle)$ in $\tilde{R}$, the servers run a secure comparison protocol SC on the order of $\mathrm{P.Dec}_{sk_P}(e_x)$ and $\mathrm{P.Dec}_{sk_P}(e_y)$. The goal of SC is that $S_1$ obtains an encryption of the comparison result $v$ without learning either the actual numbers or $v$ itself, where $v = 1$ if $\mathrm{P.Dec}_{sk_P}(e_x) \le \mathrm{P.Dec}_{sk_P}(e_y)$ and $v = 0$ otherwise. We also require that $S_2$ learns nothing about the relation between $\mathrm{P.Dec}_{sk_P}(e_x)$ and $\mathrm{P.Dec}_{sk_P}(e_y)$ and only assists $S_1$ in obtaining an encryption of the comparison result. The main steps that $S_1$ and $S_2$ perform in SC are as follows. $S_1$ computes $[z]$ from $e_x$ and $e_y$, blinds it with a random value $r$ as $[d] = [z][r] \bmod n^2$, and sends $[d]$ to $S_2$. $S_2$ recovers $d$, retrieves the $l$-th bit of $d$, denotes it as $d_l$ and sends it to $S_1$ as the GM ciphertext $\|d_l\|$. $S_1$ retrieves the $l$-th bit of $r$ and denotes it as $r_l$. $S_1$ and $S_2$ then engage in a private comparison protocol (e.g., the DGK protocol [43]) on the low-order parts $d \bmod 2^l$ and $r \bmod 2^l$. After the DGK protocol, $S_1$ receives $\|\lambda\|$, the GM ciphertext of the comparison result $\lambda$, where $\lambda = 1$ if $d \bmod 2^l < r \bmod 2^l$. To sort, $S_1$ needs the encrypted bit under the second layer of generalized Paillier encryption, that is, $[[v]]$. For this purpose, $S_1$ takes the most significant bit of $z$, denoted by $v$, and computes $\|v\| = \|d_l\| \cdot \|r_l\| \cdot \|\lambda\|$. The servers then perform the bit re-encryption and the encrypted chosen protocol to get the ranked $\|\lambda\|$. The detailed description of SC is shown in Algorithm 5.

Algorithm 5 SC
...
7: $d_l \leftarrow d \bmod 2^l$
8: $(S_1(\|\lambda\|); S_2(\perp)) \leftarrow \mathrm{DGK}(S_1(r_l, PK); S_2(d_l, SK))$
9: $S_2$: $\|d_l\| \leftarrow \mathrm{GM.Enc}_{pk_{GM}}(d_l)$
10: $S_2 \rightarrow S_1$: $\|d_l\|$
11: $S_1$: $\|r_l\| \leftarrow \mathrm{GM.Enc}_{pk_{GM}}(r_l)$
12: $\|v\| \leftarrow \|d_l\| \cdot \|r_l\| \cdot \|\lambda\|$
13: $r_s \leftarrow_R \{0, 1\}$
14: $\|s_{r_s}\| \leftarrow \|v\| \cdot \|0\|$; $\|s_{1-r_s}\| \leftarrow \|v\| \cdot \|1\|$
15: $S_1 \rightarrow S_2$: $\|s_0\|$, $\|s_1\|$
16: $S_2$: $s_0 \leftarrow \mathrm{GM.Dec}_{sk_{GM}}(\|s_0\|)$
...

The output of the sorting protocol is $R_s = \{\langle [id_1], e_1 \rangle, \ldots, \langle [id_d], e_d \rangle\}$, which is a re-encryption of the array $R$ under the first layer of Paillier that satisfies $\mathrm{P.Dec}_{sk_P}(e_1) \le \mathrm{P.Dec}_{sk_P}(e_2) \le \ldots \le \mathrm{P.Dec}_{sk_P}(e_d)$. Finally, $S_1$ sends the final ranked result $R_s$ to U. U decrypts each $[id_i]$ in $R_s$ and obtains the ranked identities $(id_1, \ldots, id_d)$. The sort phase is shown in Algorithm 6.
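The product $\|d_l\| \cdot \|r_l\| \cdot \|\lambda\|$ relies on the XOR homomorphism of GM encryption. A toy sketch of textbook Goldwasser-Micali with tiny primes shows the property; the parameter values and the names `gm_enc` and `gm_dec` are assumptions of this illustration, and the key size offers no security.

```python
import math
import random

# Toy primes with P = Q = 3 (mod 4), so that -1 is a quadratic non-residue
# modulo both primes and can serve as the public non-residue X.
P, Q = 499, 547
N = P * Q
X = N - 1


def gm_enc(bit):
    """Encrypt one bit as r^2 * X^bit mod N with fresh randomness r."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return pow(r, 2, N) * pow(X, bit, N) % N


def gm_dec(c):
    """Decrypt with the factor P: quadratic residues decrypt to 0 (Euler's criterion)."""
    return 0 if pow(c, (P - 1) // 2, P) == 1 else 1


def gm_xor(*ciphertexts):
    """Multiplying ciphertexts modulo N yields an encryption of the XOR of the bits."""
    out = 1
    for c in ciphertexts:
        out = out * c % N
    return out


# v = d_l XOR r_l XOR lambda, computed on ciphertexts only.
enc_v = gm_xor(gm_enc(1), gm_enc(0), gm_enc(1))
v = gm_dec(enc_v)
```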

Algorithm 6 Sort
Require: $S_1(R, PK)$; $S_2(SK)$
Ensure: $S_1(R_s)$; $S_2(\perp)$

6) UPDATE
To update a file $f$, U first generates the two components $\tau_f^1$ and $\tau_f^2$ of the update token. Afterwards, he sends the integrated update token $\tau_f = (\tau_f^1, \tau_f^2)$ to $S_1$. $S_1$ updates the search index $I$ to $I'$ by adding and deleting vectors as $I' = I - I_{w_f} + I'_{w_f}$, based on the adding mechanism of the orthogonal vector group, and updates the weight index $S$ to $S'$ by replacing the original vector that $S[\tau_f^2[1]]$ points to with the new vector. It then outputs the updated index $\gamma' = (I', S')$.
Algorithm 7 presents our update protocol for adding a new file. Deleting a file works in a similar way and, to avoid repetition, is not described here.
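The index update $I' = I - I_{w_f} + I'_{w_f}$ can be illustrated on plaintext values. In this minimal sketch the sub-indexes carry plain integer bitmasks instead of ciphertexts, the secure vectors are rows of a fixed 4 x 4 Hadamard matrix, and all names are hypothetical.

```python
# Rows of a 4 x 4 Hadamard matrix: pairwise orthogonal, each with v.v = 4.
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def scaled(vec, k):
    return [k * x for x in vec]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def query(index, vec):
    """Recover the payload stored along vec."""
    return dot(index, vec) // dot(vec, vec)


def apply_update(index, old_sub, new_sub):
    """I' = I - I_w + I'_w, computed component-wise."""
    return [i - o + n for i, o, n in zip(index, old_sub, new_sub)]


# Initial index: keyword 0 occurs in files {0, 1}, keyword 1 in files {0, 2},
# plus a random pad along an otherwise unused vector.
old_w0 = scaled(H4[0], 0b011)
sub_w1 = scaled(H4[1], 0b101)
pad = scaled(H4[3], 777)
index = [a + b + c for a, b, c in zip(old_w0, sub_w1, pad)]

# A new file (bit 2) that contains keyword 0 is added.
new_w0 = scaled(H4[0], 0b111)
updated = apply_update(index, old_w0, new_w0)
```

Only the sub-indexes of the keywords in the updated file are touched, which is why the cost grows with the number of keywords in that file and not with the number of stored files. Deleting a file follows the same pattern with a payload whose bit is cleared.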

Algorithm 7 Update
Require: $U(K, PK, \delta_f, f)$; $S_1(\gamma, \mathbf{c})$
...
6: Form sets $I_f \leftarrow (I_{w_1}, \ldots, I_{w_{\#f}})$
7: Form sets $I'_f \leftarrow (I'_{w_1}, \ldots, I'_{w_{\#f}})$
...
21: $\gamma' \leftarrow (I', S')$

A. CORRECTNESS
The correctness of TS-RDSE implies that, for all security parameters $k$, all keys $(K, SK, PK)$ generated by $\mathrm{KeyGen}(1^k)$, all $(\gamma, \mathbf{c})$ output by $\mathrm{Setup}(K, PK, \mathbf{f})$, and all sequences of search or update operations on $\gamma$, $\mathrm{Sort}(S_1(\mathrm{Search}(S_1(\mathrm{TokenGen}(PK, K, W), \gamma); S_2(SK)), PK); S_2(SK))$ always outputs the correct ranked results $R_s$, that is, results that not only satisfy the search request but are also sorted according to the corresponding weight scores. We divide the correctness of TS-RDSE into the following parts:

1) Correct Search
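The correctness of search implies that the search algorithm Search always outputs the correct results. Proof: To search for keywords $W = \{w_1, \ldots, w_q\}$, U computes $\tilde{w}_i$ and $\tilde{r}_{w_i}$ that satisfy $\tilde{w}_i \cdot w_i \equiv 1 \bmod n$ and $\tilde{r}_{w_i} \cdot r_{w_i} \equiv 1 \bmod n^2$. Then he generates $[\tilde{w}_i]_{\tilde{r}_{w_i}} = \mathrm{P.Enc}_{pk_P}(\tilde{w}_i, \tilde{r}_{w_i}) = g^{\tilde{w}_i} h^{\tilde{r}_{w_i}} \bmod n^2$. Let $\tilde{v}_i$ be the secure vector corresponding to $w_i$. U computes $\tau_i = [\tilde{w}_i]_{\tilde{r}_{w_i}} \tilde{v}_i$ and generates the search token $\tau = (\tau_1, \ldots, \tau_q)$. Upon receiving the search token $\tau$, $S_1$ computes the search result $R_{w_i}$ for $w_i$ as the vector product of the sub-token $\tau_i$ and the search index $I$: $R_{w_i} = I \cdot \tau_i \bmod n^2$. If $w_i$ exists in the keyword set $\mathbf{w}$, the sub-index $I_{w_i}$ is included in the search index $I$, so $I$ can be represented as $I = \sum_{j \in \mathbf{w}, j \ne i} I_{w_j} + I_{w_i} + \tilde{v}_r \cdot r$. So $S_1$ computes the search result as $R_{w_i} = \big(\sum_{j \in \mathbf{w}, j \ne i} I_{w_j} + I_{w_i} + \tilde{v}_r \cdot r\big) \cdot \tau_i \bmod n^2$. According to the property of the orthogonal vector basis, if the set of orthogonal vectors $V$ contains $\tilde{v}_i$ and $\tilde{v}_j$ with $\tilde{v}_i \ne \tilde{v}_j$, then $\tilde{v}_i \cdot \tilde{v}_j = 0 \bmod n^2$; if $\tilde{v}_i = \tilde{v}_j$, then $\tilde{v}_i \cdot \tilde{v}_j = 1 \bmod n^2$. Every term other than $I_{w_i} \cdot \tau_i$ therefore vanishes. Because $\tilde{w}_i \cdot w_i \equiv 1 \bmod n$ and $\tilde{r}_{w_i} \cdot r_{w_i} \equiv 1 \bmod n^2$, $S_1$ obtains the result $R_{w_i} = \|v_i\| \bmod n^2$, which is the GM ciphertext of the binary index string of $w_i$.
If the keyword $w_i$ does not exist in the keyword set $\mathbf{w}$, the search index $I$ contains no sub-index built on $\tilde{v}_i$, so every term of $I \cdot \tau_i$ vanishes by orthogonality and $S_1$ obtains $R_{w_i} = I \cdot \tau_i = 0 \bmod n^2$.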

2) Correct Comparison
The comparison correctness implies that the secure comparison protocol SC for the encrypted weights $e_x$ and $e_y$ from $(\langle [id_x], e_x \rangle, \langle [id_y], e_y \rangle)$ in each round of sorting is correct. Proof: Suppose $D_x$ and $D_y$ are two $l$-bit integers that represent the plaintexts of $e_x$ and $e_y$ respectively. The value $z = D_y - D_x + 2^l$ is an integer of $(l+1)$ bits. If the $(l+1)$-th bit of $z$ is 1, then $D_x \le D_y$; if the $(l+1)$-th bit of $z$ is 0, then $D_x > D_y$. This bit can be represented as $z \div 2^l$ (integer division), so $z = 2^l (z \div 2^l) + (z \bmod 2^l)$, where $0 \le (z \bmod 2^l) < 2^l$. Based on this observation, we let $S_1$ compute $[z]$ as $[2^l] \cdot e_y \cdot e_x^{-1} \bmod n^2$ $(l > k)$, which encrypts $2^l + D_y - D_x$ by the additive homomorphism of Paillier, and blind it by computing $[d] := [z][r] \bmod n^2$. Because the value of $z$ is blinded by a random $r$ to the new value $d = z + r$, we have $d = 2^l (d \div 2^l) + (d \bmod 2^l) = 2^l((z \div 2^l) + (r \div 2^l)) + ((z \bmod 2^l) + (r \bmod 2^l))$.
We let $d_l$ be the $l$-th bit of $d$, that is, $(d \div 2^l) \bmod 2$, let $r_l$ be the $l$-th bit of $r$, and let $\lambda$ be the result of comparing the low-order parts: $\lambda = 1$ if $d \bmod 2^l < r \bmod 2^l$, and $\lambda = 0$ otherwise. A carry from the low-order parts into bit $l$ occurs exactly when $\lambda = 1$, so the top bit of $z$ is $v = d_l \oplus r_l \oplus \lambda$, and we can compute $\|v\|$ as $\|d_l\| \cdot \|r_l\| \cdot \|\lambda\|$. Correctness holds as long as no carry-over modulo $N$ occurs. In particular, this implies that $l + 1 + \lambda \le \log_2 N$, where $\lambda$ in this bound denotes the statistical security parameter that fixes the range $(0, 2^{\lambda + l})$ of $r$, not the comparison bit. Operations over bits under GM encryption do not have this problem, since they take place in $F_2$. By the homomorphic property of GM encryption, $\|d_l\| \cdot \|r_l\| \cdot \|\lambda\| = \|d_l \oplus r_l \oplus \lambda\|$. Therefore, the correctness of computing the top bit of $z$ reduces to the correctness of computing $d_l \oplus r_l \oplus \lambda$.
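The bit argument above can be checked end to end with a toy Paillier instance. The sketch below assumes the standard Paillier variant with $g = n + 1$ (not the $g^w h^r$ form used for the index), uses tiny primes that offer no security, and replaces the DGK sub-protocol by a plaintext comparison. All names are hypothetical.

```python
import math
import random

# Toy Paillier key (illustration only).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)


def p_enc(m):
    """Paillier encryption with g = N + 1: (1 + m*N) * r^N mod N^2."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def p_dec(c):
    """Paillier decryption: L(c^LAM mod N^2) * MU mod N with L(u) = (u - 1) / N."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def secure_compare_sketch(e_x, e_y, l, stat=10):
    """Return 1 if Dec(e_x) <= Dec(e_y), else 0, for l-bit plaintexts."""
    # S1: [z] = [2^l] * e_y * e_x^{-1}, so z = D_y - D_x + 2^l.
    enc_z = p_enc(1 << l) * e_y % N2 * pow(e_x, -1, N2) % N2
    # S1: blind z with a random r of l + stat bits; l + 1 + stat must stay below log2(N).
    r = random.randrange(0, 1 << (l + stat))
    enc_d = enc_z * p_enc(r) % N2
    # S2: decrypt d = z + r and extract its bit l.
    d = p_dec(enc_d)
    d_bit = (d >> l) & 1
    r_bit = (r >> l) & 1
    # Plaintext stand-in for DGK: lam = 1 exactly when the low parts produced a carry.
    lam = 1 if d % (1 << l) < r % (1 << l) else 0
    return d_bit ^ r_bit ^ lam


result = secure_compare_sketch(p_enc(17), p_enc(42), l=8)
```

In the protocol itself the three bits never appear in the clear at one party; the XOR is evaluated on GM ciphertexts.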

3) Correct Sort
The sort correctness implies that sorting the key-value array $\tilde{R} = \{\langle [id_1], e_1 \rangle, \ldots, \langle [id_d], e_d \rangle\}$ according to the weight scores is correct, which means that the sort result $R_s = \{\langle [id_1], e_1 \rangle, \ldots, \langle [id_d], e_d \rangle\}$ is a re-encryption of the array $R$ that satisfies $\mathrm{P.Dec}_{sk_P}(e_1) \le \mathrm{P.Dec}_{sk_P}(e_2) \le \ldots \le \mathrm{P.Dec}_{sk_P}(e_d)$.
After the two halves of $\tilde{R}$ have been sorted separately, for all $q$ between 1 and $d$ except 1 and $d/2 + 1$, the $(q-1)$-th element of the array is no larger than the $q$-th. Call the $(q-1)$-th score the predecessor of the $q$-th. Note that 1 and $d/2 + 1$ are both odd. Every even-indexed score has as its predecessor a number no larger than itself (since any even-indexed score and its predecessor are in the same half), so the $l$-th smallest even-indexed score must be at least as large as $l$ odd-indexed scores. Similarly, every odd-indexed score (with the possible exception of two scores, namely the 1st and the $(d/2 + 1)$-th) has as its predecessor a number no larger than itself, so the $(l+1)$-th smallest odd-indexed score must be at least as large as $l - 1$ even-indexed scores.
We denote by $D_q$ the $q$-th indexed score after sorting the first and second halves and the even and odd halves, which satisfies $D_q = \mathrm{P.Dec}_{sk_P}(e_q)$. We have just argued that $D_{2l-1} \le D_{2l}$ and $D_{2l-2} \le D_{2l+1}$ for any appropriate $l$. Since we have sorted the even-indexed scores and the odd-indexed scores, we also know that $D_{2l-2} \le D_{2l}$ and $D_{2l-1} \le D_{2l+1}$. Thus, if we group the elements in pairs $(D_{2l}, D_{2l+1})$ for each appropriate $l$, both elements of a pair are greater than or equal to both elements of the previous pair. To finish the sort after sorting the odds and evens, it is therefore only necessary to compare the $(l+1)$-st smallest odd-indexed score $D_{2l+1}$ with the $l$-th smallest even-indexed score $D_{2l}$ for each appropriate $l$ to determine its rank completely; this is precisely what the final step does. So we get $D_1 \le D_2 \le \ldots \le D_d$. Therefore, the output of sorting is a correct ranked result.
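The structure of the argument can be made concrete with a small generator for Batcher's odd-even merge sorting network. This sketch assumes the number of elements is a power of two and takes the comparison as a callback `leq`, which stands in for the secure comparison SC; in the sketch it is called on plaintext values, and the function names are hypothetical.

```python
def oddeven_merge(lo, hi, r):
    """Comparators that merge the two sorted halves of positions lo..hi (inclusive)."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)        # merge the even-indexed subsequence
        yield from oddeven_merge(lo + r, hi, step)    # merge the odd-indexed subsequence
        # final step: compare each odd-indexed element with the next even-indexed one
        yield from ((i, i + r) for i in range(lo + r, hi - r, step))
    else:
        yield (lo, lo + r)


def oddeven_merge_sort_range(lo, hi):
    """Comparators that sort positions lo..hi (inclusive)."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort_range(lo, mid)
        yield from oddeven_merge_sort_range(mid + 1, hi)
        yield from oddeven_merge(lo, hi, 1)


def batcher_network(length):
    """List of (i, j) comparators with i < j for a power-of-two length."""
    if length < 1 or length & (length - 1):
        raise ValueError("length must be a power of two")
    return list(oddeven_merge_sort_range(0, length - 1))


def sort_with_network(items, leq):
    """Apply the data-independent comparator sequence; leq(a, b) means a <= b."""
    items = list(items)
    for i, j in batcher_network(len(items)):
        if not leq(items[i], items[j]):
            items[i], items[j] = items[j], items[i]
    return items


ranked = sort_with_network([7, 3, 9, 1, 8, 2, 6, 4], lambda a, b: a <= b)
```

Because the comparator positions depend only on the array length and not on the data, the sequence of comparisons reveals nothing beyond the number of elements, which is why a sorting network suits an oblivious two-server protocol.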

B. SECURITY ANALYSIS
First, we analyze the information leaked to the servers $S_1$ and $S_2$ in TS-RDSE. The formal definition is given afterwards. Our goal is to prove that any PPT adversary obtains no information about the data and queries except the information in the leakage functions. The leakage functions are described as follows:
• In the setup phase, given the encrypted secure index $\gamma = (I, S)$ and the ciphertexts $\mathbf{c}$, $S_1$ learns the size of the search index vector $I$, the number of elements $\#S[id_i]$ of $S$, the length of each set $S[id_i]$ and the size of each ciphertext $c \in \mathbf{c}$. We denote these by $L_1$.
• In the search phase, each search query reveals to $S_1$ the association between each keyword $w$ and the pseudonyms of the files containing the keyword. Specifically, let $Q = (\tau_1, \ldots, \tau_q)$ be the set of search tokens and let $l_{max}$ be the length of the longest search token in $Q$; the search history can then be expressed as a binary fourth-order matrix $Q$ of size $n_q \times n_q \times l_{max} \times l_{max}$. Moreover, during the interaction with $S_1$, $S_2$ gets $R_i$ for each query. Therefore, after $q$ queries, the search operation reveals to $S_2$ a binary second-order matrix $R_{q \times q}$. We denote these by $L_2$.
• In the sort phase, $S_1$ and $S_2$ need to interact with each other. In each round of the sorting interaction, $S_1$ learns the information obtained from running SC, namely $(e_x, e_y, l)$, while $S_2$ learns $([[z]], [\lambda])$. After multiple rounds of interaction, $S_2$ also learns the number of rounds of interaction, $(\log d)^2$, from which it can infer the number of files $d$ included in the results. We denote these by $L_3$; its $S_1$ component consists of the tuples $(e_{x,i}, e_{y,i}, l)$.
• In the update phase, from the update token $\tau_f = (\tau_f^1, \tau_f^2)$, $S_1$ learns the ciphertext size of the update file $c_f$ and the index set size $\#I_f$ corresponding to the pseudonym $id_f$ of the update file, and hence the number of keywords $\#w_f$ it contains and the length $|S[id_f]|$ of the set $S[id_f]$ in the weight index $S$. We denote these by $L_4$.
Now we use the following theorem to claim that the construction of TS-RDSE is $(L_1, L_2, L_3, L_4)$-secure against adaptive chosen-keyword attacks in the random oracle model.
Theorem 1: If the SKE scheme and the Paillier and GM cryptosystems are CPA-secure, $F$ is a pseudo-random function, and the DGK protocol is semantically secure in the random oracle model, then the construction of TS-RDSE is secure against adaptive chosen-keyword attacks in the random oracle model.
Proof: The primary goal of this proof is to construct two simulators $Sim_1$ and $Sim_2$ that generate the simulated values in $\mathrm{Ideal}(1^k)$ using only the information given by the leakage functions described above.
We show that the outputs of the simulators $Sim_1$ and $Sim_2$ are indistinguishable from the views of $A_1$ and $A_2$ in $\mathrm{Real}(1^k)$ using a sequence of hybrid games, where each game simulates one or more additional non-revealed encrypted values compared with the previous game. Consequently, the first hybrid game corresponds to $\mathrm{Real}(1^k)$ and the last one corresponds to $\mathrm{Ideal}(1^k)$.
Given the information received from $L_1$, the simulator $Sim_1$ learns the length and the structure of the encrypted index $I$, the length of each set $S[id_i]$ in the weight index $S$ and the length of each ciphertext $c \in \mathbf{c}$. It can then use randomly chosen strings to construct these structures and output them as the simulated values $(\tilde{I}, \tilde{S}, \tilde{\mathbf{c}})$. For all $f \in \mathbf{f}$, it computes $\tilde{c}_f = \mathrm{SKE.Enc}_{K_4}(0^{|f|})$. The simulated ciphertexts $\tilde{\mathbf{c}} = (\tilde{c}_{f_1}, \ldots, \tilde{c}_{f_n})$ are all generated in this way, so the CPA security of the SKE scheme prevents the adversary from distinguishing $\mathbf{c}$ from $\tilde{\mathbf{c}}$. The padding random vector $\tilde{v}_r \cdot r$ and the pseudo-randomness of the Paillier and GM ciphertexts in the indexes $(I, S)$ prevent the adversary from distinguishing $(I, S)$ from $(\tilde{I}, \tilde{S})$. Therefore, $\mathrm{Output}^{Real}_{A_1}(I, S, \mathbf{c}) \approx \mathrm{Output}^{Ideal}_{Sim_1}(\tilde{I}, \tilde{S}, \tilde{\mathbf{c}})$.
Given the information received from $L_2$, the simulator $Sim_1$ must respond with simulated search tokens during the adversary's queries. This step is more complex because the simulator needs to track the dependencies between the information revealed by these queries to keep the simulated tokens consistent. From the interaction, $Sim_1$ learns the search result $R_i$ and the binary string $v_W$ satisfying the search request, obtained by multiplying $\tau_i$ and the search index vector $I$. It can locate the pseudonym set of the corresponding file set $D = (id_1, \ldots, id_d)$ through the information in $v_W$, and obtain the corresponding encrypted weights $(e_1, \ldots, e_d)$. This process reveals the association between each keyword identifier $id(w)$ and the pseudonyms of the files containing the keyword. Specifically, let $Q = (\tau_1, \ldots, \tau_q)$ be the set of search tokens and let $l_{max}$ be the length of the longest search token in $Q$; the search history can then be expressed as a binary fourth-order matrix $Q_{n_q \times n_q \times l_{max} \times l_{max}}$. Therefore, the real and simulated outputs of the search phase are computationally indistinguishable.
Given the information received from $L_3$, for every compared pair, $A_1$'s view can be denoted as $view_{A_1} = (e_x, e_y, l, pk_{GM}, pk_P, r; \|\lambda\|, [z_l])$, in which $pk_P$ is the public key of Paillier encryption. Given $(e_x, e_y, l, pk_{GM}, pk_P)$, we can build $Sim_1$ to simulate $A_1$:
• choose $R_S \leftarrow (0, 2^{\lambda + l}) \cap Z$
• randomly choose $\tilde{\lambda}$ and $\tilde{z}_l$, and generate $\|\tilde{\lambda}\|$ and $[\tilde{z}_l]$
• output $view_{Sim_1} = (e_x, e_y, l, pk_{GM}, pk_P, R_S; \|\tilde{\lambda}\|, [\tilde{z}_l])$.
In $view_{A_1}$ and $view_{Sim_1}$, the components $(e_x, e_y, l, pk_{GM}, pk_P)$ are identical, $r$ is drawn from the uniform distribution on $(0, 2^{\lambda + l}) \cap Z$, and $[z_l]$ is a second-layer Paillier ciphertext, which is randomized. Since $r$ and $R_S$ are drawn from the same uniform distribution and Paillier encryption is CPA-secure, $A_1$'s view and $Sim_1$'s view are computationally indistinguishable.
In a similar way, for every pair, $A_2$'s view can be denoted as $view_{A_2} = (sk_{GM}, sk_P, [[z]], \|\lambda\|)$. We can build $Sim_2$ to simulate $A_2$:
• randomly choose $\tilde{\lambda}$ and generate $\|\tilde{\lambda}\|$ to represent that $\mathrm{P.Dec}_{sk_P}(e_x) \le \mathrm{P.Dec}_{sk_P}(e_y)$.
$Sim_1$ and $Sim_2$ then simulate the $(\log d)^2$ rounds of Batcher's protocol. At the end of the sort protocol, the simulated views therefore remain indistinguishable from the real ones.
Given the information received from $L_4$, the simulator $Sim_1$ learns the vector set size $\#I_f$ corresponding to the pseudonym $id_f$ of the update file, and hence the number of keywords $\#w_f$ it contains and the length of the set $S[id_f]$ in the weight index $S$. With this information, $Sim_1$ can use randomly chosen strings to construct a simulated update token. The real values are constructed from Paillier and GM ciphertexts and random vectors, so the padding random vector $\tilde{v}_r \cdot r$ and the pseudo-randomness of the Paillier and GM ciphertexts give $\mathrm{Output}^{Real}_{A_2} \approx \mathrm{Output}^{Ideal}_{Sim_2}$. Therefore, the security definition in Definition 2 is met, that is, the TS-RDSE scheme is $(L_1, L_2, L_3, L_4)$-secure against adaptive chosen-keyword attacks in the random oracle model for all polynomial-time adversaries.

A. THEORETICAL ANALYSIS
In this section, we measure the performance of TS-RDSE by evaluating the complexity in terms of computation and communication overhead.
First, we analyze the properties of TS-RDSE and compare it with related searchable encryption schemes that support search result ranking. As Table 1 shows, [11] achieves ranking based on coordinate matching (CM). The rank method in [14] is based on IPS to cluster similar files. [22] designs an LSH-based scheme. However, these rank methods only count the keywords matched in each file and do not take keyword importance into account. Moreover, the search results in these schemes and in [15], [18] have a certain rate of false positives, so they are suitable for similarity search but do not provide an exact sort. TS-RDSE uses the tf-idf formula to calculate the relevance weight of each keyword-file pair, and compares the encrypted weight scores with a provably secure comparison protocol. It achieves precise ranking of search results and supports flexible dynamic updates of the inverted index.
In terms of security, [23] uses a heuristic method to hide search patterns and access patterns, so its security analysis is outside the scope of this comparison. [14] builds the search index on fully homomorphic encryption (LWE-Brakerski). [13] and [14] achieve rank privacy by assuming that the server only performs the search and that ranking is performed on the user side. These methods reach high security but lead to high computation or communication cost. [18] and [23] also use a non-colluding two-server setting to rank the search results; however, during the interaction between the servers, one of them can access the plaintext of each search result, which leaks search and rank privacy. TS-RDSE encrypts the weight scores with a homomorphic encryption algorithm, and the servers can calculate and compare the encrypted weights without decryption, so they sort the search results without obtaining additional information. In addition, the collaborating server $S_2$ in TS-RDSE only plays an assisting role in the searching and sorting processes, so $S_2$ learns nothing beyond what it already knows.
We also evaluate the efficiency of TS-RDSE in Table 2. Let $N$ be the number of files, $M$ the number of keywords, $q$ the number of search keywords, $q_U$ the number of keywords in an update file, and $d$ the number of files in the search results. For storage complexity, since U only needs to store his secret key regardless of the number and sizes of the files, his storage overhead is constant. The cloud server stores the set of file ciphertexts $\mathbf{c}$ and the encrypted index $\gamma$. The storage overhead of $\mathbf{c}$ depends on the size of the file set and the encryption algorithm SKE.Enc; the storage overhead of $\gamma$ is mainly determined by the number of files $N$ and the number of keywords $M$.
The token generation complexities in [11], [15] and [17] are $O(qN)$, $O(N \log N)$ and $O(qN)$ respectively, which all depend on the total number of files and are not suitable for very large datasets. In [11], the computation cost of generating a search token is $O(M^2)$. In TS-RDSE, the user needs only $O(q)$ computation to generate the search token, which is independent of the number of files and keywords in the cloud. Therefore, generating a search request costs much less than in the other schemes.
In response to the search request, the total search computation cost in [17] is $O(N \log N)$, and the server in [11] needs $O(qN + M^2)$ multiplications to compute the vector products for a search. In TS-RDSE, the two servers carry out two-party protocols with complexity $O(qd + (\log d)^2)$ to generate a ranked search result with $d$ elements, where $O(qd)$ is for searching and $O((\log d)^2)$ for ranking. So the total computation complexity of TS-RDSE for search and sort is slightly higher than that of the schemes in [14], [17] in the case of a small dataset and a large search result. To get sorted results, the schemes in [13], [14] need the user to run the rank algorithm after decrypting all search results, with complexity $O(M^2)$ and $O(qd)$ respectively. [15] deploys an interactive protocol between user and server to rank the results, with computation cost $O(qd)$ and $O(d)$.
For update, the worst-case update computation cost in TS-RDSE is linear in the number of update keywords, $O(q_U)$, since an update needs only $O(q_U)$ orthogonal vector additions and subtractions on the secure index and is not affected by the number of files. In [17], the update computation cost is $O(M^2 \log N)$, which depends on both the number of files and the number of keywords. Compared with [17] and others, TS-RDSE achieves lightweight updates and can be flexibly extended to large datasets.
Therefore, Tables 1 and 2 show that TS-RDSE has both a very light user workload and a moderate server workload while being secure in the honest-but-curious model. In today's distributed cloud storage environments, the primary concern is how to reduce user-side storage and computation cost while ensuring flexible index updates and secure server-side ranking in a dynamic setting, so TS-RDSE is well suited to actual cloud storage deployment requirements.

B. EXPERIMENT ANALYSIS
We implement TS-RDSE in C++ with a 256-bit system security parameter. The experiments were run on Ubuntu Linux installed on a rotational disk. We implemented a job allocation mechanism on the server that acts as the master server, and used threads to simulate the collaborating server that does the assisting job. As a result, our experimental environment simulates a real dual-model cloud service: to perform a search, the clients communicate only with the master server, and the master server collaborates with another server to do the search job. For comparison, we use the scheme in [11], which also multiplies a vector with a matrix-based inverted index to perform keyword search. Because [11] does not support two-server ranked search and ranks by coordinate matching, its ranking phase is not comparable with ours.
In the first experiment, we tested the factors that affect index storage overhead by choosing four keyword dictionaries of 2000, 4000, 6000 and 8000 words. The results are shown in Figure 2. The abscissa is the size of the keyword dictionary, and the ordinate is the size of the storage space. The index storage size of TS-RDSE does not change significantly as the number of keywords increases, because TS-RDSE builds the index by orthogonal vector addition, which combines the sub-indexes of multiple keywords into one single vector.
In the second experiment, we focused on the factors that affect the performance of the TokenGen algorithm. We divided the experiment into two parts.
We first tested how the number of search keywords affects the running time of the TokenGen algorithm by fixing the keyword dictionary at 2000 words and setting the number of search keywords to 5, 10, 15 and 20. The results are shown in Figure 3. The abscissa is the number of search keywords, and the ordinate is the TokenGen time. We also tested how the size of the keyword dictionary affects the TokenGen time by using 5 search keywords by default and keyword dictionaries of 2000, 4000, 6000 and 8000 words. The horizontal coordinate is the size of the keyword dictionary, and the vertical coordinate is the token generation time. The TokenGen running time is linear in the number of search keywords, but does not change significantly as the keyword dictionary grows. This is because the length of the secret vector in the search token corresponding to a keyword is constant, so it is not significantly affected by the size of the keyword dictionary.
It is worth mentioning that the algorithms can be further optimized with parallel implementation techniques, such as multi-threaded execution of sorting, which can greatly reduce the sort time at the server side. However, to reflect the original efficiency of the algorithm honestly, we did not use any optimization techniques and simply let the server do all the computation in a single thread.

VIII. CONCLUSION AND FUTURE WORK
In this paper, we proposed TS-RDSE, a two-server ranked dynamic searchable encryption scheme. TS-RDSE integrates homomorphic encryption and orthogonal vectors to build a vector-level secure index, so that the server performs a simple vector multiplication between the request and the index to judge whether the search keywords appear. The special construction of the index also supports efficient dynamic update operations such as deletion and insertion of documents. To rank the results without decryption, we combine the widely used tf-idf formula and partially homomorphic encryption cryptosystems to build an MPC-based secure sorting protocol, which achieves oblivious sorting while protecting the privacy of the relevance scores. We give a comprehensive analysis of the correctness of TS-RDSE with respect to searching and sorting, with formal mathematical proofs. The security analysis shows that TS-RDSE is $(L_1, L_2, L_3, L_4)$-secure against adaptive dynamic chosen-keyword attacks in the random oracle model. Performance analysis and extensive experiments show that TS-RDSE is superior to the existing approaches in terms of functionality and is suitable for real-world cloud storage environments.
As for future work, we consider four main directions as follows:
• In TS-RDSE, adding or deleting keywords alters the weight scores of the entire dictionary. Our future work will therefore focus on designing a new update methodology that starts from the existing dictionary and incurs minimum overhead for index reconstruction.
• In terms of security, a vital research direction is how to construct dynamic ranked SE schemes with forward and backward privacy.
• Moreover, improving ranking performance on the server side is also an important direction for future work, especially for search requests over large-scale cloud data.