An Efficient Ciphertext Retrieval Scheme Based on Homomorphic Encryption for Multiple Data Owners in Hybrid Cloud

In recent years, more and more individuals and enterprises have outsourced data and applications to cloud servers. Since public cloud servers are not completely trusted, users usually encrypt important data before sending it to them. As a result, ciphertext retrieval technology has gradually become a research hotspot. Existing schemes suffer from several defects: some do not support the "multiple owners" mode or multi-keyword retrieval, some have low retrieval efficiency, accuracy, or security, and some make data updating difficult. Hence, we propose an efficient Ciphertext Retrieval scheme based on Homomorphic encryption for Multiple data owners in hybrid cloud (CRHM), in which the public cloud server and the private cloud server cooperate to perform ciphertext retrieval. In CRHM, an encrypted balanced binary index tree structure and a homomorphic encryption method based on large integer operations are designed to support the "multiple owners" mode and multi-keyword ranked retrieval. The security analysis shows that CRHM effectively guarantees the privacy and security of user files and retrieval, and the performance evaluation demonstrates that, compared with existing related schemes, CRHM achieves high efficiency in index generation and retrieval while keeping relatively high retrieval accuracy.


I. INTRODUCTION
With the rapid development of cloud computing [1], more and more individuals and enterprises outsource data and applications to cloud servers to obtain the many benefits of cloud services. In recent years, the hybrid cloud, which integrates both the public cloud and the private cloud, has gradually become one of the main modes of cloud computing and a further direction of its development [2]. Typically, public clouds are responsible for storing large-scale data, while private clouds are responsible for handling sensitive data. The main application scenarios of hybrid cloud technology are on-cloud disaster recovery, cross-cloud backup, and proprietary distributed storage services. However, public cloud service providers are usually not completely trusted: they may snoop on and analyze important private information stored by users. Data therefore needs to be encrypted before it is outsourced to the public cloud server, but encryption in turn brings new challenges, such as making data access and operations difficult.
Song et al. [4] first proposed a ciphertext retrieval scheme based on symmetric encryption, which marked the beginning of ciphertext retrieval research. Boneh et al. [7] proposed the first asymmetric searchable encryption scheme; it supports only linear matching of a single keyword and discloses users' private information during the retrieval process. Zhou et al. [10] proposed a partially homomorphic encryption scheme over integer vectors, which improves the efficiency and security of retrieval and supports relevance ranking for single-keyword retrieval. Cao et al. [11] proposed the first multi-keyword ciphertext retrieval scheme; to rank the retrieval results, it adopts the kNN algorithm and calculates the inner-product similarity between the index vector and the query vector. Li et al. [12] improved the secure kNN algorithm and built a B+ index tree to achieve efficient queries. Fu et al. [13] proposed a personalized multi-keyword ciphertext retrieval scheme, which builds a user interest model for each user by analyzing their search history with the help of the semantic ontology WordNet, and uses this model to rank files.
VOLUME XX, 2017 1
All of the above schemes follow the "single owner" mode, in which users retrieve data only from the outsourced dataset of a single data owner. In reality, different data owners have different fields of expertise, and the quality of the data they outsource varies greatly, so users usually expect to retrieve high-quality data from the datasets of multiple data owners at the same time, which is known as the "multiple owners" mode [15]. Because they search across all owners' datasets and can take file quality into account, "multiple owners" schemes can return more accurate results than "single owner" schemes. Subsequently, Peng et al. [16] proposed multi-keyword ranked retrieval for multiple data owners over encrypted cloud data, in which tree-based indexes generated by different data owners are merged in the cloud server to improve retrieval efficiency. Yin et al. [17] proposed secure joint multi-keyword ranked retrieval for multiple data owners over encrypted cloud data, in which an elliptic curve algorithm is used to calculate the similarity score between the index vector and the query vector to achieve top-k retrieval. In both schemes, each data owner must encrypt its own files and indexes, so index creation is relatively inefficient. Guo et al. [18] proposed secure multi-keyword ranked retrieval for multiple owners on a cloud server, in which a trusted agent is introduced to manage key distribution and a grouped balanced binary tree is built to achieve efficient queries. However, although the asymmetric scalar-product-preserving encryption used in this scheme makes searchable encryption easy to implement, its security risk is relatively high. Moreover, all security indexes must be regenerated every time the indexes are updated, which is inefficient. To sum up, there is at present no multi-keyword ciphertext retrieval scheme for the "multiple owners" mode in the cloud environment that offers high efficiency, high accuracy, and high security at the same time.
It is therefore of great significance to design such a scheme. Based on the above research, in this paper we propose an efficient Ciphertext Retrieval scheme based on Homomorphic encryption for Multiple data owners in hybrid cloud (CRHM). In CRHM, the public cloud server and the private cloud server cooperate to perform the ciphertext retrieval process. CRHM supports the "multiple owners" mode and multi-keyword ranked retrieval. In addition, it does not disclose any private information related to data files, keywords, or indexes to the public cloud server or to users. Compared with existing related schemes, CRHM significantly improves retrieval efficiency while keeping relatively high retrieval accuracy. Our main contributions are as follows:
• We construct CRHM based on the characteristics of the hybrid cloud. The public cloud server provides sufficient computing and storage resources to store ciphertext files and security indexes, and it performs the ciphertext retrieval operations. The private cloud server generates security indexes and trapdoors for users to ensure the security and privacy of retrieval, relieving users of these heavy tasks.
• We design an Encrypted Balanced Binary Index (EBBI) tree to generate and store security indexes. CRHM collects the popularity information of files belonging to multiple data owners to obtain weighted indexes, which are generated as in scheme [17], so that it can provide desirable ranked results that are not only relevant to the query data but also of high quality across multiple owners. The difference is that we encrypt the weighted indexes to generate security indexes and then, after grouping the files into partitions, construct an EBBI tree for each partition. The EBBI tree structure greatly improves the efficiency of index generation and retrieval, and it also makes indexes easier to update.
• We design a novel ciphertext retrieval method based on homomorphic encryption. The method achieves secure and efficient multi-keyword ranked ciphertext retrieval in the hybrid cloud by exploiting the additive homomorphism of large-integer modular operations.

A. DYNAMIC WEIGHT MODEL
The traditional TF-IDF model [19] is widely used in ciphertext retrieval systems of the "single owner" mode. However, it is not suitable for ciphertext retrieval systems of the "multiple owners" mode. Guo et al. [18] designed a dynamic weight model to obtain weighted indexes of files belonging to multiple data owners. In general, if a data owner's files on a subject are more popular and professional, they should be retrieved preferentially when data users search with keywords about this subject. The evaluation of files should therefore consider not only their correlation with the query keywords but also the quality of the files of the different data owners.
To calculate keyword weights accurately, the Jaccard similarity coefficient [20] is first used to evaluate the degree of correlation between different keywords. The correlation coefficient S_{x,y} between each pair of keywords is defined by formula (1). In formula (1), d_x and d_y represent the set of files containing keyword x and the set of files containing keyword y, respectively; L(d_x ∩ d_y) represents the number of files containing both x and y; L(d_x ∪ d_y) represents the number of files containing x or y (or both); and β (β ∈ [0, 1)) is a variable parameter used to adjust the keyword correlation results. The more files contain both x and y, the more relevant the two keywords are. Next, a correlation matrix S can
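The Jaccard part of formula (1), and the correlation matrix built from it, can be sketched in Python as follows. This is an illustrative assumption rather than the authors' code: formula (1) itself is not reproduced above, so the sketch computes only the plain coefficient L(d_x ∩ d_y) / L(d_x ∪ d_y) and omits the β adjustment. The helper names `jaccard` and `correlation_matrix` and the toy file set are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(dx: Set[str], dy: Set[str]) -> float:
    """Plain Jaccard coefficient L(dx ∩ dy) / L(dx ∪ dy); formula (1)'s β term is omitted."""
    union = dx | dy
    return len(dx & dy) / len(union) if union else 0.0


def correlation_matrix(files: Dict[str, Set[str]], dictionary: List[str]) -> List[List[float]]:
    """Symmetric d x d matrix of pairwise keyword correlations."""
    # d_x: identifiers of the files that contain keyword x
    docs = {kw: {fid for fid, kws in files.items() if kw in kws} for kw in dictionary}
    d = len(dictionary)
    S = [[0.0] * d for _ in range(d)]
    for a in range(d):
        S[a][a] = jaccard(docs[dictionary[a]], docs[dictionary[a]])
    for a, b in combinations(range(d), 2):
        S[a][b] = S[b][a] = jaccard(docs[dictionary[a]], docs[dictionary[b]])
    return S


files = {"f1": {"cloud", "index"}, "f2": {"cloud", "search"}, "f3": {"cloud", "index", "search"}}
S = correlation_matrix(files, ["cloud", "index", "search"])
# S[0][1] = 2/3 (cloud, index); S[1][2] = 1/3 (index, search)
```

Only the upper triangle is computed and mirrored, since the coefficient is symmetric in x and y.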

A. SCHEME OVERVIEW
There are four entities in CRHM:
1) Public Cloud Server (PuCS). PuCS is operated by large-scale enterprises and provides users with large storage space and data processing capacity.
2) Private Cloud Server (PrCS). PrCS is built by small and medium-sized enterprises and is used to handle important security operations for users.
6) DU contacts DO to get the symmetric key and decrypts the ciphertext files to obtain the plaintext files.

B. THREAT MODEL
In CRHM, we assume that PuCS is semi-trusted: it executes programs honestly without tampering with the instructions from DO and DU, but it also tries to learn as much as possible about the data content of user files and the retrieval process. We assume that PrCS is secure and reliable, and it performs the data operations with high security requirements for DO and DU. As described in [25], CRHM considers two threat models: the known ciphertext model and the known background model.

C. ALGORITHM DESIGN
The algorithm of CRHM consists of six functions. For a file partition P_i that contains a file set F_i, Function 1 generates the plaintext index set I_i of F_i.

Function 1. Plaintext Index Generation
which is the number of keywords in Dp_i, and each bit of I_{i,j} depends on whether the corresponding keyword in Dp_i appears in f_{i,j}: if the keyword appears, the bit is 1, otherwise it is 0;
3. Return I_i.
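A minimal Python sketch of the plaintext index step of Function 1 follows. The first steps of Function 1 are missing above, so building the partition dictionary Dp_i as the sorted union of the files' keywords is an assumption, and `partition_dictionary` and `plaintext_indexes` are hypothetical names.

```python
from typing import Dict, List, Set


def partition_dictionary(files: Dict[str, Set[str]]) -> List[str]:
    """Assumed Dp_i: the sorted union of all keywords of the files in partition P_i."""
    return sorted(set().union(*files.values()))


def plaintext_indexes(files: Dict[str, Set[str]], dictionary: List[str]) -> Dict[str, List[int]]:
    """I_{i,j}: a d-bit vector whose t-th bit is 1 iff the t-th keyword of Dp_i occurs in f_{i,j}."""
    return {fid: [1 if kw in kws else 0 for kw in dictionary] for fid, kws in files.items()}


files = {"f1": {"cloud", "index"}, "f2": {"search"}, "f3": {"cloud", "search"}}
dp = partition_dictionary(files)        # ['cloud', 'index', 'search']
index_set = plaintext_indexes(files, dp)
```

Because every file in a partition shares the same Dp_i, all index vectors have the same length d and can be compared by inner product later.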
Using the dynamic weight model, Function 2 generates the weighted index set Ĩ_i of F_i from I_i, the partition dictionary Dp_i, and the popularity information of the files in F_i.

Function 2. Weighted Index Generation
Input: The plaintext index set I_i = (I_{i,1}, I_{i,2}, ..., I_{i,r})^T of file set F_i, the partition dictionary Dp_i, and the popularity information of the files in F_i
Output: The weighted index set Ĩ_i of F_i
1. Calculate the correlation coefficient between every two keywords in Dp_i using formula (1);
2. Generate a correlation matrix S_i containing the correlation coefficients of every two keywords in Dp_i. S_i is a symmetric square matrix whose number of rows (and columns) equals the number of keywords in Dp_i, and each element of S_i is the correlation coefficient between the corresponding keywords;
3. For each data owner O_l:
4. Generate a popularity vector PV_l = (PV_{l,1}, PV_{l,2}, ..., PV_{l,t}), in which t is the number of files of O_l in F_i and each element is the popularity information (such as download count and click rate) of the corresponding file of O_l;
5. Calculate an average popularity vector AP_l = (PV_l × I_l) ∘ α_l, in which I_l = (I_{l,1}, I_{l,2}, ..., I_{l,t})^T and each element is the plaintext index vector of a file of O_l in F_i; α_l = (α_{l,1}, α_{l,2}, ..., α_{l,d}), where each element is the reciprocal of the number of files that contain the corresponding keyword in Dp_i; and the operator "∘" denotes the element-wise product of two vectors;
6. Calculate an original weight vector W_l

Function 3 generates a balanced binary index tree τ_i of F_i based on Ĩ_i.
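Step 5 of Function 2 (the average popularity vector AP_l) can be sketched in Python as below. Two points are assumptions: the counts in α_l are taken over O_l's files in F_i (the text does not say which file set), and a keyword that none of these files contains gets α = 0 to avoid division by zero. The function name `average_popularity` and the sample numbers are hypothetical.

```python
from typing import List


def average_popularity(pv: List[float], index: List[List[int]]) -> List[float]:
    """AP_l = (PV_l x I_l) ∘ alpha_l for one data owner O_l."""
    if not index:
        return []
    d = len(index[0])
    # PV_l x I_l: total popularity of O_l's files that contain each keyword
    totals = [sum(p * row[k] for p, row in zip(pv, index)) for k in range(d)]
    # alpha_l: reciprocal of the number of O_l's files containing each keyword (assumed 0 if none)
    counts = [sum(row[k] for row in index) for k in range(d)]
    alpha = [1 / c if c else 0.0 for c in counts]
    # "∘": element-wise (Hadamard) product
    return [t * a for t, a in zip(totals, alpha)]


pv = [100, 50, 30]                       # e.g. download counts of O_l's three files
idx = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]  # their plaintext index vectors (d = 3)
ap = average_popularity(pv, idx)         # [75.0, 40.0, 65.0]
```

Each entry is simply the mean popularity of the owner's files that contain that keyword, which is how the vector captures owner quality per keyword.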

Function 3. Balanced Binary Index Tree Generation
Input: The weighted index set Ĩ_i of file set F_i
Output: The balanced binary index tree τ_i of F_i
1. For each weighted index Ĩ_{i,j} in Ĩ_i:
2. Generate a leaf node u;
3. u.id ← a unique identifier of u;
4. u.fid ← the identifier of the corresponding file f_{i,j};
5. u.val ← the value of Ĩ_{i,j};
6. Add u to a queue que;
7. End for
8. For each leaf node u in que:
9. Select a leaf node u' such that the inner product of u.val and u'.val is the maximum among all other nodes in que;
10. Generate a parent node v for u and u';

In Function 4, we design a formula with additive homomorphism to generate the security index: C_t = B_t * K_1 + K_1 * K_2 * q_t. The same formula is used in Function 5 to calculate the trapdoor. Because of its additive homomorphism, Function 6 can rank the files by calculating their correlation scores. For each retrieval, Function 5 generates a trapdoor T_i for P_i according to the query data and Dp_i.

Function 5. Trapdoor Generation
Input: The query data, the partition dictionary Dp_i
Output: The trapdoor T_i for partition P_i
1. Generate a query vector Q_i based on the query data and Dp_i. The length of Q_i equals d, and each bit of Q_i depends on whether the corresponding keyword in Dp_i appears in the query data: if the keyword appears, the bit is 1, otherwise it is 0;
Select a random large prime number q_t', q_t' << K_1;
5. Encrypt b_t' as C_t' = B_t' * K_1 + K_1 * K_2 * q_t', in which A, K_1, K_2 are the same as those in Function 4;
6. End for
7. Obtain the trapdoor T_i = (C_1', C_2', ..., C_d');

According to τ_i* and T_i, Function 6 performs file retrieval in P_i and selects the top k ciphertext files with the highest scores.

Function 6. Retrieval

Input:
The trapdoor T_i for partition P_i, the EBBI tree τ_i* of file set F_i, and the shared key K_2
Output: The top k ciphertext files in P_i with the highest scores
1. Traverse τ_i* with the depth-first search strategy, beginning from the root node of τ_i*:
2. When traversing to a leaf node u:
3. Calculate the inner product R_u of T_i and the security index Ĩ_u* in u as: ...
4. Calculate the correlation score Sc_u as: ..., where Sc_u' is the actual correlation score between Q_i and the index Ĩ_u;
5. Add Sc_u to a queue que of length k;
6. If there are already k scores in que, stop the traversal and define Sc_m as the minimum score in que;
7. Else continue the traversal;
8. Continue to traverse τ_i* with the depth-first search strategy, beginning from the node after the kth leaf node, until the traversal is complete:
9. When traversing to a non-leaf node u:
10. Calculate the correlation score Sc_u as in steps 3 and 4;
11. If Sc_u ≤ Sc_m, return to the previous node and continue the traversal;
12. Else continue to traverse the child nodes of u;
13. When traversing to a leaf node u:
14. Calculate the correlation score Sc_u as in steps 3 and 4;
15. If Sc_u ≤ Sc_m, continue to traverse the next node;
16. Else replace the minimum score in que with Sc_u, redefine Sc_m as the minimum score in que, and continue the traversal;
17. When the traversal is complete, return the k scores in que and the ciphertext files corresponding to the leaf nodes with these scores.
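The control flow of Functions 3 and 6 can be illustrated with a plaintext Python sketch; it is not the paper's implementation. Encryption is left out: since every encrypted score is K_1^2 times the actual score, ranking on plaintext scores gives the same order. Steps 10 onward of Function 3 are not shown above, so the parent value is assumed to be the element-wise maximum of its children. With non-negative values and queries, this makes a parent's score an upper bound on every score in its subtree, which is what makes the pruning in steps 11 and 15 safe. A min-heap plays the role of the length-k queue que, and `Node`, `build_tree`, and `search` are hypothetical names.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    val: List[float]                  # weighted index (leaf) or assumed bound vector (inner node)
    fid: Optional[str] = None         # file identifier, set on leaf nodes only
    children: List["Node"] = field(default_factory=list)


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_tree(indexes: Dict[str, List[float]]) -> Node:
    """Pair nodes level by level, each with the partner of maximum inner product (non-empty input)."""
    level = [Node(val=list(v), fid=f) for f, v in indexes.items()]
    while len(level) > 1:
        pending, parents = list(level), []
        while len(pending) > 1:
            u = pending.pop(0)
            j = max(range(len(pending)), key=lambda i: dot(u.val, pending[i].val))
            w = pending.pop(j)
            # Assumption: parent value = element-wise max of the two children
            parents.append(Node(val=[max(a, b) for a, b in zip(u.val, w.val)], children=[u, w]))
        parents.extend(pending)       # an odd node is carried up to the next level unchanged
        level = parents
    return level[0]


def search(root: Node, query: List[int], k: int) -> List[Tuple[float, str]]:
    """Depth-first top-k search with pruning; returns (score, fid) pairs, best first."""
    if k <= 0:
        return []
    que: List[Tuple[float, str]] = []   # min-heap: que[0][0] is Sc_m once k scores are held

    def visit(node: Node) -> None:
        score = dot(query, node.val)
        if len(que) == k and score <= que[0][0]:
            return                      # steps 11/15: nothing here can beat Sc_m
        if node.fid is not None:
            if len(que) < k:
                heapq.heappush(que, (score, node.fid))
            else:
                heapq.heapreplace(que, (score, node.fid))  # step 16: replace the minimum
            return
        for child in node.children:
            visit(child)

    visit(root)
    return sorted(que, reverse=True)


indexes = {"f1": [0.2, 0.0, 0.5], "f2": [0.9, 0.1, 0.0], "f3": [0.1, 0.8, 0.3],
           "f4": [0.4, 0.0, 0.6], "f5": [0.0, 0.3, 0.2]}
top = search(build_tree(indexes), [1, 0, 1], k=2)
print([fid for _, fid in top])  # ['f4', 'f2']
```

In this run, f3 and f5 are never scored into que: their scores (0.4 and 0.2) fall below Sc_m = 0.9 once f2 replaces f1, so they are pruned at step 15.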

D. EXECUTION STEPS
CRHM consists of five steps: data preprocessing, weighted index generation, security index generation, ciphertext retrieval, and dynamic update, which are described below.
1) Data preprocessing. Each DO sends its plaintext file set to PrCS, then encrypts each plaintext file separately with a symmetric encryption algorithm (AES or DES) and sends the ciphertext file set to PuCS.
2) Weighted index generation. PrCS receives the file sets sent by the DOs and randomly divides all files into m partitions (m = log₂ n, where n is the total number of files). File partitioning improves the efficiency of index generation and retrieval and makes data updates more convenient. However, the larger the number of partitions, the shorter each partition dictionary becomes, which lowers retrieval precision. Dividing the files into log₂ n partitions is intended to improve retrieval efficiency significantly while maintaining high retrieval precision. PrCS then uses Function 1 to generate the partition dictionary and the plaintext index set of each partition.
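The random partitioning in step 2 can be sketched as follows. The text writes m = log₂ n without saying how a non-integer value is rounded, so rounding up is an assumption, as is dealing the shuffled files round-robin to keep partition sizes within one of each other; `partition_files` is a hypothetical helper.

```python
import math
import random
from typing import List, Optional


def partition_files(file_ids: List[str], seed: Optional[int] = None) -> List[List[str]]:
    """Randomly split the files into m = ceil(log2 n) partitions (rounding up is an assumption)."""
    n = len(file_ids)
    if n == 0:
        return []
    m = max(1, math.ceil(math.log2(n)))
    shuffled = list(file_ids)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled files round-robin so partition sizes differ by at most one
    return [shuffled[p::m] for p in range(m)]


parts = partition_files([f"f{i}" for i in range(10000)], seed=42)
print(len(parts))  # 14
```

With the 10000-file dataset of the experiments, this gives 14 partitions of 714 or 715 files each.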
PrCS collects the popularity information of files and calls Function 2 to generate the weighted index set of each partition based on the plaintext index sets, partition dictionaries, and popularity information. In this process, the weights of each DO for the keywords are calculated, and the weighted indexes of the DO's files are generated from them. The weighted indexes thus reflect file quality, so retrieval over them returns high-quality files.
3) Security index generation. To improve retrieval efficiency, PrCS calls Function 3 to generate a balanced binary index tree for each partition, whose leaf nodes store the identifier and the weighted index of each file. An example of a balanced binary index tree is shown in Figure 2. Assume that partition P_i contains 7 files belonging to 3 DOs: files f_{i,1}, f_{i,2}, f_{i,3} belong to O_1, files f_{i,4}, f_{i,5} belong to O_2, and files f_{i,6}, f_{i,7} belong to O_3. Assume also that the partition dictionary Dp_i contains 5 keywords. PrCS generates 7 leaf nodes for these files and constructs a balanced binary index tree τ_i according to Function 3. PrCS then encrypts τ_i using Function 4: all indexes in the nodes of τ_i are encrypted as security indexes while the structure of τ_i is kept unchanged, yielding an EBBI tree τ_i* that is isomorphic to τ_i. In this way, PrCS generates m EBBI trees for the m partitions and sends them to PuCS for storage.

4) Ciphertext retrieval. When a DU needs to retrieve files, it sends the query data to PrCS. PrCS calls Function 5 to generate a trapdoor T_i for each partition P_i, and the m trapdoors are combined into an integrated trapdoor T_q. PrCS sends T_q to PuCS, which performs retrieval in all m EBBI trees in parallel according to Function 6. When retrieval is finished, PuCS has obtained m*k scores in total; it selects the top k scores among them and returns the corresponding ciphertext files to the DU. Finally, the DU contacts the DOs to get the symmetric encryption keys and decrypts the ciphertext files. In Function 6, PuCS calculates the correlation score Sc_u between T_i and the security index Ĩ_u* in τ_i* using the shared key K_2. Here Sc_u is a similarity score proportional to the actual score Sc_u', and the proportionality coefficient is the square of the private key K_1.
As a result, PuCS does not need to decrypt T_i and Ĩ_u* to calculate Sc_u'; it only needs to obtain Sc_u, sort the scores, and return the ciphertext files corresponding to the leaf nodes with the top k scores.
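Why Sc_u is K_1^2 times the actual score can be checked with a short Python sketch of the encryption formula of Functions 4 and 5. Each product C_t · C_t' equals K_1^2 (B_t + K_2 q_t)(B_t' + K_2 q_t'), and every term except K_1^2 B_t B_t' is a multiple of K_2, so reducing R_u modulo K_2 leaves K_1^2 Σ_t B_t B_t' as long as that value is below K_2. The sketch takes Sc_u = R_u mod K_2 as one consistent reading of step 4 of Function 6, whose formula is not reproduced above. How B_t and B_t' are derived from the weights and the parameter A is not shown in this section, so the integer vectors below are hypothetical. Key sizes follow the experimental settings (64-bit K_1, 512-bit K_2, 8-bit primes q_t), and the sketch makes no security claim.

```python
import random
from typing import List


def primes_between(lo: int, hi: int) -> List[int]:
    """Primes p with lo <= p < hi, by trial division (fine for 8-bit values)."""
    return [p for p in range(max(lo, 2), hi)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]


PRIMES_8BIT = primes_between(128, 256)


def encrypt(encoded: List[int], k1: int, k2: int, rng: random.Random) -> List[int]:
    """C_t = B_t*K1 + K1*K2*q_t, with a fresh random 8-bit prime q_t for every element."""
    return [b * k1 + k1 * k2 * rng.choice(PRIMES_8BIT) for b in encoded]


def correlation_score(trapdoor: List[int], sec_index: List[int], k2: int) -> int:
    """PuCS side: R_u is the inner product; Sc_u = R_u mod K2 (assumed form of step 4)."""
    r_u = sum(c_q * c_i for c_q, c_i in zip(trapdoor, sec_index))
    return r_u % k2


rng = random.Random(2024)
K1 = rng.getrandbits(64) | (1 << 63)    # private key held by PrCS (exactly 64 bits)
K2 = rng.getrandbits(512) | (1 << 511)  # key shared with PuCS (exactly 512 bits)
index_b = [23, 0, 7, 15, 2]             # hypothetical integer-encoded weighted index
query_b = [10, 0, 0, 0, 10]             # hypothetical integer-encoded query vector
sc = correlation_score(encrypt(query_b, K1, K2, rng), encrypt(index_b, K1, K2, rng), K2)
plain = sum(a * b for a, b in zip(index_b, query_b))   # actual score Sc_u' = 250
assert sc == K1 ** 2 * plain
```

Assuming the same K_1 is used for all partitions, PuCS can compare and merge the m*k scores directly, since multiplying every score by K_1^2 preserves their order, while K_1 itself never leaves PrCS.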
As an example, for the tree in Figure 2, assume the query vector Q_i is {1, 0, 0, 0, 1}, the parameter A in Function 4 and Function 5 is 10, and the parameter k in Function 6 is 3. The retrieval starts from the root node and reaches the first leaf node u_1 through v_4 and v_1; the correlation score of f_{i,1} is 460K_1^2. Next, the retrieval reaches leaf nodes u_2 and then u_3 through v_2; the correlation scores of f_{i,2} and f_{i,3} are 490K_1^2 and 620K_1^2, respectively. At this point, the score queue que = {460K_1^2, 490K_1^2, 620K_1^2}, and the length of que is limited to 3. After that, the nodes u_4, v_3, and u_7 are reached in order, and que changes when u_4 and u_7 are reached. Finally, que = {620K_1^2, 1010K_1^2, 1160K_1^2}, and the corresponding ciphertext files f_{i,3}, f_{i,5}, and f_{i,7} are returned.

5) Dynamic update. Even when a DO does not update its files, the weights of the keywords in the partition dictionary still change dynamically with the popularity information of the files on PuCS. Therefore, PrCS needs to update the average popularity information and normalized weights of all keywords in the dictionary of each partition at regular intervals. In this case, PrCS does not need to recalculate the correlation matrices; it only recalculates the weighted indexes of the files of all DOs, then generates new EBBI trees and sends them to PuCS.
When a DO needs to update files, the partition dictionary changes, and the weights of the keywords in the dictionary also change. In this case, PrCS regenerates the dictionary and plaintext indexes of the partition where the file is located, and sends the new ciphertext file to PuCS. After that, PrCS regenerates the correlation matrix and weighted indexes of this partition, then generates a new EBBI tree and sends it to PuCS.
FIGURE 2. An example of a balanced binary index tree τ_i.

A. SECURITY ANALYSIS
We analyze and prove the security of CRHM under the known ciphertext model and the known background model, respectively. The notations used are as follows.
History: H = (∆_s, T_s, Q_k), in which ∆_s is a partition of the file set, T_s is the EBBI tree of ∆_s, and Q_k = {q_1, …, q_k} is a series of queries from users.
View: V(H) = (Enc_SK(∆_s), Enc_SK(T_s), Enc_SK(Q_k)), which is the encryption of H. The original history H is invisible to PuCS, while the view is visible to it.
Trace of a history: The trace of H is the set of query traces Tr(H) = {Tr(q_1), …, Tr(q_k)}, which PuCS obtains from the access pattern and the retrieval results, where Tr(q_i) = {(δ_j, ζ_j) | q_i ⊂ δ_j, 1 ≤ j ≤ |∆_s|} and ζ_j is the similarity score between the query q_i and the file δ_j.
Theorem 1. CRHM is secure under the known ciphertext model.
Proof: If PuCS cannot distinguish two histories that have the same trace, one of them generated by a simulator, then PuCS cannot learn any more information about the index and the dataset than the access pattern and the retrieval results. We therefore introduce a simulator S that generates a view V' indistinguishable from PuCS's view V(H). The view V' is generated as follows:
• S selects a random δ_i' ∈ {0, 1}^{|δ_i|} for each δ_i ∈ ∆_s, 1 ≤ i ≤ |∆_s|, and outputs ∆_s' = {δ_i', 1 ≤ i ≤ |∆_s'|}.
• S generates a query Q_k' to simulate Q_k and constructs the trapdoor Td'(Q_k') = Enc_SK'(Q_k') as follows: 1) for each q_i ∈ Q_k, 1 ≤ i ≤ k, a d-bit vector q_i' is generated, in which each position is randomly set to 1 or 0 but the number of 1s is the same as in the corresponding query of Q_k; 2) Encrypt
• S generates an index tree τ_i' for F_i to simulate τ_i and encrypts τ_i' as follows: 1) for each δ_i' ∈ ∆_s', 1 ≤ i ≤ |∆_s'|, a d-bit zero vector Iδ_i' is generated as its index; 2) for each q_j ∈ Q_k, 1 ≤ j ≤ k, if q_j ⊂ δ_i, then the d positions of Iδ_i' are set to those of q_j'; 3) an index tree τ_i' is constructed from these index vectors and encrypted with SK' as Enc_SK'(τ_i').
• S outputs V' = (∆_s', Enc_SK'(τ_i'), Enc_SK'(Q_k')). In the above process, the EBBI tree Enc_SK'(τ_i') and the trapdoor Enc_SK'(Q_k') are generated according to the same trace as that observed by PuCS. Owing to the semantic security of the symmetric encryption, no PPT (probabilistic polynomial-time) adversary can distinguish view V' from V, or the encrypted file set Enc_SK(∆_s) from ∆_s', with probability non-negligibly greater than 1/2. Scheme [38] has proved the indistinguishability of index trees and trapdoors generated by homomorphic encryption. Therefore, Theorem 1 is proven.
Theorem 2. CRHM is secure under the known background model.
• S randomly generates SK' = (K_1', K_2') as above, and a large prime number q_t is selected randomly, q_t' << K_1.
• S generates a query Q_k' to simulate Q_k and constructs the trapdoor Td'(Q_k') = Enc_SK'(Q_k') as follows: 1) for each q_i ∈ Q_k, 1 ≤ i ≤ k, a d-bit vector q_i' is generated, in which each position is randomly set to 1 or 0 but the number of 1s is the same as in the corresponding query of Q_k; 2) Encrypt
• S generates an index tree τ_i' for F_i to simulate τ_i and encrypts τ_i' as follows: 1) for each δ_i' ∈ ∆_s', 1 ≤ i ≤ |∆_s'|, a d-bit zero vector Iδ_i' is generated as its index; 2) for each q_j ∈ Q_k, 1 ≤ j ≤ k, if q_j ⊂ δ_i, then the d positions of Iδ_i' are set to those of q_j'; 3) an index tree τ_i' is constructed from these index vectors and encrypted with SK' as Enc_SK'(τ_i').
In this process, the conclusion proved in Theorem 1 applies equally to Theorem 2. Although PuCS holds a series of keyword-trapdoor pairs, it cannot distinguish the output of linear analysis from a random string, because the random file partitioning and the random selection of large prime numbers make that output indistinguishable from random. Therefore, Theorem 2 is proven.

B. PERFORMANCE EVALUATION
A functional comparison of CRHM and related schemes in terms of "multiple owners" mode, trusted organization, tree-based index, simple update, and high-quality file retrieval is given in TABLE Ⅰ.
We then evaluate the performance of CRHM in five aspects: weighted index generation, security index generation, trapdoor generation, retrieval efficiency, and retrieval precision. CRHM is compared with EMRS [18], MRSE [11], and PRSE [13] in these aspects. All schemes are implemented in Java on an Intel Core i5-3230M 2.60 GHz processor running Windows 10. The dataset used in the experiments is crawled from Google Scholar and contains 10000 papers from different fields. The average file size is 2 MB, and the number of keywords in each file is set to 20. The parameters are set as d = 100, A = 10, and k = 3, and K_1, K_2, q_t, q_t' are selected randomly, where K_1 is a 64-bit integer, K_2 is a 512-bit integer, and q_t, q_t' are 8-bit prime numbers.
Weighted index generation. We compare the efficiency of weighted index generation of these schemes on different file sets. Figure 3 shows how the generation time of weighted indexes varies with the total number of files. In CRHM, the whole process of weighted index generation mainly includes file partitioning, partition dictionary generation, plaintext index generation, and weighted index generation. In EMRS, there are no partitions: a public dictionary is used instead of partition dictionaries, and plaintext indexes and weighted indexes are generated according to it. In MRSE and PRSE, plaintext indexes are generated with a public dictionary as in EMRS, but they are not processed further, so we treat them as the weighted indexes here. In CRHM, partitioning the files reduces the size of the keyword dictionaries, which also reduces the time needed to generate plaintext and weighted indexes. Therefore, for every number of files tested, the weighted index generation time of CRHM is significantly lower than that of EMRS, MRSE, and PRSE.
Security index generation. After the weighted indexes are generated, they must be encrypted with a specific method to generate the security indexes. In CRHM, the files are divided into multiple partitions, and an EBBI tree is generated for each partition with the homomorphic encryption method. In EMRS, the secure kNN method is used to encrypt the weighted indexes, and a large grouped balanced binary tree is generated to store the security indexes. In MRSE and PRSE, the plaintext indexes are also encrypted with the secure kNN method, but the security indexes are stored directly without a tree-based structure. Figure 4 shows how the generation time of security indexes varies with the total number of files. MRSE and PRSE are more efficient than EMRS because they process the indexes more simply. Owing to the efficiency of the homomorphic encryption method and the EBBI tree, CRHM achieves the highest security index generation efficiency among all the schemes for every number of files tested.

Trapdoor generation. Trapdoors are generated from the query data using the keyword dictionaries and an encryption method. In EMRS and MRSE, query vectors are generated from the query data and the public dictionary, and trapdoors are generated from the query vectors with the secure kNN method. In PRSE, query vectors are generated in the same way and further processed into weighted query vectors, from which trapdoors are generated with the secure kNN method. In CRHM, each trapdoor is generated using the partition dictionaries and the homomorphic encryption method. Figure 5 shows how the trapdoor generation time varies with the number of query keywords.
The partition dictionaries in CRHM are much smaller than the public dictionary in EMRS, MRSE, and PRSE, and the homomorphic encryption method in CRHM is very efficient, so CRHM generates trapdoors faster than the other three schemes for every number of query keywords tested.

Retrieval efficiency. To illustrate the retrieval efficiency of these schemes, we retrieve files with the same query data and compare the retrieval time. In CRHM, the EBBI trees are traversed in parallel, and the files are ranked by the inner products of trapdoors and security indexes, which are calculated using homomorphic encryption based on large integer operations. In EMRS, MRSE, and PRSE, the inner product is calculated with more costly matrix operations. In addition, EMRS traverses its large grouped balanced binary tree, while MRSE and PRSE linearly traverse all indexes without any tree-based structure. Figure 6 shows how the retrieval time varies with the total number of files when the number of files to be retrieved is set to 10, and Figure 7 shows how the retrieval time varies with the number of files to be retrieved when the total number of files is set to 1000. Similar results are obtained with other settings. CRHM achieves better retrieval efficiency than the other three schemes in all these situations.

Retrieval precision. We next compare the retrieval precision of these schemes. Retrieval precision here refers to the ability of a scheme to distinguish files of different quality that involve similar topics. In MRSE, the coordinate matching method is used to calculate the correlation score, and all keywords in the security index are regarded as equivalent. In PRSE, the TF-IDF model is used, and each keyword in the security index is given a weight reflecting the importance of the keyword to the files.
In EMRS and CRHM, the dynamic weight model is used, and each keyword is given a weight that reflects both the importance of the keyword to the files and the file popularity information of multiple DOs.
In the experiment, we choose two file sets in which corresponding files contain the same keywords, but the files in file set A are far more popular than the corresponding files in file set B. For simplicity, for each of the four schemes we rank files by the inner products of the query vectors and the weighted indexes of the files in the two file sets. The results are shown in Table Ⅱ. The files in each set are numbered in order, and FID denotes the file number. Each value in Table Ⅱ (e.g., 10/10) has two parts: the first is the inner product for the file in A, and the second is the inner product for the corresponding file in B. Since the files in A are of higher quality than those in B, the larger the quotient of the two parts, the higher the retrieval precision. In MRSE and PRSE, the inner products of files in A equal those in B, so the high-quality files in A cannot be selected preferentially. With the dynamic weight model, the inner products of files in A are clearly larger than those in B, so CRHM and EMRS retrieve more precisely and can find the higher-quality files in A. In addition, the quotient in CRHM is slightly smaller than the corresponding value in EMRS, because file partitioning slightly reduces retrieval precision in CRHM. In summary, CRHM ensures relatively high retrieval precision while significantly improving retrieval efficiency.

V. CONCLUSION
In this paper, we propose CRHM, an efficient ciphertext retrieval scheme based on homomorphic encryption for multiple data owners in hybrid cloud, in which the public cloud server and the private cloud server cooperate to perform ciphertext retrieval securely and efficiently. In CRHM, multiple EBBI trees are built to generate and store security indexes, and a homomorphic encryption method based on large integer operations is designed to perform the ciphertext retrieval process. CRHM supports multi-keyword ranked retrieval and returns results that are not only relevant to the query data but also of high quality across multiple owners. Compared with existing related schemes, CRHM achieves both high retrieval efficiency and high accuracy. In addition, it effectively guarantees the privacy and security of user files and retrieval.