
SECTION I

INTRODUCTION

Mobile cloud computing [1] [2] [3] [4] removes the hardware limitations of mobile devices by exploiting scalable, virtualized cloud storage and computing resources, and is accordingly able to provide far more powerful and scalable mobile services to users. In mobile cloud computing, mobile users typically outsource their data to external cloud servers, e.g., iCloud, to enjoy stable, low-cost and scalable data storage and access. However, outsourced data typically contain sensitive information, such as personal photos and emails, whose exposure would lead to severe confidentiality and privacy violations [5] without effective protection. It is therefore necessary to encrypt sensitive data before outsourcing them to the cloud. Encryption, however, makes it difficult for other users to search for the data they are interested in. This fundamental issue in mobile cloud computing has motivated an extensive body of research in recent years on searchable encryption techniques that enable efficient search over outsourced encrypted data [6] [7] [8] [9].

A collection of research works has recently been developed on multi-keyword search over encrypted data. Cash et al. [10] propose a symmetric searchable encryption scheme that achieves high efficiency for large databases at the cost of a modest sacrifice in security guarantees. Cao et al. [11] propose a multi-keyword search scheme supporting result ranking by adopting the $k$-nearest neighbors (kNN) technique [12]. Naveed et al. [13] propose a dynamic searchable encryption scheme that conceals the search user's access pattern through blind storage.

To meet practical search requirements, search over encrypted data should support the following three functions. First, searchable encryption schemes should support multi-keyword search and provide a user experience comparable to searching with multiple keywords in Google; single-keyword search is far from satisfactory, as it returns only limited and inaccurate results. Second, to quickly identify the most relevant results, search users typically prefer that cloud servers sort the returned results in a relevance-based order [14], ranked by the relevance of the search request to the documents. Returning only the most relevant results from the cloud to search users also eliminates unnecessary network traffic. Third, as the number of documents in a database can be extraordinarily large, searchable encryption schemes should be efficient enough to respond to search requests with minimal delay.

In contrast to these theoretical benefits, most existing proposals fail to offer sufficient insight into the construction of a fully functional searchable encryption scheme as described above. As an effort toward this goal, in this paper we propose an efficient multi-keyword ranked search (EMRS) scheme over encrypted mobile cloud data through blind storage. Our main contributions can be summarized as follows:

  • We introduce a relevance score into searchable encryption to achieve multi-keyword ranked search over encrypted mobile cloud data. In addition, we construct an efficient index to improve search efficiency.
  • By modifying the blind storage system in the EMRS, we solve the trapdoor unlinkability problem and conceal the search user's access pattern from the cloud server.
  • We give a thorough security analysis demonstrating that the EMRS achieves a high security level, including confidentiality of documents and index, trapdoor privacy, trapdoor unlinkability, and concealment of the search user's access pattern. Moreover, we conduct extensive experiments, which show that the EMRS achieves enhanced functionality and search efficiency compared with existing proposals.

The remainder of this paper is organized as follows. In Section II, the system model, security requirements and design goal are formalized. In Section III, we recap relevance scoring, secure kNN technique, blind storage system and ciphertext policy attribute-based encryption. In Section IV, we propose the EMRS. Its security analysis and performance evaluation are presented in Section V and Section VI, respectively. In Section VII, we present related work. Finally, we conclude this paper in Section VIII.

SECTION II

SYSTEM MODEL, SECURITY REQUIREMENTS AND DESIGN GOAL

A. System Model

As shown in Fig. 1, the system model in the EMRS consists of three entities: the data owner, search users and the cloud server. The data owner keeps a large collection of documents $D$ to be outsourced to a cloud server in encrypted form $C$. In the system, the data owner sets a keyword dictionary $W$ which contains $d$ keywords. To enable search users to query over the encrypted documents, the data owner builds the encrypted index $\digamma$. Both the encrypted documents $C$ and the encrypted index $\digamma$ are stored on the cloud server through the blind storage system.

Figure 1. System model.

When a search user wants to search over the encrypted documents, she first receives the secret key from the data owner. Then, she chooses a conjunctive keyword set $\varpi$ which contains $l$ keywords of interest and computes a trapdoor $T$ consisting of a keyword-related token $stag$ and an encrypted query vector $Q$. Finally, the search user sends $stag$, $Q$, and an optional number $k$ to the cloud server to request the $k$ most relevant results.

Upon receiving $stag$, $Q$, and $k$ from the search user, the cloud server uses $stag$ to access the index $\digamma$ in the blind storage and computes relevance scores with the encrypted query vector $Q$. Then, the cloud server sends back the descriptors $(Dsc)$ of the top-$k$ documents that are most relevant to the searched keywords. The search user can use these descriptors to access the blind storage system and retrieve the encrypted documents. An access control technique, e.g., attribute-based encryption, can be implemented to manage the search user's decryption capability.

B. Security Requirements

In the EMRS, we consider the cloud server to be honest-but-curious, meaning that it correctly executes the tasks assigned by the data owner and the search user, but is curious about the data in its storage and the received trapdoors, from which it tries to obtain additional information. Moreover, we consider the Known Background model in the EMRS, which allows the cloud server to know background information about the documents, such as statistical information on the keywords. Specifically, the EMRS aims to satisfy the following four security requirements:

  • Confidentiality of Documents and Index: Documents and index should be encrypted before being outsourced to a cloud server. The cloud server should be prevented from prying into the outsourced documents and cannot deduce any associations between the documents and keywords using the index.
  • Trapdoor Privacy: Since the search user would like to keep her searches from being exposed to the cloud server, the cloud server should be prevented from knowing the exact keywords contained in the trapdoor of the search user.
  • Trapdoor Unlinkability: Trapdoors should not be linkable; that is, trapdoors should be completely different even if they contain the same keywords. In other words, trapdoor generation should be randomized rather than deterministic, so that the cloud server cannot deduce any association between two trapdoors.
  • Concealing Access Pattern of the Search User: Access pattern is the sequence of the searched results. In the EMRS, the access pattern should be totally concealed from the cloud server. Specifically, the cloud server cannot learn the total number of the documents stored on it nor the size of the searched document even when the search user retrieves this document from the cloud server.

C. Design Goal

To enable efficient and privacy-preserving multi-keyword ranked search over encrypted mobile cloud data via the blind storage system, the EMRS has the following design goals:

  • Multi-Keyword Ranked Search: To meet the requirements for practical uses and provide better user experience, the EMRS should not only support multi-keyword search over encrypted mobile cloud data, but also achieve relevance-based result ranking.
  • Search Efficiency: Since the number of the total documents may be very large in a practical situation, the EMRS should achieve sublinear search with better search efficiency.
  • Confidentiality and Privacy Preservation: To prevent the cloud server from learning any additional information about the documents and the index, and to keep search users’ trapdoors secret, the EMRS should cover all the security requirements that we introduced above.

SECTION III

PRELIMINARIES

A. Relevance Scoring

In searchable symmetric encryption (SSE) schemes, because the number of documents is large, search results should be returned in order of relevance to the searched keywords. Scoring is the natural way to weight the relevance of documents. Among the many relevance scoring techniques, we adopt $TF$-$IDF$ weighting [15] in the EMRS. In $TF$-$IDF$ weighting, the term frequency $tf_{t,f}$ is the number of occurrences of term $t$ in document $f$. The inverse document frequency is calculated as $idf_{t}=\log\frac{N}{df_{t}}$, where $df_{t}$ denotes the number of documents containing term $t$ and $N$ is the total number of documents in the database. The weight of term $t$ in document $f$ is then calculated as $tf_{t,f} \cdot idf_{t}$.
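As a concrete illustration, the weighting above can be sketched in a few lines of Python (a minimal example; the paper does not fix the logarithm base, so the natural logarithm is assumed here):

```python
import math

def tf_idf(tf, df, n_docs):
    """TF-IDF weight of a term in one document.

    tf:     occurrences of the term in the document (tf_{t,f})
    df:     number of documents containing the term (df_t)
    n_docs: total number of documents in the database (N)
    """
    idf = math.log(n_docs / df)   # idf_t = log(N / df_t)
    return tf * idf

# A term occurring 3 times in a document, appearing in 10 of 1000 documents:
weight = tf_idf(3, 10, 1000)
```

Rare terms (small $df_{t}$) receive larger weights, so documents matching rare query keywords are ranked higher.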

B. Secure kNN Computation

We adopt the work of Wong et al. [12] in the EMRS. Wong et al. propose a secure $k$-nearest neighbor (kNN) scheme which confidentially encrypts two vectors and computes the Euclidean distance between them. First, the secret key $(S,M_{1},M_{2})$ is generated. The binary vector $S$ is a splitting indicator used to split a plaintext vector into two random vectors, which conceals the values of the plaintext vector, while $M_{1}$ and $M_{2}$ are used to encrypt the split vectors. The correctness and security of the secure kNN computation scheme are analyzed in [12].

C. Blind Storage System

A blind storage system [13] is built on the cloud server to support adding, updating and deleting documents while concealing the search user's access pattern from the cloud server. In the blind storage system, all documents are divided into fixed-size blocks, which are indexed by a sequence of random integers generated from a document-related seed. The cloud server only sees blocks of encrypted documents being uploaded and downloaded, so the blind storage system leaks little information: the cloud server does not know which blocks belong to the same document, or even the total number of documents and the size of each document. Moreover, both the documents and the index can be stored in the blind storage system to realize a searchable encryption scheme.

D. Ciphertext Policy Attribute-Based Encryption

In ciphertext policy attribute-based encryption (CP-ABE) [16], ciphertexts are created with an access structure (usually an access tree) which defines the access policy. A user can decrypt the data only if the attributes embedded in his attribute keys satisfy the access policy in the ciphertext. In CP-ABE, the encrypter holds the ultimate authority of the access policy.

SECTION IV

PROPOSED SCHEME

In this section, we present the EMRS in detail. Since the encrypted documents and the index $\digamma$ are both stored in the blind storage system, we first give the general construction of the blind storage system. Moreover, since the EMRS aims to eliminate the risk of sharing the document-encryption key with all search users and to solve the trapdoor unlinkability problem in Naveed's scheme [13], we modify the construction of blind storage and leverage the ciphertext policy attribute-based encryption (CP-ABE) technique in the EMRS. The specific construction of CP-ABE is out of the scope of this paper, however, and we only give a brief indication here. The notations of this paper are listed in Table 1. The EMRS consists of the following phases: System Setup, Construction of Blind Storage, Encrypted Database Setup, Trapdoor Generation, Efficient and Secure Search, and Retrieve Documents from Blind Storage.

Table 1. Notations.

A. System Setup

The data owner takes a security parameter $\lambda$ and outputs two invertible matrices $M_{1}, M_{2} \in \mathbb{R}^{(d+2)\times(d+2)}$ as well as a $(d+2)$-dimensional binary vector $S$ as the secret key, where $d$ is the size of the keyword dictionary. Then, the data owner generates a set of attribute keys $sk$ for each search user according to her role in the system. The data owner also chooses a key $K_{T}$ for a symmetric cipher $Enc()$, e.g., AES. Finally, the data owner sends $(M_{1}, M_{2}, S, sk, Enc(), K_{T})$ to the search user through a secure channel.

B. Construction of Blind Storage

The data owner chooses a full-domain collision-resistant hash function $H$, a full-domain pseudorandom function $\Psi$, a pseudorandom generator $\Gamma$ and a hash function $\Phi: \{0,1\}^{*} \rightarrow \{0,1\}^{192}$. $\Psi$ and $\Gamma$ are based on the AES block cipher [13]. Then, the data owner chooses a number $\alpha > 1$ as the expansion parameter and a number $\kappa$ as the minimum number of blocks in a communication.

1) B.Keygen

The data owner generates a key $K_{\Psi}$ for the function $\Psi$ and sends it to the search user over a secure channel.

2) B.Build

This phase takes as input a large collection of documents $D$, a list $(d_{1}, d_{2}, d_{3}, \cdots, d_{m})$ of $m$ documents, where each document has a unique id denoted as $id_{i}$. B.Build outputs an array of blocks $B$, which consists of $n_{b}$ blocks of $m_{b}$ bits each. Document $d_{i}$ occupies $size_{i}$ blocks of $m_{b}$ bits each, and the header of each of these blocks contains $H(id_{i})$. In addition, the header of the first block of document $d_{i}$ indicates the size of $d_{i}$. Initially, all blocks in $B$ are set to all zeros. For each document $d_{i}$ in $D$, the blind storage is constructed as follows:

Step 1: Compute the seed $\sigma_{i}=\Psi_{K_\Psi}(id_{i})$ as the input to the function $\Gamma$. Generate a sufficiently long bit string through $\Gamma$ using the seed $\sigma_{i}$ and parse it as a sequence of integers in the range $[n_{b}]$. Let $\pi[\sigma_{i},l]$ denote the first $l$ integers of this sequence. Generate a set $S_{f}=\pi[\sigma_{i}, \max(\lceil \alpha \cdot size_{i}\rceil,\kappa)]$.

Step 2: Let $S^{0}_{f}=\pi[\sigma_{i},\kappa]$, then check whether the following conditions hold:

  • There exist $size_{i}$ free blocks among those indexed by the integers in the set $S_{f}$.
  • There exists at least one free block among those indexed by the integers in the set $S^{0}_{f}$.

If either of the above conditions does not hold, abort.

Step 3: Pick a subset $S^{\prime}_{f} \subset S_{f}$ of $size_{i}$ integers such that the blocks indexed by the integers in $S^{\prime}_{f}$ are all free. Relying on the fact that the integers in $S_{f}$ are in random order, we pick the first $size_{i}$ integers indexing free blocks to form the subset $S^{\prime}_{f}$. Mark these blocks as non-free. Then, write the document $d_{i}$ to the blocks indexed by the integers in $S^{\prime}_{f}$ in increasing order.

Note that the blocks of different documents can be written to the blind storage system in a single pass, which conceals the associations among blocks. The specific construction of each block and the encryption of the blocks are discussed below.
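The index-generation step of B.Build can be sketched as follows. This is only an illustrative sketch: the paper instantiates $\Psi$ and $\Gamma$ with AES, whereas here SHA-256 in counter mode stands in for $\Gamma$, and the function names are our own:

```python
import hashlib
import math

def gamma_indices(seed: bytes, count: int, n_b: int) -> list:
    """Stand-in for Γ: expand a seed into `count` pseudorandom block
    indices in the range [n_b] (SHA-256 in counter mode for illustration)."""
    out, ctr = [], 0
    while len(out) < count:
        digest = hashlib.sha256(seed + ctr.to_bytes(8, "big")).digest()
        for i in range(0, len(digest), 4):
            if len(out) == count:
                break
            out.append(int.from_bytes(digest[i:i + 4], "big") % n_b)
        ctr += 1
    return out

def candidate_blocks(seed: bytes, size_i: int, alpha: float = 4.0,
                     kappa: int = 8, n_b: int = 2 ** 20) -> list:
    """S_f = π[σ_i, max(⌈α·size_i⌉, κ)] from Step 1 of B.Build."""
    return gamma_indices(seed, max(math.ceil(alpha * size_i), kappa), n_b)

# The same seed always yields the same index sequence, so a reader who
# knows σ_i can later regenerate S_f without any stored state.
s_f = candidate_blocks(b"seed-for-doc-1", size_i=3)
```

Determinism of the sequence is what lets the search user recompute the candidate block positions from the seed alone, which is the crux of the scheme.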

Discussions

The main idea of the blind storage system is to store a document in a set of fixed-size blocks indexed by integers that are generated by applying the seed $\sigma_{i}$ to the pseudorandom generator $\Gamma$. To reduce the probability that the number of free blocks indexed by integers in $S_{f}$ is less than $size_{i}$, we choose a sequence of $\alpha \cdot size_{i}$ integers as the set $S_{f}$. The choice of the parameter $\alpha$ is a trade-off between collision probability and wasted space. With suitable parameter choices, the probability that the two conditions in Step 2 fail to hold is negligible [13], as we show in Section V.

C. Encrypted Database Setup

The data owner builds the encrypted database as follows:

Step 1: The data owner computes the $d$-dimensional relevance vector $p=(p_{1}, p_{2}, \cdots, p_{d})$ for each document using the $TF$-$IDF$ weighting technique, where $p_{j}$, $j\in \{1,2,\cdots,d\}$, is the weight of keyword $\omega_{j}$ in document $d_{i}$. Then, the data owner extends $p$ to a $(d+2)$-dimensional vector $p^{*}$: the $(d+1)$-th entry of $p^{*}$ is set to a random number $\varepsilon$ and the $(d+2)$-th entry is set to 1. We let $\varepsilon$ follow a normal distribution $N(\mu,\sigma^{2})$ [11]. For each document $d_{i}$, to compute the encrypted relevance vector, the data owner encrypts the associated extended relevance vector $p^{*}$ using the secret key $M_{1}$, $M_{2}$ and $S$. First, the data owner chooses a random number $r$ and splits the extended relevance vector $p^{*}$ into two $(d+2)$-dimensional vectors $p^{\prime}$ and $p^{\prime\prime}$ using the vector $S$. For the $j$-th item in $p^{*}$, set
$$\begin{cases} p^{\prime}_{j}=p^{\prime\prime}_{j}=p^{*}_{j}, & \text{if } S_{j}=1\\ p^{\prime}_{j}=\frac{1}{2}p^{*}_{j}+r, \quad p^{\prime\prime}_{j}=\frac{1}{2}p^{*}_{j}-r, & \text{otherwise} \end{cases}$$
where $S_{j}$ is the $j$-th item of $S$. Then compute $P=\{M_{1}^{T}\cdot p^{\prime},M_{2}^{T}\cdot p^{\prime\prime}\}$ as the encrypted relevance vector.
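The split-and-encrypt step can be sketched with NumPy as follows (an illustrative sketch with a toy dictionary size; the variable and function names are ours, and the random matrices are almost surely invertible rather than provably so):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                          # toy dictionary size

# Secret key: splitting vector S and (almost surely invertible) M1, M2.
S = rng.integers(0, 2, size=d + 2)
M1 = rng.random((d + 2, d + 2)) + 2 * np.eye(d + 2)
M2 = rng.random((d + 2, d + 2)) + 2 * np.eye(d + 2)

def encrypt_relevance(p_star: np.ndarray, r: float):
    """Split p* with S (copy the entry if S_j = 1, split it otherwise),
    then encrypt: P = {M1^T · p', M2^T · p''}."""
    p1, p2 = p_star.astype(float).copy(), p_star.astype(float).copy()
    for j in range(len(p_star)):
        if S[j] == 0:                          # split this coordinate
            p1[j] = 0.5 * p_star[j] + r
            p2[j] = 0.5 * p_star[j] - r
    return M1.T @ p1, M2.T @ p2

# Extended relevance vector p* = (p, ε, 1) for one document.
p = rng.random(d)
p_star = np.concatenate([p, [rng.normal(), 1.0]])
P = encrypt_relevance(p_star, r=0.7)
```

The split randomizes the coordinates where $S_{j}=0$, so two documents with identical relevance vectors still produce different ciphertexts for different choices of $r$.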

Step 2: For each document $d_{i}$ in $D$, split the document into blocks of $m_{b}$ bits each. Each block carries a header $H(id_{i})$ indicating that it belongs to document $d_{i}$, and $size_{i}$ is contained in the header of the first block of $d_{i}$. Then, for each document $d_{i}$, the data owner chooses a 192-bit key $K_{i}$ for the algorithm $Enc()$. More precisely, for each block $B[j]$ of document $d_{i}$, where $j$ is the index number of the block, the key for encrypting the block is computed as $K_{i} \oplus \Phi(j)$. Since each block has a unique index number, the blocks of the same document are encrypted under different keys. Document $d_{i}$ thus consists of $size_{i}$ encrypted blocks; its first block, with index number $j$, is
$$Enc_{(K_{i} \oplus \Phi(j))}(H(id_{i})||size_{i}||data)$$
and each remaining block of $d_{i}$ is
$$Enc_{(K_{i} \oplus \Phi(j))}(H(id_{i})||data)$$
Finally, the data owner encrypts all the documents and writes them to the blind storage system using the B.Build function.
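The per-block key derivation $K_{i} \oplus \Phi(j)$ can be sketched as follows ($\Phi$ is instantiated here with truncated SHA-256 purely for illustration; the paper only requires some hash onto 192 bits, and the helper names are ours):

```python
import hashlib
import secrets

def phi(j: int) -> bytes:
    """Φ: {0,1}* → {0,1}^192, here the first 24 bytes of SHA-256(j)."""
    return hashlib.sha256(str(j).encode()).digest()[:24]

def block_key(K_i: bytes, j: int) -> bytes:
    """Per-block key K_i ⊕ Φ(j); every block index yields a distinct key,
    so identical plaintext blocks encrypt to different ciphertexts."""
    assert len(K_i) == 24                     # 192-bit document key
    return bytes(a ^ b for a, b in zip(K_i, phi(j)))

K_i = secrets.token_bytes(24)                 # fresh 192-bit key for d_i
k_block_5 = block_key(K_i, 5)                 # key for the block at index 5
```

Because $\Phi(j)$ depends only on the public block index, anyone holding $K_{i}$ can rederive every block key, while the server, lacking $K_{i}$, learns nothing.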

Step 3: To enable efficient search over the encrypted documents, the data owner builds the index $\digamma$. First, the data owner defines the access policy $\upsilon_{i}$ for each document $d_{i}$. We denote the result of attribute-based encryption under access policy $\upsilon_{i}$ as $ABE_{\upsilon_{i}}()$. The data owner initializes $\digamma$ as an empty array indexed by all keywords. Then, the index $\digamma$ is constructed as shown in Algorithm 1.

Algorithm 1. Initialize $\digamma$.

As we can see, the index $\digamma$ maps each keyword to the encrypted relevance vectors $(P)$ and the descriptors $(Dsc)$ of the documents that contain the keyword. Each list $\digamma[\omega]$ can be stored in the blind storage system with $\omega$ as the document id. Specifically, for each $\digamma[\omega]$, the data owner computes $\sigma_{\omega}=\Psi_{K_{\Psi}}(\omega)$ as the seed for the function $\Gamma$ to generate the set $S_{f}$. For each block of $\digamma[\omega]$ indexed by integer $j$, the data owner adds an encrypted header $Enc_{(K_{T} \oplus \Phi(j))}(H(\omega)||size_{\omega})$, where $size_{\omega}$ is the number of blocks belonging to $\digamma[\omega]$. Finally, the data owner writes the index $\digamma$ to the blind storage system using the B.Build function.

Discussions

When using the B.Build function, it is crucial to determine how the seed for generating the set $S_{f}$ is computed. We use the document id $id_{i}$ to compute the seed for the documents stored in the blind storage system, and the keyword $\omega$ to compute the seed for each $\digamma[\omega]$. Moreover, each header of a document block contains the encrypted $H(id_{i})$, and the first block indicates $size_{i}$, whereas the blocks of the index $\digamma$ differ from those of the documents: each header of an index block is $Enc_{(K_{T} \oplus \Phi(j))}(H(\omega)||size_{\omega})$. This small change is made for security reasons and does not affect the implementation of the blind storage. In addition, since each block is encrypted under a key derived from its index number, the headers differ even when two blocks belong to the same document or the same list $\digamma[\omega]$.

D. Trapdoor Generation

To search over the outsourced encrypted data, the search user computes a trapdoor consisting of a keyword-related token $stag$ and an encrypted query vector $Q$ as follows:

Step 1: The search user takes a keyword conjunction $\varpi=(\omega_{1}, \omega_{2}, \cdots, \omega_{l})$ of $l$ keywords of interest in $W$. A $d$-dimensional binary query vector $q$ is generated, where the $j$-th bit of $q$ indicates whether $\omega_{j}\in \varpi$. Then, the search user chooses two random numbers $r$ and $t$ and extends the query vector $q$ to a $(d+2)$-dimensional vector
$$q^{*}=(rq,r,t)$$
Next, the search user chooses a random number $r^{\prime}$ and splits the vector $q^{*}$ into two $(d+2)$-dimensional vectors $q^{\prime}$ and $q^{\prime\prime}$. For the $j$-th item in $q^{*}$, set
$$\begin{cases} q^{\prime}_{j}=q^{\prime\prime}_{j}=q^{*}_{j}, & \text{if } S_{j}=0\\ q^{\prime}_{j}=\frac{1}{2}q^{*}_{j}+r^{\prime}, \quad q^{\prime\prime}_{j}=\frac{1}{2}q^{*}_{j}-r^{\prime}, & \text{otherwise} \end{cases}$$
The search user then computes $Q=\{M_{1}^{-1}\cdot q^{\prime},M_{2}^{-1}\cdot q^{\prime\prime}\}$ as the encrypted query vector.

Step 2: The search user chooses the estimated least frequent keyword $\omega^{\prime}$ in the conjunction $\varpi$ and computes the seed $\sigma_{\omega^{\prime}}=\Psi_{K_\Psi}(\omega^{\prime})$. The search user then generates a long bit string through the function $\Gamma$ using the seed $\sigma_{\omega^{\prime}}$, chooses the sequence $\pi[\sigma_{\omega^{\prime}},\kappa]$ and randomly adds $\kappa$ dummy integers to it. The search user downloads the blocks indexed by these $2\kappa$ integers and decrypts each header using the key $K_{T} \oplus \Phi(j)$, where $j$ is the index number of the block, to find the first block of the list $\digamma[\omega^{\prime}]$, which consists of the descriptors and the encrypted relevance vectors of the documents containing $\omega^{\prime}$. From this first block the search user obtains $size_{\omega^{\prime}}$ and computes the set $S_{\omega}=\pi[\sigma_{\omega^{\prime}}, \alpha \cdot size_{\omega^{\prime}}]$. The search user then randomly adds $\alpha \cdot size_{\omega^{\prime}}$ dummy integers to the set $S_{\omega}$, resulting in a set $S^{\prime}_{\omega}$ of $2\alpha \cdot size_{\omega^{\prime}}$ integers. This extended set $S^{\prime}_{\omega}$ is denoted as $stag$. Note that $stag$ contains dummy integers for privacy reasons.
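The dummy-padding in Step 2 can be sketched as follows (a hypothetical helper; the paper does not prescribe how the dummy integers are drawn or interleaved, so uniform sampling and a shuffle are assumed here):

```python
import random

def make_stag(real_indices: list, n_b: int, seed: int = 0) -> list:
    """Pad the real block indices of F[ω'] with an equal number of dummy
    indices in [n_b] and shuffle, so the cloud server cannot tell which
    of the requested blocks actually belong to F[ω']."""
    rng = random.Random(seed)
    dummies = [rng.randrange(n_b) for _ in real_indices]
    stag = list(real_indices) + dummies
    rng.shuffle(stag)
    return stag

# Hide three real index-block positions among three dummies.
stag = make_stag([17, 42, 99], n_b=2 ** 20)
```

From the server's perspective, all $2\alpha \cdot size_{\omega^{\prime}}$ requested blocks look equally plausible, which is what conceals the searched keyword's posting list.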

Finally, the search user sends $Q$, $stag$ and a number $k$ to the cloud server to request the $k$ most relevant documents.

E. Efficient and Secure Search

Upon receiving $Q$, $stag$, and $k$, the cloud server parses $stag$ to get a set of integers in the range $[n_{b}]$. Then, the cloud server accesses the index $\digamma$ in the blind storage and retrieves the blocks indexed by these integers to obtain the tuples $(ABE_{\upsilon_{i}}(id_{i}||K_{i}||x),P)$ stored in them. Note that these blocks consist of the blocks of $\digamma[\omega^{\prime}]$ and some dummy blocks. For each retrieved encrypted relevance vector $P$, the relevance score $Score_{i}$ of the associated document $d_{i}$ is computed with the encrypted query vector $Q$ as follows:
$$\begin{align} Score_{i}&=P \cdot Q\notag \\ &=\{M_{1}^{T}\cdot p^{\prime},M_{2}^{T}\cdot p^{\prime\prime}\} \cdot \{M_{1}^{-1}\cdot q^{\prime},M_{2}^{-1}\cdot q^{\prime\prime}\}\notag \\ &=p^{\prime} \cdot q^{\prime}+p^{\prime\prime} \cdot q^{\prime\prime} \notag \\ &=p^{*} \cdot q^{*}\notag \\ &=(p,\varepsilon,1)\cdot (rq,r,t)\notag \\ &=r(p\cdot q+\varepsilon)+t \end{align}$$
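The derivation above can be checked numerically. The following sketch (toy dimensions, our own variable names) encrypts a relevance vector and a query vector with complementary splits and verifies that $P \cdot Q = r(p \cdot q + \varepsilon) + t$:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
S = rng.integers(0, 2, size=d + 2)            # splitting vector
M1 = rng.random((d + 2, d + 2)) + 2 * np.eye(d + 2)
M2 = rng.random((d + 2, d + 2)) + 2 * np.eye(d + 2)

def split(v, same_when, noise):
    """Complementary kNN split: copy the entry where S_j == same_when,
    otherwise split it into two halves offset by ±noise."""
    v1, v2 = v.copy(), v.copy()
    mask = S != same_when
    v1[mask] = 0.5 * v[mask] + noise
    v2[mask] = 0.5 * v[mask] - noise
    return v1, v2

# Data-owner side: p* = (p, ε, 1), split where S_j = 0.
p, eps = rng.random(d), rng.normal()
p1, p2 = split(np.concatenate([p, [eps, 1.0]]), same_when=1, noise=0.7)
P = (M1.T @ p1, M2.T @ p2)

# Search-user side: q* = (rq, r, t), split where S_j = 1.
q = rng.integers(0, 2, size=d).astype(float)
r, t = 0.31, 0.58
q1, q2 = split(np.concatenate([r * q, [r, t]]), same_when=0, noise=0.3)
Q = (np.linalg.inv(M1) @ q1, np.linalg.inv(M2) @ q2)

score = P[0] @ Q[0] + P[1] @ Q[1]             # = p*·q* = r(p·q + ε) + t
```

The matrices cancel because $(M_{1}^{T}p^{\prime}) \cdot (M_{1}^{-1}q^{\prime}) = p^{\prime T} M_{1} M_{1}^{-1} q^{\prime} = p^{\prime} \cdot q^{\prime}$, and the complementary splits ensure $p^{\prime} \cdot q^{\prime} + p^{\prime\prime} \cdot q^{\prime\prime} = p^{*} \cdot q^{*}$ in every coordinate.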

Finally, after sorting the relevance scores, the cloud server sends back the descriptors $ABE_{\upsilon_{i}}(id_{i}||K_{i}||x)$ of the top-$k$ documents that are most relevant to the searched keywords. Note that, as discussed before, attribute-based encryption can be implemented as an access control technique to manage the search user's decryption capability.

F. Retrieve Documents From Blind Storage

Upon receiving a set of descriptors $ABE_{\upsilon_{i}}(id_{i}||K_{i}||x)$, the search user can retrieve the documents as follows:

Step 1: If the search user's attributes satisfy the access policy of the document, she can decrypt the descriptor using her secret attribute keys to get the document id $id_{i}$ and the associated symmetric key $K_{i}$. To retrieve the document $d_{i}$, compute $\sigma_{i}=\Psi_{K_\Psi}(id_{i})$ as the seed for the function $\Gamma$. Generate a sufficiently long bit string through $\Gamma$ using the seed $\sigma_{i}$, parse it as a sequence of integers in the range $[n_{b}]$ and choose the first $\kappa$ integers as the set $S^{0}_{f}$. Retrieve the blocks indexed by these $\kappa$ integers from the encrypted database through the blind storage system.

Step 2: The search user tries to decrypt these blocks using the symmetric key $K_{i} \oplus \Phi(j)$ until she finds the first block of the document $d_{i}$. If she does not find the first block, the document is not present in the system. Otherwise, the search user recovers the size $size_{i}$ of the document from the header of the first block.

Step 3: The search user then computes $l=\lceil \alpha \cdot size_{i} \rceil$. If $l \leq \kappa$, she sets $S_{f}=\pi[\sigma_{i}, \kappa]$; otherwise, she sets $S_{f}=\pi[\sigma_{i}, l]$ and retrieves the remaining blocks indexed by the integers in $S_{f}$ via the blind storage system. She then decrypts these blocks and combines the blocks carrying the header $H(id_{i})$ in increasing order to recover document $d_{i}$.

Discussions

Here we have explained how the search user retrieves one document from the blind storage system; this forms the foundation of the B.Access function of the blind storage. Moreover, the search user can request multiple documents at once by combining the sequences $S^{0}_{f}$ and $S_{f}$ of different documents in a random order. This combination further conceals the search user's access pattern, since the cloud server does not even know the number of documents the search user is requesting.

SECTION V

SECURITY ANALYSIS

Under the assumptions presented in Section II, we analyze the security properties of the EMRS in terms of confidentiality of documents and index, trapdoor privacy, trapdoor unlinkability and concealment of the search user's access pattern.

A. Confidentiality of Documents and Index

The documents are encrypted with a traditional symmetric cipher before being outsourced to the cloud server; without the correct key, neither the search user nor the cloud server can decrypt them. As for index confidentiality, the relevance vector of each document is encrypted using the secret key $M_{1}$, $M_{2}$, and $S$, and the descriptors of the documents are encrypted using the CP-ABE technique. Thus, the cloud server can only use the index $\digamma$ to retrieve the encrypted relevance vectors, without learning any additional information such as the associations between documents and keywords. Only a search user with the correct attribute keys can decrypt the descriptor $ABE_{\upsilon_{i}}(id_{i}||K_{i}||x)$ to obtain the document id and the associated symmetric key. Thus, the confidentiality of documents and index is well protected.

B. Trapdoor Privacy

When a search user generates her trapdoor, consisting of the keyword-related token $stag$ and the encrypted query vector $Q$, she randomly chooses two numbers $r$ and $t$, extends the query vector $q$ to $(rq,r,t)$, and encrypts it using the secret key $M_{1}$, $M_{2}$ and $S$. Thus, the encrypted query vectors can be completely different even if they contain the same keywords. We also use the secure functions $\Psi$ and $\Gamma$ to help the search user compute the keyword-related token $stag$ with the secret key $K_\Psi$. Without the secret key $M_{1}$, $M_{2}$, $S$ and $K_\Psi$, the cloud server cannot pry into the trapdoor. Furthermore, the search user can add dummy integers to the set $S_{f}$ to conceal what she is truly searching for. Thus, the keyword information in the trapdoor is completely concealed from the cloud server and trapdoor privacy is well protected in the EMRS.

C. Trapdoor Unlinkability

Trapdoor unlinkability means that the cloud server cannot deduce associations between any two trapdoors. Even though the cloud server cannot decrypt the trapdoors, any association between two trapdoors may leak the search user's privacy. We therefore consider whether two trapdoors, each consisting of a $stag$ and an encrypted query vector $Q$, can be linked to each other or to the underlying keywords, and we show that the EMRS achieves trapdoor unlinkability under the known-background model.

Consider the encrypted query vector $Q$, defined in the EMRS as $\{M_1^{-1} \cdot q', M_2^{-1} \cdot q''\}$. The search user first extends the query vector $q$ to $q^*$, whose $(d+1)$-th and $(d+2)$-th entries are set to the random values $r$ and $t$; since $r$ and $t$ are $\eta_r$-bit and $\eta_t$-bit long, respectively, there are $2^{\eta_r} \cdot 2^{\eta_t}$ possible extensions. The search user then splits $q^*$ according to the splitting vector $S$ as discussed above: if $S_j = 0$, the entry $q^*_j$ is split into two random values that add up to $q^*_j$. Suppose the number of 0s in $S$ is $\mu$ and each dimension of $q'$ is $\eta_q$-bit long; note that $\eta_r$, $\eta_t$, $\mu$, and $\eta_q$ are independent of each other. The probability that two encrypted query vectors are identical is therefore
$$P = \frac{1}{2^{\eta_r}\, 2^{\eta_t}\, 2^{\mu \eta_q}} = \frac{1}{2^{\eta_r + \eta_t + \mu \eta_q}}$$
The larger these parameters are, the lower this probability is. For instance, with 1024-bit $r$ and $t$, the probability that two encrypted query vectors coincide satisfies $P < \frac{1}{2^{2048}}$, which is negligible.

As for the keyword-related token $stag$, the search user first obtains $size_\omega$ from the cloud server using a sequence of $2\kappa$ integers, half of which are dummy integers. She then computes the set $S_\omega = \pi[\sigma_\omega, \alpha \ast size_\omega]$ and adds $\alpha \ast size_\omega$ dummy integers to $S_\omega$ to form the $stag$. Each $stag$ thus contains $2\alpha \ast size_\omega$ integers, half of which are dummy integers. Suppose each integer is $n_b$ bits long; the probability that two $stag$s are identical is
$$P' = \frac{1}{2^{2 \alpha \ast size_\omega \ast n_b}}$$
Hence, with a 12-bit $n_b$, a 3-bit extension parameter $\alpha$, and an 8-bit $size_\omega$, the probability satisfies $P' < \frac{1}{2^{576}}$, which is negligible.
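The $stag$ construction described above can be sketched as follows, with an HMAC-based PRF as a hypothetical stand-in for the secure functions $\Psi$ and $\Gamma$; the key, keyword, and parameter values are all assumptions for illustration.

```python
import hmac
import hashlib
import random

def prf_ints(key: bytes, seed: bytes, count: int, n_bits: int = 12):
    """Deterministic pseudo-random integer sequence (stand-in for Psi/Gamma)."""
    out, counter = [], 0
    while len(out) < count:
        digest = hmac.new(key, seed + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
        for i in range(0, len(digest) - 1, 2):
            out.append(int.from_bytes(digest[i:i + 2], "big") % (1 << n_bits))
            if len(out) == count:
                break
    return out

def make_stag(k_psi: bytes, keyword: str, size_w: int, alpha: int = 4):
    """S_omega = pi[sigma_omega, alpha*size_w] plus alpha*size_w dummy integers."""
    real = prf_ints(k_psi, keyword.encode(), alpha * size_w)
    dummy = [random.randrange(1 << 12) for _ in range(alpha * size_w)]
    stag = real + dummy
    random.shuffle(stag)    # dummies become indistinguishable from real indices
    return stag

s1 = make_stag(b"secret-K_psi", "cloud", size_w=8)
s2 = make_stag(b"secret-K_psi", "cloud", size_w=8)
print(len(s1), s1 == s2)   # same keyword, yet the stags differ with overwhelming probability
```

The fresh dummies and shuffling are what make repeated queries for the same keyword look different to the cloud server.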

In Cash's scheme [10] and Naveed's scheme [13], for the same keyword the search user can only compute the same $stag$ or the same set $S_f$. Consequently, when a search user queries the cloud server with a keyword that has been searched before, the cloud server learns that the two search requests contain the same keyword. Under the known-background model, the cloud server may thus learn the search frequency of the keywords and deduce further information using statistical knowledge, as discussed in [10] and [13].

D. Concealing Access Pattern of the Search User

The access pattern refers to the sequence of searched results [11]. In Cash's scheme [10] and Cao's scheme [11], the search user obtains the associated documents directly from the cloud server, which may reveal the association between the search request and the documents. In the EMRS, by modifying the blind storage system, the access pattern is well concealed from the cloud server. Since the header of each block is encrypted together with its block number $j$ and each descriptor carries a random padding, two blocks look different even if they belong to the same document. From the cloud server's perspective, it only sees blocks being downloaded and uploaded; it does not even know the number of documents in its storage or the length of each document, because all documents are divided into blocks in a random order. In addition, when requesting a document, the search user can fetch more blocks than the document actually contains, and she can request blocks of different documents at one time in a random order to completely conceal what she is retrieving.

In the implementation of the blind storage system, the choice of parameters trades off security guarantees against performance. We define $P_{err}$ as the probability that the data owner aborts uploading a document because there are not enough free blocks indexed by the integers in the set $S_f$, as discussed in Section IV; when such an abort happens, some illegitimate information may be revealed to the cloud server [13]. We measure $P_{err}$ with the parameters $\gamma$, $\alpha$, and $\kappa$: $\gamma = n_b / m$, where $n_b$ is the number of blocks in the array $B$ and $m$ is the total number of documents stored on the cloud server; $\alpha$ is the ratio that scales the number of blocks a document contains to the number of blocks in the set $S_f$; and $\kappa$ is the minimum number of blocks in a transaction. Then, following [13], $P_{err}$ is bounded by
$$P_{err}(\gamma, \alpha, \kappa) \leq \max_{n \geq \frac{\kappa}{\alpha}} \sum_{i=0}^{n-1} \binom{\lceil \alpha n \rceil}{i} \left( \frac{\gamma - 1}{\gamma} \right)^{i} \left( \frac{1}{\gamma} \right)^{\lceil \alpha n \rceil - i}$$
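The bound above can be evaluated numerically. The sketch below sums, for each document size $n$, the probability that fewer than $n$ of the $\lceil \alpha n \rceil$ chosen blocks are free, and truncates the maximization at a finite `n_max` (an assumption that is harmless here because the summand decays for large $n$); the sample parameter values are illustrative.

```python
from math import ceil, comb

def p_err_bound(gamma: float, alpha: float, kappa: int, n_max: int = 200) -> float:
    """Upper bound on the blind-storage abort probability P_err.

    For each n >= kappa/alpha, sums P(fewer than n free blocks among
    ceil(alpha*n) picks), where a block is free with probability (gamma-1)/gamma.
    """
    p_free = (gamma - 1) / gamma
    best = 0.0
    for n in range(ceil(kappa / alpha), n_max + 1):
        m = ceil(alpha * n)
        prob = sum(comb(m, i) * p_free**i * (1 / gamma)**(m - i) for i in range(n))
        best = max(best, prob)
    return best

# Larger gamma (more spare blocks relative to documents) => lower abort probability.
print(p_err_bound(gamma=4, alpha=4, kappa=50))
```

Running it with a few parameter sets reproduces the qualitative claim below: increasing $\gamma$, $\alpha$, or $\kappa$ drives the bound toward zero.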

The larger these parameters are, the lower $P_{err}$ is and the higher the security guarantee. However, these parameters also affect the performance of the blind storage system, such as the communication and computation costs. With a proper choice of parameters, $P_{err}$ becomes negligible [13].

The comparison of security levels is shown in TABLE 2. The EMRS achieves the best security guarantees among the existing schemes [10], [11], [13].

Table 2. Comparison of Security Level.
SECTION VI

PERFORMANCE EVALUATION

A. Functionality

Considering the large number of documents and search users in a cloud environment, searchable encryption schemes should support privacy-preserving multi-keyword search and return documents in order of relevance to the search request. As shown in TABLE 3, we compare the functionalities of the EMRS with Cash's scheme [10], Cao's scheme [11], and Naveed's scheme [13].

Table 3. Comparison of Functionalities.

Cash's scheme supports multi-keyword search but cannot return results ordered by relevance score. Cao's scheme achieves multi-keyword search and returns documents in a relevance-based order. Naveed's scheme implements the blind storage system to protect the access pattern, but it supports only single-keyword search and returns undifferentiated results. The EMRS achieves multi-keyword search and relevance-based ranking while preserving high security guarantees, as discussed in Section V.

B. Computation Overhead

We evaluate the performance of the EMRS through simulations and compare its time costs with Cao's scheme [11]. We use a real dataset, the National Science Foundation Research Awards Abstracts 1990–2003 [17], from which we randomly select documents. We conduct the index construction and search experiments on a machine with a 2.8 GHz processor, and implement trapdoor generation on a 1.2 GHz smartphone. The simulation results demonstrate that the computation overheads of index construction and trapdoor generation are almost the same as those of Cao's scheme [11], while the EMRS achieves better search efficiency.

1) Index Construction

Index construction in the EMRS consists of two phases: encrypted relevance vector computation and the efficient index Formula$\digamma$ construction via blind storage.

To compute the encrypted relevance vectors, the data owner first computes the relevance score of each keyword in each document using the TF-IDF technique. As shown in Fig. 2, both the size of the dictionary and the number of documents influence the time for calculating all the relevance scores. To compute an encrypted relevance vector $P$, the data owner then performs two multiplications of a $(d+2) \times (d+2)$ matrix with a $(d+2)$-dimensional vector, with complexity $O(d^2)$. Since the time for building the subindex of one document is fixed, the time for computing all the encrypted relevance vectors is linear in the size of the database; the overall computation complexity is $O(md^2)$, where $m$ is the number of documents in the database and $d$ is the size of the keyword dictionary $W$. This complexity is the same as in Cao's scheme [11]. The cost of computing the encrypted relevance vectors is shown in Fig. 3; both the size of the dictionary and the number of documents affect the execution time.
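The relevance-score step can be illustrated with a minimal TF-IDF sketch. The exact weighting variant is an assumption (the paper does not fix one), and the documents and dictionary below are toy data.

```python
import math
from collections import Counter

def tf_idf_vectors(docs, dictionary):
    """One relevance vector per document over the keyword dictionary W.

    Uses a common TF-IDF variant (assumed): term frequency normalized by
    in-dictionary word count, times log(1 + m / document frequency).
    """
    m = len(docs)
    # document frequency of each dictionary keyword
    df = Counter(w for doc in docs for w in set(doc.split()) if w in dictionary)
    vectors = []
    for doc in docs:
        tf = Counter(w for w in doc.split() if w in dictionary)
        total = sum(tf.values()) or 1
        vectors.append([
            (tf[w] / total) * math.log(1 + m / (df[w] or 1)) for w in dictionary
        ])
    return vectors

docs = ["secure cloud storage", "cloud search over encrypted cloud data", "mobile devices"]
W = ["cloud", "search", "storage", "encrypted"]
for v in tf_idf_vectors(docs, W):
    print([round(x, 3) for x in v])
```

Each resulting vector is what the data owner then extends to $d+2$ dimensions and encrypts with $M_1$, $M_2$, and $S$.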

Figure 2. Time for calculating relevance scores. (a) Different dictionary sizes with a fixed number of documents, $m = 10000$. (b) Different numbers of documents with a fixed dictionary size, $|W| = 10000$.
Figure 3. Time for computing the encrypted relevance vectors. (a) Different dictionary sizes with a fixed number of documents, $m = 6000$. (b) Different numbers of documents with a fixed dictionary size, $|W| = 4000$.

Finally, the EMRS builds the index $\digamma$ in blind storage to improve search efficiency and conceal the access pattern of the search user. For each keyword $\omega \in W$, we build the list $\digamma[\omega]$ of tuples $(ABE_{\upsilon_i}(id_i \| K_i \| x), P)$ for the documents containing that keyword, and upload it using the B.Build function. The computation complexity of building the index $\digamma$ is therefore $O(\varrho d)$, where $\varrho$ is the average number of tuples in a list $\digamma[\omega]$ and is at most the number of documents $m$. Since the access pattern is not considered in most schemes, we do not give a detailed comparison of the blind storage implementation [13] in the EMRS.
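The list structure of the index $\digamma$ can be sketched as follows. Random bytes stand in for the CP-ABE descriptors $ABE_{\upsilon_i}(id_i \| K_i \| x)$ and a plain dict stands in for the blind storage layer uploaded via B.Build; both are assumptions for illustration only.

```python
import os
from collections import defaultdict

# Hypothetical per-document data: keyword sets, opaque encrypted descriptors,
# and (already encrypted) relevance vectors.
doc_keywords = {"d1": {"cloud", "search"}, "d2": {"cloud"}}
descriptors = {doc: os.urandom(16) for doc in doc_keywords}   # stand-in for ABE ciphertexts
rel_vectors = {"d1": [0.4, 0.1], "d2": [0.2, 0.0]}

def build_index(doc_keywords, descriptors, rel_vectors):
    """Inverted index F: keyword -> list of (descriptor, relevance vector) tuples."""
    F = defaultdict(list)
    for doc_id, words in doc_keywords.items():
        for w in words:
            F[w].append((descriptors[doc_id], rel_vectors[doc_id]))
    return F

F = build_index(doc_keywords, descriptors, rel_vectors)
print(sorted((w, len(tuples)) for w, tuples in F.items()))
# → [('cloud', 2), ('search', 1)]
```

The per-keyword list length is the $\varrho$ in the $O(\varrho d)$ build cost above.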

2) Trapdoor Generation

In the EMRS, the trapdoor consists of the token $stag$ and the encrypted query vector $Q$. To compute $stag$, the search user only needs two efficient operations ($\Psi$ and $\Gamma$) to generate a sequence of random integers; compared with the cost of computing the encrypted query vector, which grows linearly with the size of the keyword dictionary, the cost of computing $stag$ is negligible. To compute the encrypted query vector $Q$, the search user performs two multiplications of a $(d+2) \times (d+2)$ matrix with a $(d+2)$-dimensional vector, with complexity $O(d^2)$. The computation complexity of trapdoor generation is thus $O(d^2)$, the same as in Cao's scheme [11]. As shown in Fig. 4, we measure the trapdoor generation time of the EMRS on a 1.2 GHz smartphone.

Figure 4. Time for generating the trapdoor on a real smartphone. (a) Different dictionary sizes with a fixed number of query keywords, $|\varpi| = 20$. (b) Different numbers of query keywords with a fixed dictionary size, $|W| = 6000$.

3) Search Efficiency

The search operation in Cao's scheme [11] requires computing the relevance scores of all documents in the database: for each document, the cloud server computes the inner product of two $(d+2)$-dimensional vectors twice, so the computation complexity over the whole collection is $O(md)$. The search time in Cao's scheme thus increases linearly with the scale of the dataset, which is impractical for large-scale datasets.

In the EMRS, by adopting the inverted index $\digamma$ built in the blind storage system, we achieve sublinear computation overhead compared with Cao's scheme. Upon receiving a $stag$, the cloud server uses it to access blind storage and retrieve the encrypted relevance vectors from the blocks indexed by the $stag$; these comprise the blocks of documents containing the $stag$-related keyword and some dummy blocks. The EMRS thus significantly reduces the number of documents that must be scored: the cloud server only computes the inner product of two $(d+2)$-dimensional vectors for the associated documents, rather than for all documents as in Cao's scheme [11]. The computation complexity of the search operation in the EMRS is $O(\alpha \varrho_s d)$, where $\varrho_s$ is the number of documents containing the keyword indexed by the keyword-related token $stag$, and $\alpha$ is the extension parameter that scales the number of blocks in a document to the number of blocks in the set $S_f$. Since the search user typically chooses the estimated least frequent keyword, $\varrho_s$ can be small, so the computation cost of search on the cloud server is significantly reduced.
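The candidate-only scoring can be sketched as follows. Plaintext relevance vectors stand in for the encrypted ones, which is harmless for illustration because the kNN transform preserves inner products; the candidate count and dimensions are toy assumptions.

```python
import heapq
import numpy as np

rng = np.random.default_rng(7)
d = 6   # toy dictionary size (assumption)

# Relevance vectors of only the rho_s candidate documents retrieved via the stag,
# not the whole corpus -- this is the source of the sublinear cost.
candidates = {f"doc{i}": rng.random(d + 2) for i in range(8)}
q_star = rng.random(d + 2)   # extended query vector

# Score each candidate by inner product and return the top-3 most relevant.
scores = {doc: float(v @ q_star) for doc, v in candidates.items()}
top3 = heapq.nlargest(3, scores, key=scores.get)
print(top3)
```

Only $\varrho_s$ inner products of length $d+2$ are computed here, versus $m$ of them in a full linear scan.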

As shown in Fig. 5, the computation cost of the search phase is mainly affected by the number of documents in the dataset and the size of the keyword dictionary. In our experiments, we keep the index in memory to avoid costly I/O operations. Note that although the search time increases linearly in both schemes, the growth rate of the EMRS is less than half of that of Cao's scheme.

Figure 5. Time for search on the cloud server. (a) Different numbers of documents with a fixed dictionary size and number of searched keywords, $|W| = 8000$, $|\varpi| = 20$. (b) Different dictionary sizes with a fixed number of documents and searched keywords, $m = 8000$, $|\varpi| = 20$.

C. Communication Overhead

Once the system is set up, i.e., the encrypted documents and the index have been generated, the communication overhead is dominated by the search phase. In this section, we compare the communication overhead of searching over the cloud server among the EMRS, Cash's scheme [10], Cao's scheme [11], and Naveed's scheme [13]. Since most existing SSE schemes only consider obtaining a sequence of results rather than the related documents, the comparison does not include the communication for retrieving the documents.

In Cao's scheme [11], the search user computes the trapdoor, sends it to the cloud server, and obtains the search results; the communication overhead is $2(d+2)\eta_q$ bits, where $d$ is the size of the keyword dictionary and each dimension of the encrypted query vector is $\eta_q$ bits long. In Cash's scheme [10], when a search user queries the cloud server with a conjunctive keyword set $\varpi$, she computes a $stag$ for the estimated least-frequent keyword and $xtoken$s for the other keywords in $\varpi$; each $xtoken$ contains $|\varpi|$ elements of $G$, where $G$ is a group of prime order $p$. Moreover, the search user must keep computing $xtoken$s until the cloud server signals a stop, so the total number of $xtoken$s is linear in $\varrho$, the number of documents containing the $stag$-related keyword. This results in substantial communication overhead of $\varrho |\varpi| |G|$ bits, where $|G|$ is the size of an element of $G$. In Naveed's scheme [13], since the index is stored in the blind storage system, the search user may need to access blind storage to obtain $size_\omega$ before obtaining the results, which requires one or two communication rounds of $\alpha \ast size_\omega \ast n_b$ bits, where $\alpha$ is the extension parameter, $size_\omega$ is the number of blocks of documents containing $\omega$, and each index number is $n_b$ bits long. In the EMRS, we modify the way the search user computes the sequence $S_f$ that indexes the blocks, adding dummy integers to $S_f$ to conceal what she is searching for. The communication comparison is shown in TABLE 4.
Even though the EMRS requires slightly more communication overhead, it achieves more functionalities than [10] and [13], as shown in TABLE 3, and better search efficiency than [11], as shown in Fig. 5.
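The per-scheme formulas above can be turned into a small calculator. The functions encode exactly the expressions in the text; the sample parameter values are assumptions for illustration, not the values behind TABLE 4.

```python
def comm_bits_cao(d: int, eta_q: int) -> int:
    """Trapdoor size in Cao's scheme [11]: two encrypted (d+2)-dim vectors."""
    return 2 * (d + 2) * eta_q

def comm_bits_cash(rho: int, varpi: int, g_bits: int) -> int:
    """xtoken traffic in Cash's scheme [10]: rho tokens of |varpi| group elements."""
    return rho * varpi * g_bits

def comm_bits_naveed(alpha: int, size_w: int, n_b: int) -> int:
    """Blind-storage round in Naveed's scheme [13]: alpha*size_w block indices."""
    return alpha * size_w * n_b

# Illustrative parameter choices (assumptions).
print(comm_bits_cao(d=4000, eta_q=64),
      comm_bits_cash(rho=50, varpi=5, g_bits=160),
      comm_bits_naveed(alpha=4, size_w=64, n_b=12))
```

Plugging in deployment-specific parameters makes the trade-off concrete: Cao's cost grows with the dictionary size $d$, Cash's with the result count $\varrho$, and Naveed's (and the EMRS's) with $\alpha \ast size_\omega$.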

Table 4. Comparison of Communication Overhead.

Discussions

Note that the communication overhead of the EMRS is higher than that of Cao's scheme, but this will not severely affect the user experience: the overhead mainly stems from exchanging short signaling messages, which can be transmitted in a very short time. Moreover, with the adoption of advanced wireless technologies such as 4G/5G and IEEE 802.11ac, communication delays will be further reduced and become negligible. In this paper, we target a prototype system and expose our proposal to the public as a theoretical framework; depending on the specific deployment scenario, e.g., whether communication bandwidth is expensive, our proposal can be adapted for real-world implementation.

D. Size of Returned Results

The size of the returned results in the EMRS is mainly determined by the security parameters $\alpha$ and $\kappa$: the larger these two parameters are, the higher the security guarantee the scheme provides, as discussed in Section V. The returned results for one document occupy $\alpha \ast size_\omega$ blocks, comprising the blocks of the searched document and dummy blocks. Moreover, a search user can request many documents at one time and thereby avoid requesting dummy blocks. The EMRS thus provides tunable parameters that let search users balance communication and computation costs against privacy.

SECTION VII

RELATED WORK

Searchable encryption is a promising technique that provides search services over encrypted cloud data. It can be classified into two main types: searchable public-key encryption (SPE) and searchable symmetric encryption (SSE).

Boneh et al. [18] first propose the concept of SPE, supporting single-keyword search over encrypted cloud data. The work is later extended in [19] to support conjunctive, subset, and range queries on encrypted data. Zhang et al. [20] propose an efficient public-key searchable encryption scheme with conjunctive-subset search. However, these proposals require the search results to match all the keywords simultaneously and cannot return results in a specific order. Liu et al. [21] propose a ranked search scheme that adopts a mask matrix to achieve cost-effectiveness. Yu et al. [15] propose a multi-keyword retrieval scheme that returns the top-$k$ relevant documents by leveraging fully homomorphic encryption. The schemes in [22] and [23] adopt attribute-based encryption to achieve search authorization in SPE.

Although SPE achieves the rich search functionalities above, it is inefficient because it involves many asymmetric cryptographic operations. This motivates research on SSE mechanisms.

The first SSE scheme is introduced by Song et al. [24], which builds a searchable encrypted index in a symmetric way but supports only single-keyword search. Curtmola et al. [25] further improve the security definitions of SSE; their work forms the basis of many subsequent schemes, such as [10], [13], and [26], by introducing the fundamental approach of a keyword-related index, which enables quick retrieval of the documents containing a given keyword. To meet practical requirements, conjunctive multi-keyword search is necessary and has been studied in [11] and [15]. Moreover, to give the search user a better experience, some proposals [27], [28] enable ranked results instead of undifferentiated ones by introducing relevance scores into searchable encryption. To further improve the user experience, fuzzy keyword search over encrypted data has also been developed in [7] and [29].

Cao et al. [11] propose a privacy-preserving multi-keyword search scheme that supports ranked results by adopting the secure $k$-nearest neighbors (kNN) technique in searchable encryption. The proposal achieves rich functionalities such as multi-keyword search and ranked results, but requires computing relevance scores for all documents in the database, which imposes a huge computation load on the cloud server and is therefore unsuitable for large-scale datasets. Cash et al. [10] adopt the inverted index $TSet$, which maps each keyword to the documents containing it, to achieve efficient multi-keyword search for large-scale datasets; the work is later extended in [26] with an implementation on real-world datasets, but ranked results are not supported in [26]. Naveed et al. [13] construct a blind storage system to achieve searchable encryption and conceal the access pattern of the search user; however, only single-keyword search is supported in [13].

SECTION VIII

CONCLUSION

In this paper, we have proposed a multi-keyword ranked search scheme to enable accurate, efficient, and secure search over encrypted mobile cloud data. The security analysis has demonstrated that the proposed scheme effectively achieves confidentiality of documents and index, trapdoor privacy, trapdoor unlinkability, and concealment of the search user's access pattern. Extensive performance evaluations have shown that the proposed scheme achieves better functionality and computation overhead than existing ones. In future work, we will investigate authentication and access control issues in searchable encryption.

Footnotes

This work was supported in part by the International Science and Technology Cooperation and Exchange Program of Sichuan Province, China, under Grant 2014HH0029, the China Post-Doctoral Science Foundation under Grant 2014M552336, and the National Natural Science Foundation of China under Grant 61472065, Grant 61350110238, Grant U1233108, Grant U1333127, and Grant 61272525.

Corresponding Author: H. Li


Authors

Hongwei Li

Hongwei Li (M’12) is currently an Associate Professor with the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China, where he received the Ph.D. degree in computer software and theory, in 2008. He was a Post-Doctoral Fellow with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada, for one year until 2012. His research interests include network security, applied cryptography, and trusted computing. He serves as an Associate Editor of Peer-to-Peer Networking and Applications, and a Guest Editor of Peer-to-Peer Networking and Applications for the Special Issue on Security and Privacy of P2P Networks in Emerging Smart City. He serves on the Technical Program Committees for many international conferences, such as the IEEE INFOCOM, the IEEE ICC, the IEEE GLOBECOM, the IEEE WCNC, the IEEE SmartGridComm, BODYNETS, and the IEEE DASC. He is also a member of the China Computer Federation and the China Association for Cryptologic Research.

Dongxiao Liu

Dongxiao Liu (S’14) received the B.S. degree from the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China, in 2013, where he is currently pursuing the master’s degree with the School of Computer Science and Engineering. He serves as a reviewer of Peer-to-Peer Networking and Application. His research interests include cryptography, cloud computing security, and the secure smart grid.

Yuanshun Dai

Yuanshun Dai (M’03) received the B.S. degree from Tsinghua University, Beijing, China, in 2000, and the Ph.D. degree from the National University of Singapore, Singapore, in 2003. He is currently the Dean of the School of Computer Science and Engineering with the University of Electronic Science and Technology of China, Chengdu, China, where he is also the Chaired Professor and Director of the Collaborative Autonomic Computing Laboratory. He has served as the Chairman of the Professor Committee with the School of Computer Science and Engineering since 2012, and the Associate Director at the Youth Committee of the National 1000 Year Plan in China. He has authored over 100 papers and five books, out of which 50 papers were indexed by SCI, including 25 IEEE TRANSACTIONS/ACM Transactions papers. His current research interests include cloud computing and big data, reliability and security, modeling, and optimization. He has served as a Guest Editor of the IEEE TRANSACTIONS ON RELIABILITY. He is also on the Editorial Boards of several journals.

Tom H. Luan

Tom H. Luan (M’13) received the B.Sc. degree from Xi’an Jiaotong University, Xi’an, China, in 2004, the M.Phil. degree from the Hong Kong University of Science and Technology, Hong Kong, in 2007, and the Ph.D. degree from the University of Waterloo, Waterloo, ON, Canada, in 2012. Since 2013, he has been a Lecturer in Mobile and Applications with the School of Information Technology, Deakin University, Melbourne, VIC, Australia. His research mainly focuses on vehicular networking, wireless content distribution, peer-to-peer networking, and mobile cloud computing.

Xuemin Sherman Shen

Xuemin (Sherman) Shen (F’09) is currently a Professor and the University Research Chair of the Department of Electrical and Computer Engineering with the University of Waterloo, Waterloo, ON, Canada. He was the Associate Chair for Graduate Studies from 2004 to 2008. His research focuses on resource management in interconnected wireless/wired networks, wireless network security, and vehicular ad hoc and sensor networks. He served as the Technical Program Committee Chair of the IEEE VTC’10 Fall and the IEEE Globecom’07. He also served as the Editor-in-Chief of the IEEE Network, Peer-to-Peer Networking and Applications, and IET Communications, a Founding Area Editor of the IEEE Transactions on Wireless Communications, and an Associate Editor of the IEEE Transactions on Vehicular Technology and Computer Networks. He is also a registered Professional Engineer in Ontario, Canada, a fellow of the Engineering Institute of Canada and the Canadian Academy of Engineering, and a Distinguished Lecturer of the IEEE Vehicular Technology Society and the Communications Society.
