SECTION I

## INTRODUCTION

Smart grid has emerged as a new concept and a promising solution for intelligent electricity generation, transmission, distribution and control [1]. The use of robust two-way communications and distributed computing technology improves the efficiency and reliability of power delivery and usage [2]. Many utility companies have begun to use smart grid information systems to collect real-time metering data at their control centers, via a reliable communication network deployed in parallel to the power transmission and distribution grid [3], as shown in Fig. 1. In the smart grid information system, smart meters are deployed at residential users' premises as two-way communication devices [4], [5]; they periodically record the power consumption and report their metering data to a local area gateway, e.g., a wireless access point (AP). The gateway then collects and forwards the data to a control center. Additionally, metering data in smart grid information systems should be periodically audited to ensure that the billing and pricing statements are presented fairly [6]. Specifically, requesters, such as market analysts, are tasked with querying smart grid information systems for auditing, analysis, accounting or tax-related activities [7]. Thus, to prevent the private and sensitive information in the metering data from being disclosed, data confidentiality and privacy should be achieved in financial audits for smart grid.

Fig. 1. The conceptual smart grid architecture.

However, the metering data in smart grid are surging from 10,780 terabytes (TB) in 2010 to over 75,200 TB in 2015 [8], which is far beyond the control center's data management capability. Outsourcing data to cloud servers is a promising approach to relieve the control center of the burden of storing and maintaining such a large amount of data. In this approach, users can store their data on cloud servers and execute computation and queries using the servers' computational capabilities [9]. Nevertheless, cloud servers may be untrusted and may intentionally share sensitive data with third parties for commercial purposes. Therefore, data confidentiality is important in financial audits for smart grid.

In addition, privacy concerns arise in financial auditing [10]. For instance, utility usage patterns within short intervals may reveal the users' regular daily activities [11]. In particular, data from a single house would reveal the activities of the residents, e.g., when an individual resident is at home or when he/she is watching TV [3]. If an attacker can query these data, data privacy might be violated. Therefore, users' data confidentiality and privacy should be protected, and only authorized requesters should be able to query the metering data.

From the requester's perspective, the requester, who manages the data query for financial auditing, needs to frequently query the metering data by using date ranges and/or geographic regions etc. If the query is sensitive, the requesters may prefer to keep their queries from being exposed to servers. As a result, how to operate such range queries with guaranteed query privacy is also significant for smart grid.

In this paper, we propose a Privacy-preserving Range Query (PaRQ) scheme over encrypted metering data for smart grid. The PaRQ addresses the data confidentiality and privacy problem by introducing a hidden vector encryption (HVE) technique. The main contributions of this paper are twofold.

• Firstly, we construct a range query predicate based on the HVE. Specifically, the session keys and the searchable attributes of the encrypted data are hidden in the HVE-based range query predicate. When a requester queries the cloud server, the session keys whose encryption vectors satisfy the range query vectors are released to the requester for decrypting the encrypted metering data.
• Secondly, we analyze the security strengths and evaluate the performance of the PaRQ. Security analysis demonstrates that the PaRQ can achieve users' data confidentiality and privacy, as well as requesters' query privacy. Performance evaluation results show that our PaRQ can reduce the communication and computation overhead, and shorten the response time.

The remainder of this paper is organized as follows. In Section II, we investigate the related works. In Section III, we introduce our system model, security requirements and our design goals. Then, in Section IV, we review some preliminaries. In Section V, we present our PaRQ scheme, followed by its security analysis and performance evaluation in Section VI and Section VII, respectively. Finally, we conclude this paper in Section VIII.

SECTION II

## RELATED WORKS

### A. Security and Privacy in Smart Grid

Security and privacy are critical to the development of wireless networks [10], especially for the real-time data audit strategy in smart grid. The Smart Grid Interoperability Panel Cyber Security Working Group [6] presents guidelines for smart grid cyber security, including security strategy, architecture, and high-level requirements. Li [11] reviews the cyber security and privacy issues in smart grid and discusses some security and privacy solutions for smart grid. Lu et al. [3] use a super-increasing sequence to structure multidimensional data and encrypt the structured data with the homomorphic Paillier cryptosystem. Li et al. [12] propose an authentication scheme based on the Merkle tree for smart grid. Acs and Castelluccia [13] exploit a privacy-preserving aggregation technique for time-series data in smart meters. They employ a differential privacy model in which users add noise to their electricity metering data, and the aggregator can obtain the sum of the metering data with very high probability. In summary, few works focus on queries, especially range queries over encrypted data in smart grid, which are significant for auditing users' metering data.

### B. Range Query

Recently, the problem of querying encrypted data has been extensively investigated in both the cryptography and database communities. One of the widely studied approaches is public key encryption with keyword search (PEKS) [14]. PEKS can protect users' data privacy and certain query privacy. However, most PEKS schemes, such as the Searchable Encryption Scheme for Auction (SESA) [15], can only be applied to equality checks. Range queries over encrypted data with numeric attributes are more difficult, and most of the existing literature cannot achieve data and query privacy simultaneously.

Roughly speaking, four categories of solutions have been developed for range queries: order-preserving encryption (OPE), bucketization (Bucket), HVE and special data structure traversal. The OPE-based technique [16] ensures that the order of plaintext data is preserved in the ciphertext domain. This allows direct translation of a range predicate from the original domain to the ciphertext domain. However, the coupling distribution of the plaintext and ciphertext domains might be exploited by attackers to guess the scope of the corresponding plaintext for a ciphertext [17]. The bucket-based technique [18] uses distributional properties of the datasets to partition and index data for efficient querying while trying to keep the information disclosure to a minimum. Queries are evaluated in an approximate manner, so the returned set of records may contain some false positives.

In an HVE-based approach [19], two vectors over attributes are associated with a ciphertext and a token, respectively. Under the predicate translator, the ciphertext matches the token if and only if the two vectors are component-wise equal. Several HVE schemes [20], [21], [22] have been proposed in the literature. All of them use bilinear groups equipped with bilinear maps, and each constructs a proper method to hide attributes in an encrypted vector. However, it is expensive to compute exponentiations and pairings in a composite-order group. Jong [20] proposes a new HVE scheme that not only works in prime-order groups, but also requires a shorter token size and fewer pairing computations. However, Jong's scheme cannot be directly applied to smart grid applications where data are high in dimension, variety or both.

Some specialized data structures for range query evaluation, such as the ${\rm B}^{+}$-tree, try to preserve notions of semantic security of the encrypted data. Recently, Shi et al. [23] propose a searchable encryption scheme that supports multidimensional range queries over encrypted data (MRQED). The MRQED utilizes an interval tree structure to form a hierarchical representation of intervals for each dimension and stores multiple ciphertexts corresponding to a single data value on the server, i.e., each one corresponds to a range. If it is applied to single-dimensional data with values belonging to a domain of size $N$, the ciphertext representation is $O(\log N)$ times the size of the actual data. If the MRQED is applied to a piece of data with $l$ dimensions, each query requires $l$ times the complexity to execute.

SECTION III

## SYSTEM MODEL, SECURITY REQUIREMENTS AND DESIGN GOAL

In this section, we formalize the system model, and identify the security requirements and our design goals.

### A. System Model

Our focus is on how to outsource residential users' metering data to a cloud server in encrypted form and how to operate a range query over the encrypted metering data with the help of the control center (CC). Specifically, we consider a typical residential area, as shown in Fig. 2, which is composed of a CC, two cloud servers: the $CS_{1}$ and $CS_{2}$, a requester $S$ and some residential users $\BBU=\{U_{1},U_{2},\ldots,U_{v}\}$.

Fig. 2. System model of PaRQ.

A residential user is the data owner, who encrypts his data by using a secret session key before outsourcing the data to the CSs. There are two cloud servers: Cloud Server 1 $(CS_{1})$ stores data ciphertexts; Cloud Server 2 $(CS_{2})$ stores session key's ciphertexts and indexes. Both servers are semi-trusted, honest but curious. We assume that either the $CS_{1}$ or $CS_{2}$ might be compromised and controlled by an adversary seeking to link users' ciphertexts with their keys, but the adversary cannot control both CSs. The control center is a trusted proxy (it operates on behalf of the utility companies), which can help users to deposit their data to cloud servers and generate query tokens for requesters to retrieve data from the servers. The requester can query the encrypted data on the cloud servers by depositing his entitling tokens to the $CS_{2}$.

The CC consists of two main components: a ciphertext forwarder, and a query translator which always operates within the secure environment. The forwarder on the CC needs to add a unique index to the data ciphertexts and the session key's ciphertexts. To preserve the query privacy, the requester's query needs to be translated into two tokens, so that the $CS_{2}$ can evaluate this query without disclosing its real value.

### B. Security Requirements

We identify the security requirements for our PaRQ. In our security model, the CC is trusted, and the residential users $\BBU=\{U_{1},U_{2},\ldots,U_{v}\}$ are honest as well. However, there exists an adversary ${\cal A}$ in the system that intends to eavesdrop and invade the database on the cloud servers to steal individual users' reports. In addition, ${\cal A}$ can also launch some active attacks to threaten data privacy and query privacy. Therefore, in order to prevent ${\cal A}$ from learning the users' data and to detect its malicious actions, the following security requirements should be satisfied in range query applications for smart grid.

• Data confidentiality: The residential user can utilize symmetric or asymmetric cryptography to encrypt the data before outsourcing, and thereby prevent unauthorized entities, including eavesdroppers and cloud servers, from prying into the outsourced data.
• Data privacy: Individual residential users' data should not be accessed by unauthorized requesters. That is, only requesters with authorized query tokens can access the $CS_{2}$, and they obtain the correct session keys only when the encryption vectors satisfy the query vectors in their tokens. Thus, only the authorized requester can decrypt the encrypted metering data.
• Query privacy: Requesters usually prefer to keep their queries from being exposed to others, so queries should be hidden in tokens to protect query privacy. Otherwise, if the query includes some sensitive information, such as “$5\leq priority\leq 7$”, the $CS_{2}$ could learn that the requester is querying some important users' metering data. The requester or the query results could then be traced or analyzed by the curious server $CS_{2}$.

### C. Design Goal

To enable effective range query over encrypted metering data under the aforementioned model, our design goal is to develop a privacy-preserving range query scheme over encrypted data for smart grid, and to achieve the security of the data and efficient range query as follows.

• The security requirements should be guaranteed in the proposed scheme. As stated above, if the smart grid does not consider security, the residential users' privacy could be disclosed, and the real-time power metering reports could be stolen. Therefore, the proposed scheme should achieve data confidentiality and privacy, as well as query privacy.
• Performance efficiency should be achieved in the proposed scheme. As range queries are operated over encrypted multidimensional data, the proposed PaRQ scheme should improve on existing schemes in communication, computation and response time complexity.

SECTION IV

## PRELIMINARIES

In this section, we briefly describe the basic definitions and properties of bilinear pairings and HVE, which serve as the basis of the PaRQ.

### A. Bilinear Pairing

Bilinear pairing is an important cryptographic primitive [24]. Let $\BBG_{1}$ and $\BBG_{2}$ be two cyclic multiplicative groups of prime order $q$. Let $a$ and $b$ be elements of $Z_{q}^{\ast}$. We assume that the discrete logarithm problem (DLP) in both $\BBG_{1}$ and $\BBG_{2}$ is hard. Let $g$ be a generator of $\BBG_{1}$. A bilinear pairing is a map $e:\BBG_{1}\times\BBG_{1}\rightarrow\BBG_{2}$ with the following properties.

1. Bilinear: $e(g^{a},h^{b})=e(g,h)^{ab}$ for any $(g,h)\in\BBG_{1}^{2}$.
2. Non-degenerate: $e(g,h)\ne 1_{\BBG_{2}}$ whenever $g$, $h\ne 1_{\BBG_{1}}$.
3. Computable: There is an efficient algorithm to compute $e(g,h)\in\BBG_{2}$ for all $(g,h)\in\BBG_{1}^{2}$.
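
The bilinearity property can be checked numerically in a toy model. The sketch below is only an illustration, not a real pairing: it assumes that each element $g^{a}$ of $\BBG_{1}$ is represented by its exponent $a$ modulo a small prime $q$, which exposes discrete logarithms and is therefore insecure. The names `pair`, `g1_exp` and `g2_exp` are hypothetical helpers introduced here.

```python
# Toy model only: a G1 element g0^a is represented by its exponent a mod q,
# and a G2 element e(g0, g0)^c by its exponent c mod q. This exposes all
# discrete logarithms, so it is insecure; it only illustrates the algebra.
q = 1009  # small prime group order (hypothetical value)


def g1_exp(a, k):
    """(g0^a)^k = g0^(a*k) in G1."""
    return (a * k) % q


def pair(a, b):
    """e(g0^a, g0^b) = e(g0, g0)^(a*b)."""
    return (a * b) % q


def g2_exp(c, k):
    """(e(g0, g0)^c)^k in G2."""
    return (c * k) % q


g, h = 5, 7        # two non-identity elements of G1 (as exponents)
a, b = 123, 456    # scalars from Z_q^*

lhs = pair(g1_exp(g, a), g1_exp(h, b))   # e(g^a, h^b)
rhs = g2_exp(pair(g, h), a * b)          # e(g, h)^(ab)
assert lhs == rhs                        # bilinearity
assert pair(g, h) != 0                   # non-degeneracy: not the identity
```

In this representation the identity element corresponds to exponent 0, so the second assertion mirrors the non-degeneracy property.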

#### Definition 1

A bilinear parameter generator ${\cal G}en$ is a probabilistic algorithm that takes a security parameter $\kappa$ as input, and outputs a 5-tuple $(q, g,\BBG_{1},\BBG_{2}, e)$.

### B. HVE-Based Query Predicate

The concept of HVE was proposed by Boneh and Waters [19]. HVE is a type of predicate encryption in which two vectors over attributes are associated with a ciphertext and a token, respectively. At a high level, the ciphertext matches the token if and only if the two vectors are component-wise equal. There are two character sets $\Sigma$ and $\Sigma_{\ast}=\Sigma\cup\{\ast\}$ in the setting of HVE. Here $\Sigma$ is an arbitrary set of attributes, and we assume $\Sigma=\BBZ_{q}$; $\ast$ is a special symbol denoting a wildcard component, which means that the component related to $\ast$ is not involved with any attribute. HVE mainly consists of four phases: key generation, data encryption, token generation and data query.

#### 1) Equality Query

• In the key generation phase, a trusted authority (TA) distributes the public/private key pair $(PK,SK)$ to a receiver.
• In the data encryption phase, a user chooses a vector ${\bf x}=(x_{1},\ldots,x_{l})\in\Sigma^{l}$ to characterize its data and encrypts its data $m$ into a ciphertext ${\ssr CT}$ using the receiver's public key.
• In the token generation phase, the receiver chooses a vector ${\bf w}=(w_{1},\ldots,w_{l})\in(\Sigma_{\ast})^{l}$ to represent his query requirements and generates a query token $T_{w}$. The receiver sends $T_{w}$ to the server.
• In the data query phase, if ${\bf x}$ matches ${\bf w}$, the token can decrypt the ciphertext by using the receiver's private key. The matching condition is defined as follows: let $s(w)$ be the set of indexes $i$ such that $w_{i}$ is not a wildcard in the vector ${\bf w}=(w_{1},\ldots,w_{l})$. For the vectors ${\bf x}$ and ${\bf w}$, let $P_{\bf w}({\bf x})$ be the following equality predicate: $$P_{\bf w}({\bf x})=\cases{1, &\hbox{if } w_{i}=x_{i} \hbox{ for all } i\in s(w),\cr 0,&\hbox{otherwise.}\cr}\eqno{\hbox{(1)}}$$ Then, the server can disclose the data $m$ if the equality predicate $P_{\bf w}({\bf x})=1$.
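
The equality predicate in (1) can be sketched in plain Python, leaving out all cryptography. This is a minimal illustration of the matching rule only; the function names and the use of the string `"*"` for the wildcard are assumptions made for this example.

```python
WILDCARD = "*"


def non_wildcard_indexes(w):
    """Return s(w): the positions of w that are not the wildcard."""
    return [i for i, w_i in enumerate(w) if w_i != WILDCARD]


def equality_predicate(w, x):
    """Eq. (1): 1 if x agrees with w on every non-wildcard position, else 0."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return int(all(w[i] == x[i] for i in non_wildcard_indexes(w)))


# The wildcard in the second position ignores the second attribute.
assert equality_predicate((3, WILDCARD, 7), (3, 5, 7)) == 1
assert equality_predicate((3, WILDCARD, 7), (3, 5, 8)) == 0
```

In the actual HVE construction the server evaluates this predicate on ciphertexts and tokens without seeing ${\bf x}$ or ${\bf w}$ in the clear; the sketch only shows which pairs match.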

#### 2) Comparison Query

If we map the $i$th component $x_{i}$ of ${\bf x}$ to its domain $\{1,\ldots,n\}$ as in [20], the value of $x_{i}$ is one of the numbers $j\in\{1,\ldots,n\}$. The key generation phase is the same as in the equality query above.

Then, in the data encryption phase, the user builds an encryption vector $\sigma ({\bf x})=(\sigma_{i,j})\in\{0,1\}^{nl}$ for ${\bf x}=(x_{1},\ldots, x_{l})\in\{1,\ldots,n\}^{l}$, as follows: $$\sigma_{i,j}=\cases{1, &\hbox{if } x_{i}\geq j,\cr 0, &\hbox{otherwise,}\cr}\eqno{\hbox{(2)}}$$ where $i\in\{1,\ldots,l\}$ and $j\in\{1,\ldots,n\}$. For example, let $l=3$, $n=5$ and ${\bf x}=(1,3,2)$. Thus ${\bf x}=(x_{1},\ldots,x_{l})\in\{1,\ldots,n\}^{l}=\{1,2,3,4,5\}^{3}$ and the corresponding encryption vector is $\sigma ({\bf x})=(10000, 11100, 11000)$. Then, the data should be encrypted under the encryption vector $\sigma ({\bf x})$.

Next, in the token generation phase, the receiver builds a query vector $\sigma^{\ast}({\bf w})=(\sigma_{i,j}^{\ast})\in\{0,1,\ast\}^{nl}$ for ${\bf w}=(w_{1},\ldots, w_{l})\in\{1,\ldots,n\}^{l}$ as follows: $$\sigma_{i,j}^{\ast}=\cases{1, &\hbox{if } w_{i}=j,\cr\ast, &\hbox{otherwise.}\cr}\eqno{\hbox{(3)}}$$ Similarly, assume $l=3$, $n=5$ and ${\bf w}=(w_{1},\ldots,w_{l})\in\{1,\ldots,n\}^{l}=\{1,2,3,4,5\}^{3}$. If the receiver's query condition is $P=({x}_{1}\geq 1)\wedge ({x}_{2}\geq 3)\wedge ({x}_{3}\geq 1)$, i.e., ${\bf w}=(1,3,1)$, then the query vector is $\sigma^{\ast}({\bf w})=(1\ast\ast\ast\ast,\ast\ast 1\ast\ast,1\ast\ast\ast\ast)$. Note that the number of elements in $\sigma^{\ast}({\bf w})$ is $nl$.

In the data query phase, let $s(\sigma^{\ast}(w))$ denote the set of all indexes $k\in\{1,\ldots,nl\}$ that satisfy $\sigma_{k}^{\ast}\ne\ast$. Let $P_{\sigma^{\ast}({\bf w})}(\sigma ({\bf x}))$ be the following comparison predicate, where $\sigma^{\ast}(w_{k})$ and $\sigma (x_{k})$ denote the $k$th components of $\sigma^{\ast}({\bf w})$ and $\sigma ({\bf x})$: $$\eqalignno{& P_{\sigma^{\ast}({\bf w})}(\sigma ({\bf x}))\cr &\quad=\cases{1, & \hbox{if } \sigma^{\ast}(w_{k})=\sigma (x_{k}) \hbox{ for all } k\in s(\sigma^{\ast}(w)),\cr 0, & \hbox{otherwise.}}&{\hbox{(4)}}}$$ Finally, the server can disclose the data $m$ if the comparison predicate $P_{\sigma^{\ast}({\bf w})}(\sigma ({\bf x}))=1$.
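
The encodings (2) and (3) and the predicate (4) can be reproduced with a short Python sketch that works on plaintext vectors only. The function names are hypothetical, and the wildcard is represented by the string `"*"`; the example values are the ones used in the text.

```python
def encryption_vector(x, n):
    """Eq. (2): sigma_{i,j} = 1 if x_i >= j else 0, flattened to length n*l."""
    return [1 if x_i >= j else 0 for x_i in x for j in range(1, n + 1)]


def query_vector(w, n):
    """Eq. (3): sigma*_{i,j} = 1 if w_i == j, wildcard otherwise."""
    return [1 if w_i == j else "*" for w_i in w for j in range(1, n + 1)]


def comparison_predicate(sigma_star, sigma):
    """Eq. (4): 1 if sigma agrees with sigma* on all non-wildcard positions."""
    return int(all(s == t for s, t in zip(sigma_star, sigma) if s != "*"))


n = 5
sigma = encryption_vector((1, 3, 2), n)      # (10000, 11100, 11000)
sigma_star = query_vector((1, 3, 1), n)      # (1****, **1**, 1****)

# x = (1, 3, 2) satisfies x1 >= 1, x2 >= 3 and x3 >= 1.
assert comparison_predicate(sigma_star, sigma) == 1
# It does not satisfy x3 >= 3.
assert comparison_predicate(query_vector((1, 3, 3), n), sigma) == 0
```

Each query component touches exactly one of the $n$ positions of an attribute, which is why a single equality check at position $w_{i}$ decides whether $x_{i}\geq w_{i}$.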

SECTION V

## THE PROPOSED PARQ SCHEME

In this section, we present the details of the PaRQ. There are three major phases in our scheme: construction of the range query predicate phase, encrypted data deposit phase and range query phase. Firstly, we introduce the construction of the range query predicate phase.

### A. Construction of the Range Query Predicate

Inspired by the equality and comparison predicates, we extend them to support a range query predicate. Specifically, we can achieve the opposite semantics of the above comparison query, i.e., $x_{i}\leq j$, by constructing the vector $\sigma ({\bf x})$ in a reverse manner as in (5): $$\sigma_{i,j}=\cases{1, &\hbox{if } x_{i}\leq j,\cr 0, &\hbox{otherwise.}\cr}\eqno{\hbox{(5)}}$$ Thus, the HVE scheme can support range queries, such as $a\leq x_{i}\leq b$. Table I illustrates the notations used in this paper. The key generation phase is the same as in the equality query above.

TABLE I THE NOTATIONS USED IN THIS PAPER.

In the data encryption phase, the residential user defines two encryption vectors, $\sigma_{\geq}({\bf x})$ and $\sigma_{\leq}({\bf x})$, according to (2) and (5), respectively. The receiver can obtain the correct data if and only if both conditions $x_{i}\geq a$ and $x_{i}\leq b$ hold. If the data to be encrypted in HVE is $\Omega$, the residential user $U_{i}\in\BBU$ splits $\Omega$ into two parts by the following steps: 1) $U_{i}$ randomly chooses a polynomial $f(x)=a^{\prime}x+\Omega$, where $a^{\prime}$ is a random coefficient. 2) $U_{i}$ chooses two random integers and computes two data shares $\Omega_{{\ssr L}}$ and $\Omega_{{\ssr R}}$, so that $\Omega$ is divided into two parts. $U_{i}$ then encrypts $\Omega_{{\ssr L}}$ and $\Omega_{{\ssr R}}$ under the vectors $\sigma_{\geq}({\bf x})$ and $\sigma_{\leq}({\bf x})$, respectively.
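
The two-share splitting can be sketched as follows. The text does not spell out how the shares are computed or recombined, so this example makes several assumptions: arithmetic is done modulo a prime `Q`, the shares are the values of $f$ at two distinct nonzero points `x_l` and `x_r` that are known at recovery time, and $\Omega=f(0)$ is recovered by Lagrange interpolation. All names are hypothetical.

```python
import secrets

Q = 2**127 - 1  # a Mersenne prime, used here as a hypothetical field modulus


def split(omega, x_l, x_r, q=Q):
    """Split omega into two shares on the line f(x) = a*x + omega (mod q)."""
    if x_l % q == x_r % q or x_l % q == 0 or x_r % q == 0:
        raise ValueError("evaluation points must be distinct and nonzero")
    a = secrets.randbelow(q - 1) + 1          # random nonzero coefficient a'
    return (a * x_l + omega) % q, (a * x_r + omega) % q


def recover(share_l, share_r, x_l, x_r, q=Q):
    """Recover omega = f(0) by Lagrange interpolation through both points."""
    inv = pow(x_r - x_l, -1, q)               # modular inverse (Python 3.8+)
    return ((share_l * x_r - share_r * x_l) * inv) % q


omega = 123456789
omega_l, omega_r = split(omega, x_l=1, x_r=2)
assert recover(omega_l, omega_r, 1, 2) == omega
```

Under these assumptions a single share is a uniformly masked value and reveals nothing about $\Omega$ on its own, which matches the requirement that both conditions must hold before the data can be obtained.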

In the token generation phase, the requester's range query is defined by two vectors, $\sigma_{\geq}^{\ast}({\bf w})$ and $\sigma_{\leq}^{\ast}({\bf w})$, built as in (3) with $w_{i}=a$ and $w_{i}=b$, respectively. Let $s(\sigma_{\geq}^{\ast}(w))$ be the set of all indexes $k$ that satisfy $\sigma_{\geq}^{\ast}(w_{k})\ne\ast$, and $s(\sigma_{\leq}^{\ast}(w))$ be the set of all indexes $k^{\prime}$ that satisfy $\sigma_{\leq}^{\ast}(w_{k^{\prime}})\ne\ast$. Here, $k, k^{\prime}\in \{1,\ldots,nl\}$. Finally, in the data query phase, the server checks two comparison predicates $P_{\sigma_{\geq}^{\ast}({\bf w})}(\sigma_{\geq}({\bf x}))$ and $P_{\sigma_{\leq}^{\ast}({\bf w})}(\sigma_{\leq}({\bf x}))$, which are defined as in (4). The server can obtain $\Omega_{{\ssr L}}$ if $\sigma_{\geq}(x_{k})$ and $\sigma_{\geq}^{\ast}(w_{k})$ are equal for all $k\in s(\sigma_{\geq}^{\ast}(w))$, i.e., $P_{\sigma_{\geq}^{\ast}({\bf w})}(\sigma_{\geq}({\bf x}))=1$. Similarly, the server can obtain $\Omega_{{\ssr R}}$ if $\sigma_{\leq}(x_{k^{\prime}})$ and $\sigma_{\leq}^{\ast}(w_{k^{\prime}})$ are equal for all $k^{\prime}\in s(\sigma_{\leq}^{\ast}(w))$, i.e., $P_{\sigma_{\leq}^{\ast}({\bf w})}(\sigma_{\leq}({\bf x}))=1$. The range query predicate can be denoted as follows: $$\eqalignno{&P_{(\sigma_{\geq}^{\ast}({\bf w}),\sigma_{\leq}^{\ast}({\bf w}))}(\sigma_{\geq}({\bf x}),\sigma_{\leq}({\bf x}))\cr& =\cases{1, & \hbox{if } P_{\sigma_{\geq}^{\ast}({\bf w})}(\sigma_{\geq}({\bf x}))=1 \hbox{ and } P_{\sigma_{\leq}^{\ast}({\bf w})}(\sigma_{\leq}({\bf x}))=1,\cr 0, & \hbox{otherwise.}}&{\hbox{(6)}}}$$ Finally, if $P_{(\sigma_{\geq}^{\ast}({\bf w}),\sigma_{\leq}^{\ast}({\bf w}))}(\sigma_{\geq}({\bf x}),\sigma_{\leq}({\bf x}))=1$, the server can recover $\Omega_{{\ssr L}}$ and $\Omega_{{\ssr R}}$. Then, the data $\Omega$ can be computed.
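
The range query predicate (6) combines the two encodings (2) and (5) with the query vector (3). The following Python sketch evaluates it on plaintext vectors to show which records a range query selects; it is an illustration with hypothetical function names and omits the HVE encryption itself.

```python
def sigma_ge(x, n):
    """Eq. (2): position (i, j) is 1 if x_i >= j."""
    return [1 if x_i >= j else 0 for x_i in x for j in range(1, n + 1)]


def sigma_le(x, n):
    """Eq. (5): position (i, j) is 1 if x_i <= j."""
    return [1 if x_i <= j else 0 for x_i in x for j in range(1, n + 1)]


def sigma_star(w, n):
    """Eq. (3): position (i, j) is 1 if w_i == j, wildcard otherwise."""
    return [1 if w_i == j else "*" for w_i in w for j in range(1, n + 1)]


def matches(token_vec, enc_vec):
    """Eq. (4): agreement on every non-wildcard position."""
    return all(s == t for s, t in zip(token_vec, enc_vec) if s != "*")


def range_predicate(lower, upper, x, n):
    """Eq. (6): 1 if lower_i <= x_i <= upper_i holds in every dimension."""
    ge_ok = matches(sigma_star(lower, n), sigma_ge(x, n))
    le_ok = matches(sigma_star(upper, n), sigma_le(x, n))
    return int(ge_ok and le_ok)


x = (1, 3, 2)
assert range_predicate((1, 2, 1), (2, 4, 3), x, n=5) == 1   # all bounds hold
assert range_predicate((1, 2, 1), (2, 2, 3), x, n=5) == 0   # x2 = 3 > 2
```

Because the two predicates are checked against separately encrypted shares, a record that satisfies only one bound releases only one share.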

The main procedures of range query on encrypted data in smart grid are illustrated in Fig. 3. The CC is not only the data forwarder but also the query translator. In the encrypted data deposit phase, before outsourcing his data, a residential user $U_{i}$ encrypts his data $m$ into a ciphertext ${\ssr C}$ under a randomly chosen secret session key $ks$. At the same time, $U_{i}$ hides $ks$ and $m$'s searchable attributes in another ciphertext ${\ssr CT}$ by using the HVE range query predicate and the CC's public key $PK$. Note that $\Omega=ks$ in our PaRQ. Then, $U_{i}$ deposits both ciphertexts ${\ssr C}$ and ${\ssr CT}$ to the CC. The CC adds an index $Ind$ to both ${\ssr C}$ and ${\ssr CT}$. Then the CC transmits $\{Ind,{\ssr C}\}$ to the $CS_{1}$ and $\{Ind,{\ssr CT}\}$ to the $CS_{2}$.

Fig. 3. Data query procedures (a) Encrypted Data Deposit Phase (ED) (b) Range Query Phase (RQ).

As shown in Fig. 3, when a requester S posts a range query, the query is translated into query tokens by using the CC's private key. Then, the requester deposits its tokens to the $CS_{2}$ to retrieve the session key $ks$ and index $Ind$. The session keys whose encryption vectors satisfy the range query vectors are released to the requester together with their indexes. The requester retrieves the corresponding ciphertext ${\ssr C}$ from the $CS_{1}$ by using the received index $Ind$. Then, the requester can recover the original data by using the secret key $ks$ to decrypt ${\ssr C}$.
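
The deposit and query flow described above can be mimicked with a small Python simulation. It is only a sketch of the message routing and makes strong simplifying assumptions: `toy_cipher` is an insecure stand-in for the symmetric cipher, the HVE ciphertext ${\ssr CT}$ is replaced by a plain dictionary holding the attributes and $ks$, and the token generation by the CC is omitted. In the actual scheme the $CS_{2}$ never sees the attributes in the clear. All names are hypothetical.

```python
import hashlib
import itertools
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 keystream; a stand-in for AES, NOT secure."""
    stream = b"".join(
        hashlib.sha256(key + i.to_bytes(4, "big")).digest()
        for i in range(len(data) // 32 + 1)
    )
    return bytes(d ^ s for d, s in zip(data, stream))


cs1, cs2 = {}, {}                 # CS1: Ind -> C, CS2: Ind -> CT
next_index = itertools.count(1)   # the CC's unique index generator


def deposit(m: bytes, x: tuple) -> None:
    """User encrypts m under a fresh ks; the CC indexes and forwards."""
    ks = secrets.token_bytes(16)
    c = toy_cipher(ks, m)
    ct = {"x": x, "ks": ks}       # placeholder for the HVE ciphertext CT
    ind = next(next_index)
    cs1[ind], cs2[ind] = c, ct


def range_query(lower: tuple, upper: tuple) -> list:
    """CS2 releases (ks, Ind) for matching records; S fetches C from CS1."""
    hits = [(ct["ks"], ind) for ind, ct in cs2.items()
            if all(a <= v <= b for a, v, b in zip(lower, ct["x"], upper))]
    return [toy_cipher(ks, cs1[ind]) for ks, ind in hits]


deposit(b"meter reading A", (1, 3, 2))
deposit(b"meter reading B", (5, 5, 5))
print(range_query((1, 2, 1), (2, 4, 3)))   # prints [b'meter reading A']
```

The point of the split is visible in the two dictionaries: `cs1` holds only data ciphertexts and `cs2` holds only key material, linked by the index that the CC assigns.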

### B. The Encrypted Data Deposit Phase

#### 1) Key Generation

For a single-authority smart grid system, a trusted authority (TA) can bootstrap the whole system. Specifically, in the key generation phase, given the security parameter $\kappa$, the TA first generates $(q, g,\BBG_{1},\BBG_{2}, e)$ by running ${\cal G}en(\kappa)$. The TA randomly chooses a master key $r\in Z_{q}^{\ast}$, and computes the corresponding public key $g^{r}$. Thus, $(g^{r},r)$ is the public/private key pair of the TA. When S applies for a query, the TA assigns an ID-based key pair $(H_{1}(ID_{S}),H_{1}^{r}(ID_{S}))$, denoted as $(pk_{s}, sk_{s})$, to S. The TA selects some random elements $g_{1}$, $g_{2}$, $(h_{1},u_{1},\psi_{1}),\ldots, (h_{nl},u_{nl},\psi_{nl})\in\BBG_{1}$. The TA also picks random numbers $y_{1},y_{2},v_{1},\ldots,v_{nl}, t_{1},\ldots,t_{nl}\in\BBZ_{p}$. Then, the TA computes $Y_{1}=g^{y_{1}}$, $Y_{2}=g^{y_{2}}$, $V_{k}=g^{v_{k}}$ and $T_{k}=g^{t_{k}}\in\BBG_{1}$ for $k\in \{1,\ldots,nl\}$. In addition, the TA computes $\Gamma=e(g_{1},Y_{1})e(g_{2},Y_{2})\in\BBG_{2}$. Later, the TA distributes the HVE public/private key pair $(PK, SK)$ to the CC as follows: $$\eqalignno{PK=&\, (g,Y_{1},Y_{2},(h_{1},u_{1},\psi_{1},V_{1},T_{1}),\cr &\quad\ldots,(h_{nl},u_{nl},\psi_{nl},V_{nl},T_{nl}))\cr SK=&\,(g_{1},g_{2},y_{1},y_{2},v_{1},\ldots,v_{nl},t_{1},\ldots,t_{nl}).}$$ We assume that the communication channels are secure in our system model. An ID-based signature scheme $Sig(\cdot)$ [25] can be used to authenticate the requester's identity. The details of secure channel establishment are beyond the scope of this paper. Fig. 4 shows the data flow of the PaRQ.

Fig. 4. Data flow in our PaRQ scheme.

#### 2) Data Encryption

We denote each piece of data as $m_{i}$. When $U_{i}$ wants to report $m_{i}$ to the cloud server $CS_{1}$, $U_{i}$ randomly generates a session key $ks_{i}$. Then $U_{i}$ encrypts its data into a ciphertext ${\ssr C}_{i}$, where ${\ssr C}_{i}={\ssr Enc}_{ks_{i}}(m_{i})$ and ${\ssr Enc}(\cdot)$ is a symmetric encryption algorithm, e.g., AES [26].

In each uploading interval, $U_{i}$ sends the data ciphertext to the CC: $$U_{i}\rightarrow CC:\{{\ssr C}_{i},t_{1}\}.$$ In this paper, “$A\rightarrow B:\{C\}$” means “A sends C to B”. Then, the CC adds a unique index $Ind_{i}$ to the data ciphertext and transmits all of them to the $CS_{1}$: $$CC\rightarrow CS_{1}:\{{\ssr C}_{i},t_{1},Ind_{i}\}.$$

#### 3) HVE-Based Session Key Encryption

If each piece of data has $l$ searchable attributes, $U_{i}$ chooses a vector ${\bf x}_{i}=(x_{i1},\ldots,x_{il})\in\Sigma^{l}$ to characterize its data $m_{i}$ in different dimensions. To encrypt $ks_{i}$ by using the CC's $PK$ and the vector ${\bf x}_{i}$, $U_{i}$ divides each $ks_{i}$ into two parts: $ks_{i{\ssr L}}$ and $ks_{i{\ssr R}}$. Then, $ks_{i{\ssr L}}$ is encrypted by using the encryption vector $\sigma_{\geq}({\bf x}_{i})$, and $ks_{i{\ssr R}}$ is encrypted by using the encryption vector $\sigma_{\leq}({\bf x}_{i})$. Thus, the $CS_{2}$ can recover $ks_{i}$ only when both encryption vectors satisfy the corresponding query vectors in the range query tokens. The HVE-based session key encryption details are as follows:

1. Firstly, $U_{i}$ maps ${\bf x}_{i}$ to an encryption vector $\sigma_{\geq}({\bf x}_{i})$ as in (2). Then, $U_{i}$ selects two random numbers $r_{i1}$, $r_{i2}\in Z_{p}$ and computes the ciphertext components for $ks_{i{\ssr L}}$ by using the encryption vector $\sigma_{\geq}({\bf x}_{i})$ as: $$\eqalignno{&{\ssr C}^{i}_{{\ssr L}1}=Y_{1}^{r_{i1}},{\ssr C}^{i}_{{\ssr L}2}=Y_{2}^{r_{i1}},\cr &{\ssr C}^{i}_{{\ssr L}3,1}=(h_{1}u_{1}^{\sigma_{\geq}(x_{i1})})^{r_{i1}}V_{1}^{r_{i2}},\cr &\ldots,\cr &{\ssr C}^{i}_{{\ssr L}3,nl}=(h_{nl}u_{nl}^{\sigma_{\geq}(x_{inl})})^{r_{i1}}V_{nl}^{r_{i2}},\cr &{\ssr C}^{i}_{{\ssr L}4,1}=\psi_{1}^{r_{i1}}T_{1}^{r_{i2}},\cr &\ldots,\cr &{\ssr C}^{i}_{{\ssr L}4,nl}=\psi_{nl}^{r_{i1}}T_{nl}^{r_{i2}},\cr &{\ssr C}^{i}_{{\ssr L}5}=g^{r_{i2}},{\ssr C}^{i}_{{\ssr L}6}=\Gamma^{r_{i1}}ks_{i{\ssr L}}.&{\hbox{(7)}}}$$ Let ${\ssr CT}_{i{\ssr L}}=({\ssr C}^{i}_{{\ssr L}1},{\ssr C}^{i}_{{\ssr L}2},{\ssr C}^{i}_{{\ssr L}3,1},\ldots,{\ssr C}^{i}_{{\ssr L}3,nl},{\ssr C}^{i}_{{\ssr L}4,1},\ldots, {\ssr C}^{i}_{{\ssr L}4,nl}, {\ssr C}^{i}_{{\ssr L}5},{\ssr C}^{i}_{{\ssr L}6})$.
2. Secondly, $U_{i}$ maps ${\bf x}_{i}$ to an encryption vector $\sigma_{\leq}({\bf x}_{i})$ as in (5). Then, $U_{i}$ selects two random numbers $r^{\prime}_{i1}$, $r^{\prime}_{i2}\in Z_{p}$, and computes the ciphertext components for $ks_{i{\ssr R}}$ by using the encryption vector $\sigma_{\leq}({\bf x}_{i})$ as: $$\eqalignno{&{\ssr C}^{i}_{{\ssr R}1}=Y_{1}^{r^{\prime}_{i1}},{\ssr C}^{i}_{{\ssr R}2}=Y_{2}^{r^{\prime}_{i1}},\cr &{\ssr C}^{i}_{{\ssr R}3,1}=(h_{1}u_{1}^{\sigma_{\leq}(x_{i1})})^{r^{\prime}_{i1}}V_{1}^{r^{\prime}_{i2}},\cr &\ldots,\cr &{\ssr C}^{i}_{{\ssr R}3,nl}=(h_{nl}u_{nl}^{\sigma_{\leq}(x_{inl})})^{r^{\prime}_{i1}}V_{nl}^{r^{\prime}_{i2}},\cr &{\ssr C}^{i}_{{\ssr R}4,1}=\psi_{1}^{r^{\prime}_{i1}}T_{1}^{r^{\prime}_{i2}},\cr &\ldots,\cr &{\ssr C}^{i}_{{\ssr R}4,nl}=\psi_{nl}^{r^{\prime}_{i1}}T_{nl}^{r^{\prime}_{i2}},\cr &{\ssr C}^{i}_{{\ssr R}5}=g^{r^{\prime}_{i2}},{\ssr C}^{i}_{{\ssr R}6}=\Gamma^{r^{\prime}_{i1}}ks_{i{\ssr R}}.&{\hbox{(8)}}}$$ Let ${\ssr CT}_{i{\ssr R}}=({\ssr C}^{i}_{{\ssr R}1},{\ssr C}^{i}_{{\ssr R}2},{\ssr C}^{i}_{{\ssr R}3,1},\ldots,{\ssr C}^{i}_{{\ssr R}3,nl},{\ssr C}^{i}_{{\ssr R}4,1},\ldots, {\ssr C}^{i}_{{\ssr R}4,nl}, {\ssr C}^{i}_{{\ssr R}5},{\ssr C}^{i}_{{\ssr R}6})$.

#### 4) Ciphertext Deposit

$U_{i}$ deposits ${\ssr CT}_{i{\ssr L}}$ and ${\ssr CT}_{i{\ssr R}}$ to the CC as: $$U_{i}\rightarrow CC:\{{\ssr CT}_{i{\ssr L}},{\ssr CT}_{i{\ssr R}},t_{2}\}.$$ The CC also adds the index $Ind_{i}$ to the key ciphertext and transmits all of them to the $CS_{2}$: $$CC\rightarrow CS_{2}:\{{\ssr CT}_{i{\ssr L}},{\ssr CT}_{i{\ssr R}},t_{2},Ind_{i}\}.$$

### C. Range Query Phase

#### 1) Token Generation

When S wants to query the server to retrieve the expected data, S firstly generates a range query, such as $P=(a_{1}\leq{x}_{1}\leq b_{1})\wedge (a_{2}\leq{x}_{2}\leq b_{2})\cdots\wedge(a_{l}\leq {x}_{l}\leq b_{l})$. S then computes a signature $\delta_{s}=Sig_{sk_{s}}(P)$.

In each querying interval, S sends the query to the CC: $$S\rightarrow CC:\{P,ID_{S},\delta_{s}\}.$$ The CC verifies S's signature by using S's public key. If S is an authorized requester, the CC issues query tokens to S by the following steps. The CC divides $P$ into two parts: $P_{\geq}=({x}_{1}\geq a_{1})\wedge ({x}_{2}\geq a_{2})\cdots\wedge ({x}_{l}\geq a_{l})$ and $P_{\leq}=({x}_{1}\leq b_{1})\wedge ({x}_{2}\leq b_{2})\cdots\wedge ({x}_{l}\leq b_{l})$. Let $w_{\geq}=(a_{1},\ldots,a_{l})$ and $w_{\leq}=(b_{1},\ldots,b_{l})$.

The CC generates two query vectors $\sigma_{\geq}^{\ast}({\bf w})$ and $\sigma_{\leq}^{\ast}({\bf w})$ to represent $P_{\geq}$ and $P_{\leq}$ as in (3), respectively. The wildcard $\ast$ in the vector $\sigma_{\geq}^{\ast}({\bf w})$ means that S does not care about the attributes related to $\ast$. Let $s(\sigma_{\geq}^{\ast}(w))$ be the set of $k$ that satisfy $\sigma_{\geq}^{\ast}(w_{k})\ne\ast$. Let $s(\sigma_{\leq}^{\ast}(w))$ be the set of $k^{\prime}$ that satisfy $\sigma_{\leq}^{\ast}(w_{k^{\prime}})\ne\ast$. Then the CC computes a token $T_{P_{\geq}}$ by using the query vector $\sigma_{\geq}^{\ast}({\bf w})$ as follows.

1. Select random $\alpha$, $\beta\in Z_{p}$, and generate $\lambda_{k}$, $\varphi_{k}$, $\gamma_{k}$, $\tau_{k}\in Z_{p}$ such that $\lambda_{k}y_{1}+\varphi_{k}y_{2}=\alpha$, $\gamma_{k}y_{1}+\tau_{k}y_{2}=\beta$ for all $k\in s(\sigma_{\geq}^{\ast}(w))$.
2. Compute a token $T_{P_{\geq}}$ as $$\eqalignno{K^{s}_{{\ssr L}1}=&\, g_{1}\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}(h_{k}u_{k}^{\sigma_{\geq}^{\ast}(w_{k})})^{\lambda_{k}}\psi_{k}^{\gamma_{k}},\cr K^{s}_{{\ssr L}2}=&\, g_{2}\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}(h_{k}u_{k}^{\sigma_{\geq}^{\ast}(w_{k})})^{\varphi_{k}}\psi_{k}^{\tau_{k}},\cr K^{s}_{{\ssr L}3}=&\,g^{\alpha},\cr K^{s}_{{\ssr L}4}=&\, g^{\beta},\cr K^{s}_{{\ssr L}5}=&\, g^{-\sum_{k\in s(\sigma_{\geq}^{\ast}(w))}(v_{k}\alpha+t_{k}\beta)}.&{\hbox{(9)}}}$$

Similarly, the CC generates a token $T_{P_{\leq}}$ by using the query vector $\sigma_{\leq}^{\ast}({\ssr w})$ as follows.

1. Select a random $\alpha^{\prime}$, $\beta^{\prime}\in Z_{p}$, and generate $\lambda_{k^{\prime}}$, $\psi_{k^{\prime}}$, $\gamma_{k^{\prime}}$, $\tau_{k^{\prime}}\in Z_{p}$ such that $\lambda_{k^{\prime}}y_{1}+\psi_{k^{\prime}}y_{2}=\alpha^{\prime}$, $\gamma_{k^{\prime}}y_{1}+\tau_{k^{\prime}}y_{2}=\beta^{\prime}$ for all $k^{\prime}\in s(\sigma_{\leq}^{\ast}(w))$.
2. Compute the token $T_{P_{\leq}}$ as $$\eqalignno{K^{s}_{{\ssr R}1}=&\, g_{1}\prod_{k^{\prime}\in s(\sigma_{\leq}^{\ast}(w))}(h_{k^{\prime}}u_{k^{\prime}}^{\sigma_{\leq}^{\ast}(w_{k^{\prime}})})^{\lambda_{k^{\prime}}}\psi_{k^{\prime}}^{\gamma_{k^{\prime}}},\cr K^{s}_{{\ssr R}2}=&\, g_{2}\prod_{k^{\prime}\in s(\sigma_{\leq}^{\ast}(w))}(h_{k^{\prime}}u_{k^{\prime}}^{\sigma_{\leq}^{\ast}(w_{k^{\prime}})})^{\varphi_{k^{\prime}}}\psi_{k^{\prime}}^{\tau_{k^{\prime}}},\cr K^{s}_{{\ssr R}3}=&\,g^{\alpha^{\prime}},\cr K^{s}_{{\ssr R}4}=&\, g^{\beta^{\prime}},\cr K^{s}_{{\ssr R}5}=&\, g^{-\sum_{k^{\prime}\in s(\sigma_{\leq}^{\ast}(w))}(v_{k^{\prime}}\alpha^{\prime}+t_{k^{\prime}}\beta^{\prime})}.&{\hbox{(10)}}}$$

Let $T_{P_{\geq}}=(K^{s}_{{\ssr L}1},K^{s}_{{\ssr L}2},K^{s}_{{\ssr L}3},K^{s}_{{\ssr L}4},K^{s}_{{\ssr L}5})$ and $T_{P_{\leq}}=(K^{s}_{{\ssr R}1}, K^{s}_{{\ssr R}2},K^{s}_{{\ssr R}3},K^{s}_{{\ssr R}4},K^{s}_{{\ssr R}5})$. Then, the CC keeps a record $(ID_{S},T_{P_{\geq}}, T_{P_{\leq}})$ in its database and distributes $T_{P_{\geq}}$ and $T_{P_{\leq}}$ to the requester $S$ as its authorized tokens: $$CC\rightarrow S:\{T_{P_{\geq}}, T_{P_{\leq}}\}.$$

#### 2) Key and Index Query

After receiving the query tokens from the CC, the requester submits them, together with the non-wildcard index sets, to the cloud server $CS_{2}$: $$S\rightarrow CS_{2}:\{T_{P_{\geq}}, T_{P_{\leq}},t_{4},s(\sigma_{\geq}^{\ast}(w)),s(\sigma_{\leq}^{\ast}(w))\}.$$ Then, the $CS_{2}$ searches its database for key ciphertexts that match the requester's query conditions. For each key ciphertext whose encryption vectors satisfy the query vectors in the query tokens, the $CS_{2}$ can obtain: $$\eqalignno{ks_{i{\ssr L}}=&\,{{D_{1}\cdot D_{2}\cdot e(K^{s}_{{\ssr L}5},{\ssr C}^{i}_{{\ssr L}5})\cdot{\ssr C}^{i}_{{\ssr L}6}}\over{e(K^{s}_{{\ssr L}1},{\ssr C}^{i}_{{\ssr L}1})\cdot e(K^{s}_{{\ssr L}2},{\ssr C}^{i}_{{\ssr L}2})}},&\hbox{(11)}\cr ks_{i{\ssr R}}=&\,{{D^{\prime}_{1}\cdot D^{\prime}_{2}\cdot e(K^{s}_{{\ssr R}5},{\ssr C}^{i}_{{\ssr R}5})\cdot{\ssr C}^{i}_{{\ssr R}6}}\over{e(K^{s}_{{\ssr R}1},{\ssr C}^{i}_{{\ssr R}1})\cdot e(K^{s}_{{\ssr R}2},{\ssr C}^{i}_{{\ssr R}2})}},&{\hbox{(12)}}}$$ where $D_{1}=e(K^{s}_{{\ssr L}3},\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}{\ssr C}^{i}_{{\ssr L}3,k})$, $D_{2}=e(K^{s}_{{\ssr L}4}, \prod_{k\in s(\sigma_{\geq}^{\ast}(w))}{\ssr C}^{i}_{{\ssr L}4,k})$, $D^{\prime}_{1}=e(K^{s}_{{\ssr R}3},\prod_{k^{\prime}\in s(\sigma_{\leq}^{\ast}(w))}{\ssr C}^{i}_{{\ssr R}3,k^{\prime}})$ and $D^{\prime}_{2}=e(K^{s}_{{\ssr R}4}, \prod_{k^{\prime}\in s(\sigma_{\leq}^{\ast}(w))}{\ssr C}^{i}_{{\ssr R}4,k^{\prime}})$. Thus, the $CS_{2}$ can obtain $ks_{i}$ by using $ks_{i{\ssr L}}$ and $ks_{i{\ssr R}}$. Then, the $CS_{2}$ sends each recovered key $ks_{i}$, together with its index, to the requester: $$CS_{2}\rightarrow S:\{(ks_{i},Ind_{i}),(ks_{i^{\prime}},Ind_{i^{\prime}}),\ldots\}.$$ Note that more than one session key may satisfy the range query tokens. In that case, the $CS_{2}$ returns all matching session keys to the requester S.

#### 3) Data Query

Upon receiving the keys and indexes from the $CS_{2}$, the requester queries the $CS_{1}$ by using the received indexes to obtain the corresponding data ciphertexts: $$S\rightarrow CS_{1}:\{Ind_{i}, Ind_{i^{\prime}},\ldots\}.$$ Then, the $CS_{1}$ searches its database to find whether there are ciphertexts matching the requester's indexes. If so, the $CS_{1}$ sends the matched ciphertexts to the requester S: $$CS_{1}\rightarrow S:\{{\ssr C}_{i},{\ssr C}_{i^{\prime}},\ldots\}.$$

#### 4) Data Decryption

After receiving the session keys from the $CS_{2}$ and the ciphertexts $\{{\ssr C}_{i},{\ssr C}_{i^{\prime}},\ldots\}$ from the $CS_{1}$, the requester $S$ can obtain the data by using each session key to decrypt the corresponding ciphertext; a ciphertext ${\ssr C}_{i}$ that cannot be decrypted is discarded. $$S:m_{i}={\ssr Dec}_{ks_{i}}({\ssr C}_{i}),\ldots$$ Here ${\ssr Dec}(\cdot)$ is the symmetric decryption algorithm, i.e., the inverse operation of ${\ssr Enc}(\cdot)$.
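The message flow of the key and index query, the data query and the decryption step can be traced with a toy Python sketch. Several parts are stand-ins and are assumptions rather than the scheme itself: the HVE token test of (11) and (12) is replaced by a plaintext range comparison, session keys are stored in the clear on the simulated $CS_{2}$ (in the PaRQ they are encrypted), and ${\ssr Enc}/{\ssr Dec}$ is replaced by an insecure XOR keystream. All function names are hypothetical.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for Enc/Dec: XOR with a SHA-256-derived keystream (not secure)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


cs1 = {}   # simulated CS_1: index -> data ciphertext C_i
cs2 = []   # simulated CS_2: records holding (index, attribute vector, session key)


def outsource(index, attrs, session_key, message):
    """Store the data ciphertext on CS_1 and the key record on CS_2."""
    cs1[index] = keystream_xor(session_key, message)
    cs2.append({"ind": index, "attrs": attrs, "ks": session_key})


def key_and_index_query(w_ge, w_le):
    """CS_2 step: a plaintext comparison replaces the token test of (11)-(12)."""
    return [(r["ks"], r["ind"]) for r in cs2
            if all(a <= x <= b for a, x, b in zip(w_ge, r["attrs"], w_le))]


def data_query(indexes):
    """CS_1 step: return the ciphertexts stored under the requested indexes."""
    return {i: cs1[i] for i in indexes if i in cs1}


def retrieve(w_ge, w_le):
    """Requester S: key and index query, data query, then decryption."""
    keys = key_and_index_query(w_ge, w_le)
    ciphertexts = data_query([ind for _, ind in keys])
    return {ind: keystream_xor(ks, ciphertexts[ind])
            for ks, ind in keys if ind in ciphertexts}
```

The sketch keeps the separation of roles: the simulated $CS_{2}$ never touches `cs1`, and the simulated $CS_{1}$ answers by index only.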

### D. Enhancement With Collusion Resilience

In our system model, we assume that the adversary cannot control both CSs. To prevent the cloud servers $CS_{1}$ and $CS_{2}$ from colluding to disclose the data $m_{i}$, an identity based encryption scheme [24] can be used in the data encryption phase to encrypt $m_{i}$. For instance, the CC has one pair of identity based public/private keys $(pk,sk)$. First, $m_{i}$ is encrypted with the CC's public key $pk$. Then the result is encrypted again with the session key $ks_{i}$ and outsourced to the $CS_{1}$. Thus, even if the $CS_{2}$ recovers the session key $ks_{i}$ with the requester's query tokens, the $CS_{2}$ cannot decrypt the ciphertexts on the $CS_{1}$ to obtain the data $m_{i}$, because the $CS_{2}$ cannot obtain the CC's ID-based private key $sk$. When an authorized requester S asks the CC for query tokens, the CC replies with the query tokens as well as its identity based private key $sk$. Accordingly, the requester can recover the data $m_{i}$ by using both the session key $ks_{i}$ from the $CS_{2}$ and $sk$ from the CC. Note that the requesters are high-level users, such as an energy company's financial auditors, who are authorized by the CC. Therefore, they can obtain the CC's private key to decrypt their queried data. As a result, the PaRQ remains secure even if the two cloud servers are in collusion.

Furthermore, to provide forward security and prevent requesters from decrypting future encrypted data by using the CC's old private key, the CC can compute a different time and identity based public/private key pair for each time interval. Therefore, the authorized requesters can only obtain the CC's private keys corresponding to their entitled intervals to decrypt their required data.

SECTION VI

## SECURITY ANALYSIS

In this section, we analyze the security properties of the proposed PaRQ according to the security requirements discussed in Section III-B.

• The individual residential users' data confidentiality can be achieved. In the PaRQ, the residential user's data $m_{i}$ is encrypted by its session key $ks_{i}$. The eavesdroppers and the $CS_{1}$ cannot obtain anything from the ciphertext ${\ssr C}_{i}$ because they lack the secret key $ks_{i}$. Although the $CS_{2}$ can recover $ks_{i}$ from the requester's query tokens, the $CS_{2}$ cannot extract the ciphertexts from the $CS_{1}$ if they are not in collusion. Our enhancement introduced in Section V-D is resilient to the collusion attack even if the $CS_{1}$ is in collusion with the $CS_{2}$. Accordingly, the individual residential users' data confidentiality is achieved in the proposed PaRQ scheme.
• The individual residential users' data privacy can be preserved. In our PaRQ, $ks_{i}$ and the searchable attributes are hidden in two ciphertexts $\{{\ssr CT}_{i{\ssr L}},{\ssr CT}_{i{\ssr R}}\}$. Other requesters cannot obtain $ks_{i}$ from the $CS_{2}$ if they cannot obtain the authorized query tokens $\{T_{P_{\geq}}, T_{P_{\leq}}\}$ from the CC, or if the query vectors in their query tokens do not match the encryption vectors of the ciphertexts. Accordingly, they cannot pry into the session keys and decrypt the encrypted metering data. The range query correctness can be demonstrated as follows. In (11), the numerator equals: $$\eqalignno{& D_{1}\cdot D_{2}\cdot e(K^{s}_{{\ssr L}5},{\ssr C}^{i}_{{\ssr L}5})\cdot{\ssr C}^{i}_{{\ssr L}6}\cr &\quad=e\left(g^{\alpha},\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}(h_{k}u_{k}^{\sigma_{\geq}(x_{ik})})^{r_{i1}}g^{v_{k}r_{i2}}\right)\cr &\qquad\cdot e\left(g^{\beta},\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}\psi_{k}^{r_{i1}}g^{t_{k}r_{i2}}\right)\cr &\qquad\cdot e\left(g^{-\sum_{k\in s(\sigma_{\geq}^{\ast}(w))}(v_{k}\alpha+t_{k}\beta)},g^{r_{i2}}\right)\cdot\Gamma^{r_{i1}}ks_{i{\ssr L}}\cr &\quad=e\left(g^{\alpha},\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}(h_{k}u_{k}^{\sigma_{\geq}(x_{ik})})^{r_{i1}}\right)\cr &\qquad\cdot e\left(g^{\beta},\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}\psi_{k}^{r_{i1}}\right)\cr &\qquad\cdot e\left(g^{r_{i2}},\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}g^{v_{k}\alpha+t_{k}\beta}\right)\cr &\qquad\cdot e\left(g^{-\sum_{k\in s(\sigma_{\geq}^{\ast}(w))}(v_{k}\alpha+t_{k}\beta)},g^{r_{i2}}\right)\cdot\Gamma^{r_{i1}}ks_{i{\ssr L}}\cr &\quad=e\left(g^{\alpha},\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}(h_{k}u_{k}^{\sigma_{\geq}(x_{ik})})^{r_{i1}}\right)\cr &\qquad\cdot e\left(g^{\beta},\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}\psi_{k}^{r_{i1}}\right)\cdot\Gamma^{r_{i1}}ks_{i{\ssr L}},&{\hbox{(13)}}}$$
while the denominator equals: $$\eqalignno{& e(K^{s}_{{\ssr L}1},{\ssr C}^{i}_{{\ssr L}1})\cdot e(K^{s}_{{\ssr L}2},{\ssr C}^{i}_{{\ssr L}2})\cr &\quad=e\left(g_{1}\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}(h_{k}u_{k}^{\sigma_{\geq}^{\ast}(w_{k})})^{\lambda_{k}}\psi_{k}^{\gamma_{k}},g^{y_{1}r_{i1}}\right)\cr &\qquad\cdot e\left(g_{2}\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}(h_{k}u_{k}^{\sigma_{\geq}^{\ast}(w_{k})})^{\varphi_{k}}\psi_{k}^{\tau_{k}},g^{y_{2}r_{i1}}\right)\cr &\quad=\Gamma^{r_{i1}}\cdot\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}[e((h_{k}u_{k}^{\sigma_{\geq}^{\ast}(w_{k})})^{\lambda_{k}},g^{y_{1}r_{i1}})\cr &\hskip10em\cdot e((h_{k}u_{k}^{\sigma_{\geq}^{\ast}(w_{k})})^{\varphi_{k}},g^{y_{2}r_{i1}})]\cr &\qquad\cdot\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}[e(\psi_{k}^{\gamma_{k}},g^{y_{1}r_{i1}})\cdot e(\psi_{k}^{\tau_{k}},g^{y_{2}r_{i1}})]\cr &\quad=\Gamma^{r_{i1}}\cdot\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}e((h_{k}u_{k}^{\sigma_{\geq}^{\ast}(w_{k})})^{r_{i1}},g^{\lambda_{k}y_{1}+\psi_{k}y_{2}})\cr &\qquad\cdot\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}e(\psi_{k}^{r_{i1}},g^{\gamma_{k}y_{1}+\tau_{k}y_{2}})\cr &\quad=\Gamma^{r_{i1}}\cdot e\left(\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}(h_{k}u_{k}^{\sigma_{\geq}^{\ast}(w_{k})})^{r_{i1}},g^{\alpha}\right)\cr &\qquad\cdot e\left(\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}\psi_{k}^{r_{i1}},g^{\beta}\right).&{\hbox{(14)}}}$$

Let $\Theta$ be the set of indexes $k\in s(\sigma_{\geq}^{\ast}(w))$ where $\sigma_{\geq}^{\ast}(w_{k})\ne\sigma_{\geq}(x_{ik})$. Dividing (13) by (14), the output of (11) is $$\eqalignno{&{{D_{1}\cdot D_{2}\cdot e(K^{s}_{{\ssr L}5},{\ssr C}^{i}_{{\ssr L}5})\cdot{\ssr C}^{i}_{{\ssr L}6}}\over{e(K^{s}_{{\ssr L}1},{\ssr C}^{i}_{{\ssr L}1})\cdot e(K^{s}_{{\ssr L}2},{\ssr C}^{i}_{{\ssr L}2})}}\cr &\quad=e(g^{\alpha},\prod_{k\in\Theta}(u_{k}^{\sigma_{\geq}(x_{ik})-\sigma_{\geq}^{\ast}(w_{k})})^{r_{i1}})\cdot ks_{i{\ssr L}}\cr &\quad=e(g,g)^{\alpha r_{i1}\sum_{k\in\Theta}(\log_{g}u_{k})(\sigma_{\geq}(x_{ik})-\sigma_{\geq}^{\ast}(w_{k}))}\cdot ks_{i{\ssr L}}.&{\hbox{(15)}}}$$ If $\sigma_{\geq}^{\ast}(w_{k})$ equals $\sigma_{\geq}(x_{ik})$ for all $k\in s(\sigma_{\geq}^{\ast}(w))$, then $\Theta$ is empty, the exponent in (15) is zero, and $ks_{i{\ssr L}}$ is recovered; otherwise, $ks_{i{\ssr L}}$ stays masked by the remaining factor and the unauthorized requesters cannot obtain it from (11).

Similarly, the unauthorized requesters cannot obtain $ks_{i{\ssr R}}$ from (12). Therefore, only the authorized requester can obtain the query results and the users' data privacy is preserved.
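The cancellation in (13) to (15) can be checked numerically with a small Python sketch that works with discrete logarithms only: every group element is represented by its exponent, a pairing multiplies exponents, and products or quotients add or subtract them. This is an illustration under stated assumptions, not an implementation of the scheme: the modulus `P` is a toy prime, no wildcards are used (the index set covers all attributes), the ciphertext components are read off from (13) and (14), and the second scalar of each constrained pair is taken to be the exponent $\varphi_{k}$ that appears in (9).

```python
import random

P = 2**61 - 1  # toy prime standing in for the group order p (assumption)


def ratio_exponent(x, w, seed=0):
    """Return (exponent of the ratio in (11), exponent s of ks_iL), both mod P."""
    rng = random.Random(seed)

    def rnd():
        return rng.randrange(1, P)  # nonzero element of Z_P

    n = len(w)
    # Discrete logs of g_1, g_2 and the secret scalars y_1, y_2.
    d1, d2, y1, y2 = rnd(), rnd(), rnd(), rnd()
    # Discrete logs of h_k, u_k, psi_k and the scalars v_k, t_k.
    h, u, psi, v, t = ([rnd() for _ in range(n)] for _ in range(5))
    # Ciphertext randomness r_i1, r_i2 and the session key as an exponent of e(g, g).
    r1, r2, s = rnd(), rnd(), rnd()
    # Token randomness; phi_k and tau_k are solved from the two linear constraints.
    alpha, beta = rnd(), rnd()
    lam = [rnd() for _ in range(n)]
    gam = [rnd() for _ in range(n)]
    inv_y2 = pow(y2, -1, P)
    phi = [(alpha - lam_k * y1) * inv_y2 % P for lam_k in lam]
    tau = [(beta - gam_k * y1) * inv_y2 % P for gam_k in gam]
    # Token components of (9), as exponents of g (K3 = alpha, K4 = beta).
    k1 = d1 + sum((h[k] + u[k] * w[k]) * lam[k] + psi[k] * gam[k] for k in range(n))
    k2 = d2 + sum((h[k] + u[k] * w[k]) * phi[k] + psi[k] * tau[k] for k in range(n))
    k5 = -sum(v[k] * alpha + t[k] * beta for k in range(n))
    # Ciphertext components as they appear in (13) and (14).
    c1, c2, c5 = y1 * r1, y2 * r1, r2
    c3 = sum((h[k] + u[k] * x[k]) * r1 + v[k] * r2 for k in range(n))
    c4 = sum(psi[k] * r1 + t[k] * r2 for k in range(n))
    c6 = (d1 * y1 + d2 * y2) * r1 + s  # Gamma^{r_i1} * ks_iL in the exponent
    numerator = alpha * c3 + beta * c4 + k5 * c5 + c6
    denominator = k1 * c1 + k2 * c2
    return (numerator - denominator) % P, s
```

When the encoded attribute vector `x` equals the query vector `w`, the returned exponent equals `s`, which mirrors the recovery of $ks_{i{\ssr L}}$; a single mismatching position leaves a nonzero extra term, as in (15).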

• The requester's query privacy can be preserved. In our PaRQ, a requester's query is divided into two parts, $P_{\geq}$ and $P_{\leq}$, by the CC. They are translated into tokens $T_{P_{\geq}}$ and $T_{P_{\leq}}$ in the form of $\prod_{k\in s(\sigma_{\geq}^{\ast}(w))}(h_{k}u_{k}^{\sigma_{\geq}^{\ast}(w_{k})})^{\lambda_{k}}\psi_{k}^{\gamma_{k}}$ and $\prod_{k^{\prime}\in s(\sigma_{\leq}^{\ast}(w))} (h_{k^{\prime}}u_{k^{\prime}}^{\sigma_{\leq}^{\ast}(w_{k^{\prime}})})^{\varphi_{k^{\prime}}}\psi_{k^{\prime}}^{\tau_{k^{\prime}}}$, respectively. The eavesdroppers can learn nothing about the query $P$ from the tokens $T_{P_{\geq}}$ and $T_{P_{\leq}}$ alone, because they know neither the index sets nor the encryption parameters $\lambda_{k}$, $\psi_{k}$, $\gamma_{k}$, $\tau_{k}$. Since the $CS_{2}$ does not know the encryption parameters $\lambda_{k}$, $\psi_{k}$, $\gamma_{k}$, $\tau_{k}$ either, it cannot obtain the real value of $P$ even with the tokens and the index sets $s(\sigma_{\geq}^{\ast}(w))$ and $s(\sigma_{\leq}^{\ast}(w))$. Therefore, the requester's query privacy is preserved in the proposed PaRQ scheme.

From the above security analysis and the comparison in Table II, our PaRQ achieves data confidentiality, data privacy and query privacy, compared with the order-preserving encryption (OPE) based technique [16] and the bucketization (Bucket) based technique [18].

TABLE II COMPARISON OF SECURITY PROPERTIES.
SECTION VII

## PERFORMANCE EVALUATION

In this section, we evaluate the performance of the proposed PaRQ scheme in terms of the communication overhead, computation complexity and response time of the system.

We numerically analyze the communication overhead of our PaRQ, compared with the MRQED [23], in terms of the public key size, ciphertext size and token size. Since the decryption keys in the MRQED determine the query results in the same way as the query tokens in the PaRQ, we compare their sizes. “Tok/Dek” in Tables III and IV represents Token or Decryption key. Since most pairing-based cryptosystems need to work in a subgroup of the elliptic curve $E(F_{q})$, the lengths of the elements in $G_{1}$ and $G_{2}$ are roughly 161 bits (using point compression) and 1,024 bits, respectively. In the following, $l$ is the number of data dimensions and $N$ is the size of the domain of attribute values.

TABLE III COMPARISON OF COMMUNICATION COMPLEXITY (BIT).

In the key generation phase, the public key includes $(5Nl+3) G_{1}$ elements. If we choose AES ciphertext with 256-bit, the data ciphertext ${\ssr C}_{i}$ is only of 256 bits. In addition, session key's ciphertext includes two parts: ${\ssr CT}_{i{\ssr L}}$ and ${\ssr CT}_{i{\ssr R}}$, each of which includes $(2Nl+3) G_{1}$ elements and a $G_{2}$ element, thus the size of each session key's ciphertext is $(4Nl+6)\times 161+2048 {\rm bits}$. Since each query token includes 5 $G_{1}$ elements, the size of the two query tokens $\{T_{P_{\geq}}, T_{P_{\leq}}\}$ is $161\times 10=1610 {\rm bits}$.

In comparison, the public key in the MRQED [23] includes $8Nl$ $G_{1}$ elements and a $G_{2}$ element. The decryption keys include $5Nl$ $G_{1}$ elements. In addition, there are $(4Nl+1)$ $G_{1}$ elements and a $G_{2}$ element in the ciphertexts. Table III shows that, compared with the MRQED, our PaRQ incurs less communication overhead. In particular, our PaRQ significantly reduces the token transmission overhead, which is a constant, i.e., 1610 bits, whereas in the MRQED the transmission overhead of the decryption keys increases with both $l$ and $N$. The total communication overhead comparison is depicted in Fig. 5. It further indicates that our PaRQ costs less communication overhead than the MRQED.
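The element counts quoted in this comparison can be turned into bit sizes with a short Python sketch. The function names are hypothetical, and adding the 256-bit data ciphertext to the PaRQ ciphertext entry is a bookkeeping choice made here; the layout of Table III is not reproduced.

```python
G1_BITS = 161    # compressed G_1 element
G2_BITS = 1024   # G_2 element
AES_BITS = 256   # data ciphertext C_i


def parq_bits(N: int, l: int) -> dict:
    """Sizes in bits for the PaRQ, from the element counts stated in the text."""
    return {
        "public_key": (5 * N * l + 3) * G1_BITS,
        # data ciphertext plus the two-part session key ciphertext
        "ciphertext": AES_BITS + (4 * N * l + 6) * G1_BITS + 2 * G2_BITS,
        "tok_dek": 10 * G1_BITS,
    }


def mrqed_bits(N: int, l: int) -> dict:
    """Sizes in bits for the MRQED, from the element counts stated in the text."""
    return {
        "public_key": 8 * N * l * G1_BITS + G2_BITS,
        "ciphertext": (4 * N * l + 1) * G1_BITS + G2_BITS,
        "tok_dek": 5 * N * l * G1_BITS,
    }
```

With $N=20$ and $l=5$, the token entry for the PaRQ stays at 1610 bits while the MRQED decryption keys take 80,500 bits.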

Fig. 5. Comparison of communication overhead between PaRQ and MRQED schemes.

In our PaRQ, the computation tasks include pairing operations and exponentiation operations. For simplicity of description, the pairing operation and exponentiation operation are denoted as $C_{p}$ and $C_{e}$, respectively. Since the AES encryption/decryption and multiplication are much faster than the pairing operations, we do not analyze the AES encryption/decryption and multiplication in this subsection.

For the PaRQ, the symmetric encryption of ${\ssr C}_{i}$ is very fast. Meanwhile, the corresponding session key is encrypted into key ciphertexts $\{{\ssr CT}_{i{\ssr L}},{\ssr CT}_{i{\ssr R}}\}$ by using its encryption vectors. The computation overhead of $\{{\ssr CT}_{i{\ssr L}},{\ssr CT}_{i{\ssr R}}\}$ is $(10Nl+8)C_{e}$ because each part requires $(5Nl+4)$ exponentiation operations. In the token generation phase, the computation cost of $\{T_{P_{\geq}}, T_{P_{\leq}}\}$ is $(12l+6)C_{e}$, because each query token in $\{T_{P_{\geq}}, T_{P_{\leq}}\}$ needs $6l+3$ exponentiation operations. After receiving the tokens, the $CS_{2}$ needs to compute 10 pairings to recover the session key parts $\{ks_{i{\ssr L}},ks_{i{\ssr R}}\}$, i.e., $10C_{p}$.

On the other hand, the MRQED [23] needs $(8Nl+3)$ exponentiation operations to encrypt a message, another $8Nl$ exponentiation operations to derive the decryption keys and $5l\cdot\log N$ pairing operations to search the correct results. From Table IV, we can see that the encryption overhead in both the PaRQ and the MRQED increases with $l$ and $N$. The computation overhead of token generation in the PaRQ only increases with $l$, whereas the overhead of decryption key generation in the MRQED increases with both $l$ and $N$. When a query is executed on a record in a database, the overhead in our PaRQ is a constant $(10C_{p})$, while the overhead in the MRQED still increases with both $l$ and $N$. Hence, our PaRQ is much more efficient than the MRQED. Further comparison of their range query response time is analyzed in Section VII-C.
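The operation counts above can be evaluated for concrete parameters with a brief Python sketch. The function names are hypothetical; the per-operation costs are passed in as parameters, the PaRQ token cost counts two tokens at $6l+3$ exponentiations each, and the logarithm is taken to base 2, which is an assumption based on the $\log_{2}(N)$ that appears in the caption of Fig. 9.

```python
import math


def parq_cost(N: int, l: int, c_e: float, c_p: float) -> dict:
    """PaRQ cost: c_e per exponentiation, c_p per pairing (same time unit)."""
    return {
        "encrypt": (10 * N * l + 8) * c_e,
        "tok_dek": 2 * (6 * l + 3) * c_e,
        "query_per_record": 10 * c_p,
    }


def mrqed_cost(N: int, l: int, c_e: float, c_p: float) -> dict:
    """MRQED cost, using log base 2 (assumption)."""
    return {
        "encrypt": (8 * N * l + 3) * c_e,
        "tok_dek": 8 * N * l * c_e,
        "query_per_record": 5 * l * math.log2(N) * c_p,
    }
```

With the timings used in the response-time discussion of this section (1.1 ms per exponentiation and 3.1 ms per pairing) and $N=20$, $l=5$, `parq_cost` gives a token generation time of 72.6 ms and a per-record query time of 31 ms.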

TABLE IV COMPARISON OF COMPUTATION COMPLEXITY.

### C. Response Time

To provide good services to requesters, the response time of a range query is an important metric. For example, it would be useful for the requesters to know how long they need to wait for a range query result so that they can efficiently schedule their tasks. In practice, response time depends on many factors, such as communication latency. We analyze the response time of our PaRQ with and without considering the network communication latency $\Delta$. Other factors are not included in this calculation of the response time.

In the PaRQ, a range query is processed by the CC, $CS_{1}$ and $CS_{2}$. We model our range query process as a tandem model of network queues [27], as shown in Fig. 6. We assume that range queries arrive at the system according to a Poisson process with rate $\lambda$, and use the CC for token generation for an exponentially distributed time with mean $1/\mu$ (an $M/M/1$ queue). Upon exiting the CC, the requester accesses the $CS_{2}$ with rate $\lambda_{2}$ for a deterministic time $1/\mu_{2}$ (an $M/D/1$ queue). Finally, the requester accesses the server $CS_{1}$ with rate $\lambda_{1}$ for a time that is exponentially distributed with mean $1/\mu_{1}$ (an $M/M/1$ queue). Let $$\rho={{\lambda}\over{\mu}};\rho_{1}={{\lambda_{1}}\over{\mu_{1}}};\rho_{2}={{\lambda_{2}}\over{\mu_{2}}}.$$ If the network state is $n=\{n_{0},n_{1},n_{2}\}$, then according to Jackson's Theorem [28], the steady-state probability distribution of the system is given as: $$P(n_{0},n_{1},n_{2})=\rho^{n_{0}}(1-\rho)\rho_{1}^{n_{1}}(1-\rho_{1})\rho_{2}^{n_{2}}(1-\rho_{2}).$$ Let $T$, $T_{1}$ and $T_{2}$ be the average queuing delay of the CC, $CS_{1}$ and $CS_{2}$, respectively. They can be calculated as: $$\eqalignno{T=&\,{{1}\over{\mu-\lambda}}={{1}\over{\mu (1-\rho)}};\cr T_{1}=&\,{{1}\over{\mu_{1}-\lambda_{1}}}={{1}\over{\mu_{1}(1-\rho_{1})}};\cr T_{2}=&\,{{1}\over{\mu_{2}}}{{2-\rho_{2}}\over{2-2\rho_{2}}}.}$$ Then, the total delay of the range query in the PaRQ is: $$T_{tol}={{1}\over{\mu (1-\rho)}}+{{1}\over{\mu_{1}(1-\rho_{1})}}+{{1}\over{\mu_{2}}}{{2-\rho_{2}}\over{2-2\rho_{2}}}.$$ In this section, the response time of a range query is the total queuing delay on all the servers.
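The three delay formulas translate directly into code. The following Python sketch uses hypothetical function names and assumes rates in queries per second; the parameter values in the usage lines are the ones given in the numerical discussion of this section.

```python
def mm1_delay(lam: float, mu: float) -> float:
    """Average time in an M/M/1 queue: 1 / (mu - lam). Requires lam < mu."""
    if lam >= mu:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (mu - lam)


def md1_delay(lam: float, mu: float) -> float:
    """Average time in an M/D/1 queue: (1/mu) * (2 - rho) / (2 - 2*rho)."""
    rho = lam / mu
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    return (1.0 / mu) * (2.0 - rho) / (2.0 - 2.0 * rho)


def total_delay(lam, mu, lam1, mu1, lam2, mu2) -> float:
    """T_tol: CC (M/M/1) plus CS_1 (M/M/1) plus CS_2 (M/D/1)."""
    return mm1_delay(lam, mu) + mm1_delay(lam1, mu1) + md1_delay(lam2, mu2)


# One query per 10 minutes; service times 72.6 ms (CC), 10 ms (CS_1), 3.1 s (CS_2).
lam = 1.0 / 600.0
t_tol = total_delay(lam, 1.0 / 0.0726, lam, 1.0 / 0.01, lam, 1.0 / 3.1)
t_cs2 = md1_delay(lam, 1.0 / 3.1)
```

With these values `t_cs2` is about 3.108 s and `t_tol` is about 3.191 s, so the $CS_{2}$ term accounts for nearly all of the total.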

Fig. 6. Tandem model [27] of a range query process.

Detailed experiments are conducted on a Pentium IV 3-GHz system to study the execution time [29]. For $G_{1}$ over the Freeman–Scott–Teske (FST) curve, a single exponentiation operation in $G_{1}$ with 161 bits costs 1.1 ms, and the corresponding pairing operation costs 3.1 ms. Without loss of generality, let $N=20$, $l=5$. According to Table IV, the processing time of the CC is the token generation time, i.e., $1/\mu=(12l+6)\times 1.1=72.6 {\rm ms} \approx0.073 {\rm s}$. If the range query length is exponentially distributed with mean 2 Kbits and queries arrive according to a Poisson process with a rate of one query per 10 minutes, i.e., $\lambda=1/600$, then the queuing delay is $T={{1}\over{\mu-\lambda}}=0.073 {\rm s}$ and the average queue length is $L={{\lambda}\over{\mu-\lambda}}\approx 0.00012$.

Next, the processing time of the $CS_{2}$ depends on the query token verification and the searching time in the $CS_{2}$'s database. The computation time of a query token verification over one record is $10\times 3.1=31 {\rm ms}$. If the number of records in the $CS_{2}$'s database is $M=100$, the $CS_{2}$'s processing delay is 3.1 s, i.e., $1/\mu_{2}=3.1 {\rm s}$. Finally, querying indexed ciphertexts on the $CS_{1}$ is typically very fast; the processing time is only several milliseconds (e.g., $1/\mu_{1}=10 {\rm ms}$). Since $\mu_{2}=\max(\lambda_{1})$, considering the extreme case $\lambda_{1}=\mu_{2}$, we have $T_{1}=0.01 {\rm s}$ and $L_{1}\approx 0$. Therefore, $T$, $L$, $T_{1}$, $L_{1}$ are very small, and few range query tasks are buffered in the queues of the $CS_{1}$ and the CC. As a result, their queuing delay can be neglected.

From the above analysis, the processing time on the CC is much shorter than the query arrival interval. Thus, $\lambda=\lambda_{2}=1/600$. Moreover, compared with the $CS_{2}$ queue, the service rates of the $CS_{1}$ and the CC are much higher, i.e., $\mu_{1}\gg\mu\gg\mu_{2}$. Therefore, a range query's average response time is mainly determined by the processing time of the $CS_{2}$. Consequently, the whole range query response time is distributed approximately as in the $CS_{2}$ queue, that is, as in an $M/D/1$ queue with Poisson arrival rate $\lambda_{2}$ and service time $1/\mu_{2}$. Hence, the response time of a range query can be represented by $T^{\prime}_{tol}$: $$T^{\prime}_{tol}={{1}\over{\mu_{2}}}{{2-\rho_{2}}\over{2-2\rho_{2}}}.$$ If the total communication latency $\Delta$ among all network links is not negligible, the formula for $T^{\prime}_{tol}$ should be adjusted to $$T^{\prime}_{tol}={{1}\over{\mu_{2}}}{{2-\rho_{2}}\over{2-2\rho_{2}}}+\Delta.$$ In fact, the smart grid usually uses 3G (3rd Generation) or 4G (4th Generation) cellular networks for data transmission. The data transmission rate is 60–240 Kbps, and the distance coverage depends upon the availability of cellular service [30]. Hence, the communication rate among the requesters, the CC and the CSs in our PaRQ scheme is assumed to be 240 Kbps. Here, $\lambda_{2}=1/600$ and $1/\mu_{2}=3.1 {\rm s}$.

1. If the communication latency $\Delta$ is negligible, i.e., $\Delta\approx 0$, we can see from Fig. 7 that the response time of a range query increases with the number of database records. Comparing the total response time of a range query with and without the queuing delay of the CC and $CS_{1}$, i.e., $T_{tol}$ and $T^{\prime}_{tol}$, respectively, Fig. 7 also shows that they are almost the same, which means that the queuing delay of the $CS_{1}$ and CC can indeed be neglected in the response time calculation.
2. If the communication latency $\Delta$ is not negligible, we should consider the communication overhead during the range query process. From the above analysis, if S sends a 2K-bit range query to the CC, the CC replies with the two tokens, 1610 bits in total. Then, S forwards these two tokens to the $CS_{2}$, and the $CS_{2}$ replies with the matching keys and indexes. If $t$ is the average number of matched results and the keys and indexes are 80 bits each, their communication overhead is $160t$ bits. Finally, when S accesses the $CS_{1}$, the $CS_{1}$ replies with the corresponding indexed ciphertexts. Usually, the size of a ciphertext is large. Let it be 1 Mbit per packet. The communication overhead of the ciphertexts is then $t$ Mbits. Hence, the system communication latency is $\Delta\approx 0.022+4.26t$, and $$T^{\prime}_{tol}={{1}\over{\mu_{2}}}{{2-\rho_{2}}\over{2-2\rho_{2}}}+0.022+4.26t.$$ Fig. 8 illustrates that both the number of data records in the $CS_{2}$'s database and the number of ciphertexts returned by the $CS_{1}$ increase the system response time.
3. For comparison, Table IV shows that the MRQED needs $5l\cdot\log N\, C_{p}$ to verify a query on a ciphertext. In a database with the number of data records $M=100$, the processing time in the database is $5l\cdot\log N\times 3.1 {\rm s}$. Similarly, in the MRQED, the encryption and decryption key generation time is much shorter than the processing time of the data query in the database. Therefore, the range query response time of the MRQED is mainly determined by the query in the database. Thus, the range query service in the MRQED can also be modeled by using an $M/D/1$ queue with Poisson arrival rate $\lambda^{\prime}_{2}=1/600$ and service time $1/\mu^{\prime}_{2}=5l\cdot\log N\times 3.1$. Fig. 9 illustrates the response time comparison between the PaRQ and the MRQED without communication delay, where the range query arrival rates are $\lambda_{2}=\lambda^{\prime}_{2}=1/600$. From Table IV and Fig. 9, we can observe that in the MRQED, if the number of data records in the database is a constant, the response time of a range query increases with the domain $N$ and the number of dimensions $l$, while the service time in our PaRQ is $1/\mu_{2}=3.1 {\rm s}$. Thus, no matter how many dimensions the data has and how large the domain is, the processing time of a single range query in the PaRQ remains ${{1}\over{\mu_{2}}}{{2-\rho_{2}}\over{2-2\rho_{2}}}\approx 3.1 {\rm s}$, which is much less than that of the MRQED.
Fig. 7. Response time in PaRQ scheme, $\Delta=0$.
Fig. 8. Response time in PaRQ scheme, $\Delta=0.022+4.26t$.
Fig. 9. Comparison of the number of matched residential users. (a) $\lambda_{2}=\lambda^{\prime}_{2}=1/600$, $1/\mu_{2}=3.1 {\rm s}$, $1/\mu^{\prime}_{2}=0.6699\,l$. (b) $\lambda_{2}=\lambda^{\prime}_{2}=1/600$, $1/\mu_{2}=3.1 {\rm s}$, $1/\mu^{\prime}_{2}=0.775\times\log_{2}(N)$.
SECTION VIII

## CONCLUSION

In this paper, we have proposed a privacy-preserving range query scheme, named PaRQ, for smart grid. An HVE based range query predicate is constructed to realize the range query on encrypted metering data. The PaRQ allows users to store their data on cloud servers in encrypted form, and range queries can be executed by using cloud server's computational capabilities. A requester with authorized query tokens can obtain the correct session keys to retrieve the metering data within specific query ranges. Security analysis demonstrates that the PaRQ can achieve data confidentiality and privacy and preserve query privacy. Performance evaluation shows that the PaRQ can significantly reduce computation and communication overhead, as well as response time. For our future work, we intend to enhance our PaRQ to support ranked range query with security and privacy preservation.

## Footnotes

This work was supported by the National Natural Science Foundation of China under Grants 61073189, 61272437, and 61202369, NSERC, Canada, the Innovation Program of Shanghai Municipal Education Commission under Grant 13ZZ131, the Foundation Key Project of Shanghai Science and Technology Committee under Grant 12JC1404500, and the Project of Shanghai Science and Technology Committee under Grant 12510500700.

M. Wen is with the College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201101, China, and also with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada. Corresponding author: M. Wen (wenmi2222@gmail.com)

J. Lei is with the College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201101, China

K. Zhang, X. Liang, and X. Shen are with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada

R. Lu is with the School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore 639798

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
