SECTION I

INTRODUCTION

Rapid developments in computer science and in information and communication technology, together with advances in the control of physical systems, have given rise to a new class of multi-disciplinary engineering systems called Cyber-Physical Systems (CPS) [2]–[4]. This field enables humans to interact with and control the environment more efficiently and effectively: “CPSs will transform how humans interact with and control the physical environment to the greater benefit of society” [4]. Whether the control system is fully or semi-automatic, it relies on collected data to make control decisions for the physical and control interactions, or as part of the system feedback loop [5]. Normally, the fine-grained data gathered by the sensing devices, e.g., sensors or metering devices [6], are transferred to the monitoring/controlling parties for further action, e.g., to be processed and used in control decisions. There are many examples of CPS applications, such as health care [3], manufacturing automation, energy (smart grid), agriculture, defense and transportation [4], [6]–[9], to name a few. In this paper, we describe our scheme and designs specifically for the case of the smart grid.

The smart grid aims to improve power generation, transmission, distribution and consumption through the contribution and collaboration of different stakeholders such as the utility sector, service providers and consumers [10], [11]. In all systems and applications that follow a demand-response architecture, such as the advanced metering infrastructure (AMI) used in a smart grid, information about the actual or planned power consumption is a key element [12]. In this case, smart meters are used to periodically collect live metering data from end-users, e.g., home area networks (HANs). This information is then transmitted to the utility via the AMI to be used for billing purposes. The service provider also uses this information as a reference to plan service delivery efficiently [13]. Furthermore, the energy management system uses this fine-grained information to provide users with the real-time price (tariff) of power, so that consumers can take advantage of low-price periods. This motivates consumers to move their power demands to off-peak hours so as to use power efficiently and decrease their monetary costs [10].

Different communication technologies have been proposed for the AMI, such as power line communication and wireless communication [14]. In North America, wireless multi-hop communication technologies (e.g., ad hoc and mesh networks) have been proposed for exchanging data and control messages over the AMI between smart meters or gateways of HANs and the utility [1], [13]–[17]. In this case, data traffic is transmitted from a smart meter to the utility and vice versa over multi-hop wireless links, with intermediate network nodes forwarding the traffic (Fig. 1).

Fig. 1. Smart grid network architecture.

Privacy in the smart grid is identified by the research community as one of the biggest concerns, considering the uncertainty in the environment [7]. Due to the broadcast nature of wireless transmissions in the AMI, an attacker can overhear communication between any adjacent wireless nodes. This enables the attacker to detect valuable information, which can compromise the privacy of the clients. Even if the transmitted packets are encrypted, the attacker may correlate the amount of traffic transmitted by a particular user at different times and, by applying a user behavior model, infer private information about the user. Thus, well-defined security and privacy systems are prerequisites for the implementation readiness of the smart grid. Although it may be tempting to patch existing protocols such as random paths and anonymous routing to provide some level of privacy [18], the privacy of users in the smart grid needs to meet more precise specifications such as anonymity, unobservability, unlinkability, and undetectability. This requires different traffic routing designs in order to meet the required privacy properties. For example, when anonymous routing protocols are used, an adversary may detect the data traffic generated by an individual smart meter and infer information about the appliances present in a HAN (by monitoring trends in the power consumed by different appliances) and about the behavior of the users (by monitoring the amount of power used in the HAN). Although a trivial scheme that generates dummy packets may solve the unobservability problem, it fails to address anonymity, unlinkability and undetectability while introducing a high amount of overhead to the system. We refer to the Pfitzmann-Hansen definitions of privacy [19], which we describe in Section II.

Contribution: Our proposed schemes address the problem of preserving the privacy of users in a smart grid system by maintaining all the features required for privacy in such a system, including anonymous, unlinkable, undetectable and unobservable communications.

None of the existing schemes in the literature addresses all of these properties simultaneously. We identify five privacy measures for CPS communication, namely hiding the source, destination, path, traffic volume and content. We address this problem using an enhanced network coding technique. Our proposed schemes benefit from the capability of network coding to transmit encoded linear combinations of packets.

We review our definition of privacy in the smart grid context and provide a background for network coding in Section II followed by literature review in Section III. Our proposed schemes are presented in Section IV, while we analyze the performance of our proposed schemes in Section V. We conclude the paper in Section VI.

SECTION II

BACKGROUND

Definition of Privacy

Different definitions of privacy have been proposed. Bob Blakley defines privacy as “The ability to lie about yourself and get away with it” [20], or “The right to be left alone”. The latter definition has been adopted by NIST [21]. Pfitzmann and Hansen provided six features of privacy [19], as follows:

1. Anonymity

The most used feature in the literature for the privacy is anonymity. “Anonymity of a subject means that the subject is not identifiable within a set of subjects, the anonymity set” [19]. The main goal of the anonymity is to make a party anonymous from others, even a peer. There are two defined forms for the anonymity: Sender Anonymity and Receiver Anonymity.

2. Unlinkability

The situation of not being able to distinguish relationship between two items in a system is referred to as unlinkability. Unlinkability is required for different items in the smart grid such as smart device, smart meter, controller of a HAN, Building Area Network or Neighborhood Area Network, aggregator, system/sub-system (located in cloud or in any of the smart grid servers) or group (like multicast group).

3. Undetectability

Undetectability of an item (entity, application or process) from an adversary's perspective means that the adversary is not able to sufficiently distinguish whether the item exists or not.

4. Unobservability

Unobservability of an item (entity, application or process) means, first of all, undetectability of the item against all subjects uninvolved in it. In addition, and at the same time, it means anonymity of the subject(s) involved in the item, even against the other subject(s) involved in that item.

5. Pseudonymity

“A pseudonym is an identifier of a subject which is different from the subject's real names”. For instance, a smart meter can have multiple identities, each known to the parties with which the smart meter communicates. A pseudonym can be defined as a person pseudonym, role pseudonym, relationship pseudonym, role-relationship pseudonym, or transaction pseudonym, with respect to the relationship and link between the pseudonym and its holder.

6. Identity Management

Entities of a system that follows pseudonymity approach have multiple identities. Each identity can be based on one or some attributes of the entity. Managing the identities in terms of assigning and controlling them in a way that makes the item unidentifiable by any unauthorized party is the task of identity management.

A. Network Coding

Network coding has been widely used to improve the robustness and bandwidth efficiency of multicast routing in special network topologies. However, the inherent packet-encryption feature of network coding can also be exploited to provide privacy for users in a smart grid. Furthermore, the distributed nature of network coding increases its robustness against possible attacks. The simplest coding scheme is linear coding [22], [23]. Linear network coding treats a block of data as a vector over a certain base field of coefficients. Each intermediate node performs a linear transformation and produces a linear combination of the data on its incoming edges before delivering it to the next node(s).

Network coding is used in communication to maximize throughput, minimize energy per bit and minimize delay [24]. A linear combination of the received packets at the encoding nodes is transmitted with a linear coding coefficient vector, or Local Encoding Vector (LEV). The Global Encoding Vector (GEV) is used to form the transfer matrix for the entire system. Practical instances of network coding involve the following: (i) random coding [25], which allows the encoding to be done in a distributed fashion; (ii) tagging each packet with its LEV, which allows the decoding to be done in a distributed manner; and (iii) buffering, which is required for asynchronous packet arrivals and departures with arbitrarily varying rates, delay, and loss.

Let us assume an acyclic network $(V, E, c)$ with unit-capacity edges, $c(e)=1$ for all $e\in E$. Let $x_{1},x_{2},\ldots,x_{h}$ be the $h$ packets that the graph, from an overall point of view, is to carry. Taking the coefficients of all nodes $v\in V$ into account and assuming an “$h\times h$” model, (1) shows the relationship between the received packets (the $y_{i}$s) and the sent packets (the $x_{i}$s). The matrix $T$ given by (2) is called the transfer matrix of the network; the receiver(s) can therefore use (3) to extract the original $x_{i}$ from the $y_{i}$. $T$ is built from the coefficients of each node and must be invertible, which randomly chosen coefficients ensure with high probability.

$$\begin{bmatrix} y_{1}\\ \vdots\\ y_{h}\end{bmatrix}=\begin{bmatrix} t_{1}(e_{1}) & \ldots & t_{h}(e_{1})\\ \vdots & \ddots & \vdots\\ t_{1}(e_{h}) & \ldots & t_{h}(e_{h})\end{bmatrix}\times\begin{bmatrix} x_{1}\\ \vdots\\ x_{h}\end{bmatrix} \tag{1}$$

$$T=\begin{bmatrix} t_{1}(e_{1}) & \ldots & t_{h}(e_{1})\\ \vdots & \ddots & \vdots\\ t_{1}(e_{h}) & \ldots & t_{h}(e_{h})\end{bmatrix} \tag{2}$$

$$\begin{bmatrix} y_{1}\\ \vdots\\ y_{h}\end{bmatrix}=T\times\begin{bmatrix} x_{1}\\ \vdots\\ x_{h}\end{bmatrix}\Rightarrow\begin{bmatrix} x_{1}\\ \vdots\\ x_{h}\end{bmatrix}=T^{-1}\times\begin{bmatrix} y_{1}\\ \vdots\\ y_{h}\end{bmatrix} \tag{3}$$
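The relationship in (1)–(3) can be illustrated with a minimal sketch. It assumes arithmetic over a small prime field (the modulus `P = 257`, the helper names and the toy packet values are illustrative choices, not taken from the paper) and simply redraws the random coefficients until the transfer matrix is invertible.

```python
import random

P = 257  # assumed prime modulus; the paper does not fix the base field

def mat_vec(T, x):
    """Return T * x over GF(P)."""
    return [sum(t * v for t, v in zip(row, x)) % P for row in T]

def mat_inv(T):
    """Invert a square matrix over GF(P) by Gauss-Jordan elimination.
    Raises ValueError if T is singular."""
    h = len(T)
    # Augment T with the identity matrix.
    A = [list(row) + [int(i == j) for j in range(h)] for i, row in enumerate(T)]
    for col in range(h):
        pivot = next((r for r in range(col, h) if A[r][col] % P != 0), None)
        if pivot is None:
            raise ValueError("transfer matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        inv = pow(A[col][col], -1, P)  # modular inverse (Python 3.8+)
        A[col] = [a * inv % P for a in A[col]]
        for r in range(h):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [(a - f * b) % P for a, b in zip(A[r], A[col])]
    return [row[h:] for row in A]

def random_invertible(h, rng):
    """Draw random coefficient matrices until one is invertible."""
    while True:
        T = [[rng.randrange(P) for _ in range(h)] for _ in range(h)]
        try:
            mat_inv(T)
            return T
        except ValueError:
            continue

rng = random.Random(0)
x = [10, 20, 30]                  # h = 3 source packets (one symbol each)
T = random_invertible(3, rng)     # transfer matrix, as in (2)
y = mat_vec(T, x)                 # received packets, as in (1)
x_rec = mat_vec(mat_inv(T), y)    # decoding, as in (3)
```

In this sketch the receiver recovers `x` only because it can build `T`; an observer who sees `y` without the coefficients cannot invert the mixing.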

As depicted in Fig. 2, since the transfer matrix $T$ is not fixed, owing to the dynamic and random nature of the coefficients, a receiver must compute $T^{-1}$ each time from the received tags. To reduce the computation in (3), [26] proposes using sub-graphs to handle traffic from different sources to different destinations. More specifically, the main graph is divided into parallel sub-graphs, and packets from a source to a destination traverse only one sub-graph. The aim in [27] is to find the minimum-cost multicast sub-graph, taking into account the delay associated with each link, the limited buffer size of the intermediate nodes, and link capacity variations over time.

Fig. 2. Matrix of transfer.

SECTION III

RELATED WORK

Wayne Wolf proposed the concept of cyber-physical systems. He argued that the understanding and use of computing need to change: “Cyberphysical systems actively engage with the real world in real time and expend real energy. This requires a new understanding of computing as a physical act, a big change for computing” [2]. The challenges of CPS design and deployment are studied in [3]. The authors mentioned that global warming, coupled with energy shortage and the aging of the population, are the targets of the CPS, and they identified the research challenges for the CPS as real-time system abstractions, robustness of CPS, quality of service composition, and knowledge engineering. In [4], the CPS is studied as a combination of multiple fields of science such as computing, communication and control systems. The author compared the evolution of the CPS to that of the Internet, and provided some real-world applications of the CPS, e.g., the smart grid for the power sector. He also mentioned that privacy should be preserved by the CPS: “These CPSs will have embedded and distributed intelligence, operating dependably, securely, safely, and efficiently in real time, while satisfying privacy constraints”. The author also presented advances of the CPS, such as fully autonomous vehicles, smart power grids and extreme-yield agriculture, as well as the impact of the CPS on society and education. Modeling of the CPS is studied in [5], where the authors discussed the challenges caused by the heterogeneity, concurrency, and sensitivity to timing of CPSs, by modeling the dynamics in terms of the evolution of the system state over time.

A survey on the CPS in [7] presents a number of CPSs and their features. The authors also described state-of-the-art CPS research in energy control, secure control, transmission and management, control technique, system resource allocation, and model-based software design. They further described the CPS research challenges in the areas of control and hybrid systems, sensor and mobile networks, robustness, reliability, safety, and security, abstractions, model-based development, and verification, validation, and certification.

The work in [6] considers the smart grid as an application of the CPS, which is related to the scope of our work in this paper. The research work presented in [8] considers the security of the smart grid. The authors discussed the security aspects of the cyber-physical controls required to support the smart grid, taking the power application into account. They analyzed security from the risk point of view, and addressed the security concerns in the control systems for the generation, transmission and distribution of power in the smart grid. Furthermore, they studied the security of the supporting infrastructure and devices as well as security management and intrusion detection systems, followed by a list of research challenges in this area. In this paper, however, we focus on the privacy aspect of the smart grid. To the best of our knowledge, we are the first to propose comprehensive schemes that address all the features required to preserve the privacy of clients in a smart grid system.

The scope of the work in [9] is the smart grid as well; the authors presented a security-oriented cyber-physical state estimation system. Their proposed system identifies, at each time instant, the compromised set of hosts in the cyber network and the maliciously modified set of measurements obtained from power system sensors. They used the concept of an intrusion detection system (IDS), which utilizes stochastic information fusion algorithms and merges sensor information from both the cyber and electrical infrastructures. The innovation of their work is the use of the IDS to monitor the cyber infrastructure for malicious or abnormal activity, in conjunction with knowledge about the communication network topology. Similarly, in [28], the authors concentrated on the effect of intrusion detection and response on the reliability of a CPS. They considered a CPS comprising sensors, actuators, control units, and physical objects for controlling and protecting a physical infrastructure. Their model is based on stochastic Petri nets and emulates the behavior of the CPS in the presence of malicious nodes exhibiting a range of attacker behaviors. They also proposed an intrusion detection and response system for detecting and responding to malicious events at runtime.

The scope of the work in [29] is the data center from the CPS point of view, in which the authors considered the control system of data centers versus the ITC system. Precisely, the proposed model considers a computational network representing the cyber dynamics and a thermal network representing the physical dynamics as two coupled networks in a control-oriented model. In [30], the safety, security and sustainability (S3) of the CPS is the target of the study, and the authors proposed a formal framework for representing cyber-physical interactions in a CPS. They also studied the challenges that are applicable to this framework. In [31], the authors provided a review of the historical technology developments leading to CPSs, as well as applications of the CPS along with new research challenges and directions.

M. Stegelmann et al. proposed a scheme in which a smart meter sends the metering data to a local aggregator, and the aggregator then anonymizes the data before sending it to service providers. Although the data used for billing is not anonymous, the same data is anonymous when it is sent to the service provider for planning [32]. However, this scheme provides only source anonymity, and only for a portion of the data deliveries. The system presented in [33] aims at anonymity of the smart meters by combining the data collected by each smart meter with an ortho code, in a ring architecture, and delivering it to the utility via an aggregator. The utility, without learning the identity of each smart meter, can obtain the metering data from the summation information processed by the aggregator. As the authors themselves mentioned, they only provide anonymity of the sender (smart meter).

A secure routing protocol for ad hoc networks is presented in [34], which enables anonymity of the source, destination and path. In this protocol, a source initiates and broadcasts a path request that includes a path sequence number and the encrypted destination address. The relay nodes only rebroadcast the path request after recording it. The destination responds (by unicast) to the path request, and the nodes along the path reserve the path by matching information about the previous and next hops. However, this protocol is vulnerable to the flow tracing attack.

In [35], a network coding based scheme is used for privacy preservation, which extends the work in [34] by providing source anonymity. At each intermediate node, the scheme forwards a randomized, encrypted Global Encoding Vector (GEV) that only the destination is capable of decrypting. The receiver has to decrypt the tags, form the transfer matrix, and carry out the heavy computation of the inverse matrix. The scheme presented in [36] also utilizes network coding to support security and privacy.

In [37], linear network coding is used to maintain the privacy of mobile nodes in a wireless mesh network environment. The proposed mechanism aims at flow untraceability and movement untraceability of the nodes. However, the proposal mainly pays attention to the information flow of the mobile nodes, and does not preserve the anonymity of the nodes, especially when an attacker is listening to the first mesh router that receives the data/packet from the mobile node.

The scheme proposed in [38] aims at flow anonymity of the data in order to provide anonymity of the communicating parties, by taking advantage of the mixing characteristic of the coding. Although the scheme concentrates on anonymity of the source and destination by hiding the flow identifiers through mixing the flows, it does not address other aspects of privacy.

SECTION IV

SYSTEM DESIGN

In this section, we first describe our assumptions. We then present our proposed enhanced network coding mechanism and describe our privacy-preserving scheme.

Assumptions and System Setup

Our assumptions are as follows:

  • A public key encryption system is in place, with a private key generator (PKG) responsible for key management. The details of the encryption system can be found in the literature, e.g., [17].
  • Nodes have already performed an authentication scheme. They have also received their private keys as well as the system parameters from the PKG.
  • The topology is almost static: for instance, in the case of the smart grid, the maximum movement of nodes is within a HAN, while the smart meter of the HAN is static.
  • A smart grid server, which can be in charge of the PKG duties as well, is aware of the topology and graph of the network.

A. Enhanced Network Coding

As shown in Fig. 3, the system administrator divides the main topology/graph $G$ into $m$ sub-graphs $SubG_{i}$ (the solution proposed in [27] may be used for the sub-graphing) and forms the set of sub-graphs $\widetilde{SubGS}$ such that:

$$\widetilde{SubGS}=\{SubG_{i}\mid i=1,2,\ldots,m\} \tag{4a}$$

$$G=\bigcup_{i=1}^{m}SubG_{i}=\bigcup_{SubG_{i}\in\widetilde{SubGS}}SubG_{i} \tag{4b}$$

Fig. 3. Matrix of transfer, with sub-graphs.

In each sub-graph $SubG_{i}$, the system administrator selects $n_{s}$ nodes to be the network coding nodes, which perform network coding activities such as encoding. Furthermore, the system administrator nominates one of the nodes as the head cluster of the sub-graph, denoted $HC_{i}$.

We consider the set of transfer matrices $\widetilde{TS}$, in which $T_{i}$ represents the transfer matrix of $SubG_{i}$, such that:

$$\widetilde{TS}=\{T_{i}\mid i=1,2,\ldots,m\} \tag{5}$$

Similarly, we consider the set of inverse transfer matrices $\widetilde{TRS}$, in which $TR_{i}$ represents the inverse of the transfer matrix of the sub-graph $SubG_{i}$, such that:

$$\widetilde{TRS}=\{TR_{i}\mid i=1,2,\ldots,m\} \tag{6}$$

Furthermore, we introduce a new parameter “$\alpha_{i}$” as follows:

$$\alpha_{i}=\left\{\begin{array}{lll} 1, & \text{data crosses } SubG_{i} & \quad\text{(7a)}\\ 0, & \text{data does not cross } SubG_{i} & \quad\text{(7b)}\end{array}\right.$$

Finally, we define the “$h\times h$” transfer matrix $\widehat{T}$, which converts an input data matrix $\widehat{X}=\begin{bmatrix}x_{1} & x_{2} & \cdots & x_{h}\end{bmatrix}^{T}$ to the output data matrix $\widehat{Y}=\begin{bmatrix}y_{1} & y_{2} & \cdots & y_{h}\end{bmatrix}^{T}$, following (8a) and (8b).

$$\widehat{T}=\prod_{T_{i}\in\widetilde{TS}\,\&\,\alpha_{i}=1}T_{i},\quad i=1,2,\ldots,m \tag{8a}$$

$$\widehat{Y}=\widehat{T}\times\widehat{X} \tag{8b}$$

Similarly, at the receiver side, (9a) and (9b) are used to decode $\widehat{X}$ from $\widehat{Y}$. Note that $\widehat{TR}=\widehat{T}^{-1}$.

$$\widehat{TR}=\prod_{T_{i}\in\widetilde{TS}\,\&\,\alpha_{i}=1}T_{i}^{-1}=\prod_{TR_{i}\in\widetilde{TRS}\,\&\,\alpha_{i}=1}TR_{i},\quad i=1,2,\ldots,m \tag{9a}$$

$$\widehat{X}=\widehat{TR}\times\widehat{Y} \tag{9b}$$
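As a small worked illustration of (8a)–(9b), suppose $m=3$ and the data crosses only $SubG_{1}$ and then $SubG_{3}$, so that $\alpha_{1}=\alpha_{3}=1$ and $\alpha_{2}=0$. Matrix products do not commute in general, so the sketch below assumes that the matrices are multiplied in the order in which the sub-graphs are traversed; this ordering is an assumption made for the example and is not stated in the text.

```latex
\begin{aligned}
\widehat{T} &= T_{3}\,T_{1}, \qquad \widehat{Y} = T_{3}\,T_{1}\,\widehat{X},\\
\widehat{TR} &= \widehat{T}^{-1} = T_{1}^{-1}\,T_{3}^{-1} = TR_{1}\,TR_{3},\\
\widehat{TR}\,\widehat{Y} &= TR_{1}\,TR_{3}\,T_{3}\,T_{1}\,\widehat{X} = \widehat{X}.
\end{aligned}
```

Under this assumption the inverse factors appear in the reverse order of the forward factors, and $T_{2}$ does not enter either product.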

B. Privacy-Preserving Scheme

Referring to Section II, a receiver requires the LEVs of the graph through which the data has passed in order to compute the transfer matrix. In linear network coding, there are two parameters that can be changed, namely the network topology (path) and the coefficient factors (LEVs). One solution is to keep one of these two fixed and let the other change dynamically (in some cases, both can be dynamic). To be more precise, we can keep the topology (path) static and choose the coefficients randomly, in which case the coefficient information must be transferred securely to the receivers so that they can decode the data. Alternatively, we can fix the coefficients and choose the path randomly, in which case information about the path, or about the network coding nodes that have performed the network coding operation (encoding), must be transferred to the receiver.

Note that the LEV is a function of the coefficient factors [24]. Without loss of generality:

$$T_{i}=Function(LEV_{SubG_{i}}),\quad i=1,2,\ldots,m \tag{10}$$

Since we keep the sub-graph structure fixed, only the coefficients are needed to compute the transfer matrices of the sub-graphs, which the server is capable of doing. From an abstract point of view, in our system we keep the topology, the node coefficients and the structure of the sub-graphs fixed, while the sub-graphs that the data crosses are selected randomly. The phases of our mechanism are as follows:

Algorithm 1

1. Phase I: Setup

First (Algorithm 1), the PKG provides a one-way hash function $F_{coef}(.)$ to the nodes. Each node applies $F_{coef}(.)$ to its own private key to obtain its coefficient (11):

$$Node\_Coefficient=F_{coef}(Node\_PrivateKey) \tag{11}$$
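A minimal sketch of (11) follows. The text only requires $F_{coef}(.)$ to be a one-way hash function; the use of SHA-256, the field size `P` and the mapping to a nonzero field element are assumptions made here for illustration, and the key value is a placeholder.

```python
import hashlib

P = 257  # assumed prime field size for the coding coefficients

def f_coef(private_key: bytes) -> int:
    """Hypothetical F_coef: hash the private key, then map the digest
    to a nonzero element of GF(P)."""
    digest = hashlib.sha256(private_key).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

node_private_key = b"example-private-key"   # placeholder, not a real key
node_coefficient = f_coef(node_private_key)
```

Because the PKG also holds each private key, it can evaluate the same function and rebuild every node coefficient without any coefficient being transmitted.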

In a PKI-based system, only the PKG and the node itself know the private key of the node. The system administrator provides the PKG with all information about the topology and the graph, including the participating nodes in each sub-graph. The PKG calculates $T_{i}$ and $T_{i}^{-1}$ for each $SubG_{i}$ and provides the $T_{i}^{-1}$s to a destination.

Note that a private key can be considered a random secret value managed by the PKG. For instance, in an identity-based cryptography approach such as [39], the private key of a node is the product of a secret random value generated by the PKG and the public key of the node. Since the coefficient is a function of the private key (11), the coefficient inherits this randomness and, referring to [24], $T_{i}$ is invertible.

Since $F_{coef}(.)$ is a one-way function, even if one of the receivers acts maliciously, an attacker would not be able to use the matrix $T_{i}^{-1}$ and perform a reverse operation to obtain the private keys of the nodes. We discuss this further in Section V. Furthermore, a private key is a dynamic value [17]; therefore, the transfer matrices $T_{i}$ (and $T_{i}^{-1}$) are also dynamic. Note that the PKG is responsible for maintaining and updating the matrices and for informing the receivers. For instance, in the case of the smart grid, the smart grid servers that collect the data should be notified by this server (PKG).

Algorithm 2

2. Phase II: Generating and Sending the Packets

As presented in Algorithm 2, a sender chooses a nonce and assigns it to $TAG$, and chooses a random nonce identity for $TAG$, denoted $ID_{TAG}$. The sender then chooses one of the adjacent sub-graphs with equal probability to send the data. Next, the sender forms the data header, including the nonce values and the address of the receiver. Furthermore, the sender signs the header with its own private key to provide source authentication as well as data header integrity. Finally, the sender sends the encrypted data (packets) and data header, the signature of the data header, and the plain form of the tag and its ID to the next sub-graph toward the receiver.

3. Note

$TAG$ is a bit array that travels with the data. Each bit of $TAG$ represents the $\alpha_{i}$ of one sub-graph ((7a) and (7b)). To be more precise, the $i$th bit of the array is set to one if the data passes through $SubG_{i}$; conceptually, therefore, $TAG$ initially consists of only zeros ($TAG=0$). Since $TAG$ is sent in plain form, we load it with a nonce value instead and forward the nonce (encrypted) to the destination. Then, in each sub-graph, the head cluster only inverts the value of the $i$th bit; in other words, it XORs this bit with $\alpha_{i}$. Consequently, the destination only needs to XOR the result with the original nonce value to decrypt the tag and obtain the list of sub-graphs that the data has passed through. Compared with the network coding operations, especially at the receiver, changing one bit per sub-graph adds negligible overhead.
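The tag handling described in this note can be sketched as follows. The function names, the tag width `M` and the example path are hypothetical; only the XOR masking itself comes from the text.

```python
import secrets

M = 8  # assumed number of sub-graphs m, i.e. the tag width in bits

def new_tag(m: int = M) -> int:
    """Sender side: the plain TAG starts out as a random m-bit nonce."""
    return secrets.randbits(m)

def mark_subgraph(tag: int, i: int) -> int:
    """Head cluster HC_i: flip the i-th bit (1-based), i.e. XOR it with alpha_i = 1."""
    return tag ^ (1 << (i - 1))

def crossed_subgraphs(tag: int, nonce: int, m: int = M) -> list:
    """Receiver side: XOR with the original nonce, then read off the set bits."""
    alpha = tag ^ nonce
    return [i for i in range(1, m + 1) if (alpha >> (i - 1)) & 1]

nonce = new_tag()
tag = nonce
for i in (2, 5, 7):            # data crosses SubG_2, SubG_5 and SubG_7
    tag = mark_subgraph(tag, i)
path = crossed_subgraphs(tag, nonce)   # [2, 5, 7]
```

An eavesdropper who sees only `tag` observes a value that looks random, since the nonce itself travels encrypted in the header.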

4. Note

Referring to our discussion of network coding in Section II, the coefficient that each network coding node uses in the coding process normally needs to be sent to the receiver for the decoding process (performed by the receiver). In our design, we eliminate this overhead data (the coefficients) at the cost of sending the tag and the tag identity. In fact, the tag ID is similar to the flow ID already used by network coding, so our additional overhead is the tag itself. The overhead of sending the tag is much lower than that of sending the coefficients, since in network coding there is one coefficient per network coding node, whereas we have only one tag from source to destination.

Algorithm 3

5. Phase III: Relaying the Packets

As shown in Algorithm 3, we consider the situation in which the data is entering $SubG_{i}$. The data passes through $SubG_{i}$ according to the defined connections and the coefficient values of the nodes (the network coding nodes have already been identified by the administrator). The head cluster of the sub-graph needs to record $\alpha_{i}$ in $TAG$ by changing the $i$th bit of $TAG$. As in the previous step (sending data), the head cluster of the sub-graph $SubG_{i}$ randomly selects one of its neighbour sub-graphs through which to forward the data toward the receiver.

6. Note

Since the next sub-graph is chosen randomly, the data may enter the same sub-graph more than once. To prevent this looping, the head cluster of the sub-graph ($HC_{i}$) consults the identity of the tag ($ID_{TAG}$). Specifically, $HC_{i}$ keeps a record of each $ID_{TAG}$ processed by the sub-graph, together with the IDs of the sub-graphs from which it was received and to which it was sent, for some time in order to avoid processing it twice. A reasonable expiry time for the record is the periodic collection interval of the smart meters, e.g., 15 minutes. In this case, the assumption is that the data will be received and decoded by the receivers within 15 minutes. Therefore, first, $HC_{i}$ does not send the processed (coded) information back to the sub-graph from which it came. Second, if it receives the same data ($ID_{TAG}$) from another sub-graph, it forwards the data as-is, without coding it again, to the next randomly chosen sub-graph, excluding the sub-graphs from which the data was received and to which it was previously sent. In the worst-case scenario, the data reaches the destination after being processed by every sub-graph exactly once.
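The record keeping described in this note might look like the sketch below. The class name, its methods and the in-memory dictionary are hypothetical; the 15-minute expiry and the exclusion rules follow the text.

```python
import random
import time

RECORD_TTL = 15 * 60  # seconds; the 15-minute example expiry from the text

class HeadCluster:
    """Hypothetical model of how HC_i handles repeated ID_TAG values."""

    def __init__(self, neighbours, rng=None, clock=time.monotonic):
        self.neighbours = set(neighbours)   # IDs of adjacent sub-graphs
        self.records = {}                   # id_tag -> (expiry, excluded sub-graph IDs)
        self.rng = rng or random.Random()
        self.clock = clock

    def handle(self, id_tag, came_from):
        """Return (should_code, next_subgraph) for an arriving packet."""
        now = self.clock()
        # Forget records whose expiry time has passed.
        self.records = {k: v for k, v in self.records.items() if v[0] > now}
        first_visit = id_tag not in self.records
        expiry, excluded = self.records.get(id_tag, (now + RECORD_TTL, set()))
        excluded.add(came_from)             # never send back where it came from
        candidates = sorted(self.neighbours - excluded)
        next_hop = self.rng.choice(candidates) if candidates else None
        if next_hop is not None:
            excluded.add(next_hop)          # never reuse an earlier next hop
        self.records[id_tag] = (expiry, excluded)
        # Code the data only on the first visit; later visits forward it as-is.
        return first_visit, next_hop

hc = HeadCluster(neighbours={1, 2, 3, 4}, rng=random.Random(1))
coded, nxt = hc.handle("tag-42", came_from=1)
coded_again, nxt2 = hc.handle("tag-42", came_from=2)
```

On the second arrival of the same `ID_TAG`, the sketch returns `should_code = False` and picks a neighbour that was neither a previous source nor a previous next hop.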

Algorithm 4

7. Phase IV: Receiving and Decoding the Packets

As presented in Algorithm 4, when a receiver receives the data, it:

  • Uses its own private key to decrypt the header and obtain the addresses of the sender and receiver, and the nonce.
  • Verifies the signature using the sender address and, if it is valid, XORs the nonce with the received tags to decrypt them.
  • Selects, according to the bit values of $TAG$, the $T_{i}^{-1}~(TR_{i})$ of the sub-graphs that the data has passed through, and multiplies them together to obtain the inverse of the path transfer matrix, $\widehat{TRS}$, via (9a).
  • Obtains the original packets sent by the sender via (9b).
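The last two steps can be illustrated with a small sketch. It is not a reproduction of (9a) and (9b), whose exact form is not restated here. It assumes that $TAG$ has already been unmasked, that the data crossed the sub-graphs in increasing index order, and that arithmetic is done modulo a small prime `P` rather than in the paper's field. The matrices `T`, the helper names, and the 2x2 size are all hypothetical. Header decryption and signature verification are omitted. The three-argument `pow` with exponent -1 requires Python 3.8 or later.

```python
P = 257  # hypothetical prime modulus; stands in for the paper's finite field


def matmul(a, b):
    """Multiply two matrices with entries reduced modulo P."""
    return [[sum(x * y for x, y in zip(row, col)) % P for col in zip(*b)]
            for row in a]


def inverse_2x2(m):
    """Invert a 2x2 matrix modulo P (its determinant must be non-zero mod P)."""
    (a, b), (c, d) = m
    det_inv = pow((a * d - b * c) % P, -1, P)
    return [[d * det_inv % P, -b * det_inv % P],
            [-c * det_inv % P, a * det_inv % P]]


def decode(received, tag, tr):
    """Multiply the stored inverses TR_i selected by the set bits of tag, then apply them."""
    trs = [[1, 0], [0, 1]]
    for i in sorted(tr):
        if (tag >> i) & 1:
            trs = matmul(trs, tr[i])
    return matmul(trs, received)


# Hypothetical per-sub-graph transfer matrices T_i and their stored inverses TR_i.
T = {0: [[1, 2], [3, 4]], 1: [[2, 0], [1, 3]], 2: [[5, 1], [1, 1]]}
TR = {i: inverse_2x2(m) for i, m in T.items()}

original = [[10, 20], [30, 40]]
# The data crosses sub-graph 0 and then sub-graph 2, so bits 0 and 2 of TAG are set.
received = matmul(T[2], matmul(T[0], original))
tag = 0b101
decoded = decode(received, tag, TR)  # equals original under the stated assumptions
```

The point of the sketch is that the receiver needs only $TAG$ and its stored $TR_{i}$ values; no coding coefficients travel with the packet.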
SECTION V

SYSTEM EVALUATION

In this section, we analyze our scheme from the privacy and system performance points of view. We first propose two adversary models, then compare the privacy measures delivered by our scheme with those in the literature, and finally, in the communication and network performance subsection, discuss the complexity and reliability of our design.

A. Adversary Models

We refer to the Dolev-Yao model [40] to design our two adversary models, an external and an internal adversary, for the smart grid system.

1. External Adversary

In this case, the adversary is an external party and is not an entity of the system.

Objectives: The adversary's objective is to obtain information about the occupancy of the HAN and the behaviour of its residents.

Initial capabilities: The adversary knows the details of the initial security system as well as of our proposed privacy mechanism. For instance, the adversary knows the public keys of all parties and has detailed knowledge of the network topology, graph and sub-graphs. Furthermore, the adversary knows the detailed design of our mechanism, including Algorithms 1–4. Finally, the adversary has sufficient technical knowledge and equipment to listen to the channels and analyze the traffic.

Capabilities during the attack: The adversary receives all packets entering a HAN (the smart meter of the HAN) and departing from it. Besides, the adversary can listen to the channel of any other entity of the system, such as the PKG or any destination, to collect the data it receives.

2. Note

By the term data, we refer to the exact data that is in the channels (encrypted and/or encoded).

Discussion: By our assumption, a HAN gateway (smart meter) acts as a relay node in a mesh-based topology. We also perform enhanced network coding that mixes the packets using sub-graphs. Since the source and destination addresses are encrypted inside the header, our scheme delivers anonymity and undetectability, which together yield unobservability. If the adversary listens to the data entering and departing a HAN, he does not gain any useful information, since the entering packets and the HAN packet are encoded into one packet, which hides the HAN packet. If the origin of a packet is an appliance, listening to the channel does not help the adversary learn anything about the existence of the appliance (undetectability over appliances). By contrast, in the schemes proposed in the literature (Section I), the adversary can tell that a HAN is generating a packet by listening to the first node, so those schemes mostly provide only a private path.

The packets entering a smart meter to be relayed also do not carry the source address, and they enter the sub-graphs randomly. Therefore, the adversary cannot trace back the packets or monitor the flow of the data; since he cannot observe the direction of the data, unlinkability is delivered.

The last position for the adversary is at the receiver side, listening to the received data. Given the above discussion about the hidden address of the receiver, he only observes the flow of information to the destination. Moreover, since the data travels through randomly chosen sub-graphs to reach the destination, he cannot trace back the data. Consequently, our scheme maintains anonymity and unlinkability here too.

Note that in any of the above situations, gaining access to $TAG$ does not help the adversary. Encoding $TAG$ with a random nonce allows sub-graphs to insert $\alpha_{i}$ without decoding $TAG$, so he learns nothing from an encoded $TAG$, even at the first or last sub-graph.
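The property used here, that a sub-graph can record its bit in a nonce-masked $TAG$ without removing the mask, follows from XOR being its own inverse. The sketch below illustrates it under stated assumptions: the 16-bit tag width, the all-zero initial tag, and the function names are hypothetical and not taken from the paper.

```python
import secrets

TAG_BITS = 16  # hypothetical tag width: one bit per sub-graph


def mask(tag, nonce):
    """XOR the tag with the nonce; applying it twice restores the tag."""
    return tag ^ nonce


def record_subgraph(masked_tag, i):
    """Change bit i of the masked tag; the nonce is not needed for this."""
    return masked_tag ^ (1 << i)


nonce = secrets.randbits(TAG_BITS)
masked = mask(0, nonce)            # sender masks an (assumed) all-zero TAG
for i in (3, 7, 12):               # sub-graphs crossed along the path
    masked = record_subgraph(masked, i)

recovered = mask(masked, nonce)    # receiver removes the nonce
visited = [i for i in range(TAG_BITS) if (recovered >> i) & 1]
```

After unmasking, `visited` lists exactly the sub-graphs that changed their bit, while an observer without the nonce sees only the masked value.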

3. Internal Adversary

The adversary is an internal party, e.g., he has access to one of the HANs and can, in particular, monitor the gateway of the HAN or analyze the gateway information.

Objectives: Gaining access to the information of neighbouring HANs by receiving their data for relay.

Initial capabilities: The malicious node is already authenticated and has received the system parameters and its own private key, so the adversary has this information.

Capabilities during the attack: The malicious node is under the control of the adversary and performs Algorithm 3.

Discussion: Having access to a malicious node only improves the adversary's ability to modify the data of his own HAN. The relay nodes only mix the packets and do not perform any encryption or decryption. Furthermore, the data that he receives shows no sign of the source or destination. Consequently, his capability and behaviour are almost the same as in the previous scenario.

B. Privacy Performance Analysis

Referring to Sections II and III as well as our proposal in Section IV, Table I compares the performance of our scheme with that of the schemes discussed in Section I. We consider two types of attackers: a neighbour and a relay node. Some of the schemes may deliver anonymity against relay nodes; however, the data is not anonymous to a neighbour. We use the following symbols to describe each deliverable:

  • “$\mathsf{X}$”: Does not deliver the measure.
  • “•”: Delivers the measure only against relay nodes.
  • “$\checkmark$”: Delivers the measure against all nodes.
TABLE I. Delivery of the Privacy Measures

C. Communication and Network Performance Analysis

In this subsection, we analyze and evaluate the proposed approach in terms of probability of success, complexity, intrusion success likelihood, and reliability. Throughout the discussion we consider a square grid network topology. The communication performance of our proposed coordinated method is evaluated against the random network coding approach of [41], whose authors claim a throughput gain over no coding. However, while network coding approaches have advantages, their success depends strongly on the characteristics of the topology. In the method of [41], nodes continuously replicate and forward messages to newly discovered nodes.

1. Complexity

One overhead of network coding is that nodes must have the processing capability to perform arithmetic operations over finite fields in real time. This processing determines whether a decoded content chunk is innovative, and the node then decides whether to encode, forward, or decode. The processing complexity of operations over fields depends on the size of each generation, $h$, and the size of the field, $n$. Linear operations with generations of size $h$ take $O(h^{2})$ operations in $F_{2^{n}}$. Multiplications and inversions over the field $F_{2^{n}}$ are of complexity $O(n^{2})$. Furthermore, matrix inversion and Gaussian elimination to solve the system take $O(h^{3})$.
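To make the innovation check and its cost concrete, the sketch below performs incremental Gaussian elimination on coding vectors. As a simplifying assumption it works over $F_{2}$ (the case $n=1$), where each vector is a Python integer bitmask and addition is XOR; the function name and data layout are hypothetical.

```python
def insert_if_innovative(basis, vector):
    """Reduce a coding vector against the basis and store it if it adds rank.

    basis maps a leading-bit position to a reduced row (an int bitmask).
    Returns True if the vector was innovative, False if it was redundant.
    """
    v = vector
    while v:
        lead = v.bit_length() - 1      # position of the highest set bit
        if lead not in basis:
            basis[lead] = v            # new pivot: the vector is innovative
            return True
        v ^= basis[lead]               # eliminate the leading bit (addition in F_2)
    return False                       # reduced to zero: linearly dependent


basis = {}
results = [insert_if_innovative(basis, v) for v in (0b101, 0b011, 0b110, 0b100)]
decodable = len(basis) == 3            # full rank for a generation of size h = 3
```

Each insertion performs at most $h$ row reductions of $h$ bits each, and a generation needs $h$ innovative vectors, which is consistent with the $O(h^{3})$ figure quoted for Gaussian elimination.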

As shown in Fig. 4, the computing cost of our method is lower, since the transfer matrix at the receiver is implied and need not be recalculated every interval. The computational cost of our algorithm is also reduced because enhanced network coding is performed only on a selected set of nodes within each cluster.

Fig. 4. Cost of computing.

2. Reliability

Our method aims to minimize the number of nodes that must perform network coding operations. Therefore, we can take advantage of opportunities for fixed network coding where possible. Intuitively, as the system size increases, random network coding on a large number of nodes increases the overall computational complexity and degrades the overall probability of success.

The probability that a random network coding problem is solvable depends on whether the global coding vector has full rank. If the coefficients are chosen randomly from a field $F_{q}$, then the probability that a generation is invalid is at most ${{\vert T\vert}\over{q}}$. The extension of the Schwartz-Zippel theorem yields the probability of success at each random coded node as follows:
$$Pr(success)=\left(1-{{\vert T\vert}\over{q}}\right)$$
where $Pr(success)$ is the probability of success within the cluster of random network coding. The following theorem from [25] states the probability of success of a valid network code.

Theorem 5.1

The probability that a random network code with coefficients from the field $F_{q}$ is valid and successfully decoded in a multicast connection problem with $\vert T\vert$ receivers and $\vert S\vert$ sources is $(1-{{\vert T\vert}\over{q}})^{\eta}$, where $q>\vert S\vert$ and $\eta$ is the number of intermediate links with associated random coefficients.
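The expression in Theorem 5.1 is easy to evaluate numerically. The following sketch does so; the function and parameter names are hypothetical, and the guard on the arguments is only there to keep the base of the power within $(0,1]$.

```python
def success_probability(num_receivers, field_size, num_coded_links):
    """Evaluate (1 - |T|/q)^eta from Theorem 5.1."""
    if not 0 <= num_receivers < field_size:
        raise ValueError("expected 0 <= |T| < q")
    if num_coded_links < 0:
        raise ValueError("eta must be non-negative")
    return (1 - num_receivers / field_size) ** num_coded_links


# Two receivers, coefficients drawn from a field of size 256.
few_links = success_probability(2, 256, 10)
many_links = success_probability(2, 256, 100)
larger_field = success_probability(2, 65536, 100)
```

The value drops as the number of randomly coded links $\eta$ grows and rises with the field size $q$, which is the effect that limiting the number of random coding nodes is meant to exploit.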

As depicted in Fig. 5, in contrast to the base-case scenario, where random network coding is used, our proposed method uses a fixed network coding approach in which the coefficients depend on the private key. Therefore, the uncertainty about the existence of a solution for the system is resolved.

Fig. 5. Probability of success.
SECTION VI

CONCLUSION

In this paper, we have proposed a privacy-preserving approach for the smart grid system, an application of cyber-physical systems. We developed an enhanced network coding technique for packet routing that hides the source, destination, path, traffic volume and content of the packets. For this purpose, we introduced the concept of sub-graphing the network and used a subset of the sub-graphs to transfer the data, which reduces energy consumption and system complexity. We also eliminated the need to send the coefficients of the network coding nodes to the receiver for decoding, which saves bandwidth. We have shown that our scheme maintains multiple favourable privacy-preserving measures, namely anonymity, unlinkability, undetectability and unobservability, for communications over the advanced metering infrastructure. We evaluated the performance of our scheme using both simulation and analysis. Our results show that the proposed scheme provides reliability to the system without adding much complexity.

Footnotes

This work was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada under Grant STPGP 396838, and by the National Center for Electronics, Communications, and Photonics at King Abdulaziz City for Science and Technology in Saudi Arabia.

Authors

Hasen Nicanfar

Hasen Nicanfar (S'11) received the Ph.D. degree from the Department of Electrical and Computer Engineering, University of British Columbia, the B.A.Sc. degree in electrical engineering from the Sharif University of Technology in 1993, and the M.A.Sc. degree in computer networks from Ryerson University in 2011. From 1993 to 2010, he was involved in different positions such as an IT/ERP Manager, a Project Manager, and Business and System Analyst. His research interests are in the areas of trust, security and privacy in wireless communication, computer network, and cloud computing.

Peyman Talebifard

Peyman Talebifard (S'08) received the B.Eng. degree with high distinction in communications engineering from Carleton University in 2006, and was awarded the Senate Medal for high academic achievements. He attended graduate school at the University of British Columbia (UBC) and received the M.A.Sc degree in electrical and computer engineering in 2008. He is currently pursuing the Ph.D. degree with the Electrical and Computer Engineering Department, UBC. His research includes design and analysis of architectures, protocols, and management, control and solutions for interworking of heterogeneous wireless access networks and next generation networks for reliable, efficient, and cost effective communications in telecommunication and computer networks.

Amr Alasaad

Amr Alasaad (S'09–M'13) is an Assistant Professor with the National Center for Electronics, Communications and Photonics, King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia. His research interests are in the broad areas of wireless networks and mobile systems, P2P resource sharing protocols, routing and scheduling in wireless mesh networks, and content sharing and replication schemes. He received the B.Sc. degree in electrical engineering from King Saud University, the M.S. degree in electrical and computer engineering from the University of Southern California, and the Ph.D. degree in electrical and computer engineering from the University of British Columbia in 2000, 2005, and 2013, respectively.

Victor C. M. Leung

Victor C. M. Leung (S'75–M'89–SM'97–F'03) is a Professor of electrical and computer engineering and holds the TELUS Mobility Research Chair at the University of British Columbia. He has contributed more than 600 technical papers and 25 book chapters in the areas of wireless networks and mobile systems. He was a Distinguished Lecturer of the IEEE Communications Society. He has been serving on the editorial boards of the IEEE Transactions on Computers, the IEEE Wireless Communications Letters and several other journals, and has served on the organizing and technical program committees of numerous conferences. He was a winner of the 2012 UBC Killam Research Prize and the IEEE Vancouver Section Centennial Award.
