SECTION I

## INTRODUCTION

Rapid developments in computer science and information and communication technology, along with advances in the control of physical systems, have given rise to a new direction of multidisciplinary engineering systems called Cyber-Physical Systems (CPS) [2]–[4]. This revolutionary area of science enables humans to interact with and control the environment more efficiently and effectively: “CPSs will transform how humans interact with and control the physical environment to the greater benefit of society” [4]. Whether the control system is fully or semi-automatic, it relies on collected data to make control decisions for the physical and control interactions, or as part of the system feedback loop [5]. Normally, the fine-grained data gathered/sensed by sensing devices, e.g., sensors or metering devices [6], are transferred to the monitoring/controlling parties for further action, e.g., to be processed and used to make control decisions. There are many examples of CPS applications, such as health care [3], manufacturing automation, energy (smart grid), agriculture, defense and transportation [4], [6]–[9], to name a few. In this paper, we describe our schemes and designs specifically for the case of the smart grid.

The smart grid aims to improve power generation, transmission, distribution and consumption through the contributions and collaboration of different stakeholders such as the utility sector, service providers and consumers [10], [11]. In all systems and applications that follow a demand-response architecture, such as the advanced metering infrastructure (AMI) used in a smart grid system, information about actual or planned power consumption is a key element [12]. In this case, smart meters periodically collect live metering data from end users, e.g., home area networks (HANs). This information is then transmitted to the utility via the AMI for billing purposes. The service provider also uses this information as a reference to plan service delivery efficiently [13]. Furthermore, the energy management system uses this fine-grained information to provide users with the real-time price (tariff) of power, allowing consumers to take advantage of low-price periods. This motivates consumers to shift their power demand to off-peak hours, using power more efficiently and decreasing their monetary costs [10].

Different communication technologies, such as power line communication and wireless communication, have been proposed for the AMI [14]. In North America, wireless multi-hop communication technologies (e.g., ad-hoc and mesh networks) have been proposed for exchanging data and control messages over the AMI between smart meters or HAN gateways and the utility [1], [13]–[17]. In this case, data traffic is transmitted from a smart meter to the utility and vice versa over multi-hop wireless links, with intermediate network nodes forwarding the traffic (Fig. 1).

Fig. 1. Smart grid network architecture.

Contribution: Our proposed schemes address the problem of preserving user privacy in a smart grid system while maintaining all the features required for privacy in such a system, including anonymity, unlinkability, undetectability and unobservability of communications.

None of the existing schemes in the literature addresses all these properties simultaneously. We identify five privacy measures for CPS communication, namely hiding the source, destination, path, traffic volume and content. We address this problem using an enhanced network coding technique; our proposed schemes benefit from the ability of network coding to transmit linear combinations of packets, which encodes the transmitted data.

We review our definition of privacy in the smart grid context and provide background on network coding in Section II, followed by a literature review in Section III. Our proposed schemes are presented in Section IV, and we analyze their performance in Section V. We conclude the paper in Section VI.

SECTION II

## BACKGROUND

### A. Definition of Privacy

Several definitions of privacy have been proposed. Bob Blakley defines privacy as “The ability to lie about yourself and get away with it” [20] or as “The right to be left alone”; the latter definition has been adopted by NIST [21]. Pfitzmann and Hansen identified six privacy features [19], as follows:

#### 1. Anonymity

The most widely used privacy feature in the literature is anonymity. “Anonymity of a subject means that the subject is not identifiable within a set of subjects, the anonymity set” [19]. The main goal of anonymity is to hide the identity of a party from others, even from a peer. Two forms of anonymity are defined: sender anonymity and receiver anonymity.

#### 2. Unlinkability

The inability to distinguish whether two items in a system are related is referred to as unlinkability. Unlinkability is required for different items in the smart grid, such as a smart device, a smart meter, the controller of a HAN, Building Area Network or Neighborhood Area Network, an aggregator, a system/sub-system (located in the cloud or in any of the smart grid servers) or a group (such as a multicast group).

#### 3. Undetectability

Undetectability of an item (entity, application or process) from an adversary's perspective means that the adversary is not able to sufficiently distinguish whether the item exists or not.

#### 4. Unobservability

Unobservability of an item (entity, application or process) means, first, undetectability of the item against all subjects uninvolved in it and, second, anonymity of the subject(s) involved in the item, even against the other subject(s) involved in that item.

#### 5. Pseudonymity

“A pseudonym is an identifier of a subject which is different from the subject's real names”. For instance, a smart meter can have multiple identities, each known to a party with which the smart meter communicates. Depending on the relationship and link between a pseudonym and its holder, a pseudonym can be a person pseudonym, role pseudonym, relationship pseudonym, role-relationship pseudonym or transaction pseudonym.

#### 6. Identity Management

Entities of a system that follows a pseudonymity approach have multiple identities, each of which can be based on one or more attributes of the entity. Identity management is the task of assigning and controlling these identities in a way that keeps the item unidentifiable by any unauthorized party.

### B. Network Coding

Network coding has been widely used to improve the robustness and bandwidth efficiency of multicast routing in special network topologies. In addition, the inherent packet-encoding (mixing) feature of network coding can be exploited to provide privacy for users in a smart grid. Furthermore, the distributed nature of network coding increases its robustness against possible attacks. The simplest coding scheme is linear coding [22], [23]. Linear network coding treats a block of data as a vector over a certain base field of coefficients. Each intermediate node performs a linear transformation, computing a linear combination of the packets on its incoming edges before delivering the result to the next node(s).
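
The operation of a single intermediate node can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that packet symbols are elements of the prime field GF(257) (the field size `P` and the function name `encode` are hypothetical choices), and it shows a node forming the symbol-wise linear combination of its incoming packets using its local coefficients.

```python
# Minimal sketch of an intermediate node in linear network coding.
# Assumption: symbols are elements of the prime field GF(P), P = 257.
P = 257


def encode(packets, coeffs, p=P):
    """Return the symbol-wise linear combination sum_j coeffs[j] * packets[j] mod p.

    packets: list of equal-length packets, each a list of field elements
    coeffs:  the node's local coefficients, one per incoming packet
    """
    if len(packets) != len(coeffs):
        raise ValueError("need exactly one coefficient per incoming packet")
    length = len(packets[0])
    out = [0] * length
    for c, pkt in zip(coeffs, packets):
        if len(pkt) != length:
            raise ValueError("all packets must have the same length")
        for k, symbol in enumerate(pkt):
            out[k] = (out[k] + c * symbol) % p
    return out


if __name__ == "__main__":
    x1 = [10, 20, 30]
    x2 = [1, 2, 3]
    print(encode([x1, x2], [3, 5]))  # [35, 70, 105]
```

Working modulo a prime keeps every coded symbol the same size as the original symbols, which is why practical schemes use finite fields rather than ordinary integers.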

Network coding is used in communication to maximize throughput, minimize energy per bit and minimize delay [24]. At the encoding nodes, a linear combination of the received packets is transmitted using a linear coding coefficient vector, or Local Encoding Vector (LEV). The Global Encoding Vector (GEV), which expresses each transmitted packet as a linear combination of the original source packets, is used to form the transfer matrix of the entire system. Practical instances of network coding involve the following: (i) random coding [25], which allows encoding to be done in a distributed fashion; (ii) tagging each packet with its LEV, which allows decoding to be done in a distributed manner; and (iii) buffering, which is required for asynchronous packet arrivals and departures with arbitrarily varying rates, delay, and loss.

Let us assume an acyclic network $(V, E, c)$ with unit-capacity edges, $c(e)=1$ for all $e\in E$. Let $x_{1},x_{2},\ldots, x_{h}$ be the $h$ packets that the network, viewed as a whole, is to carry. Taking the coefficients of all nodes $v\in V$ into account and assuming an $h\times h$ model, (1) gives the relationship between the received packets ($y_{i}$s) and the sent packets ($x_{i}$s). The matrix $T$ in (2) is called the transfer matrix of the network; a receiver can therefore use (3) to recover the original $x_{i}$ from the $y_{i}$. $T$ depends on the coefficients of the nodes and must be invertible; choosing the coefficients randomly from a sufficiently large field makes $T$ invertible with high probability.

$$\begin{bmatrix} y_{1}\\ \vdots\\ y_{h}\end{bmatrix}=\begin{bmatrix} t_{1}(e_{1}) & \cdots & t_{h}(e_{1})\\ \vdots & \ddots & \vdots\\ t_{1}(e_{h}) & \cdots & t_{h}(e_{h})\end{bmatrix}\times\begin{bmatrix} x_{1}\\ \vdots\\ x_{h}\end{bmatrix}\tag{1}$$

$$T=\begin{bmatrix} t_{1}(e_{1}) & \cdots & t_{h}(e_{1})\\ \vdots & \ddots & \vdots\\ t_{1}(e_{h}) & \cdots & t_{h}(e_{h})\end{bmatrix}\tag{2}$$

$$\begin{bmatrix} y_{1}\\ \vdots\\ y_{h}\end{bmatrix}=T\times\begin{bmatrix} x_{1}\\ \vdots\\ x_{h}\end{bmatrix}\;\Rightarrow\;\begin{bmatrix} x_{1}\\ \vdots\\ x_{h}\end{bmatrix}=T^{-1}\times\begin{bmatrix} y_{1}\\ \vdots\\ y_{h}\end{bmatrix}\tag{3}$$
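
Equations (1) to (3) can be checked numerically. The sketch below assumes, for illustration only, that each packet $x_i$ is a single symbol of GF(257) and that the transfer matrix $T$ is already known; `mat_vec` and `mat_inv` are hypothetical helper names. It computes $y = Tx$ and recovers $x = T^{-1}y$ using Gauss-Jordan elimination modulo the prime.

```python
# Sketch of equations (1)-(3) over GF(P); P = 257 is an illustrative assumption.
P = 257


def mat_vec(T, x, p=P):
    """y = T x over GF(p), as in (1)."""
    return [sum(t * v for t, v in zip(row, x)) % p for row in T]


def mat_inv(T, p=P):
    """Invert a square matrix over GF(p) by Gauss-Jordan elimination, giving T^{-1} for (3)."""
    n = len(T)
    # Augment T with the identity matrix: [T | I]
    A = [[v % p for v in row] + [int(i == j) for j in range(n)] for i, row in enumerate(T)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("T is singular over GF(p)")
        A[col], A[pivot] = A[pivot], A[col]
        inv = pow(A[col][col], -1, p)  # modular inverse (Python 3.8+)
        A[col] = [v * inv % p for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [(a - factor * b) % p for a, b in zip(A[r], A[col])]
    # The right half of the augmented matrix now holds T^{-1}
    return [row[n:] for row in A]


if __name__ == "__main__":
    T = [[1, 2], [3, 4]]           # hypothetical transfer matrix, det = -2, nonzero mod P
    x = [7, 11]                    # sent symbols x_1, x_2
    y = mat_vec(T, x)              # received symbols, eq. (1)
    print(y)                       # [29, 65]
    print(mat_vec(mat_inv(T), y))  # [7, 11], eq. (3)
```

If the randomly drawn coefficients happen to make $T$ singular, `mat_inv` raises an error; in a random-coding setting the usual remedy is to draw new coefficients.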

As depicted in Fig. 2, since the transfer matrix $T$ is not fixed, owing to the dynamic and random nature of the coefficients, a receiver must compute $T^{-1}$ each time based on the received tags. To reduce the cost of computing (3), [26] proposes using sub-graphs to handle the traffic of different sources to different destinations. More specifically, the main graph is divided into parallel sub-graphs, and packets from a source to a destination traverse only one sub-graph. The aim in [27] is to find the minimum-cost multicast sub-graph, taking into account the delay associated with each link, the limited buffer size of intermediate nodes, and variations in link capacity over time.

Fig. 2. Matrix of transfer.

SECTION III

## RELATED WORK

Wayne Wolf proposed the concept of cyber-physical systems. He noted that the way computers are understood and used needs to change: “Cyberphysical systems actively engage with the real world in real time and expend real energy. This requires a new understanding of computing as a physical act, a big change for computing” [2]. The challenges of CPS design and deployment are studied in [3]. The authors identified global warming coupled with energy shortage, and the aging of the population, as targets of CPS, and identified the research challenges for CPS as real-time system abstractions, robustness of CPS, quality-of-service composition, and knowledge engineering. In [4], CPS are studied as a combination of multiple fields of science such as computing, communication and control systems. The author compared the evolution of CPS to that of the Internet and provided some real-world applications of CPS, e.g., the smart grid for the power sector. He also stated that privacy should be preserved by CPS: “These CPSs will have embedded and distributed intelligence, operating dependably, securely, safely, and efficiently in real time, while satisfying privacy constraints”. The author also presented advances in CPS, such as fully autonomous vehicles, smart power grids and extreme-yield agriculture, as well as the impact of CPS on society and education. Modeling of CPS is studied in [5], where the authors described CPS challenges caused by heterogeneity, concurrency, and sensitivity to timing, and modeled the dynamics as the evolution of a system state over time.

A survey of CPS in [7] presents a number of CPS and their features. The authors also described state-of-the-art CPS research in energy control, secure control, transmission and management, control techniques, system resource allocation, and model-based software design. They further described CPS research challenges in the areas of control and hybrid systems; sensor and mobile networks; robustness, reliability, safety, and security; abstractions, model-based development, and verification; and validation and certification.

The work in [6] considers the smart grid as an application of CPS, which is related to the scope of this paper. The research presented in [8] considers the security of the smart grid. The authors discussed the security aspects of the cyber-physical controls required to support the smart grid, taking the power application into account. They analyzed security from a risk point of view and addressed the security concerns in the control systems for the generation, transmission and distribution of power in the smart grid. Furthermore, they studied the security of the supporting infrastructure and devices, as well as security management and intrusion detection systems, followed by a list of research challenges in this area. In this paper, however, we focus on the privacy aspect of the smart grid. To the best of our knowledge, we are the first to propose comprehensive schemes that address all the features required to preserve the privacy of clients in a smart grid system.

The scope of the work in [9] is also the smart grid; the authors presented a security-oriented cyber-physical state estimation system. At each time instant, their proposed system identifies the compromised set of hosts in the cyber network and the maliciously modified set of measurements obtained from power system sensors. They used the concept of an intrusion detection system (IDS), which utilizes stochastic information fusion algorithms and merges sensor information from both the cyber and electrical infrastructures. The innovation of their work is using the IDS to monitor the cyber infrastructure for malicious or abnormal activity, in conjunction with knowledge of the communication network topology. Similarly, in [28], the authors concentrated on the effect of intrusion detection and response on the reliability of a CPS. They considered a CPS comprising sensors, actuators, control units, and physical objects for controlling and protecting a physical infrastructure. Their model is based on stochastic Petri nets and emulates the behavior of the CPS in the presence of malicious nodes exhibiting a range of attacker behaviors. They also proposed an intrusion detection and response system for detecting and responding to malicious events at runtime.

The scope of the work in [29] is the data center from the CPS point of view; the authors considered the control system of data centers versus the ICT system. Precisely, the proposed model considers a computational network representing the cyber dynamics and a thermal network representing the physical dynamics as two coupled networks in a control-oriented model. In [30], the target of the study is the safety, security and sustainability (S3) of CPS; the authors proposed a formal framework for representing cyber-physical interactions in a CPS and studied the challenges applicable to this framework. In [31], the authors provided a review of the historical development of CPS technology, as well as applications of CPS along with new research challenges and directions.

M. Stegelmann et al. proposed a scheme wherein a smart meter sends metering data to a local aggregator, which anonymizes the data before sending it to service providers [32]. Although the billing data is not anonymous, the same data is anonymous when it is sent to the service provider for planning. However, this scheme provides only source anonymity, and only for a portion of the data deliveries. The system presented in [33] aims at anonymity of the smart meters by combining the data collected by each smart meter with an ortho code, in a ring architecture, and sending it to the utility via an aggregator. Without identifying individual smart meters, the utility can obtain the metering information from the summation processed by the aggregator. As the authors themselves noted, they provide only sender (smart meter) anonymity.

A secure routing protocol for ad-hoc networks that provides anonymity of the source, destination and path is presented in [34]. In this protocol, a source initiates and broadcasts a path request that includes a path sequence number and the encrypted destination address. Relay nodes rebroadcast the path request only after recording it. The destination responds (by unicast) to the path request, and the nodes along the path reserve it by matching information about the previous and next hops. However, this protocol is vulnerable to flow tracing attacks.

In [35], a network-coding-based scheme is used for privacy preservation, which extends the work in [34] by providing source anonymity. At each intermediate node, the scheme forwards an encrypted Global Encoding Vector (GEV), built from random linear coefficients, which only the destination can decrypt. The receiver must decrypt the tags, form the transfer matrix, and carry out the computationally heavy matrix inversion. The scheme presented in [36] also utilizes network coding to support security and privacy.

In [37], linear network coding is used to maintain the privacy of mobile nodes in a wireless mesh network. The proposed mechanism aims at flow untraceability and movement untraceability of the nodes. However, the proposal mainly pays attention to the information flow of the mobile nodes and does not preserve the anonymity of the nodes, especially when an attacker is listening to the first mesh router that receives the data/packet from the mobile node.

The scheme proposed in [38] aims at flow anonymity, providing anonymity of the communicating parties by taking advantage of the mixing characteristic of network coding. Although the scheme concentrates on source and destination anonymity, hiding flow identities by mixing the flows, it does not address other aspects of privacy.

SECTION IV

## SYSTEM DESIGN

In this section, we first describe our assumptions. We then present our enhanced network coding mechanism and describe our privacy-preserving scheme.

#### Assumptions and System Setup

Our assumptions are as follows:

• A public-key encryption system with a private key generator (PKG) responsible for key management. Details of the encryption system can be found in the literature, e.g., [17].
• Nodes have already performed an authentication scheme and have received their private keys and the system parameters from the PKG.
• The topology is almost static: in the smart grid, for instance, node movement is confined to a HAN, while the smart meter of the HAN is static.
• A smart grid server, which can also be in charge of the PKG duties, is aware of the topology and graph of the network.

### A. Enhanced Network Coding

As shown in Fig. 3, the system administrator divides the main topology/graph $G$ into $m$ sub-graphs $SubG_{i}$ (for instance, using the sub-graphing solution proposed in [27]) and forms the sub-graph set $\widetilde{SubGS}$ such that:

$$\widetilde{SubGS}=\{SubG_{i}\mid i=1,2,\ldots,m\}\tag{4a}$$

$$G=\bigcup_{i=1}^{m}SubG_{i}=\bigcup_{SubG_{i}\in\widetilde{SubGS}}SubG_{i}\tag{4b}$$

Fig. 3. Matrix of transfer, with sub-graphs.

In each sub-graph $SubG_{i}$, the system administrator selects $n_{s}$ nodes to be network coding nodes, which perform network coding operations such as encoding. Furthermore, the system administrator nominates one of the nodes as the head cluster of the sub-graph, denoted by $HC_{i}$.

We define the set of transfer matrices $\widetilde{TS}$, in which $T_{i}$ is the transfer matrix of $SubG_{i}$:

$$\widetilde{TS}=\{T_{i}\mid i=1,2,\ldots,m\}\tag{5}$$

Similarly, we define the set of inverse transfer matrices $\widetilde{TRS}$, in which $TR_{i}$ is the inverse of the transfer matrix of the sub-graph $SubG_{i}$:

$$\widetilde{TRS}=\{TR_{i}\mid i=1,2,\ldots,m\}\tag{6}$$

Furthermore, we introduce a new parameter $\alpha_{i}$ as follows:

$$\alpha_{i}=1\quad\text{if the data crosses } SubG_{i}\tag{7a}$$

$$\alpha_{i}=0\quad\text{if the data does not cross } SubG_{i}\tag{7b}$$

Finally, we define the $h\times h$ transfer matrix $\widehat{T}$, which converts the input data matrix $\widehat{X}=\begin{bmatrix}x_{1} & x_{2} & \cdots & x_{h}\end{bmatrix}^{T}$ into the output data matrix $\widehat{Y}=\begin{bmatrix}y_{1} & y_{2} & \cdots & y_{h}\end{bmatrix}^{T}$ according to (8a) and (8b):

$$\widehat{T}=\prod_{T_{i}\in\widetilde{TS},\ \alpha_{i}=1}T_{i},\quad i=1,2,\ldots,m\tag{8a}$$

$$\widehat{Y}=\widehat{T}\times\widehat{X}\tag{8b}$$

Similarly, at the receiver side, (9a) and (9b) are used to decode $\widehat{X}$ from $\widehat{Y}$, where $\widehat{TR}=\widehat{T}^{-1}$. Because matrix multiplication is not commutative, the factors in (9a) must be multiplied in the reverse of the order used in (8a).

$$\widehat{TR}=\prod_{T_{i}\in\widetilde{TS},\ \alpha_{i}=1}T_{i}^{-1}=\prod_{TR_{i}\in\widetilde{TRS},\ \alpha_{i}=1}TR_{i},\quad i=1,2,\ldots,m\tag{9a}$$

$$\widehat{X}=\widehat{TR}\times\widehat{Y}\tag{9b}$$
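
The product in (8a) and its inverse in (9a) can be illustrated with two sub-graphs. Because matrix multiplication is not commutative, the order of the factors matters. In this minimal sketch, the dictionaries `T` and `TR` (keyed by sub-graph index) and all helper names are hypothetical, symbols are assumed to live in GF(257), and the order in which the data crosses the sub-graphs is assumed to be known to the receiver (the scheme as described records in $TAG$ only which sub-graphs were crossed). 2x2 matrices keep the inverse formula short.

```python
# Sketch of (8a)/(9a) over GF(P); P = 257, 2x2 matrices, and a known
# crossing order are illustrative assumptions.
P = 257


def mat_mul(A, B, p=P):
    """Matrix product A * B over GF(p)."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]


def identity(n):
    return [[int(r == c) for c in range(n)] for r in range(n)]


def inv2(T, p=P):
    """Inverse of a 2x2 matrix over GF(p) via the adjugate formula."""
    (a, b), (c, d) = T
    det = (a * d - b * c) % p
    if det == 0:
        raise ValueError("matrix is singular over GF(p)")
    k = pow(det, -1, p)
    return [[d * k % p, -b * k % p], [-c * k % p, a * k % p]]


def path_transfer(T, order):
    """(8a): data crossing SubG_a and then SubG_b sees T_b (T_a X),
    so each later sub-graph's matrix multiplies on the left."""
    T_hat = identity(len(T[order[0]]))
    for i in order:
        T_hat = mat_mul(T[i], T_hat)
    return T_hat


def path_inverse(TR, order):
    """(9a): TR_hat = T_hat^{-1}, so the TR_i are multiplied in the opposite order."""
    TR_hat = identity(len(TR[order[0]]))
    for i in order:
        TR_hat = mat_mul(TR_hat, TR[i])
    return TR_hat


if __name__ == "__main__":
    T = {1: [[1, 2], [3, 4]], 2: [[2, 1], [1, 1]]}  # hypothetical T_1, T_2
    TR = {i: inv2(Ti) for i, Ti in T.items()}        # TR_i = T_i^{-1}
    order = [1, 2]                                    # data crosses SubG_1, then SubG_2
    print(mat_mul(path_inverse(TR, order), path_transfer(T, order)))  # [[1, 0], [0, 1]]
```

Multiplying the same inverses in the other order, $TR_2 \times TR_1$, does not undo $T_2 \times T_1$ for these matrices, because $T_1 T_2 \neq T_2 T_1$.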

### B. Privacy-Preserving Scheme

As discussed in Section II, a receiver requires the LEVs of the graph over which the data has passed in order to compute the transfer matrix. In linear network coding, two parameters can be varied: the network topology (path) and the coefficients (LEVs). One option is to fix one of these and vary the other dynamically (in some cases, both can be dynamic). For instance, we can keep the topology (path) static and choose the coefficients randomly; in this case, the coefficient information must somehow be transferred securely to the receivers so that they can decode the data. Alternatively, we can fix the coefficients and choose the path randomly; in this case, information about the path, i.e., the network coding nodes that performed the encoding, must be transferred to the receiver.

Note that the LEV is a function of the coefficient factors [24]. Without loss of generality:

$$T_{i}=\mathrm{Function}\left(LEV_{SubG_{i}}\right),\quad i=1,2,\ldots,m\tag{10}$$

Since we keep the sub-graph structure fixed, only the coefficients are needed to compute the transfer matrices of the sub-graphs, which the server is able to do. From an abstract point of view, our system keeps the topology, the node coefficients and the sub-graph structure fixed, while the sub-graphs that the data crosses are selected randomly. Our mechanism consists of the following phases:

#### 1. Phase I: Setup

First (Algorithm 1), the PKG provides a one-way hash function $F_{coef}(.)$ to the nodes. Each node applies $F_{coef}(.)$ to its own private key to obtain its coefficient:

$$Node\_Coefficient=F_{coef}(Node\_PrivateKey)\tag{11}$$
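
A possible instantiation of (11) is sketched below. The paper does not name a concrete hash, so SHA-256 (from Python's standard `hashlib`), the field size `P = 257`, and the name `f_coef` are all assumptions; the digest is reduced to a nonzero field element so that it can serve as a coding coefficient.

```python
import hashlib

# Hypothetical instantiation of F_coef in (11): SHA-256 is an assumption,
# as is the illustrative field size P = 257.
P = 257


def f_coef(private_key: bytes, p: int = P) -> int:
    """Map a node's private key to a nonzero coefficient in GF(p).

    SHA-256 provides the one-way property; reducing modulo (p - 1) and adding 1
    keeps the result in 1..p-1, since a zero coefficient would drop the node's input.
    """
    digest = hashlib.sha256(private_key).digest()
    return int.from_bytes(digest, "big") % (p - 1) + 1


if __name__ == "__main__":
    coef = f_coef(b"example-private-key-of-node-17")  # hypothetical key bytes
    print(1 <= coef <= P - 1)  # True
```

Because the result is reduced to a small field, distinct private keys can map to the same coefficient; this does not affect decoding, and it further limits what the coefficient reveals about the key.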

In a PKI-based system, only the PKG and the node itself know the node's private key. The system administrator provides the PKG with all information about the topology and graph, including the participating nodes in each sub-graph. The PKG calculates $T_{i}$ and $T_{i}^{-1}$ for each $SubG_{i}$ and provides the $T_{i}^{-1}$s to the destinations.

Note that a private key can be considered a random secret value managed by the PKG. For instance, in an identity-based cryptography approach such as [39], the private key of a node is the product of a secret random value generated by the PKG and the public key of the node. Since the coefficient is a function of the private key (11), the coefficient inherits this randomness, and, referring to [24], $T_{i}$ is then invertible with high probability.

Since $F_{coef}(.)$ is a one-way function, even if a receiver acts maliciously, it cannot use the matrix $T_{i}^{-1}$ to perform a reverse operation and obtain the private keys of the nodes. We discuss this further in Section V. Furthermore, a private key is a dynamic value [17]; therefore, the transfer matrices $T_{i}$ (and $T_{i}^{-1}$) are also dynamic. Note that the PKG is responsible for maintaining and updating the matrices and informing the receivers; in the smart grid, for instance, the smart grid servers that collect the data should be notified by this server (the PKG).

#### 2. Phase II: Generating and Sending the Packets

As presented in Algorithm 2, a sender chooses a nonce and assigns it to $TAG$, and chooses a random nonce identity for the tag, denoted $ID_{TAG}$. The sender then chooses one of the adjacent sub-graphs with equal probability to send the data, and forms the data header, which includes the nonce values and the address of the receiver. The sender also signs the header with its own private key to provide source authentication and header integrity. Finally, the sender sends the encrypted data (packets) and data header, the signature of the header, and the plain form of the tag and its ID to the next sub-graph toward the receiver.
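
The sender-side steps of Phase II can be sketched as follows. This is a hypothetical outline rather than Algorithm 2 itself: `prepare_send`, `OutgoingPacket`, and the header field names are illustrative, the $m$-bit $TAG$ is modeled as a Python integer, and the public-key encryption of the data and header and the header signature are deliberately omitted because they depend on the PKG-based cryptosystem. Python's standard `secrets` module supplies the nonce, $ID_{TAG}$, and the uniform choice of adjacent sub-graph.

```python
import secrets
from dataclasses import dataclass


@dataclass
class OutgoingPacket:
    """What leaves the sender in Phase II (encryption and signature omitted)."""
    tag: int              # m-bit TAG, initialised to the nonce; travels in plain form
    tag_id: str           # ID_TAG, a random nonce identity for the tag
    header: dict          # would be encrypted for the receiver and signed by the sender
    next_subgraph: int    # adjacent sub-graph chosen uniformly at random
    payload: bytes = b""  # would be the encrypted data packets


def prepare_send(m, adjacent_subgraphs, sender_addr, receiver_addr, payload=b""):
    """Hypothetical sender-side steps of Phase II."""
    nonce = secrets.randbits(m)                   # nonce loaded into TAG
    tag_id = secrets.token_hex(8)                 # ID_TAG (64 random bits, hex-encoded)
    next_sg = secrets.choice(adjacent_subgraphs)  # equal probability per adjacent sub-graph
    header = {
        "sender": sender_addr,
        "receiver": receiver_addr,
        "nonce": nonce,
        "tag_id": tag_id,
    }
    # Omitted: encrypt header and payload for the receiver, sign the header.
    return OutgoingPacket(tag=nonce, tag_id=tag_id, header=header,
                          next_subgraph=next_sg, payload=payload)


if __name__ == "__main__":
    pkt = prepare_send(8, [2, 5, 7], "meter-17", "utility-server")  # hypothetical addresses
    print(pkt.next_subgraph in (2, 5, 7), pkt.tag == pkt.header["nonce"])  # True True
```

Using `secrets` rather than `random` matters here, because the nonce is what keeps the plain-text $TAG$ from revealing which sub-graphs the data has crossed.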

Note: $TAG$ is an $m$-bit array that travels with the data. Each bit of $TAG$ represents $\alpha_{i}$ of one sub-graph ((7a) and (7b)); more precisely, the $i$th bit is set to one if the data passes through $SubG_{i}$. Conceptually, therefore, $TAG$ starts as all zeros $(TAG=0)$. Since $TAG$ is sent in plain form, we initialize it with a nonce value instead and forward the nonce, encrypted, to the destination. In each sub-graph, the head cluster then simply inverts the $i$th bit; in other words, it XORs this bit with $\alpha_{i}$. Consequently, the destination only needs to XOR the received tag with the original nonce to decrypt it and obtain the list of sub-graphs the data has passed through. Compared with the network coding operations, especially at the receiver, flipping one bit per sub-graph adds negligible overhead.
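
The tag mechanism reduces to XOR operations on an $m$-bit integer. In this sketch (helper names hypothetical), we assume that bit $i-1$ of the integer holds $\alpha_i$ for sub-graph $SubG_i$, $i = 1, \ldots, m$. The usage lines show that the decoded list is exactly the set of crossed sub-graphs, and that a sub-graph flipping its bit twice would erase its own mark, which is why the loop prevention in Phase III matters.

```python
# Sketch of the TAG mechanism; bit (i - 1) of the integer holds alpha_i (assumption).


def hc_mark(tag: int, i: int) -> int:
    """Head cluster of SubG_i flips the bit for SubG_i: TAG XOR (alpha_i << (i - 1)), alpha_i = 1."""
    return tag ^ (1 << (i - 1))


def receiver_decode_tag(tag: int, nonce: int, m: int) -> list:
    """XOR with the nonce recovers the alpha bits; return every i with alpha_i = 1."""
    alphas = tag ^ nonce
    return [i for i in range(1, m + 1) if (alphas >> (i - 1)) & 1]


if __name__ == "__main__":
    m, nonce = 6, 0b101101                     # example nonce chosen by the sender
    tag = nonce                                # TAG starts as the nonce
    for i in (2, 5):                           # data crosses SubG_2 and SubG_5
        tag = hc_mark(tag, i)
    print(receiver_decode_tag(tag, nonce, m))  # [2, 5]
    print(hc_mark(hc_mark(tag, 3), 3) == tag)  # True: a second flip erases the mark
```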

Note: As discussed in Section II, in network coding the coefficients that each network coding node uses in the coding process normally need to be sent to the receiver for the decoding process. Our design eliminates sending this overhead data (coefficients) at the cost of sending the tag and the tag identity. In fact, the tag ID is similar to the flow ID already used by network coding, so our additional overhead is the tag itself. The overhead of sending the tag is much lower than that of sending the coefficients, since network coding requires one coefficient per network coding node, whereas we have only one tag from source to destination.

#### 5. Phase III: Relaying the Packets

As shown in Algorithm 3, consider data entering $SubG_{i}$. The data passes through $SubG_{i}$ according to the defined connections and coefficient values of the nodes (the network coding nodes are already identified by the administrator). The head cluster of the sub-graph records $\alpha_{i}$ in $TAG$ by flipping the $i$th bit of $TAG$. As in the previous step (sending data), the head cluster of $SubG_{i}$ then randomly selects one of its neighbour sub-graphs to forward the data toward the receiver.

Note: Since the next sub-graph is chosen randomly, the data may enter the same sub-graph more than once. To prevent such loops, the head cluster of the sub-graph ($HC_{i}$) checks the identity of the tag ($ID_{TAG}$). Specifically, $HC_{i}$ keeps a record of each $ID_{TAG}$ processed by the sub-graph, together with the IDs of the sub-graphs from which the data was received and to which it was sent, for some time so that the data is not processed twice. A reasonable expiry time for this record is the smart meters' periodic collection interval, e.g., 15 minutes; the assumption is that the data will be received and decoded by the receivers within 15 minutes. First, $HC_{i}$ does not send the processed (coded) data back to the sub-graph it came from. Second, if it receives the same data (same $ID_{TAG}$) from another sub-graph, it forwards the data as-is, without coding it again, to a randomly chosen next sub-graph, excluding the sub-graphs from which the data was received and to which it was previously sent. Hence, in the worst case, the data reaches the destination after being processed once by every sub-graph.
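
The head-cluster behaviour of Phase III, including the loop-prevention record, can be sketched as a small class. Everything here is illustrative: the class `HeadCluster`, its fields, the 15-minute default expiry (taken from the example above), and the injectable `clock` and `rng` (used so the logic can be tested deterministically) are assumptions, and both the network coding inside the sub-graph and the final delivery to a destination located in the sub-graph are omitted.

```python
import random
import time


class HeadCluster:
    """Hypothetical sketch of HC_i in Phase III with the loop-prevention record.

    Only the TAG update and the next-hop choice are shown.
    """

    def __init__(self, index, neighbours, expiry_s=15 * 60, rng=None, clock=time.monotonic):
        self.index = index                  # i in SubG_i
        self.neighbours = list(neighbours)  # indices of neighbour sub-graphs
        self.expiry_s = expiry_s            # record lifetime, e.g. one metering period
        self.rng = rng or random.Random()
        self.clock = clock
        self.records = {}                   # ID_TAG -> (first_seen, sub-graphs received from / sent to)

    def handle(self, tag, tag_id, came_from):
        """Return (tag, next_subgraph, coded) for data arriving from `came_from`."""
        now = self.clock()
        # Forget records older than the expiry time.
        self.records = {k: v for k, v in self.records.items() if now - v[0] < self.expiry_s}
        record = self.records.get(tag_id)
        if record is None:
            record = (now, set())
            tag ^= 1 << (self.index - 1)    # record alpha_i = 1 in TAG
            coded = True                    # first visit: the sub-graph codes the data
        else:
            coded = False                   # repeat visit: forward as-is, no re-coding
        first_seen, used = record
        used.add(came_from)
        candidates = [n for n in self.neighbours if n not in used]
        if not candidates:
            raise RuntimeError("no unused neighbour sub-graph left")
        nxt = self.rng.choice(candidates)   # random choice among the remaining neighbours
        used.add(nxt)
        self.records[tag_id] = (first_seen, used)
        return tag, nxt, coded
```

Skipping both the bit flip and the re-coding on a repeat visit keeps $TAG$ consistent with (9a): each crossed sub-graph contributes its $TR_i$ exactly once.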

#### 7. Phase IV: Receiving and Decoding the Packets

Upon receiving the packets, the receiver performs the following steps:

• Uses its own private key to decrypt the header and obtain the addresses of the sender and receiver, and the nonce.
• Using the sender address, verifies the signature and, if it is valid, XORs the nonce with the received tag to decrypt it.
• Using the bit values of $TAG$, selects the $T_{i}^{-1}$ ($TR_{i}$) of the sub-graphs the data has passed through and multiplies them together to obtain the inverse of the path transfer matrix, $\widehat{TR}$, via (9a).
• Obtains the original packets sent by the sender via (9b).

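
The receiver steps above can be combined into one function. This is a sketch under stated assumptions, not the authors' algorithm: the field is GF(257), `TR` is a hypothetical dictionary mapping sub-graph index $i$ to $TR_i$ as supplied by the PKG, $\widehat{Y}$ holds one coded packet per row, and, because $TAG$ reveals only which sub-graphs were crossed, the sketch assumes they are crossed in increasing index order so that the product in (9a) is well defined. Header decryption and signature verification are omitted.

```python
# Sketch of Phase IV decoding over GF(P); P = 257 is an illustrative assumption.
P = 257


def mat_mul(A, B, p=P):
    """Matrix product A * B over GF(p)."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]


def decode_at_receiver(tag, nonce, m, TR, Y, p=P):
    """Recover X_hat from the received TAG and coded packets Y_hat.

    tag, nonce: received TAG and the nonce from the decrypted, verified header
    m:          number of sub-graphs
    TR:         hypothetical dict {i: TR_i} supplied by the PKG
    Y:          h x L matrix, one received coded packet per row
    Assumption: sub-graphs are crossed in increasing index order, so
    T_hat = T_k ... T_1 and TR_hat = TR_1 ... TR_k.
    """
    alphas = tag ^ nonce                                          # XOR with the nonce
    crossed = [i for i in range(1, m + 1) if (alphas >> (i - 1)) & 1]
    h = len(Y)
    TR_hat = [[int(r == c) for c in range(h)] for r in range(h)]  # identity matrix
    for i in crossed:                                             # (9a)
        TR_hat = mat_mul(TR_hat, TR[i], p)
    return mat_mul(TR_hat, Y, p)                                  # (9b)


if __name__ == "__main__":
    T1, T3 = [[1, 2], [3, 4]], [[2, 1], [1, 1]]                # hypothetical T_1, T_3
    TR = {1: [[255, 1], [130, 128]], 3: [[1, 256], [256, 2]]}  # their inverses mod 257
    X = [[7, 8], [9, 10]]                                      # two packets of two symbols
    Y = mat_mul(T3, mat_mul(T1, X))                            # crossed SubG_1, then SubG_3
    nonce, m = 0b0110, 4
    tag = nonce ^ 0b0001 ^ 0b0100                              # bits flipped by HC_1 and HC_3
    print(decode_at_receiver(tag, nonce, m, TR, Y))            # [[7, 8], [9, 10]]
```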
SECTION V

## SYSTEM EVALUATION

In this section, we present our analysis from the privacy and system performance points of view. We first propose two adversary models, then compare the privacy features delivered by our scheme with those in the literature, and finally, in the communication and network performance subsection, discuss the complexity and reliability of our design.

We build on the Dolev-Yao model [40] to design two adversary models for the smart grid system: an external adversary and an internal adversary.

In the first model, the adversary is an external party and is not an entity of the system.

Objectives: The adversary's objective is to obtain information about the HAN's occupancy and its residents' behaviour.

Initial capabilities: The adversary knows the details of the initial security system as well as of our proposed privacy mechanism. For instance, the adversary knows the public keys of all parties and has detailed knowledge of the network topology, graph and sub-graphs. Furthermore, the adversary knows the detailed design of our mechanism, including Algorithms 1–4. Finally, the adversary has sufficient technical knowledge and is fully equipped to listen to the channels and analyze the traffic.

Capabilities during the attack: The adversary captures all packets entering and leaving a HAN (the HAN's smart meter). In addition, the adversary can listen to the channel of any other entity of the system, such as the PKG or any destination, to collect the data they receive.

Note: By the term data, we refer to the exact data that is in the channels (encrypted and/or encoded).

Discussion: By our assumption, a HAN gateway (smart meter) acts as a relay node in a mesh-based topology. We also perform enhanced network coding that mixes the packets using sub-graphs. Since the source and destination addresses are encrypted inside the header, our scheme delivers anonymity and undetectability, which together yield unobservability. If the adversary listens to the data entering and leaving a HAN, he gains no useful information, since the entering packets and the HAN's own packet are encoded into one packet, which hides the HAN packet. If a packet originates from an appliance, listening to the channel does not help the adversary learn anything about the existence of the appliance (undetectability of appliances). In the schemes proposed in the literature (Section III), an adversary listening to the first node can tell that the HAN is generating a packet; hence, most of those schemes only provide a private path.

The packets entering a smart meter for relaying also carry no source address and enter the sub-graphs randomly. Therefore, the adversary can neither trace the packets back nor monitor the flow of data; since he cannot observe the direction of the data, unlinkability is delivered.

The last position for the adversary is at the receiver side, listening to the received data. Given the hidden receiver address discussed above, he only observes the flow of information into the destination. Since the data travels through randomly chosen sub-graphs to reach the destination, he cannot trace it back. Consequently, our scheme maintains anonymity and unlinkability in this case as well.

Note that in any of the above situations, gaining access to $TAG$ does not help the adversary. Encoding $TAG$ with a random nonce allows the sub-graphs to insert $\alpha_{i}$ without decoding $TAG$, so the adversary learns nothing from the encoded $TAG$, even at the first or last sub-graph.
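The paper does not specify how $TAG$ is encoded. One construction with the stated property, sketched here purely as a hypothetical illustration, is an additive one-time pad modulo a prime: the source adds a uniform nonce $r$, each sub-graph adds its $\alpha_{i}$ to the masked value, and only a party holding $r$ can remove the mask. The modulus, the additive insertion of $\alpha_{i}$, and the assumption that the receiver learns $r$ (e.g., through the encrypted header) are all our assumptions.

```python
import secrets

P = 2**61 - 1  # illustrative Mersenne prime modulus (assumption)


def mask_tag(tag: int, p: int = P) -> tuple[int, int]:
    """Source side: hide TAG behind a fresh uniform nonce r (one-time pad mod p)."""
    r = secrets.randbelow(p)
    return (tag + r) % p, r


def insert_alpha(encoded_tag: int, alpha: int, p: int = P) -> int:
    """Sub-graph side: fold alpha_i into the encoded TAG without unmasking it."""
    return (encoded_tag + alpha) % p


def unmask(encoded_tag: int, r: int, p: int = P) -> int:
    """Receiver side (assumed to know r): recover TAG plus the sum of alphas."""
    return (encoded_tag - r) % p


enc, r = mask_tag(1234)
enc = insert_alpha(enc, 7)   # first sub-graph
enc = insert_alpha(enc, 11)  # second sub-graph
print(unmask(enc, r))  # 1252
```

Because $r$ is uniform over the field, the masked value is uniformly distributed whatever $TAG$ is, which matches the claim that an observer gains nothing from the encoded $TAG$.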

The adversary is an internal party; e.g., he has access to one of the HANs and, in particular, can monitor the HAN gateway or analyze its information.

Objectives: Gaining access to neighbouring HANs' information by receiving their data for relaying.

Initial capabilities: The malicious node is already authenticated and has received the system parameters and its own private key, so the adversary has this information.

Capabilities during the attack: The malicious node is under the control of the adversary and performs Algorithm 3.

Discussion: Controlling a malicious node only improves the adversary's ability to modify its own HAN's data. Relay nodes only mix packets and perform no encryption or decryption. Furthermore, the data he receives shows no sign of the source or destination. Consequently, his capabilities and behaviour are almost the same as in the previous scenario.

### B. Privacy Performance Analysis

Based on Sections II and III and our proposal in Section IV, Table I compares the performance of our scheme with the schemes discussed in Section I. We consider two types of attackers: a neighbour and a relay node. Some schemes may deliver anonymity against relay nodes, yet the data is not anonymous to a neighbour. We use the following symbols to describe each deliverable:

• “$\times$”: Does not deliver the measure.
• “•”: Delivers the measure only against relay nodes.
• “$\checkmark$”: Delivers the measure against all nodes.
TABLE I Delivery of the Privacy Measures

### C. Communication and Network Performance Analysis

In this subsection, we analyze and evaluate the proposed approach in terms of complexity, probability of success, intrusion success likelihood, and reliability. Throughout the discussion, we consider a square grid network topology. The communication performance of our proposed coordinated method is evaluated against the random network coding approach of [41], whose authors claim a throughput gain over no coding. However, while network coding approaches have advantages, their success depends heavily on the characteristics of the topology. In the approach of [41], nodes continuously replicate and forward messages to newly discovered nodes.

#### 1. Complexity

One of the overheads of network coding is that nodes must be able to perform arithmetic over finite fields in real time. This processing determines whether a received content chunk is innovative (i.e., linearly independent of the chunks already held) and decides whether to encode, forward, or decode. The processing complexity of field operations depends on the generation size $h$ and on the field size $2^{n}$. Linear operations on generations of size $h$ take $O(h^{2})$ operations in $F_{2^{n}}$. Multiplication and inversion over $F_{2^{n}}$ have complexity $O(n^{2})$. Furthermore, matrix inversion and Gaussian elimination to solve the system take $O(h^{3})$ operations.
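The operations whose costs are listed above (encoding, the innovativeness check, and decoding by Gaussian elimination) can be sketched as follows. For readability this uses the prime field GF(257) instead of $F_{2^{n}}$, so field inversion is done with Fermat's little theorem rather than binary-field arithmetic; the field choice and function names are assumptions for illustration, not the paper's implementation.

```python
P = 257  # prime field GF(257), a stand-in for F_{2^n} (assumption)


def encode(coeffs, packets, p=P):
    """Each coded packet is a linear combination of the h source packets."""
    return [
        [sum(c * pkt[i] for c, pkt in zip(row, packets)) % p
         for i in range(len(packets[0]))]
        for row in coeffs
    ]


def rank_mod_p(rows, p=P):
    """Rank over GF(p) by Gaussian elimination: O(h^3) field operations."""
    m = [r[:] for r in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        piv = next((r for r in range(rank, len(m)) if m[r][col] % p), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][col], p - 2, p)  # inverse via Fermat's little theorem
        m[rank] = [x * inv % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col] % p:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_innovative(held, vec, p=P):
    """A received coding vector is innovative if it increases the rank."""
    return rank_mod_p(held + [vec], p) > rank_mod_p(held, p)


def decode(coeffs, coded, p=P):
    """Gauss-Jordan elimination on [C | Y] to recover the source packets."""
    h = len(coeffs)
    aug = [list(c) + list(y) for c, y in zip(coeffs, coded)]
    for col in range(h):
        piv = next((r for r in range(col, h) if aug[r][col] % p), None)
        if piv is None:
            raise ValueError("coefficient matrix is singular over GF(p)")
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], p - 2, p)
        aug[col] = [x * inv % p for x in aug[col]]
        for r in range(h):
            if r != col and aug[r][col] % p:
                f = aug[r][col]
                aug[r] = [(a - f * b) % p for a, b in zip(aug[r], aug[col])]
    return [row[h:] for row in aug]


packets = [[1, 2], [3, 4]]
C = [[1, 1], [1, 2]]
print(decode(C, encode(C, packets)))  # [[1, 2], [3, 4]]
```

Encoding costs $h^{2}$ multiplications per symbol position, while each elimination pass over an $h \times h$ coefficient matrix costs $O(h^{3})$, mirroring the complexity figures above.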

As shown in Fig. 4, the computing cost of our method is lower because the transfer matrix at the receiver is implied and need not be recalculated at every interval. The cost is further reduced because enhanced network coding is performed only on a selected set of nodes within each cluster.

Fig. 4. Cost of computing.

#### 2. Reliability

Our method aims to minimize the number of nodes that perform network coding operations, which lets us use fixed network coding wherever possible. Intuitively, as the system size increases, random network coding on a large number of nodes increases the overall computational complexity and degrades the overall probability of success.

The probability that a random network coding problem is solvable depends on whether the matrix of global coding vectors has full rank. If the coefficients are chosen randomly from a field $F_{q}$, then the probability that a generation is invalid is at most ${{\vert T\vert}\over{q}}$. An extension of the Schwartz-Zippel theorem then bounds the probability of success at each randomly coded node as follows:

$$Pr(success)\geq 1-{{\vert T\vert}\over{q}}$$

where $Pr(success)$ is the probability of success within a cluster using random network coding. The following theorem from [25] states the probability of success for a valid network code.

#### Theorem 5.1

The probability that a random network code with coefficients from field $F_{q}$ is valid and is successfully decoded in a multicast connection problem with $\vert T\vert$ receivers and $\vert S\vert$ sources is at least $(1-{{\vert T\vert}\over{q}})^{\eta}$, where $q>\vert T\vert$ and $\eta$ is the number of intermediate links with associated random coefficients.
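Evaluating the bound of Theorem 5.1 makes the role of $\eta$ concrete: since $\eta$ counts only links with random coefficients, every link that uses fixed coefficients instead drops out of the exponent. The receiver count and field size in this sketch are hypothetical inputs chosen for illustration, not results from the paper; the guard on $q$ simply keeps the base of the power positive.

```python
def success_lower_bound(num_receivers: int, q: int, eta: int) -> float:
    """Lower bound (1 - |T|/q)^eta on successful decoding (Theorem 5.1)."""
    if q <= num_receivers:
        raise ValueError("the base 1 - |T|/q must be positive, so q > |T|")
    if eta < 0:
        raise ValueError("eta counts links and cannot be negative")
    return (1 - num_receivers / q) ** eta


# Hypothetical setting: 4 receivers over a field of size 2^8.
for eta in (0, 10, 50, 100):
    print(eta, round(success_lower_bound(4, 2**8, eta), 4))
```

The bound shrinks geometrically as more links use random coefficients and equals 1 when $\eta = 0$, which is consistent with the motivation for replacing random coefficients with fixed ones.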

As depicted in Fig. 5, in contrast to the baseline scenario that uses random network coding, our proposed method uses a fixed network coding approach in which the coefficients depend on the private key. Therefore, the uncertainty about whether the system has a solution is resolved.

Fig. 5. Probability of success.
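Key-dependent fixed coefficients also explain why the coefficients need not be transmitted: any party holding the same secret can regenerate them. The sketch below is a hypothetical construction, not the authors' method: it assumes an HMAC-SHA256 derivation keyed by a secret shared between the coding nodes and the receiver, a per-sub-graph context label, GF(257) arithmetic, and a retry counter that redraws the matrix until it is invertible, so that decodability is guaranteed rather than probabilistic.

```python
import hashlib
import hmac

P = 257  # illustrative prime field (assumption)


def det_mod_p(m, p=P):
    """Determinant over GF(p) by Gaussian elimination."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1
    for c in range(n):
        piv = next((r for r in range(c, n) if a[r][c] % p), None)
        if piv is None:
            return 0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            det = -det  # a row swap flips the sign
        det = det * a[c][c] % p
        inv = pow(a[c][c], p - 2, p)
        for r in range(c + 1, n):
            f = a[r][c] * inv % p
            a[r] = [(x - f * y) % p for x, y in zip(a[r], a[c])]
    return det % p


def derive_coefficients(key: bytes, context: bytes, h: int, p: int = P):
    """Deterministically derive an invertible h x h coefficient matrix."""
    counter = 0
    while True:
        rows = []
        for i in range(h):
            row = []
            for j in range(h):
                msg = (context + counter.to_bytes(4, "big")
                       + i.to_bytes(2, "big") + j.to_bytes(2, "big"))
                digest = hmac.new(key, msg, hashlib.sha256).digest()
                row.append(int.from_bytes(digest[:4], "big") % p)
            rows.append(row)
        if det_mod_p(rows, p) != 0:  # full rank, so the receiver can decode
            return rows
        counter += 1  # singular draw: rederive with the next counter value


coding_node = derive_coefficients(b"shared-secret", b"subgraph-1", 3)
receiver = derive_coefficients(b"shared-secret", b"subgraph-1", 3)
print(coding_node == receiver)  # True: nothing needs to be sent
```

The retry loop is one simple way to rule out a singular system; the paper's own mechanism for guaranteeing solvability may differ.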
SECTION VI

## CONCLUSION

In this paper, we have proposed a privacy-preserving approach for the smart grid, an application of cyber-physical systems. We developed an enhanced network coding technique for packet routing that hides the source, destination, path, traffic volume, and content of the packets. For this purpose, we introduced the concept of sub-graphing the network and used a subset of the sub-graphs to transfer the data, which reduces energy consumption and system complexity. We also eliminated the need to send the network coding coefficients to the receiver for decoding, which saves bandwidth. We have shown that our scheme maintains multiple favourable privacy-preserving metrics, namely anonymity, unlinkability, undetectability, and unobservability, for communications over the advanced metering infrastructure. We evaluated the performance of our scheme using both simulation and analysis. Our results show that the proposed scheme provides reliability to the system without adding much complexity.

## Footnotes

This work was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada under Grant STPGP 396838, and by the National Center for Electronics, Communications, and Photonics at King Abdulaziz City for Science and Technology in Saudi Arabia.

## References

No Data Available
