SECTION I

Integration of state-of-the-art information and communications technology (ICT), control, and computing is a critical enabler of grid modernization and optimization for existing electric power systems. As the smart grid evolves, conventional critical infrastructure is gradually exposed to the public; parts of the system, especially distribution networks that involve smart metering communications together with control of distributed generation and demand response at consumption sites, pose a number of security risks. Advanced Metering Infrastructure (AMI) in the distribution network essentially comprises endpoint-based home area networks (HANs), grid-based wireless sensor networks (WSNs), and access point-based neighborhood/field area networks (NANs and FANs) [1], [2]. Recently, a middleware architecture design has been proposed to consolidate heterogeneous quality of service/experience (QoS/QoE)-oriented smart grid applications, such as spectrum efficiency, power scheduling, and security protection [3], [4]. In the meantime, several surveys and tutorials have addressed security issues in terms of confidentiality, integrity, and availability (CIA), from passive attacks to active attacks [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], such as eavesdropping, jamming, tampering, spoofing, altering, and other attacks against the protocol stacks of the OSI model; these attacks are expected to be inevitable and nontrivial in the context of the cyber-physical smart grid. Some of these works have emphasized the interrelationship between cyber and physical security [5], [6], [15].

There are two primary research directions in smart grid security. 1) A breach of network availability: a power system involves real-time models that perform state estimation to observe the current conditions of the power network by obtaining real-time measurement data from network meters and devices. Without these data, state estimation cannot be executed effectively in real time, which hampers the decision making of network operators. If network communications are disrupted by denial of service (DoS) attacks or other schemes against data availability, services will be interrupted in both the communications and power systems. 2) A breach of measurement data confidentiality and integrity: owing to the cause-effect relationship, if measurement data are altered by intruders in a way that is hard to detect, not only may customer privacy be compromised, but the undetected alteration may also cause utilities to lose revenue and potentially result in severe power outages and equipment damage. Countermeasures relying on cryptographic mechanisms, secure communications architecture and network designs, device security, and intrusion detection systems (IDS) are anticipated options for securing the future power system against malicious intrusions and attacks in a complementary manner. The choice among these approaches will depend on the smart grid application as well as the communications requirements throughout the networks.

According to the Institute for Electric Efficiency (IEE) [16], one-third of households in the U.S. had a smart meter (i.e., approximately 36 million smart meters) as of May 2012, and approximately 65 million smart meters will have been deployed by 2015. While deployments continue to rise, a few energy theft incidents have been discovered in which dishonest customers attempted to lower their electricity bills via meter tampering, bypassing, or other unlawful schemes, with both traditional and smart meters, in places such as Ireland, Hong Kong, and Virginia, U.S. [17]. Notably, energy theft is one dominant component of non-technical losses, which account for 10%–40% of energy distribution [18], e.g., $1–6 billion losses due to energy theft yearly for utilities in the U.S. Moreover, the report [19] revealed that the current installations of smart meter communications protocols and associated infrastructure do not have sufficient security controls to protect the electric power system against false data injection attacks, not to mention older meters, which were not designed to cope with such attacks. In addition to physical attacks, network attacks that compromise meters can also introduce malicious measurement data and degrade grid operation [20], [21]. While some protection schemes against malicious network traffic have been proposed for monitoring smart grid communications networks [22], [23], [24], detection mechanisms and analyses for identifying malicious measurement data and energy theft have been investigated explicitly in [18] and [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40]. The main contributions of this paper are summarized as follows.

- The common DC (direct current) model for state estimation in a power network and traditional techniques for processing bad measurement data are reviewed. Meanwhile, a false data injection (FDI) attack and associated impacts on the power network are illustrated, followed by a discussion of existing countermeasures and studies.
- An attack model related to FDI, the combination sum of energy profiles (CONSUMER) attack, is defined and formulated as a type of coin change problem that minimizes the number of compromised smart meters while evading detection by keeping the cumulative load unchanged at the aggregation point to which multiple households are connected in today's radial tree-like distribution network.
- A hybrid anomaly intrusion detection system framework, which incorporates power information and sensor placement (POISE) along with grid-placed sensor (GPS) algorithms using graph theory to provide network observability, is proposed to validate the correctness of customers' energy usage by detecting anomalous activities at the consumption level in the distribution network.
- Simulations for analyzing the proposed attack model as well as grid sensor implementation in terms of network observability and detection rate are conducted and discussed.
- Several potential research directions for furthering the proposed framework in the smart grid context are presented.

The remainder of this paper is structured as follows: Section II reviews the background and state-of-the-art studies related to this work. Section III illustrates the system measurement model prior to the discussion of the proposed detection designs. Section IV presents the problem formulation, attack model, and countermeasures. Section V analyzes the simulation results of the proposed detection framework and discusses the findings. Finally, Section VI summarizes the focal points, draws a conclusion, and presents future work.

SECTION II

An electric power system is a feedback loop control system that relies on measurement data obtained from network measurement units such as meters and sensors. Based on the available data, the control center executes a series of tasks such as topology processing, network observability analysis, state estimation, and bad measurement data processing in order to identify the current status of the power network [41]. The decision-making processes of controlling actuators, optimizing power flows, and analyzing possible contingencies are then performed to ensure network stability and security, in accordance with what the system observes or estimates. In reality, the measurement data may not always be accurate because of errors in measurements, failures in telemetry and equipment, noise in communications channels, and possibly integrity breaches caused by intentional intrusions or attacks. If the measurement data are not sufficiently accurate, the resulting state estimate is misguided and the decisions based on it can be mistaken.

For simplicity, the state estimation problem is commonly formulated with a DC power flow model [41]: ${\bf z=Hx+e}$, where ${\bf H}$ is the $m$-by-$n$ Jacobian matrix representing $m$ independent network equations with $n$ state variables related to the network topology, ${\bf x}$ is the $n$-vector ($n\times 1$ matrix) of the true states (unknown and to be estimated for bad data detection), ${\bf z}$ is the $m$-vector of measurements (observed by data collection; in this case, a macro grid power generation reading and $m-1$ household energy consumption readings), and ${\bf e}$ is the $m$-vector of random errors. The state estimate $\hat{\bf x}$ can be obtained by calculating ${\bf G}^{-1}{\bf H}^{\rm T}{\bf Wz}$, where ${\bf G}={\bf H}^{\rm T}{\bf WH}$ is the state estimation gain matrix, $(\cdot)^{\rm T}$ is the transpose of $(\cdot)$, and ${\bf W}$ is a diagonal matrix whose entries are most commonly the reciprocals of the variances of the measurement errors based on historical statistics, which may represent meter accuracy. To detect bad measurement data, the measurement residual ${\bf z}-{\bf H}\hat{\bf x}$ is computed and its $L_{2}$-norm $\Vert{\bf z}-{\bf H}\hat{\bf x}\Vert$ is compared with a predetermined threshold $\delta$; common techniques including normalized residuals and hypothesis testing are sufficient to detect anomalies, which are flagged when $\Vert{\bf z}-{\bf H}\hat{\bf x}\Vert>\delta$.
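The estimator and residual test described above can be sketched in a few lines of Python. This is a minimal illustration only: the matrix `H`, the true state, the meter standard deviations `sigma`, and the threshold `delta` are hypothetical values chosen for the example, not data from this paper.

```python
import numpy as np

def wls_estimate(H, z, sigma):
    """Weighted least-squares estimate x_hat = G^{-1} H^T W z for z = Hx + e."""
    W = np.diag(1.0 / np.asarray(sigma, dtype=float) ** 2)  # reciprocals of error variances
    G = H.T @ W @ H                                         # gain matrix G = H^T W H
    return np.linalg.solve(G, H.T @ W @ z)

def residual_norm(H, z, x_hat):
    """L2-norm of the measurement residual z - H x_hat."""
    return float(np.linalg.norm(z - H @ x_hat))

# Hypothetical system with m = 4 measurements and n = 2 states.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
x_true = np.array([2.0, 1.0])
sigma = np.full(4, 0.01)   # assumed meter standard deviations
delta = 0.1                # assumed detection threshold

z_clean = H @ x_true       # noise-free measurements
x_hat = wls_estimate(H, z_clean, sigma)
r_clean = residual_norm(H, z_clean, x_hat)

z_bad = z_clean.copy()
z_bad[2] += 1.0            # a gross error on a single meter
r_bad = residual_norm(H, z_bad, wls_estimate(H, z_bad, sigma))

print("clean residual:", r_clean, "flagged:", r_clean > delta)
print("bad residual:  ", r_bad, "flagged:", r_bad > delta)
```

In this toy case the single corrupted reading is inconsistent with the other three, so the residual rises above the assumed threshold while the clean measurements pass.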

Nevertheless, a recent study [21] observed that traditional detection cannot differentiate between natural anomalies and malicious intrusions attributed to false data injection (FDI) such that ${\bf z}_{\bf b}={\bf z}+{\bf a}$ and $\hat{\bf x}_{\bf b}=\hat{\bf x}+{\bf c}$, where ${\bf a}={\bf Hc}$ is an $m\times 1$ attack vector injected into the system that is designed to be a linear combination of the column vectors of ${\bf H}$ in order to bypass detection, i.e., $\Vert{\bf z}_{\bf b}-{\bf H}\hat{\bf x}_{\bf b}\Vert=\Vert{\bf z}-{\bf H}\hat{\bf x}\Vert\leq\delta$. The authors further showed that the attacker is required to compromise a number of meters (i.e., 30%–70% of meters in IEEE 9, 14, 30, 118, 300 bus test systems) in order to bypass detection, and that doing so takes less than 10 seconds. This type of attack is interchangeably called an unobservable, undetectable, or stealth attack; it needs to be launched in a coordinated manner [5], [27], [42] with knowledge of the network configuration matrix ${\bf H}$ while not violating the physics of power flow. Most current studies assume that the attacker knows ${\bf H}$. Although it may be improbable for the attacker to gain full knowledge of the entire system, it is worth developing a detection framework to identify the malicious attack in case the attacker has acquired partial knowledge and considerable capability and resources. In fact, launching FDI without prior knowledge of ${\bf H}$ has been studied in [37]: if the network topology remains static and the independent loads vary insignificantly for a period of time, ${\bf H}$ can be inferred.
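The stealth property of an attack vector of the form ${\bf a}={\bf Hc}$ can be checked numerically. The sketch below is illustrative only and assumes an unweighted estimator (${\bf W}={\bf I}$); the matrix `H`, the noise level, and the vectors `c` and `a_naive` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical measurement matrix and noisy measurements.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
x_true = np.array([2.0, 1.0])
z = H @ x_true + rng.normal(scale=0.01, size=4)

def estimate(H, z):
    """Ordinary least-squares state estimate (W = I is an assumption of this sketch)."""
    return np.linalg.lstsq(H, z, rcond=None)[0]

def residual_norm(H, z):
    return float(np.linalg.norm(z - H @ estimate(H, z)))

# Structured attack: a lies in the column space of H.
c = np.array([0.5, -0.3])
a = H @ c
z_b = z + a
shift = estimate(H, z_b) - estimate(H, z)   # equals c up to rounding

# Unstructured attack: tampering with one meter only.
a_naive = np.array([0.0, 0.0, 1.0, 0.0])

print("residual, no attack:   ", residual_norm(H, z))
print("residual, a = Hc:      ", residual_norm(H, z_b))
print("residual, naive attack:", residual_norm(H, z + a_naive))
```

The structured attack shifts the estimate by exactly ${\bf c}$ while leaving the residual untouched, whereas the single-meter change inflates the residual and would be caught by a threshold test.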

Several works have rigorously investigated the FDI attack by proposing various detectors or analyzing its damage to the power system. For example, Kosut *et al.* [26] proposed a detection scheme based on the generalized likelihood ratio test and compared it with two other detectors based on the residual error ${\bf r}$ derived from state estimation using the minimum mean square error technique. The authors studied the outcomes of maximizing the residual error and minimizing the detection rate for the attack. Yuan *et al.* [32] identified attacks launched in two different time periods (i.e., immediate and delayed attacks), in which the former may lead the system to perform unnecessary load shedding whereas the latter may cause power overflows on some transmission lines. However, the authors modeled only the immediate attack and showed that it leads to a high economic loss. Lin *et al.* [31] studied the effectiveness of the attack in terms of transmission cost and power outage rate, where the attacker falsifies the amount of energy requested and supplied as well as the status of transmission lines, e.g., by claiming that a line is able to deliver a certain amount of power when it is not, and vice versa. Giani *et al.* [27] proposed countermeasures utilizing the placement of known-secure PMUs (phasor measurement units) and illustrated that $p+1$ PMUs are enough to detect $p$ $k$-sparse attacks for $k\leq 5$, assuming all lines are metered. Qin *et al.* [30] illustrated a case where the attack is detected but still unidentifiable, so that it is difficult for operators to know which set of meters is truly compromised. The authors proposed a three-step search process that first identifies the meter with the largest residual (which exceeds a predetermined threshold) after state estimation, then locates a feasible attack region associated with the meter, and finally checks a set of suspicious meters located in the region by using a brute-force search.

Most of the existing works have focused on cyber-physical attacks at the transmission/distribution level, and very few have analyzed the attack at the distribution/consumption level where smart meters are deployed, e.g., [39], [40]. Motivated by [30], [40], this is, to the best of our knowledge, the first work to investigate the CONSUMER attack in the distribution network, where a dishonest customer intends to lower his or her electricity bill by compromising some of the neighbors' smart meters in a neighborhood. We will show that, for the defined scenario, the attack can be neither detected nor identified during a certain period of time. This work can also be extended to an aggressive collusion attack that compromises a group of smart meters and intentionally causes service disruption or equipment damage throughout the distribution network.

SECTION III

AC (alternating current) and DC power flow models are the two models essentially used for studying state estimation. The DC power flow model is often adopted because of its inexpensive computation and simplicity [43]. Moreover, a DC power grid is a foreseeable approach for the future distribution network [44] because 1) many distributed generators (e.g., household/neighborhood-based solar power systems) supply DC power, 2) AC grid-connected inverters are not needed, and 3) overall costs and power losses can be reduced. The ability to perform state estimation relies on the sufficiency of measurement data available in a network. In other words, the observability of a network has to be analyzed before state estimation can be performed.

A network is said to be observable [41] if all flows in the network can be determined from a sufficient set of measurement data, that is, if ${\bf Hx=0}$ implies that no power flows in the network ($P=0$, $\forall P\in{\bf x}$, where $P$ is an element of the state vector ${\bf x}$); otherwise, there are unobservable states for which non-zero flows exist in the network even though all measurements read zero. In other words, whenever there is a non-zero flow in the network, at least one of the measurements should be non-zero.
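One common way to test this condition numerically is to check whether ${\bf H}$ has full column rank, since then ${\bf Hx=0}$ admits only ${\bf x=0}$. The sketch below applies that test to a hypothetical three-injection network whose injections sum to zero, so that only two states are independent; the matrices are assumptions made for illustration.

```python
import numpy as np

def is_observable(H):
    """True if H has full column rank, i.e. Hx = 0 only for x = 0."""
    H = np.atleast_2d(np.asarray(H, dtype=float))
    return bool(np.linalg.matrix_rank(H) == H.shape[1])

# Independent states x = [P1, P2]; the third injection is P3 = -(P1 + P2).
H_one_meter = [[1, 0]]                          # only P1 is metered
H_two_meters = [[1, 0], [0, 1]]                 # P1 and P2 are metered
H_three_meters = [[1, 0], [0, 1], [-1, -1]]     # P1, P2, and P3 are metered

for name, H in [("one meter", H_one_meter),
                ("two meters", H_two_meters),
                ("three meters", H_three_meters)]:
    print(name, "-> observable:", is_observable(H))
```

With one meter the system is underdetermined, with two it is exactly determined, and with three it is overdetermined but still observable, in which case a least-squares fit can be used.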

Consider a DC network model that has three state variables as shown in Fig. 1: to ensure that the power network is balanced, there is at least one state that acts as a generation or load node, i.e., $P_{1}+P_{2}+P_{3}=0$. Fig. 1(a) shows an underdetermined and partially observable case where only state $P_{1}$ is observable; one of the states $P_{2}$, $P_{3}$ is unobservable, and the other, dependent state is therefore indeterminate. Fig. 1(b) shows an observable and sufficient case where both states $P_{1}$ and $P_{2}$ are observable, and the dependent state $P_{3}$ can be computed from the network model equation with the other two known state variables. Fig. 1(c) shows a case where all states $P_{1}$, $P_{2}$, $P_{3}$ are observable and form an overdetermined system, which can be solved as a least-squares problem. We will use this model to study the proposed CONSUMER attack model as well as grid sensor placement for the distribution network of the smart grid in this paper. Additionally, we also consider the characteristics of the emerging smart grid network as follows.

- Nodes (e.g., smart meters, grid sensors) strategically deployed throughout distribution grids are static; in other words, grid operators have full knowledge of network topologies in terms of geographical locations and coordinates.
- Nodes are wire-powered while attached to power lines and taking various measurements such as voltage, current, frequency, and metering.
- The majority of data traffic generated at the nodes is periodic, for real-time monitoring and control.
- Measurement data generated at the nodes (representing individual customer energy consumption and grid line conditions for state estimation) cannot be fused at aggregation nodes. This is in contrast to traditional sensor network scenarios, where data from sensors tracking their surrounding environmental conditions (e.g., temperature) are aggregated at cluster nodes to generalize the current network status by determining the correlation of the multiple obtained measurements.

SECTION IV

Most parts of the current distribution networks are characterized by radial tree-like topologies, which may or may not contain loops or cycles, as shown in Fig. 2. The distribution network consists of four components: 1) a root aggregation node at which power $P_{G}$ is generated or delivered from other sources, such as the macro grid or neighboring distribution networks (see [45]), and that supplies power $P_{G}$ to customers' loads, 2) a grid sensor (GS) node that constantly measures the generated power $P_{G}$, 3) a set of electric poles (EPs) or buses as intermediate nodes, with ${\cal N}_{EP}=\{1,2,\ldots, n_{EP}\}$ representing the indices of EPs, together with distribution lines/feeders, transformers, and capacitors (not shown) that construct a distribution grid and deliver power to customers, and 4) a set of household smart meters (SMs), with ${\cal N}_{SM}=\{1,2,\ldots, n_{SM}\}$ representing the indices of SMs, that have two-way communications capability for reporting household energy consumption to the utility control center and receiving associated feedback messages in real time.

Notably, Fig. 2(a) shows a distribution network that has loops among some EP nodes, whereas Fig. 2(b) depicts a network with no loops, representing a spanning tree. A spanning tree $G(V_{T},E_{T})$ of a connected graph $G(V,E)$ can be computed by using various algorithms, e.g., Prim's algorithm [46], where $V$ is a collection of vertices, $E$ is a collection of edges, and $V_{T}=V$. In other words, any connected distribution network $G(V,E)$ has at least one spanning tree $G(V_{T},E_{T})$ (composed of $\vert V_{T}\vert$ nodes and $\vert V_{T}\vert-1$ edges; $\vert\cdot\vert$ is the cardinality) with the fewest edges among EP nodes^{1}, provided that four network properties are obeyed: 1) the network connectivity in terms of power and communications operations is maintained, 2) the spanning tree starts with the distribution head node, 3) an EP node cannot be a leaf node, and 4) an SM node must be a leaf node. Under these conditions, the spanning tree topology illustrated in Fig. 2(b) can be discovered, and it is therefore considered in this study in order to determine the minimum number of grid sensors to be placed on edges such that the network is sufficiently observable (to be discussed in Section IV-B-III).
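As a rough illustration of how a spanning tree can be extracted from a looped distribution graph, the following sketch runs a heap-based variant of Prim's algorithm from the head node. The node names, edge weights, and helper `build_adj` are hypothetical and only loosely mirror the structure described above (one head node, EPs forming a loop, one SM per EP).

```python
import heapq

def build_adj(edges):
    """Build an undirected adjacency map: node -> list of (weight, neighbor)."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((w, v))
        adj.setdefault(v, []).append((w, u))
    return adj

def prim_spanning_tree(adj, root):
    """Return the edges (parent, child) of a minimum spanning tree grown from root."""
    visited = {root}
    heap = [(w, root, v) for w, v in adj[root]]
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < len(adj):
        _, u, v = heapq.heappop(heap)
        if v in visited:
            continue                      # this edge would close a loop
        visited.add(v)
        tree.append((u, v))
        for w_next, nxt in adj[v]:
            if nxt not in visited:
                heapq.heappush(heap, (w_next, v, nxt))
    return tree

# Hypothetical network: head node G, three EPs forming a loop, one SM per EP.
edges = [("G", "EP1", 1), ("EP1", "EP2", 1), ("EP2", "EP3", 1), ("EP1", "EP3", 2),
         ("EP1", "SM1", 1), ("EP2", "SM2", 1), ("EP3", "SM3", 1)]
adj = build_adj(edges)
tree = prim_spanning_tree(adj, "G")
print(tree)
```

The resulting tree has $\vert V\vert-1$ edges and drops the redundant EP1 to EP3 link; in this example every SM remains a leaf because each SM has a single incident edge in the input graph.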

We further assume that power flow is unidirectional (in the traditional way) such that power is delivered from the root of the tree to the end leaves. We consider a practical scenario where utility operators currently have limited knowledge of the real-time conditions of distribution networks in a geographically and temporally fine-grained manner (e.g., the difficulty of knowing exactly how and how much power is delivered across feeders/lines, as well as discovering how and where faults are caused if erroneous activities are present), owing to a lack of grid sensors and of effective coordinated monitoring. As shown in Fig. 2, for power balance, the sum of the individual loads must equal the amount metered at the aggregation GS node. If the aggregated load value deviates from the GS measurement by more than a tolerable amount, an anomalous activity is detected and an alarm is raised, but it may not be easy to identify whether the anomaly is caused by natural errors or by malicious attacks.

In the CONSUMER attack model, we apply the FDI model (introduced in [21]) to construct our attack scenario at the smart meter level. The typical distribution network (shown in Fig. 2) is characterized by its own network topology and configuration matrix ${\bf H}$ and a set of observed measurements ${\bf z}=[P_{G}, P_{1}, P_{2},\ldots, P_{i}]^{\rm T}\in\mathbb{Z}$, where $P_{G}\leq 0$ is the total amount of generated power, $P_{i}\geq 0$, $\forall i\in{\cal N}_{SM}$ indicates the energy consumption of household $i$, and $P_{G}+\sum\nolimits_{\forall i\in{\cal N}_{SM}}P_{i}=0$ for a balanced system. We assume that no anomalies are detected by traditional bad measurement detectors (i.e., $\Vert{\bf z}-{\bf H}\hat{\bf x}\Vert\leq\delta$) under normal conditions, in which smart meters are functioning correctly and legitimately.

The attacker is assumed to have (partial) knowledge of ${\bf H}$ and of the estimation error, whether obtained illegally or deduced from its own observations. With this knowledge, the attacker is able to construct the attack vector ${\bf a}$ and the associated falsified measurement vector $\bar{\bf z}={\bf z}_{\bf b}$ such that $\Vert{\bf z}_{\bf b}-{\bf H}\hat{\bf x}_{\bf b}\Vert=\Vert{\bf z}-{\bf H}\hat{\bf x}\Vert\leq\delta$ is satisfied in order to bypass detection, where $\hat{\bf x}_{\bf b}=\hat{\bf x}+{\bf c}$ and ${\bf c}$ is a non-zero $n\times 1$ vector designed to derive the vector ${\bf a}$. The goal of the attacker is to launch the CONSUMER attack by fabricating $\bar{\bf z}={\bf z}+{\bf a}=[\bar{P}_{G},\bar{P}_{1},\bar{P}_{2},\ldots,\bar{P}_{i}]^{\rm T}\ne 0$ in which ${\bf a}=[a_{G}, a_{1}, a_{2},\ldots, a_{i}]^{\rm T}\in\mathbb{Z}$ and $\sum\nolimits_{\forall i\in{\cal N}_{SM}}a_{i}=a_{G}=0$. There exist load alterations, i.e., $\exists a_{i}\in{\bf a}$, $a_{i}<0$ for which the attacker compromises its own meter $i\in{\cal A}$, and $\exists a_{j}\in{\bf a}$, $a_{j}>0$, $j\ne i$ for which the attacker is able to compromise the victim's meter $j\in{\cal B}$. Note that the elements of ${\cal A}$ correspond to a set of meters belonging to the attacker such that $1\leq\vert{\cal A}\vert\leq\vert{\cal N}_{SM}\vert-1$ and ${\cal A}\subset{\cal N}_{SM}$, whereas the elements of ${\cal B}$ correspond to a set of meters belonging to the victim such that $1\leq\vert{\cal B}\vert\leq\vert{\cal N}_{SM}\vert-1$, ${\cal B}\subset{\cal N}_{SM}$, and ${\cal B}\cap{\cal A}=\emptyset$. The altered linear combination in the vector ${\bf a}$ cannot be easily detected by a traditional bad measurement detector. We consider ${\bf a}=[a_{G}, a_{1}\chi_{1}, a_{2}\chi_{2},\ldots, a_{i}\chi_{i}]^{\rm T}$ where the indicator $\chi_{i}$ represents that the smart meter of household $i$ is compromised if $\chi_{i}=1$; otherwise, $\chi_{i}=0$.

The objective of the attacker is to lower the reading of its own energy consumption level by raising others'. Owing to constrained resources, the attacker tries to minimize the number of compromised smart meters while achieving its objective, subject to exactly meeting a total stealing value $P_{s}\in\mathbb{N}$. The minimization problem for such an attack is formulated as
$$\min\quad\sum_{i=1}^{n_{SM}}\chi_{i}$$
subject to
$$\sum_{i=1}^{n_{SM}}a_{i}\chi_{i}=P_{s},\quad\chi_{i}\in\{0,1\},\quad\forall i\in{\cal B},\tag{1}$$
$$a_{i}\geq P_{i}^{\min}(t+1)-P_{i}(t),\quad\forall i\in{\cal A},\tag{2}$$
$$a_{i}\leq P_{i}^{\max}(t+1)-P_{i}(t),\quad\forall i\in{\cal B},\tag{3}$$
$$P_{i}(t),\ P_{i}^{\min}(t+1),\ P_{i}^{\max}(t+1)\geq 0,\quad\forall i\in{\cal A}\cup{\cal B},\tag{4}$$
where $P_{s}=-\left(\sum\nolimits_{\forall i\in{\cal A}}a_{i}\right)\geq 0$ is the total amount of non-negative integer power that the attacker plans to steal, $P_{i\in{\cal A}}(t)$ is the energy consumption value of the attacker's smart meter $i$ at time $t$, $P_{i\in{\cal B}}(t)$ is the energy consumption value of victim $i$ at time $t$, $P_{i\in{\cal A}}^{\min}(t+1)$ is the minimum power value that the attacker with smart meter $i$ is predicted to consume at time $t+1$, and $P_{i\in{\cal B}}^{\max}(t+1)$ is the maximum power value that the victim $i$ is predicted to consume at time $t+1$.

This minimization problem is analogous to the coin change problem, which is NP-hard [49]. Both problems aim to match a given non-negative integer value (equality Constraint 1) while minimizing the number of components used (the objective function). Unlike the coin change problem, the CONSUMER attack problem also considers inequality Constraints 2, 3, and 4, which determine $\vert{\cal A}\vert+\vert{\cal B}\vert$ sets of non-negative integer power values corresponding to the households' (the attacker(s)' and victim(s)') energy profiles at the present time slot (i.e., $P_{i\in{\cal A}}(t)$ and $P_{i\in{\cal B}}(t)$), as well as the sets of predicted values of energy consumption at the next time slot (i.e., $P_{i\in{\cal A}}^{\min}(t+1)$ and $P_{i\in{\cal B}}^{\max}(t+1)$). Given a set of values $a_{i}$ for each household $i$ that satisfy Constraints 2, 3, and 4, the problem can be solved as a coin change problem. The total number of compromised smart meters is defined as $k_{SM}=\vert{\cal A}\vert+\vert{\cal B}\vert\leq\vert{\cal N}_{SM}\vert$.
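To make the analogy concrete, the sketch below solves a simplified instance by dynamic programming. It assumes that each candidate victim meter $i$ has already been assigned one fixed positive integer increment $a_{i}$ (for instance, one that satisfies Constraints 3 and 4), and that each meter can be used at most once, which makes this a bounded variant of coin change. The function name `min_meters` and the sample values are hypothetical, and this is not necessarily the solution method used in this paper.

```python
def min_meters(increments, p_s):
    """Pick the fewest entries of `increments` (each used at most once) that sum
    exactly to p_s. Returns the chosen indices, or None if no subset matches."""
    best = [None] * (p_s + 1)   # best[s] = smallest index set found summing to s
    best[0] = []
    for idx, a in enumerate(increments):
        if a <= 0:
            continue            # only positive load increases are considered here
        # Iterate sums downwards so that each meter is used at most once.
        for s in range(p_s, a - 1, -1):
            prev = best[s - a]
            if prev is not None and (best[s] is None or len(prev) + 1 < len(best[s])):
                best[s] = prev + [idx]
    return best[p_s]

# Hypothetical per-victim increments and a stealing target P_s = 9.
increments = [3, 5, 2, 7, 4]
chosen = min_meters(increments, 9)
print("victim meters to compromise:", chosen)
print("no exact match:", min_meters([4, 6], 5))
```

The table has $P_{s}+1$ entries and each meter is scanned once, so the running time is pseudo-polynomial in the target value, as is typical for coin-change-style dynamic programs.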

A CONSUMER attack can be launched successfully by compromising as few as two smart meters (one for the attacker and one for the victim; $k_{SM}\geq 2$) in any spanning tree of a distribution network (described in Section IV).

Consider the radial tree topology illustrated in Fig. 2(b), where there is only one grid sensor available near the supply node, measuring the total amount of energy $P_{G}$ consumed by end customers. We assume that each edge in the network has the capacity to sustain at least $P_{G}$ during power transmission. A balanced system is maintained if $P_{G}+P_{1}+P_{2}+\cdots+P_{i}=0$, $\forall i\in{\cal N}_{SM}$, with $P_{i}, P_{G}\in\mathbb{Z}$, $P_{i}\geq 0$, and $P_{G}\leq 0$.

There exist various combinations of load values that satisfy the balance equation when $P_{i}\leq{\rm abs}(P_{G})$, where ${\rm abs}(\cdot)$ is the absolute value of $(\cdot)$. To capitalize on this property, the attacker may design a vector ${\bf a}$ such that $P_{G}+P_{1}+a_{1}+P_{2}+a_{2}+\cdots+P_{i}+a_{i}=0$, $\forall i\in{\cal N}_{SM}$, $\forall a_{i}\in\mathbb{Z}$. No smart meter is compromised when $a_{i}=0$, $\forall i$. However, if $\exists a_{i}: a_{i}<0$, $i\in{\cal A}$, then there must exist $a_{j}: a_{j}>0$, $j\ne i$, $j\in{\cal B}$ so that the balance equation is not violated. For ${\rm abs}(a_{i})=a_{j}$ and $\vert{\cal A}\vert=\vert{\cal B}\vert=1$, only two meters are compromised by the attacker; otherwise, more than two meters need to be compromised in order to evade detection. Hence, $k_{SM}=2$ is the least number of meters needed for the attacker to launch a successful CONSUMER attack. $\blacksquare$
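A small numeric example may help to see why two meters suffice. The load values, indices, and the helper `apply_consumer_attack` below are hypothetical; the point is only that moving the same amount from the attacker's reading to a victim's reading leaves the balance at the single grid sensor unchanged.

```python
def apply_consumer_attack(loads, attacker, victim, stolen):
    """Return forged readings: the attacker's reading drops by `stolen`
    and the victim's reading rises by the same amount."""
    forged = list(loads)
    forged[attacker] -= stolen
    forged[victim] += stolen
    return forged

p_g = -20                 # generation measured at the grid sensor (P_G <= 0)
loads = [6, 4, 7, 3]      # true household consumption (P_i >= 0)

forged = apply_consumer_attack(loads, attacker=0, victim=2, stolen=3)
changed = [i for i, (p, q) in enumerate(zip(loads, forged)) if p != q]

print("true balance:  ", p_g + sum(loads))
print("forged balance:", p_g + sum(forged))
print("compromised meters:", changed)
```

Both balances evaluate to zero, so a detector that only checks the aggregate at the supply node sees nothing unusual even though two readings are false.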

In a one-attacker-one-victim scenario (depicted in Fig. 3), the attacker tries to decrease its consumption reading as much as it can by increasing the victim's. In the unconstrained case (which excludes Constraints 2 and 3), the attacker can pick an arbitrary non-negative $P_{s}$, subtract it from its own consumption amount, and add it to the victim's to avoid detection as long as Constraint 1 holds; the minimization problem then reduces to a simple linear programming problem.

On the other hand, in the constrained case (which includes Constraints 2 and 3), the attacker cannot simply pick any number but needs to determine appropriate ${P_{i\in{\cal A}}^{\min}}$ and ${P_{i\in{\cal B}}^{\max}}$ for the next time slot so that the alterations are not detected as anomalous activities. In fact, utilities might implement various kinds of prediction methods to predict and monitor households' energy consumption from time to time, which would complicate the problem. Any anomalous activity that deviates from the corresponding estimated regression lines beyond a predetermined threshold will trigger an alarm in the intrusion detection system. Unless the attacker has prior knowledge of the thresholds, ${P_{i\in{\cal A}}^{\min}}$ and ${P_{i\in{\cal B}}^{\max}}$ cannot be chosen too aggressively. Therefore, implementing Constraints 2 and 3 in the attack model affects the outcome of a CONSUMER attack. In addition to these constraints, the costs of compromising smart meters via coordinated communications on the spatial and temporal scales are challenges from the attacker's perspective.
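For the one-attacker-one-victim case, Constraints 2 and 3 translate into a simple upper bound on the amount that can be shifted in one time slot. The sketch below is an illustrative reading of those constraints under the assumption that the attacker knows (or guesses) the predicted bounds; the function name and the numbers are hypothetical.

```python
def max_stealable(p_att_now, p_att_min_next, p_vic_now, p_vic_max_next):
    """Largest P_s that one attacker meter and one victim meter can absorb.

    Constraint 2: a_att >= P_att_min(t+1) - P_att(t), so the attacker's reading
    can drop by at most P_att(t) - P_att_min(t+1).
    Constraint 3: a_vic <= P_vic_max(t+1) - P_vic(t), so the victim's reading
    can rise by at most P_vic_max(t+1) - P_vic(t).
    """
    room_down = p_att_now - p_att_min_next
    room_up = p_vic_max_next - p_vic_now
    return max(0, min(room_down, room_up))

# Attacker reads 6 now and is expected to read at least 4 next slot;
# victim reads 7 now and is expected to read at most 9 next slot.
print(max_stealable(6, 4, 7, 9))   # limited equally by both bounds
# A tighter victim bound caps the theft even if the attacker has more room.
print(max_stealable(6, 1, 7, 8))
```

If the desired $P_{s}$ exceeds this bound, the attacker would need additional victim meters or more time slots, which is where the meter-count minimization becomes relevant.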

A one-size-fits-all solution for detecting anomalous or malicious activities in the smart grid is unlikely to exist. We develop a framework that integrates the characteristics of power network load consumption dynamics, communications network traffic dynamics, and network observability analysis via grid sensor placement for an evolutionary intrusion detection system, as shown in Fig. 4. The last item of the proposed framework is covered in this paper, and the first two items are left for our future work. In a cyber-physical smart grid AMI network, the uplink transmission from smart meters to control centers, as well as the downlink transmission in the opposite direction, is vulnerable to a breach of CIA. While a general FDI attack can be launched on the two-way links, the CONSUMER attack is specifically instigated in the uplink transmission, causing utility operators to make wrong decisions as a consequence of receiving falsified measurement data that are hardly distinguishable from legitimate data. There are two fundamentally challenging questions in the context of smart grid intrusion detection system design.

- What is an adequate threshold for defining an anomalous activity, e.g., when characterizing customers' energy consumption behavior, which may be elusive to some extent? Does such a threshold even exist?
- How can (unintentionally) anomalous and (intentionally) malicious activities be effectively distinguished?

While these intriguing questions require further research in the next few years, we provide some insights into the first two detection methods, which are based on analyses of power and communications network dynamics, and then propose a grid sensor placement mechanism to effectively enhance the intrusion detection process.

A power grid system obeys a series of control theories based on the laws of physics. Measurement data collection involves not only power loads but also voltage, current, and power factor. The observation of phase differences at the transmission/distribution level studied in [27] can be further evaluated at the distribution/consumption level. Another useful basis for designing specification-based or rule-based anomaly detection systems is a deep understanding of different classes of customer energy consumption patterns at different time scales, e.g., usage trends on weekdays and weekends and on a monthly, seasonal, and annual basis, corresponding to individual activities and weather conditions. Many approaches for characterizing household electricity demand, including Fourier series, Gaussian processes, neural networks, fuzzy logic, as well as regression and autoregression, have been studied [50]. Meanwhile, a scheme for detecting illegal customers based on Support Vector Machine (SVM) learning and rule-based algorithms has also been investigated in [18]. These methods could be incorporated into the intrusion detection system at the application level to improve detection accuracy. Computational intelligence [51] can also be applied to intrusion detection.

Along with the methods of power dynamics inspection, extensive studies on traditional low-power WSN attack scenarios [10] at the physical, MAC, and network layers provide complementary intrusion detection tools to be integrated into the smart grid communications security environment, specifically against jamming, replay, and DoS attacks. Several dominant metrics such as data sending rate, receiving rate, packet loss rate, and signal strength will be tailored to facilitate the detection of anomalous activities in smart grid communications under compromised circumstances.

Smart meter deployment has been initiated worldwide in the past few years. There are many reasons for replacing traditional meters with smart meters, but the fundamental one is the ability to monitor and control customer energy consumption more efficiently in real time through two-way communications by leveraging state-of-the-art wired/wireless and power line communications technologies. By gaining knowledge of individual energy usage patterns, utilities can more easily deal with primary issues such as peak demand alleviation, remote meter reading, and accommodation of distributed renewable energy sources, in order to increase energy efficiency and reduce greenhouse gas emissions. The entire smart grid AMI network, consisting of a number of control centers and hundreds of thousands of smart meters, is likely to operate using the Internet Protocol (IP) with IPv6 address assignment and to be connected to the Internet [1]. Smart meters support multiple communications protocols that facilitate smart energy management in HANs and mesh routing in NANs. Many have considered utilizing existing networks such as WiFi and wireless mesh networks to communicate in unlicensed bands for economic reasons. This strategy creates network uncertainties by exposing security vulnerabilities of smart metering communications to the public.

In the meantime, we propose grid sensor placement across the distribution network, in which grid sensors with a simpler design than smart meters are owned by utilities and form grid sensor networks operating in dedicated or licensed bands specified in IEEE 802.15.4g Smart Utility Network (SUN); see, e.g., [1], [2] for further studies. The grid sensor network is much less vulnerable to malicious attacks and is designed to act as a surveillance guard in the distribution grid. Moreover, deploying grid sensors on lines/feeders (as low-voltage sensors) brings utilities a number of potential benefits: 1) greater transparency and stability can be achieved owing to the substantial observability of power flow conditions on each segment and portion of the network, 2) voltage fluctuation due to varying input of renewable energy sources (e.g., household/neighborhood-based PV solar systems) can be effectively monitored, and 3) optimization in volt-var control and optimal power flow operations can be intelligently performed. Hence, utility operators will have full knowledge of their supervised network topologies in terms of the geographical locations and coordinates of grid sensors as well as smart meters, while monitoring the network quality and ensuring cyber-physical security. At this stage, we assume that all deployed grid sensors are intrusion resistant and their measurement data are trustworthy (i.e., the false alarm rate is zero), so that the measurement data of smart meters can be compared with those of grid sensors to detect and identify any data falsified by compromised smart meters.
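The comparison between smart meter data and trusted grid sensor data can be sketched as follows. This is a minimal illustration under stated assumptions: each grid sensor is taken to sit on the edge feeding exactly one smart meter, and the function name, meter identifiers, tolerance value, and readings are all hypothetical.

```python
def find_mismatches(gs_measured, sm_reported, tolerance_kwh=0.05):
    """Return the smart meters whose reported energy differs from the trusted
    grid sensor measurement on the same edge by more than tolerance_kwh."""
    suspects = []
    for meter_id, reported in sm_reported.items():
        trusted = gs_measured.get(meter_id)
        if trusted is None:
            continue  # no grid sensor on this edge; it cannot be verified here
        if abs(reported - trusted) > tolerance_kwh:
            suspects.append(meter_id)
    return sorted(suspects)


# Hypothetical readings in kWh: the attacker under-reports and a victim's
# reading is inflated by the same amount, so the neighborhood total is unchanged.
gs_measured = {"sm_attacker": 5.0, "sm_victim": 2.0, "sm_honest": 1.5}
sm_reported = {"sm_attacker": 3.0, "sm_victim": 4.0, "sm_honest": 1.5}

print(sum(gs_measured.values()), sum(sm_reported.values()))
print(find_mismatches(gs_measured, sm_reported))
```

The aggregate totals agree, so a check on the neighborhood sum alone would pass, whereas the per-edge comparison singles out both falsified readings. The tolerance stands in for measurement noise and line losses and would have to be calibrated in practice.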

As discussed in Section IV, the existing distribution grid is not transparent to the utilities to a certain degree. The design of grid sensor placement can help provide topological observability by deploying a sufficient number of grid sensors to guarantee state estimation solvability. In Fig. 5(a), a sensor is placed on every grid line, which results in an overdetermined system. In order to reduce the redundancy to a sufficient number while observability is still satisfied, a Grid-Placed Sensor (GPS) algorithm is proposed, as shown in Alg. 1. For the spanning tree illustrated in Fig. 5(b), the network graph $G(V_{T},E_{T})$ with depth $1,2,\ldots,d\in D_{T}$ is constructed by a set of EP and SM nodes $v_{1},v_{2},\ldots,v_{n}\in V_{T}$ and a set of edges $E_{T}$, where ${\cal N}_{SM}\subset V_{T},{\cal N}_{EP}\subset V_{T},\vert V_{T}\vert=\vert{\cal N}_{SM}\vert+\vert{\cal N}_{EP}\vert$.

At the beginning, the GS node $v_{1}$ is directly placed on the edge between the generation source and the distribution bus, i.e., $v_{2}$. The algorithm then starts with EP node $v_{2}$ and discovers that it has two children, which can be EP or SM nodes. Placing a GS node $v_{15}$ on either $e(v_{2},v_{3})$ or $e(v_{2},v_{16})$ makes both edges observable, according to Def. 1 in Section III. Note that $e(w,v)$ or $e(v,w)$ denotes the edge $e$ that connects nodes $w$ and $v$. Both edges that become observable are then marked with 1 in $I_{O}$. The process is repeated for the right branch: the algorithm starts with EP node $v_{16}$ and discovers that it also has two children; therefore, placing a GS node $v_{18}$ on either $e(v_{16},v_{19})$ or $e(v_{16},v_{17})$ makes both edges observable, and again the two observable edges are marked with 1 in $I_{O}$. Notably, although SM node $v_{17}$ already has metering capability that makes $e(v_{16},v_{17})$ observable, the GS node $v_{18}$ is placed in order to later verify whether or not the measurement data of SM node $v_{17}$ are legitimate. The process is repeated until it reaches the leaves with the largest $d$.
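The placement procedure described above can be sketched as a breadth-first traversal. This is a hedged illustration rather than a transcription of Alg. 1: the tree, the node names, the function name, and in particular the rule for choosing which child edge is left without a sensor are assumptions, since the description allows either edge.

```python
from collections import deque


def place_grid_sensors(children, root):
    """Breadth-first sketch of the placement described in the text.

    children maps each EP node to the list of its child nodes; SM nodes are
    leaves and do not appear as keys. Returns (sensed_edges, observable_edges).
    """
    sensed = [("source", root)]  # GS on the edge feeding the root EP node
    observable = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        kids = children.get(node, [])
        if not kids:
            continue  # SM leaf: nothing below it
        # Per the description, sensing all but one child edge of an EP node
        # makes every child edge observable. Assumption: the unsensed edge is
        # preferably one leading to another EP node, so SM edges get verified.
        skipped = next((k for k in kids if children.get(k)), kids[0])
        for kid in kids:
            if kid != skipped:
                sensed.append((node, kid))
            observable.add((node, kid))
            queue.append(kid)
    return sensed, observable


# Hypothetical spanning tree with four EP nodes and five SM leaves.
children = {
    "ep1": ["ep2", "ep3"],
    "ep2": ["sm1", "sm2"],
    "ep3": ["sm3", "ep4"],
    "ep4": ["sm4", "sm5"],
}
sensed, observable = place_grid_sensors(children, "ep1")
print(len(sensed), len(observable))
```

For this hypothetical tree, all eight edges are marked observable while only five grid sensors are placed, including the one next to the source.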

A spanning tree of a distribution network is said to be (sufficiently) observable if $Y(G)-I_{O}=0$, where $Y(G)$ is the $n\times n$ adjacency matrix of $G$ and the entry $y_{i,j}$ in $Y(G)$ is the number of edges from node $i$ to node $j$.

While $Y(G)$ represents the adjacency of edges, where $y_{i,j}=1$ if there exists an edge from node $i$ to node $j$ (otherwise, $y_{i,j}=0$), the $n\times n$ matrix $I_{O}$ specifies the observability of edges, where the entry $\alpha_{i,j}=1$ if the edge indicated by $y_{i,j}=1$ is observable according to the GPS algorithm and $\alpha_{i,j}=0$ otherwise. Since both $Y(G)$ and $I_{O}$ are determined by the status of the edges between node $i$ and node $j$ for $n$ nodes (i.e., edge existence and edge observability, respectively), the two matrices are identical when every existing edge is observable, i.e., $y_{i,j}-\alpha_{i,j}=0$, $\forall i$, $j\in V_{T}$. $\blacksquare$
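The condition $Y(G)-I_{O}=0$ can be checked entry by entry, as in the minimal sketch below. The three-node tree, the parent-to-child edge orientation, and the function names are assumptions made for illustration.

```python
def unobservable_edges(adjacency, observed):
    """List the (i, j) pairs where an edge exists but is not marked observable,
    i.e., the nonzero entries of Y(G) - I_O."""
    n = len(adjacency)
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if adjacency[i][j] - observed[i][j] != 0
    ]


def is_observable(adjacency, observed):
    """True when Y(G) - I_O is the zero matrix."""
    return not unobservable_edges(adjacency, observed)


# Hypothetical tree: EP node 0 with SM children 1 and 2 (edges parent -> child).
Y = [[0, 1, 1],
     [0, 0, 0],
     [0, 0, 0]]
I_full = [[0, 1, 1],
          [0, 0, 0],
          [0, 0, 0]]
I_partial = [[0, 1, 0],
             [0, 0, 0],
             [0, 0, 0]]

print(is_observable(Y, I_full))
print(unobservable_edges(Y, I_partial))
```

Returning the offending index pairs, rather than only a Boolean, lets an operator see which edges lost observability.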

In a spanning tree of a distribution network, the number of GS nodes placed on edges for the network to be observable is the same as the number of SM nodes.

By performing a breadth-first search, one can determine the number of children $m_{i}$ for every $v_{i}\in V_{T}$; ${\cal M}=\left(m_{1},m_{2},\ldots, m_{n}\right)$ is thus a set of non-negative integers and $\sum\nolimits_{\forall m_{i}\in{\cal M}}m_{i}=\vert V_{T}\vert-1$ is the total number of edges. Since the spanning tree imposes that the SM node must be a leaf and the EP node cannot be a leaf, there are $\gamma(<\vert V_{T}\vert)$ SM nodes (i.e., $m_{i}=0$ for a leaf node) while there are $\vert V_{T}\vert-\gamma$ EP nodes (i.e., $m_{i}>0$).

According to the GPS algorithm, $(m_{i}-1)$ of the $m_{i}$ children of the EP node $i$ (at each depth and branch) are placed with a GS node on the associated edges in order for the network to be observable; this implies that one edge connecting the EP node $i$ and one of its children does not require a GS node. Since there are $\vert V_{T}\vert-\gamma$ EP nodes in total, and hence $\vert V_{T}\vert-\gamma$ edges that do not need GS nodes, the total number of GS nodes is derived as $\left(\vert V_{T}\vert-1\right)-\left(\vert V_{T}\vert-\gamma\right)+1=\gamma$ (which equals the total number of SM nodes), where the third term, 1, accounts for the GS node near the root. $\blacksquare$
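The counting argument above can be checked numerically from the child counts $m_{i}$ alone. The sketch below uses hypothetical child-count tuples and an assumed function name; it compares the per-node count, the closed form, and the number of leaves.

```python
def gs_count_from_children(child_counts):
    """GS nodes implied by the counting argument: (m_i - 1) per EP node
    (m_i > 0), plus one GS node near the root."""
    return sum(m - 1 for m in child_counts if m > 0) + 1


def check_tree(child_counts):
    """Return (per-node count, closed form, number of SM leaves)."""
    num_nodes = len(child_counts)
    # A tree on |V_T| nodes has |V_T| - 1 edges.
    assert sum(child_counts) == num_nodes - 1, "child counts do not form a tree"
    num_sm = sum(1 for m in child_counts if m == 0)  # gamma
    closed_form = (num_nodes - 1) - (num_nodes - num_sm) + 1
    return gs_count_from_children(child_counts), closed_form, num_sm


# Hypothetical trees given as child counts per node.
print(check_tree((2, 2, 2, 2, 0, 0, 0, 0, 0)))
print(check_tree((2, 3, 0, 0, 0, 0)))
```

In both cases the three values coincide, as the derivation predicts.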

The observability condition $Y(G)-I_{O}=0$ allows utility network operators to rapidly determine the observability status of a distribution network once the adjacency matrix $Y(G)$ and the observability matrix $I_{O}$ are collected. Theorem IV-B.3 tells operators the total number of grid sensors required for a distribution network to be observable given the total number of smart meters.

SECTION V

We conducted three types of simulations in this paper in order to analyze the outcomes of the proposed CONSUMER attack model as well as grid sensor placement for detecting the attack.

In the first simulation, we set 5 kWh as the actual amount of energy that the attacker consumes in a certain time period, and four different values (4 kWh, 3 kWh, 2 kWh, and 1 kWh) as the amounts to which the attacker aims to reduce its reading and for which it actually pays; this means that the difference between the real and fabricated energy consumption has to be compensated by a number of chosen neighboring victims in order to evade detection. In Fig. 6, we consider three conditions in terms of the constraint level under which the attacker acts. In the unconstrained scenario, there are no bounds on what the attacker can steal. As a result, it needs to compromise as few as one neighboring smart meter (in addition to its own meter) to steal each of the four amounts of energy. In more practical cases where there are bounds (predetermined at utility control centers), we set the tolerated fluctuation in customers' energy consumption to 2 kWh for a loosely constrained case and to 1 kWh for a strictly constrained case. From the results, we discover that more smart meters need to be compromised to achieve the targets while bypassing detection.
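A back-of-the-envelope lower bound illustrates why tighter bounds force the attacker to compromise more meters. The sketch rests on a simplifying assumption that is not taken from the attack model: each victim's reading can be inflated by at most the tolerated fluctuation, and nothing else limits the attacker. It is not a reproduction of Fig. 6, and the function name is hypothetical.

```python
import math


def min_compromised_meters(actual_kwh, reported_kwh, tolerance_kwh=None):
    """Lower bound on the meters an attacker must compromise, own meter included.

    tolerance_kwh is the largest inflation one victim's reading can absorb
    without being flagged; None models the unconstrained scenario.
    """
    deficit = actual_kwh - reported_kwh
    if deficit <= 0:
        return 0  # nothing is stolen, so no meter needs to be compromised
    if tolerance_kwh is None:
        victims = 1
    else:
        # Small epsilon guards against floating-point noise in the division.
        victims = math.ceil(deficit / tolerance_kwh - 1e-9)
    return victims + 1  # add the attacker's own meter


for label, tol in (("unconstrained", None), ("loose", 2.0), ("strict", 1.0)):
    counts = [min_compromised_meters(5.0, r, tol) for r in (4.0, 3.0, 2.0, 1.0)]
    print(label, counts)
```

Under this assumption the bound grows as the tolerance shrinks, which matches the qualitative trend reported for the constrained cases.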

Remarkably, compromising a large number of smart meters is believed to be an improbable scenario because the attacker needs to know the upper and lower bounds of the victims' and its own energy consumption patterns, which the utility control center constantly monitors. However, one probable case should be emphasized: the attacker may change its strategy and launch a $p$–cluster ${\bf k}$–sparse attack throughout the network, still without being detected, where $p$ is the number of clustered attacks and ${\bf k}$ is a set of distinct values, each representing the number of compromised smart meters contained in one of the $p$ clusters. Such an attack is studied in the next subsection; note that the term sparse introduced here somewhat differs from that in [27].

In the second simulation, we study how multiple CONSUMER attacks, referred to as a multi-CONSUMER attack, may be launched by designating $p$ clustered attacks, in which each cluster is composed of a number of compromised smart meters equal to one of the values in the set ${\bf k}=\{2,3,\ldots, k_{SM}\}$. Each cluster must contain at least two meters: one for the attacker and one for the victim (see Theorem IV-A). If all $p$ clusters are composed of two meters (i.e., ${\bf k}=\{2\}$), a combination of ${\cal C}=(2,2,\ldots, 2)$ is formed, where $\vert{\cal C}\vert=p\leq\lfloor{{k_{SM}}\over{2}}\rfloor$. Note that the order in ${\cal C}$ does not matter and the summation of all element values in ${\cal C}$ equals $k_{SM}$. Given $k_{SM}$ and ${\bf k}$, one can determine the number of possible combinations ${\cal C}$ involved in a multi-CONSUMER attack.

Suppose there were six smart meters in a distribution network and the attacker was able to compromise all six meters. Originally, one attack cluster with all six meters could be formed as a single CONSUMER attack, namely, a 1–cluster $\{6\}$–sparse attack with the combination of ${\cal C}=(6)$. Alternatively, there might be two other possibilities of launching a multi-CONSUMER attack: 1) two attack clusters with three meters in each cluster, i.e., a 2–cluster $\{3\}$–sparse attack with the combination of ${\cal C}=(3,3)$, and 2) three attack clusters with two meters in each cluster, i.e., a 3–cluster $\{2\}$–sparse attack with the combination of ${\cal C}=(2,2,2)$. Similarly, if the attacker were to compromise eight meters with ${\bf k}=\{2,3,4\}$, a total of four cluster combinations could be generated: 1) $p=4$, ${\cal C}=(2,2,2,2)$, 2) $p=3$, ${\cal C}=(2,3,3)$, 3) $p=3$, ${\cal C}=(2,2,4)$, and 4) $p=2$, ${\cal C}=(4,4)$. Again, the order in ${\cal C}$ does not matter.
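The combinations listed above can be enumerated mechanically. The following sketch, with an assumed function name, generates every unordered combination ${\cal C}$ of cluster sizes drawn from ${\bf k}$ that sums to $k_{SM}$, emitting each one as a non-decreasing tuple so that reorderings are not counted twice.

```python
def cluster_combinations(total, sizes):
    """Yield every non-decreasing tuple of cluster sizes drawn from `sizes`
    whose elements sum to `total`."""
    sizes = sorted(set(sizes))

    def extend(remaining, start, prefix):
        if remaining == 0:
            yield tuple(prefix)
            return
        for idx in range(start, len(sizes)):
            size = sizes[idx]
            if size > remaining:
                break  # sizes are sorted, so larger ones cannot fit either
            # Reusing idx (not idx + 1) allows a size to repeat in a combination.
            yield from extend(remaining - size, idx, prefix + [size])

    yield from extend(total, 0, [])


eight = list(cluster_combinations(8, {2, 3, 4}))
print(eight)
print([len(c) for c in eight])  # number of clusters p in each combination
```

For eight meters and ${\bf k}=\{2,3,4\}$ the generator yields the same four combinations as listed in the text, and the length of each tuple is the corresponding $p$.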

We set $k_{SM}=50$, 100, and 300. Table I lists 14 sets of element values in ${\bf k}$, each drawn from values ranging from 2 to 5, that produce different cluster sizes under a multi-CONSUMER attack scenario. Taking $k_{SM}=50$ as an example, if the attacker maintains 2 compromised smart meters in each attack cluster, 25 clusters can be launched at once. In contrast, when the attacker launches a multi-CONSUMER attack that is collectively formed by clusters of 2, 3, 4, and/or 5 compromised meters, the number of combinations of cluster formation can be as high as 258; this can also be solved as a coin change problem. As more different cluster sizes are involved in the attack, more combinations are generated. This outcome may complicate detection for utility operators because a $p$–cluster attack may instigate $p$ CONSUMER attack problems at the same time.
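The coin change formulation mentioned above can be sketched with the standard dynamic program that counts unordered ways to reach a total from a set of denominations; here the denominations are the allowed cluster sizes in ${\bf k}$ and the total is $k_{SM}$. The function name is an assumption.

```python
def count_cluster_combinations(total, sizes):
    """Count unordered combinations of cluster sizes from `sizes` that sum to
    `total` (the classic coin change counting recurrence)."""
    ways = [1] + [0] * total  # ways[0] = 1: the empty combination
    for size in sorted(set(sizes)):
        # Processing one size at a time counts each multiset exactly once.
        for amount in range(size, total + 1):
            ways[amount] += ways[amount - size]
    return ways[total]


print(count_cluster_combinations(50, [2]))           # only (2, 2, ..., 2)
print(count_cluster_combinations(50, [2, 3, 4, 5]))  # 258, as quoted in the text
print(count_cluster_combinations(8, [2, 3, 4]))      # the four combinations listed earlier
```

The table-based count runs in time proportional to $k_{SM}\cdot\vert{\bf k}\vert$, so it remains cheap for $k_{SM}=300$, where explicit enumeration becomes much more expensive.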

More explicitly, Fig. 7(a) shows that, for $k_{SM}=50$, 10–25 clusters can be formed for ${\bf k}=\{2,3,4,5\}$, with an average of 16.29 clusters and a variance of 8.48; 10–25 clusters can also be formed for ${\bf k}=\{2,4,5\}$, with an average of 16.31 clusters and a variance of 14.66; and 10–16 clusters can be formed for ${\bf k}=\{3,4,5\}$, with an average of 13.08 clusters and a variance of 2.47. It is worth noting that both the values and the number of elements in ${\bf k}$ determine the number of combinations, which may increase dramatically when the total number of compromised meters increases, as depicted in Fig. 7(b) and (c).

In the last simulation, we investigate how the detection rate varies with different levels of network observability in terms of the number of grid sensors placed in the network. From an attacker's point of view, there are ${\vert{\cal N}_{SM}\vert\choose k_{SM}}$ ways to compromise $k_{SM}$ out of $\vert{\cal N}_{SM}\vert$ smart meters. Similarly, from a utility defender's point of view, the operator has to consider the ${n_{GS}\choose k_{GS}}$ possible ways in which $k_{GS}$ out of $n_{GS}$ grid sensors may become unavailable and cause partial unobservability of the network, where $n_{GS}$ is a sufficient number for the network to be observable. In the worst case, the detection rate can be as low as zero when the compromised smart meters are next to each other (whether they are connected to the same parent node or to parents that share an edge) and the grid sensor becomes unavailable exactly there. Two examples can be drawn from Fig. 5(b): 1) the worst, undetectable and unidentifiable cases: suppose that SM nodes $v_{27}$ and $v_{28}$ are compromised and at the same time GS node $v_{26}$ is unavailable, thus causing unobservability on $e(v_{25},v_{27})$ and $e(v_{25},v_{28})$; the CONSUMER attack on these two smart meters is then undetected. Likewise, suppose that SM nodes $v_{17}$ and $v_{20}$ are compromised; the unavailability of GS node $v_{18}$ can cause $e(v_{16},v_{17})$ and $e(v_{19},v_{20})$ to be unobservable, and hence the attack on SM nodes $v_{17}$ and $v_{20}$ is undetectable; and 2) the unidentifiable but detectable case: suppose that SM nodes $v_{17}$ and $v_{23}$ are compromised and GS node $v_{18}$ becomes unavailable; SM node $v_{23}$ is detected as an attacked node by observing GS nodes $v_{21}$ and $v_{24}$, but it cannot be determined whether one or both of SM nodes $v_{17}$ and $v_{20}$ are attacked. Hence, SM nodes $v_{17}$ and $v_{20}$ must be further inspected by the utility, and this is considered a detected case.
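The two binomial counts above are easy to evaluate for a given network size. The sketch below uses hypothetical values for $\vert{\cal N}_{SM}\vert$, $k_{SM}$, $n_{GS}$, and $k_{GS}$; multiplying the two counts into a number of joint scenarios is an added illustration that assumes attack and outage patterns are chosen independently.

```python
from math import comb


def scenario_counts(n_sm, k_sm, n_gs, k_gs):
    """Return (attack patterns, outage patterns, joint scenarios)."""
    attack_patterns = comb(n_sm, k_sm)  # ways to compromise k_sm of n_sm meters
    outage_patterns = comb(n_gs, k_gs)  # ways k_gs of n_gs sensors go unavailable
    # Assumption: patterns are independent, so joint scenarios multiply.
    return attack_patterns, outage_patterns, attack_patterns * outage_patterns


# Hypothetical network with 14 smart meters and, by the counting result,
# 14 grid sensors; two meters compromised and one sensor unavailable.
print(scenario_counts(14, 2, 14, 1))
```

Even this small hypothetical network yields more than a thousand joint scenarios, which is why the detection rate is reported as an average over many cases.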

Fig. 8 shows how the average rate of detecting the CONSUMER attack(s) can be improved by increasing the number of grid sensors. Since the numbers of smart meters and grid sensors are identical (as proven in Theorem IV-B.3) and, at the same time, smart meters being attacked and grid sensors becoming unavailable are equally likely, the detection rate and the grid sensor availability shown in Fig. 8 exhibit a linear relationship. From the results, we notice that the slope of the detection rate is steeper when the number of grid sensors (as well as smart meters) is smaller, and that it declines as the number of grid sensors increases. This means that a smaller network, in which a smaller sufficient number $n_{GS}$ of sensors is deployed, is more vulnerable to unobservability than a larger network when the same number of GS nodes becomes unavailable.

SECTION VI

In this paper, we have investigated a breach of data integrity attributed to false data injection attacks for the future power grid environment. We have formulated an attack model (CONSUMER) to illustrate that by compromising smart meters, an illegal customer “can steal” electricity by lowering the reading of its energy consumption and raising others' in a neighborhood distribution network. A novel hybrid intrusion detection system framework that incorporates power information and sensor placement has been developed to detect malicious activities such as CONSUMER attacks while the traditional bad measurement detectors cannot. An algorithm for placing grid sensors on lines or feeders strategically throughout a spanning-tree distribution network is proposed to provide sufficient network observability to enhance detection performance. We have shown that compromising a large number of smart meters may be improbable and indicated that the attack may be turned into multiple clustered attacks with a few compromised smart meters. We have also shown that while the detection rate can be improved by the proposed grid sensor placement with sufficient observability, it can be degraded by the unavailability of grid sensors as well.

Intrusion detection for the smart grid system (deployed with a large number of smart meters and grid sensors) will attract further investigation in the coming years. Meanwhile, we provide a few insights into some potential research topics associated with the proposed intrusion detection framework:

- Grid sensors in this paper are considered fully trustworthy. For practical scenarios, the trustworthiness of meters and sensors can be explored to determine possible impacts on the proposed intrusion detection framework by addressing uncertainties of network dynamics in the context of smart grid security, e.g., the attacker can launch an observability attack by compromising or disabling some of the grid sensors, thus making intrusion detection more challenging.
- Grid sensor localization and associated observability studies can be further extended to grid isolation designs. For example, grid isolation may be employed to prevent catastrophic failures from cyber-physical attacks, but the grid in islanded mode must remain observable as well.
- The proposed CONSUMER attack design, which is currently limited to a one-player attack, can be extended to multi-player CONSUMER warfare where more than one attacker tries to steal electricity at the same time period. The attack can be redesigned as a (non)cooperative game based on the CONSUMER attack model to broadly explore its variants against the integrity of the power distribution grid system.
- The complementary detection methods based on power and communications network inspection, which are incorporated in the proposed framework, can be elaborated further to improve detection performance.
- Further development of effective and efficient countermeasures is desired to cope with variants of the CONSUMER attack.

C.-H. LO and N. Ansari are with the Advanced Networking Laboratory, Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102 USA.

CORRESPONDING AUTHOR: C.-H. LO (CL96@njit.edu)

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
