A Bayesian Nash Equilibrium-Based Moving Target Defense Against Stealthy Sensor Attacks

We present a moving target defense strategy to reduce the impact of stealthy sensor attacks on feedback systems. The defender periodically and randomly switches between thresholds from a discrete set to increase the uncertainty for the attacker and make stealthy attacks detectable. However, the defender does not know the exact goal of the attacker, only a prior over the possible attacker goals. Here, we model one period with a constant threshold as a Bayesian game and use the Bayesian Nash equilibrium concept to find the distribution for the choice of the threshold in that period, which takes the defender's uncertainty about the attacker into account. To obtain the equilibrium distribution, the defender minimizes its cost, consisting of the cost for false alarms and the cost induced by the attack. We present a necessary and sufficient condition for the existence of a moving target defense and formulate a linear program to determine the moving target defense. Furthermore, we present a closed-form solution for the special case when the defender knows the attacker's goals. The results are numerically evaluated on a four-tank process.


I. INTRODUCTION
Critical infrastructures and control systems are increasingly connected to public communication networks, such as the Internet, and constitute geographically distributed cyber-physical systems (CPS). The use of public network infrastructures can not only save costs, for example for cabling, but also increase the performance of the CPS. However, this interconnection comes at the price of vulnerability to cyber-attacks, which are already impacting critical infrastructures, such as the Ukrainian power grid [1], as well as industrial control systems, such as a steel mill in Germany [2].
To improve CPS security and complement existing information technology (IT) security measures, such as encryption and authentication, a new branch of security measures based on control-theoretic methods has emerged over the last decade. These novel security measures are based on the physics of the CPS and use physical models to detect, isolate, and mitigate malicious attacks. Hence, the control-theoretic security measures are a complementary approach to the IT security measures. In the authors' opinion, the main difference between IT security measures and control-theoretic security measures is that IT security measures consider cryptography and logical isolation, while control-theoretic security measures are based on the physical models of the closed-loop system. Using both control-theoretic and IT security measures constitutes a defense-in-depth approach to security. An introduction to this topic can be found in the tutorial papers [3] and [4]. Since the attacker and the operator/defender are rational entities, their interaction is strategic and can thus be modeled using game-theoretic tools, see, for example, [5].
An emerging approach to detect attacks and to limit their impact, which can combine both physical models and game theory, is moving target defense (MTD) [6], which induces controlled uncertainties into the CPS to confuse the attacker. Gairo et al. [7], for example, randomly switch between the sensors used to detect otherwise stealthy attacks, while in [8] a random watermarking signal is injected into the CPS to make stealthy attacks detectable. Furthermore, perturbations of power line impedances are analyzed in [9], where an in-depth analysis is conducted to determine when the MTD will be successful. Another system-switching approach is considered in [10], which addresses both actuator and sensor attacks. However, the MTD strategies in [7]-[10] directly influence the closed-loop behavior of the CPS and can decrease its performance. Griffioen et al. [11] propose three different MTD schemes, where the first one is similar to [7] but switches not only the measurements used but also the plant and input matrices. The second MTD of [11] introduces an auxiliary system that detects an attack without influencing the closed-loop behavior, while the third MTD utilizes the nonlinearities in the measurements. In this work, we propose a moving target defense that is placed in the anomaly detector of the CPS, which is located outside of the control loop (see Figure 1). Therefore, the proposed MTD neither influences the closed-loop performance directly nor requires new auxiliary components, such that the controller and the moving target defense can be designed independently.

A. Contribution
When using an anomaly detector, the defender faces a trade-off between the cost for false alarms and the cost for the impact of a stealthy attack, where ideally both costs should be as small as possible. However, fewer false alarms typically lead to a larger attack impact, and vice versa, such that we cannot minimize both costs at the same time. Therefore, we formulate a game where the defender periodically chooses a detector threshold at random to mitigate the trade-off between its cost for false alarms and its cost induced by the stealthy attack launched by the attacker. The goal of the attacker is to maximize its payoff, which can, for example, be characterized by an unsafe region in the system's state space, while the defender wants to minimize the cost induced by false alarms and the cost induced by the attack. However, the defender is uncertain about the payoff function the attacker tries to optimize and only has a belief of facing an attacker with a certain payoff function. Here, we present an initial analysis of this game. We consider a single period with a constant threshold and look at the threshold choice for this period. We show that there is an equivalent matrix game to analyse the equilibrium strategies of each player.
The matrix game formulation is used to provide a necessary and sufficient condition for when a Bayesian Nash equilibrium exists in which the defender's strategy is mixed, i.e., does not concentrate the whole probability on one action. The defender's equilibrium strategy is then a moving target defense strategy. Furthermore, by using the structure of the matrix game, we show that the Bayesian Nash equilibrium can be obtained by solving a linear program. For the special case where the defender knows the attacker's type, we provide a closed-form solution for the Nash equilibrium, which gives us insights into the equilibrium strategies of the defender and attacker. Finally, we numerically verify our results with a four-tank system.

B. Related Work
Since control systems are typically equipped with an anomaly detector to detect faults, several research groups have investigated how the choice and tuning of the anomaly detector threshold can help limit the impact of stealthy attacks. When it comes to the choice of the detector, Murguia et al. [12] compare a $\chi^2$ and a CUSUM detector and investigate which detector mitigates the impact of a sensor attack the most.
In the present work, we are interested in the case where the detector is already chosen and we want to define a way to choose the thresholds to limit the attack impact. Urbina et al. [13] point out that there is a trade-off between the number of false alarms and the maximum impact of a stealthy attack when tuning the anomaly detector.
There are several other works that use the anomaly detector threshold to limit the attack impact or to detect attacks.
Ghafouri et al. [14] propose a Stackelberg game framework for choosing the detector threshold. Both a static and a dynamic choice of the detector threshold are presented, but the attack is assumed to be detectable. The cost that the defender wants to minimize is composed of the cost of false alarms, the cost of the attack impact, and the cost for switching between thresholds.
In [15], we extend the static detector threshold choice of [14] to the case of stealthy sensor attacks, prove the existence of an optimal threshold, and provide conditions for its uniqueness.
Niu et al. [16] formulated the detector threshold switching problem as a zero-sum Stackelberg game without considering the cost for false alarms.
In [17], we consider a similar game, but there the defender knows the attacker's objective exactly. This assumption is relaxed in the present work, since the defender only has a prior over possible attacker objectives. Therefore, we extend the results of [17] to a broader class of games, namely Bayesian games. Here, we also provide a closed-form solution to the special case considered in [17].

C. Notation
Let $x \in \mathbb{R}^n$ be an $n$-dimensional column vector and $A \in \mathbb{R}^{m \times n}$ be an $m$-by-$n$ matrix. The $i$th element of $x$ is denoted by $x_i$, and $A_{ij}$ corresponds to the element in the $i$th row and $j$th column of $A$. Further, $x_{i:j}$ is the vector $[x_i, x_{i+1}, \ldots, x_{j-1}, x_j]^T$, where $i \le j$. If a random variable $x$ has a Gaussian distribution with mean $\mu \in \mathbb{R}^n$ and covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$, we denote it as $x \sim \mathcal{N}(\mu, \Sigma)$. The expected value of a random variable $x$ is denoted by $\mathbb{E}\{x\}$. The $n$-by-$n$ identity matrix is denoted by $I_n$ and an $n$-dimensional column vector with all elements equal to one by $\mathbf{1}_n$, while the indicator function of an event $D$ is represented by $\mathbf{1}_{\{D\}}$.

II. SYSTEM MODEL
In this section, we introduce the models for the plant, controller, and detector and present our assumptions on the attacker and the defender.Further, in Section II-E we will discuss the assumptions made on the system, the attacker, and the defender.Figure 1 shows a block diagram of the sensor attack scenario that we consider.

A. Plant and Controller Model
In our setup, the plant receives actuator signals and sends measurement signals over a network. We model the plant in Fig. 1 as a linear discrete-time system

$$x(k+1) = A x(k) + B \tilde{u}(k) + w(k), \qquad y(k) = C x(k) + v(k),$$

where $x(k) \in \mathbb{R}^{n_x}$ is the plant's state, $\tilde{u}(k) \in \mathbb{R}^{n_u}$ is the actuator signal received over the network, $y(k) \in \mathbb{R}^{n_y}$ is the measurement signal, $w(k) \in \mathbb{R}^{n_x}$ is the process noise, and $v(k) \in \mathbb{R}^{n_y}$ is the measurement noise. Both $w(k)$ and $v(k)$ have independent and identically distributed zero-mean multivariate Gaussian distributions with covariance matrices $\Sigma_w$ and $\Sigma_v$, respectively. Further, $w(k)$ and $v(k)$ are independent processes. The system, input, and output matrices are $A$, $B$, and $C$, respectively.

Assumption 1: The pair $(C, A)$ is detectable and $(A, \Sigma_w^{1/2})$ has no uncontrollable modes on the unit circle.

The plant is controlled using a Kalman filter-based observer, which estimates the plant's state as $\hat{x}(k) \in \mathbb{R}^{n_x}$. The dynamics of the controller are

$$\hat{x}(k+1) = A \hat{x}(k) + B u(k) + L\big(\tilde{y}(k) - C \hat{x}(k)\big), \qquad u(k) = -K \hat{x}(k),$$

where $\tilde{y}(k)$ is the measurement signal received over the network, $u(k)$ is the actuator signal determined by the controller, and $K$ and $L$ are the controller gain and steady-state Kalman gain, respectively. Further, $L = APC^T(CPC^T + \Sigma_v)^{-1}$, where $P$ is the stabilizing solution to the algebraic Riccati equation

$$P = APA^T + \Sigma_w - APC^T(CPC^T + \Sigma_v)^{-1}CPA^T,$$

and $P$ exists due to Assumption 1.
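As an illustration of the gain computation above, the following Python sketch iterates the filter Riccati recursion until it reaches a fixed point and then forms $L$ and the residual covariance $\Sigma_r = CPC^T + \Sigma_v$, which is used in the detector model below. It assumes NumPy and that the recursion converges from the identity initialization, as it does for the hypothetical stable two-state plant in the usage lines. It is a sketch, not the authors' implementation; a dedicated discrete algebraic Riccati solver would be the more robust choice in practice.

```python
import numpy as np

def steady_state_kalman(A, C, Sigma_w, Sigma_v, max_iter=10_000, tol=1e-12):
    """Fixed-point iteration of the filter Riccati recursion.

    Returns P, the predictor-form gain L = A P C^T (C P C^T + Sigma_v)^{-1},
    and the residual covariance Sigma_r = C P C^T + Sigma_v.
    """
    P = np.eye(A.shape[0])
    for _ in range(max_iter):
        S = C @ P @ C.T + Sigma_v
        L = A @ P @ C.T @ np.linalg.inv(S)
        # P+ = A P A^T + Sigma_w - A P C^T S^{-1} C P A^T, written with L.
        P_next = A @ P @ A.T + Sigma_w - L @ S @ L.T
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    S = C @ P @ C.T + Sigma_v
    L = A @ P @ C.T @ np.linalg.inv(S)
    return P, L, S

# Hypothetical two-state plant with one measured output.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
P, L, Sigma_r = steady_state_kalman(A, C, 0.1 * np.eye(2), np.array([[0.05]]))
```

At convergence, $P$ satisfies the Riccati equation above up to the chosen tolerance, and `Sigma_r` is the matrix used to normalize the residual.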

B. Detector Model
Since faults and/or malicious attacks can occur, the closed-loop system is equipped with an anomaly detector on the controller side, which has the possibly nonlinear dynamics

$$x_D(k+1) = \theta\big(x_D(k), r(k+1)\big), \qquad y_D(k) = d\big(x_D(k), r(k+1)\big), \qquad (1)$$

where $x_D(k) \in \mathbb{R}^{n_D}$ is the detector's internal state, $y_D(k) \in \mathbb{R}_{\ge 0}$ is the detector output, and $r(k) \in \mathbb{R}^{n_y}$ is the detector input. The exact structure of $\theta$ and $d$ depends on the detector the defender uses. For example, in Section VII we consider the static $\chi^2$ detector, i.e., $y_D(k) = \|r(k+1)\|_2^2$. More detector models can be found in [18], [19].
We define the input $r(k)$ to be the residual signal, which is the normalized difference between the received and the predicted measurements, i.e.,

$$r(k) = \Sigma_r^{-1/2}\big(\tilde{y}(k) - C \hat{x}(k)\big),$$

where $\Sigma_r = CPC^T + \Sigma_v$ is the steady-state covariance matrix of $\tilde{y}(k) - C\hat{x}(k)$ under nominal conditions (no faults, no attacks), i.e., $\tilde{u}(k) = u(k)$ and $\tilde{y}(k) = y(k)$ for all $k$.
Assumption 2: The detector dynamics (1) fulfill the subsequent three conditions: 3) $d(0, 0) = 0$ and $\theta(0, 0) = 0$. If the predictions are accurate, i.e., $r(k) \approx 0$, the detector output should be small. However, if the predictions are inaccurate, both the detector state and the detector output should increase. Furthermore, the detector triggers an alarm whenever the detector output $y_D(k)$ exceeds the detection threshold $J_D > 0$.
Since the detector input is a random variable under nominal conditions, that is, $r(k) \sim \mathcal{N}(0, I_{n_y})$ due to the Kalman filter, the detector output $y_D(k)$ is also a random variable. To avoid too frequent false alarms, i.e., alarms triggered under nominal conditions when no attacker is present, the threshold $J_D$ should be chosen large enough. However, if it is chosen too large, the detector might not be able to detect anomalies (missed detection). Hence, there is a trade-off between false alarms and missed detections when choosing $J_D$. Urbina et al. [13] further noted that there is also a trade-off between false alarms and the impact of a stealthy attack, which in our case corresponds to a missed detection. For example, a larger $J_D$ reduces the frequency of false alarms but gives the attacker more space to remain stealthy while causing harm.
Since the number of false alarms plays an important role in detector tuning, we denote by $\tau$ the mean time between false alarms. The longer the mean time between false alarms we want to achieve, the larger the detector threshold has to be, which makes the following a reasonable assumption.
Assumption 3: The detector threshold is a strictly increasing (possibly nonlinear) function of $\tau$, i.e., $J_D(\tau_a) < J_D(\tau_b)$ if, and only if, $\tau_a < \tau_b$.
Instead of the threshold $J_D$, we consider the mean time between false alarms, $\tau$, in the following, since there is a direct relation between $J_D$ and $\tau$. Further, the value of $\tau$ is more meaningful to the operator. To circumvent the trade-off between false alarms and missed detections (and the impact of stealthy attacks), the defender could periodically randomize the choice of the mean time between false alarms $\tau$ such that in one period the threshold reduces the number of false alarms, while in another it limits the impact of a potential stealthy attack. That is, it chooses $\tau$ periodically from the fixed set $\{\tau_1, \ldots, \tau_m\}$, predetermined by the defender, with probability distribution $p$, where $1 \le \tau_1 < \tau_2 < \ldots < \tau_m$, $p_i \in [0, 1]$ is the probability of choosing $\tau_i$, and $\sum_{i=1}^{m} p_i = 1$. We make the following assumption about the random choice of $\tau$.
Assumption 4: At the beginning of each period, $\tau$ is drawn from the probability distribution $p$, independently of previous realizations.
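To make the relation between $\tau$ and $J_D$ concrete, consider the static $\chi^2$ detector. Under nominal conditions $r(k) \sim \mathcal{N}(0, I_{n_y})$, and since the Kalman filter innovations are white, we assume here that the per-step alarm events are independent. Each step then raises a false alarm with probability $\alpha = \Pr(\chi^2_{n_y} > J_D)$, so the time between false alarms is geometric with mean $\tau = 1/\alpha$. The sketch below inverts this relation by bisection to obtain $J_D(\tau)$. The function names are hypothetical, and, to stay within the Python standard library, it only handles an even $n_y$, for which the $\chi^2$ survival function has a closed form.

```python
import math

def chi2_sf_even(x, n):
    """P(X > x) for a chi-squared X with an even number n of degrees of freedom."""
    if n <= 0 or n % 2:
        raise ValueError("this sketch only handles even, positive n")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, n // 2):          # Poisson-sum closed form
        term *= half / i
        total += term
    return math.exp(-half) * total

def threshold_from_tau(tau, n_y, tol=1e-10):
    """J_D(tau): threshold with per-step false-alarm probability 1/tau."""
    if tau < 1:
        raise ValueError("tau must be at least one time step")
    target = 1.0 / tau
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, n_y) > target:   # bracket the root
        hi *= 2.0
    while hi - lo > tol:                     # bisection; sf is decreasing in x
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, n_y) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For n_y = 2 the survival function is exp(-J/2), so J_D(tau) = 2 ln(tau).
taus = [10, 100, 1000]
thresholds = [threshold_from_tau(t, n_y=2) for t in taus]
```

The computed thresholds increase with $\tau$, as required by Assumption 3.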
Our definition of a moving target defense is stated next.

Definition 1: A probability distribution $p \in \mathbb{R}^m$ over a fixed set of mean times between false alarms $\{\tau_1, \ldots, \tau_m\}$ is a moving target defense if $p$ does not have singleton support, i.e., the probability of choosing $\tau_i$ fulfills $p_i \in [0, 1)$ for all $i \in \{1, \ldots, m\}$.
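Assumption 4 and Definition 1 translate directly into code. The following minimal Python sketch, with hypothetical function names, draws one value of $\tau$ per period independently from $p$ and checks whether $p$ is a moving target defense, i.e., whether no single $\tau_i$ receives probability one.

```python
import random

def is_mtd(p, tol=1e-12):
    """Definition 1: p is an MTD if it does not put all mass on one tau_i."""
    if abs(sum(p) - 1.0) > 1e-9 or any(pi < -tol for pi in p):
        raise ValueError("p must be a probability distribution")
    return all(pi < 1.0 - tol for pi in p)

def draw_periods(taus, p, n_periods, seed=None):
    """Assumption 4: one tau per period, drawn i.i.d. from p."""
    rng = random.Random(seed)
    return rng.choices(taus, weights=p, k=n_periods)

schedule = draw_periods([10, 100, 1000], [0.2, 0.5, 0.3], n_periods=12, seed=0)
```

Because each draw ignores the previous ones, observing past values of $\tau$ reveals at most the distribution $p$, not the next threshold.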

C. Attacker Model
In this paper, our focus lies on sensor attacks.

Assumption 5: The measurement signals are subject to an additive attack $y_a(k)$ chosen by the attacker, and the actuator signals are transmitted fault/attack-free, i.e., $\tilde{y}(k) = y(k) + y_a(k)$ and $\tilde{u}(k) = u(k)$.
Furthermore, we make the following assumption on the attacker's model knowledge.
Assumption 6: The attacker knows the closed-loop system matrices $A$, $B$, $C$, $L$, $K$, the noise statistics $\Sigma_w$, $\Sigma_v$, and the detector dynamics. The attack starts at time $k = 0$ and has a length of $N$ time steps. The length $N$ of the attack is such that the attacker is able to complete the attack before the next threshold switch. The attacker knows both $x_D(0)$ and $\hat{x}(0)$, and has access to the measurements $y(k)$. Moreover, the attacker knows the function $J_D(\tau)$ and the set $\{\tau_1, \ldots, \tau_m\}$, but not the exact value of $\tau$.
An attacker according to Assumption 6 can launch an attack of the form (see [12])

$$y_a(k) = -y(k) + C\hat{x}(k) + \Sigma_r^{1/2} a(k),$$

which gives the attacker complete control over the detector input, i.e., $r(k) = a(k)$. This attack is a closed-loop attack since it uses the measurements $y(k)$, whereas $a(k)$ can be interpreted as the attacker's reference signal. The attacker can define the set of attacks that do not trigger an alarm for a given $\tau$ as

$$\mathcal{A}(\tau) = \big\{\{a, x_D(0)\} : y_D(k) \le J_D(\tau) \text{ for all time steps } k \text{ of the attack}\big\},$$

where $a$, which stacks the $N$ values $a(k)$, is the complete attack trajectory during the attack. Here, the set $\mathcal{A}(\tau)$ also constrains the size of $y_a(k)$ by constraining $a(k)$.
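The claim $r(k) = a(k)$ can be checked numerically. The sketch below assumes NumPy, the normalized residual $r(k) = \Sigma_r^{-1/2}(\tilde{y}(k) - C\hat{x}(k))$, and the symmetric square root for $\Sigma_r^{1/2}$ (any square root works as long as the residual uses its inverse). It builds the sensor attack from $y(k)$, $\hat{x}(k)$, and a chosen reference $a(k)$, and then recomputes the residual the detector receives. All numerical values are made up.

```python
import numpy as np

def sym_sqrt(S):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(w)) @ V.T

def residual(y_tilde, x_hat, C, Sigma_r):
    """Normalized residual r = Sigma_r^{-1/2} (y_tilde - C x_hat)."""
    return np.linalg.solve(sym_sqrt(Sigma_r), y_tilde - C @ x_hat)

def sensor_attack(y, x_hat, C, Sigma_r, a):
    """Additive attack y_a = -y + C x_hat + Sigma_r^{1/2} a."""
    return -y + C @ x_hat + sym_sqrt(Sigma_r) @ a

rng = np.random.default_rng(0)
C = rng.normal(size=(2, 3))
Sigma_r = np.array([[2.0, 0.3], [0.3, 1.0]])
y, x_hat = rng.normal(size=2), rng.normal(size=3)
a = np.array([0.5, -1.2])                  # attacker's reference signal
y_a = sensor_attack(y, x_hat, C, Sigma_r, a)
r = residual(y + y_a, x_hat, C, Sigma_r)   # what the detector receives
```

The attack cancels the true measurement and replaces the innovation with $\Sigma_r^{1/2}a(k)$, which is why the attacker needs $\hat{x}(k)$ and $y(k)$.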
Remark 1: Note that $\mathcal{A}(\tau_a) \subset \mathcal{A}(\tau_b)$ if $\tau_a < \tau_b$ due to Assumption 3. Furthermore, where appropriate, we use $a \in \mathcal{A}(\tau)$ instead of $\{a, x_D(0)\} \in \mathcal{A}(\tau)$ for the sake of readability.
If the attacker manages to choose $a$ such that $\{a, x_D(0)\} \in \mathcal{A}(\tau)$ in the current period with a constant threshold, the attacker remains stealthy, which is the main constraint of the attacker, as described below.
Assumption 7: The attacker has one of $n_\phi$ different types, which determine the objective of the attacker. An attacker of type $\phi$ wants to maximize its expected payoff characterized by $f_\phi(a)$. Further, if the attack is detected, the attacker receives no payoff and, therefore, it wants to remain stealthy, i.e., $y_D(k) \le J_D(\tau)$ during the attack.

Next, we define the attacker's expected payoff for a given attacker type, where we use the indicator function to take into account that the attacker will not get any payoff if it is detected:

$$p(\tau, a|\phi) = \mathbf{1}_{\{a \in \mathcal{A}(\tau)\}} f_\phi(a). \qquad (2)$$

Assumption 8: For a given attacker type $\phi$, the corresponding expected attacker payoff $f_\phi(a)$ is continuous and fulfills $\max_{a \in \mathcal{A}(\tau_a)} f_\phi(a) < \max_{a \in \mathcal{A}(\tau_b)} f_\phi(a)$ if $\tau_a < \tau_b$, except when $f_\phi(a) = 0$ for all $a$.
Remark 2: If $f_\phi(a)$ is a continuous, convex function and $\mathcal{A}(\tau)$ is a closed convex set, then Assumption 8 is fulfilled, since then $\max_{a \in \mathcal{A}(\tau)} f_\phi(a)$ is equivalent to a concave minimization problem, whose optimum is attained at an extreme point of $\mathcal{A}(\tau)$ (see [20]). If we use a vector norm-based stateless detector, such as the $\chi^2$ detector, $\mathcal{A}(\tau)$ is a closed convex set.

D. Defender model
Next, we describe our defender model. When choosing $\tau$, the defender needs to take into account the expected cost that is induced by false alarms in the nominal case, but also the expected cost of an undetectable attack. This leads to the following cost function for the defender, assuming an attacker of type $\phi$:

$$c(\tau, a|\phi) = \frac{c_F}{\tau} + p(\tau, a|\phi), \qquad (3)$$

where $c_F > 0$ is the cost factor for false alarms. Note that while the attacker has one of $n_\phi$ possible types, the defender has only one type, but its cost function is influenced by the attacker type.

Remark 3: Since the defender's cost (3) is always influenced by the attacker's payoff, it is reasonable to introduce an attacker type with zero payoff, i.e., $f_\phi(a) = 0$ for all $a$. This means that the case of no attacker being present in the system is modeled as well in our moving target defense framework.
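For a static $\chi^2$ detector, the detector state plays no role and $\mathcal{A}(\tau)$ simply requires $\|a(k)\|_2^2 \le J_D(\tau)$ at every attack step. Under that assumption, the payoff (2) and the cost (3) can be evaluated as in the following sketch. The objective `f` in the usage lines is a hypothetical attacker type, and the threshold is passed in directly rather than computed from $\tau$.

```python
def is_stealthy(a, J_D):
    """Static chi^2 detector: no alarm iff ||a(k)||^2 <= J_D at every step."""
    return all(sum(x * x for x in a_k) <= J_D for a_k in a)

def attacker_payoff(a, J_D, f):
    """Payoff (2): f(a) if the attack stays stealthy, zero if detected."""
    return f(a) if is_stealthy(a, J_D) else 0.0

def defender_cost(tau, a, J_D, f, c_F):
    """Cost (3): false-alarm cost c_F / tau plus the attacker's payoff."""
    return c_F / tau + attacker_payoff(a, J_D, f)

def f(a):
    """Hypothetical attacker type: gains the sum of the first residual component."""
    return sum(a_k[0] for a_k in a)

a = [[1.0, 0.0], [2.0, 0.0]]   # two-step attack trajectory
```

With $J_D = 4$ this trajectory is stealthy and adds its payoff of 3 to the defender's cost; with $J_D = 3.9$ the second step triggers an alarm, and only the false-alarm cost $c_F/\tau$ remains.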
Next, let us make the following assumption about the knowledge the attacker and defender have about each other.

E. Discussion of the system model
In this section, we discuss the assumptions made during the setup of the model. First, we discuss the assumption about the iid choice of $\tau$ (Assumption 4) and the attack length (Assumption 6). Since the values of $\tau$ are realizations of iid random variables, the current value of $\tau$ is independent of its previous values. Hence, observing the system does not reveal information about $\tau$ beyond its distribution, which the attacker can deduce from its system knowledge. The attacker will be able to estimate the distribution $p$ under the iid choice if it has access to previous values of $\tau$. This case is already taken into account in our MTD framework. A change of threshold implies a reconfiguration of the system, which can be costly for the operator of a safety-critical large-scale infrastructure. Therefore, the operator does not switch the thresholds too frequently. Hence, it is not unreasonable to analyse the case where the attack is carried out during a fixed, but random, configuration. Note that an approach to consider the cost of a finite number of switches can be found in [14]. Furthermore, for industrial processes where a product is produced in batches, an iid choice of the threshold between different batches is also a reasonable assumption.
The analysis of an attacker that experiences threshold switches during the attack is similar to the analysis we present in the subsequent sections, because we can determine the probability of first choosing $\tau_i$ and then $\tau_j$ due to the iid choice in Assumption 4. However, the notation would become more involved. Therefore, these assumptions simplify the problem formulation and make it mathematically more tractable.
Next, we justify the assumptions on the attacker's knowledge and goals. According to [21], one should design the plant for the worst-case attacker knowledge, because, given enough time, an attacker may be able to obtain a perfect model of the plant, the controller, and the detector. For example, the plant and controller could be estimated through system identification techniques from the observed sensor data, while the detector model could be obtained from leaked documentation of the system. Hence, the extensive knowledge of the attacker about the closed-loop system and detector according to Assumption 6 is in line with [21]. In our previous work [22], we showed how the attacker can obtain the internal states of both the controller and the detector in an experimental setup. Hence, assuming that the attacker has knowledge of the controller and detector states is not unreasonable. Further, the knowledge of $x_D(0)$ and $\hat{x}(0)$ can be interpreted as an opportunistic attacker choosing to attack at the best time instant, which we define, without loss of generality, to occur at $k = 0$. In contrast, the choice of $\tau$ is not visible in the sensor data observed by the attacker. In addition, the attacker knowledge in Assumption 6, together with the assumption that the attacker maximizes its objective function $f_\phi(a)$ (Assumption 7), results in a worst-case scenario for the defender under the given assumptions.
While both $x_D(0)$ and $\hat{x}(0)$ depend on the measurements and could, therefore, be estimated by the attacker, $\tau$ is chosen randomly from $\{\tau_1, \ldots, \tau_m\}$ (see Assumption 4) such that the attacker cannot know the exact value of $\tau$. Further, $\tau$ does not directly influence the system variables, such that the attacker is also not able to estimate $\tau$ from the measurements.
In Assumption 7, we introduce attacker types. A given attacker type, $\phi$, describes the target of the attacker through its objective, $f_\phi(a)$. Since the attacker's target is often not known to the defender, having different attacker types gives the defender the possibility to distinguish between different targets while using our sensor attack model, and to also incorporate the case of no attacker being present (Remark 3).
The assumption that the attacker will not get any payoff when detected (Assumption 7) is a strong assumption on both the attacker and defender, which is mostly beneficial for the defender. However, if we consider critical infrastructures, such as the power grid, an operator has to mitigate the attack quickly when detected to prevent harm. Furthermore, we can also imagine that the attacker has made a significant investment to obtain its system knowledge and infiltrate the system. Hence, the attacker wants to remain undetected in order not to risk losing its investment. This kind of attacker has similarities to an advanced persistent threat, which is an attacker that has knowledge about the system and targets specific parts of the system while remaining stealthy (see, for example, [23]). It is important to point out that, due to the random choice of $\tau$ (Assumption 4), it is more difficult for the attacker to remain stealthy while at the same time obtaining a large payoff.
The defender will rarely know the intentions of the attacker. To obtain information about potential targets of the attack, the defender can conduct a risk assessment [24] of the system. By conducting a risk assessment, the defender determines the vulnerabilities in its system, the likelihood of an attacker exploiting a vulnerability, and the potential impact of a successful attack. A vulnerability could be an unsafe region in the system's state space, e.g., the overpressure region for a tank, such that the attacker's objective would be to bring the system into this unsafe region. The different vulnerabilities can be interpreted as attacker types $\phi$, the prior $\pi_\phi$ as the likelihood of an attacker exploiting a vulnerability, and the impacts are reflected by the attacker's payoff $f_\phi(a)$, which directly influences the defender's cost (3). Hence, the defender's knowledge about possible attack objectives, their prior, and their impact, as assumed in Assumption 9, can be interpreted as the outcome of a risk assessment conducted by the defender. Therefore, we can interpret the Bayesian moving target defense framework as a tool to enhance the security for the defender, similar to the ARMOR framework deployed at LAX [25], which makes use of the outcomes of the risk assessment.

III. PROBLEM FORMULATION
Now we formulate the problem of finding a moving target defense strategy as a game between the defender and the attacker, where the defender's goal is to choose $\tau$ to minimize the expected value of (3) with respect to the prior of the attacker types, while the attacker chooses $a$ to maximize (2). Due to Assumption 6, we can focus on the game over one period with a constant threshold. This focus on only one period can also be interpreted as a repeated game with memoryless players, which has been considered in [26].
The game has both imperfect and incomplete information. The information is imperfect because neither player observes the action taken by the other player. The information is incomplete because the defender does not know which type of attacker it faces. The defender believes that with probability $\pi_\phi$ it will play the game with an attacker of type $\phi$. The imperfect information lets us interpret the game as a game with simultaneous moves, while the incomplete information results in a Bayesian game framework. Therefore, we define the moving target defense game $\mathcal{M} = \langle \mathcal{P}, \mathcal{A}, \mathcal{T}, \Pi, \mathcal{U} \rangle$, where $\mathcal{P} = \{\text{Defender}, \text{Attacker}\}$ is the set of players, $\mathcal{A} = \{\tau_1, \ldots, \tau_m\} \times \mathbb{R}^{N n_y + n_D}$ is the action set, $\mathcal{T} = \{1\} \times \{1, \ldots, n_\phi\}$ is the set of player types, $\Pi = \{1\} \times \{\pi_1, \ldots, \pi_{n_\phi}\}$ is the prior, and $\mathcal{U} = (c(\tau, a|\phi), p(\tau, a|\phi))$ contains the cost and payoff functions of each player. For the analysis, we also define the game $\mathcal{M}_\phi = \langle \mathcal{P}, \mathcal{A}, \mathcal{U} \rangle$, where the defender is certain about the attacker type it faces, that is, $\pi_\phi = 1$ for some $\phi \in \{1, \ldots, n_\phi\}$ in $\mathcal{M}$.
The Bayesian game framework, together with the simultaneous choice of actions, leads us to the Bayesian Nash equilibrium as the solution concept. To define the Bayesian Nash equilibrium, we introduce the (possibly mixed) strategies of the defender and attacker. Let $\Delta_p$ be the set of probability distributions over the defender's actions. Then $p \in \Delta_p$ is a discrete probability distribution, where the $i$th element, $p_i$, is the probability that the defender chooses $\tau_i$. For a given attacker type $\phi$, $\Delta_q(\phi)$ is the set of probability distributions over the attacker's action set. Since the attacker, to obtain a non-zero payoff, chooses a trajectory $a$ from $\mathcal{A}(\tau)$, which is typically not a discrete set, $q_\phi \in \Delta_q(\phi)$ may represent a continuous probability distribution. We call $p$ or $q_\phi$ a mixed strategy if it does not concentrate the whole probability on one action. Otherwise, we call it a pure strategy.
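Since both the attacker and defender might use mixed strategies, we investigate the average cost of the defender and the average payoff of the attacker for a given attacker type $\phi$,

$$\bar{c}_\phi(p, q_\phi) = \mathbb{E}_{\tau \sim p,\, a \sim q_\phi}\{c(\tau, a|\phi)\}, \qquad (4)$$

$$\bar{p}_\phi(p, q_\phi) = \mathbb{E}_{\tau \sim p,\, a \sim q_\phi}\{p(\tau, a|\phi)\}. \qquad (5)$$

Hence, (4) and (5) represent the average cost and payoff of the players in the game $\mathcal{M}_\phi$.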
A mixed strategy Bayesian Nash equilibrium, $p^* \in \Delta_p$ and $q^*_\phi \in \Delta_q(\phi)$, fulfills

$$\sum_{\phi=1}^{n_\phi} \pi_\phi\, \bar{c}_\phi(p^*, q^*_\phi) \le \sum_{\phi=1}^{n_\phi} \pi_\phi\, \bar{c}_\phi(p, q^*_\phi), \qquad \bar{p}_\phi(p^*, q^*_\phi) \ge \bar{p}_\phi(p^*, q_\phi), \qquad (6)$$

for all $p \in \Delta_p$, $q_\phi \in \Delta_q(\phi)$, and $\phi \in \{1, \ldots, n_\phi\}$. In the Bayesian Nash equilibrium, a change from $p^*$ to another $p \in \Delta_p$ does not lead to a decrease in the cost for the defender, and, similarly, a change from $q^*_\phi$ to another $q_\phi \in \Delta_q(\phi)$ does not lead to an increase in payoff for an attacker of type $\phi$. Hence, neither the defender nor the attacker wants to deviate from their equilibrium strategies. Here, the defender needs to consider all possible attacker types, which results in averaging the costs of each game $\mathcal{M}_\phi$ over the prior, while the attacker needs to have an equilibrium strategy for each type. This is because the attacker knows its own type, which the defender does not know, while the defender has only one type, which is known to both the defender and attacker.
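Equipped with the definitions of both the MTD and the Bayesian Nash equilibrium, we formulate the two problems we investigate in the remainder of this paper.
Problem 1: Determine a necessary and sufficient condition for the existence of a Bayesian Nash equilibrium in which the defender's strategy $p^*$ is a moving target defense.
Problem 2: If a Bayesian Nash equilibrium representing an MTD exists, compute an equilibrium strategy $p^*$.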

IV. MATRIX GAME FORMULATION
Recall that the defender plays against one of $n_\phi$ adversaries, but it does not know which adversary it is facing. Furthermore, while the defender has a finite set of actions, i.e., $\{\tau_1, \ldots, \tau_m\}$, the attacker's action set, $\mathbb{R}^{N n_y + n_D}$, is a continuum. This makes finding Bayesian Nash equilibrium strategies challenging. In this section, we show that each game $\mathcal{M}_\phi$ can be reformulated into a strategically equivalent game $\bar{\mathcal{M}}_\phi$, where the attacker's action set is finite too.
We begin by recalling that for a given $\tau$ the attacker only receives a non-zero payoff if $\{a, x_D(0)\} \in \mathcal{A}(\tau)$. Hence, for a given $\tau$ the attacker will always choose its attack trajectory such that $\{a, x_D(0)\} \in \mathcal{A}(\tau)$. Due to the discrete set of actions for the defender, we can separate the continuous action space of the attacker into $m+1$ sets in the game $\mathcal{M}_\phi$, as shown in Table I. The set $\mathcal{A}(\tau_i) \setminus \mathcal{A}(\tau_{i-1})$ contains all attack trajectories that are stealthy for $\tau_i$, excluding the ones that are stealthy for $\tau_{i-1}$. Hence, if $\{a, x_D(0)\} \in \mathcal{A}(\tau_i) \setminus \mathcal{A}(\tau_{i-1})$, then the attack will be detected if the defender chooses $\tau_{i-1}$, but not if it chooses $\tau_i$. We can remove the last column from Table I, because $a \notin \mathcal{A}(\tau_m)$ results in zero payoff for the attacker.
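The partition into columns can be illustrated for the static $\chi^2$ detector, where a trajectory's column is determined by its largest per-step residual energy. In the hypothetical sketch below, `J` holds the increasing thresholds $J_D(\tau_1) < \cdots < J_D(\tau_m)$. The function returns the index $j$ with $a \in \mathcal{A}(\tau_j) \setminus \mathcal{A}(\tau_{j-1})$, or $m+1$ for the discarded last column, so the attack is stealthy under $\tau_i$ exactly when $j \le i$.

```python
def column_index(a, J):
    """1-based column j with a in A(tau_j) minus A(tau_{j-1}), for a static
    chi^2 detector and increasing thresholds J; len(J) + 1 if a is detected
    under every tau_i (the column that can be removed)."""
    peak = max(sum(x * x for x in a_k) for a_k in a)
    for j, J_j in enumerate(J, start=1):
        if peak <= J_j:
            return j
    return len(J) + 1

def stealthy_under(a, J, i):
    """True iff the attack stays undetected when the defender plays tau_i."""
    return column_index(a, J) <= i

J = [1.0, 4.0, 9.0]
```

The condition `column_index(a, J) <= i` is exactly the indicator $\mathbf{1}_{\{j \le i\}}$ that reappears in the finite matrix game.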
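We define the maximum payoff for a given $\tau_i$ and a given attacker type $\phi$ as

$$I_i^\phi = \max_{\{a, x_D(0)\} \in \mathcal{A}(\tau_i)} f_\phi(a).$$

Note that we also optimize over $x_D(0)$, which the attacker normally has no influence over. We do this to obtain the maximum possible payoff an attacker could achieve, which goes along with the scenario of the worst-case attacker and the interpretation that the attacker waits for the optimal time to attack. We can show the following for the maximum payoff.

Lemma 1: For a given $\tau_i$ and $\phi$, the maximum payoff $I_i^\phi$ exists. Furthermore, $I_i^\phi < I_{i+1}^\phi$ for all $i \in \{1, \ldots, m-1\}$, except when $f_\phi(a) = 0$ for all $a$.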
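Lemma 1 can be checked on a case where the maximization has a closed form. Assume, hypothetically, a static $\chi^2$ detector, so that $\mathcal{A}(\tau_i)$ requires $\|a(k)\|_2^2 \le J_D(\tau_i)$ for $k = 1, \ldots, N$, and a linear objective $f_\phi(a) = \sum_k w^T a(k)$. By the Cauchy-Schwarz inequality, each term is at most $\sqrt{J_D(\tau_i)}\,\|w\|_2$, so $I_i^\phi = N\sqrt{J_D(\tau_i)}\,\|w\|_2$, attained by aligning every $a(k)$ with $w$. This value is strictly increasing in the threshold unless $w = 0$, which matches the second part of the lemma.

```python
import math

def max_payoff_linear(w, N, J_i):
    """Return I_i = max sum_k w^T a(k) s.t. ||a(k)||^2 <= J_i, k = 1..N,
    together with a maximizing trajectory (hypothetical payoff)."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_w == 0.0:                      # zero-payoff type: f(a) = 0
        return 0.0, [[0.0] * len(w) for _ in range(N)]
    a_k = [math.sqrt(J_i) * wi / norm_w for wi in w]   # on the boundary
    return N * math.sqrt(J_i) * norm_w, [list(a_k) for _ in range(N)]

w = [3.0, 4.0]
I = [max_payoff_linear(w, N=5, J_i=J)[0] for J in (1.0, 4.0, 9.0)]
```

As in Remark 2, the maximizer lies on the boundary of $\mathcal{A}(\tau_i)$, so it is not stealthy for any smaller threshold.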
Proof: We begin by proving the first part of the lemma. The first and second conditions in Assumption 2 guarantee that $\mathcal{A}(\tau_i)$ is a compact set for a given $\tau_i$ (see Theorem 7.1 in [27]). Hence, by the extreme value theorem, we know that $I_i^\phi$ always exists for a given $\tau_i$ and $\phi$. The second part of the lemma readily follows from Assumption 8.

With the maximum payoff for a given $\tau_i$, we can formulate a finite matrix game $\bar{\mathcal{M}}_\phi = \langle \mathcal{P}, \bar{\mathcal{A}}_\phi, \bar{\mathcal{U}}_\phi \rangle$, where $\mathcal{P}$ is defined as in $\mathcal{M}$, $\bar{\mathcal{A}}_\phi = \{\tau_1, \ldots, \tau_m\} \times \{I_1^\phi, \ldots, I_m^\phi\}$, and $\bar{\mathcal{U}}_\phi = \big(\frac{c_F}{\tau_i} + \mathbf{1}_{\{j \le i\}} I_j^\phi,\ \mathbf{1}_{\{j \le i\}} I_j^\phi\big)$, where $i, j \in \{1, \ldots, m\}$. Since both the attacker and the defender have finite action sets in $\bar{\mathcal{M}}_\phi$, we formulate $\bar{\mathcal{M}}_\phi$ as the matrix game shown in Table II. Further, we define the $m \times m$ matrix $\Omega(\phi)$ as the defender's cost matrix with elements $\Omega_{i,j}(\phi) = \frac{c_F}{\tau_i} + \mathbf{1}_{\{j \le i\}} I_j^\phi$, and $\Upsilon(\phi)$ as the attacker's $m \times m$ payoff matrix with elements $\Upsilon_{i,j}(\phi) = \mathbf{1}_{\{j \le i\}} I_j^\phi$.

Proposition 1: The finite game $\bar{\mathcal{M}}_\phi$ in Table II is strategically equivalent to the game $\mathcal{M}_\phi$ in Table I.
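The matrices $\Omega(\phi)$ and $\Upsilon(\phi)$ follow mechanically from $c_F$, the $\tau_i$, and the maximum payoffs $I_j^\phi$. The NumPy sketch below (hypothetical function name and numbers) uses a lower-triangular mask to encode $\mathbf{1}_{\{j \le i\}}$, i.e., the attack in column $j$ is stealthy whenever the defender plays $\tau_i$ with $i \ge j$.

```python
import numpy as np

def cost_and_payoff_matrices(c_F, taus, I):
    """Omega[i, j] = c_F / tau_i + 1{j <= i} I_j and Upsilon[i, j] = 1{j <= i} I_j,
    with defender action tau_i as row and attacker action I_j as column."""
    taus = np.asarray(taus, dtype=float)
    I = np.asarray(I, dtype=float)
    m = len(taus)
    stealthy = np.tril(np.ones((m, m)))      # entry (i, j) is 1 iff j <= i
    Upsilon = stealthy * I[np.newaxis, :]
    Omega = (c_F / taus)[:, np.newaxis] + Upsilon
    return Omega, Upsilon

Omega, Upsilon = cost_and_payoff_matrices(10.0, [1.0, 2.0, 5.0], [1.0, 3.0, 4.0])
```

Each row of $\Omega(\phi)$ adds the same false-alarm cost $c_F/\tau_i$ to the corresponding row of $\Upsilon(\phi)$, which is what makes the game non-zero-sum.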
Proof: Since the attacker's objective is to maximize its payoff (2), it always chooses the trajectory that maximizes its payoff. From Lemma 1, we know that a maximum payoff trajectory exists for each $\tau_i$ and, due to the strict increase of $I_i^\phi$ in $i$, it lies in the corresponding column set $\mathcal{A}(\tau_i) \setminus \mathcal{A}(\tau_{i-1})$ of Table I. Hence, choosing the maximum payoff is strategically equivalent to choosing an attack trajectory that yields the maximum payoff.
By using the equivalent game in Table II, we can simplify the average cost functions, (4) and ( 5) of the game M φ , used in the Bayesian Nash equilibrium (6) to bilinear functions of p and q φ , which helps us to solve both Problem 1 and Problem 2.
Corollary 1: In the strategically equivalent finite game $\bar{M}^\phi$, the average cost of the defender is given by $\bar{c}_\phi(p, q_\phi) = p^\top \Omega(\phi) q_\phi$ and the average payoff of the attacker is given by $\bar{p}_\phi(p, q_\phi) = p^\top \Upsilon(\phi) q_\phi$ for each attacker type, where the $i$th element $q_{\phi,i}$ of $q_\phi$ is the probability of choosing an attack trajectory that leads to the maximum payoff $I^\phi_i$.

Proof: Since the attacker has a finite set of actions in $\bar{M}^\phi$, its mixed strategy $q_\phi$ is a discrete probability distribution. This leads directly to bilinear functions of $p$ and $q_\phi$ for the average cost and payoff, respectively.
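To make the matrix structure concrete, the following Python sketch builds $\Omega(\phi)$ and $\Upsilon(\phi)$ for one attacker type and evaluates the bilinear forms of Corollary 1. The thresholds `tau`, the cost factor `c_F`, and the payoffs `I` are hypothetical illustration values, not data from the paper; exact fractions are used so that the identity $p^\top \Omega q = p^\top \gamma + p^\top \Upsilon q$ (valid whenever $q$ sums to one) can be checked without rounding.

```python
from fractions import Fraction


def cost_and_payoff_matrices(tau, c_F, I):
    """Build Omega(phi) and Upsilon(phi) for one attacker type.

    Row i corresponds to threshold tau_i, column j to the attack with maximum
    payoff I_j; the attack is stealthy (and pays off) iff j <= i.
    """
    m = len(tau)
    Upsilon = [[I[j] if j <= i else 0 for j in range(m)] for i in range(m)]
    Omega = [[c_F / tau[i] + Upsilon[i][j] for j in range(m)] for i in range(m)]
    return Omega, Upsilon


def bilinear(p, M, q):
    """Evaluate p^T M q for a defender mix p and an attacker mix q."""
    return sum(p[i] * M[i][j] * q[j] for i in range(len(p)) for j in range(len(q)))


# Hypothetical example: three thresholds (mean times between false alarms).
tau = [10, 100, 1000]
c_F = Fraction(10)
I = [1, 2, 3]  # increasing maximum payoffs I_1 < I_2 < I_3
Omega, Upsilon = cost_and_payoff_matrices(tau, c_F, I)

p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
q = [Fraction(1, 3)] * 3
gamma = [c_F / t for t in tau]
avg_cost = bilinear(p, Omega, q)      # defender's average cost
avg_payoff = bilinear(p, Upsilon, q)  # attacker's average payoff
print(avg_cost, avg_payoff)
```

Plain nested lists keep the sketch dependency-free; with NumPy the same forms are simply `p @ Omega @ q` and `p @ Upsilon @ q`.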

V. BAYESIAN NASH EQUILIBRIUM-BASED MTD
In the previous section, we showed that for any particular $\phi$ the corresponding game $M^\phi$ is strategically equivalent to a finite matrix game $\bar{M}^\phi$. This means that the Bayesian game $M$ is strategically equivalent to a finite Bayesian game, denoted by $\bar{M}$, and its equilibria can be found by formulating an induced matrix game [28], obtained by combining the games $\bar{M}^\phi$ with respect to the prior. In what follows, we first illustrate the procedure and then use the induced game to give a necessary and sufficient condition for the existence of a Bayesian Nash equilibrium that is a moving target defense according to Definition 1.

A. An illustrative example
We start with an illustrative example, where each player has two actions to choose from.The attacker is assumed to have type 1 with probability π 1 and type 2 with probability π 2 = 1 − π 1 .Hence, the finite game M φ corresponding to attacker type φ is as shown in Table III, where φ ∈ {1, 2}.
To find the Bayesian Nash equilibrium, we can formulate an induced matrix game (see [28]) and find the Nash equilibria of that induced matrix game, which correspond to the Bayesian Nash equilibria of the original game $M$. In the induced game, we combine the matrix games $\bar{M}^1$ and $\bar{M}^2$ into one game. The actions of the defender in the induced game are the same as in the games $\bar{M}^1$ and $\bar{M}^2$, that is, it can choose $\tau_1$ or $\tau_2$. The attacker, however, has the actions $I^1_{i_1} I^2_{i_2}$, where $i_1, i_2 \in \{1, 2\}$. Hence, the attacker in the induced game is a combination of the attackers in the games $\bar{M}^1$ and $\bar{M}^2$, and its payoff is the expected value over the attacker types given the defender's prior $[\pi_1, \pi_2]$. The induced game is illustrated in Table IV. If the attacker chooses $I^1_{i_1} I^2_{i_2}$ in the induced game, then its action in $\bar{M}^1$ is its $i_1$th action, i.e., $I^1_{i_1}$, and its action in $\bar{M}^2$ is its $i_2$th action, i.e., $I^2_{i_2}$. From Table IV, we observe that the defender prefers $\tau_2$ over $\tau_1$ if the following conditions hold:
$$\begin{aligned}
\frac{c_F}{\tau_1} + \pi_1 I^1_1 + \pi_2 I^2_1 &> \frac{c_F}{\tau_2} + \pi_1 I^1_1 + \pi_2 I^2_1,\\
\frac{c_F}{\tau_1} + \pi_1 I^1_1 &> \frac{c_F}{\tau_2} + \pi_1 I^1_1 + \pi_2 I^2_2,\\
\frac{c_F}{\tau_1} + \pi_2 I^2_1 &> \frac{c_F}{\tau_2} + \pi_1 I^1_2 + \pi_2 I^2_1,\\
\frac{c_F}{\tau_1} &> \frac{c_F}{\tau_2} + \pi_1 I^1_2 + \pi_2 I^2_2.
\end{aligned}$$
Note that the first inequality always holds since $\tau_1 < \tau_2$, while the second and third inequalities hold if the last inequality holds. Hence, the defender prefers to play $\tau_2$ over $\tau_1$ if $\frac{c_F}{\tau_1} > \frac{c_F}{\tau_2} + \pi_1 I^1_2 + \pi_2 I^2_2$. In this case, the defender plays $\tau_2$ independently of the attacker's action, so the attacker always plays the action that maximizes its payoff, i.e., $I^1_2 I^2_2$. Therefore, the only Bayesian Nash equilibrium is in pure strategies, which is not a moving target defense. For this simple example, we thus obtain a sufficient condition for the nonexistence of an MTD. However, in this example the induced matrix game has a manageable size and we can calculate the Bayesian Nash equilibrium by hand. Assume now that the defender has $m > 2$ actions, while the attacker has $m$ actions and $n_\phi > 1$ types. Then the induced matrix game is an $m \times m^{n_\phi}$ matrix game, whose size becomes unmanageable as $m$, $n_\phi$, or both grow.
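The construction of the induced game can be sketched in a few lines of Python. Each column is a joint attack profile $(i_1, \dots, i_{n_\phi})$, so the enumeration below also makes the $m^{n_\phi}$ growth explicit. The function names and all numbers are hypothetical; the sketch only cross-checks the closed-form condition of the two-type example against a brute-force comparison of the two rows.

```python
from fractions import Fraction
from itertools import product


def induced_game(tau, c_F, I, prior):
    """Defender cost in the induced matrix game.

    Columns are joint attack profiles (i_1, ..., i_n); type phi earns
    I[phi][i_phi] only if its attack is stealthy for the chosen threshold.
    """
    m, n = len(tau), len(I)
    profiles = list(product(range(m), repeat=n))  # m ** n columns
    cost = [[c_F / tau[l] + sum(prior[phi] * I[phi][i]
                                for phi, i in enumerate(prof) if i <= l)
             for prof in profiles] for l in range(m)]
    return profiles, cost


def tau2_strictly_preferred(tau, c_F, I, prior):
    """Condition c_F/tau_1 > c_F/tau_2 + pi_1 I^1_2 + pi_2 I^2_2 from the example."""
    return c_F / tau[0] > c_F / tau[1] + prior[0] * I[0][1] + prior[1] * I[1][1]


tau = [10, 100]
I = [[1, 2], [2, 4]]  # hypothetical payoffs of types 1 and 2
prior = [Fraction(1, 2), Fraction(1, 2)]
for c_F in (Fraction(100), Fraction(20)):
    profiles, cost = induced_game(tau, c_F, I, prior)
    brute = all(cost[1][k] < cost[0][k] for k in range(len(profiles)))
    print(c_F, brute, tau2_strictly_preferred(tau, c_F, I, prior))
```

For both cost factors the brute-force comparison and the closed-form condition agree; with three actions and three types the same function already produces 27 columns.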

B. Best responses and strictly dominated actions
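In the induced matrix game, the actions of the attacker are $I^1_{i_1} I^2_{i_2} \cdots I^{n_\phi}_{i_{n_\phi}}$, where $i_\phi \in \{1, \dots, m\}$ and $\phi \in \{1, \dots, n_\phi\}$, while the defender chooses $\tau_l$. This leads to the attacker payoff $\sum_{\phi=1}^{n_\phi} \pi_\phi 1_{\{i_\phi \le l\}} I^\phi_{i_\phi}$ and the defender cost $\frac{c_F}{\tau_l} + \sum_{\phi=1}^{n_\phi} \pi_\phi 1_{\{i_\phi \le l\}} I^\phi_{i_\phi}$ in the induced matrix game, which we can use to characterize the best responses of the players.

Lemma 2: The best response of the attacker to a given action $\tau_l$ is
$$I^1_l I^2_l \cdots I^{n_\phi}_l, \quad (7)$$
and the best responses of the defender to a given action $I^1_{i_1} \cdots I^{n_\phi}_{i_{n_\phi}}$ are the actions $\tau_{l^*}$ with
$$l^* \in \arg\min_{l \in \{1, \dots, m\}} \left( \frac{c_F}{\tau_l} + \sum_{\phi=1}^{n_\phi} \pi_\phi 1_{\{i_\phi \le l\}} I^\phi_{i_\phi} \right). \quad (8)$$

Proof: We start by investigating the best response of the attacker. For a given $\tau_l$, the payoff $I^j_{i_j}$ of type $j$ with $i_j > l$ is zero since the attack is detected. Hence, $i_j \le l$ needs to be fulfilled for each attacker type to obtain a payoff. Recall that $I^j_i < I^j_\eta$ for all $i < \eta$. Hence, to obtain the maximum payoff for a given $\tau_l$, the attacker has the unique best response (7) in the induced matrix game.

For a given attack action, the defender's best response is to choose $\tau_l$ to minimize its cost, which results in the set of best responses given in (8).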
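The best responses can be computed directly. The Python sketch below implements our reading of (7) and (8) and cross-checks (7) by enumerating all joint profiles. The helper names and payoff values are hypothetical; note that uniqueness of the attacker's best response relies on strictly increasing payoffs and positive priors, so a zero-payoff type (Remark 3) would make the enumeration return ties.

```python
from fractions import Fraction
from itertools import product


def defender_cost(l, profile, tau, c_F, I, prior):
    """Induced-game cost of threshold tau_l against a joint attack profile."""
    return c_F / tau[l] + sum(prior[phi] * I[phi][i]
                              for phi, i in enumerate(profile) if i <= l)


def attacker_payoff(l, profile, I, prior):
    """Expected attacker payoff; attacks with index i > l are detected."""
    return sum(prior[phi] * I[phi][i] for phi, i in enumerate(profile) if i <= l)


def attacker_best_response(l, n_types):
    """(7): every type uses the largest attack that is stealthy for tau_l."""
    return (l,) * n_types


def defender_best_responses(profile, tau, c_F, I, prior):
    """(8): all thresholds that minimize the defender's expected cost."""
    costs = [defender_cost(l, profile, tau, c_F, I, prior) for l in range(len(tau))]
    best = min(costs)
    return [l for l, c in enumerate(costs) if c == best]


def brute_force_attacker(l, m, I, prior):
    """Enumerate all m ** n profiles to cross-check (7)."""
    profiles = list(product(range(m), repeat=len(I)))
    best = max(attacker_payoff(l, prof, I, prior) for prof in profiles)
    return [prof for prof in profiles if attacker_payoff(l, prof, I, prior) == best]


tau = [10, 100, 1000]
c_F = Fraction(10)
I = [[1, 2, 3], [2, 3, 5]]  # hypothetical, strictly increasing per type
prior = [Fraction(1, 2), Fraction(1, 2)]
print(brute_force_attacker(1, 3, I, prior), attacker_best_response(1, 2))
print(defender_best_responses((2, 2), tau, c_F, I, prior))
```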
While Lemma 2 provides the unique best response of the attacker, the defender might have several best responses. Moreover, the defender's best response involves a trade-off. For a given attacker action, it could, for example, be best to choose $\tau_m$ to minimize the cost for false alarms. Choosing a smaller $\tau$ instead increases the false alarm cost, but it makes more attacks detectable, which in turn decreases the attack cost. Hence, the best response depends on many factors. Having examined the best responses, we now investigate when actions are strictly dominated in the induced matrix game. For the defender, an action $\tau_l$ strictly dominates $\tau_\eta$ if the cost for $\tau_l$ is strictly lower than the cost for $\tau_\eta$ for all possible actions of the attacker. Strict dominance of one attacker action over another is defined similarly. With the best responses and the strictly dominated actions, we are then equipped to prove the existence of moving target defenses according to Definition 1. Recall that eliminating strictly dominated actions changes neither the set of Nash equilibria of the induced game nor, therefore, the set of Bayesian Nash equilibria of the original game $M$.
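Proof of Lemma 3: We start with the strict dominance of the rows. First note that $\tau_l$ strictly dominates $\tau_\eta$ if
$$\frac{c_F}{\tau_l} + \sum_{j=1}^{n_\phi} \pi_j 1_{\{i_j \le l\}} I^j_{i_j} < \frac{c_F}{\tau_\eta} + \sum_{j=1}^{n_\phi} \pi_j 1_{\{i_j \le \eta\}} I^j_{i_j} \quad (10)$$
holds for all possible attacker actions $I^1_{i_1} \cdots I^{n_\phi}_{i_{n_\phi}}$. We can split this condition into three cases:

1) All attacker types use attacks that are stealthy for $\tau_\eta$, i.e., actions with $i_j \le \eta$ for all $j$. Here, the terms related to the attacker payoff on both sides of (10) are the same, such that (10) simplifies to $\frac{c_F}{\tau_l} < \frac{c_F}{\tau_\eta}$. Hence, $\eta < l$ is necessary for the strict dominance of $\tau_l$ over $\tau_\eta$.

2) All attacker types use attacks that are detected for $\tau_l$, i.e., actions with $i_j > l$ for all $j$. Since these attacks are detected by $\tau_l$, they are also detected by $\tau_\eta$, such that the terms related to the attacker payoff on both sides of (10) vanish. Therefore, (10) again simplifies to $\frac{c_F}{\tau_l} < \frac{c_F}{\tau_\eta}$.

3) At least one attacker type uses an attack that is stealthy for $\tau_l$ but not for $\tau_\eta$, i.e., actions with $\eta < i_j \le l$ for some $j$. Subtracting the attacker payoff terms on the right-hand side of (10) from both sides leads to
$$\frac{c_F}{\tau_l} + \sum_{j:\, \eta < i_j \le l} \pi_j I^j_{i_j} < \frac{c_F}{\tau_\eta}. \quad (11)$$

The first two cases show that we need $\tau_l > \tau_\eta$, or equivalently $l > \eta$, for $\tau_l$ to strictly dominate $\tau_\eta$. For the third case, since $\sum_{j:\, \eta < i_j \le l} \pi_j I^j_{i_j} \le \sum_{j=1}^{n_\phi} \pi_j I^j_l$ always holds, (11) holds if (9), i.e., $\frac{c_F}{\tau_l} + \sum_{j=1}^{n_\phi} \pi_j I^j_l < \frac{c_F}{\tau_\eta}$, holds. Further, since $1_{\{i_j \le l\}} I^j_{i_j} \le I^j_l$ holds for all $i_j \in \{1, \dots, m\}$ and $j \in \{1, \dots, n_\phi\}$, and $\frac{c_F}{\tau_\nu} \ge \frac{c_F}{\tau_\eta}$ for all $\nu \le \eta$, we see that if (9) holds, $\tau_l$ strictly dominates not only $\tau_\eta$ but all $\tau_\nu$ with $\nu \in \{1, \dots, \eta\}$. Therefore, if (9) holds, we can remove the first $\eta$ rows of the induced matrix game. With the first $\eta$ rows removed, it follows that $I^1_{\eta+1} \cdots I^{n_\phi}_{\eta+1}$ strictly dominates all actions $I^1_{i_1} \cdots I^{n_\phi}_{i_{n_\phi}}$ with $i_j \in \{1, \dots, \eta+1\}$ for all $j$. Hence, we can additionally remove the $(\eta+1)^{n_\phi} - 1$ columns corresponding to these actions to obtain the reduced matrix game.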
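The row-dominance test can be checked numerically. The Python sketch below reads condition (9) as $\frac{c_F}{\tau_l} + \sum_{\phi} \pi_\phi I^\phi_l < \frac{c_F}{\tau_\eta}$ with $l > \eta$ (our reading of the proof above) and compares it with the definition (10) by enumerating all joint attack profiles. The function names and numbers are hypothetical, and the equivalence relies on payoffs that increase with the attack index.

```python
from fractions import Fraction
from itertools import product


def induced_cost(l, profile, tau, c_F, I, prior):
    return c_F / tau[l] + sum(prior[phi] * I[phi][i]
                              for phi, i in enumerate(profile) if i <= l)


def dominates_by_condition(l, eta, tau, c_F, I, prior):
    """Condition (9) as read here: l > eta and the worst case of tau_l beats tau_eta."""
    worst_attack = sum(pi * I_phi[l] for pi, I_phi in zip(prior, I))
    return l > eta and c_F / tau[l] + worst_attack < c_F / tau[eta]


def dominates_by_enumeration(l, eta, tau, c_F, I, prior):
    """Definition (10): strictly lower cost against every joint attack profile."""
    m = len(tau)
    return all(induced_cost(l, prof, tau, c_F, I, prior)
               < induced_cost(eta, prof, tau, c_F, I, prior)
               for prof in product(range(m), repeat=len(I)))


def dominated_thresholds(tau, c_F, I, prior):
    """Indices of thresholds strictly dominated by some larger threshold."""
    m = len(tau)
    return [eta for eta in range(m)
            if any(dominates_by_condition(l, eta, tau, c_F, I, prior)
                   for l in range(eta + 1, m))]


tau = [10, 100, 1000, 10000]
c_F = Fraction(100)
I = [[1, 2, 3, 4], [2, 3, 4, 5]]  # hypothetical payoffs
prior = [Fraction(1, 2), Fraction(1, 2)]
print(dominated_thresholds(tau, c_F, I, prior))
```

Checking the closed-form condition costs $O(m^2 n_\phi)$ operations, whereas the enumeration grows like $m^{n_\phi}$, which is the practical reason for eliminating rows via (9).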

C. Existence of an MTD strategy (Problem 1)
We now formulate a necessary and sufficient condition for the existence of an MTD strategy for the defender in the Bayesian game $M$ according to Definition 1.
Theorem 1: A moving target defense strategy exists if, and only if,
$$\frac{c_F}{\tau_{m-1}} - \frac{c_F}{\tau_m} \le \sum_{\phi=1}^{n_\phi} \pi_\phi I^\phi_m. \quad (12)$$

Proof: First, assume that (12) does not hold. Then (9) is fulfilled with $l = m$ and $\eta = m-1$. Hence, we can reduce the induced matrix game to a $1 \times 1$ matrix game (see Lemma 3), which has a pure strategy equilibrium. Therefore, the original Bayesian game $M$ has a unique and pure Bayesian Nash equilibrium, such that no MTD strategy exists.
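Next, we show that there exists at least one Nash equilibrium where the defender plays an MTD strategy if (12) holds. Since the induced matrix game is a finite matrix game, there exists at least one Nash equilibrium and, equivalently, at least one Bayesian Nash equilibrium of the original game $M$. For a Nash equilibrium to be a pure strategy Nash equilibrium, each player's action needs to be a best response to the other player's action. Assume that $(\tau_l, I^1_{i_1} \cdots I^{n_\phi}_{i_{n_\phi}})$ is a pure strategy Nash equilibrium. Then, according to Lemma 2, $I^1_{i_1} \cdots I^{n_\phi}_{i_{n_\phi}}$ must be the attacker's best response to $\tau_l$, and $\tau_l$ must be a best response of the defender to $I^1_{i_1} \cdots I^{n_\phi}_{i_{n_\phi}}$. Comparing the first condition with the attacker's best response (7), we see that in a pure Nash equilibrium $i_j = l$ for all $j \in \{1, \dots, n_\phi\}$. If $l < m$, then $\tau_m$ yields the cost $\frac{c_F}{\tau_m} + \sum_{\phi} \pi_\phi I^\phi_l < \frac{c_F}{\tau_l} + \sum_{\phi} \pi_\phi I^\phi_l$, so $\tau_l$ is not a best response. To have a pure Nash equilibrium, we thus need $l = m$. Against $I^1_m \cdots I^{n_\phi}_m$, every $\tau_l$ with $l < m$ detects all attacks, so by (8) the best response of the defender is
$$\begin{cases} \tau_{m-1}, & \text{if } \frac{c_F}{\tau_{m-1}} - \frac{c_F}{\tau_m} < \sum_{\phi} \pi_\phi I^\phi_m,\\ \tau_{m-1} \text{ and } \tau_m, & \text{if } \frac{c_F}{\tau_{m-1}} - \frac{c_F}{\tau_m} = \sum_{\phi} \pi_\phi I^\phi_m. \end{cases} \quad (13)$$
We observe that there cannot be a pure Nash equilibrium if (12) holds with strict inequality, such that all equilibria are moving target defenses. However, if (12) holds with equality, the best response of the defender to $I^1_m \cdots I^{n_\phi}_m$ can be both $\tau_{m-1}$ and $\tau_m$. Hence, in this case there exists a pure strategy Nash equilibrium in the induced matrix game. Next, we show that a moving target defense equilibrium strategy exists in this case as well.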
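First, note that if (12) holds with equality, then $\tau_i$ is strictly dominated by $\tau_m$ for all $i \in \{1, \dots, m-2\}$, such that we can reduce the induced matrix game to a $2 \times (m^{n_\phi} - (m-1)^{n_\phi} + 1)$ matrix game. Further, from (13) we see that any distribution over $\tau_{m-1}$ and $\tau_m$ is a best response to the attack strategy $I^1_m \cdots I^{n_\phi}_m$. If $I^1_m \cdots I^{n_\phi}_m$ is also a best response to at least one distribution over $\tau_{m-1}$ and $\tau_m$, then we have found a Bayesian Nash equilibrium that fulfills Definition 1. By multiplying the attacker's payoff matrix in the reduced matrix game from the left with the distribution over $\tau_{m-1}$ and $\tau_m$, we determine that the expected payoff for playing $I^1_{i_1} \cdots I^{n_\phi}_{i_{n_\phi}}$ is
$$\sum_{j=1}^{n_\phi} \pi_j \left(1_{\{i_j \le m-1\}} + p_m 1_{\{i_j = m\}}\right) I^j_{i_j},$$
where $i_j \in \{1, \dots, m\}$ for all $j \in \{1, \dots, n_\phi\}$ and $p_m$ is the probability of choosing $\tau_m$. Note that $I^1_m \cdots I^{n_\phi}_m$ is a best response to the mixed strategy of the defender if its expected payoff is greater than or equal to all other expected payoffs the attacker could receive, i.e., if
$$p_m \sum_{j=1}^{n_\phi} \pi_j I^j_m \ge \sum_{j=1}^{n_\phi} \pi_j \left(1_{\{i_j \le m-1\}} + p_m 1_{\{i_j = m\}}\right) I^j_{i_j}$$
for all $i_j$. Hence, the attacker prefers to play $I^1_m \cdots I^{n_\phi}_m$ if $p_m I^j_m \ge I^j_{m-1}$ for every attacker type $j$ with $\pi_j > 0$, where we used that $I^j_{i_j} < I^j_{m-1}$ for all $i_j < m-1$. Since $I^j_{m-1} < I^j_m$, this condition is fulfilled by every $p_m \in \left[\max_j I^j_{m-1}/I^j_m, 1\right)$. This shows that there are infinitely many Nash equilibria in the induced matrix game where the defender uses an MTD strategy according to Definition 1 if (12) holds with equality.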
Therefore, we conclude that a moving target defense strategy according to Definition 1 exists if, and only if, (12) holds.

Remark 4: If $\pi_i = 1$ for some $i$, i.e., there is only one attacker type, the condition in Theorem 1 simplifies to the condition for the existence of an MTD from [17].
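Checking Theorem 1 only requires the two largest thresholds and the largest payoffs. The Python sketch below evaluates (12), read as $\frac{c_F}{\tau_{m-1}} - \frac{c_F}{\tau_m} \le \sum_\phi \pi_\phi I^\phi_m$ (our reading, consistent with the proof above), for hypothetical numbers, including the equality case and the single-type form mentioned in Remark 4.

```python
from fractions import Fraction


def mtd_exists(tau, c_F, I, prior):
    """Condition (12): c_F/tau_{m-1} - c_F/tau_m <= sum_phi pi_phi I^phi_m."""
    lhs = c_F / tau[-2] - c_F / tau[-1]
    rhs = sum(pi * I_phi[-1] for pi, I_phi in zip(prior, I))
    return lhs <= rhs


tau = [10, 100, 1000, 10000]
I = [[1, 2, 3, 4], [2, 3, 4, 5]]  # hypothetical payoffs of two types
prior = [Fraction(1, 2), Fraction(1, 2)]
print(mtd_exists(tau, Fraction(100), I, prior))    # small false-alarm cost factor
print(mtd_exists(tau, Fraction(10**6), I, prior))  # very large false-alarm cost factor
```

With these numbers, $c_F = 5000$ is exactly the equality case of (12); any larger cost factor makes $\tau_m$ strictly dominate $\tau_{m-1}$.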
Remark 5: In case (12) holds with equality, we can introduce an attacker type that has zero payoff, as mentioned in Remark 3, with a prior of $\varepsilon > 0$ and subtract $\varepsilon$ from one of the priors $\pi_j$. Then the two sides of (12) are no longer equal. Hence, the equality case of (12) can always be avoided by an arbitrarily small change in the priors.

VI. COMPUTING AN MTD STRATEGY (PROBLEM 2)
In this section, we look into computing an MTD strategy. First, we investigate the general case and formulate a linear program to compute MTD strategies. Second, we investigate the special case $n_\phi = 1$ and provide a closed-form solution for computing an MTD strategy.

A. General case
Finding Nash equilibria of a finite matrix game leads to a bilinear optimization problem, as shown in [29]. For Bayesian Nash equilibria, we can adapt the optimization problem in [30] to obtain the bilinear optimization problem (14), whose minimization is carried out over $p$, $q_\phi$, $c$, and $\bar{p}(\phi)$. Recall that the elements of the matrices $\Omega(\phi)$ and $\Upsilon(\phi)$ are $\Omega_{i,j}(\phi) = \frac{c_F}{\tau_i} + 1_{\{j \le i\}} I^\phi_j$ and $\Upsilon_{i,j}(\phi) = 1_{\{j \le i\}} I^\phi_j$, respectively. Here, $q_\phi$ is the mixed strategy of the attacker with type $\phi$ and $\bar{p}(\phi)$ is its average payoff, while $p$ is the mixed strategy of the defender and $c$ is its average cost.
Proposition 2: Assume that the condition of Theorem 1 holds, and thus an MTD exists. An MTD strategy can then be computed by solving the linear program (15) over $p$, $q_\phi$, $c$, and $\bar{p}(\phi)$, where the $l$th element of the $m$-dimensional vector $\gamma$ is $\frac{c_F}{\tau_l}$.

Proof: Due to the special structure of $\Omega(\phi)$, the $l$th element of $\Omega(\phi) q_\phi$ equals $\frac{c_F}{\tau_l} + \sum_{j=1}^{l} q_{\phi,j} I^\phi_j$, such that $\Omega(\phi) q_\phi = \gamma + \Upsilon(\phi) q_\phi$. Inserting this into the objective function and the constraints of (14) leads to the optimization problem (15), where we further used that the priors sum to one, $\sum_{\phi=1}^{n_\phi} \pi_\phi = 1$. The optimization problem (15) is a linear program and thus convex, so we are guaranteed to find the global optimum. This is an advantage over solving (14) directly, where we may get stuck in a local optimum.
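Since (14) and (15) are not displayed in this section, the following Python sketch assumes that (14) is the standard bilinear program of Mangasarian-Stone type for the induced game. Under that assumption, (15) reads: minimize $p^\top\gamma - c + \sum_\phi \pi_\phi \bar{p}(\phi)$ subject to $\gamma + \sum_\phi \pi_\phi \Upsilon(\phi) q_\phi \ge c\mathbf{1}$, $\Upsilon(\phi)^\top p \le \bar{p}(\phi)\mathbf{1}$, and $p$, $q_\phi$ in the probability simplex. The optimal value is then zero and every optimizer is a Bayesian Nash equilibrium. The sketch uses SciPy's `linprog` with the HiGHS solver; `solve_mtd_lp` and all numbers are hypothetical and are not the paper's implementation.

```python
import numpy as np
from scipy.optimize import linprog


def upsilon(I_phi):
    """Attacker payoff matrix: entry (l, j) is I_j if j <= l, else 0."""
    m = len(I_phi)
    return np.array([[I_phi[j] if j <= l else 0.0 for j in range(m)]
                     for l in range(m)], dtype=float)


def solve_mtd_lp(tau, c_F, I, prior):
    """Solve the assumed LP form of (15); returns (p, [q_phi], optimal value)."""
    m, n = len(tau), len(I)
    gamma = c_F / np.asarray(tau, dtype=float)
    ups = [upsilon(I_phi) for I_phi in I]
    ic = m + n * m                  # index of c; pbar(1..n) follow it
    nv = ic + 1 + n

    def q_slice(phi):
        return slice(m + phi * m, m + (phi + 1) * m)

    obj = np.zeros(nv)
    obj[:m] = gamma                 # p^T gamma
    obj[ic] = -1.0                  # - c
    obj[ic + 1:] = prior            # + sum_phi pi_phi pbar(phi)

    A_ub, b_ub = [], []
    for l in range(m):              # c - sum_phi pi_phi (Upsilon(phi) q_phi)_l <= gamma_l
        row = np.zeros(nv)
        row[ic] = 1.0
        for phi in range(n):
            row[q_slice(phi)] = -prior[phi] * ups[phi][l]
        A_ub.append(row)
        b_ub.append(gamma[l])
    for phi in range(n):            # (Upsilon(phi)^T p)_j - pbar(phi) <= 0
        for j in range(m):
            row = np.zeros(nv)
            row[:m] = ups[phi][:, j]
            row[ic + 1 + phi] = -1.0
            A_ub.append(row)
            b_ub.append(0.0)

    A_eq, b_eq = [], []             # p and every q_phi sum to one
    for sl in [slice(0, m)] + [q_slice(phi) for phi in range(n)]:
        row = np.zeros(nv)
        row[sl] = 1.0
        A_eq.append(row)
        b_eq.append(1.0)

    bounds = [(0, None)] * ic + [(None, None)] * (1 + n)
    res = linprog(obj, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:m], [res.x[q_slice(phi)] for phi in range(n)], res.fun


tau = [10, 100, 1000, 10000]
I = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]  # hypothetical payoffs
prior = [0.5, 0.5]
p, q, value = solve_mtd_lp(tau, 100.0, I, prior)
print(np.round(p, 4), value)
```

A zero optimal value certifies the equilibrium: the objective splits into the defender's gap $p^\top(\gamma + \sum_\phi \pi_\phi \Upsilon(\phi) q_\phi) - c$ and the attackers' gaps $\bar{p}(\phi) - p^\top \Upsilon(\phi) q_\phi$, both of which are nonnegative.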

B. Special case: $n_\phi = 1$
Next, we provide a closed-form solution to Problem 2 when the defender faces only one attacker type, i.e., $n_\phi = 1$, which is the case mentioned in Remark 4. For ease of notation, we omit the superscript for the attacker type.
For $n_\phi = 1$, the matrix representation of the Bayesian Nash equilibrium definition in (6) simplifies to the definition of the Nash equilibrium
$$p^{*\top} \Omega q^* \le p^\top \Omega q^* \;\; \text{for all } p, \qquad p^{*\top} \Upsilon q^* \ge p^{*\top} \Upsilon q \;\; \text{for all } q. \quad (16)$$
Let $\mathcal{Q}$ denote the support of the attacker's mixed strategy, i.e., if $i \in \mathcal{Q}$, then the attacker chooses $I_i$ with a nonzero probability $q_i > 0$, and if $i \notin \mathcal{Q}$, then $q_i = 0$. The support of the defender's mixed strategy is defined in a similar way and is denoted by $\mathcal{P}$, with probabilities $p_i$. In the following, we fix a mixed strategy for one player, determine what the support of the other player's best response must look like, and then use this to construct a mixed strategy Nash equilibrium that represents an MTD.
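Lemma 4: If the attacker fixes $i$, $1 < i < m$, and uses the mixed strategy
$$q_j = \begin{cases} 0, & j < i,\\ 1 - \sum_{l=i+1}^{m} \frac{c_F}{I_l}\left(\frac{1}{\tau_{l-1}} - \frac{1}{\tau_l}\right), & j = i,\\ \frac{c_F}{I_j}\left(\frac{1}{\tau_{j-1}} - \frac{1}{\tau_j}\right), & j > i, \end{cases} \quad (17)$$
where $q_i \in [0, 1)$ satisfies
$$q_i < \rho_i = \frac{c_F}{I_i}\left(\frac{1}{\tau_{i-1}} - \frac{1}{\tau_i}\right), \quad (18)$$
then $\mathcal{P} \subseteq \{i, \dots, m\}$ needs to hold for the support of the defender's best response.

Proof: First, note that $q_j > 0$ for $j \in \{i+1, \dots, m\}$, since $\tau_j > \tau_{j-1}$. Further, if $q_i \in [0, 1)$, the mixed strategy $q$ given by (17) is a proper probability distribution.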
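Next, we look at the possible best responses of the defender to this strategy. The average cost of the defender is given by $p^\top \Omega q = p^\top \gamma + p^\top \Upsilon q$. Since $q_j = 0$ for $j < i$, the $l$th element of $\Upsilon q$ is zero for $l < i$, and for $l \ge i$ it equals
$$\sum_{j=i}^{l} q_j I_j = q_i I_i + \sum_{j=i+1}^{l} c_F\left(\frac{1}{\tau_{j-1}} - \frac{1}{\tau_j}\right) = q_i I_i + \frac{c_F}{\tau_i} - \frac{c_F}{\tau_l},$$
where we used (17) for the values of $q_j$ for $j > i$. This leads to the following average cost of the defender
$$p^\top \Omega q = \sum_{l=1}^{i-1} p_l \frac{c_F}{\tau_l} + \sum_{l=i}^{m} p_l \left(q_i I_i + \frac{c_F}{\tau_i}\right).$$
Now assume $\mathcal{P} \subseteq \{i, \dots, m\}$; then $p_{1:i-1} = 0$ and $\sum_{j=i}^{m} p_j = 1$, such that the average cost becomes $p^\top \Omega q = q_i I_i + \frac{c_F}{\tau_i}$. This shows that the defender is indifferent among its actions $\tau_i, \dots, \tau_m$, as it obtains the same average cost no matter how the distribution $p_{i:m}$ is chosen. Now let $\mathcal{P} \not\subseteq \{i, \dots, m\}$; then
$$p^\top \Omega q = p_{1:i-1}^\top \gamma_{1:i-1} + \left(1 - \mathbf{1}^\top p_{1:i-1}\right)\left(q_i I_i + \frac{c_F}{\tau_i}\right).$$
The defender chooses $\mathcal{P} \subseteq \{i, \dots, m\}$ if, and only if,
$$\sum_{l=1}^{i-1} p_l \left(\frac{c_F}{\tau_l} - q_i I_i - \frac{c_F}{\tau_i}\right) > 0$$
for every nonzero $p_{1:i-1}$. Since $p_{1:i-1}$ is nonnegative, the elements of $\gamma$ are positive, and $\frac{c_F}{\tau_1} > \frac{c_F}{\tau_2} > \dots > \frac{c_F}{\tau_{i-1}}$, we obtain the following necessary condition for choosing $\mathcal{P} \subseteq \{i, \dots, m\}$:
$$\frac{c_F}{\tau_{i-1}} - \frac{c_F}{\tau_i} - q_i I_i > 0.$$
If it holds, every term in the sum above is positive, and the defender chooses $\mathcal{P} \subseteq \{i, \dots, m\}$ to minimize its cost. Reformulating this inequality gives the upper bound $q_i < \rho_i$ in (18). Note that if $\tau_{i-1}$ is strictly dominated by $\tau_i$, i.e., $\frac{c_F}{\tau_i} + I_i < \frac{c_F}{\tau_{i-1}}$, this upper bound is larger than 1 and is therefore automatically fulfilled if $q_i \in [0, 1)$. However, if $\tau_{i-1}$ is not strictly dominated by $\tau_i$, then $\frac{c_F}{\tau_{i-1}} \le \frac{c_F}{\tau_i} + I_i$ holds, so $\rho_i \le 1$ restricts the choice of $q_i$. Hence, if $q$ is chosen according to (17) such that $q_i$ fulfills (18), the defender chooses $\mathcal{P} \subseteq \{i, \dots, m\}$.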
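The indifference property in the proof of Lemma 4 can be verified with exact arithmetic. The Python sketch below implements (17) as written above, with 0-based indices, for hypothetical thresholds and payoffs of a single attacker type. It checks that every threshold $\tau_i, \dots, \tau_m$ yields the same cost $q_i I_i + c_F/\tau_i$ and that the excluded threshold is strictly more expensive when (18) holds.

```python
from fractions import Fraction


def rho(j, tau, c_F, I):
    """(c_F / I_j) (1/tau_{j-1} - 1/tau_j) for a 0-based index j >= 1."""
    return c_F / I[j] * (Fraction(1, tau[j - 1]) - Fraction(1, tau[j]))


def attacker_mix(i, tau, c_F, I):
    """Mixed strategy (17) for a fixed 0-based index i (zero below i)."""
    m = len(tau)
    q = [Fraction(0)] * m
    for j in range(i + 1, m):
        q[j] = rho(j, tau, c_F, I)
    q[i] = 1 - sum(q[i + 1:])
    return q


def defender_costs(q, tau, c_F, I):
    """Expected cost of each threshold: c_F/tau_l + sum_{j <= l} q_j I_j."""
    return [c_F / tau[l] + sum(q[j] * I[j] for j in range(l + 1))
            for l in range(len(tau))]


tau = [10, 100, 1000, 10000]
c_F = Fraction(100)
I = [1, 2, 3, 4]  # hypothetical payoffs of the single attacker type
i = 1             # 0-based, i.e., tau_2 in the paper's numbering
q = attacker_mix(i, tau, c_F, I)
print(q, 0 <= q[i] < rho(i, tau, c_F, I))  # (18) check
print(defender_costs(q, tau, c_F, I))
```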
Lemma 4 shows the support of the defender's best responses to the attack strategy (17); however, it leaves open how to choose $i$ such that (18) is fulfilled.
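Lemma 5: There exists a unique index $i = i^* \in (1, m-1)$ such that (17) is a proper probability distribution and (18) holds, if
$$1 - \sum_{j=2}^{m} \frac{c_F}{I_j}\left(\frac{1}{\tau_{j-1}} - \frac{1}{\tau_j}\right) < 0 \quad \text{and} \quad \frac{c_F}{I_m}\left(\frac{1}{\tau_{m-1}} - \frac{1}{\tau_m}\right) < 1.$$

Proof: Denote by $q_i$ the value of the $i$th element in (17) as a function of the chosen index $i$. From (17), we obtain that
$$q_i = q_{i-1} + \frac{c_F}{I_i}\left(\frac{1}{\tau_{i-1}} - \frac{1}{\tau_i}\right) = q_{i-1} + \rho_i,$$
so $q_i$ is strictly increasing in $i$, with $q_1 < 0$ and $q_{m-1} = 1 - \rho_m > 0$ by assumption. Hence, there exists an $i = i^* > 1$ such that $q_i \ge 0$, while for $i < i^*$ we have $q_i < 0$, such that the mixed strategy in (17) is not a proper probability distribution and therefore not a valid strategy. For $i = i^*$, we can therefore show that
$$q_{i^*} = q_{i^*-1} + \rho_{i^*} < \rho_{i^*}.$$
Hence, (18) holds for $i = i^*$. Now assume we choose $i = j > i^*$, such that $q_i \in [0, 1)$ for $i = j$ and also $q_i \ge 0$ for $i = j-1$. Then we obtain that
$$q_j = q_{j-1} + \rho_j \ge \rho_j.$$
Hence, (18) does not hold for any $i \ne i^*$. Therefore, $i = i^*$ is the smallest index for which $q^*_i \in [0, 1)$ and the unique index for which (18) holds.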
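Similarly to Lemma 4, we can find a mixed strategy $p$ with support $\mathcal{P} = \{i, i+1, \dots, m\}$ such that the best response of the attacker has support $\mathcal{Q} \subseteq \{i, i+1, \dots, m\}$.

Lemma 6: If the defender uses the mixed strategy
$$p_j = \begin{cases} 0, & j < i,\\ \frac{I_i}{I_j} - \frac{I_i}{I_{j+1}}, & i \le j < m,\\ \frac{I_i}{I_m}, & j = m, \end{cases} \quad (19)$$
then the support of the attacker's best response needs to satisfy $\mathcal{Q} \subseteq \{i, i+1, \dots, m\}$.

Proof: First, note that since $I_j > I_i$ if $j > i$, each $p_j$ with $j \in \{i, \dots, m\}$ lies in $(0, 1)$. Furthermore, we can verify that $\sum_{j=1}^{m} p_j = 1$. Hence, the mixed strategy described by (19) is a proper probability distribution.

Next, we look at the possible best responses of the attacker to this strategy. Partitioning $\Upsilon$ into blocks according to the index sets $\{1, \dots, i-1\}$ and $\{i, \dots, m\}$, the average payoff of the attacker is
$$p^\top \Upsilon q = p_{i:m}^\top \left(\Upsilon_{21} q_{1:i-1} + \Upsilon_{22} q_{i:m}\right),$$
where we used that $p_{1:i-1} = 0$, $\Upsilon_{12} = 0$, and $\Upsilon_{21} = \mathbf{1}_{m-i+1} [I_1, I_2, \dots, I_{i-1}]$. Due to the chosen $p$, we obtain that $p_{i:m}^\top \Upsilon_{22} = I_i \mathbf{1}_{m-i+1}^\top$, which results in the following average payoff
$$p^\top \Upsilon q = \sum_{j=1}^{i-1} q_j I_j + I_i \sum_{j=i}^{m} q_j \le I_i.$$
The upper bound follows from the fact that $I_j < I_i$ if $j \in \{1, \dots, i-1\}$ (see Assumption 8), and it is attained if, and only if, $q_{1:i-1} = 0$. Hence, the best response of the attacker to the defender's strategy $p$ is any mixed strategy $q$ with support $\mathcal{Q} \subseteq \{i, i+1, \dots, m\}$.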
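The attacker's indifference under (19) can be checked the same way. In the Python sketch below, attack $j$ pays $I_j$ times the probability that the chosen threshold index is at least $j$, so under (19) every attack in $\{i, \dots, m\}$ pays exactly $I_i$. The payoffs are hypothetical and indices are 0-based.

```python
from fractions import Fraction


def defender_mix(i, I):
    """Mixed strategy (19) for a 0-based index i: support {i, ..., m-1}."""
    m = len(I)
    p = [Fraction(0)] * m
    for j in range(i, m - 1):
        p[j] = Fraction(I[i]) / I[j] - Fraction(I[i]) / I[j + 1]
    p[m - 1] = Fraction(I[i]) / I[m - 1]
    return p


def attacker_payoffs(p, I):
    """Payoff of attack j: I_j times the probability that it stays stealthy (l >= j)."""
    return [I[j] * sum(p[j:]) for j in range(len(I))]


I = [1, 2, 3, 4]  # hypothetical, strictly increasing payoffs
p = defender_mix(1, I)
print(p, attacker_payoffs(p, I))
```

Only payoff ratios enter (19), which is why scaling all $I_j$ by a common factor (as in Remark 7) leaves the formula unchanged.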
One notable difference between the results given in Lemma 4 and Lemma 6 is that the defender's mixed strategy $p$ given by (19) is valid for all $i$, while $q$ given by (17) has the additional constraint (18). However, Lemma 5 shows that, under a certain condition, there exists a unique index $i$ for which (18) holds. Next, we show that for a specific choice of $i$ the strategies (17) and (19) form a mixed strategy Nash equilibrium.
Theorem 2: Let $i = i^* \in [1, m-1]$ be the smallest index for which $q^*_i \in [0, 1)$ in (18) holds, such that $q^*$ is a proper probability distribution. The mixed strategies $p^*$ and $q^*$ given by (19) and (17), respectively, form a mixed strategy Nash equilibrium such that $p^*$ is an MTD if, and only if,
$$\frac{c_F}{I_m}\left(\frac{1}{\tau_{m-1}} - \frac{1}{\tau_m}\right) \le 1.$$

Proof: First, we show that this condition is necessary and sufficient for the existence of a smallest index $i^*$ such that $q_{i^*} \in [0, 1)$. For this, we need to consider three different cases. In the first case, we assume that $1 - \sum_{j=2}^{m} \frac{c_F}{I_j}\left(\frac{1}{\tau_{j-1}} - \frac{1}{\tau_j}\right) < 0$ and $\frac{c_F}{I_m}\left(\frac{1}{\tau_{m-1}} - \frac{1}{\tau_m}\right) < 1$. Then Lemma 5 shows that a unique $i^* \in (1, m-1)$ exists for which (18) holds, and $i^*$ is also the smallest index for which $q_i \in [0, 1)$. In the second case, we assume that $1 - \sum_{j=2}^{m} \frac{c_F}{I_j}\left(\frac{1}{\tau_{j-1}} - \frac{1}{\tau_j}\right) \ge 0$. Then $i^* = 1$ guarantees that $q_i \in [0, 1)$, and $i^* = 1$ is trivially the smallest such index. In the third case, we assume that $\frac{c_F}{I_m}\left(\frac{1}{\tau_{m-1}} - \frac{1}{\tau_m}\right) = 1$, which shows that $i^* = m-1$ is the smallest index for which $q_i \in [0, 1)$ in this case. Hence, there exists a unique index $i^* \in [1, m-1]$, which is the smallest index such that $q_i \in [0, 1)$, if, and only if, $\frac{c_F}{I_m}\left(\frac{1}{\tau_{m-1}} - \frac{1}{\tau_m}\right) \le 1$. Next, if we use $q^*$ with $i = i^*$, the support of the defender's best response needs to fulfill $\mathcal{P} \subseteq \{i^*, i^*+1, \dots, m\}$ (Lemma 4), which is fulfilled when $p^*$ is used. Therefore, $p^*$ is a best response to $q^*$. Finally, if we use $p^*$ with $i = i^*$, the support of the attacker's best response needs to fulfill $\mathcal{Q} \subseteq \{i^*, i^*+1, \dots, m\}$ (Lemma 6), which is fulfilled when $q^*$ is used. Therefore, $q^*$ is a best response to $p^*$. Hence, $p^*$ and $q^*$ form a mixed strategy Nash equilibrium, and $p^*$ is an MTD according to Definition 1.
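Theorem 2 translates into a short procedure: test the existence condition, scan for the smallest index with $q_i \in [0, 1)$, and assemble (19) and (17). The Python sketch below does this with exact fractions and 0-based indices, then verifies (16) through the support conditions of Lemmas 4 and 6. The function names and the example numbers are hypothetical.

```python
from fractions import Fraction


def rho(j, tau, c_F, I):
    """(c_F / I_j) (1/tau_{j-1} - 1/tau_j) for a 0-based index j >= 1."""
    return c_F / I[j] * (Fraction(1, tau[j - 1]) - Fraction(1, tau[j]))


def closed_form_mtd(tau, c_F, I):
    """Return (i*, p*, q*) from (19) and (17), or None if no MTD exists."""
    m = len(tau)
    if rho(m - 1, tau, c_F, I) > 1:   # condition of Theorem 2 violated
        return None
    for i in range(m - 1):            # i = m - 2 always qualifies here
        q_i = 1 - sum(rho(j, tau, c_F, I) for j in range(i + 1, m))
        if 0 <= q_i < 1:
            break
    p = [Fraction(0)] * m
    for j in range(i, m - 1):
        p[j] = Fraction(I[i]) / I[j] - Fraction(I[i]) / I[j + 1]
    p[m - 1] = Fraction(I[i]) / I[m - 1]
    q = [Fraction(0)] * i + [q_i] + [rho(j, tau, c_F, I) for j in range(i + 1, m)]
    return i, p, q


def is_equilibrium(p, q, tau, c_F, I):
    """Check (16): p only on cost-minimizing rows, q only on payoff-maximizing columns."""
    m = len(tau)
    costs = [c_F / tau[l] + sum(q[j] * I[j] for j in range(l + 1)) for l in range(m)]
    payoffs = [I[j] * sum(p[j:]) for j in range(m)]
    return (all(costs[l] == min(costs) for l in range(m) if p[l] > 0)
            and all(payoffs[j] == max(payoffs) for j in range(m) if q[j] > 0))


tau = [10, 100, 1000, 10000]
I = [1, 2, 3, 4]  # hypothetical payoffs of the single attacker type
print(closed_form_mtd(tau, Fraction(100), I))
print(closed_form_mtd(tau, Fraction(10**6), I))  # condition violated: None
```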
Theorem 2 presents one optimal solution to the optimization problem (15) when n φ = 1.Our numerical experiments in Section VII-B show that the optimal solution obtained by solving (15) coincides with the equilibrium proposed in Theorem 2.
Remark 6: Note that if $\frac{c_F}{I_m}\left(\frac{1}{\tau_{m-1}} - \frac{1}{\tau_m}\right) = 1$, then we have $i^* = m-1$ and $q_{i^*} = 0$, such that the attacker plays a pure strategy. This means there are at least two Nash equilibria, one given by $p^*$ and $I_m$, and one given by $\tau_m$ and $I_m$. In this case, our matrix game is degenerate, but an MTD according to Definition 1 still exists.
Remark 7: As already mentioned in Remark 3, it is often reasonable to include an attacker type that has zero payoff for all trajectories $a$, such that the defender also considers the case without an attacker when choosing the threshold. For the closed-form solution presented in this section, we can replace $I_j$ by $\pi_1 I_j$ to take the attacker type with zero payoff into account, where $\pi_1 \in (0, 1]$ is the probability that the attack is happening. Interestingly, this modification does not change the defender's equilibrium strategy (19), since (19) only depends on ratios of the payoffs. However, it does lead to a larger $i^*$ in Lemma 5 (see Section VII-B). This is consistent with our intuition: since the attack is less likely to happen, the defender focuses on increasing the mean time between false alarms by choosing larger thresholds.

VII. NUMERICAL EVALUATION
For the numerical evaluation, we consider a four-tank system [31], which we linearize around the input voltage of 6 V and discretize with a sampling time of 0.5 s. We further assume that $w(k) \sim \mathcal{N}(0, 0.1 I_4)$ and $v(k) \sim \mathcal{N}(0, 0.01 I_2)$ and that an LQG controller is used, where the LQR cost matrix for the states is $I_4$ and the cost matrix for the controller input is $I_2$, i.e., the controller input u The attack length is chosen to be $N = 1000$ time steps. For anomaly detection, a $\chi^2$ detector is used such that $y_D(k+1) = r(k)^\top r(k)$.
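The mapping from a mean time between false alarms $\tau$ to a $\chi^2$ detector threshold is not spelled out in this section. As a hypothetical illustration only, assume that in the attack-free case the residual is normalized and white, $r(k) \sim \mathcal{N}(0, I_2)$, so that $r(k)^\top r(k)$ is $\chi^2$ distributed with two degrees of freedom and survival function $e^{-J/2}$. Alarm times are then geometric, and a threshold $J_D(\tau) = 2\ln\tau$ gives a mean of $\tau$ samples between false alarms. The Python sketch below (hypothetical helper names) checks this analytically and with a seeded Monte Carlo run.

```python
import math
import random


def false_alarm_probability(J):
    """P(r^T r > J) for r ~ N(0, I_2): the chi-square(2) survival function exp(-J/2)."""
    return math.exp(-J / 2)


def threshold_for_mean_time(tau):
    """Threshold J_D giving a mean time of tau samples between false alarms.

    Assumes i.i.d. residuals, so alarm times are geometric with mean 1/P(alarm).
    """
    return 2 * math.log(tau)


def simulated_alarm_rate(J, steps, seed=0):
    """Monte Carlo estimate of the per-step false alarm rate of y_D = r^T r > J."""
    rng = random.Random(seed)
    alarms = 0
    for _ in range(steps):
        r1, r2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if r1 * r1 + r2 * r2 > J:
            alarms += 1
    return alarms / steps


for tau in (10, 10**2, 10**3, 10**4, 10**5, 10**6):
    print(tau, round(threshold_for_mean_time(tau), 3))
print(simulated_alarm_rate(threshold_for_mean_time(10), 200_000))
```

Under this assumption, the threshold grows only logarithmically in $\tau$, so covering six orders of magnitude of false alarm rates changes the threshold by a factor of six.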

A. Bayesian Nash equilibrium
In this part, we solve the optimization problem (15) to find the equilibrium moving target defense for the defender. Note that we choose the defender's set of actions, the attacker type payoff functions, and the factor $c_F$ in the defender's objective function to illustrate the presented MTD framework. In practice, the defender needs to choose $c_F$ according to its cost for false alarms, and the payoff functions of the attacker types could be the result of a risk assessment, as discussed in Section II-E.
The defender considers six thresholds, which correspond to the following average times between false alarms:
$$\tau \in \{10, 10^2, 10^3, 10^4, 10^5, 10^6\}. \quad (20)$$
These values are chosen to cover a wide range of average times between false alarms. Further, we use $c_F = 43200$ as the cost factor for false alarms. The defender faces $n_\phi = 5$ attacker types. We further assume that the attack starts at $k = 0$. The first attacker type is an attacker with zero payoff, i.e., $f_1(a) = 0$ for all $a$. This type represents the case where no attacker is actually present in the system and the defender only has to consider the cost induced by false alarms. For the other attacker types, we use the average value of the plant's state at the end of the attack, i.e., $\bar{x} = \mathbb{E}\{x(N)\}$, to define the payoff. For attacker type $\phi \in \{2, \dots, 5\}$, we use $f_\phi(a) = |\bar{x}_{\phi-1}|^2$ as the payoff function. Thus, attacker type 2 attacks the water level in tank 1, attacker type 3 attacks the water level in tank 2, and so on. Since a $\chi^2$ detector is used, we can use the results of Proposition 3 in [15] to determine the attack impact of each attacker type for a given $\tau$.
We consider three different scenarios that differ in terms of their priors $\pi_\phi$. In the first scenario, the operator assumes that no attack is more likely than an attack, and thus $\pi_1 = 0.6$ and $\pi_\phi = 0.1$ for $\phi \in \{2, \dots, 5\}$. In the second scenario, the operator assumes that there is always an attack, and we investigate how the defender's strategy changes when the four attack targets are equally likely, i.e., $\pi_1 = 0$ and $\pi_\phi = 0.25$ for $\phi \in \{2, \dots, 5\}$. In the third scenario, there is also always an attack, but this time the attacker is assumed to most likely attack the first and second states of the plant, i.e., $\pi_1 = 0$, $\pi_2 = \pi_3 = 0.49$, and $\pi_4 = \pi_5 = 0.01$.
Figure 2 shows the equilibrium MTD of the defender for the three different scenarios. Figure 3 shows the equilibrium mixed strategies of attacker types 2 to 5. Using Lemma 3, we can determine that in the first two scenarios the defender's actions $\tau_1$ and $\tau_2$ are strictly dominated and, therefore, are not used in the Bayesian Nash equilibrium. In the third scenario, only $\tau_1$ is strictly dominated, and $\tau_2$ is used in the Bayesian Nash equilibrium. This means that the cost of an attack that is stealthy for $\tau_1$ is negligible compared to the cost of false alarms. Further, none of the attacker types will use the attack actions corresponding to the dominated thresholds in the respective scenarios, because the attacker wants to maximize its payoff. Hence, $p^*_1 = q^*_{\phi,1} = 0$ for all $\phi$, and these values are not depicted in Figure 2 and Figure 3 for the sake of simplicity. Furthermore, since attacker type 1 always obtains zero payoff, it does not influence the objective value of the optimization problem (15). Therefore, we can arbitrarily choose $q^*_1 = \frac{1}{6}\mathbf{1}_6$ as the equilibrium mixed strategy of attacker type 1 for all three scenarios. This is not shown in the figures for the sake of simplicity.
From Figure 2 and Figure 3, we observe that the defender chooses $\tau_6$ with the highest probability in the MTD, while the attacker puts most of the probability weight on attacks with a lower payoff than the payoff corresponding to $\tau_6$. Since the attacker does not receive any payoff if it is detected, this observation is reasonable, and it also shows that the proposed moving target defense effectively limits the attacker's payoff. This is especially visible in the third scenario, where attacker types 2 and 3 are more likely than attacker types 4 and 5. In addition, the payoffs of attacker types 2 and 3 are larger than those of types 4 and 5. Hence, a large attacker payoff is more likely than in the second scenario. Therefore, in the third scenario, the defender chooses $\tau_2$ with a nonzero probability to force the attacker to use attacks with a lower payoff in order to remain undetected. So, in this case, the cost of additional false alarms is outweighed by the reduced payoff of undetected attacks.
We can also observe from Figure 2 that the defender has the same MTD in the first two scenarios. One reason for this is that in (15) the defender's constraints are not affected by the attacker type with zero payoff. Hence, the constraint set for choosing $p$ is the same in the first and second scenarios due to the uniform prior across attacker types 2 to 5. Interestingly, although the defender has the same MTD in the first and second scenarios, the attacker types' mixed strategies do change. We further observe that the mixed strategies $q^*_2$ and $q^*_3$ of attacker types 2 and 3, respectively, are very close, and that $q^*_4$ and $q^*_5$ are close as well, in all three investigated scenarios. For example, $\|q^*_2 - q^*_3\|_\infty$ is 0.0672, 0.0199, and 0.0131 for the first, second, and third scenario, respectively, while $\|q^*_4 - q^*_5\|_\infty$ is 0.0039, 0.0024, and 0.0053, respectively. Finally, we look at an interesting property of the MTD obtained by solving (15). The MTD strategy in the first and second scenarios, where all attacker types with a nonzero impact have the same prior, is
$$p^* = \begin{bmatrix} 0 & 0 & 0.25 & 0.15 & 0.1 & 0.5 \end{bmatrix}^\top, \quad (21)$$
while in the third scenario the MTD is
$$p^* = \begin{bmatrix} 0 & 0.3333 & 0.1667 & 0.1 & 0.0667 & 0.3333 \end{bmatrix}^\top. \quad (22)$$
Interestingly, both (21) and (22) have the structure of the MTD proposed in Theorem 2, although Theorem 2 only covers the case where the defender faces one specific attacker type. This can be explained by the structure of the payoff of each attacker type. We determine that $I^\phi_j = \alpha_\phi J_D(\tau_j)$ for $\phi \in \{2, \dots, 5\}$ (Proposition 3 of [15]). Hence, the payoff is the detector threshold times an attacker-type-specific constant $\alpha_\phi$. Therefore, we have that
$$\frac{I^\phi_i}{I^\phi_j} = \frac{J_D(\tau_i)}{J_D(\tau_j)}.$$
Since this ratio is independent of the attacker type $\phi$, the closed-form solution of the defender's equilibrium MTD is the same for each of the attacker types.
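The proportional payoff structure can be checked against the reported numbers. As a hypothetical assumption not stated in this section, suppose $J_D(\tau) \propto \log \tau$ (as for a $\chi^2$ detector with two degrees of freedom and white, normalized residuals). Then $I_j \propto \log_{10}\tau_j = j$ for the thresholds in (20), and the defender strategy (19) with the third and second thresholds as smallest support index reproduces (21) exactly and (22) up to rounding, independently of the constant $\alpha$. The Python sketch below performs this check with exact fractions; `alpha` is an arbitrary illustrative value.

```python
from fractions import Fraction


def defender_mix(i, I):
    """Mixed strategy (19) for a 0-based index i."""
    m = len(I)
    p = [Fraction(0)] * m
    for j in range(i, m - 1):
        p[j] = I[i] / I[j] - I[i] / I[j + 1]
    p[m - 1] = I[i] / I[m - 1]
    return p


alpha = Fraction(7, 3)                  # hypothetical type constant; it cancels
log10_tau = [1, 2, 3, 4, 5, 6]          # log10 of the thresholds in (20)
I = [alpha * x for x in log10_tau]      # assumed I_j proportional to log(tau_j)
p21 = defender_mix(2, I)                # support {tau_3, ..., tau_6}
p22 = defender_mix(1, I)                # support {tau_2, ..., tau_6}
print([float(x) for x in p21])          # [0.0, 0.0, 0.25, 0.15, 0.1, 0.5]
print([round(float(x), 4) for x in p22])  # [0.0, 0.3333, 0.1667, 0.1, 0.0667, 0.3333]
```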

B. Closed-form solution
We finally evaluate the closed-form solution, where we only consider one attacker type. The attacker that we consider uses $f(a) = \|\bar{x}\|^2_\infty$ as its attack objective, which can be seen as the attacker type with the largest payoff among the four attacker types with a nonzero payoff in the previous section.
We consider a defender that can choose from the six different mean times between false alarms in (20). Lemma 3 with $n_\phi = 1$ shows that $\tau_1 = 10$ is strictly dominated by the other strategies and is therefore not used in the Nash equilibrium. Setting $i = 2$, i.e., using all actions that are not strictly dominated, we determine that $q_2 = 0.4748 < 1$. These strategies are also obtained with the optimization problem (15), so our closed-form solution coincides with the solution of the linear program when $n_\phi = 1$. Furthermore, the MTD (23) coincides with (22), while (24) does not coincide with any of the attacker type distributions obtained in the previous section. Next, we evaluate how the size $m$ of the set of average times between false alarms available to the defender affects the smallest index $i$ in the support of the defender's strategy and the corresponding probability $q^*_i$. We do so by considering values of $m$ from 1 to 100 and setting $\tau_j = 10 + 50(j-1)$, where $j \in \{1, \dots, m\}$. First, for every investigated $m$, the Nash equilibrium obtained from (15) coincides with the closed-form solution in Theorem 2. In Figure 4, the upper plot shows the index $i$ of the mixed Nash equilibrium and the lower plot shows $q^*_i$ over $m$. We analyse two different cases: one where the attacker is always present, and one where the prior of the attacker being present is 0.2 and the prior of an attacker with zero payoff is 0.8 (see Remark 7). Every time $i$ changes, i.e., the smallest $\tau$ used in the Nash equilibrium changes, $q^*_i$ jumps from a value close to zero back up to a larger value and then decreases towards zero again as $m$ increases, until the next jump. This is consistent with
$$q^*_i = 1 - \sum_{l=i+1}^{m} \frac{c_F}{I_l}\left(\frac{1}{\tau_{l-1}} - \frac{1}{\tau_l}\right),$$
which decreases as $m$ increases. Take, for example, the interval $m \in [2, 6]$ for the case where there is always an attacker (solid line in Figure 4). In this interval, $i = 2$, and $q^*_i$ has its maximum 1 at $m = 2$ and decays to $q^*_i = 0.02456$ at $m = 6$. However, if we choose $i = 2$ for $m = 7$, then $q^*_i = -0.0148$ is negative and, therefore, using $\tau_2$ in the Nash equilibrium does not lead to a proper probability distribution when $m = 7$. Hence, $i$ needs to jump from 2 to 3 to guarantee that $q^*$ is a probability distribution with $q^*_i \in [0, 1)$. Furthermore, if $m = 100$, then only the action $\tau_1$ is strictly dominated. However, the smallest $\tau$ used in the Nash equilibrium has index $i = 3$. Therefore, even if an action is not strictly dominated, it is not necessarily used in the Nash equilibrium. Theorem 2 explains why this happens in the game presented here. Similar observations are made for the case when the prior of the attacker with a nonzero payoff is 0.2 (dashed lines in Figure 4). Furthermore, since the attacker is only present with a chance of 20 %, more strategies of the defender are strictly dominated. This observation is in line with our intuition, since the attack is less likely to happen and the defender can choose larger thresholds to avoid false alarms without fearing a larger impact.
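The jump pattern of Figure 4 follows from the fact that every $q_i$ shrinks as $m$ grows, so the smallest admissible index can only move up. The Python sketch below sweeps $m$ with $\tau_j = 10 + 50(j-1)$ as in the text, but with a hypothetical cost factor $c_F = 90$ and a hypothetical payoff model $I_j = 2\ln\tau_j$ (the paper's payoffs are not given here), so it reproduces the qualitative behavior only, not the plotted values.

```python
import math


def smallest_index(tau, c_F, I):
    """Return (i*, q*_i) with 0-based i*, or None if no MTD exists (Theorem 2)."""
    m = len(tau)
    rho = [None] + [c_F / I[j] * (1 / tau[j - 1] - 1 / tau[j]) for j in range(1, m)]
    if rho[-1] > 1:
        return None
    for i in range(m - 1):
        q_i = 1 - sum(rho[i + 1:])
        if 0 <= q_i < 1:
            return i, q_i
    return None


def sweep(max_m=100, c_F=90.0, attack_prior=1.0):
    """tau_j = 10 + 50 (j - 1); payoffs use a hypothetical model I_j = 2 ln(tau_j)."""
    rows = []
    for m in range(2, max_m + 1):
        tau = [10 + 50 * j for j in range(m)]
        I = [attack_prior * 2 * math.log(t) for t in tau]
        rows.append((m, smallest_index(tau, c_F, I)))
    return rows


for m, result in sweep()[:8]:
    print(m, result)
```

The `attack_prior` parameter scales all payoffs as in Remark 7; smaller values increase every $\rho_j$ and can push the smallest index upward or violate the existence condition altogether.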

VIII. CONCLUSIONS
In this paper, we presented a moving target defense strategy against stealthy sensor attacks. To find the moving target defense, we formulated a game where the defender periodically switches the detector threshold at random and the attacker has access to all sensor measurements. While the attacker wants to maximize its payoff, the defender wants to minimize its cost, which consists of the cost for false alarms and the cost induced by the attacker's payoff. However, the defender is not certain about which attacker it faces and only knows the prior over the attacker's possible types. We analyzed one period of this periodic switching game and showed that for one period we can find a strategically equivalent matrix game. For this matrix game, we used the Bayesian Nash equilibrium to determine the equilibrium MTD strategy and presented a necessary and sufficient condition for the existence of an MTD for the defender. Furthermore, we showed that the MTD can be found by solving a linear program. For the case where the defender faces only one attacker type, i.e., knows the attacker type exactly, we presented a closed-form solution for the moving target defense. In the numerical evaluation, we saw how the thresholds used by the defender depend on the prior of an attack happening. The mere threat that the defender randomly chooses a lower threshold, even with a low probability, forces each attacker type to choose attacks with a lower impact that remain stealthy even for small thresholds. The reason is that the attacker does not obtain any payoff when it is detected.
If the attacker can observe the defender's switching pattern before attacking, it could obtain a larger payoff. Therefore, one direction of future work is to investigate the Bayesian Stackelberg equilibrium, where the attacker is assumed to observe the defender first, instead of the Bayesian Nash equilibrium. Another direction of future work is to investigate the repeated game setting. Furthermore, an in-depth analysis of the optimal choice of the set $\{\tau_1, \dots, \tau_m\}$ is also an avenue of future work.

Fig. 3. Equilibrium mixed strategies of attacker types 2 to 5.
Fig. 2. The plot shows the MTD of the defender for three different priors of an attack happening, where the horizontal axis shows the defender actions and the vertical axis shows the probability of choosing the respective action.

Fig. 4. The upper plot shows how the smallest index $i = i^*$, for which (18) holds, changes when $m$ increases. The lower plot shows the trajectory of $q^*_i$ in (17) over $m$. The solid lines represent the case with an attacker always being present, while the dashed lines represent the case where the chance of the attacker being present is 20 %.

TABLE I. THE GAME $M^\phi$ WITH DISJOINT SETS FOR THE ATTACKER'S ACTION