Game theoretical modelling of network/cybersecurity

Game theory is an established branch of mathematics that offers a rich set of mathematical tools for multi-person strategic decision making that can be used to model the interactions of decision makers in security problems who compete for limited and shared resources. This article presents a review of the literature in the area of game theoretical modelling of network/cybersecurity.


Introduction
For most physical security situations the outcomes depend on the actions of both attackers and defenders. The attackers and defenders act rationally, and their incentives may be diametrically opposed or, under other circumstances, may partly overlap. Physical security thus provides situations in which the tools of game theory can be beneficially applied and can provide insights into making optimal security decisions. For the various decision-making problems arising in physical security, game theory offers a rich set of analytical methods and mathematical tools.
The pervasive use of the Internet opens up numerous network security situations. The attackers and defenders in typical situations are rational agents who have the ability to act strategically. The agents can be assumed to be interested in finding either the most damaging or the most secure use of the available resources. In the domain of network security, game theory has been shown [1,2] to provide useful insights for making decisions, leading to novel analytic, computational, and practical approaches in thought, policy, planning, and strategic action. Game theory provides methodical approaches to explain the interdependencies of network security decisions [3,4], the role of hidden and asymmetric information in networks, the incentives and limitations of the attackers, the perception of risks and costs in human behavior, and much more. The elegant and powerful tools made available by game theory are found to be highly useful for building secure, resilient, and dependable networked systems [5,6].

Static and dynamic games
Games considered in the field of game theory are broadly classified as static or dynamic games [10,11]. In static games the players choose their strategies simultaneously, whereas dynamic games involve a sequence of moves. In dynamic games [39,40], a player chooses before others do, knowing that the others' choices will be influenced by his/her publicly observable choice. The dynamic character of the game results in models that can enhance the learning ability of the players. In turn, this learning can help security practitioners to develop high quality theoretical studies of real-life problems. Depending on the particular cybersecurity situation being analysed, either static or dynamic game theory can be considered appropriate. For instance, for a cybersecurity situation involving a team of attackers and a plan of attack in which the attackers act simultaneously, static game theory is required. Dynamic game theory, in contrast, applies when some of the attackers act first and the reaction of the defenders is observed before the remaining attackers act, equipped with the knowledge of how the defenders have reacted.

Nash equilibrium
The rule for predicting how a game will be played defines the solution concept in terms of which the game is understood by game theorists. The most commonly used solution concept in game theory is that of a Nash equilibrium (NE). Assume there are $N$ players in a game. Let $S_i$ and $U_i$ for $1 \le i \le N$ be the strategy space and the payoff, or utility, of player $i$, respectively. The individual elements of the strategy space $S_i$ of player $i$ are called the pure strategies. The game can then be described [41] by the set $G$:

$$G = \{N, (S_i)_{1 \le i \le N}, (U_i)_{1 \le i \le N}\}.$$

The presumable outcome of the game is determined by analyzing the behaviour of the players and their strategy choices. Let $s = (s_1, s_2, \ldots, s_N)$ be a profile of pure strategies with $s_i \in S_i$. Let $s_{-i}$ be the profile of strategies excluding player $i$, so that $s = (s_i; s_{-i})$ for any $i$. A strategy profile $s^* = (s_i^*; s_{-i}^*)$ is a NE [7,10,13,43,44] if for all $1 \le i \le N$ and all $s_i \in S_i$ we have

$$U_i(s_i^*, s_{-i}^*) \ge U_i(s_i, s_{-i}^*).$$

This is also described by stating that the strategy of each player $i$ is a best reply [7,10,13] to the strategies of the other players. In the cybersecurity context, a defenders' strategy profile that is a NE will consist of a set of defensive strategies, one on behalf of each defender, such that the strategy of each defender is a best reply to the strategies of the attackers.
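As a minimal sketch of the best-reply condition above, the following Python snippet enumerates the pure strategy profiles of a small finite game and keeps those from which no player can gain by deviating unilaterally. The function name `pure_nash_equilibria`, the two-server attacker-defender game, and all payoff numbers are hypothetical illustrations and are not taken from the cited references.

```python
from itertools import product


def pure_nash_equilibria(payoffs, strategy_sets):
    """Return all pure profiles from which no single player gains by deviating.

    payoffs: dict mapping a strategy profile (tuple) to a tuple of utilities,
             one entry per player.
    strategy_sets: list whose i-th element is the pure-strategy set S_i.
    """
    equilibria = []
    for profile in product(*strategy_sets):
        is_equilibrium = True
        for i, own_strategies in enumerate(strategy_sets):
            current = payoffs[profile][i]
            for deviation in own_strategies:
                deviated = profile[:i] + (deviation,) + profile[i + 1:]
                if payoffs[deviated][i] > current:  # profitable unilateral deviation
                    is_equilibrium = False
                    break
            if not is_equilibrium:
                break
        if is_equilibrium:
            equilibria.append(profile)
    return equilibria


# Hypothetical game: player 0 (defender) monitors one server,
# player 1 (attacker) attacks one server. Numbers are illustrative only.
example_sets = [["monitor_A", "monitor_B"], ["attack_A", "attack_B"]]
example_payoffs = {
    ("monitor_A", "attack_A"): (-1, 1),
    ("monitor_A", "attack_B"): (0, 0),
    ("monitor_B", "attack_A"): (-3, 3),
    ("monitor_B", "attack_B"): (1, -1),
}
print(pure_nash_equilibria(example_payoffs, example_sets))
```

In this illustrative game attacking server A is always better for the attacker, and monitoring A is the defender's best reply to that, so the only profile that survives the check is (monitor_A, attack_A).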

Nash equilibrium in mixed strategies
A mixed strategy [13] is a convex combination of two or more pure strategies, i.e. a linear combination whose non-negative real coefficients are probability weights summing to 1. This defines the probability distribution $\pi_i = (\pi_{i,t})_{t \in S_i}$ with which player $i$ chooses randomly among the pure strategies in $S_i$. The expected payoff for player $i$ is then given [41] by

$$U_i(\pi_1, \ldots, \pi_N) = \sum_{s_1 \in S_1} \cdots \sum_{s_N \in S_N} \Big( \prod_{j=1}^{N} \pi_{j,s_j} \Big) U_i(s_1, \ldots, s_N).$$

A set of probability distributions $\pi^* = (\pi_i^*)_{1 \le i \le N}$ defines a mixed NE, or a NE in mixed strategies, if for all $i$ and any other probability distribution $\tilde{\pi}_i = (\tilde{\pi}_{i,t})_{t \in S_i}$ we have

$$U_i(\pi_i^*, \pi_{-i}^*) \ge U_i(\tilde{\pi}_i, \pi_{-i}^*).$$

A key result of Nash's thesis [43,44] states that a NE always exists in mixed (randomized) strategies in games where each player has only a finite number of deterministic strategies. In a NE, no player can improve his/her situation by unilaterally changing his/her strategy. This amounts to stating that each person is doing as well as they possibly can, even if that does not mean that an optimal outcome has been achieved for the collective of all players. In a cyber attack, when a NE is determined for a team of defenders, none of them is left with any motivation to deviate unilaterally from it.
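To illustrate a NE in mixed strategies, the sketch below solves the indifference conditions of a 2x2 bimatrix game: each player randomizes so that the opponent is indifferent between his/her two pure strategies. The helper `interior_mixed_ne_2x2` and the monitoring game with its payoffs are assumptions made for illustration only; the sketch covers only fully mixed equilibria of 2x2 games.

```python
from fractions import Fraction


def interior_mixed_ne_2x2(row_payoffs, col_payoffs):
    """Fully mixed NE of a 2x2 bimatrix game via the indifference conditions.

    Returns (p, q): p is the probability that the row player plays row 0 and
    q the probability that the column player plays column 0. Returns None if
    the indifference conditions have no solution strictly inside (0, 1).
    """
    A, B = row_payoffs, col_payoffs
    # p must make the column player indifferent between the two columns.
    denom_p = B[0][0] - B[1][0] - B[0][1] + B[1][1]
    # q must make the row player indifferent between the two rows.
    denom_q = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    if denom_p == 0 or denom_q == 0:
        return None
    p = Fraction(B[1][1] - B[1][0], denom_p)
    q = Fraction(A[1][1] - A[0][1], denom_q)
    if not (0 < p < 1 and 0 < q < 1):
        return None
    return p, q


# Hypothetical game with no pure NE. Rows: defender monitors A or B;
# columns: attacker attacks A or B. Numbers are illustrative only.
defender = [[2, -3], [-1, 1]]
attacker = [[-1, 3], [1, -1]]
mixed_ne = interior_mixed_ne_2x2(defender, attacker)
print(mixed_ne)
```

With these numbers the defender monitors A with probability 1/3 and the attacker attacks A with probability 4/7; at these probabilities neither player can gain by shifting weight between his/her pure strategies.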

Prisoners' Dilemma
The game of Prisoners' Dilemma (PD) [7,10,13] describes the following situation:
a) Two criminals, referred to below as Alice and Bob, commit a crime together and are arrested. They wait for their trial while the evidence is being investigated.
b) Each suspect is placed in a separate cell and offered the opportunity to confess the crime.
c) Each suspect may choose between the strategies of confessing (D) or not confessing (C), where C and D represent the pure strategies of cooperation and defection with respect to one's partner in the crime, and not with respect to the authorities.
d) If neither of the two confesses, i.e. (C, C), they both go free and divide the proceeds of their crime between them. We represent this by 3 units of payoff to each prisoner.
e) However, if one prisoner confesses (D) and the other does not (C), the prisoner who confesses testifies against his partner in exchange for going free and gets the entire 5 units of payoff. The prisoner who did not confess is sent to prison, which is represented by a payoff of zero.
f) If both suspects confess, i.e. (D, D), then both are convicted but both receive a reduced term. This is represented by giving each suspect 1 unit of payoff. This payoff is better than having only the other suspect confess, but it is not as good as going free.
The game between the prisoners can be represented by the following bimatrix of payoffs:

$$\begin{array}{c|cc} & \text{Bob: } C & \text{Bob: } D \\ \hline \text{Alice: } C & (3, 3) & (0, 5) \\ \text{Alice: } D & (5, 0) & (1, 1) \end{array} \tag{5}$$

where the first and the second entry in each bracket correspond to Alice's and Bob's payoff, respectively. Let Alice play C with probability $p$ and play D with probability $(1 - p)$. Similarly, let Bob play C with probability $q$ and play D with probability $(1 - q)$. The players' payoffs for the PD matrix (5) are

$$U_A(p, q) = -pq + 4q - p + 1, \qquad U_B(p, q) = -pq + 4p - q + 1. \tag{6}$$

The inequalities that define the NE consisting of a pair of mixed strategies $(p^*, q^*)$ in PD can then be written as

$$U_A(p^*, q^*) - U_A(p, q^*) = -(p^* - p)(q^* + 1) \ge 0, \qquad U_B(p^*, q^*) - U_B(p^*, q) = -(q^* - q)(p^* + 1) \ge 0,$$

which produces a unique NE in PD: $p^* = q^* = 0$. The NE corresponds to both players playing the pure strategy D. Ref. [42] utilizes an international relations PD game to explain the complexities of cyber intrusions and the way forward for nation-states to deal with these new exigencies. The PD game is also discussed in the context of cybersecurity in Ref. [54].
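The PD analysis can be checked numerically. The sketch below computes the expected payoffs directly from the payoff entries described in items d) to f) and tests the NE condition on a grid of mixing probabilities. The function names and the grid resolution are illustrative choices; because each payoff is linear in the player's own probability, deviations to the endpoints 0 and 1, which the grid contains, are enough to detect any profitable deviation.

```python
from fractions import Fraction

# Entries are (Alice's payoff, Bob's payoff); index 0 = C, index 1 = D.
PD = [[(3, 3), (0, 5)],
      [(5, 0), (1, 1)]]


def expected_payoffs(p, q):
    """Expected payoffs when Alice plays C w.p. p and Bob plays C w.p. q."""
    weights = [[p * q, p * (1 - q)],
               [(1 - p) * q, (1 - p) * (1 - q)]]
    alice = sum(weights[i][j] * PD[i][j][0] for i in range(2) for j in range(2))
    bob = sum(weights[i][j] * PD[i][j][1] for i in range(2) for j in range(2))
    return alice, bob


def is_nash(p_star, q_star, grid):
    """True if no unilateral deviation on the grid improves a player's payoff."""
    alice_star, bob_star = expected_payoffs(p_star, q_star)
    alice_ok = all(expected_payoffs(p, q_star)[0] <= alice_star for p in grid)
    bob_ok = all(expected_payoffs(p_star, q)[1] <= bob_star for q in grid)
    return alice_ok and bob_ok


grid = [Fraction(k, 10) for k in range(11)]
equilibria = [(p, q) for p in grid for q in grid if is_nash(p, q, grid)]
print(equilibria)
```

The only grid point that passes the check is p = q = 0, i.e. mutual defection.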

Refinements of Nash equilibrium
Some of the largest problems in security applications come from actions that cannot be anticipated. This makes using NE problematic, as the concept presumes that the structure of the game, as well as all possible moves, is common knowledge among the players. Several refinements of the NE have been introduced [8,12], including sequential equilibrium, proper equilibrium, trembling hand equilibrium, and rationalizability.

Sequential or Stackelberg games
A dynamic model of the duopoly game was proposed by Stackelberg (1934) [11,45], in which a leader (or dominant) firm moves first and, in view of the leading firm's move, a follower (or subordinate) firm moves second. For instance, in the early history of the US automobile industry, General Motors played this leadership role against more than one firm, such as Ford and Chrysler, who acted as followers. In the sequential game of duopoly a Stackelberg equilibrium is obtained using the solution concept of the backwards-induction outcome of the game. As a solution concept it is stronger than that of NE and reflects the sequential nature of the game. Multiple NE may appear in sequential-move games, whereas only one of those is associated with the backwards-induction outcome of the game. Consider the following simple three-step game:
a) Player 1 chooses an action $a_1$ from the set $A_1$ of his strategies,
b) Player 2 observes $a_1$ and then chooses an action $a_2$ from the set $A_2$ of her strategies,
c) Payoffs for the two players are $U_1(a_1, a_2)$ and $U_2(a_1, a_2)$.
It is an example of the dynamic games of complete and perfect information, whose key features are: a) players take their moves in sequence, b) all previous moves are known to the players before they make the next move, and c) the players' payoff functions are common knowledge. Given that the action $a_1$ has previously been chosen, at the second stage of the game, when player 2 takes her turn to make the move, she faces the problem:

$$\max_{a_2 \in A_2} U_2(a_1, a_2).$$

Assume that for each $a_1$ in $A_1$, player 2's above optimization problem has a unique solution $R_2(a_1)$, which is the best response of player 2. Since player 1 can solve player 2's optimization problem as well, player 1 can anticipate player 2's response to each action $a_1$ that player 1 might take, so that player 1 faces the problem:

$$\max_{a_1 \in A_1} U_1(a_1, R_2(a_1)).$$

Assuming that this optimization problem has a unique solution for player 1, denoted by $a_1^*$, the pair $(a_1^*, R_2(a_1^*))$ is called the backwards-induction outcome of this game. In Ref. [46] Damjanovic-Behrendt presents an approach to optimize cybersecurity decisions in order to protect instances of a federated Internet of Things platform in the cloud. The proposed solution implements the repeated Stackelberg security game. An overview of use-inspired research in Stackelberg security games is presented in Ref. [47].
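The backwards-induction procedure described above can be sketched for finite action sets as follows. The function `backward_induction`, the linear duopoly profit functions, and the quantity grids are hypothetical and chosen only so that the follower's best response is unique, as the text assumes.

```python
def backward_induction(actions_1, actions_2, u1, u2):
    """Backwards-induction outcome (a1*, R2(a1*)) of a two-step sequential game."""

    def best_response(a1):
        # R2(a1): the follower's payoff-maximizing reply to the observed a1.
        return max(actions_2, key=lambda a2: u2(a1, a2))

    # The leader maximizes U1(a1, R2(a1)), anticipating the follower's reply.
    a1_star = max(actions_1, key=lambda a1: u1(a1, best_response(a1)))
    return a1_star, best_response(a1_star)


# Hypothetical linear duopoly: margin per unit is 12 - (q1 + q2).
def profit_1(q1, q2):
    return q1 * (12 - q1 - q2)


def profit_2(q1, q2):
    return q2 * (12 - q1 - q2)


leader_quantities = range(0, 13, 2)   # even quantities keep the follower's reply unique
follower_quantities = range(0, 13)
outcome = backward_induction(leader_quantities, follower_quantities, profit_1, profit_2)
print(outcome)
```

For this illustrative duopoly the leader produces 6 units and the follower replies with 3, so the leader ends up with the larger profit, which reflects the first-mover advantage of the Stackelberg leader.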

Repeated games
A specific class of dynamic games are the repeated games, in which the players play the same game more than once. Players observe the outcome of the first play before the start of the second play. Payoffs for the entire game are then obtained as the sum of the payoffs from the individual stages. Generally, repeated games have a strategic structure that is more complex than that of their one-stage counterpart. This is because the players' strategic choices in the following stages are influenced by the outcome of the choices they make in an earlier stage.
A two-stage game of complete but imperfect information is sequential in that the players' moves in the first stage are observed before the next stage begins. The simultaneity of the players' moves in each stage results in the imperfect information in the game. Such a game consists of these steps [11]:
a) Players A and B simultaneously choose their moves $p$ and $q$ from their strategy sets $P$ and $Q$, respectively,
b) Players A and B observe the outcome of the first stage of the game, $(p, q)$, and then simultaneously choose actions $p_1$ and $q_1$ from the sets $P$ and $Q$, respectively,
c) Payoffs are $U_i(p, q, p_1, q_1)$ for $i = A, B$.
Usually, the games from this class are solved using the method of backwards induction. This involves solving the simultaneous-move game between players A and B in the second stage, given the outcome from the first stage. Players A and B can anticipate that their second-stage behavior will be given by $(p_1^\star(p, q), q_1^\star(p, q))$. In view of this, the first-stage interaction between the players becomes equivalent to the following simultaneous-move game: a) Players A and B simultaneously choose actions $p$ and $q$ from the sets $P$ and $Q$, respectively, b) Payoffs are $U_i(p, q, p_1^\star(p, q), q_1^\star(p, q))$ for $i = A, B$. When $(p^\star, q^\star)$ is the unique NE of this simultaneous-move game, the set of four numbers $(p^\star, q^\star, p_1^\star(p^\star, q^\star), q_1^\star(p^\star, q^\star))$ is known as the subgame-perfect outcome [11] of this two-stage game. This solution concept is the natural analog of the backwards-induction outcome in games of complete and perfect information.
Consider the PD game given by the matrix (5), for which the players play the game twice and the outcome of the first play is observed before the second stage begins. Payoffs for the entire game are then obtained as the sum of the payoffs from the two stages of the game. The game is a two-stage game of complete but imperfect information [11]. Assume players A and B play the pure strategy C with probabilities $p$ and $q$, respectively, in stage 1. Also assume the players A and B play the strategy C with probabilities $p_1$ and $q_1$, respectively, in stage 2. Let $U_{A1}$ and $U_{B1}$ represent the payoffs to players A and B, respectively, in stage 1. From Eqs. (6) these payoffs are

$$U_{A1}(p, q) = -pq + 4q - p + 1, \qquad U_{B1}(p, q) = -pq + 4p - q + 1,$$

with $p^\star = q^\star = 0$ (i.e. defection for both the players) as the unique NE in this stage. Similarly, in the second stage the payoffs to players A and B are expressed as $U_{A2}$ and $U_{B2}$ respectively, where

$$U_{A2}(p_1, q_1) = -p_1 q_1 + 4 q_1 - p_1 + 1, \qquad U_{B2}(p_1, q_1) = -p_1 q_1 + 4 p_1 - q_1 + 1.$$

Therefore, the strategy of defection, i.e. $p_1^\star = q_1^\star = 0$, once again comes out as the unique NE in the second stage. To compute the subgame-perfect outcome of this two-stage game, we analyze its first stage given that the second-stage outcome is also the NE of that stage, namely $p_1^\star = q_1^\star = 0$. For this NE the players' payoffs in the second stage are $U_{A2}(0, 0) = 1$, $U_{B2}(0, 0) = 1$. The players' first-stage interaction in this two-stage game therefore becomes equivalent to a one-shot game in which the payoff pair (1, 1) from the second stage is added to their first-stage payoff pair. We can write the players' payoffs in the one-shot game as

$$U_{A1}(p, q) + U_{A2}(0, 0) = -pq + 4q - p + 2, \qquad U_{B1}(p, q) + U_{B2}(0, 0) = -pq + 4p - q + 2.$$

It again has (0, 0) as the unique NE. The unique subgame-perfect outcome of the two-stage PD, therefore, is (0, 0) in the first stage, and it is also (0, 0) in the second stage. The strategy of defection in both stages comes out as the subgame-perfect outcome of the two-stage classical PD.
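The two-stage argument can be traced in a few lines of Python. This is a sketch restricted to pure strategies, which suffices here because the stage game has a unique NE in pure strategies. The helper `pure_nash` and the variable names are illustrative.

```python
MOVES = ("C", "D")
# Stage game: (A's payoff, B's payoff) for each pair of pure moves.
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def pure_nash(game):
    """Pure NE of a 2x2 game given as {(move_A, move_B): (payoff_A, payoff_B)}."""
    result = []
    for a, b in game:
        best_a = all(game[(a, b)][0] >= game[(x, b)][0] for x in MOVES)
        best_b = all(game[(a, b)][1] >= game[(a, y)][1] for y in MOVES)
        if best_a and best_b:
            result.append((a, b))
    return result


# Step 1: solve the second stage, which is the stage game itself.
stage_2 = pure_nash(PD)
continuation = PD[stage_2[0]]

# Step 2: add the second-stage equilibrium payoffs to every first-stage payoff.
one_shot = {moves: (u[0] + continuation[0], u[1] + continuation[1])
            for moves, u in PD.items()}
stage_1 = pure_nash(one_shot)
print(stage_1, stage_2, one_shot[stage_1[0]])
```

Adding the same constant pair (1, 1) to every entry does not change any best reply, which is why defection survives as the equilibrium of the first stage as well.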

Cooperative games
In cooperative games, players are allowed to form coalitions, enter binding agreements, pay compensations, make side payments, etc., and there is a strong incentive to work together to receive the largest total payoff. In their pioneering work on game theory [14], von Neumann and Morgenstern offered models of coalition formation where the strategy of each player consists of choosing the coalition s/he wishes to join. In coalition games the players' possibilities are described by the available resources of different groups (coalitions) of players, and joining a group, or remaining outside, is part of the strategy of a player affecting his/her payoff. The notion of a strategy disappears in a cooperative game, and the notion of a coalition and the value or worth of that coalition attain significance. It is assumed that each coalition can guarantee for its members a certain amount that is called the value of a coalition [8,13]. It measures the worth of the coalition, obtained as the payoff which the coalition can guarantee for itself if it selects an appropriate strategy. However, the 'odd man' can prevent the coalition from receiving more than this amount.
An example of a three-player symmetric cooperative game is a classical three-person normal form game [48] that is defined by: a) three non-empty sets $\Sigma_A$, $\Sigma_B$, and $\Sigma_C$ that are the strategy sets of the players A, B, and C, b) three real-valued functions $U_A$, $U_B$, and $U_C$ defined on the set of tuples $(\sigma_A, \sigma_B, \sigma_C)$ with $\sigma_A \in \Sigma_A$, $\sigma_B \in \Sigma_B$, and $\sigma_C \in \Sigma_C$. For this game, a strategy is understood as such a tuple $(\sigma_A, \sigma_B, \sigma_C)$, and $U_A$, $U_B$, $U_C$ are the payoff functions of the three players. The game can be denoted as $\Gamma = \{\Sigma_A, \Sigma_B, \Sigma_C; U_A, U_B, U_C\}$. Let $\Re = \{A, B, C\}$ represent the set of players and assume that $\wp$ is an arbitrary subset of $\Re$. Players in $\wp$ may form a coalition so that the coalition $\wp$ can be considered as a single player. It is expected that the players in $(\Re - \wp)$ will form an opposing coalition, so that the game has two opposing "coalition players", i.e. $\wp$ and $(\Re - \wp)$. One of the two strategies 1, 2 is chosen by each of the three players A, B, and C. There is no payoff if the three players choose the same strategy. If exactly two players choose the same strategy, each of them receives one unit of money from the 'odd man'. The payoff function $U_A$ for player A is given as [48]:

$$U_A(1,1,1) = U_A(2,2,2) = 0, \qquad U_A(1,1,2) = U_A(1,2,1) = U_A(2,1,2) = U_A(2,2,1) = 1, \qquad U_A(1,2,2) = U_A(2,1,1) = -2,$$

with similar expressions for $U_B$ and $U_C$. Suppose $\wp = \{B, C\}$, hence $\Re - \wp = \{A\}$. The coalition game represented by $\Gamma_\wp$ is given by the following payoff matrix, in which the rows are the joint strategies $[\sigma_B \sigma_C]$ of the coalition $\wp$, the columns are the strategies of A, and the entries are the payoffs to $\wp$:

$$\begin{array}{c|cc} & [1] & [2] \\ \hline [11] & 0 & 2 \\ [12] & -1 & -1 \\ [21] & -1 & -1 \\ [22] & 2 & 0 \end{array}$$

Here the strategies [12] and [21] are dominated by [11] and [22]. After eliminating these dominated strategies the payoff matrix becomes

$$\begin{array}{c|cc} & [1] & [2] \\ \hline [11] & 0 & 2 \\ [22] & 2 & 0 \end{array}$$

It is seen that the mixed strategies $\tfrac{1}{2}[11] + \tfrac{1}{2}[22]$ and $\tfrac{1}{2}[1] + \tfrac{1}{2}[2]$ are optimal for $\wp$ and $(\Re - \wp)$, respectively. With these strategies a payoff of 1 for the players in $\wp$ is assured for all strategies of the opponent; hence, the value of the coalition $\upsilon(\Gamma_\wp)$ is 1, i.e. $\upsilon(\{B, C\}) = 1$. Since $\Gamma$ is a zero-sum game, $\upsilon(\Gamma_\wp)$ can also be used to find $\upsilon(\Gamma_{\Re - \wp})$ as $\upsilon(\{A\}) = -1$. The game is symmetric and one can write

$$\upsilon(\{A\}) = \upsilon(\{B\}) = \upsilon(\{C\}) = -1, \qquad \upsilon(\{A, B\}) = \upsilon(\{B, C\}) = \upsilon(\{C, A\}) = 1.$$

Cooperative game theory has been applied to cybersecurity in a number of studies. In a Masters thesis submitted to Florida Atlantic University, Golchubian [56] has developed a cooperative game theoretical approach to prevent collusion and to incentivize cooperation in cybersecurity contexts. Vakilinia and Sengupta [57] have investigated profit sharing in coalitional game theory, using a participation-fee calculation for rewarding the players. In particular, they analyze the well-known Shapley value concept [10,11] by formulating a coalitional game between organizations in a cybersecurity information sharing system.
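The value of the coalition {B, C} can be recomputed from the 'odd man' rule alone. The sketch below builds the coalition's payoff matrix against A, removes dominated rows, and evaluates the remaining 2x2 zero-sum game with the standard closed-form value for games without a saddle point. All function names are illustrative; a general matrix game would need a linear program instead of the 2x2 formula used here.

```python
from fractions import Fraction
from itertools import product

STRATEGIES = (1, 2)


def odd_man_payoffs(a, b, c):
    """(U_A, U_B, U_C): the two players who agree each get 1 from the odd man."""
    choices = (a, b, c)
    if a == b == c:
        return (0, 0, 0)
    payoffs = []
    for i in range(3):
        others = [choices[j] for j in range(3) if j != i]
        # Player i is the odd man exactly when the other two agree.
        payoffs.append(-2 if others[0] == others[1] else 1)
    return tuple(payoffs)


# Rows: joint strategies (sigma_B, sigma_C); columns: sigma_A; entries: U_B + U_C.
matrix = {row: [sum(odd_man_payoffs(a, *row)[1:]) for a in STRATEGIES]
          for row in product(STRATEGIES, repeat=2)}


def undominated(rows):
    """Drop rows dominated by another row (no worse everywhere, better somewhere)."""
    keep = {}
    for name, row in rows.items():
        dominated = any(
            all(x >= y for x, y in zip(other, row))
            and any(x > y for x, y in zip(other, row))
            for other_name, other in rows.items() if other_name != name
        )
        if not dominated:
            keep[name] = row
    return keep


def value_2x2(m):
    """Value of a 2x2 zero-sum game for the row player."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:  # saddle point in pure strategies
        return Fraction(maximin)
    return Fraction(a * d - b * c, a + d - b - c)


reduced = undominated(matrix)
coalition_value = value_2x2(list(reduced.values()))
print(matrix, reduced, coalition_value)
```

The computed value is 1 for the coalition, and because the game is zero-sum the single player A is held to -1, in agreement with the text.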

Bayesian games
In other situations that are characterized by the players' access to only partial knowledge about the game, game theory is still shown to be an effective modelling tool by exploiting concepts from Bayesian games [11]. A Bayesian game is defined as a game of incomplete information in which the players do not have complete knowledge of the rules of the game. The incomplete knowledge is described by the existence of the so-called state of Nature, which is decided probabilistically by some relevant random source. In Bayesian games, the probability distribution over the states of Nature is private to each player and represents that player's knowledge about Nature. Nature is allowed to leak some information about its state, which is called the signal to the players. With the signal, the players can probabilistically work out their expected utilities. A Bayesian game [55]

Network/cybersecurity and game theory
The information technology landscape has been revolutionized by the recent advances in software and hardware technologies. Cyberspace has now become an integral part of the way business is conducted. For current telecommunication and information networks, network/cybersecurity is the main concern, and the protection and security of cyberspace infrastructure is of key importance.
Game theory is applied to networks in settings in which agents are connected by physical or virtual links. Given the network structure and the actions of other users of the network, the agents must decide on some action in a strategic manner.
Heterogeneous, large-scale, and dynamic networks define the cyberspace of the present time. Cyberspace has become increasingly complex even within carefully designed network and software infrastructures. A large attack surface is available for the evasive maneuvers of adversaries in cyberspace. Cyberspace has become characterized by higher computational power and ubiquitous connectivity, and these features have given birth to new risks and threats.
The miscreants launching cybersecurity attacks face various degrees of uncertainty, and defenders have incomplete information about the attackers' intentions and capabilities. Improving cybersecurity thus involves difficult challenges and decision making on multiple levels and over different time scales. The goal of cybersecurity is to provide practical and scalable security mechanisms and to enhance the trustworthiness of cyber-physical systems.
As is the case with physical security, in cybersecurity there exists a wide variety of agents' utilities, including adversarial and antithetical types. The situations studied in game theory, therefore, share many features with the cybersecurity problem. The success of a cybersecurity scheme depends not only on the actual cyberdefense strategies that have been implemented, but also on the strategic actions taken by the attackers to launch their attacks. Thus these scenarios are well-suited to game theoretical analyses of cybersecurity schemes. Such analyses can also be viewed from the perspective of establishing trust. When security is compromised, building trustworthy relationships and deciding whether to trust received information become particularly relevant. It is well known [60] that the trust problem can be formulated in game-theoretic terms. Trust emerges as an important aspect in the design and analysis of security solutions, and the implementations of security games involve several levels of trust.

Network/cybersecurity games
A significant motivation for cybersecurity games comes from earlier applications of game theory to the domain of physical security. These are examples of practical situations that demonstrate the potential of game theory in that domain. Physical security considerations are important at airports, in product transportation, in national security patrols, etc. Usually, a defender allocates the available resources to defend against an attacker, whereas the attacker can attempt to compromise targets that the defender is protecting from possible attacks. Most often, the defender seeks to allocate resources so as to minimize both the attacker's chance of success and the cost incurred by the defender. How should the defender allocate agents, patrols, surveillance technology, and other resources to minimize the impact of attackers? Examples of physical security situations include, a) airport security, where the defender can schedule optimal checkpoints and patrols for their agents, b) the coast guard, where more efficient protection can be provided to ferries or ports that are targets for theft or terrorism. Similarly, a finite number of agents and limited resources can be allocated so as to best counteract wide-scale poaching.
In network/cybersecurity situations, zero-sum games between malicious attackers and transmitter-receiver pairs can model the problems of jamming and eavesdropping in communication networks. Attackers and defenders are most often considered as the agents in network security problems. Security games form a basis for formal decision making, algorithm development, and predicting the behaviour of attackers. Security games can be deterministic or stochastic. They can be sequential or hierarchical (Stackelberg games), in which an agent has a certain information advantage over the others. In cooperative or coalitional security games the agents can cooperate to achieve their strategic objectives. Examples of security games in the network/cybersecurity domain include, i) intrusion detection [5,75], ii) privacy concerns [74,83,84], iii) network jamming [76,77,79], and iv) eavesdropping in communication networks [78].
Scheduling and deployment of patrols is a key operational problem for those who are responsible for the security of airports, art galleries, etc. Alpern et al. [49] have presented a class of patrolling games addressing the optimization problem involving randomized, and thus unpredictable, patrols. They have considered the facility to be patrolled as a network or graph $Q$ of interconnected nodes (e.g. rooms, terminals) such that the Attacker has the option to attack any node of $Q$ within a given time $T$. The Attacker requires $m$ consecutive periods that are uninterrupted by the Patroller in order to commit his nefarious act and therefore win. In this approach, the Patroller can follow any path on the graph. The patrolling game turns out to be a win-lose game in which, given best play on both sides, the Value is the probability that the Patroller successfully intercepts an attack.

Examples
We begin by reviewing two examples from the literature in some detail, as reported by Sokri [50] and Durkota et al. [65].

Optimal resource allocation in cybersecurity
Sokri [50] has considered a security game between an attacker $a$ and a defender $d$ in a cyberinfrastructure system. Let $T = \{t_1, t_2, \ldots, t_n\}$ be a set of $n$ targets that are at risk of being attacked and $S = \{s_1, s_2, \ldots, s_m\}$ a set of resources to protect the targets. The attacker's mixed strategy can be represented by the vector $a = (a_t)_{t \in T}$, where $a_t$ is the probability of attacking the target $t$. The defender's mixed strategy is the vector $p = (p_t)_{t \in T}$, where $p_t$ is the marginal probability of protecting the target $t$. Players' access to mixed strategies allows them to play probability distributions over their pure strategies. A strategy profile $(a, p)$ is a combination of (mixed) strategies that the attacker and the defender may play. Let $r_d(t)$ be the defender's reward if the attacked target $t$ is covered and $c_d(t)$ his cost if the target is uncovered. Similarly, denote by $r_a(t)$ the attacker's reward if the attacked target $t$ is uncovered and by $c_a(t)$ the attacker's cost if the attacked target $t$ is covered. For the strategy profile $(a, p)$, the expected payoffs of the two players are given by Eqs. (15, 16). These payoffs depend only on the attacked targets and their protection, and they do not consider the targets that are not attacked. Now, if the players move simultaneously, the solution of this cybersecurity game is a NE. However, if the game is played sequentially, with the defender (leader) moving first and committing to a strategy and the attacker (follower) reacting to the defender's move, the Stackelberg equilibrium appears as the standard solution of this leader-follower interaction.
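The structure of these expected payoffs can be illustrated with a short sketch. It assumes one plausible sign convention, namely that costs are positive numbers subtracted from rewards; the convention used in Ref. [50] may differ. The function names and all numerical values below are hypothetical.

```python
def expected_payoffs(a, p, r_d, c_d, r_a, c_a):
    """Expected (defender, attacker) payoffs for attack vector a and coverage p.

    ASSUMPTION: costs are positive numbers that are subtracted from rewards.
    Only attacked targets (a_t > 0) contribute to either payoff.
    """
    u_d = sum(a_t * (p_t * rd - (1 - p_t) * cd)
              for a_t, p_t, rd, cd in zip(a, p, r_d, c_d))
    u_a = sum(a_t * ((1 - p_t) * ra - p_t * ca)
              for a_t, p_t, ra, ca in zip(a, p, r_a, c_a))
    return u_d, u_a


def attacker_best_response(p, r_a, c_a):
    """Index of a target that maximizes the attacker's payoff against coverage p."""
    values = [(1 - p_t) * ra - p_t * ca for p_t, ra, ca in zip(p, r_a, c_a)]
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical three-target example (illustrative numbers only).
p = [0.5, 0.5, 0.0]                  # defender's marginal coverage per target
r_d, c_d = [1, 1, 1], [5, 3, 1]      # defender's rewards and costs
r_a, c_a = [6, 2, 1], [2, 2, 1]      # attacker's rewards and costs

target = attacker_best_response(p, r_a, c_a)
a = [1 if t == target else 0 for t in range(len(p))]
u_d, u_a = expected_payoffs(a, p, r_d, c_d, r_a, c_a)
print(target, u_d, u_a)
```

In this illustration the follower attacks the first target even though it is covered half of the time, because its reward outweighs the risk; a leader who anticipates this would shift coverage towards that target.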
Given the defender's strategy $p$, and writing $U_a(t, p)$ for the attacker's expected payoff from attacking target $t$, the attacker's optimization problem can be presented as follows:

$$\max_{a} \sum_{t \in T} a_t \, U_a(t, p), \qquad \text{subject to } \sum_{t \in T} a_t = 1, \quad a_t \ge 0 \;\; \forall t \in T.$$

It is optimal to assign 1 to any $a_t$ that is associated with a maximal value of $U_a(t, p)$. The dual problem that corresponds to the above has the same optimal value and it can be formulated as follows:

$$\min u, \qquad \text{subject to } u \ge U_a(t, p), \;\; \forall t \in T.$$

The complementary slackness condition then becomes:

$$a_t \, \big(u - U_a(t, p)\big) = 0, \quad \forall t \in T.$$

When the leader problem is completed by including the follower's optimality condition, it becomes a single mixed-integer quadratic problem [51], given by Eqs. (23)-(27). Eq. (23) maximizes the leader's expected payoff. The coverage is limited to the available resources ($m$) by Eq. (24), whereas Eq. (27) restricts the coverage vector to $[0, 1]$. The leader's mixed strategy is enforced to be feasible by these two constraints. Eq. (26), where $M$ is a large number, is the complementary slackness condition indicating that the follower's payoff $u$ is optimal for every pure strategy with $a_t > 0$. Sokri [50] has considered the example of a game in normal form shown in Table 1, which is adapted from Refs. [52,53]. There are 4 targets and two resources that can cover any two of the targets. For each target, there are two payoffs, i.e. the payoff of the attacker and the payoff of the defender. Each payoff consists of two parts, i.e. a reward and a cost. By extending the values in Table 1 to a range of values, an uncertainty can be placed on each variable. Using a three-point estimate (minimum, most likely, and maximum) approach that incorporates this uncertainty, Sokri [50] has determined the following solution, which is found to satisfy all the constraints as well as the numerical convergence criterion: $p = (0.5549, 0.4994, 0.3411, 0.6025)$, $a = (0, 0, 0, 1)$.

The objective did not move significantly after many iterations, and the attacker preferred to attack the most valuable target even though it is heavily defended. With this solution, the cumulative distribution function (CDF) corresponding to the most likely payoffs can be determined, and the median of the defender's average payoff comes out to be approximately 0.95. That is, there is a 50% probability that the defender's average payoff will be less than 0.95. The minimum and maximum values of the defender's average payoff are determined to be 0.4261 and 1.5166, respectively.

Threshold-setting to detect data exfiltration
A data breach involves strategic interaction between defender and attacker, for which game theory provides helpful insights. It is carried out through the process of information exfiltration, which involves the unauthorized transfer of information. A dynamic (sequential) game model of data exfiltration is described by Durkota et al. [65], in which the attacker's objective is to exfiltrate as much data as possible before the activity is detected. The defender's objective is to minimize the loss of data before the breach is detected.
The defender records the volume of data that each host in the network uploads over time, using time windows of fixed length. The defender selects a detection threshold $\theta$, chosen from a set $\Theta$ of thresholds, such that an alarm is triggered if a host uploads more data than $\theta$ in a time window.
The defender can set the detection threshold $\theta$ for each host individually. However, it is possible to identify groups of hosts with similar behaviours. For instance, a group can be of type $\lambda$ from the set $\Lambda$ of all types. For a randomly selected host, $P(\lambda)$ then defines the probability that the host is of type $\lambda$. That is, $P(\lambda)$ is the probability of occurrence of the host type. It is assumed that both the attacker and the defender know the probability $P(\lambda)$. Two hosts of the same type have a common activity pattern, i.e. $P(o \mid \lambda)$ gives the probability that a host of type $\lambda$ transfers the amount of data $o \in O$ in a time interval.
It can be the case that, even without any attacker activity, a selected threshold $\theta$ is exceeded and the alarm is triggered. These instances are called false positives, and it is usually a time-consuming task for the administrators to determine their cause. A certain number of false positives is tolerated in the defender's strategies, and their bound is usually expressed as the constant FP.
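The role of the false-positive bound can be illustrated with a small sketch. It assumes that the amount of false positives of a host type at a given threshold is the probability that benign activity alone exceeds the threshold, and that the bound applies to the average over host types; the exact definition in Ref. [65] may differ. The host types, activity distributions, thresholds, and the bound are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical benign activity P(o | type): upload volume -> probability.
activity = {
    "admin": {10: F(1, 2), 50: F(3, 10), 100: F(1, 5)},
    "secretary": {10: F(4, 5), 50: F(1, 5)},
}
# Hypothetical probability P(type) of each host type.
type_prob = {"admin": F(1, 4), "secretary": F(3, 4)}


def false_positive_rate(theta, host_type):
    """Probability that benign activity of this host type alone exceeds theta."""
    return sum(prob for volume, prob in activity[host_type].items() if volume > theta)


def expected_false_positives(thresholds):
    """False-positive rate averaged over host types for a pure threshold assignment."""
    return sum(type_prob[t] * false_positive_rate(thresholds[t], t) for t in type_prob)


thresholds = {"admin": 50, "secretary": 10}   # one threshold per host type
FP_BOUND = F(1, 4)                            # assumed bound on false positives

rate = expected_false_positives(thresholds)
print(rate, rate <= FP_BOUND)
```

Lowering a threshold makes exfiltration harder but raises the false-positive rate, so the bound FP limits how aggressively the defender can set the thresholds.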
The external and internal attackers are called the outsider and the insider, respectively.The nature of information that the insiders and the outsiders have about the targeted organization can be different from each other.Although the outsider may know which host types exist but cannot know which types were compromised.In contrast, the insider knows which host types exist and also which were compromised.
Defender's Strategy: Defender's pure strategy ψ is a map from the set Λ of all types to the set Θ of thresholds, i.e. ψ : Λ → Θ. Defender's mixed or randomized strategy is σ(θ | λ) that defines a probability distribution of thresholds θ given host types λ.The false positive constraint for a defender strategy σ is then written as where is type λ's amount of false positives when the threshold is θ.Attacker's Strategy: It consists of choosing the amount of data a ∈ A that the attacker infiltrates in the next time window.By controlling one of the users, the attacker uploads as much data as possible before being detected.The attacker is awarded a utility when the sum of the host's activity o ∈ O and the amount of data that the attacker infiltrates a ∈ A is less than the detection threshold is a probability distribution over possible host types and threshold settings.It is assumed that first the defender selects the threshold θ and the attacker acts in response to knowing θ.Attacker's expected utility is then defined as u a (σ, π) where σ is the defender's mixed strategy whereas π is the attacker's policy chosen from the set Π.The attacker's policy is defined as a mapping from the set of belief states ∆(Λ × Θ) to the set A of the amounts of data.
During the course of interaction with the defender, the attacker takes into account the last action and observation and uses the Bayesian update rule to keep track of his belief b. The attacker's action is determined by this belief through his policy, i.e. a = π(b). The defender's expected utility is defined as u_d(σ, π) = −C · u_a(σ, π), where C > 0. This means that the attacker and the defender have opposite objectives and that their payoffs are proportional to each other. Also, C being greater than 1 means that the defender's disutility is greater than the attacker's utility.
The insider vs. outsider attacks: Data breach attacks can be performed by agents who are inside an organization/company or who are outside it. An outsider usually does not know the type of the compromised host, even though s/he may know which types (groups) exist within the company, for instance IT admins and secretaries. Insiders, however, know their host types as they use the network regularly, and they are also knowledgeable about the deployed defences. For instance, an insider who knows the exact threshold values that the defender has fixed for each host type can exfiltrate exactly at those values.
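The Bayesian belief tracking described above can be illustrated with a minimal Python sketch. It assumes a toy model that is not taken from [65]: two hypothetical host types with made-up activity distributions, two candidate thresholds, and an attacker who only observes whether the last exfiltration of size a went undetected. The names HOST_ACTIVITY, prob_undetected and bayes_update are illustrative.

```python
from itertools import product

# Hypothetical activity distributions per host type: {activity o: probability}.
HOST_ACTIVITY = {
    "admin": {2: 0.5, 6: 0.5},
    "secretary": {1: 0.5, 3: 0.5},
}
THRESHOLDS = [10, 20]


def prob_undetected(amount, host_type, threshold):
    """Probability that host activity o plus the exfiltrated amount stays below the threshold."""
    return sum(p for o, p in HOST_ACTIVITY[host_type].items() if o + amount < threshold)


def bayes_update(belief, amount, undetected):
    """Posterior belief over (host type, threshold) pairs after one observation."""
    posterior = {}
    for (host_type, threshold), prior_prob in belief.items():
        p = prob_undetected(amount, host_type, threshold)
        likelihood = p if undetected else 1.0 - p
        posterior[(host_type, threshold)] = prior_prob * likelihood
    total = sum(posterior.values())
    if total == 0:
        raise ValueError("observation is impossible under the current belief")
    return {state: weight / total for state, weight in posterior.items()}


# Uniform prior over host types and thresholds, then one undetected exfiltration of 5 units.
prior = {state: 0.25 for state in product(HOST_ACTIVITY, THRESHOLDS)}
posterior = bayes_update(prior, amount=5, undetected=True)
for state, prob in posterior.items():
    print(state, round(prob, 3))
```

In this toy run the pair (admin, 10) becomes less likely than the others, because an admin host with threshold 10 would have raised the alarm in half of the cases.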
An approximate algorithm is used to compute the attacker's policy and to find an approximate Stackelberg equilibrium. The defender's strategy is then also part of an approximate Stackelberg equilibrium, and his utility closely approximates the one obtained in the exact Stackelberg equilibrium.
Defender's optimal strategy against attacks by the insiders: Durkota et al. [65] present an algorithm that computes an exact Stackelberg equilibrium against attacks from insiders. Knowing the type of the user through whom the insider can exfiltrate data allows the game between the host and the attacker to be represented as a normal-form game. In this game, the attacker's strategy consists of choosing, for each host type, a probability distribution over the actions from the set A. Similarly, the defender's strategy consists of choosing, for each host type, a probability distribution over thresholds from the set Θ. The game between the attacker and all host types can then be formalized as one problem. To achieve this, the zero-sum normal-form linear program [66] is extended to include a false-positive constraint and multiple host types. Here σ(θ | λ), U_a and U_{a,λ} are the variables in the linear program. The expected utilities U_{a,λ} of the individual types are weighted by the type probabilities in (38). Requirement (36) minimizes the attacker's expected utility U_a. Requirement (37) ensures that, against the given defense strategy, a best response is played in each host type. Requirements (39) and (40) ensure that the defender's strategy σ is a proper probability distribution. Requirement (41) ensures that σ meets the false-positive bound.
Defender's optimal strategy against attacks by the outsiders: As the outsider is unaware of the host's type, s/he tries to learn about it by observing the host's activity. This makes the attacker's strategies more complex than those of the insider. To achieve his objectives, the outsider can come up with stronger attacks that vary over time. The uncertainty involved suggests using Partially Observable Markov Decision Processes (POMDPs). The algorithms developed for solving POMDPs can be used to compute the attacker's best response and his/her optimal strategy. In every time step, the attacker takes an action from a set of allowed actions and receives the environment's response to that action. Based on this response, he updates his beliefs about the environment. In each time step, the attacker's utility is a function of his action and the environment's response. The solution of a POMDP takes the form of a policy that prescribes an action for every belief state about the environment. Heuristic Search Value Iteration (HSVI) [67] is a well-established algorithm for solving POMDPs, and it is used here in computing the Stackelberg equilibrium of the game. Using iterations, HSVI computes the strategies constituting the best responses of the attacker and the defender, and the solution is obtained when these best-response strategies converge.
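For the insider case, the roles of requirements (36) to (41) can be made concrete with the following sketch of a linear program that is consistent with their description. It is a reconstruction under assumptions rather than the exact formulation of [65]: the symbols P(λ) for the probability of host type λ, u_a(θ, a, λ) for the attacker's utility, and FP_λ(θ) for type λ's amount of false positives at threshold θ are assumed notation, as is the weighting of the false-positive constraint by P(λ).

```latex
\begin{align}
\min_{\sigma,\;U_a,\;U_{a,\lambda}} \quad & U_a \tag{36}\\
\text{s.t.} \quad
& \sum_{\theta\in\Theta} \sigma(\theta\mid\lambda)\,u_a(\theta,a,\lambda) \le U_{a,\lambda}
  && \forall \lambda\in\Lambda,\ \forall a\in A \tag{37}\\
& U_a = \sum_{\lambda\in\Lambda} P(\lambda)\,U_{a,\lambda} \tag{38}\\
& \sum_{\theta\in\Theta} \sigma(\theta\mid\lambda) = 1
  && \forall \lambda\in\Lambda \tag{39}\\
& \sigma(\theta\mid\lambda) \ge 0
  && \forall \lambda\in\Lambda,\ \forall \theta\in\Theta \tag{40}\\
& \sum_{\lambda\in\Lambda} P(\lambda) \sum_{\theta\in\Theta} \sigma(\theta\mid\lambda)\,FP_{\lambda}(\theta) \le FP \tag{41}
\end{align}
```

Because the game is zero-sum up to the constant C, minimizing the attacker's best-response utility is equivalent to maximizing the defender's utility, which is why a single minimization suffices.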
Game theory has many security applications, and we cannot give detailed examples of all such analyses here. In this section, we review a number of other applications that have seen game-theoretic analysis.

Trust assignment
Rajtmajer et al. (2017) [83] consider the problem of multiparty access control [68]. Users of social networks have a shared interest in the privacy settings applied to content relating to them. The authors model the problem as a variant of the ultimatum game in which all parties are motivated to reach some agreement, despite the need to compromise. They develop a model of how participants will vary their offers over time, and show empirically that the network tends to converge around the proposals of the more 'stubborn' users who are unwilling to vary their proposals. However, they also find that stubborn users are less likely to reach agreement with their neighbours at all, unlike less stubborn users who will quickly take an approach similar to that of their neighbours, resulting in a greater rate of successful interaction.
Raya et al. (2010) [84] consider the "free-rider" problem in systems based on data aggregation: participants gain a privacy benefit by refusing to trust other parties with their data, but with less data available, the system as a whole becomes less resistant to malicious behavior. They show that it is possible to design an incentive scheme that discourages free-riding to avoid a 'tragedy of the commons'-type scenario.

Resource allocation
Game theory is a natural tool for the analysis of resource allocation problems in cybersecurity. An example of this is the analysis by Panaousis et al. (2014) [73], which builds up a quantitative model of how various security controls interact with various classes of vulnerability, yielding different types of costs to the defender, for example in the form of reputational damage or data loss. This is then used to argue that certain controls are or are not worthwhile at a given budget and at a given depth in the network.
A related analysis is given by Cui et al. (2017) [69], who hypothesize that an attacker can choose between attacking a customer database and attacking individual users. Gaining access to the database yields a greater reward for the attacker, but may lead to a more vigorous law-enforcement response. Conversely, targeting individual users, e.g. by phishing, leads to a reduced payoff, but may be less risky for the attacker. The defender is represented by two parties: a system administrator who manages the database, and a user who sets a security level for themselves only. One of the more interesting features of this model is the use of a two-stage process for compromising the database, in order to model the greater technical sophistication of such an attack, at least relative to the difficulty of acquiring user credentials by e.g. phishing.

Anomaly detection
Intrusion detection systems based on anomaly detection [70] require the setting of a threshold parameter that determines whether some data is reported as 'normal' or 'anomalous'. This leads to a trade-off: a low threshold will force attackers to sacrifice the efficacy of their attacks in order to stay covert, but will also lead to a high false-positive rate, resulting in excess cost to the defender. Conversely, a high threshold will reduce the time wasted investigating false positives, but allows attackers to be less covert and use more powerful attacks that are more costly to the defender.
This interplay between the strategies of the attacker and defender is well modelled by game theory, and so game-theoretic methods can effectively inform the design of these anomaly detectors.
Schlenker [63] considers the problem of allocating investigative resources to security-relevant events in a more general sense. A system that triggers investigation in too predictable a manner is vulnerable to an attacker that can tailor its behavior so as to avoid a follow-up investigation even if it is detected. For example, a system that directs all its investigative capacity toward targets labelled as high-value is easily circumvented by an attacker who has carte blanche to attack moderate-value systems without concern for covertness.
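The threshold trade-off described above can be made tangible with a small Python sketch. The traffic volumes are invented, and the "headroom" measure (how much extra volume fits between a typical window and the threshold) is only a crude, assumed stand-in for attack strength; neither comes from the cited works.

```python
from statistics import median

# Hypothetical benign traffic volumes observed per time window.
BENIGN = [3, 5, 8, 12, 4, 7, 15, 6, 9, 5]


def tradeoff(benign, threshold):
    """Return (false-positive rate, attacker headroom) for a given alarm threshold."""
    # A false positive is a benign window whose volume surpasses the threshold.
    false_positive_rate = sum(1 for x in benign if x > threshold) / len(benign)
    # Headroom: extra volume an attacker could add to a typical (median) window
    # without surpassing the threshold.
    headroom = max(0.0, threshold - median(benign))
    return false_positive_rate, headroom


for theta in (8, 12, 16):
    fp_rate, headroom = tradeoff(BENIGN, theta)
    print(f"threshold={theta}: false-positive rate={fp_rate:.1f}, headroom={headroom:.1f}")
```

Raising the threshold lowers the false-positive rate while widening the attacker's headroom, which is the tension that a game-theoretic choice of threshold tries to balance.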

Information flow
Durkota et al. (2017) [65] consider the problem of detecting data exfiltration in a heterogeneous network. Once an attacker is present inside a network, they must decide how quickly to exfiltrate the data that they acquire: a small flow of data is difficult to detect, but the information reaches the attacker less promptly and is therefore argued to be less valuable. Conversely, a large flow of data is more readily apparent, but more valuable to the adversary while it goes undetected. This leads to an interesting result: the optimum strategy for the defender is to vary their detection threshold randomly, yielding a 30% reduction in exfiltrated data relative to a deterministic choice of threshold.
Alvim et al. (2017) [72] consider information leakage in more general terms, defining a framework of information leakage games, and finding that in many cases, the attacker also benefits from a mixed strategy. They also show that the utility of a strategy for the defender is a convex function, allowing the optimal strategy to be determined using standard optimization techniques.
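Why a randomized threshold can help the defender is easiest to see in a toy example. The Python sketch below uses invented numbers and does not reproduce the model or the 30% figure of [65]. It assumes two thresholds, a bound on expected false positives, zero benign host activity, and an attacker who knows the defender's distribution over thresholds but not the threshold actually drawn.

```python
# Hypothetical numbers for illustration only.
FALSE_POSITIVES = {10: 4.0, 20: 0.0}   # expected false positives at each threshold
FP_BUDGET = 2.0                        # bound on expected false positives
ATTACK_SIZES = [9, 19]                 # amounts the attacker may exfiltrate


def expected_false_positives(sigma):
    """Expected false positives under a distribution sigma over thresholds."""
    return sum(p * FALSE_POSITIVES[theta] for theta, p in sigma.items())


def attacker_value(sigma):
    """Best-response value: the attacker keeps amount a only if a is below the drawn threshold."""
    return max(
        sum(p * (a if a < theta else 0) for theta, p in sigma.items())
        for a in ATTACK_SIZES
    )


deterministic = {20: 1.0}          # the only pure threshold within the budget
randomized = {10: 0.5, 20: 0.5}    # uses the strict threshold half of the time

for name, sigma in (("deterministic", deterministic), ("randomized", randomized)):
    assert expected_false_positives(sigma) <= FP_BUDGET
    print(name, attacker_value(sigma))
```

The strict threshold alone would exceed the false-positive budget, yet mixing it in with probability one half stays within the budget and halves the attacker's best achievable payoff in this example.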

Deception
Others have used game theory to model techniques aimed at deception of attackers [62]. Underbrink (2016) [61] classifies these into passive methods, which serve to frustrate reconnaissance and detect the attacker before it strikes, and active methods, in which the defender takes actions predicted to interfere with an attack in progress.
Schlenker [64] considers the passive case where the defender manipulates their behavior so that an attacker scanning the defender's network will be uncertain about the type or value of each system, making it difficult for the attacker to effectively allocate their effort. Schlenker shows that determining the optimal strategy for the defender is NP-hard in general, but provides an algorithm to approximate this.
Alternatively, deception may be used to engage an attacker that has already compromised the defender's network. Horák et al. (2017) [71] consider a system in which the defender can feed the attacker with useless data once an attack has been detected. They argue that evicting the attacker immediately upon detection is suboptimal, as this leads to the attacker starting again from an 'undetected' state, only now armed with useful information on the defender's detection capabilities. The defender might therefore be better served by allowing the attacker to remain for a time, ideally fed with a stream of valueless disinformation.

Jamming
Game-theoretic analysis of channel jamming has a long history, tracing back to Basar (1983) [79], who shows that the optimum strategy of a single attacker seeking to jam a Gaussian memoryless channel is to either transmit a linear function of the transmitted signal or to transmit random symbols, depending on the relative signal and noise powers.
Other authors have carried out similar analyses in different situations: for example, Kashyap et al. (2004) [80] consider a Rayleigh fading channel, and show that knowledge of the channel input does not affect the jammer's strategy. Altman et al. (2009) [77] analyze the case of multiple attackers who seek to jam an orthogonal frequency-division multiplexing (OFDM) communication channel. The attacker and defender must each decide how to distribute their power across the available subchannels in order to minimize or maximize, respectively, the signal-to-interference-plus-noise ratio (SINR) of the channel. Han et al. (2009) [78] consider a different scenario in which 'friendly' jammers broadcast their own signals, introducing noise to disrupt eavesdroppers. They consider the problem from an economic viewpoint: what price can the friendly jammers demand for their services, given some desired rate of secret communication? However, like all economic analyses, this depends strongly on the model of the participants: their analysis assumes that the sender gains a constant utility per unit bandwidth, and the jammers pay a constant amount per unit power. Nevertheless, the results are interesting: in their simulations, they find that there exists a cutoff price for jamming power above which the use of friendly jammers is no longer justifiable.
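The power-allocation problem over OFDM subchannels mentioned above can be illustrated with a short Python sketch. It is a simplified stand-in, not the model of [77]: channel gains are assumed to be one, the noise power is the same on every subchannel, and the sum of per-subchannel SINR values is taken as the quantity the jammer tries to reduce. The function name total_sinr and all numbers are hypothetical.

```python
def total_sinr(signal_power, jam_power, noise=1.0):
    """Sum of per-subchannel SINR values for given signal and jamming power allocations."""
    if len(signal_power) != len(jam_power):
        raise ValueError("allocations must cover the same subchannels")
    return sum(s / (noise + j) for s, j in zip(signal_power, jam_power))


signal = [2.0, 2.0]          # transmitter spreads 4 units of power evenly
concentrated = [2.0, 0.0]    # jammer puts its whole budget on one subchannel
spread = [1.0, 1.0]          # jammer splits the same budget evenly

print("concentrated jamming:", total_sinr(signal, concentrated))
print("spread jamming:      ", total_sinr(signal, spread))
```

Against an evenly spread signal, the evenly spread jammer achieves the lower total SINR in this example, which hints at why both sides have to reason about the other's allocation rather than optimize in isolation.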

Smart grids
Smart grids can incorporate fine-grained demand-side data into their control systems, as well as provide demand-side management: with the right incentives, users will consent to automatic reduction of their power consumption at times of high load, for example by slightly increasing the target temperature of their air conditioners, or by delaying the activation of refrigerator motors.
When incentives are incorporated into the pricing scheme, users may be tempted to lie about their usage in order to secure a reduced tariff. Mohsenian-Rad et al. (2010) [82] provide a game-theoretic analysis of a decentralized demand-side management system; they show how to design the system so that users do not benefit from lying to each other about their usage. Though a decentralized system as in [82] might provide privacy benefits to its users, issues such as communication complexity and deployment considerations may result in a centralized system being preferable in practice. Hajj and Awad (2015) [81] describe a centralized system that uses game-theoretic methods to provide optimum scheduling. This comes at the cost of forcing users to reveal their projected demand to the supplier. This may be a reasonable sacrifice: in order to take advantage of off-peak tariffs, users must already reveal some information on their demand schedule, so the difference in privacy might well be small in practice.

Challenges to applying game theory to security
Although game theory has been shown to be significant for security, many challenges need to be addressed in developing viable game-theoretic approaches to security. Key challenges include the complexity of computing a game-theoretical equilibrium strategy, as the illustrative examples in Section (3.2) show. There are also difficulties in properly quantifying security parameters such as risk, privacy, and trust [83] [84], i.e. the parameters in terms of which the utility functions of the participants (players) in a security game are defined.
Choosing an appropriate game model for a given security problem is also a challenge. Such a model needs to reflect the details and particular aspects of the security problem or application scenario. The choice of game may be based solely on intuition and may not be substantiated by the available data. A two-player game can model a security game involving an attacker and a defender. However, the dynamic version of this game can involve multiple stages of attacking and defending. In fact, as described in Section (3.2), games of the latter type are more likely to be representative of the network/cybersecurity challenges of the real world.
Another aspect of security game models is that the players are assumed to have unbounded rationality. In real life and in experimental studies, players do not always act rationally. Consequently, there is significant scope for studying the solution concepts of Harsanyi's disturbed games and Selten's perturbed games [13] [9] in network/cybersecurity situations. In Selten's perturbed games, a player's hand 'trembles', resulting in an erroneous move, and the trembles are assumed to be determined by a random process. In Harsanyi's disturbed games, on the other hand, it is the payoffs or utility functions, rather than the players' actions, that go astray.
Interpreting game-theoretical notions such as the mixed-strategy Nash equilibrium is also a challenge, particularly for security games. The usual approach in game theory involves considering repeated games, whereas many security games are represented as one-shot games. Even within the game theory community, there is no consensus on how to interpret a mixed strategy. There is a clear need for an interpretation of the notion of a mixed strategy in network/cybersecurity games. These challenges need to be addressed in order to convert game-theoretic results into practical security solutions.
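The idea of a trembling hand can be sketched in a few lines of Python. The version below assumes the simplest possible tremble, in which a mistake occurs with probability epsilon and then selects an action uniformly at random; Selten's definition allows more general perturbations, so this is an illustration rather than the formal concept. The action names are hypothetical.

```python
def tremble(strategy, epsilon):
    """Mix an intended strategy with a uniform 'mistake' distribution of weight epsilon."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    uniform = 1.0 / len(strategy)
    return {
        action: (1.0 - epsilon) * p + epsilon * uniform
        for action, p in strategy.items()
    }


intended = {"patch": 1.0, "ignore": 0.0}   # hypothetical defender actions
perturbed = tremble(intended, epsilon=0.1)
print(perturbed)
```

Every action now has positive probability, so an equilibrium analysis of the perturbed game must remain sensible even when an opponent occasionally makes a move that a fully rational player would never choose.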
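To make the notion of a mixed-strategy equilibrium concrete, the following Python sketch solves a hypothetical 2×2 zero-sum security game in closed form. The payoff numbers are invented: the defender covers one of two targets, the attacker attacks one, and the defender loses the target's value when the attacked target is uncovered. The closed-form expressions hold only when the game has no pure-strategy saddle point, which the function checks.

```python
from fractions import Fraction


def solve_2x2_zero_sum(matrix):
    """Return (probability of row 1, game value) for the row player of a 2x2 zero-sum game."""
    (a, b), (c, d) = [[Fraction(x) for x in row] for row in matrix]
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        raise ValueError("game has a pure-strategy saddle point; no mixing is needed")
    denom = a - b - c + d
    p = (d - c) / denom            # makes the column player indifferent
    value = (a * d - b * c) / denom
    return p, value


# Rows: defender covers target 1 / target 2. Columns: attacker attacks target 1 / target 2.
# Entries are the defender's payoffs; target 1 is worth 4, target 2 is worth 2.
DEFENDER_PAYOFF = [[0, -2],
                   [-4, 0]]

p_cover_1, value = solve_2x2_zero_sum(DEFENDER_PAYOFF)
print("probability of covering target 1:", p_cover_1)
print("defender's expected payoff:", value)
```

In this example the defender covers the more valuable target two thirds of the time. Whether that number should be read as a frequency over repeated plays, a belief held by the attacker, or something else is exactly the interpretive question raised above.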
A Bayesian game consists of a tuple ⟨N, Ω, (S_i, T_i, C_i, τ_i, p_i, U_i)_{i∈N}⟩, where Ω is the set of natural states, and for each player i ∈ N, a) S_i is the set of all of player i's available actions, b) T_i is the set of player i's signals/types, with τ_i : Ω → T_i the state-to-signal mapping, c) C_i : T_i → 2^{S_i} gives the set of i's available actions after receiving t_i ∈ T_i, d) p_i is the probability measure over Ω, and e) U_i : Ω × S → R is player i's utility function, where R is the set of real numbers. The solution concept of a NE is adapted to Bayesian games and is called the Bayesian NE. Some applications of Bayesian games include Liu et al.'s [58] computation of Bayesian Nash outcomes for an intrusion detection game and, under conditions of limited information, Johnson et al.'s [59] determination of Bayesian Nash equilibria for network security games.

Table 1: Payoff table [50]. Note that a) if the target is attacked, the defender can cover the target and get a reward, b) the defender can also leave the target uncovered and incur a cost if it is attacked, c) if the target is uncovered, the attacker can attack it and get a reward, and d) if the target is covered, the attacker can also incur a cost. By changing the static values in Table