Resisting Multiple Advanced Persistent Threats via Hypergame-Theoretic Defensive Deception

Existing defensive deception (DD) approaches apply game theory, assuming that an attacker and defender play the same, full game with all possible strategies. However, in deceptive settings, players may have different beliefs about the game itself. Such structural uncertainty is not naturally handled in traditional game theory. In this work, we formulate an attack-defense hypergame where multiple advanced persistent threat (APT) attackers and a single defender play a repeated game with different perceptions. The hypergame model systematically evaluates how various DD strategies can defend proactively against APT attacks. We present an adaptive method to select an optimal defense strategy using hypergame theory for strategic defense as well as machine learning for adaptive defense. We conducted in-depth experiments to analyze the performance of the eight schemes including ours, baselines, and existing counterparts. We found the DD strategies showed their highest advantages when the hypergame and machine learning are considered in terms of reduced false positives and negatives of the NIDS, system lifetime, and players’ perceived uncertainties and utilities. We also analyze the Hyper Nash Equilibrium of given hypergames and discuss the key findings and insights behind them.


I. INTRODUCTION
DEFENSIVE deception (DD) has emerged as a promising proactive defense technique. DD exploits the information asymmetry between an attacker and a defender about a target system to mislead the attacker into selecting a suboptimal strategy [1], [2]. Game-theoretic DD techniques typically address settings wherein a defender seeks to make attacks fail by applying defense strategies that deceive the attacker and mislead its decision-making [3], [4]. Since the key idea of game-theoretic DD is to manipulate the attacker's beliefs, the main concern of DD research has been how to exploit the attacker's beliefs under uncertainty to maximize the effectiveness of deception. Conventional game theory assumes that players hold correct beliefs about their opponents [5], which is not realistic in real-world situations. However, deviations from this assumption have not been adequately explored [4]. In this work, we consider the players' perceived uncertainty derived from environmental and system dynamics. This uncertainty affects their beliefs about the strategies taken by their opponents, and the players make decisions based on the utilities expected under those beliefs. Such games have been studied in hypergame theory [6], [7], which we leverage in this work. We consider a high-level cybergame between multiple Advanced Persistent Threat (APT) attackers and one defender, a setting that has rarely been studied. We introduce bundle-based defense strategies so that a single defender can handle multiple APT attackers.
Cyberdeception games that consider uncertainty have been proposed. Prelec [8] considers uncertainty derived from deviated human perception. Rass et al. [9] designed extensive-form games for capturing the uncertainty in APTs, including disagreement in risk assessments, adversarial incentives, and uncertainty about the current system state. Mohammadi et al. [10] used a signaling game to learn the strategy of the deployed fake avatar under uncertainty. However, [8], [9], and [10] consider a game where both the attacker and defender have the same view of the game, even though uncertainty is considered in a probabilistic way. In contrast, our work handles the case where the attacker and defender have distinct views of a game and may play different subgames. In addition, the players select a strategy based on their beliefs and use an estimated uncertainty quantity to calibrate the utilities used in selecting the best strategy.
A cyberdeception game can be formulated as a sequential game, $(G, G^A, G^D)$, where $G$ is the original game and $G^A$ and $G^D$ are the games perceived by an attacker and a defender, respectively [11]. When $G = G^A = G^D$, we obtain a conventional game, as both players play the same game $G$. However, when the players play a hypergame with $G^A \neq G^D$, they view the game differently and act accordingly [11], [12], [13]. To the best of our knowledge, no prior work has considered a cyberdeception hypergame dealing with multiple APT attackers.
Game-theoretic DD techniques have been used to handle APT attackers [14]. However, most game-theoretic DD techniques dealt with APT attacks only in the reconnaissance stage (i.e., scanning attacks) and not in the other stages of the cyber kill chain (CKC). They also mainly considered a two-player attack-defense game [3], [4], even though multiple APT attackers may exist in a real system. Further, machine learning (ML) has not been employed with game-theoretic DD, especially in developing honeypots [4].
This work develops a game framework that considers six CKC stages and a single defender against multiple APT attackers. Considering all six stages and multiple attackers increases the complexity of the game. However, such a comprehensive game can provide more realistic conclusions than oversimplified cybergame studies.
Our work makes the following key contributions:
• We consider a cyber deception hypergame in which multiple APT attackers and a single defender play a game under uncertainty, where the APT attackers perform multistage attacks following the Cyber Kill Chain (CKC). We propose a bundle-based defense strategy under a defense budget constraint. Our approach is the first that can handle multiple APT attackers in a hypergame-theoretic environment.
• We present an ML-based cyber deception hypergame in which the defender directly identifies its optimal strategies based on a more accurate prediction of an opponent's move. No prior work has considered such an ML-based cyber deception game.
• We conduct an in-depth comparative performance analysis of eight schemes under various conditions associated with DD, information availability, or ML. The results show that the effectiveness of DD strategies is maximized under imperfect information, and that strategies using hybrid hypergame-ML-based defensive deception outperform the baseline strategies.
• We investigate the Hyper Nash Equilibrium (HNE) [15]. We show the degree of discrepancy observed between the HNE solutions of the APT attackers and those of the defender across the whole repeated game. This shows how hypergame theory explains the effectiveness of DD, as the defender can increase the attacker's uncertainty via DD to achieve attack failure. This is the first work that proposes a cyber deception hypergame with an analysis of HNE.
• In addition to demonstrating the degree of uncertainty perceived by the attacker and defender and their hypergame expected utilities (HEUs), we also show how much the quality of the network-based intrusion detection system (NIDS) is improved in terms of false positives, false negatives, recall, and precision. This shows how effectively the DD techniques improve the quality of a legacy defense mechanism such as the NIDS.

Fig. 1. The IoT system under attack by multiple attackers. The defender uses a honeypot to lure the inside attacker and protect the Web server and database. The attacker on the left is an inside attacker lured by the honeypot. The attacker on the right is an outside attacker that chooses a phishing attack according to the associated HEU value.

Our prior work [13] reported findings for a network dealing with APT attacks where a single attacker and a single defender play a hypergame with limited features. This work substantially extends [13] with the following additional contributions:
• Unlike [13], which only considers a single attack-defense interaction and the associated hypergame, this work considers multiple attackers that arrive simultaneously in a given system and interact with a defender taking a bundle-based defense strategy.
• We reformulated the entire hypergame framework to handle multiple attackers and a defender that counters them with a bundle of multiple strategies. This reformulation of the hypergame includes belief calculation, hypergame utility estimation, and uncertainty calibration. To clearly present this extended form of the proposed hypergame, we show an extensive-form game between an attacker and a defender in Fig. 1 of the supplement document.
• We also conducted an HNE analysis to identify the discrepancies between the attacker's HNE and the defender's HNE. When the HNE hitting ratio is low, the attackers and the defender do not have a common view of the hypergame and may choose suboptimal or poor strategies due to uncertainty. This is measured to explain how DD can be deployed more effectively under uncertainty by better deceiving the attacker.
• We also employed a machine learning (ML) algorithm to identify the defender's optimal strategy and compared it with the counterpart without ML. We selected the decision tree algorithm, which performed best among the ML algorithms considered, and compared its performance with the HEU-based counterpart.
• We considered four additional schemes in this paper, compared to [13], to conduct in-depth experiments for extensive performance analysis. In addition, Figs. 1 and 2 illustrate the proposed hypergame-theoretic framework under a given experimental setting to convey the concepts proposed in this work.
The rest of this work is structured as follows. Section II provides an overview of the related work. Section III describes the system model, including the network model, node model, and system failure conditions. Section IV discusses the attacker and defender models, including how the hypergame framework is formulated. In addition, we provide the details of the proposed game-theoretic ML-based defense strategy selection. Section V provides the details of the simulation environment, the compared schemes, and the metrics used for the comparative evaluation. Section VI presents the experimental results for sensitivity analysis and discusses the overall trends of the observed results, along with real-world applications. Section VII concludes the paper and suggests future work directions.

Fig. 2. Overview of the proposed hypergame between an attacker and a defender, where they view the same game differently and make decisions based on the Hypergame Expected Utility.

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

II. RELATED WORK
In this section, we provide an overview of the related work in terms of game-theoretic defensive deception, cybersecurity research using hypergame theory, game-theoretic defense against multiple attackers, and game-theoretic ML research.

A. Game-Theoretic Defensive Deception
Xiao et al. [16] investigated the framing effect of an attacker on APT detection. They used Cumulative Prospect Theory to model players' subjective decision-making under uncertainty and derived Nash equilibrium (NE) solutions. Basak et al. [17] used a multi-stage Stackelberg game for a defender to optimally deploy honeypots and identify an attacker's type. Pawlick and Zhu [18] formulated a two-player signaling game where a defender protects a system using honeypots while an attacker can detect the honeypots based on evidence. Píbil et al. [19] modeled a zero-sum game of imperfect and incomplete information where a defender aims to maximize the chances of an attacker selecting honeypots while the attacker aims to maximize the chances of selecting real servers as targets. However, these works [16], [17], [18], [19] consider neither multiple APT attackers nor perceived uncertainty.

B. Cybersecurity Research Using Hypergame Theory
Bennett [6] introduced hypergame theory (HT) to analyze how a player's beliefs, misbeliefs, and perceived uncertainty affect its decision making. Vane [7] extended HT by introducing the concept of hypergame expected utility (HEU), an expected utility estimated based on each player's subjective belief under uncertainty. HT has been explored in military research for decision making in adversarial settings [20], [21], [22]. Recent efforts have applied HT to cybersecurity research. Ferguson-Walter et al. [11] provided an HT-based framework quantifying how defensive deception can manipulate an attacker's beliefs about available actions and expected payoffs under uncertainty. Bakker et al. [23], [24] formulated a Bayesian hypergame for the robustness of a cyber-physical control system against attacks. Cho et al. [25] formulated a cyber deception hypergame (CDHG) and investigated how an attacker and a defender choose their best strategies by evaluating Stochastic Petri Nets with the CDHG. However, no prior work has adopted HT to consider DD for defending against multiple APT attackers [4].

C. Game-Theoretic Defense Against Multiple Attackers
Gao et al. [26] proposed a non-zero-sum game with multiple attackers and multiple defenders in a massive machine-type communication setting with many communication terminals that the multiple attackers can access. Xu and Zhuang [27] designed a sequential game with complete information to model multiple independent attackers. Sanjab and Saad [28] formulated a Stackelberg game between multiple adversaries deploying data injection attacks and a defender in a smart grid environment. However, these works [26], [27], [28] did not consider a bundle defense strategy to handle multiple APT attackers.

D. Game-Theoretic Machine Learning (ML) Research
ML has been applied to enhance players' decision-making in choosing their best strategies. Wahab et al. [29] applied a Support Vector Machine (SVM) for a defender to better perceive an attacker's type based on a repeated Bayesian Stackelberg game. Chung et al. [30] demonstrated the superiority of Q-learning over a conventional two-player stochastic game in choosing optimal defense strategies against suspicious users. Kantarcıoglu et al. [31] considered a multi-stage Stackelberg game and analyzed its equilibrium. Their approach improves a Bayesian-based classifier to discourage the attacker from changing its strategy. Xu and Xie [32] solved an intrusion detection problem by predicting a value function using a Markov chain model. However, unlike our approach, no prior work has applied ML that uses the key features of a given game, including attack cost, attack impact, past attack moves, or APT attack stages in the CKC, to directly identify an optimal defense strategy [4].

III. SYSTEM MODEL
In this section, we provide the details of the network model, node model, and system failure conditions.

A. Network Model
We assume an Internet-of-Things (IoT) system with a central entity playing the defender role against multiple APT attackers. Examples of such network environments include an IoT with a software-defined network (SDN) using a single SDN controller [33], an IoT network with an edge server or central cloud server, and a hierarchical wireless sensor or ad hoc network with a centralized coordinator [34]. Although our work considers a single defender that makes decisions and identifies an optimal defense strategy, it can be extended to consider multiple defenders dealing with multiple APT attackers. We consider a mobile network and model it using a rewiring probability $P_r^i$. This enables us to analyze the DD performance under a dynamically changing network. This work can be extended to enterprise networks with $P_r^i = 1$. We assume a NIDS deployed in the central controller. Developing an IDS is beyond the scope of this work. Hence, we simply characterize it by its false positives ($P_{fp}$) and false negatives ($P_{fn}$). We assume the NIDS runs throughout the system lifetime. The NIDS's $P_{fp}$ and $P_{fn}$ are dynamically updated upon receiving attack intelligence from DD techniques. For example, attack signatures collected by DD-based monitoring can reduce $P_{fn}$. We model $P_{fn}$ with a Beta distribution, $\mathrm{Beta}(P_{fn}; \alpha, \beta)$, where $\alpha$ refers to false negatives (FN) and $\beta$ to true positives (TP). Similarly, for $P_{fp}$, we use $\mathrm{Beta}(P_{fp}; \alpha', \beta')$, where $\alpha'$ refers to false positives (FP) and $\beta'$ to true negatives (TN). We model the effect of updated attack signatures in the NIDS via DD-based monitoring mechanisms by adding evidence to TP or TN, so that the expected values of $P_{fn}$ and $P_{fp}$ are updated as $E[P_{fn}] = FN/(FN + TP)$ and $E[P_{fp}] = FP/(FP + TN)$, respectively [35].
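As an illustration of the Beta-based bookkeeping described above, the following minimal sketch tracks the four evidence counts and returns the expected error rates. It is not the implementation used in this work: the class name `NidsErrorModel`, the method names, and the example counts and increments are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass
class NidsErrorModel:
    """Hypothetical helper: NIDS error rates as means of Beta distributions."""

    fn: float  # false-negative evidence (alpha of Beta(P_fn; alpha, beta))
    tp: float  # true-positive evidence (beta of Beta(P_fn; alpha, beta))
    fp: float  # false-positive evidence (alpha' of Beta(P_fp; alpha', beta'))
    tn: float  # true-negative evidence (beta' of Beta(P_fp; alpha', beta'))

    def expected_p_fn(self) -> float:
        # E[P_fn] = FN / (FN + TP)
        return self.fn / (self.fn + self.tp)

    def expected_p_fp(self) -> float:
        # E[P_fp] = FP / (FP + TN)
        return self.fp / (self.fp + self.tn)

    def add_dd_evidence(self, tp_gain: float = 0.0, tn_gain: float = 0.0) -> None:
        # Attack intelligence gathered through DD is modeled as extra TP/TN
        # evidence; the size of each increment is an assumption of this sketch.
        self.tp += tp_gain
        self.tn += tn_gain


if __name__ == "__main__":
    nids = NidsErrorModel(fn=10, tp=90, fp=5, tn=95)
    print("before:", nids.expected_p_fn(), nids.expected_p_fp())
    nids.add_dd_evidence(tp_gain=10, tn_gain=5)
    print("after: ", nids.expected_p_fn(), nids.expected_p_fp())
```

Because evidence is only added to TP and TN, both expected error rates can only decrease in this sketch, which mirrors the intended effect of DD-based monitoring on the NIDS.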
To maximize the effectiveness of DD strategies, we allow detected, compromised nodes to stay in the system depending on their risk level. If node $i$ has been identified as compromised and its importance level is higher than a predefined threshold, $Th_{risk}$, the defender evicts the node. For secure communications, legitimate users use a secret key to access secured network resources. To access network resources, an outside user must be authenticated and given a secret key. Each user can access the network resources according to the user's privilege level.
To help understand the model, Fig. 1 illustrates the IoT system under attack by multiple attackers.

B. Node Model
We consider an IoT network with various node types, including IoT devices, Web servers, and database servers. Following the design of existing security-aware systems [36], [37], [38], we consider an IoT system in which databases are separated from Web services. In addition, we consider honeypots as one defense strategy (i.e., $DS_5$).
An asset (or node)'s criticality has been measured by its importance in terms of information, role, or operation in service provision, performance, or security [39], [40], [41], and by its influence or power in a network [42]. The underlying idea of a node's criticality measure is to represent how much adverse impact or damage is introduced to the system when the asset (or node) is compromised or fails. Hence, we measure node $i$'s criticality, $c_i$, in terms of the confidential information it holds and its centrality in the network, as given in Eq. (1), where $importance_i$ is an integer in the range specified in Table I and $centrality_i$ captures node $i$'s centrality using fast betweenness centrality [43].
We consider three types of vulnerabilities of a node exposed to various cyberattacks: (1) software vulnerabilities, $sv_i$; (2) encryption key vulnerabilities due to compromised secret or private keys, $ev_i$; and (3) an unknown vulnerability, $uv_i$ in [0, 10], indicating a mean unknown vulnerability. For the encryption vulnerability, we consider a higher vulnerability for a key used for a longer period of time by $\widehat{ev}_i = ev_i \cdot e^{-1/T_{rekey}}$, where $T_{rekey}$ refers to the time elapsed while using the key. Each node's software and encryption vulnerability values follow the ranges in Table I. When the defender takes defense strategies, node vulnerability levels are affected. Table III summarizes the impact of each defense on the system vulnerabilities. The three vulnerabilities are calibrated based on the Common Vulnerability Scoring System (CVSS) [44], with severity represented by a real number in [0, 10]. The CVSS characterizes software, encryption, and unknown vulnerabilities and provides numerical scores that reflect the security level of a system [44]. The normalized overall vulnerability, $P_i^v$, is obtained by Eq. (2), where $V_i$ is the set of vulnerabilities node $i$ has (e.g., $\{sv_0, sv_1, sv_2, ev_0, ev_1, ev_2, uv_0\}$) and $v_{ij}$ refers to one of node $i$'s vulnerabilities, calibrated as a real number in [0, 10] based on the CVSS. $P_i^v$ is used as the probability that the attacker compromises node $i$. We summarize node characteristics, in terms of each node type's importance, software vulnerability, and encryption vulnerability, in Table I. Notice that Web servers and databases have higher importance levels than other node types. In addition, they are assumed to have lower software and encryption vulnerabilities because they use stronger encryption techniques. We set the importance of honeypots to zero. Further, we set high software and encryption vulnerabilities for honeypots to lure attackers more effectively. We consider IoT devices as legitimate user devices with medium levels of importance and vulnerabilities. For each of these node attributes, we randomly select a value in the given range to reflect the node heterogeneity of the IoT network.
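The key-aging rule for the encryption vulnerability can be illustrated numerically. The sketch below only evaluates $ev_i \cdot e^{-1/T_{rekey}}$; the function name and the assumption that $T_{rekey}$ is a positive count of elapsed time units (e.g., games) are ours and are not specified in this section.

```python
import math


def aged_encryption_vulnerability(ev: float, t_rekey: float) -> float:
    """Return ev * exp(-1 / T_rekey) for a key that has been in use for t_rekey.

    Assumption of this sketch: t_rekey is strictly positive, with t_rekey = 1
    meaning the key has just been (re)issued.
    """
    if t_rekey <= 0:
        raise ValueError("t_rekey must be positive")
    return ev * math.exp(-1.0 / t_rekey)


if __name__ == "__main__":
    base_ev = 5.0  # hypothetical CVSS-style score in [0, 10]
    for t in (1, 2, 5, 10, 50):
        print(t, round(aged_encryption_vulnerability(base_ev, t), 3))
```

The scaling factor grows from $e^{-1}$ toward 1 as $T_{rekey}$ increases, so an older key is modeled as more vulnerable while the value never exceeds the base score $ev_i$.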

C. System Failure Conditions
We consider a repeated game where players continue playing the given game until the system fails. To allow this, we define a system failure (SF) condition by Eq. (3), where $G_t$ is the network at game $t$ excluding evicted nodes, and $G$ is the network originally deployed at game $t = 0$. Correspondingly, $|G|$ and $|G_t|$ refer to the number of nodes in $G$ and in $G_t$, respectively. We use $cp_i$ to indicate whether node $i$ is compromised (i.e., $cp_i = 1$ if compromised; 0 otherwise). We use two thresholds to define SF: (1) $\rho_1$ is a predefined threshold to decide system failure based on the extent of information compromised in the system (i.e., the sum of the importance of all compromised nodes); and (2) $\rho_2$ is a predefined threshold to evaluate whether the system has a sufficient number of nodes to provide minimum services at game $t$. These two conditions determine a system failure based on loss of confidentiality, integrity, and availability.
According to the concept of Byzantine failure [45], when more than one-third of the components (or nodes) of a system are compromised, the system cannot operate properly. To reflect this, we set $\rho_1 = 1/3$ and $\rho_2 = 2/3$, so that the system is considered operational only while the number of active nodes is no less than 2/3 of all nodes and the amount of important information leaked is less than 1/3 of the total important information. Although the one-third condition is conventionally used in the concept of Byzantine failure, these thresholds can be set based on a system's or organization's requirements.
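The two failure conditions can be sketched as a simple check. This is an illustrative reading of the description above rather than the exact form of Eq. (3): the `Node` type, the choice to normalize the leaked importance by the total importance of the originally deployed network, and the use of `>=` and `<` at the thresholds are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Node:
    """Hypothetical node record for this sketch."""

    importance: float
    compromised: bool = False  # cp_i
    evicted: bool = False      # evicted nodes are excluded from G_t


def system_failed(nodes: Iterable[Node], rho1: float = 1 / 3, rho2: float = 2 / 3) -> bool:
    """Return True if either assumed failure condition holds."""
    original = list(nodes)                        # G, the deployed network
    active = [n for n in original if not n.evicted]  # G_t
    total_importance = sum(n.importance for n in original)
    leaked = sum(n.importance for n in active if n.compromised)

    leak_ratio = leaked / total_importance if total_importance > 0 else 0.0
    active_ratio = len(active) / len(original) if original else 0.0

    # Fail when too much important information is compromised,
    # or when too few nodes remain to provide minimum services.
    return leak_ratio >= rho1 or active_ratio < rho2


if __name__ == "__main__":
    healthy = [Node(1.0) for _ in range(6)]
    print(system_failed(healthy))
    breached = [Node(1.0, compromised=True)] * 3 + [Node(1.0)] * 3
    print(system_failed(breached))
```

The sketch makes the trade-off behind keeping compromised nodes visible: leaving them in the network raises the leaked-importance ratio, whereas evicting them lowers the active-node ratio.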

IV. CYBER DECEPTION HYPERGAME
In this section, we give an overview of the cyber deception hypergame considered in this work. We provide the details of hypergame theory in Appendix A of the supplement document. Since we use game theory, the attackers and the defender are assumed to choose their optimal strategies to maximize their respective utilities.

A. Attacker Model
We consider an APT attacker that performs multiple attacks throughout the stages of the CKC. In Table II, we summarize the characteristics of each attack strategy $p$ in terms of the CKC stages in which $p$ can be used, the associated attack cost, whether $p$ can compromise a single node or multiple nodes, and the vulnerability the attacker can exploit by taking $p$. $C_p^t$ represents the set of nodes that the attacker expects to compromise after taking $p$ at game $t$, and $c_j$ indicates node $j$'s criticality as in Eq. (1). The attacker selects the set of nodes based on each node's vulnerability probability $P_j^v$. When the attacker takes $p$, the expected attack impact, $ai_p^t$, is estimated by Eq. (4). We assume that the attacker knows the vulnerabilities and criticalities of its adjacent nodes, so that it can calibrate the expected attack impact, and that it knows where it is located in the CKC, which determines the subgame $\kappa$ it plays.
Since attacker $i$ may have multiple adjacent nodes, it tries to compromise adjacent node $j$ with probability $P_j^v$. If attacker $i$ successfully compromises node $j$, it moves to node $j$ and keeps compromising node $j$'s adjacent nodes until it has collected a sufficient amount of data and is ready to exfiltrate the data by taking $AS_8$ (i.e., data exfiltration attack).
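The hop-by-hop movement described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the attacker model used in this work: neighbors are tried in their listed order, the first successful compromise ends the attempt, and the function names, the dictionary-based graph, and the `max_hops` bound are hypothetical. Target selection by exploitability and honeypot detection are deliberately left out.

```python
import random
from typing import Dict, List, Optional


def attempt_lateral_move(
    current: str,
    adjacency: Dict[str, List[str]],
    vuln_prob: Dict[str, float],
    rng: random.Random,
) -> Optional[str]:
    """Try the neighbors of `current`; return the first one compromised, else None."""
    for neighbor in adjacency.get(current, []):
        # The compromise of node j succeeds with probability P^v_j.
        if rng.random() < vuln_prob[neighbor]:
            return neighbor
    return None


def walk_until_stuck(
    start: str,
    adjacency: Dict[str, List[str]],
    vuln_prob: Dict[str, float],
    rng: random.Random,
    max_hops: int = 10,
) -> List[str]:
    """Repeat hop attempts until none succeeds or max_hops is reached."""
    path = [start]
    for _ in range(max_hops):
        nxt = attempt_lateral_move(path[-1], adjacency, vuln_prob, rng)
        if nxt is None:
            break
        path.append(nxt)
    return path


if __name__ == "__main__":
    graph = {"iot1": ["iot2", "web"], "iot2": ["db"], "web": [], "db": []}
    p_v = {"iot2": 0.6, "web": 0.2, "db": 0.1}
    print(walk_until_stuck("iot1", graph, p_v, random.Random(7)))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which is convenient when comparing defense strategies on the same attack trace.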
To maximize its chances of compromising a legitimate node, the attacker selects the node with maximum exploitability. We define exploitability as the extent to which attacker $i$ can compromise node $j$ by taking strategy $p$. It depends on whether node $j$ is detected as a honeypot, with detectability $ad_i$ for low-interaction honeypots (LHs) or $ad_i/2$ for high-interaction honeypots (HHs); on $ac_p$, a predefined attack cost of taking $p$ in [0, 3] (see Table II); and on $P_i^v$, the normalized overall vulnerability (see Eq. (2) in Section III-B). If attacker $i$ detects node $j$ as a honeypot, it avoids choosing node $j$ as the target. Otherwise, it treats node $j$ as a normal node even if node $j$ is a honeypot.
We consider a repeated game consisting of multiple games between a defender and attackers. A new attacker arrives with probability $P_A$. Each game is defined as a one-shot interaction between the attackers (i.e., inside or outside attackers) and the defender. We model a game of imperfect information in each round, where the players move simultaneously with no knowledge of the opponent's current move. However, the players know their opponent's moves in previous rounds. Play repeats until the system fails (i.e., until the game ends) based on Eq. (3).
The key characteristics of APT attacks are captured by letting an attacker choose a strategy in a subgame corresponding to each stage of the CKC. That is, after the attacker successfully performs the chosen attack strategy in a stage of the CKC (i.e., an attack strategy in the corresponding subgame), it can proceed to the next stage and take one of the strategies available there, which are likewise those of the corresponding subgame. Table II summarizes the attack strategies available in each stage of the CKC in addition to the corresponding attack cost, the involvement of node compromise, and the vulnerabilities exploited to perform the attack.
1) Attack Strategies: We consider the following attack strategies, which the APT attackers can take by following the six stages of the CKC [46], [47], [48].
• $AS_1$ - Scanning attack: This attack collects vulnerability information and identifies targets to attack. It can be performed by both inside and outside attackers in all stages. This passive attack does not involve any node compromise, implying that the attack impact is zero, and its attack cost is low with $ac_1 = 1$. The attacker first selects target node $j$ at random and monitors it. We define the scanning attack success (SAS) for node $j$ by attacker $i$ as a function of $P_j^v$, obtained by Eq. (2), and $T_i^{At}$, the elapsed time attacker $i$ has monitored the target system. Note that the attacker's elapsed monitoring time, $T_i^{At}$, is reset when the system changes its configuration. When the system disseminates patches to mitigate vulnerabilities ($DS_2$), the attacker's collected intelligence about the system vulnerabilities may no longer be valid. In addition, the attacker may obtain false information when misled by honeypots ($DS_5$) or honey information ($DS_6$).
• $AS_2$ - Phishing: This attack includes social engineering, pretexting, baiting, and tailgating [49]. An outside attacker sends a phishing email to nodes that have been scanned. If the attacker is outside of the target system (i.e., in the R or D stage of the CKC), it sends phishing emails to up to $N_p$ monitored nodes in the target system. However, if the attacker is an inside attacker, it propagates phishing emails to its neighbors. The attack succeeds depending on the vulnerability level (i.e., $P_i^v$) of each node receiving the phishing emails. Since humans can easily detect phishing emails with the help of embedded phishing detectors, it is quite costly to develop a realistic phishing attack. We consider it with a high attack cost, $ac_2 = 3$.
• $AS_3$ - Botnet attack: A botnet is a set of compromised nodes. Bots perform epidemic attacks such as spreading malware to compromise adjacent nodes [50]. This strategy incurs a high cost of $ac_3 = 3$, involving multiple bots.
• $AS_8$ - Data exfiltration: The attacker exfiltrates high-value system information to unauthorized outside parties. It incurs a high cost with $ac_8 = 3$.
2) Attacker's Hypergame Expected Utility (AHEU): AHEU is estimated as a function of a player's perceived uncertainty, the expected utility of a given strategy, and the player's belief about what strategies to take in a given subgame when an opponent takes a certain strategy.
Attacker's Uncertainty: Unlike existing hypergame approaches that consider a static uncertainty level [7], [52], we dynamically estimate each player's uncertainty as follows. Attacker $i$'s uncertainty level at game $t$ (i.e., $g_i^{At}$) is estimated from (i) whether a DD technique is used ($dec = 1$ when DD, i.e., $DS_5$ to $DS_8$, is used; $dec = 0$ otherwise); (ii) how long attacker $i$ has monitored the target defense system ($T_i^{At}$); and (iii) to what extent attacker $i$ can detect the DD used by the defender (i.e., deception detectability, $ad_i$). The estimate also involves $\lambda$, a constant that models the amount of prior information the attacker has before it penetrates the target system. A higher $\lambda$ represents less prior information, leading to higher uncertainty $g_i^{At}$.
Attacker's Utility: The utility of attack strategy $p$ when the defender takes strategy $q$ at game $t$, $u_{pq}^{At}$, is estimated from the gain, $G_{pq}^{At}$, and the loss, $L_{pq}^{At}$, of taking attack strategy $p$ when the defender takes strategy $q$ in game $t$. The attack gain comprises the attack impact, $ai_p^t$, of taking $p$ based on Eq. (4) and the defense cost, $dc_q^t$, in game $t$. The attack loss comprises the defense impact, $di_q^t$, based on Eq. (10) and the attack cost, $ac_p^t$. We consider a zero-sum game where one player's loss is the other player's gain.
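The structure of the utility can be illustrated with a small sketch. It assumes, for illustration only, that the gain and loss are plain sums of their two components and that the utility is the gain minus the loss; the exact form is the one defined by the utility equation of this work. The function names and example values are hypothetical.

```python
def attacker_utility(
    attack_impact: float,   # ai_p^t
    defense_cost: float,    # dc_q^t
    defense_impact: float,  # di_q^t
    attack_cost: float,     # ac_p^t
) -> float:
    """Assumed form: u = G - L with G = ai + dc and L = di + ac."""
    gain = attack_impact + defense_cost
    loss = defense_impact + attack_cost
    return gain - loss


def defender_utility(
    attack_impact: float,
    defense_cost: float,
    defense_impact: float,
    attack_cost: float,
) -> float:
    """Zero-sum assumption: the defender's utility mirrors the attacker's."""
    return -attacker_utility(attack_impact, defense_cost, defense_impact, attack_cost)


if __name__ == "__main__":
    # Hypothetical values for one (p, q) strategy pair.
    print(attacker_utility(2.0, 1.0, 0.5, 3.0))
    print(defender_utility(2.0, 1.0, 0.5, 3.0))
```

Under this reading, a costly defense raises the attacker's utility even when the attack itself has little impact, which is why the defense cost matters when a bundle of strategies is selected.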
Attacker's Belief: This is estimated based on the frequency with which each strategy $p$ was taken in the past in a given subgame $\kappa$, as described in Eq. (3) of Appendix A of the supplement document. A subgame corresponds to a stage in the CKC. We denote the probability of the attacker taking strategy $p$ at game $t$ by $r_{\kappa p}^{At}$, and the probability of the defender taking strategy $q$ at game $t$, based on the attacker's belief, by $c_{\kappa q}^{Dt}$. Since the attacker has no past experience at the beginning of the game, it randomly chooses an available action in a given subgame $\kappa$. As the game evolves, the attacker can shape the probability distribution of its actions. Using the Dirichlet distribution [53], we estimate $r_{\kappa p}^{At}$ and $c_{\kappa q}^{Dt}$ over $AS_\kappa$ and $DS_\kappa$, the sets of attack and defense strategies, respectively, from $\gamma_p^{At}$ and $\gamma_q^{Dt}$, the amounts of evidence observed for the attacker taking strategy $p$ and the defender taking strategy $q$, respectively. $\gamma_p^{At}$ and $\gamma_q^{Dt}$ refer to the amounts of evidence available at game $t$, accumulated during $[0, t-1]$. We consider an attacker's successful intelligence collection based on its perceived certainty ($1 - g_i^{At}$). Both $r_{\kappa p}^{At}$ and $c_{\kappa q}^{Dt}$ are estimated from the attacker's belief. Hence, the attacker may not have the ground-truth $r_{\kappa p}^{At}$ and $c_{\kappa q}^{Dt}$ due to uncertainty. Finally, attacker $i$'s HEU (AHEU) at game $t$ for taking $p$ is computed by HEU(·) using $g_i^{At}$, attacker $i$'s uncertainty at game $t$. Due to space constraints, we omit the details of computing HEU(·); see [13].
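To make the belief and HEU steps concrete, the sketch below computes frequency-based beliefs as the mean of a Dirichlet distribution over accumulated evidence and then combines them with utilities in the commonly used form of Vane's HEU, which mixes the belief-weighted expected utility with the worst-case utility according to the uncertainty $g$. The exact HEU(·) of this work is given in [13] and may differ; the function names, the uniform fallback when no evidence exists, and the example numbers are assumptions of this sketch.

```python
from typing import Dict, Mapping


def dirichlet_belief(evidence: Mapping[str, float]) -> Dict[str, float]:
    """Mean of a Dirichlet over strategies: evidence_s / sum of all evidence.

    With no evidence at all, fall back to a uniform belief (assumption that
    matches choosing randomly at the start of the game).
    """
    if not evidence:
        return {}
    total = sum(evidence.values())
    if total <= 0:
        return {s: 1.0 / len(evidence) for s in evidence}
    return {s: count / total for s, count in evidence.items()}


def hypergame_expected_utility(
    utilities: Mapping[str, float],        # u_pq for a fixed own strategy p
    opponent_belief: Mapping[str, float],  # believed probability of each q
    uncertainty: float,                    # g in [0, 1]
) -> float:
    """Vane-style HEU: (1 - g) * expected utility + g * worst-case utility."""
    if not 0.0 <= uncertainty <= 1.0:
        raise ValueError("uncertainty must lie in [0, 1]")
    expected = sum(opponent_belief[q] * u for q, u in utilities.items())
    worst = min(utilities.values())
    return (1.0 - uncertainty) * expected + uncertainty * worst


if __name__ == "__main__":
    # Hypothetical evidence of past defender moves in one subgame.
    belief = dirichlet_belief({"DS1": 3, "DS5": 1})
    u_p = {"DS1": 2.0, "DS5": -2.0}  # hypothetical utilities of one attack p
    for g in (0.0, 0.5, 1.0):
        print(g, hypergame_expected_utility(u_p, belief, g))
```

As the uncertainty approaches 1, the player effectively plans against the worst case, which is how deception that raises the attacker's uncertainty can push it toward more conservative and less effective choices.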

B. Defender Model
The defender plays a game based on which subgame $\kappa$, corresponding to a CKC stage, is played. As in Table III, each defense strategy has its corresponding defense cost, availability in each CKC stage (i.e., subgame $\kappa$), and effect on the given system. Unlike the attacker, who knows which CKC stage it is in, the defender may not know the attacker's CKC stage. This is modeled by the defender's perceived uncertainty, $g_i^D$, in Eq. (11). The defender knows the attacker's CKC stage (i.e., subgame $\kappa$) with probability $1 - g_i^D$, while it considers a full game with all possible defense strategies (i.e., $DS_1$ to $DS_8$) with probability $g_i^D$ due to its uncertainty. With multiple APT attackers, the defender may observe different attackers at different CKC stages. We estimate the defense impact, $di_q^t$, by Eq. (10), where $AS_t$ is the set of strategies taken by the attackers at game $t$, $\xi$ is a positive constant to adjust the defense impact, and $ai_p^D$ is the expected attack impact of taking strategy $p$ based on the defender's belief. This implies $ai_p^D = 0$ if the attacker does not exist in the network or is not detected by the NIDS. $c_{\kappa p}^A$ is the probability of the attacker taking strategy $p$ in subgame $\kappa$ based on the defender's belief in Eq. (14). The defender predicts the attack strategy for each subgame based on its perceived certainty ($1 - g^{Dt}(i)$). Since the overall attack strategy probability is obtained by $\sum_{\kappa=0}^{K} c_{\kappa p}^A$, the misprediction for each subgame caused by $g^{Dt}(i)$ leads to an inaccurate prediction of the defense impact. We assume that the defender keeps track of the locations and behaviors of compromised nodes and collects attack intelligence as discussed in Section III. Hence, the defender can estimate $ai_p^D$. In addition, we consider misdetection by the NIDS through its false positives and negatives.
1) Defense Strategies: The defender considers a bundle strategy to play, consisting of more than one defense strategy. It considers only defense strategies available to a given subgame κ as in Table III, with each defense strategy as below.
• DS 1 -Firewalls: Firewalls monitor and control incoming and outgoing network flows based on predefined rules. When the defender enables the firewalls, it reduces unknown vulnerabilities (uv i) of existing nodes. We model this by reducing uv i by 2%.
• DS 2 -Patch management: Patches reduce known vulnerabilities [54] by fixing discovered software vulnerabilities or providing updates in a full software package. This is modeled by reducing the software vulnerabilities of all nodes, decreasing sv i by 2%.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
• DS 3 -Rekeying cryptographic keys: This strategy reduces encryption vulnerability, êv i = ev i · e^(−1/T rekey), which drops when T rekey is reset (i.e., T rekey = 1).
• DS 4 -Eviction: As discussed in Section III-A, compromised nodes can stay in the system to collect further attack intelligence. However, additional compromise of confidential information can lead the system to fail faster based on the conditions in Eq. (3). The defender can take this strategy to evict all detected, compromised nodes to reduce vulnerabilities.
• DS 5 -Low/high-interaction honeypots (LHs/HHs) [55]: The defender can use LHs and HHs to lure attackers. This strategy is taken to change attack paths as LHs and HHs are connected with existing nodes, particularly nodes with high vulnerabilities. HHs are connected with more vulnerable nodes than LHs. To reduce the attacker's access to legitimate nodes, once the attacker reaches a honeypot, it can move only to other honeypots, not to legitimate nodes. This strategy can counter the AS 2 − AS 8 attack strategies by luring attackers. Since HHs are more difficult for the attacker to detect than LHs, we set the attacker's detectability to ad for LHs and ad/2 for HHs.
• DS 6 -Honey information [56]: The defender uses honey information, such as files, tokens, or fake patches indicating vulnerabilities, to mislead attackers. This strategy defends against AS 1, AS 2, and AS 5 − AS 8, as those attacks aim to obtain target information (i.e., vulnerability, private key) for their attack success.
• DS 7 -Fake keys [1]: The defender gives fake keys to attackers aiming to compromise keys and obtain confidential information using them. Particularly, when an attacker intends to compromise cryptographic keys when taking AS 2, AS 6, AS 7, or AS 8, this prevents the attacker from compromising legitimate nodes with a fake key.
• DS 8 -Hiding network topology edges [57]: The defender hides some edges in the network. To select edges to hide, we select the C NT nodes with the highest criticality and, for each, select the edge to its adjacent node with the highest criticality, so that more important nodes are protected with higher priority. This ensures hiding C NT edges that can change attack paths. When the attacker performs AS 3 − AS 8, this strategy protects the system by hiding the neighbors of a compromised node and preventing the escalation of those attacks.
When the defender takes defensive deception techniques, including DS 5 − DS 8, attacker i can detect them using its deception detectability ad i. The ad i affects the attacker's perceived uncertainty, which impacts the AHEU and the choice of strategy. The honeypots (DS 5), honey information (DS 6), and fake keys (DS 7) deployed in game t are cleared at the start of the next game; the whole repeated game consists of multiple games played until the system fails.
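The edge selection rule of DS 8 can be sketched as follows. This is only an illustrative reading of the description above, not the authors' code: the graph representation (an adjacency dictionary), the function name, and the handling of isolated nodes and duplicate edges are assumptions.

```python
def select_edges_to_hide(adjacency, criticality, c_nt):
    """For each of the c_nt most critical nodes, pick the edge to its
    most critical neighbor (assumed reading of the DS 8 description)."""
    ranked = sorted(adjacency, key=lambda n: criticality[n], reverse=True)
    hidden = []
    for node in ranked[:c_nt]:
        neighbors = adjacency[node]
        if not neighbors:
            continue  # assumption: isolated nodes contribute no edge
        target = max(neighbors, key=lambda n: criticality[n])
        edge = frozenset((node, target))
        if edge not in hidden:  # assumption: the same edge is hidden only once
            hidden.append(edge)
    return hidden

# Toy network; node names and criticality values are illustrative only.
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
criticality = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.8}
hidden_edges = select_edges_to_hide(adjacency, criticality, c_nt=2)
```

Here the two most critical nodes are `a` and `d`, so the edges a-c and d-c are hidden. If two selected nodes point at each other, this sketch hides fewer than C NT edges; a fuller version would need a tie-breaking rule to reach exactly C NT.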
2) Defender's Hypergame Expected Utility (DHEU): DHEU is a function of the defender's perceived uncertainty, the utility of each strategy, and its belief.
Defender's Uncertainty: We estimate the defender's uncertainty towards attacker i at game t, g Dt (i), by: where ad i is attacker i's deception detectability, T Dt i is the time elapsed while the defender monitors attacker i, and μ is a constant to adjust the scale. The defender can keep track only of attackers (or compromised nodes) detected by the NIDS and thus misses false negatives.
Defender's Utility: The utility of defense strategy q, u D qp, given attack strategy p, is estimated by: This is a zero-sum game, with u At pq (i) + u Dt qp (i) = 0.
Defender's Belief: The defender forms its belief, r Dt κq, representing the probability that the defender will take strategy q based on past experience. The defender also forms a belief about the strategy the attacker will take, denoted by c At κp. The defender selects its strategy at random at the beginning of the game and revises its belief by playing more games and observing the moves of the opponent. The multinomial belief probabilities follow a Dirichlet distribution [53] based on the strategies played in past games. The beliefs r Dt κq and c At κp are given by: Note that AS κ and DS κ are sets of attack strategies and defense strategies, respectively. Here, γ D κq and γ A κp are the amounts of accumulated evidence of the defender taking strategy q and the attacker taking strategy p for the period [0, t−1] in game t. Since the defender has uncertainty g Dt (i) towards attacker i, it correctly observes each piece of evidence with probability (1 − g Dt (i)) and misses it with probability g Dt (i).
The defender's HEU (DHEU) is estimated by summing up the HEUs over all attackers i by: Since the defender is subject to the defense cost budget B, it selects a defense bundle of multiple defense strategies that maximizes the DHEU of the bundle. Bundling lets the single defender deal with multiple APT attackers based on the utilities of the bundled defense strategies, given the cost constraint B. The DHEU of the defense bundle is called Collective-DHEU (C-DHEU) and is given by: where S d is a set of defense strategies in a given bundle. Fig. 2 describes how each player (i.e., an attacker and a defender) views the same game differently and makes decisions based on its own observations and beliefs, as discussed in this section. In addition, to clarify the game scenario considered in this work, we show an extensive-form game between an attacker and a defender in Fig. 1 of the supplement document.
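To make the budget-constrained bundle choice concrete, the sketch below enumerates all bundles and keeps the one with the largest collective utility. The C-DHEU expression is not reproduced in this section, so treating it as the sum of the per-strategy DHEUs in the bundle is an assumption, as are the function name and all numeric values (they are not taken from Table III).

```python
from itertools import combinations

def best_bundle(dheu, cost, budget):
    """Exhaustive search over bundles whose total cost fits the budget.

    Assumes the bundle's collective utility is the sum of the DHEUs of
    its members. Ties keep the first bundle found.
    """
    strategies = sorted(dheu)
    best, best_value = (), float("-inf")
    for size in range(1, len(strategies) + 1):
        for bundle in combinations(strategies, size):
            if sum(cost[s] for s in bundle) > budget:
                continue  # bundle exceeds the defense cost budget B
            value = sum(dheu[s] for s in bundle)
            if value > best_value:
                best, best_value = bundle, value
    return best, best_value

# Illustrative per-strategy DHEUs and costs (hypothetical values).
dheu = {"DS1": 0.5, "DS4": 1.25, "DS5": 1.0, "DS8": -0.25}
cost = {"DS1": 1, "DS4": 3, "DS5": 2, "DS8": 1}
bundle, value = best_bundle(dheu, cost, budget=4)
```

With eight defense strategies there are at most 255 non-empty bundles, so exhaustive search is cheap; a larger strategy space would call for a knapsack-style method instead.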
Our model provides substantial details of attacker-defender interactions while avoiding unnecessary complexity in the hypergame. For example, the attack and defense costs provide a basis to estimate the expected utilities of the two players. However, they can be considered as the costs from the real world and be easily scaled in a larger range.

C. Game Theory-Guided, Machine Learning-Based Defense Strategy Selection
We introduce a defense strategy selection method based on game theory-guided machine learning (ML) for games of perfect or imperfect information. Under imperfect information, we formulate a hypergame, as in Section IV, where players observe opponents under uncertainty and select strategies based on the AHEU/DHEU. The defender can predict an optimal defense strategy using ML to form the best bundle strategy based on its learning from the training dataset, where the attacker follows game-theoretic (either hypergame or conventional game) choices in identifying its best strategy. When generating training datasets, for each game, the attacker chooses its strategy based on AHEU while the defender takes its defense strategy at random. This is to include a wide range of defender behaviors, including strategies with utilities, allowing an ML technique to better predict the defender's best bundle strategy across all possible combinations of the eight defense strategies.
We employ the decision tree (DT) algorithm [58] to solve this classification problem because DT empirically shows the best performance (highest MTTSF) compared to other ML techniques (e.g., k-nearest neighbors, KNN, or Support Vector Machines, SVM). Then, we select an optimal defense strategy based on the probabilities predicted by DT. The optimal defense strategy at each game t is based on input data X with the following features: the observed attack strategy at game t−1, the associated attack cost, the defense cost associated with a given defense strategy, the expected attack impact at game t, the expected defense impact at game t, the defender's uncertainty level, and the CKC stage perceived by the defender. We calculate a defense strategy's utility using Eq. (13) at time t and identify the optimal defense strategy Y as the strategy with the highest utility. We adopt DT to learn the training dataset (X, Y), considering the same weight for each sample. Depending on the game type (e.g., perfect or imperfect information), the perceived observations can differ, influencing the features considered by the defender and, accordingly, its prediction of the best defense strategy. In our experiment, the defender applies ML to choose its optimal strategy; the attacker does not apply ML. We use 2,000 runs to collect datasets for a variety of scenarios and the corresponding utilities for better predicting optimal defense strategies. Each run refers to a whole repeated game consisting of multiple rounds of subgames. Note that a 'round' includes the following four steps: (1) the attackers select and perform strategies; (2) the defender selects and performs bundle strategies; (3) the attackers observe the defender's moves; and (4) the defender observes the attackers' moves. Since we assume a static game in each round, meaning players take actions at the same time, (1) and (2) are performed simultaneously. Similarly, (3) and (4) are also performed simultaneously. Note that after each round, the next round can use the experience from previous rounds, which makes the repeated game dynamic. The procedure of each subgame is discussed in Section V. Each node of the DT has 14 parameters, and the trained tree for each scheme has about 20,000 to 24,000 nodes, resulting in around 280,000 to 336,000 parameters.
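The way a training pair (X, Y) is assembled can be sketched as below. This is a simplified stand-in rather than the authors' pipeline: the seven feature values and the per-strategy utilities are random placeholders (the real ones come from the simulated games and Eq. (13)), and the function names are hypothetical.

```python
import random

DEFENSES = ["DS%d" % k for k in range(1, 9)]
N_FEATURES = 7  # the seven features listed above

def label_sample(utility_by_strategy):
    """Y is the defense strategy with the highest utility for this sample."""
    return max(utility_by_strategy, key=utility_by_strategy.get)

def make_dataset(n_samples, seed=0):
    """Build (X, Y) rows from placeholder features and utilities."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        # Placeholder features; a real row would hold the observed attack
        # strategy, costs, expected impacts, uncertainty, and CKC stage.
        features = [rng.random() for _ in range(N_FEATURES)]
        # Placeholder utilities standing in for the values of Eq. (13).
        utilities = {d: rng.uniform(-1.0, 1.0) for d in DEFENSES}
        rows.append((features, label_sample(utilities)))
    return rows

dataset = make_dataset(200)
```

Rows of this shape can then be passed to a decision tree classifier such as scikit-learn's `DecisionTreeClassifier`. Because the placeholder features here are unrelated to the labels, a tree fit on this toy data would not learn anything meaningful; the sketch only shows the labeling rule.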

V. EXPERIMENT SETUP
This section provides the details of the simulation environment, the schemes compared, and the metrics used for the experiments.

A. Simulation Environment
We consider a network of N nodes based on an Erdős-Rényi (ER) model [59] representing a random graph with each node being connected with other nodes with rewiring probability P r. In addition to 100 (= N) legitimate nodes in the network, we consider 15 honeypots (5 HHs and 10 LHs). For the given honeynet, a directed network is formed with no out edges to legitimate nodes. Hence, if an attacker compromises a honeypot and moves to it, it can only connect to other honeypots. When the defender chooses DS 5, we connect highly vulnerable nodes to the honeypots. Given 100 nodes, we connect the 45 most vulnerable nodes with honeypots: the top 15 vulnerable nodes are connected with the 5 HHs and the next 30 most vulnerable nodes are connected with the 10 LHs. The attacker has prior knowledge of a target system whereas the defender has much less knowledge about the attacker. We model this by setting λ = 1 and μ = 10 in Eqs. (6) and (11), respectively. We use n r = 1,000 simulation runs and report the average values.
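The network setup above can be sketched in plain Python as follows. The value of P r is not given in this section, so 0.05 is a placeholder, and the round-robin assignment of vulnerable nodes to honeypots (three nodes per honeypot) is an assumption; the text only fixes the group sizes.

```python
import random

def er_graph(n, p, rng):
    """Undirected random graph: each node pair is linked with probability p."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def attach_honeypots(vulnerability, n_hh=5, n_lh=10, top_hh=15, top_lh=30):
    """Map the most vulnerable nodes to honeypots.

    Links are one-way (legitimate node -> honeypot), so an attacker that
    enters the honeynet cannot return to legitimate nodes.
    """
    ranked = sorted(vulnerability, key=vulnerability.get, reverse=True)
    links = {}
    for k, node in enumerate(ranked[:top_hh]):
        links[node] = "HH%d" % (k % n_hh)   # round-robin is an assumption
    for k, node in enumerate(ranked[top_hh:top_hh + top_lh]):
        links[node] = "LH%d" % (k % n_lh)
    return links

rng = random.Random(1)
graph = er_graph(100, 0.05, rng)            # 0.05 is a placeholder for P r
vulnerability = {node: rng.random() for node in graph}
honeypot_links = attach_honeypots(vulnerability)
```

The returned mapping covers exactly the 45 most vulnerable nodes, with the 15 most vulnerable ones pointing at high-interaction honeypots.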
In each subgame, a new attacker arrives as an outside attacker with probability P A. The new attacker starts by performing the scanning attack (AS 1) in the first stage of the CKC (i.e., R). If the attack succeeds based on P SAS ij, attacker j has identified a vulnerable, legitimate node i and proceeds to the next attack stage, D, in the CKC. Similarly, as the attacker enters the next stage, it chooses one of the attack strategies shown in Table II. We consider a repeated game of complete and imperfect information where each subgame is based on the stage of the CKC. This implies that, in this game, players may not know their opponent's move with 100% accuracy due to their perceived uncertainty, but there are not multiple player types. An attacker moves to the next CKC stage when it succeeds in its attack. If the attacker collects a sufficient amount of confidential data based on Th c in AS 8, it attempts to leave the target system upon the success of AS 8.
We developed our own simulation model for flexibility and for the following reasons. First, since we need to implement 16 strategies in all, using an existing simulator would add too much complexity. Second, due to the unique nature of hypergame theory, we need a game framework where each player has its own view of the given game, which would add unreasonably high complexity and overhead to an existing simulator.
All simulations use Python 3.6 and the Scikit-learn 0.19.1 library for the DT algorithm. The source code is available on GitHub [60]. We provide the key design parameters, their meanings, and the default values used for our experiments in Table I of the supplement document.

B. Comparing Schemes
We compare the performance of eight schemes based on the combinations of: (a) with DD, with DD using ML (DD-ML), or without DD (No-DD); and (b) hypergame (HG) with imperfect information (IPI), traditional game (G) with perfect information (PI), or Random. Random is considered only with or without DD, whereas both HG and G are considered with all three types of DD. These combinations give the eight schemes: HG-DD-IPI, G-DD-PI, HG-DD-ML-IPI, G-DD-ML-PI, DD-Random, HG-No-DD-IPI, G-No-DD-PI, and No-DD-Random.

C. Metrics
We use the following metrics for our experiments:
• Players' Uncertainty (g A or g D): An attacker's or a defender's mean uncertainty measured based on Eqs. (6) and (11), respectively, by: where S t is a set of attackers in game t and T is a set of subgames in the given whole repeated game.
• Collective Hypergame Expected Utility (C-HEU): This refers to the mean C-HEU for the attackers and the defender. The defender's mean C-HEU, namely C-DHEU, for a repeated game with multiple subgames until the system fails is: where C-DHEU t S d is the defender's HEU for the selected strategies in a given defense bundle at game t. The attacker's C-AHEU is: where C-AHEU t Sa means the sum of AHEUs for all attack strategies taken by all attackers and is measured by: where S a is a set of pairs including attackers and their chosen strategies in a given game t, rs At ip is strategy p chosen by attacker i, and g At i is attacker i's uncertainty in the given game t.
• Mean Time To Security Failure (MTTSF) measures the average system lifetime based on Eq. ( 3).
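As a small worked example of the uncertainty metric, the sketch below averages attackers' uncertainty first within each game and then across games. The exact expression is not reproduced in this section, so this nested averaging and the choice to skip games without attackers are assumptions; the function name is hypothetical.

```python
def mean_uncertainty(per_game):
    """Average uncertainty over attackers in each game, then over games.

    per_game holds one list per game t with the uncertainty values of the
    attackers present in that game (the set S t). Games without attackers
    are skipped (assumption).
    """
    game_means = [sum(g) / len(g) for g in per_game if g]
    if not game_means:
        return 0.0
    return sum(game_means) / len(game_means)

# Two attackers in the first game, none in the second, one in the third.
g_a = mean_uncertainty([[0.5, 1.0], [], [0.25]])
```

In this example the per-game means are 0.75 and 0.25, giving a mean uncertainty of 0.5 over the games with at least one attacker.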

VI. RESULTS & ANALYSES
In this section, we demonstrate the effect of varying the attacker arrival probability on the performance metrics in Section V-C. Due to space constraints, we present additional results for the comparative performance analyses under varying vulnerability bounds and when a setting is fixed in Appendix B of the supplement document. We also discuss the Hyper Nash Equilibrium (HNE) to investigate how the views of an attacker match those of a defender, which plays a key role in defensive deception. Lastly, we discuss the implications of hypergame-theoretic defensive deception techniques for real-world systems.

B. Hyper Nash Equilibrium
Nash Equilibrium (NE) is a widely adopted solution concept in game theory (i.e., an action to take), assuming that each player knows the equilibrium strategies of all other players and that its belief about what strategy other players take is correct. NE thus assumes that each player has accurate beliefs about its opponent, which does not reflect real-world situations. Hyper Nash Equilibrium (HNE) tackles how players' choices of strategies under uncertainty differ from NE. In a hypergame, players do not have the same views toward other players and may have incorrect beliefs about the next moves of other players. Due to uncertainty, players' best responses may not be aligned with the best responses under accurate beliefs about an opponent, which are assumed in NE. HNE addresses biased observations. Although common knowledge about what players know about other players can form their own beliefs, it does not guarantee maintaining correct beliefs among all players. Therefore, NE cannot be applied in this hypergame setting [15], [61], [62], [63].
For a game $\Gamma = \langle N, \{A_i\}_{i=1}^{N}, \{v_i\}_{i=1}^{N} \rangle$ consisting of $N$ rational players, where $A_i = \{a_i^1, a_i^2, \ldots, a_i^m\}$ is the set of $m$ actions available to player $i$ and $-i$ refers to the players other than player $i$, we define $a_i^*$ as an HNE if the following is met:

TABLE IV PAYOFF MATRICES FOR HNE CALIBRATION
where a ji is player j's action based on player i's beliefs, A ji is player j's strategy space based on player i's beliefs with a ji, a * ji ∈ A ji, and v i (•) is player i's payoff function. This implies that the attacker views G as the attacker's game, G A, and the defender views G as the defender's game, G D, where G A and G D may or may not be the same as G. In addition, even if G A = G D in some subgame of the repeated game, it does not mean G A = G D = G, because G A = G D only means that the attackers and the defender have the same view of G, regardless of the correctness of their views towards G. Existing HNE studies [15], [61], [62], [63] considered only static utilities but not perceived uncertainty in formulating a hypergame. In our work, we use players' HEUs (i.e., AHEU for attackers and DHEU for the defender) to obtain v i (a i, a ji). Since each player estimates an opponent's HEU based on its own beliefs about the opponent's move, in our attack-defense hypergame, the defender calculates AHEU by setting r At κp = c At κp and c Dt κq = r Dt κq. Similarly, the attacker calculates DHEU by using r Dt κq = c Dt κq and c Dt κq = r Dt κq. For each player's belief estimation, refer to Eqs. (8) and (14). We show example payoff matrices of the attacker's and defender's games to obtain HNE in Table IV. We use this method to show whether the attacker's HNE matches the defender's HNE in our game. We investigate the hitting ratio when these two HNE match.
In HNE, each player has its own view of the given game G [15], [61], [62]. Fig. 5 shows the mean fraction of games in which the HNE in G D equals the HNE in G A, called the HNE hitting ratio, under the HG-DD-IPI, G-DD-PI, and DD-Random schemes based on 1,000 simulation runs. We show the hitting ratios of only these three schemes because the ratio is not meaningful for ML-based or No-DD-based schemes. Fig. 5 shows: (1) The ratio of the DD-Random scheme is close to zero because players select a strategy randomly and each player is fully uncertain about the opponent's moves; (2) HG-DD-IPI has a low HNE hitting ratio at the beginning, but the ratio gradually increases as the attackers and the defender monitor their opponents for a longer time, resulting in reduced uncertainty; and (3) G-DD-PI shows a higher HNE hitting ratio than HG-DD-IPI because it uses perfect information. However, its hitting ratio remains low (<0.6) because the defender uses DD strategies, which increase uncertainty for the attacker. In such cases an HNE may not exist, which is counted as a miss in the HNE hitting ratio. Since MTTSF differs under each scheme, we show the HNE hitting ratio only for the first 120 games, for which all three schemes have non-zero results.
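The hitting ratio can be illustrated with a deliberately simplified sketch. It assumes that each player's perceived game is given as a pair of payoff matrices, that equilibria are restricted to pure strategies, and that a hit requires both views to yield the same non-empty equilibrium set; none of these details is stated in this section, and the payoff values below are illustrative, not those of Table IV.

```python
def pure_equilibria(att_payoff, def_payoff):
    """All (p, q) cells where neither player gains by deviating alone.

    Rows index attack strategies p, columns index defense strategies q.
    """
    n_p, n_q = len(att_payoff), len(att_payoff[0])
    found = set()
    for p in range(n_p):
        for q in range(n_q):
            att_ok = all(att_payoff[p][q] >= att_payoff[p2][q] for p2 in range(n_p))
            def_ok = all(def_payoff[p][q] >= def_payoff[p][q2] for q2 in range(n_q))
            if att_ok and def_ok:
                found.add((p, q))
    return found

def hitting_ratio(games):
    """Fraction of games where both views give the same non-empty equilibria.

    A game without any equilibrium in either view counts as a miss.
    """
    hits = 0
    for (a_att, a_def), (d_att, d_def) in games:
        eq_a = pure_equilibria(a_att, a_def)   # attacker's view, G A
        eq_d = pure_equilibria(d_att, d_def)   # defender's view, G D
        if eq_a and eq_a == eq_d:
            hits += 1
    return hits / len(games)

def negate(matrix):
    """Defender payoffs in a zero-sum game."""
    return [[-x for x in row] for row in matrix]

view_1 = [[2, 1], [3, 0]]   # attacker payoffs; equilibrium at cell (0, 1)
view_2 = [[1, 2], [0, 3]]   # a different perception; equilibrium at (0, 0)
games = [
    ((view_1, negate(view_1)), (view_1, negate(view_1))),  # same view: hit
    ((view_1, negate(view_1)), (view_2, negate(view_2))),  # views differ: miss
]
ratio = hitting_ratio(games)
```

In the second game the two players perceive different payoffs and therefore expect different equilibria, which is the kind of mismatch the hitting ratio counts.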

C. Implications in Real-World Applications
Defensive deception technologies include honeypots to detect threats from the Internet [64], protect the data of governments or companies [65], [66], or lure attackers to protect high-value system assets [67]. However, they are mainly limited to honeypot development or deployment. Although game theory is not fully applied in practice, a rich volume of game-theoretic DD techniques has been proposed and validated via analytical, simulation, emulation, or testbed experiments [4]. Our work makes the following contributions. First, we validate our model both theoretically and empirically. Specifically, for theoretical validation, we evaluated our model based on the core metrics of game theory, including utilities, perceived uncertainties of players, and the gaps between decisions made in a hypergame under uncertainty and in a traditional game under complete certainty. In addition, we evaluated our model using metrics of system security and performance in terms of system lifetime and the FPR and TPR of a NIDS. Second, we examined how players' perceived uncertainty in real environments can impact their decisions in choosing strategies. We showed how HEU enables calibrating the expected utility per strategy where each player's perceived uncertainty is applied. Further, we investigated to what extent HNE can be achieved to reflect real-world systems under high uncertainty. We compared NE solutions with HNE solutions to explain the gaps to be addressed. Third, we demonstrated extensive sensitivity analyses of how the proposed model performs with respect to the number of games, which provide an in-depth understanding of our hypergame model and real-world performance.

VII. CONCLUSION & FUTURE WORK
From this study, we obtained the following key findings:
• Defensive deception (DD) can significantly extend system lifetime. It can further extend the lifetime in games of imperfect information, maximizing the effectiveness of DD.
• With DD, the defender's uncertainty decreases while the attacker's uncertainty does not fluctuate much. This is because DD contributes more to collecting intelligence about detected inside attackers.
• MTTSF and TPR/FPR of the NIDS are the keys to system security. However, we found that higher DHEU does not necessarily lead to higher MTTSF or TPR because defense utility counts defense cost in addition to defense impact representing defense effectiveness.
• Game-theoretic ML-based defense solutions provide higher performance, mainly when attackers arrive slowly, while players' perceived uncertainty does not seem to be the critical factor affecting MTTSF. This is because the game-theoretic ML-based scheme simply predicts an optimal defense strategy based on the patterns identified during the training phase without considering the defender's perceived uncertainty.
• We derived the Hyper Nash Equilibrium (HNE) of the attackers and the defender and investigated how well they match each other. Since the players' views are not in sync in a hypergame, we observed a fairly low HNE hitting ratio of the attacker and defender having the same HNE. In particular, under the hypergame of imperfect information, we observed a lower HNE hitting ratio than under the game of perfect information. This helps the defender perform better, prolonging system lifetime by providing chances to manipulate the attacker's perceived uncertainty.
Future research: (1) To enhance the prediction accuracy of opponents' strategies using an ML-based approach under high dynamics and uncertainty, we plan to evaluate deep reinforcement learning (DRL) algorithms and time series algorithms, such as Autoregressive Integrated Moving Average (ARIMA); (2) We will consider additional intelligent adversaries that can also perform ML-based or DRL-based strategy selection and investigate the game dynamics and the performance of the attackers and defender in repeated games; and (3) Currently we only considered vacuity as an uncertainty dimension caused by a lack of information or knowledge. In formulating the hypergame expected utility functions, we will also consider other types of uncertainty caused by different root causes, such as conflicting information/opinions or different observations on a single event, and investigate their impact on the choices of strategies by players.

Fig. 3. MTTSF, TPR of the NIDS, and FPR of the NIDS with respect to varying the attacker arrival probability.
• TPR or FPR of the NIDS measures the mean true or false positive rate to measure the effectiveness of DD strategies in improving the NIDS.
• HNE Hitting Ratio measures the fraction of games in which the attacker's HNE (HNE(G A)) matches the defender's HNE (HNE(G D)), where G A and G D are the games viewed by the attacker and the defender, respectively.

Fig. 3 shows the comparative performance of the eight schemes in terms of MTTSF, TPR, and FPR with respect to varying attacker arrival probability, P A. In Fig. 3a, we observe: (1) DD-based schemes outperform No-DD-based counterparts due to the effectiveness of DD strategies in helping the NIDS generate fewer false positives/negatives. (2) Game-theoretic ML-based schemes outperform pure game-theoretic counterparts because the defender's choice of strategies is less affected by uncertainty or can leverage the merit of DD strategies based on learning from a sufficient amount of past experience. (3) As more attackers arrive, the eight schemes show almost the same MTTSF except for the No-DD-based schemes. (4) The order of the performance is: HG-DD-ML-IPI ≥ G-DD-ML-PI ≥ HG-DD-IPI ≈ G-DD-PI > DD-Random > No-DD-Random ≈ G-No-DD-PI ≈ HG-No-DD-IPI.

Fig. 4. Uncertainty and collective HEU (C-HEU) of the two players with respect to varying the attacker arrival probability.

Fig. 3 (
Fig. 3(b) demonstrates the TPR of the NIDS under the eight schemes when P A varies.The key findings are: (1) No-DD schemes do not have any sensitivity over varying P A because they do not allow insider attacker(s) when no DD is used.(2) DD-Random scheme is less influenced by varying P A than other DD-based schemes because the defender under this scheme selects a bundle strategy at random, making the defender have the same probability to select a DD strategy under varying P A .(3) DD-ML-based schemes outperform others because of more frequent use of DD strategies.Greater use of DD helps the defender obtain greater attack intelligence, decreasing FNR and increasing TPR.Because DD-ML-based schemes prefer optimal strategies with higher utilities, higher TPR under smaller P A means that selected DD strategies in a bundle can provide the defender higher utilities under smaller P A .(4) The performance order follows: DD-Random ≈ HG-DD-IPI > G-DD-PI > HG-DD-ML-IPI ≈ G-DD-ML-PI > No-DD-Random ≈ G-No-DD-PI ≈ HG-DD-IPI.Figs.4a and 4b show how the eight schemes perform with respect to varying P A in terms of the attacker's and defender's uncertainty.The key findings are: (1) Schemes under PI have zero uncertainty (i.e., all games of perfect information, PI) while complete uncertainty (i.e., g = 1) is observed under DD-Random and No-DD-Random schemes.(2) Under games of IPI (i.e., HG-DD-IPI, HG-DD-ML-IPI, HG-No-DD-IPI), the defender's uncertainty increases slightly as P A increases because new attackers keep arriving.(3) Overall no schemes show much sensitivity as P A grows as a small portion of inside attacks is maintained due to the presence of the NIDS and eviction of inside attackers when DS 4 is selected.(4) While DD is used, both players perceive lower uncertainty because there are higher chances for them to observe their opponent due to a longer monitoring time allowed by the DD.Figs.4cand 4dshow how the eight schemes perform as P A increases in terms of C-AHEU and 
C-DHEU.Fig.4c shows:(1) The attackers under PI tend to obtain higher utilities than under IPI or Random.(2) When the attackers use Randombased schemes, they have the lowest utilities.(3) Compared to Fig.3a, the attacker's utility does not directly represent the level of attack impact because AHEU measures the subjective utility of the attacker and there are other factors causing system failure, such as evicting legitimate nodes due to false positives of the NIDS.In Fig.4dshowing C-DHEU, we find: (1) Hypergame theoretic ML-based schemes show the highest utilities, which are well aligned with the best MTTSF Fig. 3(b) demonstrates the TPR of the NIDS under the eight schemes when P A varies.The key findings are: (1) No-DD schemes do not have any sensitivity over varying P A because they do not allow insider attacker(s) when no DD is used.(2) DD-Random scheme is less influenced by varying P A than other DD-based schemes because the defender under this scheme selects a bundle strategy at random, making the defender have the same probability to select a DD strategy under varying P A .(3) DD-ML-based schemes outperform others because of more frequent use of DD strategies.Greater use of DD helps the defender obtain greater attack intelligence, decreasing FNR and increasing TPR.Because DD-ML-based schemes prefer optimal strategies with higher utilities, higher TPR under smaller P A means that selected DD strategies in a bundle can provide the defender higher utilities under smaller P A .(4) The performance order follows: DD-Random ≈ HG-DD-IPI > G-DD-PI > HG-DD-ML-IPI ≈ G-DD-ML-PI > No-DD-Random ≈ G-No-DD-PI ≈ HG-DD-IPI.Figs.4a and 4b show how the eight schemes perform with respect to varying P A in terms of the attacker's and defender's uncertainty.The key findings are: (1) Schemes under PI have zero uncertainty (i.e., all games of perfect information, PI) while complete uncertainty (i.e., g = 1) is observed under DD-Random and No-DD-Random schemes.(2) Under games of IPI 
(i.e., HG-DD-IPI, HG-DD-ML-IPI, HG-No-DD-IPI), the defender's uncertainty increases slightly as $P_A$ increases because new attackers keep arriving. (3) Overall, no scheme shows much sensitivity as $P_A$ grows because only a small portion of inside attacks is maintained, owing to the presence of the NIDS and the eviction of inside attackers when $DS_4$ is selected. (4) When DD is used, both players perceive lower uncertainty because the longer monitoring time allowed by DD gives them more chances to observe their opponent.

Figs. 4c and 4d show how the eight schemes perform as $P_A$ increases in terms of C-AHEU and C-DHEU. Fig. 4c shows: (1) The attackers under PI tend to obtain higher utilities than under IPI or Random. (2) When the attackers use Random-based schemes, they have the lowest utilities. (3) Compared to Fig. 3a, the attacker's utility does not directly represent the level of attack impact because AHEU measures the subjective utility of the attacker and other factors also cause system failure, such as the eviction of legitimate nodes due to false positives of the NIDS.

In Fig. 4d, showing C-DHEU, we find: (1) Hypergame-theoretic ML-based schemes show the highest utilities, which are well aligned with the best MTTSF performance in Fig. 3a. (2) However, for other schemes, utilities are not necessarily well aligned with the performance in MTTSF (e.g., No-DD-based schemes show the lowest MTTSF) because a player's utility also considers the cost associated with a given strategy. (3) Hypergame-theoretic ML-based schemes show higher utilities as $P_A$ increases because each defense strategy can handle more attacks (i.e., DHEU is the sum of HEU towards all attackers). (4) Although HG-DD-IPI is the third-best performing scheme in MTTSF, its utilities are not much distinct from those of other schemes, including Random-based schemes. This implies that, since the defender is concerned about the defense cost in addition to effectiveness, the defense strategy with higher utility does not necessarily lead to a longer system lifetime. In addition, ML-based schemes yield higher utilities as more attackers arrive in the system. This is because the arrival of more attackers can provide less benefit for more effective defense strategies while giving more benefit to less costly defense strategies. Recall that the defense utilities are estimated based on both the defense impact and cost.
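The interplay between impact, cost, and the number of attackers can be illustrated with a minimal sketch. It does not reproduce the paper's HEU formula: the per-attacker utility below is a simplified stand-in (belief-weighted impact minus cost), and the function name, the per-attacker charging of cost, and all numeric values are hypothetical. The only property taken from the text is that the defender's utility is a sum of per-attacker utilities.

```python
def defender_utility(impact, cost, beliefs):
    """Sum a simplified per-attacker utility over all attackers.

    Assumption (not the paper's HEU): each attacker k contributes
    beliefs[k] * impact - cost, where beliefs[k] is a hypothetical
    probability that the strategy works against attacker k.
    """
    return sum(b * impact - cost for b in beliefs)


# Hypothetical strategies: one more effective but costly, one modest but cheap.
two_attackers = [0.5, 0.5]
costly = defender_utility(impact=5.0, cost=3.0, beliefs=two_attackers)
cheap = defender_utility(impact=3.0, cost=1.0, beliefs=two_attackers)

# The cheaper strategy scores higher despite its lower impact.
print(costly, cheap)

# With more attackers, the summed utility of the cheap strategy grows.
four_attackers = [0.5] * 4
print(defender_utility(impact=3.0, cost=1.0, beliefs=four_attackers))
```

Under these assumed numbers, the less effective but cheaper strategy obtains the higher utility, which mirrors the observation that a higher defense utility need not coincide with a longer system lifetime.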
κp = c At κp and c Dt κq = r Dt κq. Similarly, the attacker calculates DHEU by using r Dt κq = c Dt κq and c Dt κq = r Dt κq. For each player's belief estimation, refer to Eqs. (

TABLE II
CHARACTERISTICS OF APT ATTACK STRATEGIES

• $AS_4$ - Distributed Denial-of-Service (DDoS): It is performed by sending queries multiple times to reduce service availability in the network [50]. When nodes receive multiple service requests, they may not properly deal with underlying security operations due to the increased workload. DDoS can increase unknown vulnerabilities, $uv_i$, by 1%. It incurs a high attack cost with $ac_4 = 3$.
• $AS_5$ - Zero-day attacks: It exploits a node's unknown vulnerability, $uv_i$, to compromise an adjacent node that is not patched yet. The attacker seeks to obtain root permission and collect the node's information or perform further attacks on other nodes. This attack is available from E to DE stages with a low cost $ac_5 = 1$.
• $AS_6$ - Key compromise: This compromises private or secret keys of legitimate users by exploiting the encryption vulnerability $\hat{e}v_i$. It incurs a high attack cost with $ac_6 = 3$ due to the high complexity of compromising keys. The adversary can reauthenticate the node by resetting the obtained private key and carrying out malicious actions, such as stealing other confidential information or implanting malware.
• $AS_7$ - Fake identity: It can be performed when a system does not use authentication for packet transmissions or an inside attacker spoofs a source node's ID in packet transmission [51]. It incurs a medium cost with $ac_7 = 2$. Attack success will increase the encryption vulnerabilities of legitimate nodes by 1% when they obtain secret keys from victim nodes.
• $AS_8$ - Data exfiltration: This attack compromises an adjacent node based on software and encryption vulnerability. The attacker checks all compromised nodes. If the accumulated importance of the collected data exceeds a threshold ($Th_c$, i.e., $\sum_{j \in C_A} c_j > Th_c$