Introduction
Defensive deception (DD) has emerged as a promising proactive defense technique. DD exploits the information asymmetry between an attacker and a defender about a target system to mislead the attacker into selecting a suboptimal strategy [1], [2]. Game-theoretic DD techniques typically address settings wherein a defender seeks to induce attack failures by applying defense strategies that deceive the attacker and mislead its decision-making [3], [4]. Since the key idea of game-theoretic defensive deception is to manipulate the attacker’s beliefs, how to exploit the attacker’s beliefs under uncertainty to maximize the effectiveness of deception has been the main concern of DD research. Conventional game theory assumes that players hold correct beliefs about their opponents [5], which is not realistic in real-world situations; however, deviations from this assumption have not been adequately explored [4]. In this work, we consider the players’ perceived uncertainty derived from environmental and system dynamics, which affects their beliefs about the strategies taken by their opponents; the players make decisions based on the utilities implied by those beliefs. Such games have been studied in hypergame theory [6], [7], which we leverage in this work. We consider a high-level cyber game between multiple Advanced Persistent Threat (APT) attackers and one defender, a setting that has rarely been studied, and introduce bundle-based defense strategies so that a single defender can handle multiple APT attackers.
Cyberdeception games that consider uncertainty have been proposed. Prelec [8] considers uncertainty derived from deviated human perception. Rass et al. [9] designed extensive-form games for capturing the uncertainty in APTs, including disagreeing risk assessments, adversarial incentives, and uncertainty about the current system state. Mohammadi et al. [10] used a signaling game to learn the strategy of a deployed fake avatar under uncertainty. However, [8], [9], and [10] consider games where both the attacker and defender have the same view, even if uncertainty is modeled probabilistically. In contrast, our work handles the case where the attacker and defender have distinct views of the game and may play different subgames. In addition, our players select strategies based on their beliefs and use an estimated uncertainty quantity to calibrate utilities for the final decision of the best strategy.
A cyberdeception game can be formulated as a sequential game,
Game-theoretic DD techniques have been used to counter APT attackers [14]. However, most game-theoretic DD techniques deal with APT attacks only in the Reconnaissance stage (i.e., scanning attacks) but not the other CKC stages. They also mainly consider a two-player attack-defense game [3], [4], even though multiple APT attackers may exist in a real system. Further, machine learning (ML) has rarely been employed with game-theoretic DD, especially in developing honeypots [4].
This work develops a game framework that considers the six CKC stages and a single defender against multiple APT attackers. Considering all six stages and multiple attackers increases the complexity of the game; however, such a comprehensive game can provide more realistic conclusions than over-simplified cyber game studies.
Our work makes the following key contributions:
We consider a cyber deception hypergame whose multiple APT attackers and a single defender play a game under uncertainty, where the APT attackers perform multi-staged attacks following the Cyber Kill Chain (CKC). We propose a bundle-based defense strategy given a defense budget constraint. Our approach is the first that can handle multiple APT attackers in a hypergame-theoretic environment.
We present an ML-based cyber deception hypergame for the defender to directly identify its optimal strategies based on more accurate prediction of an opponent’s move. No prior work has considered such an ML-based cyber deception game.
We conduct an in-depth comparative performance analysis of eight schemes under various conditions associated with DD, information availability, or ML. The results show that the effectiveness of DD strategies is maximized under imperfect information, and that strategies using hybrid hypergame-ML-based defensive deception outperform the baseline strategies.
We investigate Hyper Nash Equilibrium (HNE) [15]. We show the degree of the discrepancies between the HNE solutions of the APT attackers and the defender across the whole repeated game. This shows how hypergame theory explains the effectiveness of DD: the defender can increase the attacker’s uncertainty via DD to induce attack failure. This is the first work that proposes a cyber deception hypergame with an analysis of HNE.
In addition to demonstrating the degree of perceived uncertainty by the attacker and defender and their hypergame expected utilities (HEUs), we also show how much the quality of the network-based intrusion detection system (NIDS) is improved in terms of reducing false positives and negatives and improving recall and precision. This shows how effectively the defensive deception techniques improve the quality of a legacy defense mechanism, such as a NIDS.
Unlike [13], which only considers a single attack-defense interaction and the associated hypergame, this work considers multiple attackers simultaneously arriving in a given system, which interact with a defender taking a bundle-based defense strategy.
We reformulated the entire hypergame framework to deal with multiple attackers and a defender taking multiple strategies in a bundle to handle them. This reformulation includes belief calculation, hypergame utility estimation, and uncertainty calibration. To clearly demonstrate this extended form of the proposed hypergame, we show an extensive-form game between an attacker and a defender in Fig. 1 of the supplement document.
We also conducted an HNE analysis to identify the discrepancies between the attacker’s HNE and the defender’s HNE. When the HNE hitting ratio is low, the attackers and defender do not share a common view of the hypergame and may choose sub-optimal or poor strategies due to uncertainty. We measure this ratio to explain how DD can be deployed more effectively under uncertainty by better deceiving the attacker.
We also employed a machine learning (ML) algorithm to identify a defender’s optimal strategy and compared it with its counterpart without ML. We selected the decision tree algorithm, which performed best among the ML algorithms we tested, and compared its performance with the HEU-based counterpart.
We considered four additional schemes in this paper, compared to [13], to conduct in-depth experiments for extensive performance analysis. In addition, we illustrate our proposed hypergame-theoretic framework under a given experimental setting in Figs. 1 and 2 to convey the key concepts proposed in this work.
The IoT system is under attack by multiple attackers, and the defender uses honeypots to lure inside attackers and protect the Web server and database. The attacker on the left is an inside attacker lured by a honeypot; the attacker on the right is an outside attacker that chooses a phishing attack according to the associated HEU value.
Overview of the proposed hypergame between an attacker and a defender where they view the same game differently and make decisions based on the Hypergame Expected Utility.
Related Work
In this section, we provide an overview of the related work in terms of game-theoretic defensive deception, cybersecurity research using hypergame theory, game-theoretic defense against multiple attackers, and game-theoretic ML research.
A. Game-Theoretic Defensive Deception
Xiao et al. [16] investigated the framing effect of an attacker on APT detection. They used Cumulative Prospect Theory to model players’ subjective decision-making under uncertainty and derive NE solutions. Basak et al. [17] used a multi-stage Stackelberg game for a defender to optimally deploy honeypots and identify an attacker’s type. Pawlick and Zhu [18] formulated a two-player signaling game where a defender protects a system using honeypots while an attacker can detect the honeypots based on evidence. Píbil et al. [19] modeled a zero-sum game of imperfect and incomplete information where a defender aims to maximize the chances of an attacker selecting honeypots while the attacker aims to maximize the chances of selecting real servers as targets. However, none of [16], [17], [18], [19] considers multiple APT attackers or perceived uncertainty.
B. Cybersecurity Research Using Hypergame Theory
Bennett [6] introduced hypergame theory (HT) to analyze how a player’s beliefs, misbeliefs, and perceived uncertainty affect its decision making. Vane [7] extended HT by introducing the concept of hypergame expected utility (HEU), an expected utility estimated based on each player’s subjective belief under uncertainty. HT research was explored in the military for decision making in adversarial settings [20], [21], [22]. Recent efforts explored cybersecurity research using HT. Ferguson-Walter et al. [11] provided an HT-based framework quantifying how defensive deception can manipulate an attacker’s beliefs on available actions and expected payoffs under uncertainty. Bakker et al. [23], [24] formulated a Bayesian hypergame for robustness of a cyber-physical control system against attacks. Cho et al. [25] formulated a cyber deception hypergame (CDHG) and investigated how an attacker and a defender choose their best strategy by evaluating Stochastic Petri Nets with the CDHG. However, no prior work has adopted HT to consider DD to defend against multiple APT attackers [4].
C. Game-Theoretic Defense Against Multiple Attackers
Gao et al. [26] proposed a non-zero-sum game with multiple attackers and multiple defenders in a massive machine-type communication setting with many communication terminals that the multiple attackers can access. Xu and Zhuang [27] designed a sequential game with complete information to model multiple independent attackers. Sanjab and Saad [28] formulated a Stackelberg game between multiple adversaries deploying data injection attacks and a defender in a smart grid environment. However, the works in [26], [27], [28] did not consider a bundle defense strategy to handle multiple APT attackers.
D. Game-Theoretic Machine Learning (ML) Research
ML has been applied to enhance players’ decision-making in choosing their best strategies. Wahab et al. [29] applied a Support Vector Machine (SVM) for a defender to better perceive an attacker’s type based on a repeated Bayesian Stackelberg game. Chung et al. [30] demonstrated the superiority of Q-learning over a conventional two-player stochastic game in choosing optimal defense strategies against suspicious users. Kantarcıoğlu et al. [31] considered a multi-stage Stackelberg game and analyzed its equilibrium; their approach improves a Bayesian classifier to discourage the attacker from changing its strategy. Xu and Xie [32] solved an intrusion detection problem by predicting a value function using a Markov chain model. However, unlike our approach, no prior work has applied ML that uses the key features of a given game, including attack cost, attack impact, past attack moves, or APT attack stages in the CKC, to directly identify an optimal defense strategy [4].
System Model
In this section, we provide the details of the network model, node model, and system failure conditions.
A. Network Model
We assume an Internet-of-Things (IoT) system with a central entity playing a defender role to deal with multiple APT attackers. For example, such network environments include an IoT with a software-defined network (SDN) using a single SDN controller [33], an IoT network with an edge server or central cloud server, or a hierarchical wireless sensor or ad-hoc network with a centralized coordinator [34]. Although our work considers a single defender to make a decision and identify an optimal defense strategy, it can be extended to consider multiple defenders dealing with multiple APT attackers. We consider a mobile network and model it using a rewiring probability
We assume a NIDS deployed in the central controller. Developing an IDS is beyond the scope of this work. Hence, we simply characterize it by its false positives
To maximize the effectiveness of DD strategies, we allow detected, compromised nodes to stay in the system based on their risk level. If node
To help understand the model, Fig. 1 shows the IoT system under attack by multiple attackers, with the defender protecting the Web server and database using honeypots.
B. Node Model
We consider an IoT network with various node types, including IoT devices and Web and database servers. Following the design of existing security-aware systems [36], [37], [38], we consider an IoT system with its databases separate from Web services. In addition, we consider honeypots as one defense strategy (i.e.,
An asset’s (or node’s) criticality has been measured in terms of the importance of its information, role, or operation in service provision, performance, or security [39], [40], [41], and its influence or power in a network [42]. The underlying idea of a node’s criticality measure is to represent how much adverse impact or damage is introduced to the system when the asset (or node) is compromised or fails. Hence, we measure node i’s criticality as \begin{equation*} c_{i} = \mathrm {importance}_{i} \times \mathrm {centrality}_{i},\tag{1}\end{equation*}
We consider three types of vulnerabilities of a node exposed to various cyberattacks: (1) software vulnerabilities, \begin{equation*} P_{i}^{v} = \frac {\max _{v_{j} \in V_{i}} v_{j}^{i}}{10}, \tag{2}\end{equation*}
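As a concrete illustration of Eqs. (1) and (2), a minimal sketch; the function names and example values are ours, not the paper's:

```python
def criticality(importance, centrality):
    # Eq. (1): node criticality is the product of the node's importance
    # and its centrality in the network.
    return importance * centrality

def compromise_probability(severities):
    # Eq. (2): the node-compromise probability is the highest severity
    # among the node's vulnerabilities, normalized by the CVSS-style
    # maximum score of 10.
    return max(severities) / 10
```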
C. System Failure Conditions
We consider a repeated game where players continue playing the given game until the system fails. To allow this, we define a system failure (SF) condition by:\begin{align*} SF = \begin{cases} 1 & {\mathrm{ if}} \rho _{1} \leq \frac {\sum _{i \in G} \mathrm {cp}_{i} \cdot \mathrm {importance}_{i}}{\sum _{i \in G} \mathrm {importance}_{i}} \; || \; \rho _{2} \geq \frac {|G_{t}|}{|G|} \\ 0 & {\mathrm{ otherwise,}} \end{cases} \tag{3}\end{align*}
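Eq. (3) can be sketched as follows; reading $G_t$ as the set of nodes still active is our assumption for illustration:

```python
def system_failure(compromised, importance, n_active, n_total, rho1, rho2):
    # Eq. (3): SF = 1 if (a) the importance-weighted fraction of
    # compromised nodes reaches rho_1, or (b) the fraction of
    # still-active nodes |G_t|/|G| falls to rho_2 or below
    # (our reading of G_t).
    weighted = (sum(c * w for c, w in zip(compromised, importance))
                / sum(importance))
    if rho1 <= weighted or rho2 >= n_active / n_total:
        return 1
    return 0
```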
Cyber Deception Hypergame
In this section, we describe the overview of the cyber deception hypergame considered in this work. We provide the details of hypergame theory in Appendix A of the supplement document. Since we use game theory, the attackers and the defender are assumed to choose their optimal strategies to maximize their respective utilities.
A. Attacker Model
We consider an APT attacker that performs multiple attacks throughout the stages of the CKC. In Table II, we summarize the characteristics of each attack strategy \begin{equation*} \mathrm {ai}_{p}^{t} = \sum _{j \in C_{p}^{t}} \mathrm {c}_{j}.\tag{4}\end{equation*}
Since attacker
For the attacker to maximize its chances to compromise a legitimate node, it selects a node with maximum exploitability. We define exploitability as the extent that attacker \begin{align*} E \left ({i, j, p}\right)= \begin{cases} -1 \qquad {\mathrm{ if}} j {\mathrm{ is detected as a honeypot}} \\ \quad \qquad {\mathrm{ with}} \mathrm {ad}_{i} {\mathrm{ for LHs or}} \mathrm {ad}_{i}/2 {\mathrm{ for HHs}};\\ \left ({3-ac_{p}}\right) \cdot P_{i}^{v}, \; \; {\mathrm{ otherwise,}} \end{cases} \tag{5}\end{align*}
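Eq. (5) can be sketched as a small helper; the argument names are ours for illustration:

```python
def exploitability(detected_as_honeypot, attack_cost, p_v):
    # Eq. (5): a node detected as a honeypot has exploitability -1, so
    # the attacker avoids it; otherwise exploitability scales the
    # target's compromise probability p_v by (3 - ac), so cheaper
    # attacks (ac in {1, 2, 3}) see higher exploitability.
    if detected_as_honeypot:
        return -1
    return (3 - attack_cost) * p_v
```

Note the honeypot branch is reached with probability $\mathrm{ad}_i$ for LHs and $\mathrm{ad}_i/2$ for HHs, per the equation's first case.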
We consider a repeated game consisting of multiple games between a defender and attackers. We consider that a new attacker arrives with probability
We capture the key characteristics of APT attacks: an attacker chooses a strategy from the subgame corresponding to each stage of the CKC. That is, after the attacker successfully performs the chosen attack strategy in a given CKC stage (i.e., an attack strategy in the corresponding subgame), it can proceed to the next stage of the CKC and take one of the strategies available in that stage’s subgame. Table II summarizes the attack strategies available in each stage of the CKC, along with the corresponding attack cost, whether a node compromise attack is involved, and which vulnerabilities are exploited to perform the attack.
1) Attack Strategies:
We consider the following attack strategies, which the APT attackers can take across the six stages of the CKC [46], [47], [48].
– Scanning attack ($AS_{1}$): This attack collects vulnerability information and identifies targets to attack. It can be performed by both inside and outside attackers in all stages. This passive attack does not involve any node compromise, implying that the attack impact is zero, and its attack cost is low with $\mathrm{ac}_{1} = 1$. The attacker first selects target node $j$ at random and monitors it. We define the scanning attack success (SAS) for node $j$ by attacker $i$ as $P^{SAS}_{ij} = P_{j}^{v} \cdot e^{-1/T_{i}^{A_{t}}}$, where $P_{j}^{v}$ is obtained by Eq. (2) and $T_{i}^{A_{t}}$ is the elapsed time attacker $i$ has monitored the target system. Note that the attacker’s elapsed monitoring time, $T_{i}^{A_{t}}$, is reset when the system changes its configuration. When the system disseminates patches ($DS_{2}$) to mitigate vulnerabilities, the attacker’s collected intelligence about the system vulnerability may no longer be valid. In addition, the attacker may obtain false information, misled by honeypots ($DS_{5}$) or honey information ($DS_{6}$).
– Phishing ($AS_{2}$): This attack includes social engineering, pretexting, baiting, and tailgating [49]. An outside attacker sends a phishing email to nodes that have been scanned. If the attacker is outside the target system (i.e., in the R or D stage of the CKC), it sends phishing emails to up to $N_{p}$ monitored nodes in the target system. However, if the attacker is an inside attacker, it propagates phishing emails to its neighbors. Based on the vulnerability level of each node (i.e., $P_{i}^{v}$) receiving the phishing emails, this attack can be successful. Since humans can easily detect phishing emails with the help of embedded phishing detectors, it is quite costly to develop a realistic phishing attack. We consider it with a high attack cost, $\mathrm{ac}_{2} = 3$.
– Botnet attack ($AS_{3}$): A botnet is a set of compromised nodes. Bots perform epidemic attacks such as spreading malware to compromise adjacent nodes [50]. This strategy incurs a high cost of $\mathrm{ac}_{3} = 3$ as it involves multiple bots.
– Distributed Denial-of-Service (DDoS) ($AS_{4}$): This attack is performed by sending queries multiple times to reduce service availability in the network [50]. When nodes receive multiple service requests, they may not properly deal with underlying security operations due to the increased workload. DDoS can increase unknown vulnerabilities, $\mathrm{uv}_{i}$, by $\epsilon_{1}\%$. It incurs a high attack cost with $\mathrm{ac}_{4} = 3$.
– Zero-day attacks ($AS_{5}$): This attack exploits a node’s unknown vulnerability, $\mathrm{uv}_{i}$, to compromise an adjacent node that is not patched yet. The attacker seeks to obtain root permission and collect the node’s information or perform further attacks on other nodes. This attack is available from the E to DE stages with a low cost $\mathrm{ac}_{5} = 1$.
– Key compromise ($AS_{6}$): This attack compromises private or secret keys of legitimate users by exploiting the encryption vulnerability $\hat{\mathrm{ev}}_{i}$. It incurs a high attack cost with $\mathrm{ac}_{6} = 3$ due to the high complexity of compromising keys. The adversary can reauthenticate the node by resetting the obtained private key and carry out malicious actions, such as stealing other confidential information or implanting malware.
– Fake identity ($AS_{7}$): This attack can be performed when a system does not use authentication for packet transmissions or when an inside attacker spoofs a source node’s ID in packet transmission [51]. It incurs a medium cost with $\mathrm{ac}_{7} = 2$. Attack success increases the encryption vulnerabilities of legitimate nodes by $\epsilon_{1}\%$ when the attacker obtains secret keys from victim nodes.
– Data exfiltration ($AS_{8}$): This attack compromises an adjacent node based on software and encryption vulnerabilities. The attacker checks all compromised nodes, and if the accumulated importance of the collected data exceeds a threshold ($\mathrm{Th}_{c}$), i.e., $\sum_{j \in C_{A}} \mathrm{c}_{j} > \mathrm{Th}_{c}$, the attacker exfiltrates high-value system information to unauthorized outside parties. It incurs a high cost with $\mathrm{ac}_{8} = 3$.
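The scanning attack success probability above can be sketched as follows; the formula is our reconstruction of $P^{SAS}_{ij} = P_{j}^{v} \cdot e^{-1/T_{i}^{A_{t}}}$ from the text:

```python
import math

def scanning_success_prob(p_v, monitoring_time):
    # P^SAS = P_j^v * exp(-1 / T): success probability approaches the
    # target's vulnerability-based compromise probability P_j^v as the
    # attacker's elapsed monitoring time T grows, and drops toward 0
    # when T is reset (system reconfiguration).
    return p_v * math.exp(-1.0 / monitoring_time)
```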
2) Attacker’s Hypergame Expected Utility (AHEU):
AHEU is estimated based on the function of a player’s perceived uncertainty, expected utility for a given strategy, and the player’s belief about what strategies to take in a given subgame when an opponent takes a certain strategy.
Attacker’s Uncertainty: Unlike existing hypergame approaches considering a static uncertainty level [7], [52], we dynamically estimate each player’s uncertainty as follows. Attacker \begin{equation*} g^{A_{t}}_{i} = 1- \exp \left ({-\lambda \cdot \mathrm {\left ({1+ \left ({1-\mathrm {ad}_{i}}\right) \cdot \mathrm {dec}}\right)}/T^{A_{t}}_{i}}\right), \tag{6}\end{equation*}
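Eq. (6) can be sketched directly; the argument names are ours:

```python
import math

def attacker_uncertainty(lam, ad, dec, monitoring_time):
    # Eq. (6): uncertainty falls as the attacker monitors the system
    # longer (larger T), and rises when deception is deployed (dec = 1)
    # and the attacker's ability ad to detect that deception is low.
    return 1 - math.exp(-lam * (1 + (1 - ad) * dec) / monitoring_time)
```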
Attacker’s Utility: The utility of attack strategy $p$ against defense strategy $q$ is the gain minus the loss:\begin{align*} u_{pq}^{A_{t}} = G_{pq}^{A_{t}} - L_{pq}^{A_{t}}, \; \; G_{pq}^{A_{t}} = \mathrm {ai}_{p}^{t} + \mathrm {dc}_{q}^{t}, \; \; L_{pq}^{A_{t}} = \mathrm {ac}_{p}^{t} + \mathrm {di}_{q}^{t}, \tag{7}\end{align*}
Attacker’s Belief: is estimated based on the frequency of strategies \begin{equation*} r_{\kappa p}^{A_{t}} = \frac {\gamma _{\kappa p}^{A_{t}}}{\sum _{p' \in \mathbf {AS}_{\kappa }} \gamma _{\kappa p'}^{A_{t}}}, \; \; c_{\kappa q}^{D_{t}} = \frac {\gamma _{\kappa q}^{D_{t}}}{\sum _{q' \in \mathbf {DS}_{\kappa }} \gamma _{\kappa q'}^{D_{t}}}, \tag{8}\end{equation*}
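The frequency-based belief of Eq. (8) amounts to normalizing observed strategy counts; a minimal sketch (names are ours):

```python
def belief_from_counts(counts):
    # Eq. (8): a player's belief that a strategy will be played is the
    # observed relative frequency of that strategy in the subgame's
    # play history.
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()}
```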
Finally, attacker \begin{equation*} {\mathrm{ AHEU}}\left ({rs_{ip}^{A}}\right) = {\mathrm{ HEU}} \left ({p, g_{i}^{A_{t}}}\right), \tag{9}\end{equation*}
B. Defender Model
The defender plays a game based on which subgame \begin{equation*} \mathrm {di}_{q}^{t} = \sum _{p \in AS_{t}} \left[{\sum _{\kappa = 0}^{K} c^{A}_{\kappa p}}\right] \cdot e^{-\xi \cdot \mathrm {ai}_{p}^{D}}, \tag{10}\end{equation*}
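Eq. (10) can be sketched as below; our reading is that `attack_beliefs[p]` holds the belief terms $c^{A}_{\kappa p}$ over $\kappa$ and `attack_impacts[p]` is $\mathrm{ai}_{p}^{D}$, which is an assumption for illustration:

```python
import math

def defense_impact(attack_beliefs, attack_impacts, xi):
    # Eq. (10): sum, over the attack strategies p in the subgame, the
    # defender's accumulated belief mass that p is played, discounted
    # by exp(-xi * ai_p) with respect to the attack impact.
    return sum(sum(beliefs) * math.exp(-xi * attack_impacts[p])
               for p, beliefs in attack_beliefs.items())
```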
1) Defense Strategies:
The defender considers a bundle strategy to play, consisting of more than one defense strategy. It considers only defense strategies available to a given subgame
– Firewalls ($DS_{1}$): Firewalls monitor and control incoming and outgoing network flows based on predefined rules. When the defender enables the firewalls, it reduces the unknown vulnerabilities ($\mathrm{uv}_{i}$) of existing nodes. We model this by reducing $\mathrm{uv}_{i}$ by $\epsilon_{2}\%$.
– Patch management ($DS_{2}$): Patches reduce known vulnerabilities [54] by fixing discovered software vulnerabilities or providing updates in a full software package. This is modeled by reducing the software vulnerabilities of all nodes, decreasing $\mathrm{sv}_{i}$ by $\epsilon_{2}\%$.
– Rekeying cryptographic keys ($DS_{3}$): This strategy reduces the encryption vulnerability as $\hat{\mathrm{ev}}_{i} = \mathrm{ev}_{i} \cdot e^{-1/\mathrm{T_{rekey}}}$, where $\mathrm{T_{rekey}}$ is reset (i.e., $\mathrm{T_{rekey}} = 1$) upon rekeying.
– Eviction ($DS_{4}$): As discussed in Section III-A, compromised nodes can stay in the system to collect further attack intelligence. However, additional compromise of confidential information can lead the system to fail faster based on the conditions in Eq. (3). The defender can take this strategy to evict all detected, compromised nodes to reduce vulnerabilities.
– Low/high-interaction honeypots (LHs/HHs) ($DS_{5}$) [55]: The defender can use LHs and HHs to lure attackers. This strategy changes attack paths as LHs and HHs are connected to existing nodes, particularly nodes with high vulnerabilities; HHs are connected to more vulnerable nodes than LHs. To reduce the attacker’s access to legitimate nodes, once the attacker reaches a honeypot, it can move only to other honeypots, not to legitimate nodes. This strategy can counter the $AS_{2}$-$AS_{8}$ attack strategies by luring attackers. Since HHs are more difficult for the attacker to detect than LHs, we set the detection probability to $\mathrm{ad}$ for LHs and $\mathrm{ad}/2$ for HHs.
– Honey information ($DS_{6}$) [56]: The defender uses honey information, such as files, tokens, or fake patches indicating vulnerabilities, to mislead attackers. This strategy defends against $AS_{1}$, $AS_{2}$, and $AS_{5}$-$AS_{8}$, as those attacks aim to obtain target information (i.e., vulnerabilities, private keys) for their success.
– Fake keys ($DS_{7}$) [1]: The defender gives fake keys to attackers aiming to compromise keys and obtain confidential information with them. Particularly, when an attacker intends to compromise cryptographic keys when taking $AS_{2}$, $AS_{6}$, $AS_{7}$, or $AS_{8}$, this prevents the attacker from compromising legitimate nodes by giving it a fake key.
– Hiding network topology edges ($DS_{8}$) [57]: The defender hides some edges in the network. To select edges to hide, we select $C_{NT}$ of the nodes with the highest criticality and, for each, select an edge to the adjacent node with the highest criticality, protecting more important nodes with higher priority. This ensures hiding $C_{NT}$ edges that can change attack paths. When the attacker performs $AS_{3}$-$AS_{8}$, this strategy protects the system by hiding the neighbors of a compromised node and preventing the escalation of those attacks.
When the defender takes defensive deception techniques, including
2) Defender’s Hypergame Expected Utility (DHEU):
DHEU is a function of the defender’s perceived uncertainty, the utility of each strategy, and its belief.
Defender’s Uncertainty: We estimate the defender’s uncertainty towards attacker \begin{equation*} g^{D_{t}}(i) = 1-\exp \left ({-\mu \cdot \mathrm {ad}_{i}/T^{D_{t}}_{i}}\right), \tag{11}\end{equation*}
Defender’s Utility: The utility of defense strategy \begin{align*} G_{qp}^{D_{t}}=&\mathrm {di}_{q}^{t} + \mathrm {ac}_{p}^{t}, \; \; L_{qp}^{D_{t}} = \mathrm {dc}_{q}^{t} + \mathrm {ai}_{p}^{t}. \tag{12}\\ u_{qp}^{D_{t}}=&G_{qp}^{D_{t}} - L_{qp}^{D_{t}}, \; \; \tag{13}\end{align*}
Defender’s Belief: The defender forms its belief, \begin{equation*} r_{\kappa q}^{D_{t}} = \frac {\gamma _{\kappa q}^{D_{t}}}{\sum _{q' \in \mathbf {DS}_{\kappa }} \gamma _{\kappa q'}^{D_{t}}}, \; \; c_{\kappa p}^{A_{t}} = \frac {\gamma _{\kappa p}^{A_{t}}}{\sum _{p' \in \mathbf {AS}_{\kappa }} \gamma _{\kappa p'}^{A_{t}}}. \tag{14}\end{equation*}
The defender’s HEU (DHEU) is estimated by summing up all HEUs for all attacks \begin{equation*} {\mathrm{ DHEU}}\left ({rs_{q}^{D}}\right) = \sum _{i \in A_{t}} {\mathrm{ HEU}}_{i} \left ({rs_{q}^{D}, g^{D_{t}}(i)}\right). \tag{15}\end{equation*}
\begin{align*} {\rm C{-}DHEU}_{S_{d}} = \sum _{q \in S_{d}} {\mathrm{ DHEU}} \left ({rs_{q}^{D}}\right), \; \; s.t. \; \; \sum _{q \in S_{d}} \mathrm {dc}_{q}^{t} \leq B, \tag{16}\end{align*}
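The budget-constrained bundle selection of Eq. (16) can be solved by brute force over the subsets of defense strategies; a minimal sketch (names are ours):

```python
from itertools import combinations

def best_bundle(dheu, cost, budget):
    # Eq. (16): enumerate every subset of defense strategies whose
    # total cost fits the budget B and keep the one with the largest
    # collective DHEU; brute force is cheap for 8 strategies
    # (2^8 candidate bundles).
    best, best_util = set(), float('-inf')
    for r in range(1, len(dheu) + 1):
        for bundle in combinations(dheu, r):
            if sum(cost[q] for q in bundle) <= budget:
                util = sum(dheu[q] for q in bundle)
                if util > best_util:
                    best, best_util = set(bundle), util
    return best, best_util
```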
Fig. 2 describes how each player (i.e., an attacker and a defender) views the same game differently and makes decisions based on their observation and beliefs based on our discussions in this section. In addition, to clarify the game scenario considered in this work, we show an extensive-form game between an attacker and a defender in Fig. 1 of the supplement document.
Our model provides substantial details of attacker-defender interactions while avoiding unnecessary complexity in the hypergame. For example, the attack and defense costs provide a basis for estimating the two players’ expected utilities; they can be mapped to real-world costs and easily scaled to a larger range.
C. Game Theory-Guided, Machine Learning-Based Defense Strategy Selection
We introduce a defense strategy selection method based on game theory-guided machine learning (ML), where the game has either perfect or imperfect information. Under imperfect information, we formulate a hypergame, as in Section IV, where players observe opponents under uncertainty and select strategies based on the AHEU/DHEU. The defender can predict an optimal defense strategy using ML, forming the best bundle strategy based on learning from a training dataset in which the attacker follows game-theoretic (either hypergame or conventional game) choices of its best strategy. When generating training datasets, for each game, the attacker chooses its strategy based on AHEU while the defender takes its defense strategy at random. This covers a wide range of defender behaviors and their utilities, allowing an ML technique to better predict the defender’s best bundle strategy across all possible combinations of the eight defense strategies.
We employ the decision tree (DT) algorithm [58] to solve this classification problem because DT shows the best performance (highest MTTSF) empirically, compared to other ML techniques (e.g.,
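A sketch of the DT-based selection using scikit-learn; the feature set and labels below are illustrative simplifications of the paper's game features, not its exact training data:

```python
from sklearn.tree import DecisionTreeClassifier

# Each training row is one game observation; the features are a
# simplified stand-in for the game features mentioned in the text
# (attack cost, attack impact, CKC stage), and the label is the index
# of the defense strategy that performed best in that game.
X = [[1, 0.2, 1], [3, 0.8, 2], [3, 0.9, 5], [2, 0.5, 3],
     [1, 0.1, 1], [3, 0.7, 6], [2, 0.6, 4], [1, 0.3, 2]]
y = [1, 5, 6, 3, 1, 8, 4, 2]  # hypothetical DS indices

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
# Predict a defense strategy index for a newly observed game state.
predicted_ds = clf.predict([[3, 0.85, 5]])[0]
```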
Experiment Setup
This section provides the details of the simulation environment, comparing schemes, and metrics used for the experiments.
A. Simulation Environment
We consider a network of
In each subgame, a new attacker arrives as an outside attacker with probability
We develop our own simulation model for flexibility and for the following reasons. First, since we need to implement 16 strategies in all, using an existing simulator would add too much complexity. Second, due to the unique nature of hypergame theory, we need a game framework in which each player has its own view of the given game, which would add unreasonably high complexity and overhead to an existing simulator.
All simulations use Python 3.6 and the Scikit-learn 0.19.1 library for the DT algorithm. The source code is available on GitHub [60]. We provide the key design parameters, their meanings, and the default values used in our experiments in Table I of the supplement document.
B. Comparing Schemes
We compare the performance of eight schemes based on the combinations of: (a) with DD, with DD using ML (DD-ML), or without DD (No-DD); and (b) hypergame (HG) with imperfect information (IPI), traditional game (G) with perfect information (PI), or Random. Except for Random, which is considered only with and without DD, both HG and G are considered with all three DD settings. These combinations give the eight schemes: HG-DD-IPI, G-DD-PI, HG-DD-ML-IPI, G-DD-ML-PI, DD-Random, HG-No-DD-IPI, G-No-DD-PI, and No-DD-Random.
C. Metrics
We use the following metrics for our experiments:
Players' Uncertainty ($g^{A}$ or $g^{D}$): An attacker's or a defender's mean uncertainty, measured based on Eqs. (6) and (11), respectively, by:
\begin{align*} g^{A} = \frac {\sum _{t \in T} \frac {\sum _{i \in S_{t}} g_{i}^{A_{t}}}{|S_{t}|}}{|T|}, \; \; g^{D} = \frac {\sum _{t \in T} \frac {\sum _{i \in S_{t}} g^{D_{t}}(i)}{|S_{t}|}}{|T|}, \tag{17}\end{align*}
where $S_{t}$ is a set of attackers in game $t$ and $T$ is a set of subgames in the given whole repeated game.

Collective Hypergame Expected Utility (C-HEU): This refers to the mean C-HEU for attackers and a defender. The defender's mean C-HEU, namely C-DHEU, for a repeated game with multiple subgames until the system fails is:
\begin{equation*} {\mathrm{ C-DHEU}} = \frac {\sum _{t \in T} {\mathrm{ C-DHEU}}_{S_{d}}^{t}}{|T|}, \tag{18}\end{equation*}
where ${\mathrm{C-DHEU}}_{S_{d}}^{t}$ is the defender's HEU for the selected strategies in a given defense bundle at game $t$. The attacker's C-AHEU is:
\begin{equation*} {\mathrm{ C-AHEU}} = \frac {\sum _{t \in T} {\mathrm{ C-AHEU}}_{S_{a}}^{t}}{|T|}, \tag{19}\end{equation*}
where ${\mathrm{C-AHEU}}_{S_{a}}^{t}$ is the sum of AHEUs for all attack strategies taken by all attackers and is measured by:
\begin{equation*} {\mathrm{ C-AHEU}}_{S_{a}}^{t} = \sum _{\left ({i, p}\right) \in S_{a}} {\mathrm{ AHEU}}\left ({rs_{ip}^{A_{t}}}\right), \tag{20}\end{equation*}
where $S_{a}$ is a set of pairs of attackers and their chosen strategies in a given game $t$, $rs_{ip}^{A_{t}}$ is strategy $p$ chosen by attacker $i$, and $g^{A_{t}}_{i}$ is $i$'s uncertainty in given game $t$.

Mean Time To Security Failure (MTTSF): Measures the average system lifetime based on Eq. (3).
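As a minimal illustration of Eqs. (17)–(20), the sketch below computes the mean uncertainty and the collective HEUs from hypothetical per-subgame data; all data structures and values are assumptions for the example.

```python
# Eq. (17): mean uncertainty over subgames t in T, averaging over players i in S_t.
def mean_uncertainty(uncertainty_per_subgame):
    """uncertainty_per_subgame[t] is a list of g_i^{A_t} (or g^{D_t}(i)) values."""
    per_game = [sum(g) / len(g) for g in uncertainty_per_subgame]
    return sum(per_game) / len(per_game)

# Eqs. (18)/(19): mean of the per-subgame collective HEU over the |T| subgames.
def collective_heu(heu_per_subgame):
    return sum(heu_per_subgame) / len(heu_per_subgame)

# Eq. (20): per-subgame collective AHEU, summed over (attacker i, strategy p) pairs.
def c_aheu_t(aheu, s_a):
    """aheu[(i, p)] is AHEU(rs_{ip}^{A_t}); s_a is the set of (i, p) pairs."""
    return sum(aheu[(i, p)] for (i, p) in s_a)

# Hypothetical data: two subgames with 2 and 3 attackers, respectively.
g_A = mean_uncertainty([[0.2, 0.4], [0.1, 0.3, 0.5]])  # ~0.3
c_aheu_1 = c_aheu_t({(1, "p1"): 1.5, (2, "p3"): 2.5},
                    {(1, "p1"), (2, "p3")})            # 4.0
c_aheu = collective_heu([c_aheu_1, 2.0])               # (4.0 + 2.0) / 2 = 3.0
```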
TPR or FPR of the NIDS: The mean true or false positive rate of the NIDS, used to measure the effectiveness of DD strategies in improving the NIDS.
HNE Hitting Ratio: Measures the fraction of games in which the attacker's HNE (${\mathrm{HNE}}(G^{A})$) matches the defender's HNE (${\mathrm{HNE}}(G^{D})$), where $G^{A}$ and $G^{D}$ are the games viewed by the attacker and the defender, respectively.
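The HNE hitting ratio can be sketched as the fraction of repeated-game rounds in which the two players' equilibria coincide; the profile encoding below is a hypothetical assumption for illustration.

```python
# HNE hitting ratio: fraction of rounds where HNE(G^A) matches HNE(G^D).
def hne_hitting_ratio(attacker_hnes, defender_hnes):
    """Each list holds one equilibrium strategy profile per repeated-game round."""
    assert len(attacker_hnes) == len(defender_hnes)
    hits = sum(a == d for a, d in zip(attacker_hnes, defender_hnes))
    return hits / len(attacker_hnes)

# Hypothetical per-round equilibria as (attack strategy, defense strategy) pairs.
ratio = hne_hitting_ratio(
    [("a1", "d2"), ("a2", "d1"), ("a1", "d1")],
    [("a1", "d2"), ("a1", "d1"), ("a1", "d1")],
)  # 2/3: rounds 1 and 3 match
```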
Results & Analyses
In this section, we demonstrate the effect of varying the attacker arrival probability on the performance metrics in Section V-C. Due to the space constraint, we present additional results for the comparative performance analyses under varying vulnerability bounds and under a fixed setting in Appendix B of the supplemental document. We also discuss Hyper Nash Equilibrium to investigate how the views of an attacker match those of a defender, which plays a key role in defensive deception. Lastly, we discuss the insightful implications of the hypergame-theoretic defensive deception techniques for real-world systems.
A. Effect of Varying the Attacker Arrival Probability
Fig. 3 shows the comparative performance of the eight schemes in terms of MTTSF, TPR, and FPR with respect to varying the attacker arrival probability.

[Fig. 3: MTTSF, TPR of the NIDS, and FPR of the NIDS with respect to varying the attacker arrival probability.]

Fig. 3(b) demonstrates the TPR of the NIDS under the eight schemes as the attacker arrival probability varies. Figs. 4(a) and 4(b) show how the eight schemes perform with respect to varying the attacker arrival probability.

[Fig. 4: Uncertainty and collective HEU (C-HEU) of the two players with respect to varying the attacker arrival probability.]

Figs. 4(c) and 4(d) show how the eight schemes perform as the attacker arrival probability varies.
B. Hyper Nash Equilibrium
Nash Equilibrium (NE) is a widely adopted solution concept in game theory (i.e., a prescription of which action to take) that assumes each player knows the equilibrium strategies of all other players and holds correct beliefs about the strategies the other players will take. This assumption of accurate beliefs about opponents does not reflect real-world situations. HNE captures how players' choices of strategies under uncertainty differ from NE. In a hypergame, players do not share the same view of the game and may hold incorrect beliefs about the next moves of other players. Due to uncertainty, a player's best response under its inaccurate beliefs about an opponent may not align with the best response assumed in NE; HNE accommodates such biased observations. Although common knowledge about what players know about other players can shape their beliefs, it does not guarantee that all players maintain correct beliefs. Therefore, NE cannot be applied in this hypergame setting [15], [61], [62], [63].
For a game, a strategy profile $a^{*}$ is an NE if:
\begin{equation*} v_{i}\left ({a^{*}_{i}, a^{*}_{-i}}\right) \geq v_{i}\left ({a_{i}, a^{*}_{-i}}\right)\; \; {\mathrm{ for}}\; \; \forall i, \forall a_{i},\tag{21}\end{equation*}
where $v_{i}$ is player $i$'s utility, $a_{i}$ is an action of player $i$, and $a^{*}_{-i}$ denotes the equilibrium actions of all players other than $i$.
This implies that, under NE, the attacker and the defender view the same game and each player's beliefs about the other's strategy are correct.
In HNE, each player has its own view of the given game and selects its best response within that perceived game.
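The best-response condition in Eq. (21) can be checked directly on a small normal-form game; the toy two-player payoffs below are hypothetical.

```python
# Check Eq. (21): profile a* is an NE iff no player i can gain by a unilateral
# deviation a_i while the others keep a*_{-i}.
def is_nash(payoffs, profile):
    """payoffs[i][(a_0, a_1, ...)] is player i's utility v_i; profile is a tuple."""
    actions = [sorted({a[i] for a in payoffs[0]}) for i in range(len(profile))]
    for i in range(len(profile)):
        for alt in actions[i]:
            deviated = list(profile)
            deviated[i] = alt
            if payoffs[i][tuple(deviated)] > payoffs[i][profile]:
                return False  # player i prefers deviating: Eq. (21) is violated
    return True

# Hypothetical attacker (player 0) and defender (player 1) utilities.
v_att = {("atk", "dec"): 1, ("atk", "no"): 3, ("idle", "dec"): 0, ("idle", "no"): 2}
v_def = {("atk", "dec"): 3, ("atk", "no"): 0, ("idle", "dec"): 1, ("idle", "no"): 2}

print(is_nash((v_att, v_def), ("atk", "dec")))  # True: no profitable deviation
print(is_nash((v_att, v_def), ("atk", "no")))   # False: the defender would switch to "dec"
```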
C. Implications in Real-World Applications
Defensive deception technologies include honeypots to detect threats from the Internet [64], protect the data of governments or companies [65], [66], or lure attackers to protect high-value system assets [67]. However, they are mainly limited to honeypot development or deployment. Although game theory is not fully applied in practice, a rich volume of game-theoretic DD techniques have been proposed and validated via analytical, simulation, emulation, or testbed experiments [4]. Our work makes the following contributions. First, we validate our model both theoretically and empirically. Specifically, for theoretical validation, we evaluated our model based on the core metrics of game theory, including utilities, the perceived uncertainties of players, and the gaps between decisions made in a hypergame under uncertainty and in a traditional game under complete certainty. In addition, we evaluated our model using system security and performance metrics, i.e., system lifetime and the FPR and TPR of a NIDS. Second, we examined how players' perceived uncertainty in real environments can impact their decisions in choosing strategies. We showed how HEU enables calibrating the expected utility per strategy with each player's perceived uncertainty applied. Further, we investigated to what extent HNE can be achieved to reflect real-world systems under high uncertainty, and compared NE solutions with HNE solutions to explain the gaps to be addressed. Third, we conducted extensive sensitivity analyses of how the proposed model performs with respect to the number of games, which provide an in-depth understanding of our hypergame model and its real-world performance.
Conclusion & Future Work
From this study, we obtained the following key findings:
Defensive deception (DD) can significantly extend system lifetime. It can further extend the lifetime in games of imperfect information, maximizing the effectiveness of DD.
With DD, the defender's uncertainty decreases while the attacker's uncertainty does not fluctuate much. This is because DD contributes to collecting more intelligence about detected inside attackers.
MTTSF and the TPR/FPR of the NIDS are key to system security. However, we found that a higher DHEU does not necessarily lead to a higher MTTSF or TPR, because the defense utility counts the defense cost in addition to the defense impact, which represents defense effectiveness.
Game-theoretic ML-based defense solutions provide higher performance mainly when attackers arrive slowly, while the players' perceived uncertainty does not appear to be a critical factor affecting MTTSF. This is because the game-theoretic ML-based scheme simply predicts an optimal defense strategy based on the patterns identified during the training phase, without considering the defender's perceived uncertainty.
We derived the Hyper Nash Equilibrium (HNE) of the attackers and the defender and investigated how well they match each other. Since the players' views are not in sync in a hypergame, we observed a fairly low HNE hitting ratio, i.e., the attacker and defender rarely reach the same HNE. In particular, under the hypergame of imperfect information, we observed a lower HNE hitting ratio than under the game of perfect information. This helps the defender perform better, prolonging system lifetime by providing chances to manipulate the attacker's perceived uncertainty.
Future research: (1) To enhance the prediction accuracy of opponents’ strategies using an ML-based approach under high dynamics and uncertainty, we plan to evaluate deep reinforcement learning (DRL) algorithms and time series algorithms, such as Autoregressive Integrated Moving Average (ARIMA); (2) We will consider additional intelligent adversaries that can also perform ML-based or DRL-based strategy selection and investigate the game dynamics and the performance of the attackers and defender in repeated games; and (3) Currently we only considered vacuity as an uncertainty dimension caused by a lack of information or knowledge. In formulating the hypergame expected utility functions, we will also consider other types of uncertainty caused by different root causes, such as conflicting information/opinions or different observations on a single event, and investigate their impact on the choices of strategies by players.
ACKNOWLEDGMENT
The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.