Interplay Between Malware Epidemics and Honeynet Potency in Industrial Control System Network

Industrial Control Systems (ICS) are widely used in industrial processes such as power grids, water conservancy, natural gas and petrochemicals. More and more cyber attacks are targeting ICS worldwide. This paper presents a novel honeynet-based epidemic model for the ICS network. The honeynet is an active approach that can attract malware attacks and provide sample information and an immunization strategy for the malware. An epidemic model with immunization and quarantine in the ICS network is formulated to explore the dynamics of malware propagation, and the honeynet potency is analyzed as well. Theoretical analysis reveals the disease-free and endemic equilibria of our model, and the local and global stability of the disease-free (endemic) equilibrium is then examined by means of the basic reproduction number. Furthermore, numerical experiments show that honeypots with more system vulnerabilities are conducive to suppressing the malware epidemic, and that a honeynet with a lower average degree or a lower power-law index can be more effective. In addition, simulation experiments show the actual behavior of malware propagation in the ICS network and verify our derivations.

Blaster means that the security situation of ICS is becoming increasingly severe.
The conventional techniques against malware intrusion are usually based on a known set of malware samples [14]. Since the ICS network differs from the Internet and the system cannot be stopped at will, restarting and patching are not very suitable methods for ICS network security. Honeypot technology, however, provides an approach to this problem and brings a great innovation to ICS network security. The honeypot does not affect the normal operation of the ICS; it is essentially a technique for deceiving an attacker. By arranging some hosts, network services or information as bait, the attacker is induced to attack them, so that the attack behavior can be captured and analyzed. The honeypot can elucidate the malware's characteristics by monitoring and analyzing the attacks, and can thus provide feedback about the immunization strategy to defenders. Meanwhile, attacks captured by honeypots can also serve as an "early warning" system for defenders, which gives them more time to deal with the attacks and wastes the attacker's time and resources. The honeynet builds on honeypots: it is a network system rather than a single honeypot, and it looks more like an ordinary network. Usually, the honeynet consists of a large number of honeypots, and it is "a security resource whose value lies in being probed, attacked and compromised" [16]. The honeynet does not depend on any specific ICS architecture, and it can respond to attacks to gain information more efficiently. State-of-the-art studies use honeynets to enhance information security against attacks on ICS or Internet of Things (IoT) systems [32]. Even though there are quite a few ways to evade those attacks, there is a crucial need for one that can turn the tables on the attacker by using an active approach. The honeynet is a very suitable solution [33].

B. LITERATURE REVIEW
Currently, most studies of honeynets focus on the technology level, such as data capture [16], information collection [17] and virtualization technology [18]. Ren and Xu [19] propose a compartmental model to explore the interplay between disease epidemics and honeynet potency. However, the interplay between disease epidemics and the honeynet needs a more thorough discussion, and the communicating rate ij (t) is neglected in that model, although it is an important factor in coupled complex networks. The honeynet has not been widely discussed within a systematic framework of malware epidemics in the ICS network. The dynamics in a honeynet are more complicated than those in individual honeypots. It is therefore necessary to model and discuss the way that malware spreads in a honeynet; such a study will help to inhibit malware propagation and to design and deploy the honeynet more effectively and economically.
In the past decades, some traditional epidemic models of infectious diseases have been used to describe the propagation of Internet worms [5], and the SIS model [6] and SIR model [7], inspired by human infectious diseases, were proposed later. Mathematical models [8]-[12] inspired by the SIR model have then been employed to inhibit the spread of worms.
In addition, some approaches to detecting and constraining malware have been studied, such as firewalls, intrusion detection systems (IDS) and benign-worm defense systems [13]. Epidemic dynamics on complex networks have been studied by using mathematical models and computational methods [20]-[22]. Some studies have shown that the topology of a network (such as its average degree and power-law index) has a significant influence on the propagation of malware [23]-[26]. Wang et al. [27] investigate epidemic propagation on three-layer interdependent networks, and their analysis provides a basic framework for a better understanding of epidemic propagation on multi-layered complex networks. A general formal study that obtains the reproduction number and discusses the positivity and stability properties of equilibrium points is presented in [34]. The existence and stability of equilibrium points of such models have been extensively studied [35], [36]. Consequently, epidemic models of malware are used as a reference in our study.

C. PAPER ORGANIZATION
Inspired by the works mentioned above, we propose an epidemic model in the ICS network to study the interplay between malware epidemics and honeynet potency. This paper is organized as follows. Section II introduces a honeynet-based malware propagation model in a two-layer (ICS network and honeynet) complex network. Section III analyzes the epidemic dynamics of our proposed malware propagation system; the disease-free equilibrium, the endemic equilibrium and their stability are discussed in detail. Numerical and simulation experiments are shown in Section IV and Section V, respectively. Section VI and Section VII present the discussion and the conclusion of this paper, respectively.

II. MODEL FORMULATION
In this section, we aim to formulate an epidemic model with a quarantine strategy. Since the two networks (ICS network and honeynet) are connected in some way, a two-layer, coupled network is formed. The basic topology of the two-layer complex network is shown in Figure 1. Each PLC (resp. honeypot) has two types of link degrees: internal degrees and external degrees. Some models have introduced a quarantine strategy into epidemic dynamics, but they are only suitable for a single-layer network [31]. We assume that the PLCs and honeypots in the ICS network are the nodes, and that the links represent the communication relationships among the nodes. The total network size is fixed, and the nodes are divided into two categories, namely PLCs and honeypots, with respective network sizes $N_p$ and $N_H$.

A. THE PROPAGATION OF MALWARE AMONG PLCS
In our model, the PLCs are partitioned into four compartments depending on the states of the nodes under malware attacks: the susceptible state ($S_p$, which denotes the number of PLCs vulnerable to malware attacks), the infected state ($I_p$, which denotes the number of infected PLCs), the quarantined state ($Q_p$, which denotes the number of quarantined PLCs) and the recovered state ($R_p$, which denotes the number of PLCs that have recovered after infection). At every time step, susceptible PLCs can be infected by infected PLCs at a rate $\beta_{11}$, and infected PLCs can be restored by using anti-malware software or installing patches at a rate $\phi_1$. In some cases the patches are temporary and may lose their effect if the system is updated or the malware mutates. When malware variants or unknown malware appear, recovered PLCs may become susceptible again at a rate $\delta_1$.

B. THE PROPAGATION OF MALWARE AMONG HONEYPOTS
A honeypot is designed to trap malware by exposing system vulnerabilities, such as opening a service port and running without system patches; thus, two facts hold for a honeypot [19]. First, an infected honeypot cannot infect other connected nodes, including PLCs and honeypots. Second, because the honeypot is designed to attract malware attacks, it is not immune to them. However, a honeypot can be quarantined at a rate $\phi_2$. Thus, a honeypot can be in one of three states: the susceptible state ($S_H$, which denotes the number of uninfected honeypots with seductive baits), the infected state ($I_H$, which denotes the number of infected honeypots that have successfully captured malware samples) and the quarantined state ($Q_H$, which denotes the number of quarantined honeypots). By reinstalling the system, a quarantined honeypot can become susceptible again at a rate $\delta_2$.

C. INTERACTION BETWEEN HONEYPOTS AND PLCS
In the honeynet context, an infected PLC can infect a susceptible honeypot at a rate $\beta_{22}$. As mentioned above, an infected honeypot can never infect other nodes, so susceptible nodes can only be infected by infected PLCs. However, an infected honeypot can capture malware samples. Once the information related to a malware sample is captured by a honeypot, a quarantine strategy for PLCs can be carried out, and susceptible PLCs are quarantined at a rate $\beta_{12}$. Specific immunization measures against the malware are then taken, so a part of the quarantined PLCs turn to the recovered state at a rate $\omega$.
We assume that the honeynet is heterogeneous. A two-layer complex network is then formed in which the node degree distribution follows a power law ($P(k) = k^{-\gamma}$), where $P(k)$ stands for the probability that a randomly chosen node within the network has degree $k$. For a PLC, we use $(i, j)$ to denote its degree, which means that it is connected with $i$ (internal) other PLC nodes and $j$ (external) honeypot nodes. Similarly, for a honeypot, the degree $(k, l)$ means that it is connected with $k$ (internal) honeypot nodes and $l$ (external) PLC nodes. The maximum node degrees of the network are $n_{11} = \max\{i\}$, $n_{12} = \max\{j\}$, $n_{21} = \max\{k\}$ and $n_{22} = \max\{l\}$. We further define $P_P(i, j)$ and $P_H(k, l)$ as the joint degree distributions of PLCs and honeypots, respectively. The marginal degree distributions are
$$P_P(i,\cdot) = \sum_{j} P_P(i,j), \quad P_P(\cdot,j) = \sum_{i} P_P(i,j), \quad P_H(k,\cdot) = \sum_{l} P_H(k,l), \quad P_H(\cdot,l) = \sum_{k} P_H(k,l).$$
The average degree values are as follows:
$$\langle k \rangle_{11} = \sum_{i} i\, P_P(i,\cdot), \quad \langle k \rangle_{12} = \sum_{j} j\, P_P(\cdot,j), \quad \langle k \rangle_{21} = \sum_{k} k\, P_H(k,\cdot), \quad \langle k \rangle_{22} = \sum_{l} l\, P_H(\cdot,l).$$
Let $S^P_{i,j}(t)$, $I^P_{i,j}(t)$, $Q^P_{i,j}(t)$, $R^P_{i,j}(t)$ and $S^H_{k,l}(t)$, $I^H_{k,l}(t)$, $Q^H_{k,l}(t)$ denote the numbers of susceptible, infected, quarantined and recovered PLCs and of susceptible, infected and quarantined honeypots with the given node degree at time $t$, respectively. Note that there is no recovered state for the honeypots, because a honeypot is designed to attract malware attacks and no honeypot is immune to them. Thus, the total numbers of common nodes and honeypots over the network with a given degree at time $t$ are
$$N^P_{i,j}(t) = S^P_{i,j}(t) + I^P_{i,j}(t) + Q^P_{i,j}(t) + R^P_{i,j}(t), \quad N^H_{k,l}(t) = S^H_{k,l}(t) + I^H_{k,l}(t) + Q^H_{k,l}(t).$$
Then we have the following equations: In our model, we assume that the connectivity of nodes is uncorrelated.
Thus, for susceptible PLCs, the probability of communicating with infected PLCs is: and the probability of communicating with susceptible honeypots is: Similarly, for susceptible honeypots, the probability of communicating with infected honeypots is: and the probability of communicating with infected PLCs is: The transition diagram among the states of the nodes in the two-layer complex network is shown in Figure 2 according to the transition relationships and formulations.
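To make the transitions described in this section concrete, the following is a minimal sketch of a well-mixed (degree-free) simplification of the model. It is not system (8): the degree structure is dropped, the mass-action form of each term is an assumption, the coupling of the quarantine term to the number of infected honeypots is our reading of the text, and all parameter values in `PARAMS` are hypothetical.

```python
def derivatives(y, p):
    """Right-hand side of a well-mixed simplification (an assumption, not system (8))."""
    s_p, i_p, q_p, r_p, s_h, i_h, q_h = y
    infect_plc = p["beta11"] * s_p * i_p   # S_p -> I_p, driven by infected PLCs
    quarantine = p["beta12"] * s_p * i_h   # S_p -> Q_p, assumed driven by infected honeypots
    infect_hp = p["beta22"] * s_h * i_p    # S_H -> I_H, infected honeypots never infect
    return (
        -infect_plc - quarantine + p["delta1"] * r_p,             # dS_p/dt
        infect_plc - p["phi1"] * i_p,                             # dI_p/dt
        quarantine - p["omega"] * q_p,                            # dQ_p/dt
        p["phi1"] * i_p + p["omega"] * q_p - p["delta1"] * r_p,   # dR_p/dt
        -infect_hp + p["delta2"] * q_h,                           # dS_H/dt
        infect_hp - p["phi2"] * i_h,                              # dI_H/dt
        p["phi2"] * i_h - p["delta2"] * q_h,                      # dQ_H/dt
    )


def integrate(y0, p, dt=0.1, steps=5000):
    """Forward-Euler integration; returns the state after `steps` steps."""
    y = tuple(float(v) for v in y0)
    for _ in range(steps):
        dy = derivatives(y, p)
        y = tuple(v + dt * d for v, d in zip(y, dy))
    return y


# Hypothetical parameter values chosen only for illustration.
PARAMS = {
    "beta11": 2e-5, "beta12": 1e-5, "beta22": 2.4e-5,
    "phi1": 0.1, "phi2": 0.1, "omega": 0.05,
    "delta1": 0.02, "delta2": 0.05,
}
# 10000 PLCs (250 initially infected) and 10000 susceptible honeypots.
Y0 = (9750, 250, 0, 0, 10000, 0, 0)

final = integrate(Y0, PARAMS)
print("PLC compartments (S, I, Q, R):", [round(v, 1) for v in final[:4]])
print("Honeypot compartments (S, I, Q):", [round(v, 1) for v in final[4:]])
```

Because every outflow of one compartment is the inflow of another, the sketch keeps the totals $N_p$ and $N_H$ fixed, which matches the assumption of a fixed network size.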

III. EQUILIBRIUM AND STABILITY ANALYSIS OF THE MODEL
The dynamical behavior of system (8) proposed in Section II is studied in this section. The equilibria of (8), under which the malware either remains epidemic or becomes extinct, are determined; these are the endemic equilibrium and the disease-free equilibrium, respectively. The local and global stability of the disease-free (endemic) equilibrium is then examined by means of the basic reproduction number $R_0$.

A. DISEASE-FREE EQUILIBRIUM
For PLCs and the honeypots, we have To express the disease-free equilibrium, we set all equations in (8) to be zero with $I^P_{i,j}(t) = 0$ for all $i$ and $j$, and $I^H_{k,l}(t) = 0$ for all $k$ and $l$. Then we have Obviously, the disease-free equilibrium is

B. ENDEMIC EQUILIBRIUM
The endemic equilibrium means that there are no more state changes in our model while infected PLCs and honeypots are still present in the complex network. In our model, the endemic equilibrium is obtained by setting all equations in (8) to zero (no more state changes), which leads to the following equations, However, since there are infectious nodes, the conditions $I^P_{i,j}(t) = 0$ and $I^H_{k,l}(t) = 0$ no longer hold. Instead, $I^P_{i,j}(t)$ and $I^H_{k,l}(t)$ now take some positive values. Thus, all the derivatives on the left-hand side of (8) are set to zero and, for given $I^{P*}_{i,j}$ and $I^{H*}_{k,l}$, we can derive the endemic equilibrium point:

C. STABILITY OF THE MODEL
To analyze the stability of the model, we calculate the basic reproduction number $R_0$ in this subsection. The basic reproduction number $R_0$ is an important threshold parameter in epidemiology, which is "the expected number of secondary cases produced, in a completely susceptible population, by a typical infective individual" [27]. Driessche et al. describe a general compartmental disease transmission model suited to heterogeneous populations and demonstrate a detailed method for calculating the basic reproduction number [28]. It is shown that if $R_0 < 1$, the malware will die out, whereas if $R_0 > 1$, there will exist an endemic equilibrium. For a heterogeneous population, $R_0$ is characterized as the spectral radius of the next generation matrix. Driessche and Watmough [29] put forward a general compartmental disease transmission model suitable for heterogeneous populations and illustrate a detailed method for calculating $R_0$.
Because the joint degree distributions are assumed to be independent in this paper, $P_P(i, j) = P_P(i, \cdot) P_P(\cdot, j)$. For brevity, we denote $i^P_{0,0} = y_1, \ldots, i^P_{0,n_{12}} = y_{n_{12}+1}$, $i^P_{1,0} = y_{n_{12}+2}, \ldots, i^P_{1,n_{12}} = y_{2n_{12}+2}$, $\ldots$, $i^P_{n_{11},0} = y_{n_{11}(n_{12}+1)+1}, \ldots, i^P_{n_{11},n_{12}} = y_{(n_{11}+1)(n_{12}+1)}$. Similarly, $i^H_{0,1} = y_{(n_{11}+1)(n_{12}+1)+1}, \ldots, i^H_{n_{21},n_{22}} = y_n$, where $n = (n_{11}+1)(n_{12}+1) + (n_{21}+1)(n_{22}+1)$. We also write $f = (f_1, f_2, \cdots, f_n)$, where $f_i$ represents the rate of change of infection compartment $i$ and $\frac{dy(t)}{dt} = f(y(t))$. Obviously, at the disease-free equilibrium $E_0$, $y_i = 0$, $i = 1, \cdots, n$. Then the basic reproduction number of the model is $R_0 = \rho(FV^{-1})$, where $\rho(FV^{-1})$ is the spectral radius of the next generation matrix $FV^{-1}$ [30], $F$ is the rate at which new infections occur, and $V$ is a diagonal matrix giving the rate at which individuals are transferred out of the original group.
The next generation matrix is a complicated matrix (15), where $A_{i,j}$, $B_{i,j}$, $C_{i,j}$ and $D_{i,j}$ are block matrices, and each element of them represents a sub-matrix. Through a series of similarity transformations, matrix (15) can be simplified. Because the joint degree distributions are independent in our model, matrix (15) can be further simplified through a series of transformations to matrix (17), where $E$ is the unit matrix of (16), and we can calculate the eigenvalues $\lambda_1$ and $\lambda_2$ of matrix (17). According to [29], since the next generation matrix is nonnegative, and according to the Perron-Frobenius theorem, $R_0 = \max\{|\lambda_1|, |\lambda_2|\}$ is a positive eigenvalue of the next generation matrix. Based on [28], Chavez et al. outline the second generation operator approach for the computation of the basic reproductive number [37]. We denote $s(F - V) = \max\{\operatorname{Re}\lambda_i\}$, where $\lambda_i$ is an eigenvalue of $F - V$ and $\operatorname{Re}\lambda_i$ is the real part of $\lambda_i$. The following relation holds:
$$R_0 < 1 \iff s(F - V) < 0, \qquad R_0 > 1 \iff s(F - V) > 0.$$
The results in [28] are suited to heterogeneous groups. The PLCs and honeypots in our model are heterogeneous, and the coupled network in our model is a heterogeneous complex network, so the results of [28] are applicable to our model. Therefore, we can obtain the following theorems.
Theorem 1: If the basic reproduction number $R_0 < 1$, the disease-free equilibrium $E_0$ is locally asymptotically stable; if $R_0 > 1$, $E_0$ is unstable.
Then the positivity and boundedness of the solutions of (8) are examined.
Lemma 1: Consider an autonomous differential system $\frac{dy}{dt} = f(y)$, where $y \in \mathbb{R}^n$ and $f : \mathbb{R}^n_+ \to \mathbb{R}^n$ is a continuously differentiable map. The following conditions are assumed.
From Theorem 3, we prove that equation (19) is an accurate threshold for disease transmission. If there are infected nodes at the initial moment, then regardless of how many there are, the malware will slowly die out as long as $R_0 \le 1$. According to equation (19), $R_0$ is related to the average degrees ($\langle k \rangle_{11}$, $\langle k \rangle_{22}$), the recovery (quarantine) rates ($\phi_1$, $\phi_2$) and the infection rates ($\beta_{11}$, $\beta_{22}$) in our model.
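As an illustration of the threshold computation used in this section, the sketch below evaluates the spectral radius of a next generation matrix, written $FV^{-1}$ in the framework of [28], for the reduced case of two infection compartments, where $R_0 = \max\{|\lambda_1|, |\lambda_2|\}$. The entries of `F_EXAMPLE` and `V_DIAG_EXAMPLE` are hypothetical numbers, not the matrices of our model, and the helper names are ours.

```python
import math


def next_generation_2x2(F, v_diag):
    """Return K = F V^{-1} for a 2x2 matrix F and a diagonal V given by its diagonal."""
    return [[F[r][c] / v_diag[c] for c in range(2)] for r in range(2)]


def eigenvalues_2x2(K):
    """Eigenvalues of a 2x2 matrix with nonnegative off-diagonal entries (always real)."""
    (a, b), (c, d) = K
    disc = (a - d) ** 2 + 4.0 * b * c   # nonnegative when b, c >= 0
    root = math.sqrt(disc)
    return ((a + d + root) / 2.0, (a + d - root) / 2.0)


def basic_reproduction_number(F, v_diag):
    """R0 as the spectral radius of F V^{-1}, i.e. max(|lambda_1|, |lambda_2|)."""
    lam1, lam2 = eigenvalues_2x2(next_generation_2x2(F, v_diag))
    return max(abs(lam1), abs(lam2))


# Hypothetical new-infection rates (F) and removal rates (diagonal of V).
F_EXAMPLE = [[0.6, 0.0],
             [0.2, 0.3]]
V_DIAG_EXAMPLE = (0.8, 0.5)

r0 = basic_reproduction_number(F_EXAMPLE, V_DIAG_EXAMPLE)
print("R0 =", r0, "-> malware dies out" if r0 < 1 else "-> endemic state possible")
```

For larger next generation matrices the closed-form eigenvalues would have to be replaced by a numerical eigenvalue routine; the 2x2 case is used here only because it mirrors the two eigenvalues obtained after the simplification of matrix (15).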

IV. NUMERICAL EXPERIMENTS
In this section, numerical experiments on the complex network are conducted to verify our theoretical results above, and some dynamical properties of our model are shown. In the following experiments, it is assumed that there are 10000 PLCs and 10000 honeypots in the network, and that 250 PLCs are infected. The complex network is assumed to be scale-free, and the node degree follows the power-law distribution.
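The link between the power-law index and the average degree, both of which are varied in the experiments below, can be sketched as follows. The code assumes a truncated discrete power law $P(k) \propto k^{-\gamma}$ on $k = 1, \ldots, k_{\max}$; the cutoffs `k_min` and `k_max` are hypothetical choices and are not taken from our experimental setup.

```python
def power_law_pmf(gamma, k_min=1, k_max=100):
    """Normalized truncated power law P(k) ~ k^(-gamma) on k_min..k_max."""
    weights = {k: k ** (-gamma) for k in range(k_min, k_max + 1)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def average_degree(pmf):
    """Mean degree <k> = sum over k of k * P(k)."""
    return sum(k * p for k, p in pmf.items())


# Compare the indices discussed in the experiments.
for gamma in (2.5, 3.0, 3.2):
    print("gamma =", gamma, "average degree =", round(average_degree(power_law_pmf(gamma)), 3))
```

Under this assumption, a lower power-law index gives a heavier tail and therefore a larger average degree, so the two quantities cannot be varied fully independently for a single-parameter distribution.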

A. DISEASE-FREE EQUILIBRIUM
The experiments are presented in three parts: disease-free equilibrium, endemic equilibrium and the effectiveness of the honeynet.
The parameters for the disease-free equilibrium are listed in Table 2, and the power-law index of the complex network is set to $\gamma = 3$. Figure 3 and Figure 4 show the variation of every state in the complex network. With the parameters given in Table 2, a disease-free equilibrium is reached before 500 time units.

B. ENDEMIC EQUILIBRIUM
To contrast with the disease-free equilibrium, the endemic equilibrium is shown in Figure 5 and Figure 6. The parameters are listed in Table 3.
The power-law index of the complex network is also set to $\gamma = 3$. The parameters are adjusted according to their physical meaning to show the endemic equilibrium. We can then see that before 1000 time units, the numbers of PLCs and honeypots reach about 1500 and 2800, respectively.

C. THE EFFECT OF THE HONEYNET
Our goal is to deploy honeypots in the honeynet more effectively and economically, so comparisons are made between different parameters to optimize the honeynet deployment strategy. As we know, honeypots are utilized to attract malware attacks and to provide a quarantine strategy. We consider (8) with the parameters given in Table 3 except for the average degree, and compare different average degrees. Figure 7 shows how the number of infected honeypots changes with the average degree when the system reaches the endemic equilibrium. The experiment shows that the number of infected honeypots increases as the average degree decreases. This means that a honeynet with a lower average degree is more attractive to attackers, and more honeypots are infected, so that the honeynet can collect information about the malware more effectively.
In Figure 8, we consider (8) with the parameters given in Table 3 except for the parameter $\beta_{12}$. The figure exhibits the trend of infected PLCs for different values of $\beta_{12}$. When the system tends to the equilibrium state, the number of infected PLCs decreases as $\beta_{12}$ increases, and a larger $\beta_{12}$ means that the honeypots in the honeynet have more system vulnerabilities. Honeypots with more system vulnerabilities therefore help to keep the malware epidemic at a lower level.
In addition, the power-law index of the complex network plays an important role in the propagation of malware. Figure 9 shows the evolution of the number of infected honeypots for different power-law indices of the honeynet in the case of the endemic equilibrium. We find that the number of infected honeypots increases as the power-law index decreases when the system tends to the equilibrium state. That is, a honeynet with a lower power-law index is more attractive to attackers. Another interesting phenomenon is that the number of infected honeypots decreases much more when the power-law index is $\gamma = 3.2$ in the case of the endemic equilibrium. This indicates that it is better to keep the power-law index $\gamma \le 3$ when deploying a honeynet.

V. SIMULATIONS EXPERIMENTS
In this section, simulation results are presented to verify the actual behavior of malware propagation in the complex network. The topology of the network is set to be scale-free, identical to Section IV, meaning that the node degree distribution follows the power-law distribution ($P(k) = k^{-3}$). The simulations are built on a scale-free network generated with MATLAB, and the physical process of worm propagation in the network is implemented in C++. There are 10,000 PLCs and 10,000 honeypots in our simulation experiments, and the parameters are the same as those used in the numerical experiments (Table 3). The epidemic simulation process is consistent with the state transition diagram in Figure 2, so it is not covered here again. The detailed process of our simulation experiments is as follows:
1) First, we randomly choose 250 infected PLCs, and all other nodes are susceptible. In each round of the simulation experiments, all nodes in the various states evolve according to (8).
2) Susceptible PLCs that are connected with infected PLCs can be infected by them with rate $\beta_{11}$. This process also applies to susceptible honeypots with rate $\beta_{22}$.
3) Susceptible PLCs can then be quarantined with rate $\beta_{12}$, and infected PLCs can be recovered with probability $\phi_1$. Infected honeypots can be quarantined with probability $\phi_2$.
4) Quarantined PLCs are recovered with probability $\omega$.
5) Recovered PLCs (quarantined honeypots) return to the susceptible state with probability $\delta_1$ ($\delta_2$).
Figure 10 (a-f) compares the numerical (dashed curves) and simulation (solid curves) results for susceptible, infected and quarantined PLCs (honeypots), and it shows that the simulation curves match the numerical curves well. There are some differences between the numerical and simulation curves because of the higher precision of the numerical experiments. Namely, the data in the numerical experiments are of double type, whereas the data in the simulation experiments are of integer type. However, these small differences do not affect the validity of our results.
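The five simulation steps above can be sketched as one synchronous update round. This is a minimal Python illustration, not our MATLAB/C++ implementation: the network is a small uniformly random graph rather than a scale-free one, the per-contact rule $1 - (1 - \text{rate})^{m}$ for $m$ infected neighbors is an assumption, the quarantine of susceptible PLCs is assumed to be triggered by infected honeypot neighbors, and all parameter values in `P` are hypothetical.

```python
import random


def build_network(n_plc, n_hp, n_edges, rng):
    """Random two-layer graph: nodes 0..n_plc-1 are PLCs, the rest are honeypots."""
    n = n_plc + n_hp
    kind = ["PLC"] * n_plc + ["HP"] * n_hp
    neighbors = [set() for _ in range(n)]
    for _ in range(n_edges):
        a, b = rng.sample(range(n), 2)
        neighbors[a].add(b)
        neighbors[b].add(a)
    return kind, neighbors


def hit(rate, contacts, rng):
    """True with probability 1 - (1 - rate)^contacts (assumed per-contact rule)."""
    return rng.random() < 1.0 - (1.0 - rate) ** contacts


def simulate_round(state, kind, neighbors, p, rng):
    """One synchronous round: every node is updated from the previous state."""
    new_state = list(state)
    for node, s in enumerate(state):
        inf_plc = sum(1 for nb in neighbors[node] if kind[nb] == "PLC" and state[nb] == "I")
        inf_hp = sum(1 for nb in neighbors[node] if kind[nb] == "HP" and state[nb] == "I")
        if kind[node] == "PLC":
            if s == "S":
                if hit(p["beta11"], inf_plc, rng):      # step 2: infection by PLCs
                    new_state[node] = "I"
                elif hit(p["beta12"], inf_hp, rng):     # step 3: quarantine
                    new_state[node] = "Q"
            elif s == "I" and rng.random() < p["phi1"]:   # step 3: recovery
                new_state[node] = "R"
            elif s == "Q" and rng.random() < p["omega"]:  # step 4
                new_state[node] = "R"
            elif s == "R" and rng.random() < p["delta1"]: # step 5
                new_state[node] = "S"
        else:
            if s == "S" and hit(p["beta22"], inf_plc, rng):  # only PLCs infect honeypots
                new_state[node] = "I"
            elif s == "I" and rng.random() < p["phi2"]:
                new_state[node] = "Q"
            elif s == "Q" and rng.random() < p["delta2"]:
                new_state[node] = "S"
    return new_state


# Hypothetical small-scale setup for illustration only.
P = {"beta11": 0.2, "beta12": 0.3, "beta22": 0.3, "phi1": 0.1,
     "phi2": 0.1, "omega": 0.05, "delta1": 0.02, "delta2": 0.05}
rng = random.Random(2020)
kind, neighbors = build_network(200, 200, 1200, rng)
state = ["S"] * 400
for node in rng.sample(range(200), 10):   # step 1: random initial infected PLCs
    state[node] = "I"
for _ in range(100):
    state = simulate_round(state, kind, neighbors, P, rng)
infected_plcs = sum(1 for k, s in zip(kind, state) if k == "PLC" and s == "I")
print("infected PLCs after 100 rounds:", infected_plcs)
```

Because node counts in such a simulation are integers and each run is random, the resulting curves fluctuate around the continuous numerical solution, which is consistent with the small differences noted above.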

VI. DISCUSSION
In our proposed model, the honeynet is introduced as an inhibition strategy to suppress malware propagation, and it is a key factor in inhibiting the propagation of malware. A comparison is therefore made in a numerical experiment to verify the effect of the honeynet. First, we replace the honeynet with an ICS network, which means that all nodes in the two-layer ICS network are PLCs. Unlike in (8), the infected PLCs in both layers can infect other PLCs. The parameters and the process of malware propagation are the same as in (8). The infected PLCs in the same layer of the two models are then compared. The effect of the honeynet is easy to confirm in Figure 11. When the system reaches the endemic equilibrium, the number of infected PLCs without the honeynet is much larger than that with the honeynet, and the peak in the model with the honeynet is much lower than that without it. The result indicates that the honeynet can effectively inhibit the propagation of malware in the ICS network. Figure 12 is a partially enlarged view of Figure 10 (b) that shows the fine-scale propagation tendency in the simulation experiment. In fact, the number of infected PLCs always fluctuates within an interval, and the number of infected honeypots behaves in the same way. This means that the defense strategy can keep the malware within a bounded range.
Another influence on the honeynet is the network traffic. In this paper, the infection rate $\beta_{22}$ is used to represent the amount of traffic, because the larger the amount of attack traffic, the more susceptible honeypots are to infection, and vice versa. When all of the traffic from honeypot nodes is allowed to be transmitted in the honeynet, like the traffic from PLC nodes, the infection rate $\beta_{22}$ attains its maximum value $\beta^*_{22}$. The intrusion detection system (IDS) can detect related attack traffic with the generated detection signatures. Referring to the experimental results in [39], [40], we define the detection rate $d$ as follows, To understand the relationship between the traffic and the infected honeypots, we study the relationship between the infection rate $\beta_{22}$ and the detection rate $d$. We assume that $\beta^*_{22} = 2.4 \times 10^{-5}$, and the relationship is shown in Figure 13. Obviously, the infection rate $\beta_{22}$ and the detection rate $d$ are positively related. Figure 14 then shows the relationship between the infection rate $\beta_{22}$ and the number of infected honeypots. When the infection rate $\beta_{22}$ increases, the detection rate $d$ increases as well, which means that the traffic increases. Thus, we find that increasing traffic causes more honeypots to be infected.
In Figure 7 and Figure 9, the numbers of infected honeypots for different average degrees and power-law indices are compared when the system reaches the endemic equilibrium. The influence of the honeynet topology (power-law index) and connectivity (average degree) also needs to be studied when the system reaches the disease-free equilibrium. Figure 15 shows the comparison of susceptible honeypots for different average degrees. It shows that decreasing the average degree lowers the minimum number of susceptible honeypots in the case of the disease-free equilibrium. This means that if the average degree is lower, fewer honeypots are susceptible, so the attraction of the honeynet to attackers may decrease. However, Figure 7 shows that more honeypots are infected with a lower average degree, so that the honeynet can collect information about the malware more effectively. Hence the effect of the average degree depends on the state of our model. For the honeynet topology (power-law index), Figure 16 shows that the number of infected honeypots increases much more when the power-law index is $\gamma = 2.5$. The different performance of the honeynet in the cases of the endemic equilibrium and the disease-free equilibrium is very interesting. Therefore, a dynamic connection mode of the honeynet may protect the ICS network more efficiently, which happens to be the strength of the intelligent honeynet. We will focus on modeling the intelligent honeynet and study it in detail in our future work.

VII. CONCLUSION
In summary, we introduce a new mathematical honeynet-based model in the ICS network, and the epidemic dynamics in the two-layer complex network are analyzed. Theoretical analysis has revealed the relations between disease epidemics and honeynet potency. The influence of the average degree and the power-law index in the two-layer complex network has been analyzed. In particular, the following conclusions can be drawn:
1) A honeynet-based malware propagation model with immunization and isolation as the defensive measures in a two-layer (ICS network and honeynet) complex network is proposed.
2) The epidemic dynamics of our proposed malware propagation system are analyzed. The system has a disease-free equilibrium $E_0$ and an endemic equilibrium $E^*$, and the local and global stability of the disease-free equilibrium is proved.
3) Numerical experiments are conducted to reveal the dynamics of malware propagation. They show that honeypots with more system vulnerabilities help to keep the malware epidemic at a lower level. In addition, simulation experiments show the actual behavior of malware propagation and verify our derivations.
4) The effect of the honeynet is discussed, and it is demonstrated that a honeynet with a lower average degree or a lower power-law index is more attractive to attackers. The results provide advice on how to deploy honeypots within a honeynet more effectively.
Our analysis provides a better understanding of the interaction between malware epidemics and honeynet potency. Based on the analysis results, several practical suggestions are also proposed on how to deploy honeypots more effectively (a honeynet with more system vulnerabilities, a lower average degree, or a lower power-law index). Furthermore, the comparison in the discussion also indicates that the number of infected PLCs can be kept within a smaller range under the potency of the honeynet.
In short, the honeynet brings a great change to the area of ICS network security.
She has been a Lecturer with Northeastern University, since 2004. She is currently a Visiting Scholar with The University of British Columbia, in 2019. Her main research interests include network security, malware propagation modeling, and nonlinear dynamic system analysis. VOLUME 8, 2020