SCEMA: An SDN-Oriented Cost-Effective Edge-Based MTD Approach

Protecting large-scale networks, especially Software-Defined Networks (SDNs), against distributed attacks in a cost-effective manner plays a prominent role in cybersecurity. One of the pervasive approaches to plug security holes and prevent vulnerabilities from being exploited is Moving Target Defense (MTD), which can be efficiently implemented in SDN as it needs comprehensive and proactive network monitoring. The key challenge in MTD is to shuffle the smallest number of hosts that still yields an acceptable security impact, while keeping the shuffling frequency low. In this paper, we propose an SDN-oriented Cost-effective Edge-based MTD Approach (SCEMA) to mitigate Distributed Denial of Service (DDoS) attacks at a lower cost by shuffling an optimized set of hosts that have the highest number of connections to the critical servers. These connections are called edges from a graph-theoretical point of view. We propose a three-layer mathematical model for the network that can easily calculate the attack cost. We have also designed a system based on SCEMA and simulated it in Mininet. The results show that SCEMA has lower complexity than previous related MTD approaches while achieving acceptable performance.

Software-Defined Networking (SDN) improves many network services such as management and monitoring, virtualization, distribution, and integration. In SDNs, control of the network traffic is assigned to a logically centralized component called a controller. The controller can create appropriate policies and set related rules on the switches to forward network traffic [1], [2]. However, SDNs face different security challenges, among which are Distributed Denial of Service (DDoS) attacks. DDoS attacks are sophisticated and deleterious cyber threats categorized as powerful large-scale distributed attacks [3]. They are becoming larger and more commonly used for extortion and other malicious activities. AWS [4] reported that a DDoS attack observed in 2020 was 44% larger than any previously detected one. Akamai [5] also reported that more than 3000 distinct DDoS attacks were observed in the gaming industry alone in one year. These threat reports emphasize the essential need for security countermeasures against DDoS attacks.
Moving Target Defense (MTD) is one of the strategies to protect valuable assets from being compromised by DDoS. MTD intends to confuse the adversary by changing the attack space (e.g., by shuffling network addresses) and aims to invalidate the information gathered during network reconnaissance [6]. The advantages of MTD compared to other security mechanisms are (1) its scalability, (2) almost removing the need for threat detection, and (3) frustrating the adversary. Developing a network that can change its configuration and implement MTD methods is challenging. However, as SDN provides a dynamic and manageable framework, it is a suitable environment for implementing dynamic security mechanisms [7] such as MTD approaches.
There is a trade-off between implementing a defensive approach and its cost. In some cases, the cost of improving security is too high, which dissuades the network administrator from implementing security strategies. An ideal MTD approach keeps the number of reconfigurations and the algorithm complexity low while providing an acceptable security level [8]. To the best of our knowledge, the execution time of all previous MTD approaches grows as the network gets larger. This shortcoming motivated us to work on simpler algorithms that can reduce both the number of reconfigurations and the complexity.
We propose an MTD shuffling algorithm that finds the hosts that are cheapest for the adversary to compromise and then shuffles them. The feature that helps us find these low-cost hosts is the number of connections between the host and the critical servers. Since the connections are modeled as edges in a graph, we call the connections between the hosts and the servers edges. Shuffling these important hosts costs less and has a greater effect. Our proposed method is an SDN-oriented Cost-effective Edge-based MTD Approach, and we call it SCEMA. We have also designed a system that implements SCEMA. The main contributions of this paper are as follows.
1) Introducing a revised model for networks under DDoS attacks. This model has three layers and uses Petri nets to better show the different states of the critical servers. It also contains mathematical relations for computing the attack cost. With this model, we can defend against attacks effectively by taking the lowest attack cost into account.
2) Proposing a low-complexity shuffling method, SCEMA, which considers the number of connections between the hosts and the servers (i.e., edges) as the main feature of importance. By shuffling the hosts with the highest number of edges, we can reduce the shuffling frequency while keeping the security level high.
3) Theoretically proving that SCEMA can achieve a higher or equal level of security compared with related MTD methods in specific networks.
4) Proposing a system that implements SCEMA and simulating it using Mininet. Two types of scanning methods, sequential and uniform random, are considered in the simulations. We also present experimental results that show the effectiveness of SCEMA.
5) Presenting related metrics for measuring design goal achievement and comparing SCEMA with related MTD approaches. The algorithm complexity is the metric for measuring MTD cost, and the adversary's success rate and the rate of compromised servers are the metrics for measuring the security level.
The remainder of this paper is organized as follows. Section II reviews previous work on deploying MTD methods in SDN. Section III presents the threat model, the adversary's behavior, and the goal of an MTD approach in preventing it. Section IV is concerned with the network model and its mathematical representation. Section V explains the details of the proposed method, and Section VI proposes a system architecture that indicates how to implement our proposed method in an SDN environment. Section VII presents the numerical results of simulating the proposed system. Finally, Section VIII gives the conclusion.

II. RELATED WORK
In this section, we briefly describe previous work on using MTD methods in SDN to mitigate cybersecurity attacks. A summary of these works is shown in Table I.
Rawski et al. [9] provided a platform for implementing MTD methods in SDNs. Topology mutation is the MTD technique used in that work. Steinberger et al. [10] also implemented MTD methods in a collaborative SDN environment that reduces the success rate of a DDoS attack. Luo et al. [11] proposed a combined method of MTD and honeypots to improve network security against DDoS attacks, in which dynamic virtual IP addresses are assigned to the devices. Macwan and Lung [12] also used virtual IP addresses to hide the real ones through an MTD approach.
Aydeger et al. [13] introduced an optimal MTD strategy to mitigate DDoS attacks. The MTD strategy is modeled as a signaling game. Zhou et al. [14] also proposed a signaling game for defending against DDoS attacks in a cost-effective way. Game theory is also used by Zhou et al. [15], where the MTD approach is modeled as a trilateral game. To solve the trade-off between MTD cost and effectiveness, Markov decision processes are employed to select the optimal MTD algorithm, which is called TGCESA (Trilateral Game Cost-Effective Shuffling Algorithm).
Narantuya et al. [16] used multiple controllers to improve both the security and the performance of an MTD approach. Each host has several random virtual IP addresses, which are altered over time. Karim et al. [17] proposed a random route mutation method that distributes a flow over different paths and makes it difficult to find which hosts are on a specific path. Liu et al. [18] proposed a hopping strategy in which the switches change the ports of the packets to confuse the adversary. Chowdhary et al. [19] also employed a port-hopping MTD strategy to mitigate multi-stage attacks, in which the ports of the virtual machines with the highest level of vulnerability are changed. Shi et al. [20] proposed a flexible MTD method in which the obfuscation level is variable. Some decoy servers are placed in the network to delay the attacks, and they are all obfuscated using mutation. Debroy et al. [21] proposed a frequency-minimization MTD approach to defend cloud-based applications in SDN against DDoS attacks.
Hyder and Ismail [22] used port and IP shuffling to improve the security of SDN. Medina-López et al. [23] used an MTD approach in which the IP addresses of messages are changed as they are exchanged between hosts, so the intermediary hosts are unaware of the real address. Chang et al. [24] proposed a cost-effective MTD method in SDN that randomizes the IP addresses and synchronizes different MTD phases. Chowdhary et al. [25] used an SDN controller to mitigate cloud network attacks through network reconfiguration. The attack graph of network vulnerabilities plays the main role in analyzing the network security level.
A three-tier model called TAG is proposed by Yoon et al. [26], which is used to reduce MTD cost in SDN by finding an optimal set of hosts for shuffling. A greedy Backward Attack Path (BAP) prediction algorithm is proposed in that work to find the optimal hosts to shuffle. In BAP, the $k$ most vulnerable attack paths from the adversary to the critical servers are selected, and the hosts on these paths are shuffled. The vulnerability of each path is calculated using attack graphs.
Only a few works have considered both MTD cost and DDoS attacks. These works have some limitations: some are appropriate only for cloud networks with virtual machines, and the game-theoretic and hash-based approaches have high complexity that may cause delay and processing overhead on the controller. So, we decided to improve one of the mentioned cost-effective works that does not consider DDoS attacks but is capable of being extended and improved. The algorithm and network model proposed by Yoon et al. [26] (BAP and TAG) can potentially be improved for mitigating DDoS attacks with an optimal MTD approach.

III. THREAT MODEL
In a general computer network, there are multiple hosts and critical servers, and the hosts communicate with the servers to use their services. We consider an adversary whose goal is to launch a DDoS attack on all the critical servers, of which there is more than one. Our threat model assumes that the adversary is an active internal or external intruder who can probe the hosts and compromise them in order to create an army that launches a DDoS attack against all the critical servers under his/her command.
As shown in Figure 5, an insider adversary is located on one of the hosts and connects to the other hosts through the internal connections in the network. This happens when a malicious user is illegally authorized as one of the network members and has complete access to one of the hosts. An external adversary connects to the hosts from outside the network. This type of adversary can only communicate with the hosts that are permitted to connect to external nodes or the Internet. However, in our threat model, we assume that all the hosts have Internet access.
The adversary first scans the network to probe for vulnerable hosts and then utilizes various intrusion tools to exploit their vulnerabilities and obtain the privileges needed to send traffic. When the adversary gains the related privilege on a host, that host becomes compromised and follows his/her commands. After the scanning/probing phase, the adversary's army is created, and he/she can send it an attack command as well as the addresses of the critical servers that must be targeted.
Before compromising the targets, the adversary scans the network to learn the network topology and to find vulnerable hosts. This is the reconnaissance phase, and it can be performed using two main methods [7]: sequential scanning and uniform random scanning. In the sequential scanning method, the adversary probes the hosts linearly, so all the addresses in the address space are probed one after the other. In the random scanning method, random addresses within the address space are selected and probed. In the defined threat model, the adversary can perform both sequential and uniform random scanning.
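The two scanning strategies can be illustrated with a minimal Python sketch. The address space, the set of live hosts, and all function names below are hypothetical and chosen only for illustration; the random scanner is assumed to draw addresses with replacement, which the text does not specify.

```python
import random


def sequential_scan(address_space):
    """Yield every address in order, one after the other."""
    for address in address_space:
        yield address


def uniform_random_scan(address_space, probes, rng=None):
    """Yield `probes` addresses drawn uniformly at random.

    Assumption: addresses are drawn with replacement.
    """
    rng = rng or random.Random()
    addresses = list(address_space)
    for _ in range(probes):
        yield rng.choice(addresses)


def probes_until_found(scan, live_hosts):
    """Count probes until the scan first hits a live host (None if never)."""
    for count, address in enumerate(scan, start=1):
        if address in live_hosts:
            return count
    return None


address_space = range(256)   # hypothetical address space
live_hosts = {17, 42, 200}   # hypothetical live host addresses

seq_probes = probes_until_found(sequential_scan(address_space), live_hosts)
rnd_probes = probes_until_found(
    uniform_random_scan(address_space, probes=10_000, rng=random.Random(0)),
    live_hosts,
)
```

Shuffling addresses between probes changes which entries of `live_hosts` are valid, which is the effect an MTD approach relies on to invalidate the information gathered by either scanner.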
It is worth noting that not all hosts are directly connected to all the servers in a network. Each server has a specific access list that controls who can communicate with it. In addition, the vulnerability levels of the hosts are not the same, and some of them are hard to compromise. As a result, the adversary attempts to compromise an optimal set of hosts: hosts that are not resource-consuming to compromise and that are in the access lists of the target servers. For an MTD approach deployed to prevent this threat, the question is "shuffling which hosts, based on their different features, has the greatest impact on decreasing the number of critical servers that become unavailable?"

IV. NETWORK MODEL
We have modified the TAG model to design a new network model that is more suitable for networks under DDoS attacks. Our model consists of three layers. The first layer is an undirected graph that shows the connections between different nodes of the network. The second layer is a weighted Directed Acyclic Graph (DAG) that shows the vulnerabilities of the hosts and their exploiting cost. This layer is the combination of the second and third tiers of TAG. The key difference between our model and TAG is in the third layer, which consists of a Petri net for each critical server. A Petri net is a principal modeling concept for studying distributed events. It is a directed bipartite graph with two types of nodes, called places and transitions. All the edges in a Petri net are directed from places to transitions or vice versa. Each place contains zero or more tokens, and different system states can be described as different distributions of tokens among the places. A transition in a Petri net can be fired only if its input places contain a specific number of tokens. By firing a transition, the specified tokens are removed from the input places and added to the output places. Using Petri nets helps us model the different states of the critical servers facing DDoS attacks, including safe and dangerous conditions.
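The generic firing rule described above can be sketched in a few lines of Python. This is only an illustration of the standard Petri net semantics, not the server model itself; the place names, token counts, and function names are hypothetical.

```python
def enabled(marking, inputs):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking.get(place, 0) >= need for place, need in inputs.items())


def fire(marking, inputs, outputs):
    """Return the new marking after firing a transition.

    `inputs` maps input places to the tokens consumed, and `outputs`
    maps output places to the tokens produced.
    """
    if not enabled(marking, inputs):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, need in inputs.items():
        new_marking[place] = new_marking.get(place, 0) - need
    for place, gain in outputs.items():
        new_marking[place] = new_marking.get(place, 0) + gain
    return new_marking


# Hypothetical net: one transition that moves tokens from p1 to p2.
marking = {"p1": 3, "p2": 0}
t_inputs = {"p1": 2}    # needs two tokens in p1
t_outputs = {"p2": 1}   # produces one token in p2

after = fire(marking, t_inputs, t_outputs)
```

A system state is simply the dictionary of token counts, so the state changes of a server can be followed by firing transitions one after another.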
According to our model, a network can be modeled as $N = (\{\mathbf{C}\}, \{\mathbf{V}, \mathcal{H}\}, \{\mathcal{S}\})$, where $\mathbf{C}$ represents the first layer as the adjacency matrix of the network, $\mathbf{V}$ and $\mathcal{H}$ construct the second layer and indicate the adjacency matrix of vulnerabilities and the set of typical hosts, respectively, and $\mathcal{S}$ is the set of critical servers as the third layer. We assume that there are $H$ typical hosts and $S$ critical servers, so $\mathcal{H} = \{h_1, h_2, \ldots, h_H\}$ and $\mathcal{S} = \{s_1, s_2, \ldots, s_S\}$, where $h_i$ indicates the $i$th host and $s_i$ indicates the $i$th critical server in the network.

A. First Layer
$\mathbf{C}$ is a symmetric square matrix of size $(H + S + 1)$, and the element in its $i$th row and $j$th column is denoted by $c_{i,j}$. In general, $c_{i,j}$ is one when nodes $c_i$ and $c_j$ are connected, and zero otherwise. The meaning of $c_i$ is given in Equation 1.
$$c_i = \begin{cases} \text{the adversary}, & i = 1 \\ h_{i-1}, & 2 \le i \le H + 1 \\ s_{i-H-1}, & H + 2 \le i \le H + S + 1 \end{cases} \qquad (1)$$
For example, if $c_{1,2}$ is 1, the adversary is directly connected to the first host in the network. It is worth noting that, based on our threat model (Section III), the value of $c_{1,i}$ for $1 < i \le H + 1$ is always 1, meaning that the adversary can establish a connection with all of the hosts.
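As an illustration, the first layer can be built with a short Python sketch. It assumes the node ordering implied by the text (the adversary first, then the $H$ hosts, then the $S$ servers) and uses 0-based indices, so row 0 is the adversary. The function name and the sample links are hypothetical.

```python
def build_connection_matrix(num_hosts, num_servers, host_links, server_links):
    """Build the symmetric 0/1 adjacency matrix of the first layer.

    Assumed 0-based layout: index 0 is the adversary, indices 1..H are
    the hosts, and indices H+1..H+S are the servers.
    `host_links` holds (host, host) pairs and `server_links` holds
    (host, server) pairs, both numbered from 1.
    """
    size = num_hosts + num_servers + 1
    c = [[0] * size for _ in range(size)]

    def connect(a, b):
        c[a][b] = 1
        c[b][a] = 1

    # Threat model: the adversary can reach every host.
    for host in range(1, num_hosts + 1):
        connect(0, host)
    for first, second in host_links:
        connect(first, second)
    for host, server in server_links:
        connect(host, num_hosts + server)
    return c


# Hypothetical network with three hosts and two servers.
c = build_connection_matrix(
    num_hosts=3,
    num_servers=2,
    host_links=[(1, 2)],
    server_links=[(1, 1), (1, 2), (2, 1), (3, 2)],
)
```

Keeping the matrix symmetric mirrors the undirected graph of the first layer, and the adversary row never contains a direct link to a server.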

B. Second Layer
We assume that there are $V$ vulnerabilities in the network. Some of them are remote vulnerabilities, which can be exploited directly by the adversary. Others are local vulnerabilities, which can be used to launch a DDoS attack against the critical servers once exploited. The remaining vulnerabilities are intermediary: they are reached through remote vulnerabilities and are used to exploit the local ones. All types of vulnerabilities are modeled as a weighted DAG, and the edge weights are the cost of successfully exploiting them. If the probability of exploiting a vulnerability is $p$, we assume that its exploiting cost is $1 - p$. The exploiting cost is fixed for each vulnerability and can be calculated using the Common Vulnerability Scoring System (CVSS) [27].
The logical structure of $\mathbf{V}$ is similar to that of $\mathbf{C}$. It is a square matrix of size $(V + 2)$, and the element in its $i$th row and $j$th column is named $v_{i,j}$. The value of $v_{i,j}$ is defined in different cases as follows:
• In the case that $i = 1$ and $1 < j \le V + 1$, $v_{i,j}$ is the probability that the adversary can exploit the $(j-1)$th vulnerability ($v_{j-1}$) directly without exploiting other vulnerabilities. Hence, if it is zero, the adversary has to first exploit other vulnerabilities. As an example, if $v_{1,2}$ is 0.5, the adversary can directly exploit $v_1$ without exploiting any other vulnerabilities with a probability of 0.5.
• In the case that $1 < i \le V + 1$ and $1 < j \le V + 1$, $v_{i,j}$ is the probability of successfully exploiting the $(j-1)$th vulnerability ($v_{j-1}$) provided that the $(i-1)$th vulnerability ($v_{i-1}$) is currently exploited.
• In the case that $1 < i \le V + 1$ and $j = V + 2$, $v_{i,j}$ is the probability of successfully launching a DDoS attack against the critical servers, provided that the $(i-1)$th vulnerability ($v_{i-1}$) is currently exploited. For example, if $v_{2,V+2}$ is zero, the adversary cannot launch a DDoS attack only by exploiting $v_1$.
• In other cases, the value of $v_{i,j}$ is zero, because it indicates an impossible event. For example, the value of $v_{1,V+2}$ is 0, because the adversary is unable to directly launch a DDoS attack without exploiting any of the vulnerabilities.
$\mathbf{V}$ is not a symmetric matrix, and therefore the values of $v_{i,j}$ and $v_{j,i}$ are not always equal. We have assumed that if a vulnerability can be used to perform a DDoS attack on a specific server, it can also be used to attack the other servers.
Each host, $h_i$, can be shown as $h_i = \{v^i_1, v^i_2, \ldots\}$, where $v^i_j$ is the index of the $j$th vulnerability that exists in $h_i$, and $1 \le v^i_j \le V$. For example, if $h_1 = \{1, 3\}$, the first host in the network suffers from $v_1$ and $v_3$ as its security vulnerabilities.

C. Third Layer
The servers are modeled as Petri nets. Each server has three states: Safe, Warning, and Dangerous. When fewer than $\mu$ neighbor hosts of a server are compromised, the server is in the Safe state. If more than $\mu$ and fewer than $\mu + \rho$ neighbor hosts are compromised, the server is in the Warning state. If more than $\mu + \rho$ neighbor hosts are compromised, the server is in the Dangerous state. According to these states, we have assumed that $\mu < \rho$. These states correspond to three places in a Petri net. So, each server, $s_i$, can be shown as $s_i = (\mathcal{P}, \mathcal{T}, M_i)$, where $\mathcal{P} = \{P, P', P''\}$ is the set of three places of the Petri net, $\mathcal{T} = \{T, T', T''\}$ is the set of three transitions, and $M_i$ is the initial marking of $s_i$. $P$, $P'$, and $P''$ are the places in which the server is in the Safe state, the Warning state, and the Dangerous state, respectively. $T$ is the transition from $P$ to $P'$ and fires when $\mu$ neighbor hosts are compromised. The transition from $P'$ to $P$ is $T'$, and it fires when the compromised hosts are recovered by performing a shuffling procedure. $T''$ is the transition from $P$ and $P'$ to $P''$ and fires when $\mu + \rho$ neighbor hosts are compromised and can cause a DDoS attack.
For each server $s_i$, the initial marking $M_i$ indicates the number of tokens in each place at the initial state. As all the hosts are initially uncompromised, all of them are in $P$. So, we have $M_i = (n_i, 0, 0)$, where $n_i$ is the number of hosts that are directly connected to $s_i$ and are considered its neighbors. $n_i$ can be calculated as $n_i = \sum_{j=2}^{H+1} c_{i+H+1,j}$. Each Petri net has an incidence matrix, $\mathbf{D}$. Its rows are associated with the transitions and its columns are associated with the places of the Petri net. The element in the $i$th row and the $j$th column shows the number of tokens that are added to the $j$th place after the $i$th transition fires. As the values of $\mu$ and $\rho$ are considered to be the same for all the servers, $\mathbf{D}$ is the same for all the servers and is shown in Equation 2.
A sample server modeled with a Petri net is shown in Figure 1. Three different states of this server are illustrated, and the values of $\mu$ and $\rho$ are 5 and 2, respectively.
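The neighbor count $n_i$ and the three server states can be sketched in Python as follows. The sketch does not reproduce the incidence matrix; it only classifies a server by its number of compromised neighbors. Treating exactly $\mu$ compromised neighbors as Warning and exactly $\mu + \rho$ as Dangerous is an assumption taken from the firing conditions of the transitions, and the function names and the tiny sample matrix are hypothetical.

```python
def neighbor_count(c, num_hosts, server_index):
    """n_i: number of hosts directly connected to server s_i (i from 1).

    Assumes the 0-based layout adversary, hosts, servers, so the row of
    s_i is H + i and the host columns are 1..H.
    """
    row = c[num_hosts + server_index]
    return sum(row[1:num_hosts + 1])


def server_state(compromised, mu, rho):
    """Classify a server by its number of compromised neighbor hosts.

    Assumption: the boundary values follow the transition conditions,
    i.e. mu compromised hosts give Warning and mu + rho give Dangerous.
    """
    if compromised >= mu + rho:
        return "Dangerous"
    if compromised >= mu:
        return "Warning"
    return "Safe"


# Hypothetical network: adversary, two hosts, one server (both hosts linked).
c = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
n_1 = neighbor_count(c, num_hosts=2, server_index=1)
initial_marking = (n_1, 0, 0)   # all neighbor hosts start in the Safe place

# Thresholds of the sample server: mu = 5 and rho = 2.
states = [server_state(k, mu=5, rho=2) for k in (4, 5, 6, 7)]
```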

D. Cost Calculations
In a DDoS attack, the adversary selects one or more targets in the network and commands his/her army to perform the attack against them. The cost of performing a DDoS attack, which we call the Attack Cost (AC), includes managing the attack and compromising the army. The cost of compromising the army is the cost of exploiting the related vulnerabilities of the hosts. The Compromising Cost (CC) is a metric that can guide the defensive method to find the most desirable targets for the adversary's army. We also call the cost of exploiting the vulnerabilities of a host the Exploiting Cost (EC).
Using an MTD strategy is costly, and the cost of shuffle-based approaches includes the Implementation Cost (IC) and the Shuffling Cost (SC). IC consists of the complexity of executing the defensive strategies. For example, implementing an MTD mechanism in SDN brings IC for the controller, and the extra time consumed by the SDN controller is considered IC. SC is the cost related to the reconfiguration of the hosts, and it is directly related to the number of shuffled hosts. Shuffling the network leads to several configuration changes, and these changes are considered a cost. While IC and SC are important for evaluating an MTD approach, there is no general rule to calculate them theoretically. They are commonly measured based on simulation results or real testbed reports. We have also calculated the complexity order of the proposed method in Section VI-C. In this section, we focus on theoretically calculating EC, CC, and AC.
We define the EC value of a vulnerability as the minimum cost that the adversary must pay to exploit it. The costs presented in $\mathbf{V}$ are not always the exact EC of a vulnerability. Some vulnerabilities have prerequisite vulnerabilities, so the adversary must first pay the cost of the prerequisite ones and then exploit that vulnerability. The set of prerequisite vulnerability indices for the $j$th vulnerability in the There may be several ways to exploit a vulnerability, but the adversary tries to use the way with the lowest EC. We define $co_{EC}(i, j)$ as the lowest EC of the $j$th vulnerability of the $i$th host ($v^i_j$) to be exploited, and it can be calculated by Equation 3. Now the CC of a host can easily be calculated as the minimum cost of the local vulnerabilities of that host.
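One possible way to compute the lowest exploiting cost is a shortest-path search over the vulnerability DAG. The Python sketch below is an illustration under stated assumptions rather than a transcription of Equation 3: each edge with probability $p > 0$ is assumed to cost $1 - p$, costs along a chain of prerequisites are assumed to add up, and the adversary is assumed to take the cheapest chain. The function name and the sample matrix are hypothetical.

```python
import heapq


def lowest_exploit_costs(v):
    """Return the lowest accumulated cost from the adversary to each node.

    `v` is the (V + 2) x (V + 2) probability matrix with 0-based indices:
    node 0 is the adversary, nodes 1..V are the vulnerabilities, and
    node V + 1 is the DDoS goal. An entry of 0 means "no edge".
    """
    size = len(v)
    best = [float("inf")] * size
    best[0] = 0.0
    queue = [(0.0, 0)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best[node]:
            continue   # stale queue entry
        for nxt in range(size):
            p = v[node][nxt]
            if p > 0:
                candidate = cost + (1.0 - p)   # assumed edge cost 1 - p
                if candidate < best[nxt]:
                    best[nxt] = candidate
                    heapq.heappush(queue, (candidate, nxt))
    return best


# Hypothetical DAG with V = 3: v1 is remote, v2 needs v1, v3 is remote,
# and both v2 and v3 can launch the DDoS attack.
v = [
    [0, 0.5, 0,   0.2, 0],
    [0, 0,   0.8, 0,   0],
    [0, 0,   0,   0,   0.9],
    [0, 0,   0,   0,   0.9],
    [0, 0,   0,   0,   0],
]
ec = lowest_exploit_costs(v)   # ec[j] is the lowest cost of reaching v_j
```

In this example, reaching $v_2$ requires first paying for its prerequisite $v_1$, so its lowest cost is the sum of both edge costs.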

We name the set of local vulnerability indices of $h_i$ as $lo(h_i)$. For example, if $lo(h_1) = \{2, 3\}$, the local vulnerabilities of the first host are $v_2$ and $v_3$. $v^i_j$ is in $lo(h_i)$ if and only if $v_{v^i_j + 1, V + 2}$ is not zero. The cost of compromising $h_i$, $co_{CC}(h_i)$, can be calculated as in Equation 4. Now we can calculate the cost of firing $T''$ to find the attack cost (AC). The AC value of a DDoS attack on $s_i$ is equal to the cost of firing $T''$ in our model. The cost of a transition can be calculated as the sum of the total cost of its previous transitions and the total cost of its input tokens. $T''$ can be fired only if $T$ is fired. The cost of $T$ is equal to the cost of its input tokens, which is the cost of compromising $\mu$ hosts connected to a single server. In addition to firing $T$, $T''$ needs another $\rho$ hosts to be compromised. So, if the adversary selects a set of $\mu + \rho$ hosts, $A_i$, which are connected to $s_i$, the cost of his/her attack, that is, the cost of firing $T''$, is denoted by $co_{AC}(A_i)$. The adversary's goal is to attack all the servers, and the final attack cost is $co_{AC}(A)$. We have $co_{AC}(A_i) = \sum_{a \in A_i} co_{CC}(a)$, and $co_{AC}(A)$ can be calculated as in Equation 5.
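The relations between EC, CC, and AC can be illustrated with a short Python sketch. It takes the EC values as given, computes the CC of a host as the minimum EC over its local vulnerabilities, and sums the CC values of a selected set to obtain its attack cost. The helper that picks the $\mu + \rho$ cheapest neighbors is an assumption about the adversary's choice for a single server, and all names and sample values are hypothetical.

```python
def local_vulnerabilities(host_vulns, v):
    """Indices of a host's vulnerabilities that can launch the DDoS attack.

    With 0-based indexing, vulnerability j is local when the entry in
    row j and the last column (the DDoS goal) of `v` is not zero.
    """
    goal = len(v) - 1
    return [j for j in host_vulns if v[j][goal] != 0]


def compromising_cost(host_vulns, v, ec):
    """CC of a host: minimum EC over its local vulnerabilities."""
    local = local_vulnerabilities(host_vulns, v)
    return min(ec[j] for j in local) if local else float("inf")


def attack_cost(selected_hosts, cc):
    """AC of a selected set: sum of the CC values of its hosts."""
    return sum(cc[h] for h in selected_hosts)


def cheapest_army(neighbors, cc, mu, rho):
    """Assumed adversary choice: the mu + rho cheapest neighbor hosts."""
    need = mu + rho
    if len(neighbors) < need:
        return None
    return sorted(neighbors, key=lambda h: cc[h])[:need]


# Hypothetical vulnerability matrix (V = 3) and hand-stated EC values.
v = [
    [0, 0.5, 0,   0.2, 0],
    [0, 0,   0.8, 0,   0],
    [0, 0,   0,   0,   0.9],
    [0, 0,   0,   0,   0.9],
    [0, 0,   0,   0,   0],
]
ec = {1: 0.5, 2: 0.7, 3: 0.8}
hosts = {"h1": [1, 2], "h2": [3], "h3": [1]}

cc = {name: compromising_cost(vulns, v, ec) for name, vulns in hosts.items()}
army = cheapest_army(["h1", "h2", "h3"], cc, mu=1, rho=1)
cost = attack_cost(army, cc)
```

A host without any local vulnerability, such as `h3` here, receives an infinite CC in this sketch because it cannot take part in the attack.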
V. PROPOSED METHOD (SCEMA)
The main problem we aim to solve is how to optimally shuffle the hosts to reduce SC while keeping the security level high. In this section, we present the main idea of this research and a numerical example of it. We also present a proof of its superiority over previous work.

A. SCEMA Approach
In distributed attacks, such as DDoS, the adversary creates an army of compromised hosts and then sends a command to that army to make all of them attack a specific target within a specific time interval. Since the adversary tries to perform the attack with the lowest possible AC, he/she searches for the minimal set of hosts that can join the army and is sufficient to run the attack. We name the set of hosts selected by the adversary $A$. We need a metric that leads to the lowest-cost $A$. Other MTD solutions, such as BAP [26], assume that this metric is the CC value of the host. It is assumed in BAP that the adversary wishes to fill $A$ with the hosts that have the lowest cost of compromising. Hence, BAP shuffles the hosts with the lowest CC to bring security to the network. The process of finding these hosts is time-consuming and becomes more complex in larger networks.
We introduce another metric that can be measured with lower complexity and gives acceptable or even better results in many cases. The adversary's desire to find the minimal army and the behavior of distributed attacks motivate us to design a low-complexity MTD method that shuffles only the hosts that have a higher number of connections to the critical servers. In other words, we believe that the metric that attracts the adversary's attention in many cases is the number of neighbor servers (edges) of each host. In distributed attacks, groups of hosts are more important than individual ones. Therefore, we should concentrate on the connections between the hosts and the critical servers (i.e., the edges) instead of the CC value of each host. The hosts that are connected to more critical servers are the best targets for the adversary's army. Compromising one host that is connected to three critical servers is much easier than compromising three hosts that are each connected to only one server. Figure 2 shows an example that compares SCEMA and BAP. The cost of compromising each host and of performing a DDoS attack on each critical server is shown in the nodes. In BAP, the CC value of each host is important, but in SCEMA the number of connections is important. This example illustrates that performing a DDoS attack on all the servers is impossible when our defensive method is used, whereas an attack can still succeed when BAP is used.
We define a shuffling degree for each host. This degree is related to the number of servers that are directly connected to that host. The number of neighbor servers of $h_i$ (i.e., its edges) is denoted by $ne(h_i)$, and we have $ne(h_i) = \sum_{j=H+2}^{H+S+1} c_{i+1,j}$. The shuffling degree of $h_i$ ($d_i$) is calculated as in Equation 6. We can say that $d_i$ is the normalized value of $ne(h_i)$.
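The edge-based selection can be sketched in Python as follows. The sketch counts the neighbor servers of each host from the connection matrix and ranks the hosts by that count. Since the normalization is not reproduced here, dividing $ne(h_i)$ by the number of servers $S$ is only an assumption, as are the 0-based node layout (adversary, hosts, servers), the function names, the tie-breaking rule, and the sample network.

```python
def neighbor_servers(c, num_hosts, num_servers, host_index):
    """ne(h_i): number of servers directly connected to host h_i (i from 1)."""
    row = c[host_index]   # 0-based row of h_i
    return sum(row[num_hosts + 1:num_hosts + num_servers + 1])


def shuffling_degrees(c, num_hosts, num_servers):
    """Shuffling degree per host, assuming normalization by S."""
    return {
        i: neighbor_servers(c, num_hosts, num_servers, i) / num_servers
        for i in range(1, num_hosts + 1)
    }


def hosts_to_shuffle(degrees, k):
    """Pick the k hosts with the highest degree (ties broken by index)."""
    return sorted(degrees, key=lambda i: (-degrees[i], i))[:k]


# Hypothetical network: adversary (0), hosts (1..3), servers (4..5).
num_hosts, num_servers = 3, 2
size = num_hosts + num_servers + 1
c = [[0] * size for _ in range(size)]
for a, b in [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (2, 4), (3, 5)]:
    c[a][b] = c[b][a] = 1

degrees = shuffling_degrees(c, num_hosts, num_servers)
selected = hosts_to_shuffle(degrees, k=1)
```

Computing the degrees needs only one pass over the host rows of the connection matrix, which reflects why this metric is cheap to evaluate compared with searching for attack paths.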

B. Proof
In this section, we present a theoretical proof of a theorem stating that SCEMA achieves a higher or equal security level compared with BAP in bipartite networks. In homogeneous networks, the hosts are similar and their vulnerabilities are nearly identical. Most of the time, however, some hosts are more critical, and the network administrator applies security mechanisms to protect them and improve their safety. As a result, these hosts are connected to more servers and can be used for serious tasks. On the other hand, the remaining hosts are more vulnerable and are treated as public hosts. We define these types of networks in Definition 1 and call them bipartite networks.
Definition 1: A network, $N$, is bipartite if all its hosts, $\mathcal{H}$, can be partitioned into two sets, $x$ and $X$, where all the following conditions are satisfied
According to Definition 1, we can write $x$ as $\bigcup_{i=1}^{d} x_i$, where $x_i$ is the set of hosts $h$ such that $ne(h) = i$. We can also write $X$ as $\bigcup_{i=D}^{S} X_i$, where $X_i$ is the set of hosts $h$ such that $ne(h) = i$. The general schema of bipartite networks is shown in Figure 3.
The adversary tries to find the optimal set of hosts, $A_{adversary}$, that has the lowest attack cost and also has $\mu + \rho$ connections to each server. The BAP and SCEMA algorithms try to find $A_{adversary}$ by their own mechanisms. The sets selected by BAP and SCEMA are defined in Definition 2 and Definition 3, respectively.
Definition 2: A set, $A_1$, is BAP selected if all of the following conditions are satisfied.
We have considered two assumptions, stated in Assumption 1 and Assumption 2.
Assumption 1: We assume that $\frac{m}{M} \ge \frac{d}{D}$, or in other words, $mD \ge Md$.
Assumption 2: We assume that the sum of $ne()$ over all the hosts in $A_1$, and likewise over all the hosts in $A_2$, is exactly $S(\mu + \rho)$. In other words, we assume that $ne(A_1) = ne(A_2)$.
The number of connections to each server from hosts in $A_{adversary}$ is greater than or equal to $\mu + \rho$. By Assumption 2, we have assumed that all the servers are connected to exactly $\mu + \rho$ hosts in both $A_1$ and $A_2$. There are $S$ servers in the network. So, the value of both $ne(A_1)$ and $ne(A_2)$ is $S(\mu + \rho)$.
Both BAP and SCEMA claim that their selected set is the optimal set selected by the adversary ($A_{adversary}$). The requirement on the number of connections to the servers is satisfied for $A_1$ and $A_2$, as mentioned in Assumption 2, but the attack cost has not been checked yet. Theorem 1 states that, in bipartite networks, SCEMA is more precise than BAP in finding $A_{adversary}$.
Theorem 1: In all bipartite networks under Assumption 1 and Assumption 2, the attack cost of each possible BAP selected set is greater than or equal to that of each possible SCEMA selected set. In other words, $co(A_1) \ge co(A_2)$.
To prove Theorem 1, first we define a lemma (Lemma 1) and prove it.
Lemma 1: If $p \le q \le r$ and $p \le q \le r$ then
As $r$ is greater than or equal to all the numbers from $p$ to $q$, we can say that $i \le r$ for all $i$ between $p$ and $q$. Multiplying by a non-negative number, such as $|x_i|$, keeps the inequality valid. So, we have $i|x_i| \le r|x_i|$ for all $i$ between $p$ and $q$, and we can say that $\sum_{i=p}^{q} i|x_i| \le \sum_{i=p}^{q} r|x_i|$. Since $r$ is fixed and independent of the values of $i$, we can say that $\sum_{i=p}^{q} i|x_i| \le r\sum_{i=p}^{q} |x_i|$. Multiplying both sides of this inequality by a positive number, $\frac{m}{r}$, leads to Equation 7.
On the other hand, $p$ is smaller than or equal to all the numbers from $q$ to $r$. So, we can say that $p \le i$ for all $i$ between $q$ and $r$. Multiplying by a non-negative number, such as $|X_i|$, keeps the inequality valid. So, we have $p|X_i| \le i|X_i|$ for all $i$ between $q$ and $r$, and we can say that $\sum_{i=q}^{r} p|X_i| \le \sum_{i=q}^{r} i|X_i|$. As $p$ is fixed and independent of the values of $i$, we obtain $p\sum_{i=q}^{r} |X_i| \le \sum_{i=q}^{r} i|X_i|$. Now we multiply both sides of this inequality by a negative number, $-\frac{M}{p}$, and reverse the inequality sign to get Equation 8.
Using Equation 7 together with Equation 8, we can easily reach the inequality stated in Lemma 1. We also state Remark 1 and Remark 2 to better show the steps of the proof. To find $co(A_{adversary})$, we need to find the number of hosts from $x$ that are in $A_{adversary}$ and multiply it by $m$. Then we have to find the number of hosts from $X$ that are in $A_{adversary}$ and multiply it by $M$. Finally, by adding up the obtained values, we reach $co(A_{adversary})$. If $y_i$ and $Y_i$ are $x_i \cap A_{adversary}$ and $X_i \cap A_{adversary}$, respectively, the total numbers of hosts from $x$ and $X$ are $\sum_{i=1}^{d} |y_i|$ and $\sum_{i=D}^{S} |Y_i|$, respectively. So, the attack cost of $A_{adversary}$ can be calculated as in Remark 1.
Remark 1: If $y_i = x_i \cap A_{adversary}$ and $Y_i = X_i \cap A_{adversary}$, then we have $co(A_{adversary}) = m\sum_{i=1}^{d} |y_i| + M\sum_{i=D}^{S} |Y_i|$.

The value of $ne(A_{adversary})$ is the sum of $ne(h)$ over all the hosts in $A_{adversary}$. So, if $z_i$ is the set of all the hosts in $A_{adversary}$ that have $i$ connections to the servers, we can say that $ne(A_{adversary}) = \sum_{i=1}^{S} i|z_i|$. Now, we can calculate $ne(A_{adversary})$ as in Remark 2.
Remark 2: If $y_i = x_i \cap A_{adversary}$ and $Y_i = X_i \cap A_{adversary}$, then we have $ne(A_{adversary}) = \sum_{i=1}^{d} i|y_i| + \sum_{i=D}^{S} i|Y_i|$.
Now we start proving Theorem 1. We consider all possible cases for the BAP and SCEMA selected sets, $A_1$ and $A_2$, and prove Theorem 1 for each case. If the theorem is proved in all possible cases, it is completely proved. According to Definition 1, we have only four possible cases as follows.
The four possible cases are shown in Figure 4. We have proved Theorem 1 for all these cases, but only the proof for Case 1 is presented in this section. The other cases are proved in a similar way, and a sketch of their proofs is presented in the Appendix. These proofs demonstrate that SCEMA has a higher or equal security level compared with BAP in all bipartite networks. Now let us start the proof of Case 1. We have Equation 9 as a consequence of Remark 2 and Assumption 2.
Recalling Remark 1, the attack costs of $A_1$ and $A_2$ can be calculated as Equation 10.
Now let $\alpha = co(A_1) - co(A_2)$. If $\alpha \ge 0$, we can say that $co(A_1) \ge co(A_2)$. So, we compare the costs of $A_1$ and $A_2$ by subtracting $co(A_2)$ from $co(A_1)$. This subtraction uses Equation 10 and results in Equation 11.

Now together with Equation
Now we can replace the value of $\frac{m}{d}\sum_{i=a+1}^{d} i|x_i|$ in Equation 12 with its value in Equation 9 to obtain Equation 13.
We know that $a \le d$. So, $\frac{m}{a} \ge \frac{m}{d}$ and $\frac{m}{a} - \frac{m}{d} \ge 0$. So, we obtain Equation 14.
We also know that b ≥ D. So, mb ≥ m D. From Assumption 1 we have m D ≥ Md.Hence, mb ≥ Md and mb−Md ≥ 0. Now we obtain Equation 15.
According to Assumption 1, m D ≥ Md and m D−Md ≥ 0. We also know that both d and D are positive and all the values of |X i | are non-negative.So, we reach Equation 16.
Finally, according to Equation 14, Equation 15, and Equation 16, every term on the right-hand side of Equation 13 is non-negative, so the right-hand side itself is non-negative. So, we obtain Equation 17.

C. Numerical Example
In this section, we consider a sample software-defined network, $N_E$, and present the numerical model for it. The network topology of $N_E$ is shown in Figure 5, and its schematic diagram with respect to our model is shown in Figure 6. Note that the connections between the hosts are not shown in Figure 6 for simplicity; they are given in the numerical representation of the model. Network $N_E$ has two servers, $S = \{s_1, s_2\}$. The first server is $s_1 = (P, T, M_1)$, where $M_1 = (5, 0, 0)$, and the second server is $s_2 = (P, T, M_2)$, where $M_2 = (4, 0, 0)$. $C$ and the relation between the vulnerabilities and their EC are specified in Equation 18 and Equation 19.
We consider that μ = 2 and ρ = 1.So, according to Equation 2, D is specified as Equation 21.
Using Equation 4, we can calculate the CC value of each host. These costs are shown in Equation 22. The shuffling degrees of the hosts are calculated according to Equation 6 and are shown in Equation 23.
BAP suggests selecting the hosts for shuffling among the most vulnerable ones to prevent the attack. So, $h_3$, $h_5$, and $h_6$ are selected. But $s_1$ still has three unblocked connections, so another host that is connected to $s_1$ must be shuffled. As $h_2$ has the lowest cost, it is selected. Now the set of hosts for shuffling is $A_{BAP} = \{h_2, h_3, h_5, h_6\}$. SCEMA, in contrast, selects the hosts with the highest shuffling degree. So, we have $A_{SCEMA} = \{h_3, h_4, h_5\}$, and the hosts in this set can prevent the attack ($s_1$ and $s_2$ have fewer than three unblocked connections). The costs of these two sets, according to Equation 5, are shown in Equation 24.

VI. SYSTEM ARCHITECTURE
We have designed a system in SDN that implements SCEMA. This system, which is shown in Figure 7, contains four main components: critical servers, typical hosts, network devices, and an SDN controller. Critical servers are the valuable assets in the network, and the network admin tries to prevent DDoS attacks against them. The typical hosts are the vulnerable nodes in the network that the adversary attempts to compromise in order to build an army for performing a DDoS attack. The hosts and the servers are connected through network devices, which are OpenFlow switches in our case. The forwarding rules and management messages are sent to the network devices by an SDN controller. The controller uses five modules to implement SCEMA and manage the network: NTD, SDC, IAS, SID, and FEG. The modules are described as follows.

A. Network Topology Discoverer (NTD)
The NTD module uses the OpenFlow Discovery Protocol (OFDP) to discover the current state of the network and its topology. The network nodes and their connections are found, from which $C$ can be generated. The network admin also provides the vulnerabilities and their relations, as well as the list of critical servers. Finally, the NTD module generates the network model, $N$, and passes this model to the SDC module. This module is triggered at network startup.

B. Shuffling Degree Calculator (SDC)
The SDC module is responsible for finding the shuffling degree of each host in the network. This module gets the network model from the NTD module and calculates $d_i$ for every $i$ using the connection information provided in $C$. The algorithm performed by the SDC module is shown in Algorithm 1. The list of shuffling degrees is then passed to the SID module.
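A minimal Python sketch of this step is given below. It assumes that $C$ is available as a host-by-server 0/1 matrix and that the shuffling degree of a host is its number of server connections divided by the number of servers. Equation 6 is not reproduced in this section, so this normalisation, like the names `shuffling_degrees` and `connections`, is a hypothetical stand-in and not the paper's Algorithm 1.

```python
from typing import List


def shuffling_degrees(connections: List[List[int]]) -> List[float]:
    """Return one degree per host from a host-by-server 0/1 matrix.

    Assumption: degree = (edges to servers) / (number of servers).
    """
    if not connections:
        return []
    num_servers = len(connections[0])
    degrees = []
    for row in connections:  # one row per host (H rows)
        # Counting the non-zero entries visits S cells, so the whole
        # computation touches S x H cells.
        edges = sum(1 for cell in row if cell)
        degrees.append(edges / num_servers if num_servers else 0.0)
    return degrees


# Hypothetical network with four hosts and two servers.
example = [[1, 1], [1, 0], [0, 0], [0, 1]]
print(shuffling_degrees(example))  # [1.0, 0.5, 0.0, 0.5]
```

Because the result only depends on the connection matrix, it can be cached until the topology changes.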

C. Shuffling Interval Detector (SID)
SID finds the hosts that must be shuffled according to SCEMA. The required information is received from the SDC module. All the reconfigurations and shuffling processes are performed at the beginning of a shuffling interval. Each shuffling interval in our system is a fixed period of time and lasts σ seconds. We have proposed two types of shuffling intervals: Soft intervals and Hard intervals. In Soft intervals, each host is shuffled with a probability equal to its shuffling degree. So, $h_i$ is shuffled with a probability of $d_i$.
In Hard intervals, the μ + ρ hosts that have the highest values of $d_i$ are shuffled. So, we make sure that all the important hosts are shuffled. Each Hard interval comes after δ − 1 Soft intervals. By changing the value of δ, we can change the level of security. Figure 8 shows the first six intervals of the sample network mentioned in section V-C, together with the hosts shuffled in each interval. The value of δ is three in this example. In Hard intervals, the three hosts with the highest degree are always shuffled, whereas in Soft intervals, each host is shuffled with the probability given by its shuffling degree. For example, $h_2$ is shuffled in the first interval but not in the second interval.
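The interval logic described above can be sketched in Python as follows. This is only an illustration under stated assumptions: intervals are numbered from 1, shuffling degrees are given as probabilities in [0, 1], and the function names, the tie-breaking by insertion order, and the sample degrees are all hypothetical rather than taken from the paper.

```python
import random
from typing import Dict, List, Optional


def is_hard_interval(index: int, delta: int) -> bool:
    """Every delta-th interval is Hard, so it follows delta - 1 Soft ones."""
    if delta < 1:
        raise ValueError("delta must be at least 1")
    return index % delta == 0


def hosts_to_shuffle(degrees: Dict[str, float], index: int, delta: int,
                     top_n: int,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Return the set lambda of hosts to shuffle in interval `index`."""
    if rng is None:
        rng = random.Random()
    if is_hard_interval(index, delta):
        # Hard interval: the top_n (= mu + rho) hosts with the highest degree.
        ranked = sorted(degrees, key=lambda h: degrees[h], reverse=True)
        return ranked[:top_n]
    # Soft interval: each host is shuffled with probability equal to its degree.
    return [h for h, d in degrees.items() if rng.random() < d]


# Hypothetical degrees for six hosts, with delta = 3 and mu + rho = 3.
degrees = {"h1": 0.5, "h2": 0.5, "h3": 1.0, "h4": 1.0, "h5": 1.0, "h6": 0.0}
for interval in range(1, 7):
    kind = "Hard" if is_hard_interval(interval, 3) else "Soft"
    print(interval, kind, hosts_to_shuffle(degrees, interval, 3, 3))
```

With δ = 3, intervals 3 and 6 are Hard and always return the same three hosts, while the Soft intervals vary from run to run.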
A flow entry timeout notifies the SID module that the shuffling interval has changed. The SID module then checks the type of the current interval and generates the set of hosts that have to be shuffled in that interval. We name this set λ. λ is then passed to the FEG module, which sets the related flow entries. The OpenFlow message that indicates a flow entry timeout is called OFPT_FLOW_REMOVED. The algorithm of the SID module is shown in Algorithm 2.

Algorithm 2 SID Module
Procedure
    top ← an empty list        ▷ A list storing the μ + ρ highest-degree hosts
    while top has fewer members than μ + ρ do        ▷ A loop to create top
        max ← −1

As long as the network configuration has not changed, the shuffling degrees are fixed, and hence, there is no need to repeat Algorithm 1. The same holds for the first part of Algorithm 2, where the hosts are sorted based on their shuffling degree. The second part of Algorithm 2, where the hosts to be shuffled are selected, is repeated in every time interval. So, there are two parts to the whole procedure of the proposed method. The first, fixed part is of $O(S \times H)$, and the second, repeated part is of $O(H)$. The procedure of most MTD methods can also be divided into the same two parts, where the second part is of $O(H)$. In the fixed part, the degrees/scores of the hosts are calculated, and in the second part, which is repeated in each interval, the hosts to be shuffled are selected. With this division, we can compare the computational complexity of different MTD methods by focusing on the first part. We can say that the IC of SCEMA is $O(S \times H)$. The fixed part of BAP (i.e., its IC) is of $O(S \times k \times o)$, where $o$ is the complexity of finding the vulnerable attack path from the critical server to one of the hosts in the network. The value of $o$ depends entirely on the network topology and the attack path length. The worst case for BAP is when all the hosts are connected to all the other hosts (i.e., a mesh topology). Since, in this case, all the hosts are considered for each hop in the attack path, $o$ is $H^2$, and the total complexity of BAP is $O(S \times k \times H^2)$. In the best case for BAP, the length of the attack path is one, and we have $o = H$. As a result, the best-case complexity of BAP is $O(S \times k \times H)$, which is still higher than the complexity of SCEMA. Moreover, the complexity of SCEMA is independent of the attack path, k.

D. IP Address Selector (IAS)
The IAS module keeps a pool of IP addresses in the network address range. Each address in the pool has a flag that marks it as in use, which avoids conflicts between the used addresses. When a shuffling process is performed and the hosts need new IP addresses, the IAS module randomly selects addresses from its pool whose flags are not set. The selected addresses are passed to the FEG module, and their flags are set.
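The flagged address pool can be sketched in Python as follows. This is an illustration only: the class name `AddressPool`, the CIDR-based constructor, and the `release` method are assumptions introduced here (the text does not say how or when flags are cleared).

```python
import ipaddress
import random
from typing import Optional


class AddressPool:
    """Pool of addresses with an in-use flag per address (hypothetical IAS)."""

    def __init__(self, cidr: str, rng: Optional[random.Random] = None):
        self._rng = rng if rng is not None else random.Random()
        # Flag False means the address is free; True means it is in use.
        self._in_use = {str(ip): False
                        for ip in ipaddress.ip_network(cidr).hosts()}

    def allocate(self) -> str:
        """Pick a random address whose flag is not set, then set its flag."""
        free = [addr for addr, used in self._in_use.items() if not used]
        if not free:
            raise RuntimeError("address pool exhausted")
        choice = self._rng.choice(free)
        self._in_use[choice] = True
        return choice

    def release(self, address: str) -> None:
        """Assumed helper: clear the flag once the address is no longer used."""
        self._in_use[address] = False


pool = AddressPool("10.0.0.0/29")  # six usable host addresses
new_addresses = [pool.allocate() for _ in range(3)]
print(new_addresses)
```

Filtering on the flag before the random choice guarantees that two hosts never receive the same address in one shuffle.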

E. Flow Entry Generator (FEG)
When a shuffling interval is detected by the SID module, the FEG module gets the host information from the SID module and then requests from the IAS module as many new IP addresses as there are hosts in λ. Finally, the FEG module generates appropriate flow rules according to the information received from SID and IAS and sets them on the network switches.

VII. EVALUATION RESULTS
We have compared SCEMA with BAP [26] and TGCESA [15], as they are comparable with SCEMA. However, our main focus is on comparing SCEMA with BAP.

A. Evaluation Metrics
Our design goals are reducing the defense cost and retaining network security. So, we need to measure appropriate metrics to show how well these goals are achieved. The selected metrics are described in the following.
1) Algorithm Complexity: To measure the complexity of our algorithm, we have calculated the time required for finding the important hosts. Both time complexity and space complexity can be used for this metric, which reflects the IC.
2) End-to-End Delay: An efficient security mechanism is one that does not significantly increase the end-to-end delay between the hosts. We have considered end-to-end delay as a metric that can show the SC. Since shuffling a host changes the forwarding paths of the network packets, we expect an extra end-to-end delay when an MTD approach is deployed.
3) Adversary's Success Rate: The adversary's success rate is the ratio of the number of experiments in which the adversary reaches his goal to the total number of experiments.A lower rate for the adversary's success shows a better security performance in SCEMA.
4) Compromised Servers Rate: Even though the adversary's success is reached only when all the servers in the network are compromised, the number of compromised servers is also important in measuring the security level of the network.The compromised servers rate can be calculated as the ratio of the number of compromised servers to the total number of servers.
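The last two metrics are simple ratios and can be computed as in the Python sketch below. The function names and the input format (one count of compromised servers per experiment) are assumptions made for illustration, as is returning 0.0 when there are no experiments.

```python
from typing import List


def adversary_success_rate(compromised_per_run: List[int],
                           total_servers: int) -> float:
    """Share of experiments in which every server was compromised."""
    if not compromised_per_run:
        return 0.0
    wins = sum(1 for count in compromised_per_run if count == total_servers)
    return wins / len(compromised_per_run)


def compromised_servers_rate(compromised: int, total_servers: int) -> float:
    """Ratio of compromised servers to all servers in one experiment."""
    return compromised / total_servers


# Hypothetical results of four experiments on a network with four servers.
runs = [4, 2, 4, 0]
print(adversary_success_rate(runs, 4))   # 0.5
print(compromised_servers_rate(2, 4))    # 0.5
```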

B. Simulation Environment
We have simulated our system, which implements SCEMA, with different network scenarios in Mininet. The hosts are connected through Open vSwitch switches, and the switches are controlled by a single POX controller. We have used the Ubuntu 18.04 operating system, and the simulation machine has 16 GB of RAM and an Intel i7 processor running at 3.2 GHz.
We have defined multiple different network topologies, in all of which, the adversary's node is directly connected to all the host nodes.Three of these networks are shown in Figure 9.
The vulnerabilities of the hosts in the first network are shown in Equation 25. The first eight hosts in the second simulated network are the same as those in Equation 25, and its other hosts are represented in Equation 26.
The first 12 hosts in the third network are the same as those in the second network. The other hosts are given in Equation 27.
The value of V for all the simulated networks is the same, and it is shown in Equation 28.
$$
V = \begin{bmatrix}
0 & 0.8 & 0.6 & 0 & 0.7 & 0.6 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.8 \\
0 & 0 & 0 & 0.7 & 0.4 & 0 & 0.8 & 0.6 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.3 & 0.7 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.5 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.9 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix} \qquad (28)
$$

In the simulation scenarios, if one-third of the hosts connected to a critical server are compromised, the adversary can perform a successful DDoS attack against that server. This means that the values of μ and ρ are different in each scenario. The adversary probes five hosts in each scan, and the scanning interval is 15 seconds on average. To provide fair conditions for comparing different methods with SCEMA, we have considered a fixed number of shuffles in each interval of all the simulation scenarios. We have considered both sequential and uniform random scanning methods in our simulations, based on the threat model defined in section III, to find out how our solution can protect the network against different types of scanning strategies.
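The attack conditions described above can be sketched in Python as follows. Several details are assumptions made for illustration: the one-third threshold is rounded up, a server with no connected hosts is treated as not attackable, sequential scanning wraps around the address list, and random scanning samples without replacement within one scan. The function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Set


def server_is_down(connected_hosts: Sequence[str],
                   compromised: Set[str]) -> bool:
    """True if at least one-third of the connected hosts are compromised."""
    if not connected_hosts:
        return False
    needed = math.ceil(len(connected_hosts) / 3)  # assumed rounding
    return len(set(connected_hosts) & compromised) >= needed


def next_probe_batch(addresses: List[int], scan_index: int,
                     batch_size: int = 5,
                     rng: Optional[random.Random] = None) -> List[int]:
    """Addresses probed in one scan: sequential if rng is None, else random."""
    size = min(batch_size, len(addresses))
    if rng is None:
        start = (scan_index * batch_size) % len(addresses)
        return [addresses[(start + j) % len(addresses)] for j in range(size)]
    return rng.sample(addresses, size)


hosts = ["h1", "h2", "h3", "h4", "h5", "h6"]
print(server_is_down(hosts, {"h1", "h2"}))      # True
print(next_probe_batch(list(range(12)), 2))     # [10, 11, 0, 1, 2]
```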

C. Simulation Results
The obtained results of each metric mentioned in section VII-A are presented in this section.
1) Algorithm Complexity: The time and space complexity of executing BAP and SCEMA are shown in Figure 10. In all the cases, the complexity of our proposed algorithm is lower than that of BAP. Here, k is the number of hosts that are shuffled in an interval. The diagram indicates that the time complexity of BAP increases markedly as both k and the network size increase, whereas our proposed algorithm is almost independent of the network size. To compare the complexity of SCEMA, BAP, and TGCESA together, we have executed them on different networks. Since the complexity of BAP grows as k gets higher, we have only presented the results for BAP with k = 1. TGCESA focuses on shuffling the servers instead of the hosts, so its complexity gets higher as the number of servers grows. The time and space complexity are shown in Figure 11. We can see that the complexity of BAP and TGCESA grows as the number of servers increases. BAP and TGCESA also become more complex when the number of hosts increases. In TGCESA, the hosts that are connected to the shuffled server must be migrated to another server, so the growth in TGCESA complexity with the number of hosts is to be expected. The space complexity of SCEMA does not grow heavily, because a simple array of size H + S is enough for its implementation. The time complexity of SCEMA grows almost linearly.
2) End-to-End Delay: The average values of end-to-end delay in all scenarios are shown in Figure 12. As the graph illustrates, when no MTD approach is deployed, the end-to-end delay is lower than in the shuffling scenarios. However, the point is to consider the trade-off between the end-to-end delay and the security level. BAP and SCEMA cause extra delay; however, the security they bring is acceptable. Moreover, there is only a small difference between the average delay in the BAP and SCEMA scenarios, which indicates that SCEMA does not produce extra delay compared with BAP, and the SC of the proposed method is acceptable.
3) Adversary's Success Rate: Figure 13 illustrates the adversary's success rate in different scenarios. In the simulated scenarios whose results are presented in Figure 13a, the number of shuffled hosts is not the same, and it grows as the number of hosts increases. We have shuffled one-third of the hosts in these scenarios, so that the scale of the MTD changes with the scale of the network. So, in the networks with 9 and 48 hosts, the number of shuffled hosts is 3 and 16, respectively. We have considered this situation to make the results independent of the shuffled set size. However, the adversary's resources are fixed in the simulation scenarios. Hence, the adversary's impact on large-scale networks is low. In other words, in a smaller network and in a larger one alike, the adversary can only probe H hosts. As a result, the army sizes in different networks are almost the same, and in large-scale networks, the army is small compared with the network size and does not have enough power to reach the goal. This is why the graph in Figure 13a is descending. As for the general results, in a defenseless network, which we call Normal, the adversary's success rate is higher than in the cases that utilize a defensive method. Moreover, in all the scenarios, the adversary is more successful when probing a network that deploys BAP than one that deploys SCEMA. This demonstrates that SCEMA is effective in reducing the adversary's success rate.
Another point to mention is that an adversary who uses a sequential scanning method may experience higher success in the presence of an MTD mechanism. When a random scanning method is adopted, the adversary scans both valid and invalid hosts, and the valid ones are shuffled before the adversary can create a collaborating army.

4) Compromised Servers Rate: The compromised servers rate is shown in Figure 14. Again, we see that a Normal network (i.e., one without any defensive methods) has a higher number of compromised servers than the other cases. In addition, even though our goal is not to reduce the number of compromised servers, we can see that this metric is also lower with SCEMA than with BAP.

VIII. CONCLUSION
This paper proposed an SDN-oriented Cost-effective Edge-based MTD Approach, SCEMA, to efficiently mitigate DDoS attacks. SCEMA finds an optimal set of hosts for shuffling to reduce the cost of implementing MTD with acceptable performance. The main idea of SCEMA is to shuffle the hosts with more connections to the critical servers. We proposed a three-layer network model that uses Petri nets to present different security states of the network. We also provided a system architecture that implements SCEMA and simulated this system in Mininet. We observed that SCEMA has lower complexity than previous related MTD methods, and that its complexity is independent of the attack path. Thus, it is a cost-effective solution that can easily be deployed in large-scale networks. The results also show that with our approach, the security level is kept high with a low shuffling cost. We plan to extend SCEMA to virtual networks [28]. Virtualization can split the network into small parts and reduce the cost of implementing the MTD approach. Furthermore, virtualization has the potential to confuse the attacker. Hence, implementing SCEMA in a virtual network may lead to a higher security level. Moreover, we plan to improve SCEMA's performance by focusing on the shuffling intervals in our future research. We can utilize machine learning models to find the optimal shuffling intervals. In other words, we plan to answer two questions in future work on SCEMA using learning approaches: (1) how frequently the hosts must be shuffled, and (2) when the shuffling process must be started.

APPENDIX
The steps of proving Theorem 1 for Case 1 are explained in section V-B. Proving this theorem for the other cases follows similar steps, and we present a sketch of these proofs in this section. We partition these cases into multiple covering subcases and then prove the theorem for all of these possible subcases. Hereafter in this section, $\alpha$ is $co(A_1) - co(A_2)$.
Since α is greater than a non-negative number, it is not negative, and we conclude that co(A 1 ) ≥ co(A 2 ).

B. The Proof for Case 2 When a < b
There exists a positive number, k, where b = a + k. Now, we have two conditions, k = 1 and k > 1, and we prove the theorem for both. The summation of two non-negative numbers is not negative. Hence, we have $\alpha \ge 0$, and we conclude that $co(A_1) \ge co(A_2)$. When k > 1, there exists a positive number, q, where k = q + 1, and so, b = a + q + 1. We have α

When k > 1, a positive number, say q, exists such that k = q + 1. We know that $X_{b+q+1} \subseteq X_{b+q+1}$, and so, $|X_{b+q+1}| - |X_{b+q+1}| \ge 0$. Consequently, $\alpha$ is the summation of three non-negative numbers, and hence $co(A_1) \ge co(A_2)$.

F. The Proof for Case 3 When a < b
We prove that a < b is impossible.Assume for contradiction that a < b is a possible subcase.So, there exists a positive number, k, where b = a+k.Based   We know that x q+1 ⊆ x q+1 and X D+k ⊆ X D+k .Hence, α is the summation of positive values, and consequently, co(A 1 ) ≥ co(A 2 ).

Fig. 1. A sample server shown as the proposed Petri net model. (a) In safe mode. (b) In warning mode. (c) In dangerous mode.

Fig. 2. Comparing the effectiveness of SCEMA and BAP in a sample network. (a) BAP solution: shuffling the hosts regarding their compromising cost (CC). (b) SCEMA solution: shuffling the hosts regarding the number of their connections to the servers.

Fig. 4. Possible cases of BAP and SCEMA selected sets in a bipartite network.

Fig. 5. $N_E$ topology with two critical servers and six typical hosts.

Fig. 9. Three of the simulated network topologies. (a) A network with a single server. (b) A network with two servers. (c) A network with four servers.

Fig. 12. Comparing the average end-to-end delay in the simulated network deploying no MTD approaches (Normal), SCEMA and BAP.

Fig. 14. The evaluation results regarding the compromised servers rate. (a) All scenarios. (b) Sequential scanning. (c) Random scanning.

A. The Proof for Case 2 When a = b
In this case, we have $\alpha = \frac{m}{a}\sum_{i=D}^{S} i|X_i| - M\sum_{i=D}^{S} |X_i|$, and based on Assumption 2, we conclude $|x_a| - |x_a| = \frac{1}{a}\sum_{i=D}^{S} i|X_i|$. Hence, we have Equation 29.

TABLE I THE SUMMARY OF RELATED WORK

on Assumption 2, we have $(a+k)|X_{a+k}| = \sum_{i=1}^{d} i|x_i| + a|X_a| + \sum_{i=a+1}^{a+k} i|X_i|$. As $X_{a+k} \subseteq X_{a+k}$, we obtain $(a+k)|X_{a+k}| \le (a+k)|X_{a+k}|$. As a result, $(a+k)|X_{a+k}|$ is smaller than or equal to $\sum_{i=a+1}^{a+k} i|X_i|$, which contains $(a+k)|X_{a+k}|$. Hence, we have $(a+k)|X_{a+k}| \le \sum_{i=a+1}^{a+k} i|X_i|$. Again from Assumption 2, we obtain $\sum_{i=1}^{d} i|x_i| + a|X_a| \le 0$. Since $\sum_{i=1}^{d} i|x_i|$ is non-negative, we reach Equation 38.