A Multi-Domain Anti-Jamming Strategy Using Stackelberg Game in Wireless Relay Networks

In this paper, we study the influence of a smart jammer on the design of a three-node frequency hopping communication system that uses an amplify-and-forward relay. The jammer is smart in that it senses the frequency and transmit powers used by the legitimate transmitters, namely the source node and the relay, and optimally adjusts its sensing time and jamming power allocation to maximize the damage to the relay system. We jointly consider the time domain and the power domain to design a multi-domain anti-jamming strategy. To model the interaction between the legitimate transmitters and the jammer, we use a Stackelberg game in which the legitimate transmitters act as the leader and the jammer acts as the follower. Based on backward induction, a genetic algorithm based on the exponential distribution is proposed to obtain the optimal frequency-hopping speed and the optimal transmit powers of the legitimate transmitters. Simulation results show that the proposed multi-domain strategy outperforms single-domain schemes and the multi-domain random scheme. Moreover, the optimal placement of the relay is discussed through simulations.


I. INTRODUCTION
In 4G and 5G communication networks, wireless relaying is an effective solution to extend coverage and meet the requirement of high data rates. Due to the open nature of wireless channels, the transmissions from the source node and the relay are exposed to jamming attacks, which seriously degrade the quality of communications [1]. Frequency hopping spread spectrum [2]-[4] is widely used in wireless communication systems that require anti-jamming protection. The main idea of frequency hopping (FH) is to divide the available bandwidth into many adjacent subchannels and change the carrier frequency according to a pseudo-random code generator.
With the development of cognitive radio technology, a smart jammer, which quickly senses the frequency hopping communication signals and immediately injects jamming signals on the detected frequency band [5] with the minimum required power, poses a great challenge to existing defence mechanisms. To deal with smart jamming in wireless FH systems, an intuitive approach is to increase the frequency hopping rate or the transmission power. However, due to hardware limitations, a frequency switching time is incurred whenever the communication system switches frequency bands, and during this time the communication system cannot work [6]. If the frequency hopping speed is too fast, the total time spent on frequency switching increases and the effective communication time decreases; if the frequency hopping speed is too slow, the jammer can detect the signal correctly with higher probability, which results in more precise jamming. Therefore, it is critical to find an optimal frequency hopping speed for FH systems. A similar tradeoff is found in the power domain. Increasing the transmit power improves the legitimate transmission quality, but it also increases the probability of being accurately detected and jammed, which harms the legitimate transmission. Therefore, the joint selection of the optimal FH speed and transmission power for FH communications is an important issue to be solved.
The associate editor coordinating the review of this manuscript and approving it for publication was Marco Martalo.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

II. RELATED WORK
Game theory is a powerful mathematical tool to model and analyze the mutual interactions among players. Among game-theoretic models, the Stackelberg game, which captures the sequential interactions among players, provides a promising approach to strategic decision-making when dealing with smart jamming. So far, the most widely used anti-jamming methods work in the power domain, that is, they use power control to counter a jammer capable of power sensing and power adjustment. In [7], the anti-jamming problem is investigated for wireless communication systems using power control under intelligent interference. A Stackelberg game is used to establish the model, and the optimal communication strategy is obtained by solving for the Nash Equilibrium (NE). In [8], the authors further studied the use of power control to resist intelligent interference in cognitive radio networks with observation errors, and derived the Stackelberg Equilibrium (SE) of the anti-jamming game. It is proved that the user obtains a higher utility at the SE than at the NE. Considering the uncertainty of channel state information and transmission cost information, the SE is derived by a Bayesian Stackelberg game, and the existence and uniqueness of the SE are proved in [9]. Considering rival-type uncertainty, the anti-jamming problem is modeled in [10] as two Bayesian games, which are incorporated into a unified equilibrium scale to obtain the optimal transmit power. In [11], the Stackelberg game is used to solve the anti-jamming problem in a UAV communication network where the drones interfere with each other, and the optimal transmission powers of the UAVs and the smart jammers are derived.
The above works use game theory to model and analyze the dynamic interaction between the smart jammer and the legitimate system, but they take only the power domain into consideration. When FH is used, the time domain and the frequency domain can also be exploited for anti-jamming design, in addition to the power domain. In [12], the authors model the jamming and anti-jamming problem as a stochastic game in the frequency domain and obtain smart channel hopping sequences. In [13], a bimatrix game framework is developed to model the interaction between the transmitter and the jammer, and the NEs of the game are obtained. It has been shown that multi-domain anti-jamming technology can enhance the anti-jamming ability with greater flexibility [14], [15]. In [14], the shortcomings of applying FH and transmission rate adaptation separately are discussed, and the joint use of the two technologies is proposed to counter interference. Power control based on a Stackelberg game and channel switching based on a multi-armed bandit are used jointly in [15] to effectively resist interference attacks in heterogeneous networks.
As for the anti-jamming issues in wireless relay networks, game theory is also used to design anti-jamming strategies [16]-[18]. For the jammed relay system in [16], the source and the relay are represented as the legitimate system, and the interaction between the legitimate system and the jammer is modeled as a noncooperative static game. The existence and uniqueness of the NE are proved in [16]. Under a total source and relay power constraint, the legitimate system and the jammer each optimally allocate power between the listening and forwarding phases. For the power control problem in the relay cooperative anti-jamming system, the interaction between the legitimate system and the jammer is modeled as a Stackelberg game in [17], where the optimal transmission powers of the legitimate system and the jammer are analyzed and the SE is derived. Considering the problem of multi-user power control with incomplete information and observation errors, a Bayesian three-layer Stackelberg game approach is constructed in [18].
Most existing anti-jamming schemes for wireless relay networks are designed in the power domain and for the decode-and-forward protocol. In practice, however, the AF protocol is widely used in wireless relay systems. In this paper, we address the multi-domain anti-jamming problem by jointly optimizing the FH speed and the transmission power for AF relay networks. We model the interaction between the legitimate system and a jammer as a Stackelberg game. The source and the relay, as leaders, communicate first with the optimal hopping speed and transmission power. Based on the detected hopping speed and transmission power, the jammer, as a follower, allocates the signal detection time and the jamming time in the time domain, and allocates the jamming powers of the two hops in the power domain. For the legitimate system, a genetic algorithm is used to obtain the optimal hopping speed and transmission power. For the jammer, the closed-form solution of the optimal parameters is derived for given legitimate system parameters. Finally, the anti-jamming performance of the proposed method is compared with single-domain schemes and the multi-domain random scheme.

III. SYSTEM MODEL AND PROBLEM FORMULATION

A. SYSTEM MODEL
We consider a three-node, two-hop cooperative amplify-and-forward (AF) relay network attacked by a smart jammer, as shown in Fig. 1. The network consists of one source S, one destination D, one trusted relay R and one jammer J. R operates in half-duplex (HD) mode. The channels are assumed to undergo flat fading, with channel state information (CSI) perfectly and globally known at all terminals. Following the widely used path-loss model [19], the channel gains of the source-relay link and the relay-destination link are denoted as $\alpha_{sr} = K (d_0/d_{sr})^{\gamma}$ and $\alpha_{rd} = K (d_0/d_{rd})^{\gamma}$, where $K$ is a coefficient that depends on the antenna characteristics and the average channel loss, $d_0$ is the reference distance of the antenna far field, $\gamma$ is the path-loss factor, and $d_{sr}$ and $d_{rd}$ denote the distances of the source-relay link and the relay-destination link. Similarly, the channel gains of the jammer-relay link and the jammer-destination link are denoted as $\alpha_{jr} = K (d_0/d_{jr})^{\gamma}$ and $\alpha_{jd} = K (d_0/d_{jd})^{\gamma}$, where $d_{jr}$ and $d_{jd}$ denote the distances of the jammer-relay link and the jammer-destination link, respectively. Let $h_{sr}$, $h_{rd}$, $h_{jr}$ and $h_{jd}$ denote the complex channel coefficients between S and R, R and D, J and R, and J and D, respectively. The communication takes place in two phases due to the HD mode. In the listening phase, R receives the signal transmitted by S, and in the forwarding phase, R forwards the received signal to D. The jammer interferes in both the listening and the forwarding phases. Following [16], the received signal at D under interference is expressed in terms of $s$, $s_{j1}$ and $s_{j2}$, which are assumed to be independent zero-mean Gaussian signals with powers $P_s$, $P_{j1}$ and $P_{j2}$, respectively, and of $n_r$ and $n_d$, the zero-mean Gaussian noises at R and D with variances $N_r$ and $N_d$, respectively.
Here $a$ is the amplifying weight applied at the relay. It is assumed that the source and the relay have maximum power constraints $P_{s\max}$ and $P_{r\max}$, respectively, and that the jammer has a total power constraint $P_J$, so that $P_s < P_{s\max}$, $P_r < P_{r\max}$ and $P_{j1} + P_{j2} = P_J$. Define $\beta = P_{j1}/P_J$ as the power allocation factor of the jammer. The received signal-to-interference-plus-noise ratio (SINR) at D, $\mathrm{SINR}_d$, is given by Equ. (3). The legitimate system uses FH and power control for anti-jamming, and the jammer optimizes its parameters in the time domain and the power domain. The legitimate system hops over $M$ sub-bands according to a pre-specified pseudo-noise sequence. A schematic diagram of a typical FH signal structure is shown in Fig. 2. The FH period is $T = nT_1$ with $T \in [0, T_{\max}]$, where $T_{\max}$ is the maximum FH period, $T_1$ is the duration of the listening and forwarding time, and $n$ is a positive integer. That is, the legitimate system can adaptively adjust the FH period within the range from 0 to $T_{\max}$ according to the parameters of the jammer, and the FH period can only be changed after D has received the information. Due to device limitations, there is inevitably an unstable transient process when the signal frequency is switched. During this process, the FH communication system neither transmits nor receives signals [6]. Let the duration of the transient process be the frequency switching time $T_c$, which is a fixed value determined by the hardware. In each FH period, the smart jammer starts its jamming attack as soon as the legitimate transmission is detected. Therefore, each FH period $T$ can be divided into two parts, the signal detection time $T_d$ and the jamming time $T_j$, with $T = T_d + T_j$. For convenience, we set $T_j = mT_1$ with $m < n$; since $T = nT_1$, the jammer spends the first $n - m$ listening-and-forwarding intervals on detection and jams during the remaining $m$ intervals.
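To make the two-hop model above concrete, the following is a minimal numerical sketch of the path-loss gain and of the end-to-end SINR at D. It assumes the standard textbook AF form, in which the relay scales its received signal so that its transmit power equals $P_r$; this is an assumption and is not copied from Equ. (1)-(3). The function names and the effective power gains `g_sr`, `g_rd`, `g_jr`, `g_jd` (each standing for a path-loss gain multiplied by a squared fading magnitude) are hypothetical.

```python
def path_loss_gain(K, d0, d, gamma):
    """Channel gain alpha = K * (d0 / d)**gamma from the path-loss model."""
    return K * (d0 / d) ** gamma


def af_sinr(Ps, Pr, PJ, beta, g_sr, g_rd, g_jr, g_jd, Nr, Nd):
    """End-to-end SINR at D for an AF relay jammed in both phases.

    Assumed textbook form (not taken from the paper's equations):
    the relay normalises its received power to Pr, so D sees the amplified
    signal, the amplified relay-side jamming and noise, and its own
    jamming and noise.
    """
    Pj1 = beta * PJ           # jamming power in the listening phase
    Pj2 = (1.0 - beta) * PJ   # jamming power in the forwarding phase
    a2 = Pr / (g_sr * Ps + g_jr * Pj1 + Nr)   # squared amplifying weight
    signal = a2 * g_rd * g_sr * Ps
    disturbance = a2 * g_rd * (g_jr * Pj1 + Nr) + g_jd * Pj2 + Nd
    return signal / disturbance


if __name__ == "__main__":
    clean = af_sinr(1.0, 1.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
    jammed = af_sinr(1.0, 1.0, 10.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
    print(clean, jammed)
```

Setting `PJ = 0` in this sketch gives the interference-free SNR at D, which is how $\mathrm{SNR}_d$ relates to $\mathrm{SINR}_d$ under the same assumptions.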
The signal detection probability depends on the detection time and the transmit signal power: the longer the detection time or the greater the power of the signal being detected, the higher the detection probability. Define $P_d$ as the detection probability of the FH communication signal. Based on the study of energy detector performance in [20], $P_d$ is expressed in terms of $\binom{M-1}{m}$, the number of combinations of $m$ elements taken from $M - 1$ different elements, and of $\mathrm{SNR}_j$, the received signal-to-noise ratio (SNR) at J, where $N_j$ is the variance of the zero-mean Gaussian noise at J. The average received SINR of the legitimate system in an FH period then consists of two terms. The first term represents the case in which D is not interfered with by J, that is, J is performing detection and does not detect the existence of the legitimate transmission; here $\mathrm{SNR}_d$ is the received SNR at D without interference, i.e., the value of the SINR at D when no jamming signal is present. The second term represents the case in which J detects the legitimate transmission and performs jamming; here $\mathrm{SINR}_d$ is the SINR at D given in Equ. (3). The average SINR is the key indicator of communication reliability, so we base the utility function on it. Considering the energy constraint imposed by the capacity-limited batteries of practical wireless devices, the transmission should be power-efficient. Therefore, we take the power cost into consideration when formulating the legitimate system's utility function. We define the utility of the legitimate system as in Equ. (8), where $C_s$ and $C_r$ are the unit power costs of the source and the relay, respectively. The jammer is assumed to be supplied by the power grid, which constructs a worst case for the legitimate communication system. Compared with battery-supplied devices, the power cost of the jammer is negligible.
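The two-term average described above can be illustrated with a short sketch. The exact expressions of the average SINR and of the utility are not reproduced here; the time-weighted mixture below, and the utility as average SINR minus linear power costs, are assumptions consistent with the verbal description only. The detection probability `Pd` is passed in as a number, the switching time $T_c$ is ignored, and all function names are hypothetical.

```python
def average_sinr(T, Td, Pd, snr_d, sinr_d):
    """Assumed average SINR over one FH period of length T.

    The jammer can only jam during T - Td, and only if detection succeeded
    (probability Pd). The rest of the period is interference-free.
    """
    jam_fraction = Pd * (T - Td) / T
    return (1.0 - jam_fraction) * snr_d + jam_fraction * sinr_d


def legit_utility(avg_sinr, Ps, Pr, Cs, Cr):
    """Assumed leader utility: average SINR minus linear power costs."""
    return avg_sinr - Cs * Ps - Cr * Pr


if __name__ == "__main__":
    avg = average_sinr(T=10.0, Td=4.0, Pd=0.5, snr_d=100.0, sinr_d=10.0)
    print(avg, legit_utility(avg, Ps=1.0, Pr=1.0, Cs=0.4, Cr=1.2))
```

Under this assumed form, the jammer influences the average only through the product $P_d (T - T_d)$ and through $\mathrm{SINR}_d$, which matches the way the follower problem is later split into a power-allocation step and a detection-time step.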
We formulate the jammer's utility function as in Equ. (9). Given these utility functions, the legitimate system and the jammer each aim to maximize their own utility value. The multi-domain optimization problem can be solved by backward induction. Based on the detection results of the FH signals, the jammer, as the follower, determines the optimal detection time $T_d$ and the power allocation factor $\beta$ by solving the optimization problem (10). Similarly, as leaders, the source and the relay determine the optimal transmit powers and the FH period by solving the optimization problem (11). Next, we propose an optimization method based on the genetic algorithm (GA) to find the optimal solutions of the legitimate system and the jammer.
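Backward induction, as used above, can be sketched on a small discrete game: for every leader action the follower's best response is computed first, and the leader then picks the action whose induced outcome is best for itself. The sketch below is generic exhaustive enumeration, not the GA-based method of this paper, and the toy utilities in the usage example are hypothetical.

```python
def stackelberg_by_enumeration(leader_actions, follower_actions,
                               u_leader, u_follower):
    """Return (leader action, follower best response, leader utility)."""
    best = None
    for a in leader_actions:
        # Follower observes a and maximises its own utility.
        b = max(follower_actions, key=lambda fb: u_follower(a, fb))
        value = u_leader(a, b)
        # Leader anticipates this response and keeps the best outcome.
        if best is None or value > best[2]:
            best = (a, b, value)
    return best


if __name__ == "__main__":
    # Toy game: the follower tries to match the leader's action,
    # and the leader is penalised by the follower's action.
    result = stackelberg_by_enumeration(
        [0, 1, 2], [0, 1, 2],
        u_leader=lambda a, b: 3 * a - a * a - b,
        u_follower=lambda a, b: -(b - a) ** 2,
    )
    print(result)
```

Enumeration is only practical for small action sets; for the continuous powers and the discrete FH period considered here, the search over leader actions is what the GA replaces.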

IV. THE MULTI-DOMAIN OPTIMIZATION STRATEGY BASED ON STACKELBERG GAME
The expression of the signal-to-interference-plus-noise ratio becomes quite complicated under the AF protocol. Therefore, we cannot derive the Stackelberg Equilibrium directly.
In this section, we propose an optimization method based on GA to obtain the optimal multi-domain parameters of the follower and the leaders successively.

A. FOLLOWER SUB-GAME
In a Stackelberg game, backward induction is an effective method to obtain the optimal solution. That is, the jammer achieves its maximum utility value by observing the transmission powers of the source and the relay and the FH period of the legitimate system. Therefore, we first solve for the optimal jammer parameters $\beta^*$ and $T_d^*$ for given parameters of the legitimate system. Since the parameter $\beta$ only affects $\mathrm{SINR}_d$, the optimization problem (10) can be split into two sub-problems that are solved in sequence: $\mathrm{SINR}_d$ is first minimized over $\beta$ to obtain $\beta^*$, and $T_d$ is then optimized. The jammer's optimal power allocation $\beta^*$ is given in closed form, where $\beta = 1/2$ and $[x]_0^1 = \min(1, \max(0, x))$. The convex optimization problem that yields $\beta^*$ has been studied in Lemma 1 of [16].
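The clipping operator $[x]_0^1$ and the minimization of $\mathrm{SINR}_d$ over $\beta$ can be checked numerically with a few lines. The closed-form $\beta^*$ itself is not reproduced here; the grid search below is a brute-force stand-in that can be compared against any closed form, and `sinr_of_beta` is a hypothetical callable supplied by the user.

```python
def clip01(x):
    """The operator [x]_0^1 = min(1, max(0, x))."""
    return min(1.0, max(0.0, x))


def best_beta_grid(sinr_of_beta, steps=1000):
    """Brute-force minimiser of SINR_d over beta in [0, 1].

    sinr_of_beta: callable mapping beta to SINR_d (supplied by the caller).
    """
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=sinr_of_beta)


if __name__ == "__main__":
    # Toy convex objective with its unconstrained minimum at beta = 0.3.
    print(best_beta_grid(lambda b: (b - 0.3) ** 2))
    # An unconstrained optimum outside [0, 1] is clipped to the boundary.
    print(clip01(1.7), clip01(-0.3))
```

Because the objective is convex in $\beta$, an unconstrained stationary point that falls outside $[0, 1]$ maps to the nearest boundary, which is exactly what the clipping operator expresses.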
Theorem 1: The optimal value $T_d^*$ under the discrete constraints is determined by an integer $m^*$, defined through an arg min over the discrete feasible values, and by $T_{d1}^*$, which is given in Equ. (21). Proof: After determining $\beta^*$ to minimize $\mathrm{SINR}_d$, we optimize $T_d$ to maximize the jammer's utility value $U_j$, and Equ. (9) simplifies to Equ. (18). Because $T_d$ is a discrete variable, the optimal value $T_{d1}^*$ is first found without the discrete constraints, and the optimal value $T_d^*$ is then found in the discrete feasible region. Once the parameters $P_s$, $P_r$ and $T$ of the legitimate system and $\beta$ of the jammer are fixed, Equ. (18) shows that the jammer's utility value $U_j$ depends only on $P_d (T - T_d)$, which is a concave function of $T_d$. Therefore, the SNR for which $\left. \partial [P_d (T - T_d)] / \partial T_d \right|_{T_d = 0} = 0$ is the threshold at which the jammer switches between different optimal jamming power strategies. Based on Equ.s (10)-(14) in [21], the approximate optimal detection time $T_{d1}^*$ of the jammer can be derived in closed form, as given in Equ. (21).
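The step from the unconstrained optimum $T_{d1}^*$ to the discrete optimum can be sketched as follows. Because the objective is concave in $T_d$, it is enough to compare the feasible grid points adjacent to the continuous optimum. The sketch assumes that feasible detection times are $T_d = T - mT_1$ with $1 \le m \le n - 1$; this range, the function name and the toy objective are assumptions, and the actual arg min rule of Theorem 1 is not reproduced.

```python
import math


def discrete_detection_time(td_cont, T, T1, objective):
    """Pick the best feasible T_d = T - m*T1 next to the continuous optimum.

    objective: callable T_d -> value to maximise (assumed concave in T_d).
    Returns (T_d, m).
    """
    n = round(T / T1)
    m_cont = (T - td_cont) / T1
    # Neighbouring integers of the continuous solution, kept inside 1..n-1.
    candidates = {min(n - 1, max(1, m))
                  for m in (math.floor(m_cont), math.ceil(m_cont))}
    m_star = max(candidates, key=lambda m: objective(T - m * T1))
    return T - m_star * T1, m_star


if __name__ == "__main__":
    # Toy objective: saturating detection probability times jamming time.
    toy = lambda td: (1.0 - math.exp(-td)) * (10 - td)
    print(discrete_detection_time(2.2, 10, 1, toy))
```

For a non-concave objective, comparing only the two neighbours would not be sufficient and all feasible values of $m$ would have to be evaluated.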

B. LEADER SUB-GAME
The optimal $\beta^*$ and $T_d^*$ are substituted into Equ. (8), and the source and the relay, as leaders, optimize $P_s$, $P_r$ and $T$. Because GA performs well on optimization problems with discontinuities, constrained parameters and a large number of dimensions, we use a GA built on the jammer's optimal solution to solve the optimization problem (11). To solve this nonlinear bilevel programming problem, as in [22], we consider a genetic algorithm based on the exponential distribution (GAED), which modifies two main steps of the GA, namely the evaluation and crossover operations. The steps of the GAED are described as follows.

1) INITIAL POPULATION
First, we create the initial population pop(0), which includes $s$ individuals, each represented by a leader strategy $S^0_{Li}$, $i = 1, 2, \cdots, s$. For each $S^0_{Li}$, the corresponding optimal solution for the jammer, $S^0_{Ji} = (\beta_i, T_{di})$, can be calculated by Equ.s (15) and (16). The parameters of the individuals in the initial population are randomly generated subject to the bound and discreteness constraints.
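The initialization step can be sketched as below, assuming that each leader individual is the triple $(P_s, P_r, T)$ optimized in the leader sub-game and that $T = nT_1$ with an integer $n$ between 1 and $T_{\max}/T_1$. The function name and the argument `n_max` are hypothetical, and the follower's best response for each individual is left to the closed-form expressions referenced above.

```python
import random


def initial_population(s, ps_max, pr_max, n_max, T1, rng=None):
    """Draw s random leader individuals (Ps, Pr, T) inside the constraints.

    Powers are uniform in [0, max]; the FH period is T = n*T1 with an
    integer n in 1..n_max, which enforces the discreteness of T.
    """
    rng = rng or random.Random()
    population = []
    for _ in range(s):
        ps = rng.uniform(0.0, ps_max)
        pr = rng.uniform(0.0, pr_max)
        n = rng.randint(1, n_max)
        population.append((ps, pr, n * T1))
    return population


if __name__ == "__main__":
    pop0 = initial_population(30, 1.0, 1.0, 10, 1.0, rng=random.Random(0))
    print(len(pop0), pop0[0])
```

Passing a seeded `random.Random` instance makes a run reproducible, which is convenient when comparing the GA result with a traversal search.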

2) EVALUATION
For each individual in a population pop(k), the fitness value $R(S^k_{Li}, S^k_{Ji})$ is defined as in Equ. (22), where $\eta$ is a sufficiently large positive number. Let $S_k^*$ denote the individual with the largest fitness value among all the values of $R(S^k_{Li}, S^k_{Ji})$, $i = 1, 2, \cdots, s$. Similarly, let $S_k$ denote the individual with the largest utility value of the legitimate system among all the individuals inherited from pop(k − 1).

3) CROSSOVER
This algorithm uses $S_k^*$ and $S_k$ to improve the search efficiency, as follows. Among the $s$ individuals of the $k$th generation, several individuals are randomly selected for crossover with probability $p_c$. The new individual generated by the crossover of $S^k_{Li}$ can be written as $S^k_{Li} + \mu (Q_k - S^k_{Li})$, where $Q_k$ is to be chosen based on $S_k^*$ and $S_k$, $\mu \in [0, \tau]$, and $\tau$ is a limiting factor such that $S^k_{Li} + \tau (Q_k - S^k_{Li})$ does not leave the feasible region. It can be seen from this expression that the crossover operator uses the vector $Q_k$ to provide a crossover direction. The problem now becomes how to choose $Q_k$ reasonably. Generally speaking, after a certain number of iterations, the vector $S_k^*$ is a feasible solution, while $S_k$ is often not. Here $S_k^*$ is the vector with the largest fitness value that satisfies the constraints, and $S_k$ is the vector with the largest utility value of the legitimate system. From the perspective of fitness value, $S_k^*$ is better than $S_k$, but $S_k$ can provide a possible crossover direction. Therefore, we want $Q_k$ to lie close to $S_k^*$ with a higher probability than close to $S_k$. The exponential distribution has this property. The selection of $Q_k$ proceeds step by step. First, let the random variable $D$ follow an exponential distribution with rate $\lambda > 0$, whose probability density function is $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$. Secondly, take a sufficiently large number $h$ so that $\mathrm{prob}(D \in (h, \infty))$ is sufficiently small, and divide $[0, h]$ into $l$ sub-intervals $h_1, h_2, \cdots, h_l$ of equal length. Divide the difference vector between $S_k^*$ and $S_k$ into $l$ corresponding sub-intervals. Finally, according to the roulette selection method, an interval $h_i$ is selected with probability $\mathrm{prob}(D \in h_i)$, and the parameter vector $Q_k$ is selected randomly within the sub-interval corresponding to $h_i$.

4) MUTATION
Individuals to be mutated are selected randomly from pop(k) with the mutation probability $p_m$. A Gaussian mutation operator is used: the offspring of the parent $S^k_{Li}$ is obtained as $S^k_{Li} + \varepsilon$, where $\varepsilon$ is a Gaussian vector whose elements are i.i.d. and distributed as $N(0, \sigma_p^2)$. The steps of the proposed GAED algorithm are outlined in Algorithm 1.

Algorithm 1: The GAED algorithm
... $S^k_{Li}$ is mutated with a mutation operator to obtain its offspring. Then, the corresponding lower-level optimal solution $S^k_{Ji}$ is obtained. The set of all mutation offspring is denoted as $O_2$;
5: Selection: Select the best $N_1$, $N_1 < s$, individuals from the set pop(k) ∪ $O_1$ ∪ $O_2$. The remaining $s - N_1$ individuals are randomly selected from the remaining individuals of this set. These two parts constitute the next-generation population pop(k + 1); update $S_k^*$ and $S_k$.
6: Iteration: If the termination condition is true, stop; otherwise, let k = k + 1 and go to step 2.
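The crossover direction selection and the two genetic operators described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rate `lam`, the cutoff `h`, the number of sub-intervals `l`, and the mapping of sub-interval $i$ of $[0, h]$ to the $i$th segment of the line from $S_k^*$ to $S_k$ are all assumptions, and the function names are hypothetical.

```python
import math
import random


def choose_q(s_best, s_alt, lam=1.0, h=5.0, l=10, rng=random):
    """Pick Q_k on the segment from s_best (S_k^*) to s_alt (S_k).

    Sub-interval i of [0, h] is chosen by roulette selection with weight
    prob(D in h_i) for an exponential D, so segments near s_best are
    favoured. Q_k is then drawn uniformly inside the matching segment.
    """
    width = h / l
    cdf = lambda x: 1.0 - math.exp(-lam * x)
    weights = [cdf((i + 1) * width) - cdf(i * width) for i in range(l)]
    i = rng.choices(range(l), weights=weights, k=1)[0]
    t = rng.uniform(i / l, (i + 1) / l)   # position along the segment
    return [b + t * (a - b) for b, a in zip(s_best, s_alt)]


def crossover(parent, q, tau=1.0, rng=random):
    """Move the parent towards Q_k by a random step mu in [0, tau]."""
    mu = rng.uniform(0.0, tau)
    return [p + mu * (qi - p) for p, qi in zip(parent, q)]


def gaussian_mutation(parent, sigma, rng=random):
    """Add i.i.d. N(0, sigma^2) noise to every component of the parent."""
    return [p + rng.gauss(0.0, sigma) for p in parent]


if __name__ == "__main__":
    rng = random.Random(1)
    q = choose_q([0.2, 0.3, 4.0], [0.9, 0.8, 2.0], rng=rng)
    child = crossover([0.5, 0.5, 3.0], q, rng=rng)
    print(q, child, gaussian_mutation(child, 0.05, rng=rng))
```

In a full implementation, offspring produced by `crossover` and `gaussian_mutation` would still need to be projected back onto the feasible region, for example by clipping the powers and rounding the FH period to a multiple of $T_1$.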

V. SIMULATION RESULTS AND DISCUSSION
In this section, we first demonstrate the optimal parameters in time-domain and power-domain obtained by the GAED algorithm. Then we compare the proposed multi-domain anti-jamming strategy with single-domain schemes and the multi-domain random scheme. Finally, the influence of different parameters on the utility values of the legitimate system and the jammer are discussed.
In the simulation, we assume that the duration $T_1$ is 1 ms and the frequency switching time $T_c$ is 0.5 ms. The unit power costs of the source and the relay are set as $C_s = C_r = 0.7$. The transmit power of the jammer is $P_J = 10$ W, and the maximum power constraints of the source and the relay are set as $P_{s\max} = P_{r\max} = 1$ W. The noise power is $N_c = N_r = N_j = N_d = -50$ dBm. The number of optional channels $M$ is 32, the initial population number $s$ is 30, the crossover probability $p_c$ and the mutation probability $p_m$ are 0.
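For reference, the settings above can be collected in one place, with the noise power converted from dBm to watts so that it is in the same unit as the transmit powers. The dictionary keys and the helper name are hypothetical; the crossover and mutation probabilities are left out because their values are not fully given above.

```python
def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to watts."""
    return 10.0 ** ((p_dbm - 30.0) / 10.0)


SIM_PARAMS = {
    "T1_s": 1e-3,          # duration T_1 (1 ms)
    "Tc_s": 0.5e-3,        # frequency switching time T_c (0.5 ms)
    "Cs": 0.7,             # unit power cost of the source
    "Cr": 0.7,             # unit power cost of the relay
    "PJ_W": 10.0,          # total jammer power
    "Ps_max_W": 1.0,       # maximum source power
    "Pr_max_W": 1.0,       # maximum relay power
    "noise_W": dbm_to_watt(-50.0),   # -50 dBm expressed in watts
    "M": 32,               # number of optional channels
    "pop_size": 30,        # initial population number s
}

if __name__ == "__main__":
    print(SIM_PARAMS["noise_W"])
```

A noise level of −50 dBm corresponds to $10^{-8}$ W, which is eight orders of magnitude below the 1 W power limits of the source and the relay.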

A. THE OPTIMAL POWER-DOMAIN AND TIME-DOMAIN PARAMETERS
The accuracy of the GAED algorithm is illustrated by Fig. 3, Fig. 4 and Fig. 5. Fig. 3 and Fig. 4 show that the GAED algorithm can find the optimal solution $(P_s^*, P_r^*, T^*)$ of the legitimate system, and Fig. 5 shows that it can find the optimal solution $(\beta^*, T_d^*)$ of the jammer. In the simulations of this part, the unit power costs of the source and the relay are set as $C_s = 0.4$ and $C_r = 1.2$.
Fig. 3 shows the legitimate system's utility value for each value of $(P_s, P_r)$, calculated by traversal search at the optimal FH period $T^*$. The blue triangle and the red triangle represent the optimal transmit powers of the source and the relay $(P_s^*, P_r^*)$ found by the GAED algorithm and by the traversal search, respectively. It can be seen that the optimal transmit powers $(P_s^*, P_r^*)$ found by the GAED algorithm are consistent with the maximum found by the traversal search. For fixed optimal powers of the source and the relay, the optimal FH period $T$ found by the GAED algorithm is compared in Fig. 4 with the optimal value obtained through the traversal search. We can see that the optimal FH period obtained by the GAED algorithm also coincides with that from the traversal search. Under this parameter setting, we also compare the complexity of the traversal search and of the proposed algorithm. The complexity of the traversal search is approximately O(1000), and the complexity of the proposed algorithm is approximately O(Ks) = O(300). Therefore, the complexity of the proposed algorithm is much lower than that of the traversal search.
From Fig. 5, we can see that the GAED algorithm also finds the optimal $(\beta^*, T_d^*)$. Moreover, the utility value of the jammer is mainly affected by the detection time $T_d$, while the power allocation factor $\beta$ has little effect. The reason is that, from Equ.s (6) and (9), the jammer's utility is dominated by $\mathrm{SNR}_d$, whereas the parameter $\beta$ mainly affects $\mathrm{SINR}_d$ and not $\mathrm{SNR}_d$.

B. THE UTILITY COMPARISON
The comparisons of the proposed strategy with the single-domain schemes and the multi-domain random scheme are shown in Fig. 6. The multi-domain random scheme selects all parameters $(P_s, P_r, T, \beta, T_d)$ randomly. There are two kinds of single-domain schemes: the power-domain only scheme and the time-domain only scheme. In the power-domain only scheme, the FH period $T$ is selected by blind FH, the optimal power $(P_s, P_r)$ is obtained by traversal search, and the optimal detection period $T_d$ and the power allocation $\beta$ are given by the closed-form solution derived for the GAED algorithm. In the time-domain only scheme, the optimal FH period $T$ is obtained by traversal search, the power $(P_s, P_r)$ is selected randomly, and the optimal detection period $T_d$ and the power allocation $\beta$ are again given by the derived closed-form solution. The power-domain only scheme optimizes the transmit power of the legitimate system and the time-domain only scheme optimizes the FH period, while the proposed scheme jointly optimizes the parameters in both the power domain and the time domain. Therefore, the proposed strategy achieves the largest utility value of the legitimate system among the four anti-jamming schemes. The legitimate system's utility value under the multi-domain random scheme is the lowest, since its parameters are selected randomly. Because the legitimate system is the leader and has the first-mover advantage, the proposed strategy yields the lowest utility value of the jammer, while the multi-domain random scheme yields the largest. These comparisons show that the proposed strategy has clear advantages over the single-domain schemes and the multi-domain random scheme.
Fig. 7 shows the influence of the jamming power on the proposed strategy. First, it can be seen that under the current parameter settings, the GAED algorithm tends to converge after 15 iterations.
It can also be seen that the jamming power has little influence on the utility value of the legitimate system. This is because the utility value mainly comes from $\mathrm{SNR}_d$ in Equ. (8), which is free of interference. For a similar reason, increasing the jamming power has only a weak impact on the utility value of the jammer.
In Fig. 8, the effects of the transmit power constraints of the source and the relay on the proposed multi-domain anti-jamming strategy are investigated. First, we can see that as $P_{s\max}$ and $P_{r\max}$ increase, the number of iterations required for the GAED algorithm to converge increases. This is because the feasible region expands when $P_{s\max}$ and $P_{r\max}$ increase, and the algorithm needs more iterations to find the optimal solution. We can also see that there is a bottleneck effect between the two hops. As long as the maximum power limit of either the source or the relay is 1 W, the utility value of the legitimate system remains relatively low, at about 33. Only when the maximum power limits of the source and the relay are both 5 W does the utility value of the legitimate system increase and the utility value of the jammer decrease under the proposed strategy.

C. THE INFLUENCE OF KEY PARAMETERS ON UTILITY
In Fig. 9, the effects of the unit power costs of the source and the relay on the proposed strategy are investigated. As $C_s$ or $C_r$ increases, the utility value of the legitimate system decreases and the utility value of the jammer increases. The effects of the power costs of the source and of the relay on the utility values are similar. The effect of the relay location on the utility value is shown in Fig. 10. Through simulations with different jammer locations, we find that the optimal relay location is around the middle point of the source-destination link. This is because the source and the relay can automatically adjust their transmit powers, and the utility value is maximal when the relay is in the middle.

VI. CONCLUSION
In this paper, we propose a multi-domain anti-jamming strategy for a wireless AF relay system using FH. We use a Stackelberg game to model the interaction between the legitimate transmitters and the jammer, in which the legitimate transmitters are the leader and the jammer is the follower. Based on backward induction, a GAED algorithm is proposed to find the optimal parameters of the legitimate system and the jammer. The simulation results show that the GAED algorithm can accurately find the optimal solution in the time domain and the power domain. The impacts of the jamming power and the unit power costs on the utility values are analyzed, as is the bottleneck effect of the power constraints on the relay network performance. Numerical simulations show that, with the proposed strategy, the optimal relay location under smart jamming is around the middle point of the source-destination link.