Intelligent Secure Communication for Cognitive Networks With Multiple Primary Transmit Power

In this paper, we study an intelligent secure communication scheme for cognitive networks with multiple levels of primary transmit power, where a secondary Alice transmits its confidential data to a secondary Bob under the threat of a secondary attacker. The secondary nodes limit their transmit power to one of multiple levels, in order to maintain the quality of service of the primary network. The attacker can work in an eavesdropping, spoofing, jamming or silent mode, which can be viewed as the action in the traditional Q-learning algorithm. On the other hand, the system can adaptively choose the transmit power level among multiple ones to suppress the intelligent attacker, which can be viewed as the state in the Q-learning algorithm. Accordingly, we first formulate this secure communication problem as a static secure communication game with Nash equilibrium (NE) between the main link and the attacker, and then employ the Q-learning algorithm to select the transmit power level. Simulation results are finally presented to verify that the intelligent attacker can be effectively suppressed by the proposed scheme.


I. INTRODUCTION
In recent years, there has been much progress in the development of wireless communications [1]-[4], in order to tackle the increasing challenges of wireless big data [5]-[8] and mobile edge computing [9]-[12]. Among the emerging techniques, the cognitive technique can be viewed as a novel approach to improving spectrum utilization effectively, and it has been recognized as a smart wireless communication technology for the limited radio spectrum [13]-[17]. When the interference from the secondary users to the primary ones [18] is below a given level or the spectrum is not used, the secondary users are allowed to access the spectrum of the primary users. The channel capacity of the secondary users is limited by the interference power that the primary users can tolerate in the cognitive network [19]-[21]. Thus, it is of vital importance that the secondary user makes use of the spectrum while limiting its interference to the primary user. Most studies on cognitive networks focus on channel identification, spectrum detection and management, and power allocation.
The associate editor coordinating the review of this manuscript and approving it for publication was Min Jia.
In practice, the primary transmit power can stay at a single level when the transmitted services are fixed. When the transmitted services vary, the primary users should use multiple levels of transmit power in order to provide better performance [22], [23].
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
On the other hand, with the rapid development of wireless networks, these networks have become closely tied to people's private communication [24]-[26], and the security of wireless communication networks has received more and more attention. Wireless communication security has recently become an important research topic, with a main focus on the physical-layer security of wireless networks. Traditional encryption techniques rely on application-layer operations, which incur much higher computational complexity [27]. In contrast, the physical-layer security mechanism can make it more difficult for attackers to decipher the transmitted information. In [28], [29], physical-layer security has been proposed to safeguard data confidentiality in 5G wireless communication networks. Besides the above research, there have been some studies on newly developed materials, which can be used in wireless networks for both transmission and improving the environments [30]-[33].
Most of the research works on physical-layer security focus on a fixed-mode attacker, which ignores the fact that the attacker can change its mode in order to increase the attack rate. In practice, wireless communication networks are becoming more vulnerable with the rise and development of new intelligent attackers such as unmanned aerial vehicles (UAVs) [34]. Smart attackers can perform many types of attack based on the environment of the wireless network, including eavesdropping, jamming and spoofing [35], [36]. In order to improve the security performance of communication and reduce the security loss caused by failure to detect attacks in time, many studies have focused on detecting smart attackers and suppressing their attacks.
Specifically, in [37], [38], a Q-learning based power allocation algorithm has been applied to strengthen the secrecy capacity under UAV smart attack. The work in [39] has proposed a power control strategy to suppress the intelligent attacks by using some advanced signal processing techniques such as beamforming and filtering.
In this paper, we consider a wireless communication system where a secondary user wants to communicate with another secondary user under the constraint of the primary user in the cognitive radio network. Meanwhile, a secondary intelligent attacker exists in this network, and it can work in the eavesdropping, jamming and spoofing modes. In order to improve the security performance of the communication system, a static secure communication game with Nash equilibrium (NE) between the main link and the attacker is formulated. We further propose a power control strategy based on the Q-learning algorithm to select the transmit power level of the secondary user within the range of tolerant interference power of the primary user. The attacker can select its attack mode among eavesdropping, jamming, spoofing or keeping quiet, according to the practical environments and the cost. By using the Q-learning algorithm, the transmitter eventually obtains the optimal transmit power to improve the system secrecy capacity. Simulation results validate that the intelligent attacker can be effectively suppressed by the proposed scheme in the cognitive radio network.
The main contributions of this work are summarized as follows:
• A secure transmission problem in cognitive radio networks under a smart attacker is investigated in this paper, and it is formulated as a static secure communication game with an NE strategy.
• A Q-learning algorithm is introduced to determine the transmit power of the secondary user, which should not be larger than the peak level of the tolerant interference power. The Q-learning algorithm can improve the secrecy capacity of the secondary user and suppress smart attacks under the constraint of the primary user.
The outline of this paper is given as follows. Section II describes the system model of the cognitive radio network. Then, in Section III, we study the secure communication game and present the NE of the game. In Section IV, we present in detail the Q-learning algorithm that is used to select the transmit power level in a dynamic game. Section V presents the simulation results, followed by the conclusions in Section VI.

II. SYSTEM MODEL
Fig. 1 shows the system model of the cognitive radio network, where a secondary Alice wants to send information to the secondary Bob under the constraint of the primary user. The intelligent attacker Mallory exists in the network, and it can work in the eavesdropping, jamming, spoofing, or silent mode, depending on the instantaneous channel state and system settings. Specifically, if the channel between Alice and Mallory is in good condition, Mallory may tend to eavesdrop on the confidential signal from Alice. On the other hand, if the channel between Bob and Mallory is in good condition, Mallory may tend to perform spoofing or jamming. When all the channels associated with Mallory are in poor condition, the intelligent attacker may choose to keep silent, as it cannot achieve a good result by eavesdropping, spoofing or jamming. In this work, we use q ∈ {0, 1, 2, ..., K} to denote the action mode of Mallory, where K is the total number of attack modes. In particular, K = 3 means that the action modes of Mallory consist of keeping silent, eavesdropping, jamming and spoofing, and q = 0, 1, 2 or 3 represents the silent, eavesdropping, jamming and spoofing modes, respectively.
To maintain the quality of service of the primary network, the transmit power of the secondary nodes should be limited. In this work, we consider a practical cognitive communication scenario where there exist multiple levels of tolerant interference power. In particular, we use $I_P \in [0, I_{P,\max}]$ to denote the interference tolerated by the primary user, where $I_{P,\max}$ is the tolerant peak interference. Moreover, suppose that the primary user has $(L + 1)$ levels of tolerant interference power, and that the primary user can use the $l$-th level of the tolerant interference power, given by $I_{P,l} = l I_{P,\max} / L$. When the $l$-th level of the tolerant interference power is used, the transmit power at Alice is given by
$$P_{Alice} = \frac{I_{P,l}}{|g_1|^2}, \quad (1)$$
where $g_1 \sim \mathcal{CN}(0, \sigma_{i1}^2)$ denotes the channel parameter of the link from Alice to the primary user. Here we use $P_{q,Mallory}$ to denote the transmit power at Mallory. Based on $P_{Alice}$ and $P_{q,Mallory}$, we discuss the secure data transmission process as follows.
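As a minimal sketch of the power constraint described above, the following Python snippet enumerates the (L + 1) tolerant interference levels and the resulting transmit power at Alice. It assumes that Alice's power equals the tolerated interference divided by the channel gain |g_1|^2, which is consistent with the capacity expression in (3). The function names are illustrative and are not taken from the paper.

```python
import numpy as np


def interference_levels(i_p_max, num_levels):
    """Return the L + 1 tolerant interference levels I_{P,l} = l * I_{P,max} / L.

    num_levels is L, so the result has L + 1 entries (l = 0, 1, ..., L).
    """
    return np.arange(num_levels + 1) * i_p_max / num_levels


def alice_transmit_power(i_p_l, g1):
    """Transmit power at Alice under the l-th interference level.

    Assumption: P_Alice = I_{P,l} / |g_1|^2, where g1 is the complex
    channel coefficient from Alice to the primary user.
    """
    return i_p_l / abs(g1) ** 2


if __name__ == "__main__":
    levels = interference_levels(10.0, 5)
    for level in levels:
        print(level, alice_transmit_power(level, 1 + 1j))
```

A stronger channel from Alice to the primary user (larger |g_1|^2) lowers the power Alice may use at the same interference level.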

A. WHEN MALLORY KEEPS SILENT
When q = 0 holds, i.e., Mallory keeps silent, Alice communicates with Bob by sending a signal $X_a$, and Bob receives a signal $Y_B$, given by
$$Y_B = \sqrt{P_{Alice}}\, g_{ab} X_a + n_b, \quad (2)$$
where the channel parameter of the Alice-Bob link is denoted by $g_{ab} \sim \mathcal{CN}(0, \sigma_{ab}^2)$, and $n_b \sim \mathcal{CN}(0, \sigma_n^2)$ is the additive white Gaussian noise at Bob [40]-[42]. The effect of noise on communication systems is discussed in [43]-[46]. Note that Alice can utilize the spectrum resources of the primary network as long as its interference is tolerated, which can help improve the system spectrum efficiency significantly. Based on the Shannon theory, we can write the capacity of the Alice-Bob link, denoted by $R$, as [47]-[49]
$$R = \log_2\left(1 + \frac{I_{P,l} |g_{ab}|^2}{\sigma_n^2 |g_1|^2}\right). \quad (3)$$

B. WHEN MALLORY PERFORMS EAVESDROPPING
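When q = 1 holds, i.e., Mallory eavesdrops on the signal $X_a$ of Alice, it obtains a signal $Y_E$, given by
$$Y_E = \sqrt{P_{Alice}}\, g_{ea} X_a + n_e, \quad (4)$$
where the channel parameter of the Mallory-Alice link is denoted by $g_{ea} \sim \mathcal{CN}(0, \sigma_{ea}^2)$, and $n_e \sim \mathcal{CN}(0, \sigma_n^2)$ is the additive white Gaussian noise at Mallory. In this case, the secrecy capacity under eavesdropping can be written as
$$R_E = \left[\log_2\left(1 + \frac{I_{P,l} |g_{ab}|^2}{\sigma_n^2 |g_1|^2}\right) - \log_2\left(1 + \frac{I_{P,l} |g_{ea}|^2}{\sigma_n^2 |g_1|^2}\right)\right]^+, \quad (5)$$
where $[x]^+ = \max(x, 0)$.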
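The link capacity in (3) and the secrecy capacity under eavesdropping can be evaluated numerically as in the sketch below. The secrecy capacity is assumed here to take the standard wiretap form, i.e., the non-negative difference between the capacity of the Alice-Bob link and that of the Alice-Mallory link. The function names and the example values are illustrative only.

```python
import math


def link_capacity(i_p_l, g_link, g1, noise_var):
    """Capacity in (3): log2(1 + I_{P,l} |g|^2 / (sigma_n^2 |g_1|^2))."""
    snr = i_p_l * abs(g_link) ** 2 / (noise_var * abs(g1) ** 2)
    return math.log2(1.0 + snr)


def secrecy_capacity_eavesdropping(i_p_l, g_ab, g_ea, g1, noise_var):
    """Secrecy capacity under eavesdropping.

    Assumption: standard wiretap form max(R_main - R_eve, 0), with both
    links driven by the same transmit power I_{P,l} / |g_1|^2.
    """
    r_main = link_capacity(i_p_l, g_ab, g1, noise_var)
    r_eve = link_capacity(i_p_l, g_ea, g1, noise_var)
    return max(r_main - r_eve, 0.0)


if __name__ == "__main__":
    # Illustrative values only; they are not the simulation settings.
    print(link_capacity(3.0, 1.0, 1.0, 1.0))
    print(secrecy_capacity_eavesdropping(3.0, 1.0, 0.5, 1.0, 1.0))
```

Under this assumption, the secrecy capacity drops to zero whenever the Alice-Mallory channel is at least as strong as the Alice-Bob channel.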

C. WHEN MALLORY PERFORMS JAMMING
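When q = 2 holds, i.e., Bob is disturbed by Mallory's jamming signal $Z_J$, Bob receives a signal $Y_J$ that consists of both the desired signal and the jamming signal, given by
$$Y_J = \sqrt{P_{Alice}}\, g_{ab} X_a + \sqrt{P_{q,Mallory}}\, g_{be} Z_J + n_b, \quad (6)$$
where the channel parameter of the Mallory-Bob link is denoted by $g_{be} \sim \mathcal{CN}(0, \sigma_{be}^2)$. To limit the interference on the primary user, $P_{q,Mallory}$ is given by
$$P_{q,Mallory} = \frac{P_J}{|g_2|^2}, \quad (7)$$
where $P_J$ is the peak interference when Mallory performs jamming, and $g_2 \sim \mathcal{CN}(0, \sigma_{i2}^2)$ denotes the channel parameter of the link from Mallory to the primary user.
In this case, the transmission capacity is given by
$$R_J = \log_2\left(1 + \frac{I_{P,l} |g_{ab}|^2 / |g_1|^2}{\sigma_n^2 + P_J |g_{be}|^2 / |g_2|^2}\right). \quad (8)$$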
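The effect of jamming on the Alice-Bob link can be illustrated with the short sketch below. It rests on two assumptions that are not stated explicitly in the text: Mallory's jamming power is its peak interference P_J divided by |g_2|^2 (mirroring the constraint on Alice), and the jamming signal is treated as additional noise at Bob. All names are illustrative.

```python
import math


def jamming_capacity(i_p_l, p_j, g_ab, g_be, g1, g2, noise_var):
    """Capacity of the Alice-Bob link when Mallory jams.

    Assumptions: P_Alice = I_{P,l} / |g_1|^2, P_Mallory = P_J / |g_2|^2,
    and the jamming signal adds to the noise power at Bob.
    """
    p_alice = i_p_l / abs(g1) ** 2
    p_mallory = p_j / abs(g2) ** 2
    sinr = p_alice * abs(g_ab) ** 2 / (noise_var + p_mallory * abs(g_be) ** 2)
    return math.log2(1.0 + sinr)


if __name__ == "__main__":
    # Illustrative values only: the same link with and without jamming.
    print(jamming_capacity(6.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0))
    print(jamming_capacity(6.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))
```

With P_J = 0 the expression reduces to the capacity in (3), so the gap between the two printed values is the loss caused by the jammer under these assumptions.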

D. WHEN MALLORY PERFORMS SPOOFING
When Mallory selects to be a spoofer, it transmits a spoofing signal Z S with the power P S to lie to Bob. Then Bob gets a signal Y S , given by where P q,Mallory is limited by in which P S is the peak interference when the Mallory performs spoofing.
The capacity under spoofing is denoted by $R_S$. The more spoofing messages Bob receives, the greater the loss it suffers. Hence, the secrecy data rate is modeled as a linear function. Note that the intention of the spoofer is to send a spoofing message to Bob, rather than to prevent Alice's transmission. Therefore, if Mallory chooses to perform a spoofing attack, it only sends a signal when Alice is silent. The secrecy data rate of Alice under the spoofing attack can be formulated as where γ is the impact factor of each spoofing signal, and γ is in the range of [0, 1].

III. SECURE TRANSMISSION GAME
In this work, we study the condition of secure communication which is under the environment of CR network. We model this problem as a non-cooperative static security game. When the secondary user, Alice and Mallory find the spectrum hole, they can use the spectrum resources of primary users, without affecting the communication of primary networks. Therefore, Alice can select to send signals with transmit power in the range of [0, The intelligent attacker Mallory chooses its attack mode according to the actual situation, that is, q = 0, 1, 2 or 3, which corresponds to keeping quiet, eavesdropping, jamming and spoofing, respectively. These attack modes will destroy the Alice's secure communication rate while ensuring that they are not discovered. Alice, instead, needs to maximize its communication security performance, i.e., R E , R J and R S .
Let f(q) be the cost incurred by Mallory for attack mode q. When q = 1 holds, which represents eavesdropping, the attack cost f(q) is equal to $\theta_E$. Similarly, when q = 2 or q = 3 holds, which represent jamming and spoofing, the corresponding attack costs f(q) are equal to $\theta_J$ and $\theta_S$, respectively.
In this non-cooperative static secure game, the utility of Alice is related to the secrecy capacity and the transmit power, and it can be formulated as where $C_a$ represents Alice's cost per unit transmit power. The q-th element of the secrecy capacity vector $[R, R_E, R_J, R_S]$ is denoted by $R_q$. For simplicity, this data rate is multiplied by the coefficient ln 2. Similarly, the utility of Mallory is related to the secrecy capacity and the transmit power, and it can be formulated as In general, the NE strategy of this game is expressed as $(P^*_{Alice}, q^*)$. In order to maximize its own utility $U_a$, Alice needs to choose the transmit power $P_{Alice}$ appropriately. Meanwhile, Mallory needs to select its attack mode to maximize its own utility $U_e$, given the actual transmit power of Alice. At the NE, neither Alice nor Mallory can benefit from changing its strategy alone, and hence neither party is willing to deviate. From this, we can get the following inequality.
Lemma 1: The static secure game has an NE $(x^*_{Alice}, 0)$ given by
Proof: See Appendix A.
It can be seen from Lemma 1 that when the cost of attack is much higher than the transmission loss, the incentive to attack disappears (i.e., eqs. (18a)-(18c)). Moreover, in the case of poor channel conditions and serious information leakage (i.e., eq. (18d)), Alice will stop transmission.

Lemma 2: The static secure game has an NE
Proof: See Appendix B. Lemma 2 illustrates that Alice prefers to transmit with the maximum power in the case of low transmission cost or high attack cost.
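The NE condition used in this section, namely that neither Alice nor Mallory can gain by changing its strategy alone, can be checked numerically once the utilities are tabulated over a finite set of power levels and attack modes. The sketch below is a generic pure-strategy check on two payoff tables. It does not assume any particular form of $U_a$ or $U_e$, and the toy tables in the usage example are hypothetical.

```python
import numpy as np


def pure_nash_equilibria(u_alice, u_mallory):
    """Return all pure-strategy NE of a two-player game given as payoff tables.

    u_alice[i, q] and u_mallory[i, q] are the utilities when Alice uses
    power level i and Mallory uses attack mode q. A pair (i, q) is an NE
    if neither player can improve its own utility by deviating alone.
    """
    u_alice = np.asarray(u_alice, dtype=float)
    u_mallory = np.asarray(u_mallory, dtype=float)
    if u_alice.shape != u_mallory.shape:
        raise ValueError("payoff tables must have the same shape")

    best_for_alice = u_alice.max(axis=0)      # best reply value per attack mode
    best_for_mallory = u_mallory.max(axis=1)  # best reply value per power level

    equilibria = []
    for i in range(u_alice.shape[0]):
        for q in range(u_alice.shape[1]):
            if (u_alice[i, q] >= best_for_alice[q]
                    and u_mallory[i, q] >= best_for_mallory[i]):
                equilibria.append((i, q))
    return equilibria


if __name__ == "__main__":
    # Hypothetical 2 x 2 payoff tables, for illustration only.
    u_a = [[3, 0], [5, 1]]
    u_e = [[3, 5], [0, 1]]
    print(pure_nash_equilibria(u_a, u_e))
```

A grid search of this kind only finds equilibria on the discretized power levels, whereas the lemmas characterize the NE over the continuous power range.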

IV. POWER ALLOCATION STRATEGY IN DYNAMIC GAME
In practical communication environments, it is difficult for Alice to predict the attack mode and channel information of Mallory over a given period of time under the constraint of the primary user. Q-learning is a classic and widely used reinforcement learning algorithm, which can derive a solution to this non-convex problem. In this work, Alice can use the Q-learning algorithm to learn how to select the optimal transmit power, within the range of tolerant interference power of the primary user, when it communicates with Bob, so as to suppress the attack from Mallory effectively. Meanwhile, Mallory chooses the corresponding attack mode according to the cost and the choice of Alice.
As shown in Algorithm 1, Q-learning is a value-based and off-policy algorithm. Let $Q(s, P_{Alice})$ denote the Q-function of Alice, in which $s$ is the system state and the action $P_{Alice}$ is the transmit power of Alice. $P_{Alice}$ is limited by the primary user and should not be larger than the peak level of the tolerant interference power. The Q-function $Q(s, P_{Alice})$ is the expected discounted long-term reward of Alice. The value function $V(s)$ is the maximum of $Q(s, P_{Alice})$ over $P_{Alice}$.
At time n, the attack mode of Mallory is denoted by $q^n$. Alice uses Mallory's attack mode in the last slot, $q^{n-1}$, as the system state at time n, i.e., $s^n = q^{n-1}$. In Algorithm 1, we select Alice's action in each time slot by using the ε-greedy algorithm: we explore a random action with probability ε, and exploit the action with the highest Q-value with probability 1 − ε. The learning rate, denoted by β, is a number less than 1 and determines how much of the error observed in this time slot is learned. Meanwhile, the decay of the future rewards is controlled by the discount factor δ, which lies in the range [0, 1]. In this trial-and-error process, Alice selects its transmit power to maximize its long-term reward, and can adaptively suppress Mallory's smart attack.
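To make the roles of the learning rate β and the discount factor δ concrete, the sketch below applies one Q-function update on a small table and then reads off the value function V(s). The table size, the reward value and the function names are illustrative assumptions. Rows index the state (Mallory's last attack mode) and columns index Alice's power level.

```python
import numpy as np


def value_function(q_table, state):
    """V(s): the maximum of Q(s, P_Alice) over Alice's power levels."""
    return float(q_table[state].max())


def q_update(q_table, state, action, utility, next_state, beta, delta):
    """One Q-learning update:

    Q(s, a) <- (1 - beta) * Q(s, a) + beta * (U_a + delta * V(s')).
    The table is modified in place and also returned.
    """
    target = utility + delta * value_function(q_table, next_state)
    q_table[state, action] = (1.0 - beta) * q_table[state, action] + beta * target
    return q_table


if __name__ == "__main__":
    q_table = np.zeros((4, 3))  # 4 attack modes (states), 3 power levels (actions)
    # Illustrative step: state 0, power level 1, observed utility 2.0, next state 1.
    q_update(q_table, 0, 1, 2.0, 1, beta=0.5, delta=0.9)
    print(q_table[0, 1], value_function(q_table, 0))
```

With β = 0.5 and an all-zero table, the first update moves Q(0, 1) halfway towards the observed utility. A larger β would make Alice react faster to the most recent slot, at the price of noisier estimates.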
Note that in the secure game involving two users, the action of one user can be regarded as the state of the other user. Accordingly, we regard the attack mode of Mallory, i.e., the action of Mallory, as the state space of Alice, which is denoted as q = 0, 1, 2 and 3. Moreover, we discretize the maximum transmit power $P_{\max}$ into L + 1 levels, and define the transmit power level $P_{Alice} \in \{lP_{\max}/L\}_{0 \le l \le L}$ as the action of Alice, which is also regarded as the state of Mallory. Furthermore, the state transition probability of the Markov states is not known a priori, and hence we use the Q-learning algorithm to solve the secure game, since it does not need the state transition probability.

Algorithm 1, steps 4-9:
4: Choose transmit power $P^n_{Alice}$ by using the ε-greedy algorithm
5: Transmit with power $P^n_{Alice}$
6: Observe the attack mode $q^n$ and the utility of Alice $U_a$
7: Update the value function and Q-function:
   $Q(s^n, P^n_{Alice}) = (1 - \beta) Q(s^n, P^n_{Alice}) + \beta \left(U_a(s^n, P^n_{Alice}) + \delta V(s^{n+1})\right)$
8: $V(s^n) = \max_{P_{Alice}} Q(s^n, P_{Alice})$
9: end for

In order to execute the Q-learning algorithm, the system needs to observe the attack mode of Mallory and the utility of Alice. Although such information may be difficult to obtain in practice, it is meaningful to study the case with known information of Mallory's mode and Alice's utility, in the following three respects. First, such information can be obtained through signal processing methods, such as using pilot signals in the system to estimate the required channel parameters and Mallory's mode. Second, if we cannot obtain accurate data of the required information, we can try to obtain some statistical values through estimation methods, such as estimating the location of Mallory. Third, even if we cannot obtain any information on Mallory's mode and Alice's utility, the study in our work can still serve as a useful benchmark and help obtain some insights into the secure system.
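The learning loop described in this section can be sketched as follows. This is a minimal tabular implementation under stated assumptions: the environment is abstracted as a hypothetical callback `step(level)` that returns Mallory's observed attack mode and Alice's utility for the chosen power level, the Q-table starts from zero, and Mallory is taken to be silent before the first slot. None of these names or initial values come from the paper.

```python
import numpy as np


def q_learning_power_control(step, n_levels, n_modes, n_slots,
                             epsilon=0.1, beta=0.5, delta=0.5, seed=0):
    """Tabular Q-learning for Alice's transmit power level.

    step(level) -> (attack_mode, utility) is a hypothetical environment
    callback. States are Mallory's attack modes in the previous slot and
    actions are Alice's discrete power levels 0, ..., n_levels - 1.
    """
    rng = np.random.default_rng(seed)
    q_table = np.zeros((n_modes, n_levels))
    state = 0  # assumption: Mallory is silent before the first slot

    for _ in range(n_slots):
        # epsilon-greedy choice of the power level
        if rng.random() < epsilon:
            level = int(rng.integers(n_levels))
        else:
            level = int(np.argmax(q_table[state]))

        # transmit, then observe the attack mode and Alice's utility
        attack_mode, utility = step(level)
        next_state = attack_mode  # s^{n+1} = q^n

        # update Q with V(s^{n+1}) = max over power levels of Q(s^{n+1}, .)
        v_next = q_table[next_state].max()
        q_table[state, level] = ((1.0 - beta) * q_table[state, level]
                                 + beta * (utility + delta * v_next))
        state = next_state

    return q_table


if __name__ == "__main__":
    # Toy environment, for illustration only: Mallory always stays silent
    # and Alice's utility simply grows with the power level.
    table = q_learning_power_control(lambda level: (0, float(level)),
                                     n_levels=5, n_modes=4, n_slots=5000,
                                     epsilon=0.2)
    print(np.argmax(table[0]))
```

Passing the environment as a callback keeps the learner independent of how the attack mode and the utility are obtained, whether from a simulator or from estimates based on pilot signals.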

V. SIMULATION RESULTS
The performance of the proposed Q-learning algorithm is evaluated in this section. In order to implement the algorithm and simulate the practical communication environments, we set $\sigma_{ab}^2 = 1.2$, $\sigma_{ea}^2 = 0.1$, $\sigma_{be}^2 = 3$, $\sigma_{g_1}^2 = 6$, and $\sigma_{g_2}^2 = 4.2$ as the average channel gains¹ [50]. We denote the peak interference power when Mallory performs jamming and spoofing by $P_J = 7.4$ and $P_S = 7.2$, respectively. The cost of transmit power for Alice, $C_a$, is set to 0.1, and the impact factor of each spoofing signal, γ, is set to 0.5.
¹ Note that the node location is actually used in the secure game, since the statistical channel gains are related to the distance between the nodes.
Fig. 2(a), (b) and (c) are associated with the eavesdropping rate, jamming rate and spoofing rate of Mallory, respectively. As can be seen from Fig. 2, the attack rate of Mallory shows a decreasing trend as training proceeds, and it gradually tends to zero. For example, as can be seen from Fig. 2(a), there is an evident decline in the eavesdropping rate from 25% at the beginning to almost zero after 1000 time slots. Similarly, both the jamming rate and the spoofing rate of Mallory decrease significantly and finally tend to zero. Fig. 2 indicates that, through the training of the Q-learning algorithm, Alice can learn how to select the transmit power within the specified range when it communicates with Bob. In this situation, Mallory tends not to attack because the cost of the attack is too high. In other words, Alice can suppress the attack behavior of Mallory when it communicates with Bob, which further shows that the proposed Q-learning algorithm can achieve the purpose of secure communication.
Fig. 3 shows the eavesdropping rate of Mallory with respect to the range of tolerant interference power of the primary user. In particular, Fig. 3(a), (b) and (c) correspond to the tolerant interference power of the primary user in the range of [1, 50], [50, 100] and [100, 150], respectively. We can see from Fig. 3 that the proposed Q-learning algorithm can reduce the eavesdropping rate effectively. For instance, as shown in Fig. 3(a), when the tolerant interference power of the primary user is in the range of [1, 50], the eavesdropping rate falls below 0.025 after 500 time slots and tends to zero as the number of time slots increases. Similarly, the eavesdropping rate begins to show a downward trend after 1500 time slots when the tolerant interference power of the primary user is in the range of [50, 100] and [100, 150], respectively. Finally, all of them converge to zero when the number of time slots is larger than 3000. The simulation result in Fig. 3 validates that the proposed Q-learning algorithm enables Alice to select the optimal transmit power so that it can suppress the attack rate of the attacker in each of the considered ranges of the tolerant interference power of the primary user. Furthermore, the jamming rate and the spoofing rate also decrease significantly and converge to zero in the same ranges of tolerant interference power.
Fig. 4 shows the average secrecy capacity of Alice with respect to the tolerant interference power of the primary user. Fig. 4(a), (b), (c) and (d) are associated with the tolerant interference power of the primary user in the range of [1, 10], [1, 50], [50, 100] and [100, 150], respectively. As observed from Fig. 4, Alice's average secrecy capacity increases on the whole as the number of training slots increases. For example, as shown in Fig. 4(a), the average secrecy capacity of Alice based on the Q-learning algorithm increases dramatically, with a rise of around 50%. We can observe from Fig. 4(b) that there is an obvious increase in the secrecy capacity after 1000 time slots. Similarly, the average secrecy capacity of Alice continues to rise after a short period of decline in both Fig. 4(c) and (d). Furthermore, the secrecy capacity in Fig. 4(c) and (d) is much more stable than that in Fig. 4(b) after 3000 time slots. The simulation result in Fig. 4 demonstrates that the average secrecy capacity of Alice can be improved after learning and can move towards its maximum.
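The channel settings listed at the start of this section can be turned into random channel realizations as in the sketch below. It assumes that each channel coefficient is drawn as a circularly symmetric complex Gaussian variable $\mathcal{CN}(0, \sigma^2)$, following the system model, so that the average of |g|^2 equals the stated variance. The dictionary keys and function names are illustrative.

```python
import numpy as np

# Average channel gains taken from the simulation settings in this section.
CHANNEL_VARIANCES = {"ab": 1.2, "ea": 0.1, "be": 3.0, "g1": 6.0, "g2": 4.2}


def sample_cn(variance, size, rng):
    """Draw samples from CN(0, variance).

    The real and imaginary parts are independent N(0, variance / 2),
    so that E[|g|^2] = variance.
    """
    scale = np.sqrt(variance / 2.0)
    return rng.normal(0.0, scale, size) + 1j * rng.normal(0.0, scale, size)


def sample_channels(size, rng):
    """Draw `size` realizations of every channel in CHANNEL_VARIANCES."""
    return {name: sample_cn(var, size, rng)
            for name, var in CHANNEL_VARIANCES.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    channels = sample_channels(100000, rng)
    for name, g in channels.items():
        # The empirical mean of |g|^2 should be close to the configured variance.
        print(name, np.mean(np.abs(g) ** 2))
```

Each time slot of a simulation would then use one realization per link, for example to evaluate the capacity expressions of Section II.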

VI. CONCLUSION
In this work, we have investigated the secure transmission problem under smart attacks in cognitive networks. The secondary users, Alice and Mallory, were allowed to utilize the spectrum resources that were also used by the primary user. The attacker, Mallory, had three attack modes, namely eavesdropping, jamming, and spoofing. We formulated an NE strategy game to maximize the utility of the transmitter while minimizing the damage caused by the attacker. The Q-learning algorithm was utilized to control the transmit power of the transmitter and determine the attack mode of the smart attacker. The employed Q-learning algorithm enabled the transmitter to obtain the optimal transmit power during the learning stage, within the range of tolerant interference power of the primary user, and hence to suppress the attacker eventually. Simulation results were provided to show that the algorithm could effectively achieve the expected target of suppressing the attack behavior of the attacker. In future work, we will apply other learning based algorithms [51], [52], especially deep learning based algorithms [53]-[55], to the considered system, in order to enhance the system performance.

APPENDIXES
APPENDIX A
PROOF OF LEMMA 1
If eqs. (18a)-(18c) hold, from (14), we have Thus, (16) holds for $(x^*_{Alice}, 0)$. From (13), we have (A.1a), (A.1b), and (A.1c). The above formulas show that $\partial U_a(P_{Alice}, 0)/\partial P_{Alice}$ decreases monotonically with respect to $P_{Alice}$. Thus, if (18d) holds, from (A.2) we have $\left.\partial U_a(P_{Alice}, 0)/\partial P_{Alice}\right|_{P_{Alice}=0}$ indicating that $\partial U_a(P_{Alice}, 0)/\partial P_{Alice} = 0$ has a unique solution because of (17a). From (A.2)-(A.4), we know that when $P_{Alice}$ is smaller than $x^*_{Alice}$, $U_a(P_{Alice}, 0)$ monotonically increases. Conversely, when $P_{Alice}$ is larger than $x^*_{Alice}$, $U_a(P_{Alice}, 0)$ monotonically decreases, which means that $U_a(P_{Alice}, 0)$ attains its maximum at $x^*_{Alice}$. Thus, (15) holds and $(x^*_{Alice}, 0)$ is an NE of this game. This completes the proof of Lemma 1.