Mobile Cooperative Sensing Based Secure Communication Strategy of Edge Computational Networks for Smart Cities

With the development of smart cities, many nodes based on mobile cooperative sensing have emerged. However, due to the open nature of wireless transmission, attackers in the network can use intelligent radio devices to degrade secure transmission, which creates a severe risk of information leakage. In this paper, we consider a transmitter that has computational tasks to complete in the presence of an intelligent attacker. Due to its limited computational capability, the sender needs to offload some tasks to the receiver. To address this problem, we propose a power allocation algorithm that combines reinforcement learning and game theory, using Q-learning and the Nash equilibrium to achieve an optimal secure data rate while reducing the overall latency of transmission and computation. The Nash equilibrium and its existence conditions are then derived and proven mathematically. Finally, we perform simulations on the Matlab platform, and the results show that the proposed algorithm can effectively improve the secrecy data rate and reduce the overall system latency.


I. INTRODUCTION
In recent years, great progress has been made in the development of smart cities, and many wireless techniques have been proposed to support it. Among these techniques, mobile cooperative sensing is one of the most promising, as it can support the deployment and application of smart cities very efficiently [1]-[3]. In particular, compressive cooperative sensing and cooperative and active sensing have been proposed for mobile sensor networks to enhance the system performance [4]-[7]. On the other hand, with the rapid development of wireless technology, the traffic of mobile devices has increased sharply [8]. However, because computational resources and performance are limited, making reasonable use of the limited computational resources on the edge nodes has become an important issue that needs to be solved urgently [9], [10]. To deal with these problems, such as insufficient processing capability and limited resources, many researchers have introduced the concept of computational offloading into mobile edge computing (MEC) networks [11]-[13]. In MEC networks, the user terminal (UE) offloads some computational tasks to edge nodes to overcome the shortcomings of the equipment in resource storage, computational performance and energy efficiency [14]-[16].
The associate editor coordinating the review of this manuscript and approving it for publication was Mu Zhou.
The process of computational offloading generally refers to the reasonable allocation of computationally heavy tasks to edge nodes with sufficient computational resources for processing, followed by the return of the computed results from the edge server [17], [18]. This process is often affected by a number of practical factors, such as the radio communication channels and the performance of the mobile devices [19]-[21]. Therefore, the key to realizing computational offloading lies in specifying an appropriate offloading decision [22]-[25]. The offloading strategy affects the latency and energy consumption of both communication and computation, and it is essentially a way to utilize the computational resources of edge nodes at the cost of wireless transmission. Hence, the offloading strategy can be viewed as a trade-off between communication and computation. Generally speaking, decisions about computational offloading can be classified into the following three categories:
• Local computation: The entire computational task is completed locally.
• Full offloading: The entire computational task is allocated to the edge nodes for processing.
• Partial offloading: A part of the computational task is left for local processing, while the other part is offloaded to the edge nodes for processing.
Several existing works have studied offloading strategies for MEC networks [26]. In [27], [28], the authors proposed a deep Q-network, based on the Q-learning algorithm, to optimize the offloading strategy of MEC networks and thereby reduce the network latency and energy consumption. In addition, the authors in [29] employed the ant colony optimization (ACO) algorithm together with relay selection to optimize the offloading strategy, in order to reduce the system cost measured by a linear combination of latency and energy consumption. Moreover, the authors in [30], [31] considered a price mechanism in MEC networks and studied the impact of price on the offloading strategy. Furthermore, the authors in [32] proposed a novel framework to optimize the offloading strategy as well as the relay selection and wireless bandwidth allocation, in order to enhance the network performance in terms of latency and energy consumption. All these works indicate that the offloading strategy plays a significant role in the system design of MEC networks.
Another key challenge in MEC networks is the threat posed by smart attackers. A smart attacker can operate in spoofing, jamming or eavesdropping mode, which severely degrades the system secrecy performance. Hence, it is of vital importance to suppress smart attackers in order to safeguard the secrecy performance of MEC networks. From this viewpoint, unmanned aerial vehicle (UAV) techniques can be used to assist secure transmission based on interference alignment [33], [34]. Moreover, the non-orthogonal multiple access (NOMA) technique can be implemented to enhance the network security, significantly increasing the secrecy data rate [35]. Furthermore, caching can be exploited in wireless networks to enhance the network security by increasing the dimension of communication resources at the cost of storage [36], [37].
In this paper, we consider an MEC network in which the transmitter has computational tasks to complete in the presence of an intelligent attacker. Due to its limited computational capability, the sender needs to offload some tasks to the receiver. By combining reinforcement learning and game theory, this paper proposes a power allocation algorithm that uses Q-learning and the Nash equilibrium to achieve an optimal secure data rate and reduce the overall task latency of both communication and computation. The Nash equilibrium and its existence conditions are then derived and proven mathematically. Finally, we perform simulations on the Matlab platform, and the results show that the proposed algorithm can effectively improve the secrecy data rate and reduce the overall system latency.
The rest of this paper is organized as follows. Section II describes the model of MEC networks under intelligent attack and details the communication and computation process. Section III presents the transmission game between the transmitter and the attacker based on the system latency, and Section IV provides an effective power allocation algorithm for the transmitter. Simulation results are provided in Section V to offer insights into the system performance, and conclusions are drawn in Section VI.
Notations: Let $\mathcal{CN}(0, \beta)$ denote a circularly symmetric complex Gaussian random variable (RV) with zero mean and variance $\beta$. In addition, we use $f_X(\cdot)$ to denote the probability density function (PDF) of the RV $X$, and the operation $\Pr(\cdot)$ returns probability.

II. SYSTEM MODEL
As shown in Fig. 1, Alice sends secure messages to Bob through the main link, and there is an attacker, Eve, in the network. Alice has a computational task, but due to her limited computational capability, she needs to offload a part of the task to Bob. Alice has the flexibility to adjust her transmit power $P_A$. Eve can choose among keeping silent, eavesdropping, jamming, and spoofing as its mode of attack.
• Eve chooses to keep silent: In this case, Alice sends a normalized signal $x_a$ to Bob, and Bob receives a signal $y_0$, where $h_{AB} \sim \mathcal{CN}(0, \sigma_{ab}^2)$ is the channel parameter of the main link and $n_b \sim \mathcal{CN}(0, \sigma_n^2)$ is the additive white Gaussian noise (AWGN) at Bob. According to Shannon's theorem [38], the system secrecy data rate in this mode is denoted by $R_0$; it depends on the wireless bandwidth $W_B$ and the noise power $\sigma_n^2$. The latency of computing the local task is denoted by $t_{\mathrm{local}}$ [27], [28]; it depends on the proportion of task offloading $\rho$ and the task size $L$. We use $\eta$ to denote the number of CPU cycles required for a one-bit task, and the computational capability of the CPU at Alice is represented by $f_A$. In particular, $\rho$ represents the proportion of the task to be computed by Bob, while $1 - \rho$ represents the proportion to be computed by Alice herself.
The transmission latency of offloading is denoted by $t_1$, and the computational latency at Bob is denoted by $t_2$, where $f_B$ represents the computational capability of the CPU at Bob. Therefore, the whole latency is $t_{\mathrm{local}} + t_1 + t_2$.
• Eve chooses to overhear the message: In this case, Alice sends Bob the secure message $x_a$, and Eve receives a signal $y_1$, where $h_{AE} \sim \mathcal{CN}(0, \sigma_{ae}^2)$ is the channel parameter of the Alice-Eve link and $n_e \sim \mathcal{CN}(0, \sigma_n^2)$ is the AWGN at Eve. Similarly, the system secrecy data rate under the eavesdropping attack, $R_1$, is given in (7), where $[x]^+$ returns $x$ if $x$ is positive, or zero otherwise. The secure transmission latency $t_1$ then follows from (7).
• Eve chooses to send a jamming signal $x_J$ with jamming power $P_J$ to obstruct the transmission of information: In this case, Bob receives a signal $y_2$, where $h_{BE} \sim \mathcal{CN}(0, \sigma_{be}^2)$ is the channel parameter of the Bob-Eve link. Similarly, the system secrecy data rate under the jamming attack, $R_2$, is given in (10), and the secure transmission latency $t_1$ follows from (10).
• Eve chooses to send a spoofing signal $x_S$ with spoofing power $P_S$ to deceive Bob: In this case, Bob receives a signal $y_3$. Similarly, the system secrecy data rate under the spoofing attack, $R_3$, is given in (13), where $\gamma$ represents an influence factor of the spoofing signal. The secure transmission latency $t_1$ then follows from (13).
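The closed-form expressions for the rates and latencies are not reproduced in the list above. As a minimal sketch, the following assumes standard Shannon-type forms that are consistent with the definitions of $W_B$, $\sigma_n^2$, $\rho$, $L$, $\eta$, $f_A$ and $f_B$: a main-link rate of $W_B \log_2(1 + P_A g_{ab}/\sigma_n^2)$, a secrecy rate under eavesdropping equal to the positive part of the main-link rate minus the Alice-Eve rate, a jamming term added to the noise, and latencies $(1-\rho)L\eta/f_A$, $\rho L/R$ and $\rho L\eta/f_B$. These forms, the use of average gains in place of instantaneous channel powers, and all function and parameter names are assumptions for illustration, not the paper's exact equations. The spoofing mode is left out because its rate expression, which involves $\gamma$, cannot be inferred from the surrounding definitions.

```python
import math


def secrecy_rate(mode, p_a, w_b, g_ab, g_ae, g_be, noise, p_j=0.0):
    """Assumed rate (bit/s) for mode 0 (silent), 1 (eavesdrop) or 2 (jam).

    g_ab, g_ae, g_be stand in for the channel power gains of the main,
    Alice-Eve and Bob-Eve links (an assumption of this sketch).
    """
    main = math.log2(1.0 + p_a * g_ab / noise)
    if mode == 0:
        return w_b * main
    if mode == 1:
        leak = math.log2(1.0 + p_a * g_ae / noise)
        return w_b * max(main - leak, 0.0)  # [x]^+ operator
    if mode == 2:
        return w_b * math.log2(1.0 + p_a * g_ab / (noise + p_j * g_be))
    raise ValueError("this sketch only covers modes 0, 1 and 2")


def total_latency(rate, rho, size_bits, eta, f_a, f_b):
    """Assumed whole latency t_local + t_1 + t_2 for offloading ratio rho."""
    t_local = (1.0 - rho) * size_bits * eta / f_a  # local computation
    if rho == 0:
        t_tx = 0.0                                  # nothing is transmitted
    elif rate <= 0:
        t_tx = math.inf                             # no secure rate available
    else:
        t_tx = rho * size_bits / rate               # offloading transmission
    t_edge = rho * size_bits * eta / f_b            # computation at Bob
    return t_local + t_tx + t_edge
```

Under these assumptions, a stronger eavesdropping or jamming link lowers the rate and therefore lengthens only the transmission term, while the two computation terms depend on $\rho$ alone.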

III. TRANSMISSION GAME BASED ON SYSTEM LATENCY
According to game theory, the interaction between Alice and Eve can be viewed as a non-cooperative game. The action set of Alice is $[0, P_{\max}]$, i.e., Alice can choose a transmit power $P_A$ from the range $[0, P_{\max}]$, where $P_{\max}$ is the maximum transmit power available to Alice. The action set of Eve is $\{0, 1, 2, 3\}$, i.e., Eve can choose one attack mode $q$, where $q = 0, 1, 2$ and $3$ correspond to keeping silent, eavesdropping, jamming and spoofing, respectively. Alice aims to reduce the whole latency of the system as much as possible, while Eve aims to increase it as much as possible.
To optimize the system latency, we define the benefit function of Alice, $u_A$, in (15), where $C_A$ is the cost coefficient of the transmit power. From (15), the benefit of Alice becomes worse when the latency becomes larger or a larger transmit power is used. Hence, Alice tends to use a smaller transmit power and to achieve a smaller latency in the whole secure transmission process. In contrast, the benefit function of Eve, $u_E$, is defined in (16), in which $C(q)$ represents the cost for Eve of launching attack mode $q$, with $C(0) = 0$, $C(1) = \theta_E$, $C(2) = \theta_J$ and $C(3) = \theta_S$. From the definition of the Nash equilibrium, the Nash equilibrium $(P_A^*, q^*)$ of the game must satisfy the two inequalities (17) and (18). It can be seen from (17) and (18) that the strategies of Alice and Eve at the Nash equilibrium are at least as good as any other strategy in the same environment; that is, neither side can gain by deviating unilaterally, and the system reaches a balance between Alice and Eve.
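The two equilibrium conditions say that Alice cannot improve $u_A$ by changing her power while Eve keeps $q^*$, and Eve cannot improve $u_E$ by changing her mode while Alice keeps $P_A^*$. As an illustrative sketch, the check below applies this definition to a finite game obtained by discretizing Alice's power range into a grid. The discretization, the payoff-table layout and the function name are assumptions made here for illustration; the paper itself treats $P_A$ as continuous.

```python
def pure_nash_equilibria(u_a, u_e):
    """Return all pure-strategy Nash equilibria of a finite two-player game.

    u_a[i][q] and u_e[i][q] are the payoffs of Alice and Eve when Alice
    uses power index i and Eve uses attack mode q (hypothetical layout).
    """
    n_powers, n_modes = len(u_a), len(u_a[0])
    equilibria = []
    for i in range(n_powers):
        for q in range(n_modes):
            # Alice's condition: no other power does better against mode q.
            alice_ok = all(u_a[i][q] >= u_a[k][q] for k in range(n_powers))
            # Eve's condition: no other mode does better against power i.
            eve_ok = all(u_e[i][q] >= u_e[i][m] for m in range(n_modes))
            if alice_ok and eve_ok:
                equilibria.append((i, q))
    return equilibria
```

For example, with two power levels and two modes, `pure_nash_equilibria([[1, 0], [2, 0]], [[0, -1], [0, -1]])` identifies the pair in which Alice uses the second power level and Eve stays silent, since neither player gains by deviating alone.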
Lemma 1: When inequalities (20a)-(23d) are satisfied, there exists a Nash equilibrium $(P_A^*, 0)$ in the game, and $P_A^*$ is given by (19), where the superscript $*$ in $R_0^*$, $R_1^*$, $R_2^*$ and $R_3^*$ indicates that $P_A$ is set to $P_A^*$ in $R_0$, $R_1$, $R_2$ and $R_3$, respectively. Similarly, the superscript $m$ in $R_0^m$ indicates that $P_A$ is set to $P_{\max}$ in $R_0$.
Proof 1: If (20a)-(20c) hold, then from (16), Eve cannot increase her benefit by leaving the silent mode. Thus, (18) holds for $(P_A^*, 0)$. From (15), it follows that $\partial u_A(P_A, 0)/\partial P_A$ decreases monotonically with respect to $P_A$. Moreover, this derivative is positive as $P_A \to 0^+$, and it is negative at $P_A = P_{\max}$ if (25d) holds. Therefore, there is only one solution that satisfies $\partial u_A(P_A, 0)/\partial P_A = 0$, and the solution is given by (19). From the above derivation, $u_A(P_A, 0)$ first increases and then decreases with respect to $P_A$. Thus, $u_A(P_A, 0)$ achieves its maximum value at $P_A = P_A^*$, i.e., (17) also holds for $(P_A^*, 0)$. This completes the proof of Lemma 1.
Lemma 2 below provides the conditions for an NE at $(P_{\max}, 0)$.
Lemma 2: The game has an NE $(P_{\max}, 0)$ if (21a)-(21d) hold, where the superscript $m$ in $R_0^m$, $R_1^m$, $R_2^m$ and $R_3^m$ indicates that $P_A$ is set to $P_{\max}$ in $R_0$, $R_1$, $R_2$ and $R_3$, respectively.
Proof 2: Similar to the proof of Lemma 1, if (21d) holds, $\partial u_A(P_A, 0)/\partial P_A$ is non-negative at $P_A = P_{\max}$. As $\partial u_A(P_A, 0)/\partial P_A$ decreases monotonically with respect to $P_A$, $u_A(P_A, 0)$ increases monotonically and achieves its maximum value at $P_A = P_{\max}$, i.e., (17) holds for $(P_{\max}, 0)$. If (21a)-(21c) hold, then from (16), (18) also holds for $(P_{\max}, 0)$. This completes the proof of Lemma 2.
Algorithm 1
4: Choose a transmit power $P_n$ using the ε-greedy policy
5: Choose the proportion $\rho$ of tasks to compute locally
6: Observe the attack type $q_n$ and the utility of Alice $u_A$
7: Update the Q function
8: Find the optimal value function: $V(s_n) = \max_{0 \le P_A \le P_{\max}} Q(s_n, P_A)$
9: end for
VOLUME 8, 2020

IV. POWER ALLOCATION ALGORITHM
In this section, we describe the power allocation algorithm for Alice, which is of vital importance for the benefits of both Alice and Eve. Specifically, the parameters are first initialized, and then Alice uses an ε-greedy strategy to select the transmit power as her current action. After that, Eve selects an attack mode as its strategy. The Q function $Q(s, P_A)$ depends on the system state $s$ and the power $P_A$, and the system state $s$ in time slot $t$ is the attack mode of Eve in time slot $t - 1$. The value function $V(s)$ records the optimal value of the Q function $Q(s, P_A)$. We set the learning rate to $\alpha \in [0, 1]$ and the discount factor to $\delta \in [0, 1]$. Finally, through repeated learning, a power allocation solution for Alice is obtained. The whole procedure of the power allocation algorithm is summarized in Algorithm 1.
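The learning loop described above can be sketched as tabular Q-learning. The sketch below assumes the standard update rule $Q(s, a) \leftarrow (1 - \alpha) Q(s, a) + \alpha\,(u_A + \delta V(s'))$, which the text does not state explicitly, and it discretizes the power range into a finite list of levels. The callback `env_step`, which returns Eve's attack mode and Alice's utility for a chosen power, and every other name are hypothetical and stand in for the game of Section III. The offloading ratio $\rho$ is treated as fixed inside `env_step`.

```python
import random


def q_learning_power_control(env_step, power_levels, n_modes=4, n_slots=5000,
                             alpha=0.5, delta=0.5, epsilon=0.1, seed=0):
    """Tabular Q-learning over discretized transmit powers (a sketch).

    The state is Eve's attack mode in the previous slot, as in the text.
    """
    rng = random.Random(seed)
    n_actions = len(power_levels)
    q = [[0.0] * n_actions for _ in range(n_modes)]
    v = [0.0] * n_modes
    state = 0  # assumption: Eve is taken to be silent before the first slot
    for _ in range(n_slots):
        # epsilon-greedy choice of the transmit power
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda i: q[state][i])
        # observe Eve's attack mode and Alice's utility for this slot
        attack, utility = env_step(power_levels[action])
        next_state = attack
        # standard Q-learning update (assumed form)
        q[state][action] = ((1 - alpha) * q[state][action]
                            + alpha * (utility + delta * v[next_state]))
        # value function: best Q value in the current state
        v[state] = max(q[state])
        state = next_state
    return q, v
```

After training, the learned power for a state `s` is the level with the largest entry in `q[s]`. A finer power grid approximates the continuous action set more closely at the cost of slower learning.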

V. SIMULATION RESULTS
In this section, we perform simulation experiments using Matlab to verify the effectiveness of the proposed secure communication strategy. The main parameters are set as follows. The average channel gain of the main channel, $\sigma_{ab}^2$, is set to 1.2; the average channel gain of the eavesdropping link, $\sigma_{ae}^2$, is set to 0.2; and the average channel gain of the jamming and spoofing link, $\sigma_{be}^2$, is set to 0.6 [39], [40]. The noise power is set to 1, and the wireless bandwidth $W_B$ is set to 100 MHz. The task size $L$ is set to 100 Mbit, and the number of CPU cycles required for one bit, $\eta$, is set to 10. Moreover, we set $f_A$ = 1 GHz and $f_B$ = 20 GHz. The cost coefficient of the transmit power at Alice, $C_A$, is set to 0.1, and the influence coefficient of the spoofing, $\gamma$, is set to 0.6. Furthermore, the eavesdropping attack cost $\theta_E$ is set to 2.6, the jamming attack cost $\theta_J$ is set to 2.8, and the spoofing attack cost $\theta_S$ is set to 3.
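To give a feel for these parameter values, the following sketch evaluates only the two computation terms of the latency for the offloading ratios used in the simulations. It assumes the forms $t_{\mathrm{local}} = (1 - \rho) L \eta / f_A$ and $t_2 = \rho L \eta / f_B$, which are inferred from the symbol definitions in Section II rather than taken from the paper's equations, and the function name is hypothetical. The transmission term is not included, since it depends on the learned transmit power and on Eve's attack mode.

```python
def compute_latencies(rho, size_bits=100e6, eta=10, f_a=1e9, f_b=20e9):
    """Assumed local and edge computation latencies in seconds."""
    t_local = (1.0 - rho) * size_bits * eta / f_a  # part computed by Alice
    t_edge = rho * size_bits * eta / f_b           # part computed by Bob
    return t_local, t_edge


for rho in (0.1, 0.5, 0.8):
    t_local, t_edge = compute_latencies(rho)
    print(f"rho={rho}: t_local={t_local:.3f} s, t_edge={t_edge:.3f} s")
```

Under these assumed forms, the computation at Bob is a small fraction of the local computation time for every ratio, because $f_B$ is twenty times $f_A$.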
Figs. 2-4 show the secrecy data rate versus the time slot for several values of the offloading ratio ρ. Specifically, Fig. 2, Fig. 3 and Fig. 4 correspond to ρ = 0.1, ρ = 0.5 and ρ = 0.8, respectively. We can observe from Figs. 2-4 that after some trials, a stable secrecy data rate is achieved for each value of the offloading ratio. In particular, a stable secrecy data rate of 1.72 bps/Hz is achieved after 2300 trials when ρ = 0.1; a stable secrecy data rate of 1.74 bps/Hz is achieved after 1500 trials when ρ = 0.5; and a stable secrecy data rate of 1.75 bps/Hz is achieved after 1200 trials when ρ = 0.8. These results indicate that a stable secrecy performance can be achieved for the MEC networks after some trials for different values of ρ, which verifies the effectiveness of the proposed power allocation scheme.
Figs. 5-7 show the whole system latency of the considered MEC networks versus the time slot for several values of the offloading ratio ρ. In particular, Fig. 5, Fig. 6 and Fig. 7 correspond to ρ = 0.1, ρ = 0.5 and ρ = 0.8, respectively. We can find from Figs. 5-7 that after some trials, a stable latency is achieved for each value of the offloading ratio. In particular, a stable latency of 1.1 s is achieved after 1000 trials when ρ = 0.1; a stable latency of 1.5 s is achieved after 800 trials when ρ = 0.5; and a stable latency of 1.6 s is achieved after 500 trials when ρ = 0.8. These results indicate that a stable latency performance can be achieved for the MEC networks after some trials for different values of ρ, which further verifies the effectiveness of the proposed power allocation scheme.
Figs. 8-10 show the attack rate versus the time slot for several values of the offloading ratio ρ. Specifically, Fig. 8, Fig. 9 and Fig. 10 correspond to ρ = 0.1, ρ = 0.5 and ρ = 0.8, respectively. We can observe from Figs. 8-10 that after some trials, a stable attack rate is achieved for each value of the offloading ratio. In particular, a stable attack rate of 0.138 is achieved after 2500 trials when ρ = 0.1; a stable attack rate of 0.136 is achieved after 2000 trials when ρ = 0.5; and a stable attack rate of 0.140 is achieved after 2000 trials when ρ = 0.8. These results indicate that a stable attack rate can be achieved for the MEC networks after some trials for different values of ρ, which further verifies the effectiveness of the proposed power allocation scheme.

VI. CONCLUSION
In this paper, we studied an MEC network in which the transmitter had computational tasks to complete in the presence of an intelligent attacker. Due to its limited computational capability, the sender needed to offload some tasks to the receiver. By combining reinforcement learning and game theory, this paper proposed a power allocation algorithm that uses Q-learning and the Nash equilibrium to achieve the optimal secure data rate while reducing the overall task latency of both communication and computation. The Nash equilibrium and its existence conditions were then derived and proven mathematically. Finally, simulations on the Matlab platform showed that the proposed algorithm can effectively improve the secrecy data rate and reduce the overall system latency. In future work, we will incorporate other wireless transmission techniques, such as UAV [41], massive MIMO [42], and deep learning [43], [44], into the considered MEC networks, in order to further reduce the system latency and energy consumption.

DATA AVAILABILITY
The data of this work are available from the corresponding author upon request by e-mail.