A Multi-Armed Bandit Framework for Efficient UAV-Based Cooperative Jamming Coverage

In this paper, the position control of multiple unmanned aerial vehicles, acting as cooperative jammers, is proposed to improve the security level of a legitimate ground transmission, where a precoder is designed to nullify the jamming signals at the legitimate receiver. In this scenario, the maximization of the weighted secrecy coverage, which measures the efficiency of cooperative jamming over a confined region, is addressed by following a multi-armed bandit-based algorithm. The results show that our proposal converges to the solution obtained by exhaustive search, and that it achieves a significant improvement over a projected gradient descent benchmark while offering a shorter running time. The results also show that, for the considered system and the employed precoder, the inclusion of more than two UAVs does not lead to significant advantages in terms of secrecy performance.


I. INTRODUCTION
The advancement of the fifth generation of wireless networks (5G) has brought major breakthroughs that will continue to evolve dramatically towards 6G. This evolution is the motor that accelerates the Internet of Things (IoT) revolution, with applications in very different sectors and highly sensitive information being transferred through mobile networks to enable intelligent services. This large amount of data could be exploited by increasingly sophisticated attacks; thus, security and privacy issues constitute one of the main challenges in 6G [1].
Recently, the use of unmanned aerial vehicles (UAVs) has attracted attention due to their deployment flexibility, strong line-of-sight (LoS) component, and ease of maneuverability, which bring novel opportunities for the next generation of wireless networks as well as novel threat vectors [2]. In turn, physical layer security has emerged as a promising solution to safeguard the IoT by exploiting the characteristics of the underlying wireless channel to achieve secure communications with information-theoretic guarantees [3], especially for very constrained scenarios where cryptography-based solutions face severe limitations.
In this context, UAVs have been employed as cooperative jammers to assist legitimate ground transmissions by introducing artificial noise to interfere with possible eavesdroppers. In particular, a UAV-based cooperative jamming scheme is employed in [4], and the maximization of the secrecy rate of a wireless system is tackled by performing trajectory optimization of the UAV jointly with the power optimization of the legitimate and jamming signals. Therein, an iterative optimization algorithm is proposed, and it is shown that a higher secrecy rate is obtained with respect to the benchmarks. However, only the LoS component of the air-to-ground links was considered, and the location of the eavesdropper was assumed to be known a priori.
The authors are with the Centre for Wireless Communications (CWC), University of Oulu, 90570 Oulu, Finland (e-mail: xavier.florescabezas@oulu.fi; diana.moyaosorio@oulu.fi; markku.juntti@oulu.fi).
All the numerical results can be reproduced using the MATLAB code available at: https://github.com/xflorescStaff/UAV-MAB-Framework.git.
Digital Object Identifier 10.1109/TVT.2023.3299670
In real systems, it is impractical to assume that the presence of an eavesdropper is known and that an estimate of its channel is available; this unrealistic assumption can be dropped if security is defined over an area. In that sense, the employment of a UAV jammer to enable secret transmissions of a ground wireless network is investigated in [5]. In that work, an optimization problem is solved for the maximization of a proposed area-based metric, the so-called intercept probability security region (IPSC), which is a measure of how many discretized positions for the eavesdropper experience an intercept probability below a certain threshold. This problem is solved for a set of possible positions of the eavesdropper within a certain area, with constraints such that the intercept probability of each of the eavesdroppers and the outage probability of the legitimate receiver are below certain thresholds. The results from the algorithm approach the exhaustive search results. However, the drawback of that work is that mixed-integer programs are solved with as many constraints as discretized positions of eavesdroppers over the area; thus, the scalability of the algorithm to finer discretizations is compromised.
On the other hand, to characterize the security level of wireless communications attained by the use of cooperative jamming, the authors in [6] proposed two area-based metrics, the jamming coverage (JC) and the jamming efficiency (JE), to evaluate the efficiency of the employment of cooperative jamming over a certain area. These metrics are based on the secrecy outage probability (SOP) over a confined region, avoiding assumptions on the knowledge of the positions of eavesdroppers. Later, this idea was extended to the scenario of UAV-based cooperative jamming in [7], by introducing a hybrid secrecy metric, the weighted secrecy coverage (WSC), that measures the coverage weighted by the efficiency of cooperative jamming. Therein, the position control of the UAV-based jammers was tackled to maximize the WSC. Additionally, null-space precoding was employed in [8] to eliminate the interference produced by the jamming signal at the legitimate receiver.
The works in [7] and [8] have already provided insight into the optimal positioning of the UAV-based jammers and their impact on the WSC. However, they consider a static scenario with two UAV-based jammers, where the position of the receiver is assumed to be perfectly known. In this work, those ideas are extended to the multi-UAV scenario, where uncertainty on the position of the mobile legitimate receiver is considered, and a new design for a precoder that nullifies the jamming signal at the legitimate receiver is proposed. Then, the angular position control optimization for the $n_{\mathrm{UAV}}$ UAV-based jammers is formulated to maximize the WSC of the system by modeling the problem according to a multi-armed bandit (MAB) framework, and it is assumed that the eavesdropper can be in any position inside a confined region. The contributions of this paper are the following: i) A closed-form expression for the SOP of the system is derived; ii) The optimal angular position control of the UAV-based jammers is obtained by solving the MAB problem with the upper confidence bound (UCB) algorithm; iii) The results are contrasted with the ones obtained by applying the projected gradient descent (PGD) algorithm and the UAV 2D positioning algorithm from [5], and it is shown that our proposal converges to the exhaustive search results for low variance on the uncertainty of the position of the legitimate receiver while outperforming the considered benchmarks.

II. SYSTEM MODEL
Consider the system in Fig. 1, composed of a legitimate transmitter Alice (A) located at coordinates $(0, 0, 0)$ and a legitimate receiver Bob (B) located at $(x_B, 0, 0)$. In this system, an eavesdropper Eve (E) located at $(x_E, y_E, 0)$ within a circular region $S$ of area $|S|$, defined by the radius $R_A$ from Alice, is trying to eavesdrop on the information sent from A to B. However, the exact position of E is unknown. In order to improve the secrecy performance of the system, it is assumed that $n_{\mathrm{UAV}}$ UAV-based cooperative jammers, $\{J_i\}_{i \in \{1, \ldots, n_{\mathrm{UAV}}\}}$, assist the ground communication by transmitting artificial noise. To reduce the complexity of the coordination control and maneuverability of the UAVs, as well as the probability of collisions, a fixed altitude $h_J$ and orbit radius $R_J$ around A are assumed for the UAVs, so that their position control is carried out over their angular positions. Thus, their cylindrical coordinates are $(R_J, \theta_{J_i}, h_J)$, with $i \in \{1, \ldots, n_{\mathrm{UAV}}\}$.
In this system, B presents a simple mobility pattern within the region $S$, so that $x_B$ varies with time. It is assumed that B moves from an initial to a final point along the axis formed between the positions of A and B; thus, two different positions for B on the x axis, $x_B^{(o)}$ and $x_B^{(f)}$, are defined during an experiment. These positions are considered randomly and uniformly distributed within $S$, thus $0 \le x_B^{(o)}, x_B^{(f)} \le R_A$, and Bob completes its movement in $T$ time steps. It was shown in [7] and [8] that there is symmetry with respect to any angular movement from Bob, where the UAVs would simply follow this movement. Therefore, angular movements from Bob can be omitted for simplicity. Also, B assumes a speed $v$ such that it completely traverses the path from $x_B^{(o)}$ to $x_B^{(f)}$ in the course of the experiment. Under these conditions, uncertainty is considered on the estimate of the position of Bob at each discrete time instant $l \in \{0, \ldots, T-1\}$, with the level of uncertainty denoted by $\sigma_{AB}$.
The ground channel coefficients $h_{AU}$, with $U \in \{B, E\}$, undergo Nakagami-m fading, with shape parameter $m_U$ and scale parameter $\omega_{AU}$, and are subject to additive white Gaussian noise (AWGN). The corresponding channel gains are denoted by $g_{AU}$, and $\alpha_G$ is the pathloss exponent for the ground channels. The air-to-ground (A2G) channels, between the UAVs and the ground nodes B and E, present a LoS and a non-LoS (NLoS) component, with probabilities of LoS and NLoS connection $P_{\mathrm{LoS}}$, given as in [5], and $P_{\mathrm{NLoS}} = 1 - P_{\mathrm{LoS}}$, where $\psi$ and $\omega$ are environmental constants [9], and $r_{J_iU}$ is the distance between node U and the projection of UAV $J_i$ on the ground plane. The A2G channel coefficients and channel gains are determined by the average pathloss of the links, $L_{J_iU}$, where $d_{J_iU}$ is the distance between $J_i$ and U, $\alpha_J$ is the pathloss exponent for the A2G links, and $\eta_{\mathrm{LoS}}$ and $\eta_{\mathrm{NLoS}}$ are the attenuation factors for the LoS and the NLoS links, respectively.
Herein, it is considered that A and the $n_{\mathrm{UAV}}$ UAVs are single-antenna devices that jointly act as a distributed MIMO system with $n_{\mathrm{UAV}} + 1$ antennas. Thus, at time $l$, $s_A[l]$ is the information signal intended for B, and $s_{J_i}[l]$, with $i \in \{1, \ldots, n_{\mathrm{UAV}}\}$, are the pseudorandom jamming signals. Moreover, the channel vector for each ground node $U \in \{B, E\}$ at time $l$ is denoted by $\mathbf{h}_U[l] = [h_{AU}[l], h_{J_1U}[l], \ldots, h_{J_{n_{\mathrm{UAV}}}U}[l]]^T$, and the system is further described by the channel matrix formed from these vectors, the matrix of corresponding transmit powers, and the precoder matrix. Then, the received signal vector is $\mathbf{y}[l] = [y_B[l], y_E[l]]^T$, where $y_B[l]$ and $y_E[l]$ are the signals received at B and E at time $l$, respectively. Also, the information signal vector collects $s_A[l]$ and the jamming signals, and the noise vector contains the AWGN components with power $N_0$ at both B and E. For simplicity of notation, the time indices are disregarded in the following sections.
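As a rough illustration of how a LoS probability of this kind behaves, the sketch below evaluates a sigmoid model that is widely used for A2G links, $P_{\mathrm{LoS}} = 1 / (1 + \psi \exp(-\omega(\phi - \psi)))$, with $\phi$ the elevation angle in degrees. The functional form, the function names, and the sample constants are assumptions made for illustration only; the exact expression used in this system should be taken from [5] and [9].

```python
import math


def p_los(h_j, r_ju, psi, omega):
    """LoS probability for a UAV at altitude h_j and ground distance r_ju.

    Assumed sigmoid model in the elevation angle (degrees); psi and omega
    play the role of the environmental constants.
    """
    elevation_deg = math.degrees(math.atan2(h_j, r_ju))
    return 1.0 / (1.0 + psi * math.exp(-omega * (elevation_deg - psi)))


def p_nlos(h_j, r_ju, psi, omega):
    """NLoS probability, the complement of the LoS probability."""
    return 1.0 - p_los(h_j, r_ju, psi, omega)


# Illustrative constants (hypothetical choice, not taken from the paper).
PSI, OMEGA = 9.61, 0.16

if __name__ == "__main__":
    for altitude in (10.0, 50.0, 100.0):
        print(altitude, round(p_los(altitude, 50.0, PSI, OMEGA), 4))
```

Under this assumed model, raising the UAV for a fixed ground distance increases the elevation angle and therefore the LoS probability, which is one way to read the altitude trade-off discussed for the WSC results.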

A. Precoder Design
The precoder is designed to inject the jamming signal into the null space of B. For that purpose, it is considered that adjacent pairs of UAVs act in a coordinated manner such that $s_{J_j} = s_{J_{j+1}}$ for $j \in \{1, 3, 5, \ldots, n_{\mathrm{UAV}} - 1\}$, and all UAVs employ the same transmission power $P_{J_i} = P_J$ for $i \in \{1, 2, \ldots, n_{\mathrm{UAV}}\}$. Thus, the approach presented in [8] can be generalized by defining a precoder $\mathbf{q}_A$ for the information signal and one precoder per pair of UAVs, indexed by $i \in \{1, 2, \ldots, n_{\mathrm{UAV}}/2\}$. These precoders are built from the vectors $\mathbf{e}_i^{(N)}$, where $\mathbf{e}_i^{(N)}$ is an $N$-element vector of zeros with the $i$-th element equal to 1, such that $\mathbf{h}_U^H \mathbf{e}_j^{(n_{\mathrm{UAV}}+1)} = h_{J_{j-1}U}$. These precoders allow the jamming signals from each pair of UAVs to be nullified at B. Thus, the received signal-to-interference-plus-noise ratio (SINR) at B is $\gamma_B = \gamma_A g_{AB}$, while the SINR at E, given in (3), additionally depends on the interference terms $g_{\mathrm{INT}_{2i-1}}$ associated with the pairs of UAVs. Here, $\gamma_A = P_A / N_0$ and $\gamma_J = P_J / N_0$ are the transmit signal-to-noise ratios (SNRs) at A and at each of the UAVs, respectively, with the total jamming power being $\gamma_T = \sum_{i=1}^{n_{\mathrm{UAV}}} \gamma_{J_i}$.
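The nulling idea for a single pair of UAVs can be sketched numerically. If both UAVs of a pair transmit the same jamming symbol, weighting them by $(h_{J_2B}, -h_{J_1B})$ makes their contributions cancel at B, while in general they do not cancel at E. The weights, the unit-norm normalization, and the function names below are assumptions for illustration and are not the precoder expressions of this paper or of [8].

```python
import math


def pair_null_weights(h_j1b, h_j2b):
    """Unit-norm weights for one UAV pair that cancel the jamming at B.

    Hypothetical construction: swap the two channel coefficients towards B
    and flip one sign, so that h_j1b * w1 + h_j2b * w2 = 0.
    """
    norm = math.sqrt(abs(h_j1b) ** 2 + abs(h_j2b) ** 2)
    return (h_j2b / norm, -h_j1b / norm)


def received_jamming(h_j1, h_j2, weights, symbol=1.0):
    """Jamming seen at a node with channels (h_j1, h_j2) from the pair."""
    w1, w2 = weights
    return (h_j1 * w1 + h_j2 * w2) * symbol


if __name__ == "__main__":
    h_j1b, h_j2b = 0.3 + 0.4j, -0.2 + 0.1j   # channels towards Bob
    h_j1e, h_j2e = 0.5 + 0.1j, 0.1 - 0.3j    # channels towards Eve
    w = pair_null_weights(h_j1b, h_j2b)
    print("at B:", abs(received_jamming(h_j1b, h_j2b, w)))
    print("at E:", abs(received_jamming(h_j1e, h_j2e, w)))
```

The residual at E shrinks when the pair's channels towards E resemble those towards B, which is the intuition later used to explain why closely spaced UAVs jam less effectively.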

III. PERFORMANCE ANALYSIS
The improvement of the secrecy performance of the proposed system due to the injection of jamming from the UAVs can be measured by the metric $\Delta$ proposed in [6], given in (4), which is defined in terms of the SOP computed with jamming, $\mathrm{SOP}_J$, and without jamming, $\mathrm{SOP}_{NJ}$. Note that for $\Delta > 1$, an improvement in the secrecy performance is obtained by the use of jamming. Therefore, the SOP for the proposed system is derived in Proposition 1.
Proposition 1: The SOP of the proposed ground system with the assistance of N UAV-based jammers is given by a closed-form expression in terms of $\beta$ and $\eta$, where $\mathrm{SOP} \in \{\mathrm{SOP}_J, \mathrm{SOP}_{NJ}\}$, $\beta \in \{\beta_J, \beta_{NJ}\}$ and $\eta \in \{\eta_J, \eta_{NJ}\}$, depending on whether it corresponds to the jamming or the no-jamming case.
Proof: The proof is provided in Appendix A.
To remove the need for knowledge of the eavesdropping channel, the area-based metric WSC is considered to gain insight into the secrecy performance of the system, thus measuring the coverage of efficient employment of jamming, as proposed in [7]. In this case, it can be noticed that, due to the precoder, the injection of jamming from the UAVs only affects E negatively and not B. Thus, $\mathrm{SOP}_J \le \mathrm{SOP}_{NJ}$ holds for all points in the area $S$, and $\Delta \ge 1$ in (4), which has also been demonstrated for the 2-UAV case in [8]. Then, the WSC can be expressed as an integral over the position of E, written either in rectangular coordinates, where $dx_E$ and $dy_E$ are the differentials over the position of E, or in polar coordinates, where $r_E$ and $\theta_E$ are the radius and angle of E, and $dr_E$ and $d\theta_E$ are their respective differentials. Thus, the WSC is an integral over all the possible positions of E over $S$.
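A Monte Carlo estimate offers a simple way to sanity-check a closed-form SOP. The sketch below is a minimal illustration under stated assumptions: the SOP is taken as the probability that the secrecy capacity falls below the target rate $R_S$, the Nakagami-m power gains are drawn as Gamma variables with shape $m$ and mean $\omega$, and the efficiency metric is assumed to be the ratio $\mathrm{SOP}_{NJ} / \mathrm{SOP}_J$, which is consistent with $\Delta > 1$ indicating an improvement. All function names are hypothetical.

```python
import math
import random


def secrecy_outage(gamma_b, gamma_e, r_s):
    """True if the secrecy capacity for this SNR pair is below r_s."""
    c_s = max(0.0, math.log2(1.0 + gamma_b) - math.log2(1.0 + gamma_e))
    return c_s < r_s


def estimate_sop(snr_pairs, r_s):
    """Fraction of (gamma_B, gamma_E) samples that are in secrecy outage."""
    outages = sum(secrecy_outage(gb, ge, r_s) for gb, ge in snr_pairs)
    return outages / len(snr_pairs)


def nakagami_gain(m, omega, rng):
    """Power gain of a Nakagami-m channel: Gamma(shape=m, scale=omega/m)."""
    return rng.gammavariate(m, omega / m)


def delta(sop_nj, sop_j):
    """Assumed efficiency metric: SOP without jamming over SOP with jamming."""
    return sop_nj / sop_j


if __name__ == "__main__":
    rng = random.Random(0)
    gamma_a = 10.0  # transmit SNR at A (illustrative value)
    pairs = [(gamma_a * nakagami_gain(2.0, 1.0, rng),
              gamma_a * nakagami_gain(2.0, 0.5, rng)) for _ in range(10000)]
    print("SOP without jamming:", estimate_sop(pairs, r_s=1.0))
```

The no-jamming case is shown because it only needs the ground channels; the jamming case would divide the SNR at E by one plus the received jamming power, whose exact form is given in (3).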
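The area integral can be approximated numerically with a midpoint rule in polar coordinates. The sketch below assumes that the WSC is the integral of $\Delta$ over $S$ normalized by the area $|S| = \pi R_A^2$, so that $\Delta = 1$ everywhere gives a WSC of 1; this normalization and the callable `delta_fn` are assumptions for illustration rather than the exact definition in [7].

```python
import math


def wsc_polar(delta_fn, r_a, n_r=200, n_theta=360):
    """Midpoint-rule approximation of the area-normalized integral of Delta.

    delta_fn(x_e, y_e) is a hypothetical callable returning Delta for an
    eavesdropper at rectangular coordinates (x_e, y_e).
    """
    dr = r_a / n_r
    dtheta = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r_e = (i + 0.5) * dr
        for k in range(n_theta):
            theta_e = (k + 0.5) * dtheta
            x_e = r_e * math.cos(theta_e)
            y_e = r_e * math.sin(theta_e)
            # r_e * dr * dtheta is the polar area element.
            total += delta_fn(x_e, y_e) * r_e * dr * dtheta
    return total / (math.pi * r_a ** 2)


if __name__ == "__main__":
    # Toy Delta that grows with the distance of E from A.
    print(wsc_polar(lambda x, y: 1.0 + (x * x + y * y) / 50.0 ** 2, 50.0))
```

The grid resolution trades accuracy for running time, which matters because the WSC is evaluated once per action and time step in the position control loop.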

IV. ANGULAR POSITION CONTROL OPTIMIZATION
To simplify the coordination control and maneuverability of the UAVs, as well as to reduce the probability of collisions, a fixed altitude and surveillance radius are assumed, so that the position control of the UAVs is carried out over their angular positions. For that purpose, it is assumed that every adjacent pair of UAVs is located at the same angular separation $\theta$ from each other. Furthermore, according to the observations in [8] for the 2-UAV case, with the precoder design that injects jamming into the null space of B, the angle of one of the UAVs that maximizes the WSC is at $0^\circ$ with respect to the axis formed between A and B, while the other UAV changes its angle as in Fig. 1(c). Under these circumstances, the position optimization problem of the UAVs can be formulated as the maximization of the WSC over the common angular separation $\theta$, as stated in (8), where UAV $J_1$ is fixed at $0^\circ$ and the rest of the UAVs are positioned at an angular separation of $\theta$ from the previous one. Due to the complexity of the WSC expression, and to account for the unreliability of the estimate of $x_B$, a machine learning approach is proposed to address the optimization problem in (8): the problem is modeled following a MAB framework, which can be solved with a UCB algorithm [10].
The MAB problem consists of a decision-making entity, the so-called agent, which chooses among a discrete number of actions that it can perform in order to maximize the utility obtained from the environment. The agent alternates between exploiting its knowledge of the environment to obtain higher rewards and exploring other actions to obtain a better understanding of the environment [10].
In the current system, it is assumed that the actions belong to the action space, which is a set of $N_\theta$ angles $\boldsymbol{\theta} = [\theta_1, \ldots, \theta_{N_\theta}]$ with a range limited to (8b), and the reward is the WSC obtained when the UAVs enforce the chosen angular separation. According to the UCB algorithm, action $A_t$ is chosen at time $t$ from $\boldsymbol{\theta}$ following the rule
$$A_t = \arg\max_{\theta_i \in \boldsymbol{\theta}} \left[ Q_t(\theta_i) + c \sqrt{\frac{\ln t}{N_t(\theta_i)}} \right],$$
where $Q_t(\theta_i)$ is the expected reward estimate of action $\theta_i$ at time $t$, $N_t(\theta_i)$ is the number of times the action $\theta_i$ has been chosen up to time $t$, and $c$ is a constant parameter that controls the degree of exploration. After choosing action $A_t$, its corresponding expected reward estimate is refined by observing the WSC obtained, and this process is carried out over time, exploring the different actions based on their current expected rewards and the number of times they have been chosen. Then, the physical angular position of the UAVs is chosen as the action with the current highest expected WSC in an off-policy fashion. It is worth noting that a common angular separation between adjacent UAVs greatly reduces the action space of the system with respect to a scenario with different angles per UAV. For further details on the implementation of the algorithm, the related scripts are available from the following repository: https://github.com/xflorescStaff/UAV-MAB-Framework.git.
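The geometry implied by a common angular separation can be written down directly: UAV $J_1$ sits at $0^\circ$ and each following UAV is rotated by $\theta$ from the previous one, all at the same orbit radius and altitude. The function name and the choice to return rectangular coordinates are illustrative assumptions.

```python
import math


def uav_positions(n_uav, theta_deg, r_j, h_j):
    """Rectangular coordinates of n_uav UAVs on a common orbit.

    UAV i (0-based) is placed at angle i * theta_deg, measured from the
    axis formed between A and B, at orbit radius r_j and altitude h_j.
    """
    positions = []
    for i in range(n_uav):
        angle = math.radians(i * theta_deg)
        positions.append((r_j * math.cos(angle), r_j * math.sin(angle), h_j))
    return positions


if __name__ == "__main__":
    for position in uav_positions(4, 60.0, 30.0, 10.0):
        print(tuple(round(coordinate, 3) for coordinate in position))
```

With four UAVs and a separation of 60 degrees, the last UAV ends up at 180 degrees, so the whole configuration is described by a single scalar decision variable.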
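The selection and update steps described above can be sketched as follows. This is a minimal UCB loop under stated assumptions: untried actions are selected first, the reward estimate is a running sample average, and `reward_fn` is a hypothetical stand-in for the WSC observed at a given angular separation. It is not the implementation from the linked repository.

```python
import math


def ucb_select(q, n, t, c):
    """Index of the action maximizing Q + c * sqrt(ln t / N)."""
    for i, count in enumerate(n):
        if count == 0:  # try every action once before using the bound
            return i
    scores = [q[i] + c * math.sqrt(math.log(t) / n[i]) for i in range(len(q))]
    return max(range(len(q)), key=scores.__getitem__)


def run_ucb(angles, reward_fn, steps, c=1.0):
    """Run UCB over a discrete set of angular separations.

    Returns the angle with the highest estimated reward (the one the UAVs
    would physically adopt), plus the estimates and selection counts.
    """
    q = [0.0] * len(angles)
    n = [0] * len(angles)
    for t in range(1, steps + 1):
        a = ucb_select(q, n, t, c)
        reward = reward_fn(angles[a])
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]  # incremental sample average
    best = max(range(len(angles)), key=q.__getitem__)
    return angles[best], q, n


if __name__ == "__main__":
    angles = [0.0, 45.0, 90.0, 135.0, 180.0]
    toy_wsc = lambda theta: 1.0 - abs(theta - 135.0) / 180.0  # toy reward
    best, q, n = run_ucb(angles, toy_wsc, steps=200)
    print(best, n)
```

Separating the action that is explored from the angle that is physically adopted mirrors the off-policy choice described in the text: exploration can continue without moving the UAVs to poorly performing separations.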

V. NUMERICAL RESULTS AND DISCUSSIONS
In this section, simulations of illustrative cases are carried out for the proposed MAB-based optimization approach. Our results are compared with a PGD algorithm, the UAV 2D positioning algorithm proposed in [5], and the discretized exhaustive search results as benchmarks. The simulations are performed over 100 Monte Carlo experiments. For each experiment, a starting point and an ending point for B are randomly selected within the region $S$, and the results are also compared with the ones obtained by exhaustive search on $\boldsymbol{\theta}$ to obtain the best possible WSC.
Fig. 2 shows the SOP versus $R_S$ for different numbers of UAVs and total jamming power $\gamma_T$, considering E at position $(-20, 0, 0)$. Note that the theoretical expression for the SOP perfectly matches the Monte Carlo simulations. As expected, the SOP increases with the value of $R_S$, and it decreases with higher $\gamma_T$ values. This does not occur for the single-UAV case, which cannot implement the precoding scheme since it is implemented only over pairs of UAVs. For the single-UAV case, higher $\gamma_T$ values lead to more interference at B, thus increasing the SOP. For the other cases, with 2, 4 and 16 UAVs, increasing $\gamma_T$ results in a lower SOP, as the jamming effectively impairs the channel to E without affecting the one to Bob. The difference is more significant for higher $\gamma_T$ values.
Fig. 3 shows the normalized WSC averaged over the Monte Carlo experiments versus the number of RL loops. It is assumed that the RL algorithm is performed once at each step of the movement of B from its initial to its final point, and the UAVs are positioned after each RL loop. The figure shows curves for different uncertainty values, $\sigma_{AB}$, and the comparisons with PGD, the approach from [5], and exhaustive search over $\boldsymbol{\theta}$ (where no uncertainty is considered on the position of B). Note that the proposed algorithm tightly approximates the exhaustive search results within a few RL loops for smaller values of the uncertainty $\sigma_{AB}$ on the position of B. Also, a significant gain in terms of WSC is observed over the performance obtained by the PGD method and that of [5]. In particular, the PGD approach reaches lower WSC values because it performs a single gradient descent step per iteration, which is an unreliable and outdated step given the movement of Bob and the uncertainty in its position. The approach from [5] underperforms because of its coarse discretization of the positions of the eavesdroppers and because the efficiency and coverage of the jamming are not taken into consideration.
Fig. 4 shows the WSC averaged over the Monte Carlo experiments and over the duration of the movement of B versus $\gamma_T$, by considering $\gamma_J = \gamma_T / n_{\mathrm{UAV}}$ and different values of $n_{\mathrm{UAV}}$ with $m = 1$. Note that the proposed precoding scheme increases the WSC of the system as more power is allocated for jamming, which is expected, as any jamming is nullified at B while affecting only E. The non-precoding case performs poorly as the jamming power increases. It is also observed that increasing the number of UAVs impairs the performance in terms of WSC. To understand this behavior, consider the term $g_{\mathrm{INT}_{2i-1}}$ in (3). Since no interference is produced at B when the precoder is employed, the WSC is affected only by the jamming produced at all possible positions of the eavesdropper. If there are more UAVs in the system, they are positioned closer to each other, making their channels to a given E or to B similar, and thus the term $g_{\mathrm{INT}_{2i-1}}$ becomes smaller. Therefore, for larger numbers of UAVs, the impact of the jamming on the eavesdropper decreases, resulting in smaller WSC values.
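To make the single-step behavior of the PGD benchmark concrete, the sketch below performs one projected gradient step on a scalar angular separation. Since the WSC is maximized, the step is written as an ascent on the objective, which is equivalent to a descent on its negative. The finite-difference gradient, the step size, and the bounds are illustrative assumptions, not the benchmark implementation used for the reported results.

```python
def projected_gradient_step(objective, theta, step_size, lower, upper, eps=1e-3):
    """One gradient step on theta followed by projection onto [lower, upper].

    objective(theta) is a hypothetical callable returning the WSC estimate;
    its derivative is approximated with a central finite difference.
    """
    grad = (objective(theta + eps) - objective(theta - eps)) / (2.0 * eps)
    candidate = theta + step_size * grad  # ascent step (WSC is maximized)
    return min(max(candidate, lower), upper)  # projection onto the interval


if __name__ == "__main__":
    toy_wsc = lambda theta: -(theta - 100.0) ** 2  # toy concave objective
    theta = 90.0
    for _ in range(5):
        theta = projected_gradient_step(toy_wsc, theta, 0.1, 0.0, 180.0)
        print(round(theta, 3))
```

When the objective itself shifts between steps, as it does while Bob moves and its position estimate is noisy, a single step computed on the previous objective can point in a stale direction, which matches the explanation given for the lower WSC of the PGD benchmark.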
In Fig. 5, the angular separations $\theta$ averaged over the Monte Carlo experiments and over the duration of the movement of B are presented versus $\gamma_T$ for different values of $n_{\mathrm{UAV}}$ with $m = 1$. Note that, in the results obtained by using the precoding scheme, the angles remain constant for all values of $\gamma_T$, and the optimal angular separations are such that the UAVs are approximately positioned from $0^\circ$ to $180^\circ$ at regular intervals. On the other hand, the results obtained without precoding show that the angular separations between UAVs decrease for higher $\gamma_T$ values, so that the UAVs are positioned closer together.
Fig. 6 shows the average WSC values versus the common altitude $h_J$ of the UAVs for different values of $\gamma_J$. Note that there is an altitude for which the best WSC values are obtained in every case, which may be caused by the change in the predominance of LoS and NLoS components as the altitude of the UAVs increases. Moreover, for high enough $\gamma_J$ values, the precoding case always outperforms the non-precoding case. Also, as the UAV common altitude increases, the WSC for the precoding case does not fall below 1, as $\Delta > 1$ always holds.
Fig. 7 shows the average WSC versus $\gamma_T$ for different values of the shape parameter $m$ for the ground channels, where $m = m_B = m_E$. The results show the performance for the cases with and without the proposed precoding scheme, following the angular pattern described in Fig. 1(b) (Symmetric) and in Fig. 1(c) (Asymmetric). Note that the WSC performance increases for higher values of $m$, for which the non-precoding case performs better for smaller $\gamma_T$ values. On the other hand, it can be seen that the asymmetric angular positioning of the UAVs obtains better results for the precoding case, while the symmetric angular positioning obtains better results for the non-precoding case.
Regarding the time consumption of the MAB algorithm, the PGD approach, and the exhaustive search approach, relations between their running times were obtained, where $T_{\mathrm{RL}}$ is the running time of a MAB iteration, $T_{\mathrm{PGD}}$ is the running time of a PGD iteration, and $T_{\mathrm{ES}}$ is the running time of an exhaustive search iteration. These relations were obtained by executing $10^4$ experiments under the same conditions, which yielded mean running times of 34.7 ms for the exhaustive search iterations, 6.9 ms for the PGD iterations, and 3.5 ms for the MAB iterations. That is, a MAB iteration takes roughly half the time of a PGD iteration and one tenth of the time of an exhaustive search iteration.
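The relative running times follow directly from the reported means. The short computation below only restates the three measured values from the text; the dictionary keys and variable names are illustrative.

```python
# Mean running time per iteration reported in the text, in milliseconds.
mean_ms = {"ES": 34.7, "PGD": 6.9, "MAB": 3.5}

# Running time of each method relative to one MAB iteration.
ratios = {method: value / mean_ms["MAB"] for method, value in mean_ms.items()}

if __name__ == "__main__":
    for method, ratio in ratios.items():
        print(f"T_{method} / T_MAB = {ratio:.2f}")
```

The ratios come out at about 1.97 for PGD and 9.91 for exhaustive search, relative to one MAB iteration.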

VI. CONCLUSION
In this paper, we proposed a fast and simple method for performing angular position control of $n_{\mathrm{UAV}}$ UAVs acting as jammers to assist a legitimate ground communication under Nakagami-m fading. A precoder was proposed to inject the jamming signal into the null space of the legitimate receiver. Then, the optimization problem considered the maximization of the WSC under uncertainty on the position of the legitimate receiver, and it was modeled following a simple MAB-based algorithm. For the considered system, the results showed that the proposed algorithm outperforms the PGD benchmark and the benchmark from [5] in terms of WSC, and that it converges to the exhaustive search benchmark as the uncertainty on the position of the legitimate receiver decreases, while also showing a much smaller running time than the PGD and the exhaustive search benchmarks. The precoding scheme achieves better WSC than the non-precoding case. In particular, the secrecy performance in terms of WSC is better for a lower number of UAV-based jammers.

APPENDIX A PROOF OF PROPOSITION 1
Considering $R_S > 0$, the SOP can be expressed as the probability that the secrecy capacity of the system falls below the target secrecy rate $R_S$. In the simplified expressions for the jamming and no-jamming cases, the terms in parentheses of each expression are represented as $\beta \in \{\beta_J, \beta_{NJ}\}$ and $\eta \in \{\eta_J, \eta_{NJ}\}$, respectively, for the jamming (J) or the no-jamming (NJ) case. Then, a general expression applicable to either case can be written in terms of $f_X(\cdot)$ and $F_X(\cdot)$, which are the PDF and the CDF of the random variable $X$. Step (a) is obtained by writing the CDF of $g_{AB}$ and the PDF of $g_{AE}$ and applying the sum representation of the incomplete gamma function [11, 8.352.4] and the binomial expansion [11, 1.111]. Step (b) is obtained by solving the integrals and rearranging the terms.

Manuscript received 21 September 2022; revised 10 April 2023 and 28 June 2023; accepted 24 July 2023. Date of publication 28 July 2023; date of current version 19 December 2023. This work was supported in part by the Academy of Finland through 6G Flagship Program under Grant 346208, and in part by Project FAITH under Grant 334280. The review of this article was coordinated by Prof. Yulin Hu. (Corresponding author: X. A. F. Cabezas.)

In each experiment, the start and end points of B are selected such that $0 \le x_B^{(o)}, x_B^{(f)} \le R_A$, with equal probability over their domain. Then, it is assumed that B moves with constant speed from the start point to the final point over $T = 20$ time steps. At each time step, a single MAB execution and a single PGD execution are performed, and the corresponding WSC values are computed.

Fig. 2. SOP versus $R_S$ values for different numbers of UAVs and values of $\gamma_T$.