Dealing With Jamming Attacks in Uplink Pairwise NOMA Using Outage Analysis, Smart Relaying, and Redundant Transmissions

This study focuses on optimizing the performance of an uplink pairwise Non-Orthogonal Multiple Access (NOMA) scenario with and without the support of a relayer, while subject to jamming attacks. We consider two different relaying protocols, one where the sources and the destination are within range of each other and one where they are not. The relay node can be mobile, e.g., a mobile base station, an unmanned aerial vehicle (UAV) or a stationary node that is chosen as a result of a relay selection procedure. We also benchmark with a NOMA retransmission protocol and an Orthogonal Multiple Access (OMA) scheme without a relayer. We analyze, adjust and compare the four protocols for different settings using outage analysis, which is an efficient tool for establishing communication reliability for both individual nodes and the overall wireless network. Closed-form expressions of outage probabilities can be adopted by deep reinforcement learning (RL) algorithms to optimize wireless networks online. Accordingly, we first derive closed-form expressions for the individual outage probability (IOP) of each source node link and the relayer link using both pairwise NOMA and OMA. Next, we analyze the IOP for one packet (IOPP) for each source node considering all possible links between the source node to the destination, taking both phases into account for the considered protocols when operating in Nakagami- $m$ fading channels. The overall outage probability for all packets (OOPP) is defined as the maximum IOPP obtained among the source nodes. This metric is useful to optimize the whole wireless network, e.g., to ensure fairness among the source nodes. Then, we propose a method using deep RL where the OOPP is used as a reward function in order to adapt to the dynamic environment associated with jamming attacks. Finally, we discuss valuable guidelines for enhancing the communication reliability of the legitimate system.


I. INTRODUCTION
Considering the stringent requirements for embedded or cyber-physical systems, Non-Orthogonal Multiple Access (NOMA) has several advantages [1], [2], [3]. It has been shown that NOMA-based systems can provide more predictable communication than Orthogonal Multiple Access (OMA)-based systems with proper settings for the specific application [4]. Moreover, performance in terms of outage probability, wireless connectivity, and user fairness can be improved compared to OMA-based systems [3], [5], [6]. In principle, multiple source nodes are served simultaneously using the same time and frequency in uplink NOMA. To separate each source node's signal at the destination, a successive interference cancellation (SIC) unit is used.
However, due to the complexity of the SIC unit and the occurrence of imperfect SIC in practice, keeping the number of simultaneously active source nodes low, as in, e.g., pairwise NOMA, is practical [7]. In fact, pairwise NOMA can be deployed on top of existing OMA-based protocols for various applications such as factory automation [4], [8], [9], [10].
Due to the open nature of wireless transmissions and the exponential growth in the number of wireless devices, interference from co-located clusters and other wireless networks operating in the same frequency band and within the same area cannot be avoided [11], [12]. In the worst case, harmful jamming attacks also aim to interrupt the ongoing transmissions of legitimate communication systems by generating noise signals over the relevant wireless channels [13], [14], [15]. For example, a potential reactive jamming attack is reported by the National Institute of Standards and Technology (NIST) in [16]. As a result, interference from jammers and/or interferers needs to be taken into account. Moreover, the jammers may be smart and deploy their own strategies, e.g., power allocation and location, to optimize their own systems. Therefore, the legitimate system should also have a proper tool to deal with this situation.
To further enhance the communication reliability of pairwise NOMA in the presence of jamming or strong interference, relaying can be used [17], [18]. In general, a selected relayer located between the source node and the destination can help to forward the source node's packet to the destination, increasing the probability that the packet is decoded correctly at the destination. In the literature, there are different types of relaying strategies which are applicable in uplink NOMA, and in this work we focus on relaying protocols that include two source nodes, one or multiple relayers, and a destination. In the first phase, the source nodes transmit their own packets to all relayers and to the destination. Thereafter, only one (selected) relayer forwards the correctly decoded packet(s) to the destination during the second phase using OMA. Examples of relaying protocols aiming to improve the communication reliability for multiple source nodes communicating with only one destination can be found in [19], [20], [21], [22]. However, to achieve the reliability improvement, both the power allocation and the behavior of the relaying protocols, e.g., the selected relayer's position, play an important role, especially in the presence of jamming attacks and/or strong interference. The protocols presented in previous publications, e.g., [19], [20], [21], [22], have not taken jamming attacks into account, and thus an investigation of the effects of jamming attacks and how to deal with them in uplink pairwise NOMA using relaying is needed.
In this work, we consider the scenario shown in Fig. 1, where a pair of source nodes communicates with one destination, aided by a (mobile) relayer, e.g., an unmanned aerial vehicle (UAV). When the direct links between the two source nodes and the destination are not available, we term the protocol the conventional NOMA relaying protocol (CNRP). In contrast, when the direct links between both source nodes and the destination exist, the relaying protocol is termed the full NOMA relaying protocol (FNRP). We also consider a pure retransmission NOMA protocol (RNP) in which both source nodes retransmit their packets one more time in uplink pairwise NOMA without help from any relayer. To benchmark these three protocols (CNRP, FNRP, and RNP), an adaptive power allocation OMA protocol (APAOP) is also considered, in which each source node transmits its own packet to the destination in its own phase using adaptive power allocation. We analyze, adjust, and compare the four protocols such that the outage probability is minimized. To evaluate the communication reliability for the whole network, we extend the individual outage probability (IOP) for a single wireless link to the IOP for one packet (IOPP) for each source node, taking multiple links into account, and then finally to the overall outage probability for all packets (OOPP). Accordingly, the main contributions of this work can be summarized as follows:
• Closed-form expressions for the IOP of each source to destination link when using pairwise NOMA or OMA, each source node to the relayer link using pairwise NOMA, as well as the relayer to destination link using OMA are derived for Nakagami-m fading channels.
• We analyze the IOPP for each source node considering multiple links between the two source nodes and the destination, between both source nodes and the relayer, and between the relayer and the destination, taking both phases into account for the considered protocols.
• The OOPP is defined as the maximum IOPP obtained among the source nodes. This metric is useful to optimize the entire wireless network, e.g., to ensure the fairness condition among the source nodes.
• Based on the OOPP, we propose a method to use deep reinforcement learning (RL) with OOPP as the reward function to adapt to the dynamic environment associated with jamming attacks.
The rest of this paper is organized as follows: The related works are described in Section II, followed by the system model and a description of the four protocols, with and without a relayer, in Section III. Next, the considered protocols are introduced in Section IV. After that, Section V presents the calculation of the IOP for one link, the IOPP for each source node, and the OOPP in the uplink pairwise NOMA scenario. Next, the deep RL architecture is introduced in Section VI. After that, numerical results are presented in Section VII before providing some general guidelines. Finally, Section VIII concludes the paper.

II. RELATED WORKS
In [23], a relaying scheme for uplink NOMA is proposed with a fixed decoding order (FDO) scheme to enable decoding via linear combining without performing SIC at the destination. Closed-form expressions of the outage probability are derived in Nakagami-m fading channels for both source nodes. However, dynamic decoding order (DDO) is shown as a way to improve the communication reliability [10], [15], [24]. Therefore, a DDO should be adopted. A spectrally-efficient cooperative relaying protocol is presented for use in a scenario with multiple relayers [19]. The relayer is chosen based on the maximum channel gain criterion among the channel gains between the relayers and the destination. A distributed energy efficiency maximization strategy is considered using a game theoretic approach [25]. Several amplify and forward (AF) schemes are investigated in [26], [27], [28], in which the maximum signal-to-interference-plus-noise ratio (SINR) is used to select the best relayer in [28]. In [20], a UAV acting as a relayer in a disaster area is proposed to provide uplink relaying services, and the system throughput is also evaluated. To improve the achievable effective capacity for machine-type communication applications, uplink NOMA and buffer-aided relaying are proposed to assist the finite blocklength transmission [29], [30], [31], while a scenario with multiple relayers is considered using various Relay Selection (RS) strategies in [32]. A massive machine-type communication scheme is also investigated in [33]. In [34], [35], ergodic capacity and outage performance are presented and discussed for an uplink full-duplex cooperative NOMA system. A multi-hop cooperative NOMA scheme is introduced to enhance system energy efficiency and reliability [36]. In [37], the authors considered cooperative techniques suited to pattern division multiple access and evaluated the system using outage probability and system throughput. An energy-efficient secure short packet transmission scheme for cooperative NOMA is proposed to assist massive machine-type communication applications [38]. In [39], a max-min SINR criterion is used to select the best relayer in an uplink NOMA AF scheme, and then the outage probability and throughput are evaluated. A partial decode and amplify NOMA scheme for uplink cooperative short packet communication is investigated using the closed-form expression of the average block error rate [21]. In [22], the outage probability of end-to-end uplink and downlink cooperative NOMA is evaluated over Nakagami-m fading channels. In [40], the effects of the relayer's position and power allocation on the ergodic rate are analyzed. In [41], an uplink NOMA-based hybrid satellite-terrestrial relay network is proposed and evaluated using the outage performance. However, the effects of power allocation for uplink NOMA and of the relayer's position still need to be investigated in the presence of jamming attacks. Note that the relay position can vary either because the relay is a mobile access point or a UAV, or as the result of a selection procedure among multiple fixed-location relayers. In addition, jammers can be smart and change their strategies, e.g., location, transmit power, etc., to defeat legitimate systems. Therefore, to deal with these situations, deep RL is a good approach to enhance the communication reliability of legitimate systems [42], [43].

III. SYSTEM MODEL
In this work, we consider a system consisting of two source nodes $S_s$, $s \in \{1, 2\}$, communicating with a destination $D$ aided by a mobile access point (AP) $R$, e.g., a UAV, acting as a relayer in the presence of jamming attacks (Fig. 1). For the CNRP, FNRP, and RNP, during the first phase, both source nodes transmit their packets to the relayer and the destination in uplink pairwise NOMA, while only jammer $J_1$ is active and generates a jamming signal over all channels to attack all relayers and the destination. In the second phase of the CNRP and FNRP, the relayer forwards the packet(s) received in the previous slot to the destination in the presence of a jamming attack from $J_2$. For the RNP, both source nodes retransmit their packets in the second phase. For the APAOP, one source node is active in each phase. Note that both jammers are mobile and smart enough to change their transmit power level and position to defeat the legitimate system, while still optimizing their own systems. We assume that the two phases have equal length and that only one jammer is active in each phase. Note that all legitimate devices are located inside the border and are protected by fences or walls. Consequently, all jammers can only stay outside the border. Here, the UAV acting as a relayer is located at altitude $h$ above the plane containing both source nodes, the destination, and the jammers. This gives a higher chance of line of sight (LoS) between the relayer and the legitimate devices on the ground, which can enhance communication reliability. Channels between $D$ and $S_s$, between $R$ and $S_s$, between $R$ and $D$, between $J_1$ and $D$, between $R$ and $J_1$, and between $J_2$ and $D$ are , and [15], respectively.
The channel coefficients $g_s^{SD}$, $g_s^{SR}$, $g^{RD}$, $g_{J_1}$, $g_{J_1R}$, and $g_{J_2}$ are assumed to follow Nakagami-$m$ fading, which can model a large number of wireless channels by adjusting its parameters, e.g., Rayleigh fading with $m = 1$ and Rician fading with parameter $K$ when $m = (K+1)^2/(2K+1)$. Each link has its own distance and path-loss exponent, namely between $D$ and $S_s$, between $R$ and $S_s$, between $R$ and $D$, between $J_1$ and $D$, between $R$ and $J_1$, and between $J_2$ and $D$. We also assume that all devices operate in half-duplex mode with a single antenna. Moreover, the relayer, $S_s$, $D$, $J_1$, and $J_2$ are located at $(x_R, y_R, h)$, $(x_{S_s}, y_{S_s}, 0)$, $(x_D, y_D, 0)$, $(x_{J_1}, y_{J_1}, 0)$, and $(x_{J_2}, y_{J_2}, 0)$, respectively. The link distances, through $d_{J_2}$, are the Euclidean distances between these coordinates. In practice, perfect channel state information (CSI) is not available at the receiver(s). Therefore, imperfect CSI is considered in this work. The channel coefficients $g_x^y$, $(x, y) \in \{(s, SD), (s, SR), (, RD)\}$, are estimated using linear minimum mean square error (LMMSE) estimation. The corresponding channel gains, through $|g_{J_2}|^2$, can also be characterized by Gamma distributions with unit mean and shapes $m_x^y$, $m_{J_1}$, $m_{J_1R}$, and $m_{J_2}$, respectively. In this work, the channel estimation errors are considered to be fixed and independent of the average SINR.
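The distance expressions follow directly from the node coordinates given above. The following Python sketch computes the six link distances as plain Euclidean distances. The function name `link_distances`, the dictionary keys, and the relayer position used in the example are illustrative assumptions, not taken from the paper; the other coordinates match the configuration reported in Section VII.

```python
import math

def link_distances(s1, s2, dest, relay, j1, j2):
    """Euclidean distances for the links of the system model.

    Every argument is an (x, y, z) tuple. The relayer sits at altitude h
    (its z coordinate); all other nodes lie in the ground plane (z = 0).
    The dictionary keys are hypothetical labels chosen for this sketch.
    """
    return {
        "S1-D": math.dist(s1, dest),
        "S2-D": math.dist(s2, dest),
        "S1-R": math.dist(s1, relay),
        "S2-R": math.dist(s2, relay),
        "R-D": math.dist(relay, dest),
        "J1-D": math.dist(j1, dest),
        "J1-R": math.dist(j1, relay),
        "J2-D": math.dist(j2, dest),
    }

# Example: the relayer position (100, 50, 20) is an assumed value.
d = link_distances(
    s1=(0, 0, 0), s2=(25, 100, 0), dest=(200, 50, 0),
    relay=(100, 50, 20), j1=(100, -20, 0), j2=(200, -20, 0),
)
```

Because the relayer is elevated, every link that involves $R$ picks up the extra $h^2$ term under the square root, while the ground links do not.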
For the CNRP and FNRP, the received signals at $R$ in the first phase and at $D$ in both phases are expressed in terms of the following quantities: the total transmit power of both source nodes $P$, the transmit power of the relayer $P_R$, the transmit powers of jammers 1 and 2, the power allocation level for each source node $S_s$, the uplink signal of $S_s$, the signal of $R$, the noise signals of the jamming attacks from jammers 1 and 2, and the additive white Gaussian noise at the relayer and the destination for both phases, modeled as zero-mean circularly symmetric complex Gaussian variables (e.g., $n_R$). Note that the total transmit power level of both source nodes is equal to $P$, i.e., $P_1 + P_2 = P$.
As mentioned above, DDO can help to enhance the communication reliability of the legitimate system [10], [15], [24]. This is why a DDO scheme is considered for the CNRP, FNRP, and RNP. All relayers and the destination use the estimated channel coefficients to decide on the decoding order. Define ) , where $W$ is the system bandwidth. In the first phase, when h SR , $S_1$'s signal is decoded directly by considering both $S_2$'s signal and $J_1$'s signal as interference and is then subtracted by SIC from the received signals $y_R^1$ and $y_D^1$, respectively, before decoding $S_2$'s signal treating $J_1$'s signal as interference. The received SINRs at the relayer $R$ and the destination to decode $z_1$ and $z_2$ follow from this decoding order. Otherwise, $S_2$'s signal is decoded first before decoding $S_1$'s signal, and thus the received SINRs at the relayer and the destination to decode $z_2$ and $z_1$ are formulated with the roles of the two source nodes exchanged. Here, (μ 1 , μ 2 ) and (μ 1 , μ 2 ) do not need to be the same values.
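To make the decoding-order logic concrete, the following Python sketch computes the two SINRs seen by one receiver under a dynamic decoding order. It is a simplified illustration, not the paper's expressions: it assumes perfect CSI and perfect SIC, normalizes the noise power to 1, and assumes that the source with the larger estimated channel gain is decoded first. The function and parameter names are hypothetical.

```python
def ddo_sinrs(gain1, gain2, rho, jam, mu1_s1_first, mu1_s2_first):
    """Return (sinr_z1, sinr_z2) for uplink pairwise NOMA with SIC.

    gain1, gain2 : estimated channel power gains of S1 and S2 (path loss included)
    rho          : total transmit power of both sources over the noise power
    jam          : received jamming power over the noise power
    mu1_s1_first : share of the total power given to S1 when S1 is decoded first
    mu1_s2_first : share of the total power given to S1 when S2 is decoded first
    Assumptions: perfect CSI, perfect SIC, stronger estimated gain decoded first.
    """
    if gain1 >= gain2:
        rx1 = mu1_s1_first * rho * gain1
        rx2 = (1.0 - mu1_s1_first) * rho * gain2
        sinr1 = rx1 / (rx2 + jam + 1.0)   # S2 and the jammer act as interference
        sinr2 = rx2 / (jam + 1.0)         # S1 removed by SIC, jammer remains
    else:
        rx1 = mu1_s2_first * rho * gain1
        rx2 = (1.0 - mu1_s2_first) * rho * gain2
        sinr2 = rx2 / (rx1 + jam + 1.0)   # S1 and the jammer act as interference
        sinr1 = rx1 / (jam + 1.0)         # S2 removed by SIC, jammer remains
    return sinr1, sinr2
```

Keeping two separate power allocation arguments reflects the remark above that the allocation pairs for the two decoding orders do not need to be the same.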
In the second phase of the CNRP and FNRP, when R is active, the received SINR at the destination can be represented as

IV. PROTOCOLS
In this paper, we investigate four protocols: two relaying protocols in uplink pairwise NOMA, a retransmission scheme using uplink pairwise NOMA, and an APAOP using OMA, as follows.

A. CONVENTIONAL NOMA RELAYING PROTOCOL (CNRP)
In the literature, a wide range of relaying protocols has been proposed for various applications [45]. The main principle of many relaying protocols is presented in Table 1, and we name them CNRP when combined with NOMA [36], [46]. Previous studies usually assume that the direct links between the source nodes and the destination suffer from severely bad channels. This is why any direct links between the source nodes and the destination are ignored. When CNRP is adopted, several receivers can operate in a deep sleep mode in different phases, reducing the power consumption of the wireless devices. This protocol can adjust the power allocation factors in the first phase and the relayer's position to improve the communication reliability of the legitimate system.

B. FULL NOMA RELAYING PROTOCOL (FNRP)
In practice, various applications such as quarrying and mining in construction sites still have good direct links between the source nodes and the destination, and thus direct links cannot be ignored. To take all chances into account, both direct and indirect links between the two source nodes and the destination should be considered. We name the protocol that includes direct links between the source nodes and the destination the Full NOMA Relaying Protocol (FNRP). The main idea of FNRP is that all receivers, including the destination, take a chance to decode the packet(s) transmitted in the first phase in Table 1 when operating in receiver mode. Note that the destination is not active during the first phase for the CNRP. Therefore, the FNRP improves the communication reliability compared to the CNRP. Moreover, a maximal-ratio-combining (MRC) scheme can be deployed at the receivers to improve the communication reliability, but this is left for future work. In addition, when the destination correctly decodes one or two packets in the first phase, a feedback signal to the relayer and the two source nodes is useful. However, we consider the worst case in the presence of jamming attacks, in which all feedback is dropped. Similar to the CNRP, both power allocation and the relayer's position are strategies to deal with jammers.

C. RETRANSMISSION NOMA PROTOCOL (RNP)
In the case without the relayer, the two source nodes just transmit and retransmit their packets in uplink pairwise NOMA, and we call this protocol RNP. Accordingly, during the second phase, the received signal and SINRs at the destination are similar to (9), (13), (14), (17), and (18) but with different power allocation factors. With RNP, the power allocation factors for the two source nodes in both phases are useful strategies to improve the communication reliability.

D. ADAPTIVE POWER ALLOCATION OMA PROTOCOL (APAOP)
When there is no relayer, we also consider the Adaptive Power Allocation OMA Protocol (APAOP), in which $S_1$ is active in the first phase with a transmit power level of $\mu_1 P$ and $S_2$ transmits its own packet in the second phase with a transmit power level of $(1 - \mu_1)P$. Accordingly, the received SINRs at the destination in both phases are determined by these transmit power levels. For a fair comparison among the four protocols, the total transmit power of both source nodes in this protocol is twice that of the previous protocols. Similar to the RNP, only power allocation is used to cope with jammers in this protocol.

V. OUTAGE PERFORMANCE ANALYSIS
In this section, we first derive the closed-form expressions of the IOP for each source node at all legitimate receivers using pairwise NOMA in the first phase. Next, the closed-form expressions of the IOP at the destination are derived when using OMA. Finally, the IOPP for each source node considering both phases is presented for all four schemes.

A. THE IOP OF EACH SOURCE NODE AT THE RELAYER AND THE DESTINATION IN THE FIRST PHASE
In this subsection, we derive the IOP of the link from a single source node to the relayer and to the destination using uplink pairwise NOMA. First, we analyze and then derive the IOP of each source node at the destination and the relayer. $S_1$'s signal cannot be decoded successfully at $R$ and $D$ when any of the following three disjoint cases occurs: (i) $R$ and $D$ fail to decode $S_1$'s signal correctly by considering $S_2$'s signal and $J_1$'s signal as interference when h , but $R$ and $D$ are still unable to decode $S_1$'s signal. The same holds for $S_2$'s signal. For notational convenience, we drop the SR/SD label in (11)-(18) and write $h_s$ instead of $h_s^{SR/SD}$. Accordingly, the IOPs of $S_1$ and $S_2$ can be expressed following [15]; these expressions involve the probability $I_1$. Here, $A_1$ and $A_2$ are the SINR thresholds to correctly decode $S_s$'s signal at the destination and $R$, respectively. Taking all possible cases into account, we can rewrite the probabilities $I_2$ and $I_4$ in terms of $I_{40}$, $I_{41}$, $I_{42}$, $I_{20}$, $I_{21}$, and $I_{22}$.

B. THE IOPS AT THE DESTINATION WHEN USING OMA
The IOP is a reliability metric evaluating a single wireless link. In this subsection, we derive the IOP for each source node link to the destination using OMA and for the relayer link to the destination, also using OMA. The IOP of the relayer link to the destination is derived first. When $R$ is active, the outage probability $p^{RD}$ that $z_R$'s signal is not decoded correctly at the destination is expressed with $b = A_0$ and $c = A_0(\sigma_{RD}^2 \rho_{RD} + 1)$, where $A_0$ is the SINR threshold to successfully decode the packet transmitted by the relayer at the destination.
$p^{RD}$ is derived in Lemma 2 in Appendix-B. When the APAOP is deployed, the IOP for each source node link to the destination is also determined in closed form.
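As a sanity check on the structure of such an OMA outage expression, the special case of Rayleigh fading ($m = 1$) on both the desired link and the jamming link admits a short closed form. The sketch below is not the general Nakagami-$m$ result of Lemma 2; it assumes an SINR of the form $\rho X / (\rho_J Y + \sigma^2 \rho + 1)$ with independent unit-mean exponential gains $X$ and $Y$, which gives $P(\text{outage}) = 1 - e^{-c/\rho} / (1 + b\,\rho_J/\rho)$ with $b = A_0$ and $c = A_0(\sigma^2 \rho + 1)$. The function and parameter names are hypothetical.

```python
import math

def oma_outage_rayleigh(snr, jnr, threshold, err_var=0.0):
    """Outage probability of one OMA link under Rayleigh fading (m = 1).

    Assumed model: SINR = snr * X / (jnr * Y + err_var * snr + 1), where
    X and Y are independent unit-mean exponential channel gains.
    snr       : average received signal-to-noise ratio of the desired link
    jnr       : average received jamming-to-noise ratio
    threshold : SINR threshold A0 required for correct decoding
    err_var   : channel estimation error variance (0 means perfect CSI)
    """
    b = threshold
    c = threshold * (err_var * snr + 1.0)
    # P(snr*X >= b*jnr*Y + c) = exp(-c/snr) * E[exp(-(b*jnr/snr) * Y)]
    success = math.exp(-c / snr) / (1.0 + b * jnr / snr)
    return 1.0 - success
```

With `jnr = 0` and `err_var = 0` the expression reduces to the familiar Rayleigh outage $1 - e^{-A_0/\rho}$, and increasing either the jamming power or the estimation error variance can only raise the outage.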

C. THE IOP FOR ONE PACKET OF EACH SOURCE NODE FOR THE CNRP
The packet transmitted by each source node travels to the destination via two links: the source node link to the relayer and the relayer link to the destination. Therefore, we consider the IOPP of each source node taking multiple links into account. With the CNRP, the source nodes' packets cannot be delivered correctly to the destination when the destination and/or the relayer have failed to decode the transmitted packets in the second and/or first phases, respectively. In other words, the IOPP of each source node can be derived based on the IOPs for single wireless links in Sections V-A and V-B. Accordingly, the IOPP for each source node is given as follows:
$$p_s^{\mathrm{CNRP}} = 1 - \left(1 - p_s^{SR}\right)\left(1 - p^{RD}\right).$$
Here, $(1 - p_s^{SR})(1 - p^{RD})$ represents the probability that the packet of source node $s$ is decoded successfully at the destination after two phases.
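The two-hop composition described above can be written as a one-line function. The sketch below assumes, as the text does, that the outage events on the source-to-relayer link and on the relayer-to-destination link are independent; the function name is illustrative.

```python
def iopp_cnrp(p_sr, p_rd):
    """IOPP of one source node under the CNRP.

    p_sr : IOP of the source-to-relayer link (first phase)
    p_rd : IOP of the relayer-to-destination link (second phase)
    The packet is delivered only if both hops succeed, so the outage
    is the complement of the product of the two success probabilities.
    """
    return 1.0 - (1.0 - p_sr) * (1.0 - p_rd)

# Example with illustrative link outages of 10% and 20%.
example_cnrp = iopp_cnrp(0.1, 0.2)
```

Because both hops are in series, the IOPP is never smaller than the worse of the two link outages.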

D. THE IOP FOR ONE PACKET OF EACH SOURCE NODE FOR THE FNRP
Compared to the CNRP, the IOPP for each source node for the FNRP considers two paths comprising three links: the source node link to the destination, the source node link to the relayer, and the relayer link to the destination. Based on the principle of the FNRP, a source node's packet cannot reach the destination when both of the following hold: (i) the destination fails to decode the source node's packet in the first phase, and (ii) the destination receives no correct packet via the path $S_s \rightarrow R \rightarrow D$. We can see that the IOPP for each source node is a joint probability of multiple IOPs for different single wireless links in both phases. Moreover, the IOPs of each source node for single wireless links in different phases are independent of each other. To this end, the IOPP for each source node can be expressed as
$$p_s^{\mathrm{FNRP}} = p_s^{SD}\left[1 - \left(1 - p_s^{SR}\right)\left(1 - p^{RD}\right)\right],$$
in which $p_s^{SD}$, $p_s^{SR}$, and $p^{RD}$ are the IOPs for single wireless links obtained in Sections V-A and V-B.
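The FNRP composition adds the direct link in parallel with the relayed path. A minimal sketch, following the independence assumption stated above and using an illustrative function name:

```python
def iopp_fnrp(p_sd, p_sr, p_rd):
    """IOPP of one source node under the FNRP.

    p_sd : IOP of the direct source-to-destination link (first phase)
    p_sr : IOP of the source-to-relayer link (first phase)
    p_rd : IOP of the relayer-to-destination link (second phase)
    The packet is lost only if the direct link fails AND the relayed
    path S -> R -> D fails; the links are treated as independent.
    """
    relayed_path_outage = 1.0 - (1.0 - p_sr) * (1.0 - p_rd)
    return p_sd * relayed_path_outage

# Example: a direct link with 50% outage halves the relayed-path outage of 0.28.
example_fnrp = iopp_fnrp(0.5, 0.1, 0.2)
```

Setting `p_sd = 1.0` (no usable direct link) recovers the relayed-path outage alone, which matches the CNRP case in which the direct links are ignored.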

E. THE IOP FOR ONE PACKET OF EACH SOURCE NODE FOR THE RNP
Both source nodes transmit their packets in uplink pairwise NOMA twice, with different power allocation factors. The IOPs for each source node link to the destination in different phases are independent of each other. Therefore, the IOPP for each source node with one additional retransmission can be calculated as follows:
$$p_s^{\mathrm{RNP}} = p_{s.1}^{SD}\, p_{s.2}^{SD},$$
where $p_{s.i}^{SD}$ is the IOP for $S_s$'s link to the destination at phase $i$ with power allocation factor $\mu_{1.i}$. This probability is calculated as in (22) and (23).
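Under the stated independence between phases, the retransmission case reduces to a product of the per-phase outages. A minimal sketch with an illustrative function name:

```python
def iopp_rnp(p_sd_phase1, p_sd_phase2):
    """IOPP of one source node under the RNP.

    p_sd_phase1 : IOP of the source-to-destination link in the first phase
    p_sd_phase2 : IOP of the same link in the second phase (retransmission),
                  possibly obtained with a different power allocation factor
    The packet is lost only if both independent attempts fail.
    """
    return p_sd_phase1 * p_sd_phase2

# Example with illustrative per-phase outages of 30% and 20%.
example_rnp = iopp_rnp(0.3, 0.2)
```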

F. DEFINITION OF OOPP
In NOMA-based systems, user fairness is also a strict requirement [6], [47], [48]. That is, all users belonging to the same priority class should experience the same quality of service (QoS), e.g., the same communication reliability level. To this end, we define a new metric, namely the OOPP, based on the attained IOPP for each source node. Accordingly, the OOPPs for the CNRP, FNRP, RNP, and APAOP are each given as the maximum IOPP among the source nodes:
$$p^{X} = \max\left\{p_1^{X},\, p_2^{X}\right\}, \quad X \in \{\mathrm{CNRP}, \mathrm{FNRP}, \mathrm{RNP}, \mathrm{APAOP}\}.$$
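Since the OOPP is the worst IOPP among the source nodes, minimizing it pushes the two source nodes toward equal reliability. The following sketch illustrates this with a small grid search over a power allocation factor. The two IOPP curves are toy placeholders chosen only for illustration (one decreasing and one increasing in the factor); they are not the paper's closed-form expressions.

```python
def oopp(iopp_values):
    """Overall outage probability for all packets: the worst IOPP."""
    return max(iopp_values)

def best_allocation(iopp_s1, iopp_s2, grid):
    """Return the grid point that minimizes the OOPP.

    iopp_s1, iopp_s2 : callables mapping a power allocation factor to an IOPP
    grid             : iterable of candidate power allocation factors in (0, 1)
    """
    return min(grid, key=lambda mu: oopp([iopp_s1(mu), iopp_s2(mu)]))

# Toy IOPP curves (assumed shapes, for illustration only).
toy_s1 = lambda mu: 1.0 - mu      # S1 benefits from a larger share
toy_s2 = lambda mu: mu            # S2 benefits from a smaller share
grid = [i / 100 for i in range(1, 100)]
mu_star = best_allocation(toy_s1, toy_s2, grid)
```

For these toy curves the minimizer sits where the two IOPPs cross, which is the fairness behavior the max-based definition is meant to encourage.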

VI. DEALING WITH DYNAMIC CONDITIONS USING DEEP REINFORCEMENT LEARNING
The legitimate communication system aims to improve the communication reliability by minimizing the OOPP. Accordingly, we formulate three problems as shown in (50), (51), and (52). For both the CNRP and FNRP, the constraints (50b), (50c), and (50d) are related to the position of the relayers inside the border. The constraints on the power allocation factors are presented in (50e) and (50f). However, the constraints for the RNP concern the power allocation factors in different phases in (51b) and (51c). For the APAOP, only the power allocation factor is constrained, in (52b).
(P1): $\min\; p^{\mathrm{CNRP/FNRP}}$ subject to (50b)-(50f); (P2): $\min\; p^{\mathrm{RNP}}$ subject to (51b) and (51c). In practice, the positions of the jammers and their transmit power levels can change randomly to defeat the legitimate system. Therefore, the legitimate system should take this into account. In this case, RL is considered a suitable approach to deal with the dynamic conditions mentioned above [49].
The RL architecture includes an agent and an environment (Fig. 2). The main goal of RL is to train an agent to complete a task within an uncertain environment. The RL agent consists of two parts: (i) The policy on how to choose actions based on the states (observations) from the environment. The policy is typically a function approximator with tunable parameters using deep neural networks. (ii) The RL algorithm, which updates the policy parameters continuously based on actions, states, and reward to find an optimal policy maximizing the cumulative reward received during the task. In this work, a deep deterministic policy gradient (DDPG) agent is adopted to search for an optimal policy maximizing the expected cumulative long-term reward because it supports continuous actions, e.g., power allocation factor μ 1 , μ 1 . A DDPG agent is an actor-critic RL agent, including four function approximators: (a) Actor: The actor takes states as input and returns the corresponding action to maximize the long-term reward. (b) Target actor: This function approximator helps to enhance the stability of the optimization by updating the target actor parameters periodically based on the latest actor parameter values. (c) Critic: The critic takes state and action as input and returns the corresponding expectation of the long-term reward. (d) Target critic: The target critic parameters are updated periodically based on the latest critic parameter values to enhance the stability of the optimization. In the training phase, the DDPG agent updates both actor and critic properties at each time step and stores past experiences in a circular experience buffer. The agent updates both actor and critic by employing a mini-batch of experiences randomly sampled from the buffer. At each training step, a stochastic noise model perturbs the action selected by the policy. The observation space, the action space, and the reward are defined as follows:
• State space: To attack efficiently, the positions of all jammers and their transmit power levels in different slots can be changed. We also consider that the positions of both source nodes can be updated in different slots as well. Therefore, the state space is defined as including $(x_{J_1}, y_{J_1}, 0)$, $P_{J_1}$, $(x_{J_2}, y_{J_2}, 0)$, $P_{J_2}$, and $(x_{S_s}, y_{S_s}, 0)$.
• Action space: To improve the reliability performance of the legitimate communication system in (50), (51), and (52), the action space includes power allocation factor μ 1 , μ 1 , and $(x_R, y_R, h)$ for both CNRP and FNRP. However, the action space for the RNP only consists of power allocation factors in different phases, μ 1 , and μ 1 , while only $\mu_1$ is the action for the APAOP.
To reduce the complexity of the DDO scheme while enhancing the fairness condition between the two source nodes, we use the DDO-fixed pairwise power allocation (FPPA) scheme [10], [15]. Accordingly, we only have $\mu_1$ and $\mu_{1.i}$ as actions, and their ranges are $0 < \mu_1 < 1$ and $0 < \mu_{1.i} < 1$.
• Reward: The legitimate communication system maximizes the communication reliability in terms of minimizing the long-term OOPP. Consequently, the reward depends on which protocol is used, $r = -p^{\mathrm{CNRP/FNRP/RNP/APAOP}}$, as shown in (46), (47), (48), and (49).
Based on the state space, action space, and reward, the DDPG can be implemented. Each episode comprises multiple steps, where the DDPG algorithm follows a sequence: generating an action based on the current state, determining the next state based on the selected action, and then updating the four neural networks of the DDPG agent to facilitate learning.
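Two of the DDPG ingredients described above, the circular experience buffer with random mini-batch sampling and the noise-perturbed action, can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the class and function names are hypothetical, zero-mean Gaussian noise is used as a simple stand-in for whatever stochastic noise model the agent actually employs, and clipping to the open interval $(0, 1)$ is an assumed way of enforcing the range of the power allocation factor.

```python
import random
from collections import deque

class ReplayBuffer:
    """Circular experience buffer: the oldest entries are overwritten when full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a random mini-batch without replacement."""
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)

def perturb_action(mu1, noise_std, rng=random, eps=1e-3):
    """Add exploration noise to a power allocation factor and keep it in (0, 1).

    Gaussian noise and the clipping margin eps are assumptions of this sketch.
    """
    noisy = mu1 + rng.gauss(0.0, noise_std)
    return min(max(noisy, eps), 1.0 - eps)

def reward_from_oopp(oopp_value):
    """Reward used by the agent: the negative OOPP of the chosen protocol."""
    return -oopp_value
```

In a training loop, each step would store one `(state, action, reward, next_state)` tuple and then call `sample` with the mini-batch size to update the actor and critic.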

VII. NUMERICAL RESULTS
In this section, we present numerical results for the IOPP for each source node and the OOPP of the considered system for the four schemes. The following system parameters are used: [50], [51]. To ensure a fair comparison among all considered protocols, we set the total transmit power of the two source nodes for the APAOP to 2 W. To check the correctness of the analysis in Section V, we also conduct computer simulations using MATLAB. In particular, for each considered IOPP, we first generate $10^7$ samples of the channel gains following a Gamma distribution and then check the outage conditions as defined in (22), (23), (40), (41), (42), (43), (44), and (45). The simulation results of the IOPP for each source node are then obtained by averaging all outage events across the $10^7$ samples.
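The simulation procedure described above (draw Gamma-distributed channel gains, test the outage condition, average the outage events) can be sketched in Python for a single jammed link. This is a simplified illustration rather than the paper's MATLAB setup: it assumes perfect CSI, a single OMA link with SINR $\rho X / (\rho_J Y + 1)$, and far fewer samples than $10^7$; the function and parameter names are hypothetical.

```python
import random

def mc_link_outage(m_sig, m_jam, snr, jnr, threshold, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the outage probability of one jammed link.

    m_sig, m_jam : Nakagami-m shape parameters of the desired and jamming links
    snr, jnr     : average received signal-to-noise and jamming-to-noise ratios
    threshold    : SINR threshold for correct decoding
    Channel power gains are Gamma distributed with the given shape and unit
    mean, i.e. shape m and scale 1/m.
    """
    rng = random.Random(seed)
    outages = 0
    for _ in range(n_samples):
        gain_sig = rng.gammavariate(m_sig, 1.0 / m_sig)
        gain_jam = rng.gammavariate(m_jam, 1.0 / m_jam)
        sinr = snr * gain_sig / (jnr * gain_jam + 1.0)
        if sinr < threshold:
            outages += 1
    return outages / n_samples
```

For $m = 1$ on both links the estimate can be compared against the Rayleigh closed form $1 - e^{-A_0/\rho}/(1 + A_0\rho_J/\rho)$, which is a convenient way to validate the simulator before moving to the multi-link IOPP conditions.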
To investigate the effect of power allocation on all schemes, we configure the positions and transmit power of legitimate nodes and jammers as follows: 50,20) , $(x_{S_1}, y_{S_1}, 0) = (0, 0, 0)$, $(x_{S_2}, y_{S_2}, 0) = (25, 100, 0)$, $(x_D, y_D, 0) = (200, 50, 0)$, $P_{J_1} = 1$ W, $P_{J_2} = 1$ W, $P_R = 1$ W, $(x_{J_1}, y_{J_1}, 0) = (100, -20, 0)$, and $(x_{J_2}, y_{J_2}, 0) = (200, -20, 0)$. For the RNP, the power allocation factors for both phases are the same. Fig. 3 illustrates the effect of the power allocation factor on the IOPP for both source nodes using the four protocols mentioned in Section III. We can see that the analytical results and the simulation results match very well, validating the accuracy of the calculation. Moreover, the power allocation factor significantly affects the IOPP. It can be seen from the figure that the IOPP for both source nodes adopting the FNRP is much smaller than that obtained using the CNRP. This is because the channels between both source nodes and the destination are good but are ignored by the CNRP. We can also see that the APAOP does not treat the two source nodes fairly, e.g., while the IOPP for $S_1$'s packet decreases dramatically, the IOPP for $S_2$'s packet increases significantly.
Fig. 4 shows how power allocation and relayer position affect the OOPP of the CNRP when the parameters of the two source nodes and both jammers are the same as in Fig. 3. It is clear that both power allocation and UAV position play a crucial role in enhancing the communication reliability in terms of minimizing the OOPP for each source node. The effects of power allocation and relayer position on the OOPP for the FNRP are shown in Fig. 5.
Regarding the RL, the actor network uses three hidden layers with 100 neurons each, followed by rectified linear unit (ReLU) activation functions. The activation function of the output layer is the hyperbolic tangent. For the critic network, both states and actions are inputs. First, the states are fed to a neural network with two hidden layers of 100 neurons each, and the action is fed to another neural network with one hidden layer of 100 neurons. Then, the outputs of these two networks are concatenated and fed to another neural network with one hidden layer of 100 neurons. ReLU activation functions are employed in the critic network. Other configurations for the neural networks, namely the learning rate, gradient threshold, regularization factor, sample time, experience buffer length, minibatch size, and number of steps per episode, are set to 1e-3, 1, 1e-4, 1, 1e6, 128, and 1e3, respectively. Note that the number of hidden layers, the number of neurons, and the other hyperparameters of the deep RL architecture were selected via trial and error to find the best-performing configuration for the considered problem. For the state space of the RL, we set $0 \le x_{S_s} \le 20$, $0 \le y_{S_s} \le 100$, $P_{J_s} \in \{0.1 : 0.1 : 3\}$ W, $0 \le x_{J_s} \le 200$, and $-100 \le y_{J_s} \le -20$, as shown in Fig. 6. The action space related to the relay node position is configured as $h = 20$, $30 \le x_R \le 190$, and $0 \le y_R \le 200$. Both the training and inference phases of the RL are implemented in MATLAB on an HP Z2 TWR Base G9 desktop (Core i9-12900K 3.20G 30MB 16 cores, 64GB DDR5, NVIDIA GeForce RTX 3080).

Fig. 7 presents the average episode reward in terms of the OOPP versus the number of episodes for the four schemes. The average episode reward, i.e., the average OOPP, converges after 23 episodes for all schemes. The FNRP offers the highest communication reliability, followed by the RNP, while the APAOP and CNRP schemes are the least reliable and perform approximately equally. This ordering is also visible in Fig. 3. In the inference phase, we set the transmit powers of both jammers to the same value and investigate the average OOPP over 1000 steps. Fig. 8 shows how the average OOPP changes with the transmit power of both jammers. In general, the average OOPP grows significantly as the transmit power of both jammers increases. We can also observe the same ordering of the communication reliability offered by each protocol as in Fig. 7.
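The actor network described above (three hidden layers of 100 ReLU neurons and a hyperbolic tangent output layer) can be illustrated with the following pure-Python forward pass. It is a sketch under stated assumptions rather than the MATLAB implementation: the weights are random and untrained, the state is assumed to be the 10-element vector of source positions and jammer powers and positions, the action is assumed to be only $(x_R, y_R)$, and the linear mapping from the tanh output to the action bounds is an illustrative choice.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random (untrained) dense layer: weight matrix and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def build_actor(state_dim, action_dim, hidden=100, n_hidden=3, seed=0):
    """Layer sizes: state_dim -> 100 -> 100 -> 100 -> action_dim."""
    rng = random.Random(seed)
    sizes = [state_dim] + [hidden] * n_hidden + [action_dim]
    return [make_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def actor_forward(layers, state):
    x = state
    for layer in layers[:-1]:
        x = [max(0.0, v) for v in dense(x, layer)]      # ReLU hidden layers
    return [math.tanh(v) for v in dense(x, layers[-1])]  # tanh output in [-1, 1]


def scale_action(action, low, high):
    """Assumed linear map from [-1, 1] to the action bounds in the text."""
    return [lo + (a + 1.0) * 0.5 * (hi - lo) for a, lo, hi in zip(action, low, high)]


# Hypothetical state: (x, y) of S1 and S2, then (P, x, y) of J1 and J2.
state = [10.0, 50.0, 15.0, 80.0, 1.0, 100.0, -20.0, 1.0, 200.0, -20.0]
actor = build_actor(state_dim=len(state), action_dim=2)
x_r, y_r = scale_action(actor_forward(actor, state), low=[30.0, 0.0], high=[190.0, 200.0])
print(f"proposed relay position: x_R = {x_r:.1f}, y_R = {y_r:.1f}, h = 20")
```

Because the output layer is bounded by tanh, any state maps to a relay position inside the configured action space, which is one common reason for choosing this output activation for bounded continuous actions.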

A. CREATING GUIDELINES FOR A SPECIFIC SCENARIO
Based on the obtained results, both the FNRP and the RNP offer higher reliability, i.e., a smaller OOPP, than the CNRP and the APAOP. Accordingly, the following guidelines can be provided:
• When a mobile AP, e.g., a UAV, is available to act as a relayer, the FNRP should be employed to enhance the communication reliability of the legitimate wireless communication system. When the reliability requirement in terms of the OOPP is stringent, a mobile relayer may even be a must.
• When there is no mobile AP acting as a relayer, the RNP can be adopted to ensure that the communication reliability of the legitimate communication system remains good, as shown in Figs. 7 and 8.

VIII. CONCLUSION
In this paper, we investigate an uplink NOMA scenario with and without the support of a mobile AP acting as a relayer in the presence of jamming attacks. In particular, we investigate two relaying protocols, the CNRP and the FNRP, a retransmission scheme using uplink pairwise NOMA, the RNP, and an OMA scheme, the APAOP. First, we derive the IOP for each source node link to the relayer and to the destination, as well as for the relayer link to the destination, and the IOPP for each source node considering multiple links in Nakagami-$m$ fading channels over two phases. Subsequently, we define the OOPP, the maximum of the IOPPs obtained for the two source nodes, as a suitable metric. To address the uncertain environment associated with jamming attacks, we propose a method using RL that adapts to dynamic parameters, including information related to the jammers and the positions of the source nodes. The results indicate that both power allocation and the relayer position play an important role in improving the communication reliability of the relaying protocols. Furthermore, the FNRP and RNP schemes offer superior reliability compared to the CNRP and APAOP. Finally, we provide a few guidelines for enhancing the communication reliability of the legitimate communication system. The derived closed-form expressions of the IOP, the IOPP, and the OOPP are useful for analyzing and designing the considered network, also in the presence of jamming. Moreover, it can be concluded that outage analysis is an important tool of high practical relevance, and that it can be used by deep RL for online prediction of how to minimize outage probabilities.

A. APPENDIX I
Lemma 1: Given that $h_s \sim \mathcal{G}(m_s, \rho_s/m_s)$ and [...], the closed-form expressions of $I_{1a}$, $I_{1b}$, $I_{3a}$, $I_{3b}$, $I_{40}$, $I_{41}$, $I_{42}$, $I_{20}$, $I_{21}$, and $I_{22}$ can be obtained as follows: [...] (57), in which $I_{41a}$ and $I_{41b}$ are given as [...], in which $I_{12a}$, $I_{12b}$, and $Q_1$ are given as [...], in which $I_{21a}$ and $I_{21b}$ are given as [...] (70) [...].
Proof: Applying [10, Th. 1], this lemma is proven.
[...] and $h_{RD} \sim \mathcal{G}(m_{RD}, \rho_{RD}/m_{RD})$, where $m_{J_2}$ and $m_{RD}$ are positive integers, the closed-form expression of the probability $p_{RD}$ can be derived as follows: [...] where $B_{16} = m_{J_2} \rho_{J_2}^{-1} + m_{RD} \rho_{RD}^{-1} b$.
Proof: Applying [10, Th. 1], this lemma is proven.

x
are uncorrelated. Moreover, all channels follow Nakagami-$m$ fading; therefore, channel gains [...] $I_{3a}$, $I_{3b}$, $I_{40}$, $I_{41}$, $I_{42}$, $I_{20}$, $I_{21}$, and $I_{22}$ are derived in Lemma 1 in Appendix A.

s
and p RD s are the IOPs of the s-th source node link to the relayer and the IOP of the relayer link to the destination, respectively. And (1−p SR s )(1−p RD s

FIGURE 3. The IOPP for each source node taking both phases into account.

FIGURE 4. The OOPP versus UAV position and power allocation for the CNRP ($R$ is closer to $D$ with bigger $x_R$; $R$ is further away from the jammers with bigger $y_R$).

FIGURE 5. The OOPP versus UAV position and power allocation for the FNRP ($R$ is closer to $D$ with bigger $x_R$; $R$ is further away from the jammers with bigger $y_R$).

FIGURE 6. Possible positions of the source nodes, relayer, and jammers for the deep RL.

FIGURE 7. Training progress for the four protocols.

FIGURE 8. Effect of the transmit power levels of both jammers on the average OOPP in the inference phase when $P_{J_1} = P_{J_2}$.

J 1 m J 1 e −m 2 ρ − 1 2 b 3 −B 7 b 7 m J 1 ( 6 m 2 Bm J 1 +k+q 9 ,
J 1 + k + q, B 9 x
RD and ζ RD , d J 1 and ζ J 1 , d J 1 R and ζ J 1 R , and d J 2 and ζ J 2

TABLE 1. The principle of CNRP and FNRP.
s signal cannot be decoded correctly by considering both signals from S 1 and J 1 as interference when h