Q-Learning Aided Resource Allocation and Environment Recognition in LoRaWAN With CSMA/CA

The mutual interference among wireless nodes is a critical factor in the Internet-of-Things (IoT) era due to the dense deployment of such nodes. In a long range wide area network (LoRaWAN), which is one of the low power wide area (LPWA) standards, the large coverage area means that wireless nodes may not be able to detect the on-going communication of other nodes. This results in packet collision. The packet collision among LoRaWAN nodes significantly deteriorates network performance metrics such as packet delivery rate (PDR). Furthermore, if packet collision happens, LoRaWAN nodes must retransmit packets, draining their limited battery power. Thus, mutual interference management among LoRaWAN nodes is important from the perspectives of both network performance and network lifetime. However, due to the large network size, it is difficult to explicitly comprehend the wireless channel environment around each LoRaWAN node, such as its relation to other LoRaWAN nodes. Thus, in this paper, we utilize a machine learning technique. The wireless environment around LoRaWAN nodes is learned, and the knowledge is utilized for resource allocation in order to improve PDR performance. In the proposed method, Q-learning is adopted in a LoRaWAN system, and the weighted sum of the number of successfully received packets is treated as a Q-reward. The gateway (GW) allocates resources to maximize this Q-reward. The numerical results considering LoRaWAN show that the proposed scheme can improve average PDR performance by about 20% compared to the random resource allocation scheme.


I. INTRODUCTION
The associate editor coordinating the review of this manuscript and approving it for publication was Kun Yang.

To meet the demand for high speed communication, wireless access technologies have been evolving. Similarly, low power consumption communication is becoming more important despite the reduction of communication speed due to the emergence of the Internet-of-Things (IoT) [1]. Long range wide area network (LoRaWAN) is one of the promising network structures for low power wide area (LPWA) networks, which provide low speed, long range communication for distances of up to 10 km. The chirp-spread spectrum (CSS) technique is adopted for the physical layer of LoRaWAN. For the medium access control (MAC) layer, each LoRaWAN node adopts pure ALOHA. Due to this simple MAC protocol, increased packet collision due to the large number of LoRaWAN nodes is a critical factor in the limitation of the network performance. One of the countermeasures is the introduction of a duty cycle, which limits the transmission interval of each node to a predetermined threshold [2]. Recently, the application of carrier sense multiple access with collision avoidance (CSMA/CA) was proposed to improve the performance of LoRaWAN [3]. For example, CSMA/CA is essential for LoRaWAN in Japan [4]. In this protocol, LoRaWAN nodes sense the wireless medium before starting packet transmission. However, due to LoRaWAN's wide communication area and the low transmission power of its nodes, packet collision happens quite often in comparison to legacy wireless LAN systems. Because the LoRaWAN node has limited functionality due to its low cost, the introduction of more complicated interference management technologies into LoRaWAN nodes is not appropriate. One potential solution is to allocate orthogonal frequency channels to LoRaWAN nodes that often collide with each other. In LoRaWAN, there are up to 16 orthogonal frequency channels [5], and each LoRaWAN node randomly chooses one of the multiple available channels from information provided by its gateway (GW). However, it is difficult to decide which channel should be assigned to each LoRaWAN node due to the large scale of the network and the limited functionality of LoRaWAN nodes. Moreover, because of their limited functionality, LoRaWAN nodes cannot inform the GW of the surrounding wireless environment, such as how often each LoRaWAN node can carrier sense (CS) the on-going communication. Thus, a resource allocation scheme that does not require such feedback from nodes is demanded. Conventional methods such as spreading factor (SF) allocation schemes are proposed for LoRaWAN in [7]-[9]. In [7], SF and coding rate are jointly assigned to ensure a high transmit success rate. Moreover, scalability [8] and coding rate fairness [9] are also considered. However, these works consider only ALOHA multiple access; no existing work considers orthogonal frequency channel allocation in LoRaWAN with CSMA/CA.
In this paper, we propose the utilization of a machine learning technique for efficient orthogonal channel assignment in LoRaWAN with CSMA/CA. Because it is difficult to obtain the training set in a practical system, we focus on reinforcement learning, which does not require a training set but can learn the environment by observing the output from the environment after its action. To the best of our knowledge, this is the first work that tackles orthogonal resource allocation in LoRaWAN where additional information exchange is not allowed. The number of successfully received packets at a GW is used as the reward of learning so that no explicit feedback from a LoRaWAN node is needed for resource allocation. Because there is a strong correlation between the number of received packets and packet delivery rate (PDR), this resource allocation can improve PDR performance. The proposed scheme is shown to improve the average PDR performance by 20% compared to the random allocation scheme through a computer simulation that takes the LoRaWAN specification into consideration.
The rest of this paper is organized as follows. In Sect. II, we introduce LoRaWAN and its system. In Sect. III, we summarize the system model considered in this paper. In Sect. IV, we briefly review the existing learning method. In Sect. V, we propose Q-learning based wireless resource allocation. In Sect. VI, computer simulation results are provided. Sect. VII concludes the paper.

II. LORAWAN
A. LORAWAN FUNCTIONS
1) PHYSICAL LAYER
LoRaWAN is one of the LPWA standards, and adopts CSS modulation and frequency shift keying (FSK) as a physical layer technology. Its data rate and communication range are determined by SF. SF indicates the receive threshold. With a higher SF, the receiver can receive packets with lower received signal power, but the data rate from the transmitter is also reduced. Let the frequency bandwidth be $W$ [Hz]. Then, the chip length $T_c$ [sec] of the CSS symbol is given by
$$T_c = \frac{2^{SF}}{W}. \tag{1}$$
As SF increases, the transmitted signal has stronger resistance against noise and interference at the expense of the data rate. The typical data rate and signal-to-noise power ratio (SNR) limit are shown in Table 1. The CSS modulated signal is transmitted over one of the orthogonal frequency channels. For LoRaWAN, there are up to $K$ orthogonal frequency channels, a number which depends on region and frequency [14]. Each GW informs the LoRaWAN nodes of the available channel indices [5].
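As a quick numerical illustration of how SF trades data rate for robustness, the following sketch evaluates the commonly quoted LoRa relations: a symbol duration of 2^SF / W and a nominal bit rate of SF · (W / 2^SF) · CR. The 125 kHz bandwidth and the 4/5 coding rate are assumptions chosen for illustration and are not taken from Table 1; the function names are hypothetical.

```python
def symbol_duration(sf: int, bandwidth_hz: float) -> float:
    """Duration [s] of one CSS symbol: 2^SF chips sent at a chip rate of W."""
    return (2 ** sf) / bandwidth_hz


def nominal_bit_rate(sf: int, bandwidth_hz: float, coding_rate: float = 4 / 5) -> float:
    """Nominal bit rate [bit/s]: SF bits per symbol, scaled by the coding rate.

    The 4/5 default is an assumed coding rate, used only for illustration.
    """
    return sf * bandwidth_hz / (2 ** sf) * coding_rate


if __name__ == "__main__":
    bandwidth = 125e3  # assumed bandwidth W [Hz]
    for sf in range(7, 13):
        print(sf, symbol_duration(sf, bandwidth), nominal_bit_rate(sf, bandwidth))
```

Each increment of SF doubles the symbol duration, which is why the nominal bit rate drops roughly by half per SF step.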

2) MAC LAYER AND MULTIPLE ACCESS SCHEME
A simple ALOHA protocol is adopted as the MAC layer in LoRaWAN, as its simple operation is suitable for low cost LoRaWAN nodes. Three classes are defined for LoRaWAN nodes, i.e., class A, B, and C [5]. Class A is mandatory for all LoRaWAN nodes. Class A nodes receive the downlink transmission together with an ACK message via two receive windows which are open after the uplink transmission of a node. Class B nodes periodically open a beacon receive window. Class C nodes always open a receive window. Thus, a GW can inform each node of necessary commands such as available frequency channels via downlink transmission.

III. SYSTEM MODEL
A. SYSTEM MODEL
Fig. 1 shows the LoRaWAN system considered in this paper. $N$ LoRaWAN nodes are randomly and uniformly distributed within a network area of $D \times D$ [km$^2$]. One GW that controls LoRaWAN nodes and receives information from them is located at the center of the area. In total, $K$ orthogonal frequency channels are available in this system. Let us denote the set of LoRaWAN nodes and that of the orthogonal frequency channels as $\mathcal{N}$ and $\mathcal{K}$, respectively.
Each LoRaWAN node generates packets of two different traffic types [6]. The first type is regularly generated following a predetermined packet generation interval $T_{\mathrm{interval},n}$, which is selected from the set $\mathcal{T}_{\mathrm{interval}}$. In this study, the LoRaWAN nodes that have the same packet generation interval are called a cluster. A random offset $T_{\mathrm{offset},n} \sim U[0, T_{\mathrm{interval},n}]$ is assigned to LoRaWAN node $n$. The packet generation interval indicates the application type in the communication area, such as a gas meter or a water supply meter. Note that even the LoRaWAN nodes with the same packet generation interval may not transmit their packets simultaneously owing to the different offsets. We assume that there are $U$ application types. Thus, on average, $N/U$ LoRaWAN nodes generate packets with the same interval and attempt to send them to the GW. The second type is generated once an event is detected. In this study, an event (e.g., a fire or an electricity accident) occurs at time $T_{\mathrm{event}}$ in each epoch at a random position, and it propagates in the communication area with a predetermined speed [6]. The exponential propagation model is considered in this study.
Each LoRaWAN node transmits packets in accordance with its duty cycle. After finishing the transmission of packet $i$, LoRaWAN node $n$ waits for $T_{\mathrm{wait},n,i}$ before transmitting packet $(i+1)$. This waiting time is determined by $N_{\mathrm{trans},n,i}$, the transmitted packet size of packet $i$ from node $n$, by the data rate $R_b$, and by the duty cycle $G$.
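To make the role of the duty cycle concrete, the sketch below computes a waiting time from the packet size, the data rate, and the duty cycle. It assumes the rule commonly used for duty-cycled LoRaWAN devices, in which the off time equals the time on air divided by the duty cycle, minus the time on air; the exact expression used in this paper may differ, and the function names are hypothetical.

```python
def time_on_air(packet_bits: float, data_rate_bps: float) -> float:
    """Transmission time [s] of one packet of N_trans bits at data rate R_b."""
    return packet_bits / data_rate_bps


def duty_cycle_wait(packet_bits: float, data_rate_bps: float, duty_cycle: float) -> float:
    """Waiting time [s] after a transmission under duty cycle G (0 < G <= 1).

    Assumed rule: the node stays silent long enough that the time on air is
    at most a fraction G of the total elapsed time.
    """
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty cycle must be in (0, 1]")
    toa = time_on_air(packet_bits, data_rate_bps)
    return toa / duty_cycle - toa


if __name__ == "__main__":
    # Hypothetical values: a 1000-bit packet at 1000 bit/s under a 1% duty cycle.
    print(duty_cycle_wait(1000, 1000, 0.01))
```

A smaller G lengthens the waiting time, which matches the observation in the problem formulation that a small duty cycle reduces the number of transmitted packets.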
If multiple LoRaWAN nodes transmit packets to the GW using the same frequency channel simultaneously, the GW receives multiple packets. If both the SNR and the signal-to-interference power ratio (SIR) are above their respective thresholds, the packet is considered to be successfully received. If the transmitted packet is lost, the LoRaWAN node retransmits the packet based on binary backoff [11]. The backoff length is drawn from a uniform distribution over $[0, CW]$, where $CW$ is given as
$$CW = 2^{N_r}\, CW_{\min}, \tag{3}$$
where $CW_{\min}$ is the minimum backoff length, and $N_r$ is the number of retransmissions.
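The reception rule and the retransmission backoff can be sketched as follows. This is a minimal illustration, assuming that the contention window doubles from its minimum value with every retransmission; the threshold arguments and all function names are hypothetical placeholders.

```python
import random


def is_received(snr_db: float, sir_db: float,
                snr_threshold_db: float, sir_threshold_db: float) -> bool:
    """A packet is treated as received only if both SNR and SIR exceed their thresholds."""
    return snr_db > snr_threshold_db and sir_db > sir_threshold_db


def contention_window(cw_min: float, n_retx: int) -> float:
    """Assumed binary backoff: the window doubles with each retransmission."""
    return cw_min * (2 ** n_retx)


def draw_backoff(cw_min: float, n_retx: int, rng: random.Random) -> float:
    """Backoff length drawn uniformly from [0, CW]."""
    return rng.uniform(0.0, contention_window(cw_min, n_retx))


if __name__ == "__main__":
    rng = random.Random(0)
    for retx in range(4):
        print(retx, contention_window(8, retx), draw_backoff(8, retx, rng))
```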

B. CHANNEL MODEL
The received signal power of LoRaWAN node $n$ at the GW is given by (4) in terms of $P_{t,n}$, the transmit power of LoRaWAN node $n$; $P_{\mathrm{pathloss}}(d_n)$, a path loss component; and $\psi$, a shadowing component that is a function of the location $(x_n, y_n)$ of the LoRaWAN node.
The path loss is given by (5) in terms of $d_n$, the distance between LoRaWAN node $n$ and the GW [km], and $f_c$, the carrier frequency [MHz]. Propagation parameters $a$, $b$, and $c$ are the coefficients for the distance, offset, and frequency loss components, respectively. For the propagation model between LoRaWAN nodes, we adopt the same model given by (4) and (5) with different parameters [13] because GWs are generally located above LoRaWAN nodes.

C. PROBLEM FORMULATION
Let us denote the PDR of LoRaWAN node $n$ by $P^{\mathrm{del}}_n$, which is given by
$$P^{\mathrm{del}}_n = \frac{R_n}{S_n}, \tag{6}$$
where $R_n$ denotes the number of successfully received packets from LoRaWAN node $n$, while $S_n$ denotes the number of packets generated during a predetermined time length $T_{\mathrm{epoch}}$.
Hereafter, this time length is defined as an epoch.
The optimal channel selection aims to choose a proper channel to maximize the expected PDR of each node, $E[P^{\mathrm{del}}_n]$, where $E[x]$ denotes the ensemble average operation. In this study, channel allocation is executed every epoch. In this model, $R_n$ depends on the channel allocation of other nodes due to their interference. This makes the problem a combinatorial optimization problem, i.e., it cannot be solved in practical time. Moreover, $S_n$ also depends on other system parameters. For example, a large $T_{\mathrm{wait}}$, i.e., a small duty cycle $G$, makes the number of transmitted packets small. Thus, the number of successfully received packets becomes small. However, interference also becomes smaller as network traffic is reduced. This phenomenon also happens in the case of a large $CW$.
At GW, it is not possible to know S n without explicit feedback from LoRaWAN node n.To solve this problem, we propose reinforcement learning-based optimization and approximation of the objective function using only the number of successfully received packets.

IV. MACHINE LEARNING TECHNOLOGY
Reinforcement learning is one of the learning schemes that search for the optimal action in a given situation. The agent is not given pairs of a specific situation and its optimal action. Instead, the agent is given the reward for a specific situation and a corresponding action. The agent executes the optimal action based on a reward function that is the sum of the reward of each action. However, this reward function depends on the environment, and it is difficult to solve this function theoretically. To tackle this, the agent approximates the reward function from a taken action and a given reward. This learning scheme is efficient for situations where actions affect subsequent situations, or for situations which provide results from a series of actions, e.g., a Markov chain.

1) Q-LEARNING MODEL
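Let $\mathcal{S}$ and $\mathcal{A}$ be a state set and an action set, respectively. Then, the reward function at time $t$ is approximated by the following update equation:
$$Q(S_t, A_t) \leftarrow (1-\alpha)\,Q(S_t, A_t) + \alpha\left[R_{t+1} + \gamma \max_{a \in \mathcal{A}} Q(S_{t+1}, a)\right], \tag{8}$$
where $Q(S_t, A_t)$ is the expected value of a reward when an agent takes action $A_t \in \mathcal{A}$ in state $S_t \in \mathcal{S}$, $R_{t+1}$ is an instant reward at time $t+1$ of action $A_t$, $\gamma \in [0, 1]$ is a discount rate, and $\alpha \in (0, 1]$ is a Q-learning rate. This approximation converges to the real reward function ($Q^*$) [16].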
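The update rule can be illustrated with a minimal tabular sketch. It implements the standard Q-learning update with a dictionary-based table; the state labels, the default values of the learning rate and the discount rate, and the function name are assumptions for illustration only.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    q          : mapping (state, action) -> Q value (missing entries count as 0)
    reward     : instant reward R_{t+1} observed after taking `action` in `state`
    actions    : iterable of all actions available in `next_state`
    alpha      : Q-learning rate in (0, 1]
    gamma      : discount rate in [0, 1]
    """
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] = (1 - alpha) * q[(state, action)] + alpha * (reward + gamma * best_next)
    return q[(state, action)]


if __name__ == "__main__":
    table = defaultdict(float)
    # Hypothetical two-action example with string state labels.
    print(q_update(table, "s0", 0, 1.0, "s1", [0, 1], alpha=0.5))
    print(q_update(table, "s0", 0, 1.0, "s1", [0, 1], alpha=0.5))
```

Keeping one table entry per state-action pair is feasible only for small problems, which motivates the NN approximation discussed next.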
2) Q-LEARNING USING NEURAL NETWORK
In traditional Q-learning, the agent needs to keep the Q values for each pair of state and action by using eq. (8); therefore, the agent keeps it as a

B. NEURAL NETWORK (NN)
NN is one of the machine-learning schemes that approximate the relationship between input and output information using neurons [18]. This learning scheme contains two steps: forward propagation and back propagation, as shown in Fig. 3. In this research, the GW has $N$ NNs, one for each of the $N$ LoRaWAN nodes. In this section, without loss of generality, we review the NN function of node $n$.

1) FORWARD PROPAGATION
NN is composed of neurons and couplings. Each neuron is arranged hierarchically, and has two functions: reception and activation, as shown in Fig. 4. First, a neuron obtains the weighted sum of output from the previous layer. Then, the neuron transforms it by an activation function that is generally nonlinear. Neuron $j$ of layer $l$ receives the weighted sum of the outputs of the neurons in layer $l-1$, i.e., $\sum_i w_{n,i,j}\,\phi(z_{n,i})$, where $w_{n,i,j}$ is the coupling weight from neuron $i$ of layer $l-1$ to neuron $j$ of layer $l$, $z_{n,i}$ is the output of neuron $i$ in layer $l-1$, and $\phi(x)$ is a kernel function. For the kernel function, an ideal function is applied. Then, neuron $j$ in hidden layer $l$ calculates output $z^{(l)}_{n,j}$ by applying an activation function $f(\cdot)$ to this weighted sum. Generally, the rectified linear unit (ReLU) function $f_{\mathrm{ReLU}}$ is used for the activation function, which is given as
$$f_{\mathrm{ReLU}}(x) = \max(0, x).$$
2) BACK PROPAGATION
The NN weights, $w_n = \{w^{(l)}_{n,i,j}\}$, are trained by using the error function between the output from the NN and the desired output. Let the error function for given NN weights $w_n$ be $E(w_n)$. Then, the optimal NN weights, $w_{n,\mathrm{opt}}$, satisfy
$$\nabla E(w_{n,\mathrm{opt}}) = 0,$$
where $\nabla$ is a gradient operator. However, since it is hard to derive the optimal weights analytically, it is common to derive them using a numerical scheme. The NN parameters at learning time $\tau + 1$ are obtained from those at learning time $\tau$ by applying an update term computed at $\tau$, as in (14), and the initial weights $w^0_n$ are calculated using Xavier initialization [21].
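Returning to the forward propagation step, the following sketch propagates an input vector through a small fully connected network with ReLU hidden layers. Several simplifications are assumptions made for illustration: the kernel function is taken to be the identity, bias terms are omitted, and the output layer is left linear. The weight layout `layers[l][i][j]` and the function names are hypothetical.

```python
def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def forward(layers, inputs):
    """Forward propagation through a fully connected network.

    layers : list of weight matrices; layers[l][i][j] couples neuron i of one
             layer to neuron j of the next layer
    inputs : list of input values fed to the first layer
    Hidden layers use ReLU; the last layer is kept linear (an assumption).
    """
    z = list(inputs)
    for index, weights in enumerate(layers):
        n_out = len(weights[0])
        # Reception: weighted sum of the previous layer's outputs.
        pre = [sum(weights[i][j] * z[i] for i in range(len(z))) for j in range(n_out)]
        # Activation: ReLU on hidden layers only.
        is_last = index == len(layers) - 1
        z = pre if is_last else [relu(u) for u in pre]
    return z


if __name__ == "__main__":
    hidden = [[1.0, -1.0], [0.5, 2.0]]  # 2 inputs -> 2 hidden neurons
    output = [[1.0], [1.0]]             # 2 hidden neurons -> 1 output
    print(forward([hidden, output], [1.0, 0.0]))
```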
There are many methods that calculate the update term, such as stochastic gradient descent (SGD) and adaptive moment estimation (Adam). The gradient of the error function with respect to each coupling weight is required to calculate the update term. Back propagation (BP) is an efficient method to calculate the gradient. Let us focus on the update of a single weight. Its gradient is expressed through $\delta^{(l)}_{n,i}$, called the error gradient, which BP evaluates layer by layer starting from the output layer, where $L$ is the number of layers of NN. In this paper, a squared error between the training data $o_{n,k}$ and the NN output that approximates the training data for output $k$ is adopted as the error function $E(w)$. In this study, we want to approximate the Q-function, i.e., $o_{n,k} = Q_{n,k_n}$, where $Q_{n,k_n}$ is the actual Q-reward when resource $k_n$ is allocated to LoRaWAN node $n$.

3) OPTIMIZER
NN parameter $w$ is updated as shown in (14) using the gradient value as described above. For calculating the update term, there are several schemes such as SGD and Adam. In SGD, the gradient value is directly used to calculate the update term.
Although SGD can escape from a local optimal point, it has the slowest convergence.In Adam, the 1st moment of the gradient is normalized by the 2nd moment of the gradient in order to adapt the learning rate and stabilize calculation.By this normalization, the parameter fluctuation can be suppressed.
For NN, a pure perceptron with $L$ layers is adopted in this paper. Let us define layer 0 as the input layer and layer $(L-1)$ as the output layer; the other layers are defined as hidden layers. The update equation for NN weights depends on the optimizer. Let us describe the weight update between neuron $i$ of layer $l$ and neuron $j$ of layer $l+1$ for LoRaWAN node $n$. In SGD, weight $w^{(l)}_{n,i,j}$ is updated by
$$w^{(l)}_{n,i,j} \leftarrow w^{(l)}_{n,i,j} - \eta\,\frac{\partial E(w_n)}{\partial w^{(l)}_{n,i,j}},$$
where $\eta$ is an NN learning rate.
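In Adam, the update equation is given by
$$w^{(l)}_{n,i,j} \leftarrow w^{(l)}_{n,i,j} - \eta\,\frac{\hat{m}^{(l)}_{n,i,j,t}}{\sqrt{\hat{v}^{(l)}_{n,i,j,t}} + \epsilon_{\mathrm{Adam}}},$$
where $\hat{m}^{(l)}_{n,i,j,t}$ is the estimated 1st moment of the gradient at epoch $t$, $\hat{v}^{(l)}_{n,i,j,t}$ is the estimated 2nd moment of the gradient, $\eta$ is the learning rate, and $\epsilon_{\mathrm{Adam}}$ is a small value to avoid division by zero. The two moment estimates are exponential moving averages of the gradient and of the squared gradient, respectively.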
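The difference between the two optimizers can be seen in a scalar sketch of one update step each. The Adam step below follows the standard formulation with bias-corrected moment estimates; the decay rates `beta1` and `beta2` and the other default values are the commonly used defaults and are assumptions here, not the settings of this paper. Function names are hypothetical.

```python
import math


def sgd_step(w: float, grad: float, lr: float) -> float:
    """SGD: move the weight against the gradient, scaled by the learning rate."""
    return w - lr * grad


def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for a single weight.

    m, v : running 1st and 2nd moment estimates from the previous step
    t    : step counter starting at 1 (used for bias correction)
    Returns the updated weight and the updated moment estimates.
    """
    m = beta1 * m + (1 - beta1) * grad          # moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad * grad   # moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)                # bias-corrected 1st moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected 2nd moment
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


if __name__ == "__main__":
    print(sgd_step(1.0, 0.5, 0.1))
    print(adam_step(1.0, 0.5, 0.0, 0.0, 1))
```

Because Adam divides by the square root of the 2nd moment, its first step has a magnitude close to the learning rate regardless of the gradient scale, at the cost of the extra moment bookkeeping and the square root per weight.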

V. PROPOSED SCHEME
A. DESIGN OF LEARNING MODEL
Let one epoch be composed of the channel allocation, Q value observation, and learning process. The GW has one independent Q-learning unit for each LoRaWAN node, i.e., the GW acts as the agent of Q-learning. The frequency channel assignment for the next epoch is determined based on the state at the current epoch. Without loss of generality, we explain the frequency channel assignment for LoRaWAN node $n$. Let us define state set $\mathcal{S}$, action set $\mathcal{A}$, and Q-value as below.
• State $\mathcal{S}$: The combination of the allocated channel indices of all the nodes. The frequency channel assignment for each LoRaWAN node is represented by a one-hot-$K$ vector, where the element corresponding to the assigned frequency channel is set to 1, and the others are set to 0. Thus, each state $S_t \in \mathcal{S}$ is a column vector obtained by stacking up $N$ one-hot-$K$ vectors. For example, suppose $N = 3$ and $K = 4$; then one possible state is given by (1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0).
• Action $\mathcal{A}$: The set of channel indices which can be allocated to node $n$.
• Q-reward $Q_{n,k_n}$: The weighted sum of the number of received packets. It is adjusted by the ratio of the number of received packets from the node $n$ of interest and the minimum number of received packets of other nodes. Here, $D_{n,t}$ is the number of successfully received packets from LoRaWAN node $n$ during epoch $t$, and $\nu$ is a selfish parameter that adjusts priority between node $n$'s reward and that of other nodes. When $\nu$ is small, node $n$ acts selfishly and tries to increase its own number of received packets. On the other hand, if $\nu$ is large, the GW attempts to equalize the performance of all nodes through resource allocation to node $n$.
This learning is a two-step process comprising wireless environmental learning and optimal resource selection. The first step learns the wireless environment around each LoRaWAN node from the input channel allocation state. For example, this learning tries to understand which pairs of LoRaWAN nodes do not interfere with each other even if they are allocated to the same frequency channel. The second step is frequency channel allocation based on the wireless environment. Based on the learned wireless environment, the optimal frequency channel is assigned to each LoRaWAN node. In the proposed scheme, the two steps are connected, and the frequency channel allocation is performed based on the input frequency channel allocation state.
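The state encoding described above can be sketched in a few lines. The function below stacks one one-hot-K vector per node; the 0-based channel indexing and the function name are assumptions for illustration.

```python
def encode_state(assignment, num_channels):
    """Stack one one-hot-K vector per node into a single state vector.

    assignment   : list of channel indices (0-based), one entry per node
    num_channels : number of orthogonal frequency channels K
    """
    state = []
    for channel in assignment:
        if not 0 <= channel < num_channels:
            raise ValueError("channel index out of range")
        one_hot = [0] * num_channels
        one_hot[channel] = 1
        state.extend(one_hot)
    return state


if __name__ == "__main__":
    # N = 3 nodes, K = 4 channels: node 0 on channel 0, nodes 1 and 2 on channel 1.
    print(encode_state([0, 1, 1], 4))
```

With this indexing, the example state (1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0) corresponds to the assignment [0, 1, 1], and the state vector always has N × K elements.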

B. RESOURCE ALLOCATION USING Q-LEARNING
The GW allocates one of $K$ frequency channels to each LoRaWAN node based on the output from NN as follows. We show the allocation algorithm at epoch $t$.
Step 3 The GW observes the number of successfully received packets for state $S_t$.
By this, the GW can allocate to each LoRaWAN node the frequency channel that maximizes the number of successfully received packets, which has a correlation with the PDR.

A. SIMULATION PARAMETERS AND MODEL
In this section, we provide computer simulation results to verify the performance of the proposed scheme. In this simulation, node-GW shadowing is calculated by a spatially correlated shadowing model [12]. This component is expressed as a function of the location of the node, $\psi(x_n, y_n)$. Between LoRaWAN nodes, shadowing is calculated using uncorrelated shadowing because the nodes are located near ground height, and the shadowing correlation is very low due to the distance between nodes. This component is therefore expressed as a function of the node indices, $\psi(n, q)$, where $n$ and $q$ are indices of nodes. In both situations, uncorrelated shadowing is based on log-normally distributed shadowing loss with zero mean and standard deviation of $\sigma$ [dB]. The wireless system parameters and the learning parameters are summarized in Tables 2 and 3, respectively. LoRaWAN's parameters are derived from the Japanese parameter configuration AS923 in document [14]. For learning parameters, we compare learning schemes and models, e.g., the number of layers, learning rate, optimizers, and activation functions. The optimal combination of learning parameters and schemes is used for PDR performance evaluation. For the $\epsilon$-greedy scheme, the exploration rate $\epsilon(t)$ is set as a function of the current epoch $t$ and the number of epochs $T$.
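An ε-greedy channel selection of the kind mentioned above can be sketched as follows. The linear decay of the exploration rate from 1 to 0 is purely a hypothetical schedule chosen for illustration; the schedule actually used in the simulation is not reproduced by this sketch. The per-channel Q estimates would come from the NN output for the node of interest, and the function names are assumptions.

```python
import random


def epsilon(t: int, total_epochs: int) -> float:
    """Hypothetical exploration schedule: linear decay from 1 at t = 0 to 0 at t = T."""
    return max(0.0, 1.0 - t / total_epochs)


def select_channel(q_values, eps: float, rng: random.Random) -> int:
    """Epsilon-greedy choice over the K per-channel Q estimates of one node.

    With probability eps a random channel is explored; otherwise the channel
    with the largest estimated Q-reward is exploited.
    """
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda k: q_values[k])


if __name__ == "__main__":
    rng = random.Random(0)
    estimates = [0.1, 0.9, 0.3]  # hypothetical NN outputs for K = 3 channels
    for t in (0, 50, 100):
        print(t, epsilon(t, 100), select_channel(estimates, epsilon(t, 100), rng))
```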

VOLUME 7, 2019
As shown in Fig. 7, the optimal number of layers is 4 for the ReLU activation function.This can be explained as follows.
If the number of layers is too small, the performance of the NN is insufficient to express the relationship. On the other hand, if the number of layers is too large, the so-called vanishing gradient problem [21] occurs. In other words, the gradient of the error function approaches zero, so the NN cannot be trained. This problem becomes more obvious as the number of layers increases. Moreover, the initial state is not good when the number of layers is large. However, the NN can be effectively trained when the gradient of the error function is sufficiently large. Thus, an appropriate number of layers exists. In the following evaluation, $L = 4$ is used for NN with the ReLU activation function.
The reason for the stair-like curve is as follows. Because the PDR is evaluated at each epoch as shown in eq. (6), the maximum number of packets to be received is at most $T_{\mathrm{epoch}}/T_{\mathrm{interval},n} + 1 = 11$. Thus, for example, the PDR performance takes only integer multiples of 1/11.
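The quantization behind the stair-like curve can be checked with a short sketch. The epoch length of 600 and the generation interval of 60 are hypothetical values chosen only so that the ratio plus one equals 11, as in the text; the function names are also assumptions.

```python
from fractions import Fraction


def max_packets_per_epoch(t_epoch: int, t_interval: int) -> int:
    """Upper bound on the packets one node can deliver in an epoch."""
    return t_epoch // t_interval + 1


def pdr_levels(max_packets: int):
    """All PDR values reachable when max_packets packets are generated."""
    return [Fraction(received, max_packets) for received in range(max_packets + 1)]


if __name__ == "__main__":
    n_max = max_packets_per_epoch(600, 60)  # hypothetical T_epoch and T_interval
    print(n_max)
    print([str(level) for level in pdr_levels(n_max)])
```

Since only these discrete levels can occur for such a node, the empirical CDF of the PDR rises in steps rather than smoothly.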

2) LEARNING RATE
Next, the impact of the optimizer on the PDR performance is shown for each activation function.

3) LEARNING SCHEME
Based on the optimum values for the number of layers $L$ and the learning rate $\eta$ for each optimizer, the root mean squared error (RMSE) convergence property and the PDR performance of each learning scheme are shown in Fig. 9. Fig. 9a shows that, although the RMSE becomes smaller as learning progresses, it does not converge to 0. This is because, in this learning model, each node may change the resource index used in epoch $t+1$ from that in epoch $t$; thus, the Q-value is not always stable on state $S_t$. This result also shows that the RMSE of the SGD optimizer becomes small faster than that of the Adam optimizer. If the Adam optimizer is used, the latter data sets have relatively small effects relative to the former. This results in the latter part having worse convergence performance. Fig. 9b shows the CDF of PDR for different combinations of activation function and optimizer. Although the difference between the performances of the two optimizers is relatively small, the computational complexity of Adam is greater than that of SGD. This is because Adam requires additional computations such as the square root of the second moment. Thus, in the following evaluation, the SGD optimizer is used.
The impacts of the number of frequency channels $K$ and the CS threshold on the CDF of PDR performance are shown in Fig. 11 and Fig. 12. Fig. 11 shows that the performance improvement from the proposed scheme depends on the number of available frequency channels, $K$. When $K = 8$, the proposed scheme can improve the average PDR performance by 20%, compared with the conventional random allocation. However, when $K = 16$, this performance improvement becomes slightly smaller and is approximately 13%. This is because random channel hopping can avoid packet collision if the system has a sufficient number of channels. In a typical LoRaWAN system, only a small number of channels is available. For example, in the EU standard, the minimum number of channels is set to 3 [6]. Thus, it can be said that the proposed scheme is more effective in a practical environment.

D. PDR PERFORMANCE
From the figure, it can be seen that there are sharp transitions of performance at PDR close to 1 and 0 (this is more obvious for small $K$). These phenomena can be explained as follows. First, the sharp transition close to PDR = 1 is due to the nodes close to the GW. Because such nodes are close to the GW, the received signal power at the GW is considerably high; therefore, their PDR performance is almost 1 even under interference. Second, the sharp transition close to PDR = 0 is due to the interference among the LoRaWAN nodes. Even with the proposed scheme, strong mutual interference among the nodes may occur. If the mutually interfering LoRaWAN nodes are assigned to the same frequency channel, they interfere with each other, which results in packet loss. If the number of available frequency channels $K$ is small, this interference cannot be avoided by random channel hopping.
Because the PDR performance of LoRaWAN with CSMA/CA highly depends on how accurately the LoRaWAN nodes can CS each other, we evaluate the impact of the CS threshold. Fig. 12 shows that as the CS threshold becomes lower, the conventional random channel allocation can slightly improve the PDR performance by avoiding packet collision. However, even when the CS threshold is low, packet collisions still happen due to the correlation of the packet generation timing. On the other hand, the proposed scheme can provide much better performance irrespective of the CS threshold because it can allocate frequency channels to avoid packet collision depending on hidden terminal relations and the correlation of packet generation. The proposed scheme can avoid allocating identical frequency channels to nodes that either have the same packet generation timing or cannot CS each other. Through the learning, the proposed scheme gives higher priority to avoiding whichever of the two factors more significantly degrades the PDR performance.

E. WIRELESS ENVIRONMENT RECOGNITION
There are two factors that result in packet collision: CS availability and packet transmission timing collision.
To evaluate those factors, we define three metrics: i) the CS rate $P_{CS}$, ii) the packet transmission timing difference $T_{PG}(n, q)$, and iii) the mean packet transmission timing difference in each frequency channel, $T_{PG}(k)$ with $k \in \mathcal{K}$. First, $P_{CS}$ is defined over the pairs of nodes allocated to the same frequency channel, where $k$ is the frequency channel index, $n$ and $q$ are the node indices, $C(k)$ is the set of LoRaWAN nodes allocated to the frequency channel $k$, and $I_{CS}(n, q)$ is the indicator function given by
$$I_{CS}(n, q) = \begin{cases} 1, & \text{if node } n \text{ and node } q \text{ can CS each other} \\ 0, & \text{otherwise.} \end{cases} \tag{31}$$
$T_{PG}(n, q)$ and $T_{PG}(k)$ are defined from $t^n_i$, the approximated starting time of transmission of packet $i$ from node $n$. This starting time is determined by $\delta_{\mathrm{packet},n}$, the interval between packets $i-1$ and $i$, which takes into account the duty cycle $T_{\mathrm{wait}}$ as $\delta_{\mathrm{packet},n} = \max\{T_{\mathrm{interval},n}, T_{\mathrm{wait},n,i}\}$. Because the GW also shall follow the duty cycle [22], packet retransmission is not allowed. Thus, the error between the actual packet transmission starting time and the approximated one is negligible, i.e., on the order of the contention window.
If the CS rate $P_{CS}$ is high, the LoRaWAN nodes allocated to the same frequency channel can CS each other; hence, packet collision can be avoided. If the packet generation timing difference $T_{PG}(n, q)$ is large, the LoRaWAN nodes allocated to the same frequency channel can also avoid packet collision. Thus, from the viewpoint of wireless environment recognition and frequency channel allocation, it is desirable to have high $P_{CS}$ or large $T_{PG}(n, q)$.
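One way to evaluate a CS rate of this kind is sketched below. It assumes that the metric is the fraction of node pairs sharing a frequency channel that can carrier sense each other, which is an interpretation for illustration rather than the exact definition used in this paper. The data structures, the convention of returning 0 when no pair shares a channel, and the function name are likewise assumptions.

```python
from itertools import combinations


def cs_rate(channel_of, can_sense) -> float:
    """Fraction of same-channel node pairs that can carrier sense each other.

    channel_of : dict mapping node index -> allocated channel index
    can_sense  : callable (n, q) -> bool, playing the role of the indicator I_CS(n, q)
    Returns 0.0 when no two nodes share a channel (an assumed convention).
    """
    pairs = [
        (n, q)
        for n, q in combinations(sorted(channel_of), 2)
        if channel_of[n] == channel_of[q]
    ]
    if not pairs:
        return 0.0
    return sum(1 for n, q in pairs if can_sense(n, q)) / len(pairs)


if __name__ == "__main__":
    allocation = {0: 0, 1: 0, 2: 0, 3: 1}   # hypothetical allocation of 4 nodes
    sensed_pairs = {(0, 1)}                  # only nodes 0 and 1 can sense each other
    print(cs_rate(allocation, lambda n, q: (n, q) in sensed_pairs))
```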

1) CS RATE
The CS rates, $P_{CS}$, of random allocation and the proposed scheme are shown in Fig. 13. The figure shows that the proposed scheme improves $P_{CS}$ slightly, compared with random allocation. This is because the packet generation timing difference is more dominant than CS availability. In the following, we show this.

2) PACKET TRANSMISSION STARTING TIME
The CDFs of the packet transmission starting time difference $T_{PG}(n, q)$ of the proposed scheme and the conventional random allocation scheme are plotted in Fig. 14. For reference, the performance of the system with $K = 1$ is also plotted. As Fig. 14a shows, the proposed scheme can slightly increase the value compared with the random allocation by allocating frequency channels such that the LoRaWAN nodes in the same frequency channel have a greater time difference. Fig. 14b shows the mean value of the packet transmission starting time difference, $T_{PG}(k)$. This result shows that the variance of the mean difference is greater for the proposed scheme. Although the improvement is marginal, the PDR performance is significantly improved. These results indicate that the proposed scheme can increase the mean of the difference while maintaining the required transmission time difference.
This trend becomes obvious if the packet generation timing offsets of LoRaWAN nodes are close to each other. We assume the case where $T_{\mathrm{offset},n}$ is randomly selected from six predetermined values. If the LoRaWAN nodes in the same frequency channel have the same $T_{\mathrm{offset},n}$, then we have $T_{PG}(n, q) = 0$. As Fig. 15 shows, the probability of having $T_{PG}(n, q) = 0$ can be significantly lowered by the proposed scheme compared with random allocation. Furthermore, $T_{PG}(k)$ of the proposed scheme is 3 [sec] greater than that of random channel allocation. Thus, the proposed scheme can effectively avoid packet collision among LoRaWAN nodes in the same frequency channel.

VII. CONCLUSION
In this paper, we have proposed a wireless resource allocation scheme to avoid mutual interference from hidden nodes in CSMA/CA and from traffic collision, and we have evaluated this scheme using computer simulation.By searching for an optimal resource allocation that can maximize the weighted sum of the number of successfully received packets from each node using Q-learning and NN approximation, each node can avoid packet collision without explicit feedback.From computer simulation, it is shown that the proposed scheme can improve the average PDR performance by about 20%.These results indicate that the proposed method could improve packet delivery performance without preparing more wireless resources and explicit feedback that drain LoRaWAN node batteries.

FIGURE 2. Model of Q-learning based on NN.

FIGURE 3. Forward propagation and back propagation.

Figure 8 shows the PDR performance when the SGD optimizer and the Adam optimizer are used. For the SGD optimizer, learning rate $\eta = 10^{-2}$ shows the best PDR performance, while $\eta = 10^{-3}$ provides the best performance when the Adam optimizer is used.

Fig. 10 shows how the learning proceeds. It can be seen from the figure that the PDR value improves as learning progresses.

FIGURE 11. Impact of the number of resources, K.

FIGURE 14. Performance of packet transmission starting timing difference.

FIGURE 15. Performance of packet transmission starting timing difference when timing offset is clustered.