Deep Learning-Based Long-Term Power Allocation Scheme for NOMA Downlink System in S-IoT

In this paper, we formulate a long-term resource allocation problem for the non-orthogonal multiple access (NOMA) downlink system in the satellite-based Internet of Things (S-IoT) to obtain the optimal decoding order and power allocation. This problem can be decomposed into two subproblems: a rate control subproblem and a power allocation subproblem. The latter is non-convex, and its solution relies on both the queue state and the channel state. However, both states change continually from one time slot to the next, which makes it extremely difficult to characterize the optimal decoding order of successive interference cancellation (SIC). Therefore, we leverage deep learning to explore the weight relationship between the queue state and the channel state and thereby derive an optimal decoding order. The proposed deep learning-based long-term power allocation (DL-PA) scheme can efficiently derive a more accurate decoding order than the conventional solution. Simulation results show that the DL-PA scheme improves the performance of the S-IoT NOMA downlink system in terms of long-term network utility, average arriving rate, and queuing delay.

To realize massive access in S-IoT and adapt to the tight spectrum resources, non-orthogonal multiple access (NOMA) is viewed as an effective solution for enhancing spectral efficiency. Recently, NOMA has drawn significant interest from researchers because of its promising applications in fifth generation (5G) networks [7]. In general, the existing NOMA schemes can be divided into two categories: power domain NOMA [8]-[10] and code domain NOMA [11], [12]. Power domain NOMA allows multiple UEs to share the same subcarrier simultaneously with different power levels at the transmitter, which increases the sum spectral efficiency but also introduces multi-user interference. To cope with this problem, successive interference cancellation (SIC) is applied at the receiver to separate the multiplexed UE signals, at the cost of increased computational complexity [13].
The superiority of NOMA has been demonstrated in previous literature [14]. The authors in [8] studied the performance of NOMA when users were randomly deployed, and showed that NOMA can improve the ergodic sum rates of the 5G land mobile system. The user fairness was investigated in [15], where two cases of instantaneous channel state information (CSI) and average CSI were discussed.
Note that the power and storage resources on satellites are limited. Outage events may therefore occur when there is insufficient power to allocate, and the limited storage on the HTS may lead to overflow. To address these issues, we formulate a long-term joint power allocation and rate control problem and convert it into a series of online subproblems by means of Lyapunov optimization. In addition, we notice that the power allocation subproblem is non-convex and depends not only on the channel state but also on the queue state. As a result, it is extremely difficult to approximate the optimal SIC decoding order through conventional theoretical analysis.
Thus, we turn to deep learning (DL), also known as deep structured learning or hierarchical learning, which is part of a broader family of machine learning methods based on artificial neural networks [16]-[18]. In the past few years, deep learning architectures such as deep neural networks, deep belief networks, recurrent neural networks and convolutional neural networks have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design, medical image analysis, material inspection and board game programs, where they have produced results comparable to and in some cases superior to human experts [19], [20].
Moreover, several recent works have applied DL to physical-layer wireless communications. In [21], the authors propose an effective DL-aided NOMA scheme that models the channel state information (CSI) by learning the environment automatically via offline learning. In addition, a DL-based pilot allocation scheme is designed in [22] to improve the performance of cellular networks with severe pilot contamination by learning the relationship between pilot assignment and the users' location pattern. A novel decentralized resource allocation mechanism for vehicle-to-vehicle (V2V) communications based on deep reinforcement learning is developed in [23], which can be applied to both unicast and broadcast scenarios. Furthermore, in [24] and [25], DL is introduced to learn optimal resource allocation policies in wireless communication systems. Similar to the problems in these works, our long-term resource allocation problem is non-convex and lacks model knowledge of the SIC decoding order. We therefore rely on the universal function approximation property of deep neural networks and develop a deep learning-based approach to train a model of the SIC decoding order.
In this paper, considering that both the power and storage on satellites are limited, we first establish a long-term joint power allocation and rate control scheme for the S-IoT NOMA downlink system. Then, by leveraging the Lyapunov optimization framework, we convert the long-term optimization problem into a series of online power allocation and rate control subproblems. However, the power allocation subproblem is non-convex, and its solution depends on the SIC decoding order, which is determined by both the queue state and the channel state. Because these states change continually from one time slot to the next, it is extremely difficult to characterize the optimal SIC decoding order. Therefore, we approximate the SIC decoding order with a deep learning model trained on a large amount of data. We further simulate and evaluate the proposed DL-PA scheme, and the simulation results show that it outperforms the rough conventional estimation scheme.
The remainder of this paper is organized as follows. In Section II, the system model is provided in detail. The long-term optimization problem is formulated and converted into multiple online problems in Section III. In Section IV, we introduce deep learning to approximate the SIC decoding order. The simulation results and analysis are given in Section V. Some key notations and nomenclature are summarized in Table 1.

II. SYSTEM MODEL
Consider a downlink NOMA system consisting of a satellite source node and K terrestrial terminals, denoted as S and $\mathrm{UE}_1, \mathrm{UE}_2, \cdots, \mathrm{UE}_K$, respectively. The focus of this paper is the optimal resource allocation scheme within a single NOMA group, and we assume that all the UEs are covered by the same spot beam. Thus, we consider the scenario of one NOMA group under a single spot beam. By taking advantage of power domain NOMA, all the UEs within the coverage of the same spot beam, at different positions, can share the same frequency at the same time.
We divide time into multiple time slots $t \in \{0, 1, 2, \ldots, T-1\}$. At the beginning of each time slot, superposition coding (SC) is employed at node S, and S multicasts the SC signal to the K UEs, where $p_i(t)$ denotes the transmit power allocated to $\mathrm{UE}_i$ at time slot t. Consistent with most existing works [26], [27], we assume that the links between S and the UEs experience independent and identically distributed (i.i.d.) block shadowed-Rician fading and additive white Gaussian noise (AWGN). We define $g_i(t)$ as the composite channel gain at time slot t, which consists of the antenna gain, beam gain, fading channel coefficient and free-space loss. $\mathrm{UE}_i$ receives the SC signal through the i-th downlink.
At $\mathrm{UE}_i$, the UEs are sorted in ascending order of power level as $p_{s_1(t)}(t) < p_{s_2(t)}(t) < \cdots < p_{s_K(t)}(t)$. Finding this order is the key step in determining the optimal SIC decoding order, and it is the main challenge addressed by our proposed scheme. SIC is then used to decode the signals of the UEs with higher power levels. Specifically, $\mathrm{UE}_i$ decodes the signal of each $\mathrm{UE}_j$ with $j > i$ and subtracts it from the received signal, until it can decode its own signal. Furthermore, $\mathrm{UE}_i$ treats the signals of $\mathrm{UE}_l$ with $l < i$ as interference, which determines the signal-to-interference-plus-noise ratio (SINR) of $\mathrm{UE}_i$.
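The SIC rule described above can be illustrated numerically. The snippet below is a minimal sketch, not the paper's implementation: it assumes that `g` holds power-domain channel gains, that `noise` is the receiver noise power, and that the UEs are already indexed in ascending order of allocated power. The function name `sic_sinr` and the sample values are hypothetical.

```python
import numpy as np

def sic_sinr(p, g, noise):
    """Per-UE SINR after SIC, with UEs indexed in ascending order of power.

    Assumption: UE i cancels every UE j > i (higher power) and treats
    every UE l < i (lower power) as residual interference.
    """
    p = np.asarray(p, dtype=float)
    g = np.asarray(g, dtype=float)
    # interference[i] = total power allocated to UEs with index < i
    interference = np.concatenate(([0.0], np.cumsum(p)[:-1]))
    return p * g / (g * interference + noise)

# Toy values chosen only for illustration.
sinr = sic_sinr(p=[1.0, 2.0, 4.0], g=[0.5, 0.4, 0.2], noise=0.1)
```

Under these assumptions the UE with the lowest power sees no intra-group interference, while the UE with the highest power treats all other signals as interference.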
During each time slot, with a pre-determined SIC decoding order, we can derive the leaving data rate $b_i(t)$ targeted at $\mathrm{UE}_i$, where W represents the frequency bandwidth and $\tau$ is the duration of each time slot. Let $Q_i(t)$ denote the amount of data buffered at queue $Q_i$ during time slot t and waiting to be transmitted. We denote by $a_i(t)$ the data targeted at $\mathrm{UE}_i$ that arrive at the HTS S from the backhaul link; they are first buffered at queue $Q_i$ and then forwarded to $\mathrm{UE}_i$ over the wireless channel.
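The relation between the leaving data, the arrivals and the queue backlog can be sketched as follows. Neither equation survives in the text, so both forms used here are assumptions: a Shannon-type expression $W \tau \log_2(1 + \mathrm{SINR})$ for the leaving data, and the common recursion $Q_i(t+1) = \max[Q_i(t) - b_i(t), 0] + a_i(t)$ for the queue. The function names are hypothetical.

```python
import numpy as np

def leaving_data(sinr, bandwidth_hz, slot_s):
    """Data (bits) that can leave each queue in one slot: W * tau * log2(1 + SINR)."""
    return bandwidth_hz * slot_s * np.log2(1.0 + np.asarray(sinr, dtype=float))

def update_queues(q, arrivals, departures):
    """Assumed queue recursion: Q(t+1) = max(Q(t) - b(t), 0) + a(t)."""
    q = np.asarray(q, dtype=float)
    return np.maximum(q - departures, 0.0) + arrivals

# Toy values: 1 MHz bandwidth, 1 ms slots, two UEs.
b = leaving_data(sinr=[1.0, 3.0], bandwidth_hz=1e6, slot_s=1e-3)
q_next = update_queues(q=[500.0, 5000.0],
                       arrivals=np.array([200.0, 200.0]),
                       departures=b)
```

The first queue is drained completely before the new arrivals are added, while the second keeps a residual backlog.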

III. PROBLEM FORMULATION AND CONVERSION
A. PROBLEM FORMULATION
To convert the two long-term constraints, namely long-term mean rate stability and long-term mean power stability, we establish two virtual queues Q and Z.
When a UE has a large queue backlog, there are two ways to keep the queue stable and avoid overflow: one is to decrease the arriving rate $a_i(t)$ and the other is to increase the leaving rate $b_i(t)$. According to [28], a discrete-time process Q(t) is network stable, and overflow can be avoided, if the corresponding stability conditions, including (7), are satisfied, where Z(t) denotes the power debt state at time slot t that guarantees the long-term mean power constraint $P_{mean}$, i.e., (8c). When the power allocated during the previous time slot t exceeds $P_{mean}$ in order to maximize the expected throughput (the long-term network utility), i.e., $Z(t) - P_{mean} > 0$, the power available in the current time slot $(t+1)$ is reduced, possibly to zero, which ensures that the long-term power constraint is met.
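The power debt mechanism can be made concrete with a short sketch. The update rule itself is not recoverable from the text, so the recursion $Z(t+1) = \max[Z(t) - P_{mean}, 0] + \sum_i p_i(t)$ used below is an assumption based on the usual virtual-queue construction in Lyapunov optimization; `update_power_debt` is a hypothetical name.

```python
def update_power_debt(z, p_alloc, p_mean):
    """Assumed virtual-queue recursion: Z(t+1) = max(Z(t) - P_mean, 0) + sum_i p_i(t)."""
    return max(z - p_mean, 0.0) + sum(p_alloc)

z = 0.0
history = []
# Total allocated power per slot is 5, 2 and 1, with P_mean = 2.
for p_alloc in ([3.0, 2.0], [1.0, 1.0], [0.5, 0.5]):
    z = update_power_debt(z, p_alloc, p_mean=2.0)
    history.append(z)
```

In this toy run the debt built up by the first slot only starts to fall once the allocated power drops below $P_{mean}$, which is the behaviour the paragraph describes.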
Considering the actual satellite communication environment, the transmission power of the HTS is limited by both its long-term average power $P_{mean}$ and its short-term peak power $P_{max}$. Therefore, to maximize the long-term network utility U, which is a non-decreasing and strictly concave function of the long-term time-average arriving data rate [28], we state the original problem (8) in terms of the long-term time-average arriving data rate $a_i(t)$ and the power allocation coefficient $p_i(t)$.

B. PROBLEM CONVERSION
The original problem is a time-average optimization problem that spans multiple time slots. According to the Lyapunov theory [28], we can decompose it into a series of single-time-slot power allocation subproblems. The derivation of the decomposition is given in Appendix A. Each single-time-slot online optimization problem depends only on the information available during the current time slot. It is worth noting that the objective function depends on both Q and g, which play the key role in the power distribution process. Moreover, the relationship between Q and g directly determines the power level order, which is the same as the SIC decoding order.
Specifically, in the initial state, the queue states of all UEs are the same, i.e., all queue states Q are equal to zero. Power allocation then depends only on the channel gain g, which is consistent with the conventional power domain NOMA scheme. After several time slots, the queue states Q of the UEs differ from each other because of the differences in power allocation and arriving rate. From then on, power allocation no longer depends only on the channel state g, but is also strongly affected by the queue state Q.
Therefore, an optimized SIC decoding order is the key to achieving the maximum utility, and the next section is dedicated to deriving the optimal SIC decoding order to enhance the performance of our long-term optimization scheme.
However, the objective function is non-convex, and it is difficult to deduce an accurate and optimal SIC decoding order. Thus, we first propose the FuS algorithm, in which a function $F = Q + ug$ is defined to roughly estimate the SIC decoding order, where u is a weight that sets the relative importance of the queue state Q and the channel state g. We fix the weight u to a constant at the beginning of each time slot and sort all the UEs in ascending order of $F = Q + ug$, which we regard as a suboptimal decoding order. After reordering all UEs in this suboptimal decoding order, we use particle swarm optimization (PSO), which iteratively tries to improve a candidate solution with regard to a given measure of quality, to find the optimal power allocation. We denote this procedure as the FuS algorithm (Algorithm 1).
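The ordering step of the FuS algorithm can be sketched in a few lines. This is only an illustration of sorting by $F = Q + ug$; the PSO power allocation that follows it in Algorithm 1 is omitted, and the function name `fus_order` and the sample values are hypothetical.

```python
import numpy as np

def fus_order(q, g, u):
    """Indices of the UEs sorted in ascending order of F = Q + u * g."""
    f = np.asarray(q, dtype=float) + u * np.asarray(g, dtype=float)
    return np.argsort(f, kind="stable")

g = [0.2, 0.9, 0.5]
# Initial state: all queues are empty, so the order follows the channel gain only.
order_initial = fus_order(q=[0.0, 0.0, 0.0], g=g, u=2.0)
# Later slot: unequal backlogs change the order.
order_later = fus_order(q=[3.0, 0.0, 1.0], g=g, u=2.0)
```

With empty queues the order is determined by g alone, as in conventional power domain NOMA, whereas unequal backlogs can reorder the UEs.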

IV. IMPROVE THE SIC DECODING ORDER VIA DEEP LEARNING METHOD
Since the result of the FuS algorithm is suboptimal, in this section we further approximate the SIC decoding order by leveraging a deep learning method, denoted as the DLS algorithm, to enhance the performance of our proposed DL-PA scheme, as shown in Algorithm 2.
VOLUME 7, 2019

Algorithm 1 FuS Algorithm
Input: Q, Z, D, $\tau$, $P_{max}$, $P_{mean}$, $a_{max}$, B, $\eta$, and T;
Output: The channel gain g, the optimal rate R and the optimal power allocation
Sort UEs according to F(Q, g);
// Network stability control;
The core idea behind our DL-PA scheme is to use the neural network in Fig. 3 (which has three hidden layers) as an approximation function that computes the SIC decoding order from the given queue state and channel state. The output of the neural network is then compared with a given optimal SIC decoding order. The aim is for the output to be as close to the target as possible, and we define the loss function according to the mean squared error (MSE) principle. Moreover, in our DLS algorithm, we use backpropagation to alter the network's weights so that future outputs are closer to the desired target. We choose the adaptive moment estimation (Adam) algorithm for our DLS algorithm, which is a popular algorithm in the field of deep learning. Compared with other stochastic optimization methods, Adam achieves good results with fast convergence and has been shown to solve practical deep learning problems efficiently [29]. Moreover, the Adam algorithm has the following attractive benefits [30] on non-convex optimization problems:
1) Straightforward to implement.
2) Computationally efficient.
3) Little memory requirements.
4) Invariant to diagonal rescale of the gradients.
5) Well suited for problems that are large in terms of data and/or parameters.
6) Appropriate for non-stationary objectives.
7) Appropriate for problems with very noisy and/or sparse gradients.
8) Hyper-parameters have intuitive interpretation and typically require little tuning.
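To make the training step concrete, the following sketch applies the standard Adam update to an MSE loss. It is not the paper's DLS algorithm: a linear model stands in for the three-hidden-layer network to keep the example short, the data are synthetic, and the names `adam_step` and `mse` are hypothetical. The decay rates are called `r1` and `r2` to match the notation of the paper, and the bias correction follows the usual Adam formulation.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, r1=0.9, r2=0.999, eps=1e-8):
    """One Adam update; returns the new parameters and moment estimates."""
    m = r1 * m + (1.0 - r1) * grad        # first-moment (mean) estimate
    v = r2 * v + (1.0 - r2) * grad ** 2   # second-moment estimate
    m_hat = m / (1.0 - r1 ** t)           # bias correction for early steps
    v_hat = v / (1.0 - r2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def mse(pred, target):
    """Mean squared error between predictions and targets."""
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))            # synthetic inputs (stand-in for Q and g)
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = x @ w_true                           # synthetic targets (stand-in for labels)

w = np.zeros(4)
m = np.zeros(4)
v = np.zeros(4)
loss_start = mse(x @ w, y)
for t in range(1, 501):
    grad = 2.0 * x.T @ (x @ w - y) / len(y)   # gradient of the MSE w.r.t. w
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)
loss_end = mse(x @ w, y)
```

A larger learning rate than the 0.001 default is used here only so that the toy problem converges within a few hundred steps.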
Before training, we need to prepare a large amount of training data. For each combination of queue state $[Q_1(t), \cdots, Q_K(t)]$ and channel state $[g_1(t), \cdots, g_K(t)]$, we traverse all possible decoding sequences, find the one that maximizes the utility, and record it as the optimal SIC decoding order $[\hat{s}_1(t), \cdots, \hat{s}_K(t)]$. The details of the DLS algorithm are described in Algorithm 3.
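The labelling step above amounts to an exhaustive search over all K! decoding orders. The sketch below shows one way to do this, assuming a caller-supplied utility function. The `toy_utility` used here (a queue-weighted sum rate with fixed ascending power levels) is purely hypothetical and replaces the optimized power allocation of the paper; all names and values are illustrative.

```python
from itertools import permutations
import numpy as np

def best_decoding_order(q, g, utility):
    """Exhaustively search all K! orders and return the one with the largest utility."""
    k = len(q)
    return max(permutations(range(k)), key=lambda order: utility(order, q, g))

def toy_utility(order, q, g, levels=(1.0, 2.0, 4.0), noise=0.1):
    """Hypothetical score: queue-weighted sum rate with fixed ascending power levels."""
    total, interference = 0.0, 0.0
    for power, ue in zip(levels, order):
        sinr = power * g[ue] / (g[ue] * interference + noise)
        total += q[ue] * np.log2(1.0 + sinr)
        interference += power   # lower-power signals remain as interference
    return total

q = [1.0, 5.0, 2.0]
g = [0.5, 0.4, 0.2]
label = best_decoding_order(q, g, toy_utility)   # one training label for (q, g)
```

The search cost grows factorially with K, which is manageable for the 5 UEs considered in the simulations (120 orders) but not for large groups.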

Update $s \leftarrow s - lr \cdot r_1 / (\sqrt{r_2} + \varepsilon)$;
end
end
end
Finish training process;
Return $[\hat{s}_1(t), \cdots, \hat{s}_K(t)]$; // The optimal SIC decoding order
Then, we split our prepared data into a training subset and a testing subset, where the latter is used to evaluate the performance of our DL-PA scheme after the training procedure. In this way, we can check that our model does not overfit the training data, where overfitting is the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably [31]. After finishing the training process, we use the trained model to approximate the optimal SIC decoding order, and then we re-sort all the UEs and perform the power allocation process, which is illustrated in the DL-PA algorithm (Algorithm 2).

V. SIMULATION AND ANALYSIS
In this section, we first describe the parameter selection process of our proposed DLS algorithm, and then we analyze the simulation results and compare them with those of the FuS scheme in the case of 5 UEs.
During the training process, we use the parameter settings suggested by the proposers of Adam: $r_1 = 0.9$, $r_2 = 0.999$, $\varepsilon = 10^{-8}$. We observe the performance in an S-IoT NOMA downlink scenario in which the 5 UEs are about 500 km away from the HTS S, and the SNR is defined as $\mathrm{SNR} = P_{mean}/\eta$. We first fix the learning rate to 0.001 and change the batchsize from 50 to 200. The simulation results are shown in Fig. 4, where Acc denotes the accuracy, which can reach more than 96 percent after 100 iterations with different batchsize values. Thus, it is feasible to use this model for training, since high accuracy and fast convergence can both be obtained.
Then, it is important to choose the appropriate learning parameters and we adjust the parameters by observing MSE value in Fig. 5 and Fig. 6. In Fig. 5, we fix the batchsize to 50 and adjust the learning rate lr from 0.01 to 0.0001. It is obvious that the lowest MSE is obtained when lr = 0.001, and we set lr = 0.001 in our training process. Similarly, we observe the trend of MSE with fixed lr and different batchsize values. All the curves in Fig. 6 converge, while the curve of batchsize = 50 is smoother and more stable. Therefore, we choose batchsize = 50 in our training process.
After setting feasible learning parameters, we simulate the performance of the proposed DLS algorithm in terms of utility, data rate and queue delay. We compare the utility of the DLS algorithm and the FuS algorithm in Fig. 7 under different SNRs. It can be seen that the DLS algorithm improves the utility of the total system across a wide range of SNRs. In addition, the individual performance of each UE is compared in Fig. 8 and Fig. 9. The simulation results demonstrate that the data rate of each UE under the DLS algorithm is higher than that under the FuS algorithm, and that the queue delay under the DLS algorithm is lower than that under the FuS algorithm.

VI. CONCLUSION
In this paper, with the help of deep learning, an improved long-term power allocation scheme (DL-PA) for the S-IoT NOMA downlink system is proposed, which approximates the optimal SIC decoding order to further improve the performance of our long-term power allocation scheme. By taking advantage of the Adam optimization algorithm, our training accuracy reaches more than 96 percent within a few iterations. In addition, simulation results demonstrate that the proposed DLS algorithm allocates power more efficiently than the FuS algorithm because its SIC decoding order is closer to the optimum, and that the performance of the long-term power allocation scheme is further improved in terms of long-term network utility, average arriving rate and queue delay.

APPENDIX A
According to Lyapunov theory, we first define the set of virtual queue states $\Theta(t) = [[Q_1(t), \cdots, Q_K(t)], Z(t)]$ and construct a Lyapunov function. Then, we derive the Lyapunov drift function $\Delta(t)$, which gives us a new target: minimizing this drift. A small Lyapunov drift means that the queues are stable and the long-term average power constraint is satisfied. Recalling the objective function (8a), our goal is now to minimize the Lyapunov drift $\Delta(t)$ while maximizing (8a). To unite the two optimization goals, we construct a drift-minus-penalty function $\Delta_{DMP}(t)$, in which $V > 0$ is a parameter that weights utility maximization against Lyapunov drift minimization. Substituting eq. (5) and eq. (7) into eq. (13), we can further bound $\Delta_{DMP}(t)$ as in eq. (14), where B is a bounded value because $a_i(t)$, $b_i(t)$ and $p_i(t)$ are all bounded. Our objective is thus converted into minimizing the right-hand side of eq. (14). As we can see, eq. (14) can be broken down into two parts: one depends only on the arriving rate $a_i(t)$ and is named the rate control (RC) problem, and the other depends only on $p_i(t)$ and is named the power allocation (PA) problem. In addition, the optimal solution is obtained if and only if both subproblems are solved optimally.
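The equations of this derivation are not legible in the text, so the following is only a sketch of the standard drift-minus-penalty argument that the paragraph describes, not a reconstruction of the authors' eqs. (13) and (14). It assumes the queue recursions $Q_i(t+1) = \max[Q_i(t) - b_i(t), 0] + a_i(t)$ and $Z(t+1) = \max[Z(t) - P_{mean}, 0] + \sum_i p_i(t)$; the symbols $\Theta(t)$ for the joint queue state, $L$ for the Lyapunov function and $\Delta$ for the drift are notational assumptions.

```latex
% Sketch under the assumed queue recursions stated in the lead-in.
\begin{align}
L(\Theta(t)) &= \frac{1}{2}\Big[\sum_{i=1}^{K} Q_i^2(t) + Z^2(t)\Big], \\
\Delta(\Theta(t)) &= \mathbb{E}\big[L(\Theta(t+1)) - L(\Theta(t)) \,\big|\, \Theta(t)\big], \\
\Delta(\Theta(t)) - V\,\mathbb{E}\Big[\sum_{i=1}^{K} U_i(a_i(t)) \,\Big|\, \Theta(t)\Big]
  &\le B + \sum_{i=1}^{K} Q_i(t)\,\mathbb{E}\big[a_i(t) - b_i(t) \,\big|\, \Theta(t)\big] \nonumber \\
  &\quad + Z(t)\,\mathbb{E}\Big[\sum_{i=1}^{K} p_i(t) - P_{mean} \,\Big|\, \Theta(t)\Big]
   - V\,\mathbb{E}\Big[\sum_{i=1}^{K} U_i(a_i(t)) \,\Big|\, \Theta(t)\Big].
\end{align}
% The bound uses (\max[Q - b, 0] + a)^2 \le Q^2 + a^2 + b^2 + 2Q(a - b).
% Terms in a_i(t) only (rate control):     \sum_i [ Q_i(t) a_i(t) - V U_i(a_i(t)) ]
% Terms in p_i(t) only (power allocation): Z(t) \sum_i p_i(t) - \sum_i Q_i(t) b_i(t)
```

Under these assumptions the right-hand side separates into a part that involves only $a_i(t)$ and a part that involves only $p_i(t)$ (through $b_i(t)$), which matches the RC and PA split described above.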
The RC problem is given as follows: RC Problem: $\min\; Q_i(t)\,a_i(t) - V\,U_i(a_i(t))$, which is a convex optimization problem whose optimal solution can easily be found at the extreme point. The PA problem is defined over the power allocations $p_i(t)$, $i = 1, 2, \cdots, K$.
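As a worked example of the RC problem, the sketch below solves it in closed form for one particular utility. The choice $U_i(a) = \log(1 + a)$ and the box constraint $0 \le a \le a_{max}$ are assumptions made for illustration, since the text only states that the utility is non-decreasing and strictly concave. Setting the derivative $Q_i - V/(1 + a)$ to zero gives $a = V/Q_i - 1$, which is then clipped to the feasible interval. The function name `rate_control` is hypothetical.

```python
import numpy as np

def rate_control(q, v, a_max):
    """Minimise Q*a - V*log(1 + a) over 0 <= a <= a_max for each UE."""
    q = np.asarray(q, dtype=float)
    a = np.full_like(q, a_max)            # empty queue: admit as much as allowed
    positive = q > 0
    a[positive] = v / q[positive] - 1.0   # stationary point of the convex objective
    return np.clip(a, 0.0, a_max)

a_opt = rate_control(q=[0.0, 2.0, 50.0], v=10.0, a_max=8.0)
```

The result shows the intended behaviour of rate control: a UE with an empty queue is admitted at the maximum rate, while a UE with a large backlog is throttled to zero.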