The Performance of Supervised Machine Learning Based Relay Selection in Cooperative NOMA

This paper exploits supervised machine learning to solve joint relay selection and non-orthogonal multiple access (NOMA) transmission in cooperative networks. Both relay selection and NOMA are capable of increasing the system throughput. However, the joint problem is a mixed-integer optimization, and finding the global maximum of the system throughput is hard. Supervised and unsupervised machine learning algorithms have a strong record in dealing with such hard optimization problems. Supervised learning algorithms have shown promising results in relay selection, and one of the most widely applied is the support vector machine (SVM). Therefore, in this paper, relay selection is approached as an SVM multi-class classification problem. The main concept is to optimize the parameters of the SVM classifiers by training them on a large set of system realizations. The major advantage of this method is that the training stage, which requires high processing power, can be performed offline; the trained SVM multi-class classifiers can then be used directly during system operation with no further training. Simulation results validate that the performance of the proposed supervised learning-based scheme is close to that of the globally optimal exhaustive search relay selection scheme and outperforms the other available schemes. In addition, the proposed scheme is considerably simpler than the exhaustive search scheme, particularly when the number of relays is large.


I. INTRODUCTION
The ever-increasing demand for wireless data transmission motivates researchers to seek new solutions that can meet the requirements. One promising solution is cooperative networks, specifically relay-assisted communications [1], [2]. Nevertheless, when multiple relays are utilized, it is not a simple task to integrate them and benefit from them efficiently. Relay selection is one of the most effective ways to deal with multiple relays. In relay selection, a source transmits a signal to the selected relay, which in turn re-transmits the signal to the destination [3], [4]. Therefore, relay selection is a decent way to reduce system complexity while maintaining the full diversity gain. In particular, relay selection is a key technology for reducing power consumption and interference [5]. For example, wireless sensor networks (WSNs) have been extensively used in military and medical fields. Due to the huge number and small scale of the sensor nodes distributed in some WSN applications, the system throughput degrades substantially. Cooperative communication is a key technology for improving the throughput of sensor networks [6], [7]. Cooperative relaying is likewise a key technology in current wireless systems, such as long-term evolution (LTE) and LTE-Advanced cellular systems [8].
The associate editor coordinating the review of this manuscript and approving it for publication was Arturo Conde.
Another promising solution is non-orthogonal multiple access (NOMA), owing to its higher spectrum efficiency compared with orthogonal multiple access (OMA) [9]. The key concept behind NOMA is to serve multiple users in the same frequency band, but at different power levels. To enhance the performance of NOMA, cooperative NOMA with a single relay was analyzed in [10]. Later on, cooperative NOMA networks with multiple relays received more attention [11], [13]. Relay selection in cooperative NOMA was studied in [14], where a two-stage max-min relay selection rule with fixed power allocation, based on the users' quality of service (QoS) and channel quality, was proposed. The first stage identifies the subset of relays that can satisfy the QoS of the low-rate (weak) user; the second stage applies the max-min scheme to select, from that subset, the relay that best serves the high-rate (strong) user. The outage performance of the two-stage max-min scheme is superior to that of the no-relay and single-relay schemes. An enhancement of the two-stage max-min selection rule was proposed in [15]. It prioritizes the relays that can support the NOMA transmission, which further reduces the outage probability and hence yields a higher system throughput (see [15]).
In the current smart era, most nodes are capable of performing intelligent functions. Machine learning has proved to be an adequate artificial intelligence (AI) technique for performing various tasks such as classification, and applying machine learning algorithms in wireless communication is a hot research topic [16]. For instance, relay selection has been performed with both supervised and unsupervised learning methods. Reinforcement learning, one of the most common unsupervised learning methods, has been applied to relay selection [17], [18]. Concurrently, as a result of the progress in channel estimation, the channel statistics and estimates required for relay selection are available as labeled data. This makes supervised machine learning, which relies on labeled data, better suited to relay selection in some situations. However, studies combining relay selection and supervised machine learning are still scarce [19]. Reference [19] proposed OMA relay selection in a supervised manner, selecting the best of multiple relays to forward the signal transmitted by a source to a destination and thereby achieving the selection diversity gain.
Relay selection improves the system performance. On the other hand, relay selection is an NP-hard problem, and this hardness carries over to the joint relay selection and NOMA problem [20], [21]. The globally optimal solution relies on exhaustive search, which involves high computational complexity, especially when the number of relays is large. This complexity is unacceptable in wireless networks where the computation and energy storage of the nodes are limited; in WSNs, for instance, schemes with high computational complexity are impractical. Nevertheless, to the best of the authors' knowledge, none of the available studies has considered supervised learning based relay selection in a cooperative NOMA network, which has the advantages of low complexity and high performance. In addition, the previous non-buffer relay selection schemes have not considered the benefit of combining NOMA and OMA transmission, that is, employing OMA transmission when NOMA transmission is not supported rather than being in outage. Combining NOMA and OMA transmission was suggested for buffer-aided relay networks in [22]. The proposed supervised learning based relay selection scheme takes this combination into consideration.
This paper applies a data-driven method to design a joint relay selection and NOMA scheme, aiming to maximize the system throughput under the NOMA transmission conditions and the relay selection constraints. To keep the online computation burden of cooperative NOMA relay selection low while maintaining high performance, we propose a relay selection rule based on supervised learning. First, relay selection is converted to a multi-class classification problem; then a support vector machine (SVM) based supervised learning scheme is utilized to select the best relay. To implement the SVM, the multi-class classification training system is constructed and a massive set of sample data is fed to it, which yields the parameters of the optimal multi-class classifiers. Once the multi-class classifiers are obtained in the training stage, only low-complexity computations are needed for real-time operation.
The remainder of this paper is organized as follows. In Section II, we introduce the system model. In Section III, we propose a supervised-learning-based joint relay selection and NOMA transmission scheme. Section IV evaluates the performance of the proposed scheme and analyzes its complexity. Finally, the paper is summarized in Section V.

II. SYSTEM MODEL
In this paper, a downlink cooperative NOMA network with two users and $K$ relays is considered. The system model is shown in Fig. 1: there are a source node $S$, $K$ half-duplex (HD) decode-and-forward (DF) relay nodes denoted as $R_k$, $k = 1, \dots, K$, and two users $U_1$ and $U_2$. The channel coefficient of the $S \to R_k$ link is denoted as $h_{sr_k}$, and the channel coefficients of the $R_k \to U_1$ and $R_k \to U_2$ links are $h_{r_k u_1}$ and $h_{r_k u_2}$, respectively. All channels are assumed to have flat Rayleigh fading coefficients that remain constant within a time-slot and change independently between time-slots. We assume that the source always has enough information for transmission in all time-slots. In each time-slot, a packet can be transmitted by the source or by a relay, and the information symbols intended for the two users are assembled into packets of equal size. In addition, we assume that the source and the users are not directly connected. Without loss of generality, we assume that the transmit power at all transmitting nodes is $P_t$ and that the noise variance at all receiving nodes is $\sigma^2$.
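When the OMA transmission mode is applied, the link capacity of channel $h_{b_k}(t)$ at time-slot $t$, with $b_k \in \{sr_k, r_k u_1, r_k u_2\}$, is given by

$$C_{b_k}(t) = \log_2\left(1 + \eta_{b_k}(t)\right), \tag{1}$$

where $\eta_{b_k}(t)$ and $\bar{\eta}_{b_k}$ are the instantaneous and average SNR of channel $h_{b_k}(t)$, respectively.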

A. TRANSMISSION MODE
At each time-slot, the source-to-relay $S \to R_k$ and relay-to-user $R_k \to U_d$ ($d = 1$ or $2$) transmissions may carry either two packets or a single packet. If the $S \to R_k$ link satisfies

$$C_{sr_k}(t) \ge 2\gamma, \tag{2}$$

where $\gamma$ is the target data rate, the source $S$ is able to transmit two packets to $R_k$. This can be achieved with time-division multiple access (TDMA), or any other OMA technique, by assigning half of the time-slot to each packet. Otherwise, if (2) is not satisfied but $C_{sr_k}(t) \ge \gamma$, a single packet can be transmitted to one of the relays $R_k$. Note that if (2) is not satisfied, NOMA transmission cannot be applied between the relays and the users. On the other hand, for the $R_k \to U_d$ links, NOMA can be applied (under (2)) to transmit packets to $U_1$ and $U_2$ together. The selected relay $R_k$ superimposes $x_{r_k,1}(t)$ and $x_{r_k,2}(t)$, the data for users $U_1$ and $U_2$ respectively, into a single NOMA symbol, with the transmit power split between them by the power allocation factor $0 \le \alpha \le 1$. The signal received at $U_d$ is this symbol passed through the channel $h_{r_k u_d}(t)$ plus $n_d(t)$, the additive white Gaussian noise (AWGN) at user $U_d$. When NOMA is applied, the link capacity is not given by (1) but must include the interference within the multiplexed symbol. To be specific, when $\eta_{r_k u_1}(t) > \eta_{r_k u_2}(t)$, $x_{r_k,2}(t)$ is decoded at $U_2$ with $x_{r_k,1}(t)$ treated as interference, so its decoding is governed by the signal-to-interference-plus-noise ratio $\mathrm{SINR}(x_{r_k,2}(t))$. Because $\eta_{r_k u_1}(t) > \eta_{r_k u_2}(t)$, $x_{r_k,2}(t)$ can also be decoded at $U_1$ if it can be decoded at $U_2$. After $x_{r_k,2}(t)$ is removed from the received signal at $U_1$ by successive interference cancellation (SIC), $x_{r_k,1}(t)$ is decoded at $U_1$ with the signal-to-noise ratio $\mathrm{SNR}(x_{r_k,1}(t))$. Following a procedure similar to that in [22], the condition under which there exists an $\alpha$ that supports NOMA transmission to both $U_1$ and $U_2$, i.e. $\log_2(1 + \mathrm{SINR}(x_{r_k,2}(t))) \ge \gamma$ and $\log_2(1 + \mathrm{SNR}(x_{r_k,1}(t))) \ge \gamma$, follows from (7) and (8); this is the NOMA condition (10).
Similarly, if $\eta_{r_k u_1}(t) < \eta_{r_k u_2}(t)$, the roles of the two users are exchanged and the NOMA condition becomes (11). If the SNR of the $R_k \to U_d$ ($d = 1$ or $2$) links is not large enough to satisfy (10) or (11), NOMA transmission is not possible or not efficient. In this case, if $C_{r_k u_d}(t) \ge \gamma$, OMA can be used to transmit one packet to $U_d$.
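The NOMA requirements described above can be written out explicitly. What follows is a sketch under the standard two-user power-domain NOMA model, not a reproduction of the numbered equations of this paper. In particular, assigning the fraction $\alpha$ of the relay power to $x_{r_k,1}(t)$ and $1 - \alpha$ to $x_{r_k,2}(t)$ is an assumption, as is the shorthand $\eta_d = \eta_{r_k u_d}(t)$. For $\eta_1 > \eta_2$:

```latex
\begin{align*}
x_{r_k}(t) &= \sqrt{\alpha}\, x_{r_k,1}(t) + \sqrt{1-\alpha}\, x_{r_k,2}(t), \\
\mathrm{SINR}\big(x_{r_k,2}(t)\big) &= \frac{(1-\alpha)\,\eta_2}{\alpha\,\eta_2 + 1}, \qquad
\mathrm{SNR}\big(x_{r_k,1}(t)\big) = \alpha\,\eta_1, \\
\log_2\!\big(1 + \alpha\,\eta_1\big) \ge \gamma
  &\iff \alpha \ge \frac{2^{\gamma}-1}{\eta_1}, \\
\log_2\!\Big(1 + \frac{(1-\alpha)\,\eta_2}{\alpha\,\eta_2 + 1}\Big) \ge \gamma
  &\iff \alpha \le \frac{\eta_2 - (2^{\gamma}-1)}{2^{\gamma}\,\eta_2}, \\
\exists\,\alpha \in [0,1]
  &\iff \eta_1\big(\eta_2 - 2^{\gamma} + 1\big) \ge 2^{\gamma}\big(2^{\gamma}-1\big)\,\eta_2 .
\end{align*}
```

The last line is obtained by requiring the lower bound on $\alpha$ not to exceed the upper bound. For $\eta_1 < \eta_2$ the same expressions hold with the roles of the two users exchanged. As a numerical check, with $\gamma = 2$, $\eta_1 = 100$ and $\eta_2 = 50$ the feasible range is $0.03 \le \alpha \le 0.235$.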
The transmission mode is selected based on the status of all channels: if (2) and either (10) or (11) can be satisfied, the NOMA transmission mode is selected; otherwise the OMA transmission mode is selected. This task requires the channel state information (CSI) of each link to be exchanged between all nodes, including the relays and the control node (which can be any relay). Hence, the control node has all the information required to select the right transmission mode in every time-slot (channel use).
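The mode selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `capacity`, `noma_alpha_range` and `select_mode` are hypothetical, the inputs are instantaneous SNRs on a linear scale, and the NOMA feasibility test assumes the standard two-user model in which the weaker user's symbol is decoded first and removed by SIC at the stronger user.

```python
import math

def capacity(eta):
    """Link capacity in bps/Hz for an instantaneous SNR eta (linear scale)."""
    return math.log2(1.0 + eta)

def noma_alpha_range(eta_u1, eta_u2, gamma):
    """Range of power fractions for the stronger user's symbol, or None.

    Assumption: the weaker user's symbol gets the remaining power and is
    decoded first, with the stronger user's symbol treated as interference.
    """
    strong, weak = max(eta_u1, eta_u2), min(eta_u1, eta_u2)
    thr = 2.0 ** gamma - 1.0                    # SNR needed for rate gamma
    if weak <= thr:                             # weak user cannot reach gamma
        return None
    low = thr / strong                          # strong user reaches gamma after SIC
    high = (weak - thr) / ((thr + 1.0) * weak)  # weak user reaches gamma
    return (low, high) if low <= high else None

def select_mode(eta_sr, eta_u1, eta_u2, gamma):
    """Return 'NOMA', 'OMA' or 'OUTAGE' for one relay in one time-slot."""
    c_sr = capacity(eta_sr)
    # NOMA needs two packets on the first hop and a feasible power split.
    if c_sr >= 2.0 * gamma and noma_alpha_range(eta_u1, eta_u2, gamma):
        return "NOMA"
    # OMA fallback: one packet on each hop, to whichever user can be served.
    if c_sr >= gamma and max(capacity(eta_u1), capacity(eta_u2)) >= gamma:
        return "OMA"
    return "OUTAGE"
```

For example, with $\gamma = 2$ bps/Hz, `select_mode(100.0, 100.0, 50.0, 2.0)` selects NOMA, whereas `select_mode(100.0, 4.0, 3.5, 2.0)` falls back to OMA because no power split serves both users.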

B. PROBLEM FORMULATION
For the traditional OMA transmission mode in cooperative relay networks, the capacity of each transmission link has to reach the target data rate. In other words, $C_{sr_k}(t) \ge \gamma$ and either $C_{r_k u_1}(t) \ge \gamma$ or $C_{r_k u_2}(t) \ge \gamma$ must be satisfied for a successful single data packet transmission. The NOMA transmission mode is more restrictive than OMA, so more conditions need to be met. Specifically, by satisfying (2) and finding the optimal power allocation coefficient that satisfies (10) or (11), the transmission of two data packets becomes possible with NOMA. Therefore, to maximize the throughput of the network, we formulate the optimization problem (12), where $N$ is the number of time slots, $w(t) = 1$ denotes the OMA transmission mode, $w(t) = 0$ denotes the NOMA transmission mode, the indicator function $I(\cdot) = 1$ if the enclosed relation holds and $I(\cdot) = 0$ otherwise, and (15c) defines the range of the power allocation factor in NOMA mode. It is worth mentioning that the factor $\frac{1}{2}$ is due to the fact that the network is in a two-hop configuration. According to (12), to maximize the network throughput, we have to optimize the selection of the transmission mode and of the correct nodes at each time slot. When NOMA transmission is not supported, the system moves to the OMA mode rather than being in outage. For the NOMA transmission mode, the optimal power allocation factor $\alpha$ is required to ensure that the users' links are available for data transmission. Instead of solving (12) under simplifying assumptions that make it tractable, we employ supervised learning to maximize (12).

III. SUPERVISED-LEARNING-BASED JOINT RELAY SELECTION AND NOMA TRANSMISSION
The selection of the best relay can be regarded as a multi-class classification problem, and recent advances in machine learning can deal with classification tasks efficiently. The available machine learning approaches can be divided into two types, supervised and unsupervised learning algorithms; supervised learning is performed on labeled data [23]. One of the most widely used supervised learning algorithms is the support vector machine (SVM). SVM has proved to be an effective classification model that performs well with non-normally distributed data, non-linear relationships, and noisy and complex data [24]. The main idea of SVM is to select a hyperplane, or a set of hyperplanes, that is farthest from the training-data points of each class in order to separate the data. Finding the optimal hyperplanes is an optimization problem, which is solved by quadratic programming [25], [26]. Currently, there are platforms that provide implementations of SVM, such as scikit-learn [27].
We now explain the proposed supervised-learning based relay selection scheme. The first step is the collection of the experimental data; reliable data is crucial to obtain a trained model with adequate generalization ability. Because there are multiple relays, the problem is a multi-class classification problem. The selected features are the channel gains of all links $S \to R_k$ and $R_k \to U_d$, which are available from the channel state information (CSI). We apply the SVM algorithm to construct the classification problem and predict the label of the class to which the current links belong. The predicted class label represents the index $k$ of the best relay, i.e. the relay that maximizes the achievable throughput. To build the classification model, we need a sufficiently large training data set.

A. TRAINING DATA PREPARATION
We generate $L$ channel samples for training. Each sample is represented as a vector $S_i$ that collects the channel gains of the $S \to R_k$ and $R_k \to U_d$ links for all $K$ relays. All channel coefficients $h$ are randomly generated according to the Rayleigh distribution. There are $K$ relays, hence there are $K$ classes, and the class label represents the index of the relay. For each sample vector $S_i$, the label is generated from an exhaustive search for the global maximum of the joint optimization of relay selection and NOMA transmission in (12). In particular, the label of a sample vector $S_i$ is the index of the best relay for the link states defined in $S_i$, i.e. the relay that achieves the maximum throughput under the constraints in (12). This process is repeated to produce the entire training data set.
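The data preparation above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' generator: the names `packets_supported` and `make_dataset` are hypothetical; the features are taken to be the squared channel magnitudes $|h|^2$, which are exponentially distributed with unit mean for unit-variance Rayleigh fading; `snr_db` stands for $P_t/\sigma^2$ in dB; a relay is scored by the number of packets it can deliver (two with NOMA, one with OMA); and ties between equally good relays are broken by the lowest index, which the text does not specify.

```python
import random

def packets_supported(g_sr, g_u1, g_u2, snr, gamma):
    """Packets one relay can deliver end to end: 2 (NOMA), 1 (OMA) or 0."""
    thr = 2.0 ** gamma - 1.0                       # SNR needed for rate gamma
    eta_sr, eta_1, eta_2 = snr * g_sr, snr * g_u1, snr * g_u2
    strong, weak = max(eta_1, eta_2), min(eta_1, eta_2)
    # Assumed NOMA feasibility test: a power split serving both users exists.
    noma_ok = weak > thr and strong * (weak - thr) >= thr * (thr + 1.0) * weak
    if eta_sr >= 2.0 ** (2.0 * gamma) - 1.0 and noma_ok:   # C_sr >= 2 * gamma
        return 2
    if eta_sr >= thr and strong >= thr:                    # OMA fallback
        return 1
    return 0

def make_dataset(num_samples, num_relays, snr_db, gamma, seed=0):
    """Return (samples, labels); each sample holds 3 gains per relay."""
    rng = random.Random(seed)
    snr = 10.0 ** (snr_db / 10.0)
    samples, labels = [], []
    for _ in range(num_samples):
        # |h|^2 of a unit-variance complex Gaussian is Exp(1) distributed.
        gains = [tuple(rng.expovariate(1.0) for _ in range(3))
                 for _ in range(num_relays)]
        packets = [packets_supported(*g, snr, gamma) for g in gains]
        labels.append(packets.index(max(packets)))  # exhaustive search over relays
        samples.append([x for g in gains for x in g])
    return samples, labels
```

With $K = 3$ relays, each sample vector then has nine entries, and calling `make_dataset(L + T, 3, snr_db, gamma)` once yields both the training and the testing samples.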

B. SVM CLASSIFIER
The SVM classifier aims to find the hyperplane that separates classes of samples with the largest margin [31]. Given a training dataset of $L$ sample vectors $S_i$, where the label $o_i$ of each sample takes the value $+1$ or $-1$, the optimal hyperplane is formulated as

$$w^{T} S + v = 0,$$

where $w$ is the normal vector to the hyperplane, and $v$ is the offset that determines the distance between the hyperplane and the origin. In SVM, the hyperplane separating any two classes is optimized by solving the following problem:

$$\min_{w, v, \zeta} \ \frac{1}{2}\|w\|^2 + Z \sum_{i=1}^{L} \zeta_i \quad \text{s.t.} \quad o_i\left(w^{T} S_i + v\right) \ge 1 - \zeta_i, \ \ \zeta_i \ge 0, \ \ i = 1, \dots, L, \tag{15}$$

where $Z$ is the penalty constant, $\zeta_i$ is the error caused by the misclassification of sample $S_i$, and the slack term $Z \sum_{i=1}^{L} \zeta_i$ is added to make it a soft-margin problem. After that, multi-class classification is handled by combining multiple SVM classifiers.

C. TRAINING PHASE
The prepared training data set consists of $L$ tuples $\{S_i, o_i\}$ whose observations belong to $K$ classes. SVM classifiers aim to find the best hyperplanes to separate the classes. Classifying all of the training observations perfectly may make the classifier overly sensitive, so that adding a single data point may change the hyperplanes dramatically. For that reason, we might be willing to consider a classifier based on hyperplanes that do not perfectly separate the classes. In other words, it can be advantageous to misclassify a few training observations in order to improve the classification of the remaining observations; this is known as soft-margin SVM. The SVM classifier finds such hyperplanes by solving (15) for the optimal values of the parameters $w$, $v$ and $\zeta$. After the parameters are optimized, the best relay can be predicted for any new input of channel states. The importance of supervised learning lies in the fact that the training of the SVM classifiers is accomplished offline. In the training phase, an adequate number of CSI samples is processed to obtain efficient SVM classifiers. Upon completion of the training, the trained SVM classifiers are applied online and select the best relay with simple computations. Therefore, the computational complexity of the proposed SVM scheme is not expected to cause considerable overhead or delay.
VOLUME 11, 2023

D. TESTING PHASE
In the data preparation stage, the number of generated data samples should exceed the number of training samples $L$ by an adequate ratio, leaving $T$ unused samples for testing, where $T$ denotes the number of testing data samples. The same exhaustive search used to label the training data is used to label the testing data samples. The trained SVM classifiers are then used to predict the labels of the testing data samples, and their accuracy is calculated as the fraction of testing samples they label correctly. The proposed supervised learning based relay selection algorithm is summarized in Algorithm 1.

Algorithm 1 Supervised Learning Based Relay Selection in NOMA Networks
1: Initialize the channel state samples $S_i$, $i = 1, \dots, L + T$;
2: Calculate the best relay for each sample vector $S_i$ with the exhaustive search method, which finds the global maximum of (12);
3: Obtain the labels $o_i$ of all $L + T$ samples;
4: Train the SVM classifier parameters with the $L$ labeled data $\{S_i, o_i\}$ using (15);
5: Construct the trained SVM classifiers;
6: Calculate the accuracy of the trained SVM classifiers from their ability to correctly label the $T$ testing data samples;
7: Predict the best relay for any new input of channel states via the trained SVM classifiers.
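Steps 4 to 7 of Algorithm 1 can be sketched in plain Python. This is an illustration under assumptions, not the authors' implementation: the classifier is a linear soft-margin SVM trained by batch subgradient descent on the regularized hinge loss, used here as a simple stand-in for a quadratic-program solver; the multi-class strategy is one-vs-rest, which the text does not specify; all function names and the hyper-parameters `lam`, `lr` and `epochs` are hypothetical; and the toy two-dimensional clusters merely stand in for the labeled channel samples.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_binary_svm(samples, targets, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by batch subgradient descent.

    targets are +1 / -1. Returns the hyperplane parameters (w, v).
    """
    n, dim = len(samples), len(samples[0])
    w, v = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]    # gradient of the regulariser
        grad_v = 0.0
        for s, o in zip(samples, targets):
            if o * (dot(w, s) + v) < 1.0:  # margin violated: hinge term is active
                for j in range(dim):
                    grad_w[j] -= o * s[j] / n
                grad_v -= o / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        v -= lr * grad_v
    return w, v

def train_one_vs_rest(samples, labels, num_classes, **kwargs):
    """One binary SVM per relay index: class k against all others."""
    return [train_binary_svm(samples,
                             [1.0 if l == k else -1.0 for l in labels],
                             **kwargs)
            for k in range(num_classes)]

def predict(classifiers, sample):
    """Pick the class whose hyperplane gives the largest score."""
    scores = [dot(w, sample) + v for w, v in classifiers]
    return scores.index(max(scores))

def accuracy(classifiers, samples, labels):
    hits = sum(predict(classifiers, s) == l for s, l in zip(samples, labels))
    return hits / len(samples)

def toy_dataset(per_class, seed):
    """Three well-separated 2-D clusters standing in for labeled samples."""
    rng = random.Random(seed)
    centers = [(4.0, 0.0), (-2.0, 3.5), (-2.0, -3.5)]
    samples, labels = [], []
    for k, (cx, cy) in enumerate(centers):
        for _ in range(per_class):
            samples.append([cx + rng.gauss(0.0, 0.5), cy + rng.gauss(0.0, 0.5)])
            labels.append(k)
    return samples, labels

train_x, train_y = toy_dataset(40, seed=1)  # plays the role of the L training samples
test_x, test_y = toy_dataset(10, seed=2)    # plays the role of the T testing samples
classifiers = train_one_vs_rest(train_x, train_y, num_classes=3)
test_accuracy = accuracy(classifiers, test_x, test_y)
```

Only `predict` runs online: it costs one inner product per class, which is why the online burden stays low once the offline training has been completed.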
In general, the complexity of Algorithm 1 depends on the number of support vectors, the size of the training set and the number of features. This makes training a hard problem of the order $O(n^2)$ to $O(n^3)$, as in [28]. However, solving Algorithm 1 for the optimal solution is not practical, and aiming to increase the accuracy of the utilized classifiers is a common approach in the available studies. The accuracy grows with the training set size up to a certain limit, beyond which increasing the size of the training set is no longer efficient; this is shown in the next section.

IV. PERFORMANCE INVESTIGATION
In this section, we investigate the achievable throughput of the proposed SVM relay selection scheme. For comparison, we also show the performance of the two-stage max-min relay selection rule [14], the enhanced version of the two-stage max-min relay selection rule [15], the optimal exhaustive search scheme and the random relay selection scheme. In the exhaustive search scheme, the performance of all relays is tested and the relay that achieves the upper bound of the throughput is selected, while in the random relay selection scheme, one of the $K$ relays is selected at random to complete the transmission. The training set consists of $L = 2.4 \times 10^5$ channel realizations and the test set of $T = 6 \times 10^4$ channel realizations. A large number of training samples is vital to obtain a decent SVM classifier, whereas a smaller set of testing samples is sufficient to determine its accuracy; a common practice, followed here, is to assign 20 percent of the whole data set to testing. Table 1 shows the accuracy of the SVM classifiers at different sizes of the training set. All channel realizations $h$ are independent and identically distributed (i.i.d.) complex Gaussian random variables with zero mean and unit variance. Accordingly, the testing data samples are new to the trained SVM classifier. The remaining simulation parameters are set as follows: the transmit power $P_t$ at the source and at each relay is identical, there are two users in all simulations, the number of relays is $K = 3$ and the target transmission data rate is $\gamma = 2$ bps/Hz. Any change in these parameter values will be stated explicitly.
Fig. 2 compares the average system throughput of the supervised learning based relay selection scheme, the exhaustive search scheme, the two-stage max-min scheme, the improved two-stage max-min scheme and the random selection scheme. As expected, the exhaustive search scheme achieves the highest throughput. The performance of the supervised learning based scheme is close to that of the exhaustive search scheme and exceeds that of all other schemes. The gap between the supervised learning based scheme and the exhaustive search scheme is due to the accuracy of the supervised classifiers; as the accuracy increases, the gap diminishes. It is noteworthy that the improved two-stage max-min scheme performs poorly at low SNR, because it does not consider OMA transmission when NOMA transmission is not supported in bad channel conditions.
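The two reference curves that bracket the comparison, exhaustive search and random selection, can be reproduced in spirit with a small Monte Carlo sketch. It rests on assumptions that are not taken from the paper: the names `packets_supported` and `average_throughput` are hypothetical, the per-slot throughput is taken as $\frac{1}{2}\gamma$ times the number of delivered packets (two with NOMA, one with OMA, none in outage), `snr_db` stands for $P_t/\sigma^2$ in dB, and the NOMA feasibility test is the standard two-user condition.

```python
import random

def packets_supported(eta_sr, eta_u1, eta_u2, gamma):
    """Packets one relay can deliver end to end: 2 (NOMA), 1 (OMA) or 0."""
    thr = 2.0 ** gamma - 1.0
    strong, weak = max(eta_u1, eta_u2), min(eta_u1, eta_u2)
    noma_ok = weak > thr and strong * (weak - thr) >= thr * (thr + 1.0) * weak
    if eta_sr >= 2.0 ** (2.0 * gamma) - 1.0 and noma_ok:   # C_sr >= 2 * gamma
        return 2
    if eta_sr >= thr and strong >= thr:                    # OMA fallback
        return 1
    return 0

def average_throughput(num_relays, snr_db, gamma, num_slots, seed=0):
    """Return (exhaustive, random) average throughput in bps/Hz."""
    rng = random.Random(seed)
    snr = 10.0 ** (snr_db / 10.0)
    best_total = random_total = 0
    for _ in range(num_slots):
        # Instantaneous SNRs: |h|^2 is Exp(1) for unit-variance Rayleigh fading.
        packets = [packets_supported(snr * rng.expovariate(1.0),
                                     snr * rng.expovariate(1.0),
                                     snr * rng.expovariate(1.0),
                                     gamma)
                   for _ in range(num_relays)]
        best_total += max(packets)                         # exhaustive search
        random_total += packets[rng.randrange(num_relays)]  # random selection
    scale = 0.5 * gamma / num_slots                        # two-hop factor 1/2
    return best_total * scale, random_total * scale
```

Because the exhaustive search takes the per-slot maximum, its average can never fall below that of random selection, and under the assumed throughput definition both are bounded by $\gamma$.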
To show the benefit of considering both OMA and NOMA transmission, Fig. 3 compares the outage probability of the improved two-stage max-min scheme and the proposed supervised learning based scheme. The improved two-stage max-min scheme uses NOMA transmission only, so the system is in outage when the NOMA conditions are not satisfied. The proposed supervised learning based scheme, on the other hand, moves to OMA transmission rather than being in outage, which allows it to perform well even in bad channel conditions.
It is known that increasing the number of relays increases the system degrees of freedom, which gives the system more flexibility to avoid deep fading and support transmission. Fig. 4 illustrates the impact of doubling the number of relays to $K = 6$ on the average system throughput of the supervised learning based relay selection scheme and the improved two-stage max-min scheme. Consistently, the supervised learning based scheme outperforms the improved two-stage max-min scheme in both cases, $K = 3$ and $K = 6$. It can be observed that adding more relays has a greater impact on the performance of the improved two-stage max-min scheme than on that of the supervised learning based scheme. It should be pointed out that keeping the same training set size of $L = 2.4 \times 10^5$ samples has degraded the accuracy of the SVM classifiers to 89 percent. Therefore, the performance of the supervised learning based scheme can be improved by enlarging the training set or by using a more powerful classifier such as kernelized SVM, which can be examined in future research.
Another parameter that changes the performance of the system is the target data rate $\gamma$. Fig. 5 shows the average system throughput of the improved two-stage max-min scheme and the proposed supervised learning based scheme at three different values, $\gamma = 1$ bps/Hz, $\gamma = 3$ bps/Hz and $\gamma = 5$ bps/Hz. It can be observed that the performance gap between the two schemes is smallest for the least restrictive transmission, at $\gamma = 1$ bps/Hz. This is because NOMA transmission is easier to achieve as the target data rate is reduced, hence it is more probable that both schemes can select a relay that supports NOMA transmission. On the other hand, as $\gamma$ is increased, attaining NOMA transmission becomes harder and the selection of the correct relay becomes critical. Because the proposed supervised learning based scheme performs better than the improved two-stage max-min scheme, the performance gap between the two schemes grows as $\gamma$ is increased.
FIGURE 5. Impact of increasing the target data rate on the average system throughput of the improved two-stage max-min scheme and the proposed supervised learning based scheme, where $K = 3$, $\gamma = \{1, 3, 5\}$ bps/Hz.
FIGURE 6. Impact of fixing the source transmit power at $P_t = 10$ dB, $P_t = 15$ dB and $P_t = 20$ dB while changing the relays' transmit power on the average system throughput of the improved two-stage max-min scheme and the proposed supervised learning based scheme, where $K = 3$, $\gamma = 2$ bps/Hz.
Before ending this section, it is worth mentioning that the transmit power at the source and at the relays is not always identical. Fig. 6 shows the impact of fixing the source transmit power at several levels while varying the relays' transmit power on the average system throughput of the improved two-stage max-min scheme and the proposed supervised learning based scheme. It can be seen that reducing the transmit power at the source degrades the throughput of both schemes. This is because, when the source power is low, it becomes harder for the channel from the source to the selected relay to support NOMA transmission, which requires a data rate of no less than $2\gamma$, as it has to carry two packets, one for each user. Furthermore, two results can be deduced from Fig. 6. The first supports the results in Fig. 5: just as NOMA transmission is harder to attain at high $\gamma$ in Fig. 5, in Fig. 6 NOMA transmission becomes harder to accomplish as the source transmit power is decreased, and the performance gap between the improved two-stage max-min scheme and the proposed supervised learning based scheme grows accordingly. The second result is that the proposed supervised learning based scheme can help to reduce the power consumed for transmitting the signal at the source. This can be noticed by comparing the improved two-stage max-min scheme at 20 dB source transmit power with the proposed supervised learning based scheme at 15 dB source transmit power: the supervised learning based scheme outperforms the improved two-stage max-min scheme at lower relay transmit power (less than 7 dB), and the two schemes perform very similarly otherwise.
The latter result cannot be generalized to all cases. For example, the supervised learning based scheme at 10 dB source transmit power outperforms the improved two-stage max-min scheme at 15 dB source transmit power when the relay transmit power is less than 5 dB; above 5 dB, however, the two schemes perform differently and the improved two-stage max-min scheme performs better. Finally, the impact of different values of the power allocation factor $\alpha$, or of an adaptive $\alpha$, on the system performance can be part of future research.

V. CONCLUSION
In this paper, joint relay selection and NOMA transmission in cooperative networks is studied. To reduce the complexity of the problem, a supervised learning based scheme is considered. Multi-class SVM classifiers are suitable for selecting the best relay by treating each relay index $k$ as a class. Simulation results validate that the performance of the proposed supervised learning based scheme is the closest to that of the optimal exhaustive search scheme. The proposed scheme combines OMA and NOMA transmission by utilizing OMA when NOMA is not supported, rather than being in outage. The proposed supervised learning based scheme also outperforms the available schemes when the number of relays is increased; however, to obtain the full benefit of adding more relays, a larger training data set is required.