User Clustering and Resource Allocation in Hybrid NOMA-OMA Systems Under Nakagami-m Fading

In this paper, we tackle the problem of optimizing user clustering, power, and resource (time slot or bandwidth) allocation in the downlink of a hybrid non-orthogonal multiple access (NOMA)-orthogonal multiple access (OMA) system. In such a system, users are organized into several clusters under one of the following scenarios: (1) fixed cluster size, (2) fixed number of clusters, and (3) variable number of clusters and variable cluster size. A power domain NOMA (PD-NOMA) scheme is used in each cluster, while OMA is employed for allocating resources to different clusters. The goal is to maximize the minimum success probability (which is equivalent to minimizing the maximum outage probability) among all users to guarantee fairness. We prove that at the optimal solution, all users have the same success probability, which is called the common success probability (CSP). Then, we propose an efficient algorithm for finding the optimal CSP and cluster resource allocation factors simultaneously. The optimal power allocation factors and the optimal decoding order of users in each cluster are then derived in closed-form expressions based on the obtained optimal CSP. Simulation results show considerable performance gains by the proposed scheme, compared to existing schemes in terms of fairness, the minimum success probability of users, and the sum throughput.


I. INTRODUCTION
Substantial growth in the number of users and emerging high data-rate applications with strict quality-of-service (QoS) requirements pose new challenges for the design and planning of future generations of cellular networks. It has been widely acknowledged that it is imperative to employ more efficient multiple access schemes and improve their performance to cope with such demands. Over the last few years, non-orthogonal multiple access (NOMA) has received a lot of attention and is regarded as a promising multiple access scheme due to its ability to serve multiple users in the same time/frequency resource block. In particular, power-domain NOMA (PD-NOMA) is considered in various standardization activities since it can improve spectral efficiency, fairness and throughput of cell-edge users [1], [2]. In PD-NOMA, the base station (BS) combines the users' signals by superposition coding at the transmission side, whereas each user detects its own signal by successive interference cancellation (SIC). However, as the complexity and latency of the SIC method increase with the number of users [3], it is impractical when there is a large number of users in the network. To overcome this issue, it is possible to organize the users into several clusters and deploy orthogonal multiple access (OMA) techniques alongside NOMA.
The associate editor coordinating the review of this manuscript and approving it for publication was Cesar Vargas-Rosales.
In fact, the hybrid NOMA-OMA approach has been investigated in several works considering different design goals and under different assumptions [4]-[16]. In general, those existing works can be categorized based on different aspects such as performance metrics, optimization techniques, clustering methods and fading channel models. For example, some authors focus on maximizing the sum rate [4], [5], maximizing the energy efficiency (EE) [6], or minimizing the total power consumption [7]. Other authors consider establishing fairness among the users in terms of diversity order [8], data rate [9], outage [10], and throughput [11]. In addition, user clustering algorithms in hybrid NOMA-OMA systems are considered in several works. For example, heuristic user clustering methods are proposed in [12], [17] based on the channel gains, while machine learning methods are studied in [13]. However, none of these methods is based on closed-form expressions that quantify the resource demand of a cluster and hence facilitate the clustering algorithm. In contrast, the user clustering algorithm developed in this paper is based on closed-form expressions of the resource demand. Another observation with regard to the existing user clustering methods for hybrid NOMA-OMA systems is that many of them use static algorithms, which require the total number of users to be fixed before running the algorithm [14]. Other algorithms consider dynamic scenarios in which some users can enter or exit the network while the clustering algorithm is running [13]. Cluster size ($N$) is another important parameter in the clustering procedure. This parameter is fixed as $N = 2$ in some papers [14], [15], and as $N \geq 2$ in [16].
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Moreover, a recent work considers the more general case of having a variable number of users in each cluster [5], whereas the work in [13] allows users to dynamically leave their current cluster and join a better cluster based on some criteria.
A differentiating feature in the research works concerning the hybrid NOMA-OMA scheme is the assumption on the channel state information (CSI). Most of the works, such as [4]-[7], assume perfect instantaneous CSI, which is either impractical or imposes heavy signaling overhead. In contrast, requiring only statistical CSI (as is done in this paper) mitigates the overhead issue, since the channels can be monitored over longer periods of time and hence less feedback needs to be sent to the transmitter. Furthermore, most works on hybrid NOMA-OMA systems adopt the Rayleigh fading channel model [10], whereas a more general fading model, such as Nakagami-m fading, has not been considered in the literature.
For clarity, Table 1 summarizes the key points in the above discussion and highlights the differences among existing works on user clustering in hybrid NOMA-OMA systems with respect to research objectives and assumptions.
Considering the above background, in this paper we investigate the problem of user clustering, resource allocation and decoding order selection in a hybrid NOMA-OMA system. In order to guarantee fairness among all the users, we maximize the minimum success probability among them, which is equivalent to minimizing the maximum outage probability. The channel model is Nakagami-m fading and only statistical CSI is available at the transmitter. This channel model makes the resource allocation problem under consideration considerably more complex and affects all aspects of the solution, including the optimal decoding order of the users and the resource allocation factors. For user clustering, we consider three different scenarios: (a) fixed number of users in each cluster, (b) fixed number of clusters, and (c) variable number of clusters and variable number of users in each cluster.
In order to solve the problem of maximizing the minimum success probability among all the users in a hybrid NOMA-OMA system, we first prove that at the optimal solution, all users have the same success probability, which is called a common success probability (CSP). Then, we propose an efficient algorithm to find the optimal CSP and optimal resource allocation factors simultaneously. Next, we derive the optimal inter-cluster power allocation factor for each cluster in a closed form, which is the sum of optimal power allocation factors of individual users in that cluster. We also derive closed-form expressions for the optimal decoding order and intra-cluster power allocation factors of individual users based on the optimal CSP and resource allocation factor of each cluster.
In summary, the contributions of this paper are as follows:
• Proposing a novel scheme for user clustering, resource allocation and decoding order selection in a hybrid NOMA-OMA system to guarantee fairness among all users in terms of success probability (or, equivalently, its complement, the outage probability).
• Proposing an efficient algorithm for finding both the optimal CSP of the users and optimal resource (time slot or bandwidth) allocation factors of the clusters in the system.
• Deriving closed-form expressions for the optimal decoding order, individual user power allocation factors and cluster power allocation factors.
• Proposing three efficient user clustering algorithms considering constraints such as fixed cluster sizes or fixed number of clusters.
• Showing that establishing fairness among all users in a hybrid NOMA-OMA system in terms of the success probability of users can also improve the sum throughput of the system.
The rest of the paper is organized as follows. Section II describes the system model. Section III studies the optimal intra-cluster power allocation and decoding order selection for one cluster. Section IV examines the problem of optimal inter-cluster power and resource allocation. Section V proposes user clustering algorithms. Section VI describes the complete proposed scheme. Section VII evaluates the performance of the proposed scheme. Section VIII concludes the paper.

II. SYSTEM MODEL
We consider a hybrid NOMA-OMA downlink system with a single-antenna base station (BS) sending mutually independent information to $K$ single-antenna mobile users. With the hybrid NOMA-OMA, the BS arranges users into $L$ clusters. An orthogonal multiple access scheme such as time division multiple access (TDMA) or orthogonal frequency-division multiple access (OFDMA) is used across different clusters, whereas power-domain NOMA (PD-NOMA) is used within a cluster. The choice of the inter-cluster orthogonal multiple access scheme is irrelevant to the analysis in this paper. In particular, the resource allocation factor obtained for each cluster can be interpreted as a proportion of allocated time in TDMA or as a proportion of allocated bandwidth in OFDMA. Therefore, in the rest of the paper we refer to the time/bandwidth allocation factor simply as the resource allocation factor.
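Denote the index set of users by $\mathcal{K} = \{1, 2, \ldots, K\}$, the $k$th user by $U_k$, the index set of clusters by $\mathcal{C} = \{1, 2, \ldots, L\}$, the global index set of users in the $\ell$th cluster by $\mathcal{C}_\ell = \{\nu_{\ell,1}, \nu_{\ell,2}, \ldots, \nu_{\ell,|\mathcal{C}_\ell|}\}$, and the number of users in $\mathcal{C}_\ell$ by $|\mathcal{C}_\ell|$. In fact, $\mathcal{C}_\ell$ is a subset of $\mathcal{K}$, $\mathcal{C}_\ell \subset \mathcal{K}$, and contains the global (inter-cluster) indices of the users. Clustering should be done such that each user is a member of exactly one cluster. Thus, we should have
$$\bigcup_{\ell \in \mathcal{C}} \mathcal{C}_\ell = \mathcal{K}, \qquad \mathcal{C}_\ell \cap \mathcal{C}_{\ell'} = \emptyset \ \text{ for } \ell \neq \ell'. \tag{1}$$
We also define $\mathcal{I}_\ell = \{1, 2, \ldots, |\mathcal{C}_\ell|\}$, which is the intra-cluster index set of the users. Let the total power of the transmitter be $P_T$ and the total channel resource be $W_T$, which can be time or bandwidth. The power and resource allocation factors of cluster $\ell$ are denoted by $\delta_\ell$ and $\omega_\ell$, respectively ($0 < \delta_\ell < 1$, $0 < \omega_\ell < 1$). Thus, the power and resource allocated to cluster $\ell$ are $\delta_\ell P_T$ and $\omega_\ell W_T$, respectively. For cluster $\mathcal{C}_\ell$, the BS combines $|\mathcal{C}_\ell|$ independent signals of its users by superposition coding and sends the combined signal to them. Each user has to perform SIC to obtain its own signal. The transmitted signal for cluster $\mathcal{C}_\ell$, denoted by $x_\ell$, is given as
$$x_\ell = \sum_{i \in \mathcal{I}_\ell} \sqrt{\alpha_{\nu_{\ell,i}} \delta_\ell P_T}\; x_{\nu_{\ell,i}}. \tag{2}$$
In the above expression, $x_{\nu_{\ell,i}}$ is the transmitted signal of the $i$th user in the $\ell$th cluster, satisfying $E(|x_{\nu_{\ell,i}}|^2) = 1$, and $0 \leq \alpha_{\nu_{\ell,i}} \leq 1$ is the intra-cluster power allocation factor for $U_{\nu_{\ell,i}}$. Hence, $\alpha_{\nu_{\ell,i}} \delta_\ell$ specifies the proportion of the total power $P_T$ that is allocated to user $U_{\nu_{\ell,i}}$. Denote the Nakagami-m fading channel coefficient between the BS and $U_{\nu_{\ell,i}}$ by $h_{\nu_{\ell,i}}$, and the additive white Gaussian noise (AWGN) with zero mean and variance $N_0$ at $U_{\nu_{\ell,i}}$ by $z_{\nu_{\ell,i}}$. Then, the signal received by $U_{\nu_{\ell,i}}$ is
$$y_{\nu_{\ell,i}} = h_{\nu_{\ell,i}}\, x_\ell + z_{\nu_{\ell,i}}. \tag{3}$$
It follows that the normalized instantaneous SNR of the received signal at $U_{\nu_{\ell,i}}$ in the $\ell$th cluster, $\psi_{\nu_{\ell,i}}$, is given as
$$\psi_{\nu_{\ell,i}} = \frac{\delta_\ell}{\omega_\ell}\, \gamma_{\nu_{\ell,i}}, \tag{4}$$
where $\gamma_{\nu_{\ell,i}}$ is the normalized instantaneous SNR of $U_{\nu_{\ell,i}}$ when all the available power $P_T$ and resource $W_T$ are allocated to cluster $\mathcal{C}_\ell$ (i.e., $\delta_\ell = 1$, $\omega_\ell = 1$).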
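Thus, under the assumption of Nakagami-m fading, $\psi_{\nu_{\ell,i}}$ has a Gamma distribution [18]
$$f_{\psi_{\nu_{\ell,i}}}(x) = \frac{m_{\nu_{\ell,i}}^{m_{\nu_{\ell,i}}}\, x^{m_{\nu_{\ell,i}} - 1}}{\bar{\psi}_{\nu_{\ell,i}}^{\,m_{\nu_{\ell,i}}}\, \Gamma(m_{\nu_{\ell,i}})} \exp\!\left(-\frac{m_{\nu_{\ell,i}}\, x}{\bar{\psi}_{\nu_{\ell,i}}}\right), \quad x \geq 0, \tag{5}$$
where $m_{\nu_{\ell,i}} \geq 1/2$ is the shape factor,
$$m_{\nu_{\ell,i}} = \frac{\bar{\psi}_{\nu_{\ell,i}}^{\,2}}{\sigma^2_{\psi_{\nu_{\ell,i}}}}. \tag{6}$$
The quantities $\bar{\psi}_{\nu_{\ell,i}}$ and $\sigma^2_{\psi_{\nu_{\ell,i}}}$ are, respectively, the mean and variance of the instantaneous SNR $\psi_{\nu_{\ell,i}}$, and $\Gamma(\cdot)$ is the Gamma function, defined as
$$\Gamma(a) = \int_0^\infty t^{a-1} e^{-t}\, dt. \tag{7}$$

In this paper, to maximize the minimum success probability among all users in a hybrid NOMA-OMA system, we adopt a bottom-up problem solving approach. We first investigate the intra-cluster power allocation and decoding order optimization for one cluster. Then, based on the obtained results, we solve the inter-cluster power and resource allocation problem. Finally, we propose clustering algorithms and combine all the results into a unified scheme. For implementation, the BS follows these steps in the reverse order. First, it organizes the users into clusters. Then it determines the inter-cluster power and resource allocation factors. Finally, it calculates the optimal decoding order and intra-cluster power allocation factor of each user.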
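As a quick numerical illustration of the Gamma model for the SNR, the following minimal sketch draws instantaneous SNR samples and compares the empirical mean and variance with the shape-factor relation (shape equals squared mean over variance). The function names, the mean SNR of 4 and the shape factor m = 2 are illustrative assumptions, not values taken from the paper.

```python
import random

def sample_snr(mean_snr, m, rng):
    """Draw one instantaneous SNR under Nakagami-m fading.

    The SNR is Gamma distributed with shape m and scale mean_snr / m,
    so its mean is mean_snr and its variance is mean_snr**2 / m.
    """
    return rng.gammavariate(m, mean_snr / m)

def empirical_moments(mean_snr, m, n=100_000, seed=1):
    """Return the sample mean and sample variance of n SNR draws."""
    rng = random.Random(seed)
    xs = [sample_snr(mean_snr, m, rng) for _ in range(n)]
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

if __name__ == "__main__":
    mu, var = empirical_moments(4.0, 2.0)
    # The ratio mu**2 / var estimates the shape factor m.
    print(mu, var, mu * mu / var)
```

With a large enough sample, the ratio of the squared mean to the variance should be close to the chosen shape factor.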
For each cluster, an optimization problem should be solved to maximize the minimum success probability of the cluster users by optimizing the power allocation factor of each user and selecting the optimal decoding order of users within the cluster. In our previous work [19], we solved such a problem for a single NOMA cluster with $K$ users. Specifically, we proved in [19] that at the optimal solution of the problem, all users have an equal success probability, which we called the common success probability (CSP) of the users. Then, the optimal decoding order and optimal power allocation factors of the users were derived in closed form based on their CSP, and an efficient algorithm was proposed for finding the optimal CSP of the users. The results in [19] thus lay a foundation for the analysis and optimization of the hybrid NOMA-OMA system operating over Nakagami-m fading channels wherein users are assigned to several clusters. As such, in the next section we briefly review the results in [19]. In Section VI we extend the results of [19] to the more general case of hybrid NOMA-OMA.
Given the large number of parameters and notations used throughout the paper, Table 2 summarizes the main system parameters to facilitate reading the paper.

III. INTRA-CLUSTER POWER ALLOCATION AND DECODING ORDER SELECTION
Since this section focuses on power allocation and decoding order for users in one cluster, without loss of generality, we assume that all the power $P_T$ and resource $W_T$ are allocated to cluster $\mathcal{C}_\ell$ ($\delta_\ell = 1$ and $\omega_\ell = 1$). Thus, according to (4), the instantaneous SNR of the $i$th user in cluster $\ell$ is given as
$$\psi_{\nu_{\ell,i}} = \gamma_{\nu_{\ell,i}}. \tag{8}$$
Our objective is to maximize the minimum success probability by optimizing the intra-cluster power allocation factors and the decoding order among all users in the cluster.
With SIC decoding, each user decodes other "prior" user signals one by one, and cancels out their effects on the received signal until its own signal is obtained. In general, the decoding order is a permutation of the users' indices, denoted by $\pi_\ell = \{\pi_{\ell,1}, \pi_{\ell,2}, \ldots, \pi_{\ell,|\mathcal{C}_\ell|}\}$. If $\pi_{\ell,i} = k$, then $x_k$ is the $i$th signal to be decoded in cluster $\ell$. The SNR at $U_{\pi_{\ell,k}}$ that is relevant to decoding $x_{\pi_{\ell,i}}$ can be calculated as follows:
$$\gamma^{\pi_{\ell,k}}_{\pi_{\ell,i}} = \frac{\gamma_{\pi_{\ell,k}}\, \alpha_{\pi_{\ell,i}}}{\gamma_{\pi_{\ell,k}}\, \alpha^{I}_{\pi_{\ell,i}} + 1}, \tag{9}$$
where $\alpha^{I}_{\pi_{\ell,i}} = \sum_{j=i+1}^{|\mathcal{C}_\ell|} \alpha_{\pi_{\ell,j}}$ is simply the sum of intra-cluster power allocation factors of the users whose signals are decoded after $x_{\pi_{\ell,i}}$ (those signals are treated as noise). Therefore, based on Shannon's theorem, user $U_{\pi_{\ell,k}}$ cannot decode $x_{\pi_{\ell,i}}$ correctly if
$$\gamma^{\pi_{\ell,k}}_{\pi_{\ell,i}} < 2^{r_{\pi_{\ell,i}}} - 1, \tag{10}$$
or if one of the prior signals was not decoded successfully before decoding $x_{\pi_{\ell,i}}$. In (10), $r_{\pi_{\ell,i}}$ is the data rate of user $U_{\pi_{\ell,i}}$, normalized according to the total resource $W_T$, and $\gamma_{\pi_{\ell,i}}$ is the normalized SNR of the user (assuming that the total power $P_T$ and resource $W_T$ are allocated to one cluster $\mathcal{C}_\ell$). Thus, the outage event for user $U_{\pi_{\ell,k}}$ in decoding signal $x_{\pi_{\ell,i}}$ can be defined as
$$\mathcal{O}^{\pi_{\ell,k}}_{\pi_{\ell,i}} = \bigcup_{j=1}^{i} \left\{ \gamma^{\pi_{\ell,k}}_{\pi_{\ell,j}} < 2^{r_{\pi_{\ell,j}}} - 1 \right\}. \tag{11}$$
Note that for the notation $\mathcal{O}^{\pi_{\ell,k}}_{\pi_{\ell,i}}$ used for the outage event above, the superscript specifies the user who is performing the SIC, whereas the subscript specifies the signal that is being decoded.
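The outage event for user $U_{\pi_{\ell,k}}$ with respect to decoding $x_{\pi_{\ell,k}}$ is $\mathcal{O}^{\pi_{\ell,k}}_{\pi_{\ell,k}}$, which is simply the event that $U_{\pi_{\ell,k}}$ cannot decode $x_{\pi_{\ell,k}}$ (its own signal) correctly. Hence, the success probability of user $U_{\pi_{\ell,k}}$ can be written as
$$p_{\pi_{\ell,k}} = 1 - \Pr\!\left(\mathcal{O}^{\pi_{\ell,k}}_{\pi_{\ell,k}}\right). \tag{12}$$
In [19], we show that for each user $U_{\pi_{\ell,k}}$, a minimum SNR threshold for successful decoding can be found as
$$\gamma^{\mathrm{th}}_{\pi_{\ell,k}} = \frac{2^{r_{\pi_{\ell,k}}} - 1}{\alpha_{\pi_{\ell,k}} - \left(2^{r_{\pi_{\ell,k}}} - 1\right) \alpha^{I}_{\pi_{\ell,k}}}, \quad k \in \mathcal{I}_\ell. \tag{13}$$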
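Using the above expression simplifies the expression in (11) to
$$\mathcal{O}^{\pi_{\ell,k}}_{\pi_{\ell,k}} = \left\{ \gamma_{\pi_{\ell,k}} < \gamma^{\mathrm{th}}_{\pi_{\ell,k}} \right\}. \tag{14}$$
Consequently, it is shown in [19] that the success probability of user $U_{\pi_{\ell,k}}$ can be calculated as
$$p_{\pi_{\ell,k}} = Q\!\left(m_{\pi_{\ell,k}},\ \frac{m_{\pi_{\ell,k}}\, \gamma^{\mathrm{th}}_{\pi_{\ell,k}}}{\bar{\gamma}_{\pi_{\ell,k}}}\right), \tag{15}$$
where $Q(\cdot, \cdot)$ is the regularized upper incomplete gamma function, defined as [20]
$$Q(a, x) = \frac{1}{\Gamma(a)} \int_x^\infty t^{a-1} e^{-t}\, dt. \tag{16}$$
Then, we show that for maximizing the minimum success probability among users, all the users have an equal success probability, called the common success probability (CSP) (Theorem 2 in [19]). Subsequently, assuming that the optimal CSP of users in cluster $\ell$ is $p_\ell$, we show that the optimal decoding order is given by the ascending order of the parameter $\beta_{\pi_{\ell,k}}$ defined in (17), which is expressed through $Q^{-1}(\cdot, \cdot)$, the inverse function of $Q(a, x)$ with respect to the second parameter $x$ (see Lemma 3 in [19]). Note that the function $Q^{-1}(\cdot, \cdot)$ can be calculated using a numerical method. In other words, the optimal decoding order $\pi_\ell$ should be such that
$$\beta_{\pi_{\ell,1}} \leq \beta_{\pi_{\ell,2}} \leq \cdots \leq \beta_{\pi_{\ell,|\mathcal{C}_\ell|}}. \tag{18}$$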
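The regularized upper incomplete gamma function, its numerical inverse and the resulting decoding order can be sketched as follows. This is a minimal illustration under two loudly stated assumptions: the shape factor m is restricted to positive integers (so that Q has a finite-sum form), and the ordering parameter is taken as beta_k = mean_snr_k * Q^{-1}(m_k, p) / m_k, i.e., the SNR threshold that a user meets with probability exactly p. That choice of beta is our reading of the surrounding text, not a formula quoted from the paper; all function names are hypothetical.

```python
import math

def q_upper(m, x):
    """Regularized upper incomplete gamma Q(m, x), integer m >= 1 only."""
    term = total = 1.0
    for k in range(1, m):
        term *= x / k
        total += term
    return math.exp(-x) * total

def q_upper_inv(m, p, iters=200):
    """Solve Q(m, x) = p for x by bisection (Q decreases from 1 to 0 in x)."""
    lo, hi = 0.0, 1.0
    while q_upper(m, hi) > p:      # grow the bracket until Q(m, hi) <= p
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if q_upper(m, mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def success_probability(m, mean_snr, threshold):
    """Pr(SNR >= threshold) for a Gamma SNR with shape m and the given mean."""
    return q_upper(m, m * threshold / mean_snr)

def decoding_order(users, p):
    """users: list of (mean_snr, m). Returns (order, betas).

    ASSUMPTION: beta_k = mean_snr_k * Q^{-1}(m_k, p) / m_k.
    The order lists user indices by ascending beta (weakest first).
    """
    betas = [g / m * q_upper_inv(m, p) for g, m in users]
    order = sorted(range(len(users)), key=lambda k: betas[k])
    return order, betas
```

Under this assumption the order depends on the shape factors as well as the mean SNRs: for p = 0.9, a user with mean SNR 5 and m = 2 ends up with a larger beta than a user with mean SNR 10 and m = 1.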
The parameter $\beta$ represents the quality of the channel of each user. Thus, a user with a lower $\beta$ should be given a higher power allocation factor and a higher priority in the decoding order. Accordingly, the optimal intra-cluster power allocation factors for users in each cluster are given in closed form by (19) (for more details, see Theorem 3 in [19]).

In [22], necessary conditions are derived for the power allocation factors of users in a NOMA system to prevent the signal constellations from overlapping in the superposition coding. It is assumed that each of the $|\mathcal{C}_\ell|$ users of the NOMA cluster employs a square quadrature amplitude modulation (QAM) constellation. The modulation order $M_{\pi_{\ell,i}}$ and bit rate $r_{\pi_{\ell,i}}$ of user $U_{\pi_{\ell,i}}$ are related as
$$r_{\pi_{\ell,i}} = R_\ell \log_2 M_{\pi_{\ell,i}}, \tag{20}$$
where $R_\ell$ is the symbol rate of the transmitter for the $\ell$th cluster. Thus, we can restate the conditions derived in [22] for power allocation factors using the notation of this paper as (21), where $\zeta_{\pi_{\ell,i}} = 2^{r_{\pi_{\ell,i}}} - 1$ and, without loss of generality, we set $R_\ell = 1$ (for more details, the reader is referred to Proposition 1 and Inequality (19a) in [22]). In the next theorem, we prove that our proposed power allocation scheme always satisfies those necessary conditions.

Theorem 1: The power allocation factors (19) for any number of users $|\mathcal{C}_\ell|$ in the NOMA cluster and arbitrary modulation orders $M_{\pi_{\ell,i}}$ employed by the users satisfy the conditions given by (21).
Proof: See Appendix A.

Furthermore, the sum of all power allocation factors as derived in (19), denoted by $S(p_\ell, \pi_\ell)$, can be calculated in closed form as in (22), which is independent of the individual intra-cluster power allocation factors. The sum of intra-cluster power allocation factors $S(p_\ell, \pi_\ell)$ should be exactly one. A value less than one means that some of the allocated power remains unused, and a value higher than one means that the cluster is using more power than what has been allocated to it. Thus, in [19], we incorporated and proved the necessity of the constraint
$$S(p_\ell, \pi_\ell) = 1 \tag{23}$$
to find the optimal CSP in an efficient way by performing a binary search on the parameter $p_\ell$. For completeness, the algorithm for finding the optimal CSP is included in Appendix B. In the next section, we extend that algorithm to simultaneously find both the optimal CSP and the optimal inter-cluster resource allocation factors when users are grouped into several clusters in a hybrid NOMA-OMA system. We also generalize the obtained intra-cluster power allocation factors (19) to the case of hybrid NOMA-OMA in Section VI.
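The single-cluster procedure reviewed in this section (common success probability, decoding order, power factors summing to one, binary search on p) can be sketched as below. This is a hedged illustration, not the algorithm of Appendix B: it assumes integer shape factors, takes beta_k = mean_snr_k * Q^{-1}(m_k, p) / m_k, and obtains the power factors by setting every user's threshold in (13) equal to its beta, which gives the backward recursion alpha_k = zeta_k * (1 / beta_k + sum of later alphas). All names are hypothetical.

```python
import math

def q_upper(m, x):
    """Regularized upper incomplete gamma Q(m, x), integer m >= 1 only."""
    term = total = 1.0
    for k in range(1, m):
        term *= x / k
        total += term
    return math.exp(-x) * total

def q_upper_inv(m, p, iters=200):
    """Solve Q(m, x) = p for x by bisection."""
    lo, hi = 0.0, 1.0
    while q_upper(m, hi) > p:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if q_upper(m, mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def allocate(users, p):
    """users: list of (mean_snr, m, rate). Returns (order, alphas) for target CSP p.

    order[i] is the index of the i-th decoded user and alphas[i] is its
    power factor. ASSUMPTION: threshold (13) is set equal to beta for every user.
    """
    betas = [g / m * q_upper_inv(m, p) for g, m, _ in users]
    order = sorted(range(len(users)), key=lambda k: betas[k])
    alphas = [0.0] * len(users)
    later = 0.0                      # total power of signals decoded later
    for pos in range(len(users) - 1, -1, -1):
        k = order[pos]
        zeta = 2.0 ** users[k][2] - 1.0
        alphas[pos] = zeta * (1.0 / betas[k] + later)
        later += alphas[pos]
    return order, alphas

def optimal_csp(users, eps=1e-9):
    """Binary search for the largest p whose power factors sum to at most one."""
    lo, hi = 0.0, 1.0
    while hi - lo > eps:
        p = 0.5 * (lo + hi)
        if sum(allocate(users, p)[1]) > 1.0:
            hi = p                   # target too ambitious for the power budget
        else:
            lo = p
    return lo
```

For two Rayleigh users (m = 1) with mean SNRs 100 and 10 and unit rates, the sum of power factors under these assumptions is 0.12 / (-ln p), so the search should settle near p = exp(-0.12), with the weaker user decoded first and given most of the power.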

IV. INTER-CLUSTER POWER AND RESOURCE ALLOCATION
As explained in the previous section, within a cluster, maximizing the minimum success probability of all users can be done by the following steps:
1) Find the optimal CSP $p_\ell$ by running Algorithm 7 (see Appendix B).
2) Select the optimal decoding order of users in the cluster according to (18).
3) Calculate the optimal power allocation factors for users by (19).
Since all users in a cluster have the same success probability $p_\ell$, the problem of maximizing the minimum success probability of users across all clusters can be formulated as problem (24). Similar to the intra-cluster optimization problem, we can also prove that at the optimal solution of the inter-cluster optimization problem in (24), the success probabilities of all users are equal. This result is summarized in the following lemma.
Lemma 1: At the optimal solution of problem (24), the success probabilities of all users across all the clusters are equal and we have
$$p_1 = p_2 = \cdots = p_L = p. \tag{25}$$
Proof: See Appendix C.

Furthermore, we have the following results regarding the constraints of problem (24).

Lemma 2: At the optimal solution of problem (24), constraints (24b) and (24d) are satisfied with inequality, and constraint (24c) is satisfied with equality.
Proof: This lemma can be proved by contradiction. Suppose that for one of the clusters, either constraint (24b) or (24d) is satisfied with equality. Then the success probability of that cluster would be zero, which contradicts the objective of maximizing the minimum success probability of all users. On the other hand, if constraint (24c) is satisfied with inequality, then all the cluster power allocation factors $\delta_\ell$, $\ell \in \mathcal{C}$, can be multiplied by $1/\sum_{\ell \in \mathcal{C}} \delta_\ell$. Because the success probability is a strictly increasing function of the power allocation factors, this scaling increases the success probabilities of users in all clusters, which is a contradiction. Thus, the lemma is proved.
Recall that the results in the previous section were obtained when the total power $P_T$ and resource $W_T$ are allocated to a single cluster $\mathcal{C}_\ell$ and the resulting data rates and SNRs of users in the cluster are normalized according to those parameters. In this section the power and resource allocated to cluster $\mathcal{C}_\ell$ are $\delta_\ell P_T$ and $\omega_\ell W_T$, respectively. Thus, instead of parameters $r$ and $\bar{\gamma}$, we need to use parameters $r/\omega_\ell$ and $\delta_\ell \bar{\gamma}/\omega_\ell$, respectively, in the function $S(p_\ell, \pi_\ell)$ defined in (22). On the other hand, from Lemma 1 we know that at the optimal solution of problem (24) the success probabilities of all users across all the clusters are the same. Thus, assuming that the CSP is $p$, we can rewrite (23) for each cluster as in (26), where $\pi_{\ell,i}$ is the index of the $i$th user in the optimal decoding order of cluster $\mathcal{C}_\ell$. From (26) we can derive the power allocation factor $\delta_\ell$ of each cluster in closed form, based on its resource allocation factor, the CSP and the statistical CSI; this expression is referred to as (27). In Lemma 2 we proved that at the optimal solution of problem (24), the sum of all inter-cluster power allocation factors $\delta_\ell$ is equal to one. Thus, if we denote the vector of all inter-cluster resource allocation factors as $\boldsymbol{\omega} = [\omega_1, \omega_2, \ldots, \omega_L]$ and define $h(p, \boldsymbol{\omega})$ as
$$h(p, \boldsymbol{\omega}) = \sum_{\ell \in \mathcal{C}} \delta_\ell, \tag{28}$$
with $\delta_\ell$ given by (27), then according to (27) and (28) we should have
$$h(p, \boldsymbol{\omega}) = 1. \tag{29}$$
Therefore, we can reformulate the problem in (24) as problem (30). Under the normal and expected condition that $0.5 \leq p \leq 1$, we can prove that problem (30) is convex. As such, we propose an efficient algorithm for solving it by utilizing the Karush-Kuhn-Tucker (KKT) conditions [23].
Lemma 3: Problem (30) is convex when $0.5 \leq p \leq 1$.
Proof: Refer to Appendix D.

According to Lemma 3, after modifying the last constraint of problem (30) to $0.5 \leq p \leq 1$, the problem is convex, and the KKT conditions of the resulting convex optimization problem are listed in (31); they include the stationarity and dual feasibility conditions. Based on (31h), (31i) and (31j), it is straightforward to verify that all $\mu$ values should be equal to zero. Otherwise, the result is a special case that is neither practically feasible nor important. For instance, any $\mu_\ell$, $\ell \in \{1, 2, \ldots, L\}$, not being zero means that the resource allocation factor and, consequently, the success probability of that cluster is zero. Thus, the KKT conditions (31) can be rewritten as (32). To simplify the relations we define $\lambda = -\lambda_2/\lambda_1$ in (32a). Algorithm 1 is then proposed to find the inter-cluster resource allocation factors $\omega_\ell$, $\ell \in \mathcal{C}$, and the CSP $p$ of all users simultaneously.
In this algorithm, the parameter $\epsilon$ specifies the precision of the output parameters and can be chosen arbitrarily as an input of the algorithm. As a default value we set it to $\epsilon = 10^{-3}$ in the simulations. In lines 4 and 7 of this algorithm, it is necessary to find $\lambda$ and $\boldsymbol{\omega}$ such that (32a) and (32d) are satisfied. These parameters can be found using Algorithms 2 and 3, respectively, and they are discussed further below. In Algorithm 1, the value of $p$ is not restricted to the interval (0.5, 1) as stated by constraint (32f). For the case $p \in (0, 0.5)$, the algorithm will converge to a solution that guarantees fairness among the users; however, we cannot prove the optimality of such a solution by using the KKT conditions. As pointed out before, the case in which the success probabilities are less than 0.5 is not practically important.

Algorithm 1 (lines 7 to 16 of the listing):
7: Find λ, ω according to Algorithms 2 and 3, respectively.
8: if h(p, ω) < 1 then
9:     p_L = p
10: else if h(p, ω) > 1 then
11:     p_H = p
12: else
13:     return p, ω
14: end if
15: end while
16: return p, ω.
Parameter $\lambda$ can be found using (32a). In Appendix D (proof of Lemma 3), we obtain $\partial h(p, \boldsymbol{\omega})/\partial \omega_\ell$ in (62). Therefore, $\lambda$ can be derived as in (33), which is expressed through the function $f(x) = x(\ln x - 1)$. It is clear from (33) that $\lambda < 0$ always holds true. On the other hand, (65) in Appendix D implies that $\lambda$ is a strictly increasing function of $\omega_\ell$, and hence the inverse function $\omega_\ell(\lambda)$ is also strictly increasing in $\lambda$. Thus, using the fact that the $\omega_\ell$ values should be such that (32d) is satisfied, we can derive boundaries for the acceptable range of $\lambda$ values. We know that $\omega_\ell \in (0, 1)$ and, based on (33), choosing $\omega_\ell$ in the neighborhood of zero results in $\lambda \to -\infty$. This means that if $\lambda$ is less than a threshold value, then all $\omega_\ell$ values will be close to zero, and their sum will not add up to one to satisfy (32d). Therefore, an acceptable range for parameter $\lambda$ is as follows:
$$\lambda \in (\lambda_{\min}, \lambda_{\max}). \tag{37}$$
Recall that $\omega_\ell$ is a strictly increasing function of $\lambda$. Thus, if $\lambda > \lambda_{\max}$ the summation $\sum_{\ell \in \mathcal{C}} \omega_\ell > 1$, and if $\lambda < \lambda_{\min}$ the summation $\sum_{\ell \in \mathcal{C}} \omega_\ell < 1$. Now that the parameter $\lambda$ is bounded, we can adopt a binary search for finding its value, as outlined in Algorithm 2. In line 6 of this algorithm (find $\omega_\ell$, $\ell \in \mathcal{C}$, according to Algorithm 3), it is necessary to calculate the $\omega_\ell$ values, which are bounded to the interval (0, 1) and should satisfy (33) with the given value of $\lambda$. To this end, we consider the function $e(\omega_\ell)$ defined in (38). The root of $e(\omega_\ell)$ is the optimal value of $\omega_\ell$. Therefore, based on the fact that $\lambda$ is a strictly increasing function of $\omega_\ell$, a binary search can be used to find the optimal $\omega_\ell$, as proposed in Algorithm 3. By using Algorithms 1, 2 and 3 we can obtain the optimal CSP $p$ and the optimal inter-cluster resource allocation factors $\omega_\ell$, $\ell \in \mathcal{C}$. Then, the inter-cluster power allocation factors $\delta_\ell$, $\ell \in \mathcal{C}$, can be readily found from the closed-form expression in (27).
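The three nested binary searches can be sketched as follows. This is an illustration under explicit assumptions rather than a transcription of Algorithms 1 to 3: it uses the Rayleigh special case m = 1 (so that beta = -mean_snr * ln p), and it assumes the cluster power factor delta(omega, p) = omega * sum_k (2^(R_k/omega) - 2^(R_{k-1}/omega)) / beta_k, where R_k is the cumulative rate of the first k decoded users. That closed form follows from substituting r/omega and delta*gamma/omega into the power-sum constraint under the threshold-equals-beta reading; its derivative with respect to omega involves f(x) = x(ln x - 1) and is always negative. All function names are hypothetical.

```python
import math

def f(x):
    return x * (math.log(x) - 1.0)

def beta(mean_snr, p):
    # ASSUMPTION: Rayleigh special case (m = 1), where Q^{-1}(1, p) = -ln p.
    return -mean_snr * math.log(p)

def delta(cluster, omega, p):
    """Assumed closed form for the power factor of one cluster.

    cluster: list of (mean_snr, rate) in decoding order (weakest first).
    """
    total, prefix = 0.0, 0.0
    for g, r in cluster:
        total += (2.0 ** ((prefix + r) / omega) - 2.0 ** (prefix / omega)) / beta(g, p)
        prefix += r
    return omega * total

def d_delta(cluster, omega, p):
    """d(delta)/d(omega): negative and increasing in omega."""
    total, prefix = 0.0, 0.0
    for g, r in cluster:
        if (prefix + r) / omega > 500.0:   # avoid float overflow near omega = 0
            return -math.inf
        total += (f(2.0 ** ((prefix + r) / omega)) - f(2.0 ** (prefix / omega))) / beta(g, p)
        prefix += r
    return -total

def bisect(fn, lo, hi, iters=50):
    """Root of an increasing function fn on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fn(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def omegas_for(clusters, lam, p):          # plays the role of Algorithm 3
    return [bisect(lambda w: d_delta(c, w, p) - lam, 0.0, 1.0) for c in clusters]

def split_resources(clusters, p):          # plays the role of Algorithm 2
    n = len(clusters)
    slopes = [d_delta(c, 1.0 / n, p) for c in clusters]
    lam = bisect(lambda x: sum(omegas_for(clusters, x, p)) - 1.0, min(slopes), max(slopes))
    return omegas_for(clusters, lam, p)

def h(clusters, omegas, p):
    return sum(delta(c, w, p) for c, w in zip(clusters, omegas))

def solve(clusters):                       # plays the role of Algorithm 1
    p = bisect(lambda q: h(clusters, split_resources(clusters, q), q) - 1.0, 0.0, 1.0, iters=40)
    return p, split_resources(clusters, p)
```

The bracket for lambda is taken from the slopes at the equal split omega = 1/L, which is one simple way to guarantee that the sum of the resource factors crosses one inside the bracket. For two identical single-user clusters the sketch should return an equal split, as symmetry requires.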
It should be pointed out that Algorithms 1, 2 and 3 are operated jointly to find the optimal CSP and resource allocation factors. Specifically, Algorithm 1 performs a binary search on the CSP $p$ of the clusters, and finds the optimal value in $\log_2(1/\epsilon)$ iterations. In each iteration, it calls Algorithm 2, which also performs a binary search to find the proper value of $\lambda$ in $\log_2(1/\epsilon)$ iterations. Likewise, Algorithm 2 in each iteration calls Algorithm 3 to find the values of $\omega_\ell$ by another binary search. These three nested binary search algorithms find the optimal values of $\omega_\ell$ and the CSP $p$ of all $L$ clusters in $L \left(\log_2(1/\epsilon)\right)^3$ iterations, which grows linearly with the number of clusters $L$. In contrast, the exhaustive search method would need to evaluate $(1/\epsilon)^{2L+1}$ states to find the optimal $\omega_\ell$ values, optimal $\delta_\ell$ values and optimal CSP of users, which grows exponentially with the number of clusters $L$. Thus, the computational complexity of our proposed method is much less than that of the exhaustive search method.
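To give a feel for the two iteration counts quoted above, the short sketch below evaluates both expressions for a given number of clusters and precision. The function names and the example values (four clusters, precision 10^-3) are illustrative assumptions.

```python
import math

def nested_search_iterations(num_clusters, eps):
    """L * (log2(1/eps))**3 for the three nested binary searches."""
    return num_clusters * math.log2(1.0 / eps) ** 3

def exhaustive_states(num_clusters, eps):
    """(1/eps)**(2L + 1) grid points for an exhaustive search."""
    points = round(1.0 / eps)
    return points ** (2 * num_clusters + 1)

if __name__ == "__main__":
    print(nested_search_iterations(4, 1e-3))   # a few thousand iterations
    print(exhaustive_states(4, 1e-3))          # 10**27 states
```

For four clusters and a precision of 10^-3, the nested search needs on the order of a few thousand iterations, whereas the exhaustive grid has 10^27 states.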

V. PROPOSED USER CLUSTERING ALGORITHMS
Building on the results given in the previous section, in this section we propose user clustering algorithms for the following three cases:
1) The number of users in each cluster $|\mathcal{C}_\ell|$ is fixed.
2) The total number of clusters $L$ is fixed, but the number of users in each cluster $|\mathcal{C}_\ell|$ can be variable.
3) Both $|\mathcal{C}_\ell|$ and $L$ are variable.
All three algorithms are developed based on the same principle of minimizing the power consumption of all clusters according to the closed-form expression (27) for the power allocation factor of each cluster. We consider constant values for the resource allocation factor $\omega_\ell$ and the target success probability $p$. In the clustering step, the goal is to find users who can cooperate the best in a NOMA setting, in the sense that they need the least power to achieve a given target success probability. After finding the clustering structure, the optimal power allocation factor, resource allocation factor, and optimal CSP of users are determined based on the total available power and resource at the transmitter, according to the results of the previous section. We also investigate the impact of selecting the initial value of the CSP on the performance of user clustering by simulations and show that, even without iterating over multiple initial values of the CSP, our proposed algorithms outperform existing algorithms (see Section VII). Therefore, in developing the clustering algorithms we assume that the resource is allocated equally to all clusters and consider $p = 0.95$ as the target success probability (any other values can be chosen instead). To derive the cost metric $\delta_\ell$ for any cluster $\mathcal{C}_\ell$, it is necessary to select the optimal decoding order $\pi_\ell$ according to (18).

A. CASE 1: EQUAL NUMBER OF USERS IN ALL CLUSTERS
Let $K$ be the total number of users and $N$ the number of users in each cluster. Then the number of clusters is $L = \lceil K/N \rceil$ (the number of users in the last cluster may be less than $N$ if $N$ does not divide $K$). For initialization of the clustering algorithm, we assume that the total available resource is divided equally among the clusters, i.e., $\omega_\ell = 1/L$, $\ell \in \mathcal{C}$. We also consider an arbitrarily given target success probability, for example $p = 0.95$.
We first sort users based on the ascending order of the parameter $\beta_{\pi_k}$ defined in (17). The first user in the list is simply selected as the first user of the first cluster. To choose the second user of the first cluster, we examine every remaining user in the list together with the first user and form a two-user cluster. We calculate $\delta_\ell$ for each of these two-user clusters according to (27) with $\omega_\ell = 1/L$, $p = 0.95$ and the optimal decoding order in (18). Then the user yielding the lowest $\delta_\ell$ is chosen as the second user of the first cluster. The same procedure is then repeated in order to choose the 3rd, 4th, ..., and $N$th users of the first cluster. After selecting the $N$th user of the first cluster, we continue with the same procedure to create the next clusters until all users are clustered. Algorithm 4 provides pseudo-code for the proposed clustering scheme.
Ignoring the complexity of selecting the first user in each cluster, for selecting the second user in the first cluster $\delta_\ell$ should be calculated $K - 1$ times, for selecting the third user $\delta_\ell$ needs to be calculated $K - 2$ times, and so on. Thus, the number of $\delta_\ell$ evaluations in Algorithm 4 is at most
$$\sum_{i=1}^{K-1} (K - i) = \frac{K(K-1)}{2},$$
which increases polynomially with the total number of users $K$. It should also be pointed out that Algorithm 4 is a static algorithm, since all the users should be available before running the algorithm.
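The greedy procedure described for Case 1 can be sketched as follows. This is not the paper's Algorithm 4 listing; it is a minimal illustration that assumes Rayleigh fading (m = 1, so sorting by mean SNR is the same as sorting by beta) and uses an assumed closed form for the cluster power factor as the cost. The names `cluster_cost` and `cluster_fixed_size` are hypothetical.

```python
import math

def cluster_cost(cluster, omega, p):
    """Assumed power factor needed by a cluster of (mean_snr, rate) users.

    ASSUMPTIONS: Rayleigh fading (beta = -mean_snr * ln p) and the cost
    omega * sum_k (2**(R_k/omega) - 2**(R_{k-1}/omega)) / beta_k, with users
    decoded in ascending order of beta and R_k the cumulative rate.
    """
    total, prefix = 0.0, 0.0
    for g, r in sorted(cluster):               # ascending mean SNR
        b = -g * math.log(p)
        total += (2.0 ** ((prefix + r) / omega) - 2.0 ** (prefix / omega)) / b
        prefix += r
    return omega * total

def cluster_fixed_size(users, size, p=0.95):
    """Greedy clustering with (at most) `size` users per cluster."""
    num_clusters = math.ceil(len(users) / size)
    omega = 1.0 / num_clusters                 # equal resource split
    remaining = sorted(users)                  # weakest user first
    clusters = []
    while remaining:
        current = [remaining.pop(0)]           # seed with the weakest user left
        while len(current) < size and remaining:
            # Pick the partner that makes the cluster cheapest in power.
            best = min(remaining, key=lambda u: cluster_cost(current + [u], omega, p))
            remaining.remove(best)
            current.append(best)
        clusters.append(current)
    return clusters
```

With four unit-rate users of mean SNR 1, 2, 50 and 100 and clusters of size two, this sketch pairs the weakest user with the strongest one, which matches the usual intuition for NOMA pairing.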

B. CASE 2: FIXED NUMBER OF CLUSTERS L
Recall that Algorithm 4 assumes that the number of users in each cluster is fixed, which also means the number of clusters is fixed. For the case considered in this subsection, we relax that constraint and require only that the total number of clusters is fixed, whereas there is no constraint on the number of users in each cluster. To put $K$ users into $L$ clusters, we first sort the list of users based on the ascending order of the parameter $\beta_{\pi_k}$ in (17). Then we choose the first $L$ users of the sorted list (who have the weakest channels) and put them into $L$ clusters. Thus, after this step, each cluster has one user. For clustering the rest of the users, following the sorted list, we calculate $\delta_\ell$, $\ell \in \{1, 2, \ldots, L\}$, for each user assuming that it has joined cluster $\mathcal{C}_\ell$ and select the cluster that results in the minimum value of $\delta_\ell$ (after adding that user).
Algorithm 5 gives pseudo-code for this clustering scheme. This algorithm can be deployed in a dynamic scenario as well, since any newly arrived user can join one of the existing clusters based on the criterion of minimizing $\delta_\ell$ without changing the whole clustering structure. Sorting the users based on $\beta_{\pi_k}$ in advance has the benefit of simplifying the calculation of $\delta_\ell$, as explained next. In calculating $\delta_\ell$ for a cluster, it is necessary to select the optimal decoding order for that cluster according to (18). If the users are sorted first, however, each user who joins a cluster will be the last user in the optimal decoding order of that cluster. For newly arrived users in a dynamic scenario, the optimal decoding order must still be calculated.
Ignoring the complexity of clustering the first L users, $\delta_\ell$ must be calculated L times for each of the remaining users. Thus, the computational complexity of Algorithm 5 is proportional to (K − L)L, which increases polynomially with the number of users K and the number of clusters L.
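A minimal sketch of this fixed-cluster-count procedure is given below, under the same caveat as before: `greedy_fixed_count_clustering` and `cluster_cost` are hypothetical names, and `cluster_cost` only stands in for the closed-form cost in (27). The sketch makes the (K − L)L cost evaluations explicit.

```python
from typing import Callable, List, Sequence


def greedy_fixed_count_clustering(
    betas: Sequence[float],
    num_clusters: int,
    cluster_cost: Callable[[List[int]], float],
) -> List[List[int]]:
    """Assign users to a fixed number of clusters, one user at a time.

    betas[k] plays the role of the sorting parameter in (17).
    cluster_cost(members) is a hypothetical stand-in for delta in (27).
    """
    order = sorted(range(len(betas)), key=lambda k: betas[k])
    # The first L users of the sorted list seed the L clusters.
    clusters = [[u] for u in order[:num_clusters]]
    for u in order[num_clusters:]:
        # Evaluate each cluster's cost as if user u had joined it (L evaluations).
        best = min(
            range(len(clusters)),
            key=lambda idx: cluster_cost(clusters[idx] + [u]),
        )
        clusters[best].append(u)
    return clusters
```

Because the users are processed in sorted order, a newly placed user is always appended at the end of its cluster's list, which mirrors the observation that it becomes the last user in that cluster's decoding order.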

C. CASE 3: VARIABLE CLUSTER SIZE $|C_\ell|$ AND VARIABLE CLUSTER COUNT L
In this case, we examine the most general scenario, in which both the number of users in each cluster and the total number of clusters are variable. Considering the latency and computational complexity of SIC, it is reasonable to set limits on the minimum and maximum numbers of clusters, $L_{\min}$ and $L_{\max}$, respectively. In general, when the number of clusters decreases, more resource can be allocated to each cluster. On the other hand, as the number of users in each cluster increases, each cluster needs more power to achieve a target success probability. The computational complexity and latency of SIC also increase for a larger cluster. In this case, we employ Algorithm 5 to search over all numbers of clusters L in the range $\{L_{\min}, L_{\min}+1, \ldots, L_{\max}\}$. For each value of L, we cluster the users according to Algorithm 5 and, assuming a target common success probability (such as p = 0.95) and equal resource allocation ($\omega = [1/L, 1/L, \ldots, 1/L]$), we derive the sum of the power allocation factors of the clusters according to the closed-form expression h(p, ω) given in (28). Then, we choose the clustering that results in the minimum sum of power allocation factors over all clusters.
Algorithm 6 provides pseudo-code for this clustering scheme. Since this algorithm runs Algorithm 5 in each iteration, its computational complexity is proportional to $\sum_{L=L_{\min}}^{L_{\max}} (K - L)L \approx (L_{\max} - L_{\min} + 1)(K - L_{\mathrm{avg}})L_{\mathrm{avg}}$, where $L_{\mathrm{avg}} = [L_{\min} + L_{\max}]/2$. Thus, the computational complexity of this algorithm still increases polynomially with the number of users K and the number of clusters L.
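The search over the number of clusters can be sketched as below. This is an illustrative sketch only: `cluster_into`, `search_cluster_count` and the callable `cost` are hypothetical names, `cost(members, L)` stands in for the per-cluster power factor evaluated with equal resource shares 1/L, and the sum of these costs stands in for h(p, ω) in (28).

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical cost signature: (cluster members, number of clusters) -> cost.
Cost = Callable[[List[int], int], float]


def cluster_into(betas: Sequence[float], num_clusters: int, cost: Cost) -> List[List[int]]:
    """Fixed-cluster-count greedy step (the role Algorithm 5 plays in the text)."""
    order = sorted(range(len(betas)), key=lambda k: betas[k])
    clusters = [[u] for u in order[:num_clusters]]
    for u in order[num_clusters:]:
        best = min(
            range(num_clusters),
            key=lambda idx: cost(clusters[idx] + [u], num_clusters),
        )
        clusters[best].append(u)
    return clusters


def search_cluster_count(
    betas: Sequence[float], l_min: int, l_max: int, cost: Cost
) -> Tuple[List[List[int]], float]:
    """Try every cluster count in [l_min, l_max] and keep the cheapest clustering."""
    best_clusters: List[List[int]] = []
    best_total = float("inf")
    for num_clusters in range(l_min, min(l_max, len(betas)) + 1):
        clusters = cluster_into(betas, num_clusters, cost)
        # Sum of per-cluster costs, standing in for h(p, omega) in (28).
        total = sum(cost(c, num_clusters) for c in clusters)
        if total < best_total:
            best_clusters, best_total = clusters, total
    return best_clusters, best_total
```

A toy cost such as `L + len(members) ** 2` reproduces the trade-off described above: fewer clusters leave more resource per cluster, while larger clusters need more power.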

VI. THE COMPLETE USER CLUSTERING, POWER AND RESOURCE ALLOCATION SCHEME
In the previous sections, we developed and presented user clustering algorithms, inter-cluster power and resource allocation schemes, and intra-cluster power allocation and decoding order selection separately. In this section, we combine them into a unified procedure that can be implemented at the BS to organize users into clusters and to allocate power and resource so as to guarantee fairness among users. Recall that we require the statistical CSI, which contains the mean and variance of the SNR of each user, to be reported to the BS via feedback channels once in every coherence time interval. The user clustering algorithm and the resource allocation can have separate update intervals. For instance, if the resource allocation update interval is T, then clustering can have an update interval of kT to reduce the computational complexity. In all calculations we assume that all the rates and SNRs of the users are normalized according to the total available power $P_T$ and total resource $W_T$. Thus, if a user reports $\bar{\psi}_{\nu_\ell,i}$ and $\sigma^2_{\psi_{\nu_\ell,i}}$, which are the mean and variance of its SNR normalized according to $\delta_\ell P_T$ and $\omega_\ell W_T$ of its cluster, then the BS should replace them with $\bar{\gamma}_{\nu_\ell,i}$ and $\sigma^2_{\gamma_{\nu_\ell,i}}$, respectively, which according to (4) can be derived as
$$\bar{\gamma}_{\nu_\ell,i} = \frac{\omega_\ell}{\delta_\ell}\,\bar{\psi}_{\nu_\ell,i}, \qquad \sigma^2_{\gamma_{\nu_\ell,i}} = \left(\frac{\omega_\ell}{\delta_\ell}\right)^{2} \sigma^2_{\psi_{\nu_\ell,i}}.$$
Likewise, for the downlink rates of users, the BS has to normalize them according to the total available resource $W_T$. In Section III, we derived the optimal intra-cluster decoding order and power allocation factors of users in (18) and (19), respectively, assuming that the total power $P_T$ and resource $W_T$ of the transmitter are allocated to cluster $C_\ell$ ($\delta_\ell = 1$ and $\omega_\ell = 1$). To extend those results to the general case in which $\delta_\ell$ and $\omega_\ell$ are not necessarily equal to one, we need to replace the rate r with $r/\omega_\ell$ and the mean SNR $\bar{\gamma}_{\pi_\ell,k}$ with $(\delta_\ell/\omega_\ell)\bar{\gamma}_{\pi_\ell,k}$ in the definition of the parameter $\beta_{\pi_\ell,k}$ in (17) and in the intra-cluster power allocation factors in (19).
Thus, the optimal decoding order is based on the ascending order of the generalized parameter $\beta_{\pi_\ell,k}$ defined in (43). However, since $\delta_\ell$ and $\omega_\ell$ do not change for users inside each cluster, deriving the optimal decoding order based on (17) or (43) gives the same result. Since (17) is more compact, we shall always use it for selecting the optimal decoding order. Performing the same variable replacements in (19) yields the generalized intra-cluster power allocation factors in (44).
FIGURE 1. Flowchart of the complete proposed scheme for user clustering, power and resource allocation in the base station.
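The normalization of the reported SNR statistics can be illustrated with a short sketch. It rests on two assumptions that are not spelled out in this section: that the SNR measured inside a cluster equals $\delta_\ell/\omega_\ell$ times the SNR normalized to $P_T$ and $W_T$ (as the replacement rule for the mean SNR suggests), and that the Nakagami-m shape factor is obtained as the squared mean over the variance of the gamma-distributed SNR, which may differ in form from (6). The function names are hypothetical.

```python
from typing import Tuple


def normalize_snr_stats(
    mean_psi: float, var_psi: float, delta: float, omega: float
) -> Tuple[float, float]:
    """Rescale cluster-level SNR statistics to the total power and resource.

    Assumes psi = (delta / omega) * gamma, so the mean scales by omega / delta
    and the variance by (omega / delta) ** 2.
    """
    scale = omega / delta
    return scale * mean_psi, scale * scale * var_psi


def shape_factor(mean_snr: float, var_snr: float) -> float:
    """Shape factor m of a gamma-distributed SNR: squared mean over variance."""
    return mean_snr * mean_snr / var_snr
```

Under these assumptions the shape factor is unchanged by the normalization, since the mean and the standard deviation are scaled by the same factor.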
Finally, the complete procedure for user clustering, power and resource allocation is summarized in the flowchart of Figure 1 and elaborated further below.
1) Obtain the means and variances of the SNRs of all users from the feedback channels.
2) Calculate the shape factor m of the Nakagami-m fading channels for all users according to (6).
3) Initialize/reinitialize a target common success probability (CSP) for the user clustering algorithm.
4) Based on the predefined assumption about the cluster size and the total number of clusters (i.e., fixed or variable), run one of Algorithms 4, 5 or 6 to cluster the users.
5) Run Algorithm 1 to obtain the optimal CSP p and the optimal inter-cluster resource allocation factor $\omega_\ell$ of each cluster (Algorithm 1 calls Algorithms 2 and 3 internally).
6) If the optimal CSP obtained in Step 5 is close enough to the initial CSP value (e.g., the absolute difference is less than 0.05), continue to Step 7. Otherwise, go to Step 3 and reinitialize the CSP with the CSP obtained in Step 5.
7) Derive the optimal decoding order for the users of each cluster based on the ascending order of the parameter $\beta_{\pi_\ell,k}$ defined in (17).
8) Use equation (27) to compute the optimal inter-cluster power allocation factor $\delta_\ell$ of each cluster and obtain the optimal intra-cluster power allocation factor $\alpha_{\pi_\ell,k}$ of each user according to (44). The value $\delta_\ell \alpha_{\pi_\ell,k}$ is then the proportion of the total power $P_T$ allocated to the kth user in the optimal decoding order of the $\ell$th cluster.
9) Obtain the signal to be transmitted to each cluster by superposition coding according to (2) and send it to the users of that cluster.
Note that each user has to perform SIC to obtain its own signal. If the BS follows the above procedure, fairness among the users is guaranteed in terms of their outage or success probability, i.e., the minimum success probability among them is maximized. According to (2), the BS does not need the values of the optimal inter-cluster power allocation factor $\delta_\ell$ and the optimal intra-cluster power allocation factor $\alpha_{\pi_\ell,k}$ separately to form the superimposed signal for each cluster. It only needs their product $\delta_\ell \alpha_{\pi_\ell,k}$, which specifies the proportion of the total power $P_T$ that should be allocated to user $U_{\pi_\ell,k}$ and which can be derived directly from (44) by moving $\delta_\ell$ to the other side of the equation. However, we obtain them separately to keep the logical flow, to improve the modularity and readability of the paper, and to emphasize that the closed-form expression for the inter-cluster power allocation factor $\delta_\ell$ can be used as a cost metric for user clustering algorithms.
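The loop formed by Steps 3 to 6 of the procedure can be sketched as follows. This is a schematic sketch only: `run_scheme`, `cluster_fn` and `allocate_fn` are hypothetical names, where `cluster_fn(target_csp)` stands in for whichever of Algorithms 4, 5 or 6 is used and `allocate_fn(clusters)` stands in for Algorithm 1, returning the optimal CSP and the resource allocation factors. The iteration cap is an added safeguard, not part of the described procedure.

```python
from typing import Any, Callable, Tuple


def run_scheme(
    initial_csp: float,
    cluster_fn: Callable[[float], Any],
    allocate_fn: Callable[[Any], Tuple[float, Any]],
    tol: float = 0.05,
    max_iter: int = 5,
) -> Tuple[Any, float, Any]:
    """Iterate clustering and allocation until the target CSP is self-consistent."""
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    target = initial_csp
    for _ in range(max_iter):
        clusters = cluster_fn(target)              # Step 4: cluster for this target CSP
        optimal_csp, omega = allocate_fn(clusters)  # Step 5: optimal CSP and omega
        if abs(optimal_csp - target) < tol:         # Step 6: close enough, stop
            break
        target = optimal_csp                        # otherwise reinitialize (Step 3)
    return clusters, optimal_csp, omega
```

Setting `max_iter=1` corresponds to running the scheme once with a fixed initial target CSP.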

A. COMPUTATIONAL COMPLEXITY ANALYSIS
To complete Section VI, we analyze the computational complexity of our proposed scheme for user clustering, power and resource allocation. It is noteworthy that the main loop of the proposed scheme, which iterates over multiple initial target common success probabilities (CSPs), only affects the performance of the clustering algorithms, since the power and resource allocation algorithms establish fairness among the users for any given clustering. Besides, in Section VII-D we show that, without iterating over this loop and with only a fixed initial target CSP such as p = 0.95, our proposed scheme outperforms existing works. However, if the computing power at the BS and the latency constraints of the system allow it, performing a few iterations (fewer than 5) over the main loop decreases the gap between the initial target CSP and the optimal CSP. Consequently, the user clustering algorithm performs better and the value of the optimal CSP increases (see Section VII-B for more details). Therefore, we only analyze the computational complexity of one iteration of the complete proposed scheme as depicted in the flowchart of Figure 1.
The first step in the proposed scheme acquires the statistical CSI of users and should be done periodically once in the coherence time interval of the channels. If a user fails to send CSI feedback to the BS in the coherence time interval, it can be omitted from the set of users or served with the previously reported CSI (which may be outdated). Nevertheless, incorporating these details is out of scope of this paper. We assume that there are K users that have reported their statistical CSI to the BS and we derive efficient algorithms to cluster these users and allocate power and channel resources to them such that the minimum success probability among them is maximized. On the other hand, requiring only the statistical CSI is the most practical assumption as it has the minimum signaling overhead compared to other assumptions, especially the assumption of having perfect instantaneous CSI at the BS as considered in many other papers (see Table 1). Thus, we skip the computational complexity of collecting the statistical CSI of users, which can be performed periodically over the feedback channels.
Since the derived equations for the Nakagami-m distribution of the SNR of users are in closed form and initializing the target CSP is a constant parameter selection, they can be ignored in the computational complexity of the proposed scheme. However, for the next major step, which is clustering the users, one of Algorithms 4, 5 and 6 should be used. We showed that the computational complexity of these algorithms increases polynomially with the numbers of users and clusters. If the total number of users is K, then none of these clustering algorithms requires more than $O(K^2)$ iterations to perform the clustering. Thus, we consider $O(K^2)$ to be the computational complexity of the clustering step.
The next step is to find the optimal CSP and resource allocation factors according to Algorithm 1. As discussed in the last paragraph of Section IV, if the acceptable error in finding all the parameters is $\epsilon$, the computational complexity of Algorithm 1 is $L[\log_2(1/\epsilon)]^3$, which increases linearly with the number of clusters L. All the remaining steps of the scheme calculate parameters such as the decoding order and power allocation factors according to closed-form expressions. Thus, their computational complexity is negligible. Therefore, the overall computational complexity of our proposed scheme is proportional to $K^2 + L[\log_2(1/\epsilon)]^3$ operations. On the other hand, using an exhaustive search to find the K optimal user power allocation factors and L cluster resource allocation factors with precision $\epsilon$ requires investigating $(1/\epsilon)^{K+L}$ states, which increases exponentially with the number of users K and the number of clusters L. Moreover, when all the possible clusterings and decoding orders of users with a fixed N = K/L users in each cluster are also considered, the number of states in the exhaustive search becomes larger still and likewise increases exponentially with the numbers of users and clusters. Hence, our proposed scheme significantly decreases the computational complexity of solving the problem. We will also evaluate the run time of the complete proposed scheme by simulations in Section VII-D.
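To give a sense of scale, the two growth rates discussed above can be compared numerically. The sketch below assumes that the proposed scheme costs on the order of K² + L[log2(1/ε)]³ operations and that the exhaustive search visits (1/ε)^(K+L) states; constants of proportionality are ignored, and the function names are hypothetical.

```python
import math


def proposed_ops(num_users: int, num_clusters: int, eps: float) -> float:
    """Order-of-magnitude operation count: K^2 + L * (log2(1/eps))^3."""
    return num_users ** 2 + num_clusters * math.log2(1.0 / eps) ** 3


def exhaustive_states(num_users: int, num_clusters: int, levels: int) -> int:
    """Number of grid states when each of the K + L factors takes `levels` values.

    `levels` plays the role of 1/eps and is passed as an integer so that the
    count is computed exactly.
    """
    return levels ** (num_users + num_clusters)
```

For example, with K = 30 users, L = 10 clusters and a precision of 10⁻³, `proposed_ops(30, 10, 1e-3)` is on the order of 10⁴, whereas `exhaustive_states(30, 10, 1000)` equals 10¹²⁰.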

VII. SIMULATION RESULTS
In this section, the performance of the proposed algorithms is evaluated by simulations and compared to that of existing algorithms. All simulations were executed on a laptop with an Intel(R) Core(TM) i5-5200U CPU @ 2.20 GHz and 8 GB of RAM.

A. PERFORMANCE OF POWER AND RESOURCE ALLOCATION SCHEME
In this subsection, we investigate the performance of our proposed scheme for power and resource allocation and compare it to the following power and resource allocation schemes:
1) Equal allocation: Power and resource are allocated equally to all clusters.
2) Proportional allocation: Power and resource are allocated to each cluster in proportion to the ratio of the number of users in that cluster to the total number of users.
3) Method of [24]: Power is allocated to users according to the distributed power control method proposed in [24] (for more details see Equations (25), (26) and (28) in [24]). However, since no resource allocation scheme is proposed in that paper, we use a proportional resource allocation scheme in this case.
For the first two of these inter-cluster power and resource allocation schemes, we employ our proposed intra-cluster power allocation to maximize the minimum success probability of users inside each cluster separately. For the third scheme, however, we use the power allocation method proposed in [24]. The main goal of our proposed scheme is to establish fairness among all the users. Thus, we first compare the performance of these schemes using Jain's index [25] in terms of the success probability of the users. This metric has been adopted in many works (e.g., [11], [26]) to evaluate fairness among users. Jain's index for the success probabilities of K users is defined as follows:
$$J = \frac{\left(\sum_{k=1}^{K} p_k\right)^{2}}{K \sum_{k=1}^{K} p_k^{2}},$$
where $p_k$ is the success probability of the kth user. If the success probabilities of all the users are equal, then Jain's index is maximum and equal to one. In the worst case, where all the success probabilities are zero except for one user, the index is minimum and equal to 1/K.
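Jain's index is straightforward to compute from a list of per-user success probabilities. The short sketch below uses the standard definition, the squared sum divided by K times the sum of squares; the function name `jain_index` is chosen here for illustration.

```python
from typing import Sequence


def jain_index(success_probs: Sequence[float]) -> float:
    """Jain's fairness index: (sum p_k)^2 / (K * sum p_k^2)."""
    total = sum(success_probs)
    sum_sq = sum(p * p for p in success_probs)
    if sum_sq == 0:
        raise ValueError("at least one success probability must be non-zero")
    return total * total / (len(success_probs) * sum_sq)
```

Equal success probabilities give an index of one, while a single non-zero entry among K users gives 1/K, matching the two extreme cases described above.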
Then, to obtain different values of the sum rate, all the users' rates are multiplied by a proper constant factor. It is clear from Figure 2 that as r increases, the performance of our proposed scheme stays the same and fairness is established among all users. For the other schemes, however, Jain's index quickly decreases as r increases.
Recall that the goal of our proposed scheme is to maximize the minimum success probability of users. Thus, we also compare the minimum success probability of the users among these schemes in the same simulations that we perform for Jain's index, and the results are plotted in Figure 3. The results show that by establishing fairness among all users in our proposed algorithm, the minimum success probability of users is significantly improved when compared to that of the other power and resource allocation schemes.

B. IMPACT OF ITERATION OVER MULTIPLE INITIAL CSP VALUES
In this section, we investigate the impact of the initial CSP value on the performance of our proposed scheme. To this end, 30 users are generated with random parameters as explained before, and Algorithms 4, 5 and 6 are run separately. We perform 8 iterations over the loop of the proposed scheme and reinitialize the CSP value of the clustering with the optimal CSP obtained in the previous iteration, as described in the flowchart of Figure 1. Figure 4 plots the averages of the optimal CSP values over 100 simulation runs versus the number of iterations for the different clustering algorithms. We set the first "initial CSP" value to 0.8 (i.e., in iteration 0). It is clear that as the initial CSP value gets closer to the optimal CSP value, the clustering algorithm performs better and the optimal CSP of users increases. This is because changing the initial CSP value affects both the optimal decoding order and the $\delta_\ell$ values of each cluster. Thus, by selecting an initial value of the CSP closer to the optimal CSP, the clustering algorithm determines the power demand and optimal decoding order of each cluster more accurately. In addition, this simulation shows that the proposed scheme converges to the optimal CSP of users very quickly, after only a few iterations. Thus, to keep the computational complexity of our proposed scheme as low as possible, in the next two subsections we only consider a predefined CSP value of 0.95 and show that even without iterating over multiple CSPs, our proposed scheme still outperforms existing schemes.

C. PERFORMANCE OF USER CLUSTERING ALGORITHMS
In this section, we evaluate the performance of Algorithms 4, 5 and 6 in terms of the minimum success probability of users. In addition to our proposed algorithms, we also consider two other algorithms for comparison. The first one is random user clustering, which does not utilize the statistical CSI of users for clustering and represents a lower bound on the performance of other user clustering algorithms. In the simulation results, this algorithm is labeled "Random clustering". The second algorithm is the method proposed in [14], which is designed to cluster users into two-user clusters. In that method, users are sorted based on their average SNRs. Then, the first and last users are paired together, the second user and the one before the last user are paired, and in general, the kth user is paired with the (K − k + 1)th user, where the total number of users K is assumed to be even (refer to Theorem 3 in [14] for more details). In the simulation results, this method is labeled "Method of [14]". In order to focus on the impact of the user clustering algorithms on the performance, we implement our proposed power and resource allocation scheme for all the aforementioned clustering algorithms.
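The pairing rule of the second reference method, as described above, can be sketched in a few lines. The sketch only reflects the description given here (sort by average SNR, then pair the kth user with the (K − k + 1)th); the function name is hypothetical and the details of [14] are not reproduced.

```python
from typing import List, Sequence, Tuple


def pair_first_with_last(avg_snrs: Sequence[float]) -> List[Tuple[int, int]]:
    """Pair the k-th user of the sorted list with the (K - k + 1)-th user."""
    num_users = len(avg_snrs)
    if num_users % 2 != 0:
        raise ValueError("the number of users is assumed to be even")
    # Sort user indices by average SNR; the same pairs result for either sort direction.
    order = sorted(range(num_users), key=lambda k: avg_snrs[k])
    return [(order[k], order[num_users - 1 - k]) for k in range(num_users // 2)]
```

For four users with average SNRs 3, 1, 4 and 2, the weakest user (index 1) is paired with the strongest (index 2), and the two middle users (indices 3 and 0) form the second pair.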
As before, here we also consider 30 random users and scale up/down their sum rate by multiplying all the rates by a constant scale factor. Figure 5 plots the minimum success probability of the users against their sum rate. It is clear that our proposed algorithms outperform the two other reference algorithms. By comparing our proposed clustering algorithms, it can be seen that as we relax the constraints on the number of users in each cluster and the total number of clusters, the performance of the clustering algorithm improves. This is expected as a higher degree of freedom should help to form a better clustering structure.

D. PERFORMANCE OF THE COMPLETE PROPOSED SCHEME
In this section, we evaluate the performance of our complete proposed scheme for user clustering and resource allocation and compare it with the following reference methods:
1) Clustering method of [14] + equal resource allocation + power allocation of [24].
2) Random clustering + equal resource allocation + power allocation of [24].
3) OMA technique such as TDMA + our proposed power and resource allocation.
It is noteworthy that our proposed scheme is capable of allocating resources to singleton clusters, which consist of only one user. When all the clusters are singletons, our proposed hybrid NOMA-OMA scheme reduces to OMA, since all the users then use orthogonal resources. Thus, in the third reference method we consider singleton clusters to clarify the superiority of the hybrid NOMA-OMA scheme over pure OMA. In the simulations, we compare the Jain's index, minimum success probability and sum throughput of users for the proposed and reference algorithms. The simulations are repeated 100 times and the averages of the obtained results are plotted in Figures 6, 7 and 8. At each repetition, K = 30 users with random CSI and rate parameters are simulated. Then, for each value of r, the rates of all users are scaled up or down by a proper constant. It is clear that as r increases, our proposed schemes outperform the reference algorithms in all the considered performance metrics. It is notable that the curves of our proposed clustering algorithms are very close to one another and appear to overlap.
Figure 9 depicts the runtime of our proposed scheme versus the number of clusters. From this figure, it is seen that the computational complexity of our proposed scheme increases almost linearly with the number of clusters, which is consistent with the complexity analysis given in Section VI-A.
In this simulation, for different numbers of random users, the proposed scheme is repeated 500 times and the average runtime of the whole scheme is calculated. The precision of calculating parameters such as the resource allocation factors is set to $\epsilon = 10^{-3}$. It is noteworthy that we implemented the scheme in a single-thread mode. However, parallelism and multi-threading can be utilized in the implementation of the binary searches of Algorithms 1, 2 and 3, which should reduce the runtime of the proposed scheme.

VIII. CONCLUSION
In this paper, we have tackled the problem of optimizing user clustering, power allocation to users, resource (time slot or bandwidth) allocation to clusters, and decoding order in each cluster for the downlink of a hybrid NOMA-OMA system operating over Nakagami-m fading channels. In a hybrid NOMA-OMA system, users are organized into several clusters, where clusters use an orthogonal multiple access scheme to utilize channel resources while users in each cluster employ power-domain NOMA. The goal was to maximize the minimum success probability (or equivalently minimize the maximum outage probability) among all users. We first proved that at the optimal solution of the problem, all the users have a common success probability (CSP). We then proposed an efficient algorithm for finding the optimal CSP and resource allocation factors of clusters simultaneously. We also derived the inter-cluster power allocation factor for each cluster, intra-cluster power allocation factor for each user, and optimal decoding order of users inside each cluster in a closed-form expression based on the CSP, statistical CSI of users and resource allocation factor of each cluster. We proposed efficient algorithms for user clustering under three different scenarios where the number of users in each cluster and/or the total number of clusters are fixed or variable. All three algorithms were developed based on the same principle of minimizing the power consumption of each cluster while achieving a given target success probability. Simulation results show that our proposed schemes for user clustering, power and resource allocation outperform existing schemes not only in terms of fairness and the minimum success probability of users, but also in terms of the sum throughput. 
An interesting topic for future work is to develop efficient user clustering and resource allocation methods for the uplink of a NOMA system operating over Nakagami-m fading channels in order to guarantee fairness among users.
The closed-form expressions for the power allocation factors in (19) are derived based on these recursive equations.
By using (56), the inequality (55) reduces to $1/\beta_{\pi_\ell,2} < 1/\beta_{\pi_\ell,1}$, which is always true according to the optimal decoding order condition (18). Proving the conditions (21) for the other values i = 2, ..., N is straightforward by following the same method. Therefore, the proof of the theorem is complete.

Algorithm 7 Finding the Optimal CSP p of Users in Cluster C
Input: $m_{\pi_i}$, $\bar{\gamma}_{\pi_i}$, $r_{\pi_i}$ $\forall i \in I$ and $\epsilon$.

APPENDIX B ALGORITHM FOR FINDING THE OPTIMAL CSP OF USERS FOR ONE CLUSTER
In Algorithm 7 we recall the algorithm proposed in [19] as a reference, to facilitate comparison with its extended version developed in this paper, namely Algorithm 1. Algorithm 7 is designed to find the optimal CSP of K users when all of them are grouped into one cluster. In this algorithm, $\epsilon$ is the precision of calculating the common success probability (CSP). Algorithm 1 finds both the optimal CSP and the optimal inter-cluster resource allocation factors across clusters when users are grouped into several clusters.

APPENDIX C PROOF OF LEMMA 1
In [10] a similar lemma is proved for the case of Rayleigh fading. For the case of Nakagami-m fading, the success probability is derived in (15) as $p_{\pi_\ell,k} = Q\left(m_{\pi_\ell,k},\, m_{\pi_\ell,k}\,\gamma^{\mathrm{th}}_{\pi_\ell,k}/\bar{\gamma}_{\pi_\ell,k}\right)$.
The function Q(·, ·) is a strictly decreasing function of its second parameter and, according to (13), $\gamma^{\mathrm{th}}_{\pi_\ell,k}$ is a strictly decreasing function of $\alpha_{\pi_\ell,k}$. Thus, the success probability $p_{\pi_\ell,k}$ is a strictly increasing function of the power allocation factor $\alpha_{\pi_\ell,k}$. The lemma can be proved by contradiction. Suppose that at the optimal solution, the success probabilities of all clusters are not the same. Then some clusters have the minimum success probability. Denote those clusters by $\mathcal{C}' = \arg\min_{\ell \in \mathcal{C}} p_\ell$ and the rest of the clusters by $\mathcal{C}''$. For these subsets we have $\mathcal{C}' \cup \mathcal{C}'' = \mathcal{C}$ and $\mathcal{C}' \cap \mathcal{C}'' = \emptyset$. Since the success probability of each cluster is a strictly increasing function of the power allocation factor of that cluster, we can find an appropriate positive value $\epsilon$ such that by subtracting $\epsilon$ from all the power allocation factors of the clusters in $\mathcal{C}''$ and adding $\epsilon\,|\mathcal{C}''|/|\mathcal{C}'|$ to the power allocation factors of the clusters in $\mathcal{C}'$, the minimum success probability of the clusters can be increased, which contradicts the optimality of the solution. This proves the lemma.
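The monotonicity used in this proof can be checked numerically with a small sketch. It assumes that Q(·, ·) denotes the regularized upper incomplete gamma function (consistent with the relation Q = 1 − P used in Appendix D) and, to stay within the standard library, it is restricted to integer shape factors m, for which Q(m, x) = e^{−x} Σ_{k=0}^{m−1} x^k/k!. The function names are hypothetical, and the SNR threshold is taken as a given input rather than computed from (13).

```python
import math


def upper_regularized_gamma_int(m: int, x: float) -> float:
    """Q(m, x) for a positive integer m: exp(-x) * sum_{k=0}^{m-1} x^k / k!."""
    if m < 1 or x < 0:
        raise ValueError("m must be a positive integer and x non-negative")
    term, total = 1.0, 0.0
    for k in range(m):
        total += term          # add x^k / k!
        term *= x / (k + 1)    # advance to x^(k+1) / (k+1)!
    return math.exp(-x) * total


def success_probability(m: int, mean_snr: float, snr_threshold: float) -> float:
    """Success probability Q(m, m * threshold / mean SNR) for integer m."""
    return upper_regularized_gamma_int(m, m * snr_threshold / mean_snr)
```

For m = 1 the expression reduces to the Rayleigh case, exp(−threshold/mean SNR), and for any fixed m the value decreases as the threshold grows, which is the property the proof relies on.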

APPENDIX D PROOF OF LEMMA 3
To prove Lemma 3, note that the objective function and all the constraints of problem (30) are linear, except for (30b). Thus, to prove convexity, it suffices to prove that h(p, ω) in constraint (30b) is a convex function. To this end, we investigate the positiveness of the second-order derivatives of h(·, ·) with respect to its parameters. To derive $h_{\omega}(p, \omega) = \partial^2 h(p, \omega)/\partial \omega^2$ we use the parameter defined in (62). Subsequently, by using f(x) = ln x and some straightforward algebraic manipulations, the second-order derivative can be calculated as
$$\cdots \times \ln 2 \left(\sum_{j=1}^{i} r_{\pi_\ell,j}/\omega_\ell^{2}\right) 2^{\sum_{j=1}^{i} r_{\pi_\ell,j}/\omega_\ell}/\omega_\ell + \frac{1}{\beta_{\pi_\ell,|C_\ell|}} \ln 2 \cdots$$
$\cdots$ Q(a, b). Thus, by utilizing (80) and considering y = 2 a we have Q(a + 1, a + 2) = 1 − P(a + 1, a + 2) $\cdots$
Thus, the inequality $Q(m_{\pi_\ell,i}, m_{\pi_\ell,i} + 1) \le 0.5$ is always true, and in the interval 0.5 ≤ p ≤ 1 the inequalities (79) and (78) are also always true. This proves that $h_p(p, \omega) \ge 0$ and completes the proof of Lemma 3.