Optimal Throughput Fairness Trade-offs for Downlink Non-Orthogonal Multiple Access over Fading Channels

Recently, non-orthogonal multiple access (NOMA) has attracted considerable interest as one of the 5G-defining techniques. However, as NOMA is intrinsically in favour of the transmission of strong users who are capable of carrying out successive decoding, judicious designs are required to guarantee user fairness. In this paper, a two-user downlink NOMA system over fading channels is considered. For delay-tolerant transmission, the average sum-rate is maximized subject to both average and peak power constraints as well as a minimum average user rate constraint. The optimal resource allocation is obtained using Lagrangian dual decomposition under full channel state information at the transmitter (CSIT), while an effective power allocation policy under partial CSIT is also developed based on analytical results. In parallel, for delay-limited transmission, the sum of delay-limited throughput (DLT) is maximized subject to a maximum allowable user outage constraint under full CSIT, and the analysis for the sum of DLT is performed under partial CSIT. Furthermore, a sophisticated orthogonal multiple access (OMA) scheme is also studied as a benchmark to prove the superiority of NOMA over OMA with full CSIT. Finally, the theoretical analysis is verified via simulations by means of various trade-offs for the average sum-rate (sum-DLT) versus the minimum (maximum) user rate (outage) requirement.


I. INTRODUCTION
As the upcoming fifth generation (5G) of wireless communications features massive connectivity among heterogeneous types of users in the Internet of Things (IoT), non-orthogonal multiple access (NOMA) has been envisioned as a promising candidate for 5G networks [1][2][3], owing to its advantage over orthogonal multiple access (OMA) techniques, such as time-division multiple access (TDMA) and frequency-division multiple access (FDMA), in enabling high spectral efficiency via non-orthogonal resource allocation (see [4] and the references therein).
Variants of NOMA, namely multiuser superposition transmission (MUST) and layer division multiplexing (LDM), have been included in the 3rd Generation Partnership Project Long Term Evolution Advanced (3GPP-LTE-A) [5] and the next-generation digital TV standard ATSC 3.0 [6], respectively.
Among a variety of studies addressing the challenges posed by NOMA, a general NOMA downlink framework was proposed in [7] in which a base station (BS) is capable of simultaneously communicating with several randomly deployed users. To increase the throughput of cell-edge users in multi-cell NOMA networks, coordinated multipoint (CoMP) transmission techniques were adopted in [8] and [9] with the BS equipped with a single antenna and multiple antennas, respectively.
On another front, far users that suffer from severe path-loss attenuation are usually disadvantaged when competing for the resources that enhance the sum throughput of the system, and therefore their performance could be substantially compromised without proper design. There are mainly three types of countermeasures against such unfairness in NOMA networks. The first strategy is to invoke cooperative NOMA [10,11], in which a nearby user is regarded as a relay to assist a distant user. It is demonstrated in [10] that by utilizing the proposed cooperative protocol, all users experience the same diversity order. In [11], the nearby NOMA users are equipped with wireless energy harvesting capability to assist far users. The second strategy is to enhance the worst user performance [12][13][14]. The max-min power allocation problem that maximizes the minimum achievable user rate was studied for single-input-single-output (SISO) NOMA systems in [12], and for clustered multiple-input-multiple-output (MIMO) NOMA systems in [13]. In [14], the authors provided a mathematical proof for NOMA's superiority over conventional OMA transmission in terms of the optimum sum rate subject to a minimum rate constraint.
The third strategy is to introduce additional factors to guarantee fairness. Weighted sum-rate is an effective metric to reflect the priority of users in resource allocation [15,16]. The authors of [15] considered a multicarrier downlink network, in which each sub-channel can be shared by multiple users by adopting NOMA. Joint sub-channel and power allocation was formulated as a weighted sum-rate maximization problem, and iteratively solved by leveraging a matching problem with externalities. Multicarrier NOMA systems employing a full-duplex (FD) BS were considered in [16], and an optimal joint sub-carrier and power allocation policy for maximizing the weighted sum-rate was developed by applying monotonic optimization.

A. Related Work
The above fairness issues were considered for NOMA operating over additive white Gaussian noise (AWGN) channels. In this paper, by contrast, we consider a two-user downlink NOMA network operating over a fading channel, in which optimal power and rate allocation strategies are applied adaptively to channel dynamics for delay-tolerant transmission subject to a minimum average rate constraint, and optimal power controls are derived to maintain a target rate vector at as many fading states as possible for delay-limited transmission subject to a maximum outage constraint.
The information theoretic study of fading broadcast channels (BCs) can be traced back to [17] and [18]. Assuming perfect CSI at both the transmitter (Tx) and the receivers (Rxs), dynamic power and rate allocations for various transmission schemes including code division with and without successive decoding, time division, and frequency division over different fading states were studied for the ergodic capacity region (ECR) and the (zero-) outage capacity region (OCR) in [17] and [18], respectively. The boundaries of the ECRs have been characterized in [17] by solving equivalent weighted sum-rate problems, each corresponding to one set of weights. The (zero-) OCRs were implicitly characterized by deriving the outage probability regions given a rate vector in [18]. The boundaries of these regions were also obtained by solving equivalent sum-reward maximization problems [18].
However, these schemes do not favour far users that are susceptible to deep fading, since these users are either allocated little power in the characterization of the ECRs or prohibited from transmission due to large power expenditure at the boundary of the outage probability regions. The authors of [19] investigated the optimal power allocation over fading BCs taking fairness into account.
Specifically, the authors showed that the minimum-rate capacity region is equal to the sum of the zero-outage minimum-rate vector plus the ECR of a related BC with effective noise. The boundary of this region was then derived based on known results for the ECR in [17].
Our work differs from the above mainly in three aspects. First, instead of an instantaneous minimum rate constraint that cannot be violated in any fading state as considered in [19], we consider an average minimum rate constraint in delay-tolerant applications, and a maximum outage constraint in delay-limited applications, respectively. Hence, we cannot directly relate our proposed trade-off regions to the well investigated ergodic capacity regions in [17]. Second, the key step of weighted sum-utility reformulation in [17] and [18] cannot be readily used for our problems due to the extra quality-of-service (QoS) constraints guaranteeing the worst user performance in ours. Third, the optimal solutions are obtained using different techniques. We tackle the non-convexity of the problems using Lagrangian dual decomposition together with the ellipsoid method leveraging "time-sharing" conditions [20], while [17] adopted a multi-level "water-filling" algorithm and [18] derived a multi-user generalization of the threshold-decision based power policy.
The main contributions of this paper are summarized as follows. We 1) solve the ergodic sum-rate (ESR) maximization problem ensuring a minimum average rate by optimally adapting the power and rate allocations to fading states with full CSI at the Tx (CSIT) for both NOMA and an optimal OMA scheme; 2) obtain the optimal power control for the sum of delay-limited throughput (DLT) maximization problem, which is subject to a maximum outage constraint, with full CSIT for both NOMA and the optimal OMA scheme; 3) under full CSIT, prove the superiority of NOMA over OMA in terms of the considered metrics; 4) under partial CSIT, analyse the ESR and the DLT, respectively, in closed form with the optimal fixed power allocation and/or proportion of orthogonal resources designed; and 5) characterize the optimal average sum-rate (sum-DLT) versus min-rate (max-outage) trade-offs for different transmission schemes via simulations.
Notation-We use upper-case boldface letters for matrices and lower-case boldface letters for vectors. $\mathbb{E}_x[\cdot]$ stands for the statistical expectation w.r.t. the random variable (RV) $x$. $\sim$ represents "distributed as" and $\triangleq$ means "denoted by". The circularly symmetric complex Gaussian (CSCG) distribution with mean $u$ and variance $\sigma^2$ is denoted by $\mathcal{CN}(u, \sigma^2)$. $\mathrm{Ei}(x)$ denotes the exponential integral.

II. SYSTEM MODEL
We consider a simplified downlink cellular system that consists of one BS and two users, which are denoted by $U_k$, $k \in \{1, 2\}$, with each of their transceivers equipped with a single antenna. We assume that the complex channel coefficient from the BS to $U_k$, $h_k(\nu)$, experiences block fading with a continuous joint probability density function (pdf), where $\nu$ represents a fading state. The channel remains constant during each transmission block, but may vary from block to block as $\nu$ changes. The channel gain $|h_k(\nu)|^2$ is assumed to consist of multiplicative small-scale and large-scale fading given by $|h_k(\nu)|^2 = |\tilde{h}_k(\nu)|^2 / \lambda_k$, in which $\tilde{h}_k(\nu)$ is a complex Gaussian RV denoted by $\tilde{h}_k(\nu) \sim \mathcal{CN}(0, 1)$, and $\lambda_k$ is a distance-dependent constant. Hence, $|h_k(\nu)|^2$ is an exponentially distributed RV with its mean value specified by $1/\lambda_k$.
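The channel model above can be checked numerically. The following is a minimal sketch, assuming the gain is generated as the squared magnitude of a CN(0, 1) variable divided by the distance-dependent constant; the function name `sample_channel_gain` and the value `lam = 4.0` are illustrative choices, not taken from the paper.

```python
import random


def sample_channel_gain(lam, rng):
    """Draw one channel gain |h_k|^2 with mean 1/lam (exponentially distributed)."""
    # Real and imaginary parts of a CN(0, 1) variable each have variance 1/2.
    re = rng.gauss(0.0, 0.5 ** 0.5)
    im = rng.gauss(0.0, 0.5 ** 0.5)
    # Scale the unit-mean small-scale gain by the large-scale constant.
    return (re * re + im * im) / lam


rng = random.Random(0)
lam = 4.0  # hypothetical distance-dependent constant
gains = [sample_channel_gain(lam, rng) for _ in range(200_000)]
mean_gain = sum(gains) / len(gains)
print("empirical mean:", mean_gain, "expected:", 1.0 / lam)
```

The empirical mean should approach 1/λ_k as the number of samples grows, which matches the exponential distribution stated in the text.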
In this paper, we investigate two types of CSIT, i.e., full CSIT and partial CSIT, while CSI at the Rxs is assumed to be perfectly known. When full CSIT is available, the BS can adapt the power and rate of the transmit signal intended for each user to the channels $h_k(\nu)$ in each fading state.
On the other hand, when only partial CSIT is available, comprising the order of the two channel gains and their channel distribution information (CDI), for reasons such as limited feedback from the users to the BS or reduced signalling to cut overhead, the BS can only determine its power allocation policy at each fading state based on this order. We also consider two different multiple access transmission schemes, viz., NOMA and optimal OMA. In the NOMA transmission scheme, the two users non-orthogonally access the channel by enabling superposition coding (SC) at the BS and successive interference cancellation (SIC) at the users.
For optimal OMA transmission, we consider power and (continuous) time/frequency allocation, both in an adaptive manner, which is referred to as OMA-Type-II [14]. (Another benchmark scheme, OMA-Type-I, will be introduced in Section V.)
A. Full CSIT
1) NOMA: For NOMA transmission, the received signal at the downlink user $U_k$ is given by
$$y_k(\nu) = h_k(\nu)\left(\sqrt{p_k(\nu)}\, s_k + \sqrt{p_{\bar{k}}(\nu)}\, s_{\bar{k}}\right) + n_k, \quad (1)$$
where $\bar{k}$ denotes the element in the complementary set of $\{1, 2\}$ w.r.t. $k$; $s_k$ is the transmit signal intended for $U_k$, denoted by $s_k \sim \mathcal{CN}(0, 1)$; $p_k(\nu)$ denotes $U_k$'s transmit power; and $n_k$ is the AWGN at $U_k$'s Rx, denoted by $n_k \sim \mathcal{CN}(0, \sigma_k^2)$. Moreover, similar to [19], we simultaneously consider two types of transmit power constraints on the $p_k$'s, namely, the average power constraint (APC) and the peak power constraint (PPC), in which the former constrains the total transmit power in the long term, i.e., $\mathbb{E}_\nu[p_k(\nu) + p_{\bar{k}}(\nu)] \le \bar{P}$, and the latter limits the instantaneous total transmit power to at most $\hat{P}$, i.e., $p_k(\nu) + p_{\bar{k}}(\nu) \le \hat{P}$, $\forall \nu$. It is assumed that $\bar{P} \le \hat{P}$ without loss of generality (w.l.o.g.).
Let $g_k(\nu) \triangleq |h_k(\nu)|^2 / \sigma_k^2$ denote the normalized channel gain of $U_k$. When $g_k(\nu) > g_{\bar{k}}(\nu)$, the achievable rate for $U_k$ (the stronger user) to decode $U_{\bar{k}}$ (the weaker user)'s message is larger than the rate intended for $U_{\bar{k}}$'s transmission, and therefore $U_k$ is able to successfully perform SIC. Otherwise, $U_k$ directly decodes its own message treating $U_{\bar{k}}$'s as interference. The instantaneous achievable rate for $U_k$ at fading state $\nu$ in bits/sec/Hz is thus given by [17]
$$R_k^{\rm NOMA}(\nu) = \begin{cases} \log_2\left(1 + p_k(\nu) g_k(\nu)\right), & g_k(\nu) > g_{\bar{k}}(\nu), \\ \log_2\left(1 + \dfrac{p_k(\nu) g_k(\nu)}{p_{\bar{k}}(\nu) g_k(\nu) + 1}\right), & \text{otherwise}. \end{cases} \quad (2)$$
2) OMA-Type-II: For OMA-Type-II transmission, each user receives its information over the fraction $\alpha_k(\nu)$ of the time/frequency dedicated to it in fading state $\nu$, where $\alpha_k(\nu) \in [0, 1]$ denotes the proportion of time/frequency assigned to $U_k$. The same set of transmit power constraints as for its NOMA counterpart, i.e., APC and PPC, is taken into account as well. Accordingly, the instantaneous achievable rate for $U_k$ in fading state $\nu$ is given by
$$R_k^{\rm OMA\text{-}II}(\nu) = \alpha_k(\nu) \log_2\left(1 + \frac{p_k(\nu) g_k(\nu)}{\alpha_k(\nu)}\right). \quad (3)$$
Note that (3) applies to both TDMA and FDMA transmission in the sense that the total energy consumed for the two users in fading state $\nu$ over time remains the same as over frequency, which is given by $\alpha_k(\nu) \frac{p_k(\nu)}{\alpha_k(\nu)} + \alpha_{\bar{k}}(\nu) \frac{p_{\bar{k}}(\nu)}{\alpha_{\bar{k}}(\nu)} = p_k(\nu) + p_{\bar{k}}(\nu)$. Hence, we do not differentiate TDMA from FDMA throughout the paper unless otherwise specified.
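To make the two rate expressions concrete, the sketch below evaluates them for one fading state. It rests on two assumptions that are not spelled out in the surviving text: the normalized gain is taken as g_k = |h_k|^2 / σ_k^2, and the OMA-Type-II rate is taken as α log2(1 + p g / α), which is the form consistent with the energy identity above. The function names are illustrative.

```python
import math


def noma_rates(p1, p2, g1, g2):
    """NOMA rates (bits/sec/Hz) of users 1 and 2 in one fading state."""
    def rate(p_own, p_other, g_own, is_strong):
        if is_strong:
            # The stronger user removes the other user's signal via SIC.
            return math.log2(1.0 + p_own * g_own)
        # The weaker user treats the other user's signal as interference.
        return math.log2(1.0 + p_own * g_own / (p_other * g_own + 1.0))
    return rate(p1, p2, g1, g1 > g2), rate(p2, p1, g2, g2 > g1)


def oma_rate(p, g, alpha):
    """OMA-Type-II rate of one user holding a fraction alpha of the resource."""
    if alpha <= 0.0:
        return 0.0  # no time/frequency assigned, hence no rate
    return alpha * math.log2(1.0 + p * g / alpha)


r1, r2 = noma_rates(1.0, 1.0, 3.0, 1.0)  # user 1 is the stronger user
r_oma = oma_rate(3.0, 1.0, 1.0)          # whole resource given to one user
print(r1, r2, r_oma)
```

With equal powers, the stronger user obtains an interference-free rate while the weaker user's rate is capped by the interference term, which is the asymmetry that motivates the fairness constraints studied later.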
B. Partial CSIT
1) NOMA: Under partial CSIT, for NOMA transmission, the BS does not know the exact CSI of the two users but only their order and statistical characteristics, and thus we assume a binary power allocation strategy as follows. In each fading state $\nu$, an amount of power $p_s$ is always assigned to the stronger user while $p_w$ is assigned to the weaker user. We also assume that $p_s$ and $p_w$ are fixed over all fading states, and therefore only the APC applies, i.e., $p_s + p_w \le \bar{P}$.
On the other hand, thanks to the perfect CSI at the receiver, the stronger user can still perform SIC, which yields the instantaneous rate $R_k'^{\rm NOMA}(\nu)$ for $U_k$ expressed as
$$R_k'^{\rm NOMA}(\nu) = \begin{cases} \log_2\left(1 + p_s g_k(\nu)\right), & g_k(\nu) > g_{\bar{k}}(\nu), \\ \log_2\left(1 + \dfrac{p_w g_k(\nu)}{p_s g_k(\nu) + 1}\right), & \text{otherwise}. \end{cases} \quad (4)$$
2) OMA-Type-II: Similarly, for OMA-Type-II transmission, the binary allocation policy with a fixed sharing of time/frequency between the two users is adopted. Specifically, the signal intended for $U_k$ is transmitted with power $p_s$ if its channel gain from the BS is stronger than $U_{\bar{k}}$'s, and with power $p_w$ otherwise. The fixed proportions of time/frequency assigned to $U_k$ and $U_{\bar{k}}$ are $\alpha_k$ and $\alpha_{\bar{k}}$, respectively. Consequently, the instantaneous achievable rate for $U_k$ is expressed as
$$R_k'^{\rm OMA\text{-}II}(\nu) = \begin{cases} \alpha_k \log_2\left(1 + \dfrac{p_s g_k(\nu)}{\alpha_k}\right), & g_k(\nu) > g_{\bar{k}}(\nu), \\ \alpha_k \log_2\left(1 + \dfrac{p_w g_k(\nu)}{\alpha_k}\right), & \text{otherwise}. \end{cases} \quad (5)$$
C. System Throughput
1) Delay-Tolerant Transmission: First, we define the proper metric accounting for the overall performance of the downlink NOMA and/or OMA-Type-II system in delay-tolerant types of transmission, where the users will not decode their individual data until the end of all the transmission blocks. A relevant metric in this case is the ergodic sum-rate (ESR) [17], i.e., the achievable sum-rate of the two users averaged over all fading states$^{1}$, given by $\mathbb{E}_\nu[R_k^{\rm NOMA}(\nu) + R_{\bar{k}}^{\rm NOMA}(\nu)]$ and $\mathbb{E}_\nu[R_k^{\rm OMA\text{-}II}(\nu) + R_{\bar{k}}^{\rm OMA\text{-}II}(\nu)]$ for NOMA and OMA-Type-II transmission, respectively.
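The ergodic rates under the binary power allocation can be estimated by Monte Carlo averaging over fading states. The following is a minimal sketch for the NOMA case, assuming unit noise variance so that the normalized gains are exponentially distributed with means 1/λ_1 and 1/λ_2; the function name, the sample size, and all parameter values are illustrative assumptions.

```python
import math
import random


def ergodic_rates_partial_csit(p_s, p_w, lam1, lam2, n=100_000, seed=1):
    """Monte Carlo estimate of the two users' ergodic NOMA rates (bits/sec/Hz)."""
    rng = random.Random(seed)
    totals = [0.0, 0.0]
    for _ in range(n):
        # expovariate(lam) draws an exponential RV with mean 1/lam.
        g = [rng.expovariate(lam1), rng.expovariate(lam2)]
        strong = 0 if g[0] > g[1] else 1
        weak = 1 - strong
        # The stronger user gets p_s and decodes interference-free after SIC.
        totals[strong] += math.log2(1.0 + p_s * g[strong])
        # The weaker user gets p_w and sees the stronger user's signal as interference.
        totals[weak] += math.log2(1.0 + p_w * g[weak] / (p_s * g[weak] + 1.0))
    return totals[0] / n, totals[1] / n


near, far = ergodic_rates_partial_csit(p_s=1.0, p_w=3.0, lam1=1.0, lam2=4.0)
print("ergodic rates:", near, far, "ESR estimate:", near + far)
```

Summing the two returned values gives a sample-average estimate of the ESR for a given pair (p_s, p_w), which can then be swept subject to p_s + p_w not exceeding the average power budget.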
Footnote 1: The terms "average sum-rate" and "ESR" are used interchangeably throughout the paper.
2) Delay-Limited Transmission: Next, consider the delay-limited types of transmission for the downlink NOMA and/or OMA-Type-II system, in which the BS concurrently transmits to $U_k$ and $U_{\bar{k}}$ at predefined rates of $\bar{R}_k$ and $\bar{R}_{\bar{k}}$, respectively. As each user attempts to recover its message instantaneously at the end of every fading state, to properly measure the performance of each user in such a case, the outage probability with which the NOMA user $U_k$ fails to decode its intended data w.r.t. the target rate $\bar{R}_k$, $k \in \{1, 2\}$, under full CSIT is introduced in (6). As seen from (6), when $U_k$ has the better channel condition, whether its signal-to-noise ratio (SNR) or its signal-to-interference-plus-noise ratio (SINR) determines its outage depends on whether or not it manages to recover $U_{\bar{k}}$'s message. If it fails to decode $U_{\bar{k}}$'s message at a transmission rate of $\bar{R}_{\bar{k}}$, it has to decode its own message by treating $U_{\bar{k}}$'s as interference. Otherwise, if it successfully recovers $U_{\bar{k}}$'s message, SIC is then performed before it decodes its own message interference-free. On the other hand, when $U_k$ has the worse channel condition, it always decodes its own message treating interference as noise. In each fading state $\nu$, an outage indicator function is defined as follows [21].
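The decoding procedure described above translates directly into a per-state outage check. The sketch below is illustrative only: it assumes normalized gains g_k = |h_k|^2 / σ_k^2 and the convention that the indicator equals 1 when the user is in outage; the function name `noma_outage_indicators` is hypothetical.

```python
import math


def noma_outage_indicators(p, g, r_target):
    """Return [X_1, X_2] for one fading state, with 1 meaning outage (assumed convention)."""
    s = 0 if g[0] > g[1] else 1  # index of the stronger user
    w = 1 - s                    # index of the weaker user
    x = [0, 0]
    # The weaker user always decodes its own message treating interference as noise.
    rate_w = math.log2(1.0 + p[w] * g[w] / (p[s] * g[w] + 1.0))
    x[w] = int(rate_w < r_target[w])
    # The stronger user first tries to decode the weaker user's message.
    rate_sic = math.log2(1.0 + p[w] * g[s] / (p[s] * g[s] + 1.0))
    if rate_sic >= r_target[w]:
        # SIC succeeds, so its own message is decoded interference-free.
        rate_s = math.log2(1.0 + p[s] * g[s])
    else:
        # SIC fails, so it decodes its own message treating the other as interference.
        rate_s = math.log2(1.0 + p[s] * g[s] / (p[w] * g[s] + 1.0))
    x[s] = int(rate_s < r_target[s])
    return x


both_ok = noma_outage_indicators([1.0, 3.0], [3.0, 1.0], [1.0, 1.0])
weak_out = noma_outage_indicators([1.0, 0.1], [3.0, 1.0], [1.0, 1.0])
print(both_ok, weak_out)
```

Averaging these indicators over many fading states gives an empirical estimate of each user's outage probability for a given power policy.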

III. OPTIMUM DELAY-TOLERANT TRANSMISSION
In delay-tolerant scenarios, the ESR is maximized at the cost of compromising fairness among users, since the resources are allocated to the user with the better channel condition over all fading states.
To maximize the ESR of the system while guaranteeing a certain level of fairness, a minimum achievable ergodic rate requirement for each user, namely, $\mathbb{E}_\nu[R_k^{\rm XX}(\nu)] \ge \bar{R}$, $k \in \{1, 2\}$, is imposed, where $(\cdot)^{\rm XX}$ denotes the multiple access scheme that is specified in the context throughout the paper. In this section, the optimal trade-off between the system ESR and user fairness is pursued in the case of full and partial CSIT, respectively.

A. Full CSIT
In the case of full CSIT, the design objective is to maximize the system ESR by jointly optimizing the power and/or orthogonal resource allocations, and the two users' instantaneous rates at each fading state, subject to both APC and PPC at the BS, as well as a minimum ergodic rate constraint for the two users. The resulting optimization problem, (P1-XX), is formulated in (10), where the parameters exclusive to OMA-Type-II, $\{\alpha_k(\nu)\}$, are only valid when XX is replaced by OMA-Type-II.
It is worth noting that the constraints (10d) guarantee that even under extremely asymmetric CSI, e.g., when $U_1$ is located nearer to the BS and thus enjoys much better CSI than $U_2$ in most fading states, the BS can still deliver information to $U_2$ at an average rate no smaller than $\bar{R}$. However, this average rate for $U_2$ cannot be achieved without (w/o) the minimum ergodic rate constraint.
Problem (P1-NOMA) is non-convex due to the non-convex objective function (c.f. (2)), and therefore no immediate solution can be given. However, for channel fading following continuous distributions, (P1-NOMA) proves to satisfy the "time-sharing" condition in an asymptotic sense [20]. Hence, we can still optimally solve it via its dual problem [21], since strong duality holds under this condition [22].
Next, we apply the Lagrangian dual method to solve (P1-NOMA), the Lagrangian of which is given in (11), where $\lambda$ is the Lagrangian multiplier associated with the APC given in (10a); $\delta$ and $\mu$ are those associated with the ergodic rate constraints given in (10d) for $U_k$ and $U_{\bar{k}}$, respectively. The dual function of (P1-NOMA) corresponding to (11), $g(\lambda, \delta, \mu)$, is obtained by maximizing the Lagrangian given in (11), and the dual problem, (P1-NOMA-dual), minimizes $g(\lambda, \delta, \mu)$ over the non-negative multipliers.
The maximization of the Lagrangian can be decoupled into as many subproblems as there are fading states, all sharing the same structure. The index $\nu$ is dropped hereafter for ease of exposition. Taking one particular fading state as an example, the associated subproblem given a triple $(\lambda, \delta, \mu)$ is denoted by (P1-NOMA-sub). Since these subproblems are independent of each other, they can be solved in parallel, one per fading state. Therefore, w.l.o.g., we focus on solving (P1-NOMA-sub) in the sequel.
Proposition 3.1: The optimal power allocation to Problem (P1-NOMA-sub) assuming $g_1 > g_2$ is expressed in terms of the $p_{i,1}$'s and $p_{i,2}$'s, which are given by (15).
Although Proposition 3.1 assumes $g_1 > g_2$ for ease of exposition, its results also apply to the fading states where $g_1 < g_2$ by simply exchanging $\delta$, $p_1$, and $g_1$ with $\mu$, $p_2$, and $g_2$, respectively, in (15). Some insights can be drawn from Proposition 3.1 on how the optimal power allocation policy promotes fairness. To elaborate, assuming that the system design is in favour of $U_2$, i.e., $\delta \ll \mu$, it is observed from (15) that $p_{4,1}$ monotonically decreases with $\mu$ while $p_{4,2}$ monotonically increases with $\mu$. This suggests that when the multiplier associated with $U_2$'s QoS requirement gets larger, the optimal power allocation policy tends to suppress $U_1$'s transmission while supporting $U_2$'s, even in cases where $U_1$ has better CSI than $U_2$.
Thanks to Proposition 3.1, given a triple $(\lambda, \delta, \mu)$, $g(\lambda, \delta, \mu)$ is obtained efficiently by solving (P1-NOMA-sub) in parallel over all fading states. (P1-NOMA-dual) can thus be iteratively solved using sub-gradient based methods, e.g., the deep-cut ellipsoid method (with constraints) [23, Localization methods]. The required sub-gradient for updating $(\lambda, \delta, \mu)$ turns out to be $\left(\bar{P} - \mathbb{E}_\nu[p_k^*(\nu) + p_{\bar{k}}^*(\nu)],\ \mathbb{E}_\nu[R_k^{*\rm NOMA}(\nu)] - \bar{R},\ \mathbb{E}_\nu[R_{\bar{k}}^{*\rm NOMA}(\nu)] - \bar{R}\right)^T$, where $(p_k^*(\nu), p_{\bar{k}}^*(\nu))$ is the optimal solution to (P1-NOMA-sub) at fading state $\nu$, and the $R_k^{*\rm NOMA}(\nu)$'s are obtained by substituting this solution into (2). Note that a feasible $\bar{R}$ in (10d) ensures the successful implementation of the ellipsoid method, and thus it is important to consider a reasonable $\bar{R}$ that does not exceed $\bar{R}_{\max}$. We can obtain $\bar{R}_{\max}$ by replacing the objective function of (P1-NOMA) with a variable $\bar{R}$ and then solving the feasibility problem by bisection over $\bar{R}$. Since the involved procedure is quite similar to that for solving (P1-NOMA), we omit it herein for brevity.
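The dual iteration described above can be illustrated with a small numerical sketch. This is not the authors' procedure: the per-state subproblem is solved here by brute-force grid search instead of the closed form of Proposition 3.1, the multipliers are updated by a plain projected sub-gradient step instead of the deep-cut ellipsoid method, and the per-state objective (1+δ)R_1 + (1+µ)R_2 − λ(p_1+p_2) is the form assumed to follow from the Lagrangian described in the text. All names and parameter values are hypothetical.

```python
import math
import random


def noma_rates(p1, p2, g1, g2):
    """Instantaneous NOMA rates; the stronger user applies SIC."""
    def rate(p_own, p_other, g_own, is_strong):
        if is_strong:
            return math.log2(1.0 + p_own * g_own)
        return math.log2(1.0 + p_own * g_own / (p_other * g_own + 1.0))
    return rate(p1, p2, g1, g1 > g2), rate(p2, p1, g2, g2 >= g1)


def solve_state(g1, g2, lam, delta, mu, p_peak, n_grid=10):
    """Grid search over (p1, p2) with p1 + p2 <= p_peak for one fading state."""
    step = p_peak / n_grid
    best = None
    for i in range(n_grid + 1):
        for j in range(n_grid + 1 - i):
            p1, p2 = i * step, j * step
            r1, r2 = noma_rates(p1, p2, g1, g2)
            # Assumed per-state Lagrangian term to be maximized.
            val = (1.0 + delta) * r1 + (1.0 + mu) * r2 - lam * (p1 + p2)
            if best is None or val > best[0]:
                best = (val, p1, p2, r1, r2)
    return best[1:]


def dual_subgradient(states, lam, delta, mu, p_avg, p_peak, r_min):
    """Sample-average sub-gradient of the dual function w.r.t. (lam, delta, mu)."""
    sols = [solve_state(g1, g2, lam, delta, mu, p_peak) for g1, g2 in states]
    n = len(sols)
    e_p = sum(s[0] + s[1] for s in sols) / n
    e_r1 = sum(s[2] for s in sols) / n
    e_r2 = sum(s[3] for s in sols) / n
    return (p_avg - e_p, e_r1 - r_min, e_r2 - r_min)


def projected_step(multipliers, subgrad, step):
    """Move against the sub-gradient and project onto the non-negative orthant."""
    return tuple(max(0.0, m - step * d) for m, d in zip(multipliers, subgrad))


rng = random.Random(0)
states = [(rng.expovariate(1.0), rng.expovariate(4.0)) for _ in range(200)]
sub0 = dual_subgradient(states, 0.0, 0.0, 0.0, p_avg=1.0, p_peak=2.0, r_min=0.2)
new_multipliers = projected_step((0.0, 0.0, 0.0), sub0, step=0.1)
print(sub0, new_multipliers)
```

With all multipliers at zero, every state spends the full peak power, so the first sub-gradient component equals the average budget minus the peak budget and the power multiplier is pushed upward in the next step. Replacing the update with an ellipsoid step would follow the method cited in the text.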
The Lagrangian of (P1-OMA-II) is expressed as (16), where $\lambda$, $\delta$ and $\mu$ are Lagrangian multipliers associated with the same constraints as those for (P1-NOMA). Similar to the NOMA case, $L^{\rm OMA\text{-}II}$ can also be decoupled into parallel sub-Lagrangians all having the same structure, denoted by $\bar{L}_1^{\rm OMA\text{-}II}(p_1, p_2, \alpha_1)$ for one particular fading state. The associated subproblem for that fading state is referred to as (P1-OMA-II-sub), where the index $\nu$ has been dropped for ease of exposition. To solve (P1-OMA-II-sub), the following two lemmas are required.
Lemma 3.1: If the maximum of $\bar{L}_1^{\rm OMA\text{-}II}(p_1, p_2, \alpha_1)$ is achieved by its jointly stationary point, it is necessary to have certain conditions satisfied (cf. (18)). The corresponding stationary point is obtained by setting $\nabla \bar{L}_1^{\rm OMA\text{-}II}(p_1, p_2, \alpha_1) = 0$.

Lemma 3.2:
If the maximum of $\bar{L}_1^{\rm OMA\text{-}II}(p_1, p_2, \alpha_1)$ is achieved by points on the boundary
Proof: Please refer to Appendix A.
Based on Lemma 3.1 and Lemma 3.2, the following proposition is derived.

Proposition 3.2:
The optimal power as well as time/frequency allocation to (P1-OMA-II-sub) is given by (21), where $\mathbb{1}(\cdot)$ is an indicator function defined in (22).
Proof: Please refer to Appendix B.
Remark 3.1: When there are only two users, the optimal solution given by (21) shares some philosophy in common with the one that achieves the boundary of the time division (TD) capacity region discussed in [17, Theorem 3]. We focus on solving (P1-OMA-II-sub) in any fading state given a triple $(\lambda, \delta, \mu)$, while [17] maximized the total weighted sum-rate in any fading state by determining how to distribute $P(n)$ among $M = 2$ users such that the instantaneous total power constraint ([17, (11)]) is satisfied. The optimal solutions both suggest that, with probability 1, at most one single user transmits in any fading state. This is because the probability measure of any subset of $\{(g_k(\nu), g_{\bar{k}}(\nu)) : h(\lambda, \delta, \mu) = 0\}$ (cf. (18)), assuming a continuous joint distribution of $(g_k(\nu), g_{\bar{k}}(\nu))$, is zero, and so is that in [17, Theorem 3]. This also explains why the maximum of $\bar{L}_1^{\rm OMA\text{-}II}(p_1, p_2, \alpha_1)$ is achieved by its jointly stationary point with probability zero.

B. Partial CSIT
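By analogy, the partial CSIT counterpart of Problem (P1-XX), denoted by (P1′-XX), is formulated in (23), where the $\alpha_k$'s are only valid for transmission adopting OMA-Type-II. Similar to Problem (P1-XX), the constraints (23c) bound the minimum average rate achieved by the two users. First, denote the RV $|h_k(\nu)|^2$ ($|h_{\bar{k}}(\nu)|^2$) by $X$ ($Y$). Also, denote the SNR $p_s g_k(\nu)$ and the SINR $\frac{p_w g_k(\nu)}{p_s g_k(\nu) + 1}$ (cf. (4)) by $\Gamma_k$ and $\bar{\Gamma}_k$, respectively. It thus follows that the conditional cumulative distribution functions (CDFs) of $\Gamma_k$ and $\bar{\Gamma}_k$ are given by (24) and (25), respectively, where $\varepsilon_k \triangleq \frac{\sigma_k^2 z}{p_s}$ and $\bar{\varepsilon}_k \triangleq \frac{\sigma_k^2 z}{p_w - p_s z}$. In accordance with (24) and (25), we have the following proposition.
Proposition 3.3: The ergodic rate for user $U_k$ operating with NOMA, $k \in \{1, 2\}$, under partial CSIT is obtained in closed form in terms of $f(\cdot)$, which denotes the function $f(x) = e^x \mathrm{Ei}(-x)$ ($x > 0$).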
Proof: Please refer to Appendix C.
Since the optimization variable $p_w$ only contributes to where u
Denoting the SNR of $U_k$ in the case of $X \ge Y$ by $\Gamma_k$, and that in the case of $X < Y$ by $\bar{\Gamma}_k$ (cf. (5)), the conditional CDFs of $\Gamma_k$ and $\bar{\Gamma}_k$ are given by (28) and (29), respectively, where $\phi_k \triangleq \frac{\alpha_k \sigma_k^2 z}{p_s}$ and $\bar{\phi}_k \triangleq \frac{\alpha_k \sigma_k^2 z}{p_w}$. With (28) and (29), we have the following proposition.
Proposition 3.4: The ergodic rate for user $U_k$ operating with OMA-Type-II, $k \in \{1, 2\}$, under partial CSIT is likewise obtained in closed form.
Proof: Please refer to Appendix D.
Thanks to the results in (27)

IV. OPTIMUM DELAY-LIMITED TRANSMISSION
In delay-limited scenarios, each user attempts to maintain its prescribed rate in as many fading states as possible so as to reduce its outage probability defined in (6)/(8).
When the users compete for power and/or time/frequency resources to get their intended data transmitted at the target rate in each fading state, the combined effects of the outage probability and the individual target rate account for the DLT of each user, which renders the sum-DLT maximization non-trivial. In this section, the optimal trade-offs between the system sum-DLT and the maximum outage probability requirement for the users are investigated for different multiple access schemes under full and partial CSIT, respectively.

A. Full CSIT
In the case of full CSIT, we aim to maximize the system sum of DLT by jointly optimizing the individual transmit power as well as the time/frequency allocation over different fading states, subject to a given pair of APC and PPC at the BS, and a maximum user outage probability constraint. The optimization problem (P2-XX) is thus formulated in (31), where the $\{\alpha_k(\nu)\}$'s are only valid when the two users access the channel by OMA-Type-II.
It is worth noting that given the same target rate intended for each user, i.e., $\bar{R}_k = \bar{R}_{\bar{k}} = \bar{R}$, even if $U_k$ and $U_{\bar{k}}$ suffer from the "near-far" effect, the far user can still successfully decode its data at this constant rate in more than a $1 - \bar{\zeta}$ proportion of the fading states, thanks to the constraints (31d). As seen from (7) ((9)), the discrete values of the $X_k^{\rm NOMA}(\nu)$ ($X_k^{\rm OMA\text{-}II}(\nu)$)'s render the problem non-convex w.r.t. the optimization variables $p_k(\nu)$, $k \in \{1, 2\}$, and thus Problem (P2-XX) is also non-convex. Therefore, we exploit the aforementioned "time-sharing" condition to find asymptotically optimal solutions in subsections IV-A1 and IV-A2, respectively.

1) Optimal Solution to (P2-NOMA): Adopting the Lagrangian dual decomposition method, the Lagrangian of Problem (P2-NOMA) is given by (32), where $\lambda$ is the Lagrangian multiplier associated with the APC; $\delta$ and $\mu$ are those associated with the maximum user outage probability constraints given in (31d) for $U_k$ and $U_{\bar{k}}$, respectively.
In line with the principle of dual decomposition, (32) can be maximized by decoupling it into independent subproblems each for one fading state and solving those subproblems in parallel.
With the fading index $\nu$ safely dropped, given the dual variables' triple $(\lambda, \delta, \mu)$, the following problem is typical of the subproblems sharing the same structure:
(P2-NOMA-sub): Minimize $\bar{L}_2^{\rm NOMA}(p_k, p_{\bar{k}})$ subject to $p_k + p_{\bar{k}} \le \hat{P}$.
Then, we investigate the possible combinations of outage occurrences for $U_k$ and $U_{\bar{k}}$. Assuming $g_k > g_{\bar{k}}$, $k \in \{1, 2\}$, the possible combinations of the indicator function $X_k^{\rm NOMA}$ and the corresponding decoding strategies adopted by $U_k$ are summarized in Table I, where $U_{\bar{k}} \to U_{\bar{k}}$ represents that $U_{\bar{k}}$ directly decodes its own information treating interference as noise; $U_k \to U_{\bar{k}} \to U_k$ denotes $U_k$'s attempt to perform SIC; and $\nrightarrow$ indicates failure of decoding. Specifically, if the first step succeeds, $U_k$ is able to cancel the interference from $U_{\bar{k}}$; otherwise $U_k$ continues to decode its own message treating $U_{\bar{k}}$'s as interference. Based on Table I, we derive the optimal solution to (P2-NOMA-sub) in the following proposition.
Proposition 4.1: The optimal power allocation to Problem (P2-NOMA-sub) assuming $g_k > g_{\bar{k}}$ is expressed in terms of the $p_{i,k}$'s and $p_{i,\bar{k}}$'s and the indicator function $\mathbb{1}(\cdot)$, which is defined the same as in (22).
Proof: To minimize $\bar{L}_2^{\rm NOMA}(p_k, p_{\bar{k}})$, we need to examine every combination of $U_k$'s and $U_{\bar{k}}$'s outage occurrences so as to find the one that minimizes $\bar{L}_2^{\rm NOMA}(p_k, p_{\bar{k}})$. First, we show that the cases I.C and I.D can be safely removed since they are always outperformed by other cases. Take I.C as an example: if $U_k$ succeeds in decoding $U_{\bar{k}}$'s message at rate $\bar{R}_{\bar{k}}$, it implicitly suggests that $U_{\bar{k}}$'s message is transmitted at $p_{\bar{k}} > 0$. Therefore, the corresponding $\bar{L}_2^{\rm NOMA}(p_k, p_{\bar{k}}) = \bar{R}_k + \bar{R}_{\bar{k}} + \lambda p_{\bar{k}} + \delta + \mu$ is strictly larger than $\bar{L}_2^{\rm NOMA}(0, p_{\bar{k}}) = \bar{R}_k + \bar{R}_{\bar{k}} + \delta + \mu$
2) Optimal Solution to (P2-OMA-II): In this section, we aim to solve (P2-OMA-II).
Despite its non-convexity, which is due to the same reason as that for (P2-NOMA), we can still find an asymptotically optimal solution to (P2-OMA-II) thanks to the "time-sharing" condition. The associated subproblem in one fading state is
(P2-OMA-II-sub): Minimize $\bar{L}_2^{\rm OMA\text{-}II}(p_k, p_{\bar{k}}, \alpha_k)$ subject to $p_k + p_{\bar{k}} \le \hat{P}$,
where the objective function is defined as $\bar{L}_2^{\rm OMA\text{-}II}(p_k, p_{\bar{k}}, \alpha_k) = \bar{R}_k X_k^{\rm OMA\text{-}II} + \bar{R}_{\bar{k}} X_{\bar{k}}^{\rm OMA\text{-}II} + \lambda(p_k + p_{\bar{k}}) + \delta X_k^{\rm OMA\text{-}II} + \mu X_{\bar{k}}^{\rm OMA\text{-}II}$, with the fading index $\nu$ dropped for brevity.
Since each U k only needs to decode its own information without seeing interference in the orthogonal transmission, the possible combinations of outage occurrences for U k and Uk are easily shown in Table II, where U k → U k denotes U k 's direct decoding of its own message, k ∈ {1, 2}. Based on Table II, we obtain the optimal solution to (P2-OMA-II-sub) in the following proposition.

Proposition 4.2:
The optimal power allocation to Problem (P2-OMA-II-sub) is expressed in terms of the $p_{i,k}$'s and $p_{i,\bar{k}}$'s, which are given by (36). In (36), $\alpha_k^*$, which denotes the optimum proportion of time/frequency resource allocated to $U_k$ to minimize the total transmit power at one particular fading state, is obtained by solving a (convex) problem.
In addition, the indicator function is given by (22).
Proof: Please refer to Appendix E.
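The role of the optimum time/frequency proportion can be illustrated numerically. The sketch below assumes the OMA-Type-II rate form α log2(1 + p g / α), so that supporting a target rate over a fraction α requires the power α(2^{R/α} − 1)/g; it then minimizes the total required power over α by ternary search, which is valid because the total is convex in α. This is an illustration under those assumptions, not the paper's formulation, and the function names are hypothetical.

```python
def oma_required_power(alpha, r1, r2, g1, g2):
    """Total power needed so that both users meet their target rates."""
    # User 1 holds a fraction alpha of the resource, user 2 the remainder.
    p1 = alpha * (2.0 ** (r1 / alpha) - 1.0) / g1
    p2 = (1.0 - alpha) * (2.0 ** (r2 / (1.0 - alpha)) - 1.0) / g2
    return p1 + p2


def optimal_alpha(r1, r2, g1, g2, iters=100):
    """Ternary search for the fraction that minimizes the total required power."""
    lo, hi = 0.01, 0.99
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if oma_required_power(m1, r1, r2, g1, g2) < oma_required_power(m2, r1, r2, g1, g2):
            hi = m2  # the minimizer lies to the left of m2
        else:
            lo = m1  # the minimizer lies to the right of m1
    return 0.5 * (lo + hi)


alpha_star = optimal_alpha(1.0, 1.0, 4.0, 1.0)
print(alpha_star, oma_required_power(alpha_star, 1.0, 1.0, 4.0, 1.0))
```

When the two users have equal gains and equal target rates the search returns an even split, and with unequal gains it shifts resource toward the weaker user to save power.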
Next, given the same set of $\bar{R}_k$'s, we provide a mathematical proof for the superiority of NOMA over OMA-Type-II in terms of the optimum sum of DLT. To prove so, we first introduce the following lemma.
We review the case that $(p_1^*, p_2^*)$ falls in the last case of (36). Given $(p_1^*, p_2^*)$, we reallocate the power between the two users as $p_1 = \frac{2^{\bar{R}_1} - 1}{g_1}$ and $p_2 = (2^{\bar{R}_2} - 1)\left(\frac{2^{\bar{R}_1} - 1}{g_1} + \frac{1}{g_2}\right)$ assuming $g_1 > g_2$ w.l.o.g., and therefore $X_k^{\rm NOMA} = X_k^{*\rm OMA\text{-}II} = 1$, $\forall k$. This modification is feasible, since $p_1^* + p_2^* \ge p_1 + p_2$ in accordance with Lemma 4.1. (Considering a special case of K = 2, it holds true that min
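The reallocation used in the argument above can be verified with a short numerical check. The sketch computes the two NOMA powers quoted in the text for g_1 > g_2, confirms that both target rates are met with SIC at the stronger user, and compares the total with the power an orthogonal split would need on a coarse grid of α. The OMA power expression assumes the rate form α log2(1 + p g / α), and all names and values are illustrative.

```python
import math


def noma_reallocation(r1, r2, g1, g2):
    """Powers that meet both target rates with NOMA, assuming g1 > g2."""
    p1 = (2.0 ** r1 - 1.0) / g1
    p2 = (2.0 ** r2 - 1.0) * ((2.0 ** r1 - 1.0) / g1 + 1.0 / g2)
    return p1, p2


def oma_power(alpha, r1, r2, g1, g2):
    """Total power an orthogonal split alpha needs for the same target rates."""
    return (alpha * (2.0 ** (r1 / alpha) - 1.0) / g1
            + (1.0 - alpha) * (2.0 ** (r2 / (1.0 - alpha)) - 1.0) / g2)


r1, r2, g1, g2 = 1.0, 1.0, 4.0, 1.0
p1, p2 = noma_reallocation(r1, r2, g1, g2)
# Rate of the weaker user, who treats user 1's signal as interference.
rate_weak = math.log2(1.0 + p2 * g2 / (p1 * g2 + 1.0))
# Rate at which the stronger user can decode the weaker user's message (SIC step).
rate_sic = math.log2(1.0 + p2 * g1 / (p1 * g1 + 1.0))
# Rate of the stronger user after the interference has been cancelled.
rate_strong = math.log2(1.0 + p1 * g1)
oma_min = min(oma_power(a / 100.0, r1, r2, g1, g2) for a in range(5, 96))
print(p1 + p2, oma_min)
```

For these values the NOMA allocation meets both targets with less total power than any orthogonal split on the grid, in line with the inequality the text attributes to Lemma 4.1.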

B. Partial CSIT
Problem (P2-XX) under partial CSIT is accordingly recast as (P2′-XX) in (37), where the $\alpha_k$'s are only valid when "XX" is replaced by OMA-Type-II, and the constraints (37c) keep the maximum user outage probability of the two users below $\bar{\zeta}'$.
In line with the same notation for channel gains as defined in (24) and (25), replace $p_k(\nu)$ with $p_s$ and $p_{\bar{k}}(\nu)$ with $p_w$, $\forall \nu$, when $X > Y$, and the reverse otherwise. As a result, (6) can be recast as (38). With the CDI regarding $X$ and $Y$ given in Section II, the $\mathbb{E}_\nu[X_k'^{\rm NOMA}(\nu)]$'s can be derived based upon (38), as shown in the following proposition.

Proposition 4.3:
The outage probability for NOMA user U k given the prescribed transmit ratē R k , k ∈ {1, 2}, under partial CSIT is given by where ε k,1 In addition, α k,c 1 j and β k,c 1 j , j = 1, 2, 4, are given by α k,c 2 j and β k,c 2 j , j = 1, 2, 4, are given by and α k,c l and β k,c l , l = 3, 4, 5, are given by Proof: Please refer to Appendix G.
In accordance with Proposition 4.3, we are able to derive the sum of the DLT, $\sum_{k} \bar{R}_k (1 - \zeta_k'^{\rm NOMA})$.
In line with the principle of power and time/frequency allocations for OMA-Type-II transmission described above (5), replace $p_k(\nu)$ with $p_s$ and $p_{\bar{k}}(\nu)$ with $p_w$, $\forall \nu$, when $X > Y$, and the reverse when $X \le Y$ in (8). The partial CSIT counterpart of (8), $\zeta_k'^{\rm OMA\text{-}II}$, is derived in (40). We are thus able to derive $\mathbb{E}_\nu[X_k'^{\rm OMA\text{-}II}(\nu)]$ in the following proposition.

Proposition 4.4:
The outage probability for OMA-Type-II user $U_k$ given the prescribed transmit rate $\bar{R}_k$, $k \in \{1, 2\}$, under partial CSIT is expressed in terms of $\phi_{k,1} \triangleq \frac{\alpha_k \sigma_k^2 \xi_k}{p_s}$ and $\phi_{k,2} \triangleq \frac{\alpha_k \sigma_k^2 \xi_k}{p_w}$, in which $\xi_k \triangleq 2^{\bar{R}_k / \alpha_k} - 1$.
Footnote 4: The parameter values preceding and coming after "or" form a pair, respectively.
Proof: With the CDI of $X$ and $Y$ known, the derivation of $\mathbb{E}_\nu[X_k'^{\mathrm{OMA\text{-}II}}(\nu)]$ from (40) is straightforward and thus omitted here for brevity.

V. NUMERICAL RESULTS
In this section, we verify the theoretical analysis for the considered two-user downlink NOMA system via numerical results. As a benchmark, we also provide a suboptimal OMA transmission scheme, referred to as OMA-Type-I, which assigns equal orthogonal resources (time/frequency) to the users over all fading states, i.e., $\alpha_k(\nu) = \frac{1}{2}$, $\forall k$, $\forall \nu$, in (3), and $\alpha_k = \frac{1}{2}$, $\forall k$, in (5). The corresponding optimal power policies for OMA-Type-I are easily seen to be special cases of (P1-OMA-II) ((P2-OMA-II)) and (P1′-OMA-II) ((P2′-OMA-II)), and can thus be obtained accordingly. $U_1$ and $U_2$ are assumed to be located at distances of $d_1$ and $d_2$ from the BS, respectively. The large-scale path loss of the channel is modelled as $128.1 + 37.6\log_{10}(D)$ in dB, where $D$ in kilometers (km) denotes the distance from the BS to the user. The small-scale fading is assumed to be independent and identically distributed.

Fig. 1 depicts the optimal trade-offs between the average sum-rate of the system and the minimum rate constraint under full CSIT, i.e., $\bar{R}$, achieved by NOMA and the OMA schemes with different distance settings. It is worth noting that the horizontal axis in Fig. 1 spans $[0, \bar{R}_{\max}]$, and how to obtain $\bar{R}_{\max}$ has been discussed in Section III-A1. It is seen that with the near-far distance setting, NOMA outperforms OMA-Type-II transmission in most cases, while the gap shrinks both when $\bar{R}$ is very small and when $\bar{R}$ approaches $\bar{R}_{\max}$.
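The large-scale path loss model above can be turned into channel power gains as in the following sketch. The function names are hypothetical, and the small-scale fading is taken to be Rayleigh (unit-mean exponential power gain) purely as an assumption, since the fading distribution and the noise normalization are not specified in this passage.

```python
import math

import numpy as np


def path_loss_db(distance_km):
    """Large-scale path loss in dB: 128.1 + 37.6 * log10(D), with D in km."""
    return 128.1 + 37.6 * math.log10(distance_km)


def sample_channel_power_gains(distance_km, num_states, rng):
    """Draw channel power gains over num_states fading states.

    ASSUMPTION: Rayleigh fading, i.e. a unit-mean exponential power gain,
    scaled by the linear large-scale gain. Noise power is not normalized here.
    """
    large_scale_gain = 10.0 ** (-path_loss_db(distance_km) / 10.0)
    return large_scale_gain * rng.exponential(scale=1.0, size=num_states)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for d_km in (0.5, 1.0):  # illustrative distances only
        gains = sample_channel_power_gains(d_km, 10_000, rng)
        print(d_km, "km:", path_loss_db(d_km), "dB, mean gain", gains.mean())
```

A user at 1 km sees exactly 128.1 dB of path loss under this model, and halving the distance reduces the loss by about 11.3 dB.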

A. Delay-Tolerant Transmission
Moreover, both NOMA and OMA-Type-II achieve substantially larger optimal trade-offs than OMA-Type-I, although OMA-Type-I is seen to be more robust against an increase in $\bar{R}$ until $\bar{R}$ exceeds 0.65 bits/sec/Hz. This is because OMA-Type-I is intrinsically fair, in the sense that equal time/frequency is assigned to each user irrespective of their CSI. It is also worth noting that when there is no difference between the two users in terms of large-scale fading, the average sum-rate versus min-rate trade-offs almost vanish, since the average sum-rate without the minimum rate constraint has already achieved a certain degree of fairness, i.e.,

Fig. 2 shows the optimal trade-offs between the average sum-rate of the system and the minimum rate constraint under partial CSIT, i.e., $\bar{R}'$, achieved by various multiple access schemes under different APCs. As expected, the optimal trade-off regions between the average sum-rate of the system and the fairness enlarge as the limit on the transmit power, $\bar{P}$, increases. While the superiority of the proposed power allocation policies for NOMA over OMA-Type-II is clearly seen, the contrast is more sharply observed for NOMA against OMA-Type-I in Fig. 1.

The individual ergodic rates subject to varied minimum average rate constraints are compared in Fig. 3(a) (Fig. 3(b)) for NOMA, OMA-Type-I, and OMA-Type-II under full (partial) CSIT. First, we see that when there is no minimum rate requirement, OMA-Type-II achieves almost the same ergodic rate for $U_1$ as NOMA, with $U_2$'s ergodic rate being as low as zero in both schemes. This can be intuitively explained as follows. Since $U_1$ is the near user who enjoys better CSI in most of the fading states, the optimal power policy that maximizes the average sum-rate for both NOMA and OMA-Type-II is to allocate power only to $U_1$ in all such states.
Moreover, the advantage of NOMA becomes evident when the system requires a larger $\bar{R}$ ($\bar{R}'$), in that NOMA guarantees the minimum average rate achieved by $U_2$ while keeping $U_1$'s average rate at its maximum.

for each user. It is seen that when the two users suffer from near-far unfairness, the optimum sum of DLT versus max-outage trade-off achieved by NOMA outperforms those achieved by OMA-Type-II and OMA-Type-I. However, this superiority almost disappears when $U_1$ and $U_2$ are both 0.5 km away from the BS. This is because, with the same large-scale fading, $g_1(\nu) \approx g_2(\nu)$ in most of the fading states, so that the difference between $P^*_{O2}$ and $P^*_N$ in Lemma 4.1 with $K = 2$ becomes smaller. This implies that the total amount of transmit power saved by NOMA tends to be negligible. Furthermore, in this case, not much trade-off is seen for the sum of DLT versus user fairness, as the two users have similar chances of being the stronger user, and therefore when $\zeta_k^{\mathrm{XX}}$ (cf. (6) and (8)) is minimized, $\zeta_{\bar{k}}^{\mathrm{XX}}$ is nearly minimized as well, where $(\cdot)^{\mathrm{XX}}$ stands for NOMA or OMA-Type-II.

B. Delay-Limited Transmission
On the other hand, the impact of different $\bar{R}_k$'s on the optimum sum-DLT versus max-outage trade-offs in near-far channel conditions is demonstrated in Fig. 4(b). With the same intended rate $\bar{R}_1 = \bar{R}_2 = 2$ bits/sec/Hz, the optimum trade-off achieved by NOMA outperforms those achieved by the OMA schemes. By contrast, when $\bar{R}_2$ reduces to 0.5 bits/sec/Hz, the trade-off becomes trivial, since in this case the stronger user's advantage in saving power is compromised by its higher target rate. When $\bar{R}_1$ further increases to 4 bits/sec/Hz, the trade-off is seen to reappear, with a sum of DLT superior to that in the other settings of $\bar{R}_1$ and $\bar{R}_2$.
, can be equivalently characterized by their individual outage probability $\zeta_k^{\mathrm{XX}}$ ($\zeta_k'^{\mathrm{XX}}$). The DLT allocation between the two users subject to different maximum permissible outage is reflected by their outage probability allocation in Fig. 6 under full (cf. Fig. 6(a)) and partial (cf. Fig. 6(b)) CSIT, respectively. Under full CSIT, it is seen from Fig. 6(a) that with $\bar{R}_1 = \bar{R}_2 = 2$ bits/sec/Hz and $\bar{P} = 2$ Watt, $U_1$ achieves almost negligible outage while $U_2$ can achieve an outage probability as low as 0.3032 by NOMA. OMA-Type-II follows the same trend unless $U_1$ has to claim more outage states to reserve power for $U_2$'s transmission. OMA-Type-I is the least favourable in the sense that it does not support any outage requirement lower than 0.36 (0.71). The outage probability that $U_1$ sacrifices in order to satisfy a lower maximum outage constraint is larger in the case of partial CSIT than in the case of full CSIT, due to the loss of optimality (only $p_s$, $p_w$, and/or $\alpha_k$ are optimized).
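As a rough illustration of how individual outage probabilities map to the sum of DLT, the sketch below evaluates $\sum_k \bar{R}_k (1 - \zeta_k)$. The helper name is hypothetical, and treating $U_1$'s "almost negligible" outage as exactly zero is an assumption made only for this example.

```python
def sum_dlt(target_rates, outage_probs):
    """Sum of delay-limited throughput: sum over k of R_k * (1 - zeta_k)."""
    if len(target_rates) != len(outage_probs):
        raise ValueError("one outage probability is needed per target rate")
    return sum(r * (1.0 - z) for r, z in zip(target_rates, outage_probs))


if __name__ == "__main__":
    # R_1 = R_2 = 2 bits/sec/Hz; U_2's outage of 0.3032 is the value quoted
    # for NOMA under full CSIT. ASSUMPTION: U_1's outage is set to 0.0.
    print(sum_dlt([2.0, 2.0], [0.0, 0.3032]))  # approximately 3.3936
```

Under this simplification, the quoted NOMA operating point corresponds to a sum of DLT of about 3.39 bits/sec/Hz.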

VI. CONCLUSION
In this paper, we have investigated the maximization of the average sum-rate and/or the sum of DLT for a two-user downlink NOMA system over fading channels, subject to QoS constraints on the worst user performance. Under full CSIT, the non-convex resource allocation problems have been solved using the technique of dual decomposition, leveraging "time-sharing" conditions. Under partial CSIT, the individual ergodic rate and/or outage probability have been characterized in closed form, based on which the optimal power policies have been numerically obtained.
Simulation results have unveiled that the optimal NOMA-based power allocation schemes in general outperform the optimal OMA-based ones in terms of various throughput versus fairness trade-offs, especially when the two users' channels experience contrasting fading gains.

APPENDIX A
As $p_1 + p_2 = \bar{P}$ is fixed, (P1-OMA-II-sub) reduces to the following problem irrespective of $\lambda$: where $\alpha_2 = 1 - \alpha_1$. Note that $=$ has been relaxed into $\le$ in (42a), since it is easy to check that the above problem attains its optimum value when (42a) is active.
Finally, the reason why the jointly stationary point (cf. (19)) cannot be the optimal solution to (P1-OMA-II-sub) has been explained in Remark 3.1.

APPENDIX C
where A ′ Next, denoting $\frac{1}{g_i}$ by $c_i$, $\forall i$, it is easily verified that $0 < c_1 \le c_2 \le \ldots \le c_K$. By applying [14, g i = $P^*_N$, which completes the proof for Lemma 4.1.