On the Energy Efficiency Maximization of NOMA-aided Downlink Networks with Dynamic User Pairing

This study investigates a combined system comprising non-orthogonal multiple access (NOMA) and beamforming in a downlink network. To fully exploit the advantages of NOMA, user (UE) pairing and beamforming design are jointly optimized via a generalized model for UE association, subject to energy efficiency maximization. Owing to the combination of binary variables and nonconvex constraints, the resulting optimization problem belongs to the class of mixed-integer nonconvex programming. An innovative algorithm, integrating the inner-approximation and Dinkelbach methods, is proposed herein to address a nonconvex fractional function. By introducing a pairing matrix and relaxing the binary variables into continuous ones, our approach is capable of reaching an optimal solution, where two arbitrary UEs are optimally paired regardless of geographical or spatial constraints. For practical scenarios, we further propose a robust design to manage the effect of channel estimation errors under settings involving channel uncertainty. Numerical results show that our proposed designs, even with the presence of the imperfect channel state information at the base station, significantly outperform the conventional beamforming and existing pairing schemes.


I. INTRODUCTION
Non-orthogonal multiple access (NOMA) has garnered significant attention from researchers for potential use in future wireless networks, i.e., the sixth generation (6G) wireless networks, owing to its capability of achieving superior system throughput and its compatibility with other systems [1], [2]. Unlike conventional orthogonal multiple access (OMA), NOMA allows users (UEs) to share the same time-frequency resource for data transmission. To remove the inter-user interference, NOMA exploits either power, code or spatial domain, corresponding to the three typical approaches of NOMA: power-domain NOMA [2], [3], code-domain NOMA [4], [5], and spatial-domain NOMA (i.e., multipleinput multiple-output (MIMO), massive MIMO [6]). Herein, we focus on power-domain NOMA, 1 wherein UE signals can 1 Power-domain NOMA is shortly referred to as NOMA in this paper. be superimposed with different power allocation coefficients. In particular, higher power is allocated to UEs with poorer channel gains to perform reliable detection at the receiver. Meanwhile, UEs with better channel gains can suppress the signals of other UEs with poorer channel gains through successive interference cancellation (SIC) techniques before detecting their own signals [7].

A. RELATED WORKS AND MOTIVATION
To fully exploit the advantages of NOMA, the UEs must be grouped/paired efficiently. Therefore, the issue of clustering/pairing UEs has been widely investigated in recent NOMA-related studies. The majority of papers investigated the UE pairing strategy, where each pair consists of exactly two UEs, based on different criteria and under different contexts [8]- [15]. For example, the effect of UE pairing was investigated for a cognitive radio-inspired NOMA (CR-NOMA) system in [8], [9]. Each pair, which includes a primary UE with a poor channel condition and a cognitive UE with a strong channel condition, may share the same spectrum, but different pairs are designed to transmit onto orthogonal channels. In particular, UEs are paired based on the matching theory to improve the system throughput, while satisfying the minimum UE rate requirement [9]. The authors of [10] studied a centralized radio access network (C-RAN) utilizing NOMA in multiple sub-carriers (MSC), where UE pairing and SIC ordering algorithms were proposed to decrease the co-channel interference. For the NOMA-based massive MIMO scenario, the work [11] proposed a joint algorithm considering both UE pairing and pair scheduling for data transmission, wherein the UE pairing is conducted by grouping UEs with similar channel conditions based on their channel norms. The UE pairing strategy for a NOMA broadcast wireless network was considered in [12], [13], in which UE pairs are assigned to different sub-channels. The approach of [12] aims to pair two UEs with the best and worst channel gains in the unpaired UE set until the set is empty, whereas the approach of [13] pairs the two unpaired UEs with the worst channel gains consecutively. In addition, a channel gain based pairing scheme was proposed for a multiple-input single-output (MISO) MSC NOMA system in [14], wherein two UEs with the best and middle channel gains are paired to employ NOMA technique. On the other hand, the distance between UEs and the base station (BS) was used as a criterion of the UE pairing method in [15], [16], where each pair consists of two UEs in one of two disjoint zones, i.e., inner and outer zones.
It is worth to mention that the works in [17]- [21] allocate more than two UEs per NOMA group/cluster instead of making use of UE pairing. In particular, UE clustering algorithms are used to further exploit NOMA benefits for a fullduplex system in [17], for a NOMA-aided massive MIMO systems in [19], [20], and for a NOMA-aided mmWave system in [21]. However, the use of NOMA with a group comprising more than two UEs is typically more challenging to implement than UE pairing algorithms. This is because the virtual partition of cell zones in UE clustering relies on specific practical conditions, such as the cell size, channel conditions, and received power threshold [18], [22].
The beamforming design issue in a NOMA-aided MISO network has received limited attention in the literature [10], [23]- [25]. Specifically, the minorization-maximization algorithm was used for the SE optimization problem in a MISO-NOMA system in [24]. On the other hand, the worst-case achievable sum rate of all users was maximized with a robust beamforming design in [25]. However, the NOMA protocol in those works mainly rely on the SIC ordering without taking into account UE grouping. This motivates us to formulate a generalized multiuser MISO-NOMA framework that combines beamforming and UE pairing.
Since the energy efficiency (EE) is considered as a core criterion for future green cellular networks, the EE performance of NOMA-aided systems have recently attracted attention from the wireless community, e.g., [10], [17], [23], [26], [27]. One of the earliest studies pertaining to EE maximization for a NOMA system was presented in [26], where a suboptimal power allocation scheme was proposed with the assumption of statistical channel state information (CSI) at the transmitter. The authors of [27] investigated a power allocation scheme for EE optimization in a DL NOMA broadcast network. As mentioned above, EE maximization was investigated in [17] via joint sub-channel assignment and power allocation, but for a DL NOMA-aided network using a conventional single-input single-output framework rather than an MU-MISO framework. Besides, a multi-objective optimization problem of both the SE and EE was studied for a downlink (DL) NOMA system in [23]. However, the NOMA protocol in this paper is applied to all UEs without adopting any clustering method, i.e., each UE utilizes SIC to remove the interference caused by the signals of all weaker UEs. This approach is thus difficult to implement due to high complexity and expensive computational cost.
For a NOMA-aided C-RAN network, although the EE maximization problem was investigated via joint sub-carrier assignment and user pairing, as in [10], the QoS requirement for each UE is still left untreated. Meanwhile, a hybrid zero-forcing (ZF) beamforming design was proposed for an MU-MISO system in [28]. Notice that, the ZF beamformer proposed in [28] could only remove the residual interference completely, given that sufficient degree of freedom (DoF) for the ZF scheme is provided, i.e., the number of BS antennas must be at least larger than the number of UE clusters.
To overcome the above DoF limitations of the ZF beamforming and the channel gain restrictions of the greedy-based pairing schemes (as mentioned in previous works, e.g., [11]- [14], [28]), we aim to develop a novel approach that not only provides an optimal solution for UE pairing with a hybrid design of NOMA and beamforming, but achieves a balanced tradeoff between EE performance and complexity. Although there is a plethora of related works on the integration of NOMA and beamforming techniques, we stress that our proposed UE pairing strategy is different from the previous schemes in that it works in more dynamic fashion, where two arbitrary UEs could be optimally paired regardless of geographical or spatial constraints. In addition, by characterizing the UE paring variables with an auxiliary matrix, we reach a general framework for all UE pairing methods. The contributions of this paper are fully described in the following.

B. CONTRIBUTIONS
In this study, we investigate a combined beamforming and UE pairing scheme for a DL NOMA MU-MISO network. Instead of using channel-gain difference for UE pairing as the conventional UE pairing strategy, we propose a novel method to fully exploit the association among UEs for the efficient use of NOMA. The jointly optimized beamforming design for a NOMA-based system results in a mixed-integer nonconvex fractional problem, which cannot be solved using previous approaches. Hence, a combination of relaxation and inner-approximation (IA) methods overlaying the Dinkelbach framework is proposed to provide at least a local optimal solution. Furthermore, the proposed method can be conveniently extended to a robust design if imperfect CSI is available.
Overall, main contributions of this paper are listed as follows: • First, we develop a novel hybrid NOMA-beamforming design, wherein the NOMA beamforming and conventional beamforming methods are simultaneously conceived to exploit the advantages of both technologies. When the NOMA beamforming performance deteriorates, the proposed design automatically switches to the conventional counterpart. Furthermore, the proposed design offers an optimal approach to utilize the entire spectral resource instead of assigning UEs to subchannels.
• Second, we implement the UE pairing strategy in a dynamic UE selection manner, where two arbitrary UEs can be optimally paired in a group, rather than greedy UE pairing based on channel conditions, i.e., channel correlation or geometrical distance, as in [17], [18], [22], [28].
• Third, we feature the binary variables for UE pairing using an upper triangle matrix. This not only reduces the number of binary variables and constraints, but may also yield a generalized framework that is applicable to all UE pairing methods. However, the EE maximization with NOMA-aided beamforming designs under a dynamic UE pairing can be classified as a mixed-integer nonconvex fractional problem, which is highly nontrivial to solve directly. Hence, we transform the problem into a more tractable form by relaxing the binary variables into continuous ones. To address the nonconvexity of the relaxed problem, we jointly employ the IA and Dinkelbach transformation to attain successive convex programming. In the end, a low-complexity iterative algorithm is devised to obtain at least a local optimal point for the original problem.
• Fourth, to achieve a robust design, we further extend the proposed algorithm for EE maximization under channel uncertainty, i.e., with imperfect CSI. In this regard, we exploit a statistical model for estimation error as an additional condition in the original problem formulation.
• Finally, numerical results are compared with existing schemes in terms of EE performance and complexity to verify the effectiveness of our proposed design. Furthermore, we demonstrate that the proposed method provides a fast convergence speed while offering robustness under an imperfect CSI assumption.

C. PAPER ORGANIZATION AND NOTATION
The remainder of this paper is organized as follows. Sections II and III describe the system model and problem formulation of EE maximization, respectively. In Section IV, we propose a low-complexity iterative algorithm for EE maximization using the IA method and Dinkelbach transformation. Section V presents a robust design under channel uncertainty. Numerical results are provided in Section VI, and conclusions are presented in Section VII.
Notation: Uppercase and lowercase letters in bold represent matrices and vectors, respectively. x H denotes the Hermitian transpose of x. C and R are the spaces of all complex and real numbers, respectively. · is the Euclidean norm. |x| and |X| denote the absolute value of a number x and the determinant value of a matrix X, respectively. E{·} represents the expectation and {.} is the real part of a complex number.

A. DATA TRANSMISSION MODEL
We herein consider a DL NOMA network, where a BS equipped with N antennas is located in the center of a cell and serves K single-antenna DL UEs uniformly distributed in the cell, as shown in Fig. 1. The k-th UE, denoted by UE k , k ∈ K {1, 2, ..., K}, is associated with a channel gain vector h k ∈ C N ×1 from the BS. For a NOMA pair, the SIC technique is applied at the stronger UE, i.e., UE with the stronger channel gain, who decodes and then subtracts the message of the weaker UE from the received signal before decoding its own message. To conveniently employ the NOMA technique, we first sort the channel gains of UEs in descending sequence, then assign the indices for UEs based on this order such that the following condition is satisfied 2 : The signal received at UE k can be expressed as where w k ∈ C N ×1 is the beamforming vector from the BS to UE k . x k is the data signal to be transmitted to UE k , where E{|x k | 2 } = 1 is assumed. n k ∼ CN (0, σ 2 k ) represents the additive white Gaussian noise. We begin by assuming that perfect CSI is available at the BS; subsequently, a robust design under channel uncertainty is presented based on the proposed solution.
For decoding operations, the UEs are dynamically paired prior to the messages sent from the BS. We introduce a new binary variable α ,k ∈ {0, 1}, with ∀ , k ∈ K, to perform UE pairing, described as if UE and UE k are paired with SIC being executed at UE , i.e., UE removes the message intended to UE k before decoding its own message, 0, otherwise. ( For notational convenience, we define a matrix α [α ,k ] ,k∈K ∈ {0, 1} K×K . It is clear that when α ,k = 1, the other users are unpaired with UE and UE k . This implies that all other elements on two columns and k and two rows and k must be equal to zero. By setting w = w T 1 w T 2 · · · w T K ∈ C N K×1 , the signalto-interference-plus-noise (SINR) for decoding the UE k message can be expressed as where SINR k,k (w, α) and SINR ,k (w, α) are the corresponding SINRs for decoding UE k 's message at UE k itself and at UE , which are defined as The interference-plus-noise expressions in the denominators of (5a) and (5b) are defined as where k is used to indicate any UE different from UE k , i.e., UE k may precede or succeed UE k , while denotes the index of UEs that precede UE k in K, as defined in (1). For simplicity, we redefine γ k (w, α) SINR k (w, α) and γ k,k (w, α) SINR k,k (w, α). We further define γ ,k (w, α) to be the effective SINR ,k (w, α) for decoding UE k 's message at UE : where a small number ε is used to avoid numerical failure when divided by zero if α ,k = 0, while guaranteeing that γ ,k (w, α) is extremely large in that case.
As illustrated in Fig. 2, assuming that the two UEs UE and UE k ) with h 2 > h k 2 , (i.e., < k) are paired to apply the NOMA technique, we obtain the following binary values α ,k = 1, α k, = 0, and α , = α k,k = 0, ∀ ∈ {K| ≤ − 1}, ∀k ∈ {K|k ≥ k + 1}. It is noteworthy that the symbol × in Fig. 2 denotes the elements with α ,k ∈ {0, 1}, ∀{ = , k = k}. In this case, UE would decode UE k 's message and remove this message from its observation before decoding its own message. Accordingly, the expression in (4) indicates that the SINR for the UE is determined by γ (w, α) = γ , (w, α), since α , = 0 and γ ,k (w, α) → ∞, ∀ ∈ K. Meanwhile, the SINR for UE k becomes γ k (w, α) = min γ k,k (w, α), γ ,k (w, α) owing to the fact that γ ,k (w, α) → ∞, ∀ ∈ K\{ }. Recall that an important criterion for our UE pairing approach is that each UE can be paired with at most one UE. In summary, matrix α satisfies the following linear constraints Remark 1: The arrangement of UEs in the descending channel gain can reduce the number of variables in binary matrix α. In particular, when the UEs are sorted, those with value 1 are always allocated above the main diagonal, based on the SIC technique in a NOMA system. Therefore, the variable matrix α is an upper triangular matrix.

B. POWER CONSUMPTION MODEL
In this paper, power consumption includes the power for data transmission in the form of electromagnetic radiation, operation, signal processing at each UE, and consumption at the circuit [29], [30]. In particular, the total power consumption in our system can be modeled as follows: where P D (w) is the power for data processing at the UE and transmission from the BS to all UEs, and P C is the power consumed by the operation and circuitry.
• First, the power P D (w) for the transmission and processing of data is composed of two parts as [29], [31] The radiated power is the power for transmitting data from the BS to UE k using electromagnetic radiation, with g PA ∈ [0, 1] being the power amplifier (PA) efficiency at the BS [32]. The signal processing power is the power consumed for data processing including the waveform generation, synchronization, and precoding/beamforming vector computation at UE k .
• Second, the power consumption for circuit operation P C can be expressed as [29], [31] where P cir BS and P cir k denote the power consumed by circuit operation at the BS and UE k , respectively [29]. P a BS is the power consumed when the BS is in the active mode, i.e., the power to operate the BS.

C. PROBLEM FORMULATION
From (4), the SE of UE k (in nats/s/Hz) 3 is obtained as Subsequently, the sum SE is expressed as Therefore, the EE (in nats/J), defined as the ratio of the sum throughput (in nats/s) of all UEs to the total power consumption (in W), is expressed as where B is the system bandwidth.
We aim to maximize the EE under the power control and quality-of-service (QoS) requirement for each UE, formulated as Clearly, constraint (16b) guarantees the maximum BS transmit power. Constraints (16d)-(16f) describe the criteria for UE pairing, wherein each UE can pair with at most another UE under the condition of the upper triangular matrix as mentioned in Remark 1. Constraint (16c) ensures the minimum per-UE rate,R k , ∀k ∈ K. Because of the nonconcave objective function (16a), nonconvex constraint (16c), and binary constraints (16d)-(16f), the problem (16) belongs to a mixed-integer nonconvex program; hence, it is generally difficult to solve directly [33], [34].

III. PROPOSED SOLUTION FOR EE MAXIMIZATION
It is observed that the interference-plus-noise terms in (16) contain the strong compounds of binary variables and complex beamforming vectors. Hence, it is highly challenging to transform the problem into popular convex forms, such as semi-definite programming and second-order cone programming. To facilitate problem solving, we first relax the binary variables in (16), hence, constraint (16d) becomes 0 ≤ α k, ≤ 1, ∀k, ∈ K. Subsequently, we introduce a new variable ω k , k ∈ K as a soft SINR for UE k , and apply it to (13) and (14). The sum SE is simplified to with a new constraint imposed by Accordingly, the lower bound of the EE in (15) can be expressed asF where ω {ω k } ∀k∈K . Therefore, the relaxed problem for (16) can be formulated as max w,α,ωF Because the new variable ω is used as an alternative for the SINR, the nonconvex constraint in (16c) is transformed into a linear constraint, as shown in (20d). In addition, the binary constraints are relaxed to the linear forms shown in (16e), (16f) and (20c). However, problem (20) is still nonconvex owing to the nonconcave objective function (20a) and nonconvex constraint (18), which must be approximated to solve it.

A. CONVEXIFICATION FOR FEASIBLE SET OF PROBLEM (17)
Theorem 1: The solution to (20) can be obtained iteratively by solving the following problem at iteration (i + 1): ,k +ε)Θ ,k (w (i) ) and with the predefined matrix A ∈ C N ×K with a being theth column of A. C is a constant scalar, and X ∈ C K×K and Y ∈ C K×K are the matrices of the variables. Proof: Please see Appendix B. It is evident that the feasible set of problem (21) is convex, because it merely contains the linear and second-order-cone (SOC) constraints. However, we must address the nonconcave objective function before deriving a solution to (20).

B. PROPOSED ALGORITHM FOR SOLVING PROBLEM (14)
The throughput in the objective function (19) is easily obtained as in [31,Theorem 3]: where Ω diag([ω 1 , ω 2 , · · · , ω k ]). Subsequently, an auxiliary variable ϑ can be used for power control as the following convex constraint: Hence, the objective function (21a) can be represented as a concave-over-linear form, i.e.,F SE (Ω) ϑ . By applying the Dinkelbach transformation, sub-problem (21) at iteration (i + 1) can be equivalently expressed as max w,α,ω,µ,ϑF where u (i) is the optimal value achieved in the previous iteration. Based on the properties of IA and Dinkelbach methods [35], [36], the optimality of sub-problem (21) and (24) holds, resulting in the optimum of problem (20) at the convergence of an iterative algorithm. However, the feasibility for sub-problem (24) is numerically difficult to obtain at the beginning of the loop in the algorithm, particularly because of the QoS constraint (20d): To solve problem (24), an initial starting point (w (0) , α (0) , ω (0) , µ (0) , ϑ (0) ) is required. Hence, we solve an alternative problem that generates a feasible point for (24): where v = {v k } ∀k∈K is a new slack variable. It can be foreseen that the optimal value V approximately reaches zero when the equalities of constraints (25c) and (25d) hold. This is equivalent to the fact that the QoS constraint in (20d) is satisfied; therefore, the initial feasible point (w (0) , α (0) , ω (0) , µ (0) , ϑ (0) ) can be obtained. When the iterative algorithm for problem (20) terminates, the value of α still belongs to the closed interval [0, 1], i.e., may not be an exact binary value (zero or one). In fact, after a limited number of iterations, the iterative algorithm reaches the convergence when a difference between two consecutive objective values is tolerant. Therefore, the optimal solution with real value of α when solving problem (20) is not a solution for problem (16). To transform α obtained by solving (20) into a binary value, we use a rounding function: where x , x ∈ R returns the maximum integer not larger than x. Subsequently, problem (20) is resolved using the fixed binary value α to obtain the optimal solution for problem (16). The proposed iterative algorithm for solving the EE maximization problem (16) under the perfect CSI assumption at the BS is detailed in Algorithm 1.

1) Convergence analysis
The convergence behavior of the proposed algorithm reflects the properties of the IA method and Dinkelbach transformation [35], [36]. According to [36], the optimum successive convex problem presented in (21) is obtained from that presented in (24). Therefore, problems (21) and (24) have the same optimum and feasible set. Let X = (w, α, ω, µ) and F (i) be the feasible point and feasible set of approximated problem (21) at iteration i, respectively. It is proven for a connected set F (i) [37]. Hence, the program in (21) monotonically improves the objective values for problems (20) and (16) as well as provides at least a local optimal solution satisfying the Karush-Kuhn-Tucker conditions at the convergence. In practice, the algorithm can be numerically terminated at a finite number of iterations when u (i+1) − u (i) < 10 −3 .

2) Complexity analysis
According to [38], the complexity of a convex problem can be measured by the number of constraints and variables, even when the objective function is not linear. As can be seen, problem (24) comprises (5K 2 + K + 2) SOC/linear constraints and (2K 2 + N K + K + 1) variables. Consequently, the per-iteration complexity for solving problem (24) is

IV. ROBUST TRANSMISSION DESIGN
In Section III, the proposed design for EE maximization is analyzed under the ideal assumption of perfect CSI, which is typically difficult to achieve in practice. In real world scenarios, it often happens that the channel is not known a priori at the BS owing to the complex urban environment. The EE optimization of a system that is robust to the imperfect CSI condition has thus attracted attention, e.g., for a MIMO twoway relay network in [39], for a full-duplex multi-user multicell MIMO network in [40], and for a MISO non-orthogonal multiple access system in [28]. For practical purposes, this section develops a robust design for EE optimization of the proposed NOMA-aided downlink network under channel uncertainty, which has not been considered in the literature Algorithm 1 Proposed Algorithm to Solve Problem (16) Phase 1: 1: Initialization: Set i := 0 and solve (25) to generate the initial starting point (w (0) , α (0) , ω (0) , µ (0) , ϑ (0) ). 2: repeat 3: Solve (24) to obtain (w , α , ω , µ , ϑ ).
so far. Since the real channel h k is assumed to be unknown, we first decompose it as [17], [28], [40] h k =ĥ k + ∆h k , whereĥ k and ∆h k are the channel estimate and estimation error, respectively. In this section, the BS is only aware of h k provided by the estimation scheme, whereas the estimation error is undetermined and independent of the channel estimate 4 . Based on the estimation theory, the error can be considered as a random variable that follows a complex Gaussian distribution [41], [42], i.e., ∆h k ∼ CN (0, k I) with the variance k being modeled as where the SNR at UE k is supposed to be ρ k P max BS σ 2 k for the worst-case design, δ ≥ 0 and λ > 0. It is clear that δ = 0 corresponds to the perfect CSI scenario.
The SINR at UE k under channel uncertainty can be rewritten aŝ whereγ k,k (w, α) denotes the SINR for decoding UE k 's message at UE k itself andγ ,k (w, α) stands for the effective SINR for decoding UE k 's message at UE (see (7)), which are expressed asγ whereΨ k (w, α) andΘ ,k (w, α) are defined aŝ Subsequently, the sum SE can be rewritten aŝ Therefore, a robust design problem for EE maximization can be formulated as By utilizing the same steps (17)- (19) to derive (20), the relaxed robust design problem for EE maximization can be expressed as max w,α,ωF s.t.
where τ [τ k ] ∀k∈K is introduced as a new variable, whereas the channel estimatesĤ To derive a solution for the robust design problem (33), we utilize Algorithm 1 with the following modifications. In Step 1, instead of solving (25), the initial starting point is generated by solving the following problem: Subsequently, the convex problem (24) is replaced by (35) in Step 3,and Step 4 is performed to update the hextuple α , ω , µ , τ , ϑ ). The customized algorithm is presented in Algorithm 2.

V. NUMERICAL RESULTS
To evaluate the proposed methods, we consider a DL cellular system comprising a centrally located BS equipped with N antennas and K single-antenna UEs. The Monte Carlo framework is used to investigate the performance of the considered schemes. The channel is assumed to undergo flat fading; the small-scale fading is modeled as a random variable that follows a circularly-symmetric complex normal distribution, i.e., h ∼ CN (0, PL BS,UE I), where PL BS,UE is the path loss (PL) between the BS and UE with distance d BS,UE . Other simulation parameters are shown in Table I, in which the power consumption and PA efficiency parameters are obtained from [29]. The proposed algorithms are To quantify the effectiveness of the proposed scheme, we compare the proposed scheme with the following benchmark schemes: • Exhaustive Search for Pairing (ESP): This scheme considers all possible cases of α to derive a set of subproblems. Subsequently, each sub-problem corresponding to each value of α is solved using the proposed algorithm. Finally, the optimal solution is selected as it provides the best performance among the sub-problems.
• Random Pairing (RaP): In this scheme, the proposed algorithm is utilized to obtain the beamforming vectors for a sub-problem, wherein the UE association matrix α is fixed by generating random values for the upper triangle part of α.
• Beamforming design (BFD): For the conventional BFD, α is set to an all-zero matrix; subsequently, the proposed algorithm is used to compute the power control for beamforming vectors.
Among these three schemes, the ESP method would yield the highest EE, albeit having the highest complexity. Without using UE pairing, the BFD method will provide the lowest EE but require the lowest complexity due to the simple design. RaP is considered as an intermediate solution for the two extreme cases with a performance worse than that of ESP but with a much lower complexity as specified in Section V-A.   NOMA-based systems.

A. EE PERFORMANCE AND COMPLEXITY
In addition to EE performance, computational complexity is a key metric for evaluating the effectiveness of algorithms. In particular, the convergence rate, which must be intuitively investigated via simulations, is typically used as a fundamental criterion of computational complexity. Fig. 4 shows the convergence behavior under a typical channel scenario with different numbers of UEs, e.g., K = 6 and K = 10. It is noteworthy that the ESP comprises many sub-problems, each of which is equivalent to a problem using the RaP scheme. Because the proposed algorithm is designed to generalize the UE pairing methods, it is sufficient to consider the RaP scheme in terms of the number of iterations while omitting the ESP scheme. In general, in Alg. 1, the RaP and BFD schemes require a similar number of iterations to reach the convergence, i.e., approximately 10 iterations. The jumping phenomenon in the convergence behavior of the proposed algorithm is caused by the first step of phase 2, in which the relaxed values of matrix α were recovered into binary ones. Nevertheless, the saturation values for the objective functions are quickly reached after about 10 iterations and remain constant afterwards, providing a fast convergence rate.
Another crucial criterion of computational complexity is the per-iteration complexity in terms of big-O as a function of the number of variables and that of SOC/linear constraints [38]. We present the comparison of Alg. 1 and considered schemes based on this criterion in Table 2. Accordingly, the complexity per iteration for both the ESP (in a sub-problem) and RaP (in a problem/sub-problem) schemes is typically expressed as O((v 2.5 2 (c 2 2 + v 2 ))). Meanwhile, the BFD method has a similar big-O expression; however, it requires a smaller number of constraints, i.e., K 2 + 2K + 2. This confirms that the order of this complexity expression is the same as that of the complexity required by the proposed algorithm, as mentioned in Section III-C. With the presence of a large K, the per-iteration complexities for all the methods above are comparable with each other. However, the ESP method requires a considerably long operation time owing the large number of sub-problems. Meanwhile, the proposed method and other methods merely address an original problem at one time.

B. EFFECT OF QOS REQUIREMENT
In this subsection, we present the effects of the QoS requirement and maximum power budget at the BS on the EE performance. Fig. 5 shows the average EE as a function of the minimum bit rate (or the QoS threshold),R, in bits/s/Hz. As expected, the EE of all methods degrades as the minimum bit rateR increases. For the same QoS threshold, the adaptive capability of the proposed algorithm as compared with the BFD and RaP schemes is confirmed, providing the performance gains of at least 0.5 Mbits/J. Intuitively, it is feasible to obtain a good solution based on Alg. 1 for higher QoS requirements, i.e.,R ∈ {1.8, 2.0}. More interestingly, Fig. 5 shows that the performance of the proposed method is similar to that of the ESP, with a small EE loss of approximately 0.15 Mbits/J.
To further comprehend the EE behavior under changes in the QoS requirement, we plot the CDF ofR, as shown in Fig. 6. It can be observed that the probabilities of obtaining a feasible point for the four considered strategies are inversely proportional toR. The 95-percentile points of the BFD and RaP schemes are achieved at approximatelyR = 1.6 and 2 bits/s/Hz, respectively, rendering it difficult to obtain the feasible points. By contrast, Alg. 1 and the ESP methods have a 95-percentile point around 3 bits/s/Hz. These results clearly reflect the observation in Fig. 5, thereby confirming the advantage of the proposed scheme for the NOMA-based system.   Fig. 7 shows the change in EE when the maximum power budget at the BS, i.e., P max BS , increases. As shown, the proposed method outperforms both the BFD and RaP schemes while providing an EE similar to that obtained using ESP. In particular, compared with the BFD scheme, the performance gain of the RaP scheme at the floor level is approximately 0.4 Mbits/J, which is less than that of the proposed method, i.e., approximately 1.4 Mbits/J. This implies the capability of the proposed approach in obtaining a good solution for NOMAbased systems. In addition, Fig. 7 shows that the EE of all schemes reaches saturated values when the maximum power budget is sufficiently high, i.e., P max BS = 26 dBm. To obtain insights into the EE behavior of the proposed 10  method, Fig. 8 shows the relationship between the sum rate and average total power consumption, W , with respect to different values of minimum bit rates and maximum power budgets at the BS, e.g.,R ∈ [0.2, 2] bits/s/Hz and P max BS ∈ {30, 38} dBm, respectively. It is noteworthy that the power consumption is directly proportional to the minimum bit rate, whereas the sum rate shows a reverse trend. In fact, the higher the power resource, the greater is the degree of freedom provided to satisfy the QoS constraint. This is verified by the observation that the gap between P max BS = 30 dBm and P max BS = 38 dBm increases with the QoS threshold, beginning fromR > 1 bits/s/Hz. WhenR < 1 bits/s/Hz, the values of the sum rate and power consumption in the two cases of P max BS = 30 dBm and P max BS = 38 dBm are similar to each other. Consequently, the EE is unchanged, thereby depicting a sufficient power budget. This result validates the observations from Fig. 7.

D. EFFECT OF NUMBER OF UES
To further investigate the effectiveness of the proposed algorithm, we now evaluate the EE performance of the proposed algorithm with respect to the number of UEs. Since the BFD yields the worst EE performance, as mentioned at the beginning of this section, we therefore consider the RaP method as the baseline for UE pairing. In addition to the Alg. 1 and RaP scheme, we investigate the EE of the following typical greedy-based pairing schemes. 5 • Greedy pairing for channel gain difference (GP-CGD): This refers to the pairing scheme in [12], wherein if the k-th and (K − k + 1)-th UEs are paired, then it corresponds to α k,K−k+1 = 1, 1 ≤ k ≤ K/2 .
• Greedy pairing for dynamic zone boundary (GP-DZB): As in the pairing scheme in [14], the boundary of the inner and outer zones is dynamically determined using the sorted list of channel gain. Accordingly, two sets of UEs are established: the strong-zone set includes the K/2 strongest UEs from the BS, and the weakzone set contains the remaining K − K/2 UEs. Subsequently, the strongest UEs in the strong-zone and weak-zone sets are popped out and paired to each other until the strong-zone set is empty. This results in α k,K− K/2 +k = 1, 1 ≤ k ≤ K/2 . • Consecutive paring (CoP): This refers to the pairing scheme in [13], where two consecutive UEs are paired with each other, and the (2k − 1)-th and (2k)-th UEs are paired based on α 2k−1,2k = 1, 1 ≤ k ≤ K/2 . Fig. 9 illustrates the behavior of the EE of different pairing schemes when the number of UEs K changes, with a fixed value of the minimum bit rate,R = 0.5 bits/s/Hz. It is noteworthy that Alg. 1 is generalized to all pairing schemes; therefore, the three greedy-based methods can be straightforwardly employed by changing the values of matrix α in the proposed algorithm. As shown in Fig. 9, Alg. 1 significantly outperforms the other pairing schemes, even in the case where the degree of freedom is insufficient, i.e., with the number of antennas at the BS N = 4. This gap quickly enlarges when K increases, starting from K ≥ 6. With a high N (N = 8), while the EE of all the five paring schemes increases in proportion to K, the proposed method is still superior to all greedy-based ones, thereby verifying the adaptability and stability of UE pairing with respect to various numbers of UEs in the cell. These simulation results confirm that the proposed method enables a unified UE pairing framework that is applicable for all greedybased schemes as well as random pairing. By solving the joint optimization problem, it provides the best performance among the existing pairing schemes under different network topologies. In addition, as afforded by the RaP method, the GP-based and CoP schemes utilize the pairing matrix provided for computing the beamforming vectors, resulting in an equivalent computational complexity in terms of big-O. This validates the advantages of the proposed algorithm over existing methods for UE pairing.

E. EFFECT OF IMPERFECT CSI
In this subsection, we evaluate the effectiveness of the proposed robust design under a practical scenario in the presence of imperfect CSI. Fig. 10 presents the average EE as a function of the stochastic error level δ, where δ is described in (28) with parameters λ ∈ {0.3, 0.4}. For simplicity, we adopt the three aforementioned schemes, i.e., the ESP, RaP and BFD schemes.
As shown in Fig. 10, the average EE of all four cases with λ = 0.4 is lower than those with λ = 0.3. This shows that the system performance degrades as the accuracy of channel estimation decreases, i.e., with λ = 0.4. Furthermore, this corresponds to a larger estimation error compared with λ = 0.3. Recall that δ = 0 implies perfect CSI cases. Hence, one obtains the same EE values of each scheme at δ = 0 for any value of λ. For each considered scheme, a gap between two cases, i.e., λ = 0.3 (solid lines) and λ = 0.4 (dashed lines), is observed to increase with the growth of δ. In this practical scenario, the proposed scheme performs significantly better than the RaP and BFD schemes, while providing an average EE similar to that of the ESP scheme. Hence, this verifies the robustness of the proposed algorithm against channel uncertainty.

VI. CONCLUSIONS
In this paper, we formulated a generalized framework for a hybrid design of beamforming and NOMA methods in a DL network by constructing a matrix of UE pairing binary variables. In fact, this auxiliary matrix facilitated the optimization problem in terms of EE by reducing the number of binary variables and constraints. Our proposed formulation was devised to yield an optimal UE pairing solution, i.e., it may pair two arbitrary UEs in a cell despite the geographical conditions. Therefore, the entire channel spectrum can be efficiently exploited instead of splitting it into sub-channels as in the conventional method. Given that the EE maximization problem belongs to mixed-integer nonconvex programming, which has scarcely been solved by previous approaches, we thus derived a low-complexity iterative algorithm (Algorithm 1) based on the IA method and Dinkelbach transformation.
Numerical results with realistic parameters were presented to demonstrate the effectiveness of the proposed algorithm. The proposed algorithm achieves a significant EE gain over two benchmark schemes, BFD and RaP, which are the traditional beamforming design and traditional UE pairing for NOMA, respectively, while providing a fast convergence rate and maintaining an acceptable complexity. In addition, the advantages of the proposed algorithm over the existing schemes were validated via simulation results based on the QoS requirement, maximum BS power budget, and number of UEs. For the imperfect CSI scenario, we introduced a robust design developed from the proposed algorithm, namely Algorithm 2, to address a realistic scenario where the channel estimation is inaccurate. An iterative low-complexity algorithm was verified via a stochastic estimation error model to demonstrate its robustness and adaptability over the classical RaP and BFD schemes, while providing an EE performance comparable to the ESP scheme. .

APPENDIX C PROOF OF PROPOSITION 1
To convexify problem (34), we must address the feasible set and objective function. As shown, the feasible set of problem (34) is nonconvex owing to the nonconvex constraint (34c), whereas the objective function performed as a ratio function of sum-log over quadratic form is nonconcave.