Is NOMA Efficient in Multi-Antenna Networks? A Critical Look at Next Generation Multiple Access Techniques

In the past few years, a large body of literature has been created on downlink Non-Orthogonal Multiple Access (NOMA), employing superposition coding and Successive Interference Cancellation (SIC), in multi-antenna wireless networks. Furthermore, the benefits of NOMA over Orthogonal Multiple Access (OMA) have been highlighted. In this paper, we take a critical and fresh look at the downlink Next Generation Multiple Access (NGMA) literature. Instead of contrasting NOMA with OMA, we contrast NOMA with two other multiple access baselines. The first is conventional Multi-User Linear Precoding (MU–LP), as used in Space-Division Multiple Access (SDMA) and multi-user Multiple-Input Multiple-Output (MIMO) in 4G and 5G. The second, called Rate-Splitting Multiple Access (RSMA), is based on multi-antenna Rate-Splitting (RS). It is also a non-orthogonal transmission strategy relying on SIC developed in the past few years in parallel and independently from NOMA. We show that there is some confusion about the benefits of NOMA, and we dispel the associated misconceptions. First, we highlight why NOMA is inefficient in multi-antenna settings based on basic multiplexing gain analysis. We stress that the issue lies in how the NOMA literature, originally developed for single-antenna setups, has been hastily applied to multi-antenna setups, resulting in a misuse of spatial dimensions and therefore loss in multiplexing gains and rate. Second, we show that NOMA incurs a severe multiplexing gain loss despite an increased receiver complexity due to an inefficient use of SIC receivers. Third, we emphasize that much of the merits of NOMA are due to the constant comparison to OMA instead of comparing it to MU–LP and RS baselines. We then expose the pivotal design constraint that multi-antenna NOMA requires one user to fully decode the messages of the other users. 
This design constraint is responsible for the multiplexing gain erosion, rate and spectral efficiency loss, ineffectiveness to serve a large number of users, and inefficient use of SIC receivers in multi-antenna settings. Our analysis and simulation results confirm that NOMA should not be applied blindly to multi-antenna settings, highlight the scenarios where MU–LP outperforms NOMA and vice versa, and demonstrate the inefficiency, performance loss, and complexity disadvantages of NOMA compared to RSMA. The first takeaway message is that, while NOMA is suited for single-antenna settings (as originally intended), it is not efficient in most multi-antenna deployments. The second takeaway message is that another non-orthogonal transmission framework, based on RSMA, exists which fully exploits the multiplexing gain and the benefits of SIC to boost the rate and the number of users to serve in multi-antenna settings and outperforms both NOMA and MU–LP. Indeed, RSMA achieves higher multiplexing gains and rates, serves a larger number of users, is more robust to user deployments, network loads and inaccurate channel state information and has a lower receiver complexity than NOMA. Consequently, RSMA is a promising technology for NGMA and future networks such as 6G and beyond.


I. INTRODUCTION
In contrast to Orthogonal Multiple Access (OMA), which assigns users to orthogonal dimensions (e.g., Time-Division Multiple Access (TDMA) and Frequency-Division Multiple Access (FDMA)), (power-domain) Non-Orthogonal Multiple Access (NOMA) superposes users in the same time-frequency resource and distinguishes them in the power domain [1]-[5]. By doing so, NOMA has been promoted as a solution for 5G and beyond to deal with the vast throughput, access, and quality of service (QoS) requirements that are projected to grow exponentially for the foreseeable future.
In the downlink, NOMA refers to communication schemes where at least one user is forced to fully decode the message(s) of other co-scheduled user(s). This operation is commonly performed through the use of transmit-side superposition coding (SC) and receiver-side Successive Interference Cancellation (SIC) in downlink multi-user communications. Such techniques have been studied for years before being branded with the NOMA terminology. NOMA has indeed been known in the information theory and wireless communications literature for several decades, under the terminology of superposition coding with successive interference cancellation (denoted in short as SC-SIC), as the strategy that achieves (and has been used in achievability proofs for) the capacity region of the Single-Input Single-Output (SISO) (Gaussian) Broadcast Channel (BC) [6]. The superiority of NOMA over OMA was shown in the seminal paper by Cover in 1972. It is indeed well known that the capacity region of the SISO BC (achieved by NOMA) is larger than the rate region achieved by OMA (i.e. contains the achievable rate region of OMA as a subset) [6]- [8]. The use of SIC receivers is a major difference between NOMA and OMA, although it should be mentioned that SIC has also been studied for a long time in the 3G and 4G research phases in the context of interference cancellation and receiver designs [9].
In today's wireless networks, access points commonly employ more than one antenna, which opens the door to multi-antenna processing. The key building block of the downlink of multi-antenna networks is the multi-antenna (Gaussian) BC. Contrary to the SISO BC that is degraded and where users can be ordered based on their channel strengths, the multi-antenna BC is nondegraded and users cannot be ordered based on their channel strengths [7], [10]. This is the reason why SC-SIC/NOMA is not capacity-achieving in this case, and Dirty Paper Coding (DPC) is the only known strategy that achieves the capacity region of the multi-antenna (Gaussian) BC with perfect Channel State Information at the Transmitter (CSIT) [10]. Due to the high computational burden of DPC, linear precoding is often considered the most attractive alternative to simplify the transmitter design [11]-[15]. Interestingly, in a multi-antenna BC, Multi-User Linear Precoding (MU-LP) relying on treating the residual multi-user interference as noise, although suboptimal, is often very useful since the interference can be significantly reduced by spatial precoding. This is the reason why it has received significant attention in the past twenty years and it is the basic principle behind numerous 4G and 5G techniques such as Space-Division Multiple Access (SDMA) and multi-user (potentially massive) Multiple-Input Multiple-Output (MIMO) [15].
In view of the benefits of NOMA over OMA and multi-antenna over single-antenna, numerous attempts have been made in recent years to combine multi-antenna and NOMA schemes [1]-[5], [16]-[39] (and references therein). Although there are a few contributions considering the comparison of NOMA with MU-LP schemes such as Zero-Forcing Beamforming (ZFBF) or DPC [26], [38]-[40], much emphasis is put in the NOMA literature on comparing (single/multi-antenna) NOMA and OMA, and showing that NOMA outperforms OMA. But there is a lack of emphasis in the NOMA literature on contrasting multi-antenna NOMA to other multi-user multi-antenna baselines developed for the multi-antenna BC such as MU-LP (or other forms of multi-user MIMO techniques) and other forms of (power-domain) non-orthogonal transmission strategies such as Rate-Splitting Multiple Access (RSMA) based on multi-antenna Rate-Splitting (RS) [41]. RS designed for the multi-antenna BC also relies on SIC and has been developed in parallel and independently from NOMA [41]-[47]. Such a comparison is essential to assess the benefits and the efficiency of NOMA, since all these communication strategies can be viewed as different achievable schemes for the multi-antenna BC and all aim in their own way for the same objective, namely to meet the throughput, reliability, QoS, and connectivity requirements of beyond-5G multi-antenna wireless networks.
In this paper, we take a critical look at multi-antenna NOMA for the downlink of communication systems and ask ourselves the important question "Is multi-antenna NOMA an efficient strategy?" To answer this question, we go beyond the conventional NOMA vs. OMA comparison, and contrast multi-antenna NOMA with MU-LP and RS-based non-orthogonal transmission strategies. This allows us to highlight some misconceptions and shortcomings of multi-antenna NOMA. Explicitly, we show that in most scenarios the short answer to that question is no, and demonstrate based on first principles and numerical performance evaluations why this is the case. Our discussions and results unveil the scenarios where MU-LP outperforms NOMA and vice versa, and demonstrate that multi-antenna NOMA is inefficient compared to RS. By contrasting multi-antenna NOMA to MU-LP and RS, we show that there is some confusion about multi-antenna NOMA and its merits, expose major misconceptions and reveal new insights. The contributions of this paper are summarized as follows.
First, we analytically derive both the sum multiplexing gain as well as the max-min fair multiplexing gain of multi-antenna NOMA and compare them to those of MU-LP and RS. The scenarios considered are very general and include a multi-antenna transmitter with single-antenna receivers, perfect and imperfect CSIT, in underloaded and overloaded regimes. On the one hand, multi-antenna NOMA can achieve gains, but can also incur losses compared to MU-LP. On the other hand, multi-antenna NOMA always leads to a waste of multiplexing gain compared to RS. The multiplexing gain analysis provides a firm theoretical ground to infer that multi-antenna NOMA is not as efficient as RS in exploiting the spatial dimensions and the available CSIT. This analysis is instrumental to identify the scenarios where the multiplexing gain gaps among NOMA, MU-LP, and RS are the smallest/largest, therefore highlighting deployments that are suitable/unsuitable for the different multiple access strategies.
Second, we show that multi-antenna NOMA leads to a high receiver complexity due to the inefficient use of SIC. For instance, we show that the higher the number of SIC operations (and therefore the higher the receiver complexity) in multi-antenna NOMA, the lower the sum multiplexing gain (and therefore the lower the sum-rate at high SNR). Comparison with MU-LP and RS shows that higher multiplexing gains can be achieved at a lower receiver complexity and with a reduced number of SIC operations.
Third, we show that most of the misconceptions behind NOMA are due to the prevalent comparison to OMA instead of comparing to MU-LP and RS. We show and explain that the misconceptions, the multiplexing gain reduction, and the inefficient use of SIC receivers in both underloaded and overloaded multi-antenna settings relying on both perfect and imperfect CSIT originate from a limitation of the multi-antenna NOMA design philosophy, namely that one user is forced to fully decode the messages of the other users. Hence, while forcing a user to fully decode the messages of the other users is an efficient approach in the single-antenna degraded BC, it may not be an efficient approach in multi-antenna networks.
Fourth, we stress that an efficient design of non-orthogonal transmission and multiple access strategies ensures that the use of SIC never leads to a performance loss but rather leads to a performance gain over MU-LP. We show that such non-orthogonal solutions based on RS exist and truly benefit from the multi-antenna multiplexing gain and from the use of SIC receivers in both underloaded and overloaded regimes relying on perfect and imperfect CSIT. In fact, multi-antenna RS completely resolves the design limitations of multi-antenna NOMA.
Fifth, we depart from the multiplexing gain analysis and design the transmit precoders to maximize the sum-rate and max-min rate for multi-antenna NOMA, followed by numerically comparing the sum-rate and the max-min fair rate of NOMA to those of MU-LP and RS. We show that the multiplexing gain analysis is accurate and instrumental to predict the rate performance of the multiple access strategies considered.
Sixth, our numerical simulations confirm the inefficiency of multi-antenna NOMA in general settings. Multi-antenna NOMA is shown to lead to performance gains over MU-LP in some settings but also to losses in other settings despite the use of SIC receivers and a higher receiver complexity. Our results also highlight the significant benefits, performance-wise and receiver complexity-wise, of RSMA and multi-antenna RS over multi-antenna NOMA. It is indeed possible to achieve a significantly better performance than MU-LP and NOMA with just one layer of SIC by adopting RS so as to partially decode messages of other users (instead of fully decoding them as in NOMA).
Organization: The remainder of this paper is organized as follows. Section II introduces two-user Multiple-Input Single-Output (MISO) NOMA (with single-antenna receivers) as a basic building block (and toy example) for our subsequent studies, compares to MU-LP, and raises some questions about the efficiency of NOMA. Section III studies the multiplexing gain of K-user MISO NOMA with perfect CSIT. Section IV extends the discussion to imperfect CSIT. Section V and Section VI study the multiplexing gains of the baseline schemes considered, namely MU-LP and RS, respectively. Section VII compares the multiplexing gains of all multiple access schemes considered and exposes the misconceptions and shortcomings of multi-antenna NOMA. Section VIII provides simulation results. Section IX concludes this paper. An overview of the paper is illustrated in Table I.
Notation: $|\cdot|$ refers to the absolute value of a scalar or to the cardinality of a set depending on the context. $\|\cdot\|$ refers to the $\ell_2$-norm of a vector. $\max\{a_1, \ldots, a_n\}$ refers to the maximum value among $a_1$ to $a_n$. $\mathbf{a}^H$ denotes the Hermitian transpose of vector $\mathbf{a}$. $\mathrm{Tr}(\mathbf{Q})$ refers to the trace of matrix $\mathbf{Q}$. $\mathbf{I}$ is the identity matrix. $P \nearrow$ means as $P$ grows large. $\mathcal{CN}(0, \sigma^2)$ denotes the circularly symmetric complex Gaussian distribution with zero mean and variance $\sigma^2$. $\sim$ stands for "distributed as". $O(\cdot)$ refers to the big O notation. $\mathbb{E}[\cdot]$ denotes statistical expectation. $A \cap B$ and $A \cup B$ refer to the intersection ($A$ and $B$ have to be satisfied) and the union ($A$ or $B$ to be satisfied) of two sets/events $A$ and $B$, respectively.
II. TWO-USER MISO NOMA WITH PERFECT CSIT: THE BASIC BUILDING BLOCK
We commence by studying two-user MISO NOMA and show that, by comparing NOMA to MU-LP instead of to OMA, the potential merits of NOMA are less obvious. Limited to two single-antenna users with perfect CSIT, this system model illustrates the simplest though fundamental building block of multi-antenna NOMA.
Fig. 1: Two-user system architecture with NOMA (decoding order: user-2→user-1).

A. System Model
We consider a downlink single-cell multi-user multi-antenna scenario with $K = 2$ users, also known as the two-user MISO BC, consisting of one transmitter with $M \ge 2$ antennas communicating with two single-antenna users. The transmitter aims to transmit simultaneously two messages $W_1$ and $W_2$ intended for user-1 and user-2, respectively.
The transmitter adopts the so-called multi-antenna NOMA or MISO NOMA strategy, illustrated in Fig. 1, that encodes one of the two messages using a codebook shared by both users$^{2}$ so that it can be decoded and cancelled from the received signal at the co-scheduled user (following the same principle as superposition coding for the degraded BC). Consider that $W_2$ is encoded into $s_2$ using the shared codebook and $W_1$ is encoded into $s_1$. The two streams are then linearly precoded by $M \times 1$ precoders$^{3}$ $\mathbf{p}_1$ and $\mathbf{p}_2$ and superposed at the transmitter so that the transmit signal is given by
$$\mathbf{x} = \mathbf{p}_1 s_1 + \mathbf{p}_2 s_2. \tag{1}$$
Defining $\mathbf{s} = [s_1, s_2]^T$ and assuming that $\mathbb{E}[\mathbf{s}\mathbf{s}^H] = \mathbf{I}$, the average transmit (sum) power constraint is written as $P_1 + P_2 \le P$, where $P_k = \|\mathbf{p}_k\|^2$ with $k = 1, 2$.
The channel vector for user-$k$ is denoted by $\mathbf{h}_k$, and the received signal at user-$k$ can be written as $y_k = \mathbf{h}_k^H \mathbf{x} + n_k$, $k = 1, 2$, where $n_k \sim \mathcal{CN}(0, 1)$ is Additive White Gaussian Noise (AWGN). We assume perfect CSIT and perfect Channel State Information at the Receivers (CSIR).
At both users, stream $s_2$ is decoded first into$^{4}$ $\hat{W}_2$ by treating the interference from $s_1$ as noise. Using SIC at user-1, $\hat{W}_2$ is re-encoded, precoded, and subtracted from the received signal, such that user-1 can decode its stream $s_1$ into $\hat{W}_1$. Assuming proper Gaussian signalling and perfect SIC$^{5}$, the achievable rates of the two streams with MISO NOMA are given by
$$R_1^{(N)} = \log_2\left(1 + |\mathbf{h}_1^H \mathbf{p}_1|^2\right), \tag{2}$$
$$R_2^{(N)} = \min\left(\log_2(1 + A),\ \log_2(1 + B)\right), \tag{3}$$
where
$$A = \frac{|\mathbf{h}_1^H \mathbf{p}_2|^2}{1 + |\mathbf{h}_1^H \mathbf{p}_1|^2}, \qquad B = \frac{|\mathbf{h}_2^H \mathbf{p}_2|^2}{1 + |\mathbf{h}_2^H \mathbf{p}_1|^2}. \tag{4}$$
In (3), $\log_2(1 + A)$ is the rate supportable by the channel of user-1 when user-1 decodes $s_2$ and treats its own stream $s_1$ as noise. Similarly, $\log_2(1 + B)$ is the rate supportable by the channel of user-2 when user-2 decodes its own stream $s_2$ while treating stream $s_1$ of user-1 as noise. The min in (3) is due to the fact that $s_2$, though carrying message $W_2$ intended for user-2, is decoded by both users and is therefore transmitted at a rate decodable by both users.
The most common performance metric of a multi-user system is the sum-rate. In this two-user MISO NOMA system model, the sum-rate is defined as $R_s^{(N)} = R_1^{(N)} + R_2^{(N)}$ and can be upper bounded as
$$R_s^{(N)} \le \log_2\left(1 + |\mathbf{h}_1^H \mathbf{p}_1|^2\right) + \log_2(1 + A) = \log_2\left(1 + |\mathbf{h}_1^H \mathbf{p}_1|^2 + |\mathbf{h}_1^H \mathbf{p}_2|^2\right). \tag{5}$$
It is important to note that (5) can be interpreted as the sum-rate of a two-user multiple access channel (MAC) with a single-antenna receiver. Indeed, user-1 acts as the receiver of a two-user MAC whose effective SISO channels for both links are given by $\mathbf{h}_1^H \mathbf{p}_2$ and $\mathbf{h}_1^H \mathbf{p}_1$, respectively. This observation will be revisited in the next few sections, and will prove very helpful in explaining the performance of multi-antenna NOMA.
$^{2}$ This is not an issue in modern systems since, for example, in an LTE/5G NR system, all codebooks are shared since all users use the same family of modulation and coding schemes (MCS) specified in the standard.
$^{3}$ The precoders $\mathbf{p}_1$ and $\mathbf{p}_2$ can be any vectors that satisfy the power constraint, though the best choice of precoders would depend on the objective function.
$^{4}$ Though not expressed explicitly, $\hat{W}_2$ is receiver dependent since both receivers decode $s_2$ and the same estimate is not necessarily obtained at both receivers. Hence, more rigorously, we could have written $\hat{W}_{2,k}$, $k = 1, 2$, to refer to the estimate at user-$k$. For simplicity of exposition, we have nevertheless opted to drop the index $k$.
$^{5}$ Note there is no error in the SIC operation since the rates are achievable under Gaussian signalling and infinite block length.
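As a minimal numerical sketch (not part of the original analysis), the two-user rates can be evaluated for a given pair of precoders, taking $R_1$ as the interference-free rate of $s_1$ at user-1, $R_2$ as the minimum of the rates at which $s_2$ is decodable at both users, and unit noise power. The channel and precoder values below, as well as the helper names `inner` and `noma_two_user_rates`, are hypothetical and chosen only for illustration.

```python
import math

def inner(h, p):
    """Return h^H p for complex vectors stored as Python lists."""
    return sum(hi.conjugate() * pi for hi, pi in zip(h, p))

def noma_two_user_rates(h1, h2, p1, p2):
    """Two-user MISO NOMA rates for decoding order user-2 -> user-1.

    Returns (R1, R2, bound), where bound is the MAC-type upper bound
    log2(1 + |h1^H p1|^2 + |h1^H p2|^2) on the sum-rate.
    """
    g11 = abs(inner(h1, p1)) ** 2  # stream s1 seen by user-1
    g12 = abs(inner(h1, p2)) ** 2  # stream s2 seen by user-1
    g21 = abs(inner(h2, p1)) ** 2  # stream s1 seen by user-2
    g22 = abs(inner(h2, p2)) ** 2  # stream s2 seen by user-2
    a = g12 / (1.0 + g11)          # SINR of s2 at user-1
    b = g22 / (1.0 + g21)          # SINR of s2 at user-2
    r1 = math.log2(1.0 + g11)
    r2 = min(math.log2(1.0 + a), math.log2(1.0 + b))
    bound = math.log2(1.0 + g11 + g12)
    return r1, r2, bound

# Hypothetical channels and precoders (M = 2 antennas).
h1 = [1.0 + 0.5j, 0.3 - 0.2j]
h2 = [0.2 - 0.1j, 0.4 + 0.6j]
p1 = [1.0 + 0.0j, 0.0 + 0.5j]
p2 = [0.8 + 0.0j, 1.2 - 0.4j]

r1, r2, bound = noma_two_user_rates(h1, h2, p1, p2)
print("R1 =", r1, "R2 =", r2, "sum =", r1 + r2, "bound =", bound)
```

The sum $R_1 + R_2$ never exceeds the returned bound, whatever the precoders, because $R_2$ is capped by the rate at which user-1 can decode $s_2$.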
A drawback of the sum-rate is that it does not capture the concept of rate fairness among the users. Another popular system performance metric is the Max-Min Fair (MMF) rate or symmetric rate, defined as $R_{\mathrm{mmf}}^{(N)} = \min_{k=1,2} R_k^{(N)}$. The MMF metric provides uniformly good quality of service since it aims at maximizing the minimum rate among all users.
Throughout the manuscript, we will focus on the sum-rate and the MMF rate as two very different metrics to assess the system performance. We choose these two metrics as they are commonly used in wireless networks, and in the NOMA literature in particular (see e.g., [17], [21], [27], [29], [30] for the sum-rate and [31], [32], [36], [48], [49] for the MMF rate). They are representative for two very different operational regimes, with the former focusing on high system throughput and the latter on user fairness.
In the sequel, we introduce some useful definitions and then make some observations based on this two-user system model.

B. Definition of Multiplexing Gain
Throughout the manuscript, we will often refer to the multiplexing gain to quantify how well a communication strategy can exploit the available spatial dimensions. We define the multiplexing gain, also referred to as Degrees-of-Freedom (DoF), of user-$k$ achieved with communication strategy$^{8}$ $j$ as
$$d_k^{(j)} = \lim_{P \to \infty} \frac{R_k^{(j)}}{\log_2(P)},$$
and the sum multiplexing gain as
$$d_s^{(j)} = \lim_{P \to \infty} \frac{R_s^{(j)}}{\log_2(P)} = \sum_{k=1}^{K} d_k^{(j)},$$
where $R_s^{(j)} = \sum_{k=1}^{K} R_k^{(j)}$ is the sum-rate. We also define the MMF multiplexing gain as
$$d_{\mathrm{mmf}}^{(j)} = \lim_{P \to \infty} \frac{R_{\mathrm{mmf}}^{(j)}}{\log_2(P)} = \min_{k=1,\ldots,K} d_k^{(j)},$$
where $R_{\mathrm{mmf}}^{(j)} = \min_{k=1,\ldots,K} R_k^{(j)}$ is the MMF rate. $d_k^{(j)}$ is a first-order approximation of the rate of user-$k$ at high Signal-to-Noise Ratio (SNR): it can be viewed as the pre-log factor of the rate of user-$k$ at high SNR and be interpreted as the number or fraction of interference-free stream(s) that can be simultaneously communicated to user-$k$ by employing communication strategy $j$. The larger $d_k^{(j)}$, the faster the rate of user-$k$ increases with the SNR. Hence, ideally a communication strategy should achieve the highest multiplexing gain possible.
The sum multiplexing gain $d_s^{(j)}$ is a first-order approximation of the sum-rate at high SNR, and therefore the pre-log factor of the sum-rate, and can be interpreted as the total number of interference-free data streams that can be simultaneously communicated to all $K$ users by employing communication strategy $j$. In other words, $R_s^{(j)}$ scales as $d_s^{(j)} \log_2(P) + \delta$, where $\delta$ is a term that scales slowly with SNR such that $\lim_{P \to \infty} \frac{\delta}{\log_2(P)} = 0$ (e.g., $O(1)$, $O(\log_2(\log_2(P)))$ or $O(\sqrt{\log_2(P)})$), and the larger $d_s^{(j)}$, the faster the sum-rate increases with the SNR.
The MMF multiplexing gain $d_{\mathrm{mmf}}^{(j)}$, also referred to as symmetric multiplexing gain, corresponds to the maximum multiplexing gain that can be simultaneously achieved by all users, and reflects the pre-log factor of the MMF rate at high SNR. In other words, $R_{\mathrm{mmf}}^{(j)}$ scales as $d_{\mathrm{mmf}}^{(j)} \log_2(P) + \delta$.
Remark 1: Much of the analysis and discussion in this paper emphasizes the (sum and MMF) multiplexing gain as a metric to assess the capability of a strategy to exploit multiple antennas. As is apparent from its definition, the multiplexing gain is an asymptotic metric valid in the limit of high SNR, and hence does not precisely reflect specific finite-SNR rates. Nevertheless, it provides firm theoretical grounds for performance comparisons and has been used in the MIMO literature for two decades [50]. Furthermore, the multiplexing gain also impacts the performance at finite SNRs as shown in numerous papers [43], [44], [51] and in our simulation results in Section VIII. Moreover, it provides deep insights into the performance limits and guides the design of efficient communication strategies, as we will see throughout this paper.
$^{8}$ Throughout this paper, $j$ will be either $N$ for NOMA, $M$ for MU-LP, $R$ for Rate-Splitting, or $\star$ for the information theoretic optimum, i.e., $j \in \{N, M, R, \star\}$.
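To make the notion of a pre-log factor concrete, the following minimal sketch estimates it numerically as the slope of a rate curve against $\log_2(P)$ between two large power levels. The three toy rate functions and the helper name `prelog_slope` are our own illustrative assumptions; they are not rate expressions taken from this paper.

```python
import math

def prelog_slope(rate_fn, p_low_db=60.0, p_high_db=90.0):
    """Estimate the pre-log factor as the slope of rate versus log2(P)."""
    p_low = 10.0 ** (p_low_db / 10.0)
    p_high = 10.0 ** (p_high_db / 10.0)
    delta_rate = rate_fn(p_high) - rate_fn(p_low)
    return delta_rate / (math.log2(p_high) - math.log2(p_low))

def interference_free(p):
    """Toy rate of one interference-free stream: SINR grows as P."""
    return math.log2(1.0 + p)

def interference_limited(p):
    """Toy rate when the interference power also grows as P."""
    return math.log2(1.0 + p / (1.0 + p))

def half_exponent(p):
    """Toy rate when the SINR grows only as P^(1/2)."""
    return math.log2(1.0 + math.sqrt(p))

slope_full = prelog_slope(interference_free)
slope_zero = prelog_slope(interference_limited)
slope_half = prelog_slope(half_exponent)
print(slope_full, slope_zero, slope_half)
```

The three estimates approach 1, 0 and 1/2, respectively, which are the multiplexing gains one would assign to an interference-free stream, an interference-limited stream, and a stream whose SINR scales as $P^{1/2}$.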

C. Discussions
Equations (2) and (3), respectively, suggest that $s_1$ is received interference-free at user-1, and that $s_2$ is always decoded in the presence of interference from $s_1$. As a consequence, MISO NOMA limits the sum multiplexing gain to $d_s^{(N)} = 1$, i.e., the same as OMA. Indeed, the sum-rate bound (5) achieved by this two-user MISO NOMA strategy and user ordering user-2→user-1 can be further upper bounded as
$$R_s^{(N)} \le \log_2\left(1 + \|\mathbf{h}_1\|^2 P\right), \tag{9}$$
where equality is achieved (i.e., the upper bound is tight) by choosing $\mathbf{p}_1 = \sqrt{P}\,\mathbf{h}_1 / \|\mathbf{h}_1\|$ and $\mathbf{p}_2 = \mathbf{0}$. Had we considered the other decoding order where the shared codebook is used to encode $W_1$ and user-2 decodes $s_1$, the roles of user-1 and user-2 in Fig. 1 would have been switched (user-1→user-2) and we would have obtained
$$R_s^{(N)} \le \log_2\left(1 + \|\mathbf{h}_2\|^2 P\right). \tag{10}$$
Hence, the sum-rate of MISO NOMA considering adaptive decoding order is upper bounded as
$$R_s^{(N)} \le \log_2\left(1 + \max_{k=1,2} \|\mathbf{h}_k\|^2 P\right), \tag{11}$$
and the sum-rate is increased by letting the strong user $\arg\max_{k=1,2} \|\mathbf{h}_k\|$ decode the weak user $\arg\min_{k=1,2} \|\mathbf{h}_k\|$. Considering the high SNR regime, (9), (10), (11) all scale at most as $\log_2(P)$, i.e.,
$$\lim_{P \to \infty} \frac{R_s^{(N)}}{\log_2(P)} \le 1, \tag{12}$$
which highlights that the sum multiplexing gain of two-user MISO NOMA (irrespective of the decoding order) is (at most) one, i.e., $d_s^{(N)} = 1$. Moreover, (11) reveals the stronger result that the sum-rate of MISO NOMA is actually no higher than that of OMA for any SNR! This fact is not surprising in the SISO case ($M = 1$) since it is well known that to achieve the sum capacity of the SISO BC, one can simply transmit to the strongest user all the time (i.e., OMA) [52]. The above result shows that this also holds for the two-user MISO NOMA basic building block.
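The step from the MAC-type bound (5) to a bound that depends only on the channel norm and the power budget can be spelled out as follows. This is our reading of the intermediate step, assuming only the Cauchy-Schwarz inequality and the sum power constraint $P_1 + P_2 \le P$ with $P_k = \|\mathbf{p}_k\|^2$.

```latex
\begin{align*}
|\mathbf{h}_1^H \mathbf{p}_1|^2 + |\mathbf{h}_1^H \mathbf{p}_2|^2
  &\le \|\mathbf{h}_1\|^2 \|\mathbf{p}_1\|^2 + \|\mathbf{h}_1\|^2 \|\mathbf{p}_2\|^2
  && \text{(Cauchy--Schwarz)} \\
  &= \|\mathbf{h}_1\|^2 (P_1 + P_2) \le \|\mathbf{h}_1\|^2 P
  && \text{(power constraint)} \\
\Rightarrow \quad
\log_2\!\left(1 + |\mathbf{h}_1^H \mathbf{p}_1|^2 + |\mathbf{h}_1^H \mathbf{p}_2|^2\right)
  &\le \log_2\!\left(1 + \|\mathbf{h}_1\|^2 P\right)
   = \log_2(P) + \log_2\!\left(\|\mathbf{h}_1\|^2 + \tfrac{1}{P}\right).
\end{align*}
```

The last term converges to the constant $\log_2 \|\mathbf{h}_1\|^2$ as $P$ grows large, so the pre-log factor of the bound is one.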
The sum multiplexing gain of one can be further split equally amongst the two users, which leads to an MMF multiplexing gain of the two-user MISO NOMA given by $d_{\mathrm{mmf}}^{(N)} = \frac{1}{2}$. This is achieved by scaling the power allocated to user-1 as $O(P^{1/2})$ and that to user-2 as $O(P)$. In other words, the MMF rate of this two-user MISO NOMA scales at most as $\frac{1}{2} \log_2(P)$ at high SNR.
The above contrasts with the optimal sum multiplexing gain $d_s^{(\star)}$ of the two-user MISO BC, which is equal to 2, i.e., two interference-free streams can be transmitted$^{9}$. This can be achieved by performing conventional MU-LP, illustrated in Fig. 2. Recall the MU-LP system model, where $W_1$ and $W_2$ are independently encoded into streams $s_1$ and $s_2$ and respectively precoded by $\mathbf{p}_1$ and $\mathbf{p}_2$ such that the transmit signal is given by
$$\mathbf{x} = \mathbf{p}_1 s_1 + \mathbf{p}_2 s_2. \tag{13}$$
At the receivers, $y_k = \mathbf{h}_k^H \mathbf{x} + n_k$, $k = 1, 2$, and $s_1$ and $s_2$ are respectively decoded by user-1 and user-2 by treating any residual interference as noise, leading to the MU-LP rates
$$R_1^{(M)} = \log_2(1 + C), \qquad R_2^{(M)} = \log_2(1 + B), \tag{14}$$
with $C = \frac{|\mathbf{h}_1^H \mathbf{p}_1|^2}{1 + |\mathbf{h}_1^H \mathbf{p}_2|^2}$ and $B$ as specified in (4). It is then indeed sufficient$^{10}$ to transmit two streams using uniform power allocation and Zero-Forcing Beamforming (ZFBF), so that $\mathbf{h}_1^H \mathbf{p}_2 = \mathbf{h}_2^H \mathbf{p}_1 = 0$, to reap the sum multiplexing gain $d_s^{(M)} = 2$ and the MMF multiplexing gain $d_{\mathrm{mmf}}^{(M)} = 1$ (i.e., each user gets one full interference-free stream). Indeed, with MU-LP, the sum-rate scales as $2 \log_2(P)$ and the MMF rate as $\log_2(P)$ at high SNR [53]-[55]. Such sum-rate and MMF rate would always strictly outperform those of NOMA (and OMA) at high SNR. Since both OMA and NOMA achieve only half the (sum/MMF) multiplexing gain of MU-LP in the two-user MISO BC considered, it is not clear whether (and under what conditions) multi-antenna NOMA can outperform MU-LP and other forms of multi-user multi-antenna communication strategies, and if it does, whether multi-antenna NOMA is worth the associated increase in receiver complexity.
The above discussion exposes some weaknesses of multi-antenna NOMA and highlights the uncertainty regarding the potential benefits of multi-antenna NOMA. Hence, in the following sections, we derive the multiplexing gains of generalized K-user multi-antenna NOMA, so as to better assess its potential.
Remark 2: It appears from (1) and (13) that the transmit signal vectors for 2-user MISO NOMA and 2-user MU-LP are the same, therefore giving the impression that NOMA is the same as MU-LP. This is obviously incorrect. Recall the major differences in the encoding and the decoding of NOMA and MU-LP:
• Encoding: In NOMA, $W_1$ is encoded into $s_1$ and $W_2$ is encoded into $s_2$ at a rate such that $s_2$ is decodable by both users, while $W_1$ and $W_2$ are independently encoded into streams $s_1$ and $s_2$ in MU-LP.
• Decoding: In NOMA, user-1 decodes $s_1$ and $s_2$, and user-2 decodes $s_2$ by treating $s_1$ as noise. In MU-LP, $s_1$ is decoded by user-1 by treating $s_2$ as noise, and $s_2$ is decoded by user-2 by treating $s_1$ as noise.
Consequently, the rate expressions (2), (3) and (14) are different, which suggests that the best pair of precoders $\mathbf{p}_1$ and $\mathbf{p}_2$ that maximizes a given objective function (e.g., sum-rate, MMF rate, etc.) would be different for NOMA and MU-LP. Choosing $\mathbf{p}_1$ and $\mathbf{p}_2$ according to ZFBF would commonly work reasonably well for MU-LP but would lead to $R_2^{(N)} = 0$ in (3) for NOMA. Nevertheless, the above discussion on the multiplexing gain loss of MISO NOMA always holds, even in the event where MISO NOMA is implemented with the best choice of precoders, since the above analysis for MISO NOMA is based on an upper bound.
$^{9}$ This assumes that the two channel directions are not aligned, or in other words, that the rank of the matrix $[\mathbf{h}_1\ \mathbf{h}_2]$ is equal to 2. Note that this condition is met in practice.
$^{10}$ More complicated precoders can be used to enhance the rate performance, but the sum and MMF multiplexing gains will not improve.
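The contrast between ZFBF-based MU-LP and NOMA can be checked numerically with a minimal sketch for $M = 2$ antennas and unit noise power. The channel values and all function names below are hypothetical; the closed-form null vector used in `zf_direction` is specific to two antennas and is our own illustrative choice, not a precoder design from this paper.

```python
import math

def inner(h, p):
    """Return h^H p for complex vectors stored as Python lists."""
    return sum(hi.conjugate() * pi for hi, pi in zip(h, p))

def zf_direction(h_other):
    """Unit-norm 2x1 vector p satisfying h_other^H p = 0 (M = 2 only)."""
    a, b = h_other
    v = [b.conjugate(), -a.conjugate()]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]

def zfbf_precoders(h1, h2, total_power):
    """ZFBF with uniform power allocation (P/2 per stream)."""
    amp = math.sqrt(total_power / 2.0)
    p1 = [amp * x for x in zf_direction(h2)]  # s1 causes no interference at user-2
    p2 = [amp * x for x in zf_direction(h1)]  # s2 causes no interference at user-1
    return p1, p2

def sinr(h, p_signal, p_interf):
    """SINR of one stream at one user with unit noise power."""
    return abs(inner(h, p_signal)) ** 2 / (1.0 + abs(inner(h, p_interf)) ** 2)

def mu_lp_sum_rate(h1, h2, total_power):
    """MU-LP sum-rate: each user treats the other stream as noise."""
    p1, p2 = zfbf_precoders(h1, h2, total_power)
    return math.log2(1.0 + sinr(h1, p1, p2)) + math.log2(1.0 + sinr(h2, p2, p1))

def noma_rate_user2(h1, h2, total_power):
    """NOMA rate of s2 (decodable at both users) if ZFBF precoders are reused."""
    p1, p2 = zfbf_precoders(h1, h2, total_power)
    return min(math.log2(1.0 + sinr(h1, p2, p1)),
               math.log2(1.0 + sinr(h2, p2, p1)))

# Hypothetical, non-aligned channels.
h1 = [1.0 + 0.5j, 0.3 - 0.2j]
h2 = [0.2 - 0.1j, 0.4 + 0.6j]

low, high = 1e6, 1e9
slope = (mu_lp_sum_rate(h1, h2, high) - mu_lp_sum_rate(h1, h2, low)) / (
    math.log2(high) - math.log2(low))
r2_noma = noma_rate_user2(h1, h2, high)
print("MU-LP sum-rate slope:", slope, "NOMA R2 with ZFBF:", r2_noma)
```

The estimated slope of the MU-LP sum-rate is close to 2, while reusing the same ZFBF precoders for NOMA drives the rate of $s_2$ to (numerically) zero, because user-1 cannot decode a stream that has been nulled in its direction.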

III. K-USER MISO NOMA WITH PERFECT CSIT
We now study K-user MISO NOMA relying on perfect CSIT and derive the sum and MMF multiplexing gains.

A. MISO NOMA System Model
We consider a K-user MISO NOMA scenario where a single transmitter equipped with $M$ transmit antennas serves $K$ single-antenna users indexed by $\mathcal{K} = \{1, \ldots, K\}$. The $K$ users are grouped into $1 \le G < K$ groups$^{11}$ with groups indexed by $\mathcal{G} = \{1, \ldots, G\}$. There are $g$ users per group, i.e., we assume for simplicity that $K = gG$. Users in group $i$ are indexed by $\mathcal{K}_i = \{ig - g + 1, \ldots, ig\}$. Hence, $\mathcal{K} = \bigcup_{i \in \mathcal{G}} \mathcal{K}_i$ and $|\mathcal{K}_i| = g$. Without loss of generality, we assume that users $1, g + 1, 2g + 1, \ldots, K - g + 1$ are the "strong users"$^{12}$ respectively in group 1 to $G$, and perform $g - 1$ layers of SIC to fully decode the messages (and therefore remove interference) from the other $g - 1$ users within the same group. Similarly, the second user in each group (i.e., $ig - g + 2$ in group $i$) performs $g - 2$ layers of SIC to fully decode messages from the other $g - 2$ users within the same group, and so on. The two most popular MISO NOMA strategies employ either $G = 1$ [17]-[20] or $G = K/2$ [23]-[28], but we keep the scenario general here for any value of $1 \le G < K$. The general architecture of MISO NOMA is illustrated in Fig. 3. The two-user building block in Section II can be viewed as a particular setup with $K = 2$ and $G = 1$.
$^{11}$ Note that $1 \le G < K$ is a widely considered option for MISO NOMA in which there exists (at least) one user decoding the message of (at least) one other user in each group. Importantly, $G = K$ corresponds to MU-LP as per Section V and is not a MISO NOMA scheme since all $K$ messages are independently encoded into $K$ streams and residual interference is treated as noise at the receivers, i.e., there is no shared codebook and users therefore do not decode the messages of other users.
$^{12}$ "Strong users" here refer to the users who decode the messages of other users in a group. Given the nondegraded nature of the multi-antenna BC, the strong users do not necessarily have to be the users with the largest channel vector norm. The multiplexing gain analysis is general and holds for any ordering. Nevertheless, following [19], [20], we consider in the simulation section the decoding order in each group to be the ascending order of users' channel strength such that "strong users" refer to the users with the largest channel vector norm respectively in group 1 to $G$.
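The user grouping and the number of SIC layers per user described above can be enumerated with a few lines of code. The sketch below assumes, as in the text, that $K = gG$ and that the users in group $i$ are indexed from $ig - g + 1$ to $ig$; the function names `group_indices` and `sic_layers` are hypothetical.

```python
def group_indices(K, G):
    """Return the user index sets K_i = {ig-g+1, ..., ig} for i = 1, ..., G."""
    if not (1 <= G < K) or K % G != 0:
        raise ValueError("require 1 <= G < K and K divisible by G")
    g = K // G
    return [list(range(i * g - g + 1, i * g + 1)) for i in range(1, G + 1)]

def sic_layers(K, G):
    """Map each user index to the number of SIC layers it performs."""
    g = K // G
    layers = {}
    for group in group_indices(K, G):
        for position, user in enumerate(group):
            # First user in a group is the "strong user" with g-1 layers,
            # the second performs g-2 layers, and so on down to 0.
            layers[user] = g - 1 - position
    return layers

groups = group_indices(6, 2)
layers = sic_layers(6, 2)
print(groups)
print(layers)
```

For $K = 6$ and $G = 2$ (so $g = 3$), users 1 and 4 are the strong users and each performs two layers of SIC, while users 3 and 6 perform none.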
At the transmitter, the messages $W_1$ to $W_K$ intended for user-1 to user-$K$, respectively, are encoded into $s_1$ to $s_K$. However, some of the messages in each group have to be encoded using codebooks shared by a subset of the users in that group so that they can be decoded and cancelled from the received signals at the co-scheduled users in that group. In particular, taking group 1 as an example, $W_2$ to $W_g$ are encoded using codebooks shared with user-1 such that user-1 can decode all of these $g-1$ messages. After encoding, the $K$ streams are linearly precoded by precoders$^{13}$ $p_1$ to $p_K$, where $p_k \in \mathbb{C}^M$ is the precoder of $s_k$, and superposed at the transmitter. The resulting transmit signal is
$$x = \sum_{k=1}^{K} p_k s_k.$$
Defining $s = [s_1, \ldots, s_K]^T$ and assuming that $E[ss^H] = I$, the average transmit power constraint is written as
$$\sum_{k=1}^{K} \|p_k\|^2 \le P.$$
At the receiver side, the signal received at user-$k$ is
$$y_k = h_k^H x + n_k,$$
where $h_k$ is the channel vector$^{14}$ of user-$k$, perfectly known at the transmitter and at that user, and $n_k \sim \mathcal{CN}(0, 1)$ is the AWGN. By employing SIC, user-$j$ in group $i$ (i.e., $j \in \mathcal{K}_i$) decodes the messages of users-$\{k \mid k \ge j, k \in \mathcal{K}_i\}$ within the same user group in a descending order of the user index while treating the interference from users in different groups as noise. Under the assumption of Gaussian signalling and perfect SIC, the rate at user-$j$, $j \in \mathcal{K}_i$, to decode the message of user-$k$, $k \ge j$, $k \in \mathcal{K}_i$, is given by
$$R_{j,k} = \log_2\left(1 + \frac{|h_j^H p_k|^2}{I_{j,k} + I_j + 1}\right),$$
where
$$I_{j,k} = \sum_{m < k,\, m \in \mathcal{K}_i} |h_j^H p_m|^2, \qquad I_j = \sum_{l \ne i} \sum_{m \in \mathcal{K}_l} |h_j^H p_m|^2$$
are the intra-group interference and inter-group interference received at user-$j$, respectively. As the message of user-$k$, $k \in \mathcal{K}_i$, has to be decoded by users-$\{j \mid j \le k, j \in \mathcal{K}_i\}$, to ensure decodability, the rate of user-$k$ should not exceed
$$R_k = \min_{j \le k,\, j \in \mathcal{K}_i} R_{j,k}.$$
In the next subsection, we study the sum multiplexing gain and the MMF multiplexing gain of $K$-user MISO NOMA.
13 Further constraints can be imposed on the precoder design such that the same precoder is used for all users in the same group. This constraint would however further reduce the optimization space and therefore the rate performance.
14 The rank of the matrix $[h_1, \ldots, h_K]$ is assumed equal to $\min\{M, K\}$ for simplicity. Note that this condition is met in practice.
Fig. 3: K-user system architecture with MISO NOMA (containing G user groups and g users within each group).
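As a numerical illustration of the SIC decoding rule described above, the following Python sketch evaluates the per-user rates of MISO NOMA for given channels and precoders. It assumes unit noise power and takes the rate of each message as the minimum over all users that have to decode it. The function names `inner` and `noma_rates`, the 0-based user indexing, and the toy channel values are illustrative choices and are not part of the original system model.

```python
import math

def inner(h, p):
    """Return h^H p for two equal-length sequences of complex numbers."""
    return sum(complex(a).conjugate() * b for a, b in zip(h, p))

def noma_rates(h, p, groups):
    """Per-user MISO NOMA rates in bits/s/Hz, assuming unit noise power.

    h[k], p[k]: channel and precoder of user k (sequences of complex numbers).
    groups: list of user-index lists, each ordered from the strong user
            (which decodes every message of its group) to the weakest user.
    """
    rates = {}
    for gi, group in enumerate(groups):
        # Users of all other groups: their streams are treated as noise.
        others = [m for gj, grp in enumerate(groups) if gj != gi for m in grp]
        for pos, k in enumerate(group):
            per_decoder = []
            # Message k must be decoded by every user up to position pos.
            for j in group[:pos + 1]:
                signal = abs(inner(h[j], p[k])) ** 2
                # Messages decoded earlier (later positions) are cancelled,
                # so only the streams at earlier positions still interfere.
                intra = sum(abs(inner(h[j], p[m])) ** 2 for m in group[:pos])
                inter = sum(abs(inner(h[j], p[m])) ** 2 for m in others)
                per_decoder.append(math.log2(1 + signal / (intra + inter + 1)))
            rates[k] = min(per_decoder)
    return [rates[k] for k in sorted(rates)]

# Toy example (hypothetical values): M = 2 antennas, K = 2 users.
h = [[2 + 0j, 0j], [1 + 0j, 0j]]
p = [[1 + 0j, 0j], [1 + 0j, 0j]]
one_group = noma_rates(h, p, [[0, 1]])     # G = 1: user 0 performs SIC
two_groups = noma_rates(h, p, [[0], [1]])  # one user per group: no SIC
```

With a single group, the rate of the weak user is limited by the weakest decoder of its message, which is the decodability constraint stated above.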

B. Multiplexing Gains
The following proposition provides the sum multiplexing gain of MISO NOMA for perfect CSIT.
Proposition 1: The sum multiplexing gain of $K$-user MISO NOMA with $M$ transmit antennas, $G$ groups of $g = K/G$ users, and perfect CSIT is $d_s^{(N)} = \min(M, G)$.
Proof: The proof is obtained by showing that an upper bound on the sum multiplexing gain is achievable. The upper bound is obtained by applying the MAC argument (used in (5)) to the strong user in each group: since the strong user of a group decodes all $g$ messages of that group, the sum-rate of the group is upper bounded by the sum-rate of the multiple access channel seen by that user. This yields the inequalities (20), (21), and (22) for groups 1, 2, and $G$, respectively.
Note that the left-hand sides of (20), (21), and (22) refer to the sum of the rates of the messages in group 1, 2, and $G$, respectively, but can also be viewed as the total rate to be decoded by user 1, $g + 1$, and $K - g + 1$ (since those users decode all the messages in their respective group). We now notice that the right-hand sides of (20), (21), and (22) scale as $\log_2(P) + \delta$ for large $P$ (following the same argument as in the two-user case). This implies that each group $i$ achieves at most a (group) sum multiplexing gain of one, i.e., at most one interference-free stream can be transmitted to each group. Summing up all inequalities, we obtain in the limit of large $P$ that $d_s^{(N)} \le G$. Since the sum multiplexing gain also cannot exceed the number of transmit antennas $M$, this shows that $d_s^{(N)} \le \min(M, G)$. The upper bound is achievable: it is indeed sufficient to perform ZFBF and transmit $\min(M, G)$ interference-free streams to $\min(M, G)$ of the $G$ "strong users". Combining the upper bound and achievability leads to the conclusion that $d_s^{(N)} = \min(M, G)$. ✷
The following result derives the MMF multiplexing gain of MISO NOMA with perfect CSIT.
Proposition 2: The MMF multiplexing gain of $K$-user MISO NOMA with $M$ transmit antennas, $G$ groups of $g = K/G$ users and perfect CSIT is
$$d_{mmf}^{(N)} = \begin{cases} \frac{1}{g}, & M \ge K - g + 1, \\ 0, & M < K - g + 1. \end{cases}$$
For $G = 1$, i.e., $g = K$, $d_{mmf}^{(N)} = \frac{1}{K}$.
Proof: Consider first $M \ge K - g + 1$. The MMF multiplexing gain is always upper bounded by ignoring the inter-group interference, i.e., the $G$ groups are non-interfering. Following again the MAC argument, the sum multiplexing gain of one in each group can then be further split equally amongst the $g$ users, which leads to an upper bound on the MMF multiplexing gain of $\frac{1}{g}$. Achievability is simply obtained by designing the precoders using ZFBF to eliminate all inter-group interference, and allocating the power similarly to Subsection II-C, i.e., consider group 1 for simplicity, and allocate the power to user $k = 1, \ldots, g$ as $O(P^{k/g})$, which leads to an SINR for user-$k$ scaling as $O(P^{1/g})$ and to an achievable MMF multiplexing gain of $\frac{1}{g}$. For $G = 1$, one can simply allocate the power to user $k = 1, \ldots, K$ as $O(P^{k/K})$, which leads to an achievable MMF multiplexing gain of $\frac{1}{K}$. Let us now consider $M < K - g + 1$. Take $M = K - g$ (any smaller $M$ cannot improve the multiplexing gain). Precoder $p_k$ of any user-$k$ can be made orthogonal to the channels of at most $K - g - 1$ co-scheduled users and will therefore cause interference to at least one user in another group. As a result, the MMF multiplexing gain collapses to 0. ✷
Remark 3: For the MMF multiplexing gain analysis, it should be noted that we consider one-shot transmission schemes with no time-sharing between strategies. This is suitable for systems with rigid scheduling and/or tight latency constraints, and also allows for simpler designs. This assumption is also commonly used in the NOMA literature [31], [32], [36], [48], [49].
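The power allocation used in the achievability argument of Proposition 2 can be checked numerically. The sketch below is a minimal illustration under simplifying assumptions that are not in the original text: unit channel gains, unit noise power, and inter-group interference fully removed by ZFBF. The function name `mmf_gain_estimate` is hypothetical. User $k = 1, \ldots, g$ of a group receives power $P^{k/g}$ (so the total power is $O(P)$), and the ratio of each rate to $\log_2(P)$ is evaluated at a large but finite SNR.

```python
import math

def mmf_gain_estimate(g, snr_db):
    """Estimate the per-user multiplexing gain inside one NOMA group.

    Assumes unit channel gains, unit noise power and no inter-group
    interference; user k = 1..g is allocated power P**(k/g).
    """
    P = 10 ** (snr_db / 10)
    powers = [P ** (k / g) for k in range(1, g + 1)]
    gains = []
    for idx in range(g):
        # Higher-power messages are decoded and cancelled first, so only
        # the lower-power streams remain as interference.
        sinr = powers[idx] / (1 + sum(powers[:idx]))
        gains.append(math.log2(1 + sinr) / math.log2(P))
    return min(gains)

# At very high SNR the estimate approaches 1/g for every user of the group.
estimate_g4 = mmf_gain_estimate(4, 400)
estimate_g2 = mmf_gain_estimate(2, 400)
```

Each SINR scales as $P^{1/g}$ because consecutive power levels differ by a factor $P^{1/g}$, which is why the estimate settles close to $1/g$.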

IV. K -USER MISO NOMA WITH IMPERFECT CSIT
We now go one step further and extend the multiplexing gain analysis to the imperfect CSIT setting. The results in this section therefore generalize the results in the previous section (with perfect CSIT being a particular case of imperfect CSIT). In this section, the achievable rates are defined in the ergodic sense in a standard Shannon theoretic fashion, and the corresponding sum and MMF multiplexing gains are defined similarly to Subsection II-B using ergodic rates. We first introduce the CSIT error model before deriving the multiplexing gains of MISO NOMA relying on imperfect CSIT.

A. CSIT Error Model
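For each user, the transmitter acquires an imperfect estimate of the channel vector $h_k$, denoted as $\hat{h}_k$. The CSIT imperfection is modelled by
$$h_k = \hat{h}_k + \tilde{h}_k,$$
where $\tilde{h}_k$ denotes the corresponding channel estimation error at the transmitter. For compactness, we define $H = [h_1, \ldots, h_K]$, $\hat{H} = [\hat{h}_1, \ldots, \hat{h}_K]$, and $\tilde{H} = [\tilde{h}_1, \ldots, \tilde{h}_K]$, such that $H = \hat{H} + \tilde{H}$. For many CSIT acquisition mechanisms [56], $\hat{h}_k$ and $\tilde{h}_k$ are uncorrelated according to the orthogonality principle [57]. By further assuming that $\hat{h}_k$ and $\tilde{h}_k$ have zero means, we have $\Gamma_k = \hat{\Gamma}_k + \tilde{\Gamma}_k$, where $\Gamma_k$, $\hat{\Gamma}_k$, and $\tilde{\Gamma}_k$ denote the covariance matrices of $h_k$, $\hat{h}_k$, and $\tilde{h}_k$, respectively, based on which we can write $\hat{\Gamma}_k = (1 - \sigma_{e,k}^2)\Gamma_k$ and $\tilde{\Gamma}_k = \sigma_{e,k}^2 \Gamma_k$ for some $\sigma_{e,k}^2 \in [0, 1]$. Note that $\sigma_{e,k}^2$ is the normalized estimation error variance for user-$k$'s CSIT, e.g., $\sigma_{e,k}^2 = 1$ corresponds to no instantaneous CSIT, while $\sigma_{e,k}^2 = 0$ represents perfect instantaneous CSIT.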
For simplicity, we assume identical normalized CSIT error variances for all users, i.e., $\sigma_{e,k}^2 = \sigma_e^2$ for all $k = 1, \ldots, K$. To facilitate the multiplexing gain analysis, we assume that $\sigma_e^2$ scales with SNR as $\sigma_e^2 = P^{-\alpha}$ for some CSIT quality parameter $\alpha \in [0, \infty)$ [42], [43], [55], [58], [59]. This is a convenient and tractable model extensively used in the information theoretic literature that allows us to assess the performance of the system in a wide range of CSIT quality conditions. Indeed, the larger $\alpha$, the faster the CSIT error decreases with the SNR. The two extreme cases, $\alpha = 0$ and $\alpha = \infty$, correspond to no or constant CSIT (i.e., CSIT whose quality does not scale or improve with SNR) and perfect CSIT, respectively. As far as the multiplexing gain analysis is concerned, however, we may truncate the CSIT quality parameter as $\alpha \in [0, 1]$, where $\alpha = 1$ amounts to perfect CSIT in the multiplexing gain sense. The regime $\alpha \in (0, 1)$ corresponds to partial CSIT, resulting from imperfect CSI acquisition. The CSIT quality $\alpha$ can be interpreted in many different ways, but a plausible interpretation of $\alpha$ is related to the number of feedback bits, where $\alpha = 0$ corresponds to a fixed number of feedback bits for all SNRs, $\alpha = \infty$ corresponds to an infinite number of feedback bits, and $0 < \alpha < \infty$ reflects how quickly the number of feedback bits increases with the SNR. As a reference, systems such as 4G and 5G operate with $\alpha = 0$ when limited feedback (or codebook-based feedback) is used to report the CSI, since the number of feedback bits is constant and does not scale with SNR, e.g., 4 bits of CSI feedback in 4G LTE for $M = 4$.

B. Multiplexing Gains
The following result quantifies the sum multiplexing gain of MISO NOMA for imperfect CSIT.
Proposition 3: The sum multiplexing gain of $K$-user MISO NOMA with $M$ transmit antennas, $G$ groups of $g = K/G$ users, and CSIT quality $0 \le \alpha \le 1$ is $d_s^{(N)} = \max(1, \min(M, G)\alpha)$.
Proof: Similarly to the proof of Proposition 1, let us look at the $G$ strong users, since they are the ones who have to decode all messages. We recall that the sum multiplexing gain of group $i$ reflects the multiplexing gain of the total rate to be decoded by the strong user $ig - g + 1$ in group $i$ as a consequence of the fact that this user decodes all $g$ messages in group $i$. Making use of the results of MU-LP in the $G$-user MISO BC with imperfect CSIT [43]$^{15}$, we obtain $d_s^{(N)} \le \max(1, \min(M, G)\alpha)$. The achievability part shows that $d_s^{(N)} \ge \max(1, \min(M, G)\alpha)$. It is indeed sufficient to perform ZFBF and transmit $\min(M, G)$ streams, each at a power level of $P^{\alpha}/\min(M, G)$, to $\min(M, G)$ of the $G$ "strong users". If $\min(M, G)\alpha < 1$, one can simply transmit a single stream (i.e., perform OMA) and reap a sum multiplexing gain of 1. Combining the upper bound and achievability leads to the conclusion that $d_s^{(N)} = \max(1, \min(M, G)\alpha)$. ✷
For $\alpha = 1$ (perfect CSIT from a multiplexing gain perspective), Proposition 3 boils down to the perfect CSIT result in Proposition 1.
The following proposition provides the MMF multiplexing gain of MISO NOMA with imperfect CSIT.
Proposition 4: The MMF multiplexing gain of $K$-user MISO NOMA with $M$ transmit antennas, $G$ groups of $g = K/G$ users, and CSIT quality $0 \le \alpha \le 1$ is
$$d_{mmf}^{(N)} = \begin{cases} \frac{\alpha}{g}, & G > 1 \text{ and } M \ge K - g + 1, \\ 0, & G > 1 \text{ and } M < K - g + 1, \\ \frac{1}{K}, & G = 1. \end{cases}$$
The proof is relegated to Appendix A. It is interesting to note that the sensitivity of the multiplexing gain of MISO NOMA to the CSIT quality $\alpha$ is different for $G > 1$ and $G = 1$. Indeed, the sum and MMF multiplexing gains of MISO NOMA with $G > 1$ decay as $\alpha$ decreases, while the multiplexing gains of MISO NOMA with $G = 1$ are not affected by $\alpha$. This can be interpreted in two different ways. On the one hand, this suggests that MISO NOMA with $G = 1$ is inherently robust to CSIT imperfections since the multiplexing gains are not affected by $\alpha < 1$. On the other hand, this also reveals that MISO NOMA with $G = 1$ is unable to exploit the presence of CSIT since its multiplexing gains are the same as in the absence of CSIT ($\alpha = 0$).
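The different sensitivity to $\alpha$ for $G = 1$ and $G > 1$ can be made concrete with a short Python sketch. The closed forms coded below are assumptions read from Propositions 3 and 4 and the surrounding discussion: a sum multiplexing gain of $\max(1, \min(M, G)\alpha)$, and an MMF multiplexing gain of $1/K$ for $G = 1$, $\alpha/g$ for $G > 1$ with $M \ge K - g + 1$, and $0$ otherwise. The function names are hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def noma_sum_gain(M, G, alpha=1):
    """Sum multiplexing gain max(1, min(M, G) * alpha) of MISO NOMA."""
    return max(Fraction(1), min(M, G) * Fraction(alpha))

def noma_mmf_gain(M, K, G, alpha=1):
    """MMF multiplexing gain of MISO NOMA with G groups of g = K / G users."""
    if K % G != 0:
        raise ValueError("G must divide K")
    g = K // G
    if G == 1:
        return Fraction(1, K)           # independent of M and alpha
    if M >= K - g + 1:
        return Fraction(alpha) / g      # inter-group interference removable
    return Fraction(0)                  # residual inter-group interference

# Sweep the CSIT quality for K = 6 users and M = 5 antennas.
alphas = [Fraction(i, 4) for i in range(5)]
mmf_g1 = [noma_mmf_gain(5, 6, 1, a) for a in alphas]   # G = 1
mmf_g3 = [noma_mmf_gain(5, 6, 3, a) for a in alphas]   # G = 3, g = 2
```

In this sweep the $G = 1$ values stay constant while the $G = 3$ values grow linearly with $\alpha$, matching the two interpretations given above. Setting `alpha=1` recovers the perfect CSIT expressions.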
15 See also Proposition 7.
Fig. 4: K-user system architecture with MU-LP. Receiver architecture is illustrated for user-k though the same applies to other users, i.e., all K users are equipped with a decoder that maps the received signal into an estimated message by treating residual interference as noise.

V. BASELINE SCHEME I: CONVENTIONAL MULTI-USER LINEAR PRECODING
The first baseline to assess the performance of multi-antenna NOMA is conventional Multi-User Linear Precoding. In the sequel, we recall the multiplexing gains achieved by MU-LP.

A. MU-LP System Model
Following Subsection III-A, we consider a $K$-user MISO BC with one transmitter equipped with $M$ transmit antennas and $K$ single-antenna users. As per Fig. 4, the messages $W_1, \ldots, W_K$ respectively for user-1 to user-$K$ are independently encoded into $s_1$ to $s_K$, which are then mapped to the transmit antennas through the precoders $p_1, \ldots, p_K$. The resulting transmit signal is $x = \sum_{k=1}^{K} p_k s_k$. The signal received at user-$k$ is $y_k = h_k^H x + n_k$ with $n_k \sim \mathcal{CN}(0, 1)$. Each user-$k$ directly decodes the intended message $W_k$ by treating the interference from other users as noise. Under the assumption of Gaussian signalling, the rate of user-$k$ for $k \in \mathcal{K}$ is given by
$$R_k = \log_2\left(1 + \frac{|h_k^H p_k|^2}{\sum_{j \ne k} |h_k^H p_j|^2 + 1}\right).$$
The sum-rate of MU-LP is therefore $\sum_{k \in \mathcal{K}} R_k$, and the MMF rate of MU-LP is given as $\min_{k \in \mathcal{K}} R_k$.
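To illustrate the MU-LP model, the following Python sketch builds zero-forcing precoders for the smallest non-trivial case ($M = 2$ antennas, $K = 2$ users) and evaluates the resulting rates when interference is treated as noise. It is a toy example under stated assumptions: unit noise power, an equal power split between the two streams, and hypothetical channel values. The helper names `inner`, `zf_precoders_two_users`, and `mulp_rates` are illustrative only.

```python
import math

def inner(h, p):
    """Return h^H p for two equal-length sequences of complex numbers."""
    return sum(complex(a).conjugate() * b for a, b in zip(h, p))

def zf_precoders_two_users(h1, h2, power):
    """Zero-forcing precoders for M = 2 and K = 2 with an equal power split."""
    def unit_null_vector(h):
        # For a 2-element channel h, v = [conj(h[1]), -conj(h[0])] gives h^H v = 0.
        v = [complex(h[1]).conjugate(), -complex(h[0]).conjugate()]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        return [x / norm for x in v]
    scale = math.sqrt(power / 2)
    p1 = [scale * x for x in unit_null_vector(h2)]  # invisible to user 2
    p2 = [scale * x for x in unit_null_vector(h1)]  # invisible to user 1
    return p1, p2

def mulp_rates(h, p):
    """MU-LP rates in bits/s/Hz: each user treats the other streams as noise."""
    rates = []
    for k, hk in enumerate(h):
        signal = abs(inner(hk, p[k])) ** 2
        interference = sum(abs(inner(hk, p[j])) ** 2
                           for j in range(len(p)) if j != k)
        rates.append(math.log2(1 + signal / (interference + 1)))
    return rates

# Hypothetical channels and a total transmit power of 2.
h1, h2 = [1 + 0j, 0j], [1 + 0j, 1 + 0j]
p1, p2 = zf_precoders_two_users(h1, h2, power=2.0)
rates = mulp_rates([h1, h2], [p1, p2])
sum_rate, mmf_rate = sum(rates), min(rates)
```

Because each precoder lies in the null space of the other user's channel, both streams are interference-free, which is the mechanism behind the multiplexing gains recalled in the next subsections.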

B. Multiplexing Gains with Perfect CSIT
We recall the sum multiplexing gain and the MMF multiplexing gain of MU-LP with perfect CSIT from [54] and [51], respectively.
Proposition 5: The sum multiplexing gain of $K$-user MU-LP with $M$ transmit antennas and perfect CSIT is $d_s^{(M)} = \min(M, K)$.
This result$^{16}$ is simply achieved by choosing the MU-LP precoders based on ZFBF and transmitting $\min(M, K)$ interference-free streams. Note that $\min(M, K)$ is also the optimal$^{17}$ sum multiplexing gain of the $K$-user MISO BC$^{18}$ [54]. In other words, $d_s^{(M)} = d_s^{\star}$.
Proposition 6: The MMF multiplexing gain of the $K$-user MU-LP with $M$ transmit antennas and perfect CSIT is
$$d_{mmf}^{(M)} = \begin{cases} 1, & M \ge K, \\ 0, & M < K. \end{cases}$$
When $M \ge K$, ZFBF can be used to fully eliminate interference. On the other hand, for $M < K$, interference can no longer be eliminated and $d_{mmf}^{(M)}$ collapses, therefore leading to a rate saturation at high SNR.

C. Multiplexing Gains with Imperfect CSIT
We use the CSIT error model introduced in Subsection IV-A. We recall the sum multiplexing gain and the MMF multiplexing gain of MU-LP with imperfect CSIT from [43] and [44], [60], respectively.
Proposition 7: The sum multiplexing gain of the $K$-user MU-LP with $M$ transmit antennas and CSIT quality $0 \le \alpha \le 1$ is $d_s^{(M)} = \max(1, \min(M, K)\alpha)$.
This result is simply achieved by choosing the MU-LP precoders based on ZFBF and transmitting $\min(M, K)$ streams, each with power level $P^{\alpha}/\min(M, K)$. This enables each stream to reap a multiplexing gain of $\alpha$ and therefore a sum multiplexing gain of $\min(M, K)\alpha$. If $\min(M, K)\alpha < 1$, one can simply transmit a single stream (i.e., perform OMA) and reap a sum multiplexing gain of 1.
Comparing Propositions 5 and 7, we note that imperfect CSIT leads to a reduction of the sum multiplexing gain. For $\alpha = 1$ (perfect CSIT in a multiplexing gain sense), Proposition 7 matches Proposition 5. Importantly, in contrast to the $K$-user MISO BC with perfect CSIT, where MU-LP achieves the information theoretic optimal sum multiplexing gain $d_s^{\star}$, in the imperfect CSIT setting, MU-LP does not achieve the information theoretic optimal sum multiplexing gain [43], [58].
Proposition 8: The MMF multiplexing gain of the $K$-user MU-LP with $M$ transmit antennas and CSIT quality $0 \le \alpha \le 1$ is
$$d_{mmf}^{(M)} = \begin{cases} \alpha, & M \ge K, \\ 0, & M < K. \end{cases}$$
This is achieved by performing ZFBF when $M \ge K$. When $M < K$, rate saturation occurs (similarly to the perfect CSIT setting).

VI. BASELINE SCHEME II: RATE-SPLITTING
The second baseline to assess multi-antenna NOMA performance is multi-antenna Rate-Splitting (RS) and Rate-Splitting Multiple Access (RSMA) for the multi-antenna BC [41]-[47]. This literature leverages and extends the concept of RS, originally developed in [61] for the two-user single-antenna interference channel, to design multi-antenna non-orthogonal transmission strategies for the multi-antenna BC.
17 This is easily proved by showing that an upper bound on the sum multiplexing gain is equal to min (M, K), which is the same as the lower bound achieved by MU-LP. The upper bound is obtained by noticing that enabling full cooperation among receivers does not decrease the sum multiplexing gain and leads to an effective point-to-point MIMO channel with M transmit and K receive antennas, which has a sum multiplexing gain of min (M, K).
18 More generally, in MIMO BC, d
Receiver architecture is illustrated for user-k though the same applies to other users.

A. Rate-Splitting System Model
We consider again a MISO BC consisting of one transmitter with $M$ antennas and $K$ single-antenna users. As per Fig. 5, the architecture relies on rate-splitting of messages $W_1$ to $W_K$ intended for user-1 to user-$K$, respectively. To that end, message $W_k$ of user-$k$ is split into a common part $W_{c,k}$ and a private part $W_{p,k}$. The common parts $W_{c,1}, \ldots, W_{c,K}$ of all users are combined into the common message $W_c$, which is encoded into the common stream $s_c$ using a codebook shared by all users. Hence, $s_c$ is a common stream required to be decoded by all users, and contains parts of messages $W_1$ to $W_K$ intended for user-1 to user-$K$, respectively. The private parts $W_{p,1}, \ldots, W_{p,K}$, respectively containing the remaining parts of messages $W_1$ to $W_K$, are independently encoded into the private streams $s_1$ for user-1 to $s_K$ for user-$K$. Out of the $K$ messages, $K + 1$ streams $s_c, s_1, \ldots, s_K$ are therefore created. The streams are linearly precoded such that the transmit signal is given by
$$x = p_c s_c + \sum_{k=1}^{K} p_k s_k.$$
Defining $s = [s_c, s_1, \ldots, s_K]^T$ and assuming that $E[ss^H] = I$, the average transmit power constraint is written as
$$\|p_c\|^2 + \sum_{k=1}^{K} \|p_k\|^2 \le P.$$
At each user-$k$, the common stream $s_c$ is first decoded into $W_c$ by treating the interference from the private streams as noise. Using SIC, $W_c$ is re-encoded, precoded, and subtracted from the received signal, such that user-$k$ can decode its private stream $s_k$ into $W_{p,k}$ by treating the remaining interference from the other private streams as noise. User-$k$ reconstructs the original message by extracting $W_{c,k}$ from $W_c$, and combining $W_{c,k}$ with $W_{p,k}$ into $W_k$. Assuming proper Gaussian signalling, and since the common stream has to be decoded by all users, the rate of the common stream is given by
$$R_c = \min_{k \in \mathcal{K}} \log_2\left(1 + \frac{|h_k^H p_c|^2}{\sum_{j=1}^{K} |h_k^H p_j|^2 + 1}\right).$$
Assuming perfect SIC, the rates of the private streams are obtained as
$$R_k = \log_2\left(1 + \frac{|h_k^H p_k|^2}{\sum_{j \ne k} |h_k^H p_j|^2 + 1}\right).$$
The rate of user-$k$ is given by $R_k + R_{c,k}$, where $R_{c,k}$ is the rate of the common part of the $k$th user's message, i.e., $W_{c,k}$, and it satisfies $\sum_{k=1}^{K} R_{c,k} = R_c$.
The sum-rate is therefore simply written as $R_c + \sum_{k=1}^{K} R_k$. The above RS architecture is called 1-layer RS since it only relies on a single common stream and a single layer of SIC at each user as illustrated in Fig. 5.
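The decoding steps of 1-layer RS described above can be sketched numerically as follows. The Python code assumes unit noise power and perfect SIC, and takes the common rate as the minimum over users because every user has to decode the common stream. The function names `inner` and `rs_rates` and the toy channels and precoders are hypothetical and chosen only for illustration.

```python
import math

def inner(h, p):
    """Return h^H p for two equal-length sequences of complex numbers."""
    return sum(complex(a).conjugate() * b for a, b in zip(h, p))

def rs_rates(h, p_common, p_private):
    """Return (common_rate, private_rates) for 1-layer RS with unit noise power."""
    common_per_user, private = [], []
    for k, hk in enumerate(h):
        rx_private = [abs(inner(hk, pj)) ** 2 for pj in p_private]
        # Step 1: decode the common stream, all private streams act as noise.
        common_sinr = abs(inner(hk, p_common)) ** 2 / (sum(rx_private) + 1)
        common_per_user.append(math.log2(1 + common_sinr))
        # Step 2: after SIC of the common stream, decode the private stream
        # while treating the other private streams as noise.
        interference = sum(v for j, v in enumerate(rx_private) if j != k)
        private.append(math.log2(1 + rx_private[k] / (interference + 1)))
    # The common stream must be decodable by every user.
    return min(common_per_user), private

# Toy example (hypothetical values): orthogonal channels, M = K = 2.
h = [[1 + 0j, 0j], [0j, 1 + 0j]]
p_c = [1 + 0j, 1 + 0j]
p_priv = [[1 + 0j, 0j], [0j, 1 + 0j]]
r_common, r_private = rs_rates(h, p_c, p_priv)
sum_rate = r_common + sum(r_private)
```

Setting `p_common` to the all-zero vector gives a zero common rate and reduces the private rates to the MU-LP rates, which reflects the fact that MU-LP is a particular instance of 1-layer RS.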

B. Multiplexing Gains with Perfect CSIT
We here summarize the sum and MMF multiplexing gains achieved by 1-layer RS with perfect CSIT.
Proposition 9: The sum multiplexing gain of the $K$-user 1-layer RS with $M$ transmit antennas and perfect CSIT is $d_s^{(R)} = \min(M, K)$.
Proof: To achieve this sum multiplexing gain$^{19}$, it is sufficient$^{20}$ to design the private precoders using ZFBF and allocate zero power to the common stream at high SNR. Note that $d_s^{(R)} = d_s^{(M)} = d_s^{\star}$. ✷
Proposition 10: The MMF multiplexing gain of the $K$-user 1-layer RS with $M$ transmit antennas and perfect CSIT is
$$d_{mmf}^{(R)} = \begin{cases} 1, & M \ge K, \\ \frac{1}{1 + K - M}, & M < K. \end{cases}$$
The MMF multiplexing gain of 1-layer RS was derived and proved in [51]$^{21}$, under the same assumption as in Remark 3. Readers are referred to [51] for more details of the proof of Proposition 10.

C. Multiplexing Gains with Imperfect CSIT
Again, we use the CSIT error model introduced in Subsection IV-A. We recall the sum multiplexing gain of RS with imperfect CSIT from [43].
Proposition 11: The sum multiplexing gain of $K$-user 1-layer RS with $M$ transmit antennas and CSIT quality $0 \le \alpha \le 1$ is $d_s^{(R)} = 1 + (\min(M, K) - 1)\alpha$.
The sum multiplexing gain in Proposition 11 is obtained by using random precoding to design $p_c$ with power level $P_c = O(P)$, transmitting $\min(M, K)$ private streams and using ZFBF to design the precoders of those $\min(M, K)$ private streams, each with power level $P_k = O(P^{\alpha})$. From the SINR expressions at the right-hand side of (31), it follows that the received SINR of the common stream at each user scales as $O(P^{1-\alpha})$, leading to the multiplexing gain of $1 - \alpha$ achieved by the common stream $s_c$. By performing ZFBF, the transmitter transmits $\min(M, K)$ interference-free private streams. The received SINR of each private stream scales as $O(P^{\alpha})$, leading to a multiplexing gain of $\alpha$. Hence, we obtain the sum multiplexing gain of $1 + (\min(M, K) - 1)\alpha$.
19 By allocating no power to the common stream, 1-layer RS boils down to MU-LP.
20 More complicated precoders for both the common and private streams can be used to enhance the rate performance, but the multiplexing gain will not improve.
21 The MMF multiplexing gain derived in [51] considers a more complex scenario involving the simultaneous transmission of distinct messages to multiple multicast groups (each message is intended for a group of users), known as multigroup multicasting. By considering the special case where there is a single user per group, we obtain the MMF multiplexing gain of 1-layer RS in this section.
Importantly, for the underloaded regime $M \ge K$, 1-layer RS achieves the information theoretic optimal sum multiplexing gain $d_s^{\star}$ in the imperfect CSIT setting [43], [58]. Hence, 1-layer RS attains the optimal sum multiplexing gain in both perfect CSIT and imperfect CSIT (underloaded regime). Actually, for $M \ge K$, 1-layer RS is optimal, achieving the maximum multiplexing gain region of the underloaded $K$-user MISO BC$^{22}$ with imperfect CSIT [62], [63].
This optimality of RS (including 1-layer RS), shown through multiplexing gain analysis, is very significant since it implies that one cannot find any other scheme achieving a better multiplexing gain region in multi-antenna BC. As a consequence of this optimality, MU-LP and multi-antenna NOMA will always incur a multiplexing gain loss or at best will achieve the same multiplexing gain as RS for both perfect and imperfect CSIT.
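Proposition 12: The MMF multiplexing gain of $K$-user 1-layer RS with $M$ transmit antennas and CSIT quality $0 \le \alpha \le 1$ is
$$d_{mmf}^{(R)} = \begin{cases} \frac{1 + (K - 1)\alpha}{K}, & M \ge K, \\ \frac{1 + (M - 1)\alpha}{K}, & M < K \text{ and } \alpha \le \frac{1}{1 + K - M}, \\ \frac{1}{1 + K - M}, & M < K \text{ and } \alpha > \frac{1}{1 + K - M}. \end{cases}$$
The MMF multiplexing gain of 1-layer RS with imperfect CSIT was derived in [60] (by considering the specific case where there is a single user per group), under the same assumption as in Remark 3. Readers are referred to [60] for more details of the proof of Proposition 12.
This highlights that when $M < K$, the CSIT quality $\alpha$ can be reduced to $\frac{1}{1 + K - M}$ without impacting the MMF multiplexing gain of 1-layer RS.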
Following our discussion of Proposition 11, we know that when $M \ge K$, the respective multiplexing gains of the common stream and each private stream are $1 - \alpha$ and $\alpha$. The MMF multiplexing gain when $M \ge K$ is achieved by evenly sharing the common stream among users: each user obtains the sum of its share $\frac{1 - \alpha}{K}$ of the multiplexing gain of the common stream and the multiplexing gain $\alpha$ of one private stream, yielding $\frac{1 + (K - 1)\alpha}{K}$. When $M < K$, the achievability is obtained by partitioning users into two subsets $\mathcal{K}_1$ and $\mathcal{K}_2$ with set sizes $|\mathcal{K}_1| = M$ and $|\mathcal{K}_2| = K - M$. Users in $\mathcal{K}_1$ are served via the common and private streams while users in $\mathcal{K}_2$ are served using the common stream only. Random precoding and ZFBF are respectively used for the common stream and the private streams with power allocation $P_c = O(P)$ and $P_k = O(P^{\beta})$, $\forall k \in \mathcal{K}_1$. It may be readily shown that the multiplexing gains of the common stream and each private stream are given by $1 - \beta$ and $\min\{\alpha, \beta\}$, respectively. By further introducing a fraction $z \in [0, 1]$ to specify the fraction of the rate of the common stream allocated to the users in $\mathcal{K}_1$, we obtain that the sum multiplexing gains of the common stream for the users in $\mathcal{K}_1$ and $\mathcal{K}_2$ are $z(1 - \beta)$ and $(1 - z)(1 - \beta)$, respectively. By equally dividing these multiplexing gains among the users within each subset, the multiplexing gain of each user in $\mathcal{K}_2$ is $d_{k,2} = \frac{(1 - z)(1 - \beta)}{K - M}$, and the multiplexing gain of each user in $\mathcal{K}_1$ is $d_{k,1} = \min\{\alpha, \beta\} + \frac{z(1 - \beta)}{M}$. The MMF multiplexing gain among the users is $\max_z \min\{d_{k,1}, d_{k,2}\}$. When $\beta = \alpha$, the optimal rate allocation factor $z^{\star}$ is obtained when $d_{k,1} = d_{k,2}$, i.e., $z^{\star} = \frac{M(1 - \alpha(1 + K - M))}{(1 - \alpha)K}$, which is non-negative for $\alpha \le \frac{1}{1 + K - M}$, and the optimal MMF multiplexing gain is $\frac{1 + (M - 1)\alpha}{K}$. When $\beta < \alpha$ and $z = 0$, the optimal power allocation $\beta^{\star}$ is obtained when $\frac{1 - \beta}{K - M} = \beta$. We have $\beta^{\star} = \frac{1}{1 + K - M}$, and the resulting MMF multiplexing gain is $\frac{1}{1 + K - M}$.
22 The optimality of RS is not limited to MISO BC but also extends to MIMO BC. Indeed, a more complicated form of RS is multiplexing gain region-optimal for the two-user MIMO BC with imperfect CSIT in the general case of an asymmetric number of receive antennas [64], [65].
The objective of this section (see Table II) is to identify under which conditions NOMA provides performance gains/losses over the two baselines. We then use these comparisons to reveal several misconceptions and shortcomings of multi-antenna NOMA.
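Before turning to the comparisons, the achievability argument for the MMF multiplexing gain of 1-layer RS with $M < K$ can be cross-checked numerically. The Python sketch below performs an exhaustive search over the power exponent $\beta$ and the common-rate fraction $z$ for the per-user gains $d_{k,1} = \min\{\alpha, \beta\} + z(1 - \beta)/M$ and $d_{k,2} = (1 - z)(1 - \beta)/(K - M)$, and compares the result with a closed form. The closed form coded here is an assumption derived from that argument, namely $(1 + (M - 1)\alpha)/K$ for $\alpha \le 1/(1 + K - M)$ and $1/(1 + K - M)$ otherwise; the function names and the grid resolution are illustrative choices.

```python
from fractions import Fraction

def rs_mmf_gain_closed_form(M, K, alpha):
    """Assumed closed-form MMF multiplexing gain of 1-layer RS."""
    alpha = Fraction(alpha)
    if M >= K:
        return (1 + (K - 1) * alpha) / K
    threshold = Fraction(1, 1 + K - M)
    if alpha <= threshold:
        return (1 + (M - 1) * alpha) / K
    return threshold

def rs_mmf_gain_search(M, K, alpha, steps=60):
    """Grid search over (beta, z) for the overloaded case M < K."""
    alpha = Fraction(alpha)
    best = Fraction(0)
    for b in range(steps + 1):
        beta = Fraction(b, steps)
        for zi in range(steps + 1):
            z = Fraction(zi, steps)
            d1 = min(alpha, beta) + z * (1 - beta) / M   # users in subset K_1
            d2 = (1 - z) * (1 - beta) / (K - M)          # users in subset K_2
            best = max(best, min(d1, d2))
    return best

# Example with M = 2 antennas and K = 4 users (threshold 1 / (1 + K - M) = 1/3).
low = rs_mmf_gain_search(2, 4, Fraction(1, 4))    # alpha below the threshold
high = rs_mmf_gain_search(2, 4, Fraction(1, 2))   # alpha above the threshold
```

The grid search can only find values that are achievable, so it never exceeds the closed form; it coincides with it whenever the optimal pair $(\beta^{\star}, z^{\star})$ lies on the grid, as is the case for the two examples above.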

A. NOMA vs. Baseline I (MU-LP)
We show in the following corollaries that MISO NOMA can achieve a performance gain over MU-LP but it may also incur a performance loss, depending on the values of M , K, G, and α.
The multiplexing gain advantage or loss of multi-antenna NOMA relative to MU-LP is obtained by comparing Propositions 3 and 7 (for the sum multiplexing gain), and Propositions 4 and 8 (for the MMF multiplexing gain). The comparisons are summarized in Corollary 1 for the sum multiplexing gain and, for the MMF multiplexing gain with imperfect CSIT, in two different corollaries: Corollary 2 (G = 1) and Corollary 3 (G > 1).
Corollary 1: The sum multiplexing gain comparison between MISO NOMA and MU-LP is summarized in (35). MISO NOMA never achieves a sum multiplexing gain higher than MU-LP. Corollary 1 shows that MISO NOMA can achieve a lower or the same sum multiplexing gain compared to MU-LP, but cannot outperform MU-LP.
If $\alpha = 1$ (perfect CSIT), Corollary 1 boils down to $d_s^{(N)} < d_s^{(M)}$ whenever $M > G$, and $d_s^{(N)} = d_s^{(M)}$ whenever $M \le G$. This is instrumental as it says that the slope of the sum-rate of MISO NOMA at high SNR will be strictly lower than that of MU-LP (i.e., the sum-rate of MISO NOMA will grow more slowly than that of MU-LP) whenever the number of transmit antennas is larger than the number of groups, and hence in this case, MU-LP is guaranteed to outperform MISO NOMA at high SNR. Consequently, in the massive MIMO regime where $M$ grows large, MISO NOMA would achieve a sum multiplexing gain strictly lower than MU-LP (and the role of NOMA in massive MIMO is therefore questionable as highlighted in [66]). If $G = 1$ as in e.g., [17]-[20], MISO NOMA always incurs a sum multiplexing gain loss compared to MU-LP irrespective of $M$ (except in single-antenna systems when $M = 1$). In other words, from a sum multiplexing gain perspective, one cannot find any multi-antenna configuration at the transmitter, i.e., any value of $M$, that would motivate the use of MISO NOMA with $G = 1$ compared to MU-LP. If $G = K/2$ as in [23]-[27], MISO NOMA incurs a sum multiplexing gain loss compared to MU-LP whenever $M > K/2$. In other words, from a sum multiplexing gain perspective, the only multi-antenna deployments for which MISO NOMA with $G = K/2$ would not incur a multiplexing gain loss (but no improvement either) over MU-LP are those with $M \le K/2$.
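If $\alpha < 1$ (imperfect CSIT), a sum multiplexing gain loss of MISO NOMA over MU-LP occurs in two different scenarios: 1) medium CSIT quality setting with $\frac{1}{\min(M, K)} < \alpha \le \frac{1}{\min(M, G)}$, and 2) high CSIT quality setting with $\alpha > \frac{1}{\min(M, G)}$ and $M > G$.
Corollaries 2 and 3 show that MISO NOMA can achieve either a higher or a lower MMF multiplexing gain compared to MU-LP, depending on the values of $M$, $G$, $K$, and $\alpha$.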
If $\alpha = 1$ (perfect CSIT), with $G = 1$ as in e.g., [17]-[20], $d_{mmf}^{(N)} = \frac{1}{K} < d_{mmf}^{(M)} = 1$ when $M \ge K$, whereas $d_{mmf}^{(N)} = \frac{1}{K} > d_{mmf}^{(M)} = 0$ when $M < K$.
If $\alpha < 1$ (imperfect CSIT), we note from Corollary 3 that for $G > 1$, the CSIT quality $\alpha$ does not affect the operational regimes where MISO NOMA outperforms or incurs a loss compared to MU-LP. This is different from $G = 1$, where the condition for an MMF multiplexing gain loss is a function of $\alpha$ in Corollary 2. MISO NOMA incurs an MMF multiplexing gain loss whenever the number of antennas and the CSIT quality are sufficiently large, i.e., $M \ge K$ and $\alpha > \frac{1}{K}$.

B. NOMA vs. Baseline II (RS)
We show in the following corollaries that, for all M , K, α, 1-layer RS (that relies on a single SIC at each user) achieves the same or higher (sum and MMF) multiplexing gains than the best of the MISO NOMA schemes (i.e., whatever G and the number of SICs). In other words, 1-layer RS outperforms (multiplexing gain-wise) MISO NOMA and simultaneously requires fewer SICs (only one) than MISO NOMA. Hence, employing MISO NOMA over 1-layer RS can only cause a multiplexing gain loss and/or a complexity increase at the receiver.
The performance loss of MISO NOMA vs. RS is obtained by comparing Propositions 3 and 11 (for the sum multiplexing gain), and Propositions 4 and 12 (for the MMF multiplexing gain). It is summarized in Corollary 4 for the sum multiplexing gain, and in Corollaries 5 (G = 1) and 6 (G > 1) for the MMF multiplexing gain.
Corollary 4: The sum multiplexing gain comparison between MISO NOMA and 1-layer RS is summarized as follows: from Propositions 3 and 11, $d_s^{(N)} = \max(1, \min(M, G)\alpha) \le d_s^{(R)} = 1 + (\min(M, K) - 1)\alpha$.
Corollary 5: MISO NOMA with $G = 1$ never achieves an MMF multiplexing gain higher than 1-layer RS.
Corollary 6: The MMF multiplexing gain comparison between MISO NOMA with $G > 1$ and 1-layer RS is summarized in (40). MISO NOMA with $G > 1$ never achieves an MMF multiplexing gain larger than 1-layer RS.
If $\alpha = 1$ (perfect CSIT), Corollaries 5 and 6 simply boil down to $d_{mmf}^{(N)} \le d_{mmf}^{(R)}$, with the two sides given by Propositions 2 and 10, respectively. Recall again from [62]-[65] that RS achieves the optimal multiplexing gain region in multi-antenna BC with imperfect CSIT. Multi-antenna NOMA (and MU-LP/MU-MIMO) will therefore always incur a multiplexing gain loss compared to RS or, at best, achieve the same multiplexing gain.
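The ordering claimed in Subsections VII-A and VII-B can be verified by brute force over a grid of system parameters. The Python sketch below is such a check, not a proof. It encodes the multiplexing gains as assumed closed forms read from Propositions 3, 4, 7, 8, 11, and 12 (the function names, the chosen value of $K$, and the parameter grid are illustrative), and reports every setting in which NOMA or MU-LP would exceed 1-layer RS, or in which NOMA would exceed MU-LP in sum multiplexing gain.

```python
from fractions import Fraction

def noma_gains(M, K, G, a):
    """(sum, MMF) multiplexing gains of MISO NOMA with G groups (assumed forms)."""
    g = K // G
    d_sum = max(Fraction(1), min(M, G) * a)
    if G == 1:
        d_mmf = Fraction(1, K)
    else:
        d_mmf = a / g if M >= K - g + 1 else Fraction(0)
    return d_sum, d_mmf

def mulp_gains(M, K, a):
    """(sum, MMF) multiplexing gains of MU-LP (assumed forms)."""
    return max(Fraction(1), min(M, K) * a), (a if M >= K else Fraction(0))

def rs_gains(M, K, a):
    """(sum, MMF) multiplexing gains of 1-layer RS (assumed forms)."""
    d_sum = 1 + (min(M, K) - 1) * a
    if M >= K:
        d_mmf = (1 + (K - 1) * a) / K
    else:
        d_mmf = min((1 + (M - 1) * a) / K, Fraction(1, 1 + K - M))
    return d_sum, d_mmf

def find_violations(K, M_values, alphas):
    """List settings that contradict RS >= {NOMA, MU-LP} or MU-LP >= NOMA (sum)."""
    violations = []
    group_counts = [G for G in range(1, K) if K % G == 0]
    for M in M_values:
        for a in alphas:
            rs_sum, rs_mmf = rs_gains(M, K, a)
            mu_sum, mu_mmf = mulp_gains(M, K, a)
            if mu_sum > rs_sum or mu_mmf > rs_mmf:
                violations.append(("MU-LP > RS", M, None, a))
            for G in group_counts:
                n_sum, n_mmf = noma_gains(M, K, G, a)
                if n_sum > rs_sum or n_mmf > rs_mmf:
                    violations.append(("NOMA > RS", M, G, a))
                if n_sum > mu_sum:
                    violations.append(("NOMA sum > MU-LP sum", M, G, a))
    return violations

alphas = [Fraction(i, 8) for i in range(9)]
violations = find_violations(6, range(1, 10), alphas)
```

The MMF comparison between NOMA and MU-LP is deliberately left out of the violation list, since NOMA can win there: for instance, `noma_gains(4, 6, 1, Fraction(1))` has a strictly larger MMF multiplexing gain than `mulp_gains(4, 6, Fraction(1))` in the overloaded regime.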

C. Misconceptions of Multi-Antenna NOMA
The comparisons with the MU-LP and 1-layer RS baselines reveal that depending on the particular setting NOMA may incur a multiplexing gain loss at the additional expense of an increased receiver complexity, as detailed in the following.
First, NOMA is an inefficient strategy to exploit the spatial dimensions. This issue could already be observed from the two-user MISO case with perfect CSIT, where NOMA limits the sum multiplexing gain to one, same as OMA, which is only half of the sum multiplexing gain obtained with MU-LP. Moreover, even when considering a fair metric such as MMF, NOMA limits the MMF multiplexing gain to 1/2, which is again only half of the MMF multiplexing gain obtained by MU-LP.
In the general $K$-user case, it is clear from Corollaries 1 and 4 that NOMA incurs a loss in sum multiplexing gain in most scenarios, and the best NOMA can achieve is the same sum multiplexing gain as the baselines in some specific configurations. NOMA with $G = 1$ achieves $d_s^{(N)} = 1$ irrespective of $M$ and $\alpha$, i.e., the same sum multiplexing gain as OMA.
Considering the MMF multiplexing gain of the general $K$-user case, the situation appears to be better for NOMA. Assuming $\alpha = 1$, from Corollaries 2 and 3, we observe that NOMA incurs a loss compared to MU-LP in the underloaded regime $M \ge K$ but outperforms MU-LP in the overloaded regime. In particular, NOMA with $G = 1$ achieves a higher MMF multiplexing gain than NOMA with $G = K/2$ and MU-LP whenever $M < K - 1$. Hence, though the receiver complexity increase of NOMA does not pay off in the underloaded regime, it appears to pay off in the overloaded regime (since $G = 1$ with more SICs outperforms $G = K/2$ with fewer SICs). Nevertheless, the MMF multiplexing gain of NOMA with $G = 1$ is independent of $M$, suggesting again that the spatial dimensions are not properly exploited. This can indeed be seen from Corollary 5 where NOMA is consistently outperformed by 1-layer RS, i.e., the increase in MMF multiplexing gain attained by NOMA ($G = 1$) over MU-LP is actually marginal in light of the complexity increase, and is much lower than what can be achieved by 1-layer RS with just a single SIC operation. In other words, while NOMA has some merits over MU-LP in the overloaded regime, NOMA makes an inefficient use of the multiple antennas, and fails to boost the MMF multiplexing gain compared to the 1-layer RS baseline.
We note that the above observations hold for both the perfect and imperfect CSIT settings. Nevertheless, it is interesting to stress that the sensitivity to the CSIT quality α differs largely between MU-LP, NOMA with G > 1, NOMA with G = 1, and 1-layer RS. Indeed the sum and MMF multiplexing gains of MU-LP, NOMA with G > 1, and 1-layer RS decay as α decreases, while the multiplexing gains of NOMA with G = 1 are not affected by α. This can be interpreted in two different ways. On the one hand, this implies that NOMA with G = 1 is inherently robust to CSIT imperfections since the multiplexing gains are unchanged. On the other hand, this means that NOMA with G = 1 is unable to exploit the available CSIT since the resulting multiplexing gain is the same as in the absence of CSIT (α = 0). One can indeed see from the above Propositions and Corollaries that the sum and MMF multiplexing gains for 1-layer RS with imperfect CSIT are clearly larger than those of MU-LP and NOMA. In other words, NOMA and MU-LP are inefficient in fully exploiting the available CSIT in multi-antenna settings.
We conclude from the theoretical results and above discussions that NOMA fails to efficiently exploit the multiplexing gain of the multi-antenna BC and is an inefficient strategy to exploit the spatial dimensions and the available CSIT, especially compared to the 1-layer RS baseline. The first misconception behind NOMA is to believe that because NOMA is capacity achieving in the single-antenna BC, NOMA is an efficient strategy for multi-antenna settings. As a consequence, the single-antenna NOMA principle has been applied to multi-antenna settings without recognizing that such a strategy would waste the primary benefit of using multiple antennas, namely the capability of transmitting multiple interference-free streams. In contrast to NOMA, other non-orthogonal transmission strategies such as 1-layer RS do not lead to any sum multiplexing gain loss. On the contrary, 1-layer RS achieves the information theoretic optimal sum multiplexing gain in both perfect and imperfect CSIT scenarios (and therefore has the capability of transmitting the optimal number of interference-free streams). 1-layer RS also achieves higher MMF multiplexing gains than NOMA and MU-LP.
Second, the multiplexing gain loss of NOMA is encountered despite the increased receiver complexity. In the two-user MISO BC with perfect CSIT, MU-LP does not require any SIC receiver to achieve the optimal sum multiplexing gain of two (assuming M > 1) and an MMF multiplexing gain of one, while NOMA requires one SIC and only provides half the (sum and MMF) multiplexing gains of MU-LP. This is surprising since one would expect a performance gain from an increased architecture complexity. Here instead, NOMA combines a complexity increase at the receivers with a (sum and MMF) multiplexing gain loss compared to MU-LP, therefore highlighting that the SIC receiver is inefficiently exploited.
This inefficient use of SIC in NOMA also persists in the general $K$-user scenario. Recall that NOMA with $G$ groups requires $g - 1$ layers of SIC at the receivers. Among the two popular NOMA architectures $G = 1$ and $G = K/2$, the former requires an even higher number of SIC layers than the latter (namely $K - 1$ for $G = 1$ and 1 for $G = K/2$) and has an even lower sum multiplexing gain ($d_s^{(N)} = 1$ for $G = 1$ versus $d_s^{(N)} = \max(1, \min(M, K/2)\alpha)$ for $G = K/2$). This highlights the inefficient (and detrimental) use of SIC receivers in NOMA: the higher the number of SICs, the lower the sum multiplexing gain! Comparing to the 1-layer RS baseline further highlights the inefficient use of SIC in NOMA. We note that 1-layer RS causes a complexity increase at the receivers (due to the one SIC needed) but also an increase in the (sum and MMF) multiplexing gains compared to MU-LP (i.e., it is easy to see from Propositions 7, 8, 11, and 12 that the sum and MMF multiplexing gains with RS are always either identical to or higher than those with MU-LP). Hence, in contrast to NOMA, the SIC in 1-layer RS is beneficial since it boosts the (sum and MMF) multiplexing gains and therefore introduces a performance gain compared to (or at least maintains the same performance as) MU-LP. Actually, 1-layer RS achieves the information theoretic optimal sum multiplexing gain for imperfect CSIT, and does so with a single SIC per user. This shows that to achieve the information theoretic optimality, it is sufficient to use a single SIC per user$^{23}$. This is in contrast to NOMA whose sum multiplexing gain is far from optimal and for which the sum multiplexing gain decreases as the number of SICs increases. The inefficient use of SIC in NOMA is also obvious from the MMF multiplexing gain. Indeed, from Propositions 2 and 10 and Corollary 5, the single SIC in 1-layer RS achieves a much larger MMF multiplexing gain than the $K - 1$ layers of SIC needed for NOMA with $G = 1$. This again illustrates how inefficient the use of SIC in NOMA often is. It also shows that there exist non-orthogonal (RS-based) transmission strategies with better performance and lower receiver complexity requiring just a single SIC per user.
We conclude from the theoretical analysis and the above discussion that NOMA often does not make efficient use of the SIC receivers compared to the considered baselines. The second misconception regarding multi-antenna NOMA is to believe that adopting SIC receivers always boosts the rate since the interference is fully cancelled at the receiver. Considering the two-user toy example, and comparing (2) and (14), the interference power term $|\mathbf{h}_1^H \mathbf{p}_2|^2$ appearing in the SINR of user-1 in the MU-LP rate has indeed disappeared in NOMA thanks to the SIC receiver, such that $R_{M,1} \leq R_{N,1}$. However, this comes at the cost of a reduced rate for user-2 since $R_{N,2} = \min(\log_2(1 + A), R_{M,2}) \leq R_{M,2}$. In other words, for a given pair of precoders $\mathbf{p}_1$ and $\mathbf{p}_2$, NOMA increases the rate (or maintains the same rate) of user-1 but decreases the rate (or maintains the same rate) of user-2 compared to MU-LP.
Third, reflecting on the above two misconceptions, the NOMA design philosophy does not leverage the extensive research in multi-user MIMO, which has been fundamental to 4G and 5G in achieving the optimal sum multiplexing gain of multi-antenna BC with perfect CSIT and low-complexity transmitter and receiver architectures. The third misconception behind multi-antenna NOMA is to believe that, since NOMA is routinely compared to OMA in SISO BC, it is also sufficient to compare NOMA to OMA in multi-antenna settings to demonstrate its merits. In fact, the Corollaries in Sections VII-A and VII-B show that NOMA is far from being an efficient strategy if NOMA is compared to alternative baselines. Unfortunately, simply comparing with OMA has led the NOMA literature to the misleading conclusion that multi-antenna NOMA is an efficient strategy.
It should therefore be stressed that comparing NOMA to OMA does not demonstrate the merits of NOMA in multi-antenna settings and, most importantly, the baseline for any multi-antenna NOMA design, optimization, and evaluation should be MU-LP and RS, not simply OMA (footnote 24)! In contrast to MISO NOMA, the gain of 1-layer RS over MU-LP is guaranteed, i.e., the rate of 1-layer RS is equal to or higher than that of MU-LP, since MU-LP is a particular instance of RS when no power is allocated to the common stream.
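The rate trade-off behind the second misconception can be checked numerically. The sketch below is only an illustration under stated assumptions: unit noise power, and the quantity $A$ taken to be the SINR of stream $s_2$ at user-1 before SIC, i.e. $A = |\mathbf{h}_1^H \mathbf{p}_2|^2 / (1 + |\mathbf{h}_1^H \mathbf{p}_1|^2)$, since (2) and (14) are not restated in this section. The function name `two_user_rates` is hypothetical.

```python
import numpy as np


def two_user_rates(h1, h2, p1, p2):
    """Return ((R_M1, R_M2), (R_N1, R_N2)) in bits/s/Hz for fixed precoders.

    Assumes unit noise power; A below is an assumed form of the SINR of s2
    at user-1, not a formula quoted from the paper.
    """
    # Received powers |h_k^H p_j|^2 (np.vdot conjugates its first argument).
    g11 = abs(np.vdot(h1, p1)) ** 2
    g12 = abs(np.vdot(h1, p2)) ** 2
    g21 = abs(np.vdot(h2, p1)) ** 2
    g22 = abs(np.vdot(h2, p2)) ** 2

    # MU-LP: each user decodes its own stream and treats the other as noise.
    r_m1 = np.log2(1 + g11 / (1 + g12))
    r_m2 = np.log2(1 + g22 / (1 + g21))

    # NOMA: user-1 decodes s2 first, removes it with SIC, then decodes s1
    # free of interference; s2 must be decodable by both users.
    a = g12 / (1 + g11)
    r_n1 = np.log2(1 + g11)
    r_n2 = min(np.log2(1 + a), r_m2)
    return (r_m1, r_m2), (r_n1, r_n2)


# Example with random channels and precoders (M = 2 transmit antennas).
rng = np.random.default_rng(0)
h1, h2, p1, p2 = (rng.standard_normal(2) + 1j * rng.standard_normal(2)
                  for _ in range(4))
mu_lp, noma = two_user_rates(h1, h2, p1, p2)
print("MU-LP rates:", mu_lp)
print("NOMA rates: ", noma)
```

For any draw, the user-1 rate does not decrease and the user-2 rate does not increase when moving from MU-LP to NOMA with the same precoders, which is the trade-off described above.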
Fourth, the SISO BC is naturally overloaded (more users than the number of transmit antennas, namely one), and NOMA was therefore concluded to be suitable for overloaded scenarios. The fourth misconception behind multi-antenna NOMA is to believe that MISO NOMA is an efficient strategy for overloaded regimes, namely whenever K > M . The Corollaries in Subsections VII-A and VII-B nevertheless expose that this is incorrect. It is clear that NOMA incurs a sum multiplexing gain erosion compared to MU-LP and 1-layer RS whenever M > G. Such a loss can occur also in the overloaded regime, namely whenever we have K > M > G. Moreover, NOMA incurs an MMF multiplexing gain loss compared to 1-layer RS whenever M = K −g +1. Here again, such a loss occurs also in the overloaded regime. In contrast to NOMA (and MU-LP), 1-layer RS is an efficient strategy for both the underloaded and overloaded regimes. Though NOMA with G = 1 was shown in Proposition 2 to achieve a nonvanishing MMF multiplexing gain of 1/K in the overloaded regime, this MMF multiplexing gain is considerably smaller than that of 1-layer RS, therefore highlighting the inefficiency of NOMA in the overloaded regime. In particular, we note that the MMF multiplexing gain of 1-layer RS increases with M in contrast to that of NOMA with G = 1 which is constant regardless of M .

D. Illustration of the Misconceptions with an Example
To illustrate the above discussion and make the statements more explicit based on numbers, we consider a MISO BC with K = 6, and compare in Table III the sum multiplexing gains of NOMA, MU-LP, and 1-layer RS with perfect CSIT (which can be as high as 6). 1-layer RS achieves the same (and optimal) sum multiplexing gain as MU-LP. Table IV highlights the MMF multiplexing gains of NOMA, MU-LP, and 1-layer RS for K = 6 with perfect CSIT and stresses the significant benefit of 1-layer RS over NOMA and MU-LP. The entries highlighted in red relate to configurations for which 1-layer RS provides a multiplexing gain strictly higher than that of NOMA and MU-LP. Recall that 1-layer RS provides those multiplexing gains over NOMA and MU-LP with a single SIC per user! In Fig. 6, we further illustrate the tradeoff between the multiplexing gains and the number of SIC layers for M = 4, K = 6 and perfect CSIT. We observe that 1-layer RS enables higher performance and lower receiver complexity compared to NOMA, stressing that the non-orthogonal transmission enabled by RS is much more efficient than NOMA. We see that NOMA with different G is suited for very different settings in this M = 4, K = 6 configuration, namely NOMA with G = 3 performs better in terms of sum multiplexing gain, whereas NOMA with G = 1 achieves a higher MMF multiplexing gain. The baseline 1-layer RS achieves a higher performance for both metrics and entails a lower receiver complexity. Though the above example was provided for perfect CSIT (α = 1), it is easy to calculate from the above propositions the multiplexing gains for the imperfect CSIT setting for a given CSIT quality α. For imperfect CSIT, the strict superiority of 1-layer RS over MU-LP and NOMA will become much more apparent, as illustrated in Fig. 7 for α = 0.5.
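The tradeoff between sum multiplexing gain and number of SIC layers can be tabulated with a few lines of code. This is a minimal sketch, assuming perfect CSIT and the closed forms min(M, K) for MU-LP and 1-layer RS and min(M, G) for NOMA with G groups; these forms are assumptions chosen to agree with the statements in this section (one unit of sum multiplexing gain per NOMA group, a loss whenever M > G), since the propositions themselves are not restated here. The function names are hypothetical.

```python
def sum_gain_mu_lp(M, K):
    """Assumed perfect-CSIT sum multiplexing gain of MU-LP (and 1-layer RS)."""
    return min(M, K)


def sum_gain_noma(M, G):
    """Assumed perfect-CSIT sum multiplexing gain of NOMA with G groups."""
    return min(M, G)


def sic_layers_noma(K, G):
    """SIC layers per receiver for NOMA: g - 1 with g = K / G users per group."""
    return K // G - 1


M, K = 4, 6
rows = [
    ("NOMA (G = 1)", sum_gain_noma(M, 1), sic_layers_noma(K, 1)),
    ("NOMA (G = 3)", sum_gain_noma(M, 3), sic_layers_noma(K, 3)),
    ("MU-LP", sum_gain_mu_lp(M, K), 0),
    ("1-layer RS", sum_gain_mu_lp(M, K), 1),
]
for name, gain, layers in rows:
    print(f"{name:14s} sum multiplexing gain = {gain}, SIC layers = {layers}")
```

Under these assumptions, the configuration M = 4, K = 6 reproduces the qualitative picture discussed above: more SIC layers in NOMA go together with a lower sum multiplexing gain.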

E. Shortcomings of Multi-Antenna NOMA
The previous subsections have highlighted that comparing multi-antenna NOMA to MU-LP and 1-layer RS, instead of OMA, provides a completely different picture of the actual merits of multi-antenna NOMA. In view of the previous results highlighting the waste of multiplexing gain and the inefficient use of the SIC receivers by multi-antenna NOMA, we can ask ourselves multiple questions, which help to pinpoint the shortcomings and limitations of the multi-antenna NOMA design philosophy.
The first question is "What prevents multi-antenna NOMA from reaping the multiplexing gain of the system?" The answer lies in (5), and similarly in (20), (21), and (22). Equation (5) can be interpreted as the sum-rate of a two-user MAC with a single-antenna receiver. Indeed, in (5), user-1 acts as the receiver of a two-user MAC whose effective SISO channels of both links are given by $\mathbf{h}_1^H \mathbf{p}_2$ and $\mathbf{h}_1^H \mathbf{p}_1$. Similarly, in (20), user-1 acts as the receiver of a g-user MAC whose effective SISO channels of the g links are given by $\mathbf{h}_1^H \mathbf{p}_k$ for $k = 1, \ldots, g$. Such a MAC is well known to have a sum multiplexing gain of one [7], [15]. The multiplexing gain losses compared to the MU-LP and 1-layer RS baselines therefore come from forcing one user to fully decode all streams in a group, i.e., its intended stream and the co-scheduled streams in the group. This is radically different from MU-LP where streams are encoded independently and each receiver decodes its intended stream treating any residual interference as noise. Likewise, in 1-layer RS, no user is forced to fully decode the co-scheduled streams since all private streams are encoded independently and each receiver decodes its intended private stream treating any residual interference from other private streams as noise.
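For the two-user case, the MAC argument can be spelled out in a short derivation. It is only a sketch under assumptions: unit noise power, a power budget $\|\mathbf{p}_1\|^2 + \|\mathbf{p}_2\|^2 \le P_t$, and NOMA rates $R_{N,1} = \log_2(1 + |\mathbf{h}_1^H \mathbf{p}_1|^2)$ and $R_{N,2} \le \log_2(1 + A)$ with $A = |\mathbf{h}_1^H \mathbf{p}_2|^2 / (1 + |\mathbf{h}_1^H \mathbf{p}_1|^2)$. These expressions are assumed forms consistent with the discussion, since (5) and (14) are not restated in this section.

```latex
\begin{align}
R_{N,1} + R_{N,2}
 &\le \log_2\!\left(1 + |\mathbf{h}_1^H \mathbf{p}_1|^2\right)
    + \log_2\!\left(1 + \frac{|\mathbf{h}_1^H \mathbf{p}_2|^2}
                             {1 + |\mathbf{h}_1^H \mathbf{p}_1|^2}\right) \\
 &= \log_2\!\left(1 + |\mathbf{h}_1^H \mathbf{p}_1|^2
                    + |\mathbf{h}_1^H \mathbf{p}_2|^2\right) \\
 &\le \log_2\!\left(1 + \|\mathbf{h}_1\|^2 P_t\right)
\end{align}
```

The last step uses the Cauchy-Schwarz inequality on each term. The sum-rate therefore grows at most like $\log_2 P_t$ at high SNR, i.e., with a sum multiplexing gain of at most one, whatever the number of transmit antennas.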
The second question is "Does an increase in the number of SICs always come with a reduction in the sum multiplexing gain?" The answer is clearly no. This anomaly is deeply rooted in the way MISO NOMA was developed by applying single-antenna NOMA principles to multi-antenna settings. The proof of Proposition 1 indeed tells us that the fundamental principle of NOMA consisting in forcing one user in each group to fully decode the messages of g − 1 co-scheduled users is an inefficient design in multi-antenna settings that leads to a sum multiplexing gain reduction in each group.
The third question is "Are non-orthogonal transmission strategies inefficient for multi-antenna settings?" The answer is no. As we have seen, there exist frameworks of non-orthogonal transmission strategies also relying on SIC, such as RS, that do not incur the limitations of multi-antenna NOMA and make efficient use of the non-orthogonality and SIC receivers in multi-antenna settings. The key for the design of such non-orthogonal strategies is not to fall into the trap of blindly applying the SISO NOMA principle to multi-antenna settings, and therefore constraining the strategy to always fully decode the message of other users. Non-orthogonal transmission strategies and multiple access need to be re-thought for multi-antenna settings and one such strategy is based on the multi-antenna Rate-Splitting (RS) and Rate-Splitting Multiple Access (RSMA) literature for multi-antenna BC.
The fourth question is "Since NOMA and RS both rely on SIC, is there any relationship between NOMA and RS?" The answer is yes in a two-user setting, but not necessarily in the general K-user case. In the two-user case, 1-layer RS is a superset of MU-LP, NOMA and multicasting, i.e., MU-LP, NOMA and multicasting are particular instances of 1-layer RS, as shown in [67] and in Table V and Fig. 8. Indeed, MU-LP is obtained as a special case from 1-layer RS by allocating no power to the common stream ($P_c = 0$) such that $W_k$ is encoded directly into $s_k$. No interference is decoded at the receiver using the common message, and the interference between $s_1$ and $s_2$ is fully treated as noise. NOMA is obtained by encoding $W_2$ entirely into $s_c$ (i.e., $W_c = W_2$) and $W_1$ into $s_1$, and turning off $s_2$ ($P_2 = 0$) (footnote 25). In this way, user-1 fully decodes the interference created by the message of user-2. OMA is a substrategy of MU-LP and NOMA, which is encountered when only user-1 (with the stronger channel gain) is scheduled ($P_c = 0$, $P_2 = 0$). Multicasting is obtained when both $W_1$ and $W_2$ are entirely encoded into $s_c$. In the K-user case, 1-layer RS is a superset of MU-LP since by turning off (i.e., allocating no power to) the common stream, 1-layer RS boils down to MU-LP. On the other hand, 1-layer RS is not a superset of NOMA. 1-layer RS and NOMA are particular instances/schemes of the RSMA framework based on the generalized RS relying on multiple layers of SIC at each receiver [46], [47], [68], [79] (footnote 26), as illustrated in Fig. 9.
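The two-user special cases can be illustrated with a small numerical sketch. It assumes unit noise power and the usual 1-layer RS receiver (every user decodes the common stream first, treating all private streams as noise, then removes it with SIC and decodes its own private stream); the rate expressions and the function name `rs_two_user_rates` are illustrative assumptions rather than formulas quoted from this section.

```python
import numpy as np


def rs_two_user_rates(h1, h2, pc, p1, p2):
    """Return (R_c, R_1, R_2): common-stream rate and the two private rates."""

    def gain(h, p):
        # Received power |h^H p|^2 (np.vdot conjugates its first argument).
        return abs(np.vdot(h, p)) ** 2

    # The common stream must be decodable by both users, hence the minimum.
    rc = min(
        np.log2(1 + gain(h, pc) / (1 + gain(h, p1) + gain(h, p2)))
        for h in (h1, h2)
    )
    # After SIC of the common stream, the other private stream is noise.
    r1 = np.log2(1 + gain(h1, p1) / (1 + gain(h1, p2)))
    r2 = np.log2(1 + gain(h2, p2) / (1 + gain(h2, p1)))
    return rc, r1, r2


rng = np.random.default_rng(0)
h1, h2, pc, p1, p2 = (rng.standard_normal(2) + 1j * rng.standard_normal(2)
                      for _ in range(5))
zero = np.zeros(2, dtype=complex)

# MU-LP: no power on the common stream (P_c = 0), so R_c vanishes.
print("MU-LP as RS:", rs_two_user_rates(h1, h2, zero, p1, p2))
# NOMA: W_2 carried entirely by s_c and s_2 turned off (P_2 = 0); user-2's
# rate is then R_c and user-1 decodes s_1 free of interference.
print("NOMA as RS: ", rs_two_user_rates(h1, h2, pc, p1, zero))
```

Setting both `pc` and `p2` to zero would leave only user-1 scheduled, which corresponds to the OMA case mentioned above.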
Footnote 25: To better relate to the system model in Section II, note that NOMA also has a common message/stream, though commonly not denoted using such terminology. Indeed, the stream of the weakest user, namely $s_2$ in Section II, is a common stream since it is decoded by both users. $s_2$ in Section II carries information, namely $W_2$, intended for user-2 but is decoded by both user-1 and user-2.
Footnote 26: 2-layer hierarchical RS (HRS) in Fig. 9 is proposed in [46] for massive MIMO. Besides one common message decoded by all users as in 1-layer RS, 2-layer HRS relies on multiple group-specific common messages being decoded by different groups of users to further manage inter-user interference. RSMA is a generalized framework that embraces both 1-layer RS and 2-layer HRS as subschemes [47].
The fifth question is "How does 1-layer RS achieve simultaneously higher multiplexing gains and a lower receiver complexity than NOMA?" In view of the previous sections, the key is to build non-orthogonal transmission strategies upon MU-LP (and therefore SDMA/multi-user MIMO) such that the performance benefits (including sum multiplexing gain) of MU-LP are guaranteed but extra performance (e.g., in MMF multiplexing gain) is observed by the use of SIC receivers. Indeed, a performance gain over MU-LP should be expected from a more complex receiver architecture in multi-antenna BC. To do so, one should enable the flexibility at the transmitter to encode messages such that parts of them can be decoded by all users using SIC while the remaining parts are decoded by their intended receivers and treated as noise by non-intended receivers. Hence, we provide the flexibility to partially decode interference and partially treat the remaining interference as noise. This contrasts with MU-LP where interference is always treated as noise, and with NOMA where interference is fully decoded.
This flexibility is achieved by extending the concept of RS, originally developed in [61] for the two-user single-antenna interference channel, to the multi-antenna BC. To manage multi-user interference by partially decoding the interference and treating the remaining interference as noise, RS facilitates a complete message-to-stream mapping flexibility for each user to have part of its message transmitted in the common stream and the remaining part in one of the K private streams. By adjusting the power levels of the common and private streams, one can adjust the amount of interference caused to the private streams such that its level is weak enough to be treated as noise. This contrasts with MU-LP where the communication strategy is fundamentally constrained such that the messages are mapped to private streams only (i.e., there is no common stream, and multi-user interference between private streams is treated as noise even when its level is not weak enough to be treated as noise), and with NOMA where the constraint is that the entire message of one of the users is mapped onto a common stream (e.g., $W_2$ mapped to $s_2$ decoded by both user-1 and user-2 in Section II). These constraints imposed by MU-LP and NOMA are well illustrated by the message-to-stream mapping in Table V. A consequence of the above flexibility is that by decreasing the amount of power allocated to the common stream, K-user 1-layer RS progressively converges to K-user MU-LP and in the limit where no power is allocated to the common stream, K-user 1-layer RS swiftly boils down to K-user MU-LP. Hence, 1-layer RS really builds upon MU-LP and MU-LP is a subscheme of 1-layer RS, which guarantees that the rate and multiplexing gains of 1-layer RS are always the same as or better than those of MU-LP. This is completely different from MISO NOMA. MISO NOMA does not build upon MU-LP.
With G groups, K-user MISO NOMA can boil down to G-user MU-LP by turning off the power to the weaker users in each group, but K-user MISO NOMA can mathematically never boil down to K-user MU-LP (recall footnote 11). The rate/multiplexing gains of K-user MISO NOMA can therefore be worse than those of K-user MU-LP.
Another interpretation arises by noting that MU-LP (and other forms of multi-user MIMO), as one extreme, can be viewed as a full transmit-side interference management strategy. On the other extreme, NOMA can be seen as a full receive-side interference cancellation strategy. In between stands RS, which can be viewed as a smart combination of transmit-side and receive-side interference management/cancellation strategies where the contribution of the common stream is adjusted according to the level of interference that can be canceled by the receiver.
Consequently, RS is an enabler of a general class of communication strategies and can cover a wider set of communication strategies than SDMA and NOMA, which leads to significant multiplexing gain and complexity reduction benefits.
The sixth question is "Can we use other types of receivers than SIC for NOMA and RS and would the multiplexing gains be improved?" We can indeed use other types of receivers but the multiplexing gains will not improve. Instead of using stream-by-stream SIC, we can use any other joint (Maximum Likelihood) decoder. Hence a strong user in NOMA could use a joint decoder to decode its intended stream jointly with all other streams intended for its co-scheduled users in the same group. The multiplexing gains would not improve since the strong user would still act as the receiver of an effective MAC (as discussed in relation to (5), (20), (21), and (22) and the first question), which limits the multiplexing gains. Similarly, in 1-layer RS, each user could use a joint decoder to decode its private stream jointly with the common stream and the multiplexing gains would not improve (recall that 1-layer RS already achieves the information theoretic optimal multiplexing gain region, hence any other scheme, receiver or multi-layer RS would not increase the multiplexing gains any further).

VIII. NUMERICAL RESULTS
Through numerical evaluation, we illustrate the misconceptions and the shortcomings of MISO NOMA. Moreover, we show that, by adopting 1-layer RS, the optimal sum multiplexing gain of the MISO BC is guaranteed in both underloaded and overloaded deployments for both perfect and imperfect CSIT scenarios. Furthermore, results also demonstrate that the MMF multiplexing gain (and MMF rate) is significantly enhanced when using 1-layer RS compared to MU-LP and MISO NOMA, and the complexity of the receivers is reduced compared to MISO NOMA. In other words, our evaluations show that 1-layer RS makes a more efficient use of the spatial dimensions (multiplexing gains) and of the SIC receivers than MISO NOMA, and it is more robust to CSIT inaccuracy.
The following two precoder optimization problems are solved in the simulation for the K-user MISO NOMA system model specified in Section III-A. The first problem, (41), maximizes the sum-rate of MISO NOMA subject to the transmit power constraint, where the rate of user-k in the MISO NOMA system is as specified in (17)-(19). The second problem, (42), maximizes the minimum rate subject to the transmit power constraint. The Weighted Minimum Mean Square Error (WMMSE) optimization framework proposed in [70] (originally developed for MU-LP) is extended to solve both problems (41) and (42). The details of the algorithm are specified in Appendix B. The optimization problems requiring interior-point methods are solved using the CVX toolbox [69]. We will assume K = 6 in the simulations, so as to be able to relate the numerical results to the theoretical results of Table II. The channel $\mathbf{h}_k$ of user-k has i.i.d. complex Gaussian entries drawn from the distribution $\mathcal{CN}(0, \sigma_k^2)$. The presented results are averaged over 100 channel realizations.
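As a hedged illustration only, the two problems described above can be written in the following generic form. The symbol $R_k$ for the rate of user-k, the precoding matrix $\mathbf{P} = [\mathbf{p}_1, \ldots, \mathbf{p}_K]$, the user set $\mathcal{K}$, and the trace form of the power constraint are notational assumptions made here; they are not necessarily the exact statements of (41) and (42).

```latex
\begin{align}
\text{(sum-rate)} \quad & \max_{\mathbf{P}} \ \sum_{k \in \mathcal{K}} R_k
  \quad \text{s.t.} \quad \operatorname{tr}\!\left(\mathbf{P}\mathbf{P}^H\right) \le P_t, \\
\text{(max-min)} \quad & \max_{\mathbf{P}} \ \min_{k \in \mathcal{K}} R_k
  \quad \text{s.t.} \quad \operatorname{tr}\!\left(\mathbf{P}\mathbf{P}^H\right) \le P_t.
\end{align}
```

Both problems are non-convex in $\mathbf{P}$, which is why an iterative framework such as WMMSE is used rather than a direct convex solver.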
The following five strategies are compared and analyzed for both perfect and imperfect CSIT:
• MISO NOMA (G = 3): This is the MISO NOMA strategy specified in Section III-A when G = 3. Each user requires K/3 − 1 = 1 layer of SIC (since each user may be selected as the "strong user" in the corresponding user group). Ideally, the sum-rate (or max-min rate) is maximized by solving (41) (or (42)) for all possible user grouping methods and decoding orders within each group. Due to the high computational complexity of jointly optimizing the precoders, grouping, and decoding order, we assume that the user grouping is fixed (footnote 27) while the decoding order in each group i is the ascending order of users' channel strength $\|\mathbf{h}_k\|$, $\forall k \in \mathcal{K}_i$, in the following results. To keep aligned with the system model in Section III-A, user indices are updated within each group such that $\|\mathbf{h}_k\| \leq \|\mathbf{h}_j\|$, $\forall k < j$ and $k, j \in \mathcal{K}_i$. When the CSIT is imperfect, the decoding order follows the same method but based on $\hat{\mathbf{h}}_k$, $\forall k \in \mathcal{K}_i$. Recall however that the multiplexing gain analysis is general and holds for any decoding order and any grouping method.
• MISO NOMA (G = 1): This is the MISO NOMA strategy specified in Section III-A when G = 1. Each user requires K − 1 = 5 layers of SIC (since each user may be selected as the "strong user"). There is no user grouping optimization issue at the transmitter since all users are assumed to be in the same user group. However, the decoding order at the users should be jointly optimized with the precoders in order to maximize the sum-rate (or max-min rate), which is computationally prohibitive. Following the literature of single-cell MISO NOMA [19], [20], we assume that the decoding order is the ascending order of users' channel strength $\|\mathbf{h}_k\|$, $\forall k \in \mathcal{K}$. User indices are updated such that $\|\mathbf{h}_k\| \leq \|\mathbf{h}_j\|$, $\forall k < j$ and $k, j \in \mathcal{K}$. Similarly, the decoding order follows the same method but based on $\hat{\mathbf{h}}_k$, $\forall k \in \mathcal{K}$, when the CSIT is imperfect.
• MU-LP: MU-LP is the baseline strategy studied in Section V. Each user directly decodes the intended stream by fully treating the interference as noise. The WMMSE algorithm specified in Appendix B can be applied and extended to solve the corresponding sum-rate and max-min problems of MU-LP [51], [70]. The transmitter and receiver complexity of MU-LP is low since no SIC is deployed at any user and there is no user grouping or decoding order optimization issue at the transmitter.
• Orthogonal Multiple Access (OMA): This is the single-user transmission where only the user with the highest channel strength is served.
• 1-layer RS: 1-layer RS is the RS strategy we specified in Section VI. The corresponding sum-rate and max-min rate maximization problems are solved by using the WMMSE algorithm proposed in [43], [51]. Compared with MISO NOMA schemes, the transmitter and receiver complexities of 1-layer RS are much reduced. Similarly to MU-LP, no user grouping and decoding order optimization is needed. Each user only requires a single layer of SIC.

A. Perfect CSIT
Following [43], the initialization of the precoding matrix $\mathbf{P}$ in Algorithm 1 is designed by using Maximum Ratio Transmission (MRT) combined with Singular Value Decomposition (SVD). Specifically, the precoder for the message to be decoded by a group of users is designed based on the SVD of the channel matrix formed by the channel vectors of the corresponding users while the precoder for the message to be decoded by a single user is designed based on MRT. For example, when considering MISO NOMA (G = 3), the message for user-k, $k \in \mathcal{K}_i$, is decoded by users $\{j \mid j \leq k,\ j \in \mathcal{K}_i\}$. The precoders are initialized as $\mathbf{p}_k = \sqrt{p_k}\,\hat{\mathbf{p}}_k$, where $\hat{\mathbf{p}}_k$ is the largest left singular vector of the channel matrix $\mathbf{H}_k$ formed by the channels $\{\mathbf{h}_j \mid j \leq k,\ j \in \mathcal{K}_i\}$. The precoder $\mathbf{p}_k$ of the stream to be decoded last in each group is initialized as $\mathbf{p}_k = \sqrt{p_k}\,\mathbf{h}_k / \|\mathbf{h}_k\|$, where $p_k$ is the power allocated to the corresponding precoder $\mathbf{p}_k$ and it satisfies $\sum_{k=1}^{K} p_k = P_t$. Fig. 10 illustrates the sum-rate vs. SNR comparison of the five strategies considered when there are K = 6 users and the number of transmit antennas is M = 3 and M = 6. In Fig. 10(a) and Fig. 10(b), all users have equal channel variances, i.e., $\sigma_k^2 = 1$, $\forall k \in \mathcal{K}$, while the users' channel variances are randomly generated from [0.1, 1] in Fig. 10(c) and Fig. 10(d), i.e., $\sigma_k^2 \in [0.1, 1]$, $\forall k \in \mathcal{K}$. In other words, the average channel strength disparities among users are randomly generated between 0 and 10 dB (footnote 28) in Fig. 10(c) and Fig. 10(d).
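The initialization described above can be sketched with NumPy as follows. This is a minimal illustration rather than the implementation of [43]: the function names are hypothetical, and the equal power split across streams in the usage lines is an assumption made only so that the powers sum to $P_t$.

```python
import numpy as np


def init_precoder_svd(H_group, power):
    """Initialize a precoder for a stream decoded by several users.

    H_group is an M x n matrix whose columns are the channels of the users
    that decode the stream. np.linalg.svd returns singular values in
    descending order, so U[:, 0] is the dominant left singular vector.
    """
    U, _, _ = np.linalg.svd(H_group, full_matrices=False)
    return np.sqrt(power) * U[:, 0]


def init_precoder_mrt(h, power):
    """Initialize a precoder for a stream decoded by a single user (MRT)."""
    return np.sqrt(power) * h / np.linalg.norm(h)


# Usage: one group of two users, M = 3 antennas, total power P_t split equally
# between the two streams (the equal split is an assumption).
rng = np.random.default_rng(0)
M, P_t = 3, 10.0
h1 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
h2 = rng.standard_normal(M) + 1j * rng.standard_normal(M)

p2 = init_precoder_svd(np.column_stack([h1, h2]), P_t / 2)  # decoded by both users
p1 = init_precoder_mrt(h1, P_t / 2)                         # decoded by user-1 only
print(np.linalg.norm(p1) ** 2 + np.linalg.norm(p2) ** 2)    # total transmit power
```

Both initial directions have unit norm, so the scalar powers alone determine whether the transmit power constraint is met.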
In the high SNR regime of each subfigure of Fig. 10, the multiplexing gains of all strategies are found to match the theoretical sum multiplexing gains specified in Table III. Therefore, MISO NOMA has a reduced sum multiplexing gain, makes inefficient use of the available multiple antennas, and incurs a significant rate loss, especially at medium and high SNRs. It is not an efficient strategy for multi-antenna settings. The first misconception behind multi-antenna NOMA is confirmed.
As pointed out earlier in this section, the complexity of MISO NOMA at both the transmitter and the receiver is the highest among all strategies studied in this work. At the transmitter, the scheduling complexity is high since the user grouping and decoding order are required to be jointly optimized with the precoders. At the receivers, each user requires multiple layers of SIC and the number of SIC layers at each user increases with the number of users K in the system. In addition to such a high complexity, as evident from Fig. 10, the sum-rate performance of MISO NOMA is worse than that of MU-LP (footnote 29), which exhibits a much lower complexity at the transmitter and each receiver. Adopting SIC receivers does not always boost the rate performance. On the contrary, an inefficient and inappropriate use of SIC as in MISO NOMA can make the rate performance worse than simply not using SIC (as in MU-LP). This illustrates the second misconception behind multi-antenna NOMA. We also observe from Fig. 10 that the sum-rate performance of OMA and MISO NOMA (G = 1) is the worst, which is also reflected in their sum multiplexing gains. Hence, comparing MISO NOMA with OMA is not sufficient in multi-antenna settings. Both MU-LP and 1-layer RS should be considered as the baselines for all MISO NOMA schemes. This verifies the third misconception behind multi-antenna NOMA.
Footnote 28: As a reference, at a carrier frequency of 2 GHz, the typical macro cell propagation model of [71] states that the path loss [dB] is equal to $128.1 + 37.6 \log_{10}(R)$ where R is the transmitter-receiver distance in km. Considering a macro cell deployment with an inter-site distance of 750 m [71], a 0 to 10 dB channel gain disparity implies that users are distributed between e.g. 160 m and 300 m or between 200 m and 375 m from their serving base station, i.e., a user located at 300 m (375 m) will experience 10 dB extra path loss compared to a user at 160 m (200 m).
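The distance pairs quoted for the 0 to 10 dB disparity can be checked directly against the path loss model $128.1 + 37.6 \log_{10}(R)$ with R in km. The short sketch below does only that; the function name `path_loss_db` is hypothetical.

```python
import math


def path_loss_db(distance_km):
    """Macro cell path loss in dB at 2 GHz, with the distance given in km."""
    return 128.1 + 37.6 * math.log10(distance_km)


# Extra path loss of the farther user relative to the nearer one.
extra_160_to_300 = path_loss_db(0.300) - path_loss_db(0.160)
extra_200_to_375 = path_loss_db(0.375) - path_loss_db(0.200)
print(f"300 m vs 160 m: {extra_160_to_300:.2f} dB")
print(f"375 m vs 200 m: {extra_200_to_375:.2f} dB")
```

Both pairs have the same distance ratio (1.875), so they yield the same extra path loss, slightly above 10 dB, which matches the roughly 10 dB disparity used in the simulations.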
In Fig. 11 and Fig. 12, we focus on the MMF rate performance when there are K = 6 users and the number of transmit antennas is varied from M = 3 to M = 6. All users have equal channel variances in Fig. 11 while the users' channel variances are randomly generated from [0.1, 1] in Fig. 12. The MMF multiplexing gains of all the strategies in both Fig. 11 and Fig. 12 match the corresponding theoretical MMF multiplexing gain results specified in Table IV. Though MISO NOMA has been promoted as a strategy to enhance user fairness and to deal with overloaded regimes, its MMF rate in the overloaded regime is actually worse than that of 1-layer RS. MISO NOMA is not an efficient strategy for overloaded regimes. This verifies the fourth misconception behind multi-antenna NOMA.

B. Imperfect CSIT
Let us now consider ergodic sum-rate and minimum ergodic rate maximization problems when the CSIT is imperfect. The two problems are solved by extending the WMMSE algorithm specified in Appendix B to the corresponding imperfect CSIT setting [43]. This is achieved by using the Sample Average Approximation (SAA) method [72] to transform the original ergodic problem into its deterministic counterpart and then using WMMSE to solve the corresponding deterministic problem. In the following results, for a given channel estimate $\hat{\mathbf{h}}_k$, $k \in \mathcal{K}$, M = 1000 channel samples are generated. The ergodic sum-rate or max-min ergodic rate is obtained by averaging over 100 channel estimates. The channel estimate $\hat{\mathbf{h}}_k$ and channel estimation error $\tilde{\mathbf{h}}_k$ have i.i.d. complex Gaussian entries respectively drawn from the distributions $\mathcal{CN}(0, \sigma_k^2 - \sigma_{e,k}^2)$ and $\mathcal{CN}(0, \sigma_{e,k}^2)$, where $\sigma_{e,k}^2 = \sigma_k^2 P_t^{-\alpha}$. As only the channel estimate $\hat{\mathbf{h}}_k$, $k \in \mathcal{K}$, is known at the transmitter, the precoders are initialized using the same method as in the perfect CSIT scenario but based on the channel estimates. Fig. 13, 14, and 15 are the imperfect CSIT results corresponding to Fig. 10, 11, and 12, respectively. The unspecified parameters in this subsection remain the same as the corresponding ones used for perfect CSIT. Fig. 13 illustrates the sum-rate vs. SNR comparison of the five strategies for imperfect CSIT. The sum multiplexing gains of all strategies in Fig. 13 match the theoretical sum multiplexing gains in Table II. In line with the multiplexing gain results, where MISO NOMA (G = 1) has the lowest multiplexing gain, we also observe from Fig. 13 that though MISO NOMA (G = 1) has the highest receiver complexity, its ergodic sum-rate performance is the worst even in the preferred NOMA overloaded setting when the users have channel strength disparities. MISO NOMA (G = 1) always achieves a worse sum-rate than MU-LP.
It is not beneficial for enhancing the sum-rate of multi-antenna scenarios regardless of whether perfect or imperfect CSIT is used. In comparison, 1-layer RS achieves explicit sum multiplexing gains and sum-rate improvement over all other strategies. Fig. 14 and 15 illustrate the MMF ergodic rate results. In general, the MMF multiplexing gains of all strategies in both figures match the theoretical MMF multiplexing gain results specified in Table II. For M = 3, 4, 5, and 6, we observe that 1-layer RS achieves significantly higher multiplexing gains, which is also reflected in the MMF ergodic rate performance in Fig. 14 and 15. In both the perfect and imperfect CSIT settings, user fairness cannot be improved by MISO NOMA. The MMF ergodic rate performance of MISO NOMA is much worse than that of 1-layer RS.
Therefore, the four misconceptions behind multi-antenna NOMA are further verified for imperfect CSIT. Higher sum-rate and MMF rate gaps between RS and MU-LP/multi-antenna NOMA are generally observed by comparing the corresponding perfect and imperfect CSIT results. By partially decoding the interference and treating the remaining interference as noise, 1-layer RS is more robust to CSIT inaccuracy. The large performance gain of RS makes it an appealing strategy for future communication networks.
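The imperfect CSIT channel model used in this subsection can be sketched as follows. The code assumes that the actual channel is the sum of the estimate and the estimation error, that $P_t$ is expressed in linear scale with $P_t \geq 1$ so that $\sigma_{e,k}^2 \leq \sigma_k^2$, and that the function name and argument names are hypothetical; it illustrates the stated distributions only, not the SAA/WMMSE optimization itself.

```python
import numpy as np


def sample_imperfect_csit(M, sigma2, P_t, alpha, rng):
    """Draw one channel estimate and one estimation error for a single user.

    Entries are i.i.d. CN(0, sigma2 - sigma2_e) for the estimate and
    CN(0, sigma2_e) for the error, with sigma2_e = sigma2 * P_t ** (-alpha).
    """
    sigma2_e = sigma2 * P_t ** (-alpha)
    if sigma2_e > sigma2:
        raise ValueError("P_t must be at least 1 (linear scale) in this sketch")

    def cn(var):
        # Circularly symmetric complex Gaussian with total variance `var`.
        real = rng.standard_normal(M)
        imag = rng.standard_normal(M)
        return np.sqrt(var / 2) * (real + 1j * imag)

    h_hat = cn(sigma2 - sigma2_e)   # known at the transmitter
    h_tilde = cn(sigma2_e)          # unknown estimation error
    return h_hat, h_tilde, sigma2_e


# Usage: M = 4 antennas, unit channel variance, 20 dB SNR, CSIT quality 0.5.
rng = np.random.default_rng(0)
h_hat, h_tilde, sigma2_e = sample_imperfect_csit(4, 1.0, 100.0, 0.5, rng)
h = h_hat + h_tilde  # assumed actual channel realization
print("error variance:", sigma2_e)
```

With $\alpha = 0.5$ the error variance decays only as the square root of the transmit power, which is why residual multi-user interference remains significant at high SNR for schemes that rely on accurate CSIT.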

C. Discussions
The simulations so far fully validate the theoretical multiplexing gain analysis and confirm the inefficiency of MISO NOMA. We therefore conclude that the fundamental design principle of NOMA, namely forcing one user to decode the message(s) of other user(s), should be reconsidered or very carefully used for multi-antenna settings.
Thanks to its ability to partially decode interference and partially treat interference as noise, 1-layer RS achieves equal or higher sum-rate and MMF rate performance than all other strategies in both underloaded and overloaded regimes, especially when it comes to metrics that favor user fairness (e.g., MMF rate) in an overloaded regime. This is due to the fact that the inter-user interference becomes stronger when all users are active and the number of transmit antennas is limited. The superiority of 1-layer RS in managing multi-user interference becomes more pronounced when users suffer from stronger interference. Most importantly, 1-layer RS requires no user grouping and decoding order optimization at the transmitter and only one layer of SIC at each user. Compared with MISO NOMA, the sum-rate and MMF rate performance gain of RS comes at a much reduced transmitter and receiver complexity. 1-layer RS enables a better trade-off between the rate performance gains and the number of SIC layers. Hence, we conclude that 1-layer RS is a more powerful and promising strategy for multi-antenna networks.
Though the evaluations have been limited to 1-layer RS as the basic RSMA scheme, further rate enhancements over 1-layer RS can be obtained with multi-layer RS where the message of a user is split multiple times and multiple SIC layers are implemented at the receivers, as demonstrated in [46], [47], [68], [79], [80].

IX. CONCLUSIONS AND FUTURE RESEARCH
This paper provides a broad, different and useful perspective on multi-antenna NOMA and non-orthogonal transmission to the community working on NOMA and multiple access, and to the future generations of researchers working on multi-user multi-antenna communications. Although NOMA in single-antenna settings has been well understood for a long time, the paper shows that the design of non-orthogonal transmission strategies for multi-antenna settings should be done with care so as to benefit from the multi-antenna dimensions and SIC receivers.
The paper showed in Section II that two-user multi-antenna NOMA increases the receiver complexity and at the same time incurs a loss in multiplexing gain (and therefore rate at high SNR) compared to conventional multiuser precoding (as used in 4G and 5G), therefore raising concerns about the efficiency of multi-antenna NOMA. Subsequently, general K-user settings with perfect CSIT and imperfect CSIT were studied in Section III and Section IV, respectively, and various multiplexing gains of multi-antenna NOMA were derived. Then we introduced two baseline schemes, namely K-user conventional multiuser precoding in Section V and K-user multi-antenna rate-splitting in Section VI, and studied the multiplexing gains of those schemes. Section VII compares the multiplexing gains of all considered schemes and provides strong theoretical grounds for performance comparisons among all schemes. In particular, it identifies the scenarios where NOMA incurs a gain and a loss compared to multiuser linear precoding and demonstrates how NOMA always leads to lower multiplexing gains than rate-splitting though it makes use of a larger number of SIC layers at the receivers. This section is instrumental and exposes various misconceptions and shortcomings of multi-antenna NOMA. Simulation results are then used in Section VIII to confirm our findings and predictions from the multiplexing gain analysis.
Our results show that NOMA is not an efficient solution to cope with the high throughput, reliability, heterogeneity of Quality-of-Service (QoS), and connectivity requirements of the downlink of future 5G and beyond multi-antenna wireless networks. This is because the fundamental principle of NOMA, namely forcing one user in each group to fully decode the messages of the other co-scheduled users, is an inefficient design in multi-antenna settings. Consequently, the benefits of multi-antenna NOMA for downlink communications to the research community and to future standards and networks (e.g., MISO/MIMO techniques for NOMA, NOMA for massive MIMO and cell-free massive MIMO, multi-antenna NOMA for millimetre and terahertz communications, NOMA for multi-beam satellite communications, multi-antenna NOMA in reconfigurable intelligent surfaces, multi-antenna Multiuser Superposition Transmission (MUST) in 3GPP, etc.) are questionable and should be considered carefully in light of the results in this paper.
Instead, non-orthogonal transmission strategies for multi-antenna settings should be designed such that interference is partially decoded and partially treated as noise, based on the rate-splitting (multiple access) literature, so as to truly benefit from multi-antenna transmitters (and potentially multi-antenna receivers) and SIC receivers.
The emphasis of the paper was on downlink multi-user communications. Results suggest that future downlink multi-user multi-antenna communications would strongly benefit from RSMA. RSMA is a gold mine of research problems for academia and industry with issues spanning numerous areas: RSMA to achieve the fundamental limits of wireless networks; RSMA for multi-user/multi-cell multi-antenna networks; RSMA-based robust interference management; RSMA in MU-MIMO, CoMP, Massive MIMO, millimetre wave and higher frequency bands, relay, cognitive radio, caching, physical layer security, cooperative communications, cloud-enabled platforms (C-RAN, F-RAN), intelligent reflecting surfaces, etc.; RSMA to unify, generalize and outperform SDMA and NOMA; physical layer design of RSMA-based networks; coding and modulation for RSMA; cross-layer design, optimization and performance analysis of RSMA; implementation and standardization of RSMA; RSMA in B5G services such as enhanced eMBB, enhanced URLLC, enhanced MTC, massive MTC, massive IoT, V2X, cellular, UAV and satellite networks, wireless powered communications, integrated communications and sensing, etc. RSMA can also be used in the uplink, as originally shown for single-antenna systems in [91], and much is left to be done to identify the benefits of RSMA for general uplink multi-user multi-antenna communications. The performance benefits of RSMA vs NOMA vs OMA vs other multiple access schemes in the uplink, beyond the existing NOMA vs OMA comparison [92], are also well worth investigating.

APPENDIX A PROOF OF PROPOSITION 4
Let us first consider $G > 1$ and $M \geq K - g + 1$. Recalling from the proof of Proposition 3 that the sum multiplexing gain of $G\alpha$ can be split equally among the $G$ groups so that each group gets a (group) sum multiplexing gain of $\alpha$, and following again the MAC argument, the (group) sum multiplexing gain of $\alpha$ in each group can then be further split equally among the $g$ users, which leads to an upper bound on the MMF multiplexing gain of $\alpha/g$. Achievability is obtained by designing precoders using ZFBF, and allocating power (consider group 1 for simplicity) to user $k = 1, \ldots, g$ as $O(P^{1 - \frac{g-k}{g}\alpha})$, which causes the SINR for user-$k$ to scale as $O(P^{\alpha/g})$ and yields an achievable MMF multiplexing gain of $\alpha/g$.
To illustrate the achievability in more detail, we consider a simple example associated with $K = 4$, $G = 2$, $g = 2$, and $M \geq 3$. First, we design the precoders $\mathbf{p}_1$ and $\mathbf{p}_2$ in group 1 to be orthogonal to the channel estimates $\hat{\mathbf{h}}_3$ and $\hat{\mathbf{h}}_4$ of users 3 and 4. Similarly, $\mathbf{p}_3$ and $\mathbf{p}_4$ in group 2 are made orthogonal to $\hat{\mathbf{h}}_1$ and $\hat{\mathbf{h}}_2$. Second, we allocate power $O(P^b)$ with $b = 1 - \alpha/2$ to users 1 and 3, and $O(P - P^b) = O(P)$ to users 2 and 4. Using these precoders and power allocations, the received signals in group 1 can be written as
$$y_1 = \underbrace{\mathbf{h}_1^H \mathbf{p}_1 s_1}_{P^b} + \underbrace{\mathbf{h}_1^H \mathbf{p}_2 s_2}_{P} + \underbrace{\mathbf{h}_1^H \mathbf{p}_3 s_3}_{P^{b-\alpha}} + \underbrace{\mathbf{h}_1^H \mathbf{p}_4 s_4}_{P^{1-\alpha}} + \underbrace{n_1}_{P^0}, \tag{43}$$
$$y_2 = \underbrace{\mathbf{h}_2^H \mathbf{p}_1 s_1}_{P^b} + \underbrace{\mathbf{h}_2^H \mathbf{p}_2 s_2}_{P} + \underbrace{\mathbf{h}_2^H \mathbf{p}_3 s_3}_{P^{b-\alpha}} + \underbrace{\mathbf{h}_2^H \mathbf{p}_4 s_4}_{P^{1-\alpha}} + \underbrace{n_2}_{P^0}, \tag{44}$$
where the quantities under the brackets refer to how the power level of each term scales. From (43) and (44), $s_2$ can be decoded at an SINR level scaling as $\frac{P}{P^b + P^{1-\alpha} + P^{b-\alpha} + P^0} = P^{1-b} = P^{\alpha/2}$. Using SIC, $s_2$ is cancelled in (43), and $s_1$ can be decoded at an SINR level scaling as $\frac{P^b}{P^{1-\alpha} + P^{b-\alpha} + P^0} = P^{\alpha/2}$. Similar expressions hold for group 2, and we note that all four streams have an SINR scaling as $P^{\alpha/2}$, therefore achieving an MMF multiplexing gain of $\alpha/2$.
Let us now consider $G > 1$ and $M < K - g + 1$. Since the MMF multiplexing gain collapses to 0 in the perfect CSIT setting, the same holds for imperfect CSIT.
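The exponent bookkeeping in this achievability argument can be checked mechanically. The Python sketch below is an illustration only and not part of the proof. It assumes that a sum of power terms scales as its largest exponent, that stream $s_k$ is decoded after the higher-power streams of its own group have been removed by SIC, and that ZFBF with imperfect CSIT leaves a residual inter-group leakage whose exponent is that of the interfering stream minus $\alpha$. The function names `power_exponents` and `sinr_exponents` are hypothetical.

```python
from fractions import Fraction


def power_exponents(g, alpha):
    """Power exponents a_k = 1 - (g - k) / g * alpha for users k = 1..g of one group."""
    return [1 - Fraction(g - k, g) * alpha for k in range(1, g + 1)]


def sinr_exponents(g, alpha):
    """SINR exponent of each stream s_1..s_g under the assumptions stated above."""
    a = power_exponents(g, alpha)
    # Strongest residual inter-group leakage: the other group's strongest
    # stream (exponent max(a) = 1) attenuated by P^(-alpha).
    leakage = max(a) - alpha
    exponents = []
    for k in range(1, g + 1):
        # Noise has exponent 0; streams k+1..g were already cancelled by SIC.
        interference = [leakage, Fraction(0)]
        if k > 1:
            # Strongest remaining intra-group interferer is stream k-1.
            interference.append(a[k - 2])
        exponents.append(a[k - 1] - max(interference))
    return exponents
```

With `g = 2` and `alpha = Fraction(1, 2)`, both streams come out with exponent 1/4, which is $\alpha/2$ as in the example; with `g = 3` and `alpha = Fraction(3, 5)`, all three streams come out with exponent 1/5, which is $\alpha/g$.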
Let us now consider $G = 1$. The situation here is the same as in the perfect CSIT setting. There is no inter-group interference and the sum multiplexing gain of one in the single group can be split equally among the $K$ users, which leads to an upper bound on the MMF multiplexing gain of $1/K$.
Achievability is obtained by choosing the powers of users $k = 1, \ldots, K$ as $O(P^{k/K})$, which causes the SINR of user-$k$ to scale as $O(P^{1/K})$ and results in an achievable MMF multiplexing gain of $1/K$.
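As a quick check of this scaling, assume (as in the two-group example) that stream $s_k$ is decoded after the higher-power streams $s_{k+1}, \ldots, s_K$ have been removed by SIC, and that a sum of power terms scales as its largest exponent. Under these assumptions the SINR of stream $s_k$ behaves as
$$\mathrm{SINR}_k \sim \frac{P^{k/K}}{\sum_{m=1}^{k-1} P^{m/K} + P^0} \sim \frac{P^{k/K}}{P^{(k-1)/K}} = P^{1/K}, \quad k = 1, \ldots, K.$$
For instance, with $K = 3$ the powers scale as $P^{1/3}$, $P^{2/3}$ and $P$, and each of the three streams sees an SINR scaling as $P^{1/3}$.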

APPENDIX B WMMSE OPTIMIZATION FRAMEWORK
The WMMSE optimization framework to solve both problems (41) and (42) is specified as follows.
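At user-$j$, $j \in \mathcal{K}_i$, equalizer $g_{j,k}$ is employed to decode stream $s_k$, $k \in \{k \mid k \geq j, k \in \mathcal{K}_i\}$. The estimate of $s_k$ at user-$j$ is obtained as $\hat{s}_{j,k} = g_{j,k} y_{j,k}$, where
$$y_{j,k} = \sum_{m \leq k, m \in \mathcal{K}_i} \mathbf{h}_j^H \mathbf{p}_m s_m + \sum_{l \neq i, l \in \mathcal{G}} \sum_{m \in \mathcal{K}_l} \mathbf{h}_j^H \mathbf{p}_m s_m + n_j$$
is the signal received at user-$j$ after removing the streams decoded before $s_k$. The corresponding Mean Square Error (MSE) is given by
$$\varepsilon_{j,k} \triangleq \mathbb{E}\{|\hat{s}_{j,k} - s_k|^2\} = |g_{j,k}|^2 T_{j,k} - 2\Re\{g_{j,k} \mathbf{h}_j^H \mathbf{p}_k\} + 1, \tag{45}$$
where $T_{j,k} = |\mathbf{h}_j^H \mathbf{p}_k|^2 + I_{j,k}$ is the power of $y_{j,k}$ and $I_{j,k}$ is the interference-plus-noise power, whose interference part consists of the intra-group and inter-group interference powers defined in (18).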
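By solving $\partial \varepsilon_{j,k} / \partial g_{j,k} = 0$, the optimal Minimum MSE (MMSE) equalizer is calculated as
$$g^{\mathrm{MMSE}}_{j,k} = \mathbf{p}_k^H \mathbf{h}_j T_{j,k}^{-1}. \tag{46}$$
Substituting (46) back to (45), the corresponding MMSE is then obtained as
$$\varepsilon^{\mathrm{MMSE}}_{j,k} \triangleq \min_{g_{j,k}} \varepsilon_{j,k} = T_{j,k}^{-1} I_{j,k}. \tag{47}$$
With the introduced $\varepsilon^{\mathrm{MMSE}}_{j,k}$, the rate at user-$j$ to decode the message of user-$k$ in (17) is equivalently written as $R_{j,k} = -\log_2(\varepsilon^{\mathrm{MMSE}}_{j,k})$. Defining the Weighted MSE (WMSE) of $\varepsilon_{j,k}$ with a weight $u_{j,k} > 0$ as
$$\xi_{j,k} = u_{j,k} \varepsilon_{j,k} - \log_2(u_{j,k}), \tag{48}$$
and defining its Weighted MMSE (WMMSE) by minimizing $\xi_{j,k}$ over $u_{j,k}$ and $g_{j,k}$ as
$$\xi^{\mathrm{MMSE}}_{j,k} \triangleq \min_{u_{j,k}, g_{j,k}} \xi_{j,k}, \tag{49}$$
we then establish the rate-WMMSE relationship, which is given by
$$\xi^{\mathrm{MMSE}}_{j,k} = 1 - R_{j,k}. \tag{50}$$
The rate-WMMSE relation in (50) is obtained as follows. The optimum equalizer is calculated as $g^{*}_{j,k} = g^{\mathrm{MMSE}}_{j,k}$ from $\partial \xi_{j,k} / \partial g_{j,k} = 0$. Substituting $g^{\mathrm{MMSE}}_{j,k}$ back to (48) yields $\xi_{j,k}(g^{\mathrm{MMSE}}_{j,k}) = u_{j,k} \varepsilon^{\mathrm{MMSE}}_{j,k} - \log_2(u_{j,k})$. By solving $\partial \xi_{j,k}(g^{\mathrm{MMSE}}_{j,k}) / \partial u_{j,k} = 0$, we then obtain the optimal MMSE weight, which is given as
$$u^{*}_{j,k} = u^{\mathrm{MMSE}}_{j,k} \triangleq (\varepsilon^{\mathrm{MMSE}}_{j,k})^{-1}. \tag{51}$$
Substituting $u^{\mathrm{MMSE}}_{j,k}$ back to $\xi_{j,k}(g^{\mathrm{MMSE}}_{j,k})$, we have $\min_{u_{j,k}, g_{j,k}} \xi_{j,k} = 1 - R_{j,k}$. Following (49), we obtain (50).
Motivated by the rate-WMMSE relationship in (50), we find that the achievable rate of user-$k$ in (19) can be equivalently written as
$$R_k = 1 - \max_{j \leq k, j \in \mathcal{K}_i} \xi^{\mathrm{MMSE}}_{j,k}. \tag{52}$$
Denoting $\xi_k = \max_{j \leq k, j \in \mathcal{K}_i} \xi_{j,k}$ and the respective sets of equalizers and weights as $\mathbf{g} = \{g_{j,k} \mid j \leq k, k, j \in \mathcal{K}_i, i \in \mathcal{G}\}$ and $\mathbf{u} = \{u_{j,k} \mid j \leq k, k, j \in \mathcal{K}_i, i \in \mathcal{G}\}$, the sum-rate WMMSE problem is formulated as
$$\min_{\mathbf{P}, \mathbf{u}, \mathbf{g}} \sum_{k \in \mathcal{K}} \xi_k \tag{53a}$$
$$\text{s.t.} \quad \mathrm{tr}(\mathbf{P}\mathbf{P}^H) \leq P. \tag{53b}$$
Algorithm 1: WMMSE-based AO algorithm
Initialize $\mathbf{P}$;
repeat
Update $(\mathbf{g}, \mathbf{u}) \leftarrow (\mathbf{g}^{\mathrm{MMSE}}(\mathbf{P}), \mathbf{u}^{\mathrm{MMSE}}(\mathbf{P}))$;
Substitute $(\mathbf{g}, \mathbf{u})$ back to (53) and update $\mathbf{P}$ by solving (53);
until convergence;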
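The chain from the MSE to the rate-WMMSE relationship can be verified numerically for a single pair $(j, k)$. The Python sketch below is a sanity check under stated assumptions, not the authors' implementation. It assumes the MSE has the standard form $|g|^2 T - 2\Re\{g\, \mathbf{h}_j^H \mathbf{p}_k\} + 1$ with $T = |\mathbf{h}_j^H \mathbf{p}_k|^2 + I$, and it evaluates the WMSE at the weight $u = 1/\varepsilon^{\mathrm{MMSE}}$ used in the text. The numerical values of `signal` (standing for $\mathbf{h}_j^H \mathbf{p}_k$) and `interference` (standing for $I_{j,k}$) are hypothetical.

```python
import math


def mse(g, signal, interference):
    """MSE |g|^2 T - 2 Re{g * signal} + 1 with T = |signal|^2 + interference."""
    total = abs(signal) ** 2 + interference
    return abs(g) ** 2 * total - 2 * (g * signal).real + 1


def mmse_equalizer(signal, interference):
    """Equalizer that zeroes the derivative of the MSE: conj(signal) / T."""
    total = abs(signal) ** 2 + interference
    return signal.conjugate() / total


def wmse(u, eps):
    """Weighted MSE u * eps - log2(u) for a weight u > 0."""
    return u * eps - math.log2(u)


signal = complex(0.8, -1.1)  # hypothetical value of h_j^H p_k
interference = 0.7           # hypothetical interference-plus-noise power I_{j,k}

g_opt = mmse_equalizer(signal, interference)
eps_mmse = mse(g_opt, signal, interference)
rate = math.log2(1 + abs(signal) ** 2 / interference)  # log2(1 + SINR)
xi_mmse = wmse(1 / eps_mmse, eps_mmse)
```

Here `eps_mmse` equals `interference / (abs(signal) ** 2 + interference)`, `-math.log2(eps_mmse)` matches `rate`, and `xi_mmse` matches `1 - rate`, up to floating-point error.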
Following the proof of [43], we find that the MMSE solutions of the equalizers $\mathbf{g}^{\mathrm{MMSE}} = \{g^{\mathrm{MMSE}}_{j,k} \mid j \leq k, k, j \in \mathcal{K}_i, i \in \mathcal{G}\}$ and weights $\mathbf{u}^{\mathrm{MMSE}} = \{u^{\mathrm{MMSE}}_{j,k} \mid j \leq k, k, j \in \mathcal{K}_i, i \in \mathcal{G}\}$ satisfy the KKT optimality conditions of (53). Substituting $(\mathbf{g}^{\mathrm{MMSE}}, \mathbf{u}^{\mathrm{MMSE}})$ back to (53) with affine transformations applied to the objective function, (53) boils down to (41). In fact, for any point $(\mathbf{P}^{*}, \mathbf{u}^{*}, \mathbf{g}^{*})$ satisfying the KKT optimality conditions of (53), the solution $\mathbf{P}^{*}$ satisfies the KKT optimality conditions of (41). Hence, (53) yields a solution for (41).
Although the transformed problem (53) is still non-convex, it is block-wise convex with respect to $\mathbf{P}$ and $(\mathbf{g}, \mathbf{u})$. For a given $\mathbf{P}$, the optimal equalizers and weights are $\mathbf{g}^{\mathrm{MMSE}}(\mathbf{P})$ and $\mathbf{u}^{\mathrm{MMSE}}(\mathbf{P})$. When $(\mathbf{g}, \mathbf{u})$ are fixed, problem (53) becomes convex in $\mathbf{P}$ and can be solved by interior-point methods. Motivated by the block-wise convexity, we use the Alternating Optimization (AO) algorithm as illustrated in Algorithm 1 to solve (53). In each iteration, the equalizers and weights are first updated to $(\mathbf{g}^{\mathrm{MMSE}}(\mathbf{P}), \mathbf{u}^{\mathrm{MMSE}}(\mathbf{P}))$ for the current $\mathbf{P}$ and substituted back to (53). Precoder $\mathbf{P}$ is then updated by solving (53). $\mathbf{P}$ and $(\mathbf{g}, \mathbf{u})$ are updated in an alternating manner until the sum-rate converges. Algorithm 1 is guaranteed to converge, and it converges to a KKT solution of problem (41). Readers are referred to [43] for the proof.
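Following the same procedure, we are able to obtain the transformed WMMSE problem for max-min rate maximization, which is given by
$$\min_{\mathbf{P}, \mathbf{u}, \mathbf{g}} \max_{k \in \mathcal{K}} \xi_k \tag{54a}$$
$$\text{s.t.} \quad \mathrm{tr}(\mathbf{P}\mathbf{P}^H) \leq P. \tag{54b}$$
By substituting problem (53) in Algorithm 1 with problem (54), we obtain the corresponding AO algorithm to achieve a KKT solution of the max-min rate problem (42).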