Angle-Domain Hybrid Beamforming-Based mmWave Massive MIMO-NOMA Systems

Millimeter-wave (mmWave) systems with large-scale antenna arrays (LSAAs) play a vital role in increasing the beamforming (BF) gain and achieving highly directional propagation. Recently, non-orthogonal multiple access (NOMA) has been integrated into these systems to support massive connectivity and achieve spectrally efficient communications. This paper focuses on angle-domain (AD) hybrid BF for mmWave LSAA-NOMA systems, owing to its low complexity, power consumption, and channel estimation overhead. However, with limited radio-frequency (RF) chains, the hybrid BF-based single-beam (SB)-NOMA scheme, which generates a single beam to serve the NOMA users, fails to exploit multi-user diversity because of the narrow beams of LSAAs. To tackle this limitation, we design schemes that offer additional degrees of freedom. More importantly, unlike those proposed in the literature, they require only angular information and are suitable for both linear and rectangular antenna arrays. The first scheme exploits time-domain resources to schedule groups with high spatial interference in distinct time slots. To reduce the need for fast and precise synchronization when applying time division multiple access (TDMA) with mmWave NOMA, we leverage the multi-beam (MB)-NOMA framework and propose a joint SB- and MB-NOMA scheme that benefits from NOMA multi-user diversity regardless of the cell load and the users' positions. Using the New York University channel simulator (NYUSIM), we validate the performance of the proposed schemes against the solution proposed in the literature and against schemes using fully digital BF. Specifically, the proposed TDMA-based scheme achieves a sum-rate gain of up to 83% over the existing TDMA-based scheme in the literature. Moreover, we verify the superiority of applying both SB- and MB-NOMA over applying MB-NOMA alone.

due to its high energy consumption and unaffordable hardware complexity and cost [3], [4]. Alternatively, hybrid BF (HBF), with both sub- and fully-connected structures, is considered an effective solution for implementing mmWave mMIMO systems [4], [5]. This technique separates the signal processing into a low-dimensional digital precoder (addressing a small number of RF chains) and a high-dimensional analog precoder in the RF band that increases the array gain. On the other hand, mMIMO requires a large channel state information (CSI) acquisition overhead, which entails long training sequences and signaling delays. Given the main features of mmWave channels, i.e., high directionality and significant blockage, the user's angle of departure (AoD) with respect to the base station (BS) is considered a promising form of partial CSI for mmWave mMIMO systems [6]. Indeed, it depends only on the direction of the line-of-sight (LoS) path, so it varies slowly over time and its dimension does not grow with the number of antennas. This information has attracted much attention from the research community in various communication scenarios, leading to several angle-domain channel estimation and tracking techniques, such as [7] for high-speed railways, [8] for indoor 60 GHz mMIMO systems, and [9] for mmWave hybrid mMIMO systems.
Massive connectivity is a critical requirement in future cellular networks. However, conventional orthogonal multiple access (OMA) techniques, such as space division multiple access (SDMA), time division multiple access (TDMA), and frequency division multiple access (FDMA), serve a single user per space-time-frequency-code resource block (RB). Recently, NOMA has emerged to improve network capacity and accommodate massive connectivity by exploiting additional non-orthogonal resources (e.g., power-domain resources). Using superposition coding at the transmitter and successive interference cancellation (SIC) at the receiver, the BS can serve multiple users with different channel conditions on the same orthogonal RB.
Motivated by these observations, we focus on angle-domain (AD) mmWave HBF-based mMIMO-NOMA systems to benefit from three essential technologies, namely mmWave, mMIMO, and NOMA, using only angular information and a low-complexity BF technique. Next, we review the HBF-based MIMO-NOMA systems presented in the literature.

B. RELATED WORKS
HBF-based mMIMO-NOMA systems have been extensively studied in the literature to reduce energy consumption and hardware complexity [10], [11], [12], [13], [14], [15], [16], [17], [18]. In [10], the authors design an HBF-based mmWave beamspace MIMO-NOMA scheme that supports more users than RF chains, together with an iterative power allocation algorithm that maximizes the system sum-rate. The authors in [11] apply NOMA with fully-connected HBF and develop a new HBF technique by modifying the conventional block diagonalization scheme, which improves the achievable spectral efficiency by reducing co-channel interference. The authors in [12] propose a user clustering algorithm based on the users' channel correlation and then formulate a joint HBF and power allocation problem that maximizes the system sum-rate under a minimum rate constraint. Since this problem is non-convex, they first fix an arbitrary HBF and find the power allocation solution. Then, for a given analog precoder, they design a digital precoder that reduces inter-group interference using the approximate zero-forcing (ZF) method. Finally, they solve the analog precoder problem under the constant-modulus constraint with a proposed boundary-compressed particle swarm optimization algorithm. In [13], the authors consider the sub- and fully-connected structures in the RF stage and ZF in the digital baseband, and they propose an iterative low-complexity power allocation algorithm to maximize energy efficiency. To improve spectral and energy efficiencies, the authors in [14], [15] integrate HBF-based MIMO-NOMA with simultaneous wireless information and power transfer (SWIPT). The authors in [16] propose an optimal analog precoder aided by ZF in the baseband to maximize both the sum-rate and the energy efficiency, for both hybrid structures operating in LoS and non-LoS (NLoS) mmWave environments. In [17], the authors use, for the first time, the signal-to-leakage-plus-noise ratio (SLNR) as the performance index to tackle resource optimization in HBF-based mMIMO-NOMA. Specifically, they formulate a joint power allocation and HBF optimization problem that maximizes the minimum user SLNR, thus ensuring fairness between users.
However, previous works on HBF-based MIMO-NOMA adopt highly complex digital precoders and require complete CSI knowledge. To reduce the complexity of baseband signal processing and the channel estimation overhead with LSAAs, we consider AD MIMO-NOMA. Most existing works on AD MIMO-NOMA using analog BF (ABF), digital BF (DBF), or HBF partition the users into groups according to their angle differences and form a single beam toward each group [6], [19], [20], [21], [22]. These schemes are referred to here as single-beam (SB)-NOMA. Unlike these works, in [23], [24] we defined a spatial interference metric, denoted β, built on the array factor definition, and developed β-based 2-user and multi-user clustering algorithms that schedule users with high spatial interference in the same SB-NOMA group. Those works considered the AD-DBF technique, whereas this paper focuses on implementing the AD-HBF technique with LSAAs. However, the beams are very narrow in such systems, so with SB-NOMA only users with similar AoDs can be served simultaneously in the same group. Consequently, in mmWave hybrid systems with a limited number of RF chains, SB-NOMA with LSAAs cannot provide connectivity to all users, especially in congested cells.
To exploit more degrees of freedom (DoFs) and accommodate more users with limited RF chains, Hu et al. [19] suggest a joint SB-NOMA and TDMA scheme, referred to here as SB-NOMA-TDMA. Specifically, they cluster the users into single-user (SU) and SB-NOMA groups whose angles belong to a predefined set. Subsequently, a group clustering algorithm schedules the groups in the time domain to limit the interference in each time slot, based on the observation that increasing the spatial direction distance significantly reduces inter-group interference. However, to benefit from the reduced complexity and cost of HBF-based SB-NOMA-TDMA, fast and accurate time synchronization between users is essential, since mmWave communication generally operates at a high symbol rate. This also increases the synchronization complexity at both the BS and the receivers.
Recently, the authors in [25] discuss the concept of the multi-beam (MB)-NOMA framework in mmWave hybrid systems, and the authors in [26] provide its implementation details. Specifically, they propose a beam-splitting technique (BST) that divides the entire transmit array into several sub-arrays. Accordingly, with a single RF chain, the BS can generate multiple analog beams toward multiple NOMA users with arbitrary AoDs. The authors in [26] show that, unlike SB-NOMA, MB-NOMA efficiently exploits multi-user diversity by performing NOMA transmission even when the users have well-separated AoDs, especially with LSAAs. In [27], the authors design a suboptimal two-stage resource allocation scheme that maximizes the system sum-rate based on full CSI. In the first stage, considering only ABF, they propose a joint user grouping and antenna allocation algorithm that maximizes the conditional system sum-rate by leveraging coalition formation game theory. In the second stage, they adopt ZF in the digital baseband and formulate a non-convex power allocation problem that maximizes the system sum-rate subject to quality-of-service (QoS) constraints, for which a suboptimal solution is devised. However, the schemes in [26], [27] require complete CSI knowledge, and the schemes in both [19] and [26], [27] are designed only for uniform linear array (ULA) architectures.

C. CONTRIBUTIONS
In this paper, we focus on AD HBF-based mMIMO-NOMA systems where only the users' angles are known at the BS. Due to the narrowness of beams with LSAAs and the limited number of RF chains, SB-NOMA fails to provide full connectivity to all users in overloaded scenarios. To address this issue, we propose two schemes that offer additional DoFs to the SB-NOMA scheme: the first leverages time-domain resources through TDMA, whereas the second relies solely on NOMA. More importantly, the proposed schemes require only angular information and support either a ULA or a uniform rectangular array (URA). The main contributions of our work are as follows:
• We propose two two-phase schemes, namely the joint β-based SB-NOMA and TDMA scheme and the joint AD SB- and MB-NOMA scheme, which address the limitation of SB-NOMA with hybrid LSAAs by providing more DoFs in overloaded scenarios.
• The joint β-based SB-NOMA and TDMA scheme proposed in this work is inspired by the one introduced in [19]. However, instead of relying only on the angular distance as in [19], we utilize the β-based spatial interference metric derived in [24], which is built on the array factor and accurately quantifies inter-group interference for any uniform array architecture, such as ULA, URA, or uniform circular array (UCA).
• Unlike previous works on AD HBF-based mMIMO-NOMA, the proposed joint AD SB- and MB-NOMA scheme leverages the potential of SB-NOMA when users are close to each other and the ability of MB-NOMA to accommodate several users with distinct AoDs. This extends our prior work in [28], which focused solely on ULA.

D. ORGANIZATION
The rest of this paper is organized as follows. The system model of HBF-based MIMO-NOMA is presented in Section II. The sum-rate performance of SDMA, SB-NOMA, and MB-NOMA is analyzed in Section III. The proposed joint β-based SB-NOMA and TDMA scheme and the joint AD MB-SB-NOMA scheme are presented in Sections IV and V, respectively. The performance of the proposed schemes is evaluated in Section VI. Finally, Section VII concludes the paper.

E. NOTATIONS
Throughout this paper, bold uppercase $\mathbf{A}$, bold lowercase $\mathbf{a}$, and lowercase $a$ denote a matrix, a vector, and a scalar, respectively. $(\cdot)^T$, $(\cdot)^H$, and $\mathrm{Tr}(\cdot)$ represent the transpose, the Hermitian transpose, and the trace, respectively. $\mathcal{N}(\nu, \sigma^2)$ is a Gaussian random variable with mean $\nu$ and variance $\sigma^2$. $\mathbb{P}(\cdot)$ is the probability of an event, and $\Theta = (\theta, \varphi)$ denotes a pair of azimuth and elevation angles.

II. HBF-BASED SB-NOMA SYSTEM MODEL
The downlink HBF-based mMIMO-NOMA system consists of a BS equipped with $M = M_x M_z \gg 1$ antennas and $N_{RF} \ll M$ RF chains serving $K \ll M$ single-antenna UEs. The BS adopts a fully-connected hybrid structure. Denote by $M_x$ and $M_z$ the numbers of antennas along the x- and z-axis, respectively. In this work, we consider both 1D and 2D antenna arrays at the BS, namely a ULA along the x-axis and a URA in the xoz plane, respectively. In the classical AD MIMO-NOMA scheme, the users are grouped into SU and multi-user SB-NOMA groups according to their AoDs (see Fig. 1).

A. MMWAVE CHANNEL MODEL
The mmWave channel vector $\mathbf{h}_k \in \mathbb{C}^{M\times 1}$ between the BS and user k can be expressed as
$$\mathbf{h}_k = \sum_{n=1}^{N_k} \alpha_{n,k}\, e^{j\phi_{n,k}}\, \mathbf{a}(\Theta_{n,k}, M_x, M_z), \quad (1)$$
where $N_k$ is the number of paths, $\alpha_{n,k}$ and $\phi_{n,k}$ are the amplitude and the phase of the n-th path, and $\Theta_{n,k} = (\theta_{n,k}, \varphi_{n,k})$ is the pair of azimuth and elevation AoDs of the n-th path. $\mathbf{a}(\Theta, M_x, M_z) \in \mathbb{C}^{M\times 1}$ is the transmit array steering vector corresponding to the direction $\Theta$, given by (2) for both ULA and URA. For a ULA along the x-axis, $M_z = 1$ and $\varphi = 0$; thus, $M = M_x$ and $\Theta = (\theta, 0)$. For simplicity, we use $\mathbf{a}_{n,k}$ instead of $\mathbf{a}(\Theta_{n,k}, M_x, M_z)$ in the rest of this paper. This work considers that the LoS path exists in each user's channel and has the highest power; it is labeled by n = 1. Thus, $\Theta_{1,k} = (\theta_{1,k}, \varphi_{1,k})$ represents the spatial direction of user k. Throughout this paper, we adopt the realistic statistical channel simulator developed by New York University, called NYUSIM [29]. This simulator applies the time-cluster and spatial-lobe approach to generate the channel coefficients and is designed specifically for mmWave frequencies.
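The steering vector in (2) is not reproduced in this text. The Python/NumPy sketch below builds one under explicitly assumed conventions: half-wavelength spacing, spatial frequencies $\omega_x(\Theta) = 0.5 \sin\theta \cos\varphi$ and $\omega_z(\Theta) = 0.5 \sin\varphi$, and a URA vector ordered z-outer (Kronecker product $\mathbf{a}_z \otimes \mathbf{a}_x$). These conventions are assumptions chosen so that the sub-array splits used later in (14) and (16) hold; the paper's exact (2) may differ. The helper `channel` is hypothetical and simply implements the multipath sum of amplitude, phase, and steering vector described above.

```python
import numpy as np

D_OVER_LAMBDA = 0.5  # assumed half-wavelength element spacing


def omega_x(theta, phi):
    """Assumed spatial frequency along x for azimuth theta and elevation phi (radians)."""
    return D_OVER_LAMBDA * np.sin(theta) * np.cos(phi)


def omega_z(phi):
    """Assumed spatial frequency along z for elevation phi (radians)."""
    return D_OVER_LAMBDA * np.sin(phi)


def steering_vector(theta, phi, mx, mz):
    """Unit-modulus steering vector a(Theta, Mx, Mz) of length Mx * Mz.

    Element (z index n, x index m) is stored at position n * mx + m, i.e. a = a_z kron a_x,
    so the first half of the vector corresponds to the rows n < Mz / 2.
    """
    a_x = np.exp(1j * 2 * np.pi * np.arange(mx) * omega_x(theta, phi))
    a_z = np.exp(1j * 2 * np.pi * np.arange(mz) * omega_z(phi))
    return np.kron(a_z, a_x)


def channel(alphas, phases, angles, mx, mz):
    """Hypothetical multipath channel: sum_n alpha_n e^{j phase_n} a(Theta_n, Mx, Mz)."""
    h = np.zeros(mx * mz, dtype=complex)
    for alpha, ph, (theta, phi) in zip(alphas, phases, angles):
        h += alpha * np.exp(1j * ph) * steering_vector(theta, phi, mx, mz)
    return h


# A ULA along x is the special case Mz = 1 and phi = 0.
a_ula = steering_vector(np.deg2rad(20.0), 0.0, 8, 1)
a_ura = steering_vector(np.deg2rad(20.0), np.deg2rad(10.0), 8, 4)
print(a_ula.shape, a_ura.shape)  # (8,) (32,)
```

The z-outer ordering is the design choice that makes the second half of the URA vector a phase-shifted copy of the first half, which is what the beam-splitting construction relies on.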

B. HYBRID BEAMFORMING DESIGN
Using HBF, the total number of groups G served simultaneously by the BS is restricted by the number of RF chains, i.e., $G \leq N_{RF}$. The overall downlink HBF precoding matrix $\mathbf{F} \in \mathbb{C}^{M\times G}$ is constructed in two stages as
$$\mathbf{F} = \mathbf{W}\mathbf{D}, \quad (3)$$
where $\mathbf{D} = [\mathbf{d}_1 \cdots \mathbf{d}_G] \in \mathbb{C}^{G\times G}$ is the baseband digital component and $\mathbf{W} = [\mathbf{w}_1 \cdots \mathbf{w}_G] \in \mathbb{C}^{M\times G}$ is the RF analog component. Specifically, $\mathbf{d}_g$ and $\mathbf{w}_g$ denote the digital and analog beamformers, respectively, of group g, which is assigned to the g-th RF chain. To reduce the complexity of baseband signal processing, this work considers a pure analog mmWave system, i.e., $\mathbf{D} = \mathbf{I}_G$. Thus, since every entry of $\mathbf{W}$ has unit modulus, $\mathrm{Tr}(\mathbf{F}\mathbf{F}^H) = MG$ and the BF normalization factor is given by $\eta = \frac{1}{MG}$.

C. HBF-BASED MIMO-NOMA SYSTEM MODEL
In the AD DBF-based MIMO-NOMA scheme of [24], the K users are clustered into $G_{SU}$ SU groups and $G_{SB}$ multi-user SB-NOMA groups according to their AoDs. Denote by $\mathcal{S}_g$, $g = 1, \ldots, G$, the set of users scheduled in group g, such that $\bigcup_{g=1}^{G} \mathcal{S}_g = \mathcal{K}$ and $\mathcal{S}_g \cap \mathcal{S}_{g'} = \emptyset, \forall g \neq g'$. The user scheduling variable $u_k^g$ is defined as
$$u_k^g = \begin{cases} 1, & \text{if } k \in \mathcal{S}_g,\\ 0, & \text{otherwise}. \end{cases} \quad (4)$$
Note that each user is served within exactly one group, i.e., $\sum_{g=1}^{G} u_k^g = 1, \forall k \in \mathcal{K}$. In contrast to the angle-based clustering algorithms proposed in the literature [19], [21], we designed in [24] 2-user and multi-user β-based user clustering (β-UC) algorithms that can be applied to any antenna array architecture. This was done using the spatial interference metric $\beta_{k,u}$ defined in (5). As demonstrated in [24], for a ULA, $\beta_{k,u}$ can be rewritten as in (6) in terms of $AF_{(\Theta_{1,u})}(\Theta)$, the array factor of the beam directed at $\Theta_{1,u}$. Similarly, for any uniformly excited array architecture, e.g., URA or UCA, (5) leads to (6).
For both SU and multi-user SB-NOMA groups, the BS generates a single beam in the spatial direction of each group. This work focuses on AD HBF-based MIMO-NOMA, where $N_{RF} \ll M$. Thereby, the analog beamformer of group g is given by the array steering vector corresponding to $\Theta_g$, i.e., $\mathbf{w}_g = \mathbf{a}(\Theta_g, M_x, M_z)$ (7), where $\Theta_g = (\theta_g, \varphi_g)$ is the spatial direction of group g, given by (8). Following [24], for SB-NOMA groups, we take $\Theta_g$ as the midpoint between the minimum and maximum spatial directions of the users in group g. It is worth noting that, according to [24], β and $\Theta_g$ should be selected so that the main beam covers all users, in order to avoid severe beam misalignment. Thus, they should satisfy the conditions in (9), where $\beta_{FSL}$ is the first side-lobe level, equal to 0.217 (or −13.26 dB) for both ULA and URA [30]. Note that $\beta_{FSL}$ depends neither on the beam direction nor on the number of antennas.
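Because the definitions (5) and (6) are not reproduced here, the sketch below assumes that β is the magnitude of the normalized array factor, $\beta_{k,u} = |\mathbf{a}_{1,k}^H \mathbf{a}_{1,u}| / M$, which equals 1 for co-located users; under this assumption the quoted first side-lobe level of about 0.217 (−13.26 dB) is recovered numerically. It also takes the group direction as the midpoint of the extreme user directions, as stated above, and checks the main-lobe coverage condition $\beta_{k,g} \geq \beta_{FSL}$. Directions are expressed as ULA spatial frequencies $\omega$; the helper names `beta`, `group_direction`, and `covers_all_users` are hypothetical.

```python
import numpy as np

BETA_FSL = 0.217  # first side-lobe level quoted in the text


def ula_steering(omega, m):
    """ULA steering vector with unit-modulus entries e^{j 2 pi n omega}, n = 0..m-1."""
    return np.exp(1j * 2 * np.pi * np.arange(m) * omega)


def beta(omega_k, omega_u, m):
    """Assumed spatial interference: normalized array-factor magnitude in [0, 1]."""
    return np.abs(np.vdot(ula_steering(omega_k, m), ula_steering(omega_u, m))) / m


def group_direction(omegas):
    """Group direction taken as the midpoint between the extreme user directions."""
    return 0.5 * (min(omegas) + max(omegas))


def covers_all_users(omegas, m):
    """True if every user satisfies beta(user, group beam) >= beta_FSL."""
    og = group_direction(omegas)
    return all(beta(o, og, m) >= BETA_FSL for o in omegas)


M = 128
# The first side lobe lies between the first two nulls of the array factor (1/M and 2/M).
deltas = np.linspace(1.0 / M, 2.0 / M, 4001)[1:-1]
first_side_lobe = max(beta(0.0, d, M) for d in deltas)
print(round(first_side_lobe, 3))  # 0.217
print(covers_all_users([0.100, 0.103], M), covers_all_users([0.10, 0.20], M))  # True False
```

The second line illustrates the narrow-beam effect: with 128 antennas, two users whose spatial frequencies differ by 0.1 can no longer be covered by one SB-NOMA beam.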
In short, we apply the multi-user β-UC algorithm to partition the K users into $G_{SU}$ SU and $G_{SB}$ multi-user SB-NOMA groups. Then, we order the users in each SB-NOMA group according to the angular-based user ordering strategy of [24],¹ which relies on the angular-based channel quality metric $\tilde{\zeta}$. Denote by $\tilde{\zeta}_k^g$ the angular-based channel quality of user k in group g. Without loss of generality, we assume that the users are indexed in descending order of $\tilde{\zeta}$, i.e., $\tilde{\zeta}_k^g \geq \tilde{\zeta}_{k'}^g, \forall k \leq k'$. Therefore, SIC decoding separates the signals at the users' side in increasing order of $\tilde{\zeta}$. Assuming successful SIC decoding, user k in group g receives the signal $y_k^g$ given in (10), where $s_k$ is the modulated signal of user k; $p_g$ is the power allocated to group g such that $\sum_{g=1}^{G} p_g = P_e$, with $P_e$ the total transmit power; $\gamma_{k,g}$ is the intra-group power allocation coefficient of user k in group g such that $\sum_{k=1}^{K} u_k^g \gamma_{k,g} = 1$, following the NOMA principle; and $z_k^g \sim \mathcal{N}(0, \sigma_n^2)$ is the additive white Gaussian noise at user k. Thereby, user k in group g has the $\mathrm{SINR}_k^g$ given in (11) when decoding its own message.

¹ In [24], we proposed user ordering and power allocation techniques that require only the knowledge of the users' AoDs. We found that the proposed user ordering strategy outperforms other limited-feedback strategies and that the AD power allocation technique enables efficient SIC. For these reasons, and since we are interested in feeding back only angular information, we adopt them throughout this work.
In (11), the first term in the denominator is the residual intra-group interference after SIC, and the second is the inter-group interference. Assuming successful decoding and no error propagation, user k in group g achieves the data rate $R_k^g = \log_2\left(1 + \mathrm{SINR}_k^g\right)$, and the total sum-rate can be expressed as
$$R_T = \sum_{g=1}^{G} \sum_{k=1}^{K} u_k^g R_k^g. \quad (12)$$
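Since (10) and (11) are not reproduced in this text, the following sketch evaluates the SINR structure described above under stated assumptions: the effective gains $G[k,g] = \eta|\mathbf{h}_k^H \mathbf{w}_g|^2$ are given; users in each group are listed in descending order of $\tilde{\zeta}$ (strongest first); user k cancels every user ranked after it, while users ranked before it remain as residual intra-group interference; and beam $g'$ interferes with power $p_{g'}$ because the intra-group coefficients sum to one. The function `sum_rate` and its arguments are hypothetical.

```python
import numpy as np


def sum_rate(gains, groups, p, gamma, noise_var):
    """Sum-rate R_T = sum over groups and users of log2(1 + SINR_k^g).

    gains[k, g]: assumed effective gain eta * |h_k^H w_g|^2 of user k on beam g.
    groups[g]:   user indices of group g, in descending order of angular channel quality.
    p[g]:        power of group g; gamma[k]: intra-group power coefficient of user k.
    """
    total = 0.0
    for g, users in enumerate(groups):
        for pos, k in enumerate(users):
            signal = p[g] * gamma[k] * gains[k, g]
            # Residual intra-group interference: users ranked before k are not cancelled.
            intra = p[g] * gains[k, g] * sum(gamma[j] for j in users[:pos])
            # Inter-group interference: every other beam leaks into user k.
            inter = sum(p[q] * gains[k, q] for q in range(len(groups)) if q != g)
            total += np.log2(1.0 + signal / (intra + inter + noise_var))
    return total


# One two-user NOMA group: SINR_0 = 0.8 and SINR_1 = 0.8 / 1.2, so R_T = log2(1.8 * 5/3).
gains = np.array([[4.0], [1.0]])
r = sum_rate(gains, [[0, 1]], p=[1.0], gamma=[0.2, 0.8], noise_var=1.0)
print(round(r, 3))  # 1.585, i.e. log2(3)
```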

D. PROBLEM STATEMENT
In hybrid systems, at most $N_{RF}$ groups can be connected to the BS on the same orthogonal RB. However, when LSAAs are used with SB-NOMA, the narrowness of the beams limits the ability to handle massive connectivity and to add more DoFs. As the number of antennas M increases, the beamwidth narrows and fewer users can be served in the same NOMA group. Thus, in an overloaded scenario where $K > N_{RF}$, the probability that the cell remains overloaded under SB-NOMA, $P_{os} = \mathbb{P}(G_{SU} + G_{SB} > N_{RF})$, increases with M. As M grows large but finite, almost every user forms its own group, so $G_{SU} + G_{SB}$ approaches K. This means that, in the asymptotic regime with $K > N_{RF}$, the number of groups $G_{srv}$ served by the BS satisfies $G_{srv} \xrightarrow{M \gg 1} N_{RF}$ and $P_{os} \xrightarrow{M \gg 1} 1$. Consequently, $K - N_{RF}$ users cannot connect to the BS, making SB-NOMA insufficient for managing the connectivity of all users with LSAAs in overloaded scenarios.
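The trend of $P_{os}$ toward 1 can be illustrated with a small Monte Carlo experiment. This sketch does not implement the β-UC algorithm of [24]; it uses a hypothetical greedy proxy: users on a ULA (spatial frequency $\omega = 0.5 \sin\theta$ under the assumed half-wavelength convention, $\theta$ uniform in [−60°, 60°]) are sorted by $\omega$, and a user joins the current group only while it stays in the main lobe of the group's first member with $\beta \geq \beta_{FSL}$. The values K = 12 and $N_{RF}$ = 8 are arbitrary illustrative choices.

```python
import numpy as np

BETA_FSL = 0.217


def beta(delta_omega, m):
    """Assumed beta: normalized array-factor magnitude |sum_n e^{j 2 pi n delta}| / m."""
    n = np.arange(m)
    return np.abs(np.exp(1j * 2 * np.pi * n * delta_omega).sum()) / m


def count_groups(omegas, m):
    """Hypothetical greedy proxy for SB-NOMA grouping of users sorted by spatial frequency."""
    omegas = np.sort(omegas)
    groups, start = 1, omegas[0]
    for o in omegas[1:]:
        gap = o - start
        # Stay in the current group only inside the main lobe with beta >= beta_FSL.
        if gap < 1.0 / m and beta(gap, m) >= BETA_FSL:
            continue
        groups += 1  # o opens a new group (SU or SB-NOMA)
        start = o
    return groups


def p_overloaded(m, k=12, n_rf=8, trials=1000, seed=0):
    """Monte Carlo estimate of P(G_SU + G_SB > N_RF) under the greedy proxy."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        theta = rng.uniform(-np.pi / 3, np.pi / 3, size=k)
        hits += count_groups(0.5 * np.sin(theta), m) > n_rf
    return hits / trials


for m in (16, 64, 256):
    print(m, p_overloaded(m))
```

The estimated probability grows with M, in line with the argument above, although the exact values depend on the proxy grouping rule and the user distribution.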
This paper aims to overcome the limitation of SB-NOMA in overloaded scenarios, i.e., when $G_{SU} + G_{SB} > N_{RF}$, in order to exploit multi-user diversity with LSAAs. To do so, we design two schemes providing different types of additional DoFs to handle the connectivity of all users, such that the total number $G_f$ of groups served on the same orthogonal RB satisfies $G_f \leq N_{RF}$. Fig. 2 illustrates the flowchart of the proposed schemes. The first one, i.e., joint β-based SB-NOMA and TDMA, leverages time-domain resources and is inspired by [19]. The second, i.e., joint SB- and MB-NOMA, leverages the MB-NOMA framework [26], in which users with any AoDs can be served in the same group, i.e., by the same RF chain. Both schemes enable the connectivity of all users in overloaded scenarios and require only the knowledge of the users' spatial directions. Our analysis is restricted to $2 \leq K \leq 2N_{RF}$ because we assume that each MB-NOMA group supports only two users. Going beyond this limit and allowing the BS to serve more than $2N_{RF}$ users through multi-user MB-NOMA groups is left for future work.

III. PERFORMANCE ANALYSIS OF SPATIAL AND NON-ORTHOGONAL MULTIPLE ACCESS TECHNIQUES
In this section, we derive the sum-rate expressed in (12) for SDMA, SB-NOMA, and MB-NOMA in a two-user scenario and analyze their performance to determine how they can be exploited in our proposed schemes. Before that, we introduce the concept of MB-NOMA and explain how we extend it to URA.

A. MULTIPLE ANALOG BEAMS USING BEAM SPLITTING
The authors in [26], [27] propose an MB-NOMA framework for mmWave hybrid systems that serves multiple users with arbitrary AoDs on the same RF chain. Specifically, they design a BST that divides the antenna array into several sub-arrays to generate multiple analog beams. MB-NOMA thus exploits multi-user diversity better than SB-NOMA in mmWave hybrid systems: the number of users served simultaneously in the same NOMA cluster is not restricted by the users' AoD distribution, as it is with SB-NOMA. However, the BST proposed in [27] applies only to the ULA architecture. Since URA is more practical for mMIMO [31], we extend this technique to URA. In this work, we consider a 2-user MB-NOMA framework, where only two users belong to the same MB-NOMA group. Hence, we divide the antenna array connected to an RF chain into two sub-arrays to form two analog beams. We assume that both sub-arrays have the same size, i.e., the same numbers of antennas $M_x^{sa}$ and $M_z^{sa}$ along the x- and z-axis, respectively, with $M_x^{sa} M_z^{sa} = M/2$. Next, we present the BST separately for ULA and URA. To clarify the principle of this technique, we first rewrite the corresponding array steering vector in (2). Then, we consider users k and k' in MB-NOMA group g and reformulate the RF analog beamformer $\mathbf{w}_g$ assigned to RF chain g, which performs beam splitting to generate one beam toward each user.

1) BEAM SPLITTING WITH ULA
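We start with a ULA having $M = M_x$ antennas along the x-axis, for which the array steering vector $\mathbf{a}(\Theta, M_x, 1)$ corresponding to the angle $\Theta = (\theta, 0)$ is given by
$$\mathbf{a}(\Theta, M_x, 1) = \left[1, e^{j2\pi\omega_x(\Theta)}, \ldots, e^{j2\pi(M_x-1)\omega_x(\Theta)}\right]^T = \begin{bmatrix} \mathbf{a}(\Theta, M_x/2, 1) \\ e^{j2\pi(M_x/2)\omega_x(\Theta)}\,\mathbf{a}(\Theta, M_x/2, 1) \end{bmatrix}. \quad (14)$$
From (14), the vector $\mathbf{a}(\Theta, M_x, 1)$ can be constructed from the steering vectors of two sub-arrays with $M_x/2$ antennas each, separated by the phase shift $e^{j2\pi(M_x/2)\omega_x(\Theta)}$. This construction clarifies the BST of [27] for ULA. Now assume that the BS adopts the BST to simultaneously serve users k and k' in group g. Recall that the ULA is split into two sub-arrays with $M_x^{sa} = M_x/2$ and $M_z^{sa} = 1$. Therefore, the RF analog beamformer $\mathbf{w}_g$ assigned to RF chain g, which performs beam splitting with two beams, is given by [27]
$$\mathbf{w}_g = \begin{bmatrix} \mathbf{a}(\Theta_k, M_x/2, 1) \\ e^{j2\pi(M_x/2)\omega_x(\Theta_{k'})}\,\mathbf{a}(\Theta_{k'}, M_x/2, 1) \end{bmatrix}. \quad (15)$$
From (15), each sub-array generates a beam in the spatial direction of one user.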
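The sketch below builds the split beamformer of (15) for a ULA under the assumed convention $\omega_x = 0.5 \sin\theta$ and checks two properties stated above: splitting with the same user twice recovers the full steering vector, as in (14), and each half of $\mathbf{w}_g$ delivers an array gain close to M/2 toward its own user. The user angles and the function names are illustrative assumptions.

```python
import numpy as np


def ula_steering(theta, m):
    """Assumed ULA steering vector with omega_x = 0.5 sin(theta), entries e^{j 2 pi n omega_x}."""
    return np.exp(1j * 2 * np.pi * np.arange(m) * 0.5 * np.sin(theta))


def split_beamformer_ula(theta_k, theta_kp, m):
    """Two-beam analog beamformer: the first M/2 antennas steer to user k, the last M/2 to user k'."""
    half = m // 2
    # This phase offset makes the second half match the second half of a(theta_k', M).
    shift = np.exp(1j * 2 * np.pi * half * 0.5 * np.sin(theta_kp))
    return np.concatenate([ula_steering(theta_k, half), shift * ula_steering(theta_kp, half)])


M = 128
theta_k, theta_kp = np.deg2rad(20.0), np.deg2rad(-30.0)
w = split_beamformer_ula(theta_k, theta_kp, M)
gain_k = np.abs(np.vdot(ula_steering(theta_k, M), w))
gain_kp = np.abs(np.vdot(ula_steering(theta_kp, M), w))
print(round(gain_k, 1), round(gain_kp, 1))  # each within about 1 of M/2 = 64
```

The small deviation from M/2 comes from the other sub-array's sidelobe leakage toward each user, which shrinks as the users' spatial frequencies move apart.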

2) BEAM SPLITTING WITH URA
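Similarly, for a URA with $M_x$ and $M_z$ antennas along the x- and z-axis, respectively, the array steering vector $\mathbf{a}(\Theta, M_x, M_z) \in \mathbb{C}^{M_x M_z \times 1}$ corresponding to the angle $\Theta = (\theta, \varphi)$ can be rewritten as
$$\mathbf{a}(\Theta, M_x, M_z) = \begin{bmatrix} \mathbf{a}(\Theta, M_x, M_z/2) \\ e^{j2\pi(M_z/2)\omega_z(\Theta)}\,\mathbf{a}(\Theta, M_x, M_z/2) \end{bmatrix}. \quad (16)$$
We now extend the BST to URA.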
We vertically divide the antenna array into two sub-arrays, each with $M_x^{sa} = M_x$ and $M_z^{sa} = M_z/2$ antennas along the x- and z-axis, respectively, as shown in Fig. 3. According to (16), the RF analog beamformer $\mathbf{w}_g$ performing the BST can be constructed as
$$\mathbf{w}_g = \begin{bmatrix} \mathbf{a}(\Theta_k, M_x, M_z/2) \\ e^{j2\pi M_z^{sa}\omega_z(\Theta_{k'})}\,\mathbf{a}(\Theta_{k'}, M_x, M_z/2) \end{bmatrix}. \quad (17)$$
From (17), the first sub-array clearly forms a beam toward user k. Furthermore, the steering vector of the second sub-array is multiplied by the constant phase shift $e^{j2\pi M_z^{sa}\omega_z(\Theta_{k'})}$. In other words, the same phase shift is added to all antennas of this sub-array, whose individual phase weights already steer a beam toward $\Theta_{k'}$. Since a common phase shift does not change the beam direction, the second sub-array also forms a beam toward user k'.
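A URA counterpart of the previous check, assuming the same z-outer ordering $\mathbf{a} = \mathbf{a}_z \otimes \mathbf{a}_x$ with $\omega_x = 0.5 \sin\theta\cos\varphi$ and $\omega_z = 0.5 \sin\varphi$ (conventions chosen so that (16) holds; the paper's (2) may differ). Each sub-array has $M_x \times M_z/2$ antennas, and the whole second sub-array receives the single phase shift of (17). Names, array sizes, and user angles are hypothetical.

```python
import numpy as np


def ura_steering(theta, phi, mx, mz):
    """Assumed URA steering vector a = a_z kron a_x with
    omega_x = 0.5 sin(theta) cos(phi) and omega_z = 0.5 sin(phi)."""
    a_x = np.exp(1j * 2 * np.pi * np.arange(mx) * 0.5 * np.sin(theta) * np.cos(phi))
    a_z = np.exp(1j * 2 * np.pi * np.arange(mz) * 0.5 * np.sin(phi))
    return np.kron(a_z, a_x)


def split_beamformer_ura(user_k, user_kp, mx, mz):
    """Two sub-arrays of Mx x Mz/2 antennas: the first steers to user k, the second to user k'."""
    mz_sa = mz // 2
    theta_kp, phi_kp = user_kp
    # Common phase shift e^{j 2 pi Mz_sa omega_z(Theta_k')} applied to the whole second sub-array.
    shift = np.exp(1j * 2 * np.pi * mz_sa * 0.5 * np.sin(phi_kp))
    return np.concatenate([ura_steering(*user_k, mx, mz_sa),
                           shift * ura_steering(theta_kp, phi_kp, mx, mz_sa)])


MX, MZ = 16, 8
user_k = (np.deg2rad(25.0), np.deg2rad(10.0))      # (azimuth, elevation) of user k
user_kp = (np.deg2rad(-40.0), np.deg2rad(-15.0))   # (azimuth, elevation) of user k'
w = split_beamformer_ura(user_k, user_kp, MX, MZ)
for user in (user_k, user_kp):
    # Normalized gain toward each user: about 0.5, since each sub-array holds M/2 antennas.
    print(round(np.abs(np.vdot(ura_steering(*user, MX, MZ), w)) / (MX * MZ), 2))
```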
Once the number of sub-arrays and the numbers of horizontal and vertical antennas in each sub-array are defined, we can rewrite $\mathbf{a}(\Theta_{k'}, M_x, M_z)$ as we did in (17). Since the construction of $\mathbf{w}_g$ performing the BST is mainly determined by $\mathbf{a}(\Theta_{k'}, M_x, M_z)$, the BST can easily be extended to support more users per RF chain, with either identical or different sub-array configurations.

3) ILLUSTRATIVE REPRESENTATIONS
To illustrate the generation of two analog beams via the BST, Fig. 4 depicts, for a URA, the normalized array pattern responses of $\mathbf{w}_g$ in (17) at users k and k', together with that of a single beam pointed at user k. Compared to the single beam for user k, the maximum magnitude of the array response of $\mathbf{w}_g$ is halved because each of the two equal-size sub-arrays has only M/2 antennas. Moreover, the beams pointed at users k and k' become wider. In the following, we show that generating multiple analog beams via the BST enables the connectivity of all users in HBF-based mMIMO-NOMA systems. Indeed, MB-NOMA can serve users with distinct directions on each RF chain, whereas SB-NOMA can only group users with close directions.
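Fig. 4 is not reproduced in this text. As a numerical stand-in, the sketch below compares, for a ULA under the assumed convention $\omega = 0.5 \sin\theta$, the peak response and the half-power (3-dB) width, measured in spatial frequency, of a full-array beam and of one beam produced by the split beamformer. The peak drops from M to about M/2 and the width roughly doubles, consistent with the description above; M and the user angles are illustrative.

```python
import numpy as np


def steering(omega, m):
    """ULA steering vector in the spatial-frequency domain: entries e^{j 2 pi n omega}."""
    return np.exp(1j * 2 * np.pi * np.arange(m) * omega)


def pattern(w, omegas):
    """Array response |a(omega)^H w| for every spatial frequency in `omegas`."""
    a = np.exp(1j * 2 * np.pi * np.outer(omegas, np.arange(len(w))))  # row i is a(omegas[i])^T
    return np.abs(a.conj() @ w)


def half_power_width(resp, step):
    """Width (in spatial frequency) of the region where resp >= peak / sqrt(2)."""
    return np.count_nonzero(resp >= resp.max() / np.sqrt(2)) * step


M = 128
om_k = 0.5 * np.sin(np.deg2rad(20.0))    # user k
om_kp = 0.5 * np.sin(np.deg2rad(-30.0))  # user k'
single = steering(om_k, M)
split = np.concatenate([steering(om_k, M // 2),
                        np.exp(1j * 2 * np.pi * (M // 2) * om_kp) * steering(om_kp, M // 2)])

# Local window around user k: wide enough for both main lobes, far from user k'.
grid = np.linspace(om_k - 3.0 / M, om_k + 3.0 / M, 6001)
step = grid[1] - grid[0]
r_single, r_split = pattern(single, grid), pattern(split, grid)
ratio = half_power_width(r_split, step) / half_power_width(r_single, step)
print(r_single.max(), r_split.max(), ratio)  # roughly 128, 64 and 2
```

The width doubles because a sub-array with M/2 elements has half the aperture of the full array, and the half-power beamwidth scales inversely with aperture.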

B. SYSTEM SUM-RATE OF SDMA, SB-NOMA, AND MB-NOMA IN A 2-USER SCENARIO
Previous work on mmWave MB-NOMA [26], [27] requires full CSI to perform user clustering and does not exploit the potential benefits of SB-NOMA when users are close together. As discussed earlier, angular information is a promising form of partial CSI for mmWave channels. Therefore, in this subsection, we investigate the spatial behavior of SDMA, SB-NOMA, and MB-NOMA and answer the following question: how can we benefit from both SB-NOMA and MB-NOMA in the angle domain? Here, we consider a special case in which the BS serves only two users in a single-path environment. Thus, K = 2 and only one SB-NOMA or MB-NOMA group exists, i.e., G = 1. Denote by user 1 the strong user, i.e., $\gamma_{1,g} \geq \gamma_{2,g}$ when NOMA is applied. For SDMA, we consider two RF chains at the BS to serve the two users with one beam each.

1) SPACE DIVISION MULTIPLE ACCESS (SDMA)
We start by considering that the BS applies the SDMA technique, so that a single beam is pointed at each user. The SINR at user 1, $\mathrm{SINR}_1^{SD}$, is then given by (18) ($\mathrm{SINR}_2^{SD}$ is obtained by symmetry). We define $\beta_{1,2}(M_x, M_z)$ in (19) as the normalized spatial interference between the two users. Since $\eta = \frac{1}{KM} = \frac{1}{2M}$ with SDMA, the sum-rate $R_T^{SD}$ can be expressed as in (20).

2) SINGLE-BEAM NOMA (SB-NOMA)
The BS now applies the SB-NOMA technique. Hence, it forms a single beam between the two users, whose direction is calculated as in (8). Thus, $\mathrm{SINR}_1^{SB}$ and $\mathrm{SINR}_2^{SB}$ of the users in group g (= 1) are given by (21) and (22), respectively, where (a) follows since G = 1, i.e., $p_g = P_e$. Since $\eta = \frac{1}{MG} = \frac{1}{M}$ with SB-NOMA, the sum-rate $R_T^{SB}$ can be expressed as in (23).

3) MULTIPLE-BEAM NOMA (MB-NOMA)
The BS now applies the MB-NOMA technique via the BST. Hence, it generates two beams on the same RF chain, each directed toward the AoD of one user. The analog beamformer $\mathbf{w}_g$ (g = 1) is given by (15) and (17) for ULA and URA, respectively. Denote by $\psi(\Theta_k, M_x^{sa}, M_z^{sa})$ the additional phase shift applied at the second sub-array, given by
$$\psi(\Theta_k, M_x^{sa}, M_z^{sa}) = \begin{cases} e^{j2\pi M_x^{sa}\omega_x(\Theta_k)}, & \text{for ULA},\\ e^{j2\pi M_z^{sa}\omega_z(\Theta_k)}, & \text{for URA}. \end{cases} \quad (24)$$
Thus, for either ULA or URA, $\mathrm{SINR}_1^{MB}$ and $\mathrm{SINR}_2^{MB}$ can be expressed as in (25) and (26), respectively, which can be rewritten as (27) and (28). Since $\eta = \frac{1}{MG} = \frac{1}{M}$ with MB-NOMA, the sum-rate $R_T^{MB}$ is given by (30).
From (20), $R_T^{SD}$ is a decreasing function of $\beta_{1,2}$, as can also be seen in Fig. 5(b). In other words, when the users are very close to each other, i.e., when their angular difference $\Delta\Theta \to (0,0)$ and $\beta_{1,2} \to 1$, the system suffers from high inter-user interference and the SDMA performance degrades. For instance, for a large but finite number of antennas or in the high-SNR regime, $R_T^{SD}(\beta_{1,2} \to 1)$ can be approximated in closed form, and this approximation is confirmed in Fig. 5. Fig. 5 also shows that SDMA performs very well when the users are well separated in space. Indeed, with a large but finite value of M, $\beta_{1,2} \to 0$ for $\Delta\Theta \neq (0,0)$ [32], and $R_T^{SD}$ can then be approximated by its interference-free value.
From (23), $R_T^{SB}$ is limited by the $\beta_{k,g}$ term. Let $\Theta_1^{3dB}$ denote the 3-dB beamwidth of the beam pointed at user 1. From [24, Lemma 2], if user 2 lies in the main lobe of the beam pointed at user 1, i.e., $\Delta\Theta \leq \Theta_1^{3dB}/2$ and $\beta_{1,2} \geq \beta_{FSL} = 0.217$, then both users lie in the main lobe of the beam directed at $\Theta_g$. Therefore, SB-NOMA performs well and the system sum-rate is maximized. The more the users are separated, the more SB-NOMA degrades due to beam misalignment, as also observed in Fig. 5(b). With a large but finite value of M, the value of $R_T^{SB}$ attained as $\Delta\Theta \to (0,0)$ represents the highest sum-rate obtained with NOMA, as seen in Fig. 5. We also see that when $0 < \Delta\Theta \leq \Theta_1^{3dB}$, SB-NOMA outperforms MB-NOMA, since the two beams generated by the MB-NOMA scheme do not overlap well. Otherwise, MB-NOMA outperforms SB-NOMA and achieves a constant sum-rate, as given in Lemma 1.
Lemma 1: Using LSAAs with large but finite values of $M_x$ and $M_z$, MB-NOMA achieves the constant sum-rate $R_\infty^{MB}$ given in (32) in the 2-user scenario when the users' angular difference exceeds the 3-dB beamwidth $\Theta_1^{3dB}$ of the beam pointed at user 1.
Proof: Using LSAAs with large values of $M_x$ and $M_z$, (30) can be rewritten as (33). The second term in (33) can be approximated by $\log_2\left(1 + \frac{1-\gamma_{1,g}}{\gamma_{1,g}}\right)$ for a large but finite number of antennas. Thus, Lemma 1 is verified.
The corresponding value of $R_\infty^{MB}$ in (32) is approximately 14.7 bits/s/Hz when using a 128 × 1 ULA, and the simulation results in Fig. 5(a) verify Lemma 1.
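The approximated term in the proof can be simplified one step further, which makes explicit why it depends neither on the number of antennas nor on the users' angular separation. This short worked step uses only the 2-user normalization $\gamma_{1,g} + \gamma_{2,g} = 1$ implied by the NOMA power allocation constraint:

```latex
\log_2\!\left(1 + \frac{1-\gamma_{1,g}}{\gamma_{1,g}}\right)
  = \log_2\!\left(\frac{1}{\gamma_{1,g}}\right)
  = -\log_2 \gamma_{1,g}
  = \log_2\!\left(1 + \frac{\gamma_{2,g}}{\gamma_{1,g}}\right)
```

For instance, a hypothetical coefficient $\gamma_{1,g} = 0.25$ gives exactly 2 bits/s/Hz for this term. It is fixed by the intra-group power split alone, which is consistent with the constant sum-rate stated in Lemma 1.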
Remark 1: Based on the above analysis, we conclude that SB-NOMA is only suitable when users are very close. Otherwise, MB-NOMA achieves a significant constant sum-rate, independent of the users' angular separation, compared to SB-NOMA. These results allow us to extend the analysis to the more general multi-user scenario by first clustering users with high spatial interference into SB-NOMA groups and then clustering the remaining users into MB-NOMA groups.
Remark 2: SDMA exhibits significant sum-rate performance with LSAAs when the users are well separated in the angle domain. However, for HBF-based mMIMO-NOMA systems, SDMA fails to serve all users in overloaded scenarios, i.e., when $K > N_{RF}$, since it can accommodate only one user per RF chain. In contrast, MB-NOMA can serve multiple users with arbitrary AoDs on one RF chain while achieving a considerable system sum-rate.
Considering these observations, we propose in Section V a joint SB- and MB-NOMA scheme that requires only the users' angles. Additionally, we demonstrate in Section VI-C that combining the benefits of SB- and MB-NOMA is more advantageous than using either scheme alone.

IV. JOINT β-BASED SB-NOMA AND TDMA SCHEME
One possible solution to obtain more DoFs is TDMA, which can support more than $N_{RF}$ groups over different time slots. Inspired by [19], the joint β-based SB-NOMA and TDMA is a two-phase scheme that adopts TDMA only in overloaded scenarios. In the first phase, the multi-user β-UC algorithm proposed in [24] partitions the K users into $G_{SU}$ SU and $G_{SB}$ multi-user SB-NOMA groups. Only in an overloaded scenario, i.e., $G_{SU} + G_{SB} > N_{RF}$, is a second phase exploiting time-domain resources applied. Indeed, at most $N_{RF}$ groups can be served concurrently during each time slot. Hence, a group clustering algorithm is needed to schedule the $G_{SU} + G_{SB} > N_{RF}$ groups in the time domain. In [19], the authors design a two-stage group clustering algorithm to reduce the interference between groups served simultaneously. Specifically, they use the spatial angular distance as a metric of the corresponding inter-group interference and find that inter-group interference can be significantly reduced by increasing this distance. Note that this algorithm applies only to ULA, and the cluster angles belong to a predefined set of azimuth angles with a fixed search step size J = M. In this section, we modify this group clustering algorithm to reduce the inter-group interference measured by β in each time slot. Unlike the angular distance used in [19], β quantifies the level of spatial interference and captures both angular distance and beamwidth information [24]. It is therefore more accurate than the angular distance for calculating inter-group interference. Moreover, since β is built on the array factor, our β-based group clustering algorithm in the time domain can be applied to any array architecture.
Since $\beta_{g,g'} = \beta_{g',g}$, we first define the triangular matrix $\mathbf{B} \in \mathbb{C}^{G \times G}$ describing the spatial inter-group interference, whose nonzero (g, g')-elements, stored in one triangle only, are given by $\beta_{g,g'}$, the spatial interference between group g and group g', computed using the β metric defined in (5).
Before detailing the proposed scheme, we introduce the following notation and definitions:
• $N_{TS}$ is the total number of time slots and is given by $N_{TS} = \lceil (G_{SU} + G_{SB}) / N_{RF} \rceil$.
• $\mathcal{G}_{AV}$ is the set of all available group sets and is initialized as $\mathcal{G}_{AV} = \{\mathcal{G}^{TS}_{1}, \ldots, \mathcal{G}^{TS}_{N_{TS}}\}$. In hybrid systems, at most $N_{RF}$ groups can be scheduled simultaneously, i.e., $\mathrm{card}(\mathcal{G}^{TS}_{m}) \le N_{RF}$. Therefore, if $\mathrm{card}(\mathcal{G}^{TS}_{m}) < N_{RF}$, then $\mathcal{G}^{TS}_{m} \in \mathcal{G}_{AV}$; that is, $\mathcal{G}^{TS}_{m}$ is an available group set if at least one more group can be added to the m-th time slot.

Algorithm 1: β-based group clustering in the time domain
Stage 1: Clustering of groups with high inter-group spatial interference in different time slots
2: repeat
3:   Locate in $\mathbf{B}$ the 2 groups (e.g., groups o and r) having the largest spatial interference
4:   if $o \in \mathcal{G}_{NA}$ then
5:     if $r \in \mathcal{G}_{NA}$ and $t < N_{TS}$ then
       …
8:       $t = t + 1$, add group r to $\mathcal{G}^{TS}_{t}$, and remove group r from $\mathcal{G}_{NA}$
…
     Locate, at each time slot n, the group $\hat{i}_n$ having the highest spatial interference with group g, i.e., $\beta_{\hat{i}_n,g} = \max_{i \in \mathcal{G}^{TS}_{n}} \beta_{i,g}$
     …
     Add group g to $\mathcal{G}^{TS}_{n}$, and remove group g from $\mathcal{G}_{NA}$
17:  if $\mathrm{card}(\mathcal{G}^{TS}_{n}) \ge N_{RF}$ then
18:    remove $\mathcal{G}^{TS}_{n}$ from $\mathcal{G}_{AV}$
19:  end if
20: until $\mathcal{G}_{NA} = \emptyset$
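Algorithm 1 is only partially recoverable, so the following Python sketch is a simplified greedy reading of its two stages, not the authors' exact procedure. It assumes that a symmetric group-level interference matrix with entries $\beta_{g,g'}$ is already available; the function name, the stage-1 seeding rule, the stage-2 processing order, and the tie-breaking are illustrative assumptions.

```python
import math
import numpy as np

def beta_time_slot_clustering(beta: np.ndarray, n_rf: int) -> list[list[int]]:
    """Greedy sketch of two-stage, beta-based group clustering over time slots.

    beta : symmetric (G, G) matrix of group-level spatial interference (assumed given).
    n_rf : number of RF chains, i.e., maximum number of groups per time slot.
    Returns one list of group indices per time slot.
    """
    num_groups = beta.shape[0]
    n_ts = math.ceil(num_groups / n_rf)  # fewest slots able to host all groups
    slots: list[list[int]] = []
    unassigned = set(range(num_groups))

    # Stage 1: scan group pairs by decreasing interference (upper triangle, as in B)
    # and open a new slot for each not-yet-assigned group, so that strongly
    # interfering groups start in different time slots.
    rows, cols = np.triu_indices(num_groups, k=1)
    order = np.argsort(-beta[rows, cols], kind="stable")
    for idx in order:
        for g in (int(rows[idx]), int(cols[idx])):
            if len(slots) < n_ts and g in unassigned:
                slots.append([g])
                unassigned.remove(g)
        if len(slots) == n_ts:
            break
    while len(slots) < n_ts:  # e.g., a single group: no pairs to scan
        g = min(unassigned)
        slots.append([g])
        unassigned.remove(g)

    # Stage 2: place each remaining group in the available slot (fewer than n_rf
    # groups) whose strongest interferer, max_i beta[i, g], is weakest.
    off_diag = beta - np.diag(np.diag(beta))
    for g in sorted(unassigned, key=lambda x: -off_diag[x].max()):
        available = [n for n, members in enumerate(slots) if len(members) < n_rf]
        best = min(available, key=lambda n: max(beta[i, g] for i in slots[n]))
        slots[best].append(g)
    return slots
```

For instance, with four groups forming two strongly interfering pairs and $N_{RF} = 2$, the sketch places the two members of each pair in different time slots.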
In the first stage, the $N_{TS}$ groups with the highest spatial interference β are scheduled in different time slots. In the second stage, the remaining groups are clustered so as to reduce the inter-group interference in each time slot. Further details of the β-based group clustering algorithm in the time domain are given in Algorithm 1. In Section VI, we show that the joint β-based SB-NOMA and TDMA scheme overcomes the limitation of SB-NOMA in overloaded scenarios and achieves a considerable sum-rate. However, synchronization is a major hurdle for implementing TDMA in mmWave hybrid systems, as fast and precise timing is essential for attaining high data rates. Since combining TDMA with NOMA introduces temporal synchronization complexity, a scheme that grants more DoFs using NOMA alone, without other multiple access (MA) techniques, is needed. To that end, the following section assesses the potential of MB-NOMA, which allows users within the same group to be served regardless of their spatial distribution.

V. JOINT SB- AND MB-NOMA SCHEME
To provide more DoFs for SB-NOMA in an overloaded scenario without relying on any additional MA technique, we propose a scheme based solely on NOMA. Recall from Remark 1 that SB-NOMA is suitable only for users that are close in space, whereas MB-NOMA offers a significant sum-rate regardless of the users' spatial distribution. Building on these findings, we combine the benefits of SB- and MB-NOMA to overcome the limitation of SB-NOMA in overloaded scenarios. First, as in the joint SB-NOMA and TDMA scheme, users with high spatial interference are clustered into the same SB-NOMA group using the multi-user β-UC algorithm of [24]. If $G_{SU} + G_{SB} > N_{RF}$, the remaining users are then partitioned into two-user MB-NOMA groups and SU groups such that the total number of groups, $G_f = G_{SB} + G_{SU} + G_{MB}$, equals $N_{RF}$. Since $G_f$ must not exceed $N_{RF}$ for all users to be served simultaneously, the largest admissible value, $N_{RF}$, is selected, as it generally yields a higher sum-rate: serving more users through SDMA, with its high BF gain, reduces interference and enhances the system sum-rate. In the proposed joint SB- and MB-NOMA scheme, the transmit RF beamformer $\mathbf{w}_g$ of group g depends on the group type and is given by (35). Fig. 6 depicts the system model of the proposed scheme.
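The group count of the joint scheme can be made explicit with a short worked equation. Assuming, as the text suggests, that only users of SU groups are paired, and writing $G_{SU}^{(0)}$ (hypothetical notation) for the number of SU groups produced by the β-UC phase, each 2-user MB-NOMA group removes two SU groups and adds one group:

$$
\begin{aligned}
G_f &= G_{SB} + \bigl(G_{SU}^{(0)} - 2G_{MB}\bigr) + G_{MB} = G_{SB} + G_{SU}^{(0)} - G_{MB} = N_{RF}\\
\Rightarrow\; G_{MB} &= G_{SB} + G_{SU}^{(0)} - N_{RF}, \qquad G_{SU} = G_{SU}^{(0)} - 2G_{MB}.
\end{aligned}
$$

The final number of SU groups is nonnegative only if $G_{SU}^{(0)} + 2G_{SB} \le 2N_{RF}$. As a purely illustrative example, with $N_{RF} = 20$, $G_{SB} = 5$, and $G_{SU}^{(0)} = 28$, one obtains $G_{MB} = 13$ and $G_{SU} = 2$, so that $G_f = 5 + 2 + 13 = 20$.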
To the best of our knowledge, this is the first work that jointly leverages the potential of SB- and MB-NOMA. Moreover, the scheme is applicable to both linear and rectangular antenna arrays and requires only the users' AoDs. We also develop a low-complexity MB-NOMA UC technique that relies only on angular information. As shown in Section III-B, once all users with high spatial interference have been grouped into SB-NOMA groups, the way two users are paired into an MB-NOMA group based on their AoDs is not critical for the sum-rate. We therefore first sort the $G_{SU}$ SU users in ascending order of their azimuth AoD θ, for either ULA or URA. We then adopt three selection strategies, denoted UC1, UC2, and UC3 and illustrated in Fig. 7, which pick users pairwise from the sorted array and cluster them into MB-NOMA groups until all $G_{MB}$ MB-NOMA groups are formed.
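Fig. 7 is not reproduced here, so the exact UC1, UC2, and UC3 rules are unknown. The following Python sketch only illustrates the common mechanism described in the text: sort the SU users by ascending azimuth AoD and pick them pairwise from the sorted array. The two pairing rules ("adjacent" and "split"), the choice of which users get paired, and all names are hypothetical and are not claimed to match UC1 to UC3.

```python
import numpy as np

def pair_users_by_azimuth(azimuths_deg, num_mb_groups, rule="adjacent"):
    """Hypothetical pairing of SU users into 2-user MB-NOMA groups.

    Users are sorted by ascending azimuth AoD (as described in the text); the
    first 2 * num_mb_groups users of the sorted array are paired according to
    `rule`, and the rest stay in SU groups. Neither rule is claimed to be UC1-UC3.
    Returns (mb_groups, remaining_su_users) as user indices.
    """
    order = [int(i) for i in np.argsort(azimuths_deg, kind="stable")]
    n_paired = 2 * num_mb_groups
    if n_paired > len(order):
        raise ValueError("not enough SU users for the requested MB-NOMA groups")
    chosen, rest = order[:n_paired], order[n_paired:]
    if rule == "adjacent":
        # Neighbours in the sorted array: 1st with 2nd, 3rd with 4th, ...
        groups = [(chosen[2 * i], chosen[2 * i + 1]) for i in range(num_mb_groups)]
    elif rule == "split":
        # i-th sorted user with the (i + num_mb_groups)-th: typically wider spacing
        groups = [(chosen[i], chosen[i + num_mb_groups]) for i in range(num_mb_groups)]
    else:
        raise ValueError(f"unknown rule: {rule}")
    return groups, rest

groups, remaining = pair_users_by_azimuth([30.0, -10.0, 50.0, 0.0, 20.0], num_mb_groups=2)
```

Comparing such rules against random pairing, as done in Fig. 8, is then a matter of feeding each resulting grouping to the same sum-rate evaluation.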
We now compare the sum-rates of these strategies to identify the best one. Fig. 8 plots the sum-rate of the three MB-NOMA UC strategies and that of random MB-NOMA UC, in which users are assigned to MB-NOMA groups at random. We assume a ULA with $N_{BS} = 128$ antennas. As expected, for $K \le N_{RF}$, the K users are spatially separated thanks to the high BF gain of LSAAs, so MB-NOMA is not needed and the selection strategy has no impact. Otherwise, MB-NOMA is used to offer more DoFs, and UC1 outperforms the other strategies owing to the lower inter-group interference it achieves compared with UC2 and UC3. UC1 is therefore adopted as the reference in Section VI.
$$
\mathbf{w}_g =
\begin{cases}
\ldots & \text{if SB-NOMA},\\
\text{computed as in (15) and (17)} & \text{if MB-NOMA with } \mathcal{S}_g = \{k, k'\},\\
\mathbf{a}\left(\ldots_{1,k}, M_x, M_z\right) & \text{if SU with } \mathcal{S}_g = \{k\}.
\end{cases}
\tag{35}
$$

Both SB- and MB-NOMA employ the same power allocation and user-ordering techniques at the BS and the same SIC decoding strategy at the receivers; they differ mainly in their analog beamforming design. Whereas SB-NOMA generates a single beam shared by the NOMA users, MB-NOMA creates a separate beam for each user. Adding MB-NOMA to SB-NOMA allows the BS to serve more users with NOMA, at the cost of additional complexity such as SIC decoding at the receivers. Nonetheless, this scheme lets users share the same time-frequency resources without requiring fast and accurate synchronization.

VI. ILLUSTRATIVE RESULTS AND DISCUSSIONS
In this section, we numerically evaluate the performance of the proposed schemes for mmWave hybrid systems. Specifically, we consider a rural environment using the statistical mmWave channel model and simulator NYUSIM. In Table 1, we summarize the specific values of the adopted simulation parameters.
To illustrate the effectiveness of our proposed schemes, we adopt three baseline schemes: two apply fully DBF (thus requiring $N^{D}_{RF} = M$ RF chains), and the third uses HBF with $N^{H}_{RF} \le M$. The following acronyms are used to refer to the different schemes.
• DBF: denotes the scheme considered in [33], where the BS generates K directive beams, each steered toward the AoD of its intended user.
• DBF-SB-NOMA: denotes the scheme proposed in [24], where the β-UC algorithm clusters the users into SU and multi-user SB-NOMA groups.
• HBF-SB-NOMA-TDMA: denotes the scheme proposed in [19]. For this scheme, we select $N^{H}_{RF}$ groups and prioritize the SB-NOMA groups to maximize the number of users handled.
For a fair comparison, the different schemes apply our angle-domain user ordering and power allocation strategies designed in [24]. Therefore, all of them only require the users' AoDs.

A. 2D HBF-BASED MIMO-NOMA PERFORMANCE
The BS adopts a ULA with M = 128 transmit antennas to serve $2 \le K \le 2N^{H}_{RF}$ users. While fully DBF requires $N^{D}_{RF} = M = 128$ RF chains, we assume $N^{H}_{RF} = 20$ RF chains for HBF. Fig. 9 plots the sum-rate and the number of active RF chains per time slot, $N^{a}_{RF}$, versus the number of users for the aforementioned schemes. Fig. 10 plots the probability $P_{os} = P(G_{SU} + G_{SB} > N^{H}_{RF})$ that the cell is overloaded when SB-NOMA is used in hybrid LSAA systems.
For a sparse cell with $K \le N^{H}_{RF}$, the hybrid schemes are the preferred MIMO-NOMA solution, as they achieve the same performance as DBF-SB-NOMA with much lower complexity, cost, and power consumption. This is evident from Fig. 9(a), where the HBF-SB-NOMA, HBF-SMB-NOMA, and β-HBF-SB-NOMA-TDMA curves merge with the DBF-SB-NOMA curve, indicating that the cell is not yet overloaded; this is confirmed by Fig. 10, where $P_{os} = 0$ for $K \le 20$. When the number of users exceeds the number of available RF chains, the cell begins to be overloaded owing to the very narrow beams of LSAAs, as seen in Fig. 10. In this scenario, HBF-SB-NOMA fails to provide full connectivity, even though it outperforms the other HBF schemes in terms of sum-rate. To maximize the number of users handled, HBF-SB-NOMA selects $N^{H}_{RF}$ groups and prioritizes the SB-NOMA groups. Fig. 9(a) shows that its system sum-rate degrades slightly for $K \ge N^{H}_{RF}$. Indeed, as more users are present in the cell, they are more likely to be clustered into the same NOMA groups. This is also illustrated in Fig. 11, which shows the percentage distribution of groups when the BS adopts HBF-SB-NOMA for different values of K: as K increases, both the number of NOMA groups and the number of users per NOMA group increase. However, at most $N^{H}_{RF}$ groups can be served simultaneously in hybrid schemes.
Therefore, as K grows, the number of groups remains at $N^{H}_{RF}$, but the number of users served in each group increases, leading to increased interference. This can be observed in the slight degradation of the HBF-SB-NOMA sum-rate between K = 25 and K = 40.

3. In [19], the authors propose a simple AD UC strategy in which UEs with the same estimated angle belong to the same group. To offer new DoFs, they develop a group clustering algorithm in the time domain that schedules the groups with small angular distances in distinct time slots. Note that the estimated angles belong to a predefined azimuth angle set with a fixed search step size J = M [19].
To ensure full connectivity for all users and maintain fairness among them, we apply other approaches, namely TDMA and MB-NOMA, which compensate for the limited number of RF chains. As seen in Fig. 9, they achieve good sum-rate performance compared with DBF while using 6.4 to 64 times fewer active RF chains ($6.4 \le M/N^{a}_{RF} \le 64$, i.e., between 2 and 20 active RF chains versus M = 128). In the following, we separately analyze the evolution of their sum-rate curves for $N^{H}_{RF} < K \le 2N^{H}_{RF}$.

1) PERFORMANCE OF β-HBF-SB-NOMA-TDMA
As seen from Fig. 9(a), for $N^{H}_{RF} < K \le 2N^{H}_{RF}$, the sum-rate of β-HBF-SB-NOMA-TDMA first decreases and then increases with the number of users K. Indeed, in this regime, TDMA is used to serve the SU and SB-NOMA groups in different time slots to achieve full connectivity. Scheduling the users in separate time slots reduces both the per-slot system sum-rate and the number of active RF chains, as seen in Fig. 9(b). Then, as more users connect to the BS, the sum-rate increases again, as does the number of active RF chains. The evolution of the sum-rate therefore matches that of the number of active RF chains in Fig. 9(b).
Furthermore, Fig. 9(a) shows that the β-HBF-SB-NOMA-TDMA scheme outperforms the scheme proposed in [19]. For example, for $K = 2N^{H}_{RF} = 40$ users, β-HBF-SB-NOMA-TDMA achieves a sum-rate gain of up to 83% over HBF-SB-NOMA-TDMA [19]. This significant gain reflects the effectiveness of our proposed user clustering (into SB-NOMA groups) and group clustering (into time slots) algorithms compared with those of [19]. Moreover, whereas the scheme in [19] is designed only for ULA, our β-HBF-SB-NOMA-TDMA scheme relies on β, which can be applied to any array architecture. The performance with URA is evaluated in Section VI-B.

2) PERFORMANCE OF HBF-SMB-NOMA
HBF-SMB-NOMA is proposed to provide more DoFs using only NOMA. As seen in Fig. 9, it outperforms DBF for K < 28 while using 6.4 to 64 times fewer active RF chains ($6.4 \le M/N^{a}_{RF} \le 64$). Beyond that, DBF (with or without NOMA) achieves a higher sum-rate, at the expense of higher cost and more complex processing. Indeed, $N^{a}_{RF} \le N^{H}_{RF} = 20$ for HBF-SMB-NOMA, whereas it always equals M = 128 for DBF (with or without NOMA).
As shown in Fig. 9(a), the sum-rate of HBF-SMB-NOMA increases up to K = 26 and then decreases. This trend follows that of the number of active RF chains, $N^{a}_{RF}$, in Fig. 9(b): once the cell reaches K = 26 users, $N^{a}_{RF}$ saturates at $N^{H}_{RF}$. To keep $G_f = N^{H}_{RF}$ in the congested cell, the number of MB-NOMA groups must increase and the number of SU groups must decrease. This, however, degrades performance for 26 < K ≤ 40, as seen in Fig. 9, because the BF gain at each user within an MB-NOMA group is halved compared with that of the single beam in an SU group, as depicted in Fig. 4. Even with a basic antenna allocation, in which the array is split into two equal-size sub-arrays, and a straightforward MB-NOMA user clustering strategy, HBF-SMB-NOMA achieves a promising sum-rate with few active RF chains compared with digital methods. Future work should focus on angle-domain user clustering and antenna allocation methods that maximize each user's BF gain relative to the other users and minimize inter-group interference, in order to optimize the system sum-rate.
For a better comparison between HBF-SB-NOMA and HBF-SMB-NOMA, Fig. 12 plots the cumulative distribution function (CDF) of the user rate, i.e., the probability P(rate ≤ δ) that a user's rate does not exceed a threshold δ, for K = 25 and K = 40. HBF-SB-NOMA fails to serve all users, leaving 4% and 25% of them unserved for K = 25 and K = 40, respectively, whereas HBF-SMB-NOMA offers full connectivity with P(rate = 0) = 0. However, HBF-SB-NOMA provides higher rates to the users it serves than HBF-SMB-NOMA, since the latter serves more users, which increases inter-user interference. Furthermore, the BF gain with MB-NOMA is halved compared with SB-NOMA. For example, for K = 40, the probability that HBF-SMB-NOMA provides a rate greater than 6 bps/Hz is 7%, versus 18% for HBF-SB-NOMA. Consequently, although HBF-SB-NOMA offers a higher system sum-rate, as also seen in Fig. 9(a), HBF-SMB-NOMA serves all users and ensures better fairness among them.
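The rate statistics quoted above (the unserved fraction P(rate = 0) and the probability of exceeding 6 bps/Hz) are read off an empirical CDF. The short Python sketch below shows how such quantities could be computed from a vector of per-user rates; the function name, the convention that unserved users are recorded with rate 0, and the toy data are assumptions for illustration only, not results from this paper.

```python
import numpy as np

def rate_statistics(user_rates, delta):
    """Empirical statistics of per-user rates (bps/Hz) over all users in the cell.

    Returns (P(rate <= delta), P(rate == 0), P(rate > delta)); unserved users
    are assumed to be recorded with a rate of exactly 0.
    """
    rates = np.asarray(user_rates, dtype=float)
    cdf_at_delta = float(np.mean(rates <= delta))
    unserved = float(np.mean(rates == 0.0))
    return cdf_at_delta, unserved, 1.0 - cdf_at_delta

# Toy per-user rates: two unserved users out of eight.
toy_rates = [0.0, 0.0, 2.0, 4.0, 7.0, 8.0, 1.0, 3.0]
cdf6, unserved, above6 = rate_statistics(toy_rates, delta=6.0)
```

Sweeping `delta` over a grid of thresholds yields the full CDF curve of the kind plotted in Fig. 12.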
Moreover, HBF-SMB-NOMA brings a notable sum-rate gain over HBF-SB-NOMA-TDMA [19]: for K = 25 and K = 40, the gain is nearly 99% and 66.6%, respectively. Additionally, for $N^{H}_{RF} < K < 35$, HBF-SMB-NOMA outperforms our proposed β-HBF-SB-NOMA-TDMA scheme. For 35 < K < 40, β-HBF-SB-NOMA-TDMA achieves slightly better performance, but at the cost of requiring fast and accurate synchronization at both the BS and the receivers.

B. 3D HBF-BASED MIMO-NOMA PERFORMANCE
Owing to the physical size constraints of massive antenna arrays, URA is more practical than ULA for mMIMO [31]. In this section, we consider a 64 × 8 URA at the BS, so DBF requires $N^{D}_{RF} = 512$ RF chains, whereas HBF uses only $N^{H}_{RF} = 20$ RF chains. Fig. 13 plots the sum-rate of the different schemes versus the number of users. The TDMA-based scheme in [19] is specifically tailored for ULA and is therefore not included in this section; instead, we consider our proposed TDMA-based scheme for comparison.
The sum-rate curves for URA and ULA in Figs. 13 and 9 have a similar overall shape. However, the number of users $K_i$ at which the DBF curves separate from the HBF curves differs: $K_i = 20$ for the 128 × 1 ULA and $K_i = 22$ for the 64 × 8 URA. This shows that, with URA, more users can be clustered into SB-NOMA groups even though the total number of antennas is four times that of the ULA. Indeed, the 3D beamwidths in azimuth and elevation are determined by the numbers of horizontal and vertical antennas, respectively, both of which are smaller than the number of antennas in the ULA; the resulting beams are wider, so more users fall within the same beam.
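The beamwidth argument above can be checked with a standard separable URA model. The following Python sketch assumes half-wavelength spacing and writes the URA response as the Kronecker product of horizontal and vertical ULA responses indexed by direction-cosine terms u and v; this convention and the helper names are chosen here for illustration and are not taken from the paper. For two users at the same elevation, the vertical factor equals 1, so the 64 × 8 URA behaves like a 64-element ULA in the horizontal dimension.

```python
import numpy as np

def ula_vec(u: float, n: int) -> np.ndarray:
    """Half-wavelength ULA response for a direction-cosine term u (assumed convention)."""
    return np.exp(1j * np.pi * np.arange(n) * u)

def ura_vec(u: float, v: float, mx: int, mz: int) -> np.ndarray:
    """Separable URA response: Kronecker product of horizontal and vertical ULA responses."""
    return np.kron(ula_vec(u, mx), ula_vec(v, mz))

def norm_af(x: np.ndarray, y: np.ndarray) -> float:
    """Normalized array factor |x^H y| / N between two array responses of length N."""
    return float(np.abs(np.vdot(x, y)) / x.size)

# Two users at the same elevation (equal v), separated only in the horizontal term u.
u1, u2, v = 0.0, 0.01, 0.0
af_ura = norm_af(ura_vec(u1, v, 64, 8), ura_vec(u2, v, 64, 8))  # 64 x 8 URA (512 elements)
af_ula = norm_af(ula_vec(u1, 128), ula_vec(u2, 128))            # 128 x 1 ULA
# af_ura > af_ula: the URA's horizontal beam (64 elements) is wider than the ULA's (128).
```

Because the normalized array factor of the URA factorizes into a horizontal and a vertical term, the users' coupling in azimuth is governed by the 64 horizontal elements rather than by all 512 elements.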

C. JOINT SB- AND MB-NOMA VERSUS ONLY MB-NOMA
We now verify the benefit of combining SB- and MB-NOMA compared with frameworks that use only MB-NOMA. Fig. 5 shows that when both users are in each other's main lobe, SB-NOMA performs slightly better than MB-NOMA; otherwise, MB-NOMA performs better and achieves a constant sum-rate. Note that in a 2-user scenario, inter-beam interference has no impact on the system sum-rate. However, SB-NOMA forms a beam using the whole array, whereas with the BST, the steered beam of MB-NOMA is the superposition of two wider beams. Hence, when the users are in each other's main lobe, the beam generated with MB-NOMA is wider than that with SB-NOMA. Therefore, in this case, SB-NOMA outperforms MB-NOMA thanks to its narrower beams, which increase the BF gain and, in multi-group scenarios, reduce inter-group interference.
To illustrate this, we compare HBF-SMB-NOMA with HBF-MB-NOMA, which applies only MB-NOMA without SB-NOMA, as in [26]. We assume that the BS adopts a 16 × 8, 32 × 8, or 64 × 8 URA with 20 RF chains. For HBF-MB-NOMA, since only 2-user MB-NOMA groups are considered, we apply the 2-user β-UC algorithm [23] in the first phase. Fig. 14 plots the average sum-rate of the two schemes versus the number of users K. HBF-SMB-NOMA outperforms HBF-MB-NOMA in terms of average sum-rate, and the gap between them increases with K until it peaks at $K = K_a$, which depends on the antenna array configuration: $K_a$ = 32, 27, and 25 for the 16 × 8, 32 × 8, and 64 × 8 URA, respectively. Indeed, as the number of users in the cell increases, they are more likely to be located near each other, so the gap grows with K until $K_a$, at which point the NOMA groups formed in the first phase saturate. These results confirm the benefit of using both SB- and MB-NOMA instead of MB-NOMA alone to exploit NOMA multi-user diversity.

VII. SUMMARY
This paper considers HBF-based mMIMO-NOMA systems at mmWave frequencies. In particular, we address the limitation of SB-NOMA in exploiting multi-user diversity in mmWave hybrid systems with LSAAs, and we propose two schemes that offer additional DoFs. Unlike existing works, we leverage the directionality of mmWave channels and use only angular information. We consider 2D and 3D systems using ULA and URA architectures, respectively. The first scheme adopts TDMA, and the second leverages the potential of MB-NOMA. Simulation results show that both yield significant sum-rate gains compared with the solution proposed in [19] and with other schemes based on fully DBF. For instance, the proposed TDMA-based scheme achieves a sum-rate gain of up to 83% over the TDMA-based scheme of [19]. Furthermore, the results demonstrate the effectiveness of the proposed joint SB- and MB-NOMA scheme in providing more DoFs without additional multiple access techniques, and verify its superiority over schemes that use only MB-NOMA. However, the spectral and energy efficiencies of the proposed schemes are not optimized. An angle-domain resource allocation method, including antenna selection, user clustering, and power allocation, is left for future work. It would also be interesting to consider more practical HBF structures, such as sub-connected HBF, which is known to consume less power than fully-connected HBF and enables more energy-efficient designs. Finally, given the potential of reconfigurable intelligent surfaces (RIS) to extend coverage in mmWave communications, which has attracted considerable attention, it is of particular interest to examine our proposed schemes in the presence of RIS, as done in [34].