Hybrid Beamformer Exploiting Multistream per User Transmission for Millimeter-Wave NOMA Communications

This treatise investigates multistream hybrid beamforming (HBF) for a millimeter-wave non-orthogonal multiple access (NOMA) system deployed in the downlink of an urban microcell environment. A maximization problem is formulated to optimize the sum-rate. To ensure high correlation among the users' channels, user clustering and ordering are based on the angles of arrival and the users' channel weights, respectively. The optimized analog combiner of each user is obtained from the first $N_{s}$ column vectors of the left unitary matrix derived from the singular value decomposition of that user's channel. The analog beamformer is matched to the phase of the strong users' composite intermediate analog channel to maximize the beamforming gain. Both the analog combiner and the precoder for the sub-connected structure (SCS) are formulated via a novel dominant sub-array matrix element extractor. Conventional zero-forcing processing is adopted for digital precoding. Memory space complexity is evaluated to corroborate the low computational complexity of the proposed schemes. Simulation results show that HBF-NOMA attains a higher sum-rate than HBF-orthogonal multiple access and conventional multiuser schemes in a line-of-sight link. Lastly, the proposed SCS-HBF-NOMA precoding scheme outperforms its fully connected counterpart in terms of energy efficiency.


I. INTRODUCTION
Rapidly growing demand for high data-rate mobile wireless applications such as online gaming, telemedicine, electronic learning, and video conferencing creates a bottleneck in the sub-6 [GHz] spectrum owing to its inherently scarce bandwidth. Therefore, a paradigm shift from the existing spectrum band to the millimeter-wave (mmW) band ranging from 30 [GHz] to 300 [GHz] is inevitable to support high data-rate traffic in the order of Gbps for post-5G wireless communications. However, mmW propagation suffers from path loss fading, rain absorption, and shadowing in line of sight (LOS). The small wavelength of mmW allows a large number of antennas to be packed into a small area of a wireless device circuit, which helps to combat the effect of path loss fading. Deploying massive antennas for multiple input multiple output (MIMO) with fully digital beamforming (DBF) in future wireless systems will lead to 1) high radio frequency (RF) chain complexity, 2) higher power consumption, and 3) high implementation cost. Hence, hybrid (digital and analog) beamforming (HBF) has been proposed as an appropriate beamforming (BF) approach for massive MIMO.
The associate editor coordinating the review of this manuscript and approving it for publication was Ding Xu.
The two major HBF structures, named fully connected structure (FCS) and sub-connected structure (SCS), are distinguished by the way their RF chains are linked to the antennas. Specifically, in FCS-HBF, each RF chain is linked to all antennas via a set of phase shifters (PSs) [1], while in SCS-HBF each RF chain at the output of the baseband precoder is mapped to one group of sub-array antennas via the corresponding set of PSs [2], [3]. Consequently, FCS-HBF can attain full BF gain in comparison to SCS-HBF because each of its RF chains is associated with all antennas. The existing works have studied both structures for single user single-carrier and multi-carrier cases in [3]-[8] and [8]-[10], respectively. Moreover, multiuser systems have also been studied for large-scale user connectivity [11]-[14]. Specifically, the authors in [14] investigated multistream FCS-HBF for conventional multiuser systems in order to further improve the system sum-rate. However, the multistream FCS-HBF structure increases hardware complexity and consumes more power than its single stream counterpart.
Moreover, the base station (BS) of a traditional multiuser system can only serve a maximum number of users equal to its number of RF chains. Non-orthogonal multiple access (NOMA) was suggested in the literature to address this limitation. Power domain NOMA has received much attention among researchers [15] owing to its numerous advantages elucidated in [16]. Power domain NOMA leverages superposition of the users' symbols at the BS before transmission and successive interference cancellation (SIC) of the feeble users' symbols by the strong users at the receiver [17]. Mitigating the intracluster and intercluster interference while obtaining the optimal power allocation for multiuser NOMA systems has been a major concern among researchers over the past decade. Therefore, a proper BF scheme is crucial for removing intercluster interference, and an efficient SIC scheme is crucial for eliminating intracluster interference in NOMA systems.
The study of FCS-HBF-NOMA exploiting single stream per user transmission is well known in the literature [1] owing to its full BF gain, which can boost the transmit signal to the cluster users for effective communications. Similarly, SCS-HBF was first adopted for NOMA with a single stream per user configuration in [18]. The authors carried out their study using the simplistic Saleh-Valenzuela channel model, which is not realistic because its channel gain is not based on practical measurements, unlike the mmW channel model of New York University (NYU). Hence, the authors in [19] analyzed the HBF-NOMA performance based on the NYU mmW channel and a fixed power allocation scheme, wherein SCS-HBF-NOMA built on the SIC zero forcing (ZF) scheme manifested a wider performance gap and higher computational complexity compared to FCS-HBF-NOMA. Furthermore, the extension to the multistream HBF-NOMA scenario was not detailed in [19]. Moreover, the multistream HBF structure exploited in [20], which leveraged an RF chain diversity scheme for single stream HBF-NOMA communications, is simple because its signal processing did not involve a combiner. Different from the existing works, this treatise proposes a simple but feasible multistream phase-zero forcing (P-ZF) SCS-HBF on the basis of a unique dominant sub-array matrix element extractor (DSMEE), in conjunction with a multistream FCS-HBF design solution, for mmW NOMA communications in a line of sight (LOS) link of a typical urban micro (UMi) cell environment. We focus on the LOS link because, in a small cell configuration, the mmW signal travels by LOS [21], which favors the exploitation of NOMA communications. Nevertheless, investigating the system's performance in a complete non-line of sight (NLOS) configuration is also significant for gaining insight into the performance degradation arising from blockage of the LOS link by physical obstacles such as foliage, buildings, and mountains.
The significant contributions of this treatise are recapitulated as follows:
1) We develop a multistream fully connected HBF structure articulating an explicit relationship between each RF chain at the output of the digital beamformer (DB) and the corresponding antennas via the analog beamformer (AB) components. Based on the developed multistream FCS-HBF structure and the existing multistream SCS-HBF structure [19], the hardware complexity arising from the number of PSs incorporated in the multistream AB unit of the HBF structure can be visualized, which was not revealed in [19]. Hence, our analyses of both the system model formulation and the sum-rate maximization of the proposed scheme are carried out based on the multistream HBF structures, different from the single stream based analyses in [19].
2) In order to design an energy-efficient multistream HBF-NOMA system characterized by an SCS at both the BS and the users, each user's analog combiner (AC) is optimized on the basis of singular value decomposition (SVD) [22], after which low complexity phased and ZF aided block diagonalization schemes [13] are employed to optimize the (multistream) AB and DB of the SCS-HBF-NOMA system, respectively. Explicitly, each user's AC and the corresponding AB vectors are block diagonalized exploiting a novel DSMEE method.
3) For the purpose of benchmarking the SIC-ZF based SCS-HBF-NOMA and the optimal P-ZF FCS-HBF-NOMA in [19] against our proposed P-ZF based SCS-HBF-NOMA scheme, a simple fixed power domain NOMA is adopted.
4) Finally, the computational complexities of the proposed multistream precoding and combining schemes are evaluated in terms of memory space complexity under the fixed power domain NOMA assumption.
The rest of this treatise is structured as follows. The methodology used to formulate the system model of the proposed downlink multiuser mmW MIMO-HBF-NOMA is elucidated in Section II. A maximization problem is developed to optimize the sum-rate in Section III. Moreover, the design solutions proposed for the analog combiners (ACs) as well as the analog and digital precoders are thoroughly discussed. Also, the power allocation scheme is recapitulated. In Section IV, results obtained from simulation are presented to corroborate our findings with regard to the achievable energy efficiency (EE) and sum-rate. Finally, the paper is concluded in Section V.
Notation: We use lower-case and upper-case boldface letters to represent vectors and matrices, respectively. $(\cdot)^{H}$, $(\cdot)^{T}$, $(\cdot)^{-1}$, $\mathrm{tr}(\cdot)$, $|\cdot|$, and $\|\cdot\|_{F}$ designate the conjugate transpose, transpose, matrix inversion, trace of a matrix, amplitude, and Frobenius norm, respectively. $\mathbb{E}(\cdot)$ represents the expectation. Also, $\mathbf{a}(i)$, $\mathbf{A}(i)$, $\mathbf{A}(i,i)$, and $\mathbf{A}(:,j)$ represent vector $\mathbf{a}$ having the $i$th row, every element, the diagonal elements, and all rows' elements in the $j$th column of matrix $\mathbf{A}$, respectively. $\mathcal{CN}(\psi, \cdot)$ represents the Gaussian distribution having mean $\psi$ and a variance given by its second argument. Finally, $\mathbf{I}_{N N_s}$ represents an $N N_s \times N N_s$ identity matrix, where $N$ and $N_s$ denote the number of clusters and that of transmitted symbol streams, respectively.
VOLUME 10, 2022

II. METHODOLOGY
The specification of the proposed single-carrier downlink multiuser mmW massive MIMO-NOMA system is depicted in Fig. 1, where the transmitter (BS) is equipped with $N_t$ transmit antennas and $N^{t}_{RF}$ RF chains. The BS communicates with each of the $M$ users per cluster through the mmW channel $\mathbf{H}$. As evident for user $m$ in Fig. 1, each user is equipped with $N_r \geq N_s$ receive antennas and $N^{r}_{RF}$ $(= N_s)$ RF chains to support $N_s$ data streams. On the other hand, the BS employs either FCS-HBF or SCS-HBF as shown in Fig. 1, with $N^{t}_{RF} = N \cdot N_s$, where $N$ and $N_s$ denote the number of clusters and that of data streams, respectively. Furthermore, the AB comprises $N_t \cdot N \cdot N_s$ and $N_t$ PSs for FCS and SCS, respectively. As evident in Fig. 2 for FCS-HBF, the $n$th group of the $N \cdot N_s$ RF chains connects to all $N_t$ antennas via the $N \cdot N_s$ sets of PSs at the BS, whereas the corresponding SCS-HBF in Fig. 3 shows that the $n$th group of the $N \cdot N_s$ RF chains connects to the $M N_s$ sets of PSs, where $M = N_t / N^{t}_{RF}$ denotes the number of sub-array antennas. Due to the limited number of RF chains in an HBF system, the number of RF chains is restricted to $N N_s \leq N^{t}_{RF} \leq N_t$ and $N_s \leq N^{r}_{RF} \leq N_r$ for the BS and each user, respectively. Unlike a traditional multiuser HBF system, where a user accesses the BS resources through a single beam, the HBF-NOMA network can support $M$ correlated clustered users simultaneously. Explicitly, $M \cdot N$ users can be spatially served. For the purpose of signal processing analysis, the $[N N_s \times M]$ users' data streams are ordered to form the superposition coded signal $\mathbf{x}_n \in \mathbb{C}^{N_s \times 1}$ for $n = 1, \ldots, N$. The overall superposition coded signal $\mathbf{x} \in \mathbb{C}^{N N_s \times 1}$ is written as $\mathbf{x} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N]^{T}$ and then precoded by the digital precoder $\mathbf{D} \in \mathbb{C}^{N N_s \times N N_s}$ as depicted in Fig. 2 and 3. Then, the output of $\mathbf{D}$ is converted to an RF signal by the analog precoder $\mathbf{F} \in \mathbb{C}^{N_t \times N N_s}$.
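The PS counts quoted above for the two structures can be illustrated with a minimal sketch. The function names and the contiguous sub-array layout are assumptions made for illustration only; the text fixes just the counts $N_t \cdot N \cdot N_s$ (FCS) and $N_t$ (SCS).

```python
import numpy as np


def phase_shifter_count(n_t, n_clusters, n_s, structure):
    """Return the number of phase shifters in the analog beamformer."""
    n_rf = n_clusters * n_s  # N_RF^t = N * N_s RF chains at the BS
    if structure == "FCS":
        return n_t * n_rf  # every RF chain reaches every antenna
    if structure == "SCS":
        return n_t  # every antenna is fed by exactly one RF chain
    raise ValueError("structure must be 'FCS' or 'SCS'")


def connection_mask(n_t, n_clusters, n_s, structure):
    """Boolean N_t x (N*N_s) mask that is True where a phase shifter exists."""
    n_rf = n_clusters * n_s
    if structure == "FCS":
        return np.ones((n_t, n_rf), dtype=bool)
    if structure != "SCS":
        raise ValueError("structure must be 'FCS' or 'SCS'")
    if n_t % n_rf != 0:
        raise ValueError("N_t must be a multiple of N*N_s for equal sub-arrays")
    sub = n_t // n_rf  # antennas per sub-array
    mask = np.zeros((n_t, n_rf), dtype=bool)
    for j in range(n_rf):
        # Assumption: sub-array j occupies a contiguous block of antennas.
        mask[j * sub:(j + 1) * sub, j] = True
    return mask
```

For instance, with 64 antennas, two clusters, and two streams, the sketch counts 256 PSs for FCS and 64 for SCS, and the SCS mask is block diagonal.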
In this regard, the RF output of the AB, namely the transmit signal $\mathbf{s}$, can be expressed as
$$\mathbf{s} = \mathbf{F}\mathbf{D}\mathbf{x}.$$
Notably, each element of $\mathbf{x}$ can be formulated as $\mathbf{x}_n = \sum_{m=1}^{M} \sqrt{\frac{p_{(n,m)}}{N_s}} \cdot \mathbf{I}_{N_s} \cdot \mathbf{x}_{(n,m)}$, namely an $N_s$-stream superposition coded signal instigated by power domain NOMA, where $p_{(n,m)}$ denotes the power coefficient of user $U_{(n,m)}$ allocated for multistream transmission and $\mathbf{x}_{(n,m)} \in \mathbb{C}^{N_s \times 1}$ denotes the transmitted multistream symbol for user $U_{(n,m)}$. Furthermore, the signal vector is constrained to $\mathbb{E}\left[\mathbf{x}\mathbf{x}^{H}\right] = \frac{P}{N N_s} \cdot \mathbf{I}_{N N_s}$, where $P$ is the aggregate transmit power.
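A minimal sketch of the superposition coding step for one cluster is given below. It assumes that each user's stream is scaled by the square root of its per-stream power $p_{(n,m)}/N_s$ and that the users' symbols are independent with unit average power; the function names are hypothetical.

```python
import numpy as np


def superpose_cluster(symbols, powers, n_s):
    """Superpose M users' N_s-stream symbols into one cluster signal x_n.

    symbols: array of shape (M, N_s), one row per user.
    powers:  array of shape (M,), the user power coefficients p_(n,m).
    """
    symbols = np.asarray(symbols, dtype=complex)
    powers = np.asarray(powers, dtype=float)
    if symbols.shape != (powers.size, n_s):
        raise ValueError("symbols must have shape (M, N_s)")
    amplitudes = np.sqrt(powers / n_s)  # assumed amplitude scaling per stream
    return (amplitudes[:, None] * symbols).sum(axis=0)


def per_stream_power(powers, n_s):
    """Average power per stream of x_n for independent unit-power symbols."""
    return float(np.sum(powers)) / n_s
```

With $P = 1$, $N = 2$, $N_s = 2$ and user powers summing to $P/N = 0.5$, the per-stream power evaluates to $P/(N N_s) = 0.25$, in line with the constraint on $\mathbb{E}[\mathbf{x}\mathbf{x}^{H}]$.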
Since every AB element is realized by constant-amplitude PSs, the elements $\mathbf{F}(i,j)$ of $\mathbf{F}$ are normalized to a constant modulus for $i = 1, \ldots, N_t$ and $j = 1, \ldots, N N_s$. In order to comply with the aggregate transmit power constraint, $\mathbf{D}$ is normalized to satisfy $\|\mathbf{F}\mathbf{D}\|^{2}_{F} = N N_s$. Under a flat fading scenario, the received signal $\mathbf{y}_{(n,m)} \in \mathbb{C}^{N_s \times 1}$ at the output of the RF chains of user $U_{(n,m)}$ is expressed in terms of the following quantities. $\mathbf{W}_{(n,m)} \in \mathbb{C}^{N_r \times N^{r}_{RF}}$ and $\mathbf{H}_{(n,m)} \in \mathbb{C}^{N_r \times N_t}$ represent the multistream AC and the mmW MIMO channel coefficient between user $U_{(n,m)}$ and the BS, respectively. $\mathbf{D}_n \in \mathbb{C}^{N N_s \times N_s}$ indicates the corresponding $N_s$ column vectors of the precoder $\mathbf{D}$, which boost the transmit signal towards the $n$th cluster. $\mathbf{P}_{(n,m)} = \frac{p_{(n,m)}}{N_s} \cdot \mathbf{I}_{N_s}$ comprises the power of each symbol stream, and $\mathbf{g}_{(n,m)} \in \mathbb{C}^{N_r \times 1}$ represents the additive white Gaussian noise (AWGN), distributed as $\mathcal{CN}(0, \sigma^{2})$, at the $N_r$ receive antennas of each user. Note that the interfering users and clusters are indexed by $k < m$ and $l \neq n$, respectively. The NYU mmW channel model for the multicarrier MIMO-NOMA system in [24] has been adapted to its single carrier counterpart. Hence, the single carrier channel coefficients having $v$ multipath components between the BS and each user $U_{(n,m)}$ are formulated in terms of the array steering vectors at the BS and at the user, together with the following parameters: $\gamma_{(n,m,v)}$ is the amplitude of the channel gain in the $v$th path, $\tau$ is the time delay, $f$ is the carrier frequency, $d_T$ is the antenna element spacing at the BS, $d_R$ is the antenna spacing at the receiver, $\phi_{(n,m,v)}$ is the angle of departure (AOD), and $\theta_{(n,m,v)}$ is the angle of arrival (AOA); the model also involves the phase of each multipath component. Notably, for the purpose of achieving low antenna correlation, both $d_T$ and $d_R$ are set to $0.5\lambda$, where $\lambda$ denotes the wavelength of the signal [25]. These parameters are extracted from the output file saved as ''DirPDPinfo.mat'' produced by the NYUSIM software. Table 1 depicts the mmW NOMA system configurations used to generate the ''DirPDPinfo.mat'' output files.
It is worth mentioning that analyzing the achievable performance of the proposed mmW HBF-NOMA system on the basis of the NYU channel model helps mobile system designers to corroborate the realistic performance of the cutting-edge system [24].
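As a rough illustration of the role played by the steering vectors and the half-wavelength spacing, the sketch below builds a single-path channel for uniform linear arrays. The uniform linear array geometry, the $\sin(\cdot)$ phase convention, and the function names are assumptions; the multipath NYU model with time delays and path phases described above is not reproduced here.

```python
import numpy as np


def ula_steering(n_ant, angle_rad, spacing_wl=0.5):
    """Unit-norm steering vector of a uniform linear array.

    spacing_wl is the element spacing in wavelengths (0.5 means d = 0.5 * lambda).
    """
    k = np.arange(n_ant)
    phase = 2.0 * np.pi * spacing_wl * k * np.sin(angle_rad)
    return np.exp(1j * phase) / np.sqrt(n_ant)


def single_path_channel(gain, n_r, n_t, aoa_rad, aod_rad):
    """Rank-one N_r x N_t channel for one path: gain * a_R(AOA) a_T(AOD)^H."""
    a_r = ula_steering(n_r, aoa_rad)
    a_t = ula_steering(n_t, aod_rad)
    return gain * np.outer(a_r, a_t.conj())
```

A pure LOS link modeled this way has rank one, which is why the multipath components of the measured channel matter for multistream transmission.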

III. OPTIMIZATION PROBLEM FORMULATION AND SOLUTION
The optimization problem is developed by first modeling the sum-rate $R_{sum}$ of the proposed HBF-NOMA scheme as
$$R_{sum} = \sum_{n=1}^{N}\sum_{m=1}^{M} R_{(n,m)}, \qquad (6)$$
where $R_{(n,m)}$ is the data-rate of the $m$th user in the $n$th cluster. In the expression of $R_{(n,m)}$, $\tilde{\mathbf{H}}_{(n,m)} = \mathbf{W}^{H}_{(n,m)} \mathbf{H}_{(n,m)} \mathbf{F} \in \mathbb{C}^{N_s \times N N_s}$ denotes the analog channel of user $U_{(n,m)}$, $\alpha_{(n,m)}$ represents the power ration coefficient of user $U_{(n,m)}$, and the intracluster interference $I^{intra}_{(n,m)}$ and intercluster interference $I^{inter}_{(n,m)}$ arise from the users $k < m$ and from the adjacent clusters $l \neq n$, respectively. $\sigma^{2}_{(n,m)}$ denotes the noise variance owing to the AWGN emanating from the receiver's antennas. Notably, $\mathrm{SINR}_{(n,\Upsilon)\to(n,m)}$ is postulated for user $U_{(n,m)}$ to successfully decode the messages of the other weak users $U_{(n,\Upsilon)}$ in the $n$th cluster, where $\Upsilon > m$, $I^{intra}_{(n,k)\to(n,m)} = P_{(n,k)} |\tilde{\mathbf{H}}_{(n,m)} \mathbf{D}_n|^{2}$, $P_{(n,\Upsilon)} = P_n \times \alpha_{(n,\Upsilon)}$, and the effective noise variance is $\|\mathbf{W}_{(n,m)}\|^{2} \sigma^{2}_{(n,m)}$. A perfect SIC scheme at the strong user is assumed to eliminate the intracluster interference arising from the feeble users, and the removal of intercluster interference is guaranteed at the strongest user in each cluster via an efficient digital precoding scheme built on the composite matrix constituted by the effective analog channel $\tilde{\mathbf{H}}_{(n,1)}$ of the strongest user in each cluster [26]. Therefore, the data-rate of the strongest user can be simplified accordingly. Maximization of the sum-rate in (6) is achievable through effective user pairing and ordering, which is crucial for obtaining the optimal analog combiner $\check{\mathbf{W}}_{(n,m)}$, hybrid precoder $\check{\mathbf{F}} \cdot \check{\mathbf{D}}_n$, and power $\check{P}_{(n,m)}$ for user $U_{(n,m)}$. Hence, we aim to solve the maximization problem in (11). The problem in (11) is non-convex owing to its constraints. Therefore, we subdivide the optimization into two disjoint problems, namely the HBF optimization and the power optimization.
In the literature, iterative approaches to designing optimal precoders, such as alternating minimization-based algorithms [8] and the orthogonal matching pursuit algorithm [4], among other iteration-based optimization schemes, are capable of approaching the global optimum at the expense of a high computational overhead. In contrast, non-iterative precoding leveraging the system linearity is known to be sub-optimal relative to iterative precoding [14]. Nevertheless, non-iterative precoding is feasible in practice and its performance converges to a local optimum. For the sake of realizing a practical system having lower computational complexity in conjunction with a carefully compromised performance, the best possible harmonization of non-iterative precoding, combining, and a refined power allocation is highly desirable [19], [26].
In order to benchmark against the existing traditional multiuser and OMA schemes, the sum-rate $R^{trad}_{s}$ of the traditional multiuser scheme is expressed following [14] in terms of $\mathrm{SINR}^{trad}_{(n,m)}$, and the achievable data-rate $R^{OMA}_{(n,m)}$ of the OMA scheme is obtained from [27], [28], where $M$ denotes the number of users per cluster. Based on (19), the achievable sum-rate $R^{OMA}_{s}$ of the OMA scheme is computed.

A. USERS PAIRING AND ORDERING
A cluster head selection algorithm and the users' correlation have been used to allocate users to each cluster in [18] owing to its randomly generated geometric channel model, where the AOA is randomly generated. In our scenario, the realistic channel model is generated for each pair of deterministic AOD and AOA. For effective communication, the AOA is usually modelled as the AOD in the LOS link. Hence, every user whose AOA is linked with the same AOD at the BS is allocated to the same cluster. This approach is in line with both the geometric channel model and the realistic NYU channel model, for which analog precoders are often designed based on the array steering angle [26], [29]. For instance, clusters [1,2] . . , $N$ clusters. Hence, the analog channel with the maximum BF gain in each cluster is selected to formulate the composite analog channel for the sake of designing the DBF employing the ZF algorithm. Notably, a DBF based on the strongest users not only mitigates the intercluster interference at the strongest users but can also filter out a certain degree of the intercluster interference generated at the weak users, owing to the fact that clustered users are paired on the basis of equal AOAs. Finally, for the purpose of aiding the SIC processing commonly deployed to mitigate intracluster interference in NOMA communications, the users' power coefficients are rationed in ascending order on the basis of their effective hybrid digital-analog channel weights, denoted as $\tilde{\mathbf{H}}_{(n,m)} \mathbf{D}_n$.
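The ordering rule at the end of this subsection can be sketched as follows for a single cluster. The sketch assumes that each user's channel weight has already been reduced to a scalar (for example, a norm of its effective hybrid channel) and that the strongest user receives the smallest power ration coefficient; the function name is hypothetical.

```python
import numpy as np


def assign_power_coefficients(channel_weights, alphas):
    """Order users by channel weight and hand out power ration coefficients.

    Returns (order, alpha_by_user): `order` lists user indices from strongest
    to weakest, and alpha_by_user[u] is the coefficient given to user u.
    """
    weights = np.asarray(channel_weights, dtype=float)
    alphas = np.sort(np.asarray(alphas, dtype=float))  # ascending coefficients
    if weights.shape != alphas.shape:
        raise ValueError("one coefficient per user is required")
    order = np.argsort(-weights)  # strongest user first
    alpha_by_user = np.empty_like(alphas)
    alpha_by_user[order] = alphas  # strongest user gets the smallest share
    return order, alpha_by_user
```

Giving the weakest user the largest share is what lets the strong users decode and cancel the weak users' symbols first during SIC.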

B. PROPOSED ANALOG COMBINERS
In order to harvest a large intermediate analog channel gain from each user, the analog combiner $\mathbf{W}_{(n,m)} \in \mathbb{C}^{N_r \times N_s}$ is obtained via an SVD of the channel coefficient of user $U_{(n,m)}$, which is available at the BS by limited feedback [25]. The optimized $\mathbf{W}^{full}_{(n,m)} \in \mathbb{C}^{N_r \times N_s}$ for the FCS case is denoted by $\check{\mathbf{W}}^{full}_{(n,m)}$ and is computed from the first $N_s$ column vectors of the left unitary matrix $\mathbf{U}$. The SVD of each user's channel is computed to obtain $\mathbf{U}$ as
$$\mathbf{H}_{(n,m)} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{H}, \qquad (21)$$
where $\mathbf{V} \in \mathbb{C}^{N_t \times N_t}$ and $\boldsymbol{\Sigma} \in \mathbb{R}^{N_r \times N_t}$ represent the right unitary (square) matrix and the diagonal matrix of singular values associated with $\mathbf{U}$ and $\mathbf{V}$, respectively. Based on (21), the FCS combiner is formulated as [22]
$$\check{\mathbf{W}}^{full}_{(n,m)} = \mathbf{U}(:, 1:N_s),$$
where $1:N_s$ denotes the first $N_s$ column vectors of $\mathbf{U}$.
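The SVD-based combiner described above can be sketched with NumPy as follows. The function name is hypothetical, and perfect knowledge of the channel matrix is assumed.

```python
import numpy as np


def fcs_analog_combiner(h, n_s):
    """Return the first N_s left singular vectors of the N_r x N_t channel h."""
    u, _, _ = np.linalg.svd(h)  # h = u @ diag(s) @ vh, singular values descending
    return u[:, :n_s]
```

Because the columns of $\mathbf{U}$ are orthonormal and ordered by singular value, the combined channel $\mathbf{W}^{H}\mathbf{H}$ retains exactly the $N_s$ largest singular values of $\mathbf{H}$.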
In the case of the optimized SCS-AC, $\check{\mathbf{W}}^{sub}_{(n,m)} \in \mathbb{C}^{N_r \times N_s}$ is constrained in (23) to a block-diagonal form whose $i$th block is the sub-array vector $\check{\mathbf{w}}^{i}_{(n,m)} \in \mathbb{C}^{M_r \times 1}$, where $M_r = N_r / N_s$. For the sake of solving the block diagonalization constraint in (23), we postulate a selective matrix $\mathbf{G}^{r}_{i} \in \mathbb{C}^{M_r \times N_r}$ that chooses the dominant elements $\check{\mathbf{w}}^{i}_{(n,m)} \in \mathbb{C}^{M_r \times 1}$ of each $i$th column vector of $\check{\mathbf{W}}^{full}_{(n,m)}$, where $i = 1, \ldots, N_s$. A matrix $\mathbf{G}^{r}_{i}$ is therefore formulated for each $i$th sub-array column, for $i = 1$ to $N_s$, and $\mathbf{w}^{i}_{(n,m)} \in \mathbb{C}^{M_r \times 1}$ is obtained by applying it to the $i$th column of $\check{\mathbf{W}}^{full}_{(n,m)}$, from which the optimal $\check{\mathbf{w}}^{i}_{(n,m)}$ follows.
In regard to (28), the analog precoder $\mathbf{F}$ is derived for the FCS. On the other hand, $\mathbf{F}$ for the SCS is obtained via the DSMEE based phased scheme, derived as follows. Firstly, we formulate the intermediate AB with non-zero elements, $\mathbf{F}^{sub}_{int} \in \mathbb{C}^{N_t \times N N_s}$, for the SCS. Then, a novel DSMEE is proposed to extract the $M$ dominant elements from each column vector of $\mathbf{F}^{sub}_{int}$ to optimize $\mathbf{F}^{sub}$. Here, the $i$th column vector of $\mathbf{F}^{sub}_{int} \in \mathbb{C}^{N_t \times N N_s}$ steers the $i$th stream to its $n$th cluster, for $N_s$ streams per cluster and $n = 1, \ldots, N$ clusters. Accordingly, a DSMEE $\mathbf{G}^{t}_{i}$ is formulated for each $i$th stream of the $N_s$ streams per cluster.
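The two ingredients discussed above, namely phase matching and dominant sub-array element extraction, can be sketched as follows. This is an illustration under stated assumptions rather than the exact extractor of this treatise: the phase-matched precoder is taken as the unit-modulus phase of the conjugate transpose of an intermediate analog channel scaled by $1/\sqrt{N_t}$, and the extractor is assumed to keep, for column $i$, the contiguous block of rows belonging to the $i$th sub-array. Both function names are hypothetical.

```python
import numpy as np


def phase_matched_precoder(h_int):
    """Constant-modulus precoder matched to the phase of h_int^H.

    h_int: intermediate analog channel of shape (N*N_s, N_t).
    Returns an N_t x (N*N_s) matrix with entries of modulus 1/sqrt(N_t).
    """
    n_t = h_int.shape[1]
    return np.exp(1j * np.angle(h_int.conj().T)) / np.sqrt(n_t)


def extract_subarray_blocks(full, sub_size):
    """Keep, in column i, only rows i*sub_size ... (i+1)*sub_size - 1.

    Assumption: sub-array i is the i-th contiguous block of antennas, so the
    result is block diagonal, as required by a sub-connected structure.
    """
    n_rows, n_cols = full.shape
    if n_rows != sub_size * n_cols:
        raise ValueError("rows must equal sub_size times the number of columns")
    out = np.zeros_like(full)
    for i in range(n_cols):
        rows = slice(i * sub_size, (i + 1) * sub_size)
        out[rows, i] = full[rows, i]
    return out
```

The same extraction can be applied to the columns of a full combiner with `sub_size` set to $M_r$. Phase matching makes every term of the desired-stream inner product add coherently, so the diagonal of `h_int @ f` is real and positive.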

D. PROPOSED DIGITAL PRECODERS
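In order to design the optimized DB, the optimized AC and AB are processed to form the effective analog channel $\tilde{\mathbf{H}}_{(n,m)} \in \mathbb{C}^{N_s \times N N_s}$ of each user. Hence, each user's effective analog channel is formulated as $\tilde{\mathbf{H}}_{(n,m)} = \check{\mathbf{W}}^{H}_{(n,m)} \mathbf{H}_{(n,m)} \check{\mathbf{F}}$. Afterwards, cluster user ordering is performed to identify, in each cluster, the user possessing the strongest effective analog channel, denoted as $\tilde{\mathbf{H}}_{(n,1)}$.
Finally, ZF processing is implemented to obtain the multistream DB as [26]
$$\mathbf{D} = \beta\, \tilde{\mathbf{H}}^{H} \left(\tilde{\mathbf{H}} \tilde{\mathbf{H}}^{H}\right)^{-1},$$
where $\tilde{\mathbf{H}}$ is the composite matrix constituted by $\tilde{\mathbf{H}}_{(n,1)}$ for $n = 1, \ldots, N$ and $\beta$ is the normalization factor that ensures the total power constraint on the digital baseband. The multistream DB $\mathbf{D} \in \mathbb{C}^{N^{t}_{RF} \times N \cdot N_s}$ spatially multiplexes $N \cdot N_s$ streams through a single multistream AB denoted as $\mathbf{F} \in \mathbb{C}^{N_t \times N N_s}$.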
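A minimal sketch of the ZF step is given below. It assumes that the composite effective channel is a square, invertible $N N_s \times N N_s$ matrix built from the strongest user of each cluster, and that the normalization factor is chosen so that $\|\mathbf{F}\mathbf{D}\|^{2}_{F} = N N_s$, as required in Section II; the function name is hypothetical.

```python
import numpy as np


def zf_digital_precoder(h_eff, f):
    """Zero-forcing digital precoder for a composite effective channel.

    h_eff: composite effective analog channel, shape (N*N_s, N*N_s).
    f:     analog precoder, shape (N_t, N*N_s).
    """
    gram = h_eff @ h_eff.conj().T
    d = h_eff.conj().T @ np.linalg.inv(gram)  # right pseudo-inverse of h_eff
    n_streams = d.shape[1]
    beta = np.sqrt(n_streams) / np.linalg.norm(f @ d, "fro")
    return beta * d  # now ||F D||_F^2 = N * N_s
```

After precoding, `h_eff @ d` is a scaled identity, so the streams aimed at one cluster's strongest user do not leak into the others.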

E. POWER ALLOCATION SCHEME
We adopt both fixed power NOMA and a simple dynamic power NOMA built on large scale fading parameters to allocate the users' power. Guided by the fixed power scheme in [26] and [30], the total transmit power $P$ is constrained to 1 [watt] and the power of each cluster is configured to $p_n = \frac{P}{N}$. Furthermore, the sum of the power ration coefficients of the clustered users is restricted to unity, namely $\alpha_{(n,1)} + \alpha_{(n,2)} + \ldots + \alpha_{(n,M)} = 1$. $p_n$ is shared among the clustered users according to $p_{(n,m)} = \alpha_{(n,m)} \times p_n$. Explicitly, the multistream power per $n$th cluster and the multistream power per user $U_{(n,m)}$ are formulated as $\mathbf{P}_n = \frac{p_n}{N_s} \mathbf{I}_{N_s}$ and $\mathbf{P}_{(n,m)} = \alpha_{(n,m)} \times \mathbf{P}_n$, respectively. Therefore, the power ration coefficients are constrained to satisfy $\alpha_{(n,1)} < \alpha_{(n,2)} < \ldots < \alpha_{(n,M)}$ owing to the magnitude order of the users' hybrid (analog-digital) channel$^{3}$ in (39).
Adopting the fixed power scheme to allocate transmit power to the users can lead to unfair access to the BS resources, on the grounds that the users' distances to the BS are randomly distributed and not fixed for each cluster. For the sake of allocating the users' power on the basis of the dynamic nature of the large scale fading parameters, such as the distance and the path loss coefficient, a dynamic power allocation (DPA) scheme is introduced that leverages both an inverse path loss (IPL) model and the user's distance to the BS in order to optimize the power ration coefficients. Accordingly, the DPA power coefficient of user $U_{(n,m)}$ is formulated in (40) and (41), where $d_{(n,m)}$ and $c_{(n,m)}$ represent the distance and path loss coefficient for user $U_{(n,m)}$, respectively. The data-rate of each user is then evaluated deploying (40) and (41) in turn to return the optimized power ration coefficient for the system's operation. The optimized power ration coefficient is obtained on the basis of a maximization, where $D$ annotates distance.
$^{3}$ Hybrid (analog-digital) channel means the linear matrix product involving the effective analog channel matrix and the DB matrix of a single user.
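The fixed power scheme of this subsection reduces to a few lines of arithmetic, sketched below for one cluster. The function name and the example coefficient values are hypothetical; only the relations $p_n = P/N$, $p_{(n,m)} = \alpha_{(n,m)} p_n$, the per-stream split by $N_s$, and the unity-sum and ascending-order constraints are taken from the text.

```python
import numpy as np


def fixed_power_allocation(alphas, n_clusters, n_s, total_power=1.0):
    """Return (p_n, per-user powers, per-stream powers) for one cluster."""
    alphas = np.asarray(alphas, dtype=float)
    if not np.isclose(alphas.sum(), 1.0):
        raise ValueError("power ration coefficients must sum to one")
    if np.any(np.diff(alphas) <= 0):
        raise ValueError("coefficients must be strictly ascending")
    p_n = total_power / n_clusters  # equal power per cluster
    p_users = alphas * p_n          # p_(n,m) = alpha_(n,m) * p_n
    p_streams = p_users / n_s       # diagonal entries of P_(n,m)
    return p_n, p_users, p_streams
```

For example, with $P = 1$ [watt], $N = 2$, $N_s = 2$, and coefficients $(0.2, 0.8)$, the strongest user receives 0.1 [watt] in total, that is, 0.05 [watt] per stream.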

F. ENERGY EFFICIENCY PERFORMANCE METRIC
Maximization of the sum-rate plays a pivotal role in achieving EE, because the EE of an HBF-NOMA system, denoted as $E_{NOMA}$, is obtained from the ratio between the attainable sum-rate and the aggregate power consumed, namely the HBF precoders' power and the total transmit power. Hence, EE is formulated following [1], [18].
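As a rough illustration of the ratio described above, the sketch below divides the sum-rate by the transmit power plus an assumed precoder power model. The linear model (baseband power plus per-RF-chain and per-PS power) and the default component powers are illustrative assumptions, not values taken from this treatise.

```python
def energy_efficiency(sum_rate, transmit_power, n_rf, n_ps,
                      p_rf=0.3, p_ps=0.04, p_bb=0.2):
    """Energy efficiency as sum-rate over total consumed power.

    Assumed precoder power model: baseband power p_bb, plus p_rf per RF chain,
    plus p_ps per phase shifter (all in watts; hypothetical defaults).
    """
    precoder_power = p_bb + n_rf * p_rf + n_ps * p_ps
    return sum_rate / (transmit_power + precoder_power)
```

Under this model, with $N_t = 64$, $N = 2$, and $N_s = 2$, the FCS uses 256 PSs and the SCS only 64, so at an equal sum-rate the SCS attains the higher EE.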

G. SUM-RATE GAIN
For the purpose of clarity, the sum-rate gain (SRG) used in the discussion of the results is measured as the slope of the sum-rate curve. In this regard, the SRG is written as
$$\mathrm{SRG} = \frac{\Delta R_{sum}}{\Delta \mathrm{SNR}},$$
where $\Delta$ denotes the difference between the achievable maximum and minimum values.
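The slope-based SRG can be computed directly from a simulated curve, as in the following sketch. The function name is hypothetical, and the SRG is assumed to be the ratio of the sum-rate range to the SNR range over the interval of interest.

```python
import numpy as np


def sum_rate_gain(snr_db, sum_rate):
    """Slope of the sum-rate curve in bps/Hz per dB over the given SNR range."""
    snr_db = np.asarray(snr_db, dtype=float)
    sum_rate = np.asarray(sum_rate, dtype=float)
    if snr_db.shape != sum_rate.shape or snr_db.size < 2:
        raise ValueError("need at least two matching (SNR, sum-rate) points")
    delta_snr = snr_db.max() - snr_db.min()
    if delta_snr == 0:
        raise ValueError("SNR values must span a non-zero range")
    return float((sum_rate.max() - sum_rate.min()) / delta_snr)
```

For example, a curve rising from 5 to 45 [bps/Hz] between 10 and 30 [dB] has an SRG of 2 [bps/Hz/dB].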

H. SIC-ZF BASED SCS-HBF-NOMA JUXTAPOSING P-ZF COUNTERPART
In the mmW NOMA communication downlink, the SIC based optimized sub-connected structure AB exploiting single stream and multistream per user transmission was derived from (30) in [19]. More explicitly, the dominant element of $\mathbf{f}^{sub}_{n} \in \mathbf{F}^{sub}$, named $\check{\mathbf{f}}^{M}_{n}$, was obtained from the conjugate transpose of the phase of $\mathbf{v}_1$, mathematically $\check{\mathbf{f}}^{M}_{n} = \frac{1}{\sqrt{M}} e^{j\angle \mathbf{v}_1}$, where $\mathbf{v}_1$ is obtained from the SVD of $\bar{\mathbf{G}}_{n-1} \in \mathbb{C}^{M \times M} = \mathbf{G}^{t}_{i} \mathbf{G}_{n-1}$. It is worth mentioning that $\bar{\mathbf{G}}_{n-1}$ represents a submatrix of $\mathbf{G}_{n-1} \in \mathbb{C}^{N_t \times N_t}$, where $\mathbf{G}_{n-1} = \mathbf{H}^{H}_{(n,1)} (\mathbf{T}_{n-1})^{-1} \mathbf{H}_{(n,1)}$ for $N_s = 1$ and $\mathbf{T}_{n-1} = p_{(n,1)} \mathbf{H}_{(n,1)} \mathbf{F}\mathbf{F}^{H} \mathbf{H}^{H}_{(n,1)}$. The computational complexity of achieving the SIC based optimized AB was found to be higher than that of the phased scheme based AB deployed for the fully connected HBF-NOMA. The higher complexity of the SIC based AB arises from the high computational cost of computing the matrix $\mathbf{G}_{n-1}$ for $n = 1, \ldots, N$. Instead, in this paper, the phased scheme postulated for the AB optimization leverages the conjugate transpose of the strongest user's intermediate analog channel for each cluster in (30) rather than $\mathbf{v}_1$, which is easy to compute and feasible for a massive MIMO LOS link. Both the SIC and the phased analog precoding schemes are aided by ZF based digital precoding. Details of the computational complexity of the two SCS-HBF-NOMA schemes and of FCS-HBF-NOMA are recapitulated in the following.

I. COMPUTATIONAL COMPLEXITY ANALYSIS
The computational complexities of the P-ZF based multistream FCS-HBF-NOMA and SCS-HBF-NOMA systems as well as the SIC-ZF based multistream SCS-HBF-NOMA [19] are evaluated in terms of memory space complexity using Big-O notation, which also indicates the memory space consumed by the various HBF-NOMA algorithms. The complexity of mathematical operations on matrices, such as multiplication, the SVD function, inversion, and the exponential function, is considered for the memory space complexity evaluation. Updating the precoder and combiner matrices as well as their derivative matrices costs trivial memory space complexity for all schemes. Hence, these updates do not contribute significantly to the complexity analysis. The memory space complexity that arises from cluster user ordering based on channel weights [26] is equal for all schemes. Therefore, the complexities induced by both matrix updates and user ordering are ignored in Table 2, which illustrates upper bounds of the memory space complexity of the HBF-NOMA precoders.

1) P-ZF BASED FCS-HBF-NOMA COMPLEXITY
The memory space complexity of the proposed multistream FCS-HBF-NOMA scheme is analyzed for the AC, AB, and DB as follows:
• AC on the basis of SVD: the AC consumes a memory space complexity of $\mathcal{O}(N M N_r^{2} N_t)$ for the SVD computations of the $N \cdot M$ users exploiting (21).
• AB built on the phased scheme: the memory space complexity of the AB for FCS-HBF-NOMA is $\mathcal{O}(N N_s N_r N_t)$ and $\mathcal{O}(N N_s N_t)$ for the multiplication and the exponential function in (29), respectively.
• DB leveraged on ZF scheme: the memory space complexity of the DB for FCS-HBF-NOMA is $\mathcal{O}(N M N_s N_r + N^{2} M N_s^{2} N_t + 3 N^{3} N_s^{3})$ and $\mathcal{O}(N^{3} N_s^{3})$ for the multiplication and the inverse function arising from computing (38), respectively.
Therefore, the overall memory space complexity of the multistream FCS-HBF-NOMA for the multiplication, SVD, inverse, and exponential operations is summarized in the second row of Table 2, where the exponential function costs $\mathcal{O}(N N_s N_t)$.

2) P-ZF BASED SCS-HBF-NOMA COMPLEXITY
The memory space complexity of the proposed multistream SCS-HBF-NOMA scheme is analyzed for the AC, AB, and DB as follows:
• AC on the basis of SVD and DSMEE methods: the AC costs memory space complexities of $\mathcal{O}(N M N_s M_r N_r)$ and $\mathcal{O}(N M N_s N_r)$ corresponding to the multiplications in (31) and (32); the remaining costs are attributed to the multiplication, exponential function, and addition operations. Updating the AB matrix only costs $\mathcal{O}((M + 1 + N N_s) N_t)$, which has been ignored in Table 2 on the grounds that the matrix update does not involve mathematical operations and the degree of the polynomial arising from the exponents of both $N_t$ and $M$ is one. In the same vein, computing the fixed power ration coefficients requires a memory space complexity of $\mathcal{O}(N^{2} N_s^{2})$, which is also trivial and hence ignored in Table 2.
• DB contemplated on ZF scheme: the memory space complexity of the DB for SCS-HBF-NOMA is the same as in III-I1. Accordingly, the overall complexity of the P-ZF based SCS-HBF-NOMA is summed for each of the mathematical operations and presented in the third row of Table 2.

3) SIC-ZF BASED SCS-HBF-NOMA COMPLEXITY
The memory space complexity of the SIC-ZF based SCS-HBF-NOMA scheme configured for multistream per user transmission is analyzed to benchmark the proposed sub-connected structure aided HBF-NOMA scheme as follows:
• AB contemplated on SIC scheme: the memory space complexity of the AB for SCS-HBF-NOMA is $\mathcal{O}(M^{3})$, $\mathcal{O}(N^{3} N_s^{3})$, and $\mathcal{O}(N M)$ for the SVD, inverse operation, and exponential function, respectively, in addition to the multiplication cost listed in Table 2.
• DB on the basis of ZF scheme: the memory space complexity of the DB for SIC-ZF based SCS-HBF-NOMA is the same as that of the ZF processing in III-I1. Accordingly, the overall complexity of the SIC-ZF based SCS-HBF-NOMA is summed for each of the mathematical operations and presented in the fourth row of Table 2.

4) GENERAL OBSERVATION
The SIC-ZF based SCS-HBF-NOMA scheme has the highest complexity among the HBF-NOMA schemes as $N_t$ and $M$ become large, owing to the second-degree polynomial term arising from $N_t^{2}$ and the third-degree polynomial term induced by $M^{3}$ in its entries of Table 2. Obtaining the proposed multistream P-ZF based FCS-HBF-NOMA precoders and combiners requires lower memory space complexity than the proposed multistream SCS-HBF-NOMA, owing to the DSMEE processing employed to block-diagonalize the SCS-HBF (analog) combiner and precoder matrices. This conjecture can be substantiated by comparing the multiplication complexities of the two schemes in Table 2, namely the SCS entry manifests more memory space complexity than the FCS entry. It is worth noting that the multiplication operation has the highest memory space complexity among the operations. Hence, the conjecture is drawn solely from the multiplication operation.

IV. RESULTS
The system parameters for the LOS link are configured as in Table 1. Link level simulations of the traditional multiuser HBF and HBF-OMA configurations for $M$ $(= 2)$ users per cluster and $N$ $(= 2)$ clusters are implemented using MATLAB to benchmark the performance of our proposed multistream HBF-NOMA schemes [14]. Since the NYUSIM software can generate realistic users' channel coefficients, a single run simulation is conducted on the basis of the system configurations for SNR values ranging from −10 to 30 [dB]. The simple but feasible DPA scheme proposed on the basis of (40) and (41) yields only a marginal performance difference from the fixed power allocation (FPA) scheme. Hence, only the link level simulation results for the FPA scheme are reported here.
The attainable sum-rates of the proposed HBF-NOMA scenarios transmitting two and four streams per user, compared with their single stream counterpart, are illustrated in Fig. 4. It is observed that the multistream FCS-HBF-NOMA obtains a higher spatial multiplexing gain, and thus a higher sum-rate, than its sub-connected structure counterpart at SNRs ranging from 0 to 30 [dB] and from 10 to 30 [dB] for the two and four streams per user cases, respectively, while the single stream FCS-HBF-NOMA exhibits a higher sum-rate than the SCS-HBF-NOMA counterpart at all SNRs. The difference in performance trend between the HBF-NOMA configured for two and four streams and its single stream counterpart arises from the combined impact of the number of streams and the user's power sharing factor $\frac{\mathrm{SNR}}{N N_s}$, which results in a lower attainable sum-rate for the multistream FCS-HBF-NOMA at low SNRs than that achieved by the sub-connected counterpart. This is because more power is required to drive the fully connected AB structure to achieve the full BF gain, whereas the power sharing factor for $N_s = 1$ is enough to drive the reduced number of PSs in the fully connected single stream AB and attain a higher BF gain than its sub-connected counterpart at all SNRs.
In order to validate the results obtained in Fig. 4, Fig. 5 illustrates the attainable sum-rates of HBF-OMA and HBF traditional multiuser schemes configured for four streams to benchmark the proposed scheme operating in the LOS link. Fig. 5 shows that the proposed SCS-HBF-NOMA scheme achieves a superior sum-rate compared to the SCS-HBF-OMA and SCS-HBF traditional multiuser configurations at all SNRs. Notably, at SNRs between 10 and 30 [dB], the HBF-OMA also achieves a sum-rate gain of 2 [bps/Hz/dB] using (43), approaching the same sum-rate as that achieved by its HBF-NOMA counterparts.
Fig. 6 shows the attainable EEs of the P-ZF based FCS-HBF-NOMA benchmarking the proposed P-ZF based SCS-HBF-NOMA scheme configured for one, two, and four streams. The results in Fig. 6 reveal that the proposed scheme yields meaningfully increased EEs as the number of data streams is increased. On the other hand, the one stream FCS-HBF-NOMA offers the best possible EE compared to its two and four stream configurations with higher sum-rates. This finding is due to the fact that the power consumed by the AB becomes higher as the number of streams is incremented. Explicitly, $N \cdot N_s \cdot N_t$ PSs are required for the FCS AB, which depends mainly on the $N_s$ configuration, whereas the SCS counterpart requires only $N_t$ PSs, independent of $N_s$.
Fig. 7 illustrates the EE attained by the SCS-HBF-NOMA on the basis of the SIC-ZF scheme benchmarking the proposed P-ZF counterpart configured for one to four streams in turn. Fig. 7 shows that the proposed SCS-HBF-NOMA systems for one to four streams per user yield substantively higher EE than the SIC-ZF counterpart at all SNRs. Notably, the proposed two stream SCS-HBF-NOMA is capable of surpassing the EE of the single stream counterpart at SNRs ranging from 10 to 30 [dB]. Moreover, the sum-rate performances of the SIC-ZF based SCS-HBF-NOMA benchmarking the proposed schemes configured for two and four streams are portrayed in Fig. 8. Relative to the SIC-ZF based SCS-HBF-NOMA scheme at an SNR of 30 [dB], the proposed scheme configured for two and four streams yields performance improvements of 28 % and 33 %, respectively. At low to medium SNRs ranging from −10 to 10 [dB], the proposed scheme manifests a substantively higher multiplexing gain compared to the SIC-ZF counterpart. Furthermore, at an SNR of 30 [dB], a multiplexing gain of 46 % is obtained by the proposed scheme, which is slightly higher than the 43 % yielded by the SIC-ZF counterpart.
Therefore, the performance trends indicate that the proposed scheme is the better candidate across low to high SNRs.
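To make the two quantities behind these trends concrete, the following minimal Python sketch evaluates the PS counts quoted above and the per-stream SNR under an equal power split. It assumes that the power sharing factor reads SNR/(N N_s). The value N_t = 64 and the function names are hypothetical placeholders (the actual antenna configuration is given in Table 1), so the printed numbers are illustrative only.

```python
import math


def phase_shifter_count(structure, n_clusters, n_streams, n_tx):
    """Number of phase shifters (PSs) in the analog beamformer (AB).

    FCS: N * N_s * N_t PSs, as stated in the text.
    SCS: N_t PSs, independent of N_s, as stated in the text.
    """
    if structure == "FCS":
        return n_clusters * n_streams * n_tx
    if structure == "SCS":
        return n_tx
    raise ValueError("structure must be 'FCS' or 'SCS'")


def per_stream_snr_db(snr_db, n_clusters, n_streams):
    """Per-stream SNR in dB, assuming the equal split SNR / (N * N_s)."""
    return snr_db - 10.0 * math.log10(n_clusters * n_streams)


N_CLUSTERS = 2  # N = 2 clusters, as in the simulation setup
N_TX = 64       # hypothetical N_t; the actual value is listed in Table 1

for n_s in (1, 2, 4):
    fcs = phase_shifter_count("FCS", N_CLUSTERS, n_s, N_TX)
    scs = phase_shifter_count("SCS", N_CLUSTERS, n_s, N_TX)
    snr = per_stream_snr_db(10.0, N_CLUSTERS, n_s)
    print(f"N_s={n_s}: FCS PSs={fcs}, SCS PSs={scs}, "
          f"per-stream SNR at 10 dB = {snr:.2f} dB")
```

With these placeholder values, the fully connected AB needs N · N_s times as many PSs as the subconnected one, while each doubling of N_s lowers the per-stream SNR by about 3 [dB].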
To gain insight into the performance of the P-ZF based multistream HBF-NOMA operating in the NLOS link, "DirPDPinfo.mat" is generated for the NLOS link for two users in clusters one and two, which are located at distances of [27, 78] m and [20, 133] m from the BS with corresponding azimuth AOAs of 50° and 60°, respectively [19], [31]. The NLOS link-level simulations are implemented for the P-ZF based FCS-HBF-NOMA and SCS-HBF-NOMA schemes as well as for SCS-HBF-NOMA built on the SIC-ZF algorithm [19], and are benchmarked against their performance in the LOS environment, as illustrated in Fig. 9. It is worth noting that the four-stream SCS-HBF-NOMA built on P-ZF processing yields the worst performance in the NLOS link. Specifically, relative to the sum-rates in the LOS link, the P-ZF based FCS-HBF-NOMA and SCS-HBF-NOMA schemes and the SIC-ZF based SCS-HBF-NOMA scheme suffer performance degradations of [59, 82, 42] % at an SNR of 10 [dB] and [42, 75, 18] % at an SNR of 30 [dB] in the NLOS link, respectively. These results imply that a good LOS link is indispensable for deploying the P-ZF scheme in an SCS-HBF-NOMA system exploiting multistream per user transmission.
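The degradation figures above are read here as relative sum-rate losses with respect to the LOS link. A minimal Python sketch under that assumption follows. The function name and the sum-rate values are hypothetical and are not taken from Fig. 9.

```python
def relative_degradation_pct(rate_los, rate_nlos):
    """Relative sum-rate loss of the NLOS link w.r.t. the LOS link, in percent."""
    if rate_los <= 0:
        raise ValueError("LOS sum-rate must be positive")
    return 100.0 * (rate_los - rate_nlos) / rate_los


# Hypothetical sum-rates in bps/Hz, chosen only to illustrate the arithmetic.
example_los = 20.0
example_nlos = 3.6
print(f"{relative_degradation_pct(example_los, example_nlos):.1f} %")
```

A value close to 100 % means that almost the entire LOS sum-rate is lost in the NLOS link, whereas 0 % means that the two links perform identically.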

V. CONCLUSION
In this paper, the sum-rate maximization problem in a mmW NOMA system operating in a LOS environment has been investigated. Hybrid precoding and analog combining schemes capable of supporting multistream per user transmission have been designed for both FCS and SCS configurations under RF chain and total transmit power constraints. On this basis, an FPA scheme has been exploited for the study, and a simple DPA technique has also been introduced. The substantially reduced computational complexity of the proposed scheme was corroborated via memory space complexity analyses. Link-level simulation results showed that HBF-NOMA attains a higher sum-rate than HBF-OMA and traditional multiuser systems. Moreover, the proposed multistream SCS-HBF-NOMA leveraging the P-ZF precoding scheme yielded higher EE than FCS-HBF-NOMA. Our findings also highlighted the significance of a dominant LOS link, which induces highly correlated channels among cluster users, for effective mmW HBF-MIMO-NOMA communications. Insight has also been provided into the SNR ranges over which the proposed multistream SCS-HBF-NOMA attains a higher sum-rate than its fully connected counterpart. Lastly, the proposed SCS-HBF-NOMA scheme is strongly recommended as an energy-efficient BF-NOMA scheme for post-5G wireless communications.