Limited Feedback-Based User Clustering for Non-Orthogonal Multiple Access in mmWave Systems

Non-orthogonal multiple access (NOMA) and millimeter wave (mmWave) communication are considered promising techniques for 5G and beyond-5G cellular systems. The mmWave band provides a very wide, under-utilized bandwidth, and NOMA improves spectral efficiency compared with orthogonal multiple access (OMA). Combining the two could be a key way to reach the high data rates required by next-generation communication systems. NOMA has mostly been studied and verified under the ideal condition of perfect channel state information (CSI) at the base station (BS). Under practical conditions, however, fluctuations in the wireless channel make perfect CSI unachievable at the BS. We therefore propose that the mobile users (UEs) feed back the angle of departure (AoD) to the BS. We assume that the UEs estimate the channel perfectly by detecting the pilot signals. The UEs then quantize the AoD and feed it back to the BS. Finally, the BS uses the AoD to perform user clustering, power allocation and beamforming. To reduce the feedback overhead further, we propose a user clustering algorithm that uses one-bit feedback to indicate the change in the AoD. Numerical results show that the proposed NOMA system outperforms the conventional OMA system with the same amount of feedback information.


I. INTRODUCTION
The steady growth in demand for high-rate wireless communications means that fifth-generation (5G) and beyond-5G networks must deliver much higher spectral efficiency and energy efficiency than current 4G networks. In recent years, two topics have received much attention in both academic and industrial research: non-orthogonal multiple access (NOMA) [1]- [5] and millimeter wave (mmWave) [6]- [11]. Many studies have verified that NOMA improves system capacity, spectral efficiency and user fairness. Fundamentally, NOMA is a multiplexing technique that uses the power domain to multiplex user connections [1]. In NOMA, the BS transmits the signals of multiple UEs at the same time, on the same frequency and with the same code, but at different power levels. Stronger UEs, which experience better channel conditions, are paired with weaker UEs, which experience worse channel conditions. The difference in channel conditions is compensated by assigning greater power to the weaker UEs. At the stronger UE, the high-power signal intended for the weaker UE is decoded and removed by successive interference cancellation (SIC) before the UE decodes its own signal. In other words, receiver-side signal processing compensates for the non-orthogonality of NOMA.
The associate editor coordinating the review of this manuscript and approving it for publication was Cunhua Pan.
MmWave refers to the band of spectrum between 30 GHz and 300 GHz. Its large, under-utilized bandwidths of up to 7 GHz make mmWave a promising solution to the bandwidth shortage [7]. MmWave propagation differs from propagation in the conventional sub-6 GHz band. It features low scattering and high directionality, which allows spectrum reuse by limiting inter-cell interference [6]. In addition, the short wavelength of mmWave signals makes large antenna arrays practical. Moreover, the high propagation loss in the mmWave band significantly reduces the interference experienced by UEs in a NOMA-based network. This is the main motivation for combining mmWave and NOMA in this paper.
VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

A. CURRENT RELATED WORK
MmWave communication is a relatively new area of wireless communication. Early research on mmWave channel modeling [12] characterized and analyzed the propagation properties and presented a single-input single-output (SISO) channel model for indoor scenarios. Another study on mmWave channel modeling [13] proposed a ray-tracing-based model that reproduces mmWave propagation over both line-of-sight (LOS) and non-line-of-sight (NLOS) paths. A mmWave channel model based on the angle of departure (AoD) and the angle of arrival (AoA) of the propagation paths was proposed in [14]. More recently, the coverage and rate trends of mmWave cellular systems were analyzed in [15]. To overcome the high path loss of mmWave systems, a hybrid beamforming method called spatially sparse (SS) precoding was developed in [16]. That work, however, targets single-user scenarios. SS precoding was subsequently extended to multi-user mmWave systems [17], [18].
Regarding NOMA, the early work in [1] summarized its prospects and discussed key techniques and several open issues. The authors emphasized that user clustering and resource allocation strongly affect the performance of NOMA systems. From the same perspective, Ali et al. [2] investigated the downlink multi-user NOMA system in depth and proposed a comprehensive solution that includes user pairing algorithms, beamforming design and optimal power allocation [3]. In downlink multi-user systems, beamforming plays an important role in determining the system sum rate. For this reason, the study in [20] focused on designing a beamforming scheme that reduces inter-cluster and intra-cluster interference. On user fairness in NOMA, Al-Wani et al. [21] proposed two user scheduling algorithms based on proportional fairness (PF), called PF-MPECG-SIR and PF-MPECG-CORR, to improve the throughput-fairness trade-off. Other works with the same proportional-fairness objective can be found in [22], [23].
An early study of combined NOMA and mmWave systems [24] compared mmWave and ultra-high frequency (UHF) cellular systems under different multiple-access techniques (NOMA and OMA). The results showed improvements in outage probability and average sum rate, which encouraged integrating mmWave and NOMA to increase system sum capacity in 5G networks. Later, Zhang et al. [25] provided a comprehensive theoretical analysis of the achievable capacity of integrated NOMA-mmWave massive-MIMO systems. They derived the capacity expression separately for the low-SNR and high-SNR regimes using statistical and eigenvalue-distribution tools. In another work on NOMA in mmWave communications, Cui et al. [26] proposed random beamforming (BF) to reduce the feedback burden in mmWave NOMA systems. The work in [27] proposed a NOMA framework that exploits the high directivity of mmWave propagation. In that framework, NOMA transmissions are established using distance and angle feedback from the UEs. However, angle estimation in mmWave systems is difficult and sometimes impossible. Another important study on mmWave-NOMA is [28], where the BS is equipped with a hybrid BF structure. Zhu et al. proposed a user grouping algorithm based on the K-means algorithm and the normalized channel correlation of users. The authors also provided a sub-optimal power allocation solution that covers intra-group and inter-group power allocation. They then designed the digital BF with an approximate zero-forcing method for an arbitrary, fixed analog BF. The work in [29], on the other hand, addressed user fairness in mmWave-NOMA systems. Xiao et al. proposed joint analog BF and power allocation that maximizes the minimum achievable rate among users rather than the achievable sum rate. The joint Tx-Rx beamforming and power allocation problem in mmWave-NOMA networks is considered in [30]. To solve it, the authors proposed a three-stage power optimization covering the Tx side, the Rx side, and Tx-Rx jointly. They obtained a sub-optimal solution with a boundary-compressed particle swarm optimization algorithm. Together, these works provide a solid foundation for the study of mmWave and NOMA.

B. MOTIVATION AND CONTRIBUTIONS
Most of the works on NOMA and mmWave reviewed in the previous section assume that perfect channel state information (CSI) is available at the BS. In practice this assumption is rarely achievable, especially in mmWave MIMO systems. With large numbers of transmit and receive antennas, full channel feedback in these systems incurs very high overhead. In addition, feedback delays, channel estimation errors and high UE mobility are the main obstacles to obtaining perfect CSI at the transmitter. These issues make it important to investigate the performance of mmWave NOMA systems under limited CSI feedback. In this paper, we combine NOMA transmission and mmWave communication and study the integrated system. Our main contributions are summarized below:
• To handle practical conditions in which perfect CSI is unavailable at the BS, we propose using the AoD, together with one bit indicating the change in the AoD, as partial feedback in the mmWave MIMO NOMA system modeled in [17], [28]. The BS uses the AoD information to group UEs into clusters, calculate the BF vectors, and arrange NOMA transmissions.
• Under limited channel feedback, we propose a user clustering scheme for NOMA systems that uses the AoD information fed back by the UEs. The scheme aims to maximize the sum rate (or sum throughput) of the NOMA clusters by mitigating inter-cluster interference.
• Finally, we comprehensively evaluate the proposed NOMA mmWave system through simulations over a wide range of numbers of transmit antennas and UEs. The simulations cover both the limited-feedback and the full-CSI assumptions.

C. PAPER ORGANIZATION AND NOTATION
The remainder of this paper is organized as follows.
Section II describes the system model of the NOMA mmWave system and the corresponding received signal model. Section III presents the key issues in user clustering and the proposed algorithms for a limited-feedback NOMA system. Section IV discusses power allocation for NOMA. Section V presents simulation results and discussions for the proposed NOMA system. Section VI gives the conclusions and suggestions for future work.
In this paper, normal letters $a$, lowercase boldface letters $\mathbf{a}$, and uppercase boldface letters $\mathbf{A}$ denote scalars, vectors, and matrices, respectively. $\mathbb{R}$ denotes the set of real numbers and $\mathbb{C}$ the set of complex numbers. $\mathbf{A}^T$ and $\mathbf{A}^H$ denote the transpose and the Hermitian transpose of the matrix $\mathbf{A}$, respectively. $\mathbf{A}_{(i,j)}$ is the entry of $\mathbf{A}$ in the $i$-th row and $j$-th column. Furthermore, $|a|$ denotes the magnitude of a scalar $a$, and $\|\mathbf{A}\|_F$ denotes the Frobenius norm of the matrix $\mathbf{A}$. Additionally, $\mathbb{C}^{m \times n}$, $\mathbb{R}^{m \times n}$, and $\mathbb{N}^{m \times n}$ denote the spaces of $m \times n$ complex, real, and natural-number matrices, respectively. Finally, $\mathbf{I}_N$ denotes the $N \times N$ identity matrix.

II. SYSTEM MODEL
A. MIMO-NOMA mmWave SYSTEM
We consider the downlink of a multi-user MIMO-NOMA mmWave cellular network, as illustrated in Fig. 1. The BS is assumed to use fully connected hybrid beamforming, in which each of the $N_{RF}$ RF chains is connected to all $N_T$ transmit antennas. There are $K$ UEs attempting to communicate with the BS ($K \ge N_{RF}$), and each UE is equipped with $N_R$ receive antennas. The $K$ UEs are grouped into $N_c$ clusters, where $N_c \le K/2$. Each cluster contains two or more UEs that are served simultaneously, and each UE belongs to exactly one cluster. The BS transmits $N_s$ streams ($N_s = N_{RF}$) to serve the $N_c$ clusters. We also assume that the BS assigns a single beam to each cluster and that all UEs in the same cluster are scheduled according to the NOMA principle. According to MIMO theory, the BS cannot serve more clusters at the same time than it has RF chains. If $N_c > N_{RF}$, multiple clusters must share the same beam and are assigned orthogonal spectrum resources.

B. SIGNAL MODEL
1) MIMO BEAMFORMING IN mmWAVE
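Consider a conventional multi-user MIMO mmWave system. We adopt the hybrid beamforming architecture of [17] to compensate for the high path loss at mmWave frequencies. Let $\mathbf{s} = [s_1, s_2, \ldots, s_{N_s}]^T$ denote the transmitted data vector for the $N_s$ streams, where $s_i$ is the data symbol of the $i$-th stream. We denote the precoding matrix as $\mathbf{F} \in \mathbb{C}^{N_T \times N_s}$ and the channel matrix between the BS and the $k$-th UE as $\mathbf{H}_k \in \mathbb{C}^{N_R \times N_T}$. With hybrid beamforming, the precoding matrix can be written as $\mathbf{F} = \mathbf{F}_{RF}\mathbf{F}_{BB}$, where $\mathbf{F}_{RF} \in \mathbb{C}^{N_T \times N_{RF}}$ is the analog (RF) precoder and $\mathbf{F}_{BB} \in \mathbb{C}^{N_{RF} \times N_s}$ is the digital baseband precoder. The received signal at the $k$-th UE is then
$$y_k = \mathbf{w}_k^H \mathbf{H}_k \mathbf{F} \mathbf{s} + \mathbf{w}_k^H \mathbf{z}_k, \tag{1}$$
where $\mathbf{w}_k \in \mathbb{C}^{N_R \times 1}$ with $\|\mathbf{w}_k\|_2 = 1$ is the combining vector at the UE, and $\mathbf{z}_k \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_{N_R})$ is a zero-mean complex Gaussian noise vector. Let $\mathbf{f}_i$ denote the $i$-th column of $\mathbf{F}$. The signal received by the $k$-th UE, served by the $i$-th stream, then separates into the desired signal, the inter-stream interference and the noise:
$$y_k = \mathbf{w}_k^H \mathbf{H}_k \mathbf{f}_i s_i + \sum_{j \ne i} \mathbf{w}_k^H \mathbf{H}_k \mathbf{f}_j s_j + \mathbf{w}_k^H \mathbf{z}_k. \tag{2}$$
In this paper, to reduce the computational complexity and power consumption of the receiver, we assume that the UEs use only an RF combiner. The receiver can use the AoA information to select the combining vector $\mathbf{w}_k$ that maximizes $\|\mathbf{w}_k^H \mathbf{H}_k\|$. We omit the details of computing $\mathbf{w}_k$: the receiver has access to its CSI, so the computation is straightforward. From the received signal in (2), the SINR at the $k$-th UE of the $i$-th stream is
$$\mathrm{SINR}_{k,i} = \frac{|\mathbf{w}_k^H \mathbf{H}_k \mathbf{f}_i|^2}{\sum_{j \ne i} |\mathbf{w}_k^H \mathbf{H}_k \mathbf{f}_j|^2 + \sigma^2}. \tag{3}$$
2) APPLY NOMA TRANSMISSION
Unlike OMA, NOMA has all UEs in a cluster receive the same superposed signal. The technique that combines the symbols of different UEs into one signal is called superposition coding [1]. Under the NOMA transmission principle, the superposed signal for all UEs in the $n$-th cluster can be written as
$$x_n = \sum_{m=1}^{M} \sqrt{p_{n,m}}\, s_{n,m}, \tag{4}$$
where $s_{n,m}$ is the message signal for the $m$-th UE in the $n$-th cluster and $p_{n,m}$ is the corresponding transmit power, with $\sum_{m=1}^{M} p_{n,m} = p_n$, where $p_n$ is the total transmit power of the $n$-th cluster.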
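Substituting (4) into the received signal model (1) gives the signal received by the $m$-th UE in the $n$-th cluster, which is served by the beam $\mathbf{f}_n$:
$$y_{n,m} = \mathbf{w}_{n,m}^H \mathbf{H}_{n,m} \mathbf{f}_n x_n + \sum_{j \ne n} \mathbf{w}_{n,m}^H \mathbf{H}_{n,m} \mathbf{f}_j x_j + \mathbf{w}_{n,m}^H \mathbf{z}_{n,m}. \tag{5}$$
The first term contains the desired signal and the intra-cluster interference, which is partly removed by SIC. The second term is the inter-cluster interference. The achievable throughput of UE $m$ in cluster $n$ is then
$$R_{n,m} = \log_2\left(1 + \mathrm{SINR}_{n,m}\right), \tag{6}$$
where $\mathrm{SINR}_{n,m}$ is the SINR of that UE after SIC. The overall achievable throughput of the MIMO-NOMA system can then be expressed as [28]
$$R_{\mathrm{sum}} = \sum_{n=1}^{N_c} \sum_{m=1}^{M} R_{n,m}. \tag{7}$$
Equations (5) and (7) show that both the overall throughput of the network and the throughput of each UE are directly determined by three key factors: the beamforming technique, the user clustering approach, and the inter-cluster and intra-cluster power allocation.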

C. mmWAVE CHANNEL MODEL AND AOD FEEDBACK
High free-space path loss is one of the main characteristics of mmWave systems, and it leads to low scattering and limited spatial diversity. In addition, the large antenna arrays used in mmWave transceivers are highly directive. As a result, there are fewer propagation paths at mmWave frequencies than at conventional UHF frequencies, and the signal propagates mainly over the LOS path and low-order reflections. For these reasons, we adopt the extended Saleh-Valenzuela model, which captures the mathematical structure of mmWave channels [12]- [14]. The channel matrix $\mathbf{H}_k[t]$ between the BS and the $k$-th UE at time slot $t$ can be expressed as
$$\mathbf{H}_k[t] = \sqrt{\frac{N_T N_R}{L}} \sum_{l=1}^{L} \alpha_{k,l}[t]\, \mathbf{a}_{MS}(\phi_{k,l}[t])\, \mathbf{a}_{BS}^H(\varphi_{k,l}[t]), \tag{8}$$
where $\alpha_{k,l}$ is the complex gain of the $l$-th ray. $\mathbf{a}_{BS}(\varphi_{k,l})$ and $\mathbf{a}_{MS}(\phi_{k,l})$ are the antenna array response vectors for the AoD $\varphi_{k,l}$ at the BS and the AoA $\phi_{k,l}$ at the UE, respectively. For simplicity, we use the uniform linear array (ULA), a common array structure, at both the transmitter and the receiver. The array response vector of an $N$-element ULA is
$$\mathbf{a}(\theta) = \frac{1}{\sqrt{N}} \left[1, e^{j\frac{2\pi}{\lambda} d \sin\theta}, \ldots, e^{j(N-1)\frac{2\pi}{\lambda} d \sin\theta}\right]^T, \tag{9}$$
where $j^2 = -1$, $\lambda$ is the wavelength of the mmWave carrier, and $d$ is the spacing between antenna elements. To capture channel variation caused by UE movement, we adopt the temporally correlated mmWave channel model [13], [15]. At the first time slot, we initialize $\alpha_{k,l}[0] \sim \mathcal{CN}(0, 1)$ and $\phi_{k,l}[0], \varphi_{k,l}[0] \sim \mathcal{U}(0, 2\pi)$ for all $k = 1, 2, \ldots, K$ and $l = 1, 2, \ldots, L$. The channel $\mathbf{H}_k[0]$ at the first time slot is fully determined by $\boldsymbol{\alpha}_k[0] = [\alpha_{k,1}[0], \ldots, \alpha_{k,L}[0]]^T$, $\boldsymbol{\varphi}_k[0]$ and $\boldsymbol{\phi}_k[0]$. Under the temporally correlated channel model, the channel matrix $\mathbf{H}_k[t+1]$ is generated from the current one, $\mathbf{H}_k[t]$, by updating its parameters as
$$\boldsymbol{\alpha}_k[t+1] = \rho\, \boldsymbol{\alpha}_k[t] + \sqrt{1-\rho^2}\, \Delta\boldsymbol{\alpha}_k[t], \quad \boldsymbol{\varphi}_k[t+1] = \boldsymbol{\varphi}_k[t] + \Delta\boldsymbol{\varphi}_k[t], \quad \boldsymbol{\phi}_k[t+1] = \boldsymbol{\phi}_k[t] + \Delta\boldsymbol{\phi}_k[t], \tag{10}$$
where $\Delta\boldsymbol{\alpha}_k[t] \in \mathbb{C}^{L}$ is the path gain variation vector, whose entries are zero-mean, unit-variance complex Gaussian random variables. $\rho = J_0(2\pi f_D T)$ is the time correlation coefficient of Jakes' model [15], where $J_0(\cdot)$ is the zeroth-order Bessel function of the first kind. $T$ and $f_D$ denote the time slot duration and the maximum Doppler frequency, respectively. In addition, $\Delta\boldsymbol{\varphi}_k[t], \Delta\boldsymbol{\phi}_k[t] \sim \mathcal{N}(\mathbf{0}, \sigma_u^2 \mathbf{I}_L)$ are the variations of the AoD and the AoA, respectively [17].
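The temporal update of the channel parameters can be sketched in Python. This is an illustration under stated assumptions, not the simulator used for the results. It assumes the common first-order Gauss-Markov update $\alpha[t+1] = \rho\,\alpha[t] + \sqrt{1-\rho^2}\,\Delta\alpha[t]$ for the path gains, a Gaussian random walk with standard deviation $\sigma_u$ for the angles, and $\rho = J_0(2\pi f_D T)$ with $f_D = v f_c / c$. The function names, the 28 GHz carrier and $\sigma_u = 0.01$ are hypothetical choices; $J_0$ is evaluated with its power series so that only the standard library is needed.

```python
import math
import random

def bessel_j0(x, terms=40):
    """Zeroth-order Bessel function of the first kind via its power series."""
    total, term = 0.0, 1.0
    for m in range(terms):
        if m > 0:
            # term_m = (-1)^m (x/2)^(2m) / (m!)^2, built from term_{m-1}
            term *= -((x / 2) ** 2) / (m * m)
        total += term
    return total

def correlation_coefficient(speed_mps, carrier_hz, slot_s):
    """rho = J0(2*pi*f_D*T), with maximum Doppler frequency f_D = v * f_c / c."""
    doppler = speed_mps * carrier_hz / 299_792_458.0
    return bessel_j0(2 * math.pi * doppler * slot_s)

def next_channel_params(alphas, aods, aoas, rho, sigma_u, rng):
    """One time step: Gauss-Markov path gains, random-walk angles wrapped to [0, 2*pi)."""
    scale = math.sqrt(max(0.0, 1.0 - rho * rho))
    new_alphas = []
    for a in alphas:
        # Unit-variance circularly symmetric complex Gaussian innovation.
        innovation = complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
        new_alphas.append(rho * a + scale * innovation)
    new_aods = [(phi + rng.gauss(0, sigma_u)) % (2 * math.pi) for phi in aods]
    new_aoas = [(phi + rng.gauss(0, sigma_u)) % (2 * math.pi) for phi in aoas]
    return new_alphas, new_aods, new_aoas

# Usage: walking speed 3 km/h, assumed 28 GHz carrier, T = 1 us, L = 3 paths.
rng = random.Random(0)
rho = correlation_coefficient(3 / 3.6, 28e9, 1e-6)
L = 3
alphas = [complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5))) for _ in range(L)]
aods = [rng.uniform(0, 2 * math.pi) for _ in range(L)]
aoas = [rng.uniform(0, 2 * math.pi) for _ in range(L)]
alphas, aods, aoas = next_channel_params(alphas, aods, aoas, rho, sigma_u=0.01, rng=rng)
```

With a 1 µs slot and walking speed, $\rho$ is extremely close to 1. The path gains therefore change very little from slot to slot, which is what makes slowly varying parameters such as the AoD attractive as feedback.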
To implement NOMA, the BS must be able to arrange UEs into clusters and allocate resources (power and bandwidth) to them. It also calculates the precoding vectors from the channel information. The BS therefore requires each UE to feed back information about its channel. Each time slot consists of two phases: pilot transmission and data transmission. In the first phase, the BS transmits pilot signals to all UEs using conventional OMA, as described in the previous section. Each UE independently estimates its channel from the pilot signals and feeds CSI back to the BS. However, channel estimation and feedback become more computationally complex as the numbers of transmit and receive antennas grow. In addition, channels that fluctuate rapidly over time make full channel feedback even more costly in terms of resources. Under the geometric mmWave channel model, the channel matrix is a function of the complex gains $\alpha_{k,l}$, the AoDs $\varphi_{k,l}$ and the AoAs $\phi_{k,l}$. These parameters are much more compact than the full matrix and vary slowly. In this paper, we assume that the channel matrix $\mathbf{H}_k$ is perfectly estimated at the $k$-th UE. The receiver selects the RF combiner that maximizes the channel gain $\|\mathbf{w}_k^H \mathbf{H}_k\|$. The estimated AoD $\hat{\varphi}_{k,l}$ is then selected from a predefined $Q$-bit codebook shared by the BS and the UE, $\hat{\varphi}_{k,l} \in \{0, 2\pi/2^Q, \ldots, 2\pi(2^Q-1)/2^Q\}$. The UE feeds the index of the quantized AoD back to the BS over an error-free but rate-limited feedback channel. The BS uses it to arrange the UEs into clusters and allocate transmission resources.
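The $Q$-bit AoD codebook can be illustrated with a short Python sketch. It maps an AoD to the index of the nearest codeword, which is the index the UE feeds back, and maps an index back to the angle the BS uses. The nearest-codeword rule and the function names are assumptions made for illustration.

```python
import math

TWO_PI = 2 * math.pi

def quantize_aod(aod, q_bits):
    """Index of the nearest codeword in {0, 2*pi/2^Q, ..., 2*pi*(2^Q - 1)/2^Q}."""
    levels = 2 ** q_bits
    step = TWO_PI / levels
    # Wrap the input into [0, 2*pi); the final modulo maps angles just
    # below 2*pi back to codeword 0, since the codebook is circular.
    return round((aod % TWO_PI) / step) % levels

def dequantize_aod(index, q_bits):
    """Codeword (angle in radians) the BS associates with a fed-back index."""
    return TWO_PI * index / 2 ** q_bits

def angular_error(a, b):
    """Smallest absolute difference between two angles."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

# Usage: with Q = 3 bits, the worst-case quantization error is half a step (pi/8).
q = 3
idx = quantize_aod(1.0, q)
print(idx, dequantize_aod(idx, q))
```

Each extra codebook bit halves the worst-case angular error, while the feedback cost grows by one bit per UE.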

III. USER CLUSTERING IN DOWNLINK MIMO-NOMA
For the user clustering and scheduling problem in a multi-user system, exhaustive search is generally considered the optimal solution. It examines every possible combination of users. In the MIMO-NOMA mmWave system, the number of clusters equals the number of RF chains at the BS. The number of UEs selected to form the NOMA clusters, and the number of candidate UEs in the cell, are larger still. The computational complexity of exhaustive search therefore becomes extremely high and unaffordable for practical systems.
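To make this growth concrete, the Python sketch below counts the candidate clusterings in a simplified case: all $K$ UEs ($K$ even) are split into $K/2$ unordered two-UE clusters. The count is $K!/(2^{K/2}(K/2)!)$. The example is illustrative only. It ignores which UEs are scheduled, which UE heads each cluster, and how clusters are mapped to beams, so the real search space of the MIMO-NOMA problem differs.

```python
from math import factorial

def num_two_ue_clusterings(k):
    """Number of ways to split k UEs into k/2 unordered clusters of two UEs."""
    if k % 2 != 0:
        raise ValueError("k must be even")
    half = k // 2
    # k! orderings, divided by 2 per pair (order inside a pair) and (k/2)! (pair order).
    return factorial(k) // (2 ** half * factorial(half))

for k in (4, 16, 32, 64):
    print(k, num_two_ue_clusterings(k))
```

Even in this reduced form, the count passes two million at 16 UEs and $10^{17}$ at 32 UEs. A heuristic that uses compact features such as the AoD is therefore needed.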

A. CRITERIA
With the objective of maximizing the sum throughput in a cell, user clustering in a downlink MIMO-NOMA system must consider the following key factors. Equations (2) and (5) show the advantage of NOMA over conventional OMA: SIC at the stronger UEs can completely eliminate the interference caused by the signals of the weaker UEs in the same cluster. However, SIC requires the channel gain of the cluster head to be sufficiently distinct from those of the remaining UEs. The first criterion for the user clustering algorithm is therefore that cluster heads must have high channel gains. The second point is to reduce interference among clusters. According to equation (3), the inter-cluster interference $|\mathbf{w}_{n,m}^H \mathbf{H}_{n,m} \mathbf{f}_j|^2$ grows with the correlation between the channels of different clusters. To reduce this interference, the channels $\mathbf{H}_{n,m}$ of the cluster-head UEs must be as distinct as possible. Under the geometric channel model in (8), the correlation between any two UEs is determined by the array response vectors corresponding to their AoDs and AoAs. The second criterion is therefore to select UEs with different AoDs as cluster heads. For the remaining UEs to benefit from the precoder designed for their cluster, all UEs in a cluster should have similar AoDs that differ from those of UEs in other clusters. In this work, we use the AoD and the channel gain as the criteria for forming downlink NOMA clusters.
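The second criterion can be illustrated numerically. The Python sketch below assumes half-wavelength element spacing ($d = \lambda/2$) and uses hypothetical function names. It builds the ULA array response vector defined in Section II-C and computes the correlation $|\mathbf{a}^H(\theta_1)\mathbf{a}(\theta_2)|$ between two AoDs. The correlation is 1 for identical angles and falls toward 0 as the angles separate, which is why cluster heads with well-separated AoDs cause little inter-cluster interference.

```python
import cmath
import math

def ula_response(theta, n_antennas, d_over_lambda=0.5):
    """Unit-norm array response vector of an N-element ULA at angle theta (radians)."""
    scale = 1 / math.sqrt(n_antennas)
    phase = 2 * math.pi * d_over_lambda * math.sin(theta)
    return [scale * cmath.exp(1j * n * phase) for n in range(n_antennas)]

def array_correlation(theta1, theta2, n_antennas):
    """|a(theta1)^H a(theta2)|: 1 for identical angles, near 0 for well-separated ones."""
    a1 = ula_response(theta1, n_antennas)
    a2 = ula_response(theta2, n_antennas)
    return abs(sum(x.conjugate() * y for x, y in zip(a1, a2)))

# Usage: correlation with a UE at 0.3 rad as the second AoD moves away (16 antennas).
for delta in (0.0, 0.05, 0.2, 0.5):
    print(delta, round(array_correlation(0.3, 0.3 + delta, 16), 3))
```

Larger arrays give narrower beams, so the correlation decays over a smaller angular separation as $N_T$ grows.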

B. USER CLUSTERING ALGORITHMS
This section describes the proposed user clustering algorithms, which are designed for NOMA systems in spatially sparse environments. Consider the downlink scenario described above, in which a BS with $N_{RF}$ RF chains and $N_T$ antennas communicates with $K$ mobile UEs in a single cell. The number of clusters is $N_c = N_{RF}$, and each cluster serves $M$ UEs simultaneously. We first present the NOMA user clustering algorithm that uses the AoD and channel gain information. It assumes that all UEs can estimate the AoD and channel gain and feed them back to the BS over an error-free feedback channel. Algorithm 1 has two stages: selecting the cluster heads and forming the NOMA clusters. The procedure is as follows.
Algorithm 1 AoD Based User Clustering
…
4: Update: $\mathcal{Q} \leftarrow \mathcal{Q} \cup \{\varphi_{k^*,l^*}\}$
5: end for
6: for $j \le M$ do
7: for $i \le N_s$ do
8: $\{k^*, l^*\} = \arg\min_{k \in \mathcal{U},\, l \in [1,L]} \ldots$
…
10: end for
11: end for
12: Include the $k$-th UE into each cluster
The key idea is that the BS exploits the spatially sparse nature of the mmWave environment. UEs that are far from the others in the angle domain are given priority as cluster heads, while UEs that are close together in the angle domain are grouped into the same cluster. The algorithm as described targets clusters of two UEs. To form higher-order NOMA clusters with more than two UEs, steps 7 to 12 are repeated until all UEs have been grouped into clusters. However, higher-order clusters require more transmit power for the weakest UE, and the transmit power is limited. Clusters with more than two UEs may therefore not be feasible in a NOMA system.
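The following Python sketch is a simplified, hypothetical illustration of the two-stage idea behind Algorithm 1, not a reproduction of the listing. Each UE is represented by the AoD and gain of its strongest path. Cluster heads are taken in descending order of channel gain, skipping any UE whose AoD lies within a minimum angular separation `min_sep` (a hypothetical parameter) of an already selected head. In each round, every cluster then takes the unassigned UE whose AoD is closest to that of its head.

```python
import math

TWO_PI = 2 * math.pi

def angular_distance(a, b):
    """Smallest absolute difference between two angles (radians)."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

def aod_user_clustering(aods, gains, n_clusters, ues_per_cluster, min_sep):
    """Greedy AoD-based clustering sketch; returns clusters as lists of UE indices."""
    order = sorted(range(len(aods)), key=lambda k: gains[k], reverse=True)

    # Stage 1: strong UEs with well-separated AoDs become cluster heads.
    heads = []
    for k in order:
        if len(heads) == n_clusters:
            break
        if all(angular_distance(aods[k], aods[h]) >= min_sep for h in heads):
            heads.append(k)
    for k in order:  # fallback if the separation rule yields too few heads
        if len(heads) == n_clusters:
            break
        if k not in heads:
            heads.append(k)

    # Stage 2: each cluster, in turn, takes the unassigned UE closest to its head in AoD.
    clusters = [[h] for h in heads]
    unassigned = [k for k in order if k not in heads]
    for _ in range(ues_per_cluster - 1):
        for cluster in clusters:
            if not unassigned:
                break
            head_aod = aods[cluster[0]]
            best = min(unassigned, key=lambda k: angular_distance(aods[k], head_aod))
            cluster.append(best)
            unassigned.remove(best)
    return clusters

# Usage with three angular groups of two UEs each (hypothetical values).
aods = [0.10, 0.15, 1.60, 1.65, 3.10, 3.20]
gains = [5.0, 1.0, 4.0, 0.5, 3.0, 0.8]
print(aod_user_clustering(aods, gains, n_clusters=3, ues_per_cluster=2, min_sep=0.5))
```

Choosing heads by gain first serves the SIC criterion. The separation test and the nearest-AoD assignment serve the inter-cluster interference criterion.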
Building on Algorithm 1, we develop a more robust user clustering algorithm in which each UE feeds back only one bit per channel parameter to the BS, indicating how that parameter has changed. In Algorithm 2, the BS is assumed to know the channel parameters of the UEs at the first time slot of each period $T_{FB}$. In the subsequent time slots, it tracks the channel parameters using the one-bit feedback from the UEs. In other words, we combine the one-bit feedback with the random perturbation beamforming in [17] to track the time-varying channel parameters and schedule transmissions. In step 7, the sign(·) function, with its output mapped to {0, 1}, gives the direction of the change in each channel parameter: $b^{t}_{\varphi,k,l} = 1$ if $\varphi^{t-1}_{k,l} < \varphi^{t}_{k,l}$, and $b^{t}_{\varphi,k,l} = 0$ otherwise. That is, $b^{t}_{\varphi,k,l} = 1$ if the AoD shifts counterclockwise and $b^{t}_{\varphi,k,l} = 0$ if it shifts clockwise. Similarly, $b^{t}_{\alpha,k,l}$ indicates the change in the magnitude of the complex channel gain.
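The one-bit AoD tracking idea can be sketched in Python. The sketch assumes a simplified BS update: the estimate moves by a fixed step `step` (a hypothetical parameter) in the direction indicated by the bit. The random perturbation of [17] used in Algorithm 2 is not reproduced here. The AoD at $t = 0$ is taken as known exactly, corresponding to the full feedback at the start of each period.

```python
def one_bit_feedback(prev, curr):
    """1 if the parameter increased (AoD moved counterclockwise), else 0."""
    return 1 if curr > prev else 0

def track_with_one_bit(true_aods, step):
    """BS-side AoD estimates when only one bit per slot is fed back after t = 0."""
    estimate = true_aods[0]  # full AoD feedback at the first slot of the period
    estimates = [estimate]
    for prev, curr in zip(true_aods, true_aods[1:]):
        bit = one_bit_feedback(prev, curr)  # computed at the UE
        estimate += step if bit == 1 else -step  # applied at the BS
        estimates.append(estimate)
    return estimates

print(track_with_one_bit([0.50, 0.52, 0.55, 0.53], step=0.02))
```

The step size trades tracking speed against jitter. A step that is too small lags behind fast angle changes, while a step that is too large oscillates around a slowly varying AoD. This is one reason the full feedback is refreshed at the start of each period $T_{FB}$.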
Algorithm 2 Limited Feedback AoD Based User Clustering
1: Initialization: At $t = 0$, the BS uses the full AoD feedback: $\hat{\boldsymbol{\varphi}}^{0}_{k} = \boldsymbol{\varphi}^{0}_{k}$, $\hat{\boldsymbol{\alpha}}^{0}_{k} = \boldsymbol{\alpha}^{0}_{k}$.
2: Iterative tracking:
3: For each UE $k$, select the index of the strongest path: $l^* = \arg\max_{l \in [1,L]} |\hat{\alpha}^{t}_{k,l}|$.
4: The BS uses $\hat{\varphi}^{t}_{k,l^*}$ to select $\mathbf{f}_{RF}$ and calculates $\mathbf{f}_{BB}$ using the MMSE precoding in [18].
5: Pilot signals are sent to all UEs.
6: The UEs detect the pilot signals and estimate the channel parameters $\boldsymbol{\varphi}^{t}_{k}$, $\boldsymbol{\alpha}^{t}_{k}$.
7: The UEs calculate the one-bit feedback from the change in the channel parameters since the previous time slot: $b^{t}_{\varphi,k,l} = \mathrm{sign}(\varphi^{t}_{k,l} - \varphi^{t-1}_{k,l})$, $b^{t}_{\alpha,k,l} = \mathrm{sign}(|\alpha^{t}_{k,l}| - |\alpha^{t-1}_{k,l}|)$.
8: The UEs send the one-bit feedback to the BS.
9: The BS uses the one-bit feedback and applies the perturbation to generate the estimated channel parameters for the next time slot: …
The random factors $\alpha^{t}_{k,l}$ and $\varphi^{t}_{k,l}$ used in the perturbation are predefined by the channel model described in Section II-C.

IV. POWER ALLOCATION
Power allocation also plays an important role in system performance and must be designed carefully in NOMA. Power allocation in a MIMO-NOMA system has two steps: allocating transmit power to the clusters and allocating power to the UEs within each cluster. Inter-cluster power allocation involves several key points. First, the total transmit power of the BS, divided among the $N_s$ transmit beams, must satisfy the power constraint $\sum_{n=1}^{N_s} p_n = N_s$, where $p_n$ is the transmit power of the $n$-th beam. This average total transmit power constraint is met by normalizing the precoders so that $\|\mathbf{F}_{RF}\mathbf{F}_{BB}\|_F^2 = N_s$, as proposed in [17]. Second, because each beam is shared by all UEs of a cluster, the transmit power of a beam is proportional to the number of UEs it serves. If every beam serves the same number of UEs (the same cluster order), each cluster receives equal power, as in [3]. This approach is nearly optimal because the clusters have similar degrees of channel gain distinctness.
In this paper, we assign transmit power to the clusters with a water-pouring-type algorithm. To guarantee fairness among all clusters, the power allocated to a cluster depends on the channel gain of the worst UE in that cluster. Specifically, a cluster's power is inversely proportional to the channel gain of its highest-order UE. The cluster serving the UE with the worst channel thus receives more power to raise its data rate. Unlike capacity-maximizing water-filling, which gives more power to stronger channels, this rule equalizes the product of power and channel gain across clusters. The power allocated to each cluster is obtained by solving
$$\sum_{n=1}^{N_s} p_n = N_s, \qquad p_1 g_1 = p_2 g_2 = \cdots = p_{N_s} g_{N_s}. \tag{11}$$
The first expression is the total transmit power constraint. The second expression is the equal-product rule, in which $g_n$ is the gain of the strongest path of the highest-order UE in the $n$-th cluster.
The intra-cluster power allocation problem is similar to downlink SISO NOMA power allocation, presented in [2]. Within a cluster, the first constraint is the beam transmit power budget, which is fixed in the inter-cluster allocation step described above. The second constraint guarantees the minimum throughput of every UE in the NOMA cluster. The last constraint enforces the minimum difference in signal power among the UEs that SIC requires.
In this work, we adopt the dynamic power allocation scheme proposed in [1], [3], which provides a closed-form solution of (12). Optimal power allocation for the first UE of the n-th MIMO-NOMA cluster is summarized in Table 1.
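The two power allocation steps can be sketched in Python. For the inter-cluster step, eliminating the common product in (11) gives the closed form $p_n = N_s\, g_n^{-1} / \sum_{j} g_j^{-1}$. For the intra-cluster step, the sketch does not reproduce the closed-form scheme of [1], [3] summarized in Table 1. Instead, as a hypothetical illustration, it considers a two-UE cluster, folds inter-cluster interference into the noise power, and gives the weak UE the minimum power that meets a target rate $R_{\min}$. The strong UE, which removes the weak UE's signal by SIC, receives the rest. Solving $p_w g_w / (p_s g_w + \sigma^2) \ge \gamma$ with $\gamma = 2^{R_{\min}} - 1$ and $p_w + p_s = p_n$ gives $p_w = \gamma (p_n g_w + \sigma^2) / ((1+\gamma) g_w)$.

```python
def inter_cluster_powers(gains, n_streams):
    """Solve sum(p) = N_s and p_1*g_1 = ... = p_N*g_N (inverse-gain allocation)."""
    inv = [1.0 / g for g in gains]
    total = sum(inv)
    return [n_streams * x / total for x in inv]

def two_ue_cluster_split(p_cluster, g_weak, noise, r_min):
    """Minimum weak-UE power meeting r_min (bps/Hz); the rest goes to the strong UE.

    Assumes the weak UE treats the strong UE's signal as interference, and that
    inter-cluster interference is folded into `noise`. Returns (p_weak, p_strong),
    or None if the rate target cannot be met within the cluster budget.
    """
    gamma = 2 ** r_min - 1
    p_weak = gamma * (p_cluster * g_weak + noise) / ((1 + gamma) * g_weak)
    if p_weak > p_cluster:
        return None
    return p_weak, p_cluster - p_weak

# Usage with hypothetical values: three clusters, then one two-UE cluster.
powers = inter_cluster_powers([1.0, 2.0, 4.0], n_streams=3)
split = two_ue_cluster_split(p_cluster=1.0, g_weak=1.0, noise=0.1, r_min=1.0)
print(powers, split)
```

In this example the weak UE receives more power than the strong UE (0.55 versus 0.45), which is the power ordering that SIC at the strong UE relies on.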

A. SIMULATION ASSUMPTIONS
In this section, we evaluate the proposed NOMA schemes by simulation and compare them with the conventional NOMA and OMA schemes. The proposed scheme, which uses AoD feedback to form the NOMA clusters and perform beamforming, is called AoD-based NOMA. The conventional NOMA scheme uses the SINR as CSI feedback. According to MIMO theory, the number of data streams the BS can serve simultaneously equals the number of RF chains, $N_s = N_{RF}$ [17], [18], [28]. In practice, the number of users $K$ in a cell far exceeds the number of RF chains, $K \gg N_{RF}$, so the BS cannot serve all users simultaneously. An orthogonal multiple-access scheme is therefore also needed to guarantee user fairness. In this paper, the OMA scheme is TDMA: the higher-gain UEs and the lower-gain UEs form two groups that are served in alternate time slots, and the reported performance is the average over the two groups. We choose TDMA as a competitor because it eliminates inter-beam interference completely [3]. Moreover, TDMA provides the best performance among OMA schemes [17]. We consider a single-cell scenario with the BS at the center of the cell and an inter-site distance of 600 meters (m). The $K$ UEs are randomly distributed around the BS, and the $K/2$ UEs with higher channel gains lie randomly within 300 m of the BS. The UEs move at a walking speed of $v = 3$ km/h. The simulations run over 10,000 time slots of duration $T = 1\,\mu$s, and the reported result is the average throughput of all UEs in the cell.
The main simulation parameters are summarized in Table 2.

B. SIMULATION RESULTS
Fig. 2 shows the sum-rate performance of the proposed NOMA schemes and their competitors as the number of UEs varies. The proposed scheme, which clusters users based on AoD feedback, significantly outperforms the conventional OMA scheme. Its total throughput is 41.4% higher than that of OMA with $K = 256$ and 35.4% higher with $K = 128$. Compared with the conventional NOMA scheme, the proposed scheme also achieves total throughput gains of 17.8% and 15.3% with $K = 32$ and $K = 64$, respectively.
Under limited feedback, the proposed NOMA scheme's advantage over the conventional NOMA and OMA schemes is much larger. With $K = 128$, its total throughput is 196.3% higher than that of the OMA scheme and 60.9% higher than that of the conventional NOMA scheme. Limited feedback makes it difficult for the conventional NOMA scheme both to calculate the precoding matrix and to form the NOMA clusters, so it performs worst among the compared schemes. The proposed scheme's gain comes from mitigating interference in the spatial domain using the AoD feedback. In addition, as the number of UEs increases from 16 to 256, the total throughput rises from 30.85 bps/Hz to 50.69 bps/Hz. The proposed system exploits multi-user diversity better than the conventional NOMA system, because more UEs in the cell make it more likely that suitable NOMA clusters can be formed.
Fig. 3 shows the sum-rate performance of the NOMA and conventional OMA systems as a function of the average signal-to-noise ratio (SNR). The proposed NOMA scheme achieves the highest system sum rate under both full CSI and limited feedback. Under limited feedback, it achieves gains of 55.4% over the OMA scheme and 176.7% over the conventional NOMA scheme.
AoD information is also much more compact than full channel information. Fig. 4 shows the number of feedback bits for the assumed MIMO system with $N_T$ transmit antennas, $N_R$ receive antennas, and $Q$ quantization bits for the phase shifters. With 256 UEs, AoD feedback needs 98.5% fewer bits than conventional CSI feedback. With limited feedback, AoD information reduces the number of feedback bits by 91.67%.
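A rough, hypothetical illustration of why AoD feedback is compact is sketched below in Python. It does not reproduce the accounting behind Fig. 4. It compares the per-slot feedback of all $K$ UEs under three assumed schemes. Full CSI sends every entry of each $N_R \times N_T$ channel matrix with $B$ bits per real component. AoD feedback sends one $Q$-bit codebook index per UE. One-bit tracking sends one bit for the AoD change and one for the gain change per UE. All bit widths and antenna numbers below are assumed values.

```python
def full_csi_bits(k, n_t, n_r, bits_per_real):
    """Every complex entry of each N_R x N_T channel matrix, real and imaginary parts."""
    return k * n_t * n_r * 2 * bits_per_real

def aod_bits(k, q_bits):
    """One Q-bit AoD codebook index per UE (strongest path only)."""
    return k * q_bits

def one_bit_tracking_bits(k):
    """One bit for the AoD change and one for the gain change per UE."""
    return k * 2

k, n_t, n_r, b, q = 256, 64, 4, 8, 6  # hypothetical parameters
full = full_csi_bits(k, n_t, n_r, b)
for name, bits in (("AoD", aod_bits(k, q)), ("one-bit", one_bit_tracking_bits(k))):
    print(f"{name}: {bits} bits, {100 * (1 - bits / full):.2f}% fewer than full CSI")
```

Full-CSI feedback grows with $N_T N_R$, while AoD feedback grows only with the codebook size. The gap therefore widens as the arrays grow.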

VI. CONCLUSION
NOMA is a promising approach to improving the spectral efficiency of mmWave wireless cellular systems. This paper studies a downlink multi-user MIMO mmWave network in which the UEs feed back AoD information to help the BS arrange NOMA transmissions. The BS uses the AoD information to form NOMA clusters and to calculate the beamforming vectors for each cluster. Because mmWave channels scatter sparsely, the AoD information can be used efficiently to mitigate inter-cluster and intra-cluster interference, which improves system performance. We also proposed an AoD-based user clustering algorithm for the NOMA scheme, together with a one-bit feedback variant. Simulation results show that the proposed NOMA scheme outperforms the conventional NOMA and OMA schemes under both full and limited CSI feedback.