ZF-Based Downlink Hybrid Precoding and Combining for Rate Balancing in mmWave Multiuser MIMO Systems

This paper proposes a new design strategy for hybrid precoding and combining in the downlink of millimeter wave (mmWave) multiuser multiple-input multiple-output (MU-MIMO) channels. When channel state information is available at the transmitter, the proposed scheme designs the analog precoder and combiners by iteratively factorizing the matrices for fully digital precoding and combining, respectively, using the alternating optimization technique. Then, the digital precoder and combiners are obtained through block diagonalization of the effective MU-MIMO channels, which are composed of the analog precoder, the MU-MIMO channels, and the analog combiners, in order to eliminate inter-user interference. Moreover, focusing on rate balancing among users, we derive a new power allocation algorithm that exploits a modified gradient descent method. The proposed method iteratively adjusts the power allocated to each user so as to maximize the minimum user rate. Through numerical simulations, we verify the convergence of the proposed design procedure for hybrid precoding and combining. Moreover, it is shown that the proposed method outperforms conventional hybrid precoding and combining methods for rate balancing and achieves a minimum user rate close to the performance upper bound.


I. INTRODUCTION
Millimeter wave (mmWave) communication systems have been widely investigated as a means to accommodate rapidly increasing wireless traffic loads. The use of a high carrier frequency fundamentally enables an increase in channel capacity [1]-[4]; at the same time, however, it creates obstacles to coverage such as severe path loss and rain attenuation [5]-[7]. Since mmWave communication systems can use a large number of antennas and radio frequency (RF) devices by virtue of the reduced wavelength, it is natural to utilize massive multiple-input multiple-output (MIMO) technology to achieve a large beamforming gain that mitigates the increased path loss [8]. Furthermore, massive MIMO provides a spatial multiplexing gain by simultaneously transmitting data streams using multiple beams.
In a mmWave MIMO system, the beamforming and spatial multiplexing gains are achieved by fully digital precoding, fully analog precoding, or hybrid precoding. Fully digital precoding can attain the theoretical channel capacity by arbitrarily adjusting the precoding coefficients; however, the implementation cost for large-scale MIMO transceivers is excessive because the number of RF chains is equal to the number of active transmit antennas [6], [7]. In contrast, fully analog beamforming enables low-complexity implementation with a minimum number of RF chains, yet reduces the beamforming gain due to the limited resolution of phase shifters [9]-[11]. As a compromise between the analog and digital precoding methods, the hybrid precoding technique has been studied extensively in the literature [12]-[24]. The hybrid beamforming scheme connects a small number of digital data streams to a large number of RF antennas through a two-stage architecture composed of digital processing and analog beamforming, providing a good tradeoff between performance and complexity. For example, compressive sensing theory was used to design sparse hybrid precoding and combining matrices in [12], and the theoretical performance of hybrid precoding was analyzed in [13] considering the relationship between the number of RF chains and the number of data streams. Moreover, the hybrid precoder and/or combiner were designed by matrix factorization techniques based on the alternating minimization algorithm [14]-[17], and practical implementation issues were considered, such as the limited resolution of phase shifters in analog beamforming [18]-[21] and the limited feedback in closed-loop hybrid precoding [22]-[24].
When fully digital precoding is used in the downlink of multiuser MIMO (MU-MIMO) systems, the precoding matrix can be designed based on two criteria: one is to maximize the sum rate for total throughput optimization [25]-[29], and the other is to maximize the minimum user rate for fairness [30], [31]. As an extension of point-to-point mmWave communication systems, hybrid precoding and combining architectures are also considered in the downlink of mmWave MU-MIMO systems [24], [32]-[37]. Several hybrid precoding methods have been developed for MU-MIMO systems using fully analog combining [32], employing codebook-based hybrid precoding and fully analog combining [24], and utilizing phase-shifting analog precoding in combination with digital precoding [33]-[37]. When the analog precoder is implemented by phase shifters, the digital precoder can be designed by the zero-forcing (ZF) [33], [34], regularized channel diagonalization [35], and minimum mean square error (MMSE) techniques [36], [37]. This paper proposes a new hybrid precoding method for the downlink of mmWave MU-MIMO systems based on block diagonalization (BD) and power allocation, when the analog precoder and combiners are implemented by phase shifters. Unlike point-to-point communications, the simultaneous transmission of multiple data streams causes inter-stream and/or inter-user interference in a MU-MIMO channel. When the channel state information (CSI) of all users is known to the transmitter, the inter-stream and inter-user interference can be completely removed by the BD technique, which is a type of ZF method, and the MU-MIMO channel can be separated into multiple independent channels. Because the number of RF chains is limited in mmWave MIMO communications, the analog precoder is designed prior to the BD processing.
To this end, we first compute the fully digital precoder and combiners by applying the BD technique to the MU-MIMO channels, and then obtain the proposed analog precoder through matrix factorization of the fully digital precoder. The digital precoder is designed through BD of the effective channels, which comprise the analog processing and the physical channels. In order to guarantee fairness among users, the proposed method allocates power to the data streams so as to maximize the user-wise achievable rate. The main contributions of this paper are summarized as follows.
• Considering the rate balancing among users, we formulate an optimization problem to design the hybrid precoder and combiners for a mmWave MU-MIMO system. To solve the optimization problem, we first propose an iterative algorithm that decomposes the fully digital precoder and combiners into the corresponding hybrid precoders and combiners, respectively. The proposed method is applicable to the design of the analog precoder on the transmitter side and the analog combiners on the user side. In the proposed algorithm, the analog precoder is obtained by iteratively updating the factorized matrices in the direction that minimizes the Frobenius norm of the factorization error matrix, and the analog combiners are computed in a similar manner.
• The effective MU-MIMO channels are defined by applying the designed analog precoder and combiners to the original channels. Then, the effective channels are converted to independent interference-free channels via the BD technique. A new power allocation problem is formulated under the total power constraint for rate balancing, and it is shown that the objective is quasiconcave. An iterative power allocation algorithm is proposed for maximizing the minimum user rate through a modification of the gradient descent method. The proposed power allocation scheme ensures that all users have the same achievable rate irrespective of channel conditions.
• Through computer simulations, we evaluate the performance of the proposed method in terms of the minimum user rate, and show that the proposed design method is beneficial compared to conventional hybrid precoding and combining techniques. Moreover, the effect of imperfect CSI is presented through numerical simulations.
The remainder of this paper is organized as follows. Section II introduces the downlink of a mmWave MU-MIMO system with hybrid precoding and combining, and briefly explains the conventional BD technique. Section III describes the proposed ZF-based design method of hybrid precoder and combiners as well as the proposed power allocation algorithm for rate balancing. Simulation results are provided in Section IV and conclusions are presented in Section V.
Notations: Superscripts $T$, $H$, $*$, and $-1$ denote transposition, Hermitian transposition, complex conjugate, and inversion, respectively, for any scalar, vector, or matrix. $|x|$ means the absolute value of $x$; the notations $|X|$, $\|X\|$, and $\|X\|_F$ denote the determinant, $\ell_2$-norm, and Frobenius norm of matrix $X$, respectively; $I_m$ represents an $m \times m$ identity matrix; $0_{m \times n}$ and $1_{m \times n}$ denote the $m \times n$ zero matrix and all-ones matrix, respectively; $\mathrm{tr}(A)$ is the trace of matrix $A$; $(A)_{m,n}$ means the $(m,n)$th entry of $A$; $\mathrm{diag}(x)$ returns a diagonal matrix whose main diagonal elements are equal to $x$; $\mathrm{blkdiag}(\{A_m\}_{m=1}^{M})$ denotes a block-diagonal matrix composed of $A_1, \cdots, A_M$; $A \bullet B$ represents the Hadamard product of matrices $A$ and $B$; and $x \sim \mathcal{CN}(0, \sigma^2)$ means that a complex random variable $x$ conforms to a complex normal distribution with zero mean and variance $\sigma^2$. $E[x]$ stands for the expectation of random variable $x$.

II. SYSTEM MODEL AND PREVIOUS WORK
This section introduces the system model for the downlink of a mmWave MU-MIMO system using hybrid precoding and combining, and briefly explains the previous work related to the BD approach for ZF-based transmission. For notational convenience, we assume that the users receive the same number of data streams using the same number of antennas, i.e. each user receives L data streams using N antenna elements. Notice that the proposed method derived under this assumption can be applied to a general case with different numbers of antennas at users, if the number of transmit antennas is equal to or greater than the number of total data streams.

A. SYSTEM MODEL FOR HYBRID PRECODING AND COMBINING
The modulated symbol vector $s \in \mathbb{C}^{KL \times 1}$ satisfying $E[ss^H] = I_{KL}$ is transmitted to $K$ users via analog/digital hybrid precoding. The transmitted signal is expressed as
$$x = F_A F_D s, \qquad (1)$$
where $F_D \in \mathbb{C}^{M_{RF} \times KL}$ is the baseband digital precoding matrix for adjusting the magnitudes and phases, and $F_A \in \mathbb{C}^{M \times M_{RF}}$ is the RF analog precoding matrix implemented by phase shifters, so that its entries have unit magnitude. Denoting the total transmit power by $P_t$, it holds that $\|F_A F_D\|_F^2 \le P_t$. Let us denote the channel between the transmitter and user $k$ as $H_k \in \mathbb{C}^{N \times M}$, i.e. a MIMO flat fading channel. Suppose that the CSIs for all users $\{H_k; 1 \le k \le K\}$ are known to the transmitter. User $k$ conducts analog/digital hybrid combining to detect its modulated symbol vector. At user $k$, the received signal after hybrid combining is denoted as
$$\hat{s}_k = W_{D,k}^H W_{A,k}^H (H_k F_A F_D s + n_k), \qquad (2)$$
where $W_{A,k} \in \mathbb{C}^{N \times N_{RF}}$ is the RF analog combining matrix implemented by phase shifters with unit-magnitude entries, $W_{D,k} \in \mathbb{C}^{N_{RF} \times L}$ is the baseband digital combining matrix for adjusting magnitudes and phases, and $n_k \in \mathbb{C}^{N \times 1}$ is the noise vector whose elements are independent and identically distributed (i.i.d.) Gaussian random variables with zero mean and variance $\sigma_k^2$, i.e. $n_k \sim \mathcal{CN}(0, \sigma_k^2 I_N)$. Using the channels $\{H_k\}$, we design $F_A$, $F_D$, $\{W_{A,k}\}$, and $\{W_{D,k}\}$ for hybrid precoding and combining.
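As a minimal illustration of this signal model, the following Python (NumPy) sketch forms the transmitted signal as the product of the analog precoder, the digital precoder, and the symbol vector, and then applies the analog and digital combiners of each user to the noisy channel output. The function names (random_phase_matrix, scale_to_power, hybrid_downlink) and the use of random phase-only analog matrices are illustrative assumptions and are not part of the proposed design.

```python
import numpy as np

def random_phase_matrix(rows, cols, rng):
    """Phase-shifter matrix: every entry has unit magnitude (hypothetical helper)."""
    return np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, (rows, cols)))

def scale_to_power(F_A, F_D, P_t):
    """Rescale F_D so that ||F_A F_D||_F^2 equals the power budget P_t."""
    return F_D * np.sqrt(P_t) / np.linalg.norm(F_A @ F_D, "fro")

def hybrid_downlink(H_list, F_A, F_D, W_A_list, W_D_list, s, noise_var, rng):
    """Return the transmitted signal x and the combined outputs of all users.

    Assumed model: x = F_A F_D s and, for user k,
    s_hat_k = W_D,k^H W_A,k^H (H_k x + n_k) with n_k ~ CN(0, noise_var * I).
    """
    x = F_A @ F_D @ s
    outputs = []
    for H, W_A, W_D in zip(H_list, W_A_list, W_D_list):
        n_rx = H.shape[0]
        noise = np.sqrt(noise_var / 2.0) * (
            rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx)
        )
        outputs.append(W_D.conj().T @ W_A.conj().T @ (H @ x + noise))
    return x, outputs
```

With K users and L streams per user, s has length KL and each returned output has length L; the unit-magnitude property of the analog matrices and the power constraint can be checked directly on the returned arrays.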

B. BLOCK DIAGONALIZATION FOR MU-MIMO SYSTEM
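When fully digital precoding and combining are used in a MU-MIMO system, the BD technique in [25], which is based on generalized channel inversion, can be applied. Specifically, when the number of transmit antennas is equal to or greater than the total number of data streams, i.e. $M \ge KL$, the BD-based precoding completely removes the inter-user and inter-stream interference. When linear precoding is used at the transmitter, the received signal at user $k$ is expressed as
$$y_k = H_k F_{FD} s + n_k, \qquad (3)$$
where $F_{FD} \in \mathbb{C}^{M \times KL}$ is the fully digital precoding matrix, and $y_k \in \mathbb{C}^{N \times 1}$ is the received signal vector at user $k$. Note that $M_{RF} = M$, $F_D = F_{FD}$, and $F_A = I_M$, because the fully digital precoding method is used. By separating the desired signal from the inter-user interference, we can rewrite (3) as
$$y_k = H_k F_k s_k + H_k \tilde{F}_k \tilde{s}_k + n_k, \qquad (4)$$
where $F_k \in \mathbb{C}^{M \times L}$ is the precoder for user $k$, i.e. $F_{FD} = [F_1 \; F_2 \; \cdots \; F_K]$, and $\tilde{F}_k$ and $\tilde{s}_k$ are the precoding matrix and transmit symbol vector corresponding to all users except user $k$, defined as
$$\tilde{F}_k = [F_1 \; \cdots \; F_{k-1} \; F_{k+1} \; \cdots \; F_K], \qquad (5)$$
$$\tilde{s}_k = [s_1^T \; \cdots \; s_{k-1}^T \; s_{k+1}^T \; \cdots \; s_K^T]^T. \qquad (6)$$
When fully digital combining is used, $W_{A,k} = I_N$ for all $k$ and the first-stage digital combiner for user $k$, $W_k \in \mathbb{C}^{N \times L}$, is composed of the left singular vectors corresponding to the $L$ largest singular values of $H_k$ obtained through singular value decomposition (SVD). After the first-stage digital combining, we have
$$W_k^H y_k = W_k^H H_k F_k s_k + W_k^H H_k \tilde{F}_k \tilde{s}_k + W_k^H n_k. \qquad (7)$$
The purpose of BD is to remove the interference term in (7) while maximizing the user-wise achievable rate. For notational convenience, we define the interference channel as
$$\tilde{H}_k = [(W_1^H H_1)^T \; \cdots \; (W_{k-1}^H H_{k-1})^T \; (W_{k+1}^H H_{k+1})^T \; \cdots \; (W_K^H H_K)^T]^T. \qquad (8)$$
Suppose that $\tilde{L}_k$ is the rank of $\tilde{H}_k$, i.e. $\tilde{L}_k = \mathrm{rank}(\tilde{H}_k)$. Then, we perform the SVD of $\tilde{H}_k$ to remove the inter-user interference as follows:
$$\tilde{H}_k = \tilde{U}_k \tilde{\Sigma}_k [\tilde{V}_k^{(1)} \; \tilde{V}_k^{(0)}]^H, \qquad (9)$$
where $\tilde{V}_k^{(1)}$ and $\tilde{V}_k^{(0)}$ are the right singular vectors corresponding to nonzero and zero singular values, respectively. Suppose that $L_k = \mathrm{rank}(W_k^H H_k \tilde{V}_k^{(0)})$. The SVD of the effective channel of user $k$ is then given by
$$W_k^H H_k \tilde{V}_k^{(0)} = U_k [\Sigma_k \; 0] [V_k^{(1)} \; V_k^{(0)}]^H, \qquad (10)$$
where $\Sigma_k$ is the diagonal matrix of singular values, and $V_k^{(1)}$ and $V_k^{(0)}$ are the right singular vectors corresponding to nonzero and zero singular values, respectively.
Using the right singular vectors $\tilde{V}_k^{(0)}$ and $V_k^{(1)}$ in (9) and (10), we design the ZF-based precoder as follows:
$$F_{FD} = [\tilde{V}_1^{(0)} V_1^{(1)} \; \cdots \; \tilde{V}_K^{(0)} V_K^{(1)}] P^{1/2}, \qquad (11)$$
where $P \in \mathbb{R}^{KL \times KL}$ is a diagonal matrix with nonnegative elements representing the power allocation to individual data streams. From the total transmit power constraint, it holds that $\mathrm{tr}(P) \le P_t$. Moreover, $U_k$ in (10) is the second-stage digital combiner for the effective channel $W_k^H H_k \tilde{V}_k^{(0)}$, and the fully digital combiner for user $k$ is denoted as
$$W_{FD,k} = W_k U_k. \qquad (12)$$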
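The two-stage BD procedure described above can be sketched in Python (NumPy) as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function name block_diagonalize is hypothetical, equal (unit) power is placed on every stream, at least two users are assumed, and the null space of each interference channel is assumed to have dimension at least L.

```python
import numpy as np

def block_diagonalize(H_list, L):
    """Two-stage BD sketch. Returns (F, W_list, sigmas) where F is M x KL with
    unit-norm columns, W_list[k] is the N x L combiner of user k, and
    sigmas[k] holds the L singular values of user k's interference-free channel."""
    K = len(H_list)
    # First-stage combiners: L dominant left singular vectors of each H_k.
    W1 = [np.linalg.svd(H)[0][:, :L] for H in H_list]
    H_eff = [W.conj().T @ H for W, H in zip(W1, H_list)]  # each L x M
    F_blocks, W_list, sigmas = [], [], []
    for k in range(K):
        # Interference channel: stacked effective channels of all other users.
        H_int = np.vstack([H_eff[j] for j in range(K) if j != k])
        _, s, Vh = np.linalg.svd(H_int)  # full SVD, Vh is M x M
        rank = int(np.sum(s > 1e-10 * s[0]))
        V0 = Vh[rank:].conj().T  # orthonormal basis of the null space
        # Second-stage SVD of the projected channel of user k.
        U, sig, Vh2 = np.linalg.svd(H_eff[k] @ V0)
        V1 = Vh2[:L].conj().T
        F_blocks.append(V0 @ V1)      # precoder block of user k
        W_list.append(W1[k] @ U)      # overall combiner of user k
        sigmas.append(sig[:L])
    return np.hstack(F_blocks), W_list, sigmas
```

For any pair of distinct users j and k, the product of user j's combiner (Hermitian transposed), user j's channel, and user k's precoder block is numerically zero, while for j = k it reduces to the diagonal matrix of singular values.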

III. PROPOSED HYBRID PRECODING AND COMBINING METHOD
[Figure: Overall procedure of the proposed design method for hybrid precoding and combining.]
In the following subsections, we propose a design procedure for the analog/digital precoders and combiners for hybrid processing in the downlink MU-MIMO system, and then derive a new power allocation method for rate balancing among users.

A. DESIGN OF HYBRID PRECODER AND COMBINERS
As the first step, the fully digital precoder and combiners are evaluated using (7)-(12). Here, the power allocation is not considered, i.e. $P = \frac{P_t}{KL} I_{KL}$, because the fully digital precoder $F_{FD}$ in (11) is utilized only for the design of the analog precoder $F_A$. In the proposed design procedure, the analog precoder $F_A$ and analog combiners $\{W_{A,k}\}$ are determined by factorizing the fully digital precoder $F_{FD}$ and combiners $\{W_{FD,k}\}$, respectively, in the MMSE sense through the alternating optimization (AO) algorithm in [14]. The MMSE matrix factorization problem is formulated as
$$(P1): \quad \min_{F_A, F_D} \|F_{FD} - F_A F_D\|_F^2 \qquad (13a)$$
$$\text{s.t.} \quad |(F_A)_{m,n}| = 1, \; \forall m, n, \qquad (13b)$$
$$\|F_A F_D\|_F^2 = P_t, \qquad (13c)$$
where (13b) denotes that the analog precoder is implemented by phase shifters and (13c) represents the transmit power constraint. In (13c), we use the equality constraint, because the downlink achievable rate increases monotonically with the transmit power. When $F_A$ is fixed, the digital precoder minimizing the cost function (13a) is given by
$$F_D = c F_A^{\dagger} F_{FD}, \qquad (14)$$
where $A^{\dagger}$ means the pseudo-inverse of $A$ and $c$ is a scaling factor to meet the transmit power constraint (13c). On the other hand, when $F_D$ is fixed, an iterative algorithm is derived to find the analog precoder minimizing the cost function (13a), which exploits the conjugate gradient method combined with the Riemannian gradient for the unit modulus solution. Denoting $J(X(i)) = \|F_{FD} - X(i) F_D\|_F^2$, the Euclidean gradient of $J(\cdot)$ is given by
$$\nabla J(X(i)) = -2 (F_{FD} - X(i) F_D) F_D^H, \qquad (15)$$
where $X(i) \in \mathbb{C}^{M \times M_{RF}}$ is the analog precoder at the $i$th iteration.
To find the solution conforming to the unit modulus constraint, the Riemannian gradient is computed through orthogonal projection of $\nabla J(X)$ onto the tangent space of the complex circle manifold:
$$\mathrm{Proj}(\nabla J(X(i))) = \nabla J(X(i)) - \mathrm{Re}\{\nabla J(X(i)) \bullet X^*(i)\} \bullet X(i). \qquad (16)$$
Suppose that $D(i) \in \mathbb{C}^{M \times M_{RF}}$ is the conjugate direction at the $i$th iteration. From the Riemannian gradient in (16), the conjugate direction is updated as
$$D(i+1) = -\mathrm{Proj}(\nabla J(X(i))) + \beta(i) \, \mathrm{Proj}(D(i)), \qquad (17)$$
where $\mathrm{Proj}(D(i))$ is obtained by replacing $\nabla J(X(i))$ with $D(i)$ in (16) and $\beta(i)$ is a step-size parameter. Finally, the analog precoder is updated as
$$(X(i+1))_{m,n} = \frac{(X(i) + \alpha(i) D(i+1))_{m,n}}{|(X(i) + \alpha(i) D(i+1))_{m,n}|}, \qquad (18)$$
where $\alpha(i)$ is a step-size parameter, $1 \le m \le M$, and $1 \le n \le M_{RF}$. By repeating (15)-(18) until $X(i)$ converges, we have the analog precoder minimizing $J(\cdot)$:
$$F_A = X(\mathrm{end}), \qquad (19)$$
where $X(\mathrm{end})$ is the finally updated matrix satisfying the convergence criterion. Similarly, the analog combiners are designed by solving the MMSE matrix factorization problem for each user as follows:
$$(P2): \quad \min_{W_{A,k}, W_{D,k}} \|W_{FD,k} - W_{A,k} W_{D,k}\|_F^2 \quad \text{s.t.} \quad |(W_{A,k})_{n,m}| = 1, \; \forall n, m. \qquad (20)$$
The problem (P2) is the same as (P1) except for the transmit power constraint; thus the analog combiner $W_{A,k}$ is determined by performing the conjugate gradient method in a similar manner to (15)-(18). In this case, we set $c = 1$ in (14) because the scaling for the transmit power constraint is not required.
Now, we design the digital precoder and combiners. By utilizing the analog precoder and combiners designed from (P1) and (P2), the effective channel for user $k$, $G_k \in \mathbb{C}^{N_{RF} \times M_{RF}}$, is expressed as
$$G_k = W_{A,k}^H H_k F_A. \qquad (21)$$
By replacing $\{H_k\}$ with $\{G_k\}$, we carry out the BD procedure for MU-MIMO downlink channels in (8)-(12). Then, from (11), the digital precoder prior to power allocation, $F_{D,0} \in \mathbb{C}^{M_{RF} \times KL}$, is obtained as in (22), and the digital combiner for user $k$, $W_{D,k} \in \mathbb{C}^{N_{RF} \times L}$, is determined from (12) as in (23). Overall, the proposed algorithm for designing $F_A$, $F_{D,0}$, $\{W_{A,k}\}$, and $\{W_{D,k}\}$ is summarized as Algorithm 1.

Algorithm 1 (excerpt)
    $j = j + 1$.
14. Compute the digital combiner: $W_{D,k}(j) = W_{A,k}^{\dagger}(j) W_{FD,k}$.
15. Update the analog combiner until convergence utilizing (15)-(18), by replacing $F_{FD}$ and $F_D$ with $W_{FD,k}$ and $W_{D,k}(j)$, respectively.
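The alternating factorization of a fully digital matrix into a unit-modulus analog factor and a digital factor can be sketched in Python (NumPy) as below. For brevity, this sketch swaps the conjugate gradient update for plain Riemannian gradient descent with a backtracking step size, which is a simplification and not the method used in this paper; the function names (cost, update_analog, factorize) and the iteration counts are likewise assumptions.

```python
import numpy as np

def cost(F_fd, X, F_d):
    """Squared Frobenius norm of the factorization error."""
    return np.linalg.norm(F_fd - X @ F_d, "fro") ** 2

def update_analog(F_fd, X, F_d, n_steps=20):
    """Riemannian gradient descent on the unit-modulus (complex circle) manifold."""
    for _ in range(n_steps):
        G = -2.0 * (F_fd - X @ F_d) @ F_d.conj().T   # Euclidean gradient
        R = G - np.real(G * X.conj()) * X            # projection onto tangent space
        J0, step, improved = cost(F_fd, X, F_d), 1.0, False
        while step > 1e-10:                          # backtracking line search
            cand = X - step * R
            cand = cand / np.abs(cand)               # retraction to unit modulus
            if cost(F_fd, cand, F_d) < J0:
                X, improved = cand, True
                break
            step *= 0.5
        if not improved:
            break
    return X

def factorize(F_fd, n_rf, P_t, n_outer=30, seed=0):
    """Alternate between the least-squares digital factor and the analog update."""
    rng = np.random.default_rng(seed)
    X = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, (F_fd.shape[0], n_rf)))
    for _ in range(n_outer):
        F_d = np.linalg.pinv(X) @ F_fd               # digital factor for fixed X
        X = update_analog(F_fd, X, F_d)              # analog factor for fixed F_d
    F_d = np.linalg.pinv(X) @ F_fd
    F_d = F_d * np.sqrt(P_t) / np.linalg.norm(X @ F_d, "fro")  # power scaling
    return X, F_d
```

Because the digital step is a least-squares solution and the analog step only accepts candidates that lower the cost, the factorization error does not increase from one iteration to the next. For the combiner design, the same routine can be used without the final power scaling.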

B. OPTIMIZATION OF POWER ALLOCATION
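From (11) and (22), the digital precoder is given by
$$F_D = F_{D,0} P^{1/2}. \qquad (24)$$
In the following, $\Sigma_k$ is obtained from the BD of $\{G_k\}$ shown in (10), $P_k \in \mathbb{R}^{L \times L}$ is a submatrix of $P$ satisfying $P = \mathrm{blkdiag}(\{P_k\}_{k=1}^{K})$, and $W_{H,k} = W_{A,k} W_{D,k}$. Since $\Sigma_k$ and $P_k$ are diagonal matrices, the nominal signal-to-noise ratio (SNR) of the $l$th data stream at user $k$, denoted by $\gamma_{k,l}$, is determined by the $l$th diagonal element $\rho_{k,l}$ of $\Sigma_k$ and the noise variance $\sigma_k^2$, where $\Sigma_k = \mathrm{diag}(\rho_{k,1}, \rho_{k,2}, \cdots, \rho_{k,L})$. Suppose that $p_{k,l}$ is the power allocated to the $l$th data stream of user $k$, i.e. $P_k = \mathrm{diag}(p_{k,1}, p_{k,2}, \cdots, p_{k,L})$. Then, the achievable rate for user $k$ is expressed as
$$f_k(p_k) = \max_{p_{k,1}, \ldots, p_{k,L}} \sum_{l=1}^{L} \log_2(1 + p_{k,l} \gamma_{k,l}) \quad \text{s.t.} \quad \sum_{l=1}^{L} p_{k,l} = p_k, \; p_{k,l} \ge 0.$$
Here, notice that the achievable rate is a function of $p_k$, the total power allocated to user $k$. $f_k(p_k)$ is obtained by the water-filling algorithm that optimally distributes the power $p_k$ over $L$ independent channels with $\{\gamma_{k,1}, \cdots, \gamma_{k,L}\}$. Specifically, the optimal power allocation is expressed as
$$p_{k,l} = [\lambda - 1/\gamma_{k,l}]^+,$$
where $\lambda$ is the Lagrange multiplier denoting the flood level conforming to the power constraint (27b), i.e. $\sum_{l=1}^{L} p_{k,l} = p_k$, and $[x]^+ = \max(0, x)$.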
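The per-user water-filling step can be illustrated with the following Python (NumPy) sketch. It assumes that the rate of a stream with nominal SNR gamma and power p is log2(1 + p * gamma); the function names water_filling and user_rate are hypothetical.

```python
import numpy as np

def water_filling(gamma, p_total):
    """Distribute p_total over parallel streams with nominal SNRs gamma.

    Returns the per-stream powers max(0, level - 1/gamma), where the flood
    level is chosen so that the powers sum to p_total.
    """
    gamma = np.asarray(gamma, dtype=float)
    order = np.argsort(-gamma)            # strongest stream first
    inv = 1.0 / gamma[order]
    p = np.zeros_like(gamma)
    for n in range(len(gamma), 0, -1):    # try n active streams
        level = (p_total + inv[:n].sum()) / n
        if level > inv[n - 1]:            # weakest active stream gets power > 0
            p[order[:n]] = level - inv[:n]
            break
    return p

def user_rate(gamma, p_total):
    """Achievable rate (bps/Hz) of one user under water-filling."""
    gamma = np.asarray(gamma, dtype=float)
    p = water_filling(gamma, p_total)
    return float(np.sum(np.log2(1.0 + p * gamma)))
```

For example, with nominal SNRs (4, 1) and a budget of 1, the flood level is 1.125 and the stream powers are 0.875 and 0.125; with a budget of 0.5, only the stronger stream is active.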
To ensure fairness of achievable rates among users, we formulate the following max-min optimization problem:
$$(P3): \quad \max_{p} \; \min_{1 \le k \le K} f_k(p_k) \quad \text{s.t.} \quad \sum_{k=1}^{K} p_k \le P_t, \; p_k \ge 0, \; \forall k,$$
where $p = [p_1, p_2, \cdots, p_K]^T$. It is difficult to find a closed-form solution for (P3). Instead, we derive an iterative algorithm based on the gradient descent method and the water-filling approach. When $\{\gamma_{k,l}\}$ in (26) are fixed, the achievable rate $f_k(p_k)$ is a monotonically increasing function with respect to (w.r.t.) $p_k$ and the objective of (P3) is a quasiconcave function from Property 1. In other words, (P3) has a unique solution that can be found by a convex optimization technique.
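To find the optimal solution of (P3), we evaluate the numerical gradient of $f_k(p_k)$, denoted by $\nabla f_k(p_k)$, as a finite difference with increment $\Delta p$, where $\Delta p$ is a small positive constant. At the optimal point, it holds that $f_1(p_1^o) = f_2(p_2^o) = \cdots = f_K(p_K^o) = f_o$. Let us define the power allocation vector at the $i$th iteration as $p(i) = [p_1(i), p_2(i), \cdots, p_K(i)]^T$. With linear approximation around $p_k(i)$, we may write the difference between the optimal user rate $f_o$ and the actual user rate $f_k(p_k(i))$ as follows:
$$f_o - f_k(p_k(i)) \approx \nabla f_k(p_k(i)) \, \Delta p_k(i). \qquad (31)$$
Also, we can rewrite (31) as
$$\Delta p_k(i) = \frac{f_o - f_k(p_k(i))}{\nabla f_k(p_k(i))}. \qquad (32)$$
Due to the transmit power constraint, the total transmit power is not changed, i.e. $\sum_{k=1}^{K} \Delta p_k(i) = 0$. By applying this condition to (32), we have
$$f_o = \frac{\sum_{k=1}^{K} f_k(p_k(i)) / \nabla f_k(p_k(i))}{\sum_{k=1}^{K} 1 / \nabla f_k(p_k(i))}. \qquad (33)$$
Again, by substituting (33) into (32), we can compute $\Delta p_k(i)$ for all $k$, and the power allocation vector can be updated in the direction of increasing the objective $g_K(p)$ by employing the gradient descent method:
$$p(i+1) = p(i) + \alpha_p(i) \, \Delta p(i), \qquad (34)$$
where $\Delta p(i) = [\Delta p_1(i), \Delta p_2(i), \cdots, \Delta p_K(i)]^T$ and $\alpha_p(i)$ is given by (35), in which $0 < \beta_p \le 1$ is the step-size parameter. The proposed power allocation method is summarized as Algorithm 2. Finally, the optimal power allocation matrix is given by $P^o = \mathrm{diag}(p^o)$, and we design the digital precoder $F_D$ by substituting $P^o$ into (24).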
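The iterative rate-balancing update can be sketched in Python (NumPy) as follows. This is a hedged illustration rather than Algorithm 2 itself: the forward-difference gradient, the rule that halves the step until all user powers stay positive, and the function names user_rate and rate_balance are assumptions introduced here.

```python
import numpy as np

def user_rate(gamma, p):
    """Water-filling rate of one user with stream SNRs gamma and total power p."""
    g = np.sort(np.asarray(gamma, dtype=float))[::-1]
    inv = 1.0 / g
    for n in range(len(g), 0, -1):
        level = (p + inv[:n].sum()) / n
        if level > inv[n - 1]:
            # 1 + (level - 1/g) * g = level * g for every active stream
            return float(np.sum(np.log2(level * g[:n])))
    return 0.0

def rate_balance(gammas, P_t, beta=0.5, dp=1e-5, tol=1e-5, max_iter=200):
    """Iteratively equalize the user rates under a fixed total power P_t."""
    K = len(gammas)
    p = np.full(K, P_t / K)                       # start from equal power
    for _ in range(max_iter):
        f = np.array([user_rate(g, pk) for g, pk in zip(gammas, p)])
        grad = np.array([(user_rate(g, pk + dp) - user_rate(g, pk)) / dp
                         for g, pk in zip(gammas, p)])
        # Common target rate from the zero-sum condition on the power changes.
        f_o = np.sum(f / grad) / np.sum(1.0 / grad)
        delta = (f_o - f) / grad                  # entries sum to zero
        alpha = beta
        while np.any(p + alpha * delta <= 0.0):   # assumed positivity safeguard
            alpha *= 0.5
        p = p + alpha * delta
        if np.linalg.norm(delta) < tol:
            break
    return p
```

Since the power changes sum to zero in every iteration, the total power stays at P_t, and at convergence all users obtain approximately the same rate.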

IV. SIMULATION RESULTS
In this section, we evaluate the performance of the proposed hybrid precoding and combining method through numerical simulations, and compare the proposed scheme with existing hybrid processing techniques for MU-MIMO systems. Specifically, we consider four hybrid precoding methods and three joint hybrid precoding/combining schemes as follows:
• Fully digital precoding: the fully digital precoder is used at the transmitter (i.e. $M = M_{RF}$, $F_A = I_M$, $F_{D,0} = F_{FD}$); the fully digital combiners are used at the receivers.
• Corr.-based hybrid precoding [37]: following the approach in [37], the analog precoder is constructed by selecting the $M_{RF}$ beamforming vectors having the largest correlations with $\{H_k\}$, when the fully digital combiners are used at the receivers. The digital precoder is designed via the BD of effective channels.
• Random analog precoding: the analog precoder is defined as an arbitrary matrix with unit modulus via random phase shifting; the fully digital combiners are used at the receivers; and the BD technique is used to determine the digital precoder.

Algorithm 2 (excerpt)
    $i = i + 1$.
    Estimate the optimal max-min user rate $f_o$ using (33).
    ...
11. Update the power allocation vector using (34).
12. until $\|\Delta p(i)\| < \epsilon_p$, where $\epsilon_p$ is the tolerance for termination.
13. Output: $p^o = p(i)$.

[Figure: Achievable rate (bps/Hz) versus number of iterations (i) for Users 1 to 4.]

The power allocation follows the rate balancing (RB) criterion that maximizes the minimum user rate. For comparison, the power allocation for sum rate maximization (SRM) is also used based on the water-filling algorithm. Notice that we put the keywords RB and SRM in the legend to denote the power allocation methods. The keywords are dropped if no confusion arises between the SRM and RB criteria.
In the simulation, we used the following parameters: N = 4, N RF = 3, L = 2, and M RF = KN RF for receivers; ϵ F = ϵ W = 0.001 for Algorithm 1; and ϵ p = 10^−5, ∆p = 10^−5, and β p = 0.5 for Algorithm 2, unless otherwise stated. In (17) and (18), α(i) and β(i) are adjusted by the backtracking line search [39, Ch 9.2] from the initial values α(0) = 1 and β(0) = 0.5. As in [12]-[24], the Saleh-Valenzuela channel model is used to generate mmWave MU-MIMO channels with the following parameters: the carrier frequency is 28 GHz; the number of clusters is 3; the number of subpaths per cluster is 8; the subpath angular spread for azimuth and elevation directions is π/64 at the transmitter and π/16 at the receiver, respectively; and the inter-element spacing is equal to half a wavelength at both the transmitter and receiver. The average channel gains are asymmetrically configured to reflect the distance variation between the transmitter and the receivers through a user-dependent factor ζ k, which is a random variable uniformly distributed in the range of (0.1, 1.0). For simplicity, we set σ 2 1 = σ 2 2 = · · · = σ 2 K . Every point denoting the achievable rate was obtained by averaging the simulation results over more than 200 independent channel realizations, except for the convergence analysis in Figs. 3 and 4.

[Figure: Sum rate (bps/Hz) versus SNR (dB). Legend: fully digital precoding, proposed hybrid precoding, corr.-based hybrid precoding [37], and random analog precoding, each for SRM and for RB.]

A. PERFORMANCE EVALUATION UNDER PERFECT CSI
This subsection presents the simulation results when perfect CSI is available at the transmitter. Figs. 3 and 4 show the convergence behaviors of Algorithms 1 and 2, respectively, when K = 4, M = 32, and SNR = 10 dB. In Fig. 3, F A is a 32 × 12 matrix and F D,0 is a 12 × 8 matrix, while W A,k is a 4 × 3 matrix and W D,k is a 3 × 2 matrix. Due to the higher dimensions of the factorized matrices, the hybrid precoder design requires more iterations than the design of the hybrid combiners. Also, the convergence speed varies depending on the characteristics of the fully digital combiners {W F D,k } when designing the hybrid combiners for users. Even in the worst case, the matrix factorization procedure in Algorithm 1 is completed within 20 iterations. In Fig. 4, whereas the user rate difference is very large before the power allocation (i = 1), the user rates rapidly converge to the same value through Algorithm 2 and the minimum user rate is maximized. The power allocation procedure requires only 5 to 8 iterations.
Figs. 5 and 6 compare the sum rate and the minimum user rate of various hybrid precoding methods, respectively, when K = 4 and M = 32. For SRM, the power allocation is conducted by the water-filling algorithm across multiple data streams, so that the sum rate of all users is maximized at the cost of rate imbalance among users. For RB, the power allocation is performed by Algorithm 2, so that the minimum user rate is maximized with some sum rate loss. In Fig. 5, the proposed hybrid precoding for SRM outperforms the corr.-based hybrid precoding for SRM and the random analog precoding for SRM, and achieves a sum rate comparable to that of the fully digital precoding for SRM. Similarly, in Fig. 6, the proposed hybrid precoding for RB significantly improves the minimum user rate compared to the corr.-based hybrid precoding for RB and the random analog precoding for RB, and also performs very close to the fully digital precoding for RB, which is the upper bound. As expected, the RB-based precoding methods achieve a better minimum user rate than the SRM-based schemes.
In Figs. 7-9, we compare the minimum user rate of various transmission techniques, when the power allocation is conducted by Algorithm 2 for RB. Fig. 7 shows the minimum user rate across SNR when K = 4 and M = 32. In addition, Figs. 8 and 9 show how the minimum user rate changes as the number of users and the number of transmit antennas increase, respectively, when M = 32 and SNR = 10 dB. For all cases, the proposed hybrid precoding scheme outperforms the corr.-based hybrid precoding and the random analog precoding. Also, the proposed hybrid precoding & combining method performs better than the existing counterparts such as the corr.-based hybrid precoding & combining and the random analog precoding & combining. The proposed hybrid precoding method exhibits a moderate performance loss compared to the fully digital precoding scheme. The performance gap between the proposed hybrid precoding and the proposed hybrid precoding & combining is larger than that between the fully digital precoding and the proposed hybrid precoding, because the nullity of $\tilde{H}_k$ in (9) (or the rank of $\tilde{V}_k^{(0)}$) is reduced in BD by the use of hybrid combining. In Fig. 8, the analog precoding gain decreases as the number of users increases; thus the rate loss of the random analog precoding is reduced as K increases. In contrast, the performance loss of the random analog precoding grows as the number of transmit antennas M increases, because the analog precoding gain is proportional to M.

B. PERFORMANCE EVALUATION UNDER CSI UNCERTAINTY
In this subsection, we evaluate the performance of the proposed method when the transmitter has some CSI errors. The CSI error is denoted as $E_k \in \mathbb{C}^{N \times M}$ whose elements are i.i.d. Gaussian noises with zero mean, and the channel matrix with CSI uncertainty is expressed as
$$\hat{H}_k = H_k + E_k,$$
where $k = 1, 2, \cdots, K$. We define the normalized mean square error (NMSE) to describe the average power of the CSI error relative to the mean channel power as follows:
$$\sigma_\epsilon^2 = \frac{E[\|E_k\|_F^2]}{E[\|H_k\|_F^2]}.$$
In practical systems, the precoder and combiners are designed using $\hat{H}_k$ instead of $H_k$, resulting in imperfect cancellation of inter-user interference in ZF-based transmission. Denote the hybrid precoders and combiners obtained from $\hat{H}_k$ as $\hat{F}_A$, $\hat{F}_D$, $\hat{W}_{A,k}$, and $\hat{W}_{D,k}$. In this case, the achievable rate for user $k$ is computed using the matrices $C_{0,k}$ and $C_{1,k}$, where $\hat{F}_H = \hat{F}_A \hat{F}_D$ is the entire matrix for hybrid precoding and $\hat{W}_{H,k} = \hat{W}_{A,k} \hat{W}_{D,k}$ is the entire matrix for hybrid combining. Here, $\hat{F}_{H,k} \in \mathbb{C}^{M \times L}$ is the precoding matrix for the data streams transferred to user $k$ that satisfies $\hat{F}_H = [\hat{F}_{H,1} \; \hat{F}_{H,2} \; \cdots \; \hat{F}_{H,K}]$. Fig. 10 shows the minimum user rate according to the NMSE, $\sigma_\epsilon^2$, when K = 4, M = 32, and SNR = 20 dB. It is assumed that the NMSE is identical for all users. As the NMSE increases, the minimum user rate gradually decreases in all hybrid precoding methods. The proposed hybrid precoding scheme performs better than the corr.-based hybrid precoding and the random analog precoding methods, irrespective of the NMSE, and the performance difference is reduced as the NMSE increases. As in the case of hybrid precoding only, the proposed approach achieves a higher minimum user rate than the existing methods when hybrid precoding and combining are jointly used. As before, the proposed method performs very close to the fully digital precoding in the entire NMSE region.
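The CSI uncertainty model can be illustrated with the following Python (NumPy) sketch, which perturbs a channel matrix with i.i.d. complex Gaussian errors at a target NMSE and evaluates a user rate in the presence of residual inter-user interference. Two assumptions are made here: the error variance is set from the instantaneous channel power rather than the mean channel power, and the rate uses the standard log-determinant expression with an interference-plus-noise covariance, which may differ in form from the expression used in this paper. The function names add_csi_error and rate_with_interference are hypothetical.

```python
import numpy as np

def add_csi_error(H, nmse, rng):
    """Return H + E, where E has i.i.d. CN(0, var) entries and
    var is chosen so that E[||E||_F^2] / ||H||_F^2 equals nmse."""
    var = nmse * np.linalg.norm(H, "fro") ** 2 / H.size
    E = np.sqrt(var / 2.0) * (
        rng.standard_normal(H.shape) + 1j * rng.standard_normal(H.shape)
    )
    return H + E

def rate_with_interference(H, F_blocks, W, k, noise_var):
    """Rate (bps/Hz) of user k on the true channel H.

    F_blocks[j] is the overall precoder block of user j and W is the overall
    combiner of user k, both designed from imperfect CSI.
    """
    A = W.conj().T @ H
    S = A @ F_blocks[k]
    C_sig = S @ S.conj().T                         # desired-signal covariance
    C_int = noise_var * (W.conj().T @ W)           # noise after combining
    for j, F in enumerate(F_blocks):
        if j != k:
            B = A @ F
            C_int = C_int + B @ B.conj().T         # residual interference
    M = np.eye(C_sig.shape[0]) + np.linalg.solve(C_int, C_sig)
    return float(np.log2(np.linalg.det(M).real))
```

With perfect CSI and ZF-based precoding, the interference terms vanish and the expression reduces to the interference-free rate; as the NMSE grows, the residual interference lowers the rate.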

V. CONCLUSION
In this paper, we have proposed a new design procedure for hybrid precoding and combining in mmWave MU-MIMO systems considering the rate balancing among users. In the proposed scheme, the analog precoder and combiners are determined by factorizing the fully digital precoder and combiners in the least squares sense, while the digital precoder and combiners are designed by BD of effective channels. The proposed ZF-based hybrid processing ensures interference-free transmission in the downlink under perfect CSI. Moreover, the proposed power allocation algorithm maximizes the minimum user rate given the hybrid precoders and combiners. Numerical simulation results show that the proposed approach is more beneficial than the existing hybrid precoding and combining schemes in terms of the minimum user rate.
The proposed method is applicable to the transceiver design of cellular base stations and Wi-Fi APs equipped with large-scale transmit antenna elements operating in mmWave bands. Also, the proposed design scheme can be utilized in the uplink of mmWave MU-MIMO systems, by exploiting the duality between the downlink and uplink. An interesting topic for future research is the design of hybrid precoders and combiners for rate balancing when MMSE-based precoding and combining are used.