Hybrid Precoding and Combining Strategy for MMSE-Based Rate Balancing in mmWave Multiuser MIMO Systems

In this paper, a new hybrid precoding and combining method is proposed for the downlink of multiuser multiple-input multiple-output (MU-MIMO) millimeter wave (mmWave) channels. The proposed method designs the radio frequency (RF) and baseband precoders and combiners based on the minimum mean square error (MMSE) criterion and rate fairness among users. To design the RF precoder and combiners implemented by phase shifters, a new matrix factorization algorithm is devised by combining the gradient method with orthogonal projection. Under the total transmit power constraint, the proposed factorization method increases the achievable rate by making the columns of the RF precoder near-orthogonal and increasing the Frobenius norm of the baseband precoder. In addition, a new MMSE-based rate balancing algorithm is proposed to design the baseband precoder and combiners so as to maximize the minimum user rate. The proposed rate balancing scheme iteratively updates the baseband precoder, the transmit power constraint for the baseband precoder, the baseband combiners, and the weighting vector for rate balancing. Through theoretical analysis, it is shown that the proposed design method has a polynomial complexity order. Numerical simulations show that the proposed matrix factorization method outperforms existing low-complexity schemes and that the proposed rate balancing scheme converges to a stationary point satisfying the total transmit power constraint. Moreover, simulation results in MU-MIMO channels show that the proposed design scheme performs better than existing hybrid processing schemes while achieving a minimum user rate close to the upper bound of MMSE processing.


I. INTRODUCTION
To meet the rapidly increasing demand for wireless communication services, the network capacity can be improved by employing advanced physical layer techniques such as massive multiple-input multiple-output (MIMO) [1], enhancing area spectral efficiency using small cells [2], and providing advanced cooperation through cloud radio access networks (C-RANs) [3]. On the other hand, the millimeter wave (mmWave) band from 30 to 300 GHz has been attracting great attention as a means to fundamentally increase the capacity using more spectrum bands [4], [5], [6], [7]. The standalone mode in 5G New Radio (NR) exploits the mmWave bands in Frequency Range 2 (FR2), and commercial NR networks adopting the standalone mode have been gradually deployed in recent years [8], [9].
The associate editor coordinating the review of this manuscript and approving it for publication was Olutayo O. Oyerinde.
MmWave cellular systems face several obstacles such as the huge path loss and rain attenuation caused by the ten-fold increase of the carrier frequency [10], [11], [12]. Fortunately, mmWave transceivers can be equipped with large-scale antennas because the antenna form factor is reduced by virtue of the small wavelength. This makes it possible to form highly directional beams that provide significant beamforming gains to compensate for the path loss. Moreover, a spatial multiplexing gain can be achieved by concurrently transmitting multiple data streams. The mmWave system with large-scale antennas incurs prohibitive cost and power consumption for fully digital precoding, which controls both the magnitude and phase of digital baseband signals, because a dedicated radio frequency (RF) chain is needed for each antenna element. Considering the constraint on the number of RF chains, the two-stage hybrid precoding architecture has been widely investigated as a means for effectively interconnecting a small number of digital data streams to a large number of RF antenna elements [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28]. Sparse hybrid precoding and combining schemes were developed using the orthogonal matching pursuit algorithm for compressive sensing-based reconstruction in [14], and an adaptive parameter estimation method was proposed for mmWave-specific channel estimation [15]. Moreover, it was shown through a theoretical analysis that the performance of a hybrid precoder can approach that of a fully digital scheme if the number of RF chains is equal to or greater than the number of data streams [16].
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The performance of hybrid precoding and/or combining has been improved by employing the alternating minimization-based design schemes [17], [18], [19] and matrix factorization techniques [20], [21]. In addition, design methods for joint hybrid precoding and combining were devised for practical mmWave transceivers with low-resolution phase shifters [22], [23], [24], and the corresponding spectral efficiency was analyzed [25]. In [26], [27], and [28], codebook-based hybrid precoding was studied to reduce the feedback information in practical mmWave systems.
The hybrid precoder and combiner design scheme has been further extended to mmWave multiuser MIMO (MU-MIMO) systems [27], [28], [29], [30], [31], [32], [33], [34]. In [29], the authors derived the lower bound on the achievable rate for single-path channels and developed a low-complexity hybrid precoding algorithm in downlink MU-MIMO systems with analog combining. Joint RF-baseband hybrid precoding was investigated based on a predetermined codebook to reduce the feedback overhead and facilitate hardware implementation in a multiuser multiple-input single-output (MU-MISO) system [27] and in a MU-MIMO system with analog combining [28]. The phase-shifting RF precoding can be combined with baseband precoding based on block diagonalization (BD), singular value decomposition (SVD), and regularized channel diagonalization techniques [30], [31], [32]. Also, a minimum mean squared error (MMSE) criterion has been employed to design hybrid analog/digital precoders and combiners for MU-MIMO systems [33], [34]. In the downlink of fully digital MU-MIMO systems, the precoder for maximizing the achievable sum rate can be designed under total power or per-antenna power constraints using BD of multiuser channels [35], [36], [37], regularized channel diagonalization [38], [39], generalized channel inversion [40], and weighted MMSE [41]. On the other hand, the rate balancing precoding method has been studied under the MMSE criterion to ensure the rate fairness among users in the downlink MU-MIMO channels [42], [43]. The rate balancing approach was also applied to the design of hybrid precoders and combiners for the mmWave MU-MIMO system based on zero-forcing (ZF) [44].
Motivated by previous work, this paper focuses on the MMSE criterion and the rate balancing for hybrid precoding and combining in mmWave MU-MIMO systems. When the RF precoder and combiners are implemented by phase shifters, we propose a new matrix factorization technique for the design of RF precoder and combiners. By concatenating the designed RF precoder, the original MIMO channels, and the designed RF combiners, we define the effective MU-MIMO channels. From the effective channels, a new MMSE-based rate balancing algorithm is devised that computes the baseband precoder and combiners in the MMSE sense while ensuring rate fairness among users. The main contributions of this paper are summarized as follows.
• We define the optimization problem with respect to the RF and baseband precoders in terms of maximizing the minimum user rate for fairness. Considering the constant-modulus constraints, a new design method for the RF precoder and combiners is proposed by combining the matrix factorization technique with orthogonal projection. In the proposed method, the fully digital precoder (or combiner) is decomposed into the RF and baseband precoders (or combiners) by iteratively updating the RF precoder (or combiner) using the gradient method and the orthogonal projection technique in [45] and [46].
• The effective channels are defined by concatenating the RF precoder, the original MU-MIMO channels, and the RF combiners. Considering the total transmit power constraint and the MMSE-based rate balancing criterion, a new design procedure is devised for the baseband precoder and combiners. The proposed algorithm iteratively adjusts the norm constraint of the baseband precoder, updates the baseband precoder and combiners in the MMSE sense, and controls the target user rates for maximizing the minimum user rate. The proposed algorithm guarantees rate balancing among users conforming to the total transmit power constraint.
• Through theoretical analysis, the complexity orders of the proposed algorithms are compared with those of existing hybrid processing methods. In addition, it is shown that the entire proposed procedure for hybrid processing has a polynomial time complexity similar to that of conventional MMSE-based schemes.
• Through numerical simulations, we verify the convergence of the proposed matrix factorization algorithm and of the proposed rate balancing design scheme. Also, simulation results show that the proposed method outperforms existing hybrid processing techniques for mmWave MU-MIMO systems while achieving a minimum user rate close to the upper bound. Moreover, under imperfect channel state information (CSI), it is demonstrated that the proposed method remains superior to conventional hybrid processing schemes.
The organization of this paper is as follows. In Section II, we introduce the MU-MIMO system with hybrid precoding and combining, and formulate the max-min optimization problem to design hybrid precoders and combiners for the downlink MU-MIMO system. In Section III, the proposed method is derived for jointly designing hybrid precoders and combiners accounting for the MMSE-based rate balancing. Section IV compares the complexity order of the proposed algorithms with those of existing methods, and Section V provides numerical simulation results to present the convergence and benefits of the proposed design schemes. Finally, Section VI concludes this article.
Notations: Superscripts $T$, $H$, $*$, and $-1$ denote transposition, Hermitian transposition, complex conjugate, and inversion, respectively, for any scalar, vector, or matrix. $|x|$ means the absolute value of a scalar $x$; the notations $|X|$, $\|X\|$, and $\|X\|_F$ denote the determinant, $\ell_2$-norm, and Frobenius norm of matrix $X$, respectively; $I_m$ represents an $m \times m$ identity matrix; $0_{m \times n}$ and $1_{m \times n}$ denote the $m \times n$ zero matrix and all-ones matrix, respectively; $\mathrm{tr}(A)$ is the trace operation of matrix $A$; $\mathrm{diag}(x)$ returns a diagonal matrix whose main diagonal elements are equal to $x$; $\mathrm{blkdiag}(\cdot)$ stands for a block-diagonal matrix with matrices on its diagonal; $A(i, j)$ denotes the element in the $i$th row and $j$th column of matrix $A$; $\circ$ and $\otimes$ are Hadamard and Kronecker matrix products; $x \sim \mathcal{CN}(0, \sigma^2)$ means that a random variable $x$ conforms to a complex normal distribution with zero mean and variance $\sigma^2$; and $E[x]$ stands for the expectation value of a random variable $x$.

II. SYSTEM MODEL AND PROBLEM FORMULATION
In this section, we present the system model for the downlink of a MU-MIMO communication link with hybrid precoding and combining, and define the optimization problem to design the hybrid precoders and combiners in terms of maximizing the minimum user rate for rate balancing. Fig. 1 describes the MU-MIMO system for downlink transmission using hybrid precoding at the transmitter with $M$ antennas and $M_{RF}$ RF chains and hybrid combining at each receiver with $N$ antennas and $N_{RF}$ RF chains. All data streams are concurrently transferred to $K$ users through baseband precoding followed by RF precoding. For notational convenience, it is assumed that all users receive the same number of data streams, i.e. $L$ data streams per user, with the same number of receive antennas and RF chains. Notice that it is straightforward to extend the proposed scheme to a MU-MIMO system with an arbitrary number of data streams and receive antennas. The numbers of RF chains for the transmitter and user receivers satisfy $KL \le M_{RF} \le M$ and $L \le N_{RF} \le N$, respectively. $H_k \in \mathbb{C}^{N \times M}$ is the downlink channel matrix for user $k$ whose elements represent flat fading channel gains. It is assumed that the CSI for all users, $\{H_k;\ 1 \le k \le K\}$, is available at the transmitter. For example, when time division duplexing is used, $\{H_k\}$ can be estimated in the uplink from the channel reciprocity as shown in Fig. 1. The effect of imperfect CSI is evaluated through numerical simulations in Section V.

A. MU-MIMO SYSTEM MODEL
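When a modulated symbol vector $s \in \mathbb{C}^{KL \times 1}$ is transmitted using the baseband precoder $F_B \in \mathbb{C}^{M_{RF} \times KL}$ and the RF precoder $F_R \in \mathbb{C}^{M \times M_{RF}}$, the transmit symbol vector $x \in \mathbb{C}^{M \times 1}$ is given by
$$x = F_R F_B s. \tag{1}$$
After hybrid combining at user $k$, the received symbol vector is
$$y_k = W_{B,k}^H W_{R,k}^H \left( H_k x + n_k \right) \tag{2}$$
where $W_{B,k} \in \mathbb{C}^{N_{RF} \times L}$ and $W_{R,k} \in \mathbb{C}^{N \times N_{RF}}$ are the baseband and RF combiners for user $k$, respectively, and $n_k \in \mathbb{C}^{N \times 1}$ is the noise vector composed of independent and identically distributed (i.i.d.) complex Gaussian variables with zero mean and variance $\sigma_k^2$, i.e. $n_k \sim \mathcal{CN}(0, \sigma_k^2 I_N)$. Throughout the paper, it is assumed that the elements of the RF precoder $F_R$ and the RF combiners $\{W_{R,k}\}$ have a constant amplitude and adjustable phases, i.e. every element has the same fixed modulus and only its phase can be controlled.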
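As a minimal sketch of the transmit model in (1), the following NumPy snippet builds a phase-only RF precoder and an unconstrained baseband precoder and forms $x = F_R F_B s$. The array sizes, the $1/\sqrt{M}$ modulus of the RF entries, the QPSK symbols, and the random precoders are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, M_RF, K, L = 16, 4, 2, 2   # illustrative sizes with K*L <= M_RF <= M

# RF precoder: phase shifters only (assumed modulus 1/sqrt(M) for every entry)
F_R = np.exp(1j * rng.uniform(-np.pi, np.pi, (M, M_RF))) / np.sqrt(M)

# Baseband precoder: unconstrained complex entries (random placeholder)
F_B = (rng.standard_normal((M_RF, K * L))
       + 1j * rng.standard_normal((M_RF, K * L))) / np.sqrt(2)

# Unit-power QPSK symbols for the K*L data streams
s = (rng.choice([-1, 1], K * L) + 1j * rng.choice([-1, 1], K * L)) / np.sqrt(2)

# Transmit vector: baseband precoding followed by RF precoding
x = F_R @ F_B @ s
```

Only the phases of `F_R` are free parameters, whereas `F_B` can control both amplitude and phase.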

B. PROBLEM FORMULATION
We design the RF precoder and combiners with a constant amplitude and adjustable phases as well as the baseband precoder and combiners with controllable amplitude and phases. When a Gaussian symbol vector $s$ is transmitted over the downlink MU-MIMO channel with hybrid precoding and combining, the achievable rate for user $k$ is given by [30], [34]
$$R_k = \log_2 \left| I_L + C_k^{-1} W_{B,k}^H W_{R,k}^H H_k F_R F_{B,k} F_{B,k}^H F_R^H H_k^H W_{R,k} W_{B,k} \right| \tag{4}$$
where the interference-plus-noise covariance $C_k \in \mathbb{C}^{L \times L}$ is defined as
$$C_k = \sum_{j \ne k} W_{B,k}^H W_{R,k}^H H_k F_R F_{B,j} F_{B,j}^H F_R^H H_k^H W_{R,k} W_{B,k} + \sigma_k^2 W_{B,k}^H W_{R,k}^H W_{R,k} W_{B,k}.$$
Here, $F_{B,k} \in \mathbb{C}^{M_{RF} \times L}$ is the baseband precoder for transmitting the modulated symbols to user $k$, i.e. $F_B = [F_{B,1}\ F_{B,2}\ \cdots\ F_{B,K}]$. This paper focuses on designing the precoders and combiners that maximize the minimum user rate for rate balancing among users, and thus the optimization problem can be formulated as
$$\max_{F_R, F_B, \{W_{R,k}\}, \{W_{B,k}\}}\ \min_{1 \le k \le K} R_k \tag{5a}$$
$$\text{s.t.}\quad |F_R(i,j)| \text{ is constant for all } i, j, \tag{5b}$$
$$|W_{R,k}(i,j)| \text{ is constant for all } i, j, k, \tag{5c}$$
$$\|F_R F_B\|_F^2 \le P. \tag{5d}$$
Here, the objective $R_k$ is a nonconvex function because it includes $C_k^{-1}$. Moreover, the optimization problem (5) is made more challenging by the nonconvex constant-modulus constraints for the RF precoder and RF combiners in (5b) and (5c), respectively. Thus, it is difficult to find a globally optimal solution of (5). Instead, in order to develop an optimization method with tractable complexity, we reformulate the rate balancing optimization problem into two separate design problems for RF processing (i.e. RF precoding and combining) and baseband processing (i.e. baseband precoding and combining).
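The max-min objective can be evaluated numerically as follows. This is a minimal sketch, assuming the standard interference-as-noise rate expression for Gaussian signaling with unit-power symbols; the function name `user_rates` and its argument layout are hypothetical, not part of the paper.

```python
import numpy as np

def user_rates(H, F_R, F_B, W_R, W_B, sigma2, L):
    """Rate of each user in bit/s/Hz, treating inter-user interference as noise.

    H, W_R, W_B are lists with one matrix per user; F_B stacks the K user
    precoders column-wise (L columns per user). Layout is an assumption.
    """
    K = len(H)
    rates = []
    for k in range(K):
        W = W_R[k] @ W_B[k]                       # N x L hybrid combiner of user k
        # L x L effective gain of every user's precoder seen after combining
        A = [W.conj().T @ H[k] @ F_R @ F_B[:, j * L:(j + 1) * L] for j in range(K)]
        C = sigma2 * (W.conj().T @ W)             # noise covariance after combining
        for j in range(K):
            if j != k:
                C = C + A[j] @ A[j].conj().T      # add interference covariance
        S = A[k] @ A[k].conj().T                  # desired-signal covariance
        _, logdet = np.linalg.slogdet(np.eye(L) + np.linalg.solve(C, S))
        rates.append(logdet / np.log(2))          # natural log -> log2
    return np.array(rates)

# Scalar sanity check: one user, unit channel, gain 2, unit noise
one = np.array([[1.0]])
rates = user_rates([one], one, 2 * one, [one], [one], 1.0, 1)
min_rate = rates.min()                            # objective of the max-min problem
```

The quantity `rates.min()` is what the rate balancing design tries to maximize.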

III. PROPOSED MMSE-BASED HYBRID PROCESSING
In this section, we propose a new algorithm to design the RF precoder and combiners based on the matrix factorization method, and then derive a new MMSE-based rate balancing algorithm to design the baseband precoder and combiners. The overall procedure of the proposed design method is presented in Fig. 2.

A. MATRIX FACTORIZATION FOR DESIGNING RF PRECODER AND COMBINERS
We propose a new matrix factorization method to design the RF precoder and RF combiners for hybrid processing in MU-MIMO systems. Firstly, the fully digital precoder and combiners are obtained by employing the MMSE-based rate balancing technique in [43] that iteratively updates the precoder and combiners using the MSE duality between downlink and uplink. Then, the RF precoder (or combiner) for hybrid processing is determined by factorizing the fully digital precoder (or combiner) in the least squares (LS) sense. For example, given the fully digital precoder $F_o \in \mathbb{C}^{M \times LK}$, the LS matrix factorization problem is formulated as
$$\min_{F_R, F_B}\ \|F_o - F_R F_B\|_F^2 \tag{6a}$$
$$\text{s.t.}\quad |F_R(i,j)| \text{ is constant for all } i, j, \tag{6b}$$
$$\|F_R F_B\|_F^2 = P. \tag{6c}$$
Since the achievable rate is maximized when the maximum transmit power is used, the constraint in (5d) is replaced with the equality constraint in (6c). Also, when the RF precoder $F_R$ is fixed, the optimal baseband precoder is given by
$$F_B = c\,(F_R^H F_R)^{-1} F_R^H F_o$$
where $c$ is a scaling factor to meet the transmit power constraint (6c). In [21], it was shown that the power constraint can be removed without loss of local and global optimality. Following the approach in [17] and [21], we temporarily drop the power constraint (6c) and denote the baseband precoder as $F_B = (F_R^H F_R)^{-1} F_R^H F_o$. Now, the matrix factorization problem (6a) can be rewritten as
$$\min_{F_R}\ \left\|F_o - F_R (F_R^H F_R)^{-1} F_R^H F_o\right\|_F^2 \tag{7a}$$
$$\text{s.t.}\quad |F_R(i,j)| \text{ is constant for all } i, j. \tag{7b}$$
Note that the transmit power constraint will be considered in the design of the baseband precoder in Section III-C.
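For a fixed RF precoder, the LS baseband precoder and its power scaling can be sketched as below. The helper name `ls_baseband` and the random test matrices are illustrative assumptions; the pseudo-inverse is used as a numerically safer equivalent of $(F_R^H F_R)^{-1} F_R^H$ when $F_R$ has full column rank.

```python
import numpy as np

def ls_baseband(F_o, F_R, P):
    """LS baseband precoder for fixed F_R, scaled so that ||F_R F_B||_F^2 = P."""
    F_B = np.linalg.pinv(F_R) @ F_o                 # unscaled LS solution
    c = np.sqrt(P) / np.linalg.norm(F_R @ F_B, "fro")
    return c * F_B, F_B

rng = np.random.default_rng(0)
M, M_RF, KL, P = 16, 4, 4, 2.0
F_R = np.exp(1j * rng.uniform(-np.pi, np.pi, (M, M_RF))) / np.sqrt(M)
F_o = rng.standard_normal((M, KL)) + 1j * rng.standard_normal((M, KL))

F_B_scaled, F_B_ls = ls_baseband(F_o, F_R, P)
residual = F_o - F_R @ F_B_ls                       # orthogonal to the columns of F_R
```

The residual is orthogonal to the column space of `F_R`, so the remaining factorization error depends on `F_R` only through that column space.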
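To further simplify the optimization problem, define the phase matrix $\Theta_R$ that collects the phases of the elements of $F_R$. When $F_R$ is an optimal solution of (7a), $F_R D$ is also optimal for $D = \mathrm{diag}(e^{j\theta_1}, e^{j\theta_2}, \cdots, e^{j\theta_{M_{RF}}})$ with arbitrary phases $\theta_1, \theta_2, \cdots, \theta_{M_{RF}}$, because right-multiplication by $D$ leaves the column space of $F_R$, and hence the projection in (7a), unchanged. In other words, the optimal phase matrix $\Theta_R$ corresponding to the optimal RF precoder $F_R$ is not unique. Without loss of optimality, we restrict the first row of $\Theta_R$ to be a zero vector to obtain a unique solution. Then, we may write the RF precoder as a function $F(\Theta)$ of the remaining phases $\Theta \in \mathbb{R}^{(M-1) \times M_{RF}}$, whose first row has zero phase and whose remaining rows have the element-wise phases $e^{j\Theta}$ (8), and the problem (7) can be reformulated in the following form by substituting $F(\Theta)$ for $F_R$:
$$\min_{\Theta}\ f(\Theta) \tag{9}$$
where
$$f(\Theta) = \left\|F_o - F(\Theta) \left(F(\Theta)^H F(\Theta)\right)^{-1} F(\Theta)^H F_o\right\|_F^2.$$
Since the constant-modulus constraint is removed in (9) by employing $F(\Theta)$ in (8), the optimal solution $\Theta_R \in \mathbb{R}^{(M-1) \times M_{RF}}$ can be found by solving the unconstrained minimization problem in (9).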
To develop a low-complexity algorithm for finding the optimal phase matrix $\Theta_R$, we employ the gradient descent method. As stated in [15], it is natural to design the RF precoder or the baseband precoder as an orthogonal matrix, in order to mitigate the transmit power increase caused by concatenating $F_R$ and $F_B$ for hybrid precoding. To impose this property on the RF precoder, we insert a matrix projection step that makes the columns of $F(\Theta)$ as close to orthonormal as possible, which is derived from the orthogonal projection technique in [45].
Specifically, let us denote the phase matrix at the $i$th iteration as $\Theta(i)$. To apply the gradient descent method, we compute the gradient of $f(\Theta(i))$ with respect to $\Theta(i)$ from [21, Appendix B] as given in (10a) and (10b), where $\mathrm{Im}(x)$ is the imaginary part of a complex $x$, $A(m:n, :)$ means the submatrix composed of rows $m$ through $n$ of a matrix $A$, and $Z_1(i)$ and $Z_2(i)$ are given by (11). Using the gradient in (10b), the phase matrix is updated by the gradient descent method as below:
$$\Theta_0(i+1) = \Theta(i) - \mu_1 \nabla_{\Theta} f(\Theta(i)) \tag{12}$$
where $\mu_1$ is the step-size parameter. As a next step, we conduct the matrix projection. When the RF precoder corresponding to $\Theta_0(i+1)$ has full rank, $F(\Theta_0(i+1))$ is factorized by the reduced singular value decomposition (SVD) as follows:
$$F(\Theta_0(i+1)) = U_p \Sigma_p V_p^H \tag{13}$$
where $U_p \in \mathbb{C}^{M \times M_{RF}}$ has orthonormal columns, $\Sigma_p$ is a diagonal matrix whose diagonal elements are positive, and $V_p \in \mathbb{C}^{M_{RF} \times M_{RF}}$ is a unitary matrix. From the results in [45, Sec. III-F], the nearest tight frame to $F(\Theta_0(i+1))$ (i.e. the matrix with orthonormal columns closest to $F(\Theta_0(i+1))$ in Frobenius norm) can be obtained as
$$F_p = U_p V_p^H. \tag{14}$$
Here, the projected precoder satisfies $F_p^H F_p = I_{M_{RF}}$, yet its elements do not have a constant amplitude. To make the RF precoder have a constant amplitude, we update $\Theta(i+1)$ by only taking the phases of $F_p$ as in (15). The proposed matrix factorization algorithm for designing $F_R$ is summarized as Algorithm 1.

Algorithm 1: Proposed matrix factorization for the RF precoder
1. Input: $F_o$, $\mu_1$, $\epsilon_1$.
2. Initialize $i = -1$ and set each element of $\Theta(0)$ to a random phase over $[-\pi, \pi)$.
3. Compute the initial RF precoder $F(\Theta(0))$ by substituting $\Theta(0)$ into (8).
4. repeat
5. $i \leftarrow i + 1$.
6. Calculate $Z_1$ and $Z_2$ using (11).
7. Compute the gradient of $f(\Theta(i))$ using (10).
8. Update the phase matrix using the gradient descent method given by (12).
9. Compute the projected precoder $F_p$ using (13) and (14).
10. Obtain $\Theta(i+1)$ from the phases of $F_p$ using (15).
11. Compute $F(\Theta(i+1))$ by substituting $\Theta(i+1)$ into (8), then calculate the cost function $f(\Theta(i+1))$ from the definition in (9).
12. until $|f(\Theta(i+1)) - f(\Theta(i))| < \epsilon_1$, where $\epsilon_1$ is the tolerance for termination.
13. Output: $F_R = F(\Theta(i+1))$.

The matrix factorization problem for RF combiners is identical to that for the RF precoder except that no transmit power constraint is present. As stated in (9), the transmit power constraint is not used in the design of the RF precoder but utilized in the design of the baseband precoder in Section III-C. Therefore, the matrix factorization procedure in (10)-(15) can be applied to the design of RF combiners $\{W_{R,k}\}$ as well.
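The projection part of the procedure (SVD, nearest tight frame, phase extraction) can be sketched as follows. This is not the complete algorithm: the gradient step from [21, Appendix B] is omitted. The function names, the $1/\sqrt{M}$ scaling, and the re-referencing of phases to the first row are assumptions made for illustration.

```python
import numpy as np

def rf_from_phases(theta):
    """Map an (M-1) x M_RF phase matrix to an M x M_RF constant-modulus precoder.
    The first row has zero phase; the 1/sqrt(M) scaling is an assumption."""
    M = theta.shape[0] + 1
    first_row = np.ones((1, theta.shape[1]), dtype=complex)
    return np.vstack([first_row, np.exp(1j * theta)]) / np.sqrt(M)

def factorization_cost(theta, F_o):
    """LS residual of F_o after projection onto the column space of F(theta)."""
    F = rf_from_phases(theta)
    return np.linalg.norm(F_o - F @ np.linalg.pinv(F) @ F_o, "fro") ** 2

def nearest_tight_frame(F):
    """Closest matrix with orthonormal columns in Frobenius norm: U_p V_p^H."""
    U, _, Vh = np.linalg.svd(F, full_matrices=False)
    return U @ Vh

def project_phases(theta0):
    """Orthonormalize F(theta0), then keep only the phases of the result.
    Phases are taken relative to the first row (an assumption) so that the
    first row of the rebuilt precoder keeps zero phase."""
    F_p = nearest_tight_frame(rf_from_phases(theta0))
    return np.angle(F_p[1:, :] * np.conj(F_p[:1, :]))

rng = np.random.default_rng(0)
M, M_RF, KL = 16, 4, 4
theta0 = rng.uniform(-np.pi, np.pi, (M - 1, M_RF))   # stands in for a gradient update
F_o = rng.standard_normal((M, KL)) + 1j * rng.standard_normal((M, KL))

F_p = nearest_tight_frame(rf_from_phases(theta0))
theta1 = project_phases(theta0)
cost1 = factorization_cost(theta1, F_o)
```

After the projection, `rf_from_phases(theta1)` is again constant-modulus, while `F_p` itself has orthonormal columns but varying amplitudes.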

B. DOWNLINK AND UPLINK EQUIVALENT CHANNELS FOR MSE DUALITY
Using the RF precoder and combiners designed in the previous subsection, we compute the downlink effective channels
$$G_k^H = W_{R,k}^H H_k F_R, \quad k = 1, 2, \cdots, K. \tag{16}$$
From the effective channels for multiple users, we derive a new algorithm to design the baseband precoder $F_B$ and baseband combiners $\{W_{B,k}\}$ based on the MMSE-based rate balancing criterion. The proposed algorithm exploits the user-wise MSE balancing strategy derived from the MSE duality in [42] and the rate balancing scheme derived from the weighted MSE (WMSE) optimization in [43]. When fully digital precoders and combiners are designed in the MMSE sense via an iterative algorithm, the Frobenius norm of the precoder remains constant during iterations due to the transmit power constraint [43]. In contrast, when hybrid processing is used, the Frobenius norm of the baseband precoder, i.e. $\|F_B\|_F$, is not fixed but varies at every iteration that updates the precoder and combiners, because the transmit power constraint applies to the concatenated precoder in hybrid processing. To take this fact into account, the proposed algorithm iteratively adjusts the Frobenius norm of the baseband precoder so that the concatenated precoder satisfies the transmit power constraint.
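The effective channels can be formed as in the sketch below, which assumes $G_k = F_R^H H_k^H W_{R,k}$ so that $G_k^H$ is the downlink effective channel of user $k$. The helper name `effective_channels` and the random test data are illustrative only.

```python
import numpy as np

def effective_channels(H, F_R, W_R):
    """Return the UL effective channels G_k (M_RF x N_RF) and their horizontal stack."""
    G = [F_R.conj().T @ Hk.conj().T @ Wk for Hk, Wk in zip(H, W_R)]
    return G, np.hstack(G)

rng = np.random.default_rng(0)
M, N, M_RF, N_RF, K = 16, 8, 4, 2, 2
F_R = np.exp(1j * rng.uniform(-np.pi, np.pi, (M, M_RF))) / np.sqrt(M)
W_R = [np.exp(1j * rng.uniform(-np.pi, np.pi, (N, N_RF))) / np.sqrt(N) for _ in range(K)]
H = [rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M)) for _ in range(K)]

G_list, G_all = effective_channels(H, F_R, W_R)
```

The baseband design then operates only on these low-dimensional matrices rather than on the full $N \times M$ channels.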
Given downlink (DL) and uplink (UL) effective channels, $\{G_k^H\}$ and $\{G_k\}$, respectively, Figs. 3 and 4 present the DL and UL equivalent channels for designing the baseband precoder and combiners. In the DL channel, the modulated symbol vector $s$ is transmitted using the precoder $F_B = V P^{1/2}$, where $V = [V_1\ V_2\ \cdots\ V_K]$ is the DL transmit filter composed of the $k$th user filtering matrix $V_k \in \mathbb{C}^{M_{RF} \times L}$, and $P = \mathrm{blkdiag}\{P_1, P_2, \cdots, P_K\}$ is the DL power allocation matrix defined by a diagonal power allocation matrix for user $k$, $P_k \in \mathbb{R}_+^{L \times L}$. Note that the $i$th column of $V$ has a unit norm, i.e. $\|v_i\| = 1$. Similarly, the receive combiner for user $k$ is denoted as $W_{B,k} = U_k \beta_k P_k^{-1/2}$, where $U_k \in \mathbb{C}^{N_{RF} \times L}$ is the receive filter for user $k$, the diagonal matrix $\beta_k \in \mathbb{R}_+^{L \times L}$ contains scaling factors ensuring that the columns of $U$ have unit norm, i.e. $\|u_i\| = 1$, and $U = \mathrm{blkdiag}\{U_1, U_2, \cdots, U_K\}$. From the total power constraint in (6c), the matrix $P$ meets the following constraint:
$$\left\|F_R V P^{1/2}\right\|_F^2 \le P. \tag{17}$$
Here, notice that the baseband equivalent transmit power, $\mathrm{tr}(P)$, is adjusted according to $F_R$ and $V$ as explained in Section III-C.
In the UL channel, we switch the role of the transmit and receive filters. Thus, the transmit filter for user $k$ is denoted as $W_k = U_k Q_k^{1/2}$ and the multiuser receive filter is given by $F^H = Q^{-1/2} \beta V^H$, where $Q = \mathrm{blkdiag}\{Q_1, Q_2, \cdots, Q_K\}$ is the UL power allocation matrix composed of the diagonal power allocation matrices of the users, $Q_k \in \mathbb{R}_+^{L \times L}$, and $\beta = \mathrm{blkdiag}\{\beta_1, \beta_2, \cdots, \beta_K\}$. In addition, we denote the overall UL channel as $G = [G_1\ G_2\ \cdots\ G_K]$.

C. DESIGN OF BASEBAND PRECODER AND COMBINERS FOR MMSE-BASED RATE BALANCING
This subsection describes the proposed algorithm to design the baseband precoder and combiners based on the MMSE and rate balancing criteria. When describing the proposed iterative algorithm, we omit the index for iteration to avoid clutter. For notational convenience, it is assumed that the noise variance of a DL receiver is identical for all users and that the noise variance of a DL receiver is the same as that of the UL receiver,¹ i.e. $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_K^2 = \sigma^2$ and $n \sim \mathcal{CN}(0, \sigma^2 I_{M_{RF}})$.
Given $U$ and $Q$, we compute the UL receiver filters $\{V_k\}$ and scaling matrices $\{\beta_k\}$ for MMSE combining as in (18), where $b^{\mathrm{UL}}_{k,\ell} = \sum_{m=1}^{M_{RF}} |A_k(m, \ell)|^2$ and $k = 1, 2, \cdots, K$. Using the DL transmit filter $V$ and receive filter $U$, the equivalent channels for DL and UL are given by (19). When an optimal MMSE combiner is used, the UL MSE is computed as in (20), where $\mathrm{Re}(x)$ is the real part of a complex $x$ and $1_{LK \times 1}$ means the $LK \times 1$ all-ones vector.
On the other hand, using the MSE duality between the UL and DL, the DL power allocation matrix $P_0$ is obtained as in (21), where the coupling matrix has $(m, n)$th entry $|G^{\mathrm{UL}}(m, n)|^2$ taken from the UL equivalent channel, $\mathrm{diag}(\mathrm{diag}(A))$ means the diagonal matrix composed of the diagonal elements of $A$, and $I_{LK}$ is the $LK \times LK$ identity matrix. When fully digital processing is used at the transmitter and receiver based on the MMSE criterion as in [42] and [43], the DL precoding matrix defined as $V P_0^{1/2}$ satisfies the transmit power constraint, i.e. $\|V P_0^{1/2}\|_F^2 \le P$. When hybrid precoding is used, the transmit power constraint is changed to (17), as explained in Section III-B. Thus, the power allocation matrix $P$ needs to be designed so that the baseband precoder $F_B = V P^{1/2}$ satisfies the transmit power constraint (17). Unfortunately, the power allocation matrix $P_0$ obtained by (21) does not satisfy the power constraint for hybrid precoding. To tackle this issue, we propose a new procedure to adjust the transmit power of the hybrid precoder by scaling the diagonal matrix $P$. Specifically, given $V$ and $P_0$, the power allocation matrix conforming to (17) is obtained as in (22), where $\mu_2$ is a step-size parameter to control the speed of power adjustment and $P_s \in \mathbb{R}^{LK \times LK}$ is a scaled power allocation matrix.
¹ It is straightforward to extend the proposed MMSE-based rate balancing algorithm derived in this subsection to multiuser systems with different noise variances.
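The idea of rescaling the power allocation so that the concatenated precoder meets the total power constraint can be illustrated with a one-shot scaling. This is a simplified stand-in and not the update in (22), which additionally uses the step size $\mu_2$; the function name `scale_power` and the test data are assumptions.

```python
import numpy as np

def scale_power(F_R, V, P0, P_total):
    """Scale a diagonal power allocation P0 so that ||F_R V P^(1/2)||_F^2 = P_total."""
    # Element-wise sqrt equals the matrix square root for a nonnegative diagonal matrix
    gain = np.linalg.norm(F_R @ V @ np.sqrt(P0), "fro") ** 2
    return P0 * (P_total / gain)

rng = np.random.default_rng(0)
M, M_RF, KL, P_total = 16, 4, 4, 2.0
F_R = np.exp(1j * rng.uniform(-np.pi, np.pi, (M, M_RF))) / np.sqrt(M)
V = rng.standard_normal((M_RF, KL)) + 1j * rng.standard_normal((M_RF, KL))
V = V / np.linalg.norm(V, axis=0)                 # unit-norm columns, as required of V
P0 = np.diag(rng.uniform(0.1, 1.0, KL))

P_s = scale_power(F_R, V, P0, P_total)
P_B = np.trace(P_s)                               # baseband-equivalent transmit power
```

Because the columns of `F_R` are generally not orthonormal, `P_B = tr(P_s)` differs from `P_total`, which is why the baseband power budget has to be re-evaluated whenever `V` or `P0` changes.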
In a similar manner to (18), we can compute the DL receiver filters $\{U_k\}$ and scaling matrices $\{\beta_k\}$ as in (23), where $b^{\mathrm{DL}}_{k,\ell} = \sum_{m=1}^{N_{RF}} |B_k(m, \ell)|^2$ and $k = 1, 2, \cdots, K$. Again, when an optimal MMSE combiner is used, we get the DL MSE vector for all data streams, $\epsilon^{\mathrm{DL}}$, as in (24). From the dual expression of (21), the UL power allocation matrix $Q$ is given by (25).
Now, by modifying the user-MSE optimization technique in [42] and [43], we update the UL power allocation $Q$. Let $\xi_k$ be the MSE weight for user $k$. Considering MSE balancing among users, the weighted UL MSE optimization problem is formulated as (26), where $\epsilon^{\mathrm{UL}}_{w,k} = \mathrm{tr}(\Omega_k E^{\mathrm{UL}}_k)$ is the weighted UL MSE, $\Omega_k \in \mathbb{C}^{L \times L}$ is an MSE weight matrix for user $k$, $P_B = \mathrm{tr}(P)$ is the total transmit power allowed for the baseband precoding in the UL, and $E^{\mathrm{UL}}_k$ is given by (27) with $F = [F_1\ F_2\ \cdots\ F_K]$. We decompose $Q_k = \bar{q}_k \bar{Q}_k$, where $\bar{q}_k = \mathrm{tr}(Q_k)$ is the individual power allocation for user $k$ and $\bar{Q}_k$ is the normalized power allocation matrix for user $k$ satisfying $\mathrm{tr}(\bar{Q}_k) = 1$. For fixed $\{\bar{Q}_k\}$, we adjust $\{\bar{q}_k\}$ to update the UL power allocation for MSE balancing. Substituting this decomposition into the weighted MSE yields (28), where $a_k$, $b_{kj}$, and $c_k$ are given by (29). Here, $\bar{F}_k = \sqrt{\bar{q}_k}\, F_k$ and $\bar{W}_k = \frac{1}{\sqrt{\bar{q}_k}} W_k$. Define matrices $A$ and $C$ as in (30) and denote $\bar{q} = [\bar{q}_1, \bar{q}_2, \cdots, \bar{q}_K]^T$. Then, we can rewrite (28) in a vector-matrix form as (31), where $\epsilon^{\mathrm{UL}}_w = \mathrm{diag}(\epsilon^{\mathrm{UL}}_{w,1}, \epsilon^{\mathrm{UL}}_{w,2}, \cdots, \epsilon^{\mathrm{UL}}_{w,K})$. Let $\xi = \mathrm{diag}(\xi_1, \xi_2, \cdots, \xi_K)$. By multiplying both sides of (31) by $\xi^{-1}$, we obtain (32), where $\Delta^{\mathrm{UL}}$ is a constant at the optimal point of (26). To combine the first and second terms of the right-hand side of (32), we define $\bar{q}$ as a scaled version of $q$ in (33), where $q \in \mathbb{R}_+^{K \times 1}$ is an unconstrained power allocation vector. By replacing $\bar{q}$ with $q$ in (32), we arrive at the eigenvalue equation (34). Therefore, the optimal $q$ is given by the principal eigenvector corresponding to the maximum eigenvalue of $\xi^{-1}\left(A + \frac{\sigma^2}{P_B} C\, 1_{K \times K}\right)$, and the optimal vector $\bar{q}^o$ can be obtained by scaling the optimal $q$ using (33).
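The eigenvector-based power update can be sketched as below. The sketch assumes that $A$ has nonnegative entries, that $C = \mathrm{diag}(c)$ with positive $c$, and that the scaling in (33) normalizes the powers to sum to $P_B$; the function name `balance_powers` and the random inputs are illustrative, not the paper's actual $a_k$, $b_{kj}$, $c_k$.

```python
import numpy as np

def balance_powers(A, c, xi, sigma2, P_B):
    """Principal eigenvector of xi^{-1} (A + sigma2/P_B * C 1_{KxK}), scaled to sum P_B."""
    K = len(xi)
    # C 1_{KxK} with C = diag(c) is the outer product of c and the all-ones vector
    Mtx = np.diag(1.0 / xi) @ (A + (sigma2 / P_B) * np.outer(c, np.ones(K)))
    eigvals, eigvecs = np.linalg.eig(Mtx)
    idx = np.argmax(eigvals.real)                 # Perron root of a positive matrix
    q = np.abs(eigvecs[:, idx].real)              # Perron vector, made positive
    return P_B * q / q.sum(), eigvals[idx].real

rng = np.random.default_rng(0)
K, sigma2, P_B = 3, 0.1, 10.0
A = rng.uniform(0.1, 1.0, (K, K))                 # placeholder coupling matrix
c = rng.uniform(0.1, 1.0, K)                      # placeholder noise coefficients
xi = rng.uniform(0.5, 2.0, K)                     # placeholder MSE weights

q_bar, lam = balance_powers(A, c, xi, sigma2, P_B)
balanced = (A @ q_bar + sigma2 * c) / xi          # equals lam * q_bar at the solution
```

At the solution, the ratio of each entry of `balanced` to the corresponding entry of `q_bar` is the same eigenvalue `lam` for every user, which is the balancing condition.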
Also, we update the UL power allocation matrix as in (35), where $\bar{q}^o_k$ is the $k$th element of $\bar{q}^o$. Note that the normalized power allocation matrix $\bar{Q}_k$ is not changed but the individual power allocation $\bar{q}_k$ is updated for MSE balancing according to (26).
Finally, we formulate the max-min user rate optimization problem for ensuring rate balancing as follows:
$$\max_{F_B, \{W_{B,k}\}}\ \min_{1 \le k \le K} \frac{R_k}{\rho_k} \tag{36}$$
where $\rho_k$ is a weight for adjusting the achievable rate of user $k$. By defining the minimum ratio as a scaling factor $t$, we can write
$$\frac{R_k}{\rho_k} \ge t = \min_{1 \le j \le K} \frac{R_j}{\rho_j}. \tag{37}$$
From [43, Lemma 1], the DL achievable rate for user $k$ can be expressed as
$$R_k = \log_2 |\Omega_k| - \mathrm{tr}(\Omega_k E^{\mathrm{DL}}_k) + L \tag{38}$$
where $E^{\mathrm{DL}}_k \in \mathbb{C}^{L \times L}$ is the DL MSE matrix given by (39). By substituting (38) into (37), we can obtain
$$\log_2 |\Omega_k| - \mathrm{tr}(\Omega_k E^{\mathrm{DL}}_k) + L \ge t \rho_k \tag{40}$$
and by manipulating both sides of (40) and using $\epsilon^{\mathrm{DL}}_{w,k} = \mathrm{tr}(\Omega_k E^{\mathrm{DL}}_k)$, we have
$$\epsilon^{\mathrm{DL}}_{w,k} \le \xi_k \tag{41}$$
where $\xi_k = \log_2 |\Omega_k| + L - \bar{R}_k$ is the MSE weight and $\bar{R}_k = t \rho_k$ is the target rate for user $k$ (i.e. $R_k \ge \bar{R}_k$). Because the optimal MSE weight matrix is given by $\Omega_k = (E^{\mathrm{DL}}_k)^{-1}$, the maximum DL user rate is computed from (38) as follows:
$$R_k = \log_2 \left| (E^{\mathrm{DL}}_k)^{-1} \right| \tag{42}$$
and the variables for rate balancing are updated as in (43). The overall design procedure for baseband processing is summarized as Algorithm 2.

Algorithm 2: Proposed MMSE-based rate balancing for baseband processing (step numbers as in the original listing)
5. for $j = 1 : J$ do
9. Update the DL power allocation matrix using (22) and adjust the baseband transmit power as $P_B^{(i)} = \mathrm{tr}(P^{(i)})$.
10. Update the DL receiver filters $\{U_k^{(i)}\}$.
11. Obtain $\epsilon^{\mathrm{DL}}$ from (24) and calculate the UL power allocation matrix $Q^{(i)}$ using (25).
12. Find the optimal UL power allocation for MSE balancing $\bar{q}^{o,(i)}$ using (29)-(30) and the eigendecomposition of $(\xi^{(i-1)})^{-1}\left(A + \frac{\sigma^2}{P_B} C\, 1_{K \times K}\right)$.
16. until the termination condition is met, where $g(\{x_k\}) = \min(x_1, x_2, \cdots, x_K)$ and $\epsilon_2$ is the tolerance for termination.
17. Obtain the baseband precoder $F_B = V^{(i)} (P^{(i)})^{1/2}$ and the baseband combiners $\{W_{B,k}\}$.

As mentioned before, the concatenated hybrid precoder needs to meet the transmit power constraint in (17), so the initial transmit power $P_B^{(0)}$ is computed using (22). Notice that the proposed algorithm adjusts the baseband transmit power $P_B^{(i)}$ at the $i$th iteration with (22) whenever $V$ and $P_0$ are changed, whereas the MSE balancing method in [42] and the rate balancing scheme in [43] update the precoder and combiners under a fixed transmit power constraint. The convergence of Algorithm 2 will be shown through numerical simulations in Section V-A.
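The link between the MMSE matrix and the achievable rate, i.e. that the rate equals $\log_2 |E^{-1}|$ when the MMSE receiver is used, can be checked numerically. In this sketch, `A` denotes a user's effective stream channel and `C` an interference-plus-noise covariance; both are random placeholders chosen for illustration, and unit-power symbols are assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
N, L = 4, 2

# Effective channel of the L desired streams (placeholder)
A = (rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L))) / np.sqrt(2)
# Hermitian positive definite interference-plus-noise covariance (placeholder)
B = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
C = B @ B.conj().T + 0.5 * np.eye(N)

# MSE matrix of the MMSE receiver: E = I - A^H (A A^H + C)^{-1} A
E = np.eye(L) - A.conj().T @ np.linalg.solve(A @ A.conj().T + C, A)

# Rate from the MSE matrix, log2|E^{-1}| = -log2|E|
rate_from_mse = -np.linalg.slogdet(E)[1] / np.log(2)
# Rate from the usual log-det expression, log2|I + A^H C^{-1} A|
rate_direct = np.linalg.slogdet(np.eye(L) + A.conj().T @ np.linalg.solve(C, A))[1] / np.log(2)
```

The two values agree because $E = (I_L + A^H C^{-1} A)^{-1}$ by the matrix inversion lemma, which is why choosing the weight as the inverse MSE matrix turns the weighted-MSE constraint into a rate constraint.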

IV. COMPLEXITY ANALYSIS
This section compares the time complexity of the proposed algorithms with existing schemes. Firstly, Table 1 presents the complexity order for various matrix factorization methods including the proposed Algorithm 1. Here, $J_1$ is the number of iterations for each factorization method to satisfy the termination condition. The complexity order of the gradient projection (GP) method is identical to that of the alternating optimization (AO) scheme, namely $O(M^2 M_{RF})$. However, as shown in [18], the GP method requires less computational complexity than the AO scheme in numerical runtime simulations, because the AO approach necessitates more complicated procedures for updating the RF precoder. The proposed matrix factorization technique has the same complexity order as the GP method, yet requires slightly more computational load for orthogonal projection via SVD. When $M_{RF}$ is proportional to $M$, the BFGS scheme has the highest complexity order. Note that the complexity order is identical for all algorithms when $M_{RF}$ is fixed. In Section V-B, it is demonstrated through numerical simulations with fixed $M_{RF}$ that the runtime of the BFGS scheme is comparable to that of the AO scheme. Table 2 compares the complexity order of the proposed Algorithm 2 with those of existing design methods for the baseband precoder and combiners. Here, $J_2$ is the number of iterations for optimizing the power allocation in the ZF-based sum rate maximization (ZF-SRM) [35] and the ZF-based rate balancing (ZF-RB) [44], while it is the number of iterations for adjusting MMSE-based filters and RB-based power allocation in the MMSE-based design scheme [43] and the proposed Algorithm 2. $N_{ns} = M_{RF} - (K - 1)L$ denotes the dimension of the null space obtained by BD of effective channels in ZF-based techniques.
Whereas ZF-based schemes perform the BD procedure of multiuser channels only once, the MMSE hybrid method and the proposed algorithm iteratively compute the DL and UL filters in combination with power adjustment. In general, $J_2 M_{RF}$ exceeds $N_{ns} K$ and $M_{RF} > LK$ holds, and thus the MMSE hybrid method and the proposed algorithm require more computational complexity than the ZF-SRM and ZF-RB schemes. We compare the runtime of the overall hybrid processing algorithms in Section V-C.

V. SIMULATION RESULTS
Through numerical simulations, we present the convergence of the proposed algorithms in Section V-A, and the proposed Algorithm 1 is compared to conventional matrix factorization methods in Section V-B. In addition, the performance of the proposed hybrid processing with Algorithms 1 and 2 is compared with those of existing hybrid processing schemes under both perfect and imperfect CSI. The baseline schemes considered in the simulations are as follows:
• Fully digital MMSE processing [43]: the rate balancing technique in [43] is used to design the fully digital MMSE precoder and combiners for MU-MIMO systems. This method serves as the performance upper bound of MMSE-based precoding and combining in terms of maximizing the minimum user rate.
• Proposed MMSE-based hybrid method: the hybrid precoders and combiners are designed according to the proposed algorithms in Section III. The RF precoder F R and combiners {W R,k } are determined by Algorithm 1, and the baseband precoder F B and combiners {W B,k } are obtained by Algorithm 2.
• ZF-RB hybrid method [18], [44]: the RF precoder and combiners are designed by the matrix factorization method based on the GP method [18]. The baseband precoder and combiners are obtained by combining the BD technique with the power allocation method for rate balancing in [44].
• MMSE hybrid method [17], [43]: the RF precoder and combiners are designed by the matrix factorization method based on the AO method [17]. The baseband precoder and combiners are designed by the MMSE-based iterative algorithm in [43] and then scaled by a constant to meet the transmit power constraint.
• Corr.-based MMSE hybrid method [34]: following the approach in [34], the RF precoder and combiners are jointly constructed by sequentially selecting the beamformer and combiner with the maximum correlation from predetermined codebooks. Algorithm 2 is used to design the baseband precoder and combiners.
• ZF-SRM hybrid method [18], [35]: the RF precoder and combiners are designed by the matrix factorization method based on the GP method [18]. The baseband precoder and combiners are obtained by the BD technique with the water-filling algorithm for sum-rate maximization in [35].
• Random RF processing: the RF precoder and combiners are defined as random matrices whose elements have a constant magnitude and random phases. Algorithm 2 is used to design the baseband precoder and combiners. This scheme serves as the performance lower bound of RF precoding and combining.
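To make the random RF processing baseline concrete, the following sketch generates a matrix with constant-magnitude, random-phase entries. It is only an illustration: the function name, the uniform phase distribution over $[-\pi, \pi]$, and the $1/\sqrt{M}$ magnitude are assumptions, since the text specifies only a constant magnitude and random phases.

```python
import cmath
import math
import random


def random_rf_matrix(num_antennas, num_rf_chains, seed=None):
    """Return an M x M_RF list-of-lists with constant-modulus entries.

    Assumptions (not from the paper): phases are uniform on [-pi, pi]
    and every entry has magnitude 1/sqrt(M).
    """
    rng = random.Random(seed)
    magnitude = 1.0 / math.sqrt(num_antennas)
    return [
        [magnitude * cmath.exp(1j * rng.uniform(-math.pi, math.pi))
         for _ in range(num_rf_chains)]
        for _ in range(num_antennas)
    ]


# Example with M = 32 transmit antennas and M_RF = 12 RF chains.
F_R = random_rf_matrix(num_antennas=32, num_rf_chains=12, seed=0)
print(len(F_R), len(F_R[0]), abs(F_R[0][0]))
```

Because the phases are drawn independently of the channel, such a matrix provides no beamforming gain, which is why this baseline is treated as a lower bound.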
To generate the mmWave MU-MIMO channels, we set the relevant parameters as follows: the carrier frequency is 28 GHz; the number of clusters is 3; the number of subpaths per cluster is 8; the angle-of-departure (AoD) and angle-of-arrival (AoA) for each cluster are uniformly distributed from −π to π in the azimuth direction and from −0.5π to 0.5π in the elevation direction; the subpath angular spread is set to π/64 and π/16 for the transmitter and receiver, respectively, and is assumed to be identical for the azimuth and elevation directions; and the inter-element spacing is equal to half a wavelength for both the transmitter and receiver. Considering the distance variation from the transmitter to the receivers in MU-MIMO systems, the average channel gains are set asymmetrically with 10 dB deviation. Specifically, the average channel gain of user $k$ is governed by $\zeta_k$, a random variable uniformly distributed in the range of (0.1, 1.0). The nominal signal-to-noise ratio (SNR) is defined as the total transmit power over the noise variance, i.e., $P/\sigma^2$. The convergence results in Figs. 5-7 are obtained from a single channel realization, while the results in Figs. 8-14 are obtained by averaging the minimum user rate or runtime over more than 100 independent channel realizations.
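The two scalar settings in this setup can be checked numerically. The sketch below draws the per-user factors $\zeta_k$ from the range (0.1, 1.0), confirms that this range corresponds to a 10 dB spread, and converts a nominal SNR in dB to a noise variance through $\mathrm{SNR} = P/\sigma^2$. The helper names are hypothetical, and the sketch does not reproduce the cluster-based channel model itself.

```python
import math
import random


def sample_gain_factors(num_users, seed=None):
    """Draw one factor zeta_k per user, uniformly between 0.1 and 1.0."""
    rng = random.Random(seed)
    return [rng.uniform(0.1, 1.0) for _ in range(num_users)]


def gain_spread_db(low=0.1, high=1.0):
    """Largest possible ratio between two gain factors, in dB."""
    return 10.0 * math.log10(high / low)


def noise_variance(total_power, snr_db):
    """Solve SNR = P / sigma^2 for sigma^2, with the SNR given in dB."""
    return total_power / (10.0 ** (snr_db / 10.0))


zeta = sample_gain_factors(num_users=4, seed=0)
print(zeta)
print(gain_spread_db())           # spread of the range (0.1, 1.0) in dB
print(noise_variance(1.0, 20.0))  # sigma^2 for P = 1 and SNR = 20 dB
```

With $P = 1$ and a nominal SNR of 20 dB, the noise variance is 0.01.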

A. CONVERGENCE OF PROPOSED ALGORITHMS
This subsection verifies the convergence of Algorithms 1 and 2 through numerical simulations when the CSI is perfectly known to the transmitter. Fig. 5 shows the convergence behavior of the proposed matrix factorization scheme described as Algorithm 1, when $K = 1$, $M_{\mathrm{RF}} = 3$, and SNR = 20 dB. The blue curves represent the cost function $f$ defined in (9) at iteration $i$, i.e., the squared Frobenius norm of the error matrix, and the red curves denote the achievable rate obtained by the factorized hybrid precoding matrices when the receiver uses the optimal fully digital combining matrix. In all cases, as the number of iterations increases, the cost function gradually decreases while the achievable rate grows rapidly. Specifically, the cost function converges to a steady state after about 300 iterations for all antenna configurations, and the steady-state value increases with the number of transmit and receive antennas because the number of elements in $F_o$ is proportional to the number of transmit antennas $M$. The achievable rate converges faster than the cost function, reaching a near-peak value after about 100 iterations regardless of the number of antennas.
To show the convergence of the rate balancing algorithm summarized as Algorithm 2, we present the change of user rates according to the number of iterations in Figs. 6 and 7, when $P = 1$, $K = 4$, $M = 32$, and SNR = 20 dB or SNR = 5 dB. Here, the RF precoder and combiners were designed by decomposing the fully digital precoder $F_o$ and the fully digital combiners $\{W_{o,k}\}$, respectively, using the proposed matrix factorization algorithm. At every iteration, the red curve represents the instantaneous total transmit power defined as $P_t = \|F_R F_B\|_F^2 = \|F_R V^{(i)} (P^{(i)})^{1/2}\|_F^2$, and the other curves denote the achievable rates of the four users. The instantaneous transmit power $P_t$ converges to the maximum transmit power $P = 1$ as the number of iterations increases. Whereas large rate variations appear among users during the initial transient period, the user rates gradually converge to a common steady-state value after 150 iterations in Fig. 6 and 100 iterations in Fig. 7. In general, the achievable rates of users tend to converge faster in the low SNR region than in the high SNR region, because the steady-state user rate is lower in the low SNR region.
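The instantaneous transmit power tracked in Figs. 6 and 7 is the squared Frobenius norm of the product of the RF and baseband precoders. A minimal sketch of that computation is given below; the helper names and the toy $2 \times 2$ matrices are illustrative assumptions and are not taken from the simulations.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows (complex entries allowed)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def frobenius_norm_sq(a):
    """Squared Frobenius norm: sum of squared magnitudes of all entries."""
    return sum(abs(x) ** 2 for row in a for x in row)


def transmit_power(f_rf, f_bb):
    """P_t = ||F_R F_B||_F^2 for an RF precoder F_R and baseband precoder F_B."""
    return frobenius_norm_sq(matmul(f_rf, f_bb))


# Toy example: F_R has constant-modulus entries and orthonormal columns.
s = 1.0 / math.sqrt(2.0)
F_R = [[s, s],
       [s, -s]]
F_B = [[0.6],
       [0.8]]
P_t = transmit_power(F_R, F_B)
print(P_t, frobenius_norm_sq(F_B))
```

In this toy case the columns of $F_R$ are exactly orthonormal, so $P_t$ equals $\|F_B\|_F^2$. In general $P_t$ depends on $F_R$ as well as $F_B$, so the power has to be evaluated on the product rather than on $F_B$ alone.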

B. PERFORMANCE COMPARISON OF MATRIX FACTORIZATION METHODS
Various matrix factorization techniques for designing the RF precoder and combiners are evaluated in terms of the minimum user rate and runtime. Specifically, the proposed Algorithm 1 is compared with existing matrix factorization methods such as the AO algorithm [17], the GP method [18], and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [21]. Algorithm 2 was used in all cases to design the baseband precoder and combiners. Fig. 8 compares the minimum user rate according to SNR, when $K = 4$ and $M = 32$. The fully digital MMSE processing denotes the performance upper bound achieved by the fully digital precoding and combining in [43]. In combination with a proper matrix factorization algorithm for RF processing, the baseband precoder and combiners are designed using the proposed Algorithm 2 in the proposed hybrid method and the MMSE-based iterative scheme in the MMSE hybrid method. In the proposed hybrid method, the proposed Algorithm 1 outperforms the conventional factorization schemes such as the AO algorithm, the GP method, and the BFGS algorithm, while achieving a minimum user rate comparable to that of the fully digital MMSE processing, which serves as the upper bound. On the other hand, the AO algorithm obtains the highest minimum user rate in the MMSE hybrid method. When a ZF-based method is used to design the baseband precoder and combiners, the inter-user interference is completely removed, and the performance is less sensitive to the matrix factorization method than to the power allocation scheme that ensures rate balancing among users. For this reason, we use the GP method, which has the lowest complexity, for the ZF-based baseline schemes such as the ZF-RB and ZF-SRM hybrid methods.
To compare the computational complexity of various matrix factorization methods, Fig. 9 presents the average runtime versus the number of transmit antennas when $K = 4$ and SNR = 20 dB. The average runtime was measured using software implemented in MATLAB R2022a on a server with an i7-12700 4.9 GHz CPU, 16 GB RAM, and a 64-bit operating system, and every point was obtained by averaging the execution time over more than 100 independent channel realizations. Since $M_{\mathrm{RF}} = K N_{\mathrm{RF}} = 12$ irrespective of the number of transmit antennas, all matrix factorization algorithms have the complexity order $O(M^2)$ according to Table 1. In Fig. 9, the GP method requires the lowest runtime regardless of the number of transmit antennas. As mentioned in Section IV, the proposed matrix factorization algorithm requires slightly more computations for the orthogonal projection than the GP method, and thus Algorithm 1 has a slightly larger runtime than the GP method. Note, however, that the proposed method achieves at least 4 dB SNR gain over the GP method in Fig. 8. Moreover, because the AO and BFGS methods require more complicated procedures for updating the RF precoder than Algorithm 1, the runtime of the proposed method is only 2.2–10.8% and 7.0–31.1% of those of the AO and BFGS algorithms, respectively.
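The runtime figures above are averages of the execution time over independent channel realizations, reported as a percentage of another method's runtime. The sketch below shows that measurement pattern in Python rather than MATLAB. It is not the authors' benchmarking code: `average_runtime`, `runtime_ratio_percent`, and the placeholder workload are hypothetical.

```python
import time


def average_runtime(func, inputs):
    """Average wall-clock time of func(x) over all x in inputs, in seconds."""
    total = 0.0
    for x in inputs:
        start = time.perf_counter()
        func(x)
        total += time.perf_counter() - start
    return total / len(inputs)


def runtime_ratio_percent(runtime_a, runtime_b):
    """Runtime of method A expressed as a percentage of method B's runtime."""
    return 100.0 * runtime_a / runtime_b


def placeholder_workload(n):
    """Stand-in for one factorization run; NOT an algorithm from the paper."""
    return sum(i * i for i in range(n))


# 100 "realizations" of a fixed-size placeholder problem.
avg = average_runtime(placeholder_workload, [10_000] * 100)
print(avg)
print(runtime_ratio_percent(0.022, 1.0))
```

Averaging over many realizations reduces the influence of timer resolution and background load on any single run.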

C. PERFORMANCE EVALUATION UNDER PERFECT CSI
This subsection compares the minimum user rate of the proposed method with those of existing hybrid processing schemes when perfect CSI is available at the transmitter. The minimum user rate is presented for various hybrid processing schemes according to the SNR in Fig. 10, the number of users in Fig. 11, and the number of transmit antennas in Fig. 12. The proposed MMSE-based hybrid method performs very close to the fully digital MMSE processing, which attains the performance upper bound, irrespective of the SNR region, the number of users, and the number of transmit antennas. The proposed method mitigates the inter-user interference by designing the baseband precoder and combiners in the MMSE sense, whereas the ZF-based techniques force the baseband precoder to completely remove the inter-user interference through additional constraints. Thus, the proposed method achieves a higher minimum user rate than the ZF-RB and ZF-SRM hybrid methods, except when there is no inter-user interference, as for $K = 1$ in Fig. 11. As the number of users increases in Fig. 11, the minimum user rate decreases faster in the ZF-based methods than in the MMSE-based schemes, because the MMSE-based schemes mitigate the inter-user interference more effectively than the ZF-based methods. In Fig. 10, the MMSE hybrid method shows a slightly lower minimum user rate than the ZF-RB method due to the performance loss of the scaling procedure used to comply with the transmit power constraint. The ZF-RB hybrid method exhibits much better performance than the ZF-SRM hybrid method, because the power allocation is conducted for rate balancing in the ZF-RB method and for sum-rate maximization in the ZF-SRM method. Moreover, the proposed MMSE-based hybrid method outperforms the corr.-based MMSE hybrid method and the random RF processing in all cases. In Fig. 12, the performance difference between the proposed scheme and the ZF-based hybrid methods decreases as the number of transmit antennas increases, due to the reduction of inter-user interference. Also, notice that the minimum user rate of the random RF processing is almost the same regardless of the number of transmit antennas, because the RF precoder does not achieve beamforming gains.
Fig. 13 presents the average runtime of the overall hybrid processing methods according to the number of transmit antennas when $K = 4$ and SNR = 10 dB. We used the same server as in Fig. 9 for measuring the average runtime. The overall hybrid processing method is composed of the matrix factorization and the design procedure for the baseband precoder and combiners in Tables 1 and 2   method. In contrast, the complexities of the matrix factorization and the baseband design procedure are comparable in the MMSE hybrid method, and thus its complexity order is given by $O(J_1 M^2 M_{\mathrm{RF}}) + O(J_2 M_{\mathrm{RF}}^3)$. When the parameters are set as $L = 2$, $K = 4$, and $M_{\mathrm{RF}} = K N_{\mathrm{RF}} = 12$, the complexity order is given by $O(J_1 M^2) + O(J_2)$ for the MMSE hybrid method and $O(J_2)$ for the other methods. In Fig. 13, the average runtime increases with $M$ in the MMSE hybrid method, whereas the runtime is almost the same irrespective of the number of transmit antennas in the other hybrid processing schemes, including the proposed method. The proposed method has a higher runtime than the ZF-based methods, because the MMSE-based baseband design requires more computations than the ZF-based design, as shown in Table 2.

D. PERFORMANCE EVALUATION UNDER CSI UNCERTAINTY
Considering CSI errors in practical systems, we compare the performance of various hybrid processing methods. CSI uncertainty is caused by channel estimation errors and/or outdated CSI in time-varying channels. The channel with CSI errors, $\hat{H}_k$, is modeled in (44) in terms of a CSI error matrix in $\mathbb{C}^{N \times M}$ for user $k$, whose elements are i.i.d. complex Gaussian random variables with zero mean, for $k = 1, 2, \ldots, K$. To describe the power of CSI errors relative to the channel power gains, we define the normalized MSE (NMSE). Applying $\{\hat{H}_k\}$ in (44) instead of $\{H_k\}$ as the input of Fig. 2 and changing the precoders and combiners in (3) and (4) into $\hat{F}_R$, $\hat{F}_B$, $\{\hat{W}_{R,k}\}$, and $\{\hat{W}_{B,k}\}$, respectively, we can compute $\hat{R}_k$, i.e., the achievable rate for user $k$ under CSI uncertainty. Fig. 14 shows the minimum user rate of various hybrid processing schemes according to the NMSE, when $K = 4$, $M = 32$, and SNR = 25 dB. For simplicity, we assume that the NMSE is identical for all users. As expected, the minimum user rate gradually decreases as the NMSE increases for all processing methods. As in the perfect CSI scenarios, the proposed MMSE-based hybrid method performs better than the ZF-RB and ZF-SRM hybrid methods irrespective of the NMSE, while achieving a minimum user rate very close to that of the fully digital MMSE processing. Moreover, the proposed scheme obtains large gains in the minimum user rate compared to the corr.-based MMSE hybrid method and the random RF processing.
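One common way to realize such a CSI error model is sketched below. It is an assumption-laden illustration, not the paper's exact formulation: it assumes an additive model (estimate = true channel + error), i.i.d. circularly symmetric complex Gaussian error entries, and an NMSE defined as the expected error power divided by the channel power $\|H_k\|_F^2$. All function names are hypothetical.

```python
import math
import random


def frobenius_norm_sq(a):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(abs(x) ** 2 for row in a for x in row)


def complex_gaussian(rng, variance):
    """Zero-mean circularly symmetric complex Gaussian sample with given variance."""
    s = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def add_csi_error(h, nmse, rng):
    """Return (h_hat, err) with h_hat = h + err (assumed additive model).

    The per-entry error variance is chosen so that
    E[||err||_F^2] / ||h||_F^2 equals the target NMSE (assumed definition).
    """
    n_rows, n_cols = len(h), len(h[0])
    err_var = nmse * frobenius_norm_sq(h) / (n_rows * n_cols)
    err = [[complex_gaussian(rng, err_var) for _ in range(n_cols)]
           for _ in range(n_rows)]
    h_hat = [[h[i][j] + err[i][j] for j in range(n_cols)]
             for i in range(n_rows)]
    return h_hat, err


rng = random.Random(1)
# Toy N x M channel (N = 16, M = 64) with i.i.d. entries, for illustration only.
H = [[complex_gaussian(rng, 1.0) for _ in range(64)] for _ in range(16)]
H_hat, E = add_csi_error(H, nmse=0.1, rng=rng)
empirical_nmse = frobenius_norm_sq(E) / frobenius_norm_sq(H)
print(empirical_nmse)
```

For a single realization the empirical NMSE only approximates the target value; the match tightens as the matrix grows or as results are averaged over realizations.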

VI. CONCLUSION
A new MMSE-based design method was proposed for hybrid processing in the downlink of mmWave MU-MIMO systems that computes the RF precoder and combiners using the proposed matrix factorization algorithm and obtains the baseband precoder and combiners via the proposed rate balancing algorithm. Considering the matrix concatenation for hybrid precoding, the proposed matrix factorization scheme makes the columns of the RF precoder near-orthogonal and the proposed rate balancing algorithm adjusts the internal transmit power for baseband precoding. Various numerical simulations demonstrate that the proposed method performs better than existing hybrid processing techniques in terms of maximizing the minimum user rate with reasonable computational complexity.
The proposed method can be utilized to design the hybrid precoders and combiners for future 5G-Advanced and 6G mobile systems with large-scale antenna arrays deployed in the mmWave and Terahertz bands. In addition, the proposed matrix factorization scheme for constant-modulus RF processing can be applied to a wireless communication link with an intelligent reflecting surface (IRS), which enhances the link performance by controlling the phase shifts of the IRS elements.