Near-Optimal Design for Hybrid Beamforming in mmWave Massive Multi-User MIMO Systems

Millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems can obtain sufficient beamforming gains to combat severe path loss in signal propagation. Hybrid (analog/digital) beamforming with multiple data streams can be utilized to further improve mmWave spectral efficiency. In this paper, we focus on the hybrid beamforming design of a downlink mmWave massive multi-user MIMO (MU-MIMO) system based on the fully-connected structure, and aim to maximize the sum rate of the overall system. In the analog beamforming stage, a piecewise successive iterative approximation (PSIA) algorithm is proposed to design the analog beamformer and combiner. This algorithm not only has a linear property, but also yields closed-form solutions. In the digital beamforming stage, the piecewise successive approximation method is utilized to design the digital beamformer and combiner based on the criterion of avoiding the loss of information, which reduces the computational complexity and is simple to implement. The results show that the proposed scheme achieves good sum-rate performance in the mmWave massive MU-MIMO system, and outperforms the state-of-the-art MIMO hybrid beamforming design schemes, even when the number of base station antennas is not very large.

In addition, the high directivity can help the design of beamforming techniques to direct the signal in a certain direction, which overcomes the high PL problem and establishes reasonable signal-to-noise ratio (SNR) links [10].
Although the principles of beamforming are the same regardless of the carrier frequency, signal processing in mmWave systems is subject to a set of important practical constraints. For example, traditional MIMO systems often perform digital linear beamforming at baseband, which enables control of both the signal's phase and amplitude and supports multi-stream multi-user communications [11], [12]. However, baseband (digital) beamforming requires not only a dedicated baseband processor, but also radio frequency (RF) hardware and an analog-to-digital converter (ADC) for each antenna element. When a large number of antennas is deployed, the hardware cost, complexity, and power consumption of a digital beamforming architecture become too high to be affordable in practice. This forces mmWave systems to rely heavily on analog, or RF, beamforming processing [13]. Both beamforming and combining in analog processing are implemented by a network of analog phase shifters that control the phase of the signal at each antenna element in the RF domain; this approach has been applied to mmWave wireless local area network (WLAN) systems to provide a simpler architecture [14], [15]. Compared with digital beamforming, the substantially reduced number of RF chains brings implementation benefits in terms of lower hardware complexity and lower power consumption. However, analog beamforming is subject to additional constraints: spatial multiplexing of the beams is impossible, since an RF chain shapes only one beam at a time and therefore supports only a single data stream, which cannot improve the spectral efficiency. Moreover, the phase shifters are controlled digitally and provide only quantized phase values. These constraints limit the development of analog beamforming.
For a better tradeoff between the performance and costs, a hybrid beamforming approach that combines analog and digital beamforming is proposed for the mmWave massive MIMO systems. Since the hybrid beamforming achieves similar performance to the full-digital one with much lower power consumption and hardware complexity, it has attracted a great deal of research attention [16], [17].
The achievable sum rate and the mean squared error (MSE) are two important optimization objectives for the hybrid beamforming design problems [18], [19]. Since the former is an important performance evaluation standard in mmWave systems, the design aims to maximize the sum rate for hybrid beamforming in this paper. However, many challenges have arisen from maximizing data rates under the constraints derived from the hybrid architecture. Currently, this problem is solved by two methods in the existing research work. One is to jointly design the analog and digital beamformer/combiner and the other is a two-stage method, wherein the analog beamformer/combiner is designed separately from the digital beamformer/combiner.
The joint design methods are widely used for hybrid beamforming to approach full-digital performance for single-user MIMO (SU-MIMO) and multi-user MIMO (MU-MIMO) scenarios. For SU-MIMO systems, by fully exploiting the sparsity of the channel, a least square (LS) and an approach utilizing matching pursuit (MP) can decompose full-digital beamforming into a separate analog and digital beamforming for mmWave channels [20], [21]. Based on employing an iterative algorithm and approaching the non-convex optimization with a convex problem, the method aiming at full-digital single-user solutions can be utilized to jointly design an analog and digital precoder/combiner [22]. For MU-MIMO systems, a weighted sum mean square error (WSMSE) minimization approach is proposed to jointly design analog and digital beamforming, which aims to approximate the performance of the block diagonalization (BD) solution for full-digital beamforming [23]. An over-sampling codebook (OSC)-based hybrid minimum sum-mean-square-error (min-SMSE) precoding scheme for mmWave MU-MIMO systems to optimize the BER is proposed in [24].
The two-stage method is widely utilized for designing hybrid beamforming for MU-MIMO communications to approximate the capacity. Specifically, most MU schemes harvest energy based on the channel matrix in the analog stage, and then eliminate the inter-user interference in the following digital stage with a baseband beamformer that takes the influences of the channel matrix and the RF beamformer into account. A widely used method is zero forcing (ZF). For example, a low-complexity hybrid BD scheme is proposed to harvest the large array gain through the RF beamforming and combining, and then digital BD processing is performed by the generalized ZF in conjunction with an equal gain transmission (EGT) scheme [25]. A hybrid beamforming design method based on the Modified Generalized Low Rank Approximation of Matrices (MGLRAM) is proposed to perform the ZF combined with an iterative procedure [26]. Further, maximizing the sum rate of the equivalent baseband channel in the analog stage, followed by excluding inter-user interference in the digital stage, is investigated in [27] and [28]. The hybrid regularized channel diagonalization (HRCD) scheme proposed in [29] uses simple non-iterative processing for digital beamforming and the EGT method for analog beamforming. Furthermore, an approach based on the criterion of minimizing mean square error to maximize the signal-to-leakage-plus-noise ratio of each user in the digital stage is studied in [30]. Besides the above design criteria, the authors of [31] introduced a method that leverages BD technology to eliminate the inter-user interference in the analog stage and minimizes the mean square error of each data stream to harvest energy in the digital stage. Note that there are other schemes that utilize the two-stage method to design SU-MIMO communications.
For instance, a method that maximizes the single-user spectral rate by optimizing analog processing with fixed digital processing is proposed in [32]. A low-complexity iterative matrix decomposition based hybrid beamforming (IMD-HBF) scheme is proposed to obtain the optimal analog and digital solutions [33]. Among the above two-stage design schemes, hybrid beamforming can realize the optimal full-digital beamforming if the number of RF chains is at least twice the number of data streams, e.g., the method presented and discussed in [28], but this approach comes at the expense of high power consumption and cost. In addition, the method of leveraging the BD technology to eliminate the inter-user interference reduces the system performance, since the overlap of the row subspaces of the user channel matrices becomes significant when the number of users is large. Although the method presented in [33] outperforms the BD technology, it requires a high number of iterations, which increases the computational complexity of the system.
In this paper, we focus on the hybrid beamforming design of a mmWave massive MU-MIMO system with the fully-connected structure, where a single BS equipped with a large antenna array serves several multi-antenna multi-stream users. Employing the two-stage method and assuming perfect channel state information (CSI), we design the analog and digital beamformers/combiners. The main contributions of this paper can be summarized as follows.
• In the analog beamforming stage, we minimize the MSE between the equivalent analog transceiver signals to reduce the information loss of signals in the analog channel transmission, which serves the purpose of maximizing the mutual information of the analog transceiver signals. Meanwhile, a piecewise successive iterative approximation (PSIA) algorithm is proposed to design the analog beamforming in the internal iteration. The proposed design scheme can obtain the optimal saturation value with fewer iterations.
• In the digital beamforming stage, the piecewise successive approximation method is utilized to design the digital beamformer and combiner, based on the criterion of avoiding the loss of information at each stage. The complexity of the proposed design method is lower than that of the baseband BD technology, which is used to design the digital beamforming in state-of-the-art schemes such as EGT-BD [25], MGLRAM [26], HyEB [27], and HySBD [28]. In addition, the performance of the proposed design method is superior to the baseband BD technology in terms of eliminating the inner-user and inter-user interference.
• Under the condition that the number of RF chains is the same as the number of data streams, the proposed hybrid beamforming system outperforms state-of-the-art hybrid beamforming systems. Further, the proposed design scheme has higher and more stable power efficiency than the existing schemes. Even when the number of BS antennas is not very large, the proposed design also shows superior performance in terms of sum rate. In addition, the solutions obtained in this paper are closed-form ones.
The remainder of this paper is organized as follows. Section II briefly introduces the channel and system models of the mmWave massive MU-MIMO system. The original problem of the system design is formulated and discussed in Section III. Section IV presents and discusses the simulation results of the proposed hybrid beamforming design scheme. Finally, conclusions are drawn in Section V.
Notations: Bold upper-case and lower-case letters represent matrices and column vectors, respectively; $(\cdot)^{-1}$, $(\cdot)^{T}$, and $(\cdot)^{H}$ denote inversion, transpose, and conjugate transpose, respectively. The Frobenius norm of the matrix $\mathbf{A}$ and the 2-norm of the vector $\mathbf{a}$ are expressed as $\|\mathbf{A}\|_F$ and $\|\mathbf{a}\|_2$, respectively. $\mathbf{A}(i,j)$, $\mathbf{A}(:,j)$, and $\mathbf{A}(i,:)$ respectively denote the $(i,j)$th complex element, $j$th column vector, and $i$th row vector of matrix $\mathbf{A}$, and $|\mathbf{A}(i,j)|$ is the amplitude; $\mathbf{A}(i:j,:)$ and $\mathbf{A}(:,i:j)$ represent the matrices consisting of rows $i$ to $j$ and columns $i$ to $j$ of the matrix $\mathbf{A}$, respectively; $\mathbf{I}_N$ is the identity matrix of size $N \times N$; $\mathcal{CN}(0,\sigma^2)$ is the complex Gaussian distribution with mean 0 and variance $\sigma^2$; $\angle\mathbf{A}$ denotes the operation of getting the angle of each entry in matrix $\mathbf{A}$; $\mathbb{D}^{l\times l}$ and $\mathbb{C}^{m\times n}$ describe a real diagonal matrix of dimension $l \times l$ and a complex matrix of dimension $m \times n$, respectively; $\mathrm{tr}\{\cdot\}$ and $\mathrm{Re}(\cdot)$ indicate the trace and real-part operators, respectively; $\mathrm{vec}(\cdot)$ and $\mathrm{unvec}_{m\times n}(\cdot)$ are the vectorization and matricization, respectively; $(\cdot)_{\ell,\ell}$ denotes the $\ell$th diagonal element of a matrix. The expectation operator is denoted by $E[\cdot]$. The determinant and block diagonalization operation of a matrix are respectively expressed as $|\cdot|$ and $\mathrm{blk}(\cdot)$.

II. SYSTEM DESCRIPTION
In this section, we present the mmWave signal and channel model considered in this paper.

A. SYSTEM MODEL
Consider the downlink of the mmWave massive MU-MIMO system with the fully-connected subarray structure shown in Fig. 1, in which a BS serves $K$ users simultaneously. The BS is equipped with a large number, $N_{BS}$, of antennas and $M_{BS}$ RF chains, and each user is equipped with $N_{MS}$ antennas and $M_{MS}$ RF chains to support $N_S$ data streams in parallel. To enable multi-stream communication and reduce the complexity of the hardware, the number of RF chains is constrained by $KN_S \le M_{BS} \le N_{BS}$ for the BS and $N_S \le M_{MS} \le N_{MS}$ for each user. As can be seen in Fig. 1, the transmitted symbol vector $\mathbf{s}$ with the total transmit power constraint $P_t$ first passes through a diagonal power allocation matrix $\mathbf{P} \in \mathbb{D}^{KN_S \times KN_S}$, which distributes power to the transmitted symbols $\mathbf{s}_k$ of each user and satisfies $\|\mathbf{P}\|_F^2 = P_t$. Then, the symbol vector after power allocation is processed by a baseband beamformer $\mathbf{F}_{BB} \in \mathbb{C}^{M_{BS} \times KN_S}$. After beamforming in the baseband domain, an RF beamformer $\mathbf{F}_{RF} \in \mathbb{C}^{N_{BS} \times M_{BS}}$ is applied for analog beamforming. Therefore, the discrete-time transmitted signal is finally represented as
$$\mathbf{x} = \mathbf{F}_{RF}\mathbf{F}_{BB}\mathbf{P}\mathbf{s}.$$
For simplicity, we consider a block fading channel model [34], e.g., the narrowband flat fading channel model, which yields the received signal of each user
$$\mathbf{y}_k = \mathbf{H}_k\mathbf{F}_{RF}\mathbf{F}_{BB}\mathbf{P}\mathbf{s} + \mathbf{n}_k,$$
where $\mathbf{y}_k \in \mathbb{C}^{N_{MS} \times 1}$ is the $k$th received vector, $\mathbf{H}_k \in \mathbb{C}^{N_{MS} \times N_{BS}}$ is the channel matrix from the BS to the $k$th user, and $\mathbf{n}_k \in \mathbb{C}^{N_{MS} \times 1}$ is the corresponding complex additive white Gaussian noise vector, whose elements are independent and identically distributed (i.i.d.) complex Gaussian with zero mean and variance $\sigma^2$, i.e., $\mathbf{n}_k \sim \mathcal{CN}(0, \sigma^2)$. To enable beamforming, we assume that the CSI is known perfectly and instantaneously to both the BS and each user.
FIGURE 1. Block diagram of the mmWave massive MU-MIMO system with hybrid beamforming structure.
VOLUME 8, 2020
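The RF-chain constraints stated above can be checked mechanically before any beamformer is designed. The following is a minimal sketch, assuming a hypothetical `HybridArrayConfig` container (the class and field names are illustrative and do not come from the paper); the numerical values are the ones listed in the simulation setup of Section IV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridArrayConfig:
    """System dimensions (hypothetical helper, names are illustrative)."""
    n_bs: int  # number of BS antennas, N_BS
    m_bs: int  # number of BS RF chains, M_BS
    n_ms: int  # number of antennas per user, N_MS
    m_ms: int  # number of RF chains per user, M_MS
    n_s: int   # number of data streams per user, N_S
    k: int     # number of users, K

    def is_feasible(self) -> bool:
        # K*N_S <= M_BS <= N_BS at the BS, N_S <= M_MS <= N_MS at each user.
        bs_ok = self.k * self.n_s <= self.m_bs <= self.n_bs
        ms_ok = self.n_s <= self.m_ms <= self.n_ms
        return bs_ok and ms_ok


# Parameters of the simulation setup in Section IV.
paper_cfg = HybridArrayConfig(n_bs=256, m_bs=16, n_ms=16, m_ms=2, n_s=2, k=8)
print(paper_cfg.is_feasible())
```

With these values the BS operates with the least number of RF chains, since $KN_S = M_{BS} = 16$.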
In practical systems, CSI at the receiver can be obtained via training and then shared with the BS via limited feedback [35]. At the receiver, each user employs its analog phase shifters and digital combiner to obtain the processed received signal
$$\hat{\mathbf{s}}_k = (\mathbf{W}_{BB}^{k})^H (\mathbf{W}_{RF}^{k})^H \mathbf{H}_k\mathbf{F}_{RF}\mathbf{F}_{BB}\mathbf{P}\mathbf{s} + (\mathbf{W}_{BB}^{k})^H (\mathbf{W}_{RF}^{k})^H \mathbf{n}_k,$$
where $\mathbf{W}_{RF}^{k} \in \mathbb{C}^{N_{MS} \times M_{MS}}$ is the analog combining matrix and $\mathbf{W}_{BB}^{k} \in \mathbb{C}^{M_{MS} \times N_S}$ is the digital combining matrix for the $k$th user. Similar to analog beamforming, the analog combining is implemented by using phase shifters. Therefore, $\mathbf{W}_{RF}^{k}$ also satisfies the constant amplitude constraint $\left(\mathbf{W}_{RF}^{k}(:,i)\,\mathbf{W}_{RF}^{k}(:,i)^H\right)_{\ell,\ell} = N_{MS}^{-1}$. When Gaussian symbols are transmitted over the mmWave channel, the achievable sum rate is given by [36] where $k_i = (k-1)N_S + i$, $s_{k_i}$ is the $i$th entry of the signal $\mathbf{s}_k$ of the $k$th user sent by the BS, and $P_{k_i}$ is the corresponding power allocation. The first term on the right side of (4) indicates the desired signal, and the other three terms represent inner-user interference, inter-user interference, and noise, respectively. Hence, the sum rate in (3) can be rewritten as
$$R = \sum_{k=1}^{K}\sum_{i=1}^{N_S} \log_2\left(1 + \mathrm{SINR}_{k_i}\right),$$
where $\mathrm{SINR}_{k_i}$ is the signal-to-interference-plus-noise ratio (SINR) of the signal $\hat{s}_{k_i}$, which is the ratio of the desired signal energy in (4) to the interference-plus-noise energy of the remaining terms. The $\mathrm{SINR}_{k_i}$ is formulated as where $k \in \{1, 2, \ldots, K\}$, $i \in \{1, 2, \ldots, N_S\}$.
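The per-stream SINR and the resulting sum rate can be illustrated numerically. The sketch below is not the paper's expression; it assumes the end-to-end gains have already been collected in a square matrix `g`, where `g[r][c]` is the complex gain from transmitted stream `c` to combined output `r`, and that the post-combining noise power of each stream is supplied by the caller. Inner-user and inter-user interference are lumped into one term, since both enter the denominator in the same way. The function and argument names are illustrative.

```python
import math


def sum_rate(g, power, noise_power):
    """Sum rate in bit/s/Hz, computed as sum over streams of log2(1 + SINR).

    g[r][c]        : complex gain from transmitted stream c to output r
    power[c]       : power allocated to stream c
    noise_power[r] : noise power on output r after combining
    """
    total = 0.0
    for r, row in enumerate(g):
        desired = power[r] * abs(row[r]) ** 2
        # All other streams (same user or other users) act as interference.
        interference = sum(power[c] * abs(row[c]) ** 2
                           for c in range(len(row)) if c != r)
        sinr = desired / (interference + noise_power[r])
        total += math.log2(1.0 + sinr)
    return total


# Two interference-free streams at 0 dB SNR carry 1 bit/s/Hz each.
print(sum_rate([[1, 0], [0, 1]], [1.0, 1.0], [1.0, 1.0]))
```

When `g` is diagonal, the interference terms vanish and every stream behaves as an independent scalar channel.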

B. CHANNEL MODEL
The high free-space path loss of mmWave propagation leads to limited spatial selectivity or scattering, and traditional MIMO channel models cannot accurately reflect this characteristic. Similarly, the massive tightly-packed antenna arrays adopted in mmWave transceivers lead to high levels of antenna correlation. If the statistical fading distributions of traditional MIMO analysis are used in mmWave channel modeling, the model becomes inaccurate [11]. Therefore, we adopt a narrowband channel model with uniform linear arrays (ULAs), namely the extended Saleh-Valenzuela model, to capture the mathematical structure of the mmWave channel accurately [37]. We assume that the scattering channel has $N_c$ scattering clusters, each of which consists of $N_p$ propagation paths. Therefore, the discrete-time narrowband channel matrix of the $k$th user can be expressed as
$$\mathbf{H}_k = \gamma \sum_{i=1}^{N_c}\sum_{l=1}^{N_p} \alpha_{il}^{k}\, \Lambda_{MS}(\theta_{il}^{k})\, \Lambda_{BS}(\phi_{il}^{k})\, \mathbf{a}_{MS}(\theta_{il}^{k})\, \mathbf{a}_{BS}(\phi_{il}^{k})^H,$$
where $\gamma = \sqrt{N_{BS}N_{MS}/(N_c N_p)}$ is a normalization factor, and the channel matrix satisfies $E\left[\|\mathbf{H}_k\|_F^2\right] = N_{BS}N_{MS}$. $\alpha_{il}^{k}$ is the complex gain of the $l$th ray in the $i$th scattering cluster for the $k$th user, which follows the independent Gaussian distribution, i.e., $\alpha_{il}^{k} \sim \mathcal{CN}(0, 1)$. $\theta_{il}^{k}$ and $\phi_{il}^{k}$ represent the azimuth angles of arrival/departure (AoAs/AoDs) of the $l$th ray in the $i$th scattering cluster for the $k$th user, which obey the truncated Laplacian distribution [25]. The functions $\Lambda_{MS}(\theta_{il}^{k})$ and $\Lambda_{BS}(\phi_{il}^{k})$ denote the receive and transmit antenna array gains at the corresponding angles of arrival and departure. Finally, $\mathbf{a}_{MS}(\theta_{il}^{k})$ and $\mathbf{a}_{BS}(\phi_{il}^{k})$ are the normalized antenna array response vectors at the azimuth angles $\theta_{il}^{k}$ and $\phi_{il}^{k}$, respectively. For simplicity and without loss of generality, when both the BS and each user adopt ULAs, the array response vectors can be written as [37]
$$\mathbf{a}_{MS}(\theta_{il}^{k}) = \frac{1}{\sqrt{N_{MS}}}\left[1, e^{j\beta d \sin\theta_{il}^{k}}, \ldots, e^{j(N_{MS}-1)\beta d \sin\theta_{il}^{k}}\right]^T,$$
$$\mathbf{a}_{BS}(\phi_{il}^{k}) = \frac{1}{\sqrt{N_{BS}}}\left[1, e^{j\beta d \sin\phi_{il}^{k}}, \ldots, e^{j(N_{BS}-1)\beta d \sin\phi_{il}^{k}}\right]^T,$$
where $j = \sqrt{-1}$, $\beta = 2\pi/\lambda$, $\lambda$ is the carrier wavelength of the signal, and $d$ is the inter-element spacing, e.g., $d = \lambda/2$.
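To make the channel model concrete, the following sketch draws one clustered narrowband channel realization using only the Python standard library. It is an illustration under stated assumptions rather than the simulation code of the paper: the antenna element gains are set to one, the angles are drawn uniformly in $[0, 2\pi]$ (as in the simulation section) instead of from a truncated Laplacian, and the function names `ula_response` and `sv_channel` are hypothetical.

```python
import cmath
import math
import random


def ula_response(n_ant, angle, spacing_over_lambda=0.5):
    """Normalized ULA response vector, using beta * d = 2*pi*(d/lambda)."""
    phase_step = 2.0 * math.pi * spacing_over_lambda * math.sin(angle)
    scale = 1.0 / math.sqrt(n_ant)
    return [scale * cmath.exp(1j * phase_step * n) for n in range(n_ant)]


def sv_channel(n_ms, n_bs, n_clusters, n_rays, rng):
    """One N_MS x N_BS channel draw (unit antenna gains, uniform angles)."""
    gamma = math.sqrt(n_bs * n_ms / (n_clusters * n_rays))
    h = [[0j] * n_bs for _ in range(n_ms)]
    for _ in range(n_clusters * n_rays):
        # CN(0, 1): real and imaginary parts each have variance 1/2.
        alpha = complex(rng.gauss(0.0, math.sqrt(0.5)),
                        rng.gauss(0.0, math.sqrt(0.5)))
        a_ms = ula_response(n_ms, rng.uniform(0.0, 2.0 * math.pi))
        a_bs = ula_response(n_bs, rng.uniform(0.0, 2.0 * math.pi))
        for r in range(n_ms):
            for c in range(n_bs):
                # Rank-one contribution alpha * a_MS * a_BS^H of this ray.
                h[r][c] += gamma * alpha * a_ms[r] * a_bs[c].conjugate()
    return h


h_k = sv_channel(n_ms=4, n_bs=8, n_clusters=2, n_rays=3, rng=random.Random(0))
print(len(h_k), len(h_k[0]))
```

Because each response vector has unit norm and the ray gains have unit variance, the factor `gamma` gives an expected squared Frobenius norm of $N_{BS}N_{MS}$ under these assumptions.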

III. MULTIUSER HYBRID BEAMFORMING DESIGN
This section discusses the hybrid beamforming design of the downlink mmWave massive MU-MIMO system with the fully-connected structure. The design goal is to maximize the sum rate of the system expressed in (3); hence, the optimization problem can be formulated as
$$\max_{\mathbf{F}_{RF},\mathbf{F}_{BB},\mathbf{W}_{RF},\mathbf{W}_{BB},\mathbf{P}} R \quad \mathrm{s.t.}\ \ \mathbf{F}_{RF} \in \mathcal{F}_{RF},\ \ \mathbf{W}_{RF} \in \mathcal{W}_{RF},\ \ \|\mathbf{F}_{RF}\mathbf{F}_{BB}\|_F^2 = KN_S,\ \ \|\mathbf{P}\|_F^2 = P_t, \qquad (10)$$
where $\mathcal{F}_{RF}$ and $\mathcal{W}_{RF}$ are the feasible sets of $\mathbf{F}_{RF}$ and $\mathbf{W}_{RF}$, respectively, i.e., the sets of matrices with constant-modulus entries. Since both the objective function and the constraints are nonconvex, the original problem in (10) is nonconvex. Solving (10) directly requires jointly optimizing the five matrix variables $(\mathbf{F}_{RF}, \mathbf{F}_{BB}, \mathbf{W}_{RF}, \mathbf{W}_{BB}, \mathbf{P})$, and finding the global optimum of joint optimization problems with similar constraints is intractable for the MU-MIMO case [38]. Therefore, we utilize the two-stage design method to obtain the analog and digital beamforming solutions of the original problem, thereby reducing the difficulty of the solution process. The proposed PSIA method is utilized to obtain closed-form solutions for $\mathbf{F}_{RF}$ and $\mathbf{W}_{RF}$, where each total iteration includes one external and $m$ internal iterations. Then, the equivalent baseband channel $\bar{\mathbf{H}} = \mathbf{W}_{RF}^H\mathbf{H}\mathbf{F}_{RF}$ is exploited to design the digital beamformer $\mathbf{F}_{BB}$ and combiner $\mathbf{W}_{BB}$. Finally, the power allocation matrix $\mathbf{P}$ is designed by using waterfilling.
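The final waterfilling step can be sketched for a set of parallel, interference-free streams. The example below is a textbook waterfilling routine, not code from the paper; it assumes that the effective power gain of each stream (for instance, the squared magnitude of the corresponding diagonal entry of the total equivalent channel) is available in `gains`, and the function name is illustrative. Since $\|\mathbf{P}\|_F^2 = P_t$, the diagonal entries of $\mathbf{P}$ would be the square roots of the returned powers.

```python
def water_filling(gains, total_power, noise_var=1.0):
    """Return p with p[i] = max(mu - noise_var / gains[i], 0), sum(p) = total_power.

    gains must contain strictly positive power gains.
    """
    # Visit channels from strongest to weakest.
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    active = len(order)
    mu = 0.0
    while active > 0:
        inv = [noise_var / gains[i] for i in order[:active]]
        mu = (total_power + sum(inv)) / active   # candidate water level
        if mu - inv[-1] > 0:                     # weakest active channel gets power
            break
        active -= 1                              # drop it and recompute the level
    power = [0.0] * len(gains)
    for i in order[:active]:
        power[i] = mu - noise_var / gains[i]
    return power


print(water_filling([4.0, 1.0], total_power=1.0))
```

For the gains 4 and 1 with unit noise variance and unit total power, the water level is 1.125, so the stronger stream receives 0.875 and the weaker one 0.125.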

A. DESIGN OF INITIAL ANALOG COMBINING MATRIX
Assuming the signal transmission mode can be described as To avoid the data loss, we firstly utilize the minimum MSE (MMSE) between x and x in the analog stage as the objective function to obtain the analog combining matrix W RF_init with the optimal phase. Then W RF_init is used as the initial value of the analog beamforming design to maximize the baseband channel capacity, so as to reduce the number of internal iterations of the design in the analog stage.
According to the previous analysis, the analog combiner W RF_init is not only constrained to be constant-modulus, but also constrained to be block-diagonal. Therefore, the MMSE-based analog combining problem is formulated as follows It is worth noting that the expectation operation in (11) is difficult to handle. For this reason, we first assume the analog beamformer F RF is fixed; then the objective function in (11) can be rewritten as where y x = HF RF x + n. By introducing a constant matrix (12) can be re-expressed as SinceWE y x y H x W H is a constant value, (12) can be reformulated by using (13) as follows It can be seen from (14) that the solution of the objective function is independent of the constant term. By removing the constant term in (14), the minimization problem in (12) becomes equivalent to solving the following problem Although (11) has been transformed into a problem with no expectation operation, there are still other constraints in (P). To tackle the block-diagonal constraint on W RF_init , the matrix operator can be converted into a vector operator based on the following property of the Kronecker product: $\mathrm{vec}(\mathbf{X}\mathbf{Y}\mathbf{Z}) = (\mathbf{Z}^T \otimes \mathbf{X})\,\mathrm{vec}(\mathbf{Y})$, where ⊗ represents the Kronecker product between two matrices. Meanwhile, to satisfy the constant-modulus constraints, we initialize W RF_init (i, j) = 1. The problem (P) can be reformulated as where d = vec W R 1 2 x and A = R 1 2 x H ⊗ I K M MS . It is observed from (16) that the variable w RF_init has many zero elements. Since these zero elements do not contribute anything to the matrix multiplication, they can be removed from w RF_init [18]. The new variable vector is given bŷ (16) is eliminated. Therefore, (16) can be rewritten as whered andÃ are respectively the vector and matrix generated from d and A after removing the columns corresponding to the zero elements in w RF_init ,x =ŵ RF_init , and the vector "1" denotes the all-one vector.
Inspired by the SCF algorithm [39], the constant-modulus constraint can be eliminated by considering (18) in real domain. Hence, consider the sequence of constraint with wherex (n) l , l = 1, 2, . . . , L is the lth element ofx (n) , L = K N MS M MS , and (n) denotes the nth optimization procedure. Replacing the constant-modulus constraint in (18) by (19), (18) can be rewritten as To illustrate the constraint (19) is adjusted to satisfy the constant-modulus constraint, letx (n−1) be the solution which satisfies the constraint Re B (n−1)x(n−1) = 1, and the constant-modulus affine solution ofx (n−1) is given bỹ , and the constraints of the next problem Q (n+1) are the same as problem Q (n) , i.e., Re B (n+1)x(n+1) = 1 and Re B (n)x(n) = 1 are equivalent. Thus,x (n+1) =x (n) is derived which means the algorithm converges. Otherwise, the constraint is updated through the constant-modulus affine solution ofx (n) according to (20). As a conclusion, the obtained solution converges to a constant-modulus one by the adaptive constraint.
For transforming the cost function into the completely equivalent real-valued version, be the optimal solution of Q (n) and x (n) be the complex version defined as wheres (n) l is the lth element ofs (n) . In this case, the matrix B (n) is defined as Therefore, the optimal solution of complex-valued problem Q (n) can be obtained by solving the completely equivalent real-valued problem as follows Note that,Q (n) is a convex optimization problem with linear equality constraints. Therefore, according to (24), the Lagrange function ofQ (n) is given by where λ is the Lagrange multiplier. Let the partial derivative of (25) with respect to s and λ are ∂L(s,λ) ∂s = 0 and ∂L(s,λ) ∂λ = 0, respectively. Thus, we have the equation set as follows Since the coefficient matrix (Lagrange matrix) of the equation set is non-singular, Lagrange matrix is invertible. Then the inverse matrix can be expressed as Based on the properties of inverse matrix: X −1 X = I, we can derive SinceÂ HÂ is the Hermitian matrix, its inverse matrix exists. Then C, D, and Z can be expressed as Both sides of (26) are multiplied by the inverse of the Lagrange matrix, the solution ofQ (n) can be obtained as where λ = −2DÂ HD − Z1. Although the problem in (30) cannot obtain an optimal amplitude solution, an optimal phase solution can be obtained.

B. ANALOG BEAMFORMING AND COMBINING MATRICES DESIGN
In the previous subsection, the analog combining matrix W RF_init with optimal phase has been obtained. Thus, the PSIA method is utilized to solve the optimal analog beamforming F opt RF and combining W opt RF in this subsection.
According to the principles of information theory, when the inner-user and inter-user interference are eliminated and the downlink broadcast channel capacity of the overall baseband channel can be reached, the capacity of the system is equal to the mutual information between $\mathbf{x}$ and $\hat{\mathbf{x}}$, i.e., $R = I(\mathbf{x}, \hat{\mathbf{x}})$ [28]. Since the proposed digital beamforming is designed based on SVD, the resulting solutions are unitary matrices and can make the baseband channels of different users mutually orthogonal. In addition, the digital beamforming matrix satisfies $\|\mathbf{F}_{RF}\mathbf{F}_{BB}\|_F^2 = KN_S$. Therefore, maximizing the capacity of the overall system is approximately equal to maximizing $I(\mathbf{x}, \hat{\mathbf{x}})$, i.e., $R \approx I(\mathbf{x}, \hat{\mathbf{x}})$, and the expression of $I(\mathbf{x}, \hat{\mathbf{x}})$ is described as It can be seen from (32) that maximizing $I(\mathbf{x}, \hat{\mathbf{x}})$ is equivalent to maximizing $\|\mathbf{W}_{RF}^H\mathbf{H}\mathbf{F}_{RF}\|_F^2$, which means that, when $\mathbf{W}_{RF}$ is fixed, the maximum equivalent baseband channel is only related to the analog beamforming solution $\mathbf{F}_{RF}^{opt}$. Thus, the problem of analog beamforming can be formulated as
$$\max_{\mathbf{F}_{RF}} \|\mathbf{H}_{comp}\mathbf{F}_{RF}\|_F^2 \quad \mathrm{s.t.}\ \ \mathbf{F}_{RF} \in \mathcal{F}_{RF}, \qquad (33)$$
where $\mathbf{H}_{comp} = \mathbf{W}_{RF\_init}^H\mathbf{H}$. Inspired by the method which tries to avoid the loss of information [28], the SVD of the composite channel is defined as
$$\mathbf{H}_{comp} = \mathbf{U}_{comp}\boldsymbol{\Sigma}_{comp}\mathbf{V}_{comp}^H,$$
where $\mathbf{U}_{comp}$ and $\mathbf{V}_{comp}$ are the matrices of left and right singular vectors, respectively, and $\boldsymbol{\Sigma}_{comp}$ is the matrix of singular values sorted in descending order. Therefore, the objective function in (33) can be reformulated as Further, we define the following two partitions of the matrices $\boldsymbol{\Sigma}_{comp}$ and $\mathbf{V}_{comp}$ as Substituting these partitions into (34) gives rise to the following approximate expression of (34) It can be seen from (36) that the objective function is maximized when the unconstrained analog beamformer is $\mathbf{F}_{RF} = \mathbf{V}_{comp}^{1}$. Unfortunately, $\mathbf{V}_{comp}^{1}$ does not satisfy the constant-modulus constraint. However, the Frobenius norm can be employed to measure the distance between the unconstrained and the constrained solutions.
Therefore, $\mathbf{F}_{RF}$ can be obtained by solving
$$\min_{\mathbf{F}_{RF}} \|\mathbf{F}_{RF} - \mathbf{V}_{comp}^{1}\|_F^2 \quad \mathrm{s.t.}\ \ \mathbf{F}_{RF} \in \mathcal{F}_{RF}. \qquad (37)$$
Then, the objective function in (37) is expanded as follows
$$\|\mathbf{F}_{RF} - \mathbf{V}_{comp}^{1}\|_F^2 = \sum_{i,j}\left(|\mathbf{F}_{RF}(i,j)|^2 + |\mathbf{V}_{comp}^{1}(i,j)|^2 - 2\,|\mathbf{F}_{RF}(i,j)|\,|\mathbf{V}_{comp}^{1}(i,j)|\cos\phi(i,j)\right), \qquad (38)$$
where $\phi(i,j) = \angle\mathbf{F}_{RF}(i,j) - \angle\mathbf{V}_{comp}^{1}(i,j)$. From (38), we observe that when $\phi(i,j) = 0$, i.e., $\angle\mathbf{F}_{RF}(i,j) = \angle\mathbf{V}_{comp}^{1}(i,j)$, the objective function in (37) is minimized, because the moduli $|\mathbf{F}_{RF}(i,j)|$ are fixed by the constant-modulus constraint. Therefore, with the modulus $1/\sqrt{N_{BS}}$, the optimal beamforming matrix can be expressed as
$$\mathbf{F}_{RF}^{opt}(i,j) = \frac{1}{\sqrt{N_{BS}}}\, e^{j\angle\mathbf{V}_{comp}^{1}(i,j)}.$$
Next, the obtained $\mathbf{F}_{RF}$ is substituted into the objective function of maximizing the equivalent baseband channel to solve for the optimal analog combiner $\mathbf{W}_{RF}$. Since $\mathbf{W}_{RF}$ is block-diagonal, the objective function is formulated as Then, the total analog combining matrix $\mathbf{W}_{RF}$ is obtained by block diagonalization. It is worth noting that the initial value of the subproblems P (n) i i=1,2 (internal iteration) derives from the problem P (external iteration). In the mth internal iteration of the nth external iteration, the subproblem P . Therefore, the sequences produced by the subproblems are monotonically non-increasing, and the objective values eventually converge [18]. Then, the convergence of P (n) can be guaranteed by B (n) . Therefore, the proposed analog beamforming design method is monotonically non-increasing and eventually converges. In addition, the stopping criterion of the external iteration for the proposed method is set as $|f^{(n+1)} - f^{(n)}| \le \varepsilon$, where $\varepsilon$ is a small factor, and $f^{(n)}$ is the objective value of (11) in the $n$th external iteration, namely, In conclusion, the advantage of the analog beamformer and combiner designed by the above methods is that the channel information can be utilized more adequately to improve the sum rate of the system. The overall procedure of the proposed analog beamforming scheme is summarized as Algorithm 1.
for m = 1 to N iter do 10: Compute
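The phase-matching step of the analog beamformer design, which keeps the phase of each entry of the unconstrained solution and replaces its modulus by a constant, can be sketched as follows. This is a minimal illustration and not the complete PSIA iteration; it assumes the modulus $1/\sqrt{N}$, where $N$ is the number of rows (antennas), and the function name is hypothetical. A zero entry has no defined phase, and `cmath.phase` returns 0 for it.

```python
import cmath
import math


def project_constant_modulus(v):
    """Entry-wise projection: keep each phase, set each modulus to 1/sqrt(N)."""
    scale = 1.0 / math.sqrt(len(v))  # N = number of rows (antennas)
    return [[scale * cmath.exp(1j * cmath.phase(x)) for x in row] for row in v]


# Toy 2 x 2 "unconstrained" matrix standing in for the dominant right singular vectors.
v1 = [[1 + 1j, -2.0], [0.5j, 3.0]]
f_rf = project_constant_modulus(v1)
print([[round(abs(x), 4) for x in row] for row in f_rf])
```

Every entry of the result has modulus $1/\sqrt{2}$ in this toy case, while its phase equals that of the corresponding entry of `v1`.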

C. DIGITAL BEAMFORMING AND COMBINING MATRICES DESIGN
This subsection discusses the design of the digital beamforming and combining matrices based on the analog beamforming and combining matrices obtained in the previous subsection. The baseband BD scheme employs ZF to eliminate inter-user interference through the null-space orthogonal bases of the baseband channels of different users, and ensures zero inner-user interference through the digital combiner in the digital beamforming stage. Thus, a MU-MIMO downlink channel can be decomposed into multiple parallel independent SU-MIMO channels [40]. However, when the number of users is large, the overlap of the row subspaces of the per-user channel matrices becomes significant, which results in quite poor performance [41]. Moreover, the operational dimension of the baseband BD scheme also becomes large, thereby increasing the computational complexity. In contrast, the criterion of avoiding the loss of information, proposed in [28] for designing the analog beamforming, can reduce the computational complexity. Therefore, inspired by the schemes proposed in [25] and [28], we design the digital beamforming and combining matrices by employing the criterion of avoiding the loss of information.
Meanwhile, the complexity analysis shows the advantages over the baseband BD scheme.
The equivalent baseband channel is defined asH = W H RF HF RF . According to (3) and (10), the problem of digital beamforming design can be formulated as Based on the criterion proposed in [28], the SVD of the equivalent baseband channel of each user is expressed as Design the digital combining matrix W k BB as the first N S column vectors ofŪ k , i.e., W k BB =Ū k (:, 1 : N S ) , k ∈ {1, . . . , K }, and substitute it into (44); then the objective function can be rewritten as wherē The power allocation obtained by waterfilling can be approximated by equal power allocation in the digital stage, i.e., P ≈ P t K N S I K N S as N BS approaches infinity. Since W RF derived in the above subsection is a para-unitary matrix (i.e., W H RF W RF = I M BS ), we can obtain Thus, (46) can be further written as (49) It can be seen from (49) that the optimal digital beamforming F BB of the maximized objective function can be obtained from the first K N S columns of the right singular vector matrix ofH comp . Defining the SVD ofH comp asH comp =Ū comp¯ compV H comp , we then set F BB =V comp (:, 1 : K N S ).
We now show that the digital beamformer F BB and combiner W BB obtained above eliminate the inter-user and inner-user interference.
For a massive MIMO system with multi-antenna users, the asymptotic orthogonality of different user channels has been proven when the number of BS antennas is large [28]. Here, we extend the conclusion to the baseband channel model; then the correlation matrix of different user baseband channels follows Proof: For a massive MIMO system with multi-antenna users, the correlation matrices of different user analog channels satisfy the following asymptotic orthogonality [28] lim SinceŪ p is a unitary matrix, (53) can be further written as Defining the following two partitions of matrices¯ k andV k as: i.e., each of the first N S right singular vectors belonging to the equivalent baseband channels of two different users are asymptotically mutually orthogonal in the massive MIMO regime. Using W k BB =Ū k (:, 1 : N S ) , k = 1, . . . , K , (45), and (47), we can obtain To sum up, when F BB =V comp (:, 1 : K N S ), we havē In other words, the data streams of different users can be independently transmitted on the subchannelH kṼk , so as to eliminate inter-user interference in the baseband channel. In addition, by letting W k BB =Ū k (:, 1 : N S ) , k = 1, . . . , K , the inner-user interference can also be eliminated. To satisfy the constraint $\|\mathbf{F}_{RF}\mathbf{F}_{BB}\|_F^2 = KN_S$, we adjust each column of $\mathbf{F}_{BB}$ to be $\mathbf{F}_{BB}(:,i) = \mathbf{F}_{BB}(:,i) / \|\mathbf{F}_{RF}\mathbf{F}_{BB}(:,i)\|_F$, $i \in \{1, \ldots, KN_S\}$. Therefore, all the constraints in (44) are satisfied and the specific procedure is summarized as Algorithm 2.
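The column-wise normalization that enforces the power constraint can be checked numerically. The sketch below uses plain nested lists for matrices and a hypothetical function name; the toy matrices are arbitrary and are not taken from the paper. After scaling, every column of $\mathbf{F}_{RF}\mathbf{F}_{BB}$ has unit norm, so the squared Frobenius norm equals the number of columns, i.e., $KN_S$.

```python
import math


def normalize_columns(f_rf, f_bb):
    """Scale column i of f_bb so that the product f_rf * f_bb[:, i] has unit norm."""
    n_rows, n_cols = len(f_bb), len(f_bb[0])
    out = [row[:] for row in f_bb]
    for i in range(n_cols):
        # Product of F_RF with the i-th column of F_BB.
        prod = [sum(f_rf[r][m] * f_bb[m][i] for m in range(n_rows))
                for r in range(len(f_rf))]
        norm = math.sqrt(sum(abs(x) ** 2 for x in prod))
        for m in range(n_rows):
            out[m][i] = f_bb[m][i] / norm
    return out


# Toy 4 x 2 constant-modulus F_RF and 2 x 2 F_BB (arbitrary values).
f_rf = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5j], [0.5, -0.5j]]
f_bb = [[1.0, 2.0], [3j, 1.0]]
f_bb_normalized = normalize_columns(f_rf, f_bb)
```

The scaling changes only the length of each column, not its direction, so the orthogonality properties discussed above are unaffected.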
Then, we compare the computational complexity of the proposed digital beamforming design method and the baseband BD scheme in [25]. Since both of them need to compute the equivalent baseband channel of each user, we compare computational complexity from the solution procedure of digital beamforming and combining.
It can be found from Algorithm 2 that, apart from computing the baseband channels, the complexity of the proposed scheme mainly comes from steps 4, 7, and 8. In step 4, the SVD of the equivalent baseband channel $\bar{H}_k \in \mathbb{C}^{M_{MS} \times M_{BS}}$ of each user is computed to obtain the digital combining matrix, and the corresponding complexity is $\mathcal{O}(K M_{MS}^2 M_{BS})$ [42]. In step 7, the composite matrix $\bar{H}_{comp}$ is obtained by one multiplication of two matrices, hence the complexity is $\mathcal{O}(K^2 M_{MS} N_S M_{BS})$. The last part originates from computing the digital beamforming matrix $F_{BB}$ in step 8. Since this part requires one SVD of the composite matrix $\bar{H}_{comp}$, the complexity is $\mathcal{O}(K^2 N_S^2 M_{BS})$. To sum up, the overall computational complexity of the proposed digital beamforming design scheme is $\mathcal{O}(K^2 N_S^2 M_{BS})$. In contrast, since the baseband BD scheme proposed in [25] performs an SVD of a matrix of dimension $(K-1) M_{MS} \times M_{BS}$ to obtain the null-space orthogonal basis of each user channel under the same parameter configuration, the corresponding computational complexity is $\mathcal{O}(K (K-1)^2 M_{MS}^2 M_{BS})$. In addition, because the same method is used to obtain the digital beamformer and combiner in [26]-[28], the computational complexity of the digital beamforming design in each of these works is also $\mathcal{O}(K (K-1)^2 M_{MS}^2 M_{BS})$. Therefore, the proposed design method enjoys much lower computational complexity than the baseband BD scheme.

Algorithm 2 (recoverable final steps): 9: ..., $i \in \{1, \ldots, K N_S\}$; 10: Obtain the total equivalent baseband channel $H_{total} = W_{BB}^{H} \bar{H} F_{BB}$; 11: Compute $P$ by using waterfilling power allocation over the total equivalent channel $H_{total}$; 12: Output: $F_{BB}$, $W_{BB}$, $P$.
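As a rough numerical illustration of this comparison, the sketch below evaluates the leading-order terms quoted above for the simulation configuration $K = 8$, $M_{MS} = N_S = 2$, $M_{BS} = 16$. The function names are hypothetical, and the values are order-of-magnitude counts with constant factors dropped, not measured operation counts.

```python
def proposed_cost(K, N_S, M_MS, M_BS):
    """Leading-order terms of steps 4, 7 and 8 of the proposed digital stage."""
    return {
        "step4_user_svds": K * M_MS**2 * M_BS,
        "step7_composite_product": K**2 * M_MS * N_S * M_BS,
        "step8_precoder_svd": K**2 * N_S**2 * M_BS,
    }


def bd_cost(K, M_MS, M_BS):
    """Leading-order term of the baseband BD scheme: O(K (K-1)^2 M_MS^2 M_BS)."""
    return K * (K - 1) ** 2 * M_MS**2 * M_BS


K, N_S, M_MS, M_BS = 8, 2, 2, 16
terms = proposed_cost(K, N_S, M_MS, M_BS)
dominant = terms["step8_precoder_svd"]   # overall order of the proposed scheme
ratio = bd_cost(K, M_MS, M_BS) / dominant
```

For these parameters the dominant term of the proposed scheme is 4096 against 25088 for baseband BD, a ratio of about 6; the ratio grows roughly as $(K-1)^2 / K$ when $M_{MS} = N_S$.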

IV. NUMERICAL SIMULATION
To evaluate the performance of the proposed hybrid beamforming design scheme, the corresponding simulation results are presented in this section. All simulation results are obtained in MATLAB by averaging over 1,000 random channel realizations. For simplicity, the propagation environment is modeled with $N_c = 8$ clusters and $N_p = 10$ rays per cluster. The AoAs/AoDs of all channels are assumed to follow the uniform distribution within $[0, 2\pi]$. In the simulation, we consider a BS with $N_{BS} = 256$ antennas and $M_{BS} = 16$ RF chains serving $K = 8$ users, where each user is equipped with $N_{MS} = 16$ antennas and $M_{MS} = 2$ RF chains to support $N_S = 2$ data streams simultaneously. The noise variance at each user is $\sigma^2 = 1$, and the SNR is defined as $P_t / \sigma^2$. Furthermore, we set the maximum number of iterations as $N_{iter} = 7$, and the factor in Algorithm 1 is set as $\varepsilon = 10^{-6}$ [18]. It is worth noting that this paper focuses on the hybrid beamforming design of a massive MIMO system with the full-connected structure. We compare the performance of the proposed scheme with state-of-the-art hybrid beamforming design schemes, which include the HySBD scheme [28] with the least number of RF chains (i.e., equal to the number of transmitted streams), the full-digital dirty paper coding (DPC) method [43], the EGT-BD based scheme [25], the iterative MGLRAM method [26], and the HyEB approach [27]. Since DPC implemented with the iterative waterfilling algorithm has been shown to achieve the capacity of the broadcast channel, it is used as the performance upper bound for the hybrid schemes.
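For readers reproducing this setup, the parameters can be gathered in one place as in the following minimal sketch. The class `SimConfig` and its field and method names are hypothetical; only the numerical values and the SNR definition $P_t / \sigma^2$ come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimConfig:
    """Simulation parameters quoted in the text (names are illustrative)."""
    n_bs: int = 256          # BS antennas
    m_bs: int = 16           # BS RF chains
    n_ms: int = 16           # antennas per user
    m_ms: int = 2            # RF chains per user
    n_s: int = 2             # data streams per user
    k_users: int = 8         # number of users
    n_clusters: int = 8      # N_c
    n_rays: int = 10         # N_p, rays per cluster
    noise_var: float = 1.0   # sigma^2
    n_iter: int = 7          # maximum iterations of Algorithm 1
    eps: float = 1e-6        # factor epsilon in Algorithm 1
    n_trials: int = 1000     # channel realizations averaged

    def transmit_power(self, snr_db):
        """Invert SNR = P_t / sigma^2, with the SNR given in dB."""
        return self.noise_var * 10 ** (snr_db / 10)

    def total_streams(self):
        return self.k_users * self.n_s


cfg = SimConfig()
p_t = cfg.transmit_power(0.0)                 # SNR = 0 dB
per_stream_power = p_t / cfg.total_streams()  # equal allocation P_t / (K N_S)
```

With the default values, $M_{BS} = K M_{MS} = 16$, so the BS uses exactly as many RF chains as there are user RF chains in total.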
A. PERFORMANCE FOR SUM RATE
Fig. 2 compares the sum rate performance of different beamforming schemes versus SNR when the number of BS antennas is large ($N_{BS} = 256$). In the simulation, the number of iterations is set to 20 in the MGLRAM and HyEB schemes. With the baseband BD technique, the overlap among the row subspaces of the user channel matrices becomes significant, so the performance of the system decreases as the number of users increases. In contrast, the analog beamforming design in Algorithm 1 improves the capacity of the equivalent baseband channel, and the digital beamforming derived in the massive MIMO regime can eliminate both inner-user and inter-user interference. Therefore, we can observe from Fig. 2 that the proposed hybrid beamforming design scheme is superior to the state-of-the-art hybrid beamforming schemes. The result also verifies the effectiveness of the proposed design scheme for a BS with a large number of antennas. In addition, HySBD is slightly better than MGLRAM with the least number of RF chains.
To further investigate the performance of the proposed design scheme with small antenna arrays, Fig. 3 shows the sum rate comparison for different beamforming schemes versus SNR when the number of BS antennas is small ($N_{BS} = 32$). We assume a BS with $M_{BS} = 16$ RF chains, and each user with $N_{MS} = 4$ antennas and the least number of RF chains $M_{MS} = 2$. As can be seen from the figure, although the proposed design scheme is derived from the theoretical analysis of the massive MIMO regime, it outperforms the state-of-the-art schemes even if the number of BS antennas is not very large.

Fig. 4 compares the sum rate performance of different beamforming schemes versus the number of BS antennas, where SNR = 0 dB. As can be seen from this figure, the sum rate of every design scheme improves as the number of BS antennas increases, and the proposed design scheme is better than the others. When the number of BS antennas is large, the performance gap between the proposed scheme and the other hybrid beamforming schemes (except for HyEB) is small. However, the performance gap between the proposed scheme and the DPC scheme is larger than it is with a small number of BS antennas. Furthermore, although the proposed scheme is derived for the massive MIMO system, it works relatively well even when the number of BS antennas is not very large. For illustration, the sum rate of the proposed beamforming scheme with $N_{BS} = 64$ and $N_{BS} = 96$ approaches 86% and 88%, respectively, of that reached by the DPC scheme, while the sum rate of HySBD with the same settings only approaches about 66% and 74%.

Fig. 5 compares the sum rate performance of different beamforming schemes versus the number of users, where the number of users changes from 2 to 15 and the transmission SNR is 0 dB.
We can observe from the figure that the gap between different design schemes is very small when the number of users is small (except for HyEB), e.g., $K \le 3$, and the performance of all schemes is close to that of the DPC scheme. As the number of users increases, the differences in sum rate between the design schemes become large, and the proposed design scheme is better than the others. This can be explained by the fact that, as the scale of the system increases, the proposed design scheme effectively eliminates the inner-user and inter-user interference and thereby improves the performance of the system.

... of data streams, i.e., $M_{MS} = N_S$ and $M_{BS} = K M_{MS}$. We find that the performance of all beamforming schemes is similar and very close to that of the DPC scheme when the number of data streams per user is small, i.e., $N_S = 1$. When the number of data streams supported by the system increases, the gaps between the sum rates of different schemes become larger correspondingly. Nevertheless, the proposed hybrid beamforming scheme outperforms the other schemes for every number of data streams considered.

E. PERFORMANCE FOR POWER EFFICIENCY
As mentioned in Sec. I, power consumption is a key issue for hybrid beamforming schemes. In this subsection, we compare the power efficiency of different hybrid beamforming design schemes [44]. For a hybrid beamforming design based on the full-connected structure, the power consumption mainly includes the following parts: a) the power amplifier (PA) connected to each antenna at the BS; b) the low noise amplifier (LNA) at the receiver; c) the phase shifter (PS) and the RF chain on both the receiver and transmitter sides; d) the digital baseband (BB) processor; e) the digital-to-analog converter (DAC) on the transmitter side and the ADC on the receiver side.
Considering the full-digital beamforming for MIMO, there is a DAC, an RF chain, a PS and a PA for each antenna at the BS. In addition, a BB processor that maps all the data streams to the transmit antennas is required. At the receiver, each antenna is equipped with an ADC, an RF chain, a PS, and an LNA, plus a baseband digital combiner that combines all the ADC outputs to obtain the soft estimate of the transmitted symbols. Therefore, the amounts of power consumed by the BS and the users in the full-digital MIMO architecture are expressed respectively as $P_{DPC\_BS} = N_{BS}(P_{RF} + P_{DAC} + P_{PA}) + P_{BB}$ and a counterpart expression for the users, where $P_{BB}$, $P_{RF}$, $P_{LNA}$, $P_{PA}$, $P_{PS}$, $P_{DAC}$, and $P_{ADC}$ are the power of the BB, each RF chain, each LNA, each PA, each PS, each DAC, and each ADC, respectively. Different from the full-digital beamforming for MIMO, in the full-connected structure each RF chain requires as many phase shifters as there are antennas in order to control the phase at every antenna. Therefore, the amounts of power consumed by the BS and the users in the full-connected MIMO architecture are expressed respectively as $P_{TOTAL\_BS} = M_{BS}(P_{RF} + P_{DAC} + N_{BS} P_{PS}) + \cdots$ and a counterpart expression for the users. To better compare the performance of different hybrid beamforming design schemes, the power efficiency $\eta$ is used as the metric, which is the ratio of the achieved sum rate to $P$, the total power consumption of the system.

Fig. 7 presents the power efficiency performance of different beamforming design schemes versus SNR. The simulation parameters according to [44] are set as follows: $P_{BB} = 243$ mW, $P_{RF} = 40$ mW, $P_{LNA} = 30$ mW, $P_{PA} = 16$ mW, $P_{DAC} = 110$ mW, and $P_{ADC} = 200$ mW. Furthermore, the power consumed by each PS is assumed to be 10 mW as in [45]. It can be seen clearly that the proposed scheme achieves a higher sum rate with the same SNR and power consumption, which means it has higher power efficiency.
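The BS-side power model can be evaluated directly with the component values above. The following is a minimal sketch: the function names are hypothetical, the full-connected function covers only the RF-chain term $M_{BS}(P_{RF} + P_{DAC} + N_{BS} P_{PS})$ that is recoverable from the text, and the efficiency is taken as sum rate divided by total power, which is an assumption about the exact form of $\eta$.

```python
# Component powers in mW, as listed in the text ([44], [45]).
P_MW = {"BB": 243.0, "RF": 40.0, "LNA": 30.0, "PA": 16.0,
        "DAC": 110.0, "ADC": 200.0, "PS": 10.0}


def full_digital_bs_power(n_bs, p=P_MW):
    """P_DPC_BS = N_BS (P_RF + P_DAC + P_PA) + P_BB, in mW."""
    return n_bs * (p["RF"] + p["DAC"] + p["PA"]) + p["BB"]


def full_connected_bs_rf_power(n_bs, m_bs, p=P_MW):
    """Recoverable term only: M_BS (P_RF + P_DAC + N_BS P_PS), in mW."""
    return m_bs * (p["RF"] + p["DAC"] + n_bs * p["PS"])


def power_efficiency(sum_rate, total_power):
    """Assumed form of eta: achieved sum rate per unit of consumed power."""
    return sum_rate / total_power


digital = full_digital_bs_power(256)            # N_BS = 256
hybrid_rf = full_connected_bs_rf_power(256, 16)  # N_BS = 256, M_BS = 16
ps_share = (16 * 256 * P_MW["PS"]) / hybrid_rf   # fraction due to phase shifters
```

For $N_{BS} = 256$ and $M_{BS} = 16$, the phase shifters account for roughly 94% of the evaluated full-connected term, because their number $M_{BS} N_{BS}$ grows with the array size for every RF chain.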
Further, since the full-digital MIMO architecture requires more hardware and consumes more power, its power efficiency is relatively low compared with the full-connected architecture. Therefore, the full-digital MIMO architecture is rarely used in practical applications. In addition, as shown in Fig. 2, the sum rates achieved by the HySBD and MGLRAM algorithms are similar, hence the power efficiencies of the two algorithms are close under the same power consumption.

It can be seen from (60) and (62) that the power consumption of the PSs is dominant when the number of BS antennas increases. Thus, with a large number of BS antennas, the power consumption of the full-connected structure is higher than that of the full-digital structure. According to the simulation results shown in Fig. 4, the power efficiency of the different hybrid schemes is lower than that of the full-digital DPC scheme.

The number of BS antennas is set as $N_{BS} = 256$ for the different numbers of antennas per user. It can be seen from (61) and (63) that the power consumption of the DAC is dominant when the number of antennas per user increases. Since the sum rate of the full-digital DPC scheme increases at the cost of more power consumption, its power efficiency declines considerably as the number of antennas per user increases. In summary, the proposed hybrid beamforming design scheme has stable and high power efficiency regardless of the number of BS antennas and the number of antennas per user.

... number of iterations $N_{iter}$, where $K = 4$. As can be seen from the figure, the proposed PSIA algorithm converges very quickly regardless of the number of BS antennas and the value of SNR. In particular, the sum rate of the system reaches 90% of its converged value after one iteration. In addition, after 7 iterations, the sum rate of the system saturates and no longer grows, reaching its maximum value.
Therefore, the maximum number of iterations is set as $N_{iter} = 7$ in all simulations.

V. CONCLUSION
In this paper, we have investigated the hybrid beamforming design of a downlink mmWave massive MU-MIMO system with the full-connected structure. A two-stage linear hybrid beamforming design scheme has been proposed to obtain optimal closed-form solutions. The criterion of avoiding the loss of information has been adopted in each sub-procedure of both the analog and digital stages, so as to approach the performance of full-digital beamforming as closely as possible. Further, we have solved the original problem by approximating different optimization targets in the analog and digital stages, respectively, which not only ensures the maximum channel capacity in each stage but also maximizes the overall capacity of the system. Finally, the simulation results show that the proposed hybrid beamforming design scheme outperforms the existing methods regardless of whether the BS is equipped with large or small antenna arrays. Since the proposed scheme is based on perfect CSI, future work includes an extension to the case of imperfect CSI as in [46], which is more practical in future applications. In addition, considering the huge available bandwidth of mmWave communications, the hybrid beamforming design for massive MU-MIMO orthogonal frequency-division multiplexing (OFDM) systems can also be considered in future work.

RUPAK KHAREL (Senior Member, IEEE) received the Ph.D. degree in secure communication systems from Northumbria University, U.K., in 2011. He is currently a Senior Lecturer at the School of Engineering, Manchester Metropolitan University. His research interests include various use cases and the challenges of the IoT and cyber physical systems (CPS), cyber security challenges on CPS, the energy optimization of the IoT networks for green computing, the Internet of Connected Vehicles (IoV), and smart infrastructure systems.
He is a Principal Investigator of multiple government and industry funded research projects. He is a member of IET and a Fellow of the Higher Education Academy (FHEA), U.K.