Sparse Channel Estimation and Hybrid Precoding Using Deep Learning for Millimeter Wave Massive MIMO

Channel estimation and hybrid precoding are considered for multi-user millimeter wave massive multi-input multi-output systems. A deep learning compressed sensing (DLCS) channel estimation scheme is proposed. The channel estimation neural network for the DLCS scheme is trained offline using simulated environments to predict the beamspace channel amplitude. Then the channel is reconstructed based on the obtained indices of dominant beamspace channel entries. A deep learning quantized phase (DLQP) hybrid precoder design method is developed after channel estimation. The training hybrid precoding neural network for the DLQP method is obtained offline considering the approximate phase quantization. Then the deployment hybrid precoding neural network (DHPNN) is obtained by replacing the approximate phase quantization with ideal phase quantization, and the output of the DHPNN is the analog precoding vector. Finally, the analog precoding matrix is obtained by stacking the analog precoding vectors and the digital precoding matrix is calculated by zero-forcing. Simulation results demonstrate that the DLCS channel estimation scheme outperforms the existing schemes in terms of the normalized mean-squared error and the spectral efficiency, while the DLQP hybrid precoder design method has better spectral efficiency performance than other methods with low phase shifter resolution.


I. INTRODUCTION
Due to the rich bandwidth resources of the millimeter wave (mmWave), mmWave communication has attracted broad attention and become an important technology in future wireless communication systems [1], [2]. When operating at high frequency, the mmWave signal experiences high path loss. Fortunately, this challenge can be overcome by directional beamforming with a massive multi-input multi-output (MIMO) antenna array. Since mmWave bands have short wavelengths, large antenna arrays can be packed into small form factors [3].
Due to the large antenna arrays of mmWave communications, channel estimation requires a large number of time slots as overhead. Note that mmWave channels are sparse in the beamspace domain with hybrid precoding [4]. Although the beamspace is typically addressed in mmWave lens antenna arrays, we can also obtain the beamspace channel with hybrid precoding by introducing a dictionary matrix consisting of column steering vectors. Several channel estimation schemes have been proposed to exploit the beamspace channel sparsity. For example, a distributed grid matching pursuit (DGMP) channel estimation scheme was proposed [4], where the dominant entries of the line-of-sight (LOS) channel path were detected and updated iteratively; an orthogonal matching pursuit (OMP) channel estimation scheme was proposed to detect the dominant entries of multiple channel paths [5]; a simultaneous weighted orthogonal matching pursuit (SWOMP) channel estimation scheme was proposed [6], where frequency-selective mmWave channels were considered based on the OMP method. However, these compressed sensing (CS) channel estimation schemes estimate the dominant beamspace channel entries sequentially and greedily, which cannot guarantee global optimality [7].
After the channel estimation of mmWave communications, hybrid precoding consisting of analog precoding and digital precoding is usually adopted. Analog precoding aims to form directional beams using a phase shifter network, while digital precoding is designed to mitigate the interference among multiple data streams. Several hybrid precoding methods have been proposed for single-user multi-stream mmWave communication systems. For example, a hybrid precoding algorithm was proposed [8], where the analog precoding problem was formulated as a sparse reconstruction problem and the OMP method was adopted; to avoid the greediness of the OMP method, the alternating minimization method was used [9], where the hybrid precoding problem was formulated as a matrix decomposition problem and the analog precoder and digital precoder were optimized alternately; to reduce the computational complexity of the alternating minimization method, a hierarchical codebook was used to obtain multiple beams and then form the analog precoding [10], [11].
In multi-user multi-stream mmWave communication systems, the base station (BS) transmits multiple data streams to serve all users simultaneously. To improve the spectral efficiency, the beamsteering codebook based on steering vectors was used to form the analog precoder vectors and the digital precoder was designed accordingly [12]. To account for the hardware constraint of limited phase shifter resolution, beam allocation for multiple users was considered [13], where the discrete Fourier transform (DFT) codebook was adopted for analog precoding and the phase shifter resolution must be proportional to the number of antennas. To remove the constraint that the phase shifter resolution is tied to the number of antennas, a quantized angle linear search (QALS) precoding scheme was proposed [14], where the angular domain was quantized according to the limited resolution of phase shifters and a linear search method was used to obtain the optimal analog beamforming vectors aligning with the dominant channel paths. However, these hybrid precoding schemes design the analog precoder using the steering vectors of quantized angles, which is heavily constrained by the resolution of phase shifters. When the mmWave system is equipped with low resolution phase shifters, only a small number of steering vectors of quantized angles are available. Since the angles of arrival (AoAs) of channel paths are randomly distributed, there is no guarantee that precoding based on these limited steering vectors always achieves high beamforming gain. Therefore these hybrid precoding schemes may have unsatisfactory spectral efficiency performance if none of these limited steering vectors is well aligned with the AoAs.
Recently, the application of deep learning to mmWave communications has received much attention owing to the capability of deep learning to solve complicated nonlinear problems [15]-[17]. For example, a machine learning based beam prediction scheme was proposed [18], where machine learning tools and situational awareness were combined to learn the beam information (power, optimal beam index, etc.) from past observations; a learned denoising based approximate message passing network was proposed to estimate the channel of the mmWave communication system with a lens antenna array [19], where the noise term was detected and removed to estimate the channel. However, channel estimation for mmWave massive MIMO systems with hybrid precoding was not considered [19]. Besides, a deep learning based beamforming design method was proposed [20], where a beamforming neural network was trained to learn how to optimize the beamformer for maximizing the spectral efficiency; a deep reinforcement learning hybrid precoding method was proposed [21]. However, both of these deep learning hybrid precoder design methods neglect the constraint of the limited resolution of phase shifters.
In this paper, we investigate sparse channel estimation and hybrid precoding considering the limited resolution of phase shifters for multi-user mmWave massive MIMO systems. The paper has the following two main contributions.
1) We propose a deep learning compressed sensing (DLCS) channel estimation scheme for the multi-user mmWave massive MIMO systems. The DLCS scheme consists of beamspace channel amplitude estimation and channel reconstruction. In the offline training stage, we train the channel estimation neural network (CENN) using the simulated environment based on the mmWave channel model. Then in the online deployment stage, the correlation between the received signal vectors and the measurement matrix is fed into the trained CENN to predict the beamspace channel amplitude. Afterwards, the indices of dominant entries of the beamspace channel are obtained, based on which the channel can be reconstructed. Unlike the existing work that estimates the dominant beamspace channel entries sequentially [4]-[6], we estimate the dominant entries simultaneously, which will be shown to have better channel estimation performance.
2) We propose a deep learning quantized phase (DLQP) hybrid precoding method for the multi-user mmWave massive MIMO systems. In the DLQP method, we first design the analog precoder and then the digital precoder. In the offline training stage, we obtain the training hybrid precoding neural network (THPNN) using the estimated channel vector and real channel vector of each user, where the approximate phase quantization is considered. Then in the online deployment stage, we obtain the deployment hybrid precoding neural network (DHPNN) by replacing the approximate phase quantization in the THPNN with ideal phase quantization, where the estimated channel vector of each user is fed into the DHPNN to obtain the analog precoding vector. Afterwards, the analog precoding matrix is obtained by stacking the analog precoding vectors of all users, based on which the digital precoding matrix can be calculated by zero-forcing (ZF).
The rest of the paper is organized as follows. In Section II, we introduce the system model and formulate the problem of channel estimation for the multi-user mmWave massive MIMO systems with hybrid precoding. In Section III, we propose the DLCS channel estimation scheme. In Section IV, we develop the DLQP hybrid precoder design method. The simulation results are provided in Section V. Finally, Section VI concludes the paper.
We use the following notations. Symbols for vectors (lower case) and matrices (upper case) are in boldface. $(\cdot)^T$, $(\cdot)^*$, $(\cdot)^H$, and $(\cdot)^{-1}$ denote the transpose, conjugate, conjugate transpose (Hermitian), and inverse, respectively. We use $\mathbf{I}_K$ to represent the identity matrix of order $K$. The sets of $P \times Q$ complex-valued matrices and real-valued matrices are denoted by $\mathbb{C}^{P \times Q}$ and $\mathbb{R}^{P \times Q}$, respectively. We use $\mathbb{E}\{\cdot\}$ to represent expectation. The $\ell_2$-norm of a vector and the Frobenius norm of a matrix are denoted by $\|\cdot\|_2$ and $\|\cdot\|_F$, respectively. We use $\mathbf{a}[p]$ to denote the $p$th entry of $\mathbf{a}$. The complex Gaussian distribution is denoted by $\mathcal{CN}$. We use $|\cdot|$ to denote the absolute value. $\mathrm{Im}(a)$ and $\mathrm{Re}(a)$ denote the imaginary and real parts of $a$, respectively.

II. SYSTEM MODEL AND PROBLEM FORMULATION
We first introduce the system model of multi-user mmWave massive MIMO. Then the channel estimation problem is formulated as a CS problem to estimate the sparse channel in the beamspace.

A. System Model
We consider a downlink multi-user mmWave massive MIMO communication system that comprises a BS and $U$ single-antenna users, as shown in Fig. 1. The BS is equipped with a uniform linear array (ULA) [1]. Note that the present method can be generalized to other array structures. Let $N_A$ and $N_R$ denote the numbers of antennas and RF chains at the BS, respectively. Hybrid precoding is typically adopted, where the number of antennas is much larger than that of RF chains, i.e., $N_A \gg N_R$ [2]. We consider orthogonal multiple access, where the number of active users simultaneously connected with the BS is no larger than the number of RF chains, i.e., $U \le N_R$ [10]. If $U < N_R$, the BS only turns on $U$ RF chains to serve the $U$ users simultaneously and turns off the remaining $N_R - U$ RF chains, which saves power at the BS.
For downlink transmission, the BS performs hybrid precoding, which consists of baseband digital precoding and RF analog precoding [13]. The received signal of all $U$ users, denoted by $\mathbf{y}^{dl} \in \mathbb{C}^{U}$, can be represented as
$$\mathbf{y}^{dl} = \mathbf{H}\mathbf{F}_R\mathbf{F}_B\mathbf{s} + \mathbf{n}, \tag{1}$$
where $\mathbf{F}_R \in \mathbb{C}^{N_A \times U}$ and $\mathbf{F}_B \in \mathbb{C}^{U \times U}$ denote the analog precoder and digital precoder, respectively. To normalize the power of the hybrid precoder, we set $\|\mathbf{F}_R\mathbf{F}_B\|_F^2 = U$. We denote the signal vector by $\mathbf{s} \in \mathbb{C}^{U}$ satisfying $\mathbb{E}\{\mathbf{s}\mathbf{s}^H\} = \mathbf{I}_U$ and the additive white Gaussian noise (AWGN) vector by $\mathbf{n} \in \mathbb{C}^{U}$ satisfying $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I}_U)$. The channel matrix for the BS and all users is denoted by $\mathbf{H}$. There are different kinds of channel models in mmWave systems, such as the clustered mmWave channel model and the Saleh-Valenzuela mmWave channel model [2], [22]. We choose the Saleh-Valenzuela mmWave channel model in our paper. The channel vector $\mathbf{h}_u \in \mathbb{C}^{N_A}$ for the BS and the $u$th user is represented as a sum over its channel paths, where the channel vector, number of channel paths, and complex gain of the $i$th path are denoted by $\mathbf{h}_{u,i}$, $L_u$, and $g_{u,i}$, respectively. Typically $\mathbf{h}_u$ consists of one LOS path (the 1st channel path) and $L_u - 1$ non-line-of-sight (NLOS) paths (the $i$th channel path for $2 \le i \le L_u$). The steering vector is denoted by $\boldsymbol{\alpha}(N, \theta)$. Denote the AoA for the $i$th path of the $u$th user by $\vartheta_{u,i}$, which is uniformly distributed over $[-\pi, \pi)$ [4], [23]. Then we have $\theta_{u,i} \triangleq \sin\vartheta_{u,i}$ if the distance between two adjacent antennas at the BS is half a wavelength [4].
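The channel model above can be illustrated with a short simulation. The following is a minimal sketch, not the authors' implementation: it assumes the common unit-norm ULA steering vector $\boldsymbol{\alpha}(N,\theta) = \frac{1}{\sqrt{N}}[1, e^{j\pi\theta}, \ldots, e^{j\pi(N-1)\theta}]^T$ and a $\sqrt{N_A/L_u}$ power normalization, neither of which is confirmed by the text. The function names are hypothetical, and the path variances are the values used later in Section V.

```python
import cmath
import math
import random

def steering_vector(n_antennas, theta):
    """Unit-norm ULA steering vector (assumed form); theta = sin(AoA)."""
    scale = 1.0 / math.sqrt(n_antennas)
    return [scale * cmath.exp(1j * math.pi * k * theta) for k in range(n_antennas)]

def generate_channel(n_antennas, path_variances, rng):
    """Draw one multipath channel vector: sum of gain * steering vector."""
    n_paths = len(path_variances)
    scale = math.sqrt(n_antennas / n_paths)  # assumed normalization
    h = [0j] * n_antennas
    for var in path_variances:
        # g ~ CN(0, var): real and imaginary parts each have variance var / 2
        std = math.sqrt(var / 2.0)
        gain = complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
        aoa = rng.uniform(-math.pi, math.pi)  # AoA of this path
        alpha = steering_vector(n_antennas, math.sin(aoa))
        h = [hk + scale * gain * ak for hk, ak in zip(h, alpha)]
    return h

rng = random.Random(0)
# One LOS path with variance 1 and one NLOS path with variance 0.5
h_u = generate_channel(64, [1.0, 0.5], rng)
```

Drawing many such vectors with different seeds gives the kind of simulated environment that the offline training stages rely on.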

B. Problem Formulation
To design $\mathbf{F}_B$ and $\mathbf{F}_R$ for downlink data transmission, $\mathbf{H}$ should be estimated. Based on channel reciprocity, the estimate of the downlink channel can be obtained by employing uplink channel estimation to estimate $\mathbf{H}$. Note that the proposed DLCS channel estimation scheme can also be used for downlink channel estimation. Since the BS usually has more computing power than each user in practice, we consider uplink channel estimation, where the neural network (NN) is trained and utilized for prediction at the BS. For uplink channel estimation, mutually orthogonal pilot sequences are transmitted by all users $K$ times so that the signals from different users can be distinguished. Denote the pilot matrix consisting of the $U$ mutually orthogonal pilot sequences from the $U$ users by $\mathbf{P} \in \mathbb{C}^{U \times U}$. For the uplink pilot transmission, we use $K$ different analog precoding matrices and digital precoding matrices, denoted by $\mathbf{F}_R^k \in \mathbb{C}^{N_A \times N_R}$ and $\mathbf{F}_B^k \in \mathbb{C}^{N_R \times N_R}$, respectively, for $k = 1, 2, \ldots, K$. In the pilot sequences received at the BS for the $k$th transmission, the AWGN matrix is denoted by $\mathbf{N}_k$, each entry of which obeys $\mathcal{CN}(0, \sigma^2)$. Based on the orthogonality of the $U$ pilot sequences, the received pilot signal of the $k$th transmission is converted to $\mathbf{R}_k$. After each user repeatedly transmits orthogonal pilot sequences $K$ times, $\mathbf{R}_k$ for $k = 1, 2, \ldots, K$ can be stacked as $\mathbf{R}$. Note that $N_A > N_R K$ since $N_A \gg N_R$ and only a small number of time slots is used for channel training. Denote the $u$th column of $\mathbf{R}$ by $\mathbf{r}_u$ for $u = 1, 2, \ldots, U$. Then $\mathbf{r}_u$ can be represented in terms of $\mathbf{h}_u$ and $\mathbf{n}_u$, where $\mathbf{n}_u$ is the $u$th column of $\mathbf{N}$. Note that mmWave channels are sparse in the beamspace domain [4], [6]. We define $\mathbf{h}_u^b$ as the beamspace channel vector, which is related to $\mathbf{h}_u$ through a dictionary matrix $\mathbf{A}$ built on an angle grid. Note that the range of AoAs is quantized into $G$ grid points indexed by $t = 1, 2, \ldots, G$. Based on the fact that $\mathbf{A}^H\mathbf{A} = G\mathbf{I}_{N_A}/N_A$, eq. (11) can be further rewritten as eq. (12). Due to the sparse property of $\mathbf{h}_u^b$, eq. (12) is essentially a sparse recovery problem, which can be tackled by CS techniques [24]. Note that the sparsity of $\mathbf{h}_u^b$ can be impaired by channel power leakage caused by the limited beamspace resolution of $\mathbf{A}$ [25], which indicates that $\mathbf{h}_u^b$ is not perfectly sparse and many entries of $\mathbf{h}_u^b$ are small but nonzero. Sparse channel estimation schemes such as OMP and DGMP estimate the dominant beamspace channel entries in a sequential and greedy manner. However, they cannot guarantee global optimality. Therefore, in the following we propose a DLCS channel estimation scheme to estimate the dominant beamspace channel entries simultaneously.
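The beamspace sparsity and the power leakage mentioned above can be checked numerically. The sketch below is illustrative only and rests on assumptions not stated in the text: the dictionary projects the channel onto unit-norm steering vectors at the uniform grid points $\theta_t = -1 + (2t-1)/G$, and the channel is a single path lying exactly on grid point $t = 40$. All function names are hypothetical.

```python
import cmath
import math

def steering_vector(n_antennas, theta):
    """Unit-norm ULA steering vector (assumed form)."""
    scale = 1.0 / math.sqrt(n_antennas)
    return [scale * cmath.exp(1j * math.pi * k * theta) for k in range(n_antennas)]

def grid_point(t, n_grid):
    """Assumed uniform grid over [-1, 1), with t = 1, ..., n_grid."""
    return -1.0 + (2 * t - 1) / n_grid

def beamspace(h, n_grid):
    """Correlate h with the steering vector of every grid point."""
    out = []
    for t in range(1, n_grid + 1):
        atom = steering_vector(len(h), grid_point(t, n_grid))
        out.append(sum(a.conjugate() * x for a, x in zip(atom, h)))
    return out

N_A, G = 64, 128
h = steering_vector(N_A, grid_point(40, G))      # single on-grid path
amplitude = [abs(x) for x in beamspace(h, G)]
peak = max(amplitude)
peak_index = max(range(G), key=lambda i: amplitude[i]) + 1  # 1-based index
# Entries that are neither dominant nor negligible: the power leakage
leaked = sum(1 for a in amplitude if 0.1 * peak < a < peak)
print(peak_index, leaked)
```

Even for an on-grid path, several neighbouring entries carry non-negligible amplitude because $G > N_A$ makes adjacent dictionary atoms correlated, which is why the sparsity level $J$ is later chosen larger than the number of paths.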

III. DLCS CHANNEL ESTIMATION
The proposed DLCS channel estimation scheme consists of beamspace channel amplitude estimation and channel reconstruction. The main idea of the DLCS scheme is to estimate first the beamspace channel amplitude using an offline-trained CENN, and then sort the estimated beamspace channel amplitude in descending order to select the indices of dominant entries, and finally reconstruct the channel according to the selected indices. The block diagram of the DLCS scheme is illustrated in Fig. 2. The detailed steps of the DLCS scheme are summarized in Algorithm 1.

A. Beamspace Channel Amplitude Estimation
We define $\boldsymbol{\Phi}$ as the measurement matrix in (12). As shown in Algorithm 1, we feed $\boldsymbol{\Phi}$ and $\mathbf{r}_u$ to obtain the estimate of $\mathbf{h}_u$, denoted by $\hat{\mathbf{h}}_u$, for $u = 1, 2, \ldots, U$. The correlation vector between $\boldsymbol{\Phi}$ and $\mathbf{r}_u$, denoted by $\mathbf{c}_u \in \mathbb{C}^{G}$, can be expressed as
$$\mathbf{c}_u = \boldsymbol{\Phi}^H\mathbf{r}_u. \tag{14}$$
The existing sparse channel estimation schemes sequentially select the atoms, i.e., the column vectors of $\boldsymbol{\Phi}$, that yield the greatest correlation with $\mathbf{r}_u$. However, such greedy algorithms cannot guarantee global optimality, which motivates us to use the NN to estimate the atoms simultaneously instead of sequentially.
Algorithm 1 DLCS Channel Estimation. Input: $\boldsymbol{\Phi}$, $\mathbf{r}_u$, $J$.
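As shown in Fig. 2, the beamspace channel amplitude estimation has two stages: the offline training of the CENN and its online deployment. The CENN is first trained offline and then used as the kernel of the beamspace channel amplitude estimation. The input of the CENN is $\mathbf{c}_u$. The amplitude of $\mathbf{h}_u^b$ can be denoted by
$$\mathbf{g}_u \triangleq \big[\,|\mathbf{h}_u^b[1]|, |\mathbf{h}_u^b[2]|, \ldots, |\mathbf{h}_u^b[G]|\,\big]^T \in \mathbb{R}^{G}. \tag{15}$$
The output of the CENN is denoted by $\hat{\mathbf{g}}_u$ and is expected to approximate $\mathbf{g}_u$.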
As illustrated in Fig. 3, the adopted CENN in this work consists of three hidden layers and a fully connected (FC) layer. Since the NN can only deal with real numbers, the input of the CENN is a real-valued vector having $2G$ entries composed of the imaginary and real parts of $\mathbf{c}_u$. Each hidden layer includes an FC layer and a batch normalization (BN) layer. The numbers of neurons in these three hidden layers are set as 1,024, 512, and 256. The activation function adopted in the FC layer is the ReLU function, which can be represented as $f_{\rm Re}(x) = \max(0, x)$.
With the beamspace channel amplitude in (15) and the correlation of the received signals and the measurement matrix in (14), the training data of $\mathbf{c}_u$ and $\mathbf{g}_u$ can be obtained. The process to obtain $\mathbf{c}_u$ and $\mathbf{g}_u$ involves the following four steps: i) we randomly generate a channel vector based on the mmWave channel model in (3); ii) we obtain $\mathbf{g}_u$ based on (15); iii) we compute the received signal vector $\mathbf{r}_u$ based on (10); iv) we obtain the correlation vector $\mathbf{c}_u$ based on (14). We divide the data set into the training set and the validation set randomly, where the size of the training set is nine times the size of the validation set. The output of the CENN is $\hat{\mathbf{g}}_u$.
The training of the CENN aims to minimize the difference between $\hat{\mathbf{g}}_u$ and $\mathbf{g}_u$. The difference, typically called the loss in machine learning, can be calculated in several ways. In this work, we calculate the loss as the mean square error between $\hat{\mathbf{g}}_u$ and $\mathbf{g}_u$ [15]. We adopt the adaptive moment estimation (Adam) optimizer to train the CENN using TensorFlow. The CENN is trained for 1,000 epochs, where 50 mini-batches are utilized in each epoch. The learning rate is a step function of the training epoch: it is initialized to 0.01 and is divided by 5 every 400 epochs.
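The step learning-rate schedule described above can be written down directly. This is a minimal sketch in plain Python rather than the TensorFlow configuration used for training, which the text does not give; the function name and its parameters are hypothetical.

```python
def step_learning_rate(epoch, initial=0.01, factor=5.0, period=400):
    """Learning rate that is divided by `factor` every `period` epochs."""
    return initial / (factor ** (epoch // period))

# CENN schedule over 1,000 epochs: 0.01, then 0.002, then 0.0004
cenn_rates = [step_learning_rate(e) for e in (0, 399, 400, 800, 999)]
```

The same helper covers the schedule stated for the THPNN in Section IV by calling `step_learning_rate(epoch, factor=2.0, period=2000)`.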
During the online deployment of the CENN, we obtain the measured $\mathbf{r}_u$ from practical mmWave channel environments. We compute $\mathbf{c}_u$ based on (14), which is then fed to the offline-trained CENN. The prediction of $\mathbf{g}_u$ by the CENN is $\hat{\mathbf{g}}_u$.

B. Channel Reconstruction
Note that the sparsity of $\mathbf{h}_u^b$ can be impaired by channel power leakage caused by the limited beamspace resolution of $\mathbf{A}$ [25], which indicates that $\mathbf{h}_u^b$ is not perfectly sparse and many entries of $\mathbf{h}_u^b$ have small but nonzero values. Denote the number of dominant entries of $\mathbf{g}_u$ by $J$, which is the beamspace channel sparsity level. In the online deployment stage, we sort the entries of $\hat{\mathbf{g}}_u$ in descending order of absolute value. Then we obtain the indices of the first $J$ entries, which are the prediction of the indices of the $J$ dominant entries in $\mathbf{g}_u$.
We denote the prediction of these $J$ indices by $\Gamma \in \mathbb{R}^{J}$. We further let $\hat{\mathbf{h}}_u^b$ denote an estimate of $\mathbf{h}_u^b$. We initialize $\hat{\mathbf{h}}_u^b$ to be zero. Then the $J$ dominant entries of $\hat{\mathbf{h}}_u^b$ can be computed via the least squares (LS) estimation as
$$\hat{\mathbf{h}}_u^b[\Gamma] = \big(\boldsymbol{\Phi}_\Gamma^H\boldsymbol{\Phi}_\Gamma\big)^{-1}\boldsymbol{\Phi}_\Gamma^H\mathbf{r}_u, \tag{17}$$
where $\boldsymbol{\Phi}_\Gamma$ consists of the $J$ columns of $\boldsymbol{\Phi}$ whose column indices are given by $\Gamma$. Then, using the result $\mathbf{A}^H\mathbf{A} = G\mathbf{I}_{N_A}/N_A$, we obtain the estimated channel vector $\hat{\mathbf{h}}_u$ for the $u$th user from $\hat{\mathbf{h}}_u^b$ based on (11). The proposed DLCS channel estimation scheme thus avoids the greedy search that is commonly adopted by the existing CS-based sparse channel estimation schemes, since it estimates the dominant entries simultaneously instead of sequentially.
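The index selection step can be sketched in a few lines. The example below covers only the sorting and the placement of already-computed LS coefficients into a zero-initialized beamspace vector; the LS solve itself is omitted. The function names and the toy amplitudes are hypothetical.

```python
def dominant_indices(g_hat, sparsity):
    """Indices of the `sparsity` largest-magnitude entries, in ascending order."""
    order = sorted(range(len(g_hat)), key=lambda i: abs(g_hat[i]), reverse=True)
    return sorted(order[:sparsity])

def scatter(coefficients, indices, length):
    """Zero vector of the given length with coefficients placed at indices."""
    h_b = [0j] * length
    for idx, value in zip(indices, coefficients):
        h_b[idx] = value
    return h_b

g_hat = [0.1, 0.9, 0.05, 0.7, 0.3, 0.0]      # toy predicted amplitudes
support = dominant_indices(g_hat, 3)          # all J indices chosen in one shot
h_b_hat = scatter([1 + 1j, 2j, -1 + 0j], support, len(g_hat))
```

All $J$ indices come from a single sort of the predicted amplitudes, in contrast to OMP-type schemes that pick one index per iteration.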

IV. DLQP HYBRID PRECODER DESIGN
Hybrid precoding is usually required for downlink data transmission after the channel estimation. In the proposed DLQP hybrid precoder design method, we first design the analog precoder and then the digital precoder. The main idea of the DLQP method is to first train the THPNN using the estimated channel vectors, where the approximate phase quantization is considered. Then the DHPNN is obtained by replacing the approximate phase quantization in the THPNN with ideal phase quantization, where the estimated channel vectors are fed into the DHPNN to obtain the analog precoder vectors. Finally, the analog precoding matrix is obtained by stacking the analog precoding vectors of all users and the digital precoding matrix is calculated by ZF. The block diagram of the DLQP method is illustrated in Fig. 4. The detailed steps of the DLQP method are summarized in Algorithm 2.

A. Analog Precoder Design
Denote the analog precoder vector and the approximate analog precoder vector by $\mathbf{f}_u \triangleq [f_{u,1}, f_{u,2}, \ldots, f_{u,N_A}]^T \in \mathbb{C}^{N_A}$ and $\tilde{\mathbf{f}}_u \triangleq [\tilde{f}_{u,1}, \tilde{f}_{u,2}, \ldots, \tilde{f}_{u,N_A}]^T \in \mathbb{C}^{N_A}$, respectively, for $u = 1, 2, \ldots, U$. As shown in Fig. 5, $\hat{\mathbf{h}}_u$ is fed to the THPNN to obtain $\tilde{\mathbf{f}}_u$, while $\hat{\mathbf{h}}_u$ is fed to the DHPNN to obtain $\mathbf{f}_u$. Note that the difference between the THPNN and the DHPNN is that we use approximate phase quantization in the THPNN so that the NN can be trained, while we use ideal phase quantization in the DHPNN to meet the practical constraint of limited phase shifter resolution.
We define $B$ as the quantization bit number of the phase shifters used at the BS, where the RF phase is quantized into $Q \triangleq 2^B$ discrete values. Each entry of $\mathbf{f}_u$ is drawn from the set $\{e^{j2\pi n/Q}, n = 1, 2, \ldots, Q\}$. The hybrid precoder design schemes based on beamsteering codebooks design the analog precoder vector as the steering vector of the LOS channel path [12]-[14]. However, such schemes require $Q \ge N_A$ to obtain high beamforming gain and have unsatisfactory performance when $Q < N_A$. This limitation motivates us to use the NN to design the analog precoder when $Q < N_A$. As shown in Fig. 4, the hybrid precoder design has two stages: the offline training of the THPNN and the online deployment of the DHPNN, where the DHPNN is obtained from the THPNN by replacing one layer of the THPNN. The THPNN is first trained offline and then the DHPNN is obtained, which is used as the kernel of the hybrid precoder design. The input of the THPNN and the DHPNN is $\hat{\mathbf{h}}_u$. The outputs of the DHPNN and the THPNN are the analog precoder vector $\mathbf{f}_u$ and the approximate analog precoder vector $\tilde{\mathbf{f}}_u$, respectively.
As illustrated in Fig. 5, both the adopted THPNN and DHPNN consist of six layers, where five of them are shared. Since the NN can only deal with real numbers, the input of the THPNN and the DHPNN is a real-valued vector having $2N_A$ entries composed of the imaginary and real parts of $\hat{\mathbf{h}}_u$. Each of the first four layers consists of a convolutional (Conv) layer and a pooling (Pool) layer. The kernel size and strides of each Conv layer are set to be five and one, respectively. The numbers of filters of these three Conv layers are set as 16, 32, and 64, respectively. Both the pool size and strides of each Pool layer are set to be two. The activation function adopted in the first three layers is the ReLU function, while that adopted in the fourth layer is the Sigmoid function, which can be represented as $f_{\rm Sig}(x) = \frac{1}{1+e^{-x}}$. Since the output of the FC and Pool layers can only be real numbers, we cannot directly obtain the complex-valued $\mathbf{f}_u$. Therefore the output of the fourth layer is taken as the phase of the analog precoder vector, which is denoted by $\boldsymbol{\phi} \triangleq [\phi_1, \phi_2, \ldots, \phi_{N_A}]^T \in \mathbb{R}^{N_A}$, where $\phi_n \in [0, 2\pi)$ for $n = 1, 2, \ldots, N_A$. Since the RF phase is quantized into $Q$ discrete values, in the DHPNN we use the ideal quantization (IQ) layer to quantize the continuous phase vector $\boldsymbol{\phi}$ into the discrete phase vector. Denote the IQ function and the step function by $\Lambda(\cdot)$ and $\varepsilon(\cdot)$, respectively, where $\varepsilon(x) = 0$ for $x < 0$ and $\varepsilon(x) = 1$ for $x \ge 0$. Then $\Lambda(\cdot)$ can be written in terms of $\varepsilon(\cdot)$. It is shown in Fig. 6 that $\Lambda(x)$ is not differentiable at $x = 2\pi q/Q$, $q = 1, 2, \ldots, Q$, which indicates that standard deep learning training algorithms, such as stochastic gradient descent (SGD), cannot be directly applied to train the NN. To overcome this problem, we use the approximate quantization (AQ) layer in the THPNN for offline training instead of the IQ layer. Therefore the DLQP hybrid precoder design method uses two NNs. In contrast, the DLCS channel estimation scheme needs no quantization and therefore uses only one NN.
Denote the AQ function by $\Gamma(x)$ [26], which is parameterized by a constant $\eta$ representing the degree of approximation. As shown in Fig. 6, $\Gamma(x)$ approximates $\Lambda(x)$ more accurately as $\eta$ becomes larger. It is also shown that $\Gamma(x)$ is differentiable for $x \in [0, 2\pi)$. Then we use $\Gamma(x)$ to quantize $\boldsymbol{\phi}$ in the THPNN for offline training. Denote the phase vector after quantization by $\tilde{\boldsymbol{\psi}} \triangleq [\tilde{\psi}_1, \tilde{\psi}_2, \ldots, \tilde{\psi}_{N_A}]^T \in \mathbb{R}^{N_A}$, which can be represented as $\tilde{\psi}_n = \Gamma(\phi_n)$ for $n = 1, 2, \ldots, N_A$.
During the offline training of the THPNN, we generate the dataset of $\hat{\mathbf{h}}_u$ and $\mathbf{h}_u$ based on the output of the CENN and the simulated mmWave channel environment. With the channel vector in (3) and the estimated channel vector in (18), the training data of $\hat{\mathbf{h}}_u$ and $\mathbf{h}_u$ can be obtained. The process to obtain $\hat{\mathbf{h}}_u$ and $\mathbf{h}_u$ involves the following five steps: i) we randomly generate the channel vector $\mathbf{h}_u$ based on the mmWave channel model in (3); ii) we compute the received signal vector $\mathbf{r}_u$ based on (10); iii) we obtain the correlation vector $\mathbf{c}_u$ based on (14); iv) we feed $\mathbf{c}_u$ to the offline-trained CENN for the DLCS channel estimation to get $\hat{\mathbf{g}}_u$; v) we obtain the estimated channel vector $\hat{\mathbf{h}}_u$ based on the channel reconstruction in (18). The output of the THPNN is $\tilde{\mathbf{f}}_u$.
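The relation between the IQ and AQ functions can be illustrated numerically. The exact expressions are not reproduced here, so the sketch below assumes one plausible form: the IQ function as a staircase of unit steps located at $2\pi q/Q$, and the AQ function as the same sum with each step replaced by a logistic function sharpened by $\eta$. Both forms and all function names are assumptions made for illustration. The values $Q = 16$ and $\eta = 100$ are those of Section V.

```python
import math

def ideal_quantize(x, q_levels):
    """Staircase quantizer (assumed form): (2*pi/Q) * number of steps passed."""
    step = 2.0 * math.pi / q_levels
    passed = sum(1 for q in range(1, q_levels + 1) if x - q * step >= 0.0)
    return step * passed

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def approx_quantize(x, q_levels, eta):
    """Smooth surrogate (assumed form): unit steps replaced by sigmoids."""
    step = 2.0 * math.pi / q_levels
    return step * sum(sigmoid(eta * (x - q * step))
                      for q in range(1, q_levels + 1))

Q, ETA = 16, 100.0
phase = 1.0                                   # a phase in [0, 2*pi)
exact = ideal_quantize(phase, Q)              # lands on a multiple of 2*pi/Q
smooth = approx_quantize(phase, Q, ETA)       # close to `exact` for large eta
gap_small_eta = abs(approx_quantize(phase, Q, 2.0) - exact)
gap_large_eta = abs(smooth - exact)
```

Under these assumptions the gap between the two functions shrinks as $\eta$ grows, while the surrogate keeps a nonzero gradient that training can propagate.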
The training of the THPNN aims to maximize the beamforming gain, i.e., the inner product of $\tilde{\mathbf{f}}_u$ and $\mathbf{h}_u$. Since the THPNN is trained to minimize the loss, we calculate the loss as the negative of the inner product [18]. The training of the CENN aims to minimize the difference between $\hat{\mathbf{g}}_u$ and $\mathbf{g}_u$, while the training of the THPNN aims to maximize the beamforming gain. Therefore the loss function in (16) is different from that in (24). Note that the output of the NN is the analog precoder vector $\tilde{\mathbf{f}}_u$, while we need to calculate the analog precoding matrix $\mathbf{F}_R$ so that the spectral efficiency can be obtained. However, the computational process from $\tilde{\mathbf{f}}_u$ to $\mathbf{F}_R$ is not differentiable, so it cannot be used in NN training. Therefore we do not set the spectral efficiency as the loss. We adopt the adaptive moment estimation (Adam) optimizer to train the THPNN using TensorFlow. The THPNN is trained for 6,000 epochs, where 200 mini-batches are utilized in each epoch. The learning rate is a step function of the training epoch: it is initialized to 0.01 and is halved every 2,000 epochs.
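A loss of this kind can be sketched as follows. The example assumes the gain is measured as the magnitude of the inner product between a unit-modulus precoder and the channel vector, with no further scaling; the exact expression in (24) may differ, and the function names are hypothetical.

```python
import cmath

def beamforming_gain(f, h):
    """Magnitude of the inner product between precoder f and channel h."""
    return abs(sum(fn.conjugate() * hn for fn, hn in zip(f, h)))

def gain_loss(f, h):
    """Negative gain, so that minimizing the loss maximizes the gain."""
    return -beamforming_gain(f, h)

h = [cmath.exp(1j * 0.7 * n) for n in range(8)]          # toy channel
matched = [cmath.exp(1j * cmath.phase(x)) for x in h]    # phase-aligned
flat = [1 + 0j] * 8                                      # unaligned phases
loss_matched = gain_loss(matched, h)
loss_flat = gain_loss(flat, h)
```

The phase-aligned precoder reaches the lowest loss, which is the behaviour the training procedure rewards.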
During the online deployment stage, we obtain the DHPNN from the offline-trained THPNN by replacing the AQ layer in the THPNN with the IQ layer. To obtain the network input $\hat{\mathbf{h}}_u$, we obtain the measured $\mathbf{r}_u$ from practical mmWave channel environments. We compute $\mathbf{c}_u$ based on (14), which is then fed to the offline-trained CENN for the DLCS channel estimation to get $\hat{\mathbf{g}}_u$. We obtain the estimated channel vector $\hat{\mathbf{h}}_u$ based on the channel reconstruction in (18). We then feed $\hat{\mathbf{h}}_u$ to the DHPNN for the DLQP hybrid precoder design to get $\mathbf{f}_u$. Note that, different from the offline training of the THPNN, we use the IQ function $\Lambda(x)$ to quantize $\boldsymbol{\phi}$ in the DHPNN, which ensures that each entry of $\mathbf{f}_u$ is drawn from the set $\{e^{j2\pi n/Q}, n = 1, 2, \ldots, Q\}$. Denote the phase vector after quantization by $\boldsymbol{\psi} \triangleq [\psi_1, \psi_2, \ldots, \psi_{N_A}]^T \in \mathbb{R}^{N_A}$, which can be represented as $\psi_n = \Lambda(\phi_n)$ for $n = 1, 2, \ldots, N_A$. Based on the phase vector $\boldsymbol{\psi}$, the analog precoder vector $\mathbf{f}_u$ can be obtained. By setting $\psi_n$ as the phase of $f_{u,n}$, $f_{u,n}$ can be represented as $f_{u,n} = e^{j\psi_n}$, $n = 1, 2, \ldots, N_A$.
After obtaining $\mathbf{f}_u$ in the online deployment stage, the analog precoding matrix $\mathbf{F}_R$ can be represented as $\mathbf{F}_R = [\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_U]$. It is shown in (24) that the analog precoder is designed to maximize the beamforming gain, where the quantization of the RF phase is considered. Note that although we use the AQ layer for offline training, we adopt the IQ layer in the online deployment stage, which guarantees that the adopted NN is consistent with the practical hardware constraint of limited phase shifter resolution.

B. Digital Precoder Design
We denote the estimated channel matrix for the BS and all users by $\hat{\mathbf{H}}$. We further define the effective channel matrix as the estimated channel matrix seen through the analog precoder $\mathbf{F}_R$. Analog precoding aims to form directional beams using a phase shifter network, while digital precoding is designed to mitigate the interference among multiple data streams after analog precoding. Then the ZF digital precoding matrix is obtained from the effective channel matrix. To satisfy the total power constraint, each column of the designed digital precoder, denoted by $\mathbf{f}_{B,u}$, should be normalized, i.e., $\mathbf{f}_{B,u} \leftarrow \mathbf{f}_{B,u}/\|\mathbf{F}_R\mathbf{f}_{B,u}\|_2$, such that $\|\mathbf{F}_R\mathbf{f}_{B,u}\|_2^2 = 1$, $u = 1, 2, \ldots, U$. The proposed DLQP hybrid precoder design method thus obtains the analog precoder under the quantized phase constraint, which is of great value in practical mmWave systems.
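The digital precoder step can be sketched without any numerical library. The example assumes the effective channel is the product of a $U \times N_A$ estimated channel matrix and the $N_A \times U$ analog precoder, so that it is square and the ZF solution reduces to its inverse. The helper names and the toy matrices are hypothetical.

```python
import math

def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def invert(m):
    """Gauss-Jordan inverse of a small square complex matrix."""
    n = len(m)
    aug = [list(row) + [1 + 0j if i == j else 0j for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("effective channel is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def zf_digital_precoder(h_hat, f_rf):
    """ZF digital precoder with per-column power normalization."""
    f_bb = invert(mat_mul(h_hat, f_rf))       # inverse of effective channel
    hybrid = mat_mul(f_rf, f_bb)              # N_A x U hybrid precoder
    for u in range(len(f_bb)):
        norm = math.sqrt(sum(abs(row[u]) ** 2 for row in hybrid))
        for r in range(len(f_bb)):
            f_bb[r][u] /= norm                # unit-norm hybrid column
    return f_bb

# Toy example: U = 2 users, N_A = 4 antennas, 2-bit phase shifters
h_hat = [[1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j, 0.1j],
         [0.2 - 0.7j, -1 + 0.4j, 0.6 + 0.1j, 0.9 - 0.3j]]
f_rf = [[1, 1], [1j, -1], [-1, 1j], [-1j, -1j]]
f_bb = zf_digital_precoder(h_hat, f_rf)
end_to_end = mat_mul(mat_mul(h_hat, f_rf), f_bb)   # should be diagonal
```

The off-diagonal entries of `end_to_end` vanish up to rounding, which is the interference-free property that ZF targets, and the column scaling does not disturb it.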

V. SIMULATION RESULTS
In the following we present the performance evaluation of the proposed DLCS channel estimation scheme and the proposed DLQP hybrid precoder design method. We consider a multi-user mmWave massive MIMO communication system, where the BS equipped with $N_R = 4$ RF chains and $N_A = 64$ antennas serves $U = 3$ single-antenna users. We set $G = 128$ according to [6], and we set the number of paths in the mmWave channel as $L_u = 2$, where $g_{u,1} \sim \mathcal{CN}(0, 1)$ and $g_{u,2} \sim \mathcal{CN}(0, 0.5)$ [6], [11]. For the uplink pilot transmission, we set $\mathbf{F}_B^k = \mathbf{I}_{N_R}$. Therefore the hybrid precoding matrix is equal to the analog precoding matrix and is also a random matrix. The quantization bit number of the phase shifters used at the BS is $B = 4$, leading to $Q = 16$ [6]. We set $\eta = 100$. Since $\mathbf{h}_u^b$ is not ideally sparse due to the power leakage, the beamspace channel sparsity level should be larger than $L_u$, i.e., $J > 2$. We set $J = 6$ and $J = 7$ in the simulations. Note that the CENN is trained to predict the beamspace channel amplitude, so the training process of the CENN is independent of $J$. The proposed DLCS channel estimation scheme is compared with the existing OMP [5] and DGMP [4] channel estimation schemes, while the proposed DLQP hybrid precoder design method is compared with the existing QALS [14] hybrid precoder design method. We also compare the DLQP method with the Exhaustion hybrid precoder design method, i.e., we generate the analog precoding matrix $\mathbf{F}_R$ 30,000 times, where each entry of $\mathbf{F}_R$ is randomly drawn from the set $\{e^{j2\pi n/Q}, n = 1, 2, \ldots, Q\}$, and then the digital precoder is designed according to Algorithm 2. We select the hybrid precoder with the largest spectral efficiency as the output of the Exhaustion hybrid precoder design method.
We first evaluate the performance of the proposed DLCS channel estimation scheme in Fig. 7 to Fig. 10. In Fig. 7, the channel estimation performance of the proposed DLCS scheme is compared with that of the existing schemes for different SNRs. The channel estimation performance is measured by the normalized mean-squared error (NMSE), which is defined in (32). We use $K = 8$ time slots to transmit pilots for uplink channel estimation. To make a fair comparison, we fix the pilot training time slots to be eight for the OMP and DGMP schemes. It is shown that the DLCS scheme has better channel estimation performance than the existing schemes. When SNR = 10 dB, the DLCS scheme with $J = 6$ has 51.7% and 65.8% performance improvements over the OMP and DGMP schemes, respectively, while the DLCS scheme with $J = 7$ has 51.3% and 65.5% performance improvements over the OMP and DGMP schemes, respectively. We explain the reason for the performance improvements as follows. The OMP scheme estimates the dominant beamspace channel entries sequentially, which cannot guarantee global optimality. The DGMP scheme only estimates the LOS path, while the proposed DLCS scheme can simultaneously estimate all the dominant beamspace channel entries.
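For reference, an NMSE metric of this kind can be computed as below. The sketch assumes the common definition, squared estimation error normalized by the channel energy and averaged over trials; the exact form of (32), for instance how users are aggregated, may differ. The function names are hypothetical.

```python
import math

def nmse(h_hat, h):
    """Squared estimation error normalized by the energy of the true channel."""
    error = sum(abs(a - b) ** 2 for a, b in zip(h_hat, h))
    energy = sum(abs(b) ** 2 for b in h)
    return error / energy

def average_nmse_db(estimates, channels):
    """NMSE averaged over trials and expressed in dB."""
    values = [nmse(e, c) for e, c in zip(estimates, channels)]
    return 10.0 * math.log10(sum(values) / len(values))

h_true = [1 + 0j, 1 + 0j]
h_est = [1 + 0j, 0j]            # second entry missed entirely
single = nmse(h_est, h_true)    # half of the channel energy is lost
```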
Fig. 8 compares the spectral efficiency of the proposed DLCS scheme with that of the existing schemes versus SNR. Based on the estimated channel, there are various methods to design the hybrid precoding for mmWave downlink transmission. Similar to [6], in this work we compare the upper bound of the downlink spectral efficiency, which can be simply measured with fully-digital ZF precoding. Let $F^{dl}$ denote the ZF precoding matrix computed from the estimated channel. To meet the total power budget, the $u$th row of $F^{dl}$, denoted by $f_u^{dl}$, is normalized as $f_u^{dl} \leftarrow f_u^{dl} / \|f_u^{dl}\|_2$ such that $\|f_u^{dl}\|_2 = 1$ for $u = 1, 2, \ldots, U$. The spectral efficiency is then computed as in [6]. It is seen from Fig. 8 that the DLCS scheme achieves higher spectral efficiency than the existing schemes. When SNR = 10 dB, the DLCS scheme with $J = 6$ has 2.5% and 7.8% performance improvements over the OMP and DGMP schemes, respectively, while the DLCS scheme with $J = 7$ has 2.6% and 8.3% performance improvements over the OMP and DGMP schemes, respectively. The spectral efficiency gap between the schemes is smaller than the NMSE gap because the NMSE is much more sensitive to the success rate of the sparse recovery, while the spectral efficiency is determined by the beamforming gain and is less sensitive to the success rate of the sparse recovery.
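The row normalization and the resulting spectral efficiency can be illustrated with the following sketch. It is a hedged example rather than the exact procedure of [6]: the pseudo-inverse form of the ZF precoder, the $U \times N_A$ channel layout, the equal-power SINR model with unit noise variance, and all function names are assumptions made for illustration.

```python
import numpy as np

def zf_precoder_rows(h_est):
    """ZF precoder built from a U x N_A channel estimate, one unit-norm row per user."""
    f_dl = np.linalg.pinv(h_est).T                    # U x N_A; row u serves user u
    return f_dl / np.linalg.norm(f_dl, axis=1, keepdims=True)

def spectral_efficiency(h_true, f_dl, snr_linear):
    """Sum rate in bits/s/Hz over the true channel (assumed SINR model)."""
    g = np.abs(h_true @ f_dl.T) ** 2                  # g[u, v] = |h_u f_v|^2
    signal = np.diag(g)
    interference = g.sum(axis=1) - signal
    sinr = snr_linear * signal / (snr_linear * interference + 1.0)
    return float(np.sum(np.log2(1.0 + sinr)))
```

The precoder is designed from the estimate but evaluated on the true channel, so estimation errors appear as residual inter-user interference. With a perfect estimate the interference term vanishes and only the beamforming gain remains.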
In Fig. 9, the channel estimation performance of the DLCS, OMP, and DGMP schemes is compared versus the number of time slots used for channel training. We use the same number of pilot training time slots for the DLCS, OMP, and DGMP schemes. The SNR is fixed at 15 dB. Fig. 9 shows that the DLCS scheme has the best channel estimation performance. When the number of pilot training time slots is fixed to $K = 7$, the DLCS scheme with $J = 6$ has 56.1% and 85.2% performance improvements over the OMP and DGMP schemes, respectively, while the DLCS scheme with $J = 7$ has 56.0% and 84.3% performance improvements over the OMP and DGMP schemes, respectively.
Fig. 10 compares the spectral efficiency of the different schemes versus the number of time slots used for channel training. The system parameters are the same as those for Fig. 9. It is shown that the DLCS scheme achieves higher spectral efficiency than the OMP and DGMP schemes. When the number of channel training time slots is more than eight, the spectral efficiency of the DLCS scheme remains constant, indicating that $K = 8$ is sufficient to obtain the full channel state information.
Next, we evaluate the performance of the proposed DLQP hybrid precoder design method in Fig. 11 and Fig. 12. Fig. 11 compares the spectral efficiency of the proposed DLQP hybrid precoder design method with that of the existing methods versus SNR. Since the QALS method needs high phase shifter resolution to obtain analog beamforming vectors aligned with the dominant channel paths, we also simulate the QALS method with $Q = 64$. It is seen from Fig. 11 that the DLQP method has better spectral efficiency performance than the existing methods. When SNR = 10 dB, the DLQP method with $J = 6$ has 59.9%, 83.6% and 3.5% performance improvements over the Exhaustion, QALS with $Q = 16$ and QALS with $Q = 64$ methods, respectively, while the DLQP method with $J = 7$ has 62.0%, 86.5% and 3.6% performance improvements over the Exhaustion, QALS with $Q = 16$ and QALS with $Q = 64$ methods, respectively. The reason for the performance gap is as follows. In the Exhaustion method, although we generate the analog precoding matrix $F_R$ 30,000 times, the total number of possible $F_R$ is $Q^{N_A U} = 1.55 \times 10^{231}$, which is far too many to search within an acceptable computational complexity. In the QALS method, the AoA of the LOS channel path cannot be well aligned with the small number of available steering vectors of quantized angles.
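The size of the Exhaustion search space quoted above can be checked with exact integer arithmetic. The short sketch below uses only the parameters stated in this section; the variable names are chosen here for illustration.

```python
# Parameters from the simulation setup: Q phase levels, N_A antennas, U users.
Q, N_A, U = 16, 64, 3

# Each of the N_A * U entries of F_R takes one of Q values.
n_candidates = Q ** (N_A * U)

# Express the exact integer in scientific notation.
exponent = len(str(n_candidates)) - 1
mantissa = n_candidates / 10 ** exponent
print(f"{mantissa:.2f} x 10^{exponent}")   # prints "1.55 x 10^231"

# Fraction of the search space covered by 30,000 random draws.
n_trials = 30_000
fraction = n_trials / n_candidates
```

The fraction covered by 30,000 draws is on the order of $10^{-227}$, which illustrates why random search is unlikely to find a precoder close to the best one.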
Fig. 12 compares the spectral efficiency of the different hybrid precoding methods versus the number of time slots used for channel training. The system parameters are the same as those for Fig. 9. It is shown that the DLQP method achieves higher spectral efficiency than the Exhaustion and QALS methods. When the number of pilot training time slots is fixed to $K = 7$, the DLQP method with $J = 6$ has 49.3%, 70.1% and 2.9% performance improvements over the Exhaustion, QALS with $Q = 16$ and QALS with $Q = 64$ methods, respectively, while the DLQP method with $J = 7$ has 51.0%, 71.0% and 2.7% performance improvements over the Exhaustion, QALS with $Q = 16$ and QALS with $Q = 64$ methods, respectively.

VI. CONCLUSIONS
We proposed a DLCS channel estimation scheme and a DLQP hybrid precoder design method for multi-user mmWave massive MIMO communication systems. The proposed DLCS scheme and DLQP method were compared with existing works in terms of NMSE and spectral efficiency. Simulation results showed that the proposed DLCS scheme has better channel estimation performance than the existing schemes and that the proposed DLQP method achieves high spectral efficiency with low-resolution phase shifters. As future work, it is worth developing deep learning based channel estimation and hybrid precoding designs for wideband multi-user mmWave massive MIMO transmission.