AoD-Adaptive Channel Feedback for FDD Massive MIMO Systems With Multiple-Antenna Users

In this paper, we propose an efficient feedback scheme for an angle of departure (AoD) based channel estimation in frequency division duplex (FDD) massive multiple-input multiple-output (MIMO) systems with multiple antennas at the users. The channel feedback scheme is based on zero-forcing block diagonalization (BD) and it is proposed for two distinct design cases; in case I, the number of streams intended for a user equals the number of antennas at that user; in case II, the number of streams is less than the number of receive antennas. Case I is applicable in scenarios where high data rate requirements are needed as it transmits data symbols over all of the available degrees of freedom of the system. Diversely, case II is applicable when reliability is a priority in the system as it uses the additional receive antennas at the user to achieve spatial diversity to enhance the link performance. The proposed scheme is analyzed for the two cases by quantifying the downlink rate gap from the case of perfect channel state information (CSI). Moreover, we design structured feedback codebooks based on optimal subspace packing in the Grassmannian manifold and show that these codes achieve close performance to the perfect CSI case. Additionally, a vector quantization scheme is proposed to quantize the user’s channel matrix when optimal power allocation across multiple streams is adopted in the low signal-to-noise ratio (SNR) region. The feedback codebooks are based on optimal line packing in the Grassmannian manifold, where every vector of the user’s channel matrix is quantized and sent to the BaseStation. The results demonstrate a fundamental trade-off between vector quantization, with power optimization across the data streams, and subspace quantization. Specifically, vector quantization codebooks outperform subspace-based codebooks in the low SNR region, while the situation is reversed in the high SNR region.


I. INTRODUCTION
Massive multiple-input multiple-output (MIMO) wireless communication systems have been shown to introduce dramatic improvements, in both spectral and energy efficiency, by simultaneously serving multiple users with simple linear precoders [1]- [3]. To fully utilize multiplexing and array gains of massive MIMO, the downlink channel state information (CSI) must be acquired at the BaseStation (BS) to perform precoding and digital beam-forming tasks on the transmitted signals. In time division duplex (TDD) massive MIMO systems, there have been many works in acquiring downlink channels at the BS using the estimated uplink channels by utilizing the channel reciprocity. However, in frequency division duplex (FDD) systems, channel reciprocity cannot be used to obtain the downlink CSI at the BS. In this case, the downlink CSI is generally estimated at the user equipment and then fed back to the BS. The huge number of antennas at the BS leads to an overwhelming overhead, which adversely impacts the system's bandwidth and energy efficiency as well as exacerbating its latency, rendering such a system to be impractical for time-varying channels. Hence reducing this overhead is paramount for realizing the potentials of this technique. FDD deployments provide wider coverage than TDD as mobile equipments in FDD systems transmit on a continuous basis, which enables devices to achieve cell-edge rates farther from the base station [4]. Hence, FDD needs fewer BSs than TDD as long as FDD devices achieve desired cell-edge rates at farther distances. This results in reducing the deployment and operating costs since FDD requires fewer BSs for the same coverage. The previous advantages give a strong motivation to advance FDD massive MIMO topic.

A. RELATED WORK
Several CSI techniques were proposed in the literature for reducing the feedback overhead in FDD massive MIMO systems. For instance, in [5], a spatially common sparsity adaptive channel estimation and feedback scheme was proposed. The authors developed a compressive sensing scheme that exploits the sparse nature of the downlink channels in the angular domain for reliable downlink CSI estimation and feedback with significantly reduced overhead. The authors in [6] also proposed a channel feedback algorithm based on compressive sensing to reduce the feedback load without degrading the quality of channel reconstruction at the BS. They exploited the correlation of CSI to design a quasi-signalindependent dictionary to enhance the quality of CSI recovery at the BS. In [7], the authors exploited the hidden joint sparsity structure in the multi-user MIMO channel matrix by presenting a joint multi-user MIMO channel recovery at the BS. Multiple users fed back distributed channel measurements to the BS to recover the channel matrix using a joint orthogonal matching pursuit algorithm. A robust closed-loop pilot and CSI feedback resource adaptation scheme was proposed in [8], which exploits the joint sparsity of the multi-user massive MIMO channels in order to improve the CSI estimation. Additionally, the framework can minimize the needed pilot and feedback resources for successful CSI recovery. In [9], the authors were able to estimate the users' downlink channel covariance matrix from the uplink pilots using the fact that the angular scattering function of the user channels is invariant over frequency bands. They proposed a novel sparsifying precoder, based on the covariance information, to maximize the effective sparsified channel matrix's rank when the sparsity of each effective user channel is not larger than some desired downlink pilot dimension, resulting in reducing the downlink training and the CSI feedback overhead. In [10], the user first compresses the CSI, based on some compressive sensing approach, then the BS reconstructs the CSI using a deep neural network. A real-time CSI feedback architecture based on a long-short-term memory (LSTM) was proposed in [11] using deep learning CSI sensing and recovery network.
There is also much work on feedback codebook design for massive MIMO systems. For example, a codebook was proposed in [12] for compressed channel feedback for correlated massive MIMO channels. This codebook can quantize and feedback the compressed low-dimensional CSI with reduced overhead. A reduced-dimensional subspace codebook for lens antenna array aided massive MIMO systems was proposed in [13]. By leveraging the concept of angle coherence time, large-dimensional vectors in the channel subspace are first generated. Based on these vectors, the reduced-dimensional subspace codebook is created by considering both the lens and the beam selector. The equivalent channel is quantized and fed back to the BS using this codebook. The authors of [14] proposed a novel dual-stage Grassmannian product quantization approach suitable for high-dimensional CSI. This method works efficiently when the channel can be decomposed in the angular domain, where efficient discrete Fourier transform (DFT) codebooks can be exploited for CSI compression. In [15], the authors proposed a scheme to extrapolate the channel frequency response from the uplink channel estimates to the downlink frequency range. This approach completely removes the need for any feedback from the users' side to the BS. However, the price for this is a degradation in the quality of the downlink channel estimates, due to a downlink spectral efficiency reduction.
In the aforementioned works, the compressive sensing and channel statistics-based feedback techniques suffer from high complexity to provide high-quality channel estimates and reconstruction accuracy at the BS without significantly decreasing the spectral efficiency. In contrast to the above works, an angle of departure (AoD)-adaptive subspace codebook for channel feedback was proposed in [16]. The paper utilized the idea that the angles of departure vary much slower than the channel gains, which results in a significant reduction in the required feedback overhead. This is because the channel vector is constrained in a lower-dimensional subspace of the full M -dimensional space (where M is the number of transmitting antennas at the BS) during the angle coherence time. Table 1 categorizes the related reviewed works along with their main distinctive characteristics so as to better position this paper's work by difference.

B. CONTRIBUTIONS
In contrast to existing work, this paper designs and studies an AoD-adaptive channel feedback framework in massive MIMO systems when the users have multiple antennas, which, to the best of our knowledge, has never been addressed before. 1 Different from the above-mentioned channel statistics codebooks, the AoD based massive MIMO channel feedback leverages the concept of angle coherence time in the millimeter-wave (mmWave) channel model. The AoDs information can be estimated with low overhead only once every angle coherence period using the sparse downlink channel estimates at the users. Then, only the low-dimensional channel gains of the resolvable paths of each receive antenna are fed back during the angle coherence period, which we address in this work. As the number of receive antennas increases, the channel feedback overhead increases, which could exhaust the network resources. To overcome this, we propose a feedback scheme that jointly reports the CSI of the receive antennas to the BS resulting in a massive reduction in the required feedback overhead. More specifically, we do not quantize and feed back the channel vector of each receive antenna independently, instead, we quantize and report the subspace spanned by these vectors to the BS which results in a significant rate improvement per each user. To achieve this, we use BD [17] for precoding. Two different design cases are considered; in one, the number of streams intended for a user equals the number of antennas at that user, and in the other case, the number of streams is less than the number of antennas. Since BD involves simultaneous transmissions of multiple data streams to each user while canceling the interference from other users, it only needs the channel subspace of each user's channel matrix at the BS, which requires fewer feedback bits compared to reporting 1 In the multiple receive antenna case, the proposed scheme is based on quantizing and feeding back subspaces not vectors as in the single receive antenna case, where we use Grassmannian subspace packing to design the quantizer. Moreover, to be able to only feed back the channel subspace not the whole matrix, zero-forcing block diagonalization (BD) is used as the interference canceling scheme instead of the regular zero-forcing (ZF) as in the single antenna case. As a consequence, the feedback scheme design is different and all of the derived analytical bounds that evaluate the rate loss of our proposed feedback scheme are different from the single antenna case and provide insights that cannot be drawn from the simpler, single antenna users case. the actual channel matrix. Furthermore, optimally designed subspace codebooks to quantize the subspace of each user's channel matrix are devised. The main goal is to design low overhead feedback schemes while minimizing the system rate loss due to channel quantization. In addition, this paper extends our previous work in [18], [19] by providing detailed mathematical analysis to quantify the rate loss resulting from the proposed subspace based quantization scheme for the two considered cases. Moreover, a vector quantization based codebook design, based on water-filling and optimal line packing on the Grassmannian manifold, is proposed for the low signal-to-noise ratio (SNR) region, and it is shown to enhance the downlink spectral efficiency. The contributions of this paper can be summarized as follows: 1) We propose an efficient and structured feedback BD-based AoD-adaptive codebooks using optimal subspace packing on the Grassmannian manifold for massive MIMO systems with multiple antenna users. 2) A channel feedback scheme for two distinct design cases is proposed; in one case the number of streams intended for a user equals the number of antennas at that user, and in the other, the number of streams is less than the number of receive antennas. 3) Detailed performance analysis for the two considered cases is provided to quantify the rate gap between the system with perfect CSI and our proposed scheme, in which we prove that the required number of feedback bits to achieve a constant rate gap scales linearly with SNR. 4) A vector quantization codebook, based on optimal line packing on the Grassmannian manifold, is proposed to enhance the per-user rate in the low SNR region when power allocation (water-filling) across multiple data streams is used, where it is shown that vector quantization codebooks outperform subspace-based codebooks in the low SNR region, while the situation is reversed in the high SNR region. The rest of the paper is organized as follows. In Section II, the adopted system model is presented, while in Section III, the design of BD-based beam-forming matrices for the two considered scenarios is described. The AoD adaptive subspace codebook used for channel quantization is presented in Section IV. Throughput degradation due to digital channel feedback is analyzed in Section V. The water-fillingbased channel quantization and feedback are discussed in Section VI. Simulation results and conclusions are given in Sections VII and VIII, respectively.
Notation: Matrices and vectors are written in boldface letters; matrices are capital letters while vectors are lower-case letters. The transpose and conjugate transpose (Hermitian) of a matrix are denoted by (·) T , (·) H , respectively. |x| is the absolute value of a scalar. The notation (x) + means that (x) + = 0 when x ≤ 0, while (x) + = x when x > 0. E[·] denotes the expectation operator. Finally, I P denotes the identity matrix of size P × P.

II. SYSTEM MODEL A. DOWNLINK MASSIVE MIMO CHANNEL MODEL
In this paper, we consider the mmWave massive MIMO broadcast (downlink) system with a single BS communicating with K multi-antenna users as shown in Fig. 1. 2 The BS has M transmitting antennas while the kth user, ∀k ∈ {1, 2, · · · , K }, has N k receiving antennas. The typical massive MIMO model is used in our system model where the number of transmitting antennas is much higher than the number of users (or users' antennas in our case), i.e., M k N k (typically M is in the range of hundreds in massive MIMO). We adopt the narrowband ray-based downlink channel model, which was explained in detail in Chapter 7 of [23] and was adopted in [16] for the case of single receive antenna, for the downlink channels. The number of resolvable paths seen by the BS to the kth user, P k , depends on the scatters around the BS, which are few typically ranging from 2 to 8, and each path has a different AoD from the BS as shown in Fig. 1. The receive antennas are located in a local multi-path fading environment with many reflectors around it as shown in Fig. 1, hence, the resolvable paths are received at each receive antenna after passing through a multiplicative fading channel. Consequently, each path is multiplied by a complex Rayleigh fading coefficient g i ∼ CN (0, 1). Therefore, the overall channel vector seen at the jth receive antenna of the kth user is given as 2 The mmWave channel with non line of sight (NLOS) was investigated and compared against the line of sight (LOS) case in [20], where the authors conducted extensive propagation measurement campaigns at 28 GHz and 38 GHz ranges to study the statistical characteristics of mmWave propagation. Other realistic field measurement campaigns, where the NLOS case was extensively studied, at 28 GHz and 73 GHz in New York City (USA) [21], and at 10 − 100 GHz in Berlin (Germany) [22] were held, and the results were so promising. In light of the previous measurements, the authors in [16] assumed the mmWave channel model in massive MIMO with two-hop scattering with NLOS, where they considered the case of single antenna users.
where parameter θ k,i (1 ≤ i ≤ P k ) represents the AoDs of the ith path of the kth user. The transmitting antennas at the BS form a uniform linear array (ULA) [1], where a(θ k,i ) ∈ C 1×M is a steering vector that represents the antenna response of the ith resolvable path of the kth user, and it can be written as where λ is the signal wavelength, d is the spacing between every two successive antennas at the BS. Although the receive antennas of the same user k see the same AoDs from the BS, they experience different and independent complex path gains. The reason for this is that the receive antennas of the kth user are spatially well-separated so they see completely independent path gains in their local multi-path fading environment. In the mmWave range, it is feasible to have sufficient separations between the few receive antennas as the signal wavelength is already small. Therefore, the channel matrix of the kth user, H k ∈ C N k ×M , can be expressed as 3 where the matrix A k (θ k,1 , θ k,2 , · · · , θ k,P k ) ∈ C P k ×M is defined as The row-space of A k is called the channel subspace throughout this paper. The jth row of G k ∈ C N k ×P k contains the complex Rayleigh path gains of the jth antenna at the kth user (i.e., the entry G k (j, i) represents the complex gain of the ith path of the jth antenna at user k). The complex path gains in G k are independently and identically distributed (i.i.d.), as explained earlier, circularly-symmetric complex Gaussian random variables with zero mean and unit variance. Please note that we considered the use of ULAs in this paper for simplicity of presentation. The proposed feedback scheme can work, using the same procedure, with uniform planar arrays (UPA) of antennas with M 1 horizontal antennas and M 2 vertical antennas, where A k (φ k,i , θ k,i ) is a function of the azimuth, φ k,i , and the elevation, θ k,i , AoDs of the ith path of the kth user [16]. However, the channel matrix of the kth user is kept the same as H k = G k A k (φ k,i , θ k,i ), hence, the proposed channel feedback technique can also be considered. 3 It should be noted that there will be an angular spread for each scattering cluster; however, the reflected angles within this interval are usually modelled as discrete angles in literature which represent the resolvable paths. Overall, a narrowband clustered channel representation (based on extended Saleh-Valenzuela model) is proposed for mmWave massive MIMO channel, as it allows accurate capturing of the characteristics of mmWave channels. Under this clustered model, the narrowband channel matrix H is assumed to be the sum of the contributions of P propagation paths [24]- [26]. Based on this, the channel model we adopts in this paper is widely used in the literature, see for example [16], [27]- [31].
The BS sends m k streams to user k, where m k ≤ N k . Let d k ∈ C m k ×1 contain the m k data symbols to be transmitted simultaneously to the kth user such that Before transmitting the users' data symbols, the kth user symbol vector is multiplied by the precoding matrix F k ∈ C M ×m k . Thus, the transmitted vector x ∈ C M ×1 , which contains data symbols intended for all the users, is given by and the received signal at the kth user can be written as where n k ∈ C N k ×1 is a circularly symmetric complex Gaussian noise vector at the kth user with a zero vector mean and identity covariance matrix. The second term in (7) represents the summation of the interference, from the signals intended to all other users in the cell, at user k. The users' precoding matrices, F k 's, are unitary matrices (i.e., F H k F k = I m k ), and in order to adhere to the power constraint, we have E d k 2 = γ K , ∀k ∈ {1, 2, · · · , K }, where γ is the total transmit power at the BS.

B. PARTIAL CSI FEEDBACK
In this subsection, we present the CSI feedback elements of the considered channel model that was presented in the previous section. We assume in this paper that each user knows its own downlink CSI. 4 Then, the obtained downlink CSI at each user, H k , is quantized and fed back to the BS to perform downlink precoding.
The channel matrix, H k , is composed of two parts; the matrix, A k , which is a function of the AoDs, and the path gains matrix G k . We assume throughout the paper that the AoDs are perfectly known at both the user and the BS. This assumption is justified by the fact that the users can estimate the AoDs from downlink channel estimates they already have using the standard multiple signal classification (MUSIC) algorithm [35]. Precisely, the kth user calculates the correlation matrix of the channel vector of the jth receive antenna The kth user estimates the previous expectation using the sample average technique computed over various channel estimates within the angle coherence time. Therefore, AoDs can be estimated at user k based on R k using the traditional MUSIC algorithm. Then, the users quantize and feed back the AoDs, θ k,i , to the BS so it can 4 In FDD systems, the downlink channel matrix H k is estimated at the user side through downlink channel training. The training overhead for the downlink channel estimation in FDD massive MIMO systems is greatly increased due to the large number of antennas at the BS. However, several effective downlink training methods were proposed to address this problem [5], [27], [32]- [34] by utilizing the angular domain sparsity in the massive MIMO channel model. generate A k . With reasonably designed number of AoD quantization bits, B 0 , the channel sub-spaces, A k , can fairly assumed to be the same on both sides, without a significant mismatch.
The AoDs are fed back only once every angle coherence time of θ k,i . The coherence time of the AoDs, θ k,i , is relatively very large compared to the coherence time of the path gains in G k . Consequently, we focus in this paper on studying the quantization and feedback overhead of the path gains matrices, G k , since it constitutes most of the feedback overhead compared to the relatively insignificant feedback overhead of the AoDs. Therefore, the BS only needs to know the low dimensional path gains matrix G k ∈ C N k ×P k in order to generate the actual channel matrix H k .
We assume, before Sec. VI, that the power allocated to each user is uniformly divided across its multiple data streams. Hence, in order to perform BD, which will be discussed thoroughly in Sec. III-A, the BS only needs to know the spatial direction of each user's channel matrix. The spatial direction of the kth user, H k ∈ C N k ×M , is defined as the row-space of the channel matrix, H k . The rows of H k are orthonormal and its row-space represents the spatial direction. In the case of massive MIMO channel model described earlier, the spatial direction can be represented as where the rows of G k ∈ C N k ×P k are orthonormal and its row-space represents the row-space of G k . It was proved in [16] that the rows of the channel subspace, A k , are asymptotically orthogonal, i.e., A k A H k ≈ M I P k , as M → ∞. Since A k is known at the BS, G k can be a low-dimensional representative of the spatial direction and its quantization represents the quantization of H k . The quantization of G k , say G k ∈ C N k ×P k , is chosen from the codebook C k = {C k,1 , C k,2 , · · · , C k,2 B }, that consists of 2 B low-dimensional quantization sub-spaces in C N k ×P k , where B is the number of feedback bits for each user and the rows of C k,i are orthonormal. The details of the beam-forming matrix design and the codebook design are discussed in Sec. III and Sec. IV, respectively. The kth user quantizes the row-space of its path gains matrix, G k , to a quantization subspace, where d( G k , C k,i ) is the distance metric between the two matrices G k and C k,i . In this paper, we adopt the chordal distance as our distance metric [36], which is given by where the φ j 's are the principal angles between the two row-spaces of the matrices G k and C k,i [36]. The rows of each matrix C k,i ∈ C k are orthonormal (i.e., C k,i C H k,i = I P k ∀ C k,i ∈ C k ), and each C k,i represents a quantization subspace in the codebook. The chordal distance can be calculated VOLUME 10, 2022 as follows where the values of this distance range between 0 and √ N k . Note that no channel magnitude feedback is needed at BS in the case of uniform power allocation across the data streams. The BS can hence generate the channel matrix as

III. DESIGN OF BD BASED BEAM-FORMING MATRICES
In this section, the procedures of the BD precoding scheme are presented, and the corresponding per-user data rates are stated. The design procedures of the users' beam-forming matrices F k are presented for the two considered design cases. Additionally, the corresponding feedback information required for each case will be highlighted.

A. DESIGN OF USERS' BEAM-FORMING MATRICES
BD is considered in this paper as a linear precoding technique that is applied at the BS to serve multiple users in the massive MIMO cell. BD is a precoding technique that completely nulls the interference at each user from other users. In light of the BD procedure, each precoding matrix, F k , is chosen under the constraint of having H j F k = 0, ∀j = k. This requires the column space of the precoding matrix to be in the null space of the matrix formed by stacking all {H j } j =k matrices, hence nulling the interference terms in (7) at each user. However, zero interference cannot be achieved practically as the BS does not have perfect knowledge of {H k } K k=1 . In the case of limited feedback, the BS only knows the subspace quantization of the row-space of each H k , namely, H k . Then, the BS performs the BD procedure using the quantized subspaces, H 1 , H 2 , · · · , H K , to generate the practical precoding matrices, F 1 , F 2 , · · · , F K .
The number of receive antennas at user k, N k , is assumed in this paper to be smaller than the number of resolvable paths P k , (i.e., N k < P k ). This makes the channel vectors of the receive antennas of user k be linearly independent since they see P k independent paths with independent path gains (i.e., entries of G k are independent). In the following, the two cases we consider when designing the precoding matrices F k are presented.

1) CASE I, N k = m k
In this case, the number of antennas of user k, N k , is assumed to be equal to the number of data symbols, m k , transmitted to this user. Define W k as where H k , k ∈ {1, 2, · · · , K }, is the quantized feedback version of the original spatial direction H k of the kth user. The precoding matrix, F k , of the kth user is forced to lie in the null space of W k to achieve the zero interference constraint.
Since the AoDs differ from one user to another, the channel subspaces of the users, A k , are also independent from each other as they depend only on the AoDs of the users. Then, we can infer that the spatial directions of the users, H k , are linearly independent as well as they lie in the channel subspaces. The rank of where N R is the total number of receive antennas and M N R . Define the singular value decomposition (SVD) of W k as where V (1) k holds the first L k right singular vectors, while V k forms an orthonormal basis for the null space of W k , and therefore, its columns are candidates for the columns of the kth user precoding matrix, F k .
The effective channel of the kth user is the product, H k V (0) k . As long as interference from other users is canceled, the precoder selection problem now is equivalent to the singleuser MIMO capacity maximization problem. Consequently, the best precoder is the right singular vectors of that effective channel [37]. The rank of the effective channel, H k V (0) k , isL k , and it is upper bounded asL k ≤ min{L k , L k }, where L k is the rank of the quantized channel, H k . Hence, the SVD of the effective channel of user k is given as where k isL k ×L k and the columns of R (1) k are the firstL k singular vectors. Finally, the product V (0) k R (1) k forms an orthonormal basis of dimensionL k , and it represents the precoding matrix that maximizes the capacity of the kth user while achieving zero interference. Hence, the precoding matrix is written as 2) CASE II, N k > m k In this case, the number of antennas at the kth user, N k , is assumed to be greater than the number of data symbols, m k , transmitted to that user. Adding more receive antennas enhances the diversity gain at the users. Since the number of receive antennas at the user k is greater than the number of data streams intended to it, feeding back the whole spatial direction, H k ∈ C N k ×M , as in case I, is not needed. For case II, only the subspace spanned by the first m k right singular vectors of the channel matrix H k is needed to be fed back to the BS. Let the SVD of the channel matrix H k of the kth user be where U k ∈ C N k ×N k and V k ∈ C M ×M are unitary matrices, and k ∈ C N k ×M is a rectangular matrix that has the singular values on its diagonal. Let V k,m k be a matrix that contains the first m k columns of V k . The column-space of V k,m k needs to be quantized and fed back. Taking the Hermitian operator on both sides of (15), we get From (16), as long as the off-diagonal elements of H k are all zeros, we can notice that each column of H H k is a linear combination of the first N k columns of V k . Thus, the subspace spanned by the first N k columns of V k is equivalent to column space of H H k ∈ C M ×N k . Consequently, we can conclude that the column-space of V k,m k always lies in the column-space of H H k as m k < N k . As long as the column space of H H k always lies in the column-space of A H k , then the column-space of V k,m k always lies in the column-space of A H k ∈ C M ×P k too. This is important since A k is assumed to be already known at the BS. Therefore, a low-dimensional codebook, to be designed in Sec. IV, can be used to quantize V k,m k . In [38], the columns of V k,m k was proved to be isotropically distributed on the subspace in which they lie. Consequently, a Grassmannian subspace packing based codebook can be used to quantize V k,m k , which will be presented in Sec.
The effective channel of the kth user will be the product V H k,m k S k . The SVD of this product is given by where R (1) k represents the first m k right singular vectors. Finally, the product, S k R (1) k , forms an orthonormal basis of dimension m k , and it represents the precoding matrix F k ∈ C M ×m k that maximizes the capacity of the kth user while achieving zero interference. The precoding matrix F k is given by Hence, the received vector y k ∈ C N k ×1 at user k becomes The received vector y k , in (20), is finally left multiplied by U H k,m k , where U k,m k ∈ C N k ×m k is the matrix that contains the first m k columns of the matrix U k given in (15).

B. THE PER-USER RATE
The BS uses the quantized feedback, H k , to compute the precoding matrices, F k , and perform downlink precoding on the transmitted data vector of user k, d k ∈ C m k ×1 . BD is a linear precoding scheme that cancels the interference at each user due to all other users as discussed in Sec. III-A. Therefore, the interference term in (7) is completely canceled in the case of perfect CSI knowledge at the BS, i.e., H k ≡ H k .
Then, the per-user ergodic rate for both cases I & II is given by [17], [38] where uniform power allocation across the multiple data streams is adopted.
In practical systems, the BS cannot have ideal downlink CSI information because the feedback is limited with B bits per user. Consequently, the second term in (7), which represents the interference at user k due to all other users, cannot be totally canceled because the row-space of H k is not exactly the same as the true spatial direction of the channel, i.e., the row space of H k . Hence, some residual interference power will remain due to subspace quantization of the channel, and hence the per-user rate for both cases I & II is given as [39] where the term H k F k F H k H H k represents the useful signal intended for user k and, K j=1,j =k H k F j F H j H H k represents the multi-user interference at user k. It should be noted that (21) and (22) are optimistic ergodic rates since we assume that the receiver has perfect knowledge of the combined channel matrix, H k F k , and also of the whole interference covariance matrix. This can be approached by pilot schemes in the downlink streams.

IV. AoD-ADAPTIVE SUBSPACE CODEBOOK
The AoDs of the ith path of user k, θ k,i , which were defined in (3), depend on the obstacles that surround the BS. These obstacles change their physical positions very slowly compared with the channel path gains coherence time. On the other hand, the path gains in G k depend on the scatterers that surround user k. Therefore, the path gains in G k , changes much faster than the path AoDs, θ k,i s, [16], [40]. Hence, the coherence time of the AoDs of a resolvable path is much longer than the coherence time of its path gains. However, the size of G k is much lower than the size of the channel matrix, H k , which substantially reduces the feedback overhead. The spatial direction of the user k, H k , is asymptotically isotropically distributed in its channel subspace, i.e., the row-space of A k (θ k,1 , · · · , θ k,P k ), during the angle coherence time. As in (3), we can see that each row of the channel matrix, H k , is a linear combination of the P k resolvable paths of user k, where A k is determined by the AoDs. H k is asymptotically isotropically distributed in the row-space of A k because the rows of A k , i.e., steering vectors, are asymptotically orthogonal to each other, i.e., A k A H k ≈ M I P k [16], VOLUME 10, 2022 Quantize G k to G k = C k,Z k , where Z k is calculated as in (8); 4 The index Z k is fed back to the BS; 5 end 6 if Case II then 7 Calculate V k,m k as in Sec. III-A2; Quantize J k to J k = C II,k,Z II,k , where Z II,k is calculated as in (23); 10 The index Z II,k is fed back to the BS; 11 end and the path gains in G k are modeled as i.i.d. circularly symmetric complex Gaussian random variables with zero mean and unit variance. This makes the spatial direction of each user to be asymptotically uniformly distributed in its channel subspace during the angle coherence time, and this justifies the use of subspace packing based channel quantizer to quantize and feedback the channel matrices. The number of paths, P k , is greatly less than the number of transmit antennas, M , at the BS because of the limited scattering of mmWave [20]. Hence, the row-space of A k is a low-dimensional subspace of the full M -dimensional ambient space. As long as the BS knows the AoDs, only the row-space of the path gains matrix, G k ∈ C N k ×P k , needs to be quantized and fed back to the BS. Then, for case I, G k ∈ C N k ×P k , is chosen from the codebook C k = {C k,1 , C k,2 , · · · , C k,2 B }, that consists of 2 B low-dimensional quantization sub-spaces in C N k ×P k . G k is chosen from the codebook according to (8). The rows of G k are orthonormal, and its row-space is isotropically distributed over the complex P k -dimensional space.
For case II, since the column-space of V k,m k lies in the column-space of A H k , as proved in Sec. III-A2, then V k,m k can be represented as where J k ∈ C P k ×m k is a unitary matrix. As long as the BS knows A k , we can only quantize and feed back the column space of the low-dimensional matrix J k . Hence, a low-dimensional codebook C II,k = {C II,k,1 , C II,k,2 , · · · , C II,k,2 B }, that consists of 2 B low-dimensional quantization sub-spaces in C P k ×m k , can be used to quantize J k in case II. The matrices C II,k,i ∈ C P k ×m k are unitary and represent the quantization subspaces of case II AoD-adaptive subspace codebook. Then, J k = C II,k,Z II,k ∈ C P k ×m k is chosen from C II,k according to The column-space of J k is isotropically distributed over the complex P k -dimensional space. Algorithm 1 summarizes the feedback procedures for both cases I & II.

A. RANDOM SUBSPACE QUANTIZATION CODEBOOKS
Generally, obtaining optimal quantization codebooks is not an easy task, especially when the number of quantization subspaces is large. Therefore, the performance of such codebooks can be studied by averaging over random codebooks [41]. Analyzing the performance of random codebooks is much easier, providing us with some useful performance bounds for structured codes. In our subspace quantization problem, a set of 2 B m k -dimensional subspaces is randomly picked in the ambient P k -dimensional Euclidean space. The infinite set containing all subspaces of dimension m k in the Euclidean P k -dimensional space is called Grassmannian manifold, which is denoted by G P k ,m k . The 2 B random quantization subspaces in our random quantization codebook are uniformly distributed over the Grassmannian manifold G P k ,m k . We can pick a quantization subspace over over G P k ,m k at random by generating a matrix of dimension m k ×P k whose all entries are i.i.d. complex Gaussian. Then, using QR decomposition, an orthonormal basis for the row-space of the generated random matrix is obtained to represent the randomly picked quantization subspace. Then, the average quantization error can be calculated by averaging over many random codebooks. However, random codebooks cannot be used in practical systems.

B. GRASSMANNIAN SUBSPACE PACKING
As long as the row and column spaces of G k and J k respectively are isotropically distributed over G P k ,m k , we can solve a subspace packing problem in the Grassmannian manifold to obtain a structured codebook to be used in practical systems. The subspace packing problem is defined by finding 2 B subspaces in a higher dimensional space where the minimum distance between any two subspaces in the set is maximized. The chordal distance, defined in (10), is used in this paper as the distance metric between the subspaces. By solving the subspace packing problem and finding a good set of 2 B quantization subspaces, we can construct a Grassmannian subspace codebook. An iterative algorithm in [36] is used to solve the subspace packing problem. When the number of subspaces in the codebook, 2 B , is lower than P 2 k , the minimum inter-distance between the subspaces in the Grassmannian codebook reaches an upper bound called the Rankin bound [36]. This upper bound is the maximum attainable theoretical distance that can be achieved.

V. THROUGHPUT ANALYSIS
In this section, we calculate the rate gap between the ideal rate and the rate using a random subspace quantization scheme. The rate gap is calculated assuming that all users have the same number of receive antennas, i.e., N k = N , the same number of data streams, i.e., m k = m, and the same number of resolvable paths, i.e., P k = P. Then, an expression for the required number of feedback bits to achieve some constant rate gap is derived, where we prove that the number of bits scales linearly with the transmit power γ dB in dB.

A. RATE GAP CALCULATIONS
The per-user rate of the ideal CSI case is given by (21), and the per-user rate of the quantized CSI case is given by (22). Both cases I & II follow the same analysis in calculating the rate gap because the channel feedback in both cases, H k and V k , has the same isotropic distribution [38] in the channel subspace A k . Then, the same BD procedure is applied in both cases to calculate the precoding matrices. Following Theorem 1 of [39], which gives an upper bound for the rate gap in Multi-User MIMO systems, we derive an expression for the per-user rate gap due to limited feedback in our massive MIMO system model.
The per-user rate gap R(γ ) is bounded as follows Here, (a) follows by neglecting the positive semi-definite interference terms in the quantity Following the BD procedure, both F k and F k are asymptotically isotropically distributed, and they are chosen independent of H k . This means that the first two terms in (25) where T k ∈ C N ×N is unitary and uniformly distributed over G N ,N , Z k ∈ C N ×N is lower triangular with positive diagonal elements and satisfies tr(Z k Z H k ) = d 2 (H k , H k ), Y k ∈ C N ×N is lower triangular with positive diagonal elements satisfying Y k Y H k = I N − Z k Z H k , and the rows of M k ∈ C N ×M form an orthonormal basis for an isotropically distributed complex N dimensional subspace in the M − N dimensional right nullspace of H k . Moreover, the matrices Y k , H k and T k are independent of each other, as are the pair Z k and M k . By right multiplying both sides of (30) by F j , we get for k = j due to the fact that H k F j = 0 by the BD procedure. Therefore, The inter-user interference in (32) can reach its upper bound in an extreme case, where the channels of all the K users are strongly correlated. In this case, K users share the same clusters around the BS in the ray-based channel model, i.e., P 1 = P 2 = · · · = P K = P and A 1 = A 2 = · · · = A K = A. Thus, we can omit the subscript k of P k and A k in this proof. As discussed earlier in Sec. IV, the row-spaces of both the feedback channel matrix, H k , and the spatial direction, H k , of user k are distributed in the row-space of A. Since the row-space of H k can be orthogonally decomposed along the row-spaces of H k and M k as in (30), M k should also be distributed in the row-space of A. Hence, by utilizing the asymptotic orthogonality among the row vectors of A when M → ∞, M k can be expressed as M k = 1 √ M P k A, where the rows of P k ∈ C N ×P are orthonormal. As long as the row-space of the quantized channel matrix H k lies in the row-space of A, and according to Sec. III, then the column-space of the precoding matrix F j lies in the column-space of A H . Consequently, utilizing the orthogonality among the column vectors of A H when M → ∞, the precoding matrix F j can be expressed as F j = 1

√ M
A H E j , where E j ∈ C P×m is a unitary matrix whose columns are orthonormal. Hence, substituting in (32), the interference term can be calculated as where the second equality follows from the result AA H = M I P when M → ∞.
Lemma 1: In the extreme case that all K users are strongly correlated and share same clusters around the BS, i.e., P 1 = P 2 = · · · = P K = P and A 1 = A 2 = · · · = A K = A, row-space of M k is distributed in the right null space of H k as shown in (30), we have Therefore, the row-space of P k is distributed in the right null space of X ik . On the other hand, as previously mentioned, the BD precoding matrix can be expressed as Since the column-space of the BD precoding matrix F j is orthogonal to the row-space of H k , i.e., Therefore, the column-space of E j is isotropically distributed in the right null space of G k . Now we have proved that both the row-space of P k and the column-space of E j are isotropic sub-spaces in the null space of G k . Based on [39], we have where D is the average subspace quantization error which, for case I, is given by while for case II, D is given by where d(., .) is the chordal distance defined in (10). Hence, the interference term is given as Consequently, the rate gap can be upper bounded using the following equation

B. QUANTIZATION ERROR
In this subsection, we calculate the quantization error, D, of the spatial direction of user k when the AoD-adaptive subspace codebook is used for both cases I & II.
For Case I, we have, from Sec. IV, that Also, the spatial direction of the kth user, H k , can be represented as H k = 1 √ M G k A k as previously discussed in II-B. Hence, the quantization error, for case I, can be given as Step (a) is true due to A k A H k ≈ M I P . Both G k and G k are isotropically distributed subspaces on the complex P-dimensional space. Then, the distortion or error that results from the quantization of can be bounded as [42] where T = N (P − N ) and is a unitary matrix. Hence, the quantization error, for case I, can be given as The column-spaces of both J k and J k are distributed on the P-dimensional space. Then, the quantization error D, for case II, can be bounded as where T = m(P − m) and

C. FEEDBACK BITS
Now, we discuss the required number of feedback bits B that results in a constant rate gap. After bounding the quantization error byD, the rate loss can be bounded as Let the rate gap be such that R(γ ) ≤ log 2 (b) bps/Hz, and substituting forD from (43), then the number of feedback bits that guarantees this rate loss is given by where C = C PN and T = N (P − N ), for case I, while C = C Pm and T = m(P − m), for case II. It is noticeable that B scales linearly with the transmit power γ dB in dB.

VI. WATER-FILLING BASED CHANNEL QUANTIZATION AND FEEDBACK
Due to limitations of the zero-forcing techniques in general when the system noise is high, BD based precoding is sub-optimal at low SNR region. Consequently, in this section, we introduce a quantization scheme that is based on optimal power allocation (water-filling) across the multiple data streams to maximize the system's total sum rate at low SNR region. In the case of uniform power allocation, as discussed in the previous sections, only the row-space of H k , i.e. (the spatial direction of user k) is needed at the BS. However, in order to perform power allocation across data streams, more information about the channel matrix, H k , is needed at the BS. Therefore, in this section, we present a quantization and feedback scheme of the channel matrix, H k , which is needed when optimal power allocation (water-filling) across the multiple data streams is intended. The water-filling algorithm needs to know the direction of each channel vector of user k, i.e., the direction of the rows of H k , as well as the magnitude information of each direction, i.e., the Frobenius norm of each row of H k . Consequently, we cannot use the subspace quantization codebook that we used in the previous sections, but instead, we shall quantize and feedback each of the N k channel vectors separately to maintain the direction information in each of them. We will discuss the proposed channel quantization scheme first in the next subsection, then the power allocation scheme is presented in the following subsection. Later, we will show that water-filling is very useful at low SNR regions.

A. PROPOSED CHANNEL QUANTIZER FOR WATER-FILLING
Now, the proposed quantization technique of the channel matrix H k of the kth user is discussed. The total number of bits, B, allocated for quantizing H k is equally distributed among the rows of H k . As discussed in Sec. IV, we only need to feedback the path gains matrix G k because the channel subspace of each user A k is assumed to be known at the BS. Hence, each row of the path gains matrix G k ∈ C N k ×P k is quantized separately using vector quantization. The quantization of the ith row, g k,i ∈ C 1×P k , of the path gains matrix G k is chosen from the codebook C k,i = {c k,i,1 , c k,i,2 , · · · , c k,i,2 B * }, where c k,i,j ∈ C 1×P k is the jth quantization vector in the codebook and B * is the number of bits used to quantize each row, g k,i , of G k . The kth user quantizes its G k to N k quantization vectors that form the quantized path gains matrix G k , where G k is defined as Additionally, the squared Frobenius norm of the ith row of G k , g k,i 2 F , is quantized and fed back to the BS. The magnitude information of the channel vectors does contribute in the power allocation solution because it affects the calculation of the singular values of the effective channel, H k F k , of User k, where F k is the precoding matrix. We show next, in Sec. VI-B, the procedures of the power allocation solution and how it is computed using the singular values of H k F k . The squared frobenius norm g k,i 2 F has an Erlang distriution with parameters (P k , 1) because the entries of g k,i ∈ C 1×P k are modeled as complex Gaussian random variables with zero mean and unit variance. Hence, we can quantize g k,i 2 F , using Lloyd-Max scalar quantizer [43], based on Erlang distribution and number of quantization levels equals to 2 B norm , where B norm is the number of bits allocated for quantization of g k,i 2 F . Although the frobenius norm information is useful in performing the water-filling process at the BS, we will show in the simulation results section that allocating all the feedback bits in quantizing the direction information of the channel vectors is more useful than allocating a portion of the bits for direction information and another for quantizing the norm information. This reflects that the direction information is more sensitive for enhancing the per-user rate using waterfilling than the norm information.
The proposed channel quantizer in this section is suboptimal when uniform power allocation among the users' data streams is adopted. The subspace quantization proposed in Sec. IV is better than this quantizer when the BD with uniform power allocation is used. The reason for this is that the BD procedure does not require the direction information of each row vector of H k but only the spatial direction of it as discussed in Sec. II-B i.e. (the subspace spanned by the rows of H k ). Hence, the subspace quantization performs better because it assigns all the bits in quantizing the channel matrix's spatial direction H k . However, we show next that the proposed vector quantizer in this section, combined with optimal power allocation across the data streams, outperforms subspace quantization at low-to-mid SNR regions. Fig. 2 compares the performance of the two different quantizers when uniform power allocation across the data streams is adopted. The figure shows that subspace quantization outperforms the proposed vector quantization in all SNR ranges. This indicates that subspace quantization of H k is always better than vector quantization when using BD with uniform power allocation strategy. VOLUME 10, 2022

B. POWER OPTIMIZATION ALGORITHM
In this subsection, a power optimization technique, which aims to maximize the system's total sum rate at low SNR region, is discussed. Power optimization across the data streams is discussed under both perfect and limited CSI feedback cases.

1) POWER OPTIMIZATION ASSUMING IDEAL CSI AT THE BS
Here, the power optimization algorithm based on ideal CSI at the BS is discussed. We are interested in finding the power allocation diagonal matrices k , ∀k ∈ {1, 2, · · · , K } that maximize the sum rate of the whole system. Hence, the precoding matrix at the BS for user k becomes F k 1/2 k . When considering the BD based linear precoding at the BS, the inter-user interference is totally cancelled and the sum rate of the system assuming perfect CSI knowledge at the BS becomes where F k is the precoding matrix at user k and the diagonal elements of k scale the power transmitted into each of the columns of F k . Because of the nature of the BD structure, the BS sees every user as a point-to-point MIMO link. Therefore, the sum information rate of the system can be calculated as where k is a diagonal matrix whose elements σ k,i are the singular values of the effective channel, H k F k , of user k and k is a diagonal matrix whose elements δ k,i are the power values transmitted into each of the N k data streams of user k. Now, the power allocation problem that maximizes the sum-rate can be rewritten as where γ is the total transmitted power at the BS. The above problem is clearly a convex optimization problem and it can be solved using the standard solutions. The solution of this problem is well known and it has a closed form expression [37] when solved using the Lagrange multiplier method.
The power allocation solution of the above problem is given as The value of α is determined such that the total power constraint at the BS is satisfied. Hence, α is the solution of Algorithm 2: Power Optimization Across the Data Streams 1 User k quantizes G k into G k based on vector quantization and send it to the BS 2 The BS calculates H k = G k A k 3 The BS performs the BD procedures and generates the precoding matrices F k 4 The BS computes the singular values, k , of the effective channel, H k F k , of user k 5 The BS calculates the value α that satisfies this equation The BS calculates the optimal power scaling values,δ k,i , according toδ * The final scaled precoding matrix of user k becomes F k 1/2 k the following equation

2) POWER OPTIMIZATION ALGORITHM CONSIDERING VECTOR QUANTIZATION OF CSI AT THE BS
Now, the power optimization algorithm when considering limited feedback of CSI at the BS is discussed. We shall use the proposed vector quantizer as discussed in Sec. VI-A. The path gains matrix G k is first quantized to G k at the kth user then fed back to the BS. Then, the BS uses G k to form H k and hence going through the BD procedure to generate the precoding matrices F k as discussed in Sec. III-A1. Then, power optimization accross the data streams is adopted at the BS. Algorithm 2 summarizes these steps in an easy way. Due to limited feedback of CSI at the BS, the interference on user k due to signals of other users is not totally canceled. Because of this residual interference, we cannot express the information rate of user k as in (52). Hence, the per-user rate considering limited feedback of CSI and power optimization across the data streams is given by (55) It should be noted that Algorithm (2) uses the quantized channels H k and the precoders F k to solve a water-filling problem only on the leading term of (55) ignoring the interference term; the interference is taken into the rate expression as a consequence. Therefore, Algorithm (2) can be considered as a simplified approach to solve our water-filling problem without optimizing the whole rate, which is in general a very difficult problem to find a closed-form solution or to optimally solve using a low complexity numerical method.

VII. SIMULATION RESULTS
In this section, the performance of the proposed feedback system and codebook design is examined and verified. The system parameters are set as follows. The number of antennas at the BS is M = 128, the number of users in the system is K = 8, the number of antennas at each user is N k = N = 2 for case I, while N = 3 for case II. The number of data streams transmitted simultaneously to each user is m k = 2 and the number of resolvable paths is P = 3. The path AoDs of the users are independent and uniformly distributed in − 1 2 π, 1 2 π . The simulation parameters of our simulation setup are collected in Table 2. Fig. 3 compares the performance of BD against the conventional ZF scheme, in which the multiple antennas of the same user are orthogonally precoded, in Case I with N k = m k = 2. Fig. 3 also compares the performance of the ideal case, where perfect CSI is assumed available at the BS, and the limited feedback case where quantized CSI is fed back to the BS with B = 2 and 4 per user. Note that in the case of conventional ZF scheme, the channel vector of each antenna at the kth user is separately quantized and fed back to the BS; therefore, the feedback bits for each user are divided among  its receive antennas in this case. This is because in the case of conventional ZF, any user antenna is used to receive a single stream, and all other streams must be nulled, including the streams intended for the same user, which is not the case in BD. In Fig. 3, we plot the per-user rate using the AoDadaptive codebook with both random subspace quantization and using Grassmannian subspace packing based codebook. From this figure, we can easily see the BD approach's performance gains compared to the conventional ZF approach. It can also be noticed that Grassmannian codes are always better (or slightly better) than random codes. Finally, it is clear that increasing the number of feedback bits enhances the system performance, and we can get arbitrary close to the performance of the ideal system with perfect CSI at the BS.

A. BD BASED SUBSPACE QUANTIZATION (CASE I & II)
Remark: It should be noted that the random subspace quantization scheme does not correspond to a fixed quantizer that is picked randomly and used in all iterations. Random quantization here means that in each iteration, we generate a quantizer whose quantization subspaces are drawn randomly from a uniform distribution over the space. Then, the per-user rate is averaged over all the iterations assuming that the quantizer is known at both the user and BS side. By averaging over a large number of iterations, some good quantizers might equalize the bad performance of some other bad quantizers. Indeed, random quantization is impractical to be implemented in any practical system and it is only here used to evaluate the feedback scheme and for comparison purpose. In contrast, Grassmannian codes are structured, which deem them suitable for practical implementation in real systems as they are designed according to some specific design criteria with some performance guarantees. Fig. 4 compares the performance of BD with ideal and quantized CSI against the ideal and quantized CSI of the conventional ZF scheme for case II with N k = 3, m k = 2 with B = 2 and 4 per user. The same observations mentioned above while commenting on the results of Fig. Fig. 3 apply in this case as well. Moreover, it is noticeable that case II has a higher per-user rate than case I for the same number of user streams. This is since in case II, we assume more receiving antennas at each user than the number of streams, which introduces diversity gain at the users.
Then, we assess the derived analytical rate gap upper bound in (47) versus the number of BS antennas M . The number of feedback bits B is set as (48) with b = 3 and the downlink SNR at the users is fixed at 10 dB. Fig. 5 shows the theoretically derived upper bound on rate gap in (47) and the simulated rate gap of the proposed feedback scheme, which is calculated by R = R CSIT − R QUANT . The graph shows that the analytical rate gap becomes tighter as the number of BS antennas increases. This observation makes sense because the assumption that A k A H k ≈ M I P k , which we used to derive the theoretical upper bound, becomes more valid.
In Fig. 6 and Fig. 7, we present numerical results for the practical per-user rate when using a random quantization codebook for both cases I and case II respectively. The required number of feedback bits is scaled as in (48) to guarantee a maximum rate gap of log 2 (b), where we show the results for b = 2 and 4. We notice in Fig. 6 and Fig. 7 that  the rate gap, between the ideal case (perfect CSI at the BS) and the practical case, does not increase as the SNR increases; this is due to scaling the number of feedback bits B with the transmitted power γ dB as explained above. It is clear that the rate gap at any SNR does not exceed the maximum value of log 2 (b), which validates the expression in (48) for both cases I & II.

B. BD WITH WATER-FILLING BASED VECTOR QUANTIZATION
In this subsection, the derived power allocation scheme's performance based on the proposed vector quantizer is examined and verified. Also, we compare the power allocation scheme based on vector quantization with the subspace quantization  technique discussed in Sec. IV. The number of antennas at each user, N k = N = 2, is equal to the number of data streams transmitted simultaneously to each user, m k = 2.
In Fig. 8, the performance of the BD scheme without power optimization is compared to the performance of BD with power optimization for both ideal and quantization cases. We use a total of B = 8 and 4 bits per user, whether for subspace quantization or vector quantization. The graph clearly shows that ideal BD with power optimization (waterfilling) outperforms the regular ideal BD for all SNR ranges. However, the gap between both schemes decreases in higher SNR regions. The reason for this fact is that water-filling resists the system noise. Hence, at high SNR values, the system noise nearly becomes insignificant, and the regular BD approaches BD with water-filling. Fig. 8 also shows a trade-off between both schemes in the case of limited feedback (quantization). In the low SNR regime, water-filling with the proposed vector quantization scheme is better than the regular BD with subspace quantization. The reason for this trade-off is that we have two sources of noise, in that case, the system noise and the quantization noise. At low SNR, the system noise is more dominant than the quantization noise, hence using water-filling along with vector quantization is more useful as it is more immune to the system noise. However, at high SNR, the noise resulted from the quantization of H k is more dominant. Hence, no gain is added from using the water-filling solution, and subspace quantization, which is the better quantizer when no water-filling as in Fig. 2, resulted in better performance.
As long as BD based power optimization scheme is useful at the low SNR regime, Fig. 9 shows a comparison of both schemes in the range from SNR = −2dB to SNR = 2dB with B = 8 bits/user. The graph shows that BD based water-filling using the proposed vector quantization clearly outperforms the regular BD based subspace quantization at these low SNR levels. There is nearly a gain of 1dB in SNR when using water-filling with vector quantization. This gain is moderately good at these low SNR levels. Fig. 9 also shows the effect of having the ideal Frobenius norms of the channel vectors g k,i on the water-filling solution at the BS. It is clear that this norm information does enhance the BD based vector quantization with water-filling because it affects the calculation of the singular values of the effective channels of the users as discussed before. However, the performance of the water-filling scheme is still better than the regular BD based subspace quantization if all the bits are put in quantizing only the directions of g k,i and no information is fed back about the norms of g k,i . Finally, we show in Fig. 10 that there is no gain of quantizing the Frobenius norms of g k,i when we have a limited number of feedback bits per user, but we shall put all the bits in quantizing only the directions. As previously discussed in Sec. VI-A, we used the Lloyd-Max algorithm to quantize g k,i 2 F which has an Erlang distribution. The performance of BD based vector quantization with water-filling is better when we allocate the total B = 8 bits for quantizing the directions of g k,i than allocating 4 bits for the directions and another 4 bits for the squared norm values.

VIII. CONCLUSION
In this article, we have considered the problem of channel feedback in FDD massive MIMO systems with multiple antennas at the users. We devised a channel feedback scheme using low-dimensional codebooks to reduce the required feedback bits. The rate gap between quantized channel feedback and the case with perfect CSI at the BS has been mathematically quantified for the considered cases. A systematic approach to design the channel feedback codebooks, in which the codebook design is formulated as a subspace packing problem over the Grassmannian manifold, was proposed. It was shown that using vector quantization, combined with water-filling, can outperform BD-based subspace quantization in the low SNR region. However, in the high SNR region, the situation is reversed, and the performance of BD-based subspace quantization becomes better where the effect of receiver noise is less significant, while the effect of the quantization noise becomes dominant.
For future directions of this research, we may consider studying and analyzing the use of analog feedback for the path gain matrix G k and compare it to the digital subspace quantization proposed in this paper. It is expected to have some trade-off between both schemes in the high/low SNR regions that needs to be investigated. His research interests include cooperative communications and relay channels, layered multimedia transmission, wireless sensor networks, multiuser MIMO/OFDM systems, cognitive radio, optimization, game theory, smart metering, and smart grids. VOLUME 10, 2022