On the Accuracy of Quantization Cell Approximation in MIMO Broadcast Systems Based on Limited Feedback

This study investigates the accuracy of quantization cell approximation (QCA) in a multiple-input multiple-output (MIMO) broadcast channel. QCA is an analytical quantization model used to approximate the quantized channel state information (CSI) in limited-feedback-based MIMO systems. It has been widely used in important studies for analytical tractability because it approximates the quantization error of the CSI as a simple beta random variable multiplied by a deterministic value. Moreover, the effect of quantization is concentrated entirely in the deterministic value, so the corresponding performance analysis is stochastically independent of the quantization process. Nevertheless, the accuracy of QCA has not been carefully demonstrated in previous studies. In this study, a generalized version of QCA is proposed together with a complete analysis. Because the proposed QCA requires the use of a specific distance measure, the validity of this distance measure is investigated first. Based on the proposed distance measure, the accuracy of QCA is estimated by analyzing the difference between the spectral efficiencies achieved using QCA and random matrix quantization (RMQ). The results show that the difference gradually decreases and converges to zero as the number of feedback bits increases. Because QCA and RMQ provide performance upper and lower bounds, respectively, in terms of codebook construction, these results prove the asymptotic validity of QCA with respect to the number of feedback bits. Both analysis and simulation results demonstrate that the difference in spectral efficiencies is also small for a moderate number of feedback bits. In addition, this study characterizes the asymptotic difference in spectral efficiencies with respect to the signal-to-noise ratio (SNR). The difference increases with the SNR but is bounded by a finite value. Thus, even the worst-case difference over all SNRs can be suppressed by increasing the number of feedback bits.


I. INTRODUCTION
Multiple-input multiple-output (MIMO) systems have been studied as promising technologies to meet the continually increasing demand for higher-speed wireless communication [1], [2]. In particular, multiuser MIMO (MU-MIMO) systems have been proposed to achieve spatial multiplexing gain without increasing the number of antennas at mobile users. In MU-MIMO systems, an access point or base station (BS) with multiple antennas communicates simultaneously with multiple users (i.e., with the group of their antennas [2]–[4]). The data streams prepared for antennas belonging to different users are guided to the associated antennas using transmit and receive techniques called spatial-division multiplexing (SDM). Accordingly, the capacity of MU-MIMO systems increases linearly with the minimum of the number of BS antennas and the total number of antennas in the associated user group. Various SDM schemes and corresponding theoretical results have been presented to exploit the inherent gains in MU-MIMO channels [5]–[8].
The associate editor coordinating the review of this manuscript and approving it for publication was Shree Krishna Sharma.
For downlink transmission in MU-MIMO channels, different users in the same communication group generally cannot perform joint signal processing. Thus, an SDM scheme is necessary at the transmitter to obtain multiplexing gain by appropriately guiding the data streams to their associated users. To achieve this, the transmitter requires some form of channel state information (CSI). The accuracy of the CSI at the transmitter (CSIT) largely determines the performance of downlink communication in MU-MIMO systems [9]. However, obtaining precise CSIT is challenging in practical systems, particularly when frequency-division duplexing (FDD) is used, because a transmitter operating in FDD mode cannot directly track the downlink channels. A popular solution in this case is limited feedback, in which each user estimates, quantizes, and feeds back the CSI to its associated transmitter [10]–[17]. Because limited feedback relies on finite-rate quantization, residual multiuser interference inevitably appears in the received signal of each user.
In limited-feedback-based MU-MIMO systems, the downlink spectral efficiency is significantly affected by the amount of multiuser interference, which is in turn determined by the number of feedback (or quantization) bits. In particular, the available multiplexing gain is directly related to the number of feedback bits. Specifically, in [9], assuming that each user has a single receive antenna, the authors showed that the number of feedback bits should increase with the signal-to-noise ratio (SNR) at the rate
$$B = (N_t - 1)\log_2 P \tag{1}$$
to achieve the full multiplexing gain using zero-forcing beamforming (ZFBF) based on limited feedback in MIMO broadcast channels, where $B$, $N_t$, and $P$ denote the number of feedback bits, the number of transmit antennas, and the SNR, respectively. In [18], these results were generalized to the case in which each user has multiple receive antennas. Using the chordal distance as the distance measure for quantization and block diagonalization (BD) based on limited feedback as the SDM scheme, it was shown that the bit scaling
$$B = N_r(N_t - N_r)\log_2 P \tag{2}$$
is sufficient to achieve the full multiplexing gain, where $N_r$ denotes the number of receive antennas. This scaling rate implies that the required number of feedback bits per data stream can be reduced by using multiple receive antennas and quantizing the matrix channel with an appropriate distance measure. The reduction corresponds to the prelog coefficient $(N_t - N_r)$ per receive antenna, which results from the reduced dimensionality of the quantization per data stream [18]. An optimal distance measure in terms of the achievable multiplexing gain was proposed in [19], and it was shown that this measure achieves a higher multiplexing gain than the chordal distance when the bit scaling rate is insufficient to achieve the full multiplexing gain.
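The two bit-scaling laws can be compared numerically. The following Python sketch evaluates (1) and (2) for an example configuration. It assumes the SNR is entered in dB and converted to the linear value $P$. The function names are illustrative only. Dividing (2) by $N_r$ gives the per-stream cost $(N_t - N_r)\log_2 P$, which shrinks as $N_r$ grows.

```python
import math


def snr_linear(snr_db: float) -> float:
    """Convert an SNR in dB to the linear value P used in (1) and (2)."""
    return 10.0 ** (snr_db / 10.0)


def bits_vector(n_t: int, snr_db: float) -> float:
    """Bit scaling (1): B = (N_t - 1) log2 P, single receive antenna (ZFBF)."""
    return (n_t - 1) * math.log2(snr_linear(snr_db))


def bits_matrix(n_t: int, n_r: int, snr_db: float) -> float:
    """Bit scaling (2): B = N_r (N_t - N_r) log2 P, N_r receive antennas (BD)."""
    return n_r * (n_t - n_r) * math.log2(snr_linear(snr_db))


n_t, snr_db = 8, 20.0
for n_r in (1, 2, 4):
    b = bits_matrix(n_t, n_r, snr_db)
    # Per-stream cost is (N_t - N_r) log2 P, so it shrinks as N_r grows.
    print(f"N_r={n_r}: B={b:.1f} bits per user, {b / n_r:.1f} bits per stream")
```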
However, it was empirically shown that the bit scaling in (2) is also necessary for the optimal distance measure to achieve the full multiplexing gain. For analytical tractability, the aforementioned studies on limited feedback used random matrix (or vector, if $N_r = 1$) quantization (RMQ) to obtain the quantized CSI. With RMQ, each codeword is an independent and isotropically distributed semi-unitary matrix. RMQ has been widely used in important studies on limited feedback because it is intuitive in terms of codebook construction and the mathematical analysis is tractable when the communication performance is averaged over random codebooks [9], [18]. However, the analysis with RMQ remains complicated because the quantization error is distributed as the minimum order statistic of $2^B$ independent and identically distributed (i.i.d.) random variables. Thus, obtaining an explicit distribution or statistic of a performance metric in wireless communication is difficult with RMQ.
As an alternative to RMQ, quantization cell approximation (QCA) has been used to approximate the distribution of the quantized CSI [13], [20]–[22]. Under QCA, the quantization error of the CSI is approximated as a simple beta random variable multiplied by a deterministic value that decreases as $B$ increases [13]. Thus, the stochastic part of the analysis is independent of $B$, which considerably simplifies the mathematical analysis. Accordingly, more sophisticated analyses of various performance metrics can be performed with QCA, as in previous studies [13], [19], [20], [23], [24]. Specifically, in [13], the distribution of the approximated signal-to-interference-plus-noise ratio (SINR) of each user was derived using QCA. Then, based on this distribution, the asymptotic spectral efficiency achieved by ZFBF with appropriate scheduling was analyzed with respect to the number of users. The authors in [24] proposed an extended version of QCA applicable to $N_r > 1$ to investigate the scaling law of the spectral efficiency with respect to the number of users when each user employs multiple antennas for SDM. QCA is particularly advantageous in complicated wireless communication systems, such as MU-MIMO in dense cellular networks, because it considerably simplifies the distribution of quantization errors. For example, the authors in [23] analyzed various performance metrics based on QCA, including the outage probability, network throughput, and multi-stream transmission capacity in a stochastic network. In [20], the SINR distribution, downlink ergodic spectral efficiency, and optimal number of feedback bits were also analyzed in a stochastic network.
As previously mentioned, several important studies on FDD-based MU-MIMO systems have used QCA for analytical tractability. Nevertheless, the accuracy of QCA has not been carefully examined, and only a few simulation results were provided in the corresponding studies. In particular, the extended version proposed in [24] requires additional assumptions compared with the conventional QCA derived for $N_r = 1$, but the reliability of the approximation was not discussed in detail. Because the concept of QCA is similar to that of a sphere-packing argument, one may intuitively expect QCA to closely capture the performance of a well-designed quantization codebook if $B$ is sufficiently large. However, no analytical result supporting this expectation exists; it is known only that QCA achieves a performance upper bound in terms of codebook construction. In recent wireless communications, the number of transmit antennas at the BS has increased, and the networks consisting of these BSs are becoming denser [25]–[31]. The corresponding performance analysis will be far more complicated than that of existing MU-MIMO systems, and the need for an analytically tractable model such as QCA will grow as systems become more complicated. Thus, a mathematical basis for the reliability of QCA is required for further applications, and establishing it is the main subject of this study.
In this study, the accuracy of QCA in MIMO broadcast channels is considered. Because mobile devices use multiple receive antennas for SDM in recent wireless communication systems [32], [33], the extended version of QCA proposed in [24] is revisited. Its mathematical formulation and the corresponding justification are provided to further validate the use of QCA for all $N_r \ge 1$. Specifically, the generalized version of QCA requires the use of a specific distance measure; thus, the performance of this distance measure is investigated first. Its suboptimality is analyzed in terms of the number of feedback bits sufficient to achieve the full multiplexing gain. Then, the accuracy of QCA is analyzed using the proposed distance measure. Because QCA and RMQ correspond to the upper- and lower-bound performances, respectively, in terms of codebook construction, the gap between the spectral efficiencies achieved using QCA and RMQ is used to quantify the accuracy. It is shown that the two are asymptotically equivalent with respect to $B$, implying that the error of using QCA can be made arbitrarily small for a sufficiently large $B$. Moreover, the spectral efficiency gap is shown to increase with $P$ and to remain bounded by a finite value as $P$ approaches infinity. During these investigations, matrix-variate distributions that are essential for analyzing limited-feedback-based MU-MIMO systems are summarized or newly derived. Moreover, a simulation guideline is presented to help readers construct a quantized channel matrix based on QCA. Because QCA is an analytical method that does not construct an explicit codebook, a realization of the quantized CSI must be obtained from the matrix-variate distributions of the corresponding channel matrices. Both simulation and analysis results demonstrate the accuracy of QCA.
The error induced by using QCA is generally small for moderate values of $B$ and $P$, and it gradually decreases and converges to zero as $B$ increases. The contributions of this paper can be summarized as follows:
• The accuracy of the generalized QCA, applicable for all $N_r \ge 1$, is analyzed with respect to various system parameters, including $B$, $N_r$, $N_t$, and $P$.
• The generalized QCA consists of two basic steps: 1) using a specific distance measure, and 2) approximating the quantization region based on ideal sphere-packing logic under that distance measure. This paper shows the asymptotic optimality of both steps.
• A simulation guideline is presented to help readers construct the quantized channel matrix when the generalized QCA is used for $N_r \ge 1$.
The remainder of this paper is organized as follows. Section II introduces the system model and preliminaries, and Section III introduces the principles of QCA and RMQ. Section IV provides the mathematical formulation of QCA applicable for all $N_r \ge 1$ and analyzes the accuracy of QCA based on this formulation. Section V presents a guideline for performing simulations with QCA and RMQ. Section VI concludes the paper by discussing applications of QCA.
Notations: Matrices and column vectors are denoted by upper- and lower-case boldface letters, respectively. The superscripts $(\cdot)^T$ and $(\cdot)^H$ denote the transpose and conjugate transpose of a matrix, respectively, and $\operatorname{tr}(\cdot)$ and $\det(\cdot)$ denote the trace and determinant of a matrix, respectively. In addition, $\operatorname{etr}(\cdot)$ denotes $\exp(\operatorname{tr}(\cdot))$, $\mathbf{0}$ is a zero matrix, and $\mathbf{I}_m$ is the $m \times m$ identity matrix. For two square matrices, the partial ordering $\mathbf{B} \succ \mathbf{C}$ indicates that $\mathbf{B} - \mathbf{C}$ is positive definite. $\mathbb{R}$ and $\mathbb{C}$ denote the sets of real and complex numbers, respectively, and $\mathbb{C}^{m \times n}$ denotes the set of all $m \times n$ complex matrices. $\Pr(\cdot)$ denotes the probability of an event, $\mathbb{E}(\cdot)$ denotes the expectation, and $\overset{d}{=}$ denotes equality in distribution. For a matrix $\mathbf{A}$, $[\mathbf{A}]_{i,j}$ denotes the $(i,j)$-th element of $\mathbf{A}$; moreover, $\operatorname{vec}(\mathbf{A})$ denotes the vectorization that converts an $m \times n$ matrix $\mathbf{A}$ into an $mn \times 1$ column vector by stacking its columns:
$$\operatorname{vec}(\mathbf{A}) = \left[\mathbf{a}_1^T, \mathbf{a}_2^T, \ldots, \mathbf{a}_n^T\right]^T, \tag{3}$$
where $\mathbf{a}_i$ is the $i$th column of $\mathbf{A}$. The operator $\otimes$ denotes the Kronecker product. A random variable $X$ is denoted as $X \sim \mathrm{Beta}(a, b)$ if it follows a beta distribution with parameters $a$ and $b$.
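To make the vectorization in (3) and the Kronecker product concrete, the sketch below implements both for small matrices stored as Python lists of rows. It then checks the standard identity $\operatorname{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{B}^T \otimes \mathbf{A})\operatorname{vec}(\mathbf{X})$. That identity is not stated in the text, but it follows from the column-stacking convention. The helper names and the example matrices are illustrative.

```python
def vec(a):
    """Stack the columns of an m x n matrix (list of rows) into an mn-vector, as in (3)."""
    rows, cols = len(a), len(a[0])
    return [a[i][j] for j in range(cols) for i in range(rows)]


def kron(a, b):
    """Kronecker product A (x) B: block (i, j) of the result equals a[i][j] * B."""
    p, q = len(b), len(b[0])
    return [[a[i // p][j // q] * b[i % p][j % q] for j in range(len(a[0]) * q)]
            for i in range(len(a) * p)]


def matmul(a, b):
    """Ordinary matrix product of two list-of-rows matrices."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Plain transpose (no conjugation)."""
    return [list(col) for col in zip(*a)]


A = [[1, 2], [3, 4]]
X = [[1, 0, 2], [-1, 3, 1]]
B = [[2, 1], [0, 1], [1, -1]]

lhs = vec(matmul(matmul(A, X), B))
K = kron(transpose(B), A)
rhs = [sum(k_ij * x_j for k_ij, x_j in zip(row, vec(X))) for row in K]
print(vec(A))       # columns of A stacked on top of each other
print(lhs == rhs)   # identity check
```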

II. SYSTEM MODEL
In this paper, a MIMO broadcast channel in which a single base station (BS) communicates with $K$ users is considered. The BS has $N_t$ transmit antennas, and each user has $N_r$ receive antennas. The MIMO channel between the BS and user $k \in \{1, \ldots, K\}$ is given by a random matrix $\mathbf{H}_k \in \mathbb{C}^{N_t \times N_r}$ whose entries are i.i.d. complex Gaussian random variables with zero mean and unit variance. The received signal of user $k$ is then given by
$$\mathbf{y}_k = \mathbf{H}_k^H\mathbf{x} + \mathbf{n}_k, \tag{4}$$
where $\mathbf{x}$ is the transmit vector and $\mathbf{n}_k$ is a complex Gaussian noise vector with independent entries of zero mean and unit variance. The transmit vector is given by $\mathbf{x} = \sum_{l=1}^{K}\mathbf{V}_l\mathbf{s}_l$, where $\mathbf{V}_l \in \mathbb{C}^{N_t \times N_r}$ is the precoding matrix and $\mathbf{s}_l \in \mathbb{C}^{N_r \times 1}$ is the information symbol vector consisting of $N_r$ independent data symbols for user $l$. Each user is served with $N_r$ spatially multiplexed data streams; thus, the transmitter simultaneously broadcasts a total of $KN_r$ independent data streams. The total transmit power at the BS is $P$, and equal power is allocated across users and data streams, such that $\mathbb{E}[\mathbf{s}_l\mathbf{s}_l^H] = \frac{P}{N_t}\mathbf{I}_{N_r}$. Because the noise variance is normalized to 1, the transmit power $P$ also corresponds to the average SNR. To focus on the effect of quantization, no specific user scheduling algorithm is considered (i.e., all $K$ users are served simultaneously by the BS, and their channel matrices are i.i.d.). All data streams are multiplexed by SDM, and the total number of data streams equals the number of transmit antennas (i.e., $K = N_t/N_r$). Because of symmetry, the distribution of the received signal is identical for all $k = 1, \ldots, K$. Thus, without loss of generality, we focus on the spectral efficiency of user 1, which corresponds to the average spectral efficiency achieved by each user.
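From (4), the received signal of user 1 can be represented as
$$\mathbf{y}_1 = \mathbf{H}_1^H\mathbf{V}_1\mathbf{s}_1 + \sum_{k=2}^{K}\mathbf{H}_1^H\mathbf{V}_k\mathbf{s}_k + \mathbf{n}_1. \tag{5}$$
For simplicity, the subscript 1 indicating user 1 is omitted hereafter, such that $\mathbf{y} = \mathbf{y}_1$, $\mathbf{H} = \mathbf{H}_1$, $\mathbf{s} = \mathbf{s}_1$, $\mathbf{n} = \mathbf{n}_1$, and $\mathbf{V} = \mathbf{V}_1$.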

A. LIMITED FEEDBACK MODEL AND DISTANCE MEASURE
The performance of multiple-antenna transmission in MIMO broadcast (downlink) channels can be improved by using an appropriate precoding strategy [11]. The extent of the improvement largely depends on the amount of CSI available at the transmitter. To construct appropriate precoding matrices, directional information of the channel is required, which corresponds to the left singular vectors in the singular value decomposition (SVD) of the channel matrix [18]. In this study, the compact SVD of the channel matrix $\mathbf{H}$ is denoted as
$$\mathbf{H} = \bar{\mathbf{H}}\boldsymbol{\Sigma}^{1/2}\mathbf{U}^H, \tag{6}$$
such that the columns of $\bar{\mathbf{H}} \in \mathbb{C}^{N_t \times N_r}$ span the column space of $\mathbf{H}$, the columns of $\mathbf{U} \in \mathbb{C}^{N_r \times N_r}$ span the row space of $\mathbf{H}$, and the diagonal matrix $\boldsymbol{\Sigma} \in \mathbb{C}^{N_r \times N_r}$ consists of the $N_r$ nonzero eigenvalues of $\mathbf{H}^H\mathbf{H}$.
To provide CSI to the transmitter, user 1 quantizes $\bar{\mathbf{H}}$ and feeds back the quantized CSI. To this end, user 1 uses a finite codebook $\mathcal{C} = \{\mathbf{W}_1, \ldots, \mathbf{W}_{2^B}\}$, which is fixed beforehand and known to the BS. Here, $B$ is the number of feedback bits allocated to each user, and different codebooks are used for different users. Each codeword $\mathbf{W}_j$ is a semi-unitary matrix in $\mathbb{C}^{N_t \times N_r}$ (i.e., $\mathbf{W}_j^H\mathbf{W}_j = \mathbf{I}_{N_r}$) that is distinct from all other codewords. Let $\mathcal{J} = \{1, \ldots, 2^B\}$ be the index set of the codewords. Then, assuming perfect channel estimation at the receiver, the quantization process can be described as
$$\hat{n} = \arg\min_{j \in \mathcal{J}} d(\mathbf{W}_j, \bar{\mathbf{H}}), \tag{7}$$
where $d(\cdot,\cdot)$ is a distance measure. Because all entries of each channel matrix are i.i.d. complex Gaussian random variables, the column space of $\mathbf{H}$, spanned by $\bar{\mathbf{H}}$, is isotropically (i.e., uniformly) distributed. User 1 feeds back the index $\hat{n}$ to the transmitter, which obtains the quantized CSI from codebook $\mathcal{C}$ as
$$\hat{\mathbf{H}} = \mathbf{W}_{\hat{n}}. \tag{8}$$
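For $N_r = 1$, the rule in (7) can be illustrated in a few lines of Python. The sketch below makes three assumptions. First, it uses a randomly generated codebook, as in RMQ (discussed in Section III). Second, it uses $1 - |\bar{\mathbf{h}}^H\mathbf{w}_j|^2$ as the distance, which for $N_r = 1$ is the scalar quantization error. Third, codeword indices are 0-based in the code, whereas $\mathcal{J}$ is 1-based. All helper names are hypothetical.

```python
import math
import random


def random_unit_vector(n, rng):
    """Isotropically distributed unit vector in C^n (normalized complex Gaussian)."""
    v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def inner(u, v):
    """Inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def distance(w, h_bar):
    """Scalar quantization error 1 - |h^H w|^2 (N_r = 1)."""
    return 1.0 - abs(inner(h_bar, w)) ** 2


def quantize(codebook, h_bar):
    """Rule (7): index of the codeword with the smallest distance (0-based)."""
    return min(range(len(codebook)), key=lambda j: distance(codebook[j], h_bar))


rng = random.Random(2020)
n_t, b = 4, 6
codebook = [random_unit_vector(n_t, rng) for _ in range(2 ** b)]  # known to user and BS
h_bar = random_unit_vector(n_t, rng)   # channel direction of user 1
n_hat = quantize(codebook, h_bar)      # B-bit index fed back to the BS
w_hat = codebook[n_hat]                # quantized CSI, cf. (8)
print(n_hat, round(distance(w_hat, h_bar), 4))
```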
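For each codeword $\mathbf{W}_j \in \mathcal{C}$, the channel subspace matrix $\bar{\mathbf{H}}$ can be decomposed into a component lying in the column space of $\mathbf{W}_j$ and a component lying in the left null space of $\mathbf{W}_j$ as follows:
$$\bar{\mathbf{H}} = \mathbf{W}_j\mathbf{W}_j^H\bar{\mathbf{H}} + (\mathbf{I}_{N_t} - \mathbf{W}_j\mathbf{W}_j^H)\bar{\mathbf{H}}. \tag{9}$$
Let the compact SVD of $(\mathbf{I}_{N_t} - \mathbf{W}_j\mathbf{W}_j^H)\bar{\mathbf{H}}$ be
$$(\mathbf{I}_{N_t} - \mathbf{W}_j\mathbf{W}_j^H)\bar{\mathbf{H}} = \mathbf{S}_j\boldsymbol{\Lambda}_j^{1/2}\mathbf{E}_j^H, \tag{10}$$
where the columns of $\mathbf{S}_j \in \mathbb{C}^{N_t \times N_r}$ span an $N_r$-dimensional subspace isotropically distributed in the left null space of $\mathbf{W}_j$. The diagonal matrix $\boldsymbol{\Lambda}_j \in \mathbb{C}^{N_r \times N_r}$ consists of the eigenvalues of $\bar{\mathbf{H}}^H(\mathbf{I}_{N_t} - \mathbf{W}_j\mathbf{W}_j^H)\bar{\mathbf{H}}$, and $\mathbf{E}_j \in \mathbb{C}^{N_r \times N_r}$ is an isotropically distributed unitary matrix. Moreover, $\mathbf{S}_j$, $\boldsymbol{\Lambda}_j$, and $\mathbf{E}_j$ are mutually independent [34]. Because $\boldsymbol{\Lambda}_j$ measures the quantization error, it is referred to as the quantization error matrix in this study. Furthermore, the largest eigenvalue of $\boldsymbol{\Lambda}_j$ is denoted as
$$\lambda_j = \lambda_{\max}(\boldsymbol{\Lambda}_j) \tag{11}$$
for $j = 1, \ldots, 2^B$, and the normalized matrix is defined as
$$\bar{\boldsymbol{\Lambda}}_j = \boldsymbol{\Lambda}_j/\lambda_j \tag{12}$$
for $j = 1, \ldots, 2^B$. From (10), for each $j = 1, \ldots, 2^B$, we have
$$\bar{\mathbf{H}}^H(\mathbf{I}_{N_t} - \mathbf{W}_j\mathbf{W}_j^H)\bar{\mathbf{H}} = \mathbf{E}_j\boldsymbol{\Lambda}_j\mathbf{E}_j^H. \tag{13}$$
From (6) and (8)–(13), the channel matrix can be represented in terms of the quantized channel matrix as
$$\mathbf{H} = \left(\hat{\mathbf{H}}\hat{\mathbf{H}}^H\bar{\mathbf{H}} + \mathbf{S}_{\hat{n}}\boldsymbol{\Lambda}_{\hat{n}}^{1/2}\mathbf{E}_{\hat{n}}^H\right)\boldsymbol{\Sigma}^{1/2}\mathbf{U}^H. \tag{14}$$
VOLUME 8, 2020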
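For $N_r = 1$, (9), (10), and (13) reduce to splitting $\bar{\mathbf{h}}$ into a component along $\mathbf{w}_j$ and a component orthogonal to it. The squared norm of the orthogonal component is the scalar quantization error $1 - |\bar{\mathbf{h}}^H\mathbf{w}_j|^2$. The Python sketch below checks this numerically. It handles only the vector case, and the helper names are hypothetical.

```python
import math
import random


def random_unit_vector(n, rng):
    """Isotropically distributed unit vector in C^n."""
    v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def inner(u, v):
    """Inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def decompose(w, h_bar):
    """Vector version of (9): h = w w^H h + (I - w w^H) h."""
    c = inner(w, h_bar)                                      # w^H h
    parallel = [c * x for x in w]                            # w w^H h
    orthogonal = [h - p for h, p in zip(h_bar, parallel)]    # (I - w w^H) h
    return parallel, orthogonal


rng = random.Random(9)
w = random_unit_vector(4, rng)
h_bar = random_unit_vector(4, rng)
parallel, orthogonal = decompose(w, h_bar)
error = sum(abs(x) ** 2 for x in orthogonal)   # scalar counterpart of (13)
print(round(error, 6), round(1.0 - abs(inner(h_bar, w)) ** 2, 6))
```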

B. PRECODING MODEL
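In this study, BD is considered for transmit precoding. BD is a simple linear precoding method. It is widely used because it achieves a comparatively high spectral efficiency while eliminating the multiuser interference between different users with a relatively low-complexity algorithm. With perfect CSIT, the BD precoder of user 1 is chosen such that $\mathbf{V}^H\mathbf{H}_l = \mathbf{0}$ for all $l = 2, \ldots, K$. However, with limited feedback, the BS only knows the quantized channel matrices $\{\hat{\mathbf{H}}_l : l = 1, \ldots, K\}$ fed back from the associated users. Thus, the precoding matrix $\mathbf{V}$ of limited-feedback-based BD is chosen to satisfy $\mathbf{V}^H\hat{\mathbf{H}}_l = \mathbf{0}$ for all $l = 2, \ldots, K$, such that the columns of $\mathbf{V}$ form an orthonormal basis of the left null space of the following matrix:
$$\left[\hat{\mathbf{H}}_2, \ldots, \hat{\mathbf{H}}_K\right] \in \mathbb{C}^{N_t \times (K-1)N_r}. \tag{15}$$
Because $K = N_t/N_r$, this left null space has dimension $N_t - (K-1)N_r = N_r$, which matches the $N_r$ columns of $\mathbf{V}$. The precoders $\mathbf{V}_k$ of the other users are obtained in the same manner, so that $\hat{\mathbf{H}}^H\mathbf{V}_k = \mathbf{0}$ for $k = 2, \ldots, K$.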
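One way to construct $\mathbf{V}$ is sketched below in Python. It first orthonormalizes the columns of (15) with Gram-Schmidt and then completes them to a basis of $\mathbb{C}^{N_t}$, so that the added vectors span the left null space. This is only one possible construction under these assumptions. The paper does not prescribe an algorithm, and an SVD-based null-space routine would normally be preferred in practice. The random vectors below merely stand in for the fed-back codewords, and all names are hypothetical.

```python
import math
import random


def inner(u, v):
    """Inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def orthonormal_extend(basis, candidates, tol=1e-10):
    """Modified Gram-Schmidt: orthonormalize candidates against basis, return new vectors."""
    basis = list(basis)
    added = []
    for v in candidates:
        r = list(v)
        for q in basis:
            c = inner(q, r)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in r))
        if norm > tol:                      # skip vectors already in the span
            q = [x / norm for x in r]
            basis.append(q)
            added.append(q)
    return added


def bd_precoder(other_columns, n_t):
    """Columns of V: orthonormal basis of the left null space of [H_2, ..., H_K]."""
    span = orthonormal_extend([], other_columns)
    standard = [[1.0 + 0j if i == j else 0j for i in range(n_t)] for j in range(n_t)]
    return orthonormal_extend(span, standard)   # completes the basis of C^{N_t}


rng = random.Random(5)
n_t, n_r = 6, 2
k_users = n_t // n_r                            # K = N_t / N_r
# Columns of the quantized channels of users 2..K (random stand-ins).
others = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_t)]
          for _ in range((k_users - 1) * n_r)]
V = bd_precoder(others, n_t)
print(len(V), max(abs(inner(v, h)) for v in V for h in others))
```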

C. NOTATIONS: MATRIX VARIATE DISTRIBUTIONS
The achievable rate in a MIMO channel is generally expressed as the log-determinant of a random matrix [2]. Thus, the spectral efficiency of the system depends on the statistics of such a matrix. This study defines the following matrix-variate distributions based on previous studies on multivariate statistical analysis [34]–[36].
Definition 1: A random matrix $\mathbf{A} \in \mathbb{C}^{m \times n}$ is said to have a complex matrix variate normal distribution with mean matrix $\mathbf{M}$ and covariance matrix $\boldsymbol{\Psi}_1 \otimes \boldsymbol{\Psi}_2$, denoted as $\mathbf{A} \sim \mathcal{CN}_{m,n}(\mathbf{M}, \boldsymbol{\Psi}_1 \otimes \boldsymbol{\Psi}_2)$, if $\operatorname{vec}(\mathbf{A}^T)$ follows a complex multivariate normal distribution with mean vector $\operatorname{vec}(\mathbf{M}^T)$ and covariance matrix $\boldsymbol{\Psi}_1 \otimes \boldsymbol{\Psi}_2$, where $\boldsymbol{\Psi}_1 \in \mathbb{C}^{m \times m}$ and $\boldsymbol{\Psi}_2 \in \mathbb{C}^{n \times n}$ are positive definite. This definition is the complex counterpart of Definition 2.2.1 in [34]. The special case $\boldsymbol{\Psi}_2 = \mathbf{I}_n$ may be more familiar [35], [36].
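Definition 2: An $m \times m$ random Hermitian positive definite matrix $\mathbf{A}$ is said to have a complex Wishart distribution with parameters $m$, $a > m - 1$, and $\boldsymbol{\Psi} \in \mathbb{C}^{m \times m}$ ($\boldsymbol{\Psi} \succ \mathbf{0}$), denoted as $\mathbf{A} \sim \mathcal{CW}_m(a, \boldsymbol{\Psi})$, if its probability density function (PDF) is given by
$$f_{\mathbf{A}}(\mathbf{B}) = \frac{\det(\mathbf{B})^{a-m}\operatorname{etr}(-\boldsymbol{\Psi}^{-1}\mathbf{B})}{\tilde{\Gamma}_m(a)\det(\boldsymbol{\Psi})^{a}} \tag{16}$$
for $\mathbf{B} \succ \mathbf{0}$, and 0 otherwise, where $\tilde{\Gamma}_m(a)$ is the complex multivariate gamma function defined as [36]
$$\tilde{\Gamma}_m(a) = \pi^{m(m-1)/2}\prod_{i=1}^{m}\Gamma(a - i + 1). \tag{17}$$
Definition 3: An $m \times m$ random Hermitian positive definite matrix $\mathbf{A}$ is said to have a complex matrix variate beta distribution with parameters $m$, $a > m - 1$, and $b > m - 1$, denoted as $\mathbf{A} \sim \mathcal{CB}_m(a, b)$, if its PDF is given by
$$f_{\mathbf{A}}(\mathbf{B}) = \frac{\tilde{\Gamma}_m(a+b)}{\tilde{\Gamma}_m(a)\tilde{\Gamma}_m(b)}\det(\mathbf{B})^{a-m}\det(\mathbf{I}_m - \mathbf{B})^{b-m} \tag{18}$$
for $\mathbf{0} \prec \mathbf{B} \prec \mathbf{I}_m$, and 0 otherwise. If $\mathbf{A} \sim \mathcal{CW}_m(a, \mathbf{I}_m)$, then the expected value of $\mathbf{A}$ is given by [37]
$$\mathbb{E}[\mathbf{A}] = a\mathbf{I}_m. \tag{19}$$
If $\mathbf{A} \sim \mathcal{CB}_m(a, b)$, then the expected value of $\mathbf{A}$ can be calculated using the methods in [38] as
$$\mathbb{E}[\mathbf{A}] = \frac{a}{a+b}\mathbf{I}_m. \tag{20}$$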
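Definitions 2 and 3 rely on the complex multivariate gamma function in (17). A small Python helper, using only the standard library, evaluates its logarithm (to avoid overflow). It also returns the scalar factors of the expected values in (19) and (20). The function names and the example arguments are illustrative.

```python
import math


def log_cmv_gamma(m: int, a: float) -> float:
    """log of (17): pi^{m(m-1)/2} * prod_{i=1}^{m} Gamma(a - i + 1), for a > m - 1."""
    if a <= m - 1:
        raise ValueError("the complex multivariate gamma function requires a > m - 1")
    return 0.5 * m * (m - 1) * math.log(math.pi) + sum(
        math.lgamma(a - i + 1) for i in range(1, m + 1))


def wishart_mean_factor(a: float) -> float:
    """E[A] = a * I_m for A ~ CW_m(a, I_m), cf. (19)."""
    return a


def beta_mean_factor(a: float, b: float) -> float:
    """E[A] = a / (a + b) * I_m for A ~ CB_m(a, b), cf. (20)."""
    return a / (a + b)


print(math.exp(log_cmv_gamma(2, 3.0)))   # pi * Gamma(3) * Gamma(2), i.e. 2*pi up to rounding
print(beta_mean_factor(2, 2))            # example with a = b = 2
```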

D. PRELIMINARIES: MATRIX VARIATE DISTRIBUTIONS
In this section, some matrix-variate distributions that are essential in MIMO broadcast channels are derived and summarized using the notation given above. As direct extensions of Theorems 3.2.5 and 5.2.4 in [34] to the complex domain, we have the following lemmas.
Lemma 1: Let $\mathbf{X} \sim \mathcal{CN}_{m,n}(\mathbf{0}, \boldsymbol{\Psi} \otimes \mathbf{I}_n)$, and let $\mathbf{P} \in \mathbb{C}^{n \times n}$ be a Hermitian idempotent matrix of rank $r \ge m$. Then, $\mathbf{X}\mathbf{P}\mathbf{X}^H \sim \mathcal{CW}_m(r, \boldsymbol{\Psi})$.
Proof: The proof follows the same argument used to prove Theorem 3.2.5 in [34].
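For $m = 1$, Lemma 1 states that $\mathbf{x}\mathbf{P}\mathbf{x}^H \sim \mathcal{CW}_1(r, \psi)$. This is a gamma variable with shape $r$ and scale $\psi$, so its mean is $r\psi$ (as in (19) for $\psi = 1$) and its variance is $r\psi^2$. The Monte Carlo sketch below checks these moments for the projection $\mathbf{P} = \mathbf{I}_n - \mathbf{w}\mathbf{w}^H$ of rank $n - 1$. The parameter values are arbitrary. It is a numerical sanity check under these assumptions, not a proof.

```python
import math
import random


def cn(rng, var):
    """Circularly symmetric complex Gaussian sample with variance var."""
    s = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def projected_energy(x, w):
    """x P x^H for a row vector x and P = I - w w^H (w of unit norm)."""
    xw = sum(a * b for a, b in zip(x, w))          # x w
    return sum(abs(a) ** 2 for a in x) - abs(xw) ** 2


rng = random.Random(11)
n, psi, trials = 5, 2.0, 20000
w0 = [cn(rng, 1.0) for _ in range(n)]
norm = math.sqrt(sum(abs(a) ** 2 for a in w0))
w = [a / norm for a in w0]                         # unit vector, so P has rank n - 1

samples = [projected_energy([cn(rng, psi) for _ in range(n)], w) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / (trials - 1)
print(round(mean, 3), (n - 1) * psi)               # sample mean vs r * psi
print(round(var, 3), (n - 1) * psi ** 2)           # sample variance vs r * psi^2
```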
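Lemma 2 (complex counterpart of Theorem 5.2.4 in [34]). Proof: The proof follows the same argument used to prove Theorem 5.2.4 in [34].
Lemma 1 implies that
$$\mathbf{H}^H(\mathbf{I}_{N_t} - \mathbf{W}_j\mathbf{W}_j^H)\mathbf{H} \sim \mathcal{CW}_{N_r}(N_t - N_r, \mathbf{I}_{N_r}) \tag{21}$$
for an arbitrarily chosen $j \in \mathcal{J}$. This holds because $\mathbf{H}^H \sim \mathcal{CN}_{N_r,N_t}(\mathbf{0}, \mathbf{I}_{N_r} \otimes \mathbf{I}_{N_t})$ and $\mathbf{I}_{N_t} - \mathbf{W}_j\mathbf{W}_j^H$ is a Hermitian idempotent matrix of rank $N_t - N_r$. That rank is at least $N_r$ whenever $K = N_t/N_r \ge 2$. Furthermore, the following lemma can be obtained from Lemmas 1 and 2.
Lemma 3: For $n > m$, let $\mathbf{Z} \in \mathbb{C}^{n \times m}$ be an orthonormal basis of an $m$-dimensional subspace isotropically distributed in $\mathbb{C}^{n}$, and let $\mathbf{P} \in \mathbb{C}^{n \times n}$ be a Hermitian idempotent matrix of rank $r \ge m$. Then, $\mathbf{Z}^H\mathbf{P}\mathbf{Z} \sim \mathcal{CB}_m(r, n - r)$.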
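Proof: See Appendix A.
From this lemma, because $\bar{\mathbf{H}}$ is an orthonormal basis of an isotropically distributed subspace independent of $\mathbf{W}_j$, we have $\bar{\mathbf{H}}^H(\mathbf{I}_{N_t} - \mathbf{W}_j\mathbf{W}_j^H)\bar{\mathbf{H}} \sim \mathcal{CB}_{N_r}(N_t - N_r, N_r)$ for an arbitrarily chosen $j \in \mathcal{J}$. Moreover, (19) and (21) imply that $\mathbb{E}[\mathbf{H}^H(\mathbf{I}_{N_t} - \mathbf{W}_j\mathbf{W}_j^H)\mathbf{H}] = (N_t - N_r)\mathbf{I}_{N_r}$, and (20) and (24) imply that $\mathbb{E}[\bar{\mathbf{H}}^H(\mathbf{I}_{N_t} - \mathbf{W}_j\mathbf{W}_j^H)\bar{\mathbf{H}}] = \frac{N_t - N_r}{N_t}\mathbf{I}_{N_r}$.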
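For $m = 1$, Lemma 3 reduces to the familiar fact that $\mathbf{z}^H\mathbf{P}\mathbf{z} \sim \mathrm{Beta}(r, n - r)$ for an isotropic unit vector $\mathbf{z} \in \mathbb{C}^n$. Its mean $r/n$ agrees with (20). The following Monte Carlo sketch takes $\mathbf{P}$ as the projection onto the first $r$ coordinates, which gives the same law as any other rank-$r$ projection by isotropy. It then compares the empirical mean and variance with the beta values. The parameter values are arbitrary.

```python
import math
import random


def isotropic_unit_vector(n, rng):
    """Unit vector uniformly distributed on the complex sphere in C^n."""
    v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def beta_moments(a, b):
    """Mean and variance of Beta(a, b)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


rng = random.Random(3)
n, r, trials = 6, 2, 20000
# z^H P z with P = diag(1, ..., 1, 0, ..., 0) of rank r.
samples = []
for _ in range(trials):
    z = isotropic_unit_vector(n, rng)
    samples.append(sum(abs(z[i]) ** 2 for i in range(r)))

mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / (trials - 1)
target_mean, target_var = beta_moments(r, n - r)
print(round(mean, 4), round(target_mean, 4))
print(round(var, 5), round(target_var, 5))
```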

E. PERFORMANCE METRIC
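For notational simplicity, the term including the multiuser interference is denoted as $\mathbf{I}_U$ in (27). By (6), $\mathbf{I}_U$ can be expressed through the SVD factors of $\mathbf{H}$, as in (28). In addition, from (14),
$$\mathbf{H}^H\mathbf{V}_k = \mathbf{U}\boldsymbol{\Sigma}^{1/2}\left(\bar{\mathbf{H}}^H\hat{\mathbf{H}}\hat{\mathbf{H}}^H + \mathbf{E}_{\hat{n}}\boldsymbol{\Lambda}_{\hat{n}}^{1/2}\mathbf{S}_{\hat{n}}^H\right)\mathbf{V}_k \overset{(a)}{=} \mathbf{U}\boldsymbol{\Sigma}^{1/2}\mathbf{E}_{\hat{n}}\boldsymbol{\Lambda}_{\hat{n}}^{1/2}\mathbf{S}_{\hat{n}}^H\mathbf{V}_k, \tag{29}$$
where (a) follows because $\hat{\mathbf{H}}^H\mathbf{V}_k = \mathbf{0}$ for all $k = 2, \ldots, K$ (Section II-B). Then, by combining (28) and (29), $\mathbf{I}_U$ is reformulated as in (30). By defining the auxiliary matrices in (31) for $k = 2, \ldots, K$ and $j = 1, \ldots, 2^B$, $\mathbf{I}_U$ is represented as a multivariable function in (32). Based on a sphere-packing argument [1], the achievable rate in a MIMO channel is given by the logarithm of the ratio of the volume (or, correspondingly, the power) of the total received signal space to that of the noise-plus-interference space [18]. Thus, from (5) and (27), the downlink instantaneous rate of user 1 can be represented by the function $R$ in (33). For convenience of analytical description, the instantaneous rate $R$ is represented as a multivariable function of random matrices. In (33), $\mathbf{I}_U$ is the only term that depends on the quantization error. The downlink spectral efficiency of each user is defined as the ergodic downlink rate
$$T = \mathbb{E}[R], \tag{34}$$
where the expectation is taken over all random components in $R$.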

III. ANALYTICALLY TRACTABLE MODELS FOR QUANTIZATION

A. RMQ
For analytical tractability, two quantization models have been widely used in previous studies to analyze the achievable performance in MIMO broadcast channels based on limited feedback. The first model uses randomly generated codewords. It is well known as random vector quantization because it was first considered for vector quantization problems assuming $N_r = 1$ in early studies of MIMO broadcast channels [9]. Because each channel between a user and the BS is given by a matrix, this type of quantization based on random codewords is referred to as RMQ in this study, as defined previously. Under RMQ, each codeword $\mathbf{W}_j \in \mathcal{C}$ is a random semi-unitary matrix whose column space is isotropically distributed, independently of all other codewords. Then, the spectral efficiency $T$ is obtained by averaging over the random codewords as well as over the other random components, such as the fading channels. Because the performance is averaged over random codewords under RMQ, at least one realization of the codebook always exists whose performance is equal to or better than the ensemble average [9]. This argument is similar to the random coding argument used to prove Shannon's channel coding theorem. In this context, the performance achieved using RMQ is regarded as a lower bound in terms of codebook construction. RMQ is easy to implement in simulation, intuitive in terms of codebook construction, and tractable for mathematical analysis. Thus, it has been consistently used to analyze communication performance in MIMO broadcast channels based on limited feedback [9], [18]. In particular, RMQ simplifies the analysis of the quantization error matrix $\boldsymbol{\Lambda}_j$ (e.g., an extensive analysis of quantization errors for $N_r = 1$ was performed in [9]). However, communication systems are becoming increasingly complicated in recent wireless communication standards (e.g., the number of antennas is increasing, and networks are becoming denser [25]–[31]).
In such complicated systems, analyzing the quantization error with RMQ is considerably difficult because the distribution of the quantization error is still given by the minimum order statistic of $2^B$ independent random variables [9]. Moreover, obtaining simulation results with RMQ for a large $B$ is difficult when $N_r > 1$ [18] (see Section V). QCA, described in the following section, is an alternative that further simplifies the corresponding analysis.
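To see what the minimum order statistic means in practice for $N_r = 1$, note that with random codewords each scalar error $1 - |\bar{\mathbf{h}}^H\mathbf{w}_j|^2$ is an independent $\mathrm{Beta}(N_t - 1, 1)$ variable. The RMQ error is therefore their minimum. The Python sketch below estimates its mean by Monte Carlo. It compares the estimate with the closed form obtained by integrating the tail probability $(1 - x^{N_t - 1})^{2^B}$, namely $2^B\,\mathrm{B}\!\left(2^B, \tfrac{N_t}{N_t - 1}\right)$, where $\mathrm{B}(\cdot,\cdot)$ is the beta function. Parameter values and function names are illustrative.

```python
import math
import random


def rmq_error_monte_carlo(n_t, b, trials, rng):
    """Mean RMQ error for N_r = 1: minimum of 2^B i.i.d. Beta(N_t - 1, 1) draws."""
    total = 0.0
    for _ in range(trials):
        total += min(rng.betavariate(n_t - 1, 1) for _ in range(2 ** b))
    return total / trials


def rmq_error_closed_form(n_t, b):
    """E[min] = 2^B * B(2^B, N_t / (N_t - 1)), evaluated with log-gamma."""
    n = 2 ** b
    a = 1.0 / (n_t - 1)
    return math.exp(math.log(n) + math.lgamma(n) + math.lgamma(1.0 + a)
                    - math.lgamma(n + 1.0 + a))


rng = random.Random(1)
n_t, b = 4, 6
estimate = rmq_error_monte_carlo(n_t, b, 3000, rng)
exact = rmq_error_closed_form(n_t, b)
print(round(estimate, 4), round(exact, 4))
```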
B. QCA
If $N_r = 1$, QCA for vector quantization (QCAVQ) approximates the true quantization region $\mathcal{R}_i$ of each codeword $\mathbf{w}_i$ as
$$\tilde{\mathcal{R}}_i = \left\{\bar{\mathbf{h}} : 1 - |\bar{\mathbf{h}}^H\mathbf{w}_i|^2 \le \kappa\right\}$$
for some $\kappa$ invariant with respect to $i$ [13], where $\mathbf{w}_i$ denotes the vector version (when $N_r = 1$) of $\mathbf{W}_i$. It further assumes that $\tilde{\mathcal{R}}_i$ and $\tilde{\mathcal{R}}_j$ are disjoint if $i \ne j$ and that $\sum_{i=1}^{2^B}\Pr(\bar{\mathbf{h}} \in \tilde{\mathcal{R}}_i) = 1$, where $\bar{\mathbf{h}}$ denotes the vector version (when $N_r = 1$) of $\bar{\mathbf{H}}$. This is an ideal condition for the quantization regions; thus, QCAVQ is known to achieve a performance upper bound in terms of codebook construction. Based on these assumptions, we have [13]
$$F_{\zeta_{\hat{n}}}(x) = \begin{cases} 2^B x^{N_t - 1}, & 0 \le x < \kappa(B), \\ 1, & x \ge \kappa(B), \end{cases} \tag{38}$$
where $\kappa(B) = 2^{-\frac{B}{N_t - 1}}$. Here, $\zeta_{\hat{n}}$ is the scalar version of $\boldsymbol{\Lambda}_{\hat{n}}$; by (13), $\zeta_{\hat{n}} = 1 - |\bar{\mathbf{h}}^H\hat{\mathbf{h}}|^2$. It is shown in [13] that $F_{\zeta_{\hat{n}}}(x)$ under QCAVQ is greater than or equal to the CDF of the quantization error obtained with any codebook $\mathcal{C}$; this is why QCAVQ is known to achieve a performance upper bound. From this CDF, we know that
$$\zeta_{\hat{n}} \overset{d}{=} \kappa(B)\,\beta, \quad \beta \sim \mathrm{Beta}(N_t - 1, 1). \tag{39}$$
In other words, with QCAVQ, the quantization error can be represented by $\kappa$ multiplied by an ordinary beta random variable. Thus, the entire effect of quantization is concentrated in the deterministic variable $\kappa(B)$. The mathematical analysis therefore becomes much simpler than with RMQ, because no expectation over random codewords is needed. However, (39) is applicable only when $N_r = 1$. Moreover, the literature has implicitly assumed that QCAVQ provides a tight approximation, supported only by certain simulation results. Similar to the sphere-packing argument [1], one may expect QCAVQ to provide a close approximation for a large $B$; however, an analytical basis for the accuracy of QCAVQ is still lacking. The extended version of QCA for matrix quantization was first proposed in [24] for analytical tractability, but the underlying logic and the accuracy of the approximation were not carefully investigated. Thus, the main subject of this study is to reformulate QCA so that it is applicable for all $N_r \ge 1$ and to carefully investigate its reliability.
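The QCAVQ model in (38) and (39) is simple to evaluate. The Python sketch below implements $\kappa(B)$, the CDF (38), and sampling via (39). It also checks numerically that (38) dominates the CDF of the RMQ error, $1 - (1 - x^{N_t - 1})^{2^B}$ (the minimum of $2^B$ i.i.d. $\mathrm{Beta}(N_t - 1, 1)$ errors). This dominance is the union-bound argument behind the upper-bound property. Function names and parameter values are illustrative.

```python
import random


def kappa(n_t, b):
    """kappa(B) = 2^{-B/(N_t - 1)}."""
    return 2.0 ** (-b / (n_t - 1))


def qcavq_cdf(x, n_t, b):
    """CDF (38): 2^B x^{N_t - 1} below kappa(B), and 1 from kappa(B) on."""
    if x <= 0.0:
        return 0.0
    if x >= kappa(n_t, b):
        return 1.0
    return (2 ** b) * x ** (n_t - 1)


def rmq_cdf(x, n_t, b):
    """CDF of the minimum of 2^B i.i.d. Beta(N_t - 1, 1) errors (RMQ, N_r = 1)."""
    if x <= 0.0:
        return 0.0
    return 1.0 - (1.0 - min(x, 1.0) ** (n_t - 1)) ** (2 ** b)


def qcavq_sample(n_t, b, rng):
    """Draw zeta = kappa(B) * beta with beta ~ Beta(N_t - 1, 1), as in (39)."""
    return kappa(n_t, b) * rng.betavariate(n_t - 1, 1)


n_t, b = 4, 6
grid = [i / 1000 for i in range(1001)]
dominates = all(qcavq_cdf(x, n_t, b) >= rmq_cdf(x, n_t, b) - 1e-12 for x in grid)
rng = random.Random(4)
samples = [qcavq_sample(n_t, b, rng) for _ in range(20000)]
print(kappa(n_t, b), dominates)                   # kappa = 2^{-2} for these values
print(round(sum(samples) / len(samples), 4), kappa(n_t, b) * (n_t - 1) / n_t)
```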
Because QCA and RMQ provide the performance upper and lower bounds, respectively, in terms of codebook construction, the gap between the spectral efficiencies obtained using QCA and RMQ is the primary focus of the analysis. The accuracy of QCA is investigated with respect to various system parameters, including $N_t$, $N_r$, $B$, and $P$.

IV. QCA FOR MATRIX QUANTIZATION

A. FORMULATION
In this section, the QCA proposed in [24], which is applicable for all $N_r \ge 1$, is reformulated. As described in Section II-A, each codeword $\mathbf{W}_j$ is a candidate for the quantized CSI, and the corresponding quantization error is represented by $\boldsymbol{\Lambda}_j$. Consequently, the distance measure $d$ in the quantization process (7) should be chosen carefully to effectively reduce the eigenvalues in the diagonal matrix $\boldsymbol{\Lambda}_j$. A well-known example is the chordal distance, whose quantization performance was discussed in previous studies [18], [40]. In terms of the achievable multiplexing gain, an optimal distance measure was also proposed in [19].
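To develop QCA applicable for all $N_r \ge 1$, a random variable common to all entries of $\boldsymbol{\Lambda}_j$ is considered. By (13) and Lemma 3, $\mathbf{E}_j\boldsymbol{\Lambda}_j\mathbf{E}_j^H = \bar{\mathbf{H}}^H(\mathbf{I}_{N_t} - \mathbf{W}_j\mathbf{W}_j^H)\bar{\mathbf{H}} \sim \mathcal{CB}_{N_r}(N_t - N_r, N_r)$. Hence, the largest eigenvalue $\lambda_j$ follows a beta distribution with parameters $N_r(N_t - N_r)$ and 1, whose PDF is given by [41]
$$f_{\lambda_j}(x) = N_r(N_t - N_r)\,x^{N_r(N_t - N_r) - 1}, \quad 0 \le x \le 1, \tag{40}$$
for an arbitrarily chosen $j \in \mathcal{J}$.
Lemma 4: The largest eigenvalue $\lambda_j$ and the normalized matrix $\bar{\boldsymbol{\Lambda}}_j = \boldsymbol{\Lambda}_j/\lambda_j$ are mutually independent.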
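Proof: See the proof of Lemma 1 in [24].
Lemma 4 means that every entry of the quantization error matrix $\boldsymbol{\Lambda}_j$ contains $\lambda_j$ as a common independent factor, so reducing $\lambda_j$ reduces all entries of $\boldsymbol{\Lambda}_j$. In other words, the quantization error matrix can be suppressed by minimizing the single random variable $\lambda_j$. In this context, the distance measure
$$d_1(\mathbf{W}_j, \bar{\mathbf{H}}) = \lambda_j \tag{41}$$
is considered for each $j \in \mathcal{J}$, where $\lambda_j$ is defined in (11). If $d = d_1$ is applied to (7), the quantization process minimizes the largest eigenvalue of the quantization error matrix to obtain the quantized CSI $\hat{\mathbf{H}}$. Because $\lambda_j$ follows a beta distribution with parameters $N_r(N_t - N_r)$ and 1 (40), the quantization process obtains the minimum of $2^B$ beta random variables. This is mathematically equivalent to the vector quantization problem described in [9], [13]. Thus, the QCA argument used for vector quantization applies in the same way: the true quantization region $\mathcal{T}_i$ of each codeword $\mathbf{W}_i$,
$$\mathcal{T}_i = \left\{\bar{\mathbf{H}} : d_1(\mathbf{W}_i, \bar{\mathbf{H}}) \le d_1(\mathbf{W}_j, \bar{\mathbf{H}}),\ \forall j \in \mathcal{J}\right\}, \tag{42}$$
is approximated as
$$\tilde{\mathcal{T}}_i = \left\{\bar{\mathbf{H}} : d_1(\mathbf{W}_i, \bar{\mathbf{H}}) \le \delta\right\} \tag{43}$$
for some $\delta$ invariant with respect to $i$. Note that $d_1(\mathbf{W}_i, \bar{\mathbf{H}})$ is a beta random variable with parameters $N_r(N_t - N_r)$ and 1 (40), so a realization of $\bar{\mathbf{H}}$ is equally likely to belong to $\tilde{\mathcal{T}}_i$ for all $i$. Thus, (43) corresponds to
$$F_{\lambda_{\hat{n}}^{\mathrm{QCA}}}(x) = \begin{cases} \sum_{i=1}^{2^B}\Pr\left(d_1(\mathbf{W}_i, \bar{\mathbf{H}}) \le x\right) \overset{(a)}{=} 2^B x^{N_r(N_t - N_r)}, & 0 \le x < \delta, \\ 1, & x \ge \delta, \end{cases} \tag{44}$$
where (a) follows from (40) and (41). The original quantization regions are disjoint ($\mathcal{T}_i \cap \mathcal{T}_j = \emptyset$ if $i \ne j$) and cover the whole space ($\sum_{i=1}^{2^B}\Pr(\bar{\mathbf{H}} \in \mathcal{T}_i) = 1$). By inheriting these properties, $\delta$ is chosen to satisfy
$$2^B\delta^{N_r(N_t - N_r)} = 1, \quad \text{i.e.,} \quad \delta(B) = 2^{-\frac{B}{N_r(N_t - N_r)}}. \tag{45}$$
In (44), the superscript QCA in $\lambda_{\hat{n}}^{\mathrm{QCA}}$ indicates that it is obtained using QCA. Applying the union bound, it can easily be shown that the CDF in (44) upper-bounds the CDF of $\lambda_{\hat{n}}$ obtained by minimizing $\lambda_j$ with any codebook $\mathcal{C}$. This implies that QCA in (44) provides a performance upper bound in terms of codebook construction, given that $d_1$ is used as the distance measure.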
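The threshold in (45) and the CDF in (44) are straightforward to tabulate. The Python sketch below computes $\delta(B)$ for a few antenna configurations and checks that (44) reaches 1 at $\delta$. For $N_r = 1$ it reproduces $\kappa(B)$ of QCAVQ. The configuration values are arbitrary examples, and the function names are illustrative.

```python
import math


def delta(n_t, n_r, b):
    """Threshold from (45): 2^B * delta^{N_r (N_t - N_r)} = 1."""
    return 2.0 ** (-b / (n_r * (n_t - n_r)))


def qcamq_cdf(x, n_t, n_r, b):
    """CDF (44) of the largest eigenvalue of the quantization error under QCA."""
    if x <= 0.0:
        return 0.0
    d = delta(n_t, n_r, b)
    if x >= d:
        return 1.0
    return (2 ** b) * x ** (n_r * (n_t - n_r))


n_t, b = 8, 16
for n_r in (1, 2, 4):
    d = delta(n_t, n_r, b)
    # log2(delta) = -B / (N_r (N_t - N_r)); just below delta the CDF is numerically 1.
    print(n_r, round(d, 4), round(math.log2(d), 4),
          round(qcamq_cdf(d * (1 - 1e-9), n_t, n_r, b), 6))
```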
In summary, by combining the use of $d = d_1$ with the approximation $\mathcal{T}_i \approx \tilde{\mathcal{T}}_i$ in (43), we obtain a QCA applicable to matrix quantization problems. That is, the CDF (44) implies that $\lambda^{\mathrm{QCA}}_{\hat{n}}$ is equal in distribution to $\delta X$, where $X$ is a beta random variable with parameters $N_r(N_t-N_r)$ and 1. Furthermore, Lemma 4 implies that the QCA quantization error matrix is equal in distribution to $\delta$ times the quantization error matrix of an arbitrarily chosen codeword $j$ (47). If $N_r = 1$, the QCA quantization error reduces to the scalar $\zeta_{\hat{n}}$ in (38) and (39). Thus, (47) is an extended version of QCAVQ described in Section III-B; it additionally assumes $d = d_1$ and is applicable to all $N_r \ge 1$. Where a distinction is required, this extended version is hereafter referred to as QCA for matrix quantization (QCAMQ); otherwise, it is simply called QCA for all $N_r \ge 1$.

B. PERFORMANCE OF DISTANCE MEASURE
To generalize QCAVQ to QCAMQ, the largest eigenvalue of the quantization error matrix is used as the distance measure for quantization (i.e., $d = d_1$ in (7)). Thus, to verify the validity of QCAMQ, the validity of using $d_1$ for quantization should be verified first. In fact, obtaining an explicit optimal distance measure that maximizes the spectral efficiency is hardly possible. The chordal distance has been widely used for matrix quantization [18], [40] for two reasons: it corresponds to the sum of all eigenvalues of the quantization error matrix, so minimizing it intuitively suppresses the quantization error matrix, and its quantization performance has been extensively investigated in the literature [11], [40]. Denoting the chordal distance as $d_c$, it is represented as (48). A distance measure that maximizes the multiplexing gain, denoted $d_m$, is an alternative choice and is defined as (49) [19]. Because the spectral efficiency cannot be solved explicitly with respect to $B$, quantization performance in wireless communication has commonly been analyzed in terms of the number of feedback bits required to maintain a constant spectral efficiency gap from the optimal performance [9], [18], where the optimal performance is the spectral efficiency achievable with perfect CSIT. In matrix quantization, both $d_c$ and $d_m$ are empirically known to require the following scaling rate for $B$ to maintain a constant gap, with respect to the SNR, from the spectral efficiency obtained with perfect CSIT:
$$B = N_r(N_t-N_r)\log_2 P + C \tag{50}$$
for some constant $C$. To validate the use of $d_1$, its suboptimality is established by showing that the scaling rate in (50) is also sufficient for $d_1$ to maintain a constant gap from the optimal performance.
Lemma 5: Suppose that $d_1(W_j, H) = \lambda_j$ is used as the distance measure (i.e., $d = d_1$ in (7)). Then, (51) holds for arbitrarily chosen $j$ from $\mathcal{J} = \{1, \ldots, 2^B\}$.
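Moreover, (52) holds for each $k = 2, \ldots, K$.
Proof: By definition (12), the quantization error matrix of codeword $j$ equals $\lambda_j$ times its normalized version for each $j \in \mathcal{J}$. Because $d_1(W_j, H) = \lambda_j$ is used as the distance measure, and the normalized quantization error matrix and $E_j$ are independent of $\lambda_j$ (Lemma 4), the normalized quantization error matrix and $E_{\hat{n}}$ of the selected index $\hat{n}$ are equal in distribution to those of an arbitrarily chosen $j \in \mathcal{J}$. This completes the proof of (51). For (52), see Appendix B.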
We now derive a sufficient condition on $B$ under which $d = d_1$ maintains a constant rate gap from the spectral efficiency achieved with perfect CSIT. For this purpose, it suffices to use RMQ as the quantization codebook because it provides a performance lower bound in terms of codebook construction.
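Theorem 1: Let $T^{\mathrm{PCSI}}$ be the spectral efficiency achieved with perfect CSIT and $T^{\mathrm{RMQ}}$ be the spectral efficiency achieved using limited feedback with RMQ. Then, if $d = d_1$, it suffices to scale the number of feedback bits as
$$B = N_0\log_2 P + N_0\log_2\frac{N_0+1}{N_rN_t} - N_0\log_2(r_1-1) \tag{53}$$
to obtain
$$T^{\mathrm{PCSI}} - T^{\mathrm{RMQ}} \le N_r\log_2 r_1, \tag{54}$$
where $r_1$ is any constant larger than 1 and $N_0$ is defined as $N_0 \triangleq N_r(N_t-N_r)$.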
Proof: See Appendix C.
This theorem implies that, to maintain a rate gap no greater than $N_r\log_2 r_1$ from the optimal spectral efficiency achieved with perfect CSIT, it suffices to scale the number of feedback bits linearly with the transmit power in dB when $d = d_1$ is used for quantization. This demonstrates the suboptimality of $d_1$ in terms of achieving the full multiplexing gain with respect to $P$, because this scaling rate of $B$ is the same as that empirically obtained with the optimal distance measure that maximizes the multiplexing gain [19]. It is also the same as that obtained with the chordal distance [18]; the only difference is the constant term $N_0\log_2\frac{N_0+1}{N_rN_t} - N_0\log_2(r_1-1)$. Fig. 1 verifies this theorem and compares the performance achieved using $d_1$ with that achieved using the chordal distance ($d_c$). With the bit scaling in (53), the difference in spectral efficiency between $d_1$ and $d_c$ is negligible, which demonstrates the asymptotic suboptimality of $d_1$.
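The bit-scaling rule (53) is simple arithmetic, and the sketch below evaluates it and plugs the result back into the left-hand side of (106) in Appendix C, assuming $\delta = 2^{-B/N_0}$ as in the QCA construction. Passing the transmit power in dB is only a convenience of this sketch, and the helper names are hypothetical.

```python
import math


def bits_for_gap(P_dB, Nt, Nr, r1):
    """Feedback bits B from (53) for a target gap Nr*log2(r1); P_dB is converted to linear P."""
    if r1 <= 1.0:
        raise ValueError("r1 must be larger than 1")
    N0 = Nr * (Nt - Nr)
    P = 10.0 ** (P_dB / 10.0)
    return N0 * (math.log2(P) + math.log2((N0 + 1) / (Nr * Nt)) - math.log2(r1 - 1.0))


def gap_bound(P_dB, Nt, Nr, B):
    """Left-hand side of (106): Nr*log2((N0+1)/(Nr*Nt) * P * delta + 1), delta = 2**(-B/N0)."""
    N0 = Nr * (Nt - Nr)
    P = 10.0 ** (P_dB / 10.0)
    delta = 2.0 ** (-B / N0)
    return Nr * math.log2((N0 + 1) / (Nr * Nt) * P * delta + 1.0)


for P_dB in (10, 20, 30):
    B = bits_for_gap(P_dB, Nt=4, Nr=2, r1=2.0)
    # Each extra 10 dB adds N0*log2(10) bits; the gap stays at Nr*log2(r1) = 2 bit/s/Hz.
    print(P_dB, round(B, 2), round(gap_bound(P_dB, 4, 2, B), 6))
```

Because (53) generally returns a non-integer $B$, rounding up is the natural choice in practice. The left-hand side of (106) decreases in $B$, so rounding up keeps the gap within the target.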
Based on the suboptimality established in Theorem 1, it is hereafter assumed, unless otherwise specified, that $d_1$ is used as the distance measure for quantization.

C. ACCURACY OF QCA
As described in Section IV-A, QCAMQ consists of the following two essentials:
1) The use of $d_1$ as the distance measure for quantization.
2) Approximating the quantization cell of each codeword as $\mathcal{T}_i \approx \tilde{\mathcal{T}}_i$ (43).
Because the validity of using $d_1$ was discussed in the previous section, the accuracy of the approximation $\mathcal{T}_i \approx \tilde{\mathcal{T}}_i$ is investigated in this section. As QCA and RMQ provide the performance upper and lower bounds, respectively, in terms of codebook construction (Section III), the gap between the spectral efficiencies achieved using QCA and RMQ is analyzed.
Let $\lambda^{\mathrm{QCA}}$ be the largest eigenvalue of the quantization error matrix obtained using QCA, and let $\lambda^{\mathrm{RMQ}}$ be the largest eigenvalue obtained by minimizing $d_1$ ($d = d_1$ in (7)) with RMQ; the subscript index $\hat{n}$ is omitted for simplicity. Applying Lemma 5 to (35), the spectral efficiencies obtained using QCA and RMQ are given by (55). With an infinite number of feedback bits, the quantized CSI $\hat{H}$ obtained using a well-designed distance measure and quantization codebook becomes arbitrarily close to the true CSI $H$ (i.e., $\lim_{B\to\infty}\hat{H} = H$). In this context, the following theorem proves the asymptotic validity of QCA and RMQ with $d = d_1$ for quantization.
Theorem 2: Let $T^{\mathrm{PCSI}}$ be the spectral efficiency achieved with perfect CSIT, as in Theorem 1. Then, with $d = d_1$ in (7),
$$\lim_{B\to\infty} T^{\mathrm{QCA}} = \lim_{B\to\infty} T^{\mathrm{RMQ}} = T^{\mathrm{PCSI}}. \tag{56}$$
Proof: See Appendix D.
As both the lower (RMQ) and upper (QCA) bounds of the spectral efficiency converge to the same value, QCA and RMQ are asymptotically equivalent with respect to $B$ when $d = d_1$. This implies that, provided $d = d_1$ is used as the distance measure and $B$ is sufficiently large, QCA can precisely model the quantized CSI obtained from any well-designed codebook that performs better than RMQ. This theorem also establishes the asymptotic optimality of the distance measure $d_1$. However, it remains unknown how many feedback bits are "sufficiently large" in practice. Thus, it is desirable to quantify the accuracy of QCA for moderate values of $B$.
Note that in (55), the terms $R_{j,k}$ and $T$ are not related to the quantization error. By replacing them with their expected values, we define the estimates $\bar{T}^{\mathrm{QCA}}$ and $\bar{T}^{\mathrm{RMQ}}$ in (57), where the outermost expectations are taken with respect to all remaining random components.
Each of the errors in estimating $T^{\mathrm{QCA}}$ by $\bar{T}^{\mathrm{QCA}}$ and $T^{\mathrm{RMQ}}$ by $\bar{T}^{\mathrm{RMQ}}$ may be larger than expected. However, the only difference between $T^{\mathrm{QCA}}$ and $T^{\mathrm{RMQ}}$ in (55) is the difference between $\lambda^{\mathrm{QCA}}$ and $\lambda^{\mathrm{RMQ}}$. Thus, anticipating that the quantization performances of QCA and RMQ are not significantly different, we can expect the estimation errors $T^{\mathrm{QCA}} - \bar{T}^{\mathrm{QCA}}$ and $T^{\mathrm{RMQ}} - \bar{T}^{\mathrm{RMQ}}$ to be close to each other, even if each of them is larger than expected. In mathematical terms, $T^{\mathrm{QCA}} - \bar{T}^{\mathrm{QCA}}$ is given by (58), and similarly, $T^{\mathrm{RMQ}} - \bar{T}^{\mathrm{RMQ}}$ is given by (59). Consequently, their difference is represented as (60), where $f_{\lambda^{\mathrm{QCA}}}(x) = 0$ for $\delta \le x \le 1$. Thus, each of the following conditions can make $(T^{\mathrm{QCA}} - \bar{T}^{\mathrm{QCA}}) - (T^{\mathrm{RMQ}} - \bar{T}^{\mathrm{RMQ}})$ sufficiently close to zero:
1) $T^{\mathrm{RMQ}}$ and $T^{\mathrm{QCA}}$ are sufficiently close to $\bar{T}^{\mathrm{RMQ}}$ and $\bar{T}^{\mathrm{QCA}}$, respectively.
2) The PDFs $f_{\lambda^{\mathrm{QCA}}}(x)$ and $f_{\lambda^{\mathrm{RMQ}}}(x)$ are sufficiently close.
3) The absolute value of the integral in (60) taken only over $\{x \in [0,1] : f_{\lambda^{\mathrm{QCA}}}(x) > f_{\lambda^{\mathrm{RMQ}}}(x)\}$ is sufficiently close to that taken only over $\{x \in [0,1] : f_{\lambda^{\mathrm{QCA}}}(x) \le f_{\lambda^{\mathrm{RMQ}}}(x)\}$.
The first condition depends on the distributions of $R_{j,k}$ and $T$. Because they consist of normal, beta, and Wishart random matrices, their variances are not significantly large, so neither $T^{\mathrm{QCA}} - \bar{T}^{\mathrm{QCA}}$ nor $T^{\mathrm{RMQ}} - \bar{T}^{\mathrm{RMQ}}$ is expected to be very large. However, neither of them can be made arbitrarily small. By contrast, $f_{\lambda^{\mathrm{QCA}}}(x)$ and $f_{\lambda^{\mathrm{RMQ}}}(x)$ can be made arbitrarily close as $B$ increases because the probability mass of both PDFs concentrates near zero. For the same reason, the third condition can also be satisfied arbitrarily closely as $B$ increases. Moreover, as $\lambda^{\mathrm{QCA}}$ and $\lambda^{\mathrm{RMQ}}$ become smaller, the integrand $\mathbb{E}\left[R(x, \{\mathbb{E}[R_{j,k}]\}, \mathbb{E}[T]) - R(x, \{R_{j,k}\}, T)\right]$ becomes less dependent on $x$, where the dummy variable $x$ corresponds to $\lambda^{\mathrm{QCA}}$ and $\lambda^{\mathrm{RMQ}}$. Thus, the third condition is well satisfied for small values of $\lambda^{\mathrm{QCA}}$ and $\lambda^{\mathrm{RMQ}}$.
Based on these observations, the estimation error $T^{\mathrm{QCA}} - \bar{T}^{\mathrm{QCA}}$ is expected to be sufficiently close to $T^{\mathrm{RMQ}} - \bar{T}^{\mathrm{RMQ}}$ for moderate values of $B$; this closeness is demonstrated in the following sections. Consequently, the following approximation is considered:
$$\Delta T \triangleq T^{\mathrm{QCA}} - T^{\mathrm{RMQ}} \approx \bar{T}^{\mathrm{QCA}} - \bar{T}^{\mathrm{RMQ}} \triangleq \Delta\bar{T}. \tag{61}$$
In (61), $\Delta\bar{T}$ approximates the true rate gap $\Delta T$ between the upper and lower bounds of the spectral efficiency, where the upper bound $T^{\mathrm{QCA}}$ is obtained using QCA and the lower bound $T^{\mathrm{RMQ}}$ is obtained using RMQ. Thus, $\Delta T$ should be sufficiently small if QCA is to be used to model a limited-feedback-based system in MIMO broadcast channels. Obviously, if $B$ approaches infinity, we have (62). Next, the approximated spectral efficiency gap $\Delta\bar{T}$ is analyzed to investigate the validity of QCA. It follows from (33) and (57) that (63) holds, where $\bar{I}^{\mathrm{QCA}}_U$ is defined and evaluated from (32) as (64), in which (a) follows from Lemma 5 with $N_0 = N_r(N_t-N_r)$ as defined in Theorem 1. Similarly, $\bar{I}^{\mathrm{RMQ}}_U$ is defined and evaluated as (65). In (34), because $V$ is an orthonormal basis of the left null space of $G$ in (15), it is independent of both $H$ and $\hat{H}$.

Moreover, because H is a unitary matrix that is isotropically distributed in
where (a) follows from (25). By combining (63)-(66), we obtain (67), in which shorthand notation (e.g., $M_1$) is introduced for simplicity.
VOLUME 8, 2020
From the CDFs of $\lambda^{\mathrm{QCA}}/\delta$ in (108) and $\lambda^{\mathrm{RMQ}}$ in (109), each expectation term comprising $\Delta\bar{T}$ in (67) has the common form
$$\mathbb{E}\left[\log(a + bX)\right] \tag{68}$$
for some constants $a$ and $b$, where $X$ is a random variable with the CDF $F_X(x) = 1 - (1 - x^m)^L$ for some constants $m$ and $L$ (e.g., $a = PN_r + N_t$, $b = PM_1\delta$, $m = N_0$, and $L = 1$ for the first expectation term of $\Delta\bar{T}$ in (67)).
Lemma 6: Let $X$ be a random variable distributed in $[0, 1]$ with the CDF $F_X(x) = 1 - (1 - x^m)^L$, where $m$ and $L$ are positive integers. Then, (69) holds for any constants $a > 1$ and $b > 0$. Furthermore, (70) holds.
Proof: See Appendix E.
Using Lemma 6, $\Delta\bar{T}$ in (67) can be evaluated as (71), where (a) follows by applying Lemma 6 to each term in (67), (b) follows from the variable change $y = \delta x$ in the second term, and $N_1$ and $N_2$ are defined in (72). Alternatively, applying (70) to the first two terms on the right-hand side (RHS) of equality (a) in (71), $\Delta\bar{T}$ can be reformulated in the more explicit form (73).
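Lemma 6 gives closed forms (69) and (70) that are not reproduced here. As an independent numerical check of the quantity $\mathbb{E}[\log(a+bX)]$ in (68), the sketch below uses integration by parts, $\mathbb{E}[g(X)] = g(0) + \int_0^1 g'(x)\,(1-F_X(x))\,dx$ with $g(x) = \log(a+bx)$, evaluated by Simpson's rule. It then cross-checks the value by Monte Carlo sampling with CDF inversion. This is a swapped-in numerical method, not the derivation in Appendix E, and the function names are hypothetical.

```python
import math
import random


def expected_log_affine(a, b, m, L, n=4000):
    """Numerically evaluate E[log(a + b X)] for X with CDF 1 - (1 - x**m)**L on [0, 1].

    Integration by parts gives log(a) + int_0^1 b / (a + b x) * (1 - x**m)**L dx,
    and the integral is evaluated with composite Simpson's rule (n even).
    """
    if n % 2:
        n += 1
    h = 1.0 / n

    def integrand(x):
        return b / (a + b * x) * (1.0 - x ** m) ** L

    total = integrand(0.0) + integrand(1.0)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return math.log(a) + total * h / 3.0


def monte_carlo_log_affine(a, b, m, L, trials=20000, rng=random):
    """Monte Carlo estimate, sampling X by CDF inversion: X = (1 - (1 - U)**(1/L))**(1/m)."""
    acc = 0.0
    for _ in range(trials):
        x = (1.0 - (1.0 - rng.random()) ** (1.0 / L)) ** (1.0 / m)
        acc += math.log(a + b * x)
    return acc / trials


print(expected_log_affine(2.0, 3.0, 3, 16), monte_carlo_log_affine(2.0, 3.0, 3, 16))
```

For very large $L$ (e.g., $L = 2^B$ with large $B$), the factor $(1-x^m)^L$ is concentrated near zero, so a uniform Simpson grid would need refinement near the origin.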

FIGURE 3. Normalized gap of downlink rates achieved by using QCA and RMQ. $\Delta T$ is obtained by simulating (35), and $\Delta\bar{T}$ is obtained by (73) by numerically integrating $\Upsilon_2$.

D. ASYMPTOTIC ANALYSIS
1) WITH RESPECT TO B
Because $N_1 - N_2 > 0$, $\Upsilon_1(B)$ is negative and $\Upsilon_2(B)$ is positive for $B \ge 0$. Moreover, because $(1 - y^{N_0})^{2^B} \le 1$ is a decreasing function of $y$ on $[0, 1]$ whose effective support shrinks toward zero as $B$ increases, $\Upsilon_2(B)$ can be expected to be a positive decreasing function that approaches zero as $B$ increases. In addition, because $N_2 + \delta y \to N_2$ and $N_1 + \delta y \to N_1$ as $B$ increases, $\Upsilon_1(B)$ also approaches zero. Thus, unless $\Upsilon_1(B)$ and $\Upsilon_2(B)$ fluctuate rapidly with respect to $B$, $\Delta\bar{T}$ can be expected to approach zero gradually as $B$ increases, as illustrated in Fig. 2. The spectral efficiency gap between QCA and RMQ decreases as $B$ increases, which implies that QCA and RMQ are asymptotically equivalent with respect to $B$. In Fig. 2, both $\Delta T$ and $\Delta\bar{T}$ are normalized by $T^{\mathrm{RMQ}}$ to emphasize that the rate gap is much smaller than the achievable rate. As expected, the approximation is generally tight for moderate values of $B$ and becomes tighter as both quantities approach zero with increasing $B$.
In mathematical terms, $\lim_{B\to\infty}\Upsilon_1(B) = 0$ is straightforward. Moreover, from (111), we have (74). Thus, (75) follows, where (a) follows from the variable change $z = x^{N_0}$ and $\Gamma(\cdot)$ denotes the gamma function. Therefore, the rate gap converges to zero as in (76), where (a) follows from (62). This result is equivalent to Theorem 2 and implies that QCA and RMQ are asymptotically equivalent with respect to $B$.
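To illustrate the convergence argument numerically, the sketch below compares the means of $\lambda^{\mathrm{QCA}}$ and $\lambda^{\mathrm{RMQ}}$. The QCA mean follows from $\lambda^{\mathrm{QCA}} \overset{d}{=} \delta X$ with $X \sim$ Beta$(N_0, 1)$. The RMQ mean is $\int_0^1 (1-x^{N_0})^{2^B}\,dx$, which the change of variable $z = x^{N_0}$ turns into a ratio of gamma functions. This compares the quantization errors only, not the rate gap $\Delta\bar{T}$ itself, and the function names are hypothetical.

```python
import math


def mean_lambda_qca(B, N0):
    """E[lambda^QCA] = delta * N0 / (N0 + 1), since lambda^QCA = delta * Beta(N0, 1)."""
    return 2.0 ** (-B / N0) * N0 / (N0 + 1.0)


def mean_lambda_rmq(B, N0):
    """E[min of L i.i.d. Beta(N0, 1)] = Gamma(1 + 1/N0) Gamma(L + 1) / Gamma(L + 1 + 1/N0), L = 2**B."""
    L = 2 ** B
    log_mean = math.lgamma(1.0 + 1.0 / N0) + math.lgamma(L + 1.0) - math.lgamma(L + 1.0 + 1.0 / N0)
    return math.exp(log_mean)


for B in (0, 4, 8, 12, 16, 20):
    q, r = mean_lambda_qca(B, 3), mean_lambda_rmq(B, 3)
    # Both means decay like 2**(-B/N0), so their difference also vanishes.
    print(B, f"{q:.5f}", f"{r:.5f}", f"{r - q:.5f}")
```

Working with `lgamma` avoids overflow of $\Gamma(2^B + 1)$ for large $B$.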

2) WITH RESPECT TO P
From (33) and (35), the spectral efficiency can be rewritten as (77). On the RHS of (77), $I_U$ is the only term related to quantization, and $\frac{N_t}{P}I_{N_r}$ is the only term related to the transmit power. Thus, the inverse matrix $\left(I_U + \frac{N_t}{P}I_{N_r}\right)^{-1}$ is the only term that includes the effect of the quantization error or the transmit power. Note that (78) and (79) hold. Thus, in terms of the achievable rate, the effect of the quantization error is negligible when $P = 0$, increases gradually as $P$ increases, and is maximized as $P \to \infty$. Furthermore, the rate gap $\Delta T = T^{\mathrm{QCA}} - T^{\mathrm{RMQ}}$ can be expected to increase as the effect of the quantization error in $T$ increases. Accordingly, (80) can be expected, and Fig. 3 clearly demonstrates this observation. The normalized gap is generally small and increases with $P$. It is larger for $N_t = 4$ because the number of feedback bits is less sufficient in that case than for $N_t = 6$. The error reaches nearly 7% in the worst case shown in this figure, which is still acceptable and can be reduced by using more feedback bits if required.
From (67), $\Delta\bar{T}$ can be rewritten as (81) based on the properties of the logarithm. If $P \to \infty$, it follows that (82). Because $\lambda^{\mathrm{QCA}} \overset{d}{=} \delta X$ for a beta random variable $X$ with parameters $N_0$ and 1, and $\lambda^{\mathrm{RMQ}}$ is the minimum of $2^B$ i.i.d. beta random variables with parameters $N_0$ and 1, $\mathbb{E}[\log\lambda^{\mathrm{RMQ}}]$ and $\mathbb{E}[\log\lambda^{\mathrm{QCA}}]$ can be obtained as (83) by applying Lemma 3 in [9]. The first and second terms on the RHS of (82) can be calculated using Lemma 6. Consequently, we obtain (84) for $\lim_{P\to\infty}\Delta\bar{T}\cdot\frac{\log 2}{N_r}$, where (a) follows by applying (70) to the first integral and the variable change $y = \delta x$ to the second integral, and (b) follows because $(1 - (\delta x)^{N_0})^{2^B}$ is a decreasing function of $B$ that is lower-bounded by its limiting value. From (80) and (84), it can be concluded that $\Delta\bar{T}$ increases with $P$ but is bounded by the finite value $T_2$ (Fig. 4).
In Fig. 4, the spectral efficiency gaps between QCA and RMQ are depicted without normalization to verify the analytical upper bound $T_2$. The analytical approximation $\Delta\bar{T}$ closely approximates $\Delta T$, and the analytical upper bound $T_2$ is an asymptote with respect to $P$, as expected. The gap obtained with $N_t = 4$ and $N_r = 1$ is considerably larger than that with $N_t = 8$ and $N_r = 2$. However, because the achievable spectral efficiency is also larger with $N_t = 4$ and $N_r = 1$, as shown in Fig. 5, the normalized gaps are similar; this can be verified by normalizing the rate gap in Fig. 4 by the spectral efficiency in Fig. 5. In Fig. 5, the spectral efficiency with $N_t = 4$ and $N_r = 1$ is much larger than that with $N_t = 8$ and $N_r = 2$ for large $P$ because $B = 30$ is insufficient for effective SDM with eight transmit antennas. With a sufficient number of feedback bits, the configuration with $N_t = 8$ and $N_r = 2$ is expected to outperform that with $N_t = 4$ and $N_r = 1$, as partially shown in Fig. 5 for small $P$. In other words, because the amount of quantization error increases with the SNR and the number of antennas, it is better to use fewer antennas when the SNR is high and more antennas when the SNR is low.
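The high-SNR analysis above relies on $\mathbb{E}[\log\lambda^{\mathrm{RMQ}}]$ and $\mathbb{E}[\log\lambda^{\mathrm{QCA}}]$, which (83) obtains from Lemma 3 in [9]. The sketch below cross-checks these quantities using two elementary facts instead of that lemma. First, $-\ln X$ is exponential with rate $N_0$ for $X \sim$ Beta$(N_0, 1)$. Second, $(\lambda^{\mathrm{RMQ}})^{N_0}$ is the minimum of $2^B$ uniforms, i.e., Beta$(1, 2^B)$, whose mean logarithm is $-H_{2^B}$, where $H_L$ is the $L$-th harmonic number. These expressions use natural logarithms and may be arranged differently from (83). Their difference approaches $(1-\gamma)/N_0$ as $B$ grows, where $\gamma$ is Euler's constant. The function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def mean_log_lambda_qca(B, N0):
    """E[ln lambda^QCA] = ln(delta) + E[ln X], with delta = 2**(-B/N0) and E[ln X] = -1/N0."""
    return -(B * math.log(2.0) + 1.0) / N0


def mean_log_lambda_rmq(B, N0):
    """E[ln lambda^RMQ] = -H_L / N0 with L = 2**B (H_L is the L-th harmonic number).

    lambda^RMQ**N0 is the minimum of L i.i.d. uniforms, i.e. Beta(1, L),
    whose mean log equals psi(1) - psi(L + 1) = -H_L.
    """
    L = 2 ** B
    harmonic = math.fsum(1.0 / k for k in range(1, L + 1))
    return -harmonic / N0


def mc_mean_log_lambda_rmq(B, N0, trials=20000, rng=random):
    """Brute-force check: average ln of the minimum of 2**B draws (1 - U)**(1/N0)."""
    acc = 0.0
    for _ in range(trials):
        acc += math.log(min((1.0 - rng.random()) ** (1.0 / N0) for _ in range(2 ** B)))
    return acc / trials


for B in (0, 4, 10, 16):
    print(B, mean_log_lambda_rmq(B, 3) - mean_log_lambda_qca(B, 3), (1.0 - EULER_GAMMA) / 3)
```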

From (67), $\Delta\bar{T}$ can also be rewritten as (85).
If $N_t$ approaches infinity for fixed $P$ and $B$, the quantization error keeps increasing, and therefore the advantage of quantized feedback becomes negligible for both QCA and RMQ. Specifically, (86) follows from (85). Although (86) implies that $\Delta\bar{T} \to 0$ as $N_t \to \infty$, this result arises mainly because the achievable rates of both QCA and RMQ approach zero as $N_t$ increases for fixed $P$ and $B$. Nevertheless, the effect of the difference between the quantization schemes (QCA vs. RMQ) disappears as $N_t$ increases, as demonstrated in Fig. 6. In short, if $N_t$ increases because an increasingly large number of antennas is deployed at the BS, as in massive MIMO systems, the achievable rate of limited-feedback-based SDM gradually approaches zero for fixed $B$. Thus, a more sophisticated algorithm is needed to suppress the quantization error efficiently with a sufficient number of feedback bits.

V. SIMULATION GUIDELINE
When QCA is used, a realization of the quantized CSI is obtained from the corresponding distributions described in Section IV-A without constructing an explicit codebook. Because QCA is an analytically tractable model that does not rely on explicit codebook construction, it is advantageous for both analysis and simulation. In this section, a guideline is presented for generating each realization of the quantized CSI $\hat{H}$ for a given realization of the true CSI $H$ during simulation. By exchanging the roles of $H$ and $\hat{H}$ in (14), $\hat{H}$ can be decomposed as (87), where the columns of $\breve{S} \in \mathbb{C}^{N_t\times N_r}$ span an $N_r$-dimensional subspace isotropically distributed in the $(N_t-N_r)$-dimensional left null space of $H$, the diagonal quantization error matrix in $\mathbb{C}^{N_r\times N_r}$ consists of the eigenvalues of $H^H(I_{N_t} - \hat{H}\hat{H}^H)H$, and $\breve{E} \in \mathbb{C}^{N_r\times N_r}$ is an isotropically distributed unitary matrix. Moreover, $\breve{S}$, the quantization error matrix, and $\breve{E}$ are mutually independent. Note that the quantization error matrix is invariant when the roles of $H$ and $\hat{H}$ are exchanged.
Remark 1: An $m \times n$ ($m \ge n$) isotropically (or uniformly) distributed semi-unitary matrix can be obtained as the $n$ orthonormal eigenvectors of $AA^H$ corresponding to its $n$ nonzero eigenvalues, where $A$ is an $m \times n$ complex matrix whose entries are i.i.d. complex Gaussian random variables with mean 0 and variance 1.
The matrix $\breve{E}$ can be constructed according to Remark 1. The matrix $\breve{S}$ can be obtained by multiplying an orthonormal basis of the left null space of $H$ by an independent, isotropically distributed semi-unitary matrix in $\mathbb{C}^{(N_t-N_r)\times N_r}$. Let $Y$ be given by (88), where $Q$ can also be obtained based on Remark 1 because it is an isotropically distributed unitary matrix obtained from the QR decomposition. The procedure thus far does not depend on the quantization process. We next discuss how to construct the quantization error matrix, which depends on the distance measure and the quantization codebook.
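Remark 1 and the construction of $\breve{S}$ can be sketched in plain Python as follows. Instead of taking eigenvectors of $AA^H$, the sketch orthonormalizes i.i.d. $\mathcal{CN}(0,1)$ columns with modified Gram-Schmidt. This is equivalent to a QR decomposition with a positive-diagonal $R$ and also yields isotropically distributed orthonormal columns. For $\breve{S}$, the sketch projects Gaussian columns onto the orthogonal complement of the columns of $H$ (its left null space) and orthonormalizes them, as an alternative to multiplying a null-space basis by an independent semi-unitary matrix. Matrices are stored as lists of columns, and all names are hypothetical.

```python
import math
import random


def complex_gaussian(rng):
    """One CN(0, 1) sample: real and imaginary parts are independent N(0, 1/2)."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def gram_schmidt(columns):
    """Orthonormalize complex column vectors in order (modified Gram-Schmidt)."""
    basis = []
    for v in columns:
        w = list(v)
        for q in basis:
            proj = sum(qi.conjugate() * wi for qi, wi in zip(q, w))  # q^H w
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(abs(wi) ** 2 for wi in w))
        basis.append([wi / norm for wi in w])
    return basis


def isotropic_semi_unitary(m, n, rng=random):
    """m x n isotropically distributed semi-unitary matrix, returned as a list of n columns."""
    return gram_schmidt([[complex_gaussian(rng) for _ in range(m)] for _ in range(n)])


def isotropic_in_left_null_space(H_cols, n, rng=random):
    """n orthonormal columns isotropically distributed in the orthogonal complement of span(H_cols)."""
    m = len(H_cols[0])
    gaussian = [[complex_gaussian(rng) for _ in range(m)] for _ in range(n)]
    # Orthonormalizing H's columns first removes their span from the Gaussian columns.
    return gram_schmidt(list(H_cols) + gaussian)[len(H_cols):]


rng = random.Random(0)
H = isotropic_semi_unitary(6, 2, rng)          # stand-in for an N_t x N_r channel basis
S_breve = isotropic_in_left_null_space(H, 2, rng)
```

The second helper requires $H$ to have full column rank and $2N_r \le N_t$.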

4) QCA
From (47), if QCA is used, the quantization error matrix is equal in distribution to $\delta$ times the quantization error matrix of an arbitrarily chosen $j \in \mathcal{J}$, and it is independent of $\breve{S}$ and $\breve{E}$. Furthermore, because the quantization error matrix for arbitrary $j$ consists of the eigenvalues of a complex beta distributed matrix with parameters $N_t-N_r$ and $N_r$ (24), it can be obtained from the eigenvalues of any independent matrix $A \sim \mathcal{CB}_{N_r}(N_t-N_r, N_r)$, scaled by $\delta$. The matrix $Y$ is subsequently obtained from (88) using this quantization error matrix. Then, by substituting both $Y$ and the quantization error matrix into (88), we finally obtain $\hat{H}$ for a given $H$.
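As a minimal sketch for $N_r = 2$, the code below draws the eigenvalues of a complex beta matrix $\mathcal{CB}_2(N_t-2, 2)$ and scales them by $\delta = 2^{-B/N_0}$ as in (47). The eigenvalues are computed as those of $A(A+B_w)^{-1}$, with independent complex Wishart matrices $A \sim \mathcal{CW}_2(N_t-2)$ and $B_w \sim \mathcal{CW}_2(2)$. This is a standard construction that we assume matches the parameterization in (24). Restricting the sketch to $2 \times 2$ keeps the eigenvalue step in closed form. The function names are hypothetical, and the code requires $N_t \ge 4$.

```python
import math
import random


def cgauss(rng):
    """One CN(0, 1) sample."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def complex_wishart_2x2(dof, rng):
    """CW_2(dof, I): sum of dof outer products g g^H with g ~ CN(0, I_2)."""
    W = [[0j, 0j], [0j, 0j]]
    for _ in range(dof):
        g = (cgauss(rng), cgauss(rng))
        for r in range(2):
            for c in range(2):
                W[r][c] += g[r] * g[c].conjugate()
    return W


def complex_beta_eigs_2x2(a, b, rng):
    """Eigenvalues (descending) of CB_2(a, b), computed as eig(A (A + Bw)^{-1})."""
    A, Bw = complex_wishart_2x2(a, rng), complex_wishart_2x2(b, rng)
    S = [[A[r][c] + Bw[r][c] for c in range(2)] for r in range(2)]
    det_s = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    S_inv = [[S[1][1] / det_s, -S[0][1] / det_s], [-S[1][0] / det_s, S[0][0] / det_s]]
    M = [[sum(A[r][k] * S_inv[k][c] for k in range(2)) for c in range(2)] for r in range(2)]
    # M is similar to a Hermitian matrix, so its trace and determinant are real up to rounding.
    tr = (M[0][0] + M[1][1]).real
    det = (M[0][0] * M[1][1] - M[0][1] * M[1][0]).real
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return (tr + disc) / 2.0, (tr - disc) / 2.0


def qca_error_eigs_nr2(B, Nt, rng=random):
    """QCA error eigenvalues for N_r = 2: delta * eig(CB_2(Nt - 2, 2)), delta = 2**(-B/N0)."""
    N0 = 2 * (Nt - 2)
    delta = 2.0 ** (-B / N0)
    eta1, eta2 = complex_beta_eigs_2x2(Nt - 2, 2, rng)
    return delta * eta1, delta * eta2


print(qca_error_eigs_nr2(B=8, Nt=4, rng=random.Random(0)))
```

As a sanity check consistent with (40), the largest eigenvalue of $\mathcal{CB}_2(2, 2)$ should be Beta$(4, 1)$ distributed, so its sample mean should be close to $4/5$.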

5) EMULATING A SIMULATION OF RMQ WITH $d = d_1$
As comparison baselines, RMQ with $d_1$ and RMQ with $d_c$ are considered in this study (Fig. 1). With RMQ, the quantized CSI is generated by constructing a random codebook $\mathcal{C}$ of size $2^B$ and then choosing the minimizing index $\hat{n}$ based on (7). This procedure is cumbersome compared with QCA, and its computational complexity grows with $B$ because the codebook contains $2^B$ codewords, whereas the computational complexity of QCA is independent of $B$. Moreover, constructing a codebook for a large $B$ may be infeasible, depending on the device used for simulation. Because simulations with large $B$ are included in this study to verify the asymptotic performance, a method that emulates the generation of the quantized CSI with RMQ is also considered for efficiency and feasibility. As discussed above, it suffices to emulate the distribution of the quantization error matrix in (88) to obtain a realization of the quantized CSI based on RMQ during simulation.
If $d = d_1$, the normalized quantization error matrix (the quantization error matrix divided by $\lambda_{\hat{n}}$) is independent of both the quantization process and the quantization error $\lambda_{\hat{n}}$ (Lemma 5). Thus, it can be obtained from any independent complex beta distributed matrix with parameters $N_t-N_r$ and $N_r$. The largest eigenvalue $\lambda_{\hat{n}}$ can then be obtained using a CDF inversion method; that is, $\lambda_{\hat{n}} = F^{-1}_{\lambda^{\mathrm{RMQ}}}(U)$ with a realization $U$ of an independent random variable uniformly distributed in $(0, 1)$, where $F^{-1}_{\lambda^{\mathrm{RMQ}}}$ is the inverse function of $F_{\lambda^{\mathrm{RMQ}}}$ in (109).
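The CDF inversion above can be sketched as follows, assuming $F_{\lambda^{\mathrm{RMQ}}}(x) = 1-(1-x^{N_0})^{2^B}$, i.e., the CDF of the minimum of $2^B$ i.i.d. Beta$(N_0, 1)$ variables. Solving $F(x) = u$ gives $x = \left(1-(1-u)^{2^{-B}}\right)^{1/N_0}$. The sketch evaluates $1-(1-u)^{2^{-B}}$ with `expm1` and `log1p` so that it stays accurate for large $B$, where an explicit codebook is impractical. An explicit-minimum sampler is included only as a small-$B$ reference, and all names are hypothetical.

```python
import math
import random


def rmq_lambda_from_uniform(u, B, N0):
    """Invert F(x) = 1 - (1 - x**N0)**(2**B) at u in [0, 1)."""
    L = 2.0 ** B
    # 1 - (1 - u)**(1/L), computed via expm1/log1p to stay accurate when 2**B is large.
    y = -math.expm1(math.log1p(-u) / L)
    return y ** (1.0 / N0)


def emulate_rmq_lambda(B, N0, rng=random):
    """One emulated RMQ quantization error (d = d1) without building a codebook."""
    return rmq_lambda_from_uniform(rng.random(), B, N0)


def explicit_rmq_lambda(B, N0, rng=random):
    """Reference: explicit minimum over 2**B codewords, each with a Beta(N0, 1) error."""
    return min(rng.random() ** (1.0 / N0) for _ in range(2 ** B))


print(emulate_rmq_lambda(40, 6, random.Random(0)))  # B = 40 would need 2**40 codewords explicitly
```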

6) EMULATING A SIMULATION OF RMQ WITH $d = d_c$
If $d = d_c$, all eigenvalues of the quantization error matrix depend on the quantization process, which selects the codeword that minimizes the chordal distance between $W_j$ and $H$. In this case, the minimum chordal distance is realized first, and the eigenvalues are then obtained from their joint distribution conditioned on that realization. Although the joint PDF of the eigenvalues of a complex beta matrix is known, the marginal distribution of each eigenvalue generally has no explicit closed form. Thus, a CDF inversion method must be prepared for each value of $N_t$, $N_r$, and $B$, as described in [18], and numerical integration may be required for large $N_r$ because the marginal CDF is difficult to calculate explicitly. The emulation of this case is described in [18]; as a complement, it is revisited here with additional details. First, (89) is defined, where the normalizing constant $c_{n,p,q,\beta}$ is identical to that defined in [40]. By combining Corollary 1 and Lemma 3 in [40], the CDF of the minimum of $2^B$ i.i.d. chordal distances is obtained as (90). If $h(B) > 1$, the CDF is not explicitly known for $x > 1$. Thus, complete emulation is possible only when $B$ is large enough to satisfy $h(B) \le 1$. If $B$ is too small to guarantee $h(B) \le 1$, conventional quantization with an explicit random codebook should be performed instead; because the codebook is small in these cases, the computational complexity remains modest. Next, the marginal distributions of the eigenvalues of a complex beta matrix conditioned on the chordal distance are required. They have no general form applicable to all $N_t$ and $N_r$ and must therefore be calculated for each case. For example, if $N_r = 2$, the joint PDF of the eigenvalues of a complex beta matrix is given by (91) (Definition 1.1 of [41]), where the eigenvalues are denoted as $\eta_1$ and $\eta_2$, and $C$ is the normalizing constant.
The marginal PDF of $\eta_1$ conditioned on the chordal distance $\eta = \eta_1 + \eta_2$ is calculated as (92). The PDF of the chordal distance $f_\eta(y)$ can be obtained, at least for $0 \le y \le 1$, by differentiating (90) after substituting $B = 0$. If $B$ is large enough to satisfy $h(B) \le 1$, a realization of the minimum chordal distance is less than 1 with probability 1, so it suffices to know $f_\eta(y)$ for $y \le 1$. The marginal CDF $F_{\eta_1|\eta}(x|y)$ is obtained by integrating $f_{\eta_1|\eta}(x|y)$ with respect to $x$. The quantization error matrix can now be generated for this specific case. Based on (90) and CDF inversion, a realization $D$ of the minimum of $2^B$ i.i.d. chordal distances is first obtained. Given $D$, a realization of $\eta_1$ is then obtained using $F_{\eta_1|\eta}(x|D)$ and CDF inversion. Lastly, $\eta_2 = D - \eta_1$. Because the inverse functions of these CDFs may not have explicit forms, numerical quantization may be required for the CDF inversion.
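Because (91) and (92) are not reproduced here, the following is only a sketch of the numerical conditional CDF inversion for $N_r = 2$, under an assumed form of the joint eigenvalue density. The assumption is the complex Jacobi-ensemble density of $\mathcal{CB}_2(N_t-2, 2)$ for unordered eigenvalues, proportional to $(\eta_1\eta_2)^{N_t-4}(\eta_1-\eta_2)^2$ on $[0,1]^2$. Conditioning on $\eta_1 + \eta_2 = D$ restricts the density to a line segment. The conditional CDF is then tabulated with the trapezoidal rule, and inversion uses bisection plus linear interpolation, which is one form of the numerical quantization mentioned above. The realization $D$ of the minimum chordal distance is taken as an input, and all names are hypothetical.

```python
import bisect


def joint_eig_density_nr2(e1, e2, Nt):
    """Assumed unnormalized joint density of the unordered eigenvalues of CB_2(Nt - 2, 2), Nt >= 4."""
    if not (0.0 <= e1 <= 1.0 and 0.0 <= e2 <= 1.0):
        return 0.0
    return (e1 * e2) ** (Nt - 4) * (e1 - e2) ** 2


def conditional_cdf_table(D, Nt, n=4000):
    """Tabulate F(eta1 | eta1 + eta2 = D) on a uniform grid via the trapezoidal rule."""
    lo, hi = max(0.0, D - 1.0), min(1.0, D)
    xs = [lo + (hi - lo) * k / n for k in range(n + 1)]
    fs = [joint_eig_density_nr2(x, D - x, Nt) for x in xs]
    cdf = [0.0]
    for k in range(n):
        cdf.append(cdf[-1] + 0.5 * (fs[k] + fs[k + 1]) * (xs[k + 1] - xs[k]))
    return xs, [c / cdf[-1] for c in cdf]


def invert_table(xs, cdf, u):
    """Numerical CDF inversion: locate u by bisection, then interpolate linearly."""
    k = bisect.bisect_left(cdf, u)
    if k == 0:
        return xs[0]
    if k >= len(cdf):
        return xs[-1]
    span = cdf[k] - cdf[k - 1]
    t = 0.0 if span == 0.0 else (u - cdf[k - 1]) / span
    return xs[k - 1] + t * (xs[k] - xs[k - 1])


def eigs_given_chordal_distance(D, Nt, u):
    """Split a realized minimum chordal distance D into (eta1, eta2) using a uniform draw u."""
    xs, cdf = conditional_cdf_table(D, Nt)
    eta1 = invert_table(xs, cdf, u)
    return eta1, D - eta1


print(eigs_given_chordal_distance(0.8, 6, 0.3))
```

Since the assumed density treats the eigenvalues as unordered, a sorted pair can be obtained afterward if the convention in (91) orders them.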
As demonstrated in this section, emulating RMQ with the chordal distance is considerably more cumbersome than using the QCA proposed in this study, and it is not always feasible. By contrast, QCA is easy to simulate, is applicable to all system parameters including $N_t$, $N_r$, $B$, and $P$, and requires significantly less computation time than constructing an explicit codebook.

VI. CONCLUSION
The main objective of this study was to investigate the validity of an analytical quantization model called QCA. To achieve this, the gap between the upper and lower bounds for the spectral efficiency was investigated, with the upper and lower bounds being obtained using QCA and RMQ, respectively. The analytical results demonstrated that the gap is generally small regardless of system parameters N t , N r , B, and P. It was further shown that the gap can be arbitrarily small for a sufficiently large B. Simulation results were obtained to demonstrate the accuracy of the analytical results. Based on the analytical and simulation results, QCA can be regarded as a quantization model that closely approximates the performance achieved with a well-designed codebook. Moreover, the analysis framework derived in this study can be applied to various wireless networks to further validate the performance of QCA, if required.
If QCA is used to model the quantization process, the primary advantage is that the stochastic analysis is considerably simpler than with an explicit codebook. With an explicit codebook, the quantized channel is given by the minimum order statistic of $2^B$ random variables, which may also be correlated depending on the codebook design, so the distribution of the quantized CSI is very difficult to analyze. By contrast, with QCA, the quantized CSI is given by a simple random beta matrix multiplied by a deterministic value, which is a decreasing function of $B$, as described in (47). Most importantly, the effect of quantization is concentrated solely in this deterministic value, so the corresponding stochastic analysis is independent of $B$. For example, the instantaneous achievable rate is given in (33) as (93), where only the first argument $\lambda_{\hat{n}}$ is shown in the functions $R$ and $I_U$ for simplicity. From (46), $\lambda^{\mathrm{QCA}}_{\hat{n}} \overset{d}{=} \delta X$ for any independent beta random variable $X$ with parameters $N_0$ and 1. Thus, assuming QCA, (93) can be rewritten as (94).
Each random matrix in (94) has the form of a simple Wishart or beta matrix. Although obtaining an explicit formula for the ergodic rate $\mathbb{E}[R]$ remains difficult, the problem is much simpler than with an explicit codebook; at the very least, QCA provides explicit distributions for all random components. Based on a simple assumption or approximation, further analysis can be performed from (94) with QCA for various performance metrics, as in previous studies such as [13], [19], [20], [23], which are introduced in Section I. As these studies paid little attention to the validity of using QCA, the analytical results in this study support the results therein. Furthermore, because QCA reduces the complexity of the analysis, the results in this study can further encourage the use of QCA in future studies of limited-feedback-based schemes in more complicated MIMO systems, including massive MIMO systems with FDD in dense cellular networks.

APPENDIXES
APPENDIX A PROOF OF LEMMA 3
Let X ∼ CN m,n (0, I m ⊗ I n ), and the compact SVD of X H be A 1 D Because an idempotent matrix is always diagonalizable and its eigenvalues are either 0 or 1 [43], the rank of I n − P is n − r.
Because A 1 is an orthonormal basis for an m dimensional plane isotropically distributed in C n×m , the proof is completed.

APPENDIX B PROOF OF LEMMA 5
From (10), $S_j$ is an orthonormal basis for an $N_r$-dimensional subspace isotropically distributed in the $(N_t-N_r)$-dimensional left null space of $W_j$. Moreover, $S_j$ is independent of $V_k$. Thus, $S_j \overset{d}{=} XY$ if $X \in \mathbb{C}^{N_t\times(N_t-N_r)}$ is an orthonormal basis of the left null space of $W_j$ and $Y \in \mathbb{C}^{(N_t-N_r)\times N_r}$ is an isotropically distributed semi-unitary matrix independent of $X$.
Then, by applying Lemma 3 twice, we have Then, from (20), E[S H j V k V H k S j ] = N r N t −N r I N r , for k = 2, · · · , K . Thus, From (26), E[E j j E H j ] = N t −N r N t I N r . Moreover, from (40), By combining (98) and (99), we obtain where the last equality follows from (25).

APPENDIX C PROOF OF THEOREM 1
If the transmitter has perfect CSI, BD (as described in Section II-B) completely eliminates the multiuser interference such that $I_U = 0$. Thus, $T^{\mathrm{PCSI}}$ follows from (33) and (35), where $\breve{V}$ is the precoding matrix obtained with $\hat{H} = H$ under the perfect CSIT assumption, and $\breve{T} \triangleq H^H\breve{V}\breve{V}^H H$. Because $\breve{V}$ is still independent of both $H$ and $\hat{H}$, $\breve{T}$ has the same distribution as $T$ [18]. Applying Lemma 5 to (33) and (35) yields $T^{\mathrm{RMQ}}$, where $\lambda^{\mathrm{RMQ}}_{\hat{n}}$ denotes the largest eigenvalue of the quantization error matrix, with the index $\hat{n}$ chosen based on RMQ given that $d = d_1$ in (7). With RMQ, $\lambda^{\mathrm{RMQ}}_{\hat{n}}$ corresponds to the minimum of $2^B$ i.i.d. beta random variables $\lambda_1, \ldots, \lambda_{2^B}$ with parameters $N_r(N_t-N_r)$ and 1. Applying Lemma 1 in […] gives a bound in which (a) follows because $\breve{T}$ and $T$ are identically distributed, and (b) follows from Jensen's inequality. With (32), a further bound follows in which (a) follows from Lemma 5 and (b) follows from $\mathbb{E}[\lambda$ […]. Because this theorem is intended to specify a sufficient condition on $B$ that guarantees $T^{\mathrm{PCSI}} - T^{\mathrm{RMQ}} \le N_r\log_2 r_1$, it suffices to derive a sufficient condition on $B$ that satisfies
$$N_r\log_2\left(\frac{N_r(N_t-N_r)+1}{N_rN_t}P\delta + 1\right) = N_r\log_2 r_1. \tag{106}$$
By solving (106) with respect to $B$ using $\delta = 2^{-B/N_0}$, we obtain
$$B = N_0\log_2 P + N_0\log_2\frac{N_0+1}{N_rN_t} - N_0\log_2(r_1-1),$$
with the notation $N_0 = N_r(N_t-N_r)$.

APPENDIX E PROOF OF LEMMA 6
Because a > 1, b > 0, and 0 < X < 1, log(a + bX ) is a positive random variable.
where (a) follows from the variable change $z = \frac{a}{b} + y$, (b) follows from the variable change $x = \frac{b}{a}z$, and (c) follows from the binomial theorem.