Random SBT Precoding for Angle Estimation of mmWave Massive MIMO Systems Using Sparse Array Spacing

In this article, we consider the estimation of the angle of arrival (AoA) and angle of departure (AoD) for mmWave MIMO channels. mmWave systems typically have a large number of antennas but a limited number of radio frequency (RF) chains, and the RF chain limitation indirectly restricts the effective number of antennas, i.e., the effective array size. In radar applications, it is known that a larger array size leads to more accurate AoA estimation or more resolvable paths. It is also known that sparse arrays, which place antennas in a nonuniform sparse manner, allow an enlarged virtual array to be constructed when the sources are uncorrelated. We show that the idea of sparse arrays can also be used in mmWave systems to augment the array size. The main difficulty in applying sparse arrays to mmWave MIMO channels lies in the fact that the signals of different paths can be correlated in mmWave systems, whereas uncorrelatedness is necessary for applying sparse array results. We show that the paths can be decorrelated by employing random precoders and combiners that are submatrices of banded Toeplitz (SBT) matrices. When the precoder and combiner are SBT matrices designed according to the antenna spacing of a sparse array, we can construct enlarged virtual transmit and receive arrays. With enlarged virtual arrays, accurate estimates can be obtained, as will be demonstrated by simulation examples.


I. INTRODUCTION
Recent advances show that it is feasible to pack a large number of antennas in a small area, particularly in millimeter wave (mmWave) communication systems that use small wavelengths. However, cost and power constraints often prohibit having one dedicated radio frequency (RF) chain for each antenna [1]-[3]. Hybrid precoding that employs RF analog processing and baseband digital processing has been proposed to overcome the RF limitation [4]. The RF limitation also places a restriction on the effective number of antennas, i.e., the effective array size, for the estimation of the angle of arrival (AoA) and angle of departure (AoD). In radar applications, it is known that a larger array size can improve the estimation accuracy [5] or increase the number of resolvable paths [6], [7]. (The associate editor coordinating the review of this manuscript and approving it for publication was Bilal Khawaja.) Array augmentation
can be achieved through the use of sparse arrays when the sources are uncorrelated [8], [9]. By placing N antennas in a nonuniform sparse manner, a virtual uniform linear array (ULA) with a size in the order of N^2 can be constructed [6]. The ULA structure of the virtual array lends itself to subspace-based methods for angle estimation such as MUSIC [10] or ESPRIT [11]. The use of sparse arrays usually needs a large number of training vectors, as it requires the sources to be uncorrelated and the sample covariance matrix of the sources to be close to a diagonal matrix. Having insufficient training vectors leads to partial correlation in the sample covariance matrix, which in turn causes interference in subsequent angle estimation. The problem addressed in sparse linear arrays bears some similarities to the problem of direction finding in mmWave systems, but there are important differences. In the former case the number of antennas is limited, whereas in the latter case there is usually a large number of antennas but a limited number of RF chains. For mmWave channels, signals of different paths can be correlated, so the condition of uncorrelated signals that is assumed in the sparse linear array literature is not satisfied. Furthermore, in addition to the AoA, the AoD needs to be estimated as well for mmWave systems.
Many methods have been proposed to estimate AoA/AoD subject to the hybrid structure [12]-[24]. In these works, the channels are assumed to be either time-invariant or time-varying. In the case of a time-invariant channel, the path gains and angles remain the same throughout training, i.e., training is completed within the channel coherence time [12]-[19]. The precoder and combiner are designed in [12] so that an augmented matrix consisting of the received vectors becomes a submatrix of the channel, and 2D unitary ESPRIT is applied for subsequent channel estimation. The channel reciprocity in time division duplexing systems is exploited to estimate the signal subspace in [13]. A two-step AoA estimation is given in [14] by first applying a 2D-DFT to obtain a coarse estimate, which is then improved through angle rotation on a finer grid. Hierarchical multi-resolution estimation using successive beams of different beamwidths is considered in [15]. It is shown in [16] that auxiliary beam pairs can be used to provide high resolution channel estimates. With the feedback of initial AoD estimates, beamforming can be applied to improve the SNR for high resolution estimation [17]. Compressed sensing based methods have been shown to be a useful tool for angle estimation [18]-[20]. Accurate channel estimation is achieved using orthogonal matching pursuit (OMP) in [19]. Based on OMP, generalized block OMP is developed in [20] to take advantage of the block sparsity property of mmWave channels for better channel estimation. In the case of a time-varying channel [21]-[24], the angles remain the same while the path gains vary due to fading caused by user movement. Such a channel model is valid if the training period is longer than the channel coherence time. An ESPRIT based estimation of AoA, AoD, and path delays is given in [21] for MIMO-OFDM systems. A channel estimation method using subspace fitting of the signal subspace is proposed in [22].
Simultaneous estimation of AoA and AoD using a 2D beamspace MUSIC method is proposed in [23] by exploiting uncorrelated path gains. When the path gains are uncorrelated, the received covariance matrix has a Toeplitz structure, which has been shown to be useful for denoising and improving the estimation accuracy [24]. For estimation methods developed for a time-invariant channel, the performance can be degraded when there is fading. On the other hand, for methods designed for a time-varying channel, the performance can be considerably affected when there is little user movement and the channel is almost time-invariant.
In this article, we show that sparse arrays can be incorporated in the estimation of AoA and AoD for mmWave channels. The precoder is a submatrix of a banded Toeplitz matrix (SBT). In this case, the coefficients in each column vector of the precoder are obtained by shifting the coefficients of a prototype beamformer. We show that when the shifts are designed according to the antenna spacing of a sparse array, array augmentation can be achieved. Suppose linear arrays are used, and there are N̄_r RF chains at the receiver and N̄_t RF chains at the transmitter. We can construct a virtual uniform linear array (ULA) of size in the order of N̄_t^2 at the transmitter, and a virtual ULA of size in the order of N̄_r^2 at the receiver. By designing the prototype beamformer to have random phase, we can decorrelate the signals of different paths and apply sparse array results whether the channel is time-invariant or time-varying. This means angle estimation can be decoupled from the channel coherence time. Like earlier sparse array results, the decorrelation of the paths needs a large number of training vectors, as correlation among the paths leads to interference in angle estimation. When a large number of training vectors is not available, we develop an interference cancellation scheme based on initial estimates of the angles. With the aid of interference cancellation, more accurate angle estimates can be obtained, as will be demonstrated in simulations.
Notation: The variance of a random variable x is denoted as σ_x^2 and the expectation of x by E[x]. The 2-norm of a vector f is denoted as ||f|| and the k-th entry of f as [f]_k. The notations A^T, A^*, and A^H denote, respectively, the transpose, the conjugate, and the conjugate transpose of a matrix A. I_n denotes an n × n identity matrix. Given an m × n matrix X, the notation vec(X) denotes the mn × 1 vectorized version of X. The notation A ⊙ B denotes the Khatri-Rao product (column-wise Kronecker product) and A ⊗ B the Kronecker product of two matrices A and B. A useful property concerning the two products is as follows. Let X = ABC; then vec(X) = (C^T ⊗ A)vec(B). When B is a diagonal matrix, we have vec(X) = (C^T ⊙ A)b, where b is a column vector that consists of the diagonal elements of B.
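These two vectorization properties are used repeatedly below, so it is worth sanity-checking them numerically. The sketch below uses plain NumPy with a small Khatri-Rao helper; the matrix sizes are arbitrary, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def vec(X):
    # Column-stacking vectorization.
    return X.reshape(-1, order="F")

def khatri_rao(A, B):
    # Column-wise Kronecker product A ⊙ B.
    return np.stack([np.kron(A[:, i], B[:, i]) for i in range(A.shape[1])], axis=1)

A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
C = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))

# vec(ABC) = (C^T ⊗ A) vec(B) for a general B.
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

# vec(ABC) = (C^T ⊙ A) b when B = diag(b).
b = rng.standard_normal(4) + 1j * rng.standard_normal(4)
assert np.allclose(vec(A @ np.diag(b) @ C), khatri_rao(C.T, A) @ b)
print("vec identities hold")
```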
Outline of the Paper: A system model is given in Sec. II. Sec. III introduces SBT precoding. In Sec. IV, a review of sparse arrays is given and the proposed array augmentation using sparse array spacing for mmWave MIMO systems is presented. Simulation results are given in Sec. V and a conclusion in Sec. VI.

II. SYSTEM MODEL
Consider the wireless system with N_t transmit antennas and N_r receive antennas in Fig. 1. The transmitter is equipped with N̄_t RF chains and the receiver with N̄_r RF chains. For the kth training block, the channel is modeled by an N_r × N_t matrix H_k. We adopt the geometric channel representation that is useful for modeling mmWave propagation. Suppose the channel has L paths; then [25]

H_k = A_r D_{α,k} A_t^H,    (1)

where D_{α,k} is an L × L diagonal matrix with the i-th diagonal element [D_{α,k}]_{i,i} = α_{k,i}, which is the complex gain of the i-th path. The two matrices A_t and A_r consist of antenna response vectors at, respectively, the transmitter and receiver sides. In particular, A_r = [a_{r,1} a_{r,2} · · · a_{r,L}] and A_t = [a_{t,1} a_{t,2} · · · a_{t,L}] are of dimensions, respectively, N_r × L and N_t × L, where a_{r,i} and a_{t,i} are antenna response vectors that depend on the array. We assume a ULA is used. For a ULA, a_{r,i} can be given in terms of the angle of arrival (AoA) θ_i^r and a_{t,i} in terms of the angle of departure (AoD) θ_i^t. Define the N × 1 vector

a_N(ω) = [1 e^{jω} · · · e^{j(N−1)ω}]^T;    (2)

then a_{r,i} = a_{N_r}(u_i) and a_{t,i} = a_{N_t}(p_i), where u_i = ξ cos θ_i^r, p_i = ξ cos θ_i^t, ξ = 2πd/λ, d is the antenna spacing, and λ the wavelength.
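As a concrete sketch of the geometric model in (1), the channel can be generated as follows; the sizes, angles, and gains below are illustrative assumptions, not the paper's simulation settings:

```python
import numpy as np

rng = np.random.default_rng(1)

def a_ula(N, omega):
    # ULA response a_N(omega) = [1, e^{j omega}, ..., e^{j(N-1) omega}]^T.
    return np.exp(1j * omega * np.arange(N))

Nt, Nr, L = 16, 16, 3                 # illustrative sizes
xi = np.pi                            # 2*pi*d/lambda with half-wavelength spacing
theta_t = rng.uniform(0, np.pi, L)    # AoDs
theta_r = rng.uniform(0, np.pi, L)    # AoAs
p = xi * np.cos(theta_t)              # AoD parameters p_i
u = xi * np.cos(theta_r)              # AoA parameters u_i
alpha = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)

A_t = np.stack([a_ula(Nt, pi_) for pi_ in p], axis=1)   # N_t x L
A_r = np.stack([a_ula(Nr, ui) for ui in u], axis=1)     # N_r x L

# H = A_r D_alpha A_t^H: an N_r x N_t matrix of rank at most L.
H = A_r @ np.diag(alpha) @ A_t.conj().T
print(H.shape, np.linalg.matrix_rank(H))
```

The low rank of H (at most L, the number of paths) is what makes the subsequent subspace-based angle estimation possible.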
Assumptions: We assume the channel stays the same during one training block of N̄_t training vectors. The AoA and AoD remain unchanged during the training period. For the path gains, we consider two cases. (1) The path gains are fixed during training, i.e., α_{k,i} = α_i. In this case we say the channel is time-invariant. (2) The path gains α_{k,i} vary from block to block, in which case the channel is said to be time-varying. The channel is time-invariant if the entire training period is less than the channel coherence time.
For the kth training block, the transmit precoding matrix F_k = [f_{k,0} f_{k,1} · · · f_{k,N̄_t−1}] is of dimensions N_t × N̄_t with ||f_{k,ℓ}|| = 1. The receive combining matrix is G_k = [g_{k,0} g_{k,1} · · · g_{k,N̄_r−1}], of size N_r × N̄_r, with ||g_{k,ℓ}|| = 1. Consider a block of N̄_t training vectors S = [s_0 s_1 · · · s_{N̄_t−1}], which is used for all training blocks. The kth received block Y_k = [y_0 y_1 · · · y_{N̄_t−1}] is given by^1

Y_k = G_k^H H_k F_k S + G_k^H N_k,    (3)

where N_k is the N_r × N̄_t channel noise matrix. The channel noise is assumed to be additive white Gaussian with zero mean and variance σ_n^2.

III. SBT PRECODING
Consider the following class of matrices. Let B be an N × M matrix with N ≥ M and let n_0, n_1, · · · , n_{M−1} be integers such that 0 = n_0 < n_1 < n_2 < · · · < n_{M−1} ≤ N − 1. Suppose B satisfies [B]_{n,0} = 0 for n > N − 1 − n_{M−1} and

[B]_{n,m} = [B]_{n−n_m,0},    (4)

i.e., the m-th column is the first column shifted down by n_m. Such a matrix is lower triangular and it is an N × M submatrix of an N × N banded Toeplitz matrix (SBT), obtained by keeping only the column vectors corresponding to the integers n_0, n_1, · · · , n_{M−1}. Fig. 2 gives an example of an SBT precoder with N̄_t = 3. The first column, denoted f_k in Fig. 2, will be referred to as the prototype beamformer. For example, the first column vector of F_k is equal to f_{k,0} = [f_k^T 0^T]^T. When the precoder F_k is SBT with {n_0, n_1, · · · , n_{N̄_t−1}}, the i-th column vector f_{k,i} is the same as the prototype beamformer f_k except for a shift of n_i. Similarly, when the combiner G_k is SBT with {m_0, m_1, · · · , m_{N̄_r−1}}, the i-th column vector g_{k,i} is the same as the prototype g_k except for a shift of m_i. Let F_k(ω) = Σ_n [f_k]_n e^{−jnω} and G_k(ω) = Σ_n [g_k]_n e^{−jnω} be, respectively, the discrete-time Fourier transforms of the beamforming prototype f_k and the combining prototype g_k. We have the following lemma.

^1 In the presence of mutual coupling among the antennas, the output of one antenna can be affected by other antennas. We assume there is no mutual coupling.
Lemma 1: When the precoder F_k is SBT with {n_0, n_1, · · · , n_{N̄_t−1}}, then

A_t^H F_k = D_{f,k} Ā_t^H,    (5)

where D_{f,k} is a diagonal matrix with the i-th diagonal element given by [D_{f,k}]_{i,i} = F_k(p_i). The N̄_t × L matrix Ā_t has the i-th column given by ā_{t,i} = [1 e^{jp_i n_1} · · · e^{jp_i n_{N̄_t−1}}]^T. Likewise, when the combiner G_k is SBT with {m_0, m_1, · · · , m_{N̄_r−1}}, then

A_r^H G_k = D_{g,k} Ā_r^H,    (6)

where D_{g,k} is a diagonal matrix with [D_{g,k}]_{i,i} = G_k(u_i). The matrix Ā_r is of dimensions N̄_r × L and the i-th column is given by ā_{r,i} = [1 e^{ju_i m_1} · · · e^{ju_i m_{N̄_r−1}}]^T.
Proof: The (i, ℓ)-th entry of A_t^H F_k is a_{t,i}^H f_{k,ℓ}, which is the discrete-time Fourier transform of f_{k,ℓ} evaluated at p_i. It can be rewritten as e^{−jn_ℓ p_i} F_k(p_i), as f_{k,ℓ} is a shift of the prototype by n_ℓ. It follows that the i-th row of A_t^H F_k can be written as

F_k(p_i) [1 e^{−jp_i n_1} · · · e^{−jp_i n_{N̄_t−1}}].

The right hand side of the above equation is F_k(p_i) ā_{t,i}^H, where ā_{t,i} is as defined in Lemma 1, and (5) follows. We can derive (6) in a similar manner.
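Lemma 1 can be checked numerically. The sketch below builds an SBT precoder from a binary random prototype and an assumed shift set {0, 1, 4, 6} (any valid set would do), and verifies the factorization of A_t^H F_k into a diagonal matrix times Ā_t^H:

```python
import numpy as np

rng = np.random.default_rng(2)

Nt = 16
shifts = np.array([0, 1, 4, 6])       # assumed shift set n_0, ..., n_{Ntbar-1}
P = Nt - shifts[-1]                   # prototype length N_t - n_{Ntbar-1}
f = (2 * rng.integers(0, 2, P) - 1) / np.sqrt(P)   # binary random phases

# SBT precoder: column i is the prototype shifted down by shifts[i].
F = np.zeros((Nt, len(shifts)))
for i, n in enumerate(shifts):
    F[n:n + P, i] = f

L = 3
p = rng.uniform(-np.pi, np.pi, L)                       # AoD parameters p_i
A_t = np.exp(1j * np.outer(np.arange(Nt), p))           # N_t x L steering matrix

# Lemma 1: A_t^H F = D_f Abar_t^H with [D_f]_ii = F(p_i), the prototype's DTFT.
Fdtft = np.exp(-1j * np.outer(p, np.arange(P))) @ f     # F(p_i) for each path
Abar_t = np.exp(1j * np.outer(shifts, p))               # rows indexed by n_i
assert np.allclose(A_t.conj().T @ F, np.diag(Fdtft) @ Abar_t.conj().T)
print("Lemma 1 factorization verified")
```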
Using the result in Lemma 1, the received block in (3) can be simplified as

Y_k = Ā_r D_k Ā_t^H + G_k^H N_k,    (7)

where we have used S = σ_s I_{N̄_t} and D_k = σ_s D_{g,k}^H D_{α,k} D_{f,k}, with D_{f,k} and D_{g,k} as defined in (5) and (6). Notice that the signal part Ā_r D_k Ā_t^H is the same as that of a system with N̄_r receive antennas, N̄_t transmit antennas, and equivalent path gains [D_k]_{i,i} = d_{k,i}, given by

d_{k,i} = σ_s G_k^*(u_i) α_{k,i} F_k(p_i).    (8)

With the use of SBT precoders and combiners, the system behaves as if there were no precoding and combining. The effect of precoding and combining is absorbed into the equivalent path gains. In other words, we can use SBT precoders and combiners to affect the equivalent path gains. Let us vectorize the received matrix Y_k and call it y_k; we have

y_k = (Ā_t^* ⊙ Ā_r) d_k + n_k,    (9)

where d_k is the L × 1 vector of equivalent path gains and n_k = vec(G_k^H N_k). The autocorrelation matrix of y_k is given by

R_y = E[y_k y_k^H] = (Ā_t^* ⊙ Ā_r) R_d (Ā_t^* ⊙ Ā_r)^H + R_n,    (10)

where R_d = E[d_k d_k^H], the noise in (9) is uncorrelated with the signal, and R_n = σ_n^2 I_{N̄_t N̄_r}.

Uniform SBT: From Lemma 1, we see that Ā_t is an antenna response matrix corresponding to a hypothetical array whose elements are placed according to {n_0, n_1, · · · , n_{N̄_t−1}}. Similarly, Ā_r is an antenna response matrix corresponding to a hypothetical array whose elements are placed according to {m_0, m_1, · · · , m_{N̄_r−1}}. In the special case that the hypothetical arrays are uniform, i.e., n_i = i and m_i = i, both Ā_t and Ā_r are antenna response matrices corresponding to ULAs of sizes N̄_t and N̄_r, respectively. In this case, the estimation of angles from R_y is a two-dimensional direction finding problem [27] with an effective N̄_t-element ULA at the transmitter and an N̄_r-element ULA at the receiver. In particular, we can compute the eigendecomposition of R_y,

R_y = U_s Λ_s U_s^H + U_n Λ_n U_n^H.    (11)

The signal subspace can be expressed as U_s = (Ā_t^* ⊙ Ā_r) W for some invertible matrix W. Based on the signal subspace U_s, the angles can be estimated using 2D MUSIC or 2D ESPRIT. The number of identifiable paths is up to N̄_r N̄_t − 1 with 2D MUSIC [32] and up to N̄_r N̄_t − max(N̄_r, N̄_t) with 2D ESPRIT [33].
In the following section, we will see that the effective ULA size can be increased to O(N̄_t^2) at the transmitter and O(N̄_r^2) at the receiver if we use sparse array spacing to design the SBT precoder and combiner.

IV. DIRECTION FINDING USING SBT
It is known that for a uniform linear array (ULA) with N antennas, subspace-based methods such as MUSIC can identify up to N − 1 uncorrelated sources [10]. The number of identifiable sources can be significantly increased through the use of sparse linear arrays. By judiciously placing N antennas, a virtual ULA of size in the order of N^2 can be constructed and the number of resolvable sources increased. For mmWave massive MIMO systems, the antenna array can be large but the number of RF chains is limited. In the following, we will see that for mmWave channels we can construct virtual arrays of sizes in the order of N̄_t^2 for the transmitter and N̄_r^2 for the receiver. This can be done by designing SBT precoders and combiners according to the antenna spacing in sparse linear arrays. To make the paper more self-contained, we review in Sec. IV-A related material on sparse linear arrays [6], [7] that is essential for our subsequent discussion. In Sec. IV-B, we show how to take advantage of sparse array results to increase the effective array sizes for hybrid mmWave MIMO systems.

A. REVIEW OF SPARSE LINEAR ARRAYS
Consider N antennas that are placed at the following locations (in units of the underlying grid spacing):

S = {m_0, m_1, · · · , m_{N−1}}.

The difference coarray is defined as D = {m_i − m_j : 0 ≤ i, j ≤ N − 1}. Let M_r be the largest integer such that D contains all the consecutive integers from −M_r to M_r. Then the sparse array is said to have degrees of freedom (DoF) equal to 2M_r + 1.
For example, let N = 4 and S = {0, 1, 4, 6}. Then D = {n : −6 ≤ n ≤ 6}; it contains 13 consecutive integers, from −6 to 6, and DoF = 13. Due to the consecutive integers in the coarray, a virtual ULA of size 2M_r + 1 can be constructed. Suppose there are L sources and the antennas are placed according to S. The received vector is

y_k = A_r α_k + n_k,

where α_k is the L × 1 source vector, A_r is the antenna response matrix with the i-th column given by a_{r,i} = [e^{jm_0 u_i} e^{jm_1 u_i} · · · e^{jm_{N−1} u_i}]^T, u_i is the AoA parameter of the i-th source, and n_k is the channel noise, assumed to be white and uncorrelated with the sources. The autocorrelation matrix is

R_y = A_r R_α A_r^H + σ_n^2 I_N, where R_α = E[α_k α_k^H].

Assume the sources are uncorrelated; then R_α is diagonal. The vectorized version of R_y is given by

r_y = vec(R_y) = (A_r^* ⊙ A_r) λ + σ_n^2 vec(I_N),    (12)

where λ is an L × 1 vector consisting of the diagonal elements of R_α. It turns out that A_r^* ⊙ A_r contains an antenna response matrix corresponding to an enlarged ULA of size 2M_r + 1. Let C = A_r^* ⊙ A_r and its i-th column vector be c_i; then c_i is given by c_i = a_{r,i}^* ⊗ a_{r,i}, a vector of size N^2 × 1. As the elements of a_{r,i} are e^{jm_ℓ u_i} for ℓ = 0, 1, · · · , N − 1, the elements of c_i are of the form e^{j(m_ℓ − m_n)u_i}, i.e., they correspond to the lags in the difference coarray D. There are some repeated elements and thus C has some repeated rows. We can select the distinct rows of C that correspond to the consecutive integers in D using a (2M_r + 1) × N^2 selection matrix J_r. For example, for an array with N = 3 antennas whose coarray is D = {−3, −2, · · · , 3}, the DoF is 7 and a 7 × 9 selection matrix can be used. Define Ã_r as

Ã_r = J_r (A_r^* ⊙ A_r),    (13)

which is an antenna response matrix corresponding to a (2M_r + 1)-element ULA with antennas located at {−M_r, −M_r + 1, · · · , M_r}, where the i-th column vector of Ã_r is given by ã_{r,i} = [e^{−jM_r u_i} e^{−j(M_r−1)u_i} · · · e^{jM_r u_i}]^T. Applying J_r to r_y gives r̃_y = J_r r_y. We observe that r̃_y is equivalent to the output of a virtual ULA of size 2M_r + 1 with a single transmit vector λ. By applying spatial smoothing on r̃_y as in [7], we can obtain an (M_r + 1) × (M_r + 1) square matrix R̃_y.
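The coarray bookkeeping for this example can be sketched directly:

```python
import numpy as np

def difference_coarray(S):
    # All pairwise differences m_i - m_j of the antenna locations.
    S = np.asarray(S)
    return np.unique(S[:, None] - S[None, :])

def dof(S):
    # DoF = 2*M_r + 1, where -M_r, ..., M_r are consecutive lags in the coarray.
    D = set(difference_coarray(S).tolist())
    M = 0
    while (M + 1) in D and -(M + 1) in D:
        M += 1
    return 2 * M + 1

S = [0, 1, 4, 6]                      # the N = 4 example in the text
print(difference_coarray(S))          # [-6 -5 -4 -3 -2 -1  0  1  2  3  4  5  6]
print(dof(S))                         # 13
```

With only 4 physical antennas, all 13 lags from −6 to 6 are present, which is what enables the 13-element virtual ULA.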
Thanks to the ULA structure in r̃_y, the signal subspace can be extracted from R̃_y and the angles estimated using subspace methods like MUSIC or ESPRIT. Up to M_r sources can be identified. Various sparse arrays have been proposed to design the spacing of the antennas so that M_r is in the order of N^2, e.g., minimum redundancy arrays [26] and nested arrays [6]. The array size is thus enlarged from N to O(N^2). The Cramér-Rao bound (CRB) for sparse arrays has been analyzed in [8], [9], [29].
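The whole review chain — sparse-array covariance, per-lag averaging onto the coarray, spatial smoothing, and MUSIC — can be sketched for the S = {0, 1, 4, 6} array above. With an ideal (infinite-sample) covariance it resolves L = 5 uncorrelated sources using only N = 4 antennas; the source placement and search grid are illustrative assumptions:

```python
import numpy as np

S = np.array([0, 1, 4, 6])            # sparse array; coarray covers -6..6
Mr, N = 6, len(S)
L = 5                                 # more sources than antennas
u = np.linspace(-2.0, 2.0, L)         # well-separated AoA parameters
sigma_n = 0.1

A = np.exp(1j * np.outer(S, u))                        # N x L steering matrix
R = A @ A.conj().T + sigma_n**2 * np.eye(N)            # unit-power, uncorrelated

# Average the entries of R per coarray lag to get the virtual-ULA statistics.
lags = (S[:, None] - S[None, :]).ravel()
r = R.ravel()
z = np.array([r[lags == m].mean() for m in range(-Mr, Mr + 1)])

# Spatial smoothing: [Rss]_{n,i} = z[(n - i) + Mr] is an (Mr+1) x (Mr+1)
# Hermitian matrix with the structure of an (Mr+1)-element ULA covariance.
Rss = np.array([[z[(n - i) + Mr] for i in range(Mr + 1)] for n in range(Mr + 1)])

# MUSIC on the smoothed matrix.
w, V = np.linalg.eigh(Rss)
En = V[:, : Mr + 1 - L]                                # noise subspace
grid = np.linspace(-np.pi, np.pi, 4096)
steer = np.exp(1j * np.outer(np.arange(Mr + 1), grid))
spec = 1.0 / np.linalg.norm(En.conj().T @ steer, axis=0) ** 2

peaks = [i for i in range(1, len(grid) - 1)
         if spec[i] > spec[i - 1] and spec[i] > spec[i + 1]]
top = sorted(peaks, key=lambda i: spec[i])[-L:]
u_hat = np.sort(grid[top])
print("true u:     ", u)
print("estimated u:", np.round(u_hat, 3))
```

The five MUSIC peaks land on the true AoA parameters up to the grid resolution, even though a 4-element ULA could identify at most 3 uncorrelated sources.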
There are some differences between the problem of direction finding for mmWave channels and that typically considered in sparse linear arrays for radar applications. For the mmWave channel in (1), we need to estimate the AoDs as well as the AoAs. Furthermore, the signals corresponding to different paths can be correlated. We will see that by using random SBT precoding, we can decorrelate the paths so that sparse array results can be applied. The decorrelation allows us to construct virtual arrays of augmented sizes as in sparse linear arrays.

B. VIRTUAL ARRAY AUGMENTATION USING SPARSE ARRAY SPACING
We use an SBT precoder F_k and combiner G_k as in Lemma 1. Suppose the sets of integers used to construct F_k and G_k are given, respectively, by

S_t = {n_0, n_1, · · · , n_{N̄_t−1}} and S_r = {m_0, m_1, · · · , m_{N̄_r−1}}.    (15)

For the transmitter side, the difference coarray D_t = {n_i − n_j} contains the consecutive integers from −M_t to M_t and DoF = 2M_t + 1. Similarly, for the receiver side, the difference coarray D_r = {m_i − m_j} contains the consecutive integers from −M_r to M_r and DoF = 2M_r + 1. To illustrate how sparse array results can be used in direction finding for mmWave systems, we first consider the special case when there is a single RF chain at the transmitter, i.e., N̄_t = 1. The autocorrelation matrix R_y in (10) reduces to

R_y = Ā_r R_d Ā_r^H + R_n.    (16)

When the equivalent path gain vector d_k is uncorrelated, R_d is a diagonal matrix. How the equivalent path gains can be decorrelated will be addressed later in the design of the precoders and combiners. Assuming R_d is diagonal and vectorizing R_y, we have

r_y = vec(R_y) = (Ā_r^* ⊙ Ā_r) λ + r_n,    (17)

where λ is an L × 1 vector that consists of the diagonal elements of R_d and r_n = vec(R_n) is the vectorized version of the noise autocorrelation matrix. Observe that the signal part in (17) has the same form as that in (12). Suppose the set S_r used to construct the SBT combiner is designed according to a sparse array and the associated DoF is 2M_r + 1. Just like in Sec. IV-A, we can apply a (2M_r + 1) × N̄_r^2 selection matrix J_r on r_y so that r̃_y = J_r r_y corresponds to the output of a virtual array of size 2M_r + 1. That is,

r̃_y = Ã_r λ + J_r r_n,

where Ã_r is as given in (13). Then the angles can be estimated as in sparse arrays. Now let us address the more general case N̄_t ≥ 1. Vectorizing the autocorrelation matrix in (10), we have

r_y = vec(R_y) = [(Ā_t^* ⊙ Ā_r)^* ⊙ (Ā_t^* ⊙ Ā_r)] λ + r_n,    (18)

a vector of size (N̄_t N̄_r)^2 × 1.

Theorem 1: Suppose the precoder and combiner are SBT with associated sets of integers as in (15). Assume the equivalent paths are uncorrelated and R_d is diagonal. Then the autocorrelation matrix R_y of the received vector can be vectorized as r_y in (18).
We can apply a selection matrix J on r_y so that r̃_y = J r_y, a vector of size (2M_r + 1)(2M_t + 1) × 1, is of the form

r̃_y = (Ã_t^* ⊙ Ã_r) λ + q,    (19)

where Ã_r is a (2M_r + 1) × L ULA antenna response matrix with the i-th column vector ã_{r,i} = [e^{−jM_r u_i} e^{−j(M_r−1)u_i} · · · e^{jM_r u_i}]^T. The (2M_t + 1) × L matrix Ã_t is also a ULA antenna response matrix, with the i-th column vector ã_{t,i} = [e^{−jM_t p_i} e^{−j(M_t−1)p_i} · · · e^{jM_t p_i}]^T. The noise vector q is given by q = J r_n. A proof is given in Appendix A and a construction of the selection matrix J is given therein. Now let us consider the noise term q in (19). Suppose G_k is statistically semi-unitary, i.e., E[G_k^H G_k] = I_{N̄_r}. In this case, the noise vector q has a closed-form expression as given in the following lemma (a proof is given in Appendix B).
Lemma 2: When the combiner G_k is statistically semi-unitary with E[G_k^H G_k] = I_{N̄_r}, the noise vector q = J r_n in Theorem 1 has only one nonzero element, right in the middle of the vector, and it is equal to the channel noise variance σ_n^2.

Observe that r̃_y in Theorem 1 is equivalent to the received vector when the receiver is equipped with a (2M_r + 1)-antenna ULA and the transmitter with a (2M_t + 1)-antenna ULA. The signal and noise subspaces can be obtained from the vector r̃_y by using the 2D matrix pencil [27] or 2D spatial smoothing [31], [37]. The spatially smoothed matrix contains both signal and noise parts. The signal part can be derived as in [27], [37]. But the noise part is different, as the autocorrelation matrix has been rearranged as in Theorem 1; it is not the usual channel noise term and additional derivation is needed. To explain the behavior of the noise after spatial smoothing is applied, we define the 1D selection matrix J_{x,i}, an (M_x + 1) × (2M_x + 1) matrix that selects M_x + 1 consecutive entries starting from the i-th, for x ∈ {t, r} and i = 0, 1, · · · , M_x, and also the 2D selection matrix

J_{i,j} = J_{t,i} ⊗ J_{r,j},    (20)

for i = 0, 1, · · · , M_t and j = 0, 1, · · · , M_r. The 2D spatially smoothed matrix is defined as [37]

Y_ss = [x_{0,0} x_{0,1} · · · x_{0,M_r} x_{1,0} · · · x_{M_t,M_r}],    (21)

where x_{i,j} = J_{i,j} r̃_y and r̃_y is as given in (19). It is a square matrix of size (M_t + 1)(M_r + 1) × (M_t + 1)(M_r + 1). When spatial smoothing is applied on r̃_y, the noise part is spatially smoothed along with the signal part. We have the following expression for Y_ss.

Theorem 2: Suppose the combiner G_k is statistically semi-unitary as in Lemma 2. When we apply spatial smoothing on the vector r̃_y in (19), the spatially smoothed matrix Y_ss in (21) is given by

Y_ss = (Ã_t^* ⊙ Ã_r) D_λ (Ã_t^* ⊙ Ã_r)^H + σ_n^2 I_{(M_t+1)(M_r+1)},    (22)

where D_λ is an L × L diagonal matrix with λ on its diagonal, Ã_t is an (M_t + 1) × L ULA antenna response matrix with the i-th column given by ã_{t,i} = [1 e^{jp_i} · · · e^{jM_t p_i}]^T, and Ã_r is an (M_r + 1) × L ULA antenna response matrix with the i-th column given by ã_{r,i} = [1 e^{ju_i} · · · e^{jM_r u_i}]^T.
A proof is given in Appendix C. Theorem 2 means that Y_ss is equivalent to the autocorrelation matrix of a system with an (M_r + 1)-element ULA at the receiver and an (M_t + 1)-element ULA at the transmitter. The signal subspace, i.e., the column space of the signal part of Y_ss, can be obtained by applying an eigenvalue decomposition on Y_ss as in (11). Due to the embedded ULA structure in the signal subspace, 2D subspace methods such as MUSIC and ESPRIT can be used for angle estimation. The number of identifiable paths is up to (M_t + 1)(M_r + 1) − 1 when 2D MUSIC [32] is applied and up to max(M_t(M_r + 1), M_r(M_t + 1)) when 2D ESPRIT is applied [33]. In the case that sparse array spacing is used to construct the precoder and combiner, M_t is in the order of N̄_t^2 and M_r in the order of N̄_r^2. After estimates of the angles are obtained, the path gains can be estimated using least squares [27]. Notice that the signal part of Y_ss in (22) is similar to that of R_y in (10) when the integers used to construct the precoder and combiner are uniform. The difference is that Ā_r and Ā_t in (10) correspond to ULAs of sizes N̄_r and N̄_t, respectively, whereas in (22) the two matrices Ã_r and Ã_t correspond to ULAs of sizes M_r + 1 and M_t + 1, respectively. By using sparse array spacing, the size of the ULA at the transmitter is enlarged from N̄_t to O(N̄_t^2), and the size of the ULA at the receiver is enlarged from N̄_r to O(N̄_r^2).

In the above discussion we assumed that the signals of different paths are uncorrelated and R_d is a diagonal matrix so that sparse array results can be used. The following lemma states that R_d is a diagonal matrix if the path gains are uncorrelated or when the prototypes are properly designed so that the equivalent path gains become uncorrelated.

Lemma 3: The matrix R_d is diagonal and invertible if (1) E[|α_{k,i}|^2 |F_k(p_i)|^2 |G_k(u_i)|^2] > 0 for all i, and (2) E[α_{k,i} α_{k,j}^* F_k(p_i) F_k^*(p_j) G_k^*(u_i) G_k(u_j)] = 0 for all i ≠ j.

Proof: For the i-th diagonal element, we have

[R_d]_{i,i} = σ_s^2 E[|α_{k,i}|^2 |F_k(p_i)|^2 |G_k(u_i)|^2],

which is positive if the first condition is satisfied. For the off-diagonal elements, note that

[R_d]_{i,j} = σ_s^2 E[α_{k,i} α_{k,j}^* F_k(p_i) F_k^*(p_j) G_k^*(u_i) G_k(u_j)],

which is equal to zero if condition (2) holds.
The second condition can be satisfied if the path gains α_{k,i} are uncorrelated random variables with zero mean. This happens when the channel exhibits fading during training, i.e., the separation of the training blocks is larger than the channel coherence time [23]. Then R_d becomes a diagonal invertible matrix and the paths are uncorrelated automatically, even for a deterministic precoder and combiner. On the other hand, if the channel coherence time is larger than the training period, i.e., for time-invariant channels, the path gains stay the same during training and E[α_{k,i} α_{k,j}^*] ≠ 0. In this case, the second condition in Lemma 3, i.e., path decorrelation, can be satisfied by using random precoding and combining (to be discussed next). Then the equivalent paths are uncorrelated, irrespective of the channel coherence time.
Design of prototypes: For SBT precoders and combiners, only the prototypes f_k and g_k need to be designed. From (4), we see that the precoder prototype f_k has N_t − n_{N̄_t−1} coefficients and the combiner prototype g_k has N_r − m_{N̄_r−1} coefficients. Lemma 3 shows that the path gains can be successfully decorrelated if the prototype F_k(ω) or G_k(ω) is uncorrelated in frequency, i.e.,

E[F_k(ω_1) F_k^*(ω_2)] = 0 for ω_1 ≠ ω_2.

This can be satisfied approximately by designing the prototypes as follows. Let us choose the coefficients of the prototype to be binary random phases. That is,

[f_k]_i = γ β_{k,i},    (23)

where the β_{k,i} are independent and β_{k,i} = 1 or −1 with equal probability, and γ = 1/√(N_t − n_{N̄_t−1}) is a normalizing factor so that the prototype has unit norm. In this case, it can be shown that (a proof is given in Appendix D)

E[F_k(ω_1) F_k^*(ω_2)] = (1/(N_t − n_{N̄_t−1})) Σ_{n=0}^{N_t−n_{N̄_t−1}−1} e^{−jn(ω_1−ω_2)},    (24)

which is approximately zero when (N_t − n_{N̄_t−1}) is large. The above equation also implies that

E[|F_k(ω)|^2] = 1 for all ω.    (25)

That is, condition (1) of Lemma 3 is satisfied. When the prototype combiner g_k is designed in the same manner, we can verify that the statistically semi-unitary property E[G_k^H G_k] = I_{N̄_r} used in Lemma 2 and Theorem 2 is satisfied. From (24), we see that there is significant correlation between F_k(p_i) and F_k(p_j) if two AoDs are close such that |p_i − p_j| < 2π/(N_t − n_{N̄_t−1}). In this case, the path gains can still be decorrelated if the AoAs are sufficiently separated, i.e., |u_i − u_j| > 2π/(N_r − m_{N̄_r−1}). As long as the AoAs and AoDs are not close at the same time, we can use random beamforming and combining to decorrelate the path gains. When two paths have close AoAs and AoDs, they can be merged as one path and there is no need to decorrelate the two. With the design in (23), the coefficients of the precoder are 1, −1, or 0, which can be implemented using phase shifters and switches [35].
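The frequency-domain decorrelation in (24) and the unit average pattern in (25) can be checked by Monte Carlo; the prototype length and the two test frequencies below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

P = 32                                  # assumed prototype length N_t - n_{Ntbar-1}
gamma = 1.0 / np.sqrt(P)
w1, w2 = 0.7, 1.9                       # |w1 - w2| well above 2*pi/P
K = 20000                               # number of random prototype draws

e1 = np.exp(-1j * w1 * np.arange(P))
e2 = np.exp(-1j * w2 * np.arange(P))
cross = 0.0 + 0.0j
auto = 0.0
for _ in range(K):
    f = gamma * (2 * rng.integers(0, 2, P) - 1)   # binary random phases as in (23)
    F1, F2 = f @ e1, f @ e2                       # prototype DTFT at w1 and w2
    cross += F1 * np.conj(F2)
    auto += abs(F1) ** 2

print("E[F(w1)F*(w2)] ~", np.round(cross / K, 3))   # near 0
print("E[|F(w1)|^2]   ~", np.round(auto / K, 3))    # near 1
```

The cross term settles near the small Dirichlet-kernel value predicted by (24), while the average squared magnitude settles near 1, as in (25).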
Proposed Algorithm: The proposed angle estimation using random SBT precoding is as follows. For the k-th training block, we use a random precoder F_k and a random combiner G_k with prototypes designed according to (23). Given N_b received training blocks Y_1, Y_2, · · · , Y_{N_b}, we vectorize them to y_1, y_2, · · · , y_{N_b}. The empirical autocorrelation matrix is computed as

R_{y,e} = (1/N_b) Σ_{k=1}^{N_b} y_k y_k^H.    (26)

We can vectorize R_{y,e}, apply the selection matrix as in Theorem 1, and apply spatial smoothing as in (21) to obtain Y_{ss,e}, the empirical version of the spatially smoothed matrix Y_ss in (22). Based on Y_{ss,e}, 2D subspace based methods like MUSIC or ESPRIT can be applied to estimate the angles. The above procedure is summarized in Steps 1-5 of Algorithm 1.
Notice that, with a finite number of training vectors, the empirical counterpart R_{d,e} of R_d is not diagonal and the off-diagonal entries will interfere with the estimation of the angles later. The problem can be greatly alleviated if estimates of the angles are available so that the interference can be estimated and canceled. We observe that the vectorized R_{y,e} is given by

r_{y,e} = [(Ā_t^* ⊙ Ā_r)^* ⊗ (Ā_t^* ⊙ Ā_r)] r_{d,e} + r_n,

where r_{d,e} = vec(R_{d,e}). In the above expression, one of the Khatri-Rao products in (18) is replaced with a Kronecker product as R_{d,e} is not diagonal. We can rewrite r_{y,e} as

r_{y,e} = [(Ā_t^* ⊙ Ā_r)^* ⊙ (Ā_t^* ⊙ Ā_r)] λ_e + [(Ā_t^* ⊙ Ā_r)^* ⊗ (Ā_t^* ⊙ Ā_r)] λ_off + r_n,

where λ_e is an L × 1 vector that consists of the diagonal elements of R_{d,e}, λ_off = vec(R̄_{d,e}), and R̄_{d,e} is the same as R_{d,e} except that the diagonal elements are replaced by zeros. The first term is the desired signal part, while the second is the interfering term due to the off-diagonal entries of R_{d,e}. Using r_{y,e}, we can obtain estimates of the angles as in Steps 1-5 of Algorithm 1. With these initial estimates, the interfering term can be estimated and subtracted from r_{y,e}; the cancellation procedure is given in Steps 6-8 of Algorithm 1. After removing the interference, we can use the interference-canceled vector r̄_{y,e} in place of r_{y,e} and proceed with Steps 3-5 of Algorithm 1.
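The split into a Khatri-Rao signal term and a Kronecker interference term is an algebraic identity; it can be verified with a generic matrix C standing in for Ā_t^* ⊙ Ā_r:

```python
import numpy as np

rng = np.random.default_rng(5)

def vec(X):
    return X.reshape(-1, order="F")

def khatri_rao(A, B):
    return np.stack([np.kron(A[:, i], B[:, i]) for i in range(A.shape[1])], axis=1)

L, M = 3, 6
C = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
Rd = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
Rd = Rd @ Rd.conj().T                 # a non-diagonal "sample" covariance

lam = np.diag(Rd).copy()              # desired diagonal part
R_off = Rd - np.diag(lam)             # interfering off-diagonal part

# vec(C Rd C^H) = (C* ⊙ C) lam + (C* ⊗ C) vec(R_off)
lhs = vec(C @ Rd @ C.conj().T)
rhs = khatri_rao(C.conj(), C) @ lam + np.kron(C.conj(), C) @ vec(R_off)
assert np.allclose(lhs, rhs)
print("signal + interference decomposition verified")
```

Cancellation then amounts to estimating the second term from the angle estimates and subtracting it, leaving the Khatri-Rao (virtual-array) term.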
Due to the removal of interference, we can obtain better angle estimates. The new estimates can also be used for another iteration of interference removal, after which angle estimation can be performed again.

Algorithm 1: Angle estimation using random SBT precoding
1. For k = 1, 2, · · · , N_b, apply the random SBT precoder F_k and combiner G_k with prototypes designed according to (23), and vectorize the received blocks to obtain y_1, y_2, · · · , y_{N_b}.
2. Compute the empirical autocorrelation matrix R_{y,e} = (1/N_b) Σ_{k=1}^{N_b} y_k y_k^H and its vectorized version r_{y,e}.
3. Apply a selection matrix J on r_{y,e} as in Theorem 1 to obtain r̃_{y,e} = J r_{y,e}.
4. Apply spatial smoothing on r̃_{y,e} as in (21) to obtain Y_{ss,e}.
5. Obtain estimates û_1, û_2, · · · , û_L of the AoAs and estimates p̂_1, p̂_2, · · · , p̂_L of the AoDs from Y_{ss,e} using 2D estimation methods like MUSIC or ESPRIT.
6. Form estimates of Ā_t and Ā_r from the angle estimates and estimate R_{d,e} by least squares.
7. Compute the interfering term due to the off-diagonal entries of R_{d,e}.
8. Subtract the interfering term from r_{y,e} to obtain the interference-canceled vector r̄_{y,e}.
9. Set r_{y,e} = r̄_{y,e} and repeat Steps 3-5.

Table 1 gives the order of complexity for the steps in Algorithm 1 that require computations. In Table 1, ESPRIT is used for Step 5 and the order of complexity is obtained from [34], [39].
Single RF Chain: The proposed method cannot be directly applied if the transmitter or receiver has only one RF chain.
For example, when there is only one transmit RF chain, the AoD cannot be estimated. One way around this is to use extra training time to compensate for the single RF chain. Suppose there is only one transmit RF chain. We would like the transmitter to behave as if there were N̄_t RF chains with a hypothetical precoder F_k of size N_t × N̄_t. With the training input block S = I_{N̄_t}, the i-th transmitter output vector is precisely the i-th column vector of F_k. For the first training vector of the kth block, the first column vector f_{k,0} of F_k is used as the actual beamformer. For the next training vector, the next column vector f_{k,1} is used as the actual beamformer. Proceeding in this manner N̄_t times, we have N̄_t transmit output vectors that are equal to the output vectors of the hypothetical transmitter F_k with N̄_t RF chains. Thus, by using N̄_t − 1 extra training vectors, we can obtain a transmitter that acts as if there were N̄_t RF chains. The same trick can be used when the receiver has only one RF chain.

V. SIMULATIONS
Consider the channel model in (1) with antennas spaced by half a wavelength. We use 10⁴ channel realizations in the Monte Carlo evaluation. The channel is time-varying in Example 2 and time-invariant in the other examples. The following settings are used unless otherwise indicated. The numbers of transmit and receive antennas are N_t = N_r = 64, the numbers of RF chains are N̂_t = N̂_r = 4, and 100 training blocks are used in the simulation. The AoA and AoD are uniformly distributed over [−π, π]. In Algorithm 1, we use 2D ESPRIT to estimate the angles, and four iterations of interference cancellation are applied.
Example 1: With the use of random beamforming, the beam pattern is different for every training block. Upon collecting N_b training blocks, we can compute the equivalent path gains using the empirical autocorrelation matrix given in (26). Suppose the actual path gains are time-invariant, α_k,i = α_i, and the combining prototype is fixed for all training blocks. Then the variance σ²_d,i of the i-th equivalent path gain is proportional to F_ave(p_i), where F_ave(ω) = (1/N_b) Σ_k |F_k(ω)|² is the average beam pattern at the transmitter and F_k(ω) denotes the discrete-time Fourier transform of the prototype for the k-th training block. Fig. 3 plots the average beam pattern F_ave(ω) when the number of training blocks N_b is 1, 10, 100, and 5000. The average beam pattern approaches a constant as N_b increases. This is consistent with the fact that, with binary random coefficients, the prototype satisfies E[|F_k(ω)|²] = 1 for all ω, as given in (25).
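The flattening of the average beam pattern can be reproduced with a short Monte Carlo sketch. It assumes i.i.d. binary coefficients scaled by 1/√N_c so that E[|F_k(ω)|²] = 1; the prototype length N_c and the frequency grid are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)
N_c = 8          # number of prototype coefficients (toy value)
N_b = 5000       # number of training blocks
omega = np.linspace(-np.pi, np.pi, 256)

# Binary random prototypes: i.i.d. +-1/sqrt(N_c), so E[|F_k(w)|^2] = 1.
beta = rng.choice([-1.0, 1.0], size=(N_b, N_c)) / np.sqrt(N_c)

# DTFT of each prototype evaluated on the frequency grid.
E = np.exp(-1j * np.outer(np.arange(N_c), omega))   # N_c x grid
F = beta @ E                                        # N_b x grid

# Empirical average beam pattern; flattens toward 1 as N_b grows.
F_ave = np.mean(np.abs(F) ** 2, axis=0)
assert np.max(np.abs(F_ave - 1.0)) < 0.1
```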
Example 2: In this example, we examine the case where the number of training blocks N_b is finite. For a time-varying channel, Fig. 4 plots the mean squared error as a function of N_b for SNRs σ²_s/σ²_n = −10 dB, 0 dB, and 10 dB. The mean squared errors for the AoAs are computed for L = 2 paths, where p_i = π cos θ_t,i and u_i = π cos θ_r,i. We have used θ_t,1 = 20°, θ_t,2 = 40°, θ_r,1 = −20°, and θ_r,2 = −40°. We see that the error decreases steadily with N_b. Good estimates can be obtained even with a moderate number of training vectors; for example, when N_b = 100, the error is around −40 dB for SNR = 0 dB. As a benchmark, the CRB [8] of the estimation error is also plotted. The estimation error is close to the CRB except when the SNR is low and the number of training blocks is small. Therefore, the CRB provides an accurate characterization of the estimation error.
Example 3: For time-invariant channels, the equivalent path gains can be approximately decorrelated by properly designing the random prototype when N_t is large. In this example we examine the performance as a function of N_t when the beamforming prototype has random coefficients as in (23), while the combining prototype is the deterministic vector g_k = [1 0 ··· 0]^T with N_r = 64. Fig. 5(a) shows the normalized mean squared error (NMSE) of the channel for SNR σ²_s/σ²_n = 5 dB. The NMSE of the channel is defined as E[||Ĥ − H||²_F / ||H||²_F], where Ĥ is the estimated channel matrix. The NMSE is shown for N̂_r = N̂_t = 4, 5, and 6. The sets S_t, taken from the table of minimum redundancy arrays in [36], are {0, 1, 4, 6}, {0, 1, 4, 7, 9}, and {0, 1, 4, 5, 11, 13} for N̂_t = 4, 5, and 6, respectively. We can see that the error reduces with N_t and N̂_t. Without interference cancellation (IC), the performance for different N̂_t can be close when N_t is small. For example, when N_t = 16, there is no difference between N̂_t = 5 and N̂_t = 6. This can be understood as follows. The prototype has N_t − n_{N̂_t−1} nonzero coefficients, where n_{N̂_t−1} is the largest element of S_t. When N_t = 16, the numbers of nonzero coefficients are 7 and 3 for N̂_t = 5 and N̂_t = 6, respectively. When the prototype has few nonzero coefficients, the gain from enlarged virtual arrays can be offset by the loss due to a lack of design freedom for decorrelating the paths. When interference cancellation is applied, however, increasing N̂_t still lowers the NMSE even for small N_t. This is because interference cancellation reduces the interfering off-diagonal terms of R_d,e, i.e., it decorrelates the equivalent path gains.
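The enlarged virtual arrays come from the hole-free difference co-arrays of the minimum redundancy sets listed above, which can be verified directly:

```python
# Minimum redundancy array positions used in Example 3 (from [36]).
mra = {4: [0, 1, 4, 6], 5: [0, 1, 4, 7, 9], 6: [0, 1, 4, 5, 11, 13]}

for n_hat, pos in mra.items():
    diffs = {a - b for a in pos for b in pos}   # difference co-array
    M = max(pos)
    # Hole-free co-array: every lag from -M to M is present, so n_hat
    # physical antennas support a (2M+1)-element virtual ULA.
    assert diffs == set(range(-M, M + 1))
```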
Example 4: For a finite number of training vectors, the empirical autocorrelation matrix R_d,e of the effective path gains is not diagonal, which causes interference in angle estimation. In Fig. 6, we examine the usefulness of interference cancellation when the precoder and combiner prototypes are random, as designed in (23). The NMSE is plotted with and without interference cancellation for L = 2 in Fig. 6(a). The effect of interference cancellation is particularly significant in the high-SNR range when the number of training blocks is small, e.g., N_b = 50. For high SNR, the noise is small and the remaining path correlation plays the major role in angle estimation. With cancellation, the interference is largely removed and the estimation accuracy is greatly improved. Fig. 6(b) shows the NMSE for different numbers of paths L when N_b = 1000. As L increases, the L × L empirical autocorrelation matrix R_d,e has more off-diagonal terms and thus there are more interfering terms in angle estimation. A satisfactory NMSE can be achieved even for larger L when interference cancellation is applied.
Example 5: When two angles are close, they may not be resolved correctly. Consider two AoAs, θ_r,1 = 30° − Δθ and θ_r,2 = 30° + Δθ. The two AoAs are considered successfully identified if |θ̂_r,i − θ_r,i| < Δθ for i = 1, 2 [9]. Fig. 7 shows the probability of resolution, i.e., the probability that the two AoAs are successfully identified, for SNR = 5 dB. The AoDs are chosen as θ_t,1 = 15° and θ_t,2 = 45°. We plot the probability as a function of Δθ. When Δθ is 1°, the two AoAs can be successfully identified with high probability, around 0.86, 0.93, and 0.95 for N_b = 128, 256, and 1000, respectively.

Example 6: In the data transmission that follows channel estimation, the precoder and combiner are designed according to the estimated channel, which results in a degradation. Fig. 8(a) shows the achievable rate, which is given by R = log₂ det(I + ρ R_g⁻¹ G^H H F F^H H^H G), where ρ = σ²_s/σ²_n and R_g = G^H G. The result is compared with [15], [19], and [20] for L = 3 paths. In [15], a multi-resolution training codebook of beamforming vectors is used for multi-stage closed-loop angle estimation; feedback is required before the training of the next stage. In Fig. 8, four stages, i.e., three feedbacks, are used, and in each stage the resolution is increased by a factor of four, resulting in a total of 432 training blocks. For [19], [20], the angles are estimated using compressive sensing, OMP in [19] and generalized block OMP in [20]. For both compressive sensing curves, the dictionary size is 100 for both AoA and AoD. The resolution increases with the dictionary sizes, and so does the complexity, which increases with the product of the dictionary sizes. The number of training blocks used is 256 for [19], [20], and the proposed method. The proposed 'SBT' is very close to 'Perfect CSI' in the middle and high SNR range. It enjoys a much lower complexity compared to the compressive sensing based methods.
In terms of complexity, it is on the order of 1.8 × 10⁵ for 'SBT' (computed from Table 1), 6.4 × 10⁷ for [15], and 4.1 × 10⁷ for both [19] and [20] (obtained from [38]). Fig. 8(b) shows the NMSE comparison of the systems considered in Fig. 8(a). We can see that a small estimation error can be achieved with 'SBT'.

VI. CONCLUSION
In this article we consider the estimation of AoA and AoD for mmWave massive MIMO communication systems using SBT precoders and combiners. By designing the SBT precoder and combiner based on the spacing of sparse arrays, we can construct augmented virtual ULAs and reduce the estimation error. Due to random beamforming, the equivalent path gains can be decorrelated even if the channel is time-invariant. Complete decorrelation of different paths requires arrays of infinite size and infinitely many training vectors. However, with the aid of interference cancellation, simulations show that the proposed method achieves high estimation accuracy even for a finite number of training vectors and moderate array sizes.

APPENDIX A PROOF OF THEOREM 1
The expression of r_y in the AoA-only case involves only the Khatri-Rao product of two antenna response matrices, and we can apply a selection matrix to obtain a ULA matrix. In the general case, eq. (18) involves the Khatri-Rao product of four antenna response matrices, Ā_t ⊙ Ā*_r ⊙ Ā*_t ⊙ Ā_r. We would like to switch the ordering so that Ā_r and Ā*_r are next to each other, and Ā_t and Ā*_t are also next to each other. This will allow us to exploit the results from sparse arrays directly. The order of the four matrices in the product can be changed with a proper permutation. The following fact is useful for constructing such a permutation matrix.
Fact 1: Given two vectors a and b, the Khatri-Rao products a ⊙ b and b ⊙ a are permutation equivalent [30]; one can be obtained from the other through a permutation. That is, a ⊙ b = P(b ⊙ a), where P is a permutation matrix. Let a be M × 1 and b be N × 1. Then the (kN + ℓ)-th row of P is e^T_{ℓM+k}, for ℓ = 0, 1, ..., N − 1 and k = 0, 1, ..., M − 1, where e_i is the i-th column of I_MN.
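Fact 1 can be checked numerically: for vectors, the Khatri-Rao product reduces to the Kronecker product, and P can be built row by row from the stated index map (row kN + ℓ equal to e^T_{ℓM+k}).

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 3, 4
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # M x 1
b = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # N x 1

# Build P with row kN+l equal to e_{lM+k}^T, k = 0..M-1, l = 0..N-1.
P = np.zeros((M * N, M * N))
for k in range(M):
    for l in range(N):
        P[k * N + l, l * M + k] = 1.0

# For vectors, Khatri-Rao = Kronecker, and a (x) b = P (b (x) a).
assert np.allclose(np.kron(a, b), P @ np.kron(b, a))
```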
The noise vector q can be obtained by directly applying J to r_n, so we consider only the signal term in the following. Using Fact 1, we can construct a permutation matrix P̃_1 such that P̃_1(Ā*_t ⊙ Ā_r) = Ā_r ⊙ Ā*_t. Let P_1 = I_{N̂_tN̂_r} ⊗ P̃_1 and r_1 = P_1 r_y. Using the property (A ⊗ B)(C ⊙ D) = (AC) ⊙ (BD) and the expression of r_y in (18), the signal part of r_1 can be written as a Khatri-Rao product in which the inner pair is Ā_r ⊙ Ā*_t. We can use Fact 1 again to construct a permutation matrix P_2 so that the signal part of r_2 = P_2 r_1 involves the product (Ā*_t ⊙ Ā_t) ⊙ (Ā*_r ⊙ Ā_r). Let P = P_2 P_1, as in (31); then we have r_2 = P r_y, and the four matrices in the above expression of r_2 are in the desired order. As in the single transmit RF chain case, we can apply a selection matrix J_r to Ā*_r ⊙ Ā_r so that Ã_r = J_r(Ā*_r ⊙ Ā_r) is an antenna response matrix corresponding to a (2M_r + 1)-element ULA. Similarly, we can apply a selection matrix J_t to Ā*_t ⊙ Ā_t so that Ã_t = J_t(Ā*_t ⊙ Ā_t) is an antenna response matrix corresponding to a (2M_t + 1)-element ULA. Let us apply the selection matrix J_t ⊗ J_r to r_2 and let J = (J_t ⊗ J_r)P, where P is as in (31); then J is the selection matrix in (32), and the signal part of r̃_y = J r_y is as given in Theorem 1.

APPENDIX B PROOF OF LEMMA 2
As the noise n_k is uncorrelated, r_n = σ²_n 1̃, where 1̃ = vec(I_{N̂_tN̂_r}). We have q = J r_n = σ²_n J 1̃. Observe that 1̃ contains only N̂_tN̂_r identical nonzero coefficients. We can obtain q by tracking these nonzero coefficients upon the application of J. The matrix J in (32) consists of two parts, a selection matrix J_t ⊗ J_r and a permutation matrix P, which in turn consists of the permutation matrices P_1 and P_2. We consider the application of J_t ⊗ J_r and the two permutations separately.
Application of permutations. Let r_2 = P r_y; the corresponding noise term is, up to the factor σ²_n, the vector P 1̃ = P_2 P_1 1̃. Notice that 1̃ is the stacking of N̂_tN̂_r standard vectors e_i, for i = 0, 1, ..., N̂_tN̂_r − 1. For each i, we can find unique 0 ≤ j_i ≤ N̂_t − 1 and 0 ≤ k_i ≤ N̂_r − 1 such that i = j_i N̂_r + k_i. Furthermore, there is a one-to-one correspondence between i and (j_i, k_i). Upon the first permutation, the locations of the '1's move, and we can verify that the nonzero coefficient of the i-th block moves to offset i′ within the block, where i′ = k_i N̂_t + j_i. To consider the second permutation, let q_1 = P_2 P_1 1̃. It can be shown that the nonzero coefficient in e_i will move to the i″-th entry of q_1, where i″ = j_i N̂_t N̂_r² + i N̂_r + k_i. Using the expression i = j_i N̂_r + k_i, we have i″ = N̂_r² j_i (N̂_t + 1) + k_i (N̂_r + 1). The nonzero coefficients in q_1 are therefore located at N̂_r² m(N̂_t + 1) + k(N̂_r + 1) for 0 ≤ m ≤ N̂_t − 1 and 0 ≤ k ≤ N̂_r − 1.
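The last index expression follows from direct substitution, which the brute-force loop below confirms for small N̂_t and N̂_r (written Nt, Nr):

```python
# Check j*Nt*Nr^2 + i*Nr + k == Nr^2*j*(Nt+1) + k*(Nr+1), where i = j*Nr + k.
checks = 0
for Nt in range(1, 6):
    for Nr in range(1, 6):
        for j in range(Nt):
            for k in range(Nr):
                i = j * Nr + k
                lhs = j * Nt * Nr**2 + i * Nr + k
                rhs = Nr**2 * j * (Nt + 1) + k * (Nr + 1)
                assert lhs == rhs
                checks += 1
```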
Application of J_t ⊗ J_r. The noise term in (19) is q = (J_t ⊗ J_r) q_1, up to the factor σ²_n. As the selection matrix J_t ⊗ J_r is designed according to the signal part, we turn our attention to the signal-plus-noise vector r̃_y. To consider the application of J_t ⊗ J_r, we use the expression of r_2 in (33), unvectorize r_2, and shape it into an N̂_r² × N̂_t² matrix R_2 whose noise part is σ²_n Q_1, where Q_1 is the N̂_r² × N̂_t² matrix obtained by unvectorizing q_1. Using r̃_y = (J_t ⊗ J_r) r_2, we can unvectorize r̃_y and shape it into the (2M_r + 1) × (2M_t + 1) matrix R̃_y = J_r R_2 J_t^T. It follows that the noise part of R̃_y is σ²_n Q, where Q = J_r Q_1 J_t^T. The noise vector q in (19) can be obtained by vectorizing Q.
We can verify that, when we unvectorize q_1, the nonzero unity coefficients in q_1 move to the (k(N̂_r + 1), m(N̂_t + 1))-th elements of Q_1 for k = 0, 1, ..., N̂_r − 1 and m = 0, 1, ..., N̂_t − 1. Notice that the row indices k(N̂_r + 1), for k = 0, 1, ..., N̂_r − 1, are also the indices of the rows of Ā*_r ⊙ Ā_r whose elements are all equal to unity. When we apply the selection matrix J_r to Ā*_r ⊙ Ā_r, we keep only one row that corresponds to the unity rows of Ā*_r ⊙ Ā_r. Similarly, the column indices m(N̂_t + 1), for m = 0, 1, ..., N̂_t − 1, are also the indices of the rows of Ā*_t ⊙ Ā_t whose elements are all equal to unity. When we apply the selection matrix J_t to Ā*_t ⊙ Ā_t, we keep only one row that corresponds to the unity rows. Therefore, we are left with only one nonzero coefficient in the noise term. It remains to determine where this nonzero coefficient is in the matrix Q. As the unity row of Ã_t is in the middle of the matrix, i.e., the M_t-th row, and the unity row of Ã_r is also in the middle of the matrix, i.e., the M_r-th row, the nonzero coefficient moves to the (M_r, M_t)-th entry. Vectorizing Q, we obtain q as stated in Lemma 2.

APPENDIX C PROOF OF THEOREM 2
The signal part of Y_ss can be obtained using the result from [27], so we consider the noise part only. Let q_i,j = J_i,j q, where J_i,j is as defined in (20). Then the noise part of Y_ss is [q_0,0 q_0,1 ··· q_0,M_r q_1,0 ··· q_M_t,M_r].
We can obtain the expression in (22) if we can show that the column vectors in the above matrix are σ²_n e_0, σ²_n e_1, σ²_n e_2, ..., where e_i is the i-th column of the identity matrix I_{(M_t+1)(M_r+1)}. Notice that q_i,j = (J_t,i ⊗ J_r,j) q can also be expressed as q_i,j = vec(Q_i,j), where Q_i,j = J_r,j Q (J_t,i)^T and Q is the (2M_r + 1) × (2M_t + 1) unvectorized version of q given in (36). As shown in the proof of Lemma 2, the matrix Q has only one nonzero coefficient, right at the center of the matrix. The expression Q_i,j = J_r,j Q (J_t,i)^T means that Q_i,j is a submatrix of Q. An example of Q for M_t = M_r = 2 is shown in Fig. 9. The vector q_0,0 is the vectorized version of Q_0,0, which is shown in the figure, and q_0,1 is the vectorized version of Q_0,1, also shown. Each Q_i,j contains the nonzero entry of Q, and thus each q_i,j is a standard vector scaled by σ²_n. In particular, q_0,0 = σ²_n e_0 and q_0,1 = σ²_n e_1. Notice that, upon vectorization of Q_i,j, the location of the nonzero entry in q_i,j moves down by one when j increases by one, and it moves down by M_r + 1 when i increases by one. Therefore, the matrix in (37) is equal to σ²_n I_{(M_r+1)(M_t+1)}.
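The bookkeeping in this proof can be illustrated with a small numerical example. The sketch below assumes one particular indexing of the selection windows (submatrices anchored so that q_0,0 picks up the center entry at position 0), chosen to reproduce the shifts described above; the actual definition of J_i,j is in (20) of the paper.

```python
import numpy as np

s2 = 0.5                         # sigma_n^2 (arbitrary value)
M_t = M_r = 2
Q = np.zeros((2 * M_r + 1, 2 * M_t + 1))
Q[M_r, M_t] = s2                 # single nonzero entry at the center of Q

cols = []
for i in range(M_t + 1):         # ordering q_{0,0}, q_{0,1}, ..., q_{M_t,M_r}
    for j in range(M_r + 1):
        # Hypothetical selection windows: (M_r+1) x (M_t+1) submatrices of Q.
        Q_ij = Q[M_r - j: 2 * M_r - j + 1, M_t - i: 2 * M_t - i + 1]
        cols.append(Q_ij.flatten('F'))   # q_{i,j} = vec(Q_{i,j})

# The nonzero entry moves down by one as j grows and by M_r+1 as i grows,
# so the columns assemble into sigma_n^2 times the identity.
noise = np.column_stack(cols)
assert np.allclose(noise, s2 * np.eye((M_t + 1) * (M_r + 1)))
```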

APPENDIX D PROOF OF (24)
Using the beamforming prototype in (23), we have E[F_k(p_i) F*_k(p_j)] = Σ_n Σ_m E[β_k,n β_k,m] e^{−j(n p_i − m p_j)}. As E[β_k,n β_k,m] is equal to 1 for n = m and equal to 0 for n ≠ m, we have E[F_k(p_i) F*_k(p_j)] = Σ_n e^{−jn(p_i − p_j)}. Taking the magnitude of this expression, we obtain |E[F_k(p_i) F*_k(p_j)]| as given in (24).
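The expectation can be checked by Monte Carlo. The sketch below assumes i.i.d. ±1/√N_c coefficients (so the unit-power property of Example 1 holds); up to this scaling, the geometric-sum structure of E[F_k(p_i) F*_k(p_j)] is the same. Sizes and test frequencies are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
N_c, N_b = 8, 200000           # prototype length, number of realizations
p_i, p_j = 0.7, 1.9            # arbitrary test frequencies

# i.i.d. binary coefficients: E[beta_n beta_m] = (1/N_c) * delta_{nm}.
beta = rng.choice([-1.0, 1.0], size=(N_b, N_c)) / np.sqrt(N_c)
n = np.arange(N_c)
F_i = beta @ np.exp(-1j * n * p_i)   # F_k(p_i) for each realization
F_j = beta @ np.exp(-1j * n * p_j)   # F_k(p_j) for each realization

# E[F(p_i) F*(p_j)] = (1/N_c) * sum_n e^{-jn(p_i - p_j)} (geometric sum).
emp = np.mean(F_i * F_j.conj())
theory = np.sum(np.exp(-1j * n * (p_i - p_j))) / N_c
assert abs(emp - theory) < 0.02
```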