Robust Gridless Estimation of Angles and Delays for Full-Dimensional Wideband mmWave Channels

In this paper, we consider robust channel estimation for a millimeter wave (mmWave) massive MIMO system with uniform planar arrays (UPA). For many gridless angle estimation methods of mmWave channels, the channel gains needs to be time-invariant during training. We propose a gridless method that is applicable to time-invariant and time-varying channels, and the proposed method is robust to channel variations. This is done by employing time-varying precoders and combiners that are banded block Toeplitz (BBT) matrices. The advantage of time-varying BBT precoders is two-fold. First, with time-varying BBT precoding we can fabricate time-varying channels. Thus the channel can be estimated as if it is a time-varying channel whether the actual channel is time-varying or not. We will show that, thanks to fabricated time variation in the channel, a small training overhead can be used. Secondly, BBT precoding and combining preserve the shift invariance property of the received data matrix. Due to the preservation of shift invariance property, we can employ most existing direction finding methods for estimating the angles and delays, including low complexity gridless subspace based algorithms. Simulation results are given to show that the proposed time-varying BBT precoding provides robust and accurate channel estimates with a low complexity.


I. INTRODUCTION
High data rates have been achieved in mmWave communication systems [1] by employing a large number of antennas for beamforming. Due to heavy path loss, an mmWave channel typically consists of a few dominant paths. The estimation of an mmWave channel becomes a problem of finding the angles and gains associated with the paths. Experiments have shown that the angles of mmWave channels experience large-scale fading whereas the path gains experience small-scale fading [2]. In other words, AoA and AoD are relatively stationary, but the path gains can vary faster, leading to a time-varying channel.
In the design of angle estimation algorithms, there are often some implicit assumptions on the channel. For example, direction finding of narrowband mmWave systems has been developed for time-invariant channels [3], [4], [5], [6], [7], [8], [9] and time-varying channels [10], [11]. For time-invariant channels, angles estimation methods have been developed for uniform linear array (ULA) and uniform planar array (UPA). The sparse nature of mmWave has been exploited in compressed sensing based methods for angle estimation [3], [4]. In these methods, the angles are approximated by values drawn from dictionaries or grids, which restrict the estimation resolution. To eliminate the limitation of grids in compressed sensing based methods, super resolution methods have been proposed in [5], [6] based on atomic norm minimization and in [7] based on an iterative reweighted method. Super resolution angle estimation is proposed in [8] by designing the precoder and combiner so that efficient subspace-based methods can be applied for angles estimation. Beamspace angle estimation for multi-user transmission is considered in [9] when the transmitter is equipped with UPA and receiver with a single antenna. For time-varying channels, the path gains vary during training. In [10], [11] the path gains in different training blocks are assumed to be uncorrelated. When the path gains are uncorrelated, the covariance of the received data has a Toeplitz structure, which can be utilized for noise reduction [10]. Beamspace approaches to direction finding for time-varying mmWave channels are considered in [11]. The assumption of uncorrelated path gains requires the training time be longer than the channel coherence time so that there is enough variation in the path gains during training [10], [11]. The estimation accuracy is affected when the path gains are almost static due to little user movement. For methods that are developed specifically for time-varying channels or time-invariant channels, the estimation result is sensitive to the time-variation of the channel. Direction finding methods that are not sensitive to the time-variation of the channel have been considered in [12], [13]. A hierarchical beam training method using successive beams of different beamwidth is considered in [12]. The estimation accuracy is limited by the underlying grids. In [13] the precoder is obtained by selecting the columns of a Toeplitz matrix according to the spacing of a sparse linear array. An enlarged virtual ULA can be constructed for high resolution angle estimation. However, a large number of training vectors is generally needed therein as the estimation is based on statistics, and the method does not generalize to wideband channels or when the array is UPA. For wideband transmission, the delays of different paths need to be estimated as well. Subspace based and compressed sensing based channel estimation methods have been proposed for either time-invariant channels and time-varying channels. Subspace based methods are considered in [14], [15] for time invariant channels using closed-loop two-stage estimation. With the aid of feedback after the first stage, better estimates can be obtained. Subspace based methods have also been used for time-varying wideband channels and the variation in path gains is utilized for joint estimation of angles and delays [16], [17]. Compressive sensing based methods for MIMO-OFDM systems are proposed in [18], [19] for time-invariant systems and in [20], [21] for time-varying systems. Two channel estimation methods are developed in [18], one using a time domain approach that performs a 3D grid search, and the other a frequency domain approach that uses orthogonal matching pursuit (OMP) for each OFDM subcarrier. The estimation accuracy of the frequency domain approach can be further improved by exploiting the fact that AoA and AoD are the same in all subcarriers [19]. To track time-varying path gains, LMS adaptation is employed in [20] and Baysian learning is used in [21]. Due to adaptation, the estimation result is not sensitive to the channel variations. However the updates performed in these adaptive methods demand a high complexity as computations of large matrices are involved. In addition, in these compressed sensing based works [18], [19], [20], [21], path delays can not be estimated and the information is embedded in the subcarrier channels of the MIMO-OFDM system. In this case, the path gains of all the subcarrier channels need to be fed back to the transmitter and a large feedback overhead is needed [14].
In this paper, we propose a robust angle estimation method that is applicable to time-invariant and time-varying channels for mmWave systems equipped with UPA. We use precoders and combiners that are banded block Toeplitz matrices (BBT). We show that BBT precoding allows us to have partial control of the equivalent path gains. The equivalent channel is always time varying by employing time-varying BBT precoding. As a result the estimation of angles for mmWave channels is converted to a classic angle estimation problem that can be solved by many useful methods in the literature. Exploiting the time-varying property of the equivalent channel, the angles can be estimated using a small overhead, even if the channel is time-invariant. For wideband transmission, angles as well as delays need to be estimated. We show that we can design BBT precoders so that an artificial time-varying channel can be fabricated for wideband channels too. The path delays can be estimated along with the angles and they can be fed back with a small feedback overhead. The contributions of this paper are summarized as follows: r We propose to use time-varying BBT precoding so that both time-varying and time-invariant mmWave channels can be reliably estimated. As no assumption of prior channel information is used, it is suitable for angle estimation in the initial stage of channel acquisition.
r Through BBT precoding, the estimation of angles and delays of mmWave channels is converted to a classic harmonic retrieval problem that can be solved using well-developed methods in the literature with a low complexity. Adaptive estimation methods that are applicable to both time-invariant and time-varying wideband mmWave channels have also been developed in [20], [21]. However the algorithms in [20], [21] require the computations of large matrices and a high complexity is needed.
r Time-varying BBT precoding leads to artificial timevarying path gains and the proposed method needs only a small number of training blocks. For example, for narrowband channels, the minimum number of training blocks is equal to the number of paths L, for either time-invariant and time-varying channels.
r The path delays are estimated jointly with the angles. As the delays are explicitly estimated, they can be fed back to the transmitter directly using a small overhead. When the path delays are not estimated [18], [19], [20], [21], but embedded in subcarrier channels, a large feedback overhead is required [14]. Notation: The variance of a random variable x is denoted as σ 2 x . The 2-norm of a vector f is denoted as ||f|| and the k-th entry of f is [f] k . The notation A T , A * , A H denote, respectively, the transpose, the conjugate, the conjugate transpose. I n denotes an n × n identity matrix. Given an m × n matrix X, vec(X) is the nm × 1 vectorized version of X. The notation A B denotes the Khatri-Rao product and A ⊗ B the Kronecker product of two matrices A and B. The expectation of a random variable is denoted by E [·]. A useful property concerning the two products is as follows. Let X = ABC, then vec(X) = (C T ⊗ A)vec(B). When B is a diagonal matrix, we have vec(X) = (C T A)b, where b is a column vector that consists of the diagonal elements of B. Outline of the paper: A system model for narrowband transmission is given in Section II. In Section III, we introduce BBT precoding and develop angle estimation for narrowband channels. Section IV addresses the estimation of angles and path delays for wideband channels. Simulation results are given in Section V and a conclusion in Section VI.

II. SYSTEM MODEL FOR NARROWBAND TRANSMISSION
Consider the narrowband wireless system with N t transmit antennas and N r receive antennas in Fig. 1. For the m-th training block, the channel is modeled by an N r × N t matrix H[m]. We adopt the geometric channel representation that is useful for modeling mmWave propagation of L paths [2], is the complex gain of the i-th path. For the case the transmitter and receiver are each equipped with a uniform linear array (ULA), the two matrices A r , of size N r × L, and A t , of size N t × L, are given by where The angle parameters u i can be expressed in terms of AoA θ r i and the parameters p i in terms of AoD θ t i , where d is the antenna spacing and λ is the wavelength.
On the other hand, consider the case when the transmitter and receiver are each equipped with a uniform planar array (UPA) in the yz-plane [14]. Suppose the UPA at the receiver is N ry × N rz (N ry in the y direction and N rz in the z direction) and N r = N ry N rz , then A r , of size N r × L, is given by where A ry , of size N ry × L, and A rz , of size N rz × L, are defined as The angle parameters u y i and u z i can be given in terms of the angles in elevation 1 θ r i and angles in azimuth φ r i as u z i = (2π d/λ) cos θ r i , and u y i = (2π d/λ) sin θ r i sin φ r i . The i-th column of A r is given by Similarly for an N ty × N tz UPA at the transmitter with N t = N ty N tz , the matrix A t , of size N t × L, is given by where A ty , of size N ty × L, and A tz , of size N tz × L, are defined as A ty = a N ty (p y 1 ) a N ty (p y 2 ) · · · a N ty (p y L ) , The angle parameters p z i and p y i can be given in terms of the angles in elevation θ t i and angles in azimuth φ t i as p z i = (2π d/λ) cos θ t i , and p y Notice that we use only RF precoding and combining for angle estimation. Baseband precoder and combiner are not used and hence are not shown. Consider the transmission of training vectors in blocks of length N RF t , S = (s 0 s 1 · · · s N RF t −1 ), where s i is N RF t × 1. The m-th received block is given by where N[m] is the N r × N RF t noise matrix. The noise is white Gaussian with zero mean and variance σ 2 n . We assume the channel stays the same during a block of N RF t vectors but the path gains can vary for different training blocks. For simplicity, we will use S = σ s I N RF t . In this case, we have for In the following we review an angle estimation problem that is termed a harmonic retrieval problem in the literature. We also give a review of banded Toeplitz matrices that will be used later in Section III.
A brief review of harmonic retrieval problem [37]: Consider an antenna array of N r elements. Suppose there are L sources. The m-th received vector is given by is the channel noise, K is the number of received vectors, and A is the antenna response matrix that is determined by AoAs and the antenna array. For a ULA array, A is the same as A r in (2). The i-th column of A is (1 e ju i · · · e j(N r −1)u i ) ) T , which has the so-called one-dimensional (1D) shift invariance property, i.e., the bottom (N r − 1) elements identical to the top (N r − 1) elements except for a scalar of e ju i . Define Y = (y[0] y[1] · · · y[K − 1]). Then , of size L × K, consists of the source vectors. The estimation of the AoAs based on the received Y is a harmonic retrieval problem that can be solved using many methods. For example, it can be solved using ESPRIT, thanks to the shift invariance property. If the ranks of A and S d are both equal to L, then L paths can be identified. When a UPA array is used, A is the Khatri-Rao product of two ULA antenna response matrices like that given in (4). The estimation of AoA becomes a 2D problem. More generally, when A is the Khatri-Rao product of j ULA antenna response matrices, we have a j-dimensional harmonic retrieval problem. Each column of A has the j-dimensional shift invariance property. Estimation algorithms for such problems have been extensively studied in the literature, e.g., [24], [25], [26], [27].
A review of banded Toeplitz matrices: In the presence of precoding and combining, the received vectors are no longer in the span of shift-invariant antenna response vectors and subspace based methods are not applicable. For ULA, 1D shift invariant property can be preserved if the precoder is lower triangular banded Toeplitz [22]. An N × M lower triangular banded Toeplitz matrix B is given by Due to the Toeplitz structure, the matrix can be obtained from T , which will be referred to as the prototype of the matrix.

III. ANGLE ESTIMATION USING BBT PRECODING AND COMBINING
Consider the received block of vectors in (10) with UPA arrays. The UPA receive antenna response matrix A r is the Khatri-Rao product A ry A rz as given in (4) and A t = A ty A tz as in (7). Suppose the number of RF chains at the receiver can be factorized as N RF r = N RF ry N RF rz . We propose to use a combiner of the following form rz , are banded Toeplitz as in (12).  (13). We have where where the two matrices F y [m] and F z [m] are banded Toeplitz as in (12), of sizes N ty × N RF ty , and N tz × N RF tz , respectively. Let f y,m be the prototype of F y [m] and f z,m be the prototype of F z [m]. The Fourier transform of f y,m and f z,m are denoted as F y,m (ω) and F z,m (ω), respectively. Similar to the result in Lemma 1, we have Combining the results in (14) and (16) (10) is given by Observe thatĀ r is the antenna response matrix of an N RF ry × N RF rz UPA andĀ t is the antenna response matrix of an N RF ty × N RF tz UPA. The signal partĀ r D[m]Ā H t in (17) is the same as the received data of a system equipped with UPAs, but with no precoding and combining. The equivalent i-th path gain is [D[m]] i,i . The expression in (18) implies that we have partial control of the equivalent path gains through the precoder and combiner. We can introduce artificial fading to the equivalent path gains by using precoder and combiner that vary block by block. Consider the following prototype for G y [m], where ε m,n is randomly drawn from {0, 1, . . . , 2 N Q − 1} with equal probability, N Q is a chosen positive integer, and γ = Collecting K training blocks, we have Y = [y[0] y[1] · · · y[K − 1]], which can be written as  (19). Then Y (nb) in (22) is equivalent to the received matrix of a time-varying channel, whether the channel is timeinvariant or time-varying. The angles can be estimated using methods developed for harmonic retrieval problems.
An angle estimation algorithm based on Theorem 1 is given in Algorithm 1. If ULA are used at the transmitter and receivers, A r reduces to A rz , A t reduces to A tz , and thus A is the Khatri-Rao product of two ULA antenna response matrices. The problem in (22) becomes a 2D harmonic retrieval problem. When we use Y (nb) to estimate the channel, the number of identifiable paths is related to the size of Y (nb) , as shown in the following lemma.
Lemma 3: Consider the angle estimation problem in (22). We can identify L paths in the absence of noise if Proof: See Appendix B.
The above results means that the number of training blocks K can be as small as L. When the transmitter and receiver are equipped with ULAs, we can use the proof in Appendix B to show that the condition in (23) becomes L ≤ min(N RF t + N RF r − 2, K ). Lemma 3 also means that K should be at least L. The overhead can be further reduced at the cost of a smaller measurement dimension by applying spatial smoothing [16]. After estimates of the angles are obtained, the path gains can be estimated using a least square approach. For the estimation of path gains, the receivers need to know the precoder used. For this, we can generate the random phase in (19) using a random number generator and the seed can be made available to the receiver ahead of time.
Remarks: In the above discussion, we have considered angle estimation for only point-to-point transmission. The proposed method can also be used for uplink multi-user hybrid mmWave MIMO systems. This can be done by designing training vectors as in [34]. The training vectors of different users are time-multiplexed so that at any instant, the received vector is from only one user. As a result, angle estimation for the multi-user case can be done for each user separately. Notice that the coefficients of the analog RF precoder are changed between two training blocks. To allow time for changing the values of the RF phase shifters, we can leave time between two training blocks and send out training blocks as fast as it is allowed. Time-varying precoders are also employed in [3], [4], and [12]. Also note that an BBT precoder contains some nonzero and zero coefficients. The nonzero coefficients can be implemented using N Q -bit phase shifters. For the implementation of the zero coefficients, switches may be used [30].

IV. CHANNEL ESTIMATION FOR WIDEBAND TRANSMISSION
In the narrowband case the delays of different paths are small compared to the symbol period T and they are ignored. For wideband transmission, the delays need to be taken into consideration. In this section, we show that BBT precoding can be used for the joint estimation of angles and delays whether the channel is time-varying or time-invariant. Suppose the training time is divided into K training intervals. For the q-th training interval, a wideband geometric channel of L paths is of the form [18] where A r and A t are as in (4) and (7), D q (t ) is an L × L diagonal matrix with the i-th diagonal element given by is the gain of the ith path gain in the q-th training interval, p(t ) is the pulse shaping function, and τ i is the delay associated with the i-th path. Suppose p(t ) has an excess bandwidth with respect to the symbol rate β and a finite support [0, L p T ). Let τ max be the maximum delay spread and N c T = L p T + τ max , then H q (t ) is nonzero for t ∈ [0, N c T ). The path gains remain the same in each training interval but can change to different values in different training intervals due to fading [29]. For the q-th training interval, the precoder and combiner are, respectively, F[q] and G[q], which are used for the entire q-th training interval. Suppose M = N c N RF t training vectors are transmitted in each training interval and the received signal is sampled with P times the symbol rate, i.e., the sampling period T s = T P . The received vector at the q-th training interval is given by where t = 0, T s , . . . , (MP − 1)T s , and n = t T . Consider the following training sequence where [e i ] i = 1 and [e i ] j = 0 for j = i. That is, N c − 1 zero vectors are inserted between every two nonzero training vectors to prevent inter-symbol interference. The resulting training sequence for the q-th training interval is thus σ s e 0 0 · · · 0 σ s e 1 0 · · · 0 · · · σ s e N RF t −1 0 · · · 0 (27) Letȳ q (t ) = [y T q (t ) y T q (t + N c T ) · · · y T q (t + N c (N RF t −1)T )] T for t = 0, T s , . . . , (N c P − 1)T s . We collect all the received vectors in the q-th training interval into an N RF r N RF t × N c P matrix, It turns out that the column space ofỸ[q] carries the angle information while the the row space ofỸ[q] carries the delay information, as stated in the following lemma.

Lemma 4: Suppose the precoder F[q] and combiner G[q] are BBT as in (13) and the training sequence in (26) is used. ThenỸ[q] can be expressed as
where A is as defined in (21), D[q] is a diagonal matrix whose diagonal elements are as given in (18), and P = (p(τ 1 ) p(τ 2 ) · · · p(τ L )) is an N c P × L matrix with the i-th column given by p( The expression in (28) is free from inter-symbol interference. Using the training sequence in (26), we have transformed a wideband angle estimation problem to a narrowband problem. The information of angles is buried in the antenna response matrix A while the information of delays is embedded in the matrix P. We can further process P to construct an antenna response matrix corresponding to the delays in a manner similar to [36]. To do this, let U o be the N c P × N c P DFT matrix, where N c ≥ N c , and let U be the N c P × N c P submatrix of U o that consists of the first N c P rows. Observe that the i-th column of P is p(τ i ), which consists of samples of the pulse shaping function with sampling period T s . The i-th row of P T U contains the N c P-point DFT of p T (τ i ). We assume that the oversampling factor P ≥ (1 + β ) so that there is no aliasing when p(t ) is sampled. In this case, each row of P T U contains at most N c (1 + β ) nonzero coefficients. Defined the N c P ×N d selection matrix where D ξ is an L × L diagonal matrix with the i-th diagonal element give by where (N [q]) and d [q], consisting of the diagonal elements of D[q]D ξ , is given by Then can be expressed as where [1] · · · n [K − 1]] and we have added the subscript 'wb' to indicate the wideband case. The L × 1 vector d [q], containing the equivalent path gains of the q-th training interval, is partially controlled by the precoder and combiner. Artificial fading can be introduced by using BBT precoder and combiner that vary in different training intervals. Notice that (33) has the same form as the harmonic retrieval problem in (11). It is a five-dimensional harmonic retrieval problem asĀ d A contains five ULA antenna response matrices. We can use, e.g., [27], to estimate the four types of angles and delays. When the transmitter and receiver are equipped with ULA instead of UPA,Ā d A contains three ULA antenna response matrices and the harmonic retrieval problem is three-dimensional. Summarizing, we have the following result.
Theorem 2: Suppose we design the training sequence as in (26) and use BBT precoder and combiner with prototypes as in (13). Then, Y (wb) in (33) is equivalent to the received matrix of a time-varying channel with path gain vector d [q] for q = 0, 1, . . . , K − 1. The angles and delays can be estimated as in a typical harmonic retrieval problem, whether the channel is time-invariant or time-varying.
Notice that (32) has a form similar to that in (18). However the index 'q' of d [q] in (32) is the index of training interval, whereas the index 'm' of d[m] in (22), is the index of training block. The procedure for the proposed wideband estimation is given in Algorithm 2. Following a procedure similar to the proof of Lemma 2, we can show that L paths can be identified if That is, the number of training intervals K can be as few as L. In the case ULAs are used, the condition in (34) becomes L ≤ min(N RF t + N RF r +N d − 2, K ). Remarks: Oversampling has also been considered in [17] [16], [36] for joint estimation of angles and delays. However, these works require the path gains be time-varying. The time separation of the training intervals needs to be larger than the channel coherence time so that the channel is time-varying.
Computational Complexity: The complexity of the proposed angle estimation method depends on the algorithm used for harmonic retrieval. We analyze the complexity for the case when ESPRIT is used. We first form the received matrix Y (nb) as in (22) for the narrowband case, or Y (wb) (33) for the wideband case. Then we compute the empirical covariance , based on which the signal subspace is extracted and ESPRIT is applied. We list the complexity for narrowband and wideband cases in Table 1. For the step of subspace extraction, orthogonal iteration [37], [38] with i iterations is used in Table 1. We have included in Table 1 the complexity of angle pairing using the method in [33].

V. SIMULATION RESULT
In the following examples, we use the mmWave channel model in (1) where y q (t ) is obtained by sampling the received vector signal as in (25). 4: Construct the data matrix of the q-th training interval as as in (31). 6: Form the overall data matrix where y [q] is the vectorized version of Y [q]. 7: Obtain estimates of AoAs, estimates of AoDs and estimates of delays from Y (wb) using harmonic retrieval methods.
spacing. For the wideband experiment in Example 5, practical mmWave channels with clustered paths are generated using the simulator in [39]. For time-varying channels, the variation of the path gains is modeled using the first-order Gauss-Markov process as in [20], The channel correlation coefficient is evaluated using Jake's model, = J 0 (2π f d T b ), where J 0 (.) is the zeroth order Bessel function of the first kind, f d is the maximum Doppler frequency, and T b is the time separation between two consecutive blocks. The random variable w i [m − 1], independent of α i [m − 1], is a circular symmetric complex Gaussian random variable with unit variance. The prototypes of the random precoder and combiner are designed using N Q = 4 in (19).  Fig. 2(a) shows the average mean squared error (MSE-A) of the angles 1 We have used θ t 1 = 75 • , θ t 2 = 55 • , θ r 1 = 80 • , and θ r 2 = 60 • . The plots are given for SNR σ 2 s /σ 2 n = 0 dB, 10 dB, and 20 dB. As a benchmark, the Cramér-Rao Bound (CRB) [40] of the estimation error has been plotted. The gap between the estimation error and the CRB bound is around 1 dB. Fig. 2(b) shows the normalized mean square error (NMSE) of the channel for a time-invariant channel ( = 1 in (35)) with L = 3 paths. The

NMSE is defined as
whereĤ is the estimated channel matrix. The AoA and AoD are uniformly distributed over [−π, π] and SNR σ 2 s /σ 2 n = 0 dB, 10 dB, and 20 dB. We see that the NMSE decreases with K for a time-invariant channel. The proposed method works for both time-invariant and time-varying channels.
Example 2: Consider the case when UPAs are used at both the transmitter and receiver for the narrowband channel in (1) with N t = N r = 64 (N ty = N tz = N ry = N rz = 8) and N RF t = N RF r = 4. We use BBT precoder and combiner with N RF ty = N RF tz = N RF ry = N RF rz = 2. Fig. 3(a) shows the average mean squared error as a function of the number of training blocks. The channel is time-varying and there are L = 2 paths The CRB [40] has also been plotted for comparison. In the above simulations we have assumed that knowledge of the number of paths L is available. When L is not known a priori, it can be estimated using model estimation [28]. Fig. 3(b) shows the NMSE of a time-invariant channel when L is known and when L is estimated using model estimation [28]. The angles are uniformly distributed  over [−π, π] and K = 40. We see that having no prior knowledge of L leads to a small penalty in estimation error.
Example 3: Consider the wideband model in (24) when the transmitter and receiver are both equipped with UPA and the channel is time-varying with L = 2. The pulse shaping function p(t ) is a raised cosine function with excess bandwidth β = 0.3 and oversampling factor P = 2. The setup of antennas and RF chains is the same as that in Example 2, i.e., 64 antenna elements, and 4 RF chains. The AoAs and AoDs are the same as those in Fig. 3(a). The two path delays are τ 1 = 1.3T and τ 2 = 2.4T. Fig. 4(a) gives the MSE-A of the angles and Fig. 4 T , as given in (30). For delay estimation, we choose N c = 16 in the construction of DFT matrix and N d = 16 for the selection matrix in (29). It can be seen from Fig. 4 that both errors are close to the CRB [35], [40] for a wideband channel.  at both transmitter and receiver. Fig. 5 shows the MSE-A for p 1 = −0.2π , p 2 = 0.1π, u 1 = −0.1π, u 2 = 0.2π, SNR σ 2 s /σ 2 n = 5 dB, and K = 4 training blocks. The result is compared with those of [6], [8], and [11]. The channel is assumed to be time-invariant during training in [6], [8], and assumed to be time-varying in [11]. We see that the error of [11] increases with as the path gains become more static in time. Also, the performances of both [6] and [8] are degraded when is small and the path gains vary faster. With the proposed method, the error is not significantly affected by . Thanks to artificial fading introduced by time-varying BBT precoding, the proposed method is much more robust to channel correlation.
Example 5: In the data transmission to be followed after channel estimation, the precoder and combiner are designed according to the estimated channel, which results in degradation. In this example, the channel is time-invariant, N t = N r = 32, N RF t = N RF r = 8, L = 5, and ULAs are used at the transmitter and receiver. We consider narrowband transmission in Fig. 6 and wideband transmission Fig. 7.
For a narrowband scenario, Fig. 6(a) shows the achievable rate, which is computed using log 2 (det(I N RF where ρ = σ 2 s /σ 2 n , G opt consists of the dominant left singular vectors ofĤ, F opt consists of the dominant right singular vectors ofĤ, and R = G H opt G opt . The result is compared with [3], [4], [7], [12]. In [12], hierarchical beam training is used for closed loop angle estimation; feedback is required before the training of the next stage. A total of 300 training vectors are used. For [3], [4], the angles are estimated using compressed sensing based methods with a dictionary size of 64 for both AoA and AoD. OMP algorithm is used in [3] and generalized block OMP (G-BOMP) algorithm is used in [4]. For the iterative reweighted method [7], the estimates are gridless. The number of training vectors for [3], [4] [7] and the proposed method is 128. Fig. 6(b) plots the NMSE for the five methods in Fig. 6(a). In terms of complexity, it is in the order of 8.6 * 10 4 for the proposed method (computed from  8.2 * 10 7 for [7]. The performance of the proposed method is comparable to that of the super resolution method in [7], but it enjoys a much lower complexity. In Fig. 7, we consider joint estimation of delays and angles for a wideband channel. The mmWave channel simulator NYUSIM [39] is adopted to generate non line-of-sight channels with clustered paths. We have used carrier frequency 28 GHz, bandwidth 500 MHz and sampling frequency 1 GHz in the simulations. We plot in Fig. 7(a) the achievable rate, which is given by [19] where N f n is the κth subcarrier channel, for κ = 0, 1, . . . , N f − 1. We have used N f = 128. The pulse shaping function p(t ) is a raised cosine function with excess bandwidth β = 0.3. We choose P = 2, N c = 64 andN d = 64 for delay estimation. Twelve training intervals, i.e., 5472 training vectors, are used in Fig. 7(a). Also shown in Fig. 7(a) are two compressed sensing based methods, [18] and [19]. For [18], [19], the dictionary size is 64 for both AoA and AoD, and the number of training vectors is 5580. In [18], [19], the actual delays are not estimated, but embedded in the subcarrier path gains. For the feedback of delay information, all the subcarrier path gains need to be fed back to the transmitter as explained in [14]. It can be seen from Fig. 7(a) that the proposed method can achieve a rate close to that of perfect CSI. In terms of complexity, it is in the order of 2.1 * 10 7 for the proposed method (computed from Table 1), and 1.5 * 10 8 for both [18] and [19]. In Fig. 7(b), we plot the achievable rate using the parameters in Fig. 7(a) except that the number of training intervals is reduced to one, i.e., 456 training vectors, for the proposed method. The number of training vectors used for [18], [19] is 558. The order of complexity for the proposed method is the same as in Fig. 7(a), i.e., 2.1 * 10 7 , and it is 3.3 * 10 7 for both [18] and [19]. For the proposed method, using a small overhead causes some loss in the achievable rate, but the gap with the curve of perfect CSI narrows as SNR increases. In addition, our feedback overhead is much lower since the actual delays are estimated for each path. Therefore, our method enjoys a much lower complexity and feedback overhead by comparison.

VI. CONCLUSION
In this paper, we propose an angle estimation method that can be applied to time-invariant or time-varying channels. We use BBT precoder and combiner with time-varying prototypes to convert the angle estimation problem for mmWave systems to a classic harmonic retrieval problem. Only a small training overhead is needed. We have considered narrowband transmission with ULA and UPA, and wideband transmission with ULA and UPA. For each case the angles can be estimated efficiently using methods developed in the literature for time-varying channels. We have demonstrated that gridless estimates can be obtained reliably with low complexity and low training overhead whether the channel is time-invariant or time-varying.   (14).

APPENDIX B PROOF OF LEMMA 3
We first consider the detection of AoD parameters p y 1 , p y 2 , . . . , p y L . For the noiseless case, we have Y = AS d . Suppose ESPRIT is used. We compute the SVD to Y to obtain an N RF r N RF t × L matrix E s that has the same column space as A. If N RF ty ≥ 2, we can define the (N RF ty − 1) × N RF ty selection matrices J 1 = I N RF ty −1 0 and J 2 = 0 I N RF ty −1 . Also define m = J m ⊗ I N RF tz ⊗ I N RF ry ⊗ I N RF rz , m = 1, 2. It is known that [27] if 1 E s has full rank, then e j p y 1 , e j p y 2 , . . . , e j p y L are the eigenvalues of ( 1 E s ) † 2 E s , where ( 1 E s ) † denotes a left inverse of 1 E s . Notice that 1 E s has full rank if the rank of 1 Y is L, which is true if the ranks of S d and A are both L, where A =Ā * ty Ā * tz Ā ry Ā rz andĀ ty = J 1Āty . The rank of S d is equal to L if L ≤ K. Observe that A is the Khatri-Rao product of four Vandermonde matrices. The rank of a Khatri-Rao product has been shown to be related to Kruskal rank (k-rank) [32]. The k-rank of a matrix X, denoted by krank(X), is defined as the maximum number k such that every k column vectors of X are linearly independent, whereas the rank of X is the maximal number of linearly independent columns. The k-rank of a matrix satisfies rank(X) ≥ krank(X) and for a Vandermonde matrix X, krank(X) = rank(X). It has been shown in [32] that, for two matrices N ∈ C n×l and M ∈ C m×l , krank(N M) ≥ min(l, krank(N) + krank(M) − 1), (37) if krank(N) ≥ 1 and krank(M) ≥ 1. This implies krank(Ā ry Ā rz ) ≥ min(L, krank(Ā ry ) + krank(Ā rz ) − 1). Using this result in (37)  Therefore rank(A ) = L if L ≤ N RF ty + N RF tz + N RF ry + N RF rz − 3. Combining this with the condition L ≤ K yields (22). In a similar manner, we can verify that using the same condition, we can obtain AoD parameters p z 1 , p z 2 , . . . , p z L and also AoA parameters u y 1 , u y 2 , . . . , u y L and u z 1 , u z 2 , . . . , u z L . The result in Lemma 3 follows.

APPENDIX C PROOF OF LEMMA 4
As the noise part is straightforward, we consider only the signal part, i.e., the noiseless case. When the training sequence is as in (26), the received vector is ISI-free and y q (t + iN c T ) can be written as y q (t + iN c T ) = σ s G H [q]H q (t )F[q]e i . Then Y q (t ) = y q (t ) y q (t + N c T ) · · · y q (t + N c (N RF t − 1)T ) is equal to σ s G H [q]H q (t )F[q]. Using BBT precoder and combiner, we haveȲ q (t ) = σ sĀr D H g [q]D q (t )D f [q]Ā H t . Notice that the column vector that consists of the diagonal elements of σ s D H g [q]D q (t )D f [q] can be expressed as D[q]p(t ), wherē p(t ) = (p(t − τ 1 ) p(t − τ 2 ) · · · p(t − τ L )) T . Thus, the vectorized version ofȲ q (t ) can be written asȳ q (t ) = AD[q]p(t ). Puttingȳ q (0),ȳ q (T s ), · · · ,ȳ q ((N c P − 1)T s ) together, we obtain (28).

APPENDIX D PROOF OF (30)
Let P( j ) be the continues time Fourier transform of p(t ), then the Fourier transform of p(t − τ i ) is P( j )e − j τ i . When the sampling rate is larger than the Nyquist rate and there is no aliasing, the discrete time Fourier transform of p(nT s − τ i ) is ∞ n=−∞ p(nT s − τ i )e − jnω = 1 T s P( j ω T s )e − j ω Ts τ i , where |ω| ≤ π. By definition, the i-th row of P T U is p T (τ i )U. Notice that p T (τ i )U computes the N c P-point DFT of p T (τ i ) as U is a submatrix of a DFT matrix. Thus It follows that Using the above expression, we arrive at (30).