A Novel Beam Alignment Scheme for Mobile Millimeter-Wave Communications Based on Compressed Sensing Aided-Kalman Filter

In millimeter wave (mmWave) communications, fast and reliable beam alignment is crucial to the achievable data rate. The problem becomes even more challenging when the transmitter, the receiver, or both are moving, since the best beam pair must then be updated in a timely manner. Some existing work tracks the channel changes with a Kalman filter (KF). Alternatively, compressed sensing (CS) approaches can reconstruct the channel by exploiting the sparsity of mmWave channels. However, these methods either need a full scan over all possible beam pairs or require frequent beam training, leading to high overhead. In this paper, a novel beam alignment scheme based on an integrated KF and CS framework is proposed for the single-user mmWave channel. Since CS performance depends heavily on the sparsity level, the proposed method increases the signal sparsity by applying adaptive CS to the observation residual computed from the previous estimate of the support to predict the angles of the dominant paths, while the corresponding path gains are tracked by a reduced-order KF. To further exploit the spatial correlation of sparse signals, our approach adapts the sensing matrix based on the previous estimate to enhance the estimation accuracy, and adopts weighted CS to shrink the possible range of solutions. To balance the estimation accuracy and the alignment overhead, a switching mechanism is proposed that determines when to switch the estimation policy between beam training and beam tracking. Simulations are performed to evaluate the proposed method with respect to several important factors, such as signal-to-noise ratio (SNR), number of available beams, overhead for beam alignment, beam variation rate, and the dominance of the line-of-sight path.


I. INTRODUCTION
MILLIMETER Wave (mmWave) is a promising technology to fulfill the ever-growing demand for high-data-rate transmission. It is the core technology of the Fifth Generation (5G) cellular networks and beyond, with numerous appealing use cases such as Vehicle-to-Everything (V2X) [1], mobile Augmented Reality (AR)/Virtual Reality (VR), urgent health care, and so on [2], [3]. Compared to their sub-6 GHz counterparts, mmWave signals suffer much higher attenuation due to distance-dependent propagation loss and blockage [4]. To remedy this, highly directional transmission and reception through beamforming has been widely considered. Thanks to the small wavelength of mmWave, it is feasible to install many antennas on mmWave devices so as to leverage the beamforming gain provided by large antenna arrays.
To achieve a reliable and high data rate using mmWave, channel state information (CSI) is necessary for designing the transmit and receive beams. For mmWave communications with large antenna arrays, acquiring CSI can be costly because the training overhead of conventional pilot-based channel estimation schemes grows with the number of antennas. Several solutions have been proposed to address this issue. For example, CSI can be obtained using the beamspace channel representation [5], [6], also referred to as the virtual channel [7], which has a lower dimension than the antenna-space channel. Alternatively, the mmWave channel can be cast into a parametric model using the angles of departure (AoDs), angles of arrival (AoAs), and the path gains [8].
Since the number of paths in mmWave channels is small (i.e., the channel is sparse), the number of channel parameters to be estimated is small, which facilitates CSI acquisition. The sparsity of the mmWave channel has also led to the recent development of compressed sensing (CS)-based channel estimation schemes for mmWave [9], [10]. CSI acquisition becomes challenging when the channel is time-varying due to movement, in which case CSI needs to be updated in a timely manner to ensure that the beams used by the transmitter and the receiver remain aligned. Unless the training rate matches the channel variation speed, beam misalignment may occur frequently, which degrades the achievable data rate and may even cause link failure. In typical wireless systems, such as WiFi and LTE, training signals are transmitted periodically, and the channel variation speed is unlikely to be predicted accurately enough. Further, simply increasing the training frequency is undesirable because of the high training overhead and the suspension of data transmission during beam training.
To acquire CSI timely in time-varying mmWave channels, efficient beam alignment strategies are of vital importance. Similar to channel tracking in time-varying multi-input and multi-output (MIMO) systems, channel statistics can be used to track the channel changes of time-varying mmWave channels. Furthermore, the sparse feature of mmWave channels suggests that it is possible to track channel changes using a small number of training signals. For mobile mmWave communications, how to accurately track the channel changes with low overhead is a great challenge, which is the focus of this work.

A. RELATED WORK
The need for efficient beam alignment methods for time-varying mmWave channels has stimulated numerous studies recently. The subspace tracking technique used in conventional MIMO systems is applied to mmWave hybrid beamforming systems in [8], which updates the analog precoder and combiner sequentially according to temporal channel statistics in terms of the correlation coefficient. One limitation of this approach is that the orthogonal basis of the subspace to be tracked needs to be updated one vector at a time to ensure good tracking accuracy, resulting in high tracking overhead. Tracking overhead can be reduced by exploiting the sparsity of the mmWave channel, as done in compressed sensing (CS)-based beam tracking. CS-based beam tracking does not track AoAs/AoDs directly. Instead, it tracks the support set (i.e., the indices of the non-zero elements) of the equivalent MIMO mmWave channel [5] or functions of the AoAs/AoDs, such as the spatial frequency [7] or the spatial direction [11], [12]. Existing CS-based beam tracking algorithms have the following weaknesses. (i) While the sparsity is exploited in [5], the support set is tracked using both uplink and downlink channel sounding, which still results in high tracking overhead. Besides, the work in [5] focuses on tracking angular variations, while the channel gain information is also important to mobile mmWave communications. (ii) To track the spatial frequencies or the spatial directions of the mmWave channel, the receiver needs to feed back the measured channel matrix [7], [12], which causes high feedback overhead. (iii) The work in [11] presumes a linear motion model and the presence of a LoS component, which may not always be valid in practice. (iv) To improve the estimation accuracy, CS is combined with Bayesian inference (BI), leading to a so-called turbo-BI algorithm for tracking the sparse massive MIMO channel. Turbo-BI employs two modules: a linear minimum mean square error (MMSE) estimator followed by a sparsity combiner. The output of the sparsity combiner is the posterior distribution of the beamspace channel, which is used to refine the MMSE estimate. These two modules are executed iteratively until convergence, leading to high computational complexity.
The Kalman filter (KF) is another popular method for tracking channel variations owing to its simplicity. Several KF-based mmWave channel tracking algorithms have been reported [13], [14], [15], [16], [17]. In a KF, the system dynamics are captured by the so-called state, which is estimated based on a series of measurements. The native KF assumes a linear relationship between the state and the collected measurements. In the context of channel tracking, such linearity may not hold because the measurement (i.e., the received training signal) is not linear with respect to the state (i.e., AoAs/AoDs). In this case, methods such as linear approximation [13] and Gaussian approximation [15] are widely used. However, existing KF-based tracking algorithms for mmWave channels have the following limitations. (i) Only the angular variation of the channel is tracked [13], [14], and thus an additional mechanism is required to estimate channel gain variations. The work in [16] tracks the complex channel coefficients assuming they are governed by the correlation coefficient, which needs to be known a priori. (ii) The state is updated using full beam training, so the training overhead grows with the channel dimension. Besides, the sparsity of the mmWave channel is not exploited in [17].
Besides CS and KF, other beam tracking methods based on channel distributions have also been studied. In [18], a probabilistic estimation approach is proposed to track the temporal variation of the mmWave channel using the angular information embedded in the preamble. This approach saves tracking overhead because no spatial scanning is required, but the received RF signal needs to be stored before baseband processing. In [19], the idea of dedicated beams is proposed to track users with different mobility. For high-mobility users, additional training symbols, referred to as dedicated beams, are used to estimate the posterior distribution of the time-varying mmWave channel. This beam tracking algorithm performs better than conventional KF-based beam tracking [14], yet the posterior channel distribution needs to be calculated recursively. Another beam tracking algorithm based on posterior updates is presented in [20] for mmWave-supported inter-unmanned aerial vehicle (UAV) communications. Compared to [19], the beam tracking algorithm in [20] is more lightweight, but it only tracks the angle variation.

B. PAPER CONTRIBUTION
This work proposes a novel beam alignment approach for the mobile mmWave channel. The paper has the following contributions.
• Our approach is based on [21], which uses a fixed sensing matrix to track the time-varying support of a sparse signal. In our work, the sensing matrix is adaptively determined according to the channel variations. During beam training, the sensing matrix is formed by a subset of the available analog beams at the BS and the UE instead of all of them as required in full beam scanning. To track the channel variation, the proposed beam tracking method uses a very small sensing matrix chosen according to the tracking result in the previous slot. To improve the tracking accuracy with a small sensing matrix, we propose a weighted CS, where the weights are also designed based on the past tracking results. This differs from [21], which uses a regular CS. Moreover, we propose a switching mechanism to alternately perform beam training and beam tracking according to a hypothesis test result. A similar idea is used in [21] for detecting the change of the support set, but not for switching between two different procedures. Besides, we explain how to determine the threshold used in the hypothesis test, which is not specified in [21].
• Our approach can track both angular and gain variations of the sparse mmWave channel. This is achieved by using CS to find the angles of the dominant paths and a reduced-order KF to track the path gains. Notice that CS and KF are coupled in the sense that CS is performed based on the KF output. Most existing approaches track only the angular variations but not the channel gains [5], [11], [13], [14], [17], [20].
• The proposed approach does not rely on a specific channel variation model as assumed in [11], and thus it is not restricted to a particular moving scenario. Besides, it updates the angles and path gains of the underlying channel by jointly using CS and KF as mentioned, without the recursive procedures required in [18], [19].
Notation: Upper-case and lower-case boldface letters $\mathbf{A}$ and $\mathbf{a}$ denote a matrix and a vector, respectively. $(\cdot)^T$, $(\cdot)^*$, $(\cdot)^H$, $(\cdot)^{-1}$, and $\|\cdot\|_p$ denote the transpose, conjugate, conjugate transpose, matrix inverse, and $\ell_p$ norm, respectively. $\mathrm{diag}\{\mathbf{a}\}$ denotes the diagonal matrix whose diagonal elements are the elements of the vector $\mathbf{a}$. $E\{\cdot\}$ denotes the expectation. Let $S$ denote the support set of $\mathbf{x}$, i.e., the indices of its non-zero elements. $\mathbf{a}_S$ denotes the sub-vector of length size($S$) containing the elements of $\mathbf{a}$ corresponding to the indices in $S$. For a matrix $\mathbf{A}$, $\mathbf{A}_S$ represents the sub-matrix obtained by extracting the columns of $\mathbf{A}$ corresponding to the elements in $S$. We also use the notation $\mathbf{A}_{S_1,S_2}$ to denote the sub-matrix of $\mathbf{A}$ containing the columns and rows corresponding to the indices in $S_1$ and $S_2$, respectively. The set operations $\cup$, $\cap$, and $\setminus$ have the usual meanings (note that $S_1 \setminus S_2$ denotes the elements of $S_1$ not in $S_2$). $S^c$ denotes the complement of $S$. We use the notation $\mathcal{CN}(\mathbf{m}, \mathbf{R})$ to denote the complex Gaussian distribution with mean vector $\mathbf{m}$ and covariance matrix $\mathbf{R}$. Finally, $\mathbf{I}_N$ is the $N \times N$ identity matrix.

II. SYSTEM MODEL
In this work, a mmWave system with one fixed base station (BS) and a single moving user equipment (UE) is considered. The BS and the UE have N BS and N UE antennas, respectively, in uniform linear array (ULA). For the N-element ULA, the N × 1 steering vector is given by where . . , N} and the spatial frequency representing the physical direction, λ is the signal wavelength and d is the antenna spacing.
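As a minimal sketch of the ULA steering vector described above, the following Python snippet assumes a unit-norm vector indexed from 0 to N − 1 and parameterized by the spatial frequency. The function names, the normalization, the zero-based indexing, and the half-wavelength default spacing are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def spatial_frequency(theta: float, d_over_lambda: float = 0.5) -> float:
    """Map a physical angle theta (radians) to a spatial frequency.

    Assumes the common relation psi = 2*pi*(d/lambda)*sin(theta).
    """
    return 2.0 * np.pi * d_over_lambda * np.sin(theta)


def steering_vector(psi: float, n_antennas: int) -> np.ndarray:
    """Unit-norm N x 1 ULA steering vector for spatial frequency psi.

    Assumed form: [1, e^{j psi}, ..., e^{j psi (N-1)}] / sqrt(N).
    """
    n = np.arange(n_antennas)
    return np.exp(1j * psi * n) / np.sqrt(n_antennas)
```

With this convention, a path arriving from broadside (theta = 0) yields a vector whose entries are all equal to 1/sqrt(N).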

A. CHANNEL MODEL
The BS and the UE are synchronized in time, which is divided into equal-length time slots. Denote by $H_t$ the channel matrix at slot $t$, which remains constant within a slot but varies across slots. Following the widely used Rician fading model, where $K$ is the Rician factor, $\alpha_{l,t}$, $\psi_{l,t}$, and $\phi_{l,t}$ are the complex path gain, the angle of arrival (AoA), and the angle of departure (AoD), respectively, of the $l$th path, $l = 0, 1, \ldots, L$, in slot $t \ge 0$. At mmWave frequencies, the number of resolvable paths is often small, such that the channel is sparse in the angular domain. To see this, divide the spatial angle from 0 to $2\pi$ into $Q$ spatial bins. With sufficiently large $Q$, it is reasonable to assume $\psi_{l,t}, \phi_{l,t} \in \{0, \frac{2\pi}{Q}, \ldots, \frac{2\pi(Q-1)}{Q}\}$. This allows us to transform the spatial-domain channel matrix $H_t$ to the angular domain as given by where Here $a_{UE}(\psi_i)$ and $a_{BS}(\phi_j)$ are the steering vectors corresponding to the $i$th angular bin and the $j$th angular bin for AoA and AoD, respectively. Thus, the $(i, j)$th element of the angular-domain channel matrix represents the path gain between the $j$th angular bin of the BS and the $i$th angular bin of the UE. When the number of paths $L$ is small, the condition $(L + 1) \ll Q$ implies that only $L + 1$ elements of the angular-domain channel matrix have non-zero values and the rest are zero. This sparsity suggests that the channel between the BS and the UE can be characterized by $L + 1$ dominant beam pairs, each with a specific gain and corresponding angular bins. When the UE moves, both the gains and the angles of these $L + 1$ beam pairs may change over time, leading to a time-varying mmWave channel that needs to be tracked.
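The sparsity argument above can be checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's exact model: it uses the unit-norm steering vector with zero-based indexing, sets the number of angular bins equal to the number of antennas so that the bin dictionaries are unitary, and omits the Rician scaling. All function names are hypothetical.

```python
import numpy as np


def bin_dictionary(n_antennas: int, n_bins: int) -> np.ndarray:
    """Columns are steering vectors at spatial frequencies 2*pi*q/Q."""
    psi = 2.0 * np.pi * np.arange(n_bins) / n_bins
    n = np.arange(n_antennas)[:, None]
    return np.exp(1j * n * psi[None, :]) / np.sqrt(n_antennas)


def beamspace_channel(n_ue, n_bs, aoa_bins, aod_bins, gains):
    """Build a multipath channel on the angular grid and map it to beamspace.

    Assumes Q = N at both ends, so each path occupies exactly one entry.
    """
    a_ue = bin_dictionary(n_ue, n_ue)
    a_bs = bin_dictionary(n_bs, n_bs)
    # Spatial-domain channel: sum of rank-one terms g * a_UE(psi) a_BS(phi)^H.
    h = sum(
        g * np.outer(a_ue[:, i], a_bs[:, j].conj())
        for g, i, j in zip(gains, aoa_bins, aod_bins)
    )
    # Angular-domain (beamspace) channel.
    return a_ue.conj().T @ h @ a_bs
```

For example, three paths on a 16 x 32 grid produce a 16 x 32 beamspace matrix with exactly three non-negligible entries, located at the (AoA bin, AoD bin) positions of the paths.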

B. TRANSMISSION PROTOCOL
In this work, beam alignment refers to the procedure that determines the transmit beamformer at the BS for data transmission and the receive combiner at the UE for data reception. Fig. 1 illustrates the conventional beam alignment using full beam scanning and the proposed one. Suppose the BS can steer the analog beam to a finite number of directions, denoted as $Q_{BS}$, by adjusting the phase shifters of the antenna array. If full beam scanning is used, the BS transmits the training signals using $Q_{BS}$ analog beams, where the $i$th transmit beam is $f_i = a_{BS}(\phi_i) \in \mathbb{C}^{N_{BS}}$, $i = 1, 2, \ldots, Q_{BS}$. The UE receives the training signals using $Q_{UE}$ combining vectors, each steering the analog beam to a specific direction. The $j$th receive beam is denoted as $w_j = a_{UE}(\psi_j) \in \mathbb{C}^{N_{UE}}$, $j = 1, 2, \ldots, Q_{UE}$. Thus, the conventional beam alignment needs to train $Q_{BS} \times Q_{UE}$ beam pairs. Initially, the proposed beam alignment scheme performs full beam scanning. Consider a particular beam pair composed of the $i$th transmit beam and the $j$th receive beam. For the sake of clarity, the slot index is dropped in the rest of this section. The received training signal can be expressed as
$$y_{ji} = w_j^H H f_i s + n_{ji}, \quad (4)$$
where $n_{ji}$ is the complex Gaussian noise and $s$ is the training signal. Following [19], we let $s = 1$ for simplicity. By collecting the received signals through the $Q_{BS} \times Q_{UE}$ beam pairs, we have the following matrix representation:
$$Y = W^H H F + N, \quad (5)$$
where $F = [f_1, \ldots, f_{Q_{BS}}]$ is the beamforming matrix of dimension $N_{BS} \times Q_{BS}$, $W = [w_1, \ldots, w_{Q_{UE}}]$ is the combining matrix of dimension $N_{UE} \times Q_{UE}$, and $N \sim \mathcal{CN}(0, \sigma^2_{obs} I)$ is a $Q_{UE} \times Q_{BS}$ Gaussian noise matrix with independent and identically distributed (i.i.d.) elements. By concatenating the columns of $Y$ and letting $y = \mathrm{vec}(Y)$, $n = \mathrm{vec}(N)$, we have a vectorized expression as given by [3] where $\otimes$ denotes the Kronecker product and $x = \mathrm{vec}(H)$, with $H$ being the angular-domain channel matrix in (3), also known as the beamspace channel. The vectorized form (6) will facilitate the process of casting the beam alignment/training problem into a CS-aided KF (CS-KF) framework, as will be seen in Section III.
As explained in Section II-A, $H$ is sparse and thus only a few elements of $x$ are non-zero. The indices of these non-zero elements provide the AoA and AoD information to be estimated, as will be explained in the next section.
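The vectorization step relies on the standard identity vec(AXB) = (Bᵀ ⊗ A) vec(X). The following Python sketch verifies it numerically for a noise-free measurement of the form Wᴴ H F with random matrices. The function names and the random test matrices are illustrative assumptions, and the snippet works with a generic channel matrix rather than the paper's exact beamspace expression.

```python
import numpy as np


def vec(m: np.ndarray) -> np.ndarray:
    """Stack the columns of a matrix into one vector (column-major order)."""
    return m.reshape(-1, order="F")


def kron_sensing_matrix(w: np.ndarray, f: np.ndarray) -> np.ndarray:
    """Matrix P such that vec(W^H H F) = P vec(H)."""
    return np.kron(f.T, w.conj().T)


def random_complex(rng, rows, cols):
    """Helper for the demonstration: i.i.d. complex Gaussian entries."""
    return rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))


rng = np.random.default_rng(0)
n_ue, n_bs, q_ue, q_bs = 4, 6, 3, 5
H = random_complex(rng, n_ue, n_bs)   # channel matrix
W = random_complex(rng, n_ue, q_ue)   # combining matrix
F = random_complex(rng, n_bs, q_bs)   # beamforming matrix

Y = W.conj().T @ H @ F                # noise-free measurements
P = kron_sensing_matrix(W, F)         # (q_bs*q_ue) x (n_bs*n_ue)
```

Note that the column-major `vec` matters here: NumPy reshapes in row-major order by default, which would pair with a different Kronecker ordering.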
Once the initial beam pairs are acquired, the UE can decide the combining vectors for the following data reception. The UE also feeds back the indices of non-zero elements in x to the BS, which uses the feedback information to decide the beamforming vectors for data transmission. After the data transmission, beams are aligned by performing either beam training or beam tracking, according to the switching mechanism presented in Section III-C. Fig. 2 illustrates the proposed beam alignment scheme with details given in the sequel.

III. PROPOSED BEAM ALIGNMENT SCHEME
We propose a beam alignment approach that jointly estimates the angular and gain variations of the time-varying mmWave channel by combining CS and KF, where the former leverages the channel sparsity and the latter deals with the temporally correlated sparsity. The proposed beam alignment scheme switches between the beam training procedure and the beam tracking procedure, each using a different sensing matrix. In this section, we elaborate these two procedures and how they are alternately triggered.

A. BEAM TRAINING PROCEDURE
In the beam training procedure, CSI (including AoA/AoD and path gains) is acquired by using $M_{train}$ training signals. Specifically, the BS transmits the training signal using $M_{BS,train}$ transmit beams selected from $F$, while the UE receives the training signal using $M_{UE,train}$ receive beams selected from $W$. Overall, the number of training signals, $M_{train}$, in the beam training phase is equal to $M_{UE,train} \times M_{BS,train}$. To cast the beam training problem into the CS framework, we first rewrite (6) as where $y_t$ is the observation vector at slot $t$, The state is governed by the state transition equation, where $v_t$ is assumed to be a zero-mean white Gaussian noise We note that $\sigma^2_{sys}$ depends on the variation of the path gains, as will be specified in Section IV.
Remark 1: The state vector x t is sparse because it is the vectorized beamspace channel as explained in Section II. Thus, only a few entries in x t have non-zero values depending on the number of resolvable paths. This implies one can estimate the path gains by estimating x t . The indices of non-zero entries indicate the path angles in the beamspace channel.
The information about path gains and angles is embedded in the support of $x_t$, denoted as $S_t$. For $t = 1$, the support of $x_1$ is obtained via full beam scanning. For subsequent slots, the correlation between $x_t$ and $x_{t-1}$ is exploited to estimate $S_t$ without probing all the beam pairs. Without loss of generality, suppose the first change of the active beam pairs occurs at slot $t_a$. The change may require adding entries to or removing entries from $S_t$, as explained below.
1) Addition: Let us first consider the case in which the change corresponds to a new set $\Delta$ added to the support set of $x_t$. To estimate the new state, denoted as $x_{new}$, at $t = t_a$, we use the filtering error, which is the difference between the observation vector $y_t$ and the one predicted from the old estimate $\hat{x}_t$, as given by
$$\tilde{y}_{t,f} = y_t - P\hat{x}_t. \quad (9)$$
Since the observation can be expressed as $y_t = P_{S_t}(x_t)_{S_t} + P_{S_t^c}(x_t)_{S_t^c} + n_t$, (9) can be rewritten as
$$\tilde{y}_{t,f} = P_{S_t}(x_t - \hat{x}_t)_{S_t} + P_{S_t^c}(x_t)_{S_t^c} + n_t. \quad (10)$$
Clearly, the new set $\Delta$ belongs to the complementary support set $S_t^c$. Thus, $\tilde{y}_{t,f}$ contains the information about $(x_t)_{S_t^c}$, which is a sparse vector. This motivates us to find $\Delta$ by estimating $(x_t)_{S_t^c}$ using CS. One common sparse estimator is the Dantzig selector (DS), which is an $\ell_1$-norm minimization problem as given by [22]
$$\hat{x}_{new} = \arg\min_{x} \|x\|_1 \quad \text{s.t.} \quad \|P^H(\tilde{y}_{t,f} - Px)\|_\infty \le \epsilon, \quad (11)$$
where $\epsilon$ is a constant depending on the noise level. Given the solution of (11), the support of $\hat{x}_{new}$, denoted as $\Delta_{add}$, is then obtained by thresholding, $\Delta_{add} = \{i : |(\hat{x}_{new})_i| > \alpha\}$. The detected new support, namely the indices of the newly added beam pairs, is given by $\Delta = (S_t^c)_{\Delta_{add}}$, and the support of $x_t$ is updated as $S_t = S_t \cup \Delta$.

Algorithm 1: Beam Training Algorithm
Input: Observation $y_t$, sensing matrix $P_{train}$, current estimate $\hat{x}_t$, addition/deletion threshold $\alpha$.
1. Addition. Compute the filtering error (9).
(a) Estimate $x_{new}$ by applying CS to the filtering error using DS (11).
Output: The support set $S_t$ and the updated state vector $\hat{x}_t$.
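The thresholding part of the addition step can be sketched as follows. The snippet assumes that a sparse estimate of the state over all beam-pair indices is already available from some CS solver (the solver itself is not shown), and the function names are hypothetical.

```python
def detect_additions(x_new, support, alpha):
    """Indices outside the current support whose magnitude exceeds alpha."""
    return {
        i for i, value in enumerate(x_new)
        if i not in support and abs(value) > alpha
    }


def add_to_support(x_new, support, alpha):
    """Union of the current support and the newly detected indices."""
    return set(support) | detect_additions(x_new, support, alpha)
```

For instance, with a current support {1}, a threshold of 0.2, and an estimate whose only large entry outside the support is at index 4, the updated support becomes {1, 4}. Small entries caused by noise stay below the threshold and are ignored.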
2) Deletion: It may also happen that an active beam pair no longer matches any path angle due to UE movement. In this case, the entries in $x_t$ corresponding to the outdated beam pairs should be removed from $S_t$. Also, some indices may be incorrectly added to $S_t$ due to CS errors in (11). Similar to the addition of new supports, we remove those outdated indices from $S_t$ by thresholding. Denote the deletion set $\Delta_{del} = \{i : |(\hat{x}_t)_i| < \alpha\}$, which contains the indices to be removed from $S_t$. Then the support set is updated as $S_t = S_t \setminus \Delta_{del}$.
A complete algorithm of the proposed beam training is presented in Algorithm 1. In each slot, we run the KF, including the prediction and the update steps, to capture the channel dynamics. Since the channel is sparse, it is sufficient to run a reduced-order KF on the entries of $x_t$ designated by the support. From classical KF theory, the KF prediction, also known as the time update, performs the following steps to predict the current state $\hat{x}_t$ and the estimate covariance matrix $R_{t|t-1}$.
where $\hat{x}_{t|t-1}$ represents the estimate of $x$ at slot $t$ based on that at slot $t - 1$, and $R_{t|t-1}$ is the mean squared error of the estimate $\hat{x}_{t|t-1}$. Initially, $\hat{x}_0 = 0$ and $R_{0|0} = 0$. Then the following routines are used to update the system dynamics based on the current support $S_{t-1}$ as [21] where $K$ is the KF gain, $\sigma^2_{obs}$ is the observation noise variance in (5), and $\Sigma_{ie}$ is the covariance matrix of the innovation error, which will be defined later. Upon completing the KF prediction and update, the support $S_t$ is updated by the addition and the deletion steps described above. Finally, the system dynamics are refreshed using the updated support $S_t$.
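To make the reduced-order KF concrete, the sketch below runs one prediction and update step on the entries selected by the support. It assumes a random-walk state transition (the predicted state equals the previous estimate, with variance σ²_sys added on the support) and the textbook KF gain and covariance updates. These choices, along with all names, are assumptions for illustration and may differ from the exact expressions in (12) and (13).

```python
import numpy as np


def reduced_order_kf_step(y, P, x_prev, R_prev, support, var_sys, var_obs):
    """One KF prediction/update restricted to the support indices.

    y: observations (M,), P: sensing matrix (M, n), x_prev: previous state
    estimate (n,), R_prev: previous error covariance (n, n).
    """
    S = np.asarray(sorted(support), dtype=int)
    P_S = P[:, S]                                  # columns on the support

    # Prediction (assumed random-walk model on the support).
    x_pred = x_prev[S]
    R_pred = R_prev[np.ix_(S, S)] + var_sys * np.eye(S.size)

    # Update: innovation covariance, gain, state, and covariance.
    sigma_ie = P_S @ R_pred @ P_S.conj().T + var_obs * np.eye(P.shape[0])
    K = R_pred @ P_S.conj().T @ np.linalg.inv(sigma_ie)
    innovation = y - P_S @ x_pred

    x_new = np.zeros_like(x_prev, dtype=complex)
    x_new[S] = x_pred + K @ innovation
    R_new = np.zeros_like(R_prev, dtype=complex)
    R_new[np.ix_(S, S)] = (np.eye(S.size) - K @ P_S) @ R_pred
    return x_new, R_new, innovation, sigma_ie
```

Entries outside the support are kept at zero, so the matrix inversion involves only the measurement dimension and the products involve only the columns of the sensing matrix indexed by the support.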

B. BEAM TRACKING PROCEDURE
The beam training procedure presented above provides a profile of the active beam pairs to be used for data transmission. The purpose of beam tracking is to maintain the estimation accuracy of the active beam pairs using $M_{track} = M_{BS,track} \cdot M_{UE,track} \ll M_{train}$ training signals. This helps reduce the beam alignment overhead but is also challenging because fewer observations are available.
Suppose beam tracking is performed at slot $t$, where the received training signal is given by
$$y_t = P_{track}\, x_t + n_t, \quad (14)$$
where $P_{track}$ represents the sensing matrix in the beam tracking phase. In this work, $F_{track} \in \mathbb{C}^{N_{BS} \times M_{BS,track}}$ and $W_{track} \in \mathbb{C}^{N_{UE} \times M_{UE,track}}$ are chosen according to the active beam pairs estimated at slot $t - 1$. A reasonable choice is the active beam pairs in slot $t - 1$ and their adjacent beams, considering that the channel varies slowly within a short period. A similar idea has also been used in [19], [23] to reduce the overhead of search-based beam tracking. Specifically, [19] proposed to estimate the AoD and AoA using dual beams for the time-varying mmWave channel, considering the sparsity of the mmWave channel. The two beams are selected from the codebook based on the distributions of the AoD and AoA so as to minimize the estimation error. When the channel varies slowly, the two selected beams are often adjacent to the active beams in the past beam training period. In [23], an analog beam is assigned two virtual beams that form a so-called auxiliary beam pair to probe the path angles. This corresponds to inserting additional columns into the beamforming and combining matrices to enhance the angular resolution of beam training. Inspired by [19], [23], we employ the active beam pairs in the previous slot and their two neighboring beams to form the sensing matrix for beam tracking. For example, if $f_i$ is active in slot $t - 1$, $F_{track}$ in slot $t$ is composed of $f_{i-1}$, $f_i$, and $f_{i+1}$, for $i = 2, \ldots, Q_{BS} - 1$. If $f_1$ or $f_{Q_{BS}}$ is active in slot $t - 1$, only one adjacent beam is included in $F_{track}$ in slot $t$. The same rule applies to determine $W_{track}$.
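The beam selection rule for tracking can be sketched in a few lines. The snippet below takes the active beam indices from the previous slot and returns those beams together with their existing neighbors, using the 1-based indexing of the text. The function name is hypothetical, and including the active beam itself follows the statement that the active beams and their two neighbors form the sensing matrix.

```python
def tracking_beam_indices(active, n_beams):
    """Active beams plus their adjacent beams, with 1-based indices.

    Beams at the edge of the codebook (1 or n_beams) have one neighbour only.
    """
    selected = set()
    for i in active:
        for j in (i - 1, i, i + 1):
            if 1 <= j <= n_beams:
                selected.add(j)
    return sorted(selected)
```

For a codebook of 8 beams, an active beam 4 gives beams 3, 4, and 5, while an active beam 1 gives only beams 1 and 2. Overlapping neighborhoods of several active beams are merged, so the tracking codebook never contains duplicates.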
Similar to beam training, the task of estimating the active beam pairs in the beam tracking stage is equivalent to estimating the support of the state vector $x_t$, which can be accomplished by running CS on the KF filtering error. Since we aim to reduce the overhead of beam tracking, it is desirable to use a very small number of training signals, i.e., far fewer measurements are available to estimate $x_t$ in the tracking stage than in the training stage. In this case, a regular CS approach may not provide a reliable estimate. To enhance the estimation accuracy, we adopt a weighted $\ell_1$ minimization approach as given by [24] $\hat{x}_{new} = \arg\min_{x} \|\Omega_t x\|_1$ (15), subject to the measurement constraint, where $\Omega_t \triangleq \mathrm{diag}([\omega_{t,1}, \ldots, \omega_{t,Q_{BS}Q_{UE}}])$ represents the weighting matrix and $\omega_{t,i}$ is the weight assigned to the $i$th entry of $x_t$ for $i \in [1, Q_{BS}Q_{UE}]$. The objective of norm minimization in (15) suggests a simple choice of weights, namely the inverse of the magnitudes of the entries of $x_t$ (which correspond to the channel coefficients in our problem). As a result, large weights force the solution of (15) to concentrate on the indices where $\omega_{t,i}$ is small, corresponding to the indices of the active beam pairs. If the true state $x_t$ (i.e., the actual beamspace channel) is $k$-sparse, i.e., obeying $\|x_t\|_0 \le k$, then (15) is guaranteed to find the correct solution with the choice of weights mentioned above [24]. Since the exact channel is unknown, an alternative is to derive the weights from the previous estimate $\hat{x}_{t-1}$. This is because $\hat{x}_{t-1}$ is not only sparse but also presents a sparsity profile similar to that of $x_t$ when the channel varies slowly.
Based on the above explanations, we propose the following strategy to determine the weights $\omega_{t,i}$ in (15). Since small weights encourage non-zero entries, it is reasonable to assign a small weight to the $i$th beam pair if it is active in slot $t - 1$. Meanwhile, the neighboring beam pairs of the active ones in slot $t - 1$ are likely to be significant in slot $t$, and thus they should also be assigned small weights. For the remaining beam pairs, the likelihood of becoming active can be ignored. Thus, we have the following rule.
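The weighting strategy described above might be sketched as follows. The numerical weight values, the use of a single small weight for both active and neighboring beam pairs, the treatment of diagonal neighbors as adjacent, and the zero-based column-major index mapping are all assumptions made for illustration. They are not the rule given in (16).

```python
def tracking_weights(active_pairs, q_ue, q_bs, w_small=0.1, w_large=10.0):
    """Weights for the weighted l1 problem over vectorized beam pairs.

    active_pairs: iterable of (i, j) with UE bin i and BS bin j (0-based).
    The pair (i, j) maps to index j * q_ue + i, matching a column-major
    vectorization of a q_ue x q_bs beamspace matrix.
    """
    weights = [w_large] * (q_ue * q_bs)
    for i, j in active_pairs:
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ii, jj = i + di, j + dj
                if 0 <= ii < q_ue and 0 <= jj < q_bs:
                    weights[jj * q_ue + ii] = w_small
    return weights
```

With a 4 x 4 grid and one active pair in the interior, nine entries receive the small weight. An active pair in a corner leaves only four such entries because the missing neighbors fall outside the grid.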

C. SWITCHING MECHANISM
In practice, tracking errors might be accumulated as beam tracking relies on the previous estimate of the active beam pairs. Intuitively, one can perform beam training when the tracking error is large, but the tracking error is unknown. Besides, beam training is relatively costly due to more training signals required and thus, the switching between beam training and beam tracking should be carefully determined.

Algorithm 2: Beam Tracking Algorithm
Input: The observation $y_t$, sensing matrix $P_{track}$, state vectors $\hat{x}_{t-1}$ and $\hat{x}_t$, the addition and deletion threshold $\alpha$.
Initialize: Set up the weighting matrix $\Omega_t$ according to (16).
1. Addition. Compute the filtering error (9).
The addition to the support set is $\Delta = (S_t^c)_{\Delta_{add}}$.
Set $(R_{t|t-1})_{:,\Delta_{del}} = 0$ and $S_t \leftarrow S_t \setminus \Delta_{del}$.
Output: The support set $S_t$ and the updated state vector $\hat{x}_t$.
Reference [21] provides a decision rule based on a hypothesis test to determine whether the estimation error is large, but it does not specify the threshold required in the test.
To find a proper switching threshold, we consider the slot $t_a$ at which a change of beam pairs occurs and causes an estimation error, known as the innovation error in KF, as given by
$$\tilde{y}_t = y_t - P\hat{x}_{t|t-1}. \quad (17)$$
For $t < t_a$, $S_t$ is perfectly known through the full beam scanning performed initially. Hence, $\hat{x}_{t|t-1}$ matches $x_t$, and the distribution of $\tilde{y}_t$ for $t < t_a$ is dominated by the noise, which has zero mean. Now, consider the addition of a new support $\Delta$ to $S_t$ at $t_a$. The case of deletion follows the same principle and is omitted here. At $t = t_a$, the observation $y_t$ can be expressed as $y_t = P_{S_t}(x_t)_{S_t} + P_{\Delta}(x_t)_{\Delta} + n_t$. Then the innovation error due to the addition of $\Delta$ is
$$\begin{aligned} \tilde{y}_{t_a} &= y_{t_a} - P\hat{x}_{t_a|t_a-1} \\ &= P_{S_{t_a}}(x_{t_a})_{S_{t_a}} + P_{\Delta}(x_{t_a})_{\Delta} + n_{t_a} - P\hat{x}_{t_a|t_a-1} \\ &= P_{S_{t_a}}(x_{t_a} - \hat{x}_{t_a|t_a-1})_{S_{t_a}} + P_{\Delta}(x_{t_a})_{\Delta} + n_{t_a} \\ &= P_{\Delta}(x_{t_a})_{\Delta} + \tilde{n}_{t_a}, \end{aligned} \quad (18)$$
where $\tilde{n}_{t_a} = [P_{S_{t_a}}(x_{t_a} - \hat{x}_{t_a|t_a-1})_{S_{t_a}} + n_{t_a}] \sim \mathcal{CN}(0, \Sigma_{ie})$ and $\Sigma_{ie}$ is given in (13). From (18), the new set $\Delta$ introduces a non-zero mean to $\tilde{y}_{t_a}$. On the other hand, $\tilde{y}_t$ has zero mean when the support set does not change. This implies that the decision on switching between beam training and beam tracking can be transformed into a hypothesis testing problem with two hypotheses: the null hypothesis $H_0$ refers to the case in which the result of beam tracking is not trustworthy, and the alternative hypothesis $H_1$ refers to the case in which the result of beam tracking is trustworthy. Whether the beam tracking result is trustworthy can be evaluated by the filtering error norm (FEN) as defined by [21]
$$\mathrm{FEN} \triangleq \tilde{y}_{t,f}^H \Sigma_{fe}^{-1} \tilde{y}_{t,f}, \quad (19)$$
where the covariance matrix $\Sigma_{fe} = [I - P_{S_t}K]\,\Sigma_{ie}\,[I - P_{S_t}K]^H$ (see Appendix A). Then the hypothesis test under consideration can be stated as follows:
$$\mathrm{FEN} \underset{H_1}{\overset{H_0}{\gtrless}} \gamma_0, \quad (20)$$
where $\gamma_0$ is the switching threshold between beam training and beam tracking. A large $\gamma_0$ avoids excessive switching to beam training, thereby reducing the false alarm probability. This, however, increases the risk of a missed alarm, that is, overusing beam tracking. For a threshold $\gamma_0$, the false alarm probability can be expressed as
$$P_{fa} = \Pr\{\mathrm{FEN} > \gamma_0\} = 1 - F(\gamma_0). \quad (21)$$
Then the switching threshold can be found as
$$\gamma_0 = F^{-1}(1 - P_{fa}), \quad (22)$$
where $F^{-1}$ is the inverse cumulative distribution function of FEN. The following proposition is useful to determine the threshold $\gamma_0$.
Proposition 1: For perfect beam alignment, FEN follows the chi-squared distribution, i.e., $\mathrm{FEN} \sim \chi^2_\nu$, where $\nu$ is equal to $2M_{train}$ or $2M_{track}$ depending on the beam alignment scheme (beam training or tracking).
Proof: See Appendix B.
Accordingly, $\gamma_0$ can be computed numerically given $P_{fa}$ and $\nu$, using the fact that FEN is chi-squared distributed. Now the switching mechanism can be described as follows. At slot $t$, suppose beam tracking is used. At slot $t + 1$, the FEN is computed and compared with the threshold $\gamma_0$. If FEN is larger than $\gamma_0$, beam training is adopted at slot $t + 1$ to acquire the channel variations using $M_{train}$ training signals. If FEN is smaller than $\gamma_0$, beam tracking is performed in slot $t + 1$ using $M_{track}$ training signals. Algorithm 3 summarizes the proposed beam alignment scheme, including the beam training procedure, the beam tracking procedure, and the switching mechanism described above.

Algorithm 3: Proposed Beam Alignment Scheme
2: Set $S \leftarrow S_{t-1}$
3: Run KF prediction (12) with $S$
4: Run KF update (13)
...
Run Beam Tracking (Algorithm 2)
10: Run KF update (13) with $S$
11: Set $S_t \leftarrow S$
12: Set $t \leftarrow t + 1$ and go to step 2
Output: Support set $S_t$ and estimated state vector $\hat{x}_t$
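Given Proposition 1, the threshold can be obtained by inverting the chi-squared cumulative distribution function at 1 − P_fa. The sketch below uses only the Python standard library: it relies on the closed-form CDF available when the number of degrees of freedom is even (which holds for ν = 2M) and inverts it by bisection. The function names and the tolerance are illustrative choices.

```python
import math


def chi2_cdf_even(x: float, nu: int) -> float:
    """CDF of a chi-squared variable with an even number nu of degrees of freedom.

    Uses F(x) = 1 - exp(-x/2) * sum_{k=0}^{nu/2 - 1} (x/2)^k / k!.
    """
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("nu must be a positive even integer")
    if x <= 0.0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, nu // 2):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total


def switching_threshold(p_fa: float, nu: int, tol: float = 1e-10) -> float:
    """Threshold gamma_0 such that Pr{FEN > gamma_0} = p_fa under chi2(nu)."""
    target = 1.0 - p_fa
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, nu) < target:   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:                    # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, nu) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For ν = 2 the result reduces to the closed form −2 ln(P_fa), which gives a quick sanity check. A smaller P_fa yields a larger threshold and therefore less frequent switching to beam training.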
Remark 2: Sometimes the FEN may take a value very close to the threshold $\gamma_0$, which leads to overly frequent switching between beam training and beam tracking. This can be avoided by running the hypothesis test multiple times, e.g., running (20) in two consecutive slots. The switching occurs only if the null hypothesis is accepted in both slots. This helps stabilize the algorithm without deteriorating the performance.
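The stabilization idea in Remark 2 can be sketched as a simple check on the most recent FEN values. The function name, the string labels, and the generalization to a configurable number of consecutive slots are hypothetical.

```python
def choose_procedure(fen_history, gamma0, n_consecutive=2):
    """Select 'training' only if the last n_consecutive FEN values exceed gamma0.

    fen_history: FEN values of past slots, most recent last.
    """
    recent = fen_history[-n_consecutive:]
    if len(recent) == n_consecutive and all(f > gamma0 for f in recent):
        return "training"
    return "tracking"
```

A single FEN value above the threshold therefore does not trigger beam training on its own. The test must fail in two consecutive slots before the more costly procedure is invoked.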
We conclude this section by summarizing the existing beam alignment schemes relevant to our work in Table 1, which highlights the key idea for path angle estimation in each related work and whether path gain estimation is included. Except for [7], [16], [17], [19], and [25], the related work focuses on tracking the angle variation without addressing the channel gain estimation in the time-varying sparse channel considered in our work. In [7], the path gain is estimated iteratively, and each iteration needs to compute the Hessian matrix of a maximum likelihood cost function. To estimate the path angle, a lengthy sounding procedure is employed and the UE needs to feed back the estimated channel matrix. The problem of costly feedback is resolved in [25] by developing an adaptive strategy for determining the sensing matrix. However, the resultant sensing matrix may not meet the restricted isometry property (RIP), and thus sophisticated algorithms are needed to recover the sparse channel matrix. The method proposed in [16] requires movement (position and velocity) and path gain information to estimate the angles. In addition to the complexity of collecting the movement information, the estimation error of the path gain propagates to the angle estimation. The method proposed in [17] tracks the angular variation of one path only, and the number of training signals for gain estimation needs to be larger than that for tracking angles. References [19] and [23] use selected beams for beam alignment to reduce the overhead, as discussed in Section III-B. A similar idea is adopted during beam tracking in our work, but we do not need the angle distribution required in [19]. The angle distribution is not required in [23] either, but only the path angles are estimated there, whereas the proposed CS-KF tracks both path gains and angles using very few beams.

IV. SIMULATION RESULTS
Simulation results are presented to evaluate the performance of the proposed beam alignment scheme for the time-varying mmWave channel. The simulation scenario follows the system model described in Section II with the parameters listed in Table 2. In simulations, the path gain of the $l$th path is modeled as a Gauss-Markov process as given by
$$\alpha_{l,t} = \rho\,\alpha_{l,t-1} + z_t,$$
where $\alpha_{l,t}$ denotes the gain of the $l$th path in slot $t$, $\rho$ is the correlation coefficient, and $z_t \sim \mathcal{CN}(0, \sigma^2_{\text{sys}})$ with $\sigma^2_{\text{sys}} = 1 - \rho^2$. As to path angles, when the AoD and AoA of the $l$th path in slot $t$, namely $\phi_{l,t}$ and $\psi_{l,t}$, match beam pair $b \in [1, Q_{\text{BS}} Q_{\text{UE}}]$, $b$ is in the support set of $x_t$. Thus, we model the angular variations as follows. If $b \in S_t$, $\phi_{l,t+1}$ and $\psi_{l,t+1}$ match one of the adjacent beam pairs of $b$ with probability $p_v$, while they match $b$ with probability $1 - p_v$. A larger $p_v$, referred to as the beam variation rate, renders more frequent AoA/AoD changes. The temporal change of the support set can be stated as follows.
$$S_{t+1} = \begin{cases} (S_t \setminus \{b\}) \cup \{b'\}, & \text{with probability } p_v, \\ S_t, & \text{with probability } 1 - p_v, \end{cases}$$
where $b'$ denotes one of the adjacent beam pairs of $b$.
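The channel evolution described above can be simulated with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the beam pair is represented as a (transmit, receive) index tuple, and "adjacent" is assumed to mean a one-step change of either the transmit or the receive beam index, clipped at the edges of the beam grid. The paper's exact adjacency rule may differ.

```python
import math
import random


def complex_gaussian(variance, rng):
    """Draw one sample from CN(0, variance) (circularly symmetric)."""
    std_per_axis = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, std_per_axis), rng.gauss(0.0, std_per_axis))


def next_gain(gain, rho, rng):
    """One Gauss-Markov step: rho * gain + z with z ~ CN(0, 1 - rho^2)."""
    return rho * gain + complex_gaussian(1.0 - rho ** 2, rng)


def next_beam_pair(tx, rx, q_bs, q_ue, p_v, rng):
    """Move to an adjacent beam pair with probability p_v, otherwise stay.

    Assumption: adjacency means +/-1 in the tx or rx index within the
    q_bs x q_ue grid (the grid must contain more than one beam pair).
    """
    if rng.random() >= p_v:
        return tx, rx
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    neighbours = [(tx + dt, rx + dr) for dt, dr in steps
                  if 0 <= tx + dt < q_bs and 0 <= rx + dr < q_ue]
    return rng.choice(neighbours)
```

Because the innovation variance is $1 - \rho^2$, a gain initialized from $\mathcal{CN}(0, 1)$ keeps unit average power over time, which is why the process is stationary.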
In evaluating the proposed beam alignment scheme, labeled as CS-KF, the mean square error (MSE) of the estimated vector $\hat{x}_t$ at each slot $t$, i.e., $E[\|x_t - \hat{x}_t\|_2^2]$, was computed by averaging over 100 runs, each containing 20 slots. During the beam training stage, the even-indexed beam pairs are active. That is, $M_{\text{BS,train}} = \frac{1}{2} Q_{\text{BS}}$ and $M_{\text{UE,train}} = \frac{1}{2} Q_{\text{UE}}$. As to beam tracking, the active beams in each slot as well as their two adjacent beams are used to transmit and receive training signals. The number of active beams in the tracking stage may vary depending on the number of detected paths, i.e., non-zero entries in the solution of (15). If all the $(L + 1)$ paths are identified, the number of active beam pairs is equal to $L + 1$. Hence, $M_{\text{BS,track}} = M_{\text{UE,track}} = 3(L + 1)$.
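The training budget implied by these settings can be checked with simple arithmetic. The helper below is a hypothetical sketch: `training_budget` is our own name, and the example values used with it ($Q_{\text{BS}} = Q_{\text{UE}} = 32$ and three paths) are assumptions for illustration rather than values taken from Table 2.

```python
def training_budget(q_bs, q_ue, num_paths):
    """Count training beams per stage for the setup described in the text.

    q_bs, q_ue: number of available beams at the BS and the UE.
    num_paths:  number of identified paths, i.e., L + 1.
    """
    m_bs_train = q_bs // 2          # even-indexed BS beams
    m_ue_train = q_ue // 2          # even-indexed UE beams
    m_track = 3 * num_paths         # each active beam plus its two neighbours
    return {
        "M_BS_train": m_bs_train,
        "M_UE_train": m_ue_train,
        # fraction of all beam pairs probed during beam training
        "train_fraction": (m_bs_train * m_ue_train) / (q_bs * q_ue),
        "M_BS_track": m_track,
        "M_UE_track": m_track,
    }
```

With the assumed values, `training_budget(32, 32, 3)` yields 16 training beams per side, i.e., one quarter of all beam pairs during beam training, and 9 beams per side during tracking.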
For comparison, the following beam alignment schemes are used as benchmarks.
• CS-only: It estimates x new by solving DS in (11) in every slot independently. In other words, beam training is performed consistently using the same amount of training signals as the proposed beam tracking scheme.
• Genie-KF: It generates the minimal MSE estimate of $x_t$ by performing the reduced-order KF with a known support set.
• Full-Extended KF (Full-EKF) [13]: It estimates AoAs and AoDs using EKF assuming the path gains are fixed. In our simulations, the EKF proposed in [13] is implemented with known path gains and full observations, i.e., training signals are transmitted in $Q_{\text{BS}} Q_{\text{UE}}$ beam pairs.
• Half-EKF: It is a replica of EKF except that only a quarter of the beam pairs are active for training, as in the proposed method.
• Beamspace: Our work is based on the beamspace channel representation (3), which is also considered in [11] for a single-antenna UE and a linear moving model. For comparison, we use the tracking method in [11] to find the AoD (i.e., active transmit beam) of each path using $Q_{\text{BS}}$ training signals and assume that the receive beams and the path gains are known perfectly.

FIGURE 3. MSE vs. SNR of different algorithms.
• Auxiliary beam pair (AuX): In [23], joint AoD and AoA estimation is achieved using auxiliary beam pairs. The idea is to cover the desired angular range using analog beams at both the BS and the UE. To extend the angular coverage, each analog beam is accompanied by two virtual beams, referred to as the auxiliary beam pair. In this way, the AoD and AoA can be estimated using fewer training signals than full scanning. In simulations, the number of auxiliary beam pairs at the BS is set to $N_{\text{BS}}/2 - 1$ and that at the UE is $N_{\text{UE}}/2 - 1$.
As a result, the number of training signals required by AuX is close to that used in the beam training stage of the proposed CS-KF. As with the beamspace scheme, we use AuX to estimate the AoD and assume the path gains are known perfectly. In addition to the MSE of the estimated channel matrix, labeled as AuX-H, we also evaluate the MSE of AoD estimation, with the results labeled as AuX-AoD. When solving DS in (11) and WDS in (15), we set = 2 log(N BS N UE )trace( fe )/(Q BS Q UE ) as suggested in [21].

A. IMPACT OF SNR AND CHANNEL VARIATION RATE
We first examine the impact of observation noise on the MSE of all the considered beam alignment schemes. With the average power of the training signal normalized to one, we define the SNR as $1/\sigma^2_{\text{obs}}$. Fig. 3 shows the MSE performance of different schemes for varied SNRs. As expected, they all achieve better accuracy as the SNR increases.
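As a small worked example of this SNR definition, the observation noise standard deviation for a given SNR in dB can be obtained as follows. The helper name `noise_std_from_snr_db` is hypothetical and the snippet assumes unit-power training signals, as stated above.

```python
import math


def noise_std_from_snr_db(snr_db):
    """Return sigma_obs for unit-power training signals, where SNR = 1 / sigma_obs**2."""
    noise_variance = 10.0 ** (-snr_db / 10.0)   # sigma_obs^2 = 1 / SNR (linear scale)
    return math.sqrt(noise_variance)
```

For instance, an SNR of 20 dB corresponds to a noise variance of 0.01, i.e., a standard deviation of 0.1.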
The MSE gap between CS-KF and Genie-KF narrows as the SNR increases, where the latter delivers the lower bound on the estimation error because the support is known. Full-EKF also performs comparably to Genie-KF using full observations. Compared with Full-EKF, CS-KF uses only 25% of the observations during the beam training stage and far fewer observations during the beam tracking stage. The reduced observations in CS-KF lead to a certain performance loss (10% higher MSE at an SNR of 30 dB). If EKF uses the same amount of observations as CS-KF, the result, labeled as Half-EKF, is much worse than CS-KF and even inferior to CS-only (i.e., performing beam training independently in every slot). This reveals the major drawback of EKF, namely that it needs a large amount of observations to maintain good beam alignment accuracy. On the other hand, the CS-only scheme also suffers from a poor MSE (56% higher MSE than CS-KF at an SNR of 30 dB), suggesting that CS alone cannot provide reliable beam alignment performance. Compared with EKF and CS-only, CS-KF achieves a reasonable accuracy with much lower overhead.
As to the beamspace scheme, its performance is close to that of CS-KF when the SNR is below 20 dB, but it performs worse than CS-KF in the high SNR regime (40% higher MSE than CS-KF at an SNR of 30 dB). Since the AoA and the gain of each path are assumed to be perfectly known in the beamspace scheme, the only source of error is the AoD estimation. In a noise-free channel, the beamspace scheme can perfectly estimate the AoD [11], which is not the case in practice. Compared with CS-KF, the beamspace scheme is more sensitive to the observation noise, which limits its estimation accuracy.
Finally, AuX-AoD delivers better performance than CS-only and Half-EKF using nearly the same amount of training signals, while AuX-H performs the worst among all the considered schemes. This can be explained by the fact that AuX aims to estimate the physical angles of each path. The estimation error of the AoD affects the phase of every antenna element, and the accumulated errors result in an inaccurate estimate of the channel matrix. Instead, CS-KF and the beamspace scheme estimate the active beams corresponding to the path direction (i.e., the index of elements in $x_t$) rather than estimating the exact angles directly. Hence they do not suffer from accumulated phase errors as in AuX. The KF-based scheme can estimate the path angles reliably, but this is achieved only with full beam scanning. Since Full-EKF and Half-EKF perform similarly to Genie-KF and CS-only, respectively, their results are omitted in the following discussions. The results of AuX-H and AuX-AoD are also excluded for brevity.
Fig. 4 shows the beam alignment performance under different beam variation rates. Here we vary the beam variation rate $p_v$ from 0.01 to 0.15, corresponding to a mobility speed from 8.99 m/s to 134.85 m/s when the distance between the BS and UE is 10 m. As can be seen, the MSE of the proposed CS-KF increases with $p_v$, and the same trend is observed in the Genie-KF scheme. The reason is that the required settling time of KF exceeds the coherence time of each entry in the state vector $x_t$ when the channel varies fast. The CS-only and beamspace schemes do not track channel variation using the past estimates but perform estimation in each slot independently. Thus, their MSEs are not sensitive to $p_v$. In fact, the MSE of CS-only is fairly poor even at high SNR, which is the result of performing estimation using partial observations without the aid of the past estimates. Comparing the slopes of the MSE curves in Figs. 3 and 4, we see that the proposed CS-KF has a higher MSE as $p_v$ increases, but the performance degradation due to fast beam variations is less than that due to noisy observations.
Fig. 5 shows the beam alignment performance under different numbers of antennas.
Here we set $N_{\text{BS}} = N_{\text{UE}}$, which varies from 10 to 32. As shown, all the considered schemes have a lower MSE as the number of antennas increases because of a higher spatial resolution for estimating the AoA and AoD. The MSE reduces with the number of antennas at a different rate for different schemes. In the CS-only, Genie-KF, and beamspace schemes, the MSE improvement with more antennas is minor compared to CS-KF because their accuracy is dominated by the observation noise, which is independent of the number of antennas. While the observation noise also affects the alignment accuracy of CS-KF, some estimation errors can be corrected by the addition and deletion steps. This allows CS-KF to benefit more from the increased spatial resolution with a larger number of antennas. The beamspace scheme performs better than CS-KF when $N_{\text{BS}} = N_{\text{UE}} < 12$ because the former performs full beam scanning in each slot. However, a 3-dB performance loss is incurred as the number of antennas increases. As explained above, the beamspace scheme is very sensitive to the observation noise, which cannot be mitigated by adding more antennas.
Fig. 6 shows the MSE versus the average training overhead, which is the ratio of the number of training signals $M_{\text{BS,train}} \cdot M_{\text{UE,train}}$ to the total number of available beams (equal to 1,024). We plot the training overhead from 5% to 50% because the two benchmarks, namely CS-only and Genie-KF, reach their performance limit within this range.

C. IMPACT OF TRAINING OVERHEAD
Here the beamspace scheme is not considered because it requires full beam scanning to obtain a reasonable accuracy. For the proposed CS-KF, we consider two options to estimate the support in the tracking stage: (i) use (16) to determine the weights in WDS, with the results labeled as "Weighted CS-KF", and (ii) set all weights equal to one, with the results labeled as "Unweighted CS-KF". One can see that Weighted CS-KF has a smaller MSE than Unweighted CS-KF. The MSE gain of Weighted CS-KF is around 2 to 2.7 dB, confirming the effectiveness of the proposed weighting rule (16) in improving the estimation accuracy of the support set. The improvement is invariant to the training overhead because the weighting rule (16) is used in the beam tracking stage, where the number of training signals is fixed. When the training overhead is very small (< 3%), CS-KF performs worse than CS-only. In this region, the underlying system is highly underdetermined and thus CS performs poorly. Also, KF fails to reliably predict $x_t$ based on $\hat{x}_{t-1}$ when the number of training signals is too small. With increased training overhead, the estimation accuracy of both CS and KF improves, while the estimation accuracy of CS-only is limited because it relies on partial observations only, without the aid of past estimates as in KF.

D. IMPACT OF RICIAN FACTOR
The MSE performance subject to different Rician factors is illustrated in Fig. 7(a). As shown, the trends of the MSE curves are quite different for different schemes. For CS-KF, the MSE first increases because the NLoS paths become weaker as the Rician factor increases and thus the corresponding beam pairs are more likely to be removed incorrectly in the deletion step. The MSE curve reaches a maximum around a Rician factor of 9 dB. As the Rician factor further increases, the trend reverses because the LoS component is more dominant, implying that the channel is sparser. The sparser channel helps to improve the estimation accuracy of the CS-based algorithm. This also explains the reduced MSE of CS-only as the Rician factor increases. Since the deletion step is not used in CS-only, its MSE decreases monotonically. Genie-KF assumes perfect knowledge of the support set, and thus its MSE is not sensitive to the Rician factor. As to the beamspace scheme, its MSE performance is quite stable within the considered range of the Rician factor. This is because when the Rician factor is large, the estimation error is primarily dependent on the quality of the LoS path, whose strength is determined by the SNR.
Since the addition and deletion steps play important roles in the proposed CS-KF, we investigate the impact of the addition/deletion threshold $\alpha$, as a function of the observation noise variance, on the MSE in Fig. 7(b). The threshold $\alpha$ is related to the detection sensitivity of beam update. With a small $\alpha$, addition and deletion of the support set occur more frequently. While this increases the detection sensitivity to beam changes due to channel variations, the chance of incorrect addition and deletion also increases. The choice of $\alpha$ depends on the noise variance. From Fig. 7(b), $\alpha = 2\sigma_{\text{obs}}$ achieves the lowest MSE for an SNR of 10 dB, while $\alpha = 3\sigma_{\text{obs}}$ is the best for SNRs of 20 and 30 dB. This can be explained by the fact that the SNR is inversely proportional to $\sigma^2_{\text{obs}}$, so $\sigma_{\text{obs}}$ shrinks as the SNR grows. Therefore, a larger multiple of $\sigma_{\text{obs}}$ should be used for $\alpha$ to keep the addition and deletion steps effective.

E. PERFORMANCE OF SWITCHING MECHANISM
In the proposed beam alignment scheme, beam training and beam tracking are adaptively performed according to the tracking error. To demonstrate the merit of such a dynamic operation, Fig. 8 compares the proposed beam alignment scheme with the periodic beam training scheme without adaptation, which performs beam training with a fixed period ranging from 3 to 15 slots. Here we plot the MSE and the average number of used training signals. From the figure, a longer training period rapidly increases the MSE despite a certain saving in the training signals used. Unfortunately, it is not straightforward to determine a proper training period as it depends on the observation quality, the channel variation speed, and the available spatial resolution. This difficulty is resolved by the proposed switching mechanism as explained in Section III-C.
The effectiveness of the dynamic switching mechanism is evaluated in Fig. 9 using two metrics: detection rate and false alarm rate. The former measures the percentage of correctly accepting the null hypothesis $H_0$ (i.e., the switching from beam tracking to beam training) and the latter measures the percentage of falsely rejecting $H_0$. Clearly, there is a tradeoff between the detection rate and the false alarm rate, as seen from the figure. A high detection rate of the proposed beam alignment scheme suggests that it can avoid error accumulation by properly switching between beam training and beam tracking. For SNRs between 15 dB and 30 dB, CS-KF maintains a detection rate of 99%. On the other hand, a low false alarm rate avoids unnecessary switching from beam tracking to beam training, thereby reducing the training overhead. As the SNR increases, the false alarm rate reduces from 23% to 7%. The above results indicate that the proposed beam alignment scheme achieves a good balance between the training overhead and the detection sensitivity to beam changes.
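For readers who want to reproduce such metrics from simulation logs, the two rates can be computed from per-slot ground truth and decisions as in the sketch below. The function name `switching_rates` and the boolean-list interface are our own assumptions: `needed[i]` marks slots where switching to beam training was actually required, and `switched[i]` marks slots where the mechanism decided to switch. A false alarm is counted here as a switch that was not required, following the discussion of unnecessary switching above.

```python
def switching_rates(needed, switched):
    """Return (detection_rate, false_alarm_rate) from per-slot boolean flags.

    needed:   True where a switch to beam training was truly required.
    switched: True where the mechanism actually switched to beam training.
    """
    if len(needed) != len(switched):
        raise ValueError("needed and switched must have the same length")
    hits = sum(1 for n, s in zip(needed, switched) if n and s)
    false_alarms = sum(1 for n, s in zip(needed, switched) if not n and s)
    num_needed = sum(1 for n in needed if n)
    num_not_needed = len(needed) - num_needed
    detection_rate = hits / num_needed if num_needed else 0.0
    false_alarm_rate = false_alarms / num_not_needed if num_not_needed else 0.0
    return detection_rate, false_alarm_rate
```

Raising the threshold $\gamma_0$ lowers both rates, which is the tradeoff visible in Fig. 9.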

V. CONCLUSION
A novel beam alignment scheme for a single-user time-varying mmWave channel was proposed in this paper. We showed that the proposed CS-KF can accurately track the path gains and angles with very low overhead. This is achieved by exploiting the integrated CS and KF framework previously proposed for sparse signal reconstruction. Simulation results revealed that CS-KF has a much lower MSE than KF or CS alone given the same amount of training signals. Moreover, CS-KF can approach the estimation lower bound at high SNR using far fewer training signals. The advantage of low overhead comes from using previous estimates obtained by KF for estimating the sparse channel using CS. The dynamic switching between beam training and beam tracking is also important. The former consumes more training signals than the latter and thus should be used only when the estimation error of beam tracking is large. A simple switching mechanism based on hypothesis testing is employed to determine the timing of switching between beam training and beam tracking. This strategy performs better than the conventional mmWave beam alignment approach that performs beam training with a fixed period. The merits of high alignment accuracy and low overhead make the proposed beam alignment scheme attractive for mmWave communications over time-varying channels.

APPENDIX A DERIVATION OF FE
Since ∈ $S^c_t$, (18) can be written as $\tilde{y}_t = P_{S^c_t} (x_t)_{S^c_t} + \tilde{n}_t$ (A.1) where $\tilde{n}_t$ is the noise of the innovation error in (18). Replacing $P_{S^c_t} (x_t)_{S^c_t}$ in (10) by $\tilde{y}_t - \tilde{n}_t$, the filtering error can be expressed as Since $\tilde{n}_t = P_{S_t} (x_t - \hat{x}_{t|t-1})_{S_t} + n_t$, (A.2) can be shown as From (13), $(\hat{x}_t - \hat{x}_{t-1})_{S_t}$ is a function of the innovation error, i.e., $\hat{x}_t - \hat{x}_{t-1} = K\tilde{y}_t$. Thus, the filtering error is also a function of the innovation error, as can be seen by rewriting (A.3) as $\tilde{y}$ As $\tilde{y}_t$ is noisy, $\tilde{y}_{t,f}$ is also noisy with the noise $\tilde{n}_{t,f} = (I - P_{S_t} K)\,\tilde{n}_t$, (A.5)