Fast Accurate Beam and Channel Tracking for Two-dimensional Phased Antenna Arrays

The sparsity and the severe attenuation of millimeter-wave (mmWave) channel imply that highly directional communication is needed. The narrow beam produced by large array requires accurate alignment, which can be achieved by beam training with large exploration overhead in static scenarios. However, this training expense is prohibitive when serving fast-moving users. In this paper, we focus on accurate two-dimensional (2D) beam and channel tracking problem in mmWave mobile communication. The minimum exploration overhead of 2D tracking is given in theory first. Then the time-varying channels are divided into three cases: Quasi-static Case, Dynamic Case I and Dynamic Case II. We further develop three tracking algorithms corresponding to these three cases. The proposed algorithms have several salient features: (1) fading channel supportive: they can simultaneously track the channel gain and 2D beam direction in fading channel environments; (2) low exploration overhead: they achieve the minimum exploration requirement for 2D tracking; (3) fast tracking speed and high tracking accuracy: in Quasi-static Case and Dynamic Case I, the tracking error is proved to converge to the minimum Cramer-Rao lower bound (CRLB). In Dynamic Case II, our tracking algorithm outperforms existing algorithms with lower tracking error and faster tracking speed in simulation.


I. INTRODUCTION
Millimeter-wave (mmWave) mobile communication is currently a hot topic due to its much wider bandwidth compared with sub-6GHz spectrum. In mmWave channels, the much shorter wavelength leads to much higher propagation attenuation. Fortunately, with the shorter wavelength, the narrower beam and larger array gain can be formed with a given array aperture providing compensation for the propagation loss [1]- [5]. However, the narrower beam also implies the requirement of more accurate beam alignment, which can be achieved by beam training at the expense of significant exploration overhead in static or quasi-static scenarios [6]- [11]. While in mobile communication, the fast change of wireless channel either in the direction or the channel gain makes beam training inefficient and inaccurate. Therefore, accurate beam tracking with low exploration overhead is crucial for serving fast-moving users in mmWave communication system. Moreover, the large array gain requirement of mmWave mobile communication implies a large number of antenna elements in the array. However, due to the high cost of AD/DA devices and the high energy consumption of mmWave RF chains, the number of RF chains should be kept much smaller than that of antenna elements, leading to hybrid beamforming or even analog beamforming for cost-effective and energy-efficient systems [3]- [5], [12], [13]. In analog beamforming, since only one RF chain is available with a certain set of programmable phase shifters at a certain time (forming a so-called exploring beamforming vector (EBV) in this paper), only one direction (or dimension) of the channel can be observed. A randomly selected EBV may most probably lead to missing the real propagation ray and result in an observation of very low signal-to-noise ratio (SNR). What is interesting is that an EBV resulting highest SNR is also inappropriate, since the observation could not provide any information of direction variation. Hence it is crucial to select informative EBVs dynamically to achieve as accurate direction information as possible. Since the EBVs should be designed according to historical observations with noise, this process is really tracking rather than estimation.
Some beam tracking methods have been proposed in [9], [14]- [16], which utilize historical exploring directions and observations to obtain current estimates. However, the EBVs are not optimized in those tracking algorithms, which may lead to poor observations. A beam tracking algorithm is proposed in [17], trying to optimize the EBVs, assuming that the channel gain is known. In [18], the authors start to jointly track the channel gain and the beam direction with optimal EBVs. Despite the progress, the proposed algorithm is based on uniform linear array (ULA) antennas, which can only support one-dimensional (1D) beam tracking. While in several mobile scenarios, e.g., dense urban area [19] or unmanned aerial vehicle (UAV) scenarios [20], both horizontal and vertical directions are variable and need to be tracked. The algorithm designed for 1D beam tracking in [18] could not be efficiently applied to the two-dimensional (2D) problems directly.
In this paper, we focus on accurate 2D beam and channel tracking problem. The mmWave wireless system in this paper periodically works in exploration and communication mode. In the exploration stage of one exploration and communication cycle (ECC), the transmitter sends q pre-defined pilot sequences. Then the receiver points in one exploring direction in the duration of each pilot sequence and makes an estimate of the channel gain and the direction of the incoming beam at the end of the q-th exploration. In the communication stage of one ECC, the beam is aligned in the estimated direction. Based on this structure, we care about the following questions: 1) What is the minimum exploration overhead q in each ECC?
2) How to determine the q exploring directions?
3) How to track the beam direction and the channel gain for different time-varying channels, e.g., slowly changing channels and fast fading channels? 4) How is the accuracy, convergence and stability of the tracking algorithm?
To answer these questions, we perform theoretical analysis on 2D beam tracking and propose corresponding algorithms to efficiently track different time-varying channels. The main contributions of this paper are summarized as follows: 1) It is proved that for the unique estimate of the channel gain and the 2D beam direction within one ECC, the minimum exploring overhead in each ECC is q = 3.
2) Dynamic beam and channel tracking strategies for three different time-varying channels are proposed and optimized. The salient advantages of these tracking algorithms are given below: i) For channels changing slowly, i.e., with quasi-static channel gain and beam direction (called Quasi-static Case in this paper), optimal exploring directions are derived which are proved to be determined only by the array size, and approach constants as the array size goes large enough.
The tracking error of our proposed algorithm is proved to converge to the minimum Cramér-Rao lower bound (CRLB).
ii) For channels with quasi-static beam direction and fast-changing channel gain (called Dynamic Case I in this paper), the scenario where the channel gain satisfies Rayleigh distribution is studied as a special case in this paper. An algorithm for beam only tracking is proposed, and it is proved to converge and achieve the minimum CRLB on beam direction.
iii) For channels with fast-changing beam direction and channel gain (called Dynamic Case II in this paper), simulation result shows that our algorithm can achieve faster and more accurate tracking performance compared with existing methods. s ∈ C 1×Ls , and |s| 2 = E p is the transmit energy of each pilot sequence. Then the receiver points in one exploring direction in the duration of each pilot sequence and makes an estimate of the channel gain and the direction of the incoming beam at the end of the q-th exploration. In the communication stage of one ECC, the beam is aligned in the estimated direction.

A. Channel Model
In mmWave channels, only a few paths exist due to the weak scattering effect [4]. Because the beam formed by a large array in the mmWave system is quite narrow, the interaction between multi-paths is relatively weak. In other words, the incoming beam paths are usually sparse in space, making it possible to track each path independently [23]. Hence, we focus on the method for tracking one path. Different paths can be tracked separately by using the same method. In k-th ECC, the direction of the incoming beam path is denoted by (θ k , φ k ), where θ k ∈ [0, π/2] is the elevation AoA and φ k ∈ [−π, π) is the azimuth AoA. Then the channel vector of this path where β k = β re k + jβ im k is the complex channel gain, x k = [x k,1 , x k,2 ] T = M d 1 cos(θ k ) cos(φ k ) λ , N d 2 cos(θ k ) sin(φ k ) λ T is the direction parameter vector (DPV) determined by (θ k , φ k ), a(x k ) = [a 11 (x k ) · · · a 1N (x k ) a 21 (x k ) · · · a M N (x k )] T is the steering vector with a mn (x k ) = e j2π( m−1 M x k,1 + n−1 N x k,2) (m = 1, · · · , M; n = 1, · · · , N), and λ is the wavelength.

B. RF and Base Band Preprocessing
Let w k,i ∈ C M N ×1 be the EBV for receiving the i-th (i = 1, · · · , q) pilot sequence in k-th ECC. The entries of w k,i are of the same amplitude with [w k,i ] l = 1 √ M N , where [w k,i ] l denotes the l-th element of w k,i . After phase shifting and combining, the i-th received sequence in k-th ECC at the baseband output of the RF chain is given by where ζ T k,i ∼ CN (0, σ 2 z J Ls ) is a circularly symmetric complex Gaussian random vector with J Ls denoting the identity matrix of L s order. By match filtering on the received sequence ν k , the i-th observation in k-th ECC is given below: where Step (a) is due to the definition z k,i ζ k,i s H |s| . It is clear that z k,i is a complex Gaussian random variable with z k,i ∼ CN (0, σ 2 z ). Let W k [w k,1 , . . . , w k,q ], z k [z k,1 , . . . , z k,q ] T and y k [y k,1 , . . . , y k,q ] T denote the exploring beamforming matrix (EBM), the noise vector and the observation vector respectively. Then we can rewrite (4) as follows:

C. Tracking Loop
The frame structure of our system is given in Fig. 2. It is assumed that the beam estimator can output an initial estimate that falls in the main lobe of the DPV: Then our tracking starts from this initial estimate of the channel gain and the DPV.
In the exploration stage of k-th ECC, the receiver needs to choose an EBM W k based on previously used EBMs W 1 , · · · , W k−1 and historical observation vectors y 1 , · · · , y k−1 . By applying W k , we can get the observation vector y k . Then the estimateψ k β re k ,β im k ,x k,1 ,x k,2 T of the channel parameter vector ψ k β re k , β im k , x k,1 , x k,2 T is obtained by using all EBMs and observation vectors available. From a control system perspective, ψ k is the system state,ψ k is the estimate of the system state, the EBM W k is the control action and y k is a non-linear noisy observation determined by the system state and control action. Hence, the task of a tracking design is to find the following strategy: where F c k denotes the control function in k-th ECC while F e k denotes the estimation function in k-th ECC.

III. PROBLEM FORMULATION
denote a causal beam and channel tracking scheme in k-th ECC. Then the beam and channel tracking problem is formulated as: (1), (2), (5), (7), (8) where the constraint (10) ensures thatĥ k β k a (x k ) is an unbiased estimate of the channel vector h k = β k a (x k ).
Problem (9) is challenging due to the following reasons: 1) It is a constrained partially observed Markov decision process (C-POMDP) that is quite difficult to solve optimally [24] [25].
2) There are M × N phase shifts to adjust in each EBV w k,i . This makes the optimization of EBV too complicated due to the high degree of design freedom.
3) To obtainψ k in k-th ECC, k EBMs, i.e., W 1 , · · · , W k , need to be designed, making it difficult to optimize so many beamforming matrices simultaneously when k is not quite small.

4)
The time-varying features of the channel in (1) restrict the tracking algorithm and system performance. Hence, it is hard to design a tracking method for a general channel model in (1).
These challenges above make it extremely difficult to solve this problem optimally. Hence, we add some reasonable constraints in this paper to take the first step of optimal tracking policy: 1) EBV constraint. Instead of general phase shifts, we use steering vectors to design the EBVs, where ω k,i [ω k,i1 , ω k,i2 ] T denotes the i-th exploring DPV in k-th ECC. This ensures that only two variables need to be designed for each EBV.
2) Exploring direction constraint. Although the exploring DPV in (11) can be any form, however, considering the tracking accuracy, it is better to make sure that ω k,i falls within the main lobe of the DPV in (6). Thus, it is a reasonable choice to use the currently estimated direction to perform a certain offset for exploring. For this purpose, we use such an architecture in this paper. That is, the i-th exploring DPV in k-th ECC, i.e., ω k,i in (11), is determined by the previous estimated DPV plus an exploring offset ∆ k,i . Considering the design of the offsets that change in different ECCs is also very complicated, we adopt fixed exploring offsets ∆ i (i = 1, 2, 3) in this paper: Therefore, the EBV in (11) can be rewritten as 3) Time-varying channel constraint. The channel vector in (1) is determined by two parts: the channel gain β k and the DPV x k , both of which may change slowly or fast. Therefore, four possible cases exist: (i) Both the channel gain and the DPV change slowly; (ii) The channel gain changes fast while the DPV changes slowly; (iii) The channel gain changes slowly while the DPV changes fast; (iv) Both the channel gain and the DPV change fast.
These four cases correspond to different practical scenarios, which can be modeled as follows: When both β k and x k change slowly, e.g., the user keeps static or quasi-static, the channel can be seen as approximately fixed. For the sake of convenience, we assume that β k = β = β re +jβ im , For channels that β k changes fast while x k changes slowly, e.g., the user moves fast without rotating, the beam direction can be seen as approximately fixed, i.e., x k = x, when the distance between transmitter and receiver is very large compared with the wavelength. In order to distinguish with other dynamic scenarios, this case is called Dynamic Case I.
This case requires that the channel gain keeps static or quasi-static while the beam direction changes fast. However, in mmWave channels, the fast change of beam direction usually leads to the fast change of channel gain since the propagation paths change. This case exists only when the antenna array rotates around the first antenna element which keeps static. This is not the usual case and is not studied in this paper.

D. Dynamic Case
This case happens in most fast moving scenarios except Dynamic Case I, e.g., the user moves while the receiver array rotates. In order to distinguish with Dynamic Case I, we call it Dynamic Case II.

IV. HOW MANY EXPLORATIONS ARE NEEDED IN EACH ECC?
Before further studying the tracking problem in (14), we will first study the number of explorations needed in this section.
Under the constraint in (11), two explorations in each ECC are sufficient to jointly track the channel gain and 1D beam direction according to [18]. When tracking the horizontal and vertical beam direction simultaneously, it is straight forward that four explorations are feasible by separately using two explorations to track each dimension of the 2D beam direction. However, with four explorations, the channel gain is updated twice in each ECC, possibly leading to redundancy. Since it will cost time resource for each exploration, we may try to answer the question that can we reduce the times of exploration, or what is the minimum number of the explorations required?
Then the following lemma is proposed to help determine the minimum exploration overhead q in each ECC: Lemma 1 tells us that at least three explorations are required in each ECC no matter we want to jointly estimate β k and x k or just estimate x k . Hence, the minimum exploration overhead in In the next three sections, we will separately study the tracking problem for Quasi-static Case, Dynamic Case I and Dynamic Case II. In these three cases, a set of same symbols are used for the sake of writing convenience: the EBM W k , the observation vector y k and the estimate of channel parameter vectorψ k . The values of these symbols may vary for different cases.
However, it will cause little confusion as long as noticing the case where these symbols appear.
V. QUASI-STATIC TRACKING: PERFORMANCE BOUND, CONVERGENCE AND OPTIMALITY The beam and channel tracking problem for Quasi-static Case is studied here. In Quasi-static , the conditional probability density function of the observation vector y k is given by In this section, we will first provide the lower bound of tracking error in Quasi-static Case. Then we develop a tracking algorithm and prove this algorithm can converge to the minimum CRLB asymptotically.

A. Cramér-Rao Lower Bound of Tracking Error
The Cramér-Rao lower bound theory gives the lower bound of the unbiased estimation error [26]. Based on this, we introduce the following lemma to obtain the lower bound of tracking error in Quasi-static Case: Lemma 2. In Quasi-static Case, given W 1 , · · · , W k , the MSE of the channel vector in (14) is lower bounded as follows: where V is the Jacobian matrix given by and the Fisher information matrix I S (ψ, W l ) is given by Proof. See Appendix B.
The CRLB in (16) is a function of the EBMs W 1 , . . . , W k . It is hard to optimize so many EBMs simultaneously. Consider any tracking algorithm that can converge to the DPV x. Then given any error e x > 0, there exists some k 0 so that when k > k 0 , x k − x < e x . In other words, when k > k 0 , W k ≈ W = [w 1 , w 2 , w 3 ] T , where w i is defined below: with ∆ S,1 , ∆ S,2 , ∆ S,3 denoting the fixed exploring offsets in Quasi-static Case. Hence, as k → +∞, the asymptotic CRLB in (16) is given by where C S (ψ, W) is the normalized CRLB since the CRLB in (16) decreases with k.

B. Asymptotically Optimal EBM
Let us consider the optimal EBM W * S . In (21), three EBVs need to be optimized, i.e., three 2D exploring offsets need to be determined. It is hard to get analytical results for such a sixdimensional non-convex problem. Numerical search is a feasible way to obtain the three optimal exploring offsets. However, these optimal exploring offsets may be related to some system parameters, e.g., the channel gain β, the DPV x and the antenna array size M, N. Once these system parameters change, numerical search has to be re-conducted, resulting in high complexity.
This leads to the question: how do these parameters affect optimal offsets and cause the difference of CRLB. Then the following lemma is provided to answer this question: Proof. See Appendix C.
Lemma 3 reveals that ∆ * S,1 , ∆ * S,2 , ∆ * S,3 are only related to array size M, N. Hence, the numerical search times can be reduced to one for a particular array size M, N. Numerically, we find later that even if ∆ * S,1 , ∆ * S,2 , ∆ * S,3 may change for different array sizes, ∆ * can be used to take the place of ∆ * S,1 , ∆ * S,2 , ∆ * S,3 as long as M and N are sufficiently large. Therefore, the numerical search times is reduced to one in the end.
By numerical search in the main lobe of the DPV in (6), we can obtain the asymptotically optimal exploring offsets ∆ * S,1 , ∆ * S,2 , ∆ * S,3 in TABLE I and Fig. 3. With these offsets, a general way to generate the asymptotically optimal EBM W * S = [w * S,1 ,w * S,2 ,w * S,3 ] is obtained to achieve the minimum CRLB as below: By adopting ∆ * S,1 , ∆ * S,2 , ∆ * S,3 to smaller size antenna arrays, we compare the minimum CRLB and the CRLB achieved by ∆ * S,1 , ∆ * S,2 , ∆ * S,3 in TABLE I. As illustrated in Fig. 4, when the antenna number M = N ≥ 8, we can approach the minimum CRLB with a relative error less than 0.1% by using ∆ when M = N ≥ 8.

C. Joint Beam and Channel Tracking
Before we have provided a low-complexity numerical method to design the optimal EBM and obtain the minimum CRLB, given that the DPV x is known. However, in a real tracking problem, the DPV x is unknown and the EBMs need to be adjusted dynamically. In addition, a sequence of optimal beamforming matrices can only tell us what the minimum CRLB is, but it can not tell us which tracking algorithm can achieve the minimum CRLB. In this subsection, we propose a specific tracking algorithm to approach the minimum CRLB.
The proposed tracker is motivated by the following minimization problem: where the exact value and the gradient of the objective function are not available and can only be observed via the noisy vectors y 1 , · · · , y k . Hence, (24) is a stochastic optimization problem [28].
A two-layer nested optimization algorithm is proposed to find the solution of (24). In the inner layer of (24), we use stochastic Newton's method [29] to update the estimate, given bŷ where b k is the step-size for Stochastic Newton's method, Step (a) is obtained by substituting (15) into (28) and H S ψ k−1 , W k is the Hessian matrix defined below: In the end, the estimate is updated as follow: where Step (b) results from (29) and the definition that Here, b S,k is the tracking step-size that will be specified after.
In the outer layer of (24), as to be shown in Section V-D, it is equivalent to minimize the CRLB, i.e., C S ψ k−1 , W k , which leads to that the exploring offsets equal to ∆ * 3 . Finally, the proposed tracking algorithm is summarized in Algorithm 1.

D. Asymptotic Optimality Analysis
We care about the following three questions of the proposed algorithm:  (30) can converge to ψ and achieve the minimum CRLB under some constraints. Unfortunately, our tracking algorithm cannot satisfy the necessary requirements for this theorem. Since it is hard to answer the three questions at once, we try to deal with them one by one.
To answer the first question, i.e., the convergence of the proposed algorithm, we can apply EPV for receiving the i-th pilot sequence in k-th ECC is given below: wherex k [x k,1 ,x k,2 ] T and ∆ * S,i (i = 1, 2, 3) are given by TABLE I. After match filtering, the observation vector y k is obtained in (3).
2) Estimate Update: where ∂x 2 and the Fisher information matrix I S ψ k−1 , W k is defined in (18). Here, b S,k is the step-size that will be specified after. ensureψ k converges. In Theorem 5.2.1 of [31], the stable point is a crucial concept. To study the stable points of our problem, we first rewrite (30) as (33): where ς k is the updating direction defined as: This updating direction ς k is a random vector, which can be divided into a deterministic part f ψ k−1 , ψ and a zero mean stochastic partẑ k , i.e., where f ψ k−1 , ψ is defined as follows: andẑ k is given bŷ Thus, the tracking procedure in (30) can be rewritten aŝ According to [31, Section 4.3], a stable pointψ k−1 of f ψ k−1 , ψ satisfies two conditions: is negative definite. Hence, we define the stable points set as below: The channel parameter vector ψ is a stable point: Therefore, ψ is a stable point, i.e., ψ ∈ S. Other stable points in S correspond to the local optimal points of the beam and channel tracking problem, which are without the main lobe B(x).
We adopt the diminishing step-size in (40), given by [30]- [32] where K S,0 ≥ 0 and ǫ S > 0. Then the following theorem is developed to study the convergence of the proposed algorithm: Theorem 1 (Convergence to a Unique Stable Point). If b S,k is given by (40) with ǫ S > 0 and K S,0 ≥ 0, thenψ k converges to a unique stable point in S with probability one.
Proof. See Appendix D.
Therefore, for the general step-size in (40),ψ k converges to a unique stable point. Except for the channel parameter vector ψ, the antenna array gain of other stable points in S is quite low, resulting in low tracking accuracy. Unfortunately, the estimated DPV may jump out of the main lobe in the tracking process and converge to other local optimal points due to the existence of observation noise. Hence, one key challenge is to ensure that the tracking algorithm converges to ψ rather than other stable points. Then we develop the following theorem to answer the second question: Proof. See Appendix E.
It is assumed that the beam estimator in Fig. 2 can output an initial estimatex 0 within the main lobe B (x). Under the conditionx 0 ∈ B (x), Theorem 2 tells us the probability ofx k → x is related to |s| 2 . Hence, we can reduce the step-size and increase the transmit SNR |s| 2 σ 2 z to make sure thatx k → x with probability one. Sinceψ k converge to a unique stable point corresponding to the local optimal point,ψ k → ψ is also achieved whenx k → x.
The third question is answered by the following theorem: Proof. See Appendix F. In this section, we choose one special case to study for Dynamic Case I, i.e., β k satisfies the The conditional probability density function of the observation vector y k [y k,1 , y k,2 , y k,3 ] T is given by where Σ y,k is the covariance matrix of y k defined as follows: with J 3 denoting the 3-order identity matrix. Immediately, we can obtain that The following structure of this section is similar to Section V: we first formulate the beam tracking problem and provide the lower bound of it. Then we develop a tracking algorithm and prove this algorithm can converge to the minimum CRLB.

A. Problem Formulation
Since we only need to track the beam direction in Dynamic Case I, the estimation function in (8) is reformulated as follows: Let Ξ DI,k = F c k , F e DI,k denote a causal beam tracking scheme in k-th ECC: based on previously used EBMs W 1 , · · · , W k−1 and historical observations y 1 , · · · , y k−1 , choose an appropriate EBM W k , apply it to obtain y k and make an estimation of the DPV x in k-th ECC by using all EBMs and observations available. Hence, in k-th ECC, the beam tracking problem is formulated as: (1), (2), (5), (7), (13), (46), where the constraint (48) ensures thatx k is an unbiased estimate of the DPV x.
Before providing a specific tracking algorithm, we will first explore the performance bound of problem (47).

B. Cramér-Rao Lower Bound of Tracking Error
We now perform some theoretical analysis on beam tracking problem. Based on the CRLB theory in [26], we introduce the following lemma to obtain the lower bound of tracking error: Lemma 4. In Dynamic Case I, given W 1 , · · · , W k , the MSE of the DPV in (47) is lower bounded as follows: The Fisher information matrix I DI (x, W l ) is given by where the p-th row, j-th column (p = 1, 2; j = 1, 2) of I DI (x, W l ) is given in (51): with g l ,g l,p and G l,p defined below: Proof. We use Theorem 3.2 in [26] to prove Lemma 4. For the detailed proof, see Appendix G in [27].
The CRLB in (49) is a function of the EBMs W 1 , . . . , W k . Similar to that in Quasi-static Case, we consider the normalized CRLB: By optimizing only one EBM W, we can further get the minimum CRLB, given by Solving problem (54) yields the optimal EBM W * DI = w * DI,1 , w * DI,2 , w * DI,3 : where ∆ * DI,1 , ∆ * DI,2 , ∆ * DI,3 denote the optimal exploring offsets in Dynamic Case I.

C. Asymptotically Optimal EBM
Consider the optimal EBM W * DI now. In (54), three EBVs need to be optimized, i.e., three 2D exploring offsets need to be determined. Numerical search is a feasible way to obtain the three optimal exploring offsets. However, it may result in high complexity since these optimal offsets may change with some system parameters, e.g., channel gain parameter σ 2 β , the DPV x and antenna array size M, N. This leads to the question that how these parameters affect the optimal offsets. Then the following lemma is proposed to answer the question.
Lemma 5. In Dynamic Case I, the optimal exploring offsets ∆ * DI,1 , ∆ * DI,2 , ∆ * DI,3 have the following three properties: and converge to constant values as and ∆ * DI,1 , ∆ * DI,2 , ∆ * DI,3 are unrelated to Proof. Lemma 5 is obtained by exploring the property of the Fisher information matrix in (50).
Due to the space limitation, the detailed proof is omitted here and provided in Appendix H of [27].
Lemma 5 reveals that ∆ * DI,1 , ∆ * DI,2 , ∆ * DI,3 are only related to array size M, N and Hence, the numerical search times can be reduced to one for a particular array size M, N and a 22  is not quite small. Therefore, the numerical search times is reduced to one in the end.
By numerical search in the main lobe of the DPV in (6), we can obtain the asymptotically optimal exploring offsets ∆ * DI,1 , ∆ * DI,2 , ∆ * DI,3 in TABLE II and Fig. 5. With these offsets, a general way to generate the asymptotically optimal EBM W * DI = [w * DI,1 ,w * DI,2 ,w * DI,3 ] is obtained to achieve the minimum CRLB as below: By adopting ∆  EPV for receiving the i-th pilot sequence in k-th ECC is given below: wherex k = [x k,1 ,x k,2 ] T and ∆ * DI,i (i = 1, 2, 3) are given by TABLE II. After match filtering, the observation vector y k is obtained in (3).
2) Estimate Update: The estimatex k = [x k,1 ,x k,2 ] T is updated bŷ where I DI (x k−1 , W k ) is defined in (50) and b DI,k is the step size that will be specified later.

D. Recursive Beam Tracking with Asymptotic Optimality Analysis
We now provide a specific beam tracking algorithm to approach the minimum CRLB in (54).
The proposed tracker is motivated by the following maximization likelihood problem: Similar to that in Section V, we propose a two-layer nested optimization algorithm to find the solution of (58). Finally, the proposed tracking algorithm is given in Algorithm 2.
We now perform the asymptotic optimality analysis. Diminishing step-size is adopted in (61), according to [30]- [32] b DI,k = ǫ DI k + K DI,0 , k = 1, 2, · · · (61) where K DI,0 ≥ 0 and ǫ DI > 0. Then we can prove that if the initial estimatex 0 is within the main lobe and ǫ DI = 1, the proposed algorithm can converge to x with the minimum CRLB asymptotically, i.e., The proof is similar to that in Section V and the details are omitted here since nothing new is provided in the proof. EPV for receiving the i-th pilot sequence in k-th ECC is given below: wherex k = [x k,1 ,x k,2 ] T and ∆ DII,i = ∆ * S,i (i = 1, 2, 3) are given by TABLE I. After match filtering, the observation vector y k is obtained in (3).
2) Estimate Update: where I S ψ k−1 , W k is the Fisher information matrix defined in (18) and b DII,k is the step size for Dynamic Case II.

VII. JOINT BEAM AND CHANNEL TRACKING FOR DYNAMIC CASE II
In Dynamic Case II where both the channel gain β k and the DPV x k change fast, the conditional probability density function of the observation vector y k is given by Establishing theorems of tracking, as in Section V and Section VI, is very difficult in Dynamic Case II. Even if theoretical analysis is not conducted in this section, we still provide a tracking algorithm.
Inspired by the asymptotically optimal tracking algorithm in Section V and Section VI, we design a similar joint beam and channel tracking algorithm in Algorithm 3.
Different from the step-size in Quasi-static Case and Dynamic Case I, we adopt constant step-size in Dynamic Case II because diminishing step-size cannot track the fast-changing β k and x k . The constant step-size b DII,k will be specified later.

VIII. NUMERICAL RESULTS
We obtain some numerical results to verify the performance of our proposed tracking algorithms for Quasi-static Case, Dynamic Case I and Dynamic Case II. Reference algorithms include the compressed sensing algorithm in [9], the IEEE 802.11ad algorithm in [15], the For initial estimate before tracking in Fig. 2, an exhaustive beam sweeping is conducted for our proposed algorithms and the EKF method in [16]. Then an initial estimate is obtained by using the orthogonal matching pursuit method in [33]. This ensures that the initial estimated DPVx 0 is within the main lobe in (6). As for the compressed sensing and IEEE 802.11ad tracking algorithm, the initial estimate can be obtained by the algorithm itself.
As for tracking, three explorations are conducted in each ECC for all the algorithms to ensure fairness. When adopting the joint beam and channel tracking algorithm by using four explorations, we use a buffer to store the received observations and update the estimate when receiving four new observations. Next we will show the performance of these tracking algorithms separately.

A. Quasi-static Case
In Quasi-static Case, the AoA (θ,φ) as defined in Section II is chosen evenly and randomly in θ ∈ 0, π 2 , φ ∈ [−π, π). The channel gain β is modeled as Rician fading with a K-factor κ=15dB, according to the channel model in [34]. The step-size is set as b S,k = 1 k . Simulation results are averaged over 1000 random system realizations. Fig. 8 indicates that the channel 30 300 3000 Exploration number

B. Dynamic Case I
In Dynamic Case I, the AoA (θ,φ) as defined in Section II is chosen evenly and randomly in θ ∈ 0, π 2 , φ ∈ [−π, π). The channel gain β k is modeled as Rayleigh fading with The tracking step-size is set as b DI,k = 1 k . Simulation results are averaged over 1000 random system realizations. Fig. 9 indicates that the DPV MSE of our proposed algorithm can converge to the minimum CRLB and achieves much lower tracking error than other algorithms.

C. Dynamic Case II
In Dynamic Case II, the AoA (θ k ,φ k ) as defined in Section II is modeled as a random walk process, i.e., θ k+1 = θ k + ∆θ, φ k+1 = φ k + ∆φ; ∆θ, ∆φ ∼ CN (0, δ 2 A ). The initial AoA values are chosen evenly and randomly in θ 0 ∈ 0, π 2 , φ 0 ∈ [−π, π). The channel gain is modeled as a first-order Gaussian-Markov process, i.e., β k+1 = ρβ k + γ k , where γ k ∼ CN (0, 1 − ρ 2 ). We adopt ρ = 0.995 in simulation. The initial channel gain β 0 is modeled as Rician fading with a K-factor κ=15dB, according to the channel model in [34]. As for the step-size, numerical results show that when b DII,k = 0.7, the joint beam and channel tracking algorithm can track beams with higher velocity. Hence the step-size is set as a constant b DII,k = 0.7. Simulation results are averaged over 1000 random system realizations. Fig. 10  This work is the first step to beam and channel tracking with 2D phased antenna arrays. In future work, we will further study the following problems: i) establishing the corresponding theorems in Dynamic Case II; ii) jointly tracking multiple paths; iii) tracking at both the transmitter and receiver.
Then the relationship between the phase angles of two different observations y k,i and y k,j (i = j) is given in (68), where ω k,j1 − ω k,i1 and ω k,j2 − ω k,i2 are determined by the exploring DPV and unrelated to the channel parameter vector ψ k . From (68), we can know that once the exploring directions are determined, the relative phase angle between two observations belongs to one of the two constant values that are unrelated to ψ k . In other words, the relative phase angles can help nothing in unique estimation.
Then we explore the minimum observation overhead of the following two cases: 1) If we want to obtain unique estimate of ψ k within one ECC, at least 4 independent real equations with respect to ψ k are needed since ψ k contains four real variables (i.e., the real part β re k , the imaginary part β im k of channel gain β k and the two direction parameters x k,1 , x k,2 ). After q explorations in each ECC, we can obtain q independent amplitude equations and only 1 independent phase angle equation, which are q + 1 independent real equations with respect to ψ k in total. Hence, at least 3 explorations are needed to obtain 4 independent real equations and estimate 4 real variables of ψ k .
2) If we only want to obtain unique estimate of x k within one ECC, at least 2 independent real equations with respect to x k are needed since x k contains two real variables (i.e., two direction parameters x k,1 , x k,2 ). It seems that fewer explorations are sufficient. However, we cannot obtain any absolute amplitude and phase information with respect to x k from one observation in (65) since β k is unknown. In addition, the relative phase angles are one of two determined values that are unrelated to x k . Thus, the phase angle equations are useless for estimating x k . After q explorations in each ECC, we can obtain q − 1 independent relative amplitude equations with respect to x k in total. Hence, at least 3 explorations are needed to obtain 2 independent real equations and estimate 2 real variables of x k . Therefore, the proof is completed.

APPENDIX B PROOF OF LEMMA 2
In problem (14), the constraint (10) ensures thatĥ k is an unbiased estimate of h. Consider each element of the channel vector h, i.e., h mn (ψ) = βe j2π( m−1 According to section 3.8 of [26], if a function f ψ is an unbiased estimate of f (ψ), i.e., E f (ψ) = f (ψ), then we can obtain that where Var[f (ψ)] denotes the variance of f (ψ) and I(ψ) is the corresponding Fisher information matrix.
Combining (14) and (69), we have where Step (a) is obtained by substituting (69) into (70) and Step (b) is due to the definition of V in (17).
As for the Fisher information matrix in (18), we can obtain ∂log p S (y l |ψ,W l ) ∂β re as follows: Similarly, ∂log p S (y l |ψ,W l ) ∂β im , ∂log p S (y l |ψ,W l ) ∂x 1 , and ∂log p S (y l |ψ,W l ) ∂x 2 are given as Hence, the gradient of log p S (y l |ψ, W l ) is obtained as follows: With the help of (73), we can obtain that Substituting (73) and (74) into (18), the Fisher information matrix is given as follows: where in Step (c) we have used the following property of Re {·}: with u, v denoting column vectors andū denoting the conjugate of u.
Step (d) is due to the exchangeability of E [·] and Re {·}: Step (e) is due to the i.i.d. circularly symmetric complex Gaussian property of each element of z l , which means that E z l z H l = σ 2 z J 3 , where J 3 is a 3-order identity matrix.
Step (f ) in (77) reults from the property of complex Gaussian noise: Proof. For z k = [z k,1 , z k,2 , · · · , z k,q ] T and z k,i ∼ CN (0, σ 2 z ), i = 1, 2, · · · , q as an i.i.d. circularly symmetric complex Gaussian random variable, assume that z k,i = z x k,i + jz y k,i . Then we have Hence, we can obtain that and E [z k,i z k,j ] = 0, for i = j because z k,i and z k,j are independent.
Correspondingly, E z l z T l = 0 and which yields the result of Step (f ).
Therefore, the Fisher information matrix is derived in (75) and Lemma 2 is proved in the end.

APPENDIX C PROOF OF LEMMA 3
Lemma 3 is proved in three steps: Step 1: We prove that ∆ * S,1 , ∆ * S,2 , ∆ * S,3 are unrelated to the channel gain β. The basic method is block matrix inversion. We first rewrite the Jacobian matrix V in (17) as follows: where V 1 and V 2 are given by It is clear that both V 1 and V 2 are unrelated to β. Besides, we can obtain the following property where X is an arbitrary 2 × 2 matrix andV 1 denote the conjugate of V 1 . With the help of Jacobian matrix V in (82), the Fisher information matrix in (18) can be divided into four 2 × 2 matrices as follows: whereβ denotes the conjugate of β and A, B, D are defined as: By combining (84) and (86), we can obtain the property of B: where X is an arbitrary 2 × 2 matrix.
By using the block matrix inversion method, the inverse of the Fisher information matrix in (85) is given by where I ip 1 and I ip 2 (β) are defined in (89) and (90): with J 2 denoting a 2-order identity matrix. The middle part of I ip 2 , i.e., ( Re {βB}), can be rewritten as follows: where Step (a) results from the property of B in (87), Step (b) is due to that A defined in (86) is a real matrix and Step (c) is due to the definition of I s : Therefore, we can rewrite I ip 2 in (90) as follows: By combining (20) and (88), we can obtain that where Step (d) is by substituting (82) and (89) into (94). Since both V 1 in (83) and A in (86) are unrelated to the channel gain β, the first part of (94), i.e., Tr A −1 V H 1 V 1 are unrelated to β. By substituting (82) and (93), we can obtain the second part of (94), i.e., Tr I ip 2 (β) V H V in (95), where Step (e) and Step (f ) follow the property of the B in (87). It is clear that is also unrelated to β because none of the matrix A, B, V 1 , V 2 is related to β. Hence, C S (ψ, W) is unrelated to β.
Consider the CRLB in (20). we will first prove that the Fisher information matrix I S (ψ, W) is unrelated to the DPV x. Next we will prove that V H V is also unrelated to x. Then it is clear that the minimum CRLB and the optimal exploring offsets ∆ * S,1 , ∆ * S,2 , ∆ * S,3 are unrelated to x. The Fisher information matrix in (85) tells us that only W H V may be related to x, which is given by with W H a (x), W H ∂a(x) ∂x 1 and W H ∂a(x) ∂x 2 expanded as follows: Since the EBVs are of steering vector forms, i.e., ] T denotes the i-th exploring offset, the elements of W H a (x) and W H ∂a(x) ∂x 1 can be written as: As shown in (98)  ∂x 2 also has nothing to do with x. Therefore, W H V in (96) is unrelated to x. Hence, the whole Fisher information matrix in (85) is invariant to x.
As for V H V, we write it as below: which shows that V H V has nothing to do with x. (20), i.e., C S (ψ, W), is unrelated to x because both the Fisher information matrix I S (ψ, W) and V H V have nothing to do with x. Therefore, the minimum CRLB in (21) and the optimal exploring offsets ∆ * S,1 , ∆ * S,2 , ∆ * S,3 are invariant to the DPV x. in (97) are given as follows:

Now it is clear that the CRLB in
Hence, each element of W H V/ √ MN in (96) converges when M, N → +∞, which results in that I S (ψ, W)/MN in (85) also converges. The limit is defined as follows: The limit of V H V in (100) is given by By combining (102) which reveals that the minimum CRLB in (21), i.e., C min S (ψ), converges and the optimal exploring offsets ∆ * S,1 , ∆ * S,2 , ∆ * S,3 also converge to constant values determined by (104). Therefore, Lemma 3 gets proved.

APPENDIX D PROOF OF THEOREM 1
Recall the beam and channel tracking procedure in (38). Since z k [z k,1 , z k,2 , z k,3 ] in (37) is composed of three i.i.d. circularly symmetric complex Gaussian random variables, the expectation ofẑ k is E [ẑ k ] = 0 and the covariance matrix is given as follows: where Step (a) is the result of (73) and Step (b) follows the definition of the Fisher information matrix in (18).
Assume {G k : k ≥ 0} is an increasing sequence of σ-fields of {ψ 0 ,ψ 1 ,ψ 2 , . . .}, i.e., G k−1 ⊂ G k , circularly symmetric complex Gaussian random variables with zero mean,ẑ k is independent of for k ≥ 1 and ς k = f ψ k−1 , ψ +ẑ k is also independent of G k−1 . Theorem 5.2.1 in [31,Section 5.2.1] gives the conditions that ensureψ k converges to a unique point with probability one when there are several stable points. Next, we will prove that if the step-size b S,k is given by (40) with any ε S > 0 and K S,0 ≥ 0, the joint beam and channel tracking algorithm in (32) satisfies the corresponding conditions below: 1) Step-size requirements: 2) It is necessary to prove that sup k E f ψ k−1 , ψ +ẑ k 2 2 < +∞.
From (38) and (105), we have where Step (c) is due to (105) and thatẑ k is independent of f ψ k−1 , ψ . From (36), we have As the Fisher information matrix is invertible, we get and for i = 1, 2, 3 and all possible w k,i and x, where [δ k,i1 , δ k,i2 ] T = ω k,i − x. Thus we can get Combining (110) and (114), we have According to (110), it is clear that Tr I(ψ k−1 ,W k ) −1 < +∞. Then, we can get that 3) The function f ψ k−1 , ψ should be continuous with respect toψ k−1 . By using (36), we know that each element of f ψ k−1 , ψ is continuous with respect tô . Therefore, f ψ k−1 , ψ is continuous with respect tô ψ k−1 .

4) Let
We need to prove that +∞ k=1 b S,k µ k 2 < +∞ with probability one. From (106), we get µ k = 0 for all k ≥ 1. So we have +∞ k=1 b S,k µ k 2 = 0 < +∞ with probability one. By Theorem 5.2.1 in [31],ψ k converges to a unique stable point within the stable points set with probability one.

APPENDIX E PROOF OF THEOREM 2
Theorem E is proven in three steps: Step 1: Two continuous processes based on the discrete processψ k = [β re The discrete time parameters are defined as: t 0 The first continuous processψ(t), t ≥ 0 is constructed as the linear interpolation of the sequenceψ k , k ≥ 0, wherē ψ(t k ) =ψ k , k ≥ 0. Therefore,ψ(t) is given bȳ The second continuous processψ k (t) is the solution of the following ordinary differential equation (ODE): for t ∈ [t k , ∞), whereψ k (t k ) =ψ(t k ) =ψ k , k ≥ 0. Thus,ψ k (t) can be given as Step 2: By using the two continuous processesψ(t) andψ k (t) constructed in Step 1, a sufficient condition for the convergence of the discrete processx k is provided here.
We first construct a time-invariant set I that includes the DPV x within the main lobe, i.e., as the beam direction of the processψ 0 (t) that is closest to the boundary of the main lobe, which is given by 3 [ ] Then we pick δ such that where u −∞ = min l=1,2 [u] l denotes the minimum element of u. Note that when t ≥ t b , the solutioñ ψ 0 (t) of the ODE (118) will approach the real channel gain β and DPV x monotonically as time t increases. Hence, we construct the invariant set I below: An example of the invariant set I is shown in Fig. 11.
Then, a sufficient condition will be established in Lemma 6 that ensuresx k ∈ I for k ≥ 0, and hence from Corollary 2.5 in [32], we can obtain thatx k converges to x. Before giving Lemma 6, let us provide some definitions first: the DPV x monotonically as time t increases. One possible T is given by where [·] i denotes the i-th element of the vector. Hence, we can obtain the following lemma: ≤ δ for all l ≥ 0, thenx k ∈ I for all k ≥ 0.
According to Lemma 1 in [18],x k,1 ∈ I for all k ≥ 0 andx k,2 ∈ I for all k ≥ 0. Hence, x k ∈ I for all k ≥ 0.
Step 3: We will derive the probability lower bound for the condition in Lemma 6, which is also a lower bound for P (x k → x|x 0 ∈ B (x)).
We will derive the probability lower bound for the condition in Lemma 6, which results in the following lemma: Lemma 7. If (i) the initial point satisfiesx 0 ∈ B(x), (ii) b S,k is given by (40) with any ǫ S > 0, then there exist K S,0 ≥ 0 and R > 0 such that Proof. See Appendix I.
Finally, applying Lemma 7 and Corollary 2.5 in [32], we can obtain which completes the proof of Theorem 2.

APPENDIX F PROOF OF THEOREM 3
If the step-size b S,k is given by (40) with any ε S > 0 and K S,0 ≥ 0, the sufficient conditions are provided by Theorem 6.6.1 [30, Section 6.6] to prove the asymptotic normality of With the condition thatψ k → ψ, we can prove that the beam and channel tracking algorithm satisfies the conditions above and obtain the variance Σ x as follows: 1) Equation (38) is supposed to satisfy: (i) there exists an increasing sequence of σ-fields {F k : k ≥ 0} such that F l ⊂ F k for l < k, and (ii) the random noiseẑ k is F k -measurable and independent of F k−1 .
As is shown in Appendix D, there exists an increasing sequence of σ-fields {G k : k ≥ 0}, where ς k is measurable with respect to G k−1 and independent of G k−1 .
2)x k should converge to x almost surely as k → +∞.
3) The stable condition: In (36), we rewrite f ψ k−1 , ψ as follows: where D 1 is given by Then the stable condition is obtained that: which leads to ε S > 1 2 . 4) The noise vectorẑ k satisfies: and where step (a) is obtained from (105).
Combining (69), (132) and (21), we can conclude that APPENDIX G PROOF OF LEMMA 4 In problem (47), the constraint (48) ensures thatx k is an unbiased estimate of x. According to section 3.7 of [26], ifx is an unbiased estimate of x, then we can obtain that which reveals that ∆ * DI,1 , ∆ * DI,2 , ∆ * DI,3 are unrelated to Finally, the proof is completed.
APPENDIX I PROOF OF LEMMA 7 The following lemmas are introduced to prove Lemma 7.
Lemma 8 (Lemma 3 [18]). Given T by (123) and If there exists a constant C > 0, which satisfies for all k ≥ 0 and 1 ≤ l ≤ k T , then where L and C f are defined in (153) and (154) separately.
Let ξ 0 ∆ = 0 and ξ k ∆ = k l=1 b S,lẑl , k ≥ 1, whereẑ l is given in (37). With (117) and (119), we have for t k+l , 1 ≤ l ≤ k T , and ψ n (t k+l ) =ψ To bound , ψ dv on the RHS of (152), we obtain the Lipschitz constant of function f(v, ψ) considering the first varible v, given by Similar to (109), for any t ≥ t k , we can obtain that there exists a constant 0 < C f < +∞ such that f ψ k (t), ψ 2 ≤ C f .