Fast Reconstruction and Iterative Updating of Spatial Covariance Matrix for DOA Estimation in Hybrid Massive MIMO

Due to the high resolution property, multiple signal classification (MUSIC) algorithm has been widely used for direction-of-arrival (DOA) estimation in wireless systems. To reduce the cost caused by radio frequency (RF) chains, hybrid structure has been adopted in massive MIMO systems operating at millimeter-wave bands. With hybrid structures, the received signals at the antennas are not fed directly to the receiver, and thus the spatial covariance matrix (SCM), which is essential to MUSIC algorithm, cannot be obtained using traditional sample average algorithm. Based on our previous works, we propose a fast beam sweeping algorithm in this article for SCM reconstruction in hybrid massive MIMO systems. Using multiple RF chains in hybrid structures, we find that beam sweeping can be conducted in a parallel manner so that the SCM can be reconstructed much faster than our previous works. To further accelerate the updating of the reconstructed SCM, an iterative updating procedure is also developed in this article. The iterative procedure can update the reconstructed SCM much faster than the batch-processing approach because it updates the SCM immediately as long as new beam sweeping results are available. Simulation results are also presented in this article to demonstrate the improvement of the proposed approach.


I. INTRODUCTION
Direction-of-arrival (DOA) estimation is an important topic in the field of array signal processing. Many algorithms have been proposed and investigated for DOA estimation in the past decades [1]. As one of the most important algorithms, multiple signal classification (MUSIC) algorithm gained a lot of attention due to its high resolution property [2], it has also been applied in many practical wireless systems.
To satisfy the future demand for high rate transmission, millimeter wave (mmWave) bands are being explored in the fifth generation (5G) and beyond 5G cellular systems [3], [4]. Massive multiple-input multiple-output (MIMO) is one of the most important enabling technologies in mmWave The associate editor coordinating the review of this manuscript and approving it for publication was Luyu Zhao . communication systems because the high path loss in the mmWave bands can be compensated effectively by the large array gain in massive MIMO systems [5]. Due to the high cost of radio frequency (RF) chains, digital beamformer can be hardly adopted in mmWave communication systems. Instead, analog beamformers are widely considered in massive MIMO at mmWave bands in order to reduce the number of RF chains. This leads to the hybrid analog-digital structure in massive MIMO systems [6]- [10]. For the hybrid structure, beamforming is conducted in the analog domain, and a group of phase shifters are used for analog beamforming. The received signals at the antennas are first combined in the analog domain before sent to the digital receiver via RF chains. In this way, the number of RF chains can be greatly reduced.
Although widely adopted, the hybrid structure makes it difficult to use MUSIC algorithm for DOA estimation in massive FIGURE 1. In hybrid massive MIMO, the received signals are first combined in the analog domain before sent to the digital receiver via RF chains. For each RF chain, the received signal is first down-converted to the baseband and then sampled by the analog-to-digital converter (ADC).
MIMO systems. For the hybrid structure, the received signals at the antennas are unavailable to the digital receiver. In this case, the spatial covariance matrix (SCM), which is essential in MUSIC algorithm, cannot be obtained using traditional sample average approach [11]. As a consequence, MUSIC algorithm cannot be directly used in massive MIMO systems with hybrid structures. As the MUSIC algorithm is not applicable in hybrid systems, a simple method for DOA estimation is to search for the direction with the maximum received power [12]- [14]. However, these methods are restricted by the Rayleigh limitation, and thus the high resolution property cannot be sustained [2].
To utilize the high resolution property of MUSIC algorithm, we have developed a beam sweeping algorithm for SCM reconstruction in massive MIMO system with single RF chain [15]. Based on our previous works in [15], we will propose a fast beam sweeping algorithm in this article for SCM reconstruction in hybrid massive MIMO systems with multiple RF chains. In the presence of multiple RF chains, the predetermined DOA required in [15] are divided into several sub-groups, with each sub-group corresponding to one RF chain. Then, by beam sweeping simultaneously on all RF chains, the predetermined DOAs can be swept in a parallel manner, so that the SCM can be reconstructed much faster than our previous works. The parallel beam sweeping algorithm proposed in this article is designed to work in a batch-processing manner, and the reconstructed SCM cannot be updated until all the predetermined DOAs have been swept. To accelerate the updating of the reconstructed SCM, an iterative updating procedure is further developed in this article. Using the iterative procedure, the reconstructed SCM can be updated immediately as new beam sweeping results are available. As a result, the updating of reconstructed SCM can be accelerated because there is no need to wait for sweeping all the predetermined DOAs. Computer simulation is also conducted in this article to demonstrate the improvement of the newly proposed algorithms.
The rest of this article is organized as follows. In Section II, signal model for hybrid massive MIMO is introduced and then traditional MUSIC algorithm is reviewed briefly. In Section III, the fast beam sweeping algorithm is presented, including the sub-grouping of predetermined DOAs and parallel beam sweeping. The iterative updating procedure is presented in Section IV. Simulation results can be found in Section V and the conclusions are drawn in Section VI.

II. SYSTEM MODEL
In this section, we will first present the signal model used in this article. Then, we will review briefly the traditional MUSIC algorithm, and discuss its restriction in hybrid massive MIMO systems. Fig. 1, we consider a hybrid massive MIMO system composed of a uniform linear array (ULA) with M antennas and N RF chains. The received signals at the antennas are divided onto N paths using analog power splitters. After going through the analog phase shifters, the signals are combined using the power splitters before sent to the RF chains.

As in
Denote y m (t) to be the received signal at the m-th antenna, and x i (t)'s (i = 0, 1, · · · , L − 1) are L narrow-band signals impinging from far field onto the array. Denote (·) T to be the VOLUME 8, 2020 transpose of a matrix or a vector, then the received signal vector y(t) = [y 0 (t), y 1 (t), · · · , y M −1 (t)] T can be represented as where θ i is the DOA of x i (t), z(t) denotes the additive Gaussian noise vector with E{z(t)z H (t)} = N 0 I M where N 0 is the noise power and I M is an M ×M identity matrix, (·) H indicates the conjugate transpose of a matrix or a vector and a(θ i ) is an M × 1 steering vector with the m-th entry where d denotes the distance between adjacent antennas and λ indicates the wave length.
If L signals are mutually independent and the power of the i-th signal is E{|x i (t)| 2 } = σ 2 i , then the M × M SCM for the received signal vector in (1) can be obtained as As shown in (3), SCM is Hermitian, and therefore the eigenvalue decomposition of the SCM can be given by where U s and U n denote the orthogonal base vectors corresponding to the signal and the noise subspaces, respectively, and s is a diagonal matrix composed of the eigenvalues corresponding to R.
where K denotes the number of samples. Accordingly, the eigenvalue decomposition of the estimated SCM in (5) can be given as where U s , U n , and s denote the orthogonal base vectors, the noise subspaces, and the eigenvalue matrix corresponding to the estimated SCM. Then, unknown DOAs can be determined by searching the peak values of P(θ ), In (5), y[k] is necessary to estimate the SCM. In hybrid massive MIMO, however, Fig. 1 indicates that the received signals at the antennas cannot be obtained directly by the digital receiver. Only the weighted combination of the entries of y[k] is available in each RF chain. As a result, the sample average in (5) cannot be used in massive MIMO systems with hybrid structures.
To address this issue, we have developed an algorithm in [15] for SCM reconstruction in massive MIMO with single RF chain. As only one RF chain is considered in [15], the procedure for SCM reconstruction requires a long time duration because the predetermined DOAs has to be swept sequentially. In this article, we will introduce a fast beam sweeping algorithm in Section III so that the SCM can be reconstructed much faster.

III. FAST BEAM SWEEPING ALGORITHM
For beam sweeping algorithm, we can define as a group of predetermined DOA angles. For the single RF chain case as in [15], the analog beamformers switch the beam directions to the predetermined DOAs in (8) sequentially. Then, by collecting the received power on each predetermined DOA, the SCM can be reconstructed through solving a set of linear equations. For the fast beam sweeping algorithm, the predetermined DOAs are first divided into a set of sub-groups, with each sub-group corresponding to one RF chain. Then, the beam sweepings are conducted on all RF chains simultaneously.

A. SUB-GROUPING OF PREDETERMINED DOAs
In the case of N RF chains, define n to be a sub-group of the predetermined DOAs corresponding to the n-th RF chain, then we have Let | n | indicate the cardinality of the set n , then we can obtain that | n | is an integer with 0 ≤ | n | ≤ Q, and If the beam sweepings are conducted simultaneously on all RF chains, the time duration for beam sweeping procedure is determined by the RF chain with the most predetermined DOAs. In other words, the required time duration should be proportional to Therefore, we should minimize (11) in order to achieve the fastest beam sweeping. Considering that 0 ≤ | n | ≤ Q and the restriction in (10), it it easy to obtain that the minimization in (11) can be achieved when Q predetermined DOAs are uniformly divided among all the RF chains. Without loss of generality, we assume that Q/N is an integer to simplify the notation in the rest of this article. Under this assumption, we can divide Q predetermined DOAs evenly into N subgroups, that is Accordingly, we have max n=0,··· ,N −1 which means each sub-group has Q/N predetermined DOAs.

B. PARALLEL BEAM SWEEPING
For the n-th RF chain, the analog beamformer switches the beam direction to the predetermined DOA angles in (12) sequentially. For the q-th sweeping beam on the n-th RF chain, the predetermined DOA is θ (nQ/N +q) where q = 0, 1, · · · , Q/N − 1. Accordingly, the analog beamforming vector on the n-th RF chain is a n (θ (q) ) = a(θ (nQ/N +q) ). (14) In this case, the weighted combination of the received signals in the n-th RF chain can be represented by From Fig. 1, the signal combination is first down-converted to the baseband and then sampled before sent to the digital receiver. Therefore, the sampled signal can be expressed as Denote P (q) n to be the average power of c (q) n [k], then it can be obtained, using sample average, as . (17) When the number of samples is large enough, the sample average in (17) can be replaced by the statistical average, and thus we can obtain Using the vec(·) operator to (18), the left-hand-side of (18) can be given as vec{a H n (θ (q) )Ra n (θ (q) )} = [a T n (θ (q) )⊗a H n (θ (q) )]vec(R), where we have used equation (1.10.25) in [16], that is, Denote b n (θ (q) ) = a n (θ (q) ) ⊗ a * n (θ (q) ) and r = vec(R), which are both M 2 × 1 vectors, then (18) can be rewritten as If taking all predetermined DOAs in n into account, (21) can be extended to a group of linear equations as where p n = [P n (θ (0) ), P n (θ (1) ), · · · , P n (θ (Q/N −1) )] T (24) are M 2 × Q/N matrix and Q/N × 1 vector, respectively. In above, we only consider the n-th RF chain. In fact, the procedure from (15) to (22) can be also conducted for all the other RF chains simultaneously. In other words, the beam sweepings over n 's with n = 0, 1, · · · , N − 1 can be conducted in a parallel manner. Therefore, (22) can be further extended, by taking the other RF chains into account, as where are Q × M 2 matrix and Q × 1 vector, respectively. Then, similar to [15], the SCM can be obtained, by solving (25), as where I M 2 denotes an M 2 × M 2 identity matrix, σ 2 is a diagonal loading scalar to avoid ill-conditioned result, and is an M 2 × Q matrix. Finally, the desired SCM can be reconstructed through R = unvec( r).
Note that the second equation in (27) indicates that the vector form of R can be obtained by projecting the beam sweeping result, p, onto the column space of C. Although the matrix inversion in C causes heavy computational burden for real-time processing, C can be pre-calculated off-line if the predetermined DOA angles and diagonal loading coefficient are fixed. In this case, the huge computational burden for real-time processing can be avoided, and only a linear matrix production is required for real-time processing, which needs M 2 Q complex multiplications. Therefore, compared to the single RF chain case in [15], adopting multiple RF chains has no impact on the computation complexity.

C. DISCUSSION
For the single RF chain case as in [15], the predetermined DOAs have to be swept sequentially. If the number of predetermined DOAs is Q, then Q beam sweeping operations are required with each beam sweeping operation corresponding to one predetermined DOA. As a result, the number of beam sweeping operations is Q. VOLUME 8, 2020 In the case of multiple RF chains, Q predetermined DOAs are divided into N sub-groups, with each sub-group having Q/N predetermined DOAs. For each beam sweeping operation, N RF chains can sweep N predetermined DOAs simultaneously, with each RF chain corresponding to one predetermined DOA. Therefore, the number of beam sweeping operations is Q/N , which is N times lower compared to the single RF chain case. As a result, the number of beam sweeping operations can be reduced significantly, and thus the procedure for SCM reconstruction can be much faster.

IV. ITERATIVE UPDATING
For continuous SCM reconstruction, the parallel beam sweeping algorithm presented in Section III can update the SCM in a batch-processing manner. In this case, all predetermined DOAs have to be swept before the reconstructed SCM can be updated. In this section, an iterative updating procedure will be presented so that the reconstructed SCM can be updated as long as the new beam sweeping results are available. Since there is no need to wait until all the predetermined DOAs are swept, the reconstructed SCM can be updated much faster.

A. BATCH-PROCESSING
To describe the iterative updating procedure, consider a sample sequence in the time domain as in Fig. 2 (a). In Fig. 2 (a), one sample bundle contains K samples. Due to parallel beam sweeping, N beam sweeping results can be obtained in one bundle, with each one corresponding to one RF chain. Therefore, for batch-processing, Q/N bundles are required to reconstruct the SCM because we need Q/N bundles to sweep all the predetermined DOAs, as in Fig. 2 (b).

B. ITERATIVE UPDATING PROCEDURE
Denote P n [i] to be the beam sweeping result during the i-th sample bundle on the n-th RF chain. Actually, P n [i] is equal to P n (θ (q) ) in (17) if the predetermined DOA, θ (q) , is swept in the i-th sample bundle. Then, denote p[i] to be the overall beam sweeping results in the i-th bundle. Similar to (26), p[i] can be represented by . . .
where p n [i] corresponds to the beam sweeping results on the n-th RF chain in the i-th bundle. For the n-th RF chain, the analog beamformer sweeps the predetermined DOAs in (12) sequentially. Since only one predetermined DOA can be swept in one sample bundle, p n [Q/N + i] can be given as  30) can be better demonstrated if we consider the following differential form, that is Using the identity in (27), C can be divided into where C n = (B H B + σ 2 I M 2 ) −1 B * n is an M 2 × Q/N matrix, then (33) can be rewritten as By substituting (31) into (35), the iterative updating procedure can be obtained as VOLUME 8, 2020 where c n,i denotes the i-th column of C n , that is Since c n,i 's can be pre-calculated once the predetermined DOAs are fixed, each iteration in (36) needs NM 2 complex multiplications. Equation (36) shows that the vector form of SCM reconstructed in the i-th sample bundle can be obtained as the SCM reconstructed in the (i − 1)-th sample bundle plus an correction term. The correction term is a linear combination of c n,i , where the coefficients depends on the difference between of the beam sweeping results in the (Q/N + i)-th bundle and the i-th bundle. The iterative updating procedure in (36) is demonstrated in Fig. 2 (c).

C. DISCUSSION
The iterative updating procedure in (36) indicates that the reconstructed SCM can be updated every sample bundle. As a comparison, if the batch-processing is used to update the reconstructed SCM, we can obtain where i 0 is an integer. Equation (38) indicates that the batch-processing cannot update the reconstructed SCM every sample. Instead, the SCM is updated every Q/N sample bundles because the batch-processing has to wait until all the predetermined DOAs have been swept. Therefore, the iterative procedure can achieve a faster updating than the batchprocessing, as shown in Fig. 2 (c). Note that the iterative procedure cannot work without initialization although it can accelerate the updating of SCM. In this sense, the batch-processing can be used jointly with the iterative procedure. The batch-processing can be used for initialization within the first Q/N sample bundles, while the iterative procedure can be used for consecutive updating in subsequential sample bundles. The combination of iterative updating and batch based initialization is shown in Fig. 2 (c).

V. COMPUTER SIMULATION
Computer simulation is adopted in this section to demonstrate the efficiency of the fast reconstruction algorithm. In the simulation, we consider a ULA composed of M = 16 antennas with d/λ = 0.5. Three independent signals from θ = −20 • , 20 • , 40 • are impinging onto the array with unit powers, and each sample bundle contains K = 3000 samples. The predetermined DOAs are uniformly distributed from −90 • to 90 • . Without special specification, the signalto-noise ratio (SNR) is set as −5 dB in the simulation. Similar to [15], normalized squared error (NSE) is adopted in this article to evaluate the accuracy of SCM reconstruction, which is defined by In the simulation, the NSE is obtained by averaging over 2000 runs to obtain a mean value. MUSIC algorithm is adopted for DOA estimation once reconstructed SCM is available, and the grid for exhaustive searching in (7) is set to be −90 o to 90 o with 1 o as the grid size. Note that MUSIC algorithm may suffer from the off-grid DOA estimation issue [17], interpolation between neighboring DOA grids can be used in this case to improve the accuracy of DOA estimation [18]. Fig. 4 shows the NSE performances where different diagonal loading coefficients have also been included. As shown in the figure, the NSE is improved as the rising of the time duration because more predetermined DOAs can be swept. In the case of single RF chain, since each sample bundle can sweep only one predetermined DOA, a long time duration is thus required to sweep all the predetermined DOAs. In the presence of more RF chains, the time duration can be greatly reduced because each sample bundle can sweep multiple predetermined DOAs with each DOA corresponding to one RF chain. As an example, Fig. 4 shows that about 40 sample bundles are required to achieve NSE = 10 −3 when N = 1, while it need only 10 beam sweeping operations when N = 4. Therefore, the procedure of SCM reconstruction can be accelerated by adopting more RF chains. Fig. 5 presents the NSE performances during the iterative updating procedure. As expected, the iterative procedure can update the reconstructed SCM every sample bundle, and therefore the SCM can be updated much faster than the batchprocessing approach. It is interesting to note that the NSE performance remains unchanged during the updating procedure. This is because the iterative updating is essentially the same with the batch-processing except that the iterative procedure can update the SCM much faster. Therefore, we can observe that both approaches have the same performance. In addition, the NSE performances during the iterative procedure are determined by the number of predetermined DOAs. Once Q is determined, the NSE will remain unchanged during the iterative updating procedure. Theoretically, the iterative procedure Due to the iterative procedure, the reconstructed SCM can be updated every sample bundle. The NSE during iterative updating can be the same to that in the initialization. FIGURE 6. NSE performances with different SNRs. It is interesting to see that the NSE is rising as the SNR increase. In addition, the NSE can be improved by adopting more RF chains.
can be conducted to i = ∞. In practice, the iteration can be stopped as long as the SCM reconstruction is not required any more. Fig. 6 shows the reconstruction accuracy with different SNRs. From Fig. 6, it is interesting to find that the NSE will get worse as the rising of SNR. This is especially the case for Q = 32. When SNR is small, the desired SCM is close to an identity matrix except for a scaling factor determined by the SNR. In this sense, the reconstruction of SCM is similar to estimate this single scaling factor. The reduction of the numbers of unknown variables leads to improvement of reconstruction accuracy. When SNR is large, however, there exist much more unknown variables in the the desired SCM, and thus the reconstruction accuracy gets worse. When Q = 40, the reconstructed SCM is already accurate enough, and thus changing the SNR has little impact on the NSE. Fig. 6 also indicates that the NSE can be improved by using more RF chains. By adopting more RF chains, the number of beam sweeping operations can be reduced, and the overall noise power included by beam sweeping operations can be also reduced accordingly. Therefore, NSE can be improved by adopting more RF chains. Fig. 7 shows the performance improvement on the DOA estimation, where we have adopted the mean-square-error (MSE) as an indicator to evaluate the accuracy of DOA estimation, that is As expected, the accuracy of DOA estimation can be improved as the rising of the time duration. This is because more predetermined DOAs can be swept as the rising of time duration and the reconstructed SCM is thus more close to the true SCM. Similar to Fig. 4, the time duration can be greatly reduced by exploiting multiple RF chains, so that the DOA estimation procedure can be conducted much faster than the single RF chain case.

VI. CONCLUSION
In this article, we have developed a fast beam sweeping algorithm by exploiting multiple RF chains in hybrid massive MIMO system. In the presence of multiple RF chains, the predetermined DOA required are divided into several subgroups, with each sub-group corresponding to one RF chain. Then, by beam sweeping simultaneously on all the RF chains, the predetermined DOAs can be swept in a parallel manner, so that the SCM can be reconstructed much faster than our previous works. To accelerate the updating procedure, an iterative updating procedure has also been developed in this article. Using the iterative procedure, the reconstructed SCM can be updated as long as the new beam sweeping results are available, and there is no need to wait until all the predetermined DOAs are swept. Simulation results have also been presented in this article, demonstrating that the proposed approach can greatly accelerate the reconstruction and updating procedures for SCM reconstruction.