Low-Complexity Compressed Sensing Downlink Channel Estimation for Multi-Antenna Terminals in FDD Massive MIMO Systems

In frequency division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems, downlink channel estimation is costly in terms of the pilot overhead, the computational cost and the estimation process. In this paper, we propose a low-complexity downlink channel estimation scheme based on compressed sensing for mobile multi-antenna terminals. In the scheme, the mobile terminal estimates the downlink massive MIMO channel and exploits the spatial sparsity of the channel to reduce the feedback overhead by feeding back only the nonzero values of the sparse channel. Specifically, we propose a low-complexity estimation algorithm based on compressed sensing for multi-antenna terminals to reduce the computational overhead of the terminal. Since the different antennas of a terminal share the same support set, the algorithm estimates multiple indices per iteration, collecting the estimated indices of the different antennas at the end of each iteration, thereby reducing the total number of iterations. We then derive a halting condition for greedy algorithms that stops the iteration process according to the residual energy. The simulation results illustrate the efficiency of the halting condition and the low complexity of the proposed algorithm. In contrast to other greedy algorithms and the Bayesian algorithm, the proposed algorithm has a complexity that decreases as the number of terminal antennas increases.


I. INTRODUCTION
With the popularity of smartphones and the rise of the Internet of Things (IoT), data traffic transmitted over wireless networks has grown exponentially every year [1]. To prevent the capacity of wireless networks from limiting the sustainable growth of data traffic, new technologies must be developed to satisfy the increasing demand. In recent years, massive multiple-input multiple-output (MIMO) technology has attracted wide attention in academia and industry [2]-[4]. As the number of antennas increases, the power consumption of each antenna decreases, and the spectrum efficiency is greatly improved [5]. The increased degrees of freedom also make large-scale MIMO systems very robust [6]. These advantages make large-scale MIMO a key technology for next-generation cellular networks.
The associate editor coordinating the review of this manuscript and approving it for publication was Damien Roque.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
To obtain the spatial gain of a large-scale MIMO system, accurate uplink and downlink channel state information (CSI) is essential at the base station (BS) for coherent processing, such as precoding and combining. In time division duplex (TDD) mode, because the uplink and downlink channels operate in the same frequency band, the acquisition of CSI at the BS is simplified by utilising the reciprocity between the two channels [7], [8]. Therefore, in the early stage of large-scale MIMO research, TDD systems were widely studied. However, the uplink and downlink channels in an FDD large-scale MIMO system are not reciprocal, as they operate in different frequency bands. In FDD mode, the downlink CSI must be sent back to the BS via the uplink, which consumes additional feedback overhead. Therefore, the acquisition of the downlink CSI for FDD massive MIMO systems is challenging [9]. At present, a large number of communication systems operate in FDD mode. To update these systems and keep them compatible with new technologies, research on downlink CSI acquisition in FDD massive MIMO systems is of great practical significance. The scale of conventional MIMO antenna arrays is small, and the pilot overhead for channel estimation is proportional to the number of transmit antennas [10]. Therefore, the pilot overhead is acceptable for downlink channel estimation in small-scale MIMO systems. Conversely, traditional channel estimation methods exhibit a rapid increase in the pilot overhead in large-scale MIMO systems, which will occupy uplink transmission opportunities and decrease the spectrum efficiency [11]. Moreover, the estimation complexity increases cubically with the number of BS antennas, which makes implementation impractical.
Studies of large-scale MIMO channels have demonstrated that the channel exhibits significant spatial sparsity as the number of antennas increases [12]-[14]. Compressed sensing has been applied to exploit this spatial sparsity and overcome the significant pilot overhead in FDD systems. Bajwa [12], [15] analysed sparse multipath channels and first proposed a compressed-sensing-based method to estimate them. Kim [16] and Lee [17] applied more efficient compressed sensing reconstruction algorithms to improve the estimation accuracy. The SP algorithm [18] selects multiple columns in each iteration, and its convergence is significantly faster than that of the MP and OMP algorithms. Greedy algorithms such as MP [16], OMP [17], and SP [18] assume that the channel is strictly sparse, an assumption that is often difficult to satisfy in practice. The SP algorithm also requires a priori knowledge of the time-varying parameters of the channel, which limits its application. Sparse Bayesian learning (SBL) [19] is another class of reconstruction algorithms. SBL reconstructs a sparse signal by maximising the posterior probability density of the sparse channel, and it does not require knowledge of the sparsity. However, the computational complexity of SBL is significantly greater than that of greedy algorithms.
Structured sparsity has been utilised to reduce the complexity of channel estimation in large-scale MIMO systems. Rao [20] proposed the concepts of individual joint sparsity due to local scattering and distributed joint sparsity due to common scattering. The BS obtains the distributed joint sparsity, and then the individual joint sparsity of each terminal is estimated based on the estimated distributed joint sparsity. However, this model assumes that the channel is strictly sparse; the approximately sparse channel model adopted in [13], [21], [22] is more practical. Gao [23]-[25] exploited the structured sparsity of an OFDM-MIMO channel to reduce the complexity of CSI acquisition. In addition to the spatially structured sparsity, it is well known that the angle of arrival (AoA) and the angle of departure (AoD) vary slowly over time because terminals move slowly. Zhu [26], [27] proposed DOMP and SMP to track a dynamic sparse channel. Rao [18] used the most recent sparsity information as prior knowledge for the next estimation to reduce the computational burden.
Despite the importance of channel estimation, the complexity of the schemes proposed in the literature remains too high. This paper proposes a low-complexity channel estimation algorithm for an FDD large-scale MIMO system with multi-antenna terminals. The main contributions of this paper are as follows:
1) We propose a low-complexity channel estimation algorithm that utilises the sparsity property of a large-scale MIMO channel and the multi-antenna property of terminals. In the proposed algorithm, each antenna of a terminal is treated as an independent antenna that chooses a column of the measurement matrix, and then all chosen columns are combined. By selecting multiple columns in an iteration, the proposed algorithm requires fewer iterations to reach convergence.
2) We propose a halting threshold for compressed-sensing-based greedy algorithms. Using the proposed threshold, the greedy algorithm uses the residual power as the criterion for halting the iteration, eliminating the dependence on the sparsity of the channel.
3) We validate by simulation that the proposed algorithm and halting threshold exploit the sparsity of the channel and the multi-antenna property of terminals, reducing the time consumption of the estimation compared with that of other greedy algorithms and the Bayesian algorithm.
The remainder of this paper is organised as follows: Section II introduces the system transmission model, channel sparsity representation and received signal correlation. Section III discusses the proposed channel estimation algorithm and halting threshold. Section IV presents simulation results for the proposed technique. Section V concludes the paper.
Notation: The symbols $\mathbf{X}$ and $\mathbf{x}$ denote a matrix and a vector, respectively. The Frobenius norm is represented by $\|\cdot\|_F$, and the notation $\|\mathbf{X}\|_2$ refers to the 2-norm. The transpose, conjugate transpose and pseudo-inverse operators are represented by $(\cdot)^T$, $(\cdot)^H$, and $(\cdot)^{\dagger}$, respectively. The notation $|\cdot|$ is the cardinality of a set or the modulus of a number. The real number field and the complex number field are denoted by $\mathbb{R}$ and $\mathbb{C}$, respectively. $[\mathbf{X}, \mathbf{Y}, \mathbf{Z}]$ refers to a matrix assembled from the matrices $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$. The notations $\mathbf{X}(j, :)$ and $\mathbf{X}(:, j)$ denote the $j$th row and $j$th column of the matrix $\mathbf{X}$. The notation $\mathbf{x}(j)$ denotes the $j$th element of the vector $\mathbf{x}$. $\mathbf{I}$ denotes the identity matrix. $\mathbf{X}_{\Omega}$ is the sub-matrix of $\mathbf{X}$ consisting of the columns of $\mathbf{X}$ indexed by $\Omega$. The operation $\backslash$ represents set subtraction.

II. SYSTEM MODEL
A. MASSIVE MIMO SYSTEM
In FDD large-scale MIMO systems, a BS with $M$ antenna elements serves $K$ multi-antenna terminals in a cell. Each terminal employs an antenna array with $N$ elements, with $M \gg K$ and $M \gg N$. Suppose that all of the antenna arrays are uniform linear arrays (ULAs) with an element spacing of $\lambda/2$, where $\lambda$ is the wavelength. The wireless channel is a flat-fading channel, and the length of a coherent time block is $T_c$. The channel gain remains approximately constant within a coherent time block, and the coefficients are independent across coherent time blocks. Each coherent time block is divided into a training phase of length $T_p$ and a transmission phase of length $T_d$. The BS broadcasts $T$ training pilot symbols via the downlink channel in the training phase. In the $j$th slot, the pilot signal $\mathbf{Y}_{k,j} \in \mathbb{C}^{N \times T}$ received by the $k$th terminal in the cell is expressed as follows:
where $\mathbf{H}_{k,j} \in \mathbb{C}^{N \times M}$ is the downlink channel matrix between the BS and the $k$th terminal in the cell and $\mathbf{W}_{k,j} \in \mathbb{C}^{N \times T}$ is the corresponding additive white Gaussian noise (AWGN) matrix, whose elements are independent and identically distributed (i.i.d.) complex Gaussian random variables with zero mean and variance $\sigma^2$. $\rho$ is the received signal-to-noise ratio (SNR). $\mathbf{X} \in \mathbb{C}^{M \times T}$ is the downlink pilot sent by the BS, with $\mathrm{tr}(\mathbf{X}^H \mathbf{X}) = T$. For convenience, the subscript $j$ is omitted in the following. The downlink CSI is obtained at the terminals and is then fed back to the BS via the uplink. This completes the closed-loop process of obtaining the CSI, which comprises pilot broadcasting and feedback [28]. The obtained $\hat{\mathbf{H}}_k$ is the estimate of the large-scale MIMO channel matrix.
The channel between the BS and the $k$th terminal is expressed as follows [29], [30]:
where
$$\mathbf{a}_T(\theta) = \left[1, e^{j\pi \sin\theta}, \ldots, e^{j\pi (M-1) \sin\theta}\right]^T$$
$$\mathbf{a}_R(\phi) = \left[1, e^{j\pi \sin\phi}, \ldots, e^{j\pi (N-1) \sin\phi}\right]^T$$
The symbols $\theta_k$ and $\phi_k$ are the AoD and AoA, and $\Delta_d$ and $\Delta_a$ are the corresponding angular spreads. $r_k(\theta, \phi)$ is the complex gain of the channel. The BS is usually mounted at an elevated position, far from the scatterers, and the scatterers are distributed over a limited area. Therefore, $\Delta_d$ is limited to a relatively small range. From the viewpoint of a terminal, scatterers surround the terminal, and the terminal receives rays from all directions, with $\Delta_a \approx \pi$.
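As a reading aid only, one form of the received-signal model and of the channel model that is consistent with the dimensions and definitions stated above is sketched below. The $\sqrt{\rho}$ scaling of the pilot term and the integration limits are assumptions made here for illustration; they are not recovered from the original displayed equations.

```latex
% Assumed form of the received pilot signal (N x T):
\mathbf{Y}_{k,j} = \sqrt{\rho}\,\mathbf{H}_{k,j}\,\mathbf{X} + \mathbf{W}_{k,j}

% Assumed form of the channel model (N x M), integrating over the
% AoD and AoA spreads around \theta_k and \phi_k:
\mathbf{H}_k =
  \int_{\phi_k-\Delta_a}^{\phi_k+\Delta_a}
  \int_{\theta_k-\Delta_d}^{\theta_k+\Delta_d}
  r_k(\theta,\phi)\,\mathbf{a}_R(\phi)\,\mathbf{a}_T^{H}(\theta)\,
  d\theta\, d\phi
```

The dimensions agree with the text: $\mathbf{a}_R(\phi)\,\mathbf{a}_T^{H}(\theta)$ is $N \times M$, and $\mathbf{H}_{k,j}\mathbf{X}$ is $N \times T$.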

B. REPRESENTATION OF A SPARSE CHANNEL
The matrix $\mathbf{H}_k$ is not a sparse matrix; however, the channel matrix $\mathbf{H}^a_k$ in the virtual angular domain is sparse [29], [30]. The relation between $\mathbf{H}^a_k$ and $\mathbf{H}_k$ is as follows:
where $\mathbf{U}_r$ and $\mathbf{U}_t$ are unitary discrete Fourier transform (UDFT) matrices. The element $h^a_{k,n,m}$ of $\mathbf{H}^a_k$ is the complex gain in the $m$th AoD window and the $n$th AoA window between the BS and the terminal. Because $\Delta_d$ is limited to a small range, $\mathbf{H}^a_k$ is sparse. Fig. 1(a) shows the amplitude of $h^a_{k,n,m}$. The amplitudes of the elements outside of $[\theta_k - \Delta_d, \theta_k + \Delta_d]$ are approximately zero. Therefore, the nonzero elements of $\mathbf{H}^a_k$ provide nearly all of the gain of the channel between the BS and the terminal, and the contributions of the other elements can be ignored.
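The sparsity argument above can be illustrated numerically. The following is a minimal sketch, not the simulation setup of this paper: it assumes the convention $\mathbf{H}^a_k = \mathbf{U}_r^H \mathbf{H}_k \mathbf{U}_t$, replaces the continuous channel model with a sum of a few discrete paths, and uses hypothetical helper names (`steering`, `ula_channel`, `to_angular`, `top_column_energy_share`).

```python
import numpy as np

def steering(num_elems, angle):
    """Steering vector of a half-wavelength ULA."""
    return np.exp(1j * np.pi * np.arange(num_elems) * np.sin(angle))

def ula_channel(M, N, theta_k, delta_d, num_paths, rng):
    """N x M channel as a sum of discrete paths (a stand-in for the integral
    model): AoDs within theta_k +/- delta_d, AoAs from all directions."""
    H = np.zeros((N, M), dtype=complex)
    for _ in range(num_paths):
        theta = rng.uniform(theta_k - delta_d, theta_k + delta_d)
        phi = rng.uniform(-np.pi / 2, np.pi / 2)
        gain = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
        H += gain * np.outer(steering(N, phi), steering(M, theta).conj())
    return H / np.sqrt(num_paths)

def unitary_dft(n):
    """Unitary DFT matrix, so that U^H U = I."""
    return np.fft.fft(np.eye(n)) / np.sqrt(n)

def to_angular(H):
    """Assumed convention: H_a = U_r^H H U_t."""
    N, M = H.shape
    return unitary_dft(N).conj().T @ H @ unitary_dft(M)

def top_column_energy_share(A, fraction=0.1):
    """Share of the total energy held by the strongest `fraction` of columns."""
    col_energy = np.sum(np.abs(A) ** 2, axis=0)
    keep = max(1, int(round(fraction * col_energy.size)))
    return np.sort(col_energy)[::-1][:keep].sum() / col_energy.sum()

rng = np.random.default_rng(0)
H = ula_channel(M=200, N=4, theta_k=np.deg2rad(20), delta_d=np.deg2rad(5),
                num_paths=20, rng=rng)
Ha = to_angular(H)
share_spatial = top_column_energy_share(H)    # antenna domain: energy spread out
share_angular = top_column_energy_share(Ha)   # angular domain: energy concentrated
print(f"top 10% of columns: spatial {share_spatial:.2f}, angular {share_angular:.2f}")
```

Because the transform is unitary, the total channel energy is unchanged; only its distribution over the columns differs between the two domains.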
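Let $\Omega$ denote the support set of the sparse matrix $\mathbf{H}^a_k$, that is, the set of indices of its column vectors whose norms are nonzero:
$$\Omega = \left\{ i : \left\|\mathbf{H}^a_k(:, i)\right\|_2 > \alpha \right\}$$
where $\alpha$ is the threshold used to determine that $\|\mathbf{H}^a_k(:, i)\|_2$ is approximately zero.
The support set $\Omega_n$ is the support of the channel vector $\mathbf{h}^a_{k,n}$ between the BS and the $n$th antenna of the terminal.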
Because the antennas of a terminal are co-located, the propagation paths from the BS to each antenna experience the same reflection and scattering. Therefore, all antennas of a terminal have the same $\Delta_d$, $\Delta_a$ and $\Omega_n$, and $\Omega_n$ is the same as $\Omega$. However, the amplitudes and phases of $\mathbf{h}^a_{k,n}$ are different for each antenna. Fig. 1(b) contains plots of the cumulative probability distribution $F_{|h^a_{k,n,m}|}(x)$ (blue curve) and the cumulative gain curve. Equation (9) describes the percentage of elements with amplitudes less than $x$. Equation (10) describes the percentage of the total gain of the channel that is contributed by the elements whose amplitudes are less than $x$. For example, in Fig. 1(b), the amplitudes of 90% of the elements of the channel matrix are less than 1, and the total gain provided by these elements is less than 5%. If the remaining 10% of the elements are accurately estimated, more than 95% of the channel gain is correctly obtained. A small proportion of nonzero elements thus contributes nearly all of the gain of the channel.
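The three quantities used in this subsection (the thresholded support set, the amplitude distribution and the cumulative gain) can be sketched as follows. The exact forms of (9) and (10) are assumed here from their verbal descriptions, and the function names are hypothetical.

```python
import numpy as np

def support_set(Ha, alpha):
    """Indices of the columns of Ha whose 2-norm exceeds the threshold alpha."""
    col_norms = np.linalg.norm(Ha, axis=0)
    return np.flatnonzero(col_norms > alpha)

def amplitude_cdf(Ha, x):
    """Fraction of elements with amplitude below x (assumed form of (9))."""
    return float(np.mean(np.abs(Ha) < x))

def cumulative_gain(Ha, x):
    """Share of the total channel gain carried by the elements whose
    amplitude is below x (assumed form of (10))."""
    power = np.abs(Ha) ** 2
    return float(power[np.abs(Ha) < x].sum() / power.sum())
```

For a channel like that in Fig. 1(b), `amplitude_cdf(Ha, 1.0)` would be about 0.9 while `cumulative_gain(Ha, 1.0)` would stay below 0.05, which is the observation that motivates estimating only the few significant elements.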

C. CORRELATION BETWEEN EACH ANTENNA
The channels of the antennas of a terminal share the same support $\Omega$; however, other parameters, such as the amplitude and phase, are uncorrelated. The correlation between $\mathbf{h}^a_{k,n_1}$ and $\mathbf{h}^a_{k,n_2}$ is as follows:
Table 1 shows the correlation $\eta$ of the channel vectors between four different antennas of a terminal, calculated using the 3GPP spatial channel model (SCM) [31]. The BS employs an antenna array with 200 elements; the factor $\eta$ between the different antennas is low.
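A common way to quantify such a correlation is the normalised inner product of the two channel vectors. The sketch below assumes that definition for $\eta$; the definition actually used for Table 1 is not reproduced above, so this is an illustration rather than the paper's formula.

```python
import numpy as np

def correlation(h1, h2):
    """Normalised correlation |h1^H h2| / (||h1|| ||h2||), a value in [0, 1].
    Assumed definition of eta, for illustration only."""
    h1 = np.asarray(h1, dtype=complex)
    h2 = np.asarray(h2, dtype=complex)
    # np.vdot conjugates its first argument, giving the inner product h1^H h2.
    return float(np.abs(np.vdot(h1, h2)) / (np.linalg.norm(h1) * np.linalg.norm(h2)))
```

Under this definition, two vectors with the same support but independent amplitudes and phases on that support give a value well below 1, which matches the qualitative statement that $\eta$ is low.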

D. COMPRESSED-SENSING-BASED CHANNEL ESTIMATION IN FDD LARGE-SCALE MIMO SYSTEMS
According to (1) and (5), the received signal at the terminal can be written as follows:
where $\bar{\mathbf{X}}$ is defined as the measurement matrix and is known at the terminals. The channel estimation problem can be formulated as follows:
This problem is equivalent to the following:
This estimation problem is a sparse signal recovery problem in compressed sensing. The signal of each antenna is as follows:
When a terminal employs a single antenna, the received signal $\bar{\mathbf{Y}}_k$ degenerates into a vector $\mathbf{y}_k$, and the sparse signal recovery problem is called the single measurement vector (SMV) problem in the theory of compressed sensing. When a terminal has multiple antennas, the received signal is a matrix, and the problem is called the multiple measurement vector (MMV) problem. In the MMV problem considered in this paper, the sparse signal vectors of the different antennas share a common structure because they share the same support $\Omega$.
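For orientation, one way to arrive at a linear measurement model of this kind is sketched below. It assumes the received-signal form $\mathbf{Y}_k = \sqrt{\rho}\,\mathbf{H}_k\mathbf{X} + \mathbf{W}_k$ and the angular-domain convention $\mathbf{H}^a_k = \mathbf{U}_r^H \mathbf{H}_k \mathbf{U}_t$; both are assumptions, and the barred quantities defined here need not coincide with the definitions in the original equations.

```latex
% With H_k = U_r H^a_k U_t^H (U_r, U_t unitary), take the conjugate
% transpose of the received signal and right-multiply by U_r:
\mathbf{Y}_k^{H}\mathbf{U}_r
  = \sqrt{\rho}\,\mathbf{X}^{H}\mathbf{U}_t\,(\mathbf{H}^{a}_k)^{H}
    + \mathbf{W}_k^{H}\mathbf{U}_r

% Assumed definitions (T x N, T x M, M x N and T x N, respectively):
\bar{\mathbf{Y}}_k = \mathbf{Y}_k^{H}\mathbf{U}_r,\quad
\bar{\mathbf{X}} = \sqrt{\rho}\,\mathbf{X}^{H}\mathbf{U}_t,\quad
\bar{\mathbf{H}}^{a}_k = (\mathbf{H}^{a}_k)^{H},\quad
\bar{\mathbf{W}}_k = \mathbf{W}_k^{H}\mathbf{U}_r

% Resulting MMV model and its per-antenna (column-wise) form:
\bar{\mathbf{Y}}_k = \bar{\mathbf{X}}\,\bar{\mathbf{H}}^{a}_k + \bar{\mathbf{W}}_k,
\qquad
\bar{\mathbf{y}}_{k,n} = \bar{\mathbf{X}}\,\bar{\mathbf{h}}^{a}_{k,n} + \bar{\mathbf{w}}_{k,n},
\quad n = 1,\ldots,N
```

Since $\mathbf{U}_r$ is unitary, $\bar{\mathbf{W}}_k$ has the same statistics as $\mathbf{W}_k$, and the columns $\bar{\mathbf{h}}^{a}_{k,n}$ are sparse vectors of length $M$ that share the support $\Omega$.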

III. PROPOSED ALGORITHM
A. PROCESS OF FDD DOWNLINK CHANNEL ESTIMATION
In this section, the channel estimation process is described as follows.
1) The BS broadcasts the downlink pilot to all terminals in the cell;
2) Each terminal receives the downlink pilot signal;
3) Each terminal estimates the downlink CSI according to the received downlink pilot signal;
4) Each terminal feeds back the estimated CSI, which is expressed in the virtual angular domain, to the BS via the uplink; and
5) The estimated CSI is converted into the temporal domain and is combined for downlink precoding.
Different from centralised channel estimation at the BS side, we adopt the strategy of channel estimation at the terminal.
This strategy is combined with a channel estimation algorithm based on compressed sensing and takes advantage of the sparsity of massive MIMO channels. By feeding back only the nonzero channel coefficients of the sparse channel, the strategy effectively reduces the CSI feedback overhead and the waste of uplink resources. The centralised estimation strategy requires the terminal to feed back the complete received pilot signal; thus, its feedback overhead is significantly greater than that of the estimated CSI feedback [20], [32]. However, because terminals are generally battery powered and cannot run an extremely complex estimation algorithm, a channel estimation algorithm with low complexity is required to keep the power consumption of the mobile terminal low. In the next section, we describe a low-complexity channel estimation algorithm that meets this objective and runs on the terminal with a low computational burden.

B. DESCRIPTION OF THE PROPOSED ALGORITHM
The algorithm proposed in this paper is a greedy sparse signal reconstruction algorithm for downlink channel estimation in FDD massive MIMO systems with multi-antenna terminals. It reduces the computational complexity by taking advantage of the characteristics of multi-antenna terminals.
[Algorithm: proposed channel estimation algorithm; its listing includes exit condition 1, "if $|\hat{\Omega}_i| > T$ then" exit.]
Unlike JOMP and SOMP, the proposed algorithm improves the estimation efficiency for multi-antenna terminals. Table 1 illustrates that the channel vector correlation between different antennas of the same terminal is low. Taking advantage of this feature, Lines 3-5 of the algorithm select an index for each antenna. There is a high probability that these indices are different. The proposed algorithm therefore selects up to $N$ indices in each iteration. Its efficiency is significantly higher than that of JOMP and SOMP, which add only one index to the support in each iteration.
Note: The purpose of exit condition 1 is to ensure that the algorithm is correct and stable. As the number of iterations increases, $|\hat{\Omega}_i|$ gradually increases. When $|\hat{\Omega}_i|$ is greater than the pilot length $T$, the pseudo-inverse $(\cdot)^{\dagger}$ in Line 13 does not exist (a matrix with more columns than rows cannot have full column rank), which will lead to incorrect results; therefore, it is necessary to force the algorithm out of the iteration process when $|\hat{\Omega}_i| > T$.
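The steps described in this subsection can be sketched in code. The following is a reconstruction from the prose only, not the authors' listing: the function names, the use of a least-squares solve in place of an explicit pseudo-inverse, the check of exit condition 1 before the support is enlarged, the extra stall guard, and the interpretation of $\gamma$ as a threshold on the mean residual power per entry are all assumptions.

```python
import numpy as np

def multi_index_greedy(Y, X, gamma):
    """Y: (T, N) received pilots, one column per terminal antenna.
    X: (T, M) measurement matrix. gamma: halting threshold on the mean
    residual power per entry (assumed interpretation).
    Returns the (M, N) estimate and the sorted estimated support."""
    T, N = Y.shape
    M = X.shape[1]
    support = []
    H_hat = np.zeros((M, N), dtype=complex)
    R = Y.astype(complex)                       # residual, one column per antenna
    while np.mean(np.abs(R) ** 2) >= gamma:     # residual-power halting test
        # Each antenna independently picks the column best matched to its residual.
        picks = np.argmax(np.abs(X.conj().T @ R), axis=0)
        new = sorted(set(picks.tolist()) - set(support))
        # Exit condition 1, checked before the update so that the support never
        # exceeds T columns; also stop if no new index was found.
        if not new or len(support) + len(new) > T:
            break
        support.extend(new)
        coef, *_ = np.linalg.lstsq(X[:, support], Y, rcond=None)
        H_hat[:] = 0
        H_hat[support, :] = coef
        R = Y - X[:, support] @ coef
    return H_hat, sorted(support)

def make_problem(M=128, T=48, N=4, s=6, sigma=0.01, seed=0):
    """Hypothetical test problem: +/-1 pilots, row-sparse channel, AWGN."""
    rng = np.random.default_rng(seed)
    X = rng.choice([-1.0, 1.0], size=(T, M)) / np.sqrt(T)
    support = np.sort(rng.choice(M, size=s, replace=False))
    H = np.zeros((M, N), dtype=complex)
    H[support] = (rng.standard_normal((s, N)) + 1j * rng.standard_normal((s, N))) / np.sqrt(2)
    W = sigma * (rng.standard_normal((T, N)) + 1j * rng.standard_normal((T, N))) / np.sqrt(2)
    return X, H, X @ H + W, support
```

A call such as `H_hat, est = multi_index_greedy(Y, X, gamma=sigma**2)` adds up to $N$ indices per pass, so the number of least-squares solves is typically much smaller than the support size. The price, as discussed in Section IV, is that individual picks are not globally optimal and the estimated support may contain a few extra indices.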

C. ANALYSIS OF THE ALGORITHM
When $N = 1$, the terminal receives the pilot with a single antenna, and the algorithm degenerates into conventional OMP; its complexity is then the same as that of OMP. When $N > 1$, in each iteration, $N$ elements are selected from $\Omega$ and put into the set $\hat{\Omega}$. If the $N$ elements are not all the same, the number of iterations will be less than that of other greedy algorithms, such as OMP. In the following, we explain why it is very probable that the $N$ elements are different. In other words, we show by analysis that $P(|\Omega_a| > 1) > 0$ when $N > 1$, where $\Omega_a$ denotes the set of distinct indices selected in one iteration.
First, define the sets $\Omega_1$ and $\Omega_2$ as follows:
where the set $C$ is the set of indices of the elements of $\bar{\mathbf{h}}^a_n$; $\Omega$ is the support set of $\bar{\mathbf{h}}^a_n$; and $C \backslash \Omega$ and $C \backslash \hat{\Omega}$ are the complement sets of $\Omega$ and $\hat{\Omega}$, respectively.
In the $i$th iteration, the residual $\mathbf{R}$ is computed. The residual for each antenna is expressed as follows:
Rearrange the elements in the vector $\mathbf{r}_n$ according to $\hat{\Omega}_i$, $\Omega_1$, and $\Omega_2$ as follows:
The sets $\hat{\Omega}$, $\Omega_1$, and $\Omega_2$ are mutually disjoint, and the column vectors of the matrix $\bar{\mathbf{X}}$ exhibit a low correlation. Therefore, $\bar{\mathbf{X}}_{\hat{\Omega}}^H \bar{\mathbf{X}}_{\Omega_1}$ and $\bar{\mathbf{X}}_{\hat{\Omega}}^H \bar{\mathbf{X}}_{\Omega_2}$ in (24) are approximately 0 and can be ignored. Substituting (24) into (23), we obtain the following:
Rearrange the elements in $\bar{\mathbf{X}}^H \mathbf{r}_n$ according to $\hat{\Omega}$, $\Omega_1$, and $\Omega_2$ as follows:
$\bar{\mathbf{X}}^H \mathbf{r}_n$ is split into three parts as follows:
Equation (27) indicates that the elements in $\hat{\Omega}$ will not be selected again as new elements. In (28), the first part, $\bar{\mathbf{X}}_{\Omega_1}^H \bar{\mathbf{X}}_{\Omega_1} \bar{\mathbf{h}}_{\Omega_1}$, is much larger than the noise when the SNR is large; thus, the noise can be ignored. When the SNR is low, $\bar{\mathbf{X}}_{\Omega_1}^H \mathbf{r}_n$ is determined by both parts of (28). In (29), $\Omega_2$ is the set that contains the indices of the zero elements of the sparse $\bar{\mathbf{h}}$. Therefore, $\bar{\mathbf{h}}_{\Omega_2}$ is approximately a zero vector, and (29) is determined by the noise.
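The displayed equations of this derivation are not reproduced above. A version consistent with the surrounding description is sketched below as an aid. The definitions of $\Omega_1$ and $\Omega_2$, the per-antenna noise symbol $\bar{\mathbf{w}}_n$, and the grouping of terms are assumptions inferred from the text, so the lines need not match (23)-(29) term by term.

```latex
% Assumed index sets: unselected support elements and unselected zero elements
\Omega_1 = \Omega \cap (C \backslash \hat{\Omega}), \qquad
\Omega_2 = (C \backslash \Omega) \cap (C \backslash \hat{\Omega})

% Per-antenna residual after projecting out the selected columns
\mathbf{r}_n = \bar{\mathbf{y}}_n
  - \bar{\mathbf{X}}_{\hat{\Omega}}\bar{\mathbf{X}}_{\hat{\Omega}}^{\dagger}\bar{\mathbf{y}}_n,
\qquad
\bar{\mathbf{y}}_n = \bar{\mathbf{X}}_{\hat{\Omega}}\bar{\mathbf{h}}_{\hat{\Omega}}
  + \bar{\mathbf{X}}_{\Omega_1}\bar{\mathbf{h}}_{\Omega_1}
  + \bar{\mathbf{X}}_{\Omega_2}\bar{\mathbf{h}}_{\Omega_2}
  + \bar{\mathbf{w}}_n

% With \bar{X}_{\hat\Omega}^H \bar{X}_{\Omega_1} \approx 0 and
% \bar{X}_{\hat\Omega}^H \bar{X}_{\Omega_2} \approx 0:
\mathbf{r}_n \approx \bar{\mathbf{X}}_{\Omega_1}\bar{\mathbf{h}}_{\Omega_1}
  + \bar{\mathbf{X}}_{\Omega_2}\bar{\mathbf{h}}_{\Omega_2}
  + \bar{\mathbf{w}}_n

% Correlation of the residual with the three groups of columns
\bar{\mathbf{X}}_{\hat{\Omega}}^{H}\mathbf{r}_n = \mathbf{0}
\bar{\mathbf{X}}_{\Omega_1}^{H}\mathbf{r}_n \approx
  \bar{\mathbf{X}}_{\Omega_1}^{H}\bar{\mathbf{X}}_{\Omega_1}\bar{\mathbf{h}}_{\Omega_1}
  + \bar{\mathbf{X}}_{\Omega_1}^{H}\bar{\mathbf{w}}_n
\bar{\mathbf{X}}_{\Omega_2}^{H}\mathbf{r}_n \approx
  \bar{\mathbf{X}}_{\Omega_2}^{H}\bar{\mathbf{X}}_{\Omega_2}\bar{\mathbf{h}}_{\Omega_2}
  + \bar{\mathbf{X}}_{\Omega_2}^{H}\bar{\mathbf{w}}_n
  \approx \bar{\mathbf{X}}_{\Omega_2}^{H}\bar{\mathbf{w}}_n
```

The first correlation is exactly zero whenever $\bar{\mathbf{X}}_{\hat{\Omega}}$ has full column rank, because the residual of a least-squares fit is orthogonal to the fitted columns; this is why already selected indices are not selected again.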
When the SNR is large, $\tau_n \in \Omega_1$, where $\tau_n$ is the index selected for the $n$th antenna. According to Section II-C, the correlations among antennas are small, and the indices of the maximum elements for different antennas are nearly independent. Therefore, when $|\Omega_1| > 1$, $P(|\Omega_a| > 1) > 0$ holds. In each iteration, the algorithm may thus select multiple indices, which reduces the computational burden.
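The claim that different antennas tend to select different indices can be checked with a small Monte Carlo experiment. The sketch below is illustrative only: it assumes i.i.d. complex Gaussian gains on a shared support (rather than the 3GPP SCM), a random $\pm 1$ measurement matrix, and a hypothetical function name.

```python
import numpy as np

def distinct_first_picks(M=128, T=48, N=4, s=8, sigma=0.05, trials=200, seed=0):
    """For each trial, count how many distinct indices the N antennas select
    in the first iteration (each antenna takes the argmax of |X^H y_n|)."""
    rng = np.random.default_rng(seed)
    counts = []
    for _ in range(trials):
        X = rng.choice([-1.0, 1.0], size=(T, M)) / np.sqrt(T)
        support = rng.choice(M, size=s, replace=False)
        H = np.zeros((M, N), dtype=complex)
        H[support] = (rng.standard_normal((s, N))
                      + 1j * rng.standard_normal((s, N))) / np.sqrt(2)
        W = sigma * (rng.standard_normal((T, N))
                     + 1j * rng.standard_normal((T, N))) / np.sqrt(2)
        Y = X @ H + W
        picks = np.argmax(np.abs(X.T @ Y), axis=0)   # X is real, so X^H = X^T
        counts.append(len(set(picks.tolist())))
    return np.array(counts)

counts = distinct_first_picks()
print("average number of distinct indices per iteration:", counts.mean())
```

Under these assumptions the average number of distinct indices is well above one, which is the behaviour the analysis relies on.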

D. HALTING THRESHOLD OF THE ALGORITHM
When the sparsity $|\Omega|$ is known, the iteration stops when $|\Omega| = |\hat{\Omega}|$. Although large-scale MIMO channels are sparse, they are not strictly sparse. In $\mathbf{H}^a_k$, there is a large number of nearly zero channel coefficients, and the sparsity of the channel changes continuously across coherent time blocks, which makes it difficult to determine. It is therefore inefficient to stop the estimation according to $|\hat{\Omega}|$.
In the proposed algorithm, the halting condition is based on the residual power of $\mathbf{R}$: $\|\mathbf{R}_i\|_F^2 < \gamma$ indicates the correct time to stop the iteration and obtain $\hat{\mathbf{H}}^a_k$. The algorithm should stop when all the support elements of the channel vector have been selected into $\hat{\Omega}$. At the correct time to stop, $\Omega_1 = \emptyset$ holds, and the elements of $\bar{\mathbf{h}}_{\Omega_2}$ are nearly zero. The residual is as follows:
The power of the residual is as follows:
In other words, when $\Omega_1 = \emptyset$ holds, the residual power of the signal of each antenna is approximately the power of the noise, $\sigma^2$. Therefore, the halting threshold $\gamma$ can be selected to stop the iteration when the average power of the residual $\mathbf{r}_{k,n}$ is equal to the noise power $\sigma^2$. At this time, the probability of erroneously selecting a support element is the smallest, and the performance degradation caused by excessive iterations is prevented.
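The halting argument can be illustrated numerically: once all support columns are in $\hat{\Omega}$, the least-squares residual contains only projected noise, so its mean power per entry is close to $\sigma^2$ (more precisely, about $\sigma^2 (1 - |\hat{\Omega}|/T)$), whereas a missing support column leaves signal energy in the residual. The sketch below assumes a strictly sparse channel with i.i.d. gains and a random $\pm 1$ measurement matrix; the function names are hypothetical.

```python
import numpy as np

def mean_residual_power(cols, X, Y):
    """Mean power per entry of the least-squares residual on the given columns."""
    A = X[:, cols]
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return float(np.mean(np.abs(Y - A @ coef) ** 2))

def halting_demo(T=70, M=200, N=4, s=10, sigma2=0.01, trials=50, seed=0):
    """Returns the mean residual power, normalised by sigma2, when the full
    support is used and when one support column is missing."""
    rng = np.random.default_rng(seed)
    full, missing = [], []
    for _ in range(trials):
        X = rng.choice([-1.0, 1.0], size=(T, M)) / np.sqrt(T)
        support = rng.choice(M, size=s, replace=False)
        H = (rng.standard_normal((s, N)) + 1j * rng.standard_normal((s, N))) / np.sqrt(2)
        W = np.sqrt(sigma2 / 2) * (rng.standard_normal((T, N))
                                   + 1j * rng.standard_normal((T, N)))
        Y = X[:, support] @ H + W
        full.append(mean_residual_power(support, X, Y))
        missing.append(mean_residual_power(support[:-1], X, Y))
    return np.mean(full) / sigma2, np.mean(missing) / sigma2

ratio_full, ratio_missing = halting_demo()
print(f"residual power / sigma^2: full support {ratio_full:.2f}, "
      f"one column missing {ratio_missing:.2f}")
```

With the full support the ratio sits slightly below one, and with a missing column it rises above one, so a threshold near $\sigma^2$ separates the two cases under these assumptions.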

IV. NUMERICAL RESULTS AND DISCUSSION
In this section, we describe the simulations conducted to verify the effectiveness of the proposed algorithm. Consider a scenario with a BS with 200 antennas and 10 terminals, each with 4 antennas. The channel model in the simulation is the 3GPP SCM [31]. The correlation between the column vectors of $\bar{\mathbf{X}}$ should be as small as possible. The pilot $\bar{\mathbf{X}}$ used in the simulation is chosen from 20,000 random generations of $\bar{\mathbf{X}}$, whose elements are randomly chosen from the set $\{+1, -1\}$. We consider several counterparts:
• OMP [33], [34]: a greedy algorithm for the SMV problem. The algorithm is run on an arbitrary antenna, and the estimated support $\hat{\Omega}$ is shared with the other antennas;
• JOMP [20]: a centralised channel estimation algorithm;
• SOMP with different halting conditions [35], [36]: a greedy algorithm for the MMV problem. SOMP-1 halts with a statistical sparsity bound $S$, the same as that used in [20], and SOMP-2 halts at the proposed halting threshold; and
• MSBL [19], [37]: a Bayesian learning algorithm for the MMV problem.
To evaluate the performance of $\hat{\mathbf{H}}$, the simulation is repeated $T_{\mathrm{repeat}}$ times, and the normalised mean squared error (NMSE) is applied as follows:
In Fig. 2, we show the NMSE performance and the time consumption with respect to different halting thresholds $\gamma$, where we set $T = 70$ and SNR = 20 dB. In Fig. 2(b), as the threshold $\gamma$ increases, the calculation time of the algorithm decreases because a larger threshold causes the algorithm to exit earlier. In Fig. 2(a), the NMSE curve has a minimum near the threshold $\gamma \approx \sigma^2$, which indicates that the derived halting threshold is valid: the estimation algorithm stops when the NMSE performance is approximately optimal. A suitable $\gamma$ prevents the performance loss caused by insufficient or excessive iterations. This result supports our analysis, and consequently, $\gamma$ is fixed at $\sigma^2$ in the following simulations.
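The NMSE metric can be computed as in the sketch below. The normalisation shown (the average over repetitions of the per-trial ratio of Frobenius norms) is a common definition and is assumed here; the paper's own expression is not reproduced above and may normalise differently. The function names are hypothetical.

```python
import numpy as np

def nmse(estimates, truths):
    """Average over trials of ||H_hat - H||_F^2 / ||H||_F^2 (assumed definition)."""
    ratios = [np.linalg.norm(H_hat - H) ** 2 / np.linalg.norm(H) ** 2
              for H_hat, H in zip(estimates, truths)]
    return float(np.mean(ratios))

def nmse_db(estimates, truths):
    """NMSE expressed in decibels."""
    return float(10.0 * np.log10(nmse(estimates, truths)))
```

For example, an estimate that is off by 10% in every entry, `H_hat = 1.1 * H`, gives an NMSE of 0.01, that is, -20 dB.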
Figs. 3 and 4 depict the NMSE and time consumption of the different algorithms with respect to the pilot length $T$, where we set SNR = 20 dB. In Fig. 3, the NMSE curves all gradually decrease as the pilot length $T$ increases. Regardless of the pilot length $T$, the algorithm proposed in this paper performs slightly worse than the SOMP-2 algorithm. At each terminal, in each iteration, SOMP exploits the entire received signal $\bar{\mathbf{Y}}_k$ to choose the best element of the support $\Omega$, which minimises the residual $\|\mathbf{R}_k\|_F^2$. The proposed algorithm instead chooses an element of the support for each received signal $\bar{\mathbf{y}}_{k,n}$. As a result, the calculation time of the algorithm is significantly reduced. The JOMP and SOMP-1 algorithms overestimate the support since they halt according to the statistical sparsity bound $S$ [20]. The algorithms that halt on $S$ select more incorrect elements than the algorithms that halt on the residual energy. Bayesian-based MSBL achieves the best NMSE performance for $T < 60$, whereas for $T > 60$ it no longer provides a more accurate $\hat{\mathbf{H}}$. The OMP algorithm shows the worst performance. Although the different antennas of the terminal share the same support, the importance of each index is not the same for each antenna. OMP uses the information of only one antenna, which inevitably leads to a serious performance loss.
Fig. 4 compares the time consumption of the different algorithms. Each algorithm is run 2000 times in a simulation, and the time consumption is averaged. Bayesian-based MSBL requires hundreds of times more computational time than the greedy-based SOMP and JOMP. The proposed algorithm and OMP are the two fastest algorithms. This result is consistent with our analysis.
The proposed algorithm selects several distinct elements in each iteration and finishes faster than the other algorithms. OMP only processes the received signal of one antenna and then shares the estimated support $\hat{\Omega}$ with the other antennas. Therefore, its amount of calculation is reduced severalfold. JOMP and SOMP have similar time consumption. Because these algorithms do not fully exploit the fact that the antennas of a terminal share the same support set, they must perform more operations than the proposed algorithm.
In Figs. 5 and 6, we show the effect of the SNR on the performance and time consumption, where we set the pilot length $T = 70$. Regardless of the SNR, the proposed algorithm still performs worse than the SOMP-2 algorithm, for the same reason as that given in the previous paragraph. When SNR < 20 dB, MSBL has worse NMSE performance than the greedy algorithms, whereas at higher SNRs the MSBL curve decreases quickly and falls below those of JOMP, SOMP and the proposed algorithm. The OMP algorithm still performs poorly on multi-antenna terminals. As shown in Fig. 6, the calculation time of the MSBL algorithm decreases as the SNR increases. The calculation times of JOMP and SOMP-1 remain constant and do not change with the SNR, because these two algorithms exit according to $S$, and $S$ does not depend on the SNR. In comparison, the calculation times of SOMP-2 and the proposed algorithm increase as the SNR increases because the halting threshold $\gamma$ is related to the noise power $\sigma^2$. The computational cost of the proposed algorithm is less than that of the other algorithms when the SNR is less than 30 dB.
Fig. 7 is a plot of the spectral efficiency CDF at SNR = 20 dB. The spectral efficiency is consistent with the NMSE performance. The NMSE performance of OMP is far worse than that of the other algorithms. Therefore, its spectral efficiency is far inferior to that of the other algorithms. The spectral efficiencies of SOMP and JOMP are similar, with JOMP reaching the higher value. The spectral efficiency of the proposed algorithm is approximately 40-90 bit/s/Hz. The largest spectral efficiency of the proposed algorithm is greater than that of SOMP-2, but its average spectral efficiency is not as good as that of SOMP-2.
Figs. 8 and 9 illustrate the NMSE performance and time consumption versus the number of antennas of the terminal, $N$. Increasing the number of terminal antennas means that the scale of the channel matrix increases and that more information can be used for channel estimation. As the number of antennas increases, the estimation results of JOMP, SOMP and MSBL improve, although the NMSEs of JOMP and SOMP improve only slowly beyond a certain number of antennas. The proposed algorithm improves as the number of antennas increases when there are fewer than 8 antennas. Once the number of antennas exceeds 8, the NMSE performance becomes worse as the number of antennas increases. In each iteration, the operation of selecting an index is performed independently according to the received signal of each antenna. Although multiple indices are selected from the support in one iteration, none of these indices is globally optimal. As the number of antennas increases, the received signal of each antenna is disturbed by noise, and the probability that a selected index is wrong increases, such that the performance deteriorates. In addition to the noise, when the number of antennas increases, the estimated support $\hat{\Omega}_i$ moves further away from the best support. As shown in Fig. 10, when $N = 10$, the estimation can be completed in two iterations, but the estimated support is further away from the optimal support. The simulation shows that when the number of terminal antennas exceeds 8, the effect of the noise and the increase in granularity play a leading role, which causes performance degradation. Due to the size limitation of mobile devices, the possibility of integrating a large number of antennas is low. Some high-end phones, such as the iPhone 11 and the Mi 10, are currently equipped with 4 antennas [38], [39]. Therefore, the proposed algorithm is still valuable in terms of practical applications.
In terms of time consumption, when there are fewer than 8 antennas, the proposed algorithm takes less time as the number of antennas increases. This trend is different from that of the other algorithms. It occurs because, as the number of antennas increases, the proposed algorithm can select more elements of the set $\Omega$ in an iteration. When the number of BS antennas does not change, $|\Omega|$ does not change appreciably; that is, the support of the channel remains essentially the same. The proposed algorithm is therefore able to perform the estimation with a smaller computational burden. When the number of antennas is greater than 8, the increase in complexity per iteration outweighs the reduction gained from the additional antennas. As a result, the time consumption of the proposed algorithm increases when the number of antennas is greater than 8. The time consumption of MSBL increases rapidly with the number of antennas when there are fewer than 6 antennas, while those of the JOMP and SOMP algorithms increase slowly. However, the counterparts consume much more time.
Fig. 11 compares the feedback overhead of the estimation algorithms, normalised to that of the JOMP algorithm. Although JOMP and MSBL are estimation algorithms based on compressed sensing, they perform channel estimation at the BS side and must feed back the complete received pilot signals to the BS. Therefore, the feedback overhead of these two algorithms is large and constant. The proposed algorithm and the SOMP-1 and SOMP-2 algorithms perform channel estimation at the terminal, and the feedback consists of the nonzero coefficients, with their corresponding indices, of the sparse angular-domain channel. This feedback overhead is approximately 1% of the JOMP and MSBL feedback overheads. The feedback overhead of the SOMP-1 algorithm, which uses the statistical sparsity bound $S$, is fixed and determined by $S$. The feedback overhead of SOMP-2 and the proposed algorithm, which halt on the residual energy, gradually increases with the SNR. This is because at a high SNR, the residual energy is smaller when the algorithm stops, and more elements are added to the estimated support set. At a low SNR, in contrast, the algorithm stops early to avoid an incorrect estimation of the support set.

V. CONCLUSIONS
In this paper, we consider FDD large-scale MIMO systems with multi-antenna terminals and propose a low-complexity and low-feedback compressed sensing algorithm for downlink channel estimation. The proposed algorithm first utilises the characteristics of a multi-antenna terminal to reduce the computational complexity. By applying the algorithm, the terminal incurs less feedback overhead than schemes in which the channel is estimated at the BS. Then, by analysing the power of the residual when the algorithm terminates, we propose using the power of the residual as the halting condition so that the greedy algorithm no longer requires the channel sparsity parameter. When there are fewer than 8 antennas, the advantage of the algorithm grows as more antennas are integrated into the terminal. When the number of antennas exceeds 8, the performance is reduced, but the computational complexity is still lower than that of the other algorithms.