Time-Varying Sparse Channel Estimation Based on Adaptive Average and MSE Optimal Threshold in STBC MIMO-OFDM Systems

Channel estimation is still a challenge for space time block coding (STBC) multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems in time-varying environments. To estimate the channel state information (CSI) precisely without increasing complexity in any significant way, this paper utilizes the sparsity and the inherent temporal correlation of the time-varying wireless channel, and proposes a novel channel estimation method applied to STBC MIMO-OFDM systems. The proposed method consists of two schemes: adaptive multi-frame averaging (AMA) and improved mean square error (MSE) optimal threshold (IMOT). First, the temporal correlation of the time-varying channel is modeled by a linear Gauss-Markov (LGM) model, and the AMA scheme is incorporated to refine the initial estimated channel impulse response (CIR) through noise reduction. Based on the LGM model, the optimal average frame number is adaptively determined by minimizing the MSE of the denoised CIR. Then, the sparsity of the wireless channel is utilized to model the CIR as a sparse vector, and the IMOT scheme is performed to further remove the noise effect by discarding most of the noise-only CIR taps. Specifically, the IMOT scheme is achieved by recovering the CIR support across the optimal “tap-to-tap” threshold derived by minimizing the MSE of each CIR tap. Moreover, the prior confidence level of the tap to be active is calculated through multi-frame statistics to further improve the performance of the IMOT scheme. Simulation results verify that the proposed AMA-IMOT channel estimation method can achieve better performance than comparison methods.


I. INTRODUCTION A. BACKGROUND AND CONTRIBUTIONS
As the communication requirement is growing by leaps and bounds, multiple-input multiple-output (MIMO) techniques, which have the potential to increase the channel capacity significantly, have gained considerable attention from both academia and industry. Meanwhile, diversity techniques can be applied to MIMO systems to further improve the link reliability in the flat fading channel [1], [2]. On the other hand, orthogonal frequency division multiplexing (OFDM) The associate editor coordinating the review of this manuscript and approving it for publication was Prakasam Periasamy .
techniques can divide the frequency-selective fading channel into multiple flat fading subchannels, which are widely used in modern communication systems for its superior performance and high spectral efficiency [3]- [5]. The cyclic prefix (CP) is inserted in the front of each OFDM symbol as a guard interval (GI), which can avoid the inter-symbol interference (ISI) effectively [5]. For these reasons, the MIMO and OFDM systems are usually combined as MIMO-OFDM, which can improve the system capacity and performance simultaneously [6].
Space-time block coding (STBC) is a widely used transmitter diversity scheme in MIMO-OFDM systems [7], which can obtain the maximum diversity gain and achieve maximum likelihood (ML) decoding through simple linear operation [8]. Recently, non-orthogonal multiple accessing (NOMA) integrated with MIMO-OFDM as a promising technology is actively investigated, which enables multi-user overlapping transmissions over the same timefrequency resource and thus has an improved spectral efficiency [9], [10]. However, both space-time block decoding (STBD) and the multiple user detection of NMOA all require accurate channel state information (CSI) of multiple transceiver channels. In addition, for MIMO-OFDM systems with estimated CSI, coherent demodulation can be implemented instead of differential demodulation and can obtain a 3-4 dB signal-to-noise ratio (SNR) gain [5]. For these reasons, channel estimation is an important module for STBC MIMO-OFDM systems and the accuracy of channel estimation directly affects the recovery quality of the final received signals [11]. Nevertheless, channel estimation is a challenging problem in wireless systems due to the noise effect and time varying of wireless channels. Specially, in MIMO systems, the signals from other transmit antennas add further complexity for a certain transceiver channel [12].
Channel estimation for OFDM systems has been widely researched [13]- [15]. In general, channel estimation methods can be categorized into three types: data-aided channel estimation, blind channel estimation, and semi-blind channel estimation [13]. Although blind and semi-blind channel estimation methods have higher spectral efficiency, they are often subjected to high complexity. In addition, blind channel estimation methods often require some prior knowledge of channel statistics (KCS), which is difficult to obtain [14], [15]. In practice applications, data-aided channel estimation methods are more attractive for its reliability and simplicity. By multiplexing the known time-domain training preambles or frequency-domain pilot signals into OFDM symbols, dataaided channel estimation methods can accurately estimate the channel frequency response (CFR) or channel impulse response (CIR) in a simpler way [13].
To extend the data-aided channel estimation methods into MIMO-OFDM systems, the multiplexed known data signal must be orthogonal to avoid the interference of multiple antennas. In practice, the orthogonality is usually realized by using orthogonal training sequences in code domain or silent pilot approach in frequency domain [16], [17]. Specially, for STBC MIMO-OFDM systems where the channel is assumed to be quasit-static over two or more sequential OFDM symbols, a space-time orthogonal pilot pattern is proposed in [18], which can be used for channel estimation without the reduction of frequency-domain sampling density. This pilot pattern can work well when the channel is constant over several sequential OFDM symbols, which is coincide with the assumption of STBC MIMO-OFDM systems. Therefore, the pilot pattern in [18] is also used in this paper.
Recently, researchers have proved that for some broadband wireless channels, the equivalent sampled CIR often presents a sparse structure due to the delay disparity and the relatively high sampling rate [19]. Such channels usually appear in ultra-wideband [20] and underwater [21] communications. Since many CIR taps correspond to delays where no propagation channel paths are actually present, sparse CIR indeed contains only a small proportion of nonzero valued taps whose positions form the CIR support. Through tracking and recovering of the CIR support, channel estimation accuracy can be significantly improved with a small number of pilots [19]- [21]. At the same time, for the time-varying channel, the channel variation among different OFDM symbols has an inherent temporal correlation. Such correlation can be utilized for averaging [22] or polynomial-fitting [23], which can further improve the performance of channel estimation.
Based on the sparsity and the inherent temporal correlation of the time-varying wireless channel, this paper proposes a new method for channel estimation in STBC MIMO-OFDM systems. The proposed method consists of two schemes: Scheme (a) is the adaptive multi-frame averaging (AMA) and scheme (b) is the improved mean square error (MSE) optimal threshold (IMOT). In the first scheme (a), the time-varying channel is assumed to follow an underlying linear Gauss-Markov (LGM) model [24], and the multi-frame averaging is incorporated to refine the initial time-domain least squares (LS) [25] CIR estimate through noise reduction. Based on the LGM model, the optimal average frame number is adaptively determined by minimizing the MSE of the denoised CIR. In the second scheme (b), the objective is here to propose enhancements on the denoised LS CIR by discarding noiseonly taps and retaining the most significant taps (MSTs). To precisely select the MSTs and recover the CIR support, an optimal ''tap-to-tap'' threshold is derived by minimizing the MSE of each CIR tap. For this MSE optimal threshold, the prior probability of the tap to be active is a key parameter. In the scheme (b), this key parameter is obtained by calculating the confidence level through multi-frame statistics, which utilizes the probability distribution of the estimated CIR and thus can further improve the performance of the MSE optimal threshold. It is noted that a prior channel sparsity degree, i.e. the number of nonzero taps, is required to combine these two schemes. In addition, scheme (a) and scheme (b) can also be adopted separately without any prior KCS. Simulation results show that the proposed method outperforms the conventional counterparts in terms of true CIR structure detection rate (TCSDR), bit error rate (BER) and normalized MSE (NMSE) over the time-varying sparse channel without significant increase in complexity.

B. RELATED WORK
There have been many conventional pilot-based channel estimation methods for MIMO-OFDM systems, such as the LS method [25], minimum MSE (MMSE) method [26], and linear MMSE (LMMSE) method [27]. The LS estimator has the advantage of low complexity, but it is easily affected by noise [25]. By taking advantage of channel statistics, the MMSE estimator is robust to noise effect and has good estimation performance. However, the MMSE method has a high computational complexity because it involves the inverse VOLUME 8, 2020 operation of the matrix [26]. The LMMSE estimator is a simplified version of the MMSE estimator at the expense of a slight performance degradation, but it still requires prior KCS to calculate the channel autocorrelation matrix [27].
The additive white Gaussian noise (AWGN) in the wireless channel is an important factor affecting the accuracy of channel estimation. Recently, much research work is being devoted to suppressing the noise effect without increasing complexity in any significant way. The main idea to achieve this goal is to exploit the sparse properties of the wireless channel [19]- [21]. The number of the MSTs in a sparse CIR is small and most CIR taps are indeed noise-only. Therefore, if one can get completely correct CIR support and only remove all the noise-only taps, the noise effect can be suppressed without complexity increase, and the performance of channel estimation can be improved naturally. Rosati et al. [28] have proved that MSTs-based estimator with ideal CIR support has the potential to reach comparable performance with respect to the MMSE estimator. With this aim, Minn and Bhargava [29] propose to directly select the positions of only J strongest CIR samples as the CIR support. When J is exactly equal to the channel sparsity degree, good performance can be achieved. In addition, by introducing a certain unitary transformation into pilots, Zhou et al. [30] propose a real-valued sparse Bayesian learning (SBL) approach to convert complex-valued channel recovery problems into real ones, which brings a decrease in computational complexity, as well as a good noise suppression.
The classic compressed sensing (CS) theory has been successfully applied to recover the sparse CIR support [31]. The greedy pursuit (GP) algorithms such as orthogonal matching pursuit (OMP) [32], regularized OMP (ROMP) [33], and block OMP (BOMP) [34] are the most commonly used CS algorithms, which are based on a simple greedy selection of the dictionary matrix columns. To improve the recovery performance, Ma et al. [34] propose an adaptive support aware BOMP (ASA-BOMP) algorithm, in which both timedomain preambles and frequency-domain pilots are adopted. In the ASA-BOMP algorithm, jointly sparse property of MIMO channel and the training in time domain are firstly exploited to obtain the partial common support (PCS) of multiple transceiver channels. Then, the PCS can be used to improve the recovery probability and reduce the computational complexity of the BOMP algorithm. On the other hand, according to the data fusion principle, Uwaechia and Mahyuddin [35] built a CS collaborative framework by combining OMP, ROMP and other CS algorithms, which can significantly improve the signal recovery performance. However, the above mentioned CS algorithms [32]- [35] often require a large number of iterations to correct approximation errors [36], so its computational complexity is high. Moreover, a prior KCS about channel sparsity degree is often necessary for the GP-based CS algorithms to achieve the optimal performance.
To achieve CIR support recovery with low computational complexity and without any prior KCS, threshold-based selection (TBS) method is more appropriate [37]. In the TBS method, the CIR support can be recovered only need to compare the amplitude of each sample of the raw CIR estimate with a predetermined threshold. Obviously, the operation of the TBS method is very simple and how to determine an appropriate threshold is the primary and crucial task of the TBS method [37]. According to the research of Minn and Bhargava [29], a suitable choice of the threshold depends on the operating signal-to-noise ratio (SNR), which is only related to the noise variance for power normalized transmission signal. Following that, Kang et al. [38] use the absolute value of twice the noise variance as the threshold, and Lee et al. [39] directly use the noise standard deviation obtained by wavelet decomposition as the threshold. It can be seen that the appropriate threshold is highly associated with the noise energy added to the signal, and the accuracy of the estimated noise variance is very important for the TBS method. Therefore, Xie et al. [40] propose a more rigorous universal threshold with a two-step iterative noise variance estimation method, which can greatly improve the estimation accuracy of the noise variance. In this way [38]- [40], a dynamic number of MSTs is selected per OFDM symbol, and the noise effect can be effectively suppressed without any prior KCS during different SNR scenarios.
However, the above thresholds [38]- [40] are set according to heuristics, which is not optimal and can be further improved based on some optimal criteria. Therefore, by maximizing the correct detection probability of the MSTs, Oliver et al. [41] propose an optimal threshold for sparse Rayleigh channel estimation. In the same way, Rosati et al. [42] derive a sub-optimal threshold by minimizing the MSE of the global CIR. As expected, the performance of thresholds [41] and [42] is better than conventional heuristic threshold. However, due to the derivation of these optimal thresholds is usually based on the channel statistic, to obtain optimal performance, the calculation of thresholds [41] and [42] requires a prior KCS about channel sparsity degree. To overcome this disadvantage and further improve the performance, Jellali and Atallah [43] derive a tap-tuned threshold by minimizing the MSE of each CIR tap. Different from conventional global threshold [38]- [42], which is constant in one OFDM symbol, this tap-tuned threshold allows the value of the threshold for each CIR tap can be different in one OFDM symbol, and thus have the potential to achieve better performance. Inspired by the threshold in [43], this paper proposes an IMOT approach, which is the scheme (b) in the proposed method, to obtain better recovery performance. It is noted that the gain of the IMOT compared to the threshold [43] will be shown in the simulation results of Section IV.
Although the TBS method with an ideal threshold can effectively suppress noise effect by discarding noise-only taps, the noise existing in MSTs still restricts the accuracy of channel estimation. To address this problem, multi-frame averaging can be used to suppress the noise effect in MSTs, which is based on the inherent temporal correlation of the time-varying channel. Following that, Lee et al. [44] propose a noise reduction method by averaging the channel coefficients of LS estimation in two or more OFDM frames. The averaging method in [44] uses a preset and fixed average frame number, and works well in the static channel. However, in the time-varying channel, the Doppler spread will bring Doppler distortion in the averaging process. Although the multi-frame averaging method with more average frames can obtain better noise suppression ability, the Doppler distortion caused by it will be more serious [45]. Therefore, the average frame number is a crucial parameter for the multi-frame averaging method and should be carefully determined. According to the research of Zhang et al. [45], a suitable choice of the average frame number is related to the Doppler spread and the operating SNR in the time-varying channel. To quantitatively analyze the effect of Doppler distortion on the accuracy of channel estimation, the LGM model can be used to describe the variation of the time-varying channel [24]. After that, by minimizing the MSE of the CIR after averaging, this paper proposes an AMA approach, which is the scheme (a) in the proposed method, to obtain the MSE optimal average frame number.
In addition, the multi-frame averaging method can not only be used to suppress the noise effect. According to the research of Rosati et al. [28], by extending the observation window over several OFDM symbols, which is similar to the multiframe averaging, higher reliability of the MSTs selection can be achieved, especially in low SNR conditions. Therefore, by combining the multi-frame averaging method and the TBS method, this paper proposes an AWA and IMOT based channel estimation method in STBC MIMO-OFDM systems. The three main novelties in this paper are as follows: First, based on the tap-tuned MSE optimal threshold in [43], a ''tapto-tap'' IMOT is derived by modifying the MSE introduced in the case of false alarm and missed detection. Moreover, the prior confidence level of the tap to be active is calculated through multi-frame statistics to further improve the performance of the IMOT approach. Second, based on the LGM model, the effect of Doppler distortion in the averaging process is analyzed quantitatively. Then, the AMA approach is derived to obtain the optimal average frame number by minimizing the MSE of the denoised CIR. Third, the IMOT approach and the AMA approach are combined, which can not only suppress the noise in the MSTs, but also further improve the recovery probability of the CIR support.
The remainder of this paper is organized as follows. Section II introduces the sparse channel estimation in STBC MIMO-OFDM systems where the system model, the sparse channel model, and the time-domain LS estimator are described. Section III details the proposed AMA and IMOT based channel estimation method. The experimental results of the proposed method are presented in Section IV. Finally, the conclusions are discussed in Section V.

II. SPARSE CHANNEL ESTIMATION IN STBC MIMO-OFDM SYSTEMS
A. SYSTEM MODEL A 2 × 2 STBC MIMO-OFDM system with N subcarriers and utilizing the proposed channel estimation method, is presented in Fig. 1, where a space-time orthogonal pilot pattern [18] is used to avoid interference from other transmit antennas. This paper assumes that the system is perfectly synchronized and that the CP length N CP is longer than the maximum channel delay spread M . It is noted that the extension of the proposed channel estimation method from 2 × 2 STBC MIMO-OFDM system to other types of STBC MIMO-OFDM systems is possible, if different codeword matrices and orthogonal pilot patterns are used.
At the transmitter side, the input bits are first grouped and mapped according to a pre-specified modulation schemequadrature phase shift keying (QPSK). Then, the modulated symbols are sent to the STBC block. The codeword matrix of the STBC is different depending on the number of the transmit antenna N t . For example, for a MIMO-OFDM system with N t = (2, 3, 4), the full rate Alamouti [7], rate 3/4 non-square and square [11] codeword matrices are widely used in practice, respectively. Therefore, the Alamouti code is employed in this paper for the 2 × 2 STBC MIMO-OFDM system, and its codeword matrix X(k) corresponding to the kth subcarrier can be represented as [7]: where the superscript (·) * represents the conjugate operation, X t (k, n) and X t (k, n + 1), t = 1, 2 represent the frequencydomain data transmitted from the tth transmit antenna, and located on the kth subcarrier of the nth and (n + 1)th OFDM symbols, respectively. The complex symbol s n (k) and s n+1 (k) corresponding to the kth subcarrier are drawn from the QPSK constellation at the nth and (n + 1)th time slots, respectively. Therefore, at a given symbol period, two complex symbols are simultaneously transmitted from two antennas, and the actual number of the QPSK symbols per Alamouti codeword matrix is two. At the receiver side, after CP removal operation, the fast Fourier transform (FFT) output Y r (n) ∈ C N ×1 , r = 1, 2 at the rth receive antenna and the nth OFDM symbol can be represented as [11]: where diag (X t (n)), t = 1, 2 denotes the N × N diagonal matrix with the vector X t (n) on its principal diagonal, and represents the transmitted nth OFDM symbol corresponding to the tth transmit antenna. The notation I 2 is a 2×2 identity matrix and the matrix F ∈ C N ×M is obtained by selecting the first M columns of the standard N × N discrete Fourier transform (DFT) matrix. The operator ⊗ denote the Kronecker product and h r (n) = [h T 1r (n), h T 2r (n)] T ∈ C 2M ×1 is the channel vector for the nth OFDM symbol obtained by stacking h tr for all the transmit antennas, where h tr ∈ C M ×1 is the sampled equivalent CIR between the tth transmit antenna and rth receive antenna. The noise n r (n) ∈ C N ×1 at the rth receive antenna and the nth OFDM symbol is the zero-mean complex AWGN with covariance matrix σ 2 I N , i.e. n r (n) ∼ CN (0, σ 2 I N ). After FFT transformation, the estimated CSI for each transmitter-receiver pair is obtained by the proposed AWA and IMOT based channel estimation method, which will be detailed in Section III. For the 2 × 2 STBC MIMO-OFDM system, it is assumed that the channel responses are constant during two sequential OFDM symbols. Therefore, STBD can be implemented only by using CFR estimated for a specific OFDM symbol, and the received signals corresponding to the kth subcarrier after STBD can be represented as [18]: whereŝ n (k) andŝ n+1 (k) corresponding to the kth subcarrier are the decoded QPSK symbols at the nth and (n + 1)th time slots, respectively. Y r (k, n) and Y r (k, n + 1) are the received frequency-domain data at the rth receive antenna, and located on the kth subcarrier of the nth and (n+1)th OFDM symbols, respectively.Ĥ tr (k, n) represents the estimated CFR between the tth transmit antenna and rth receive antenna. Obviously, the accuracy of channel estimation directly affects the recovery quality of the final received signals.

B. SPARSE CHANNEL MODEL
In typical broadband wireless channels, the CIR is intrinsically sparse due to several significant scatterers [31]. For a 2 × 2 MIMO-OFDM system, the CIR h tr (τ ) at the time τ between the tth transmit antenna and rth receive antenna can be modeled as: where the quantities α l tr , τ l tr , δ(·) and L represent the path gain and path delay for the lth multipath component, the Dirac delta function, and the total number of dominant channel paths, respectively. In Rayleigh fading, at any time instant α l tr (τ ) can be modeled as a complex Gaussian random variable with zero mean and variance σ 2 l /2 per branch. The total channel energy is normalized to one, i.e. L−1 l=0 E (α l tr ) 2 = L−1 l=0 σ 2 l = 1. For the STBC MIMO-OFDM system, it is assumed that the channel responses are constant for each codeword matrix [11]. Therefore, it is reasonable to consider that the CIR is quasi-static during one OFDM symbol. Consequently, the time-domain sampled equivalent CIR h tr [m, n] at the mth sample of the nth OFDM symbol can be expressed as: where T s is the system sample period. For each dominant channel path, depending on whether its delay τ l tr is integer multiples of T s or not, its energy is mapped into one tap or leaks over more adjacent taps. It should be noted that the non-sample spaced channel can be approximated as sample spaced when over-sampling is applied [43]. Therefore, for simplicity without losing generality, the CIR considered in this paper is only sample spaced, and it exhibits a perfect sparse structure, which can be represented as: where c l tr is an integer and τ l tr = c l tr T s . Hence, c l tr indicates the position of the lth nonzero CIR tap, i.e. lth MST, and the set C tr = [c 0 tr , c 2 tr , · · · , c L−1 tr ] denotes the CIR support of the channel between the tth transmit antenna and rth receive antenna. Since the CIR is intrinsically sparse, the number of the nonzero CIR taps is much smaller than the maximum channel delay spread, i.e. L M . Following the continuous transmission of OFDM symbols, the channel for each transmitter-receiver pair can be abstracted as two domains, i.e., tap and symbol, which is shown in Fig. 2. In Fig. 2, the arrows groups are CIR gains of the channels. Each arrow represents a nonzero tap for the CIR. Consequently, the figure illustrates the variation of the path gains for a L-tap sparse channel.
In tap domain, the CIR for a specific OFDM symbol is modeled by (6), and the tap-based CIR at the nth OFDM symbol for the tth transmit antenna and rth receive antenna can also be expressed as a sparse vector: h tr (n) ∈ C M ×1 is a L-sparse vector, and its nonzero elements are indexed by the support set C tr . For wireless channels, in symbol domain, the variation of the path delays is much slowly in contrast to the path gains. Thus, the CIR support is nearly unchanged over a large number of OFDM symbols [19]. Moreover, based on the inherent temporal correlation of the time-varying wireless channel, the variation of the path gains is continuous and highly correlated for adjacent OFDM symbols, and it can be modeled using an underlying LGM model. For ease of notation, this paper uses h (l) n = α l tr (n) to represent the lth path gain of any transmitterreceiver pair at the nth OFDM symbol. Provided that the lth CIR nonzero tap remains active, the probability density function (PDF) for h (l) n would be illustrated as the following equations [24]: where σ 2 l is the average power of the lth path and γ is the temporal autocorrelation of the CIR.

C. TIME-DOMAIN LS CHANNEL ESTIMATION
The LS estimator can be directly applied to STBC MIMO-OFDM systems by using space-time orthogonal pilot pattern [18]. In space-time orthogonal pilot pattern, the pilot symbols from different transmit antennas are orthogonal in time domain instead of in frequency domain. Therefore, N P uniformly spaced subcarriers (with N P ≤ N ) are used as pilot subcarriers which are shared by different transmit antennas, and the pilot position ranged in ascending order is denoted as [P 1 , P 2 , · · · , P N P ]. In 2 × 2 MIMO-OFDM system, the space-time orthogonal pilot pattern can be expressed as: where represents the pilot symbol corresponding to the tth transmits antenna and the nth OFDM symbol. X P is the pilot vector with element randomly drawn from the QPSK constellation, and mod(·) is the modulo operation. After substituting (10) into (2), the FFT output Y rP (n) ∈ C N P ×1 over the pilot subcarriers at the rth receive antenna and the nth OFDM symbol can be expressed as: where F P ∈ C N P ×M and n rP (n) ∈ C N P ×1 are both the submatrix indexed by [P 1 , P 2 , · · · , P N P ] in row, obtained from the matrix F and n r (n), respectively. It can be seen from (11) that for a specific transmitter-receiver pair, there is no interference from other transmit antennas over the pilot subcarriers. For ease of notion without losing generality, this paper drops the antenna index in subscript when next to illustrate the channel estimation method, and the CIR LS estimateĥ LS (n) at the nth OFDM symbol can be expressed as [45]: where F + P = (F H P F P ) −1 F H P = (1/N P )F H P , the superscript (·) H stands for Hermitian transposition. h(n) ∈ C M ×1 is the actual CIR and n LS (n) ∈ C M ×1 is the noise in the CIR LS estimate, which is the zero-mean complex AWGN with covariance matrix σ 2 n I M , i.e. n LS ∼ CN (0, σ 2 n I M ). Then, this paper will characterize the statistical properties of the LS estimation. Considering the sparsity of the wireless channel, the LS CIR in (12) can also be represented as: In typical Rayleigh fading channel, the CIR coefficient h(m, n) can be modeled as a zero-mean complex Gaussian random variable with variance σ 2 m /2 per dimension. Actually, it can be seen from (4) and (6), when m is the lth element in the CIR support set C, it is a fact that σ 2 m ≡ σ 2 l . Since n LS (m, n) is the complex AWGN, the independence between h(m, n) and n LS (m, n) allows to deduce that h(m, n) + n LS (m, n) is also a zero-mean complex Gaussian random variable. Therefore, the random variableĥ LS (m, n) is distributed as:

III. PROPOSED CHANNEL ESTIMATION METHOD BASED ON AMA AND IMOT
To suppress the noise effect and improve the estimation accuracy without increasing complexity in any significant way, this paper proposes a novel channel estimation method which consists of two schemes: AMA and IMOT. For the noise suppression of these two schemes, AMA scheme, based on the inherent temporal correlation of the time-varying channel, is achieved by averaging the estimated channel coefficients of the latest few frames; IMOT scheme, based on the sparsity of the wireless channel, is performed by recovering the CIR support across the optimal ''tap-to-tap'' threshold and removing all the noise-only taps. Furthermore, these two schemes can also be adopted separately. Next, this paper will detail these two schemes firstly, and then illustrate the combination of the two schemes.
A. AMA SCHEME As can be seen from (12), the noise in the LS CIR is the main factor that restricts the accuracy of channel estimation, and the MSE of the LS channel estimation can be derived as: Therefore, the MSE of the LS estimation is determined by the noise power. To reduce the MSE of the LS estimation, multi-frame averaging is used to suppress the noise power. If the wireless channel is time-invariant, i.e. h(i) ≡ h(n), i = 1, 2, · · · , N n , the averaging of multi-frame LS CIR can be expressed as: where 1 ≤ n ≤ N n , the quantities F, N n ,ĥ AVE (n), and n AVE (n) are the average frame number, the total number of transmitted OFDM symbols, the denoising CIR, and the noise of the nth OFDM symbol after multi-frame averaging, respectively. It is noted that n AVE (n) is still the zero-mean complex AWGN, but its variance has been reduced to σ 2 n /F. Thus, the MSE after multi-frame averaging can be expressed as: As can be seen from (17), the MSE of the multi-frame averaging method decreases with the increase of F in the timeinvariant channel. However, in the time-varying channel, due to the effect of Doppler spread, the path gains during multiple continuous OFDM symbols cannot be kept constant. Nevertheless, based on the inherent temporal correlation of the time-varying channel, the multi-frame averaging method can still work by carefully selecting F. For the mth CIR tap of the nth OFDM symbol, the estimation error ε (m,n) after multi-frame averaging can be expressed as: Subsequently, the MSE after multi-frame averaging in the time-varying channel can be derived as: where the first and second item in the MSE results are contribute from the Doppler distortion D m F and the noise, respectively.
It can be seen from (19) that both Doppler distortion and noise are controlled by the parameter F. Therefore, to achieve the minimum of the MSE after multi-frame averaging, it is necessary to adaptively obtain and adjust F according to the channel parameters, which is also the main idea of the proposed AMA scheme. Next, to derive the optimal average frame number F OPT , D m F in the MSE result should be further analyzed.

1) ANALYSIS OF THE DOPPLER DISTORTION
To quantitatively analyze the effect of Doppler distortion in the averaging process, the LGM model in (8) and (9) can be extended to describe the variation of the CIR tap between any two different OFDM symbols [24]: where j is an integer, m ∈ C, and h m n = h (m, n) represents the value of the mth CIR tap at the nth OFDM symbol. γ j is the temporal autocorrelation between h(n) and h(n + j), it is given by [46]: where J 0 (·), f d , and T G = (N + N CP )T s represent the first type of zero-order Bessel function, Doppler spread, and the OFDM symbol period with the GI, respectively. Obviously, γ j is an even function, i.e. γ j = γ −j .
According to (21), h m n+j can be represented by h m n : where e m j ∼ CN (0, (1 − γ 2 j )σ 2 m ) represents the modeling error, and it is independent with the CIR. Therefore, the Doppler distortion D m F can be expressed as: where D m Fj = h m n−j − h m n represents the variation of the CIR tap after j OFDM symbols, based on (23), it can be re-written as: Based on (22)-(25), the first item in the MSE result (19), which is contribute from the Doppler distortion, can be further analyzed, and (19) can be re-written as: where φ F = F−1 j=0 (γ j − 1). Obviously, based on (22), (26) has only three unknown parameters: F, f d , and σ 2 n . Next, this paper uses two simple methods to estimate f d and σ 2 n , respectively.

2) ESTIMATION OF UNKNOW PARAMETERS
It can be seen from (12) that the LS noise n LS (m, n) is a complex AWGN, and consequently its amplitude |n LS (m, n)| follows the Rayleigh distribution with cumulative distribution function (CDF) given by: For the nth OFDM symbol, when Q n (x) = 0.5, the corresponding value of x is the median of |n LS (n)|, thus the relation between the variance of n LS (n) and its amplitude's median value median(|n LS (n)|) can be written as [40]: [median(|n LS (n)|)] 2 ln 2 .
For sparse channel, the number of the nonzero CIR taps is much smaller than the maximum channel delay spread, the majority of taps are noise-only tap. Therefore, it is  (32) γ min : allowable minimum temporal autocorrelation N n : total number of transmitted OFDM symbols T G : OFDM symbol period with the GI F: average frame number F max : allowable maximum average frame number F OPT : optimal average frame number The Searching Procedure: 1: Initialization Determine the search space The estimation of f d is investigated in [45], and it is based on the fact that the autocorrelation function of the received pilot symbols is the same as that of the CIR, i.e. γ (j) = J 0 (2πf d |j| T G ). Searching for the first negative value of γ (j), and then let that j be z. Then, the first zero crossing point z 0 of γ (j) can be calculated as [45]: where the autocorrelation γ (·) is estimated as follows: The first zero crossing point of J 0 (x) is x = 2.405 [45]. Thus, using the estimated zero crossing point z 0 , the Doppler spread VOLUME 8, 2020 can be estimated as:f

3) THE IMPLEMENTATION OF AMA
The main idea of the proposed AMA scheme is to adaptively adjust F according to the parametersσ 2 n andf d , which are estimated from (29) and (32), to obtain F OPT minimizing the MSE AVE in (26). Note that F is a positive integer and it is limited to the range of [1, N n ], therefore F OPT can be directly searched step by step with step length F = 1. At the same time, considering that there should be a strong temporal correlation between the frames used for averaging, the search space can be further reduced by setting an allowable minimum temporal autocorrelation γ min . The procedure to search F OPT is described in Algorithm 1.
In Algorithm 1, we first reduce the search space by determining the maximum allowable average frame number F max , and then search F OPT step by step in the reduced search space [1, F max ]. Based on the LS estimate and the Algorithm 1, the specific process of the proposed AMA scheme is shown in Fig. 3. It is noted that the red dotted line block in Fig. 3 is actually the Algorithm 1, which is also the core process of the AMA scheme.

B. IMOT SCHEME
Due to the introduction of Doppler distortion in the timevarying channel, the performance of AMA scheme in the scenarios of high Doppler spread will not be improved apparently. To suppress the noise effect in more general scenarios, the sparse properties of broadband wireless channels can be exploited. In typically sparse channel, not all the LS CIR taps are significant; in fact, most of them contain only noise. Therefore, by collecting all the positions of the MSTs and discarding other noise-only taps, the noise in LS CIR can be significantly suppressed. Obviously, the critical aspect of this method is the recovery of the CIR support.
The TBS method is a simple and effective strategy to recover the CIR support [38]- [43], and it is based on the concept that only those CIR taps whose amplitude overcome the threshold T are retained: where for ease of notation, we drop the symbol index without any loss of generality. It is clear that, how to determine an appropriate threshold T is the primary and crucial task for the TBS method. Different from typically global threshold [38]- [42], which is constant in one OFDM symbol, this paper adopts a ''tap-to-tap'' threshold T m , which is inspired by [43], to track and recover the CIR support, hoping to achieve an improved recovery performance. This ''tap-totap'' threshold T m allows the value of the threshold for each CIR tap can be different in one OFDM symbol, and it is obtained by deriving a closed form expression for the TBS MSE per CIR tap, and then minimizing it with respect to T m .

1) ANALYSIS OF THE TBS MSE PER CIR TAP
Based on the estimated LS CIR in (13), for the mth CIR tap, the estimation error ε m TBS = ĥ TBS (m) − h(m) after the threshold decision of T m can be expressed as: Then, the tap elementwise MSE can be derived as: It can be seen from (34) and (35) The analytical derivation is provided in Appendix. A. Missed Alarm (MA): A significant tap that actually contains channel energy is rejected, since it does not overcome the threshold, i.e. ĥ LS (m) = |h(m) + n LS (m)| < T m . This case contributes to MSE m TBS with the neglected channel energy, and MSE m TBS is expressed as: The analytical derivation is also provided in Appendix. B.

Correct Detection (CD):
A significant tap that contains channel energy is correctly detected as MST, i.e. ĥ LS (m) = |h(m) + n LS (m)| ≥ T m . In this case, it contributes to MSE m TBS with the noise components of the estimated LS CIR, and MSE m TBS is expressed as: Based on the above analysis of these four possible cases, the tap elementwise MSE after threshold decision can be written as sum of three terms that represent the different contributions from, respectively, FA, MA, and CD: where P m is the prior probability of the mth tap to be active. P MA m , P CD m , and P FA m are respectively the probabilities of missed alarm, correct detection, and false alarm. It can be seen from (14) thatĥ LS (m) is a complex Gaussian random variable, and consequently its amplitude ĥ LS (m) follows the Rayleigh distribution. Therefore, the probabilities P MA m , P CD m , and P FA m can be derived as: Substituting (36)- (38) and (40)- (42) into (39), the tap elementwise MSE can be re-written as: where

2) OPTIMAL THRESHOLD: ANALYTICAL EXPRESSION
The aim of the IMOT is to find the optimal ''tap-to-tap'' threshold T IMOT m which minimizes MSE m TBS , which equals to the search of the global minimum of (43). For this purpose, the first order derivative of MSE m TBS (T m ) with respect to T m is analyzed. From (43), the derivative is expressed as: where µ 1 and µ 2 are respectively the first order derivative of µ 1 and µ 2 with respect to T m . Setting (44) equals to zero, and it can be expressed as: Note that (45) is a transcendental equation, so it is hardly to obtain its analytical solution directly. Intuitively, the complexity of (45) is mainly from µ 1 and µ 2 , which are related to T m . Therefore, to simplify the equation (45), this paper assumes that ''tap-to-tap'' threshold T m in µ 1 and µ 2 can be approximated by a global thresholdT proposed in [40]:T = σ n 2 ln(N CP ).
This assumption is reasonable, since the global threshold T is simple and shows great performance [40]. Under this assumption,μ 1 =T 2 (eT 2 σ 2 m − 1) andμ 2 =T 2 , which are both constant during one OFDM symbol, are used to replace µ 1 and µ 2 in (45) respectively. In this case,μ 1 = µ 2 ≡ 0, and (45) can be re-written as: It can be seen that (47) is true if and only if at two possible values of T m , denoted T m1 and T m2 , which verify: Obviously, T m1 is meaningless, and the optimal ''tap-to-tap'' threshold is only related to T m2 . The equation (49) makes sense if and only if 0 ≤ α ≤ 1. In the following, we will discuss two cases.
• if α > 1, in this case β > 0 is always true, which results in ∂MSE m TBS ∂T m ≥ 0. Then, the optimal ''tap-totap'' threshold minimizing the MSE m TBS is T IMOT m = T m1 = 0. In other words, any taps are treated as MSTs and retained.

3) THE IMPLEMENTATION OF IMOT
The equations (47) and (50) show that for T IMOT m evaluation, the parameters σ 2 n , σ 2 m , and P m must be known. The estimated LS noise varianceσ 2 n can be obtained according to (29). Meanwhile, the variance of the CIR coefficient σ 2 m can also be estimated from the raw LS CIR estimateĥ LS (m) [43]: Furthermore, the parameter P m is the prior probability of the mth tap to be active, and it is crucial for the accuracy of the optimal ''tap-to-tap'' threshold. In the literature [43], two possible values are proposed to evaluate P m . One isP m =σ 2 m denoted as ''local sparsity level (LSL)'', and the other isP m = P ≡ L N CP denoted as ''global sparsity level (GSL)''. The performance of LSL is slightly better than GSL, and the calculation of LSL requires no prior KCS about channel sparsity degree. Therefore, based on LSL and the raw LS CIR estimate, this paper proposes an enhancement method to evaluates P m by calculating the confidence level through multi-frame statistics, which uses the probability distribution of the AWGN and thus can further improve the performance of the optimal ''tap-to-tap'' threshold.
Suppose the number of statistical frames is W , for every W consecutive frames, the estimated multi-frame LS CIRs [ĥ LS (n),ĥ LS (n+1), · · · ,ĥ LS (n+W −1)] are collected. Next, for the mth tap of the collected multi-frame LS CIRs, this paper counts the number of the real part of the mth LS CIR tap value located in the intervals of (−∞, 0) and (0, +∞), which can be expressed as N m Re1 and N m Re2 respectively [3]: whereĥ m LS,n =ĥ LS (m, n) represents the value of the mth LS CIR tap at the nth OFDM symbol, count[·] and Re(·) denote the operations of element counting and extracting the real part of an element respectively. In the same way, the number of the imaginary part of the mth LS CIR tap value located in the intervals of (−∞, 0) and (0, +∞) can be expressed as N m Im1 and N m Im2 respectively. Then, this paper defines a variable N m max as: where N m Re = max(N m Re1 , N m Re2 ) and N m Im = max(N m Im1 , N m Im2 ). Without losing any generality, W is assumed to be an odd number for convenience, so all the possible value range of both N m Re and N m Im is [(W + 1)/2, (W + 3)/2, · · · , W ]. Consequently, all the possible values of N m max are only in the range of [W +1, W +2, · · · , 2W ]. If the mth LS CIR tap is noise-only tap, i.e. m / ∈ C, based on the PDF of the zero-mean complex AWGN, the probabilities of the real or imaginary part of the mth tap value located in the intervals of (−∞, 0) and (0, +∞) both are 0.5. So, in the case of m / ∈ C, N m Re = N m Im = (W + 1)/2 is highly possible, and N m max ≈ W + 1 is an event with high confidence level. On the other hand, considering the power of the propagation channel path is usually much greater than noise power in practice [43], if the mth LS CIR tap is MST, i.e. m ∈ C, the values of N m Re and N m Im are almost only depend on the pure CIR tap (h m n , h m n+1 , · · · , h m n+W −1 ). Although the path gain values during multiple continuous OFDM symbols in the time-varying channel are variable, the positive or negative signs of the path gains among W statistical frames are almost constant, which is based on the inherent temporal correlation of the time-varying channel [3]. Therefore, in the case of m ∈ C, N m Re = N m Im = W is more possible, and N m max ≈ 2W is an event with high confidence level.
Summarily, after conducting multi-frame statistics, the resulting variable N m max will indicate whether the mth CIR tap is active or not. That is, the closer N m max is to 2W , the higher the confidence level of the mth CIR tap is active. Conversely, the closer N m max is to W + 1, the lower the confidence level of the mth CIR tap is active. Therefore, this paper plans to use the confidence level to evaluate P m , which can be expressed as:P where i ∈ [W + 1, W + 2, · · · , 2W ]. According to the Bayes formula, (55) can be re-written as: ∈ C), and P(m ∈ C) denotes the initial probability of the mth CIR tap is active, it can be evaluated by LSL, i.e. P(m ∈ C) =σ 2 m . It is noted that the calculation of P i 1 and P i 2 involves the derivation of the distribution law of the sum of two independent discrete variables, and the specific calculations of P i 1 and P i 2 are provided in Appendix. C and Appendix. D, respectively. Now, by taking the calculated P i 1 and P i 2 into (56), the specific confidence levelP m (i) after multi-frame statistics can be obtained to evaluate P m . So far, the unknown parameters   Fig. 4. Noted that the red dotted line blocks in Fig. 4 are the core processes of the IMOT scheme.

C. THE COMBINATION OF AMA AND IMOT SCHEMES
Although the IMOT scheme can effectively suppress noise effect by discarding noise-only taps, the noise existing in MSTs still restricts the accuracy of channel estimation. To address this problem, this paper proposes a novel channel estimation method, named as AMA-IMOT, by combining AMA and IMOT schemes. The basic process of the AMA-IMOT method is to make a threshold decision on the estimated CIR after multi-frame averaging. As a result, the AMA-IMOT method can not only remove the noise effect at the noise-only taps, but also suppress the AWGN at the MST taps. Furthermore, since the derived optimal threshold shown in (50) is related to noise power, the noise suppression of the preceding AMA scheme can enhance the performance of the subsequent IMOT scheme naturally, and the TCSDR can be further improved. The specific process of the proposed AMA-IMOT channel estimation method is shown in Fig. 5.
To achieve the appropriate combination of scheme (a) AMA and the scheme (b) IMOT, some modifications different from the separately schemes are required, and the modified steps have been marked by yellow filling block in Fig. 5. Since the noise in noise-only taps will be removed by IMOT scheme, the MSE contribute from the noise-only taps can be ignored for searching the optimal average frame number in scheme (a), and the MSE after multi-frame averaging shown in (26) should be re-written as: whereL is the estimated channel sparsity degree. In theory, the best performance is obtained in case ofL = L, however, as shown in the simulation results, the AMA-IMOT method is robust even in the case of a mismatch betweenL and L. Therefore, in scheme (a) the searching of the optimal average frame number is still based on Algorithm 1, only the MSE expression (26) in step 3 should be replaced by (57).
Since the introduction of Doppler distortion, it is a fact that F OPT = 1 is a common case in the scenarios of high Doppler spread. In other words, the multi-frame averaging method is invalid for high-mobility situations. To improve the TCSDR in the case of F OPT = 1, scheme (b) is divided into two cases depending on the value of F OPT .
In the case of F OPT = 1, the process of scheme (b) is almost the same as that of IMOT scheme, with the main difference being the initial input CIR. The initial input CIR of scheme (b) can be expressed as: For the derivation of the optimal threshold, the noise and path variances of the initial input CIR must be known. Obviously, the estimation of the noise variance of the initial input CIR h AVE is σ 2 A = σ 2 n F OPT , and the estimation of the path varianceσ 2 m can be obtained by replacingĥ LS andσ 2 n in (51) withĥ AVE andσ 2 A . Then, by substitutingσ 2 m andP m into (50) and replacing σ 2 n with σ 2 A , the optimal ''tap-to-tap'' threshold T IMOT m can be calculated. Finally, the estimated CIRĥ AI of the AMA-IMOT method in the case of F OPT = 1 can be expressed as: On the other hand, in the case of F OPT = 1, the noise suppression ability of the scheme (a) is invalid, but the TCSDR of the derived threshold can also be improved by extending the observation window over several OFDM symbols, which is similar to the multi-frame averaging in form, but the parameter F OPT is replaced by F max : Then the estimation of the noise variance of the initial input CIRĥ AVE is σ 2 A = σ 2 n F max , and the estimation of the path VOLUME 8, 2020 variance is same as the case of F OPT = 1. However, although the threshold is still obtained based onĥ AVE and (50), it is noted that the coefficient of the finally estimated CIRĥ AI after threshold decision is based onĥ LS : Therefore, the estimated CIR after maximum averaging is only used to recovery the CIR support, but the finally obtained values of the MSTs are still based on the LS CIR. In other words, in the case of F OPT = 1, the TCSDR of AMA-IMOT method is further improved than IMOT scheme, but the noise in the MSTs is no longer suppressed.

IV. SIMULATION RESULTS AND DISCUSSIONS
In this section, the simulation experiments are presented to demonstrate the performance of the proposed AMA-IMOT channel estimation method. The simulation is performed in the time-varying multipath environments with different Doppler spread. For obtaining a comprehensive evaluation of the proposed AMA-IMOT method, three sparse channel models, namely Channel-A [19], Channel-B [29], and Channel-C [37], with different channel sparsity degrees and power delay profiles (PDPs) are considered. The Channel-A model is the typical exponentially decaying sparse channel with a CIR length M = 256, where only L = 6 taps with random positions are non-zero MSTs. An exponentially decaying PDP with rate γ = 4/M is used in the Channel-A, thus the path power for the MST is σ 2 m = e −γ m . The Channel-B is the advanced television technology center (ATTC) and the Grand Alliance DTV laboratory's ensemble E model, while the Channel-C model is the Hilly Terrain channel. The PDPs of both models are shown in Tables 1 and 2,  respectively. This paper considers a QPSK modulated 2 × 2 STBC MIMO-OFDM system, and the main simulation parameters for the system are presented in Table 3. In the STBC MIMO-OFDM system, the number of the total subcarriers is 1280, and the CP occupies 256 subcarriers. Therefore, the total number of actual application subcarriers including pilots and data is N = 1280 − 256 = 1024, where N P = N /4 = 256 comb type pilot subcarriers are employed. For both pilots and data, the symbols are drawn from a QPSK constellation. The system sample period is T s = 0.1 µs and the duration of each OFDM symbol with the GI is T G = 1280 × 0.1 µs = 128 µs. Moreover, N n = 200 OFDM symbols are considered for each simulation, and the finally simulation results are obtained by averaging totally 40000 trail runs. In dynamic multipath environments, Doppler spread is chosen to be 20 Hz, 40 Hz, 80 Hz, and 160 Hz, respectively. For a better study of the performance of the proposed channel estimation method, neither interleaving methods nor any channel coding techniques are used.
In this paper, the performance of channel estimation is evaluated in terms of BER, TCSDR, and NMSE, and the   TCSDR and NMSE are defined as: where N FA and N MA respectively represent the number of false alarm taps and missed alarm taps after threshold decision,ĥ tr and h tr respectively denote the estimated CIR and real CIR for the tth transmit antenna and rth receive antenna. In the following, we will start with the parameter selection and robustness evaluation of the proposed AMA-IMOT method. Then, the performance of the proposed method is compared with those of the conventional channel estimation methods, i.e. time-domain LS method [25], several different TBS methods respectively proposed by Kang et al. [38], Oliver et al. [41], and Jellali and Atallah [43], and the CS-based OMP method [31]. Moreover, two ideal channel estimation methods, known the true CIR support and the perfect CSI respectively, are simulated as the benchmark. After that, the complexity of the proposed method is analyzed.

A. ANALYSIS OF PARAMETERS SETTING
For the proposed AMA-IMOT channel estimation method, the parameters γ min and W should be set in advance, at the same time, a prior channel sparsity degree L should be estimated to combine scheme (a) and scheme (b). In this subsection, the effects of different γ min and W on channel estimation accuracy will be demonstrated and analyzed based on simulation results. Meanwhile, the robustness of the proposed AMA-IMOT method against mismatch on the parameter L will be tested. In Fig. 6, under Channel-A with different SNR and Doppler spread, the TCSDR performance of the proposed AMA-IMOT method with different W is presented. It can be observed that the influence of W parameter on the TCSDR performance of the AMA-IMOT method is not obvious at the low Doppler spread range (from 0 Hz to 50 Hz); however, in majority of considered Doppler spread range (from 50 Hz to 250 Hz), with the increase of Doppler spread, the TCSDR performance of the AMA-IMOT method improves with the increase of W , especially in the low SNR scenario. In the case of SNR = 0 dB, the TCSDR gaps among W = 1, W = 3, W = 5, W = 7, W = 9, and W = 11 are about 0.13%, 0.09%, 0.07%, 0.05%, and 0.02% at the Doppler spread of 240 Hz, respectively. It is clearly that the TCSDR improvement obtained by increasing W is gradually decreasing. Meanwhile, considering the complexity of the multiframe statistics increases with the increase of W , to obtain a tradeoff between performance and complexity, this paper finally chooses W = 7 for the proposed AMA-IMOT method and all the following simulation.
Under Channel-B with different SNR and Doppler spread, Fig. 7 displays the NMSE performance of the proposed AMA-IMOT method with different γ min . It can be seen that the AMA-IMOT method is not sensitive to the change of γ min parameter, especially in the high SNR scenario. In a fixed SNR scenario, with the increase of Doppler spread, the NMSE of the AMA-IMOT method with different γ min parameters converge to a same value. Meanwhile, the AMA-IMOT method with larger γ min reaches the convergence value earlier. At the SNR = 5 dB, the NMSE of the AMA-IMOT method with γ min = 0.99, γ min = 0.98, γ min = 0.96, and γ min = 0.93 reaches the convergence value respectively at the Doppler spread of 150 Hz, 180 Hz, 270Hz, and 360Hz. Therefore, to reach the NMSE convergence value at a larger Doppler spread, the γ min parameter should be as smaller as possible. On the other hand, considering the search space of the optimal average frame number can be reduced by using the larger γ min parameter, this paper finally chooses γ min = 0.90 for the proposed AMA-IMOT method and all the following simulation. Fig. 8 reports the comparison between AMA-IMOT and the conventional OMP channel estimation method under the Channel-C with L = 12. Since both methods require the knowledge of the channel sparsity degree L, this paper tested the robustness to a mismatch on this parameter. Even in the case of no mismatch, i.e.L = 12, AMA-IMOT outperforms OMP at majority of considered SNR range. WhenL differs from L, especiallyL < L, OMP performance degrades rapidly. On the other hand, AMA-IMOT method has good performance even in the case of a large mismatch. Obviously, different from the OMP method, the AMA-IMOT method is robust to a mismatch on the L, which relaxes the requirement for the prior channel sparsity degree, indicating that the AMA-IMOT method is suitable for practical applications.
Therefore, this paper assumes that the L parameter is a prior knowledge for the following simulation, i.e.L = L.

B. ANALYSIS OF TCSDR AND NMSE PERFORMANCE
In this subsection, the TCSDR and NMSE performance of the proposed method is analyzed by comparing with several different TBS methods and the CS-based OMP method. The TCSDR performance of different channel estimation methods under static Channel-A is shown in Fig. 9.
As shown in Fig. 9, the TCSDR curves of the six channel estimation methods all improve with the increase of the SNR, and the proposed AMA-IMOT method has the best TCSDR performance in all the considered SNR range. Not only that, the IMOT scheme can also be used separately,  and obtained the sub-optimal TCSDR performance. At the SNR of 0 dB, compared with the OMP method and the TBS methods derived by Jellali et al., Oliver et al., and Kang et al., the IMOT method has about 0.82%, 0.98%, 3.72%, and 12.04% TCSDR improvement, respectively. Although the ''tap-to-tap'' threshold is also be used by Jellali et al., the calculation of the confidence level of the tap to be active further improve the performance of the IMOT method. Fig. 10 displays the TCSDR and NMSE performance of different channel estimation methods under dynamic Channel-A with Doppler spread 20 Hz. Compared Fig. 10(a) with Fig. 9, it can be seen that the TCSDR performance under Doppler spread of 20 Hz is almost identical with that of the static case, indicating that the channel estimation method based on CIR support recovery can be well applied to timevarying channel. As shown in Fig. 10(a), except for TBS method of Kang et al., all other channel estimation methods can recover the CIR support almost exactly at the high SNR region, which is mainly due to only the threshold derived by Kang et al. is not based on optimal criterion, and its false alarm probability is high. In Fig. 10(b), it is clear that the time-domain LS and the separate AMA scheme cannot provide satisfactory estimation performance, since both of them doesn't exploit the sparse property of channels. In other words, the conventional methods are not suitable for the sparse channel estimation. Nevertheless, the AMA scheme can be used to further significantly improve the performance of IMOT scheme, and enables the final AMA-IMOT method has the best NMSE. At the NMSE of 10 −1 , compared with the true support and the IMOT methods, the AMA-IMOT method has about 3.8 dB and 4.4 dB SNR gains respectively. Obviously, the AMA-IMOT method performs even better than the ideal channel estimation with the true CIR support, the main reason is that the AMA-IMOT method can not only remove the noise effect at the noise-only taps, but also suppress the AWGN at the MST taps.
The TCSDR and NMSE performance of different channel estimation methods under dynamic Channel-B with Doppler spread 40 Hz is shown in Fig. 11. In Fig. 11(a), compared with other conventional CIR support recovery-based methods, the proposed AMA-IMOT and IMOT methods still show the optimal and sub-optimal TCSDR performance, respectively. At the TCSDR of 0.99, the AMA-IMOT methods exhibits a SNR gain of 4.0 dB, 8.1 dB, and 9.3 dB over IMOT, TBS of Jellali et al., and OMP methods, respectively. It is clear that the TCSDR performance of IMOT scheme can be further improved after the noise suppression of AMA scheme, which is because the optimal threshold derived in IMOT scheme is also noise-related. In Fig. 11(b), the NMSE performance of all methods descends with the increase of SNR, and the proposed AMA-IMOT method still has the best performance in majority of the considered SNR range. At the NMSE of 10 −1 , the AMA-IMOT method can provide about 1.2 dB and 2.7 dB SNR gains compared with the true support and the IMOT methods, respectively. However, the SNR gains in Fig. 11(b) are obviously lower than that of Fig. 10(b), the main reason is the optimal average frame number F OPT decreases with the increase of Doppler spread, resulting in the reduced noise suppression ability of AMA scheme.
The TCSDR and NMSE performance of different channel estimation methods under dynamic Channel-C with Doppler spread 80 Hz and 160 Hz is shown in Figs. 12 and 13, respectively. By comparing Figs. 12 and 13 with Fig. 11, it can be seen that the performance under Channel-C is worse than Channel-B, the reason is that Channel-C has stronger frequency selectivity and Doppler spread than Channel-A. However, the proposed AMA-IMOT method still has the best TCSDR and NMSE performance among the six CIR support recovery-based methods. By comparing Fig. 12 and Fig. 13, it can be seen that the NMSE performance of AMA-IMOT method under Doppler spread 160 Hz is obviously inferior to that of 80 Hz. At the target NMSE of 10 −1 , the AMA-IMOT method outperforms the true support and the IMOT methods with a SNR gain of 0.8 dB and 1.8 dB under Doppler spread 80 Hz; however, under Doppler spread 160 Hz, the AMA-IMOT method slight inferior to the true support method, and outperforms the IMOT method only with a SNR gain of 0.5 dB. The main reason is that the noise suppression ability of AMA scheme degrades rapidly in the scenario of high Doppler spread. Specially, the scheme (a) in the AMA-IMOT method only considers the noise in the MSTs, as shown in (57), making the optimal average frame number F OPT in AMA-IMOT method is smaller than that of the separate AMA scheme under the same conditions. On the other hand, the TCSDR performance of AMA-IMOT method under Doppler spread 80 Hz and 160 Hz is almost identical. This is because although the noise suppression of multiframe average becomes invalid in AMA-IMOT method, i.e.  F OPT = 1, at the SNR = 13 dB and the SNR = 4 dB for the Doppler spread 80 Hz and 160 Hz respectively, the TCSDR performance can still be improved by extending the observation window through maximum average.

C. ANALYSIS OF BER PERFORMANCE
In this subsection, the BER performance of the proposed method is analyzed. The BER performance of different channel estimation methods under static Channel-A and dynamic Channel-A with Doppler spread 20 Hz is shown in Figs. 14 and 15, respectively.
As shown in Figs. 14 and 15, the BER performance of OMP method and the TBS method of Jellali et al. is almost identical, and the proposed AMA-IMOT has the best BER performance expect the ideal estimation with the perfect CSI, which is coincides with the TCSDR and NMSE performance shown in Figs. 9 and 10. In Fig. 14, at the BER of  This again confirms that the proposed method is capable of significantly suppressing noise, thereby enabling reliable communication especially in the low-to-medium SNR scenarios.
Figs. 16 and 17 display the BER performance of different channel estimation methods under dynamic Channel-B and Channel-C, respectively. By comparing Fig. 16 and Fig. 17, it can be seen that with the increase of Doppler spread or SNR, the BER gap between the AMA-IMOT method and perfect CSI increases. For example, in Fig. 16 with Doppler spread 40 Hz, at the BER of 10 −3 , compared with perfect CSI, the AMA-IMOT method has about 0.2 dB SNR degradation; in Fig. 17 with Doppler spread 80 Hz, at the BER of 10 −4 , compared with perfect CSI, the AMA-IMOT method has about 0.5 dB SNR degradation. Nevertheless, compared with other conventional channel estimation methods, the AMA-IMOT method still maintains the best BER performance. This is because although the AWGN at the MST taps cannot be suppressed by AWA in the scenario with high SNR and high Doppler spread, the superiority of IMOT scheme and the introduction of maximum average make the final AMA-IMOT method work well in time-varying channels.
Finally, to more intuitively present the superiority of the proposed AMA-IMOT method under different scenarios, the comparison of BER and NMSE performance of the AMA-IMOT method with time-domain LS method and TBS method of Jellali et al. under three channels with different Doppler spread and SNR is shown in Table 4. As shown in Table 4, with the increase of Doppler spread, the BER and NMSE performance of the three channel estimation methods all tend to increase. Compared with the time-domain LS and TBS method derived by Jellali et al., the AMA-IMOT method is more sensitive to Doppler spread. Nevertheless, the proposed AMA-IMOT always keeps the optimal performance under all different channel scenarios, especially at the low SNR scenarios, the superior of AMA-IMOT method to the conventional channel estimation methods is more obvious. Therefore, the proposed AMA-IMOT is suitable for the channel estimation of time-varying channels.

D. COMPLEXITY ANALYSIS
In this subsection, the computational complexity (for simplicity only complex multiplications are considered) of the proposed AMA-IMOT method within one OFDM symbol is analyzed. The complexity of the proposed method is mainly composed of four parts, which are the initial time-domain LS CIR estimate, the AMA scheme, the IMOT scheme, and the final FFT operation. The time-domain LS CIR estimate given in (12) can mainly be realized by a N P size IFFT operation, so its computational complexity is around O(N P log 2 N P ). For AMA scheme, the computational complexity of multi-frame averaging and the noise variance estimation in (29) both are O(M ), and the complexity of Algorithm 1 approximates to O(F max ). Note that the estimation of Doppler spread is implemented only once, not for each OFDM symbol, so its computational complexity can be neglected. Considering   into CFR, a N size FFT with computational complexity O(N log 2 N ) is needed, and thus the total computational com- The computational complexity comparison for the proposed AMA-IMOT method and the conventional channel estimation methods is shown in Table 5. As shown in Table 5, a N size FFT operation is required for all CIR-based channel estimation methods, and the computational complexity of AMA-IMOT method is lower than that of the conventional OMP method and slightly higher than that of the TBS method of Jellali et al. Overall, the complexity of AMA-IMOT method is within an acceptable range, and the merit of the proposed method is that it can obtain much better system performance than the conventional channel estimation methods under reasonable computational complexity.

V. CONCLUSION
In this paper, we study the sparsity and the inherent temporal correlation of the time-varying wireless channel, and propose an AWA and IMOT based channel estimation method in STBC MIMO-OFDM systems. First, based on the temporal correlation of the time-varying channel, the AMA scheme is incorporated to refine the initial CIR through multi-frame averaging. By utilizing the LGM model, the optimal average frame number is adaptively determined by minimizing the MSE of the denoised CIR. Then, the sparsity of the wireless channel is utilized to model the CIR as a sparse vector, and the IMOT scheme is performed to further remove the noise effect by discarding most of the noise-only CIR taps. Specifically, the IMOT scheme is achieved by recovering the CIR support across the optimal ''tap-to-tap'' threshold derived by minimizing the MSE of each CIR tap. Meanwhile, the prior confidence level of the tap to be active is calculated through multi-frame statistics to further improve the performance of the IMOT scheme. Finally, considering the AMA scheme is invalid in the scenario of high Doppler spread, the maximum average extending the observation window is introduced to assist the IMOT scheme.
The proposed AMA-IMOT method can not only remove the noise effect at the noise-only taps, but also suppress the AWGN at the MST taps. Compared with the conventional LS, TBS, and OMP methods, simulation results demonstrate that the AMA-IMOT method has the best TCSDR, BER, and NMSE performance in the time-varying channel, especially in the low-to-medium SNR scenarios. Additionally, the proposed method has a lower computational complexity compared with OMP method, and its performance does not depend too much on the prior channel sparsity degree, so the proposed method can be easily implemented in practice and has broad prospects.

A. MSE FOR FALSE ALARM
In the case of FA, the estimation error after threshold decision can be expressed as: Since n LS (m) is the complex AWGN, the noise amplitude |n LS (m)| follows the Rayleigh distribution R(σ 2 n 2), and its PDF is given by: Consequently, the estimation error ε m TBS |FA has a truncated Rayleigh distribution, and it can be represented as [42]: where K 1 is a normalization factor, and it can be given by: Taking (65) Since the power of the propagation channel path is usually much greater than noise power in practice [42], it is reasonable to transform the constraint |h(m) + n LS (m)| < T m into |h(m)| < T m . Consequently, (69) can be re-written as: ε m TBS |MA = |h(m)| |(|h(m)| < T m ) .
In typical Rayleigh fading channel, the amplitude |h(m)| of the CIR coefficient follows the Rayleigh distribution R(σ 2 m 2), and its PDF is given by: Therefore, similar to the case of FA, the estimation error ε m TBS |MA also follows a truncated Rayleigh distribution, and it can be represented as: where K 2 is given by: Taking (71)-(73) into account, the derivation of the MSE for the mth tap in the case of MA is given by: (74)

C. THE CALCULATION OF THE PROBABILITY ONE
In the case of m ∈ C, the (W + 1)/2 × 1 vectors Ψ 1 , Ψ 2 , and the W × 1 vector Ψ are used to represent the conditional distribution law of N m Re , N m Im , and N m max , respectively, which can be expressed as: where ρ ∈ [1, 2, · · · , (W + 1)/2], q ∈ [1, 2, · · · , W ]. Ψ 1 (ρ), Ψ 2 (ρ), and Ψ (q) denote the ρth and qth element of the vectors Ψ 1 , Ψ 2 , and Ψ respectively. Considering N m max = N m Re + N m Im , Ψ is depend on Ψ 1 and Ψ 2 , and it is actually the linear convolution of them: where * denotes the operation of linear convolution. According to (13), for all W statistical frames, the mth LS CIR tap valueĥ m LS, n = h m n + n m LS, n can be regarded as a complex Gaussian random variable with mean h m n and variance σ 2 n , i.e.ĥ m LS, n ∼ CN (h m n , σ 2 n ). Therefore, the PDF f LS (x) of Re(ĥ m LS, n ) can be expressed as: Consequently, the probability P Re of the value of Re(ĥ m LS, n ) located in the interval of (0, +∞) can be calculated as: where erfc[·] is the complementary error function. Since h m n is unknown in practice,ĥ m LS, n is used to approximate it in this paper, and the approximated P Re is shown as: Based on (79) and the binomial theorem, Ψ 1 (ρ) can be calculated as: Analogy to Ψ 1 (ρ), Ψ 2 (ρ) can be calculated as: whereP Im = 0.5 · erfc[−Im(ĥ m LS, n ) σ n ], and Im(·) denotes the operation of extracting the real part of an element. Thus, taking (80) and (81) into (76), the conditional distribution law Ψ of N m max can be completely obtained. Naturally, P i 1 = P(N m max = i|m ∈ C) = Ψ (i − W ) can also be calculated.