Low Complexity Channel Prediction Using TFOS-ELM Method for Massive MIMO Systems

Multiple-input multiple-output (MIMO) technology can potentially help to achieve high data rates for multiuser communication. To achieve better performance, the channel state information (CSI) is estimated by the pilot. However, the estimated CSI cannot be used in downlinks when the mobile speed is very high, since it becomes outdated due to the rapid channel variation. In a massive MIMO system, the issue of outdated CSI is serious when using traditional techniques. Therefore, in order to obtain accurate CSI, the prediction of future CSI is required. In this paper, a low complexity online extreme learning machine (ELM) is proposed for the online prediction of the fast fading channel. First, we introduce the structure of the online sequential extreme learning machine (OS-ELM) and combine the training process of the OS-ELM with a forgetting mechanism (FM) to predict fast changing channels. Second, we use the truncated polynomial expansion (TPE) to reduce the computational complexity of the OS-ELM with the FM (FOS-ELM). In addition, the performance of the proposed algorithm is verified through simulation results, and we apply channel prediction in the precoding process. It is shown that the communication quality is improved by our channel prediction algorithm.


I. INTRODUCTION
Multiple-input multiple-output (MIMO) wireless communication improves the transmission rate by increasing the number of transmitter and receiver antennas [1], [2]. However, these gains rely on perfect knowledge of the channel state information (CSI) at the transmitter and receiver. In a fast time-varying channel, the CSI can become outdated because of the computation delay at the base station (BS), which may cause significant degradation of system performance. Considerable work has been conducted on this issue, and channel prediction has received significant attention in wireless communication research.
Channel prediction is beneficial for communication performance, and relevant topics include adaptive coding, modulation, decoding processes, and channel equalization [3]-[9]. In wireless communication links, channel prediction helps the communication system to configure physical-layer parameters and schemes [3]. An improper choice of modulation scheme causes a higher bit error rate (BER) or lower data throughput. In addition, channel prediction is favorable for radio resource allocation and interference management. Therefore, channel prediction is essential for a wide range of wireless communication technologies [9].
The associate editor coordinating the review of this manuscript and approving it for publication was Usama Mir.
In the past few years, several studies have been conducted on channel prediction. The existing work falls into three main categories: 1) the auto-regressive (AR) model [3], [10], [11], 2) the sum-of-sinusoids (SOS) model [12]-[15], [17], and 3) the neural network model [18]-[20]. The AR model uses the outdated CSI to predict the future CSI. Its greatest advantage is that it does not need to know the physical parameters of the channel; it only uses the outdated CSI to calculate the AR coefficients. Duel-Hallen [3], [10] employed an AR model to track the channel changes and compute the MMSE estimates of the future CSI based on the outdated CSI. Since the AR model is a linear prediction model [16], it performs well with low computational complexity in slow changing channels, but its performance is limited in fast changing channels. The SOS model begins with an approximation of the physical propagation process: the wireless channel is modeled as a sum of complex sinusoids. Unlike the AR model, the SOS model does not use the outdated CSI directly; instead, the parameters of the SOS model are estimated from the outdated CSI. J. K. Hwang proposed using MUSIC to estimate the physical parameters [17]. M. Chen investigated ESPRIT to predict future CSI [15]. If the parameters of the wireless channel do not vary with time, the SOS model has excellent predictive performance. However, the channel parameters change in practical communications, which causes performance degradation as time passes. A feasible remedy is to re-estimate the parameters of the SOS model frequently, but this increases the computational complexity.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
Recently, neural networks have become a major class of channel prediction methods. Navabi et al. [18] assessed the viability of using a backpropagation (BP) neural network technique for estimating the user-channel feature. Potter et al. [19] proposed a recurrent neural network (RNN) method for channel prediction. Jhihoon Joo offered a deep learning-based model for predicting future channel information [20]. Neural networks provide excellent prediction performance for fast changing channels. However, the traditional BP neural network, the RNN and deep learning require a large amount of computation to learn the parameters of the fading channel. Therefore, those neural networks are not well suited to online channel prediction.
With the coming of fifth generation mobile communication systems (5G), massive MIMO is utilized to improve the spectral efficiency and data throughput of communication systems by deploying large-scale antenna arrays at the BS [1], [21], [22]. However, the traditional channel prediction methods involve a large number of matrix operations with high computational complexity, which places a heavy burden on the user because of the limited computational resources.
In this paper, the online sequential extreme learning machine with the forgetting mechanism and truncated polynomial expansion (TFOS-ELM) is proposed. We first apply a novel prediction algorithm developed from [23], [24], [26], which takes both the neural network structure and the computational speed into account. Our algorithm is based on the online sequential extreme learning machine (OS-ELM) with the forgetting mechanism (FOS-ELM). The TFOS-ELM algorithm performs online channel prediction with a low computational burden. In addition, the TFOS-ELM algorithm predicts fast changing channels well. We give a more detailed analysis in the simulation results. Moreover, our algorithm is applied in the precoding stage, and the results show that channel prediction can improve the precoding performance in a massive MIMO system.
The main contribution of our work is that it is the first to utilize the OS-ELM for online channel prediction in a massive MIMO system. Online channel prediction is more accurate than off-line channel prediction in a real communication environment. The FOS-ELM avoids the influence of obsolete data, but it incurs high complexity. Therefore, we exploit the TPE method to avoid the complicated matrix inversion. The resulting TFOS-ELM model realizes online channel prediction with lower computational complexity, which makes it easy to apply to fast changing channels.
The rest of this paper is organized as follows. Section II briefly describes the massive MIMO system, the prediction procedure and the channel datasets. In Section III, the OS-ELM and FOS-ELM are briefly introduced. In Section IV, we propose a channel prediction framework for the TFOS-ELM model. In Section V, the performance analysis and simulation results are provided. Finally, the paper is concluded in Section VI.

II. SYSTEM MODEL

A. CHANNEL MODEL
The multipath delay profile can be expressed as a set of channel impulse responses (CIRs) between the BS antennas and the user antennas in the massive MIMO system [1], [21], [22]. The multipath fading channel can be modeled as [3]:
$$g_{m_t n_r}(t) = \sum_{p=1}^{P} A_p e^{j(2\pi f_p t + \phi_p)} \qquad (1)$$
where $P$ is the number of reflection paths or obstacles between the $(m_t, n_r)$-antenna pair; $N_t$ and $N_r$ are the numbers of transmitter and receiver antennas, respectively; and $1 \le m_t \le N_t$, $1 \le n_r \le N_r$. For the $p$-th path, $A_p$, $f_p$, and $\phi_p$ are the amplitude, Doppler frequency, and initial phase, respectively. The Doppler frequency shift is given as follows:
$$f_p = \frac{f_c v}{c} \cos\theta_p$$
where $f_c$ is the carrier frequency, $v$ is the speed of the vehicle, $c$ is the speed of light, and $\theta_p$ is the angle between the radio wave and the direction of the vehicle. A fast changing channel has a large $v$. The parameters $A_p$, $f_p$, and $\phi_p$ vary very slowly with time, and all of them are treated as approximately constant within one coherence time. Thus, the signal $g_{m_t n_r}(t)$ in Eq.(1) is converted into a superposition of complex sinusoids as
$$g_{m_t n_r}(t) = \sum_{p=1}^{P} A_p \cos(2\pi f_p t + \phi_p) + j \sum_{p=1}^{P} A_p \sin(2\pi f_p t + \phi_p)$$
Therefore $g_{m_t n_r}(t) = g^{I}_{m_t n_r}(t) + j g^{Q}_{m_t n_r}(t)$, where
$$g^{I}_{m_t n_r}(t) = \sum_{p=1}^{P} A_p \cos(2\pi f_p t + \phi_p), \qquad g^{Q}_{m_t n_r}(t) = \sum_{p=1}^{P} A_p \sin(2\pi f_p t + \phi_p)$$
The discrete-time channel is formed by sampling the wireless channel at a fixed interval as
$$g_{m_t n_r}(k) = g^{I}_{m_t n_r}(k) + j g^{Q}_{m_t n_r}(k)$$
where $g_{m_t n_r}(k)$ is the discrete-time complex channel value of the $(m_t, n_r)$-antenna pair at time $k$, with real part $g^{I}_{m_t n_r}(k)$ and imaginary part $g^{Q}_{m_t n_r}(k)$.
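To make the model concrete, the sum-of-sinusoids channel described above can be simulated in a few lines. The following is a minimal sketch, not the authors' simulator: the function names, the 2 GHz carrier, the 15 m/s speed (chosen so that the maximum Doppler shift is 100 Hz), the number of paths and the equal path amplitudes are all illustrative assumptions.

```python
import numpy as np

def doppler_shift(f_c, v, theta, c=3.0e8):
    """Doppler frequency of a path: f_p = f_c * v * cos(theta_p) / c."""
    return f_c * v * np.cos(theta) / c

def sos_channel(amps, dopplers, phases, t):
    """Evaluate g(t) = sum_p A_p * exp(j(2*pi*f_p*t + phi_p)) at the times t."""
    t = np.asarray(t, dtype=float)[:, None]                 # shape (K, 1)
    arg = 2.0 * np.pi * np.asarray(dopplers) * t + np.asarray(phases)
    return (np.asarray(amps) * np.exp(1j * arg)).sum(axis=1)

rng = np.random.default_rng(0)
P = 8                                         # number of paths (assumed)
theta = rng.uniform(0.0, 2.0 * np.pi, P)      # arrival angles (assumed uniform)
f_p = doppler_shift(f_c=2.0e9, v=15.0, theta=theta)   # |f_p| <= 100 Hz
A_p = np.full(P, 1.0 / np.sqrt(P))            # equal amplitudes (assumed)
phi_p = rng.uniform(0.0, 2.0 * np.pi, P)

k = np.arange(200)
g = sos_channel(A_p, f_p, phi_p, k * 0.5e-3)  # one sample every 0.5 ms
g_I, g_Q = g.real, g.imag                     # in-phase and quadrature parts
```

The real and imaginary series `g_I` and `g_Q` are the two real-valued sequences that a predictor would be trained on separately.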
In an orthogonal frequency division multiplexing (OFDM) system, the pilot structure has three different types: the block type, the comb type, and the lattice type. In this paper, block-type pilots are arranged to estimate the CSI. $S_t$ is the period of the pilot symbols with respect to time. The pilot symbol period satisfies the following inequality:
Then, the channel estimation matrix can be depicted as follows:

B. DESCRIPTION OF THE SLFN STRUCTURE
The traditional ELM is a simple learning scheme for single-hidden-layer feed-forward neural networks (SLFNs) [23], and its network structure is shown in Figure 1. However, the ELM requires that all channel datasets of the SLFN are ready before learning. The SLFN can be described as follows [23]:
$$f_{\tilde{N}}(x) = \sum_{i=1}^{\tilde{N}} \beta_i F(a_i, b_i, x)$$
where $a_i \in (0, 1)$ is the weight vector between the input layer and the $i$-th hidden node; $\tilde{N}$ is the number of SLFN hidden nodes; $b_i \in (-1, 1)$ is the bias of the $i$-th hidden node; $\beta_i$ is the weight vector that connects the $i$-th hidden node to the output layer; and $F(a_i, b_i, x)$ is the output of the $i$-th hidden node with respect to the input $x$.
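The SLFN described above can be sketched as a forward pass with randomly assigned hidden nodes. This is an illustrative sketch only: the sigmoid activation and all function names are assumptions, since the surviving text does not state which activation $F$ is used; the ranges for $a_i$ and $b_i$ follow the text.

```python
import numpy as np

def init_hidden_nodes(n_inputs, n_hidden, rng):
    """Randomly assign hidden-node parameters, as in an ELM."""
    a = rng.uniform(0.0, 1.0, size=(n_inputs, n_hidden))   # a_i in (0, 1)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)              # b_i in (-1, 1)
    return a, b

def hidden_output(x, a, b):
    """Hidden-layer outputs F(a_i, b_i, x); a sigmoid is assumed here."""
    z = np.atleast_2d(x) @ a + b
    return 1.0 / (1.0 + np.exp(-z))

def slfn_output(x, a, b, beta):
    """Network output: sum over hidden nodes of beta_i * F(a_i, b_i, x)."""
    return hidden_output(x, a, b) @ beta

rng = np.random.default_rng(0)
v, n_hidden = 5, 10                      # input nodes and hidden nodes
a, b = init_hidden_nodes(v, n_hidden, rng)
beta = rng.standard_normal(n_hidden)     # placeholder output weights
x = rng.standard_normal((3, v))          # three example inputs
y = slfn_output(x, a, b, beta)
```

Only `beta` is learned in an ELM; `a` and `b` stay fixed after the random assignment, which is what makes the training a linear least-squares problem.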

C. STRUCTURE OF THE CHANNEL DATASETS
Before building the channel prediction model based on the SLFN, we need to construct channel datasets, which give us a simple way to predict the channel with the OS-ELM, FOS-ELM and TFOS-ELM models. A sliding window is applied to construct the channel datasets and to train the model. In online channel prediction, the channel datasets improve the prediction accuracy and match the structure of the prediction model. As shown in Figure 2.(a), the length of the sliding window is $(k - i + 1)$, and $M$ is the step of the sliding window. The blue dotted box is the sliding window at time $T_K$, and the gray dotted box is the sliding window at time $T_{K-1}$. This means that the parameters of the prediction model are updated whenever $M$ new CSI samples are received.
As shown in Figure 2.(b), the channel datasets are constructed at time $T_K$. Here $(v + 1)$ is the size of a channel dataset $(x, t)$, and $v$ is the number of SLFN input nodes. The input $x_i$ and the output $t_i$ of a channel dataset are represented as follows.
The channel datasets not only suit the structure of the SLFN but also exploit the information of every CSI sample. In order to predict every future CSI sample, the step of the sliding window inside a channel dataset is 1. In Figure 2.(a), the sliding window contains
In a real channel prediction scenario, the step of the sliding window $M$ is equal to the number of CSI samples in one coherence time. In other words, we need to update the parameters of the SLFN once every coherence time.
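The dataset construction described in this subsection can be sketched as follows. This is a hedged illustration: it assumes that each input $x_i$ holds $v$ consecutive CSI samples and that the target $t_i$ is the sample immediately after them, which matches the stated dataset size of $(v + 1)$ and a window step of 1, but the exact indexing of the original equations is not recoverable here. The function name `make_datasets` is hypothetical.

```python
import numpy as np

def make_datasets(csi, v):
    """Slide a window of length v + 1 with step 1 over a real CSI series.

    Returns X with shape (n, v) (inputs x_i) and t with shape (n,)
    (targets t_i), where n = len(csi) - v.
    """
    csi = np.asarray(csi, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(csi, v + 1)
    return windows[:, :v].copy(), windows[:, v].copy()

# Toy series 0, 1, ..., 9 so the windows are easy to read.
series = np.arange(10.0)
X, t = make_datasets(series, v=5)
```

For a complex channel, the same routine would be applied to the real part and to the imaginary part separately.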

III. STRUCTURE OF THE ONLINE PREDICTION MODEL BASED ON ELM
In practical applications, the ELM algorithm needs to update its own parameters for every online sequence in the prediction, which causes large computational complexity. The OS-ELM algorithm [24] avoids the repeated training of past samples, so that the extreme learning machine is realized with online learning. Because the AR model is a linear model, its predictive performance is unsatisfactory in fast changing channels. The OS-ELM is implemented with a single-hidden-layer feed-forward neural network (SLFN), which has a fast learning speed and can also achieve good generalization performance compared to traditional algorithms.

A. THE DESCRIPTION OF THE OS-ELM
Assume the step of the sliding window is a fixed size $M$. Therefore each chunk of the channel datasets is ( First, suppose that the initial channel datasets $\{(x_l, t_l) \mid l = 0, 1, 2, \ldots, kM\}$ have arrived. We fix the number of hidden nodes $\tilde{N}$ and randomly assign the node parameters $(a_i, b_i)$, $i = 1, \ldots, \tilde{N}$, with $M \ge \tilde{N}$. Let $H_l$ denote the $l$-th hidden-layer output matrix [24]: where The output weight $\beta(k)$ can be expressed as follows [24]: where Second, when $M$ new channel datasets arrive, $H_{k+1}$ is calculated by Eq. (11), and $\beta(k + 1)$ can be expressed as follows [24]: where $m = 1, 2, \ldots, M$. Although the OS-ELM algorithm avoids the repeated training of historical data, channel datasets are often time-sensitive in real applications. In network training research [25], [26], a forgetting mechanism is used to reduce the influence of outdated information.
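The two phases described above (an initial batch solve, then a chunk-by-chunk update) can be sketched with the standard recursive least-squares form of the OS-ELM from the literature. This is an assumption-laden sketch rather than the paper's exact equations: the notation may differ from Eq. (11) and Eq. (12), the sigmoid activation is assumed, the small `ridge` term is added here only for numerical stability, and the class name `OSELM` is hypothetical.

```python
import numpy as np

class OSELM:
    """Minimal OS-ELM sketch for a single real-valued output."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.uniform(0.0, 1.0, (n_inputs, n_hidden))  # fixed input weights
        self.b = rng.uniform(-1.0, 1.0, n_hidden)              # fixed biases
        self.P = None      # inverse of the accumulated Gram matrix
        self.beta = None   # output weights

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.a + self.b)))

    def fit_initial(self, X0, t0, ridge=1e-3):
        """Initial phase: batch least squares on the first datasets."""
        H = self._hidden(X0)
        self.P = np.linalg.inv(H.T @ H + ridge * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ t0

    def update(self, X, t):
        """Sequential phase: fold in one new chunk without retraining."""
        H = self._hidden(X)
        S = np.eye(H.shape[0]) + H @ self.P @ H.T      # M x M matrix
        self.P = self.P - self.P @ H.T @ np.linalg.solve(S, H @ self.P)
        self.beta = self.beta + self.P @ H.T @ (t - H @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

A typical use would call `fit_initial` once on the first chunk and then `update` each time $M$ new channel datasets arrive. Note that the update inverts an $M \times M$ matrix per chunk, which is the cost the later sections aim to reduce.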

B. THE DESCRIPTION OF THE FOS-ELM
When the forgetting mechanism is introduced into the OS-ELM, the earliest channel datasets are deleted whenever $M$ new channel datasets arrive. The channel datasets are composed by Eq.(9) and Eq. (10). Therefore, the formula of $\beta(k+1)$ is given as [26] However, the complexity of the FOS-ELM limits its channel prediction performance in fast changing channels and makes it difficult to apply the FOS-ELM algorithm to online prediction. Therefore, we propose a novel prediction model, which reduces both the computations of the training process and the influence of invalid data. It makes online channel prediction easy to conduct in fast changing channels.
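The idea of forgetting the earliest channel datasets can be illustrated with a plain sliding-window least-squares learner. This is deliberately not the recursive FOS-ELM formula of [26]: it is a simplified stand-in that keeps the Gram matrices of the most recent chunks, adds the newest one, subtracts the oldest one and re-solves. The class name, the `max_chunks` parameter, the sigmoid activation and the `ridge` term are all assumptions made for this sketch.

```python
from collections import deque
import numpy as np

class SlidingWindowELM:
    """ELM whose output weights depend only on the latest `max_chunks` chunks."""

    def __init__(self, n_inputs, n_hidden, max_chunks, ridge=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.uniform(0.0, 1.0, (n_inputs, n_hidden))
        self.b = rng.uniform(-1.0, 1.0, n_hidden)
        self.max_chunks = max_chunks
        self.chunks = deque()                  # stored (H, t) pairs
        self.K = ridge * np.eye(n_hidden)      # regularized sum of H^T H
        self.c = np.zeros(n_hidden)            # sum of H^T t
        self.beta = np.zeros(n_hidden)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.a + self.b)))

    def update(self, X, t):
        H = self._hidden(X)
        t = np.asarray(t, dtype=float)
        self.chunks.append((H, t))
        self.K += H.T @ H
        self.c += H.T @ t
        if len(self.chunks) > self.max_chunks:
            # Forgetting step: remove the contribution of the oldest chunk.
            H_old, t_old = self.chunks.popleft()
            self.K -= H_old.T @ H_old
            self.c -= H_old.T @ t_old
        # Full solve on every update; this is the costly part.
        self.beta = np.linalg.solve(self.K, self.c)

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

The full linear solve on every update is what makes a forgetting learner expensive, which motivates replacing the inversion with a cheaper approximation.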

IV. THE TFOS-ELM CHANNEL PREDICTION ALGORITHM

A. THE DESCRIPTION OF TPE ALGORITHM
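Lemma 1: For any positive definite Hermitian matrix $X$ [27],
$$X^{-1} = \kappa \sum_{l=0}^{\infty} (I - \kappa X)^{l} \qquad (20)$$
where the normalization factor $\kappa$ satisfies
$$0 < \kappa < \frac{2}{\max \lambda(X)} \qquad (21)$$
Therefore, the inverse of the Hermitian matrix is converted into a matrix polynomial expansion. The expansion converges when every eigenvalue of $I - \kappa X$ has magnitude smaller than unity, because the eigenvalues of the matrix power $(I - \kappa X)^{l}$ then converge to 0 as $l$ approaches infinity. The expansion diverges when any eigenvalue has magnitude larger than unity, because the corresponding eigenvalue of the matrix power grows without bound as $l$ approaches infinity. For this reason, the eigenvalues are scaled by the normalization factor $\kappa$. If $\kappa$ is chosen to satisfy Eq.(21), $X^{-1}$ can be approximated by the TPE algorithm, which keeps only the first few terms of the expansion.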
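The truncated expansion can be sketched numerically as follows, assuming the lemma takes the standard Neumann-series form $X^{-1} = \kappa \sum_{l} (I - \kappa X)^{l}$ with $0 < \kappa < 2 / \max \lambda(X)$. The function name `tpe_inverse` is hypothetical, and the default choice $\kappa = 2 / (\lambda_{\min} + \lambda_{\max})$ is an illustrative assumption: it needs an eigenvalue computation, so a practical low-complexity implementation would use a cheaper bound instead.

```python
import numpy as np

def tpe_inverse(X, J, kappa=None):
    """Approximate X^{-1} by kappa * sum_{l=0}^{J-1} (I - kappa * X)^l.

    X must be a positive definite Hermitian matrix and J >= 1.
    """
    n = X.shape[0]
    if kappa is None:
        eig = np.linalg.eigvalsh(X)          # ascending eigenvalues
        kappa = 2.0 / (eig[0] + eig[-1])     # satisfies 0 < kappa < 2 / max eig
    identity = np.eye(n)
    R = identity - kappa * X                 # spectral radius below one
    term = identity.copy()                   # current power R^l
    acc = identity.copy()                    # running sum of powers
    for _ in range(J - 1):
        term = term @ R
        acc = acc + term
    return kappa * acc

X = np.array([[2.0, 0.5],
              [0.5, 1.0]])
exact = np.linalg.inv(X)
err4 = np.linalg.norm(tpe_inverse(X, 4) - exact)
err8 = np.linalg.norm(tpe_inverse(X, 8) - exact)
```

Each extra term costs one matrix product and shrinks the error geometrically, so the truncation order trades accuracy against computation.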
As in the FOS-ELM, the channel datasets are assembled first. The $l$-th hidden-layer output matrix $H_l$ is calculated using Eq.(11), and the $\beta(k)$ of the TFOS-ELM is calculated using Eq. (12).
Because $I + P_k$ is a Hermitian matrix in Eq. (17), the inversion of $I + P_k$ can be converted into a special form. The TPE algorithm carries out the matrix inversion through a polynomial expansion and then truncates the expansion to its first few terms, which reduces the computational complexity. According to Lemma 1, the inverse operation of matrix $P_k$ is found using Eq. (20). It is denoted by Therefore, the $\beta(k + 1)$ of the TFOS-ELM algorithm is calculated using Eq.(24) The channel prediction value can be expressed as Eq. (24).

C. THE PROCESS OF THE TFOS-ELM ALGORITHM
In Figure 4, the general scheme of sub-channel prediction with the TFOS-ELM algorithm is depicted, and the steps of MIMO channel prediction are summarized in Algorithm 1 and Figure 3.
Since the CSI is a complex value, the channel prediction is divided into two parts: the real part and the imaginary part. The TFOS-ELM calculates the value of $\hat{g}_{m_t n_r}(k + 1)$ from the predicted real part and the predicted imaginary part. In this paper, all $N_t N_r$ channels are predicted simultaneously. Then, the channel matrix $\hat{g}(k + 1)$ is composed of every $\hat{g}_{m_t n_r}(k + 1)$.
Next, the specific process of our algorithm is shown in Algorithm 2:

D. COMPLEXITY ANALYSIS OF THE TFOS-ELM
The channel prediction complexities of the FOS-ELM and the TFOS-ELM are compared. The complexity of a prediction algorithm counts the complex additions and multiplications in every update. Assume that all new chunks of CSI data have the same size ($M$) and that $\tilde{N}$ is fixed. The matrix inversion method of the OS-ELM and FOS-ELM is Cholesky factorization [28]. The cost of matrix inversion is $(1/3)n^3 + 2n^2$ (where $n$ is the size of the matrix). The complexities of the OS-ELM, FOS-ELM and TFOS-ELM algorithms are shown as follows.
The remaining steps are standard matrix operations. The computational complexity of $\beta(k + 1)$ for the OS-ELM and FOS-ELM algorithms is given in TABLE 1. The complexity of the OS-ELM and FOS-ELM consists of two parts: one is the complexity of the matrix inversion, and the other is the remaining computation of $\beta(k + 1)$. The specific figures are shown in TABLE 1. Because the TFOS-ELM algorithm avoids the matrix inversion, its computational complexity of $\beta(k + 1)$ is expressed purely in terms of matrix multiplications.
From Table 1, the step of the sliding window $M$ has a great influence on the complexity. Considering the real channel prediction scenario and the structure of the channel dataset, the step of the sliding window $M$ is equal to the number of CSI samples in one coherence time. To obtain more accurate future CSI, we need to estimate more CSI samples in one coherence time. The step of the sliding window $M$ therefore increases with the number of CSI samples in one coherence time.
From Eq.(26∼27), the OS-ELM has lower complexity than the others when $M + 12 < 24J$. As time goes by, however, the performance of the OS-ELM model can degrade because the earliest channel datasets become obsolete. Therefore, the OS-ELM is not suitable for online channel prediction. Comparing the FOS-ELM with the TFOS-ELM, $M$ is larger than $\tilde{N}$ in the channel prediction model. Since $M \ge \tilde{N}$ and $M > J$, the computational complexity of the TFOS-ELM is less than that of the FOS-ELM. In addition, the complexity of $\beta(k + 1)$ given above covers only one sub-channel in the massive MIMO system; thus, as the number of BS antennas increases, the advantage of the TFOS-ELM algorithm over the FOS-ELM algorithm grows with respect to the overall complexity of all channels. Overall, when the step of sliding windows

V. SIMULATION
The massive MIMO-OFDM simulation parameters are listed in Table 2. The numbers of receiving terminals, BS antennas and antennas per receiving terminal are 16, 100 and 1, respectively. All of our simulations are run in the MATLAB 2014a environment on a personal computer with a 3.3 GHz quad-core processor and 16 GB of RAM.
In our simulations, the pilot symbol period ($S_t$) is set to 0.5 ms in the block-type pilot arrangement when the Doppler frequency is 100 Hz. The size of a channel dataset is 5, and the step of the sliding window $M$ is 30. Whenever the predictors collect $M$ channel datasets, the OS-ELM, FOS-ELM and TFOS-ELM are updated once. The AR model differs from the OS-ELM, FOS-ELM and TFOS-ELM; it updates its parameters using the MMSE criterion whenever 30 CSI samples arrive.
In order to evaluate the performance of the four prediction models, the root mean square error (RMSE), the correlation coefficient (R) and R-squared ($R^2$) are used to compare the actual CSI with the predicted CSI. Because the prediction is performed online, the testing accuracy plays a key role among the performance criteria. In this paper, we select the model that reaches the best validation [29] performance in terms of RMSE, R and $R^2$; the results are given in Table 3.
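The three criteria can be computed as in the following sketch. It uses the common textbook definitions of RMSE, the Pearson correlation coefficient and the coefficient of determination for real-valued series; the paper's own formulas are not recoverable here, so treating them as identical to these definitions is an assumption, and the function names are hypothetical.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between two real-valued series."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def correlation(actual, predicted):
    """Pearson correlation coefficient R."""
    return float(np.corrcoef(actual, predicted)[0, 1])

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)
    return float(1.0 - ss_res / ss_tot)

actual = np.array([1.0, 2.0, 3.0, 4.0])
predicted = np.array([1.1, 1.9, 3.2, 3.8])
scores = (rmse(actual, predicted),
          correlation(actual, predicted),
          r_squared(actual, predicted))
```

A better predictor has a lower RMSE and values of R and $R^2$ closer to 1.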
In Table 3, the testing results obtained with the AR, OS-ELM, FOS-ELM and TFOS-ELM models are presented. In this process, 2000 CSI samples are utilized. Compared with the AR model, the OS-ELM, FOS-ELM and TFOS-ELM achieve a lower RMSE and a higher R. In addition, the TFOS-ELM approaches the performance of the FOS-ELM in terms of RMSE and R, but it has lower computational complexity according to Eq. (27). Figures 6-9 illustrate the performance of the four prediction algorithms. Figure 5 depicts the amplitude of the (1,1)-antenna pair channel predicted with the OS-ELM, FOS-ELM and TFOS-ELM algorithms in a scenario in which the maximum Doppler shift is 100 Hz and the delay is 0.5 ms. We can see that the channel coefficients are correlated within the coherence time. Figure 6 shows a situation where the maximum Doppler shift is set to 100 Hz with a 0.5 ms delay. As the SNR increases, the OS-ELM, the FOS-ELM, the TFOS-ELM with J = (4, 6, 8) and the AR algorithms reach different MSEs. The results show that the three ELM-based algorithms perform better than the AR algorithm, since the OS-ELM, the FOS-ELM and the TFOS-ELM have better non-linear prediction capability. In addition, at a low SNR, the OS-ELM and the FOS-ELM perform worse than the TFOS-ELM algorithm, since the channel prediction model becomes inaccurate when the noise is large. As J increases, the influence of the noise on the TFOS-ELM algorithm also increases. The TFOS-ELM prediction algorithm is not sensitive to the effects of noise when J is small.
In Figure 7, the average rate of massive MIMO precoding is investigated. The OS-ELM, the FOS-ELM, the TFOS-ELM, and the AR algorithm are applied before the zero-forcing (ZF) precoding in the massive MIMO system. The channel delay is set to 0.5 ms, the Doppler frequency is 100 Hz, and J = (4, 6, 8). We observe that the average rate of the simulation with channel prediction is better than that of the simulation without prediction when the SNR is more than −7 dB. This is because channel prediction can compensate for the effects of the delay. In addition, when the SNR is less than −7 dB, the accuracy of the channel prediction may be influenced by noise. The AR algorithm has lower average rates than the other prediction algorithms in our simulation. In addition, the average rates of the OS-ELM, the FOS-ELM, and the TFOS-ELM are similar, but the TFOS-ELM runs faster than the other algorithms when the number of BS antennas is large.
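The role of accurate CSI in ZF precoding can be illustrated with a small sketch. It uses the textbook ZF precoder $W = H^{H}(HH^{H})^{-1}$ with a Frobenius-norm power normalization; the normalization, the i.i.d. Gaussian channel, the error level of 0.1 and the function names are assumptions for illustration and are not taken from the paper's simulation setup. Only the dimensions (16 single-antenna terminals, 100 BS antennas) follow the text.

```python
import numpy as np

def zf_precoder(H):
    """ZF precoder for H of shape (users, bs_antennas), unit total power."""
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T)
    return W / np.linalg.norm(W)          # Frobenius normalization (assumed)

def interference_power(H_true, W):
    """Total inter-user interference: off-diagonal energy of H_true @ W."""
    G = H_true @ W
    return float(np.sum(np.abs(G - np.diag(np.diag(G))) ** 2))

rng = np.random.default_rng(0)
users, antennas = 16, 100
H = (rng.standard_normal((users, antennas))
     + 1j * rng.standard_normal((users, antennas))) / np.sqrt(2)

# Precoder built from the true channel: interference is cancelled.
leak_exact = interference_power(H, zf_precoder(H))

# Precoder built from an inaccurate (for example outdated) channel estimate.
E = (rng.standard_normal((users, antennas))
     + 1j * rng.standard_normal((users, antennas))) / np.sqrt(2)
leak_mismatch = interference_power(H, zf_precoder(H + 0.1 * E))
```

The gap between `leak_exact` and `leak_mismatch` is the residual interference caused by CSI error, which is the effect that channel prediction is meant to reduce.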
In Figure 8, the simulation results demonstrate that the BER performance of a system using channel prediction is better than that of a system with no prediction. The BER of the AR algorithm is higher than those of the other prediction algorithms. The TFOS-ELM (J = 4, 6, 8) has the best performance at low SNR. This is because the ZF precoding enhances the influence of noise, whereas the TPE algorithm decreases the impact of the noise. The simulation results in Figure 8 show that the FOS-ELM algorithm performs better than the TFOS-ELM algorithm when the SNR is more than −7 dB, but it incurs high computational complexity (the computational complexity is shown in Table 1). A suitable value of J allows the TFOS-ELM to attain high communication performance. Comparing the FOS-ELM and the TFOS-ELM prediction algorithms, the TFOS-ELM algorithm (J ≥ 6) has similar performance to the FOS-ELM algorithm.
The computational complexities of the three prediction algorithms for different steps of the sliding window $M$ (the number of CSI samples in one coherence time) are presented in Figure 9. As the step of the sliding window $M$ increases, the computational complexity of the FOS-ELM exceeds those of the TFOS-ELM and the OS-ELM in the massive MIMO system. When $M \ge \tilde{N}$ and $\tilde{N} = 10$, Figure 9 shows that the TFOS-ELM algorithm with J = (4, 6, 8) has lower computational complexity than the FOS-ELM algorithm. In Figures 6-9, J = (4, 6, 8), v = 5, and the number of nodes $\tilde{N}$ is 10. The OS-ELM and the TFOS-ELM have similar computational complexity, but the prediction accuracy of the TFOS-ELM is better than that of the OS-ELM. The computational complexity of the FOS-ELM is larger than those of the OS-ELM and the TFOS-ELM. Therefore, as the step of the sliding window $M$ and the number of BS antennas increase in the massive MIMO system, our proposed prediction algorithm has a substantial advantage with respect to computational complexity.

VI. CONCLUSION
In this paper, we proposed the TFOS-ELM algorithm for online channel prediction. We first introduced the FOS-ELM to predict the channel and then utilized the TPE to reduce the complexity of the FOS-ELM algorithm. The TFOS-ELM algorithm was simulated using a fast changing channel. The simulation results show that the TFOS-ELM algorithm has superior performance compared to the OS-ELM, FOS-ELM and AR algorithms for channel prediction in fast changing channels. Furthermore, we applied the channel prediction to precoding. The results show that our algorithm improves the communication quality. Our future work includes improving the performance of modulation, power control and beamforming based on channel prediction.

YIGANG HE received the M.Sc. degree in electrical engineering from Hunan University, Changsha, China, in 1992, and the Ph.D. degree in electrical engineering from Xi'an Jiaotong University, Xi'an, China, in 1996. His teaching and research interests are in the areas of power electronic circuit theory and its applications, testing and fault diagnosis of analog and mixed-signal circuits, electrical signal detection, smart grid, satellite communication monitoring, and intelligent signal processing. He has published some 300 journal and conference papers in the aforementioned areas, which have been included more than 1000 times in the Science Citation Index of the American Institute for Scientific Information, as well as several chapters in edited books. He has been on the Technical Program Committee of a number of international conferences. He was the recipient of a number of national and international awards, prizes, and honors, including the National Outstanding Youth Science Fund and China National Excellent Science and Technology Worker.