ELM-based Superimposed CSI Feedback for FDD Massive MIMO System

In frequency-division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems, deep learning (DL)-based superimposed channel state information (CSI) feedback has shown promising performance. However, it still faces several challenges, such as complex parameter tuning, a large number of training parameters, and long training time. To overcome these challenges, an extreme learning machine (ELM)-based superimposed CSI feedback is proposed in this paper, in which the downlink CSI is spread and then superimposed on the uplink user data sequence (UL-US) to be fed back to the base station (BS). At the BS, an ELM-based network is constructed to recover both the downlink CSI and the UL-US. In the constructed ELM-based network, simplified ELM-based subnets replace the subnets of the DL-based superimposed feedback, yielding fewer training parameters. Besides, the input weights and hidden biases of each ELM-based subnet are loaded from the same matrix by using its full or partial entries, which significantly reduces the memory requirement. With similar or better recovery performance for the downlink CSI and the UL-US, the proposed ELM-based method requires fewer training parameters, less storage space, and shorter offline training and online running time than the DL-based superimposed CSI feedback.


I. INTRODUCTION
Massive multiple-input multiple-output (MIMO) brings the fifth generation (5G) wireless communication system many advantages in system capacity and link robustness. However, these advantages rely on accurate downlink channel state information (CSI) being available at the base station (BS) [1]. In time-division duplexing (TDD) mode, downlink CSI can be estimated from uplink CSI by using channel reciprocity [2]. In frequency-division duplexing (FDD) mode, reciprocity-based downlink CSI is not available because the uplink and downlink occupy different frequency bands [1], [3]. Thus, the downlink CSI in an FDD massive MIMO system should be estimated by the users and fed back to the BS [3].
Although the codebook-based CSI feedback method effectively reduces the feedback overhead at the user side, the huge number of BS antennas in a massive MIMO system results in a codebook dimension that is too large to be applied in practice [3]. To alleviate this issue, compressive sensing (CS)-based CSI feedback methods have been proposed in [3]-[6], in which the temporal correlation [3], sparse enhancement basis [4], and spatial correlation [4]-[6] of CSI are exploited. However, the downlink CSI is only approximately sparse under specific channel models; sparsity is not a general property, so these methods may run into practical problems when the assumption does not hold [7]-[17].
Recently, deep learning (DL) methods have been successfully applied to the physical layer of wireless communication, e.g., CSI feedback [7]-[17], modulation recognition [18]-[20], and information security [21], [22]. For CSI feedback, the DL-based methods outperform many existing CS schemes in feedback reduction, yet they still occupy significant uplink bandwidth resources. To avoid occupying uplink bandwidth resources, superimposed CSI feedback was proposed in [23] and further extended to a DL-based approach in [24]. According to [24], the DL-based superimposed CSI feedback recovers the downlink CSI better than the superimposed CSI feedback in [23], with similar UL-US recovery performance. Even so, the DL-based scheme is still hindered by several disadvantages, such as complex parameter tuning, a large number of training parameters, and long training time. This motivates us to develop an ELM-based superimposed CSI feedback to improve the DL-based approach in [24].
VOLUME 4, 2016; arXiv:2002.07508v2 [eess.SP] 12 Mar 2020
For feedback reduction, the DL-based CSI feedback methods developed in [7]-[17] can be classified into two categories. The first category is mainly based on a neural network called CsiNet [7], which achieved superior performance over various CS-based CSI feedback methods. Yet the time correlation, frequency correlation, spatial correlation, feedback delay, and feedback errors were not considered in CsiNet, which limits its applications. To remedy these defects, a series of improved methods have emerged in [8]-[12]. In [8], a CsiNet long short-term memory (CsiNet-LSTM) network was proposed to exploit the time correlation, making it suitable for practical application in time-varying channels. The recurrent neural network-based CsiNet in [9] was developed to capture the temporal and frequency correlations of wireless channels. Considering the spatial correlation among antennas, the bidirectional LSTM (Bi-LSTM) and bidirectional convolutional LSTM (Bi-ConvLSTM) were proposed in [10]. In addition, feature extraction of CSI at multiple resolutions and in different domains (e.g., spatial and temporal domains) was presented in [11] and [12], respectively. Furthermore, noisy feedback in [13] and feedback errors and feedback delay in [14] were considered to enrich the CSI feedback of CsiNet. The methods in [8]-[12] substantially enhanced the performance of CsiNet in [7]. The second category of DL-based feedback reduction takes the quantization operation into account, e.g., [15]-[17]. In [15], a bit-level optimized NN, i.e., the joint convolutional residual network (JC-ResNet), was constructed with both CSI compression and quantization. By employing a multiple-rate CS neural network framework, [16] compressed and quantized the CSI to improve reconstruction accuracy and decrease storage space. The architecture in [17], which was composed of convolutional layers followed by quantization and entropy coding blocks, presented promising performance. Although the DL-based CSI feedback in [7]-[17] achieves a significant improvement in feedback reduction compared with the CS-based approaches, the uplink bandwidth resources are still heavily occupied in the massive MIMO scenario.
To avoid any occupation of uplink bandwidth resources, superimposed CSI feedback was proposed in [23] and [24]. In [23], the downlink CSI was spread and then superimposed on the uplink user data sequence (UL-US) to be fed back to the BS. Although the occupation of uplink bandwidth was avoided, the recovery of both the downlink CSI and the UL-US was degraded by superposition interference. Interference cancellation has been a research hotspot in the field of wireless communications [25], [26]. To remedy the defect of [23], a DL-based superimposed CSI feedback was proposed in [24], which consistently improved the estimation of downlink CSI with similar or better UL-US detection performance. Still, the DL-based superimposed CSI feedback is challenged by long training time, complex parameter tuning, and a large memory requirement. Unlike the DL-based CSI feedback methods mentioned above, the extreme learning machine (ELM) is a single-hidden-layer feed-forward neural network that requires no gradient back-propagation [27]. In addition, its input weights and hidden layer biases are randomly generated, and the output weights are calculated analytically by solving a least-squares problem [28]. In particular, the ELM network can directly process complex-valued inputs and can employ complex-valued weights and biases as well [28]. Therefore, the ELM possesses many advantages, e.g., fast learning speed (hundreds of times faster than the back-propagation algorithm) and good generalization performance [27]-[29]. Inspired by these advantages, an ELM-based superimposed CSI feedback method is proposed in this paper to improve the DL-based superimposed CSI feedback in [24].

B. CONTRIBUTIONS
In this paper, an ELM-based superimposed CSI feedback method is proposed to improve the DL-based approach in [24]. To the best of our knowledge, few works in the literature focus on ELM-based superimposed CSI feedback. The main contributions of this paper are as follows:
• An ELM-based network is constructed to recover the downlink CSI and the UL-US, in which four ELM-based subnets replace and simplify the DL-based subnets in [24], leading to simpler parameter tuning, fewer training parameters, and shorter training time.
• In the proposed ELM-based network, the storage space required for the input weight matrices and bias vectors of the four subnets is greatly reduced compared with that of the standard ELM network [30]. This is achieved by loading the input weight matrices and hidden bias vectors of each subnet from the same matrix (which provides its full or partial entries).
• Compared with the DL-based superimposed CSI feedback method in [24], the proposed method obtains comparable recovery performance for the downlink CSI and the UL-US with less overhead, e.g., fewer training parameters, less storage space, and shorter training time.
The remainder of this paper is structured as follows. In Section II, we introduce the system model of superimposed CSI feedback. The ELM-based CSI feedback method is presented in Section III, followed by the numerical results in Section IV. Finally, Section V concludes our work.
Notations: Boldface upper-case and lower-case letters denote matrices and vectors, respectively. $(\cdot)^T$, $(\cdot)^H$, and $(\cdot)^\dagger$ denote the transpose, conjugate transpose, and matrix pseudoinverse, respectively. $\mathbf{I}_P$ is the identity matrix of size $P \times P$; $\mathrm{BN}(\cdot)$ denotes the operation of batch normalization; $\|\cdot\|_2$ is the Euclidean norm.

II. SYSTEM MODEL
Consider a massive MIMO system consisting of a BS with $N$ antennas and $U$ single-antenna users. The transmitted signal of user-$u$, $u = 1, 2, \ldots, U$, denoted as $\mathbf{x}_u$, is given by (1), where $\rho \in [0, 1]$ stands for the power proportional coefficient (PPC) of the downlink CSI; $E_u$ represents the transmitted power of user-$u$; $\mathbf{h}_u \in \mathbb{C}^{N \times 1}$ denotes the downlink CSI from the BS to user-$u$; $\mathbf{P}_u \in \mathbb{R}^{M \times N}$ is a spreading matrix satisfying $\mathbf{P}_u^T \mathbf{P}_u = M \mathbf{I}_N$; $\mathbf{d}_u \in \mathbb{C}^{M \times 1}$ denotes the UL-US; and $M$ is the UL-US length. In general, we assume $M > N$, because carrying user data is the main task of the uplink.
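As an illustration of the superposition described above, the following NumPy sketch builds a spreading matrix with the stated property $\mathbf{P}_u^T \mathbf{P}_u = M \mathbf{I}_N$ and forms a superimposed signal. It is only a sketch under stated assumptions: the exact power scaling of (1) is not reproduced in this text, so the factors $\sqrt{\rho E_u / N}$ and $\sqrt{(1-\rho) E_u}$ used here are assumed for illustration, and the function names are hypothetical. The spreading matrix is taken as the first $N$ columns of a Sylvester-type Hadamard matrix in natural order, which is one way to obtain Walsh-type orthogonal codes.

```python
import numpy as np

def walsh_spreading_matrix(M, N):
    """Return an M x N real matrix P with P.T @ P = M * I_N (M a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < M:
        # Sylvester construction: doubles the order at every step
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    if H.shape[0] != M:
        raise ValueError("M must be a power of two")
    return H[:, :N]

def superimpose(h, d, P, rho, E):
    """Spread the CSI h with P and add it to the data d.

    ASSUMPTION: the two scaling factors below are illustrative only;
    they are not taken from equation (1) of the paper.
    """
    N = h.shape[0]
    return np.sqrt(rho * E / N) * (P @ h) + np.sqrt((1.0 - rho) * E) * d

rng = np.random.default_rng(0)
M, N, rho, E = 64, 16, 0.2, 1.0
P = walsh_spreading_matrix(M, N)
# Downlink CSI with CN(0, 1/N) entries and QPSK data symbols of unit power
h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2 * N)
d = (rng.choice([-1.0, 1.0], M) + 1j * rng.choice([-1.0, 1.0], M)) / np.sqrt(2)
x = superimpose(h, d, P, rho, E)
```

Because the CSI shares the $M$ transmitted symbols with the data, no extra uplink symbols are spent on feedback; the price is the mutual interference that the BS has to remove.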
In (1), the downlink CSI is spread by $\mathbf{P}_u$ and then superimposed on the UL-US for transmission to the BS. At the BS, after matched-filter (MF) processing (i.e., the conventional multiuser detector structure consists of an MF bank at the front end [31]), the received signal from user-$u$, denoted as $\mathbf{r}_u \in \mathbb{C}^{N \times M}$, is given by (2) [24], where $\mathbf{n}_u \in \mathbb{C}^{N \times M}$ is the additive white Gaussian noise (AWGN) of the feedback link, whose entries have zero mean and variance $\sigma_u^2$, and $\mathbf{g}_u \in \mathbb{C}^{N \times 1}$ denotes the uplink channel vector from user-$u$ to the BS.
With the received $\mathbf{r}_u$ at the BS, the main task of the superimposed CSI feedback in [23] and the DL-based superimposed CSI feedback in [24] is to recover the downlink CSI and detect the UL-US. Compared with the superimposed CSI feedback in [23], the DL-based superimposed CSI feedback in [24] improves the estimation of downlink CSI with similar or better UL-US detection performance. Even so, the DL-based superimposed CSI feedback in [24] remains challenging due to its long training time and large number of training parameters. To improve the DL-based superimposed CSI feedback in [24], an ELM-based superimposed CSI feedback method is proposed in this paper and elaborated in the next section.

III. ELM-BASED SUPERIMPOSED CSI FEEDBACK
Similar to [24], a coarse estimation is also employed by the ELM-based superimposed CSI feedback, by which the interference of the uplink channel is eliminated and the network structure is simplified. From the received signal $\mathbf{r}_u$, the coarse estimate of $\mathbf{x}_u$ is given by (3). Then, the estimated $\mathbf{x}_u$ is delivered to an ELM-based network to recover the downlink CSI and the UL-US.
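The coarse estimation can be sketched as a least-squares step that removes the uplink channel from the received signal. The sketch below assumes a received-signal model of the form $\mathbf{r}_u = \mathbf{g}_u \mathbf{x}_u^T + \mathbf{n}_u$, which matches the dimensions given in the system model but is an assumption, since (2) and (3) are not reproduced in this text; it also assumes that $\mathbf{g}_u$ is known at the BS. The function name `coarse_estimate` is hypothetical.

```python
import numpy as np

def coarse_estimate(R, g):
    """Least-squares estimate of x from R = g x^T + noise.

    ASSUMPTION: this received-signal model is inferred from the stated
    dimensions (R is N x M, g is N x 1, x is M x 1), not quoted from (2).
    """
    # Pseudoinverse of a column vector g is g^H / ||g||^2
    g_pinv = g.conj() / np.vdot(g, g).real
    return g_pinv @ R  # shape (M,)

rng = np.random.default_rng(0)
N, M = 16, 64
g = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2 * N)
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)
R = np.outer(g, x)                # noise-free received signal
x_hat = coarse_estimate(R, g)     # equals x when there is no noise
```

In the noisy case the same operation combines the $N$ antenna observations coherently, so the output is a single length-$M$ sequence that the subnets can process.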

A. NETWORK ARCHITECTURE
The proposed ELM-based network consists of four subnets (i.e., CSI-ELM1, DET-ELM1, CSI-ELM2, and DET-ELM2), in which the downlink CSI recovery and UL-US detection are addressed by solving a multi-task problem. This network structure is illustrated in Fig. 1 and described as follows:
• CSI-ELM1 and DET-ELM1 have the same network structure as CSI-ELM2 and DET-ELM2, respectively. CSI-ELM1, DET-ELM1, CSI-ELM2, and DET-ELM2 are successively cascaded to form a multi-task network. Between two cascaded subnets, some expert knowledge is inserted to reduce interference.
• For CSI-ELMi, i = 1, 2, the numbers of neurons in the input layer, hidden layer, and output layer are $N$, $8N$, and $N$, respectively. Similarly, the numbers of neurons in the input layer, hidden layer, and output layer are $M$, $8M$, and $M$ for DET-ELMi, respectively. It should be noted that the proposed ELM-based subnets are complex-valued subnets. To match the real-valued subnets in [24], the number of hidden neurons is set to half of that in [24].
• The input of each subnet is normalized by a batch normalization (BN) operation. Different from the standard ELM network [30], the hidden output of each subnet employs a linear activation function.
• Only the output weights (i.e., $\mathbf{\Phi}_1$, $\mathbf{\Phi}_2$, $\mathbf{\Phi}_3$, and $\mathbf{\Phi}_4$) need to be trained. The input weights (i.e., $\mathbf{W}_1$, $\mathbf{W}_2$, $\mathbf{W}_3$, and $\mathbf{W}_4$) and hidden biases (i.e., $\mathbf{b}_1$, $\mathbf{b}_2$, $\mathbf{b}_3$, and $\mathbf{b}_4$) are randomly chosen from the same matrix by using its full or partial entries. Once the input weights and hidden biases are chosen, they are fixed. Compared with the DL-based superimposed CSI feedback network in [24], the storage space of parameters, the number of training parameters, and the training time are significantly reduced.
As a whole, the proposed ELM-based network has an architecture similar to that of the DL-based CSI feedback network in [24], but with many advantages, e.g., fewer training parameters, a smaller storage requirement, and shorter training time. These advantages will be presented in the offline training and online running procedures. Naturally, neural networks can be further accelerated and simplified by combining network compression (e.g., [32]) with ELM. In this paper, we mainly focus on ELM-based superimposed CSI feedback and leave this combination to future work.

B. OFFLINE TRAINING
A training approach similar to that in [24] (i.e., subnet-by-subnet training) is adopted in this paper, but with several differences. On the one hand, only the output weights of each subnet need to be learned for the ELM-based network, without any gradient updating [27], [30]. On the other hand, the input weights and hidden biases are randomly chosen and then fixed, rather than learned by the network. In addition, the way of obtaining the input weights and hidden biases also differs from the standard ELM network [30]. For the offline training, we first describe the data collection as follows.

1) DATA COLLECTION
To train the ELM-based network, the training sets are acquired by a simulation approach. In addition, the input weights and hidden biases also need to be obtained. We first describe the collection of training sets.
The generation of input weights and hidden biases is different from the standard ELM network [30]. We randomly generate $\mathbf{W} \in \mathbb{R}^{8M \times M}$ and save it in storage space. Each entry of $\mathbf{W}$ follows the standard normal distribution with zero mean and unit variance. Then all input weights and hidden biases (i.e., $\mathbf{W}_1$, $\mathbf{W}_2$, $\mathbf{b}_1$, and $\mathbf{b}_2$) are loaded from $\mathbf{W}$ by using its full or partial entries. Since only $8M^2$ coefficients (from $\mathbf{W}$) need to be saved rather than all input weights and hidden biases of the standard ELM network, the storage space can be significantly reduced.
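The idea of loading every input weight matrix and bias vector from one stored matrix can be sketched with array views. The text does not state which entries of $\mathbf{W}$ each subnet uses, so the slicing rule below (top-left block for the CSI subnet weights, first column for the biases) is a purely hypothetical choice made for illustration, as are the variable names.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 64, 16

# The only random matrix that is stored: 8*M*M real coefficients
W = rng.standard_normal((8 * M, M))

# ASSUMED (hypothetical) slicing rule; the paper only says "full or partial entries"
W_det = W                      # DET-ELM input weights: full matrix, 8M x M
b_det = W[:, 0]                # DET-ELM hidden biases: 8M entries
W_csi = W[:8 * N, :N]          # CSI-ELM input weights: partial block, 8N x N
b_csi = W[:8 * N, 0]           # CSI-ELM hidden biases: 8N entries

# Storage if every weight matrix and bias vector were kept separately
separate = (8 * M * M + 8 * M) + (8 * N * N + 8 * N)
shared = W.size
```

Basic slicing in NumPy returns views, so `W_csi`, `b_csi`, and `b_det` occupy no memory beyond `W` itself, which mirrors the storage argument made in the text.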

2) NETWORK TRAINING
With the collected data, CSI-ELM1, DET-ELM1, CSI-ELM2, and DET-ELM2 are trained in turn to obtain $\mathbf{\Phi}_1$, $\mathbf{\Phi}_2$, $\mathbf{\Phi}_3$, and $\mathbf{\Phi}_4$, respectively. Prior to training a subnet, the expert knowledge is employed to reduce the superimposed interference. After a subnet is trained, its input weights, hidden biases, and output weights are fixed for training the following subnet. With reference to the architecture in Fig. 1, the training procedure is presented as follows.
Despreading: Before training each subnet, a despreading operation on the training set $\mathbf{x}_u^{(k)}$, $k = 1, 2, 3, 4$, is employed to reduce the superimposed interference from the UL-US, which is given by (6), where $\mathbf{x}_u^{(1)}$, $\mathbf{x}_u^{(2)}$, $\mathbf{x}_u^{(3)}$, and $\mathbf{x}_u^{(4)}$ are used to generate the subnet inputs for training CSI-ELM1, DET-ELM1, CSI-ELM2, and DET-ELM2, respectively.
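The despreading step can be sketched by correlating the coarse estimate with the spreading codes, which relies only on the stated property $\mathbf{P}_u^T \mathbf{P}_u = M \mathbf{I}_N$. Since (6) is not reproduced in this text, the normalization by $M$ and the unit signal scaling used below are assumptions, and the helper names are hypothetical.

```python
import numpy as np

def hadamard(M):
    """Sylvester-type Hadamard matrix of order M (M must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < M:
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    return H

def despread(x, P):
    """Correlate x with the spreading codes; uses P.T @ P = M * I (assumed form of (6))."""
    return P.T @ x / P.shape[0]

rng = np.random.default_rng(1)
M, N = 64, 16
P = hadamard(M)[:, :N]
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)
d = (rng.choice([-1.0, 1.0], M) + 1j * rng.choice([-1.0, 1.0], M)) / np.sqrt(2)

h_clean = despread(P @ h, P)        # without data, the CSI is recovered exactly
h_noisy = despread(P @ h + d, P)    # with data, a residual term P.T @ d / M remains
```

The residual term is the UL-US interference that CSI-ELM1 has to cope with; it is averaged over $M$ chips, which is why a larger spreading gain $M/N$ helps.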
Training CSI-ELMi: With the training input $\mathbf{h}_u^{(i)}$, $i = 1, 2$, the hidden output of CSI-ELMi, denoted as $\mathbf{H}_u^{(2i-1)}$, is given by (7). In (7), $\mathbf{h}_u^{(1)}$ is obtained according to (6), while $\mathbf{h}_u^{(2)}$ is formed based on the output of DET-ELM1. According to the hidden output matrix $\mathbf{H}_u^{(2i-1)}$ and the label $\mathbf{T}_h^{(i)}$, the output weight matrix of CSI-ELMi is given by (8). Once the output weight matrix $\mathbf{\Phi}_{2i-1}$ is obtained, it is fixed for training the following subnet.
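The training of one subnet can be sketched as a hidden-layer projection followed by a pseudoinverse. The sketch below follows the description in the text (fixed random real input weights and biases, linear activation, output weights from a least-squares solution) but omits batch normalization, stacks one training sample per row, and uses hypothetical function names; the exact forms of (7) and (8) are not reproduced here, so the layout is an assumption.

```python
import numpy as np

def hidden_output(X, W, b):
    """Linear-activation hidden layer, one row per sample.

    X: (K, n) complex inputs, W: (L, n) real input weights, b: (L,) real biases.
    """
    return X @ W.T + b

def train_output_weights(H, T):
    """Minimum-norm least-squares solution of H @ Phi = T via the pseudoinverse.

    With a linear activation H has rank at most n + 1, so small singular
    values are truncated explicitly (rcond) to keep the solution stable.
    """
    return np.linalg.pinv(H, rcond=1e-10) @ T

rng = np.random.default_rng(2)
K, n, L = 200, 4, 32                       # samples, input size, hidden size (8n)
X = rng.standard_normal((K, n)) + 1j * rng.standard_normal((K, n))
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
T = X @ A                                  # toy labels: a linear function of the input
W = rng.standard_normal((L, n))            # fixed, never trained
b = rng.standard_normal(L)                 # fixed, never trained

H = hidden_output(X, W, b)
Phi = train_output_weights(H, T)           # the only trained parameters
```

No iteration or learning rate appears anywhere: the output weights come from a single matrix computation, which is the source of the short training time claimed for the ELM-based network.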
Reduction of downlink CSI interference: In order to train DET-ELMi, $i = 1, 2$, we use the expert knowledge to reduce the superimposed interference from the downlink CSI. According to the output of CSI-ELMi, this interference reduction is given by (9). From (9), the superimposed interference from the downlink CSI is partly removed, yielding an improved input $\mathbf{d}_u^{(i)}$ for DET-ELMi. With the input $\mathbf{d}_u^{(i)}$, DET-ELMi is then trained to learn its output weight $\mathbf{\Phi}_{2i}$.
Training DET-ELMi: According to the input $\mathbf{d}_u^{(i)}$, the hidden output of DET-ELMi is computed. Based on this hidden output and the corresponding label, the output weight matrix $\mathbf{\Phi}_{2i}$ is obtained once the training of DET-ELMi is finished. Then, the learned $\mathbf{\Phi}_2$ (i.e., $i = 1$) is used to train the next subnet (i.e., CSI-ELM2), and the learned $\mathbf{\Phi}_4$ (i.e., $i = 2$) is saved for online running.
UL-US interference reduction: With the trained $\mathbf{\Phi}_2$, DET-ELM1 can produce its output. Before entering CSI-ELM2, the superimposed interference from the UL-US should be reduced by using this output. As a result, the superimposed interference from the UL-US is partly removed, which improves the input of CSI-ELM2.
From the processing mentioned above, the output weights, i.e., $\mathbf{\Phi}_1$, $\mathbf{\Phi}_2$, $\mathbf{\Phi}_3$, and $\mathbf{\Phi}_4$, are learned from the network training. This training procedure is summarized in TABLE 1.
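The two expert-knowledge steps between cascaded subnets can be sketched as subtracting the re-spread CSI estimate before detection, and subtracting the detected data before despreading again. The corresponding equations are not reproduced in this text, so the unit gains, the normalization by $M$, and the function names below are assumptions made for illustration.

```python
import numpy as np

def hadamard(M):
    """Sylvester-type Hadamard matrix of order M (M must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < M:
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    return H

def cancel_csi(x, P, h_hat, csi_gain=1.0):
    """Remove the spread CSI estimate from x to obtain a cleaner data input."""
    return x - csi_gain * (P @ h_hat)

def cancel_data(x, P, d_hat, data_gain=1.0):
    """Remove the detected data from x, then despread to obtain a cleaner CSI input."""
    return P.T @ (x - data_gain * d_hat) / P.shape[0]

rng = np.random.default_rng(3)
M, N = 64, 16
P = hadamard(M)[:, :N]
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)
d = (rng.choice([-1.0, 1.0], M) + 1j * rng.choice([-1.0, 1.0], M)) / np.sqrt(2)
x = P @ h + d                      # ASSUMED unit scaling of both components

d_in = cancel_csi(x, P, h)         # with a perfect CSI estimate, only the data remains
h_in = cancel_data(x, P, d)        # with perfect detection, only the CSI remains
```

In practice the estimates are imperfect, so each cancellation removes the interference only partly, and the second pair of subnets works on the improved inputs.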
According to the network training and corresponding processing, all required network parameters (i.e., the input weights, hidden biases, and output weights) are obtained. Relative to the training of the DL-based network in [24], the training of the proposed ELM-based network presents many advantages. On the one hand, gradient back-propagation is usually employed in the training of a DL-based network to update network parameters and minimize the loss function [30]. This training usually needs a large training set and a long training time. On the other hand, the training of a DL-based network is usually accompanied by vanishing gradients, over-fitting, and complex parameter tuning [27], [33]. In contrast, the proposed ELM-based network is a feed-forward network (without any gradient back-propagation), whose parameter training can be performed by matrix operations [30]. Since only the output weight matrices (i.e., $\mathbf{\Phi}_1$, $\mathbf{\Phi}_2$, $\mathbf{\Phi}_3$, and $\mathbf{\Phi}_4$), rather than all network parameters, need to be trained, the proposed ELM-based network has a shorter training time, fewer training parameters, and an easier training operation than the DL-based network in [24]. In addition, $\mathbf{W}_1$, $\mathbf{W}_2$, $\mathbf{b}_1$, and $\mathbf{b}_2$ are all loaded from the same matrix $\mathbf{W}$ (i.e., the actual memory for all these parameters equals the size of $\mathbf{W}$), leading to less parameter memory than that of [24].

C. ONLINE RUNNING
With the trained network parameters, the online running procedure is presented in TABLE 2, and some explanations are given as follows.
Network input: With the received signal $\mathbf{r}_u$ in (2), the coarse estimation is employed according to (3) to obtain the network input $\mathbf{x}_u \in \mathbb{C}^{M \times 1}$ and simplify the network architecture.
Estimation of downlink CSI: CSI-ELM1 and CSI-ELM2 are employed to estimate the downlink CSI, as given in step 3) and step 7) of TABLE 2, respectively. The estimation of CSI-ELMi, $i = 1, 2$, is given by (13), where the inputs $\mathbf{h}_u^{(1)}$ and $\mathbf{h}_u^{(2)}$ are the outputs of the despreading and of the interference reduction (from the UL-US), respectively. In TABLE 2, these inputs are obtained according to step 2) and step 6), respectively. In particular, the output of CSI-ELM2, $\mathbf{h}_u^{(2)}$, is viewed as the downlink CSI estimated by the proposed ELM-based network in this paper.
Reduction of downlink CSI interference: Prior to UL-US detection, the interference from the downlink CSI is reduced according to step 4) and step 8) in TABLE 2 for DET-ELM1 and DET-ELM2, respectively. The interference reduction for DET-ELMi, $i = 1, 2$, is given by (14). Then, $\mathbf{d}_u^{(1)}$ and $\mathbf{d}_u^{(2)}$ serve as the inputs for the UL-US detection of DET-ELM1 and DET-ELM2, respectively.
TABLE 2 (surviving steps only): 8) Reduce the downlink CSI interference according to (14), i.e., use the expert knowledge to obtain the input $\mathbf{d}_u^{(2)}$. 9) Utilize DET-ELM2 to detect the UL-US with input $\mathbf{d}_u^{(2)}$, i.e., detect to obtain the output $\mathbf{d}_u^{(2)}$, which is expressed in (15) with $i = 2$. Output: $\mathbf{h}_u = \mathbf{h}_u^{(2)}$ and $\mathbf{d}_u = \mathbf{d}_u^{(2)}$.
Detection of UL-US: Based on the obtained input $\mathbf{d}_u^{(i)}$, $i = 1, 2$, the UL-US can be detected by using DET-ELMi, which is given by (15). In (15), the outputs $\mathbf{d}_u^{(1)}$ and $\mathbf{d}_u^{(2)}$ are detected according to step 5) and step 9) in TABLE 2, respectively. In the proposed ELM-based network, the output $\mathbf{d}_u^{(2)}$ is viewed as the detected UL-US.
UL-US interference reduction: With the output $\mathbf{d}_u^{(1)}$ of DET-ELM1, we employ an interference reduction, i.e., step 6) in TABLE 2, to generate the input of CSI-ELM2, which is given by (16). Then, $\mathbf{h}_u^{(2)}$ is used as the input of CSI-ELM2 to estimate the downlink CSI, as given in step 7) in TABLE 2.
From step 1) to step 9) in TABLE 2, the downlink CSI and the UL-US are recovered by the online running, i.e., $\mathbf{h}_u = \mathbf{h}_u^{(2)}$ and $\mathbf{d}_u = \mathbf{d}_u^{(2)}$ are obtained from the proposed ELM-based network with the received signal $\mathbf{r}_u$.

IV. EXPERIMENTAL ANALYSIS
In this section, we give some numerical results for the proposed ELM-based CSI feedback. Some definitions and basic parameters involved in the simulations are first given in IV-A. Then, we show the normalized mean squared error (NMSE) of the downlink CSI and the bit error rate (BER) of the UL-US in IV-B to verify the effectiveness of the proposed ELM-based CSI feedback. Last but not least, IV-C shows that the proposed method requires fewer training parameters, less storage space, and shorter training and online running time than the DL-based superimposed feedback in [24]. In the following, we describe the experimental setting.

A. PARAMETER SETTING
Some definitions involved in the simulations are given as follows. The signal-to-noise ratio (SNR) in decibels (dB) of the signal received at the BS from user-$u$ and the NMSE, which is used to evaluate the recovery of the downlink CSI, are defined as in [23], [24]. In the experiment phase, $M = 512$ and $N_t = 10^4$. The Walsh matrix is employed as the spreading matrix $\mathbf{P}_u$. Both the downlink and uplink CSI, i.e., $\mathbf{h}_u$ and $\mathbf{g}_u$, are randomly generated according to the distribution $\mathcal{CN}(0, 1/N)$. The UL-US $\mathbf{d}_u$ is formed from quadrature phase-shift keying (QPSK) symbols. We randomly generate the matrix $\mathbf{W}$, i.e., each entry of $\mathbf{W}$ follows the independent and identically distributed (i.i.d.) Gaussian distribution with zero mean and unit variance. The training input data sets are generated from (1) to (3), where $\rho = 0.20$ is considered. Unlike the DL-based network in [24], where the training SNR is set to 5 dB, the ELM-based network is trained in the noise-free case. According to [24], the numbers of hidden neurons of CSI-NETi and DET-NETi, which are employed in the DL-based superimposed feedback, are $16N$ and $16M$, respectively. The testing data is generated in the same way as the training data, and we stop the BER test once at least 1000 bit errors are observed [24]. For convenience of description in IV-B and IV-C, we denote the traditional superimposed feedback in [23], the DL-based superimposed feedback in [24], and the proposed ELM-based superimposed CSI feedback as "Ref [23]", "Ref [24]", and "Proposed", respectively.
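For readers who wish to reproduce the two metrics, a minimal sketch is given below. The defining equations are not reproduced in this text, so the sketch uses the common forms: NMSE as the squared estimation error normalized by the squared norm of the true CSI, and BER from hard QPSK decisions with one bit on each of the real and imaginary axes. The averaging convention, the bit mapping, and the function names are assumptions.

```python
import numpy as np

def nmse(h_true, h_est):
    """Squared error normalized by the energy of the true CSI (assumed standard form)."""
    err = np.sum(np.abs(h_true - h_est) ** 2)
    return err / np.sum(np.abs(h_true) ** 2)

def qpsk_bits(symbols):
    """Hard decisions: one bit from the sign of the real part, one from the imaginary part."""
    return np.concatenate([symbols.real > 0, symbols.imag > 0])

def ber(d_true, d_est):
    """Fraction of bits that differ after hard decisions (ASSUMED bit mapping)."""
    return np.mean(qpsk_bits(d_true) != qpsk_bits(d_est))

h = np.array([1 + 1j, 2 - 1j])
d = np.array([1 + 1j, -1 - 1j]) / np.sqrt(2)
d_bad = np.array([1 - 1j, -1 - 1j]) / np.sqrt(2)   # one of the four bits flipped

nmse_perfect = nmse(h, h)                 # perfect estimate
nmse_zero = nmse(h, np.zeros_like(h))     # all-zero estimate
ber_one_flip = ber(d, d_bad)
```

An NMSE reported in dB corresponds to `10 * np.log10(nmse(...))`.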

B. BER AND NMSE PERFORMANCE
In this subsection, the BER and NMSE are plotted to verify the effectiveness of the proposed ELM-based CSI feedback. We first show the performance for different values of $N$ in Fig. 2. Then, the impact of the PPC on BER and NMSE is given in Fig. 3.
To verify that the proposed method achieves performance similar to that of "Ref [24]", the BER and NMSE curves are presented in Fig. 2, where $M = 512$, $\rho = 0.20$, and different values of $N$ (i.e., $N = 16$, $N = 32$, and $N = 64$) are considered. From Fig. 2, "Ref [23]", "Ref [24]", and "Proposed" show similar BER performance for the same PPC $\rho = 0.20$. For the NMSE performance, both "Ref [24]" and "Proposed" outperform "Ref [23]" owing to the introduction of the neural network (NN). It should be mentioned that superimposed CSI feedback is a multi-task problem, i.e., both the downlink CSI and the UL-US need to be recovered at the BS, and an NN is particularly suitable for solving such complex problems. Besides, in terms of NMSE, "Proposed" is similar to or slightly better than "Ref [24]". In detail, when $N = 64$, "Proposed" and "Ref [24]" have similar NMSE, whereas the NMSE of "Proposed" is better than that of "Ref [24]" for $N = 16$ and $N = 32$. That is, the NMSE advantage of the ELM-based network appears more readily for a relatively small $N$, e.g., $N \le 32$. One possible reason for the slightly better NMSE of "Proposed" is that the complex parameter tuning of the DL-based CSI feedback in "Ref [24]" makes it difficult to learn optimal network parameters. Another possible reason is that the testing SNR differs from the training SNR for "Ref [24]", which degrades its NMSE performance. This also indicates that the proposed ELM-based network generalizes well against a varying SNR. Although "Proposed" does not significantly improve on the BER and NMSE of "Ref [24]", it offers clear advantages during offline training and online running (detailed in IV-C).
In Fig. 3, we validate the robustness of the BER and NMSE of the proposed method against the impact of the PPC, where different values of $N$ (i.e., $N = 16$, $N = 32$, and $N = 64$) and different values of the PPC (i.e., $\rho = 0.05$, $\rho = 0.10$, and $\rho = 0.15$) are considered. From Fig. 3, the BER of "Proposed" is slightly better than those of "Ref [23]" and "Ref [24]" for each PPC in the high SNR regime (e.g., SNR ≥ 12 dB) when $N = 16$ and $N = 32$. "Ref [24]" faces complex parameter tuning and a generalization problem, which makes its BER performance slightly inferior to that of the proposed method. For NMSE, in the low SNR regime (e.g., SNR ≤ 2 dB), "Ref [23]" and "Ref [24]" are slightly better than "Proposed" for a relatively small PPC (e.g., $\rho \le 0.10$). The possible reasons are that the insufficient spread spectrum gain (defined as $M/N$) and the least-squares solution (ELM transforms the training problem into a least-squares problem for the output weight matrix) make it difficult to resist the noise in the low SNR regime. Therefore, the proposed ELM-based network does not work well at a relatively low SNR (e.g., SNR ≤ 2 dB) with a relatively large $N$ (e.g., $N = 64$) and a relatively small PPC (e.g., $\rho \le 0.10$). In the high SNR regime (e.g., SNR ≥ 12 dB), "Proposed" obtains the best NMSE performance, and a larger PPC yields a greater improvement. As a whole, the proposed ELM-based superimposed CSI feedback avoids complex parameter tuning and possesses good generalization ability. In addition, the improvement in offline training and online running also makes the proposed method attractive.
To sum up, according to Fig. 2 and Fig. 3, the proposed ELM-based CSI feedback shows BER and NMSE similar to those of "Ref [24]". In the proposed method, the BER and NMSE performance is robust against the impact of the PPC. Given BER and NMSE similar to those of "Ref [24]", the following subsection elaborates on the advantages of "Proposed", e.g., fewer training parameters, less storage space, and shorter training time.

C. OVERHEAD COMPARISON
As can be seen in IV-B, the BER and NMSE of "Proposed" are similar to those of "Ref [24]". In this subsection, we compare the overheads of "Proposed" and "Ref [24]". Their overheads are presented in TABLE 3, where the number of training parameters, storage space, training time, and online running time are compared; details are described as follows.
Training time: Since the network of "Ref [24]" is a DL-based network, we carry out its training on a server with an NVIDIA TITAN RTX GPU and an Intel Xeon(R) E5-2620 CPU (2.1 GHz × 16). Unlike the DL-based network of "Ref [24]", the proposed ELM-based network is a complex-valued feed-forward network, so its training can be easily performed by using Matlab's matrix operations on a personal computer (PC). Thus, we train the network of "Proposed" on a PC with an i5-4210U CPU (1.7 GHz × 4). As shown by the comparison of training parameters, "Proposed" has far fewer training parameters than "Ref [24]", leading to a shorter training time. In TABLE 3, the maximum time consumption of "Proposed" and the minimum time consumption of "Ref [24]" are considered among the cases where $N = 16$, $N = 32$, and $N = 64$. Even without GPU acceleration, to achieve similar BER and NMSE, the training time of "Proposed" is less than 12 minutes, while "Ref [24]" consumes more than 70 minutes. Overall, compared with "Ref [24]", the proposed method significantly reduces the training time.
Online running time: Because the same expert knowledge is adopted by "Proposed" and "Ref [24]", the main difference in online running time lies in the four subnets. Thus, we observe and analyse the online running time according to the four subnets. For a fair comparison, $10^4$ online-running experiments are conducted for "Proposed" and "Ref [24]" on the same PC (with an i5-4210U CPU) by using Matlab software. That is, the DL-based method in "Ref [24]" is also run in Matlab by importing its trained network parameters and architecture. During the experiments, only the running time is considered, i.e., the time for generating the data set is not included. From TABLE 3, the time consumption of "Ref [24]" is much longer than that of "Proposed". An operation count leads to the same conclusion. For "Proposed", each DET-ELMi has $2 \times (8M \times M) + 4 \times (M \times 8M) = 48M^2$ real multiplications, and each CSI-ELMi has $2 \times (8N \times N) + 4 \times (N \times 8N) = 48N^2$ real multiplications. Here, the input weights are real-valued matrices and the output weights are complex-valued matrices. With a complex-valued input, each input-weight multiplication costs 2 real multiplications and each output-weight multiplication costs 4 real multiplications. In addition, the real additions [...]
From the overhead comparison in this subsection, the proposed ELM-based CSI feedback shows many advantages. Concretely, with similar BER and NMSE, "Proposed" has fewer training parameters, less storage space, and shorter offline training and online running time than "Ref [24]".
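The multiplication counts quoted above for the online running time can be checked with a few lines of arithmetic. The sketch below simply encodes the counting rule stated in the text (2 real multiplications per real-by-complex product in the input layer, 4 per complex-by-complex product in the output layer, hidden size 8 times the input size); the function name and the choice of $M = 512$, $N = 64$ for the example are illustrative.

```python
def real_mults_per_subnet(n, hidden_factor=8):
    """Real multiplications of one ELM subnet with n inputs, 8n hidden and n output neurons."""
    hidden = hidden_factor * n
    input_layer = 2 * (hidden * n)     # real input weights times complex input
    output_layer = 4 * (n * hidden)    # complex output weights times complex hidden output
    return input_layer + output_layer

M, N = 512, 64
det_mults = real_mults_per_subnet(M)   # one DET-ELMi
csi_mults = real_mults_per_subnet(N)   # one CSI-ELMi
total_mults = 2 * (det_mults + csi_mults)  # two CSI subnets and two DET subnets
```

Since $M > N$, the detection subnets dominate the cost: the count grows with $M^2$, so the DET-ELMi term is $(M/N)^2$ times the CSI-ELMi term.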

V. CONCLUSION
The DL-based superimposed CSI feedback still faces many challenges, such as complex parameter tuning, a huge number of training parameters, and long offline-training and online-running time. To remedy these defects, the ELM-based superimposed CSI feedback has been investigated in this paper. By employing simplified ELM-based subnets, the proposed method brings little change to the neural network structure of the original DL-based network but significantly reduces the number of training parameters and the offline-training time. More importantly, without loss of BER and NMSE performance, the proposed method requires less storage space and online-running time than the original DL-based superimposed CSI feedback. Other approaches, such as combining compression techniques with ELM, could also be explored in future work.