Deep Learning based Downlink Channel Prediction for FDD Massive MIMO System

In a frequency division duplexing (FDD) massive multiple-input multiple-output (MIMO) system, the acquisition of downlink channel state information (CSI) at the base station (BS) is a very challenging task due to the overwhelming overheads required for downlink training and uplink feedback. In this paper, we reveal a deterministic uplink-to-downlink mapping function when the position-to-channel mapping is bijective. Motivated by the universal approximation theorem, we then propose a sparse complex-valued neural network (SCNet) to approximate the uplink-to-downlink mapping function. Different from general deep networks that operate in the real domain, the SCNet is constructed in the complex domain and is able to learn the complex-valued mapping function by off-line training. After training, the SCNet is used to directly predict the downlink CSI based on the estimated uplink CSI without the need for either downlink training or uplink feedback. Numerical results show that the SCNet achieves better performance than general deep networks in terms of prediction accuracy and exhibits remarkable robustness over complicated wireless channels, demonstrating its great potential for practical deployments.

division duplexing (FDD) massive MIMO systems due to the prohibitively high overheads associated with downlink training and uplink feedback.
In fact, there are two important observations that can help reduce the overheads. First, the wireless channels between the BS and users only have a small angular spread (AS), as demonstrated in [2]-[4]. Due to the small AS and the large dimension of the channels, massive MIMO channels exhibit sparsity in the angular domain. Second, there exists angular reciprocity between the uplink and the downlink channels since the uplink and the downlink share common physical paths [4]. Since the acquisition of the uplink CSI is convenient in massive MIMO systems, many studies have suggested either extracting partial information about the downlink CSI from the uplink CSI, thereby reducing the downlink training overhead, or employing compressive sensing (CS) based algorithms to reduce the overhead of the uplink feedback [5], [6]. For example, in [5], the downlink channel covariance matrix (CCM) is first estimated from the uplink CCM and then eigenbeamforming is used to reduce the overhead of the downlink training when the AS is less than 5°. In [6], the channels are first parameterized by distinct paths, each characterized by path delay, angle, and gain. Then, the frequency-independent parameters, i.e., path delays and angles, are extracted from the uplink CSI to help reduce the downlink training. Nevertheless, the method in [6] is applicable only when the propagation paths are distinguishable and the number of paths is small. Besides, several CS-based channel feedback schemes for massive MIMO have been proposed to reduce the feedback overhead, but they are sensitive to model errors and suffer from high complexity.
Due to its excellent performance and low complexity [7], deep learning has recently been introduced to the wireless physical layer and has achieved superior performance on various tasks, such as channel estimation [8], detection [9], and CSI feedback [10]. In [11], a convolutional neural network (CNN) is trained to predict the downlink CSI based on the CSI of multiple adjacent uplink subcarriers for single-antenna FDD systems. In [12], a fully-connected neural network (FNN) is trained for uplink/downlink channel calibration for massive MIMO systems. In this paper, we propose a sparse complex-valued neural network (SCNet) for the downlink CSI prediction in FDD massive MIMO systems. Due to the richer representational capacity offered by complex representations, the SCNet can further improve the performance of channel prediction. Our contributions are summarized as follows.
1) Inspired by [14], we reveal a deterministic uplink-to-downlink mapping function for a given communication environment when the position-to-channel mapping is bijective. Then, we prove that the uplink-to-downlink mapping function can be approximated with an arbitrarily small error by a feedforward network.
2) We propose an SCNet for downlink CSI prediction in FDD massive MIMO systems, which is applicable to complex-valued function approximation with complex-valued representations. Moreover, a sparse network structure is adopted to reduce the complexity and improve the robustness.
3) Experimental results demonstrate that the SCNet outperforms the FNN of [12] in terms of prediction accuracy and exhibits remarkable robustness to variations in the number of paths.

II. SYSTEM MODEL
Fig. 2 illustrates an FDD massive MIMO system, where the BS is equipped with $M \gg 1$ antennas in the form of a uniform linear array (ULA) and the user is equipped with a single antenna. Since the proposed approach works for different users separately, we only need to illustrate it for a single user. The channel between the user and the BS is assumed to consist of $P$ rays and can be expressed as [13]
$$h(f) = \sum_{p=1}^{P} \alpha_p e^{-j 2\pi f \tau_p + j \phi_p} a(\theta_p), \tag{1}$$
where $f$ is the carrier frequency, and $\alpha_p$, $\phi_p$, $\tau_p$, and $\theta_p$ are the attenuation, phase shift, delay, and direction of arrival (DOA) of the $p$-th path, respectively. Moreover, $a(\theta_p)$ is the array manifold vector defined as
$$a(\theta_p) = \left[1, e^{-j\chi \sin\theta_p}, \cdots, e^{-j\chi (M-1) \sin\theta_p}\right]^T,$$
where $\chi = 2\pi d f / c$, $d$ is the antenna spacing, and $c$ is the speed of light. According to [2]-[4], the incident AS with mean DOA $\theta$ seen by the BS is limited to a certain region, i.e., $\theta_p \in [\theta - \Delta\theta/2, \theta + \Delta\theta/2]$. Note that $\alpha_p$ depends on (i) the distance between the user and the BS, denoted by $D$, (ii) the transmitter and receiver antenna gains, (iii) the carrier frequency, and (iv) the scattering environment. The phase $\phi_p$ depends on the scatterer materials and the wave incident/impinging angles at the scatterers. The delay $\tau_p$ depends on the distance travelled by the signal along the $p$-th path [4].
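The sum-of-rays model can be made concrete with a short NumPy sketch. It assumes the ray model takes the form h(f) = Σ_p α_p exp(−j2πfτ_p + jφ_p) a(θ_p), draws the path parameters as described for the simulations of Section V-A (Rayleigh attenuation, uniform phase and delay), and uses several illustrative values that are not taken from the paper: half-wavelength antenna spacing, a mean DOA of 20°, an AS of 5°, a 1/√P attenuation scaling, and a 120 MHz uplink/downlink separation.

```python
import numpy as np

C = 3e8  # speed of light in m/s

def array_manifold(theta, f, M, d):
    """ULA manifold a(theta) = [1, e^{-j chi sin(theta)}, ...]^T with chi = 2*pi*d*f/c."""
    chi = 2 * np.pi * d * f / C
    return np.exp(-1j * chi * np.arange(M) * np.sin(theta))

def channel(f, alpha, phi, tau, theta, M, d):
    """Sum over P rays: alpha_p * exp(-j*2*pi*f*tau_p + j*phi_p) * a(theta_p)."""
    h = np.zeros(M, dtype=complex)
    for a_p, phi_p, tau_p, theta_p in zip(alpha, phi, tau, theta):
        phase = np.exp(-1j * 2 * np.pi * f * tau_p + 1j * phi_p)
        h += a_p * phase * array_manifold(theta_p, f, M, d)
    return h

rng = np.random.default_rng(0)
M, P = 128, 200                  # antennas and paths, as in the simulation setup
f_U = 2.5e9                      # uplink frequency from the simulation setup
f_D = f_U + 120e6                # hypothetical downlink frequency
d = C / (2 * f_U)                # assumed half-wavelength spacing at f_U
mean_doa = np.deg2rad(20.0)      # illustrative mean DOA
spread = np.deg2rad(5.0)         # illustrative angular spread

alpha = rng.rayleigh(scale=1.0, size=P) / np.sqrt(P)  # illustrative scaling
phi = rng.uniform(-np.pi, np.pi, size=P)
tau = rng.uniform(0.0, 1e-4, size=P)
theta = rng.uniform(mean_doa - spread / 2, mean_doa + spread / 2, size=P)

# The same physical paths generate both channels; only the carrier differs.
h_U = channel(f_U, alpha, phi, tau, theta, M, d)
h_D = channel(f_D, alpha, phi, tau, theta, M, d)
```

Because both channels share the same path parameters, the pair (h_U, h_D) is the kind of input/label sample a predictor would be trained on.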

III. CHANNEL MAPPING FORMULATION
Denote $h(f_U)$ and $h(f_D)$ as the uplink and the downlink channels of the user, with $f_U$ and $f_D$ being the uplink and the downlink frequencies, respectively. As indicated in Eq. (1), $h(f_D)$ cannot be simply obtained from $h(f_U)$ for FDD systems. However, since the downlink and the uplink experience the same propagation environment with common physical paths, and the spatial propagation characteristics of the wireless channels are nearly unchanged within a certain bandwidth [4], there is an intrinsic relation between the uplink and the downlink CSI.
In the following, we first define an uplink-to-downlink mapping function, following the approach in [14], and prove its existence. Then, we leverage deep learning to find the mapping function.

A. Existence of Uplink to Downlink Mapping
Consider the channel model in Eq. (1), where the channel function $h(f)$ is completely determined by the parameters $\alpha_p$, $\phi_p$, $\tau_p$, $P$, $\Delta\theta$, and $\theta$. As discussed at the end of Section II, $\alpha_p$, $\phi_p$, $\tau_p$, $P$, and $\Delta\theta$ are functions of the communication environment (including the antenna gains, scatterers, etc.), the mean DOA $\theta$, and the distance $D$.
Definition 1: The position-to-channel mapping $\Phi_f$ can be written as
$$\Phi_f: \{(D, \theta)\} \to \{h(f)\},$$
where the sets $\{(D, \theta)\}$ and $\{h(f)\}$ are the domain and codomain of the mapping $\Phi_f$, respectively. Then, we adopt the following assumption for further analysis.
Assumption 1 [14]: The position-to-channel mapping function $\Phi_f$ is bijective.
Assumption 1 means that every user position has a unique channel function $h(f)$, and vice versa. Although it cannot be proved analytically, the probability that $\Phi_f$ is bijective is very high in practical wireless communication scenarios and approaches 1 as the number of antennas at the BS increases [14]. Therefore, it is reasonable to adopt Assumption 1 in massive MIMO systems.
Under Assumption 1, the channel-to-position mapping, i.e., the inverse mapping of $\Phi_f$, exists and can be written as
$$\Phi_f^{-1}: \{h(f)\} \to \{(D, \theta)\}.$$
Next, we investigate the existence of the uplink-to-downlink mapping, as given in Proposition 1.
Proposition 1: With Assumption 1, the uplink-to-downlink mapping exists for a given communication environment and can be written as
$$\Psi_{U \to D} = \Phi_{f_D} \circ \Phi_{f_U}^{-1}: \{h(f_U)\} \to \{h(f_D)\},$$
where $\Phi_{f_D} \circ \Phi_{f_U}^{-1}$ represents the composite mapping of $\Phi_{f_D}$ and $\Phi_{f_U}^{-1}$.
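Proof. From Definition 1, we have the mappings $\Phi_{f_D}: \{(D, \theta)\} \to \{h(f_D)\}$ and $\Phi_{f_U}: \{(D, \theta)\} \to \{h(f_U)\}$. Under Assumption 1, the mapping $\Phi_{f_U}^{-1}$ exists, with its codomain equal to the domain of $\Phi_{f_D}$. Therefore, the composite mapping $\Phi_{f_D} \circ \Phi_{f_U}^{-1}$ exists for any possible position $(D, \theta)$.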
A more general proposition can be found in [14].
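The composite mapping of Proposition 1 can be illustrated on a deliberately simple toy environment. The sketch below is not the paper's channel model: it assumes a single line-of-sight ray with hypothetical 1/D attenuation and delay D/c, for which the position-to-channel mapping is bijective for D > 0 and |θ| < π/2 when the spacing is half a wavelength at the uplink frequency. All function names are illustrative.

```python
import numpy as np

C = 3e8  # speed of light in m/s

def phi_f(D, theta, f, M, d):
    """Toy position-to-channel mapping: one ray, attenuation 1/D, delay D/c."""
    chi = 2 * np.pi * d * f / C
    steering = np.exp(-1j * chi * np.arange(M) * np.sin(theta))
    return (1.0 / D) * np.exp(-1j * 2 * np.pi * f * D / C) * steering

def phi_f_inverse(h, f, d):
    """Toy channel-to-position mapping; valid while |chi * sin(theta)| < pi."""
    chi = 2 * np.pi * d * f / C
    D = 1.0 / np.abs(h[0])                       # attenuation reveals the distance
    phase_step = np.angle(h[1] * np.conj(h[0]))  # equals -chi * sin(theta)
    theta = np.arcsin(-phase_step / chi)
    return D, theta

def uplink_to_downlink(h_U, f_U, f_D, M, d):
    """Composite mapping: invert at f_U, then map the position forward at f_D."""
    D, theta = phi_f_inverse(h_U, f_U, d)
    return phi_f(D, theta, f_D, M, d)

f_U, f_D = 2.5e9, 2.62e9   # hypothetical uplink/downlink carriers
M = 8
d = C / (2 * f_U)          # half-wavelength spacing at f_U (assumption)
D_true, theta_true = 50.0, 0.3

h_U = phi_f(D_true, theta_true, f_U, M, d)
h_D_from_uplink = uplink_to_downlink(h_U, f_U, f_D, M, d)
h_D_direct = phi_f(D_true, theta_true, f_D, M, d)
```

In this toy case the inverse is available in closed form. In a realistic multipath environment it is not, which is what motivates learning the composite mapping from data.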

B. Deep Learning for Uplink-to-Downlink Mapping
Proposition 1 proves the existence of the uplink-to-downlink mapping function. However, the function cannot be described by known mathematical models, which motivates us to resort to deep learning algorithms. Based on the universal approximation theorem [15], we obtain Theorem 1 as follows.
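Theorem 1: For any given small error $\varepsilon > 0$, there always exists a positive constant $N$ large enough such that
$$\sup_{x \in H} \left\| \mathrm{NET}_N(x, \Omega) - \Psi_{U \to D}(x) \right\| \le \varepsilon, \quad H = \{h(f_U)\}, \tag{6}$$
where $\mathrm{NET}_N(x, \Omega)$ is the output of a three-layer feedforward network with $x$, $\Omega$, and $N$ denoting the input data, network parameters, and the number of hidden units, respectively.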
Proof. (i) Since $h(f_U)$ is bounded and closed, $H$ is a compact set. (ii) Since $\Phi_{f_D}$ and $\Phi_{f_U}^{-1}$ are continuous mappings and the composition of continuous mappings is itself continuous, $\Psi_{U \to D}(x)$ is a continuous function for all $x \in H$. Combining (i), (ii), and the universal approximation theorem [15, Theorem 1] proves Theorem 1.
According to Theorem 1, the uplink-to-downlink mapping function can be approximated with an arbitrarily small error by a feedforward network with a single hidden layer. Thus, we can train a network to predict the downlink CSI from the uplink CSI and can significantly reduce the overhead required for downlink training and uplink feedback at the cost of offline training.

IV. SCNET BASED DOWNLINK CSI PREDICTION
In this section, we will first introduce the architecture of the SCNet. Then, we discuss how to train and deploy it in massive MIMO systems.

A. SCNet Architecture
Although it has been proven in Theorem 1 that a three-layer network is able to predict the downlink CSI, we propose the SCNet instead of the three-layer network for the following practical considerations: (i) a deep network with an appropriate number of layers learns better than a three-layer network; (ii) a sparse network has fewer parameters and is therefore easier to train and more robust; (iii) compared with real-valued networks, complex-valued ones have richer representational capacity and are therefore more powerful in learning complex-valued functions [16].
As shown in Fig. 2, the input of the SCNet is the uplink CSI $h(f_U)$. The output of the SCNet is a cascade of nonlinear transformations of $h(f_U)$, i.e.,
$$\hat{h}(f_D) = \mathrm{NET}(h(f_U), \Omega) = f^{(L-1)}\left(f^{(L-2)}\left(\cdots f^{(1)}\left(h(f_U)\right)\right)\right),$$
where $L$ is the number of layers and $\Omega$ denotes the network parameters to be trained. Moreover, $f^{(l)}$ is the nonlinear transformation function of the $l$-th layer, built from the activation function $g$, which is defined through $\Re[\cdot]$ and $\Im[\cdot]$, the real and imaginary parts of a vector, respectively. We set the number of neurons in the middle hidden layer to be much smaller than that in the output layer, which forces the SCNet to compress the representation of the input. We would like to emphasize that the compression task would be very difficult if the elements of the input $x$ were independent of each other. However, since the uplink channel $h(f_U)$ has a sparse structure (footnote 3), the SCNet is able to discover the intrinsic sparsity of $h(f_U)$ in massive MIMO systems. As a result, the SCNet not only reduces the redundancy of network parameters but also becomes more functional and robust [17].
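A forward pass through a network of this kind might look like the sketch below. Several choices are assumptions rather than details confirmed by the text: each layer is taken to be an affine map followed by the activation, the activation is assumed to be a ReLU applied separately to the real and imaginary parts (one common choice in complex-valued networks), the output layer is assumed linear, and the initialization scale is illustrative. The layer widths combine the 128 BS antennas with the (128, 64, 128) hidden sizes reported in Section V.

```python
import numpy as np

def crelu(z):
    """Assumed activation: ReLU on the real and imaginary parts separately."""
    return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)

def init_params(sizes, rng):
    """Complex weights and zero biases per layer; the scaling is illustrative."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / np.sqrt(2 * n_in)
        W = scale * (rng.standard_normal((n_out, n_in))
                     + 1j * rng.standard_normal((n_out, n_in)))
        b = np.zeros(n_out, dtype=complex)
        params.append((W, b))
    return params

def scnet_forward(h_U, params):
    """Cascade of layer transformations applied to the uplink CSI vector."""
    x = h_U
    for W, b in params[:-1]:
        x = crelu(W @ x + b)      # hidden layers: affine map, then activation
    W, b = params[-1]
    return W @ x + b              # assumed linear output layer

rng = np.random.default_rng(0)
sizes = [128, 128, 64, 128, 128]  # input, three hidden layers, output
params = init_params(sizes, rng)

h_U = rng.standard_normal(128) + 1j * rng.standard_normal(128)  # placeholder input
h_D_hat = scnet_forward(h_U, params)
```

The 64-unit middle layer is the bottleneck that forces a compressed representation of the input.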

B. Training and Deployment
The proposed downlink CSI prediction has two stages: off-line training and on-line deployment. In the off-line training stage, the BS collects both the downlink and the uplink CSI as training samples to train the SCNet. Specifically, during a coherence time period, the downlink CSI is first estimated at the user side by downlink training and then fed back to the BS. The uplink CSI is estimated at the BS by uplink training. The SCNet is trained to minimize the difference between the output $\hat{h}(f_D)$ and the supervised label $h(f_D)$. The loss function is
$$\mathrm{Loss}(\Omega) = \frac{1}{V N_h} \sum_{v=1}^{V} \left\| \hat{h}^{(v)}(f_D) - h^{(v)}(f_D) \right\|_2^2,$$
where $V$ is the batch size (footnote 4), the superscript $(v)$ denotes the index of the $v$-th training sample, $\|\cdot\|_2$ denotes the $\ell_2$ norm, and $N_h$ is the length of the vector $h(f_D)$. The loss function $\mathrm{Loss}(\Omega)$ is minimized by the complex-domain adaptive moment estimation (ADAM) algorithm [16] until the SCNet converges.
Footnote 3: Since the AS is narrow, the massive MIMO channels exhibit sparsity in the angular domain. See more details in [1].
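The training objective can be sketched in a few lines of NumPy. The normalization by V·N_h is an assumption inferred from the symbols defined in the text (batch size V, vector length N_h, squared ℓ2 norm), and the function name is illustrative.

```python
import numpy as np

def mse_loss(h_pred, h_true):
    """Mean squared l2 error over a batch of complex vectors of shape (V, N_h)."""
    V, N_h = h_true.shape
    return np.sum(np.abs(h_pred - h_true) ** 2) / (V * N_h)

# Tiny batch: V = 2 samples, N_h = 2 entries, every entry off by exactly j.
h_true = np.array([[1 + 1j, 2 + 0j],
                   [0 + 0j, 0 + 1j]])
h_pred = h_true + 1j
loss = mse_loss(h_pred, h_true)  # each of the 4 entries contributes |j|^2 = 1
```

The squared magnitude of the complex error is used, so the loss is real-valued even though the network outputs are complex.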
In the deployment stage, the parameters of the SCNet are fixed. The SCNet directly generates the prediction of the downlink CSI $\hat{h}(f_D)$ based on the uplink CSI $h(f_U)$.

C. Complexity Analysis
Denote $n_l$ as the number of neurons in the $l$-th layer. The required number of floating point operations (FLOPs) is used as the metric of complexity. For a real-valued network, the total number of FLOPs required is $\sum_{l=1}^{L-1} n_{l-1} n_l$. However, since a complex multiplication costs four times as much as its real counterpart, the total number of FLOPs required in the SCNet is $4 \sum_{l=1}^{L-1} n_{l-1} n_l$. Nevertheless, it should be noted that in a real-valued network, the complex input data are typically separated into real and imaginary parts and then fed to the network. Therefore, the size of the real-valued network is larger than that of a complex-valued network.
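A quick calculation makes the comparison concrete. The complex layer widths below combine the 128 antennas with the (128, 64, 128) hidden sizes of Section V; the real-valued counterpart with every width doubled is a hypothetical construction for illustration, not the FNN of [12].

```python
def real_multiplications(layer_sizes):
    """Sum of n_{l-1} * n_l over consecutive layers."""
    return sum(n_prev * n_next
               for n_prev, n_next in zip(layer_sizes[:-1], layer_sizes[1:]))

scnet_sizes = [128, 128, 64, 128, 128]    # complex-valued layer widths
fnn_sizes = [2 * n for n in scnet_sizes]  # hypothetical real network, widths doubled

# Each complex multiplication is counted as four real ones.
scnet_flops = 4 * real_multiplications(scnet_sizes)
fnn_flops = real_multiplications(fnn_sizes)

print(scnet_flops, fnn_flops)  # 196608 196608
```

Under these assumptions the two counts coincide: doubling every width to carry the real and imaginary parts quadruples the number of real multiplications, which offsets the factor of four paid by the complex network.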

V. SIMULATION RESULTS
Unless otherwise specified, the system parameters are set as follows: the BS is equipped with 128 antennas; the uplink frequency follows the 3GPP R15 standard, i.e., $f_U = 2.5$ GHz. In the simulation of Section V-A, the attenuation of each path follows a Rayleigh distribution. The phase and delay of each path follow uniform distributions over $[-\pi, \pi)$ and $[0, 10^{-4}]$ s, respectively. The number of paths is 200 in both the training and deployment stages. In the simulation of Section V-B, the parameters of each path are generated by the ray-tracing simulator [18]. The number of paths is 200 in the training stage and varies in the deployment stage.
The FNN in [12] was originally designed for uplink/downlink channel calibration for massive MIMO systems, but it can also be used for downlink channel prediction in FDD massive MIMO systems. Therefore, the FNN is used as a benchmark in this paper. Keras 2.2.0 is employed as the deep learning framework for both the SCNet and the FNN. We choose the numbers of neurons in the hidden layers as (128, 64, 128) by trial and adjustment. The initial learning rate of the ADAM algorithm is 0.001. The batch size is 128. The parameters of the SCNet are initialized from a complex distribution with normalized variance. The uplink CSI fed to the SCNet is estimated by the minimum mean-squared-error (MMSE) algorithm at a signal-to-noise ratio (SNR) of 25 dB. The network is trained for each AS and each downlink frequency separately. The number of training samples is 102,400, and the number of epochs is 400.
Footnote 4: Batch size is the number of samples in one training batch.

A. Prediction Accuracy versus AS and Frequency Difference
Normalized MSE (NMSE) is used to measure the prediction accuracy, which is defined as
$$\mathrm{NMSE} = E\left[ \frac{\left\| h(f_D) - \hat{h}(f_D) \right\|_2^2}{\left\| h(f_D) \right\|_2^2} \right],$$
where $E[\cdot]$ represents the expectation operation. Fig. 3 depicts the NMSE performance of the SCNet and the FNN based downlink CSI predictors versus the AS $\Delta\theta$ and the frequency difference $f_D - f_U$, respectively. Fig. 3(a) shows that the NMSE performance of both the SCNet and the FNN degrades as the AS increases, while the slope of the NMSE curve decreases as the AS increases. This is because, as the AS increases, the sparsity of the channels in the angular domain decreases, and thus it is harder for the networks to learn the structure of the channels and to accurately predict the downlink CSI. The networks are less sensitive to the AS when the AS is wide, which accounts for the decrease of the slope in the wide AS case. Fig. 3(b) shows that the NMSE performance of both the SCNet and the FNN degrades as the frequency difference increases. This is because the correlation between the uplink and the downlink CSI tends to vanish as the frequency difference increases. As shown in Fig. 3, the proposed SCNet outperforms the FNN in all scenarios, which validates that the SCNet can benefit from the rich representational capacity offered by complex representations.
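The NMSE metric can be estimated from test samples as in the sketch below. It assumes the expectation is replaced by a sample average of the per-sample ratio of squared error to squared channel norm; the function names and the synthetic data are illustrative.

```python
import numpy as np

def nmse(h_pred, h_true):
    """Sample average of ||h - h_hat||^2 / ||h||^2 over rows of (num_samples, N_h)."""
    err = np.sum(np.abs(h_true - h_pred) ** 2, axis=1)
    power = np.sum(np.abs(h_true) ** 2, axis=1)
    return np.mean(err / power)

def nmse_db(h_pred, h_true):
    """NMSE expressed in decibels."""
    return 10 * np.log10(nmse(h_pred, h_true))

# Synthetic check: a prediction that is uniformly 10% too small in amplitude.
h_true = np.ones((4, 8), dtype=complex)
h_pred = 0.9 * h_true
value = nmse(h_pred, h_true)        # relative error power of 0.1^2 = 0.01
value_db = nmse_db(h_pred, h_true)  # 10*log10(0.01) = -20 dB
```

Normalizing each sample by its own channel power keeps the metric comparable across users at different distances from the BS.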

B. Robustness Analysis
In Section V-A, the channels are generated based on Eq. (1) with the same statistics. However, real-world channels may be more complicated, and statistical mismatches between the training and deployment stages are inevitable. To test the robustness of both the SCNet and the FNN, data generated from Wireless InSite [18] under different scenarios are used for training and testing. As shown in Fig. 4, the number of paths in the training stage is 200, while it varies in the deployment stage. The results show that variations in the channel statistics degrade the performance, but the SCNet and the FNN still exhibit remarkable prediction accuracy, which validates the excellent generalization ability of deep neural networks.

VI. CONCLUSION
In this paper, we revealed the existence of a deterministic uplink-to-downlink mapping function for a given communication environment. Then, we proposed the SCNet for downlink CSI prediction in FDD massive MIMO systems. Simulation results have demonstrated that the SCNet performs better than the existing network in terms of prediction accuracy. Furthermore, the remarkable robustness of the SCNet with respect to the statistical characteristics of wireless channels shows its great potential in real-world applications.