Deep Learning Based Nonlinear Signal Detection in Millimeter-Wave Communications

For millimeter-wave (mm-Wave) communications, signal detection in the presence of power amplifier (PA) nonlinearity and an unknown multipath channel remains a challenging task in single-input single-output (SISO) communication systems. Moreover, the PA nonlinearity in multiple-input multiple-output (MIMO) communication systems also severely affects signal detection at the receiver-end. In this paper, we first propose a deep-learning (DL) framework that integrates a feedforward neural network (FNN) and a recurrent neural network (RNN) to combat both the nonlinear distortion and the linear inter-symbol interference (ISI) from a global point of view, thereby accomplishing nonlinear equalization and signal detection at the receiver-end of a SISO communication system. Utilizing the powerful mapping and learning capability of DL, our method detects symbols from received signals corrupted by both nonlinear distortion and linear ISI, avoiding both an explicit nonlinear pre-distorter at the transmitter and a channel state information (CSI) estimator. Secondly, our DL-based framework also copes with the joint nonlinear distortion and space-time decoding problem in MIMO communication systems, without explicitly pre-calibrating the nonlinear distortion or estimating CSI. Numerical experiments demonstrate that our DL-based detector effectively alleviates the performance degradation caused by the coupled nonlinear distortion and linear ISI in SISO systems, and by the coupled nonlinear distortion and linear space-time decoding in MIMO systems. Compared with state-of-the-art methods, e.g. the pre-distorter and the post-equalizer, our DL-based scheme improves the detection performance.

communication have motivated considerable research. Digital pre-distortion (DPD) techniques, e.g. those relying on the memory polynomial [8] or the real-valued focused time-delayed neural network (RVFTDNN) [9], can compensate for the nonlinear distortion of PAs at the transmitter. However, such DPD techniques fail to mitigate the nonlinear effects of the PA completely. When combined with the subsequent linear ISI, the residual nonlinear distortion degrades signal detection. Moreover, in addition to the channel estimator and coherent detector at the receiver-end, the extra pre-distorter substantially increases the complexity of the transmitter, which makes it less attractive for low-complexity and low-power devices.
At the receiver-end, a joint nonlinear equalizer and detector relying on Bayesian statistical inference has also been proposed to address the nonlinear distortion and multipath effect in SISO communication systems [10], [11]. Even if the PA nonlinearity model is used as a prior, this approach can only address the PA nonlinearity partially when estimating CSI, via a locally linear approximation based on the Taylor series expansion (TSE). In more complex situations, e.g. with an unknown PA model and non-line-of-sight (NLoS) propagation, its detection performance degrades remarkably.
A nonlinear PA also seriously impedes signal detection in MIMO systems. Space-time block codes (STBC) have been developed to provide reliable transmission for MIMO systems [12], [13]. The traditional MIMO detection methods, i.e. the zero-forcing (ZF) and minimum mean square error (MMSE) detectors, have been utilized to decode STBC [14], [15]. The ZF-based and MMSE-based detectors require accurate channel statistics. Unfortunately, these existing methods do not consider the PA nonlinearity at the transmitter-end.
Researchers have applied deep learning (DL) to the physical layer for modulation recognition [16], [17], encoding and decoding [18], [19], and channel estimation and detection [20], [21]. For example, the work in [20] investigated a joint orthogonal frequency-division multiplexing (OFDM) and feedforward neural network (FNN) scheme for channel estimation and signal detection in SISO systems. Yet, to the best of our knowledge, the application of DL to signal detection in SISO and MIMO systems, especially in the presence of nonlinear distortion and linear ISI, remains an open problem.

B. OUR WORK AND CONTRIBUTIONS
In this paper, we propose a DL-based joint nonlinear equalizer and signal detector at the receiver-end. In our model, we propose two different DL-based methods to cope with joint nonlinear PA and linear ISI in SISO system, and joint nonlinear PA and decoding STBC code in MIMO system, respectively.
To be specific, the main contributions of our work are summarized as follows:
• Meanwhile, the complex baseband processing of the transmitter-end pre-distorter is avoided. After supervised training, our scheme serves as one nonlinear detector that unifies the channel estimator and the coherent detector, which would be effective in applications.
The rest of this paper is structured as follows. In Section II, we introduce the system model, and the new DL-based signal detection method is then presented in Section III. Section IV provides numerical simulation results. We finally conclude our investigation in Section V.
Notations: Bold lowercase and uppercase letters represent vectors and matrices, respectively, i.e. $\mathbf{a}$ and $\mathbf{A}$. $(\cdot)^T$ and $(\cdot)^H$ represent the transpose and Hermitian transpose, respectively. $(\cdot)^*$ represents the complex conjugate. $(\cdot)^{-1}$ is the inverse of a square matrix. The squared Euclidean norm of a vector or a matrix is denoted by $\|\cdot\|_2^2$. $\mathbb{R}^{n \times m}$ is a real space of dimension $n \times m$, $\mathbb{C}^{n \times m}$ represents a complex space of dimension $n \times m$, and $\mathbf{I}_{n \times n}$ denotes an $n$-by-$n$ identity matrix. $\mathcal{CN}(0, \sigma^2)$ denotes the distribution of a circularly symmetric complex Gaussian (CSCG) random variable with zero mean and variance $\sigma^2$.

II. SYSTEM MODEL AND PROBLEM STATEMENT
In this work, we consider both the SISO system with a nonlinear PA and a multipath fading channel, which is of great promise for emerging device-to-device (D2D) communication [22] where the devices have only a single antenna, and the MIMO system with a nonlinear PA, which is one key technique for 5G wireless communication [23].
Owing to inherent hardware imperfections, the PA nonlinearity at the transmitter-end has substantial effects on the emitted signals, i.e. the constellation of the emitted signals is seriously distorted. The PA nonlinearity is usually characterized by amplitude modulation-amplitude modulation (AM-AM) and amplitude modulation-phase modulation (AM-PM) models. In this analysis, we adopt the nonlinear PA model specified by the IEEE 802.11ad task group (TG) [5], [11], whereby the AM-AM and AM-PM models are given by:
$$G(V_{in}) = \frac{g_l V_{in}}{\left(1 + \left(g_l V_{in} / V_{sat}\right)^{2\sigma_s}\right)^{1/(2\sigma_s)}},$$
$$\psi(V_{in}) = \frac{\alpha V_{in}^{q_1}}{1 + \left(V_{in} / \beta\right)^{q_2}}.$$
Here, $V_{in}$, $G(V_{in})$ and $\psi(V_{in})$ are the input voltage amplitude, the output voltage amplitude and the additional phase, respectively. The linear gain $g_l$, the smoothness factor $\sigma_s$, the saturation level $V_{sat}$ and the constant factors $\alpha$, $\beta$, $q_1$, $q_2$ can be found in [5].
As shown in Fig. 1, which plots the AM-AM and AM-PM curves, the nonlinear distortion depends predominantly on the input voltage $V_{in}$. When $V_{in}$ is less than 0.1, the nonlinearity and the phase shift can be neglected and the PA can be considered as an ideal linear system. However, when $V_{in}$ is close to the saturation level $V_{sat}$, the nonlinear distortion seriously degrades signal detection.
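The AM-AM and AM-PM behaviour described above can be sketched numerically. The snippet below is a minimal illustration that assumes the modified Rapp form commonly associated with the IEEE 802.11ad PA model; the function names, the parameter values and the choice of degrees for the phase are assumptions made here for illustration, since the text defers the actual parameter values to [5].

```python
import numpy as np

# Placeholder parameter values for illustration only (assumed, not taken
# from this paper); the text refers to [5] for the actual values.
G_L, SIGMA_S, V_SAT = 4.65, 0.81, 0.58
ALPHA, BETA, Q1, Q2 = 2560.0, 0.114, 2.4, 2.3


def am_am(v_in):
    """Output amplitude G(V_in) of the assumed modified Rapp AM-AM curve."""
    v_in = np.asarray(v_in, dtype=float)
    lin = G_L * v_in  # ideal linear amplification
    return lin / (1.0 + (lin / V_SAT) ** (2 * SIGMA_S)) ** (1.0 / (2 * SIGMA_S))


def am_pm(v_in):
    """Additional phase psi(V_in); assumed here to be expressed in degrees."""
    v_in = np.asarray(v_in, dtype=float)
    return ALPHA * v_in ** Q1 / (1.0 + (v_in / BETA) ** Q2)


def apply_pa(x):
    """Distort complex baseband symbols x with the AM-AM and AM-PM curves."""
    x = np.asarray(x, dtype=complex)
    amp = np.abs(x)
    phase = np.angle(x) + np.deg2rad(am_pm(amp))
    return am_am(amp) * np.exp(1j * phase)
```

For very small inputs the output amplitude approaches $g_l V_{in}$, and for large inputs it flattens towards $V_{sat}$, which is the qualitative behaviour the paragraph attributes to Fig. 1.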

A. SISO COMMUNICATION SYSTEM
Given the PA nonlinearity and the multipath interference, the block diagram of the entire SISO communication system is shown in Fig. 2. First, the binary source sequence $b_i$ ($i = 0, 1, 2, \cdots$) is fed into an $M$-order linear modulator (e.g. $M$-QAM), which maps every $m = \log_2 M$ source bits into a modulated symbol $x_t \in \mathcal{X}$ ($t = 0, 1, 2, \cdots$), with $|\mathcal{X}| = M$. Then, each symbol $x_t$ is passed through the nonlinear PA, which yields the emitted signal
$$\tilde{x}_t = G(|x_t|)\, e^{j\left(\angle x_t + \psi(|x_t|)\right)}.$$
At the receiver-end, the received signal at the discrete time $t$ ($t = 0, 1, 2, \cdots$) reads:
$$y_t = \mathbf{h}^T \tilde{\mathbf{x}}_t + n_t,$$
where $\mathbf{h} \in \mathbb{C}^{J \times 1}$, $n_t$ and $\tilde{\mathbf{x}}_t = [\tilde{x}_t, \tilde{x}_{t-1}, \cdots, \tilde{x}_{t-J+1}]^T \in \mathbb{C}^{J \times 1}$ denote the baseband complex multipath response, the white Gaussian noise and the emitted signal, respectively. $J$ denotes the length of the channel impulse response (CIR). For ease of analysis, the channel $\mathbf{h}$ is deemed to be quasi-static and follows the complex Gaussian distribution with mean vector $\bar{\mathbf{h}}$ and covariance matrix $\boldsymbol{\Sigma}$ [11]. The main concern is how to recover the unknown symbols $\{x_t\}$ sequentially from the observed signals $\{y_t\}$, which are corrupted by the coupled nonlinear distortion and linear ISI.
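The signal chain of this subsection (modulation, memoryless PA, multipath channel, additive noise) can be sketched as follows. This is a hedged illustration rather than the authors' simulator: `qam16_map`, `toy_pa` and `siso_receive` are hypothetical helper names, the bit labelling is an arbitrary choice, and `toy_pa` is a simple stand-in soft limiter rather than the IEEE 802.11ad model. The channel taps are the LoS multipath mean values quoted in Section IV.

```python
import numpy as np

LEVELS = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(10.0)  # unit average power


def qam16_map(bits):
    """Map a bit array (length divisible by 4) to 16QAM symbols.

    The bit-to-level assignment is a plain binary index chosen for brevity;
    the paper does not specify its labelling.
    """
    b = np.asarray(bits).reshape(-1, 4)
    i_idx = 2 * b[:, 0] + b[:, 1]
    q_idx = 2 * b[:, 2] + b[:, 3]
    return LEVELS[i_idx] + 1j * LEVELS[q_idx]


def toy_pa(x):
    """Stand-in memoryless soft limiter (an assumption, not the 802.11ad PA)."""
    return x / np.sqrt(1.0 + np.abs(x) ** 2)


def siso_receive(x_tilde, h, noise_std, rng):
    """Return y_t = sum_k h_k * x_tilde_{t-k} + n_t with n_t ~ CN(0, noise_std^2)."""
    y = np.convolve(x_tilde, h)[: len(x_tilde)]  # causal multipath ISI
    n = rng.standard_normal(len(y)) + 1j * rng.standard_normal(len(y))
    return y + noise_std / np.sqrt(2.0) * n


rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=4 * 1000)
h = np.array([1 + 1j, 0.1 + 0.1j, 0.006 + 0.006j])  # LoS multipath mean taps
y = siso_receive(toy_pa(qam16_map(bits)), h, noise_std=0.1, rng=rng)
```

Each received sample mixes the current distorted symbol with the previous $J - 1$ ones, which is why the nonlinear distortion and the ISI cannot be undone independently at the receiver.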
There are several approaches to address the aforementioned problems. As shown in Fig. 2a, the DPD techniques, relying on the memory polynomial model [8] or neural networks [9], can alleviate the nonlinearity at the transmitter-end, e.g. by training a pre-distorter to approximate the inverse function of the PA. Apart from the increased complexity of the transmitter, such DPD methods are insufficient to mitigate the nonlinear effects. Besides, an extra channel estimator and signal detector are needed at the receiver-end. The Bayesian equalizer and detector have also been proposed at the receiver-end [11], as shown in Fig. 2b. Although this approach avoids a high-cost transmitter with a pre-distorter, its detection performance is still limited by the approximation residual of the first-order TSE.
From a global point of view, we design a DL framework to mitigate the coupled nonlinear distortion and linear ISI, as in Fig. 2c. We implement a DNN at the receiver-end, which serves as one joint equalizer and detector, thereby potentially reducing the complexity of the whole system. Moreover, both the nonlinear distortion and the linear ISI can be mitigated effectively by our cascaded FNN and RNN structure, thus greatly enhancing the detection performance.

B. MIMO COMMUNICATION SYSTEM
We also consider a single-user $N_r \times N_t$ MIMO system without ISI, where $N_r$ and $N_t$ are the numbers of receive antennas and transmit antennas, respectively. Without loss of generality, $N_r = N_t = 2$ is used for the analysis in this article, as illustrated in Fig. 3. We use STBC as the spatial modulation to encode the modulated symbols, where every $2 \times 2$ codeword matrix, formed from two information symbols $\mathbf{x}_{t_1} = \{x_{2t_1}, x_{2t_1+1}\}$ ($t_1 = 0, 1, 2, \cdots$), is sent during $T = 2$ time slots from the two transmit antennas [24], [25]. Thus the STBC signal matrix can be written as follows:
$$\mathbf{X}_{t_1} = \begin{bmatrix} x_{2t_1} & -x_{2t_1+1}^* \\ x_{2t_1+1} & x_{2t_1}^* \end{bmatrix}.$$
The STBC signal matrix $\mathbf{X}_{t_1}$ is then passed through the front-end nonlinear PA, which distorts each entry according to the AM-AM and AM-PM models and generates the transmitted signal matrix $\tilde{\mathbf{X}}_{t_1}$. At the receiver-end, the received signal matrix $\mathbf{Y}$ from the two receive antennas, a $2 \times 2$ matrix, can be written as
$$\mathbf{Y} = \mathbf{H}\tilde{\mathbf{X}} + \mathbf{B},$$
where $\mathbf{H}$ is the $2 \times 2$ channel matrix with independent identically distributed (i.i.d.) elements from $\mathcal{CN}(0, 1)$ and $\mathbf{B}$ is the noise matrix. The detailed received signal matrix $\mathbf{Y}_{t_1}$ can be rewritten as in (7),
$$\begin{bmatrix} y_{1,2t_1} & y_{1,2t_1+1} \\ y_{2,2t_1} & y_{2,2t_1+1} \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} \tilde{\mathbf{X}}_{t_1} + \begin{bmatrix} b_{1,2t_1} & b_{1,2t_1+1} \\ b_{2,2t_1} & b_{2,2t_1+1} \end{bmatrix}, \quad (7)$$
where $y_{i,s}$ is the received signal value at the $i$-th receive antenna and the $s$-th time slot, $h_{ij}$ is the channel state between the $i$-th receive antenna and the $j$-th transmit antenna, and $b_{i,s}$ is the white Gaussian noise value at the $i$-th receive antenna and the $s$-th time slot. At the receiver-end, our main concern is how to decode the observed signal matrices $\mathbf{Y}_{t_1}$, which are corrupted by the nonlinear distortion, and thus to recover the unknown symbols $\{x_{2t_1}, x_{2t_1+1}\}$.
There are numerous studies on decoding STBC for MIMO systems, e.g. the ZF and MMSE receivers [14], [15]. Specifically, for the ZF receiver, the estimate $\hat{\mathbf{X}}_{ZF}$ of the modulated signal matrix is given by
$$\hat{\mathbf{X}}_{ZF} = \left(\hat{\mathbf{H}}^H \hat{\mathbf{H}}\right)^{-1} \hat{\mathbf{H}}^H \mathbf{Y}, \quad (8)$$
provided that the inverse exists, where $\hat{\mathbf{H}}$ denotes the least square (LS) estimate of the channel matrix, given by $\mathbf{Y}_p \mathbf{X}_p^H \left(\mathbf{X}_p \mathbf{X}_p^H\right)^{-1}$ [26]. $\mathbf{X}_p$ and $\mathbf{Y}_p$ represent the transmitted pilot signals and the received pilot signals, respectively. For the MMSE receiver, the MMSE detection removes the interference by:
$$\hat{\mathbf{X}}_{MMSE} = \left(\hat{\mathbf{H}}^H \hat{\mathbf{H}} + \sigma^2 \mathbf{I}\right)^{-1} \hat{\mathbf{H}}^H \mathbf{Y}. \quad (9)$$
According to (8) and (9), both the ZF-based and MMSE-based detectors need to estimate CSI. Utilizing the powerful mapping and learning capability of neural networks, the proposed DL-based framework is also applicable to the MIMO system. By avoiding the acquisition of prior CSI, our method can cope with the joint PA nonlinearity and STBC decoding problem in the MIMO system by optimizing the detection performance from the global point of view.
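To make the conventional baseline concrete, the following sketch encodes two symbols into a $2 \times 2$ space-time block, estimates the channel from pilots by least squares, and applies ZF and MMSE detection. It is an illustration under stated assumptions: the codeword layout is the standard Alamouti form (rows as transmit antennas, columns as time slots), the ZF and MMSE expressions are the textbook ones, and all function names are hypothetical. The LS estimate follows the expression $\mathbf{Y}_p \mathbf{X}_p^H (\mathbf{X}_p \mathbf{X}_p^H)^{-1}$ quoted in the text.

```python
import numpy as np


def alamouti_encode(x0, x1):
    """2x2 codeword: rows are transmit antennas, columns are time slots (assumed layout)."""
    return np.array([[x0, -np.conj(x1)], [x1, np.conj(x0)]])


def ls_channel_estimate(Y_p, X_p):
    """LS estimate H_hat = Y_p X_p^H (X_p X_p^H)^{-1} from pilot blocks."""
    return Y_p @ X_p.conj().T @ np.linalg.inv(X_p @ X_p.conj().T)


def zf_detect(Y, H_hat):
    """Textbook zero-forcing estimate via the pseudo-inverse of H_hat."""
    return np.linalg.pinv(H_hat) @ Y


def mmse_detect(Y, H_hat, noise_var):
    """Textbook linear MMSE estimate (H^H H + noise_var * I)^{-1} H^H Y."""
    n_t = H_hat.shape[1]
    A = H_hat.conj().T @ H_hat + noise_var * np.eye(n_t)
    return np.linalg.solve(A, H_hat.conj().T @ Y)
```

Both detectors depend on `H_hat`, and neither models the PA, so any amplitude compression or phase rotation applied to the codeword entries before transmission passes straight into the estimate.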

III. DEEP LEARNING-BASED SIGNAL DETECTION

A. SISO COMMUNICATION SYSTEM
The proposed DNN structure involves a cascaded RNN and FNN in the SISO system. The first one aims to equalize the multipath channel and thus mitigate ISI, while the second one aims to calibrate the nonlinearly distorted signals and output the detected symbols, as in Fig. 4.
Such a cascaded DNN is first trained, based on the back-propagation algorithm [27], and then serves as the joint equalizer and detector in the test stage. In the following, we elaborate on these two sub-networks.

1) RNN
RNN is powerful in dealing with time series with memory [27], and hence can be utilized to model and combat the ISI caused by the multipath channel, as in Fig. 4. In our RNN, the input vector at time index $t$ is defined as follows:
$$\mathbf{y}_t = [y_{t-J+1}, \cdots, y_{t-1}, y_t]^T.$$
In the supervised training stage, the label $\mathbf{s}_t$ of the input vector $\mathbf{y}_t$ is denoted as
$$\mathbf{s}_t = \mathbf{1}^{(k)},$$
where the label $\mathbf{s}_t$ is generated from a one-hot mapping of the modulated symbol $x_t$, i.e. $\mathbf{s}_t = g(x_t)$, where $g$ is the one-hot mapping function. Here, $\mathbf{1}^{(k)}$ is an $M$-dimensional one-hot vector, whose $k$-th element is one and whose other elements are zero [28]. The first layer of the DNN maps the complex input vector into its real representation, which is easier to analyze [29].
Specifically, each complex signal is structured into a two-dimensional real-valued vector, i.e. $\mathbf{r}_t = [\mathrm{Re}(y_t), \mathrm{Im}(y_t)]$, whereby $\mathrm{Re}(y_t)$ and $\mathrm{Im}(y_t)$ respectively denote the in-phase (I) and quadrature-phase (Q) components. Then, the representative input matrix, i.e. $\mathbf{R}_t = [\mathbf{r}_{t-J+1}; \cdots; \mathbf{r}_{t-1}; \mathbf{r}_t] \in \mathbb{R}^{J \times 2}$, is fed into the multi-input single-output RNN architecture in Fig. 4, whereby the weight matrices $\mathbf{U}$, $\mathbf{W}$ and $\mathbf{V}$ represent the input-to-hidden connections, hidden-to-hidden recurrent connections and hidden-to-output connections, respectively. The hidden state of the RNN, i.e. $\mathbf{d}_t$, retains the important information extracted from the previous signal sequences [27] and propagates it to the next time step. Thus, the information flow is characterized by:
$$\mathbf{d}_t = f_1(\mathbf{d}_{t-1}, \mathbf{r}_t),$$
where $\mathbf{d}_t$ and $\mathbf{r}_t$ denote the hidden state and the new input data at time $t$, respectively, and $f_1$ propagates the hidden state from time $t - 1$ to time $t$. Finally, the output of the RNN is computed via
$$\mathbf{o}_t = \tanh(\mathbf{V}\mathbf{d}_t + \mathbf{c}),$$
where $\tanh$ is the activation function and $\mathbf{c}$ is the bias vector. The output $\mathbf{o}_t$ is then fed into the following FNN to further decouple the nonlinear distortion.
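The construction of the real-valued input matrix and the recurrent processing can be sketched in a few lines. The code below is only an illustration: it assumes an elementary tanh recurrence for the hidden-state update (the paper later notes that an LSTM cell is used in practice), and the function names and toy dimensions are hypothetical.

```python
import numpy as np


def build_input(y, t, J):
    """Stack r_{t-J+1}, ..., r_t with r = [Re(y), Im(y)] into a (J, 2) matrix.

    Requires t >= J - 1 so that the window lies inside the sequence.
    """
    window = np.asarray(y)[t - J + 1 : t + 1]
    return np.stack([window.real, window.imag], axis=1)


def rnn_forward(R, U, W, V, c):
    """Many-to-one pass: update the hidden state per row of R, emit one output.

    The hidden update d = tanh(U r + W d) is an assumed elementary recurrence.
    """
    d = np.zeros(W.shape[0])
    for r in R:                     # oldest sample first, newest last
        d = np.tanh(U @ r + W @ d)  # input-to-hidden and hidden-to-hidden
    return np.tanh(V @ d + c)       # hidden-to-output with bias c


rng = np.random.default_rng(0)
y_seq = rng.standard_normal(20) + 1j * rng.standard_normal(20)
R_t = build_input(y_seq, t=10, J=3)
U, W = rng.standard_normal((8, 2)), 0.1 * rng.standard_normal((8, 8))
V, c = rng.standard_normal((4, 8)), np.zeros(4)
o_t = rnn_forward(R_t, U, W, V, c)
```

Because the window spans $J$ received samples, the final hidden state has seen every sample that can carry energy from the symbol of interest.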

2) FNN
Owing to its powerful nonlinear mapping ability [27], a deep FNN, i.e. a multilayer perceptron (MLP), is then used to model the complex mapping between the RNN output $\mathbf{o}_t$ and the target signal $x_t$. The FNN is made up of cascaded fully-connected layers, whereby the ReLU activation function, i.e. $\max(0, \mathbf{o}_t)$, is adopted in each hidden layer. The output of the last layer of the FNN is:
$$\mathbf{p}_t = f_{FNN}(\mathbf{o}_t; S) = f_2^{(l-1)}\left(f_2^{(l-2)}\left(\cdots f_2^{(1)}(\mathbf{o}_t)\right)\right),$$
where $l$, $S$ and $\mathbf{p}_t$ denote the number of layers, the set of all parameters and the output of the FNN [20]. $f_2^{(i)}$ ($i = 1, \cdots, l - 1$) and $f_{FNN}$ are the mapping function of the $i$-th layer and the mapping function of the whole $l$-layer FNN, respectively.
Given the multi-classification problem, i.e. $|\mathcal{X}| = M > 2$, we adopt the softmax function as the activation function in the last output layer. In this manner, the output of the last layer, $\mathbf{p}_t = [p_t^0, \cdots, p_t^{M-1}]^T$, gives the probability vector over the $M$ possible messages, i.e. its elements sum to 1, $\sum_{m=0}^{M-1} p_t^m = 1$, with $p_t^m \in (0, 1)$. Specifically, the probability of the $m$-th symbol is generated by:
$$p_t^m = \frac{\exp(z_t^m)}{\sum_{j=0}^{M-1} \exp(z_t^j)},$$
where $\mathbf{z}_t = [z_t^0, \cdots, z_t^{M-1}]^T$ is the input vector of the softmax activation function in the last layer at time $t$.
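A compact sketch of the FNN stage described above, with ReLU hidden layers and a softmax output over $M$ classes, is given below. The layer sizes and helper names are hypothetical, and the max-subtraction inside `softmax` is a standard numerical safeguard added here rather than something stated in the text.

```python
import numpy as np


def softmax(z):
    """Softmax over a 1-D score vector, shifted by its maximum for stability."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def fnn_forward(o, layers):
    """Apply fully-connected layers given as (weight, bias) pairs.

    ReLU is used on every hidden layer and softmax on the last one.
    """
    a = o
    for W, b in layers[:-1]:
        a = np.maximum(0.0, W @ a + b)  # ReLU hidden layer
    W, b = layers[-1]
    return softmax(W @ a + b)           # probability vector p_t


rng = np.random.default_rng(0)
layers = [
    (rng.standard_normal((8, 4)), np.zeros(8)),    # hidden layer (toy size)
    (rng.standard_normal((16, 8)), np.zeros(16)),  # output layer, M = 16
]
p = fnn_forward(rng.standard_normal(4), layers)
```

The output length equals $M$, so each entry of `p` can be read as the estimated probability of one constellation point.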

3) TRAINING AND TESTING PROCESS
In the training process, the cross-entropy function is used as the objective loss function, which measures the difference between the label $\mathbf{s}_t$ and the output $\mathbf{p}_t$, i.e.
$$L_1 = -\frac{1}{N} \sum_{t=0}^{N-1} \sum_{m=0}^{M-1} s_t^m \log p_t^m,$$
where $N$ is the total number of training samples. The set of network parameters is updated by using the minibatch stochastic gradient descent (SGD) algorithm with the adaptive moment estimation (ADAM) optimizer [27], so that training drives the network towards a minimum of the loss function $L_1$. Finally, in the testing process, the information symbol at time $t$ is recovered by inverting the one-hot mapping: the detected symbol $\hat{x}_t$ is obtained from the one-hot vector $\hat{\mathbf{p}}_t$ as $\hat{x}_t = g^{-1}(\hat{\mathbf{p}}_t)$, where the index of the only nonzero element of $\hat{\mathbf{p}}_t$ equals the index of the largest element of $\mathbf{p}_t$.
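The loss and the de-mapping step can be illustrated as follows. This is a minimal sketch: the averaging over the $N$ samples and the small `eps` guard inside the logarithm are assumptions made here, and the function names are hypothetical.

```python
import numpy as np


def one_hot(k, M):
    """M-dimensional vector with a one at index k and zeros elsewhere."""
    s = np.zeros(M)
    s[k] = 1.0
    return s


def cross_entropy(S, P, eps=1e-12):
    """Average of -sum_m s^m log p^m over N samples; S and P have shape (N, M)."""
    return float(-np.mean(np.sum(S * np.log(P + eps), axis=1)))


def detect(p, constellation):
    """Inverse one-hot mapping: return the symbol at the most probable index."""
    return constellation[int(np.argmax(p))]


constellation = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j])  # toy 4-point set
labels = np.stack([one_hot(2, 4), one_hot(0, 4)])
uniform = np.full((2, 4), 0.25)
loss_uniform = cross_entropy(labels, uniform)  # equals log(4) for a uniform guess
```

A network that outputs the uniform distribution scores $\log M$, and the loss falls towards zero as the probability mass concentrates on the labelled index.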
In practice, the LSTM cell is adopted in the implementation of the RNN, owing to its stability in sequential processing [27]. Note that the finite memory length of the RNN should be no less than the CIR length $J$ in the different propagation scenarios. Besides, the expressive capability of the DNN becomes stronger as the number of layers and the number of neurons per layer increase, yet the computational complexity also increases [27]. Thus, we need to configure the structure of the DNN according to the propagation scenario. Note also that the output dimension of the last layer of the DNN should equal the number of modulated symbol categories ($|\mathcal{X}| = M$). Based on such practical considerations, the detailed layout of our DNN in the SISO system is provided in Table I.

B. MIMO COMMUNICATION SYSTEM
Since ISI is not considered in the MIMO system, our DNN-based framework only involves an FNN. It jointly decodes the STBC without acquiring prior CSI and mitigates the nonlinear distortion of the signals; the complexity of the resulting FNN-based framework, shown in Fig. 5, is therefore lower than that of the DNN-based framework with joint RNN and FNN. We describe the associated procedures in the following.
We preprocess the received signal matrix $\mathbf{Y}_{t_1}$ from the two receive antennas into the input vector $\mathbf{y}_{t_1}$ at time index $t_1$, according to the characteristics of STBC. As in Fig. 5, the input vector $\mathbf{y}_{t_1}$ is defined as follows:
$$\mathbf{y}_{t_1} = f_{pre}(\mathbf{Y}_{t_1}) = [y_{1,2t_1}, y_{2,2t_1}, y_{1,2t_1+1}, y_{2,2t_1+1}]^T,$$
where $f_{pre}$ is the matrix transformation function. Meanwhile, we restructure the STBC signal matrix $\mathbf{X}_{t_1}$ into the label $\mathbf{x}^l_{t_1}$ of the input vector $\mathbf{y}_{t_1}$. In agreement with our method in the SISO system, the complex input vector $\mathbf{y}_{t_1}$ and its label $\mathbf{x}^l_{t_1}$ are first mapped into their real representations, which are easier to train with our DL framework [29]. Specifically, the real-valued input vector is given by:
$$f_{c2r}(\mathbf{y}_{t_1}) = [\mathrm{Re}(y_{1,2t_1}), \mathrm{Im}(y_{1,2t_1}), \mathrm{Re}(y_{2,2t_1}), \mathrm{Im}(y_{2,2t_1}), \mathrm{Re}(y_{1,2t_1+1}), \mathrm{Im}(y_{1,2t_1+1}), \mathrm{Re}(y_{2,2t_1+1}), \mathrm{Im}(y_{2,2t_1+1})]^T, \quad (18)$$
where $f_{c2r}$ is the mapping function from complex values to real values. Similarly, we obtain the real-valued label $\mathbf{x}^L_{t_1} = f_{c2r}(\mathbf{x}^l_{t_1})$.
Similar to the FNN in the SISO system, our FNN sub-network is composed of cascaded fully-connected layers, and each hidden layer adopts the ReLU activation function. Since the entries of the label $\mathbf{x}^L_{t_1}$ lie between $-1$ and $1$, the tanh activation function is adopted in the last output layer. The output of our framework is given by:
$$\hat{\mathbf{x}}^L_{t_1} = f_{map}\left(f_{c2r}(\mathbf{y}_{t_1}); S\right),$$
where $S$ and $\hat{\mathbf{x}}^L_{t_1}$ denote the set of all parameters and the output of the neural network at time index $t_1$, and $f_{map}$ is the mapping function from the real-valued input vector to the output.
In the training process, we use the mean-squared error (MSE) as the objective loss function, which is defined as
$$L_2 = \frac{1}{N} \sum_{t_1=0}^{N-1} \left\| \hat{\mathbf{x}}^L_{t_1} - \mathbf{x}^L_{t_1} \right\|_2^2,$$
where $N$ is the number of training samples. The ADAM optimizer is selected as the optimization algorithm. The aim of training is to make the output of the neural network $\hat{\mathbf{x}}^L_{t_1}$ fit its label $\mathbf{x}^L_{t_1}$. In the test stage, we obtain the detected signal $[\hat{x}_{2t_1}, \hat{x}_{2t_1+1}]$ from the output of the neural network $\hat{\mathbf{x}}^L_{t_1}$ as
$$[\hat{x}_{2t_1}, \hat{x}_{2t_1+1}] = f_{pre}^{-1}\left(f_{c2r}^{-1}(\hat{\mathbf{x}}^L_{t_1})\right),$$
where $f_{c2r}^{-1}$ and $f_{pre}^{-1}$ are the inverse mapping functions of $f_{c2r}$ and $f_{pre}$, respectively. The detailed layout of our framework in the MIMO system is shown in Table II.
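The preprocessing maps and the MSE objective of this subsection can be sketched as follows. The element ordering follows the real-valued input vector listed in the text (antenna index varies fastest, then time slot); the convention that rows of the received matrix are receive antennas and columns are time slots, the averaging in `mse`, and all function names other than the labels $f_{pre}$ and $f_{c2r}$ are assumptions of this illustration.

```python
import numpy as np


def f_pre(Y):
    """Flatten a 2x2 received matrix to [y_{1,s}, y_{2,s}, y_{1,s+1}, y_{2,s+1}].

    Rows are assumed to be receive antennas and columns time slots, so a
    column-major flatten gives the ordering used in the text.
    """
    return np.asarray(Y).flatten(order="F")


def f_c2r(v):
    """Interleave real and imaginary parts: [Re v0, Im v0, Re v1, Im v1, ...]."""
    v = np.asarray(v)
    return np.stack([v.real, v.imag], axis=1).ravel()


def f_c2r_inv(u):
    """Inverse of f_c2r: rebuild the complex vector from interleaved reals."""
    u = np.asarray(u, dtype=float)
    return u[0::2] + 1j * u[1::2]


def mse(pred, label):
    """Mean over samples of the squared Euclidean distance; inputs are (N, D)."""
    return float(np.mean(np.sum((pred - label) ** 2, axis=1)))


Y = np.array([[1 + 2j, 5 + 6j], [3 + 4j, 7 + 8j]])
x_in = f_c2r(f_pre(Y))  # eight real inputs for the FNN
```

With a $2 \times 2$ block, the FNN therefore sees eight real inputs per codeword, and the inverse map turns pairs of real outputs back into complex symbol estimates.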

IV. EXPERIMENTAL SIMULATIONS AND PERFORMANCE EVALUATIONS
In this section, we execute computer simulations and provide numerical results to validate our DL-based detection approach in SISO and MIMO communication systems, respectively.

A. SISO COMMUNICATION SYSTEM
In the numerical simulations and analyses, we consider three types of channel model in the SISO communication system [11]: 1) the single-path LoS channel scenario, where the channel mean and covariance are set to $\bar{\mathbf{h}} = [1 + 1j]$ and $\boldsymbol{\Sigma} = \delta^2$ ($\delta = 0.01$); 2) the LoS multipath channel scenario ($J = 3$), where the channel mean and covariance matrix are set to $\bar{\mathbf{h}} = [1 + 1j, 0.1 + 0.1j, 0.006 + 0.006j]$ and $\boldsymbol{\Sigma} = \mathrm{diag}\{\delta^2, \delta^2, \delta^2\}$, $\delta = 0.01$; 3) the NLoS multipath channel scenario ($J = 10$), where the strongest path appears in a later position rather than the first one. In the following, we focus on the quasi-static channel. The high-order modulation 16QAM is used.

1) EXPERIMENT I: LoS CHANNEL WITH NONLINEAR PA
The training and testing sets include 10000 and 160000 samples, respectively. In the training stage, we use a fixed SNR value, i.e. $E_b/N_0 = 7$ dB. The obtained model can then be directly applied to other unknown SNRs in realistic scenarios [28]. The number of epochs and the batch size are set to 30 and 100, respectively. The learning rates are 0.001 and 0.0001 for the first 70% and the last 30% of the epochs, respectively.
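The staged learning-rate setting can be expressed as a small helper. The sketch below is illustrative only: the function name and the percentage-based interface are hypothetical, and integer arithmetic is used so that stage boundaries do not depend on floating-point rounding. The same helper also covers multi-stage schedules such as the three-stage one used for the NLoS experiment.

```python
def learning_rate(epoch, total_epochs, stages):
    """Return the learning rate for a zero-based epoch index.

    stages is a list of (percent_of_epochs, rate) pairs whose percentages
    sum to 100; each stage covers that share of the training epochs.
    """
    cumulative = 0
    for percent, rate in stages:
        cumulative += percent
        if epoch * 100 < cumulative * total_epochs:
            return rate
    return stages[-1][1]  # fall back to the last rate


# Experiment I setting: 30 epochs, 0.001 for the first 70%, 0.0001 afterwards.
schedule = [learning_rate(e, 30, [(70, 1e-3), (30, 1e-4)]) for e in range(30)]
```

With 30 epochs, this assigns 0.001 to epochs 0 to 20 and 0.0001 to epochs 21 to 29.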
Our method, which jointly addresses the nonlinear distortion and linear ISI at the receiver-end, is validated in Fig. 6. It is seen from Fig. 6a and Fig. 6b that the bit error rate (BER) performance of our proposed DNN-based scheme is superior to that of the Bayesian-based scheme at different output power back-off (OBO) values in LoS channel conditions. Fig. 6c shows that the BER performance of the FNN-based detector corresponds closely to that of our proposed DNN-based scheme in the simple LoS multipath channel scenario. Under severe nonlinear distortion (e.g. OBO = 9 dB), however, our method is slightly superior to the FNN-based method.
Further, we also compare our scheme with the Bayesian-based method [11] at the receiver-end (with both accurate and inaccurate PA models, e.g. a 5% deviation in the PA parameters), the FNN-based method at the receiver-end, and the DPD techniques at the transmitter-end, e.g. the memory-polynomial-based [8] and the RVFTDNN-based [9] methods. As in Fig. 7, our method outperforms both the DPD methods [8], [9] and the Bayesian-based method [11]. More importantly, it avoids the highly complicated transmitter design, in contrast to the RVFTDNN-based DPD method [9]. Besides, our method runs about 15 times faster than the RVFTDNN-based method [9]: the average processing time in the test stage of our method is about 6.52 s, whilst the RVFTDNN-based DPD method requires 96.75 s when $N = 160000$.

2) EXPERIMENT II: NLoS CHANNEL WITH NONLINEAR PA
In this case, the training and testing sets each include 160000 samples. The training samples involve two SNR cases, i.e. 25% under $E_b/N_0 = 20$ dB and 75% under $E_b/N_0 = 22$ dB. The number of epochs and the batch size are set to 200 and 250, respectively. The learning rates are 0.01, 0.001 and 0.0001 for the first 50%, the middle 30% and the last 20% of the epochs, respectively. As shown in Fig. 8, the performance of the various detectors degrades in the more complex NLoS channels. Again, our scheme attains better performance than the Bayesian-based method under both accurate and inaccurate PA models, by mitigating the nonlinear distortion and linear ISI more completely via the global expressive ability of our DL framework. Besides, we observe that the DNN-based method performs much better than the FNN-based method in the complex NLoS channel. Lacking long-term memory, the FNN struggles to deal with time series in complex environments. Fig. 8 also shows that our method outperforms the RNN-based scheme, especially in the low-SNR region. Meanwhile, owing to the high complexity of the RNN, our method runs three times faster than the RNN-based scheme.
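The mixed-SNR training set can be assembled by assigning an $E_b/N_0$ value to each sample and converting it to a noise variance. The sketch below assumes unit average symbol energy at the receiver and $E_s = \log_2(M) E_b$; the paper does not state its normalization, and the helper names are hypothetical.

```python
import numpy as np


def noise_variance(ebn0_db, bits_per_symbol=4, es=1.0):
    """Complex noise variance N0 for a given Eb/N0 in dB.

    Assumes symbol energy es and Es = bits_per_symbol * Eb (16QAM: 4 bits).
    """
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return es / (bits_per_symbol * ebn0)


def snr_schedule(n_samples, mix):
    """Per-sample Eb/N0 values; mix is a list of (percent, ebn0_db) pairs."""
    counts = [n_samples * percent // 100 for percent, _ in mix]
    counts[-1] += n_samples - sum(counts)  # absorb any rounding remainder
    return np.concatenate([np.full(c, db) for c, (_, db) in zip(counts, mix)])


# Experiment II setting: 25% of the samples at 20 dB and 75% at 22 dB.
ebn0_per_sample = snr_schedule(160000, [(25, 20.0), (75, 22.0)])
sigma2_per_sample = noise_variance(ebn0_per_sample)
```

Under these assumptions the 160000 training samples split into 40000 at 20 dB and 120000 at 22 dB, and in practice the samples would be shuffled before being grouped into minibatches.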

B. MIMO COMMUNICATION SYSTEM
In this subsection, we execute computer simulations and provide numerical results to evaluate the proposed DNN-based detector in both the 2 × 2 MIMO system and the 2 × 4 MIMO system. The high-order modulation used is 16QAM, and the spatial modulation used is STBC. The channel matrices $\mathbf{H}$ are generated with i.i.d. $\mathcal{CN}(0, 1)$ elements in the numerical simulation experiments. The channel model is also quasi-static in this scenario.
The training and testing sets include 10000 and 160000 samples, respectively. In the training stage, the fixed SNR value, the number of epochs, the batch size and the learning rates are the same as in the SISO LoS channel scenario.
As in Fig. 9, we evaluate the detection performance of our method in the SISO, 2 × 2 MIMO and 2 × 4 MIMO scenarios at different OBO values. The BER performance of the 2 × 2 MIMO scenario is much better than that of the SISO scenario, and the 2 × 4 MIMO scenario has the best BER performance of the three. These simulation results validate our theoretical analysis that the STBC-based MIMO system provides a diversity gain that improves the detection performance of the whole communication system. Further, we also compare our scheme with the RVFTDNN-based DPD scheme [9] and with a scheme that only uses MMSE detection [15]. As in Fig. 10, the detection performance of the MMSE-only scheme is the worst, which shows that the PA nonlinearity has serious effects upon the detection performance in the MIMO system. Fig. 10 also shows that our DNN-based scheme achieves better performance than the RVFTDNN-based DPD scheme at different OBO values. Besides, the RVFTDNN-based DPD scheme needs perfect knowledge of the MIMO channel matrix to decode the STBC, in contrast to our scheme, which requires no prior MIMO channel matrix information. In Fig. 10, we can also see that the detection performance of our scheme with a linear PA corresponds closely to that of the LMMSE detector. Hence our scheme can be very effective for decoding STBC without prior CSI. It is seen in Fig. 11 that the estimation error of the channel state at the receiver-end degrades the detection performance: the greater the relative error ratio, the worse the BER performance. For instance, with a relative estimation error of 10% at $E_b/N_0 = 9$ dB, the BER performance may deteriorate by as much as one order of magnitude. Thus we conclude that it is important for the RVFTDNN-based DPD scheme to estimate the channel state accurately.

V. CONCLUSION
A DL-based detection approach is designed which, as one joint nonlinear equalizer and signal detector, is capable not only of mitigating both the PA nonlinearity and the multipath ISI at the receiver-end in SISO mm-Wave communication systems, but also of jointly mitigating the PA nonlinearity and decoding STBC in MIMO communication systems. In the SISO system, our DL detector integrates two cascaded sub-networks: the former RNN with memory aims to combat the linear ISI, whilst the latter FNN aims to calibrate the nonlinear distortion. In this manner, both the sophisticated pre-distorter at the transmitter and the explicit CSI estimator at the receiver are excluded, leading to a simplified implementation. In the MIMO system, our DL detector, consisting of an FNN, jointly mitigates the nonlinear PA and decodes the linear STBC. With the concept of global and joint processing, the detection performance is effectively improved by fully utilizing the powerful nonlinearity modelling capability of neural networks. Our new DL-based detector effectively enhances signal detection in the presence of coupled nonlinear and linear distortion, and thus shows great promise for emerging mm-Wave communications.