Deep-Learning-Aided Joint Channel Estimation and Data Detection for Spatial Modulation

Deep neural network (DNN)-aided spatial modulation (SM) is conceived. In particular, a pair of DNN structures are designed for replacing the conventional model-based channel estimators and detectors. As our first prototype, the conventional DNN estimates the channel relying on the pilot symbols and then carries out data detection in a data-driven manner. By contrast, our new DeepSM scheme is proposed for operation in more realistic time-varying channels, which updates the channel state information (CSI) at each time-slot (TS) before detecting the data. Hence, our novel DeepSM scheme is capable of performing well even in highly dynamic communication environments. Finally, our simulations show that the proposed DeepSM outperforms the conventional model-based channel estimation and data detection for transmission over time-varying channels.


For coping with the dramatically increasing data traffic, novel transmission schemes are in urgent demand for improving the throughput while minimizing the deployment complexity. Among numerous cutting-edge techniques, spatial modulation (SM) [1]-[7] constitutes a promising next-generation scheme for massive machine type communications (mMTC) [8]-[10], for TeraHertz communications [11] and for intelligent reconfigurable surfaces (IRS) [12]-[14]. The roots of SM can be traced back to 2001, when Chau and Yu [15] proposed the space-shift keying (SSK) concept, which conveys information only by the transmit antenna (TA) indices. By contrast, SM [1], [2] activates a single TA, which transmits a single amplitude-phase modulation (APM) symbol. This unique TA activation scheme allows the transmitter to implicitly convey additional information bits 'hidden' in the active TA index patterns, hence achieving energy-efficient communications. A subsequent development of SM activating a small fraction of the TAs is known as generalised SM (GSM) [16], which simultaneously activates a group of TAs for conveying multiple APM symbols.
Since only a low number of radio frequency (RF)-chains are active during each symbol instant, the optimal maximum-likelihood SM detector is capable of jointly detecting both the active TA indices and the classic APM signal at a low complexity. Since the RF-chains are the priciest and highest-power transceiver components, SM has compelling benefits over the family of conventional multiple-input multiple-output (MIMO) systems, such as the vertical Bell Laboratories layered space-time (V-BLAST) coded or space-time block coded (STBC) systems [17].
At the time of writing, machine learning is attracting increasing attention in wireless communications [18]- [21]. In particular, by employing neural networks (NNs), near-optimal low-complexity channel estimation and/or data detection can be achieved for different physical-layer communication schemes, relying either on a data-driven approach [22], [23], where no mathematical model is required, or on a model-driven basis [24]- [26], which exploits the benefits of both well-established mathematical physical-layer communication models and of NNs.
Machine learning techniques have also demonstrated good performance in SM systems [27]- [32]. First, adaptive SM can be designed with the aid of different deep learning techniques. For instance, Yang et al. [27] integrated machine learning with adaptive SM-MIMO systems, while Tato et al. [28] adopted multilayer feedforward NNs (MFNNs) for accurately evaluating the mutual information (MI) in support of adapting the SM modes. Furthermore, deep learning techniques may also contribute to the TA selection of SM. In particular, Zhang et al. [29] formulated a data-driven NN architecture for TA selection by exploiting the channel state information (CSI), while a deep neural network (DNN)-aided TA selection scheme was designed for secure SM in [30]. Recently, Albinsaid et al. [31] investigated the block DNN-aided detection of GSM, whereas Shamasundar and Chockalingam [32] proposed a modularized DNN architecture for GSM detection.
However, to the best of our knowledge, there is no published research on DNN-aided joint channel estimation and data detection conceived for SM. To fill this knowledge gap, a pair of DNN architectures are designed for joint channel estimation and data detection in SM systems. The main contributions of this paper are contrasted to the literature in Table 1 and are summarized as follows:
• Firstly, we apply fully-connected multi-layer DNNs to SM and conceive a conventional DNN architecture for the joint channel estimation and data detection of SM, operating in a data-driven manner, which replaces the conventional model-based channel estimator and detector by DNNs. Our simulations demonstrate that the conventional DNN-aided channel estimation and data detection of SM is capable of approaching the bit error ratio (BER) performance of the conventional model-based SM channel estimation and data detection approach for transmission over time-invariant channels, despite its reduced detection complexity.
• We then propose a novel DNN architecture, referred to as DeepSM, which detects the transmitted bits and, at the same time, updates the CSI in each time slot (TS) for transmission over time-varying fading channels in a data-driven manner. We demonstrate that our proposed DeepSM is capable of improving the BER performance of the conventional model-based channel estimation and data detection, despite its reduced detection complexity in terms of runtime, when communicating over more dynamic channels.
• Finally, we investigate the effects of the quantization errors imposed by finite-resolution Analogue-to-Digital Convertors (ADCs) on the BER performance attained. When our DeepSM is employed, a better BER performance can be observed in the presence of 4- or 6-bit ADCs than that of the conventional model-based approaches.
The rest of this paper is structured as follows. Section II reviews the SM system model, while Section III discusses the conventional model-based SM channel estimation and data detection. Next, Section IV details a pair of DNN architectures designed for the joint channel estimation and data detection of SM. Then, the performance of SM evaluated for transmission over different wireless channels is quantified in Section V. Finally, our conclusions and future research ideas are provided in Section VI.

II. SYSTEM MODEL
In this section, we detail the transceiver model of an SM system employing $N_t$ TAs and $N_r$ receive antennas (RAs), as shown in Fig. 1, transmitting over wireless channels, along with the main assumptions of the paper.
As shown in Fig. 1, the pilots are first transmitted during the $N_t$ pilot TSs, followed by $T$ TSs for transmitting a single frame of data symbols. Within the $t$-th data transmission TS, the $u$-bit information $\mathbf{u}(t)$ is transmitted by only one of the $N_t$ TAs. More specifically, $\mathbf{u}(t)$ is first divided into two sub-groups expressed as $\mathbf{u}(t) = \mathbf{u}_1(t)|\mathbf{u}_2(t)$. The first $u_1 = \log_2 N_t$ bits $\mathbf{u}_1(t)$ are SSK modulated [15], [33], resulting in an SSK symbol $s_1(t) = V_{\text{SSK}}[\mathbf{u}_1(t)] \in \mathcal{M}_1$, where $V_{\text{SSK}}[\cdot]$ refers to the SSK mapping and $\mathcal{M}_1$ is the SSK symbol set, which can be expressed as $\mathcal{M}_1 = \{1, \cdots, N_t\}$. The remaining $u_2 = \log_2 M_2$ bits $\mathbf{u}_2(t)$ are mapped to an APM symbol $s_2(t) = V_{\text{APM}}[\mathbf{u}_2(t)] \in \mathcal{M}_2$, where $V_{\text{APM}}[\cdot]$ refers to the APM mapping and $\mathcal{M}_2$ is defined as the symbol set of $M_2$-ary APM. Then the APM symbol $s_2(t)$ is transmitted by the single active TA, whose index $s_1(t)$ is given by the SSK symbol. The transmitted signal can be expressed as
$$\mathbf{x}(t) = s_2(t)\,\mathbf{e}_{s_1(t)}(t),$$
where $\mathbf{e}_{s_1(t)}(t)$ is an $(N_t \times 1)$ antenna selection vector comprising only a single non-zero element '1' in the $s_1(t)$-th position, which indicates the active TA's index.
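The bit-to-symbol mapping described above can be illustrated by a minimal sketch. It assumes a natural binary mapping for both the TA index and the APM symbol, which the text does not specify, and it uses 0-based antenna indices rather than the 1-based set used in the paper. The function name `sm_map` and the `BPSK` table are hypothetical.

```python
import numpy as np

def sm_map(bits, n_t, apm_symbols):
    """Map one SM bit block to (TA index, APM symbol, transmit vector)."""
    u1 = int(np.log2(n_t))               # bits carried by the TA index
    u2 = int(np.log2(len(apm_symbols)))  # bits carried by the APM symbol
    assert len(bits) == u1 + u2
    # Natural binary mapping (an assumption): bit string -> integer index
    ta_index = int("".join(map(str, bits[:u1])), 2) if u1 else 0
    apm_index = int("".join(map(str, bits[u1:])), 2) if u2 else 0
    s2 = apm_symbols[apm_index]
    x = np.zeros(n_t, dtype=complex)
    x[ta_index] = s2                     # single non-zero entry: x = s2 * e_{s1}
    return ta_index, s2, x

BPSK = np.array([1.0 + 0j, -1.0 + 0j])   # hypothetical symbol table
ta, s2, x = sm_map([1, 0, 1], n_t=4, apm_symbols=BPSK)
```

Here the first two bits select the third antenna (index 2 when counting from zero) and the last bit selects the BPSK symbol, so the transmit vector has exactly one non-zero entry.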
Given the channel vector $\mathbf{h}_{s_1(t)}(t)$ between the $s_1(t)$-th TA and the $N_r$ RAs in the $t$-th TS, the signal received by the BS can be expressed as
$$\mathbf{y}(t) = \mathbf{h}_{s_1(t)}(t)\,s_2(t) + \mathbf{n}(t),$$
where $\mathbf{n}(t)$ is the additive white Gaussian noise (AWGN) obeying a zero-mean complex Gaussian distribution with a covariance matrix of $\sigma^2 \mathbf{I}_{N_r}$. Hence, $1/(2\sigma^2)$ is the signal-to-noise ratio (SNR) per SM symbol. For time-varying channels, $\mathbf{h}_{s_1}(t)$ is expressed as [34]
$$\mathbf{h}_{s_1}(t) = \alpha\,\mathbf{h}_{s_1}(t-1) + \sqrt{1-\alpha^2}\,\mathbf{v}_{s_1}(t),$$
where $\mathbf{h}_{s_1}(0)$ obeys the independently and identically distributed (i.i.d.) complex Gaussian distribution with a mean of 0 and a covariance of $\mathbf{I}_{N_r}$, and $\mathbf{v}_{s_1}(t)$ is the unpredictable difference from the channel vector, which also obeys the i.i.d. complex Gaussian distribution with a mean of 0 and a covariance of $\mathbf{I}_{N_r}$, while $\alpha$ is the autoregressive (AR) coefficient, whose definition in terms of the time-domain (TD) correlation $\alpha^{\tau/2}$ at the discrete time-lag $\tau$ is discussed in detail in [34].
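To make the time-varying channel model concrete, the following sketch generates a first-order AR fading process. The update rule with the factor sqrt(1 - alpha^2), which keeps the per-element variance at unity, is an assumption based on the standard Gauss-Markov model; the helper names `crandn` and `ar1_channel` are hypothetical.

```python
import numpy as np

def crandn(rng, *shape):
    """i.i.d. complex Gaussian samples with zero mean and unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def ar1_channel(n_r, n_t, n_slots, alpha, rng):
    """Return H[t] for t = 0..n_slots, each of shape (n_r, n_t)."""
    H = np.empty((n_slots + 1, n_r, n_t), dtype=complex)
    H[0] = crandn(rng, n_r, n_t)
    for t in range(1, n_slots + 1):
        # Assumed Gauss-Markov update; sqrt(1 - alpha^2) keeps unit variance
        H[t] = alpha * H[t - 1] + np.sqrt(1 - alpha**2) * crandn(rng, n_r, n_t)
    return H

rng = np.random.default_rng(0)
H = ar1_channel(n_r=4, n_t=4, n_slots=20, alpha=0.98, rng=rng)
```

Setting `alpha=1.0` reproduces a time-invariant channel, while smaller values make the channel decorrelate faster from one TS to the next.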

III. CONVENTIONAL CHANNEL ESTIMATION AND DATA DETECTION FOR SM
In this section, we briefly review the conventional pilot-aided channel estimation and maximum-likelihood data detection regime of SM in Sections III-A and III-B, respectively.

A. CONVENTIONAL CHANNEL ESTIMATION
As shown in Fig. 1, we assume that prior to the data transmission, $N_t$ TSs are employed to transmit the pilot symbols over the $N_t$ TAs, where the proportion $\eta$ of the $N_t$ pilot symbols to the $T$-length data frame is defined as
$$\eta = \frac{N_t}{T}.$$
More specifically, in the $p$-th ($1 \le p \le N_t$) pilot TS, the $p$-th TA transmits its signal $x(p) = 1$ over the wireless channel. The pilot signal $\bar{\mathbf{y}}(p)$ received at the BS in the $p$-th ($1 \le p \le N_t$) TS can be expressed as
$$\bar{\mathbf{y}}(p) = \mathbf{h}_p\,x(p) + \mathbf{n}(p),$$
where $\mathbf{h}_p$ is an $N_r \times 1$ vector representing the CSI between the $p$-th TA and the $N_r$ RAs, and $\mathbf{n}(p)$ is the AWGN.

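Then the received signal is considered as the estimated CSI $\hat{\mathbf{h}}_p$, which is expressed as
$$\hat{\mathbf{h}}_p = \bar{\mathbf{y}}(p).$$
After the $N_t$ pilot TSs, we can obtain the $N_r \times N_t$ estimated channel matrix, which can be expressed as
$$\hat{\mathbf{H}}(0) = [\hat{\mathbf{h}}_1(0), \hat{\mathbf{h}}_2(0), \cdots, \hat{\mathbf{h}}_{N_t}(0)],$$
where $\hat{\mathbf{h}}_p(0) = \hat{\mathbf{h}}_p$, $p = 1, 2, \cdots, N_t$, which will be employed for the maximum-likelihood data detection in the next subsection.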
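The pilot-aided estimator can be sketched in a few lines. Since each TA transmits the pilot x(p) = 1 in its own TS, the received pilot matrix directly serves as the channel estimate. The function name `pilot_estimate` and the convention that `sigma2` is the total variance of each complex noise sample are assumptions made for illustration.

```python
import numpy as np

def pilot_estimate(H, sigma2, rng):
    """Return H_hat(0) = Y(0), the noisy pilot observations of H."""
    n_r, n_t = H.shape
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal((n_r, n_t))
                                   + 1j * rng.standard_normal((n_r, n_t)))
    Y0 = H + noise        # column p is the pilot received while TA p is active
    return Y0             # with x(p) = 1, the estimate is the received pilot

rng = np.random.default_rng(0)
H_true = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
H_hat0 = pilot_estimate(H_true, sigma2=0.01, rng=rng)
```

The estimation error of this scheme equals the pilot noise itself, which is why the estimate becomes outdated as soon as the channel changes during the data frame.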

B. MAXIMUM-LIKELIHOOD DATA DETECTION
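Following the pilot transmission, data symbols are transmitted from the $t = 1$-st to the $t = T$-th data TS, which are detected with the aid of the estimated channel matrix $\hat{\mathbf{H}}(0)$ obtained from (6). Specifically, given the received signal $\mathbf{y}(t)$ and the estimated channel $\hat{\mathbf{H}}(0)$, the optimal maximum-likelihood detection finds the estimates $\hat{s}_1(t)|\hat{s}_2(t) \in \mathcal{M} = \mathcal{M}_1 \otimes \mathcal{M}_2$ by solving the following optimization problem:
$$\{\hat{s}_1(t), \hat{s}_2(t)\} = \arg\min_{\tilde{s}_1(t) \in \mathcal{M}_1,\ \tilde{s}_2(t) \in \mathcal{M}_2} \left\| \mathbf{y}(t) - \hat{\mathbf{h}}_{\tilde{s}_1(t)}(0)\,\tilde{s}_2(t) \right\|^2,$$
where $\tilde{s}_1(t)$ represents the legitimate active TA index at the $t$-th TS and $\tilde{s}_2(t)$ is the potential candidate of the APM symbol in the symbol set $\mathcal{M}_2$ at the $t$-th TS. We can see that the maximum-likelihood detector has to visit all $M = N_t \times M_2$ possible combinations of $\tilde{s}_1(t) \in \mathcal{M}_1$ and $\tilde{s}_2(t) \in \mathcal{M}_2$, yielding a complexity order of $O(N_t M_2)$.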
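The exhaustive search performed by the maximum-likelihood detector can be sketched as follows. The function name `ml_detect` is hypothetical and the antenna index is 0-based; the double loop visits every pair of TA index and APM symbol, which corresponds to the complexity order quoted above.

```python
import numpy as np

def ml_detect(y, H_hat, apm_symbols):
    """Exhaustive ML search over all (TA index, APM symbol) pairs."""
    best = (None, None, np.inf)
    for ta in range(H_hat.shape[1]):
        for s2 in apm_symbols:
            # Squared Euclidean distance between y and the noiseless hypothesis
            metric = np.linalg.norm(y - H_hat[:, ta] * s2) ** 2
            if metric < best[2]:
                best = (ta, s2, metric)
    return best[0], best[1]

H_demo = np.array([[1, 2], [3, 4]], dtype=complex)
y_demo = H_demo[:, 1] * (-1.0)            # noiseless: second TA sends -1
ta_hat, s2_hat = ml_detect(y_demo, H_demo, np.array([1.0 + 0j, -1.0 + 0j]))
```

In this noiseless toy case the detector recovers the second antenna and the symbol -1 exactly, since the corresponding metric is zero.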

IV. DATA-DRIVEN DNN-AIDED CHANNEL ESTIMATION AND DATA DETECTION
In this section, we first introduce the conventional DNN-aided channel estimation and data detection of SM in Section IV-A, followed by the proposed data-driven DeepSM architecture in Section IV-B.

A. CONVENTIONAL DNN ARCHITECTURE
The multi-layer fully-connected DNN architecture of Fig. 2 can be employed for replacing the conventional channel estimator and data detector discussed in Section III. More specifically, in time-invariant stationary scenarios, the channel $\hat{\mathbf{H}}(0)$ estimated from the pilot symbols can be expressed as
$$\hat{\mathbf{H}}(0) = \mathbf{Y}(0),$$
where $\mathbf{Y}(0) = [\bar{\mathbf{y}}(1), \bar{\mathbf{y}}(2), \cdots, \bar{\mathbf{y}}(N_t)]$ represents the pilot symbols received during the $N_t$ pilot TSs. Hence, as shown in Fig. 2, the received pilots $\mathbf{Y}(0)$ and the signal $\mathbf{y}(t)$ received during the $t$-th data TS constitute the inputs of the $L$-layer fully-connected DNN, where each layer is comprised of $Z = 64$ nodes, yielding the output of
$$\hat{\mathbf{u}}(t) = f_{\text{sigmoid}}\Big(\mathbf{W}_L\, f_{\text{Relu}}\big(\cdots f_{\text{Relu}}\big(\mathbf{W}_1 [\mathbf{Y}(0), \mathbf{y}(t)] + \mathbf{b}_1\big) \cdots\big) + \mathbf{b}_L\Big),$$
where $\mathbf{W}_l$ and $\mathbf{b}_l$, $l = 1, \cdots, L$, represent the weights and biases, respectively, of the $l$-th hidden layer of the DNN of Fig. 2. Here, the rectified linear unit (Relu) function of $f_{\text{Relu}}(a) = \max(0, a)$ is employed as the activation function of the hidden layers, and the sigmoid function of $f_{\text{sigmoid}}(a) = \frac{1}{1+e^{-a}}$ is used at the output layer to obtain the detected bits $\hat{\mathbf{u}}(t)$. The network weights, which comprise the parameters that are being optimized in the DNN, can be expressed as
$$\boldsymbol{\theta} = \{\mathbf{W}_1, \mathbf{b}_1, \cdots, \mathbf{W}_L, \mathbf{b}_L\}.$$
The training process optimizes the network weights $\boldsymbol{\theta}$ by minimizing the loss function. In this paper, the mean squared error (MSE) between the output bits $\hat{\mathbf{u}}(t)$ of the DNN and the transmitted bits $\mathbf{u}(t)$ in the $t$-th TS is adopted as the loss function, which can be expressed as
$$\phi_1(\boldsymbol{\theta}) = \frac{1}{B} \sum_{t=1}^{B} \left\| \hat{\mathbf{u}}(t) - \mathbf{u}(t) \right\|^2,$$
where $B$ is the size of a mini-batch. The network weights $\boldsymbol{\theta}$ are updated over mini-batches that are randomly drawn from the data samples, using the classic stochastic gradient descent (SGD) algorithm [35] expressed as
$$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \varepsilon \nabla \phi_1(\boldsymbol{\theta}),$$
where $\varepsilon$ is the learning rate of the SGD and $\nabla \phi_1(\boldsymbol{\theta})$ represents the gradient of $\phi_1(\boldsymbol{\theta})$. In this paper, $\varepsilon = 10^{-3}$ is selected for the performance characterization of both the DNN and of our proposed DeepSM architectures. Note that during the training process, the popular adaptive moment estimation (Adam) optimizer is employed for the off-line learning.
Our experiments show that $1.5 \times 10^5$ training samples are sufficient for the training set.
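The forward pass, the loss and the weight update described above can be sketched with plain NumPy. This is an illustration under stated assumptions rather than the authors' implementation: the random initialization, the separate output layer following the hidden layers, the averaging in `mse_loss` and all function names are hypothetical, and the gradients passed to `sgd_step` would in practice come from back-propagation, which is omitted here.

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def to_real(*arrays):
    """Stack real and imaginary parts of complex arrays into one real vector."""
    flat = np.concatenate([np.ravel(a) for a in arrays])
    return np.concatenate([flat.real, flat.imag])

def init_params(sizes, rng):
    """Random (W, b) pairs for consecutive layer sizes (hypothetical init)."""
    return [(0.1 * rng.standard_normal((m, n)), np.zeros(m))
            for n, m in zip(sizes[:-1], sizes[1:])]

def forward(params, z):
    """ReLU on every hidden layer, sigmoid on the output layer."""
    for W, b in params[:-1]:
        z = relu(W @ z + b)
    W, b = params[-1]
    return sigmoid(W @ z + b)

def mse_loss(u_hat, u):
    return np.mean((u_hat - u) ** 2)

def sgd_step(params, grads, lr=1e-3):
    """theta <- theta - lr * gradient, applied to every (W, b) pair."""
    return [(W - lr * gW, b - lr * gb) for (W, b), (gW, gb) in zip(params, grads)]

rng = np.random.default_rng(0)
n_t, n_r, n_bits, Z, L = 4, 4, 3, 64, 3
Y0 = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
y = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)
z = to_real(Y0, y)                       # 2*N_t*N_r + 2*N_r = 40 real inputs
params = init_params([z.size] + [Z] * L + [n_bits], rng)
u_hat = forward(params, z)               # soft bit estimates in (0, 1)
loss = mse_loss(u_hat, np.array([1.0, 0.0, 1.0]))
```

Thresholding `u_hat` at 0.5 would yield the hard bit decisions, and the 40-element input matches the real-valued representation of the received pilots and the data signal for a 4 x 4 system.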

B. DEEPSM
Both the maximum-likelihood detection and the conventional DNN architecture assume the CSI $\hat{\mathbf{H}}(0)$ to remain near-constant during the data transmission period. In order to dispense with this idealized simplifying assumption, we now propose a novel DNN architecture, referred to as DeepSM, which updates the estimated CSI $\hat{\mathbf{h}}_{\hat{s}_1(t)}(t)$ of the active TA at the $t$-th TS for all $t \in [1, T]$ and operates in a data-driven fashion, as shown in Fig. 3. Similar to the DNN of Fig. 2, the proposed DeepSM technique replaces the conventional channel estimation and data detection operations relying on channel models by the DNN of Fig. 3. A joint data-driven channel estimation and data detection technique can be designed with the aid of multi-layer fully-connected DNNs, which updates the CSI at each TS.
More specifically, both the CSI matrix $\hat{\mathbf{H}}(t-1)$, which is obtained from the DeepSM in the $(t-1)$-st TS, and the data $\mathbf{y}(t)$ received at the $t$-th TS are input to the DeepSM, resulting in a $[2N_tN_r + 2N_r]$-node input layer. As shown in Fig. 3, the hidden layers of our DeepSM architecture are comprised of two subgroups. The upper subgroup comprising $L_1$ hidden layers is employed for updating $\hat{\mathbf{h}}_{\hat{s}_1(t)}(t)$ at the $t$-th TS, with $\hat{\mathbf{h}}_m(t) = \hat{\mathbf{h}}_m(t-1)$ retained for $m \neq \hat{s}_1(t)$, $m \in \mathcal{M}_1$, while the lower subgroup comprising $L_2$ hidden layers learns to detect the transmitted bits $\hat{\mathbf{u}}(t)$ at the $t$-th TS.
For DeepSM, the number of nodes employed in the hidden layers of both subgroups is fixed to $Z_1 = Z_2 = 64$, which is sufficiently high for attaining a superior BER performance for SM systems, while maintaining a moderate detection complexity. Additionally, $L_1 = L_2 = 3$ hidden layers are selected after trial experiments.
The operations of DeepSM are summarized in Algorithm 1. More specifically, the DeepSM relies on the following steps. To start with, in the $t = 1$-st TS, the real and imaginary parts of each element in $\hat{\mathbf{H}}(0)$ are input to the DNN via the $2N_tN_r$-node input layer of the upper subgroup in DeepSM. At the same time, the real and imaginary parts of the elements in the signal vector $\mathbf{y}(1)$ received during the $t = 1$-st TS are input to the lower subgroup of the DeepSM via its $2N_r$-node input layer. Following this, given the inputs of $\hat{\mathbf{H}}(0)$ and $\mathbf{y}(1)$, the lower subgroup of the proposed DeepSM is activated to obtain the output $\hat{\mathbf{u}}(1)$. In the meantime, the upper subgroup updates the CSI $\hat{\mathbf{h}}_{\hat{s}_1(1)}(1)$ at the $t = 1$-st TS. Note that $\hat{s}_1(1) = V_{\text{SSK}}[\hat{\mathbf{u}}(1)]$ is determined by the output $\hat{\mathbf{u}}(1)$ of the lower subgroup, which indicates the active TA index in the $t = 1$-st TS. The outputs $\hat{\mathbf{h}}_{\hat{s}_1(t)}(t)$ and $\hat{\mathbf{u}}(t)$ of the proposed DNN architecture are functions of the inputs $\hat{\mathbf{H}}(t-1)$ and $\mathbf{y}(t)$, where $\mathbf{W}^{(1)}_{l_1}$ and $\mathbf{b}^{(1)}_{l_1}$, $l_1 = 1, \cdots, L_1$, represent the weights and biases of the $l_1$-th hidden layer of the upper subgroup, respectively, whereas $\mathbf{W}^{(2)}_{l_2}$ and $\mathbf{b}^{(2)}_{l_2}$, $l_2 = 1, \cdots, L_2$, are the weights and biases of the $l_2$-th hidden layer of the lower subgroup, respectively.
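The network weights of DeepSM may be expressed as
$$\boldsymbol{\theta} = \left\{\mathbf{W}^{(1)}_{l_1}, \mathbf{b}^{(1)}_{l_1}, \mathbf{W}^{(2)}_{l_2}, \mathbf{b}^{(2)}_{l_2}\right\}, \quad l_1 = 1, \cdots, L_1, \quad l_2 = 1, \cdots, L_2.$$
After the output $\hat{\mathbf{h}}_{\hat{s}_1(1)}(1)$ of the $t = 1$-st TS is obtained, $\hat{\mathbf{h}}_{\hat{s}_1(1)}(1)$ will be employed for updating the $\hat{s}_1(1)$-th column of $\hat{\mathbf{H}}(0)$, obtaining the updated CSI in the $t = 1$-st TS, which can be expressed as
$$\hat{\mathbf{H}}(1) = \left[\hat{\mathbf{h}}_1(0), \cdots, \hat{\mathbf{h}}_{\hat{s}_1(1)}(1), \cdots, \hat{\mathbf{h}}_{N_t}(0)\right].$$
Later in the $t$-th ($t > 1$) TS, the same operations as those in the $t = 1$-st TS are performed, obtaining $\hat{\mathbf{h}}_{\hat{s}_1(t)}(t)$ and $\hat{\mathbf{u}}(t)$.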
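The per-TS recursion of DeepSM can be summarized by the following sketch, which focuses on the control flow only. The callables `detect_bits` and `update_column` are hypothetical stand-ins for the trained lower and upper subgroups, respectively, and the natural binary mapping from the hard-decided bits to a 0-based TA index is an assumption.

```python
import numpy as np

def bits_to_ta_index(u_hat, n_t):
    """Hard-decide the first log2(N_t) soft bits and read them as a TA index."""
    u1 = int(np.log2(n_t))
    hard = (np.asarray(u_hat[:u1]) > 0.5).astype(int)
    return int("".join(map(str, hard)), 2) if u1 else 0

def deepsm_frame(H0_hat, Y, detect_bits, update_column):
    """Run the per-TS recursion over a frame; Y has one received vector per row."""
    H_hat = H0_hat.copy()
    n_t = H_hat.shape[1]
    decisions = []
    for y in Y:
        u_hat = detect_bits(H_hat, y)               # lower subgroup (stand-in)
        ta = bits_to_ta_index(u_hat, n_t)
        H_hat[:, ta] = update_column(H_hat, y, ta)  # upper subgroup (stand-in)
        decisions.append(u_hat)
    return np.array(decisions), H_hat

# Toy stand-ins: always detect bits (1, 0, 1) and reset the active column to zero
H0 = np.ones((2, 4), dtype=complex)
Y = np.zeros((3, 2), dtype=complex)
dec, H_end = deepsm_frame(H0, Y,
                          lambda H, y: np.array([0.9, 0.1, 0.8]),
                          lambda H, y, ta: np.zeros(H.shape[0]))
```

Only the column of the detected active TA is overwritten in each TS, while the remaining columns are carried over unchanged from the previous TS, which mirrors the update rule of the estimated channel matrix described above.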

V. PERFORMANCE RESULTS
In this section, we characterize the performance of the conventional DNN and of the proposed DeepSM architectures for SM channel estimation and data detection in terms of the associated loss function, BER performance and complexity, in Sections V-A to V-C, respectively.

A. LOSS FUNCTION
First, Fig. 4 shows the loss function vs. the number of epochs of the proposed DeepSM architecture employing $L_1 = L_2 = 1$, 2, 3 or 4 hidden layers, with each layer adopting $Z_1 = Z_2 = 64$ nodes, where a single epoch refers to a complete training session relying on a training data set. The SM system considered in Fig. 4 employs $N_t = 4$ TAs and $N_r = 4$ RAs using binary phase shift keying (BPSK) modulation and communicates over time-varying Rayleigh fading channels associated with $\alpha = 0.98$. It can be observed from Fig. 4 that a single or a pair of hidden layers, i.e. $L_1 = L_2 = 1$ or 2, are insufficient for the proposed DNN architecture, whereas the employment of $L_1 = L_2 = 4$ layers may result in over-fitting, hence requiring slightly more epochs to achieve convergence of the loss function. Hence, $L_1 = L_2 = 3$ hidden layers are adopted for the remaining simulations.
Furthermore, we investigate the influence of the modulation order on the loss function $\phi_2$ of the proposed DNN architecture for the channel estimation and data detection of the SM system for transmission over time-varying Rayleigh fading channels at SNR = 25 dB, where $N_t = 4$, $N_r = 4$, $\alpha = 0.99$, 0.98 or 0.97, and BPSK or quadrature phase shift keying (QPSK) are employed. Here we adopt $L_1 = L_2 = 3$ hidden layers with each layer comprising $Z_1 = Z_2 = 64$ nodes. As shown in Fig. 5, a lower number of epochs is required for the convergence of the loss function, when BPSK is employed.
Additionally, observe in Fig. 5 that there is a rapid initial convergence leading to a plateau, followed by a second rapid convergence phase. The reason is that the loss function of (18) is comprised of the sum of two MSEs, which correspond to the outputs $\hat{\mathbf{H}}(t)$ and $\hat{\mathbf{u}}(t)$, respectively.

B. BER PERFORMANCE
In this section, the BER performance of the proposed DeepSM channel estimation and data detection is investigated in Fig. 6 for transmission over time-invariant Rayleigh fading channels, in Figs. 7 and 8 for transmission over time-varying Rayleigh fading channels, and in Fig. 9 for transmission over quantized time-varying Rayleigh fading channels, where low-resolution ADCs are employed.
To elaborate, Fig. 6 characterizes the BER performance of the conventional DNNs for the joint channel estimation and data detection scheme of our SM system for transmission over time-invariant channels employing $N_r = 2$ or 4 and $N_t = 2$ or 4, which shows a similar performance to that of the conventional approach discussed in Section III. Hence, a direct application of the DNNs is capable of approaching the performance of the model-based SM channel estimation and data detection over time-invariant channels.
However, since the conventional DNN architecture fails to update the CSI at each TS in realistic time-varying channels, it fails to facilitate reliable detection, hence resulting in a poor BER performance. By contrast, the proposed DeepSM outperforms the conventional DNN architecture in the high-SNR region, which is an explicit benefit of exploiting the CSI at each TS. This is shown in Figs. 7 and 8, where the uncoded SM system employs $N_t = 4$ TAs and $N_r = 4$ RAs for communicating over time-varying Rayleigh fading channels under imperfect CSI in conjunction with $\eta = 0.2$ and 0.1. Furthermore, by comparing Figs. 7 and 8, we can see the influence of $\eta$. Naturally, having a higher percentage of pilot symbols in a frame results in an improved BER performance. Hence, for an uncoded system, where the target BER lies between $10^{-2}$ and $10^{-3}$, the DeepSM advocated shows the best performance among the three approaches.
Furthermore, the residual BER can be mitigated by forward error correction (FEC) codes, as demonstrated in Figs. 7 and 8, where a half-rate low-density parity-check (LDPC) code is employed. We can see from both figures that for the LDPC-coded SM systems, DeepSM still achieves the best BER performance at high SNRs.
Fig. 9 further characterizes the BER performance of both the conventional model-based and of the DeepSM-aided joint channel estimation and data detection in the context of SM systems for transmission over fading channels, where 4- or 6-bit ADCs are employed by the receiver for quantizing the received signal. The SM system of Fig. 9 adopts the same parameters as those of Fig. 7. We can see that our DeepSM exhibits a lower error floor than the conventional channel estimation and data detection approach in the case of low-cost 4-bit or 6-bit ADCs. Additionally, Fig. 9 shows that the BER performance of the DeepSM scheme employing a 4-bit ADC approaches that of an ideal ADC.
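The effect of a finite-resolution ADC on the received signal can be illustrated by a simple quantizer sketch. The text does not specify the quantizer used, so the uniform mid-rise characteristic, the separate quantization of the real and imaginary parts and the clipping level `clip` are all assumptions made for illustration; the function name `quantize` is hypothetical.

```python
import numpy as np

def quantize(y, n_bits, clip=3.0):
    """Uniform mid-rise quantizer applied separately to real and imaginary parts."""
    levels = 2 ** n_bits
    step = 2 * clip / levels             # width of one quantization bin
    def q(v):
        # Bin index, saturated to the 2^n_bits available levels
        idx = np.clip(np.floor(v / step), -levels // 2, levels // 2 - 1)
        return (idx + 0.5) * step        # reconstruct at the bin centre
    return q(np.real(y)) + 1j * q(np.imag(y))

y_in = np.array([0.1 + 0.2j, -1.3 + 2.9j, 5.0 - 5.0j])
y_4bit = quantize(y_in, n_bits=4)
y_6bit = quantize(y_in, n_bits=6)
```

Samples within the clipping range are reproduced with an error of at most half a bin per component, whereas samples exceeding it saturate at the outermost level, which is one source of the error floor observed with low-resolution ADCs.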

C. COMPLEXITY
Now we characterize the complexity when employing the different channel estimation and data detection approaches conceived in this paper, as shown in Table 2.
In particular, the complexity orders of the conventional maximum-likelihood approach, of the conventional DNN-aided approach and of DeepSM are $O(N_t M_2)$, $O(Z^2)$ and $O(Z_1^2 + Z_2^2)$, respectively. Specifically, the complexity order of the maximum-likelihood approach is determined by the number of TAs and by the modulation order, whereas the complexity order of the DNN-aided approaches is determined by the size of the hidden layers in the DNNs. This indicates that when the DNN architecture is fixed, the modulation order will not substantially influence the detection complexity of either the conventional DNN or DeepSM.
Additionally, we can see from Table 2 that the MATLAB runtime [36], [37] of the maximum-likelihood approach increases significantly when either $N_t$ or $M_2$ increases. By contrast, when the DNN-aided approaches are employed, the MATLAB runtime is greatly reduced and remains similar as $N_t$ or $M_2$ increases, since the DNN structure allows parallel operations in MATLAB, significantly reducing the communication latency. Additionally, by comparing the two DNN-aided approaches, we can see that DeepSM requires a longer runtime due to the larger total size of its hidden layers, but achieves a significantly better BER performance over time-varying channels, as demonstrated in Section V-B.

VI. CONCLUSION AND FUTURE RESEARCH
We have first used the conventional DNN for joint channel estimation and signal detection in SM systems communicating over time-invariant channels. Then, a DeepSM structure relying on a pair of DNN subgroups has been proposed for channel estimation and data detection in SM systems communicating over time-varying fading channels. Our studies and simulation results have shown that the conventional DNN based detector is capable of achieving a similar BER performance to that of the model-based channel estimator and detector over idealized time-invariant channels. However, the proposed DeepSM outperforms both the model-based approach and the conventional DNN structure, even when communicating over time-varying and non-linear channels, since the proposed DeepSM estimates the CSI at each TS.
Our future work will investigate data-driven DNN-aided detection in multiuser communications in the scenarios of non-coherent as well as coherent mMTC and Internet of Things. Furthermore, reducing the gap to the capacity by iterative detection and decoding approaches is worth investigating.
LIE-LIANG YANG (Fellow, IEEE) received the B.Eng. degree in communications engineering from Shanghai TieDao University, Shanghai, China in 1988, and the M.Eng. and Ph.D. degrees in communications and electronics from Northern Jiaotong University, Beijing, China, in 1991 and 1997, respectively. From June 1997 to December 1997, he was a Visiting Scientist with the Institute of Radio Engineering and Electronics, Academy of Sciences, Czech Republic. Since December 1997, he has been with the University of Southampton, U.K., where he is currently a Professor of Wireless Communications with the School of Electronics and Computer Science. His research interests include wireless communications, wireless networks, and signal processing for wireless communications, as well as molecular communications and nanonetworks. He has published more than 390 research articles in journals and conference proceedings, authored or coauthored three books and also published several book chapters. He is a Fellow of the IET. He served as an Associate Editor to the IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY and Journal of Communications and Networks (JCN). He is also an Associate Editor to IEEE ACCESS and a Subject Editor to the Electronics Letters. He was a Distinguished Lecturer of the IEEE VTS.
LAJOS HANZO (Fellow, IEEE) received the D.Sc. degree from the Technical University of Budapest, in 2009, and the Honorary Doctorate degree from the University of Edinburgh, in 2015. He is the Head of the Next Generation Wireless Group, University of Southampton. He is currently a Foreign Member of the Hungarian Academy of Sciences and the former Editor-in-Chief of the IEEE Press. He has served as Governor of both IEEE ComSoc and of VTS. He has published more than 1900 contributions at IEEE Xplore and 19 Wiley-IEEE Press books. He has helped the fast-track career of 119 Ph.D. degree students. More than 40 of them are Professors at various stages of their careers in academia and many of them are leading scientists in the wireless industry. He is a Fellow of REng, IET, and EURASIP. VOLUME 8, 2020