Reduced Complexity Learning-Assisted Joint Channel Estimation and Detection of Compressed Sensing-Aided Multi-Dimensional Index Modulation

Index Modulation (IM) is a versatile transmission scheme capable of striking a flexible performance, throughput, diversity and complexity trade-off. The concept of Multi-dimensional IM (MIM) has been developed to combine the benefits of IM in multiple dimensions, such as space and frequency. Furthermore, Compressed Sensing (CS) can be beneficially combined with IM in order to increase its throughput. However, accurate Channel State Information (CSI) is essential for reliable MIM, and acquiring it requires a high pilot overhead. Hence, Joint Channel Estimation and Detection (JCED) is harnessed for reducing the pilot overhead and improving the detection performance, albeit at a modestly increased estimation complexity. We then circumvent this complexity increase by proposing Deep Learning (DL) based JCED for CS-aided MIM (CS-MIM), which significantly reduces the complexity while also reducing the pilot overhead needed for Channel Estimation (CE). Furthermore, we conceive training-aided Soft-Decision (SD) detection. We first analyze the complexity of the conventional joint CE and SD detection, followed by proposing our reduced-complexity learning-aided joint CE and SD detection. Our simulation results confirm that a Deep Neural Network (DNN) is capable of near-capacity JCED of CS-MIM at a reduced pilot overhead and complexity, both for Hard-Decision (HD) and SD detection.


I. INTRODUCTION
Index Modulation (IM) constitutes a cost- and energy-efficient technique in the face of escalating throughput requirements [1], [2], [3]. The concept of IM has evolved from the idea of space-shift keying proposed by Chau et al. [4] in 2001, which maps the information to the indices of the activated Transmit Antennas (TAs). Then, Spatial Modulation (SM) was proposed, which transmits classic amplitude-phase modulated symbols over the activated TA [5], [6]. To eliminate the need for Channel State Information (CSI), differential SM was proposed by Bian et al. [7]. As a further advance, the concept of IM has been devised by harnessing the philosophy of SM in several single dimensions, and was then further developed to activate several of these dimensions simultaneously [3], [8].
To elaborate further, SM was first applied to Orthogonal Frequency Division Multiplexing (OFDM) transmission to avoid inter-channel interference [9]. Subcarrier-IM combined with OFDM (SIM-OFDM) exploits the IM concept in the Frequency Domain (FD) [10], where extra information can be delivered by the indices of the activated subcarriers. Then Wen et al. [11] investigated IM-aided OFDM (OFDM-IM), which splits the whole available OFDM spectrum into groups, and Iqbal et al. [12] extended OFDM-IM to Multiple-Input Multiple-Output (MIMO) schemes. Although Tsonev et al. [13] and Basar et al. [14] investigated enhanced OFDM-IM schemes for increasing the spectral efficiency, the presence of inactive subcarriers resulted in a throughput reduction compared to classical OFDM. Hence, Zhang et al. proposed a novel Compressed Sensing (CS) [15] aided SIM-OFDM scheme [16] for exploiting the sparsity of subcarriers to improve the performance, while also reducing the detection complexity [17].
As a further evolved arrangement, Space-Time Shift Keying (STSK) is a multi-functional MIMO technique in the IM family that utilizes both the Time Domain (TD) and the Spatial Domain (SpD) to strike a flexible diversity vs. multiplexing trade-off [18]. The information bits in STSK are used for selecting one or several dispersion matrices from a set of Q dispersion matrices, which spread the signal over T time slots and M TA elements in the SpD. By the careful design of the dispersion matrices, an improved Bit Error Ratio (BER), throughput and complexity trade-off can be struck [19].
Multi-dimensional Index Modulation (MIM) was conceived by Shamasundar et al. [20]. This scheme enhances the degrees of freedom in IM designs by exploiting its advantages across multiple domains without requiring additional hardware resources, such as extra Radio Frequency (RF) chains or increased transmission power. As a further development, Zhang et al. [16] proposed the concept of Compressed Sensing-aided Sparse Index Modulation-Orthogonal Frequency Division Multiplexing (CS-SIM-OFDM). Briefly, this scheme employs CS [15] to capitalize on the inherent sparsity of symbols in the FD, thereby improving the system's throughput [17]. Furthermore, Lu et al. [21] proposed a method that combines CS techniques with STSK and OFDM-IM, which seeks to garner the collective benefits of both STSK and OFDM-IM. Further refinements incorporating SM were discussed in their subsequent treatise [2]. Additionally, Hemadeh et al. introduced a multi-functional layered SM paradigm in [3], which aims to maximize the flexibility in dimension combinations, optimizing the trade-offs among performance, hardware costs and power consumption.
Since MIM conveys information in several dimensions, Maximum Likelihood (ML) detection is theoretically capable of jointly detecting the multi-dimensional signal, albeit at a complexity that escalates upon increasing the degrees of freedom or dimensions [22]. In [2], CS-aided MIM (CS-MIM) was proposed, where multiple detection stages were harnessed for recovering data from the CS, STSK and OFDM-IM domains, again at an extremely high complexity.
On the other hand, coherent detection requires knowledge of the CSI, which is estimated by transmitting pilots to the receiver [23]. Although SM exhibits energy savings by employing only a single RF chain, a pilot-based channel estimator can only obtain the CSI of the active TAs, hence it requires more time to estimate the whole MIMO channel. In [24], Faiz et al. proposed a recursive least-squares-based adaptive channel estimator for SM under the assumption that the MIMO channel experiences block fading. Then, Wu et al. [25] investigated a novel Channel Estimation (CE) scheme exploiting the channel correlation, which significantly reduced the pilot overhead. Acar et al. [26] employed a systematic pilot insertion method to estimate the SM-MIMO channel. However, the pilot overhead reduces the payload at a given bandwidth efficiency, and the CE complexity increases with the number of antennas [27].
Historically speaking, Abuthinien et al. [28] proposed a semi-blind ML-based Joint Channel Estimation and data Detection (JCED) scheme for MIMO systems at a minimum pilot overhead. Furthermore, Chen et al. [29] designed an iterative JCED for STSK systems, which imposes a reduced complexity while maintaining a high throughput and attaining near-optimal BER performance. Similarly, Sugiura et al. [30] applied JCED to an SM scheme, while Acar et al. [31] proposed a similar iterative JCED for coded SM-OFDM.
The complexity of ML sequence detection in Gaussian channels may be deemed practical for one-dimensional IM, but the complexity of MIM detection increases significantly.
It is therefore of interest to explore sub-optimal receivers that can approach the performance of the ML detector at a reduced complexity, such as the Expectation-Maximization algorithm investigated by Cozzo et al. [32], while employing a reduced number of pilot symbols.
Deep Learning (DL) has been attracting increasing attention in wireless communications [38], [39]. For instance, a Deep Neural Network (DNN) was used for detecting MIMO signals in [40], [41]. In [42], a DNN was employed for channel estimation and data detection in OFDM systems. Recently, Qing et al. [43] proposed an effective CE and detection scheme relying on sophisticated neural network models for achieving a similar or possibly even better detection performance than the conventional Minimum Mean Square Error (MMSE) arrangement. In [35], Satyanarayana et al. proposed a DNN-aided semi-blind detector for drastically reducing the pilot overhead needed for CE of a multi-set STSK scheme, which was also extended to Soft-Decisions (SD) in [36]. Additionally, both CE and detection have been performed using a neural network in [37], where Xiang et al. proposed a DNN-based iterative JCED for SM systems.
However, the JCED techniques presented in the literature have been designed for single-dimensional IM systems. Furthermore, there is a paucity of research exploiting DL-based SD detection. Hence, in order to fill this knowledge gap, we design a DNN-based JCED for CS-MIM systems, which can harness both Hard-Decision (HD) and SD detection.
Table 1 contrasts the novelty of this paper to the literature. Against the above background, the detailed contributions of this paper are summarized as follows:
• We propose a reduced-complexity JCED for HD CS-MIM, employing a data-driven DNN. The proposed learning-aided JCED method is capable of attaining near-ML performance at a low pilot overhead and complexity.
• We then further extend this DNN-aided JCED CS-MIM scheme for producing soft information, where we combine our system with channel coding in order to attain an improved BER performance.
• … pilot overhead, while also approaching the performance of conventional JCED, despite its reduced complexity and pilot overhead.
The rest of the paper is organized as follows. In Section II, the system model of CS-MIM is presented. In Section III, we design JCED techniques for our CS-MIM system relying on our proposed learning-aided detector, along with its complexity analysis. Finally, in Sections IV and V, we analyze the results and conclude, respectively.

II. SYSTEM MODEL
In this section, we introduce the transceiver model of the CS-MIM system employing $N_t$ TAs and $N_r$ Receive Antennas (RAs). Fig. 1 shows the block diagram of the CS-MIM system considered, where an OFDM symbol has $N_c$ subcarriers, which are equally divided into $G$ groups. Each group has $N_f = N_c/G$ subcarriers in the FD¹, while $N_v$ subcarriers of each group are used by the CS-MIM system in the Virtual Domain (VD)². The FD signal is attained by compressing the VD signal using CS as detailed in [2], where $N_f$ is set lower than $N_v$ to increase the throughput. The CS-aided OFDM symbols are then transmitted from the activated TAs decided by the antenna selector of Fig. 1. After transmission over the wireless channel, the receiver estimates the channel and detects the signal. In the following, we present the details of the processing stages at the transmitter and the receiver.

¹ FD is the OFDM symbol domain after CS processing, as shown in Fig. 1.
² VD is the actual domain, where subcarrier index modulation is applied before the CS process, as shown in Fig. 1. This concept was first introduced in [16] to illustrate the application of CS techniques in IM systems for improving the spectral efficiency.

A. TRANSMITTER
As shown in Fig. 1, $b$ bits are split into $G$ groups, where the $b_g$ bits of each group ($g = 1, 2, \dots, G$) are split into three parts by the block splitter: $b_{g,1}$ bits for SM, $b_{g,2}$ bits for frequency index modulation and $b_{g,3}$ bits for STSK to form the space-time symbols. In the following, we detail the different blocks of the CS-MIM transmitter in Fig. 1.

1) SUBCARRIER INDEX SELECTION
The bit sequence $b_{g,2}$ is fed to the subcarrier index selector to activate the subcarriers of each group, as shown in Fig. 1. Only $K$ subcarriers are activated out of the $N_v$ available subcarriers, and the other subcarriers remain unused. To illustrate the subcarrier selection procedure, we consider the example of $K = 2$ active subcarriers out of the $N_v = 4$ available subcarriers in each group. Since $\binom{4}{2} = 6$ combinations exist, $\lfloor\log_2 6\rfloor = 2$ bits can be conveyed, which results in 4 legitimate subcarrier index combinations in total. Table 2 shows an example of the subcarrier selection, where $K_1$, $K_2$ represent the active subcarriers and 0 represents the inactive subcarriers. Explicitly, when the input bit sequence is $b_{g,2} = [00]$, the first and second subcarriers are activated, as shown in Table 2. Then the selected active subcarrier combination is populated by $K$ space-time symbols, where the STSK codewords are generated by the STSK scheme for each group.
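The bit-to-pattern mapping can be sketched as a lookup table. This is a minimal illustration, assuming that the legitimate patterns are the first $2^{\lfloor\log_2\binom{N_v}{K}\rfloor}$ combinations in lexicographic order; only the $[00]$ entry is confirmed by Table 2, and the helper names `build_index_lut` and `select_subcarriers` are hypothetical.

```python
from itertools import combinations
from math import comb, floor, log2


def build_index_lut(n_v: int, k: int) -> list:
    """Hypothetical LUT keeping 2**floor(log2(C(n_v, k))) active-subcarrier patterns.

    Patterns are 0-based subcarrier indices in lexicographic order (an assumption;
    only the [00] -> (first, second) entry is taken from Table 2).
    """
    n_bits = floor(log2(comb(n_v, k)))
    return list(combinations(range(n_v), k))[: 2 ** n_bits]


def select_subcarriers(bits: list, lut: list) -> tuple:
    """Map the bit sequence b_{g,2} (MSB first) to one pattern of the LUT."""
    index = int("".join(str(b) for b in bits), 2)
    return lut[index]


lut = build_index_lut(n_v=4, k=2)          # 4 legitimate patterns out of C(4, 2) = 6
active = select_subcarriers([0, 0], lut)   # (0, 1): first and second subcarriers
```

The remaining two of the six combinations are simply never used, which is the usual price of mapping an integer number of bits onto $\binom{N_v}{K}$ patterns.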

2) STSK ENCODING
The bit sequence $b_{g,3}$ of length $K\log_2(QL)$ is fed into the STSK encoder of Fig. 1 to output $K$ STSK codewords $\{\mathbf{X}_1, \dots, \mathbf{X}_i, \dots, \mathbf{X}_K\}$, where the dispersion matrix spreads the information both over $M$ TAs and over $T$ time slots in each subcarrier. Each space-time codeword $\mathbf{X}_i \in \mathbb{C}^{M\times T}$ is generated by spreading a conventional $L$-ary constellation symbol with a specific dispersion matrix selected from $Q$ available dispersion matrices. The STSK encoder is characterized by the parameters $(M, N, T, Q, L)$, where $M$, $N$ and $T$ represent the number of TAs, RAs and time slots, while $Q$ and $L$ are the number of dispersion matrices and the constellation size, respectively. Then, the space-time symbol $\mathbf{S}$ is generated by mapping the $K$ generated STSK codewords to the $K$ active subcarriers decided by the subcarrier index selector, while the other subcarriers remain inactive and are set to zero. Considering $b_{g,2} = [00]$ in the example of Table 2, we assume STSK$(M, N, T, Q, L) = (2, 2, 2, 2, 2)$, which has $QL = 4$ codewords $\{\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}_3, \mathbf{X}_4\}$. Then, given $b_{g,3} = [0\,0\,0\,1]$, the space-time symbol becomes $\mathbf{S} = [\mathbf{X}_1, \mathbf{X}_2, \mathbf{0}, \mathbf{0}]$, where we assume that $\mathbf{X}_1$ and $\mathbf{X}_2$ are the STSK codewords generated from the bit sequence $b_{g,3}$.
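The STSK mapping can be sketched as $\mathbf{X} = s_l \mathbf{A}_q$, i.e. an $L$-PSK symbol spreading a selected dispersion matrix. This is a minimal sketch under stated assumptions: the dispersion matrices are random and scaled so that $\mathrm{tr}(\mathbf{A}_q^H\mathbf{A}_q) = T$, the first $\log_2 Q$ bits of each chunk select $q$ and the remaining $\log_2 L$ bits select the symbol; the function names are hypothetical and practical STSK designs optimize the $\mathbf{A}_q$ [19].

```python
import numpy as np


def make_dispersion_matrices(q: int, m: int, t: int, seed: int = 0) -> np.ndarray:
    """Random complex dispersion matrices A_q, each scaled so that trace(A_q^H A_q) = T."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((q, m, t)) + 1j * rng.standard_normal((q, m, t))
    energy = np.sum(np.abs(a) ** 2, axis=(1, 2))      # ||A_q||_F^2 for every q
    return a * np.sqrt(t / energy)[:, None, None]


def stsk_encode(bits: list, a: np.ndarray, l_ary: int) -> list:
    """Map log2(Q) + log2(L) bits per codeword to X = s_l * A_q with an L-PSK symbol s_l."""
    n_q, n_l = int(np.log2(a.shape[0])), int(np.log2(l_ary))
    psk = np.exp(2j * np.pi * np.arange(l_ary) / l_ary)
    step = n_q + n_l
    codewords = []
    for start in range(0, len(bits), step):
        chunk = bits[start:start + step]
        q_idx = int("".join(map(str, chunk[:n_q])), 2) if n_q else 0   # dispersion index
        l_idx = int("".join(map(str, chunk[n_q:])), 2) if n_l else 0   # symbol index
        codewords.append(psk[l_idx] * a[q_idx])
    return codewords


A = make_dispersion_matrices(q=2, m=2, t=2)
X = stsk_encode([0, 0, 0, 1], A, l_ary=2)   # K = 2 codewords from b_{g,3} = [0 0 0 1]
```

Under this assumed bit ordering, both codewords reuse $\mathbf{A}_1$, and the second one carries the BPSK symbol $-1$.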

3) SPACE-TIME SYMBOL FORMATION AND APPLICATION OF COMPRESSED SENSING
The $G$ groups of space-time symbols $\mathbf{S}$ are assembled by the block creator of Fig. 1 to form a long space-time frame, which is processed by the space-time mapper to output symbols for transmission over multiple TAs and time slots, as shown in Fig. 1. Equivalently, the space-time symbols $\mathbf{S}$ of each subcarrier group are mapped to $M$ TAs during $T$ time slots, which yields $MT$ symbol sequences $\{\mathbf{s}_{1,1}, \dots, \mathbf{s}_{M,T}\}$. These symbol sequences are then compressed by a CS measurement matrix $\mathbf{A} \in \mathbb{C}^{N_f\times N_v}$, and each resultant FD vector is mapped to the OFDM subcarriers, which can be written as $\mathbf{s}^{FD}_{m,t} = \mathbf{A}\mathbf{s}_{m,t}$. Similar to conventional OFDM, the FD symbol of each time slot is transformed into TD symbols to be transmitted by the corresponding TAs, and then a Cyclic Prefix (CP) is added.
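The compression and OFDM steps for one TA and one time slot can be sketched as follows. This is a minimal sketch, assuming a complex Gaussian measurement matrix $\mathbf{A}$ (normalized by a hypothetical factor $1/\sqrt{2N_f}$) and a localized mapping in which group $g$ occupies subcarriers $gN_f, \dots, (g+1)N_f - 1$; the scheme of [2] may use a different matrix design and subcarrier mapping.

```python
import numpy as np


def cs_ofdm_symbol(s_groups: np.ndarray, a: np.ndarray, n_cp: int):
    """One TA / one time slot: compress each group's VD vector, then IFFT and add a CP.

    s_groups: (G, N_v) VD vectors; a: (N_f, N_v) measurement matrix A.
    Assumes a localized mapping: group g occupies subcarriers g*N_f ... (g+1)*N_f - 1.
    """
    s_fd = s_groups @ a.T                 # row g equals (A @ s_g)^T, i.e. s^FD = A s
    x_fd = s_fd.reshape(-1)               # N_c = G * N_f FD samples
    x_td = np.fft.ifft(x_fd, norm="ortho")
    return np.concatenate([x_td[-n_cp:], x_td]), x_fd


rng = np.random.default_rng(1)
G, N_v, N_f, N_cp = 4, 4, 2, 2
A = (rng.standard_normal((N_f, N_v)) + 1j * rng.standard_normal((N_f, N_v))) / np.sqrt(2 * N_f)
s_groups = np.zeros((G, N_v), dtype=complex)
s_groups[:, :2] = 1.0                     # toy sparse VD vectors: K = 2 active entries per group
tx, x_fd = cs_ofdm_symbol(s_groups, A, N_cp)
```

Each group's $N_v = 4$ VD entries are carried by only $N_f = 2$ FD subcarriers, which is where the throughput gain of CS comes from.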

4) ANTENNA SELECTION
After Inverse Fast Fourier Transform (IFFT) and CP addition, the TD symbols are transmitted by the activated TAs specified by the antenna selector of Fig. 1. Explicitly, the $b_{g,1}$ bits are conveyed by the antenna selector of Fig. 1, which selects $M$ antennas from the $N_t$ available TAs, where we have $N_{AC}$ antenna combinations in total. To avoid the correlation caused by sharing the same TA elements among different antenna combinations, the Distinct Antenna Combination scheme of [44] is used to decide upon the indices of the activated TAs. To elaborate further, let us consider an example using $M = 2$, $N_t = 4$ and $N_{AC} = 2$. As shown in Table 3, when the input bit is $b_{g,1} = [0]$, the first and second TAs are activated to transmit the modulated symbols in a specific subcarrier block, while the other two TAs remain inactive. Similarly, if the incoming bit sequence is $b_{g,1} = [1]$, the third and fourth TAs are selected to transmit the symbols. More specifically, for $b_{g,2} = [0\,0]$ and $b_{g,3} = [0\,0]$ along with STSK(2, 2, 2, 2, 2), the space-time block is formulated as $\mathbf{S} = [\mathbf{X}_1, \mathbf{0}, \mathbf{0}, \mathbf{0}]$, and after CS the FD block $\mathbf{S}^{FD}$ is obtained by applying $\mathbf{s}^{FD}_{m,t} = \mathbf{A}\mathbf{s}_{m,t}$ to each of its $MT$ sequences. Then, we assume 4 TAs for transmission and $b_{g,1} = [1]$. As shown in Table 3, the CS-MIM modulated symbol can then be formulated as $\mathbf{S} = \begin{bmatrix}\mathbf{0}\\ \mathbf{S}^{FD}\end{bmatrix}$, where the upper zero rows correspond to the inactive first and second TAs.
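The antenna selection step can be sketched as multiplying the FD block by a 0/1 selection matrix, which is the role played by $\mathbf{I}_{AC} \in \mathbb{C}^{N_t\times M}$ in the receiver model of Section II-B. The table below only encodes the Table 3 example described in the text (with 0-based TA indices); `DAC_TABLE` and `selection_matrix` are hypothetical names, not part of the Distinct Antenna Combination scheme of [44].

```python
import numpy as np

# Table 3 example (M = 2, N_t = 4, N_AC = 2), 0-based TA indices.
DAC_TABLE = {(0,): (0, 1), (1,): (2, 3)}


def selection_matrix(active_tas: tuple, n_t: int) -> np.ndarray:
    """Build the N_t x M matrix I_AC: column m has a 1 in the row of the m-th active TA."""
    i_ac = np.zeros((n_t, len(active_tas)))
    for m, ta in enumerate(active_tas):
        i_ac[ta, m] = 1.0
    return i_ac


I_ac = selection_matrix(DAC_TABLE[(1,)], n_t=4)   # b_{g,1} = [1] -> third and fourth TAs
S_fd = np.array([[1 + 1j, 2.0], [3.0, 4 - 1j]])   # toy M x T FD block of one subcarrier
S = I_ac @ S_fd                                    # rows of the first and second TAs stay zero
```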

B. RECEIVER STRUCTURE
We consider a receiver employing $N_r$ RAs. The transmitted signal is assumed to propagate over a frequency-selective Rayleigh fading channel, and the CSI is acquired by CE, as discussed in Section III-C. The CP is removed and then the received signal is transformed to the FD by the Fast Fourier Transform (FFT), as shown in Fig. 2. The space-time demapper collects the FD symbols received by the $N_r$ RAs over $T$ time slots to recover the space-time symbols, which are then split into $G$ groups by the block splitter of Fig. 2. Afterwards, the symbols received by each subcarrier group are represented as $\mathbf{Y} = \{\mathbf{Y}[1], \dots, \mathbf{Y}[N_f]\}$ and $\mathbf{Y}[\alpha] \in \mathbb{C}^{N_r\times T}$, characterizing the ST structure per group and the space-time symbol received at the $\alpha$-th subcarrier of each subcarrier group, respectively.
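Let the FD channel matrix at the $\alpha$-th subcarrier be represented as $\mathbf{H}[\alpha] \in \mathbb{C}^{N_r\times N_t}$, so that the received signal is
$$\mathbf{Y}[\alpha] = \mathbf{H}[\alpha]\,\mathbf{S}[\alpha] + \mathbf{W}[\alpha] = \mathbf{H}[\alpha]\,\mathbf{I}_{AC}\,\mathbf{S}^{FD}[\alpha] + \mathbf{W}[\alpha],$$
where $\mathbf{S}$ represents the modulated signal after SM at the transmitter, $\mathbf{S}^{FD}[\alpha] \in \mathbb{C}^{M\times T}$ denotes the space-time symbols at the $\alpha$-th subcarrier transmitted from $M$ TAs over $T$ time slots, and $\mathbf{W}[\alpha] \in \mathbb{C}^{N_r\times T}$ represents the Additive White Gaussian Noise (AWGN) obeying the distribution $\mathcal{CN}(0, \sigma_N^2)$, with $\sigma_N^2$ being the noise variance. Furthermore, $\mathbf{I}_{AC} \in \mathbb{C}^{N_t\times M}$ denotes the $(N_t \times M)$-element sub-matrix describing the selection pattern of the active TAs for each subcarrier group at the transmitter. For high-integrity detection, accurate channel information is required, which is attained by CE techniques relying on known pilots in practical model-based solutions.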
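The per-subcarrier model $\mathbf{Y}[\alpha] = \mathbf{H}[\alpha]\mathbf{I}_{AC}\mathbf{S}^{FD}[\alpha] + \mathbf{W}[\alpha]$ can be simulated in a few lines. This is a minimal sketch, assuming unit-variance Rayleigh channel entries and circularly symmetric noise whose variance is split equally between the real and imaginary parts; the function name `receive_subcarrier` is hypothetical.

```python
import numpy as np


def receive_subcarrier(h, i_ac, s_fd, noise_var, rng):
    """Y[alpha] = H[alpha] I_AC S^FD[alpha] + W[alpha], with W entries ~ CN(0, noise_var)."""
    n_r, t = h.shape[0], s_fd.shape[1]
    w = np.sqrt(noise_var / 2) * (rng.standard_normal((n_r, t)) + 1j * rng.standard_normal((n_r, t)))
    return h @ i_ac @ s_fd + w


rng = np.random.default_rng(0)
N_r, N_t, M, T = 2, 4, 2, 2
H = (rng.standard_normal((N_r, N_t)) + 1j * rng.standard_normal((N_r, N_t))) / np.sqrt(2)
I_ac = np.zeros((N_t, M))
I_ac[2, 0] = I_ac[3, 1] = 1.0             # third and fourth TAs active
S_fd = np.eye(M, T, dtype=complex)         # toy space-time block
Y = receive_subcarrier(H, I_ac, S_fd, noise_var=0.0, rng=rng)
```

With the noise switched off, $\mathbf{Y}[\alpha]$ reduces to the columns of $\mathbf{H}[\alpha]$ that belong to the active TAs, which is why the CSI of inactive TAs cannot be observed from data symbols alone.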
In the next section, we will discuss CE techniques suitable for CS-MIM and characterize the JCED method.

III. CHANNEL ESTIMATION AND DETECTION FOR CS-MIM
Given the received signal $\mathbf{Y}$ of (3), the receiver infers the information bits of the STSK codewords, the bits embedded into the activated subcarrier indices and the bits mapped to the active TAs. This detection process requires the CSI, which can be acquired by channel estimation. In the following, we consider both separate channel estimation and detection as well as JCED, where we propose a deep learning aided JCED technique capable of reducing both the complexity and the pilot overhead without substantially eroding the performance.
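The signal received at the $\alpha$-th subcarrier during a single time slot can be represented as $\mathbf{y}[\alpha] = \mathbf{H}_\alpha\mathbf{I}_{AC}\mathbf{s}^{FD}[\alpha] + \mathbf{w}[\alpha]$, where $\mathbf{H}_\alpha = [h^{\alpha}_{r,t}] \in \mathbb{C}^{N_r\times N_t}$ and $h^{\alpha}_{r,t}$ is the CSI between the $r$-th RA and the $t$-th TA at the $\alpha$-th subcarrier of subcarrier group $g$. Additionally, $\mathbf{S}^{FD}[\alpha]$ can be extended as $\{S^{1}_{\alpha}, \dots, S^{M}_{\alpha}\}$ for a single time slot. Then, the channel matrix $\mathbf{H}$ corresponding to the $N_f$ ST signals of each subcarrier group can be expressed in the block-diagonal form of size $(N_rN_f \times N_tN_f)$ as $\mathbf{H} = \mathrm{diag}\{\mathbf{H}_1, \mathbf{H}_2, \dots, \mathbf{H}_{N_f}\}$, where $\mathbf{H}_\alpha$ ($\alpha = 1, 2, \dots, N_f$) represents the corresponding CSI at the $\alpha$-th subcarrier.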
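Similarly, the antenna selection pattern matrix $\bar{\mathbf{I}}_{AC} \in \mathbb{C}^{N_tN_f\times MN_f}$ associated with the $N_f$ subcarriers of each group has the block-diagonal structure $\bar{\mathbf{I}}_{AC} = \mathrm{diag}\{\mathbf{I}_{AC}, \dots, \mathbf{I}_{AC}\}$. The received signal $\mathbf{Y}$ contains $N_f$ space-time symbols at the $N_f$ subcarriers in the FD of each subcarrier group. Given the received signal model $\mathbf{Y} = \mathbf{H}\bar{\mathbf{I}}_{AC}\mathbf{S}^{FD} + \mathbf{W}$, the FD space-time signal can be represented as $\mathbf{S}^{FD} = \bar{\mathbf{A}}\mathbf{S}$, where $\bar{\mathbf{A}} \in \mathbb{C}^{MN_f\times MN_v}$ is the equivalent of the measurement matrix $\mathbf{A}$ used for compressing the VD vector and $\mathbf{S} \in \mathbb{C}^{MN_v\times T}$ denotes the VD space-time symbol. Then, $\mathbf{S}$ can be expanded as $\mathbf{S} = \mathbf{I}_{SI}\mathbf{X}$, where $\mathbf{X} \in \mathbb{C}^{MK\times T}$ represents the $K$ STSK codewords and $\mathbf{I}_{SI} \in \mathbb{C}^{MN_v\times MK}$ is the subcarrier index selection pattern. Hence, (7) can be rewritten as
$$\mathbf{Y} = \mathbf{H}\bar{\mathbf{I}}_{AC}\bar{\mathbf{A}}\mathbf{I}_{SI}\mathbf{X} + \mathbf{W}.$$
In the following, we first present the conventional CE and HD detection for the CS-MIM system considered, followed by the conventional JCED. Then, we introduce both the conventional SD detection and the SD-JCED scheme of the CS-MIM system. Afterwards, we present our proposed NN-aided HD-JCED, where the neural network replaces the exhaustive search with a learned classification model in order to significantly reduce the computational complexity, followed by the neural network aided SD-JCED.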
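The group-level matrices can be assembled and checked against the per-subcarrier model numerically. This sketch assumes a subcarrier-major row ordering (row index $\beta M + m$ of $\mathbf{S}$ holds subcarrier $\beta$ of TA $m$), under which $\bar{\mathbf{I}}_{AC} = \mathbf{I}_{N_f}\otimes\mathbf{I}_{AC}$ and $\bar{\mathbf{A}} = \mathbf{A}\otimes\mathbf{I}_M$; the paper may order the rows differently, and `block_diag`/`crandn` are hypothetical helpers.

```python
import numpy as np


def block_diag(blocks):
    """Place matrices on the diagonal of a zero matrix, i.e. diag{H_1, ..., H_Nf}."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols), dtype=complex)
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out


rng = np.random.default_rng(2)
N_r, N_t, M, T, N_f, N_v = 2, 4, 2, 2, 2, 4


def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


H_list = [crandn(N_r, N_t) for _ in range(N_f)]
I_ac = np.zeros((N_t, M))
I_ac[2, 0] = I_ac[3, 1] = 1.0
A = crandn(N_f, N_v)
S_vd = crandn(M * N_v, T)              # VD block, row index beta * M + m (assumed ordering)

H_bar = block_diag(H_list)             # (N_r N_f) x (N_t N_f)
I_bar = np.kron(np.eye(N_f), I_ac)     # (N_t N_f) x (M N_f) = diag{I_AC, ..., I_AC}
A_bar = np.kron(A, np.eye(M))          # (M N_f) x (M N_v): applies A to every TA's sequence
S_fd = A_bar @ S_vd
Y = H_bar @ I_bar @ S_fd               # noiseless group-level model
```

Slicing `S_fd[m::M]` recovers the compressed sequence of TA $m$, and each $N_r$-row block of `Y` matches $\mathbf{H}_\alpha\mathbf{I}_{AC}\mathbf{S}^{FD}[\alpha]$.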

A. CONVENTIONAL CHANNEL ESTIMATION AND DETECTION
In this section we present the conventional channel estimation and detection designed for the MIM system, followed by the JCED to output both HD as well as SD values.

1) CHANNEL ESTIMATION
As shown in Fig. 2, we use a CE scheme for acquiring the CSI used for detection. Conventional pilot-based CE, which inserts pilots into each symbol, may become inefficient in this context due to the random activation of both the subcarriers and the TAs [26]. We circumvent this problem by constructing a dedicated pilot frame for the channel estimator of our CS-MIM receiver, as shown in Fig. 2. This mitigates the challenge of randomness caused by the TA index selection. The pilot frame has the same size as the information frame, where only a single TA is activated for each subcarrier group. Hence, the number of subcarrier groups $G$ has to be higher than or equal to the number of TAs $N_t$, so that every TA is activated in at least one group. Furthermore, each of the $N_t$ TAs can be activated more than once in each frame.
Then the CSI of every single TA and subcarrier group can be estimated by the channel estimator. Afterwards, we obtain the estimated CSI matrix $\hat{\mathbf{H}}$ of the equivalent subcarrier group by linear interpolation techniques [23]. Fig. 3 shows the flow chart of the conventional CE and detection. Firstly, the pilot symbol $\mathbf{Y}_p$ is input to the channel estimator. Then, with the aid of an appropriate CE method, the estimated CSI $\hat{\mathbf{H}}$ is acquired and used by the detector for recovering the information bits. Let us model the received space-time pilot symbol $\mathbf{Y}_p$ based on (7), where the space-time pilot symbol is $\bar{\mathbf{S}}_p = \mathrm{diag}\{\bar{\mathbf{S}}_{p,1}, \bar{\mathbf{S}}_{p,2}, \dots, \bar{\mathbf{S}}_{p,M}\}$. Then the Least Squares CE (LSCE) can be formulated, and its complexity is given in (11), where $\mathbf{R}_H$ represents the channel's correlation matrix [30].
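The MMSE-CE requires the calculation of $\mathbf{R}_H$ and a matrix inversion, which dominate its computational complexity. To track the channel, piecewise linear interpolation is used for acquiring the CSI, which can be formulated as
$$\hat{\mathbf{H}}_n = \hat{\mathbf{H}}_{n_p} + \frac{n - n_p}{D}\left(\hat{\mathbf{H}}_{n_p+D} - \hat{\mathbf{H}}_{n_p}\right), \qquad n_p \le n < n_p + D,$$
where $\hat{\mathbf{H}}_{n_p}$ and $\hat{\mathbf{H}}_n$ are the estimated CSI matrices at the pilot symbol position $n_p$ and at the $n$-th symbol position, respectively, and $D$ denotes the pilot insertion spacing.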
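The estimators and the interpolation step can be sketched with standard textbook forms. This is a minimal sketch, not the paper's exact equations: the LS estimate is $\hat{\mathbf{H}} = \mathbf{Y}_p\mathbf{S}_p^H(\mathbf{S}_p\mathbf{S}_p^H)^{-1}$, and the linear MMSE estimate assumes (hypothetically) that the rows of $\mathbf{H}$ are i.i.d. with correlation $\mathbf{R}_H$, giving $\hat{\mathbf{H}} = \mathbf{Y}_p(\mathbf{S}_p^H\mathbf{R}_H\mathbf{S}_p + \sigma^2\mathbf{I})^{-1}\mathbf{S}_p^H\mathbf{R}_H$.

```python
import numpy as np


def ls_ce(y_p, s_p):
    """LS estimate H = Y_p S_p^H (S_p S_p^H)^{-1}; requires S_p to have full row rank."""
    gram = s_p @ s_p.conj().T
    return np.linalg.solve(gram, s_p @ y_p.conj().T).conj().T


def mmse_ce(y_p, s_p, r_h, noise_var):
    """Linear MMSE estimate H = Y_p (S_p^H R_H S_p + sigma^2 I)^{-1} S_p^H R_H.

    Assumes the rows of H are i.i.d. with correlation E[h^H h] = R_H (an assumption).
    """
    t = s_p.shape[1]
    cov_y = s_p.conj().T @ r_h @ s_p + noise_var * np.eye(t)
    return np.linalg.solve(cov_y, y_p.conj().T).conj().T @ s_p.conj().T @ r_h


def interpolate(h_start, h_end, d):
    """Piecewise-linear CSI for positions n_p, ..., n_p + d - 1 between two pilot estimates."""
    return [h_start + (n / d) * (h_end - h_start) for n in range(d)]


rng = np.random.default_rng(3)
H = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
S_p = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))   # N_t x T_p pilots
Y_p = H @ S_p                                                          # noiseless pilots
H_ls = ls_ce(Y_p, S_p)
H_mmse = mmse_ce(Y_p, S_p, np.eye(2), noise_var=1e-9)
track = interpolate(H_ls, 2 * H_ls, d=4)
```

`np.linalg.solve` is used instead of an explicit inverse; both Gram matrices are Hermitian, so solving against the conjugate transpose and transposing back gives the right-multiplied inverse.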

2) MAXIMUM LIKELIHOOD DETECTION
The ML detector makes a joint decision on the TA index, the subcarrier index and the STSK codewords using an exhaustive search, which can be formulated as
$$(\hat{\gamma}, \hat{\beta}, \hat{\phi}) = \arg\min_{\gamma,\beta,\phi}\left\|\mathbf{Y} - \hat{\mathbf{H}}\bar{\mathbf{I}}_{AC}^{(\gamma)}\bar{\mathbf{A}}\mathbf{I}_{SI}^{(\beta)}\mathbf{X}^{(\phi)}\right\|_F^2,$$
where $\hat{\gamma}$, $\hat{\beta}$ and $\hat{\phi}$ represent the estimates of the activated TA index, the activated subcarrier index and the index of the $K$ STSK codewords in each subcarrier group, respectively [2]. At the receiver, the ML detector carries out a full search over all possible candidates, which has a complexity order of $\mathcal{O}[N_{AC}N_{SI}(QL)^K]$ per subcarrier group. The total computational complexity $\mathcal{O}_{ML}$ of the ML detector relying on perfect CSI is then obtained by accumulating this cost over the $G$ subcarrier groups. With the aid of the LSCE/MMSE-CE, the total complexity order of CE-aided ML detection is $\mathcal{O}_{LSCE/MMSE\text{-}CE} + \mathcal{O}_{ML}$.
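The exhaustive search can be sketched as a loop over all $(\gamma, \beta, \phi)$ triples. This is a minimal toy, assuming a single subcarrier without CS ($N_{SI} = 1$), the two TA patterns of Table 3 and $Q = L = 2$, $K = 1$; `synthesize` is a hypothetical callback that would, in the full system, return $\hat{\mathbf{H}}\bar{\mathbf{I}}_{AC}^{(\gamma)}\bar{\mathbf{A}}\mathbf{I}_{SI}^{(\beta)}\mathbf{X}^{(\phi)}$.

```python
import itertools

import numpy as np


def ml_detect(y, synthesize, n_ac, n_si, n_cw):
    """Exhaustive ML search over (gamma, beta, phi) minimising ||Y - synthesize(...)||_F^2."""
    best, best_metric = None, np.inf
    for cand in itertools.product(range(n_ac), range(n_si), range(n_cw)):
        metric = np.linalg.norm(y - synthesize(*cand)) ** 2   # Frobenius norm for matrices
        if metric < best_metric:
            best, best_metric = cand, metric
    return best


rng = np.random.default_rng(4)
H_hat = (rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))) / np.sqrt(2)
patterns = [np.eye(4)[:, [0, 1]], np.eye(4)[:, [2, 3]]]        # Table 3 TA patterns
A = rng.standard_normal((2, 2, 2)) + 1j * rng.standard_normal((2, 2, 2))
codewords = [s * A[q] for q in range(2) for s in (1, -1)]       # (QL)^K = 4 codewords


def synth(gamma, beta, phi):
    # beta is unused here: this toy keeps a single subcarrier pattern (N_SI = 1) and no CS
    return H_hat @ patterns[gamma] @ codewords[phi]


Y = synth(1, 0, 3)                                               # noiseless observation
decision = ml_detect(Y, synth, n_ac=2, n_si=1, n_cw=len(codewords))
```

The loop body runs $N_{AC}N_{SI}(QL)^K$ times, which is exactly the per-group complexity order quoted above.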

3) JOINT CHANNEL ESTIMATION AND DETECTION
To further improve the detection performance, data-detection-based iterative JCED is considered. Fig. 4 shows the flow chart of the JCED, which starts with the same procedure as the conventional CE, where the estimated CSI is acquired by the channel estimator with the aid of pilot symbols. Afterwards, we recover the bits from the received signal and the estimated CSI. Then, the symbols remodulated from the recovered bits are used for updating the CSI. This process is repeated for several iterations to improve the accuracy of the estimated CSI. By exploiting the remodulated symbols, the JCED can increase the CE accuracy and hence improve the detection performance without increasing the pilot overhead. Based on [29] and [30], the JCED of CS-MIM is described by Algorithm 1.
As illustrated in Algorithm 1, there are two thresholds, which are used for terminating the update loop. First, we set a maximum number of iterations $I_{max}$, where additional iterations progressively enhance the CE and detection performance. This allows for an adjustable algorithmic complexity based on the number of iterations. The second approach introduces a termination constant $\beta$, which controls the accuracy of the CE. Based on the theoretical results of MMSE-CE-aided and ML-based detection, we can determine the MSE gap between conventional CE-based detection and ML detection relying on perfect CSI. Consequently, the constant $\beta$ can be selected within this gap and should be sufficiently low. In this scenario, the complexity of the algorithm hinges solely on the channel conditions, which can be unpredictable. In general, a suitable termination threshold is chosen to strike an appropriate performance vs. complexity trade-off. Alternatively, both thresholds may be harnessed for maximizing the efficiency of the algorithm. In this case, the complexity order of the HD-JCED is determined by the number of iterations actually executed, if this number is smaller than $I_{max}$. In a nutshell, the total complexity order can be expressed as
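The decision-directed loop with both termination rules can be sketched as follows. This is a minimal toy, not Algorithm 1 itself: it assumes a 2-TA MIMO link with BPSK vectors, LS re-estimation over the pilots plus the remodulated data, and a hypothetical stopping metric (the relative CSI change compared against `beta_tol`, standing in for $\beta$).

```python
import numpy as np

# Toy alphabet: every data column is one of four BPSK vectors for N_t = 2 TAs.
CANDS = np.array([[a, b] for a in (1, -1) for b in (1, -1)]).T   # shape (2, 4)


def ls_ce(y, s):
    """LS CSI estimate Y S^H (S S^H)^{-1}."""
    return np.linalg.solve(s @ s.conj().T, s @ y.conj().T).conj().T


def detect(y_d, h):
    """Column-wise ML decisions (indices into CANDS) using the current CSI estimate."""
    dist = np.linalg.norm(y_d[:, :, None] - (h @ CANDS)[:, None, :], axis=0)
    return dist.argmin(axis=1)


def jced(y_p, s_p, y_d, i_max=5, beta_tol=1e-6):
    """Detect, remodulate, re-estimate; stop after i_max (>= 1) iterations or a small CSI change."""
    h = ls_ce(y_p, s_p)
    for it in range(1, i_max + 1):
        decisions = detect(y_d, h)
        s_d = CANDS[:, decisions]                                     # remodulated symbols
        h_new = ls_ce(np.hstack([y_p, y_d]), np.hstack([s_p, s_d]))   # pilots + remodulated data
        change = np.linalg.norm(h_new - h) ** 2 / np.linalg.norm(h) ** 2
        h = h_new
        if change < beta_tol:                                         # hypothetical beta test
            break
    return h, decisions, it


rng = np.random.default_rng(5)
H = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
S_p = np.array([[1, 1], [1, -1]])          # two orthogonal pilot vectors
idx = rng.integers(0, 4, size=8)
Y_p, Y_d = H @ S_p, H @ CANDS[:, idx]       # noiseless toy observations
H_hat, decided, n_iter = jced(Y_p, S_p, Y_d)
```

In this noiseless toy the first update leaves the CSI unchanged, so the $\beta$ rule stops the loop after one iteration; with noise, the $I_{max}$ cap bounds the worst-case complexity.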

B. SOFT DECISION DETECTION
SD detection is employed for attaining near-capacity performance when combined with channel coding by exchanging soft values between the MIMO detector and the channel decoder.However, the complexity of the optimal maximum a posteriori probability MIMO detector rapidly becomes prohibitive upon increasing the modulation order and the number of TAs [45].In the following, we will present the conventional SD detector of CS-MIM, followed by our LS/MMSE-CE based SD-JCED aided CS-MIM system.

1) CONVENTIONAL SOFT DECISION DETECTION
A channel coded CS-MIM scheme is shown in Fig. 5, which was proposed in [2] for achieving near-capacity performance.The information bit sequence b is encoded by a Recursive Systematic Convolutional (RSC) encoder.Then, the coded bit sequence c is interleaved to generate the interleaved stream u, which is entered into the CS-MIM modulator of Fig. 1.
At the receiver side of Fig. 5, the pilot data is processed first for estimating the channel, where the estimated channel $\hat{\mathbf{H}}$ is fed into the soft CS-MIM receiver that outputs Log-Likelihood Ratios (LLRs). The LLRs output by the demodulator are then passed to the de-interleaver, and the RSC decoder performs soft decoding. In Fig. 5, $L(\cdot)$ represents the LLRs of the bit sequences, where $L_e(u)$ is the extrinsic LLR output after soft demodulation and $L_a(c)$ is the de-interleaved LLR sequence of $L_e(u)$, which constitutes the a priori information of the RSC decoder.
The LLR of a bit is defined as the logarithm of the ratio of the probabilities associated with the logical bits '1' and '0', which can be written as $L(b) = \log\frac{p(b=1)}{p(b=0)}$. The conditional probability $p(\mathbf{Y}|\mathbf{X}_{\gamma,\beta,\phi})$ of receiving the signal $\mathbf{Y}$ of a subcarrier group defined in (3) is given by (15) [46], where $\mathbf{X}_{\gamma,\beta,\phi}$ represents the STSK codewords at the $\beta$-th realization of active subcarriers, which are transmitted through the $\phi$-th realization of active TAs. Furthermore, $N_0$ is the noise power, where we have $\sigma_n^2 = N_0/2$, with $N_0/2$ representing the double-sided noise power spectral density.
Hence, we can formulate the LLR of bit $u_l$ as
$$L(u_l) = \log\frac{\sum_{\mathbf{X}\in\mathcal{X}^{l}_{1}} p(\mathbf{Y}|\mathbf{X})}{\sum_{\mathbf{X}\in\mathcal{X}^{l}_{0}} p(\mathbf{Y}|\mathbf{X})},$$
where $\mathcal{X}^{l}_{1}$ and $\mathcal{X}^{l}_{0}$ represent the subsets of the legitimate equivalent signals $\mathbf{X}$ corresponding to $u_l = 1$ and $u_l = 0$, respectively. Upon using (15) and (16), we obtain the LLR $L(b_i)$ of the bit sequence conveyed by the received signal $\mathbf{Y}$. To simplify the LLR calculation, the Approximate Log-MAP (Approx-Log-MAP) algorithm based on the Jacobian maximum operation [47] is used [48], which is given by (17), where jac(·) denotes the Jacobian maximum operation applied to the intrinsic metrics $\lambda_{\gamma,\beta,\phi}$. At the receiver, the soft demodulator evaluates the probability of each bit being logical '1' and '0'. Then it applies the Approx-Log-MAP algorithm for obtaining the extrinsic LLRs of the coded bits, which has a complexity order of $\mathcal{O}[2c_g N_{AC}N_{SI}(QL)^K]$, where $c_g$ represents the number of coded bits after the RSC encoder and interleaver. The total complexity of CE-aided SD detection then follows in the same way as for HD detection.
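The bit-wise LLR computation can be sketched directly from candidate metrics. This sketch assumes the Gaussian metric $\lambda = -\|\mathbf{Y} - \mathbf{C}\|_F^2/N_0$ for each noiseless candidate $\mathbf{C}$ (for CS-MIM, $\mathbf{C}$ would be $\hat{\mathbf{H}}\bar{\mathbf{I}}_{AC}\bar{\mathbf{A}}\mathbf{I}_{SI}\mathbf{X}$) and uses random toy candidates; `soft_demap` is a hypothetical name. With `exact=True`, the Jacobian logarithm $\mathrm{jac}(a,b) = \ln(e^a + e^b)$ is evaluated exactly by `np.logaddexp`, whereas Approx-Log-MAP typically approximates its correction term with a lookup table; `exact=False` gives the max-log approximation.

```python
import numpy as np


def soft_demap(y, candidates, labels, n0, exact=True):
    """Per-bit LLRs: ln sum_{X in X_1^l} p(Y|X) - ln sum_{X in X_0^l} p(Y|X).

    candidates[c] is the noiseless received matrix of candidate c; labels[c, l] is bit u_l.
    """
    lam = np.array([-np.linalg.norm(y - c) ** 2 / n0 for c in candidates])   # intrinsic metrics
    llr = np.empty(labels.shape[1])
    for bit in range(labels.shape[1]):
        ones, zeros = lam[labels[:, bit] == 1], lam[labels[:, bit] == 0]
        if exact:   # Jacobian logarithm over each subset
            llr[bit] = np.logaddexp.reduce(ones) - np.logaddexp.reduce(zeros)
        else:       # max-log approximation
            llr[bit] = ones.max() - zeros.max()
    return llr


rng = np.random.default_rng(6)
n_bits = 3
cands = [rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2)) for _ in range(2 ** n_bits)]
labels = np.array([[(c >> (n_bits - 1 - b)) & 1 for b in range(n_bits)] for c in range(2 ** n_bits)])
Y = cands[5]                                  # noiseless reception of the label [1, 0, 1]
llr = soft_demap(Y, cands, labels, n0=0.1, exact=False)
```

All $2^{c_g}$ candidate metrics are computed once and then reused for every bit, so the per-bit cost is only the two subset combinations.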

2) SOFT DECISION JOINT CHANNEL ESTIMATION AND DETECTION
We can also apply the same JCED algorithm to SD CS-MIM, and the resultant procedure is described in Algorithm 2. Similarly, the complexity of the SD-JCED can be represented in the same way as that of the HD-JCED. However, both HD- and SD-JCED impose an excessive complexity upon updating the CSI of each symbol. In the following, we propose DNN-based MIM detectors for reducing the complexity.

C. PROPOSED LEARNING BASED CHANNEL ESTIMATION AND DETECTION
In this section, we first introduce the DNN-aided HD detection of CS-MIM.Then, we propose an iteratively updated DNN model for JCED of CS-MIM.Afterwards, we extend the proposed DNN-based JCED model to SD CS-MIM systems.

1) CONVENTIONAL DNN-AIDED CE AND HD DETECTION
The DNN architecture of Fig. 6 can be harnessed for replacing the conventional HD data detector of Section III-A2. As shown in Fig. 6, the pilot symbols $\mathbf{Y}_p$ and the received symbols $\mathbf{Y}$ constitute the inputs of the $L$-layer fully-connected network. The channel $\hat{\mathbf{H}}$ is estimated from the pilot symbols $\mathbf{Y}_p$ by the DNN model during the training phase. Then the output bits $\hat{\mathbf{u}}$ can be obtained using the estimated channel and the received signal, yielding the output
$$\hat{\mathbf{u}} = f_{sigmoid}\left(\mathbf{W}_L f_{Relu}\left(\cdots f_{Relu}\left(\mathbf{W}_1\mathbf{x} + \boldsymbol{\theta}_1\right)\cdots\right) + \boldsymbol{\theta}_L\right), \qquad (19)$$
where $\mathbf{x}$ denotes the vectorized network input, while $\mathbf{W}_n$ and $\boldsymbol{\theta}_n$, $n = 1, \dots, L$, represent the weights and biases, respectively. A Long Short-Term Memory (LSTM) layer is employed as the initial layer to capture the nonlinear relationships between the transmitted signals and the CSI. In the mathematical representation of the LSTM layer, $C_k$ is commonly referred to as the cell state [49], which represents the information flow over time. Additionally, $x_k$ and $z_k$ denote the input and output at the $k$-th symbol instant, respectively. The term $z_{k-1}$ represents the output at the $(k-1)$-st instant, and $\phi_{k-1}$ denotes the parameters of the LSTM layer. These parameters are stored in the cell state for subsequent iterations and are shared across them. Then in (19), the Rectified Linear Unit (ReLU) function $f_{Relu}(s) = \max(0, s)$ is employed for activating the DNN during the training phase, and the sigmoid function $f_{sigmoid}(s) = \frac{1}{1+e^{-s}}$ is used to obtain the detected bits $\hat{\mathbf{u}}$.
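The fully connected part of the detector can be sketched as a plain forward pass with ReLU hidden layers and a sigmoid output, followed by a 0.5 threshold for HD. This is a minimal sketch under stated assumptions: the LSTM input layer is omitted, the layer sizes are hypothetical and the weights are randomly initialized rather than trained.

```python
import numpy as np


def relu(s):
    return np.maximum(0.0, s)


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


def init_params(sizes, rng):
    """He-style random (W_n, theta_n) pairs for a fully connected net with the given layer sizes."""
    return [(rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(x, params):
    """Eq. (19)-style pass: ReLU on the hidden layers, sigmoid on the output layer."""
    h = x
    for w, theta in params[:-1]:
        h = relu(w @ h + theta)
    w, theta = params[-1]
    return sigmoid(w @ h + theta)


rng = np.random.default_rng(0)
params = init_params([16, 64, 32, 8], rng)   # hypothetical layer sizes
x = rng.standard_normal(16)                   # vectorised [Y_p, Y] input (real/imag stacked)
u_soft = forward(x, params)
u_hat = (u_soft > 0.5).astype(int)            # hard decisions on the detected bits
```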
Furthermore, the complexity of the Neural Network (NN) is governed by the operations involved in forward and backward propagation between the neurons. Generally, the complexity order of an NN can be expressed as
$$\mathcal{O}\left[n_i n_1 + \sum_{l=1}^{L-1} n_l n_{l+1} + n_L n_o\right],$$
where $n_i$ and $n_o$ represent the sizes of the input and output layers, respectively, and $n_l$ ($l = 1, 2, \dots, L$) denotes the sizes of the hidden layers between them. The sigmoid layer $f_{sigmoid}(s) = \frac{1}{1+e^{-s}}$ has an evaluation complexity order of $\mathcal{O}[1]$ per neuron, while the LSTM has the complexity order of $\mathcal{O}[n_l(n_d + n_l)]$, where $n_d$ is the neural dimension of the input layer of the LSTM. These terms determine the total computational complexity of the DNN of Fig. 6.
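The two complexity terms can be evaluated for concrete layer sizes with a short helper. This sketch simply counts the weight products of the dense layers and the quoted LSTM term; the layer sizes are hypothetical, and constant factors and the $\mathcal{O}[1]$ activation costs are dropped, as in the $\mathcal{O}[\cdot]$ expressions.

```python
def mlp_complexity(n_i, hidden, n_o):
    """Weight-product count n_i*n_1 + sum_l n_l*n_{l+1} + n_L*n_o of a fully connected net."""
    sizes = [n_i, *hidden, n_o]
    return sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))


def lstm_complexity(n_l, n_d):
    """The n_l * (n_d + n_l) term quoted for the LSTM layer."""
    return n_l * (n_d + n_l)


dense = mlp_complexity(64, [128, 64], 16)   # 64*128 + 128*64 + 64*16 = 17408
lstm = lstm_complexity(128, 64)              # 128 * (64 + 128) = 24576
```

Unlike the ML search, neither count depends on $N_{AC}$, $N_{SI}$ or $(QL)^K$; it is fixed by the chosen layer sizes.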
The raw input data, represented in complex-valued matrix form, obtained from the received signal $\mathbf{Y}$ has to be vectorized first. We rearrange the complex values by separately extracting the real and imaginary parts and then merging them into a real-valued vector. In the training phase, we employ randomly generated data, which are transmitted over a frequency-selective Rayleigh fading channel using MIM. Both the received pilot and data symbols are then employed as the input data of the DNN. In this case, we use a high pilot overhead for simulating a high-performance CE scenario. To maximize the performance of the trained learning-based CE and detection, different pilot overheads are applied for considering sufficiently diverse scenarios. The number of training samples required is selected by experimentation, gradually increasing the training size until acceptable MSE values are achieved. The MSE loss function of the DNN used for training is
$$\mathcal{L}(\{\mathbf{W}_n, \boldsymbol{\theta}_n\}) = \frac{1}{B}\sum_{i=1}^{B}\left\|\hat{\mathbf{u}}_i - \mathbf{u}_i\right\|^2,$$
where $B$ is the sample size of the current iteration, while $\hat{\mathbf{u}}_i$ and $\mathbf{u}_i$ denote the detected and transmitted bits of the $i$-th sample. A stopping criterion can be defined either by the number of iterations or by an MSE threshold. Then, the parameter sets $\{\mathbf{W}_n, \boldsymbol{\theta}_n\}$ are updated in each training iteration using gradient descent, which is formulated as
$$\{\mathbf{W}_n, \boldsymbol{\theta}_n\} \leftarrow \{\mathbf{W}_n, \boldsymbol{\theta}_n\} - \alpha\nabla\mathcal{L}(\{\mathbf{W}_n, \boldsymbol{\theta}_n\}),$$
where $\alpha > 0$ is the learning rate and $\nabla\mathcal{L}(\{\mathbf{W}_n, \boldsymbol{\theta}_n\})$ represents the gradient of $\mathcal{L}(\{\mathbf{W}_n, \boldsymbol{\theta}_n\})$. In our proposed NN-aided detection, we use $\alpha = 0.001$. After the training phase, the DNN model has learned the mapping from the received signal and stores both the weights and the biases, which are used for producing the desired outputs from the input data in the testing phase. The statistical properties of the input/output data have to remain the same as those used in training.
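The preprocessing, the MSE loss and the gradient-descent update can be sketched end to end on a toy network. This is a minimal sketch, assuming one ReLU hidden layer with a sigmoid output (no LSTM), full-batch updates, random toy inputs and targets, and hand-derived back-propagation; `to_real_vector` and `train_step` are hypothetical names.

```python
import numpy as np


def to_real_vector(y):
    """Vectorise a complex matrix: real parts first, then imaginary parts."""
    v = np.asarray(y).reshape(-1)
    return np.concatenate([v.real, v.imag])


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


def train_step(params, x, u, alpha=1e-3):
    """One gradient-descent step on L = (1/B) sum_i ||u_hat_i - u_i||^2 for a 1-hidden-layer net."""
    w1, t1, w2, t2 = params
    b = x.shape[0]
    z1 = x @ w1.T + t1
    h1 = np.maximum(0.0, z1)                         # ReLU hidden layer
    u_hat = sigmoid(h1 @ w2.T + t2)                  # sigmoid output layer
    loss = np.sum((u_hat - u) ** 2) / b
    d2 = 2.0 * (u_hat - u) * u_hat * (1.0 - u_hat) / b   # dL / d(pre-sigmoid)
    d1 = (d2 @ w2) * (z1 > 0)                             # back-propagate through ReLU
    grads = (d1.T @ x, d1.sum(axis=0), d2.T @ h1, d2.sum(axis=0))
    return tuple(p - alpha * g for p, g in zip(params, grads)), loss


rng = np.random.default_rng(7)
B, n_h = 32, 16
X = np.stack([to_real_vector(rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2)))
              for _ in range(B)])                    # (B, 8) real-valued inputs
U = rng.integers(0, 2, size=(B, 4)).astype(float)    # toy target bits
params = (rng.standard_normal((n_h, 8)) * 0.3, np.zeros(n_h),
          rng.standard_normal((4, n_h)) * 0.3, np.zeros(4))
losses = []
for _ in range(500):
    params, loss = train_step(params, X, U, alpha=1e-3)   # alpha = 0.001 as in the text
    losses.append(loss)
```

A framework such as PyTorch or TensorFlow would normally compute these gradients automatically; the manual version only makes the update rule explicit.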

2) SEPARATE DNN-AIDED CE AND DETECTION
To further reduce the effect of CE errors, we propose the two-part DNN models of Figs. 7 and 8 for CE and detection, respectively. Firstly, the fully connected NN of Fig. 7 estimates the channel using the current received symbol $\mathbf{Y}_{\tau-1}$ and the next received symbol $\mathbf{Y}_\tau$ as its input, and outputs the estimated CSI $\hat{\mathbf{H}}_\tau$, where Ĥτ = {H  In this case, the first received symbol is In this case, we can have the complexity of CE NN as The training process optimizes the network weights θ by minimizing the MSE between the estimated CSI $\hat{\mathbf{H}}^S_\tau$ of antenna $S$ and the current true CSI $\mathbf{H}^S_\tau$. The MSE loss function used for training is
$$L(\theta) = \frac{1}{B}\sum_{b=1}^{B}\left\|\hat{\mathbf{H}}^{S}_{\tau,b} - \mathbf{H}^{S}_{\tau,b}\right\|^2,$$
where $B$ is the sample size of the current iteration. Fig. 8 shows the DNN employed for detection, which is performed after completing the CSI estimation using the DNN of Fig. 7. The output of the first DNN model of Fig. 7, namely the CSI $\mathbf{H}^s_\tau$ of a specific activated TA $s$, and the received symbol $\mathbf{Y}_\tau$ are used as the input of the NN harnessed for signal detection. The output of the DNN of Fig. 8 corresponds to the output bits $\hat{\mathbf{u}}$, which is formulated as: The total complexity of the two NNs is then $\mathcal{O}_{\text{CE}} + \mathcal{O}_{\text{detection}}$, where $\mathcal{O}_{\text{detection}}$ has the same form as that of the conventional NN.
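A minimal sketch of the two-stage pipeline of Figs. 7 and 8, assuming generic callables `ce_net` and `det_net` (hypothetical names) in place of the trained networks and real-valued input vectors. It also shows the batch MSE used for the CE network and the additive complexity $\mathcal{O}_{\text{CE}} + \mathcal{O}_{\text{detection}}$; the layer sizes are illustrative only.

```python
import numpy as np


def ce_mse_loss(H_hat, H_true):
    """(1/B) * sum_b ||H_hat_b - H_b||^2 over a batch of B real-valued CSI samples."""
    diff = np.asarray(H_hat, dtype=float) - np.asarray(H_true, dtype=float)
    diff = diff.reshape(diff.shape[0], -1)  # one row per training sample
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def mult_count(sizes):
    """Multiplications of a fully connected NN with the given layer sizes."""
    return sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))


def separate_ce_then_detect(y_prev, y_cur, ce_net, det_net):
    """Stage 1 (Fig. 7 style): CSI from two consecutive received symbols.
    Stage 2 (Fig. 8 style): bits from the CSI estimate plus the current symbol."""
    h_hat = ce_net(np.concatenate([y_prev, y_cur]))
    bit_probs = det_net(np.concatenate([h_hat, y_cur]))
    return h_hat, bit_probs


# Total complexity of the two networks is simply O_CE + O_detection.
ce_sizes, det_sizes = [16, 64, 64, 64, 32], [40, 64, 64, 64, 6]  # illustrative
total = mult_count(ce_sizes) + mult_count(det_sizes)
```

Because the detector only sees the CE output, any CE error propagates directly into detection, which is the weakness the joint model is designed to mitigate.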
In this case, the MSE loss function used for training is
$$L(\theta) = \frac{1}{B}\sum_{b=1}^{B}\left\|\hat{\mathbf{u}}_b - \mathbf{u}_b\right\|^2,$$
where $\mathbf{u}_b$ and $\hat{\mathbf{u}}_b$ denote the transmitted and detected bits of the $b$-th sample and $B$ is the sample size of the current iteration. For the proposed DNN-JCED, both the estimated CSI obtained by the DNN model for the previous symbol and the current received data are entered into the model, which requires an input layer having $2N_tN_rN_f + 2N_rN_f$ nodes. As shown in Fig. 9, the proposed DNN model can be split into two subgroups. The first subgroup utilizes the received data and the estimated CSI to update the estimated CSI of the next symbol, while the second subgroup detects the transmitted bits of the current symbol. The proposed DNN-JCED procedure is described in Algorithm 3.
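The per-symbol data flow of DNN-JCED can be sketched as below: the CSI estimate obtained for the previous symbol is concatenated with the current received data to form the $2N_tN_rN_f + 2N_rN_f$-node input, one subgroup detects the current bits and the other produces the CSI estimate for the next symbol. This is a hedged illustration, not a reproduction of Algorithm 3; `update_net`, `detect_net` and the pilot-aided initialization are hypothetical, and the per-symbol iterations ($I_{\max}$) are not modelled.

```python
import numpy as np


def jced_input_size(n_t, n_r, n_f):
    """Input nodes: real/imag CSI (2 N_t N_r N_f) plus real/imag received data (2 N_r N_f)."""
    return 2 * n_t * n_r * n_f + 2 * n_r * n_f


def run_dnn_jced(y_symbols, h_init, update_net, detect_net, threshold=0.5):
    """Symbol-by-symbol JCED data flow (a sketch, not the paper's Algorithm 3).

    y_symbols: sequence of real-valued received vectors y_tau.
    h_init: real-valued CSI estimate for the first symbol (e.g. pilot-aided).
    update_net(x) -> CSI estimate for the next symbol (upper subgroup).
    detect_net(x) -> per-bit probabilities for the current symbol (lower subgroup).
    """
    h_hat = np.asarray(h_init, dtype=float)
    decisions = []
    for y in y_symbols:
        x = np.concatenate([h_hat, y])               # shared input of both subgroups
        decisions.append(detect_net(x) > threshold)  # HD decision on current bits
        h_hat = update_net(x)                        # tracked CSI for symbol tau + 1
    return np.array(decisions), h_hat
```

Only the first symbol needs an externally supplied CSI estimate; afterwards the network tracks the channel itself, which is why the pilot overhead can be reduced.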
For HD-JCED, we treat the detection subgroup as a multi-label classification problem, where both the preprocessed symbols and the estimated CSI are input to an NN, which outputs a classification-based estimate of each bit. For the upper subgroup of Fig. 9, the DNN updates the CSI using the trained weights of each layer.
Then, a sigmoid activation is used for the output layer of the proposed subgroup DNN to generate an independent probability for each bit of our multi-label classification problem. Hence, the output of the DNN model can be expressed as where $\mathbf{W}^1_{n_1}$ and $\mathbf{b}^1_{n_1}$, $n_1 = 1, \ldots, N_1$, represent the weights and biases of the subgroup layers used for updating the channel estimate, while $\mathbf{W}^2_{n_2}$ and $\mathbf{b}^2_{n_2}$, $n_2 = 1, \ldots, N_2$, are the weights and biases of the layers employed for detecting the information bits. Then, we have the weight sets $\theta_1 = \{\mathbf{W}^1_{n_1}, \mathbf{b}^1_{n_1}\}_{n_1=1}^{N_1}$ and $\theta_2 = \{\mathbf{W}^2_{n_2}, \mathbf{b}^2_{n_2}\}_{n_2=1}^{N_2}$. As the number of first-layer nodes depends on the input data size, the number of hidden-layer nodes should be chosen large enough to attain a good BER performance, yet small enough to keep the detection complexity low. In this case, we designed 3 hidden layers of 64 nodes each for both subgroups.
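The two-subgroup structure, with three 64-node hidden layers per subgroup and a sigmoid output layer for detection, can be sketched in NumPy as follows. ReLU hidden activations, the He-style initialization and the class name are assumptions made for illustration; the sigmoid outputs are thresholded at 0.5 to obtain one HD decision per bit.

```python
import numpy as np


def init_dense(sizes, rng):
    """He-style random weights and zero biases for each layer (illustrative init)."""
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]


def forward(params, x, output_activation):
    """Fully connected forward pass; works for one sample or a batch of rows."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)  # ReLU hidden layers (assumed)
    W, b = params[-1]
    return output_activation(x @ W + b)


class TwoSubgroupDNN:
    """theta1: CSI-update subgroup; theta2: detection subgroup. Both see the same input."""

    def __init__(self, n_in, n_csi, n_bits, hidden=(64, 64, 64), seed=0):
        rng = np.random.default_rng(seed)
        self.theta1 = init_dense([n_in, *hidden, n_csi], rng)
        self.theta2 = init_dense([n_in, *hidden, n_bits], rng)

    def __call__(self, x):
        h_next = forward(self.theta1, x, lambda z: z)                        # linear CSI output
        probs = forward(self.theta2, x, lambda z: 1.0 / (1.0 + np.exp(-z)))  # per-bit sigmoid
        return h_next, probs


net = TwoSubgroupDNN(n_in=12, n_csi=8, n_bits=4)
h_next, probs = net(np.ones(12))
bits_hat = (probs > 0.5).astype(int)  # independent HD decision per bit
```

A sigmoid per output (rather than a softmax over all outputs) is what makes the task multi-label: each bit gets its own probability, and the probabilities need not sum to one.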
In the training phase, we use randomly generated data transmitted over the wireless channel using MIM as the input data, and perfect CSI for training the model weights $\theta_1$ and $\theta_2$. The MSE loss function used for training is
$$L = \frac{1}{B}\sum_{b=1}^{B}\left\|\mathbf{u}_b - \hat{\mathbf{u}}_b\right\|^2,$$
where $\mathbf{u}$ represents the target labels, $\hat{\mathbf{u}}$ denotes the detected bits and $B$ is the sample size of the current iteration. Using (19) and (27), we can obtain the loss function of this DNN model as We can define a stopping criterion, which can be either the number of iterations or an MSE threshold. The parameter sets $\{W_n,\theta_n\}$ are then updated in each training iteration by gradient descent as
$$\{W_n,\theta_n\} \leftarrow \{W_n,\theta_n\} - \alpha\nabla L(\{W_n,\theta_n\}),$$
where $\alpha > 0$ is the learning rate and $\nabla L(\{W_n,\theta_n\})$ represents the gradient of $L(\{W_n,\theta_n\})$. In our proposed NN-aided detection, we use $\alpha = 0.001$.
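The MSE loss and the gradient-descent update with $\alpha = 0.001$ can be illustrated on a deliberately small model: a single sigmoid layer trained on toy labels, with both stopping criteria (an iteration budget and an MSE threshold). This is a sketch of the training rule only, under the assumption that the full DNN would apply the same rule via backpropagation; the data and function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mse_loss(u_hat, u):
    """(1/B) * sum_b ||u_b - u_hat_b||^2."""
    return float(np.mean(np.sum((u - u_hat) ** 2, axis=1)))


def train_sigmoid_layer(X, U, alpha=1e-3, max_iter=2000, mse_threshold=None, seed=0):
    """Gradient descent on the MSE loss for a single sigmoid layer u_hat = sigmoid(X W + b).

    Stops after max_iter updates or once the MSE drops below mse_threshold.
    """
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], U.shape[1]))
    b = np.zeros(U.shape[1])
    B = X.shape[0]
    history = []
    for _ in range(max_iter):
        U_hat = sigmoid(X @ W + b)
        loss = mse_loss(U_hat, U)
        history.append(loss)
        if mse_threshold is not None and loss < mse_threshold:
            break
        # Chain rule: dL/dz = (2/B) (u_hat - u) * u_hat (1 - u_hat)
        G = (2.0 / B) * (U_hat - U) * U_hat * (1.0 - U_hat)
        W -= alpha * (X.T @ G)      # W <- W - alpha * dL/dW
        b -= alpha * G.sum(axis=0)  # b <- b - alpha * dL/db
    return W, b, history


rng = np.random.default_rng(1)
X = rng.standard_normal((256, 10))
U = (X @ rng.standard_normal((10, 3)) > 0).astype(float)  # toy bit labels
W, b, history = train_sigmoid_layer(X, U)
```

With such a small learning rate each step is a safe descent step, but convergence is slow; this is one reason practical training loops cap the number of iterations as well as checking an MSE threshold.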
During the training phase, the model learns the mapping from the received signal and stores both the weight and bias information; in the testing phase, it then outputs predictions that are expected to approximate the desired outputs for input data having statistical properties similar to those of the training data.
In this model, the two inputs have independent input connections, whose complexity is characterized by $\mathcal{O}[n_{i1}n_1 + n_{i2}n_1]$. The complexity of the hidden layers and of the output layer is identical to that of the conventional NN, so the overall computational complexity of the model is obtained by summing these contributions. For our SD-JCED system, we consider a DNN architecture similar to that of [36], but with a different output. Since the conventional SD detector obtains the LLRs of the received signal at the output of the CS-MIM soft demodulator, we replace the detected bits $\hat{\mathbf{u}}$ by the extrinsic LLRs $L_e$ at the output. Then the output of the SD DNN model can be expressed as and the corresponding loss function is
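For the SD variant, the regression target changes from bits to extrinsic LLRs. A minimal sketch of the quantities involved, assuming the convention $L = \ln[P(b=1)/P(b=0)]$ and the usual definition of the extrinsic LLR as the a-posteriori LLR minus the a-priori LLR. The MSE-on-LLR loss and all function names are assumptions, since the SD loss function is not reproduced here.

```python
import numpy as np


def prob_to_llr(p1, eps=1e-12):
    """LLR = ln(P(b=1) / P(b=0)); probabilities are clipped to avoid infinite LLRs."""
    p1 = np.clip(np.asarray(p1, dtype=float), eps, 1.0 - eps)
    return np.log(p1 / (1.0 - p1))


def extrinsic_llr(l_posterior, l_apriori):
    """Extrinsic LLR L_e = a-posteriori LLR minus a-priori LLR."""
    return np.asarray(l_posterior, dtype=float) - np.asarray(l_apriori, dtype=float)


def llr_mse_loss(l_hat, l_target):
    """(1/B) * sum_b ||L_hat_b - L_e,b||^2, used when the DNN regresses LLRs instead of bits."""
    d = np.asarray(l_hat, dtype=float) - np.asarray(l_target, dtype=float)
    return float(np.mean(np.sum(d.reshape(d.shape[0], -1) ** 2, axis=1)))
```

Passing only the extrinsic part to the channel decoder avoids feeding back information the decoder already supplied as a-priori input.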

IV. SIMULATION RESULTS AND ANALYSIS
In this section, we characterize the learning-aided CS-MIM system proposed in Section III relying on both HD and SD. The performance of the conventional detector is also presented for comparison with the proposed methods. We consider systems having $N_t = 4, 8$ with 2 RF chains. More specifically, only the antenna-selection bits $b_{g,3}$ are changed. Furthermore, we also investigate the performance of the proposed methods under different channel conditions. To characterize the channel conditions, we adjust the normalized maximum Doppler frequency $f_m$ in order to emulate both slow- and fast-fading channels.

[Figure: ... of CS-MIM in Discrete-Input Continuous-Output Memoryless Channels (DCMC) for both the neural network model and conventional CE methods.]

For the CS-MIM system with 8 TAs at both the transmitter and receiver, Scheme 1a) achieves a BER of $10^{-4}$ at an SNR of about 1.95 dB under the assumption of perfect CSI at the receiver. In this case, we achieve the highest throughput, $R_t = 1.333$ bits/sec/Hz, as shown in Fig. 10. However, in more realistic situations, pilots are required by the CE techniques, which imposes a pilot overhead. In our simulations, a 1% pilot overhead indicates that one pilot symbol is transmitted for every 100 symbols. As shown in Figs. 10 and 11, Scheme 1b) achieves an improved performance, but at an increased pilot overhead. Scheme 1b) associated with a 10% pilot overhead achieves results similar to those of Scheme 1c). Furthermore, Scheme 1b) associated with 2% and 5% overheads exhibits a 4 dB and 1.7 dB discrepancy, respectively, with respect to the ideal Scheme 1a) at a BER of $10^{-4}$. When the JCED of Scheme 1c) is applied at the receiver, it significantly reduces the pilot overhead and yet attains a near-ML performance. More specifically, with 3 JCED iterations, Scheme 1c) achieves a BER of $10^{-4}$ at an SNR only 0.1 dB worse than that of the ML detector of Scheme 1a), while using very few pilots.
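To make the overhead percentages concrete, the sketch below places pilots on a uniform comb, one every $\mathrm{round}(1/\text{overhead})$ symbols. The comb placement starting at index 0, the 1000-symbol frame and the function name are assumptions for illustration, not details given in the source.

```python
def pilot_positions(n_symbols, overhead):
    """Comb-type pilot placement: one pilot every round(1/overhead) symbols, starting at 0."""
    if not 0 < overhead <= 1:
        raise ValueError("overhead must be in (0, 1]")
    spacing = round(1 / overhead)
    return list(range(0, n_symbols, spacing))


for ov in (0.01, 0.02, 0.05, 0.10):
    pilots = pilot_positions(1000, ov)
    print(f"{ov:.0%} overhead: {len(pilots)} pilots, {1000 - len(pilots)} data symbols")
```

For a 1000-symbol frame this yields 10, 20, 50 and 100 pilots for the 1%, 2%, 5% and 10% overheads, respectively; the pilots carry no payload, so every extra pilot directly reduces the data throughput.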
We also analyze the detection performance of the system with 4 TAs. Fig. 12 also shows the DCMC of Scheme 1 and Scheme 2. With fewer antennas, the performance of CS-MIM is reduced owing to the reduced spatial sparsity. Scheme 1a) having $N_t = 4$ TAs achieves a BER of $10^{-4}$ at an SNR of 4.3 dB, as shown in Fig. 13. Similarly to the $N_t = 8$ case, Scheme 1b) also requires a higher pilot overhead for achieving a high performance. With a 2% pilot overhead, Scheme 1b) is about 5 dB worse than Scheme 1a), and the gap is reduced to 1.6 dB for a 5% pilot overhead. However, Scheme 1c) still succeeds in achieving near-capacity performance, as shown in Fig. 13.
As discussed in Section III, the ML detector applies an exhaustive search, whose complexity order is given there. Although the DNN-based JCED model requires at least 3 iterations to achieve near-ML performance, which triples its complexity relative to the conventional DNN model, its complexity is still orders of magnitude lower than that of both the conventional JCED and the CE-ML detection methods.
³ In this case, we assume that the network has $I$ layers, where the $i$-th layer has $n^h_i$ neurons ($i = 1, 2, 3, \ldots, I$), while $n_i$ and $n_o$ represent the numbers of neurons in the input and output layers.
We also conducted simulations for two variants of Scheme 2. By leveraging CSI estimated by Scheme 1b with a high pilot overhead, the model can be efficiently trained to achieve improved detection performance, even with a reduced pilot overhead in challenging channel conditions. As depicted in Fig. 10, Scheme 2a exhibits a performance approximately 2 dB inferior to that of Scheme 1a. By employing the iteratively updated CE model, Scheme 2b further reduces the CE error, resulting in a mere 0.9 dB loss at a BER of $10^{-4}$. Notably, Scheme 2b achieves an improvement of about 1 dB over Scheme 2a at a computational complexity of roughly $3\times10^4$. This increase in complexity may be deemed acceptable, especially when compared to the complexity of Scheme 1b ($1.2\times10^6$) and to that of Scheme 1a ($8.5\times10^6$) over three iterations. We also investigate the system associated with $N_t = 4$ TAs, where the performance of Scheme 2 is slightly degraded owing to its eroded diversity gain. Scheme 2b) attains a BER of $10^{-4}$ at an SNR of 5.1 dB, while the conventional CE-aided DNN of Scheme 2a) performs 0.8 dB worse than Scheme 2b).
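Taking the quoted complexity figures at face value, the relative savings of Scheme 2b can be checked directly:

```latex
\frac{1.2\times 10^{6}}{3\times 10^{4}} = 40 \approx 10^{1.60},
\qquad
\frac{8.5\times 10^{6}}{3\times 10^{4}} \approx 283 \approx 10^{2.45}
```

That is, Scheme 2b is roughly 1.6 and 2.5 orders of magnitude less complex than Scheme 1b and the three-iteration figure quoted for Scheme 1a, respectively, given that its own complexity is only stated approximately.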
Additionally, we compare the performance for different Doppler frequencies. Specifically, we vary the normalized Doppler frequency $f_m$ to emulate channels ranging from slow to fast variations. In Fig. 14, we consider a channel with a normalized Doppler frequency of $f_m = 2\times10^{-6}$, while we used $f_m = 10^{-6}$ for the results in Fig. 11. Scheme 1a yields the same results as in Fig. 10, while Scheme 1b with a 10% overhead incurs a 0.7 dB loss at a BER of $10^{-4}$, compared to the 0.1 dB difference in Fig. 11. In this context, Scheme 1c with a 5% overhead attains a higher CE accuracy than Scheme 1b, although it requires a higher overhead than in Fig. 11 to achieve near-optimal performance. Similarly, despite the increased overhead of Scheme 2a and Scheme 2b aimed at enhancing the CE accuracy and detection performance, they exhibit losses of 0.3 dB and 1.3 dB, respectively, at a BER of $10^{-4}$ compared to their counterparts in Fig. 11. This suggests that Scheme 1c and Scheme 2b offer some resilience against rapidly varying channels. As illustrated in Fig. 15 for $f_m = 10^{-5}$, Scheme 2a is 2.6 dB inferior to Scheme 1b in Fig. 11, while Scheme 2b lags by 1.2 dB compared to Scheme 1c in Fig. 11.
Let us now consider the performance of SD detection, where we employ a half-rate RSC encoder, as shown in Table 4. The maximum achievable rate is then $R_t = 0.66667$ bits/sec/Hz for the system with $N_t = 8$ and $R_t = 0.61111$ bits/sec/Hz for the system with $N_t = 4$. As shown in Fig. 18, Scheme 3a) achieves a BER of $10^{-4}$ at an SNR of −1.83 dB with perfect CSI at the receiver. In practice, CE is required, which imposes a high pilot overhead. As expected, the JCED of Scheme 3c) achieves near-ML performance, attaining a BER of $10^{-4}$ at an SNR of −1.8 dB with few pilots and a moderate increase in complexity. For $N_t = 4$, the JCED is only 0.14 dB worse than the ML detector. For NN-based CE and detection, the conventional model of Scheme 4a) exhibits a gap of about 2 dB at a BER of $10^{-4}$ with respect to Schemes 3b) and 3c). With the aid of DNN-based JCED, this gap is narrowed to 1 dB using 3 iterations. For the system with $N_t = 4$, DNN-JCED is even more effective, owing to the significantly reduced number of TAs and RAs.
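The coded rates follow from multiplying the uncoded throughput by the code rate: assuming that the quoted 1.333 bits/sec/Hz is exactly 4/3, a half-rate code gives $\frac{1}{2}\cdot\frac{4}{3} = \frac{2}{3} \approx 0.66667$, and 0.61111 similarly corresponds to an uncoded $11/9 \approx 1.222$ bits/sec/Hz for $N_t = 4$. The sketch also includes a rate-1/2 RSC encoder; since Table 4 is not reproduced here, the (1, 5/7) octal generators are an assumed textbook example, not necessarily the code used in the simulations, and the function names are hypothetical.

```python
from fractions import Fraction


def rsc_encode(bits):
    """Rate-1/2 recursive systematic convolutional encoder, octal generators (1, 5/7).

    Feedback polynomial 7 = 1 + D + D^2, feedforward polynomial 5 = 1 + D^2.
    Output is interleaved [systematic_0, parity_0, systematic_1, parity_1, ...];
    trellis termination is omitted for brevity.
    """
    s1 = s2 = 0                      # shift-register contents a_{k-1}, a_{k-2}
    out = []
    for u in bits:
        a = u ^ s1 ^ s2              # recursive (feedback) bit
        p = a ^ s2                   # parity bit
        out += [u, p]
        s1, s2 = a, s1
    return out


def coded_throughput(code_rate, uncoded_throughput):
    """Throughput after channel coding: R_t = R_c * (uncoded bits per channel use)."""
    return code_rate * uncoded_throughput


print(rsc_encode([1, 0, 0, 0]))                                 # [1, 1, 0, 1, 0, 1, 0, 0]
print(float(coded_throughput(Fraction(1, 2), Fraction(4, 3))))  # 0.6666666666666666
```

Using `Fraction` keeps the rate arithmetic exact, so the 0.66667 and 0.61111 figures can be traced back to 2/3 and 11/18 without rounding noise.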
In Figs. 16 and 17, we compare the performance of HD and SD detection. In Fig. 16, the benefit of SD is clearly visible: it provides a sharp BER reduction at an SNR of about −3 dB, while Scheme 3a) requires an SNR of −2.2 dB at a BER of $10^{-4}$. Both Scheme 1b) and Scheme 1c) are capable of achieving near-optimal results. For learning-based CE and detection, Scheme 3a) and Scheme 3b) perform slightly worse than the conventional Scheme 1, while the SD scheme attains a 3.1 dB and 2.9 dB gain over the HD-aided Scheme 2a) and Scheme 2b), respectively. As expected, the performance improvement of SD is smaller for $N_t = 4$ than for $N_t = 8$, as shown in Fig. 17, where the gap between Scheme 1a) and Scheme 3a) is about 2.4 dB at a BER of $10^{-4}$, compared to 3.6 dB in Fig. 16.
Fig. 18 also characterizes the learning-aided JCED-SD detection methods applied to our CS-MIM system. The NN-based JCED method is about 0.3 dB worse than the ML detector with perfect CSI at the receiver. With more transmit antennas, the performance degrades slightly. However, the complexity of the DNN-based JCED is far lower than that of the conventional JCED method for the $N_t = 8$ system. With more iterations, the NN model attains an improved performance. In terms of complexity, the proposed learning method has a complexity order of $\mathcal{O}[2n_{i1}n_1 + \ldots$, compared to $\mathcal{O}[\ldots N_{AC}N_{SI}(QL)^K + c_g 2^{c_g} N_{AC}N_{SI}(QL)^K]$ for the conventional scheme, where $I_{it}$ denotes the number of iterations.

V. CONCLUSION
Both conventional and learning-assisted JCED of CS-MIM were proposed, relying on HD and SD. Our analysis shows that JCED has the potential to reduce the pilot overhead and yet improve the detection performance compared to separate CE and detection. In our simulations, we first evaluated the conventional HD JCED of CS-MIM systems communicating over Rayleigh fading channels and showed that the learning-aided JCED is capable of achieving a similar performance at a reduced complexity. Then, a DNN model with subgroups was designed for SD JCED in CS-MIM systems, which is capable of approaching the performance of the conventional SD CS-MIM system at a reduced computational complexity. In summary, our studies and simulation results have shown that the conventional JCED is capable of achieving a BER performance similar to that of the ML detector with idealized CSI.

FIGURE 6. Fully-connected DNN model for CS-MIM channel estimation and data detection.

FIGURE 7. Separate fully-connected DNN model for CE in CS-MIM systems.

FIGURE 8. Separate fully-connected DNN model for detection in CS-MIM systems.
where $\mathbf{H}_p$ denotes the pilot symbol, and a fully connected layer is used as the output layer to learn the CSI. A variety of pilot overheads is considered in the training phase to enhance the CE performance of the trained model under diverse channel conditions. Then we can obtain the output of the DNN-aided CE as

We assume that the system's signalling rate is 100 MBaud and the maximum Doppler frequency is 100 Hz, which corresponds to a normalised Doppler frequency of $f_m = 10^{-6}$. The resulting BER performance is evaluated by Monte-Carlo simulations. Using the parameters summarized in Table 4 and the parameters of the learning models of Scheme 2 and Scheme 4 outlined in Tables 5 and 6, we investigate a set of five schemes for $N_t = 4, 8$, respectively, which are summarised as follows:
1) Scheme 1: HD ML-based detection of the CS-MIM system with $N_t = N_r = 4, 8$ TAs and RAs.
   a) Perfect CSI at the receiver.
   b) MMSE CE and ML detection.
   c) MMSE-aided JCED.
2) Scheme 2: HD DNN-aided CE and detection of the CS-MIM system with $N_t = N_r = 4, 8$ TAs and RAs.
   a) Conventional DNN-aided CE and detection.
   b) DNN-aided JCED with $I_{\max} = 3$ iterations.
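The normalised Doppler frequency is the maximum Doppler shift multiplied by the symbol period, $f_m = f_D/R_s$, so 100 Hz at 100 MBaud gives $10^{-6}$. The sketch below checks this and generates a slowly varying Rayleigh tap with a sum-of-sinusoids (Clarke-type) model. The sum-of-sinusoids generator, its 16 paths and the function names are assumptions, since the source does not state how its frequency-selective Rayleigh channel is simulated; in a frequency-selective setting each delay tap would use an independent realization.

```python
import numpy as np


def normalized_doppler(f_d_hz, symbol_rate_baud):
    """f_m = f_D * T_s = f_D / R_s (maximum Doppler shift per symbol period)."""
    return f_d_hz / symbol_rate_baud


def sos_rayleigh(n_symbols, f_m, n_paths=16, rng=None):
    """Unit-power Rayleigh fading tap sampled once per symbol, via a sum of sinusoids.

    Each path arrives at a random angle alpha_m with a random phase phi_m, so its
    Doppler shift is f_m * cos(alpha_m) cycles per symbol (Clarke's model).
    """
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(n_symbols)[:, None]             # symbol index, column vector
    alpha = rng.uniform(0.0, 2 * np.pi, n_paths)  # angles of arrival
    phi = rng.uniform(0.0, 2 * np.pi, n_paths)    # initial phases
    phase = 2 * np.pi * f_m * np.cos(alpha) * k + phi
    return np.exp(1j * phase).sum(axis=1) / np.sqrt(n_paths)


R_s = 100e6                             # 100 MBaud
print(normalized_doppler(100.0, R_s))   # 1e-06
h = sos_rayleigh(1000, 1e-6, rng=np.random.default_rng(0))
```

By the same relation, the $f_m = 2\times10^{-6}$ and $f_m = 10^{-5}$ cases correspond to maximum Doppler frequencies of 200 Hz and 1 kHz at 100 MBaud. At $f_m = 10^{-6}$ the phase of each path drifts by only about $2\pi\times10^{-3}$ rad over 1000 symbols, which is why a CSI estimate can be tracked across consecutive symbols.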

FIGURE 12. BER performance comparison of the HD detectors of Schemes 1 and 2 with $N_t = 4$ under $f_m = 10^{-6}$. The simulation parameters are listed in Tables 4-6.

FIGURE 13. BER performance comparison of the HD detectors of Schemes 1 and 2 with $N_t = 4$ under $f_m = 10^{-6}$. The simulation parameters are listed in Tables 4-6.

FIGURE 15. BER performance comparison of the HD detectors of Schemes 1 and 2 with $N_t = 8$ under $f_m = 10^{-5}$. The simulation parameters are listed in Tables 4-6.

FIGURE 16.BER performance comparison of HD and SD detector of Schemes 1-4 with N t = 8 under f m = 10 −6 .Our simulation parameter are shown in Tables 4-6.

FIGURE 17. BER performance comparison of the HD and SD detectors of Schemes 1-4 with $N_t = 4$ under $f_m = 10^{-6}$. The simulation parameters are listed in Tables 4-6.

TABLE 3. Look-Up Table Example of Antenna Selection in the CS-MIM System Having $M = 2$, $N_t = 4$
LSCE $[N_rN_tMTN_f^2]$. To minimize the estimation Mean Square Error (MSE) of $\mathbf{H}$, the popular MMSE-CE is formulated as . To elaborate, the complexity of LSCE is dominated by the CSI matrix inversion and multiplication. Then we can characterize the complexity of LSCE by the complexity order of O
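Since the MMSE-CE expression is not recoverable here, the sketch below uses the textbook LS and linear MMSE estimators for a narrowband model $\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{N}$, assuming i.i.d. $\mathcal{CN}(0,1)$ channel entries and a known noise variance. It ignores the $M$, $T$ and $N_f$ structure of the complexity order above, and the function names and pilot design are hypothetical; it does illustrate why the matrix inversion (here a linear solve) dominates the cost.

```python
import numpy as np


def ls_ce(Y, X):
    """Least-squares CE for Y = H X + N: H_LS = Y X^H (X X^H)^{-1}."""
    Xh = X.conj().T
    # Solve Z A = Y X^H for Z instead of forming the explicit inverse of A = X X^H.
    return np.linalg.solve((X @ Xh).T, (Y @ Xh).T).T


def lmmse_ce(Y, X, noise_var):
    """Linear MMSE CE assuming i.i.d. CN(0, 1) channel entries:
    H_MMSE = Y X^H (X X^H + noise_var * I)^{-1}."""
    Xh = X.conj().T
    A = X @ Xh + noise_var * np.eye(X.shape[0])
    return np.linalg.solve(A.T, (Y @ Xh).T).T


rng = np.random.default_rng(0)
n_r, n_t, T = 4, 4, 16
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
X = np.exp(1j * np.pi / 2 * rng.integers(0, 4, (n_t, T)))  # unit-modulus QPSK pilots
noise_var = 0.1
N = np.sqrt(noise_var / 2) * (rng.standard_normal((n_r, T)) + 1j * rng.standard_normal((n_r, T)))
Y = H @ X + N
H_ls, H_mmse = ls_ce(Y, X), lmmse_ce(Y, X, noise_var)
```

The MMSE estimator only adds a regularizing $\sigma^2\mathbf{I}$ term to the LS Gram matrix, so both share the same cubic cost of solving an $N_t \times N_t$ system per estimate.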