A MIMO Detector With Deep-Neural-Network for Faster-Than-Nyquist Optical Wireless Communications

Conventional multiple input multiple output (MIMO) detection algorithms face challenges related to computational complexity and limited performance when handling high-dimensional inputs and complex channel conditions. In order to enhance signal recovery accuracy in atmospheric turbulence channels for faster-than-Nyquist (FTN) optical wireless communication (OWC) systems, a deep learning (DL) based MIMO detector is proposed. By leveraging a deep neural network (DNN), it becomes possible to learn nonlinear mappings within MIMO systems, resulting in improved detection performance while reducing computational overheads. Simulation results validate that our proposed DNN detector achieves comparable performance to the maximum likelihood (ML) method, while reducing complexity by 40%.

Minghua Cao, Ruifang Yao, Qinxue Sun, Yue Zhang, Qing Yang, and Huiqin Wang
Index Terms: Deep neural network, faster-than-Nyquist, multiple input multiple output, optical wireless communication.

I. INTRODUCTION
The implementation of fast multiplexing and demultiplexing for non-orthogonal frequency-division multiplexing (NOFDM) is achieved through the inverse fast Fourier transform and the fast Fourier transform [1]. The authors propose a novel approach called non-orthogonal discrete multitone (NODMT), which combines the advantages of NOFDM and discrete multi-tone (DMT). This integration has the potential to significantly enhance spectral efficiency in future optical networks. The multiple input multiple output (MIMO) system, renowned for its capability to improve spectral efficiency and enhance link reliability, has emerged as a prevalent technology in contemporary wireless communication standards [2]. Compared to single input single output (SISO) systems, MIMO systems effectively utilize spatial resources and augment channel capacity without requiring additional bandwidth, because each receiver antenna can simultaneously receive signals transmitted by all transmitting antennas [3]. Mazo demonstrated in 1975 that higher transmission rates could be achieved using faster-than-Nyquist (FTN) technology [4], which has drawn renewed interest with the growing demand for high data rate communication networks. In [5], the authors introduced the concept of non-orthogonal wavelength division multiplexing (WDM) and provided a comprehensive review of the principle of FTN. A performance comparison between FTN and constellation shaping was also presented. The authors extensively discussed the underlying principles of FTN signaling, emphasizing its orthogonality and distinguishability relative to the Nyquist and Mazo limits. Furthermore, they demonstrated single carrier time domain FTN signals using the cascaded binary-phase-shift-keying iterative detection (CBID) algorithm [6]. In [7], the authors investigated the capacity of FTN signaling for both frequency flat and frequency selective (FS) MIMO channels. Considering that FTN introduces additional frequency selectivity, it was found that precoding in time (or, equivalently, in frequency over the spectrum) combined with waterfilling in the spatial domain achieves capacity for frequency flat MIMO channels with FTN. The combination of FTN and MIMO technology further enhances spectrum efficiency by artificially compressing symbol intervals to transmit more symbols. However, it also increases the complexity of signal detection. Maximum likelihood (ML) detection [8] is the optimal choice in terms of performance. However, its computational complexity grows exponentially with both the modulation order and the number of antennas, posing challenges for practical implementation. On the other hand, algorithms with lower computational complexity, such as zero forcing (ZF) [9] and minimum mean square error (MMSE) [10], achieve detection through straightforward linear transformations but exhibit a significant gap compared to the ML algorithm and perform unsatisfactorily for FTN signals with a high acceleration factor.
In recent years, deep learning (DL) has garnered significant attention from both academia and industry due to its powerful learning capabilities. The main advantage of DL lies in its ability to extract crucial information from pre-labeled training data [11]. Consequently, DL approaches have become increasingly popular for solving MIMO signal detection problems. For example, [12] explores an unsupervised DL-based MIMO detection method that utilizes an autoencoder to learn the entire system. Moreover, DetNet [13] represents one of the earliest DL-based approaches for MIMO detection; it employs a model-based algorithm that unfolds the iterations of the projected gradient descent method. In [14], the authors proposed a simplified version of DetNet by reducing the input dimensions and simplifying the network structure. Corlay et al. [15] suggest replacing the sigmoid activation function used in DetNet with a multi-plateau version and implementing two networks with distinct initial values to detect the transmitted signals simultaneously, improving detection performance by selecting the solution with the smaller loss. In [16], the authors conducted a comprehensive evaluation of four prominent model-based deep learning techniques, namely DetNet, MMNet, GEPNet, and Recurrent Equivariant MIMO (RE-MIMO), which are based on different working principles. They assessed the reliability, complexity, and robustness of these techniques against the practical Massively Parallel Non-Linear (MPNL) processing detection approach. Similar to DetNet, the orthogonal approximate message passing network (OAMP-Net) was developed by unfolding the OAMP algorithm [17], [18]. It has been shown that OAMP-Net requires minimal training time and is capable of adapting to varying channels. However, it necessitates prior estimation of the noise variance. Additionally, two data-driven approaches, based on a deep neural network (DNN) and a convolutional neural network (CNN), are employed for MIMO detection over a fixed channel in [19]. Another data-driven approach is presented in [20], where conventional DL network structures are utilized for signal detection in a typical MIMO system under an erroneous channel scenario. In [21], the authors propose a DL-based detection method that uses neural networks to obtain optimal decision regions for multi-user MIMO systems. Furthermore, in [22], a DNN-based MIMO detection method is proposed for an optical transmission system, indicating the growing trend of applying DL to communication technology. In [23], the authors propose the modified expectation propagation network (MEPNet), which employs a DL scheme and unfolds the iterative modified expectation propagation detector (MEPD) to provide the best damping factor and initial variance. The authors in [24] propose a parallel detection network (PDN) that achieves a significant diversity effect by incorporating a tailored loss function and minimizing the similarity between detection networks. Notably, the performance of PDN improves substantially with an increasing number of parallel detection networks in time-varying MIMO channels. The authors of [25] introduce the learn iterative search algorithm (LISA), which treats the signal detection problem as a tree-based decision problem with the objective of learning the optimal decision strategy. In [26], the authors develop a model-driven DL detector based on variational Bayesian inference, whose unfolded DL architecture is inspired by the non-invertible variational Bayesian learning framework and avoids matrix inversion by maximizing the relaxed evidence lower bound. In [27], the authors propose an efficient data-driven detection network, i.e., the accelerated multiuser interference cancellation network (AMIC-Net), for uplink massive MIMO systems.
However, the design and implementation of DL-based MIMO detection algorithms present their own challenges. These challenges encompass the selection of an appropriate network architecture, the optimization of hyperparameters, the mitigation of overfitting, and the management of the computational complexity associated with training large-scale DNNs. Nonetheless, numerous research endeavors have demonstrated the feasibility and efficacy of DL-based MIMO detection in various wireless communication scenarios. In this research context, this paper is dedicated to overcoming the high complexity of DL detection algorithms, with a particular focus on time complexity. To this end, a DNN-based detection method is proposed for pulse position modulation (PPM) signal detection in MIMO-FTN OWC systems.
The remainder of this paper is organized as follows: Section II presents the system model, and Section III describes the detection scheme. Numerical results and complexity analysis are presented in Section IV. Finally, Section V concludes the paper.

II. SYSTEM MODEL
Traditionally, intensity modulation/direct detection based on on-off keying (OOK) is widely accepted in OWC owing to its easy implementation and lower cost [28]. To further improve spectrum efficiency and anti-interference capability, PPM has been considered for OWC. Compared with OOK, PPM significantly increases the data transmission rate and system reliability [29]. Fig. 1 illustrates a DNN-based MIMO-FTN OWC system utilizing 4PPM modulation. User data is initially encoded using Gray code and then mapped into 4PPM format. Subsequently, the mapped signal undergoes FTN shaping via a filter [30] before being converted from digital to analog through a digital-to-analog converter (DAC) and transmitted through multiple optical antennas into the atmospheric channel. At the receiver end, the optical signals are received by multiple optical antennas and converted into electrical signals, which are then forwarded for analog-to-digital conversion (ADC), matched filtering, and sampling. Finally, the signal is sent to the DL module for data recovery.
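The Gray encoding and 4PPM mapping step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the particular Gray labelling (00, 01, 11, 10 for slots 0 to 3) and the function names `bits_to_4ppm` and `ppm4_to_bits` are assumptions introduced here.

```python
import numpy as np

# Assumed Gray labelling: neighbouring slot indices differ in exactly one bit.
GRAY_TO_SLOT = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
SLOT_TO_GRAY = {slot: bits for bits, slot in GRAY_TO_SLOT.items()}


def bits_to_4ppm(bits):
    """Map an even-length bit sequence to 4PPM frames of four chips each."""
    pairs = np.asarray(bits, dtype=int).reshape(-1, 2)
    slots = np.array([GRAY_TO_SLOT[tuple(int(b) for b in pair)] for pair in pairs])
    frames = np.zeros((len(slots), 4), dtype=int)
    frames[np.arange(len(slots)), slots] = 1  # exactly one pulse per frame
    return frames


def ppm4_to_bits(frames):
    """Invert bits_to_4ppm by locating the pulse position in each frame."""
    slots = np.argmax(np.asarray(frames), axis=1)
    return np.array([SLOT_TO_GRAY[int(s)] for s in slots]).reshape(-1)


tx_bits = [0, 0, 0, 1, 1, 1, 1, 0]
tx_frames = bits_to_4ppm(tx_bits)
print(tx_frames)
print(ppm4_to_bits(tx_frames))
```

Each pair of bits selects one of four chip positions, so every frame carries two bits with a single pulse.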
According to the Nyquist criterion, in a bandwidth-constrained channel, the maximum code rate for high-speed data transmission should not exceed twice the channel bandwidth if inter-symbol interference (ISI) is to be avoided. If this limit is exceeded, severe ISI will occur and degrade the system's bit error rate (BER) performance. This maximum rate for ISI-free transmission is referred to as the Nyquist rate. Therefore, for an ideal low-pass channel with bandwidth W, the symbol transmission rate must not exceed 2W Bd, where 2W Bd is defined as the Nyquist rate [31]. For an ideal band-pass channel, the corresponding Nyquist rate is W Bd. In FTN communication, the symbol transmission rate exceeds 2W Bd while the low-pass channel bandwidth remains W, so FTN technology directly improves spectrum efficiency [32]. Assuming that $h(t)$ represents a band-limited pulse with finite energy and $H(f)$ is its Fourier transform, $h(t)$ is considered T-orthogonal when it satisfies (1):

$$\int_{-\infty}^{\infty} h(t)\, h^{*}(t - nT)\, dt = 0, \quad n = \pm 1, \pm 2, \ldots \tag{1}$$
Since the pulse $\mathrm{sinc}(t) = \sin(\pi t/T)/(\pi t/T)$ is a T-orthogonal pulse, the pulse $g_T(t)$ can be obtained by normalizing its energy:

$$g_T(t) = \frac{1}{\sqrt{T}}\,\frac{\sin(\pi t/T)}{\pi t/T}$$

Using the pulse $g_T(t)$ to transmit binary data with a symbol period $T$, assuming the transmitted data is an all "1" sequence and there is no ISI, this transmission mode is referred to as orthogonal transmission. In this case, the transmitted signal can be represented as:

$$s(t) = \sum_{n=N_1}^{N_2} a_n\, g_T(t - nT)$$

where $a_n \in \{-1, 1\}$, and $N_1$ and $N_2$ are integers that satisfy $N_2 > N_1$. Assuming transmission through an additive white Gaussian noise (AWGN) channel with a two-sided noise power spectral density of $N_0/2$, the BER at the receiver using optimal detection can be expressed as:

$$P_e = \frac{1}{2}\,\mathrm{erfc}\!\left(\sqrt{\frac{E}{N_0}}\right)$$

where $\mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}}\int_x^{\infty} e^{-u^2}\,du$ is the complementary error function and $E$ is the energy of the pulse $g_T(t)$. Sending data with smaller time intervals while keeping the transmission power constant can be expressed as:

$$s(t) = \sum_{n} a_n\, g_T(t - n\tau T)$$

where $a_n$ denotes the information carried by the $n$-th symbol, $\tau$ signifies the time acceleration factor ($0 < \tau \le 1$), which characterizes the Nyquist compression ratio, $T$ is the Nyquist symbol period, and $T' = \tau T$ is the compressed symbol period. It should be noted that when $\tau = 1$, the entire process is equivalent to Nyquist transmission without any ISI. When $0 < \tau < 1$, so that $T' < T$, the system can be considered an FTN transmission system. FTN improves the spectral efficiency by reducing the distance between adjacent symbols within a limited bandwidth.
Fig. 2 illustrates the transmitted symbols with Nyquist and FTN signaling, respectively. With FTN signaling, each symbol's waveform is subject to interference from other symbols, thereby introducing challenges in demodulation and symbol detection.
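The ISI introduced by FTN signaling can be made concrete with a short numerical sketch. It assumes a unit-energy sinc pulse and an ideal matched filter, in which case the sampled pulse autocorrelation at a lag of $k\tau T$ equals $\mathrm{sinc}(k\tau)$. The helper name `isi_taps` is introduced here for illustration only.

```python
import numpy as np


def isi_taps(tau, span=5):
    """Matched-filter taps g[k] = sinc(k * tau) for lags k = -span..span.

    np.sinc(x) is the normalized sinc, sin(pi x) / (pi x), which is the
    autocorrelation of a unit-energy sinc pulse sampled every tau * T.
    """
    k = np.arange(-span, span + 1)
    return k, np.sinc(k * tau)


for tau in (1.0, 0.9, 0.8):
    k, g = isi_taps(tau)
    largest_isi = np.abs(g[k != 0]).max()
    print(f"tau = {tau}: largest off-peak tap = {largest_isi:.4f}")
```

With $\tau = 1$ every off-peak tap vanishes (Nyquist signaling), while for $\tau < 1$ the neighbouring symbols leak into each sample, which is the interference the detector has to undo.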
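The atmospheric channel fading coefficient $h$ follows the Gamma-Gamma distribution [33], and its probability density function can be expressed as:

$$f(h) = \frac{2(\alpha\beta)^{(\alpha+\beta)/2}}{\Gamma(\alpha)\Gamma(\beta)}\, h^{\frac{\alpha+\beta}{2}-1}\, K_{\alpha-\beta}\!\left(2\sqrt{\alpha\beta h}\right), \quad h > 0$$

where $K_{\alpha-\beta}(\cdot)$ is the modified Bessel function of the second kind of order $\alpha-\beta$; $\Gamma(\cdot)$ is the Gamma function; $\alpha$ and $\beta$ are the large- and small-scale scattering coefficients, respectively. The values of $\alpha$ and $\beta$ can be determined as

$$\alpha = \left[\exp\!\left(\frac{0.49\,\sigma^2}{\left(1 + 0.18\,d^2 + 0.56\,\sigma^{12/5}\right)^{7/6}}\right) - 1\right]^{-1}$$

$$\beta = \left[\exp\!\left(\frac{0.51\,\sigma^2\left(1 + 0.69\,\sigma^{12/5}\right)^{-5/6}}{\left(1 + 0.9\,d^2 + 0.62\,d^2\sigma^{12/5}\right)^{5/6}}\right) - 1\right]^{-1}$$

where $\sigma^2 = 0.5\,C_n^2 k^{7/6} L^{11/6}$ is the Rytov variance, $C_n^2$ is the refractive-index structure constant, $L$ is the transmission distance, $k = 2\pi/\lambda$ is the optical wave number, $\lambda$ is the wavelength, $d = \sqrt{kD^2/(4L)}$, and $D$ is the receiver aperture diameter.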
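A Gamma-Gamma fading coefficient can be simulated as the product of two independent unit-mean Gamma variates. The sketch below assumes the commonly cited aperture-averaged expressions for $\alpha$ and $\beta$ with $d = \sqrt{kD^2/(4L)}$. The wavelength (1550 nm) and aperture diameter (1 cm) are hypothetical values chosen for illustration; only the 1000 m distance and the $C_n^2$ values are taken from the simulation discussion in this paper.

```python
import numpy as np


def gamma_gamma_params(cn2, wavelength, distance, aperture):
    """Return (alpha, beta) from the Rytov variance (assumed standard model)."""
    k = 2.0 * np.pi / wavelength                                # wave number
    sigma2 = 0.5 * cn2 * k ** (7 / 6) * distance ** (11 / 6)    # Rytov variance
    d2 = k * aperture ** 2 / (4.0 * distance)                   # d squared
    s = sigma2 ** (6 / 5)                                       # sigma^(12/5)
    a_exp = 0.49 * sigma2 / (1 + 0.18 * d2 + 0.56 * s) ** (7 / 6)
    b_exp = (0.51 * sigma2 * (1 + 0.69 * s) ** (-5 / 6)
             / (1 + 0.9 * d2 + 0.62 * d2 * s) ** (5 / 6))
    # expm1 keeps precision when the exponent is tiny (weak turbulence).
    return 1.0 / np.expm1(a_exp), 1.0 / np.expm1(b_exp)


def sample_gamma_gamma(alpha, beta, size, rng):
    """Draw unit-mean Gamma-Gamma samples as a product of two Gamma variates."""
    large_scale = rng.gamma(shape=alpha, scale=1.0 / alpha, size=size)
    small_scale = rng.gamma(shape=beta, scale=1.0 / beta, size=size)
    return large_scale * small_scale


rng = np.random.default_rng(0)
for name, cn2 in (("weak", 1e-17), ("moderate", 1e-14), ("strong", 1e-13)):
    # Hypothetical wavelength and aperture; distance as in the simulations.
    alpha, beta = gamma_gamma_params(cn2, 1550e-9, 1000.0, 0.01)
    h = sample_gamma_gamma(alpha, beta, 100_000, rng)
    print(f"{name}: alpha = {alpha:.3g}, beta = {beta:.3g}, var(h) = {h.var():.3g}")
```

Smaller $\alpha$ and $\beta$ correspond to stronger fluctuations of $h$, which is why the strong-turbulence case shows the largest variance.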
The considered system is a MIMO configuration with $N_t$ transmitting antennas and $N_r$ receiving antennas. The received signal can be mathematically expressed as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$$

where $\mathbf{x}$ denotes the transmitted symbol vector, $\mathbf{H}$ denotes the $N_r \times N_t$ channel matrix, and $\mathbf{n}$ denotes the noise. When employing ML detection, the estimated signal can be mathematically represented as follows:

$$\hat{\mathbf{x}}_{\mathrm{ML}} = \arg\min_{\mathbf{x}} \left\| \mathbf{y} - \mathbf{H}\mathbf{x} \right\|^2$$

where $\mathbf{y}$, $\mathbf{H}$, and $\mathbf{x}$ denote the received signal, channel matrix, and transmitted signal, respectively, and the minimization runs over all candidate transmitted vectors. The ML detection technique achieves optimal performance when there is perfect knowledge of the channel state information (CSI). It is widely acknowledged as the most effective approach for detecting signals in MIMO systems. However, its exponential complexity renders it impractical for real-world applications. Therefore, DL-based algorithms are being considered for signal detection.
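The exhaustive nature of ML detection can be illustrated with a brute-force search over every candidate vector, compared against a zero-forcing baseline. This is a sketch under stated assumptions: the on/off alphabet {0, 1}, the 4 × 2 channel matrix, and the function names are hypothetical and chosen only to keep the example small.

```python
import itertools
import numpy as np


def ml_detect(y, H, alphabet):
    """Exhaustive ML search: argmin over x of ||y - H x||^2."""
    n_t = H.shape[1]
    best, best_cost = None, np.inf
    # The number of candidates is len(alphabet) ** n_t, hence exponential in n_t.
    for cand in itertools.product(alphabet, repeat=n_t):
        x = np.array(cand, dtype=float)
        cost = np.sum((y - H @ x) ** 2)
        if cost < best_cost:
            best, best_cost = x, cost
    return best


def zf_detect(y, H, alphabet):
    """Zero forcing: pseudo-inverse equalization, then nearest alphabet point."""
    x_soft = np.linalg.pinv(H) @ y
    points = np.asarray(alphabet, dtype=float)
    nearest = np.argmin(np.abs(x_soft[:, None] - points[None, :]), axis=1)
    return points[nearest]


# Hypothetical 4 x 2 real-valued channel (N_r = 4, N_t = 2).
H = np.array([[1.0, 0.2], [0.3, 0.9], [0.5, 0.4], [0.1, 0.7]])
x_true = np.array([1.0, 0.0])
rng = np.random.default_rng(0)
y = H @ x_true + 0.01 * rng.standard_normal(4)

print("ML estimate:", ml_detect(y, H, (0.0, 1.0)))
print("ZF estimate:", zf_detect(y, H, (0.0, 1.0)))
print("ML candidates searched:", 2 ** H.shape[1])
```

The loop over `itertools.product` is what makes ML impractical as the alphabet or the number of transmit antennas grows, whereas the ZF path costs a single pseudo-inverse.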

A. The Structure of the DNN
The schematic diagram in Fig. 3 illustrates an 8-layer fully connected DNN, consisting of an input layer, 6 hidden layers, and an output layer. The primary function of the input layer is to receive the initial input data and propagate it through the network. The hidden layers play a crucial role in learning and extracting pertinent features from the input data. The output layer serves as the final stage of the DNN, responsible for generating predictions or outputs based on the processed input data. To achieve multi-class classification, the Softmax activation function is employed to produce the final output of the network. The layers are interconnected through weighted connections that are learned during the training process. As illustrated in Fig. 3, we can describe the complete DNN as a function $f(\cdot)$ that maps input vectors to output vectors through the neuron calculations at each layer. The mapping function of the entire DNN can be represented as:

$$\mathbf{x}_{\mathrm{out}} = f(\mathbf{x}_{\mathrm{in}}; \theta)$$

where $\mathbf{x}_{\mathrm{in}}$ denotes the input vector, $\mathbf{x}_{\mathrm{out}}$ denotes the output vector, and $\theta = \{\theta_1, \theta_2, \ldots, \theta_M\}$ denotes the parameter set of the DNN, which includes the parameter subset of each layer. As depicted in Fig. 4, for an $M$-layered DNN, $\mathbf{x}_{m-1}$ refers to the $m$-th layer's input vector, which is the $(m-1)$-th layer's output vector. The mapping function of the $m$-th layer can be expressed as:

$$\mathbf{x}_m = \rho_m\!\left(\mathbf{w}_m \mathbf{x}_{m-1} + \mathbf{b}_m\right)$$

where $\mathbf{x}_m$ denotes the $m$-th layer's output vector, $\mathbf{w}_m \in \theta_m$ and $\mathbf{b}_m \in \theta_m$ denote the weight and bias of the $m$-th layer, respectively, and $\rho_m(\cdot)$ signifies the activation function, which adds non-linearity to enable arbitrary fitting capability [34]. Furthermore, we have opted to utilize the Sigmoid function [35] as the activation function for output layers and the rectified linear unit (ReLU) [36] activation function for hidden layers. Specifically, these functions can be expressed as:

$$\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}, \qquad \mathrm{ReLU}(x) = \max(0, x)$$

The Sigmoid function yields an output value ranging from 0 to 1, rendering it suitable for binary classification. On the other hand, the ReLU function is simple, computationally cheap, and converges quickly.

Fig. 4. Structure of the m-th hidden layer.
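The layer-by-layer mapping can be sketched as a plain NumPy forward pass. This is a minimal illustration, not the authors' network: the layer widths (8 inputs, 64 units per hidden layer, 4 outputs) are hypothetical, the weights are random rather than trained, and Softmax is used at the output for multi-class decisions.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def init_params(sizes, rng):
    """One (weight, bias) pair per connection between consecutive layers."""
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(x, params):
    """Apply x_m = rho_m(w_m x_{m-1} + b_m) layer by layer."""
    for w, b in params[:-1]:
        x = relu(x @ w.T + b)          # hidden layers use ReLU
    w, b = params[-1]
    return softmax(x @ w.T + b)        # output layer gives class probabilities


rng = np.random.default_rng(0)
sizes = [8] + [64] * 6 + [4]           # input, six hidden layers, output (assumed widths)
params = init_params(sizes, rng)
probs = forward(rng.standard_normal((5, 8)), params)
print(probs.shape, probs.sum(axis=1))
```

An 8-layer network in this sense has seven weight matrices, one for each pair of adjacent layers.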

B. The Structure of the DNN-based MIMO Detector
The proposed DNN-based MIMO-FTN detector is illustrated in Fig. 5. At the receiving end, the signal is sampled and fed into a DL module, which can be considered as a "black box" that decodes the signal using a neural network, thereby achieving demodulation of the signal.

C. Training Procedure and Details
The training process of the DNN consists of the following steps:
(a) Data preparation. Initially, Matlab is utilized to generate a random binary (0/1) sequence of length $1 \times 10^6$. Subsequently, the data sequence is mapped into a 4PPM signal, and the data is divided into training and testing datasets.
(b) Parameter initialization. The weights and biases of the network are randomly initialized.
(c) Forward propagation. Data enters at the input layer and is propagated through the hidden layers, with the output of each layer serving as the input of the next, until the output layer generates the predictions or final result.
(d) Loss calculation. The network's output is compared with the true labels to calculate the value of the loss function.
(e) Backpropagation. The error is propagated from the output layer back to the input layer, and the gradient of the loss function with respect to each parameter is calculated using the chain rule.
(f) Parameter update. Network parameters such as weights and biases are updated from these gradients by an optimization algorithm that minimizes the loss.
(g) Iteration. By repeatedly feeding data into the network and performing the aforementioned steps, the network parameters continue to be updated until a desired criterion is met.
Once all iterations have been completed, the DNN detector can serve as a MIMO system detector equipped with the trained parameters. Specifically, we employ the adaptive moment estimation (Adam) optimizer during the training process.
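Steps (a) to (g) can be traced in a compact NumPy sketch. To stay short it trains a single-layer Softmax classifier rather than the 8-layer DNN, and it uses synthetic noisy 4PPM frames instead of the Matlab-generated MIMO-FTN data. The dataset size, noise level, batch size, and epoch count are assumptions made for illustration; the Adam update itself follows the standard bias-corrected form.

```python
import numpy as np

rng = np.random.default_rng(0)

# (a) Data preparation: random slot labels -> one-hot 4PPM frames plus noise.
n_total = 4000
labels = rng.integers(0, 4, size=n_total)
frames = np.eye(4)[labels] + 0.3 * rng.standard_normal((n_total, 4))
x_train, y_train = frames[:3000], labels[:3000]
x_test, y_test = frames[3000:], labels[3000:]

# (b) Parameter initialization.
params = {"w": 0.01 * rng.standard_normal((4, 4)), "b": np.zeros(4)}
m = {key: np.zeros_like(val) for key, val in params.items()}  # Adam 1st moment
v = {key: np.zeros_like(val) for key, val in params.items()}  # Adam 2nd moment
lr, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8
step = 0

for epoch in range(40):                       # (g) Iteration over epochs.
    order = rng.permutation(len(x_train))
    for start in range(0, len(order), 100):
        idx = order[start:start + 100]
        xb, yb = x_train[idx], y_train[idx]
        rows = np.arange(len(yb))

        # (c) Forward propagation.
        logits = xb @ params["w"].T + params["b"]
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)

        # (d) Loss calculation (cross-entropy).
        loss = -np.mean(np.log(probs[rows, yb]))

        # (e) Backpropagation via the chain rule.
        delta = probs.copy()
        delta[rows, yb] -= 1.0
        delta /= len(yb)
        grads = {"w": delta.T @ xb, "b": delta.sum(axis=0)}

        # (f) Parameter update with Adam.
        step += 1
        for key in params:
            m[key] = beta1 * m[key] + (1 - beta1) * grads[key]
            v[key] = beta2 * v[key] + (1 - beta2) * grads[key] ** 2
            m_hat = m[key] / (1 - beta1 ** step)
            v_hat = v[key] / (1 - beta2 ** step)
            params[key] -= lr * m_hat / (np.sqrt(v_hat) + eps)

test_pred = np.argmax(x_test @ params["w"].T + params["b"], axis=1)
accuracy = np.mean(test_pred == y_test)
print(f"final batch loss = {loss:.4f}, test accuracy = {accuracy:.3f}")
```

Replacing the single linear layer with a stack of ReLU layers changes only steps (c) and (e); the data handling, loss, and Adam update stay the same.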

D. Testing Process
The testing process of a trained DNN involves evaluating the model's performance and generalization ability using an independent testing dataset. Using the learned weights and biases, the testing data is propagated through the trained network via forward propagation to obtain predictions from the output layer. Finally, the Softmax classification result is used for signal detection.
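The final decision step can be sketched as an argmax over the Softmax outputs followed by demapping and a BER count. The Gray labelling of the four slots and the helper names are assumptions introduced for illustration.

```python
import numpy as np

# Assumed Gray labelling of the four PPM slots.
SLOT_TO_BITS = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])


def detect_bits(probs):
    """Pick the most likely slot per frame and demap it to two bits."""
    slots = np.argmax(probs, axis=1)
    return SLOT_TO_BITS[slots].reshape(-1)


def bit_error_rate(tx_bits, rx_bits):
    """Fraction of positions where the detected bits differ from the sent bits."""
    return float(np.mean(np.asarray(tx_bits) != np.asarray(rx_bits)))


# Two example Softmax output rows (made-up values for illustration).
example_probs = np.array([[0.7, 0.1, 0.1, 0.1],
                          [0.1, 0.2, 0.6, 0.1]])
rx = detect_bits(example_probs)
print(rx, bit_error_rate([0, 0, 1, 0], rx))
```

Because neighbouring slots differ in one bit under Gray labelling, a decision error to an adjacent slot costs a single bit error.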
The accuracy and loss curves are depicted in Fig. 6. As illustrated in Fig. 6(a), the network's testing accuracy stabilizes after 40 epochs of training. Furthermore, as observed from Fig. 6(b), the loss exhibits no further reduction beyond the 40th epoch. Ultimately, the network achieved an accuracy of approximately 99.99%, with a loss value below $10^{-4}$.

IV. SIMULATION ANALYSIS
The choice of training or test dataset size is contingent upon the complexity of the system and the employed DL algorithm. Utilizing a small dataset may lead to subpar detection performance, as it might not adequately capture the diverse characteristics inherent to the system. Conversely, using a large dataset can escalate computational complexity [37]. Therefore, multiple simulations are conducted to determine the dataset size and parameters that yield the best BER performance. It should be noted that the accuracy of the neural network is affected by the DL algorithm itself, which plays a crucial role in solving nonlinear problems. Hence, evaluating the performance and robustness of our proposal is essential. The simulation parameters used are listed in Table I.
Table II displays the accuracy of the network under different learning rates. An appropriate learning rate is required: training fails to converge when the rate is excessively large, and converges slowly or not at all when it is excessively small. Furthermore, increasing the learning rate may cause a transition from underfitting to overfitting [38]. It is evident from Table II that a learning rate of 0.001 yields the best performance. Fig. 7 shows the BER curves under different modulation formats. Pulse amplitude modulation (PAM) is commonly employed for intensity modulation, where the pulse amplitude is adjusted according to a specific law to regulate the output. However, achieving ideal sampling of the impulse sequence proves challenging in practical scenarios. Quadrature phase shift keying (QPSK), a form of quadrature modulation, enables transmission of two bits per symbol by dividing the carrier signal into in-phase (I) and quadrature (Q) components that are modulated independently. PPM stands out due to its simplicity, ease of implementation, and high robustness, leading to significant improvements in data transmission rate and system reliability. As shown in Fig. 7, QPSK outperforms 4PPM when the signal-to-noise ratio (SNR) falls below 17.5 dB. However, when the SNR exceeds this threshold, 4PPM surpasses QPSK, because DL processing is carried out entirely in the real-valued domain, so the imaginary part of a complex signal must be treated as an equivalent real-valued representation. Hence, we adopt 4PPM due to its superior performance at high SNR levels.
As depicted in Fig. 1, the signal is transmitted via the antennas into the Gamma-Gamma atmospheric channel. It is widely acknowledged that various atmospheric factors, such as rain, snow, sleet, fog, haze, and pollution, significantly impact laser beams by inducing reflection, refraction, scattering, and attenuation [39], [40], [41]. The accuracy of modeling atmospheric turbulence using the Gamma-Gamma distribution has been demonstrated for distinct levels of turbulence intensity (weak, moderate, and strong), categorized by refractive-index structure constant values [42] of $C_n^2 = 1 \times 10^{-17}$, $C_n^2 = 1 \times 10^{-14}$, and $C_n^2 = 1 \times 10^{-13}$, respectively. The curves depicted in Fig. 8 illustrate the relationship between the atmospheric turbulence intensity and the BER, with a roll-off factor of 0.5, $\tau = 0.8$, and a transmission distance of 1000 m. It is evident from the figure that weak turbulence conditions result in superior BER performance. Specifically, at a BER of $3.8 \times 10^{-3}$, the performance under weak turbulence is approximately 1 dB and 2 dB better than that under moderate and strong turbulence, respectively. Moreover, as the intensity of turbulence increases, the adaptability of DL gradually deteriorates.
The BER performance of the 2 × 2 MIMO-FTN system gradually deteriorates with increasing transmission distance at a specific wavelength, as depicted in Fig. 9, where the acceleration factor is set to 0.8 and SNR is maintained at 20 dB.
This degradation can be attributed to atmospheric turbulence within the channel, leading to varying degrees of reflection, refraction, scattering, and attenuation of the laser signal. Furthermore, a comparison among different wavelengths reveals that, with the DL method, longer laser wavelengths yield superior BER performance and enhanced ISI resistance for the system. Fig. 10 shows the impact of the acceleration factor on the system's BER performance. At a BER of $10^{-4}$, reducing the acceleration factor from 1 to 0.9 and further to 0.8 results in an SNR penalty of approximately 2.5 dB and 5 dB, respectively. Similarly, at a BER of $10^{-3}$, decreasing the acceleration factor from 1 to 0.9 and then to 0.8 leads to a penalty of about 3 dB and 4 dB, respectively. It can be inferred from this figure that the BER performance degrades rapidly as the acceleration factor decreases. Nevertheless, with an acceleration factor of only 0.8, our proposed approach can still ensure satisfactory communication quality. Additionally, the spectrum efficiency is enhanced by approximately 25%.
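The 25% figure follows directly from the symbol-period compression. As a short worked step, with $R_{\mathrm{Nyq}} = 1/T$ and $R_{\mathrm{FTN}} = 1/(\tau T)$ denoting the Nyquist and FTN symbol rates in the same bandwidth (symbols introduced here for illustration):

```latex
\frac{R_{\mathrm{FTN}}}{R_{\mathrm{Nyq}}}
  = \frac{1/(\tau T)}{1/T}
  = \frac{1}{\tau}
  = \frac{1}{0.8}
  = 1.25
```

That is, $\tau = 0.8$ carries 25% more symbols per unit time than Nyquist-rate signaling, while $\tau = 0.9$ would give roughly 11%.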
The BER curves for different numbers of antennas are depicted in Fig. 11. The MIMO-FTN system exhibits a significant reduction in BER compared to the SISO-FTN system. At a BER of $10^{-4}$, the required SNR for the 2 × 2 and 2 × 4 MIMO-FTN systems is reduced by approximately 7.5 dB and 14 dB, respectively, relative to the SISO-FTN system. Hence, increasing the number of antennas can effectively mitigate the impact of atmospheric turbulence on BER. In addition, augmenting the number of antennas at the receiving end confers greater advantages in mitigating the effects of atmospheric turbulence than increasing them at the transmitting end. Compared to the massive MIMO method proposed in [26], our proposal shows comparable BER performance in 4 × 4 MIMO-FTN systems. Additionally, the neural network trained in [26] can only accommodate a particular channel realization.
Fig. 12 shows the BER performance and required training time of the DNN versus the number of network layers. As the number of network layers increases, the training time grows, indicating that the parameters of the network become more difficult to train. Moreover, the BER performance improves when the number of network layers increases from 6 to 8, since more layers can help capture the interference correlation among different symbols. However, too many parameters make the training result inaccurate due to inefficient backpropagation of gradients in deep networks [43], [44], which leads to a poorer BER. Additionally, studies in [45], [46] have shown that increasing the number of network layers results in a significant rise in computational complexity and overfitting. Overfitting can be attributed to three causes [47]. Firstly, when the training dataset consists of a small number of samples, it may not accurately represent all possible scenarios, resulting in less accurate predictions by the trained network. Therefore, it is important for the training dataset to encompass as many types of data as possible. Secondly, the network is unable to precisely estimate the relationship between input and output due to excessive interference in the training data. Lastly, a highly complex network has enough parameters to fit every data point in the training dataset; consequently, it fails to generalize to test datasets. Hence, selecting an appropriate number of network layers plays a crucial role in optimizing system performance. Taking into account both the BER performance and the training time, it is reasonable to set the number of network layers to 8.
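The growth of the parameter set with network depth can be illustrated by a simple count for fully connected layers. The widths used below (8 inputs, 64 units per hidden layer, 4 outputs) are hypothetical and serve only to show the trend, not to reproduce the training times in Fig. 12.

```python
def layer_sizes(total_layers, n_in=8, width=64, n_out=4):
    """Input layer, (total_layers - 2) hidden layers, and an output layer."""
    return [n_in] + [width] * (total_layers - 2) + [n_out]


def count_parameters(sizes):
    """Weights plus biases of a fully connected network."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


for total in (6, 8, 10):
    print(total, "layers:", count_parameters(layer_sizes(total)), "parameters")
```

Under these assumed widths every additional hidden layer adds the same fixed block of weights and biases, so the number of trainable parameters, and with it the per-epoch cost, rises linearly with depth.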
It becomes evident that the performance of the DNN approaches that of ML while the complexity is reduced by 40%. This can be attributed to ML's exhaustive search over all possible transmitted signals in order to find the globally optimal solution, which results in prohibitively high time complexity. Furthermore, compared to the method presented in [27], our approach exhibits improved performance and achieves a certain level of complexity reduction.

V. CONCLUSION
We propose a DNN decoder for MIMO-FTN signal detection, which utilizes the backpropagation algorithm to compute gradients from the output layer to the input layer and updates each parameter using the chain rule to minimize the loss. Simulation results demonstrate that this network exhibits comparable BER performance to the ML method while reducing complexity by 40%. Consequently, this scheme effectively reduces system complexity while ensuring spectral efficiency. In addition, the DL-aided detection method holds promise as a candidate for high-speed FTN-MIMO OWC systems.
In the Gamma-Gamma probability density function of Section II, $K_{\alpha-\beta}(\cdot)$ is the modified Bessel function of the second kind of order $\alpha-\beta$; $\Gamma(\cdot)$ is the Gamma function; $\alpha$ and $\beta$ are the large- and small-scale scattering coefficients, respectively.

Fig. 9. Curve of transmission distance and BER at different wavelengths.

Fig. 11. Correlation between BER and the number of antennas.

Fig. 12. Correlation between the BER and training time as well as the number of network layers.

TABLE I. SUMMARY OF SIMULATION PARAMETERS

TABLE II. RELATIONSHIP BETWEEN LEARNING RATE AND ACCURACY

Fig. 7. Relationship of BER and modulation format.