Deep Learning-Based Image Denoising Approach for the Identification of Structured Light Modes in Dusty Weather

Structured light is gaining importance in free-space communication. Classifying spatially structured light modes is challenging in a dusty environment because of the distortion imposed on the propagating beams. This article addresses this challenge by proposing a deep learning convolutional autoencoder algorithm for mode denoising followed by a neural network for mode classification. The input to the classifier is either the denoised image or the latent code of the convolutional autoencoder, which is a low-dimensional representation of the input images. The proposed machine learning (ML) models were trained and tested using laboratory-generated mode data sets from the Laguerre and Hermite Gaussian mode bases. The results show that the two proposed approaches achieve an average classification accuracy exceeding 98%, and both are better than the classification accuracy reported recently (83–91%) in the literature.


I. INTRODUCTION
Free space optics (FSO) is a wireless communication technology that has received considerable attention for various applications. FSO is seen as a feasible solution to the many connectivity problems in optical communication networks, especially when deploying additional optical fibers is either too expensive or not possible [1]. FSO can be used to establish secure communications in metropolitan areas between buildings and cities, as well as to provide backup for optical fiber links. At a low cost, FSO can provide long-range, high-throughput line-of-sight transmission [2].
FSO signals are sensitive to a wide range of propagation effects in outdoor locations. For example, particles in the atmosphere created by various meteorological conditions, such as rain, fog, and dust, cause optical signal scattering. When the particle sizes are comparable to the signal wavelength, the effect is severe [3]. Dust particles, in particular, have an average radius that is inversely related to particle height, ranging from 8 to 19 μm at altitudes between 21 and 1 m from the ground, respectively [4]. As a result, the amount of scattering introduced by these particles on signals at the 1550 nm wavelength is substantial compared to the attenuation introduced by bigger particles such as raindrops. Furthermore, dust particles include minerals that scatter light more strongly than fog water drops [5]. Therefore, researching the impact of dust on FSO signals is critical, particularly for towns located in desert areas where dust storms are more common. It is relevant here to mention that desert climate zones account for 14.2% of the Earth's surface area. In the literature, several research works address the performance of optical signals in fog, scintillation, and rain conditions [3], but light transmission through dust storms is comparatively less explored.
Nowadays, complex light beam shapes have replaced the conventional Gaussian waveforms in optical wireless communication [6]. These complex structures include Laguerre Gaussian (LG) [7], Bessel Gaussian (BG) [8], and Hermite Gaussian (HG) [9] light structures. Employing space as an extra degree of freedom for data multiplexing helps to address bandwidth bottleneck difficulties in optical networks. The various patterns of spatial light modes can also be exploited as information carriers and in the construction of M-ary pattern coding schemes.
Despite the benefits of adopting structured light modes in FSO, atmospheric conditions substantially impact the phasefronts of the propagated light beam, making identification of the initially encoded signals at the receiver side more difficult. The use of adaptive optics (AO) in the presence of atmospheric turbulence is one technique to correct beam distortions [10]. This is usually accomplished by sequentially modulating a spatial light modulator (SLM) or a deformable mirror until an objective function is minimized, allowing the original transmitted beams to be reconstructed. However, this raises the receiver's implementation complexity. Furthermore, the optimization process is carried out in cycles [11], which limits the use of the AO-based technique in rapidly changing environments. Digital signal processing (DSP) techniques have also been used to address atmospheric effects. In particular, multiple-input multiple-output (MIMO) equalization can be used to rectify channel impairments [12]. However, as the number of transmitted spatial modes grows, this approach becomes more difficult.
Machine learning approaches can also be used to correctly identify spatial modes in turbulent channels without the need for AO or DSP equalization algorithms [13]. The authors of [14] employed an artificial neural network approach to discriminate between 16 LG modes in a real-world 3 km free-space transmission using mode patterns recorded on a camera. In [15], an analogous approach was used to classify LG modes transmitted over 143 km between two islands. The authors of [16] proposed using a convolutional neural network (CNN) for the detection of orbital angular momentum (OAM) modes in turbulent FSO networks. In [17], the authors investigated the demodulation of OAM beams using several classifiers in various atmospheric environments. Similarly, the authors of [18] proposed a CNN-based system for atmospheric turbulence detection and adaptive demodulation of OAM-FSO signals. In [19], the authors used simulated data to show the potential of a CNN-based method for detecting OAM modes susceptible to turbulence and misalignment. The predicted turbulence impairment is fed back to the transmitter in order to ensure impairment-free transmission of OAM modes. In [20], the authors employed a CNN classifier to detect 21 laboratory-generated HG modes with various input beam parameters.
Images received by an FSO system are noise-prone, affecting communication performance. Image denoising gained momentum in the last decade as it benefits many fields, including image dehazing, restoration, and classification [21], [22], [23]. Image denoising can be achieved using conventional image processing techniques like image filtering, whether in the spatial or frequency domain. In the same context, machine learning algorithms have shown excellent performance when used for image denoising [24], [25].
This article is intended to tackle the problem of identifying structured light modes in dusty weather. The main contributions of this article are as follows:
• Propose an image-denoising algorithm based on the use of a convolutional autoencoder (CAE). The CAE algorithm has been shown to be effective in many applications, including image denoising. It utilizes the convolutional architecture of CNNs, which is based on the image convolution concept, to produce a denoised image of the desired output size.
• Propose two different methods for the identification of structured light modes. The first method uses the denoised mode image produced by the CAE as an input to a neural network, while the second method replaces the denoised mode image with the CAE latent parameters.
• Evaluate the performance of the two proposed identification methods using the experimentally generated LG, Mux-LG, and HG modes and compare the results with those of the recently published work in [28].
The rest of the article is organized as follows: Section II presents the experimental setup used to collect the dataset investigated in this work and provides a description of the dataset. Section III gives details of the proposed identification method. Section IV presents the achieved results, while Section V provides concluding remarks.

II. EXPERIMENTAL SETUP AND DATASET
The dataset used in this study is generated using the experimental setup shown in Fig. 1 as in [28], where a 1550 nm laser diode (LD-TeraXion PS-TNL) is used to generate a continuous wave (CW) laser signal which is coupled into a standard single-mode fiber (SMF). Then, a fiber beam collimator (Thorlabs, F230FC-1550) is used to collimate the SMF's output into free space, where a Gaussian beam is propagated. The collimated beam is pointed towards a half-wave plate (HWP), which is rotated to align the polarization of the propagated beam with a horizontally polarized liquid-crystal-on-silicon spatial light modulator (SLM-Hamamatsu X13138-08). A mirror is used to reflect the HWP's output in the direction of the liquid crystal display of the SLM. A computer (PC1) was used to control the phase holograms imprinted on the liquid crystal display of the SLM to obtain 32 mode patterns (i.e., LG, Mux-LG, or HG modes) after beam reflection. The experimental measurements were performed in a controlled dusty environment, which enables: i) repeating the experiment under the same conditions and ii) flexibility in controlling the density of dust particles (i.e., weak, moderate, and severe dust effect). A ∼1-m controlled environment chamber was built to mimic the effect of a dusty communication channel on the quality of the transmitted spatial modes. The dust particles are uniformly spread throughout the chamber by fans placed at the bottom. To measure the visibility of the dusty channel, a visible link was established using a 520 nm green LD which was added in parallel with the communication link, as shown in Fig. 1. At the receiver side, a photodetector (PD 1) was used to record the received power of the visible link under different visibility conditions.
The received mode pattern was split into two copies using a 50:50 beam splitter (BS), where the mode optical power and the mode pattern profile were recorded and captured simultaneously using PD 2 and a charge-coupled device (CCD) beam profiler, respectively.
The experiment dataset was collected at ten consecutive time slots, with the first time slot having the lowest visibility range and the tenth time slot having the highest visibility range. The experiment lasted for 100 seconds with a frame rate of 10 frames/sec. Thus, the collected dataset is of size 8000, 8000, and 16000 images for LG, Mux-LG, and HG modes, respectively. These modes are used as information carriers. Samples of the experimentally received light modes (no dust) are depicted in Fig. 2.

III. PROPOSED IDENTIFICATION METHOD
Fig. 3 shows the proposed model for identifying structured light beams. It consists of two stages: image denoising and mode classification. Image denoising has a significant impact on visual image quality. Such an impact is also applicable to the domain of optical communications. Multiple denoising techniques exist, ranging from conventional image processing methods to deep learning algorithms [24]. Here, we opt for the CAE as a deep learning algorithm for image denoising. Once the received beam profile is denoised using the CAE, the clean image or the latent code of the CAE is used as an input to a feed-forward neural network for mode classification. The dataset generated in Section II is used for training and testing in both stages.

A. Convolutional Autoencoder for Image Denoising
Autoencoders are deep learning algorithms used for image denoising, anomaly detection, dimensionality reduction, latent space classification, or data synthesis using a variational autoencoder [29]. The main idea behind the autoencoder is to compress the input data at a given layer before reconstructing it at the output layer. The autoencoder mainly consists of two parts: the encoder and the decoder. The encoder compresses the input data to a predesigned size, and the output is the latent code with reduced dimensionality. This code is considered an optimal representation of the input data and could be used for other purposes like classification or recognition. The decoder is trained to reconstruct the original data using the latent code. Fig. 4 delineates the basic structure of the autoencoder algorithm.
While the simplest form of the autoencoder is a deep feed-forward neural network, it can also be built using a CNN. The CAE model uses convolutional layers mainly intended to manipulate 2D data. In this work, the encoder part of the proposed CAE architecture is built from four convolutional layers, each of which has 32 filters of size (3 × 3) and a rectified linear unit (ReLU) activation function. Each convolutional layer is followed by a max-pooling layer with kernel size (2 × 2), producing an output half the size of its input. The decoder comprises four convolutional layers and the output convolutional layer. Following the same convention as the encoder part, each convolutional layer has 32 filters of size (3 × 3) and a ReLU activation function. Also, each convolutional layer is followed by an up-sampling layer with (2 × 2) kernel size, producing an output double the size of its input. The last convolutional layer has a sigmoid activation function, and its output is an image whose size exactly matches that of the input image to the encoder. Fig. 5 depicts the whole structure of the proposed CAE architecture. For illustration purposes, Fig. 6 shows the outputs of the learned convolutional filters of the first and last layers of the encoder when Mux-LG ±03 of Fig. 2 is considered. This is to shed some light on its dominant features.
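The halving and doubling described above can be checked with a short shape trace. This is only a sketch in plain Python, not the authors' implementation: it assumes size-preserving ("same"-padded) convolutions, which the text does not state explicitly, and the function name `trace_cae_shapes` is hypothetical. Channel depth is not traced, since it depends on the filter count of each layer.

```python
def trace_cae_shapes(height=256, width=256, n_stages=4):
    """Follow the spatial size of a feature map through the described CAE.

    Assumption (not stated in the text): the 3x3 convolutions keep the
    spatial size, so only pooling and up-sampling change it.
    """
    shapes = [("input", height, width)]
    h, w = height, width
    for i in range(1, n_stages + 1):      # encoder: conv, then 2x2 max-pooling
        h, w = h // 2, w // 2
        shapes.append((f"encoder stage {i}", h, w))
    latent = (h, w)                       # spatial size of the latent code
    for i in range(1, n_stages + 1):      # decoder: conv, then 2x2 up-sampling
        h, w = h * 2, w * 2
        shapes.append((f"decoder stage {i}", h, w))
    return shapes, latent


shapes, latent = trace_cae_shapes()
for name, h, w in shapes:
    print(f"{name:16s} {h:3d} x {w:3d}")
```

Under these assumptions, a 256 × 256 input shrinks to 16 × 16 after the four pooling stages and returns to 256 × 256 after the four up-sampling stages, which agrees with the spatial sizes of the latent code and reconstructed image quoted in Section IV.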
CAEs, like any other models, have their own limitations. They are designed specifically for image data, and their performance is highly dependent on the input image size. Large image sizes can lead to memory limitations and increased computational requirements. Further, CAEs can capture local features of an image effectively due to the use of convolutional layers but may struggle to capture complex patterns or reconstruct images that are significantly different from the training images.

B. Neural Network for Light Mode Classification
An artificial neural network (ANN) is a machine learning algorithm whose basic cell is the neuron. One or more neurons can construct a layer. A typical network consists of one input layer, one or more hidden layers, and one output layer. The main process of a neuron is to calculate the following equation [30]:
$$y = \phi\left(\mathbf{w}^{T}\mathbf{x} + b\right)$$
where x is the input vector, w is the weights vector, b is the bias term, y is the neuron output, and ϕ(.) is the activation function. The activation function could be linear, where the output of ϕ(.) equals its input, or a nonlinear function like 'Sigmoid' or 'ReLU'. In this work, we propose a feed-forward ANN model for light mode classification. The proposed model consists of an input layer and one hidden layer of 100 neurons in addition to the output layer. The input to the ANN is either the denoised mode image or its corresponding CAE latent code.
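The neuron computation described above, an activation function applied to a weighted sum of the inputs plus a bias, can be sketched in a few lines of plain Python. The helper names (`neuron`, `dense_layer`) and the toy weights are illustrative assumptions, not values from the trained model.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b, phi):
    """Single neuron: activation phi applied to the weighted sum w.x + b."""
    return phi(sum(wi * xi for wi, xi in zip(w, x)) + b)


def dense_layer(x, weights, biases, phi):
    """A layer is a list of neurons that all see the same input vector."""
    return [neuron(x, w, b, phi) for w, b in zip(weights, biases)]


# Toy forward pass: 3 inputs -> 2 hidden neurons (ReLU) -> 1 output (sigmoid).
x = [0.5, -1.0, 2.0]
hidden = dense_layer(x, [[1.0, 0.5, 0.25], [-1.0, 1.0, 0.0]], [0.0, -0.5], relu)
output = dense_layer(hidden, [[2.0, 1.0]], [-1.0], sigmoid)
print(hidden, output)
```

The network proposed in the text has the same form, with a hidden layer of 100 neurons in place of the two used in this toy example.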

IV. PERFORMANCE EVALUATION
The performance of the proposed method for structured light identification in dusty weather is considered in this section. Towards this objective, data of the 8-ary LG, 8-ary Mux-LG, and 16-ary HG light beam structures are collected experimentally in the laboratory for 100 seconds, as described in Section II. Therefore, the LG dataset (likewise the Mux-LG dataset) consisted of 8000 images each, while the HG dataset comprised 16000 images. This is because LG and Mux-LG images have 8 classes (each), while HG images have 16 classes. The collected dataset is divided into 10 subsets with an equal number of images. The first subset (captured at the first 10 seconds) contains the beam profiles of the lowest visibility, and the 10th subset (captured at the last 10 seconds) contains the images of the highest visibility. For each subset, 70% of images are selected for training, and 30% of images are retained for testing. The structured light modes received via the FSO link are resized to the dimension of 256 × 256 × 3 to facilitate processing with less complex computing facilities. Given the training and testing data, the CAE is built for the LG, Mux-LG, and HG light mode structures. The input data for the CAE model is selected from images from the 10 different data subsets. Fig. 7 shows samples of the chosen dataset from subset 5 with its corresponding ground truth. The light structure in the upper row is hardly seen because of the dust effect.
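The dataset sizes and the 70/30 split follow from simple arithmetic, sketched below. The sketch assumes that every mode is recorded for the full 100 seconds at the 10 frames/sec rate given in Section II, and that the split is applied independently within each of the 10 visibility subsets; the function names are hypothetical.

```python
def dataset_size(n_modes, duration_s=100, frames_per_s=10):
    """Images per dataset, assuming each mode is recorded for the full run."""
    return n_modes * duration_s * frames_per_s


def split_per_subset(total_images, n_subsets=10, train_percent=70):
    """Apply the train/test split inside each visibility subset."""
    per_subset = total_images // n_subsets
    train = per_subset * train_percent // 100
    test = per_subset - train
    return per_subset, train * n_subsets, test * n_subsets


for name, n_modes in [("LG", 8), ("Mux-LG", 8), ("HG", 16)]:
    total = dataset_size(n_modes)
    per_subset, n_train, n_test = split_per_subset(total)
    print(f"{name:7s} total={total} per subset={per_subset} "
          f"train={n_train} test={n_test}")
```

Under these assumptions the totals are 8000 images for LG and Mux-LG and 16000 for HG, with 5600/2400 and 11200/4800 training/testing images, respectively.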
As stated before, the proposed CAE model is built using four convolutional layers and four max-pooling layers for the encoder side, and four convolutional layers, four up-sampling layers, and an output convolutional layer for the decoder side. The CAE and ANN models are trained for 1000 epochs with batch size 128 using the 'Adam' optimizer, which has proved successful for training deep neural networks with backpropagation [31]. The ANN model is trained using 5600 images for each of the LG and Mux-LG datasets and 11200 images for the HG dataset.
The results achieved using the proposed CAE model are evaluated by two approaches. First, the output of the encoder (latent code) is passed to the ANN model to identify light mode. Second, the output of the decoder part (reconstructed image) is passed to the ANN model for the same classification task.
The trained model is tested using unseen images, which account for 2400 images for each of the LG and Mux-LG datasets, and 4800 images for the HG dataset. These images are the 30% of images held out from the 10 subsets covering all visibility regions. This is to show the performance of the proposed CAE and ANN models irrespective of the visibility region of the mode under consideration. The classification accuracy of the ANN trained with the latent code of the CAE is 98.3%, 98.5%, and 98.2% for LG, Mux-LG, and HG, respectively. Fig. 8 shows the confusion matrices for the results of the testing dataset, which show a relatively small number of mode misclassifications. In particular, the results reveal that the most affected modes are LG 07, Mux-LG ±07, HG 12, and HG 22. Thanks to the encoder, which produced a latent code of size 16 × 16 × 3 instead of a 256 × 256 × 3-sized image, these results have been obtained with a much smaller input to the ANN.
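To give a feel for what the smaller input means for the classifier, the sketch below counts the weights and biases of a fully connected network with one hidden layer of 100 neurons for both input sizes. It assumes a flattened input, one output neuron per class, and a bias on every neuron; these are illustrative assumptions, and the resulting counts are not taken from Table I.

```python
def dense_params(n_inputs, n_hidden, n_outputs):
    """Weights plus biases of a fully connected net with one hidden layer."""
    hidden = n_inputs * n_hidden + n_hidden
    output = n_hidden * n_outputs + n_outputs
    return hidden + output


latent_inputs = 16 * 16 * 3      # flattened latent code
image_inputs = 256 * 256 * 3     # flattened denoised image

print("input size ratio:", image_inputs // latent_inputs)
for n_classes in (8, 16):        # 8 classes for LG and Mux-LG, 16 for HG
    print(n_classes,
          dense_params(latent_inputs, 100, n_classes),
          dense_params(image_inputs, 100, n_classes))
```

The flattened image has 256 times as many elements as the latent code, so under these assumptions nearly all of the classifier's parameters sit in the first layer when the image is used as input.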
TABLE I. RUN-TIME FOR THE TRAINING AND TESTING PHASES AND THE NUMBER OF PARAMETERS FOR EACH MODEL
TABLE II. AVERAGE CLASSIFICATION ACCURACY (%) USING THE PROPOSED APPROACHES IN COMPARISON WITH THE RESULTS OF [28]
In the second approach, the reconstructed images at the output of the CAE are used as inputs to the ANN. The CAE is trained following the same methodology explained before, and the results achieved using the testing dataset are 99.5%, 99.4%, and 99.2% for LG, Mux-LG, and HG, respectively. These results are slightly better than those obtained using the latent code. However, the input dimension (256 × 256 × 3) to the ANN model is substantial compared to that (16 × 16 × 3) of the first approach, which utilizes the latent codes. Fig. 9 illustrates samples of the reconstructed images output by the decoder part together with their original inputs. Fig. 10 presents the confusion matrices for the results of the three light structures. From the confusion matrices, we find the most affected modes are LG 07, Mux-LG ±07, and HG 12. This is intuitively not surprising because the presence of a relatively high level of dust is expected to cause misclassification for modes of similar structures. For example, Fig. 11 shows the structural similarity index measure (SSIM) of the Mux-LG ±07 with other modes. SSIM is a metric used to measure the similarity between two given images [32]. It is evident from Fig. 11 that the highest SSIM is with Mux-LG ±08. By examining the confusion matrix of Fig. 10(b), we observe that Mux-LG ±07 and Mux-LG ±08 modes are misclassified, which is consistent with the aforementioned assertion.
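For readers unfamiliar with SSIM, the sketch below evaluates the SSIM expression once over the whole image. This is a simplification: the standard formulation computes the index over local windows and averages the results, so the numbers will differ from those in Fig. 11. Images are assumed to be equally sized grayscale images given as flat lists of 8-bit pixel values, and `global_ssim` is a hypothetical helper name.

```python
def global_ssim(img_a, img_b, dynamic_range=255.0):
    """Single-window SSIM of two equally sized grayscale images (flat lists)."""
    n = len(img_a)
    if n != len(img_b) or n < 2:
        raise ValueError("images must have the same size and at least 2 pixels")
    mu_a = sum(img_a) / n
    mu_b = sum(img_b) / n
    var_a = sum((a - mu_a) ** 2 for a in img_a) / (n - 1)
    var_b = sum((b - mu_b) ** 2 for b in img_b) / (n - 1)
    cov = sum((a - mu_a) * (b - mu_b) for a, b in zip(img_a, img_b)) / (n - 1)
    c1 = (0.01 * dynamic_range) ** 2     # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2     # stabilizes the contrast/structure term
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


reference = [0, 50, 100, 150, 200, 250]
similar = [5, 55, 95, 150, 205, 245]
inverted = reference[::-1]
print(global_ssim(reference, similar), global_ssim(reference, inverted))
```

Computing such a score between one mode and every other mode, and then looking for the largest value, mirrors the comparison the text makes between Mux-LG ±07 and its neighbours.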
Next, we repeat the above experiments after converting the colored images to grayscale images to reduce computational complexity. The CAE model is fed with grayscale mode profiles of size (256 × 256 × 1) instead of colored images of size (256 × 256 × 3). The grayscale-based version reached a classification accuracy of 98.7%, 98.1%, and 98% for LG, Mux-LG, and HG, respectively, using the latent codes, and 98.5%, 98%, and 98.7% for LG, Mux-LG, and HG, respectively, using the denoised decoder images. Fig. 12 shows samples of the reconstructed grayscale images output by the decoder part together with their original inputs. Table II summarizes the obtained results in terms of average classification accuracy using the proposed approaches compared with the results of [28], which used a CNN with two convolutional layers, two pooling layers, and a fully connected layer, in addition to the output layer. It is worth mentioning that the classification results obtained using the colored images are, on average, slightly better than those obtained using the grayscale images, while the grayscale version has the advantage of lower computational complexity. Table I presents the run-time for the training and testing phases and the number of model parameters for both the proposed approaches and the CNN approach of [28]. The designed models were trained using the TensorFlow library in the PyCharm environment running on a PC with an Intel Core i9-9900K CPU at 3.60 GHz, 64 GB RAM, and an NVIDIA GeForce RTX 2080 Ti graphics card. It is clear from the results that using grayscale images produces models with fewer parameters and shorter run-times during both the training and testing phases, with the CNN approach of [28] being the most efficient.
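As a quick check of the per-structure accuracies quoted in this section, the sketch below computes their unweighted mean for each approach. The unweighted mean over the three mode families is an assumption; Table II may average differently (for example, weighting HG by its larger number of test images), so these values are indicative only.

```python
# Per-structure test accuracies (%) as quoted in the text of this section.
reported = {
    "color, latent code": {"LG": 98.3, "Mux-LG": 98.5, "HG": 98.2},
    "color, denoised image": {"LG": 99.5, "Mux-LG": 99.4, "HG": 99.2},
    "gray, latent code": {"LG": 98.7, "Mux-LG": 98.1, "HG": 98.0},
    "gray, denoised image": {"LG": 98.5, "Mux-LG": 98.0, "HG": 98.7},
}

# Unweighted mean over the three mode families (an assumption, see above).
averages = {name: sum(acc.values()) / len(acc) for name, acc in reported.items()}

for name, value in averages.items():
    print(f"{name:22s} {value:.2f}")
```

All four means stay above 98%, in line with the claim made in the abstract.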
Finally, experiments have been conducted to show the performance of the proposed CAE and ANN models in each visibility region. Towards this objective, we test the previously trained models using the 30% test images of each subset (visibility region) alone. Since we have 10 subsets, the results are presented against the index of the visibility region. The achieved classification results using the denoised images and latent codes are compared with the results reported in [28]. Fig. 13 displays the results, where it is evident that the classification accuracies achieved using the two proposed approaches are very similar for the three light mode structures, while both of them are far better than the classification accuracy reported in [28], especially for the first subsets with images of low visibility.
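The per-region evaluation amounts to grouping test predictions by visibility subset before computing accuracy. A minimal sketch is given below; the record layout, the function name `accuracy_by_region`, and the toy labels are hypothetical and only illustrate the bookkeeping.

```python
from collections import defaultdict


def accuracy_by_region(records):
    """records: iterable of (region_index, true_label, predicted_label)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for region, true_label, predicted_label in records:
        total[region] += 1
        correct[region] += int(true_label == predicted_label)
    return {region: correct[region] / total[region] for region in sorted(total)}


# Toy example with two visibility regions and made-up labels.
records = [
    (1, "mode_a", "mode_a"), (1, "mode_b", "mode_c"),
    (2, "mode_a", "mode_a"), (2, "mode_b", "mode_b"),
]
print(accuracy_by_region(records))
```

Plotting the returned values against the region index gives a curve of the kind shown in Fig. 13.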

V. CONCLUSION
In this article, we considered an algorithm to denoise structured light mode images affected by dust. We proposed a convolutional autoencoder for image denoising and a neural network for mode classification. This study was conducted using experimental data of three light structures. The neural network model was built and tested using either the denoised images or, for dimensionality reduction, the latent code of the convolutional autoencoder. The proposed system performed better using the denoised images, with a classification accuracy of 99.5%, 99.4%, and 99.2% for LG, Mux-LG, and HG modes, respectively. Further, the proposed system was trained and tested using grayscale images instead of colored images of the same dataset to reduce the computational cost. The results achieved are almost the same as those achieved using the colored images. Table II summarizes the obtained results in terms of average classification accuracy using the proposed approaches compared with the results of [28].
A natural extension of the work presented in this article is to test the performance of the proposed approaches using light mode images captured from real field measurements. In such an environment, the proposed models may first need to be fine-tuned by retraining on light mode images captured under new visibility conditions. Another interesting point for future research is to investigate the proposed methods' performance under the effect of free space turbulence with and without the presence of dusty weather. This is a challenging problem, as the turbulence may take three different states: weak, moderate, and strong.