Multidimensional Information Assisted Deep Learning Realizing Flexible Recognition of Vortex Beam Modes

Because of its unbounded state space, orbital angular momentum (OAM), as a new degree of freedom of light, has attracted great attention in the optical communication field. Recently, a number of studies have applied deep learning to the recognition of OAM modes through atmospheric turbulence. However, previous deep learning recognition methods have several limitations. They all require a constant distance between the laser and the receiver, which makes them clumsy and impractical. Moreover, to the best of our knowledge, previous deep learning methods cannot sort vortex beams with positive and negative topological charges, which reduces information capacity. A Multidimensional Information Assisted Deep Learning Flexible Recognition (MIADLFR) method is proposed in this letter. In MIADLFR, we utilize not only the intensity profile but also angular spectrum information to recognize OAM modes, unlimited by distance or the sign of the topological charge (TC). To the best of our knowledge, this is the first use of multidimensional information, and in particular of angular spectrum information, for OAM mode recognition. Recognition of OAM modes unlimited by distance and sign of TC, achieved by the MIADLFR method, can make optical communication and detection with OAM light much more attractive.


Introduction
Since Allen et al. [1] recognized that a vortex beam with phase structure exp(ilφ) carries an OAM of lℏ per photon, vortex beams have been extensively investigated in optical manipulation [2], imaging [3], and optical communication [4]. Because l can take any integer value, vortex beams have great potential in optical communication [5].
In the past 20 years, there has been plenty of progress in techniques for sorting OAM modes. Holograms can be used to transform the spiral phase structure, so a hologram can serve as a mode-specific detector [6]. However, this kind of measurement requires a number of holograms, which makes it impractical for detecting a large number of OAM modes. More efficient sorting can be done with Mach-Zehnder interferometers with a Dove prism in each arm [7]. Theoretically, the efficiency can reach 100%, but sorting N modes requires N − 1 cascaded interferometers. Berkhout et al. [8] demonstrated a very successful method for measuring the orbital angular momentum states of light based on the log-polar transformation. With this method, the angular spectrum of a vortex beam can be obtained using two static optical elements. Recently, a number of studies have also improved its resolution [9-13]. In fact, gradually-changing-period gratings [14], annular gratings [15], etc., can also sort LG beams, but the log-polar transformation seems to be the most intuitive method, so we use it to extract angular spectrum information in this letter. These methods are useful when the light beam is ideal. For a vortex beam propagating through atmospheric turbulence, deep learning is usually used to recognize OAM modes. However, previous deep learning methods for recognizing OAM modes also have some drawbacks.
In recent years, deep learning has been widely applied in computer vision, and several studies have utilized deep learning to recognize OAM modes [16-22]. Zhanwei Liu et al. first realized superhigh-resolution recognition of OAM modes with the help of deep learning [16]. Junmin Liu proposed a deep learning based atmospheric turbulence compensation method [19]. However, previous deep learning methods have a few drawbacks that limit their application.
In previously proposed methods, the convolutional neural network (CNN) extracts features from intensity profiles alone. However, as Laguerre-Gaussian (LG) light propagates, the radius of the beam increases, and the beam radius is a quite important feature of the LG beam for the CNN. Thus, when the training set and testing set contain LG light that has propagated different distances, the prediction accuracy of the CNN decreases compared to the same-distance case, as we show below. Moreover, because LG beams with the same absolute value of TC but opposite signs share quite similar intensity profiles, which are all the information sent into the CNN, no deep learning method has efficiently sorted LG light with positive and negative TCs. These drawbacks result from the fact that direct intensity detection cannot provide phase information, whereas classically it is precisely the special phase profile that gives a light beam its OAM. Angular spectrum information, on the other hand, extracts enough of the phase information to reveal the OAM of a light beam directly.
In this letter, we propose and demonstrate the MIADLFR method to remove the constraints of previous deep learning methods mentioned above. The MIADLFR method exploits both intensity information and angular spectrum information at the same time with the help of the multidimensional feature fusion convolutional neural network (MFFCNN) proposed in this letter. Multidimensional information used together can achieve things that are impossible using intensity information alone: recognition of OAM modes unlimited by distance or the sign of the TC. What's more, MIADLFR also increases the accuracy of OAM mode recognition and significantly reduces the required training-set size and number of parameters, as we show in the discussion part. First, the MFFCNN extracts features from the intensity information and the angular spectrum information; then fully connected layers process the features from these two dimensions and give the prediction.

Theory
The system used in this letter can be seen in Fig. 1. First, with the help of a spatial light modulator (SLM), the Gaussian beam generated by the laser is transformed into LG light. The LG light then passes through atmospheric turbulence simulated by atmospheric turbulence screens. There are two detection paths in the system we use. Through the first path, we obtain the intensity distribution of the LG light after atmospheric turbulence. Through the second path, we obtain the angular spectrum information with the help of the log-polar transformation method [8]. Finally, the angular spectrum information and the intensity information are sent into the MFFCNN together.
After the SLM, the complex field becomes LG light U(z, x, y)|_{z=0}, which carries orbital angular momentum, and then passes through atmospheric turbulence.
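As a concrete illustration, the initial field can be generated numerically. The following minimal sketch is our own (not the letter's code); it is restricted to radial index p = 0, where the Laguerre polynomial reduces to 1, and the grid parameters are assumed values:

```python
import numpy as np

def lg_mode(l, w0, n_grid, extent):
    """LG beam with topological charge l and radial index p = 0 at the waist z = 0:
    U ~ (sqrt(2) r / w0)^|l| exp(-r^2 / w0^2) exp(i l phi)."""
    x = np.linspace(-extent, extent, n_grid)
    X, Y = np.meshgrid(x, x)          # X varies along columns, Y along rows
    r = np.hypot(X, Y)
    phi = np.arctan2(Y, X)
    return (np.sqrt(2) * r / w0) ** abs(l) * np.exp(-r**2 / w0**2) * np.exp(1j * l * phi)

# The characteristic doughnut: zero on-axis intensity and an azimuthal phase ramp.
U = lg_mode(1, 0.5, 129, 1.0)
```

The resulting array shows the two features the letter relies on: a dark core on axis and a phase that winds by 2πl around it.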
For mathematical simplification, treating turbulence as a finite number of discrete atmospheric turbulence screens is a common technique to generate large training sets [19]. A number of numerical models have been proposed to simulate atmospheric turbulence; in this letter we use the model developed by Hill [23] and defined by Andrews [24]. The modified von Karman phase power spectral density can be written as φ²(κ) = 0.49 r_0^(−5/3) exp(−κ²/κ_m²)/(κ² + κ_0²)^(11/6), where κ_m and κ_0 are set by the inner and outer scales of turbulence, and the Fried parameter r_0 is determined by C_n², the atmospheric refractive index structure constant representing the turbulence intensity. The phase change of an atmospheric turbulence screen can be described by θ(x, y) = FFT(M φ(κ)), where M is an N × N complex random number array with a mean of 0 and a variance of 1. We set H = exp(ik∆z) · exp(−i(κ_x² + κ_y²)∆z/(2k)), which represents Fresnel propagation. The beam after passing through one atmospheric turbulence screen can then be described by U_{j+1}(x, y) = IFFT{H · FFT[U_j(x, y) exp(iθ_j(x, y))]}, and the intensity profile after n atmospheric turbulence screens is I_1(x, y) = |U_n(x, y)|². As a result, the intensity distribution detected in the first path is I_1(x, y).
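A minimal numerical sketch of this split-step pipeline is given below. It is our own illustration, not the authors' code: the overall PSD scaling is schematic, and the values of r_0, the outer scale L0, and the inner scale l0 are assumptions.

```python
import numpy as np

def von_karman_screen(n_grid, dx, r0, L0=50.0, l0=0.01, rng=None):
    """One phase screen theta(x, y) = Re{IFFT[M * phi(kappa)]}, where phi^2 is the
    modified von Karman phase PSD and M is complex Gaussian noise (mean 0, variance 1)."""
    rng = np.random.default_rng(0) if rng is None else rng
    k = 2 * np.pi * np.fft.fftfreq(n_grid, d=dx)
    kx, ky = np.meshgrid(k, k)
    k2 = kx**2 + ky**2
    km, k0 = 5.92 / l0, 2 * np.pi / L0
    psd = 0.49 * r0**(-5/3) * np.exp(-k2 / km**2) / (k2 + k0**2)**(11/6)
    psd[0, 0] = 0.0                                   # drop the piston term
    M = (rng.standard_normal((n_grid,) * 2)
         + 1j * rng.standard_normal((n_grid,) * 2)) / np.sqrt(2)
    dk = 2 * np.pi / (n_grid * dx)
    return np.real(np.fft.ifft2(M * np.sqrt(psd)) * n_grid * dk)

def split_step(U, screens, dz, wavelength, dx):
    """Propagate U through successive screens with H = exp(ik dz) exp(-i kappa^2 dz / 2k)."""
    k = 2 * np.pi / wavelength
    f = 2 * np.pi * np.fft.fftfreq(U.shape[0], d=dx)
    kx, ky = np.meshgrid(f, f)
    H = np.exp(1j * k * dz) * np.exp(-1j * (kx**2 + ky**2) * dz / (2 * k))
    for theta in screens:
        U = np.fft.ifft2(H * np.fft.fft2(U * np.exp(1j * theta)))
    return U
```

Because H and exp(iθ) are phase-only factors, the total power Σ|U|² is conserved at each step, which is a convenient sanity check for such a simulation.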
As shown in Fig. 1, after the atmospheric turbulence, a single optical element can be used to achieve the coordinate transformation (x, y) → (u, v), where v = a arctan(y/x) and u = −a ln(√(x² + y²)/b). The phase factor of this optical element is given by φ_1 = 2πa/(λf)[y arctan(y/x) − x ln(√(x² + y²)/b) + x]. A phase corrector is also needed to eliminate the phase distortion; it is given by φ_2 = −2πab/(λf) exp(−u/a) cos(v/a). A detailed explanation can be found in ref. [8]. For simplicity, the light beam after the log-polar transformation is denoted U'(u, v); the angular spectrum profile sent into the CNN is the focal-plane intensity of U'(u, v) behind the lens. As a result, the angular spectrum information detected in the second path is I_2(x, y). As shown in Fig. 1, after the log-polar transformation, the intensity image and the angular spectrum image are sent into two CNNs respectively.
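The two phase elements follow directly from the expressions above. The sketch below is our own evaluation of them (parameter values in the check are assumptions, not the letter's):

```python
import numpy as np

def logpolar_coords(x, y, a, b):
    """Coordinate transform of the mapper: u = -a ln(sqrt(x^2+y^2)/b), v = a arctan(y/x)."""
    return -a * np.log(np.hypot(x, y) / b), a * np.arctan2(y, x)

def mapper_phase(X, Y, a, b, wavelength, f):
    """phi_1 = (2 pi a / lambda f) [y arctan(y/x) - x ln(sqrt(x^2+y^2)/b) + x]."""
    return 2 * np.pi * a / (wavelength * f) * (
        Y * np.arctan2(Y, X) - X * np.log(np.hypot(X, Y) / b) + X)

def corrector_phase(U, V, a, b, wavelength, f):
    """phi_2 = -(2 pi a b / lambda f) exp(-u/a) cos(v/a) in the (u, v) plane."""
    return -2 * np.pi * a * b / (wavelength * f) * np.exp(-U / a) * np.cos(V / a)
```

By construction, a point on the positive x-axis at radius b maps to the origin of the (u, v) plane, and each radius-b circle unwraps into a horizontal line, which is what turns the azimuthal phase ramp into a transverse shift at the lens focus.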
Deep learning has emerged as an important class of artificial intelligence. Recently, deep learning has shown its great power in image classification [25], and the CNN is an important tool in deep learning [26]. A CNN is made up of convolutional layers and fully connected layers. Convolutional layers extract features that are inherently invariant to spatial transformations in images, and fully connected layers process the information extracted by the upstream convolutional layers nonlinearly. After the fully connected layers and appropriate activations, we finally get the predicted labels.
The structure of the MFFCNN is shown in Fig. 1. What distinguishes the MFFCNN from previous CNNs is that its input contains both intensity and angular spectrum information simultaneously. First, the MFFCNN extracts features from the intensity images and the angular spectrum images. In order to ensure the effectiveness of the features, we introduce a convolutional block attention module (CBAM) [27] after the convolutional layers, which refines features along the channel and spatial axes by inferring their attention maps. The obtained tensors z_1, z_2 are considered to exist in different feature spaces and to represent individual informative meanings. Next, we use a fusion layer [28] to disentangle dimension-specific and cross-dimension dynamics by modeling each of them explicitly. The layer is defined as a differentiable outer product between z_1 and z_2: z = [z_1; 1] ⊗ [z_2; 1]. Here, z is the fused tensor and ⊗ indicates the outer product between tensors. The extra constant dimension with value 1 generates the dimension-specific dynamics, and thus z can be viewed as a 2D square of all possible combinations of the two tensor spaces.
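The fusion layer can be illustrated in a few lines. This is a sketch of the outer-product fusion of ref. [28], not the authors' implementation: each feature vector is appended with a constant 1 before the outer product, so the fused tensor carries both vectors themselves and all their pairwise products.

```python
import numpy as np

def tensor_fusion(z1, z2):
    """z = [z1; 1] (outer product) [z2; 1]: the last row/column holds the dimension-
    specific features z2 and z1, the interior block the cross-dimension products."""
    z1a = np.append(np.asarray(z1, float), 1.0)
    z2a = np.append(np.asarray(z2, float), 1.0)
    return np.outer(z1a, z2a)
```

For z_1 of length m and z_2 of length n, the fused tensor has shape (m + 1, n + 1) and is flattened before the fully connected layers.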
Finally, the stacked fully connected layers process the fused tensor z and give final classification results.
Therefore, the MFFCNN is used to find a map f which can serve as a discriminative boundary among different TCs: l = f(I_1, I_2; W). Here, l is the TC predicted by the MFFCNN, W represents the trainable parameters in the MFFCNN, and f is the map to find.
In order to find such a map, we need to define a loss function to evaluate the difference between predictions and actual TCs; then, through mini-batch gradient descent, we minimize the loss function to make the MFFCNN more reliable. The loss function we use is the cross-entropy L = −(1/M) Σ_{i=1}^{M} Σ_{c=1}^{N} y_{i,c} log(p_{i,c}). Here M is the number of samples in a mini-batch; N is the total number of TC classes in the training set; y_{i,c} is a binary indicator which takes the value 1 if and only if the actual TC of the ith sample is c; and p_{i,c} is the probability that the TC of the ith sample is c, as predicted by the MFFCNN.
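This loss is the standard categorical cross-entropy; a minimal framework-free sketch of our own is:

```python
import numpy as np

def cross_entropy(y, p, eps=1e-12):
    """L = -(1/M) * sum_i sum_c y_ic * log(p_ic) for one-hot labels y and
    predicted class probabilities p, both of shape (M, N)."""
    p = np.clip(p, eps, 1.0)   # avoid log(0)
    return -np.mean(np.sum(y * np.log(p), axis=1))
```

A perfect one-hot prediction gives a loss of 0, and a uniform guess over N classes gives log(N), which is the usual baseline for an untrained classifier.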

Results
The wavelength λ used in this letter is 532 nm. The beam waist w_0 of the LG light is 0.03 m. The size of the atmospheric turbulence phase screen is 600 × 600. Before being sent into the MFFCNN, images are resized to 128 × 128. In the training sets there are 600 images for each class of OAM modes, and in the testing sets there are 200 images for each class. In the log-polar transformation we use the following parameters: a = 0.075/(2π) m, b = 0.075 m. A discussion of the number of images per mode in the training set is given below.

Recognition of OAM modes unlimited by distance
The MIADLFR method can be used to recognize OAM modes even when the LG light in the training set has propagated a different distance from that in the testing set. In order to show that, unlike previous deep learning methods (intensity one-dimensional recognition), recognition achieved by MIADLFR is robust to changes of distance, we first use a training set generated at a fixed distance (1000 m in our example in Fig. 2), just like previous methods, and then change the distance for the testing sets. As shown in Fig. 2, the accuracy of traditional methods only reaches its peak when the distances for the training and testing sets are the same; the accuracy of one-dimensional intensity recognition drops quickly if the distance for the testing set differs from that of the training set. We believe this is because, as the LG light propagates, the beam radius grows, and the beam radius is an important feature the CNN extracts for mode recognition; as a result, previous methods are not suitable for recognizing OAM modes at distances different from the training set. The heat maps in Fig. 3 verify this explanation. As shown in Fig. 3(a), when the distance for the training set is larger than that for the testing set, the TC predicted by the one-dimensional CNN is smaller than the actual TC; on the contrary, if the distance for the training set is smaller than that for the testing set, the predicted TC is larger than the actual TC, as we can see in Fig. 3(b). For multidimensional recognition, however, the accuracy is above 90% for distances less than 1200 m and decreases only slowly as the LG light propagates. The accuracy of angular spectrum one-dimensional recognition is lower than that of multidimensional recognition overall. This seems to be because features extracted from the two dimensions compensate each other, as a result of which the accuracy of multidimensional recognition is higher than that of either single dimension.
Intensity profiles of LG beams with the same absolute value of TC but opposite signs are quite similar: they share the same beam radius, which is a vital feature the CNN extracts. Thus, the intensity profile is not a suitable basis for sorting OAM modes with positive and negative TCs, which is why previous deep learning methods are all limited to positive-TC recognition. It can be seen in Fig. 4(c), (f) that the transverse position in the angular spectrum information detected in path 2 is related to the TC. As shown in Fig. 4(b), (e), atmospheric turbulence can induce crosstalk between channels, especially adjacent channels, and the crosstalk grows as the TC grows.

Recognition of OAM modes unlimited by sign of TC
As shown in Fig. 5(a), the MFFCNN is perfectly capable of sorting LG light with positive and negative TCs of the same absolute value. As the absolute value of the TC grows, the accuracy drops gradually. This is because the crosstalk between channels induced by atmospheric turbulence becomes more severe as the absolute value of the TC grows, as shown in Fig. 4(b), (e), which is consistent with previous researches [29].

Accuracy for various strength of atmospheric turbulence
It is shown in Fig. 6(a) that the accuracy of intensity one-dimensional recognition falls once the refractive index structure constant reaches 1 × 10^−15. This is because, for intensity one-dimensional recognition, the intensity profile, especially the beam radius, changes greatly when the atmospheric turbulence is strong. For angular spectrum one-dimensional recognition, when the atmospheric turbulence gets stronger the crosstalk between channels tends to be symmetric, so the accuracy declines much more slowly than for intensity one-dimensional recognition. As for multidimensional recognition, the MFFCNN extracts features from both the intensity profile and the angular spectrum, so that the recognition of OAM modes is less affected by atmospheric turbulence, and the accuracy is over 97% when C_n² reaches 7 × 10^−15.

Size of training set
It is well known that a larger training set usually results in higher accuracy, up to a certain limit. However, generating a training set usually takes a lot of time, and training a model on a larger training set takes more time as well, so a trade-off must be made on the size of the training set. In most cases one chooses the point after which the accuracy rises only slowly as the training set grows; we refer to this size per class as the converge point (CP) in this letter. It can be seen in Fig. 6(b), (c) that the farther the LG light has propagated, that is, the more severely it is affected, the larger the training set needed. When the LG light propagates 1000 m and 1200 m, the CP is 300 for the MFFCNN, while the CPs are 600 and 750 for z = 1400 m and z = 1600 m respectively. This result is reasonable, because we usually need a larger training set to find the best classification map when the input is noisier. For previous methods, the CP is 600 for all distances, as shown in Fig. 6(c). In a nutshell, the MFFCNN needs only half the training-set size of previous methods, which makes better use of the potential advantage of the large OAM state space.

Complexity
From Fig. 6(d), (e), we can see that the MFFCNN does not need a complex CNN structure to reach very high accuracy. In Fig. 6(d), the MFFCNN reaches 95% accuracy with only one convolutional layer, while the accuracy of the one-dimensional CNN is only 65%; the one-dimensional CNN needs three layers to reach the point after which a more complex CNN structure no longer raises the accuracy significantly. We believe this is because the angular spectrum dimension supplies features that would otherwise require a complex CNN to extract from the intensity dimension, while the features the angular spectrum cannot supply do not require a complex CNN to extract from the intensity dimension. As shown in Fig. 6(e), intensity one-dimensional recognition needs a larger number of channels to reach a stable point, which also means it is computationally expensive. What's more, with the help of the intensity dimension information, the accuracy of the MFFCNN is 10% higher than that of angular spectrum one-dimensional recognition throughout. To sum up, the MFFCNN is much less computationally demanding.

Conclusion
In this letter, we proposed the MIADLFR method to realize flexible recognition of OAM modes. The MIADLFR method can extract features from both intensity and angular spectrum, so it can achieve things previous deep learning methods fail to achieve and raise the prediction accuracy remarkably: it can recognize OAM modes propagating different distances and sort OAM modes with positive and negative TCs. When the atmospheric refractive index structure constant reaches 7 × 10^−15, the accuracies are 80.1%, 93.2%, and 97.9% for intensity one-dimensional, angular spectrum one-dimensional, and multidimensional recognition respectively. The MFFCNN needs only 300 images per mode in the training set to reach a relatively stable accuracy, while previous methods need at least 600. The smaller training set reduces the workload of both generating the training set and training remarkably, making it easier to exploit the potential advantage of the large OAM state space. What's more, the MFFCNN needs a much simpler CNN structure than previous methods, which makes it much faster to train and to recognize OAM modes, and thus more practical.
The MIADLFR method proposed in this letter can be used not only in OAM mode recognition; it can be applied to other problems, especially those requiring multiple inputs. For some problems, collecting a training set is technically or computationally expensive, in which case the MFFCNN can also be of use.

Figure 1 :
Figure 1: The system used for the MIADLFR method. The SLM is a spatial light modulator used to transform the Gaussian beam into an LG beam. The atmospheric turbulence screens simulate atmospheric turbulence. The log-polar mapper, phase corrector, and lens are used to obtain the angular spectrum information. The parameters of the MFFCNN are shown in the schematic diagram.

Figure 2 :
Figure 2: Accuracy of multidimensional and one-dimensional recognition of OAM modes propagating various distances through atmospheric turbulence with refractive index structure constant C_n² = 1 × 10^−14, with TC ranging from l = 1 to l = 9. The distance for the training set is 1000 m.

Figure 3 :
Figure 3: (a) Intensity one-dimensional recognition accuracy when the distance for the testing set is 200 m and the distance for the training set is 1000 m. (b) Intensity one-dimensional recognition accuracy when the distance for the testing set is 1600 m and the distance for the training set is 1000 m.

Figure 6 :
Figure 6: (a) Accuracy for LG light passing through atmospheric turbulence with various C_n² when the distance travelled is 1000 m. (b) Accuracy of MFFCNN recognition versus the number of images per mode in the training set with C_n² = 1 × 10^−14. (c) Accuracy of recognition versus the number of images per mode in the training set with C_n² = 1 × 10^−14 using previous methods. (d) Behavior of the MFFCNN and one-dimensional CNNs with different numbers of convolutional layers at z = 1000 m and C_n² = 1 × 10^−14; the x coordinates represent the structure of the convolutional layers, e.g., '16-32' means two convolutional layers with 16 and 32 channels respectively. (e) Behavior of the MFFCNN and one-dimensional CNNs with different numbers of channels at z = 1000 m and C_n² = 1 × 10^−14.