Identifying Probabilistically Shaped Modulation Formats Through 2D Stokes Planes With Two-Stage Deep Neural Networks

A lightweight two-stage convolutional neural network (CNN) based modulation format identification (MFI) scheme is proposed and demonstrated for polarization division multiplexing (PDM) fiber communication systems carrying probabilistically shaped (PS) modulation formats. The scheme is tested on a PDM system at a symbol rate of 28 GBaud. Six PS modulation formats (PS-16QAM, PS-32QAM, and PS-64QAM at 3 bit/symbol; PS-32QAM and PS-64QAM at 4 bit/symbol; and PS-64QAM at 5 bit/symbol) along with six standard modulation formats (BPSK, QPSK, 8PSK, and three uniformly shaped (US) QAM formats: US-16QAM, US-32QAM, and US-64QAM) are identified by the trained CNN. By taking advantage of computer vision techniques, the proposed scheme identifies all twelve formats with an accuracy higher than 95% when the OSNR is higher than 15 dB, and it improves the identification performance over the existing techniques. The influences of the learning rate of the CNN are also discussed.


I. INTRODUCTION
To meet the demands of various data services, an intelligent optical network [1], called the elastic optical network (EON), has drawn considerable interest in recent years. Such a network can adjust transceiver configurations, such as the modulation format and data rate, intelligently [2] according to different data demands. The key to implementing an EON is to design a hitless flexible transceiver that can identify the modulation format at the receiver end (Rx) to ensure proper demodulation.
A number of modulation format identification (MFI) methods have been proposed that can be used to design the hitless flexible Rx, including blind algorithms [3], [4] and pilot-aided schemes [5]. Other MFI methods rely on features such as the number of clusters or the higher-order statistics [6]-[10] of the power distributions in the Stokes space or constellation planes [11]-[14] of received signals. However, all of these methods are designed to handle MFI of standard modulation formats such as PSK signals and uniformly shaped (US) QAM signals. To the best of our knowledge, no existing method can identify the modulation format when probabilistically shaped QAM signals are involved.
The associate editor coordinating the review of this manuscript and approving it for publication was Gustavo Olague.
Probabilistic shaping (PS) has been receiving a significant amount of attention since it can realize higher capacity and higher spectral efficiency in optical communications [15]. The challenge for MFI is that the accuracy of the above methods degrades, since in some cases a PS-mQAM signal resembles a standard modulation format or another PS-mQAM signal. For instance, under some specific conditions, PS-16QAM in the Jones space may differ very little from US-4QAM or QPSK signals, and PS-64QAM may appear similar to PS-16QAM. In principle, the type of PS-QAM can be determined once the channel quality (OSNR or another metric) is known at the receiver, so the receiver would seem to know which PS-QAM will be sent. However, the channel quality is not constant in a real system, especially in an EON, and it is hard for a receiver to have any prior knowledge of the modulation formats used by various transmitters operating at different OSNRs.
Machine learning methods have attracted much attention, and some artificial neural network (ANN) methods have been used as identification techniques [16]. However, ANNs are not well suited to complicated situations, such as a large number of modulation formats or high-order QAM signals. Recently, convolutional neural networks (CNNs) have been used in computer vision with very good performance [17], [18]. An image-based method using a 6-layer CNN proposed in [19] improved the identification performance. However, that method uses Jones constellations and therefore has poor tolerance to frequency offset and phase noise, and the OSNR needed to identify high-order QAM signals precisely remains relatively high. Additionally, the identification rate becomes even worse if PS-QAM signals are added. In [20], we proposed a high-performance MFI scheme based on a lightweight CNN. By mapping the received signals into Stokes space and projecting the constellations in Stokes space onto three coordinate planes, the CNN-based MFI scheme achieves very high accuracy, even when the OSNR is very low. It is therefore reasonable to design the present MFI scheme following steps similar to those introduced in [20].
In this paper, we propose an MFI scheme using two-stage CNNs, which can also handle situations in which some commonly used PS modulation formats are involved. By mapping received signals into the Stokes space and projecting the constellations onto three coordinate planes, we obtain a series of images, which are used as inputs of the CNNs. Twelve modulation formats are tested: BPSK, QPSK, 8PSK, US-16QAM, US-32QAM, US-64QAM, PS-16QAM (3 bit/symbol), PS-32QAM (3 and 4 bit/symbol), and PS-64QAM (3, 4, and 5 bit/symbol). The results show that all modulation formats can be identified with an accuracy higher than 95% when the OSNR is higher than 15 dB. The tolerances of the proposed MFI scheme with respect to chromatic dispersion (CD), polarization-dependent loss (PDL), and polarization mode dispersion (PMD) are also discussed.
In section II, the two-stage CNN-based MFI scheme is introduced. The setup of the experimental system is shown in section III. In section IV, the performances of the scheme are discussed, and we explain our conclusions in section V.

II. MODULATION FORMATS IDENTIFICATION SCHEME
A. GENERATION OF PS-QAM SIGNALS
The probabilistic amplitude shaping (PAS) architecture [21] was proposed to make PS practical. In general, the standard PAS process contains three steps. In the first step, a distribution matcher (DM) is used, whose outputs form a PS $\sqrt{m}$-ary pulse amplitude modulation ($\sqrt{m}$-PAM) symbol sequence. The DM generates the shaped positive amplitudes $A = 1, 2, \cdots, \sqrt{m}/2$ of the constellation symbols $X$ following a probability distribution $P_A$ with corresponding entropy $H(P_A)$ [22], where $m$ is the order of the QAM symbols. From $H(P_A)$, we can obtain the entropy $H(P_X)$ (in bit/symbol) of the constellation symbols $X$, taking all amplitude components into account. The entropy $H(P_X)$ decreases as the distribution of the PS-QAM signal becomes more strongly shaped (see Fig. 1). In this article, constant composition distribution matching (CCDM) [23] is used. In the second step, a forward error correction (FEC) encoder may be used to ensure error-free transmission. Finally, in the third step, a symbol mapper maps the shaped bit stream into a stream of symbols. A brief architecture of PAS is shown in Fig. 2. Unlike the other PS-mQAM formats, PS-32QAM is generated with the probabilistic fold shaping (PFS) method [24].
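The relation between the degree of shaping and the entropy $H(P_X)$ can be illustrated numerically. The following is a minimal sketch, not the CCDM-based generation used in this work: it assumes a Maxwell-Boltzmann distribution $P_X(x) \propto \exp(-\nu |x|^2)$ over a square m-QAM grid (an assumption, since the shape of $P_A$ is not specified above) and searches for the shaping parameter $\nu$ that yields a target entropy such as 3 bit/symbol. The function names and the parameter `nu` are hypothetical.

```python
import numpy as np

def qam_points(m):
    """Square m-QAM grid with odd-integer coordinates (m = 16, 64, ...)."""
    side = int(round(np.sqrt(m)))
    if side * side != m:
        raise ValueError("only square QAM orders are supported in this sketch")
    levels = np.arange(-(side - 1), side, 2)  # e.g. [-3, -1, 1, 3] for 16QAM
    i, q = np.meshgrid(levels, levels)
    return (i + 1j * q).ravel()

def mb_distribution(points, nu):
    """ASSUMED Maxwell-Boltzmann shaping: P_X(x) proportional to exp(-nu*|x|^2)."""
    weights = np.exp(-nu * np.abs(points) ** 2)
    return weights / weights.sum()

def entropy_bits(p):
    """Shannon entropy H(P_X) in bit/symbol."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def solve_nu(points, target_bits, hi=5.0, iters=60):
    """Bisection on nu; the entropy decreases monotonically as nu grows."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy_bits(mb_distribution(points, mid)) > target_bits:
            lo = mid  # still too uniform: shape more strongly
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for m, target in [(16, 3.0), (64, 3.0), (64, 4.0), (64, 5.0)]:
        pts = qam_points(m)
        nu = solve_nu(pts, target)
        h = entropy_bits(mb_distribution(pts, nu))
        print(f"PS-{m}QAM: nu = {nu:.4f}, H(P_X) = {h:.3f} bit/symbol")
```

With `nu = 0` the distribution is uniform and the entropy equals $\log_2 m$; increasing `nu` concentrates probability on the inner points and lowers the entropy. PS-32QAM is not covered here because it is not a square constellation and is generated by PFS in this work.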

B. STOKES MAPPING AND IMAGE GENERATION
In a PDM system, the constellation of the received signal can be obtained after the coherent receiver. In [19], the MFI method used images generated from the Jones constellation diagram. However, in real optical communication systems with signal degradation such as phase noise and frequency offset, constellation points scatter around, or shift away from, their ideal locations. In this case, especially when the OSNR is relatively low, MFI schemes may not perform reliably. In [20], we proposed a CNN-based method that projects constellations in the Stokes space onto Stokes planes, which overcomes phase noise and frequency offset. In detail, the received PDM signals can be mapped into Stokes space using the formula

$$S = \begin{pmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{pmatrix} = \begin{pmatrix} |e_x|^2 + |e_y|^2 \\ |e_x|^2 - |e_y|^2 \\ 2\,\mathrm{Re}(e_x^* e_y) \\ 2\,\mathrm{Im}(e_x^* e_y) \end{pmatrix} = \begin{pmatrix} a_x^2 + a_y^2 \\ a_x^2 - a_y^2 \\ 2 a_x a_y \cos\delta \\ 2 a_x a_y \sin\delta \end{pmatrix} \quad (1)$$

where $e_x$, $e_y$ are the received PDM complex signals after the preceding DSP algorithms, $(a_x, a_y)$ are the amplitudes of the complex signals, and $\delta$ is the phase difference between $e_x$ and $e_y$. A 3D Stokes space is spanned by the last three components $(s_1, s_2, s_3)$ of the Stokes vector $S$ in formula (1). Because only the amplitudes and the relative phase of the two signals are retained, phase noise and frequency offset that are common to both polarizations cancel, and the constellations of signals in Stokes space are independent of them (see Fig. 3 (a)). As in [20], each constellation point $(s_1, s_2, s_3)$ in the Stokes space can be projected onto the three Stokes planes $(s_1, s_2)$, $(s_2, s_3)$, and $(s_1, s_3)$, which yields three images for the MFI scheme. In Fig. 3, QPSK and PS-16QAM are considered. In Fig. 3(b), PS-16QAM (3 bit/symbol) might not be easily distinguished from QPSK in the Jones constellation plane, but the two can still be separated by using constellations in Stokes space. In Fig. 3(c), when phase noise and frequency offset are present, PS-16QAM (3 bit/symbol) and PS-32QAM (3 bit/symbol) are too close to each other, but they can also be separated in Stokes space. Fig. 4 and Fig. 5 list the images generated from signals in the Jones constellation plane and the Stokes space for all modulation formats considered in this paper.
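The Stokes mapping and the projection onto the three planes can be sketched as follows. This is an illustrative sketch only: the sign convention for $s_3$ (built from $e_x^* e_y$), the use of a normalized 2D histogram to rasterize each plane, and the helper names `stokes_vector`, `plane_image`, and `image_combination` are assumptions, since the rasterization procedure is not detailed in the text.

```python
import numpy as np

def stokes_vector(ex, ey):
    """Map complex PDM samples (ex, ey) to Stokes parameters (s0, s1, s2, s3)."""
    s0 = np.abs(ex) ** 2 + np.abs(ey) ** 2
    s1 = np.abs(ex) ** 2 - np.abs(ey) ** 2
    cross = np.conj(ex) * ey  # a_x * a_y * exp(j*delta); any common phase cancels
    return s0, s1, 2 * cross.real, 2 * cross.imag

def plane_image(u, v, size, limit):
    """Rasterize one 2D projection as an 8-bit grayscale image (ASSUMED method)."""
    hist, _, _ = np.histogram2d(
        u, v, bins=size, range=[[-limit, limit], [-limit, limit]]
    )
    if hist.max() > 0:
        hist = hist / hist.max()  # scale the densest pixel to full brightness
    return np.round(255 * hist).astype(np.uint8)

def image_combination(ex, ey, size=224):
    """Return a (3, size, size) array: planes (s1,s2), (s2,s3), (s1,s3)."""
    _, s1, s2, s3 = stokes_vector(ex, ey)
    limit = max(np.abs(s1).max(), np.abs(s2).max(), np.abs(s3).max())
    planes = [(s1, s2), (s2, s3), (s1, s3)]
    return np.stack([plane_image(u, v, size, limit) for u, v in planes])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20_000
    # Ideal QPSK on both polarizations (synthetic data for illustration).
    ex = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, n)))
    ey = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, n)))
    # Common frequency offset plus random-walk phase noise on both polarizations.
    phase = 2 * np.pi * 1e-3 * np.arange(n) + 0.05 * np.cumsum(rng.standard_normal(n))
    rot = np.exp(1j * phase)
    clean = np.array(stokes_vector(ex, ey))
    impaired = np.array(stokes_vector(ex * rot, ey * rot))
    print("Stokes parameters unchanged:", np.allclose(clean, impaired))
    print("image combination shape:", image_combination(ex, ey).shape)
```

The demonstration multiplies both polarization tributaries by the same rotating phase, which scrambles the Jones constellation but leaves every Stokes parameter unchanged, in line with the invariance argument above.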

C. THE PROPOSED SCHEME
The proposed CNN-MFI scheme is designed for a PDM coherent system, which is shown in Fig. 6. The scheme is executed after the modulation format-independent chromatic dispersion (CD) compensation and polarization demultiplexing (Pol-deMUX). After the modulation format is determined successfully, the modulation format-dependent equalization algorithms can be carried out.
[Figure caption: The architecture of the proposed scheme. After modulation format-independent processes, the proposed MFI scheme is processed, and then all modulation format-dependent processes can be implemented after the MFI process.]
To implement the proposed CNN-MFI scheme, there are two main steps (see Fig. 7). In the first step, received signals are mapped into Stokes space, and images are generated by projecting the 3D Stokes constellation onto three 2D Stokes planes. In the second step, modulation formats are identified in real time by a trained two-stage neural network. The first-stage CNN distinguishes whether the signal belongs to a standard modulation format or is a PS-QAM signal. Then, the second-stage CNN determines the specific modulation format. The lightweight deep neural network MobileNet V2 [25], [26] is used as our CNN, which helps to improve the efficiency of the scheme. The basic unit of a MobileNet is the depthwise separable convolution, which factorizes a standard convolution into a depthwise convolution and a pointwise convolution. The resulting convolution layer, called a bottleneck, reduces the computational cost by a factor of 8 to 9 compared with a standard convolution layer. Each stage is an independent MobileNet V2 with 21 layers: 17 residual bottleneck layers, 3 convolutional layers, and 1 average pooling layer. Two stages are used because it would be hard for a single-stage CNN to learn all the modulation formats (MFs) well when so many formats are involved. If the standard MFs and PS-QAMs are distinguished first, it is easier for the second-stage CNN to determine the specific modulation format, as it only has to learn either the standard MFs or the PS-QAMs.
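The two-stage decision logic can be sketched independently of the network details. The sketch below is one possible reading of the description, assuming that the second stage consists of one classifier per group (standard or PS); the classifiers are passed in as plain callables that return one score per class, standing in for trained MobileNet V2 models. The names `identify`, `GROUPS`, and `GROUP_ORDER` are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

STANDARD_FORMATS = ["BPSK", "QPSK", "8PSK", "US-16QAM", "US-32QAM", "US-64QAM"]
PS_FORMATS = [
    "PS-16QAM (3 bit/symbol)",
    "PS-32QAM (3 bit/symbol)",
    "PS-64QAM (3 bit/symbol)",
    "PS-32QAM (4 bit/symbol)",
    "PS-64QAM (4 bit/symbol)",
    "PS-64QAM (5 bit/symbol)",
]
GROUPS: Dict[str, Sequence[str]] = {"standard": STANDARD_FORMATS, "ps": PS_FORMATS}
GROUP_ORDER = ["standard", "ps"]  # ASSUMED output order of the first stage

# A classifier maps an image combination to one score per class.
Classifier = Callable[[object], Sequence[float]]

def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])

def identify(
    image_combination: object,
    stage1: Classifier,
    stage2: Dict[str, Classifier],
) -> Tuple[str, str]:
    """Stage 1 picks the group; the matching stage-2 model picks the format."""
    group = GROUP_ORDER[argmax(stage1(image_combination))]
    labels = GROUPS[group]
    scores = stage2[group](image_combination)
    if len(scores) != len(labels):
        raise ValueError("stage-2 output size does not match the group size")
    return group, labels[argmax(scores)]

if __name__ == "__main__":
    # Stub classifiers with fixed scores, used only to exercise the routing.
    stage1 = lambda img: [0.1, 0.9]
    stage2 = {
        "standard": lambda img: [0.7, 0.1, 0.1, 0.05, 0.03, 0.02],
        "ps": lambda img: [0.0, 0.1, 0.1, 0.1, 0.1, 0.6],
    }
    print(identify(None, stage1, stage2))
```

Keeping the routing separate from the models means that each second-stage classifier only ever sees six classes, which is the motivation given above for splitting the task.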

III. EXPERIMENTAL SETUP
To verify the feasibility of the proposed scheme, a 28 GBaud coherent experimental platform was built, as shown in Fig. 8. The 28 GBaud signals of the 12 modulation formats were generated by a 65 GS/s arbitrary waveform generator (AWG) driven by an offline MATLAB program [15]. At the transmitter, external cavity lasers (ECLs) with a linewidth of 100 kHz were used to produce light at the central wavelength of 1550 nm, which was modulated by an integrated polarization-multiplexing I/Q modulator. The transmitted signals were then amplified by an EDFA and sent into an 80 km span of standard single-mode fiber (SSMF). The fiber loss was 0.2 dB/km, the chromatic dispersion parameter was 16.9 ps/(nm·km), and the nonlinear coefficient was 1.27 km$^{-1}$·W$^{-1}$. Additionally, an ASE noise source and a variable optical attenuator (VOA) were coupled into the link to vary the OSNR. At the receiver side, an optical bandpass filter (OBPF) with a bandwidth of approximately 33 GHz filtered out the out-of-band noise, and the filtered signals were fed into a polarization-diversity coherent receiver with a 42 GHz electrical bandwidth. The local oscillator (LO) had a linewidth of 100 kHz. After balanced photoelectric detection (BPD) and 80 GS/s analog-to-digital sampling by a real-time oscilloscope, the electrical signals were processed by an offline DSP module, in which the proposed CNN-MFI was embedded. Inside the DSP module, CD compensation and Pol-deMUX [27] were performed first; the signals were then mapped into Stokes space, and images were generated for the three 2D Stokes planes. These three images are called an image combination, which is used as the input of our three-channel CNN.
The modulation format was identified by feeding the generated images to the proposed MFI scheme, and then modulation format-dependent equalizations could be carried out.
VOLUME 8, 2020
We did not adopt data augmentation methods such as image flipping and translation, since the generated images are not images of natural color scenes. To train the first-stage network, we used 3,240 image combinations (270 per modulation format), and for the second-stage network, 64,800 image combinations were used (5,400 per modulation format). The OSNR ranged from 9 dB to 35 dB for the training images of both stages. The input images were 224 × 224 pixels with 256-level grayscale, so the largest memory size of an image was 49 kB. Additionally, only one generated image needs to be stored in memory at a time, which is small enough to implement the algorithm in a DSP. The training process of the CNN can be performed offline. To generate each image combination, 20,000 symbols were used, and the networks were trained for 100 epochs to converge. We selected $10^{-5}$ as the initial learning rate, and the Adam algorithm was used to adaptively adjust the learning rate during the training process.
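The storage figure and the data set sizes quoted above follow from simple arithmetic, which the short check below reproduces. It assumes one byte per pixel for the 256 gray levels and 1 kB = 1024 bytes; the variable names are illustrative.

```python
# Image storage: 224 x 224 pixels, 256 gray levels (ASSUMED 8 bits per pixel).
HEIGHT = WIDTH = 224
BITS_PER_PIXEL = 8

image_bytes = HEIGHT * WIDTH * BITS_PER_PIXEL // 8
image_kib = image_bytes / 1024          # size of one grayscale image
combination_kib = 3 * image_kib         # three planes per image combination

# Training set sizes: 12 formats times the per-format counts given in the text.
NUM_FORMATS = 12
stage1_total = NUM_FORMATS * 270
stage2_total = NUM_FORMATS * 5400

if __name__ == "__main__":
    print(f"one image: {image_bytes} bytes = {image_kib:.0f} KiB")
    print(f"one image combination: {combination_kib:.0f} KiB")
    print(f"stage-1 training set: {stage1_total} image combinations")
    print(f"stage-2 training set: {stage2_total} image combinations")
```

A full image combination is three times the size of a single image, which is why it matters that only one generated image has to be held in memory at a time.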
In the test stage, 32,400 image combinations for the 12 modulation formats (OSNR from 9 dB to 35 dB) were used to test the accuracy of the trained model. Fig. 9 shows the results of our MFI system. Fig. 9 (a) shows that the first-stage CNN can easily distinguish the standard MFs from the PS-QAMs at all OSNRs. After the second-stage CNN, the proposed system can recognize all 12 modulation formats at relatively low OSNRs. We consider the performance of the system good enough when the identification accuracy for each modulation format is higher than 95%. Fig. 11 shows that the minimum OSNR at which this criterion is met was approximately 15 dB for standard MFs and PS-QAMs, so we give the identification results for the OSNR range from 15 dB to 35 dB in Fig. 9 (b) and Fig. 9 (c). [...] higher than 9 dB, which was lower than the FEC threshold of 28 GBaud BPSK (9.73 dB). According to [28]-[30], the required OSNRs under the FEC threshold for PS-16QAM, PS-32QAM, and PS-64QAM are approximately 17 dB, 19.7 dB, and 22.5 dB, respectively, which are also lower than the [...]
It can be seen in Fig. 9 (b) that six image combinations of US-16QAM were misclassified as US-64QAM. The reason is that the visual appearance of the two formats was very similar when the OSNR was too low, e.g. 13 dB, which makes them very hard to distinguish from each other (see Fig. 10 (a)). For similar reasons, among the PS-QAM signals, four image combinations of 3 bit/symbol PS-16QAM were misclassified as 3 bit/symbol PS-64QAM, two image combinations of 3 bit/symbol PS-32QAM were misclassified as 3 bit/symbol PS-64QAM, and one image combination of 3 bit/symbol PS-64QAM was misclassified as 3 bit/symbol PS-32QAM (see Fig. 10 (b) for an example); these cases are shown in Fig. 9 (c). However, when the OSNR was higher than 16 dB, the CNN-MFI scheme identified all twelve modulation formats precisely (Fig. 11).
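To relate the misclassification counts above to the 95% criterion, the per-format accuracy can be computed as in the sketch below. The number of test image combinations per format in the 15 dB to 35 dB window is not stated in the text; the value 2,100 used here is an assumption, obtained by supposing that the 32,400 test combinations are spread evenly over 12 formats and 27 integer OSNR values (100 per format and OSNR) and that 21 of those OSNR values fall in the window.

```python
ACCURACY_CRITERION = 0.95

# ASSUMPTION: 100 test combinations per format and OSNR, 21 OSNR values (15..35 dB).
TESTS_PER_FORMAT = 100 * 21

# Misclassified image combinations per transmitted format, as quoted in the text.
misclassified = {
    "US-16QAM": 6,
    "PS-16QAM (3 bit/symbol)": 4,
    "PS-32QAM (3 bit/symbol)": 2,
    "PS-64QAM (3 bit/symbol)": 1,
}

def per_format_accuracy(errors, total):
    """Fraction of correctly identified image combinations for each format."""
    return {name: 1.0 - count / total for name, count in errors.items()}

accuracy = per_format_accuracy(misclassified, TESTS_PER_FORMAT)
meets_criterion = all(a >= ACCURACY_CRITERION for a in accuracy.values())

if __name__ == "__main__":
    for name, value in accuracy.items():
        print(f"{name}: {100 * value:.2f}%")
    print("all formats meet the 95% criterion:", meets_criterion)
```

Under this assumed test-set split, every listed format stays well above the criterion; a different split would change the percentages but not the way they are computed.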

IV. RESULTS AND DISCUSSION
As shown in Fig. 11, the proposed MFI scheme achieved accurate identification at a much lower OSNR than the conventional Stokes space cluster algorithms described in [12] and [31]. US- and PS-mQAM signals were successfully identified when the OSNR was higher than 14 dB; for PSK signals only, all PSK modulation formats were recognized when the OSNR was as low as 9 dB.
Meanwhile, the proposed MFI scheme also runs quickly and requires very little storage memory. In a 28 GBaud system, the time to acquire 20,000 symbols is approximately 0.714 µs, and it took approximately 5 ms to determine the modulation format. The hardware platform consisted of an Intel Xeon E5 v4 CPU (1.7 GHz, 8 cores), a GTX TITAN Xp GPU card, and 12 GB of memory.
In Fig. 12, the effects of the image size and the number of symbols in each image combination are considered. Three image sizes are shown: 56 × 56 pixels, 112 × 112 pixels, and 224 × 224 pixels. The number of symbols in each image combination ranges from 5,000 to 25,000 in steps of 5,000. For the same number of symbols in each image combination, the identification accuracy increases as the image size increases. However, larger images require more computational resources. Here, we take 224 × 224 pixels as a reasonable size, which is a tradeoff between identification accuracy and complexity. Additionally, for a given image size, the identification accuracy increases as the number of symbols increases. Unlike typical MFI schemes, such as those proposed in [12] and [31], the complexity and memory requirement of the proposed MFI scheme do not change considerably as the number of symbols increases, because the symbols are used only to generate images. From Fig. 12, the identification accuracy does not improve considerably once the number of symbols exceeds 20,000. Therefore, we use 20,000 symbols in the proposed MFI scheme.
In Fig. 13, we investigated the change in identification accuracy with the increasing number of epochs as we trained the CNN. The accuracy reaches a stable value after 50 or 60 epochs for all three image sizes with 20,000 symbols. Therefore, training the CNN for 60 epochs is enough for the accuracy to reach a reliable value.
Although CD can be assumed to be compensated precisely [32] and polarization demultiplexing can be performed before the MFI scheme, in practice the influence of impairments such as polarization-dependent loss (PDL), residual CD, and polarization mode dispersion (PMD) in optical transmission systems still exists and should be considered. In the following, the tolerances of the proposed MFI scheme with respect to these impairments are examined. Two uniformly shaped modulation formats (US-16QAM, US-64QAM) and two probabilistically shaped modulation formats (4 bit/symbol PS-32QAM, 4 bit/symbol PS-64QAM) at an OSNR of 22 dB are discussed as examples. All impairments are simulated using the offline MATLAB program.
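The acquisition time can be checked directly from the symbol rate. The sketch below assumes that the 20,000 symbols are taken consecutively at 28 GBaud; it also reports how many symbol periods elapse during the roughly 5 ms identification time measured on the platform described above.

```python
SYMBOL_RATE = 28e9        # symbols per second (28 GBaud)
NUM_SYMBOLS = 20_000      # symbols used per image combination
INFERENCE_TIME = 5e-3     # seconds, approximate identification time from the text

# Time needed to receive the symbols that form one image combination.
acquisition_time = NUM_SYMBOLS / SYMBOL_RATE

# Number of symbol periods that pass while one identification is computed.
symbols_during_inference = SYMBOL_RATE * INFERENCE_TIME

if __name__ == "__main__":
    print(f"acquisition time: {acquisition_time * 1e6:.3f} us")
    print(f"symbols elapsed during identification: {symbols_during_inference:.3g}")
```

The identification time on this platform is therefore several orders of magnitude longer than the acquisition time, so the computation rather than the symbol collection dominates the latency of one identification.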
Under the effect of PDL, all the constellation points in the 3D Stokes space shift linearly in the same direction, but the constellation in the 2D Stokes plane $(s_2, s_3)$ does not change (see Fig. 14). Therefore, thanks to the three-channel input, the scheme can accurately identify signals affected by PDL. PDL is set in the range from 0 dB to 10 dB. As shown in Fig. 15 (a), the proposed MFI scheme can tolerate PDL up to 6.5 dB for US-16QAM, 6.5 dB for US-64QAM, 4.5 dB for PS-32QAM (4 bit/symbol), and 5 dB for PS-64QAM (4 bit/symbol).
For the residual CD (from -300 ps/nm to 300 ps/nm), the results are shown in Fig. 15 (b). The tolerance ranges of the proposed MFI scheme for the four modulation formats are -170 to 160 ps/nm, -200 to 210 ps/nm, -220 to 230 ps/nm, and -240 to 250 ps/nm, respectively. For PMD, the differential group delay (DGD) is set in the range from 0 to 15 ps. The tolerances of the CNN-MFI system for the four modulation formats are 7 ps, 10.5 ps, 11 ps, and 12.5 ps, respectively (see Fig. 15 (c)).
From Fig. 15, the proposed MFI scheme can tolerate residual CD, PDL, and PMD over a wide range. Interestingly, for residual CD and DGD, PS-QAM signals perform better than US-QAM signals. The reason is that the inner constellation points of PS-QAM signals occur more often than the outer points, which makes the signals more robust against residual CD and DGD. However, for PDL, PS-QAM signals perform worse than US-QAM signals. This is because the images of PS-QAM signals in the 2D Stokes plane $(s_2, s_3)$ are too similar to each other when they share the same low $H(P_X)$ (see Fig. 16). Additionally, we notice that higher-order modulation formats often have better tolerance than lower-order modulation formats, since lower-order modulation formats tend to be misclassified as higher-order modulation formats. However, this does not influence the identification accuracy of US-64QAM or 4 bit/symbol PS-64QAM, as they are the highest-order modulation formats among the US- and 4 bit/symbol PS-mQAM signals.
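A tolerance range of this kind can be read off an accuracy-versus-impairment sweep as sketched below. The criterion used to define the tolerance is not stated explicitly for these sweeps; the sketch assumes the same 95% accuracy threshold as in the evaluation above, and the sweep values in the demonstration are made-up placeholders, not measured data. The function name `tolerance_range` is hypothetical.

```python
def tolerance_range(values, accuracies, threshold=0.95, reference=0.0):
    """Widest contiguous span around `reference` with accuracy >= threshold.

    Returns (low, high) in the units of `values`, or None if the point
    closest to `reference` already fails the threshold.
    """
    pairs = sorted(zip(values, accuracies))
    xs = [x for x, _ in pairs]
    ok = [a >= threshold for _, a in pairs]
    start = min(range(len(xs)), key=lambda i: abs(xs[i] - reference))
    if not ok[start]:
        return None
    lo = hi = start
    while lo > 0 and ok[lo - 1]:
        lo -= 1  # extend toward more negative impairment values
    while hi < len(xs) - 1 and ok[hi + 1]:
        hi += 1  # extend toward more positive impairment values
    return xs[lo], xs[hi]

if __name__ == "__main__":
    # HYPOTHETICAL residual CD sweep (ps/nm) and accuracies, for illustration only.
    cd = [-300, -200, -100, 0, 100, 200, 300]
    acc = [0.60, 0.97, 1.00, 1.00, 1.00, 0.96, 0.70]
    print("CD tolerance:", tolerance_range(cd, acc))
    # HYPOTHETICAL DGD sweep (ps); a one-sided impairment starting at zero.
    dgd = [0, 5, 10, 15]
    acc_dgd = [1.00, 0.99, 0.90, 0.50]
    print("DGD tolerance:", tolerance_range(dgd, acc_dgd))
```

For a two-sided impairment such as residual CD the result is an interval around zero, while for one-sided impairments such as DGD or PDL the lower bound is zero and the upper bound is the tolerance.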

V. CONCLUSION
For twelve modulation formats (BPSK, QPSK, 8PSK, US-16QAM, US-32QAM, US-64QAM, PS-16QAM (3 bit/symbol), PS-32QAM (3 and 4 bit/symbol), and PS-64QAM (3, 4, and 5 bit/symbol)), a deep learning-based modulation format identification scheme is proposed. By mapping all received signals onto three 2D Stokes planes, images are generated and fed into the two-stage CNN-MFI scheme. The results show that the proposed CNN-MFI scheme achieves a high identification accuracy (>95%) at a relatively low OSNR (>15 dB). The tolerances of the proposed scheme with respect to residual CD, PDL, and PMD are also discussed, and the results show that the scheme tolerates these effects over a wide range.
WENBO [...] From 1988 to 2008, he was a Professor in physics with the Department of Physics, School of Science, Beijing University of Posts and Telecommunications, where he is currently a Professor with the Institute of Information Photonics and Optical Communications. He is the author or coauthor of more than 200 international journal articles (including IEEE and OSA journals) and peer-reviewed conference papers (including OFC, ECOC, CLEO, and APOC), and holds more than 20 China patents. His research interests include polarization effects in fibers, multilevel phase modulation formats, recovery of optical signal distortions, and generation of optical frequency comb.
Dr YONG LI received the M.S. degree in applied mathematics, M.S.E.E., and Ph.D. degree in electrical engineering from the University of Notre Dame. He is currently an Associate Professor with the School of Electronic Engineering, Beijing University of Posts and Telecommunications. His research interests include computer vision, multispectral images, deep learning, and differential geometry.