Reliable Optical Performance Monitor: The Combination of Parallel Framework and Skip Connected Generative Adversarial Network

Future optical networks are developing towards high heterogeneity and flexibility, which means that diverse signals will be transmitted in the network and the optical performance monitor is more likely to encounter signals beyond its monitoring range. When a signal beyond the monitoring range (abnormal data) is input, the conventional optical performance monitoring (OPM) framework, which lacks data filtering, produces completely wrong results. Although the serial OPM framework can filter data, it increases the processing time. We propose a novel parallel OPM framework in which the judgement and analysis modules process the input data simultaneously to reduce the time cost. Moreover, a light-weight and high-performance skip connected generative adversarial network (GAN), trained only on normal data (within the monitoring range), is proposed for the judgement module to filter abnormal data quickly (~9 ms per image). In the simulation, eight common signals are used to test the performance of the skip connected GAN in the judgement module. The best area under the curve (AUC) value of 0.952 is obtained when the abnormal data is defined as the 60 Gbps 64QAM signal. In addition, the impact of the latent vector length, the task weights, the weight of the abnormal score, the shifted K value and the training data size on the model performance is studied.


I. INTRODUCTION
With the growth in the number of Internet users and the emergence of services such as cloud computing, artificial intelligence (AI) and the internet of things (IoT), the optical network is becoming heterogeneous, dynamic and complex in order to transmit massive amounts of data effectively [1]. Moreover, to maintain good operation and management of the optical network, it is of great significance to use optical performance monitoring (OPM) along with bit-rate and modulation format identification (BR-MFI) in the network's intermediate nodes to accurately monitor the performance parameters of the transmitted signal (e.g. bit-rate, optical signal-to-noise ratio (OSNR), modulation format, etc.) [2]. These monitored parameters, which are sent to the optical control layer, directly reflect the signal status and provide an important decision basis for the management of the optical network.
The associate editor coordinating the review of this manuscript and approving it for publication was San-Liang Lee.
In the network's intermediate nodes, the traditional framework of the optical performance monitor consists of the data generation module and the data analysis module in sequence, which means that the monitor first collects signals to generate suitable data, and then analyzes the generated data to obtain the monitoring results. In order to obtain more accurate monitoring results, a large number of deep learning (DL) technologies, such as deep neural networks (DNN) [3]-[7], convolutional neural networks (CNN) [8]-[11] and long short-term memory (LSTM) [12], [13], all of which belong to the category of supervised learning, have been applied to the data analysis module for OPM and BR-MFI. Thanks to its capability of extracting and sharing features, DL technology is more powerful than conventional machine learning (ML) technology. The usage of these supervised DL technologies can be separated into a training stage and a testing stage. It is noted that in supervised learning, the training and testing sets are derived from the same monitoring range (the same data distribution). Otherwise, the DL model would produce completely wrong results. For instance, a DL model trained with multiple phase shift keying (MPSK) signals with the aim of identifying the signal type cannot produce correct results when the input data consists of quadrature amplitude modulation (QAM) signals. However, with the development of the optical network, the optical performance monitor will inevitably encounter data beyond its monitoring range (named abnormal data). Worse still, the traditional monitoring framework has no data filtering ability, so abnormal input data leads the DL models in the data analysis module to produce wrong results.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In order to enhance the reliability of the optical performance monitor, a serial OPM framework and a generative adversarial network (GAN) with the encoder-decoder-encoder (EDE) structure were introduced with the purpose of filtering the abnormal data [14]. The judgement module is placed after the data generation module so as to filter the abnormal data before the data analysis module. The training set of the data analysis module is used to train the EDE based GAN in the judgement module to learn the distribution of the normal data, which lie within the monitoring range, so no additional data is required. The judgement module is loosely coupled with the data analysis module, which means that the DL models in the analysis module can continue to work without any modification. Compared with the idea of using a supervised learning method to identify the abundant abnormal data, our previous work [14] is low-cost and scalable, and greatly improves the reliability of monitoring. However, our previous work still has the following drawbacks: (1) The serial OPM framework increases the processing time of the input data, because the input data must be processed by each module successively. The more serial modules there are, the longer the processing takes. (2) The EDE based GAN model in the judgement module is complex and its performance is not good enough. Hence, a more advanced OPM framework as well as a lighter and better-performing algorithm in the judgement module are needed to obtain more reliable and faster optical performance monitoring.
In this article, we address the drawbacks of our previous work and propose a novel parallel OPM framework as well as the skip connected GAN model to realize reliable and fast optical performance monitoring. In the parallel OPM framework, the generated data is processed by the judgement module and the data analysis module in parallel, which shortens the processing time compared with the serial OPM framework. Moreover, the skip connected GAN is trained to learn the distribution of the input image data and the latent vector space. The skip connections in the generator of the GAN facilitate the multi-scale capture of the image space [15]. The discriminator not only helps the generator to learn the data distribution but also learns the distribution of the latent vectors by itself. In the data generation module, the data is generated by the asynchronous single channel sampling (ASCS) method. The performance of the skip connected GAN in the judgement module is verified by eight signals with diverse optical impairments (OSNR, differential group delay (DGD) and chromatic dispersion (CD)): 60/100 Gbps 4QAM, 60/100 Gbps quadrature phase-shift keying (QPSK), 60/100 Gbps 16QAM and 60/100 Gbps 64QAM.
The remainder of this article is organized as follows. In Section II, the principles of the parallel OPM framework, asynchronous single channel sampling and the skip connected GAN model are illustrated and discussed. In Section III, the simulation set-up is described, and the simulation results of the skip connected model are presented and discussed. The conclusion of this article is given in Section IV.

A. THE PARALLEL OPM FRAMEWORK
In order to maintain good control and management of the optical network, it is important to deploy optical performance monitors in the intermediate nodes. However, due to their expensive hardware and high sampling rate, digital signal processing (DSP)-based coherent detection methods are not suitable for deployment at the intermediate nodes [16]. Instead, the direct detection and asynchronous sampling (DDAS) method, which is cost-effective and requires only low-speed sampling, is used in this work. The performance parameters monitored by the optical performance monitors are sent to the control and management layer to formulate the management strategy for the network. However, there is a contradiction between the new signals that appear as the optical network is upgraded and the fixed monitoring range of the optical performance monitor. Specifically, completely wrong monitoring results are obtained when a new signal beyond the monitoring range of the optical performance monitor is input, which is a hidden danger for the management of the network. Therefore, to enhance the reliability of the monitor, a judgement module that can filter the abnormal data is needed.
The structures of the traditional OPM framework, the serial OPM framework and the newly proposed parallel OPM framework are compared in Fig. 1. To the best of our knowledge, this is the first time a parallel framework has been proposed in the OPM field. The data generation and analysis modules exist in all three OPM frameworks, whereas the judgement module only exists in the serial and parallel OPM frameworks. Without a judgement module, the traditional OPM framework cannot filter abnormal data and is therefore unreliable. Moreover, the data processing time of each OPM framework is investigated to highlight the low time-cost advantage of the parallel OPM framework. The processing times of the traditional, serial and parallel OPM frameworks from the input data to the results are defined as $t^{T}_{all}$, $t^{S}_{all}$ and $t^{P}_{all}$, respectively. The processing times of the generation, judgement and analysis modules are defined as $t_G$, $t_J$ and $t_A$, respectively. In the traditional OPM framework, the input data is processed by the data generation module and the data analysis module in sequence, which means that the processing time of the traditional OPM framework can be expressed as:

$$t^{T}_{all} = t_G + t_A \tag{1}$$

In the serial OPM framework, the input data is processed by the data generation module and then the judgement module. If the judgement module classifies the input data as normal, the input data is then processed by the data analysis module. Otherwise, the judgement module produces a warning message and denies the service. The processing time of the serial OPM framework can be expressed as:

$$t^{S}_{all} = \begin{cases} t_G + t_J + t_A, & \text{when the input is normal} \\ t_G + t_J, & \text{when the input is abnormal} \end{cases} \tag{2}$$

In the parallel OPM framework, the input data is first processed by the data generation module, and then processed by the judgement and analysis modules in parallel.
If the judgement module classifies the input data as normal, the analysis result from the data analysis module is taken as the monitoring result. Otherwise, the judgement module produces the warning message and denies the service. The processing time of the parallel OPM framework can be expressed as:

$$t^{P}_{all} = \begin{cases} t_G + \max(t_J, t_A), & \text{when the input is normal} \\ t_G + t_J, & \text{when the input is abnormal} \end{cases} \tag{3}$$

Moreover, according to equations (1)-(3) and the size relationship between $t_J$ and $t_A$, we can conclude as follows:

$$\begin{cases} t^{T}_{all} = t^{P}_{all} < t^{S}_{all}, & t_J < t_A,\ \text{normal input} \\ t^{P}_{all} = t^{S}_{all} < t^{T}_{all}, & t_J < t_A,\ \text{abnormal input} \\ t^{T}_{all} \le t^{P}_{all} < t^{S}_{all}, & t_J \ge t_A,\ \text{normal input} \\ t^{T}_{all} \le t^{P}_{all} = t^{S}_{all}, & t_J \ge t_A,\ \text{abnormal input} \end{cases} \tag{4}$$

To analyze equation (4) concretely, we take $t_J = 9$ ms from this work (shown in Section III-C) and $t_A = 51$ ms from our previous OPM work [10] (which performs OPM based on ASCS images) as examples. According to equations (1)-(3), when the input is normal, $t^{T}_{all}$, $t^{S}_{all}$ and $t^{P}_{all}$ equal $(t_G + 51)$ ms, $(t_G + 60)$ ms and $(t_G + 51)$ ms, respectively, and when the input is abnormal, they equal $(t_G + 51)$ ms, $(t_G + 9)$ ms and $(t_G + 9)$ ms, respectively. The result of this specific example is consistent with equation (4) when $t_J < t_A$. It is noted that in our parallel framework, the judgement module and the analysis module are decoupled, which means that existing OPM methods proposed by other researchers can be applied in the analysis module without any modification. As research progresses, $t_J$ and $t_A$ are likely to become shorter and shorter. However, whatever the relationship between $t_J$ and $t_A$ is, equation (4) covers all the cases. It is clear that $t^{P}_{all}$ is never larger than $t^{S}_{all}$, which demonstrates the superiority of the parallel OPM framework. The parallel structure does place a higher demand on the computing power of the optical performance monitor, but with the development of hardware, computing power is no longer an obstacle.
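The timing comparison above can be checked numerically. The following is a minimal sketch, assuming the behaviour described in the text: the traditional framework always runs generation and analysis, the serial framework runs judgement and then analysis only for normal input, and the parallel framework waits for the slower of the two concurrent modules for normal input and only for the judgement module for abnormal input. The function name `processing_times` and the value chosen for `T_G` are hypothetical, since $t_G$ is not quantified in the text.

```python
def processing_times(t_g: float, t_j: float, t_a: float, is_normal: bool) -> dict:
    """Return the end-to-end processing time of each OPM framework.

    All times share one unit (ms here). t_g, t_j and t_a are the times of the
    generation, judgement and analysis modules.
    """
    # Traditional framework: no judgement module, analysis always runs.
    traditional = t_g + t_a
    if is_normal:
        # Serial: judgement first, then analysis.
        serial = t_g + t_j + t_a
        # Parallel: both modules run concurrently, wait for the slower one.
        parallel = t_g + max(t_j, t_a)
    else:
        # Abnormal input is rejected as soon as the judgement module finishes.
        serial = t_g + t_j
        parallel = t_g + t_j
    return {"traditional": traditional, "serial": serial, "parallel": parallel}


T_G = 0.0  # hypothetical placeholder: t_G is not given in the text
normal = processing_times(T_G, t_j=9.0, t_a=51.0, is_normal=True)
abnormal = processing_times(T_G, t_j=9.0, t_a=51.0, is_normal=False)
print(normal)    # offsets relative to t_G for normal input
print(abnormal)  # offsets relative to t_G for abnormal input
```

With $t_J = 9$ ms and $t_A = 51$ ms, the offsets relative to $t_G$ reproduce the values quoted in the text (51, 60 and 51 ms for normal input; 51, 9 and 9 ms for abnormal input).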

C. THE SKIP CONNECTED GAN IN JUDGEMENT MODULE
For the purpose of filtering the abnormal data, an unsupervised GAN is used to design the DL model. The GAN proposed by Goodfellow et al. [20] is a hot topic in DL research [21]-[26]. The principle of the GAN is founded on the rivalry of two networks within a zero-sum game framework. The first network, named the generator (G), is used to capture the distribution of the input data, whilst the second network, named the discriminator (D), is used to predict the correct class (i.e., normal or abnormal). Each network constantly improves its ability through the competition until the two reach a balance. With the help of the GAN, the designed DL model has a stronger ability to learn the normal data distribution so as to filter the abnormal data. In the EDE based GAN model that we proposed before, the generator has an encoder-decoder-encoder structure for the purpose of learning the distributions of the image and latent spaces simultaneously. This kind of generator is complex and its performance needs to be improved. Here, we simplify the encoder-decoder-encoder structure of the generator to an encoder-decoder structure, and skip connections are added to enable the multi-scale capture of the image space. The encoder-decoder structure with skip connections is similar to the UNet style, which is good at capturing image details [15]. Different from the EDE based GAN, which learns the distributions of the image and latent spaces simultaneously in the generator, the skip connected GAN learns the distribution of the image space in the generator and that of the latent space in the discriminator. With the simplified structure of the generator and the added skip connections, the newly proposed skip connected GAN model is light-weight and high-performance.
The structure of the skip connected GAN, which consists of the generator (G) and the discriminator (D) networks, is illustrated in Fig. 4. In the network G, the input image $I$ with the shape of $32 \times 32 \times 3$ is down-sampled by the encoder sub-network $G_E$ to a low-dimensional feature with the shape of $1 \times 1 \times 512$; the low-dimensional feature is then up-sampled by the decoder sub-network $G_D$ to reconstruct the input image $I$ as $\hat{I}$. The sub-network $G_E$ has four layers, each of which consists of a convolution operation, LeakyReLU and BatchNorm. Mirroring the structure of $G_E$, $G_D$ also has four layers, each of which consists of a transposed convolution operation, BatchNorm and ReLU. Moreover, $G_D$ uses skip connections so that every down-sampling layer in $G_E$ is copied and concatenated to its corresponding up-sampling layer in $G_D$. The benefit of the skip connections is that they provide direct feature transfer between the layers, so that both the local and global information is probed and better image reconstruction is obtained. The network D is used to distinguish the real image $I$ from the fake image $\hat{I}$ generated by G during training. Besides, the network D also extracts the latent feature vectors of the input image and the reconstructed image from its penultimate layer, with the shape of $1 \times 1 \times 64$. Other detailed configurations such as the stride, padding and filter size are illustrated in Fig. 4. The whole skip connected GAN is trained on the normal data, and tested on both the abnormal and normal data. In the training phase, the skip connected GAN learns the distribution of the normal data in the image and latent vector spaces. In the testing phase, since the model has never been trained on the abnormal data, the reconstruction loss of the abnormal data is higher than that of the normal data, which can be used as the criterion to discriminate the normal data from the abnormal data.
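To make the shapes in the generator concrete, the following sketch traces the spatial size and channel count through a four-layer encoder and a mirrored four-layer decoder with skip concatenation. The kernel sizes, strides, paddings and intermediate channel widths below are assumptions chosen only so that a 32 × 32 × 3 input reaches 1 × 1 × 512 and is restored to 32 × 32 × 3; the actual values are those given in Fig. 4, which is not reproduced here. The names `ENCODER`, `DECODER` and `trace_generator` are hypothetical.

```python
def conv_out(n: int, k: int, s: int, p: int) -> int:
    """Spatial size after a convolution with kernel k, stride s, padding p."""
    return (n + 2 * p - k) // s + 1


def conv_transpose_out(n: int, k: int, s: int, p: int) -> int:
    """Spatial size after a transposed convolution (no output padding)."""
    return (n - 1) * s - 2 * p + k


# (out_channels, kernel, stride, padding) for each layer: assumed values.
ENCODER = [(64, 4, 2, 1), (128, 4, 2, 1), (256, 4, 2, 1), (512, 4, 1, 0)]
DECODER = [(256, 4, 1, 0), (128, 4, 2, 1), (64, 4, 2, 1), (3, 4, 2, 1)]


def trace_generator(size: int = 32, channels: int = 3) -> list:
    """Return (stage, spatial_size, channels) after every generator layer."""
    shapes, skips = [], []
    for out_c, k, s, p in ENCODER:
        size, channels = conv_out(size, k, s, p), out_c
        skips.append((size, channels))
        shapes.append(("enc", size, channels))
    skips.pop()  # the bottleneck feature itself is not reused as a skip
    for out_c, k, s, p in DECODER:
        size, channels = conv_transpose_out(size, k, s, p), out_c
        if skips:
            # Concatenate the encoder feature map of the same spatial size.
            skip_size, skip_channels = skips.pop()
            if skip_size != size:
                raise ValueError("skip connection sizes do not match")
            channels += skip_channels
        shapes.append(("dec", size, channels))
    return shapes


for stage, size, channels in trace_generator():
    print(f"{stage}: {size} x {size} x {channels}")
```

Under these assumed settings, concatenation doubles the channel count at each intermediate decoder stage, which is how the skip connections pass encoder features directly to the decoder.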
In order to train and test the skip connected GAN, a dataset $\{I_i\}^{m+n}$ is split into the training set $\{I_i\}^{m}$ and the testing set $\{I_i\}^{n}$, where $I_i \in \mathbb{R}^{32 \times 32 \times 3}$ is the input image and $m + n$ is the total number of images. The training set $\{I_i\}^{m}$ only contains $m$ normal images, and the corresponding label $y_i = 0$ denotes normal data. The testing set $\{I_i\}^{n}$ contains $n$ normal and abnormal images, and the corresponding label $y_i \in \{0, 1\}$ denotes normal and abnormal data, respectively. During the training phase, the adversarial loss, the reconstruction loss and the latent vector loss are combined to train the model. The adversarial loss is used to force G to reconstruct the image realistically, and D to distinguish the real image from the generated image. The adversarial loss is denoted as:

$$loss_{adv} = \mathbb{E}_{I \sim p_I}\left[\log D(I)\right] + \mathbb{E}_{I \sim p_I}\left[\log\left(1 - D(\hat{I})\right)\right] \tag{5}$$

The reconstruction loss is used to capture the distribution of the input normal data and to make the reconstructed image as similar as possible to the input image. The reconstruction loss defined by the $L_1$ distance is expressed as:

$$loss_{rec} = \mathbb{E}_{I \sim p_I}\left\| I - \hat{I} \right\|_1 \tag{6}$$

The latent vector loss is used to make the latent vectors of the input and the generated images as similar as possible. The output of the penultimate layer of D is used as the extracted latent vector of the input image and of the generated image. The latent vector loss is expressed as:

$$loss_{lat} = \mathbb{E}_{I \sim p_I}\left\| f(I) - f(\hat{I}) \right\|_2 \tag{7}$$

where $f(\cdot)$ is the output of D's penultimate layer. Finally, the combined total training loss is defined as the weighted sum of the three individual losses above:

$$loss_{total} = loss_{rec} + \lambda_1\, loss_{adv} + \lambda_2\, loss_{lat} \tag{8}$$

where the weight parameters $\lambda_1$ and $\lambda_2$ are used to adjust the influence of $loss_{adv}$ and $loss_{lat}$, respectively. During the testing phase, the abnormal score is defined to measure how likely a test image is to be abnormal data. The input test image is regarded as abnormal data when its abnormal score is higher than a certain threshold.
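The way the three training losses combine can be illustrated on plain Python lists. This is a minimal sketch, not the training code used in this work: the reconstruction term uses the mean absolute ($L_1$) difference stated in the text, while the Euclidean norm for the latent term and the standard GAN form of the adversarial term are assumptions. All function names are hypothetical, and the default weights are the values reported as best in Section III.

```python
import math


def reconstruction_loss(image, recon):
    """Mean absolute (L1) difference between a flattened image and its reconstruction."""
    return sum(abs(a - b) for a, b in zip(image, recon)) / len(image)


def latent_loss(feat_real, feat_fake):
    """Euclidean distance between two latent vectors (the L2 norm is an assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_real, feat_fake)))


def adversarial_loss(d_real, d_fake):
    """Standard GAN objective log D(I) + log(1 - D(I_hat)) (assumed form).

    d_real and d_fake are discriminator outputs in the open interval (0, 1).
    """
    return math.log(d_real) + math.log(1.0 - d_fake)


def total_loss(loss_rec, loss_adv, loss_lat, lam1=2.0, lam2=3.0):
    """Weighted sum of the three losses: rec + lam1 * adv + lam2 * lat."""
    return loss_rec + lam1 * loss_adv + lam2 * loss_lat


# Toy values for illustration only, not simulation data.
rec = reconstruction_loss([0.0, 1.0], [1.0, 1.0])
lat = latent_loss([0.0, 0.0], [3.0, 4.0])
adv = adversarial_loss(0.5, 0.5)
print(rec, lat, adv, total_loss(rec, adv, lat))
```

In an actual PyTorch implementation these terms would be computed on tensors with automatic differentiation; the list-based version only shows how $\lambda_1$ and $\lambda_2$ scale the adversarial and latent terms relative to the reconstruction term.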
The abnormal score $S_I$ of a given test image $I$ can be expressed as:

$$S_I = \lambda_3 R_I + (1 - \lambda_3) L_I \tag{9}$$

where $R_I$ is the reconstruction loss between the generated and input images as stated in equation (6), $L_I$ is the latent vector loss between the extracted feature vectors of the input and generated images as stated in equation (7), and $\lambda_3$ is the weight parameter of the abnormal score, balancing the influence of $R_I$ and $L_I$. The abnormal scores of the entire testing set are scaled to the range between 0 and 1.
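The scoring step can be sketched as follows. The weighting $\lambda_3 R_I + (1 - \lambda_3) L_I$ is assumed from the description of $\lambda_3$ as a weight between 0 and 1 that trades off $R_I$ against $L_I$, and min-max scaling is assumed as the way the scores are brought into the range from 0 to 1. The function names and the toy loss values are hypothetical.

```python
def abnormal_scores(rec_losses, lat_losses, lam3=0.4):
    """Combine per-image losses into abnormal scores scaled to [0, 1]."""
    raw = [lam3 * r + (1.0 - lam3) * l for r, l in zip(rec_losses, lat_losses)]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        # All scores identical: nothing to separate, return zeros.
        return [0.0] * len(raw)
    # Min-max scaling over the whole testing set (assumed normalization).
    return [(s - lo) / (hi - lo) for s in raw]


def flag_abnormal(scores, threshold):
    """An image is flagged as abnormal when its score exceeds the threshold."""
    return [s > threshold for s in scores]


# Toy per-image losses for illustration only.
scores = abnormal_scores([0.1, 0.2, 0.9], [0.1, 0.3, 0.8], lam3=0.4)
print(scores)
print(flag_abnormal(scores, threshold=0.5))
```

Setting `lam3=0` makes the score depend only on the latent vector loss, and `lam3=1` only on the reconstruction loss.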

III. SYSTEM SETUP AND RESULTS
VPItransmissionMaker and the PyTorch library are used to set up the simulation system, as illustrated in Fig. 5. Eight signals (60/100 Gbps 64QAM, 60/100 Gbps 16QAM, 60/100 Gbps QPSK, 60/100 Gbps 4QAM) are prepared in the transmitter to be transmitted over a single-mode fiber (SMF). During the transmission, the variable optical attenuator (VOA), the erbium-doped fiber amplifier (EDFA) and the CD/PMD emulator are applied to add OSNR and [...]

The AUC represents the probability that a positive sample ranks ahead of a negative sample, which is independent of the threshold, whereas the classification accuracy depends on the threshold. The AUC value (the higher the better) is commonly used to evaluate the performance of binary classifiers, so it is suitable for evaluating our model, which predicts two classes: normal or abnormal.
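The ranking interpretation of the AUC can be computed directly by comparing every positive (abnormal) sample with every negative (normal) sample. The sketch below is a simple quadratic-time illustration with ties counted as one half; the function name `auc` and the toy labels and scores are hypothetical, and a library routine would normally be used in practice.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for abnormal (positive), 0 for normal (negative).
    scores: abnormal scores, higher meaning more likely abnormal.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked ahead
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

No threshold appears anywhere in the computation, which is why the AUC is independent of the threshold chosen for the abnormal score.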

A. IMPACT OF THE TASK WEIGHTS, LATENT VECTOR LENGTH AND SHIFTED K VALUE
The impact of the task weights $\lambda_1$ and $\lambda_2$ on "GAN 1" (abnormal type: 60 Gbps 64QAM, $\lambda_3 = 0.4$, latent vector length of 64 and $k = 10$) is considered first. The results are shown in Fig. 6. Both $\lambda_1$ and $\lambda_2$ range from 0 to 5 with a step of 1. The model performance does not vary regularly with the task weights, so their impact can only be assessed through extensive simulation. The performance is relatively poor in the regions where $\lambda_1 > 4$ or $\lambda_2 < 2$. Notably, the model obtains its highest performance (AUC = 0.952) when $\lambda_1 = 2$ and $\lambda_2 = 3$. Besides, with $\lambda_1 = 2$, $\lambda_2 = 3$, $\lambda_3 = 0.4$ and $k = 10$, the AUC values of "GAN 1" for different latent vector lengths under different monitoring ranges are shown in Fig. 7. The latent vector is taken from the penultimate layer of the discriminator network, and the length of the latent vector is the number of feature map channels. Latent vector lengths of 32, 64, 100, 256 and 512 are used to evaluate the model performance. It can be seen that the optimal model performance is obtained when the latent vector length equals 64, and that the performance deteriorates when the length is either larger or smaller than 64. This is because the latent vector length directly affects the model's ability to probe the distribution of the data in the latent space. Specifically, a latent vector that is too short makes the model unable to probe the complete data distribution, while one that is too long introduces redundant information.
Moreover, the AUC performance of "GAN 1" under different monitoring ranges, as influenced by the shifted $k$ value when $\lambda_1 = 2$, $\lambda_2 = 3$, $\lambda_3 = 0.4$ and the latent vector length equals 64, is shown in Fig. 8. The value of $k$ ranges from 4 to 14 with a step of 2. For $k$ values of 4, 6, 8, 12 and 14, we use the ASCS method to generate the training and testing sets in the same way as for $k = 10$ for the different monitoring ranges. It is clear from Fig. 8 that most monitoring ranges obtain the optimal AUC performance when $k = 10$, except when 100 Gbps QPSK is defined as abnormal, for which the optimal AUC is obtained when $k = 6$. Some typical phase portrait images of the eight signal types under different $k$ values are shown in Fig. 9. For MQAM signals, a $k$ value smaller than 10 closes up the phase portraits along the diagonal or the edges, while a $k$ value larger than 10 spreads the sample points out, both of which lower the performance. For QPSK signals, when the $k$ value is larger than 8 or 10, the sample points shrink inward. Therefore, when $k$ equals 10, the phase portraits of most signal types show sufficient information for the model to learn. The task weights, latent vector length and shifted $k$ value have a direct impact on the model performance, so it is essential to pick suitable parameter values. Under the simulation conditions in this article, the best model performance is obtained when $\lambda_1 = 2$, $\lambda_2 = 3$, the latent vector length equals 64 and $k = 10$.
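The role of the shifted $k$ value can be illustrated with a small phase portrait builder. This sketch assumes the usual ASCS construction in which each sample of a single asynchronously sampled sequence is paired with the sample $k$ positions later and the pairs are accumulated into a two-dimensional histogram. It produces a single-channel count image; how the 32 × 32 × 3 input images of this work are formed from the portraits is not specified here and is not modelled. The function name `phase_portrait` and the toy sequence are hypothetical.

```python
def phase_portrait(samples, k, bins=32):
    """Accumulate the pairs (x[i], x[i + k]) into a bins x bins count image."""
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant sequence
    image = [[0] * bins for _ in range(bins)]
    for i in range(len(samples) - k):
        # Map both amplitudes to bin indices, clamping the maximum value.
        col = min(int((samples[i] - lo) / span * bins), bins - 1)
        row = min(int((samples[i + k] - lo) / span * bins), bins - 1)
        image[row][col] += 1
    return image


# Toy alternating sequence: the portrait changes completely with the shift k.
toy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
print(phase_portrait(toy, k=1, bins=2))  # pairs fall off the diagonal
print(phase_portrait(toy, k=2, bins=2))  # pairs fall on the diagonal
```

Even on this toy sequence, changing $k$ moves the sample points between the diagonal and off-diagonal regions, which is the kind of effect on the phase portraits that the comparison over $k$ examines.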

B. WEIGHT OF ABNORMAL SCORE AND FEATURE VISUALIZATION
With the 60 Gbps 64QAM signal defined as the abnormal signal, we take "GAN 1" ($\lambda_1 = 2$, $\lambda_2 = 3$, latent vector length of 64 and $k = 10$) as the research object. The histograms of the abnormal scores obtained in the testing phase for different values of the abnormal score weight $\lambda_3$ are illustrated in Fig. 10. According to equation (9), the weight $\lambda_3$ is used to balance the influence of $R_I$ and $L_I$, and it ranges from 0 to 1. The smaller the value of $\lambda_3$, the greater the effect of $L_I$ on the abnormal score. Conversely, the larger the value of $\lambda_3$, the greater the effect of $R_I$ on the abnormal score. A typical histogram is plotted for each of $\lambda_3$ = 0, 0.2, 0.4, 0.6, 0.8 and 1. It is clear that when $\lambda_3$ equals 0.4, the abnormal score distributions of the normal images and the abnormal images are well separated, and the optimal AUC (0.952) is obtained. Furthermore, the t-SNE [28] plot of the latent vectors produced by the discriminator's penultimate layer is shown in Fig. 11. The original high-dimensional latent vectors are reduced to low-dimensional vectors for the purpose of visualization. As shown in Fig. 11, there is an obvious boundary between the abnormal and normal data, which means that the discriminator is able to discriminate the normal data from the abnormal data.

C. PERFORMANCE OF THE JUDGEMENT MODULE
Based on the determined optimal parameters, the AUC values of "GAN 1", "GAN 2" and the previously proposed "EDE GAN" when different signals are defined as the abnormal ones are shown in Fig. 12. No matter which signal is defined as abnormal, "GAN 2" has the lowest AUC among the three models. The AUC values of "GAN 1" are higher than those of the "EDE GAN" in all cases except the one in which 100 Gbps QPSK is defined as abnormal. The highest AUC value of "GAN 1", 0.952, is obtained when the abnormal signal is defined as 60 Gbps 64QAM. Moreover, in order to compare the cost performance of "GAN 1" and the "EDE GAN", the number of parameters and the total and mean processing times are recorded in Table 1. On an Intel Core i7-6700 CPU, we record the total and mean times taken by the "GAN 1" and "EDE GAN" models to process the images in the testing set (2200 images, abnormal type: 60 Gbps 64QAM) one by one. It is found that, with fewer parameters, the "GAN 1" model takes about 9 ms to process a single image, which is faster than the "EDE GAN" model (∼12 ms), since a larger number of parameters slows down the processing. Thanks to its simplified structure, "GAN 1" is light-weight and fast. The above results show that the simplified model structure and the added skip connections not only shorten the processing time but also improve the model performance.
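The total and mean per-image times can be measured with a simple timing loop. The following is a minimal sketch of such a measurement, not the script used to produce Table 1: `process` is a hypothetical callable standing in for one forward pass of the judgement model on a single image, and `time_per_image` is a hypothetical helper name.

```python
import time


def time_per_image(process, images):
    """Process the images one by one and return (total_seconds, mean_seconds)."""
    if not images:
        raise ValueError("at least one image is required")
    start = time.perf_counter()
    for image in images:
        process(image)  # one image at a time, as in a per-image measurement
    total = time.perf_counter() - start
    return total, total / len(images)


# Dummy workload for illustration; a real run would pass the model's inference call.
total_s, mean_s = time_per_image(lambda img: sum(img), [[0.0] * 3072] * 100)
print(f"total: {total_s * 1e3:.3f} ms, mean: {mean_s * 1e3:.3f} ms per image")
```

Processing the images one by one rather than in batches matches the single-image use case of the judgement module, where each input must be judged as soon as it is generated.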
Taking "GAN 1" and "GAN 2" as the research objects (abnormal signal: 60 Gbps 64QAM), typical input and reconstructed images are depicted in Fig. 13. The abnormal images are marked with a red border. The input images are shown in Fig. 13(a), the images reconstructed by "GAN 1" in Fig. 13(b), and the images reconstructed by "GAN 2" in Fig. 13(c). The numbers displayed above the reconstructed images are the correlations between the reconstructed and the input images. "GAN 1" is capable of reconstructing both normal and abnormal images and achieves better reconstruction performance than "GAN 2", which means that the skip connections are powerful and can capture the distribution of both domains in the image space. Since "GAN 1" cannot make a clear distinction between normal and abnormal data in the image space, the obvious distinction is reflected in the latent vector space, as discussed in Section III-B above. Then, in order to explore the limit of the "GAN 1" performance, we reduce the training data size from the original 6600 to 5500, 4400 and 3300, and measure the AUC values on the testing set at different epochs, as shown in Fig. 14. When the training data size is 6600, the AUC curve converges at epoch 19. When the training data size is 5500, the AUC curve converges at epoch 22. When the training data size is 4400, the AUC curve converges at epoch 24. However, when the training data size is further reduced to 3300, the AUC curve no longer converges.
We can conclude that as the training data size is reduced, the AUC curve becomes more unstable and takes more epochs to converge. Finally, the AUC curve fails to converge when the training data size is 3300, since it is difficult for the "GAN 1" model to learn the data distribution from insufficient data.

IV. CONCLUSION
A novel parallel OPM framework with a skip connected GAN is proposed to filter abnormal signals, which improves the reliability of the optical performance monitor. The judgement and analysis modules are organized in parallel in the OPM framework to process the input data simultaneously, which is faster than the serial OPM framework. Moreover, the proposed skip connected GAN simplifies the EDE based GAN by means of an encoder-decoder structure together with skip connections, which makes it light-weight and high-performance. When the 60 Gbps 64QAM signal is defined as the abnormal one, the skip connected GAN obtains the optimal AUC performance (0.952). When a single image is input, the average processing time of the skip connected GAN is around 9 ms. The impact of the latent vector length, the task weights, the weight of the abnormal score, the shifted K value and the training data size on the performance of the model is studied. The parallel OPM framework and the skip connected GAN further improve the reliability and reduce the processing time, which is meaningful for the upgrading of the network.