Experimental Demonstration of Linear Inter-Channel Interference Estimation Based on Neural Networks

In this paper, an algorithm for the estimation of the linear inter-channel crosstalk in a dense-WDM polarization-multiplexed 16-QAM transmission scenario is proposed and demonstrated. The algorithm is based on the use of a feed-forward neural network (FFNN) inside the coherent digital receiver. Two types of FFNNs were considered, the first based on a regression algorithm and the second based on a classification algorithm. Both FFNN algorithms are applied to features extracted from the histograms of the in-phase and quadrature components of the equalized digital samples. After an investigation by simulation, the performance of the channel spacing estimation algorithms was experimentally validated in a 3 × 52 Gbaud 16-QAM WDM system scenario.

of the channel spacing, and the presence of a frequency offset between the local oscillator and the transmitter laser [4]. Several solutions have been proposed to mitigate the linear ICI effect arising from these impairments [5], [6], [7] using digital signal processing techniques at the receiver side. Another approach consists in minimizing the impact of linear ICI by adapting the characteristics of the transmitted signal (modulation format, symbol rate, spectral shaping, etc.) to the conditions of the link by monitoring the frequency spacing among adjacent channels [8], [9], [10], a parameter that is strictly linked to the linear ICI. Machine learning (ML)-based methods have recently received considerable attention for linear ICI and channel spacing estimation. Among the most frequently used ML algorithms are k-nearest neighbors (KNN) [11], density-based spatial clustering of applications with noise (DBSCAN) [12], and support vector machine (SVM) [13]. However, while these methods have been employed to evaluate the amount of ICI, they are limited to providing qualitative assessments [12] or treating the estimation problem as a mere classification task [11], [13].
In this paper, we propose an alternative method, based on feed-forward neural networks (FFNNs) applied to features extracted from the histograms of the in-phase and quadrature components of the equalized samples in a coherent receiver. We demonstrate that it is a valid technique for monitoring the linear ICI in a DWDM optical communication system through the estimation of the frequency spacing between optical sub-channels. Two types of FFNNs were considered: a Cross-Entropy Loss (CEL)-based neural network, which uses a classification algorithm, and a Mean Square Error (MSE)-based neural network, which employs a regression algorithm. The regression-based FFNN slightly outperforms the classification-based FFNN, achieving a channel spacing estimation Root Mean Square Error (RMSE) equal to 0.32 GHz in simulations and 1.26 GHz in the experiments, in a 3 × 52 Gbaud polarization-multiplexed (PM) 16-QAM scenario.
The paper is organized as follows. Section II introduces and describes the methodology, while Section III describes the simulation and the experimental setup. The obtained simulation results are then shown in Section IV, while the experimental results are described in Section V, comparing them with the simulations of the previous section. Finally, Section VI draws the conclusions.

II. METHODOLOGY
The proposed method is based on the generation of histograms using the in-phase (I) and quadrature (Q) signal components of 16-QAM modulated symbols after equalization in a coherent receiver, as in [11], [12], [13]. As shown in Fig. 1, four local maxima and three local minima can be extracted from each of the I and Q histograms. The resulting 14 features are concatenated in a vector, which is given as input to an FFNN. Denoting the input features as the vector $\mathbf{y}_0$, the output of an $L$-layered FFNN is the vector $\mathbf{y}_L = f_{FFNN}(\mathbf{y}_0)$, which is obtained through the following recursive computation:

$$\mathbf{y}_l = a_l\left(\mathbf{W}_l \mathbf{y}_{l-1} + \mathbf{b}_l\right), \quad l = 1, \ldots, L$$

where, for each $l$-th layer, $\mathbf{y}_l$ is the output vector, $\mathbf{W}_l$ and $\mathbf{b}_l$ are the weight matrix and the bias vector (i.e., the learnable parameters of the structure), and $a_l$ is the activation function. This nonlinear mathematical model, whose coefficients are optimized through gradient-descent-based algorithms, can be used as a universal function approximator [14], able to solve nonlinear regression problems as well as classification tasks. In this paper, two alternative criteria are used to train the FFNN:
• Mean Square Error (MSE) criterion (regression-based FFNN): the FFNN, having a linear output activation function and a scalar output, estimates the ICI in terms of sub-channel frequency spacing. In this situation, the network is trained to minimize the difference between the sub-channel spacing $\Delta f$, relative to given inputs, and the model predicted output $\widehat{\Delta f}$, thus solving a nonlinear regression problem.
• Cross-Entropy Loss (CEL) criterion (classification-based FFNN): assigning $N$ classes to $N$ different values of frequency spacing, the FFNN exploits a Softmax() output activation function to produce an $N$-dimensional output vector whose entries are an estimation of the probability that a given input belongs to a certain class [14]. In this situation, the FFNN is trained to minimize the cross-entropy between the distribution of the input samples over the $N$ classes and the distribution predicted by the FFNN.

[Fig. 2]

The proposed algorithms are able to estimate the amount of linear ICI that falls in the channel under test. Each amount of linear ICI corresponds to a single value of channel spacing, under the assumption that the two neighboring channels are equally spaced. For simplicity, and for a direct comparison with Refs. [11], [12], [13], the results in the following sections are expressed in terms of a single value of channel spacing $\Delta f$, thus assuming that the frequency spacing is the same between the channel under study and its two neighbors. Note that, if the neighboring channels are not equally spaced, the two values of channel spacing can still be estimated, for instance by complementing the ML algorithm results with spectral estimation methods, such as those described in [8], [9], which are able to identify the ratio between the linear ICI induced by the left and right channel, respectively.
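The feature extraction described at the beginning of this section can be illustrated with a minimal sketch. The text does not specify how the extrema are located or whether the features are bin heights or bin positions, so the following are assumptions made only for illustration: the features are taken as histogram counts, the four maxima are found by repeatedly picking the tallest bin and masking its neighborhood (`min_sep` is a hypothetical parameter), and each minimum is the lowest bin between two consecutive maxima. The synthetic I component (four levels plus Gaussian noise) is a stand-in for equalized samples.

```python
import random


def histogram(samples, n_bins=100, lo=-4.5, hi=4.5):
    """Count samples into n_bins equal-width bins over [lo, hi)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in samples:
        if lo <= x < hi:
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
    return counts


def extract_features(counts, min_sep=10):
    """Return (peak_bins, features): 4 peak heights followed by 3 valley heights."""
    masked = list(counts)
    peaks = []
    for _ in range(4):
        p = max(range(len(masked)), key=masked.__getitem__)
        peaks.append(p)
        # Mask the neighborhood so the next search lands on a different peak.
        for j in range(max(0, p - min_sep), min(len(masked), p + min_sep + 1)):
            masked[j] = -1
    peaks.sort()
    # One valley per pair of consecutive peaks: the lowest bin in between.
    valleys = [min(counts[a:b + 1]) for a, b in zip(peaks, peaks[1:])]
    return peaks, [counts[p] for p in peaks] + valleys


# Synthetic stand-in for the I component of equalized 16-QAM symbols.
random.seed(0)
levels = [-3.0, -1.0, 1.0, 3.0]
i_samples = [random.choice(levels) + random.gauss(0.0, 0.3) for _ in range(20000)]
peak_bins, i_features = extract_features(histogram(i_samples))
print(peak_bins, i_features)
```

Applying the same function to the Q histogram and concatenating the two 7-element lists gives a 14-element input vector of the kind described above.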
The FFNN performance is evaluated by computing the Root Mean Square Error (RMSE) between the predictions and the actual outputs on a test dataset, as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{m=1}^{M} \left(\widehat{\Delta f}^{(m)} - \Delta f^{(m)}\right)^2}$$

where $M$ is the number of test sequences, and $\Delta f^{(m)}$ and $\widehat{\Delta f}^{(m)}$ are the actual and the predicted sub-channel spacing for the $m$-th sequence. To make the results achieved by the two approaches comparable, when testing the trained CEL-based FFNN a scalar product is performed between its Softmax output $\mathbf{s} = [s_1, s_2, \ldots, s_N]$ and the vector $\mathbf{\Delta f} = [\Delta f_1, \Delta f_2, \ldots, \Delta f_N]$, where $\Delta f_i$ is the frequency spacing related to the $i$-th class among the $N$ selected. The resulting scalar output $o = \mathbf{s} \cdot \mathbf{\Delta f}$ thus becomes equivalent to the MSE-based FFNN output, and the RMSE can be computed in the same way. Using both criteria, we trained the FFNN for 30 epochs over the simulation training dataset, using an Adam optimizer with a learning rate of 0.001, momentum α = 0.9 and decay rate ρ = 0.999. In the hidden layers, the ReLU() activation function has been used, since it is the most commonly adopted in the deep learning literature [14]. The remaining FFNN hyper-parameters are discussed in Section IV.
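As an illustration of the quantities defined in this section, the following sketch implements the layer recursion with ReLU hidden layers, the scalar product that turns a Softmax output into a spacing estimate, and the RMSE. It is not the authors' implementation: the toy network, its weights, and the three class spacings (52, 55 and 58 GHz) are hypothetical values chosen so that the result can be checked by hand.

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(x - m) for x in v]
    total = sum(e)
    return [x / total for x in e]


def affine(W, b, y):
    """Compute W y + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, y)) + bi for row, bi in zip(W, b)]


def ffnn_forward(y0, layers, output_activation):
    """Apply y_l = a_l(W_l y_{l-1} + b_l); hidden layers use ReLU."""
    y = y0
    for W, b in layers[:-1]:
        y = relu(affine(W, b, y))
    W, b = layers[-1]
    return output_activation(affine(W, b, y))


def expected_spacing(s, class_spacings):
    """Scalar product o = s . delta_f between Softmax output and class spacings."""
    return sum(si * fi for si, fi in zip(s, class_spacings))


def rmse(predictions, targets):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets))


# Hypothetical toy network: 2 inputs, 2 hidden units, 3 classes.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0]),
]
class_spacings = [52.0, 55.0, 58.0]  # GHz, illustrative only
s = ffnn_forward([0.0, 0.0], layers, softmax)  # all logits equal: uniform output
o = expected_spacing(s, class_spacings)
print(o, rmse([o], [55.0]))
```

With all logits equal, the Softmax output is uniform and the scalar product returns the mean of the class spacings, which shows how the classification output can take values between the training classes.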

III. EXPERIMENTAL AND SIMULATION SETUP
To evaluate the performance of the proposed method in a flexible-grid system, a 52-Gbaud 16-QAM channel (the channel under test) and two similar interfering channels were generated with three CISCO-NCS1004 line cards. All 16-QAM transmitted symbols were shaped by a Root Raised Cosine (RRC) filter with a roll-off factor equal to 0.25. The frequency spacing between channels varied from a minimum of 52 GHz (equivalent to the symbol rate) to a maximum of 61 GHz (at which the effect of linear ICI becomes negligible compared to the ASE noise). The precise and continuous grid-less tunability supported by the card made the frequency detuning effect negligible and allowed us to monitor the crosstalk only. Moreover, a large bandwidth filter (125 GHz) was used before the receive port of the line cards to avoid any filtering issues. Amplified Spontaneous Emission (ASE) noise was added at the receiver side to test the channel spacing monitoring algorithm. The optical signal-to-noise ratio (OSNR) was initially set to 17 dB for creating the training and test datasets, and was then varied up to 19 dB to assess the performance of the algorithm in the presence of different OSNRs. The line cards allowed the downloading of symbols after equalization and before the decision.
A twin copy of the experimental setup was also simulated with a training and a validation dataset at an OSNR equal to 16 dB (corresponding to the same back-to-back performance achieved in the experimental setup), in order to find the best FFNN structures to be exploited for the experiments. The IQ histograms have been generated with a resolution of 100 bins, using sequences of $2^{18}$ symbols. In the simulations, the training dataset consisted of 12440 labeled sample sequences (uniformly distributed among the 9 training sub-channel spacings), while the validation dataset consisted of 7000 labeled sample sequences (uniformly distributed as well). The IQ histograms, from which the 14 features were extracted, were generated using sequences originating from different Monte-Carlo simulations. In the experimental analysis, both the training and test dataset consisted of 4416 labeled sample sequences (uniformly distributed among the 7 sub-channel spacings). In this scenario, the sequences with the same sub-channel frequency spacing originated from a unique long experimental acquisition. Therefore, to extend the number of available training and test data, IQ histograms were computed over partially overlapped observation windows, a technique inspired by window slicing data augmentation [15]. In the experiments, from 7 sequences of 1228800 symbols with fixed sub-channel spacing, 552 samples were retrieved using observation windows of 122880 symbols and a window shift size of 2000 symbols. In both experimental and simulation setups, a non-uniform distribution of the frequency spacing was used when generating the FFNN training sequences to take into account the non-linear dependence between the frequency spacing and the SNR penalty, as shown in Fig. 3.
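The partially overlapped observation windows can be sketched as follows. This is a minimal illustration, assuming that only full-length windows are kept and that the first window starts at the first symbol; the function names are hypothetical. Under this convention the sizes quoted above give 553 start positions, one more than the 552 samples reported, and how the boundary windows were handled in the experiments is not stated.

```python
def window_starts(n_symbols, window, shift):
    """Start indices of all full windows of length `window` spaced by `shift`."""
    return list(range(0, n_symbols - window + 1, shift))


def sliced_windows(sequence, window, shift):
    """Yield partially overlapped slices of `sequence`, one per start index."""
    for start in window_starts(len(sequence), window, shift):
        yield sequence[start:start + window]


# Toy example: 10 symbols, windows of 4 symbols shifted by 3.
toy = list(range(10))
toy_windows = list(sliced_windows(toy, window=4, shift=3))
print(toy_windows)

# Number of windows for the sizes quoted in the text (full windows only).
n_windows = len(window_starts(1228800, 122880, 2000))
print(n_windows)
```

Each slice would then be turned into its own pair of IQ histograms, so consecutive samples share most of their symbols and are strongly correlated.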
Fig. 3 clearly shows that higher channel spacings (i.e., from 58 GHz to 62 GHz) affect the SNR much less than lower ones: therefore, the related histograms tend to be more difficult to discriminate.
According to this, the following values of frequency spacing were used in the simulation and experiment:

59] GHz
Moreover, in order to prevent overfitting during simulations, the related datasets were generated separately using different instances of 16-QAM symbol sequences and ASE noise: training and validation sequences having the same sub-channel spacing are thus mutually uncorrelated.

IV. SIMULATION RESULTS
In this section, we report the results achieved using the simulation setup, which we exploited to find the best FFNN architecture to be adopted to perform the experiments using real data. We leveraged the simulation training dataset to optimize the FFNN coefficients using stochastic gradient-based optimization (the Adam optimizer described in Section II), while we used the validation dataset to evaluate the performance of the trained architecture (in order to prevent overfitting issues) and tune the model hyper-parameters. To accomplish this task, we performed a grid-search analysis, varying the number of hidden layers and the per-layer number of hidden units (i.e., the depth and the width of the FFNN). For each depth-width pair, we trained and validated the FFNN using different mini-batch sizes (B = [1, 5, 25, 50, 100, 200]), each one for 10 different runs, then averaging the RMSE results and selecting those with the best performance (i.e., lowest RMSE). The performance over the training and validation datasets is reported as contour line plots in Fig. 4. As can be observed from the figures, configurations having lower RMSE on the training dataset do not necessarily perform well on the validation dataset (thus indicating the possible occurrence of overfitting). This is especially evident in the case of the CEL-based FFNN: as the complexity (i.e., FFNN depth and width) increases, the validation RMSE (Fig. 4(c)) increases while the training RMSE (Fig. 4(a)) slightly decreases. Therefore, the best chosen architectures have respectively 1 hidden layer with 42 hidden neurons in the CEL-based FFNN case (RMSE = 0.36 GHz) and 2 hidden layers with 14 neurons each in the MSE-based FFNN case (RMSE = 0.32 GHz). It should be noted that the CEL-based FFNN, while achieving a slightly higher best-case RMSE than the MSE-based FFNN, exhibits more stable performance as the structure varies (i.e., the RMSE never exceeds 0.45 GHz). In contrast, the MSE-based FFNN performance varies strongly across the depth-width search space (with a maximum validation RMSE = 2 GHz).
To better investigate the performance of the two best CEL-based and MSE-based FFNN architectures, the RMSE relative to each tested sub-channel spacing in the validation dataset has then been evaluated and reported in Fig. 5(a) and (b).
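The grid-search procedure described in this section can be sketched as below. The sketch only captures the loop structure (depth-width pairs, mini-batch sizes, 10 runs averaged, lowest mean RMSE kept). The callable `train_and_validate` is a hypothetical placeholder for training an FFNN and returning its validation RMSE, and the depth and width grids and the stand-in scoring function are invented for the demonstration, not taken from the experiments.

```python
import itertools
import statistics


def grid_search(train_and_validate, depths, widths, batch_sizes, n_runs=10):
    """For each (depth, width), keep the batch size with the lowest mean RMSE."""
    results = {}
    for depth, width in itertools.product(depths, widths):
        best = None
        for batch in batch_sizes:
            # Average the validation RMSE over independent runs (different seeds).
            scores = [train_and_validate(depth, width, batch, seed)
                      for seed in range(n_runs)]
            mean_rmse = statistics.mean(scores)
            if best is None or mean_rmse < best[0]:
                best = (mean_rmse, batch)
        results[(depth, width)] = best
    return results


def fake_train_and_validate(depth, width, batch, seed):
    """Deterministic stand-in for real training; NOT a model of the FFNN."""
    return 0.3 + 0.1 * abs(depth - 2) + 0.01 * abs(width - 14) + 0.001 * batch


results = grid_search(
    fake_train_and_validate,
    depths=[1, 2, 3],
    widths=[7, 14, 28, 42],
    batch_sizes=[1, 5, 25, 50, 100, 200],
)
best_config = min(results, key=lambda cfg: results[cfg][0])
print(best_config, results[best_config])
```

Plotting the mean RMSE stored in `results` over the depth-width plane gives a map of the same kind as the contour plots of Fig. 4.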
As expected from the SNR vs. channel spacing analysis illustrated in Fig. 3, in both cases the RMSE tends to be higher when the overlap between the channels is small (i.e., sub-channel spacing higher than 57 GHz), as a consequence of the lower impact of the ICI on the IQ histograms. In this region, the extracted features do not contain enough information for an accurate estimation of the channel spacing. Moreover, it can be observed that the error dependency on the spacing is smoother in the case of the regression-based FFNN, as shown in Fig. 5(b), while the classification-based FFNN tends to predict the output as one of its main classes, as shown in Fig. 5(a).
As a further confirmation of the previous observations, Fig. 6 shows the simulation results in terms of the joint probability density function between the predicted and tested sub-channel spacing values for both the CEL-based and MSE-based FFNN. Consistently with Fig. 5(a), the CEL-based FFNN estimations in Fig. 6(a) tend to be distributed in a stepped fashion, according to the sub-channel frequency training classes. Moreover, as the tested sub-channel spacing increases, the distribution points become more scattered in both the MSE and CEL cases.
In conclusion, the simulation results show that the FFNNs are able to effectively estimate the sub-channel spacing, with the MSE-based FFNN slightly outperforming the CEL-based one. In terms of computational complexity, both cases have a similar number of learnable parameters (i.e., weights and biases): 673 coefficients for the MSE-based FFNN, and 555 coefficients for the CEL-based FFNN. The good performance achieved by both FFNN architectures with a limited number of parameters (i.e., fewer than 1000) suggests that using Artificial Neural Networks for ICI estimation would not impose an excessive computational burden in practical scenarios. Moreover, for both algorithms, the accuracy of the estimation tends to be lower in the region where the tested channel spacing is larger, i.e., where the impact of ICI on the system performance is less relevant and thus an accurate estimation of the amount of overlap is less important.
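The number of learnable parameters of a fully connected FFNN follows directly from its layer sizes, since each layer contributes one weight per input-output pair plus one bias per output. The sketch below assumes 14 input features, a scalar output for the regression case, and N = 9 output classes for the classification case (the number of training sub-channel spacings quoted in Section III); the function name is hypothetical. Under these assumptions, the two reported totals are reproduced by a 14-42-1 layout (673) and a 14-14-14-9 layout (555). These layouts pair the totals with the architectures in the opposite way to the assignment given in the grid-search discussion of this section, which would give 1017 and 435 coefficients under the same assumptions, so the sketch should be read as a consistency check rather than as a statement of which assignment is intended.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


one_hidden_scalar = count_parameters([14, 42, 1])        # 1 hidden layer, scalar output
two_hidden_9_classes = count_parameters([14, 14, 14, 9])  # 2 hidden layers, 9 classes
one_hidden_9_classes = count_parameters([14, 42, 9])
two_hidden_scalar = count_parameters([14, 14, 14, 1])
print(one_hidden_scalar, two_hidden_9_classes,
      one_hidden_9_classes, two_hidden_scalar)
```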

V. EXPERIMENTAL RESULTS
This section shows the experimental validation of the channel spacing estimation algorithms, in a similar setup to that used in simulations (see Section III for details). In particular, only the results obtained using the regression-based FFNN are shown in the following, since it gives better performance than the classification-based FFNN. Table II shows the RMSE performance on the simulation validation dataset of Section IV, testing the same sub-channel spacings as in the experiments.
While the overall RMSE obtained with the MSE-based FFNN on the simulated dataset is 0.32 GHz, in the experiments the same neural network yields an overall RMSE equal to 1.26 GHz. Furthermore, as observed in simulation, the RMSE relative to higher sub-channel frequency spacings (58-59 GHz) is higher than in the more overlapped scenarios. In the cases where ICI is more relevant, the RMSE is around 1 GHz, showing that the algorithm is able to effectively estimate the sub-channel spacing in a real experimental setup.
Finally, we assessed the impact of an imperfect match between the OSNR values used in training and testing. In this analysis, the FFNN, initially trained at a fixed OSNR = 17 dB (16 dB in the simulation setup), has been tested using sequences generated at three different OSNR values, with only the first value matched to the OSNR used in the training phase. Fig. 7 shows the results in terms of the distribution of the estimated frequency spacing at all the considered OSNRs for both the simulation and the experimental setup.
It is clear that, when the OSNR at which the FFNN is tested is different from the one used during training, a bias error in the estimation is present. If the OSNR values are known, this bias can be easily estimated and compensated for. However, if the difference in the OSNR values is high (e.g., 2 dB, in Fig. 7), the variance of the distribution increases and the RMSE will thus be higher than in the case in which the OSNR values in training and testing are similar.
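One simple way in which such a bias could be estimated and removed is sketched below. The text does not describe the compensation procedure, so this is only an assumption: the bias is taken as the mean signed error measured on a small calibration set acquired at the known test OSNR, and it is then subtracted from later estimates. The function names and all numerical values are hypothetical.

```python
def estimate_bias(predictions, targets):
    """Mean signed error between predicted and actual spacings (GHz)."""
    return sum(p - t for p, t in zip(predictions, targets)) / len(targets)


def compensate(predictions, bias):
    """Subtract a previously estimated bias from each prediction."""
    return [p - bias for p in predictions]


# Hypothetical calibration data at a mismatched OSNR (values in GHz).
calibration_predictions = [53.5, 55.4, 57.6]
calibration_targets = [53.0, 55.0, 57.0]
bias = estimate_bias(calibration_predictions, calibration_targets)

corrected = compensate(calibration_predictions, bias)
residual_bias = estimate_bias(corrected, calibration_targets)
print(bias, residual_bias)
```

A constant offset of this kind removes only the mean error. It does not reduce the larger variance observed when the OSNR mismatch is high.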

VI. CONCLUSION
The FFNN-based estimator proposed and analyzed in this paper is able to estimate the frequency spacing between adjacent channels with an RMSE around 0.32 GHz in simulations and around 1.26 GHz in experiments. It was also shown that the estimation error is lower when the overlap between channels is higher, i.e. when the impact of ICI on the system performance is more relevant and thus an accurate estimation of the amount of overlap is more important. Moreover, our FFNN-based method allows for a much finer estimation of frequency spacing compared to other classification-based ML algorithms proposed in the literature [11], [13], as their maximum frequency resolution is limited to the spacing step among the adopted training classes.
In conclusion, while the use of Feed-Forward Neural Networks presents promising results, there is still room for further optimization. For example, 1-D Convolution Layers could be integrated before the FFNN to improve the feature extraction from the IQ histograms (e.g., going beyond the search for local extreme points). Nevertheless, the results presented in this paper strongly support the efficacy of Artificial Neural Networks as a valuable ICI monitoring tool for optimizing future grid-less optical networks. Further research in this direction is warranted to fully exploit the potential of this approach.