Learned Signal-to-Noise Ratio Estimation in Optical Fiber Communication Links

This paper proposes a signal-to-noise ratio (SNR) estimator based on recurrent neural networks (RNNs) for optical fiber communication links. The proposed estimator jointly estimates the linear and nonlinear components of the SNR. Its input features are carefully designed as a combination of the lower quartile and entropy extracted from the received signal, and they do not require knowledge of the transmitted symbols. Three different RNN models are investigated, namely the simple RNN, gated recurrent units, and long short-term memory. The overall computational complexity of the three models, including the feature extraction and the RNN structures, is analyzed. Numerical results show that the three models provide a trade-off between the complexity of the RNN structure and the estimation accuracy. Furthermore, the proposed estimator achieves better SNR estimation accuracy at a reduced overall computational complexity compared to the literature.

The estimation of the linear and nonlinear components of the SNR based on a single-hidden-layer feedforward (FF) neural network (NN) has been presented in [3], [4], [5], [6], [7], [8]. In [3], [4], [5], the amplitude noise covariance and phase noise correlation have been used as input features to estimate the nonlinear SNR. The auto-covariance function and its principal component analysis, along with the normal and tangential components of the received signal, have been utilized as input features in [6] and [7]. In [8], the overall computational complexity of the FFNN-based nonlinear SNR estimator has been reduced by using the auto-correlation function of the received signal and its statistical moments. The estimators proposed in the literature assume that the transmitted symbols are perfectly known when extracting the input features. In practice, the estimation accuracy of such estimators is degraded since the detected symbols are used instead of the exact transmitted symbols. Moreover, the features proposed in the literature require a high computational complexity for their extraction.
Recently, in [9], joint estimation of the linear and nonlinear SNRs has been proposed using a two-hidden-layer FFNN. The input features proposed in [9] are the entropies of the real part, imaginary part, amplitude, and phase of the received signal. These input features require a low computational complexity for their extraction. However, further improvement in the SNR estimation accuracy is needed.
In this paper, a novel combination of lower quartile and entropy-based features of the received signal is proposed to jointly estimate the linear and nonlinear components of the SNR. The chosen features measure the spread and uncertainty in the received signal, which accurately characterize the SNR. Moreover, three NN structures are investigated for the proposed SNR estimator, namely the simple recurrent NN (SRN), gated recurrent units (GRU), and long short-term memory (LSTM). The overall computational complexity, including the complexity of feature extraction and the complexity of the RNN structure, is analyzed in terms of real multiplications and additions. The proposed estimator does not require the transmitted symbols to be known for extracting the proposed input features. Dual-polarization (DP) 16-ary quadrature amplitude modulation (16-QAM) over various realizations and system configurations of the standard single-mode fiber (SSMF) is utilized to construct the dataset.
The rest of the paper is organized as follows: Section II describes the system model, while Section III introduces the proposed SNR estimator. In Section IV, the computational complexity of the proposed estimator is derived. Section V discusses the numerical results. Finally, conclusions are highlighted in Section VI.

II. SYSTEM MODEL
Fig. 1 illustrates the block diagram of the DP 16-QAM coherent optical fiber communication system with the proposed SNR estimator. At the transmitter side, the random binary sequences (RBSs) for the dual orthogonal polarizations are mapped using the 16-QAM modulator. Here, the number of wavelength division multiplexed (WDM) channels is denoted by C_w. A square-root raised-cosine (SRRC) pulse shaping filter is applied after the upsampling process. Polarization beam combiners (PBCs) followed by an optical multiplexer (Mux) are employed to combine the orthogonal polarizations and mix the WDM channels' signals, respectively. The multiplexed signals are transmitted over the SSMF link, which consists of S spans. An erbium-doped fiber amplifier (EDFA) follows each span in the SSMF link to compensate for the optical fiber loss.
The optical demultiplexer (Demux) processes the received signal at the receiver side to extract each WDM channel's signal, and the center channel's signal is selected to evaluate the performance of the proposed estimator. The polarization beam splitter (PBS) splits the selected signal into dual orthogonal polarizations. Afterward, a digital signal processing (DSP) unit performs matched filtering with the SRRC filter, linear chromatic dispersion (CD) compensation, downsampling, and adaptive equalization. Finally, carrier phase estimation (CPE) and symbol decision are performed to retrieve the transmitted signal.
The SNR estimation is performed before CPE (i.e., pre-CPE) to jointly estimate the linear SNR, denoted by SNR_Lin, and the nonlinear SNR, denoted by SNR_NLin. The SNR_Lin characterizes the amplified spontaneous emission (ASE) noise effect due to the EDFAs. The SNR_NLin models the Kerr-induced nonlinear noise, which includes intra- and inter-channel nonlinear interference effects. The total SNR, denoted by SNR_t, which includes the contributions of both SNR_Lin and SNR_NLin, is expressed as

SNR_t = (SNR_Lin^{-1} + SNR_NLin^{-1})^{-1}. (1)

Here, the reference SNR_t can be measured by an optical spectrum analyzer (OSA) or calculated using the error vector magnitude principle as [10], [11]

SNR_t = (Σ_{n=1}^{N} |x_n|^2) / (Σ_{n=1}^{N} |x_n − y_n|^2), (2)

where x_n and y_n are the n-th symbols of the transmitted and post-DSP received signals, respectively, N represents the total number of symbols, and |·| denotes the absolute value operator. The reference SNR_Lin can be obtained through the optical SNR (OSNR) as [12]

SNR_Lin = (B / R_b) · OSNR, (3)

where R_b is the baud rate in GHz and B is the reference bandwidth in GHz. The OSNR can be measured using OSA-based techniques [13], [14], [15], [16] or calculated approximately by [17]

OSNR(dB) ≈ 58 + P − (10 log_10(S) + αL + F), (4)

where P is the launch power in dBm, α denotes the fiber loss in dB/km, L represents the span length in km, and F is the noise figure of the optical link in dB. The reference SNR_NLin can then be obtained from (1).
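As a small numerical sketch of the SNR relations above, the helper functions below combine the linear and nonlinear SNR components per (1), compute the EVM-based total SNR per (2), and evaluate the OSNR link-budget approximation of (4); the function names are illustrative, not from the paper.

```python
import numpy as np

def total_snr_db(snr_lin_db, snr_nlin_db):
    """Combine linear and nonlinear SNR components, Eq. (1), in dB."""
    lin = 10 ** (snr_lin_db / 10)
    nlin = 10 ** (snr_nlin_db / 10)
    return 10 * np.log10(1.0 / (1.0 / lin + 1.0 / nlin))

def evm_snr(x, y):
    """EVM-principle total SNR, Eq. (2): sum|x_n|^2 / sum|x_n - y_n|^2 (linear units)."""
    x, y = np.asarray(x), np.asarray(y)
    return np.sum(np.abs(x) ** 2) / np.sum(np.abs(x - y) ** 2)

def osnr_db_approx(P_dbm, S, alpha_db_km, L_km, F_db):
    """Link-budget OSNR approximation, Eq. (4)."""
    return 58 + P_dbm - (10 * np.log10(S) + alpha_db_km * L_km + F_db)
```

For example, equal 20 dB linear and nonlinear components combine via (1) into a total SNR about 3 dB lower than either component.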

III. PROPOSED SNR ESTIMATOR
This section introduces the proposed SNR estimator, including the input features and the three models of the RNN structure.

A. Input Features
The purpose of the input features is to capture information about SNR_Lin and SNR_NLin. Therefore, the selection of the input features is essential, since it significantly contributes to the overall computational complexity and impacts the estimation accuracy of the SNR estimator. In the proposed SNR estimator, the input features are selected based on the entropy and the lower quartile, which can be extracted directly from the pre-CPE received signal.
The entropy of the received signal quantifies the uncertainty in the signal. The entropy of the received signal, H[y], can be expressed as

H[y] = − Σ_{i=1}^{N_b} p_i log_2(p_i), (5)

where p_i is the probability of the i-th bin of the signal histogram and N_b is the number of histogram bins. The lower quartile of the received signal measures the signal spread: 25% of the received signal symbols lie below the lower quartile value when all symbols are sorted in ascending order. The lower quartile-based features are Q_l[ℜ(y)] and Q_l[ℑ(y)], which denote the lower quartile of the real and imaginary parts of y, respectively. The calculation steps of the lower quartile-based features are summarized in Algorithm 2.
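A minimal sketch of the feature extraction is given below, assuming the histogram-based entropy of (5) with 25 bins (the bin count quoted in Section V) and NumPy's percentile for the lower quartile; the helper names and the six-feature ordering are illustrative assumptions of this sketch.

```python
import numpy as np

def entropy_feature(values, n_bins=25):
    """Shannon entropy (bits) of a histogram of `values`, per Eq. (5).
    The bin count of 25 follows the simulation setup in Section V."""
    counts, _ = np.histogram(values, bins=n_bins)
    p = counts / counts.sum()
    p = p[p > 0]  # empty bins contribute 0 (lim p*log2(p) = 0)
    return -np.sum(p * np.log2(p))

def extract_features(y, c_w):
    """Six pre-CPE input features: three entropies, two lower quartiles, and C_w.
    No knowledge of the transmitted symbols is required."""
    return np.array([
        entropy_feature(np.real(y)),    # H[Re(y)]
        entropy_feature(np.abs(y)),     # H[|y|]
        entropy_feature(np.angle(y)),   # H[angle(y)]
        np.percentile(np.real(y), 25),  # Q_l[Re(y)]
        np.percentile(np.imag(y), 25),  # Q_l[Im(y)]
        float(c_w),                     # number of WDM channels
    ])
```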

B. Neural Network Structure
The proposed SNR estimator is based on an RNN that consists of input, recurrent, and output layers, as shown in Fig. 2. The input layer has six neurons corresponding to the input features H[ℜ(y)], H[|y|], H[∠y], Q_l[ℜ(y)], Q_l[ℑ(y)], and C_w. Three different RNN models are investigated, referred to as SRN, GRU, and LSTM. In the case of the SRN model, the output of the recurrent layer at any time step t is expressed as

h_t = σ_g(W_h x_t + U_h h_{t−1} + b_h), (6)

where h_{t−1} ∈ R^{N_h×1} is the recurrent layer output at the previous time step t − 1, and N_h denotes the number of recurrent layer neurons. In (6), x_t ∈ R^{N_i×1} is the vector of input features at time step t, N_i denotes the number of input layer neurons, and σ_g(·) is the sigmoid activation function. Here, W_h ∈ R^{N_h×N_i} is the weight matrix for the connection between the input layer and the recurrent layer, U_h ∈ R^{N_h×N_h} is the weight matrix for the feedback connection of the recurrent layer, and b_h ∈ R^{N_h×1} is the bias vector of the recurrent layer in the SRN model. For the GRU model, the recurrent layer output can be represented as

h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t, (7)

where z_t and h̃_t ∈ R^{N_h×1} are the outputs of the update gate and the candidate unit at time step t, respectively, which are calculated as

z_t = σ_g(W_z x_t + U_z h_{t−1} + b_z), (8)
h̃_t = σ_h(W_c x_t + U_c (r_t ⊙ h_{t−1}) + b_c), (9)

where W_z ∈ R^{N_h×N_i} is the weight matrix for the connection between the input layer and the update gate. Here, W_c ∈ R^{N_h×N_i} is the weight matrix for the connection between the input layer and the candidate unit, U_z and U_c ∈ R^{N_h×N_h} are the weight matrices for the feedback connections at time step t − 1, b_z and b_c ∈ R^{N_h×1} are the bias vectors of the update gate and the candidate unit, respectively, and ⊙ denotes the Hadamard product operator. In (9), r_t ∈ R^{N_h×1} is the output of the reset gate at time step t, which is calculated by

r_t = σ_g(W_r x_t + U_r h_{t−1} + b_r), (10)

where W_r ∈ R^{N_h×N_i} is the weight matrix for the connection between the input layer and the reset gate, U_r ∈ R^{N_h×N_h} is the weight matrix for the feedback connection of the reset gate, and b_r ∈ R^{N_h×1} is the bias vector of the reset gate.
The output of the recurrent layer in the LSTM model is

h_t = o_t ⊙ σ_h(c_t), (11)

where o_t and c_t ∈ R^{N_h×1} are the output gate and the memory cell state vector at time step t, and σ_h(·) refers to the tanh activation function. The output gate and the memory cell state vector are respectively expressed as

o_t = σ_g(W_o x_t + U_o h_{t−1} + b_o) (12)

and

c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t, (13)

where W_o ∈ R^{N_h×N_i} and U_o ∈ R^{N_h×N_h} are the weight matrices for the connections between the output gate and the input layer and feedback, respectively, and b_o ∈ R^{N_h×1} denotes the bias vector of the output gate. In (13), f_t, i_t, c̃_t, and c_{t−1} ∈ R^{N_h×1} are the forget gate, input/update gate, and candidate unit outputs at time step t, and the memory cell state vector at time step t − 1, respectively. The outputs of the forget gate, input/update gate, and candidate unit are respectively given as

f_t = σ_g(W_f x_t + U_f h_{t−1} + b_f), (14)
i_t = σ_g(W_i x_t + U_i h_{t−1} + b_i), (15)
c̃_t = σ_h(W_c x_t + U_c h_{t−1} + b_c), (16)

where W_f, W_i, and W_c ∈ R^{N_h×N_i} are the weight matrices for the connections between the input layer and the forget gate, input/update gate, and candidate unit, respectively, and U_f, U_i, and U_c ∈ R^{N_h×N_h} are the weight matrices for the feedback connections.
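The three recurrences can be sketched as single-timestep updates in NumPy, with sigmoid gates and tanh candidates as in (6)-(16); the parameter-dictionary layout and function names are assumptions of this sketch.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def srn_step(x_t, h_prev, W_h, U_h, b_h):
    """SRN recurrent-layer update, Eq. (6)."""
    return sigmoid(W_h @ x_t + U_h @ h_prev + b_h)

def gru_step(x_t, h_prev, p):
    """GRU update, Eqs. (7)-(10); `p` maps names like 'Wz' to weight arrays."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])             # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])             # reset gate
    h_cand = np.tanh(p["Wc"] @ x_t + p["Uc"] @ (r * h_prev) + p["bc"])  # candidate
    return (1.0 - z) * h_prev + z * h_cand

def lstm_step(x_t, h_prev, c_prev, p):
    """LSTM update, Eqs. (11)-(16); returns (h_t, c_t)."""
    f = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])        # forget gate
    i = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])        # input/update gate
    c_cand = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])   # candidate
    o = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["bo"])        # output gate
    c = f * c_prev + i * c_cand                                    # cell state
    return o * np.tanh(c), c
```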
Here, b f , b i , and b c ∈ R N h ×1 are the bias vectors of the forget gate, input/update gate, and candidate unit, respectively.
In the RNN models, the set of activation functions is selected based on an extensive investigation to provide the best SNR estimation accuracy. Finally, the output layer of all RNN models (i.e., SRN, GRU, and LSTM) consists of two linear neurons corresponding to the estimated SNR_Lin and SNR_NLin, which can be expressed as

[SNR_Lin, SNR_NLin]^T = W_y h_t + b_y, (17)

where W_y ∈ R^{2×N_h} is the weight matrix between the recurrent layer and the output layer, and b_y ∈ R^{2×1} is the bias vector of the output layer.
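Under the stated structure (six input features, one recurrent layer, two linear output neurons), the three models could be assembled in TensorFlow/Keras roughly as follows; the single-timestep input shape and the builder-function name are assumptions of this sketch, while the layer sizes, optimizer, and learning rate follow the values quoted in Section V.

```python
import tensorflow as tf

def build_estimator(cell="lstm", n_h=13, n_i=6, timesteps=1):
    """Sketch of the estimator: one recurrent layer and two linear outputs
    (SNR_Lin, SNR_NLin). Per the paper, N_h = 20 for SRN and 13 for GRU/LSTM."""
    layer = {"srn": tf.keras.layers.SimpleRNN(n_h, activation="sigmoid"),
             "gru": tf.keras.layers.GRU(n_h),
             "lstm": tf.keras.layers.LSTM(n_h)}[cell]
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(timesteps, n_i)),
        layer,
        tf.keras.layers.Dense(2, activation="linear"),  # SNR_Lin, SNR_NLin
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01), loss="mse")
    return model
```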

IV. COMPLEXITY ANALYSIS
In this section, the computational complexity in terms of real multiplications (denoted by M^(Ξ)) and real additions (denoted by A^(Ξ)) is analyzed for the three models of the proposed RNN-based estimator, compared to the estimators proposed in [4], [5], [7], [8], [9]. The overall computational complexity consists of the complexity of the input feature extraction and the feature processing using the RNN structure.
The computational complexity of the RNN structure using the SRN model comprises the complexity of the recurrent layer given in (6) and the output layer given in (17). Therefore, the computational complexity required for the SRN structure is

M_SRN = N_h(N_i + N_h) + 2N_h,  A_SRN = N_h(N_i + N_h + 1) + 2N_h.

For the GRU structure, the computational complexity is calculated based on (7) and (17), which is

M_GRU = 3N_h(N_i + N_h + 1) + 2N_h,  A_GRU = 3N_h(N_i + N_h + 1) + 4N_h.

Similarly, the computational complexity of the LSTM structure is obtained from (11) and (17), which is given as

M_LSTM = 4N_h(N_i + N_h) + 5N_h,  A_LSTM = 4N_h(N_i + N_h + 1) + 4N_h.

Consequently, based on the RNN parameters in Section III-B, the SRN, GRU, and LSTM structures cost 560, 806, and 1053 real multiplications, respectively. Further, 580, 832, and 1092 real additions are required for the SRN, GRU, and LSTM structures, respectively. The input feature extraction of the proposed estimator requires 225N + 375 real multiplications and (75 log_2(N) + 103)N + 147 real additions. A summary of the computational complexity of the proposed estimators compared to the literature is provided in Table I. It is worth noting that the complexity of the input feature extraction is the most significant contributor to the overall computational complexity of the estimators, as shown in Table I.
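One per-gate accounting that reproduces the quoted operation counts (560/580 for SRN, 806/832 for GRU, 1053/1092 for LSTM) is sketched below; the exact bookkeeping of bias additions and Hadamard products is our reconstruction, not taken verbatim from the paper.

```python
# Operation counts for the three recurrent structures plus the two-neuron
# linear output layer (N_i = 6 inputs; N_h = 20 for SRN, 13 for GRU/LSTM).
def srn_ops(n_i, n_h):
    mults = n_h * (n_i + n_h) + 2 * n_h          # recurrent layer + output layer
    adds = n_h * (n_i + n_h + 1) + 2 * n_h
    return mults, adds

def gru_ops(n_i, n_h):
    mults = 3 * n_h * (n_i + n_h + 1) + 2 * n_h  # 3 gates + Hadamards + output
    adds = 3 * n_h * (n_i + n_h + 1) + 4 * n_h
    return mults, adds

def lstm_ops(n_i, n_h):
    mults = 4 * n_h * (n_i + n_h) + 5 * n_h      # 4 gates + Hadamards + output
    adds = 4 * n_h * (n_i + n_h + 1) + 4 * n_h
    return mults, adds

print(srn_ops(6, 20), gru_ops(6, 13), lstm_ops(6, 13))
# -> (560, 580) (806, 832) (1053, 1092)
```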

V. NUMERICAL RESULTS AND DISCUSSIONS
This section investigates the trade-off between the computational complexity of the estimator structure and the estimation accuracy for the three models (i.e., SRN, GRU, and LSTM) of the proposed RNN-based estimator. Further, a comparison with the literature estimators in [4], [5], [7], [8], [9] is performed. The estimation accuracy is measured by the normalized root mean square error (NRMSE), where the range of the estimated data is utilized for normalization [18], [19]. In addition, the standard deviation (SD) is applied as a second metric to assess the three models of the proposed RNN-based estimator against the literature estimators.

TABLE I: SUMMARY OF THE OVERALL COMPUTATIONAL COMPLEXITY
TABLE II: SIMULATION PARAMETERS
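A sketch of the two metrics follows, assuming the NRMSE is the RMSE normalized by the range of the estimated data and expressed as a percentage (our reading of [18], [19]), and the SD is taken over the estimation error; the function names are illustrative.

```python
import numpy as np

def nrmse(true, est):
    """RMSE normalized by the range of the estimated data, as a percentage."""
    true, est = np.asarray(true, float), np.asarray(est, float)
    rmse = np.sqrt(np.mean((true - est) ** 2))
    return 100.0 * rmse / (est.max() - est.min())

def sd_of_error(true, est):
    """Standard deviation of the estimation error (second accuracy metric)."""
    return float(np.std(np.asarray(est, float) - np.asarray(true, float)))
```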

A. Simulation Setup
Monte-Carlo simulations are used to construct the dataset for the DP 16-QAM SSMF system. The dataset consists of 4950 realizations (3465 for training and 1485 for testing), and the simulation parameters are summarized in Table II. The SNR_Lin values range between 13.16 dB and 25.93 dB, while the SNR_NLin values range between 5.07 dB and 26.7 dB. The number of histogram bins used to extract the entropy-based features is 25. It is worth noting that the SRRC roll-off factor is an essential parameter for the performance and spectral efficiency of the optical fiber system. Small roll-off factor values are suitable for WDM systems with 50 GHz channel spacing, as they achieve high spectral efficiency as well as tolerance to CD and polarization mode dispersion. The benefit of a small roll-off factor comes from decreasing the linear crosstalk between neighboring WDM channels and reducing the unwanted higher-order sidebands in the frequency domain [20], [21], [22], [23]. Consequently, an SRRC roll-off factor of 0.14 is chosen, which is commonly used in the literature [3], [4], [5], [6], [7], [8], [9].
The TensorFlow library is used to implement the three models of the proposed RNN-based estimator described in Section III-B. The hyper-parameters of the proposed RNN-based SNR estimator are as follows: the number of recurrent layer neurons (i.e., the hidden layer size) is 20 for the SRN (i.e., N_h = 20), while the GRU and LSTM use a hidden layer size of 13 neurons (i.e., N_h = 13). Standardization is applied to the input features of all three models (SRN, GRU, and LSTM). The Adam optimizer is utilized with a learning rate of 0.01 for 150 epochs and a batch size of 25 for the SRN, GRU, and LSTM models.

B. Investigation of the Proposed Estimator
Table III shows the investigation of the input features for the three models of the proposed RNN-based estimator. In this table, five cases are considered to reveal the reason for choosing the final combination of the proposed input features. In Case 1, H[|y|] and H[∠y] are used as input features to the SRN, GRU, and LSTM models for estimating SNR_Lin and SNR_NLin. The NRMSE results for this case range from 8.40% to 10.08% for SNR_Lin and from 6.58% to 6.94% for SNR_NLin. In Case 2, the H[ℜ(y)] feature is added to the input features of Case 1, improving the NRMSE results to 6.46%-7.62% for SNR_Lin and 4.79%-5.46% for SNR_NLin. Similarly, in Case 3 and Case 4, C_w and Q_l[ℜ(y)] are respectively added as input features to further improve the estimation accuracy of SNR_Lin and SNR_NLin. Finally, in Case 5, the lower quartile-based features and the entropy-based features, in addition to C_w, are used as input features to the three models of the proposed RNN-based estimator to accurately estimate SNR_Lin and SNR_NLin. The NRMSE results for Case 5 are 4.32%-4.78% for SNR_Lin and 3.37%-3.77% for SNR_NLin. Table IV presents the trade-off between the SNR estimation accuracy and the computational complexity of the three RNN structures.
It is shown that the SRN structure has the lowest complexity compared to the GRU and LSTM structures. In contrast, the LSTM achieves the best estimation accuracy for SNR_Lin and SNR_NLin compared to the SRN and GRU. Fig. 3 depicts the learning behavior of the SRN, GRU, and LSTM models of the proposed RNN-based estimator versus the number of epochs. The vertical axis of Fig. 3 represents the MSE of the training and testing losses. It can be seen that the three proposed models are well trained, i.e., there is no overfitting or underfitting.

C. Comparison With Literature Estimators
TABLE III: INPUT FEATURES INVESTIGATION FOR THE PROPOSED RNN-BASED ESTIMATOR
TABLE IV: INVESTIGATION OF THE THREE PROPOSED RNN STRUCTURES

In Table V, the three models of the proposed RNN-based estimator are compared with the literature estimators. For the literature estimators, the transmitted symbols are replaced by the decision symbols, x̂, for extracting the features, which degrades the estimation accuracy, as shown in Table V. The summary of the overall computational complexity given in Table I for the proposed RNN-based and literature estimators is depicted in Fig. 4. This figure shows the overall computational complexity comparison between the estimators at N = 2^16. It is shown that the proposed RNN-based estimators achieve the lowest overall computational complexity in terms of real additions. For real multiplications, the proposed estimator provides a computational complexity comparable to [9] and lower than the other estimators in the literature. The reduction in the overall computational complexity of the proposed RNN-based estimator is due to utilizing input features with low extraction complexity.
Finally, the proposed RNN-based estimator utilizing SRN, GRU, or LSTM with the lower quartile and entropy-based features requires a lower overall computational complexity to accurately estimate SNR_Lin and SNR_NLin compared to the literature estimators. Furthermore, the proposed RNN-based estimator is applicable to SNR estimation in WDM systems with intra- and inter-channel nonlinear interference effects. Moreover, the general framework of the RNN-based estimator is expected to accurately estimate SNR_Lin and SNR_NLin with other modulation formats when the RNN hyper-parameters are adequately re-tuned.

VI. CONCLUSION
In this paper, a novel RNN-based SNR estimator with three different models, i.e., SRN, GRU, and LSTM, was proposed and investigated in order to simultaneously estimate SNR Lin and SNR NLin using lower quartile and entropy-based features. The proposed input features were utilized to capture the spread and uncertainty of the received signal. The overall computational complexity analysis was presented for the proposed estimator. The proposed RNN-based estimator provided more accurate SNR estimation (e.g., NRMSE of 4.32% for SNR Lin and 3.37% for SNR NLin ) at a low overall computational complexity compared to the literature.