ML-Based Spectral Power Profiles Prediction in Presence of ISRS for Ultra-Wideband Transmission

A generalized method based on machine learning (ML) and artificial neural networks (ANNs) is proposed for a fast and accurate prediction of spectral and spatial evolution of power profiles in support of performance and quality-of-transmission (QoT) real-time assessment of ultra-wideband links. These systems, operating on bandwidths larger than the standard C–band, are affected by inter-channel stimulated Raman scattering (ISRS), whose impact on power profiles evolution along the fiber is generally estimated by solving numerically a set of nonlinear ordinary differential equations (ODEs). However, the computational effort, in terms of complexity and convergence time to the solution, increases with the bandwidth and the number of transmitted wavelength division multiplexing (WDM) channels, which makes the usual approach no longer particularly suitable to operate in real time. To meet the speed requirements, three different ANNs are introduced to make fast predictions of power profiles over frequency and distance considering a wide range of scenarios: different power per channel values, different fiber types and different span lengths. Two ANNs are used on synthetic data to estimate the impact of linear and nonlinear fiber impairments in support of system modeling. Specifically, one to directly predict the evolution of spectral power profiles along the fiber and the other to estimate the coefficients to insert in a closed-form version of the EGN model. A third ANN operates on experimental data and it is used to predict power profiles at the end of the fiber for fast estimations of system performance. The obtained results show highly accurate predictions with values of maximum absolute error, computed between predicted and actual power profiles, not exceeding 0.2 dB for $\sim$97% of cases for synthetic data and always below 0.5 dB for experimental data. 
Such results prove the potential of the proposed approach, making it suitable for real-time QoT estimation.


Ann Margareth Rosa Brusin, Member, IEEE, Antonino Nespola, Mahdi Ranjbar Zefreh, Stefano Piciaccia, Pierluigi Poggiolini, Fellow, IEEE, Fabrizio Forghieri, Fellow, IEEE, and Andrea Carena, Senior Member, IEEE

I. INTRODUCTION
The disparity in growth rate between Internet data traffic (∼60%/year) and the capacity actually provided by commercial optical fiber systems (∼20%/year) will lead in the near future to the so-called capacity crunch [1]. Indeed, in the last few years, the emergence of new Internet applications, 5G technologies, cloud computing, video streaming, Internet-of-Things (IoT) and machine-to-machine (M2M) communications, together with the increase in the number of connected users and devices, has caused an exponential increase in capacity demand [2], [3]. To cope with this persistent growth, current optical communication systems must be improved.
For this purpose, different technologies are currently being investigated, each offering noticeable advantages in terms of bandwidth increase. An interesting cost-effective solution is ultra-wideband transmission, which tries to fully exploit all the optical bands (C-, L-, S-, O- and E-band) available in the fiber. In this case, the capacity increases linearly up to 10 times with respect to current standard single mode fiber (SSMF) systems [4], but most importantly this solution exploits the broad spectrum available in already deployed fibers, avoiding the installation of new cables (and the associated costs).
Even more capacity can be achieved if space division multiplexing (SDM) technologies are considered. Implemented through multiple parallel fibers (MPF), multi-mode fibers (MMF) or multi-core fibers (MCF), SDM can provide a capacity up to 2-3 orders of magnitude larger than that of current SSMF [4], [5], scaling with the number of fibers, modes or cores, respectively. Unfortunately, this larger capacity comes at the cost of deploying new dedicated cables, so ultra-wideband transmission is the most suitable candidate as a short-term solution.
When the transmission bandwidth is extended beyond the usual C-band, other nonlinear impairments, besides the Kerr effect, become stronger and no longer negligible, causing a further degradation of transmission quality. Among them, inter-channel stimulated Raman scattering (ISRS) is particularly noteworthy.
Consisting of a power transfer from higher to lower frequency carriers, ISRS depends on the spectral load provided at the input of the fiber and produces a tilt in the power spectral profile [6]. Furthermore, its effect is stronger in the first kilometers of the fiber, where the power is higher.
In optical communication network design, optimization and monitoring, the accurate estimation of physical layer propagation effects is fundamental. Computation speed is essential as well. Several physical layer models, accounting for both linear and nonlinear effects in the fiber, have been proposed over the last decade [7], [8], [9], [10], [11], [12], [13], [14]. Among them, the GN and EGN models [10], [11] have achieved wide adoption by both industry and academia.
While the original versions of these models required time-consuming numerical integration, approximate closed-form model (CFM) versions have been worked out. Initially these CFMs were limited to C-band systems, but more recently they have been extended to ultra-wideband (UWB) by making them capable of accounting for frequency-dependent dispersion, loss, non-linearity coefficient and, most importantly, ISRS [15], [16], [17], [18].
These GN/EGN approximate CFMs can provide a full UWB multi-span system performance estimation in milliseconds. However, as input, they require the spectrally resolved power evolution along the span. These spectral power profiles are shaped by ISRS. Unfortunately, their estimation requires the numerical integration of a set of nonlinear ordinary differential equations (ODEs) with a complexity that scales up with the number of WDM channels.
While approximate closed-form solutions have been proposed for the ISRS ODEs, they are accurate only within rather stringent assumptions on the WDM signal spectrum and general system features. In fact, the numerical integration of the ODEs is often preferred, because power profile accuracy is in turn key for the GN/EGN CFM to provide reliable results. The ISRS ODE integration time turns out to be the limiting factor in the speed of the overall GN/EGN CFM system performance assessment.
To solve this issue, which is particularly significant in UWB transmission scenarios, two artificial neural network (ANN) models were already presented in [19] to predict the evolution of power profiles along the fiber span. One model directly predicts the spectral and spatial evolution of power profiles, while the other predicts the parameters required in the closed-form model formula demonstrated in [20] for NLI estimation, where these parameters are instead determined through fitting. This machine learning (ML) based solution finds support in the promising results obtained when ML and ANNs were applied to other optical communication system problems, such as the analysis and design of Raman amplifiers [21], [22], [23], [24].
The determination of the NLI CFM coefficients through best fitting of the true power profile in the presence of ISRS, computed by numerically integrating (1) as in [20], is also investigated in [25], [26], [27]. In particular, [26] and [27] also take Raman amplification into account. Nevertheless, none of these works gives a clear indication of either the computational time required by the fitting procedure to determine the coefficients or the spatial resolution considered in the evaluation of the power profile evolution along frequency and distance.
As a matter of fact, the spatial resolution (as well as the spectral resolution) has a significant impact on the computational time required by the numerical Raman solver (RS) to determine the power profile evolution. A higher resolution is preferred to have a more accurate power profile description, but at the same time it is more computationally demanding. Moreover, the fitting optimization needs to be performed again from scratch every time a new link configuration is considered.
In this context, the method presented in [19] proved to be ultra-fast and highly accurate in power profile prediction, achieving maximum absolute errors within 0.1 dB. However, the analysis was limited to a single type of fiber (SSMF) and to a single value of launched power per channel ($P_{ch} = 0$ dBm). For this reason, in this paper we propose an upgrade of both ANN models to support a generalized scenario. We test the newly designed ANNs over different types of fibers and multiple power per channel values. To have full control of the data-set generation performed for different scenarios of power per channel and fiber type, the analysis is carried out on synthetic data. For simplicity in the generation of the data-sets, we assume uniform launch power profiles, namely the channels turned on are assumed to have equal power.
Afterwards, we propose an additional ANN model to be used for in-field applications to predict power profiles at the end of the fiber span. This ANN represents an interesting tool to obtain fast estimations of system performance. In this case, to consider a more reliable and realistic scenario, the ANN model is trained, validated and tested using experimental data.
The paper is organized as follows. Section II first presents the system setup used for the generation of the synthetic data-sets. After that, the two ANN models, referred to as Model 1 and Model 2, introduced to support the modeling of linear and nonlinear propagation effects, are described together with an extensive discussion of their prediction performance. Then, in Section III the third ANN model, referred to as Model 3, is presented for the prediction of power profiles at the end of the fiber span for in-field system performance evaluation. Since ANN Model 3 operates on experimental data, the experimental setup and the generation of the training and testing data-sets are illustrated. The ANN prediction performance is discussed at the end of Section III. Finally, the conclusions are drawn in Section IV.

II. ANNS TO SUPPORT MODELING OF LINEAR AND NONLINEAR PROPAGATION EFFECTS: MODEL 1 AND MODEL 2
In this Section, we first describe the scenario under analysis and the system setup considered for the generation of the synthetic data-sets used in the ML framework. Then, we present the considered ANN models, with a brief description of the training and validation process, followed by the testing results.
Similarly to the study presented in [19], the effect of different input spectral loads causing inter-channel stimulated Raman scattering (ISRS) is analyzed over a single fiber span, but compared to [19], here the study is extended to a more general scenario including multiple types of fibers and different values of channel power.
Fig. 1 (caption, partially recovered): [...] prediction of parameters $\alpha_0$, $\alpha_1$ and $\sigma$ used in the closed-form model (CFM) formula to evaluate power profile evolution. $M_{sb}$ represents the number of different spectral loads extracted for the sub-band granularity data-set for each discrete value of sub-band power $P_{sb}$, while $M_{ch}$ corresponds to the number of different spectral loads extracted for the channel granularity data-set for each discrete value of channel power $P_{ch}$.

A. System Setup and Synthetic Data-Sets Generation
To generate the training and testing data-sets required in the machine learning framework, we follow the same procedure presented in [19], where the testing data-set was generated on a channel basis, while the training data-set was generated on a subband basis, with each subband represented by a group of 10 adjacent channels. This assumption is based on the fact that power loss profiles do not change significantly over the frequencies of adjacent channels. In [19], this choice was motivated by the need to reduce the space to explore in the generation of different partial spectral load conditions for the training data-set over the 220 channels, as $2^{220}$ combinations of channels with on/off states were possible. Moreover, operating on a subband basis also reduced the complexity of the neural network, since the number of inputs and outputs is reduced. In particular, the choice of 10 adjacent channels represented a good trade-off between accuracy and complexity.
The system setup is shown in Fig. 1, path (a). Similarly to [19], here too we operate both on channels and on subbands. More precisely, in the case of channels the system under study consists of an ideal Nyquist wavelength division multiplexing (WDM) comb, composed of 220 channels ($N_{ch}$) on the ITU-T grid with 50 GHz frequency spacing over the C+L-band, operating between 185 THz and 196 THz, for a total bandwidth of 11 THz. The power carried by each channel can assume the following discrete values: $P_{ch} = [-10, -7.5, -5, -2.5, 0, 2.5]$ dBm. In the case of subbands, instead, each subband is formed by 10 adjacent channels with the same frequency spacing and symbol rate as before, corresponding to a total of 22 subbands with 500 GHz frequency spacing, each with a power given by the sum of the powers of its 10 channels. Therefore, the power levels that can be assumed by each subband are $P_{sb} = [0, 2.5, 5, 7.5, 10, 12.5]$ dBm.
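The relation between the channel and subband power levels is a sum in linear units: ten channels at −10 dBm add up to 0 dBm. A minimal Python sketch of this bookkeeping follows, assuming that every active channel in a subband carries the same power (the uniform launch power assumption). The function name, the boolean mask layout and the use of `None` for a fully switched-off subband are illustrative choices, not taken from the paper.

```python
import math

N_CH = 220            # channels on the 50 GHz grid
CH_PER_SUBBAND = 10   # adjacent channels grouped into one subband

def subband_powers_dbm(channel_on, p_ch_dbm):
    """Return the power of each subband in dBm (None if the whole subband is off).

    channel_on: list of booleans, True when the channel is switched on.
    p_ch_dbm:   power of every active channel in dBm (uniform launch power).
    """
    p_ch_mw = 10 ** (p_ch_dbm / 10)  # dBm -> mW
    powers = []
    for start in range(0, len(channel_on), CH_PER_SUBBAND):
        n_on = sum(channel_on[start:start + CH_PER_SUBBAND])
        # Powers add in linear units; convert the sum back to dBm.
        powers.append(10 * math.log10(n_on * p_ch_mw) if n_on else None)
    return powers

# Fully loaded comb at the lowest channel power considered in the text.
full_load = subband_powers_dbm([True] * N_CH, -10.0)
```

With all channels on, each of the 22 subbands sits 10 dB above the channel power, which reproduces the mapping from the $P_{ch}$ list to the $P_{sb}$ list.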
Once generated, the WDM comb is fed to the input of a single fiber span of length $L_s = 100$ km and propagated along it. Its power evolution along frequency and distance can be modelled by the set of ordinary differential equations (ODEs) reported in (1), one for each frequency channel (subband) $j$ [20], [28]:
[...] (1)
Specifically, $N$ is the number of channels (subbands) considered, $z$ is the distance from the fiber span origin, $P_j(z)$ is the power of channel $j$ at distance $z$, $\alpha_j$ is the fiber attenuation coefficient, which can differ channel by channel (subband by subband) as in general it depends on frequency, $C_r(f_i - f_j)$ is the true Raman gain profile, or Raman gain efficiency coefficient, expressed as a function of $(f_i - f_j)$, which is a characteristic of the fiber related to the effective area, and finally $P_j(z)$ is the power transferred from/to the other WDM channels due to ISRS. Since the system operates over the C+L-band, the effects caused by ISRS are no longer negligible. Since no analytical solutions are available for the expressions in (1), in general they are solved numerically by means of Raman solvers (RS). In our case, we use the RS available within the open source library GNPy [29]. Thus, given both input and output power profiles, it is possible to determine the power loss $L$ for any channel/subband $j$ along the fiber span at any desired distance $z$ as:
$$L_j(z) = P_j(0)\big|_{\mathrm{dBm}} - P_j(z)\big|_{\mathrm{dBm}} \qquad (2)$$
A qualitative representation is shown in Fig. 1, where the power loss profiles of a subband granularity data-set are plotted with respect to frequency at different distances. Depending on whether the training or the testing data-set is generated, the parameters of the fiber and the generation procedure can be different; for this reason they are explained separately in the following.
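To illustrate what a numerical Raman solver does, the following Python sketch integrates a commonly used form of the ISRS power-evolution equations with a fixed-step fourth-order Runge-Kutta scheme. Everything specific here is an assumption made for illustration: the equation form $dP_j/dz = -\alpha P_j + P_j \sum_i C_r(f_i - f_j) P_i$, the triangular Raman gain profile with a peak at 13 THz, the flat attenuation, and all function names. It is not the GNPy solver used in the paper and should not be read as the paper's equation (1).

```python
import math

DB_TO_LINEAR = math.log(10) / 10  # dB/km -> 1/km for power attenuation

def raman_gain(delta_f_thz, cr_peak, f_peak_thz=13.0):
    """Assumed triangular, odd-symmetric gain efficiency in 1/(W km)."""
    if abs(delta_f_thz) > f_peak_thz:
        return 0.0
    return cr_peak * delta_f_thz / f_peak_thz

def derivative(p_w, freqs_thz, alpha, cr_peak):
    """Assumed form: dP_j/dz = -alpha*P_j + P_j * sum_i C_r(f_i - f_j) * P_i."""
    out = []
    for pj, fj in zip(p_w, freqs_thz):
        gain = sum(raman_gain(fi - fj, cr_peak) * pi
                   for pi, fi in zip(p_w, freqs_thz))
        out.append((-alpha + gain) * pj)
    return out

def propagate(p_in_w, freqs_thz, alpha_db_km, cr_peak, length_km, step_km=0.5):
    """Fixed-step RK4 integration; returns the power per channel at length_km."""
    alpha = alpha_db_km * DB_TO_LINEAR
    p = list(p_in_w)
    h = step_km
    for _ in range(round(length_km / step_km)):
        k1 = derivative(p, freqs_thz, alpha, cr_peak)
        k2 = derivative([x + 0.5 * h * k for x, k in zip(p, k1)], freqs_thz, alpha, cr_peak)
        k3 = derivative([x + 0.5 * h * k for x, k in zip(p, k2)], freqs_thz, alpha, cr_peak)
        k4 = derivative([x + h * k for x, k in zip(p, k3)], freqs_thz, alpha, cr_peak)
        p = [x + h / 6 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p

# 22 subbands at 10 dBm (0.01 W) each; illustrative fiber parameters.
freqs = [185.25 + 0.5 * s for s in range(22)]
p_out = propagate([0.01] * 22, freqs, alpha_db_km=0.2, cr_peak=0.4, length_km=100)
loss_db = [10 * math.log10(0.01 / p) for p in p_out]
```

In this simplified model the Raman terms only redistribute power, so the total output power equals the input power reduced by the fiber attenuation, while the per-subband loss is lower at low frequencies and higher at high frequencies. The cost of each derivative evaluation grows with the square of the number of channels, which is the scaling issue that motivates the ANN approach.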
1) Training Data-Set Generation: Real deployed optical communication networks are not made up of only one type of fiber; different types can be found, characterized by different attenuation coefficients $\alpha$ and Raman gain efficiency coefficients $C_r$ (related to the fiber effective area). Therefore, since the ANN should handle different scenarios, the training data-set is not generated for just one specific type of fiber. On the contrary, both the attenuation coefficient and the Raman gain efficiency coefficient are extracted from uniform distributions ($\alpha \sim U[0.14, 0.22]$ dB/km and $C_r \sim U[0.19, 0.74]$ W$^{-1}$ km$^{-1}$, as reported in Table I), whose lower and upper limits have been selected to cover the range of values of commercial fibers. To simplify the generation of the data-sets, the fiber attenuation coefficient is assumed to be flat along frequency. Although this assumption might be too simplistic, it does not impact the validity of the methodology, because the ANN just learns the relation between the input spectral load and the power profile at the desired distance, regardless of the assumptions considered for the generation of the power profiles. To consider a more realistic scenario where the fiber loss profile is a function of frequency, the ANN just needs to receive at its input the fiber loss profile vector instead of a single scalar.
As mentioned above and similarly to [19], the training data-set generation is performed on a subband basis. Assuming that each subband has a 50% probability of being turned on or off to emulate partial loads, for each power level value we extract 7500 different configurations ($M_{sb}$) of input spectral load, attenuation coefficient $\alpha$ and Raman gain efficiency coefficient $C_r$. Then, the corresponding power profile and loss profile are determined by means of the numerical RS, with a resolution of 1 km. The frequency position of the subbands that are turned on is randomly selected. The value of $M_{sb}$ is obtained by increasing by 50% the number of partial spectral loads generated in [19], i.e. 5000, since in this analysis $\alpha$ and $C_r$ are also randomly extracted. Then, the overall training data-set is obtained by merging and shuffling all the obtained profiles, for a total of 7500 (partial spectral loads) × 100 (distance points) × 6 (channel power values) = $4.5 \cdot 10^6$ samples.
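The sampling procedure described above can be sketched in a few lines of Python. The ranges, the 50% on/off probability and the counts come from the text; the dictionary layout, the function name and the fixed seed are illustrative assumptions, and the sketch does not say how an all-off draw is handled, since the text does not specify it.

```python
import random

N_SUBBANDS = 22
SUBBAND_POWERS_DBM = [0, 2.5, 5, 7.5, 10, 12.5]
M_SB = 7500          # configurations drawn per power level
N_DISTANCES = 100    # 1 km resolution over the 100 km span

def draw_configuration(rng):
    """Draw one random scenario: on/off state per subband plus fiber parameters."""
    return {
        "subband_on": [rng.random() < 0.5 for _ in range(N_SUBBANDS)],
        "alpha_db_km": rng.uniform(0.14, 0.22),   # attenuation, dB/km
        "cr_per_w_km": rng.uniform(0.19, 0.74),   # Raman gain efficiency, 1/(W km)
    }

rng = random.Random(0)  # fixed seed only to make the example repeatable
example = draw_configuration(rng)

# Each configuration yields one loss profile per kilometre of fiber.
n_training_samples = M_SB * N_DISTANCES * len(SUBBAND_POWERS_DBM)
```

Each drawn configuration would then be passed to the Raman solver to obtain the target loss profiles at every distance point.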
2) Testing Data-Set Generation: Since the trained ANN needs to be used to make predictions on existing fibers, the testing data-sets are generated considering real commercial fibers: standard single mode fiber (SSMF), two non-zero dispersion-shifted fibers (NZDSF), Truewave-RS and LEAF [30], and two ultra-low loss fibers, TeraWave SCUBA 125 [31] and TeraWave SCUBA 150 [32]. Their attenuation coefficient, Raman gain efficiency coefficient and effective area are reported in Table I. In the case of TeraWave SCUBA 125 and TeraWave SCUBA 150, the Raman gain coefficients $C_r$ are not provided in the datasheets, therefore we computed them analytically using (3):
$$C_r = \frac{g_R}{A_{\mathrm{eff}}} \qquad (3)$$
where $A_{\mathrm{eff}}$ is the effective area given in the datasheets and for the Raman gain coefficient $g_R$ we assume the same value as for SSMF. Also in this case, channels are assumed to be on or off with 50% probability. For each fiber type and power per channel level, we extract 2500 load configurations ($M_{ch}$), with the position of the channels turned on randomly selected. The evolution of power and loss profiles is numerically determined at every kilometer along the fiber span using the RS. The resulting testing data-set is composed of 2500 (partial spectral loads) × 100 (distance points) × 6 (channel power values) × 5 (fiber types) = $7.5 \cdot 10^6$ samples.
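If the Raman gain efficiency is taken as the ratio of the Raman gain coefficient to the effective area, and $g_R$ is assumed equal to that of SSMF, the coefficient of another fiber follows by scaling with the ratio of effective areas. The Python sketch below shows this scaling. The numerical inputs (0.4 W$^{-1}$ km$^{-1}$ and 80 µm² for the reference fiber, 125 µm² for the target fiber) are placeholders chosen for illustration and are not the Table I values.

```python
def raman_gain_efficiency(g_r, a_eff):
    """C_r = g_R / A_eff (units follow from the inputs)."""
    return g_r / a_eff

def scale_from_reference(cr_ref, a_eff_ref, a_eff_new):
    """Derive C_r of a new fiber from a reference fiber sharing the same g_R."""
    g_r = cr_ref * a_eff_ref          # recover g_R from the reference fiber
    return raman_gain_efficiency(g_r, a_eff_new)

# Placeholder values, not taken from the paper's Table I.
cr_example = scale_from_reference(cr_ref=0.4, a_eff_ref=80.0, a_eff_new=125.0)

# Size of the testing data-set as stated in the text.
n_testing_samples = 2500 * 100 * 6 * 5
```

A larger effective area gives a smaller $C_r$, which is why large-area fibers show a weaker ISRS tilt at the same launch power.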

B. Machine Learning Framework and Artificial Neural Network Models
Once generated, the two data-sets, which are independent of each other, are used to train and then test the ANNs. As discussed above, the use of ANNs to make accurate spatial and spectral predictions of loss profiles was already proposed and successfully demonstrated in [19], but there it was limited to a single type of fiber (SSMF) and to 0 dBm power per channel. In particular, two ANN models were presented in [19]. A first ANN model, Model 1, was used to predict the loss and the output power profiles at any desired distance. Then, a second model, Model 2, was introduced to predict the triplets of coefficients $\alpha_0$, $\alpha_1$ and $\sigma$. These coefficients are the contributing terms to the equivalent channel loss in a span, expressed as [33]:
$$\alpha_j(z) = \alpha_{0,j} + \alpha_{1,j}\, e^{-\sigma_j z} \qquad (4)$$
where $j$ represents the channel (sub-band) index, $\alpha_{0,j}$ corresponds to the fiber loss without accounting for ISRS, $\alpha_{1,j}$ represents the loss variation induced by ISRS at the beginning of the span, and $\sigma_j$ indicates how quickly the ISRS effect vanishes with the optical power along the fiber span. Once $\alpha_{0,j}$, $\alpha_{1,j}$ and $\sigma_j$ have been assigned to each channel (sub-band), the nonlinear interference power can be fully calculated in closed form. Nevertheless, these coefficients, which can assume different values channel by channel (sub-band by sub-band), cannot be computed in closed form. Instead, they are determined by fitting the actual power profile with (5):
[...] (5)
such that a cost function looking at the errors along the fiber span is minimized [20]. Additionally, since the stronger nonlinear effects mostly occur in the first kilometers of the fiber span, where the power is higher, higher weights are assigned to the cost function in this section.
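To make the role of the three coefficients concrete, the loss model can be integrated along the span. The following is a sketch under stated assumptions and not necessarily the exact expression used as (5) in the paper: the per-channel loss is taken as $\alpha_j(z) = \alpha_{0,j} + \alpha_{1,j} e^{-\sigma_j z}$, and $\kappa$ is a convention factor introduced here for illustration ($\kappa = 1$ if the coefficients are power-loss coefficients in 1/km, $\kappa = 2$ if they are field-loss coefficients).

```latex
\frac{dP_j(z)}{dz} = -\kappa\,\alpha_j(z)\,P_j(z)
\;\Longrightarrow\;
P_j(z) = P_j(0)\,\exp\!\left(-\kappa \int_0^z \alpha_j(\zeta)\,d\zeta\right)
       = P_j(0)\,\exp\!\left(-\kappa\left[\alpha_{0,j}\,z
         + \frac{\alpha_{1,j}}{\sigma_j}\left(1 - e^{-\sigma_j z}\right)\right]\right)
```

For $\sigma_j z \gg 1$ the bracket tends to $\alpha_{0,j} z + \alpha_{1,j}/\sigma_j$, so under this model the ISRS contribution saturates to a fixed offset after the first part of the span, in line with the observation that ISRS acts mainly where the power is high.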
In extending these two approaches to handle multiple fiber types and channel power levels, it is fundamental to provide the ANN with useful information regarding the scenario for which it is making predictions. Fig. 1, paths (b) and (c), shows the ANN models considered in this paper. Compared to the ANNs in [19], together with the vector of spectral load $S = [S_1, S_2, \ldots, S_{N_{sb}}]^T$, with $N_{sb}$ being the number of subbands, and the position along the fiber $L_{fiber}$, the fiber attenuation coefficient $\alpha$, the Raman gain efficiency coefficient $C_r$ and the power per channel $P_{ch}$ are also given as features at the input of both ANN models. The vector of spectral load $S$ is in logarithmic units (dBm). The labels at the output of the ANNs are instead the same as those in [19]: the vector of loss profile $L = [L_1, L_2, \ldots, L_{N_{sb}}]^T$ in dB for Model 1, and the vector of coefficients $[\alpha_0, \alpha_1, \sigma] = [\alpha_{0,1}, \ldots, \alpha_{0,N_{sb}}, \alpha_{1,1}, \ldots, \alpha_{1,N_{sb}}, \sigma_1, \ldots, \sigma_{N_{sb}}]^T$ expressed in dB/km for Model 2. We can notice that, although we have added further features, the size of the ANNs remains limited, with 26 nodes at the input and 22 (66) nodes at the output for Model 1 (Model 2), confirming the advantage of operating on a subband basis.
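The input and output sizes quoted above follow directly from the feature list: 22 subband loads plus four scalars give 26 inputs. A small Python sketch of the feature assembly is given below. The ordering of the features, the function name and the absence of any normalization are assumptions made for illustration, since the text does not specify them.

```python
N_SUBBANDS = 22

def build_input_features(spectral_load_dbm, distance_km, alpha_db_km,
                         cr_per_w_km, p_ch_dbm):
    """Concatenate the spectral load vector with the four scalar features."""
    if len(spectral_load_dbm) != N_SUBBANDS:
        raise ValueError("expected one load value per subband")
    return list(spectral_load_dbm) + [distance_km, alpha_db_km,
                                      cr_per_w_km, p_ch_dbm]

# Illustrative values: all subbands at 10 dBm, prediction requested at 50 km.
features = build_input_features([10.0] * N_SUBBANDS, 50.0, 0.2, 0.4, 0.0)

n_outputs_model1 = N_SUBBANDS        # one loss value per subband
n_outputs_model2 = 3 * N_SUBBANDS    # alpha_0, alpha_1 and sigma per subband
```

Working on 220 channels instead of 22 subbands would multiply the spectral part of the input and the whole output by ten, which is the complexity saving the text refers to.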
To create our ANN models and to train, validate and test them, we rely on the TensorFlow and Keras libraries. For Model 1, the ANN architecture is composed of 2 hidden layers (HLs) and 1000 hidden nodes (HNs). The ANN weights are initialized with Glorot initialization with uniform distribution, the default option in Keras. The training is based on the backpropagation algorithm, using the Rectified Linear Unit (ReLU) activation function over 1000 epochs with learning rate $\lambda = 0.001$. The loss function used in the training is the mean square error (MSE) computed between the target and the predicted outputs of the ANN, i.e. the true power profile from the RS and the power profile predicted by the ANN, respectively. During the training phase, 10% of the training data-set is held out to validate the accuracy of predictions and to keep under control any over-fitting or under-fitting of the ANN. A similar ANN architecture is considered for Model 2: 2 HLs, 500 HNs, ReLU activation function, MSE loss function, learning rate $\lambda = 0.001$ and 1000 epochs, but here the MSE is computed between the fitted and the predicted coefficients. Also in this case, 10% of the training data-set is used for ANN validation.
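The building blocks named above (Glorot uniform initialization, ReLU hidden layers, MSE loss) can be written out in plain Python to show what they compute. This is a didactic sketch and not the TensorFlow/Keras code used by the authors. The small hidden size, the linear output layer, the zero-initialized biases and all function names are assumptions made to keep the example short and fast.

```python
import math
import random

def glorot_uniform(fan_in, fan_out, rng):
    """Weights drawn from U[-limit, limit] with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

def dense(x, weights, biases, relu):
    """Fully connected layer, optionally followed by ReLU."""
    out = []
    for j in range(len(biases)):
        s = biases[j] + sum(x[i] * weights[i][j] for i in range(len(x)))
        out.append(max(0.0, s) if relu else s)
    return out

def build_mlp(n_in, hidden_sizes, n_out, rng):
    """Return a list of (weights, biases) pairs, one per layer."""
    sizes = [n_in] + list(hidden_sizes) + [n_out]
    return [(glorot_uniform(a, b, rng), [0.0] * b)
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """ReLU on hidden layers, linear output layer (assumed)."""
    for k, (w, b) in enumerate(layers):
        x = dense(x, w, b, relu=(k < len(layers) - 1))
    return x

def mse(target, predicted):
    """Mean square error between two equally long vectors."""
    return sum((t - p) ** 2 for t, p in zip(target, predicted)) / len(target)

rng = random.Random(0)
model = build_mlp(26, [32, 32], 22, rng)   # 2 hidden layers; small size for the demo
prediction = forward(model, [0.0] * 26)    # untrained network, 22 outputs
```

Training would adjust the weights by backpropagation to minimize `mse` over the training set, with 10% of the samples held out for validation as described in the text.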
Once trained, the ANNs are used to make predictions on unseen data, in our case on the testing data-sets for different fiber types and different levels of channel power. The entire testing process is schematized in Fig. 2 for both ANN models. More precisely, the power per subband, given by the contribution of the channels turned on in each subband, is provided at the input of both ANNs. Since the output layer of Model 1 provides 22 points for the loss profile, linear interpolation is performed over the 220 channels. Then, given the input power profile along the channels, it is possible to compute the power profile at the desired distance.
The same applies to Model 2: the coefficients $\alpha_0$, $\alpha_1$ and $\sigma$, predicted with subband granularity only for the range of distances $L_{fiber} = [50\text{-}100]$ km (in 1 km steps), are linearly interpolated over the 220 channels and then inserted in (5) to compute the overall power profile evolution from the fiber start to any desired distance $L$.
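The post-processing step for Model 1 can be sketched as follows: the 22 predicted loss values are linearly interpolated onto the 220 channel frequencies and subtracted (in dB) from the input power. The frequency grids, the constant extension outside the outermost subband centres and the function names are assumptions made for illustration; the text only states that linear interpolation is used.

```python
def interpolate(x_known, y_known, x_new):
    """Piecewise-linear interpolation; values outside the known range are held constant."""
    out = []
    for x in x_new:
        if x <= x_known[0]:
            out.append(y_known[0])
        elif x >= x_known[-1]:
            out.append(y_known[-1])
        else:
            k = 1
            while x_known[k] < x:      # find the segment containing x
                k += 1
            x0, x1 = x_known[k - 1], x_known[k]
            t = (x - x0) / (x1 - x0)
            out.append(y_known[k - 1] + t * (y_known[k] - y_known[k - 1]))
    return out

subband_freqs = [185.25 + 0.5 * s for s in range(22)]      # assumed subband centres, THz
channel_freqs = [185.025 + 0.05 * c for c in range(220)]   # assumed channel centres, THz

def output_power_dbm(p_in_dbm, subband_loss_db):
    """Per-channel output power: input power minus interpolated loss, both in dB units."""
    loss = interpolate(subband_freqs, subband_loss_db, channel_freqs)
    return [p - l for p, l in zip(p_in_dbm, loss)]

# Illustrative tilted loss profile applied to a flat 0 dBm input.
p_out = output_power_dbm([0.0] * 220, [20.0 + 0.1 * s for s in range(22)])
```

The same interpolation can be applied to each of the three coefficient vectors predicted by Model 2 before they are inserted in the closed-form expression.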
To assess the prediction performance of the ANNs, we compare each predicted power profile $P_{pred}$ with the corresponding target power profile $P_{targ}$ in terms of the maximum absolute error ($E_{MAX}$), defined as follows:
$$E_{MAX} = \max_{i = 1, \ldots, N_f} \left| P_{pred,i} - P_{targ,i} \right| \qquad (6)$$
where $P_{pred}$, $P_{targ}$ and $E_{MAX}$ are in logarithmic units, and $N_f$ is the number of points in frequency, which corresponds to the number of channels. In the case of ANN Model 1, $P_{pred}$ and $P_{targ}$ are the power profile directly predicted by the ANN and the power profile generated by the numerical RS, respectively. In the case of ANN Model 2, instead, $P_{pred}$ and $P_{targ}$ are the power profiles obtained by inserting the predicted and the fitted coefficients $\alpha_0$, $\alpha_1$ and $\sigma$, respectively, in (5).
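The error metric is the largest per-channel deviation between two profiles expressed in logarithmic units. A minimal Python sketch follows; the function name and the sample values are illustrative only.

```python
def max_abs_error_db(p_pred_dbm, p_targ_dbm):
    """Maximum absolute difference over frequency between two power profiles in dBm."""
    if len(p_pred_dbm) != len(p_targ_dbm):
        raise ValueError("profiles must have the same number of frequency points")
    return max(abs(p - t) for p, t in zip(p_pred_dbm, p_targ_dbm))

# Made-up three-point profiles, only to show the call.
e_max = max_abs_error_db([-20.0, -20.5, -21.2], [-20.1, -20.5, -20.9])
```

Because the metric takes the maximum rather than the mean, a single badly predicted channel determines the score of the whole profile.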

C. Testing Results of ANN Model 1
Fig. 3 shows the violin plots of the $E_{MAX}$ of the power profiles predicted at every kilometer using ANN Model 1, plotted with respect to the different power per channel levels for the five considered types of fiber. For each violin plot we also highlight the mean value and the standard deviation in black, with their values reported in the legend. Additionally, the respective maximum values are plotted as single cross markers. From Fig. 3, we can observe that up to $P_{ch} = -2.5$ dBm the $E_{MAX}$ is always below 0.1 dB, with an average around 0.025 dB and a small standard deviation for all fiber types. A similar behavior is observed when $P_{ch} = 0$ dBm in the case of SSMF, TeraWave SCUBA 125 and TeraWave SCUBA 150, while for Truewave-RS and LEAF this is no longer true. Indeed, as can be seen in Fig. 3(b) and (c), for these last two types of fiber, prediction errors can be larger than 0.1 dB at 0 dBm of power per channel, with values up to 0.36 dB and 0.31 dB, respectively. Consequently, their mean and standard deviation values also increase, corresponding to 0.1 ± 0.075 dB and 0.081 ± 0.045 dB for Truewave-RS and LEAF, respectively.
To explain this behavior, we should recall that, due to a smaller effective area, Truewave-RS and LEAF have a higher Raman gain efficiency coefficient $C_r$ than the other three types of fiber, as reported in Table I. As a consequence, the ISRS effect is stronger, which produces a stronger tilt in the power profile along the fiber span.
With $P_{ch} = 2.5$ dBm, the highest value of power considered, both the ISRS effect and the tilt of the power profiles become particularly significant also for SSMF, TeraWave SCUBA 125 and TeraWave SCUBA 150. Indeed, the more power is transferred from higher to lower frequency carriers, the more the power profiles at a given distance can differ, also depending on the input spectral load. This also affects the prediction performance of the ANN, as it is required to handle a larger space of input-output combinations. This is clear in Fig. 3, where for all fiber types $E_{MAX}$ can assume values larger than 0.1 dB, with larger values obtained in the case of Truewave-RS and LEAF. However, the mean values stay within 0.1 dB, except for TeraWave SCUBA 125, which on average is characterized by the worst ANN prediction performance for $P_{ch} = 2.5$ dBm.
Interestingly, we can notice that for Truewave-RS and LEAF, although the maximum values of $E_{MAX}$ are the largest over all power per channel values, the mean value is slightly lower when $P_{ch} = 2.5$ dBm than when $P_{ch} = 0$ dBm.
Nevertheless, the ANN Model 1 analyzed here makes highly accurate predictions, performing even better than the ANN models considered in [19] under the same scenario, i.e. SSMF and $P_{ch} = 0$ dBm. As a matter of fact, here with just a single ANN, trained to handle different scenarios, we are able to obtain predictions with $E_{MAX}$ well below 0.1 dB without slicing the span into sub-spans and without considering data-sets with sub-bands partially turned on. In fact, in [19] sub-bands were not only considered completely turned on or off (2 states), but could also assume intermediate states.
The ability of ANN Model 1 to make highly accurate predictions is also confirmed by the cumulative distribution functions (cdf) of $E_{MAX}$ shown in Fig. 3(f) for each fiber type, considering all the power per channel values together. There we can see that $E_{MAX} \leq 0.2$ dB for more than 96.6% of cases when Truewave-RS is considered, while this percentage is ∼100% for all other types of fibers. The best performance is observed for SSMF and TeraWave SCUBA 150, for which $E_{MAX}$ is always below 0.2 dB and assumes values ≤ 0.1 dB for more than 99% of cases.
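The percentages quoted from the cumulative distribution functions are simply the share of test cases whose $E_{MAX}$ does not exceed a threshold. A short Python sketch of that computation is shown below; the error values are made up for illustration and do not come from the paper's results.

```python
def fraction_within(errors_db, threshold_db):
    """Empirical CDF evaluated at threshold_db: share of errors <= threshold."""
    return sum(e <= threshold_db for e in errors_db) / len(errors_db)

sample_errors = [0.02, 0.05, 0.11, 0.19, 0.36]   # made-up values for illustration
share = fraction_within(sample_errors, 0.2)
```

Evaluating the same function over a sweep of thresholds gives the full cdf curve for a fiber type.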
Apart from the prediction accuracy of the trained ANN, it is fundamental to compare its computational effort with that of the numerical RS. For this purpose, we ran a speed test on the same server (Intel Xeon CPU E5-2690 v4 @ 2.60 GHz) under the same conditions. The RS requires ∼3 minutes to provide the complete evolution of the power profile for a single input partial load along the frequency channels (220 channels) and distance up to the span length (1 km spatial granularity is assumed). On the contrary, ANN Model 1 is capable of predicting the overall power profile evolution in only 0.24 s, proving the potential of the ANN for use in real-time in-field applications.

D. Testing Results of ANN Model 2
Once trained, ANN Model 2 is used to make predictions for fiber spans of length $L_{fiber} = [50\text{-}100]$ km with a resolution of 1 km. This means that the coefficients predicted for a desired length $L_{fiber}$ are inserted in (5) to compute the complete power profile evolution from 1 km up to $L_{fiber}$. However, unlike ANN Model 1, the prediction performance of ANN Model 2 is analyzed over two ranges of distances, since the NLI generation is different along the fiber span. In particular, the prediction errors are evaluated from 1 km to $L_{fiber}$ and from 1 km to 30 km, a value close to the effective length of the considered fiber spans, where most of the NLI is generated.
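The choice of a 30 km window can be related to the standard effective length of a span, $L_{eff} = (1 - e^{-\alpha L})/\alpha$, with $\alpha$ the power attenuation in 1/km. The Python sketch below evaluates it for two attenuation values picked inside the range considered in the paper; the specific values and the function name are illustrative, and ISRS is ignored in this simple formula.

```python
import math

def effective_length_km(alpha_db_km, span_km):
    """Effective length (1 - exp(-alpha*L)) / alpha, with alpha converted from dB/km."""
    alpha = alpha_db_km * math.log(10) / 10   # power attenuation in 1/km
    return (1 - math.exp(-alpha * span_km)) / alpha

l_eff_020 = effective_length_km(0.20, 100.0)   # typical standard-fiber attenuation
l_eff_015 = effective_length_km(0.15, 100.0)   # typical ultra-low-loss attenuation
```

For a 100 km span these two illustrative cases give roughly 21.5 km and 28 km, so a 30 km window covers the region where most of the nonlinear interference accumulates.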
Fig. 4(a)-(e) shows the resulting violin plots of $E_{MAX}$ for distances from 1 km to $L_{fiber}$ for the five different types of fibers and different power per channel values. Observations similar to those made for ANN Model 1 apply, but slightly larger errors are observed here. Also in this case, the prediction errors are larger for higher values of power per channel, especially for Truewave-RS and LEAF. Indeed, for $P_{ch}$ = 2.5 dBm, the average $E_{MAX}$ is 0.11 dB for Truewave-RS and 0.069 dB for LEAF, and in a few cases $E_{MAX}$ exceeds 0.5 dB.
Slightly lower errors are obtained when $P_{ch}$ = 0 dBm, as the mean $E_{MAX}$ reduces to 0.082 dB for Truewave-RS and 0.063 dB for LEAF, and all values are within 0.42 dB. The TeraWave SCUBA 125 also shows larger errors when $P_{ch}$ = 2.5 dBm, with average and maximum $E_{MAX}$ values of 0.077 dB and 0.32 dB, respectively. On the other hand, highly accurate predictions are achieved for the other fiber types and power per channel values, as demonstrated by their average $E_{MAX}$, which is always below 0.05 dB.
As for ANN Model 1, to give an immediate idea of the ANN Model 2 prediction performance, we also evaluate the cdfs for the different types of fibers including all power per channel values. The results, reported in Fig. 4(f), show that for all fibers, regardless of the value of power per channel, $E_{MAX}$ ≤ 0.2 dB for more than 97% of cases, confirming again the capability of the ANN model to provide accurate predictions.
However, if we analyze the prediction performance in the first 30 km of the fiber span $L_{fiber}$, represented by the violin plots in Fig. 5, we can observe that the trained ANN model provides highly accurate predictions also for Truewave-RS and LEAF with higher $P_{ch}$, as all $E_{MAX}$ values are below 0.2 dB and their average never exceeds 0.035 dB. Interestingly, results obtained for SSMF and $P_{ch}$ = 0 dBm are comparable to those presented in [19], which correspond to the same scenario. This means that, although the ANN Model 2 has been trained on a more extended data-set, with different fiber types and power levels, its prediction accuracy was not affected. On the contrary, the ANN has been enhanced to handle a more general scenario without losing accuracy.
In addition, the overall prediction performance of ANN Model 2 in the first 30 km of the fiber span is summarized by the cdfs in Fig. 5(f) for the different fiber types, analyzing together the different values of uniform channel power, where we can see that $E_{MAX}$ ≤ 0.1 dB for ∼100% of cases.
As for the ANN Model 1, also for the ANN Model 2 we observe a substantial reduction of the computational time required to determine the spatial and spectral evolution of the power profiles. In particular, when using the trained ANN, the determination of $\alpha_0$, $\alpha_1$ and $\sigma$ for a single power profile is obtained in only 0.24 s. This is a very low value compared to the 3.48 s taken by the approach described in [20], where 3 s were required by the numerical RS and 0.48 s by the fitting.

III. ANN FOR IN-FIELD SYSTEM PERFORMANCE EVALUATION: MODEL 3
Whereas in the previous section we extensively analyzed ANN Model 1 and Model 2 and their potential in support of linear and nonlinear modeling for QoT estimation, in this section we present the ANN Model 3 for an in-field system performance evaluation. For this purpose, the data-sets used for ANN training, validation and testing are generated experimentally.

A. Experimental Setup and Data-Sets Generation
The experimental setup considered for the training and testing data-sets generation is shown in Fig. 6, path (a). As for the synthetic data, also here the system considers a WDM comb over the C+L-band, which consists of $N_{ch}$ = 95 channels with 75 GHz frequency spacing and 64 GBd symbol rate. Two different wide-spectrum noise sources are exploited to mimic 45 channels in the L-band and 50 channels in the C-band. In particular, two programmable optical filters (Finisar Waveshaper) shape the noise signals to generate an equivalent flat WDM spectrum at the output of the following erbium-doped fiber amplifiers (EDFA). Afterwards, C- and L-band channels are combined in a single WDM comb whose power is controlled by a variable optical attenuator (VOA) so that the power per channel assumes the following discrete values $P_{ch}$ = [−10, −7.5, −5, −2.5, 0] dBm, where the highest value is 0 dBm due to hardware limitations. An optical spectrum analyzer (OSA) measures the spectrum of the transmitted signal after it goes through a 10% splitter and the first port of a switch. The remaining 90% of the signal propagates over an ITU-T G.652 standard compliant Corning SMF-28 fiber span and the switch before reaching the OSA again. Finally, the measured spectra are re-scaled to recover the losses introduced by the splitter and the switch. For our analysis, three fiber spools with different nominal span lengths $L_{span}$ = [20, 40, 60] km and approximately the same loss of 0.19 dB/km are considered.
Similarly to Section II, also here training and testing data-sets are generated on a subband basis and on a channel basis, respectively, where each subband is formed by five adjacent channels, for a total of 19 subbands. Channels and subbands are assumed to have 50% probability of being on or off to enable partial spectral loads. In the subband case, the on/off state is selected per subband: when a subband is on, all channels of that subband are on and, similarly, when a subband is off, all channels of that subband are off.
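The on/off selection described above can be sketched as follows. This is only an illustration under stated assumptions: loads are represented as 0/1 masks over the 95 channels, each state is drawn independently with probability 0.5, and the function names are hypothetical. It does not reproduce the programmable-filter configuration of the actual setup.

```python
import numpy as np

N_SB = 19                 # number of subbands (from the text)
CH_PER_SB = 5             # adjacent channels per subband (from the text)
N_CH = N_SB * CH_PER_SB   # 95 channels

def subband_loads(n_loads, rng):
    """Training-style loads: on/off drawn per subband, then expanded so
    that all five channels of a subband share the same state."""
    sb_state = rng.integers(0, 2, size=(n_loads, N_SB))  # 0 or 1, 50% each
    return np.repeat(sb_state, CH_PER_SB, axis=1)

def channel_loads(n_loads, rng):
    """Testing-style loads: on/off drawn independently per channel."""
    return rng.integers(0, 2, size=(n_loads, N_CH))

rng = np.random.default_rng(42)
train_masks = subband_loads(3000, rng)  # M_sb loads for one (L_span, P_ch) pair
test_masks = channel_loads(1500, rng)   # M_ch loads for one (L_span, P_ch) pair
print(train_masks.shape, test_masks.shape)
```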
Specifically, for the data-sets generation, for each value of span length $L_{span}$ and power per channel $P_{ch}$, we extract $M_{sb}$ = 3000 and $M_{ch}$ = 1500 different partial spectral loads, for a total of 45000 and 22500 samples for the training and testing data-sets, respectively. Then, once we have the power spectral density (PSD) of the received spectrum $\mathrm{PSD}^{RX}_{dBm}$ at the fiber output, we compute the corresponding loss profile $L$, in logarithmic units, as:

$$L = \mathrm{PSD}^{TX}_{dBm} - \mathrm{PSD}^{RX}_{dBm} \qquad (7)$$

where $\mathrm{PSD}^{TX}_{dBm}$ is the power spectral density of the input spectral load.
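As a quick check of the data-set sizes and of (7), the sketch below counts the samples over the three span lengths and five power levels given in the text and computes a loss profile from two PSD vectors. The numeric PSD values are hypothetical, and how switched-off channels are represented is not addressed here.

```python
import numpy as np

def loss_profile_db(psd_tx_dbm, psd_rx_dbm):
    """Loss profile in dB as in (7): transmitted minus received PSD."""
    return np.asarray(psd_tx_dbm, dtype=float) - np.asarray(psd_rx_dbm, dtype=float)

# Scenario grid taken from the text.
span_lengths_km = [20, 40, 60]
p_ch_dbm = [-10.0, -7.5, -5.0, -2.5, 0.0]
n_scenarios = len(span_lengths_km) * len(p_ch_dbm)  # 3 x 5 = 15

n_train = n_scenarios * 3000  # M_sb loads per scenario
n_test = n_scenarios * 1500   # M_ch loads per scenario
print(n_train, n_test)

# Hypothetical PSD values (dBm) for three channels.
loss = loss_profile_db([0.0, -1.0, -0.5], [-4.0, -5.5, -4.2])
print(loss)
```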
The data have been collected during different measurement sessions; thus, although the measurement process is repeatable, the experimental setup conditions might slightly change from one session to another.

B. Machine Learning Framework and Artificial Neural Network Model
The considered ANN model is reported in Fig. 6, path (b). It is worth noting that, as in the case of synthetic data, here as well we operate on subbands, which advantageously reduces the complexity of the ANN without significantly affecting the prediction accuracy. Therefore, when we test the trained ANN, the spectral load profiles of the testing data-set at the input of the neural network are also considered with subband granularity. To determine the power profiles at the end of the fiber span, for simplicity referred to as predicted power profiles, a linear interpolation from subbands to channels is performed over the predicted loss profiles, to which we add the original input spectral load at channel granularity (Fig. 6, path (c)).
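The subband-to-channel step can be sketched as below. The sketch makes several assumptions that the text does not state: each predicted subband loss is placed at the center channel of its subband, channels outside the first and last centers take the nearest subband value (the default behavior of `np.interp`), and the loss is combined with the input load using the sign convention of (7), i.e., the received profile equals the input load minus the loss. The function name is hypothetical.

```python
import numpy as np

def predicted_power_profile(loss_sb_db, load_ch_dbm, ch_per_sb=5):
    """Interpolate a subband loss profile to channel granularity and
    combine it with the channel-level input load (all in log units)."""
    loss_sb_db = np.asarray(loss_sb_db, dtype=float)
    load_ch_dbm = np.asarray(load_ch_dbm, dtype=float)
    n_sb = loss_sb_db.size
    ch_idx = np.arange(n_sb * ch_per_sb)
    # Assumption: a subband value refers to the center channel of the subband.
    sb_centers = np.arange(n_sb) * ch_per_sb + (ch_per_sb - 1) / 2
    # np.interp is linear between centers and constant beyond the end points.
    loss_ch_db = np.interp(ch_idx, sb_centers, loss_sb_db)
    # With L = PSD_TX - PSD_RX as in (7), the received profile is TX minus L.
    return load_ch_dbm - loss_ch_db

# Toy example: 3 subbands of 5 channels, flat 0 dBm input load.
profile = predicted_power_profile([1.0, 2.0, 3.0], np.zeros(15))
print(profile)
```

For the setup described in the text, `loss_sb_db` would hold 19 values and `load_ch_dbm` 95 values.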

C. Testing Results
As for ANN Model 1 and Model 2, the prediction accuracy of the ANN Model 3 is assessed by computing the $E_{MAX}$ between the PSDs of the power profile measured at the end of the span and the profile predicted by the ANN. The results for the three different fiber spools at different power per channel values are reported in Fig. 7 in the form of violin plots. The average and the maximum values of $E_{MAX}$ are plotted as single square and cross markers, respectively. Similarly to the results obtained for the synthetic data, in general the errors are smaller for lower values of $P_{ch}$, with an increasing trend for the mean $E_{MAX}$. Specifically, it increases from ∼0.08 dB to 0.23 dB for $L_{span}$ = 20 km, and from 0.11 dB to 0.28 dB for $L_{span}$ = 40 km. For $L_{span}$ = 60 km, as shown in Fig. 7(c), the average $E_{MAX}$ ranges from 0.13 dB up to 0.37 dB, but the increasing trend from lower to higher power per channel is no longer respected. This might be explained by recalling that the experimental data have been collected during different sessions, so some samples of the training and testing data-sets may be affected by uncertainties in the losses. Nevertheless, with values of $E_{MAX}$ all lower than 0.5 dB, the trained ANN Model 3 provides predictions with high accuracy, showing great potential for in-field system performance evaluation in an ultra-wideband transmission scenario.

IV. CONCLUSION
In this paper, three different ANN models have been presented to operate in a UWB system scenario affected by inter-channel stimulated Raman scattering (ISRS).
Two ANN models, Model 1 and Model 2, have been proposed to predict the evolution of power profiles along the fiber span in support of linear and nonlinear modeling of propagation effects, respectively. In particular, this analysis was performed considering synthetic data over a wide range of system scenarios, including different types of fibers and a set of discrete power per channel values. These assumptions significantly extend the study presented in [19], where only a single type of fiber and a single power per channel value were considered. From the obtained results, with $E_{MAX}$ ≤ 0.2 dB for more than 96.6% of cases using Model 1, and $E_{MAX}$ ≤ 0.2 dB for 96.9% of cases using Model 2, both ANNs proved to be highly accurate in predicting power profiles at any desired distance for real fibers. Notably, in the case of ANN Model 1, $E_{MAX}$ is always below 0.1 dB for $P_{ch}$ ≤ −2.5 dBm regardless of the type of tested fiber, while in the case of Model 2 $E_{MAX}$ < 0.1 dB for all fibers and power per channel values when the prediction errors are evaluated in the first 30 km of the fiber span, where most of the NLI generation takes place, as the power is higher.
The third ANN model (Model 3) has been introduced as a proof of principle to determine the power profiles at the end of the fiber span for a fast in-field system performance evaluation based on actual measurements. The study was carried out over experimental data generated considering three fiber spools of SMF-28 with different span lengths and for a set of discrete values of power per channel. Assuming a measurement uncertainty of ∼0.1 dB, the results showed accurate power profile predictions for all considered scenarios, with $E_{MAX}$ always below 0.5 dB.
The obtained results demonstrate the feasibility of using machine learning and artificial neural networks to obtain fast and accurate power profile prediction both in support of system modeling for the evaluation of QoT and in in-field applications.
In addition, the ANN-based solution becomes even more beneficial with respect to standard approaches when more than one power profile needs to be evaluated at a time. As a matter of fact, the computational time for the numerical solver and for the fitting process increases linearly with the number of profiles to predict, as they are invoked iteratively on each profile. Instead, the ANN is capable of handling large sets of data, since it operates on matrices, providing the prediction of power profiles and of the coefficients $\alpha_0$, $\alpha_1$ and $\sigma$ in a few seconds for thousands of different spectral loads.

Fig. 1. Paths: (a) Simulation setup considered for data-sets generation, (b) ANN Model 1 used for power profiles prediction and (c) ANN Model 2 used for prediction of the parameters $\alpha_0$, $\alpha_1$ and $\sigma$ used in the closed-form model (CFM) formula to evaluate the power profile evolution. $M_{sb}$ represents the number of different spectral loads extracted for the sub-band granularity data-set for each discrete value of sub-band power $P_{sb}$, while $M_{ch}$ corresponds to the number of different spectral loads extracted for the channel granularity data-set for each discrete value of channel power $P_{ch}$.

Fig. 2. Testing process from subbands to channels for (a) ANN Model 1 and (b) ANN Model 2. For Model 1, the loss profiles L, the input power loads S and the output power profiles P are in logarithmic units.

Fig. 3. Violin plots versus power per channel of the maximum absolute error ($E_{MAX}$) computed between the power profiles predicted using Model 1 and the actual power profiles at every kilometer for: (a) SSMF, (b) Truewave-RS, (c) LEAF, (d) TeraWave SCUBA 125 and (e) TeraWave SCUBA 150, at different spectral uniform launch power profiles. For each violin plot the mean value and the standard deviation are also reported, together with the maximum value, plotted with a cross marker. (f) Cumulative distribution functions of $E_{MAX}$ for the different fiber types including all power per channel values.

Fig. 4. Violin plots versus power per channel of the maximum absolute error ($E_{MAX}$) computed between the power profiles predicted using Model 2 and the actual power profiles for distances from 1 km to $L_{fiber}$ = [50-100] km in the case of: (a) SSMF, (b) Truewave-RS, (c) LEAF, (d) TeraWave SCUBA 125 and (e) TeraWave SCUBA 150, at different spectral uniform launch power profiles. For each violin plot the mean value and the standard deviation are also reported, together with the maximum value, plotted with a cross marker. (f) Cumulative distribution functions of $E_{MAX}$ for the different fiber types including all power per channel values.

Fig. 6. Paths: (a) Experimental setup used to generate training and testing data-sets for different span length ($L_{span}$) nominal values. EDFA: erbium-doped fiber amplifier, WXC: wavelength cross-connect, VOA: variable optical attenuator, OSA: optical spectrum analyzer; (b) ANN Model 3 considered for the ML framework; (c) ANN testing process over channels.

Fig. 7. Violin plots of the maximum absolute error ($E_{MAX}$) computed between predicted and actual power profiles of the testing data-set for different power per channel values and for different span lengths: (a) 20 km, (b) 40 km and (c) 60 km. For each violin plot the mean value and the standard deviation are also reported, together with the maximum value, plotted with a cross marker.
ANN Model 3, shown in Fig. 6, path (b), is a feed-forward neural network based on the TensorFlow and Keras libraries and characterized by 2 HLs and 500 HNs. Also for Model 3, the weights of the ANN nodes are initialized using Glorot initialization with uniform distribution. The features at the input are the spectral load information $S = [S_1, S_2, \ldots, S_{N_{sb}}]$, the length of the span $L_{span}$ and the power per channel $P_{ch}$. The loss profile at the fiber output $L = [L_1, L_2, \ldots, L_{N_{sb}}]$ corresponds to the labels at the output of the ANN. The ANN training is based on the back-propagation algorithm with ReLU activation function, MSE loss function and learning rate λ = 0.001, and it is performed over 1000 epochs considering 90% of the training data-set. The remaining 10% is held out for ANN validation.
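To make the input and output dimensions concrete, the following NumPy sketch builds a network with the stated shape and runs a batched forward pass. It is not the authors' TensorFlow/Keras implementation. It assumes 500 neurons in each of the 2 hidden layers (the text does not say whether 500 is per layer or in total), zero-initialized biases and a linear output layer, and it omits the back-propagation training loop entirely. All function names are hypothetical.

```python
import numpy as np

N_SB = 19         # subbands: length of the load S and of the loss profile L
N_IN = N_SB + 2   # input features: S plus L_span and P_ch
N_HID = 500       # assumption: 500 neurons in EACH of the 2 hidden layers

def glorot_uniform(fan_in, fan_out, rng):
    """Glorot (Xavier) uniform initialization of a weight matrix."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def init_params(rng):
    """Weights and (assumed zero) biases for the 21-500-500-19 network."""
    sizes = [N_IN, N_HID, N_HID, N_SB]
    return [(glorot_uniform(a, b, rng), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Batched forward pass: ReLU hidden layers, linear output (assumed)."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # ReLU
    w, b = params[-1]
    return h @ w + b

def mse(pred, target):
    """Mean squared error, the loss function named in the text."""
    return float(np.mean((pred - target) ** 2))

def split_train_val(n_samples, rng, train_frac=0.9):
    """Random 90%/10% split of sample indices for training/validation."""
    idx = rng.permutation(n_samples)
    n_train = int(round(train_frac * n_samples))
    return idx[:n_train], idx[n_train:]

rng = np.random.default_rng(0)
params = init_params(rng)
batch = rng.random((1000, N_IN))    # 1000 hypothetical input rows at once
pred_loss = forward(params, batch)  # one matrix pass, shape (1000, 19)
train_idx, val_idx = split_train_val(45000, rng)
print(pred_loss.shape, len(train_idx), len(val_idx))
```

Because the forward pass is a sequence of matrix products, a whole batch of spectral loads is handled in a single call rather than one profile at a time.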

TABLE I. PARAMETERS FOR DIFFERENT TYPES OF FIBER