Model-Aware XGBoost Method Towards Optimum Performance of Flexible Distributed Raman Amplifier

Toward the next-generation ultra-long-haul optical network, an extreme gradient boosting (XGBoost)-aided machine learning (ML) model is proposed to maximize the flexibility and uniformity of distributed Raman amplifier (DRA) performance. To achieve an accurate prediction of the desired signal gain spectrum and bit error rate (BER), a decision-tree-based system is employed against the inconsistent dimensionality between pump frequency and power. The impact of various model evaluation techniques, namely the mean squared error (MSE), the coefficient of determination (R²), the RMSE-to-standard-deviation ratio (RSR) and the Nash-Sutcliffe efficiency (NSE), is discussed in detail. It is shown that the proposed method can diagnose a fault within 2.3 ms with an accuracy of 99.6% and also has the highest estimation quality and efficacy in comparison with other ML-based tree models. The reported work demonstrates a successful implementation of the XGBoost model to estimate the desired gain profile and BER of a DRA in the low-loss optical wavelength region (1260-1650 nm).


I. INTRODUCTION
DISTRIBUTED Raman amplifiers (DRAs) have, over the past decade, been identified as an optimistic solution [1] against the erbium-doped fiber amplifier (EDFA) [2] in terms of noise figure and flattened gain profiles. The DRA is advocated as a promising solution for augmenting transmission capacity beyond the S+C+L windows [3], and it also supports amplification in the low-loss optical wavelength region (1260-1650 nm) of WDM networks [4], [5]. However, broadband amplification in Raman amplifiers employing multiple pumps at distinct frequencies gives rise to the challenge of jointly optimizing the dimensionality between pump frequency and power [6].
Recently, machine learning (ML) has become renowned as a new direction in research and innovation for addressing many emerging challenges of fiber Raman amplification. ML-based advanced genetic algorithms [7], [8], proposed in the heuristic optimization category, are prone to finding local optima in Raman amplification, as are various state-of-the-art approaches based on heuristics [9], [10]. Furthermore, ML methods have been used to optimize forward and inverse system models that learn the required combination of pump power and gain fraction [11], [12]; in particular, deep learning methods train models to supply the appropriate pump power and frequency for a given target gain profile [12]. Deep-learning-based neural networks (NNs) thereby impart excellent performance in optically amplified networks when huge datasets are available. A flexible method has been proposed [2], [13] for the optimization of Raman amplification in forward and backward propagation. Moreover, NNs perform tremendously on text, image, audio and language datasets, but are outperformed on tabular data and regression-type problems with smaller numbers of samples [14]. Gradient-boosted tree models such as XGBoost [15], together with random forest (RF), support vector machine (SVM) and adaptive boosting (AdaBoost), show outstanding performance on such tabular data [16], [17], [18]. Among the decision-tree models applied in practice, the XGBoost algorithm draws widespread research attention in the industrial machinery, power system and industrial infrastructure domains, owing to its better prediction quality, computing speed and strong portability, and, more moderately, in optical networks [19]. Moreover, the XGBoost model has also been evolved for the optimal design of multimode fiber and for the diagnosis of cause-aware network failures with better estimation and accuracy [17], [19].
In the devised unrepeatered long-haul optical network, it is a challenge to jointly tune the frequencies and powers of multiple high-power pump lasers to yield a uniform, flattened DRA gain over ultra-short band spacing. This issue can be resolved using the supervised, ML-based decision-tree XGBoost algorithm.
In this work, an accurate XGBoost method is proposed to optimize the performance of a flexible DRA with respect to the spectral variations of the signal gain and the bit error rate (BER) profile. The algorithm is implemented using ML-based regression techniques. The DRA benefits from ML through prediction over training and testing datasets (discrete-variable catalogues) collected from forward, backward and bidirectional pumping propagation simulation modules. The XGBoost model is uniquely paired with an experimentally trained dataset, and its prediction capacity is evaluated for the signal gain and the BER at the receiver side. The model evaluation statistics MSE, R², RSR and NSE are investigated by assessing each input parameter under the influence of the training and testing datasets. The proposed model is compared with other ML methods, namely RF, SVM and AdaBoost, to justify the predictive quality and capacity of the XGBoost algorithm using statistical performance measures.

II. METHODS AND DESCRIPTIONS
Supervised machine learning (SML) learns a desired output by mapping an input function. Labelled datasets are used to train on the categorized data and to predict outcomes accurately; the input is labelled data, a sequence of historical instances consisting of both input features and the corresponding target estimations. SML divides problems into classification and regression types according to whether the output is discrete or continuous. The prediction of the desired DRA dataset can be treated as a regression-type problem.

A. XGBoost Algorithm
XGBoost is an ensemble ML method based on the gradient-boosted decision tree principle, with built-in features that make training faster on large datasets. It couples the gradient boosting framework with decision trees to resolve classification- and regression-type modelling issues. In the XGBoost algorithm, a weak base learner (decision tree) is added to the previous learners to correct the errors they produced in prediction. This additive modelling iteration underpins the boosting approach, turning a basic weak learner into a strong learner, as shown in Fig. 1. The XGBoost model was proposed by Chen and Guestrin [15] using the gradient boosting decision tree framework for structured, tabular datasets in classification and regression prediction problems. XGBoost enhances accuracy through a differentiable loss function (such as the mean squared error) and gradient descent optimization based on the first- and second-order terms of the Taylor series expansion; the complexity of the expanded objective is restricted by adding a regularization term. The regularization term controls over-fitting and helps to smooth the final leaf weights w, and over-fitting is further restricted through row- and column-wise sampling. The algorithm supports parallel node splitting [17] and distributed (multi-threaded) computing at the same time, which makes it one of the fastest models in practice.
The XGBoost algorithm [15] operates on a dataset of n samples and m attributes, D = {(x_i, y_i)} (|D| = n, x_i ∈ R^m, y_i ∈ R). A tree ensemble of K additive functions predicts the output as

ŷ_i = Σ_{k=1}^{K} f_k(x_i),  f_k ∈ F,  (1)

where F is the space of regression trees and each function f_k in (1) corresponds to an independently built decision tree whose structure is denoted by z and whose leaf weights are denoted by w; the total number of leaves in a tree is S. Errors produced while constructing the decision trees must be minimized to fulfil the target of the XGBoost model, so at iteration r the objective is

L^(r) = Σ_{i=1}^{n} u(y_i, ŷ_i) + Σ_k ω(f_k),  (2)

where y_i and ŷ_i are the actual and predicted values, u is the differentiable target function that measures the error between them, r indexes the iterations used to reduce the errors, and ω is the complexity function added to the decision tree function f_k in the designed model:

ω(f_k) = γS + (1/2) λ ||w||^2,  (3)

where w is, as discussed earlier, the weight of the leaves (blade nodes) and γ is the least loss reduction required for a node split to exist. As the final leaf weights are smoothed in sequence, γ and λ (the regularization parameter) manage the complexity of the constructed decision tree. The target objective function is then resolved by a second-order Taylor series expansion and simplified as

L^(r) ≈ Σ_{i=1}^{n} [p_i f_r(x_i) + (1/2) q_i f_r^2(x_i)] + ω(f_r),  (4)

wherein the first- and second-order derivatives of the loss function prevail as the terms p_i and q_i, respectively [17], and the optimal weight of leaf j follows as

w_j* = − (Σ_{i∈I_j} p_i) / (Σ_{i∈I_j} q_i + λ),  (5)

where I_j is the set of samples assigned to leaf j.
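The additive boosting loop described above, in which each new tree fits the residual errors of the current ensemble, can be sketched from scratch on a toy 1-D regression task. The stump learner, learning rate and data below are illustrative assumptions, not the paper's DRA model:

```python
# Minimal gradient boosting for regression: each new stump fits the
# residuals (the negative gradient of the squared-error loss) of the
# ensemble built so far.
def fit_stump(x, r):
    """Best single-split regression stump on 1-D inputs x for targets r."""
    best = None
    for s in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((ri - (lm if xi <= s else rm)) ** 2 for xi, ri in zip(x, r))
        if best is None or err < best[0]:
            best = (err, s, lm, rm)
    _, s, lm, rm = best
    return lambda q, s=s, lm=lm, rm=rm: lm if q <= s else rm

def boost(x, y, n_rounds=50, lr=0.3):
    base = sum(y) / len(y)                      # start from the mean
    pred = [base] * len(x)
    stumps = []
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]   # negative gradient
        stump = fit_stump(x, resid)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda q: base + lr * sum(st(q) for st in stumps)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 0.9, 1.0, 3.0, 3.1, 2.9]              # a step-like target
model = boost(x, y)
```

After a few rounds the weak stumps combine into a strong piecewise predictor of the step in y, which is the essence of the additive modelling iteration.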

B. Random Forest
The RF algorithm can be used to resolve classification- and regression-type issues with its simple structure and low computational cost, and it can be applied to complex tasks with continuous and categorical datasets. As an ensemble tree model, it adopts the bagging principle and trains its basic learners in parallel, in contrast with boosting, which trains them sequentially [20]. The final prediction of an RF regression problem is obtained as the mean value of all the individual decision tree outputs. All decision trees are trained individually, and the MSE function is used for splitting leaves and nodes [17]. The number of decision trees used and their depth (or number of leaves) define the overall computational cost.
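A minimal sketch of the bagging principle, assuming bootstrap-sampled one-split trees (stumps) on toy 1-D data whose outputs are averaged; the forest size and data are illustrative, not the paper's configuration:

```python
import random

def fit_stump(x, y):
    """Best single-split regression stump on 1-D data (squared error)."""
    best = None
    for s in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= s]
        right = [yi for xi, yi in zip(x, y) if xi > s]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((yi - (lm if xi <= s else rm)) ** 2 for xi, yi in zip(x, y))
        if best is None or err < best[0]:
            best = (err, s, lm, rm)
    if best is None:                 # degenerate bootstrap sample: constant tree
        m = sum(y) / len(y)
        return lambda q: m
    _, s, lm, rm = best
    return lambda q, s=s, lm=lm, rm=rm: lm if q <= s else rm

def random_forest(x, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(x)) for _ in range(len(x))]  # bootstrap sample
        trees.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    # final prediction = mean of all individual trees (bagging)
    return lambda q: sum(t(q) for t in trees) / len(trees)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
forest = random_forest(x, y)
```

Because each tree sees a different bootstrap sample, averaging reduces the variance of the individual stumps.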

C. Support Vector Machine
The SVM method can be used for regression-type challenges through its classification characteristics. Support vectors are the data points, or coordinates, closest to the separating hyperplane (a line in two dimensions). Toward the optimal prediction, a hyperplane is first selected, and then the kernel technique is adopted to map the data from its original low-dimensional space into a higher-dimensional space. SVM copes well with randomness or nonlinearity in the dataset. The computational time and cost depend on the kernel type and its linear or nonlinear nature; as the hyperplane is tuned more finely toward the best prediction, the process becomes time consuming, with a time complexity between O(n^2) and O(n^3), where n is the number of data points [16].
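As a sketch of the regression use of SVM, a linear support vector regressor can be trained by sub-gradient descent on the epsilon-insensitive loss (flat inside the eps-tube, linear outside); the kernelized case applies the same loss after mapping inputs to a higher-dimensional space. All constants and data below are illustrative assumptions:

```python
def linear_svr(x, y, epochs=500, lr=0.005, C=10.0, eps=0.1):
    """Linear support vector regression via sub-gradient descent on the
    epsilon-insensitive loss plus an L2 regularizer on the slope."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        gw, gb = w, 0.0                      # gradient of (1/2)*w^2 regularizer
        for xi, yi in zip(x, y):
            err = (w * xi + b) - yi
            if err > eps:                    # above the tube: push down
                gw += C * xi
                gb += C
            elif err < -eps:                 # below the tube: push up
                gw -= C * xi
                gb -= C
        w -= lr * gw / len(x)
        b -= lr * gb / len(x)
    return lambda q: w * q + b

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.0, 2.1, 2.9, 4.0]                # roughly y = x with noise
model = linear_svr(x, y)
```

Points that end up inside the tube contribute no loss gradient, which is the SVR analogue of sparsity in the support vectors.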

D. Adaptive Boost Algorithm
AdaBoost is an ensemble machine learning technique in which a strong learner is built up from weak learning methods. Its base decision tree has only a one-level split, known as a stump. The initial model assigns equal weights to all data points; as the algorithm proceeds, the training weights are adjusted according to the dataset, and training continues until the error is minimized. During the training phase, AdaBoost develops one-split learners of low individual accuracy that improve upon their antecedents. To obtain better accuracy, AdaBoost requires fine tuning of its hyperparameters. As little prior processing of the dataset is required, AdaBoost is less susceptible to over-fitting loss [18].
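The stump-plus-reweighting scheme described above can be sketched for a toy binary task: each round, the weights of the samples the current stump misclassifies are boosted, and the next stump focuses on them. Data, labels and the round count are illustrative assumptions:

```python
import math

def fit_weighted_stump(x, y, w):
    """Best one-split decision stump (labels +/-1) under sample weights w."""
    best = None
    for s in set(x):
        for sign in (1, -1):
            pred = [sign if xi <= s else -sign for xi in x]
            err = sum(wi for wi, yi, pi in zip(w, y, pred) if yi != pi)
            if best is None or err < best[0]:
                best = (err, s, sign)
    err, s, sign = best
    return err, (lambda q, s=s, sign=sign: sign if q <= s else -sign)

def adaboost(x, y, rounds=10):
    n = len(x)
    w = [1.0 / n] * n                        # equal weights initially
    ensemble = []
    for _ in range(rounds):
        err, stump = fit_weighted_stump(x, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)   # learner weight
        ensemble.append((alpha, stump))
        # re-weight: boost the samples this stump got wrong
        w = [wi * math.exp(-alpha * yi * stump(xi))
             for wi, xi, yi in zip(w, x, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return lambda q: 1 if sum(a * st(q) for a, st in ensemble) >= 0 else -1

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1, 1, -1, -1, 1, 1]        # not separable by any single stump
clf = adaboost(x, y)
```

No single stump can fit this label pattern, but the weighted vote of a few stumps can, which is exactly the weak-to-strong build-up described in the text.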

III. SYSTEM SET-UP AND MODELLING
A single-pump forward-backward DRA module is considered, where the pumps propagate in both directions relative to the forward signals. The schematic diagram of applying the extreme gradient boosting (XGBoost) technique to the designed DRA communication system is shown in Fig. 2. The spectral variations in the signal gain, along with the BER outline, are observed in the 1260-1650 nm region of the DWDM network. A mono-pump DRA module is designed to justify the desired variation in the signal gain and BER.
The proposed framework incorporates forward signals, forward and backward pumps, Simulink-based programmed embedded block sets, DWDM links and the XGBoost-based learning model. In the designed framework, the selected pump parameters (pump reference wavelength and pump power), input signal power, channel wavelength, signal attenuation and pump position directly determine the net signal gain. As an important part of the design procedure, the mathematical model of the DRA is the foundation for realizing the optimization.
The Raman amplification behavior is governed by the coupled Raman differential equations, which capture the pump-pump, signal-signal and pump-signal interactions for the forward- and backward-propagating pump waves [8], [21]:

± dP^±(z, ν)/dz = ∫_{μ>ν} g_R(μ − ν) [P^+(z, μ) + P^−(z, μ)] [P^±(z, ν) + 2hν(1 + n_T)Δν] dμ
  − ∫_{μ<ν} (ν/μ) g_R(ν − μ) [P^+(z, μ) + P^−(z, μ)] [P^±(z, ν) + 2hμ(1 + n_T)Δμ] dμ
  − α(z, ν) P^±(z, ν) + ε(z, ν) P^∓(z, ν),  (6)

where the superscripts + and − represent the forward- and backward-propagating waves and z is the distance variable along the fiber in km. The optical frequencies ν and μ index the channels at which spontaneous Raman scattering, pump interactions and Rayleigh backscattering noise are considered individually; the amplified spontaneous noise (the 2hνΔν terms with thermal factor n_T) is included in the first two integrals along with the Raman amplification and depletion. P^+(z, μ) and P^−(z, μ) represent the optical power (pump or signal) at frequency μ in forward and backward propagation at distance z, and g_R(μ − ν) is the Raman gain coefficient from frequency ν to frequency μ. The third term, α(z, ν), is the attenuation coefficient, treated as the fiber loss at the respective distance within the single-mode fiber and validated on the basis of the pump position and the input signalling type (forward/backward), whereas the fourth term, with Rayleigh coefficient ε(z, ν), represents Rayleigh backscattering. The signal gain is calculated by integrating (6) from z = 0 to z = L; the output signal is received at the output of the DRA, and the net gain is calculated over the signal and pump frequencies. The signal gain for any channel of the multi-wavelength platform can then be written as

G = 10 log10 [P_s(L) / P_s(0)],  (7)

where L is the transmission length, P_s(0) the input signal power and P_s(L) the output signal power at the end of the fiber.
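A drastically simplified, single-pump and single-signal reduction of the coupled Raman equations (fiber loss, Raman power transfer with pump depletion, no noise or Rayleigh terms) can be integrated numerically to show how the net gain emerges. All coefficients below are assumed illustrative values, not the paper's simulation parameters:

```python
import math

def raman_net_gain_db(Pp0, Ps0, L, steps=4000, gR=0.4,
                      alpha_p=0.06, alpha_s=0.046, photon_ratio=1550 / 1480):
    """Euler-integrate the simplified coupled pump/signal equations
        dPp/dz = -alpha_p*Pp - photon_ratio*gR*Pp*Ps   (pump loss + depletion)
        dPs/dz = -alpha_s*Ps + gR*Pp*Ps                (signal loss + Raman gain)
    over a fiber of length L (km) and return the net signal gain
    10*log10(Ps(L)/Ps(0)) in dB.  gR is in 1/(W*km), losses in 1/km
    (~0.26 and ~0.20 dB/km); photon_ratio approximates nu_p/nu_s for a
    1480 nm pump and a 1550 nm signal."""
    Pp, Ps = Pp0, Ps0
    dz = L / steps
    for _ in range(steps):
        dPp = -alpha_p * Pp - photon_ratio * gR * Pp * Ps
        dPs = -alpha_s * Ps + gR * Pp * Ps
        Pp += dPp * dz
        Ps += dPs * dz
    return 10 * math.log10(Ps / Ps0)

gain_on = raman_net_gain_db(0.5, 1e-3, 40)   # 500 mW pump over 40 km
gain_off = raman_net_gain_db(0.0, 1e-3, 40)  # pump off: pure fiber loss
```

With the pump off the signal just sees the 0.2 dB/km fiber loss (about -8 dB over 40 km); with the pump on, the Raman term lifts the net gain by several dB, illustrating the on/off gain behavior of the full model.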

A. Proposed Transmission Set-Up Using XGBoost Method
Another characteristic of the DRA, the bit error rate, can be calculated from the received photocurrents after de-multiplexing of the signals, together with the quantitative values of all parameters. The BER is realized at the receiver as [22], [23], [24]

BER = (1/2) exp(−η N_p),  (8)

where the quantum efficiency η characterizes the type of photodiode, a photon falling on the photodetector is counted as bit 1, and the total number of photons per bit is represented by N_p. A differential phase shift keying (DPSK)/non-return-to-zero (NRZ) modulation format is employed at the transmitter, as shown in Fig. 2.
The relation B = 1/T relates the bit rate B to the bit duration T. Hence, the received optical power P_Ro will be

P_Ro = N_p hυ Δf,  (9)

where N_p is the total number of photons, Δf is the bandwidth and hυ is the energy of a photon.
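A small numeric sketch of the quantum-limited DPSK relation above: the photons per bit are recovered from the received power, photon energy and bit rate, then mapped to a BER. The received power value is an assumed illustration, not a measured result from this work:

```python
import math

H_PLANCK = 6.626e-34               # Planck constant, J*s
C_LIGHT = 3e8                      # speed of light, m/s

def dpsk_quantum_limit_ber(received_power_w, bit_rate, wavelength_m, eta=1.0):
    """Quantum-limited DPSK BER = 0.5*exp(-eta*Np), with the photons per
    bit Np recovered as received power / (photon energy * bit rate)."""
    photon_energy = H_PLANCK * C_LIGHT / wavelength_m   # hv, joules
    n_p = received_power_w / (photon_energy * bit_rate)
    return 0.5 * math.exp(-eta * n_p), n_p

# ~0.26 uW received at 100 Gbps on a 1550 nm channel (assumed values)
ber, n_p = dpsk_quantum_limit_ber(2.6e-7, 100e9, 1550e-9)
```

At roughly 20 photons per bit the quantum-limited DPSK BER already falls below 1e-9, which is why the received power budget is the decisive quantity at the receiver.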

B. Modelling
The proposed workflow (Fig. 3) carries out the following steps in sequence. Step I. Simulation output of the designed DRA framework: in the first step, a MATLAB Simulink-based DRA model is set up to produce the signal gain and bit error rate data of the multi-wavelength channels.
Step II. Data collection and preparation: in the second step, the desired dataset is collected in the form of sample indices and used to build the training and testing datasets; 85% of the dataset is used for training, 5% for validation and 10% for testing.
Step III. Arrangement of the training and testing datasets to initialize the XGBoost algorithm: in this step, the training dataset is used to train the XGBoost-based model. The hyperparameters are optimized over user-defined ranges via multiple runs on the training data, after which the performance of the ensemble training model is analyzed on the testing data.
Step IV. Evaluation measures and validation of the proposed model: in the final step, the testing dataset is adopted to validate the proposed model through the model evaluation statistics: error-index, standard-regression and dimensionless measures.

IV. DATASET COLLECTION
The dataset is assembled from the successional sequence of wavelengths across the wide O-E-S-C-L-U band, with 0.4 nm ultra-short channel spacing. A mono pump of 500 mW is tuned to 1480 nm throughout the DWDM channelization, as the low-loss wavelength envelopes lie within this region. The designed model is operated in the forward, backward and bidirectional pumping modules. To strengthen the network transmission capacity, gain and performance of the designed DRA, a 100 Gbps bit rate is implemented via the DPSK and NRZ modulation formats. Moreover, the variation in the gain profile of the DRA is measured over an SMF span of L1 = 100 km, with a 40 km Raman amplifier fiber length, and the OSNR and signal gain are recorded at 10 km, 20 km, 30 km and 40 km. Variation in the signal gain from 4 dB to 70 dB is observed over the entire optically amplified region, according to the pump position, Raman gain efficiency and signal attenuation coefficient.
The dataset is summarized on the basis of finely tuned hyperparameters. The hyperparameters of the gradient-boosted decision-tree ML models are tuned and initialized using the GridSearch cross-validation technique; for finding the best parameters, we have used the techniques listed in [14]. After cross-validation and hyperparameter tuning, the dataset is used for training. The permuted dataset is arranged into labelled training, validation and testing subsets as train : validation : test = 85 : 5 : 10. The hyperparameters are tuned according to the validation dataset, and the best configuration is then evaluated on the testing data. The dataset summary accounts for all input parameters that influence the gain and BER of the DRA.
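The 85 : 5 : 10 split and a grid-style hyperparameter search can be sketched as follows; `fit_and_score` is a hypothetical stand-in for training a model and returning its validation error, and the grid values are illustrative:

```python
import random

def split_dataset(samples, train=0.85, val=0.05, seed=7):
    """Shuffle and split into train/validation/test = 85 : 5 : 10."""
    s = list(samples)
    random.Random(seed).shuffle(s)
    n = len(s)
    n_train, n_val = round(n * train), round(n * val)
    return s[:n_train], s[n_train:n_train + n_val], s[n_train + n_val:]

data = list(range(200))
train_set, val_set, test_set = split_dataset(data)

def grid_search(grid, fit_and_score):
    """Exhaustive search: pick the hyperparameter pair with the lowest
    validation score returned by fit_and_score(lr, depth)."""
    best = None
    for lr in grid["learning_rate"]:
        for depth in grid["max_depth"]:
            score = fit_and_score(lr, depth)
            if best is None or score < best[0]:
                best = (score, {"learning_rate": lr, "max_depth": depth})
    return best[1]

grid = {"learning_rate": [0.1, 0.3, 0.6], "max_depth": [3, 5, 7]}
# toy scoring function standing in for model training + validation error
best_params = grid_search(grid, lambda lr, d: abs(lr - 0.3) + abs(d - 5))
```

The validation subset drives the hyperparameter choice, while the held-out test subset is touched only once, for the final performance numbers.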

V. MODEL EVALUATION STATISTICS
To introduce the techniques involved in evaluating the proposed model, a substantial review of the literature [25], [26] is consulted for the estimation, prediction and validation of the designed model. The three prime categories of model evaluation techniques are: error-index, standard-regression and dimensionless measures.

A. Error Indexed Measures
The variance is computed with respect to the dataset of interest. MSE, MAE and RSR are the error indices mainly used in evaluating the model.

1) Mean Squared Error (MSE) Estimation:
MSE measures how close the predicted values lie to the regression line of the actual values; deviations from the actual values are treated as errors, which are squared to avoid negative values and, like a variance, to give greater weight to large differences. Averaging the set of squared errors yields the mean squared error; the lower the MSE, the better the forecast. It can be expressed as

MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)^2,  (10)

where y_i and ŷ_i are the actual and predicted data for the gain and BER datasets, respectively.

2) Root Mean Square Error (RMSE) Measured Data Standard Deviation Ratio (RSR):
It is calculated as the ratio of the RMSE to the standard deviation of the measured data:

RSR = RMSE / STDEV_obs = sqrt( Σ_{i=1}^{n} (y_i − ŷ_i)^2 ) / sqrt( Σ_{i=1}^{n} (y_i − ȳ)^2 ),  (11)

The lower the value of RSR, the lower the RMSE and the better the performance of the simulated model. RSR incorporates a normalization factor, so the reported result can be applied to various constituents.

B. Standard Regression Measures
The coefficient of determination (R²) describes the degree of agreement between measured and simulated data: the proportion of the variance in the measured data explained by the model, that is

R² = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)^2 / Σ_{i=1}^{n} (y_i − ȳ)^2,  (12)

It ranges from 0 to 1, with higher values indicating a smaller error variance.

C. Dimensionless Measures
The Nash-Sutcliffe efficiency (NSE) and the index of agreement (d) are two categories of dimensionless measures. NSE is recommended because it is a normalized statistic and provides considerable information on the reported dataset. It is expressed through the ratio of the residual variance to the variance of the measured data, so that

NSE = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)^2 / Σ_{i=1}^{n} (y_i − ȳ)^2,  (13)

In (10)-(13), n is the total number of data points, y_i is the true (actual) signal gain or BER, ŷ_i is the corresponding predicted value, and ȳ is the mean of the actual data.
A coefficient of determination R² approaching 1 states that the proposed model best fits the true (actual) data; for 0.8 < R² ≤ 1, the proposed model is deemed robust. NSE is likewise a normalized statistic that regulates the level of residual variance in the measured data. NSE ranges from −∞ to 1, with exact matching observed as NSE reaches 1; values within 0.6-0.7 are also effective and represent a well-built, highly correlated model [25]. The RSR is calculated by dividing the root mean square error by the standard deviation of the observed data, and it ranges from the optimal value of 0 upward. The commonly used RSR performance classes are: RSR > 0.700 (unsatisfactory), 0.600 ≤ RSR ≤ 0.700 (satisfactory), 0.500 ≤ RSR ≤ 0.600 (good) and 0.000 ≤ RSR ≤ 0.500 (very good) [25], [26].
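The evaluation statistics of this section can be computed together in a few lines; the gain values below are illustrative, and note that with the definitions above R² and NSE coincide, consistent with the paired values reported in Section VI:

```python
import math

def eval_stats(actual, predicted):
    """MSE, RMSE, RSR, R^2 and NSE for a regression fit."""
    n = len(actual)
    mean = sum(actual) / n
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual SS
    sst = sum((a - mean) ** 2 for a in actual)                  # total SS
    mse = sse / n
    rmse = math.sqrt(mse)
    stdev = math.sqrt(sst / n)          # std deviation of observed data
    return {"MSE": mse,
            "RMSE": rmse,
            "RSR": rmse / stdev,        # RMSE normalized by observed stdev
            "R2": 1 - sse / sst,
            "NSE": 1 - sse / sst}       # identical formula to R2 here

actual = [4.0, 10.0, 25.0, 40.0, 70.0]       # illustrative gain values, dB
predicted = [4.5, 9.5, 26.0, 39.0, 69.0]
stats = eval_stats(actual, predicted)
```

A near-perfect fit drives RSR toward 0 and both R² and NSE toward 1, matching the interpretation thresholds quoted from [25], [26].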

VI. RESULTS & DISCUSSIONS
The predictor variables form the input set (x), given by [B, P, T, d, λ] as shown in Table I, while the response variable (y) is the signal gain or BER of the DRA. Each stage of the training uses the required sizes of the training and testing datasets: 85% of the total dataset (100 sample indices) is used to generate the model, while the remaining 15% (37 sample indices) is employed to test the developed model. A trial-and-error procedure is adopted to tune the parameters of the XGBoost model, and the ascertained optimal hyperparameter values are applied to estimate the signal gain and BER of the DRA precisely. The present work optimizes the most substantive parameters of XGBoost and clarifies the definition of the used hyperparameters. The parameters of the selected models were changed throughout the trial procedure until the best metrics were obtained, as shown in Table I.

A. Mean Squared Error Estimation Analysis
The comparative plots of the signal gain and BER are shown in Fig. 4, wherein the testing inputs are sorted before plotting for better visualization. It can be perceived from Fig. 4(a) that the actual and predicted values of the XGBoost model are superposed on one another, exemplifying adequate functioning of the XGBoost model even on unseen (testing) data. In addition, the MSE estimation analyzes the performance on the testing and training data in regression form, as shown in Fig. 4(a). It can also be observed that XGBoost provides the best prediction of the actual data, with the signal gain falling steadily from 70 dB to 4 dB in a converging shape across the optical wavelength region 1260-1650 nm. The proposed XGBoost gives balanced predictions of the gain pitch over the entire optical region. Moreover, the MSE estimation of RF, SVM and AdaBoost is deficient compared with the XGBoost model in our case. Fig. 4(b) depicts the log-BER values used to estimate the net BER between the actual and predicted values.
The divergence in BER ranges from 10^−285 to 10^−5 over the entire multi-wavelength range of 1260-1650 nm, wherein the training dataset follows the XGBoost model absolutely, although the other ML models estimate the training dataset at an inadequate prediction level. As distinguished in the results related to Fig. 4, the MSE estimation is a regression-type problem. The final model is assembled on the following input variables: wavelength (λ), band (B), input signal power (P), position of the pump for the gain calculation in km (D), type of modulation format (M) and prototype of the pumping scheme (T), while the output variables are the BER (ber) and the signal gain (G). The signal gain is correlated to these functional components, so the requisite model is devised as

(G, ber) = f(λ, B, P, D, M, T).

The dataset is explicitly either categorical or continuous (Fig. 2), so the regression-type issues are correctly justified through the tree model. Further, normalization and one-hot encoding are adopted for string indexing; hence each band is exemplified by a one-hot indicator over the O-E-S-C-L-U bands (for example, the O band maps to [1, 0, 0, 0, 0, 0]). The final estimated results using the XGBoost model are accurate, with the least number of errors. Each model had different parameters, represented in Table I. The observed outputs for the utilized parameters are: 1) signal gain, learning rate = 0.5922448979591837; 2) BER, learning rate = 0.8165102040816327. In Fig. 5, a comparison of the different ML models and their performance on the training and testing data for the MSEs of gain and BER using different variables is shown. The training-data MSEs of all four ML models (XGBoost, RF, SVM and AdaBoost) are very low, but only XGBoost has an equally low testing-data MSE. This reveals that, even with hyperparameter tuning, the other three ML models over-fit the training data and hence do not perform well on the testing data.
On the other hand, the XGBoost model accounts for variability and robustness and hence also performs outstandingly on the testing data. For the output signal gain, the XGBoost model yields accurate error estimation, with an MSE of 0.705667 and 1.316402 for the training and testing data, respectively, as shown in Fig. 5(a). Further, from Fig. 5(b), it can be observed that the XGBoost model also exhibits the best estimation for the BER, with an MSE of 0.0187373 and 0.7731500 for the training and testing datasets, respectively, in comparison with the RF, SVM and AdaBoost algorithms, as represented in Table II.

B. Prediction Performance of the Comparative Models on Statistical Indices
As far as prediction accuracy is concerned, XGBoost has the best performance compared with the other three algorithms. The predictive outcomes on the training and testing datasets are shown in Fig. 6.
In addition, from Fig. 6(a), the signal gain prediction results on the testing data show that the XGBoost model has the highest prediction quality, with R² of 0.990852, RSR of 0.095646 and NSE of 0.990852, followed by the RF, SVM and AdaBoost algorithms. On the other hand, from Fig. 6(b), the prediction results for the signal gain on the training data show that the XGBoost model again has the highest prediction quality, with R² of 0.977906, RSR of 0.148641 and NSE of 0.977906, followed by the RF, SVM and AdaBoost models. These predictive outcomes reveal that, in the four-model comparison, the XGBoost model is superior to the others and offers balanced prediction across the testing and training datasets.
As discussed earlier, the predictive performance on the training and testing datasets for the BER is shown in Fig. 7. The BER prediction on the testing data, shown in Fig. 7(a), exhibits that the XGBoost model has the uppermost prediction quality, with evaluation parameters R² (0.999745), RSR (0.015955) and NSE (0.999745), compared with the RF, SVM and AdaBoost methods. The other three models have comparatively lower accuracy and prediction quality owing to undesirable standard-regression and dimensionless measures. The BER prediction on the training data is presented in Fig. 7(b), which shows that the XGBoost model has the maximal prediction quality, with R² of 0.999999, RSR of 0.743902 and NSE of 0.999999, compared with the other three models, RF, SVM and AdaBoost. XGBoost thus performs outstandingly on the model evaluation statistics R², RSR and NSE, as depicted.
The training time is related to the CPU calculation speed, and the prediction (test) time is an important parameter for gradient-boosted tree models, as it rationalizes how fast the model is. The test time is calculated for the tree depth estimated on the testing and training data. Variation in the prediction time of the XGBoost model relative to SVM, RF and AdaBoost on the training and testing datasets is depicted in Fig. 8(a); the test time measured on the signal gain dataset is the lowest, 0.00588 s, for XGBoost, compared with RF, SVM and AdaBoost. Moreover, the time elapsed to train the signal gain dataset with XGBoost is 80.68841 s, governed by the depth of the trees, compared with 8.242129 s for SVM, 8.195554 s for RF and 3.745477 s for AdaBoost.
Although training XGBoost on huge datasets consumes more time, once training is complete the trained model can be used consistently in the future, so rapid prediction is performed. In contrast, RF/SVM/AdaBoost must reconstruct their loops and iterations to converge whenever the predicted target is re-evaluated. Further, as the iteration depth increases during classification, the training and test times of XGBoost and of the other three gradient-boosting ML models increase.
Alternatively, Fig. 8(b) also reveals that the test time is lowest for the XGBoost model compared with the other ML models; the XGBoost model is a fast learner with a short test time on the prescribed data. The test time measured for the XGBoost model on the BER dataset is 0.00552 s, the lowest among RF, SVM and AdaBoost, as presented in Fig. 8(b). The time elapsed in training the BER dataset is 83.572 s, which is 2.884 s more than the training time for the signal gain dataset. Meanwhile, the other ML models (RF, SVM and AdaBoost) have comparatively shorter training times than the XGBoost model; the comparison of elapsed training and test times is shown in Fig. 8(a) and (b). It can also be stated that XGBoost and the other three models, trained and tested individually for the assessment of the signal gain and BER profiles, show approximately equal test and elapsed times across the two tasks.

VII. CONCLUSION
A novel XGBoost-based DRA framework meets the design goals efficiently in the estimation of the signal gain and BER. It is observed that XGBoost delivers outstanding predictive performance on the massive categorical dataset presented in tabular form in this work; the size and nature of the dataset do not hinder the learning process of the proposed model. Gradient-boosted decision tree models are prominent in resolving regression-type problems on tabular datasets of categorical and continuous inputs. In the estimation phase, the XGBoost model exhibits the best results on the training and testing data, with the lowest MSE values of 0.705667 and 1.316402 for the signal gain and 0.0187373 and 0.7731500 for the BER, respectively. Further, in the model testing and evaluation phase, the results show that XGBoost has the highest prediction performance for the signal gain, with R² of 0.990852, RSR of 0.095646 and NSE of 0.990852, and for the BER, with R² of 0.999745, RSR of 0.015955 and NSE of 0.999745, compared with the RF, SVM and AdaBoost ML models. Furthermore, the proposed model is intended to promote the accumulation of a larger dataset, so that its predictive efficacy will be enlarged while remaining time-saving. The XGBoost model reveals target-oriented results on the measured tabular dataset, and it can be used to associate the dimensionality between pump frequency and power in wide-range Raman applications. The presented model is able to predict the target signal gains and BER in the adopted low-loss optical region. The proposed model can be employed in unrepeatered ultra-long-haul networks to realize more robust distributed Raman gain profiles with an increasing number of pumps and propagation modules.