Hybrid Regression Model for Link Dimensioning in Spectrally-Spatially Flexible Optical Networks

The latest trends in computer networks bring new challenges and complex optimization problems, one of which is link dimensioning in Spectrally-Spatially Flexible Optical Networks. The time-consuming calculations related to determining the objective function representing the amount of accepted traffic require heuristics to search for good quality solutions. In this work, we address this problem by proposing a hybrid regression model capable of the objective function estimation. The presented algorithm uses a machine learning model built on already evaluated solutions for choosing new promising ones, providing a fast and effective method for solving the considered problem. The experimental evaluation conducted on two representative network topologies demonstrates that the proposed approach can significantly outperform other methods in the case of the EURO28 topology, while for the US26 topology, it provides results comparable to the solutions obtained so far.


I. INTRODUCTION
Recently, telecommunication networks have become an indispensable part of society's everyday life, providing support for such vital areas as education, business, finances, health care, entertainment, social life, to enumerate a few. Their crucial function in society was especially emphasized during the COVID-19 pandemic when several activities could be performed only remotely [1]. The networks' important role and increasing popularity also bring continuous growth of the number of users, connected devices, as well as interest in bandwidth-intensive services [2]. In order to meet these growing requirements, the networks have to continuously evolve by implementing advanced physical architectures and technologies which are optimized or controlled by a dedicated software intelligence [3]. Currently, one of the most promising technologies for optical transport networks is the idea of Spectrally-Spatially Flexible Optical Networks (SS-FONs), which combines benefits of the architecture of elastic optical networks (EONs) and the technology of spatial division multiplexing (SDM) [4]. The EONs' effectiveness comes from the operations within flexible frequency grids and support for advanced modulation and transmission techniques [5] while SDM allows extending the links' capacity limit by utilizing a number of spatial resources on each physical link [4].
Moreover, in recent years the field of computer network optimization has been intensively developing the concept of cognitive optical networks, introduced in order to improve the performance of future optical networks compared to the conventional solutions used nowadays. In a nutshell, a cognitive optical network is defined as a network with a cognitive process that can perceive the current network conditions and then plan, decide, and act on them [6].
The key new element of cognitive optical networks is the application of various machine learning methods to support and enhance well-known existing optimization approaches.
In this work, we address the optimization problem of efficient link dimensioning in SS-FONs, which consists in deciding on the amount of spatial resources to be activated in order to maximize the amount of served user traffic and minimize the network operational cost [7]. The considered problem is highly challenging due to two contradictory optimization criteria, an enormous solution space, and the time-consuming evaluation of a candidate solution, which requires simulating the allocation of demands in the network using an external routing algorithm. To tackle the problem, we propose and tune a new optimization algorithm based on a supervised machine learning hybrid regression model. The efficiency of the proposed approach is evaluated in extensive experiments run for two representative network topologies and various traffic patterns.
The main novelty and contributions of the paper are:
• An interpretation of the optimization problem as a regression problem.
• Architecture of a hybrid supervised learning system for a regression task with an integration rule based on a probabilistic classifier.
• Experimental evaluation of the proposed methods.
The rest of the paper is organized as follows. Section II discusses related works introducing state-of-the-art methods in the field. Section III introduces the network optimization problem and the procedure of dataset generation. Section IV describes the proposed supervised learning-based system for network optimization, while Sections V and VI present the experimental set-up and results, respectively. Finally, Section VII concludes the whole study.

II. RELATED WORKS
This section briefly discusses recent works related to cognitive optical networks and the application of machine learning methods to other optimization problems.
The processes in cognitive optical networks, which learn or make use of history to optimize the network performance, can apply various machine learning mechanisms [6]. Among their most popular applications are traffic forecasting and the improvement of routing or resource allocation strategies [8], [9]. Various methods have been applied to the task of traffic prediction, of which the most efficient were based on typical neural networks with a single hidden layer [3], [10], nonlinear autoregressive neural networks [9], and long short-term memory networks [8]. Besides network traffic, researchers have also proposed models to forecast the quality of services realized in the network and the overall network performance under specific configurations or circumstances. For instance, the authors of [11], [12] proposed a regression model to estimate the bandwidth blocking probability in the network (which should be minimized) based on the applied configurations of available modulation formats. Barletta et al. [13] defined a classification model to predict the probability that the bit error rate of a candidate optical path will not exceed the system tolerance threshold. Feature engineering was conducted by employing the traffic volume, modulation format, path length, length of the longest link, and the number of all included links as attributes. Ibrahimi et al. [14] utilized regression methods to propose a model estimating the signal-to-noise ratio (which significantly influences the quality of a transmission) of candidate optical paths in a network. For more information on optical networks and the application of various machine learning techniques in optical networks, please refer to [3], [15]-[17].
Machine learning methods can be used not only to support computer network optimization methods, but also to support other solvers for generic optimization problems. For instance, in [18] linear and quadratic regression models are used in the STAGE algorithm to approximate the best starting point for the local search procedure. The model is trained iteratively, and the optimization procedure is repeated from a new starting point, giving good results for various optimization problems. Among more recent works, regression models are used to estimate the cost function in the offshore wind farm layout problem [19], where the training data comes from complex estimations of Mathematical Optimization. Regression models can also be applied to approximate functional gradient descent [20], which was found to be a promising method for optimization in robotic problems. Moreover, they can be used to predict modeling efficiency [21] in feature selection tasks, based on the cross-validation protocol employed in the Genetic Algorithm.

III. NETWORK OPTIMIZATION PROBLEM
The analyzed optimization problem refers to SS-FONs. The main idea of SS-FON is operation within flexible frequency grids, where the entire spectrum width available in an optical fiber is divided into narrow, same-size segments called slices [22]. Adjacent slices can then be grouped together to create communication channels, which are used for data transmission. Depending on the number of involved slices, SS-FON makes it possible to create channels of different sizes (tailored to the incoming traffic demands) and, in turn, to provide superior spectrum utilization. Concurrently, SS-FON extends the links' capacity limit by introducing a spatial dimension that enables parallel optical signal transmission through spatial resources (or spatial modes, for the sake of simplicity) co-propagating in suitably designed optical fibers. There are several candidate fiber solutions proposed for SS-FON realization, of which the most popular are: single-mode fiber bundle (SMFB), multi-core fiber (MCF), few-mode fiber (FMF), and few-mode multi-core fiber (FM-MCF) [4], [23].
In this paper, we focus on an SS-FON realized using SMFB fibers. It is important to note that we consider a transport network topology, which spans different cities, countries, or even continents. The physical augmentation of the network infrastructure (i.e., fiber links) is a complex and expensive process. Therefore, in order to mitigate these challenges, transport networks are always over-provisioned. The links of existing networks are already equipped with several spatial resources. In order to decrease network operational costs, the operators keep some of these resources switched off. However, they can be quickly and easily activated whenever necessary.
Optical network operators offer various services based on backbone optical networks consisting of fibers and devices. From the business point of view, one of the main goals of network operators is to maximize the amount of served network traffic, since it increases the revenue and the number of customers. The basic approach to this goal is to increase network capacity by adding new resources to the network (e.g., fibers, devices). Nevertheless, network expansion generates additional expenses and takes a long time (e.g., installing new fibers can last months). Therefore, the operators prefer to use a method that leverages the existing infrastructure and delays new expenditures. To this end, this work proposes an approach that re-dimensions an SS-FON by changing the number of utilized fibers in some network links. Importantly, this re-dimensioning method does not entail additional capital expenditures, since only the existing resources (installed dark fibers, switching nodes, transponders) currently available in the network are utilized. In addition, the re-dimensioning approach can be applied relatively fast, since the most time-consuming element is the activation of unused fibers through a proper port rearrangement in network nodes, e.g., nodes operating under the Architecture on Demand (AoD) paradigm.
The considered optimization problem related to SS-FON re-dimensioning consists in augmenting or reducing the number of active fibers in some network links under the constraint that a limited number of ports is available in network nodes. SS-FON is modeled as a directed graph $G = (V, E)$, where $V$ is a set of optical nodes (devices) and $E$ is a set of directed links (bundles of fibers). We assume that each SS-FON link has a number of available fibers aggregated in a bundle. Some of the fibers in the link are active, i.e., they can currently be used to provision the incoming lightpath requests. In turn, some of the fibers are inactive (dark), and they cannot currently be used to provision the incoming requests. It is assumed that inactive fibers are already deployed, and that it is simple and quick to activate them and make them available for the allocation of lightpaths. As a default configuration, SS-FON is dimensioned uniformly, i.e., every link $e \in E$ has the same number of active fibers, denoted as $K$. Moreover, it is assumed that every optical node $v \in V$ has $p_v = K \cdot \deg(v)$ input/output ports, where $\deg(v)$ denotes the degree (number of adjacent links) of the node $v$. In other words, every node $v$ has as many ports as necessary to serve all active fibers connected to it.
The considered optimization problem can be formulated in the following way. Let the integer variable $y_e$ denote the number of fibers assigned to the link $e$. Moreover, let the vector $Y = (y_1, y_2, \ldots, y_{|E|})$ denote a network configuration, defined as a network in which each link $e \in E$ is assigned the number of fibers given by $y_e$. Due to the limited number of available ports $p_v$ in nodes $v \in V$, every network configuration must satisfy the constraint that the overall number of fibers connected to/from a particular node $v$ cannot exceed $p_v$. A network configuration is feasible if this node constraint is satisfied. The considered optimization problem involves finding a feasible network configuration that provides the best network performance under a dynamic traffic scenario.
The following procedure is applied to verify a particular network configuration $Y$. A network dimensioned according to $Y$ (each network link is assigned $y_e$ fibers) is saturated with dynamic traffic, i.e., traffic requests between network nodes arrive in the network over time. Every request has a holding time during which it stays in the network, and after this time, the request is removed from the network. Moreover, every request has a capacity, i.e., the bit-rate required to serve the request. To serve a request that arrived in the network, a lightpath has to be established, consisting of a routing path and a range of optical spectrum slices, allocated in the network links included in the routing path, that is necessary to serve the request's bit-rate. We assume a flexible back-to-back (B2B) regeneration of an optical signal that allows regeneration of lightpaths with modulation conversion in any intermediate node of a lightpath [24]. If the network resources are not sufficient, the request is rejected. The main performance metric used to assess a particular network configuration is the bandwidth blocking probability (BBP), defined as the ratio of the volume of rejected requests to the whole offered traffic volume. Next, a more aggregated metric called accepted traffic (AT) for a 1% threshold of BBP is formulated. To measure the accepted traffic, the traffic introduced to the network is gradually increased until the BBP reaches 1%. This procedure allows estimating the amount of traffic that can be served in the network configured according to $Y$ with a BBP of 1%, which is a commonly accepted threshold. For the dynamic routing, we apply the Adaptive Routing with Back-to-back Regeneration (ARBR) algorithm proposed and evaluated in [24]. For a more detailed description of the considered optimization problem, please refer to [7], [25].
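The node constraint can be checked mechanically. The following minimal Python sketch assumes a toy network given as a list of directed (source, target) links and counts, for each node, the fibers on every link incident to it in either direction; this joint counting, as well as the names `port_limits`, `is_feasible`, `links`, and the example data, are illustrative assumptions rather than part of the original model description.

```python
from collections import defaultdict

def port_limits(links, k):
    """Return p_v = K * deg(v), where deg(v) counts links incident to v."""
    degree = defaultdict(int)
    for src, dst in links:
        degree[src] += 1
        degree[dst] += 1
    return {node: k * deg for node, deg in degree.items()}

def is_feasible(links, fibers, k):
    """Check that the fibers attached to each node fit within its port limit."""
    used = defaultdict(int)
    for (src, dst), y_e in zip(links, fibers):
        if y_e < 0:
            return False
        used[src] += y_e
        used[dst] += y_e
    limits = port_limits(links, k)
    return all(used[node] <= limits[node] for node in limits)

# Hypothetical 4-node ring with one directed link in each direction.
links = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"),
         ("c", "d"), ("d", "c"), ("d", "a"), ("a", "d")]
default = [2] * len(links)              # uniform dimensioning, y_e = K = 2
shifted = [3, 3, 1, 1, 3, 3, 1, 1]      # fibers moved between links
overloaded = [3, 3, 2, 2, 2, 2, 2, 2]   # fibers added without any reduction

print(is_feasible(links, default, 2))
print(is_feasible(links, shifted, 2))
print(is_feasible(links, overloaded, 2))
```

In this toy ring the uniform configuration uses every port, so a link can only be augmented if another link at the same node is reduced, which is the trade-off the re-dimensioning problem explores.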
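The two metrics can be illustrated with a short Python sketch. It computes BBP from a list of served and rejected requests and then raises the offered load step by step until the 1% threshold is exceeded. The function `toy_simulator` is a hypothetical stand-in for the dynamic-routing simulation (it is not the ARBR algorithm), and the linear load schedule is an assumption made only for illustration.

```python
def bandwidth_blocking_probability(requests):
    """BBP = rejected bit-rate volume / offered bit-rate volume.

    `requests` is a list of (bit_rate, accepted) pairs.
    """
    offered = sum(bit_rate for bit_rate, _ in requests)
    rejected = sum(bit_rate for bit_rate, accepted in requests if not accepted)
    return rejected / offered if offered else 0.0

def accepted_traffic(simulate_bbp, start_load, step, threshold=0.01,
                     max_load=10_000):
    """Raise the load until BBP exceeds the threshold; return the last load within it."""
    load, best = start_load, None
    while load <= max_load:
        if simulate_bbp(load) > threshold:
            break
        best = load
        load += step
    return best

def toy_simulator(load):
    """Hypothetical simulator: blocking starts once the load passes 500 units."""
    return max(0.0, (load - 500) / 5000)

sample_requests = [(100, True), (50, False), (50, True)]
print(bandwidth_blocking_probability(sample_requests))
print(accepted_traffic(toy_simulator, start_load=100, step=100))
```

Each call to the simulator corresponds to one full traffic simulation in the real setting, which is why evaluating a single configuration is expensive.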
The addressed SS-FON re-dimensioning optimization problem (i.e., decision on values of variables y e ) can be solved using various approaches. In our previous works [7], [25], we have proposed a heuristic method, in which the decision which links are to be augmented or reduced is made based on metrics assigned to the network links. In particular, links with the highest value of the metric are selected to be augmented, while the selection of links to be reduced is made according to the lowest value of the metric. All re-dimensioning decisions must satisfy the node constraint due to the limited number of ports. We have examined various link metrics utilizing data analytics observations related to network topological characteristics and network traffic allocation within a particular period.
Moreover, in the context of this work, we have solved the problem using an Integer Linear Programming (ILP) approach. In more detail, the analyzed problem was formulated as a set of constraints with an objective function that minimizes the expected load on the most congested network link after the re-dimensioning. The ILP model uses as input the values of network loads obtained by simulating the dynamic routing operation in a network configured in the default way, i.e., $y_e = K$ for every $e \in E$. Finally, the obtained solutions are used to create the datasets applied in further experiments.
The performed simulations yield, for each considered topology, a collection of datasets that fuel the subsequent supervised learning methods:
• $DS_C$: set of all correct problem instances, without the calculated criterion function,
• $DS_R$: set of 300 random correct instances (subset of $DS_C$), with calculated AT value,
• $DS_H$: set of 16 correct instances produced by different heuristic approaches to the problem [25], with calculated AT value,
• $DS_I$: set of 32 correct instances produced by different ILP approaches to the problem [7], with calculated AT value,
• $DS_O$: union of $DS_H$ and $DS_I$,
• $DS_A$: union of the $DS_R$ and $DS_O$ sets, supplemented with a categorical vector assigning the positive label to $DS_O$ instances and the negative label to elements of $DS_R$.
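The way the labeled subsets combine into the training set can be sketched in a few lines of Python. The function name `build_ds_a`, the representation of an instance as a (configuration, AT value) pair, and the numbers below are assumptions used only to illustrate the structure; they are not data from the study.

```python
def build_ds_a(ds_r, ds_h, ds_i):
    """Merge labeled subsets into DS_A with a binary source label.

    Each input is a list of (configuration, at_value) pairs.
    Label 0 (negative) marks DS_R; label 1 (positive) marks DS_O,
    the union of DS_H and DS_I.
    """
    ds_o = ds_h + ds_i
    configs, at_values, labels = [], [], []
    for label, subset in ((0, ds_r), (1, ds_o)):
        for config, at_value in subset:
            configs.append(list(config))
            at_values.append(at_value)
            labels.append(label)
    return configs, at_values, labels

# Hypothetical miniature subsets (the real ones hold 300, 16, and 32 instances).
ds_r = [((2, 2, 2, 2), 410.0), ((3, 1, 2, 2), 395.0), ((1, 3, 2, 2), 402.0)]
ds_h = [((3, 3, 1, 1), 455.0)]
ds_i = [((3, 2, 2, 1), 460.0)]

configs, at_values, labels = build_ds_a(ds_r, ds_h, ds_i)
print(labels)
```

The AT column then serves as the regression target and the label column as the classification target.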

IV. SUPERVISED LEARNING INTERPRETATION OF THE PROBLEM
Both network optimization and supervised learning are optimization tasks; thus, at the level of their basic concepts, it is possible to define a common plane of interpretation. It exists mainly in the area of the description of system inputs [11]. An instance of an optimization solution $Y$ is represented by a vector describing the parameters and by the related criterion function value AT, which has to be determined in a complex network simulation. Similarly, an instance of the supervised learning problem $x$ is also a vector, described by a label, which is either a discrete or a continuous value for the classification and regression task, respectively. Highlighting this similarity, one may easily propose a translation between these definitions, and a definition of a recognition model capable of estimating the AT value for new solutions without the need to perform a computationally expensive simulation.
In the optimization task, the system is known in advance, but all calculations performed for it are time-consuming due to its high complexity. Pattern recognition methods, on the other hand, may model such a system, which means that the resulting prediction is no longer so time-consuming but is, at the same time, burdened with a measurable generalization error. The last key difference between systems of both types lies in the user's expectations. Optimization systems are required to designate an instance of the problem that minimizes or maximizes the value of the objective function, whereas pattern recognition systems are required to minimize the prediction error.
The default requirement for ensuring high generalization capability of any model employing the inductive learning paradigm is to provide the training procedure with a sufficiently large pool of annotated observations [26]. However, due to the characteristics of the problem under consideration and the high complexity of the simulation used to construct the training set for research purposes, only a limited pool of objects is available. Such a situation makes the problem particularly difficult [27] and necessitates the use of alternative decision-making system improvement mechanisms, which in the proposed solution will be based primarily on ensemble approach to supervised learning [28].
It is essential to underline that the available labels of the dataset allow the interpretation of the problem as both a regression and a classification task. Thus, in the designed method, the column of the AT value will be the target of the regression, and the column specifying the source of the solution instance ($DS_R$, $DS_H$, or $DS_I$) will be the target of the classification.

A. SOLUTION ARCHITECTURE
The task of the algorithm proposed in this paper is to construct a recognition model capable of high-quality AT value prediction, which makes it possible to probe the complete available set of correct solutions ($DS_C$) in order to select the network configuration $Y$ that maximizes this value.
The first step to solve this problem is to construct a basic regression model ($A_{reg}$) fitted on all available samples from the training set ($DS_A$). It constitutes a baseline solution, which turns out to be sufficient for problems with low-complexity distribution characteristics [29]. However, as shown in the preliminary tests, which are also presented in Section VI of the paper, it is not an acceptable solution to the problem of optimal network configuration selection. It is necessary to underline that the authors are aware that the typical metrics for assessing the quality of regression models are not perfectly tailored to the optimization-aiding challenge [30]. However, the low $r^2$ score obtained by $A_{reg}$ for objects from the $DS_O$ category was sufficient to suggest the construction of a more complex solution.
This early premise from the preliminary research led to the proposition of a hybrid architecture of the constructed prediction model [31]. Its primary idea was the construction of a minor, homogeneous pool of independent regression models [32], diversified by identified categories of available data [28], integrated with the use of a classification model capable of generating support for pattern membership for these categories [33].
The first of the two most important criteria for the construction of a reliable hybrid recognition system is to ensure appropriate differentiation of the models contained therein to support the final decision [34]. Therefore, in the proposed collection of regressors, the basic $A_{reg}$ model, built on all three categories of labeled subsets ($DS_R$, $DS_H$, and $DS_I$, i.e., the complete $DS_A$ set), was supplemented with two additional predictors:
• $R_{reg}$, built only on the random solutions ($DS_R$),
• $O_{reg}$, built only on the optimized solutions ($DS_O$).
The second key factor of an effective hybrid recognition system is the design of an appropriate method of integrating the responses of member regressors [35]. After a preliminary analysis of simple approaches to achieving consensus, based on rules averaging the obtained predictions [36], the authors decided on a more complex combination. The proposition is based on a trainable fuser, which estimates the weights of the member regressors' decisions based on the support it assigns to the problem categories that serve as the training sets of the individual models in the pool [37].
The classification model supporting the integration of the ensemble is trained on the $DS_A$ set, with labels resolving the dichotomy between the $DS_R$ and $DS_O$ categories, which constitute its disjoint subsets. Due to the disproportion between the classes of the problem interpreted in this way, the set of random solutions ($DS_R$) assumes the role of the negative class, and the union of optimized solutions ($DS_O$) that of the positive class, which allows us to denote the support vector $F$ of the classifier as:
$$F = [F_R, F_O],$$
where $F_R$ and $F_O$ estimate the probability of belonging to the $DS_R$ and $DS_O$ set, respectively. The prepared integration model made it possible to propose two rules for the combination of the base regressors' responses:
• Optimized Random Combination (ORC), which weighs only the predictions of the regressors $R_{reg}$ and $O_{reg}$ with the support vector $F$,
• Optimized Random All Combination (ORAC), which also takes into account the response of the $A_{reg}$ regressor.
The integration rule incorporated in this way completes the proposed processing architecture, presented in Figure 1. Its left side presents the relations between the individual training sets of the hybrid model, and its right side presents the applied approach to the combination.
The proposed solution to the problem presented in this paper considers a set of predictors with a hierarchical structure, using both regression and classification models. In order to standardize the experimental evaluation, the Multi-Layer Perceptron (MLP) with the L-BFGS solver (lbfgs) and 100 neurons with the ReLU activation function in a single hidden layer was selected as the base algorithm for the construction of the recognition model in both the regression and the classification task.
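The idea of a classifier-driven fuser can be illustrated with a minimal Python sketch. It assumes the simplest reading of the ORC rule, namely a support-weighted sum of the two specialised regressors' predictions; this exact form is an assumption, the function name `orc` and all numbers are hypothetical, and the ORAC rule is deliberately not sketched because its way of including $A_{reg}$ is not specified here.

```python
def orc(support, r_pred, o_pred):
    """Support-weighted combination of the two specialised regressors.

    support: (F_R, F_O) class supports returned by the classifier.
    r_pred:  AT predicted by the regressor trained on random solutions.
    o_pred:  AT predicted by the regressor trained on optimized solutions.
    Assumed form: F_R * r_pred + F_O * o_pred.
    """
    f_r, f_o = support
    return f_r * r_pred + f_o * o_pred

# Hypothetical supports and predictions for two candidate configurations.
supports = [(0.9, 0.1), (0.2, 0.8)]
r_preds = [400.0, 420.0]
o_preds = [520.0, 540.0]

combined = [orc(f, r, o) for f, r, o in zip(supports, r_preds, o_preds)]
print(combined)
```

A candidate that the classifier considers similar to random solutions is thus scored mostly by the regressor trained on random solutions, and vice versa.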
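The base model described above maps directly onto scikit-learn's MLP estimators. The sketch below is a minimal configuration under the stated settings (one hidden layer, 100 ReLU units, `lbfgs` solver); the helper names, the fixed random seed, and the synthetic data are assumptions for illustration, and all remaining hyperparameters are left at library defaults, which may differ from those used in the study.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier, MLPRegressor

def make_regressor(seed=0):
    """MLP regressor: one hidden layer, 100 ReLU units, L-BFGS solver."""
    return MLPRegressor(hidden_layer_sizes=(100,), activation="relu",
                        solver="lbfgs", random_state=seed)

def make_classifier(seed=0):
    """MLP classifier with the same architecture, used as the fuser."""
    return MLPClassifier(hidden_layer_sizes=(100,), activation="relu",
                         solver="lbfgs", random_state=seed)

# Synthetic stand-in data (hypothetical): 60 configurations of 8 links.
rng = np.random.default_rng(0)
X = rng.integers(1, 4, size=(60, 8)).astype(float)
at_values = 100.0 * X.sum(axis=1) + rng.normal(0.0, 5.0, size=60)
labels = (at_values > np.median(at_values)).astype(int)  # placeholder labels

regressor = make_regressor().fit(X, at_values)
classifier = make_classifier().fit(X, labels)

predicted_at = regressor.predict(X[:5])      # one AT estimate per row
support = classifier.predict_proba(X[:5])    # columns: negative, positive class
print(predicted_at.shape, support.shape)
```

With labels 0 and 1 standing for the negative and positive class, the two columns of `predict_proba` play the role of $F_R$ and $F_O$.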
The authors would like to underline that in the preliminary experiments, the potential of many other simple regression methods was verified for the considered problem pool, which justified the use of MLP as the base model for both regression and classification.

B. AIDING OPTIMIZATION BY A HYBRID REGRESSION MODEL
In the experimental evaluation of the proposed hybrid AT value prediction method, a 5-fold cross-validation protocol was used. This approach also served as an additional factor stabilizing the estimation of the criterion function value in the final optimization-aiding model: every instance from the complete set of valid, unlabeled solutions ($DS_C$) was passed to the hybrid predictor built on each of the cross-validation folds. Furthermore, this made it possible to build an external ensemble of predictors using the knowledge from the entire training set, which is a common, recognized practice in the development of production recognition systems [38].
In the final approach to aiding optimization, the simulation procedure calculated the value of the criterion function for the best 20 instances of the $DS_C$ set. The selected instances had to meet the following criteria: (a) the hybrid regression method returned the highest predictions for them, (b) the classification model assigned them to the $DS_O$ category, whose distribution is characterized by a higher expected value.
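The selection step can be sketched as follows. The sketch assumes that the per-fold predictions and supports are averaged across the cross-validation models and that a candidate counts as assigned to the optimized category when its mean positive-class support exceeds 0.5; both choices, the function name `select_candidates`, and the numbers are illustrative assumptions.

```python
def select_candidates(fold_predictions, fold_supports, top_n=20, threshold=0.5):
    """Pick the top-N candidates by mean predicted AT among those classified as optimized.

    fold_predictions[f][i]: AT predicted for candidate i by the fold-f model.
    fold_supports[f][i]:    positive-class support for candidate i from fold f.
    """
    n_folds = len(fold_predictions)
    n_candidates = len(fold_predictions[0])
    mean_pred = [sum(fold[i] for fold in fold_predictions) / n_folds
                 for i in range(n_candidates)]
    mean_support = [sum(fold[i] for fold in fold_supports) / n_folds
                    for i in range(n_candidates)]
    eligible = [i for i in range(n_candidates) if mean_support[i] > threshold]
    eligible.sort(key=lambda i: mean_pred[i], reverse=True)
    return eligible[:top_n]

# Hypothetical outputs of two fold models for four candidate configurations.
fold_predictions = [[500.0, 480.0, 510.0, 450.0],
                    [520.0, 470.0, 500.0, 460.0]]
fold_supports = [[0.8, 0.9, 0.3, 0.7],
                 [0.6, 0.7, 0.4, 0.9]]

chosen = select_candidates(fold_predictions, fold_supports, top_n=2)
print(chosen)
```

Only the returned candidates would then be passed to the expensive simulation to obtain their real AT values.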

V. EXPERIMENTS SET-UP
In order to evaluate the proposed method, a coherent experimental protocol was established, under which both the distribution of solutions, and the predictive ability of the proposed method were examined. The protocol defines the description of the datasets, the detailed plan of each experiment, and the specification of the research environment in which the research was conducted.

A. DATASETS
In the experimental evaluation of the hybrid aided-optimization method proposed in this paper, two representative optical network topologies were used (Figure 2), each in two network traffic scenarios, which together gives four analyzed datasets that constitute a problem for recognition models:
• EURO28A: EURO28 network topology, in which requests are uniformly distributed between all origin-destination node pairs in the network.
• EURO28G: EURO28 network topology, in which the distribution of requests between origin-destination nodes is inversely proportional to the distance between these nodes.
• US26A: US26 network topology, in which requests are uniformly distributed between all origin-destination node pairs in the network.
• US26G: US26 network topology, in which the distribution of requests between origin-destination nodes is inversely proportional to the distance between these nodes.
The adopted strategy for the construction of the recognition system uses a classification model that resolves the dichotomy between categories of data with highly imbalanced cardinality in the available training set, which could cause the prediction difficulties typical of imbalanced data [39]. These difficulties were taken into account by the authors of the study. However, the experimental evaluation carried out in terms of classification showed that the datasets, despite a strong imbalance [40], present a relatively simple dichotomy, solved by the proposed predictor with a balanced accuracy score of over 90% [41], so they were not subjected to the additional preprocessing phases necessary to balance the prior class distribution.

B. EXPERIMENTS DESIGN
As a part of the experimental evaluation of the proposed method, four computer experiments were designed to verify a possibly broad spectrum of aspects influencing the processing:
E1 Analysis of the distribution of the AT value in individual categories of solutions.
The first preliminary experiment aims to build more knowledge about the analyzed problem to support the observations resulting from further experiments and to reach binding conclusions. The distributions of AT values for the sets $DS_A$, $DS_R$, $DS_H$, $DS_I$, and $DS_O$, determined for each collected dataset, will be analyzed using kernel density estimation, with the bandwidth estimated using Silverman's method. Such an approach will make it possible to examine the dependencies between the labeled subsets of $DS_C$ and to draw preliminary conclusions about the potential dependencies between member regressors.
The possible observed significant differences between the AT distributions for different categories of solution sources will constitute a proper justification for the hybrid approach to the construction of the recognition system.
To gather information about the distribution of solutions in the target value space (expressed as AT), results were obtained using kernel density estimation with the bandwidth estimated using Silverman's method. All available solutions for the different datasets were used in the various combinations described in Section III.
E2 Assessment of the recognition model's ability to distinguish between $DS_R$ and $DS_O$ members.
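For one-dimensional data such as AT values, the density estimate can be reproduced with the standard library alone. The sketch below uses the rule-of-thumb bandwidth h = sigma * (4 / (3n))^(1/5), which is the scaling that SciPy's `gaussian_kde` applies for `bw_method="silverman"` in one dimension; whether the study used exactly this routine is an assumption, and the sample values are hypothetical.

```python
import math
from statistics import stdev

def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth for 1-D data: sigma * (4 / (3 n)) ** (1 / 5)."""
    n = len(samples)
    return stdev(samples) * (4.0 / (3.0 * n)) ** 0.2

def gaussian_kde(samples, grid):
    """Evaluate a Gaussian kernel density estimate on the given grid."""
    h = silverman_bandwidth(samples)
    norm = len(samples) * h * math.sqrt(2.0 * math.pi)
    return [sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm
            for x in grid]

# Hypothetical AT values (in NTUs) for one category of solutions.
at_samples = [480.0, 495.0, 500.0, 505.0, 520.0]
grid = [float(x) for x in range(400, 601)]
density = gaussian_kde(at_samples, grid)

print(round(silverman_bandwidth(at_samples), 2))
print(round(sum(density), 3))  # grid step is 1, so this approximates the integral
```

Comparing such curves across categories shows at a glance whether their AT distributions overlap or separate.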
A critical component of the proposed hybrid regression model is the development of a robust recognition system fuser. The literature on the problem indicates that the predictive ability of a multi-classifier system based on a decision rule aimed to recognize an area of responsibility of the member classifiers has a direct impact on the effectiveness of the entire ensemble [42]. Therefore, this experiment will aim to assess the classifier's effectiveness in controlling the combination of regressors, assuming that the observations typical for classifier ensembles can also be generalized for the other supervised learning tasks.
This experiment is conducted on a single model, the MLP, whose binary classification quality is expressed as the balanced accuracy score. The experimental protocol is stratified 5-fold cross-validation, conducted on each dataset.
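Balanced accuracy is the mean of the per-class recalls, which makes it suitable for the imbalanced dichotomy considered here. The following Python sketch computes it from scratch on hypothetical labels to show how it differs from plain accuracy; the function name and the data are illustrative.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, insensitive to unequal class sizes."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical imbalanced fold: 8 random (0) and 2 optimized (1) solutions.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]          # one optimized solution is missed
always_negative = [0] * 10         # trivial majority-class predictor

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)
print(balanced_accuracy(y_true, y_pred))
print(balanced_accuracy(y_true, always_negative))
```

A predictor that always outputs the majority class reaches 80% plain accuracy on this fold but only 50% balanced accuracy, which is why the latter is the more informative score for this task.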
E3 Assessment of the cross-predictive ability of the developed regression models concerning each of the categories of labeled solutions.
It should be underlined that the regression models built as a part of the proposed processing architecture, due to the lack of dedicated optimization-aiding metrics for such solutions, are optimized in the traditional way, with respect to the squared loss. Nevertheless, it is necessary to carry out such an evaluation to assess the constructed models' general predictive ability and to verify their effectiveness for individual categories of available training data. Therefore, this experiment will aim to verify how the individual models ($A_{reg}$, $R_{reg}$, and $O_{reg}$) and the proposed hybrid models (ORC and ORAC) perform both against the full available dataset ($DS_A$) and its categories ($DS_R$ and $DS_O$).
Regression models created in this experiment are compared using the explained variance score obtained in stratified 5-fold cross-validation. The results of the individual models are compared for statistical significance using the paired Student's t-test.
The last experiment (E4) is designed to verify how the proposed hybrid models and their member regressors work in aiding link dimensioning. For each of the regression-based models, the best 20 solutions will be selected following the aiding strategy described in Section IV-B, and their real AT values will be determined in the simulation procedure. Both the highest AT values from the samples selected in this way and the distributions of the best 20 values of the criterion function available in the sets $DS_R$, $DS_H$, and $DS_I$ will be compared, the latter using Student's t-test for independent samples.
Such an approach will allow for the final verification of whether the classically trained (squared-loss optimized) hybrid regression model can be used to select the best available network configuration and whether the possible profit from this type of solution turns out to be statistically significant.
Positive verification in this scope will allow us to propose an effective hybrid regression method for aiding link dimensioning. However, obtaining a method that is statistically dependent on the standard approaches used so far in this domain will suggest the need to adapt the existing regression algorithms so that they optimize a cost function that assesses not the predictive ability of the regression model directly, but its effectiveness in supporting external optimization tasks.
The presented results are determined following the procedure specified in Section IV, based on the prediction fused by the ensemble, and Student's t-test for independent samples was used to determine the statistical relationship between the methods used to solve the network problem.
The research environment used to conduct the experiments was prepared in Python, supported by commonly used scientific packages: scikit-learn [43] and SciPy [44]. The results of the experiments are reproducible, and the code is available in an online repository 1.
The solutions of the considered problem instances for the four analyzed datasets were obtained using the two approaches described above, namely, heuristic algorithms [25] implemented in C++ and an ILP approach implemented in C++ using IBM ILOG CPLEX Optimization Studio V12.6.3.
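The selection and comparison steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the configuration encoding, the `predict_at` and `simulate_at` stand-ins, and the toy data are hypothetical, and the details of the aiding strategy from Section IV-B are not reproduced. The sketch uses only the Python standard library; SciPy's `ttest_ind` computes the same pooled-variance statistic together with a p-value.

```python
from math import sqrt
from statistics import mean, variance


def shortlist_by_prediction(candidates, predict_at, k=20):
    """Return the k candidates with the highest predicted AT."""
    return sorted(candidates, key=predict_at, reverse=True)[:k]


def independent_t_statistic(sample_a, sample_b):
    """Student's t statistic for two independent samples (pooled variance)."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


# Hypothetical stand-ins: a configuration is a tuple of per-link values.
def predict_at(cfg):
    """Stand-in for the regression model's AT estimate."""
    return sum(cfg)


def simulate_at(cfg):
    """Stand-in for the time-consuming simulation returning the real AT."""
    return sum(cfg) + cfg[2]


candidates = [(i, 2 * i % 7, i % 3) for i in range(100)]

# Shortlist 20 configurations by predicted AT, then simulate only those.
shortlist = shortlist_by_prediction(candidates, predict_at, k=20)
model_at = [simulate_at(cfg) for cfg in shortlist]

# Best 20 AT values of an already evaluated reference pool (toy subset).
baseline_at = sorted((simulate_at(cfg) for cfg in candidates[:50]), reverse=True)[:20]

t_stat = independent_t_statistic(model_at, baseline_at)
```

The point of the design is that the expensive simulation is called only for the shortlisted configurations, while the cheap regression estimate is used to rank the whole candidate pool.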

VI. EXPERIMENTAL EVALUATION
Conducting the research under the conditions strictly defined in the previous section resulted in the evaluation of the proposed method. For each of the experiments, the observations and conclusions drawn on the basis of the obtained results are presented.

E1: Analysis of the distribution of the AT value in individual categories of solutions.
Figure 3 presents the density distributions of the AT value for the analyzed datasets. The AT is expressed in NTUs (Network Traffic Units) [7]. The results of the preliminary analysis prepared in E1 confirm the existing significant differences in distributions between solutions of different categories. As predicted, the expected value for the solutions selected with heuristic methods is much higher than for the pool of random solutions. At the same time, a much smaller variance can be observed in these distributions, which is related to the search procedure itself, which, considering the nature of the network optimization problem, aims to maximize the value of AT. However, an ideal training set should cover the entire solution space of the considered supervised learning problem. Therefore, using only solutions from the optimized pool may lead to underrepresentation of the problem, which will increase the inaccuracy of predictions, especially for solutions located outside the local optima evaluated by the heuristics.
The benefit of concatenating all sets to DS A is to increase the quantified value while maintaining a high standard deviation. The performed experiment also indicates the tendency of heuristics to search in local optima, which is especially emphasized in solutions from the DS I group and is exposed in two peaks of the density distribution plot observed in each dataset. The proposed combination of DS I and DS H allows creating a homogeneous pool that can be used for the needs of the recognition task.
E2: Assessment of the recognition model's ability to distinguish between DS R and DS O members.
The observed differences in the distributions, especially the significant difference between the expected values of the sets DS O and DS R , give grounds for formulating a dichotomy that consists in identifying the set to which a previously unlabeled solution belongs. The results presented in Table 1, where we observe very high balanced accuracy scores for the MLP, seem to confirm this assumption. As a result, it is possible to prepare a hybrid model in which the classifier's high ability to recognize categories is used to integrate a pool of regressors modeled on separate pools of solutions. Thus, the negative effect of narrowing the pool of optimized solutions, which in the case of a single regressor could harm the prediction, can be reduced by using the model prepared in E2, allowing for the creation of a hybrid model composed of regressors trained on fragments of the solution pool.
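The idea of integrating category-specific regressors through a classifier can be sketched as follows. This is only an illustration of hard routing under stated assumptions: the exact integration rules of ORC and ORAC are defined in Section IV and are not reproduced here, and the class name `RoutedRegressor`, the toy classifier, and the toy regressors are hypothetical.

```python
class RoutedRegressor:
    """Hybrid model: a classifier decides which category regressor answers."""

    def __init__(self, classify, regressors):
        self.classify = classify      # maps a solution to a category label
        self.regressors = regressors  # maps a category label to a regressor

    def predict(self, solution):
        category = self.classify(solution)
        return self.regressors[category](solution)


# Hypothetical stand-ins: solutions are tuples, categories are "O" and "R".
def toy_classifier(solution):
    """Stand-in for the recognition model separating DS O from DS R."""
    return "O" if sum(solution) >= 10 else "R"


def toy_o_reg(solution):
    """Stand-in for a regressor fitted on the optimized pool."""
    return 100.0 + 0.5 * sum(solution)


def toy_r_reg(solution):
    """Stand-in for a regressor fitted on the random pool."""
    return 20.0 + 2.0 * sum(solution)


hybrid = RoutedRegressor(toy_classifier, {"O": toy_o_reg, "R": toy_r_reg})

estimate_optimized_like = hybrid.predict((5, 6))  # routed to toy_o_reg
estimate_random_like = hybrid.predict((1, 2))     # routed to toy_r_reg
```

Because each regressor only has to be accurate within its own category, a classifier with high balanced accuracy keeps most solutions away from a regressor that was never trained on similar data.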
E3: Assessment of the cross-predictive ability of the developed regression models regarding each category of labeled solutions.
As mentioned before, the squared-loss metric is insufficient to assess the regression model's usefulness in the optimization-aiding task. However, the results obtained using this metric allow for a reliable comparison of the created models in terms of the correctness of prediction. Each section of Table 2 relates to the results obtained on the successive datasets. Each row in a section indicates which subset was used to estimate the regression quality (DS A : all; DS R : random; DS O : optimized). Table columns represent the consecutive approaches to the solution. These are both the base regressors (A reg , R reg , and O reg ) and the hybrid models built with their use in two integration approaches (ORC and ORAC).
In almost all sets, except for EURO28A, we can observe a bias of the O reg regressor toward the DS O category: on the DS R and DS A sets, it achieves significantly lower results than the other models from the single models group. However, it is worth noting that the statistical significance of the differences was observed only on the EURO28G set, with a very high deviation of the established metric values for the other datasets. It can be assumed that this is related to the narrow distribution of the objective function values, which was an observation of E1.
At the same time, no statistically significant difference can be observed between the regression efficiency of A reg and R reg . However, the average values on the DS A and DS R sets are higher for R reg on almost all sets except US26G. It is also worth noting that including the optimized solutions in the regressor's training set improves the results on DS O , which is an expected observation. Interestingly, the R reg results on the DS O set for the US26 problems reach, after flattening, a value of 0.00, while for EURO28, R reg manages to achieve acceptable results. Presumably, this is related to the scale of the intersection of both sets.
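For reference, the explained variance score and the paired Student's t statistic named in the experimental setup can be written down in a few lines. The sketch below uses only the Python standard library, and the per-fold scores are hypothetical numbers, not results from Table 2. To our knowledge, scikit-learn's `explained_variance_score` (with default settings) and SciPy's `ttest_rel` compute the same quantities.

```python
from math import sqrt
from statistics import mean, pvariance, stdev


def explained_variance(y_true, y_pred):
    """1 - Var(y - y_hat) / Var(y), using population variances."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / pvariance(y_true)


def paired_t_statistic(scores_a, scores_b):
    """t statistic of the paired Student's t-test over per-fold scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


y = [1, 2, 3, 4]
perfect = explained_variance(y, [1, 2, 3, 4])       # exact predictions
mean_only = explained_variance(y, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
shifted = explained_variance(y, [2, 3, 4, 5])       # constant offset of +1

# Hypothetical per-fold scores of two models in 5-fold cross-validation.
fold_scores_a = [0.90, 0.80, 0.85, 0.95, 0.90]
fold_scores_b = [0.70, 0.75, 0.80, 0.85, 0.80]
t_stat = paired_t_statistic(fold_scores_a, fold_scores_b)
```

Two properties are worth keeping in mind when reading the scores: a model that always predicts the mean of the target obtains a score of 0, and a constant offset in the predictions is not penalized, since only the variance of the residuals enters the formula. With five folds, the paired test has four degrees of freedom, so a two-sided test at the 0.05 level requires |t| > 2.776.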
The validity of the use of hybrid methods was confirmed for the problems from the EURO28 group, for which a statistically significantly better result than that of the basic models can be observed. Furthermore, the hybrid approach, which uses a classifier identifying the solution class, allows obtaining a model with a much better global predictive ability than solutions built on the entire available pool of solutions (DS A ) or any of its categories (DS R and DS O ). Consistent with this observation, single regression models perform well with consistent distributions (disjoint DS R and DS O characteristics) but cannot achieve a globally high predictive capacity in such an environment.
The final element of the analysis of the proposed methods is the presentation of the quality of the hybrid regression model in the link dimensioning-aiding task, included in Table 3. It shows, for each dataset, the best value of the criterion function (AT metric) in each of the training subsets (DS R , DS H , and DS I ) and the best values of the twenty solutions identified by each of the recognition models selected according to the procedure set out in Section IV. For comparison, the AT value obtained for the base approach, which is the current solution used in SS-FON networks, is also marked.
Based on a hybrid approach, we produce a solution with generally better decomposition characteristics. Under the assumption of normality, the expected value of the random variable describing the solutions increases while its standard deviation decreases. Increasing the expected value with a reduction of the standard deviation means that we have a better chance of achieving a better solution and, at the same time, a lower chance of achieving the best solution, so we achieve greater stability of results.
This observation raises the problem of the discrepancy between the general research objectives of optimization and pattern recognition, as indicated earlier. In optimization, we want to find the single best result, which is the optimum of the objective function (which may sometimes be burdened with the nondeterministic nature of the simulation). In supervised learning, on the other hand, we want to build a model with the best average solution parameters. Hence, the perfect optimization result often turns out to be a harmful outlier for the recognition model. At the same time, an excellent supervised learning outcome sometimes leads to average results from an optimization perspective. This does not mean, of course, that supervised learning methods cannot improve the results achieved by standard optimization methods dedicated to specific optical network issues. For the EURO28 datasets, each of the proposed methods allowed obtaining a higher AT than the base solution and each of the solutions proposed so far in the literature. For the US26 set, it was possible to achieve results comparable to the solutions proposed so far, which, according to the authors' assumptions, results mainly from the substantial inconsistency of the DS R and DS O criterion function distributions.
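The trade-off described above can be checked numerically under the normality assumption. The sketch below compares two normal distributions with purely illustrative parameters (they are not taken from the experiments): one with a lower mean and a higher spread, and one with a higher mean and a lower spread.

```python
from statistics import NormalDist


def exceed_probability(mean, std, threshold):
    """P(X > threshold) for X ~ Normal(mean, std)."""
    return 1.0 - NormalDist(mean, std).cdf(threshold)


# Illustrative parameters only (mean, standard deviation).
wide = (100.0, 20.0)    # lower expected value, higher spread
narrow = (110.0, 5.0)   # higher expected value, lower spread

moderate, extreme = 105.0, 130.0

p_wide_moderate = exceed_probability(*wide, moderate)
p_narrow_moderate = exceed_probability(*narrow, moderate)
p_wide_extreme = exceed_probability(*wide, extreme)
p_narrow_extreme = exceed_probability(*narrow, extreme)
```

For the moderate threshold, the narrow distribution exceeds it far more often (roughly 0.84 against 0.40), whereas for the extreme threshold the wide distribution retains a chance of about 0.07 while the narrow one drops well below 0.001. This mirrors the statement that a higher expected value with a lower standard deviation makes a good solution more likely and an exceptional one less likely.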

VII. CONCLUSIONS
This paper proposes a hybrid regression method dedicated to supporting the configuration of SS-FONs. This method allows the prediction of the AT value based on the network configuration provided to the model, which is later used in the review of all possible network configurations in order to select a quasi-optimal solution. The conducted experimental evaluation showed its usefulness in the EURO28 topology and its competitiveness against state-of-the-art solutions in the US26 topology. Future work will introduce further problem decomposition by distinguishing between heuristic (DS H ) and ILP (DS I ) models in place of the general DS O model. This raises the interesting problem of a strongly reduced size of the training set, but could potentially improve the quality of the model for the US26 topology. The predictive ability of the hybrid regression model seems to depend strongly on the appropriate definition of the solution category, which seems to be the most likely cause of the significant differences between the effectiveness of the support obtained for EURO28 and US26.
Another potentially interesting approach would be to also use unsupervised learning methods to determine the solution categories and to apply the proposed hybrid combination rule based on these categories.