A Survey on Machine Learning Techniques for Routing Optimization in SDN

In conventional networks, there was a tight bond between the control plane and the data plane. The introduction of Software-Defined Networking (SDN) separated these planes and provided additional features and tools to solve some of the problems of traditional networks (e.g., latency, consistency, efficiency). SDN is a flexible networking paradigm that boosts network control, programmability and automation. It offers benefits in many areas, including routing. More specifically, efficiently organizing, managing and optimizing routing in networks requires some intelligence, and SDN makes it easy to integrate. To this end, many researchers have implemented different machine learning (ML) techniques to enhance SDN routing applications. This article surveys the use of ML techniques for routing optimization in SDN based on three core categories (i.e., supervised learning, unsupervised learning, and reinforcement learning). The main contributions of this survey are threefold. Firstly, it presents detailed summary tables of these studies and discusses their comparison, including a summary of the best works according to our analysis. Secondly, it summarizes the main findings, best works and missing aspects, and it includes a quick guideline to choose the best ML technique in this field (based on available resources and objectives). Finally, it provides specific future research directions divided into six sections to conclude the survey. Our conclusion is that there is a strong trend towards intelligence-based routing in programmable networks, particularly during the last three years, but a lot of effort is still required to achieve comprehensive comparisons and synergies of approaches, meaningful evaluations based on open datasets and topologies, and detailed practical implementations (following recent standards) that could be adopted by industry.
In summary, future efforts should be focused on reproducible research rather than on new isolated ideas. Otherwise, most of these applications will be barely implemented in practice.


I. INTRODUCTION
Until a few years ago, most company networks followed a traditional approach. In particular, legacy networking devices obeyed an architecture based on a tight bond between control and data planes [1], which translated into vendor lock-in: networks became complex and difficult to maintain and manage, particularly as they rapidly grew. When software is tightly bundled with hardware, interfaces are vendor-specific. Manufacturers write the code, leading to long delays in introducing the latest features and functions, i.e., networks are quite static and not flexible enough, which obstructs new business projects and applications. Software-Defined Networking (SDN) overcomes these issues by moving the control logic from the devices to a central place (the SDN controller), in which networking decisions and overall functionality are developed based on common programming languages. The exchange of control information between the controller and the devices is usually implemented with the OpenFlow protocol [2]. Fig. 1 illustrates the architecture of SDN, in which the data plane (forwarding functions) and control plane (network control) are decoupled. This opens a new wide range of possibilities. The SDN paradigm can be leveraged for multiple functions, such as traffic engineering, network virtualization, and load balancing, according to the network administrator's needs [3]. It is helpful for new business projects and provides flexibility and virtualization. In particular, SDN has rapidly grown together with the Network Functions Virtualisation (NFV) [4] concept. They combined forces to boost emergent networking applications, including 5G, in which SDN serves as a network resource manager and reinforces network orchestration. Nevertheless, traditional routing algorithms, like OSPF, are not well suited to SDN because their convergence and response are slow and they follow a distributed approach.
The associate editor coordinating the review of this manuscript and approving it for publication was Cong Pu.
On the other hand, the concept of Artificial Intelligence (AI) was introduced by John McCarthy in 1956 [5]. In the field of computer science, AI is also known as Machine Intelligence. Machine Learning (ML) is a branch of AI, inspired by natural intelligence, that learns from data, makes decisions, identifies patterns and performs different actions with little human intervention. Devices based on ML perceive the real environment and apply actions according to their needs or requirements to maximize their chances of achieving their goal. ML can potentially be used to solve many problems in networking, including design, implementation, performance and verification.
Nowadays, the use of ML techniques is increasing. These techniques are considered to outperform traditional algorithms, particularly for the processing and analysis of large volumes of data. In the area of networking, researchers are paying increasing attention to these techniques. For example, the Knowledge Plane concept was coined in 2003 by Clark et al. [6] and introduced an early view of ML techniques in networking. Different ML techniques are employed in SDN to achieve synergistic effects and to overcome individual limitations.
Additionally, in the specific field of SDN, ML has been leveraged in different applications, including traffic engineering [7], [8], resource management [9], [10], intrusion detection systems [11], [12] and other security purposes [13], [14]. For instance, Mijumbi et al. [15] leverage it to adjust virtual networks and manage resources in virtualized networks through the control plane, while Akyildiz et al. [16] review the state of the art of traffic engineering in SDN/OpenFlow networks.
As a consequence, the role of ML in SDN has recently grown due to its multiple applications. The architectural logic of SDN harmonizes better with ML algorithms than with traditional algorithms. In particular, many research results combine ML techniques with SDN for routing optimization. Furthermore, ML is seen as a key technology trend for 6G and beyond [17].

A. CONTRIBUTIONS OF THE SURVEY
In this paper, we survey different approaches of ML techniques for routing in SDN. We try to cover most of the ML techniques and classify them into three primary categories. The main objective is to provide a comprehensive overview of ML techniques in SDN for routing optimization, emphasizing contributions and lessons learned for future research.
The main contribution of this survey is that it strictly focuses on ML techniques applied for routing in SDN. While other surveys have a more generalist approach (focusing either on SDN or ML, different networking applications, and providing an overall idea), our survey aims to delve into specific routing applications and why ML has become such an important actor thanks to SDN (i.e., centralizing the logic and facilitating the integration of ML, otherwise unfeasible in traditional routing approaches, mostly distributed).
In summary, this survey encompasses the following contributions:
• It provides an in-depth overview of SDN, routing, and ML techniques, performed by a group of researchers with different backgrounds and areas of expertise, which enriches the analysis.
• It presents a qualitative analysis of ML techniques as a guideline to help new researchers in the field decide where to start, based on the context of the scenario to be analyzed and the desired applications.
• It classifies the most recent works related to the survey according to three main categories of ML. Most works were published during the last three years.
• It analyzes and compares all works, including the techniques leveraged, their specific objective (considering all of them are focused on routing), their implementation and evaluation, pros and cons. This analysis is concluded by a summary of learned lessons and research trends.
• It provides a comprehensive section including future research directions, which, from our point of view, represents the most interesting part of the survey, as much work still needs to be done in the field to be relevant in a long-term manner.

B. METHODOLOGY OF THE SURVEY
The search of the state of the art was mainly performed using the Google Scholar site, which comprehensively indexes works (articles, patents, etc.) from different journals and sites, and even from archive repositories. During our search, the main keywords used were: routing, SDN and ML (the latter two using both the acronym and the full name), which are the three core terms of the survey, but we also looked for AI, optimization, traffic engineering, load balancing, NFV, learning, supervised, unsupervised and reinforcement (which are directly related to the classification of ML techniques, explained in the following sections), among others. Additionally, we also used survey, overview and tutorial to examine the closest related works, and to evaluate the contributions of our survey. The search yielded thousands of results, most of them published within the last five years, from which we filtered the ones directly related to our analysis. The growth of publications was particularly significant within the last two years, with an exponential increase for reinforcement learning-based approaches. For this reason, we applied filters based on number of citations to analyze the most cited ones first, and we focused on articles written in English (which was the most common language) and published in prestigious journals (preferably indexed in JCR).
VOLUME 9, 2021
Finally, we also scrutinized the references of articles already selected for the survey to look for additional relevant works.

C. STRUCTURE OF THE SURVEY
The roadmap of this manuscript is depicted in Fig. 2. The article starts with an extensive analysis of the related work in Section II and core definitions of SDN in Section III. Afterwards, a general description of ML techniques, together with a qualitative comparison, is presented in Section IV, which is divided into three categories, i.e., Supervised Learning (SL), Unsupervised Learning (UL), and Reinforcement Learning (RL) (which includes Deep Reinforcement Learning (DRL)). Section V is devoted to the application of these ML techniques together with SDN for routing optimization. This section concludes with a quick overview that presents lessons learned, current trends and the best published works so far, according to our analysis. Section VI discusses specific future research directions and open issues of routing optimization in SDN, followed by the overall conclusions in Section VII. Finally, Table 1 alphabetically lists the acronyms used throughout the paper.

II. RELATED WORK
To provide context for the contributions of this survey, the first step is to review some surveys related to the methods and techniques of ML applied to routing in SDN, which are summarized in Table 2. This summary presents the authors, the focus of the survey, as well as the coverage of the three areas that characterize our survey: SDN, routing and ML.
In particular, an empty cell means that the area is not covered, while one or two ticks indicate that the topic is partially or fully covered, respectively. Additionally, pros (highlights) and cons (missing aspects in relation to the contributions of our survey) are also included as two separate columns. It is important to note that the selection of works was based on relevance to our survey (at least two of the three areas covered in our survey should be included) and/or number of citations. Otherwise, if not filtered, there are hundreds of surveys somehow related to ours (either because of SDN, routing or ML), like surveys about SDN controller placement [18] or ML applied to network security [19].
The first two surveys in the list are strictly focused on the SDN paradigm. Although they only cover one of the three areas of this survey, they are worth mentioning due to their high number of citations (>1000). Nunes et al. [20] present the state of the art in programmable networks, with a particular focus on SDN. These networks are depicted from the oldest to the newest development ideas, followed by the architecture of SDN and the OpenFlow standard. Diverse alternatives are also discussed for the implementation and testing of SDN-based services and protocols. Finally, they provide information about current and future SDN-based application trends, as well as multiple research directions of SDN. Hu et al. [21] survey the implementation of SDN/OpenFlow, including basic concepts, language abstraction, applications, virtualization, controller, security, Quality of Service (QoS), as well as integration with optical and wireless networks. They also compare the merits and demerits of different network implementation schemes. This survey is particularly helpful to understand the progress of SDN/OpenFlow designs.
Next, we would like to highlight two surveys that still mainly focus on SDN, but include some sections that analyze the specificities of routing in this field. Kreutz et al. [22] is one of the most referenced surveys in the SDN field. It discusses the definition of SDN, its core concepts and differences compared to traditional networks. The architecture of SDN is presented in a bottom-up approach. The authors performed a comprehensive analysis of its architecture, APIs, network programming and network layers. They also focused on the major problems of cross layering and their solutions. With security, performance, scalability and resilience in mind, the design of controllers and switches is addressed in this study as well. Mendiola et al. [23] extensively survey approaches for traffic engineering in SDN, indirectly mentioning their application to routing in SDN.
Additionally, with a bigger emphasis on routing and a smaller one on SDN, Karakus et al. [24] provide a comprehensive survey and summary of the taxonomy and characterization of SDN control plane scalability. Two main areas are discussed: network topologies and mechanisms to tackle scalability. In the former, they describe the relationship of the topology with scalability, considering the impact of a centralized/distributed controller and, transversally, hybrid and hierarchical designs. In the latter, they study mechanisms to optimize controller scalability, such as control plane routing and parallelism-based optimization. The survey concludes by summarizing challenges and open problems for scalable SDN control planes. On the other hand, focusing just on ML and routing, without emphasis on SDN, Chen et al. [25] provide a very good overview of the application of Artificial Neural Networks (ANNs) to wireless network applications.
The first surveys to address the three areas examined in this survey (SDN, routing and ML) are more recent (from the last three years). Binsahaq et al. [26] focus on autonomic provisioning and management of QoS in SDN. As part of that analysis, the survey encompasses some works related to ML and routing, and the authors devote a specific section to ML for QoS management. Etengu et al. [27] extensively analyze AI-assisted networks for green routing and load balancing, focused on a pragmatic approach, that is, hybrid SDN, usually leveraged for smooth migration from legacy systems. At the end of the survey, the authors provide a set of challenges and future research directions, and they define a specific framework to tackle them. Qian et al. [28] concisely survey a set of applications in communication networks where reinforcement learning is applied, including network caching or task offloading. It covers the relationship with SDN and routing applications only very briefly. Mammeri et al. [29] comprehensively analyze reinforcement learning approaches for routing, not only for SDN-based networks, but for all types of networks, which provides a very good overview of the evolution of this specific ML technique and its application in communication networks. Jamshidi et al. [30] explain applications based on ML methods and techniques by dividing them into six categories of networking, namely: traffic prediction, network security, cloud services, application identification, domain name system, and QoS. In all these categories, they identify the ML methods and input datasets, and they summarize the various challenges and major findings regarding these input data and ML methods. Zhang et al. [31] present diverse applications of ML in routing and resource allocation in optical networks, without any specific focus on SDN-enabled networks.
Four works are close to the objectives of our survey. Boutaba et al. [32] survey ML research opportunities and evolution in the field of networking. They provide a brief introduction to ML techniques, engineering techniques, approaches and methods for data gathering in network traffic, followed by an overview of ML techniques in routing, traffic classification, QoS/QoE, anomaly detection, fault management, and intrusion detection. Additionally, they focus on the importance of secure learning support, online learning and the architectural design of systems so that ML can be used easily. Their survey covers more than 500 studies. Xie et al. [33] present a comprehensive description of ML techniques and of the architecture and operation of SDN. Different types of ML algorithms are explained and described in SDN in terms of optimization, QoE/QoS, security, resource management, and traffic classification. Future research and challenges are also discussed. Zhao et al. [34] survey the diverse networking applications that benefit from the combination of SDN and ML, including a section about routing optimization, though not in depth. Quach et al. [35] is the closest to our work so far, but it focuses only on approaches based on reinforcement learning. In any case, it is a concise survey about that type of routing in SDN and provides a quick overview of objectives and associated algorithms.
Finally, Farhady et al. [36], Scott-Hayward et al. [37], Al-Heety et al. [38], Hatagundi et al. [39], and Chica et al. [40] reviewed different SDN-related technologies, the details of SDN planes, benefits, challenges, security, and attacks in SDN, but their scope is further from the analysis of this survey, as they do not discuss the applications or use of ML in SDN.
Currently, to the best of our knowledge, no one has specifically surveyed ML techniques for routing optimization in SDN. To fill this gap, in this paper, we provide a detailed study of ML types and their usage in SDN routing. We envision that our discussion and exploration will provide readers with an overall understanding of ML techniques for routing in SDN and foster more subsequent studies on this issue.

III. SOFTWARE-DEFINED NETWORKING (SDN)
Over the last decade, a new wave of innovation has emerged in the networking field thanks to the SDN paradigm [22]. In its origins, it consisted mainly of a protocol, OpenFlow [41], which separated the data and control planes, allowing the flourishing of new network protocols and designs. However, it rapidly evolved into a new architectural approach in which the so-called dummy switches (data plane) were managed by a logically centralized entity, the SDN controller (control plane), through the OpenFlow protocol. Although the concept of decoupling these two planes was not new in the field, SDN unlocked the hardware market, very opaque until that moment, bringing the opportunity for new manufacturers and researchers to cooperate, even in hybrid environments [42]. Currently, the Open Networking Foundation (ONF) is in charge of the main standardization efforts in the field of SDN.
By definition, SDN hides the complexity of the network design. Its architecture (previously depicted in Fig. 1) provides dynamic, cost-effective, manageable and adaptable network control. An alternative definition of the SDN architecture is illustrated in Fig. 3, in which SDN consists of four planes [43].
At the bottom of the architecture, the Data Plane is also known as the forwarding plane, user plane or carrier plane [44]. It consists of the set of network devices (virtual or physical) that transmit the user traffic. The Data Plane handles arriving frames according to the logic of the Control Plane. Some of the actions to be applied include forwarding the frame, modifying it or discarding it.
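The frame handling described above can be illustrated with a minimal match-action sketch. It is not the OpenFlow API or any controller's actual implementation: the names `Frame`, `FlowRule` and `DataPlaneSwitch` are hypothetical, the header fields are simplified, and sending unmatched frames to the controller is an assumption about how a table miss is configured.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Frame:
    # Simplified header fields (hypothetical; real frames carry many more).
    src: str
    dst: str
    vlan: Optional[int] = None


@dataclass
class FlowRule:
    # A rule installed by the Control Plane: match fields plus one action.
    priority: int
    match: Dict[str, object]            # header field -> required value
    action: str                         # "forward", "modify" or "drop"
    out_port: Optional[int] = None
    set_fields: Dict[str, object] = field(default_factory=dict)


class DataPlaneSwitch:
    """Toy switch that applies the highest-priority matching rule."""

    def __init__(self) -> None:
        self.rules: List[FlowRule] = []

    def install(self, rule: FlowRule) -> None:
        # Called on behalf of the controller; keep rules sorted by priority.
        self.rules.append(rule)
        self.rules.sort(key=lambda r: r.priority, reverse=True)

    def handle(self, frame: Frame) -> Tuple[str, Optional[int]]:
        for rule in self.rules:
            if all(getattr(frame, k) == v for k, v in rule.match.items()):
                if rule.action == "drop":
                    return ("drop", None)
                if rule.action == "modify":
                    # Rewrite header fields, then forward the frame.
                    for k, v in rule.set_fields.items():
                        setattr(frame, k, v)
                return ("forward", rule.out_port)
        # Assumed table-miss behavior: ask the Control Plane what to do.
        return ("to_controller", None)


switch = DataPlaneSwitch()
switch.install(FlowRule(priority=10, match={"dst": "h2"}, action="forward", out_port=2))
switch.install(FlowRule(priority=20, match={"src": "h9"}, action="drop"))
switch.install(FlowRule(priority=5, match={"dst": "h3"}, action="modify",
                        out_port=3, set_fields={"vlan": 100}))
print(switch.handle(Frame(src="h1", dst="h2")))
```

In this sketch the switch holds no routing logic of its own; all decisions are encoded in the rules that the controller installs.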
The Control Plane is the network brain, responsible for decisions such as routing or traffic signaling [44]. Though originally designed to be completely separated from the Data Plane, some part of the Control Plane might be delegated to network devices under some circumstances, following a hybrid approach [42]. The communication between these two planes is performed through the Southbound Interface (SBI), which originally followed the OpenFlow protocol, although other alternatives such as P4Runtime [45] are currently used as well.
Above it, the Application Plane is connected through the Northbound Interface (NBI), usually asynchronously (e.g., REST API), to define the overall behavior of the network desired by the network administrator. Some authors merge the Application and Control planes, while others do not. The criterion to separate them is that the Control Plane usually consists of core networking functions, common to all types of applications (for instance, topology discovery, shortest-path computation, etc.), while the Application Plane consists of individual applications that leverage the Control Plane to be executed. The so-called SDN controllers are software platforms that include both Control and Application planes.
Finally, the role of the Management Plane, transversal to the three previous planes, is to provide a means to manage the network for additional aspects such as configuration, monitoring, billing, etc. Some common protocols and formats include classic ones like: HTTP (Hypertext Transfer Protocol), SNMP (Simple Network Management Protocol), XML (Extensible Markup Language), RMON (Remote Network Monitoring), and SSH (Secure Shell). This plane is clearly the most heterogeneous of the architecture and encompasses diverse challenges [46]. In some specifications, particularly the latest ones, the Management Plane is seen as part of the Control Plane, as a management-control continuum.
In summary, the main benefit of the SDN paradigm is that it brings new possibilities for logically centralized network control. For instance, it allows users to access virtual and physical elements from a single location, because of its virtualized control and forwarding planes. SDN also allows administrators to monitor everything centrally, which enhances global view management compared to traditional networks. Some major technology companies (e.g., Google [47], VMware [48], Microsoft [49], or Facebook [50]) have already adopted the SDN architecture for their data centers. At the same time, some popular network vendors and related companies (namely Cisco [51], Huawei [52], NEC [53], Verizon [54], HP [55], and AT&T [56]) are also firmly committed to the application of the SDN architecture by designing and producing SDN-related components. As a consequence, the use of centralized techniques like ML is increasing in SDN, reinforced by its architecture, including applications such as resource management, QoS prediction, traffic engineering, security and routing optimization.

A. ROUTING APPLICATIONS AND CHALLENGES IN SDN
Optimized routing could be considered one of the core objectives in computer networks. In particular, this objective is directly related to network traffic engineering, as this field is founded on one particular idea: to ensure that traffic is routed according to the exact traffic demands [23]. Therefore, we could claim that traffic engineering is one of multiple types of routing optimization, as routing could also be optimized based on other parameters (and not only on traffic demands). Additionally, these traffic demands vary depending on whether we consider data or control traffic. In this regard, the logically centralized view of the SDN controller facilitates many aspects in comparison to traditional routing. For instance, topology graphs can be easily extracted from the network and shortest-path algorithms, like Dijkstra, can be efficiently (and dynamically) computed to obtain the best paths. This has led to the direct application of computer science algorithms to computer networks [57], without the need to translate them into distributed protocols, like the generation of disjoint paths for traffic engineering purposes, which is now easier than ever [58]. Consequently, thanks to SDN, routing can be easily parameterized based on types of optimal routing (shortest path, constrained shortest path, etc.), cost functions or resources, for example. This facilitates an easy adaptation and deployment based on the specific scenario [57], as there is no clear winning type of routing applicable to all networks.
It is also important to highlight that the data and control plane decoupling of SDN implies the incorporation of a new communication channel in the southbound of the architecture, typically implemented with OpenFlow. This channel can be implemented either in an out-of-band or in an in-band mode. In the former, the communication between both planes is direct (though it requires more resources for deployment), while in the latter it is not. That is, in-band SDN also requires the application of traffic engineering for optimized routing.
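As a minimal sketch of the idea above, the following Python code runs Dijkstra's algorithm on a topology graph such as the one a controller could extract from the network, with the link cost passed as a parameter. The topology, the link attributes (`delay`, `load`) and the helper names are assumptions made for illustration; they do not come from any specific controller, and link costs are assumed to be non-negative.

```python
import heapq


def add_link(topology, a, b, **attrs):
    # Store a bidirectional link with arbitrary attributes (illustrative helper).
    topology.setdefault(a, {})[b] = attrs
    topology.setdefault(b, {})[a] = attrs


def shortest_path(topology, src, dst, cost=lambda link: link["delay"]):
    """Dijkstra over {node: {neighbor: link_attrs}} with a pluggable cost function.

    Returns (path, total_cost), or (None, inf) if dst is unreachable.
    """
    dist = {src: 0.0}
    prev = {}
    visited = set()
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == dst:
            break
        for nbr, link in topology.get(node, {}).items():
            nd = d + cost(link)
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        return None, float("inf")
    # Rebuild the path by walking predecessors back from the destination.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


topology = {}
add_link(topology, "s1", "s2", delay=1.0, load=0.9)
add_link(topology, "s2", "s4", delay=1.0, load=0.8)
add_link(topology, "s1", "s3", delay=4.0, load=0.1)
add_link(topology, "s3", "s4", delay=1.0, load=0.1)

print(shortest_path(topology, "s1", "s4"))                                  # minimum delay
print(shortest_path(topology, "s1", "s4", cost=lambda link: link["load"]))  # minimum load
```

With the same graph, changing only the cost function moves the selected route from the path through s2 (lowest delay) to the path through s3 (lowest load), which is the kind of parameterization by cost function that the centralized view makes straightforward.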
Another example is the opportunity to implement newer functionality, particularly the one related with cloud computing, like ML. In this regard, SDN simplifies the development of ML techniques to support network routing thanks to its centralized monitoring capabilities.
Nevertheless, although SDN is an ideal answer for Information and Communication Technology (ICT) deployments, cloud providers and enterprises, it faces a few challenges [59] that affect its performance and usage. The set of SDN challenges comprises:
• Controller location: SDN implies an additional communication channel between the data and control plane, which might not be completely transparent, particularly in large networks, in which out-of-band communication might be unfeasible. Therefore, the specific location of the controller should be carefully planned.
• Scalability: Directly related to the previous aspect, as SDN is logically centralized, network managers should consider to what extent control should be delegated to the controller, to avoid bottlenecks and scalability issues. However, this decision is not trivial for all use cases.
• Performance optimization: Performance optimization is a challenge in all network types per se, but in SDN the way to achieve it changes from a distributed approach to a centralized one.
• Security: As SDN is logically centralized, it might be easily threatened.
• Interoperability: Particularly relevant in large networks, heterogeneity and interoperability among different types of SDN technologies is still a challenge.
• Reliability: Similarly to traditional networks, reliability is also a challenge. However, in SDN it is even worse, as the control channel communication represents a new potential failure point that should be reliable and, hence, protected. One of the consequences is that SDN controllers must be carefully configured to prevent manual errors. For example, in a conventional network, when one or several network devices fail, management information errors might be kept local and not affect the overall behavior of the network, whereas in SDN a single controller is responsible for the whole network and, if there is any error in it, the entire network might fail. To address this issue, research should be focused on the coordination of distributed SDN controllers with security guarantees.
Currently, from all existing SDN controllers [60], we would like to highlight two of them: Ryu [61], because of its simplicity and easy prototyping, and ONOS [62], as it is supported by the ONF and implements the driving SDN use cases devised by industry.
In summary, the centralized architecture of SDN provides a faster overview of the network status and substantially smoother programmability and updates, but it still requires a control overhead that needs to be carefully managed and that is established now in a north-south (hierarchical) style rather than east-west (flat) manner, typical of distributed legacy systems.

B. ML IN SDN ENVIRONMENTS
Although ML (as well as AI, generally speaking) has been applied in networking for two decades now, its adoption in practical deployments is still in its early stages [63]. Thanks to the softwarization of networks, the application of AI and ML in networking is nowadays potentially easier to implement, thus opening a wide range of new functionalities. In fact, some authors have recently adopted the term Knowledge-Defined Networking (KDN) [64], which includes the so-called Knowledge Plane [6], directly related to the inclusion and integration of Artificial Intelligence in SDN environments.
In particular, data-driven networks [65] are one type of computer network, fostered by both SDN and NFV, which could easily adapt to traffic demands (once again for traffic engineering purposes) or network changes, for example. Although some authors agree that there is still work to be done (in particular regarding models and architectural aspects [65]), it seems we have now reached the right moment to even accomplish the concept of self-driven networks [66]. For example, a self-driven network benchmarking framework was recently proposed by Zerwas et al. [67], who show how it can be applied to a well-known SDN software switch, viz. Open vSwitch (OVS).
Finally, we would like to put some additional emphasis on the case of future 6G networks, as many authors already agree that ML is a key enabler [68], [69].
Some applications included in their roadmap are, for instance, object localization, Unmanned Aerial Vehicle (UAV) communication, surveillance, security and privacy preservation [69]. All of them are envisioned as part of fog/edge computing architectures [70].
However, although the SDN architecture allows a very straightforward application of intelligent algorithms, there is still a need to analyze which suits best each type of network and data, as the requirements greatly vary among different network scenarios. Furthermore, open networking datasets are still a scarce resource for the research community, and these are key components to design ML-based frameworks.

IV. MACHINE LEARNING TECHNIQUES
The term ML was first introduced by Arthur Samuel in 1959. ML is the branch of AI that enables systems to learn automatically from experience and to improve themselves without being explicitly programmed [71]. It guides systems towards making good predictions based on data. ML systems can make decisions and identify different patterns. ML models process new data independently and produce decisions, computations and results by learning from previous computations. It provides solutions to many problems, such as pattern recognition [72], character recognition [73], speech recognition [74], vision, or robotics.
ML is a vast field whose methods have been classified according to multiple criteria. A general classification groups ML techniques according to the kind of learning involved, distinguishing supervised, unsupervised and reinforcement learning (with a particular focus on deep reinforcement learning), as depicted in Fig. 4. On the other hand, the emergence of ANNs, particularly the Deep Neural Network (DNN) (also Deep Learning in the literature), meant a substantial improvement of the error rates for the different ML tasks, to the point of dividing the methods into classical and neural-network-based methods, or even more specifically DNN-based methods. The present survey follows both classifications in parallel. This is because the classifications are non-exclusive and, consequently, methods of one category can be used with other types of learning. However, we have grouped the methods in the mentioned learning categories considering the most frequent learning technique, paying special attention to the area of routing optimization in SDN. Alternative criteria for classifying ML methods exist, such as arranging the methods according to the kind of training algorithm used (distinguishing between closed-form and iterative algorithms), or categorizing them according to the final task into classification or regression methods.
There exists an additional orthogonal learning paradigm called federated learning, which consists of a set of distributed learners that can be individually trained following one of the other mentioned learning paradigms and that coordinately elaborate classifications or predictions. This paradigm resembles the ensemble methods (random forest, boosting and bagging), but distributed across devices, which means both data and learning are individually used to create learners, even in different network nodes, whose predictions are then combined. Unfortunately, the authors did not find works that use this kind of learning for routing optimization in SDN, hence it was excluded from the classification. However, this approach has recently been emerging in nearby fields, such as mobile and wireless networks [75], [76].

A. SUPERVISED LEARNING (SL)
SL is a learning paradigm based on discovering the unknown function f : X → Y that relates the input and output spaces, X and Y respectively, from input-output pairs (x_i, y_i) ∈ X × Y. This process is called training and requires a labelled dataset. Specifically, supervised training algorithms infer the map f from the provided training data D, typically by minimizing a loss function L which penalizes the committed error. Learning algorithms seek f in specific function spaces F, most of which are parametrized, and consequently, the learning task becomes an optimization problem, typically of the form f* = arg min_{f ∈ F} Σ_i L(f(x_i), y_i). Different parametric function spaces F with different learning algorithms correspond to the existing variety of supervised methods. The following methods are commonly considered supervised methods, although some of them can also be trained in an unsupervised way or using a reinforcement learning strategy and, consequently, belong to several categories:

1) ARTIFICIAL NEURAL NETWORK (ANN)
Artificial Neural Networks (ANNs) [77] consist of a set of connected units known as artificial neurons, which emulate the biological neural networks of animal brains. Due to their ability to model complex non-linear relations and their capacity to process massive amounts of data, they revolutionized the ML field. Effective ANN-based applications include adaptive control, laser applications, medical areas, process logging, and energy areas. The Perceptron and the Multilayer Perceptron (MLP) were the first ANN architectures. ANNs can also model relations described by dynamic systems, as is the case of the Recurrent Neural Network (RNN) [78].
The Deep Neural Network (DNN) [79] is a subcategory of the previous one, which groups together a large number of recent network architectures that have in common a high number of interconnected layers. Deep Learning starts with the Convolutional Neural Network (CNN), a DNN with a sequence of convolutional layers configured in cascade. CNNs are capable of extracting intrinsic local features, the so-called deep features, and have proven to surpass the results of their predecessors in both classification and regression tasks. Nowadays, research efforts are focused on the improvement of DNNs, as the number of publications in this field proves. Autoencoders [80], Residual Networks (RESNET) [81] or VGG [82] are examples of architectures included in this category. DNNs also include networks for temporal sequences, such as the improved RNN [78], which evolved into the Long Short-Term Memory (LSTM) [83] and the Gated Recurrent Unit (GRU) [84]; and the Random Neural Networks (RndNN) [85], which represent a set of cells connected in a network that transmits spiking signals. Some of these DNNs can also be trained using reinforcement learning algorithms.

2) MARKOV DECISION PROCESS
A Markov decision process [86] is a kind of discrete-time stochastic process. It obeys the Markov property, which establishes that the probability of moving to a specific state at the next time step depends exclusively on the current state. Its goal is to find a good action policy for a decision maker that operates in a noisy environment.
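To make the notion of finding a good action policy concrete, the following is a minimal sketch of value iteration, one standard dynamic programming procedure for Markov decision processes. The data layout (`transitions[state][action]` as a list of `(probability, next_state, reward)` tuples), the function name and the default parameters are assumptions chosen for illustration; they are not taken from any of the surveyed works.

```python
def value_iteration(transitions, gamma=0.9, iterations=100):
    """Approximate the optimal state values and a greedy policy.

    transitions[s][a] is a list of (probability, next_state, reward)
    tuples (hypothetical layout chosen for this sketch).
    """
    values = {s: 0.0 for s in transitions}

    def action_value(s, a):
        # Expected immediate reward plus discounted value of the successor.
        return sum(p * (r + gamma * values[s2]) for p, s2, r in transitions[s][a])

    for _ in range(iterations):
        # Synchronous update: new values are computed from the old ones.
        values = {s: max(action_value(s, a) for a in transitions[s])
                  for s in transitions}

    policy = {s: max(transitions[s], key=lambda a: action_value(s, a))
              for s in transitions}
    return values, policy


# Toy example: a single state with a rewarding and a non-rewarding action.
toy = {"s": {"stay": [(1.0, "s", 1.0)], "idle": [(1.0, "s", 0.0)]}}
toy_values, toy_policy = value_iteration(toy, gamma=0.5)
```

Note that the next-state distribution depends only on the current state and action, which is exactly the Markov property described above.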

3) LINEAR REGRESSION
Linear Regression [87] is one of the simplest and most effective ML methods. It assumes that a linear dependence exists between the dependent variable y and the explanatory (independent) variables. The simplest estimation algorithm retrieves the coefficients by minimizing the mean squared error. Regularized variants, such as the LASSO, Ridge or ElasticNet regressors, were later introduced to improve robustness.
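As an illustration, the single-variable case admits a closed-form least-squares solution. The sketch below fits y ≈ a·x + b and optionally adds a ridge penalty on the slope; the function name `fit_line` and the choice of leaving the intercept unpenalized are assumptions made for this example.

```python
def fit_line(xs, ys, ridge=0.0):
    """Fit y ~ a*x + b by least squares.

    With ridge > 0, the slope a is shrunk by an L2 penalty
    (the intercept b is left unpenalized in this sketch).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / (sxx + ridge)
    b = mean_y - a * mean_x
    return a, b


# Noise-free toy data generated from y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
ridge_slope, ridge_intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7], ridge=5.0)
```

With `ridge=0.0` the function reduces to ordinary least squares; a positive value trades some bias for a smaller slope.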

4) LOGISTIC REGRESSION
Logistic Regression [88] is used for classification problems. It is based on the idea of probability: instead of using a linear function directly, it passes a linear combination of the inputs through the sigmoid (logistic) function, an increasing function that confines the output to the range between 0 and 1. Both Linear and Logistic Regression are included in the so-called Generalized Linear Model (GLM), a broad model which unifies various other statistical models.
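The role of the sigmoid can be sketched in a few lines. The weights, bias and feature values below are hypothetical placeholders, not parameters of any surveyed model.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical two-feature example.
p = predict_proba([0.8, -0.4], 0.1, [1.0, 2.0])
```

A class label is then obtained by thresholding the probability, commonly at 0.5.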

5) RANDOM FOREST
Random Forests [89] are supervised learning methods which assemble the result of a large number of decision trees of multiple sizes to estimate a unique value in regression or to yield a class in classification.
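The aggregation step can be sketched as follows, assuming the individual trees have already produced their predictions; the function names are illustrative only.

```python
from collections import Counter


def forest_classify(tree_votes):
    """Majority vote over the class labels predicted by the individual trees."""
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_outputs):
    """Average of the values predicted by the individual trees."""
    return sum(tree_outputs) / len(tree_outputs)


label = forest_classify(["congested", "free", "congested"])
value = forest_regress([1.0, 2.0, 3.0])
```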

6) EVOLUTIONARY ALGORITHMS
Genetic Algorithms (GAs) are probabilistic search algorithms inspired by the genetic mechanism of Darwinian natural selection and biological evolution. GAs provide solutions to complex problems by means of reproduction processes and encoding techniques. In many domains, GAs have been used with considerable efficacy.

B. UNSUPERVISED LEARNING (UL)
UL seeks patterns in unlabelled datasets. Contrary to SL, human supervision disappears due to the lack of pre-labelled input-output pairs. Unsupervised methods infer by themselves relations among the variables according to features such as orthogonality, correlations, statistical separability, etc. Clustering or grouping methods, together with those based on principal component analysis, are the most common unsupervised methods, but not the only ones. More recently, unsupervised DNN-based methods have appeared, such as the Generative Adversarial Networks (GAN) [90].

1) K-MEANS
K-means [91] is an ML algorithm, specifically a vector quantization technique, that seeks to group a number of observations into k clusters. This method minimizes the within-cluster variance. Each observation is associated with the cluster whose centroid is nearest.
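A minimal sketch of the usual iterative procedure (Lloyd's algorithm) is given below. It alternates between assigning each observation to the nearest centroid and recomputing each centroid as the mean of its cluster. The initial centroids are passed in explicitly to keep the example deterministic; the function names and the fixed iteration count are assumptions for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda k: sq_dist(point, centroids[k]))


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[k]
            for k, g in enumerate(groups)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
final_centroids, final_labels = kmeans(pts, [(0, 0), (10, 10)])
```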

2) HIERARCHICAL CLUSTERING
Hierarchical Clustering [92] groups nearby observations in clusters and establishes links between them by optimizing a cluster dissimilarity measure. As a result, the method returns a partially ordered dendrogram which provides the data clusters with a hierarchy.

3) SELF-ORGANIZING MAPS (SOM)
Self-Organizing Maps (SOM) [93] are ANNs trained to retrieve a low-dimensional discrete representation of the input space, the so-called map, given the unlabeled training data. The method looks for the intrinsic topological properties of the input space.

4) GAUSSIAN MIXTURE MODELS (GMM)
Gaussian mixture models (GMM) [94] assume that observations are generated by a mixture of a finite number of Gaussian variables. A GMM is a probabilistic model which generalizes K-means by modelling the uncertainty of the cluster assignments through the introduction of the covariance into the problem.

C. REINFORCEMENT LEARNING (RL)
RL is another machine learning paradigm conceived to teach an agent to make local decisions and take actions in order to minimize a cumulative penalty or maximize a cumulative reward [95], [96], as illustrated in Fig. 5. Contrary to the SL and UL paradigms, the temporal variable is decisive, and the error metric is distributed over time. In particular, in comparison with the supervised approach, RL does not rely on labeled datasets. Instead, feedback is obtained from the environment in which the agent acts. Typically, the RL framework is formalized as a Markov decision process, where dynamic programming algorithms are used to maximize the reward. Recently, DNN-based frameworks were introduced and significantly improved this learning paradigm [97]-[99].

1) Q-LEARNING
Q-learning [100] is a model-free RL method that teaches the agent an action policy according to the state and the observations from the environment. As a model-free method, it does not use the transition probabilities. The method operates under a Markov decision process framework and finds an optimal policy by maximizing the expected cumulative reward computed over all successive steps, starting from the current state. Nowadays, it constitutes a baseline for the existing RL methods.
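The core of the method is a single update rule applied after every interaction with the environment. The following sketch shows the standard tabular form; the state and action names, the learning rate `alpha` and the discount factor `gamma` are hypothetical values chosen for illustration.

```python
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    The target uses the best action value in the next state,
    regardless of which action the agent will actually take.
    """
    best_next = max(Q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return Q[(state, action)]


Q = defaultdict(float)          # unseen (state, action) pairs start at 0
Q[("s1", "a")] = 1.0
updated = q_learning_update(Q, "s0", "a", 1.0, "s1", ["a", "b"])
```

No transition probabilities appear anywhere in the update, which is what makes the method model-free.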

2) DOUBLE Q-LEARNING
Double Q-learning [101] is an improvement of Q-learning which overcomes the problem of overestimation of the action values in noisy environments, which slows down learning.
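A common way to describe the idea is with two value tables: one table selects the best next action and the other evaluates it, so that a single noisy estimate is not used for both purposes. The sketch below follows that scheme; the parameter names, default values and the injectable `rng` object are assumptions made to keep the example small and testable.

```python
import random
from collections import defaultdict


def double_q_update(QA, QB, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.9, rng=random):
    """One tabular Double Q-learning step.

    A coin flip decides which table is updated. That table selects the
    greedy next action, while the other table evaluates it.
    """
    if rng.random() < 0.5:
        select, evaluate = QA, QB
    else:
        select, evaluate = QB, QA
    best = max(actions, key=lambda a: select[(next_state, a)])
    target = reward + gamma * evaluate[(next_state, best)]
    select[(state, action)] += alpha * (target - select[(state, action)])


class FixedCoin:
    """Deterministic stand-in for the random module (illustration only)."""
    def random(self):
        return 0.0


QA, QB = defaultdict(float), defaultdict(float)
QA[("s1", "a")], QA[("s1", "b")] = 5.0, 1.0
QB[("s1", "a")], QB[("s1", "b")] = 2.0, 10.0
double_q_update(QA, QB, "s0", "a", 0.0, "s1", ["a", "b"], rng=FixedCoin())
```

In the example, table A selects action "a", whose value is then read from table B (2.0) rather than taking the maximum of either table.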

3) STATE-ACTION-REWARD-STATE-ACTION (SARSA)
SARSA [102] is another RL method formulated over a Markov decision process. The acronym shows that the update of the Q-value depends on five elements, namely: the current state of the agent, the action the agent chooses, the reward the agent receives for choosing this action, the state that the agent enters after taking that action, and the next action the agent chooses in its new state.
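The five elements map directly onto the arguments of the update rule, as the following sketch shows. The learning rate `alpha`, the discount factor `gamma` and the toy states and actions are hypothetical.

```python
from collections import defaultdict


def sarsa_update(Q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """One tabular SARSA step using (s, a, r, s', a').

    Unlike Q-learning, the target uses the value of the action that the
    agent actually chooses in the next state.
    """
    target = reward + gamma * Q[(next_state, next_action)]
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]


Q_sarsa = defaultdict(float)
Q_sarsa[("s1", "b")] = 2.0
sarsa_value = sarsa_update(Q_sarsa, "s0", "a", 1.0, "s1", "b")
```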

4) DEEP REINFORCEMENT LEARNING (DRL)
DRL [103] is a subclass of RL that incorporates ANNs into the agents of the RL framework, which, in the context of this survey, enables SDN agents to learn the most efficient path to achieve their goal. Traditional RL methods cannot solve high-dimensional decision-making problems due to the high complexity of their state spaces. ANNs provide the agent with a better function approximation for making decisions, overcoming this disadvantage, so that the agent can now learn accurate policies π(a|s) in a supervised way. This makes it possible to tackle a wide range of important decision problems. Traditional DRL controllers [104] use fixed pre-processing steps, which are unable to adapt their processing in response to the learning signal. DRL is applied in many areas, such as robotics, healthcare, finance, smart grids and many more. The structure of DRL is shown in Fig. 6.
While DRL could be seen as part of RL and not as a differentiated type, we have specifically distinguished it from RL because, particularly during the last two years, there has been growing interest in its application in SDN environments and, for that reason, we believe it deserves its own analysis section. Due to its interest for the community, we highlight one particular DRL method, Deep Q-learning, an evolution of Q-learning with ANNs.

5) DEEP Q-LEARNING
Deep Q-learning [97] replaces the tabular representation of Q-learning with a DNN and thereby copes with very large state spaces and massive data. The traditional Q-table, which keeps track of the states, actions, and their expected rewards, is substituted by an ANN that predicts both the action and the Q-value from the state alone. Usually, these methods are based on RNNs, LSTMs and GRUs, due to their intrinsically temporal nature, besides CNNs [98], [105].

D. SELECTING THE BEST ML METHOD
After introducing the different techniques, classified into three core types, we would like to provide a quick, qualitative overview of which technique or method seems to be more suitable for routing in SDN. There is no straightforward answer to this question, and the best solution is strongly conditioned by several factors:
1) Dataset type: Scenarios where a labeled dataset is available allow the use of supervised ML methods, which are usually more accurate than their unsupervised counterparts. Learning from datasets makes it possible to infer input-output relations that can be considered for routing. However, it is very important to have observations that cover the whole variability of situations.
In this regard, we want to remark that, as we will examine in the following section, the majority of the works for routing in SDN use simulated datasets for training the ML algorithms. Only a few approaches directly work with real datasets, which capture the real input-output relation better than synthetic ones.
As access to this kind of information is more difficult and the field does not have standardized databases that allow testing the different proposals, unsupervised methods are frequently applied to find patterns in unlabeled datasets. On the other hand, RL is specific to dynamic optimization problems, such as the routing optimization problem in SDN. RL methods are able to learn from the environment and to adapt to changes in its conditions. The agent must be trained by maximizing a reward function obtained from the environment instead of using a labeled database.
2) Dataset size: The size and nature of the database strongly constrain the type of ML method we can use for estimating routing parameters. Large databases are suitable for ML techniques that involve a huge number of parameters, such as ANNs or DNNs. Large databases also help to avoid the overfitting problem and make it possible to infer new input-output relations that are difficult to find in small datasets with only a few observations. Nevertheless, the use of large databases requires long training times and expensive equipment, such as graphics cards, and the computation time for inferring the parameters tends to be higher than with small databases. Additionally, small datasets are more readily available and easier to manage for training any ML method than large ones. However, they may not make it possible to infer complex input-output patterns.
3) Problem type: Many routing optimization approaches in SDN divide the routing task into sub-problems that can be individually solved by ML methods, such as the "maximum throughput & minimum cost", "minimum congestion probability" or "bandwidth prediction" problems. From an ML point of view, we distinguish two different types of problems: classification and regression. In classification, we want to identify which category, from a finite set of different classes, an observation belongs to, while in a regression problem, we want to estimate real vectors that belong to continuous intervals. ML methods differ depending on the type of problem to solve.
Considering all these factors, large datasets are appropriate for ANN-based and DNN-based approaches, which can extract interesting parameters from data. The difficulty of finding large datasets can be mitigated by a first training stage with a synthetic database [106]-[108], followed by a final fine-tuning step with a small real dataset. ANN-based methods suffer from overfitting if they are trained with medium-size or small datasets. With medium-size datasets, we can try support vector machines and ensemble methods, including random forest. Specifically, random forest has proven to be faster than other ensemble methods since it is a tree-based ensemble. With small datasets, the best option is to use linear regressors, such as ridge, lasso or elastic-net regressors, which are simpler and faster than the previous methods and, in most cases, effective enough [109], [110]. With no labeled dataset, unsupervised clustering methods are required. The most sophisticated unsupervised methods are hierarchical clustering and self-organizing maps, which even work with large unlabeled datasets. The more traditional K-means method is also used with medium-size databases [111], [112]. Similarly to the supervised case, deep reinforcement learning, especially with LSTMs and RNNs, should be applied in those scenarios where multiple interactions with the environment are permitted [113]-[115], since neural networks need to be extensively trained. Otherwise, reinforcement learning methods based on Markov decision processes, such as Q-learning or SARSA, can be used [116], [117].
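The qualitative guideline above can be condensed into a small lookup function. This is only a sketch of the recommendations stated in this section: the function name, the argument names and the coarse size categories ("large", "medium", "small") are assumptions introduced here, and the guideline gives no explicit advice for small unlabeled datasets, so that case is mapped to K-means purely as a placeholder.

```python
def suggest_methods(labeled, size, interactive=False, extensive_training=False):
    """Return candidate ML methods following the qualitative guideline.

    labeled: whether a labeled dataset is available.
    size: "large", "medium" or "small" (coarse, hypothetical categories).
    interactive: whether the agent can interact with the environment (RL).
    extensive_training: whether many training interactions are affordable.
    """
    if size not in ("large", "medium", "small"):
        raise ValueError("size must be 'large', 'medium' or 'small'")
    if interactive:
        if extensive_training:
            return ["deep reinforcement learning"]
        return ["Q-learning", "SARSA"]
    if labeled:
        return {
            "large": ["ANN", "DNN"],
            "medium": ["support vector machine", "random forest"],
            "small": ["ridge", "lasso", "elastic-net"],
        }[size]
    if size == "large":
        return ["hierarchical clustering", "self-organizing maps"]
    # Assumption: the text only mentions K-means for medium-size data.
    return ["K-means"]
```

For instance, `suggest_methods(True, "small")` returns the three linear regressors, while `suggest_methods(False, "large")` returns the two clustering methods described as the most sophisticated ones.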

V. MACHINE LEARNING TECHNIQUES FOR ROUTING OPTIMIZATION IN SDN
As already presented, ML [118] can play a core role in optimizing routes in SDN, saving time and money and ensuring the fast delivery of data within the required time. While traditional routing techniques [119]-[121] struggle with complex network dynamics and face problems such as performance degradation and slow convergence, ML is particularly appropriate for the SDN architecture, as SDN makes it easy to centralize the information gathered in the network. Accordingly, ML together with SDN constitutes a promising approach to route optimization.
Although the overall procedure in ML is based on continuously retrieving data, training on it, learning from it, predicting new values and choosing the most efficient route, different ML strategies may be used depending on the specific scenario and system requirements. In this survey, we comprehensively examine the state of the art of ML techniques that are implementable and applicable in SDN. To this purpose, we classify the ML techniques for routing optimization in SDN following the taxonomy of Section IV in three categories: Supervised Learning (SL), Unsupervised Learning (UL), and Reinforcement Learning (RL). The latter contains an additional subsection dedicated to Deep Reinforcement Learning (DRL), and its table is also separated from the one of classical RL. The large number of DRL methods in routing optimization for SDN justifies presenting them separately from the reinforcement learning methods, which strictly include them according to the theoretical taxonomy.
Afterwards, the works analyzed are ordered according to the different techniques leveraged for the conceptual implementation. All of these ideas are summarized in Tables 3 and 4 for SL, 5 for UL, 6 for RL, and 7 and 8 for DRL, in which we classify the different ML works based on the following parameters: types of techniques, objectives, implementation and evaluation, and advantages and disadvantages. Additionally, this section concludes with an overview of lessons learned and current research trends.
The order of appearance of the different works is chronological, but also based on the ML techniques used and on relating proposals by shared sets of authors. In particular, we start from the oldest work in each type of ML, and then continue with similar works (using the same ML technique) from oldest to newest, so that all proposals are intertwined and follow a logical timeline. We believe this approach facilitates the description and understanding of the evolution of the different proposals, as strictly following a chronological order could cause the reader to miss the relationship between approaches, as well as their pros and cons.
Finally, we would like to highlight that the present survey focuses on the different ML techniques found in routing optimization in SDN. Observe that most of the optimization techniques appear in the literature to complement the ML methods and subordinate to them. That is the case of Sabeeh et al. [122], who propose a hybrid intelligent system, named Hybrid Intelligent Approach (HIA), which is used to optimize the performance of SDN. In most of the cases, optimization techniques are used for training the ML methods, reducing the number of features, or finding some important hyperparameters.

A. SUPERVISED LEARNING
Dynamic routing is a technique that forwards data using different routes based on given conditions or communication circuits. NeuRoute [106] is a dynamic routing framework for SDN that leverages ML to solve the Maximum Throughput Minimum Cost Dynamic Routing Problem, achieving the same result as other dynamic routing algorithms while requiring less execution time. NeuRoute is controller-agnostic and uses a neural network to learn traffic characteristics. Based on a traffic matrix predicted in real time, forwarding rules are generated to optimize network throughput. To ensure a certain level of QoS, the common practice is to allocate more network resources than strictly required, based on peak traffic load estimation. When peak loads are predictable, this practice is quite simple but, in the long term, it is not economically justified. The basic motivation of NeuRoute is that, in dynamic routing, traditional algorithmic solutions are not practical due to their high computational complexity. Two of its core blocks are based on DNNs: the traffic matrix predictor and the traffic routing unit. The traffic matrix predictor is an LSTM which accurately predicts the traffic matrix of the next step. The traffic routing unit is designed with a feed-forward network (FFN) which learns how to match the traffic demands to the routing paths.
Chen-Xiao et al. [107] introduce a load balancing system for SDN that benefits from a global network view and increases the performance of data broadcasting in SDN. The goal is to outperform legacy routers, which store routing tables that only contain destination network and next-hop information, hence missing a global routing view. The authors propose a mechanism in which the SDN controller discovers all paths between the source node and the destination node, and implements a load balancer application to efficiently distribute the traffic. The load balancer server maintains the load of each path [107] based on real-time metrics. More specifically, the load balancer immediately calculates the load conditions of the multiple paths received from the SDN controller. After receiving the chosen path for transmission, the SDN controller allocates the flow tables for the OpenFlow [136] switches to carry out the data-flow transmission. To this purpose, the authors propose an ANN composed of a single hidden layer (with a maximum of 11 neurons), which receives four load features as inputs, namely: bandwidth utilization ratio, packet loss rate, transmission latency, and transmission hops. The ANN infers the integrated load. The authors evaluate this architecture using Mininet and the Floodlight controller [137], and the results suggest better performance and a decrease in network latency of 19.3%.
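The shape of such a network can be sketched as a forward pass through one hidden layer followed by the selection of the path with the lowest inferred load. This is not the implementation of [107]: the class name, the tanh activation, the linear output, the untrained random weights and the per-path feature values are all assumptions made for illustration, and only the four inputs and the 11 hidden neurons come from the description above.

```python
import math
import random


class LoadANN:
    """One-hidden-layer network: 4 load features -> integrated load.

    Weights are random and untrained (illustration only).
    """

    def __init__(self, n_inputs=4, n_hidden=11, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def integrated_load(self, features):
        # Hidden layer with tanh activation, then a linear output unit.
        hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2


def least_loaded_path(model, path_features):
    """Name of the path whose inferred integrated load is lowest."""
    return min(path_features,
               key=lambda name: model.integrated_load(path_features[name]))


# Hypothetical features per path: bandwidth utilization ratio,
# packet loss rate, latency (normalized) and hop count (normalized).
paths = {
    "path_a": [0.70, 0.02, 0.30, 0.4],
    "path_b": [0.20, 0.01, 0.10, 0.6],
    "path_c": [0.90, 0.05, 0.80, 0.2],
}
model = LoadANN()
best_path = least_loaded_path(model, paths)
```

In a real deployment the weights would be obtained by training on measured load data, and the selected path would be handed back to the controller.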
Wu et al. [123] present AIER, an ANN to predict the minimum congestion probability among all path configurations. The network is trained to predict the congestion given the loads of all data flows and all the available path configurations.
Sabeeh et al. [122] propose a hybrid intelligent system, named Hybrid Intelligent Approach (HIA), which is used to optimize the performance of SDN. HIA, whose architecture can be seen in Fig. 7, is a combination of multiple ML methods and techniques working together or in parallel. The ML techniques, namely ANNs and the Adaptive Network Fuzzy Inference System (ANFIS) [138], are used for mapping and modeling. Additionally, GA [139] and Particle Swarm Optimization (PSO) [140] are optimization techniques that maximize the performance of SDN by using the ANN model. In this paper, the authors simulate SDN using Mininet and the POX controller to collect the input and output datasets.
NeuTM, also proposed by Azzouni et al. [124], uses LSTM-RNNs [141] for traffic matrix forecasting. It applies a sliding window technique to obtain the input-output pairs that feed the neural networks. The LSTM is a strong self-learning algorithm with the ability to detect complex non-linear patterns, widely used for time-series prediction. The results show that the LSTM performs better than traditional RNNs and obtains high prediction accuracy in a very short training time.
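The sliding window technique can be sketched generically as follows: each window of consecutive samples becomes an input, and the sample that immediately follows becomes the target. The function name, the window length and the scalar toy series are assumptions; in the traffic matrix setting each sample would be a (flattened) matrix rather than a number.

```python
def sliding_window_pairs(series, window):
    """Build (input window, next value) training pairs from a time series."""
    if window < 1 or window >= len(series):
        raise ValueError("window must satisfy 1 <= window < len(series)")
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]


pairs = sliding_window_pairs([1, 2, 3, 4, 5], window=3)
```

A series of length n with window w yields n - w pairs, which can then be fed to any sequence model.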
Benamrane et al. [125] focus on SDN in avionic networks, where the complexity of communication security, management, handover between radios, and QoS requirements are the major challenges. The interest of SDN in avionics lies in the ability to program the aircraft and the ground network devices in a unified and centralized way through software applications. The authors provide an adaptive bandwidth manager based on real-time traffic which runs on top of the SDN controller and ensures the fulfillment of the QoS policy for the critical and non-critical services of the aircraft. This bandwidth manager optionally includes a time-series forecasting module based on ARIMAs and LSTMs capable of predicting future bandwidth variations.
RouteNet, proposed by Rusek et al. [126], [127], is a new type of Graph Neural Network (GNN) specifically conceived for modeling computer networks. It is inspired by the Message Passing Neural Network (MPNN) previously proposed in the field of quantum chemistry. RouteNet is capable of capturing the complex relationships between topology, routing and input traffic to produce accurate estimations of the mean delay and jitter per source/destination pair. It is trained with synthetic data generated by a custom-built packet-level simulator with queues using OMNeT++. The delay and jitter are related to the bandwidth capacity of each corresponding egress link. Using RouteNet in an SDN controller, the authors show the ability to optimize multiple Key Performance Indicators (KPIs) and to guarantee the service-level agreements (SLAs) of a particular set of flows.
Troia et al. [109] consider that it is a big challenge to provide accurate and efficient quality communications to end-users due to the amount of data transported by current telecommunications networks. In this regard, the authors leverage the ONOS controller [142] to build a machine learning model, called Machine Learning Routing Computation (MLRC), to train and configure the optimization in charge of finding the different paths in the SDN network. MLRC implements a logistic regression classifier due to its simplicity and explainability. According to their results, the SDN network is able to recompute its routing configuration and execute it in a very short period of time for any incoming shift in the traffic matrix. However, the authors acknowledge that their results are limited and that real datasets could facilitate more advanced models for optimized routing in real networks with industrial applications.
Wang et al. [110] present a module based on machine learning and implemented in SDN to enhance quality of experience (QoE) [143]. It chooses the best path, and monitors, controls and predicts the performance of the network. The authors use QoE to evaluate the performance and condition of the application. An optimal QoE is difficult to achieve for real-time applications, so a set of Key Performance Indicators (KPIs) [144] was defined. Moreover, their SDN module works with information acquired from both the SBI and the NBI, as the SBI collects the network metrics and the NBI collects the KPIs.
Sun et al. [128] combine a variety of ML algorithms to propose a data flow classification method called MACCA2-RF&RF, which identifies the data flow category (with almost perfect accuracy) and obtains the QoS requirements. The authors comprehensively evaluate their proposal with real datasets and an SDN implementation based on Floodlight and Mininet, which is quite close to real scenarios. However, some parts of their design still need improvement, such as the amount of table entries installed, which should be reduced to be scalable.
Choudhury et al. [129] introduce ML to control SDN-enabled IP/Optical Networks [145] more efficiently. The Open ROADM (Reconfigurable Optical Add-Drop Multiplexer) [146] concept together with the SDN controller tools allows ISPs to obtain network performance data more efficiently and homogeneously in order to set up the best wavelength paths that meet the requirements of optical networks. For this purpose, ML is used to predict the best performance of wavelengths across multiple vendors. In their architecture, SDN controls all-optical routers, all-optical nodes, edge routers, and optical nodes, hence providing a global view. In the end, the authors define two ML applications for managing IP and optical networks. The first application provides long-term prediction with global optimization, while the second produces short-term traffic prediction that helps to reduce the customer traffic on the network.
EL-Garoui et al. [130] leverage SDN and ML for efficient routing in smart cities, where most applications are based on the Internet of Things (IoT). They develop a framework based on the Naive Bayes algorithm and create a dataset based on the Montreal city open data website and the SUMO urban mobility simulator. Compared with other protocols, such as OLSR, their framework obtains better results in terms of delay and packet delivery ratio.
Hardegen et al. [108] present PFR, which is a flow routing paradigm that aims to efficiently distribute traffic (nearly evenly) over links/paths to avoid high load/congestion. Conditions for flows can be improved by minimizing observed latency/maximizing required throughput. The authors briefly provide a summary of the ML techniques employed. They continuously train a DNN on incoming data while treating the prediction of flow characteristics as a multiclass classification problem. As forecasting is carried out as flows start, only features known ahead of time are usable. Besides a continuous model update, an interface to request a prediction for flow 5-tuples is offered. Finally, a key aspect of this approach is that the authors implement their solution using P4 programmable switches, instead of following the classic centralized SDN model.
Awad et al. [131] focus on a rather theoretical analysis of enhanced multipath routing using DNNs. Although they leverage the TOTEM open source traffic engineering toolbox [147] (supported by experts in the field of computer networks) and their evaluation is pretty comprehensive, they do not provide any insights on actual SDN implementations, which limits the scope of their proposal.
Akbar et al. [132] design one of the few works analyzed that focuses on real computer network scenarios leveraging AI and SDN. In particular, they present a proposal based on genetic algorithms to achieve adaptive and reliable communication in IoT-fog environments, which could be considered one of the main objectives of the future 6G networks [148], [149]. The authors implement an SDN-based framework to evaluate their proposal and leverage real datasets. However, the evaluation is limited to a single fixed custom topology.
Owusu et al. [133] propose diverse implementations of ML models to classify traffic in SDN-IoT networks for traffic engineering. The authors compare three different classifiers: Random Forest Classifier, Decision Trees Classifier and K-Nearest Neighbors Classifier. They also evaluate two feature selection methods: Sequential Feature Selection (SFS) and Shapley additive explanations (SHAP). According to their analysis, the best accuracy rate, 0.83, is obtained by the random forest classifier with SFS.
RoPE, proposed by Sacco et al. [134], is an architecture that adapts the routing strategy of the underlying edge network based on the predicted future bandwidth. RoPE is a conglomerate of supervised time-series models and machine learning methods trained to predict the bandwidth in such a way that the controller can check whether the desired application fits the network load. It automatically chooses the algorithm to apply, in order to guarantee the best possible performance. Choosing the right forecasting method for a given use case is a function of many factors, such as the historical data available and exogenous variables (e.g., weather, concerts). Data for training is collected via the Mininet emulator. As a result, the SDN controller tracks the past link loads and takes a new route if the current path is predicted to be congested.
Finally, Todorov et al. [135] present an architectural design to implement four types of ML techniques to improve load balancing and segment routing in SDN. However, the article does not provide any additional insights on implementation nor provides any type of evaluation.

B. UNSUPERVISED LEARNING
Budhraja et al. [111] state that usual SDN routing approaches do not usually follow privacy and compliance requirements of data transmission. This is particularly magnified considering the fact that SDN routes are usually static or defined specifically for each communication flow, which is prone to suffer from diverse security attacks like, for instance, Denial of Service (DoS). If such a kind of routing is performed in a controlled environment (HIPAA), we can lose important information in case of an attack. In this paper, the author focus on the privacy of sensitive data transmission and the restricted challenges of compliance in SDN environments. Since a big number of packets transmitted via the same data path is considered as a risk, route randomization is performed by monitoring the forwarding path and its transmitted packets. The required results are obtained by using i) ML and analytics for the computation of risk in SDN network; ii) distributed routing based on swarm algorithm; iii) minimizing the route randomization and risks for achieving the requirement of compliance and privacy. The proposed scheme works on history, as it collects previous packets for the purpose of training and then data packets are efficiently routed. For risk identification, the K-means clustering algorithm is used. It identifies k-centroid objects for finding the risk ratio, and it is processed offline. The risk is analyzed and then for routing data packets the online method is used to make a real-time decision. Ant colony optimization is used for making real-time decisions with low complexity level.
Kumar et al. [112] explore the applicability of ML algorithms for selecting the least congested route for routing traffic in SDN. The proposed route selection method provides a list of possible routes based on the network statistics dynamically provided by the SDN controller. The authors propose two ML methods: a K-means clustering algorithm and the Vector Space Model with cosine similarity. The proposed methods are tested in Mininet using the Ryu controller and compared with Dijkstra's routing algorithm. The experiments show that the best Round Trip Time (RTT) of the traffic flows is achieved by the implemented K-means, closely followed by the Vector Space Model, both surpassing the times obtained by Dijkstra.
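To make the Vector Space Model idea concrete, the sketch below ranks candidate routes by the cosine similarity between their statistics vector and an ideal reference vector. The choice of features, the reference vector and the route values are our own assumptions for illustration, not the configuration used in [112].

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def least_congested(route_stats, reference):
    """Return the route whose statistics vector best matches the reference."""
    return max(route_stats, key=lambda r: cosine_similarity(route_stats[r], reference))


# Hypothetical features per route: (free bandwidth, 1 - loss, 1 - normalised delay).
route_stats = {
    "A": (0.9, 0.99, 0.8),
    "B": (0.3, 0.95, 0.4),
}
ideal = (1.0, 1.0, 1.0)
best = least_congested(route_stats, ideal)
```

Note that cosine similarity compares direction only, so a uniformly poor route such as (0.1, 0.1, 0.1) would also match the ideal vector. A practical implementation would need to combine it with a magnitude check.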

C. REINFORCEMENT LEARNING
Lin et al. [116] emphasize the urgent need to define a reliable QoS routing mechanism for large-scale SDN-based networks. To solve this issue, they propose QoS-aware adaptive routing in multi-layer SDN. The architecture of hierarchical distributed control planes is introduced by combining the work of Kandoo [154] and Xbar [155]. The levels of this distributed control plane are Super Domain (master), switch subnets and slave controllers. Thanks to RL, the authors achieve a reliable SDN infrastructure with minimum signal delay, later expanded with time efficiency and QoS-aware packet forwarding. This QoS-adaptive routing outperforms conventional Q-learning.
Rischke et al. [150] consider that addressing diverse and varying traffic loads implies the use of a complex model, hence they focus on achieving a model-free RL scheme. Their proposal, QR-SDN, creates multiple paths between source and destination, which achieves substantially lower flow latencies. However, they note that additional research efforts are needed to conceive a scalable approach as the network size increases.
Casas-Velasco et al. [151] introduce a routing approach entitled Reinforcement Learning and Software-Defined Networking for Intelligent Routing (RSIR), which is founded on the need to add a Knowledge Plane to the network, as mentioned in Section III.B, fed by data gathered by the Management Plane. In particular, they define a proactive RL-based routing algorithm based on link-state metrics and implement it in a prototype with real traffic matrices. RSIR is compared against the classic Dijkstra's algorithm, which is leveraged by most routing protocols. Results show that RSIR obtains more shortest paths and is able to better balance the load, hence reducing the overall latencies. As future work, they envision the evolution of their approach to DRL.
Fang et al. [117] consider that Dijkstra-based routing algorithms might have problems, particularly when data streams are combined by selecting the same forwarding path, which greatly reduces the use of network connections and leads to network congestion. As SDN is not constrained to any particular routing algorithm, the authors consider the application of RL, with a Q-learning-based routing algorithm, specifically for comparison against the RIP protocol. Additionally, by combining RL and NNs, which means the Q-table in Q-learning is replaced by a NN, the authors present a Deep Q-learning-based routing algorithm as well. Both algorithms are simulated and exhibit good performance results.
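As a reference for how tabular Q-learning can be applied to next-hop selection, consider the minimal sketch below. It is not the algorithm of [117]: the state (current node), action (next hop), reward (negative link delay), hyperparameters and toy topology are all our own assumptions.

```python
import random


def q_routing(links, dst, episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning with state = current node and action = next hop."""
    rng = random.Random(seed)
    q = {node: {nxt: 0.0 for nxt in nbrs} for node, nbrs in links.items()}
    starts = [node for node in links if node != dst]
    for _ in range(episodes):
        node = rng.choice(starts)
        for _hop in range(50):  # hop limit per episode
            if node == dst:
                break
            nbrs = list(links[node])
            if rng.random() < epsilon:  # explore a random neighbour
                nxt = rng.choice(nbrs)
            else:  # exploit the current estimate
                nxt = max(nbrs, key=lambda n: q[node][n])
            reward = -links[node][nxt]  # assumed reward: negative link delay
            future = 0.0 if nxt == dst else max(q[nxt].values())
            q[node][nxt] += alpha * (reward + gamma * future - q[node][nxt])
            node = nxt
    return q


def greedy_path(q, src, dst, max_hops=20):
    """Follow the highest-valued next hop from src until dst is reached."""
    path = [src]
    while path[-1] != dst and len(path) <= max_hops:
        path.append(max(q[path[-1]], key=q[path[-1]].get))
    return path


# Hypothetical topology: node -> {neighbour: link delay}.
links = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1, "D": 6},
    "C": {"A": 5, "B": 1, "D": 1},
    "D": {"B": 6, "C": 1},
}
q = q_routing(links, dst="D")
path = greedy_path(q, "A", "D")
```

The deep variant mentioned above keeps the same update target but replaces the dictionary `q` with a neural network that maps a state to the values of its actions.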
Sendra et al. [152] present a solution to enhance network performance based on QoS and security concerns. The solution is implemented in a distributed manner, using only Mininet and no controller, to facilitate testing a proof-of-concept. Their solution involves the application of reinforcement learning over the traditional OSPF routing protocol, using Quagga, which permits modifying the routing algorithms. It is tested and compared against the conventional OSPF routing protocol, and results show that it enhances OSPF, obtaining more stable routes with lower loss rates and better jitter and delay.
Valadarsky et al. [153] focus on data-driven routing and present some preliminary results in the context of intra-domain traffic engineering. They perform an analysis applying both supervised and reinforcement learning in a complementary way (reinforcement learning is trained on past values of the traffic demands, while supervised learning is used to estimate future traffic demands). However, no specific effort is made to integrate this idea in SDN scenarios, although the authors leave it as future work. Francois et al. [113], [156] present the CRE architecture, which enhances processing efficiency by gathering the network states according to the QoS requirements.

1) DEEP REINFORCEMENT LEARNING
Francois et al. [156] propose a new routing application called Cognitive Routing Engine (CRE) that enhances the efficiency of the processing and gathering of network states, and provides the best routing path according to QoS requirements. The authors particularly consider the cloud provider use case, which typically needs dynamic re-routing for the different tenants, and focus on the design of the CRE module as an SDN application, as depicted in Fig. 8, in which the CRE application sits at the same level as the link discovery service. CRE is based on RNNs and tested in a Mininet scenario, but not exhaustively compared with other approaches.
Francois et al. [113] extend their previous work with a practical scenario based on specific data center locations and the use of the Floodlight SDN controller.
Sun et al. [114], [157] combine the Recurrent Neural Network (not to be confused with RNN) with Deep Deterministic Policy Gradient (DDPG) [181] to model TIDE, which is shown to reduce network delay compared to standard shortest path routing schemes like OSPF. In TIDE, the network model is represented as traffic data sequences in the router. The evaluation is performed in a realistic scenario based on Pica8 switches (well-known commercial SDN-capable hardware switches) and the POX SDN controller. In this experiment, each RNN-DDPG run comprises 1000 training steps, and the average transmission delay is used for performance measurement. After some time, RNN-DDPG is observed to perform better than shortest path routing. Although the results are promising, the authors foresee scalability issues in bigger scenarios. For this reason, a new work by Sun et al. [158], [159], entitled SINET, is presented afterwards, specifically focused on scalability, in which partial control is applied together with DRL. SINET is evaluated via the OMNeT++ packet-based simulator, showing very good preliminary results. Finally, Sun et al. [160] present an updated solution for enhanced and scalable traffic engineering (similar to their previous work), entitled ScaleDRL, in which they leverage the idea of pinning control theory to select a subset of links in the network (set as critical links) and provide decisions based on them, hence fostering scalability. Their implementation is performed just with the OMNeT++ simulator, which might seem limited.
Stampa et al. [161] focus on the KDN concept to design a DRL agent to minimize network delay. The RL agent uses three signals (state, action and reward) to provide a near-optimal solution. The RL agent is an off-policy, actor-critic, deterministic policy gradient algorithm that exchanges these three signals to interact with the network.
Yu et al. [162] propose the DDPG Routing Optimization Mechanism (DROM). DROM is based on neural networks, not Q-tables, which saves time and storage, and works in continuous time with effective black-box optimization. The evaluation is focused on delay and throughput, in comparison with the well-known OSPF protocol, and the authors additionally measure convergence time, obtaining good simulation results. Maheswari et al. [163] and Xu et al. [164] present works very similar to DROM, following the same approach.
Yao et al. [165] exploit a hybrid ML paradigm that combines a distributed intelligence, based on units called ''AI routers'', with a centralized intelligence, called the ''network mind'', to provide different network services. Using this paradigm, the authors deploy centralized AI control for connection-oriented tunneling-based routing protocols, such as multiprotocol label switching and segment routing, to guarantee a high QoS. Besides, for hop-by-hop IP routing, the authors shift the intelligent control responsibility to each AI router to ease the overhead imposed by centralized control and use the network mind to improve the global convergence. The work provides a DRL-based algorithm for effective routing policy generation. The authors apply a DDPG approach for policy generation [182]. A DDPG agent has two main components: a deterministic policy network, the so-called actor, which attempts to improve the current policy; and a Q-network, the so-called critic, which evaluates the quality of the current policy. An iterative alternation between both components reaches the optimum policy. The authors simulate their proposal with OMNeT++. Experiments show that, with increasing load intensity, the AI-based routing achieves better performance than shortest path routing.
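For reference, the alternation between actor and critic can be written with the update rules of the standard DDPG formulation. This is the generic textbook form, which assumes target networks $Q'$ and $\mu'$, a minibatch of $N$ transitions $(s_i, a_i, r_i, s_{i+1})$, a discount factor $\gamma$ and a soft-update rate $\tau$. It is not necessarily the exact variant implemented in [165].

```latex
\begin{align}
y_i &= r_i + \gamma\, Q'\!\left(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right) \\
L(\theta^{Q}) &= \frac{1}{N} \sum_{i} \left(y_i - Q(s_i, a_i \mid \theta^{Q})\right)^2 \\
\nabla_{\theta^{\mu}} J &\approx \frac{1}{N} \sum_{i}
  \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)} \,
  \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i} \\
\theta^{Q'} &\leftarrow \tau\, \theta^{Q} + (1-\tau)\, \theta^{Q'}, \qquad
\theta^{\mu'} \leftarrow \tau\, \theta^{\mu} + (1-\tau)\, \theta^{\mu'}
\end{align}
```

The critic minimizes the loss $L$ against the target $y_i$, the actor follows the sampled policy gradient, and the target networks slowly track the learned ones.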
Zhang et al. [166] apply deep neural networks for content-awareness and exploit DRL for traffic engineering decisions. They provide a parallel online learning mechanism to use DRL, which has a trial-and-error nature. They improve network performance in terms of total network throughput, bandwidth utilization, and load balance.
Nahar et al. [167] apply SDN-enabled spectral clustering-based routing together with DDPG to define SeScR. The distinctive aspect of this proposal is that its targets are not packet-based networks, but Vehicular Ad-Hoc Networks (VANETs) instead. For evaluation, they used OMNeT++ together with SUMO, a popular traffic simulator.
Tu et al. [115] highlight the existing challenge for optimized routing in space-ground integration networks, particularly when changes occur in the topology and link status. For that purpose, they define the ML-SSGIN framework, which uses the DDPG algorithm and a neural network that integrates LSTM and Dense layers. They compared their proposal with OSPF, obtaining better results in terms of throughput and delay.
Quang et al. [168] also leverage the concept of KDN to apply ML principles in SDN environments. In order to improve the performance of QoS-aware routing, the authors exploit a DRL agent with Convolutional Neural Networks in the KDN context to improve latency and packet loss rate. The results obtained show that, even in complex networks, the proposed approach can significantly improve the performance of the routing configurations. By proposing a DDPG algorithm, the authors address the continuous control needs. The OMNeT++ discrete event simulator (v5.4.1) was used to obtain the latency and packet loss rate.
Swain et al. [169] propose the Convolutional Deep Reinforcement Learning (CoDRL) model, consisting of a DDPG agent coupled with a Convolution layer. The authors simulate the environment with OMNeT++ and show that CoDRL clearly outperforms OSPF in terms of delay and packet loss.
Lu et al. [170] design an enhanced version of DDPG entitled DDPG-EREP, and they evaluate it with an emulated network (composed of the Ryu SDN controller and Mininet), instead of using a simulator (as in the previous works). However, their evaluation is limited to a single execution on a fixed topology, and additional tests should be performed to prove the benefits of their approach.
Liu et al. [171], [172] particularly emphasize the need for optimized routing in data center networks. Their approach focuses on the specific needs of these types of networks and on how resource allocation and routing affect the overall performance of software-defined data center networks. For this purpose, they employ a deep Q-network (DQN) and DDPG to build their model, DRL-R. After an extensive evaluation performed via simulation in OMNeT++, their results outperform those of traditional OSPF and TIDE (another DRL-based routing model previously mentioned).
Fu et al. [173] propose a routing strategy based on deep Q-learning (DQL) specifically designed for data center networks. In particular, the authors consider that mice and elephant flows (usual types of flows in data center networks) have different requirements: both need low packet loss, but reduced delay is more important in mice flows, while high throughput is more relevant for elephant flows. Their proposal outperforms ECMP [174], the classic routing algorithm for data center networks, and SRL+FlowFit [175], which is an improved routing algorithm in comparison to ECMP and focuses on balancing the network load in folded-Clos data center topologies.
Jalil et al. [176] present Deep Q-Routing (DQR), which uses a dueling deep Q-network with prioritised experience replay to compute a path for any source-destination pair request in the presence of multiple QoS metrics, such as delay, bandwidth or loss. They compare their approach with other existing learning methods for greedy online routing, showing better results in terms of loss and path cost, while keeping the best bandwidth most of the time and a reasonable delay.
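The two building blocks named above can be sketched in a few lines. The functions below follow the commonly used formulations of the dueling aggregation (state value plus mean-centred advantages) and of proportional prioritised replay. The function names, the exponent 0.6 and the sample numbers are our own illustrative choices, not details reported in [176].

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean(A(s, .))."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def replay_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Sampling probability of each stored transition, proportional to |TD error|^alpha."""
    priorities = [(abs(err) + eps) ** alpha for err in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


# Three candidate next hops for the same state (hypothetical numbers).
q_values = dueling_q_values(2.0, [1.0, 0.0, -1.0])
# Transitions with larger temporal-difference error are replayed more often.
probs = replay_probabilities([0.1, 2.0, 0.5])
```

With `alpha = 0` the sampling becomes uniform, which recovers ordinary experience replay.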
Chen et al. [177] comprehensively analyze the need for optimized routing in SDN and present RL-Routing. After an extensive evaluation based on a real SDN controller and networks, RL-Routing proves to offer better results than other routing algorithms like OSPF and Least Loaded (LL).
Etengu et al. [27] propose a DNN-based approach in a hybrid SDN/OSPF network deployment. The SDN controller performs energy-efficient routing and enhanced performance with QoS guarantees. It is composed of both the SDN-enabled supervised ML module and the DRL module. The hybrid SDN-enabled supervised ML module is formed by an LSTM that performs traffic flow prediction using time-series datasets, extracting short-term network traffic variabilities and periodicities to ensure traffic flow prediction and energy-efficient routing with guaranteed QoS performance. The DRL module learns from the existing historical data and iteratively from its interaction with the defined network setting.
Jha et al. [178] focus on multipath routing in Data Center Networks (DCNs) and, for that reason, they directly try to compete against Equal-Cost Multi-Path (ECMP), which is one of the most popular protocols in those scenarios. In their design, they use DRL to compute the link weights and, afterwards, they apply Dijkstra's algorithm (as in other traditional approaches). Although their evaluation is performed in an SDN-based environment, it does not consider typical traffic patterns from DCNs (such as elephant/mouse traffic), the tests are not comprehensive, and in-depth details of their implementation are missing for reproducible research.
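The second stage of such a design (shortest path over learned link weights) can be sketched as follows. The weights below are hypothetical placeholders standing in for the output of a DRL agent, and the switch names are illustrative only. Dijkstra's algorithm itself is the standard heap-based version.

```python
import heapq


def dijkstra_path(weights, src, dst):
    """Shortest path for weights given as node -> {neighbour: non-negative weight}."""
    dist = {src: 0.0}
    prev = {}
    queue = [(0.0, src)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, w in weights[node].items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(queue, (nd, nxt))
    if dst not in dist:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


# Hypothetical link weights, as if produced by a trained agent.
learned_weights = {
    "s1": {"s2": 0.2, "s3": 0.9},
    "s2": {"s3": 0.1, "s4": 0.7},
    "s3": {"s4": 0.3},
    "s4": {},
}
path, cost = dijkstra_path(learned_weights, "s1", "s4")
```

Splitting the problem this way keeps the action space of the agent small (one weight per link), while path computation remains a well-understood deterministic step.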
Srivastava et al. [179] present a bio-inspired RBM algorithm to optimize load balancing. However, their analysis and evaluation seem limited, as they do not consider the measurement of standard metrics, the network topology is a fixed mesh (which is not common in practical networks) and they do not provide any additional thoughts on the actual SDN deployment.
Babayigit et al. [180] focus on DCNs and evaluate and compare a DRL technique with others like ANN, SVM and logistic regression. The results show that their approach is very efficient for load balancing, outperforming all the rest in diverse evaluated parameters. However, the authors do not provide specific details of the technique implemented, which makes it hard to reproduce.

D. LEARNED LESSONS AND RESEARCH TREND OVERVIEW
After examining the works that apply ML together with SDN for optimal routing, several conclusions arise at first sight:
• Since the publication of the KDN concept four years ago, there is a huge tendency to apply ML and AI in SDN environments (particularly towards 6G) and, in the case of routing, DRL is particularly relevant in the last two years, as most published works fall in this type of ML technique.
• Most works compare their proposal with shortest path algorithms in terms of latency and/or throughput, and either use OMNeT++ for simulation, which might not be realistic enough, or leverage the Ryu SDN controller, which is easy to use and well suited for prototyping, but does not meet industry requirements (e.g., its performance is limited, as it is written in Python).
• Selected topologies and datasets are often very specific and differ among authors. Only a few works use several types of topologies and datasets to guarantee comprehensive and homogeneous evaluations.
• Few efforts have been made to create synergies or even compare the different ML works in relation with routing in SDN. Most evaluations performed just compare their approaches with classic routing protocols and no competing proposals (probably because implementations are usually not publicly available), which hinders the attainment of actual conclusions.
• Most proposals lack design and/or implementation details, which makes it hard to reproduce results or produce comprehensive comparisons. For example, DNN-based proposals do not detail their architectures and the parameters used in their networks.
Apart from these five main learned lessons, there are some other trends observed in our analysis. For example, most designs propose a centralized architecture, following the idea of classic SDN, while distributed or hybrid SDN approaches are set aside. In the case of evaluation, most proposals agree on the use of topologies like GÉANT, NSFNET and BRITE-generated ones, which are consistent with practical implementations, although almost all are wired networks. These topologies are usually deployed with Mininet via Open vSwitches (we assume, as most works omit this specific, yet important, detail). As for datasets and traffic pattern generation, there is a huge heterogeneity of approaches: some leverage existing datasets, some others directly generate their own traffic (based or not on an analysis of current literature), while many simply omit details about this technical aspect.
Finally, the majority of works agree that future research efforts should be made regarding three aspects, namely: (1) scalability enhancement, (2) evaluation with more types of (real) datasets and (3) automatic fine-tuning of the system (which needs some manual configuration in the very first stages).
As a conclusion, following the definitions, descriptions, and evaluation of the different proposals presented, we believe the most complete and/or promising approaches are the following:
• Sacco et al. [134], as they carry out a comprehensive analysis with a testbed close to practical scenarios, including real traces, and the application and comparison of different techniques.
• Hardegen et al. [108], because they leverage P4 programmable switches, which might have the best performance over other implementations.
• Casas-Velasco et al. [151], since they present a very complete implementation and evaluation and leverage the KDN concept.
• Fu et al. [173], because they particularly focus on a type of scenario (data center networks) and carefully design their approach around it.
• Chen et al. [177], as their implementation and evaluation are very complete, and close to real scenarios.
Therefore, we recommend following the work of these research teams in case of interest in the field. As a curiosity, all five of these works were published in 2020, which shows how recent the trend in the field is.

VI. FUTURE RESEARCH DIRECTIONS
ML and AI have already influenced almost every field of human life [183]. Although ML algorithms are mostly leveraged for robotics, image and signal processing, they are playing an undeniable role in network control and management as well [184]. In particular, ML has been applied to routing problems in computer networks since as early as 1994 [185] and is rapidly evolving every day [186].
Recently, SDN has emerged to provide a wider range of possibilities in the field of routing optimization with ML, as seen in previous sections. Nevertheless, this field still demands immense research efforts towards full-fledged ML-based networking environments, which we discuss in detail in the following sections. Though these challenges could be considered a burden, we believe they indeed illustrate an opportunity towards real and practical next-generation networks. For this reason, for each of the six sections, we summarize the envisioned future research directions, together with the overall goal, in the hope that these can serve as inspiration for the research community.

A. WHAT IS OPTIMAL ROUTING?
Though it might seem trivial, this is the first question that should arise when trying to design optimized routing algorithms based on ML for SDN environments. Networking scenarios are vast and heterogeneous and, for sure, not limited to being assessed by latency and throughput. Hence, when asked about the definition of optimal routing, the initial answer should be "it depends".
For instance, first of all, in physical terms, networks could be divided into two main types: wired and wireless, and they have different routing protocols to start with. As an example, latency and throughput could be valid parameters to measure routing quality in wired environments, but some wireless scenarios, like Low-power and Lossy Networks (LLNs) [187], might require low power consumption or high robustness instead. Additionally, network topologies also vary depending on the specific use case. Optimized routing in data center networks might drastically differ from what is expected in large service-provider networks, which could even follow business-based directives. Finally, networks are dynamic and change (not only because of updates, but also because of failures) and this should be taken into account as a factor as well.
These are just a few ideas related to the physical media aspect, but many more could be raised considering other aspects, like types of communication (unicast, multicast, broadcast) or applications. This is particularly relevant for 5G networks and beyond [188], for example, in which new types of requirements and applications are still flourishing.
Nevertheless, after our analysis of the state of the art, we found that most research works simply consider a very limited subset of networks: wired, unicast, and with latency and throughput as main drivers. Only a few specifically mention the application to data center or wireless scenarios. For that reason, we propose the following research directions:
• Efforts should be made to apply ML in routing in wireless scenarios and, particularly, constrained scenarios.
• Broadcast and multicast optimal routing would be very valuable to assess.
• Traffic patterns, topologies and network changes should be considered in future analysis.
• Additional metrics should be evaluated as part of optimal routing, such as node energy consumption, resilience or business-based metrics.
Overall goal: An ML-based routing algorithm for SDN should be customizable based on a diverse set of parameters (latency, throughput, CPU usage, energy-efficiency), media (wired and wireless), types of communication (unicast, multicast, broadcast), applications (traffic patterns) and topologies (DCNs, IoT, etc.). Additionally, apart from typical performance evaluations, proposals should also encompass long-term and multidisciplinary objectives, such as sustainability, hence tackling challenges envisioned by the Sustainable Development Goals (SDGs). If not feasible, the authors should at least justify the use case scenario and the evaluation method, to be consistent.
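One simple way to picture such customizability is a link cost whose weights change with the deployment profile, as in the sketch below. The profiles, metric names and weight values are purely hypothetical and serve only to illustrate how the same routing logic could rank links differently per scenario.

```python
# Hypothetical per-scenario weights over normalised metrics in [0, 1] (lower is better).
PROFILES = {
    "datacenter": {"latency": 0.6, "utilisation": 0.4, "energy": 0.0},
    "lln": {"latency": 0.1, "utilisation": 0.1, "energy": 0.8},
}


def link_cost(metrics, profile):
    """Weighted sum of the metrics selected by a deployment profile."""
    weights = PROFILES[profile]
    return sum(weights[name] * metrics[name] for name in weights)


def preferred_link(candidates, profile):
    """Pick the candidate link with the lowest profile-specific cost."""
    return min(candidates, key=lambda name: link_cost(candidates[name], profile))


candidates = {
    "fibre": {"latency": 0.1, "utilisation": 0.5, "energy": 0.9},
    "low_power": {"latency": 0.8, "utilisation": 0.2, "energy": 0.1},
}
dc_choice = preferred_link(candidates, "datacenter")
lln_choice = preferred_link(candidates, "lln")
```

An ML-based router could learn or tune these weights rather than fix them by hand. The point here is only that the optimization target is a parameter of the scenario.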

B. SECURITY AS A CROSS-CUTTING FEATURE
Possibly related to the previous aspect, security is an orthogonal concern in networking [189], which affects all types of scenarios and should also be evaluated as part of any type of optimal routing. As many works already exist that apply ML and SDN for network intrusion detection, we would like to particularly focus on two aspects: data acquisition and routing policy population. In particular, we envision the following research directions:
• ML-based proposals should consider the possibility that data acquisition could be hampered or modified to obtain faulty results, hence either a secure mechanism should be defined or an ML-based method to filter these attacks should be part of the overall designed ML method.
• Similarly to data acquisition, the installation of routing entries could be affected by security attacks as well, and this should be alleviated or, at least, proven to be safer than traditional and/or distributed approaches.
Overall goal: Security should be assessed as a cross-cutting parameter when evaluating the application of ML in SDN environments. The definition of an overall secure ML framework for SDN would be extremely valuable for the whole research community.

C. ARCHITECTURAL APPROACHES AND DATA MODELING
Though the classic definition of SDN presents a logically centralized architecture, it is not the only architectural approach to follow when applying ML-based approaches and, more importantly, it might not even be the most beneficial one. Researchers aiming at the application of AI and ML in SDN and, more generally, in programmable networks, should consider alternative architectural approaches like hybrid SDN (either vertically or horizontally [42]) or in-band SDN communication [190], as they could enhance and optimize the behavior of their proposals, including the monitoring side and data acquisition, or mitigate the potential security breaches that might be more severe in strictly centralized environments. To this end, researchers could still leverage Mininet, but using BOFUSS switches [191] instead of the default Open vSwitches, as the former can be easily modified. Alternatively, technologies like P4 [192] and XDP [193] have already demonstrated enhanced network programmability capabilities [149].
Additionally, alternative architectures could also provide deeper knowledge-based environments related to data modeling. So far, most data is directly obtained from the network, like CPU usage, packets received and sent, etc. Nevertheless, instead of this type of raw data, ML could profit from the use of advanced and high-level architectures, such as ontology-based ones [194] or even those described by databases [195], in which data is collected and merged, and which could provide an enhanced vision of the network. While it is true that these SDN architectures are more immature, some thoughts about potential applications with ML could be worthwhile.
Accordingly, the related research directions are the following:
• Proposals of ML-based SDN frameworks should consider the possibility of following non-centralized architectures, hence analyzing their benefits in comparison with centralized architectures. The simplest approach would be redesigning one existing framework into a non-centralized scheme.
• Although more incipient, it would be valuable to assess to what extent ML can benefit from using high-level data models.
Overall goal: To evaluate the advantages (security, scalability, etc.), and even disadvantages, of using non-centralized SDN architectures in ML-based frameworks.

D. IN THE NEED OF OPEN DATASETS AND IMPLEMENTATIONS
The need for open datasets and implementations is probably the most important of the six types of research directions. Although solutions based on ML for networking are growing more rapidly every day, these frameworks not only rely on the specific developed code, but they also need input data to train and/or test their models. Such data is scarce and barely shared [196]. Most times, this is because the collection of network data involves individual privacy issues [166]. Although sharing data could initially have a high cost (for the first researchers following this idea), it would benefit the whole community tremendously in the long term, because it would allow others to reproduce, compare and enhance the existing solutions, hence increasing their impact. Recent initiatives are appearing in this regard, like the Softwarized Network Data Zoo (SNDZoo) [197], which intends to start an open ecosystem for dataset collections in the networking domain, based on a specific methodology to achieve homogeneous collection and publication.
Alternatively, open implementations are another, and probably easier, method to foster the merging of efforts in the field. Whilst most surveyed works have used open platforms to implement their ideas (like the Ryu controller or the OMNeT++ simulator), most of them do not publish their code in public repositories like GitHub, which is a simple and very effective way to promote the merging of efforts from different proposals and research groups.
In conclusion, we envision the next research directions:
• To build upon existing open data ecosystems like SNDZoo and define the requisites to make them grow faster.
• To evaluate what is the most beneficial method for implementation replication, i.e., what open platforms and tools should be prioritized for later publication and reutilization.
• To develop some type of framework or community to compete on specific AI & ML challenges based on homogeneous datasets and topologies, which would foster evolution and replication of results.
Overall goal: To foster open datasets and implementations to achieve more valuable results and ideas for the research community. At the very least, all frameworks should have a public link to their implementations.

E. INTO THE FOG
As previously mentioned, the current evolution of networks is increasingly focused on the edge of networks, where IoT devices (and users) reside. This clear trend [17], [68] is moving the intelligence of the network step by step away from the core, towards what is called edge computing, fog computing and, even, mist computing [198]. Given these names, one can clearly see that the future of ML approaches should be based on federated approaches, such as the ones referenced before [75], [76]. However, these paradigms are still incipient and many challenges still need to be tackled. One example of these challenges is LLNs (previously mentioned), in which nodes are constrained in memory and battery and, therefore, routing is, per se, a challenge for them. This type of network would benefit from this architectural approach, as stand-alone devices cannot cope with the whole computational requirements of a centralized ML approach.
In particular, we envision the next research directions:
• To determine the minimum computational requirements of network nodes to act as federated ML nodes.
• To define a negotiation and/or communication framework to allow efficient, secure and scalable communication among nodes.
• To align the previous two points with specific SDN and NFV architectural concepts and technologies (e.g. leverage SDN in-band communication for federated ML approaches).
Overall goal: While this survey focuses on ML for its application to networking, some research efforts should be directed to networking for ML too, as they are both complementary.
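To give a sense of what a federated node and its aggregator exchange, the sketch below shows the aggregation step of federated averaging (each node trains locally and only model weights are combined, weighted by local sample count). The function name and the numbers are illustrative assumptions. A real deployment would add the secure and scalable communication layer discussed in the directions above.

```python
def federated_average(client_updates):
    """Sample-weighted average of client model weights.

    client_updates: list of (num_local_samples, weights), where every
    weights list has the same length.
    """
    total = sum(n for n, _ in client_updates)
    size = len(client_updates[0][1])
    return [
        sum(n * weights[i] for n, weights in client_updates) / total
        for i in range(size)
    ]


# Two constrained nodes report locally trained weights (hypothetical values).
updates = [
    (10, [1.0, 2.0]),  # node with 10 local samples
    (30, [3.0, 6.0]),  # node with 30 local samples
]
global_weights = federated_average(updates)
```

Only the weight vectors travel over the network, so raw traffic measurements never leave the node that collected them.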

F. TOWARDS INDUSTRY-BASED PRACTICAL SCENARIOS
Finally, we would like to mention an objective directly related with the previous ones: working on implementations close to industry-based practical scenarios. Now that most network innovation in companies is based on open source software, we, as part of the research community, should profit from it and leverage the same platforms and tools for a more effective adoption by industry. Alternatively, merging efforts with other big projects like Pronto [199], [200] would be clearly beneficial. Additionally, considering the application of ML in routing is usually foreseen as a step towards automatized network management, we should continuously monitor to what extent is ML trusted by network operators. Moving from a traditional (almost manual) management to another based on ML might imply severe changes and even unexpected outcomes. Therefore, the benefits of applying ML in these environments should be proven and clear or, otherwise, the potential impact might be too low.
In summary, some research directions could be the following:
• To implement scenarios based on the ONOS controller, which is the one most supported by the ONF and industry. Alternatively, OpenDaylight could also be a good choice.
• To create a communication channel with industry to check their needs and propose initiatives, which could also be feasible via the ONF (they provide the mechanisms to do so).
Overall goal: Implementation and evaluations should be as close to real scenarios as possible for effective adoption by industry. To this purpose, using platforms leveraged for commercial solutions (like ONOS) and communicating with standardization bodies (ONF) is pivotal.

VII. CONCLUSION
In this paper we surveyed the use of ML in SDN for routing optimization, classified into three types (SL, UL and RL), which are first introduced and defined, together with some of the associated techniques. According to our analysis, during the last three years, the works using ML for routing optimization in SDN have rapidly flourished, particularly those leveraging DRL. Nevertheless, most research works are based on simple prototypes and target very specific network scenarios (wired, centralized SDN, and compared to distributed routing algorithms based on latency and throughput), and they are hard to reproduce and compare. Thus, their evaluations are not completely meaningful and conclusive. We believe a sustained effort is needed to create an open ecosystem in which the different works support each other, instead of being proposed independently. Otherwise, most research efforts might never be implemented in practice. To this purpose, we conclude the survey with six sections including specific research directions for this field.

He also teaches SDN/NFV and data center networks. He has participated in different competitive research projects at the Madrid regional, national, and European levels. His research activity has been published in high-impact JCR-indexed journals, conferences, and workshops on networking technologies.

VOLUME 9, 2021