A Learning-Automata-Based Congestion-Aware Scheme for Energy-Efficient Elastic Optical Networks

The flexible nature of elastic optical networks (EONs) allows spectral resources to be used effectively for optical communication by allocating the minimum required bandwidth to network connections. Since the energy consumption of such networks scales with the magnitude of bandwidth demand, addressing the issue of energy wastage is important. This fact has a profound impact on the design of efficient schemes for energy-aware optical networks, and adaptivity arises as one of the most important properties of these networks. Learning Automata are Artificial Intelligence tools that have been used in networking algorithms where adaptivity to the characteristics of the network environment can result in significantly improved network performance. In this work, a new adaptive power-aware algorithm is introduced, which selectively switches off bandwidth-variable optical transponders (BVTs) under low utilization conditions to achieve energy efficiency. The proposed adaptive scheme makes use of Learning Automata to significantly reduce the total energy consumption while avoiding the onset of congestion. It monitors network congestion in terms of Bandwidth Blocking Probability (BBP), and the learning mechanism finds the amount of energy saving at which congestion is avoided while significant energy savings are still achieved. The proposed Learning Energy-Saving Algorithm (LESA) is evaluated via extensive simulations, which indicate that it achieves an energy saving of up to 50% compared to other energy-efficient solutions.


I. INTRODUCTION
The need for bandwidth is growing more than ever before as technology evolves [1], and bandwidth-dependent applications have increased at an unprecedented level. Future needs will also be driven by emerging capacity-demanding applications, including autonomous vehicles, the internet of things, high-bandwidth enhanced video and virtual reality. According to Cisco, global IP traffic stood at 122 Exabytes in 2017 and it is estimated that these numbers will triple by 2022 [2]. As a result, communication networks must provide greater capacity, become more adaptive during the implementation of new services, and be more flexible in terms of design and configuration. Networks need to be designed and built differently to provide more dynamic connectivity and to make better use of the available resources in the network.
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Wang.
Historically, the design of WDM optical networks has used a fixed grid plan to meet the requested traffic demand. Wavelengths of line rates 2.5, 10, 40, and 100 Gbps were all suitable for 50-GHz spacing in backbone networks and 100-GHz spacing in metro-core networks [3]. However, bit rates above 100 Gbps are unlikely to fit into this scheme [4].
Elastic optical networks, which are seen as the successor of WDM fixed networks, are considered the most suitable architecture for next generation backbone and metropolitan networks as they are characterized by high spectral efficiency and adaptability [5].
Elastic optical networks assume the use of tunable transponders and a flexible spectrum grid (or flex grid). These networks, based on orthogonal frequency-division multiplexing (OFDM) [6], support lightpaths with different bit rates by exploiting flexible grid technology, where the spectrum is split into frequency slots of 25 GHz, 12.5 GHz or less, compared to the coarser 50-GHz or 100-GHz splitting of traditional WDM networks. The slots are combined to create channels of the desired size, which do not overlap owing to the orthogonality of OFDM, using only the bandwidth that is strictly necessary for the transmission [7].
On the other hand, the energy consumed by the rapidly expanding infrastructure of ICT (Information and Communication Technology), in combination with the enormous increase in Internet users, poses a significant economic and environmental challenge [8], [9]. In detail, ICTs account for 20% of global electricity consumption, while it is estimated that internet-connected IT devices will produce 14% of global emissions within 20 years, according to the European Framework Strategy for Energy and Environmental Efficiency in the ICT sector [10]. Energy consumption has become particularly pressing in recent years, due to the problems resulting from its indiscriminate use at a global level. Therefore, a move towards green technologies and eco-friendly solutions has been made. Furthermore, the network infrastructure occupies a substantial part of the energy footprint of ICT, and this share is expected to grow over the years. In [11], M. Pickavet et al. mention that, for network equipment, the energy consumption growth rates are typically about 12% per year. Thus, the concept of energy-efficient (or green) networking has emerged as an important research topic [12].
This work supports the elastic architecture and aims at finding new power-oriented schemes that further reduce the energy consumption of an IP over EON. Within this study, a new adaptive power-aware algorithm is implemented, which selectively turns off BVTs, along with the corresponding IP router ports, to achieve energy efficiency under low utilisation conditions during network operation. A novel adaptive scheme is proposed, which makes use of Learning Automata to curtail the overall energy consumption while maintaining the BBP at low levels.
This paper is organized as follows. Section II overviews the related research in this field, while Section III provides insights into the network model. Sections IV and V present the power consumption model and the Learning Automata mechanism, respectively. The proposed heuristic scheme is analyzed in detail in Section VI. Simulation results are presented in Section VII. Finally, Section VIII concludes this article.

II. RELATED WORK
A multitude of published papers have considered energy efficiency in the design of IP over WDM optical networks [13]-[16]. An important approach to energy saving in IP over optical networks is the selective disabling of inactive network components when the traffic load is low, e.g., during night hours, while maintaining the network's vital functions and accommodating the residual capacity. In [17], energy-efficient solutions ranging from Mixed Integer Linear Programming (MILP) to heuristic algorithms are proposed. More specifically, when the traffic load is reduced, these schemes disable various network elements while maintaining a set of critical constraints such as full connectivity and maximum link utilization. The same authors in [18] evaluate the actual power consumption savings considering realistic traffic patterns on a real IP backbone network. A comprehensive comparison of four different protection schemes in a sleep mode scenario is presented in [19]. Also, a cognitive power management technique, which takes efficient decisions based on traffic prediction by switching off underutilized network elements, is proposed in [20]. In detail, three cognitive algorithms are proposed, depending on the elements they try to deactivate.
Besides the energy efficiency of IP over WDM optical networks, various research works concerning power minimization of elastic optical networks can be found in the literature. A fairly common, yet effective, method of energy saving is the extensive application of optical bypass, which reduces the number of energy-hungry optical-electrical-optical (O-E-O) conversions, as the signal can be transported, amplified and switched directly in the optical domain. In [21], energy-efficient traffic grooming in IP-over-elastic optical networks, taking into account sliceable optical transponders, is studied. MILP models, along with their corresponding heuristics, are implemented for each of three different types of BVTs and investigated in terms of energy efficiency. In detail, the authors study the energy efficiency of non-sliceable, partially-sliceable, and fully-sliceable BVTs, showing that the scheme becomes more energy efficient as the number of available slices per BVT increases. An online algorithm, namely Selene, which exploits the signal overlap technique for power savings in EONs, is developed in [22] and [23]. Energy-efficient virtual optical network embedding schemes over elastic optical networks, under dynamic as well as static scenarios, using an integer linear programming (ILP) model are developed in [24] and [25]. Two power-saving methods for data centers and transponders are implemented for node and link mapping, respectively. A study of energy efficiency in optical transport networks, comparing the performance of a flexible network grid based on OFDM with that of Wavelength Division Multiplexing (WDM) with Single Line Rate (SLR) and Mixed Line Rate (MLR) operation, is presented in [26]. In that article, energy-aware heuristic algorithms are proposed for resource allocation, both in static and dynamic scenarios with time-varying demands, for the elastic-bandwidth OFDM-based network and for WDM networks (with SLR and MLR).
The RMLSA heuristic used in OFDM-based networks uses a path evaluation method, while exploiting electrical grooming capabilities, in order to achieve energy gains. Transponders, EDFAs and optical cross connects are taken into account in the energy consumption calculations. An in-depth energy efficiency comparison between conventional path protection schemes for fixed-grid (WDM) and flexible-grid (EON) networks is provided in [27]. Also, an energy-efficient manycast routing and spectrum assignment algorithm in elastic optical networks supporting cloud computing applications is presented in [28].
Apart from the above mentioned techniques, a great number of published articles pertaining to artificial intelligence (AI) approaches in conjunction with energy efficiency issues in optical networks can be found in the literature [29], [30]. [31] proposes a heuristic method based on ant colony optimization to reduce network energy footprint by exploiting the basic principles of swarm intelligence for finding the most energy-efficient routes from source to destination nodes. In addition, a multi-objective genetic algorithm is proposed in [32] to design survivable virtual topologies in order to reduce both energy consumption and network congestion in WDM optical networks. A hybrid heuristic routing algorithm implemented to provide either energy efficiency or higher network performance is presented in [33]. The proposed heuristic consists of two modes i.e., oriented to energy efficiency or performance. The underlying functionality is provided by Learning Automata which can be configured to converge to paths that lead to energy efficiency or performance. Particle swarm optimization (PSO) algorithm is used, in [34], in order to solve the problem of resource allocation (RA) based on the signal-to-noise plus interference ratio optimization in a hybrid WDM/OCDM network under quality-of-service and power efficiency constraints.
In the light of the aforementioned remarks, the current work focuses on the network level, where energy-aware resource allocation may lead to significant energy savings, provided that the network BBP remains at low levels. In this paper, the trade-offs between power consumption and BBP when switching off underutilized network devices (i.e., BVTs) in an IP over EON are considered and estimated. The results section illustrates how affecting one of these two factors can influence the other. Results are obtained by the use of a Learning Automata mechanism which adapts to each traffic load while achieving a good trade-off between energy savings and BBP.

III. NETWORK MODEL

A. IP OVER EON ARCHITECTURE
The main aspect of an elastic optical network is that the bandwidth and modulation format of the optical channels can be elastically adjusted to the characteristics of the lightpaths. A typical IP over EON architecture, shown in Figure 1, is considered. The IP over elastic optical network consists of two layers, the IP layer and the optical layer. In the IP layer, each node is equipped with a central IP router, while the optical layer consists of the optical switching nodes connected with fiber optic cables. The optical layer provides the links between the IP routers. In each node, multiple traffic streams from the access network enter the IP router. Each IP router port is connected to the optical switching node through BVTs. At the starting point of a data transmission, the BVTs convert the electrical flows from the IP layer to optical flows (E/O conversion); the traffic then enters the optical domain and is routed over all-optical connections across the optical network. When the optical traffic traveling along the lightpath reaches its destination, the BVT converts the signal back to the electrical domain (O/E conversion) and the signal finally reaches the end point at the IP layer. Data are then forwarded and handled by the corresponding IP router. Finally, to enable optical signals to travel over long distances, erbium doped fiber amplifiers (EDFAs) are used in the fiber optic connections.

B. ELASTIC TRANSPONDER TECHNOLOGIES
Transponder technologies can be categorised into two types according to their degree of sliceability, as follows.

1) NON-SLICEABLE BVT
This type of transponder is designed to provide flexible lightpaths. An NS-BVT allows any optical channel with any spectral width and central frequency to be established without strictly following the ITU-T fixed grid. An NS-BVT has only one slice, which is exclusively used to serve one lightpath, and thus it is called non-sliceable. Due to its high available bandwidth, it is intended to serve future demands (e.g., 400 Gbps). However, it often suffers from low utilization. For example, a 400 Gbps NS-BVT works inefficiently with a 40 Gbps lightpath, since it is lightly loaded and uses only 10% of its capacity.

2) SLICEABLE BVT
To overcome the above inflexibility of the NS-BVT, the sliceable BVT (S-BVT) was proposed in the literature [35], [36]. Unlike the NS-BVT, this type of transponder, which is also designed to provide flexible lightpaths, allows more than one lightpath to be established in the same transponder. A physical transponder can be logically sliced into multiple sub-transponders, each of which can serve an independent lightpath between source and destination nodes without electrical processing at intermediate nodes. As a result, various optical flows can be aggregated into one optical transponder in order to improve its utilization. This feature of the S-BVT enables optical grooming, which can also significantly improve energy efficiency, since no new transponders are required for new connections to be accommodated. An example of optical grooming is shown in Figure 2. Should there be any available slices at the source and destination nodes, node A and node D respectively, lightpath λ2 can be optically groomed with the pre-established lightpaths λ1 and λ3.
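The slicing behaviour described above can be illustrated with a minimal Python sketch. The class name SliceableBVT, its methods and the lightpath labels are hypothetical names introduced here for illustration; the default of 10 slices is borrowed from the simulation parameters of this study. It is a toy model of slice bookkeeping, not a description of any real transponder.

```python
class SliceableBVT:
    """Toy model of a sliceable BVT: a pool of slices shared by lightpaths."""

    def __init__(self, num_slices=10):
        self.num_slices = num_slices
        self.lightpaths = {}  # lightpath id -> number of slices in use

    def free_slices(self):
        return self.num_slices - sum(self.lightpaths.values())

    def groom(self, lightpath_id, slices_needed):
        """Add a lightpath if enough free slices remain; return True on success."""
        if slices_needed <= 0 or slices_needed > self.free_slices():
            return False
        self.lightpaths[lightpath_id] = slices_needed
        return True

    def utilization(self):
        return 1.0 - self.free_slices() / self.num_slices


bvt = SliceableBVT()
bvt.groom("lambda1", 3)
bvt.groom("lambda3", 4)
groomed = bvt.groom("lambda2", 2)    # fits into the 3 remaining slices
rejected = bvt.groom("lambda4", 2)   # only 1 slice is left, so this request fails
```

In this sketch a failed groom call is the point at which a further transponder would have to be switched on.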

C. EON CONSTRAINTS
The issue of accommodating new connections in elastic optical networks is quite complicated. Unlike WDM networks, where each connection is assigned a wavelength, in an EON the frequency slots (FSs) are combined to form channels of appropriate size for every new incoming connection. When designing an IP over EON, one should select the route and the spectral resources for each connection request arriving at the network. This is known as the problem of Routing and Spectrum Assignment (RSA) [37], [38]. The EON implementation imposes two constraints on the RSA problem: (1) the spectrum continuity constraint, i.e., a connection must be allocated the same spectral resources on each link along the route, and (2) the spectrum contiguity constraint, i.e., a connection must be allocated contiguous FSs on each link along the route.
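The two constraints can be made concrete with a short Python sketch of a First Fit search over one candidate route. The function names first_fit and allocate, the boolean slot maps and the link labels are illustrative assumptions; the single guard-band slot follows the simulation parameters of this study.

```python
def first_fit(path_links, link_slots, demand_fs, guard_fs=1):
    """Return the first start index of a usable block, or None if blocked.

    path_links : list of link ids along the candidate route
    link_slots : dict mapping link id -> list of booleans (True = occupied)
    demand_fs  : number of frequency slots requested
    guard_fs   : guard-band slots added to the demand
    """
    width = demand_fs + guard_fs
    num_slots = len(link_slots[path_links[0]])
    for start in range(num_slots - width + 1):
        block = range(start, start + width)  # contiguity: adjacent slots only
        # continuity: the same block must be free on every link of the route
        if all(not link_slots[l][s] for l in path_links for s in block):
            return start
    return None


def allocate(path_links, link_slots, start, demand_fs, guard_fs=1):
    """Mark the chosen block as occupied on every link of the route."""
    for l in path_links:
        for s in range(start, start + demand_fs + guard_fs):
            link_slots[l][s] = True


# Tiny example: a two-link route with 8 slots per link
slots = {"A-B": [False] * 8, "B-C": [False] * 8}
slots["A-B"][0] = slots["A-B"][1] = True   # slots 0-1 busy on A-B
slots["B-C"][2] = True                     # slot 2 busy on B-C
route = ["A-B", "B-C"]
start = first_fit(route, slots, demand_fs=2)   # block of 3 slots incl. guard band
```

Here the search skips the blocks starting at slots 0, 1 and 2, because each of them is busy on at least one link, and settles on slot 3.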

IV. POWER CONSUMPTION ANALYSIS
Three main components which can influence the amount of power consumption on an IP Over EON are considered in this study, namely IP router ports, S-BVT and EDFAs.

A. IP-ROUTER PORT
A 400 Gbps IP router port, which connects the IP router to the optical transponder, is considered. An IP router port consumes 560 W (1) [21].

B. EON BANDWIDTH VARIABLE TRANSPONDER
The power consumption of a BVT can be expressed as in (2), according to [21] and [27]. TR represents the transmission rate of the optical transponder; in the case of a sliceable transponder, it is the sum of the transmission rates of all sub-transponders. An additional 20% of power consumption is considered as an overhead contribution for each transponder. Moreover, it is assumed that the energy consumption of the transmitter and that of the receiver are identical, each being equal to half of the power consumption of a transponder.

C. ERBIUM DOPED FIBER AMPLIFIER
Erbium Doped Fiber Amplifiers are considered as amplifiers in this study. The power of the EDFA is represented in Equation (3), in which X is the spectrum width to be amplified. An inline amplifier is deployed every 80 km along the fiber, while a post-amplifier and a pre-amplifier are required at the ends of the fiber link.
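As a rough illustration of the amplifier placement rule, the following Python sketch counts the EDFAs on one fiber link. The function name amplifiers_per_link is hypothetical, and the rounding convention (inline amplifiers only at interior 80 km points, none at the link ends) is an assumption, since the exact counting rule is not stated in the text.

```python
import math


def amplifiers_per_link(length_km, span_km=80.0):
    """Count EDFAs on one fiber link: inline amplifiers plus post- and pre-amplifier."""
    # Assumed rule: one inline amplifier at each interior multiple of span_km
    inline = max(0, math.ceil(length_km / span_km) - 1)
    return inline + 2  # post-amplifier at the start, pre-amplifier at the end


short_link = amplifiers_per_link(50)    # no inline amplifier needed
long_link = amplifiers_per_link(200)    # inline amplifiers at 80 km and 160 km
```

Multiplying this count by the per-amplifier power of Equation (3) would give the EDFA contribution of a link under the same assumption.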
The total power consumption is calculated by adding the total energy consumption of the BVTs, the EDFAs and the IP router ports (4). The contribution from the transponders and EDFAs is obtained by the addition of the energy consumed by each of the connections examined during the simulation; whereas the one from the IP router ports, which is fixed and common, is obtained by the total energy consumed by these network elements in the total simulated time. Furthermore, the energy consumption of the switch fabric is not considered in this study.
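The bookkeeping described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name total_energy and the tuple layout are hypothetical, the per-connection BVT and EDFA powers are taken as inputs (they would come from (2) and (3)), energy is assumed to be power multiplied by time, and the numbers in the example are placeholders rather than values from this study.

```python
IP_PORT_POWER_W = 560.0  # power of one 400 Gbps IP router port, as stated in the text


def total_energy(connections, active_ports, sim_time):
    """Add per-connection and fixed contributions (power x time).

    connections  : iterable of (bvt_power_w, edfa_power_w, holding_time)
    active_ports : number of IP router ports that are switched on
    sim_time     : total simulated time, in the same unit as holding_time
    """
    per_connection = sum((bvt + edfa) * hold for bvt, edfa, hold in connections)
    fixed = IP_PORT_POWER_W * active_ports * sim_time
    return per_connection + fixed


# Placeholder inputs, for illustration only
example = total_energy([(100.0, 20.0, 2.0), (150.0, 30.0, 1.0)],
                       active_ports=2, sim_time=10.0)
```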

V. LEARNING AUTOMATA MECHANISM
Learning Automata (LA) [39], [40] are artificial intelligence tools that can be applied to learn the characteristics of a system's environment. One major advantage of LA is that they do not need any knowledge of the environment in which they operate or any analytical knowledge of the task to be optimized. An LA is a finite state machine which improves its performance by interacting with the random environment in which it operates. Given the dynamic nature of a networking environment, LA are ideal for implementing adaptive protocols ([41]-[44], [33], [45]-[47]) by using network feedback information. The main purpose of an LA is to find, within a set of actions, the optimal one, that is, the action that causes the minimum average penalty received from the environment (this criterion is equivalent to maximizing the average reward received from the environment). The low computational complexity that an LA exhibits enables it to rapidly converge to the best action of the environment with which it interacts. Figure 3 illustrates the operation of a typical LA, in which there is a set of possible actions $a_1, a_2, \ldots, a_M$ as well as the corresponding probabilities $p$. $P(n) = \{p_1(n), p_2(n), \ldots, p_M(n)\}$ constitutes a vector which represents the probability distribution over the $M$ actions at each instant $n$. It holds that $\sum_{i=1}^{M} p_i(n) = 1$. At first, the LA has no specific knowledge about the environment in which it operates and, as a consequence, all initial probabilities are considered to be equal. At each instant $n$, an action $a_i$, $1 \le i \le M$, is selected with probability $p_i(n)$. The environment responds to the chosen action with a stochastic reaction $\beta_i(n)$, which is used to update the probability vector $P$. Upon completion of this update, the LA selects the next action based on the updated probability vector $P(n+1)$. This means that the probabilities of some actions are increased or decreased according to the feedback received from the environment.
After a few instances the LA selects the optimal action, which is the action that has the minimum penalty probability. That is, the LA decides by itself which action is better than the others.
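The general operation of an LA can be sketched in a few lines of Python. The sketch below assumes a linear reward-inaction update, a standard textbook scheme chosen here for illustration and not necessarily the one used in this work; the function names, the learning rate of 0.05 and the reward probabilities of the toy environment are all hypothetical.

```python
import random


def init_probabilities(num_actions):
    """All actions start with equal probability."""
    return [1.0 / num_actions] * num_actions


def select_action(probs, rng=random):
    """Draw an action index according to the probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reward(probs, chosen, rate):
    """Linear reward step: shift probability mass toward the chosen action."""
    return [p + rate * (1.0 - p) if i == chosen else p * (1.0 - rate)
            for i, p in enumerate(probs)]


# Toy environment: each action is rewarded with a fixed (hypothetical) probability
rng = random.Random(7)
reward_prob = [0.2, 0.4, 0.9]
probs = init_probabilities(len(reward_prob))
for _ in range(2000):
    a = select_action(probs, rng)
    if rng.random() < reward_prob[a]:   # favourable response from the environment
        probs = reward(probs, a, rate=0.05)
    # on an unfavourable response the vector is left unchanged (reward-inaction)
```

The update keeps the vector normalised, because the mass added to the chosen action equals the mass removed from the others.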

VI. THE LEARNING ENERGY SAVING ALGORITHM (LESA)

A. ADAPTIVE MODEL FORMULATION USING LEARNING AUTOMATA
In the context of this study, the LA is used to detect the optimal number of BVTs that should be switched off so that important energy savings are achieved while congestion is maintained at a low level. Congestion can be determined by the observed BBP, which is defined as the ratio of blocked bandwidth over total requested bandwidth. BBP is selected as it is a common metric of bandwidth usage efficiency in such networks. If other metrics were used to measure congestion, the operation of the algorithm and the results would show the same trends. Indicatively, if congestion is measured indirectly by observing the utilization of the network node resources, increased congestion would show up as increased utilization of these resources and would again translate to increased energy efficiency when compared to the lower node resource utilization incurred by more node equipment being in the on state. The LA would then just need to find a good trade-off between resource utilization and energy savings, as it does now for BBP and energy savings. Moreover, if congestion is measured indirectly by observing delay, increased congestion would show up as increased delay and would again translate to increased energy efficiency when compared to the lower delays incurred by more node equipment being in the on state. As before, the LA would just need to find a good trade-off between delay and energy savings, as it does now for BBP and energy savings. In short, there are two actions, which lead to the next or the previous state. In this way the LA decides, based on the corresponding probabilities, which transition will take place. Equations (5)-(12) correspond to the probability updating scheme of the learning automaton that was described in the previous section. At each cycle $n$, the basic choice probability $P$ of the selected action $a$ is updated according to the network feedback reaction.
$P_{+1}(t)$ refers to $action_{+1}$ and $P_{-1}(t)$ refers to $action_{-1}$, whereas the term state ($S$) refers to the number of BVTs when a percentage of the free BVTs have been removed from the network nodes (e.g., $S_5$ corresponds to the number of BVTs per node when 50% of the unused BVTs have been switched off), as can be seen in Figure 4. The LA can then choose, based on $P$, whether to increase the number of BVTs to be switched off per node ($S_{+1}$) by $action_{+1}$, or to decrease it ($S_{-1}$) by $action_{-1}$. At first, the ratio of energy savings to BBP is estimated for a specific state. In this work, energy savings are defined as the difference between the energy consumed without any network resource deactivation and the energy consumed in a specific state $S$. In detail, the LA selects a state and the energy savings (ES) are calculated, then the observed BBP (congestion) for the specific state is estimated and, finally, the ratio ES/BBP is assessed. Afterwards, the LA checks whether the ratio of this state is greater than the ratio calculated for the previous state. Should the ratio be greater than or equal to the previously estimated ratio, the basic choice probability of $a$ increases according to (5), (6), (9) and (10). Otherwise, the basic choice probability of $a$ decreases according to (7), (8), (11) and (12). The subscripts u (up) and d (down) in (5)-(12) are used for ease of comprehension, representing either the increase or the decrease in probability. Equations (5) and (6) are used when the LA has correctly selected $action_{+1}$ (increased ratio), while (9) and (10) are used when the LA has correctly selected $action_{-1}$ (increased ratio). On the contrary, (7) and (8) are used when the ratio is less than the previously estimated ratio for $action_{+1}$, while (11) and (12) are used for $action_{-1}$. $L$ is a parameter that governs the speed of the automaton convergence.
The lower the value of $L$, the more accurate the estimation made by the automaton, a fact however that comes at the expense of convergence speed. Two $L$ values are used in this study, $L_1 = 0.01$ and $L_2 = 0.05$. After receiving a rewarding response for the action in cycle $n$, the corresponding probability is increased using $L_1$, whereas after receiving a response that indicates a penalty, the probability is reduced using $L_2$.
For example, assume that at a certain cycle $n$ the LA is at state 3 ($S_3$), the corresponding actions are $action_{-1}$ with $P_{-1} = 0.45$ and $action_{+1}$ with $P_{+1} = 0.55$, and the ratio of ES to BBP is $r$. At cycle $n + 1$, the LA chooses the action with the greater probability, $P_{+1}$ ($action_{+1}$), and the new state is 4. Then the ratio $r'$ for state 4 is estimated and compared to the previously estimated ratio $r$. Should the ratio $r'$ be greater than $r$, the LA receives a rewarding response and, as a consequence, updates the probability scheme using (5) and (6). Finally, the LA is now at state 4 and the probabilities for $action_{-1}$ and $action_{+1}$ are $P_{-1} = 0.4455$ and $P_{+1} = 0.5545$ respectively. This procedure is repeated until the LA converges to a certain state.
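The numbers in this example can be reproduced with a small Python sketch. The reward rule below, P ← P + L1·(1 − P), is inferred from the example, since it maps 0.55 to 0.5545; the penalty rule, P ← P − L2·P, is an assumption made here for illustration, and the function name update is hypothetical.

```python
L1 = 0.01  # reward step, from the text
L2 = 0.05  # penalty step, from the text


def update(p_chosen, rewarded):
    """Return (new probability of the chosen action, new probability of the other)."""
    if rewarded:
        p_new = p_chosen + L1 * (1.0 - p_chosen)   # consistent with the worked example
    else:
        p_new = p_chosen - L2 * p_chosen           # assumed form of the penalty step
    return p_new, 1.0 - p_new


# State 3 -> state 4 with action +1 rewarded, starting from P(+1) = 0.55
p_plus, p_minus = update(0.55, rewarded=True)
print(round(p_plus, 4), round(p_minus, 4))  # 0.5545 0.4455
```

Because $L_2$ is five times larger than $L_1$, a single penalty under this assumed rule undoes several rewards.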

B. ALGORITHM DESCRIPTION
The main idea of the proposed algorithm, namely LESA, is the design of an energy-efficient scheme which reduces the total energy consumption during network operation by adaptively switching off a number of optical transponders, and consequently the IP router ports connected to them, in low-use scenarios, while observing network congestion so as not to affect the BBP. In the operation phase of the network, new variable-rate connection requests arrive dynamically and have to be served upon their arrival, one by one. The LESA algorithm consists of two separate periods. The first period is the observation phase of the algorithm, during which calculations are made regarding the utilization of the BVTs. The second period (learning phase) uses Learning Automata to estimate the relation between the energy savings achieved and the BBP under different numbers of excluded BVTs. Finally, the value indicated by the LA constitutes the best trade-off between the energy savings achieved and the BBP, that is, the number of BVTs to be switched off without significantly affecting the BBP. The complete notation used in the following algorithms is given in Table 1.
Algorithm 1 shows the pseudocode of the proposed algorithm LESA during the observation phase. During this period, the algorithm starts routing the traffic demands which arrive dynamically in the network. LESA calculates the shortest paths between the node pairs, using the k-shortest path method, and routes the demands according to the First Fit algorithm, while ensuring the continuity and contiguity constraints. During this period, the existing BVTs on the physical topology, as well as the BBP, are monitored for a fixed number of arrivals. The transmitter and receiver utilization percentages are calculated for each node in the physical topology. Afterwards, the mean BVT utilization per node is estimated.
In the final step of this period, the power consumption of the IP router ports, BVTs and EDFAs, using (1), (2), (3) and (4), as well as the BBP of the initial physical topology, are estimated. Additionally, the algorithm outputs the number of free BVTs and the total number of BVTs per node after a certain percentage of the free BVTs have been removed. The output of the observation phase is used as input to the second phase of the algorithm, the decision making with the Learning Automaton (learning phase).
Algorithm 2 shows the pseudocode of the proposed algorithm LESA during the learning phase with the learning automata mechanism. Tran[i][x] is an array which holds the number of BVTs per node, where x indicates the node on the physical topology, when i% of the free BVTs have been removed, i.e., i = 0%, 10%, 20%, ..., 100%. This array corresponds to the states in which the learning automaton can be found. In detail, $S_2$ corresponds to i = 20%, while $S_8$ corresponds to i = 80%. The LA may choose either to increase the number of unused BVTs to be switched off from the physical topology, $action_{(+1)}$, with $P_{(+1)}$, or to decrease it, $action_{(-1)}$, with $P_{(-1)}$. Firstly, the algorithm randomly chooses a state $sr$ and calculates the BBP for this state, as well as the energy gains (ES) of this state in comparison to $PC_{total}$ given by phase 1. Then the algorithm retrieves the action with the highest probability from ActionVector (lines 10 and 23), $action_{(+1)}$ or $action_{(-1)}$, which corresponds to $S_{sr+1}$ or $S_{sr-1}$ respectively, runs the simulation for a fixed number of arrivals and estimates the new BBP (BBP') for the current state, as well as the energy gains (ES') of this state in comparison to $PC_{total}$ given by phase 1. The connection requests are routed as in phase 1, with the k-shortest path routing algorithm using the FF scheme, without any rerouting of the traffic. Should the ratio ES'/BBP' be greater than or equal to ES/BBP, the learning automaton rewards the selected action: (5) and (6) are applied when the LA rewards the increment of the switched-off BVTs, while (9) and (10) are applied when the LA rewards the decrement of the switched-off BVTs. Otherwise, the LA updates the probability scheme according to (7), (8), (11) and (12). By the end of this period, the algorithm ends up (convergence of the LA) with the estimated percentage of switched-off BVTs (S) for which the BBP is not significantly affected.
VOLUME 8, 2020
The alternation of BVT states between active and inactive takes place sparsely in time, so that the overall performance of the network is not affected. For example, while the LA is running, the network remains in the same state for a large number of incoming connections until the next transition to a different state, depending on the LA decision to either increase or decrease the number of inactive elements, as described before.
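The per-node BVT counts produced at the end of the observation phase can be sketched in Python as below. The function name build_states and the dictionary layout are hypothetical; taking the number of BVTs observed in use as the basis for the free count, and rounding the removed share down to an integer, are both assumptions, since the exact rule is given only in Algorithm 1.

```python
def build_states(total_bvts, used_bvts, step=10):
    """Return {percent: [BVTs left switched on per node]} for 0, 10, ..., 100 percent.

    total_bvts : installed BVTs per node
    used_bvts  : BVTs observed in use per node during the observation phase
    """
    states = {}
    for percent in range(0, 101, step):
        states[percent] = [
            total - ((total - used) * percent) // 100   # removed share, rounded down
            for total, used in zip(total_bvts, used_bvts)
        ]
    return states


# Three nodes with 15 BVTs each, of which 5, 10 and 15 were seen in use
states = build_states([15, 15, 15], [5, 10, 15])
half = states[50]    # state with 50% of the free BVTs switched off
```

A node whose BVTs were all in use, such as the third one here, keeps all 15 in every state.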
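The learning phase can be summarised by the following Python sketch, which is a simplified reading of the description above and not a transcription of Algorithm 2. The callable evaluate stands in for one simulation run and is hypothetical, as are the function name learning_phase, the single shared pair of action probabilities, the forced move at the two end states, the guard against a zero BBP and the penalty rule P ← P − L2·P.

```python
import random

L1, L2 = 0.01, 0.05   # reward and penalty steps, from the text
NUM_STATES = 11       # S0..S10, i.e. 0% to 100% of the free BVTs switched off


def learning_phase(evaluate, iterations=100, rng=random):
    """evaluate(state) -> (energy_savings, bbp), standing in for one simulation run."""
    probs = {+1: 0.5, -1: 0.5}
    state = rng.randrange(NUM_STATES)            # random starting state
    es, bbp = evaluate(state)
    ratio = es / max(bbp, 1e-9)                  # guard against BBP = 0 (assumption)
    for _ in range(iterations):
        if state == 0:
            action = +1                          # only feasible move (assumption)
        elif state == NUM_STATES - 1:
            action = -1
        else:
            action = +1 if probs[+1] >= probs[-1] else -1
        new_state = state + action
        es, bbp = evaluate(new_state)
        new_ratio = es / max(bbp, 1e-9)
        if new_ratio >= ratio:                   # reward the chosen action
            probs[action] += L1 * (1.0 - probs[action])
        else:                                    # penalise the chosen action
            probs[action] -= L2 * probs[action]
        probs[-action] = 1.0 - probs[action]
        state, ratio = new_state, new_ratio
    return state, probs


# Toy stand-in for the simulator: savings grow linearly, blocking grows faster
final_state, final_probs = learning_phase(
    lambda s: (5.0 * s, 0.01 + 0.002 * s * s), rng=random.Random(1))
```

The default of 100 iterations mirrors the LA training number listed among the simulation parameters.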

C. DATA COLLECTION
The network control architecture of Figure 5 is assumed in this study. A central Network Controller can collect data both from the IP and optical layer and accept control messages so as to adapt the virtual network topology (VNT). At the optical layer, the Controller collects data regarding BVT utilization

VII. PERFORMANCE EVALUATION

A. STUDY CASE
In order to evaluate the performance of the proposed algorithm, a set of simulation experiments was conducted. To estimate the overall power consumption of different design solutions, one network topology was considered. Figure 6 shows a metropolitan mesh network [48], which consists of 29 nodes and 41 links (with corresponding distances), for each of which two directions are considered.

B. SIMULATION PARAMETERS AND ASSUMPTIONS
An elastic optical network simulator has been implemented, using Python 3.7 on Spyder (The Scientific Python Development Environment). The complete set of parameters used in this study is listed below.
1) The number of FSs on a fiber equals 160. This is a typical value which is used in many previous works. Each link is bidirectional (with two unidirectional fibers). The granularity of an FS is 25 GHz.
2) One FS is considered as a guard-band associated with each of the connections.
3) Connection requests follow a Poisson process with an average connection inter-arrival time (IAT) equal to 1 (λ), while their holding time follows a negative exponential distribution with mean value (1/µ). The latter is tuned to achieve the desired traffic load [49].
4) The number of FSs per connection follows a uniform distribution. Each new incoming connection can take any value from 1 to 9 with a uniformly distributed probability [49].
5) The source and destination nodes of a request are randomly and independently selected from the network topology.
6) K-shortest path, with k=3, and the widely known First Fit scheme are used for solving the RSA problem.
7) A 400 Gbps sliceable BVT is considered in this study, which can launch 10 sub-carriers (a sub-carrier is associated with an FS), enabling optical grooming; each sub-carrier can carry a 40-Gbps signal. The number of sliceable BVTs, as well as of the corresponding IP router ports, per node is assumed to be 15.
8) At first, it is assumed that all the devices are in the on state.
9) The offered load is determined by λ/µ (Erlangs).
10) The modulation format used in every connection is assumed to be the same during the whole simulation.
11) Spectrum continuity and contiguity constraints are ensured for each connection.
12) Results presented below are averaged over 3 × 10^5 connection requests per simulation.
13) The LA training number is assumed to be 100.
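The traffic model in items 3), 4), 5) and 9) can be sketched with the Python standard library as follows. The function name generate_requests and the tuple layout are hypothetical, and forcing the source and destination nodes to differ is an assumption of this sketch.

```python
import random


def generate_requests(n, load_erlangs, num_nodes=29, rng=random):
    """Yield (arrival_time, holding_time, src, dst, num_fs) for n requests."""
    arrival_rate = 1.0                            # mean inter-arrival time of 1
    mean_holding = load_erlangs / arrival_rate    # offered load = lambda / mu
    t = 0.0
    for _ in range(n):
        t += rng.expovariate(arrival_rate)              # Poisson arrival process
        holding = rng.expovariate(1.0 / mean_holding)   # exponential holding time
        src, dst = rng.sample(range(num_nodes), 2)      # distinct nodes (assumption)
        num_fs = rng.randint(1, 9)                      # uniform over 1..9 slots
        yield t, holding, src, dst, num_fs


requests = list(generate_requests(1000, load_erlangs=50, rng=random.Random(0)))
```

Passing a seeded random.Random instance, as in the last line, makes a run repeatable.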

C. SIMULATION RESULTS
The performance metrics, namely energy consumption, BBP, energy savings, and the convergence of the LA, were evaluated on the metropolitan optical network of Figure 6.
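For reference, two of these metrics can be computed as sketched below. The BBP is taken here in its usual sense, the ratio of blocked bandwidth to total requested bandwidth, with bandwidth counted in FSs; the energy saving is expressed relative to a baseline. The function names and the `(requested_slots, accepted)` record layout are assumptions made for this illustration.

```python
def bandwidth_blocking_probability(outcomes):
    """Return blocked bandwidth / requested bandwidth.

    outcomes: iterable of (requested_slots, accepted) pairs (assumed layout).
    """
    requested = 0
    blocked = 0
    for slots, accepted in outcomes:
        requested += slots
        if not accepted:
            blocked += slots
    return blocked / requested if requested else 0.0


def energy_saving_percent(baseline_energy, scheme_energy):
    """Energy saving of a scheme relative to a baseline, in percent."""
    return 100.0 * (baseline_energy - scheme_energy) / baseline_energy


# Toy data: four requests, of which the 9-slot one is blocked.
outcomes = [(4, True), (9, False), (2, True), (5, True)]
bbp = bandwidth_blocking_probability(outcomes)
saving = energy_saving_percent(200.0, 100.0)
```

In this toy example, 9 of the 20 requested slots are blocked, so the BBP is 0.45, and a scheme that consumes half of the baseline energy yields a saving of 50%.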
To measure the energy-saving potential of LESA, a routing and spectrum assignment approach whose energy efficiency lies in the use of sliceable BVTs and optical grooming [21], namely the Elastic case [49], was implemented. This algorithm routes the incoming requests using the First Fit strategy without deactivating any network resources, exploiting the advantages of EONs. Figure 7 illustrates the total energy consumption versus the offered load (Erlangs) for LESA and the Elastic case algorithm. Each point in the graph for the LESA algorithm corresponds to the energy consumption obtained with the state (S) selected by the LA mechanism after the training time (the x symbol indicates the state S that the LA finds). The energy consumption of both methods rises in a similar way as the offered load increases. It is worth noting that LESA always outperforms the reference Elastic case algorithm. The corresponding power savings are summarized in Figures 9a, 10a, 11a and 12a; they amount to up to 50%, 29%, 33% and 32% for 50, 100, 150 and 200 Erlangs, respectively.
For Figures 9a, 10a, 11a and 12a, the simulation was carried out for 3 × 10^5 connections. For each traffic load, two subfigures are presented: the first (9a, 10a, 11a and 12a) shows the percentage of energy savings under the different states of the algorithm versus the BBP, whereas the second (9b, 10b, 11b and 12b) depicts the convergence of the basic choice probabilities of the LA towards the different levels of energy saving. The arrows in the line graphs (Figures 9a, 10a, 11a and 12a) indicate an increase or decrease in the ratio ES/BBP compared to the previous state; for example, the ratio of S6 is greater than that of S5 in Figure 9a. Should two convergence values of the LA be too close to each other, both values are considered acceptable.
For example, under an offered load of 100 Erlangs, where the BBP in S4 and S5 is almost the same and does not exceed 0.3%, both states are considered best or near best. The final state's percentage is estimated as the average of both states. For the sake of uniformity, S4 was used in the diagrams for 100 Erlangs. Under an offered load of 50 Erlangs, an energy saving from 6% to 90% is achieved, while the BBP ranges from 0% to 66%, for states S1 = 10% to S10 = 100%, respectively (Figures 9a and 9b). More specifically, for the first six states (S1-S6) the BBP remains at zero, while the power gain rises progressively as the number of switched-off BVTs increases, since a significant number of BVTs and router ports are deactivated. For the remaining states, the curve changes significantly, as the BBP grows at a faster rate than the energy savings. As a result, the most acceptable share of free BVTs to switch off is 60%, i.e. S6. If 60% of the free BVTs per node are switched off, a power saving of about 50% is achieved while the BBP remains at zero. These observations are confirmed by Figure 9b, which reports the convergence of the LA. As can clearly be seen, the LA chooses the preferred value S6 most of the time, with a percentage of 49%.
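To illustrate how a learning automaton can concentrate its choice probabilities on the largest energy-saving state that keeps the BBP at zero, the sketch below uses a generic linear reward-inaction update in a toy environment. It is not claimed to be the update rule or the feedback signal used by LESA. The per-state values are hypothetical: only the end points (6% and 90% saving, 66% BBP) and the S6 saving of about 50% echo the 50-Erlang discussion above, while the intermediate values, the reward rule, the learning rate and the seed are assumptions of this example.

```python
import random

# Hypothetical per-state values for S1..S10, in percent (see lead-in).
ENERGY_SAVING = [6, 15, 24, 33, 42, 50, 60, 70, 80, 90]
BBP = [0, 0, 0, 0, 0, 0, 5, 20, 40, 66]


def run_automaton(iterations=5000, rate=0.01, seed=1):
    """Generic linear reward-inaction automaton in a toy environment."""
    rng = random.Random(seed)
    n = len(ENERGY_SAVING)
    probs = [1.0 / n] * n  # start from uniform choice probabilities
    for _ in range(iterations):
        action = rng.choices(range(n), weights=probs)[0]
        # Toy feedback: a state that causes blocking is never rewarded;
        # otherwise the reward probability equals the saving fraction.
        if BBP[action] == 0:
            reward_prob = ENERGY_SAVING[action] / 100.0
        else:
            reward_prob = 0.0
        if rng.random() < reward_prob:
            # Reward: move probability mass towards the chosen state.
            # Penalty: leave the probabilities unchanged (inaction).
            probs = [p + rate * (1.0 - p) if i == action else p * (1.0 - rate)
                     for i, p in enumerate(probs)]
    return probs


probs = run_automaton()
best_state = max(range(len(probs)), key=probs.__getitem__) + 1  # 1-based
```

Because states with non-zero BBP are never rewarded in this toy setting, their probabilities can only shrink, so the most probable state always has zero BBP. With a small learning rate, the automaton tends to favour the zero-BBP state with the largest saving.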
Similar results can be seen in Figures 10, 11 and 12 as the network load increases. The LA mechanism used by the LESA algorithm always manages to find the most acceptable value for each network traffic demand. In detail, the obtained value for 100, 150 and 200 Erlangs is S4, S5 and S5, respectively. In each case it can clearly be seen that the LA constantly moves around the desired state. From the algorithm description, it is evident that a small part of the network's resources is sacrificed in order to achieve lower power consumption. Nevertheless, extensive simulation results show that this sacrifice is minimal relative to the energy savings attained. To support this, the BBP, in linear scale, versus the increasing offered load is depicted in Figure 8. The BBP of both algorithms increases with the traffic load. As can be seen, the BBP is the same for LESA and the Elastic case algorithm as long as the offered load does not exceed 100 Erlangs. At higher offered loads, as expected, the Elastic case algorithm results in a lower BBP, since the lightpaths have more chances of being accommodated in a network with a greater number of BVTs. However, the proposed algorithm saves significant amounts of energy without significantly increasing the BBP, which in the worst case, i.e. 200 Erlangs, changes by 2%.
Figure 13 illustrates the adaptability of the LESA algorithm under variable network load. In more detail, the algorithm starts with an offered load of 50 Erlangs, which gradually increases to 200 Erlangs. The values estimated by LESA in Figures 9b, 10b, 11b and 12b are the desired values and appear on the graph as a straight line constituting the upper limit of selection for the specific state S_i (i = 0%, 10%, 20%, ..., 100%), while LESA VL shows the progress of the algorithm as the traffic demand varies dynamically.
Specifically, LESA VL (LESA Variable Load, i.e. the same LESA algorithm applied to a network with variable load) describes the rate at which the preferred value is achieved as the network operates, compared to LESA. For each traffic load, two instances of the proposed algorithm are plotted on the graph. For the first instance on the curve (81% for 150 Erlangs), the algorithm operates with the learning information retrieved from the previous load (100 Erlangs), while for the second instance (88%), it operates with the information of the current traffic load (150 Erlangs). The initial fall of the curve and its subsequent rise are due to the fact that the algorithm at first operates with outdated information in the learning mechanism, and then progressively approaches the desired value as it uses the updated information in the learning process. For example, when the offered load changes to 200 Erlangs, the LA at first selects the preferred value with a percentage of 78% compared to LESA, until it finally adjusts to this value with a percentage of about 91%. Similar results are depicted for 50, 100 and 150 Erlangs.
To study the applicability of the proposed algorithm, another network scenario was tested. The USnet backbone network, which consists of 25 nodes and 43 links, was considered for additional evaluation (Fig. 14). The actual distances, in kilometers, between neighboring nodes are shown in the figure. Figures 15 and 16 depict the total energy consumption versus the offered load and the BBP versus the offered load, respectively, for LESA and the Elastic case algorithm. As can be seen, the power consumption of the compared methods increases in a similar way as the offered load increases. This is explained by the fact that the existing lightpaths live longer in the network due to the longer duration of the connections. As in the metropolitan network scenario, LESA never under-performs the Elastic case algorithm. The corresponding power savings for LESA reach up to 49%, 22%, 21% and 15% for 50, 100, 150 and 200 Erlangs, respectively. The values obtained for 50 to 200 Erlangs are depicted in Figures 17, 18, 19 and 20. The difference in energy gains between the two network topologies (backbone vs. metro) is due to the fact that a network with fewer nodes, such as the reference USnet network, needs more active BVTs to serve the requests without affecting the overall BBP, as more connections start from each node under the same traffic. This is why the LA ends up choosing states that correspond to lower energy savings than in the metropolitan network.
Finally, Figures 21 and 22 illustrate the breakdown of energy consumption among IP router ports, SBVTs and EDFAs in the metropolitan and USnet network topologies, respectively, comparing the Elastic case and LESA algorithms. The power consumption of the IP router ports is fixed, since it is only affected by the number of SBVTs connected to them. As expected, their energy consumption remains constant under the Elastic case algorithm, which does not deactivate any network devices. Under the LESA algorithm, the energy consumption of the IP ports is also constant but at lower levels, as it is directly affected by the deactivation of the SBVTs. The lowest energy consumption is observed at low load values for both topologies, since at those loads, i.e. 50 Erlangs, the algorithm achieves the lowest number of active BVTs. Furthermore, for SBVTs and EDFAs, the results presented in Figs. 21 and 22 show a common trend for both network scenarios: power consumption increases in both network elements. The energy consumption increases with the traffic load, with the SBVTs being the dominant element of the two. This is explained by the fact that the existing lightpaths live longer in the network due to the longer duration of the connections. However, the overall distribution of power consumption among the presented network elements tends to change when the LESA algorithm is applied. In detail, while the SBVTs continue to be the dominant elements, their power consumption is noticeably lower due to LESA's capacity to adaptively switch off the appropriate number of SBVTs at the network nodes without affecting the BBP. On the other hand, the power consumed by the EDFAs remains at similar levels under both algorithms for each traffic load.
This is explained by the fact that the energy consumption of the EDFAs depends on the number of blocked connections (i.e., fewer served connections mean lower power consumption), and it remains stable as long as the BBP, and consequently the number of served lightpaths, is not affected.
It is worth mentioning that the LA used in the Learning Phase of the algorithm executes only a few instructions in each iteration. Taking into account the typical power consumption of a modern CPU (e.g., 165 W for an Intel Core i7 under load), this translates to negligible computational energy consumption. On the other hand, the energy gains from LESA are substantial; e.g., on the USnet backbone it achieves energy gains of tens of kilowatt-hours (from 50 kWh to 125 kWh, depending on the traffic load). Thus, significant energy savings are achieved at the cost of a few brief uses of a device with a power consumption in the order of 165 W. Additionally, such energy gains are achieved from the moment the LA starts to operate (learning phase): significant amounts of energy are saved due to the reduced active equipment in the network after each iteration (even if the savings at each iteration are not yet the maximum, as the algorithm has not ended) until it finally reaches the most efficient state of active/inactive BVTs and terminates.
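The order-of-magnitude argument above can be checked with a short calculation. The one-hour runtime is a deliberately pessimistic assumption introduced here, since the text only states that the LA executes a few instructions per iteration; the 165 W and 50 kWh figures are those quoted above.

```python
CPU_POWER_W = 165.0      # CPU power under load, as quoted in the text
ASSUMED_RUNTIME_H = 1.0  # assumption: CPU fully loaded for a whole hour
MIN_SAVING_KWH = 50.0    # smallest USnet energy gain quoted in the text


def cpu_energy_kwh(power_w, hours):
    """Energy in kWh drawn by a device of the given power over the given time."""
    return power_w * hours / 1000.0


overhead_kwh = cpu_energy_kwh(CPU_POWER_W, ASSUMED_RUNTIME_H)
overhead_share = overhead_kwh / MIN_SAVING_KWH
print(f"{overhead_kwh:.3f} kWh overhead, {overhead_share:.2%} of the saving")
```

Even under this assumption, the computation overhead is 0.165 kWh, about 0.33% of the smallest reported gain.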

VIII. CONCLUSION
In this work, a new adaptive power-aware algorithm is introduced, which selectively switches off BVTs under low utilization scenarios to improve energy efficiency. The main idea of the presented scheme is the deactivation of a number of BVTs on network nodes after a period of observation of the network activity. The proposed algorithm makes use of Learning Automata (LA) in a mechanism that selectively switches off BVTs in low-load scenarios to achieve energy gains. Based on the observation of the BBP, the LA aims to find the most acceptable number of BVTs that should be switched off so that there is a noticeable increase in energy savings without affecting the BBP. Extensive simulation results indicate that the proposed scheme can achieve an energy saving of up to 50% compared to other energy-efficient solutions. The use of Learning Automata for energy saving in elastic optical networks can be the basis of a new family of elastic networks that are capable of operating efficiently, with low energy consumption, under any load conditions.

GEORGIOS I. PAPADIMITRIOU (Senior Member, IEEE) received the Diploma and Ph.D. degrees in computer engineering and informatics from the University of Patras, in 1989 and 1994, respectively. In 1997, he joined the Faculty of the Department of Informatics, Aristotle University of Thessaloniki, Greece, where he currently serves as a Full Professor. He is also a Professor with the Department of Informatics, Aristotle University of Thessaloniki. He has published 131 articles in peer-reviewed journals (57 in the IEEE Journals) and 136 articles in international conferences. He is the author of three books published by Wiley and an editor of a book published by Kluwer/Springer. He has participated in 21 research projects, some of which as a team leader or a coordinator. He serves as an evaluator for international and national research and development programs.
His major research interests are wireless networks, optical networks, learning algorithms, and application of learning algorithms in communication networks. He has served as the Chair/TPC Chair for four international conferences, a TPC Member for 61 international conferences, and a reviewer for 36 scientific journals. He has served as an Associate Editor for the IEEE Network, the IEEE TRANSACTIONS ON

EMMANOUEL VARVARIGOS received the Diploma degree in electrical and computer engineering from the National Technical University of Athens (NTUA), in 1988, and the M.S. and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology (MIT), in 1990 and 1992, respectively. He was an Assistant and later an Associate Professor with the ECE Department, University of California, Santa Barbara, from 1992 to 1998. He was an Associate Professor with the ECE Department, Delft University of Technology, The Netherlands, from 1998 to 2000. He was a Professor with the CEID, University of Patras, Greece, from 2000 to June 2015. Since June 2015, he has been a Professor with the ECE Department, National Technical University of Athens. From 2003 to 2016, he was also the Scientific Director of the Greek School Network, Network Technologies Division, Computer Technology Institute Diophantus (CTI), which, through its involvement in pioneering research and development projects, has a major role in the development of network technologies and telematic services in Greece, and is responsible for the development and operation of the Greek School Network, the largest public network in Greece. From January 2017 to December 2018, he was also a Professor and the Head of the Electrical and Computer Systems Engineering Department, Monash University, on leave of absence from NTUA. He has also worked as a Researcher with Bell Communications Research, and has consulted with several companies in the U.S. and in Europe.
He has participated in more than 30 U.S.- and EU-funded research projects, and in many national research projects, and has been the consortium coordinator in six of them. His research interests are in the areas of optical communication networks, optical interconnects for data centers, network protocols, grid and cloud computing, smart energy grids, and network services.