Reconfiguration of Optical-NFV Network Architectures Based on Cloud Resource Allocation and QoS Degradation Cost-Aware Prediction Techniques

The long time required to deploy cloud resources in Network Function Virtualization network architectures has led to the proposal and investigation of algorithms for predicting traffic or the necessary processing and memory resources. However, whatever approach is taken, a prediction error is inevitable. Two types of prediction error can occur, and they have a different impact on the increase in network operational costs. When the predicted values are higher than the real ones, the resource allocation algorithms allocate more resources than necessary, which introduces an over-provisioning cost. Conversely, when the predicted values are lower than the real ones, fewer resources are allocated, which leads to a degradation of QoS and introduces an under-provisioning cost. When over-provisioning and under-provisioning costs differ, most of the prediction algorithms proposed in the literature are not adequate because they minimize the mean square error or other symmetric cost functions. For this reason we propose and investigate a forecasting methodology that introduces an asymmetric cost function capable of weighing the costs of over-provisioning and under-provisioning differently. We have applied the proposed forecasting methodology to resource allocation in a Network Function Virtualization architecture where the Network Function Virtualization Infrastructure Points-of-Presence are interconnected by an elastic optical network. We have verified cost savings of 40% compared to solutions that minimize the mean square error.


I. INTRODUCTION
Network Function Virtualization (NFV) [1]-[3] was introduced a few years ago to reduce the software maintenance and updating costs of traditional middleboxes. It is based on the execution of Virtual Machines (VMs) in datacenters called Network Function Virtualization Infrastructure Points-of-Presence (NFVI-PoPs). The VMs execute software implementing network functions. A network service is composed of a set of network functions referred to as a Service Function Chain (SFC). The NFV paradigm has the advantage of allowing a dynamic and flexible allocation of the resources (processing, memory and disk resources) necessary to support a service function. The resource allocation problems have been widely investigated [4]-[11], as has the problem of interconnecting the NFVI-PoPs with optical networks [12]-[17] when high bit rate SFCs must be supported. While NFV allows for a flexible reconfiguration of cloud resources [18], an Elastic Optical Network (EON) allows for a flexible reconfiguration of bandwidth resources thanks to the allocation of consecutive frequency slots of 6.25 GHz or 12.5 GHz [19]-[21].
The associate editor coordinating the review of this manuscript and approving it for publication was Bong Jun David Choi.
Two approaches are possible for the reconfiguration of resources:
• reactive approach: the reconfiguration is triggered only after a traffic variation has occurred; this technique is ineffective due to the long time needed to reconfigure cloud resources, which can take tens of minutes and cause Quality of Service (QoS) degradation;
• proactive approach: the reconfiguration is triggered in advance based on a prediction of the traffic or of the necessary resources [28].
Among the reactive approaches, Ghaznavi et al. [29] propose consolidation algorithms based on horizontal scaling techniques in which the processing capacity is varied by increasing/decreasing the number of Virtual Network Function Instances (VNFIs) without changing the processing capacity allocated to each VNFI. An over-sized static resource allocation to the VNFIs may avoid reconfigurations, but such over-allocation is financially beneficial only in scenarios with low cloud resource costs, when the objective of the network provider is solely to avoid reconfigurations so as not to pay QoS degradation penalties to users.
To avoid complex NFV state management issues, solutions based on vertical scaling techniques [30], in which the VNFIs are dimensioned to achieve the processing capacity required by the traffic, have also been investigated; when the traffic increases/decreases, rather than adding/removing VNFIs, their processing capacity is increased/decreased.
The dynamic allocation of resources to the VNFIs leads to high reconfiguration times if the operation is not performed in advance. For this reason the most recent research on the reconfiguration of NFV network architectures follows the proactive approach and is based on the application of Artificial Intelligence (AI) techniques [28].
Unfortunately, prediction techniques cannot predict exactly the traffic or the necessary bandwidth and cloud resources, because there is always an innovation component that cannot be predicted. When prediction errors occur, there is an increase in operational cost, which can be of two types:
• Over-Provisioning (OP) cost: the cost of the additional bandwidth and cloud resources that the network operator needs to use when the prediction algorithm overestimates the traffic or the needed resources;
• Under-Provisioning (UP) cost: the cost of the compensation that the network operator has to pay to the user for the QoS degradation that occurs when the prediction algorithm underestimates the traffic or the needed resources.
The costs mentioned above may have a different impact on the operational cost, depending on the type of service to be supported. The prediction-based reconfiguration techniques proposed in the literature fail to minimize the operational cost because they are mainly based on the minimization of a symmetric error function (e.g. Mean Squared Error (MSE), Mean Absolute Error (MAE), ...) and therefore cannot weigh the UP and OP costs differently.
The main contribution of this work is to propose a prediction technique which, aware of the fact that traffic cannot be accurately predicted, tries to overestimate or underestimate traffic in relation to the values of OP and UP costs. This objective is achieved by minimizing an asymmetric cost function characterized by a parameter that takes into account the OP and UP costs.
The proposed methodology can be applied to any prediction technique, both traditional ones and those based on AI. In this article we illustrate the proposed methodology for the following two prediction techniques: the first one is based on traditional Seasonal Autoregressive Integrated Moving Average (SARIMA) models; the second one is based on Long Short Term Memory (LSTM) neural networks.
To the best of our knowledge, only Bega [31]-[33] proposes a solution for Mobile Network Resource Orchestration in which the different values of the OP and UP costs are taken into account. A Deep Cognitive framework is proposed for the allocation of resources to slices in a 5G mobile environment. It is based on a deep learning technique in which the cost function attributes a cost that rises with the amount of over-allocated resources and a constant penalty, independent of the amount of lost traffic, when a QoS degradation occurs.
In this article we also propose a solution in which OP and UP costs are considered. Our work differs from [31]-[33] in the following points:
• our solution is proposed for an NFV network scenario where the data center and network model is detailed and the application is mainly based on the NFV implementation of middleboxes;
• the proposed procedures are based on SARIMA and LSTM prediction techniques, which are different from those proposed in [31], where convolutional neural networks are considered;
• the cost functions are characterized by parameters that can be set so that the cost penalty is not necessarily constant but related to the amount of traffic lost during the under-provisioning periods.
The paper is organized as follows. The related work is discussed in Section II. We describe the problem statement in Section III. The SARIMA traffic forecasting technique based on an asymmetric cost function is illustrated in Section IV. The asymmetric LSTM traffic forecasting technique is described in Section V. The numerical results, reported in Section VI, show the effectiveness of the proposed technique with respect to traditional MSE-based forecasting techniques in an NFV network environment. Appendix A describes an extension of the European Telecommunications Standards Institute (ETSI) NFV architecture for the support of the proposed prediction and reconfiguration algorithms. Appendix B reports the evaluation of a parameter of the SARIMA-based forecasting technique with asymmetric cost function.

II. RELATED WORK AND RESEARCH MOTIVATION
Reactive resource reconfiguration approaches [12]-[14], [26] in NFV networks have shown all their limits in terms of QoS degradation due to the long time needed to change the allocated cloud resources (increase/decrease of allocated cores, instantiation/removal of VNFIs, ...) [28]. For this reason the focus has recently been on a proactive approach where cloud resources can be reconfigured in advance thanks to a prediction of the traffic [34] or of the allocated resources [28]. A traffic prediction-based approach is proposed in [34], [35] for NFV networks in which the NFVI-PoPs are interconnected by an EON; the SFC traffic parameters are predicted in [36], [37] by applying an LSTM recurrent network.
VOLUME 8, 2020
Tang et al. [38] propose a traffic prediction method for scaling resources in NFV environments based on traffic modeling with an Autoregressive Moving Average (ARMA) process; the predicted traffic values are obtained by minimizing the MSE.
Among the solutions based on the prediction of the resources to be allocated, Farahnakian et al. [39] propose regressive algorithms for estimating memory and processing consumption in cloud datacenters; the proposed solutions are based on Linear Regression [40] and K-Nearest Neighbor Regression (K-NNR) [41] methods, which are known to determine the prediction by minimizing symmetric error functions. A VNF migration algorithm is proposed and investigated in [42]; it is based on a deep belief network framework to predict the future resource requirements; the authors show that the proposed solution obtains better estimates of CPU resources, in terms of MSE, than a solution based on a Back Propagation Neural Network [43]. Some solutions [28], [44]-[46] have been proposed for the prediction of host load in cloud infrastructures; these solutions are based on time series forecasting with LSTM recurrent neural networks; however, all are based on minimizing the MSE.
Other approaches have been proposed that are based on machine learning classification procedures; for example in Rahman's work [47] the classification problem is to choose the best VNFI resource scaling actions to minimize operational cost and QoS degradation.
All the above mentioned solutions have the ambition to predict exactly the traffic or the resources to be allocated. For this reason they are based on the minimization of symmetric cost functions such as the MSE. Unfortunately, there are random components that are not predictable and that lead to an unavoidable prediction error. Such an error leads to higher operational costs. For example, if the predicted traffic is higher than the real traffic, the resources will be over-sized and this will lead to an OP cost; in the opposite case fewer resources will be allocated and this will lead to a QoS degradation and to a UP cost determined by the compensation due to the user.
Our research objective is to propose and evaluate a prediction-based allocation technique in which both UP and OP costs are taken into account.
A preliminary result on the advantages of the proposed SARIMA prediction technique is illustrated in [48]. The following contributions are added in this manuscript:
• an extensive description of the SARIMA prediction model with asymmetric loss function is reported;
• an innovative prediction algorithm based on an LSTM recurrent neural network with asymmetric cusp loss function is added;
• extensive numerical results are reported in which the operational costs of an NFV network with resource allocation based on SARIMA and LSTM are evaluated;
• an extension of the ETSI NFV architecture with the proposed traffic prediction and resource allocation modules is reported and described in Appendix A.

III. PROBLEM STATEMENT
The objective of the paper is to propose and evaluate a solution for cloud and bandwidth resource allocation in NFV environments in which the offered traffic is not known a priori but is predicted with a technique aiming at minimizing the total operational cost. Two cost components are considered for a predicted SFC traffic: i) the Cloud Resource Allocation Cost; ii) the QoS Degradation Cost, occurring when the traffic is underestimated, fewer resources than needed are allocated and the network operator must pay a compensation cost to the user for the lost traffic. We report the reference scenario in Fig. 1.a, where four NFVI-PoPs interconnected by an EON are represented in a given traffic scenario. Five VNFIs are instantiated: NFVI-PoP #1 hosts VNFI #1 and VNFI #2, NFVI-PoP #2 hosts VNFI #3 and VNFI #4, and NFVI-PoP #3 hosts VNFI #5. Processing resources, represented by black rectangles, are allocated to the VNFIs. Three optical paths are also set up to interconnect the pairs VNFI #1 and VNFI #5, VNFI #2 and VNFI #3, VNFI #4 and VNFI #5. The interconnection is realized by allocating Frequency Slots (FSs) whose number depends on: i) the bandwidth to be allocated between the VNFIs; ii) the optical path length, which determines the best modulation format according to the optical signal quality to be guaranteed [12].
In a dynamic traffic scenario, the cloud and bandwidth resources have to be re-allocated according to the current traffic conditions. The following operations can be performed: i) horizontal [1] or vertical [26] scaling of the cloud resources allocated to the VNFIs; ii) migration of VNFIs towards a different NFVI-PoP [18]; iii) reconfiguration of optical paths by changing the routing and increasing/decreasing the number of allocated FSs [12]. We report an example of reconfiguration in Fig. 1.b in the case of a traffic increase between VNFIs #1 and #5. To handle this increase, the cloud resources allocated to the two VNFIs are increased by applying a vertical scaling technique that increases their processing capacity by an amount represented with a grey rectangle in Fig. 1.b. Furthermore, the optical path bandwidth is increased by allocating two additional FSs in the network links on which the optical path is set up, represented with dotted blue lines in Fig. 1.
Reactive reconfiguration approaches are not suited to NFV environments, especially due to the long time needed to reconfigure the cloud resources [34]. For this reason traffic prediction is needed to allocate the cloud resources in advance. Unfortunately the traffic cannot be predicted exactly and the prediction error may lead to resource over/under provisioning, with a consequent increase in operational network cost. Over provisioning occurs when the predicted traffic is higher than the real one; in this case more cloud and bandwidth resources than needed are allocated; an example of over provisioning is illustrated in Fig. 1.c, where the additional cloud and bandwidth resources are reported with violet rectangles; the allocation of unnecessary resources leads to a cost increase.
Under provisioning occurs when the predicted traffic is lower than the real one; in this case fewer resources than needed are allocated, as illustrated in Fig. 1.d where the missing resources are represented with crossed rectangles; under provisioning leads to QoS degradation because part of the traffic will inevitably be lost for lack of resources; that determines a cost increase for the network operator due to the compensation to be paid to the user for the lost traffic.
From this example we can observe that, because errors in predicting traffic are inevitable, the cost increase does not depend only on the absolute value of the error: positive and negative errors can impact the cost differently, depending on the values of the resource allocation and QoS degradation costs. In particular, if the resource allocation costs are higher than the QoS degradation ones, the algorithm should err towards predicting traffic values lower than the real ones; conversely, the algorithm should behave in the opposite way when the QoS degradation costs are higher than the resource allocation ones.
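The asymmetry described above can be illustrated with a minimal sketch. It assumes, for illustration only, that resources are paid for the predicted traffic at a unit cost `c_ra` ($/Gb) and that a compensation `c_qos` ($/Gb) is paid for each Gb of real traffic that exceeds the prediction; the function name and the numerical values are hypothetical.

```python
def total_cost(real_gb, predicted_gb, c_ra, c_qos):
    """Cost in $ of one time interval, given real and predicted traffic in Gb.

    Assumption: resources are paid for the predicted traffic (c_ra per Gb) and
    a compensation c_qos is paid for every Gb of real traffic left unserved.
    """
    allocation_cost = c_ra * predicted_gb
    lost_gb = max(real_gb - predicted_gb, 0.0)  # non-zero only when under-provisioned
    return allocation_cost + c_qos * lost_gb


if __name__ == "__main__":
    real = 10.0
    for c_ra, c_qos in [(1.0, 3.0), (3.0, 1.0)]:
        under = total_cost(real, 8.0, c_ra, c_qos)   # prediction 2 Gb too low
        over = total_cost(real, 12.0, c_ra, c_qos)   # prediction 2 Gb too high
        print(f"C_RA={c_ra}, C_QoS={c_qos}: underestimate={under}, overestimate={over}")
```

With the same absolute error of 2 Gb, overestimating is cheaper when the QoS degradation cost dominates, and underestimating is cheaper when the allocation cost dominates.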
The prediction algorithms are based on the minimization of an error function referred to as loss function. Most of the solutions proposed in literature are based on symmetric loss functions (i.e. MSE, MAE) that are not able to optimize the total cost as previously explained. For this reason we propose solutions with asymmetric loss function and characterized by parameters whose setting depends on the resource allocation and QoS degradation costs. The setting of the parameters is based on the observation of past traffic values and allows for a total cost minimization.
Next we illustrate the cloud infrastructure, network and traffic models in Subsection III-A. The prediction framework is described in Subsection III-B.

A. CLOUD INFRASTRUCTURE, NETWORK AND TRAFFIC MODELS
We represent with the graph $\bar{G} = (\bar{V}, \bar{L})$ the NFVI-PoPs interconnected by the EON, where the set $\bar{L}$ denotes the optical links and the set $\bar{V}$ denotes the union of three sets: i) $\bar{V}_{NP}$, containing the NFVI-PoPs; ii) $\bar{V}_{A}$, containing the access nodes in which the traffic is originated/terminated; iii) $\bar{V}_{S}$, containing the optical switches.
The NFVI-PoPs are equipped with cloud resources characterized by processing cores. We denote with $N_{\bar{v}}$ the number of cores assigned to the NFVI-PoP $\bar{v} \in \bar{V}_{NP}$.
VNFIs, supported by VMs, are activated to support the execution of Service Functions (SFs) such as Firewall (FW), Load Balancer (LB), Network Address Translation (NAT), ... We assume vertical scaling of the processing resources, where the processing cores assigned to the VNFIs can be changed over time according to the current VNFI load. In particular, if $F$ SFs are supported then $F$ VNFI types can be instantiated. For the $i$-th type VNFI ($i = 1, \dots, F$) we consider:
• $C_i^{pr,max}$: the maximum processing capacity that can be assigned to the $i$-th type VNFI;
• $n_i^c$: the number of cores assigned to the $i$-th type VNFI when the maximum processing capacity is provided;
• the processing capacity assigned to the $i$-th type VNFI when $k$ cores are assigned to it.
We also denote with $c^{core}_{\bar{v}}$ the core cost, expressed in ($/h), which characterizes the cost of renting one processing core for one hour in the NFVI-PoP $\bar{v} \in \bar{V}_{NP}$. We also introduce the average core cost $c^{core}_{av}$, obtained by averaging the core costs over the NFVI-PoPs.
The traffic demand is characterized by the SFCs, whose bandwidth is variable over time. $N$ SFCs are generated; the $i$-th SFC ($i = 1, \dots, N$) is characterized by $R_i$ SFs and we introduce the binary variable $s_i(j, p)$, which equals 1 if the $j$-th SF of the $i$-th SFC is of $p$-th type and 0 otherwise. We characterize the SFCs with the average bandwidth offered in Time Intervals (TIs) of duration $T_s$. In particular we denote with $b_j(i)$ the offered average bandwidth of the $i$-th SFC ($i = 1, \dots, N$) in the $j$-th TI ($j = 1, 2, \dots$).
The cloud resource allocation cost for the $i$-th SFC is denoted with $C_{RA,i}$; it is expressed in ($/Gb) and characterizes the average cost of the cloud resources that must be allocated to support one Gb of SFC bandwidth. This cost can be expressed as:
$$C_{RA,i} = \sum_{j=1}^{R_i} \sum_{p=1}^{F} s_i(j,p)\, \frac{n_p^c}{C_p^{pr,max}}\, c^{core}_{av} \tag{2}$$
Expression (2) can be justified as follows:
• the $i$-th SFC allocation cost $C_{RA,i}$ is given by the sum of the allocation costs of each of the $R_i$ SFs composing the SFC;
• the support of one Gb of traffic for a $p$-th type SF requires the allocation of $n_p^c / C_p^{pr,max}$ cores, each of which has an average cost of $c^{core}_{av}$;
• the cost evaluated in each term of expression (2) has to be included only if the $j$-th SF of the $i$-th SFC is of $p$-th type, that is if $s_i(j,p)$ equals 1.
Finally we denote the QoS degradation cost with $C_{QoS}$; it is expressed in ($/Gb) and characterizes the cost to be paid by the network operator when resources are not allocated for an SFC bandwidth of one Gb.
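As a worked illustration of the justification given for the SFC allocation cost, the sketch below sums, over the SFs of a chain, the cores needed per unit of traffic multiplied by the average core cost. The function name, the dictionary layout and the numerical values are hypothetical, and the units are left abstract (any time-unit conversion between core rental cost and processing capacity is assumed to be already folded into the inputs).

```python
def sfc_allocation_cost(sf_types, cores_at_max, max_capacity, avg_core_cost):
    """Allocation cost per unit of SFC bandwidth.

    sf_types      : SF type of each function in the chain (plays the role of s_i(j, p))
    cores_at_max  : cores assigned to each VNFI type at maximum capacity (n_p^c)
    max_capacity  : maximum processing capacity of each VNFI type (C_p^{pr,max})
    avg_core_cost : average cost of one core (c_av^core)
    """
    cost = 0.0
    for p in sf_types:
        cores_per_unit = cores_at_max[p] / max_capacity[p]  # cores needed per unit of traffic
        cost += cores_per_unit * avg_core_cost
    return cost


if __name__ == "__main__":
    # Hypothetical values: a chain made of a firewall followed by a NAT.
    cores = {"FW": 4, "NAT": 2}
    capacity = {"FW": 2.0, "NAT": 4.0}
    print(sfc_allocation_cost(["FW", "NAT"], cores, capacity, avg_core_cost=0.1))
```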
To limit their number, the VNFIs are shared among the SFCs. The VNFIs are instantiated and connected with optical paths. The SFCs are routed through the VNFIs so as to execute the SFs of each SFC. The VNFIs and their interconnection can be represented by the graph G = (V , L) where the set of nodes V characterizes the VNFIs and the set L contains elements representing the logical links interconnecting the VNFIs.

B. CLOUD AND BANDWIDTH PROVISIONING FRAMEWORK WITH TRAFFIC PREDICTION
A resource allocation algorithm has the objective of determining an embedding $(\bar{G}, G)$ of the VNFI graph $G = (V, L)$ into the physical graph $\bar{G} = (\bar{V}, \bar{L})$ by determining: i) in which NFVI-PoP each VNFI is executed; ii) the cloud (processing) resources to be assigned to the VNFIs; iii) on which optical path each logical link has to be routed; iv) the number of FSs to be allocated on the chosen optical path. When traffic variations occur over time, cloud and bandwidth reconfigurations are needed to reduce the costs. Some reconfiguration techniques have been proposed. For instance, the solution proposed in [12] leverages the following techniques: i) migration of VNFIs towards the lowest-cost NFVI-PoPs; ii) vertical cloud resource scaling by increasing/decreasing the number of cores allocated to the VNFIs. To apply these techniques, changes are needed in the embedding of the VNFI graph $G = (V, L)$ into the physical graph $\bar{G} = (\bar{V}, \bar{L})$; they depend on the processing capacities $f_v^{(j)}$ requested by the nodes $v \in V$ and on the bandwidths $f_e^{(j)}$ requested by the links $e \in L$ of the VNFI graph in the $j$-th TI ($j = 1, 2, \dots$). The processing capacities and the link bandwidths depend on the offered SFC bandwidths and for this reason they are not known a priori. We propose and investigate a reconfiguration solution based on the prediction of the offered SFC bandwidths. Because it is not possible to determine the traffic exactly, we propose a solution that underestimates or overestimates the traffic according to the values of the resource allocation and QoS degradation costs.
The main steps performed by the framework for cloud and bandwidth resource provisioning are illustrated in Algorithm 1. The inputs are: the physical graph $\bar{G} = (\bar{V}, \bar{L})$, the SFC bandwidth values known up to TI $n$ and the VNFI graph $G = (V, L)$. Next, a multi-step-ahead prediction of the offered SFC bandwidth is performed in step 2 by predicting the next $h$ SFC bandwidth values $\hat{b}_{n+j}(i)$ ($i = 1, \dots, N$, $j = 1, \dots, h$). That allows for the evaluation in step 3 of an estimate of the link bandwidths $f_e^{(n+j)}$ and of the node processing capacities $f_v^{(n+j)}$ of the VNFI graph in the TIs $n+1, \dots, n+h$. The knowledge of these estimated values and the application of cloud and bandwidth resource reconfiguration algorithms allow in step 4 for the determination of $h$ new embeddings $(\bar{G}, G)$, one for each of the TIs $n+1, \dots, n+h$. We apply the reconfiguration algorithms proposed in [12], referred to as NFV/Optical Resource Reconfiguration (NORR) and Optical Network Reconfiguration Costs Aware (ONRCA). Finally the framework returns the evaluated embeddings.
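The flow of the framework (predict, estimate the demands of the VNFI graph, reconfigure) can be sketched as follows. This is only a structural sketch: `predict`, `estimate_demands` and `reconfigure` are hypothetical callables standing in for the forecasting technique, the mapping from SFC bandwidths to node and link demands, and the reconfiguration algorithms of [12], none of which are implemented here.

```python
from typing import Any, Callable, Dict, List, Sequence


def provisioning_framework(
    history: Dict[str, Sequence[float]],
    h: int,
    predict: Callable[[Sequence[float], int], Sequence[float]],
    estimate_demands: Callable[[Dict[str, float]], Any],
    reconfigure: Callable[[Any], Any],
) -> List[Any]:
    """Return one embedding per future time interval n+1, ..., n+h."""
    # Step 2: h-step-ahead prediction of the offered bandwidth of every SFC.
    forecasts = {sfc: predict(values, h) for sfc, values in history.items()}
    embeddings = []
    for j in range(h):
        predicted_bw = {sfc: f[j] for sfc, f in forecasts.items()}
        # Step 3: node capacities and link bandwidths implied by the predictions.
        demands = estimate_demands(predicted_bw)
        # Step 4: new embedding for this interval.
        embeddings.append(reconfigure(demands))
    return embeddings


if __name__ == "__main__":
    # Toy stand-ins: persistence forecast, total bandwidth as the only demand.
    result = provisioning_framework(
        {"sfc1": [1.0, 2.0], "sfc2": [3.0, 4.0]},
        h=3,
        predict=lambda values, steps: [values[-1]] * steps,
        estimate_demands=lambda bw: sum(bw.values()),
        reconfigure=lambda demand: {"capacity": demand},
    )
    print(result)
```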

IV. SFC BANDWIDTH FORECASTING BASED ON SARIMA AND ASYMMETRIC LINEX COST FUNCTION
To simplify the notations, next we drop the i parameter characterizing the offered SFCs; we will explain the traffic forecasting procedures for a generic SFC.
We propose an SFC bandwidth forecasting procedure based on: i) characterizing the SFC bandwidth values $\{b_j,\ j = 1, \dots, n\}$ as a time series; ii) modeling the time series with a SARIMA process; iii) forecasting the bandwidth values $\hat{b}_{n+j}$ at time $n + j$ ($j = 1, \dots, h$) of the SARIMA process so that an asymmetric cost function of the error $b_{n+j} - \hat{b}_{n+j}$ ($j = 1, \dots, h$) is minimized. The main steps of the proposed methodology are illustrated in Fig. 2 and explained in Subsections IV-A to IV-D. The following steps are performed:
i) in the first step, illustrated in Subsection IV-A, trend and seasonality, due to the traffic cyclostationarity, are removed from the time series and a stationary time series is obtained;
ii) in the second step, illustrated in Subsection IV-B, the stationary time series is modeled as an Autoregressive Moving Average (ARMA) process by estimating the ARMA model parameters with a maximum likelihood procedure;
iii) in the third step, illustrated in Subsection IV-C, the time series forecasting is performed by minimizing the conditioned expectation of the asymmetric cost function of the forecasting error;
iv) in the fourth step, illustrated in Subsection IV-D, the trend and seasonality are recovered.
Finally we illustrate in Subsection IV-E how to set the parameter of the asymmetric cost function so as to achieve bandwidth forecast values allowing for the minimization of the cloud resource allocation and QoS costs.

A. TREND AND SEASONALITY ELIMINATION PROCEDURE
The traffic is non-stationary [26] and has trend and seasonality components. For instance, it is well known that the traffic has a daily seasonality component. The trend and seasonality components can be eliminated by differencing the time series $\{b_j,\ j = 1, \dots, n\}$. To perform this differencing we introduce the operator $B^k$ that delays the values of the time series by $k$ positions, that is $B^k b_j = b_{j-k}$. The differenced time series $\{d_j,\ j = 1, \dots, n\}$ can be expressed as follows [49]:
$$d_j = (1 - B)^d\, (1 - B^s)^D\, b_j \tag{3}$$
where $s$ is the seasonal parameter, which may be chosen equal to 24 if a typical daily traffic profile is considered, and $d$ and $D$ are the number of times the time series $\{b_j,\ j = 1, \dots, n\}$ is differenced to eliminate the trend and the seasonality respectively. If the parameters $s$, $d$ and $D$ are appropriately chosen, the time series $\{d_j,\ j = 1, \dots, n\}$ can be made stationary [49].
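A minimal NumPy sketch of this differencing step is given below. It applies the seasonal difference $D$ times and the ordinary difference $d$ times; the function name and the synthetic series (a linear trend plus a period-24 sinusoid) are illustrative assumptions, not data from the paper. Note that each differencing pass shortens the series, so the output has $n - d - sD$ samples.

```python
import numpy as np


def difference(b, d=1, D=1, s=24):
    """Apply (1 - B)^d (1 - B^s)^D to the series b and return the shortened result."""
    x = np.asarray(b, dtype=float)
    for _ in range(D):
        x = x[s:] - x[:-s]    # seasonal difference: x_j - x_{j-s}
    for _ in range(d):
        x = x[1:] - x[:-1]    # ordinary difference: x_j - x_{j-1}
    return x


if __name__ == "__main__":
    j = np.arange(72)
    b = 0.5 * j + np.sin(2 * np.pi * j / 24)   # hypothetical trend + daily seasonality
    x = difference(b, d=1, D=1, s=24)
    print(x.shape, np.abs(x).max())            # both components are removed
```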

B. PROCEDURE OF ARMA PARAMETERS IDENTIFICATION AND ESTIMATION
The second step of the proposed methodology consists in modeling the stationary time series $\{d_j,\ j = 1, \dots, n\}$ with an Autoregressive Moving Average (ARMA) process $\{D_j,\ j = 1, 2, \dots\}$, expressed in the form given in [49], wherein:
• $\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p$ and $\omega(B) = 1 - \omega_1 B - \dots - \omega_q B^q$ are the autoregressive and moving average components respectively, allowing for the characterization of the correlation between the values of the time series belonging to different seasons;
• $\pi(B) = 1 - \pi_1 B - \dots - \pi_P B^P$ and $\vartheta(B) = 1 - \vartheta_1 B - \dots - \vartheta_Q B^Q$ are the autoregressive and moving average components respectively, allowing for the characterization of the correlation between the values of the time series belonging to the same season;
• $\mu$ is a parameter linked to the average value of the time series and it equals zero in the case of a zero-average time series;
• $\{W_j,\ j = 1, 2, \dots\}$ is a white noise with zero average and standard deviation $\delta$.
The identification of the ARMA model involves the choice of the model orders and the estimation of the model parameters with a Maximum Likelihood (ML) procedure [49]. In the application of the ML procedure the values $\{d_j,\ j = 1, \dots, n\}$ of the time series are used; at the end we also check that the residuals, given by the difference between the original and ARMA values, are uncorrelated [49].

C. PROCEDURE OF TIME SERIES FORECASTING
All of the prediction-based resource allocation algorithms in NFV environments aim at exactly forecasting either the traffic [34] or the resources [28] to be allocated. They are based on the minimization of the conditioned expectation of a symmetric cost function of the forecast error $d_{n+j} - \hat{d}_{n+j}$ ($j = 1, 2, \dots, h$). The classical case is the minimization of the conditioned expectation of the squared error, that is $E_n[(D_{n+j} - \hat{d}_{n+j})^2]$, where the symbol $E_n[\cdot]$ denotes the expectation conditioned on the knowledge of the values $\{d_j,\ j = 1, \dots, n\}$.
The choice of symmetric cost functions leads to positive and negative errors being weighted equally. Conversely, being aware that an exact traffic prediction is not possible, our objective is to err in the direction that is more convenient according to the cloud resource allocation and QoS degradation costs. For this reason we consider asymmetric cost functions and, because of its simplicity, we choose the LINEX function [50]. That leads to the minimization of the conditioned expectation $E_n[L(D_{n+j} - \hat{g}_{n+j})]$, where $\hat{g}_{n+j}$ is the predictor and the LINEX function $L(x)$ is defined as follows:
$$L(x) = e^{a x} - a x - 1 \tag{5}$$
In particular notice that: i) for $a > 0$ ($a < 0$) the error $d_{n+j} - \hat{d}_{n+j}$ ($j = 1, \dots, h$) has a higher cost when it is positive (negative); ii) as $|a|$ increases, the difference in cost between positive and negative errors grows.
It is possible to prove [50] that the LINEX optimal predictor $\hat{g}_{n+j}$ ($j = 1, \dots, h$) has the following expression:
$$\hat{g}_{n+j} = \hat{d}_{n+j} + \frac{a}{2}\, \sigma^2_{n+j|n} \tag{6}$$
wherein $\hat{d}_{n+j}$ ($j = 1, \dots, h$) is the MSE optimal predictor and $\sigma^2_{n+j|n}$ ($j = 1, \dots, h$) is the conditioned error variance $E_n[(D_{n+j} - \hat{d}_{n+j})^2]$, whose iterative evaluation is reported in Appendix B.
From Fig. 2 we can notice how the forecast values $\hat{g}_{n+j}$ ($j = 1, \dots, h$) in the asymmetric cost function case are obtained by evaluating the forecast values $\hat{d}_{n+j}$ ($j = 1, \dots, h$) that minimize the conditioned expectation of the squared error and then by applying expression (6).
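The two ingredients of this step can be sketched in a few lines. The code assumes the common unit-scale form of the LINEX function, $L(x) = e^{ax} - ax - 1$, and the corresponding optimal predictor under conditionally Gaussian forecast errors, which shifts the MSE forecast by $a\sigma^2/2$; both the function names and the numbers in the usage example are illustrative assumptions.

```python
import math


def linex_loss(x, a):
    """LINEX cost of an error x = real - forecast (assumed unit-scale form)."""
    return math.exp(a * x) - a * x - 1.0


def linex_predictor(mse_forecast, error_variance, a):
    """Shift the MSE-optimal forecast by a * variance / 2.

    Assumption: forecast errors are conditionally Gaussian with the given variance.
    """
    return mse_forecast + 0.5 * a * error_variance


if __name__ == "__main__":
    a = 0.5
    # With a > 0 a positive error (underestimate) costs more than a negative one.
    print(linex_loss(1.0, a), linex_loss(-1.0, a))
    # The predictor therefore moves the forecast upwards.
    print(linex_predictor(10.0, 4.0, a))
```

A positive `a` pushes the forecast above the MSE value, which protects against under-provisioning; a negative `a` does the opposite.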

D. TREND AND SEASONALITY RECOVERY PROCEDURE
The final step of the proposed methodology consists in restoring the trend and the seasonality in the predicted time series $\hat{d}_{n+j}$ ($j = 1, \dots, h$). From the initial transformation of expression (3), we can obtain the following expression [49]:
$$d_j = b_j - \sum_{i=1}^{d+sD} \beta_i\, b_{j-i} \tag{7}$$
where the coefficients $\beta_i$ ($i = 1, \dots, d + sD$) are the coefficients of the polynomial $(1 - B)^d (1 - B^s)^D = 1 - \sum_{i=1}^{d+sD} \beta_i B^i$. From eq. (7) we can write the following expression of the predicted values $\hat{b}_{n+j}$ ($j = 1, \dots, h$):
$$\hat{b}_{n+j} = \hat{d}_{n+j} + \sum_{i=1}^{d+sD} \beta_i\, \hat{b}_{n+j-i} \tag{8}$$
The values $\hat{b}_{n+j}$ ($j = 1, \dots, h$) can be recursively evaluated from expression (8) with $j = 1, 2, \dots, h$ and by taking into account that $\hat{b}_{n+j-i} = b_{n+j-i}$ for $i = j, j + 1, \dots, d + Ds$.
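The recursion can be sketched as follows. The coefficients of the differencing polynomial $(1 - B)^d (1 - B^s)^D$ are obtained by polynomial multiplication, and the sign convention $1 - \sum_i \beta_i B^i$ is an assumption made here for illustration. The input `diff_forecasts` stands for the forecasts of the differenced series (the MSE or the LINEX ones); the function names and the synthetic series are hypothetical.

```python
import numpy as np


def differencing_polynomial(d, D, s):
    """Coefficients of (1 - B)^d (1 - B^s)^D for powers B^0 ... B^(d + s*D)."""
    poly = np.array([1.0])
    seasonal = np.zeros(s + 1)
    seasonal[0], seasonal[s] = 1.0, -1.0
    for _ in range(d):
        poly = np.convolve(poly, [1.0, -1.0])
    for _ in range(D):
        poly = np.convolve(poly, seasonal)
    return poly


def recover_forecasts(history, diff_forecasts, d=1, D=1, s=24):
    """Undo the differencing: b_hat[n+j] = forecast[j] + sum_i beta_i * b[n+j-i].

    history must contain at least d + s*D observed values.
    """
    beta = -differencing_polynomial(d, D, s)[1:]   # assumed convention: 1 - sum beta_i B^i
    values = [float(v) for v in history]
    for g in diff_forecasts:
        past = values[-1:-len(beta) - 1:-1]        # most recent value first
        values.append(g + float(np.dot(beta, past)))
    return values[len(history):]


if __name__ == "__main__":
    j = np.arange(75)
    series = 0.5 * j + np.sin(2 * np.pi * j / 24)  # hypothetical trend + seasonality
    # The differenced series of this signal is zero, so zero forecasts must
    # reproduce the next three true values.
    print(recover_forecasts(series[:72], [0.0, 0.0, 0.0]), series[72:])
```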

E. SETTING OF THE LINEX COST FUNCTION PARAMETER
We need to set the parameter $a$ of the LINEX function. The value of the parameter determines the shape of the asymmetric cost function and has to be chosen so as to minimize the sum of the cloud resource allocation and the QoS degradation costs. To evaluate the optimal value $a_{opt}$, instead of using the whole time series $\{b_j,\ j = 1, \dots, n\}$ to identify the ARMA process, we split the time series into two sets: the first one is used to estimate the ARMA parameters and the second one is used to evaluate the parameter $a_{opt}$. The pseudo-code of the procedure for the setting of the parameter $a$ is illustrated in Algorithm 2. We assume that the parameter $a_{opt}$ is chosen in the interval $[a_{min}, a_{max}]$. The procedure chooses (line 2) the index $1 < p < n$ so that the time series $\{b_j,\ j = 1, \dots, p\}$ is used for the ARMA parameter estimation (line 3), while the time series $\{b_j,\ j = p + 1, \dots, n\}$ is used to evaluate the parameter value $a_{opt}$ (line 4). Next, for each value of $a$, the sum $C(a)$ of the cloud resource allocation and QoS degradation costs is evaluated (lines 5-10). Finally the value $a_{opt}$ minimizing $C(a)$ and the optimum cost $C_{opt}$ are determined and returned as output (line 13).
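The search for the parameter can be sketched as a simple grid search over the second part of the series. This is a simplified sketch, not Algorithm 2 itself: it assumes that the MSE forecasts and the error variances on the evaluation set are already available, applies the shift $a\sigma^2/2$ directly to bandwidth forecasts rather than to the differenced series, and uses a hypothetical cost model in which allocated bandwidth costs `c_ra` per Gb and lost traffic costs `c_qos` per Gb.

```python
import numpy as np


def select_linex_parameter(real, mse_forecast, error_variance, c_ra, c_qos, a_grid):
    """Return (a_opt, C_opt) minimizing allocation + QoS degradation cost on held-out data."""
    real = np.asarray(real, dtype=float)
    mse_forecast = np.asarray(mse_forecast, dtype=float)
    error_variance = np.asarray(error_variance, dtype=float)
    best_a, best_cost = None, float("inf")
    for a in a_grid:
        # Asymmetric forecast for this value of a (never below zero bandwidth).
        forecast = np.maximum(mse_forecast + 0.5 * a * error_variance, 0.0)
        lost = np.maximum(real - forecast, 0.0)          # under-provisioned traffic
        cost = c_ra * forecast.sum() + c_qos * lost.sum()
        if cost < best_cost:
            best_a, best_cost = float(a), float(cost)
    return best_a, best_cost


if __name__ == "__main__":
    real = [10.0, 12.0, 9.0, 11.0]          # hypothetical held-out bandwidth values
    forecast = [10.0, 10.0, 10.0, 10.0]     # hypothetical MSE forecasts
    variance = [4.0, 4.0, 4.0, 4.0]
    grid = np.linspace(-1.0, 1.0, 21)
    print(select_linex_parameter(real, forecast, variance, 1.0, 5.0, grid))
    print(select_linex_parameter(real, forecast, variance, 5.0, 1.0, grid))
```

In this toy setting the selected parameter is positive when the QoS degradation cost dominates and negative when the allocation cost dominates, which matches the qualitative behaviour described in Section III.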

V. ASYMMETRIC LOSS FUNCTION-BASED LSTM PREDICTION ALGORITHM
The version of the LSTM prediction algorithm unfolded over $L$ stages is illustrated in Fig. 3.a and consists of the following two layers:
• the LSTM prediction layer: it performs the time series prediction by storing the internal states; we consider the case of a single layer composed of $L$ LSTM Cell Blocks (LCBs) referred to as $LCB_j$ ($j = n - L + 1, \dots, n$);
• the feed forward network layer: it evaluates, from the output of the last LSTM layer, the $h$-step-ahead predicted bandwidth values $\hat{b}_{n+j}$ ($j = 1, \dots, h$) stored in the vector $\hat{\mathbf{b}}_{n,h}$.
The SFC bandwidth predictions are performed by the LSTM layer, which has as inputs the SFC bandwidth values $b_j$ ($j = n - L + 1, \dots, n$). The output vector $h_n$ is processed by a feed forward neural network which evaluates the vector $\hat{\mathbf{b}}_{n,h}$ of predicted SFC bandwidth values.
In the LSTM layer the state variable s_j (j = n − L + 1, . . . , n) is also updated. In the LSTM Cell Block LCB_j, shown in Fig. 3.b, the state variable s_j in the j-th TI depends, among other variables, on the SFC bandwidth value b_j. The innovative idea of the LSTM is to introduce the forget and input gates, which decide which components of the state vector have to be deleted (forget gate) and preserved (input gate). This operation is learned through the training of the weight matrices W_fh, W_fx, W_gh, W_gx and the bias vectors d_f, d_g. The output gate is also introduced in LSTM neural networks. It is characterized by the matrices W_oh and W_ox and the bias vector d_o, and it controls what information encoded in the state variable is sent to the output h_j of the LSTM Cell Block LCB_j.
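The role of the three gates can be illustrated with one step of a standard LSTM cell. This is a sketch of the textbook formulation, assumed here to match the cell of Fig. 3.b; the paper's exact expressions may differ. The parameter names follow the notation of the text (W_fh, W_fx, d_f for the forget gate, W_gh, W_gx, d_g for the input gate, W_oh, W_ox, d_o for the output gate), with W_ih, W_ix, d_i used for the candidate input; the function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W_h, W_x, d, h_prev, b):
    """Compute W_h @ h_prev + W_x * b + d for a scalar input b."""
    return [
        sum(w * hp for w, hp in zip(row, h_prev)) + wx * b + bias
        for row, wx, bias in zip(W_h, W_x, d)
    ]

def lstm_cell(b, h_prev, s_prev, p):
    """One cell step: return (h_j, s_j) from b_j, h_{j-1} and s_{j-1}."""
    f = [sigmoid(v) for v in affine(p["W_fh"], p["W_fx"], p["d_f"], h_prev, b)]
    g = [sigmoid(v) for v in affine(p["W_gh"], p["W_gx"], p["d_g"], h_prev, b)]
    i = [math.tanh(v) for v in affine(p["W_ih"], p["W_ix"], p["d_i"], h_prev, b)]
    o = [sigmoid(v) for v in affine(p["W_oh"], p["W_ox"], p["d_o"], h_prev, b)]
    # The forget gate scales the old state; the input gate scales the candidate.
    s = [fk * sk + gk * ik for fk, sk, gk, ik in zip(f, s_prev, g, i)]
    # The output gate selects which part of the squashed state is exposed.
    h = [ok * math.tanh(sk) for ok, sk in zip(o, s)]
    return h, s

# Illustrative call with all-zero parameters and a hidden size of 2.
n = 2
params = {}
for gate in "fgio":
    params[f"W_{gate}h"] = [[0.0] * n for _ in range(n)]
    params[f"W_{gate}x"] = [0.0] * n
    params[f"d_{gate}"] = [0.0] * n
h, s = lstm_cell(1.5, [0.0, 0.0], [1.0, -2.0], params)
```

With all-zero parameters every gate evaluates to 0.5 and the candidate input to 0, so the state is simply halved; trained weights make the gates depend on b_j and on the previous output.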
If W_ih, W_ix and d_i denote the weight matrices and bias vector for the input, we can write the expressions for the evaluation of the state variable s_j and the output h_j of the LSTM Cell Block LCB_j, where σ(·) represents the sigmoid activation function and ϕ(·) represents the tanh activation function. All of the LSTM-based traffic prediction algorithms proposed in the literature [28] are based on the minimization of a symmetric cost function of the errors e_{n+j} = b_{n+j} − b̂_{n+j} (j = 1, . . . , h). We consider asymmetric cost functions and, because of its simplicity, we choose a cusp linear loss function as represented in Fig. 4, where the slopes depend on the resource allocation cost C_RA and the QoS degradation cost C_QoS, both expressed in $/Gb. As reported in Fig. 4, the training process minimizes the Asymmetric Mean Absolute Error AMAE_{n,h} expressed by (11), where I(x) is the indicator function, that is I(x) = 1 for x > 0 and I(x) = 0 for x < 0.
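A cusp-shaped asymmetric loss of this kind can be sketched as follows. The form used here is an assumption consistent with the description above, not a transcription of expression (11): with e = b − b̂, a positive error (under-provisioning) is weighted by C_QoS, a negative error (over-provisioning) is weighted by C_RA, and the result is averaged over the h predicted values. The function name `amae` is hypothetical.

```python
def amae(actual, predicted, c_ra, c_qos):
    """Asymmetric mean absolute error over the prediction horizon.

    Assumed form: with e = actual - predicted, e > 0 (under-provisioning)
    costs c_qos * e and e < 0 (over-provisioning) costs c_ra * |e|.
    """
    total = 0.0
    for b, b_hat in zip(actual, predicted):
        e = b - b_hat
        total += c_qos * e if e > 0 else c_ra * (-e)
    return total / len(actual)

# Illustrative values: one under-provisioned and one over-provisioned step.
loss = amae([10.0, 10.0], [8.0, 13.0], c_ra=0.025, c_qos=0.25)
```

When C_RA equals C_QoS the loss reduces to a scaled mean absolute error, which is the symmetric case.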

VI. NUMERICAL RESULTS
We evaluate the effectiveness of the asymmetric cost function-based SARIMA and LSTM forecasting models in predicting the requested SFC bandwidth when both the cloud resource allocation and QoS degradation costs are considered. The SARIMA and LSTM forecasting techniques are applied in a real scenario to evaluate the operation cost of an NFV network and to compare it to the case in which a traditional MSE-based forecasting technique is applied. We describe the simulation environment in Subsection VI-A. The application of the asymmetric cost function-based SARIMA forecasting technique to real traffic data is illustrated in Subsection VI-B. Finally, we show in Subsection VI-C the effectiveness of the proposed SARIMA and LSTM forecasting solutions when they are applied to allocate the resources in an NFV network whose NFVI-PoPs are interconnected by an EON.

A. SIMULATION ENVIRONMENT
The numerical results are provided for the values of the simulation parameters reported in Table 1 and for the network of Fig. 5, composed of 12 optical switches and 15 links. The network is equipped with four NFVI-PoPs located in the cities of Atlanta, Denver, Houston and Indianapolis. Each NFVI-PoP is equipped with 3072 cores whose average cost per hour is c^core_av = 1 $/h. We assume that one SFC is established for each tuple of access nodes reported in Fig. 5. We assume as SFC bandwidth values the real ones reported in [51] for the ABILENE network. In particular, we consider the traffic values measured at hourly intervals. These values are used to forecast the future traffic values according to the procedures illustrated in Sections IV and V. The SFs are supported by four types of VNFIs whose characteristics, that is the maximum processing capacity and the number of allocated cores, are reported in Table 2.

B. SARIMA BANDWIDTH FORECASTING OF A SINGLE SFC BY MINIMIZING AN ASYMMETRIC COST FUNCTION
We evaluate the proposed forecasting technique on the time series of Fig. 6, which reports the hourly bandwidth values requested by the SFC instantiated between the nodes Chicago and Indianapolis. The time series is composed of 480 traffic values measured on the weekdays from May 31st 2004 to June 27th 2004 [51]. We have organized the time series into three sets: i) the first 240 values are used for the parameter estimation of the SARIMA model; ii) the next 120 values are used to evaluate the parameter a_opt of the LINEX function as illustrated in Subsection IV-E; iii) the last 120 values are used for the test phase, in which the real and one-step predicted values are compared. The choice of the core costs and processing capacities of Tables 1 and 2 leads to a cloud resource allocation cost C_RA = 0.025 $/Gb for the SFC considered, according to expression (2). We carry out the comparison for the following values of the QoS degradation cost:
• C_QoS = 0.0025 $/Gb, which corresponds to the case in which the OP cost is higher than the UP one;
• C_QoS = 0.025 $/Gb, which corresponds to the case in which the OP and UP costs are equal;
• C_QoS = 0.25 $/Gb, which corresponds to the case in which the UP cost is higher than the OP one.
We also introduce the parameter w = C_RA / C_QoS; its value equals 10, 1 and 0.1 when C_QoS equals 0.0025, 0.025 and 0.25 $/Gb respectively.
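The three-way split and the values of w can be reproduced with a few lines. The helper name `split_series` and the placeholder series are assumptions used only for illustration; the real data are the 480 hourly values of [51].

```python
def split_series(series, n_fit=240, n_tune=120, n_test=120):
    """Split the series into estimation, a_opt tuning and test sets."""
    if len(series) != n_fit + n_tune + n_test:
        raise ValueError("unexpected series length")
    fit = series[:n_fit]
    tune = series[n_fit:n_fit + n_tune]
    test = series[n_fit + n_tune:]
    return fit, tune, test

# Placeholder series: the 1-based indexes 1..480 stand in for the traffic values.
indexes = list(range(1, 481))
fit, tune, test = split_series(indexes)

# Ratio w = C_RA / C_QoS for the three QoS degradation costs considered.
C_RA = 0.025
w_values = [C_RA / c_qos for c_qos in (0.0025, 0.025, 0.25)]
```

With this split the tuning set covers the indexes from 241 to 360 and the test set covers the indexes from 361 to 480.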
By applying the procedure illustrated in Subsection IV-B we have estimated the best parameters of the SARIMA model; this study has led to the choice of the following parameter values: i) the value of the parameter s has been chosen equal to 24 due to the daily periodicity of the traffic; ii) both the differencing parameters d and D for the trend and seasonality elimination have been chosen equal to 1; iii) the maximization of the likelihood function for the ARMA model illustrated in Subsection IV-B has led to the choice of the following parameter values: p = 16, q = 8, P = 1, Q = 1. To determine the optimal parameter a_opt of the LINEX function we evaluate, on the 120 values of the time series with indexes from 241 to 360, the cost function C(a) introduced in Subsection IV-E for values of a in the range [−100, 100]; the parameter value a_opt is determined by choosing the value of a minimizing C(a). In particular, we report in Fig. 7 the function C(a) for a in the range [−100, 100] and for values of the parameter w equal to 0.1, 1 and 10. The minimization leads to the values −25, −2 and 20 of a_opt for w equal to 0.1, 1 and 10 respectively. We can remark from Fig. 7 that:
• when w = 0.1 and consequently the OP cost is lower than the UP one, the procedure for the choice of a_opt behaves correctly by determining a negative value of a_opt, so that the LINEX function expressed by expression (5) provides a lower cost when the real traffic is higher than the predicted one and a higher cost in the opposite case;
• when w = 10 and consequently the OP cost is higher than the UP one, the value of a_opt is positive and the LINEX function gives more weight to errors in which the real traffic is higher than the predicted one;
• when w = 1 and consequently the cloud resource allocation and QoS degradation costs C_RA and C_QoS are equal, the parameter a_opt is close to zero and provides a balanced cost function.
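The differencing implied by the choice s = 24 and d = D = 1 reported above can be sketched as the operator (1 − B)(1 − B^24) applied to the hourly series. The function names and the synthetic series are assumptions for illustration; the estimation of the ARMA orders is not covered by this sketch.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag): x_j - x_{j-lag}."""
    return [series[j] - series[j - lag] for j in range(lag, len(series))]

def sarima_difference(series, s=24):
    """Apply (1 - B)(1 - B^s), that is d = D = 1 with seasonal period s."""
    return difference(difference(series, lag=s), lag=1)

# Synthetic series (illustrative only): linear trend plus a daily pattern.
pattern = [(k * k) % 7 for k in range(24)]
series = [0.5 * j + pattern[j % 24] for j in range(96)]
stationary = sarima_difference(series)
```

For this synthetic series the seasonal difference removes the daily pattern and the regular difference removes what is left of the trend, so the differenced series is identically zero; real traffic leaves a residual that the ARMA part has to model.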
The comparison between real and predicted values of the time series from index 361 to 480 is reported in Figs. 8-10 for w equal to 0.1, 1 and 10 respectively. We report the prediction values when the MSE and LINEX cost functions [...]
• [...] [51] have been used to evaluate the optimal parameter a_opt of the LINEX function according to the procedure illustrated in Subsection IV-E;
• the total cost has been evaluated for the period from June 21st 2004 to June 27th 2004, when the optical bandwidth and cloud resources are allocated and reconfigured on the basis of the predicted traffic values and by applying the reconfiguration algorithms described in [12].
We report the cost in Fig. 11 for the cases in which the SARIMA traffic predictions are performed with the minimization of the MSE and of the LINEX function. Three cost components are reported as a function of the parameter w: the total cost, the cloud resource allocation cost and the QoS degradation cost. From the results reported in Fig. 11 we can remark that:
• the proposed forecasting solution based on the asymmetric cost function allows for total costs lower than or equal to those of the MSE-based forecasting solution; the total costs of the two solutions are equal only for w = 1, that is when the OP and UP costs are equal; as an example, the total costs of the LINEX and MSE solutions for w = 0.1 are 1794$ and 2955$, with a cost advantage of about 40% for our proposed prediction solution;
• the better total cost of the LINEX predictions for w lower than 1 is due to the fact that they reduce the resource under-provisioning periods, as highlighted by the QoS degradation costs, which are lower than those of the MSE-based prediction solution;
• the better total cost of the LINEX predictions for w higher than 1 is a consequence of the reduction in the over-provisioning periods which, as highlighted in Fig. 11, leads to resource allocation costs lower than those of the MSE-based prediction solution.
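The cost advantage quoted for w = 0.1 follows directly from the two total costs. The helper name `relative_saving` is hypothetical; the two figures are those reported in the text.

```python
def relative_saving(cost_baseline, cost_proposed):
    """Fraction of the baseline cost saved by the proposed solution."""
    return (cost_baseline - cost_proposed) / cost_baseline

# Total costs for w = 0.1: MSE-based (baseline) and LINEX-based (proposed).
saving = relative_saving(2955.0, 1794.0)
```

The resulting fraction is about 0.393, which the text rounds to 40%.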
Next we show in Fig. 12 the cost comparison when the traffic predictions are performed with the SARIMA and LSTM approaches respectively. The LSTM predicted values are evaluated by applying the traffic forecasting algorithm illustrated in Section V, using the real requested SFC bandwidth values from May 31st 2004 to June 20th 2004 [51] for the LSTM training. To reduce the training time we have considered an LSTM network with the following parameters [36]: i) the number N_nr of neurons equals 8; ii) the look-back parameter L equals 24; iii) the batch size N_sz equals 24; iv) the total number N_ep of epochs has been fixed to 20, that is, the LSTM training process is executed 20 times to find the best model to perform forecasting.
We can notice from Fig. 12 that the application of an AI prediction technique such as LSTM allows for a lower cost when w is lower than or equal to 1, that is when the UP costs are larger than or equal to the OP ones. This is justified by the better prediction that the LSTM approach achieves with respect to the SARIMA one, as shown in Fig. 13, where we report the real, LSTM and SARIMA predicted traffic values between the nodes Chicago and Indianapolis of the network of Fig. 5 for the period from June 21st 2004 to June 27th 2004. We can observe from Fig. 13 that the application of the asymmetric cost function allows both the SARIMA and LSTM prediction techniques to provide predicted traffic values larger than the real ones, but the LSTM values are closer to the real ones. We report in Fig. 14 the real, LSTM and SARIMA predicted traffic values for the case w = 10, that is when the UP costs are lower than the OP ones. We can still observe that the proposed asymmetric SARIMA and LSTM prediction techniques work correctly by underestimating the real traffic and that, as before, the LSTM approach provides results closer to the real ones than SARIMA. As shown in Fig. 12, this allows SARIMA to achieve lower resource allocation costs, as well as a total cost slightly lower than that of LSTM, when w is larger than 1.

VII. CONCLUSION
We have proposed and investigated traffic prediction techniques in which the predicted values take into account the OP and UP costs in NFV networks. Since all prediction techniques make prediction errors, the proposed techniques aim to predict under-estimates or over-estimates of traffic depending on whether the OP cost is higher or lower than the UP one respectively. The techniques have been applied to traditional and AI-based prediction algorithms by defining appropriate loss functions. In particular, the SARIMA and LSTM prediction algorithms have been considered with the LINEX and cusp loss functions respectively. The proposed solutions have been applied to evaluate the operational cost of an Abilene network equipped with four NFVI-PoPs. We have verified that the proposed asymmetric loss functions allow for a cost reduction that can reach 40% in some cases. Furthermore, we have also shown that the LSTM technique is more effective than the SARIMA one in reducing the total cost, especially when the OP costs are lower than the UP ones.

APPENDIX A EXTENSION OF THE ETSI NFV ARCHITECTURE FOR THE SUPPORT OF THE TRAFFIC PREDICTION AND RESOURCE RECONFIGURATION SOLUTIONS
We show an extension of the ETSI NFV architecture [52] for the support of the proposed LSTM and SARIMA prediction and reconfiguration algorithms. In particular, the traffic prediction and resource reconfiguration algorithms are applied as illustrated in Fig. 15, where we report the main functional blocks of the NFV architecture enriched with some operations that allow for the support of the proposed algorithms. In the reported example, the processing, RAM and disk memory resources are handled in two NFVI-PoPs. The management and orchestration of virtualized resources are handled by the Virtual Infrastructure Manager (VIM). In the considered scenario a specialized VIM, referred to as WAN Infrastructure Manager (WIM), is also introduced; it is typically used to establish connectivity between access switches in different NFVI-PoPs. VIM and WIM are also helped by Network Controllers to configure both virtual and legacy electronic and optical switches in order to support the concatenation of the VNFs of a Network Service (NS). The VNF Managers (VNFM) are responsible for the lifecycle management of VNF Instances and perform functions like VNF instantiation and termination, and VNF instance scaling out/in and up/down. The NFV Orchestrator (NFVO) is responsible for the orchestration of NFVI resources across multiple VIMs and for the lifecycle management of NSs. It performs the following main functions: i) on-boarding of NSs and VNFs in the NS and VNF catalogues respectively; ii) storing in the VNFI repository information about the allocated and consumed cloud and bandwidth resources for the NS instances; iii) NS instantiation and termination, and NS instance scaling out/in and up/down. The Operation Support System/Business Support System (OSS/BSS) allows for the legacy device management and submits requests to the NFVO, such as those for on-boarding NSs and VNFs and for instantiating, terminating and scaling the resources of any NS and VNF instance.
ETSI reports some reference points (Os-Ma-Nfvo, Or-Vi, Nf-Vi, Vi-Vnfm, Or-Vnfm, Ve-Vnfm-vnf) in which some interfaces are defined. Through these interfaces, the functional blocks can invoke operations which allow for the NFV management and orchestration. Next we illustrate how the proposed reconfiguration solution can be supported in the NFV architecture scenario depicted in Fig. 15 [52]. First of all, two modules are added to the ETSI NFV architecture. The first one is devoted to traffic estimation; the module performs the estimation by applying algorithms such as the SARIMA and LSTM ones illustrated in this article. The estimated traffic data are sent by the OSS/BSS to the NFVO. The second module applies the reconfiguration algorithms on the basis of the estimated traffic data; it determines how to reconfigure the NSs so as to minimize the sum of the cloud, bandwidth and reconfiguration costs. The main procedures involved in supporting the algorithms are:
• an NS Descriptor (NSD) describing an NS in which the cloud and bandwidth resources can be re-allocated is defined; a request is presented by the OSS/BSS to the NFVO for on-boarding the NSD; the request is presented by using the operation On-board Network Service Descriptor of the Network Service Descriptor interface defined by ETSI; the NFVO inserts the NSD in the NSD catalogue and acknowledges the Network Service on-boarding;
• the NFVO receives from the OSS/BSS requests for the instantiation of NSs; the requests specify some NSD parameters characterizing the Access Points, Egress Points, VNFs and the cloud and bandwidth resources to be allocated; each request to instantiate a Network Service is issued by using the operation Instantiate Network Service of the Network Service Lifecycle Management interface defined by ETSI; the NFVO instantiates the NS by contacting all of the NFV actors (VIM, WIM, VNFM, . . .) and, if the operation is successful, it acknowledges to the OSS/BSS the completion of the NS instantiation; an NS instance is represented with red arrows in Fig. 15 and involves two VNFIs (VNFI-1 and VNFI-2);
• the estimated traffic data are periodically sent by the OSS/BSS to the NFVO, which executes the proposed reconfiguration algorithms so as to minimize the sum of the cloud, bandwidth and reconfiguration costs [7]; next the NFVO activates the VIM, VNFM, WIM and Network Controllers to reconfigure the NSs according to the outputs of the algorithm; for instance, we have reported with dashed red arrows the reconfigured NS; in this case VNF-2 is migrated toward the NFVI-PoP whose cloud resource cost is lower than that of NFVI-PoP-2.

The ARMA process expressed by (4) can be equivalently written in a form where ϕ*(B) and ω*(B) are polynomials of degree p* = p + sP and q* = q + sQ respectively. It has been shown [50] that the LINEX predictor can be expressed in terms of σ²_{n+j|n}, whose expression involves the errors ẽ_{n+j−i|n} = d_{n+j−i} − d̂_{n+j−i} (j = 1, . . . , h). From expression (14) we can notice that the term σ²_{n+j|n} (j = 1, . . . , h) can be recursively evaluated starting from j = 1, as long as a recursive evaluation of the terms E_n[ẽ_{n+j−i|n} ẽ_{n+j−k|n}] and E_n[ẽ_{n+j−i|n} w_{n+j−k}] can be accomplished.
By taking (13) into account and after some algebra, we obtain the iterative procedures in Algorithm 3 and Algorithm 4 for the evaluation of σ_{i,j} and ρ_{i,j} (i, j = 1, . . . , h) respectively.