Autonomous Intelligent VNF Profiling for Future Intelligent Network Orchestration

In this article, we propose a profile-based data-driven analysis framework to extract and analyze the characteristics and behavior of virtualized network functions (VNFs) in virtualized networks from the resource and performance perspectives. The framework provides solutions for applying profiling information to analyze VNF-level service performance and to discover resource-performance correlations. Different machine learning approaches then use the outcomes of this analysis to predict performance for the proactive orchestration and management of the service life cycle. Although there have been a number of prior studies on VNF profiling, to the best of our knowledge, this article is the first to introduce an autonomous time-wise profiling method that provides insights into VNF behavior from both the performance and resource utilization perspectives in a real deployment environment. This helps operators devise efficient resource allocation and performance plans that ensure the service performance requirements specified in the service level agreement and prevent unnecessary life cycle management (LCM) actions such as VNF migration and scaling. We present a detailed evaluation to validate our method and a case study showing how the automated system improves LCM decisions by reducing the number of VNF migrations in a real-life scenario.


I. INTRODUCTION
Network function virtualization (NFV) has emerged as a promising technology for making networks flexible and scalable. NFV decouples network functions (NFs) from proprietary hardware and allows NFV service providers to run virtualized NFs (VNFs) with different functionalities as software on top of a common physical node, sharing resources such as CPU time, memory access, and network bandwidth. Despite the flexibility and resource sharing achieved by network virtualization, it raises new concerns, especially for VNF orchestration [1] and resource management [2], in ensuring the service performance requirements specified in the service level agreement (SLA). The diverse performance requirements of VNFs and the dynamic nature of the network make VNF orchestration a complex process [3]. Although existing network function virtualization management and orchestration (NFV-MANO) systems are capable of supporting resource management and life cycle management (LCM) of VNFs, they lack the intelligence to support the orchestration processes autonomously. The next generation of intelligent orchestrators should be able to react proactively to increases or decreases in compute, storage, and network resources and autonomously manage the VNFs accordingly. To this end, intelligent orchestrators need insights into the behavior of VNFs to make autonomous decisions about VNF orchestration and resource management that meet the service performance requirements [4]. In this work, our main focus is on providing such intelligence and insights into VNF behavior (i.e., VNF performance and resource utilization) to be used as the basis for autonomous VNF orchestration and proactive resource management. A VNF profiling platform provides information, called a VNF profile, about the different characteristics of VNFs by monitoring and benchmarking the underlying virtualized system.
The studies on VNF profiling in the literature have limitations that can be summarized as follows: 1) The existing literature mainly considers aspects such as VNF performance monitoring, benchmarking [5], [6], and data analysis and prediction [7], [8], [9] separately; hence, little work covers the entire process from VNF performance monitoring to data analysis and performance prediction. To the best of our knowledge, none of those studies automates the entire process of monitoring, data gathering, and profiling of NFV platforms in a real deployment environment. 2) There are numerous studies on profiling VNF performance and on configuring resources to achieve a target VNF performance, but there are limited efforts to profile VNF behavior in both its performance and resource utilization aspects while considering different types of resources such as computation, memory, and link resources. 3) Although many studies have utilized direct data to study various problems related to virtualized networks [10], it is significantly difficult to attain accurate insights into VNF behavior from telemetry data of direct metrics alone. For example, the number of CPU cores used by a given VNF and its ingress traffic rate are both direct metrics, and the problem is that they mean little without enough context to correlate them with. This makes collecting and analyzing profiling data essential for extracting insights into VNF resource-performance behavior [11].
To address the aforementioned limitations of previous research, we propose a novel profile-based data-driven VNF performance-resource analysis (PDPA) framework and methodology that works in conjunction with already standardized or future intelligent NFV management and orchestration mechanisms. In PDPA, profile data, analytics, and machine learning build on each other to deliver a deep understanding of, or insights into, the resource-performance behavior of VNFs. Instead of providing volatile monitoring metrics for reactive service management, we characterize the metrics that significantly impact VNF-level service performance and use them in machine learning-based approaches to make performance predictions for proactive VNF management. This helps network service operators make efficient resource allocation and performance plans for configuring the underlying system that hosts the VNFs before service deployment, avoiding unnecessary LCM actions such as VNF migration and scaling.
For the evaluation of our proposed approach, a time-wise method is used to execute iPerf-based benchmarking and monitoring and to generate profile data of network functions under varying input data rates and resource configurations with minimal monitoring load. We demonstrate how the complex relations among NFV performance KPIs, resource allocation, and utilization can be systematically analyzed from the profiled data. Then, we compare different classification approaches that model the performance characteristics of any VNF as a function of input data rate and allocated resources using the profile data. Finally, we present the comparative performance of the proactive system over the reactive system in LCM and show how the proactive system reduces VNF re-configuration and migration.
The structure of the article is as follows: Section II presents an overview of the existing work on profiling and benchmarking of virtualized network services. Section III describes the proposed PDPA framework and its associated components. The testbed architecture and implementation are presented in Section IV. The analysis and prediction methodology and results are presented in Sections V and VI, respectively. Section VII describes a case study for applying PDPA. Section VIII discusses the findings and next steps for future research, and Section IX concludes this article.

II. BACKGROUND AND RELATED WORK
Current NFV-MANO systems lack intelligence for their resource allocation, scaling, and placement decisions. They manage VNFs and handle resources in a reactive manner while neglecting VNF-level service performance. Having VNF resource and performance profiles available as input to the MANO systems will help to optimize their management decisions for effective VNF LCM and efficient resource management to meet the expected service performance requirements. The studies on VNF performance and resource profiling in the literature focus on different aspects such as performance monitoring, benchmarking, analysis, and prediction.

A. BENCHMARKING AND MONITORING
In the NFV research community, benchmarking approaches are introduced as NFV performance testing solutions for the validation and verification of VNFs before the production deployment of the NFV system [18]. The VNF benchmarking solution has been standardized in European Telecommunications Standards Institute (ETSI) specifications [19]. Besides test tools such as Yardstick for NFVI testing, there are various benchmarking methodologies and frameworks in the literature specifically designed for VNFs [5], [6]. Benchmarking an NFV platform, considering all possible situations (different configuration parameters, NFVI properties, resource allocation settings, and arrival rates), is a complex task requiring several manual operation steps [20]. Moreover, the complexity and run time of benchmarking increase further as the number of parameters in the test cases, defined by the benchmarker, increases. In recent years, there has been growing traction toward benchmarking mechanisms that use machine learning-based techniques to reduce human intervention for the autonomous benchmarking of NFV systems [21], [22], [23]. Such frameworks provide support for automated performance measurements of VNFs and services, but they do not provide support for describing and implementing general-purpose test cases. The platform proposed in [24] introduces an automated benchmarking approach for NFV systems to test the performance properties of a multi-VNF network service end-to-end; however, the automated test selection and execution algorithms are missing. Table 1 provides a summary and comparison of some existing VNF benchmarking approaches, apart from those published for academic research that are neither open source nor commercial. As discussed in [24], benchmarking solutions for NFV systems must be open source to provide the level of transparency necessary for wide adoption.
Each tool focuses on a specific area within the NFV system and has limited automation and end-to-end testing capability.
The authors in [3] discuss how the quality and quantity of the data gathered from monitoring and benchmarking systems influence the quality of decision (QoD) for orchestration operations, including LCM actions, and consequently the performance of the provisioned services in the virtualized infrastructure. Considering the size of an NFVI and the scale of the resources and components that must be monitored, the monitoring system must periodically deliver a very high data load, leading to high processing load and time in profiling due to the large data processing and analysis activities [25]. To address this monitoring complexity and the high load of exhaustively benchmarking and testing the performance of VNFs in all possible situations, a representative subset of infrastructure and workload configurations must be selected to profile the VNF [26]. To this end, NFV-Inspector [27] tests only pre-defined, hand-picked policies and parameters. However, this manual parameter selection requires human intervention, results in operational complexity, and falls short in terms of providing automation.
To address this challenge, the authors in [17] introduce a weighted resource configuration selection (WRCS) approach that automates the benchmarking process of NFV platforms and reduces the monitoring load by considering only the test cases that have the highest impact on VNF performance. We use WRCS as the main time-wise autonomous component of our proposed framework (explained later in Section III). It systematically benchmarks and profiles almost any kind of VNF, treating the VNF as a black box, and provides sufficient monitored data with minimal monitoring effort.
Even with the provisioning of sufficient monitored data, the QoD of the orchestrator cannot be guaranteed, as it also depends on the intelligence of the orchestration algorithms (such as VNF placement and migration algorithms) that exploit the data from the monitoring system. To this end, an intelligent VNF profiling framework is required to exploit the monitoring and benchmarking data and provide the essential intelligence capabilities to orchestration algorithms by giving them insights into VNF performance behavior. In the remainder of this section, we review the state of the art in such profiling approaches.

B. PREDICTION AND ANALYSIS
The next generation of intelligent orchestrators should be able to autonomously manage the VNFs. The authors in [28] discuss that the high degree of automation required by future intelligent VNF orchestrators is paired with proactive decision-making capability. In a proactive decision-making solution [29], [30], the decision is based on predictions and the action (e.g., a recovery procedure) starts ahead of time; in the reactive case, where there is no prediction, the action is executed after the occurrence (e.g., of the failure). As such, different VNF orchestration decision-makers can acquire proactive capability by having profile-based predictions and analysis as their input. Table 2 presents a comparison between various VNF profiling approaches (focusing mainly on prediction and analysis aspects for intelligent orchestration) and the PDPA framework proposed in this paper. The profiling approaches are compared in terms of the resources and performance metrics they consider, and the methods used to predict performance measurements and/or required resources. The predicted resource and performance profiles that can be used during the development and LCM of a VNF can be categorized as follows: 1) VNF performance KPIs, specified in the SLA, which can be used in the operation phase of the VNF DevOps cycle for resource scaling and placement decisions; 2) the absolute configuration of resources needed to meet the required performance KPIs, which can be used in the development phase; 3) VNF performance in terms of resource utilization, which provides more meaningful insights that can be used both in the development phase for resource configuration decisions and in the operation phase for resource scaling and placement decisions.
To the best of our knowledge, all the compared profiling methods and frameworks are limited in their automation capability and in their resource and performance analysis of VNFs. None of the VNF profiling studies automates the entire process of benchmarking, monitoring, data analysis, and profiling of NFV platforms in a real deployment environment. Among all the reviewed approaches, NAP [17] offers the most automation; however, it does not cover resource-performance analysis or the prediction of resource utilization, which is one of the distinct features of our proposed profiling framework, PDPA. PDPA automates all the profiling processes, considering different types of resources such as computation, memory, and link resources, and embeds profile data analytics and machine learning to deliver a deeper understanding of the resource-performance behavior of VNFs.

III. PROFILE-BASED DATA-DRIVEN VNF-LEVEL PERFORMANCE ANALYSIS
We introduce a fully automated (covering the entire process from VNF performance monitoring to data analysis and performance prediction) profile-based data-driven VNF performance-resource analysis (PDPA) framework in Fig. 1, along with our proposed solutions (shown in red) for each component of the framework. The PDPA framework considers different NFVIs as the test environment, where different VNF instances reside. The NFVI is controlled and managed by Open Source MANO (OSM) [32] as the NFV orchestrator (NFVO) solution. On the next level above the NFVO, there is a profiling component, named the Profiler, consisting of a Monitoring Agent and a Benchmarking Agent. Inside the Profiler, the performance and resource consumption of the VNFs are tested against different test cases defined by the Benchmarking Agent; the test results are then monitored and collected by the Monitoring Agent.
We apply the WRCS approach [17] as the Benchmarking Agent of our proposed framework. It systematically benchmarks almost any kind of VNF (treating the VNF as a black box) and provides sufficient monitored data with minimal monitoring effort. The zero-touch network orchestration architecture proposed in [33] demonstrates how to apply this method integrated with other network orchestration processes. WRCS defines different test cases out of all possible resource configurations. It computes the weight of each resource by checking its impact on the maximum input rate (MIR), then prioritizes and selects the resources with the highest weights. To build a test suite containing different resource configurations, weighted random configurations (i.e., a weighted random number between the minimum and maximum amounts) of the prioritized resources and different input data rates are considered. The selected test cases are applied to the VNF instances running on the NFVI. The Monitoring Agent monitors the performance metrics and the utilization of resources by each VNF. Next, the Profiler creates the performance profile datasets that map the resource configurations defined in the test suite to the measured VNF performance metrics for the network service running over the NFVI.
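The weighted-random test-case selection can be illustrated with a short sketch. This is not the actual WRCS algorithm from [17]; the resource ranges, the impact weights, and the biased-sampling rule below are hypothetical stand-ins for illustration only:

```python
import random

# Hypothetical resource ranges and impact weights (NOT the values computed by
# WRCS); in WRCS the weights come from each resource's measured impact on MIR.
RESOURCE_RANGES = {"vcpu": (0.3, 1.0), "memory_gb": (1.0, 1.6), "link_mb": (400, 800)}
WEIGHTS = {"vcpu": 0.6, "memory_gb": 0.15, "link_mb": 0.25}

def weighted_random_config(rng=random):
    """Draw one resource configuration: each resource gets a weighted random
    value between its minimum and maximum amount."""
    config = {}
    for res, (lo, hi) in RESOURCE_RANGES.items():
        # Bias higher-weight resources toward the stressed (minimum) end of
        # their range, so the benchmark explores where they limit performance.
        u = rng.random() ** (1.0 + WEIGHTS[res])
        config[res] = lo + u * (hi - lo)
    return config

def build_test_suite(n_cases, input_rates_mbps):
    """Combine weighted-random resource configs with different input rates."""
    return [{"rate_mbps": r, **weighted_random_config()}
            for _ in range(n_cases) for r in input_rates_mbps]
```

Each entry of the resulting suite pairs one resource configuration with one input data rate, matching the test-case structure the framework applies to the VNF instances.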
On the next level above profiling, there is the Analyzer component, which provides the intelligence for intelligent VNF orchestration. It uses the profile datasets extracted by the Profiler. The Analyzer contains three main sub-components: (i) the Pre-processor, which processes the profile datasets to prepare them for data analysis; (ii) the Performance-Resource Correlation Analyzer, which analyzes and models VNF-level performance and resource utilization behavior for the prediction of forthcoming resource requirements; and (iii) the Predictor, which predicts the performance of VNFs by introducing a data-driven classification approach that models the performance characteristics of any VNF as a function of input data rate and allocated resources, using the profile information for proactive management decision-making.
The acquired performance-resource modelings of VNFs and the corresponding predictions can then be exploited by service providers and network administrators or, at a lower level, by the NFVO for efficient and proactive resource management and the respective LCM decisions. The proposed framework can be integrated with different VNF resource management, VNF migration/scaling, and mobility management algorithms by generating alerts on performance drops and resource demand bursts and by predicting handover requests to derive LCM decisions proactively, ensuring that the respective service KPIs and SLAs are satisfied. Fig. 2 illustrates the step-wise generic process flow, which can be applied over any NFVI platform for VNF-level performance-resource analysis. The steps depicted in this figure describe how the proposed framework can be applied to the whole bottom-up process from VNF onboarding to data analysis. In the first step, the NFVI is prepared by onboarding the VNF instances. Once the NFVI test environment is prepared, the Profiler platform is installed over the NFVI. Then, WRCS is executed to select test cases consisting of resource and system configurations. The NFV system is configured accordingly, the test cases are applied to the VNF instances, and the resource monitor (utilizing Prometheus, https://prometheus.io) monitors and logs the resource consumption and KPIs and sends the collected datasets to the Pre-processor. The Pre-processor prepares the data for the Analyzer; subsequently, the Analyzer creates the correlations among the VNF's KPIs and resource utilization and sends them to the Predictor to make predictions on resource utilization. The predictions are then exploited by the orchestrator to make the required proactive LCM decisions.
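To illustrate the monitoring step, the following minimal sketch queries the Prometheus HTTP API (its instant-query endpoint `/api/v1/query`) and parses the JSON response; the endpoint URL and helper names are our own assumptions for illustration, not part of the framework:

```python
import json
import urllib.parse
import urllib.request

PROM_URL = "http://localhost:9090"  # assumed Prometheus endpoint on the NFVI

def build_query_url(promql, base=PROM_URL):
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})

def parse_instant_result(payload):
    """Extract (metric-labels, value) pairs from a Prometheus query response."""
    if payload.get("status") != "success":
        raise RuntimeError("Prometheus query failed")
    return [(r["metric"], float(r["value"][1]))
            for r in payload["data"]["result"]]

def fetch(promql):
    """Run one instant query against the assumed Prometheus endpoint."""
    with urllib.request.urlopen(build_query_url(promql)) as resp:
        return parse_instant_result(json.load(resp))
```

With node_exporter metrics available, a query such as `rate(node_cpu_seconds_total{mode!="idle"}[1m])` would yield per-instance CPU utilization samples that the Pre-processor can aggregate into the profile dataset.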
The PDPA profiling framework is capable of profiling any VNF using a black-box approach. However, it is important to note that each VNF has unique performance and resource utilization behavior and therefore requires its own specific performance profile dataset to accurately model its behavior. It is not recommended to use the trained model of one VNF for a different VNF, particularly if they differ significantly in terms of their performance requirements, traffic patterns, infrastructure, and configurations. Nevertheless, transfer learning methods can be used to leverage a well-trained model to initialize the weights of the model to be trained for a new VNF. This approach can reduce the amount of data required to train the new model and improve its performance.
The overall complexity of the PDPA framework can be broken down into the complexity of data collection (including VNF benchmarking and monitoring) and the complexity of machine learning model training and testing, as the main time-wise processes of the PDPA framework. Consequently, to minimize the overall complexity of PDPA as a data-driven framework, it is crucial to minimize the complexity of these processes. To this end, the data collection process is streamlined to take the least possible time by applying automated solutions for both the monitoring and benchmarking agents (Prometheus and WRCS, respectively), which collect data automatically without any human intervention. To further minimize the complexity of data collection while ensuring that the collected data are sufficiently diverse, the WRCS approach defines different test cases out of all possible resource configurations, incorporating only the resource configurations that have the highest impact on VNF performance. Therefore, the benchmarking and monitoring load is reduced because (i) the number of test cases (benchmarking scenarios) is minimized, and (ii) the test cases are diverse enough that the benchmarking does not need to be delivered periodically. Finally, to minimize the complexity of machine learning model training and testing, the learning models are trained and tested once in offline mode and then loaded or integrated into the NFV-MANO system, so the complexity of model training and testing does not affect the total profiling time. While offline profiling simplifies the complete implementation, service providers can revise the trained models with up-to-date monitored metrics whenever they make major changes in the underlying virtualized system or the type of profiled VNFs.

IV. IMPLEMENTATION AND EVALUATION
Our testbed architecture, depicted in Fig. 3, illustrates the profile-based measurement model for data gathering and the system topology that we use for the evaluation experiments of the whole proposed framework. To generate profile data, the Autonomous Profiler selects a weighted-random configuration of resources, updates the VNF descriptor (VNFD) of the VNF instance to be profiled, and asks Open Source MANO to automatically deploy three VNFs as test VNF instances over OpenStack (the cloud computing management platform).
To generate the traffic, the first and last VNFs host iPerf and perform the data switching/forwarding and transmission/receiving operations. The first VNF acts as the point of origin for the test packets sent through the middle VNF. The last VNF is the terminating function, which receives the sent UDP packets. The middle VNF deploys Snort, used in inline mode, which acts as an intrusion detection system (IDS). It is important to note that the Snort VNF is used only as a sample; any VNF can be profiled (black-box approach).
Following the test suites provided by WRCS, the iPerf application is used to generate UDP-based traffic between the VMs. As shown in Fig. 3, the Network Monitor monitors network latency in terms of maximum round trip time (RTT) as well as the egress and ingress packets at the middle VNF. Furthermore, we use Prometheus as the Resource Monitor to collect and store compute resource utilization such as CPU and memory utilization. The monitoring metrics and KPIs, as well as the VNF performance dataset records, are stored in an Elasticsearch, Logstash, and Kibana (Elastic Stack) data repository, which is filtered and used by the Autonomous Profiler.
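As a small illustration of how such measurements can be consumed programmatically, the sketch below parses the JSON report of an `iperf3 -u … -J` UDP run; the field names follow iperf3's JSON output format, but should be verified against the iperf3 version in use:

```python
import json

def parse_iperf3_udp(json_text):
    """Pull throughput, loss rate, and jitter out of an iperf3 UDP JSON report.

    Field names ("end" -> "sum" -> bits_per_second / lost_percent / jitter_ms)
    follow iperf3's JSON output; treat them as an assumption to verify against
    the deployed iperf3 version.
    """
    report = json.loads(json_text)
    s = report["end"]["sum"]
    return {
        "mbps": s["bits_per_second"] / 1e6,  # achieved UDP throughput
        "loss_pct": s["lost_percent"],       # datagram loss rate (KPI)
        "jitter_ms": s["jitter_ms"],         # per-run jitter
    }
```

The returned dictionary maps directly onto the KPI columns (throughput, loss rate) of the profile dataset records stored in the data repository.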
In this section, we described how different VNF-level performance metrics are collected under different input data rates and resource configurations. We profiled the Snort VNF with the following resource configuration ranges: (i) CPU between 0.3 and 1.0 vCPU cores, (ii) memory between 1 GB and 1.6 GB, and (iii) link capacity between 400 MB and 800 MB, with a minimum input rate of 50 Mbps. With this VNF performance dataset available, we investigate how VNF performance, resource utilization, and resource configurations are correlated and extract analytical models from the dataset.

V. ANALYSIS METHODOLOGY AND RESULTS
We apply several statistical and analytical techniques and discuss the resulting insights, which are then applied to VNF resource utilization predictions for proactive orchestration.

A. PERFORMANCE-RESOURCE CONFIGURATION CORRELATION ANALYSIS
To analyze and discover the influential VNF-level performance-resource relationships, we compute the correlation coefficients among resource configuration, resource utilization (CPU utilization percentage, memory utilization percentage, and output traffic rate), and KPI (latency, loss rate, and MIR) measurements. Since some of the measured parameters may have a non-Gaussian distribution, we use Spearman's rank correlation coefficient (SRCC) [34] to find the strength of the relationships among parameters inside the VNF performance dataset. The Spearman score lies between −1 (perfectly negatively correlated parameters) and 1 (perfectly positively correlated parameters) and is defined as

SRCC(X, Y) = cov(R(X), R(Y)) / (stdv(R(X)) · stdv(R(Y)))

where R(X) and R(Y) are the ranks of each parameter, such that the least value has rank one, and cov and stdv denote the covariance and standard deviations of the rank parameters, respectively. The correlation matrix illustrated in Fig. 4 represents the computed SRCC among the measured parameters in the dataset in both quantitative and pictorial ways: correlation coefficients in the upper triangle (color and intensity indicate the sign and strength of the correlation, respectively), and bivariate ellipses in the lower triangle (ellipse direction and color indicate the sign of the correlation; ellipticity and color intensity are proportional to the correlation coefficient). The discovered correlations in this matrix provide useful insights into which parameters may or may not be relevant as input for developing a machine learning model.
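For concreteness, SRCC can be computed directly from the rank definition: rank each variable, then take the Pearson correlation of the ranks. The minimal NumPy sketch below ignores tie handling (a library routine such as `scipy.stats.spearmanr` applies proper tie-corrected ranks):

```python
import numpy as np

def spearman_rcc(x, y):
    """Spearman's rank correlation coefficient: the Pearson correlation of
    the rank variables R(X) and R(Y), where the least value has rank one.

    Ties are handled naively (distinct ranks assigned in encounter order);
    use scipy.stats.spearmanr for tie-corrected ranks on real data.
    """
    rx = np.argsort(np.argsort(x)) + 1.0  # ranks of x, smallest -> 1
    ry = np.argsort(np.argsort(y)) + 1.0  # ranks of y, smallest -> 1
    cov = np.mean((rx - rx.mean()) * (ry - ry.mean()))  # cov(R(X), R(Y))
    return cov / (rx.std() * ry.std())                  # / stdv(R(X)) stdv(R(Y))
```

Perfectly monotonic pairs yield ±1 regardless of whether the underlying relation is linear, which is why SRCC suits the non-Gaussian measurements in the profile dataset.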

B. RESOURCE CONFIGURATION-RESOURCE UTILIZATION ANALYSIS
The three lowest rows of the correlogram matrix show that the resource configurations CPU, memory, and link capacity have the strongest relationships with their corresponding utilization. CPU and memory have negative relations with CPU utilization and memory utilization, respectively, while link capacity has a positive relation with the output traffic rate (denoted as Egress_tx), the main metric reflecting link capacity utilization. When CPU and memory resources become scarce, their consumption percentages increase toward the maximum available amount, reflecting the negative relations; by increasing the available link capacity, its corresponding consumption (output traffic rate) increases too, reflecting the positive relation.

C. PERFORMANCE-RESOURCE UTILIZATION CORRELATION ANALYSIS
The results in the correlogram matrix demonstrate that CPU utilization and memory utilization are statistically correlated with latency (0.65 and 0.67, respectively), MIR (0.78 and 0.7, respectively), and loss rate (0.47 and 0.42, respectively). The bivariate correlations among the selected performance parameters and resource utilization are illustrated in Fig. 5 to Fig. 9. As can be seen in Fig. 5, there are significant regression relations between CPU utilization and performance KPIs such as average MIR, latency, and loss rate: as the performance parameters increase, resource utilization increases too. This can be interpreted as follows: WRCS keeps the input network load in line with resource capacity, so the higher the MIR, the more CPU is utilized; subsequently, since latency and loss rate are directly proportional to MIR, higher latency and loss rates coincide with higher CPU utilization because of the high packet arrival rate. As shown in Fig. 6, compared to CPU, memory has the least correlation with the KPIs. To discover and demonstrate the non-linear relations and trends between resource utilization and KPIs, the correlations between the resource utilization percentages (CPU and memory) and latency, MIR, and loss rate are depicted in Fig. 7, 8, and 9, respectively, with a 95% confidence interval. As can be seen from these figures, CPU utilization has the most significant relations with loss rate, MIR, and latency, respectively, while, as already demonstrated by the corresponding regression correlations, memory has less impact on the KPIs. To sum up, we can assume that the CPU has a higher influence on VNF-level performance, as confirmed in [17]; therefore, CPU is more likely than memory to become a bottleneck resource.

VI. PREDICTION METHODOLOGY AND RESULTS
As discussed, different VNF orchestration decision-makers can acquire proactive capability by having profiling information, including profile-based predictions and analysis, as their input. The profiling information can be used in two phases of the VNF DevOps cycle: development and operation.

Fig. 9. Correlation between Loss Rate and (a) Memory Utilization, (b) CPU Utilization (with 95% confidence interval).
So far, different data-analytical approaches have been presented to analyze the impact of the underlying virtualized system's resource configurations and each VNF's input rate on the performance characteristics and resource utilization of that VNF, using the profile data. Accordingly, the metrics that significantly impact VNF-level service performance are constructed and used by different machine learning classification approaches in the Prediction Module to make performance predictions for proactive VNF management. In the remainder of this section, we explain the profile-based prediction module of the proposed PDPA framework.

A. LABELING SYSTEM
Machine learning classification, as a supervised learning method, requires labeled data. Several previous classification studies apply quantitative labeling, which represents the percentage of the actual resource utilization by the VNF. It is discussed in [8] that relying only on quantitative resource utilization can misguide network and service providers into wrong management decisions, since the predicted utilization values may be inaccurate as a result of stale monitoring values. We instead propose a qualitative labeling system to describe different categories of VNF resource utilization states. The VNF resource utilization states are categorized into five qualitative states: trivial, low, medium, high, and critical. These represent the set of target classes that the classifier tries to predict. Although qualitative labeling cannot specify exactly how a resource is used, it provides enough information about VNF resource utilization without leading to wrong perceptions. Therefore, we apply qualitative labeling, inspired by the labeling policy in [35], defined as follows:

L(U_{i,j}) = l_i^m,  if  β_{i,j,m} ≤ U_{i,j} < β_{i,j,m+1}

where U_{i,j} is the utilization of resource i by VNF j, i ∈ {CPU, Memory, Link}, and l_i^m ∈ L(U_{i,j}) is the corresponding label when U_{i,j} lies between β_{i,j,m} and β_{i,j,m+1}. The utilization boundaries are adjustable parameters for each resource and VNF, assigned by the network operator or service provider. Service providers can set these boundaries according to their resource utilization policies and the initial resource requirements defined in the VNF descriptors.
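A minimal sketch of this labeling rule is shown below; the boundary values are hypothetical examples, since in PDPA they are operator-assigned per resource and per VNF:

```python
# Hypothetical utilization boundaries (fractions of capacity) for one resource
# of one VNF; in PDPA these are set by the operator or service provider.
BOUNDARIES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
LABELS = ["trivial", "low", "medium", "high", "critical"]

def label_utilization(u, boundaries=BOUNDARIES, labels=LABELS):
    """Map a utilization value u in [0, 1] to its qualitative class:
    label m is assigned when boundaries[m] <= u < boundaries[m+1]."""
    for m, lab in enumerate(labels):
        if boundaries[m] <= u < boundaries[m + 1]:
            return lab
    return labels[-1]  # u at (or above) full capacity falls in the critical class
```

Applying this function to the monitored utilization columns turns the profile dataset into the labeled training set the classifiers require.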

B. PREDICTION MODELS AND TRAINING
We use TensorFlow and Scikit-Learn to implement the proposed Predictor component of PDPA. The VNF performance dataset collected using the Profiler is used as the training data. The Pre-processor component removes any missing or erroneous values and then scales all data using the standard scaler to standardize the range of the data features. We have explored different types of classification algorithms, including decision-tree-based algorithms (random forest (RF) and decision tree (DT)), artificial neural networks (ANN) (multi-layer perceptron (MLP)), Bayesian-network-based algorithms (Gaussian Naive Bayes (GNB)), and the K-nearest neighbor (KNN) algorithm. We considered 11 neighbors for the KNN model. It is worth mentioning that the MLP model (a feedforward ANN model, where each layer is fully connected to the following one) is the only model for which we performed hyperparameter optimization. We applied the GridSearchCV method (an exhaustive search technique for finding optimal hyperparameter values) with five folds over a parameter grid to find the optimal values of the hyperparameters, including hidden layer sizes, activation function, learning rate, optimizer, L2 regularization term, and maximum iterations.
The explored parameter grid, along with the obtained optimal values of the MLP model's hyperparameters, is shown in Table 3. The input layer of the MLP model has one node for each input rate and resource configuration parameter, and the output layer has one node for the predicted resource utilization class. Moreover, for our multiclass resource utilization problem, we also used multinomial logistic regression (LR), a modified version of logistic regression that predicts a multinomial probability (i.e., over more than two classes) for each input example. We considered newton-cg as the solver of the LR model.
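A minimal sketch of such a grid search is shown below, using a tiny synthetic dataset and a reduced parameter grid; the actual grid and the optimal values in Table 3 are obtained from the collected VNF dataset:

```python
# Hedged sketch of the MLP hyperparameter search: synthetic data and a
# reduced grid stand in for the real VNF dataset and the full grid of Table 3.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((150, 4))           # stand-in for input rate + resource configs
y = rng.integers(0, 5, 150)        # stand-in for the five utilization classes

param_grid = {
    "hidden_layer_sizes": [(16,), (32, 16)],
    "activation": ["relu", "tanh"],
    "alpha": [1e-4, 1e-3],         # L2 regularization term
}
search = GridSearchCV(
    MLPClassifier(max_iter=100, random_state=0),
    param_grid,
    cv=5,                           # five folds, as in the text
)
search.fit(X, y)
print(search.best_params_)
```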

C. EVALUATION METRICS
Given a trained machine learning classifier and a test set, the test outcome is divided into four groups: i) True Positive (TP), ii) False Positive (FP), iii) True Negative (TN), and iv) False Negative (FN). Based on these groups, the following metrics are used: • Precision: is a measure of exactness, which represents the ratio between the number of resource utilization classes that are correctly predicted and the total number of predicted instances of that class (TP/(TP + FP)). • ROC area: the area under the receiver operating characteristic curve, in which the true positive rate (TP/(TP + FN)) is plotted as a function of the false positive rate (FP/(FP + TN)). ROC area is a robust metric for machine learning classifier performance evaluation.
• Recall: is a measure of completeness, which represents the ratio between the number of resource utilization classes that are correctly predicted and the total number of relevant classes (TP/(TP + FN)).
• F-measure: is a metric that combines precision and recall in a single score as their harmonic mean. The formal definition of F-measure is [36]: F-measure = 2 × (Precision × Recall)/(Precision + Recall). • Mean absolute error (MAE): is used to calculate the error rate of each classifier-based model, which represents the success of prediction. If the predicted values on the test instances are p_1, p_2, ..., p_n, and the actual values are a_1, a_2, ..., a_n, for n data points, then MAE is formally defined as [36]: MAE = (1/n) × Σ_{i=1}^{n} |p_i − a_i|.
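These metrics can be illustrated with a small example; the class labels and numeric values below are made up purely for the illustration:

```python
# Illustrative computation of the metrics defined above; the labels and the
# numeric values are invented for the example.
from sklearn.metrics import f1_score, precision_score, recall_score

actual    = ["low", "low", "high", "medium", "high", "low"]
predicted = ["low", "high", "high", "medium", "low", "low"]

precision = precision_score(actual, predicted, average="macro", zero_division=0)
recall    = recall_score(actual, predicted, average="macro", zero_division=0)
f_measure = f1_score(actual, predicted, average="macro", zero_division=0)

# MAE over predicted values p_i against actual values a_i
p = [1.0, 2.5, 3.0]
a = [1.5, 2.0, 3.0]
mae = sum(abs(pi - ai) for pi, ai in zip(p, a)) / len(p)   # (0.5 + 0.5 + 0)/3
```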

D. ACCURACY OF PERFORMANCE PREDICTION
In this section, we show the effectiveness of each machine learning classifier-based prediction model for each resource utilization of the VNF instance. In the following, we show the experimental results of the LR, DT, RF, GNB, KNN, and MLP based resource usage prediction models. To evaluate our approach, we employ the tenfold cross-validation technique on the dataset, which splits the VNF performance data into ten sets of size N/10. Each model is trained on nine sets and tested on the remaining set; following the cross-validation procedure, this is repeated ten times and we take the mean prediction result for each model. To show the effectiveness of each classifier-based model, we calculate and compare the prediction results in terms of the Accuracy, MAE, Precision, Recall, F-measure, and ROC value defined above. Tables 4, 5, and 6 show the prediction results of each classifier-based model in terms of Accuracy, MAE rate, Precision, Recall, F-measure, and ROC value for CPU, Memory, and Link utilization, respectively. As shown in Table 4, the correctly classified instances of RF are 75%, higher than those of the other classifier-based models. Similarly, according to the experimental results in Table 5, the correctly classified instances of RF are 93%, again higher than the other classifiers. All experimental results in Tables 4, 5, and 6 show that, among the machine learning algorithms used to train the classifier, RF has the highest precision for CPU, memory, and link utilization prediction. The differences in prediction accuracy are due to different ''class'' generation results.
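The tenfold cross-validation procedure can be sketched as follows, with synthetic stand-in data in place of the collected VNF performance dataset:

```python
# Illustrative tenfold cross-validation comparing several of the classifiers
# discussed above; synthetic data stands in for the VNF performance dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.random((300, 4))
y = rng.integers(0, 5, 300)

models = {
    "RF": RandomForestClassifier(random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
    "GNB": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=11),  # 11 neighbors, as above
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)  # ten folds; report the mean
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```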
To evaluate the complexity of our work, we compared it with the complexity of NAP, which provides the maximum automation and is the approach most similar to PDPA. Since both NAP and PDPA are offline approaches, the data collection step is the main factor defining their total complexity. Therefore, they have the same complexity when applying WRCS as the corresponding benchmarking agent. In the PDPA approach, the average time taken to select the resources using binary search for one record of performance profiles is less than 1.6 minutes, almost the same as in NAP. However, the average time to run and deploy the network service with three VNFs by the OSM in the PDPA implementation platform is 175.64 seconds, whereas in the NAP platform, which utilizes a Docker container for each VNF, the average time to run a Docker container is around 1.5 seconds [17].

VII. CASE STUDY
As we discussed in Section III, the VNF profiles extracted by the proposed framework and the profile-based predictions can be exploited by network service providers and administrators, or by different VNF orchestration components (LCM algorithms), to obtain essential insights about the VNFs and identify actions that improve system performance and resource usage. The VNF profile and profile-based predictions are used in two phases of the VNF DevOps cycle: • Before the deployment and during the development of a VNF (reactive mode): for example, the network service and function provider reactively specifies which resource configurations should be considered or where to place the VNF.
• During the operation (proactive mode): for example, the VNF and resource manager proactively reacts to a predicted decrease in service performance, or an increase or decrease in resource utilization, by making vertical and horizontal scaling decisions ahead of time. In this section, we explain a case study on how the extracted VNF profiles and predictions from the proposed automated profiling system improve the LCM decision, in the proactive mode, in a real-life scenario. For this case study, we consider VNF migration as one of the important high-cost, resource-consuming LCM actions. Due to the dynamic nature of the network, some physical nodes may not be able to provide sufficient resources, which leads to the migration of the VNF to a new node with a sufficient amount of resources. VNF migration is usually triggered by two situations: link overload and node overload. Link overload is when the link usage is higher than a predefined threshold, and node overload is when both CPU and Memory utilization are higher than their corresponding predefined thresholds.
We assume that one VM hosts only one VNF, so that VNF migration reduces to VM migration and the whole VM is migrated instead of a single VNF. We consider a migration policy based on a static VM resource utilization threshold: when the VNF running on a VM exceeds the utilization threshold, it triggers VM migration to another host with a sufficient amount of available resources. VM migration is an expensive operation that causes service downtime, introduces overhead, and must be performed carefully. Both link-aware and node-aware VM migrations can be avoided by having VM overload state predictions and proactively scaling up the link and resources of the VM before the overload state happens. Proper resource scaling can avoid overloading the VM's resources and the consequent VM migration, which in turn affects the reliability and consistency of service delivery. We illustrate this by profiling the performance of the Snort VNF (using the implementation described in Section IV) under different resource configurations and input data rates, which model the dynamic nature of the hosting VM, where the available resources and their corresponding demands are uncertain. We then select the RF classifier, which has higher precision for all link, CPU, and Memory utilization predictions than the other classifiers discussed in the previous section. It is used to derive a model from the VNF performance dataset that predicts the overload state of resource utilization as a function of the allocated resources and input rate. The static VM link utilization threshold is defined as 95% of the available link, and the static VM CPU and Memory utilization thresholds are defined as 80% of the available CPU and Memory, respectively. A data sample with resource utilization greater than the static threshold is labeled as an overload state, and otherwise as a normal state.
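The static-threshold labeling just described can be sketched as follows; combining the two conditions into a single overload label is one possible reading of the policy, and the 95% and 80% thresholds are the ones defined above:

```python
# Minimal sketch of the static-threshold overload labeling described above.
LINK_THRESHOLD = 0.95   # link overload: usage above 95% of the available link
NODE_THRESHOLD = 0.80   # node overload: BOTH CPU and Memory above 80%

def overload_state(link_util, cpu_util, mem_util):
    """Return (link_overload, node_overload) flags for one sample."""
    link_overload = link_util > LINK_THRESHOLD
    node_overload = cpu_util > NODE_THRESHOLD and mem_util > NODE_THRESHOLD
    return link_overload, node_overload

def label(link_util, cpu_util, mem_util):
    """Label a sample as 'overload' if either overload condition holds."""
    link_ol, node_ol = overload_state(link_util, cpu_util, mem_util)
    return "overload" if (link_ol or node_ol) else "normal"
```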
The objective is now to predict the overload state of the VM over several tests with stochastic resource configurations and different input rates, to evaluate how well profile-based predictions define VM link overload states and VM node overload states, which are useful for proactive link and resource scaling, respectively, and can avoid VM migrations. These tests are divided into seven filters (F1 to F7), as detailed in Table 7. The results in this table clarify the usefulness of the VNF profiles and their corresponding predictions. The number of predicted link and node overloads defines the number of times the VM's link and resources need to be scaled up to avoid VM migrations.
As represented in Table 7, for both link-aware and node-aware VM migrations, the profile-based prediction model can predict most of the overload states in all filters. In the former case, if proper link scaling is proactively applied in these states, the number of link-aware VM migration decisions decreases by 93% in F1 and F2, 100% in F4, and more than 88% in the other filters. In the latter case, if proper resource scaling is proactively applied in these states, the number of node-aware VM migration decisions decreases in most of the filters, for example, by 100% in F4 and 90% in F2. As also confirmed by the experiments presented in [37], with proactive scaling, by predicting the upcoming VNF resource utilization we can better utilize the limited and valuable resources over time. Without a VNF profile, we cannot predict upcoming link shortages and take the necessary scaling actions proactively to ensure that a certain resource configuration offers enough service reliability.

VIII. PDPA INTEGRATION WITH NEXT-GENERATION WIRELESS NETWORKS
In this section, we explain how the PDPA framework aligns with next-generation networks and their vision to drive network automation and service orchestration for future smarter data-driven networks that learn and improve.
A. 5G ANALYTICS
5G networks embed analytics by using the network data analytics function (NWDAF) in the 5G Core to enable service optimization and deliver network automation. NWDAF is defined in 3GPP TS 29.520 as a new 5G component that provides analytics insights to other 5G Core NFs as well as orchestration and management (OAM) systems. NWDAF can be configured to invoke the existing OAM services to retrieve the management data relevant for analytics generation, which may include NF resource configuration information [38].
Likewise, NWDAF can interface with PDPA to collect performance profile information that is of interest for analytics generation. PDPA has two categories of outputs to be consumed by the NWDAF: (i) insights, in the form of VNFs' resource-performance correlations and predictions, for the NWDAF as an analytics consumer; and (ii) data, in the form of VNFs' resource-performance dataset, for the NWDAF as a data consumer. NWDAF uses this VNF information to infer and provide operational insights to other 5G NFs as well as OAM systems to drive network automation and service orchestration. The NWDAF can also interface with PDPA to collect the profile data of the corresponding network slice managed object (including the Network Repository Function serving the network slice and the NFs associated with the network slice) to generate analytics for proactive resource scaling and autonomous management of the NSs accordingly.

B. 6G DATA-SPECIFIC KPIS
Towards the 6G era, emerging new network services and use cases, empowered by technologies like AI and Blockchain, will have new QoS and performance requirements. Therefore, 6G systems will introduce new data-specific KPIs like age of information (AoI), the value of information (VoI), and semantics of the information to capture the real requirements of the new use cases and perform optimal resource slicing and allocation. For example, one of the emerging use cases is mission-critical machine-type communications (cMTC) and a dedicated cMTC management function is needed to allocate optimal resources to them appropriately. This functionality requires resource awareness information, gathered from devices to control resource utilization. The ''on-time'' delivery of information as quantified by novel metrics like AoI is an important aspect of this network function [39].
We believe that the PDPA framework can be applied for resource and performance profiling of such network functions with new data-specific KPIs. However, the proposed benchmarking and monitoring agents (WRCS and Prometheus, respectively) of the PDPA framework would need to be replaced with new benchmarking and monitoring solutions that have already been tested for such 6G data-specific KPIs, although the community still needs to understand well and quantitatively define these KPIs beforehand. Then, by incorporating the appropriate monitored metrics into our feature extraction process, the Predictor component of PDPA can predict the data-specific KPIs of the VNF.

C. FUTURE HIGH DYNAMIC WIRELESS NETWORKS
Next-generation wireless networks, with new applications demanding high user mobility and accommodating innovative technologies, are expected to be extremely dynamic. Online learning is a potential extension to the PDPA, where the learning models are continuously updated with the newly generated profile data, adapting to changing network conditions and requirements in real time. One possible approach to adding online learning to the PDPA framework is to retrain the offline-trained models using the samples collected online. To this end, the best offline-trained model is selected and further trained using online data gathered while the VNF is deployed and running in a live (production) environment. However, there are some limitations and challenges that need to be considered when retraining the offline-trained models in an online learning environment.
• Resource constraints: The amount of data required for model training is significant, and collecting and processing this data in real-time may require significant computational resources. This can affect the computational efficiency of the proposed method and may require the use of specialized hardware or distributed computing systems.
• Data collection: The quality and diversity of the data collected in an online environment may not be as good as in an offline environment (online measurements are prone to noise), leading to lower accuracy of the learning model. To mitigate measurement noise, measurements can be repeated several times, and some form of regularization should be used to prevent overfitting to measurement noise and reduce model complexity. This may require additional monitoring and benchmarking tools to ensure the quality and diversity of the data collected online.
• Model training: To make the method suitable for online learning, it may require the use of specialized techniques such as incremental learning algorithms, which can minimize the computational cost of model updates while maintaining the accuracy of the learning model.
Overall, implementing online learning in PDPA may be feasible with some modifications, but it requires careful consideration of the trade-offs between computational complexity and the benefits of real-time model updates.
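One possible sketch of the described retraining loop is shown below, with scikit-learn's `partial_fit` and an SGD classifier standing in for the actual offline-trained model, and synthetic data standing in for the profile dataset and the live measurements:

```python
# Possible sketch of the retraining approach discussed above: an incrementally
# trainable SGD classifier (a stand-in for the best offline model) is first
# fit on "offline" profile data, then updated with "online" batches.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(2)
classes = np.arange(5)                      # the five utilization classes

# Offline training phase on the profile dataset (synthetic here)
X_offline = rng.random((500, 4))
y_offline = rng.integers(0, 5, 500)
model = SGDClassifier(random_state=0)
model.partial_fit(X_offline, y_offline, classes=classes)

# Online phase: incremental updates from live monitoring batches
for _ in range(10):
    X_batch = rng.random((20, 4))
    y_batch = rng.integers(0, 5, 20)
    model.partial_fit(X_batch, y_batch)     # model keeps adapting

prediction = model.predict(rng.random((1, 4)))
print(prediction)
```

Incremental learners like this keep the cost of each model update proportional to the batch size, which is the property the "Model training" point above calls for.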
As another potential extension of the proposed PDPA framework, it can also be extended to incorporate physical node configurations of NFVI by making some adaptations. One approach is to modify the benchmarking and monitoring agents to incorporate physical node information, such as the different applied technologies of CPU and network interfaces. This information can be used to generate profiles of the VNFs with additional features that capture physical node configurations. These additional features can then be used as input to the machine learning models along with other performance and resource-related features. The machine learning models can then learn the correlations between physical node configurations and VNF performance, which can help in making proactive orchestration decisions. However, incorporating physical node configurations in the proposed method may increase the complexity of the data collection process and the machine learning models, and may require additional computational resources.

IX. CONCLUSION AND FUTURE WORK
Although the existing network function virtualization management and orchestration systems provide significant support for the management of resources and the LCM of VNFs, they lack the intelligence to support the orchestration capabilities autonomously. On the other hand, the uncertainty of the performance and resource utilization behavior of VNFs causes severe inefficiency in most mathematically driven models intended to improve resource efficiency and ensure service KPIs. In this work, we propose the use of the VNF profile to create analytical approaches that exploit the profiled data and model the VNF-level performance and resource utilization behavior of services. The proposed profile-based data-driven framework provides intelligence to the NFV-MANO or service operator to map the service performance specifications agreed in the Service Level Agreement to an adequate resource allocation, towards efficient and proactive resource management and life cycle management in the virtualized infrastructure.
In the future, we will integrate the PDPA framework with different VNF resource management, VNF placement, migration/scaling, and mobility management algorithms by proactively predicting performance drops and resource demand bursts. We will also expand the PDPA framework towards the inclusion of the following considerations: • Hierarchical computing NFVI (edge device, edge server, cloud server) and their respective resource configurations; • Dynamicity of the network and service conditions, through online learning based on real-time information about VNFs; • Incorporation of different aspects of the entire NFVI, such as physical node configurations.