iOn-Profiler: Intelligent Online Multi-Objective VNF Profiling With Reinforcement Learning

Leveraging the potential of Virtualised Network Functions (VNFs) requires a clear understanding of the link between resource consumption and performance. The current state of the art tries to do that by utilising machine learning and specifically Supervised Learning (SL) models for given network environments and VNF types assuming single-objective optimisation targets. Taking a different approach, iOn-Profiler poses a novel VNF profiler optimising multi-resource type allocation and performance objectives using adapted Reinforcement Learning (RL). Our approach can meet key performance indicator targets while minimising multi-resource type consumption and optimising the VNF output rate compared to existing single-objective solutions. Our experimental evaluation with three real-world VNF types over a total of 39 study scenarios (13 per VNF), for three resource types (virtual CPU, memory, and network link capacity), verifies the accuracy of resource allocation predictions and corresponding successful profiling decisions via a benchmark comparison between our RL model and SL models. We also conduct a complementary exhaustive search-space study revealing that different resources impact performance in varying ways per VNF type, implying the necessity of multi-objective optimisation, individualised examination per VNF type, and adaptable online profile learning, such as with the autonomous online learning approach of iOn-Profiler.


Introduction
The rise of Cloud computing, Software-Defined Networking (SDN) and Network Function Virtualisation (NFV) has caused a paradigm shift from traditional networking based on specialised hardware to general-purpose programmable hardware used as a resource for running Virtualised Network Functions (VNFs). This change has simplified the design, deployment, and management of network services, with network-based service providers offering Service Level Agreements (SLAs) to their customers that outline performance requirements and Key Performance Indicator (KPI) levels (e.g., throughput, packet loss, response time, and processing latency).
At the core of achieving SLA goals lies the essential process of VNF profiling. It involves the systematic analysis and characterisation of different VNFs within a programmable SDN environment. The primary objective is to understand each VNF's individual resource requirements, performance expectations and operational behaviour by discovering the relationship between resource configuration and performance. This knowledge enables service providers to decide (i) the optimal allocation of network and computation resources, such as CPU or bandwidth, for each VNF instance, while (ii) ensuring adherence to the predefined KPI thresholds derived from SLA goals. VNF profiling is undertaken by a "VNF profiler", and the resulting profile describes a discovered reciprocal mapping between optimised resource allocations and the KPI thresholds for the respective VNF, which makes it possible to know the expected performance for a given resource allocation and vice versa.
In the context of contemporary networks such as 5G and future 6G, attention to profiling is driven by its significance for NFV MANagement and Orchestration (MANO) systems. The latter can use VNF profiles to instantiate Network Services (NSs) by adopting optimised resource configurations. Moreover, profiles can be used to optimise the life-cycle management of running services. As an example, the 5G-VIOS [1] common inter-facility orchestration platform leverages autonomously generated [2] profiling models to deploy and orchestrate inter-edge NSs across multiple domains and facilities by (i) autonomously assigning optimised resource configurations to inter-edge NSs while also (ii) exposing corresponding performance profiles.
The current work presents iOn-Profiler, an online VNF profiler that leverages adaptive Reinforcement Learning (RL). In summary, our most significant and novel contributions are:
1. Online, multi-objective optimisation profiling: We investigate RL-based adaptive VNF profiling for minimising the use of both compute and network resources, as well as finding the Optimum Output Rate (OR). The latter stands for the output rate achieved by the profiled VNF under an optimal (i.e., minimum) resource configuration that meets KPI targets.
2. Pragmatic VNF case studies: We consider three pragmatic VNFs in our experimental study, namely a virtual FireWall (vFW) and two different modes of the Snort [3] open source intrusion prevention system (Inline and Passive modes). Besides being pragmatic, these VNFs span both dissimilar and similar features, which allows us to assess the footprint of functionality on the resulting profiles.
3. Oracle exhaustive search: We conduct an exhaustive search of resource-to-KPI combinations for all VNFs and all resource types to establish complete performance knowledge and to understand how strongly each resource type affects the performance of each VNF type. We then use this Oracle-gained insight to carefully explore and tune the RL reward function parameters of iOn-Profiler.
4. Extensive experimental analysis: Overall, our evaluation study highlights that different resources impact VNF performance in distinct ways. Besides the Oracle search, this conclusion is also established through the analysis of each VNF type's Pareto front over a total of 39 scenarios (13 for each of the 3 VNF types). Our highlight results and conclusions include:
• Multi-objective optimisation is necessary for proper VNF profiling.
• There is a strong requirement for studying each VNF type and mode of operation individually, as demonstrated for Snort (Passive vs. Inline modes).
• Online learning is significant, as fixed Supervised Learning (SL) models lack adaptability to dynamics.
The rest of this article is organised as follows. Section 2 provides the reader with the necessary background context. Section 3 discusses the design of iOn-Profiler. The experimental setup, resource and model configurations are described in Sec. 4. Our experimental evaluation is presented in Sec. 5, followed by our future work and conclusion in Sec. 6.

Background & Motivation in Intelligent VNF profiling
We discuss the essential background context, encompassing the problem statement, the specific objectives set for the proposed iOn-Profiler solution, and an in-depth analysis of the state of the art in intelligent VNF profiling.

Problem statement & utilising machine learning
Let $I$ be the set of all considered resource types $i \in I$, and $K$ be the set of every considered KPI type $k \in K$. Let $T_k = \{\tau_1, \tau_2, \ldots, \tau_\omega\}$ be a partition set of considered performance thresholds for KPI type $k$. Also, let $x = \{x_1, x_2, \ldots, x_v\}$ be the decision vector of KPI threshold targets for allocating resources, where each threshold target $x_k$ corresponds to KPI $k$ and can take a value only from $T_k$, and $x \in X$, where $X$ is the feasible set of decision vectors. Last, let $f_i(x)$ be the allocated amount for resource type $i$, given the KPI threshold targets $x$. We define a multi-objective optimisation problem that minimises $f_i(x)$ for every resource type $i \in I$ over $x \in X$. To consider maximum KPI thresholds as well (e.g., for packet drop rate), we adopt appropriate minimum and maximum threshold constraints, $\tau_{\min}$ and $\tau_{\max}$.
As detailed in Sec. 2.3, Machine Learning (ML) is a dominant trend in the VNF profiling literature due to its adaptability to complex environments. Compared to other prominent works based on linear programming and heuristics (e.g., [4,5]), ML solutions delve deeper into VNF-to-resource specifics and capture the core challenges better. First, ML can better capture network and service dynamics, particularly in 5G and 6G programmable networks, which are highly agile. Second, ML solutions can do so within practical time-scales despite the NP-hardness [6] of the underlying optimisation problem, by converging towards optimised configurations involving different resources and subject to KPI targets.
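For reference, the multi-objective problem stated at the start of this subsection can be sketched in the following form. This is a reconstruction from the surrounding definitions only and is not necessarily the exact formulation of the original equation; in particular, writing the number of resource types as $n = |I|$ and applying the bounds $\tau_{\min}$ and $\tau_{\max}$ to every KPI $k$ are assumptions.

```latex
\begin{equation*}
\begin{aligned}
\min_{x \in X} \quad & \bigl( f_1(x),\, f_2(x),\, \ldots,\, f_n(x) \bigr), \qquad n = |I| \\
\text{s.t.} \quad & x_k \in T_k, \qquad \forall k \in K, \\
& \tau_{\min} \le x_k \le \tau_{\max}, \qquad \forall k \in K .
\end{aligned}
\end{equation*}
```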
Prominent examples of ML models used for profiling include Linear Regression [7], K-Nearest Neighbors Regression (KNNR) [8], Interpolation [9], Artificial Neural Networks (ANNs) [10], and Curve Fit [11]. However, it has been shown [12] that regression is not well-suited for predicting saturation regions, while SL models like ANN and KNNR, along with Interpolation, do not provide configuration trends with a monotonic rising function. In contrast, Curve Fit achieves high accuracy in predicting VNF performance but is limited in multi-objective resource optimisation. Moreover, SL models explored for service-level VNF profiling and placement may prove suitable under static conditions [13]; however, they can significantly underperform under dynamic network conditions [14] such as those in contemporary networks. Last, an important weakness of most ML works is that they treat profiling as a single-objective (i.e., single resource type) optimisation problem. This lacks realism, as most VNFs need more than one resource type, and combinations of allocated resource amounts have a non-linear impact on the resulting KPIs.

Solution objectives
The problem statement presented above establishes the context for the current solution effort, which revolves around four primary research objectives:
Objective #1: The profiler should accommodate multiple resource types and KPIs, and must efficiently converge towards optimised VNF configurations within practical timeframes, despite the NP-hardness of the underlying optimisation problem.
Objective #2: Leverage online learning ML techniques to effectively adapt to the dynamics of contemporary networks.
Objective #3: Investigate the impact of different and pragmatic VNF types, with varying functionality features, on optimal resource allocation concerning specific KPI targets.
Objective #4: Conduct a comprehensive evaluation by comparing the proposed online learning solution against state of the art SL-based VNF profiler models.
In pursuit of these objectives, iOn-Profiler extends our prior work of [15] with an (i) in-depth analysis of the complex results obtained from an exhaustive search of resource-to-KPI combinations, to (ii) gain a comprehensive understanding of the relevance of resource-to-KPI and resource-to-VNF type relationships, thus making it possible to (iii) fine-tune the parameters of the online learning model in iOn-Profiler. Additionally, our extension involves considering (iv) a broader set of pragmatic VNF types, encompassing different features, and exploring (v) multiple optimisation scenarios per VNF. Lastly, the article presents a (vi) meticulous experimental evaluation of state-of-the-art SL-based VNF profiler benchmark models, including Random Forest (RF) and Multi-Layer Perceptron (MLP). The presented research aims to advance the field of VNF profiling and contribute valuable insights into enhancing the efficiency and performance of future network architectures.
Compared to the rest of the state of the art in ML-based profiling (elaborated in Sec. 2.3), iOn-Profiler is designed to cover existing gaps (see Tab. 1). We go beyond single-objective optimisation by utilising RL to better fit real-world applications while being adaptable to network dynamics. We exploit carefully designed reward functions for the multi-objective optimisation of virtual Central Processing Unit (vCPU), memory, and network Link Capacity (LC) resource allocations that can achieve desirable VNF KPI targets such as CPU utilisation, memory utilisation, latency and Optimum OR.
To do so, our comprehensive study considers a wide spectrum of different scalarisation weights among the vCPU, memory and LC objectives, which describe the Pareto front of optimised resource-to-KPI combinations that we wish to approach in 39 scenarios. The Pareto front is the set of non-dominated solutions, i.e., solutions for which no objective can be improved without worsening another. In VNF profiling, the state of the art frequently ignores the Pareto front, which is a major research gap. Even when it is considered, this happens primarily in SL approaches tailored as "static" models trained for a given VNF type under specific conditions (e.g., network structure or traffic). Such models are cumbersome or even impossible to generalise, if realistic at all, for the highly agile and dynamic contemporary programmable networks.
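The notion of a non-dominated set can be illustrated with a short sketch. The code below is not part of iOn-Profiler; it assumes that each candidate is a (vCPU cores, memory in MB, LC in Mbps) allocation that already meets the KPI targets, that all three objectives are minimised, and the candidate values are purely hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: (vCPU cores, memory in MB, LC in Mbps).
Allocation = Tuple[float, float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives are minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Allocation]) -> List[Allocation]:
    """Keep only the allocations that no other allocation dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative candidates only, not measured results.
candidates = [
    (1.4, 1500, 600),
    (1.8, 1500, 600),  # dominated by the first candidate (more vCPU, same rest)
    (1.4, 1600, 550),
    (1.2, 1600, 700),
]
front = pareto_front(candidates)
```

Each scalarisation weight vector then selects one point on (or near) this front, which is why sweeping the weights traces out the front.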

State of the art
The state of the art discussed below is summarised and compared in Tab. 1. First, various important works [17,18,24] have explored offline profiling for VNF Service Function Chains (SFCs) with a focus on different resources. Regarding optimal VNF placement and profiling, RAVIN [21] introduces a resource-aware algorithm based on the Balanced Best Fit Decreasing (BBFD) heuristic. It enforces performance SLAs for multi-tenant NFV servers while balancing resource use, aiming to minimise server count, guarantee performance, and improve resource utilisation, including the processor's Last Level Cache and Memory Bandwidth (MB). However, extensive offline profiling that explores all possible VNF configurations, as in the aforementioned works, is time-consuming, which has led to models that focus on limiting the profiling time, such as [18]. Nonetheless, and unlike our current effort in iOn-Profiler, endeavours like [18] do not concurrently consider pivotal KPIs such as vCPU utilisation, memory utilisation, latency, throughput, and packet loss. In another study [23], researchers address the challenges of diagnosing NFV performance and introduce a metric referred to as the Coefficient of Interference. This metric quantifies the variations observed in per-packet latency measurements when performance diagnosis is applied, compared with cases in which it is not. Other works, such as ORCA [16] and z-TORCH [6], have streamlined the profiling process for data collection and optimal VNF placement. However, these approaches may not consider optimal KPIs and pre-defined resource configurations. Last, other notable contributions in the literature include NFV-Inspector [20], an automated profiling and analysis platform, and the work of [19], which utilises ML techniques such as Interpolation, Gaussian Process, ANN, and Linear Regression for predicting VNF performance.
Regarding our own contributions to the field of VNF profiling, the Novel Autonomous Profiling (NAP) method [2] focuses on offline autonomous profiling by identifying the initial optimal resource configuration for each standalone VNF based on a weighted resource configuration selection approach. Furthermore, our most recent work [22] introduces a novel autonomous temporal profiling technique, examining VNF behaviour across performance and resource utilisation aspects. The proposed technique automates the profiling processes, encompassing diverse resource types such as computation, memory, and network resources, to yield deeper insight into VNFs' resource-performance correlations. Finally, further to our prior works and, particularly, NAP [2], the current method in iOn-Profiler spans an offline training phase and an online learning phase that enables adapting to network dynamics at deployment time. Moreover, iOn-Profiler deploys VNFs on an established MANO platform, namely Open Source MANO (OSM) [25], and proposes a multi-objective VNF profiling strategy grounded in RL. Fitting RL profiling agents into a more complex RL model-based orchestration scheme is possible with hierarchical RL structures [26], which allow placing VNFs on nodes and allocating resources there by leveraging multiple sources of information, spanning VNF profiles, system-wide resource usage information [26] and even the mobility of service consumers [27].
In conclusion, using ML for VNF profiling has been extensively studied in various domains. These studies collectively demonstrate that ML significantly enhances the accuracy and other qualitative features of VNF profiling compared to traditional methods. As the field of intelligent VNF profiling continues to evolve, further advancements in ML-based approaches hold promise for improving network performance and resource optimisation. In this context, this paper stresses the advantages of RL compared to other intelligent solutions, covering all features listed in Tab. 1 except the use of an SL model.

iOn-Profiler model design

Integration into the next generation NFV MANO
Figure 1 illustrates the interaction between our proposed iOn-Profiler, the NFV Orchestrator (NFVO), the Virtualised Infrastructure Manager (VIM), and monitoring tools to provide an intelligent and autonomous NFV MANO system. The diagram shows not only this interaction but also the integration of iOn-Profiler into a next-generation intelligent NFV MANO. Through online profiling, a resource configuration is selected and the existing VNF descriptors are updated dynamically. As a result, the MANO system deploys a network slice with the newly defined resources. Fig. 1 also outlines iOn-Profiler's architecture.

Offline Profiling
Given a series of resource availabilities and a number of KPI targets, iOn-Profiler employs the NAP method [2] to select a baseline resource configuration. Resource and KPI types can be arbitrary, provided they are described by a value in a totally ordered bounded set with at least three elements (see Sec. 4.2, Tables 3 and 4). NAP is based on the concept of optimal Input Rate (IR) and OR and can be divided into three stages. The optimal IR is the maximum IR (in packets per second) associated with a specific resource configuration for which the system under test still respects all KPI targets, and the optimal OR is the OR associated with it.
In stage 1, NAP employs exponential ramp-up and binary search to find the optimal IR for each resource's upper and lower bounds while the other resources remain at their median. Using these values, weights are calculated to measure the influence of each resource on performance. In stage 2, NAP uses weighted random selection for applying resource configurations, measuring IR, OR, and KPIs. In stage 3, NAP trains a model to estimate the minimum resource allocation based on IR and KPI targets. The method uses the NFVO and VIM to deploy the VNFs, a traffic generator and a monitoring probe at each step. Traffic generators can be employed to overcome (a) a possible lack of available real traffic datasets and (b) the need for fine-tuning the IR as required by the algorithm. We refer to the above as offline profiling because it needs a dataset, which implies control of the IR for generating arbitrary network traffic conditions, and because the model must be trained before it can be used in production.
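The stage 1 search for the optimal IR can be pictured as follows. This is a minimal sketch rather than the NAP implementation: the predicate `meets_kpis`, the starting rate, the search resolution and the upper cap are all hypothetical, and the sketch assumes that KPI compliance is monotone in the IR (compliant up to some rate and violating above it).

```python
from typing import Callable


def find_optimal_ir(meets_kpis: Callable[[float], bool],
                    start_ir: float = 50.0,
                    resolution: float = 1.0,
                    max_ir: float = 1e6) -> float:
    """Return the largest input rate (to within `resolution`) for which
    `meets_kpis` holds, for one fixed resource configuration.

    `meets_kpis(ir)` is a hypothetical callback that deploys traffic at
    rate `ir` and reports whether all KPI targets are still respected.
    """
    if not meets_kpis(start_ir):
        return 0.0  # even the starting rate violates a KPI target

    # Exponential ramp-up: double the rate until a KPI target is violated.
    low, high = start_ir, start_ir * 2.0
    while meets_kpis(high):
        if high >= max_ir:
            return high  # never violated within the allowed range
        low, high = high, min(high * 2.0, max_ir)

    # Binary search: `low` is compliant, `high` is not.
    while high - low > resolution:
        mid = (low + high) / 2.0
        if meets_kpis(mid):
            low = mid
        else:
            high = mid
    return low


# Toy system under test that stays compliant up to a rate of 730.
optimal_ir = find_optimal_ir(lambda rate: rate <= 730.0)
```

The ramp-up brackets the optimum in a logarithmic number of probes, and the binary search then narrows the bracket, which matters when every probe is a real traffic run.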

Online Multi-Objective Optimisation Profiling
After deploying the VNF with baseline resources, iOn-Profiler employs Q-Learning (see Sec. 3.2) to address possible discrepancies after moving from the staging environment, where the Offline Profiling regression model is trained, to an online dynamic environment. Possible disparities are recognised when the target KPI thresholds are breached, prompting the resetting of the exploration rate and other learning parameters (see Sec. 3.1.2). This continuous optimisation tries to minimise resource usage without violating KPI targets and to improve allocation accuracy. Therefore, the optimisation objectives need to match the same set of resource types selected for the offline profiling, subject to the same restrictions. Each action uses the NFVO and VIM APIs to scale the VNF instance in or out, together with the exposed monitoring capabilities. Last, the term online profiling is due to the profiler (i) observing only existing network traffic without control over IR, and (ii) being used in a production environment.
[Algorithm 1, excerpt: call Algorithm 2 to find a′ based on s′ that gives the maximum scalarised Q-value.]
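The breach-triggered reset described above can be sketched as a small exploration schedule. Everything concrete here is an assumption made for illustration: the class and function names, the decay factor, the minimum exploration rate, and the simplification that every KPI has an upper threshold.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ExplorationSchedule:
    """Hypothetical epsilon schedule: decay while KPI targets hold,
    reset to the initial value when a target is breached."""
    epsilon_start: float = 1.0
    epsilon_min: float = 0.05
    decay: float = 0.99
    epsilon: float = field(init=False)

    def __post_init__(self) -> None:
        self.epsilon = self.epsilon_start

    def step(self, breached: bool) -> float:
        if breached:
            # The environment has drifted: explore again from scratch.
            self.epsilon = self.epsilon_start
        else:
            self.epsilon = max(self.epsilon_min, self.epsilon * self.decay)
        return self.epsilon


def kpi_breached(measured: Dict[str, float],
                 max_thresholds: Dict[str, float]) -> bool:
    """True if any measured KPI exceeds its (assumed upper) threshold."""
    return any(measured[name] > limit for name, limit in max_thresholds.items())


schedule = ExplorationSchedule()
schedule.step(kpi_breached({"latency_ms": 8.0}, {"latency_ms": 10.0}))   # decays
schedule.step(kpi_breached({"latency_ms": 12.0}, {"latency_ms": 10.0}))  # resets
```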

Multi-objective reinforcement learning model adaptation
Algorithm 1 describes our multi-objective RL approach to optimising resource allocation for a given type of VNF. This approach addresses a Markov decision process by dynamically constructing a Q-table ($Q_o$) for each optimisation objective ($o$), which stores the estimated discounted sum of future rewards for each possible action ($a$) at a given state ($s$). The Q-tables gradually converge by exploring the action space and performing updates based on the recursive Bellman equation (shown in line 17). Our model considers the following definitions for state ($s$), action ($a$) and reward ($R_o$):
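A per-objective Q-table update of this kind might look like the sketch below. It is not a transcription of Algorithm 1: the class name, the learning rate `alpha` and the discount factor `gamma` are assumptions, and the next action `a_next` is supplied by the caller, on the assumption that it is the action with the highest scalarised Q-value in the next state.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


class MultiObjectiveQ:
    """One lazily built Q-table per optimisation objective."""

    def __init__(self, n_objectives: int, alpha: float = 0.1, gamma: float = 0.9):
        self.alpha = alpha  # learning rate (hypothetical value)
        self.gamma = gamma  # discount factor (hypothetical value)
        # Unseen (state, action) pairs default to a Q-value of 0.
        self.q: List[defaultdict] = [defaultdict(float) for _ in range(n_objectives)]

    def update(self, s: Hashable, a: Hashable, rewards: Sequence[float],
               s_next: Hashable, a_next: Hashable) -> None:
        """Bellman-style update of every objective's table for one transition."""
        for o, reward in enumerate(rewards):
            target = reward + self.gamma * self.q[o][(s_next, a_next)]
            self.q[o][(s, a)] += self.alpha * (target - self.q[o][(s, a)])


# Three objectives, e.g. vCPU, memory and LC rewards for one transition.
agent = MultiObjectiveQ(n_objectives=3, alpha=0.5, gamma=0.9)
agent.update("s0", "decrease_vcpu", [1.0, 0.5, 0.0], "s1", "keep")
```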

State
A vector that encompasses allocated resources (e.g., vCPU cores number) in addition to the measured KPIs (e.g., vCPU utilisation) and OR.

Action
The set of feasible actions encompasses increasing, decreasing, or preserving resource assignments. These actions induce shifts between various states of allocation (e.g., incrementing or decrementing the number of vCPU cores). In terms of action choice, we employ a scalarised ϵ-greedy algorithm, which selects the action with the highest scalarised value over the individual per-resource rewards with probability 1 − ϵ, and explores a random action otherwise.
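The ϵ-greedy choice can be sketched in a few lines. The function name and the dictionary representation are assumptions; the input is taken to be one already-scalarised Q-value per feasible action in the current state.

```python
import random
from typing import Dict, Optional


def epsilon_greedy(scalarised_q: Dict[str, float], epsilon: float,
                   rng: Optional[random.Random] = None) -> str:
    """Pick a random action with probability epsilon, otherwise the action
    with the highest scalarised Q-value."""
    rng = rng or random.Random()
    actions = list(scalarised_q)
    if rng.random() < epsilon:
        return rng.choice(actions)             # explore
    return max(actions, key=scalarised_q.get)  # exploit


# Hypothetical scalarised Q-values for the three action kinds.
sq = {"increase": 0.20, "decrease": 0.65, "keep": 0.40}
greedy_choice = epsilon_greedy(sq, epsilon=0.0)
```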

Reward Function
To find the reward function for each VNF type, we consider and optimise the parameters of the following reward function model. For each resource type (i.e., objective), we use the zedoid (i.e., a reverse sigmoid) general formula $f(x) = \frac{1}{1+e^{x}}$. The zedoid function yields reward values that decay gradually with increased resource allocations and vice versa. Therefore, the rewards promote a more cost-efficient use of resources. We adopt an appropriately parametrised (discussed in Sec. 4.4) version of the zedoid function depicted in Fig. 2. The adopted zedoid is shifted by 0.5 units. This is a desired transposition of the zedoid curve so that reward values reflect meaningful (i.e., positive) resource allocations over the x-axis. It is worth noting that in Fig. 2, the blue solid curve and the translated red dotted curve follow the formulas $\frac{1}{1+e^{8x}}$ and $\frac{1}{1+e^{8(x-0.5)}}$, respectively. We also impose a penalty for constraint violation (including KPI targets) by mapping the computed value to 0. The general formula of the adopted reward function for each resource type is defined in (2), where $\hat{o}$ is the allocated resource, i.e., the number of allocated vCPU cores, the amount of allocated memory or LC; and $\beta$ is the steepness coefficient of the resource reward function, which defines the curve steepness that best fits a resource type's adaptability to allocation changes. We return to the selection of $\beta$ and optimise it in Sec. 4.4.
We extend the scalarisation function from single to multiple objective calculations, as in Algorithm 2. For each action (lines 2-3), the Q-values from all objectives are put in a vector as in (3), where $m$ is the number of optimisation objectives. This vector and a weight vector $w = (w_1, w_2, \ldots, w_m)$ are applied to the scalarisation function $f(v, w)$ to calculate the scalarised Q-value (SQ) according to (4). The sum of all weights must be 1. At line 5, SQ is appended to the SQlist. Finally, at line 7 the algorithm returns the action $a'$ corresponding to the highest SQ.
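The reward model and the scalarisation step can be put together in a short sketch. Since equations (2) to (4) are not reproduced in this text, the code makes explicit assumptions: the allocated resource is normalised to [0, 1], the reward is the shifted zedoid $1 / (1 + e^{\beta(\hat{o} - 0.5)})$ with 0 returned on any constraint violation, and the scalarisation function is a linear weighted sum. All function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def zedoid_reward(allocated: float, beta: float, constraints_ok: bool,
                  shift: float = 0.5) -> float:
    """Shifted reverse sigmoid: lower allocations earn higher rewards.
    `allocated` is assumed to be normalised to [0, 1]."""
    if not constraints_ok:
        return 0.0  # penalty for violating a constraint or KPI target
    return 1.0 / (1.0 + math.exp(beta * (allocated - shift)))


def scalarised_q(q_values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-objective Q-values (assumed linear scalarisation)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * q for w, q in zip(weights, q_values))


def best_action(q_by_action: Dict[str, Sequence[float]],
                weights: Sequence[float]) -> str:
    """Compute SQ for every action and return the action with the highest SQ."""
    sq_list = {a: scalarised_q(q, weights) for a, q in q_by_action.items()}
    return max(sq_list, key=sq_list.get)


weights = (1 / 3, 1 / 3, 1 / 3)  # equal weights for vCPU, memory and LC
q_by_action = {
    "increase": (0.2, 0.3, 0.1),
    "decrease": (0.9, 0.3, 0.6),
    "keep": (0.5, 0.5, 0.5),
}
chosen = best_action(q_by_action, weights)
```

With β = 8 and a shift of 0.5, `zedoid_reward(0.5, 8, True)` sits at the midpoint of the curve, and the reward falls towards 0 as the normalised allocation approaches 1.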

Solution complexity and practical costs
The execution (time and memory) complexity of iOn-Profiler is defined by the interaction between actions and the state space of the underlying Q-learning process. Each state includes the allocated resource values, KPI thresholds, parameters referring to the input passed to the VNF (e.g., input request traffic), and last, the observed KPI measurements. As such, the state space complexity is defined by the count of (a) the resource types considered, (b) the input types, and (c) the targeted KPIs. Given the former, the asymptotic execution complexity is also a function of (i) the granularity of possible resource assignment levels per resource; (ii) the number of resources; (iii) the measurement granularity per KPI; and last (iv) the possible input levels per input type.
Further to the problem definition (Sec. [...]) [...] The above gives an asymptotic upper bound for the space complexity and the time needed to explore the whole space [...]. Practical implementations of Q-learning solutions set thresholds for action steps, hence reducing memory and time costs significantly. Another aspect of time costs refers to the adopted learning rate in RL. The demonstrative implementation setup considered in the current paper assumes three types of resources; one input type, namely the input traffic to the VNF; and four target KPIs with one threshold target each (see Tab. 4). As such, the state space is represented by a vector of nine elements: the allocated vCPU cores, memory, and output LC; the four KPI measurements; the input traffic value; and the computed scalarised Q-value. Given the adopted resource configuration values in Tab. [...]
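The nine-element state vector can be written down as a small data structure. The field names and units below are assumptions chosen for readability, and the four KPI fields follow the KPI targets named earlier (CPU utilisation, memory utilisation, latency and OR). The helper that multiplies the number of discrete levels per dimension is only a generic upper bound for a discretised space, not the complexity expression of the paper, and the level counts in the example are hypothetical.

```python
import math
from dataclasses import dataclass, fields
from typing import Sequence


@dataclass(frozen=True)
class ProfilerState:
    # Allocated resources
    vcpu_cores: float
    memory_mb: float
    link_capacity_mbps: float
    # Measured KPIs
    cpu_utilisation: float
    memory_utilisation: float
    latency_ms: float
    output_rate_mbps: float
    # Observed input and learned value
    input_traffic_mbps: float
    scalarised_q: float


def state_space_size(levels_per_dimension: Sequence[int]) -> int:
    """Upper bound on the number of distinct discretised states."""
    return math.prod(levels_per_dimension)


n_elements = len(fields(ProfilerState))
# E.g. 7 vCPU levels (0.6, 0.8, ..., 1.8) and hypothetical 5 memory and 4 LC levels.
n_allocations = state_space_size([7, 5, 4])
```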

Experimental setup
Fig. 3 depicts our profiling experimental setup, assuming Snort or vFW as the VNF instance. It shows the connection between the profiled VNF on the one hand, and the traffic generator and server end-point machines on the other. Each of the two end-point machines has two vCPU cores, 2 GB of memory and 10 GB of storage. For simplicity, we employ iPerf as a traffic generator with UDP packets, noting that active data collection is more suitable for RL than static datasets.
We employ the Prometheus and Node exporter monitoring tools to gather the following metrics: vCPU utilisation, memory utilisation, and ingress and egress traffic rates to and from the VNF, respectively. Additionally, we calculate the mean Round Trip Time (RTT) using the ping utility. The duration of the offline profiling was set to 48 hours for each VNF model. The software tools and frameworks used in this study are outlined in Tab. 2.

VNF type scenarios
We evaluate the performance of our proposed method using three different types of VNFs as our experimental scenarios. These VNFs cover a range of scenarios and demonstrate varying sensitivities to different resources. For example, the performance of the copying VNFs may be more impacted by memory utilisation, while the intercepting VNFs may be more impacted by vCPU utilisation.

Snort (Inline mode):
The Snort VNF operates as a traffic gateway between network segments and inspects all incoming packets before forwarding them to the destination.This mode slows down traffic transmission and may block suspicious packets.

Snort (Passive mode):
The Passive mode Snort VNF operates outside of the direct traffic path and copies incoming traffic to detect suspicious activity.This mode raises a different set of resource needs compared to the Inline mode, as shown by our evaluation results.

Virtual Firewall (vFW):
Allows packets to pass only through specified ports towards the destination server.

Resource and KPI targets configuration
We consider vCPU cores, memory and LC as our profiled resources. Tab. 3 shows the upper and lower bounds for their configuration values, chosen in accordance with our experimental environment and the specs of the considered VNF types. The iPerf traffic generator client transmits UDP packets with an initial traffic rate of 50 Mbps to the destination iPerf server. The traffic rate is gradually increased, and the assumed KPI thresholds are specified in Tab. 4.

Rewards configuration
We conducted an experiment-based parameter tuning of our reward functions to optimally adjust parameters, such as the steepness coefficient β, to the unique requirements and characteristics of the three different VNF types and to the impact of the three different resource types on profiling performance.This resulted in 9 individual reward parameterisations.
To achieve this, we analysed each resource type for each VNF type in isolation. This involved using a fixed resource allocation value for the other two resource types to speed up the process. The mean values of the other two resource types, which yield optimal allocation of the investigated resource type in a controlled environment (i.e., minimum resource usage for maximum OR), were used as the fixed values.
• vCPU ($R_{cpu}$; β = 7). Fig. 6(a) demonstrates an experiment where the model was trained to vary vCPU values from 0.6 to 1.8 cores while maintaining memory and LC at 1300 MB and 600 Mbps, respectively. Using $R_{cpu}$, the vCPU cores are reduced from 1.40 to approximately 0.87 at episode 1450. Among the tested values, β = 7 finds the minimum vCPU cores fastest.
• Memory ($R_{mem}$; β = 7). In Fig. 6(b), the system using the $R_{mem}$ reward function adjusts the allocation to decrease memory from 1430 MB to 1140 MB when the vCPU cores are 1.2 and LC is 600 Mbps. The best reward function in terms of minimum memory is $R_{mem}$ with β = 7, reached at episode 1600.
• Link Capacity ($R_{lc}$; β = 9). As shown in Fig. 6(c), we use $R_{lc}$ to find the minimum LC where the vCPU cores are 1.2 and memory is 1300 MB. With β = 9, LC is reduced significantly, from 686 Mbps to 482 Mbps.

Performance study
We conduct a comprehensive search in Sec. 5.1 to discover an "Oracle" model of optimal profiles in a simulation environment. These optimal solutions set the ultimate performance targets for our RL model. Moreover, a practical assessment of our approach requires a comparison against intelligent models. We therefore train SL models and compare their performance against online learning over a dynamic environment with growing dataset size, so as to draw conclusions on adaptability.

Oracle resource allocation (exhaustive search study)
The results presented in Fig. 7, 8 and 9 correspond to each VNF type, namely Snort in Inline mode, Snort in Passive mode and vFW, respectively. Each figure contains 5 graphs that portray performance after an exhaustive exploration of resource allocation combinations, with the aim of identifying an optimal trade-off that combines a minimum of resources with optimal performance in terms of OR. All performance measurements are based on the mean values of at least 30 recorded instances from a dataset attained during the offline profiling stage, alongside corresponding 95% confidence intervals. Graphs (a) and (b) in Figures 7, 8 and 9 show the mean OR against the number of allocated vCPU cores and LC, respectively, with each curve corresponding to a different allocated memory level. Their purpose is to pinpoint the minimum resource allocation on the x-axis for which the OR on the y-axis converges to a maximum mean value. Specifically, Graph (a) in each figure illustrates the impact of vCPU cores on OR with a fixed LC of 600 Mbps, while Graph (b) shows the impact of LC with a fixed allocation of 1.2 vCPU cores. Fixing these values serves to focus on the direct relationship between pairs of values. Note that the fixed values are carefully selected to accommodate Optimum ORs after preliminary test runs. Graphs (c) and (d) plot mean OR (orange curves) compared to consistently increasing LC levels (blue curves). The y-axes show bit-rates against increasing LC levels grouped by increasing vCPU cores or memory for (c) and (d), respectively. If these two curves coincide, then the LC is best-utilised, with the best resource combinations achieved at optimal (i.e., maximised) OR levels. As with (a) and (b), we keep memory fixed at 1300 MB for (c) and vCPU at 1.2 cores for (d). Last, Graph (e) in each figure shows on the x-axis increasing memory levels grouped by incrementally increasing vCPU cores (0.6, 0.8, ..., 1.8), given a fixed LC of 600 Mbps.
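Reading Graphs (a) and (b), i.e., pinpointing the minimum allocation at which the mean OR converges to its maximum, can be expressed as a simple rule. The sketch below is an assumption about how such a rule could be automated: the function name and the 2% tolerance are hypothetical, and the OR values are illustrative numbers only, not measurements from the study.

```python
from typing import Dict


def minimal_saturating_allocation(mean_or: Dict[float, float],
                                  tolerance: float = 0.02) -> float:
    """Return the smallest allocation whose mean OR is within `tolerance`
    (as a fraction) of the highest mean OR observed."""
    if not mean_or:
        raise ValueError("no measurements given")
    peak = max(mean_or.values())
    for allocation in sorted(mean_or):
        if mean_or[allocation] >= (1.0 - tolerance) * peak:
            return allocation
    raise AssertionError("unreachable: the peak itself satisfies the rule")


# Illustrative mean OR (Mbps) per vCPU allocation; NOT measured data.
illustrative_or = {0.6: 310.0, 0.8: 390.0, 1.0: 470.0, 1.2: 505.0,
                   1.4: 548.0, 1.6: 550.0, 1.8: 549.0}
knee = minimal_saturating_allocation(illustrative_or)
```

In practice the tolerance would be chosen with the confidence intervals in mind, so that differences inside the measurement noise do not push the selected allocation upwards.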

Snort (Inline mode)
The graphs of Fig. 7(a) and Fig. 7(b) show three curves corresponding to the 1300 MB, 1500 MB and 1600 MB memory levels. We observe that the OR grows with the number of vCPU cores in the range of 0.6-1.4 cores regardless of allocated memory in Fig. 7(a), excluding the case of 1.0-1.2 vCPU for the 1500 MB and 1600 MB memory curves due to outliers, as denoted by the confidence intervals. The OR also increases with LC in Graph 7(b) for all memory curves. Comparing the different memory curves shows that increased memory results in a higher OR. Based on the above, we can conclude that the total allocation of all resource types collectively affects the OR. Also, OR converges to a maximum of ~550 Mbps after vCPU = 1.4. Another important conclusion from Graph (b) (also backed by the conclusions below after Graph (d)) is that the OR for a memory of less than 1500 MB ceases to increase and is, thus, sub-optimal. At the same time, an increased memory allocation of 1600 MB does not increase OR further. Regarding Graphs (c) and (d) of Fig. 7, the optimal utilisation of LC can be achieved with a minimum of 1.4 vCPU cores, as OR for increasing LC in Graph (c) slowly converges and finally coincides with LC at a minimum (i.e., optimal) allocation of 1.4 vCPU cores. Note that this is consistent with the observation from Graph 7(a) (see above). The best LC utilisation can be achieved with a minimum of memory (1500 MB), as OR in Graph (d) for increasing LC coincides with LC for a minimum (i.e., optimal) memory level of 1500 MB. For completeness, we note that lower memory levels such as 1100 MB show a linear (but not coinciding) trend between the OR and LC curves, yet with large confidence intervals. Last, Graph (e) of Fig. 7 leads to the conclusion that OR (orange curve) generally increases with vCPU up to vCPU = 1.4, irrespective of some large confidence intervals, and then converges for vCPU ≥ 1.4. This is consistent with the observation from Graph 7(a), and with the conclusion from Graph 7(c).

Snort (Passive mode)
As with Snort in Inline mode, we report the conclusions for each graph of Fig. 8 in turn. Regarding Graph (a), increasing the number of vCPU cores yields a higher OR. However, the OR converges and remains at around 525 Mbps in the range of 1.0-1.8 vCPU cores. This holds for all memory level curves, whose results closely coincide. Graph (b), on the other hand, shows that the OR increases almost linearly with LC at 1.2 vCPU cores for all memory sizes. In addition, even though we added more memory at every vCPU allocation, as also shown in Graph (e), the OR at each vCPU allocation remained the same. We therefore conclude that memory does not impact OR. This is due to the different nature of this VNF type compared to Snort (Inline mode), with the latter needing memory resources to inspect packets before forwarding them. The OR in Graph (c) closely follows the LC in the range of 1.0-1.8 vCPU cores, while in Graph (d) the OR changes along with the LC across the memory range. We conclude that vCPU cores and LC affect OR, but memory does not.

virtual FireWall
The graphs of Fig. 9 for the case of vFW are similar to the ones for Snort (Passive mode), where the OR depends on the LC across the vCPU core and memory range. Nevertheless, for vCPU equal to 0.6 in Graphs (a) and (c), the OR also

Online learning profiling performance
We assess iOn-Profiler's Q-Learning adaptation of Algorithm 1 in a dynamic environment where the dataset size grows at run time. We compare the predicted resources to those obtained in a static environment with a static dataset for each training episode, as shown in Fig. 10 and Fig. 11. We calculate the percentage error in resource allocation relative to the optimal allocation, along with 95% confidence intervals, for each resource type. The reported results refer to a scenario of equal resource weights w(1/3, 1/3, 1/3), with the parameter β tuned for each resource in a static environment.
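The percentage-error metric used in this comparison can be illustrated with a short sketch. The exact formula is not given in this section, so the signed relative error below is an assumption; it is chosen because the later vFW discussion reads negative errors as under-provisioning, which implies that positive errors correspond to over-provisioning. The allocation values are hypothetical.

```python
def percentage_error(predicted, optimal):
    """Signed percentage error of a predicted allocation against the optimal one.

    Assumed convention: positive means over-provisioning,
    negative means under-provisioning.
    """
    if optimal == 0:
        raise ValueError("optimal allocation must be non-zero")
    return 100.0 * (predicted - optimal) / optimal

# Hypothetical allocations: (vCPU cores, memory in MB, LC in Mbps).
optimal = (1.4, 1500.0, 600.0)
predicted = (1.5, 1500.0, 570.0)
errors = [percentage_error(p, o) for p, o in zip(predicted, optimal)]
print([round(e, 2) for e in errors])
```

In this illustrative case the vCPU prediction over-provisions by about 7%, the memory prediction is exact, and the LC prediction under-provisions by 5%.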

Setup and training of SL benchmarks
For the MLP and RF benchmarks, we forecast the required resources periodically at landmark episodes (where RL performance is recorded and depicted in Figures 10-12) by training the models on the dataset collected up to that episode. For each such dataset, we split the data 90:10 into training and test sets and normalise it using min-max feature scaling. This approach allows us to conduct a fair comparison between RL and the benchmarks over the same training data.
The input variables for the SL predictions include vCPU and memory utilisation, latency, and Optimum OR, while the output variables include the number of vCPU cores, memory, and LC. The number of trees in the RF is set to 500, 500, and 800 for Snort (Inline mode), Snort (Passive mode) and vFW, respectively. The MLP parameters are described in Tab. 5.
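The data preparation for the SL benchmarks (a 90:10 split followed by min-max feature scaling) can be sketched in plain Python as below. The helper names are hypothetical, and fitting the scaling bounds on the training rows only is a design choice of this sketch rather than something the text specifies. The rows are synthetic placeholders for the four input features.

```python
import random

def train_test_split_90_10(rows, seed=0):
    """Shuffle and split rows into 90% training and 10% test subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.9 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def fit_min_max(rows):
    """Per-column (min, max) bounds, learned here from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]

def apply_min_max(rows, bounds):
    """Scale each column with previously fitted bounds; constant columns map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]

# Synthetic rows: (vCPU utilisation, memory utilisation, latency, Optimum OR).
rows = [[float(i), 2.0 * i, 100.0 - i, 5.0] for i in range(20)]
train, test = train_test_split_90_10(rows)
bounds = fit_min_max(train)
train_scaled = apply_min_max(train, bounds)
test_scaled = apply_min_max(test, bounds)
print(len(train), len(test))
```

Reusing the training bounds for the test rows keeps test information out of the fitted scaler, at the cost of test values occasionally falling slightly outside [0, 1].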

Snort (Inline mode)
Graphs (a), (b), and (c) of Fig. 10 show the resource allocation percentage error for Snort in Inline mode. According to (a) and (b), RL has a lower vCPU and memory percentage error than MLP and RF. As for Fig. 10(c), MLP and RF do not significantly reduce LC, whereas RL gives a lower percentage error. We can infer from the above that RL provides a lower resource prediction percentage error than MLP and RF. The underlying reason is that RL learns to reduce resource consumption from past events. In contrast, the resource allocation percentage errors of MLP and RF are high because they rely on a statically trained model, which leaves them unable to adapt in order to reduce resource consumption.

Snort (Passive mode)
According to Graph (a) of Fig. 11, RL produces a lower percentage error regarding vCPU cores than MLP and RF. In terms of memory, Graph (b) shows that RL yields no error, contrary to MLP and RF. In Graph (c), the LC percentage error is negative and quite similar for RL, MLP, and RF. Therefore, RL is more accurate for Snort in Passive mode, as it yields lower error percentages than MLP and RF, notably for vCPU and memory.

virtual FireWall
The analysis presented in Figure 12 focuses on the performance of the vFW VNF. In general, our online RL model exhibits notably superior capabilities for predicting resource allocation compared to the benchmarks across all resources. When examining Graphs (a), (b), and (c) after 150-175 episodes, the RL model demonstrates mean percentage errors of 9%, 2%, and 5% for vCPU cores, memory, and LC, respectively. In contrast, the MLP and RF models yield errors of 52% and 18% for vCPU cores, respectively, produce a 37% error for memory, and yield errors of -5% and -6%, respectively, for LC. Notably, all models achieve an error close to 0% for LC, which is a significant finding considering the substantial impact of LC on the performance of the vFW. However, the negative errors of the benchmarks indicate under-provisioning predictions compared to the required LC. The over-provisioning predictions made by the RL model are preferable to the under-provisioning exhibited by the benchmarks, as the latter results in sub-optimal OR performance of the vFW.

Resource optimisation scenarios
We examine the impact of resource objectives on performance, which results from the importance assigned to each resource type in the optimisation problem. This importance is captured via weighted parameters in the scalarised Q-Learning equation of formula (4). We investigate 39 scenarios (13 per VNF type) with different RL resource allocation objective weights, with our findings presented in Tables 6, 7, and 8 for each type of VNF.
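To make the role of the objective weights concrete, the sketch below keeps one Q-table per resource objective and combines them with a weighted sum when choosing an action. Formula (4) is not reproduced in this section, so linear scalarisation is an assumption here, as are the class name, the action labels, and the reward values. Only the learning rate α = 0.1 and discount factor γ = 0.99 are taken from the paper.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99  # learning rate and discount factor quoted in the paper

class ScalarisedQ:
    """One Q-table per objective, combined by a weighted sum (assumed form)."""

    def __init__(self, actions, weights):
        self.actions = list(actions)
        self.weights = list(weights)  # e.g. weights for (vCPU, memory, LC)
        self.q = [defaultdict(float) for _ in self.weights]

    def scalarised(self, state, action):
        """Weighted sum of the per-objective Q-values for (state, action)."""
        return sum(w * q[(state, action)] for w, q in zip(self.weights, self.q))

    def greedy_action(self, state):
        """Action maximising the scalarised Q-value (first one wins on ties)."""
        return max(self.actions, key=lambda a: self.scalarised(state, a))

    def update(self, state, action, rewards, next_state):
        """Q-learning update per objective, bootstrapping from the action
        that is greedy under the scalarised value in the next state."""
        best_next = self.greedy_action(next_state)
        for q, r in zip(self.q, rewards):
            target = r + GAMMA * q[(next_state, best_next)]
            q[(state, action)] += ALPHA * (target - q[(state, action)])

# Hypothetical actions and rewards, with equal weights w(1/3, 1/3, 1/3).
agent = ScalarisedQ(["scale_in", "hold", "scale_out"], (1 / 3, 1 / 3, 1 / 3))
agent.update("s0", "scale_in", rewards=(1.0, 0.5, 0.0), next_state="s1")
print(agent.greedy_action("s0"))
```

Changing the weight vector, for instance to w(1, 0, 0), changes which action is greedy without touching the per-objective tables, which is what makes it cheap to compare many weighting scenarios.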

Snort (Inline mode)
Our findings in Tab. 6 show that all weight configurations can reduce vCPU core usage to around 40%. Specifically, setting the weight of vCPU to one (w(1, 0, 0)) leads to a 40% reduction in vCPU usage, but also results in an 80% reduction in memory usage and a low link utilisation of 34.38% (as expressed by OR/LC). On the other hand, weight configurations such as w(1/2, 1/2, 0) and w(1/2, 0, 1/2) increase vCPU usage and reduce memory usage, with link utilisation rising to almost 80% for w(1/2, 0, 1/2). Our analysis reveals that the weight of LC has a higher impact on vCPU usage than the weight of the memory reward. Furthermore, the weight of vCPU and the weight of LC do not affect memory usage. The steady-state LC utilisation for w(0, 0, 1), w(0, 1/2, 1/2), and w(1/2, 0, 1/2) is 92.73%, 91.70%, and 79.54%, respectively. If the weight of vCPU is increased, the OR/LC decreases significantly, while the weight of memory has no significant effect on the OR/LC. Finally, compared to schemes with high resource weighting or equal weighting, w(1/2, 1/2, 0), which only weights vCPU and memory, does not increase link utilisation. As a result, considering the weight of LC is critical in enhancing LC utilisation.
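The link utilisation metric OR/LC used throughout this comparison is a simple ratio, sketched below together with a ranking of the four utilisation figures quoted in the text for Snort (Inline mode). The function name is hypothetical, and the 550/600 Mbps pair is an illustrative input rather than a reported measurement.

```python
def link_utilisation(output_rate_mbps, link_capacity_mbps):
    """Output rate as a percentage of the allocated link capacity (OR/LC)."""
    if link_capacity_mbps <= 0:
        raise ValueError("link capacity must be positive")
    return 100.0 * output_rate_mbps / link_capacity_mbps

# Illustrative input only: 550 Mbps of output over a 600 Mbps link.
print(round(link_utilisation(550.0, 600.0), 2))

# Steady-state OR/LC percentages quoted in the text for Snort (Inline mode).
reported = {
    "w(1, 0, 0)": 34.38,
    "w(0, 0, 1)": 92.73,
    "w(0, 1/2, 1/2)": 91.70,
    "w(1/2, 0, 1/2)": 79.54,
}
ranking = sorted(reported, key=reported.get, reverse=True)
print(ranking)
```

Sorting the quoted figures puts the LC-only configuration first and the vCPU-only configuration last, matching the observation that the LC weight drives link utilisation.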

Snort (Passive mode)
The vCPU-only case w(1, 0, 0) in Tab. 7 minimises the demand for vCPU cores, at 40%. vCPU usage can also be reduced to 44% and 42% in the cases of w(1/2, 1/2, 0) and w(1/2, 0, 1/2), respectively. The memory and LC weights have only a minor impact on the allocation of vCPU cores. The weight of vCPU has more impact on memory allocation than that of LC. The weight of vCPU also has a greater influence on LC utilisation than the memory weight. In conclusion, Snort in Passive mode with w(1/3, 1/3, 1/3) can effectively reduce resource usage while achieving a high link utilisation OR/LC.

Highlight conclusions & limitations
Link Capacity is more impactful on OR performance in Snort Passive mode than in Snort Inline mode and vFW, because traffic is forwarded directly to the destination without being inspected before forwarding. vFW gives the highest OR to LC ratio, at around 83.63%, compared to Snort Inline mode (58.50%) and Snort Passive mode (57.36%). Because vFW drops packets arriving at disallowed ports and forwards packets from allowed ports, packet delay does not occur in this VNF. Unlike vFW, Snort Passive duplicates packets, which introduces a latency stop before they are forwarded, and Snort Inline must inspect packets before sending them to the output link. This inspection delay causes congestion in the output link. When considering the effect of the weights of each resource's reward function on reducing the corresponding resource while maintaining the OR, we find that the behaviour of each VNF is different.
Finally, we acknowledge the following experimental limitations. First, the performance of vFW and Snort can fluctuate under a constant resource allocation, depending on the number of configuration rules loaded into the system. Our method assumes that all configurable aspects of VNF behaviour, aside from resource allocation, remain relatively stable throughout the VNF's lifecycle. Future research should assess the performance of RL in scenarios involving dynamic configuration changes. Second, iPerf has limited traffic generation capabilities, e.g., it struggles to reach rates ≥1 Gbps, and its packets are not completely realistic. While a well-configured iPerf suffices for demonstrating the proposed method and showcasing an experimental proof of concept, it cannot fully capture a production network deployment with real traffic.

Conclusion
We introduce iOn-Profiler as an intelligent online learning VNF profiler using ML and in particular RL, incorporating Q-Learning across a range of optimisation objectives. This autonomous, RL model-based profiler adjusts to network dynamics, and our study results demonstrate its effectiveness in improving the efficiency of profiling for two modes of the Snort VNF (Inline mode and Passive mode) and for vFW, a virtual firewall VNF. We investigate 39 scenarios (13 per VNF type) with different RL resource allocation objective weights to understand the impact of different resource types on the quality of our profiling model's resource allocation decisions. Our comprehensive evaluation results highlight the importance of considering multiple resource optimisation objectives and examining each VNF type individually with online learning, rather than with a statically trained SL model that cannot adapt to dynamics such as changing demand patterns and cannot easily be used for transfer learning purposes.

$\tau_k^{\min}$ and $\tau_k^{\max}$ and redefine the problem constraints as: $m_k \geq \tau_k^{\min}, \forall k$ and $m_k \leq \tau_k^{\max}, \forall k$.

else
    Action a ← Call Algorithm 2 for state s;
end
Take action a and observe the next state s′;
Calculate reward ($R_o$) of each resource in s′ through equation (2).
end
Ask the NFVO to scale in or out the VNF based on s′;
Find the OR and record the corresponding state;
Insert ∥s − s′∥ in ∆s; s ← s′;
if n > $N_\varepsilon$ and max ∆s < ε then
    The algorithm has converged;
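The convergence test at the end of the loop above (a circular buffer ∆s of recent state changes, with convergence declared once more than N_ε steps have elapsed and every buffered change is below ε) can be sketched with a bounded deque. The class name is hypothetical, the Euclidean norm is an assumption since the norm is not specified here, and the trajectory of (vCPU, memory) states is illustrative.

```python
import math
from collections import deque

class ConvergenceMonitor:
    """Tracks recent state changes in a fixed-size circular buffer."""

    def __init__(self, window, threshold):
        self.deltas = deque(maxlen=window)  # circular buffer ∆s with N_ε slots
        self.threshold = threshold          # convergence threshold ε
        self.steps = 0                      # step counter n

    def record(self, state, next_state):
        """Store ||s - s'|| (Euclidean, assumed) and report whether
        n > N_ε and max ∆s < ε both hold."""
        self.deltas.append(math.dist(state, next_state))
        self.steps += 1
        return self.steps > self.deltas.maxlen and max(self.deltas) < self.threshold

# Hypothetical trajectory of (vCPU cores, memory in MB) states.
monitor = ConvergenceMonitor(window=3, threshold=0.05)
trajectory = [(1.8, 1600.0), (1.6, 1500.0), (1.4, 1500.0),
              (1.4, 1500.0), (1.4, 1500.0), (1.4, 1500.0)]
flags = [monitor.record(s, s_next) for s, s_next in zip(trajectory, trajectory[1:])]
print(flags)
```

Because the deque drops its oldest entry automatically, a single large early change stops blocking convergence once it has aged out of the window. In practice the state components would likely need scaling to comparable ranges before taking a norm.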

Figure 2: Translation of the sigmoid function by 0.5 units to yield rewards only for positive resource allocation over the x-axis.
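One way to read the translation in Figure 2 is as a logistic curve shifted down by 0.5, so that it passes through the origin and is positive only for positive x. The sketch below illustrates that reading under stated assumptions: that the plotted function is a logistic sigmoid, that the translation is vertical, and that β acts as the steepness coefficient. Equation (2) is not reproduced in this section, so this should not be taken as the paper's exact reward function.

```python
import math

def shifted_sigmoid(x, beta):
    """Logistic curve with steepness beta, shifted down by 0.5 (assumed form).

    Uses the identity 1 / (1 + exp(-z)) - 0.5 == 0.5 * tanh(z / 2),
    which avoids overflow for large |z|.
    """
    return 0.5 * math.tanh(0.5 * beta * x)

# Illustrative steepness value; beta is tuned per resource in the paper.
for x in (-1.0, 0.0, 0.5, 1.0):
    print(x, round(shifted_sigmoid(x, beta=5.0), 4))
```

Under this form the output is bounded in (-0.5, 0.5), is exactly zero at x = 0, and saturates faster as β grows.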

Figure 10: Percentage error of resource allocation predictions by MLP, RF and RL regarding Snort (Inline mode).

Figure 11: Percentage error of resource allocation predictions by MLP, RF and RL regarding Snort (Passive mode).

Figure 12: Percentage error of resource allocation predictions by MLP, RF and RL regarding the vFW VNF.

Table 1: State of the art summary in intelligent VNF profiling. Compared to others, iOn-Profiler "fills in" all columns corresponding to research gaps.

Algorithm 1: Multi-objective Q-Learning adaptation.
Input: learning rate α = 0.1, discount factor γ = 0.99, the best steepness coefficient value (β) for each reward function, maximum number of steps (N), convergence check threshold (ε), number of steps for convergence check ($N_\varepsilon$).
for each objective o do
    Initialise $Q_o(s, a)$ as an empty Q-table.
end
for each episode t do
    Initialise present state s vector;
    Initialise circular buffer ∆s with $N_\varepsilon$ slots;

2.1), let λ = |I| be the number of resource types considered and κ = |K| the count of the different KPIs targeted. Let Ψ be the set of VNF input categories, with ζ = |Ψ|. Also, let each i be assigned values in $\{\iota_1, \iota_2, \ldots, \iota_\rho\}$. Note that the granularity of the latter counts ρ feasible resource level allocation options for each i. For coherence, we classify measurements for each k into the immediately preceding class within set $T_k$, thus counting ω measurement options (recall $T_k$ from Sec. 2.1). Finally, let set $U_\psi = \{\upsilon_1, \upsilon_2, \ldots, \upsilon_\eta\}$ define a partition of possible VNF input levels per type ψ ∈ Ψ, thus counting η levels.
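The counts defined above suggest how quickly a quantised state space can grow. As a rough sketch, assume that a state is one combination of a resource level per resource type, a measurement class per KPI, and an input level per input category, which gives ρ^λ · ω^κ · η^ζ states. This product form is an assumption of the example, since the paper's own complexity expression is not included in this fragment, and every numeric value below other than λ = 3 is hypothetical.

```python
def state_space_size(lam, rho, kappa, omega, zeta, eta):
    """Number of distinct states under the assumed Cartesian-product model:
    rho levels for each of lam resource types, omega classes for each of
    kappa KPIs, and eta levels for each of zeta VNF input categories."""
    return (rho ** lam) * (omega ** kappa) * (eta ** zeta)

# lam = 3 resource types (vCPU, memory, LC); all other values are hypothetical.
size = state_space_size(lam=3, rho=7, kappa=2, omega=4, zeta=1, eta=5)
print(size)
```

Even with these modest illustrative values the count reaches tens of thousands of states, which shows why the granularity ρ matters for a tabular method.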

Table 2: Software frameworks and tools.
quantisation of state space and implementation necessities after the complexity analysis in Sec. 3.3. For the scalarised ϵ-greedy algorithm, we adopt a decay factor ϵ = 0.9999, with minimum exploration rate 0.1, learning rate α = 0.1, and discount factor γ = 0.99. Training is organised in episodes encompassing action steps until either a maximum number of steps is reached or the minimum resources are found. Given this setup, a total of 2000 episodes was assumed, encompassing a mean number of 738 steps per episode.
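The exploration schedule implied by these parameters can be sketched as a multiplicative decay with a floor. The decay factor of 0.9999 and the minimum exploration rate of 0.1 come from the text. The starting value of 1.0 and the choice to apply the decay once per action step (rather than once per episode) are assumptions of this sketch.

```python
import math

def epsilon_at(step, decay=0.9999, start=1.0, floor=0.1):
    """Exploration rate after `step` multiplicative decays, clamped at `floor`."""
    return max(floor, start * decay ** step)

def steps_until_floor(decay=0.9999, start=1.0, floor=0.1):
    """Smallest number of decays after which the rate reaches the floor."""
    return math.ceil(math.log(floor / start) / math.log(decay))

print(epsilon_at(0), epsilon_at(10_000), steps_until_floor())
```

Under these assumptions the floor is reached after roughly 23,000 decays, which is a small fraction of the training budget if the decay is applied per step and a budget of 2000 episodes if applied per episode would never reach it. The two readings therefore lead to quite different exploration behaviour.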

Table 4: KPI targets that the VNF under profiling should meet.

Table 5: Parameters of the MLP Model

Table 8: Mean steady-state output for vFW in 13 resource-scalarisation scenarios.

Our future research plans include expanding our model for service function chains in various VNF configurations and for different VNF types across different network resource substrates. We also aim to further explore network adaptability and the benefits of iOn-Profiler with transfer learning between different resources and/or across different VNF types.