Introduction
Fifth Generation (5G) networks are advancing to support futuristic, non-conventional services and use cases in mobile networks. Network slicing is a major contributor to this advancement. The ability of network slicing technology to create multiple logical networks on top of a shared common physical infrastructure makes per-service configuration and accommodation on the network possible. For these logical networks to support an increasing range of new use cases, requirements such as slice isolation, security, bandwidth, and latency must be met. Network slicing technology is designed to allow the allocation of logical 5G network resources to heterogeneous service types over the network in a flexible and dynamic way [1]. However, the complexity of meeting these requirements when dealing with heterogeneous network services cannot be ignored. The complexity is exacerbated when the logical network slices deployed on top of the shared common physical infrastructure are implemented end-to-end across the Radio Access Network (RAN), Transport Network (TN), and Core Network (CN) domains of the 5G network, as seen in Fig. 1. Furthermore, to ensure that service differentiation and the Service Level Agreements (SLAs) of each application type in the network are met, orchestration and management of the network slice must be considered throughout the entire network slice life cycle, from slice preparation and design to slice decommissioning and termination [2]. This is a complex task, especially when managing network slices that support a myriad of heterogeneous services and application types.
Software-defined networking (SDN) and network function virtualization (NFV) are tipped to be the key enablers of network slicing technology. In addition, these technologies have been utilized in end-to-end 5G network implementations when deploying RAN and CN functionalities. An elaborate 5G network architecture of the CN components interfacing with the RAN and Data Network (DN) is shown in Fig. 2. Concretely, achieving true 5G end-to-end network slices is driven by the possibility of deploying the 5G RAN, TN, and CN functions as software components decoupled from the hardware, and by the separation of the control plane and user plane functions [1], as shown in Fig. 2. This trend of SDN- and NFV-enabled technology in the 5G RAN and CN extends to the TN, where existing data forwarding devices adhere to the SDN and NFV philosophies. In addition, flexibility and dynamicity in the management and orchestration of network slice resources can be supported by the different domain-level controllers implemented in the RAN, TN, and CN to handle the operation and management of end-to-end 5G networks [3].
To achieve the vision of network slicing, automating the operation and management of the network slice life cycle will be key. That said, understanding traffic patterns for decision-making from the different network applications and use cases is complex. Therefore, relying on human network administrators or conventional mathematical models to understand the correlation and patterns in this 5G network traffic during the orchestration and management of a network slice life cycle is a daunting task [2]. In contrast, the recent advancement of machine-learning (ML)-driven solutions to understand complex relations in datasets makes it possible to use machine-learning techniques to automate network slice orchestration and management [4].
Real-world deployment of network slices with embedded intelligent orchestration and management is still in its infancy. Moreover, the actual deployment of end-to-end network slices across the network domains shown in Fig. 1 has been studied and discussed in several high-level theoretical frameworks [3], [5], [6], [7], [8], [9], [10], [11], [12]. At the same time, large-scale testbeds have implemented network slicing, at times in a disaggregated manner and in silos across the network domains. Furthermore, because of the importance of the design and planning stage of a network slice, studies have attempted to design network slices using network slice templates.
Given the issues raised above, the literature on network slicing is fragmented, especially regarding the implementation of real-world end-to-end network slicing on physical infrastructure coupled with intelligent management and orchestration that supports the initial design of a network slice from a network slice template.
Therefore, in this systematic review, our aim is to consolidate the literature that discusses end-to-end implementations of network slices that are reproducible and easy to replicate. This review focuses on literature that presents testbeds implementing end-to-end 5G network slices using open-source software solutions. We also focus on literature that clearly explains how network slicing is achieved in each network domain. We deliberately include literature that explains the low-level implementation of network slices with resource management and isolation in the different network slice domains. Furthermore, we include literature that embeds intelligence in these end-to-end network slices. This means that the testbeds considered for inclusion must clearly explain not only the low-level implementation details of network slicing on the testbed but also the strategies used to embed intelligence. We complement this systematic review by providing an in-depth understanding of how 5G networking data collection can be achieved for ML workflows on the 5G network. We review studies that explain in detail how data collection was performed, which open-source data collection tools were used, and what dataset was collected. We further explain in detail how these datasets were parsed for preprocessing and which dimensionality reduction techniques were applied to these 5G networking data. We also explain how understanding these 5G networking data using unsupervised machine-learning techniques can aid in the design and planning of a network slice using network slice templates. Finally, this study presents literature that provides a clear understanding of what a network slice template is in terms of the attributes and values expected to define a network slice template in 5G.
In summary, the main contributions of this systematic review are as follows:
It serves as a one-stop resource for researchers to explore and understand the state-of-the-art advancements in the design, implementation, and management of end-to-end 5G network slices based on open-source software on Commercial-Off-The-Shelf (COTS) devices.
It provides insights into how networking data can be collected from 5G networks, parsed, and preprocessed for downstream ML tasks.
It underscores the role of network slice templates in the design, orchestration and implementation of network slices.
It highlights the connection between unsupervised machine-learning techniques and their applications in the design of network slice templates.
To the best of our knowledge, our work is the first of its kind to address the lack of consolidated literature on the different topics raised above. Even if related reviews exist, to our knowledge ours is the first systematic literature review that jointly addresses the thematic areas of end-to-end intelligent network slicing; data collection, parsing, and preprocessing techniques; network slice template design; and the potential of unsupervised machine-learning techniques to understand 5G networking data in order to define and design network slice templates for deployment. The organization of this paper is shown in Fig. 3.
Related Works
Esmaeily and Kralevska [1] present a systematic review of small-scale network slice testbeds that can be implemented in university settings for research. This detailed survey presents state-of-the-art testbeds that have implemented network slicing for different use cases. They present software packages that fit the European Telecommunications Standards Institute (ETSI) Management and Orchestration (MANO) NFV framework that can be used for 5G network slice functions across the network domain. They analyze and present design criteria that a 5G network slice should meet. These design criteria can be primary or secondary design criteria. Primary design criteria for network slices include support of the main enabling technologies, MANO equipped with dynamic monitoring capability, multi-network domain with partial slicing support, and multi-tenancy support. Secondary design criteria include multi-radio access technologies support, end-to-end network slicing, cross-location support, machine-learning-enabled network slicing, and open-source-driven implementation. Esmaeily and Kralevska [1] evaluate the testbed according to these two design criteria and summarize with open research challenges.
This initial work on network slicing testbeds is useful in our study because it sets the scene on the status of testbed deployments. However, looking at the design criteria considered, we see that open-source support for network slicing experimentation and end-to-end network slicing implementation are not primary design considerations. This is a focus of our work and is included as one of the considerations in the inclusion criteria. Furthermore, because it was published in 2021, the work covers only a few testbeds that show the integration of ML into network slicing. We extend these findings by discussing more work within the scope of ML integration into network slicing. Although the study uses the systematic review approach in its methodology, it differs from the systematic review in this paper because the major thematic areas in this work, such as the datasets used for downstream ML workflows, the data collection tools, the implementation of end-to-end network slicing using open-source tools, and the discussion of network slice template design, are not covered.
Research work has been conducted to review machine learning or intelligence in network slicing. However, we note that most of this work provides only high-level implementation details, as discussed in [8], [9], [12], and [13]. Other similar studies focus on network slicing Life Cycle Management (LCM) tasks that can be augmented using machine learning when the network slice has already been orchestrated [7], [10], [14]. Studies that have proposed machine-learning models to handle tasks such as traffic prediction, slice admission control, radio resource sharing, network slice elasticity, user admission control, and VNF placement, among other tasks in the network slice life cycle, are discussed in the survey by Donatti et al. [2]. The work of Donatti et al. [2] surveys ML techniques that can be implemented in the preparation, commissioning, and operational phases of network slicing. The work summarizes numerous studies and provides an overview of intelligent network slicing using machine learning in the LCM of network slices. However, we note that most of the studies included by the authors of [2] are validated and evaluated in simulated and emulated environments.
As much as the work by Donatti et al. [2] provides a comprehensive survey of ML techniques for intelligent network slice LCM, we find some areas of improvement. Most of the studies highlighted were based on simulation and emulation results for evaluation. At the time of writing, significant work from other researchers on real-world implementations on real hardware devices has been done and will be presented in this systematic review. We understand that this research area will benefit from understanding implementations, techniques, and evaluation strategies of intelligent network slices on real hardware devices.
Furthermore, for the ML-related work by Donatti et al. [2], no information is given on the datasets used. In addition, for work that had real-world implementation on real hardware devices, lower-level implementation details inclined toward open-source deployments that researchers can replicate easily for end-to-end network slicing are not provided. However, we still find the survey by Donatti et al. [2] very useful as it gives a broad overview of the state of the art of network slicing and ML for intelligent network slice LCM with the algorithms used, implementation scope, and network slice problems solved.
Another difference observed between the survey by Donatti et al. [2] and this systematic review is that they used the scoping review methodology to conduct the survey, and not a systematic review as is the case with the review methodology of this article. In the case of this study, with the systematic review, we provide a rigorous, in-depth, and narrow-focused methodology to understand the state of the art of ML in the slicing of 5G networks, guided by very specific research questions and synthesis of high-quality included evidence from the literature.
Phyu et al. [4] present a similar work, closely related to this study. They begin by giving an exhaustive background on network slicing and machine learning. They highlight different application areas of machine learning in network slicing, giving key lessons learned from each. Traffic forecasting is one of the application areas covered in the survey. It is complemented by a list of ML techniques and learning approaches that can be used for traffic forecasting in network slicing. These range from supervised and unsupervised learning to reinforcement learning and deep reinforcement learning techniques. Another area of focus is resource forecasting. The survey highlights the ML techniques, the evaluation methods, the datasets used, and the environment in which the data was collected to aid in embedding intelligence in the forecasting of resources in network slicing. Other application areas of machine learning in network slicing that were studied include minimizing overprovisioning and SLA violations, minimizing monetary costs for operators, and maximizing throughput and link utilization, among others.
The study also analyzed the ML techniques used for three categories of admission control: (i) slice admission control, (ii) end-user admission control, and (iii) two-level admission control that considers both slice admission and end-user admission control. For each category, the focus area, the ML technique and model used, the input, the decision made, and the reward or objective were listed [4].
Other application areas of machine-learning in network slicing covered by Phyu et al. [4] aimed at supporting business, technological, and network performance needs. The objectives and rewards of this focus area ranged from maximizing the price per unit paid by slices, total network costs, and Key Performance Indicator (KPI) satisfaction, among others.
The slice resource allocation problem is also covered in this comprehensive survey by Phyu et al. [4], where the focus was on resource allocation techniques using ML algorithms in the RAN, TN, CN and joint allocation schemes in the RAN and CN. In this application area of ML in network slicing, the focus area, the rewards, and objectives of the different solutions are also discussed. Intuition into how end-to-end network slicing resource allocation can be realized is also provided in the survey by Phyu et al. [4].
We find this comprehensive scoping survey by Phyu et al. [4] important in understanding the broad-based application of ML in network slicing tasks. However, the survey differs from this systematic review in its methodology and in that it does not provide low-level implementation details of network slicing, especially end-to-end implementations on real physical testbeds reported in the literature. Moreover, the list of datasets is biased toward RAN-related datasets, and most of the datasets mentioned are not described in detail. More importantly, we build on one of the future research directions listed in the survey by Phyu et al. [4], which identifies incorporating high-level ML theory and algorithms into practical network slicing deployments as an open research challenge.
A summary of how the different related works differ from this systematic review is shown in Table 1.
Systematic Review Aims, Objectives, Research Question and Scope
This systematic review aims to provide a state-of-the-art overview of current literature and studies that explain in detail the implementation of end-to-end network slicing in 5G and the possibilities of using ML techniques in managing network slicing tasks. To achieve this, this systematic review provides a comprehensive analysis highlighting open-source tools used in the low-level implementation of end-to-end network slicing, data collection methods and tools used for collecting data, and a description of the datasets that can be used for ML tasks in network slicing. This systematic review also attempts to create a nexus between understanding patterns in 5G networking data and the design of network slice templates while providing details of how a 5G network slice template can be defined. The research questions to be answered, the objectives they set out to achieve, and the scope of this study are elaborated in Table 2.
Systematic Review Methodology
A. Search Strategy
The first step of this systematic review is to define keywords, create research questions, and identify suitable databases in the domain of the research questions raised in this systematic review. Since the questions were interdisciplinary and we expected the literature to cut across computer science and electrical engineering, we settled on four popular databases that index such scholarly work, i.e., Scopus, Web of Science, IEEE Xplore, and Association for Computing Machinery (ACM).
The search strings used in this search strategy varied for each database, guided by how each database structures its search queries. Given the objectives set out for this systematic review, particular attention was paid to defining the search strings so that the search procedure returned the most relevant literature. We demonstrate this with the search performed in the Scopus database. Scopus Query 2 includes the search term 'industry vertical' or 'vertical' and concatenates this list of strings with 'template*' OR 'custom template' OR 'blueprint', which returned narrowed search results. However, to obtain articles that were not returned by this search but met the objectives of this systematic review, we used an additional search query on Scopus (Scopus Query 1) that did not include the search string list and concatenation explained above. Table 3 shows the search queries run in Scopus and highlights the additional search terms used in Scopus Query 2 and in the other databases used in this work.
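As an illustration of how such Boolean search strings can be assembled, the following minimal Python sketch joins alternative terms with OR and joins the resulting groups with AND. The helper names and the base term group are hypothetical; only the vertical and template terms are taken from the description above, and the queries actually used are those listed in Table 3.

```python
def or_group(terms):
    """Join alternative terms with OR, quoting each term."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(groups):
    """AND together several OR-groups into one Boolean search string."""
    return " AND ".join(or_group(g) for g in groups)


# Hypothetical base group; the real search strings are given in Table 3.
base_terms = ["network slicing", "network slice"]
# Terms named in the text for the narrowed query.
vertical_terms = ["industry vertical", "vertical"]
template_terms = ["template*", "custom template", "blueprint"]

# Broad query: no vertical/template restriction.
broad_query = build_query([base_terms])
# Narrowed query: base terms AND vertical terms AND template terms.
narrow_query = build_query([base_terms, vertical_terms, template_terms])

print(broad_query)
print(narrow_query)
```

Keeping the term groups as separate lists makes it straightforward to adapt the same terms to the query syntax of each database.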
B. Inclusion Criteria
To decide which articles were to be included in this systematic review after the full text assessment was performed, we developed the following inclusion criteria.
Articles that present an end-to-end implementation of a 5G network and 5G network slicing on real hardware devices with verifiable testbed results.
Publications that clearly explain the low-level implementation details of a 5G network slice from the RAN to the CN with verifiable testbed test results using open-source software.
Studies that clearly explain the data collection procedure on a 5G network, including the tools used, and the applications involved in data generation, and describe the dataset that was collected.
Studies that give an in-depth description of how 5G networking data was handled from collection, parsing, and preprocessing with relevant application techniques for feature engineering for ML workflows.
Studies that demonstrate the possibilities of embedding intelligence using machine-learning techniques in 5G network slice management.
Studies that provide insights on how a network slice template with attributes can be designed or translated from a slice tenant request to a template mapped to network resources.
Publications that implement unsupervised learning, especially clustering techniques, of 5G networking data with concrete evaluation and interpretable results.
C. Exclusion Criteria
To ensure that only articles directly relevant to our research question were included, we established the following exclusion criteria to filter out studies that, while related to our topic, did not significantly contribute to answering our specific research questions raised in this systematic review.
Publications not in English.
Publications that were purely based on a simulated environment to implement an end-to-end 5G network and network slicing.
Publications that claim to implement network slicing on real hardware devices but only cover high-level proofs of concept and frameworks, without explaining low-level implementation details of network slicing across the domains or presenting testbed results.
Publications that do not clearly outline how the data is collected, the tools used, and a description of the dataset.
Publications that only collected networking data biased towards control plane data, without user plane data or data on the higher-layer applications generating the traffic.
Publications that implement classification of networking data that is neither generated in a 5G network context nor intended for use in a 5G networking and network slicing context.
Publications that collected 5G network monitoring data that has no relation to 5G network resource management or 5G network slicing.
Fig. 4 shows the article selection process used to arrive at the final articles included for full-text review in this systematic review. A total of 3,490 articles were initially identified. After removing 276 duplicates and putting the remaining 3,214 through title and abstract screening, 168 articles were included for full-text screening. A total of 22 articles were finally included in this systematic review.
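The counts in this selection process can be checked with simple arithmetic. The short Python sketch below reproduces the stated figures and derives the number of records excluded at each stage; the derived exclusion counts are not stated in the text and assume that no records were removed other than at the steps described.

```python
# Figures stated in the article selection process.
identified = 3490
duplicates = 276
full_text_screened = 168
included = 22

# Records entering title and abstract screening.
screened = identified - duplicates

# Derived counts (assumption: no other removal steps took place).
excluded_at_title_abstract = screened - full_text_screened
excluded_at_full_text = full_text_screened - included

print(screened)                    # records screened by title and abstract
print(excluded_at_title_abstract)  # records excluded at title/abstract stage
print(excluded_at_full_text)       # records excluded at full-text stage
```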
D. Data Extraction Form and Format
In this systematic review, a data extraction form was used to capture the crucial aspects of the studies considered for inclusion. In addition to basic information such as the title of the study, the full reference, and the year the study was published, we summarized the problem(s) identified by each study. After reading the full text, a decision was made whether to include the study in this systematic review based on the inclusion and exclusion criteria explained above. Where a study was excluded, we recorded the reasons for exclusion. Information about the interventions or solutions presented by the authors was extracted from studies that met the inclusion criteria. We also summarized the outcomes of the proposed solution and recorded key findings and results from these studies. A summary of the authors' conclusions and our comments on the study conclude the data extraction form. The comment section reiterates the strength and usefulness of the study in this systematic review, including key weaknesses we identified despite including the publication. The comments also highlight potential areas of improvement in some of the studies that met the inclusion criteria. A sample of a data extraction form is shown in Table 4.
Results
In this section, we categorize the included research articles that attempted to answer the research questions asked by this systematic review. We use the Population Intervention Comparator Outcome (PICO) framework described by the authors of [15] to present the results for each of the questions answered. However, in this study, we make alterations to this framework by considering the following:
P - The problem the research question was addressing.
I - The intervention, which means the solutions the research explored.
C - The comparator or comparable solutions that were explored, either stated or implied.
O - The outcome of the study, which included the results of the study.
In addition, we further elaborate the framework and give information on the results that this systematic review was able to gather.
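To make the adapted framework concrete, one possible way to hold a single row of a PICO results table is sketched below in Python. The class name, field names, and placeholder values are our own illustrative assumptions and do not correspond to any specific study or table in this review.

```python
from dataclasses import dataclass, asdict


@dataclass
class PicoRecord:
    """One reviewed study summarized with the adapted PICO framework."""
    reference: str      # citation key of the study (placeholder below)
    problem: str        # P: the problem the research was addressing
    intervention: str   # I: the solution the research explored
    comparator: str     # C: comparable solutions, stated or implied
    outcome: str        # O: the results of the study


# Placeholder values only; no real study is described here.
record = PicoRecord(
    reference="[n]",
    problem="<problem statement>",
    intervention="<proposed solution>",
    comparator="<stated or implied baseline>",
    outcome="<reported results>",
)

# Convert to a plain dict, e.g. to write one row of a results table.
row = asdict(record)
print(row)
```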
A. Question 1
How can network application-level data and network flow data be collected from end-to-end 5G networks for downstream ML workloads?
There is a constant need for realistic and available 5G networking datasets for intelligent decision-making and integrating machine-learning workloads in 5G networks. In this section, we present different approaches taken by researchers in collecting realistic 5G datasets for downstream ML workflows. This includes tools used for data collection and considerations made in this data collection procedure in the context of 5G networks. We present the results using the PICO framework in Table 5.
Additionally, Tables 6, 7 and 8 summarize the data collected, data collection methodology, and domain where data was collected by the different studies discussed. Specifically, Tables 6 and 7 provide a list of features from the varying datasets. Tables 6 and 7 also show the domain where data was collected and the application in the higher layers of the 5G networking stack involved in data generation. Table 8 shows the data collection tools used by the studies we have presented in Table 5 and provides insights into the network deployment options and whether network slicing was considered when network deployment and data collection were done.
B. Question 2
How can network data and network flow data from 5G networks be processed for downstream ML workloads/tasks?
More data will become available to aid in decision making and embedding intelligence in 5G networks. Specifically, this data will be consumed by machine-learning and artificial intelligence algorithms to provide useful inferences for intelligent decision making. In this section, we review and present recent approaches taken in processing and preprocessing 5G networking and network flow datasets for downstream machine-learning tasks. We present the results of the PICO framework in Table 9.
We also present Table 10 that shows a summary of the feature engineering methodologies used by the included studies, the downstream ML tasks that the feature engineering methodology supported and if the study was conducted in the context of network slicing, giving details of the level of implementation of network slicing from the study’s findings.
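As a simple example of a preprocessing step of the kind summarized here, the following Python sketch standardizes one numeric feature column to zero mean and unit variance (z-score scaling). This is a generic technique chosen for illustration, not necessarily the method used by any of the included studies, and the sample values are synthetic.

```python
import statistics


def standardize(column):
    """Scale a numeric feature column to zero mean and unit variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population standard deviation
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


# Synthetic values standing in for one feature, e.g. a per-flow metric.
feature = [10.0, 12.0, 9.0, 15.0, 14.0]
scaled = standardize(feature)
print(scaled)
```

Scaling of this kind is commonly applied before distance-based methods such as clustering, so that features with large numeric ranges do not dominate the result.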
C. Question 3
How has ML-aided intelligence been applied in the end-to-end life cycle management of 5G networks using open-source software on real hardware devices?
Despite the widespread deployment of 5G systems, the realization of true end-to-end network slices is yet to mature. However, high-level frameworks have been developed to describe how end-to-end network slices can be deployed in 5G networks. In this section, this systematic review presents practical approaches taken towards the deployment of 5G end-to-end network slices. We narrowed our results to literature that provides frameworks with low-level implementation details and literature that presents 5G network slices implemented on real testbeds using open-source software and commercial off-the-shelf equipment.
The PICO framework results are presented in Table 11. In Table 12 we present a summary of end-to-end network slicing implementation across the different 5G network domains. This includes the software and the physical hardware used across the different network domains. We also present a summary of the types of network slices deployed and give a list of the intelligence modules deployed in these network slices.
D. Question 4
What are the standard methods/approaches for creating or designing a 5G network slice template?
In a slice's life cycle, the design and definition of the network slice plays a critical role in meeting the SLAs of a particular service profile. In this section, we present research that contributes to the design of network slices using network slice templates. We present the attributes that we expect in a network slice template and how these differ across network slice deployment scenarios.
First, the PICO framework results for Question 4 are presented in Table 13. Second, we show the attributes that describe a network slice template adopted from [34] in Table 14. Third, we present network slice template attributes that cut across the RAN, TN and CN domain adopted from [35] in Table 15.
E. Question 5
What unsupervised machine-learning techniques have been used to design custom end-to-end network slices in 5G?
Machine-learning techniques are expected to support the management of network slices and the life cycle of a network slice. In this section, we focus on the role that machine-learning techniques play in recognizing patterns in networking data from 5G networks. We believe that this ability to understand patterns in network traffic will aid in the commissioning of network slices and network slice resource management, before and after deployment of a network slice. In this regard, we focus on current research that has made attempts towards understanding patterns in network traffic using unsupervised machine-learning techniques. We narrow down on unsupervised learning techniques because of the unpredictable nature of traffic in 5G and beyond 5G networks brought about by the different heterogeneous and customized applications that will be supported by 5G. Consequently, understanding these traffic patterns can aid in resource management in 5G networks, especially in the age of network slicing in 5G.
The PICO results in Table 16 present relevant research that this systematic review found useful in understanding the state of the art of unsupervised ML techniques in mobile networking data. In Table 17 we provide an elaborate summary of unsupervised clustering techniques in mobile networking data. Additionally, in Table 17 we present data preprocessing techniques used before clustering is performed, the number of clusters produced, and the cluster evaluation method used. Finally, we provide insights into whether the clustering results were interpreted to network slices or not.
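The clustering workflow summarized in these tables (cluster the preprocessed data, then evaluate the clusters) can be sketched in a few lines. The Python example below implements plain k-means and the silhouette coefficient using only the standard library. It is a minimal sketch on synthetic two-dimensional points; k-means and the silhouette coefficient are used here as common representatives of clustering and cluster evaluation, and are not necessarily the techniques used by the reviewed studies.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: returns (labels, centroids) for a list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep empty cluster as is
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (range -1 to 1)."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        def mean_dist(c):
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            return sum(d) / len(d) if d else 0.0
        same = sum(1 for j, lab in enumerate(labels)
                   if lab == labels[i] and j != i)
        if same == 0 or len(clusters) < 2:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = mean_dist(labels[i])                                # cohesion
        b = min(mean_dist(c) for c in clusters if c != labels[i])  # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Synthetic, already-scaled feature vectors forming two obvious groups.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
score = silhouette(points, labels)
print(labels, score)
```

In practice, the number of clusters is typically varied and the evaluation score compared across runs before the resulting clusters are interpreted, for example as candidate network slices.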
Results Synthesis
A. End-to-End 5G Data Collection
Collecting end-to-end 5G datasets requires implementing an end-to-end 5G network. A realistic dataset is desirable because it can be used, with some level of certainty, to predict events in a 5G network. Simulation-based datasets are mostly sanitized and fall short in this regard. Although 5G networks are poised to be enhanced by ML workflows, the data collection procedures and processes remain loosely defined. The standards propose tools such as the Network Data Analytics Function (NWDAF) [38], but the implementation of these tools is still in its infancy, which opens the arena for different implementation frameworks and procedures that can aid in data collection in 5G networks. As the work by [16], [17], and [18] shows, data in 5G requires collection, parsing, and post-processing before it is useful for downstream ML tasks. In this context, post-processing involves taking the data captured in its raw form using a myriad of data collection tools and parsing it using custom scripts that then produce the desired output. For example, [16] wrote a custom Python script to process data scraped from the web, [17] wrote a custom sampling function that interacts with the Amarisoft Callbox API to collect different network metrics, and [18] used OpenDPI to process packet captures (PCAPs) and construct flows from them. We can therefore conclude that post-processing of raw data after collection from the devices or interfaces of a 5G network is a vital step. This custom processing allows researchers to collect data that suits their work and exposes the multitude of data that can be collected from 5G networks. In more advanced cases, monitoring tools can be embedded in 5G networks to perform data collection, data export, and storage tasks. For example, [19] implements an Elasticsearch Software Development Kit (SDK) on top of the FlexRAN controller to collect data.
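To illustrate what constructing flows from captured packets involves, the following Python sketch groups already-parsed packet records into bidirectional flows keyed by the 5-tuple (protocol, two IP addresses, two ports). It is a generic sketch and does not reproduce OpenDPI or the processing used in [18]; the record field names and the sample packets are hypothetical.

```python
from collections import defaultdict


def build_flows(packets):
    """Group packet records into bidirectional flows keyed by 5-tuple."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0,
                                 "first_ts": None, "last_ts": None})
    for p in packets:
        a = (p["src_ip"], p["src_port"])
        b = (p["dst_ip"], p["dst_port"])
        # Order the endpoints so both directions map to the same key.
        key = (p["proto"],) + min(a, b) + max(a, b)
        f = flows[key]
        f["packets"] += 1
        f["bytes"] += p["length"]
        ts = p["ts"]
        f["first_ts"] = ts if f["first_ts"] is None else min(f["first_ts"], ts)
        f["last_ts"] = ts if f["last_ts"] is None else max(f["last_ts"], ts)
    return dict(flows)


# Hypothetical, synthetic packet records (not taken from any dataset).
packets = [
    {"ts": 0.00, "src_ip": "10.0.0.2", "src_port": 40000,
     "dst_ip": "10.0.0.1", "dst_port": 443, "proto": "TCP", "length": 100},
    {"ts": 0.05, "src_ip": "10.0.0.1", "src_port": 443,
     "dst_ip": "10.0.0.2", "dst_port": 40000, "proto": "TCP", "length": 1500},
    {"ts": 0.10, "src_ip": "10.0.0.3", "src_port": 5000,
     "dst_ip": "10.0.0.1", "dst_port": 53, "proto": "UDP", "length": 80},
]

flows = build_flows(packets)
for key, stats in flows.items():
    print(key, stats)
```

Per-flow statistics such as packet count, byte count, and duration can then serve as features for downstream ML tasks.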
The work of [20] and [21] considers data collection in a network-slicing scenario implemented in a cloud-native environment, integrating monitoring tools such as the ELK stack complemented by data exporters like Filebeat and the Prometheus exporter. Similarly, [22] integrates the FlexRAN controller with Apache Kafka as a broker for data collection.
As seen in the work by [19], [20], [21], and [22], these monitoring tools are technology-agnostic and can be adapted to 5G networks to suit specific data collection and monitoring needs. However, consideration must be given to the kind of data that needs to be collected and the scale of the testbed in question. In such advanced cases, interfacing these technology-agnostic monitoring tools is most useful when the integration is done at the controller level, so that data is collected and shipped from the controllers in the network through to the data collection, storage, and analysis pipeline.
At the same time, it is important to note that data collection, especially in the 5G context, can be a very demanding and expensive process for the operator. The expense can come from plugging extra devices into the network to handle data collection, or from the actual Capital Expenditure (CAPEX) incurred by using technology-agnostic OAM tools such as the ELK stack. Moreover, collecting and processing this data is a daunting task, as it takes time and consumes a lot of physical computing resources. Studies like [17] present a sampling methodology that collects representative data, rather than all data, for use in network analysis tasks. On the other hand, the work by [39] proposes the use of generative ML models to reproduce datasets, with the aim of reducing the time and space complexity observed when preprocessing raw datasets.
In this regard, addressing the availability of ML datasets for 5G network intelligence tasks requires balancing the complexity of the network, the type and quantity of the data, the technological and financial expense of the data collection methods, and the relevance of the data. The more realistic the data, the more useful it is for downstream analysis and ML tasks.
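The idea of keeping a representative subset rather than every record can be illustrated with reservoir sampling, a standard technique for drawing a uniform sample from a stream of unknown length. This is a sketch under our own assumptions and is not necessarily the sampling methodology of [17]; the metric names and values are hypothetical.

```python
import random


def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Hypothetical metric records arriving one at a time from a collector.
metrics = ({"t": t, "dl_bitrate": 1000 + t} for t in range(10_000))
kept = reservoir_sample(metrics, k=100)
```

Because the reservoir has a fixed size, memory use stays constant no matter how long the collection runs, which is the property that matters when storage and compute are the limiting costs.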
On the other hand, there are no standard methods for collecting and presenting the data for downstream ML analysis as far as 5G networking is concerned. From Tables 6 and 7, listing the different datasets, we can see that there are some common features expected from the data collected across the RAN domain, and some of these features can also be cascaded to legacy RAN systems like LTE-RAN.
B. Data Preprocessing for ML Tasks in 5G
From the previous section, 5G networking data collected using monitoring tools, custom scripts, or advanced generative models needs to be analyzed or passed to ML tools for inference and model building. A common challenge in ML workflows is selecting the most significant features in the available data to effectively train and optimize ML models [40]. This requires either feature selection or feature extraction from the datasets, and the approach taken depends on the downstream ML task. The studies in this systematic review that classify 5G network data for network slicing rely primarily on feature selection. For example, [25] and [26] use network flow data from the Unicauca dataset and classify it into predefined slices, i.e., Enhanced Mobile Broadband (eMBB), Massive Machine-Type Communications (mMTC), and Ultra-Reliable and Low Latency Communications (URLLC). Before classification is performed, feature selection algorithms such as ANOVA, Chi-Square, MI, Boruta, and, specifically in [28], correlation analysis were used. From a summary of the studies by [25], [26], and [28], we note that these feature selection algorithms could potentially reduce the dimensionality of the dataset by more than 70%.
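To make the feature selection step concrete, the sketch below ranks features by the one-way ANOVA F-statistic and keeps the top two. It is a pure-Python illustration under our own assumptions: the feature names and values are hypothetical, they are not taken from the Unicauca dataset, and this is not the exact procedure of [25], [26], or [28].

```python
def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature against class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of the samples around their own group mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical flow features labelled with the three generic slice types.
labels = ["eMBB", "eMBB", "mMTC", "mMTC", "URLLC", "URLLC"]
features = {
    "avg_pkt_size": [1400.0, 1350.0, 100.0, 120.0, 200.0, 220.0],
    "flow_duration": [30.0, 28.0, 2.0, 3.0, 0.5, 0.4],
    "src_port_parity": [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
}
scores = {name: anova_f(col, labels) for name, col in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
selected = ranked[:2]  # keep the two most discriminative features
```

In this toy data the uninformative `src_port_parity` column has identical means in every class, so its F-statistic is zero and it is dropped.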
On the other hand, feature extraction techniques can also be applied to 5G networking and network flow datasets for downstream ML tasks. For instance, the genetic feature extraction techniques in the work by [24] can also complement classification tasks.
In contrast, in unsupervised tasks, feature extraction techniques tend to be preferred, as seen in the work by [27] where PCA was used. Importantly, with feature extraction using techniques like PCA, reducing the number of features should not be prioritized to such an extent that the variance in the dataset is lost. This balance has to be maintained when considering features present in a networking dataset for the downstream unsupervised learning task. The essence of such procedures is to make sure that at some point the explainability of the results obtained can still be guaranteed and the ground truth in the data can still hold.
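The balance between reducing features and retaining variance can be checked numerically. The sketch below, which assumes NumPy is available, computes the cumulative explained variance of the principal components and returns the smallest number of components that reaches a chosen threshold. The synthetic data and the 95% threshold are our assumptions, not values from [27].

```python
import numpy as np


def components_for_variance(X, target=0.95):
    """Smallest number of principal components whose cumulative
    explained variance ratio reaches `target`."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centred data are proportional
    # to the variance captured by each principal component.
    s = np.linalg.svd(Xc, compute_uv=False)
    ratio = s**2 / np.sum(s**2)
    cumulative = np.cumsum(ratio)
    n = int(np.searchsorted(cumulative, target)) + 1
    return min(n, len(cumulative)), cumulative


# Synthetic data: four features that are driven by two latent factors.
rng = np.random.default_rng(0)
a, b = rng.normal(size=200), rng.normal(size=200)
X = np.column_stack([a, b, 2 * a + 0.01 * rng.normal(size=200), b - a])
n_components, cumulative = components_for_variance(X, target=0.95)
```

In practice the features would usually be standardized first, since PCA on raw networking counters is dominated by whichever attribute has the largest numeric range.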
Moreover, in the context of data preprocessing for downstream ML tasks that support network slicing, we note that conventional ML preprocessing remains important, as such tasks still rely on ML pipelines built for other, legacy ML tasks. In this regard, data encoding, standardization, and normalization are still required and need to be tailored to the needs of the downstream ML task.
C. E2E Network Slice Implementation and Embedding Intelligence in the RAN, TN and CN
To implement end-to-end network slices, this systematic review focused on research that can be easily replicated using open-source tools. The studies included in this paper achieve end-to-end network slicing on real testbeds with real hardware devices. Generally, on the RAN, open-source software such as OAI RAN has been used in the work of [29], [30], [31], [32], and [33] to provide 5G radio signals over the B210 SDR. In all of these studies except [31], the FlexRAN controller is integrated into the RAN to provide data collection services and RAN control to achieve network slicing in the RAN. Specifically, in the studies that integrate the FlexRAN controller, network slicing, network slice resource isolation, and bandwidth separation in the RAN are achieved by allocating PRBs to different network slices using the SD-RAN controller, FlexRAN. In contrast, [31] takes a different approach to RAN slicing, in which two separate gNBs are deployed to serve as network slices. We find that this approach is not representative of the vision of true network slicing on the RAN, especially when it comes to the logical separation of physical resources on a shared physical infrastructure.
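The PRB-based separation described above can be sketched as a proportional split of a cell's PRBs among slices. The function below is a hypothetical illustration using largest-remainder rounding; it is not the FlexRAN API, and the total of 106 PRBs and the slice weights are assumed example values.

```python
def allocate_prbs(total_prbs, weights):
    """Split `total_prbs` among slices in proportion to integer `weights`,
    using largest-remainder rounding so every PRB is assigned exactly once."""
    total_weight = sum(weights.values())
    alloc, remainders = {}, {}
    for name, w in weights.items():
        # Integer share and the remainder left over after flooring.
        alloc[name], remainders[name] = divmod(total_prbs * w, total_weight)
    leftover = total_prbs - sum(alloc.values())
    # Hand the remaining PRBs to the slices with the largest remainders.
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc


# Assumed example: 106 PRBs shared by three slices with weights 50/30/20.
alloc = allocate_prbs(106, {"eMBB": 50, "URLLC": 30, "mMTC": 20})
```

Because each slice receives a disjoint set of PRBs from the same carrier, the split is a logical partition of one shared radio resource, in contrast to the two-gNB approach of [31].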
Studies included in this review that implement network slicing in the transport network make use of programmable switches or forwarding devices that adhere to the principles of SDN. The general idea of TN slicing is based on appropriate ways of handling, isolating, and forwarding traffic using programmable switches. Open vSwitch (OVS) is the dominant forwarding device when implementing network slicing in the TN, interfacing with SDN controllers as seen in [29], [30], and [31]. Several controllers have been used to manipulate traffic routes, set bandwidth restrictions, and meet the slice isolation and resource needs of different network slices. They include ONOS in [29], [32], and [33], Ryu in [30], and the Floodlight controller in [31]. In the TN, we find the implementation of network slicing fuzzy but flexible. This allows researchers to take approaches driven by their research needs when implementing network slicing at the TN. However, in most studies, the consensus is that SDN and NFV play a key role in traffic engineering in the TN to achieve network slicing in 5G networks. The same is true for any transport network implemented in the fronthaul and backhaul of the 5G system to support network slicing, as seen in [30].
In the CN, the approach to achieving network slicing is supported by the deployment of core network functions as cloud-native network functions over platforms like Docker or Kubernetes. This is common across the studies included in this review. Concretely, the work by [31] provides insight into slice isolation by proposing a core network in which the control plane is shared among the slices while the user plane function is separated and implemented for each slice. Research on the slicing of core networks is mostly driven by the virtualization of core network functions. From this analysis, we find the deployment of network slices at the core network to be still a fuzzy topic as far as implementation is concerned. Most implementations of network slicing at the CN will also be guided by the different SLAs defined for the network slices needed.
In addition, embedding intelligence for network slice deployment and management is mostly driven by the need to manage resources and the life cycle of a network slice predictably and dynamically. Although a closed-loop implementation of intelligence in network slicing is desired, what we see is a disaggregated approach to implementing ML models in an end-to-end 5G network. This is true in situations where the inference needed for RAN slicing is different from the inference needed for CN slicing. For example, the studies included in this review show that ML models can be used to predict and dynamically adjust the allocation of PRBs in the RAN to support different network slice instances [29], [30], [32], [33]. At the CN, other studies focus on using ML models to aid the placement of VNFs to support low latency in a specific network slice [30]. From this analysis, we conclude that embedding intelligence in network slicing, especially in a closed-loop end-to-end manner, is not guided by a one-size-fits-all ML model that handles intelligent tasks at the RAN, TN, and CN. Instead, the deployed ML models complement each other’s capabilities to allow a specific slice implementation to meet the required SLA. At the same time, data collection, the implementation of these models, and the inference they generate are handled by the respective controllers in each network domain, as seen in [29], [30], [32], and [33], where FlexRAN handles the ML models at the RAN while other SDN controllers such as Ryu [30] and ONOS [29], [32], [33] handle the ML models in the TN.
D. Approaches to Designing a 5G Network Slice Template
The studies included in this review show that the design of a network slice template is guided by translating the requirements of slice tenants into physical network resources. In our view, the attributes expected in a high-level network slice template describe the lower-level implementation of a network slice on the physical network device. Looking at Tables 14 and 15, we can see that the definition of a network slice template generally follows a specified philosophy. This philosophy calls for network slices that are isolated, for logical network functions that are mostly described as cloud-native functions or virtual network functions, and for the management capabilities of a network slice instance to be exposed. At the same time, we note that the definition of a network slice template has not yet been standardized. Furthermore, looking at the work of [34] and [35], we find that the authors have made strides in defining network slice templates; however, the translation of these templates into actual network slices, implemented on a physical testbed with real isolation, management, and QoS and SLA adherence results, is yet to be demonstrated.
We see that the descriptions by [34] and [35] of what a network slice template should look like lean towards CNF and VNF descriptors. Templates that describe attributes such as “osContainerDesc” in [34] and VNF deployment options as virtual machines or containers in [35] indicate that the separation of logical resources over shared physical infrastructure in network slicing will benefit from CNF and VNF deployment options.
At the same time, the ability to define network slice templates based on the SLA and service profile in question, with multiple attributes to choose from, means that network slicing as a technology can be expanded and that other slices, beyond the generic categories, can be defined and implemented. In fact, because no strict combination of the attributes listed in Tables 14 and 15 defines a specific network slice category, network operators are free to mix and match these attributes to suit their needs.
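A network slice template of this kind can be pictured as a small structured record. The sketch below is only an illustration: the attribute names `dLThpSlice`, `dLThpPerUE`, and `placement` are among those this review lists from Tables 14 and 15, while the `isolated` flag, the `network_functions` list, the units, and all values are hypothetical assumptions and do not reproduce the templates of [34] or [35].

```python
from dataclasses import dataclass, field, asdict

# Placement options copied as they are listed in this review.
PLACEMENTS = {"Far", "Edge", "Associated Edge", "Central Cloud"}


@dataclass
class SliceTemplate:
    """Illustrative slice template; not a standardized schema."""
    name: str
    dLThpSlice: int        # downlink throughput per slice (unit assumed: kbit/s)
    dLThpPerUE: int        # downlink throughput per UE (unit assumed: kbit/s)
    placement: str
    isolated: bool = True  # hypothetical isolation flag
    network_functions: list = field(default_factory=list)  # hypothetical CNF/VNF names

    def __post_init__(self):
        # Basic consistency checks on the chosen attribute values.
        if self.placement not in PLACEMENTS:
            raise ValueError(f"unknown placement: {self.placement!r}")
        if self.dLThpPerUE > self.dLThpSlice:
            raise ValueError("per-UE throughput cannot exceed slice throughput")


# Two custom slices built by mixing the same attributes differently.
video = SliceTemplate("video-streaming", 500_000, 25_000, "Central Cloud",
                      network_functions=["upf", "smf"])
robotics = SliceTemplate("factory-robotics", 50_000, 5_000, "Edge",
                         network_functions=["upf"])
template_dict = asdict(robotics)
```

Serializing the record with `asdict` gives a plain dictionary that could be written out as JSON or YAML for an orchestrator, which is one way the mix-and-match flexibility noted above could be expressed.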
E. Can Unsupervised ML Techniques Aid in the Design of Network Slice Templates?
Understanding network traffic patterns can be done using unsupervised machine-learning techniques. When the results of such pattern recognition tasks are extended to network slicing scenarios in 5G, the life cycle management of network slices can benefit from such explorations. In this review, although we do not see a direct correlation in how unsupervised machine-learning techniques have been used previously to aid in the design of network slices and network slice templates, there has been work that uses unsupervised machine-learning techniques in 5G network traffic datasets to understand the traffic patterns.
In this regard, by interpreting the results of such pattern recognition tasks on 5G network traffic, service profiles can be derived through external analysis with domain experts to create network slice templates. Research on using unsupervised learning to recognize patterns in network slice traffic shows that clustering techniques are best suited for this work. This is evident in the work of [27], where K-Means with iterations was used to recognize network traffic patterns. The same is seen in [36], where the study first implements Basic Agglomerative Hierarchical Clustering and later improves it to Enhanced Agglomerative Hierarchical Clustering to ensure that the results are consistent with the ground truth.
In both works, we find that cluster validation plays an important role. Cluster validation, augmented by external analysis that involves human experts in interpreting the results, is vital and desired. We see that the authors of [36] use attributes such as the Number of Flows, Average Bandwidth (Bytes/s), Average Duration (s), and Burstiness (pkt) to describe the clusters. Although these clusters represent generic network slice categories, they help to give initial direction to the customization and definition of network slices. The work of [37] takes a different approach in that it develops an alternative clustering algorithm suited to the kind of dataset the 5G network produces. The Decorrelating Adversarial Nets for Clustering-friendly Encoding algorithm [37] is specifically designed as a fit-for-purpose algorithm that addresses the suitability and weaknesses of conventional clustering algorithms. Even with a different clustering approach applied, we still see external metrics as the desired approach to analyse and validate the cluster results.
This systematic review also notes that the work by [36] and [37] did not directly contribute to the creation of clusters that were analysed and interpreted as network slices, but they guided how clustering networking data can be done. They highlighted this in the context of 5G mobile networking data.
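The workflow of clustering flows and then describing each cluster by its attributes can be sketched as follows. This is a plain K-Means (Lloyd's algorithm) written in pure Python with a deterministic initialization; the toy flows, the choice of two clusters, and the min-max scaling are our assumptions and do not reproduce the experiments of [27], [36], or [37].

```python
import math


def min_max_scale(rows):
    """Scale every column to [0, 1] so that no attribute dominates the distance."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in rows
    ]


def kmeans(points, k, iters=20):
    """Lloyd's algorithm, initialised with the first k points for repeatability."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Toy flows: (average bandwidth in bytes/s, average duration in s).
flows = [(5.0e6, 60.0), (1.0e4, 1.0), (4.5e6, 55.0),
         (2.0e4, 2.0), (5.5e6, 65.0), (1.5e4, 1.5)]
labels = kmeans(min_max_scale(flows), k=2)

# Describe each cluster with attributes a domain expert could interpret.
summary = {}
for c in sorted(set(labels)):
    members = [f for f, a in zip(flows, labels) if a == c]
    summary[c] = {
        "flows": len(members),
        "avg_bandwidth": sum(m[0] for m in members) / len(members),
        "avg_duration": sum(m[1] for m in members) / len(members),
    }
```

The per-cluster summary is the artefact that a human expert would inspect when deciding whether a cluster corresponds to a candidate network slice.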
Discussion
As much as data collection on the 5G network can be achieved through multiple unstandardized data collection methodologies, there is still a lack of datasets that cover an end-to-end implementation of the 5G network from the RAN, TN, and CN domains. In addition, most of the data that we have presented in Tables 6 and 7 may show the application responsible for producing the data. However, we find that these datasets and their associated features still describe summarized and high-level information about applications that interact with 5G networks. To this end, there is a lack of granularity that goes down to the different applications on the network, including network traces and network flows, of specific applications from the RAN, TN to the CN. This is still an open area of research that can be investigated. With this exploration, understanding traffic patterns can be done at a very low and granular level for each network application on the 5G network.
Passing the data down to downstream ML tasks in a bid to support network slicing tasks has been investigated, as we have highlighted in Table 10. We still note the need to explore other data preprocessing techniques, especially for unsupervised learning tasks. These studies are meant to provide insights into how this kind of networking dataset can be preprocessed for unsupervised ML tasks. This activity should involve comparative studies on dimensionality reduction in 5G networking datasets for such unsupervised learning tasks.
When implementing end-to-end network slices on physical hardware, the expectation of a true network slice is the logical separation of network resources on a shared physical infrastructure. This will be driven mainly by open source solutions deployed in a cloud-native fashion or using virtual network functions, as shown in Table 12.
Concurrently, control of the physical resources is mostly guided by the relevant controllers in the different domains. Although we see some consistency in handling resources in the RAN, using PRBs, or in the CN, using control and user plane separation and dedicated core network functions to different network slices, we still find the TN implementation fuzzy. This is by design because of the different traffic engineering approaches that can be taken to meet the different network slicing requirements in the TN. This is mainly guided by the demands of the SLA defined by the service profile of the network slice. Therefore, meeting these SLAs and embedding intelligence in network slicing deployment means the controllers are the points of contact for exposing APIs for data collection and inference injection. At the same time, the management of network slices with embedded intelligence must be done from a holistic approach as much as end-to-end implementations can be done in a silo.
On the other hand, network slicing template definition is an important step in the network slice life cycle. However, we see limited work conducted in this space. As much as the research on network slice template definition is still vague and maturing, we also appreciate there are inefficiencies in the standards documentation and provisions on how network slice templates can be defined. Nonetheless, there is a constant philosophy in the definition of network slice templates, guided by CNF and VNF concepts, that is widely consistent with the research findings that we have presented in Tables 14 and 15 above. In addition, issues related to the handling of slice isolation, slice management, monitoring, and orchestration strategies to meet QoS requirements have been consistent in the network slice templates we have presented and discussed. Nevertheless, on the lower level, we see the definition of the network slice templates angling more towards defining attributes that are well suited for implementation in cloud-native environments with these philosophies and considerations in mind.
Using unsupervised models, especially clustering, in network slice life cycle management looks promising. In the current state-of-the-art work presented above, there is no nexus between clustering results and how they are adopted, interpreted, and mapped to an actual network slice template. We have, however, seen work such as [27] analysing the clustering results based on attributes in the dataset such as the Number of Flows, Average Bandwidth (Bytes/sec), Average Duration (sec), and Burstiness (pkt) in relation to a specific type of network slice. Moreover, when we look at the attributes of the network slice template in Tables 14 and 15, we see attributes such as dLThpSlice, dLThpPerUE, and placement (position placement info: Far, Edge, Associated Edge, Central Cloud) that can be extrapolated using domain knowledge to define service profiles for the definition of network slice templates. Therefore, this is an emerging area of research in which the design and definition of a network slice template, especially filling in the attributes with values, can be guided by results obtained from understanding the traffic of a 5G network using unsupervised machine-learning techniques, especially clustering.
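One hypothetical way to fill template attributes from a cluster of observed flows is sketched below. The 95th-percentile rule, the multiplication by an expected number of UEs, the kbit/s unit, and the sample values are all our assumptions for illustration; none of the reviewed studies prescribes this mapping, and a domain expert would still need to validate any proposed values.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def propose_attributes(cluster_dl_kbps, expected_ues, placement_hint):
    """Propose template values from per-flow downlink throughput in one cluster."""
    per_ue = percentile(cluster_dl_kbps, 95)
    return {
        "dLThpPerUE": per_ue,                 # assumed rule: 95th percentile
        "dLThpSlice": per_ue * expected_ues,  # assumed rule: scale by UE count
        "placement": placement_hint,          # supplied by a domain expert
    }


# Hypothetical downlink throughput samples (kbit/s) for the flows of one cluster.
cluster = [800, 950, 1000, 1200, 1100, 900, 1050, 980, 1020, 990]
proposal = propose_attributes(cluster, expected_ues=50, placement_hint="Edge")
```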
Conclusion
As 5G networks evolve and the embedding of intelligence in 5G networks becomes a reality, a lot of concerted efforts will go into building realistic platforms that allow for easy integration of machine-learning techniques in 5G networks and network slices. This will also be augmented by available and realistic datasets that can be built using open-source tools and replicable data collection strategies. In this systematic review, we have presented the current state of research in the technological domains of 5G networking datasets, dataset collection methods, available and evaluated end-to-end network slicing solutions with detailed implementation insights, and the potential of using machine-learning techniques to manage these networks and design network slices using network slice templates. This systematic review has also highlighted research gaps in the previously mentioned thematic areas. We have demonstrated a lack of end-to-end datasets that are granular and a representation of the characteristics of an application on the network. We have also demonstrated the possibilities of building a closed-loop testbed using open-source solutions to test for end-to-end intelligent network slicing. Furthermore, we have explored the steps that can be taken in defining a network slice template with its associated attributes. Additionally, we have illustrated how this step in a network slice life cycle can benefit from unsupervised machine-learning techniques to understand patterns in network traffic. In the future, we plan to address open research gaps, such as building end-to-end datasets for 5G networking, and using unsupervised machine-learning techniques in defining a network slice template with attributes and actual values. This will be validated on a small-scale testbed with real hardware to test the end-to-end implementation of custom 5G network slice templates as an initial proof of concept.
ACKNOWLEDGMENT
The authors acknowledge the contributions made by Dr. Patricia Makwambeni, Senior Librarian at the University of Cape Town, who helped formulate the search strategy used in this systematic review. They also acknowledge Maurine Chepkoech and Linnette Muga for proofreading this document.