Introduction
The rapid evolution towards sixth-generation (6G) networks has repositioned networking, computing and communications at the center of technological innovation, making them valuable and viable for new types of high-speed and compute-intensive applications and services, while guaranteeing their pervasiveness and dependability. Benefiting from softwarisation, Gb/s speeds and sub-THz communication paradigms, 6G technologies create opportunities for new and innovative network management strategies, while navigating the evolution towards disaggregation and new software-based paradigms for architecting and operating future connectivity platforms. In the same context, various features such as computing, automation and smartness, trust, privacy and security are embraced [1], [2], [3].
In the 6G landscape, diverse technologies will be integrated into a unified access and management framework. In this context, non-orthogonal multiple access schemes [4], [5], the Internet of Things (IoT) [6], machine-to-machine (M2M) communications [7], serverless computing [8] and intent-based networking (IBN) [9] are only a few of the 6G key enabling technologies from a network perspective. As the vision of new, smart and innovative capabilities rapidly becomes reality, the key security, privacy and resiliency features are not only requested “by design”; instead, they push the envelope of managing a truly evolving system, with features engineered “by the evolution itself”. The fundamentally new and unknown characteristics of advanced, disaggregated, virtualized and multi-vendor 6G-based infrastructures take the security and resilience design challenge to the next level, requiring heterogeneous, complex, and highly versatile infrastructures to be managed as they evolve [10], [11]. For instance, a service can now migrate across multiple disaggregated domains and environments, each with highly intelligent and flexible security mechanisms.
In the same context, resource optimization in such complex environments includes data collection from various endpoints, which must be properly processed in due time in order to support strict latency requirements [12]. Therefore, the challenge is two-fold: i) data collection and classification in large heterogeneous network environments, and ii) wide-area optimization to support bandwidth- and latency-demanding applications. Typical optimization approaches may lead to highly non-convex problems, where extracting the optimum solution can be quite difficult from both a computational and a latency perspective. Other approaches in the literature decompose the original highly non-convex problem into a set of convex subproblems that are solved separately [13]. However, even so, an approximation error is unavoidably introduced, while the calculations have to be repeated for each network reconfiguration, thus violating latency requirements.
Since such complex systems can significantly increase the number of potential endpoints and the types of potential attacks, resource optimization and attack mitigation can be performed with the help of artificial intelligence (AI). In future 6G networks, AI methods are used to: (i) optimize resource allocation and perform optimum network reconfiguration if necessary [14], [15], (ii) improve the response and resilience of systems, for example, towards early detection of threats and anomalies [16], and (iii) identify and correct vulnerabilities of systems predicted to be exposed, in a sandbox environment, such as in digital twins [17]. Consequently, the application of AI should follow a coordinated approach, combining both reactive and predictive methods.
Machine learning (ML) methods, being a part of AI, have emerged over the last decade as a powerful approach to predict specific output variables from given input datasets [18], [19]. In this context, three main approaches can be identified: i) supervised learning, ii) unsupervised learning, and iii) reinforcement learning (RL). The first is more appropriate for labelled datasets, since training is based on predefined samples (i.e., input/output pairs). However, in complex network environments this might not always be feasible, due to the diverse nature of the involved devices and technologies. Unsupervised learning approaches are therefore more appropriate for unlabeled datasets [20]. In such cases, data patterns can be extracted and classifications performed according to the relative positions of the associated data points; hence, this method is highly applicable to anomaly detection. However, in both of the aforementioned approaches, data sampling has to be repeated for each network reconfiguration. Finally, in RL, a set of actions and transitions is defined per network configuration, and the transition to a new state is based on the action that maximizes a predefined reward [21]. Since there can be numerous state-action pairs, in practical applications neural networks are trained to define the best possible action per state (deep reinforcement learning - DRL).
It becomes apparent from the above that resource optimization and security enhancement require efficient data collection and processing from various and diverse sources, and time-efficient decision-making procedures. In this context, the network data analytics function (NWDAF) that has already been defined in Release 15 of 3GPP [22] can collect information from various network functions (NFs) and support ML optimization. NWDAF can be combined with additional key enabling technologies in fifth-generation (5G) networks, such as open radio access network (O-RAN) which also supports closed loop automation via ML [23], [24], [25]. NWDAF is based on network function virtualization (NFV), which is a key supporting technology in beyond 5G (B5G) networks [26], [27], [28]. In this context, basic network operations and hardware elements are replaced by virtual machines, allowing more flexible network deployments, due to the hardware decoupling capabilities offered by NFV.
The goal of this survey is to study all recent approaches towards the integration of NWDAF in next generation networks. To the best of our knowledge, this is the first scientific attempt to record all recent works in the field of NWDAF implementations and its role in enhancing resource optimization, security and privacy.
A. Indicative Related Works
In this subsection, various indicative related works are investigated in the field of efficient data monitoring, resource optimization and security enhancement in broadband wireless networks. The survey in [29] deals with various aspects related to process optimization in 5G networks, such as network monitoring, large data collection and big data analysis, and the integration of various AI/ML algorithms with an emphasis on lightweight models. To this end, the synergies between NWDAF and O-RAN are also highlighted, towards a fully self-configurable wireless network. In [30], the authors present recent works on ML-aided network anomaly detection. Here, the advantages and disadvantages of each ML approach (supervised, unsupervised and reinforcement learning) with respect to anomaly detection are also discussed. In particular, supervised learning approaches can be more efficient for labelled datasets, since input/output training pairs are well-defined. However, their performance can significantly deteriorate when unlabeled data are used, in which case unsupervised learning approaches can provide improved anomaly detection performance. Nevertheless, accurate performance metrics cannot be established, due to the absence of labelled data. In RL approaches, the corresponding algorithms can be trained in real-time and there is no need for an a priori training dataset. However, since all actions should be examined for the one that maximizes the overall reward, computational complexity can be significantly increased. Finally, the authors point out various open issues towards efficient anomaly detection, such as the use of realistic datasets, as well as power consumption during model execution.
The work in [10] deals with several security and privacy challenges in 6G networks. As the authors point out, apart from well-known attacks (e.g., denial of service - DoS), additional attacks may occur in 6G networks, such as learning-empowered attacks and massive data breaches, due to the increase in connected devices and novel technologies. A key outcome of this work is that there is a fundamental trade-off between real-time, resource-consuming security methods and energy efficiency.
In [31], various research challenges with respect to ML data-driven optimization in 5G networks were investigated along with related technologies, such as big data analytics and cloud-edge computing. In greater detail, application and user-oriented optimization approaches were highlighted. Moreover, various research challenges were identified, including real-time data analytics and prediction errors. In [32], the authors focus on O-RAN and associated implementation challenges, such as RAN Intelligent Controllers (RICs) that can be used to effectively control and manage 3GPP-defined RANs, and ML workflows for process optimization and security enhancement. In the same context, experimental platforms and standardization activities are also discussed. Finally, in [18], various resource optimization approaches were presented, in the context of ML-assisted radio resource management. The derived analysis is mainly focused on physical layer aspects of 5G/B5G networks.
Table 1 summarizes the recent surveys focusing on resource optimization, security and privacy, and ML and their contributions, including this work.
B. Contributions
The studies presented in the previous subsection have dealt with recent advancements in efficient resource optimization and security enhancement via ML. However, a key challenge in next generation networks will be the existence of a unified framework for efficient data collection, resource optimization, and security protection. Therefore, unlike other works in the literature, our work focuses on various aspects related to NWDAF implementations, such as effective data collection over vast and distributed data generation endpoints, network optimization via ML algorithms, and anomaly detection and mitigation.
In light of the above, our main contributions in this survey can be summarized as follows:
The main 5G/6G technologies for data monitoring are presented and special focus is given to anomaly detection for strengthening network security and privacy, also considering the latest 3GPP technical specifications.
Recent works on NWDAF are presented, emphasizing its role in ensuring efficient network monitoring and data collection, towards resource optimization, as well as anomaly detection, and security and privacy enhancement.
Limitations and open issues are identified as well, to trigger further discussions on the integration of NWDAF with 6G networks.
A high-level architectural approach is presented and discussed for efficient data collection and ML model training, either for resource optimization or anomaly detection, taking into account all latest achievements on ML.
C. Structure
The rest of this paper is organized as follows. In Section II, key technologies for data collection and security enhancement are presented, including federated learning (FL), IoT-edge-cloud (IEC) architectural approaches, the O-RAN framework, and the NWDAF. The most important security threats and anomaly detection algorithms are outlined in Section III. In Section IV, recent works on NWDAF implementations are discussed and categorized. Based on the key findings of the presented works, a discussion takes place in Section V, where basic limitations and open issues are identified. To this end, a high-level concept for large scale data collection and ML model training based on NWDAF is presented. Finally, concluding remarks are summarized in Section VI. The structure of this survey paper is also depicted in Fig. 1 for illustration purposes while Table 2 includes the acronyms used throughout the paper.
Technologies for Data Monitoring and Security Enhancement
The support of highly demanding applications in the context of next generation networks necessitates continuous key performance indicator (KPI) evaluation and possible network redesign. Therefore, large amounts of data are collected and processed for network optimization purposes. As next generation networks involve the integration of a vast number of heterogeneous technologies and devices, global optimization can be quite challenging. Here, various emerging technologies make large-scale data collection and management feasible, including distributed and decentralized ML, IEC integration towards a unified continuum, O-RAN, as well as NWDAF. In the following subsections, the key points of each technology are highlighted.
A. Distributed and Decentralized Machine Learning
ML approaches can be highly applicable in non-convex problems for process optimization, as well as anomaly detection and mitigation. In these cases, as also stated in the introductory part, data are collected from various endpoints of the network environment, and the appropriate ML models are trained in a predefined manner, such as supervised, unsupervised, or reinforcement learning. In realistic deployments, however, collecting and processing data at a single endpoint might not be the best option, since on one hand the computational load is significantly increased, and on the other hand there is the risk of a single point of failure. Therefore, distributed and decentralized ML approaches can be applicable as well [33], [34].
In this context, the concept of FL has emerged over the last years as a computationally efficient approach for decentralized ML calculations [35], [36]. To this end, there are various participating nodes in each training round, where it is also assumed that the individual datasets are uncorrelated. Each node trains a local ML model based on its available dataset. At periodic time intervals, results are sent to a primary node for proper model aggregation, and the updated model is later sent back to the participating nodes. It should be noted at this point that the data exchanged between the participating nodes and the primary model server do not contain any sensitive information, but only the parameters of the trained model (e.g., the weights of a neural network (NN) in cases of deep learning - DL). Hence, a dual goal can be achieved: on one hand, training data remain localized, thus mitigating security and privacy concerns, and on the other hand, faster training times can be achieved, due to the distributed and diverse nature of the FL approach. However, certain vulnerabilities may occur, such as training data tampering in one or more participating nodes, which in turn will affect the derived parameters of the local ML model, as well as eavesdropping attempts on the global model parameters [37].
In decentralized FL training, there can be multiple aggregation servers that periodically send their updated model parameters to a primary server. In Fig. 2 for example, a two-tier FL model is depicted. In this case, there are N participating nodes, divided into a discrete number of subsets. Each subset communicates with a local FL aggregation server. A primary FL server located on the top of the corresponding hierarchy can communicate with all local FL servers. The latter combine the training parameters from the participating nodes to aggregate the local FL models, which in turn are sent to the primary server [38].
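The two-tier aggregation described above can be sketched as a hierarchy of weighted averages of model parameters (FedAvg-style). The node parameters and dataset sizes below are hypothetical, purely for illustration:

```python
# Minimal sketch of two-tier federated averaging (illustrative values).
# Each participating node holds a locally trained parameter vector; local FL
# servers aggregate their own subset of nodes, and the primary FL server then
# aggregates the local models.

def fedavg(params_and_sizes):
    """Weighted average of parameter vectors, weighted by local dataset size."""
    total = sum(n for _, n in params_and_sizes)
    dim = len(params_and_sizes[0][0])
    avg = [0.0] * dim
    for params, n in params_and_sizes:
        for i, p in enumerate(params):
            avg[i] += p * (n / total)
    return avg, total

# Tier 1: participating nodes -> local FL servers (two subsets of nodes)
subset_a = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]   # (parameters, dataset size)
subset_b = [([0.0, 0.0], 200), ([2.0, 2.0], 200)]

local_a = fedavg(subset_a)   # aggregated model of subset A
local_b = fedavg(subset_b)   # aggregated model of subset B

# Tier 2: local FL servers -> primary FL server
global_model, _ = fedavg([local_a, local_b])
print(global_model)  # [1.75, 2.25]
```

Only the parameter vectors travel upwards, consistent with the privacy property noted earlier: raw training data never leave the participating nodes.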
B. IoT-Edge-Cloud Convergence
The interconnection of a vast number of IoT devices in the network has created a new landscape in which various data transmissions take place from resource constrained devices. Transmitting and processing data in centralized cloud infrastructures for resource optimization would increase the computational load and jeopardize latency requirements in highly critical applications. In this context, a novel architectural approach that has emerged over the last years is IEC integration, towards what is also called the IEC continuum. In this case, multi-access edge computing (MEC) servers in the proximity of IoT devices collect related data and process them accordingly [39], [40], [41]. Hence, efficient task offloading from IoT devices is now feasible, leveraging reduced latency, decentralized and distributed computations, as well as energy consumption minimization [42]. Furthermore, edge deployments can bring data closer to the user through intelligent edge caching, further reducing the end-to-end latency [43]. In practice, edge servers are used for computationally demanding calculations in latency critical applications, while cloud servers are used for macroscopic KPI monitoring and optimization. In Fig. 3 for example, various traffic flows are depicted in different colours according to their significance in terms of latency. Therefore, medium-critical traffic is forwarded to the MEC server, while non-critical traffic can be offloaded to the cloud server.
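The criticality-based traffic steering just described can be captured by a simple dispatch rule. The tier names and latency thresholds below are hypothetical, chosen only to illustrate the idea:

```python
# Illustrative placement of flows by latency criticality.
# Thresholds and tier names are hypothetical, not from any standard.
def place_flow(latency_budget_ms):
    """Return the execution tier for a flow given its latency budget (ms)."""
    if latency_budget_ms < 10:
        return "local"   # latency-critical: keep on the device / far edge
    elif latency_budget_ms < 100:
        return "edge"    # medium-critical: offload to a nearby MEC server
    else:
        return "cloud"   # non-critical: offload to the central cloud

flows = {"robot-control": 5, "video-analytics": 50, "kpi-reporting": 1000}
placement = {name: place_flow(ms) for name, ms in flows.items()}
print(placement)
```

In a real deployment the decision would also weigh server load and energy, but the latency budget remains the dominant criterion.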
Security and privacy enhancement in IEC systems can be a key challenge towards large scale deployment, due to the resource constrained nature of IoT devices, which may not always be in a position to execute advanced security protocols. In this case, blockchain technology can be used for task offloading to MEC servers [44], [45]. The blockchain network acts as a mediator between the IoT devices and the edge servers, with the generated data recorded as blockchain transactions. The blockchain stores these transactions in a distributed ledger to guarantee data integrity, availability, immutability, traceability, and transparency.
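The integrity and immutability guarantees come from chaining cryptographic hashes: each block commits to its predecessor, so any modification of a stored transaction is detectable. A minimal hash-chain sketch (not a full blockchain, no consensus or distribution) illustrates the mechanism:

```python
import hashlib
import json

# Minimal hash-chain sketch showing why a ledger of offloading transactions
# makes tampering detectable. Illustrative only: no consensus, no replication.
def make_block(prev_hash, transaction):
    payload = json.dumps({"prev": prev_hash, "tx": transaction}, sort_keys=True)
    return {"prev": prev_hash, "tx": transaction,
            "hash": hashlib.sha256(payload.encode()).hexdigest()}

def verify(chain):
    for i, block in enumerate(chain):
        payload = json.dumps({"prev": block["prev"], "tx": block["tx"]},
                             sort_keys=True)
        if hashlib.sha256(payload.encode()).hexdigest() != block["hash"]:
            return False                      # block contents were altered
        if i > 0 and block["prev"] != chain[i - 1]["hash"]:
            return False                      # chain linkage was broken
    return True

chain = [make_block("genesis", {"device": "iot-1", "task": "offload-A"})]
chain.append(make_block(chain[-1]["hash"], {"device": "iot-2", "task": "offload-B"}))
print(verify(chain))                          # True

chain[0]["tx"]["task"] = "tampered"           # any modification breaks the chain
print(verify(chain))                          # False
```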
IEC approaches can be highly applicable in a wide range of use cases, such as smart manufacturing [46], [47], e-health applications [48], optimization of maritime procedures such as just-in-time arrival of vessels in ports [49], [50], the agricultural sector [51], etc.
C. O-RAN
Initially driven by mobile network operators' (MNOs) concerns over 5G infrastructure cost and affordability, the industry transition towards an open, disaggregated, intelligent, virtualised and highly extensible 5G vRAN architecture has consolidated around the O-RAN Alliance [22], which defines the O-RAN architecture and specifications for 5G and beyond networks based on virtualized and interoperable components. In this context, the goal is to build on 3GPP 5G standards with a foundation of virtualized network elements, white-box hardware and standardized interfaces that fully embrace the core principles of intelligence and openness [52], [53]. The O-RAN paradigm aims to make the RAN more independent, standardizing a set of open interfaces to avoid proprietary technologies and vendor lock-in. O-RAN supports AI-optimized closed-loop automation, which enables flexible resource management. In the second half of 2020, the O-RAN Alliance introduced the slicing architecture [54], where the general principles, requirements, reference signals and deployment options for network slicing on the reference architecture are presented. O-RAN conceives network slicing as an end-to-end (E2E) feature including core, transport and RAN segments.
A schematic overview of the O-RAN components and interfaces is shown in Fig. 4. In this context, two critical building blocks are responsible for the execution of the ML workflow: (i) the near real-time (RT) RIC, a logical function that enables near-RT control and optimization of RAN elements and resources via fine-grained data collection and actions over the E2 interface, as depicted in Fig. 4, and (ii) the non-RT RIC, a logical function within the service and management orchestration (SMO) framework that enables non-RT control and optimization of RAN elements and resources, the AI/ML workflow including model training, inference and updates, and policy-based guidance of applications/features in the near-RT RIC.
It should be noted at this point that during the deployment of supervised and unsupervised ML algorithms, the ML training host is essentially located in the non-RT RIC, while the ML model host/actor can be located either in the non-RT RIC or in the near-RT RIC. In contrast, in the framework of RL, both the ML training host and the ML inference host/actor shall be co-located, as part of either the non-RT RIC or the near-RT RIC [19].
D. NWDAF
The NWDAF is a new feature of 5G networks, which enables network operators either to implement their own ML-based data analytics methodologies or to integrate third-party solutions into their networks. NWDAF, which is defined in 3GPP TS 29.520 [22], incorporates standard interfaces from the service-based architecture to collect data, by subscription or request, from other NFs and similar procedures, and is depicted in Fig. 5.
3GPP has defined two models of analytics services provided by NWDAF. The first involves a subscription model and is used when periodic analytics information is needed, e.g., the periodic (current/predicted) load information of a network slice. The second involves a request/ad-hoc model and is used when a one-time analytics computation is needed, e.g., the experience score of a newly deployed application for a particular day/hour/region. The NWDAF offers two services (called Nnwdaf services in [22]): the first is the Nnwdaf_EventsSubscription service, which enables NF service consumers to subscribe to and unsubscribe from different analytics events provided by the NWDAF, and subsequently enables the NWDAF to notify the NF consumers about subscribed events. The second Nnwdaf service is the Nnwdaf_AnalyticsInfo service, which enables NF consumers to request and obtain specific analytics from the NWDAF. The Nnwdaf services and the associated operations are listed in Table 3.
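The difference between the two consumption models shows up directly in the request an NF consumer builds. The sketch below assembles an events-subscription body; the field names are simplified for illustration and do not reproduce the exact 3GPP TS 29.520 schema:

```python
import json

# Illustrative construction of an Nnwdaf_EventsSubscription request body.
# Field names are simplified stand-ins, not the verbatim TS 29.520 schema.
def build_subscription(event_id, notification_uri, periodic=True, period_s=60):
    sub = {
        "eventSubscriptions": [{"event": event_id}],   # analytics event of interest
        "notificationURI": notification_uri,           # where NWDAF sends notifications
    }
    if periodic:
        # Subscription model: periodic notifications (e.g., slice load level).
        # Without this, the consumer would instead use a one-time
        # Nnwdaf_AnalyticsInfo request.
        sub["evtReq"] = {"repetitionPeriod": period_s}
    return sub

body = build_subscription("LOAD_LEVEL_INFORMATION", "http://consumer.example/notify")
print(json.dumps(body, indent=2))
```

The consumer would POST such a body to the NWDAF subscription endpoint and later receive notifications at `notificationURI`; unsubscribing deletes the created subscription resource.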
The NWDAF in the 5G core (5GC) network plays a key role as a functional entity that collects KPIs and other information about different network domains and uses them to provide analytics-based statistics and predictive insights to 5GC NFs, for example to the policy control function (PCF). Advanced ML algorithms can utilize the information collected by the NWDAF for tasks such as mobility prediction and optimization, anomaly detection, predictive quality of service (QoS) and data correlation [55].
3GPP TR 23.791 [56] currently lists the following formula-based/AI-ML analytics use cases for 5G using NWDAF: i) load-level computation and prediction for a network slice instance, ii) service experience computation and prediction for an application/user equipment (UE) group, iii) load analytics information and prediction for a specific NF, iv) network load performance computation and future load prediction, v) UE expected behaviour prediction, vi) UE abnormal behaviour/anomaly detection, vii) UE mobility-related information and prediction, viii) UE communication pattern prediction, ix) current and predicted congestion information for a specific location, and x) QoS sustainability, which involves reporting and predicting QoS changes.
The NWDAF combines information from different core NFs, and infrastructure telemetry data, and provides analytics and predictions that can be used by 5G core NFs. The centralized management data analytics function (C-MDAF), on the other hand, which is conceptually described in [57], provides management data analytics services (MDAS) that can be consumed by various consumers, like management service producers/consumers for network and service management, NFs (e.g., NWDAF), self-organizing network functions, network and service optimization tools/functions, service level assurance functions, human operators, and AFs [58]. Both the NWDAF and the C-MDAF can utilize AI/ML for advanced data analytics facilitating intelligent decision making. Finally, AI/ML support procedures in 5G systems with the help of NWDAF are described in [59].
In Fig. 6, an indicative use case is presented where NWDAF is combined with O-RAN in a smart manufacturing environment. In this context, sensor nodes, which form the IoT network, can communicate directly with baseband units via the O-RAN interfaces and transmit their information to the 5G network. With respect to the IEC architectural approach that was previously mentioned, the NWDAF may request information on demand, which is later forwarded to the corresponding applications. Moreover, advanced ML resource optimization approaches can be adopted as well, for macroscopic network optimization.
Although 6G networks have not been standardized yet, certain architectural approaches have been discussed over the last years, such as “organic” 6G networks [60]. In this case, instead of a full network redeployment, certain functionalities can be supported that are modelled as discrete workers, according to the requests of the mobile nodes. A worker is able to execute all the steps of a specific procedure, equivalent to all the different steps of the 5G service based architecture operations executed by the different network functions. Therefore, a significant simplification is achieved, as internal messages do not have to be encoded or decoded and inter-component interfaces do not have to be established. Hence, distributed data collection, security and privacy enforcement, as well as ML model training, being the core components of large-scale distributed and secure network optimization, can be modelled as discrete workers. These workers are able to acquire the subscriber state (e.g., of a mobile node), execute the steps required for the specific request (including the communication with the radio and core network elements), update the subscriber state and respond to the mobile node request.
Security Threats and Anomaly Detection
In wireless networks, the application of AI/ML is not only growing but is gaining importance in the network security domain, from both the offensive and defensive perspectives, and requires a coordinated approach. Unlike 5G networks, where security solutions across all devices and base stations are configured with universal settings for certain types of attacks, it is apparent that such an approach cannot be applied in 6G networks. In this case, the high diversity in service provisioning, connected devices and associated protocols, and the various physical layer encoding and transmission techniques, results in a highly complex environment with different requirements and settings. Since each scenario may have unique security requirements, energy availability and computation capabilities, the selection and configuration of security strategies need to be optimized for 6G networks in an adaptive and dynamic manner [61], [62].
A. 3GPP Threat Categorization
Recently, 3GPP has provided TR 33.926, capturing threats and critical assets that have been identified in the context of security assurance specifications [63]. The identified threats are divided into seven categories: one covers threats relating to 3GPP-defined interfaces, while the other six follow the STRIDE categorization [64]:
Spoofing identity involving illegal access and use of another user’s authentication information, e.g. username and password.
Tampering with data, focusing on malicious data modification, for example, unauthorized changes to persistent data residing in a database, or alteration of data as it flows between two computers over the Internet.
Repudiation threats related to users who deny performing an action without other parties being able to prove otherwise. More specifically, a user might perform an illegal operation in a system that cannot trace the prohibited operations.
Information disclosure associated with the exposure of information to individuals who should not have access to it. For example, unauthorized users might be able to read a file that they were not granted access to, or an intruder might read data flowing between two computers.
Denial of service attacks where valid users are denied the requested service, e.g., by making a Web server temporarily unavailable or unusable.
Elevation of privilege where an unprivileged user gains privileged access, thus being able to compromise or destroy the entire system. An illustrative case of this threat involves an attacker that has penetrated all system defenses, becoming part of the trusted system itself.
B. Ensuring Security and Privacy in 6G Networks
The security attacks in 6G networks are polymorphic in nature and sophisticated [65], using previously unseen custom code, able to communicate with external command-and-control entities to update their functionality, or even assembling themselves entirely from code fragments that they intelligently harvest from benign programs, scripts and software blocks already present in the security system in place. To this end, a smart decision support system is required to evaluate different factors, such as the severity of the incident, the criticality and resilience of the compromised infrastructure, as well as the cost of enforcing a mitigation.
Over the last years, various works have dealt with security and privacy issues in 6G networks. In [61] for example, a detailed analysis of the threat landscape based on future use scenarios of 6G networks is provided. In this context, the space-air-ground ocean integrated network (SAGOIN) architecture has been presented, along with two different FL-based AI model training frameworks in 6G: the single region FL framework, where all involved nodes correspond to a particular area, and the cross-layer FL framework, where FL can be applied across various participating regions. Specific security attacks have been considered and mitigated with a proposed Q-learning framework, which decides on the admission of trusted nodes to the FL training stage according to the maximization of specific training rewards.
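The admission decision in such a framework can be sketched with generic tabular Q-learning. The toy environment below (a "trust level" state, admit/reject actions, and a hypothetical reward for admitting trusted nodes) stands in for the more elaborate state and reward design of [61]:

```python
import random

# Generic tabular Q-learning sketch for admitting nodes to an FL round.
# States, actions and rewards are illustrative stand-ins for the design in [61].
random.seed(0)

actions = ["admit", "reject"]
q = {}                                 # Q-table: (state, action) -> value
alpha, gamma, eps = 0.5, 0.9, 0.1      # learning rate, discount, exploration

def choose(state):
    if random.random() < eps:          # epsilon-greedy exploration
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

def update(state, action, reward, next_state):
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# Toy environment: admitting a trusted node improves training (+1);
# admitting an untrusted one harms it (-1); rejecting is neutral (0).
for _ in range(200):
    state = random.choice(["trusted", "untrusted"])
    action = choose(state)
    reward = (1 if state == "trusted" else -1) if action == "admit" else 0
    update(state, action, reward, state)

policy = {s: max(actions, key=lambda a: q.get((s, a), 0.0))
          for s in ["trusted", "untrusted"]}
print(policy)  # learned policy: admit trusted nodes, reject untrusted ones
```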
In [62], the authors have proposed an optimisation framework to address the identified challenges in 6G networks. The proposed framework optimizes security scheme selection and configuration to balance the security-energy trade-off in various scenarios. In [65], the authors analyze various potential new threats caused by the introduction of new technologies related to the usage of open-source tools and frameworks for 6G network deployment, and present possible mitigation strategies to address these threats. These include zero trust architecture (ZTA), as well as an automated management system for open-source security. In the first case, it is assumed that an attacker can exist inside the network; therefore, various network entities have to mutually authenticate with similar entities in a secure manner, e.g., via a public key infrastructure (PKI). In the second case, all open-source information is recorded, to avoid tampering attempts.
ML- and AI-based optimization approaches can be used to improve time-series and statistics-based methods, operating beyond normal conditions and training the system by generating attacks. For instance, in [66] the use of generative adversarial networks (GANs) is explored to simulate intrusions and malware in order to improve their detection, and to fuel defenses against different attack methods. GANs can be very helpful towards threat detection and mitigation, since new potential datasets can be generated from the available training data, thus simulating additional potential attacks that may take place. In [67], several DL architectures are used for the detection of threats. In the same context, in [68], various anomaly detection approaches based on DL are investigated, such as classification- or cluster-based anomaly detection. In [69], various potential threats based on the STRIDE methodology are presented in the context of the internet of senses. The latter term refers to highly demanding applications in the 6G concept, where various properly deployed sensors create visual representations, such as holograms.
In [70], an FL-based approach for anomaly detection has been considered. In this context, each participating FL node follows a double hierarchy for anomaly detection and rejection. At the first stage, the ML model is trained with a relatively small number of potential threats. The second detector employs a more complex model, which identifies potential anomalies that the first detector was unable to detect.
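The double-hierarchy idea can be sketched as a simple detector cascade; the signature set, the rate feature and the thresholds below are hypothetical stand-ins for the actual models of [70].

```python
def stage_one(sample, known_signatures):
    """Lightweight first detector: match against a small set of known threat signatures."""
    return sample["signature"] in known_signatures

def stage_two(sample, baseline_rate, tolerance=2.0):
    """Heavier second detector: flag traffic whose rate deviates strongly from the baseline."""
    return abs(sample["rate"] - baseline_rate) > tolerance * baseline_rate

def detect(sample, known_signatures, baseline_rate):
    """Cascade: samples the first detector misses are passed to the second one."""
    if stage_one(sample, known_signatures):
        return "threat (stage 1)"
    if stage_two(sample, baseline_rate):
        return "anomaly (stage 2)"
    return "normal"
```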
Finally, in [71], a multilevel FL approach has been considered between IoT devices and edge applications. The algorithm is divided into three phases (clustering, training, detection). In this case, a set of trusted IoT nodes is selected that join the nearest edge server, in order to share the training models and all attack detection events. In each set of nodes, the cluster head and the cluster members are identified. The edge devices update the corresponding models, which are later sent to the involved cluster heads. In the detection phase, the cluster heads and the edge server categorize the behaviors of their respective monitored devices as either normal or malicious, according to the global training model obtained during the training process.
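The hierarchical aggregation behind such a scheme can be sketched with toy parameter vectors: cluster heads average their members' local models, and the edge server averages the cluster heads into a global model. The cluster layout and parameter values below are hypothetical.

```python
def average(models):
    """Element-wise average of a list of model parameter vectors."""
    n = len(models)
    return [sum(m[i] for m in models) / n for i in range(len(models[0]))]

# Phase 1 (clustering): trusted IoT nodes grouped under cluster heads (toy data).
clusters = {
    "head_a": [[1.0, 1.5], [1.0, 0.5]],   # local model parameters of cluster members
    "head_b": [[2.0, 2.5], [2.0, 1.5]],
}

# Phase 2 (training): each cluster head aggregates its members; the edge
# server then aggregates the cluster heads into a global model, which is
# used in the detection phase to classify device behavior.
head_models = {h: average(ms) for h, ms in clusters.items()}
global_model = average(list(head_models.values()))
```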
State-of-the-Art on NWDAF
In this section, various works are presented that propose practical NWDAF implementations, towards resource optimization and security enhancement. For each presented work, key issues are highlighted along with potential limitations.
A. NWDAF-Aided Resource Optimization
In [72], a survey of 3GPP Release 18 on NWDAF management is presented. More specifically, the authors study i) the network resource model (NRM) enhancement for supporting multiple NWDAFs and their logical function decomposition, and ii) performance management of the NWDAF, in terms of data collection, output and efficiency. Moreover, the FS_MANWDAF (TR 28.864) [73] framework, introduced by China Telecom, is discussed, and two key issues are highlighted to further promote the development of NWDAF management, i.e., performance metrics for the NWDAF and its impact on overall energy efficiency. In this context, in [74] the authors have presented an NWDAF implementation, where data generation is performed according to 3GPP specifications and intelligent network management capabilities are highlighted.
A number of works, use open-source tools to develop their NWDAF implementation. In [75], an experimental setup of NWDAF has been implemented and presented with the help of the free5GC open-source framework [76]. In this context, ML-assisted operations can be supported as well with the help of Python. Training modules train the model through the data from other NFs or provide the analytic results to other NFs based on the trained model. In [77], a practical implementation of NWDAF was considered, using open-source software, including Open5GS [78] and UERANSIM [79]. In this context, an initial analysis of the collected data was presented, along with a potential usage of NWDAF to support MANO activities, such as core NF placement. Moreover, ML techniques are applied to the collected data, and in particular unsupervised learning to group core NF-NF interactions. According to the presented results, clustering approaches can be quite helpful in defining similarities between NF-NF interactions. As the authors point out, a key challenging approach is forecasting and proactive network management. In [19], an O-RAN implementation has been considered that incorporates AI/ML as well as network telemetry with NWDAF. In this context, a supervised learning algorithm has been considered for cell traffic prediction as well as a DRL algorithm for energy efficiency maximization.
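The unsupervised grouping of NF-NF interactions can be illustrated with a minimal k-means implementation over hypothetical interaction features (e.g., message rate and payload size); this is a generic sketch, not the exact pipeline of [77].

```python
import math

def kmeans(points, k, iters=20):
    """Minimal k-means: group NF-NF interaction feature vectors into k clusters."""
    dim = len(points[0])
    centroids = [list(p) for p in points[:k]]     # naive initialization
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:                                 # keep old centroid for empty clusters
                centroids[j] = [sum(q[i] for q in g) / len(g) for i in range(dim)]
    return centroids, groups
```

In practice, one would standardize the features and pick k with a model-selection criterion, but the grouping mechanism is the same.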
In [80], a novel framework is presented for the extraction and delivery of RAN data analytics information to radio resource management (RRM) algorithms. To this end, the RAN data analytics function (RANDAF) has been proposed, conceived as a data analytics capability that operates independently from the RAN nodes and is equipped with the necessary storage and computation resources to cope with the large spatial dimension (e.g., multi-node/multi-cell) and temporal dimension (e.g., long time series of collected data) of data analytics within the 5G-RAN. In the same context, potential use cases of network analytics are presented, such as prototype trajectories for enhanced handover, as well as UE radio resource utilization patterns for enhanced admission control. Then, in [55], an enhanced 5G architecture that allows for end-to-end (E2E) support of data analytics is presented. In particular, the requirements for data analytics to improve the operation in different domains are presented. To this end, predictive RRM is investigated, via a particular implementation for application and radio access network analytics, based on a novel database for collecting and analyzing radio measurements.
In [81], a cloud-based NWDAF approach has been implemented. A key novelty of the presented approach is the use of serverless computing, which decouples software functionalities from specific hardware equipment. For this purpose, Amazon Web Services has been used. As the authors also correctly point out, this approach can reduce startup times with respect to virtualized approaches and benefit from pay-as-you-go pricing models. In [82], the authors deal with effective slice management in 5G networks, in the context of IBN. Here, NWDAF provides information per slice that is later used for optimum slice deployment with the help of ML algorithms.
In [83], the authors deal with the use-case development interface (UDI), which defines the appropriate procedures among producers and consumers for their interactions and data exchange, as well as for tracking actions and evaluating results in the context of 5G networks. The work in [84] deals with NWDAF offloading to the cloud, targeting cost minimization and utility maximization. In this context, the mapping of gNBs to 5G-NWDAF instances has been formulated as an integer linear program (ILP).
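A toy version of such a gNB-to-NWDAF assignment can be solved by exhaustive search. The cost matrix and capacity values below are illustrative; a realistic ILP instance would be handed to a dedicated solver rather than enumerated.

```python
from itertools import product

def map_gnbs(cost, capacity):
    """Exhaustively solve a tiny gNB-to-NWDAF assignment problem: minimize the
    total cost, with at most capacity[j] gNBs mapped to NWDAF instance j."""
    n_gnbs, n_nwdaf = len(cost), len(cost[0])
    best, best_assignment = float("inf"), None
    for assignment in product(range(n_nwdaf), repeat=n_gnbs):
        # Skip assignments that exceed any NWDAF instance's capacity.
        if any(assignment.count(j) > capacity[j] for j in range(n_nwdaf)):
            continue
        total = sum(cost[i][assignment[i]] for i in range(n_gnbs))
        if total < best:
            best, best_assignment = total, assignment
    return best_assignment, best
```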
In [85], a Meta-NWDAF architectural approach has been presented. A key novelty of the presented approach is that signaling overhead can be reduced, since learning is based on only a few past samples. Another key finding of the presented work is that meta-learning can be applied not only to classification problems but also to time-series prediction problems.
In [86], an approach similar to [66] is considered, where synthetic datasets are produced with the help of a GAN algorithm. In this case, the NWDAF public dataset has been used as a baseline for generating training data. The goal was to produce a labelled dataset that can be used in various resource optimization approaches as well as for anomaly detection. The proposed approach tries to create similar data based on the frequency of appearance of certain key variables (e.g., the amount of data transmitted over a specific period, the type of connected devices, the load of the network, etc.). According to the presented results, high synthetic data scores were achieved (i.e., almost 98%). In the same context, the generated dataset was used for anomaly detection with a decision tree classifier, where high accuracy and precision were achieved.
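The frequency-driven idea can be approximated with a much simpler sketch that resamples each key variable from its empirical distribution. This captures only marginal frequencies (a GAN would also learn correlations between variables), and the field names are hypothetical.

```python
import random
from collections import Counter

def empirical_sampler(column, rng):
    """Build a sampler that draws values according to their frequency of
    appearance in the real data column."""
    counts = Counter(column)
    values, weights = zip(*counts.items())
    return lambda: rng.choices(values, weights=weights)[0]

def synthesize(records, n, seed=0):
    """Generate n synthetic records, with one independent per-variable sampler.
    Marginals match the real data; cross-variable correlations are ignored."""
    rng = random.Random(seed)
    keys = records[0].keys()
    samplers = {k: empirical_sampler([r[k] for r in records], rng) for k in keys}
    return [{k: samplers[k]() for k in keys} for _ in range(n)]
```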
Other works employ distributed ML paradigms for NWDAF, i.e., FL and transfer learning. In [87], FL is combined with NWDAF in a setup where NFs may use specific ML models to train on local datasets. Afterwards, the updated models are sent as events using data collection procedures, so that NFs that have subscribed to receive global updates of the model are informed.
Likewise, the work in [88] presents simulation results on FL-based NWDAF approaches. In this context, three different architectural approaches are presented, namely centralized ML, decentralized FL and centralized FL. In the second case, local computation nodes exchange their updated ML model parameters with their neighbor nodes in the distributed network. It is assumed that separate NWDAF instances are deployed per participating node, and model aggregation is then performed at the local level. As the authors correctly point out, although this approach can improve the system's latency, it may suffer from lack of convergence and robustness. In the centralized FL approach, one of the computing nodes serves as the primary FL server. This server aggregates the local model parameters received from the local NWDAFs to generate a new global model, which is then sent back to the participating nodes. In this manner, the server NWDAF is capable of enforcing the local NWDAFs to achieve a certain threshold of prediction accuracy in the long term. An interesting outcome of the presented study is that the centralized FL approach can outperform the corresponding decentralized approach both in terms of latency and prediction accuracy. In particular, accuracy may reach up to 98% for the four-node scenario.
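The centralized aggregation step can be sketched as classic federated averaging, where the primary FL server weights each local NWDAF model by the size of its local dataset. This is a generic FedAvg illustration, not the exact scheme evaluated in [88].

```python
def fed_avg(local_models, sample_counts):
    """Federated averaging: the primary FL server aggregates local model
    parameter vectors, weighting each by its local dataset size."""
    total = sum(sample_counts)
    dim = len(local_models[0])
    return [
        sum(w[i] * n for w, n in zip(local_models, sample_counts)) / total
        for i in range(dim)
    ]
```

Usage: with two local NWDAFs holding 1 and 3 samples respectively, `fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3])` weights the second model three times more heavily.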
In [89], a distributed NWDAF architectural approach has been presented, where leaf NWDAFs create local models and the root NWDAF constructs a global model by aggregating them. This structure can guarantee data privacy, since local models are created within the NFs, and can reduce network resource usage, because the global model is created by collecting local models rather than raw data. Moreover, since ML model training takes place in a decentralized manner via FL, the energy footprint of the involved devices is reduced. Each distributed installation of multiple NWDAFs is associated with specific regions, specific slices, and/or specific NFs.
In [90], the concept of transfer learning has been adopted to improve the performance of the NWDAF. Current NWDAF specifications can only determine whether a locally existing model can be directly used or needs further training. With the help of transfer learning, a particular NWDAF can request, from another NWDAF, a model that meets certain requirements and target goals. This is achieved with the help of the Analytics Data Repository Function (ADRF), which registers the model information to the NRF to make it accessible to other NWDAFs.
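The ADRF-mediated model exchange can be pictured as a registry lookup: NWDAFs register trained models with metadata, and a peer retrieves one that meets its target requirements. The interface below is a hypothetical illustration, not the standardized service operations.

```python
class ModelRegistry:
    """Toy stand-in for an ADRF-style registry of trained ML models."""

    def __init__(self):
        self._models = []

    def register(self, model_id, analytics_type, accuracy, model):
        """Register a trained model together with its descriptive metadata."""
        self._models.append(
            {"id": model_id, "type": analytics_type, "accuracy": accuracy, "model": model}
        )

    def request(self, analytics_type, min_accuracy):
        """Return the best registered model matching the requirements, or None."""
        candidates = [
            m for m in self._models
            if m["type"] == analytics_type and m["accuracy"] >= min_accuracy
        ]
        return max(candidates, key=lambda m: m["accuracy"], default=None)
```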
B. NWDAF-Aided Security and Privacy
The importance of employing NWDAF for anomaly detection purposes is studied in several works. In [91], data collection with the help of NWDAF has been used for anomaly detection. To this end, convolutional neural networks (CNN) have been chosen among various DL methods since they can identify unwanted and suspicious data patterns.
In [92], various research challenges associated with anomaly detection via NWDAF are discussed; in particular, the communications among different NWDAFs are not standardized. As the authors correctly point out, one solution could be the implementation of lightweight ML-based systems for fast, accurate, communication-efficient, and private traffic anomaly detection and warning in NWDAF. However, on the one hand, such an approach requires multiple training and update rounds that might not always be feasible on bandwidth-constrained devices, and on the other hand, a tampering attempt that alters the training data of a participating node in the FL scheme would also result in a malfunction of the primary model.
In [93], an attack surface analysis is conducted for NWDAF, based on the STRIDE framework. In this context, threat ratings are assigned to security threats faced by the NWDAF data collection process, and targeted threat protection measures are formulated to provide security technology guidelines for NWDAF planning and development.
In [94], ML has been used with NWDAF for network traffic prediction and anomaly detection. According to the simulation results, neural network algorithms outperform linear regression in network load prediction, whereas the tree-based gradient boosting algorithm outperforms logistic regression in anomaly detection. Additional open issues are identified as well, such as labeling various and diverse data from network sources.
Other works focus on the role of NWDAF in safeguarding the security and privacy of wireless networks. In [95], the authors propose a privacy-efficient scheme based on SIGMA-generated messages, to avoid leakage of sensitive information, such as weight updates or the identities of the involved nodes during each FL round. According to the presented results, communication overhead is significantly reduced when compared to other state-of-the-art works. In an effort to further enhance security and privacy in NWDAF-enabled 6G architectures, the work in [96] employs partial homomorphic encryption (HE) to secure ML model sharing with privacy preservation.
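A generic way to hide individual weight updates from the aggregator is pairwise additive masking, shown below as an illustrative sketch only (it is not the SIGMA-based scheme of [95] nor the HE scheme of [96]): each pair of nodes shares a random mask that one adds and the other subtracts, so individual updates are obscured while the masks cancel in the aggregate.

```python
import random

def masked_updates(updates, seed=42):
    """Apply pairwise zero-sum masks: node i adds and node j subtracts a
    shared random mask, hiding each update from the aggregation server."""
    rng = random.Random(seed)
    n, dim = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for d in range(dim):
                masked[i][d] += mask[d]
                masked[j][d] -= mask[d]
    return masked

def aggregate(masked):
    """Average the masked updates; the pairwise masks cancel in the sum."""
    n, dim = len(masked), len(masked[0])
    return [sum(m[d] for m in masked) / n for d in range(dim)]
```

In a real protocol the pairwise masks would be derived from key agreement rather than a shared seed, and dropout handling would be needed.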
All presented works are summarized in Table 4, where their key considerations, limitations, and open issues are identified. Moreover, in Table 5 related works have been categorized according to ML-assisted resource optimization and/or security and privacy enhancement (including anomaly detection).
Limitations and Open Issues
From the analysis of the previous paragraphs, as well as from Tables 4 and 5, various key outcomes, open issues and limitations can be identified to trigger further discussion on practical NWDAF implementations and their role in resource optimization and security and privacy enhancement as we move towards the 6G era. Our main findings are given below:
In the vast majority of related works, realistic datasets from 5G network operators are not available, and data from open-source simulation environments are used instead.
The main problem with the O-RAN specification is that it has not been defined following the principles of “security by design”. As a result, the current O-RAN specification presents several security and privacy risks. A major concern is that O-RAN does not specify any security procedure regarding untrusted cloud operators; the specification simply assumes that the cloud environments are trusted. To this end, the utilization of Trusted Execution Environments (TEEs) is an appropriate measure that can mitigate the risks of operating O-RAN in untrusted clouds.
In the vast majority of related works, only limited network configurations have been considered. It is essential to examine the presented approaches in large-scale deployments in order to identify performance limitations and scalability issues.
The design and implementation of NWDAF approaches should be based on open tools and architectures, to enable flexible and multi-vendor deployments as well as ease of implementing potential updates and incorporating 6G technologies.
In various works in the literature, NWDAF instances are deployed either on cloud or on edge infrastructures. Since most data are created at the edge, yet most applications require federation of clouds, it is necessary to make the many edges and clouds interoperate.
It is apparent that FL is a promising approach that can enhance security and privacy in broadband networks. As already mentioned, instead of transmitting entire training datasets to centralized cloud locations, only the parameters of the corresponding models are transmitted. Even so, there is still the possibility of attacks on one or more participating nodes during FL training, in which case tampered parameters will be propagated to the primary aggregated model. Here, a distributed NWDAF architectural approach might be more feasible.
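One common countermeasure against tampered local updates is robust aggregation. A minimal sketch (a generic technique, not tied to any of the surveyed works) is coordinate-wise median aggregation, which bounds the influence of any single poisoned model on the global update.

```python
from statistics import median

def median_aggregate(local_updates):
    """Coordinate-wise median of local model parameter vectors: a single
    tampered update cannot drag the aggregate arbitrarily far, unlike
    plain averaging."""
    dim = len(local_updates[0])
    return [median(u[i] for u in local_updates) for i in range(dim)]
```

For example, with three honest one-parameter models near 1.0 and one poisoned model at 1000.0, the median stays near 1.0 while the mean would jump to roughly 250.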
In the same context, since it is quite difficult to adopt a common labelling framework for all involved entities in a 6G infrastructure, decentralized FL training approaches enable independent local data labelling, collection, and ML model training.
From the analysis of the presented works, it becomes apparent that unsupervised learning approaches can be quite beneficial for anomaly detection in unlabeled data. In the same context, additional threats and attack vectors can be produced with the help of generative adversarial networks. This approach has been followed for example in the works in [66] and [86], where synthetic datasets have been generated. Hence, with respect to the first limitation that was mentioned regarding performance evaluation with non-realistic datasets, this approach can be used on one hand for accurate performance evaluation and on the other hand for threat landscape expansion.
In latency critical applications, the use of private 5G networks might be more appropriate for data collection and optimization [97], [98], [99]. In this case, various types of such networks can be supported, such as: i) fully autonomous ones, where an organization owns all the equipment, private clouds, and spectrum, ii) hybrid private-public cloud 5G networks, where a business may own or lease on-premises equipment and use a public or private cloud service to host parts of the network and iii) private 5G delivered via network slicing, which may include an on-site RAN and other equipment, depending on application needs.
Based on the above, in Fig. 7 a high-level architectural approach is presented for decentralized data collection and decentralized/distributed ML model training, where GAN, transfer-learning and meta-learning modules are used. In particular, various NWDAF instances are placed in discrete parts of the network. Data collection in each NWDAF instance is assumed to follow specific labelling and data format rules. Each NWDAF instance communicates with a GAN cloud server, where local datasets are expanded. Inter-NWDAF communication can be supported as well, via privacy-preserving methods such as LDP, where updated training parameters are exchanged. The primary FL server communicates with a transfer learning database, where all individually generated ML models are stored and retrieved on demand. In addition, local FL servers communicate with a meta-learning server, where key updates are propagated to all participating models. It should be noted that both the local GAN servers and the meta-learning server have been modelled as cloud servers, due to the required processing power. Finally, in all data transactions, encrypted transfers are supported.
A high-level architectural approach for data collection, ML model training and threat mitigation in future networks.
Moreover, it should be noted that the proposed approach is compliant with Release 18 of 3GPP [100], which suggests employing FL instead of sharing raw training data between NWDAF instances. With respect to Fig. 6, for example, where NWDAF is combined with O-RAN in a smart manufacturing environment, different NWDAF regions may correspond either to different production units of a company or to multiple companies that produce similar products. In the first case, private 5G networks can be deployed that ensure more robust connectivity and reduced latency compared to public networks. Therefore, on the one hand, privacy-critical information remains localized, and on the other hand, tampering attempts against the training parameters of the ML models can be mitigated, due to the encrypted communications among the involved nodes and the local FL servers.
Conclusion
The decentralized and heterogeneous nature of future broadband networks necessitates efficient and adaptive resource optimization mechanisms with enhanced security and privacy. When data collection is combined with advanced machine learning algorithms, full-scale optimization can be supported along with threat detection and mitigation. The NWDAF concept, introduced in Release 15 of 3GPP, supports data collection from various network functions, as well as ML-assisted resource optimization. NWDAF can be deployed alongside other key enabling technologies, such as O-RAN, and support large-scale data collection and optimization in a secure and trusted environment. In this survey, recent progress on practical NWDAF implementations has been presented, and various limitations and associated challenges have been identified. Since a unified data labelling and security provision framework might not be feasible in large-scale heterogeneous environments, due to the diverse nature of the incorporated devices and technologies, decentralized approaches can be employed considering different network sub-regions. To this end, fundamental operations such as data collection, machine learning model training and privacy enhancement can be carried out under a common data access and management framework in each region. Afterwards, federated learning can be employed to properly update the corresponding models. Another important issue that was highlighted is the absence of realistic network data for accurate performance evaluation of the considered implementations. Here, emerging approaches, such as the use of generative adversarial networks for realistic data generation, can enable accurate performance evaluation and the identification of additional potential threats and anomalies.
Based on the above, a high-level architectural approach has been presented, where NWDAF can be used for data collection in large-scale heterogeneous environments where emerging ML paradigms, such as transfer and meta-learning can contribute to the construction of robust ML models with reduced training times.