Transforming the 5G RAN With Innovation: The Confluence of Cloud Native and Intelligence

Intelligence and cloudification are widely recognized as key driving forces in the evolution of the 5G radio access network (RAN). This paper presents a promising architecture framework for the evolution of the 5G RAN, enabled by deep integration with cloudification and artificial intelligence/machine learning (AI/ML) technologies. To accommodate diversified scenarios and services and handle the complexity of the 5G network in a flexible and efficient manner, the architecture framework highlights three concepts: convergence of RAN and cloud, RAN empowered by hierarchical AI capabilities, and mutual awareness between RAN and services. The key design aspects and technologies that realize those concepts are discussed systematically. Two typical use cases, RAN slice resource allocation optimization and RAN-aware video service assurance, are demonstrated along with simulation or lab test results to validate the potential of the architecture framework.


I. INTRODUCTION
The 4G cellular network has revolutionized people's daily life. The 5G vision of reshaping society by empowering numerous businesses is yet to be realized. While the core network has been better prepared for it through its evolution toward network function virtualization (NFV), service-based architecture and network slicing, more fundamental challenges come from the radio access network (RAN) domain.
The first of the pain points is that 5G RAN deployment is still time-consuming and its maintenance is still complex, especially when compared with Wi-Fi, the other popular wireless solution. It may take months for a mobile network operator (MNO) to deploy a network, with a group of specialized engineers needed to ensure its normal operation. It is also noteworthy that, with the unprecedentedly complex network hierarchy and protocols, the assurance and optimization of network performance is ever challenging, involving multiple dimensions such as coverage, capacity and energy consumption. Though artificial intelligence/machine learning (AI/ML) and big data technologies have shown huge potential in extensive studies, their application in RAN is only just beginning. Last but not least, when 5G is integrated with verticals, e.g., industrial control systems, the dynamics of the traditional RAN and of the application services are hidden from each other, making the end-to-end system less adaptive in guaranteeing strict service level agreements (SLAs). In short, the following targets are to be achieved for the successful evolution of 5G RAN: agility in deployment and operation; intelligence for performance maximization; synergy between RAN and application services.

(The associate editor coordinating the review of this manuscript and approving it for publication was Derek Abbott.)
Toward that end, the telco industry is evolving to embrace the most cutting-edge technologies. Cloud technology from the IT industry provides on-demand computing services over the network, featuring faster time-to-market, innovation, and economies of scale. Similar attempts in telco may be dated back to the C-RAN initiative [1], which proposed to aggregate computing power at the network edge to build a cloud RAN. The ideas therein laid the foundation of today's O-RAN concept [2]. On this track, the most ambitious companies are now starting to deliver a private cloud-based 5G network within a few days, sell the service with a pay-as-you-go model, and offer easy and user-friendly maintenance. Another technology, AI/ML, is about extracting latent knowledge, usually from a large set of data, and applying it for analysis, prediction and decision making. Wireless big data (WBD) is the lifeblood of RAN intelligence [3]. It refers to the abundant data available at different locations and time granularities, inclusive of network conditions, UE status and service characteristics. RAN intelligence may greatly simplify network automation with support for self-configuration, self-monitoring, self-healing and self-optimization [4], [5]. More importantly, it also enables dynamic and seamless collaboration between RAN and application services.

This paper provides a systematic overview of RAN evolution toward cloudification and intelligence. A brief review of the efforts in the industry is given in Section II. In Section III, the evolved RAN architecture framework featuring cloudification and intelligence is introduced to address the identified challenges. The key design aspects for RAN cloudification and intelligence are detailed in Section IV.

(VOLUME 11, 2023. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/)
As a major contribution of this paper, service-aware and RAN-aware cross-layer optimization is proposed for the architectural framework to enable close coordination between the service applications and the RAN. A new interface from the RIC is introduced, and the potential service enhancement information exchange via the newly defined interface is analyzed. Section V demonstrates two typical use cases that are enabled by the architecture framework. The summary and open points are presented in Section VI.

II. JOINT ENDEAVORS OF THE INDUSTRY
This section offers an overview of the major collaborative efforts and trends in standards development organizations (SDOs) that drive the 5G RAN evolution.

A. O-RAN ALLIANCE
The O-RAN Alliance [2] is dedicated to transforming the RAN into an open, intelligent, virtualized and fully interoperable paradigm. Cloudification is regarded as one of the pillars of the O-RAN architecture. Facilitated by the standardization of the front-haul that decouples hardware radio units from software-based network functions (NFs), the O-Cloud is specified to host all the virtualized network functions (VNFs) in a unified manner. The orchestration of the VNFs is managed by the centralized Service Management and Orchestration (SMO).
For RAN intelligence, two new logical functions, the Near-Real-Time RAN Intelligent Controller (Near-RT RIC) and the Non-Real-Time RAN Intelligent Controller (Non-RT RIC), are defined to optimize RAN behavior in different control loops denoted by timing ("near-real-time" for the range from 10 milliseconds to 1 second, and "non-real-time" for above 1 second); both are targeted to accommodate AI/ML workflows and enable embedded RAN intelligence. Thanks to the service-based architecture with developer-friendly APIs, RAN-optimizing microservices from third parties such as open-source communities or verticals can be on-boarded to serve various use cases. These customized microservices are termed xApps and rApps in the Near-RT RIC and Non-RT RIC, respectively, and are typically empowered by AI/ML. Traffic steering, QoS/QoE optimization and RAN slicing assurance are among the first use cases supported by this innovative paradigm.

B. ETSI
ETSI NFV laid the foundation for the virtualization of NFs by standardizing the NFV reference architecture. It also specified the family of NFV Management and Orchestration (NFV-MANO) APIs that enable efficient management and orchestration of VNFs and network services within an interoperable ecosystem.
ETSI MEC (Multi-access Edge Computing) is another well-recognized concept that reforms the industry, providing cloud-computing capabilities and an IT service environment at the edge of the network. With MEC, IT economies of scale can be leveraged, allowing proximity, context, agility and speed to be used for wider innovation which can be translated into unique value and revenue generation. MEC uses a virtualization infrastructure for running applications at the edge of the network. The MEC specifications leverage the NFV infrastructure and the infrastructure management to the furthest extent possible. The MEC application orchestrator (MEAO) helps to orchestrate the MEC applications to realize the services provided. Recently, containerized solutions, e.g., containerized network functions (CNFs) are being studied.
The initiative of the ETSI Zero-touch network and Service Management (ZSM) Industry Specification Group (ISG) is to define the architectural framework for automated end-to-end network and service management. The vision is zero-touch autonomous network operation driven by high-level policies: intelligent systems will be able to self-manage and self-organize (configuration, healing, assurance, optimization, etc.) without human intervention beyond the initial intents. The modular, flexible, scalable, extensible and service-based ZSM framework supports model-driven service abstraction, open interfaces, and integration of AI/ML functionalities toward operational autonomy. A complete set of key means and technologies is also identified to ensure interoperable, unified and consistent solutions for closed-loop, cross-domain and cognitive operations.

C. 3GPP
As the provider of de-facto telco industry standards, 3GPP merged the pioneering work of ETSI NFV into 5G. The unified management and orchestration of 5G networks and network slices extends the conventional operation and management (OAM) to the provisioning and life cycle management (LCM) aspects, with a flexible and interoperable management tool set to handle the complexities of 5G deployments and service assurance. That tool set includes a basic set of model-driven management services, comprehensive data models and abundant performance metric definitions.
3GPP's efforts toward network intelligence may date back to the self-organizing network (SON) in 2008, which opened the door to the application of data analytics and AI/ML. The introduction of the network data analytics function (NWDAF) and the management data analytics function (MDAF) since Rel-15 signified the architectural evolution toward a data-driven mobile network. Recently, NWDAF has been extended to support both data analytics and model training [6]. A study item aimed at the transfer of AI/ML models in the 5G system has been launched for Rel-18 [7]. Following the track in the core and management domains, 3GPP RAN3 is leading the work on RAN data collection and utilization in Rel-17/18 [8].

III. CLOUD-NATIVE AND INTELLIGENT RAN ARCHITECTURE FRAMEWORK
In this section, a cloud-native and intelligent RAN architecture framework is introduced upon the 3GPP and O-RAN architectures [2], [9]. Leveraging the radio edge cloud platform and embedded intelligence, RIC-based service-aware and RAN-aware cross-layer optimization is further proposed by introducing interactions between the Near-RT RIC and applications (e.g., applications running on MEC or deployed in the central cloud). This greatly enhances RAN agility and helps to unleash the full potential of the RAN by customizing RAN capabilities for diversified services' needs.

Fig. 1 illustrates the functional view of the architecture, spanning from the RAN radio site to the central cloud. The architecture is built upon the key components defined in the O-RAN architecture, e.g., SMO, Non-RT RIC and Near-RT RIC [2]. The newly proposed interactions between the Near-RT RIC and the MEC and central cloud applications are highlighted in brown and further described in Section IV-F. While the remote radio units are built on dedicated hardware, the central unit (CU), distributed unit (DU), as well as the Near-RT RIC and MEC, are deployed on distributed radio cloud platforms with an Acceleration Abstraction Layer (AAL). The radio cloud platform is designed to guarantee RAN performance in terms of latency, throughput and reliability. For the same purpose, hardware accelerators are integrated with generic computing, network, storage and synchronization hardware for the complex digital signal processing tasks in the lower layers. Computing power optimized for intensive AI/ML tasks, such as model training, is also part of the heterogeneous infrastructure. On top of the RAN, the SMO is deployed for unified management and orchestration of the network functionalities and the radio cloud platform, with the support of the Non-RT RIC as the intelligent engine of the RAN management domain.
On the control plane of the RAN, based on the existing RAN architecture defined in 3GPP [9], the Near-RT RIC is introduced as the intelligent engine with embedded AI/ML capability for RAN optimization and automation [2]. The Near-RT RIC connects to the CU control plane (CU-CP) and the DU for fine-grained data collection and near-real-time control [10]. The collected data, termed radio network information (RNI), may consist of UE-level, QoS flow-level, slice-level and cell-level measurements, cell configurations, UE contexts, etc. The AI/ML capabilities in the Near-RT RIC leverage RNI to produce analytics, predictions and radio resource management (RRM) decisions that are optimized for individual application services. Depending on the deployed xApps, a Near-RT RIC can be customized for missions such as unmanned aerial vehicle (UAV) communications, path-based handover optimization, slice SLA assurance and so forth. On the user plane, MEC is deployed on demand at the edge to host localized application services in the form of MEC Apps. The steering of user plane traffic is performed by a local user plane function (UPF) or local breakout. The synergy of the Near-RT RIC and MEC is further enabled by the exchange of service enhancement information. This mechanism supports mutual awareness between the RAN and a particular application service in a near-real-time manner, enabling each side to optimize its behavior more adaptively.
Specifically, the presented framework highlights the following concepts:

A. RAN-CLOUD CONVERGENCE
Cloudification has been moving beyond the core and down to the far end of the radio network. The flexibility and scalability of cloud technology may finally enable the speedy and on-demand roll-out that the telco industry longs for, as well as lower cost for maintenance and upgrade. Meanwhile, the convergence means even more to the MNO, transforming it from a conventional communication service provider into a value-adding information service provider. The MNO will offer a complete solution ranging from cloud infrastructure to pervasive connectivity, and on to an ecosystem of specialized services tailored for the verticals.
The groundwork of such a convergence is the radio cloud platform. The platform is built on mature virtualization technologies (e.g., Kubernetes and OpenStack) and connected to the centralized and unified orchestrator, forming a unified resource pool. Various RAN NFs and application services can be deployed on the virtualized resources as an integrated service offered to end customers. The radio cloud platform not only offers flexibility and agility in deployment and upgrade, but also enables MNOs to expedite the development and go-live of customized services via well-validated continuous integration / continuous delivery (CI/CD) and DevOps tools.

B. RAN EMPOWERED BY HIERARCHICAL AI
AI/ML has demonstrated tremendous potential in a variety of aspects, from RRM to digital signal processing and even the RF domain. To adapt to its diversified usage scenarios in RAN, hierarchical and distributed intelligence is essential. The centralized AI engine located in the management and orchestration layer and the distributed AI engines located in the RAN coordinate in terms of both data and model delivery to facilitate different AI/ML tasks. For example, to alleviate the computing resource requirements of a RAN node, compute-intensive AI/ML model training can be done by the centralized AI/ML engine in the Non-RT RIC, leaving AI/ML inference and online learning to the distributed AI engines in the Near-RT RIC. Some of the most complicated tasks necessitate the concatenation of multiple models. To address that, both the Non-RT RIC and the Near-RT RIC support a microservice architecture to host different AI/ML functionalities, and an API framework that facilitates efficient interconnection and interaction among the microservices. This allows third-party developers to build complicated AI/ML-empowered solutions to optimize RAN performance, and also opens the door to integrating the RAN with value-added capabilities.

C. RAN-AS-A-SERVICE
The success of 5G, and of the telco industry beyond it, essentially depends on how well it serves highly diversified application services. As the data plane demands such as latency and privacy are well addressed by MEC, the synergy of RAN and application services on the control plane is raising ever more interest. Via the exchange of service enhancement information, the Near-RT RIC is able to serve as the anchor for mutual awareness between the RAN and the application service, which effectively transforms the RAN from a dumb pipeline into a versatile and responsive service itself.
From each application service, the Near-RT RIC may obtain its specific requirements and characteristics. Combining such information with RNI as input, the Near-RT RIC performs near-real-time RAN control to ensure quality of service (QoS) and quality of experience (QoE). On the other hand, the Near-RT RIC may expose RAN analytics to the application service in a near-real-time manner, allowing the application service to adjust itself to the RAN's instantaneous capability. For example, if poor radio conditions are predicted for a UE playing a cloud game in the next 100 milliseconds, and the Near-RT RIC exposes that to the cloud server, the server may quickly lower its video resolution to avoid the stalling that ruins user experience.
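The application-side half of this adaptation loop can be sketched in a few lines. This is a minimal illustration only: the bitrate ladder and the 20% safety headroom are assumptions, and the RIC exposure API itself is not modeled here.

```python
# Illustrative sketch (not a standardized API): an application-side rule
# that adapts video bitrate to the predicted UE throughput exposed by a
# Near-RT RIC. Ladder values and the 20% headroom margin are assumptions.

BITRATE_LADDER_KBPS = [1_000, 2_500, 5_000, 8_000, 15_000]  # e.g. 480p..4K

def select_bitrate(predicted_throughput_kbps: float, headroom: float = 0.2) -> int:
    """Pick the highest ladder step that fits under the predicted
    throughput minus a safety headroom, to avoid stalling."""
    budget = predicted_throughput_kbps * (1.0 - headroom)
    feasible = [b for b in BITRATE_LADDER_KBPS if b <= budget]
    return feasible[-1] if feasible else BITRATE_LADDER_KBPS[0]

# If the RIC predicts 6 Mbit/s for the next window, the budget is
# 4.8 Mbit/s, so the server steps down to the 2.5 Mbit/s ladder step.
print(select_bitrate(6_000))
```

The same rule runs each time a new prediction arrives, so the application tracks the RAN's instantaneous capability instead of reacting only after buffering occurs.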
This RAN-as-a-service (RANaaS) paradigm with the Near-RT RIC is generic in that it does not change the CU/DU design for specific application services. Only the lightweight xApp software in the Near-RT RIC has to be tailored to the specific service, while the CU/DU simply execute the common control signaling from the Near-RT RIC. This advantage makes the paradigm especially attractive for telcos and vertical owners, fostering an ecosystem of smart applications that combines their expertise and eventually leads to a win-win situation.

IV. KEY DESIGN CONSIDERATIONS
To support the cloud-native and intelligent RAN evolution, six key technical aspects are identified. Among them, radio cloud platform performance optimization, hardware accelerators with the AAL, and orchestration are relevant to cloud-native RAN. The other three, i.e., customized RNI collection and control coordination, AI/ML workflow support, and service enhancement information exposure, are for empowering RAN intelligence.

A. RADIO CLOUD PLATFORM
A detailed reference architecture for the radio cloud platform is shown in Fig. 2. To support carrier-grade and real-time resource provisioning for RAN applications, the radio cloud platform is tailored with the following key capabilities:
• Support for a real-time operating system and preemptive scheduling. The real-time operating system together with preemptive scheduling ensures deterministic response times for the time-sensitive events, interrupts and other tasks of RAN applications. Fig. 3 shows the test results on the real-time performance of the radio cloud platform prototype. The test measures the interval from the moment a task is created to the moment it is scheduled, and is performed on 4 threads, each run 8.64 × 10^7 times. It can be observed that the latency is kept below a maximum of 15 microseconds, rendering the prototype capable of real-time tasks in RAN.
• Support for CPU/memory affinity and isolation. High-performance RAN applications require a large number of CPU cycles and a large amount of memory to work properly. The cloud platform must provide mechanisms to guarantee these resources.
• Support for network acceleration technology. Virtualized RAN functions use Ethernet network interface cards (NICs) to communicate with each other, so boosting network throughput is key for the cloud platform.
Currently, PCI pass-through, SR-IOV, DPDK and other network offloading technologies can be used in the cloud platform to achieve this goal.
In addition to hardware optimizations, telecom platform-as-a-service (PaaS) utilities are maturing. For example, Prometheus, Elastic and Istio can be used in the radio cloud platform to provide monitoring, logging and tracing functionality.
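The scheduling-latency metric used in the Fig. 3 test can be illustrated with a short sketch. The code below is a plain-Python analogue that measures thread creation-to-first-run intervals; it is not the prototype's real-time OS test, and on a general-purpose interpreter the absolute numbers will be far above the 15 microseconds reported for the radio cloud platform.

```python
# Rough analogue of the latency metric in Fig. 3: the interval from the
# moment a task is created to the moment it first runs. A general-purpose
# interpreter shows far larger and noisier values than the <15 us of the
# real-time radio cloud prototype; this only illustrates the method.
import threading
import time

def creation_to_schedule_latency_ns(samples: int = 1000) -> int:
    worst = 0
    for _ in range(samples):
        started = []
        t0 = time.monotonic_ns()
        t = threading.Thread(target=lambda: started.append(time.monotonic_ns()))
        t.start()          # "task created": thread handed to the scheduler
        t.join()           # wait until it has actually run
        worst = max(worst, started[0] - t0)
    return worst

print(creation_to_schedule_latency_ns(100), "ns worst case over 100 samples")
```

The prototype's test applies the same idea with a real-time scheduler, many more iterations per thread, and hardware timestamping, which is what makes the sub-15-microsecond bound meaningful.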

B. HARDWARE ACCELERATOR AND AAL
Hardware accelerators are introduced in RAN cloudification to offload the most computation-intensive tasks in the radio protocol stack. The design of the accelerator should meet the overall architecture requirements of the RAN and balance system performance, power consumption and cost. It specifically involves the selection of the acceleration offloading function and the acceleration model (e.g., inline model and lookaside model), and the definition of the AAL.
To decouple the NFs from the hardware accelerators, the AAL is needed as a middleware, where an AAL profile specifies the workloads that a hardware accelerator processes on behalf of a cloudified NF. For example, the AAL profile for PDSCH may be described in terms of support for interrupts, CRC types, concatenation of non-byte-aligned code blocks and so forth; once in operation, the unprocessed transport blocks, code blocks or symbols, with proper configurations, are fed to the hardware accelerator to generate the required outputs efficiently. The AAL interface consists of an API and associated information models between an application and the AAL implementation within a platform instance. The API includes the common API for AAL discovery, initialization, real-time performance telemetry and status monitoring, and the profile-specific API that enables applications to efficiently offload AAL profile workloads to the AAL implementation in a consistent way, without knowing every detail of the underlying hardware accelerators.
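The split between the common API (discovery, status) and the profile-specific API (workload offload) can be sketched schematically. The class and method names below are illustrative assumptions for this paper's discussion, not the O-RAN AAL specification:

```python
# Schematic sketch of an AAL-style middleware boundary (names are
# illustrative, not the O-RAN AAL specification): a common API for
# discovery plus a profile-specific offload call, so the NF never
# touches accelerator details directly.
from dataclasses import dataclass

@dataclass
class AALProfile:
    name: str                 # e.g. "PDSCH"
    crc_types: tuple          # capabilities advertised by the accelerator
    supports_interrupts: bool

class AALDevice:
    def __init__(self, profiles):
        self._profiles = {p.name: p for p in profiles}

    # --- common API: discovery and status ---
    def discover_profiles(self):
        return sorted(self._profiles)

    # --- profile-specific API: offload a workload ---
    def offload(self, profile_name: str, transport_blocks: list) -> list:
        if profile_name not in self._profiles:
            raise ValueError(f"unsupported AAL profile: {profile_name}")
        # A real accelerator would encode/decode here; we just tag blocks.
        return [f"processed:{tb}" for tb in transport_blocks]

dev = AALDevice([AALProfile("PDSCH", ("CRC24A", "CRC24B"), True)])
print(dev.discover_profiles())          # ['PDSCH']
print(dev.offload("PDSCH", ["tb0"]))    # ['processed:tb0']
```

The point of the boundary is portability: swapping the accelerator changes only the `AALDevice` implementation, while the cloudified NF keeps calling the same two-part API.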

C. ORCHESTRATION
Orchestration enables the deployment of virtualized RAN functions via the management of the RAN cloud infrastructure and the LCM of VNFs/CNFs. The management of the RAN cloud infrastructure, i.e., the radio cloud platform, covers cloud infrastructure discovery, scaling in/out, configuration management, performance management and fault management, as well as hardware accelerator management. The LCM of VNFs/CNFs covers both design time and run time. The design-time LCM focuses on the descriptor of network services, which specifies not only the fundamental computation, memory, network and storage requirements, but also specific requirements on location (which impacts delay), hardware accelerators, and affinity or dependency on other network services. For the run-time LCM, aspects like instantiation, scaling, healing and migration should be considered.
As shown in Fig. 2, infrastructure management services (IMS) and deployment management services (DMS) provided by the radio cloud platform enable the orchestration functionalities, e.g., federated O-Cloud orchestration and management (FOCOM) and the network function orchestrator (NFO), that reside in the orchestration and management layer [11]. The coordination of network management and radio cloud orchestration is crucial for the efficiency of the cloud infrastructure and the RAN. AI/ML can be further introduced to achieve network and resource automation. One good example is that a containerized gNB can be automatically scaled in or out based on accurately predicted network traffic load. Energy saving can also be achieved by leveraging joint analytics/predictions of the cloud resources and the network traffic. Optimized cloud energy saving actions, e.g., CPU core switching between the idle state (also known as C-state) and performance state (also known as P-state), accelerator management, CNF scale-up/down and scale-in/out, CNF termination/re-initiation, server shutdown, etc., can be determined and enforced by the IMS and DMS.
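The predictive scaling example can be sketched as below. The per-replica capacity figure and the Erlang-style load model are assumptions for illustration only, not part of any O-RAN or 3GPP interface; in a real deployment the predicted load would come from an AI/ML analytics function and the decision would be enforced via the DMS.

```python
# Minimal sketch (thresholds and names are assumptions): size the number
# of containerized gNB instances from a *predicted* traffic load, so the
# orchestrator scales before congestion instead of reacting after it.
import math

def plan_gnb_replicas(predicted_load_erlang: float,
                      capacity_per_replica: float = 50.0,
                      min_replicas: int = 1) -> int:
    """Replicas needed so the predicted load fits the pooled capacity."""
    return max(min_replicas, math.ceil(predicted_load_erlang / capacity_per_replica))

# Busy-hour prediction of 180 Erlang -> 4 replicas; a low predicted
# night-time load lets the DMS scale in to 1 and idle the freed cores.
print(plan_gnb_replicas(180.0))
print(plan_gnb_replicas(12.0))
```

The same predicted-load signal can drive the energy actions listed above (C-state/P-state switching, server shutdown) once the replica plan frees whole cores or hosts.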

D. RNI COLLECTION AND CONTROL COORDINATION
The key to RAN intelligence is the collection of RNI and data-driven fine-grained RAN control. When the Near-RT RIC hosts multiple xApps for diverse services, two issues arise.
One issue is that the volume of RNI can be so vast that it might overwhelm the network connection between the Near-RT RIC and the CU/DU. In light of this, RNI subscriptions must be fine-grained so that each xApp collects exactly the UE/slice/cell RNI it needs, at the desired time granularity. In addition, a subscription-sharing mechanism is needed to ensure that no duplicate RNI is transferred to the Near-RT RIC when the same RNI is requested by different xApps. Meanwhile, it is possible that control commands from different xApps conflict with each other, which may neutralize the expected gains or even degrade overall system performance. Therefore, it is vital that the Near-RT RIC coordinates the commands by understanding their intents and potential effects.
These two issues are taken into consideration in the design of the Near-RT RIC. A reference architecture for the Near-RT RIC is shown in Fig. 4. The subscription management functionality inside the Near-RT RIC is defined to enable efficient RNI collection [10]. The database is introduced to convey and store RNI. Conflict mitigation functionality is also designed to tackle potential conflicts between the control commands from multiple xApps. An xApp may even obtain guidance from the conflict mitigation functionality prior to initiating a control action, which can be an indication of a potential conflict with other xApps, or a recommendation on how to modify a tentative control command to avoid the conflict. All these functionalities can be accessed by xApps through a set of APIs provided by the Near-RT RIC.
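The subscription-sharing idea can be sketched as follows. The key structure and method names are illustrative assumptions, not the E2 or Near-RT RIC API: one upstream subscription is kept per (scope, metric, period) key, and the data is fanned out to every xApp that asked for it.

```python
# Sketch of subscription sharing in the Near-RT RIC: keep a single
# upstream subscription toward CU/DU per (scope, metric, period) key,
# however many xApps request it. The key structure is an assumption.
from collections import defaultdict

class SubscriptionManager:
    def __init__(self):
        self._subs = defaultdict(set)   # (scope, metric, period_ms) -> xApp ids

    def subscribe(self, xapp: str, scope: str, metric: str, period_ms: int) -> bool:
        """Return True only when a NEW upstream subscription toward
        CU/DU is actually required; otherwise the existing one is shared."""
        key = (scope, metric, period_ms)
        new_upstream = not self._subs[key]
        self._subs[key].add(xapp)
        return new_upstream

    def upstream_count(self) -> int:
        return sum(1 for members in self._subs.values() if members)

mgr = SubscriptionManager()
print(mgr.subscribe("xapp-ts", "cell-7", "prb_util", 100))   # True: new E2 sub
print(mgr.subscribe("xapp-qoe", "cell-7", "prb_util", 100))  # False: shared
print(mgr.upstream_count())                                  # 1
```

A real implementation would also reference-count unsubscriptions and merge overlapping periods, but the deduplication principle is the same.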
A security function is also introduced as a crucial component of the Near-RT RIC. Its target is to prevent flawed or malicious xApps from abusing radio network information (e.g., leaking it to unauthorized external systems) and/or control capabilities over RAN functions. Well-recognized solutions for authentication and authorization have been introduced.

E. AI/ML WORKFLOW SUPPORT
A general AI/ML workflow involves model design and composition, raw data collection, pipelining, training and re-training, evaluation, packaging and cataloging, deployment, inference, update, etc. To support xApp developers through this complicated AI/ML life cycle, an AI/ML workflow support framework is desired. The framework should take into account different levels of demand, from dedicated computing power (e.g., GPUs), run-time environments and universal AI/ML SDKs and APIs, to the establishment of a common base AI/ML model repository. One challenge in provisioning such support is the distributed nature of the whole process: a model may be initially trained by different vendors, then evaluated and specialized by the MNO with its own configurations, then deployed in the Near-RT RIC or CU/DU for near-real-time or real-time inference, respectively, and then re-trained with local data for further enhancement. To that end, the Near-RT RIC and Non-RT RIC have both identified AI/ML support as a key functionality.
Advanced AI/ML technologies, e.g., federated learning, should also be supported. This is achieved by the hierarchical AI/ML support, where the centralized AI/ML engine located in the Non-RT RIC may serve global training, and the distributed AI/ML engines in the Near-RT RICs are responsible for local training. To improve federated learning efficiency and performance, proper selection of the Near-RT RICs that participate in the federated learning task is crucial. Computing capabilities, data availability, and processing and model-exchange latency should be considered to select the most appropriate Near-RT RICs as the local training nodes.
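The participant-selection criteria above can be sketched as a simple scoring rule. The weights, latency threshold and candidate attributes are illustrative assumptions; in practice the selection policy would itself be tuned or learned by the Non-RT RIC.

```python
# Sketch of participant selection for a federated learning task across
# Near-RT RICs: compute, local data volume and model-exchange latency
# all enter the choice of local training nodes. Scoring is an assumption.
def select_participants(candidates, k=2, max_latency_ms=50.0):
    """candidates: dicts with 'name', 'tflops', 'samples', 'latency_ms'."""
    eligible = [c for c in candidates if c["latency_ms"] <= max_latency_ms]
    # Favor nodes with more data and compute, penalize exchange latency.
    score = lambda c: c["samples"] * c["tflops"] / (1.0 + c["latency_ms"])
    return [c["name"] for c in sorted(eligible, key=score, reverse=True)[:k]]

rics = [
    {"name": "ric-east",  "tflops": 4.0, "samples": 90_000,  "latency_ms": 12.0},
    {"name": "ric-west",  "tflops": 2.0, "samples": 40_000,  "latency_ms": 8.0},
    {"name": "ric-rural", "tflops": 1.0, "samples": 120_000, "latency_ms": 80.0},
]
# ric-rural holds the most data but exceeds the latency bound, so it is
# excluded from this round despite its sample count.
print(select_participants(rics))
```

Excluded nodes need not be wasted: they can still contribute in later rounds, or via slower asynchronous aggregation, once their latency or availability improves.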

F. SERVICE ENHANCEMENT INFORMATION EXCHANGE
As indicated in Section III, the service enhancement information associates closely the RAN and the application service by sharing RAN analytics/predictions and service requirements/characteristics. The Near-RT RIC architecture shown in Fig. 4 also facilitates such cross-layer QoS/QoE optimization via the interactions between Near-RT RIC and MEC/NEF.
For each specific type of service, an xApp may provide a corresponding service API that supports incoming and/or outgoing service enhancement information. The service API can be exposed externally via the API enablement function and the exposure function in Fig. 4. The API enablement function supports internal API registration and discovery, such that an xApp's API can be known by other xApps as well as by the newly added exposure function. The exposure function acts as the gateway that aggregates the service APIs from different xApps and exposes them externally, either directly to applications running on MEC or indirectly via the Network Exposure Function (NEF) or a local NEF. In the latter case, the current network capability exposure framework and security mechanisms used for the 5G core network can be leveraged for the RAN information exposure. Moreover, the NEF and local NEF also help to facilitate the correlation of a UE's external identifiers and RAN identifiers. It should be noted that a careful design of such service APIs is needed to guarantee low latency for the exchange of service enhancement information.

Table 1 lists examples of incoming service enhancement information for video, XR and machine vision inspection services, which will most likely benefit from such an exchange. Taking machine vision inspection as an example, a customized xApp could learn the period and data size of the service's burst traffic from the FPS and picture resolution/size information collected from the application server, and then optimize the pre-scheduling or configured grant configuration parameters for the uplink resource allocation, e.g., the period and number of resource blocks. This helps to reduce scheduling latency and improve radio resource utilization by better matching the service characteristics and RAN scheduling behavior. Examples of outgoing service enhancement information envisioned to be valuable for those applications are summarized in Table 2.
For instance, with the predicted UE throughput provided by the Near-RT RIC, an application is able to adjust its codec rate to avoid video stalling in poor radio conditions.
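The machine vision example above can be made concrete with a short sketch that derives configured-grant parameters from the learned service characteristics. The per-RB capacity figure and slots-per-period value are rough assumptions; real values depend on the MCS, numerology and allocation.

```python
# Sketch of the machine-vision optimization: derive uplink configured-
# grant parameters from the FPS and frame size an xApp learns over the
# service API. The per-RB capacity figure is an assumption.
import math

def configured_grant_params(fps: float, frame_size_bytes: int,
                            bytes_per_rb_per_slot: int = 100,
                            slots_per_period: int = 4):
    """Return (period_ms, rbs_per_slot) matching the periodic burst."""
    period_ms = 1000.0 / fps                       # one grant per frame
    rbs = math.ceil(frame_size_bytes / (bytes_per_rb_per_slot * slots_per_period))
    return period_ms, rbs

# A 25 FPS camera with 20 kB compressed frames maps to a 40 ms grant
# period and 50 RBs per slot under the assumed per-RB capacity.
print(configured_grant_params(25.0, 20_000))
```

Because the grant period and size now match the traffic burst exactly, the UE no longer waits for a scheduling request round-trip, which is the latency and utilization gain described above.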
To enable UE-specific service-aware and RAN-aware cross-layer optimization, one challenging issue is how to correlate the external UE identifier used by the service applications (e.g., the IP address or Generic Public Subscription Identifier) with the UE identifiers used in RAN (e.g., AMF UE NGAP ID). This can be achieved with the assistance of the 5G core network, since the 5G core is aware of the various UE identifiers. Fig. 5 illustrates the key procedures of the Near-RT RIC empowered service enhancement information exposure, where the Near-RT RIC provides RAN analytics information (as one type of enhancement information) to the MEC via the local NEF. In the context of the service-based architecture, the Near-RT RIC acts as the RAN enhancement information exposure service producer. The local NEF consumes the service using the UE ID known by RAN and further exposes the service to the MEC with the external UE ID. The key steps in Fig. 5 are as follows:
1) The MEC application requests one-time delivery of, or a subscription to, UE-level RAN analytics information (e.g., predicted throughput) from the local NEF, using the external UE ID (e.g., IP address).
2) The local NEF translates the external UE ID to the UE ID known by RAN (e.g., Globally Unique AMF Identifier (GUAMI) + AMF UE NGAP ID) by interacting with the session management function (SMF) or binding support function (BSF) and the access and mobility management function (AMF).
3) The local NEF requests the RAN analytics information from the Near-RT RIC using the UE ID known by RAN (GUAMI + AMF UE NGAP ID).
4) The Near-RT RIC generates the RAN analytics information based on the data it collects from the CU/DU. AI/ML can be leveraged to predict the RAN analytics information.
5) The RAN analytics information is delivered to the local NEF and the MEC application.

The cloud-native and intelligent RAN architecture is built upon the 3GPP and O-RAN architectures with integrated cloudification and intelligence components, which poses great challenges to security and stability. For example, malicious xApps may exploit UE identification, track UE location or change UE priority; attackers may exploit non-authorized Near-RT RIC APIs or other related interfaces to access network resources and services they are not entitled to use; and the radio cloud platform itself introduces security and stability risks. Therefore, security analysis and a threat model for the architecture framework should be carefully studied, and the relevant assets, stakeholders, vulnerabilities, threats, requirements, countermeasures and solutions need to be further identified to reduce risk exposure and mitigate harmful effects.
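The ID-translation steps can be mocked end-to-end in a few lines. All identifiers and table contents below are made up for illustration; the real flow runs over NEF service operations against the BSF/SMF/AMF rather than in-memory dictionaries.

```python
# Mock of the exposure steps above (all tables and IDs are made-up):
# the local NEF translates the external UE ID to the RAN-side
# GUAMI + AMF UE NGAP ID before querying the Near-RT RIC for analytics.

BSF_BINDINGS = {"10.0.0.42": ("guami-460-00-1", 7001)}        # ext ID -> RAN ID
RIC_ANALYTICS = {("guami-460-00-1", 7001): {"pred_tput_mbps": 18.5}}

def local_nef_query(external_ue_id: str) -> dict:
    ran_ue_id = BSF_BINDINGS.get(external_ue_id)   # ID translation step
    if ran_ue_id is None:
        raise KeyError(f"no binding for UE {external_ue_id}")
    return RIC_ANALYTICS[ran_ue_id]                # fetch and return analytics

print(local_nef_query("10.0.0.42"))  # {'pred_tput_mbps': 18.5}
```

Keeping the translation inside the local NEF means the MEC application never learns RAN-internal identifiers, which also limits the exposure surface discussed in the security paragraph above.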

V. TYPICAL USE CASES
In this section, two promising use cases empowered by the proposed architecture framework are presented, along with simulation and lab test results, to demonstrate the benefits and key capabilities of the architecture components.

A. RAN SLICE RESOURCE ALLOCATION OPTIMIZATION
To meet the diverse demands of emerging services, the 5G RAN supports network slicing, by which a single physical RAN infrastructure can form multiple logical networks (i.e., slices). Each slice is characterized by different performance, cost and latency requirements.
The major challenge for a 5G network with network slicing (NS) is to meet the different SLAs of different slices. To address this, a layered AI/ML solution is proposed to improve the SLA satisfaction ratio (SSR), modeled as the average joint satisfaction ratio in terms of delay and transmission rate. The SSR of the n-th slice in a time window t is defined as

$$\mathrm{SSR}_n(t) = \frac{1}{|U_n|} \sum_{u \in U_n} \frac{1}{X_n^u(t)} \sum_{x=1}^{X_n^u(t)} K_n^{u,x},$$

where the total number of slices is $N$, and $n = 1, \ldots, N$ is the slice index. In the n-th slice, the set of users is denoted as $U_n$, and $u$ is the user index. During the time window $t$, the total number of service packets for user $u$ in the n-th slice is $X_n^u(t)$, and $x = 1, \ldots, X_n^u(t)$ is the index of the service packets. $K_n^{u,x}$ indicates whether the x-th service packet during the time window for user $u$ in the n-th slice meets the SLA requirements in terms of the required delay $d_n^{\mathrm{req}}$ and the required transmission rate $R_n^{\mathrm{req}}$:

$$K_n^{u,x} = \begin{cases} 1, & d_n^{u,x} \le d_n^{\mathrm{req}} \text{ and } R_n^u \ge R_n^{\mathrm{req}}, \\ 0, & \text{otherwise}, \end{cases}$$

where $d_n^{u,x}$ is the delay of the x-th service packet during the time window $t$ for user $u$ in the n-th slice, and $R_n^u$ is the transmission rate during the time window $t$ for user $u$ in the n-th slice.
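The SSR metric above is straightforward to compute from per-packet records. The following sketch implements it directly from the definitions; the packet delays, rates, and thresholds in the example are illustrative values, not the paper's simulation data.

```python
def packet_satisfied(delay_ms, rate_mbps, d_req_ms, r_req_mbps):
    """Indicator K: 1 iff the packet meets both the delay and rate requirements."""
    return 1 if (delay_ms <= d_req_ms and rate_mbps >= r_req_mbps) else 0

def slice_ssr(users, d_req_ms, r_req_mbps):
    """SSR_n(t): average over users of the fraction of satisfied packets.

    `users` maps a user id to (list of per-packet delays in ms, user rate in Mbps)
    observed during the time window t.
    """
    per_user = []
    for delays, rate in users.values():
        k = [packet_satisfied(d, rate, d_req_ms, r_req_mbps) for d in delays]
        per_user.append(sum(k) / len(k))
    return sum(per_user) / len(per_user)

# Two users in an eMBB-like slice with requirement (40 Mbps, 26 ms):
users = {
    "u1": ([10.0, 20.0, 30.0], 45.0),  # one packet misses the delay bound
    "u2": ([5.0, 8.0], 50.0),          # all packets satisfied
}
ssr = slice_ssr(users, d_req_ms=26.0, r_req_mbps=40.0)  # (2/3 + 1)/2 = 5/6
```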
In this paper, a Deep Neural Network (DNN) [13] based slice resource pre-optimization model is first designed to learn the long-term experience $E = (s, a, r)$, where $r = (\mathrm{SSR}_1, \mathrm{SSR}_2, \ldots, \mathrm{SSR}_N)$ is counted every second, $s = (\mathrm{SINR}, \mathrm{Throughput})$ denotes the network status, and $a = (P_1, P_2, \ldots, P_N)$ is the configuration action, with $P_n$ the PRB percentage configured for the n-th NS in the current NS time window. The pre-optimization model narrows down the search space of the potential resource allocation actions, and then a multi-arm bandit (MAB) [14] based online learning model is used to fine-tune the solution and find the optimal resource allocation decisions (the percentage of PRBs in the current slice window). The layered approach helps to accelerate the convergence speed and optimize performance through the coordination of long-term experience and online learning. More specifically, the DNN model takes the users' SINR and network slice throughput as input, and outputs the PRB percentage allocation among the multiple network slices. To assure the performance, samples with an SSR above the threshold of 90% are selected for the initial training of the DNN model. After that, the MAB online learning with the upper confidence bound (UCB) policy is executed once per second to find the optimal solution within the derived slice resource allocation action space.
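The online stage of the layered scheme can be sketched as a UCB bandit over a small set of candidate PRB splits. In this sketch the DNN pre-optimization step is abstracted away as a fixed candidate action set, and the per-second SSR reward comes from a toy stochastic environment rather than a real RAN; the action values and reward levels are illustrative assumptions.

```python
import math
import random

random.seed(0)

# Candidate PRB-percentage splits among 3 slices, as if narrowed down by the DNN.
actions = [(0.5, 0.3, 0.2), (0.6, 0.25, 0.15), (0.4, 0.4, 0.2)]

def observe_ssr(action_index):
    """Toy environment: action 1 is best; the reward is a noisy SSR sample."""
    base = [0.85, 0.94, 0.80][action_index]
    return min(1.0, max(0.0, base + random.uniform(-0.02, 0.02)))

counts = [0] * len(actions)   # number of times each action was tried
values = [0.0] * len(actions) # running mean SSR per action

for t in range(1, 1001):  # one MAB decision per second, as in the paper
    if 0 in counts:
        a = counts.index(0)  # play every arm once first
    else:
        # UCB policy: empirical mean plus an exploration bonus.
        a = max(range(len(actions)),
                key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]))
    r = observe_ssr(a)
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]  # incremental mean update

best = actions[max(range(len(actions)), key=lambda i: values[i])]
```

Because the DNN has already pruned the action space to a handful of candidates, the bandit only needs to distinguish a few arms, which is what makes the per-second online fine-tuning tractable.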
The hierarchical AI/ML capabilities offered by the architecture help to facilitate the AI/ML model training and AI/ML inference in an efficient manner, as shown in Fig. 6. Note that the AI/ML model training is carried out by the Non-RT RIC, since the task is computationally intensive and the Non-RT RIC is usually deployed on the region or central cloud with abundant computation resources. With the trained model delivered from the Non-RT RIC to the Near-RT RIC and the real-time input of the SINR and traffic data collected from the gNB, the online learning resource allocation action space can be generated by the Near-RT RIC via AI/ML model inference and the variation operation (dropout-based model uncertainty calculation) on the inference results. In typical scenarios, the SMO/Non-RT RIC are deployed in the MNO's central cloud, and the Near-RT RIC is deployed at the edge radio cloud for low-latency communications with the CU and DU. A Near-RT RIC can connect to multiple CUs and DUs, and naturally the xApps (like xApp1 and xApp2 in Fig. 6) can optimize multiple cells depending on the network deployment.
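The "variation operation" can be illustrated with Monte Carlo dropout: running inference several times with dropout left on and keeping the distinct outputs as the candidate action space. In the sketch below the trained DNN is replaced by a tiny stand-in function, and its weights, dropout rate, and inputs are illustrative assumptions, not the paper's model.

```python
import random

random.seed(1)

def model_forward(sinr, throughput, dropout_p=0.3):
    """Stand-in for the trained DNN: outputs a PRB split for 3 slices.

    Dropout is emulated by randomly zeroing the hidden contributions,
    so repeated forward passes can yield slightly different allocations.
    """
    weights = ((0.02, 0.001), (0.03, 0.002), (0.01, 0.003))
    hidden = [sinr * a + throughput * b for a, b in weights]
    hidden = [h if random.random() > dropout_p else 0.0 for h in hidden]
    raw = [1.0 + h for h in hidden]
    total = sum(raw)
    return tuple(round(r / total, 2) for r in raw)  # normalize to fractions

# Run several stochastic passes; the spread of the outputs defines the
# candidate action space that the MAB then searches online.
samples = {model_forward(sinr=15.0, throughput=120.0) for _ in range(50)}
action_space = sorted(samples)
```

The appeal of this design is that a single trained model, shipped once from the Non-RT RIC, yields both a point prediction and a small, uncertainty-aware set of alternatives for the Near-RT RIC to explore.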
The SLA satisfaction ratio (SSR) per slice achieved by the proposed scheme is compared with two baseline schemes, i.e., Hard-slicing and the Deep Q-network (DQN) [12]. Hard-slicing is a network slicing scheme that allocates network resources to the different slices in a completely isolated manner, where each slice is allocated a fixed proportion of the total bandwidth. Fig. 7 and Fig. 8 show the performance of the layered intelligence approach in the simulation experiment. We consider a RAN with one base station serving multiple users. Detailed simulation parameters are summarized in Table 3. There are three kinds of classic network slices, including eMBB, mMTC, and URLLC, characterized by data rate and delay requirements of (40 Mbps, 26 ms), (8 Mbps, 16 ms) and (2 Mbps, 20 ms), respectively.
As shown in Fig. 7, the convergence speed of the proposed method clearly exceeds those of the DQN and Hard-slicing schemes: it converges to an SSR above 94% within 2000 slicing time windows. In contrast, the convergence of the DQN scheme is slow and can be significantly affected by the initial allocation of slice resources. After about 1800 iterations, the average SSR of the DQN scheme exceeds that of the Hard-slicing method, and only after about 2500 iterations does its average SSR exceed 90%. The SSR for the mMTC slice reaches 100% under all three schemes. The proposed DNN+MAB scheme performs better for the eMBB and URLLC slices. In particular, for the eMBB slice, the proposed DNN+MAB method outperforms the other two schemes in terms of SSR by 13% and 59%, respectively.

B. RAN-AWARE VIDEO SLA ASSURANCE
In this section, we further investigate the assurance of video SLAs against fluctuating radio conditions with the architectural framework. RAN-aware cross-layer application optimization is proposed based on the Near-RT RIC, where the Near-RT RIC predicts the UE's throughput in a near-real-time manner (on the order of 100 ms) with data collected from the 5G CU/DU and exposes it to the video server to assist the application optimization. The video application may leverage the predicted user throughput provided by the Near-RT RIC to proactively adjust its behavior. For example, the video application may adjust the codec rate or resolution based on the predictions to assure the SLA performance and avoid stalling. This scheme could be used for video surveillance and also remote control applications in 5G verticals.
We validate the scheme in a lab with a commercial 5G base station, where the 5G CU/DU and Near-RT RIC are deployed on the radio cloud platform. The basic cell configurations are listed in Table 4. There is an ''Uplink Throughput Prediction'' xApp running in the Near-RT RIC, as shown in Fig. 9. The xApp collects information on the gNB and the radio channel quality (such as RSRP, RSRQ, and SINR for each UE). It uses an AI time-series forecasting model to predict the available uplink throughput of the UE in the near future based on the collected information. According to the prediction results, the xApp sends the service enhancement information, i.e., the uplink throughput predictions, to the camera management server. The management server can then configure the video codec in the remote cameras based on this information. The remote cameras, equipped with 5G modules, record real-time video and stream it to the management server via the 5G uplink. The default codec settings in the cameras are applied for a predefined video quality. Upon reconfiguration from the management server, a camera changes its codec settings to adapt to the radio channel variations. When the radio channel deteriorates, the camera is reconfigured to use a lower coding rate. The reconfiguration can be executed in about 200 ms to 300 ms.
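The prediction-to-reconfiguration loop can be sketched as follows. This is a simplified stand-in: the xApp's AI time-series model is reduced to exponential smoothing, and the codec tiers and headroom factor are illustrative assumptions, not the lab configuration.

```python
# Hypothetical selectable video coding rates (Mbps), lowest to highest.
CODEC_TIERS_MBPS = [2.0, 4.0, 8.0]

def forecast(history, alpha=0.5):
    """One-step exponential smoothing over measured uplink throughput (Mbps)."""
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def pick_codec(predicted_mbps, headroom=0.8):
    """Highest tier that fits within a safety margin of the prediction;
    falls back to the lowest tier if even that does not fit."""
    fitting = [c for c in CODEC_TIERS_MBPS if c <= headroom * predicted_mbps]
    return max(fitting) if fitting else min(CODEC_TIERS_MBPS)

# Channel deteriorates: measured uplink throughput trends downward.
history = [12.0, 11.0, 7.0, 5.0]
pred = forecast(history)   # smoothed estimate of the upcoming throughput
codec = pick_codec(pred)   # the camera is reconfigured to a lower rate
```

Keeping a headroom margin below the predicted throughput is what lets the codec be lowered before the channel actually becomes the bottleneck, rather than reacting after frames are already delayed.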
In the test, we simulate automated guided vehicles (AGVs) with cameras for video monitoring in a smart factory. For that purpose, a wireless channel fader is attached to each camera to simulate the fluctuations of the radio channel condition independently. The experiment is executed once with ''AI off'' (i.e., the xApp is disabled) and another round with ''AI on'' (i.e., the xApp is enabled). The results are compared as follows. Fig. 10 shows the predicted uplink throughput versus the actual uplink throughput. The AI model inside the xApp reaches a 98% prediction accuracy for over 90% of the samples. Fig. 11 shows how the codec rate varies over time. With ''AI on'', the codec rate follows the changing uplink throughput prediction for each UE. Fig. 12 shows the frame interval performance of the video stream arriving at the server. From the figure we can see that, in the case of ''AI off'', the variation of the video frame interval matches the variation of the actual throughput in Fig. 10. A significantly larger video frame interval can be observed at the server when the uplink channel deteriorates. If the video frame interval exceeds 200 ms, the real-time player will have an insufficient video cache, which results in stalling in the display. With ''AI on'', the video frame interval stays stable at around 100 ms, as the codec is tuned to adapt to the radio channel conditions, so the video stream remains smooth.

VI. CONCLUSION AND FUTURE WORK
The evolution of 5G RAN is imminent, driven by demands from diversified application services and from MNOs. A cloud-native and intelligent architecture framework for this evolution is promoted in this paper, which can address these demands by streamlining the state-of-the-art progress on cloudification and AI/ML. In particular, the paper investigates the mechanism of service-aware and RAN-aware cross-layer optimization that enables close coordination between the RAN and service applications in this framework. The key design considerations of the architecture are also elaborated. Two example use cases demonstrate the great potential of the proposed architectural framework.
As the deep convergence of information technology, communication technology and data technology is widely recognized for RAN evolution, a number of open points are yet to be further explored. Those topics include: converged communication and computing capabilities at the RAN with high energy efficiency and controllable cost; design for heterogeneous hardware platforms; multi-cloud management and orchestration that enable coordination of the edge radio cloud and the central cloud; and management and orchestration of AI/ML functionalities. The security protection for the cloud-native and intelligent architecture is also essential.

NURIT SPRECHER has spent many years working as an expert system architect and technologist, defining the carrier-grade network and service architecture evolution and system design. She is currently the Head of Network and Service Automation Standardization at Nokia. She has contributed to many projects carried out in ETSI, O-RAN ALLIANCE, GSMA, IETF, ITU-T, IEEE, and BBF. She has participated in core discussions on the next generation network with tier-1 carriers and a number of governments. She has spoken at many conferences and is a contributing author to numerous publications. She is a Distinguished Member of the Nokia Technical Committee. She is an ETSI Fellow and a technology visionary and strategist with 30 years' global telecommunications industry experience. She initiated and drove the industry effort to set up the ETSI ISG Multi-Access Edge Computing (MEC) and successfully chaired the ISG during its first two-year term. She played an instrumental role in the creation of the ETSI ISG Zero-Touch Network and Service Management (ZSM) and serves as the Vice Chair for the group.