Flexible Functional Split and Fronthaul delay: A queuing-based model

We study the delay over virtual RAN (vRAN) topologies, entailing base stations that are divided into centralized and distributed units, as well as the packet-switched fronthaul network that connects them. We consider the use of flexible functional split, where the functions that are executed at each of these two entities can be dynamically shifted. We propose a queuing-based model, which is able to precisely mimic the behavior of such nodes, and we validate it by means of extensive simulations. We also exploit Jackson Networks theory to establish the end-to-end delay over the fronthaul network, allowing us to assess the impact of having different networking policies and conditions (for instance, background traffic or heterogeneous technologies). Thanks to the simulator we can also broaden the analysis, by studying the delay variability. In addition, we conduct an in-depth analysis of the performance exhibited by a realistic network setup, whose particular characteristics might hinder the services performance, due to the longer dwell times at each split configuration. The results evince the validity of the proposed model, even under realistic conditions. We show that it might not be enough to guarantee an average stable operation of the centralized/distributed units, but the traffic load should remain below the slowest service rate, to avoid reaching unacceptable delays. An increase of > 100× is observed in the delay, using the realistic network setup, when these conditions do not hold.


I. INTRODUCTION
It is well known that one of the most stringent requirements for forthcoming cellular systems (5G and Beyond 5G) comes from the so-called Ultra-Reliable Low Latency Communication (URLLC). End-to-end delay needs to be kept at very low values, to enable an appropriate provisioning of some of the envisaged services (augmented reality, autonomous driving, etc.). On the other hand, from an architectural perspective, Radio Access Networks (RAN) have also witnessed a strong shift: thanks to the SDN and NFV paradigms, operators are deploying in centralized controllers many of the functions that were traditionally co-located in the base station (BS). While this brings several advantages (CAPEX/OPEX reduction, cooperation between access elements, etc.), it also imposes new challenges, in particular regarding the delay [1].
In this regard, the Cloud-RAN (C-RAN) solution [3], [4] starts from a single centralized entity that controls a number of Remote Radio Heads (RRH). The former executes many of the functions of traditional BSs, and the RRHs implement the radio-frequency (RF) tasks. Although this approach might yield important capacity improvements in the RAN, by exploiting tight cooperation techniques (e.g. CoMP), it also imposes stringent requirements on the network connecting the different entities, both in terms of throughput and delay. In order to overcome the limitations coming from fully centralized solutions, functional-split architectures were proposed to permit the use of multiple centralization levels [5], which could be adapted to the particular network characteristics, traffic features and service requirements. The traditional BS is now divided into three entities, so that the RRH is called the radio unit (RU) and the BS protocol stack is divided between a distributed unit (DU), which is located close to the RU, and a central unit (CU). Hence, CUs can be grouped together in virtualization pools or clusters. Figure 1 depicts the functional splits defined by the 3GPP [2] for LTE, which set the limit between the functions implemented in the CU (at left) and DU (at right). While the splits are defined based on the LTE protocol stack, the specification of the 5G radio access network assumes the split of the base station (i.e. gNB) [6]. Indeed, the protocol stacks of LTE and 5G are almost identical in their high level descriptions, except for the new Service Data Adaptation Protocol (SDAP) included in 5G.

FIGURE 1: Possible functional splits between the CU and DU according to 3GPP [2]. The vertical dashed red lines establish the splits, so that the functions at the left are placed in the CU and the functions at the right are moved to the DU.
As can be seen, most of the splits are defined at protocol boundaries, while a few of them separate functionalities that belong to the same protocol. It is worth noting that the RF functionalities would be located at the RU, so that the PHY/RF split would boil down to a C-RAN solution.
The applicability of the different splits is still being analyzed by standardization and industrial bodies like 3GPP and O-RAN [7], where special attention is being paid to low layer splits [8]. According to those analyses, Table 1 summarizes the requirements, benefits and limitations identified for each split by the 3GPP [2].
Altogether, the new RAN is made of physical and virtual entities, leading to the so-called virtual RAN (vRAN) [9]. These architectural changes have also imposed new challenges on the fronthaul network that connects such entities, since it needs to provide quality of service (QoS) levels aligned with the configured split. Recent initiatives, such as the Next Generation Fronthaul Interface (NGFI) [10], envision a packet-switched fronthaul network divided into a segment connecting the RU and DU (NGFI-I), and another one, the so-called midhaul, which provides connectivity between the DU and CU (NGFI-II). Furthermore, recent studies consider the dynamic, or flexible, shifting of the selected functional split, to adapt the network to the varying environment [11] (e.g. new services, traffic changes). In this novel scenario, the network adaptation encompasses both the selection of the most appropriate functional split and the reconfiguration of the underlying fronthaul network to provide the QoS levels required by that split. As an example, Figure 2 shows the functional diagram of a base station with PDCP/RLC split. As can be observed, the CU hosts both the RRC and PDCP protocols and it is connected to the gateways of the operator core. On the other hand, the DU implements the protocols below PDCP and it is connected to the RU, which performs the RF tasks. Finally, NGFI-II connects the CU with the DU, and NGFI-I connects the DU with the RU.
In this paper we consider a flexible functional split scenario, and we broaden the model that was originally presented in [12], which allows us to accurately predict the delay traversing CU or DU nodes. The main contributions are:
• [...] fronthaul packet-switched network. To the best of our knowledge, this is the first work that exploits queuing theory to study the delay for vRAN architectures with Flexible Functional Split, embracing the fronthaul network.
• We exploit an event-driven simulator to assess the validity of the proposed model, which also allows us to study the variability of the observed behavior, and to carry out various complementary analyses.
• Finally, we also discuss the behavior exhibited by a real network configuration, where the particular characteristics might severely hinder the observed performance, due to rather long buffer lengths. We assess how the delay is distributed, and we discuss how this could also be used as a worst-case design parameter.
The proposed model can help to understand the expected behavior of different functional split policies, and to find reasonable limits on the maximum acceptable traffic load. In addition, the developed simulator could also help to derive appropriate queuing management policies. The rest of the paper is structured as follows. In Section II we provide an in-depth literature review of flexible functional split, and we highlight the main contributions of our paper. Then, Section III introduces the queuing model that is used to characterize the delay of CUs and DUs, and how this can be extended to study the overall fronthaul delay, exploiting Jackson Open Networks Theory. In Section IV we validate the model, comparing its performance with that obtained after a thorough simulation-based study, over a simple scenario with various configurations. In order to assess the validity of the model with more complex scenarios, we use a realistic network deployment in Section V, which has an optimal split policy, and we again compare the results yielded by the proposed model with the behavior obtained by means of simulation-based experiments. We conclude the paper in Section VI, where we summarize the work and identify our future lines of research.

II. RELATED WORK
In recent years the research community has studied functional split solutions from different angles. From a general perspective, some works analyze the capabilities of each split [13], and their performance [12]. These studies have been complemented with other works that mostly focus on the constraints imposed by the fronthaul network [14]. Also worth noting are papers that consider the x-haul (networking resources shared between backhaul and fronthaul) [15]. In addition, other lines of research have paid attention to the application of regular networking techniques in the RAN, which exploit functional splits. Within this group, an overview of scheduling techniques suitable for dynamic functional splits is presented in [16]. In addition, and also from an overall perspective, Lagén et al. provide in [17] an overview of architectures supporting functional splits, and the fronthaul compression proposed by 3GPP and O-RAN.
Apart from the literature devoted to assessing the capabilities and requirements of functional splits, other research efforts aim at implementing dynamic functional split solutions. In this regard, a functional split prototype is introduced in [18] to evaluate the impact of low-level split options over the energy consumption, while the authors of [19] depict a 5G RAN implementation that allows dynamic split shifting. Also from this implementation perspective, some works have paid attention to the reconfiguration of the underlying network to support the split requirements. For instance, in [20], [21] the authors analyze and assess solutions to reconfigure the optical fronthaul network, while Chang et al. evaluate in [22] the implementation of a framework to enable flexible functional split over Ethernet-based fronthaul. A few works focus on the resources in the access elements, like [23], where the authors propose a new architecture, called F-RAN, to enable split selection considering availability of radio resources. From a different perspective, the authors of [24] propose a technique to select the functional split, shifting baseband signal precoding, using compression-after-precoding (CAP) or data-sharing (DS) strategies, when it is implemented in the BBU or RRH.

VOLUME 4, 2016
As can be observed, the scope of the aforementioned works is essentially different from ours. The related literature analyzed so far aims to study the benefits and limitations of flexible functional split, and how it can be implemented. On the other hand, our goal is to propose a framework to analyze the system performance when a given split selection policy is applied, in particular focusing on the end-to-end delay. In this sense, some works have addressed the modeling, definition, development and assessment of such policies. For instance, in [25], [26] Harutyunyan et al. model the functional split selection as a Virtual Network Embedding (VNE) problem, which is formulated as an Integer Linear Program (ILP). Similarly, the authors of [27] propose an algorithm to allocate Customer Virtual Networks (CVN) considering functional split. Another proposal can be found in [28], where Rodriguez et al. suggest a split selection algorithm to ensure efficient utilization of the fronthaul network, while allowing cooperation among BSs. Along the same line, the authors of [29] propose adjusting the split selection to enable URLLC services, exploiting Coordinated Multi-Point techniques.
Furthermore, a number of split selection policies have been proposed to optimize different metrics, taking into account various scenario features. For instance, the data rate is maximized in [30], while end-to-end delay is considered in [31]. In contrast, the mobility of users and the interference level are taken into account in the split selection solutions in [32] and [33], respectively. Worth mentioning is the work of Temesgene et al. [34], where reinforcement learning is applied to select the split in a scenario with small cells. As we have mentioned before, the adaptation of the fronthaul network is a requirement to efficiently implement flexible split solutions. In this sense, Alameer and Sezgin proposed in [35] a solution for allocating functions that jointly tackles resource allocation and routing, using the Alternating Direction Method of Multipliers (ADMM). A solution for functional split selection to optimize energy consumption, in scenarios comprising small cells, is proposed in [36], where the optimization problem is posed as a constrained Markov Decision Process (MDP).
Among the works that propose split selection policies, a group of them pay special attention to the optical fronthaul. In this sense, a novel mobile fronthaul architecture to reduce latency in functional split enabled networks is proposed in [37], while the authors of [38] introduce different techniques with the goal of minimizing the delay. Similarly, the available capacity of the optical network is considered in [39], [40] as a constraint in the split selection procedure, while wavelength usage and transponder cost of optical networks is addressed in [41]. Other works focus on the orchestration and split reconfiguration for the optical fronthaul network [42], and on the development of simulation tools to study the impact of flexible functional split solutions over the underlying optical network [43], [44]. As can be observed, although these works develop split selection policies, they model and analyze the system performance according to particular indicators and features. Opposed to that, our paper does not propose a split policy, but it aims to model the system in a generic way, regardless of the specific split selection policy, to understand its behavior and fairly compare different strategies.
Another group of works seek to optimize energy consumption in flexible functional split scenarios. Under this category, the authors of [45] discuss the optimal centralization level, considering energy consumption and midhaul bandwidth. Temesgene et al. [46] suggest the use of Q-learning and SARSA algorithms to optimize the placement of functions in terms of energy. In a similar way, an online solution for flexible functional split selection considering energy is proposed in [47], where the problem is formulated as an MDP. The energy consumption, together with functional split, is also studied in [48], using a real implementation based on Open Air Interface (OAI). Energy is also considered in [49], although in this case the flexible functional split is applied to optimize the energy consumption, including baseband processing, in scenarios with Unmanned Aerial Vehicles (UAVs). Finally, it is also worth mentioning some works that consider the interplay of functional split with techniques used for service deployment and provisioning in cellular networks. In this regard, functional split is considered in [50] as part of network slicing to ensure certain QoS. Following a similar approach, Papa et al. study the combination of functional split and network slicing in [51]. From a more generic perspective, the authors of [52] propose a framework to handle heterogeneous RAN, functional split selection, and network slicing for multiple services. Finally, the authors of [53] consider split selection together with task offloading, analyzing the interplay of functional split and fog/cloud services. Once again, these works differ from ours in their scope, since they do not model the behavior of the fronthaul network with functional split.
All in all, after this thorough literature review, we can conclude that the modeling of vRAN has not been sufficiently addressed before. The theoretical model described and evaluated in this paper aims to shed light on the end-to-end performance of the vRAN, in terms of delay, and for any arbitrary split selection policy. In this sense, it is worth remarking that the proposed model can be configured to consider any split strategy, and so it would provide an arena to fairly compare them. To the best of our knowledge, this is the first paper proposing a theoretical model that yields the overall end-to-end delay in vRAN networks, embracing the use of flexible functional split, as well as the impact of the fronthaul network.

TABLE 2: Variables and symbols used in the proposed model
$\mu_j$ : Service rate of the $j$-th split
$\alpha_{j,k}$ : Probability of shifting from the $j$-th to the $k$-th split, with $\sum_{k=1}^{s} \alpha_{j,k} = 1$ and $\alpha_{j,j} = 0$
$\gamma_j$ : Change rate for the $j$-th split
$\xi_j$ : Inverse of the time at standby after leaving the $j$-th split
$\pi_i(t)$ : Probability of state $(i, t)$, i.e. there are $i$ frames in the node when: (1) $t$ odd, using split $j = \frac{t+1}{2}$, state $(i, j)$; (2) $t$ even, standby after split $j = \frac{t}{2}$, state $(i, \bar{j})$

III. FRONTHAUL QUEUING MODEL
In this section we discuss the proposed queuing-based model. It encompasses two types of nodes: (1) the one used to reflect the behavior of both CU and DU; and (2) the nodes that are used, in the packet-switched fronthaul network, to connect CUs and DUs. While the latter will be based on the legacy M/M/1, CUs and DUs, which might implement different splits, require a more complex approach. Hence, we first present the model for CU and DU entities. Afterwards, we exploit Open Jackson Network theory to establish the average end-to-end delay. Table 2 enumerates all the variables and symbols that are used in the proposed model, including those that are used to solve it.

A. CU AND DU MODEL
As has been mentioned earlier, the proposed model for the CU and DU is an extension of the one that was presented in [12]. We focus on downlink communication, although the same reasoning can be applied to uplink. We assume that frames arrive at the CU following a Poisson process of rate $\lambda$ pkt/ms. The CU/DU might be configured in $s$ different functional split configurations, each of them characterized by a certain service time, which we assume exponentially distributed, with mean $\mu_j^{-1}$ ms, for the $j$-th split. We assume that such nodes might change their current split, after a time that we also assume exponentially distributed, with mean $\gamma_j^{-1}$ ms for the $j$-th split. Before moving to the next configuration, a standby situation happens, which captures the time devoted to the reconfiguration tasks required in the CU/DU. The time that the node spends in this standby situation (where it does not process frames) is also modeled with an exponential random variable, with average $\xi_j^{-1}$ ms, for the $j$-th split. A change in the functional split is modeled as a random event, with probability $\alpha_{j,k}$ of going from the $j$-th split to the $k$-th one. We impose that $\alpha_{j,j}$ equals 0, ensuring that whenever there is a functional split change, the node does actually modify the configuration.
The main improvements from the model that was originally presented in [12] are:
• We can consider different sojourn times for the various functional splits.
• The time spent at the standby status is also different for each split.
• When changing a particular configuration, we ensure that the next functional split is different from the previous one.
We can thus capture a more realistic behavior, having a greater number of knobs to tune the node configuration. The CU/DU node can be modeled with the 3-dimensional Markov chain that is shown in Figure 3.
We define two types of states by means of the tuples $(i, j)$ and $(i, \bar{j})$, respectively. The first one denotes a state of normal operation, where $i$ corresponds to the current number of frames at the node, and $j$ is the index of the current functional split. The second tuple represents a standby state reached upon leaving the $j$-th split. The proposed Markov chain has $s$ horizontal planes, each of them representing a particular split.
If the CU/DU node is active and working at a particular split $j$, anytime a frame arrives there is a rightwards transition (rate $\lambda$), and when a frame finishes its processing and exits the node, we can see a leftwards transition (rate $\mu_j$). At any time the node might shift to a standby situation, modeled with a transition to the corresponding state $(i, \bar{j})$, in the same plane, with rate $\gamma_j$. Once in this standby situation, frames might keep arriving, reflected by rightwards transitions, but the node is not able to process them, and so there are no leftwards transitions, as can be seen in Figure 3.
When in the standby state, the node will eventually go to another split configuration. The corresponding sojourn time is modeled with an exponential random variable, and so the overall rate towards other splits is $\xi_j$. Once the standby status is over, the next functional split is selected with probability $\alpha_{j,k}$, with $k \in \{1, \ldots, s\}$, $k \neq j$, and so the transition rate between $(i, \bar{j})$ and $(i, k)$ is $\alpha_{j,k} \cdot \xi_j$. Although the CU/DU node cannot process frames during this standby situation, we assume that the node has enough buffer capacity to keep incoming frames until they can be eventually processed, provided it works in a stable regime of operation.
The underlying model boils down to a quasi-birth-death (QBD) process, where each level corresponds to all states having the same number of frames: $(i, j)$ and $(i, \bar{j})$, for $j \in \{1, \ldots, s\}$. Therefore, we use the Matrix Geometric method to find the average delay of processing a frame in this node. The reader can refer to the seminal works from Neuts [54] and Hajek [55] for a thorough treatment of this theoretical framework.

FIGURE 3: Markov chain for CU and DU nodes
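The infinitesimal generator matrix characterizing the QBD process is defined as follows:

$$Q = \begin{pmatrix} L_0 & F & 0 & 0 & \cdots \\ B & L & F & 0 & \cdots \\ 0 & B & L & F & \cdots \\ \vdots & & \ddots & \ddots & \ddots \end{pmatrix} \tag{1}$$

where $L_0, B, L, F \in \mathbb{R}^{2s \times 2s}$. Matrices $B$, $F$ are given in equation (2), while $L$ is given in (3). All of them follow from the transition rates described above, with the phases ordered as $1, \bar{1}, 2, \bar{2}, \ldots, s, \bar{s}$:

$$B = \mathrm{diag}(\mu_1, 0, \mu_2, 0, \ldots, \mu_s, 0), \qquad F = \lambda I_{2s} \tag{2}$$

$$L = \begin{pmatrix} D_1 & S_{1,2} & \cdots & S_{1,s} \\ S_{2,1} & D_2 & \cdots & S_{2,s} \\ \vdots & \vdots & \ddots & \vdots \\ S_{s,1} & S_{s,2} & \cdots & D_s \end{pmatrix}, \quad D_j = \begin{pmatrix} -(\lambda + \mu_j + \gamma_j) & \gamma_j \\ 0 & -(\lambda + \xi_j) \end{pmatrix}, \quad S_{j,k} = \begin{pmatrix} 0 & 0 \\ \alpha_{j,k}\,\xi_j & 0 \end{pmatrix} \tag{3}$$

On the other hand, $L_0 = L + B$, since no frame can leave an empty node. We denote the stationary distribution of the QBD process as $\Pi = [\pi_0, \pi_1, \pi_2, \ldots]$, where $\pi_i$ is a column vector of length $2s$, and $\pi_i(t)$, $t \in \{1, \ldots, 2s\}$, is the probability of having $i$ frames at the node when: (1) for odd $t$, the node is working at the $j$-th split, with $j = \frac{t+1}{2}$; (2) for even $t$, the node is at standby after the $j$-th split, with $j = \frac{t}{2}$.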
If the node is working at a stable operation regime, then a stationary solution for $\Pi$ exists and there is a constant matrix $R$ that fulfills the following relation [56, Theorem 3.1.1]:

$$F + R L + R^2 B = 0 \tag{4}$$

where $R \in \mathbb{R}^{2s \times 2s}$. Although there is not a closed-form solution for the quadratic equation (4), an iterative method can be used instead to find $R$. In addition, there exists a unique positive solution to the finite system of equations, from which we can obtain vector $\pi_0$:

$$\pi_0^T (L_0 + R B) = \mathbf{0}^T, \qquad \pi_0^T (I - R)^{-1} \mathbf{1} = 1 \tag{5}$$

where $\mathbf{0}$, $\mathbf{1}$ are all-zeros and all-ones column vectors of length $2s$, respectively. Then, the complete stationary distribution $\Pi = [\pi_0, \pi_1, \ldots]$ can be obtained as:

$$\pi_i^T = \pi_0^T R^i, \quad i \geq 0 \tag{6}$$

From the stationary probability distribution $\Pi$, we can straightforwardly obtain the average number of frames in the node, $N_{cu/du}$:

$$N_{cu/du} = \Big\| \sum_{i=0}^{\infty} i\, \pi_i \Big\|_1 = \big\| \pi_0^T R (I - R)^{-2} \big\|_1 \tag{7}$$

where $\| \cdot \|_1$ is the 1-norm. Finally, applying Little's Law, we can find the average delay per frame, $\tau_{cu/du}$, which encompasses both the waiting and processing times:

$$\tau_{cu/du} = \frac{N_{cu/du}}{\lambda} \tag{8}$$

As mentioned before, the stationary distribution is only guaranteed if the average service rate of the node is higher than the incoming data rate. We can thus establish the maximum packet rate $\lambda_{max}$ that ensures system stability:

$$\lambda < \lambda_{max} = \sum_{i=1}^{s} \theta_i\, \mu_i \tag{9}$$

where $\theta_i$ is the probability that the CU/DU node works at the $i$-th functional split. The value of each $\theta_i$ can be obtained by solving:

$$\Theta^T M = \mathbf{0}^T, \qquad \Theta^T \mathbf{1} = 1 \tag{10}$$

where $\Theta$ is a column vector of length $2s$, with the probability of working at a particular split (and at the corresponding standby configuration): $\Theta = [\theta_1, \bar{\theta}_1, \theta_2, \bar{\theta}_2, \ldots, \theta_s, \bar{\theta}_s]$; $M = L + B + F$; and $\mathbf{0}$ and $\mathbf{1}$ are all-zeros and all-ones column vectors of length $2s$, respectively.
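The matrix-geometric procedure above can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the blocks are assembled directly from the transition rates described in the text, the function names (`qbd_blocks`, `solve_R`, `normalized_left_null`, `node_delay`) are hypothetical, successive substitution is just one possible iterative method for $R$, and the standby rate `xi` and the uniform matrix `alpha` used in the example are assumptions (only the values of $\mu_j$ and $\gamma_j$ are taken from Section IV).

```python
import numpy as np


def qbd_blocks(lam, mu, gamma, xi, alpha):
    """Return B, L, F, L0. Phase 2*j: active in split j; phase 2*j+1: standby after split j (0-based)."""
    s = len(mu)
    B = np.zeros((2 * s, 2 * s))
    L = np.zeros((2 * s, 2 * s))
    F = lam * np.eye(2 * s)                    # arrivals occur in every phase
    for j in range(s):
        act, sby = 2 * j, 2 * j + 1
        B[act, act] = mu[j]                    # frames are served only while active
        L[act, act] = -(lam + mu[j] + gamma[j])
        L[act, sby] = gamma[j]                 # active -> standby
        L[sby, sby] = -(lam + xi[j])
        for k in range(s):
            if k != j:
                L[sby, 2 * k] = alpha[j][k] * xi[j]   # standby -> split k
    return B, L, F, L + B                      # empty node: no departures


def solve_R(B, L, F, tol=1e-12, max_iter=200_000):
    """Successive substitution for the minimal solution of F + R L + R^2 B = 0."""
    R = np.zeros_like(L)
    L_inv = np.linalg.inv(L)
    for _ in range(max_iter):
        R_next = -(F + R @ R @ B) @ L_inv
        if np.max(np.abs(R_next - R)) < tol:
            return R_next
        R = R_next
    raise RuntimeError("R did not converge; is the node stable?")


def normalized_left_null(A, w):
    """Vector x with x A = 0 and x w = 1, via least squares on the stacked system."""
    n = A.shape[0]
    lhs = np.vstack([A.T, w.reshape(1, n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    x, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return x


def node_delay(lam, mu, gamma, xi, alpha):
    """Average delay (ms), stability bound lambda_max and rate matrix R of a CU/DU node."""
    B, L, F, L0 = qbd_blocks(lam, mu, gamma, xi, alpha)
    n = L.shape[0]
    one = np.ones(n)
    theta = normalized_left_null(L + B + F, one)       # phase probabilities
    lam_max = float(theta[0::2] @ np.asarray(mu, dtype=float))
    if lam >= lam_max:
        raise ValueError("unstable: lam must be below lambda_max")
    R = solve_R(B, L, F)
    inv = np.linalg.inv(np.eye(n) - R)
    pi0 = normalized_left_null(L0 + R @ B, inv @ one)
    frames = float(pi0 @ R @ inv @ inv @ one)          # average number of frames
    return frames / lam, lam_max, R                    # Little's Law


# mu and gamma as quoted in Section IV for the CU; xi and alpha are assumed values.
mu = [1.0, 1.5, 2.0, 4.0]
gamma = [0.01, 0.02, 0.03, 0.04]
xi = [10.0] * 4
alpha = [[0 if j == k else 1 / 3 for k in range(4)] for j in range(4)]
tau, lam_max, _ = node_delay(0.8, mu, gamma, xi, alpha)
print(f"lambda_max = {lam_max:.3f} pkt/ms, average delay = {tau:.3f} ms")
```

A useful sanity check of such a sketch is the degenerate case in which all splits share the same service rate and the standby time is negligible: the node should then behave like an M/M/1 queue with delay $1/(\mu - \lambda)$.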

B. FRONTHAUL END-TO-END DELAY
Once we have established the delay in both CU and DU nodes, we are now interested in finding the end-to-end delay between a CU and its corresponding DU. As mentioned earlier, we assume they are connected through a packet-switched fronthaul network, comprising switches as well as links connecting them, which might be of different technologies. In the most generic case, we consider that both switches and links can be modeled as legacy M/M/1 queuing systems, and so we exploit Open Jackson Networks theory to find the end-to-end delay.
We model the network topology as a directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is the set containing all network nodes and $\mathcal{E}$ is the set of all links. If we assume that there are $c$ CUs, $d$ DUs, $n$ switches, and $l$ links, then we can define $V = |\mathcal{V}| = c + d + n + l$. Based on the network topology and the particular routing strategy we can establish the routing matrix, $R$, of size $V \times V$, which defines how frames travel from the CU to its corresponding DU. In Section IV we provide an illustrative example of such a matrix. Figure 4 shows a typical fronthaul connection, where $CU_x$ and $DU_x$ are connected through a single switch, $S_x$, and the corresponding two links. As can be seen, the QBD model that was introduced previously is used to capture the performance of both CU and DU nodes, while M/M/1 systems are used for both the switch and the corresponding two links.
In a packet-switched network, provided that the conditions established by Burke's and Jackson's Theorems [57]-[60] are met, we can establish the end-to-end delay as the sum of the individual contributions from each of the considered nodes within the path. These conditions impose that the output process is statistically identical to the one at the input, for a given node. As for the M/M/1 node, the delay can be calculated as:

$$\tau_{M/M/1} = \frac{1}{\mu - \lambda} \tag{11}$$

where $\mu$ and $\lambda$ are the service and incoming rates of that node, respectively. In order to guarantee a stable regime of operation, it is required that $\mu > \lambda$.
Since both the switches and the corresponding links can be shared by different traffic flows, the incoming data rate at each of them might be different. We assume that only CUs receive external traffic, and that the aforementioned routing matrix, $R$, captures how the flows traverse the fronthaul network. We define $\Lambda$ as a row vector, where each component $\lambda_v$ corresponds to the arrival rate at each of the $v \in \mathcal{V}$ nodes [59], [60]:

$$\Lambda = \Phi\, (I - R)^{-1} \tag{12}$$

where $\Phi$ is another row vector containing the external traffic in the network, i.e., $\phi_v = 0$ for all switches, links and DUs, and $\phi_v \neq 0$ for all CUs. Hence, based on the routing matrix $R$ and the incoming traffic at all CUs, we can obtain the incoming traffic rate at each node and so yield the corresponding delays.
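As an illustration of how the per-node arrival rates follow from the routing matrix, the sketch below solves the traffic balance $\Lambda = \Phi + \Lambda R$ for a small topology. The topology, the node names and the rates are hypothetical (two CUs sharing one switch, links omitted for brevity), and the routing probabilities at the switch are assumed to be proportional to the share of each flow.

```python
import numpy as np


def arrival_rates(phi, routing):
    """Solve Lambda = Phi + Lambda R for the row vector Lambda."""
    phi = np.asarray(phi, dtype=float)
    routing = np.asarray(routing, dtype=float)
    n = len(phi)
    # Lambda (I - R) = Phi  <=>  (I - R)^T Lambda^T = Phi^T
    return np.linalg.solve((np.eye(n) - routing).T, phi)


# Hypothetical node order: cu1, cu2, sw, du1, du2
nodes = ["cu1", "cu2", "sw", "du1", "du2"]
R = np.zeros((5, 5))
R[0, 2] = 1.0          # cu1 -> switch
R[1, 2] = 1.0          # cu2 -> switch
R[2, 3] = 0.4          # fraction of the switch output that belongs to flow 1 (0.8 / 2.0)
R[2, 4] = 0.6          # fraction that belongs to flow 2 (1.2 / 2.0)
phi = [0.8, 1.2, 0.0, 0.0, 0.0]   # external traffic enters only at the CUs (pkt/ms)

lam = arrival_rates(phi, R)
for name, rate in zip(nodes, lam):
    print(f"{name}: {rate:.2f} pkt/ms")
```

In this example the shared switch sees the aggregate of both flows, which is the rate that must be used when evaluating its M/M/1 delay.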
Once we have the delay of all nodes in the network, we can use the following expression to establish the end-to-end delay for any particular flow $f \in \mathcal{F}$, where $\mathcal{F}$ is the set of all flows:

$$\tau_f = \sum_{v \in P(f)} \tau_v \tag{13}$$

where $P(f)$ returns the nodes traversed by flow $f$, and $\tau_v$ is the delay at node $v$. We can also obtain the overall average delay (for all considered flows), by applying Little's Law to the whole network:

$$\tau = \frac{1}{\lambda_0} \sum_{v \in \mathcal{V}} n_v \tag{14}$$

where $\lambda_0$ is the overall external traffic in the network, $\lambda_0 = \sum_{v \in \mathcal{V}} \phi_v$, and $n_v$ is the average number of frames at node $v$. For CUs and DUs this value is provided by equation (7), and for switches and links (M/M/1) it can be obtained as:

$$n_v = \frac{\rho}{1 - \rho} \tag{15}$$

where $\rho$ is the corresponding node occupancy, which can be calculated as $\rho = \frac{\lambda}{\mu}$. As will be discussed later, the output process of CUs is not strictly Poisson, and this would actually hinder the possibility of applying the Open Jackson framework. We will show that, under mild conditions (i.e. short standby times), the results are still valid, and close to the real performance.
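The following sketch puts the previous expressions together for a single CU-DU connection such as the one in Figure 4 (CU, link, switch, link, DU). The link and switch service rates are the ones used later in Section IV; the CU and DU delays are placeholder numbers chosen for illustration (in practice they would come from the QBD model), and all function and node names are hypothetical.

```python
def mm1_delay(lam, mu):
    """Average sojourn time of an M/M/1 node (ms)."""
    if lam >= mu:
        raise ValueError("unstable M/M/1 node: lam must be below mu")
    return 1.0 / (mu - lam)


def mm1_frames(lam, mu):
    """Average number of frames in an M/M/1 node, rho / (1 - rho)."""
    rho = lam / mu
    if rho >= 1.0:
        raise ValueError("unstable M/M/1 node: rho must be below 1")
    return rho / (1.0 - rho)


def flow_delay(path, delay):
    """End-to-end delay of a flow: sum of the delays of the nodes it traverses."""
    return sum(delay[v] for v in path)


def network_delay(frames, external):
    """Overall average delay: Little's Law applied to the whole network."""
    return sum(frames.values()) / sum(external.values())


lam = 1.0                                   # pkt/ms, single flow
mu_of, mu_n = 8.0, 5.0                      # optical fiber links and switch (pkt/ms)
tau_cu, tau_du = 1.2, 2.5                   # placeholder CU/DU delays (ms), assumed

delay = {
    "cu": tau_cu,
    "link1": mm1_delay(lam, mu_of),
    "sw": mm1_delay(lam, mu_n),
    "link2": mm1_delay(lam, mu_of),
    "du": tau_du,
}
frames = {
    "cu": lam * tau_cu,                     # Little's Law at the CU
    "link1": mm1_frames(lam, mu_of),
    "sw": mm1_frames(lam, mu_n),
    "link2": mm1_frames(lam, mu_of),
    "du": lam * tau_du,
}
path = ["cu", "link1", "sw", "link2", "du"]
e2e = flow_delay(path, delay)
overall = network_delay(frames, {"cu": lam})
print(f"flow delay = {e2e:.4f} ms, network-wide average = {overall:.4f} ms")
```

With a single flow both figures coincide, which is a quick consistency check between the per-flow sum and the network-wide Little's Law expression.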

IV. MODEL VALIDATION AND DISCUSSION
In this section we validate the previously described model, comparing the theoretical results with those obtained from extensive simulation-based experiments. For that, we exploit an event-driven simulator, which was developed from scratch in C++. In a nutshell, it implements the two types of nodes used in the model: M/M/1 for links and switches, and the QBD for the CU and DU, and it considers four types of events. First, for all nodes, we implement two event types: (1) arrival of an external frame; and (2) end of frame processing. In addition, in the case of the QBD nodes (CU/DU), two additional event types are taken into account: (3) change of functional split; and (4) end of standby situation. Several flows can be configured, and a routing matrix is used to establish the node that needs to process a frame, when it first enters the system, or whenever it finishes its processing by any other node. Table 3 shows the configuration parameters that we use in all scenarios. We consider four functional splits ($s = 4$), with service rates $\mu_{1,2,3,4} = \{1, 1.5, 2, 4\}$ pkt/ms. These values are chosen for illustrative purposes, and they reflect the different processing delays featured by each functional split option. In addition, the average time at each of the functional splits is given by the corresponding rates: $\gamma_{1,2,3,4} = \{\frac{1}{100}, \frac{2}{100}, \frac{3}{100}, \frac{4}{100}\}$ ms$^{-1}$. As can be seen, these rates are flipped for the DU, since processing is divided between the two entities (i.e. when split 1 is used in the CU for a particular frame, the DU should use split 4, and so appropriately complete the processing). Furthermore, we assume that $\xi_j$ is constant for all possible configurations, $\xi_j = \xi \; \forall j$, and we modify the corresponding standby time to evaluate its impact. Matrix $A$ establishes the probabilities of selecting the next functional split, upon a change from a particular configuration.
In this sense, $\alpha_{j,k}$ corresponds to the probability of going to split $k$ from $j$, with $\alpha_{j,j} = 0$, and $\sum_{k=1}^{s} \alpha_{j,k} = 1$. As can be observed in Table 3, the corresponding matrix for the DU is the flipped version of the CU one, to reflect that a frame processed with a certain split in the CU requires a particular one in the DU.
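To make the four event types more tangible, the sketch below simulates a single CU/DU node as a continuous-time Markov chain with competing exponential events. It is a simplified Python illustration under stated assumptions, not the C++ simulator used in the paper: the function name `simulate_node` is hypothetical, the delay is estimated from the time-averaged number of frames through Little's Law rather than by tracking individual frames, and the standby rate and the uniform matrix `alpha` in the example are assumed values.

```python
import random


def simulate_node(lam, mu, gamma, xi, alpha, n_events=200_000, seed=1):
    """Estimate the average delay (ms) of a CU/DU node from a CTMC simulation."""
    rng = random.Random(seed)
    s = len(mu)
    frames, split, standby = 0, 0, False
    clock, area = 0.0, 0.0                      # elapsed time and integral of frames over time
    for _ in range(n_events):
        if standby:
            total = lam + xi[split]             # arrival or end of standby
        else:
            service = mu[split] if frames > 0 else 0.0
            total = lam + gamma[split] + service
        dt = rng.expovariate(total)
        area += frames * dt
        clock += dt
        u = rng.random() * total
        if u < lam:
            frames += 1                         # (1) arrival of an external frame
        elif standby:
            # (4) end of standby: pick the next split with probabilities alpha[split]
            split = rng.choices(range(s), weights=alpha[split])[0]
            standby = False
        elif u < lam + gamma[split]:
            standby = True                      # (3) change of functional split
        elif frames > 0:
            frames -= 1                         # (2) end of frame processing
    return (area / clock) / lam                 # Little's Law


# mu and gamma as in the text for the CU; xi and alpha are assumed values.
mu = [1.0, 1.5, 2.0, 4.0]
gamma = [0.01, 0.02, 0.03, 0.04]
xi = [10.0] * 4
alpha = [[0 if j == k else 1 / 3 for k in range(4)] for j in range(4)]
print(f"simulated average delay: {simulate_node(0.8, mu, gamma, xi, alpha):.3f} ms")
```

Repeating the run with different seeds gives an idea of the variability of the estimate, which is the kind of information that the analytical model alone does not provide.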
As was mentioned earlier, M/M/1 nodes are used to model the behavior of both links and switches on the fronthaul network. In particular, we will use various service rates to reflect different situations. As a starting point, the service rates of the switches will be $\mu_n = 5$ pkt/ms (we will reduce it to 3 pkt/ms in the last scenario), while for the fronthaul links we consider two underlying technologies: optical fiber, $\mu_{of} = 8$ pkt/ms, and millimeter waves, whose service rate, $\mu_{mmw}$, will be varied (1, 2, 4 pkt/ms) to assess its impact. Table 3 also depicts the routing matrix that corresponds to the scenario shown in Figure 7. The indexes for both rows and columns are: $cu_1, cu_2, cu_3, du_1, du_2, du_3, s_1, s_2, s_3, s_4$; and we assume that there are three different flows in the network: $\phi_i$, $i = 1, 2, 3$, from $cu_i$ to $du_i$. The corresponding routes are depicted in Figure 7. As we mentioned before, this section is intended to validate both the proposed model and the simulator implementation. To this end, we have selected configuration parameters that allow us to exemplify different situations, but which, while sensible, are synthetic. In the next section, the proposed model will be used to analyze a realistic setup.

TABLE 3: Configuration parameters (fronthaul network)
Switches service rates : $\mu_n$ = 5, 3 (pkt/ms)
Optical fiber service rate : $\mu_{of}$ = 8 (pkt/ms)
mmWave service rate : $\mu_{mmw}$ = 1, 2, 4 (pkt/ms)
Routing matrix : cf. Fig. 7

A. SINGLE NODE CU/DU
In the first set of experiments, we validate the model for a single CU/DU node. In Figure 5 we show the average sojourn time at the CU (upper figure) and DU (lower figure) when using the configuration depicted in Table 3, as we increase the incoming frame rate. We repeat the experiment for different values of the average standby time. The results that are obtained with the theoretical model are shown with solid lines, while the markers correspond to the values yielded by the simulator. In this case, 100 independent simulations, comprising the transmission of $10^6$ frames, were carried out for each configuration ($\lambda$ and $\xi^{-1}$ combination), to ensure statistically tight results. First, we can observe an almost perfect match between the two approaches, thus validating both the proposed model for the CU/DU nodes and the simulator. On the other hand, Figure 5 also shows the great impact of the standby duration, since the average sojourn time heavily increases when $\xi^{-1}$ gets higher. It is worth mentioning that, in real systems, it is quite likely that the time required at the standby configuration is much shorter than those characterizing the different functional splits, as was reported in [19]. We can see that the DU yields longer times than the CU, since the probability of working at the quicker split configurations is lower. The figure also reflects the maximum admissible frame rate to ensure system stability (asymptotic increase of the delay for a certain $\lambda$). The corresponding values, which can be obtained using (9), are summarized in Table 4. Although the DU seems to be more restrictive than the CU, it is worth recalling that the stability of both nodes needs to be guaranteed, and so the maximum allowable rate for a particular flow shall be the lowest one.
In order to characterize the overall end-to-end delay along the complete fronthaul network, we exploit, as was previously discussed, Jackson theory, which requires that all nodes comply with Burke's theorem, so that the output process at every node is statistically identical to the one at the input [58], [60]. Hence, for the CU nodes, we need the output process to be Poisson, which implies that the inter-departure times follow an exponential distribution. Even if the incoming frame rate ensures system stability, as established by (9), there might be circumstances that hinder the aforementioned requirement. In this sense, in order to guarantee that the corresponding Jackson theory conditions hold, we need that: (i) the incoming frame rate is lower than the slowest functional split service rate; and (ii) the time at the standby situation can be neglected, so that the node moves instantaneously from one split to the next one.
In order to assess whether these two aspects need to be strictly respected or not, we use the simulator to study the inter-departure times at the CU. Figure 6 represents the corresponding relative standard deviation (RSD) of such times, which is defined as the ratio between the standard deviation and the average value. If the output of the CU node were a pure Poisson process, the corresponding RSD would equal 1. We observe that the RSD of the output process is substantially greater than 1 when the average standby time is large, and thus the output process cannot be modeled as a Poisson process in this situation. Conversely, when the standby time is much smaller than the split times, the RSD barely differs from 1. Hence, we can conclude that, under realistic conditions (i.e. standby times much shorter than split times), the output of the CU node mostly corresponds to a Poisson process, even if the incoming frame rate is slightly higher than the slowest functional split rate (1 pkt/ms). Accordingly, the use of Jackson theory (as was discussed in Section III) to analyze the end-to-end delay in the fronthaul network is valid.
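The RSD criterion can be illustrated with a minimal Python sketch. It is not the simulator used in this work: the function name `relative_std_dev` and the synthetic samples are assumptions made only for illustration. Exponential inter-departure times (Poisson output) should give an RSD close to 1, whereas a hypothetical mixture of short and long gaps, loosely mimicking long standby periods, gives a clearly larger value.

```python
import math
import random


def relative_std_dev(samples):
    """Ratio between the standard deviation and the mean of the samples."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    return math.sqrt(variance) / mean


rng = random.Random(1)  # fixed seed so that the sketch is repeatable

# Inter-departure times of a Poisson process with rate 1 pkt/ms (exponential gaps).
poisson_gaps = [rng.expovariate(1.0) for _ in range(50_000)]

# Hypothetical bursty output: 90% short gaps (mean 0.5 ms), 10% long gaps (mean 5.5 ms).
bursty_gaps = [
    rng.expovariate(2.0) if rng.random() < 0.9 else rng.expovariate(1.0 / 5.5)
    for _ in range(50_000)
]

rsd_poisson = relative_std_dev(poisson_gaps)
rsd_bursty = relative_std_dev(bursty_gaps)
print(f"RSD (exponential gaps): {rsd_poisson:.3f}")
print(f"RSD (bursty gaps):      {rsd_bursty:.3f}")
```

Both synthetic streams have a mean gap of 1 ms, so the difference in RSD comes only from the shape of the distribution, which is the property that Figure 6 inspects.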

B. FRONTHAUL END-TO-END DELAY
After validating the model that we have introduced for the CU/DU nodes, and studying whether it can be exploited to assess the overall end-to-end delay, we now focus on studying such parameter. We consider the scenario shown in Figure 7, which comprises three CU/DU pairs and four switches that interconnect them. A flow is established between each CU/DU pair, and the corresponding paths are shown in Figure 7. We first assume that all links are of high capacity (optical) and that they are not a bottleneck, so that they do not impact the overall delay. Under this assumption, the links are not included in the evaluation, and only the CU/DU nodes and the four switches are considered. We increase the frame rate for all flows and we study the average end-to-end delay for the three of them, as well as the overall delay. We use (14) and (15) to obtain the analytical values. Figure 8a shows the average delay. As can be seen, there is again an almost perfect match between the delays obtained with the analytical model and those yielded by the simulator. As the frame rate increases, the end-to-end delay gets higher. More interestingly, the graph also shows that the proposed model yields delays rather close to the ones obtained in the experiments, even when the requirements to apply Jackson theory do not completely hold, i.e. when the traffic rate is higher than the slowest service rate (1 pkt/ms). In fact, there is no relevant difference with the delays obtained with the simulator, even when getting closer to the maximum λ ensuring system stability, which is (when the standby time can be neglected) ≈ 1.4974 pkt/ms. The results show that for low rates the maximum delay is below 10 ms, while it sharply increases when the incoming rate surpasses the slowest service rate (1 pkt/ms). The simulator not only allows us to validate the proposed model, but it can also be exploited to broaden the analysis.
One particular aspect of interest is the dispersion of the delay, since not only its average value, but also its variability, might jeopardize the behavior of services with time-stringent requirements. Since the model can only be used to ascertain the average value, we use the simulator to look at the delay variability. Figure 8b uses whisker plots to represent such variability for various λ (per flow) values. Each whisker plot includes the median (0.5-percentile) as a horizontal line within each box, as well as the 0.25- and 0.75-percentiles, which correspond to the lower and upper limits of the box, respectively. The 0.05- and 0.95-percentiles are also represented, as the lower and upper limits of the vertical lines. In addition, we have added, as a circular marker, the corresponding average delay. As can be observed, not only does the delay grow with the incoming traffic rate, but the variability also gets higher. For instance, for a packet rate of 1.2 pkt/ms, the average delay is roughly around 2 ms, but the 95% confidence interval might be as large as 100 ms (5 times the average value).
We now use a different configuration. We keep the frame rate for flows 1 and 3 at 0.8 pkt/ms, and we increase the traffic for flow 2. As can be seen in Figure 7, f_2 traverses S_1 (which is also used by f_1) and S_3, shared with f_3. Figure 9 shows the end-to-end delays. We use solid lines to represent the analytical values, while markers correspond to the delays obtained with the simulator. We also carried out 100 independent experiments per configuration. In this case, for each run we ensure that the flow having the lowest rate generates 10^6 frames, and that the other two flows are always active (the number of transmitted packets is adapted to guarantee that they are active during the whole experiment), so as to ensure the validity of the results. Again, the analytical and simulation-based results show almost no difference between them.
We can see that, when the fronthaul switches are not heavily loaded, the impact of the increased traffic on the delay perceived by the other flows is not very relevant, and their delays stay almost constant for all λ_f2. Hence, we can also conclude that, under these particular circumstances, the increased end-to-end delay for flow 2 is mostly due to the time spent at both the CU and DU nodes.
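The way per-node loads and the overall delay follow from a routing matrix can be sketched with a standard open Jackson network computation. The snippet below is only an illustration under stated assumptions: it models three switches with partial, simplified routes (f_1 and f_2 enter at S_1, f_3 enters at S_3), it leaves out the CU/DU nodes, and the names `solve_traffic_equations`, `PHI` and `R` are hypothetical rather than taken from the model implementation.

```python
def solve_traffic_equations(external, routing, iterations=100):
    """Solve lambda_j = external_j + sum_i lambda_i * routing[i][j] by fixed-point
    iteration (it converges for open networks where every frame eventually leaves)."""
    n = len(external)
    rates = list(external)
    for _ in range(iterations):
        rates = [
            external[j] + sum(rates[i] * routing[i][j] for i in range(n))
            for j in range(n)
        ]
    return rates


MU = [5.0, 5.0, 5.0]   # service rates of S1, S2, S3 in pkt/ms (value from the text)
PHI = [1.6, 0.0, 0.8]  # assumed external arrivals: f1 + f2 at S1, f3 at S3 (pkt/ms)
R = [
    [0.0, 0.5, 0.5],   # S1 forwards f1 to S2 and f2 to S3 (equal rates, so 0.5 each)
    [0.0, 0.0, 0.0],   # frames leave the modelled network after S2
    [0.0, 0.0, 0.0],   # frames leave the modelled network after S3
]

rates = solve_traffic_equations(PHI, R)
# Mean number of frames at each M/M/1 node is lambda / (mu - lambda).
frames_in_network = sum(lam / (mu - lam) for lam, mu in zip(rates, MU))
overall_delay_ms = frames_in_network / sum(PHI)  # Little's law
print("per-node rates (pkt/ms):", rates)
print(f"overall mean delay over the switches: {overall_delay_ms:.4f} ms")
```

With switches this lightly loaded, the contribution of each switch stays well below 1 ms, which is in line with the observation that the CU and DU nodes dominate the end-to-end delay.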

C. IMPACT OF BACKGROUND TRAFFIC
In order to complement the previous results, we synthetically increase the load of a number of the fronthaul switches, to assess how the end-to-end delay for the flows of interest is affected. We fix the rates for all flows to 0.8 pkt/ms, which ensures that the conditions to apply Jackson theory hold. Then, we add some background traffic, with rate λ_bg, in S_1 and S_3, and we study the end-to-end delay for the three flows. In order to add the background traffic to the model we just need to include an external flow at a particular switch (vector Φ), and adapt the routing matrix (R) accordingly. We represent the results in Figure 10, where again solid lines correspond to analytical results, while markers are the values yielded by the simulator. The duration of each experiment is established by sending 10^6 frames for the slowest flow (including the background traffic), while we ensure that all the others are sending packets during the whole time. Once again, we can see an almost perfect match between simulation results and analytical values. In Figure 10a, we add the background traffic only at S_1, while in Figure 10b it affects both S_1 and S_3. We can observe that when the background traffic is larger, and so the load of the corresponding nodes gets higher, there is a clear increase in the end-to-end delay for the affected flows. In this sense, when the background traffic only affects S_1, the end-to-end delay for flow 3 remains constant, since this flow does not traverse such node. On the other hand, when both S_1 and S_3 have some background traffic, the end-to-end delay for flow 2 is more heavily affected, since it goes through both nodes, while the other two flows only use one of them. It is worth recalling that the service rate of the switches is µ_n = 5 pkt/ms and, when λ_bg = 3 pkt/ms, the load of S_3 reaches 4.6 pkt/ms (the sum of λ_bg, λ_f2 and λ_f3), so at that point the network is fairly congested.
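The congestion level at S_3 can be checked with the M/M/1 mean sojourn time, T = 1/(µ − λ). The following sketch only reproduces that arithmetic for the rates quoted above; the helper name `mm1_sojourn_ms` and the set of background rates in the loop are illustrative assumptions.

```python
MU_SWITCH = 5.0  # switch service rate in pkt/ms (from the text)
FLOW_RATE = 0.8  # rate of each flow of interest in pkt/ms (from the text)


def mm1_sojourn_ms(arrival_rate, service_rate):
    """Mean sojourn time (waiting plus service) of a stable M/M/1 node, in ms."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable node: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


for lam_bg in (0.0, 1.0, 2.0, 3.0):
    load_s3 = lam_bg + 2 * FLOW_RATE  # background traffic plus flows 2 and 3
    utilization = load_s3 / MU_SWITCH
    delay = mm1_sojourn_ms(load_s3, MU_SWITCH)
    print(f"lambda_bg={lam_bg:.1f}  load={load_s3:.1f} pkt/ms  "
          f"rho={utilization:.2f}  T={delay:.3f} ms")
```

For λ_bg = 3 pkt/ms the utilization of S_3 is 0.92, and the mean time at that switch is about 2.5 ms, compared with roughly 0.29 ms without background traffic.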

D. IMPACT OF HETEROGENEOUS LINKS AND ROUTING STRATEGY
The last experiment that was run over this validation scenario (cf. Figure 7) aims to evaluate the impact of the link characteristics on the network performance. We also assess how the routing strategy might yield lower delays. We assume that all links in the fronthaul network are of high capacity (µ_of = 8 pkt/ms), except the one connecting S_1 and S_2, which has a lower capacity, µ_mmw, reflecting the use of a different technology (for instance, millimeter wave). Under these conditions, we also vary the routing strategy at S_1 for flow f_1, so that with probability ϕ frames use the shortest path (i.e., traversing the link between S_1 and S_2), and with probability 1 − ϕ they use the path CU_1 → S_1 → S_3 → S_2 → DU_1. Furthermore, in this setup we decrease the processing capacity of the four switches to µ_n = 3 pkt/ms, so that their impact on the overall delay is comparable with that of the millimeter wave link. Figure 11 shows how the overall average delay (considering all flows) varies as ϕ is modified. The rates for the three flows were 0.8 pkt/ms. Analytical results are represented with solid lines, while the markers are the values obtained with the simulator, again averaging the output of 100 independent runs, each of them transmitting 10^6 frames per flow. There is again a good match between analytical and simulation-based results. The results show that the routing strategy has an impact on the network performance. More interestingly, we can see that there might exist optimum operation points, where the overall delay is minimum, which could be found by using the proposed theoretical model. In the scenario we are considering, when the service rate of the link between S_1 and S_2 is 1 pkt/ms, this optimum value is seen for ϕ ≈ 0.6.
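The existence of an interior optimum for ϕ can be illustrated with a simplified sketch. It is not the full model: it only accounts for the elements whose load depends on ϕ (the mmWave link, switch S_3 and the two optical links of the detour), it assumes those optical links carry no other traffic, and it ignores the CU/DU nodes, so it is not expected to reproduce Figure 11. The function names are hypothetical.

```python
MU_SWITCH = 3.0   # switch service rate in this setup, pkt/ms (from the text)
MU_OPTICAL = 8.0  # optical link service rate, pkt/ms (from the text)
MU_MMWAVE = 1.0   # service rate of the S1-S2 link, pkt/ms (one of the studied values)
FLOW_RATE = 0.8   # rate of each of the three flows, pkt/ms (from the text)


def mean_frames(arrival_rate, service_rate):
    """Mean number of frames in an M/M/1 node, rho / (1 - rho); inf if unstable."""
    if arrival_rate >= service_rate:
        return float("inf")
    return arrival_rate / (service_rate - arrival_rate)


def phi_dependent_delay(phi):
    """Contribution (ms) of the phi-dependent elements to the overall mean delay,
    obtained with Little's law over the total external rate."""
    direct = phi * FLOW_RATE          # f1 traffic over the mmWave link S1-S2
    detour = (1.0 - phi) * FLOW_RATE  # f1 traffic over S1-S3, S3 and S3-S2
    frames = (
        mean_frames(direct, MU_MMWAVE)
        + mean_frames(2 * FLOW_RATE + detour, MU_SWITCH)  # S3 also carries f2 and f3
        + 2 * mean_frames(detour, MU_OPTICAL)             # assumed otherwise idle
    )
    return frames / (3 * FLOW_RATE)


grid = [i / 100 for i in range(101)]
best_phi = min(grid, key=phi_dependent_delay)
print(f"phi minimizing the phi-dependent delay: {best_phi:.2f}")
```

The trade-off is visible in the two dominant terms: sending everything over the slow link saturates it, while diverting everything loads S_3, so the minimum lies strictly between ϕ = 0 and ϕ = 1.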

V. PERFORMANCE OF A REALISTIC TOPOLOGY
In order to illustrate the applicability of the proposed model, in this section we use it to analyze the performance of a split selection policy over a realistic network deployment. We will first describe the system under analysis, and the particular split selection policy applied. Later on, we will use an outcome of that policy to feed the model and evaluate the expected system behavior.

A. SYSTEM DESCRIPTION
As mentioned before, the network consists of a set of links, switches, and base stations, each of which is divided into a CU and a DU. We now assume that all CUs are deployed in a single data center, located at a convenient position for the operator. Conversely, DUs are deployed close to the radio equipment. One fourth of the base stations correspond to macro cells, whereas the rest are small cells [61]. The geographical location of the base stations is that of a dense urban scenario, in which macro cells are distributed over a triangular grid, with an inter-site distance of 200 m, and small cells are randomly distributed over the covered area. Altogether, the scenario comprises G base stations. DUs and the data center containing the CUs are connected by means of a packet-switched fronthaul network, which, in addition to DUs and the data center, consists of layer-3 or layer-2 switches and high-speed links (1 Tbps). We assume that there is, on average, one network switch per 10 DUs, and that they are connected to the data center via a minimum spanning tree plus additional links from a Waxman model [62], until an average node degree of 3.5 is achieved [63]. The number of UEs is modeled with variable U, which corresponds to U = 10 × G [61]. UEs can be either uniformly distributed over the covered area, or concentrated into clusters.
Under this scenario, we assume that the functional split is dynamically chosen with the goal of optimizing user-perceived performance. Namely, the network operator aims at instantaneously maximizing user data rates in a proportionally fair manner. This can be accomplished by selecting the functional split such that the sum of the logarithm of the user spectral efficiencies is maximized. In particular, we adopt the model described in [33], where the functional split selection is modeled as an optimization problem, as follows:
\[ \phi^g_e \ge 0 \quad \forall e \in E, \ \forall g \in \mathcal{G}, \tag{19} \]
\[ x_g \in \{1, \dots, s\} \quad \forall g \in \mathcal{G}, \tag{20} \]
where p_u is the signal power received by UE u from its serving base station, ς is the thermal noise, and i_{u,g} is the interference power received by UE u from cell g. The serving base station of u is denoted by the index h_u, c(x_g) is the maximum interference cancellation factor that can be applied in base station g given its current functional split x_g, and \mathcal{G} ≜ {1, ..., G}, where we recall that G is the number of base stations. As for traffic, φ^g_e is the flow produced by base station g on link e, Φ_e is the capacity of link e, E is the set of all links, E^+(n) is the set of links leaving node n, E^−(n) is the set of links entering node n, and r(x) is the capacity required by split x. Notice that in (16) the cancellation factor that multiplies i_{u,g} is determined by the lowest functional split of the interfering and serving base stations.
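To make the role of these quantities concrete, the following Python sketch evaluates a proportionally fair utility for a given split assignment. It is an illustration under explicit assumptions, not the formulation of [33]: the spectral efficiency is taken as the Shannon expression log2(1 + SINR), the cancellation factor is assumed to simply multiply the interference power, and the function name, data layout and numeric values are hypothetical.

```python
import math


def proportional_fair_utility(users, splits, cancel_factor, noise):
    """Sum over users of log(spectral efficiency) for a given split assignment.

    users: list of dicts with keys 'serving' (cell index), 'signal' (received
           power) and 'interference' ({cell index: interference power}).
    splits: functional split index x_g of every cell.
    cancel_factor: maps a split index to the factor multiplying the interference.
    """
    total = 0.0
    for user in users:
        serving = user["serving"]
        interference = sum(
            # the factor is set by the lowest split of interfering and serving cells
            cancel_factor[min(splits[cell], splits[serving])] * power
            for cell, power in user["interference"].items()
            if cell != serving
        )
        sinr = user["signal"] / (noise + interference)
        total += math.log(math.log2(1.0 + sinr))  # Shannon efficiency (assumption)
    return total


# Hypothetical two-cell example: split 2 is assumed to allow more cancellation.
users = [
    {"serving": 0, "signal": 1.0, "interference": {1: 0.5}},
    {"serving": 1, "signal": 1.0, "interference": {0: 0.5}},
]
factors = {1: 1.0, 2: 0.2}
for assignment in ([1, 1], [2, 1], [2, 2]):
    utility = proportional_fair_utility(users, assignment, factors, noise=0.1)
    print(assignment, round(utility, 3))
```

In this toy case, raising the split of only one cell does not change the utility, because the factor applied to each interference term is governed by the lower of the two splits involved.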
Problem (16) can be approximated by a Mixed Integer Linear Program (MILP), as shown in [33], which allows for timely solving, even for relatively large networks. Our setup considers G = 300 base stations and 4 possible functional splits: PDCP-RLC, MAC-PHY, Intra-PHY and C-RAN. The fronthaul protocols used for these splits are CPRI or eCPRI for the C-RAN and Intra-PHY splits [64], the nFAPI protocol for MAC-PHY [65], and the F1 application protocol for PDCP-RLC [66], as described in 3GPP recommendations. The use of these protocols produces a signaling overhead that can be comparable to or even greater than the actual user throughput [61]. Nonetheless, these additional throughput requirements are already considered in the model, since each functional split is described separately. For this scenario, problem (16) can be solved in less than 500 ms using operator-grade equipment [33]. Our simulated time spans 12 hours and the optimal functional split is computed every second. Since there are strategies in the state of the art that can change the functional split in the millisecond range, we consider that changing the split every second is a feasible option, if required. Users are randomly distributed over the covered area, and move according to the mobility parameters proposed in TS38.913 [61]: 20% are vehicles moving at 30 km/h and the remaining 80% are pedestrians walking at 3 km/h. Their movement is mainly confined to streets and squares, without special preference for any specific point. Nonetheless, small clusters do form randomly, which influences the optimal functional split selection.

B. SYSTEM PERFORMANCE
By considering the features that were previously discussed to establish the optimal functional split policy, we then exploit the model introduced in Section III, as well as the event-driven simulator, to assess the performance of a realistic network scenario. In particular, the network topology under consideration is shown in Figure 12. The picture shows the overall network that has been used during the simulation and from which we have obtained the statistics needed for the model. Then, we have chosen a sub-network, which is highlighted in Figure 12, to carry out the analysis.
FIGURE 12: Real network scenario; the sub-network used for the evaluation is highlighted and zoomed in.
Taking the outcome of the 12-hour simulation that was described above, we use the split change probabilities to obtain the corresponding rates (γ). To estimate the service rates, we assume that the network is limited by the computational resources in the data center, which is a sensible assumption, since all CUs are deployed in a single facility. This way, we use the computational complexity in the CU associated with each split [67], so that the C-RAN configuration yields the lowest service rate (highest computational needs at the CU). We also assume that the service rate for this C-RAN configuration corresponds to an average channel quality. For that, we use the mean transport block size (TBS) of a base station using 15 physical resource blocks (PRBs), which corresponds to 695 packets per second (packets are assumed to have 1500 bytes). From that value, we estimate the service rate for the other splits by scaling that of the C-RAN scheme by the ratio of computational complexities (i.e. using a linear relationship). We acknowledge that different assumptions could be taken, rather than the computational limitation, and we leave other network configurations for our future work. Nevertheless, as will be seen below, the described configuration illustrates the applicability of the proposed model to a realistic setup. Furthermore, we neglect the propagation delay, since it is much lower than the overall delay. For a global distance of 3 km in the fronthaul network, the propagation delay would be more than 100 times lower than the values that were observed for the lowest traffic rate. All in all, we select four CU-DU pairs, which are connected by means of the fronthaul network comprising 6 switches, as shown in Figure 12. From the 12-hour simulation selecting the optimal split, we obtained the model parameters, which are summarized in Table 5. As can be observed, the outcome of the split selection policy described above, and the corresponding solution of (16), does not embrace all the splits in a single base station. On the contrary, for the first base station, the optimal policy shifts between the PDCP-RLC and MAC-PHY splits, while for the others it selects the highest centralization options. It is worth pointing out that different statistics would be obtained with other policies, network setups or user deployments, but our primary goal is to assess the validity of the proposed model for realistic configuration characteristics. On the other hand, the service rate of the switches is set high enough so that we can neglect their impact on the overall delay.
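The scaling of the service rates and the size of the neglected propagation delay can be sketched as follows. The complexity ratios in `RELATIVE_COMPLEXITY` are hypothetical placeholders (the actual values come from [67] and are not reproduced here), the service rate is assumed to be inversely proportional to the CU complexity, and the propagation speed of 2e8 m/s is a typical figure assumed for illustration.

```python
CRAN_RATE_PKT_S = 695.0  # C-RAN service rate in packets per second (from the text)
PACKET_BYTES = 1500      # packet size in bytes (from the text)

# Hypothetical CU computational complexities relative to C-RAN (C-RAN = 1.0).
RELATIVE_COMPLEXITY = {
    "C-RAN": 1.0,
    "Intra-PHY": 0.8,
    "MAC-PHY": 0.5,
    "PDCP-RLC": 0.25,
}

# Linear relationship: a split that needs half the CU computation is served twice as fast.
service_rate_pkt_ms = {
    split: CRAN_RATE_PKT_S / complexity / 1000.0
    for split, complexity in RELATIVE_COMPLEXITY.items()
}

# Throughput equivalent to the C-RAN service rate, in Mbit/s.
cran_throughput_mbps = CRAN_RATE_PKT_S * PACKET_BYTES * 8 / 1e6

# Propagation delay over 3 km, assuming about 2e8 m/s in the medium.
propagation_delay_ms = 3000.0 / 2.0e8 * 1000.0

for split, rate in service_rate_pkt_ms.items():
    print(f"{split:10s} {rate:.3f} pkt/ms")
print(f"C-RAN throughput: {cran_throughput_mbps:.2f} Mbit/s")
print(f"propagation delay over 3 km: {propagation_delay_ms:.3f} ms")
```

Under these assumptions the propagation delay is about 0.015 ms, which supports neglecting it when the delays of interest are in the millisecond range or above.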
Since all CUs are collocated, the corresponding routing matrix becomes trivial and it is therefore not included in Table 5. First, we focus on the CU behavior. Since the standby times are rather short compared to the average dwell time at each functional split, it is sensible to neglect them. Hence, by using Eq. (9) we can find the traffic rate that ensures system stability. We take the CU_3 configuration, which just uses splits 2 and 3, with the inverse of the dwell time given by γ_cu3 in Table 5. The maximum traffic rate to ensure system stability is, for this particular configuration, ≈ 1.22 pkt/ms. We then conduct an experiment in which we increase the incoming rate up to 1 pkt/ms (roughly 80% of the maximum admissible rate), thus ensuring system stability, and we analyze the delay in the CU. Figure 13 shows the results. The solid line corresponds to the delay yielded by the proposed model, while the markers reflect the results that were obtained after simulations encompassing the transmission of 10^8 packets. We can again see that the model is able to closely match the expected behavior, regardless of the particular CU configuration. More interestingly, the results show that the delay can significantly increase, even if we keep the load well below λ_max. The reason is that, when the incoming packet rate is higher than the service rate of one of the split configurations, frames accumulate in the buffer while that configuration is active, until the CU moves to a quicker configuration. Since the dwell times at the various splits in real networks might be much longer than the ones used in the scenario that was studied in Section IV, the buffer length may strongly increase, and so may the delay, which might become unacceptable.
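When the standby time is neglected and the node alternates between two splits, a standard stability bound for a queue with time-varying service rate is the time-weighted average of the service rates. The sketch below assumes that this bound coincides with what (9) gives in that limit. The 1.37 pkt/ms value follows from the statement that split #2 is 37% faster than a 1 pkt/ms input, whereas the slow rate and the dwell times are hypothetical and are not the values of Table 5.

```python
def max_stable_rate(service_rates, mean_dwell_times):
    """Time-weighted average service rate of a node that cycles through its
    configurations (standby time neglected)."""
    total_time = sum(mean_dwell_times)
    return sum(mu * t / total_time for mu, t in zip(service_rates, mean_dwell_times))


# Split #2 at 1.37 pkt/ms; split #3 at a hypothetical 0.9 pkt/ms;
# hypothetical mean dwell times of 20 s and 10 s, respectively.
lam_max = max_stable_rate([1.37, 0.9], [20.0, 10.0])
print(f"maximum stable rate: {lam_max:.3f} pkt/ms")
```

An input of 1 pkt/ms is below this bound, so the queue is stable on average, yet it still exceeds the assumed 0.9 pkt/ms of the slow split, which is the situation that produces the buffer build-up described above.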
In order to better highlight this behavior, Figures 14 and 15 show the evolution of the buffer lengths for a particular experiment. First, Figure 14 illustrates the instantaneous variation of the buffer length within a time interval of 3000 seconds and for an incoming traffic rate of 1 pkt/ms. We use gray areas to reflect the time intervals in which the CU was working at the third split configuration, with a service rate lower than the incoming traffic rate. As can be seen, the buffer length increases very sharply during these intervals, reaching rather high values. On the other hand, when the CU moves to a quicker configuration (split #2), the graph evinces that the buffer occupancy also decreases at a very quick pace (the service rate in this split is 37% faster than the incoming traffic). As can be seen, the buffer length remains mostly below 10 frames, except when the slowest split configuration is active. Hence, the system is stable, but the average delay remarkably increases, up to unacceptable values, as was shown in Figure 13, showing in addition a very large variability.
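The order of magnitude of this build-up can be estimated with a deterministic fluid approximation, in which the buffer grows at rate λ − µ while the slow split is active and drains at rate µ − λ otherwise. This is a simplification of the stochastic model, and the slow service rate and phase durations below are hypothetical.

```python
def fluid_buffer_trace(arrival_rate, phases, initial=0.0):
    """Deterministic fluid approximation of the buffer length.

    phases: list of (service rate in pkt/ms, duration in ms), visited in order.
    Returns the buffer length (frames) at the end of each phase.
    """
    level = initial
    trace = []
    for service_rate, duration in phases:
        # The level changes linearly and cannot drop below zero.
        level = max(0.0, level + (arrival_rate - service_rate) * duration)
        trace.append(level)
    return trace


# Hypothetical phases: 10 s at a slow split (0.9 pkt/ms), then 10 s at the
# fast split (1.37 pkt/ms), with 1 pkt/ms of incoming traffic.
trace = fluid_buffer_trace(1.0, [(0.9, 10_000.0), (1.37, 10_000.0)])
drain_time_ms = trace[0] / (1.37 - 1.0)
print(f"buffer after slow phase: {trace[0]:.0f} frames")
print(f"time to drain at the fast split: {drain_time_ms:.0f} ms")
print(f"buffer after fast phase: {trace[1]:.0f} frames")
```

Even a small rate deficit, sustained over a dwell time of seconds, accumulates on the order of a thousand frames in this example, which is consistent with the sharp growth and fast drain seen in the gray and white intervals of Figure 14.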
Then, Figure 15 shows the probability density function (pdf) of the node occupancy (number of frames at the CU). We can first observe that the theoretical results again match the values obtained with the simulator. In accordance with what was seen in Figure 14, the results evince that shorter buffer lengths are more likely with the fast split (#2), while longer buffer lengths are mostly caused by moving to split #3. The results, obtained for two different values of λ, also show the strong impact of the slowest service rate. When the traffic rate is lower than such value (upper figures), long buffer lengths are not likely with split #2, but this is not the case when λ equals 1 pkt/ms, where the quickest split is needed to transmit the frames that were buffered while split #3 was active. In addition, the pdf in this case has a much longer tail (see the lower values of the y-axis in the figures), reflecting larger buffer lengths.
Finally, Figure 16 depicts the end-to-end delay seen by the four flows, as we increase the traffic load for each of them. First, Figure 16a shows the average delay, comparing analytical and simulated results. As can be seen, the theoretical model again yields the same performance as the experiments carried out with the simulator. The results show that the base stations using splits #2 and #3, MAC-PHY and Intra-PHY respectively, are less impacted by the increase in the traffic rate. On the contrary, the first base station, which uses the fastest splits in the CU, shows a higher end-to-end delay as we increase λ, due to the DU behavior, whose service rates are slower, as can be seen in Table 5. The results also evince that, provided the traffic load is below the service rate of the slowest split configuration, the end-to-end delay remains within reasonable levels. Then, Figure 16b shows the delay variability, obtained using the simulator. We again use whisker plots, and we also add the corresponding average delay, represented with circular markers. For low data rates all the flows present similar distributions. However, as we increase the incoming traffic rate, we can see that the delay at the base station with the fastest splits in the CU (Flow 1) not only has a higher average value, but also a far larger variability.
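The statistics drawn in each whisker plot can be obtained from a list of delay samples as sketched below. The function name `whisker_summary` and the synthetic input are assumptions for illustration; the simulator output format is not described in the text.

```python
import statistics


def whisker_summary(delays):
    """Return the 0.05, 0.25, 0.5, 0.75 and 0.95 quantiles plus the average,
    i.e. the values represented in each whisker plot."""
    cuts = statistics.quantiles(delays, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return {
        "p05": cuts[0],
        "p25": cuts[4],
        "median": cuts[9],
        "p75": cuts[14],
        "p95": cuts[18],
        "mean": statistics.fmean(delays),
    }


# Synthetic delays (ms), used only to exercise the function.
summary = whisker_summary(list(range(1, 100)))
print(summary)
```

Reporting the average next to the quantiles is useful here because, for heavy-tailed delay distributions, the average can lie well above the median.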
In order to complement the previous discussion, Figure 17 shows the complementary cdf for the end-to-end delay, using the scenario depicted in Figure 12. As can be seen, for low values of λ, the probability of having long delays is very low. However, when λ gets higher, surpassing the service rate of a particular split (this mimics the configuration that was used to obtain Figure 14), the probability of suffering rather long delays is quite high. This can be used as another design parameter, considering the performance under a worst-case scenario.
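An empirical complementary cdf such as the one in Figure 17 can be computed from delay samples as in the following sketch, where the function name and the sample values are illustrative assumptions.

```python
from bisect import bisect_right


def empirical_ccdf(delays, thresholds):
    """Fraction of delay samples strictly greater than each threshold."""
    ordered = sorted(delays)
    n = len(ordered)
    return [(n - bisect_right(ordered, x)) / n for x in thresholds]


# Illustrative samples (s) and thresholds; real inputs would come from the simulator.
samples = [0.001, 0.002, 0.002, 0.004, 0.050, 0.300]
print(empirical_ccdf(samples, [0.001, 0.01, 0.1]))
```

For a worst-case design, the value of interest is the threshold x at which this probability falls below the target violation rate of the service.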
All in all, we can conclude that the proposed model yields accurate results, even when using realistic configuration setups.

VI. CONCLUSION
In this paper we have introduced a novel model, based on queuing theory, which can be used to study the performance of CU/DU nodes in vRAN architectures. The model accommodates different service rates for the various functional splits, as well as the dwell time at each of them and the corresponding standby times. The delay in traversing such nodes can be obtained with the matrix-geometric method. We have also studied the circumstances under which we can exploit this model, together with open Jackson network theory, to assess the end-to-end delay over a fronthaul network. We have shown that, under sensible operation regimes, there is a very good match between the values yielded by the theoretical model and the behavior observed with the simulator.
We have also conducted a thorough study of the expected performance over a fronthaul network. An event-driven simulator was used not only to assess the validity of the proposed model, but also to broaden the analysis, by looking at the variability of the observed performance. In all cases, the match between the values obtained by the simulator and the proposed model is almost perfect.
FIGURE 17: Complementary CDF of the end-to-end fronthaul delay (x-axis: fronthaul delay (s); y-axis: Prob. delay > x; curves for λ = 0.4, 0.6, 0.8 and 1 pkt/ms).
Last, we have used a more realistic configuration, to assess the impact that the use of flexible functional split might have on the end-to-end delay over the fronthaul. The features of the scenario were selected from a realistic network setup, where the average sojourn times at each functional split might be much longer. We first confirmed that the proposed model still yields accurate results. On the other hand, we also saw that, even if the stability of the CU/DU nodes was guaranteed, the buffer lengths, and so the delay, might strongly increase when the traffic load becomes higher than the service rate of a particular split configuration. In this sense, we saw that when the incoming traffic rate was higher than the slowest split processing rate, even if system stability was ensured, the delay could increase by a factor of more than 100. This can be used to carry out an analysis based on worst-case performance, which might hinder the behavior of certain services, especially those having strict delay requirements.
There are two different lines of work that we will pursue in our future research. On the one hand we will study how the use of finite buffers and different traffic characteristics, as well as split selection policies, impact the system performance, using the developed simulator. On the other hand, we will also exploit the model to facilitate the design and planning of vRAN topologies, proposing sensible split selection policies. We will also exploit the developed simulator to propose and study different buffer management schemes.