A Data-Driven Multiobjective Optimization Framework for Hyperdense 5G Network Planning

The trials and rollout of fifth generation (5G) network technologies are gradually intensifying as 5G is positioned as a platform that not only accommodates exploding data traffic but also unlocks a multitude of use cases, services and deployment scenarios. However, the need for hyperdense 5G deployments is revealing some of the limitations of planning approaches that hitherto proved adequate for pre-5G systems. The hyperdensification envisioned in 5G networks not only adds complexity to network planning and optimization problems, but also underlines the need for more realistic data-driven approaches that consider cost, varying demands and other contextual attributes to produce feasible topologies. Furthermore, the quest for network programmability and automation, including in the 5G radio access network (RAN), as manifested by network slicing technologies and more flexible RAN architectures, is also among the factors that influence planning and optimization frameworks. Collectively, these deployment trends, technological developments and evolving (and diverse) service demands point towards the need for more holistic frameworks. This article proposes a data-driven multiobjective optimization framework for hyperdense 5G network planning, with practical case studies used to illustrate the added value compared to contemporary network planning and optimization approaches. Comparative results from the case study with real network data reveal potential performance and cost improvements of the hyperdense optimized networks produced by the proposed framework, due to increased use of contextual data of the planning area and a focus on objectives that target demand satisfaction.


I. INTRODUCTION
This section provides an overview of mobile network deployment trends and of the holistic planning and optimization frameworks they call for, which together inspire the research contribution presented herein.

A. OVERVIEW OF DEPLOYMENT TRENDS
The ongoing mobile data traffic growth is mostly driven by increased mobile broadband subscriptions globally, coupled with the increase in the average mobile data consumption per subscription. The latter is mostly attributed to streaming and sharing of increasingly high-definition video, as well as a range of emerging immersive video content (e.g. augmented reality). A recent report noted that global mobile data traffic will quadruple between 2019 and 2025 to 160 exabytes per month (with a further 53 exabytes per month projected for fixed wireless access traffic) [1]. Such projections on mobile data traffic growth, and the diverse requirements (e.g. latency, reliability etc.) imposed by a multitude of new services, are prompting mobile operators to upgrade their networks to address the emerging demands and remain competitive. The current typical scenario is for operators to maintain multi-standard networks with an expanding fourth-generation (4G) long term evolution (LTE) footprint while gradually phasing out preceding (pre-4G) technology generations. At the same time, operators are maximizing the value of their LTE network investments by applying LTE-Advanced and LTE-Advanced Pro enhancements, which will provide not only capacity scalability, but also flexibility to adopt cellular connectivity for vertical services. These LTE enhancements include infrastructureless proximity services for vehicle-to-everything and public-safety scenarios, and narrowband cellular connectivity for Internet of Things (IoT) devices [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Bijoy Chand Chatterjee.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
However, even as LTE network expansion is ongoing globally, mobile operators, equipment vendors and other industry stakeholders are already aggressively trialing and rolling out fifth generation (5G) network technologies to support the evolving connectivity needs of the current decade and beyond [1]. Moreover, sudden or unforeseen changes in mobile traffic trends (as noted recently in the changes in traffic patterns induced by COVID-19 measures [1]) further underline the need for networks that are scalable for both projected and unforeseen scenarios. To that end, 5G is envisioned to be a unifying connectivity fabric that not only accommodates exploding data traffic but also unlocks diverse vertical use cases. This is attributed to the fact that 5G is specified from the beginning to enable enhanced mobile broadband services, to support mission-critical communications with stringent reliability and latency demands, and to connect a massive number of IoT devices [3], [4]. The ensuing diverse system requirements have necessitated not only 5G core network advances (e.g. network slicing, virtualization, in-network caching etc.), but also the development of the 5G new radio (NR) air interface with flexible numerology, increased capacity through enhanced spectral efficiency (e.g. with higher-order modulation, massive multiple-input multiple-output (MIMO) etc.) and operation in new high bands (including pioneer millimeter wave bands in the 24-28 GHz region) [5]. Moreover, 5G unlocks further network capacity gains through aggressive spectrum reuse by increased network densification [6]. Network densification refers to the addition of new cell sites (typically small cell sites) to supplement existing macrocellular networks and is usually quantified by the site density (sites/km²) or the inter-site distance.
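As a rough illustration of how the two densification metrics relate, the sketch below converts between site density and inter-site distance. It assumes an idealized regular hexagonal grid in which each site serves one hexagonal cell; this layout, the function names and the sample densities are assumptions made for illustration and are not taken from the article.

```python
import math


def isd_from_density(sites_per_km2: float) -> float:
    """Return the inter-site distance in metres for a given site density.

    Assumption: regular hexagonal grid, where each site serves a cell of
    area (sqrt(3) / 2) * ISD^2.
    """
    if sites_per_km2 <= 0:
        raise ValueError("site density must be positive")
    isd_km = math.sqrt(2.0 / (math.sqrt(3.0) * sites_per_km2))
    return isd_km * 1000.0


def density_from_isd(isd_m: float) -> float:
    """Return the site density in sites/km^2 for a given inter-site distance."""
    if isd_m <= 0:
        raise ValueError("inter-site distance must be positive")
    isd_km = isd_m / 1000.0
    return 2.0 / (math.sqrt(3.0) * isd_km ** 2)


if __name__ == "__main__":
    for density in (10, 30, 150):  # illustrative densities in sites/km^2
        isd = isd_from_density(density)
        print(f"{density:>4} sites/km^2 -> ISD of about {isd:.0f} m")
```

Real deployments are irregular, so such a conversion only gives an order-of-magnitude feel: under the hexagonal assumption, a density of 150 sites/km² corresponds to an inter-site distance of a little under 90 m.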
Network densification has been a mainstay of 4G networks, with site densities of 10-30 sites/km² becoming the norm, particularly in urban scenarios [7], [8]. The need for small cells will be even more critical in 5G networks due to operation in higher spectrum bands and the need to support traffic densities that are two to three orders of magnitude higher than in LTE [4]. The general industry consensus is that 5G will drive hyperdense deployments with site densities in excess of 150 sites/km² in urban and selected indoor scenarios [6]. With the gradual rollout of 5G, mobile networks are becoming increasingly heterogeneous, comprising layered cell types (indoor and outdoor small cells complementing umbrella macrocells), based on multiple radio access technologies (4G and 5G, alongside pre-4G technologies), and operating in different spectrum bands (low and mid bands used by all radio access technologies (RATs) and 5G high bands) [9]. Maintaining heterogeneous networks is strategic from a business perspective, as it allows operators to maintain services for legacy user equipment (UE) categories while simultaneously providing capabilities to support new vertical-driven use cases and enhanced user experiences leveraging 5G performance enhancements.

B. HOLISTIC PLANNING FRAMEWORKS
However, the transition towards these heterogeneous hyperdense networks is exposing a number of significant challenges, such as the increased complexity of network planning and optimization. Network planning in this cellular context refers to the process of determining the number, location, and configuration of base stations (macrocells and/or small cells) to create a mobile network topology that satisfies pre-determined objectives or requirements from the perspective of different stakeholders. Typically, this includes the supply-side perspective of mobile network operators (e.g. maximizing return on investment (ROI) per small cell) and the demand-side perspective of subscribers or end users (e.g. achieving a certain quality of service (QoS)). It is noted that in this case the "user" is not limited to human users, but also includes devices, machines or other connected "things" [10]. Careful network planning is equally useful for greenfield deployments in target planning areas where no prior deployments exist as it is for brownfield deployments (or incremental deployments) enhancing or extending existing network infrastructure. The brownfield scenario exemplifies contemporary network planning processes, whereby network operators are looking to apply 5G upgrades to existing 4G networks. In general, the network planning process constitutes at least three concurrent (and sometimes recursive) phases, namely:
1) Network dimensioning: The initial step, which typically uses link budget analysis to obtain a rough estimate of the number of base stations required to fulfill certain high-level coverage and capacity targets for a given planning area [11].
2) Detailed network planning: The phase that utilizes network dimensioning estimates to evaluate a more accurate number of required base stations, as well as their precise locations and initial cell parameters (e.g. antenna heights, azimuths, tilts etc.).
3) Post-deployment optimization: The recurring optimization procedures conducted in production networks to maintain or enhance performance in response to unforeseen or dynamic factors not captured in the detailed network planning phase [12]. These factors may include persistent network failures or quality degradation and demand evolution due to new services, subscriber growth, adoption of higher category UEs and so on.
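To make the network dimensioning phase more concrete, the following minimal sketch shows one way a link budget can be turned into a rough site count. It is not the procedure of [11]: the log-distance path loss model with a 1 m free-space reference, the hexagonal cell area, and all function and parameter names are assumptions chosen for illustration.

```python
import math


def max_allowable_path_loss_db(tx_power_dbm, antenna_gains_db,
                               rx_sensitivity_dbm, margins_db):
    """Link budget: the largest path loss at which the link still closes."""
    return tx_power_dbm + antenna_gains_db - rx_sensitivity_dbm - margins_db


def cell_radius_m(mapl_db, freq_mhz, path_loss_exponent):
    """Invert an assumed log-distance path loss model to get a cell radius.

    The reference loss is the free-space loss at 1 m:
    20 * log10(f_MHz) - 27.55 dB.
    """
    loss_at_1m_db = 20.0 * math.log10(freq_mhz) - 27.55
    return 10.0 ** ((mapl_db - loss_at_1m_db) / (10.0 * path_loss_exponent))


def dimension_sites(area_km2, radius_m, demand_mbps, site_capacity_mbps):
    """Rough site count: the larger of the coverage and capacity estimates."""
    # Area of a hexagonal cell with circumradius R is (3 * sqrt(3) / 2) * R^2.
    hex_cell_area_km2 = 1.5 * math.sqrt(3.0) * (radius_m / 1000.0) ** 2
    coverage_sites = math.ceil(area_km2 / hex_cell_area_km2)
    capacity_sites = math.ceil(demand_mbps / site_capacity_mbps)
    return max(coverage_sites, capacity_sites)


if __name__ == "__main__":
    # All input values below are hypothetical.
    mapl = max_allowable_path_loss_db(30.0, 10.0, -90.0, 15.0)
    radius = cell_radius_m(mapl, freq_mhz=3500.0, path_loss_exponent=3.5)
    sites = dimension_sites(1.0, radius, demand_mbps=20000.0,
                            site_capacity_mbps=1000.0)
    print(f"MAPL {mapl:.1f} dB, radius {radius:.0f} m, about {sites} sites")
```

Such an estimate only bounds the problem; site locations and cell parameters are left to the detailed planning phase.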
Investigations into methods to overcome challenges in detailed network planning continue to gain significant traction in the research community as the need for hyperdense 5G (and beyond) networks becomes more apparent (see, for instance, [13] and references quoted therein). The densification approach in pre-5G networks was typically motivated by macrocell network performance shortcomings observed from post-deployment optimization procedures, customer feedback, crowd-sourced data, network drive or walk test campaigns, and so on. These findings would then inform small cell placement decisions, usually targeting coverage holes, traffic hotspots or selected offloading relief points for overloaded macrocells [8]. This reactive approach allowed small cells to be deployed without the otherwise obligatory detailed network planning. By contrast, the envisioned scale of densification in 5G networks makes small cell deployment a critical prerequisite for 5G to fulfill service requirements, rather than simply a macrocell complement. Moreover, it implies that small cell sites are becoming a major contributor to the network's overall energy footprint and total cost of ownership (TCO). These factors underline the need for detailed planning of small cell deployments in 5G network rollouts [14]. These heterogeneous hyperdense 5G networks not only reveal limitations of traditional planning approaches, but also unlock opportunities for new approaches to detailed network planning.
Additionally, new small cell ownership and sharing models are emerging beyond traditional operator-only models [15]. The enormous amounts of data produced by existing networks and derived from other open data sources (including non-telecom data) are also providing useful input for increasingly data-driven planning processes [15], [16]. Mobile network operators are also seeing increased incentive to transform from being mere broadband service providers to providers of flexible communications and computing platforms supporting a multitude of services with differing key performance indicator (KPI) requirements and deployment options [4]. The flexibility to support these diverse services (both legacy and new) whilst minimizing the cost of densification is also motivating developments in radio access networks (RANs) that take advantage of increased virtualization and softwarisation in the RAN [17]. This radio access technology evolution and these deployment trends are strongly interrelated and underline the need for enhanced planning frameworks that consider them in a holistic manner.

C. CONTRIBUTIONS OF THIS WORK
This article proposes a data-driven multiobjective optimization framework for hyperdense 5G network planning, with practical case studies used to illustrate the possible added value compared to legacy approaches. Specific contributions embedded in the proposed framework include the following:
1) Introduction of a pragmatic approach for the selection of candidate site locations for hyperdense deployments, taking into consideration different small cell infrastructure ownership or sharing models, and characterizing the cost efficiency of individual candidate sites, as opposed to approaches assuming homogeneous candidate sites.
2) Adoption of a data-driven approach leveraging contextual datasets (e.g. geospatial, spatiotemporal traffic, demographic data etc.) of the target area as an input to the planning and optimization processes, to enhance the precision of 5G hyperdense network planning under realistic conditions.
3) Adoption of use case or service driven planning targets that are commensurate with the needs of the emerging slice-based approaches, whereby satisfaction of individual (per user, per service etc.) KPI requirements becomes a primary objective, rather than common cell-level or network-level KPI targets.
4) Implementation of a brownfield network planning process that aims to optimize new 5G deployments as a complement to legacy pre-5G infrastructure, considering not only multi-RAT operation but also heterogeneous architectures, including new next-generation RAN (NG-RAN) split architectures.
The numerical results from a realistic planning case study reveal a number of interesting insights when the proposed multiobjective optimization framework is benchmarked against traditional approaches:
1) By focusing on satisfying the KPI demands of individual user services (rather than homogeneous network-wide KPIs) and utilizing user distributions derived from realistic data, the proposed framework not only satisfies demand more effectively but also does so with higher cost-efficiency.
2) The candidate site selection is critical in ensuring computational tractability of hyperdense planning and optimization. The candidate site selection in the framework is also informed by contextual data and is based on multiple criteria, thus achieving better performance (demand satisfaction) and cost-efficiency compared to candidate site selection based on a single criterion (e.g. cost).
The rest of the paper is organized as follows. Section II presents a state-of-the-art survey of network planning and optimization approaches and identifies trends and gaps to be considered for the holistic planning framework proposed herein. Thereafter, Section III outlines the system model of the proposed framework and Section IV presents the case study to evaluate the framework against commonly adopted approaches. The results obtained from the case study are then analyzed in Section V. Finally, the concluding discussions and potential future research directions are presented in Section VI. A list of the acronyms used in this paper is presented in Table 1.

II. RELATED WORKS AND GAPS

A. ANALYSIS OF TRENDS
As noted previously, the need for holistic planning frameworks is motivated by several technology and deployment trends. An analysis of the trends is provided prior to a summary survey of the pre-5G/5G network planning research works, some of which aspire towards holistic planning approaches.

1) HETEROGENEOUS SMALL CELL OWNERSHIP MODELS
Small cell deployments in pre-5G networks have been mostly operator-led (in the same way as macro deployments), with an operator responsible for site acquisition, infrastructure deployment and site maintenance. This approach provides network planning autonomy for the operator in terms of small cell placement decisions. However, ongoing hyperdensification makes it commercially unsustainable for operators to meet growing service demands from their own small cell deployments, thus motivating alternative small cell ownership or sharing models [15]. Site sharing through commercial agreements has been prevalent in macro deployments and has gained traction for small cell deployments. Moreover, small cells deployed by non-operator third parties (neutral hosts) are projected to contribute the majority of deployed small cells after 2023 [9]. These neutral hosts (e.g. public venue owners, municipalities etc.) leverage their infrastructure (e.g. buildings, utility poles, advertisement panels etc.) to deploy small cells and provide them under a small-cells-as-a-service model for use by one or more operators [15], [18]. Another interesting but relatively niche model is that of operator-branded but user-deployed small cells for closed-access residential use, analogous to traditional private Wi-Fi deployments [19]. From a network planning perspective, neutral host small cells present a planning constraint in terms of possible candidate site locations and operator decisions on whether to deploy or share small cells in specific locations.

2) ADOPTION OF DATA-DRIVEN NETWORK PLANNING PARADIGM
The increased availability of information on the time-varying context of target planning areas is enabling enhanced precision or accuracy of network planning and optimization processes [14]. This contextual information is typically data that provides a realistic representation of the attributes that influence network planning decisions [16]. This includes information on the 3D radio propagation environment (buildings, terrain, and other obstacles), spatiotemporal demand distribution (user locations and services consumed), KPI requirements for different services, availability of support infrastructure (site facilities, backhauling, energy sources etc.) and techno-economic parameters (e.g. TCO, average revenue per user, market share etc.). In the past, mobile network planners have circumvented the lack of realistic data by simplifying the planning process through approximations derived from accumulated planning experience, or by leveraging synthetic data obtained from mathematical models for radio propagation, traffic distribution, and so on. However, the requirement for high-precision planning for hyperdense 5G networks [14], advances in data analytics [16], and the increased access to relevant contextual datasets (e.g. public sector Open Data [20]) are shaping the trend towards adoption of a more data-driven paradigm in 5G network planning [16], [21]-[23].

3) INCREASED COMPLEXITY IN THE NETWORK PLANNING PROBLEMS
Detailed mobile network planning is essentially a practical optimization problem of evaluating an optimal topology while considering two or more mutually conflicting objectives. For instance, a commonly encountered conflict in network densification is the one between minimizing costs (number of base stations) and maximizing achievable performance (frequency reuse gains). In such situations there is no single solution that optimizes all the objectives simultaneously. A common approach for solving these problems is the use of multiobjective optimization methods that produce a set of optimal trade-off solutions known as the Pareto set or Pareto front of solutions [24], [25]. However, solving these multiobjective problems is complex: they are typically NP-hard, meaning that no polynomial-time algorithm is known for finding their optimal solutions. In the case of detailed planning of hyperdense 5G networks, the complexity of multiobjective optimization is underscored by the inherent heterogeneity in the system and the addition of optimization objectives or constraints that may have been considered insignificant in pre-5G network planning (e.g. minimizing electromagnetic field (EMF) exposure levels [26]). Moreover, the need to plan for an increasingly large number of small cell deployments, together with the exploitation of big data resources for planning, places high demands on computational resources, necessitating innovative problem formulations that stay within feasible computational bounds.
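The notion of a Pareto front can be illustrated with a small sketch. The code below is a brute-force dominance filter over already-evaluated candidate topologies, not the optimization method of [24], [25] or of this article; the two objectives (site count and fraction of unserved demand, both minimized) and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]  # every objective is to be minimized


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: Sequence[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates (O(n^2) filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical topologies: (number of sites, fraction of unserved demand).
    topologies = [(10, 0.30), (20, 0.10), (20, 0.20),
                  (30, 0.15), (40, 0.02), (50, 0.02)]
    for sites, unserved in pareto_front(topologies):
        print(f"{sites} sites, {unserved:.0%} unserved demand")
```

The filter discards, for example, the 50-site topology because the 40-site one achieves the same unserved demand at lower cost. The hard part in practice is generating good candidate topologies, which is where the complexity discussed above arises.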

4) MIGRATION TO A SERVICE-ORIENTED NETWORK PARADIGM
The planning of pre-5G networks has traditionally been underpinned by some common network-wide performance targets (e.g. cell edge rates, coverage probabilities, etc.) with minimal consideration of the differentiation between targeted services. This "one-size-fits-all" approach for the legacy systems is not suitable for the new, increasingly diverse service requirements placed on mobile networks. Flexibility is, among others, now a very important requirement for a more service-oriented paradigm. To that end, 5G networks are service or use case driven, essentially implying that they are engineered to be flexible connectivity platforms that simultaneously support a multitude of services with differing requirements and deployment options [4]. This service differentiation manifests not just in terms of KPI requirements (e.g. data rates, traffic densities, latency, reliability, mobility, security etc.), but also in terms of functional requirements of the service (e.g. positioning, caching, computing, security etc.). Network slicing is the main enabler for this new service-oriented paradigm [27]. With 5G network slicing, a mobile operator can build different end-to-end logical networks (slices) on a common and shared network infrastructure (network resources and functions). The instantiated network slices must satisfy certain network performance and function requirements of each service instance. Furthermore, the network management and orchestration systems will continuously monitor and adapt the performance of network slices such that the requirements of the service are met throughout the session. A notable development in this context is the set of open interfaces and intelligent RAN controllers specified by the O-RAN Alliance, which enable external applications to have assurance of RAN slice KPIs [28].
From a 5G network planning perspective, this service-oriented paradigm also necessitates formulating the optimization objectives in terms of the fulfillment or satisfaction of the requirements of the different types of services envisioned for the network. Indeed, satisfaction-based optimization approaches have appeared in recent studies on Wi-Fi access point selection [29] and resource allocation in 5G sliced networks [30]-[32].
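The difference between a satisfaction-based objective and a network-wide target can be sketched as follows. This is a simplified illustration and not the formulation used in [29]-[32] or in this article: the per-user record, the two KPIs considered (data rate and latency) and all names and values are assumptions.

```python
from typing import NamedTuple, Sequence


class UserService(NamedTuple):
    """Required KPIs of one user service and the KPIs a topology delivers."""
    required_rate_mbps: float
    max_latency_ms: float
    achieved_rate_mbps: float
    achieved_latency_ms: float


def is_satisfied(user: UserService) -> bool:
    """A user is satisfied only if every one of its KPI requirements is met."""
    return (user.achieved_rate_mbps >= user.required_rate_mbps
            and user.achieved_latency_ms <= user.max_latency_ms)


def satisfaction_ratio(users: Sequence[UserService]) -> float:
    """Fraction of users whose individual requirements are all met."""
    if not users:
        return 0.0
    return sum(is_satisfied(u) for u in users) / len(users)


def mean_rate_mbps(users: Sequence[UserService]) -> float:
    """Network-wide average rate, the kind of target used in legacy planning."""
    return sum(u.achieved_rate_mbps for u in users) / len(users)


if __name__ == "__main__":
    # Hypothetical users with different service requirements.
    users = [
        UserService(25.0, 100.0, 80.0, 30.0),   # video streaming
        UserService(0.1, 1000.0, 1.0, 200.0),   # IoT sensor
        UserService(50.0, 20.0, 40.0, 35.0),    # augmented reality
        UserService(2.0, 10.0, 10.0, 25.0),     # remote control
    ]
    print(f"mean rate: {mean_rate_mbps(users):.2f} Mbit/s")
    print(f"satisfied users: {satisfaction_ratio(users):.0%}")
```

In this toy example the average rate looks healthy, yet only half of the users are satisfied, because the latency-sensitive services miss their targets. A planning objective built on the satisfaction ratio would expose this, whereas a network-wide average would not.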

5) HETEROGENEITY OF RAN TECHNOLOGIES AND ARCHITECTURES
The 5G NG-RAN is characterized by heterogeneity. The interworking approach for multiple RATs (pre-5G, 5G, beyond 5G, non-3rd Generation Partnership Project (3GPP) etc.) is embedded in the NG-RAN specification, allowing for continuity of support of diverse user device technologies and service types through intersystem roaming, dual connectivity, service exposure, network or resource sharing and so on [17]. Furthermore, virtualization of network functions in the NG-RAN enables disaggregation and functional decomposition of the 5G base station, or next-generation node B (gNB), into a central unit (CU), distributed unit (DU) and radio unit (RU) [17]. As a result, in addition to traditional integrated gNB deployments, different cloud-RAN (C-RAN) architectures are possible depending on the CU/DU/RU functional splits adopted and their placement within the NG-RAN. Additionally, the separation of the control plane and user plane allows their independent scaling and separate placement within the network. This deployment flexibility is attributed to the fact that most of those functions are hosted as software services and can be dynamically instantiated at different parts of the network, such as the radio site, the edge or even remote data centers [33]. Essentially, this software reconfigurability enhances the automation and programmability of the overall 5G network and caters to the flexibility demands of network slicing. For a network operator or service provider, the actual placement decisions for different network functions are typically influenced by factors such as KPI targets, cost, backhaul/fronthaul capacity, deployment scenario needs or business requirements. The function placement flexibility and considerations also include placement of some core network functions and application layer functions (e.g. edge computing, caching etc.) in the RAN.
Notably, the placement of the core network's user plane function (UPF) within or on the edge of the RAN is a key enabler for provisioning slices with stringent latency requirements [33]. These heterogeneous aspects of the 5G RAN impact network planning not just in terms of interworking between systems but also in terms of the implications of different architectures, particularly in a brownfield network planning process that aims to optimize new 5G deployments as a complement to legacy pre-5G infrastructure.

B. IDENTIFIED GAPS
Multiobjective optimization frameworks for network planning have been discussed in a number of previous publications. Table 2 acknowledges the advances made in the literature, but also highlights the gaps and how they are addressed in this paper.

1) RADIO ACCESS TECHNOLOGIES AND ARCHITECTURE
As shown in Table 2, the majority of research has focused on pre-5G network planning for single-RAT heterogeneous networks from the perspective of multiple cell types or layers (macro and small cells). Furthermore, most 5G network planning studies are underpinned by greenfield deployment assumptions, with no consideration of interworking with legacy pre-5G deployments in the problem formulation.
Considerations for Proposed Framework: Network planning has to take into account the flexibility of the NG-RAN architecture and its inherent heterogeneity, as noted previously in Section II-A5. Notably, there has recently been a steady increase in network planning research works [34], [35], [42], [50]-[52] that account for the disaggregated deployment of gNB functions (RUs, DUs, CUs) across different physical locations. This typically results in network planning problems where the placement of small cell functions (DU and CU) is optimized at potential tiered-level locations, represented as RAN edge nodes or processor pool sites, subject to constraints on backhaul/fronthaul capacity, energy, cost and so on [33]. However, these works do not provide a detailed evaluation of RAN performance or consider spatiotemporal user and demand distributions. With the new NG-RAN architectures, there is a noted need for network planning and optimization to consider end-to-end performance, including radio access, edge and transport.

2) RADIO PROPAGATION MODELLING
The pathloss coverage map contains information on the pathloss at different locations in a target planning area with respect to specific base station locations and their antenna parameters (antenna pattern, tilt, height, etc.). As such, pathloss maps of high accuracy and resolution are required for optimizing coverage and minimizing interference in the planning phase. The pathloss predictions are obtained using propagation models that can be roughly split into empirical models and deterministic models [53], [54]. Empirical models are derived from measurements performed in specific locations and re-calibrated for use in other target planning areas. This allows propagation modelling to be carried out with low computational effort, although at least rough geospatial data of the target planning area is required. Due to their simplicity of use, empirical models are a regular choice, particularly in pre-5G network planning, as noted in Table 2. However, this simplicity and ease of reuse comes with the disadvantage of limited modelling accuracy. On the other hand, deterministic modelling typically relies on ray tracing algorithms to compute radio propagation in multipath environments. The propagation modelling accuracy is much higher than for empirical models, but at the cost of higher computational effort and the need for detailed 3D geospatial data of the target area. A notable exception is geometry-based stochastic modelling, which overcomes geospatial data unavailability by using predefined stochastic distributions to recreate scatterers [54].
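A minimal sketch of how a pathloss map might be assembled with a simple empirical-style model is given below. It uses the generic log-distance model as a stand-in, which is an assumption and not one of the specific models surveyed in Table 2; antenna patterns, tilts and heights are deliberately ignored, and the site names, grid and parameter values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) position in metres


def log_distance_path_loss_db(distance_m: float, freq_mhz: float,
                              exponent: float,
                              ref_distance_m: float = 1.0) -> float:
    """Log-distance path loss with a free-space loss at the reference distance."""
    d = max(distance_m, ref_distance_m)  # avoid log of zero near the site
    ref_loss_db = (20.0 * math.log10(freq_mhz)
                   + 20.0 * math.log10(ref_distance_m / 1000.0) + 32.45)
    return ref_loss_db + 10.0 * exponent * math.log10(d / ref_distance_m)


def path_loss_map(sites: Dict[str, Point], grid: List[Point],
                  freq_mhz: float, exponent: float) -> Dict[str, List[float]]:
    """Path loss in dB from every site to every grid point."""
    return {
        name: [log_distance_path_loss_db(math.dist(pos, p), freq_mhz, exponent)
               for p in grid]
        for name, pos in sites.items()
    }


def best_server(pl_map: Dict[str, List[float]]) -> List[str]:
    """For each grid point, the site with the lowest path loss."""
    names = list(pl_map)
    n_points = len(next(iter(pl_map.values())))
    return [min(names, key=lambda s: pl_map[s][i]) for i in range(n_points)]


if __name__ == "__main__":
    sites = {"site_a": (0.0, 0.0), "site_b": (100.0, 0.0)}
    grid = [(10.0, 0.0), (50.0, 20.0), (90.0, 0.0)]
    pl = path_loss_map(sites, grid, freq_mhz=3500.0, exponent=3.5)
    print(best_server(pl))
```

A deterministic (ray tracing) model would replace only the per-link loss function, at a much higher cost per evaluated link, which is why map resolution and the number of candidate sites matter for computational effort.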
Considerations for Proposed Framework: Deterministic models are increasingly popular for propagation modelling in 5G and beyond systems, regardless of the challenges noted above [54]. This is attributed to factors including the flexibility of accurately modelling channels with carrier frequencies ranging from sub-6 GHz to even terahertz bands and with different carrier bandwidths, as well as the need to evaluate advanced 5G multiantenna techniques (e.g. beamforming) that place high demands on channel spatial resolution. Furthermore, deterministic models provide the flexibility of modelling time-varying channels with mobile radio transmitters and receivers, as well as scattering objects in the surrounding vicinity.

3) CANDIDATE SITE SELECTION
In practice, the deployment of base stations is only feasible in a limited number of candidate site locations within a given planning area, due to constraints such as the availability of site facilities (e.g. grid power, fixed line infrastructure etc.), deployment costs, permits for civil work and right of access.
The type of base station or cell also dictates the number of possible sites. Macro base station antennas are typically deployed 20-60 m above ground and require erection of radio towers or deployment on rooftops of tall buildings. On the other hand, outdoor small cells are usually deployed closer to the user, "below rooftop level" several meters above ground, and present more diverse ownership models (as noted previously in Section II-A1), which results in equally diverse types of site locations.
Considerations for Proposed Framework: From a network planning perspective, knowledge of the available candidate sites is critical in ensuring the practical feasibility of topologies computed by network planning algorithms. It is noted in Table 2 that the use of candidate site locations has been limited to leveraging existing pre-5G sites (mostly macro sites) as input for network planning, whereas candidate site locations for small cells are usually selected arbitrarily, with little or no practical consideration. With the onset of hyperdensification, careful consideration of small cell candidate sites will be critical for evaluating overall deployment costs and ensuring that small cells have the prerequisite access to high capacity backhaul and energy sources.

4) USER DISTRIBUTION AND DEMAND DEFINITION
Macro site deployment is typically planned to provide blanket coverage over a given planning area. The supplementary deployment of small cells, in turn, closely correlates with the expected spatiotemporal traffic distribution, whereby denser deployments occur in areas with significant periods of high traffic density. The spatiotemporal traffic distributions are determined by variations in user distributions and their respective demands, where user distribution refers to the actual spatiotemporal distribution of users in the planning area, and demand represents the actual KPI value (e.g. throughput) required to support each user service. In most studies, probability distributions (e.g. Poisson, uniform etc.) provide a convenient way of generating user distributions for network planning (see Table 2). Yet, more realistic user distributions are obtained when population data is applied. Similarly, the definition of service demand has usually been simplified to a few arbitrarily chosen target values, or it is simply replaced by network performance measures (e.g. cell edge or center rates) that provide the minimum performance that a user should achieve to support all services envisioned in the planning phase.
Considerations for Proposed Framework: Simple models may work in conventional macro-only networks but result in sub-optimality as network density increases with the introduction of small cells. To that end, spatiotemporal traffic data from existing pre-5G networks [22], [55], [56] provides a useful resource for optimizing hyperdense 5G and beyond networks. Sets of practical network data provide dynamic user distributions (e.g. hourly user/service dynamics) and service demand trends that can be used to accurately predict future traffic demands for a given service. Furthermore, with the emergence of slice-based service-oriented networks, the definition of service demand should be linked to the specific requirements of network slices (e.g. as noted in [30], [47]).
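The contrast between synthetic and data-driven user distributions can be sketched as follows. The example places a fixed number of users either uniformly over grid cells (the distribution a homogeneous Poisson process yields once the number of users is fixed) or in proportion to per-cell population counts. The cell names and population figures are hypothetical, and the approach is only an illustration, not the procedure used in this article.

```python
import random
from collections import Counter
from typing import Dict, Hashable, Iterable


def place_users_uniform(cells: Iterable[Hashable], n_users: int,
                        rng: random.Random) -> Counter:
    """Synthetic baseline: every grid cell is equally likely for each user."""
    return Counter(rng.choices(list(cells), k=n_users))


def place_users_by_population(population: Dict[Hashable, float],
                              n_users: int, rng: random.Random) -> Counter:
    """Data-driven variant: cells are drawn in proportion to their population."""
    cells = list(population)
    weights = [population[c] for c in cells]
    return Counter(rng.choices(cells, weights=weights, k=n_users))


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed for repeatable illustration
    # Hypothetical population counts per grid cell.
    population = {"park": 100, "residential": 100, "shopping_mall": 800}
    print("uniform:   ", dict(place_users_uniform(population, 1000, rng)))
    print("population:", dict(place_users_by_population(population, 1000, rng)))
```

With the population-weighted draw most users fall in the busiest cell, which is where small cells would be needed, whereas the uniform draw spreads users evenly and hides the hotspot. Hourly traffic data could be used in the same way by swapping the weights per time period.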

5) COST MODELLING
Although small cell site deployment and operation costs are low (compared to macro sites), they may represent a significant fraction of the overall network TCO due to increased densification. Therefore, cost modelling is a critical part of the network planning process, ensuring that the obtained network topology meets performance targets within allowable or sustainable cost constraints. In most network planning studies of Table 2 the cost has been abstracted and represented by the number of sites, essentially with the assumption that all sites incur the same cost.
Considerations for Proposed Framework: The non-parametrized cost modelling described above does not account for the heterogeneity of candidate site locations and the related cost differences due to the presence (or lack) of permits, built structures and site facilities (e.g. powering, backhaul etc.), as well as the ownership models noted in Section II-A1. This calls for more explicit modelling of all cost factors in 5G hyperdense deployments. Furthermore, the transition towards C-RAN architecture underlines the need for careful cost modelling that includes not just radio sites but also the costs of related radio edge deployments and transport infrastructure, which may for instance influence the kind of split architectures adopted in a particular part of the network [35], [50]-[52].

6) OPTIMIZATION OBJECTIVES AND APPROACHES
In earlier discussions it was noted that mobile network planning problems usually entail the evaluation of optimal topologies under two or more mutually conflicting objectives. To that end, the common optimization trade-off noted in Table 2 has been between maximizing the achievable performance and minimizing the cost. As noted previously, the cost has in most cases been represented indirectly by the number of small cell radio sites.
Considerations for Proposed Framework: The transition toward flexible cloud-based 5G RAN architectures that tightly integrate access with backhaul/fronthaul [57] further underlines the usefulness of optimizing end-to-end performance and cost aspects in hyperdense deployments. Furthermore, the use of network slicing increases the need for versatility in the formulation of optimization objectives, which should not be limited to throughput-related targets but should also satisfy the other KPIs (e.g. latency) included in the slice definition.

III. SYSTEM MODEL
The holistic planning framework proposed in this paper is depicted in Fig. 1. It consists of three core parts, namely data collection, data driven analysis and multiobjective optimization. The framework takes relevant contextual data of the target planning area as an input and provides Pareto optimal networks as an output. A comparative performance evaluation is performed over the realized networks to understand the trade-offs between satisfaction and economic indicators. The aim of the evaluation is to support the operator's decision on network deployment. The framework is tuned for brownfield planning in target areas with pre-existing pre-5G network deployments. Yet, it can also be applied to greenfield 5G planning with modifications to the models that are based on existing network data.
Objectives for core parts of the framework and related methodologies are described as follows.

A. DATA COLLECTION
The effectiveness of data-driven heterogeneous hyperdense network planning is highly dependent on the availability and quality of contextual data of the target planning area. This includes geospatial data, data describing existing networks deployed in the area, data on subscribers and services supported by those networks, and cost and revenue related data [14]. This input data is collected by the operator from the network, as well as from external data sources including over-the-top service providers and public data [22]. The quality of the gathered data depends on its ability to provide an accurate understanding of users' demand, candidate sites and radio propagation in the planning area. The most important data sets for the framework are briefly described as follows.

GEOSPATIAL DATA
This term refers to data providing a digital representation of geospatial phenomena, such as terrain, buildings, roads/rails, vegetation and other relevant features that affect radio propagation and demand distribution. Such data can be obtained from 2D/3D maps, built infrastructure plans for the area and other geographical information data sources.

EXISTING NETWORK DATA
We need existing network topology, configuration and coverage data, but also spatiotemporal distribution of users/devices and their traffic, obtained from the operator's network management system (NMS).

SERVICE DATA
The demand distribution is one of the key planning inputs. To formulate it, we need the service data from operator's existing network and KPI requirements defined for future services.

COST AND REVENUE DATA
Network TCO is a sum of various factors including the cost of required network upgrades, equipment, installation, new site acquisition and/or rental, powering, fronthaul/backhaul, maintenance and licensing [58]. Data that describes these costs needs to be surveyed in the context of the planning area and targets. We note that the costs of using street furniture and site sharing are also relevant when predicting the overall network cost. Finally, data that provides insights on the spatiotemporal distribution of current or projected revenues is also important in ensuring ROI.

B. DATA DRIVEN ANALYSIS
This analysis is performed to produce service demand distribution, to identify candidate sites for small cells and to compute propagation.

1) GEOSPATIAL MODELING
Geospatial modeling of the planning area is a key input needed for the accurate propagation computation. The terrain, vegetation, buildings and other 3D structures need to be modeled in detail to accurately capture their impact on the radio propagation. The impact of the applied radio frequency can be captured well only if accurate geospatial models are developed with thorough understanding of the propagation characteristics [54].
The required maps are created by combining the available geospatial data with the future built infrastructure development plan of the area, taking into account the input requirements of the applied propagation computation tool. The maps are implemented using 3D map editors, for instance the WallMan module of the WinProp software [59].

2) PROPAGATION PREDICTION
The performance characterization of 5G NR is highly sensitive to the choice of the propagation model, especially at mmWave frequencies [60]. Previously in Section II-B2, it was noted that the use of deterministic 3D ray-tracing methods and high-accuracy geospatial maps provides a useful means for radio propagation prediction across 5G low, mid and high bands [54], [61]-[63].
Propagation predictions are computed for existing cells and all small cell candidate sites. We denote the resulting average channel power response by $\Gamma \in \mathbb{R}^{N_t \times N_a}$, where $N_t = N_m + N_c$, $N_m$ is the number of existing cells, $N_c$ is the number of candidate sites and $N_a$ is the number of area elements (pixels) in the planning area $A$.

3) USER AND DEMAND DISTRIBUTION
As satisfying users' demand is a key objective of the proposed network planning framework, users and their demand distribution need to be modeled as realistically as possible. Typically, planning studies assume that users are distributed uniformly or according to a Poisson point process, while the user density is derived from population density or by some other means, as noted previously in Section II-B4. Yet, better accuracy can be achieved if the user distribution is obtained from the user distribution data of the existing NMS [6].
Conventionally, the NMS tracks the number of users attached to different base stations (BSs), with a resolution that depends on the cell sizes. Besides this cell-level information on user distribution, current network monitoring tools can provide spatiotemporal user distribution statistics per pixel with relatively high resolution [64]. In urban areas, such a tool typically achieves around 50 m × 50 m pixel resolution, which can be enhanced to 20 m × 20 m pixel resolution by applying machine learning techniques [14].
In Fig. 2 we propose an approach to obtain the user and demand distribution. The approach starts either from the number of users per cell or per pixel. In the former case, we apply the pixel-level cell coverage map from the NMS or we simulate it. The instantaneous number of users is then drawn from a Poisson distribution parametrized by these statistics.
Users' service demand can be expressed in terms of indicators such as throughput, reliability (e.g. block error rate) or latency. The demand definition should reflect a thorough understanding of user behaviour, services and their requirements. Moreover, as noted previously in Section II-A4, in the network slicing context this demand definition provides the basis for the definition of user or service-specific slice templates. In Fig. 2 we adopt a simple approach based on a service clutter map that is generated from the targeted services and their spatial distribution, which can be obtained from the operator's planning targets.
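As a minimal sketch of the cell-level branch described above, assume that the mean number of users of a cell is spread over the pixels it covers in proportion to relative density weights, and that the instantaneous count in each pixel is then drawn from a Poisson distribution. The function names, the weights and the proportional split are illustrative assumptions and are not taken from Fig. 2.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method (suitable for small means)."""
    if mean <= 0:
        return 0
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def users_per_pixel(mean_users_in_cell, pixel_weights, rng):
    """Split a cell-level mean over the cell's pixels and sample instantaneous user counts.

    pixel_weights are hypothetical relative user-density weights (e.g. from a clutter map)
    for the pixels covered by the cell.
    """
    total = sum(pixel_weights)
    if total == 0:
        return [0] * len(pixel_weights)
    means = [mean_users_in_cell * w / total for w in pixel_weights]
    return [poisson_sample(m, rng) for m in means]


rng = random.Random(1)  # fixed seed so that the snapshot is reproducible
snapshot = users_per_pixel(40.0, [1.0, 3.0, 0.0, 6.0], rng)
print(snapshot)
```

Repeating the draw with different seeds yields the independent snapshots needed for a Monte Carlo evaluation; a pixel with zero weight never receives users.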

4) SPATIAL USER SATISFACTION
User satisfaction is a measure of the match between the user demand and the network's ability to fulfil that demand with respect to certain performance indicators such as data throughput and latency. The match varies between locations since neither the signal coverage nor the user distribution is even. Let $A$ be the area where the network planning is carried out and let $I(u, x)$ refer to a performance indicator value for a user $u$ in the location $x \in A$. Then we define the user satisfaction as

$$S(u, x) = \frac{I(u, x)}{I_d(u)}, \tag{1}$$

where $I_d(u)$ is the performance demanded by the user $u$.
We make the following notes on the measure (1):
• A user is optimally satisfied in location $x$ if $S(u, x) = 1$. Otherwise, the user either suffers from service underprovision ($S(u, x) < 1$) or enjoys overprovision ($S(u, x) > 1$). Both cases are suboptimal from the network operator's perspective.
• While $S(u, x)$ represents the satisfaction of a certain user in a certain location, in network studies we create a user satisfaction statistic that is denoted by $S$. The spatial user satisfaction can be obtained from $S$ by restricting the observation to the values of $S$ in some limited geographical area such as a pixel.
• The network performance target can be defined in terms of satisfaction outage, which occurs when $S(u, x) < 1$. Let $N_u$ be the number of all users in the statistic. Then we define

$$N_{out} = \left|\{u : S(u, x) < 1\}\right|, \tag{2}$$

and the satisfaction outage reads as $S_{out} = N_{out}/N_u$. We now set a target $S_{out} < p$, where $p$ defines the target probability of an outage.
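The measure (1) and the outage target can be illustrated with a few lines of code. The sketch below assumes a throughput-type indicator (larger is better); the user values and the target p = 0.6 are hypothetical.

```python
def satisfaction(achieved, demanded):
    """Per-user satisfaction S = I / I_d, following (1)."""
    return [i / d for i, d in zip(achieved, demanded)]


def satisfaction_outage(s_values):
    """Share of users with S < 1, i.e. S_out = N_out / N_u."""
    n_out = sum(1 for s in s_values if s < 1.0)
    return n_out / len(s_values)


achieved_mbps = [12.0, 50.0, 30.0, 8.0]   # hypothetical simulated throughputs
demanded_mbps = [10.0, 50.0, 60.0, 16.0]  # hypothetical user demands
s = satisfaction(achieved_mbps, demanded_mbps)
s_out = satisfaction_outage(s)
target_met = s_out < 0.6                  # hypothetical outage target p = 0.6
print(s, s_out, target_met)
```

In this example the first user is overprovisioned, the second is exactly satisfied and the last two are in outage, so half of the users are in outage.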

5) SELECTION OF CANDIDATE SITES
It was noted in Section II-B3 that the selection of candidate sites is one of the key steps in ensuring the practical feasibility of deployments and the computational tractability of the network planning and optimization problems. While the complexity of the network planning and optimization problem increases with the number of candidate sites and the target site density, it can be alleviated by deliberately eliminating impractical small cell site locations from the overall candidate site set. This elimination can be based e.g. on location specific costs and local user dissatisfaction. We propose a small cell candidate site selection methodology that efficiently produces a trade-off between these conflicting interests. Fig. 3 represents an approach that can be used to select candidate site locations based on site deployment cost and spatial user satisfaction. Note that the site deployment cost is computed from the collected cost data and is a sum of many factors. It includes the costs of building the last mile optical backbone, equipment installation, and site rental and/or acquisition [58]. Furthermore, the site deployment cost may include the reduction that can be obtained through site sharing or by using low-cost site locations, e.g. on street furniture.
A location can become a candidate if its site deployment cost is less than a given cost cap and the local user satisfaction is below the target level. The resulting number of candidate sites is determined by the values we set for the cost cap and the target level of satisfaction. To further reduce the computational complexity, the resulting candidate site set can optionally be filtered by enforcing a certain minimum mutual distance among the candidate sites.
When satisfaction is measured in terms of latency, we also need to know the transport network topology, that is, where the sites for processing pools are located and what the transport technology is.
The contribution of radio propagation time to the end-to-end (e2e) latency is negligible. It was previously noted in Section II-A5 that in C-RAN deployments with functional decomposition, the small cell would include DU and/or CU functions placed in different processor pool or radio edge node locations depending on the adopted functional splits in the RAN [17], [33], [35], [50], [52]. The e2e latency in the transport network is the sum of delays due to transmission, traffic switching and processing, as well as interface encapsulation that depends on the adopted functional splits and placements (including the UPF of the core network) [33]. Given the sites for the pools and all possible placements of the network functions, we can enumerate all possible e2e paths from each candidate site, each incurring its own e2e latency.
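The selection rule above (cost below the cap, local satisfaction below the target, optional minimum mutual distance) can be sketched as follows. The dictionary keys, the coordinates and the order in which locations are visited during distance filtering are assumptions made for illustration; the thresholds 0.53 and 0.45 are the cost cap and target satisfaction quoted later in the case study.

```python
import math


def select_candidates(sites, cost_cap, target_satisfaction, min_distance_m=0.0):
    """Keep locations that are cheap enough and locally under-served.

    sites: list of dicts with hypothetical keys 'xy' (metres), 'cost' (normalized
    site deployment cost) and 'satisfaction' (local user satisfaction).
    """
    eligible = [s for s in sites
                if s["cost"] < cost_cap and s["satisfaction"] < target_satisfaction]
    # Assumed visiting order: the least satisfied locations are considered first.
    eligible.sort(key=lambda s: s["satisfaction"])
    kept = []
    for site in eligible:
        # Optional filter: drop a location that is too close to an already kept one.
        if all(math.dist(site["xy"], k["xy"]) >= min_distance_m for k in kept):
            kept.append(site)
    return kept


sites = [
    {"xy": (0.0, 0.0), "cost": 0.30, "satisfaction": 0.20},
    {"xy": (10.0, 0.0), "cost": 0.40, "satisfaction": 0.25},   # too close to the first
    {"xy": (100.0, 0.0), "cost": 0.60, "satisfaction": 0.10},  # above the cost cap
    {"xy": (200.0, 0.0), "cost": 0.20, "satisfaction": 0.90},  # already satisfied
    {"xy": (300.0, 0.0), "cost": 0.50, "satisfaction": 0.40},
]
chosen = select_candidates(sites, cost_cap=0.53, target_satisfaction=0.45,
                           min_distance_m=30.0)
print([s["xy"] for s in chosen])
```

Lowering the cost cap or the target satisfaction shrinks the candidate set, which is how the candidate site density can be tuned.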

C. MULTIOBJECTIVE OPTIMIZATION
It is recalled that in brownfield scenarios, densification through small cell deployments occurs as a complement to existing networks. The main planning challenge is to define the best set of small cell site locations when the number of candidate site locations is large and there are multiple performance targets. Multiobjective optimization provides a natural methodology for this problem setup. Accordingly, we apply herein a metaheuristic multiobjective optimization method, whereby we start from a set of possible network deployments, simulate the system with the selected deployments, compute the values of the objective functions and reformulate the networks until the stopping criterion of the optimization is fulfilled [49]. In each phase we leverage the input planning data described in previous sections.

1) SYSTEM SIMULATION
To produce and rank the possible network realizations based on user satisfaction, Monte Carlo simulation is performed. Let us denote a possible network realization by a vector $x = (\mathbf{1}, x_c)$, where $\mathbf{1}$ refers to the existing network and $x_c \in \{0, 2, 3, \ldots, N_o + 1\}^{N_c}$ contains the prospective small cell extension. $N_o$ is the number of options for candidate sites in terms of small cell type and e2e path. For a site, the value 0 means no small cell placement, while each remaining value refers to a specific small cell type (e.g. femtocell, picocell, microcell etc.) and e2e path from the radio site to the UPF. The system simulation for a network $x$ is performed as follows.
First, let $P_r \in \mathbb{R}^{N_t}$ denote the cells' transmission power vector on the downlink reference channel used for cell selection. Then the received reference channel power matrix $R_r \in \mathbb{R}^{N_t \times N_a}$ is obtained as

$$R_r = \operatorname{diag}(x \circ P_r)\,\Gamma, \tag{3}$$

where $\circ$ refers to the Hadamard product and $\Gamma$ denotes the average channel power response obtained from the propagation prediction of Section III-B2. Using the users' locations, the reference power matrix $R_r$ can be mapped to the matrix $R_r^u \in \mathbb{R}^{N_t \times N_u}$ that contains the users' received powers on the reference channel, where $N_u$ is the number of users. The user cell association is then performed based on the best received power criterion. To ensure offloading from existing cells towards small cells, some bias can be added to the cell association procedure. Now, for the $i$th user, the serving cell $c_i^*$ is the cell with the highest (possibly biased) received reference power,

$$c_i^* = \arg\max_{c} R_r^u(c, i). \tag{4}$$

Once users are attached to cells, a binary serving cell matrix $C \in \{0, 1\}^{N_t \times N_u}$ can be defined. That is, for the $i$th user $C(c_i^*, i) = 1$ and $C(c, i) = 0$ otherwise. We denote by $\bar{C}$ the complement of $C$. Further, the vector containing the users' SINRs (per resource block) is computed using the formula

$$\mathrm{SINR} = \operatorname{col}(C \circ R_d^u) \oslash \left(\operatorname{col}(\bar{C} \circ R_d^u) + P_n\right), \tag{5}$$

where $\oslash$ is the Hadamard division and $\operatorname{col}$ refers to the columnwise summation. Moreover, $P_n$ is the white noise power and $R_d^u$ contains the users' received powers on the data channel. Now $R_d = \operatorname{diag}(x \circ P_d)\,\Gamma$, where $P_d$ contains the cells' data channel transmission powers. We note that the SINR values from (5) are applied to compute performance indicators such as throughput and reliability using a mapping table or function for the applied radio technology [65], [66]. For instance, the user throughput vector can be obtained similarly as in [66], using the formula

$$T = B_{prb}\, B_{eff}\, N_{prb} \circ N_{mimo} \circ \log_2\!\left(1 + \mathrm{SINR}/\mathrm{SINR}_{eff}\right), \tag{6}$$

where $B_{prb}$, $B_{eff}$, $\mathrm{SINR}_{eff}$, $N_{prb}$ and $N_{mimo}$ are the bandwidth per PRB, bandwidth efficiency, SINR efficiency, number of PRBs vector and number of MIMO layers vector, respectively.
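The simulation chain described above (best-server association with optional bias, SINR as serving-cell power over other-cell power plus noise, and a modified Shannon mapping to throughput in the manner of [66]) can be sketched with plain Python lists. All numerical values are hypothetical, powers are assumed to be linear (mW), the bias is assumed to be given in dB, and for brevity the same received power matrix is used for the reference and data channels.

```python
import math


def associate(rx_ref, bias_db=None):
    """Best-server association: rx_ref[c][i] is the reference power of cell c at user i."""
    n_cells, n_users = len(rx_ref), len(rx_ref[0])
    bias = bias_db or [0.0] * n_cells
    serving = []
    for i in range(n_users):
        biased = [rx_ref[c][i] * 10 ** (bias[c] / 10) for c in range(n_cells)]
        serving.append(max(range(n_cells), key=biased.__getitem__))
    return serving


def sinr(rx_data, serving, noise_mw):
    """Linear SINR per user: serving-cell power over other-cell power plus noise."""
    out = []
    for i, c_star in enumerate(serving):
        signal = rx_data[c_star][i]
        interference = sum(row[i] for c, row in enumerate(rx_data) if c != c_star)
        out.append(signal / (interference + noise_mw))
    return out


def throughput_bps(sinr_lin, b_prb_hz, n_prb, n_mimo, b_eff, sinr_eff):
    """Modified Shannon mapping from SINR to throughput for one user."""
    return b_prb_hz * n_prb * n_mimo * b_eff * math.log2(1 + sinr_lin / sinr_eff)


# Two cells (rows) and two users (columns), hypothetical received powers in mW.
rx = [[1e-6, 1e-9],
      [1e-8, 1e-7]]
serving = associate(rx)
sinrs = sinr(rx, serving, noise_mw=1e-10)
# 180 kHz per PRB; the efficiency parameters are placeholders, not values from [66].
rates = [throughput_bps(s, 180e3, 100, 2, 0.6, 1.0) for s in sinrs]
print(serving, sinrs, rates)
```

Passing a bias such as `associate(rx, [0.0, 25.0])` moves the first user to the second cell, which mimics the offloading towards small cells mentioned in the text.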
Based on the serving cell matrix $C$ and the e2e paths in $x$, the latency of users can be calculated using the formula

$$T_{e2e} = T_r + T_t + T_{sw} + T_p + T_{DU} + T_{CU} + T_{UPF}, \tag{7}$$

where $T_r$, $T_t$, $T_{sw}$ and $T_p$ are the total two-way latencies due to radio propagation, fiber transmission, traffic switching and processing, and $T_{DU}$, $T_{CU}$ and $T_{UPF}$ are the latencies due to interface encapsulation resulting from the split and placement of the DU, CU and UPF network functions [35].
The user satisfaction statistic is now embedded in the vector $S = I \oslash I_d$, where $\oslash$ is the Hadamard division, $I_d$ is the vector of user demanded performance indicator values and $I$ is the indicator vector obtained from the simulation, using (6) for throughput and (7) for latency. Finally, satisfaction objectives for the multiobjective optimization can be derived from $S$, e.g. in terms of percentiles.
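As a small illustration of the latency computation, the sketch below stores the two-way delay components of each e2e path and sums them per user according to the serving cell. The class name, the field names and all delay values are hypothetical, and a plain sum of the listed components is assumed.

```python
from dataclasses import dataclass


@dataclass
class PathDelays:
    """Two-way delay components of one e2e path in milliseconds (hypothetical values)."""
    radio: float
    fiber: float
    switching: float
    processing: float
    du: float
    cu: float
    upf: float

    def total(self) -> float:
        """Sum of all delay components along the path."""
        return (self.radio + self.fiber + self.switching + self.processing
                + self.du + self.cu + self.upf)


def user_latencies(serving_cell, path_of_cell):
    """Latency of each user, taken from the e2e path of its serving cell."""
    return [path_of_cell[c].total() for c in serving_cell]


paths = {
    0: PathDelays(0.0, 0.5, 0.25, 1.0, 0.25, 0.5, 0.5),
    1: PathDelays(0.0, 1.0, 0.5, 1.0, 0.25, 0.5, 0.75),
}
latencies_ms = user_latencies([0, 1, 1], paths)
print(latencies_ms)
```

Each candidate option in the network vector would map to one such path, so changing the functional split or the pool location changes the latency of all users served by that cell.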

2) ECONOMIC OBJECTIVES
Network TCO is a sum of all site deployment costs including both location dependent and location independent costs. Most important factors include costs of equipment, installation, site acquisition or rental, backhaul, powering, maintenance and licensing.
For a small cell network extension the annual TCO can be obtained from

$$\mathrm{TCO} = \sum_{j:\, x_c(j) \neq 0} \Lambda_{sc}\big(j,\, x_c(j) - 1\big), \tag{8}$$

where $\Lambda_{sc} \in \mathbb{R}^{N_c \times N_o}$ is the cost matrix with values for having ready-made small cell options at the corresponding candidate sites. Equation (8) can be written in the form

$$\mathrm{TCO} = N_v\, \Lambda_{sc}^{i} + \sum_{j=1}^{N_v} \Lambda_{sc}^{d}(l_j), \tag{9}$$

where $N_v$ is the number of active small cells, $\Lambda_{sc}^{i}$ represents the location independent costs and $\Lambda_{sc}^{d}(l_j)$ refers to the location dependent costs of the $j$th active small cell, whose location is denoted by $l_j$. As can be seen from (9), the small cell network extension cost varies not only with the number of active cells but also with the location dependent variable costs. Thus, the number of active small cells alone is not a sufficient measure of the TCO.
Revenue is a function of both users' (satisfied) demand and their buying power. It is mainly obtained from users' service payments based on different billing models (e.g. metered, fixed-price recurring, volume-based, tiered, bundled etc.), although there might also be other indirect revenue sources. In data volume based invoicing the revenue per user is typically location dependent due to socio-economic factors. Accordingly, the network topology may affect the total revenue as well. In such a case we can apply the net present value to assess the net discounted cash flow considering both network costs and unevenly distributed opportunities to gain revenue.
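The dependence of the TCO on where small cells are placed, and not only on how many are placed, can be illustrated with a short sketch. The mapping from an option value in the network vector to a column of the cost matrix, the cost values, the cash flows and the discount rate are all assumptions chosen for the example.

```python
def annual_tco(x_c, cost_matrix):
    """Sum the option cost of every active candidate site.

    x_c[j] == 0 means no small cell at site j; a value v >= 2 is assumed to select
    the zero-based column v - 2 of cost_matrix[j].
    """
    return sum(cost_matrix[j][v - 2] for j, v in enumerate(x_c) if v != 0)


def net_present_value(cash_flows, discount_rate):
    """NPV of yearly net cash flows, with cash_flows[0] occurring at year 0."""
    return sum(cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows))


# Three candidate sites with two small cell options each (hypothetical cost units).
cost_matrix = [[1.0, 1.5],
               [0.8, 1.2],
               [2.0, 2.5]]
tco_a = annual_tco([2, 2, 0], cost_matrix)  # two cells at the two cheaper sites
tco_b = annual_tco([0, 2, 2], cost_matrix)  # two cells, one at the expensive site
npv = net_present_value([-100.0, 60.0, 60.0], 0.10)
print(tco_a, tco_b, npv)
```

Both topologies activate two small cells, yet their costs differ, which is the reason a plain cell count is a weak proxy for cost.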

IV. A CASE STUDY
In the following we carry out a brownfield planning case study to demonstrate the usage of the proposed holistic multiobjective planning framework. Specifically, we utilize the framework to benchmark improvements of different 5G hyperdense deployment approaches over an existing pre-5G macrocellular network.

A. STARTING POINT: THE EXISTING NETWORK
As a starting point for the case study we have adopted a 1.67 km × 1.48 km (∼2.5 km²) downtown area of Addis Ababa (Ethiopia), depicted in Fig. 4. This area exemplifies an urban scenario with an active business area, intra-urban roads and light rail, public hotspots (Meskel Square, stadium and parks), and residential areas (bottom left). The city is located at an altitude of 2000-3000 m above sea level and there are buildings with heights of up to 79 m.
The existing pre-5G network in the selected study area is composed of ten macro BS sites. Each site is tri-sectored, supporting three 3G+ cells that apply 3GPP band 1 with quad-carrier support and three 4G cells that operate with 20 MHz bandwidth on a 3GPP band 3 carrier. Although we use precise site coordinates and antenna parameters, they are not presented in the paper for confidentiality reasons.
To characterize the network traffic we have analyzed the numbers of users and the data flow during October 2019 within the 10 sites of Fig. 4. The obtained results indicate that most of the connections take place through the 3G+ technology. To give an example, we have presented in Fig. 5 the hourly mean number of users served by cell 1 of site 1 applying 3G+ and 4G technologies for a one week period (Oct 25-Nov 01). Furthermore, the distribution of the highest mean data traffic per hour is shown in Fig. 6 for the observed 10 sites within the same time period. As can be seen from Fig. 6, the busiest traffic hour commonly occurs around lunch time (12 pm to 1 pm). The afternoon hours at around 4 pm and 5 pm are also busy, as is the evening hour at 9 pm. The exception is Sunday, which is the only non-working day in Addis Ababa.

B. NETWORK UPGRADE AND HYPERDENSIFICATION

1) PROPAGATION PREDICTION
For the propagation modeling we have created a 3D map of the area using local terrain and building map data imported and edited using the WallMan module of the WinProp software suite [59], see the illustration in Fig. 7. Applying this building and terrain map, we can compute signal paths in all cells using the deterministic 3D propagation model (dominant path ray-tracing model [67]) of the ProMan package included in the WinProp suite [59].

2) USER DISTRIBUTION AND THROUGHPUT DEMAND
While planning the hyperdense small cell extension for the current network we assume the same distribution of users as in the present network but with a much higher expectation for the users' throughput. The applied user distribution is based on the real NMS data collected in October 2019. The data consists of the cell-level numbers of users and the observed per-hour data flow with 50 m × 50 m pixel accuracy. Based on this data and the user density clutter of Table 3, defined by the NGMN Alliance [68], the user and throughput demand distribution is generated by applying the approach presented in Fig. 2. A snapshot of the user distribution is depicted in Fig. 8.
We have focused on the throughput that is computed based on the mapping in (6), whereby the applied SINR and bandwidth efficiency parameters are selected according to [66]. Furthermore, we assume that future throughput demand will be much higher than that observed in the present NMS data due to improved devices and new services. Target values are listed in Table 3, which is formulated by scaling the ideal 5G demand set defined by the NGMN in [68].
First, the user satisfaction is simulated for the existing and upgraded 4G macro network, see the related technology assumptions summarized in Table 4. The results of Fig. 9 show that we obtain limited user satisfaction even after upgrading the macro cells to support carrier aggregation and higher-order MIMO antenna configurations. As we see from Fig. 9, only slightly more than 10% of the users are satisfied, whereas at the 50%-ile and 10%-ile levels users are far from satisfaction. The spatial user satisfaction levels for the upgraded macrocellular network are shown in Fig. 10. While user satisfaction is good over large portions of the study area (dark green), it is noticeable that satisfaction is rather low (red, orange, yellow) in areas where user density and demands are high.

3) CANDIDATE SITE LOCATIONS FOR SMALL CELLS
For the computation of the annual small cell site deployment cost we have used the assumptions presented in Table 5. These cost numbers are obtained by contextualizing the assumptions in [58]. Furthermore, we recall the optical backbone termination points and street furniture illustrated in Fig. 4. Based on this data we have computed the normalized site deployment cost map shown in Fig. 11.
Using the spatial satisfaction and site deployment cost maps as an input for the techno-economic method of Fig. 3, we obtain the candidate site locations depicted in Fig. 12. We note that the blue dots in Fig. 12 refer to possible small cell site locations, called candidate sites. The number of actually deployed small cells will be smaller than the number of candidate sites. The candidate site locations of Fig. 12 were obtained assuming a density of 300 candidate sites/km², which corresponds to a target satisfaction of 0.45 and a cost cap of 0.53.

4) MULTIOBJECTIVE OPTIMIZATION ALGORITHM
We apply the non-dominated sorting genetic algorithm (NSGA)-II that is popular due to its excellent performance when considering the computational complexity and ability not to lose good solutions [69], [70].
We recall that the NSGA-II algorithm is described well in [69]. It is applied in our context by starting from an initial network. Further networks are created by first selecting parent networks using tournament selection and then creating children networks from the parents using crossover and mutation. From the current and newly created networks, the next generation networks are selected after applying non-dominated and crowding distance sorting methods. In non-dominated sorting, networks are sorted based on their performance ranks in terms of the satisfaction and cost objectives. Then a number of the best networks is selected based on their ranks. If the number of networks exceeds the target, then networks with a larger distance from their neighbours in terms of the objective functions are preferred. The described network formulation iterations continue until only an insignificant change in the satisfaction and cost functions is seen between iterations. We use NSGA-II with the assumptions listed in Table 6 [71].
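The two sorting steps mentioned above can be sketched in a few functions. This is only an illustration of non-dominated sorting and crowding distance for a minimization problem, not the full NSGA-II loop (tournament selection, crossover and mutation are omitted). The objective pairs are hypothetical, with the first component being the negated satisfaction so that both objectives are minimized.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(points):
    """Return index lists: front 0 is the Pareto set, front 1 the next rank, and so on."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts


def crowding_distance(points, front):
    """Crowding distance of each index in one front; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    for m in range(len(points[0])):
        ordered = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[ordered[0]][m], points[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (points[nxt][m] - points[prev][m]) / (hi - lo)
    return dist


# (negated 10%-ile satisfaction, normalized cost) of five hypothetical networks.
networks = [(-0.9, 5.0), (-0.7, 3.0), (-0.4, 1.0), (-0.6, 4.0), (-0.3, 2.0)]
fronts = non_dominated_fronts(networks)
distances = crowding_distance(networks, fronts[0])
print(fronts, distances)
```

When a front has to be truncated to fit the population size, the networks with the largest crowding distance are kept, which preserves the spread of solutions along the satisfaction and cost trade-off.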

A. PLANNING APPROACHES AND CANDIDATE SITE DISTRIBUTIONS
We analyze the performance of the proposed data-driven planning framework considering two approaches:
• Data Driven Planning for user satisfaction and TCO (DDP1): Here planning applies the user distribution and candidate site locations based on the NMS data from the current network and the site deployment cost. The 10%-ile user satisfaction (see (1)) is computed in terms of throughput and the TCO is obtained according to equation (8) using the values in Table 5.
• Data Driven Planning for user throughput and TCO (DDP2): This planning approach is otherwise similar to DDP1 but it applies throughput instead of satisfaction.
The difference between the above approaches looks minor, but it is important to notice that in DDP1 planning aims to fulfil the user demand, while in DDP2 planning focuses on the data supply. That is, DDP2 leads in some cases to data overprovision that unnecessarily consumes network resources.
For comparison purposes, we also evaluate two approaches that apply the conventional clutter data, referred to as CCBP and CSBP. In CCBP the candidate sites are selected based on the site deployment cost (Fig. 13 (b)), whereas in CSBP they are selected based on the spatial user satisfaction (Fig. 13 (c)). In addition, we have carried out planning by assuming a uniform user distribution and a uniform grid of candidate site locations. The final site locations are then selected using the 10%-ile user throughput and the number of small cells as objectives. This planning approach is called Grid Based Planning for user throughput (GBP). We note that CCBP places more emphasis on the operator's interests while CSBP better reflects the user needs. The grid based approach of GBP is typically applied in theoretically oriented planning studies (e.g. [38], [46]) and here it provides a comparison benchmark. A summary of the five planning approaches is presented in Table 7.
Let us recall the spatial distribution of the candidate site locations (small blue and white dots) represented in Fig. 12. This small cell site distribution reflects the planning approach DDP1, where both the site deployment cost and the spatial user satisfaction maps have been taken into account. Fig. 13 shows the spatial distributions of the candidate site locations for the rest of the introduced planning approaches. In Fig. 13 (a) both site deployment cost and throughput are taken into account, in Fig. 13 (b) the focus is on site deployment cost and in Fig. 13 (c) spatial user satisfaction is applied. Finally, in Fig. 13 (d) the candidate site locations form a uniform grid. We note that the underlying map colors refer to the user satisfaction and that the candidate site density was set to 300 sites/km².
While DDP1 and DDP2 lead to seemingly quite similar candidate site distributions (see Fig. 12 and Fig. 13 (a)), it is seen from Figs. 13 (b) and (c) that emphasizing site deployment cost or spatial satisfaction may very strongly drive the selection of candidate site locations. That is, in Fig. 13 (c) the candidate sites are strongly concentrated in red areas where user demand is high, and in Fig. 13 (b) the candidate sites lie on the low site deployment cost clutters, see also Fig. 11. If a joint criterion over site deployment cost and spatial satisfaction or throughput is applied as in Fig. 12 and Fig. 13 (a), then site locations do not frequently appear in low-cost, low-demand areas, although the deployment cost still appears to have an impact on the candidate location selection.

B. PERFORMANCE COMPARISONS FOR PARETO NETWORKS
The Pareto optimal solution of a multiobjective optimization problem is found if it is not possible to improve any objective without degrading at least one other objective. Accordingly,  the Pareto network refers to the best small cell extension what the applied planning approach with given parameter values can produce. To find Pareto network for a certain number of small cells we created sets of candidate sites as in Fig. 12 and Fig. 13, and then used planning objectives of Table 7 to select the candidate sites such that network performance was optimized. The resulting performance was then compared against the upgraded macrocell deployment. Fig. 14 presents the achieved 10%-ile user satisfaction gains with respect to the upgraded macro case for Pareto networks obtained using different planning approaches of Table 7. The DDP1 presents clearly the best performance in terms of 10%-ile user satisfaction. With DDP1 the 10%-ile user satisfaction can be 5-9 times better than for the upgraded macro deployment. Of course, for such high performance increase, the required number of small cells is well over 100 while there was just 10 macrocell sites in the system. For other planning approaches gains are smaller but still notable. The performance of DDP2, CSBP, GBP are close to each other while the cost driven planning approach CCBP is clearly worst. We note that deployments with 221 small cells (white dots) are shown also in Fig. 12 and Fig. 13. Fig. 15 shows the performance of different planning approaches in terms of 10%-ile user throughput. Since throughput is now used as a performance measure it is natural that DDP2 provides better results than DDP1. Yet, as discussed previously, the user satisfaction is better measure for the performance since it does not lead to service overprovision. While CCBP is again performing worst, the simple GBP approach seems to work well when the number of small cells VOLUME 8, 2020  is high. 
This is due to fact that the uniform user distribution match well with the uniform candidate site grid and good mutual distance between small cells lower the interference between small cells. Fig. 16 shows the incurred network cost for the different planning approaches with respect to the number of small cells. The high user satisfaction gain of DDP1 is achieved with costs lower than in CSBP and GBP. On the other hand, the CCBP provides lowest costs but as seen from Fig. 14 and Fig. 15 it has also worst performance among all planning approaches. Moreover, it is noted that cost differences between planning approaches increase with increased number of small cells. We note that slope of the cost line for CSBP and GBP is larger than for other planning approaches since CSBP and GBP do not consider site specific deployment costs but just use the number of small cells as a planning objective.  In Fig. 17 we have 10%-ile, 50%-ile and 90%-ile relative user satisfaction gains in case of 221 small cells. To that end, it is observed that the DDP1 leads to a network topology that outperforms all other planning approaches at 10%-ile performance but not at 90%-ile performance where users are over-provisioned. Actually, from Fig. 18 it is observed that the 10%-ile satisfaction gain take place in a network topology that, on the other hand, maximize the 90%-ile throughput. Fig. 17 and Fig. 18 also show that the CCBP approach provides the lowest performance in all percentiles in terms of both user satisfaction and throughput. This is a heavy penalty for the cost benefit it admit by placing the candidate sites on the low-cost clutter. The further densification of CCBP small cell extension does not pay back since it leads to overdense small cell deployment in the low-cost clutter as seen from Fig. 13 (b). This performance degradation is also seen from the SINR performances that are shown in Fig. 19. 
The GBP approach has the best SINR performance, as its topology is obtained from equally spaced candidate sites. However, by ignoring cost considerations, the GBP approach ends up being 18.10% more expensive than the CCBP approach. Moreover, the best SINR does not map directly to the satisfaction in Fig. 17 or the throughput in Fig. 18, because GBP assumes a uniform user distribution and does not consider the satisfaction objective.
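The Pareto optimality condition stated at the beginning of this discussion can be illustrated with a minimal sketch. It is not the optimization algorithm used in this work; it only filters a list of already evaluated network topologies. The two objectives (cost and user dissatisfaction, both to be minimized) and all numerical values are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b when every objective is minimized.

    a dominates b if a is no worse than b in all objectives and
    strictly better in at least one of them.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of the non-dominated points (brute-force check)."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            dominates(q, p) for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical (cost, dissatisfaction) pairs for five candidate topologies.
# These numbers are illustrative assumptions, not results from the case study.
candidates = [
    (100.0, 0.60),
    (120.0, 0.35),
    (150.0, 0.40),  # dominated by (120.0, 0.35)
    (180.0, 0.20),
    (200.0, 0.20),  # dominated by (180.0, 0.20)
]

front = pareto_front(candidates)
print(front)  # [0, 1, 3]
```

The brute-force comparison scales quadratically with the number of evaluated topologies, which is acceptable for a small illustrative set; a practical multiobjective solver would maintain the non-dominated set incrementally.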

VI. CONCLUSION AND FUTURE WORK
We proposed and analyzed a data-driven planning framework that considers practical challenges faced in the deployment of a heterogeneous hyperdense 5G network. In the literature review we identified trends and gaps in recent pre-5G and 5G network planning works and noticed that most of the studies employ a greenfield network planning scenario assuming a single radio access technology. Such a scenario is rarely practical, since pre-5G networks exist almost everywhere and 5G is typically deployed on top of an existing macrocellular 3G/4G network. Although there are recent studies that consider network function virtualization and flexible deployment of 5G, they mostly focus on the placement of NG-RAN functions in the transport network without considering the radio performance. Even more importantly, network planning studies typically do not consider users' service demand and lack realistic models for radio propagation and spatial user distribution/data consumption. Moreover, there are very few studies that aim to identify candidate site locations for small cells in a realistic network planning setting. To fill this gap, we have proposed a data-driven multiobjective planning framework that holistically addresses the aforementioned aspects and enables practical 5G network planning by applying realistic data-driven models.
We evaluated the proposed planning framework in a case study of a selected urban hotspot of Addis Ababa (Ethiopia). We first modeled the existing macrocell network based on accurate deployment data that includes the radio network configuration for 3G and 4G sites. We also had at our disposal cell-level and spatial traffic statistics from the network management system, with 50 m × 50 m spatial resolution, for one month in 2019. Finally, we had data on the street furniture, the existing backbone (with access locations) and a 3D map of the area. Together, these data enable realistic planning of a 5G small cell network extension.
For the 5G small cell network planning, we set the target by scaling the current service supply distribution upwards following the 5G performance targets of NGMN. Then we identified a set of possible small cell locations (called candidate sites) using the above-mentioned data and different planning approaches. Thereafter, we performed a comparative analysis of the proposed data-driven planning framework against the conventional clutter-based planning approaches. For the analysis we introduced performance measures such as user satisfaction and total cost of ownership (TCO). We recall that the former can be defined for different indicators as the ratio of the observed indicator value to the target indicator value.
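As an illustration of the user satisfaction measure recalled above, the following minimal sketch computes per-user satisfaction as the ratio of observed to target throughput and then evaluates a given percentile. The function names, the optional `cap` parameter, the linear-interpolation percentile rule and all numerical values are assumptions made for this example; they are not taken from the case study.

```python
import math
from typing import List, Optional, Sequence


def user_satisfaction(
    observed: Sequence[float], target: float, cap: Optional[float] = None
) -> List[float]:
    """Satisfaction of each user: observed indicator value / target value.

    The optional cap (e.g. 1.0) is an illustrative assumption that stops
    over-provisioned users from counting as more than fully satisfied.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    ratios = [x / target for x in observed]
    if cap is not None:
        ratios = [min(r, cap) for r in ratios]
    return ratios


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0 <= q <= 100) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Hypothetical per-user throughputs (Mbps) and a hypothetical 50 Mbps target.
throughputs = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
satisfaction = user_satisfaction(throughputs, target=50.0)

p10 = percentile(satisfaction, 10)  # satisfaction of the worst-served users
p90 = percentile(satisfaction, 90)  # above 1.0 here: over-provisioned users
p90_capped = percentile(user_satisfaction(throughputs, 50.0, cap=1.0), 90)
```

In this toy data set the 10%-ile satisfaction is well below one while the uncapped 90%-ile exceeds one, which mirrors the distinction between under-served and over-provisioned users drawn in the analysis.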
The results suggest that notable gains can be obtained if accurate service demand data is applied in small cell network planning. With a ratio of 12-22 small cells per macro base station, we obtained a 3-9 times improvement in the 10%-ile user satisfaction with respect to the macrocell-only deployment. As expected, identifying small cell candidate site locations exclusively based on either users' dissatisfaction or site deployment cost may considerably increase user satisfaction or reduce small cell network costs, respectively. Yet, it is a better strategy to apply the TCO as a planning target, since cost minimization may heavily compromise the network performance, while focusing only on user satisfaction notably increases the network deployment costs. We also noticed that it is more reasonable to apply user satisfaction as a performance indicator instead of, e.g., throughput, in order to avoid service overprovision. A uniform grid of candidate site locations would work well if the number of deployed small cells is large. Yet, the practical availability of site locations sets constraints on this approach.
Future work includes testing the framework in other network environments and with other multiobjective optimization algorithms. It is also important to extend the analysis to a wider range of satisfaction objectives, including latency and reliability. Together with an analysis of service data, these extensions will also make it possible to study the impact of 5G network slicing on the planning principles. Furthermore, the practical techno-economic impacts of the various new-generation radio access network and transport network technology options can be investigated using the proposed framework to provide insights into successful deployment strategies.