Design Methodology for Wireless Backhaul/Fronthaul Using Free Space Optics and Fibers

The recent increase in the data rates of free space optics (FSO) transmission technology means that it could be used for designing the backhaul/fronthaul of 5G and beyond cellular networks. The flexibility and cost-effectiveness of FSO are the primary reasons for mobile operators to investigate its potential as a mobile backhaul/fronthaul. Unfortunately, the reliability of FSO links is weather dependent, especially if the link covers a considerable distance. Optical fibers, on the other hand, are expensive but more reliable. Hence, an optimally designed hybrid network consisting of both fiber and FSO connections can combine the cost-effectiveness of FSO with the robustness of fibers. In such a design, the more important links are connected using fibers while the links with higher tolerance towards failure are designed using FSO. Therefore, in this paper, we propose a hybrid FSO/fiber backhaul/fronthaul methodology for connecting wireless base stations (BSs) to the network core. We first formulate a mixed integer non-linear program (MINLP) for determining the number of splitter/FSO distribution points required in the network that optimally provide connectivity to the BSs. The MINLP is designed to identify the locations of the splitters/FSO distribution points as well. Thereafter, we solve the MINLP with the help of particle swarm optimisation (PSO) and mixed integer linear programming (MILP) techniques. We also derive a heuristic for solving the MILP. Finally, we propose another method for determining the number of splitters required in a relatively shorter time: a K-means clustering based method. The results verify that the hybrid network is cost-effective while conforming to the data rate and reliability requirements of the links. The proposal allows evaluation of a seamless design solution of the hybrid network with practical time-complexity.


I. INTRODUCTION
THE telecommunication industry has witnessed a phenomenal increase in the use of smart devices over the last decade. As a result, the data rate required to cater to the communication demands of the end users has increased manyfold. The evolution of mobile networks into fifth generation (5G) networks is a result of the extensive bandwidth requirements of the present age [1], [2]. 5G is expected to provide end users with ultra-low latency, high data rate support and ubiquitous access [3]. Network densification is a key strategy to meet the 5G requirements [2], as it enables higher spectrum reuse and therefore higher capacity and reliability. The decrease in the size of the cells also means that a larger number of base stations (BSs) is required to provide wireless coverage over a geographical area. As a result, network capital expenditure (CAPEX) and operational expenditure (OPEX) increase considerably, both due to the cost of the BSs and of the high-speed connectivity (backhaul/fronthaul) [1].
Present-day cellular backhaul relies mostly on two transport technologies: optical fibers and microwave radio links [4], [5], [6]. Optical fibers provide high capacity and reliability but need high initial investment [7]. Further, the fixed connections provided by optical links offer limited flexibility. On the other hand, microwave has high CAPEX, high OPEX due to spectrum licensing fees [8], lower data rate support and lower energy efficiency [9], [10]. In order to reduce the OPEX of licensed microwave links, unlicensed point-to-point microwave links have been employed as mobile backhaul in recent years. Unfortunately, unlicensed microwave links are more prone to interference and to unpredictable deployment issues [11]. Free space optics (FSO) is also being considered as an alternative to microwave for providing wireless connectivity. In contrast to microwave, FSO can provide high data rate connections between two points up to several kilometers apart [7]. FSO provides a high reuse factor, inherent security, and robustness to electromagnetic interference. Since FSO operates at frequencies in the unlicensed band above 300 GHz, the OPEX is low as well [7].
Unfortunately, FSO is sensitive to weather conditions. Its reach degrades heavily in the event of fog and precipitation [12], [13]. Hence, the reliability of the connection is of primary consideration when deploying an FSO link [14]. Consequently, a hybrid backhaul/fronthaul consisting of fiber and FSO can be a potent solution. The reliability of fiber over longer ranges can be complemented with the cost-effectiveness and data carrying potential of FSO over shorter ranges. The fiber connections are provided using a point-to-multipoint (P2MP) passive optical network (PON).

A. Use Case and Motivation
The generic use case and the associated network topology are illustrated in Fig. 1. We consider a full fiber PON as a distribution network where the fiber connections terminate at the optical network units (ONUs). We assume that either the BSs are collocated with the ONU or the BSs are connected to the ONU using an FSO link. For the former case, we assume that the ONU and the splitter are situated far from each other, i.e., their separation distance is in the order of tens to hundreds of meters. For the latter case, however, we assume that the ONUs are equipped with FSO devices (FSODs) and are situated in proximity of the splitter (e.g., less than 10 meters away). This assumption minimizes the cost of connecting the FSOD-equipped ONUs to the network, as our motive is to reduce the network deployment cost as much as possible by connecting the BSs using FSO links; placing an FSOD-equipped ONU far away from the splitter would introduce significant fiber laying expenditure. Note that an FSOD consists of the FSO transceiver and the FSO processing unit. We assume that the poles for hosting the FSODs are already present. One such example is street units hosting FSODs (FSO Towers). We call an ONU equipped with an FSOD an FSO distribution point (FSO-DP).
Since a fiber link is more expensive than an FSO link, the target should be to use FSO links as much as possible while meeting the data rate and reliability requirements of the BSs. However, it is well known that the reliability of FSO, being wireless in nature, degrades as the distance of separation between the transmitter and the receiver increases. Hence, to cover a given geographical area, the number and proper positioning of the FSO-DPs play a vital role in minimizing the distance between the FSO-DPs and the BSs. One might think that distributing a large number of FSO-DPs over the area would allow us to establish FSO connections cost-effectively, as the FSO-DPs and the BSs can be kept in close proximity. However, using a large number of FSO-DPs might increase the overall CAPEX and OPEX of the network, both because of the increase in the number of FSO-DPs and because the FSO-DPs need to be connected to the core network with fiber links for reliability. Hence, these conflicting requirements call for an optimisation based solution.

B. Contributions
We realise that the problem of minimising the cost of the backhaul/fronthaul has multiple objectives:
• Determination of the number of splitters required to cover a certain geographical area. Please note that, as we focus on the Optical Distribution Network (ODN), we assume a PON is already available up to the first stage split (i.e., the OLT and the Splitter 1 shown in Fig. 1).
• Identification of the locations of the splitters covering the given geographical area.
• Identification of the most cost-effective link technology (between FSO and fiber) for connecting the splitters and the BSs such that the cost of the overall network is minimised.
Therefore, in this paper we propose a cost-effective splitter placement and link selection (SPALS) methodology that sets up 5G backhaul/fronthaul. We ensure that the backhaul/fronthaul satisfies the data rate and the reliability requirements of the connected BSs. Note that our work focuses on capacity and reliability metrics, and the hybrid FSO architecture is independent of the specific technology (i.e., MAC) deployed. It is thus compatible with any free space and fiber based technology that is capable of supporting 5G services like ultra-reliable low latency communications (URLLC) and enhanced Mobile Broadband (eMBB).
A high-level overview of the proposal can be found in Fig. 2. The primary contributions of this paper are:
1) We have formulated a mixed integer non-linear program (MINLP) to identify the number of splitters required for the network deployment, the splitter locations and the connection technology to be used for connecting the BSs to the splitters. The MINLP minimizes the overall network deployment cost (see Fig. 2(a)). We call this module SPALS-MINLP.
2) Since the MINLP, being a variant of the bin-packing problem with the additional complexity of finding the splitter locations, is NP-hard [15], we have first reduced the MINLP into a mixed integer linear program (MILP) by assuming the splitter locations to be known. The MILP solves the network deployment cost minimisation problem by identifying the link technology to be used for connecting the splitters and the BSs. The MILP selects the optimum number of splitters after accepting the locations of the splitters as a parameter. Essentially, the MILP is a bin-packing problem. We name SPALS with the MILP based solution SPALS-M.
3) We provide a heuristic to approximately solve the proposed MILP with lower time complexity. We have shown that the heuristic provides results comparable to those of the MILP. We call SPALS with the heuristic solution SPALS-H.
4) The splitter locations are identified using particle swarm optimisation (PSO). The PSO module accepts the number of splitters required as an input parameter. The PSO and the MILP/heuristic algorithm work in tandem and form the SPALS module (see Fig. 2(b)).
5) Finally, we have proposed a K-means clustering based solution to identify the number of splitters required for providing connectivity in a relatively shorter time. Note that we compromise on optimality to obtain a faster solution in the K-means clustering method. We add the prefix "kM" whenever we use the K-means clustering algorithm for determining the number of splitters.
The paper is organized as follows. Section II summarises the state of the art. We present the system model in Section III. The SPALS-MINLP is formulated in Section IV. Thereafter, we present the PSO, MILP and heuristic methods for solving the optimisation problem in Section V. The K-means clustering method for determining the number of splitters required is illustrated in Section VI. The simulation setup is elaborated in Section VII. The results are discussed in Section VIII, followed by the conclusion.

II. RELATED WORKS
In this section, we briefly discuss the current state of the art on FSO networks. Excellent surveys on FSO networks can be found in [7] and [12], where detailed information about the advantages and applications of FSO, FSO channel modelling, FSO transceiver design, modulation techniques, channel coding and spatial diversity used in FSO, and information theoretical limits of an FSO system are available.
FSO has been an interesting candidate for supporting wireless networks for more than a decade now. However, the primary obstacle towards the adoption of FSO as wireless backhaul/fronthaul is the lack of reliability in longer links; especially when the weather conditions are not conducive. As a result, several innovative solutions have been proposed that enhance the link reliability. We can observe such attempts in [14], [16], [17], [18], [19], [20], [21], [22], [23], [24].
The authors of [14] introduce a linear programming model to establish a cellular backhaul using K-disjoint paths. They place mirrors in strategic locations to ensure line-of-sight (LoS). However, the mirrors merely act as reflectors, thereby increasing the effective link path length. Hence, the reliability of the link decreases. K-disjoint paths are set up to enhance link reliability. We can find a proposal regarding setting up of backhaul links by choosing either fiber or FSO/Radio Frequency (RF) links in [18].
A Radio over FSO based fronthaul is presented in [19], where the signal from the FSO is coupled to optical fiber for further propagation towards the network core. In [20], a framework for cost optimal deployment of a 5G network in a dense urban scenario is investigated. Both the number of radio resource heads (RRHs) and the cost of the connections to the RRHs (using either FSO or fiber) are minimized. Two optimal FSO transceiver placement and resource allocation schemes that facilitate cooperative dynamic FSO networks are presented in [21]. The paper proposes the use of longer FSO links in clear weather, while the FSO transceivers switch to shorter links in bad weather conditions. Another fronthaul design using RF/FSO can be found in [22], where rate maximization is achieved by using efficient quantization schemes. Similar to [21], another network topology reconfiguration scheme is proposed in [24], where the FSO links are rearranged depending upon traffic demands or weather conditions. FSO has also been used to provide backhaul to wireless mesh networks (WMNs). In [16], an already operational WMN is considered that requires additional gateways for optimal operation. The additional gateways are connected to the network core using FSO/RF links. The capacity of WMNs is improved using strategically placed FSO links in [17].
In most of the works available in the literature, FSO is primarily used as an upgrade in network capacity. Further, most of the works assume that the fiber network is already present in the geographical area when the decision to use FSO links is taken. To the best of our knowledge, the joint problem of deciding the number of FSO transceivers required, identifying the optimum locations of the FSO transceivers and selecting the connection from the BS to the FSO-DP/splitter has not been investigated. Therefore, in this paper, we proceed to solve this joint problem.

III. SYSTEM MODEL
In this section, we elaborate the networking environment where the SPALS will be operated. In order to provide an insight on the benefit of using SPALS, we have focused on a grid area (20 km × 20 km) with random uniform distribution of active users (orange dots) as shown in Fig. 3. We have marked a separate area at the centre of the grid and have deployed users with higher density in that region as shown in Fig. 3 [25]. This distribution mimics the user distributions commonly found in the city centre and the suburban areas of a typical city. We have identified the positions of the BSs (black "*" signs in Fig. 3) using the K-means clustering algorithm with the restriction that a BS can provide coverage to a maximum of 10 active users, leading to an uncontended data rate of 100 Mbps per user (although different assumptions on user data rate can be easily considered). We also assume that the central distribution point "o" (shown as Splitter 1 in Fig. 1) is situated exactly at the middle of the grid, i.e., at co-ordinates (10000,10000). Thereafter, our proposal focuses on deciding the number and positions of the splitters and whether the connections are provided to the BSs through fiber or FSO links (see Fig. 1). For the small-cell fiber connectivity, we have considered passive optical networks (PONs), as this is being considered as a cost-effective alternative to point-to-point fiber links [26]. Hence, depending on the reliability and the data rate requirements of the connected BSs, we select the ideal technology (FSO or PON) for providing the connections. The overall objective is to minimize the cost of the network.
We next introduce the data rate, reliability and cost models used in the SPALS optimisation formulation. Please note that we have resorted to a generic model for our evaluation in this work. However, more complex and specialised models can easily be added to the SPALS optimisation formulation by tabulating the data rate and reliability as a function of distance in a look-up table and then using the tabulated values as input parameters to the SPALS problem.
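The look-up-table approach mentioned above can be sketched as follows. This is a minimal illustration and not the model used in this paper: the table entries, the helper name `lookup`, and the choice of linear interpolation between tabulated distances are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical table: (distance in km, data rate in Gbps). The values are
# placeholders for illustration only, not measurements.
RATE_TABLE = [(0.0, 10.0), (1.0, 10.0), (2.0, 5.0), (4.0, 1.0)]


def lookup(table, distance_km):
    """Linearly interpolate a tabulated quantity at distance_km.

    Distances outside the table are clamped to the first/last entry.
    """
    xs = [x for x, _ in table]
    ys = [y for _, y in table]
    if distance_km <= xs[0]:
        return ys[0]
    if distance_km >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, distance_km)  # xs[i - 1] <= distance_km < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (distance_km - x0) / (x1 - x0)
```

A reliability table could be queried with the same helper, and the resulting values passed to the optimisation as pre-computed parameters for every BS and candidate splitter pair.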

A. Data Rate Model
We adopt the model from [18] to determine the data rate ($D^{\mathrm{FSO}}_{u,s}$) obtained by an FSO link given the distance of separation between the splitter s and the BS u. The relation between the obtained data rate $D^{\mathrm{FSO}}_{u,s}$ and the distance $\delta_{u,s}$ is shown in (1).
where $D_t$ is the maximum data rate provided by the FSO link and $d_D$ is a threshold distance in kilometers.

B. Reliability Model
Similar to the data rate condition shown in (1), the reliability of the FSO link given the distance of separation between the splitter s and the BS u is calculated using (2) [18].
where $d_R$ is a threshold distance in kilometers.

C. Cost Model
We consider FSO-DP setup cost, fiber cost and fiber installation cost for the evaluations. Setting up an FSO link requires two FSO devices (one at the FSO-DP and the other at the BS). The FSO-DPs are connected to the central distribution point via fibers. Therefore, we also need to consider the cost of connecting the FSO-DP to the central distribution point.
Fiber installation cost varies depending upon the scenario. In rural areas, trenching is required for installing the fiber. In urban areas, sewer ducts might be used for placing the fibers, thus reducing the trenching expenditure by up to 95% [27]. In our study, we take into consideration the two extreme and opposite cases of 100% trenching requirement and 100% duct availability.

IV. SPLITTER PLACEMENT AND LINK SELECTION PROBLEM
In this section, we present the optimisation problem formulation for determining the number and locations of the splitters/FSO-DPs and selecting the connection technology between the BS and the splitter. For the discussions in this section and Section V, we assume that the maximum number of splitters available is equal to the number of BSs available in the scenario. The optimisation chooses a subset of the available splitters for the network deployment. We shall discuss the K-means Cluster method for a quick identification of the required number of splitters in Section VI.
Our target is to minimize the cost of the network by providing connections that are most cost effective given the circumstances. At the same time, we also need to ensure that the data-rate and the reliability provided by the links identified (both FSO and PON) are above the acceptable limits.
The cost ($\Omega$) of the network is shown by (3), where $C^{\mathrm{FSO}}_{u,s}$ and $C^{\mathrm{PON}}_{u,s}$ are the expenditures incurred if the link between BS u and splitter s is set up using FSO and fiber respectively. $\beta_s$ is a binary variable that marks whether the splitter s is used in the network ($\beta_s = 1$) or not ($\beta_s = 0$). $\delta_{s,o}$ is the distance between the splitter s and the central distribution point o, and F is the cost of fiber per meter (including fiber laying costs). The cost of fiber installation is considered only when the splitter is used, i.e., $\beta_s = 1$.
The first two terms of (3) aim at minimizing the overall cost of the network by considering the cost of the equipment used to set up a link between BS u and the splitter. Here, we use binary variables instead of continuous variables because, to set up a link with a particular technology, we need to procure all the equipment required for the connection. The third term of (3) evaluates the fiber cost of setting up a connection between the splitter s and the central distribution point o.
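To make the structure of this objective concrete, the sketch below evaluates a cost with the three terms described above for a given set of decisions: an equipment cost for each selected FSO or fiber link, plus the fiber cost of connecting every active splitter to the central distribution point. The function name, the dictionary-based data layout and the binary link indicators passed as `theta_fso` and `theta_pon` are assumptions made for illustration; the authoritative expression is (3).

```python
def network_cost(theta_fso, theta_pon, c_fso, c_pon, beta, dist_to_core, fiber_cost):
    """Evaluate a network cost with the three-term structure described in the text.

    theta_fso, theta_pon: dicts {(u, s): 0 or 1}, link-technology choices.
    c_fso, c_pon:         dicts {(u, s): cost} of an FSO / fiber link.
    beta:                 dict {s: 0 or 1}, whether splitter s is used.
    dist_to_core:         dict {s: metres} from splitter s to the central point.
    fiber_cost:           cost of fiber per metre (including laying cost).
    """
    # First two terms: equipment cost of every selected BS-to-splitter link.
    link_cost = sum(theta_fso[k] * c_fso[k] + theta_pon[k] * c_pon[k] for k in c_fso)
    # Third term: fiber from each active splitter to the central point.
    feeder_cost = sum(beta[s] * dist_to_core[s] * fiber_cost for s in beta)
    return link_cost + feeder_cost
```

An unused splitter ($\beta_s = 0$) contributes nothing to the third term, which is what lets the optimisation trade the number of active splitters against the length of the BS links.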
We must ensure that the link between BS u and splitter s is able to satisfy the data rate requirements of BS u, which is given by (4).
where $0 \le \alpha^{\mathrm{PON}}_{u,s} \le 1$ is a continuous variable that indicates that the link between BS u and splitter s is set up using fiber. $D^{\mathrm{FSO}}_{u,s}$ and $D^{\mathrm{PON}}_{u,s}$ capture the data rate supported if the link between BS u and splitter s is set up using FSO and fiber respectively. $D^{\mathrm{th}}_u$ is the data rate requirement of BS u. Next, we ensure that the PON has enough capacity to support the traffic forwarded by the FSO links created between the BS u and splitter s (via an ONU with an FSOD installed).
where $0 \le \alpha^{\mathrm{FSO}}_{u,s} \le 1$ is a continuous variable that indicates the load that the established FSO link between BS u and splitter s has introduced in the PON.
The total load carried by the PON should be within the capacity limits as shown in (6).
We also ensure that either an FSO or a fiber connection is used to establish a link between BS u and splitter s.
Thereafter, we need to guarantee that if an FSO or a fiber link is connected to a splitter, then the splitter is marked as active.
The reliability constraint is shown by (11).
where $R^{\mathrm{FSO}}_{u,s}$ is the reliability of the FSO link between BS u and splitter s, while $R^{\mathrm{th}}_u$ is the reliability requirement threshold of BS u. Here, we assume that the fiber links are 100% reliable. Note that fibers have a failure probability that is directly proportional to the fiber length. However, the time between fiber cuts is quite large and the downtime is long (several hours or days). Since we are interested here in the short-term availability of links (i.e., as affected by atmospheric conditions), we neglect fiber link failures.
The distance between the BS u and splitter s is given by (12).
where $L^x_u$ and $L^x_s$ are the x-coordinates and $L^y_u$ and $L^y_s$ are the y-coordinates of BS u and splitter s respectively.
Similarly, the distance between splitter s and the central distribution point (o) (shown as Splitter 1 in Fig. 1) is given by (13).
where $L^x_o$ and $L^y_o$ are the x-coordinate and y-coordinate of the central distribution point o.
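As an illustration, and assuming that (12) and (13) are straight-line (Euclidean) distances on the grid, the two distances can be computed as shown below. The Euclidean metric is an assumption of this sketch, since other metrics such as street-following routes are conceivable, and the splitter and BS coordinates are hypothetical example values in meters.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y) points given in metres."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


core = (10000.0, 10000.0)    # central distribution point o (grid centre)
splitter = (7000.0, 6000.0)  # example splitter location (hypothetical)
bs = (7300.0, 6400.0)        # example BS location (hypothetical)

delta_us = distance(bs, splitter)    # BS-to-splitter distance
delta_so = distance(splitter, core)  # splitter-to-core distance
```

Because the splitter coordinates are decision variables, a square-root term of this kind is one source of non-linearity in the formulation; once the splitter locations are fixed, both distances become plain constants.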
The cost of a fiber connection between BS u and splitter s is given by (14).
where M is a large integer and $0 \le \gamma_{u,s} \le 1$.
where N is a large integer and $0 \le \psi_{u,s} \le 1$.
Thus, we formulate the SPALS optimization problem in (21).
The meanings of the symbols used in (3)-(20) are summarised in Table I. Solving (21) returns the number of splitters/FSO-DPs required, the co-ordinates of the splitters/FSO-DPs and the connection technology required to set up the BS links from the splitters/FSO-DPs.

V. SOLVING THE SPALS OPTIMISATION PROBLEM
In this section, we provide the procedure for solving the SPALS optimisation problem formulated in Section IV, which belongs to the class of Mixed Integer Non-Linear Programs (MINLPs). Further, the selection of the optimum number of splitters to be deployed out of the set of possible splitters makes the basic objective of the MINLP similar to a bin-packing problem [15]. Therefore, the MINLP is inherently NP-hard. A global MINLP solver can be employed to solve it. Hence, to check the effectiveness of the formulated MINLP, we have used the Python based Pyomo package [28] in conjunction with COUENNE [29], a global MINLP solver. As expected, due to the hardness of the problem, solving the formulated optimisation problem takes an excessively long time; even for a (2 km × 2 km) grid, the computation time is of the order of days. This indicates that we need an efficient heuristic algorithm if we want to solve the same problem for our desired (20 km × 20 km) or even larger grids. Hence, we split the problem into the following two sub-parts:
1) Determination of splitter locations.
2) Identification of connection technology and optimum number of splitters.

A. Determination of Splitter Locations
In this section, we provide the procedure to identify the optimal locations of the splitters (provided that the maximum number of splitters is known and is equal to K) so that the connections to the BSs can be performed in a cost-effective manner.
Since we are working with multiple splitter locations on a large grid area (20 km × 20 km), the number of combinations for the splitter locations is extremely high. Therefore, we resort to Particle Swarm Optimisation (PSO) for the purpose of identifying splitter locations [30]. PSO is a meta-heuristic inspired by swarm intelligence, i.e., the social and food-searching behavior of a bird flock or a fish school. This algorithm has been widely used in the literature to solve non-linear optimisation problems.
In our problem, we generate L particles that form the initial population P. Each of the L particles contains randomly assigned x and y coordinates of the K splitter locations.
where $W^{(l)}$ is a matrix that holds K random splitter locations for particle l (l = 1, ..., L). Given the splitter locations, the utility (U) is the cost returned by Section V-B after selecting the optimum splitter count and optimum connection technology. In each iteration, the PSO records the best solution among the solutions obtained by all the particles ($W^{(\mathrm{global})}$), i.e., the splitter location combination that returns the lowest connection cost among all the particles. Additionally, each particle records the position combination of its own best performance ($W^{(l,\mathrm{local})}$). Thereafter, the PSO calculates the velocity term V, where $\psi$ is the inertia weight that controls the convergence speed, and $c_1$ and $c_2$ represent the size of the step that the particle takes toward its best individual candidate solution $W^{(l,\mathrm{local})}$ and the global best solution $W^{(\mathrm{global})}$ respectively. The parameters $\varphi_1$ and $\varphi_2$ are two random positive numbers generated for each k (i.e., for each element of $W^{(l)}$). The PSO then updates each element k of the particle $W^{(l)}$ using this velocity term.
The process is repeated until the PSO reaches the maximum number of iterations ($I_{max}$). The value of $I_{max}$ and the number of particles (L) influence the accuracy of the PSO algorithm. Using larger values of $I_{max}$ and L returns solutions that are closer to the global optimum, but also requires more time to execute the PSO algorithm. Hence, a trade-off between speed and accuracy is required. Since we are solving a network deployment problem, the splitter locations are determined only once, during the network deployment phase, so we can afford to invest sufficient time in solving the problem. Therefore, we recommend using sufficiently large values of $I_{max}$ and L, so that the solution obtained is very close to the global optimum.
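The PSO loop described in this sub-section can be sketched as follows, assuming the textbook velocity and position updates (inertia term plus randomly weighted pulls toward the particle's own best and the global best). The function name `pso`, the parameter defaults, the clamping of positions to the grid, and the stand-in utility `toy_utility` are assumptions made for illustration; in SPALS the utility would be the cost returned by Section V-B. The sketch also updates the global best as soon as a particle improves on it, which is one common variant.

```python
import math
import random


def pso(utility, k, bounds, n_particles=20, n_iter=100,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise utility(W) over k splitter locations W = [(x, y), ...]."""
    rng = random.Random(seed)
    lo, hi = bounds
    dim = 2 * k  # each particle is stored flat as x1, y1, ..., xk, yk

    def as_points(w):
        return [(w[2 * j], w[2 * j + 1]) for j in range(k)]

    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                    # per-particle best
    best_val = [utility(as_points(p)) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]        # global best

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                phi1, phi2 = rng.random(), rng.random()  # fresh per element
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * phi1 * (best_pos[i][d] - pos[i][d])
                             + c2 * phi2 * (g_pos[d] - pos[i][d]))
                # Move the particle, then clamp it to the grid.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = utility(as_points(pos[i]))
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return as_points(g_pos), g_val


# Stand-in utility (hypothetical): total distance from each BS to its
# nearest splitter. In SPALS this would be the cost from Section V-B.
BS = [(1000.0, 1000.0), (1200.0, 900.0), (9000.0, 9000.0), (8800.0, 9100.0)]


def toy_utility(splitters):
    return sum(min(math.hypot(bx - sx, by - sy) for sx, sy in splitters)
               for bx, by in BS)
```

A call such as `pso(toy_utility, k=2, bounds=(0.0, 10000.0))` returns the best splitter locations found and their utility. Increasing `n_particles` and `n_iter` mirrors the trade-off between L, $I_{max}$ and run time discussed above.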

B. Identification of Connection Technology and Optimum Number of Splitters
In this sub-section, we discuss how separating the optimisation problem into two sub-problems, i.e., the splitter location identification and the connection technology selection, reduces the MINLP introduced in Section IV into a Mixed Integer Linear Program (MILP). With the splitter locations known, $\delta_{u,s}$, $\delta_{s,o}$, $C^{\mathrm{PON}}_{u,s}$, $D^{\mathrm{FSO}}_{u,s}$ and $R^{\mathrm{FSO}}_{u,s}$ can be pre-calculated using (12), (13), (14), (1) and (2) respectively. Therefore, the corresponding constraints are no longer required to be a part of the optimisation problem. Further, the variables $\gamma_{u,s}$ and $\psi_{u,s}$ are no longer required and are therefore removed when designing the reduced optimisation problem.
Hence, the optimisation framework presented in (25) is an MILP, and it can be solved by a solver such as CPLEX [31] or Gurobi [32]. We summarise the preprocessing steps that convert the MINLP to an MILP in Algorithm 1.
The value of the objective function of the MILP (25) acts as the cost function of the PSO described in Section V-A. Working in tandem with the PSO algorithm, the MILP obtains the locations of the splitters and also the connection technologies to the BSs that return the lowest cost (see Fig. 2). The connections are identified by the values of $\theta^{\mathrm{FSO}}_{u,s}$ and $\theta^{\mathrm{PON}}_{u,s}$. Thus, we solve the SPALS problem. Since optimisation (25) is an MILP, the combination of PSO and MILP results in a much quicker method for finding near optimal splitter locations. We call the combined PSO and MILP solution method of SPALS SPALS-M.

2) Moving Towards a Heuristic:
Even though the solution times of MILPs are shorter than those of MINLPs, our MILP, being a bin-packing problem, is also inherently NP-hard. Therefore, as the size of the data set increases, the solution time of the MILP may no longer be practical. As a result, we look into alternatives for solving our MILP (25) with polynomial time complexity.
Since the MILP (25) tries to establish an FSO or a fiber link between the BS u ∈ U and the splitter s ∈ S after identifying the optimum number of splitters required, we can infer the following:
• If $R^{\mathrm{FSO}}_{u,s} < R^{\mathrm{th}}_u$ and/or $D^{\mathrm{FSO}}_{u,s} < D^{\mathrm{th}}_u$, then it is not possible to set up an FSO link for connecting u and s, because the reliability and/or the data rate constraints are not satisfied by the best available FSO link.
• If $C^{\mathrm{PON}}_{u,s} < C^{\mathrm{FSO}}_{u,s}$, then the optimisation will always select a fiber link for connecting s and u, because the target of the optimisation is to minimise cost. Note that PON always meets the data rate and the reliability requirements.
• BS u will always be connected to the splitter s that has the lowest fiber connection cost to the central distribution point while satisfying the reliability and the data rate constraints. Since we are moving towards a greedy heuristic, we select the BSs one by one for connection. Therefore, if a previously chosen BS ($u'$) is already connected to a certain splitter ($s'$), the connection cost of that particular splitter ($s'$) to the central distribution point is deemed to be zero for the current BS (u).
Therefore, we can perform the following operation without changing the solution of the problem (the process is summarised in Algorithm 2). This measure ensures that a fiber link is always selected over an FSO link for connecting s and u. The reduced problem can be solved using the heuristic approximation algorithm shown in Algorithm 3. We reduce the number of candidate splitters for a certain BS using the fact that the least expensive splitter in the current iteration will always be used for providing connection to a BS. Hence, effectively, for every BS, there is only a single candidate splitter with two possible connections (FSO and PON). The fiber link capacities can be ignored in the optimisation, as the problem is modeled to have PON capacity suitable to transport the BS data rate requirements. Even if the aggregated BS data rate requirement becomes high, PON capacity can be enhanced by adding higher channel rates (100 Gbps and above) and multiple wavelength channels, possibly even dedicating one WDM channel to just one BS (i.e., through point-to-point wavelength overlay).
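The observations above suggest a greedy procedure of the following kind. The sketch is written in the spirit of the heuristic and is not a reproduction of Algorithm 2 or Algorithm 3: the function name, the data layout, the processing order of the BSs, and the choice to rank candidate splitters by the marginal cost of link plus feeder fiber are all assumptions of this example.

```python
def greedy_spals(bs_ids, splitter_ids, c_fso, c_pon, feeder_cost, fso_ok):
    """Greedy link selection for known splitter locations.

    c_fso, c_pon: dicts {(u, s): cost} of an FSO / fiber link.
    feeder_cost:  dict {s: cost} of fiber from splitter s to the central point.
    fso_ok:       dict {(u, s): bool}, True if FSO meets the data rate and
                  reliability thresholds of BS u.
    Returns (assignment, total_cost), assignment = {u: (s, "FSO" | "PON")}.
    """
    active = set()
    assignment = {}
    total = 0.0
    for u in bs_ids:
        best = None
        for s in splitter_ids:
            # A splitter that is already in use adds no further feeder cost.
            feeder = 0.0 if s in active else feeder_cost[s]
            # Fiber is assumed to always meet the requirements.
            options = [(c_pon[(u, s)] + feeder, s, "PON")]
            if fso_ok[(u, s)]:
                options.append((c_fso[(u, s)] + feeder, s, "FSO"))
            candidate = min(options)
            if best is None or candidate < best:
                best = candidate
        cost, s, tech = best
        assignment[u] = (s, tech)
        active.add(s)
        total += cost
    return assignment, total
```

Each BS is visited once and each candidate splitter is inspected once per BS, so the run time grows with the product of the number of BSs and the number of candidate splitters. Being greedy, the result depends on the order in which the BSs are processed and is not guaranteed to match the MILP optimum.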

VI. K-MEANS CLUSTERING BASED SOLUTION
In Section V, we have presented a method for solving the SPALS-MINLP (21) with reduced complexity. However, identifying the number of splitters to be placed from a larger number of candidate splitters is computationally expensive. Therefore, in this section, we develop a K-means clustering based solution for SPALS (kMSPALS) for quickly determining the number of splitters. Note that in kMSPALS, we sacrifice optimality for faster results.
We observe that the cost involved in providing connectivity either with FSO or fiber links is directly proportional to the distance of separation between the splitter location and the BS. Therefore, we infer that the K-means clustering algorithm can identify the number of cluster centroids required to group the number of BSs. Thereafter, we can use the method given in Section V-A to identify the exact splitter locations.
From the literature, we can find that the Silhouette method is one of the most efficient methods of determining the number of clusters required to optimally group a set of data-points [33], [34]. The Silhouette method uses a collection of proximities to construct silhouettes. The proximities include similarities and dissimilarities between the objects of the clusters [33].
Let us assume that there are K clusters and a sample is assigned to cluster C(i). Let |C(i)| denote the number of samples in cluster C(i). Then, we calculate two metrics a(i) and b(i) for the sample i using (26). Formally, a(i) is the average dissimilarity of sample i to its cluster and b(i) is the average dissimilarity of sample i to the closest cluster apart from cluster C(i).
After calculating a(i) and b(i), the Silhouette of sample i is calculated as
s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}},
where s(i) is the Silhouette value of sample i, and −1 ≤ s(i) ≤ 1. Note that a higher s(i) denotes better clustering [33]. Finally, the average of all the Silhouette values is calculated as
S_K = \frac{1}{N} \sum_{i=1}^{N} s(i),
where N is the total number of samples and K is the number of clusters. In kMSPALS, the number of clusters (K) is varied from 2 to the total number of samples present in the data set and the value of S_K is noted in each case. The value of K that returns the highest S_K is taken as the optimum number of clusters. In Fig. 4, we show the silhouette average of the BS distribution illustrated in Fig. 3. From Fig. 4, we can see that the maximum value of S_K is obtained for K = 12. Therefore, we select K = 12 as the number of splitters to be placed by kMSPALS in the example grid area.
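The selection of K by the average Silhouette value can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used for the paper: the synthetic BS positions, the function names, the Euclidean dissimilarity, the 1/(|C(i)| − 1) normalisation of a(i), and the deterministic farthest-point initialisation of K-means (chosen here purely for reproducibility) are all assumptions.

```python
import math
import random

def dist(p, q):
    """Euclidean dissimilarity between two 2-D points (assumed metric)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def kmeans_labels(points, k, iters=100):
    """Lloyd's algorithm with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return labels

def mean_silhouette(points, labels):
    """Average of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all samples."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return 0.0  # silhouette is undefined for a single cluster
    total = 0.0
    for i, own_label in enumerate(labels):
        own = clusters[own_label]
        if len(own) == 1:
            continue  # convention: s(i) = 0 for a singleton cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist(points[i], points[j]) for j in idxs) / len(idxs)
                for lab, idxs in clusters.items() if lab != own_label)
        total += (b - a) / max(a, b)
    return total / len(points)

def best_cluster_count(points, k_max):
    """Sweep K from 2 to k_max and return the K with the highest S_K."""
    scores = {k: mean_silhouette(points, kmeans_labels(points, k))
              for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores

# Hypothetical BS layout: three tight groups of eight BSs each.
rng = random.Random(1)
centres = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
bs_positions = [(cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5))
                for cx, cy in centres for _ in range(8)]
best_k, scores = best_cluster_count(bs_positions, k_max=8)
print(best_k, round(scores[best_k], 3))
```

On such well-separated groups the sweep should pick one cluster per group. For realistic BS layouts, a library implementation with several random restarts would likely be preferable to this bare-bones version.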

VII. SIMULATION SETUP
The evaluation framework for the proposal was developed in Python and the optimisation framework was coded using the Pyomo package. Initially, COUENNE, a non-linear solver, was used for solving the SPALS-MINLP in a scaled down scenario (2 km × 2 km grid). The Gurobi solver was used for solving the MILP (both in the scaled down and the original scenario). The parameters used are listed in Table II. The cost of fiber is taken as 1 unit per meter. If trenching is required for installing the fibers, an additional cost of 1300 units per meter has been assumed. On the other hand, if ducts are already present in a location (e.g., sewer ducts), trenching is no longer required. Hence, we have used 65 units per meter as the installation cost when ducts are already present.
The cost of a hybrid RF/FSO link is taken to be independent of the distance (up to the maximum reach of the system). Since two FSODs are required to set up an FSO link (one at the ONU and the other at the BS), 10 k units has been adopted as the cost of an FSO link. It should be noted that the threshold distances for data rate (d D ) and reliability (d R ) were chosen following the work in [18]. In particular, the reliability threshold distance refers to a light fog scenario, as we wanted to benchmark our proposal against a fairly challenging scenario for FSO communication. It is envisaged that such values would need to be adapted to specific locations, e.g., using statistical weather data, in order to obtain results that are accurate for those locations.
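To make the cost figures above concrete, the following Python sketch compares the flat FSO link cost with the distance-dependent fiber cost. It is a rough illustration under stated assumptions: the installation cost (1300 or 65 units per meter) is assumed to be added on top of the 1 unit per meter fiber cost, the parameter fso_limit_m is a hypothetical stand-in for the smaller of the two threshold distances, and all function names are invented for this example.

```python
FIBER_COST_PER_M = 1.0           # fiber cost, units per meter (from the text)
TRENCHING_COST_PER_M = 1300.0    # additional cost when trenching is required
DUCT_INSTALL_COST_PER_M = 65.0   # installation cost when ducts are present
FSO_LINK_COST = 10_000.0         # flat cost of one FSO link (two FSODs)

def fiber_cost_per_m(ducts_present):
    """Total per-meter fiber cost; assumes installation adds to the fiber cost."""
    install = DUCT_INSTALL_COST_PER_M if ducts_present else TRENCHING_COST_PER_M
    return FIBER_COST_PER_M + install

def fiber_link_cost(distance_m, ducts_present):
    """Cost of a fiber link of the given length."""
    return distance_m * fiber_cost_per_m(ducts_present)

def break_even_distance_m(ducts_present):
    """Distance beyond which the flat-cost FSO link is cheaper than fiber."""
    return FSO_LINK_COST / fiber_cost_per_m(ducts_present)

def cheapest_link(distance_m, ducts_present, fso_limit_m):
    """Pick FSO only if it is within the (hypothetical) limit and cheaper."""
    fiber = fiber_link_cost(distance_m, ducts_present)
    if distance_m <= fso_limit_m and FSO_LINK_COST < fiber:
        return ("fso", FSO_LINK_COST)
    return ("fiber", fiber)

for ducts in (False, True):
    print(ducts, round(break_even_distance_m(ducts), 1))
```

Under these assumptions the break-even distance is roughly 7.7 m when trenching is required and roughly 151.5 m when ducts are present, which suggests why FSO is attractive for almost any link that satisfies the distance thresholds.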
In order to benchmark our simplified proposals, SPALS-M and SPALS-H, we first solved SPALS-MINLP using COUENNE. Unfortunately, the solution search space for the 20 km × 20 km grid is very large and, as a result, the problem could not be solved on a laboratory computer. Therefore, we scaled down the network to a 2 km × 2 km grid. The parameters used for this case are shown in Table III. Notice that some parameters were artificially scaled down here to remain in proportion to the network size and to obtain insightful solutions in the scaled down network. Thus, while these values are useful for the purpose of algorithm comparison, they are not representative of actual system specifications.

VIII. RESULTS
In this section, we provide results that gauge the performance of the proposal.

A. Comparison With Performance of COUENNE
We start the evaluation of our proposal by first solving the SPALS-MINLP (21) with the help of COUENNE. For the experiment, we set the maximum number of splitters to be placed as 25. The MINLP selects the optimum number of splitters to be used for cost minimisation. We also take a reduced grid area of 2 km × 2 km. We resort to such a reduced system as COUENNE takes a long time to converge. While the area is reduced, the random assignment of end users still provides sufficient generality to carry out the comparison. We also execute SPALS-M and SPALS-H algorithms on the same grid.
In Fig. 5(e), we find that the final network deployment costs are similar for the SPALS-MINLP and the SPALS-M/H (MILP and heuristic) based solutions. As expected, COUENNE, being a global optimiser, provides the better solution. On the other hand, SPALS-H, being a greedy heuristic, provides a slightly sub-optimal solution compared to SPALS-MINLP and SPALS-M. The difference in the cost returned by SPALS-MINLP and SPALS-H is within approximately 1.5% up to a reliability threshold of 0.9 and rises to about 12% at a reliability threshold approximately equal to 1. SPALS-M, on the other hand, approximately reaches the global minimum. The slight difference between SPALS-MINLP and SPALS-M is mainly due to the stochastic nature of the PSO algorithm. Hence, we infer that SPALS-M can be used reliably for larger grid areas to obtain sufficiently accurate results. Further, SPALS-H may also be used to obtain a fast solution with reasonable accuracy.
Further, it is observed in Fig. 5(e) that the cost of the network starts increasing for values of the reliability threshold above 0.2. The primary reason behind the escalation in cost is the increase in the number of splitters required to meet the data rate and reliability thresholds of the network (as seen in Fig. 5(f)). Moreover, at higher reliability thresholds, a large number of BSs are connected using fiber links (as seen in Fig. 5(d)) as the FSO links become inadequate. Hence, the cost of the network increases even further.
Looking at the time of execution, SPALS-MINLP, SPALS-M and SPALS-H require (> 5 days), (≈ 2 hrs) and (≈ 15 mins) respectively for solving the optimization problem for a single

1) kMSPALS Evaluation:
We present the evaluation of kMSPALS first, as the intuition gained with the fixed number of splitters returned by the K-means clustering algorithm helps in explaining the SPALS performance later on. The maximum number of splitters to be used for the network deployment is obtained using the K-means clustering algorithm. Once this maximum is identified, we employ SPALS-H to obtain the positions of the splitters and the optimum connections from the splitters to the BSs. Hence, the number of splitters deployed cannot exceed the number returned by the K-means clustering algorithm.
In Fig. 6, the splitter locations that are used are represented by blue dots, while the unused splitter locations are denoted by pink squares. The BSs connected to a splitter using FSO are marked by green stars and the BSs connected to a splitter using fiber are marked by red stars. We can clearly observe that as the desired reliability of the links increases, a higher number of BSs are connected using fiber links. Conversely, as the reliability expectation of the links decreases, the majority of the connections gradually shift towards FSO. Note that the PSO module of the SPALS-H algorithm optimises the splitter locations so that the cost of the network can be minimized while satisfying the operational constraints.
2) SPALS Evaluation:
In Fig. 7, we obtain the number of splitters required to be deployed in the 20 km × 20 km grid by using SPALS. The primary difference between Fig. 6 and Fig. 7 lies in the maximum number of splitters that can be used for network deployment. The ideal value for the maximum number of splitters is equal to the number of BSs in the area. However, any sufficiently large value for which we can observe unused splitter locations even at the highest desired link reliability threshold is an acceptable value for the maximum number of splitters.
The increase in the number of splitters has the following two contrasting effects on the network deployment cost:
• The BSs are now closer to the splitters. As a result, a higher number of BSs can be connected using comparatively inexpensive FSO links. Alternatively, if a BS needs to be connected to the splitter using a fiber connection, the fiber laying cost is lower due to the shorter distance between the splitter and the BS. Thus, the network deployment cost decreases.
• A higher number of splitters need to be connected to the PON using fibers. As the splitters/FSO-DPs provide connectivity to a set of BSs, the splitter connectivity must be absolutely reliable and therefore fiber connections are used. Consequently, the connections to the splitters are expensive. Moreover, installing more splitters/FSO-DPs also increases the overall cost of the network. Thus, we observe an increase in the network deployment cost.
The above two conflicting effects are weighed by SPALS to reduce the overall cost of the network by selecting the optimum number of splitters required. Hence, the cost returned by SPALS is lower than that of kMSPALS, as observed from Fig. 9.
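The two opposing effects can be illustrated with a deliberately simplified toy model in Python. This sketch is not SPALS: it places BSs and evenly spaced splitters on a single line and sweeps the number of splitters exhaustively. The per-splitter cost SPLITTER_COST, the distance limit FSO_LIMIT_M, the one-dimensional layout and the function names are all hypothetical, and the fiber cost of 66 units per meter assumes that the duct installation cost adds to the fiber cost.

```python
FSO_LINK_COST = 10_000.0    # flat FSO link cost (from the text)
FIBER_COST_PER_M = 66.0     # fiber plus duct installation (assumed to add up)
SPLITTER_COST = 50_000.0    # hypothetical cost of installing and feeding a splitter
FSO_LIMIT_M = 1_000.0       # hypothetical FSO distance threshold

def link_cost(distance_m):
    """Cheapest admissible BS link: FSO only within the distance limit."""
    fiber = distance_m * FIBER_COST_PER_M
    if distance_m <= FSO_LIMIT_M:
        return min(fiber, FSO_LINK_COST)
    return fiber

def deployment_cost(bs_positions, num_splitters, line_length_m):
    """Splitter cost plus the cost of linking every BS to its nearest splitter."""
    spacing = line_length_m / num_splitters
    splitters = [(j + 0.5) * spacing for j in range(num_splitters)]
    links = sum(link_cost(min(abs(b - s) for s in splitters)) for b in bs_positions)
    return num_splitters * SPLITTER_COST + links

LINE_LENGTH_M = 20_000.0
bs_positions = [250.0 + 500.0 * n for n in range(40)]  # one BS every 500 m

# Sweep the number of splitters from 1 up to the number of BSs.
costs = {k: deployment_cost(bs_positions, k, LINE_LENGTH_M) for k in range(1, 41)}
best_k = min(costs, key=costs.get)
print(best_k, costs[best_k], costs[1], costs[40])
```

With a single splitter, most BSs need long and expensive fiber runs, while with one splitter per BS the splitter cost dominates. In this toy setting the minimum lies between the two extremes, which mirrors the balance that SPALS strikes on the real two-dimensional problem.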

C. Comparison of Network Deployment Costs
Finally, the cost of the network is a direct consequence of the connection technology used in the network. FSO links are more cost effective over longer distances when compared to fiber links. However, the reliability of an FSO link decreases with distance. Therefore, as the desired network reliability increases, the cost of the network increases as well (either due to the usage of more splitters or more fiber connections). We can observe the effect of the desired reliability on the network cost in Fig. 9. Further, the cost of fiber deployment is heavily dependent on whether ducts are already available in the region. We show the extreme cases with full or zero availability of ducts in Fig. 9. Any practical deployment would cost somewhere between these extreme cases.
In SPALS, we evaluate the cost of network deployment with the maximum allowable splitter set cardinality set to 50 (as we found this to be a sufficiently large number for a 20 km × 20 km grid). Thereafter, we choose the splitter set that returns the minimum network deployment cost. kMSPALS, on the other hand, identifies the splitter set cardinality without considering the reliability and data rate constraints and hence provides a suboptimal solution. As a result, SPALS minimises the network deployment cost more effectively than kMSPALS. The observation from Fig. 9 reinforces our claim; even when the link reliability requirement is 1, the SPALS algorithm provides a cost reduction of about 73% and 70% with respect to a scenario with all fiber connections, where ducts are absent or present respectively. On the other hand, kMSPALS provides 25% and 23% improvement when ducts are absent and present respectively.

IX. CONCLUSION
FSO is a promising solution for providing backhaul/fronthaul services to 5G and beyond BSs. Unfortunately, the weather dependency of FSO links can be an obstacle to its large scale adoption. However, FSO links are sufficiently reliable over shorter distances. As a result, their benefits can be exploited in a hybrid network with FSO and optical fibers. In this paper, we have proposed an efficient and near optimal method for designing a hybrid FSO/fiber network with the objective of minimizing the network deployment cost. The proposed method carefully considers the reliability and data rate requirements of the connected wireless BSs while selecting the link technology. The results reflect the dependency of the network cost on the reliability requirements. The cost reduction is especially evident (up to 73% even for high desired link reliability) when ducts are not present in the scenario and trenching is required for fiber installation. The proposed network deployment method produces results within practical time limits and is therefore suitable for adoption in network planning. As future work, we intend to extend the deployed network to a multihop FSO scenario.