Network Site Optimization and Clustering Study Based on Simulated Annealing Algorithm

Communication networks are becoming increasingly complex. This paper demonstrates an effective method for the intelligent planning of network base stations (BSs). Parameters such as the BS coordinates (x, y), the collaboration of multiple types of BS, and the density of BS construction are taken as design parameters for BS placement. The objective function minimizes the total cost, subject to the BSs covering at least 90% of the total workload. To handle siting planning with large data volumes and mixed placement of multiple BS types, we propose a new, practical three-step model for BS siting planning: (I) roughly selecting the alternative BS coordinates using the DBSCAN algorithm; (II) correcting and refining the alternative BS coordinates using the K-means algorithm; (III) determining the optimal BS construction solution that meets the requirements using the simulated annealing algorithm (SAA). Real data from a 2500 × 2500 area are used for the simulation test. In the simulation, the BS placement covers 90.03% of the workload, confirming that the proposed method can handle site planning for large data volumes and use a mix of BS types to meet the demand at the lowest cost. This paper provides a basis for future research on network site optimization.

BS location and coverage information, combined with the capacity and transceiver characteristics of the proposed BS [6]. In addition, construction cost is a significant factor in meeting user service demand while ensuring the BS radius and coverage design. There are two main approaches to such problems, which differ in how they decide whether the demand is met: by the coverage area [7] or by the size of the service volume [6]. Coverage-area approaches fall into two categories: Set Covering Location Problems (SCLP) [8], which aim to minimize the cost of meeting the demand across all areas, and Maximal Covering Location Problems (MCLP) [9], which aim to cover as large an area as possible at a given cost. Given its practical applications in fields such as electric vehicle charging stations [10] and Unmanned Aerial Vehicle (UAV) BSs [11], MCLP has attracted considerable attention.
Various methods and tools have been used in practice to determine the optimal construction locations of BSs for communication network planning. Researchers previously often used multi-objective linear programming to solve these problems; over time, however, approaches such as the simulated annealing algorithm [12], particle swarm optimization [13], the genetic algorithm [5], the discrete fireworks algorithm [6], and the greedy algorithm [14] have become the mainstream solutions. Among them, multi-objective linear programming methods provide higher-quality solutions by relaxing some constraints [15], while the performance of population-based evolutionary algorithms, such as genetic algorithms, depends heavily on the efficiency of the encoding [6]. Hence, developing new algorithms or improving existing methods is crucial.
Many scholars have previously proposed models for BS construction; among them, the two-step model of Downs and Camm [16] opened up new ideas for solving such problems. The first step of this model determines the candidate locations set (CLS), and the second step uses a heuristic algorithm to select, from the corresponding CLS, the alternative sites that satisfy the conditions so as to achieve maximum coverage. This method finds the optimal solution in most cases. As an alternative to the two-step model, He et al. proposed a three-step model [17]. Like the two-step model, its first step selects CLSs as the initialization phase. In the second step, the correction phase, some alternative BS coordinates with low service coverage are removed from the selected CLSs. In the third step, alternative sites are selected from the corrected CLSs to achieve maximum coverage. Besides, Amine et al. [18] considered several realistic factors for BS construction, constrained the traffic demand, took optimal coverage at the lowest economic overhead as the planning objective, and applied genetic algorithms to solve this multi-objective planning problem.
Recently, many scholars have incorporated practical factors into the BS siting problem. Haghi et al. [19] considered a distance-decaying function for facility coverage. Arana-Jiménez et al. [20] modeled the problem from a fuzzy perspective, producing suitable fuzzy solutions. Baldomero-Naranjo et al. [21] considered demand that is unknown and distributed along the edges. Tedeschi and Andretta [22] considered the problem of selecting the optimal site for a BS with elliptical coverage. In a study on facility location and equipment emplacement, Rodriguez et al. [23] analyzed the effect of vehicles' average utilization on siting. Nwelih et al. [24] introduced a weighted fitness function that combines coverage, capacity, and transmit power parameters. However, most articles consider only the coordinate selection for a single type of BS, neglecting the practical complexity of multiple BS types working in combination [25], [26]. Moreover, only a few papers provide solutions for practical problems with large data volumes [17], [27].
In summary, current BS placement methods have several challenges and limitations:
(1) Most consider only a single type of BS and do not consider a mixture of multiple BS types.
(2) The stability of coexistence between BSs, i.e., the distance threshold between BSs, is largely ignored.
(3) Most siting schemes leave room for further optimization, and they only partially exploit optimization algorithms to reach the optimal solution for a given situation.
Based on the above analysis, this paper proposes an improved three-step network planning model that applies to multiple types of BS working together, is suitable for large-scale data computation, and meets the multiple requirements of BS planning (including coverage, user requirements, and signal quality).
The three-step model addresses the challenges and limitations listed above: it uses clustering to solve the stability problem of BS coexistence, and optimization to handle the mixed construction of multiple BS types and to find the optimal scheme.
The model consists of the following three steps: (1) The large set of weak coverage coordinate points is clustered preliminarily with the DBSCAN algorithm, with the radius parameter ε set to the BS threshold distance of 10 and the neighborhood density threshold MinPts set to 1. Computer simulation then yields the set of candidate site coordinates. (2) The candidate site coordinates obtained by DBSCAN in (1) are refined by clustering them again with the K-means algorithm. This prevents some clusters from (1) from becoming too large (i.e., the cluster radius exceeding the given threshold distance) and so ensures stable BS operation. The parameter K for each cluster is one-tenth of the maximum of the horizontal and vertical extents of that cluster.
(3) To reflect the mixed construction of multiple BS types in practice, the candidate coordinates obtained in (2) are optimized again. The decision variable is which type of BS (if any) to construct at each candidate coordinate, the objective function is the total construction cost to be minimized, and the planning constraint is that the service volume within the coverage area exceeds 90% of the total service volume. Finally, the simulated annealing algorithm (initial temperature 1000, maximum number of iterations 1000, 10,000 iterations at each temperature, and temperature decay coefficient 0.95) produces the final station plan.
The planning model can be applied to various BS network planning scenarios. Unlike most traditional BS planning models, which include only a single type of BS, it can combine different types of BSs to achieve the planning objectives. The model also has a flexible structure: BS site planning for any requirement can be accomplished by converting the constraints and objectives of the actual problem into the corresponding parts of the model. In addition, the clustering steps (initialization in step 1 and correction of the candidate station coordinates in step 2) greatly reduce the amount of data to be processed when optimizing the planning scheme in step 3. The effectiveness of the model is verified on a 2500 × 2500 test dataset.
This paper uses a real dataset with a size of 2500 × 2500 as an example to explain how the model works. The remaining parts of this paper are organized as follows. Section II discusses the BS data pre-processing. Section III demonstrates how the three-step model works to obtain the best plan for BS arrangement. Section IV summarizes the final scheme and model characteristics and presents ideas for future research.

II. BASE STATION DATA MUNGING
This paper uses a fixed 2500 × 2500 raster area as an example for constructing new base stations (BSs). The information for each raster cell includes whether it is a weak coverage point, its service volume, and its latitude and longitude coordinates (simplified to horizontal and vertical grid coordinates).
The given data were first inspected carefully, and the volume of weak coverage raster data was found to be too large to be handled efficiently in the subsequent model solution. Fig. 1 shows the locations of the 182,807 data points in the planning area.
Considering the time complexity of the calculation, we decided to pre-process the data. Since the plan requires coverage of more than 90% of the service volume, all weak coverage points were sorted in ascending order of service volume. The lowest-volume points, whose cumulative service volume accounts for only 5% of the total, make up 70.01% of all data points. Although these points are numerous, they have little impact on demand and barely affect the later coverage calculation. We therefore eliminated them to simplify the data. A descriptive analysis of a random sample of 2,000 data points from the BS data source revealed substantial dispersion in the data. As Fig. 2 shows, eliminating the points with minimal service volume not only simplifies the calculation but also makes the cluster structure used in the subsequent steps more apparent, while still ensuring coverage of the service requirements. This pre-processing is therefore justified.
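The filtering rule can be sketched as follows. This is a minimal illustration, assuming each point is an `(x, y, volume)` tuple; the function name and data layout are hypothetical, not taken from the authors' Matlab code. It drops the lowest-volume points for as long as their cumulative volume stays within 5% of the total.

```python
def drop_low_volume_tail(points, tail_share=0.05):
    """Remove the lowest-volume points while their cumulative volume stays within tail_share of the total.

    points: list of (x, y, service_volume) tuples (hypothetical layout).
    Returns (kept points sorted by ascending volume, fraction of points removed).
    """
    total = sum(v for _, _, v in points)
    ordered = sorted(points, key=lambda p: p[2])  # ascending service volume
    cumulative = 0.0
    cut = 0
    for _, _, v in ordered:
        if cumulative + v > tail_share * total:   # the next point would push the tail past 5%
            break
        cumulative += v
        cut += 1
    return ordered[cut:], cut / len(points)
```

In the toy case below, three points carrying 3% of the volume are removed and the single high-volume point is kept; on the full dataset the text reports that this kind of cut removes 70.01% of the points.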
The data were then checked for outliers and missing values; all values were present and reasonable, which completes the analysis and pre-processing. Fig. 3 shows the locations of the 54,817 data points retained after pre-processing.

III. MODELLING AND SOLVING
When investigating the demand data of base stations (BSs) in the selected area, three quantities must be considered for each grid cell: its latitude and longitude coordinates, its signal coverage strength, and its total service volume. To improve signal coverage in the area, the covered service volume must reach at least 90% of the total service volume once the network BS planning is completed. In addition, to ensure signal stability, the distance between BSs must be greater than a given threshold. In actual network planning, signal coverage strength and stability, the cost of building BSs, and other practical factors must all be considered. This paper divides BSs into macro and micro BSs. A macro BS covers a radius of 30 grid units at a cost of 10 value units, whereas a micro BS covers a radius of 10 grid units at a cost of 1 value unit. The rest of this section explains how our model deploys the BSs.

A. STEP 1: ROUGHLY SELECTING THE ALTERNATIVE COORDINATES FOR THE BS BASED ON DBSCAN ALGORITHM
The site selection and construction of base stations (BSs) have the following characteristics:
1. The number of BSs to be constructed is not known in advance. Areas with high traffic density may require more BSs, whereas areas with low traffic density may require fewer.
2. The BS coverage radius constrains which weak coverage points can be served together. If the service demand of two weak coverage points is satisfied by the same BS, the distance between them must be no greater than twice the BS coverage radius (i.e., the diameter of the BS coverage area).
The DBSCAN clustering algorithm has the following characteristics:
1. Unlike many other clustering algorithms, DBSCAN does not require the number of clusters to be specified in advance.
2. Its key parameter is the neighborhood radius ε, which here corresponds to the maximum coverage radius. With MinPts = 1, two data points whose distance is less than or equal to ε belong to the same cluster. DBSCAN therefore determines the number of clusters automatically and links the coordinates of the weak coverage points to the BS coverage radius. For these reasons, we chose DBSCAN for the initial selection of BS construction coordinates.
To simplify planning, this paper first considers only micro BSs when selecting BS sites; a single macro BS later replaces multiple nearby micro BSs to save cost. Since the coverage radius of a micro BS is 10, we set ε (the radius parameter) to 10 and MinPts (the neighborhood density threshold) to 1, so that by default there are no noise points [28], [29]. Feeding the pre-processed weak coverage raster data to DBSCAN gives the preliminary clustering results shown in Fig. 4.
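Because MinPts = 1 makes every point a core point, DBSCAN with these settings reduces to finding the connected components of the graph that links points no more than ε apart. The sketch below implements that special case directly with a uniform grid of cell size ε. The function name is hypothetical, and it assumes the common convention that a point counts toward its own neighborhood (so MinPts = 1 means no noise); scikit-learn's `DBSCAN(eps=10, min_samples=1)` uses that convention and should give the same partition.

```python
import math
from collections import defaultdict, deque


def dbscan_minpts1(points, eps=10.0):
    """Cluster labels for DBSCAN with MinPts = 1 (a point counts toward its own neighborhood).

    Every point is then a core point, so there is no noise and the clusters are the
    connected components of the graph joining points at distance <= eps.
    """
    grid = defaultdict(list)                 # uniform grid with cell size eps
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / eps), math.floor(y / eps))].append(idx)

    labels = [-1] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        queue = deque([start])               # breadth-first expansion of one cluster
        while queue:
            i = queue.popleft()
            xi, yi = points[i]
            cx, cy = math.floor(xi / eps), math.floor(yi / eps)
            for dx in (-1, 0, 1):            # eps-neighbors can only lie in the 3 x 3 adjacent cells
                for dy in (-1, 0, 1):
                    for j in grid.get((cx + dx, cy + dy), ()):
                        if labels[j] == -1 and math.hypot(xi - points[j][0], yi - points[j][1]) <= eps:
                            labels[j] = next_label
                            queue.append(j)
        next_label += 1
    return labels
```

For example, `dbscan_minpts1([(0, 0), (8, 0), (16, 0), (100, 100)])` places the first three points in one cluster although (0, 0) and (16, 0) are 16 apart. This chaining is what lets a DBSCAN cluster extend beyond a single circle of radius ε.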
Next, the center coordinate ($X_{center}$, $Y_{center}$) of each cluster containing n data points is computed as

$$X_{center} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad Y_{center} = \frac{1}{n}\sum_{i=1}^{n} Y_i,$$

where $X_i$ and $Y_i$ are the horizontal and vertical coordinates of the i-th point in the cluster [28]. Fig. 5 shows that the clustering effect is pronounced and that the goal of clustering the weak coverage points is initially achieved. After iterative calculation in Matlab, some of the cluster centers and the number of points they contain are listed in Table 1. After clustering, the largest cluster contains 556 coordinates, more than the maximum coverage of a single micro BS (314, approximately π × 10², the area of a disc of radius 10 in grid units). Fig. 6 shows that DBSCAN only requires each point in a cluster to lie within the radius parameter ε of at least one other point in the same cluster; points can therefore be chained together, and not all points in a cluster necessarily lie within a circle of radius ε [30]. Cluster centers produced by DBSCAN alone therefore cannot be used directly as candidate coordinates for the initial micro BSs, because they may leave some points uncovered and fail to meet the 90% service demand requirement. Clusters that span too large an extent must therefore be clustered further.
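The center computation, together with a check for clusters that extend beyond a micro BS radius, can be sketched as below. The flagging rule (any member farther than ε = 10 from its cluster center) is an assumption modeled on the threshold described in the introduction, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def cluster_centers(points, labels):
    """Return {label: (x_center, y_center, n)}: the mean of each cluster's member coordinates."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), lab in zip(points, labels):
        acc = sums[lab]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {lab: (sx / n, sy / n, n) for lab, (sx, sy, n) in sums.items()}


def needs_refinement(points, labels, centers, radius=10.0):
    """Sorted labels of clusters having a member farther than `radius` from the cluster center."""
    flagged = set()
    for (x, y), lab in zip(points, labels):
        cx, cy, _ = centers[lab]
        if math.hypot(x - cx, y - cy) > radius:
            flagged.add(lab)
    return sorted(flagged)
```

A chained cluster such as (0, 0), (8, 0), (16, 0), (24, 0) has its center at (12, 0), so its end points lie 12 units away and the cluster is flagged for the K-means step.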

B. STEP 2: CORRECTING AND FURTHER REFINING THE ALTERNATIVE BS COORDINATES BASED ON K-MEANS ALGORITHM
The K-means clustering algorithm is used for the secondary clustering, which further splits the clusters that exceed the coverage of a micro BS. The algorithm divides the samples into K clusters so as to minimize the sum of the squared distances between the samples and their cluster centers [31], [32], [33]. Only the clusters from the initial clustering that exceed the predefined threshold undergo this secondary clustering, and the K value is selected individually for each cluster. Let $x_{max}$ denote the largest difference between the x coordinates of two points in a cluster, and $y_{max}$ the largest difference between their y coordinates. Since the micro BS radius parameter ε is 10, the number of clusters K for a single cluster is

$$K = \frac{\max(x_{max}, y_{max})}{10}.$$
Updating the DBSCAN clustering results with the K-means results yields 5,815 cluster centers that can be chosen as BS sites. If micro BSs were built at all 5,815 cluster centers, the entire service volume would be covered. Alternatively, if macro BSs are built at some of the centers, micro BSs at others, and the rest remain BS-free, the construction cost can be reduced significantly while still reaching the planning target of 90% of the service volume. The 5,815 cluster centers identified as potential sites for new BSs are visualized in Fig. 7.
Some summary results of the final clustering centers are given in Table 2.
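The per-cluster choice of K and the secondary clustering can be sketched as follows. This is plain Lloyd's K-means in the spirit of Algorithm 2 (random initial centers drawn from the cluster, nearest-center assignment, mean update). Rounding K up to an integer, capping K at the number of points, the iteration limit, and the fixed random seed are assumptions, and the function names are hypothetical.

```python
import math
import random


def k_for_cluster(cluster, step=10.0):
    """K = max(x extent, y extent) / step, rounded up to an integer and at least 1 (assumed rounding)."""
    xs = [x for x, _ in cluster]
    ys = [y for _, y in cluster]
    extent = max(max(xs) - min(xs), max(ys) - min(ys))
    return max(1, math.ceil(extent / step))


def kmeans(cluster, k, max_iter=100, seed=0):
    """Lloyd's K-means on a list of (x, y) points; returns the k cluster centers."""
    k = min(k, len(cluster))                 # cannot have more centers than points
    rng = random.Random(seed)
    centers = rng.sample(cluster, k)         # random initial centers drawn from the data
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for x, y in cluster:
            # assign each point to its nearest center (squared Euclidean distance)
            nearest = min(range(k), key=lambda i: (x - centers[i][0]) ** 2 + (y - centers[i][1]) ** 2)
            groups[nearest].append((x, y))
        # move each center to the mean of its group; keep it in place if the group is empty
        updated = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if updated == centers:               # assignments have stabilized
            break
        centers = updated
    return centers
```

Using K-means only inside clusters already isolated by DBSCAN keeps each K-means run small, which is what makes the two-layer clustering practical on tens of thousands of points.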

C. STEP 3: OBJECTIVE OPTIMIZATION MODEL BASED ON SAA
The architecture of a 5G network is more complex than that of a 4G network, and the emergence of 5G micro BSs can effectively improve the cost-effectiveness of network BS construction and optimize the service capacity of the system. Since the two clustering steps above ensure signal coverage and stability, the third step combines their results into a planning model with the minimum total cost as the objective function and the minimum covered workload (90% of the total) as the constraint. Intelligent optimization based on the simulated annealing algorithm then further improves the BS siting scheme by adjusting the number of stations and the macro and micro BS types and by changing the station layout.
It should also be emphasized that the signal coverage of the BS construction scheme and the signal interference generated by the coexistence of macro and micro BSs are further factors that we need to consider. However, in the two clustering steps, DBSCAN (step 1) ensures signal coverage and K-means (step 2) ensures the stability of coexisting macro and micro BS signals, so both steps simplify the overall optimization.
To obtain the optimal construction plan for BSs, this paper considers the selection of new BSs as the decision variable, the minimum total cost of new BSs as the objective function, and the constraint that the service volume within the coverage area is greater than 90% of the total service volume. In summary, the following model can be obtained.
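For concreteness, the model can be written as follows. This is a sketch in notation introduced here for illustration rather than the paper's own equations: $N$ candidate sites at locations $q_i$ (the 5,815 cluster centers), $M$ weak coverage points at locations $p_j$ with service volume $w_j$, and a type variable $z_i \in \{0, 1, 2\}$ meaning no BS, micro BS, or macro BS, with coverage radii $r(1) = 10$ and $r(2) = 30$ and costs 1 and 10 as stated above.

$$\min_{z}\; C(z) = \sum_{i=1}^{N} \left( \mathbb{1}[z_i = 1] + 10\,\mathbb{1}[z_i = 2] \right)$$

$$\text{s.t.}\quad \sum_{j=1}^{M} w_j\, \mathbb{1}\!\left[\exists\, i:\ z_i \neq 0,\ \lVert p_j - q_i \rVert \le r(z_i)\right] \;\ge\; 0.9 \sum_{j=1}^{M} w_j, \qquad z_i \in \{0, 1, 2\}.$$

The minimum distance between BSs is not written as a separate constraint here because, as described above, the two clustering steps are intended to enforce it.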
The objective function minimizes the total construction cost of the new BSs, and the binding condition requires the service volume within the coverage area to be at least 90% of the total service volume. Candidate station coordinates are chosen randomly, and each candidate site may host no BS, a micro BS, or a macro BS. To prevent a newly generated macro BS from overlapping the coverage of surrounding micro BSs, candidate sites that already lie within the coverage of both a macro BS and a micro BS are set to the first choice in Matlab, i.e., no BS is established. This model can be solved using the SAA, a heuristic optimization algorithm that in theory converges to the global optimum under a sufficiently slow cooling schedule [34], [35]; we therefore employ it to find the siting solution. The initialization parameters of the SAA are listed in Table 3.
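The annealing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the neighbor move (change the type of one randomly chosen site), the penalty term that folds the 90% coverage constraint into the energy, the stopping temperature `t_min`, and `moves_per_t` are all assumptions. Only the initial temperature of 1000, the decay coefficient 0.95, the BS costs and radii, and the all-macro starting plan come from the text; the paper's 10,000 iterations per temperature are reduced here so the toy example runs quickly.

```python
import math
import random

COST = {0: 0, 1: 1, 2: 10}        # 0 = no BS, 1 = micro BS, 2 = macro BS (costs from the text)
RADIUS = {1: 10.0, 2: 30.0}       # coverage radius in grid units


def covered_share(plan, sites, points):
    """Fraction of total service volume within range of at least one built BS."""
    total = sum(w for _, _, w in points)
    covered = 0.0
    for px, py, w in points:
        if any(z and math.hypot(px - sx, py - sy) <= RADIUS[z] for z, (sx, sy) in zip(plan, sites)):
            covered += w
    return covered / total


def energy(plan, sites, points, target=0.9, penalty=1e4):
    """Total cost plus an assumed penalty proportional to any shortfall below the coverage target."""
    cost = sum(COST[z] for z in plan)
    shortfall = max(0.0, target - covered_share(plan, sites, points))
    return cost + penalty * shortfall


def anneal(sites, points, t0=1000.0, alpha=0.95, t_min=1e-3, moves_per_t=50, seed=0):
    """Simulated annealing over per-site BS types; returns the best plan found and its energy."""
    rng = random.Random(seed)
    plan = [2] * len(sites)                  # start with a macro BS at every candidate site
    e = energy(plan, sites, points)
    best, best_e = plan[:], e
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            cand = plan[:]
            i = rng.randrange(len(sites))
            cand[i] = rng.choice([z for z in (0, 1, 2) if z != plan[i]])  # change one site's type
            ce = energy(cand, sites, points)
            # Metropolis rule: always accept improvements, accept worse moves with prob exp(-delta / t)
            if ce <= e or rng.random() < math.exp((e - ce) / t):
                plan, e = cand, ce
                if e < best_e:
                    best, best_e = plan[:], e
        t *= alpha                           # geometric cooling with decay coefficient alpha
    return best, best_e
```

Each move re-evaluates coverage from scratch, which costs O(sites × points); at the paper's scale (5,815 candidate sites and 54,817 points), incrementally updating per-point coverage counts would likely be needed for acceptable run times.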
As the initial condition, macro BSs are placed at all candidate sites. The construction cost converges to 2,149 after about 482 iterations. The final solution establishes 102 macro BSs and 1,129 micro BSs (102 × 10 + 1,129 × 1 = 2,149 value units), covering 90.03% of the total service volume and thus meeting the target of 90% service coverage. Fig. 8 shows the cost at each iteration and the siting options for some BSs.
The final visualization of the BS is depicted in Fig. 9.

IV. MODEL COMPARISON
The optimal BS construction plan and its cost are obtained through the optimization steps above. On this basis, we compared the three-step model with several methods applied to the same problem on the same dataset, which confirmed its superiority and practicality. The results are shown in Table 4. In the optimization scheme that uses SAA alone, without the constraint of pre-clustering, the results deviate more seriously and the coverage areas of the BSs overlap substantially. Although the resulting coverage is very large, the accompanying cost increase is unacceptable in practice.
BS site planning that combines the DBSCAN algorithm with the exhaustive method reverses the solution process of the model proposed in this paper: the type of BS that can be built at each point is determined first, and the locations of the new BSs are then determined by DBSCAN clustering. Comparing the two planning schemes shows that using DBSCAN in this way leads to a high total cost. Although the BS coverage threshold problem is well solved, relying only on traversal methods such as exhaustive enumeration makes the results suboptimal and highly random. In contrast, combinatorial and intelligent optimization algorithms can solve this problem well.
In the third comparison method, the weak coverage points to be planned are first clustered with the K-means clustering method to determine the approximate locations of the BS construction points. PSO is then used to optimize the specific station sites, considering both cost and benefit. The total cost of this method is lower than that of the two methods above, and it combines clustering and optimization to solve the problem. However, it does not fully consider the BS distance threshold or the waste of resources caused by repeated planning, whereas the three-step model with two layers of clustering handles such problems well.

In this paper, we present a new three-step model with practical implications for planning optimal BS construction solutions.

VOLUME 11, 2023

Algorithm 2 K-means
Input: set of coordinates after the initial clustering D = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}; number of proposed sites k.
Process:
  Randomly select k clustered point coordinates from D as the initial site coordinate vectors µ_1, µ_2, ..., µ_k
  repeat
    Set C_i = ∅ (1 ≤ i ≤ k)
    for j = 1, 2, ..., m do
      Calculate the distance d_ji between (x_j, y_j) and the i-th mean vector µ_i (1 ≤ i ≤ k)
      Determine the cluster label of (x_j, y_j) from the closest mean vector: λ_j = argmin_{i ∈ {1, 2, ..., k}} d_ji
      Assign (x_j, y_j) to the corresponding cluster: C_{λ_j} = C_{λ_j} ∪ {(x_j, y_j)}

The model consists of initialization, correction, refinement, and optimization. The design parameters for BS placement are the BS coordinates (x, y), the collaboration of multiple types of BS, and the density of BS construction. The objective function minimizes the total cost, subject to the BSs covering at least 90% of the total workload. Real data from a 2500 × 2500 area are used for the simulation test, and the results are reasonable, confirming the effectiveness of this method for large data volumes and multiple BS types. This study also leads to the following conclusions:
1. Combining the DBSCAN and K-means algorithms yields more accurate results. The DBSCAN clustering algorithm identifies the more concentrated points in the candidate area and separates the more scattered ones; however, it may merge adjacent clusters. In contrast, the K-means clustering algorithm can split connected clusters more finely, but it is more affected by noise points.
2. The SAA finds a cost-minimizing BS siting and construction solution relatively quickly while satisfying the constraint of covering at least 90% of the service volume: 1,129 micro BSs and 102 macro BSs are established.

In the current scenario of constructing BSs, we can import the coordinates of the areas in which business demand has not yet been met, together with the types of BSs to be established (BSs with different coverage radii). This enables site planning for the new BSs using the method outlined in this paper. The present model requires fewer variables, which leads to higher computational efficiency and the ability to handle large-scale data, ultimately reducing operational expenses. Moreover, it allows a rough determination of the types and coordinates of the newly constructed BSs.
Introducing clustering algorithms into station site optimization reduces spatial and temporal complexity, and the SAA, as a heuristic method, allows the model to be solved effectively. However, our experiments are based on Matlab simulations and no field experiments were conducted, so we neglected the height of actual geographical locations and frequency interference between signals in real-world scenarios. Also, because clustering algorithms are used multiple times, the selection of initial values may affect the final alternative BS coordinates [39]. In addition, the stability of clustering as an unsupervised pattern recognition method [40] is also worth considering [41]. Therefore, to meet this experiment's needs, we overconstrained the model. In the future, we will gradually relax certain constraints to evaluate the model, and we will also try to improve it by adding loss terms for factors such as altitude and signal interference.

APPENDIX A ALGORITHM FLOW CHART
The flowchart above illustrates the entire algorithm, which is used primarily for base station siting planning. The algorithm consists of three main components, detailed below:
• The DBSCAN algorithm used in the first clustering step. Application object: coordinate points of BSs that reach a specific service volume. Parameter settings: ε (the radius parameter) is 10, the minimum coverage radius of a BS, and MinPts (the neighborhood density threshold) is 1 so that all points are considered.
• The K-means algorithm used in the second clustering step. Application object: the clusters from the first step whose core and border points extend beyond the BS threshold distance. Parameter setting: K (the number of clusters) is one-tenth of the maximum of the horizontal and vertical extents of the selected cluster, i.e., $K = \max(x_{max}, y_{max})/10$, where the 10 is determined by the BS threshold (stability) distance.
• The SAA used in the third (optimization) step. Application object: a planning model in which the decision variable is which BS type to build at each station coordinate, the objective function is the minimum total construction cost, and the constraint is that the service volume in the coverage area exceeds 90% of the total service volume. Parameter settings: the specific parameter settings of the simulated annealing algorithm are shown below; these values were obtained by comparing multiple simulations.

APPENDIX C
The data in this article represent the existing network coverage in a specific city area. For computational convenience, the data have been rasterized by dividing the area into a 2500 × 2500 coordinate grid.
The dataset contains information about 182,808 weak coverage points within this 2500 × 2500 coordinate grid. The information includes each weak coverage point's X and Y coordinates and the corresponding traffic volume. In this area, the minimum traffic volume among the weak coverage points is 0.000192, the maximum traffic volume is 47,795.01, and the average traffic volume of the weak coverage points is 38.59.

APPENDIX D
The abbreviations of terms used in the article are as follows:

APPENDIX E
The explanations of the parameters in this article are as follows:

APPENDIX F
In the experiments, we used a computer with an Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz (12 logical CPUs), an NVIDIA GeForce GTX 1650 graphics card, and 16 GB of RAM. With the data used in this paper, the simulated annealing in the third step takes about 250 seconds. In later stages of the run, once the solution has largely stabilized, the maximum number of iterations within a single epoch can be reduced to speed up each epoch and make it feasible to process large amounts of data.

He is currently pursuing the B.E. degree in artificial intelligence with Tiangong University, Tianjin, China. His main research interests include data clustering and regression calculations, modeling estimation, and data prediction.