Machine Learning-Based Online Coverage Estimator (MLOE): Advancing Mobile Network Planning and Optimization

Dependence on high-performance mobile connectivity is no longer limited to human users; it extends to the intelligent objects increasingly deployed for Internet of Things (IoT) applications. However, the limitations of current network planning techniques have constrained the real potential of mobile connectivity, hindering sustainable Internet-oriented economic and technological development. The 3rd Generation Partnership Project (3GPP), through its specification Release 18 (Rel. 18), has included and leveraged the potential capabilities of machine learning (ML) technologies in advanced mobile network planning. The main objective is to enhance mobile network planning performance and reduce complexity. To this end, we propose a novel ML-based Online coverage Estimator (MLOE) tool built on the Random Forest (RF) algorithm. It uses seven unique features to predict mobile network performance in terms of reference signal received power (RSRP). The results show that MLOE outperforms traditional empirical techniques and previous works. The final trained RF model achieved a root mean square error (RMSE) of 2.65 dB and a coefficient of determination ($R^{2}$) of 0.93. To keep pace with dynamic and fast-growing mobile technology, MLOE has been deployed on an online platform using MATLAB® Web App Server, which offers a modular and scalable architecture.


I. INTRODUCTION
Despite recent technological advances, unsatisfactory mobile network services remain a challenge, as highlighted by the Malaysian Communications and Multimedia Commission (MCMC) [1]. One contributing factor is the limitation of current network planning techniques. According to [2], [3], [4], and [5], the traditional empirical techniques primarily applied in the industry are inaccurate. Meanwhile, the deterministic techniques are not practical at real-world operational scales due to their complexity and their requirements for high-resolution topographic maps, extensive reference information, and high computing power.
The associate editor coordinating the review of this manuscript and approving it for publication was Huiyan Zhang.
To address these issues, 3GPP, through its Rel. 18, has included and leveraged the potential capabilities of ML techniques to enhance mobile network planning performance and reduce complexity [6]. In this regard, four main objectives were outlined. The first is to identify a common ML framework, including the functional requirements of the ML architecture. The second is to identify areas where ML could improve the performance of mobile network planning functions. The third is to identify what is required for adequate ML model characterization and description, establishing proper notation. The fourth is to evaluate ML-based techniques to understand the attainable gains and complexity requirements.
In principle, ML techniques are less complex than traditional empirical methods in producing a propagation model. In the traditional approach, the researcher has to find the appropriate rules and algorithms to produce the required forecast quality. Because of this, most propagation models based on traditional empirical techniques are inflexible [5]. In the ML approach, on the other hand, this process is done automatically by the computer system based on the characteristics of the input parameters and response variables that have been set. Fig. 1 compares the working principles of ML techniques and traditional empirical techniques in producing a prediction model. Accordingly, this study reviews and identifies the most practical ML algorithm for developing MLOE for RSRP-based prediction. RSRP is one of the key performance indicators (KPIs) of 4G and 5G networks that telecommunications operators use to understand and evaluate the performance and coverage of their networks [7], [8]. Although the RSRP specifications in 4G and 5G are not identical, RSRP serves the same function in both technologies: supporting cell selection/reselection and handover [9].
This study also examines several features to evaluate their suitability for RSRP prediction. Finally, MLOE is deployed on MATLAB Web App Server and is publicly accessible via the Internet. Fig. 2 describes the overall concept of this study.
As Malaysia's 5G mobile network is still under development and testing, this study focused on utilizing the existing 4G network. This is also in line with the direction of Malaysia's government through the Jalinan Digital Negara (JENDELA) plan. The plan has enforced the 3G network services termination by late 2021 to empower 4G networks for bridging the digital divide, especially between rural and urban areas [10].
As such, the contributions of this study can be summarized as follows:
• The performance of the final trained RF model is evaluated against traditional empirical techniques and previous works done by [5], [12], [13], [14], and [15].
• A comprehensive MLOE development methodology for mobile network planning and optimization is described, and the final proposed tool can be accessed online freely at [16].
The rest of the paper is organized as follows: Section II discusses the recent progress in ML-based propagation modelling. Section III presents the methodology of the study. The results of the study are discussed in Section IV. Finally, Section V concludes the article.

II. RECENT PROGRESS IN ML-BASED PROPAGATION MODELLING
Recently, ML techniques have been actively explored by researchers from various fields, including mobile telecommunications [17], [18], [19], [20]. This is because ML techniques, which focus on data and algorithms, have proven their capabilities in many application fields, such as medicine, automotive, economics, and banking [21], [22], [23], [24]. In addition, ML-based models can improve their accuracy over time without being explicitly programmed [25]. This suits the development trend of mobile network technology, which is dynamic and rapidly evolving [26], [27], [28], [29].
Predicting mobile network coverage from specific features is a regression-type supervised machine learning task [12], [30]. For ground-to-ground (G2G) mobile networks in particular, it has been explored by [5], [12], [13], [14], and [15] using real-world datasets. In these studies, the prediction accuracy of most ML models was around 4 dB to 8 dB, depending on the type of environment studied. While each study reported different findings, the Random Forest (RF) algorithm consistently showed the best prediction performance. RF was also found to be more efficient in training duration and prediction speed than other ML models such as Artificial Neural Network (ANN), Support Vector Machine (SVM), and Gaussian process regression (GPR) [12], [15]. RF is also well known for its robustness, in particular its low susceptibility to sample and feature disturbances and to noise [12], [13], [15]. Hence, RF has been selected as the basic framework in this study, in line with the objective of developing a practical and optimal RSRP prediction model for use in real-world operations.
In [15], we have listed the features utilized in the previous works. According to domain knowledge, among identified features that highly affect the performance of mobile network signal propagation are the antenna tilt angle, azimuth angle, position and location [31]. Therefore, in Table 1, we have assessed and summarized the pros and cons of the features utilized in previous works, which finally led to determining the features to be utilized in this study.
Previous works never tested the trained RF model outside the study area. This may be because the ML technique, which falls under the empirical category, tends to be inaccurate when applied outside the study area. Therefore, this study will test the performance of the final trained RF model inside and outside the study area and compare it with the prediction results from the traditional empirical propagation models.
Desktop wireless planning tools are often inflexible because they were developed on a conventional framework. However, mobile network planning through an online platform, as pioneered by CloudRF [32], is a game changer in telecommunications. In addition to being accessible anywhere, its flexible and scalable architecture makes upgrading the system much simpler. At the same time, sharing the latest updates or information with end users becomes more effective.
Meanwhile, the burden of data processing and analysis is entirely on the server side, which gives the user a huge advantage. End users do not need to worry about processing power and storage capacity requirements. They only need to ensure the Internet connectivity is stable. This kind of development framework is more user-oriented. Therefore, the service package offered is usually suited to most end users' preferences and financial capabilities. The identical framework has been applied by Internet giant companies such as Amazon, Microsoft, and Google in their cloud computing services [33], [34], [35].
However, a survey of ML research articles found that less than 10% of studies discuss the deployment of ML models, and far fewer successfully deploy ML models into real-world applications [36]. This is because deploying ML models in a real operational environment is critical and challenging [37]. Only ML models successfully deployed at the production level can increase operational efficiency or develop new value propositions [38].

III. METHODOLOGY
To develop an accurate RSRP prediction model, the features must precisely describe the receiver ($R_x$) location relative to the location of the transmitter ($T_x$) antenna. The features must also describe the signal propagation status and the characteristics of the operational environment, so that the level of signal attenuation experienced before arriving at the $R_x$ location can be predicted more precisely. In this study, the MLOE development activities were carried out based on the workflow described in Fig. 3. Each step in the workflow is briefly described in the following subsections.
3098 VOLUME 11, 2023

A. DATA ACQUISITION
Two types of raw data were acquired: (i) the 4G mobile network base station (BS) technical specifications; and (ii) the real-world 4G RSRP data. The real-world 4G RSRP data was obtained through measurement campaigns conducted in 14 areas around the Klang Valley, Malaysia, representing dense urban, urban, suburban, and open area environments, using the hardware and software described in Table 2. The measurement campaign was conducted at vehicle speeds below 40 km/h to minimize the fast-fading effect due to Doppler shifts [4], while complying with the speed limits in the respective areas.
The measurement campaign served two purposes: (i) to generate the RF model training dataset and (ii) to generate a test dataset for the final trained RF model. For model training, only the data collected in Putrajaya are used (refer to Fig. 4); this dataset covers 10 transceiver BS antennas. Putrajaya is a unique region with multiple environmental categories [39], [40]. For testing and evaluating the final trained RF model, 12 transceiver BSs located inside and outside Putrajaya were identified (refer to Table 3). Detailed information on the test locations and the distribution of test points is available in the open-source GitHub repository at [41]. The test locations outside Putrajaya, which have different types of land use and environmental characteristics, were selected deliberately to analyse the generalization capability of the trained RF model. The model's prediction performance is then compared with that of the traditional empirical propagation models. These tests also identify whether the final trained RF model is overfitted, in which case it would produce very poor predictions in test areas outside the study area. The assumption is that if the difference in prediction performance between the inside and outside of the training area is minimal, the trained model has the potential to provide reliable predictions in Malaysia's harsh tropical, irregular-terrain environments and in similar areas.

B. FEATURES GENERATION AND DATASET PREPARATION
To create the dataset for model training purposes, the raw data mentioned earlier was utilized to generate seven unique features as follows:

1) DISTANCE
The distance (D) is the separation distance in meters between BS and user equipment (UE) at the Euclidean plane. It is used to estimate the UE position referenced to the position of the BS antenna on the x-axis. The calculation of D is based on the spherical law of cosines [43], [44].
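As an illustration of the spherical law of cosines mentioned above, the following is a minimal Python sketch rather than the implementation used in this study. The function name, the use of decimal-degree coordinates, and the mean Earth radius value are assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def distance_m(lat_bs, lon_bs, lat_ue, lon_ue):
    """Great-circle distance in metres via the spherical law of cosines.

    Inputs are latitude/longitude in decimal degrees (assumed convention).
    """
    phi1, phi2 = math.radians(lat_bs), math.radians(lat_ue)
    dlon = math.radians(lon_ue - lon_bs)
    cos_c = (math.sin(phi1) * math.sin(phi2)
             + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_c = max(-1.0, min(1.0, cos_c))
    return EARTH_RADIUS_M * math.acos(cos_c)
```

For very short separations the arccosine becomes numerically ill-conditioned, so a production tool might prefer the haversine form; that choice is outside what the text specifies.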

2) AZIMUTH OFFSET ANGLE
The azimuth offset angle ($AZ_{offset}$) is the absolute angle formed in the horizontal plane between the BS antenna pointing direction and the location of the UE. It is used to estimate the UE position relative to the BS antenna on the y-axis. $AZ_{offset}$ is also used to estimate the UE position within the coverage area of the BS antenna main lobe in the horizontal plane. The received signal strength is stronger when the UE is near the BS antenna boresight [45], [46]. The calculation of $AZ_{offset}$ is given in (1),
where $AZ_{ant}$ is the BS antenna azimuth angle and $AZ_{ue}$ is the UE azimuth angle calculated using the arctan2 function [47].
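The azimuth offset described above can be sketched in Python as follows. This is a hedged illustration, not a reproduction of (1): the bearing formula is the standard initial-bearing expression based on `atan2`, and folding the difference into the range 0 to 180 degrees is an assumption about how wrap-around at north is handled. Both function names are hypothetical.

```python
import math


def bearing_deg(lat_bs, lon_bs, lat_ue, lon_ue):
    """Initial bearing from BS to UE, degrees clockwise from north, in [0, 360)."""
    phi1, phi2 = math.radians(lat_bs), math.radians(lat_ue)
    dlon = math.radians(lon_ue - lon_bs)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def azimuth_offset_deg(az_ant, az_ue):
    """Absolute angle between antenna azimuth and UE bearing, folded to [0, 180].

    The folding step is an assumption; it keeps 350 and 10 degrees 20 apart.
    """
    diff = abs(az_ant - az_ue) % 360.0
    return min(diff, 360.0 - diff)
```

For example, an antenna pointing at 350 degrees and a UE at a bearing of 10 degrees give an offset of 20 degrees rather than 340 degrees.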

3) ELEVATION ANGLE
The elevation angle ($ELEV$) is the angle formed between the horizontal plane and the line of observation from the UE to the BS antenna. $ELEV$ is used to estimate the UE position relative to the BS antenna on the z-axis. $ELEV$ also describes the characteristics of the UE's position on the undulating Earth's surface: even at the same separation distance between the BS and the UE, the $ELEV$ value differs depending on whether the UE is in the highlands or in the valleys. The calculation of $ELEV$ is given in (2),
where $h_{ant}$ is the antenna height and $h_{ue}$ is the UE height. Both heights are referenced to above sea level (ASL), and $D$ is the separation distance between $T_x$ and $R_x$ as defined for the Distance feature above. The web-based radio planning tool CloudRF [32], which provides terrain and clutter information at 10-meter resolution, was used to generate the values of $h_{ant}$ and $h_{ue}$.
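A minimal Python sketch of the elevation angle follows. It assumes that the angle is the arctangent of the height difference over the separation distance, which matches the verbal definition above but is not copied from (2). The function name is hypothetical.

```python
import math


def elevation_deg(h_ant_asl, h_ue_asl, d):
    """Elevation angle in degrees from the UE up to the BS antenna.

    h_ant_asl, h_ue_asl: heights above sea level in metres.
    d: horizontal separation distance in metres.
    Assumed form: atan((h_ant - h_ue) / d); negative when the UE sits higher.
    """
    return math.degrees(math.atan2(h_ant_asl - h_ue_asl, d))
```

With an antenna at 130 m ASL, a UE at 30 m ASL, and a separation of 100 m, the sketch returns 45 degrees; moving the UE into a valley at the same distance increases the angle, as described in the text.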

4) TILT OFFSET ANGLE
The tilt offset angle ($TILT_{offset}$) is the angle formed in the vertical plane between the boresight of the BS antenna and the line of observation from the UE to the BS antenna. $TILT_{offset}$ is used to estimate the UE position within the main lobe of the BS antenna in the vertical plane. Because the horizontal planes through the BS antenna and the UE are parallel (alternate angles), the angle below the horizon at which the BS antenna observes the UE is equal to $ELEV$. Meanwhile, $TILT_{ant}$ is the sum of the mechanical and electrical tilt angles of the BS antenna. The calculation of $TILT_{offset}$ is given in (3).
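The relationship described above can be sketched as follows. The sketch assumes that the offset is the absolute difference between the total antenna tilt and the elevation angle; whether (3) uses an absolute or a signed difference is not recoverable from the text, so this is an assumption. The function name is hypothetical.

```python
def tilt_offset_deg(mech_tilt, elec_tilt, elev):
    """Vertical-plane offset between antenna boresight and the UE direction.

    mech_tilt, elec_tilt: mechanical and electrical downtilt in degrees.
    elev: elevation angle ELEV in degrees.
    Assumed form: |TILT_ant - ELEV| with TILT_ant = mech_tilt + elec_tilt.
    """
    tilt_ant = mech_tilt + elec_tilt
    return abs(tilt_ant - elev)
```

A UE seen at exactly the total downtilt angle lies on the vertical boresight and yields an offset of zero.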

5) ENVIRONMENT CATEGORY (CLASS)
For the same BS specification, the extent of mobile network coverage varies with the operating environment [48]. The transmitted signal is reflected more frequently in urban areas than in suburban and open areas, so it attenuates much faster in urban areas. The dataset in this study was therefore classified into three categories: urban, suburban, and open areas. Even though the RF algorithm is insensitive to the dataset's variance [49], for computational simplicity, the categories were coded as 1, 2, and 3 to represent urban, suburban, and open areas, respectively.

6) FREQUENCY BANDS (FQ)
The frequency band, also called radio frequency, is the air-interface medium that carries information from the transmitter (BS antenna) to the receiver (UE). Different frequency bands have different characteristics and capabilities. Low-frequency bands give larger coverage but lower capacity. In contrast, high-frequency bands can carry high-capacity applications such as video calls and online movies, at the cost of limited coverage. In this study, two 4G mobile network frequency bands are studied: 1800 MHz and 2600 MHz.

7) SIGNAL PROPAGATION STATUS (OBS)
The propagation of mobile signals is greatly affected by surrounding objects, because radio waves are scattered, reflected, refracted, or absorbed when they interact with any object in their path [50]. This affects the signal power level at the UE location. Therefore, a feature describing the signal propagation status was included in this study. Due to limited information, the signal propagation status was categorised roughly into two classes: line-of-sight (LOS) and non-line-of-sight (NLOS). This feature was extracted using the CloudRF radio planning tool, which also embeds high-resolution 3D building maps from OpenStreetMap [51]. The signal propagation status was coded as 0 and 1 to represent LOS and NLOS, respectively. The abovementioned features are illustrated in Fig. 5 and summarized in Table 4 for a clear explanation and comparison.
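The coding of the categorical features described above (environment class, frequency band, and propagation status) can be sketched as follows. The numeric codes come from the text; the function name, the string labels, and the ordering of the seven features in the output list are assumptions made for the example.

```python
ENV_CLASS = {"urban": 1, "suburban": 2, "open": 3}   # codes from the text
OBS_CODE = {"LOS": 0, "NLOS": 1}                     # codes from the text
BANDS_MHZ = (1800, 2600)                             # bands studied


def encode_features(d_m, az_offset, elev, tilt_offset, env, fq_mhz, obs):
    """Build one seven-feature row; the column order is an assumed convention."""
    if fq_mhz not in BANDS_MHZ:
        raise ValueError(f"unsupported band: {fq_mhz} MHz")
    return [d_m, az_offset, elev, tilt_offset,
            ENV_CLASS[env], fq_mhz, OBS_CODE[obs]]
```

A suburban NLOS measurement at 1800 MHz, for instance, is coded with class 2 and status 1.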
Before model training and refinement, the training dataset must undergo a data-cleaning process to eliminate outliers. This cleaning was carried out using an interquartile range (IQR) score [52]. A total of 16,310 training data points were processed, of which 1,846 were detected as outliers and eliminated. The same IQR score was then used to clean the test dataset, to ensure that the final trained RF model is tested within the studied data range. Details of the IQR scores used in this study are publicly available in an open-source GitHub repository [53].
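The cleaning step above can be illustrated with a short Python sketch. It assumes the common 1.5 × IQR fence rule and the inclusive quartile method; the exact multiplier, the quartile definition, and which columns were screened are not stated in the text, so these are assumptions. The key point it shows is that the fences are fitted on the training data and then reused unchanged on the test data.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Lower and upper fences Q1 - k*IQR and Q3 + k*IQR (k = 1.5 assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def fit_fences(train_rows, k=1.5):
    """Compute one (low, high) fence pair per column from training rows only."""
    return [iqr_fences(col, k) for col in zip(*train_rows)]


def apply_fences(rows, fences):
    """Keep rows whose every column lies inside the fitted fences."""
    return [row for row in rows
            if all(lo <= v <= hi for v, (lo, hi) in zip(row, fences))]
```

Reusing the training fences on the test set keeps the evaluation within the value range the model has seen.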

C. MODEL TRAINING AND REFINEMENT
MATLAB 2020a Regression Learner was used to train the RF model. Training was performed using 10-fold cross-validation (CV), a resampling method for training and validation, mainly to prevent overfitting [5], [14], [15]. Two evaluation metrics, the root-mean-square error (RMSE) and the coefficient of determination ($R^{2}$), were used to evaluate the performance of the trained model. They are given in (4) and (5), respectively:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n_{sample}} \sum_{i=1}^{n_{sample}} \left(y_i - \hat{y}_i\right)^{2}} \tag{4}$$
$$R^{2} = 1 - \frac{\sum_{i=1}^{n_{sample}} \left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n_{sample}} \left(y_i - \bar{y}\right)^{2}} \tag{5}$$
where $n_{sample}$ is the total number of samples, $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the actual values. According to [12], an RMSE below 7 dB is considered acceptable for urban environments, while 10 to 15 dB is acceptable for suburban and rural areas. During model training, feature importance analysis was performed using the feature selection function available in MATLAB Regression Learner to identify and remove non-influential features. The same kind of analysis was used in [5], [12], and [15].
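The two evaluation metrics and the 10-fold split can be sketched in plain Python as follows. This is an illustrative sketch, not the MATLAB Regression Learner workflow used in the study: the function names are hypothetical, and the fold generator uses contiguous, unshuffled folds as a simplifying assumption.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted RSRP."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size
```

In each fold the model would be fitted on the training indices and scored on the validation indices, and the per-fold scores are then averaged.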
Hyperparameter tuning is the final process in the model training session, in which the model is tuned in more detail to obtain the best performance. For this purpose, the RF model was tuned using the Optimizable Ensemble function, also available in MATLAB Regression Learner, and the results are shown in Fig. 6. After tuning, the RMSE and $R^{2}$ are 2.65 dB and 0.93, respectively. Compared with the untuned model trained on all features (3.23 dB and 0.90, see Section IV-A), the RMSE improved by 0.58 dB and $R^{2}$ increased by 0.03. The final trained RF model uses 499 learners, a minimum leaf size of two, and five predictors to sample. The final trained RF model was then exported to the MATLAB workspace for interpretation and evaluation. In summary, the ML model training and refinement activities followed the flow chart shown in Fig. 7.

D. MODEL INTERPRETATION AND EVALUATION
ML models such as RF are often referred to as black-box models because it is difficult to interpret how they make predictions [54], [55]. Interpretability tools help to overcome this issue by revealing how the features contribute to the predictions. They can also validate whether the model's predictions follow principles consistent with domain knowledge, and they can expose model biases that are not immediately apparent.
In this study, we used the partial dependence plot (PDP), a model-agnostic method that shows the marginal effect of features on the predicted outcome [54], [56]. PDP is a global interpretation tool that explains how a trained model makes predictions over the entire dataset. The partial dependence function for regression is defined as (6) [54]:
$$\hat{f}_{s}(x_s) = E_{x_c}\left[\hat{f}(x_s, x_c)\right] = \int \hat{f}(x_s, x_c)\, d\mathbb{P}(x_c) \tag{6}$$
where $x_s$ are the features for which the partial dependence function should be plotted and $x_c$ are the other features used in the ML model $\hat{f}$. From the PDP graph, we extract the X and Y values and calculate the standard deviation (SD) as defined in (7) [57], in which $\mu$ is the mean of the values $A$. The most important features show higher SD values.
Two approaches have been taken to assess and evaluate the performance of the final trained RF model. In the first approach, the prediction performance of the final trained RF model was compared with that of the RF models developed in previous works, as listed in Table 1.
Meanwhile, in the second approach, the prediction performance of the final trained RF model was compared to the prediction performance of several traditional empirical propagation models that are available in CloudRF, i.e., COST231, SUI, ECC-33 and ITM. The assessment and evaluation were carried out using the test datasets prepared earlier. Based on the literature review, COST231, SUI, ECC-33, and ITM have been widely used in wireless network planning, as listed in Table 5.
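The PDP-based importance score used in this subsection can be sketched in Python as follows. The sketch estimates the partial dependence by averaging model predictions over the dataset with one feature forced to each grid value, then takes the standard deviation of the resulting curve. It is an illustration only: the function names are hypothetical, `predict` stands for any trained model, and the sample (N−1) form of the SD is an assumption.

```python
import statistics


def partial_dependence(predict, rows, feature_idx, grid):
    """Average prediction over all rows with one feature fixed to each grid value."""
    curve = []
    for g in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_idx] = g   # overwrite the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def pdp_spread(curve):
    """Standard deviation of the PDP curve; a larger value means more influence."""
    return statistics.stdev(curve)      # sample SD (N-1), assumed form


def toy_model(row):
    """Hypothetical stand-in for a trained model: RSRP falls with both inputs."""
    return -80.0 - 0.02 * row[0] - 0.1 * row[1]
```

With the toy model, the first feature produces the wider curve and therefore the larger SD, which is the ranking logic applied to the real features in Section IV.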

E. WEB-BASED ML MODEL DEVELOPMENT AND DEPLOYMENT
The development and deployment of MLOE were conducted in the three stages described in Fig. 8. In the first stage, the Graphical User Interface (GUI) was designed with three main sections: (i) the features calculation and generation function; (ii) the ML-based RSRP prediction model; and (iii) the features specification description and other relevant references. Using a MATLAB script file, the GUI was designed, programmed, and linked to the final trained RF model.
In the second stage, the GUI functions, data, and settings that define the final web application were compiled into a deployment format. In the third stage, the compiled project file was deployed in an application server with MATLAB Web App Server program and the appropriate MATLAB Runtime version. MATLAB Runtime is a collection of shared libraries and code that enables the packaged MATLAB applications to be utilized on a device without the MATLAB program installed. Therefore, the end users can access MLOE through a web browser using HTTP or HTTPS protocols. The source code for the development of MLOE is available publicly through an open-source GitHub repository at [70].

IV. RESULTS AND DISCUSSIONS

A. FEATURES IMPORTANCE
With all features included, the trained RF model achieved an RMSE of 3.23 dB and an $R^{2}$ of 0.90. The feature selection analysis results are shown in Table 6. Based on these results, all features used in this study influence the RSRP prediction. However, the level of influence varies from feature to feature, and the PDP analysis reveals it in detail.
In Fig. 9, the PDP reveals that the feature Distance is negatively correlated with the response variable. This trend agrees with domain knowledge: the larger the separation distance between BS and UE, the weaker the received signal strength at the UE location. The response variable varies considerably for distance values from 0 to 1200 meters, giving an SD of 4.34 dB. Between 400 and 600 meters, the RSRP values decrease drastically. However, the RSRP remains unchanged at distances of 900 meters and above.
The response variable is also negatively correlated with $AZ_{offset}$, as shown in Fig. 10. This result again agrees with domain knowledge: the received signal strength at the UE location weakens as the UE moves away from the boresight of the BS antenna. The variation in RSRP is substantial, especially when $AZ_{offset}$ is in the ranges of 10 to 20 degrees and 50 to 70 degrees. In the range of 20 to 50 degrees, the change in RSRP is within 2 dB, while above 70 degrees the rate of change is very low, close to zero. Overall, the SD for $AZ_{offset}$ is 5.56 dB, the highest among the features applied in this study. Therefore, $AZ_{offset}$ is considered the most important and influential feature in this study.
In contrast, the PDP analysis result for the $ELEV$ feature, shown in Fig. 11, has a positive slope: a larger $ELEV$ generally corresponds to a UE closer to the BS, where the received signal is stronger. This agrees with domain knowledge. A drastic change in the RSRP value can be seen in the range of 3 degrees to 7 degrees. In the range of 11 degrees to 25 degrees, the RSRP values continue to trend upward, within a range of 4 dB.
As for the $TILT_{offset}$ feature, the trend in Fig. 12 leans towards a negative correlation, although the slope is not as steep as for the other geometric features, i.e., $D$, $AZ_{offset}$, and $ELEV$. This result agrees with domain knowledge: the farther the UE is from the BS antenna boresight in the vertical plane, the weaker the received signal strength. The SD for $TILT_{offset}$ is 0.69 dB, which means its influence on the RSRP value is not very high. Between 0 and 13 degrees, the variation in RSRP is within 1 dB. However, from 10 degrees to 13 degrees, the RSRP values decrease drastically, within 2 dB. This may be due to the reduction in received signal strength when transitioning from the main lobe to the side lobe. After that, the RSRP values remain unchanged.
For the environment category feature (refer to Fig. 13), a large gap in the RSRP value separates class 3 (open area) from class 2 (suburban) and class 1 (urban). The SD for this feature is 5.03 dB, the second highest after $AZ_{offset}$. This shows that the type of environment is an essential and influential feature in RSRP prediction, in agreement with domain knowledge: signal attenuation in open areas is much lower than in suburban and urban areas. However, the difference between class 2 and class 1 is not significant. This is likely due to the less dense urban topology of Putrajaya. Therefore, more training data from different types of urban topology are required to understand its influence better.
For the operating frequency feature (refer to Fig. 14), the PDP shows a clear gap in the response variable between the 1800 MHz and 2600 MHz bands. The SD of 1.33 dB shows that the frequency band feature is important and influential in RSRP prediction. The PDP results show that signal propagation performance is better on the lower frequency band, in line with domain knowledge.
Based on the PDP analysis results in Fig. 15, the signal propagation status feature does not contribute significantly to RSRP prediction. The gap between the LOS and NLOS categories is around 0.5 dB, giving an SD of 0.26 dB. Therefore, this feature is the least influential in this study. This is perhaps due to the rough classification of the signal status, which does not consider the number of obstacles or the dielectric properties of each obstacle. Nevertheless, the feature still contributes to the final RSRP prediction by a small margin.
Overall, all features applied in this study are influential, in the priority order shown in Table 7.

B. ML-BASED PREDICTION MODEL
In Fig. 16, the prediction performance of the final trained RF model is compared with that of the RF models produced in previous works. The final trained RF model in this study performs better than the RF models developed in previous works. After hyperparameter tuning, the final trained RF model achieved an RMSE of 2.65 dB and an $R^{2}$ of 0.93. This indicates that the features applied in this study are well suited to, and influential in, mobile network path loss studies. Compared with the features applied in previous works, the combination of features in this study, especially $D$, $AZ_{offset}$, $ELEV$, and $TILT_{offset}$, maps the actual position of the UE in the main lobe of the BS antenna more precisely.
In Fig. 17 to Fig. 19, the prediction performance of the final trained RF model was compared to the prediction performance of traditional empirical propagation models, i.e., COST231, SUI, ECC-33 and ITM using the test datasets.
Overall, the final trained RF prediction model performed better than the traditional empirical propagation models in urban areas (refer to Fig. 17). As expected, the RMSE of the final trained RF model was below 7 dB in the study area and increased when tested outside Putrajaya. However, the differences were not significant in the test areas of Cyberjaya and Selangor. Meanwhile, the RMSE differed by a factor of almost three for the test areas in Bukit Bintang and SOGO, which fall under the dense urban category. This was expected because the landscape planning and urban design in both test areas differ from those of Putrajaya.
For the suburban test areas (refer to Fig. 18), the RMSE of the final trained RF prediction model in the study area remains below 7 dB. Surprisingly, the RMSE for all test locations outside Putrajaya did not exceed 10 dB, even though the built-up characteristics in some test areas were different. This may be because structure height, density, and material properties are less significant in suburban areas than in urban areas. The prediction performance of the final trained RF model in suburban areas is also more consistent than that of the traditional empirical propagation models, which tend to be inconsistent and less accurate, especially SUI and ITM.
Likewise, in open test areas (refer to Fig. 19), the performance of the final trained RF prediction model inside the study area remains below 7 dB and outside the study area remains below 10 dB. This indicates that the performance of the RF prediction model is also better for open area categories compared to the traditional empirical propagation models.
In conclusion, the final trained RF prediction model in this study predicted the RSRP value more accurately than the traditional empirical propagation models. Even outside the study area, especially in suburban and open areas, the final trained RF prediction model provides reliable predictions. Meanwhile, a more detailed training dataset is required for urban areas, especially features that describe more specifically the different types of urban design and signal propagation characteristics. With such features, the ML algorithm will be able to distinguish more precisely the type and properties of the materials that interact with the transmitted radio wave signal.

C. WEB-BASED APPLICATION
Fig. 20 shows the final GUI of MLOE, which can be accessed openly through the Internet [16]. In the Calculator Function Section, users enter the BS technical specifications in the Antenna/Tx Section and the UE specifications in the Mobile Equipment/Rx Section. From this information, the calculator function generates the feature values in the Features Value column. To perform the RSRP prediction using the generated feature values, the user presses the Value Transfer button to send them to the Prediction Model Section. However, the Prediction Model Section only accepts feature values within the allowed range, which is equal to the IQR score as described earlier. If a transferred value is outside the allowed range, the value transfer process will fail or be incomplete.
In general, the RSRP prediction process can be carried out in two ways: (i) predict the RSRP value by using the calculator function; or (ii) predict the RSRP value by directly submitting the values of the required features in the Prediction Model Section. The GUI interacts directly with the final trained RF algorithm deployed in MATLAB Web App Server to produce the RSRP prediction result.
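The range check applied before feature values reach the Prediction Model Section can be illustrated with a short Python sketch. It assumes the conventional fences of Q1 - 1.5 x IQR and Q3 + 1.5 x IQR; the exact rule used by MLOE is the one described earlier in this paper and may differ. The feature names, training values, and function names below are hypothetical.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences derived from the interquartile range.

    Assumption: the conventional k = 1.5 multiplier; MLOE's actual rule may differ.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def rejected_features(features, allowed_ranges):
    """Return the names of features whose values fall outside the allowed range."""
    rejected = []
    for name, value in features.items():
        low, high = allowed_ranges[name]
        if not (low <= value <= high):
            rejected.append(name)
    return rejected


# Hypothetical training values for one illustrative feature (not data from this study).
training_distance_m = [100, 200, 300, 400, 500]
allowed = {
    "distance_m": iqr_bounds(training_distance_m),
    "height_diff_m": (0.0, 60.0),  # hypothetical precomputed range
}

# Hypothetical values produced by the calculator function.
candidate = {"distance_m": 650.0, "height_diff_m": 80.0}
blocked = rejected_features(candidate, allowed)
if blocked:
    print("Value transfer refused for:", ", ".join(blocked))
else:
    print("All feature values accepted")
```

Restricting inputs to the range seen during training keeps the RF model from being queried where it would have to extrapolate, which tree-based models do poorly.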
In the Indicator and Description Section, there are three separate tabs. The first tab explains the feature characteristics and the related abbreviations. The second tab explains the RSRP signal strength categories. Finally, the third tab provides background information regarding the ML algorithm and a disclaimer for user reference. In addition, the author's e-mail address and a YouTube link to a tutorial video [71] are included for users' convenience.
Overall, MLOE can be utilised for mobile network planning both before and after BS deployment. Before BS deployment, the model can help identify the best location for the BS to achieve optimal coverage. After BS deployment, the model can be applied to antenna adjustment and fine-tuning, especially of the antenna azimuth and downtilt angle.

V. CONCLUSION
This article introduces and explains the concept and methodology of developing MLOE for predicting the coverage of 4G mobile networks through RSRP. In this study, MATLAB App Designer was used to design and program MLOE before it was compiled and deployed to MATLAB Web App Server. The seven unique features used in this study proved capable of precisely describing the UE's actual location with reference to the BS antenna. As a result, the prediction accuracy improved to an RMSE of 2.65 dB and an $R^{2}$ of 0.93. MLOE performed better than traditional empirical techniques and previous works. Future work will extend MLOE to 5G networks. Furthermore, remote sensing satellite image data will be exploited to produce more detailed and precise information on urban design and signal propagation characteristics. In addition, the benefits of MLOE can be further expanded by generating prediction results in a spatial format, which allows for seamless integration into existing mobile network planning and monitoring systems.