An Accurate Empirical Path Loss Model for Heterogeneous Fixed Wireless Networks Below 5.8 GHz Frequencies

Great progress has been made in providing convenient wireless communications with easy connectivity for users everywhere. Many empirical path loss (PL) models have been developed to assess the performance of new radio networks. This article first studies the state-of-the-art of empirical PL models, along with vegetation effects on radio signal propagation. Next, an accurate empirical PL model is proposed for fixed wireless networks under challenging rural propagation conditions. The proposed model is based on a Canadian dataset from a wireless internet service provider, using the Wireless-To-The-Home technology in the unlicensed 900 MHz, 2.4 and 5.8 GHz ISM bands and in the licensed 3.65 GHz band. The proposed model considers several parameters, such as line-of-sight obstructions, frequency bands and dynamic link distance splitting, in addition to seasonal variations in PL attenuation. It outperforms other models in terms of accuracy when tested on a dataset from a different Canadian region, and it provides excellent and steady accuracy when tested on a largely different open-access dataset for mobile communication technology from seven different regions in England.


I. INTRODUCTION
Wireless communication networks are a good alternative for providing connectivity in rural regions thanks to their robustness, ease of deployment and low cost. However, a serious challenge remains in predicting the radio signal attenuation caused by environmental parameters, such as ground relief or vegetation. Many wireless standards are used by WTTH (Wireless-To-The-Home) providers, such as IEEE 802.11 (particularly the wide-range Wi-Fi versions), which has attracted a lot of interest due to its practical advantages [1]. A successful deployment of wireless networks requires careful planning due to radio signal impairments caused by the surrounding environment, resulting in propagation path loss (PL) and limiting the quality of service (QoS). Accurate PL estimation is crucial for network planning and troubleshooting. This estimation can be made either ... This work examines the accuracy of these models for our studied environment, in order to select a convenient PL model or design a new one.
Despite the abundance of empirical PL models, only a few handle rural environments and frequencies below 6 GHz. In addition, empirical PL models have specific validity intervals (distance, frequency, environment type, and so forth), and their performance outside these intervals has not been fully investigated. Previous PL models have, in general, been tested on a single frequency band (or two, at most) or a single technology. Thus, wireless engineers need assistance and guidelines with regard to how to choose a convenient PL model. This work is a step towards dealing with these problems.
The main contributions of this article can be summarized as follows: 1) We survey the state-of-the-art PL empirical models and focus on the vegetation effect. 2) We study the accuracy of listed PL models by comparing their predictions based on our dataset in four frequency bands below 6 GHz for outdoor Wi-Fi WTTH in Canadian rural regions. 3) We establish the effect of long-term seasonal variations on radio signal PL attenuation. 4) We propose a new PL empirical model based on obstruction level, frequency band, dynamic distance splitting and long-term seasonal variation. 5) We test and validate the proposed model on datasets from different regions and countries.
The paper is organized as follows: Section II presents the state-of-the-art of empirical PL models. In Section III, quantitative performance metrics for PL model accuracy are detailed. In Section IV, the data collection and measurement are presented, and the accuracies of listed PL models are compared. The proposed PL model is detailed in Section V. In Section VI, seasonal PL variations are integrated into the proposed model. Section VII presents the model validation conducted via independent international measurements. Finally, conclusions and future work are provided in Section VIII.

II. STATE-OF-THE-ART
A. RELATED WORK
Many research papers have studied existing PL models. In [3], Sicker et al. described and implemented 30 propagation models proposed over the last 70 years for urban and rural areas. They found that the landscape of PL models is precarious. Furthermore, they affirmed that PL modelling achieves, at best, a root mean squared error (RMSE) of 8 to 9 dB in urban environments and around 15 dB in rural ones. The authors of [4] developed a new taxonomy by conducting a deep survey covering more than 60 years of continuous research. They stated that the next generation of PL models will be data-centric, attempting to extract information from directed measurements. Phillips et al. analyzed 28 PL models by using a large set of data from a wireless network in rural New Zealand [5]. They demonstrated that "the state-of-the-art, even for the 'simple' case of rural environments, is surprisingly ill-equipped to make accurate predictions." The best accuracy they achieved is a 12 dB RMSE.
Sarkar et al. [6] reviewed various propagation models for both indoor and outdoor environments and summarized their advantages and disadvantages. Sun et al. [7] presented mmWave propagation measurements and PL models for outdoors and indoors, then investigated their performances across 5G networks. Erunkulu et al. [8] surveyed existing techniques and mechanisms for network coverage prediction. They provided an up-to-date review of existing PL models, along with a comparative analysis, to aid in the planning and implementation of cellular networks. Wang et al. [9] compared the prediction performances of several channel models in terms of three important metrics: the cell radius, spectral efficiency, and outage probability in both indoor and outdoor scenarios for Internet of Things (IoT) communications. Anusha et al. [10] surveyed different propagation models to calculate path attenuation in rural, suburban and urban areas.
Most of the research has considered PL modelling for mobile technologies in urban areas [11]. Some works have focused on empirical PL models for fixed wireless access (FWA) in rural areas, but a good proportion of these studies were simulation-based only, with just a few including measurements. Moraitis et al. [12] examined the accuracy of PL models at 3.7 GHz for FWA Long-Term Evolution (LTE). They recommended the standard propagation model (SPM) as the best option for network dimensioning and planning. MacCartney and Rappaport [13] studied rural macro-cell (RMa) propagation models and the current 3rd Generation Partnership Project (3GPP) RMa path loss models for frequencies from 0.5 GHz to 30 GHz. They used measurements in rural Virginia to develop a new RMa PL model for 73 GHz that is more accurate than the existing 3GPP RMa models and can be used for frequencies from 0.5 GHz to 100 GHz. Chee et al. [14] studied the effects of carrier frequency, antenna height and season on an FWA network deployed in a rural area at 825 MHz and 3535 MHz. They found that the frequency dependence of a wireless channel varies strongly with the environment. El Chall et al. [15] investigated the LoRaWAN radio channel in the 868 MHz band. Extensive measurement campaigns were carried out to develop a PL model for indoor and outdoor environments in urban and rural locations of Lebanon. Maurya et al. [16] studied the PL in urban and rural areas by considering several parameters, such as frequency, location and best fit. De Guzman et al. [17] applied 3-ray PL analysis to evaluate the impact of variation in evaporation duct height and antenna height in rural coastal areas. Rakesh and Nalineswari [18] analyzed PL models at Global System for Mobile Communications (GSM) 940 MHz and IEEE 802.16's WIMAX 3.5 GHz frequencies in different terrains including rural areas.
Since Wi-Fi is classically used for indoor communications, most studies have considered its PL modeling for indoor applications. However, Wi-Fi-based long-distance networks have emerged as an alternative technology for providing Internet access in rural areas. Fendji et al. [19] compared slope-based empirical PL models to measurements at 2.4 GHz using the 802.11n standard. They proposed a new model based on the Liechty model because the number of obstacles was known in their experiment. The Liechty model provided a better prediction in a non-line of sight (NLOS) scenario, while the new model performed better in the combined scenario (line of sight (LOS) and NLOS). In our context, it is impractical to apply this model since the link distance exceeds twenty kilometers, and the number of obstacles is unknown. El-Keyi et al. [20] updated the log-distance PL model for indoor Wi-Fi to take wall penetration, reflection, scattering, and diffraction effects into account. Oni and Idachaba [21] reviewed PL models and their adaptation to Wi-Fi indoor propagation environments. Rademacher et al. [22] conducted outdoor experiments to measure the PL for Wi-Fi links at distances of up to 10.3 km. They found that the Longley-Rice model provides accurate results. Hong and Wu [23] estimated the signal loss of Wi-Fi at 2.4 GHz to improve search and rescue operations after disasters. Brinkhoff and Hornbuckle [24] studied the Wi-Fi signal range in agricultural crop environments. They showed that the coverage distance exceeded 1 km. In [1], we investigated the Wi-Fi coverage for outdoor IoT applications and compared the predictions of many PL models.

B. PATH LOSS EMPIRICAL MODELS
Effective wireless network deployment requires accurate PL modeling to predict the coverage range and minimize required infrastructure. Many PL models have been developed for specific environments [2]. In this section, the most well-known empirical PL models and their pertinent parameters are revisited.

1) 3GPP TR 38.901 MODEL
This model was mainly designed for mobile networks [25]; it supports bandwidths of up to 10% of the center frequency, but no larger than 2 GHz. It considers LOS and NLOS links, and supports urban, indoor and rural links; in this article, we focus only on its rural macro-cell configuration. It is described using the following formula [26]:

2) FREE SPACE AND LOG-DISTANCE MODELS
The free space model describes the radio signal power loss in LOS links and is written as [2]:

$$PL_{fs} = 32.45 + 20\log_{10} d + 20\log_{10} f$$

The log-distance model considers NLOS links and random shadowing effects. It is given by [27]:

$$PL_{ld} = PL_{d_0} + 10\,n\log_{10}\left(\frac{d}{d_0}\right) + \chi$$

where $n$ is the path loss exponent, $PL_{d_0}$ is the PL in dB at the reference distance $d_0$, and $\chi$ is the shadowing effect, which is zero-mean Gaussian distributed with a standard deviation between 5 and 16 dB [28].
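As an illustration, the two models can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the function names are hypothetical, the free-space constant 32.45 is assumed to go with distances in km and frequencies in MHz, and the log-distance form is assumed to be the standard $PL_{d_0} + 10\,n\log_{10}(d/d_0) + \chi$.

```python
import math

def free_space_pl(d_km, f_mhz):
    """Free-space path loss in dB (assumes d in km and f in MHz)."""
    return 32.45 + 20 * math.log10(d_km) + 20 * math.log10(f_mhz)

def log_distance_pl(d, d0, pl_d0, n, shadowing_db=0.0):
    """Log-distance path loss in dB.

    d and d0 share the same unit; pl_d0 is the loss at the reference
    distance d0; n is the path loss exponent; shadowing_db is one
    realization of the zero-mean Gaussian shadowing term (0 = median).
    """
    return pl_d0 + 10 * n * math.log10(d / d0) + shadowing_db

# Reference loss at 1 km and 2.4 GHz, then extrapolation to 10 km.
pl_ref = free_space_pl(1.0, 2400.0)
print(round(pl_ref, 2))
print(round(log_distance_pl(10.0, 1.0, pl_ref, n=3.0), 2))
```

With an exponent of n = 2 and no shadowing, the log-distance model reduces to the free-space model, which gives a quick sanity check for any implementation.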

3) OKUMURA MODEL
This model was built using data collected in the city of Tokyo, Japan [29]. It considers urban, suburban and open areas. The urban area model is the most widely used in cities without tall blocking structures. It is given by [29]: where PLO 50 This model is restricted to the 150-1920 MHz frequency range, receiver heights of 1-3 m, AP antenna heights of 30-100 m and link distances of 1-100 km.

4) OKUMURA-HATA MODEL (OH)
This model [30] is based on the graphical information obtained in the Okumura model but considers the effects of the diffraction, reflection and scattering caused by city structures. Additionally, it applies corrections for suburban and rural environments. The PL in urban areas is given by:

$$PL_{OH,U} = 69.55 + 26.16\log_{10} f - 13.82\log_{10} H_{AP} + \left(44.9 - 6.55\log_{10} H_{AP}\right)\log_{10} d - C_H \tag{11}$$

where $C_H$ is the antenna height correction factor:

$$C_H = \begin{cases} 0.8 - 1.56\log_{10} f + \left(1.1\log_{10} f - 0.7\right)H_r & \text{for small or medium cities} \\ 8.29\left(\log_{10}(1.54H_r)\right)^2 - 1.1 & \text{if } 150 \le f \le 300\ \text{MHz, for large cities} \\ 3.2\left(\log_{10}(11.75H_r)\right)^2 - 4.97 & \text{if } 300 < f \le 1500\ \text{MHz, for large cities} \end{cases} \tag{12}$$

The path loss in suburban areas is given by:

$$PL_{OH,SU} = PL_{OH,U} - 2\left(\log_{10}\frac{f}{28}\right)^2 - 5.4 \tag{13}$$

and the path loss in open areas is given by:

$$PL_{OH,O} = PL_{OH,U} - 4.78\left(\log_{10} f\right)^2 + 18.33\log_{10} f - 40.94 \tag{14}$$

This model is restricted to frequencies of 150-1500 MHz, receiver heights of 1-10 m, AP antenna heights of 30-200 m and link distances of 1-20 km.
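The urban formula of Eq. (11) can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, the frequency is assumed to be in MHz, the heights in m and the distance in km, and the correction factor is assumed to be the standard Hata small or medium city expression.

```python
import math

def hata_correction_small_city(f_mhz, h_r_m):
    """Receiver height correction C_H for small or medium cities (assumed standard Hata form)."""
    lf = math.log10(f_mhz)
    return 0.8 - 1.56 * lf + (1.1 * lf - 0.7) * h_r_m

def okumura_hata_urban(f_mhz, h_ap_m, h_r_m, d_km):
    """Urban Okumura-Hata path loss in dB, following the structure of Eq. (11)."""
    c_h = hata_correction_small_city(f_mhz, h_r_m)
    return (69.55
            + 26.16 * math.log10(f_mhz)
            - 13.82 * math.log10(h_ap_m)
            + (44.9 - 6.55 * math.log10(h_ap_m)) * math.log10(d_km)
            - c_h)

# Example inside the validity range: 900 MHz, 30 m AP, 1.5 m receiver.
for d in (1.0, 5.0, 20.0):
    print(d, round(okumura_hata_urban(900.0, 30.0, 1.5, d), 1))
```

The distance slope, 44.9 - 6.55 log10(H_AP), shrinks as the AP antenna rises, which is the same interaction between distance and AP height that the proposed model later captures with a dedicated term.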

5) EXTENDED COST-231 HATA MODEL
COST is a European Union forum for cooperative scientific research, which developed this model based on measurements in multiple European cities. It extends the urban Okumura-Hata model to cover frequencies of up to 2 GHz [31]. It is most often cited as the COST 231 model, but it is also referred to as the Hata Model PCS extension. It is given by [32]: where $A_1, \ldots, B_3$ are the Hata parameters described below: where $aH_r$ is the receiver antenna height correction factor, calculated as $0.8 - 1.56\log_{10} f + (1.1\log_{10} f - 0.7)H_r$ for suburban or rural environments 8.29(log 10 ... $C_r$ is a correction factor that is equal to 0 dB for suburban or rural environments and 3 dB for urban areas. At 900 MHz, according to this model, the PL is: For a frequency of 1800 MHz, the PL is:

6) ECC-33 MODEL
This model was developed by the Electronic Communication Committee (ECC) and extrapolated from the original Okumura measurements. It subdivides urban areas into 'large' and 'medium' categories. Since the characteristics of a highly built-up area such as Tokyo are quite different from those found in European suburban areas, the use of the 'medium city' model is recommended [33]. It can be stated as [34]: where $PL_{fs}$, $A_{bm}$, $G_{AP}$ and $G_r$ are the free space attenuation, the basic median PL and the AP and receiver antennas' height gain factors, respectively. They are expressed as: This model is restricted to frequencies ≤ 2000 MHz, receiver heights of 1-10 m, AP antenna heights of 30-200 m and link distances of 1-20 km.

7) STANFORD UNIVERSITY INTERIM (SUI) MODEL
This model is an extension of the Hata model for frequencies above 1900 MHz. It introduced the PL exponent $\gamma$ and the weak fading standard deviation $S$ as random variables obtained through a statistical procedure. It was originally proposed for FWA networks in the 3.5 GHz band [35]. It divides geographic areas into three types of terrain: A, B and C. Terrain A has the highest PL and corresponds to hilly areas with moderate to dense vegetation, so it can be considered for densely populated urban areas. Terrain B has an intermediate PL, and it is used for hilly terrains with sparse vegetation or flat terrains with moderate to high tree densities, making it suitable for suburban environments. Terrain C is used for flat or rural terrain with light vegetation. The model is given by [28]:

$$PL_{SUI} = A + 10\,\gamma\log_{10}\left(\frac{d}{d_0}\right) + X_f + X_h + S$$

where $d_0 = 100$ m, and $S$ is a log-normally distributed factor for the shadow fading of trees and other clutter. It is capped at 8.2 dB for rural environments, 9.6 dB for suburban environments, and 10.6 dB for urban environments. $X_f$ is a correction factor for frequencies larger than 2 GHz, $X_h$ is a correction factor for the receiving antenna height and $A$ is the intercept parameter. The PL exponent $\gamma$ is set to 2 in free space, 3-5 in an urban NLOS and greater than 5 inside buildings. Parameters $a$, $b$, and $c$ depend on the terrain and are given in Table 1. This model is restricted to frequencies ≤ 11 GHz, receiver antenna heights of 2-10 m, AP antenna heights of 10-80 m and link distances of 0.1-8 km.

8) ERICSSON (9999) MODEL
This model was also built on the modified Okumura-Hata model, adapted according to the propagation environment. It is sometimes called the 9999 model. It is given by [36]: where the constants $a_0$, $a_1$, $a_2$ and $a_3$ are given in Table 2. This model is restricted to frequencies ≤ 11 GHz, receiver antenna heights of 2-10 m, AP antenna heights of 10-80 m and link distances of 0.1-8 km.

9) WALFISCH-IKEGAMI (WI) PROPAGATION MODEL
This model was developed by the COST-231 project. Four additional factors are included for better prediction within the urban context: heights of buildings, widths of roads, building separations and road orientations. As a result, its applicability in rural areas with vegetation is doubtful. It considers only vertical buildings and distinguishes LOS and NLOS situations [37]. The LOS PL is given by:

$$PL_{WI,LOS} = 42.6 + 26\log_{10}(d) + 20\log_{10}(f) \tag{23}$$

where $d$ is in km, and $f$ is in MHz. For NLOS situations, the PL is:

$$PL_{WI,NLOS} = PL_{fs} + PL_{rts} + PL_{msd} \tag{24}$$

where $PL_{fs}$ is the free space loss, $PL_{rts}$ is the rooftop-to-street diffraction, and $PL_{msd}$ is the multi-screen diffraction loss. This model is restricted to frequencies of 800-2000 MHz, AP antenna heights of 4-50 m, receiver antenna heights of 1-3 m and link distances of 0.02-5 km.

10) STANDARD PROPAGATION MODEL (SPM)
This model is based on Hata PL formulas, but it ignores the effects of diffraction, clutter, and terrain. It is appropriate for cellular technologies [38]. The PL is given by [39]: where $K_1, \ldots, K_3$ are the SPM parameters. They are given according to Hata models as follows: The correction function for the receiver antenna height was also ignored for $h_r \le 1.5$ m since it has negligible values in that range.
For $h_r > 1.5$ m, $K_6$ is the same as in the Hata model, namely, For 900 MHz and $h_r \le 1.5$ m, the PL is given by [40]:

C. THE IMPACT OF VEGETATION
Radio signal attenuation through vegetation is traditionally assumed to increase exponentially with the crossed distance through foliage [41]. The literature contains essentially two groups of models: empirical models based on observations and measurements, and analytical models based on propagation theory. Foliage empirical models have attracted a lot of attention since they provide a good compromise between accuracy and simplicity. They include the modified exponential decay [42], Weissberger [43], ITU-R [44], COST 235 [45], FITU-R [46] and maximum attenuation [47] models. In general, the foliage PL can be described with the following exponential decay model:

$$PL_{veg} = A\,f^{B}\,d^{C}$$

where the parameters $A$, $B$ and $C$ are fitted empirically according to the foliage type. Here, $f$ is the frequency, and $d$ is the distance crossed through vegetation. During our literature search, we noticed that most research studies were done in controlled laboratory-like environments. Hence, the distance crossed through foliage must be known for an accurate estimation of the vegetation loss to be formulated. Unfortunately, such information is difficult to obtain, especially in rural environments where the locations and types of trees and the densities of the vegetation (in-leaf and out-leaf propagation) are not uniform within the radio signal path. In addition, available models cover only relatively short foliage distances (400 m), while radio links may exceed 10 km.
Furthermore, there are other parameters that affect the vegetation loss, such as the presence or absence of leaves on deciduous trees, and the humidity of the vegetation, as they are involved in the estimation of trees' dielectric constants (permittivity and conductivity) [48]. In general, high humidity increases the propagation loss, but the amount of loss is still difficult to predict, as it depends on the tree types, frequency, humidity level, rainfall rate, and so on. In general, isolated trees do not represent an important problem, but dense vegetation has a major effect on a wireless signal. In addition, the CCIR report [44] has assessed the vegetation attenuation per meter through foliage for many frequencies for greater simplicity. It is about 0.05, 0.1, 0.2, 0.3 and 0.4 dB/m for 0.2, 0.5, 1, 2 and 3 GHz, respectively. At low frequencies, the horizontal polarization has less attenuation than the vertical one due to scattering from tree trunks, but this difference vanishes above 1 GHz [49].
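To give a feel for the per-meter figures quoted above, the following sketch multiplies them by a foliage depth. It is a rough illustration only: the function and table names are hypothetical, and scaling the loss linearly with depth is a simplifying assumption that ignores the saturation described by the exponential decay models.

```python
# Per-meter foliage attenuation quoted in the text (frequency in GHz -> dB/m).
FOLIAGE_DB_PER_M = {0.2: 0.05, 0.5: 0.1, 1.0: 0.2, 2.0: 0.3, 3.0: 0.4}

def foliage_loss_db(f_ghz, depth_m):
    """Excess loss in dB for depth_m metres of foliage at a tabulated frequency.

    Assumes the per-meter attenuation stays constant over the whole depth,
    which is only reasonable for short vegetation crossings.
    """
    if f_ghz not in FOLIAGE_DB_PER_M:
        raise ValueError(f"no tabulated value for {f_ghz} GHz")
    if depth_m < 0:
        raise ValueError("depth_m must be non-negative")
    return FOLIAGE_DB_PER_M[f_ghz] * depth_m

for depth in (10, 50, 400):
    print(depth, foliage_loss_db(2.0, depth))
```

Even at the 400 m limit covered by the available models, the linear estimate at 2 GHz already exceeds 100 dB, which illustrates why a per-meter figure cannot simply be applied along a multi-kilometer rural link and why the vegetation depth would need to be known precisely.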
Finally, as the vegetation PL requires a wide set of parameters and existing models are not practical for real and rural deployments, it is important to establish a more accurate and easier-to-use PL model for rural areas. With this aim in mind, we integrate the vegetation effect implicitly through the PL empirical modeling for NLOS links because vegetation is the major source of obstructions in rural areas.

D. DISCUSSION
According to our review, the most relevant parameters for PL estimation are the AP and receiver antennas' heights, link distance and frequency band. As the PL increases with frequency band and link distance, it is important to choose the lowest possible frequency to extend the radio signal coverage and improve the quality of NLOS links. On the other hand, the PL is inversely dependent on the antennas' heights since taller antennas decrease the effect of obstructions on the radio signal. The validity intervals of most existing empirical models were defined mainly with GSM applications in mind. In this context, fixed wireless applications have not been investigated well, and not enough measurements have been made to confirm the accuracy of these models in such applications. Table 3 compares many research studies according to the relevance of the existing PL empirical models to their measurements. When a new model is proposed by the authors to fit their measurements, it is listed as "author proposed" in the "fitted path loss" column. This table lists the frequency band, coverage distance, area type, main wireless technologies and whether mobility is considered in each study. It also indicates whether the measurements were taken in a controlled environment. Most existing research papers consider the PL for mobile technologies and urban areas. In most cases, the coverage distance does not exceed 1 km for Wi-Fi. The locations of these experiments are mostly controlled. Wi-Fi is often considered for indoor or small outdoor areas such as university campuses. Recently, this technology has been used for outdoor long-range links to bring cheap and convenient broadband fixed Internet access to rural or difficult-to-access areas. Furthermore, at most one wireless technology or two frequencies are considered, whereas, with the diversification of services and applications, many communication technologies can be offered in the same area to satisfy various communication needs.
Unfortunately, it is difficult to find research papers dealing with FWA for long-range Wi-Fi in rural areas that include several frequency bands or several technologies. It is also difficult to get any agreement concerning the best PL model to use for a given area, technology or frequency. A PL model cannot be blindly used in a given environment unless its inherent parameters are well-tuned to that environment. Furthermore, propagation in rural areas is considered less challenging since it is assumed that there will be a clear LOS or free space. This observation is not true, in general, due to high vegetation densities and the differences inherent in seasonal propagation conditions. Hence, this article explores the applicability of existing empirical PL models for FWA in rural areas for Wi-Fi long-range radio links with many frequency bands.

III. PERFORMANCE METRICS
In order to compare the prediction accuracies of PL models quantitatively and efficiently, many performance metrics are used [58], [59]. The prediction error is given by:

$$\varepsilon_i = PL_{meas_i} - PL_{pred_i}$$

where $PL_{meas_i}$ and $PL_{pred_i}$ are the measured and predicted PLs, respectively, and $i$ is the index of each PL sample. The mean error is then given by:

$$\bar{\varepsilon} = \frac{1}{N}\sum_{i=1}^{N}\varepsilon_i$$

where $N$ is the total number of observations. The mean absolute error (MAE) expresses the mean shift between the measurements and the predictions and is given by:

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|\varepsilon_i\right|$$

The root mean squared error is among the most important metrics with which to assess prediction accuracy and is perceived as the shadow factor. It is provided by:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\varepsilon_i^{2}}$$

The standard deviation of a given PL model is given by:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(PL_i - \overline{PL}\right)^{2}}$$

where $PL_i$ is the path loss given by the model at a given index, and $\overline{PL}$ is the mean. The cumulative distribution function (CDF) of a PL evaluated at a given value, say $x$, is the probability that the PL will take a value less than or equal to $x$. It is given by:

$$F_{PL}(x) = P(PL \le x)$$

The five-number summary can be deduced from the CDF: the maximal and the minimal PL values, the lower and the upper quartiles and the median value. The maximal PL value $PL_{max}$ is given by:

$$PL_{max} = \max_i PL_i$$

The minimal PL value $PL_{min}$ is given by:

$$PL_{min} = \min_i PL_i$$

The path loss median value ($PL_{50\%}$) separates the ordered PL values into two equal intervals and is given by:

$$F_{PL}(PL_{50\%}) = 0.5$$

The lower quartile $PL_{25\%}$ divides the interval between the minimal and median PL values into two equal halves and is expressed as:

$$F_{PL}(PL_{25\%}) = 0.25$$

The upper quartile $PL_{75\%}$ divides the interval from the median to the maximal PL values into two equal halves and is expressed as:

$$F_{PL}(PL_{75\%}) = 0.75$$

The coefficient of determination or "R squared" assesses the ability of a given model to predict a given observation. In the best case, $R^2 = 1$, the model fits the measurements perfectly. When $R^2 = 0$, the model predicts mostly $\overline{PL}_{meas}$. A negative value means that the model fails in its predictions. $R^2$ is expressed by:

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(PL_{meas_i} - PL_{pred_i}\right)^{2}}{\sum_{i=1}^{N}\left(PL_{meas_i} - \overline{PL}_{meas}\right)^{2}}$$

The statistical dependence between two parameters, $x$ and $y$, is given by the cross-correlation coefficient, also referred to as the sample Pearson correlation coefficient. It evaluates whether the relationship between these two variables can be described with a linear function. Hence, this function can be used to extract the parameters that influence the path loss the most. It is defined as:

$$r_{xy} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^{2}}\,\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^{2}}} \tag{39}$$

where $\bar{x}$ and $\bar{y}$ are the mean values of $x$ and $y$, and $N$ is the total number of observations.

Since the measurements include thousands of observations, a logarithmic regression is used to ease and clarify the visualization. Hence, the cloud of observations can be simplified via a logarithmic curve. Its equation is given by: where $A$, $B$, $C$ and $D$ are tuned for the best fit according to $f$, and $x$ is the input variable. For example, if logarithmic regression is used to clarify the curve of the measurements according to distance, then these variables are fitted according to the measured PL and the corresponding distance. In the same manner, the listed PL models produce a cloud of points that is difficult to visualize according to each pertinent parameter (e.g., distance). The use of logarithmic regression can simplify and clarify the visualization of a given PL model according to a pertinent parameter.
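The main accuracy metrics of this section can be computed with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the error is taken as measured minus predicted, and the textbook definitions of MAE, RMSE, R squared and the sample Pearson coefficient are assumed.

```python
import math

def pl_metrics(pl_meas, pl_pred):
    """Mean error, MAE, RMSE and R^2 for paired PL samples in dB."""
    n = len(pl_meas)
    # Sign convention (measured minus predicted) is an assumption.
    err = [m - p for m, p in zip(pl_meas, pl_pred)]
    mean_meas = sum(pl_meas) / n
    ss_res = sum(e * e for e in err)
    ss_tot = sum((m - mean_meas) ** 2 for m in pl_meas)
    return {
        "mean_error": sum(err) / n,
        "mae": sum(abs(e) for e in err) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
    }

def pearson(x, y):
    """Sample Pearson correlation coefficient between two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy values, not taken from the measurement campaign.
meas = [100.0, 110.0, 120.0, 130.0]
pred = [102.0, 108.0, 123.0, 127.0]
print(pl_metrics(meas, pred))
print(pearson(meas, pred))
```

In this toy example the errors cancel out, so the mean error is zero while the MAE and RMSE are not, which shows why the mean error alone is not a sufficient accuracy indicator.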

IV. DATA COLLECTION AND MEASUREMENT METHODS
The measurements were provided by an Internet service provider servicing two different rural regions with FWA networks: SLSJ and OUT in Quebec, Canada. Figures 1 and 2 present the distribution of the wireless links in the two regions. The signal paths contain hills, plains and lakes and are largely covered by trees. The propagation environment varies between two conditions: (i) extremely cold, snowy weather with coniferous trees during the winter, and (ii) hot, rainy weather with deciduous leaves during the summer. Essentially, the networks consist of Wi-Fi long-range links based on the 802.11 standard, as previously presented in [1], [60]. The networks include thousands of customer premises equipment (CPE) units connected to hundreds of APs. Measurements include relevant information such as transmitted and received signal power, data rates, signal-to-noise ratios, noise floors and so forth. Operating frequencies correspond to the ISM bands (915 MHz, 2.4 GHz and 5.8 GHz) as well as the licensed 3.65 GHz band. These frequencies are diversified to provide various penetration capabilities. Bandwidths range between 5 and 80 MHz for diversified throughput options. AP and CPE antennas of various types and gains are used. Many transmitted powers are used for various radio link configurations. Additional pertinent information includes antenna gains and heights, link distances and obstruction type (LOS or NLOS). Only links within the AP coverage are used. CPE nodes are mostly installed on rooftops of houses, whereas APs are installed on communication towers, church steeples or the rooftops of houses. Hence, the effects of ground reflection and moving objects can be neglected. The proposed model is always tuned according to the SLSJ region and tested separately in the OUT region in order to verify the generalizability of our assumptions.
The PL is computed with the Friis link budget formula [61]:

$$P_r = P_t + G_t + G_{CPE} - L \tag{41}$$

where $P_r$ and $P_t$ are the received and transmitted signal powers, respectively; $G_t$ and $G_{CPE}$ are the AP and CPE antennas' gains, respectively; and $L$ is the total link loss, which includes the PL and the total losses of the internal devices $L_{int}$ for the APs and CPE units. As all variables are known through data sheets ($L_{int}$, $P_t$, $G_t$, $G_{CPE}$) and measurements ($P_r$), the PL is assessed via the following formula:

$$PL = P_t + G_t + G_{CPE} - L_{int} - P_r \tag{42}$$

Figure 3 presents the PL measurements and their logarithmic regressions (Eq. 40) for easy visualization. The distance varies between 0.05 and 18 km, and the PL ranges between 70 and 150 dB.
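The extraction of a PL sample from one logged link can be sketched as follows. The function name and the numeric values are hypothetical; the sketch assumes that powers are in dBm, gains in dBi and losses in dB, and that the total link loss is the sum of the PL and the internal losses, as described above.

```python
def measured_path_loss(p_t_dbm, p_r_dbm, g_ap_dbi, g_cpe_dbi, l_int_db=0.0):
    """Path loss in dB recovered from a link budget.

    Rearranges P_r = P_t + G_AP + G_CPE - (PL + L_int) for PL.
    """
    return p_t_dbm + g_ap_dbi + g_cpe_dbi - l_int_db - p_r_dbm

# Illustrative link: 20 dBm transmit power, 15 dBi AP antenna, 12 dBi CPE
# antenna, 2 dB of internal losses and a received power of -75 dBm.
print(measured_path_loss(20.0, -75.0, 15.0, 12.0, l_int_db=2.0))
```

Because every term except the received power comes from data sheets, any error in the declared antenna gains or internal losses shifts the extracted PL by the same amount in dB.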

A. MEASUREMENT COMPARISONS WITH PATH LOSS MODELS
Each listed PL model has many configurations depending on area (e.g., urban, rural), frequency, antenna heights and radio link distances. Their accuracies are assessed according to their RMSEs. Then the best configuration in terms of the RMSE of each listed PL model is used in Fig. 4. The logarithmic regression of each model configuration is added to clarify the curves. According to this figure, urban or big-city configurations are mostly retained, which confirms the inaccuracies of the rural or open-area configurations of most of the listed PL models since they considered rural areas to be low-obstruction areas, whereas radio signals are highly attenuated by the increased vegetation density. To obtain a more detailed picture of these PL models' accuracies, the RMSEs of all their configurations are compared in Table 4. At an accuracy in this range, a radio link can easily be predicted to be feasible while proving impossible to deploy in a real environment. In the remainder of this article, a new, accurate PL model is proposed and optimized according to many data types and scenarios. Then its accuracy and statistical performance metrics are compared with the listed PL models in order to highlight the improvements it brings.

V. PROPOSED PATH LOSS EMPIRICAL MODEL
The advantage of empirical PL models is their good balance between low computational complexity and prediction accuracy. Once generated, the proposed PL model is similar in terms of computational complexity to any other mathematical formula that involves arithmetic operations with few input parameters. Regarding the input parameters, PL models are generally dependent on the following ones: distance, frequency band and antennas' heights. This finding was confirmed by the previous review of PL models, where most listed models are dependent on these parameters. Table 5 contains the cross-correlation coefficients (see Eq. 39) of the PL measurements with the pertinent parameters. Note that link distance and AP antenna height are highly correlated, as higher APs cover, in general, wider distances. Hence, the proposed model must contain both parameters in the same term. The PL cross-correlation with frequency is −0.01, which is low due to the non-uniform data distribution. This finding is confirmed by the LOS links' measurements, which are uniformly scattered according to frequency bands and have a cross-correlation with frequency of 0.17. Antennas' heights are negatively correlated with PL measurements, which means that when antennas' heights rise, the PL generally decreases. This conclusion reflects the expectation that since higher antennas can better clear LOS obstructions, they can consequently decrease the PL attenuation. In contrast, distance is positively correlated with PL, which logically increases for longer radio links. Fig. 5 presents the PL measurements versus the pertinent parameters: frequency, distance, AP and CPE antennas' heights, respectively. The logarithmic regressions are added for better visibility of the relationships and the variability between the PL and various pertinent parameters. The PL variability with frequency is low due to non-uniform data distribution as discussed earlier.
When the dataset is conveniently scattered according to frequency, as for LOS links, the PL variability improves considerably, and the cross-correlation increases from −0.01 to 0.17. According to the logarithmic regression curves, the PL variability with distance is within the range of 105-130 dB, whereas it is 114-122 dB and 115-122 dB for AP and CPE heights respectively.
Since the PL can be logarithmically approximated by the pertinent parameters, as in Figure 5, and considering the previous arguments, the proposed model is expressed as follows:
$$PL_{zek} = A_0 + A_1 \log_{10}(f) + A_2 \log_{10}(d) + A_3 \log_{10}(H_{AP}) + A_4 \log_{10}(h_r) + A_5 \log_{10}(H_{AP}) \log_{10}(d) \qquad (43)$$
where $f$, $d$, $H_{AP}$ and $h_r$ are the frequency, link distance, and AP and CPE antenna heights, respectively. $A_0$, $A_1$, $A_2$, $A_3$, $A_4$ and $A_5$ are model parameters that are tuned for better accuracy. More precisely, $A_0$ is the intercept parameter. $A_1$ is the frequency-dependent tuning parameter; logically, it must have a positive value since the PL increases for higher frequencies. $A_2$ is the distance-dependent tuning parameter; its value is positive, as the PL increases for longer radio links. $A_5$ is the tuning parameter for the correlated distance and AP antenna height term. $A_3$ and $A_4$ are the AP and CPE height tuning parameters, respectively. They have negative values since higher antennas better clear the line of sight and decrease the PL.
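As a minimal sketch, Eq. (43) can be evaluated as follows. The function name `pl_zek` and the coefficient values are placeholders chosen only to exercise the formula; they are not the tuned values of Table 6, and the inputs must use the same units as the data on which the coefficients were tuned.

```python
import math

def pl_zek(f, d, h_ap, h_r, a):
    """Evaluate Eq. (43); `a` holds (A0, A1, A2, A3, A4, A5).

    All inputs must be positive so that the logarithms are defined.
    """
    a0, a1, a2, a3, a4, a5 = a
    log_f, log_d = math.log10(f), math.log10(d)
    log_hap, log_hr = math.log10(h_ap), math.log10(h_r)
    return (a0 + a1 * log_f + a2 * log_d + a3 * log_hap
            + a4 * log_hr + a5 * log_hap * log_d)

# Placeholder coefficients (hypothetical, NOT from Table 6).
coeffs = (100.0, 20.0, 30.0, -5.0, -3.0, 2.0)

# With f = d = h_r = 1 and h_ap = 10, only A0 and A3 contribute.
print(pl_zek(f=1.0, d=1.0, h_ap=10.0, h_r=1.0, a=coeffs))  # 95.0
```

The interaction term multiplies the two logarithms, so its contribution vanishes whenever either the distance or the AP height equals one unit.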

A. MODEL OPTIMIZATION OBTAINED BY SPLITTING DATA
Model parameters are extracted using the least squares algorithm implemented by the curve_fit function in the optimization module of SciPy (scipy.optimize) [62]. The extraction is done with the SLSJ measurements, and the result is tested directly on the OUT measurements in order to verify its generalizability. Three optimization criteria are used: region, line-of-sight obstruction (LOS, NLOS) and frequency band (0.9, 2.4, 3.65 and 5.8 GHz). Hence, seven sets of parameters are shown in Table 6. Parameters extracted for LOS links are smaller than those extracted for NLOS links since there are fewer obstructions in the LOS case. Frequency-dependent parameters vary only slightly across the frequency bands since the PL is weakly correlated with frequency (e.g., Table 6). The 5.8 GHz band has the largest parameters since the radio signal is most strongly attenuated in this band. Figure 6 presents the measured and predicted PLs for the SLSJ and OUT regions. The link distances in the OUT region are shorter due to the vegetation obstructions. As the links in OUT have more obstructions, the proposed model slightly underpredicts the PL there in comparison to the predictions made for the SLSJ region. Figure 7 illustrates the measured and predicted PLs when the data are split between LOS and NLOS links for the SLSJ and OUT regions. The predicted PLs for the NLOS links are around 6 to 7 dB higher than those predicted for the LOS links. Figure 8 presents the measured and predicted PLs when the data are split according to frequency bands for the SLSJ and OUT regions. The 5.8 GHz band has the highest PL values since it is the most sensitive to obstruction. The proposed model generalizes well since it remains accurate for a different region (OUT). Table 7 contains the RMSEs for the various optimization criteria. The total RMSE by region is 10.15 dB, which corresponds to an improvement of 6.89 dB compared to the best of the listed PL models. The RMSE difference between the SLSJ and OUT regions is due to the high vegetation density in the OUT region. When the regions' measurements are split according to obstruction or frequency bands, an additional improvement of 0.33 dB is obtained since the model has been tuned with more specific data.
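The tuning procedure described in this subsection can be sketched with scipy.optimize.curve_fit. The data below are synthetic stand-ins generated from placeholder coefficients; the variable names, the value ranges and the train/test split are assumptions made for illustration and do not reproduce the SLSJ or OUT datasets.

```python
import numpy as np
from scipy.optimize import curve_fit

def pl_model(x, a0, a1, a2, a3, a4, a5):
    """Eq. (43) in the form expected by curve_fit; x has shape (4, M)."""
    f, d, h_ap, h_r = x
    lf, ld = np.log10(f), np.log10(d)
    lha, lhr = np.log10(h_ap), np.log10(h_r)
    return a0 + a1 * lf + a2 * ld + a3 * lha + a4 * lhr + a5 * lha * ld

# Synthetic links (hypothetical ranges and placeholder "true" coefficients).
rng = np.random.default_rng(0)
n = 400
x = np.vstack([
    rng.choice([0.9, 2.4, 3.65, 5.8], size=n),  # frequency band
    rng.uniform(0.1, 18.0, size=n),             # link distance
    rng.uniform(5.0, 60.0, size=n),             # AP antenna height
    rng.uniform(1.0, 20.0, size=n),             # CPE antenna height
])
true_coeffs = (100.0, 20.0, 30.0, -5.0, -3.0, 2.0)
pl = pl_model(x, *true_coeffs) + rng.normal(0.0, 1.0, size=n)

# Tune on one subset and evaluate on the held-out subset.
train, test = slice(0, 300), slice(300, None)
params, _ = curve_fit(pl_model, x[:, train], pl[train])
residuals = pl_model(x[:, test], *params) - pl[test]
test_rmse = float(np.sqrt(np.mean(residuals ** 2)))
print(params, test_rmse)
```

Splitting by obstruction or frequency band, as in Table 6, amounts to repeating the same call on each subset of links and keeping one parameter set per subset.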

B. DYNAMIC DISTANCE SPLITTING (DDS)
Since the PL is highly dependent on the radio link distance, which can exceed 18 km for long-range Wi-Fi, the available measurements are dynamically split into several intervals according to the path distance to improve the PL prediction accuracy. Each data interval is then optimized separately to extract its own parameters $A_0, \ldots, A_5$. The number of intervals is optimized to minimize the overall RMSE. To predict the PL of a new link, the parameters of the interval that contains the link distance are used. The new PL expression is then:
$$PL_{dds} = A_{0,i} + A_{1,i} \log_{10}(f) + A_{2,i} \log_{10}(d) + A_{3,i} \log_{10}(H_{AP}) + A_{4,i} \log_{10}(h_r) + A_{5,i} \log_{10}(H_{AP}) \log_{10}(d) \qquad (44)$$
where $A_{0,i}$, $A_{1,i}$, $A_{2,i}$, $A_{3,i}$, $A_{4,i}$ and $A_{5,i}$ are the PL model tuning parameters for the $i$th interval. Figure 9 presents the comparison between the PL measurements and the DDS model results for the SLSJ and OUT regions. The DDS model is tuned on the SLSJ measurements and applied directly to the OUT region. It generalizes well despite the differences between the regions in terms of vegetation and terrain, and it shows an additional RMSE improvement of 0.16 dB in the OUT region according to Table 8. Figure 10 presents the extraction of the optimal number of intervals by comparing the RMSE for the SLSJ region (training set) to the RMSE for the OUT region (testing set). The optimal number is 6, and the corresponding RMSE for OUT is 10.36 dB. When the number of intervals exceeds this optimal value, the DDS model becomes overfitted: the RMSE continues to decrease in the SLSJ region, whereas it increases in the OUT region. The DDS model therefore progressively loses its generalizability as it memorizes the training set. Figure 11 compares the CDF of the PL measurements to the CDFs of the values obtained from the proposed PL models. The $PL_{dds}$ model produces values closer to the measurements because splitting the available paths into several intervals increases the prediction accuracy. Furthermore, the values of the proposed PL models vary faster than the measurements around their mean values since their standard deviations are larger.
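The DDS idea can be sketched as follows, under several assumptions that the text does not specify: interval edges are placed at distance quantiles of the training set, the data are synthetic with a distance slope that changes at 3 km, and the number of intervals is chosen by the lowest RMSE on the held-out set. Because Eq. (44) is linear in its coefficients, the sketch uses ordinary linear least squares (numpy.linalg.lstsq) in place of the curve_fit call named in the text. All function names are hypothetical.

```python
import numpy as np

def design_matrix(f, d, h_ap, h_r):
    """Columns that multiply A0..A5 in Eq. (44)."""
    lf, ld, lha, lhr = (np.log10(v) for v in (f, d, h_ap, h_r))
    return np.column_stack([np.ones_like(ld), lf, ld, lha, lhr, lha * ld])

def fit_dds(X, d, pl, k):
    """Fit one coefficient vector per distance interval (k intervals)."""
    # Assumption: interior edges at distance quantiles of the training set.
    edges = np.quantile(d, np.linspace(0.0, 1.0, k + 1))[1:-1]
    idx = np.searchsorted(edges, d)
    coefs = np.array([np.linalg.lstsq(X[idx == i], pl[idx == i], rcond=None)[0]
                      for i in range(k)])
    return edges, coefs

def predict_dds(X, d, edges, coefs):
    """Use the coefficients of the interval that contains each link distance."""
    idx = np.searchsorted(edges, d)
    return np.sum(X * coefs[idx], axis=1)

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def synthetic_links(n, rng):
    """Synthetic stand-in data whose distance slope changes at 3 km."""
    f = rng.choice([0.9, 2.4, 3.65, 5.8], size=n)
    d = rng.uniform(0.1, 18.0, size=n)
    h_ap = rng.uniform(5.0, 60.0, size=n)
    h_r = rng.uniform(1.0, 20.0, size=n)
    slope = np.where(d < 3.0, 20.0, 35.0)
    pl = (100.0 + 20.0 * np.log10(f) + slope * np.log10(d)
          - 5.0 * np.log10(h_ap) - 3.0 * np.log10(h_r)
          + rng.normal(0.0, 1.0, size=n))
    return design_matrix(f, d, h_ap, h_r), d, pl

rng = np.random.default_rng(1)
X_tr, d_tr, pl_tr = synthetic_links(600, rng)  # plays the role of SLSJ
X_te, d_te, pl_te = synthetic_links(300, rng)  # plays the role of OUT

test_rmse = {}
for k in range(1, 11):
    edges, coefs = fit_dds(X_tr, d_tr, pl_tr, k)
    test_rmse[k] = rmse(pl_te, predict_dds(X_te, d_te, edges, coefs))

best_k = min(test_rmse, key=test_rmse.get)
print(best_k, round(test_rmse[best_k], 2))
```

Tracking the training RMSE alongside `test_rmse` for each `k` would reproduce the kind of comparison described for Figure 10, where a training error that keeps falling while the test error rises signals overfitting.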

C. COMPARISON WITH OTHER MODELS
The comparison of the RMSEs of the proposed PL models and the most accurate configurations of the listed PL models is presented in Table 9 by region, obstruction level, frequency band and whole dataset. The DDS algorithm improved the RMSE of the proposed $PL_{zek}$ model by 0.2 dB for the whole dataset, while the RMSE of the proposed $PL_{zek}$ model is better than those of the existing models by at least 6.8 dB. When the data are split according to regions, the RMSE of the proposed $PL_{zek}$ model is 6.6 and 7.5 dB better for the SLSJ and OUT regions, respectively. For the LOS links, the improvement is 2.4 dB, while it is 6.5 dB for the NLOS links. The 0.915 GHz frequency band showed an improvement of 5.9 dB, while the accuracy improved by 7.0, 3.2 and 4.9 dB for the 2.4, 3.65 and 5.8 GHz frequency bands, respectively.
An additional statistical analysis is carried out in Table 10, where the values of the proposed PL models are compared to the PL measurements and to the values obtained with the best configurations of the listed PL models in terms of the coefficient of determination ($R^2$), mean, standard deviation, MAE, minimum, maximum, and so on. The proposed models have positive coefficients of determination since their predictions are close to the measurements, while the other models have negative $R^2$ values due to their high prediction errors. The same observation holds for the other statistical parameters: the mean, standard deviation, minimum, maximum, and so forth of the proposed models are closer to those of the measurements than the ones obtained with the other listed PL models. The mean absolute error (MAE) of the proposed $PL_{zek}$ model is 8.1 dB, which means that when designing new radio links, there can be a difference of 8.1 dB, on average, between the PL prediction and the real PL value. When LOS-NLOS or frequency splitting is considered, the MAE improves to 7.8 dB. [...] previous statistical analysis, especially for higher PL values, where the link distances and the vegetation obstructions are large. The differences between the measurements and the values generated by the other listed PL models are high since these models were developed and tuned for urban areas with more obstructions. Furthermore, their rural configurations are less accurate since they consider radio links to be mostly LOS links, contrary to the high vegetation densities in these areas.
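The statistics used in this comparison can be sketched as below, assuming the usual definitions of RMSE, MAE and $R^2 = 1 - SS_{res}/SS_{tot}$. The numbers are made up for illustration. The example shows why a model with a large systematic error obtains a negative $R^2$: its squared error exceeds the variance of the measurements themselves.

```python
import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))

def r2(y, y_hat):
    """Coefficient of determination; negative when the model is worse
    than predicting the mean of the measurements."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustrative values in dB (hypothetical, not from Table 10).
measured = np.array([110.0, 115.0, 120.0, 125.0, 130.0])
close = measured + np.array([2.0, -2.0, 1.0, -1.0, 0.0])  # small errors
biased = measured + 15.0                                  # 15 dB offset

print(round(r2(measured, close), 2), round(mae(measured, close), 2))    # 0.96 1.2
print(round(r2(measured, biased), 2), round(mae(measured, biased), 2))  # -3.5 15.0
```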

VI. SEASONAL EFFECT
In the rural Canadian context, the extreme propagation conditions can be split according to the seasons into two groups: cold and hot. Since our research considers only long-term extreme PL seasonal variations that last throughout entire cold or hot seasons, snowfall and rainfall are not accounted for, as they are of limited duration and their impact is reduced at our operating frequencies [63], [64]. The main long-term seasonal effects are the PL differences due to leaf growth during the hot season and to snow accumulation and leaf fall during the cold season. During the hot season, the attenuation includes vegetation and buildings (for NLOS links) added to the free space effects. Thus, the hot season attenuation $A_{hot}$ is:
$$A_{hot} = A_{fs} + A_{veg} + A_{build} \qquad (45)$$
where $A_{fs}$, $A_{veg}$ and $A_{build}$ are the free space, vegetation and building attenuations, respectively. During the autumn, leaves fall progressively, and the total vegetation attenuation decreases progressively to reach its value for the cold season. During the winter, the conifer, building and accumulated snow attenuations (for NLOS links) are considered in addition to the free space attenuation. The cold season attenuation $A_{cold}$ is given as follows:
$$A_{cold} = A_{fs} + A_{con} + A_{build} + A_{accu} \qquad (46)$$
where $A_{con}$ and $A_{accu}$ are the conifer and accumulated snow attenuations, respectively. Furthermore, most researchers have used laboratory-like environments, where all the influencing parameters are under control and the path distance is several hundred meters. Yet, in widely deployed networks or hard-to-access rural areas, most of these parameters are difficult to assess accurately. Consequently, in this article, an easy and comprehensive PL model is developed by considering long-term extreme seasonal variations in PL measurements during winter and summer. Figure 13 presents the PL during the summer, when the leaves are present, and during the winter, when the leaves have fallen and snow has accumulated.
The PL difference between the two seasons is presented in the figure below; the total mean difference is about 3.92 dB, which means that when installing a wireless link during the winter, a margin of at least 3.92 dB must be considered for the PL increase during the summer. After splitting the radio links according to obstruction levels, the mean seasonal differences in the PL are 1.22 and 5.14 dB for the LOS and NLOS links, respectively. Hence, margins of 1.22 and 5.14 dB should be considered for the summer. Figure 14 presents the CDFs of the PL measurements for the summer and winter seasons for the entire dataset and for the data classified according to line-of-sight obstructions (LOS, NLOS). It reflects the expectation that the margin between summer and winter should be larger for NLOS links than for LOS links. Furthermore, the PL means are sorted logically in the following ascending order: Winter LOS, Summer LOS, Winter all data, Winter NLOS, Summer all data and Summer NLOS.
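The margins reported above can be applied as in the following sketch. The margin values are the mean seasonal differences quoted in the text; the function names, the dictionary layout and the sample PL values are hypothetical, and the seasonal difference is assumed here to be a difference of group means.

```python
# Mean summer-minus-winter PL differences quoted in the text (dB).
SEASONAL_MARGIN_DB = {"ALL": 3.92, "LOS": 1.22, "NLOS": 5.14}

def summer_pl_estimate(winter_pl_db, obstruction="ALL"):
    """Add the seasonal margin to a PL value measured during the winter."""
    return winter_pl_db + SEASONAL_MARGIN_DB[obstruction]

def mean_seasonal_difference(pl_summer_db, pl_winter_db):
    """Difference between the summer and winter mean PLs (assumed definition)."""
    mean_summer = sum(pl_summer_db) / len(pl_summer_db)
    mean_winter = sum(pl_winter_db) / len(pl_winter_db)
    return mean_summer - mean_winter

# An NLOS link measured at 120 dB in winter is planned with 125.14 dB.
print(round(summer_pl_estimate(120.0, "NLOS"), 2))  # 125.14

# Made-up sample values, only to show the calculation.
print(mean_seasonal_difference([122.0, 126.0], [120.0, 121.0]))  # 3.5
```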

A. MODEL OPTIMIZATION ACCORDING TO SEASONAL VARIATION
To model the long-term PL variations between summer and winter, the measurements are split into the two corresponding groups. Each group is then split according to the obstruction level (LOS, NLOS). Next, the proposed model is tuned for each group using the least squares algorithm implemented by the curve_fit function in the optimization module of SciPy (scipy.optimize) [62]. Four groups of parameters are therefore obtained: LOS Summer, LOS Winter, NLOS Summer and NLOS Winter. They are presented in Table 11, where the NLOS and Summer parameters are generally greater than the LOS and Winter ones.
Similarly, Figure 15 presents four groups of curves, each of which compares the measured and predicted PLs according to seasonal variations, line-of-sight obstructions (LOS and NLOS) and region (SLSJ and OUT). The predicted PLs are higher during the summer because of leaf growth. Likewise, the PL seasonal variation is larger for NLOS links than for LOS ones. This variation increases with distance since more vegetation can be present along the signal path. Table 12 shows that the accuracy of our model improves by 0.69 dB (in terms of the RMSE) when seasonal effects are considered. The RMSE for the entire dataset is about 9.46 dB, whereas it is 10.15 dB without taking seasonal effects into consideration. When seasonal effects are taken into consideration, the MAE drops to 7.5 dB, whereas the MAEs are 8.1 and 8 dB for the $PL_{zek}$ and $PL_{dds}$ models, respectively.

VII. VALIDATION WITH INTERNATIONAL INDEPENDENT MEASUREMENTS
In order to verify the generalizability of our model, it is tested with independent and open PL measurements for mobile communication technology from seven different regions in England with radio link distances of over 25 km. Measurements were made during summer and winter, and they include rural, urban and suburban areas with various obstructions, such as vegetation, buildings, hills, and so on. Frequency bands range from 449 MHz to 5850 MHz. A complete description of the measurements is available in [65]. Table 13 presents a comparison of the RMSEs between our proposed model and the most accurate PL models among all the models listed in this article. In each location, all the configurations of the proposed PL model are compared to all the listed PL models, but, for the sake of simplicity, only the most accurate configurations have been included in this table. This comparison includes many frequency bands, the obstruction type and the season for all the regions concerned. According to this comparison, our model is among the most accurate, with a steady performance in various environments despite the following issues:
1) The differences between technologies: our model has been optimized for Wi-Fi technology, whereas the test measurements were taken for mobile GSM.
2) The difference between countries: our model has been optimized in Canada, whereas the other measurements were made in England.
3) The diversity in topography: dense urban, suburban, dense rural, high vegetation, mountainous, and so forth.
4) The difference in antenna heights: various heights were used for our model, whereas a receiver antenna height of 1.5 m was used in the England dataset.
5) The heterogeneous nature of our measurements versus the homogeneous test data.
6) The wide frequency range, which extends from 450 MHz to 5.8 GHz.

VIII. CONCLUSION
Wireless networks operating below 6 GHz present an attractive solution for connecting people in rural regions. Wireless engineers therefore need assistance in choosing a convenient PL model for their purposes from among the many available models. This article provides a step towards dealing with this issue. It reviews the available PL models and discusses the effects of vegetation on radio signal attenuation. In addition, long-term extreme PL seasonal variations that last throughout entire cold or hot seasons are considered since this issue has not been covered well in the literature. The accuracies of the most widely known PL models are characterized for FWAs of various frequencies and configurations in two different rural regions in Canada. The article then shows how to derive a useful PL model for rural areas by proposing a new PL model. The improvement brought by this new model is about 7.2 dB in terms of the RMSE. Taking the seasonal effect into consideration adds a further improvement of 0.69 dB. The model is tested on datasets from different regions in England, and it provides predictions with very high accuracy. The proposed model is valid for frequencies below 6 GHz, CPE heights of 3-81 feet, AP antenna heights of 3-220 feet and link distances below 34 km. Future research could consider a solution for user association and AP selection in order to improve the QoS in difficult rural environments.