On the Usefulness of the Generalised Additive Model for Mean Path Loss Estimation in Body Area Networks

In this article, the usefulness of the Generalised Additive Model for mean path loss estimation in Body Area Networks is investigated. The research concerns a narrow-band indoor off-body network operating at 2.45 GHz and is based on measurements performed with four different users. The mean path loss is modelled as a sum of four components that depend on path length, antenna orientation angle, absolute difference between transmitting and receiving antenna heights, and relative polarisation of both antennas. It is shown that the Generalised Additive Model allows for mean path loss estimation with a higher accuracy than Linear Regression. The obtained mean error is 0 dB, the root mean square error is 5.52 dB and the adjusted coefficient of determination is 61.2%.


I. INTRODUCTION
The design of a wireless system should be preceded by a deep analysis of radio channel properties in the target frequency band, environment and scenario. For this reason, there is an unceasing need for elaborating radio channel models for new types of radio systems working in new frequency bands and/or in new types of environments.
Nowadays, Body Area Networks (BANs), which refer to body centric wireless communications where at least one of the communication devices is attached to the human body, play a very important role in the next generation of wireless systems [1]. In this article, off-body communications are considered between a fixed (off-body) device and a wearable (on-body) one. Placing the antenna on the user's body causes many disadvantageous phenomena, e.g., near-field coupling or radiation pattern distortion. Moreover, significant variations of the received signal may be caused by shadowing and scattering from the body as well as from the environment.
Therefore, in order to boost the overall system performance, a good understanding of the radio channel is required.
The associate editor coordinating the review of this manuscript and approving it for publication was Lorenzo Mucchi.
One of the commonly analysed components of the radio channel characteristics, besides the fast and slow fading ones, is the mean path loss [2]. In the literature, one can find many different approaches for mean path loss modelling in various types of wireless systems working at a wide range of possible frequencies. The most common statistical tool used for developing empirical models for mean path loss estimation is Linear Regression (LR) with the Least Square Method (LSM) [3] approach, which has been used, e.g., in [4] for path loss modelling at 28 and 38 GHz in an indoor environment. The more complex Multivariate Linear Regression (MLR) [5] approach has been applied in [6] for mean path loss estimation for mobile systems operating in a container terminal environment, or in [7] for ground reflection path loss estimation. If the relationship between path loss and one of the independent variables is not linear, one can use the logarithmic function for its linearisation or use a non-linear regression, e.g., in [8] a non-linear multi-regression with the pseudo gradient search approach has been used for path loss estimation in mobile communication systems.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In the last few years, one can also find more sophisticated methods for radio channel modelling. In [9], path loss prediction models for aircraft cabin environments are created with the use of different machine learning methods, like back propagation neural network, support vector regression, or random forest. The authors of [10] have applied a deep learning technique for path loss prediction in mobile communication systems at 2.6 GHz and compared the proposed approach with traditional channel models. The neural network approach has been proposed in [11] for path loss modelling in urban areas at 900, 1800 and 2100 MHz. Even fuzzy-logic techniques may be useful for building path loss models, as has been done in [12] for metropolitan environments at 900 and 1800 MHz.
Some researchers also use hybrid methods. For instance, in [13] a heuristic approach has been applied for penetration and path loss modelling at 677 MHz; a neural network combined with multiple regression has been used for the elaboration of an indoor prediction model. In [14], a linear fitting has been used for elaborating a log-distance path loss model, but the authors have also used a Fourier series in order to model a non-linear dependence of the path loss exponent on the transmitter distance to the closest wall.
As for other wireless networks, in BANs the mean path loss models are, in the majority of cases, developed with the use of LR. In [15], this tool has been used for elaborating an experimental path loss model for in-body communications within 2.36 to 2.5 GHz. A similar approach has been proposed in [16], where LR with LSM has been used for path loss estimation in the localisation of an endoscopic capsule. LR has also been used in [17] for investigations of different approaches for path loss exponent estimation in off-body channels. Also, in [18], the analysis of the mean path loss for narrow and ultra-wide band off-body networks in a ferryboat environment is based on LR. Additionally, in [19] linear modelling is used for investigations of the impact of frequency dependence of human tissues on the path loss in ultra-wideband in-body channels in the range from 3.1 to 5.1 GHz. In-body channels are also considered in [20] and [21], where LSM is used to obtain path loss parameters for homogeneous human tissues, such as muscle, brain, fat and skin. In [22], an off-body channel at 6-8.5 GHz is studied and a linear path loss model for a hospital environment is proposed, using LSM, with height-dependent and body-obstruction attenuation factors. One can also find research on the usage of mm-waves for BAN applications, e.g., in [23], where LR is used for elaborating a path loss model for on-body channels at 94 GHz, whereas in [24] and [25] the same statistical tool is used for calculating path loss models for line of sight (LoS) and non-LoS (NLoS) off-body channels at 60 GHz within indoor environments. Although the current article addresses mean path loss modelling methods, it is worthwhile to mention that LR can also be used for modelling the influence of the user's body on channel characteristics, e.g., in [26], for the calculation of the parameters of a maximum body-shadowing loss model, which is a component of a more general model for off-body and body-to-body communications.
The Generalised Additive Model (GAM), described in Section II, is a statistical tool that uses smooth functions of predictor variables, which makes it possible to see the contribution of each variable to the composite model, and to model non-linearities [27]. It is commonly used for elaborating statistical models in a wide scope of applications, like forecasting gas usage [28], marine power systems analysis [27], computer-aided diagnosis systems [29], multi-objective programming [30], modelling of hospital admissions [31], managing high-speed railways [32], selection of optimal conditions for wine grapes analysis [33], analysis of solar irradiances for solar energy production [34], or even spatio-temporal modelling of criminal incidents [35], just to mention a few. However, to the best of the authors' knowledge, there is no research on the usage of GAM for radio channel modelling, especially for the development of mean path loss models for BANs, which determines the novelty of the presented work.
The rest of the article is structured as follows. Section II consists of the description of GAM, which has been used to develop a new mean path loss model for an off-body channel. In Section III, the measurement campaign is described, including equipment, environment and scenarios being investigated. In Section IV, the general mean path loss model is formulated, and a comparison between models obtained with the use of LR and GAM is addressed. The final mean path loss model is presented in Section V. Section VI concludes the article.

II. BRIEF DESCRIPTION OF THE GENERALISED ADDITIVE MODEL
Assume that there is a relationship between random variable $Y \in \mathbb{R}$ and random variable $X \in \mathbb{R}^P$, given by [36], [37]:
$$Y = f(X) + \varepsilon \tag{1}$$
where $f$ is an unknown function describing the relationship between $X$ and $Y$, and $\varepsilon$ is a random error, independent of $X$, with an expected value equal to 0. Random variable $Y$ is called the response or dependent variable [36], while random vector $X = (X_1, X_2, \cdots, X_P)$ contains the predictors, also called independent variables, features or sometimes just variables [36]. Currently, the linear statistical model that is used most often is LR, which is given by [3], [38]:
$$f(X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_P X_P \tag{2}$$
where $\beta_0$ is an intercept and $\beta_1, \beta_2, \cdots, \beta_P$ are the slopes corresponding to the respective components of vector $X$. The benefit of using (2) is that one needs to calculate only $P + 1$ coefficients ($\beta_0, \beta_1, \beta_2, \cdots, \beta_P$) by using the method of least squares [37]. Unfortunately, in many real-life scenarios, (2) is not a proper approach, which results in low accuracy [36], [37]; therefore, some methods to fit a non-linear function have been proposed, e.g., polynomial regression. However, the disadvantage of this method, and of others that belong to the same group, called basis functions, is the need to assume a priori the form of $f(X)$ in (1) (e.g., the degree of a polynomial). The quality of the calculated model highly depends on whether $f(X)$ can be approximated by a polynomial or not. Additionally, there is a problem with polynomials of degree higher than 4, because the polynomial curve can become overly flexible and take very unsuitable shapes [36].
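As an illustration of the least-squares step mentioned for (2), the following minimal Python sketch estimates the P + 1 coefficients from synthetic data. The data set, the coefficient values in `true_beta` and the noise level are hypothetical and chosen only for the example; they are not related to the measurements of this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: N observations of P = 3 predictors (not the article's data).
N, P = 200, 3
X = rng.uniform(0.0, 1.0, size=(N, P))
true_beta = np.array([2.0, 1.5, -0.7, 0.3])   # beta_0 ... beta_P, illustrative only
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0.0, 0.1, size=N)

# Design matrix with a leading column of ones for the intercept beta_0.
A = np.column_stack([np.ones(N), X])

# Least-squares estimate of the P + 1 coefficients.
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta_hat)
```

With a small noise level, the estimated coefficients should lie close to the values used to generate the data.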
In this article, the use of a statistical model given by [37], [39], [40] is proposed:
$$f(X) = \beta_0 + f_1(X_1) + f_2(X_2) + \cdots + f_P(X_P) \tag{3}$$
which is called the Generalised Additive Model, where each linear component $\beta_K X_K$ in (2) may be replaced by a non-linear function $f_K(X_K)$; these are unspecified smooth functions fitted by using scatterplot smoothers (e.g., a spline function) [37]. One should notice that not all of the $f_K$ functions need to be non-linear: $f_K$ may be linear or have another parametric form (e.g., when $X_K$ is a qualitative variable). GAM has the following advantages [36]: 1) it allows a non-linear function, $f_K$, to be fitted to each $X_K$, which can hardly be accomplished by LR; 2) it maintains much of the interpretability of LR, while being much more flexible; 3) it is an additive model, which allows the effect of each variable $X_K$ to be analysed and inferred separately. By virtue of all these advantages, GAM may decrease the mean square error significantly in comparison to Linear and Polynomial Regressions.
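The additive structure can be illustrated with a short, self-contained Python sketch. It is not the fitting procedure used in this article: it assumes the classical backfitting algorithm and swaps the spline smoother for a simple Gaussian-kernel smoother, and the synthetic data, the helper names `kernel_smooth` and `fit_additive`, and the bandwidth values are all hypothetical.

```python
import numpy as np

def kernel_smooth(x, r, bandwidth):
    """Gaussian-kernel (Nadaraya-Watson) smoother of values r over positions x."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w @ r) / w.sum(axis=1)

def fit_additive(X, y, bandwidths, n_iter=20):
    """Backfitting: repeatedly smooth the partial residuals of each predictor."""
    n, p = X.shape
    beta0 = y.mean()
    f = np.zeros((n, p))                       # f[:, k] holds f_k at the data points
    for _ in range(n_iter):
        for k in range(p):
            partial = y - beta0 - f.sum(axis=1) + f[:, k]
            f[:, k] = kernel_smooth(X[:, k], partial, bandwidths[k])
            f[:, k] -= f[:, k].mean()          # centre so that beta0 stays identifiable
    return beta0, f

# Synthetic data (illustrative): one linear and one non-linear effect.
rng = np.random.default_rng(1)
n = 400
x_lin = rng.uniform(0.0, 2.0, n)
x_ang = rng.uniform(0.0, 360.0, n)
y = 50.0 - 3.0 * x_lin + 12.0 * np.sin(np.radians(x_ang) / 2) + rng.normal(0, 2.0, n)
X = np.column_stack([x_lin, x_ang])

# Linear Regression baseline, as in (2).
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
rmse_lr = np.sqrt(np.mean((y - A @ coef) ** 2))

# Additive model, as in (3).
beta0, f = fit_additive(X, y, bandwidths=[0.3, 25.0])
rmse_gam = np.sqrt(np.mean((y - beta0 - f.sum(axis=1)) ** 2))

print(f"RMSE LR:  {rmse_lr:.2f}")
print(f"RMSE GAM: {rmse_gam:.2f}")
```

Because the second predictor enters through a symmetric, non-monotonic function, a straight line captures almost none of its effect, whereas the additive fit can follow it; this is the kind of situation in which the additive model is expected to reduce the error.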
However, like all statistical learning methods, GAM also has some disadvantages and limitations, namely: 1) a propensity for overfitting; 2) relatively higher computational complexity, compared to LR; 3) unstable behaviour at the boundaries of smooth splines; 4) a propensity for missing important interactions between variables. Nevertheless, the majority of these disadvantages can be overcome by proper validation techniques and data analysis.
In this article, for the discussion about the quality of prediction models, the most commonly used measures of accuracy have been applied, i.e., mean error ($\mu_e$) and root mean square error ($\sigma_e$). In practice, one has access to $N$ collected measurements, i.e., fixed pairs of predictor and response values $(x_n, y_n)$, and to $\hat{f}$, the estimate of $f$ based on these measurements (with the use of a certain model). Therefore, the mean error can be calculated as [41]:
$$\mu_e = \frac{1}{N} \sum_{n=1}^{N} \left( y_n - \hat{f}(x_n) \right) \tag{4}$$
where $n$ denotes the measurement number and $N$ denotes the sample size (size of the data set). In the case when $\mu_e$ (often called bias) is equal to 0, on average the prediction model yields the true value [3]. The second parameter, the root mean square error, is given by [41]:
$$\sigma_e = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left( y_n - \hat{f}(x_n) \right)^2} \tag{5}$$
In addition, for model evaluation, it is common practice to use the coefficient of determination ($R^2$), which can be expressed by [5]:
$$R^2 = 1 - \frac{\sum_{n=1}^{N} \left( y_n - \hat{f}(x_n) \right)^2}{\sum_{n=1}^{N} \left( y_n - \bar{y} \right)^2} \tag{6}$$
where $\bar{y}$ denotes the mean value of $y$, being given by:
$$\bar{y} = \frac{1}{N} \sum_{n=1}^{N} y_n \tag{7}$$
It should be noted that $R^2 \in [0, 1]$: when $R^2 = 1$, $\sigma_e$ equals 0, which means that the model perfectly fits the measurement data, whereas in the case of $R^2 = 0$, there is no relationship between the response and predictors. In general, the $R^2$ value gives the percentage of the variance in the dependent variable that can be explained by the independent variables used in the model.
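The three accuracy measures can be sketched in a few lines of Python. The sketch assumes the usual definitions, with the error taken as measured minus predicted value (the sign convention is an assumption), and the sample values are hypothetical.

```python
import numpy as np

def mean_error(y, y_hat):
    """Mean error (bias): average of measured minus predicted values."""
    return float(np.mean(y - y_hat))

def rmse(y, y_hat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical measured and predicted mean path loss values, in dB.
y = np.array([60.0, 65.0, 70.0, 75.0])
y_hat = np.array([61.0, 64.0, 71.0, 74.0])

print(mean_error(y, y_hat), rmse(y, y_hat), r_squared(y, y_hat))
```

In this toy case the errors alternate between -1 dB and +1 dB, so the mean error vanishes while the root mean square error is 1 dB, which shows why both measures are reported together.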
However, one should keep in mind that adding new independent variables to an existing model never decreases the coefficient of determination (and typically increases it), but results in a decrease in the reliability of the model assessment. Therefore, in practice the so-called adjusted coefficient of determination ($R^2_{adj}$) is used [42]:
$$R^2_{adj} = 1 - \left( 1 - R^2 \right) \frac{N - 1}{N - P - 1} \tag{8}$$
where $P$ is the number of predictors. It is assumed that, for a positive verification of the model, the adjusted coefficient of determination should be greater than 0.6. In order to evaluate the statistical significance of particular components of both LR and GAM models, the significance level ($\alpha$), i.e., the threshold probability for a certain component to be considered statistically significant, has been calculated. It is assumed that the obtained significance level should be lower than the conventional one, i.e., 0.05 [39].
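As a worked example of the adjustment, the Python sketch below applies the standard formula to the $R^2$ values reported in Section IV. It assumes that N equals the 672 median path loss values and that P = 4; the article does not state explicitly which values enter the calculation, so both are assumptions.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted coefficient of determination for n_samples and n_predictors."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Assumptions: N = 672 median path loss values, P = 4 predictors.
N, P = 672, 4

r2_adj_lr = adjusted_r_squared(0.396, N, P)    # R^2 reported for LR
r2_adj_gam = adjusted_r_squared(0.615, N, P)   # R^2 reported for GAM

print(round(r2_adj_lr, 3), round(r2_adj_gam, 3))
```

Under these assumptions the results are close to the adjusted values reported in Section IV. For GAM, the effective number of fitted parameters may be larger than P, which might explain a small difference in the last digit.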

III. MEASUREMENT CAMPAIGN
Measurements were performed in an actual 7 × 5 × 3 m³ indoor office environment (see Fig. 1), containing typical scatterers, such as tables, chairs, and computers, using the methodology described in [43] and the measurement stand presented in [2].
The transmitting (Tx) section consists of a vector signal generator R&S SMBV100A [44] and an on-body transmitting patch antenna (with a rectangular radiator and linear polarisation), operating in the 2.45 GHz band. This antenna has a 3 dBi gain, and half-power beamwidths of 115° and 140° in the H- and E-planes, respectively. The generator is connected to the antenna with a flexible 7 m long RG174 cable [45] (this length was the shortest one fitting the measurement scenario), having an attenuation of 12.1 dB, which was taken into account during the calibration process.
The receiving (Rx) section consists of a spectrum analyser Anritsu MS2724B [46] with a control computer. Measurements were done asynchronously, with an average sample period of 150 ms and a standard deviation of 40 ms. The off-body fixed antenna is a dual polarised quad-ridged horn LB-OSJ-0760 [47], operating in the frequency range of [0.7, 6] GHz, with a gain of 10 dBi, and half-power beamwidths of 58° and 46° in the H- and E-planes, respectively. The height of the antenna ($h_{Rx}$) was 1.4 m. It should be noted that measurements were carried out along the axis of the room, which means that the body as well as the Tx antenna were within the main beam of the Rx one in the majority of the investigated cases. Only for small distances (i.e., 1 m and 2 m) was there a need to compensate for the characteristics of the Rx antenna, which has been done accordingly.
Switching between vertical (V) and horizontal (H) polarisations of the Rx antenna was performed via a Tesoel TS121 RF switch [48]. Given the V polarisation of the Tx antenna, this allowed measurements to be performed for co-polarised (CP) and cross-polarised (XP) channels, respectively. All RF connections in the Rx section are made by using Huber+Suhner Sucoflex104 3 m long cables [49].
Measurements have been performed with four users, whose characteristics are detailed in Tab [...] with a 45° counter-clockwise increment. For each distance and rotation, 50 samples were measured, and the median value of the path loss was calculated. In total, 33 600 instantaneous and 672 median path loss values have been collected during measurements.

IV. GENERALISED ADDITIVE MODEL VS. LINEAR REGRESSION
On the basis of theoretical premises that enable the selection of independent variables (predictors) affecting the mean path loss, the general mean path loss model ($L^G_p$), which is considered as the dependent variable, can be formulated as follows:
$$L^G_p = f(d, h, \varphi, P) \tag{9}$$
where:
• $d$ [m]: distance between the user and the off-body (Rx) antenna;
• $h$ [m]: absolute difference between the Tx and Rx antenna heights;
• $\varphi$ [°]: on-body (Tx) antenna orientation angle, i.e., the angle between the main directions of the on-body (Tx) and the off-body (Rx) antennas (counter-clockwise);
• $P$: polarisation component (variable with two possible values, for CP and XP channels).
In order to check the collinearity of predictors, the correlation coefficient [37] between each pair of them has been calculated; the correlation between particular variables is not significant (≈ 0), which allows the predictors to be used (without any preprocessing) for the GAM and LR models.
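A collinearity check of this kind can be sketched in Python as follows. The design grid below is hypothetical (a fully balanced combination of illustrative distances, angles and polarisation states) and does not reproduce the actual measurement plan; it only shows how the pairwise correlation coefficients can be obtained.

```python
import itertools
import numpy as np

# Hypothetical balanced design; the values are illustrative, not the campaign's.
distances = [1, 2, 3, 4, 5, 6]      # m
angles = [0, 90, 180, 270]          # degrees
polarisations = [0, 1]              # 0 = CP, 1 = XP

design = np.array(
    list(itertools.product(distances, angles, polarisations)), dtype=float
)

# Pearson correlation between predictor columns (rowvar=False: columns are variables).
corr = np.corrcoef(design, rowvar=False)
print(np.round(corr, 3))
```

For a fully crossed design such as this one, the off-diagonal entries vanish, which is the situation in which the predictors can be used without preprocessing.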
It is also essential to know the number of measurements performed for each value of a particular predictor. These numbers are given in Tab. 2 for all independent variables (predictors), in line with the description of the measurement campaign.
One can notice that for the polarisation component ($P$) both sets of measurements (for XP and CP) are equinumerous. For the antenna orientation angle ($\varphi$), the number of measurements is 6 000 for each angle from the subset {0°, 90°, 180°, 270°}, and 2 400 for each element of {45°, 135°, 225°, 315°}, which is due to the time constraints that occurred during measurements, but the expected impact of this difference on model results is not significant, especially when one considers that the purpose of the work is to compare two methods for mean path loss modelling. For the absolute difference of Tx and Rx antenna heights, the number of measurements obtained for particular $h$ values is not uniform, and ranges from 2 400 for the majority of the cases up to 12 000, which is due to different locations of the on-body antenna and different heights of particular users. On the other hand, the distribution for the distance is uniform, and the number of measurements equals 5 600 for each value of $d$.
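The sample counts quoted in the text can be cross-checked with simple arithmetic, sketched here in Python. All figures are taken from the text; the number of distinct distances is not stated explicitly and is only inferred from the totals.

```python
samples_per_median = 50      # instantaneous samples per distance and rotation
n_medians = 672              # median path loss values
n_total = samples_per_median * n_medians

# Instantaneous samples per antenna orientation angle (degrees), as quoted in the text.
per_angle = {0: 6000, 90: 6000, 180: 6000, 270: 6000,
             45: 2400, 135: 2400, 225: 2400, 315: 2400}

per_distance = 5600                       # samples per distance value
n_distances = n_total // per_distance     # inferred, not stated in the text
per_polarisation = n_total // 2           # CP and XP sets are equinumerous

print(n_total, sum(per_angle.values()), n_distances, per_polarisation)
```

The totals per angle add up to the overall number of instantaneous values, and the uniform distribution over distance implies six distinct distance values.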
For the comparison between the two models, LR in (2) and GAM in (3), the values of specific components and the obtained significance level ($\alpha$) are shown in Tab. 3. The formulation of the model via LR is then given by:
$$\hat{f}(X) = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \cdots + \hat{\beta}_P X_P \tag{10}$$
where $\hat{\beta}_P$ is an estimate of the real $\beta_P$ assumed in (2). Similarly, the model obtained via GAM is given by:
$$\hat{f}(X) = \hat{\beta}_0 + \hat{f}_1(X_1) + \hat{f}_2(X_2) + \cdots + \hat{f}_P(X_P) \tag{11}$$
where $\hat{f}_P(X_P)$ is an estimate of the real $f_P(X_P)$ assumed in (3). Fig. 2 presents plots of the specific components of LR ($\hat{\beta}_P X_P$), which correspond to $\hat{f}_P(X_P)$ for GAM. One can notice that for the antenna angle variable there is a non-linear function, its shape resulting from the smooth spline functions [37]; it can be approximated by the sine of the half angle (this estimation is shown in Tab. 3). For the other variables, there are linear functions, for both LR and GAM. The shape of the given functions was confirmed by the resampling method (cross-validation), which (on the basis of many resampled data sets) estimates the degree of non-linearity for a particular function $\hat{f}_P(X_P)$ [39].
On the basis of Tab. 4, which contains the fitting comparison of LR and GAM, one can conclude, based on $\mu_e = 0$, that neither model is biased (they do not contain systematic errors). Because the antenna angle variable is modelled by a non-linear function (see Fig. 2), $\sigma_e$ is lower for GAM (5.52 dB) than for LR (6.90 dB). This also results in a higher $R^2$ (and accordingly a higher $R^2_{adj}$) for GAM than for LR: 0.615 (0.612) and 0.396 (0.392), respectively. Considering that for a positive verification of the model the adjusted coefficient of determination should be greater than 0.6, only GAM seems to estimate the mean path loss values with the required accuracy. From the analysis of Fig. 2d, which depicts the relationship between antenna orientation angle and mean path loss, one can conclude that LR does not map the mean path loss properly, which influences the performance of the model. For the remaining variables there are no significant differences between the models.
The performance of the model can also be visualised by scatter plots, where the measured mean path loss is on the abscissa and the mean path loss predicted by the models is on the ordinate, Fig. 3. For a perfectly predicting model, the points on the scatter plot should be concentrated on the line with a 45° slope. It can be proved that the slope of the regression line between measured and predicted points is equal to $R^2$ [5]. The analysis of scatter plots (for a fixed value of the measured path loss) allows for the comparison of the spread of estimated points for the proposed model. For example, if one compares the spread of points for a measured path loss equal to 60 dB, it can be seen that GAM has better performance (the predicted points are more concentrated). Therefore, one can conclude that for GAM there is a lower dispersion of the predicted values than for LR, which emphasises the advantage of GAM over LR.
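The statement about the slope of the regression line can be checked numerically. The Python sketch below does so for an ordinary least-squares fit with an intercept, for which the equality holds exactly; for other estimators it may hold only approximately. The synthetic data and coefficient values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Hypothetical predictors and "measured" response (illustrative values only).
x = rng.uniform(0.0, 10.0, size=(n, 2))
y = 55.0 + 1.5 * x[:, 0] - 0.8 * x[:, 1] + rng.normal(0.0, 3.0, n)

# Least-squares linear model with intercept, and its predictions.
A = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

# Coefficient of determination of the fit.
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Regression line of predicted (ordinate) on measured (abscissa) values.
slope, intercept = np.polyfit(y, y_hat, 1)

print(r2, slope)
```

The two printed numbers agree up to numerical precision, so a scatter plot whose regression line is flatter than 45° directly reflects a coefficient of determination below 1.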

V. MEAN PATH LOSS MODEL FOR OFF-BODY CHANNELS
After the comparison of the LR and GAM approaches, and the analysis of their fit to the empirical data, the choice of the final model has been made.
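It may be formulated as the sum of a constant term and four components, related to distance, antenna height difference, antenna orientation angle and polarisation:
$$L_p(d, h, \varphi, P) = L_0 + f_d(d) + f_h(h) + f_\varphi(\varphi) + P \tag{12}$$
where the constant term, denoted here by $L_0$, is the mean path loss for the reference scenario, in which the user is at the reference distance of 1 m, the Rx and Tx antennas are at the same height, the on-body antenna is facing the fixed one, and both have the same polarisation.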
The path length dependent component is a log-linear function of the distance (expressed in m) between the user and the off-body antenna, in relation to the reference distance of 1 m:
$$f_d(d) = 10 \cdot 1.269 \, \log_{10}\left( \frac{d}{1\,\mathrm{m}} \right) \tag{14}$$
As one can see, the path loss exponent equals 1.269, which is expected in an indoor environment like the investigated one. The negative slope of the $f_h$ function means that the higher the difference between Tx and Rx antenna heights (expressed in m), the lower the mean path loss. This may be caused by the fact that for $h = 0$ the on-body antenna is placed on the torso, which has the biggest size in comparison with the wrist or the head, resulting in a higher impact of body shadowing. The model component related to the antenna orientation angle (expressed in degrees) is a sine function of half of this angle, with an amplitude equal to 12.57:
$$f_\varphi(\varphi) = 12.57 \sin\left( \frac{\varphi}{2} \right) \tag{16}$$
which means that the maximum attenuation occurs for $\varphi = 180°$. This should be expected, since for this angle the on-body antenna points in the opposite direction to the off-body one, and the received signal is a combination of many multi-path components without the direct one (except a small contribution of the creeping wave component). According to expectations, when the polarisations of the Tx and Rx antennas are the same (CP channel), there is no additional loss related to the polarisation mismatch. The opposite situation occurs for the XP channel, in which the two polarisations are orthogonal, resulting in $P = 7.75$ dB, as expressed by:
$$P\,[\mathrm{dB}] = \begin{cases} 0.00 & \text{for CP} \\ 7.75 & \text{for XP} \end{cases} \tag{17}$$
Fig. 4 shows a comparison between measured (red cross label) and predicted (green circle label) mean path loss values as a function of distance for U3, $\varphi = 45°$, $h = 0.1$ m and the XP channel. In addition, a 95% prediction interval for GAM (which is the interval that covers the true value of the path loss for 95% of cases [37], [38]) is presented with the use of a dashed line in order to show the accuracy of the prediction.
As one can see, the model fits the empirical data well and all estimated mean path loss values are within the prediction interval.
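The structure of the final model can be summarised by a short Python sketch. Only the path loss exponent (1.269), the amplitude of the angular component (12.57) and the cross-polarisation loss (7.75 dB) are taken from the text. The function name, the reference path loss `ref_loss_db` and the height slope `height_slope_db_per_m` are hypothetical parameters that must be supplied by the user, and the height component is assumed to be linear through the origin.

```python
import math

PATH_LOSS_EXPONENT = 1.269    # from the text
ANGLE_AMPLITUDE_DB = 12.57    # from the text
XP_LOSS_DB = 7.75             # from the text

def mean_path_loss_db(d_m, h_m, phi_deg, cross_polarised,
                      ref_loss_db, height_slope_db_per_m):
    """Sum of the reference loss and the four model components, in dB.

    ref_loss_db and height_slope_db_per_m are NOT given here and must be
    provided by the caller; the linear form of the height term is an assumption.
    """
    f_d = 10.0 * PATH_LOSS_EXPONENT * math.log10(d_m / 1.0)   # reference distance 1 m
    f_h = height_slope_db_per_m * h_m                          # negative slope expected
    f_phi = ANGLE_AMPLITUDE_DB * math.sin(math.radians(phi_deg) / 2.0)
    p = XP_LOSS_DB if cross_polarised else 0.0
    return ref_loss_db + f_d + f_h + f_phi + p

# Usage with placeholder values for the two unknown parameters.
REF, SLOPE = 40.0, -5.0       # hypothetical, for illustration only
print(mean_path_loss_db(3.0, 0.1, 45.0, True, REF, SLOPE))
```

In the reference scenario (1 m, equal heights, facing antennas, CP) the function returns the reference loss itself, and turning the user by 180° adds the full amplitude of the angular component.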

VI. CONCLUSION
In this article, the usefulness of the Generalised Additive Model for mean path loss estimation in BANs based on measured data is investigated. To the best of the authors' knowledge, this is the first use of GAM for this kind of application.
Initially, a description of GAM is presented, followed by the measurement campaign, including equipment, environment and scenarios being investigated. The measurements have been performed for a narrow-band indoor off-body network operating at 2.45 GHz, and for four different users.
In the main part of this article, the general mean path loss model is formulated and a comparison between models obtained with the use of LR and GAM is presented. The mean path loss is modelled as a sum of four components depending on path length, antenna orientation angle, absolute difference between transmitting and receiving antenna heights, and relative polarisation of both antennas. All metrics that have been used for the evaluation of the models' fit to the empirical data show better values for GAM. In particular, the root mean square error is 5.52 dB, which is 1.38 dB lower than for LR, and the adjusted coefficient of determination is 0.61 for GAM, which is 0.22 higher than for LR.
GAM proves to be a better method, because of the non-linear relationship between the mean path loss and the antenna orientation angle. In such situations, when one or more independent variables have a non-linear nature, GAM allows for mean path loss prediction with a higher accuracy in comparison with LR.
Future work will focus on exploring more variables with non-linear relationships with the dependent variable, performing measurement campaigns in different environments (both indoor and outdoor) and introducing dynamic scenarios (i.e., user's movement).
MICHAŁ LASKOWSKI was born in Dubai, UAE, in 1999. He is currently pursuing the bachelor's degree in data engineering with the Gdańsk University of Technology (GUT), with a specialization in creating big data solutions. As a Polish Delegate, he participated in the World Youth Forum in Egypt, in 2019. His current research interests include statistical learning, with a particular interest in additive models. He was a recipient of a scholarship from the University of California, Berkeley, USA, for TrepCamp Entrepreneurial Simulator. In addition, he was awarded the Rector's Scholarship for the Best Students and the City of Gdynia Scholarship for Academic Excellence.
[...] Professor since 2013, and has been an Associate Professor since 2020. He has authored or coauthored many publications including books, book chapters, articles, reports, and papers presented during international and domestic conferences. He participated and still participates in several projects related to special applications of wireless techniques as well as two COST Actions such as IC1004 and CA15104. His current main scope of research is radio channel modeling in body area networks. His research interests include wireless communication and radio wave propagation. He was a recipient of the Young Scientists Awards from the URSI in 2011 and 2016, and many domestic awards. He is a Senior Member of the URSI and a member of the Gdańsk Scientific Society. He is also a member of the Board of the Working Group on Propagation of the European Association on Antennas and Propagation (EurAAP) and the Vice-Chair of Commission-F of the Polish National Committee of the URSI. He was a Management Committee Substitute Member of the COST CA15104 Action and the Co-Chair of the Sub Working Group Internet-of-Things for Health within this action.
LUÍS M. CORREIA (Senior Member, IEEE) was born in Portugal, in 1958. He received the Ph.D. degree in electrical and computer engineering from the IST, University of Lisbon, in 1991, where he is currently a Professor in telecommunications, with his work focused on wireless and mobile communications in the areas of propagation, channel characterization, radio networks, and traffic and its applications, with the research activities developed with the INESC-ID Institute. He has acted as a consultant for the Portuguese telecommunications operators and regulator, besides other public and private entities, and has been serving on the board of directors for a telecommunications company. Besides being responsible for research projects at the national level, he has participated in 32 projects within European frameworks, having coordinated six and taken leadership responsibilities at various levels in many others. He has supervised more than 200 M.Sc./Ph.D. degree students, edited six books, contributed to European strategic documents, and authored more than 500 papers in international and national journals and conferences, for which he also served as a reviewer, an editor, and a board member. Internationally, he was a part of 37 Ph.D. juries, and 68 research projects and institution's evaluation committees for funding agencies in 12 countries, and the European COST and Commission. He has been the chairman of conferences, the technical programme committee, and the steering committee of various major conferences, besides other several duties. He was a National Delegate of the COST Domain Committee on the ICT. He was active in the European NetWorks Platform, by being an Elected Member of its Expert Advisory Group and of its Steering Board, and the Chairman of its Working Group on Applications, and was also elected to the European 5G PPP Association. He has launched and served as the Chairman for the IEEE Communications Society Portugal Chapter.
[...] Researcher of a number of significant projects in different areas, including permanent magnet thrusters and ship intelligence (equipment health monitoring). His main focus of research is currently statistical (machine) learning. In addition, his research interests include digital signal processing (statistical signal processing, Hilbert transform, and time-frequency spectral analysis) and telecommunications (digital modulations and information theory).