Assessment of Treatment Influence in Mobile Network Coverage on Board High-Speed Trains

In this paper, we investigate the systematic change in the signal quality of mobile telecommunication systems inside a high-speed train. Using commercial mobile phones, we perform measurements inside a carriage of a high-speed train with an installed repeater and inside a carriage of a different high-speed train equipped with low radio frequency attenuation windows. We carry out additional reference measurements inside regular carriages of a high-speed train. We use the adjusted Harrell–Davis method and the nonparametric bootstrap to test multiple quantiles of samples from the key performance indicators produced by the commercial mobile phones. In addition, we propose a common framework to standardize future comparisons in such studies. The results confirm considerable improvements in signal quality resulting from both the repeater and low RF attenuation windows.

Some of the significant issues on the physical layer of the protocol stack include small-scale fading, large-scale fading, and the time and frequency selectivity [5]-[7]. On the higher layers of the protocol stack, the so-called signaling storm is caused by the large number of mobile users that simultaneously access the network, which overloads the available signaling resources [5]. The diverse nature of those challenges impedes mobile network providers from delivering high-quality services to consumers while on board HSTs. As a result, this motivates researchers to conduct measurement campaigns in an attempt to address those challenges. Evaluations performed in [8] report degradations in the form of packet drops, network disconnections, and Round Trip Time (RTT) at velocities ranging from 200 to 300 km/h. A similar study in [9] has found sufficient throughput up to 100 km/h; however, service quality degraded significantly at velocities of 300 km/h and beyond.
Commercially available solutions to improve the network on board HSTs vary in applicability, achievable gains, and complexity. We first distinguish between active and passive network improvements. Among the active improvements, there are the amplify-and-forward repeaters [10], which are transparent to the protocols, and the decode-and-forward repeaters [11]. Passive improvements constitute structural changes that reduce signal penetration loss [12], [13]. The work in [14] attempted to optimize handover. The studies in [15] and [16] propose an improved resource-scheduling scheme to reduce Quality of Service (QoS) problems. These studies evaluate the proposed scheme under static-, low-, and high-mobility contexts. In terms of statistical evaluation, the majority of the previously mentioned analyses have relied solely on descriptive statistics of the KPIs. An exception is the study in [17], where the authors suggest using nonparametric hypothesis testing and then divide the traveled trajectory into kilometer-long segments in order to evaluate the collected KPIs.

A. CHALLENGES
The KPIs in mobile networks vary at different time-space scales, and this poses the first challenge in KPI evaluation. The variations in a KPI due to a specific modification in a mobile network do not occur in isolation, but result from a combination of many different factors. Some of these factors are due to variations in space and time of the propagation channel, in the traffic load in the mobile network, and in the measurement sample [18], [19]. Secondly, the characteristics of the implemented modification need to be evaluated before the KPIs of interest are selected. KPIs differ from one another in terms of the underlying statistical distribution and data type. This becomes a challenge for researchers when the KPIs need to be compared among diverse studies. Thirdly, the length of the trajectory, the velocity of the vehicle, and the KPI's recording rate produce large data sets with large variations in KPI sample sizes. Some KPIs have sample sizes in the order of tens of thousands, whereas others have just a few hundred, which makes statistical comparisons difficult. Furthermore, the mobile networks contain several Radio Access Technologies (RATs) that have discontinuities along the sampled trajectories [20].
For operational reasons, we cannot control the factors mentioned above, and any statistical inference concerning a KPI needs to be robust against these challenges. For this reason, we do not assume any specific form of KPI distribution. This motivates the use of nonparametric methods [21] to detect and infer changes in the KPIs of mobile communication networks under operational constraints and high mobility. In addition, the lack of standard terminologies among studies that focus on detecting change in mobile networks is another constraint. This undermines the comparability of results from the current literature and those from new studies. These conditions make it difficult for researchers to develop a protocol-independent method that can detect changes in the QoS of the mobile network as perceived by the user.

B. CURRENT WORK
We rely on inferential statistics as it enables comparing probability distributions of the underlying data generating processes.
In addition, we introduce terminology to standardize comparisons among similar studies. Likewise, the present contribution proposes an empirical KPI assessment methodology based on distribution-free hypothesis testing combined with the bootstrap [22], [23]. More specifically, the proposed scheme uses the adjusted Harrell-Davis (HD) estimator [24] developed by Wilcox [25] which can compare multiple quantiles of two distributions simultaneously. This method helps us to determine more accurately whether the distributions differ and how they differ from each other.
In the experiment, the unmodified HST carriage is called the placebo carriage. In an experimental setup, KPI samples are collected from a modified carriage and from a placebo carriage. We performed measurements in two experimental setups: repeater and low Radio Frequency (RF) attenuation windows. We compared the KPI distribution under the influence of a repeater and the KPI distribution under the influence of low RF attenuation windows with that of the placebo. The statistical analysis indicates that signal quality significantly improved.
This paper is organized as follows: Sec. II defines the proposed terminology, formulates the problem statement, and describes the collected data. Section III discusses our method for analyzing the data, whereas Sec. IV describes two case studies where the proposed method is applied. The results are discussed in Sec. IV-F, and we conclude with some remarks.

A. DEFINITIONS AND TERMINOLOGY
The following definitions characterize various measurement contexts in a mobile network. Figure 1 illustrates two different contexts. Fig. 1(a) shows the controlled condition where there is no mobility. Under the controlled condition, it is possible to isolate variables to a large degree. The uncontrolled condition is illustrated in Fig. 1(b). In this scenario, the mobile network infrastructure is static and the users are mobile. In context (b), the variable isolation is limited and the samples display non-stationary statistical properties.
We further define the velocities of the transceivers $v(p_1)$ and $v(p_2)$ and the relative velocity as $v_r = v(p_1) - v(p_2)$. When the transceivers are fixed, $v_r = 0$; we call this the static condition, which is identical to the controlled condition in Fig. 1(a).
The non-static condition depicts the case in which either $p_1$ or $p_2$ moves along the traveled trajectory and $v_r \neq 0$, as illustrated in Fig. 1(b). In terms of space, we define $V$ as a volume equivalent to an indoor environment. When $V$ is large, it can be further divided into spatial sections to improve the representation of the context under evaluation. Figure 2 illustrates the modifications done on the mobile networks or on a vehicle similar to those shown in Fig. 1. Any systematic modification is called a treatment and is denoted as $Q$. The context without systematic modification is called placebo $q$.
The effect of a treatment is called treatment effect and its magnitude is the effect size. Assessing a treatment effect requires comparing it with at least one of the following: 1) reference values, 2) absence of treatment or placebo $q$, 3) an alternative treatment, e.g., $Q_1$. In addition, we use $y$ to indicate the observations from a treatment and $x$ to denote placebo-related observations. The treatments, samples, and the comparison with reference values are shown in Fig. 2(d).
A treatment without active elements is called passive treatment. The effect of such treatment is related to the changes in the characteristics of volume $V$, such as building materials and internal and external geometries [26], [27]. The treatment effect, which in this case is the Vehicular Penetration Loss (VPL), is estimated by the signal power levels $P_{\mathrm{inside}}$ and $P_{\mathrm{outside}}$ [28]. In controlled conditions, we control the transmitted signal power $P_{TX}$ at the transmitting antenna. Thus, we get a good estimate of $P_{TX}$, namely $\hat{P}_{TX}$, instead of using the estimated $\hat{P}_{\mathrm{outside}}$ [29]. Since the volume's walls represent the treatment $Q$, we can estimate the treatment effect by
$$\hat{L}_Q = -10 \log_{10} \frac{\hat{P}_{\mathrm{inside}}}{\hat{P}_{TX}} \ \text{(dB)} \qquad (1)$$
where $\hat{L}_Q$ represents the transmission loss $L_Q$ due to a passive treatment.
A treatment composed of active systems that superimposes power gain on the signal levels is called active treatment. In the controlled condition, we assess the treatment effect via the signal power levels and express it as a gain $\hat{G}_Q$,
$$\hat{G}_Q = 10 \log_{10} \frac{\hat{P}_Q}{\hat{P}_{TX}} \ \text{(dB)}.$$
The measure $L_Q$ in (1) is similar to the signal transmission losses in buildings [30] or vehicles [13], [31]. Both $L_Q$ and $G_Q$ are compared to referential values, e.g., measurements inside an anechoic chamber [13]. The variables $\hat{L}$ and $\hat{P}$ are empirically derived from measuring power levels via averaging or a computation of a given quantile. In this study, the passive treatment is a Frequency Selective Surface (FSS), and the active treatment is a repeater.
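The loss in (1) and the gain defined after it are simple ratios on a logarithmic scale. The following minimal sketch evaluates both; the function names and the power values in milliwatts are illustrative assumptions, not measurements from this study.

```python
import math


def transmission_loss_db(p_inside_mw, p_tx_mw):
    """Loss of a passive treatment in dB, following the form of (1)."""
    return -10.0 * math.log10(p_inside_mw / p_tx_mw)


def gain_db(p_q_mw, p_tx_mw):
    """Gain of an active treatment in dB (same ratio, opposite sign)."""
    return 10.0 * math.log10(p_q_mw / p_tx_mw)


# Illustrative (assumed) power estimates in mW.
loss = transmission_loss_db(p_inside_mw=0.01, p_tx_mw=1.0)
gain = gain_db(p_q_mw=10.0, p_tx_mw=1.0)
print(f"loss = {loss:.1f} dB, gain = {gain:.1f} dB")
```

In practice the power estimates would come from averaging or from a chosen quantile of the measured power levels, as stated above.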
Measurements performed via commercial mobile phones allow mobile network operators to assess the KPIs of interest. The number of observations available for a given KPI depends on the recording rate $r$ and on the duration of the measurement $d$. The recording rate is the number of KPI points recorded per second. The product $r \times d$ represents the expected observations from a measurement. We refer to the observed data as recorded observations. The reliable observations result from data preprocessing. In measurements performed under uncontrolled conditions, given a KPI and its recording rate, we define spatiotemporal granularity as the number of KPI points spread along the traveled trajectory. A mobile phone is said to be in a connected state when it performs a task that actively generates user data, e.g., a download or a phone call. Further, a mobile phone is said to be in an idle state when it is registered and connected to the mobile network without any active user data transfer.
Figure 1(a) shows the controlled condition, in which the mobility state is controlled by keeping the distance $p_1 - p_2$ constant, so that there is no relative velocity. The signals can be controlled through transmitted power levels, angle of incidence, and polarization of the RF signal. Meanwhile, the propagation conditions can be predesigned and controlled through reflective and dispersive behavior. As a result, we can fully control the distribution of the received signal by limiting the physical conditions in the experiment. This ultimately controls the statistical properties of the observed KPIs. Then, we compare with the KPI probability distributions inferred for the no-treatment scenario to assess the treatment effect. Figure 1(b) depicts a context that is characterized by non-static and uncontrolled conditions, which constitute a real-world operational situation of transceivers inside HSTs.
In this operational context, the users on board HSTs move along trajectories $s(x, y, z)$ while being served by the mobile network. We assume that the Base Station (BS) is at a fixed position $p_1(x, y, z)$, and the user is at position $p_2(x, y, z)$ on board the moving HST. The relative instantaneous velocity $v_r = v_{p_2}(x, y, z)$ along the trajectory contains stops, accelerations, and intervals of steady velocity.

B. THE NEED FOR A NONPARAMETRIC APPROACH
The velocity of HSTs causes fast and frequent transitions in propagation conditions, that is, hilly and flat terrains, combinations of high and low buildings, and different mobile network deployments. In railway tracks with dedicated mobile network BSs, the distance between HSTs is within a few hundred meters; thus, large blockages cause rapid transitions in the Line of Sight (LOS), strong multipath components, and Non Line of Sight (NLOS). The conditional distribution of the signal received by MSs inside HSTs changes under LOS and NLOS, respectively. The position of the BSs relative to the HSTs is considered to be random as there is no dedicated deployment for railway coverage. Hence, multipath and LOS propagation conditions are dominant in urban and rural geographical regions, respectively.
The velocity of the HSTs causes fast transitions in the propagation condition, and it can be reasonably assumed that the received signal is characterized by a mixed distribution. We assume that a shorter travelled trajectory would decrease the number of such combinations. The treatment effect is then superimposed on effects from the propagation conditions. In such an experimental setting, the resulting data do not have an easily identifiable or tractable distribution. Hence, we rely on a nonparametric approach and consider the distributions as unknown.
Mobile network resources are not uniformly distributed in space. That is, the available resources vary along the travelled trajectory. This variation is primarily due to the differences in the characteristics of the network deployment between dense urban, suburban, and rural regions. In this case, the spatial variation in radio resources results in variations in the QoS. The user perceives this variation through conditions such as No Service (NoS), call drops, insufficient data throughput [19], or frequent Radio Access Technology (RAT) handover [20]. Network providers have little or no control over the variables being measured when the KPIs are being assessed under operational conditions. In addition, accessing some KPIs requires a special setup such as that in [32], which is not feasible in HSTs.
Mobile operators have full control of their mobile networks and can collect KPI samples from both the infrastructure and the user side. The combined analysis of both sides is of strategic interest for a mobile operator. Therefore, the mobile operator wishes not to disclose KPI observations at the infrastructure side. Since we aim to assess the treatments regardless of any specific mobile operator, we are then forced to base the assessment solely on KPI samples at the user's side. In this way, Fig. 2(a) and (c) represent the user's side. The mobile network side, e.g., BSs, is represented by Fig. 2(b). In this study, we compare the KPI distribution under the treatment context with that under the no-treatment context, given that both distributions are unknown. Also, the distribution of $\Delta x_Q$ is unknown. In the following paragraphs, we discuss some possible changes in the KPI distributions resulting from treatment $Q$.

a) Treatment effect $\Delta x_Q$ has no statistical significance. At a given level of significance, we cannot distinguish between the two KPIs that are with and without treatments.
b) Treatment effect $\Delta x_Q$ changes the mean of $x_q$. Here, we infer that the KPI distribution of $y_Q$ is related to the KPI distribution of $x_q$ by a deterministic shift.
c) Treatment effect $\Delta x_Q$ impacts the variance of $x_q$, i.e., $\hat{\sigma}^2_{y_Q} \neq \hat{\sigma}^2_{x_q}$. Consequently, the distributions of $y_Q$ and $x_q$ differ.
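Cases b) and c) above can be told apart with two summary statistics. The sketch below uses purely synthetic values (not data from this study) to show that a deterministic shift moves the mean and leaves the variance untouched, whereas a change in spread alters the variance while keeping the mean. The shift of 5 dB and the scale factor of 2 are arbitrary assumptions.

```python
import statistics

# Synthetic placebo KPI values in dBm (illustrative only).
placebo = [float(v) for v in range(-100, -80)]

# Case b): deterministic shift of the whole distribution.
shifted = [v + 5.0 for v in placebo]

# Case c): same centre, doubled spread around the mean.
center = statistics.mean(placebo)
spread = [center + 2.0 * (v - center) for v in placebo]

mean_change = statistics.mean(shifted) - statistics.mean(placebo)
var_ratio_shift = statistics.variance(shifted) / statistics.variance(placebo)
var_ratio_spread = statistics.variance(spread) / statistics.variance(placebo)

print(mean_change, var_ratio_shift, var_ratio_spread)
```

Real KPI samples rarely follow either pure case, which is one reason to compare several quantiles rather than a single location or scale statistic.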
C. DATA DESCRIPTION
Figure 3 illustrates specific attributes of the KPI samples considered in this work: namely, the mobile network operator, radio access technology, frequency band, and physical channel. We analyse KPI distributions for specific choices of the attributes. The required filtering according to Fig. 3 is parametrized in the measurement setup and performed during data preprocessing. In this manner, we obtain a data set for one chosen set of attributes, which is then used to infer the treatment effect. The data set used in this work is characterized by one network operator, the RAT 3G, frequency band 1, and the downlink direction, cf. Fig. 3.
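The attribute filtering described above can be sketched as a simple selection over measurement records. The record layout, field names, operator labels, and values below are hypothetical and serve only to illustrate the idea of fixing one set of attributes before any statistical evaluation.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical measurement record; field names are assumptions."""
    operator: str
    rat: str
    band: int
    direction: str
    rscp_dbm: float


def select_kpi(records, operator, rat, band, direction):
    """Keep the KPI values of records matching one set of attributes."""
    wanted = (operator, rat, band, direction)
    return [
        r.rscp_dbm
        for r in records
        if (r.operator, r.rat, r.band, r.direction) == wanted
    ]


# Illustrative records only.
records = [
    Record("A", "3G", 1, "DL", -85.0),
    Record("A", "4G", 3, "DL", -90.0),
    Record("B", "3G", 1, "DL", -80.0),
    Record("A", "3G", 1, "DL", -92.5),
]
kpi = select_kpi(records, operator="A", rat="3G", band=1, direction="DL")
print(kpi)
```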
The mobile phone, MS, is a system that recovers information from the application layer contained in g(t, ω, s), e.g., voice, text files, pictures. In order to accomplish this task, the BS and MS exchange a large amount of information and many KPIs are computed from both sides. The data array D in (3) is a metadata file that contains records of the computed KPIs and mobile network information, e.g., system information, user information.
We represent the data array from the contexts with and without treatment by $D_Q$ and $D_q$, respectively. A representation of the treatment effect through the data array relates $D_Q$ to $D_q$ via $\Delta D_Q$, the treatment effect in the data array. The rows in (3) list $A, B, \ldots, E$ as measurement events. Both the measurement events and their content depend on various factors, such as the type of task performed on the application layer, e.g., file transfer, voice call, and position. Other factors are events from the mobile network during the measurement, e.g., absence of signal and RAT coverage. Under optimum conditions, the measurement events are recorded at a fixed interval. The outcome of a KPI is an element of a measurement event and appears in the same column of (3), for instance, a sample from a KPI ($a_{i2}$) in the measurement event $A$. Here, the samples of a selected KPI from the contexts without and with treatment are $x_q$ and $y_Q$, respectively. Using (4), we write the expression of the KPI under the treatment in terms of $x_q$ and $\Delta x_Q$, where $\Delta x_Q$ is the treatment effect observed via a selected KPI $x$.
As the duration of the measurement increases, so does the length of the KPI vector in (6). A mobile phone connects only to one RAT at a time, and the distribution of the RAT is not necessarily continuous along a travelled trajectory. Given a RAT dependent KPI, the sample size of this KPI decreases by the proportion of the RAT available along the traveled trajectory, as depicted in Fig. 8. Computing a KPI depends on the parameters measured by the MS and on the values reported by the BS. However, mobile signal coverage gaps along the traveled trajectory are not uncommon and result in missing reports from the BS at many positions, which reduces the KPI sample size. The samples from the vector in (6) and (7) contain information on the temporal and spatial dynamics of the stochastic process.
In addition, train stations have a dedicated cellular network that has better QoS than that along the railway. The trains that approach or stop at a train station are at least partially served by this dedicated cellular network. In this case, the spatial variation of the signal g(t, ω, s) is considerably smaller than the variation inside an HST that is moving steadily away from a train station. Both signal levels and service quality are better under these conditions. Consequently, a portion of the KPI samples in x and y contain only temporal variations, which results in better signal and service levels.

III. DATA ANALYSIS
For the statistical evaluation, we use a nonparametric statistical approach based on multiple hypotheses testing combined with the bootstrap to test for a treatment effect and to estimate the effect size. This approach assesses whether the treatment $Q$ results in a different KPI distribution based on the collected samples.
In order to design the data evaluation approach, we assume the following:
A1 The variations in the observed KPIs are considered random, independent, and identically distributed within a given geographical region and a single travel direction.
A2 The KPI probability distribution only depends on treatment $Q$. Other factors can be neglected.
A3 The KPI distribution is unknown.
To satisfy assumption A1, the travel trajectory is divided systematically into geographical regions. This results in urban regions $S_{u_1}, \ldots, S_{u_{k_u}}$ and rural regions $S_{r_1}, \ldots, S_{r_{k_r}}$. The numbers of urban regions $k_u$ and rural regions $k_r$ depend on the length of the trajectory $s(x, y)$, among other features. Selecting a geographical region results in a pair of data sets $D_Q$ and $D_q$, which contain samples $\hat{y}_Q$ under treatment $Q$ and samples $\hat{x}_q$ under treatment $q$ (placebo).

A. ESTIMATION OF TREATMENT EFFECT
The adjusted HD method is a nonparametric approach for estimating the difference between sample quantiles of independent groups [24], [25]. This method uses the HD estimate [24] to compute the $p$th sample quantile $F^{-1}(p) = \theta_p$. Then, the difference between quantiles is computed via multiple comparisons.
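To make the HD estimate concrete, the following minimal sketch computes it as a weighted sum of order statistics, with weights given by increments of a Beta((n+1)p, (n+1)(1-p)) distribution function. It is not the implementation used in this study: the Beta increments are obtained here by plain Simpson integration of the density to keep the sketch free of dependencies (an assumption of this example; a library routine for the regularized incomplete beta function would normally be used), and the sketch is restricted to cases where both Beta parameters exceed 1.

```python
import math


def beta_pdf(t, a, b):
    """Beta(a, b) density; zero at the endpoints, which holds for a, b > 1."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))


def simpson(f, lo, hi, steps=200):
    """Composite Simpson rule on [lo, hi] with an even number of steps."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def hd_quantile(sample, p):
    """Harrell-Davis estimate of the p-th quantile (sketch, needs a, b > 1)."""
    x = sorted(sample)
    n = len(x)
    a, b = (n + 1) * p, (n + 1) * (1 - p)
    if a <= 1 or b <= 1:
        raise ValueError("sketch only handles (n+1)p > 1 and (n+1)(1-p) > 1")
    # Weight of the i-th order statistic: Beta mass on ((i-1)/n, i/n).
    weights = [
        simpson(lambda t: beta_pdf(t, a, b), (i - 1) / n, i / n)
        for i in range(1, n + 1)
    ]
    # Normalize to absorb the small numerical integration error.
    return sum(w * v for w, v in zip(weights, x)) / sum(weights)


print(hd_quantile([1, 2, 3, 4, 5], 0.5))
```

Because every order statistic contributes to the estimate, the HD quantile is smoother than a single order statistic, which is helpful when quantiles are compared across bootstrap replicates.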
Let a pair of quantile estimates from the samples without and with treatment be $\hat{\theta}_{pq}$ and $\hat{\theta}_{pQ}$, respectively. We estimate the difference between those quantiles in (9). The null hypothesis of no treatment effect in (10) is $H_0: \theta_{pq} = \theta_{pQ}$, where $\theta_{pq}$ and $\theta_{pQ}$ are the true quantiles of the distributions without and with treatment, respectively. The function qcomhd in the R package WRS2 [33] tests whether the corresponding single or multiple quantiles of two distributions are the same. That is, it computes (9), tests (10), and provides the $1 - \alpha$ confidence interval.
In order to specify the sample size for this estimation, we define a time window of 150 seconds in the measurement data. The number of samples of the desired KPI inside this time window defines the sample size. The time window corresponds to the duration of the longest preprogrammed task and maintains partial agreement with recommendations in [34]. This results in 300 and 450 scores for the recording rates of 2 and 3 samples/sec, respectively. When the available number of scores in a KPI vector is smaller than 300, the entire sample is used. In order to perform the test, the function qcomhd produces bootstrap replicates with size equal to the size of the input data. For example, given input data of 100 scores, the function produces 2000 bootstrap replicates with size 100.
Our study also includes cases where the size of the input data is as large as 5000, and we exploit this large sample by extracting subsamples with smaller size. For this reason, we adjust the package to produce bootstrap replicates with prespecified sample size as the input in the function qcomhd.
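The sample-size rule and the modified resampling can be sketched as follows. This is not the adjusted R code; it is an illustrative stand-in in which the function names, the seed, and the placeholder scores are assumptions. It only shows how a 150-second window translates into a replicate size and how replicates of a prespecified size are drawn with replacement from a larger input.

```python
import random


def window_size(rate_per_s, window_s=150):
    """Number of scores inside the time window for a given recording rate."""
    return rate_per_s * window_s


def bootstrap_replicates(sample, n_boot, size=None, seed=0):
    """Draw n_boot resamples with replacement.

    If size is None, each replicate has the size of the input, which is
    the default behaviour described above; otherwise the prespecified
    size is used.
    """
    rng = random.Random(seed)
    size = len(sample) if size is None else size
    return [rng.choices(sample, k=size) for _ in range(n_boot)]


# Placeholder input of 5000 scores (illustrative values only).
scores = list(range(5000))
reps = bootstrap_replicates(scores, n_boot=2000, size=window_size(2))
print(len(reps), len(reps[0]))
```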

B. STATISTICAL SIGNIFICANCE
The testing design in [25] is based on null hypothesis significance testing [35]. A Type I error occurs when the null hypothesis is true but is wrongfully rejected. The prespecified statistical significance, $\alpha = 0.05$, represents the acceptable probability of committing a Type I error. In [25], the empirical p-value, that is, the empirical probability of a Type I error or $\alpha$, is defined as $\hat{p}^*$ in (11), where $A$ is the number of times that $H_0$ is rejected, $C$ is the number of times that $H_0$ is not rejected, and $B$ is the number of bootstrap replicates. Using (11), we further compute the p-value as $2 \min(\hat{p}^*, 1 - \hat{p}^*)$. The HD method applies multiple comparisons, and the overall significance level is controlled via the classical Bonferroni method [36], in which the prespecified statistical significance is adjusted as $\alpha_n = \alpha / k$, where $k$ is the number of tested quantiles. The null hypothesis is rejected if, and only if, at least one p-value satisfies $p\text{-value}_n < \alpha_n$.
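The decision rule above can be sketched in a few lines. The form of $\hat{p}^*$ used here, $(A + 0.5\,C)/B$ with $A$ and $C$ counted as the bootstrap differences below zero and equal to zero, is the usual percentile-bootstrap formulation and is an assumption of this sketch rather than a transcription of (11). The input values are illustrative.

```python
def generalized_p_value(boot_diffs):
    """Two-sided percentile-bootstrap p-value, 2 * min(p*, 1 - p*).

    Assumption: p* = (A + 0.5 * C) / B, where A counts bootstrap quantile
    differences below zero and C counts differences equal to zero.
    """
    b = len(boot_diffs)
    a = sum(1 for d in boot_diffs if d < 0)
    c = sum(1 for d in boot_diffs if d == 0)
    p_star = (a + 0.5 * c) / b
    return 2.0 * min(p_star, 1.0 - p_star)


def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0 if at least one p-value is below alpha / k."""
    alpha_n = alpha / len(p_values)
    return any(p < alpha_n for p in p_values)


# Illustrative bootstrap differences for one quantile.
p_val = generalized_p_value([-1, -1, 0, 1, 1, 1, 1, 1, 1, 1])
print(p_val)

# Illustrative p-values for k = 3 tested quantiles.
print(bonferroni_reject([0.2, 0.004, 0.5]))
```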

IV. CASE STUDY
In this case study, we apply the aforementioned nonparametric statistical approach based on multiple hypotheses testing to investigate the advantages of two commercial solutions to improve mobile network services inside HSTs, see Fig. 4.
B1 A passive treatment based on modified windows equipped with a FSS. Fig. 5 depicts a FSS [30] that reduces the VPL of HSTs and hence improves mobile network services inside HSTs.
B2 An active treatment using rooftop antennas and Amplify-and-Forward (AF) repeaters. In Fig. 6, the AF repeater creates a path that bypasses the VPL in HSTs. Due to its external antenna, the repeater also allows users inside HSTs to access mobile network infrastructures near and distant from the railway track.
The Device under Test (DUT) is the entire 210-meter-long HST, which consists of individual carriages of 26.5 meters each. Each treatment is prototypically applied to carriages of two separate railjet trains [37]. The Measurement Entity (ME) shown in Fig. 7 is deployed on board railjet trains. Selecting and applying treatments $Q$ and $q$ to two different carriages yields $V_Q$ and $V_q$. In the following subsections, we further explore those treatments and their respective effects.

A. PASSIVE TREATMENT: FREQUENCY SELECTIVE WINDOWS
Current HSTs are equipped with metal-coated windows to provide thermal and visual comfort to passengers. As a side effect, the metal coating attenuates the wireless signal transmissions of cellular phones and the Global Positioning System (GPS) by up to 40 dB. The untreated HST is fully equipped with lightly coated windows, which are referred to as RW [12]. In general, a FSS consists of a two-dimensional periodic geometric structure composed of metallic and dielectric materials [30], [38]. The geometric structure defines the effective electromagnetic properties, i.e., permittivity, conductivity, and magnetic permeability.
The transmission of electromagnetic waves through a FSS is highly linear over a large dynamic range of wave amplitudes. Its transfer function depends on the frequency, polarization, and angle of incidence of the waves [12], [39], [40]. An engineered FSS functions as a frequency selective absorber, passive repeater [41], filter, or reflector [42]. The passive treatment consists in equipping a section of a HST with a FSS window, named SW, that yields a wide-band bandpass filter within the frequency range 0.7-3.5 GHz [43]. This SW features substantially smaller RF attenuation than a RW does. The Austrian Federal Railways partially upgraded a single carriage of a HST with SWs to evaluate how this modification improves the QoS of the mobile network inside the HST.
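Using (1), we estimate the transmission loss of a HST without treatment by
$$\hat{L}_q = -10 \log_{10} \frac{\hat{P}_q}{\hat{P}_{TX}} \ \text{(dB)} \qquad (14)$$
and the loss of a HST with treatment by
$$\hat{L}_Q = -10 \log_{10} \frac{\hat{P}_Q}{\hat{P}_{TX}} \ \text{(dB)}. \qquad (15)$$
Here, $\hat{P}_q$ and $\hat{P}_Q$ are the received signal power estimates inside the HST.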
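As a worked check of why the transmit power drops out when the two losses above are compared, the subtraction can be sketched as follows. The sign convention $\Delta\hat{L} = \hat{L}_q - \hat{L}_Q$ is an assumption of this sketch; with the opposite convention only the sign of the result changes.

```latex
\begin{align*}
\Delta\hat{L} &= \hat{L}_q - \hat{L}_Q \\
  &= -10 \log_{10} \frac{\hat{P}_q}{\hat{P}_{TX}}
     + 10 \log_{10} \frac{\hat{P}_Q}{\hat{P}_{TX}} \\
  &= 10 \log_{10} \frac{\hat{P}_Q}{\hat{P}_q} \quad \text{(dB)}
\end{align*}
```

Under this convention, a positive $\Delta\hat{L}$ corresponds to a higher received power estimate with treatment than without, and any factor common to both measurements, such as $\hat{P}_{TX}$, does not appear in the result.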
To arrive at an equivalent representation for (8), we subtract (14) and (15) to express the estimated difference in transmission loss, $\Delta\hat{L}$. Taking the difference ensures that the value of $\hat{P}_{TX}$ cancels, as well as the antenna characteristics, cable losses, and many other setup-specific properties. In the context with treatment, the reduced transmission loss increases the received signal power levels. The condition that attributes improvement to the current treatment is therefore a lower transmission loss with treatment than without, which corresponds to a higher received signal power level with treatment. Here, $\hat{P}_q$ and $\hat{P}_Q$ are quantile estimates of the KPI samples and represent the received signal power levels.
... for amplification are selected by digitally programmable filters. The filter transfer functions have steep transitions between the pass- and stopbands, which are associated with increased group delay [44].
The AF repeater is a device with non-linear behavior because there is a maximum output power level due to physical limits of the RF circuit and its power transmission limits. Furthermore, the AF repeater is a Single-Input Single-Output (SISO) device that uses a single antenna. This fundamentally influences the achievable data rates for high-speed mobile network services that employ Multiple Input Multiple Output (MIMO) techniques using multiple antennas at the BS and MS.
The active treatment creates an additional propagation path between the outside and the inside of the HST in parallel to the passive propagation paths through the windows. The performance of this active treatment strongly depends on the optimization of the AF repeater's configuration as well as the scenario under test, e.g. base station deployment along the track [45].
Similarly to Sec. IV-A, the gain under active treatment and the condition that attributes improvement to the current treatment are written in terms of $\hat{P}_q$ and $\hat{P}_Q$, which are quantile estimates of KPI samples and represent the received signal power levels.
In the equations above, the notation $\hat{P}$ is a selected KPI that represents an estimate of the received signal power level.

C. MEASUREMENT SETUP
We carried out measurement campaigns in two railjet trains [37]. The train with a carriage equipped with SW traveled from Vienna to Graz (198 km, 3-hour duration), while the other with a carriage equipped with an AF repeater traveled from Vienna to Salzburg (305 km, 3-hour duration), as detailed in Fig. 8.
The ME in Fig. 7 consists of a group of six commercial off-the-shelf smartphones. For our measurement, we divided the smartphones into three groups of two. The first group was locked to 2G RAT only. The second group was locked to 2G and 3G RATs. The third group had access to 2G, 3G, and 4G. Each mobile was equipped with dedicated software (Anite Nemo Walker Air) [46], which was configured to repeat a fixed sequence of tasks. The tasks consisted of one voice call with a total duration of 120 s, one download, and one upload, both with 10 MB data volume. The tasks emulate user activities in quasi-real usage conditions. In both measurements, one ME was deployed inside section $V_Q$, where treatment $Q$ predominates, and another ME in section $V_q$ under the predominance of treatment $q$. The ME is placed on a passenger seat, as shown in Fig. 7.

D. VARIABLES OF INTEREST
We expect to detect a statistically significant treatment effect in any selected KPI under the direct influence of the treatment. Different KPIs show different sensitivities to the treatments. Table 1 displays the group of KPIs used to assess the treatment effect.
Both the AF repeater and the SW are expected to improve link quality at the physical layer. For this reason, the treatment effect is expected to be associated with the metrics from this layer. Based on this association, three KPIs report signal levels in the downlink: Received Signal Strength Indicator (RSSI), Received Signal Code Power (RSCP), and received Signal-to-Interference Ratio (SIR). In the Universal Mobile Telecommunications System (UMTS), the User Equipment (UE) reports the RSSI and RSCP according to the 3rd Generation Partnership Project (3GPP) standards. The RSSI report measures the total received signal plus interference and noise power within the signal band of interest. It also serves as a rudimentary indicator of the power level of the signal of interest [47] and forms the basis for estimating more specific link quality-related KPIs [48]. The selected RSCP from the primary cell is one such more specific measure of link quality. The SIR indicates the signal quality and is estimated via the Dedicated Physical Control Channel (DPCCH) [47]. It is a metric that contains interference from co-channels and neighbor cells. Therefore, the variation in this KPI depends on the radio resource control and network topology, among others. In MRN contexts, the SIR is affected by the multipath components passing through the vehicle frame [49].

E. DATA PREPROCESSING
The size of the data sets depends on the length of the regions, the average velocity of the train, and the recording rate. Before the statistical evaluation of the KPIs discussed in Sec. IV-C, we filter the data sets with respect to mobile network operator and RAT of interest, cf. Sec. II-C. In addition, we select the geographical regions from the travelled trajectory. The 3G (UMTS) network offers continuous coverage along the travel trajectories and sufficient spatial granularity of the data sample. We select the group of mobile phones that was configured to access 3G preferably and 2G alternatively (see Sec. IV-C) for a single operator. The mobile phones repeatedly execute the tasks described in Sec. IV-C, and hence they remain predominantly in a connected state (only 2.8% in idle state along the route Vienna to Graz). The route Vienna to Salzburg traverses a well-covered flat terrain; thus, the contribution of the samples from an idle state is neglected in the statistical evaluation for both tracks. The stops at train stations are part of the regular train commute, and we have not imposed any special treatment of those.
With support from Eurostat [50], we classify segments of the travelled trajectories Vienna-Graz s_1(x, y) and Vienna-Salzburg s_2(x, y) into regions as described in Sec. III. Figure 8 shows this classification, where each region corresponds to data sets with treatment Q and without treatment q.
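The filtering steps described in this subsection can be sketched as follows. This is a minimal illustration only: the record layout (`Sample` and its field names), the operator labels, and the RSCP values are hypothetical and do not reflect the format of the measurement tool used in the campaign.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sample:
    operator: str    # mobile network operator label (hypothetical)
    rat: str         # radio access technology, e.g. "3G" or "2G"
    state: str       # "connected" or "idle"
    region: str      # "urban" or "rural", from the segment classification
    treated: bool    # True for data set Q (with treatment), False for q
    rscp_dbm: float  # reported RSCP

def preprocess(samples, operator, rat="3G"):
    """Keep connected-state samples of one operator and RAT,
    grouped by (region, treated)."""
    groups = {}
    for s in samples:
        if s.operator != operator or s.rat != rat or s.state != "connected":
            continue
        groups.setdefault((s.region, s.treated), []).append(s.rscp_dbm)
    return groups

# Hypothetical records, not measured data.
samples = [
    Sample("A", "3G", "connected", "urban", True, -70.0),
    Sample("A", "3G", "connected", "urban", False, -84.0),
    Sample("A", "3G", "idle", "urban", False, -90.0),       # dropped: idle state
    Sample("A", "2G", "connected", "rural", False, -95.0),  # dropped: other RAT
    Sample("B", "3G", "connected", "rural", True, -80.0),   # dropped: other operator
    Sample("A", "3G", "connected", "rural", True, -88.0),
]
groups = preprocess(samples, operator="A")
```

Each resulting group corresponds to one region and one treatment condition, which is the unit on which the subsequent statistical evaluation operates.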

F. KPI HISTOGRAMS
Figures 9 and 10 present histograms of KPIs comparing data with and without active treatment in urban and rural regions. In these histograms, we observe that the treatment effect imposes a considerable shift on the lower quantiles and a truncation on the upper quantiles due to the limited output power of the AF repeater. The KPI histograms with treatment indicate left-skewed, upper-bounded distributions that show a large overlap with the histograms without treatment. Figures 11 and 12 show histograms of KPIs comparing data with and without passive treatment in urban and rural regions. Here the histograms show that the treatment effect imposes a moderate shift on the lower quantiles and a small shift on the upper quantiles. The histograms indicate that the distributions with and without passive treatment share a large overlapping region. Tables 2 and 3 summarize the changes in the distributions from both active and passive treatments and attribute statistical significance to these.

G. EVALUATION FOR TREATMENT EFFECT
Traditionally, the median is used to assess changes in measured KPIs from the physical layer of mobile communication systems.
The treatment effect of the active treatment assessed via the median is summarized in Table 2. Its magnitude for the RSCP varies from 12.38 dB to 15.12 dB and from 16.08 dB to 20.02 dB in the urban and rural regions, respectively. The treatment effect of the passive treatment assessed via the median is summarized in Table 2. Its magnitude for the RSCP varies from 1.80 dB to 7.97 dB and from 3.53 dB to 10.67 dB in the urban and rural regions, respectively.
Due to the characteristics of the active and passive treatments, an effect size evaluated solely via the median fails to capture the treatment effects in the lower and upper quantiles. Indeed, the treatment effects of the active and passive treatments are not equal across the sample quantiles, as seen in Figs. 9-12. This behavior is presented in Table 3, which displays the results of multiple hypothesis testing carried out with the HD method.
In the following, we discuss the effect size for the 0.25 and 0.75 quantiles as summarized in Table 3. The column ''Effect size'' gives the 95% confidence interval for the shift in the data due to the treatment.
In the active treatment, the magnitude of the effect size for RSCP varies from 9.93 dB to 14.91 dB and from 15.56 dB to 27.93 dB in the urban and rural regions, respectively.
In the passive treatment, the magnitude of the treatment effect for RSCP varies between 3.53 dB and 14.08 dB for urban and from 5.22 dB to 22.79 dB for rural regions, respectively. The treatment effect in rural regions is more varied, i.e., the samples exhibit a wider range. This is due to the poor signal coverage in those areas. As a result, both treatments lead to larger improvements in those regions.
The values presented in Table 3 show that the treatment effect among the quantiles is unequal. This implies that both treatments change the KPI distribution. The null hypothesis of equal distribution is rejected for all evaluated quantiles.
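To make the notion of a quantile-wise effect size with a 95% confidence interval concrete, the following sketch computes the shift of a single quantile between two samples. It is an illustration under stated assumptions, not the procedure used to produce Table 3: it uses the plain Harrell-Davis estimator with a percentile bootstrap, without the adjustment for multiple quantiles, and the function names, the numerical integration of the Beta weights, and the synthetic data are all assumptions made here.

```python
import math
import random

def hd_weights(n, q, steps=64):
    """Harrell-Davis weights: mass of Beta((n+1)q, (n+1)(1-q))
    on each interval [i/n, (i+1)/n], i = 0..n-1."""
    a, b = (n + 1) * q, (n + 1) * (1 - q)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = 1.0 / (n * steps)
    weights = []
    for i in range(n):
        mass = 0.0
        for k in range(steps):
            t = i / n + (k + 0.5) * h  # midpoint rule never evaluates at 0 or 1
            log_pdf = log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t)
            mass += math.exp(log_pdf) * h
        weights.append(mass)
    total = sum(weights)
    return [w / total for w in weights]  # renormalise away the integration error

def hd_quantile(sorted_x, weights):
    """Weighted sum of the order statistics."""
    return sum(w * x for w, x in zip(weights, sorted_x))

def quantile_shift_ci(untreated, treated, q, n_boot=500, level=0.95, seed=0):
    """Point estimate and percentile-bootstrap CI of the q-quantile shift."""
    rng = random.Random(seed)
    w_u, w_t = hd_weights(len(untreated), q), hd_weights(len(treated), q)
    point = hd_quantile(sorted(treated), w_t) - hd_quantile(sorted(untreated), w_u)
    diffs = []
    for _ in range(n_boot):
        b_u = sorted(rng.choices(untreated, k=len(untreated)))
        b_t = sorted(rng.choices(treated, k=len(treated)))
        diffs.append(hd_quantile(b_t, w_t) - hd_quantile(b_u, w_u))
    diffs.sort()
    k = int((1 - level) / 2 * n_boot)
    return point, diffs[k], diffs[n_boot - 1 - k]

# Synthetic RSCP-like values in dBm (not measured data): a pure 15 dB shift.
untreated = [-100 + 0.5 * i for i in range(40)]
treated = [x + 15 for x in untreated]
point, lo, hi = quantile_shift_ci(untreated, treated, q=0.25)
```

Repeating the call for several values of `q` gives one interval per quantile; unequal shifts across quantiles then indicate that the treatment changes the shape of the distribution and not only its location.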
The active treatment effect is more considerable in the rural region (Fig. 10) than in the urban region (Fig. 9). This is explained by the low signal levels in rural regions at the input of the AF repeater. The received signals are amplified with the desired gain because the RF amplifiers of the AF repeater operate far below their output power limit.
The passive treatment shows no output power limit. However, the treatment effect depends on the distribution of Angle of Arrival (AoA) and polarization. The distribution of AoA and polarization differs between the urban and rural regions, as a result of the differences in the deployment of the mobile network and the characteristics of the terrain between those regions. Multipath propagation conditions are dominant in urban regions, whereas LOS is dominant in rural regions.
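The role of the output power limit can be illustrated with an idealised model of the amplifier stage: a fixed gain followed by a hard limit. This is a simplified sketch; the gain, the limit, and the input levels below are hypothetical values chosen for illustration, and a real AF repeater will not behave as an ideal hard limiter.

```python
def repeater_output_dbm(input_dbm, gain_db, max_output_dbm):
    """Idealised AF stage: fixed gain, hard-limited output power."""
    return min(input_dbm + gain_db, max_output_dbm)

def effective_gain_db(input_dbm, gain_db, max_output_dbm):
    """Gain actually delivered once the output limit is taken into account."""
    return repeater_output_dbm(input_dbm, gain_db, max_output_dbm) - input_dbm

GAIN_DB = 20.0          # hypothetical nominal gain
MAX_OUTPUT_DBM = -60.0  # hypothetical output power limit

# Weak input (rural-like): the limit is not reached, full gain is delivered.
rural_gain = effective_gain_db(-100.0, GAIN_DB, MAX_OUTPUT_DBM)
# Strong input (urban-like): the output is clipped, the delivered gain shrinks.
urban_gain = effective_gain_db(-70.0, GAIN_DB, MAX_OUTPUT_DBM)
```

In this model the weak input receives the full 20 dB, whereas the strong input receives only 10 dB, which is consistent with the truncation of the upper quantiles seen in the histograms with active treatment.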
The FSS used in the passive treatment imposes higher attenuation on the weaker multipath components with highly scattered AoA. Accordingly, the passive treatment shows benefits in urban regions.
The analysis shows that the effect size depends on the characteristics of the treatment (active or passive), as well as the network deployment (rural or urban). The mobile service coverage in the urban regions is better than in the rural regions where signal levels are much lower. Therefore, the overall benefits of a treatment depend on the mix of network deployments along the travelled trajectory.
The chosen segmentation of the travelled trajectory in terms of length (km) and type (urban vs. rural) influences the validity of assumption A2 in Sec. III, see Fig. 8. Short segments, e.g., 5-10 km, result in small spatial variations in the KPIs with treatment effect. Long segments, e.g., hundreds of kilometers, hide significant variations in the treatment effect on the KPI. This effect can be observed when comparing the RSCP statistics of the whole data set with the RSCP statistics in the rural region for the active treatment. The results presented in Table 2 show this bias: the effect size along the 64-km track in the rural regions is approximately 18 dB, compared to an effect size of approximately 12 dB along the entire 305-km trajectory.
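The following toy example shows how pooling segments can hide a larger effect in a minority region. The values are synthetic and were chosen only so that the outcome resembles the orders of magnitude quoted above; they are not the measured data, and the helper `median_shift_db` is introduced here for illustration.

```python
from statistics import median

def median_shift_db(without, with_treatment):
    """Effect size assessed via the median, in dB."""
    return median(with_treatment) - median(without)

# Synthetic RSCP-like samples in dBm: many urban samples, few rural ones.
urban_without = [-80, -78, -76, -74, -72, -70]
rural_without = [-110, -108, -106]
urban_with = [x + 12 for x in urban_without]  # assumed 12 dB urban shift
rural_with = [x + 18 for x in rural_without]  # assumed 18 dB rural shift

rural_shift = median_shift_db(rural_without, rural_with)
pooled_shift = median_shift_db(urban_without + rural_without,
                               urban_with + rural_with)
```

Here the rural segment alone yields 18 dB, while the pooled data yield 12 dB, because the pooled median falls among the more numerous urban samples.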
The treatment effect differs among the set of investigated KPIs. Some KPIs may consistently display no effect, in contrast to others. In our case study, the goal of the treatment is an enhanced signal level on board the HST.
Specifically, we emphasize that the results in Table 2 show that the effect sizes in the Rx SIR under both active and passive treatments are much smaller than in the RSCP and RSSI. This is partially explainable by the closed-loop transmit power control in the network for voice calls. This control loop varies the transmit power to achieve an operator-specified target value for Rx SIR.
In regions where the interference level is lower than the noise level plus VPL, the Rx SIR on board the HST is close to the Rx Signal-to-Noise Ratio (SNR). This is true for both treatments: active and passive. Here, any treatment reducing the VPL has the potential of improving the measured Rx SIR on board the HST.
Active treatments can completely compensate the VPL in rural areas where the received signal levels are low, but come at the expense of adding some noise. Passive treatments can only partially compensate the VPL and they do not add noise. Therefore, the active treatment shows more considerable KPI improvements in rural regions than the passive treatment. In urban regions, neither treatment shows much improvement.
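The dependence of the on-board Rx SIR on the VPL can be made explicit with a simple link model. The sketch below assumes that the signal and the interference are both attenuated by the VPL, whereas the receiver noise is not; it ignores power control and the noise added by an active treatment. The function name and all power levels are hypothetical.

```python
import math

def db_to_lin(db):
    return 10 ** (db / 10)

def lin_to_db(x):
    return 10 * math.log10(x)

def onboard_sir_db(signal_dbm, interference_dbm, noise_dbm, vpl_db):
    """Signal over interference plus noise inside the carriage.
    Signal and interference suffer the VPL; receiver noise does not."""
    s = db_to_lin(signal_dbm - vpl_db)
    i = db_to_lin(interference_dbm - vpl_db)
    n = db_to_lin(noise_dbm)
    return lin_to_db(s / (i + n))

# Noise-limited case: interference below noise plus VPL.
# Reducing the VPL from 25 dB to 10 dB improves the ratio by almost 15 dB.
noise_limited_gain = (onboard_sir_db(-90, -120, -100, vpl_db=10)
                      - onboard_sir_db(-90, -120, -100, vpl_db=25))

# Interference-limited case: the same VPL reduction barely changes the ratio.
interference_limited_gain = (onboard_sir_db(-60, -65, -100, vpl_db=10)
                             - onboard_sir_db(-60, -65, -100, vpl_db=25))
```

Under these assumptions, a reduction of the VPL translates almost one-to-one into an improved ratio only in the noise-limited case, where the on-board Rx SIR is close to the Rx SNR.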

V. CONCLUSION
In this paper, we inferred significant benefits of treatments in mobile network coverage on board high-speed trains (HSTs) in urban and rural regions through measurements of KPIs in Austria. Two treatments are assessed: an active treatment based on AF repeaters and a passive one using structured windows with low RF attenuation.
We use the modified HD method and the nonparametric bootstrap to test multiple sample quantiles from collected key performance indicators (KPIs) for significance. This approach copes well with the differences in mobility and regional conditions and with the varying sample sizes along the railway track.
The results support the hypothesis that signal quality improves with both treatments. The improvement in the mobile network coverage on board the HST with the AF repeater is higher than that due to the installed low RF attenuation windows. The enhancements from both treatments are larger when the HST travels in rural regions.
Regarding the effect of the difference in signal quality, prior research has shown that QoS enhancement with both treatments is similar along the trajectories travelled in Austria. The experiments in this study cannot consider the effect of the network deployment along the railroad.